# Troubleshooting Guide
> **Note**: This guide focuses on debugging the GoodGo Microservices Platform in a local development environment (Docker Compose).
## Table of Contents
1. [General Diagnosis](#general-diagnosis)
2. [Infrastructure Issues](#infrastructure-issues)
   - [Database (Neon/PostgreSQL)](#database-neonpostgresql)
   - [Redis](#redis)
   - [Traefik Gateway](#traefik-gateway)
3. [Service Issues](#service-issues)
   - [Service Fails to Start](#service-fails-to-start)
   - [Prisma/Database Errors](#prismadatabase-errors)
   - [Authentication Errors](#authentication-errors)
4. [Debugging Tools](#debugging-tools)
5. [FAQ](#faq)
---
## General Diagnosis
When something goes wrong, follow this checklist:
1. **Check Service Status**:
```bash
cd deployments/local
docker-compose ps
```
*All services should be `Up` or `Running`.*
2. **Check Logs**:
```bash
# View logs for a specific service
docker-compose logs -f <service-name>
# View the last 100 lines of each service's logs
docker-compose logs --tail=100
```
3. **Check Connectivity**:
* Can you reach the Gateway? `curl http://localhost/health`
* Can you reach the Dashboard? http://localhost:8080
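To spot problem containers quickly, the status check in step 1 can be filtered down to services that are not running. The sketch below is a hypothetical helper (`unhealthy_services` is not part of the project) and assumes that a healthy service shows `Up` or `running` in the `docker-compose ps` output. Column layout differs between Compose versions, so treat the patterns as a starting point.

```bash
# Read `docker-compose ps` output on stdin and print only the lines that
# do NOT report a running state. Header, separator, and blank lines are dropped.
# Assumption: a healthy service contains the word "Up" or "running".
unhealthy_services() {
  grep -Ev '^(NAME|Name|-+|[[:space:]]*$)' \
    | grep -Ev '(^|[[:space:]])(Up|running)([[:space:]]|\(|$)' \
    || true   # grep exits non-zero when nothing is left; that is the good case
}
```

Pipe the status output through it: `docker-compose ps | unhealthy_services`. No output means every listed service reported a running state.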
---
## Infrastructure Issues
### Database (Neon/PostgreSQL)
**Problem**: `P1001: Can't reach database server` or `Connection timed out`
* **Cause 1**: Internet connectivity issues (Neon is cloud-based).
* **Cause 2**: Incorrect `DATABASE_URL` in `.env`.
* **Cause 3**: IP address blocked by Neon.
**Solution**:
1. Verify internet connection: `ping neon.tech`.
2. Check `deployments/local/.env.local`. The URL should look like:
`postgres://user:pass@ep-xyz.aws.neon.tech/neondb`
3. In the Neon Dashboard, open Settings and make sure "Allow all IPs" is enabled, or add your current IP address.
**Problem**: `P1003: Database does not exist`
* **Cause**: The connection string points to the wrong database name.
* **Fix**: Check the database name at the end of your connection string (usually `/neondb`). If you use a custom database name, make sure it exists in Neon.
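Both database problems above come down to reading the connection string correctly: the host decides whether the server is reachable, and the last path segment decides which database is opened. The following sketch splits a URL into those two parts with plain shell parameter expansion. The function names are hypothetical, and the parsing assumes the simple `postgres://user:pass@host[:port]/dbname[?params]` shape shown above, with no unencoded `/` or `@` in the password.

```bash
# Print the host part of a PostgreSQL connection URL.
db_url_host() {
  local rest="${1#*://}"        # drop the scheme ("postgres://")
  rest="${rest%%/*}"            # keep the authority part before the first "/"
  rest="${rest##*@}"            # drop "user:pass@" if present
  printf '%s\n' "${rest%%:*}"   # drop an optional ":port"
}

# Print the database name (the path segment after the host).
db_url_name() {
  local rest="${1#*://}"
  case "$rest" in
    */*) rest="${rest#*/}" ;;   # keep what follows "host/"
    *)   rest="" ;;             # the URL carries no database name
  esac
  printf '%s\n' "${rest%%[?]*}" # drop "?key=value" query parameters
}
```

For example, `db_url_name "$DATABASE_URL"` prints the name that must exist in Neon, and `db_url_host "$DATABASE_URL"` prints the endpoint to test for reachability.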
### Redis
**Problem**: `Redis connection refused` or `ECONNREFUSED`
* **Cause**: Redis container is not running or port mapping is wrong.
**Solution**:
1. Check Redis status: `docker-compose ps redis`.
2. Restart Redis: `docker-compose restart redis`.
3. Check logs: `docker-compose logs redis`.
4. Verify that the client uses the right Redis address for where it runs:
* **Inside Docker**: `redis:6379`
* **From Host**: `localhost:6379`
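The two addresses above can be chosen automatically. The sketch below is a hypothetical helper that relies on a common heuristic, not a guarantee: Docker usually creates a `/.dockerenv` file inside containers, so its presence is taken to mean "inside Docker". The marker path can be overridden through the first argument.

```bash
# Print the Redis address to use from the current environment.
# Assumption: the file /.dockerenv exists only inside a Docker container.
redis_addr() {
  local marker="${1:-/.dockerenv}"
  if [ -e "$marker" ]; then
    printf 'redis:6379\n'       # service name on the Compose network
  else
    printf 'localhost:6379\n'   # port published to the host
  fi
}
```

A quick connectivity test could then look like `addr="$(redis_addr)"; redis-cli -h "${addr%%:*}" -p "${addr##*:}" ping`, which should answer `PONG` when Redis is reachable.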
### Traefik Gateway
**Problem**: `404 Not Found` when accessing APIs (e.g., `http://localhost/api/v1/auth`)
* **Cause**: The service is down, or its Traefik labels are misconfigured.
**Solution**:
1. Check Traefik Dashboard at http://localhost:8080.
* Look for "HTTP Routers" and "Services".
* If your service is missing, check `docker-compose.yml` labels.
2. Verify `PathPrefix` in labels matches your request.
```yaml
- "traefik.http.routers.iam.rule=PathPrefix(`/api/v1/auth`)"
```
3. Check if the service passed health checks (Health status in dashboard).
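The dashboard check can also be done from the terminal. Traefik serves the dashboard from its API, so when the dashboard is reachable on port 8080 the router list is usually available as JSON at `/api/http/routers`. That is an assumption about this setup: it requires the API to be exposed in insecure mode, as the dashboard URL above suggests. The hypothetical helper below pulls the `rule` of each router out of that JSON with `grep` and `sed`, so no JSON tool is needed. It is a rough sketch and will not handle rules that contain escaped double quotes.

```bash
# Read Traefik's router list (JSON) on stdin and print one rule per line.
router_rules() {
  grep -o '"rule":[[:space:]]*"[^"]*"' \
    | sed -e 's/^"rule":[[:space:]]*"//' -e 's/"$//'
}

# Usage (requires Traefik to be running):
#   curl -s http://localhost:8080/api/http/routers | router_rules
```

If the prefix you are requesting does not appear in the output, the router was not registered and the labels in `docker-compose.yml` are the first place to look.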
**Problem**: `Bad Gateway` or `Gateway Timeout`
* **Cause**: Service is crashing or taking too long to respond.
* **Fix**: Check the specific service logs (`docker-compose logs iam-service`).
---
## Service Issues
### Service Fails to Start
**Symptom**: Container status is `Exited (1)` or `Restarting`.
**Debugging**:
1. Check logs immediately:
```bash
docker-compose logs iam-service
```
2. **Common Error**: `Config validation error`
* **Fix**: Check that all required environment variables are set. Running `./scripts/setup/init-project.sh` ensures that `.env` exists.
3. **Common Error**: `PrismaClientInitializationError`
* **Fix**: Database connectivity issue (see Infrastructure section).
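A `Config validation error` usually means a required variable is missing or empty. The hypothetical helper below lists which keys are absent from an env file. The file path and the key names passed to it are yours to choose: `DATABASE_URL` appears in this guide, while `REDIS_URL` in the usage line is only an illustrative assumption.

```bash
# Print every key from the argument list that has no non-empty assignment
# in the given env file. Prints nothing when all keys are set.
missing_env_keys() {
  local file="$1"
  shift
  local key
  for key in "$@"; do
    # Accept both "KEY=value" and "export KEY=value"
    if ! grep -Eq "^(export[[:space:]]+)?${key}=.+" "$file" 2>/dev/null; then
      printf '%s\n' "$key"
    fi
  done
}
```

Example: `missing_env_keys deployments/local/.env.local DATABASE_URL REDIS_URL`. A missing file is treated the same as a file with no keys, so every requested key is reported.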
### Prisma/Database Errors
**Error**: `P2025: Record to update not found`
* **Fix**: This is an application logic error: the update targeted a record that does not exist. Make sure the ID exists before updating.
**Error**: `P2002: Unique constraint failed`
* **Fix**: You are trying to insert duplicate data (e.g., same email).
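**Error**: `Migration failed`
* **Fix** (development only, this deletes all data):
  1. Delete the `prisma/migrations` folder.
  2. Reset the database: `pnpm prisma migrate reset`. This drops all data and reapplies the migrations found in `prisma/migrations`; if you deleted that folder, run `pnpm prisma migrate dev` afterwards to create a new initial migration.
  3. Regenerate the client: `pnpm prisma generate`.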
### Authentication Errors
**Problem**: `401 Unauthorized` even though the token looks valid
* **Cause 1**: Token expired.
* **Cause 2**: Public key mismatch (Service can't verify token signed by IAM).
* **Cause 3**: Clock skew (Docker time vs Host time).
**Solution**:
1. Check server logs for JWT verification errors.
2. Restart services to refresh keys.
3. Resync the Docker clock by restarting Docker Desktop.
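For cause 1, the expiry can be read straight from the token. A JWT consists of three base64url-encoded parts separated by dots, and the middle part is a JSON payload that normally carries an `exp` claim (seconds since the Unix epoch). The helpers below are a minimal sketch with hypothetical names. They assume GNU `base64` (on older macOS the decode flag is `-D`) and do not verify the signature.

```bash
# Decode the payload (second segment) of a JWT. Does NOT verify the signature.
jwt_payload() {
  local payload
  # Take the middle segment and convert base64url to standard base64
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url omits padding; restore it so the length is a multiple of 4
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Print the numeric "exp" claim (Unix seconds), if the payload has one.
jwt_exp() {
  jwt_payload "$1" \
    | grep -o '"exp":[[:space:]]*[0-9][0-9]*' \
    | grep -o '[0-9][0-9]*$'
}
```

With the token in a shell variable such as `$TOKEN` (a placeholder), the token has expired if `jwt_exp "$TOKEN"` prints a number smaller than `date +%s`. For cause 3, comparing `date -u +%s` on the host with `docker-compose exec -T iam-service date -u +%s` shows whether the two clocks disagree.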
---
## Debugging Tools
### 1. Accessing Container Shell
To inspect files or run commands inside a running container:
```bash
docker-compose exec iam-service sh
# If the image includes bash, use /bin/bash instead of sh
```
### 2. Inspecting Database (via Prisma Studio)
Use Prisma Studio to view/edit data visually:
```bash
pnpm --filter @goodgo/iam-service prisma studio
# Opens http://localhost:5555
```
### 3. Inspecting Redis
```bash
docker-compose exec redis redis-cli
> PING
PONG
> KEYS *
1) "user:123:session"
```
### 4. Direct API Testing
Use `curl` or Postman.
```bash
# Health Check
curl -v http://localhost/api/v1/auth/health/live
# Login (example)
curl -X POST http://localhost/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com", "password":"password"}'
```
---
## FAQ
**Q: Why is my change not reflecting?**
A: If you changed `.env` or `docker-compose.yml`, you must restart:
```bash
docker-compose down && docker-compose up -d
```
If you changed code, hot-reloading (nodemon) should pick it up. If it does not, restart the container (`docker-compose restart <service-name>`).
**Q: How do I reset everything?**
A: Be careful, this deletes all data!
```bash
docker-compose down -v
# -v removes volumes (Redis data, etc.)
```
**Q: My computer is slow when running everything.**
A: Every running container consumes RAM and CPU. To reduce the load:
1. Stop unused services (e.g., `future-service`).
2. Increase Docker resource limits in Docker Desktop settings.
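To decide which services to stop, it helps to see which containers use the most memory. `docker stats` accepts a Go-template `--format` option; the sketch below assumes the `{{.Name}}` and `{{.MemPerc}}` placeholders and sorts the result with a hypothetical `top_memory` helper.

```bash
# Sort "name percent" lines by the percentage column, highest first.
top_memory() {
  LC_ALL=C sort -k2,2 -rn
}

# Usage (requires a running Docker daemon):
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | top_memory
```

The containers at the top of the list are the best candidates for `docker-compose stop <service-name>`.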