pos-system/microservices/docs/en/guides/troubleshooting.md
Ho Ngoc Hai 76d75c753b Migrate
2026-05-23 18:37:02 +07:00


Troubleshooting Guide

Note: This guide focuses on debugging the GoodGo Microservices Platform in a local development environment (Docker Compose).

Table of Contents

  1. General Diagnosis
  2. Infrastructure Issues
  3. Service Issues
  4. Debugging Tools
  5. FAQ
  6. Quick Tips

General Diagnosis

When something goes wrong, follow this checklist:

  1. Check Service Status:
cd deployments/local
docker-compose ps

All services should be Up or Running.

  2. Check Logs:
# View logs for a specific service
docker-compose logs -f <service-name>

# View the last 100 lines from all services
docker-compose logs --tail=100
  3. Check Connectivity:
  • Can you reach the Gateway? curl http://localhost/health
  • Can you reach the Dashboard? http://localhost:8080
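The connectivity checks above can be run in one go with a small script. This is a minimal sketch that assumes a POSIX shell with curl installed. http_status and check_endpoints are hypothetical helper names, not scripts in this repository. Any 2xx or 3xx response counts as reachable, because a dashboard may answer with a redirect.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the repository): quick HTTP reachability checks.

# Print the HTTP status code returned by a URL.
# curl prints "000" when no HTTP response was received (refused, DNS failure, timeout).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time "${2:-5}" "$1"
}

# Check every URL given as an argument; 2xx/3xx counts as reachable.
# Prints one OK/FAIL line per URL and returns non-zero if any URL failed.
check_endpoints() {
  rc=0
  for url in "$@"; do
    code=$(http_status "$url") || true   # keep the "000" code on connection errors
    case $code in
      2??|3??) echo "OK   $code $url" ;;
      *)       echo "FAIL $code $url"; rc=1 ;;
    esac
  done
  return "$rc"
}

# Example:
# check_endpoints http://localhost/health http://localhost:8080
```

A FAIL line with code 000 means no HTTP response arrived at all. In that case, look at the container status and the Traefik logs before looking at the application logs.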

Troubleshooting Flowchart

flowchart TD
 Start([ Issue Detected]) --> CheckStatus{Check Service<br/>Status}

 CheckStatus -->|All Running| CheckLogs[ Check Logs]
 CheckStatus -->|Some Down| IdentifyService[ Identify Failed<br/>Service]

 IdentifyService --> ServiceType{Service Type?}

 ServiceType -->|Infrastructure| InfraCheck[ Infrastructure<br/>Check]
 ServiceType -->|Application| AppCheck[ Application<br/>Check]

 InfraCheck --> DBCheck{Database?}
 InfraCheck --> RedisCheck{Redis?}
 InfraCheck --> TraefikCheck{Traefik?}

 DBCheck -->|Yes| DBSolution[ Check DATABASE_URL<br/> Verify Neon connection<br/> Check IP whitelist]
 RedisCheck -->|Yes| RedisSolution[ Restart Redis<br/> Check port mapping<br/> Verify connection string]
 TraefikCheck -->|Yes| TraefikSolution[ Check labels<br/> Verify PathPrefix<br/> Check health status]

 AppCheck --> ErrorType{Error Type?}

 ErrorType -->|Config| ConfigFix[ Check .env variables<br/> Run init-project.sh]
 ErrorType -->|Prisma| PrismaFix[ Check migrations<br/> Regenerate client<br/> Reset database]
 ErrorType -->|Auth| AuthFix[ Check token expiry<br/> Verify keys<br/> Sync Docker time]

 CheckLogs --> LogAnalysis{Log Shows<br/>Error?}
 LogAnalysis -->|Yes| ErrorType
 LogAnalysis -->|No| ConnCheck[ Check Connectivity]

 ConnCheck --> GatewayTest{Gateway<br/>Reachable?}
 GatewayTest -->|No| TraefikCheck
 GatewayTest -->|Yes| ServiceTest{Service<br/>Reachable?}

 ServiceTest -->|No| AppCheck
 ServiceTest -->|Yes| Resolved([ Issue Resolved])

 DBSolution --> Restart[ Restart Services]
 RedisSolution --> Restart
 TraefikSolution --> Restart
 ConfigFix --> Restart
 PrismaFix --> Restart
 AuthFix --> Restart

 Restart --> Verify{Issue<br/>Fixed?}
 Verify -->|Yes| Resolved
 Verify -->|No| DeepDebug[ Deep Debugging<br/>Required]

 DeepDebug --> ContainerShell[Access Container Shell]
 DeepDebug --> PrismaStudio[Use Prisma Studio]
 DeepDebug --> RedisInspect[Inspect Redis]
 DeepDebug --> APITest[Direct API Testing]

 style Start fill:#1a1a2e,color:#fff
 style Resolved fill:#0f3460,color:#fff
 style CheckStatus fill:#16213e,color:#fff
 style ServiceType fill:#16213e,color:#fff
 style ErrorType fill:#16213e,color:#fff
 style DBCheck fill:#16213e,color:#fff
 style RedisCheck fill:#16213e,color:#fff
 style TraefikCheck fill:#16213e,color:#fff
 style GatewayTest fill:#16213e,color:#fff
 style ServiceTest fill:#16213e,color:#fff
 style Verify fill:#16213e,color:#fff
 style LogAnalysis fill:#16213e,color:#fff
 style InfraCheck fill:#1a1a40,color:#fff
 style AppCheck fill:#1a1a40,color:#fff
 style DBSolution fill:#0f4c75,color:#fff
 style RedisSolution fill:#0f4c75,color:#fff
 style TraefikSolution fill:#0f4c75,color:#fff
 style ConfigFix fill:#0f4c75,color:#fff
 style PrismaFix fill:#0f4c75,color:#fff
 style AuthFix fill:#0f4c75,color:#fff
 style Restart fill:#3282b8,color:#fff
 style DeepDebug fill:#1b262c,color:#fff
 style IdentifyService fill:#1a1a40,color:#fff
 style CheckLogs fill:#1a1a40,color:#fff
 style ConnCheck fill:#1a1a40,color:#fff
 style ContainerShell fill:#0f3460,color:#fff
 style PrismaStudio fill:#0f3460,color:#fff
 style RedisInspect fill:#0f3460,color:#fff
 style APITest fill:#0f3460,color:#fff

Infrastructure Issues

Database (Neon/PostgreSQL)

Problem: P1001: Can't reach database server or Connection timed out

  • Cause 1: Internet connectivity issues (Neon is cloud-based).
  • Cause 2: Incorrect DATABASE_URL in .env.
  • Cause 3: IP address blocked by Neon.

Solution:

  1. Verify internet connection: ping neon.tech.
  2. Check deployments/local/.env.local. The URL should look like: postgres://user:pass@ep-xyz.aws.neon.tech/neondb
  3. Go to Neon Dashboard -> Settings, ensure "Allow all IPs" or add your current IP.
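Some networks block ICMP ping, so a failed ping neon.tech does not prove the database is unreachable. A more direct test is to open a TCP connection to the host and port from your DATABASE_URL. The sketch below assumes bash (for its /dev/tcp pseudo-device) and the timeout command from GNU coreutils are installed. port_open is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens in time.
# Usage: port_open HOST PORT [TIMEOUT_SECONDS]
port_open() {
  # Inside the bash -c script, $0 and $1 are the host and port passed as
  # extra arguments; bash itself handles the /dev/tcp/HOST/PORT path.
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example with the placeholder host from the URL above (use your own endpoint):
# port_open ep-xyz.aws.neon.tech 5432 && echo reachable || echo unreachable
```

5432 is the PostgreSQL default port when the URL does not specify one. If this check succeeds but P1001 persists, the credentials or the IP allowlist are the more likely cause.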

Problem: P1003: Database does not exist

  • Reason: You are connecting to the wrong database name.
  • Fix: Check the database name at the end of your connection string (usually /neondb). If you use a custom database name, make sure it exists in your Neon project.
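To confirm which host and database name a DATABASE_URL points at, you can split the URL without printing the password. This is a minimal POSIX shell sketch. db_url_parts is a hypothetical helper, and it assumes a URL of the form scheme://user:pass@host[:port]/dbname[?params] in which any special characters in the password are percent-encoded.

```shell
#!/bin/sh
# Hypothetical helper: print "host=<host> db=<database>" from a PostgreSQL URL
# such as postgres://user:pass@ep-xyz.aws.neon.tech/neondb?sslmode=require.
# The user name and password are deliberately not printed.
db_url_parts() {
  rest=${1#*://}                 # drop the scheme
  case $rest in
    *@*) rest=${rest#*@} ;;      # drop user:pass@ when present
  esac
  hostport=${rest%%/*}           # host[:port]
  case $rest in
    */*) db=${rest#*/} ;;        # everything after the first '/'
    *)   db= ;;                  # no database name in the URL
  esac
  db=${db%%[?]*}                 # drop query parameters such as ?sslmode=require
  printf 'host=%s db=%s\n' "${hostport%%:*}" "$db"
}

# Example (strip surrounding quotes first if your .env file quotes the value):
# db_url_parts "$(grep '^DATABASE_URL=' deployments/local/.env.local | cut -d= -f2-)"
```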

Redis

Problem: Redis connection refused or ECONNREFUSED

  • Cause: Redis container is not running or port mapping is wrong.

Solution:

  1. Check Redis status: docker-compose ps redis.
  2. Restart Redis: docker-compose restart redis.
  3. Check logs: docker-compose logs redis.
  4. Connection string from services:
  • Inside Docker: redis:6379
  • From Host: localhost:6379
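The correct address depends on where the client runs. The small sketch below assumes the common, but not guaranteed, convention that Docker creates a /.dockerenv file inside containers. redis_addr and the DOCKERENV_FILE override (used only for testing) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Redis host:port to use from the current environment.
# Heuristic: Docker usually creates /.dockerenv inside containers.
# DOCKERENV_FILE overrides that path (useful for testing).
redis_addr() {
  if [ -f "${DOCKERENV_FILE:-/.dockerenv}" ]; then
    echo "redis:6379"       # inside the Compose network: use the service name
  else
    echo "localhost:6379"   # on the host: use the published port
  fi
}

# Example: ping Redis with redis-cli using the chosen address (expects PONG).
# addr=$(redis_addr); redis-cli -h "${addr%:*}" -p "${addr##*:}" PING
```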

Traefik Gateway

Problem: 404 Not Found when accessing APIs (e.g., http://localhost/api/v1/auth)

  • Cause: The service is down or its Traefik labels are misconfigured.

Solution:

  1. Check Traefik Dashboard at http://localhost:8080.
  • Look for "HTTP Routers" and "Services".
  • If your service is missing, check docker-compose.yml labels.
  2. Verify that the PathPrefix in the labels matches your request path.
- "traefik.http.routers.iam.rule=PathPrefix(`/api/v1/auth`)"
  3. Check whether the service passed its health checks (health status in the dashboard).
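You can check the PathPrefix step mechanically by extracting the rule from the label and comparing it with the request path. This sketch treats PathPrefix as a plain string-prefix match, which is a simplification. label_prefix and path_matches_prefix are hypothetical helpers, and they only recognize rules of the exact form PathPrefix(`...`).

```shell
#!/bin/sh
# Hypothetical helpers for checking a Traefik PathPrefix rule.

# Extract "/api/v1/auth" from a label or rule such as
#   traefik.http.routers.iam.rule=PathPrefix(`/api/v1/auth`)
# Prints nothing if no PathPrefix(`...`) is found.
label_prefix() {
  printf '%s\n' "$1" | sed -n 's/.*PathPrefix(`\([^`]*\)`).*/\1/p'
}

# Succeed if request path $1 starts with prefix $2.
path_matches_prefix() {
  case $1 in
    "$2"*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Example: read the rule from the running container, then test a path.
# rule=$(docker inspect --format '{{ index .Config.Labels "traefik.http.routers.iam.rule" }}' \
#          "$(docker-compose ps -q iam-service)")
# path_matches_prefix /api/v1/auth/login "$(label_prefix "$rule")" && echo match
```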

Problem: Bad Gateway or Gateway Timeout

  • Cause: Service is crashing or taking too long to respond.
  • Fix: Check the specific service logs (docker-compose logs iam-service).

Service Issues

Service Fails to Start

Symptom: Container status is Exited (1) or Restarting.

Debugging:

  1. Check logs immediately:
docker-compose logs iam-service
  2. Common error: Config validation error
  • Fix: Check your environment variables. Running ./scripts/setup/init-project.sh ensures the .env file exists.
  3. Common error: PrismaClientInitializationError
  • Fix: Database connectivity issue (see Infrastructure section).
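When several containers are Exited or Restarting, it helps to list exactly which services are not running before you read their logs. This is a POSIX shell sketch. not_running is a hypothetical helper, and it assumes your Compose version supports docker-compose ps --filter status=running (Compose v2 does).

```shell
#!/bin/sh
# Hypothetical helper: print services that are defined but not running.
# $1: newline-separated list of all services
# $2: newline-separated list of running services
not_running() {
  printf '%s\n' "$1" | while IFS= read -r svc; do
    [ -n "$svc" ] || continue
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    printf '%s\n' "$2" | grep -qxF -- "$svc" || printf '%s\n' "$svc"
  done
}

# Example (run from deployments/local):
# not_running "$(docker-compose config --services)" \
#             "$(docker-compose ps --services --filter status=running)"
```

Each name it prints is a good candidate for docker-compose logs <service-name>.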

Prisma/Database Errors

Error: P2025: Record to update not found

  • Fix: This is a logic error: the record you are updating does not exist. Make sure the ID exists before updating.

Error: P2002: Unique constraint failed

  • Fix: You are inserting a value that must be unique but already exists (e.g., the same email twice).

Error: Migration failed

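  • Fix (local development only; this deletes all data):
  1. Delete the prisma/migrations folder.
  2. Reset the database: pnpm prisma migrate reset.
  3. Create a new initial migration from the current schema: pnpm prisma migrate dev --name init (without this step the reset database has no tables, because there are no migrations left to apply).
  4. Regenerate the client: pnpm prisma generate.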

Authentication Errors

Problem: 401 Unauthorized despite valid token

  • Cause 1: Token expired.
  • Cause 2: Public key mismatch (Service can't verify token signed by IAM).
  • Cause 3: Clock skew (Docker time vs Host time).

Solution:

  1. Check server logs for JWT verification errors.
  2. Restart services to refresh keys.
  3. Sync Docker time: restart Docker Desktop.
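Causes 1 and 3 can be checked directly from a shell. The sketch below assumes a POSIX shell with base64 -d (GNU coreutils or a recent macOS) and date +%s. It also assumes the IAM service issues tokens with a standard exp claim. It only decodes the payload and does not verify the signature, so it cannot diagnose cause 2. jwt_payload, jwt_exp, jwt_expired and clock_skew are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting JWTs and clock skew.

# Print the decoded payload (middle segment) of a JWT.
# WARNING: this only decodes; it does NOT verify the signature.
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')   # base64url -> base64
  # base64url drops the '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Print the numeric "exp" claim (seconds since the Unix epoch), if present.
jwt_exp() {
  jwt_payload "$1" | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeed if the token's exp is at or before time $2 (default: now).
jwt_expired() {
  exp=$(jwt_exp "$1")
  [ -n "$exp" ] && [ "$exp" -le "${2:-$(date +%s)}" ]
}

# Print the absolute difference in seconds between two epoch timestamps.
clock_skew() {
  d=$(( $1 - $2 ))
  [ "$d" -lt 0 ] && d=$(( -d ))
  echo "$d"
}

# Examples:
# jwt_expired "$TOKEN" && echo "token expired"
# clock_skew "$(date +%s)" "$(docker-compose exec -T iam-service date +%s)"
```

A skew of more than a few seconds between host and container points to cause 3. An expired token with no skew points to cause 1.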

Debugging Tools

1. Accessing Container Shell

To inspect files or run commands inside a running container:

docker-compose exec iam-service sh
# or /bin/bash

2. Inspecting Database (via Prisma Studio)

Use Prisma Studio to view/edit data visually:

pnpm --filter @goodgo/iam-service prisma studio
# Opens http://localhost:5555

3. Inspecting Redis

docker-compose exec redis redis-cli
> PING
PONG
> KEYS *
1) "user:123:session"

4. Direct API Testing

Use curl or Postman.

# Health Check
curl -v http://localhost/api/v1/auth/health/live

# Login (example)
curl -X POST http://localhost/api/v1/auth/login \
 -H "Content-Type: application/json" \
 -d '{"email":"admin@example.com", "password":"password"}'

FAQ

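Q: Why isn't my change taking effect? A: If you changed .env or docker-compose.yml, docker-compose restart is not enough, because it does not apply configuration changes. Recreate the containers instead: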

docker-compose down && docker-compose up -d

If you changed code, hot reloading (nodemon) should pick it up. If it doesn't, restart the container with docker-compose restart <service-name>.

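Q: How do I reset everything? A: Be careful: this deletes all data stored in local Docker volumes!

docker-compose down -v
# -v also removes the project's volumes (Redis data, etc.); the Neon database is hosted remotely and is not affected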

Q: My computer is slow when running everything. A: Running every container at once uses a lot of RAM and CPU.

  1. Stop unused services (e.g., future-service).
  2. Increase Docker resource limits in Docker Desktop settings.

Quick Tips

Debugging Shortcuts

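# List services whose status is not "Up" (the header line is shown too;
# containers reported as "Up (unhealthy)" are also hidden by this filter)
docker-compose ps | grep -v "Up"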

# Tail logs for all services
docker-compose logs -f --tail=50

# Restart specific service without rebuilding
docker-compose restart iam-service

# Rebuild and restart service
docker-compose up -d --build iam-service

# Check resource usage
docker stats --no-stream

# Clean up unused resources (WARNING: also deletes unused images and volumes, including their data)
docker system prune -a --volumes

Common Error Patterns

| Error Pattern | Likely Cause | Quick Fix |
| --- | --- | --- |
| ECONNREFUSED | Service not running | docker-compose restart <service> |
| P1001 | Database unreachable | Check DATABASE_URL and internet |
| P2002 | Duplicate entry | Check unique constraints |
| 401 Unauthorized | Token expired/invalid | Refresh token or re-login |
| 404 Not Found | Wrong route/service down | Check Traefik dashboard |
| 502 Bad Gateway | Service crashed | Check service logs |
| Config validation error | Missing env vars | Run init-project.sh |

Log Analysis Tips

What to look for in logs:

  • Server listening on port XXXX = Service started successfully
  • Warning: = Non-critical issues
  • Error: = Critical issues requiring attention
  • Trace: = Detailed execution flow

Useful grep patterns:

# Find all errors
docker-compose logs | grep -i error

# Find specific service errors
docker-compose logs iam-service | grep -i "error\|failed"

# Find database connection issues
docker-compose logs | grep -i "prisma\|database\|p1001\|p1003"

# Find auth issues
docker-compose logs | grep -i "unauthorized\|401\|jwt\|token"
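When logs are long, counting Prisma error codes quickly shows whether the problem is connectivity (P1xxx codes), queries and constraints (P2xxx), or migrations (P3xxx). This is a minimal sketch. prisma_error_summary is a hypothetical helper that counts every token shaped like P followed by four digits, so unrelated text of that shape is counted too.

```shell
#!/bin/sh
# Hypothetical helper: count Prisma error codes (e.g. P1001, P2002) read from stdin.
# Prints "<code> <count>" lines, most frequent first.
prisma_error_summary() {
  grep -oE 'P[0-9]{4}' | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Example:
# docker-compose logs --no-color iam-service | prisma_error_summary
```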

Resource Management

Recommended Docker Resources:

  • RAM: Minimum 4GB, Recommended 8GB
  • CPU: Minimum 2 cores, Recommended 4 cores
  • Disk: Minimum 10GB free space

Check resource usage:

# Overall system
docker system df

# Per container
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
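To find the containers that are slowing the machine down, you can filter docker stats output by memory percentage. This sketch assumes awk is available. mem_hogs and its default threshold of 10% are hypothetical choices. The input is one tab-separated name/percentage pair per line, as produced by the format string in the example comment.

```shell
#!/bin/sh
# Hypothetical helper: print containers whose memory usage exceeds a threshold.
# Reads "<name>\t<mem%>" lines on stdin; $1 is the threshold in percent (default 10).
mem_hogs() {
  awk -F '\t' -v max="${1:-10}" '{
    pct = $2
    sub(/%$/, "", pct)             # "12.50%" -> "12.50"
    if (pct + 0 > max + 0) print $1, $2
  }'
}

# Example:
# docker stats --no-stream --format '{{.Name}}\t{{.MemPerc}}' | mem_hogs 10
```

Services that appear here and that you are not actively working on are good candidates for docker-compose stop.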

Cleanup commands:

# Remove stopped containers
docker container prune

# Remove unused images
docker image prune -a

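# Remove unused volumes (deletes data!); on Docker Engine 23.0+ add -a to include named volumes
docker volume prune

# Nuclear option: removes all unused containers, networks, images, build cache and volumes
docker system prune -a --volumes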

Best Practices

  1. Always check the logs before making changes
  2. Use the Traefik Dashboard (http://localhost:8080) to verify routing
  3. Keep .env.local updated with the correct credentials
  4. Don't delete volumes unless you are prepared to lose their data
  5. Restart Docker Desktop if you see unexplained networking issues
  6. Run docker-compose down && docker-compose up -d after .env changes
  7. Keep the services you're actively developing running
  8. Stop services you're not using to save resources

Visual Indicators

When reading logs, look for these patterns:

  • [INFO] = Normal operation
  • [WARN] = Something to watch
  • [ERROR] = Needs immediate attention
  • [DEBUG] = Detailed information
  • [TRACE] = Very detailed execution flow