Troubleshooting Guide
Note: This guide focuses on debugging the GoodGo Microservices Platform in a local development environment (Docker Compose).
General Diagnosis
When something goes wrong, follow this checklist:
- Check Service Status:
cd deployments/local
docker-compose ps
All services should be Up or Running.
- Check Logs:
# View logs for a specific service
docker-compose logs -f <service-name>
# View last 100 lines for all
docker-compose logs --tail=100
- Check Connectivity:
- Can you reach the Gateway?
curl http://localhost/health
- Can you reach the Dashboard?
http://localhost:8080
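The connectivity step above can be scripted. The following is a minimal sketch, assuming the Gateway answers /health with a 2xx status when it is healthy; the function names is_healthy_status and check_url are hypothetical helpers, not part of the platform.

```shell
# Return success (exit 0) when an HTTP status code is in the 2xx range.
is_healthy_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Probe a URL and report whether it looks healthy.
# curl prints 000 as the status code when no connection could be made.
check_url() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")
  if is_healthy_status "$code"; then
    echo "OK   $1 ($code)"
  else
    echo "FAIL $1 ($code)"
    return 1
  fi
}
```

For example, `check_url http://localhost/health` prints one OK or FAIL line and sets the exit status accordingly, which makes it easy to chain with other checks.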
Troubleshooting Flowchart
flowchart TD
Start([Issue Detected]) --> CheckStatus{Check Service<br/>Status}
CheckStatus -->|All Running| CheckLogs[Check Logs]
CheckStatus -->|Some Down| IdentifyService[Identify Failed<br/>Service]
IdentifyService --> ServiceType{Service Type?}
ServiceType -->|Infrastructure| InfraCheck[Infrastructure<br/>Check]
ServiceType -->|Application| AppCheck[Application<br/>Check]
InfraCheck --> DBCheck{Database?}
InfraCheck --> RedisCheck{Redis?}
InfraCheck --> TraefikCheck{Traefik?}
DBCheck -->|Yes| DBSolution[Check DATABASE_URL<br/>Verify Neon connection<br/>Check IP whitelist]
RedisCheck -->|Yes| RedisSolution[Restart Redis<br/>Check port mapping<br/>Verify connection string]
TraefikCheck -->|Yes| TraefikSolution[Check labels<br/>Verify PathPrefix<br/>Check health status]
AppCheck --> ErrorType{Error Type?}
ErrorType -->|Config| ConfigFix[Check .env variables<br/>Run init-project.sh]
ErrorType -->|Prisma| PrismaFix[Check migrations<br/>Regenerate client<br/>Reset database]
ErrorType -->|Auth| AuthFix[Check token expiry<br/>Verify keys<br/>Sync Docker time]
CheckLogs --> LogAnalysis{Log Shows<br/>Error?}
LogAnalysis -->|Yes| ErrorType
LogAnalysis -->|No| ConnCheck[Check Connectivity]
ConnCheck --> GatewayTest{Gateway<br/>Reachable?}
GatewayTest -->|No| TraefikCheck
GatewayTest -->|Yes| ServiceTest{Service<br/>Reachable?}
ServiceTest -->|No| AppCheck
ServiceTest -->|Yes| Resolved([Issue Resolved])
DBSolution --> Restart[Restart Services]
RedisSolution --> Restart
TraefikSolution --> Restart
ConfigFix --> Restart
PrismaFix --> Restart
AuthFix --> Restart
Restart --> Verify{Issue<br/>Fixed?}
Verify -->|Yes| Resolved
Verify -->|No| DeepDebug[Deep Debugging<br/>Required]
DeepDebug --> ContainerShell[Access Container Shell]
DeepDebug --> PrismaStudio[Use Prisma Studio]
DeepDebug --> RedisInspect[Inspect Redis]
DeepDebug --> APITest[Direct API Testing]
style Start fill:#1a1a2e,color:#fff
style Resolved fill:#0f3460,color:#fff
style CheckStatus fill:#16213e,color:#fff
style ServiceType fill:#16213e,color:#fff
style ErrorType fill:#16213e,color:#fff
style DBCheck fill:#16213e,color:#fff
style RedisCheck fill:#16213e,color:#fff
style TraefikCheck fill:#16213e,color:#fff
style GatewayTest fill:#16213e,color:#fff
style ServiceTest fill:#16213e,color:#fff
style Verify fill:#16213e,color:#fff
style LogAnalysis fill:#16213e,color:#fff
style InfraCheck fill:#1a1a40,color:#fff
style AppCheck fill:#1a1a40,color:#fff
style DBSolution fill:#0f4c75,color:#fff
style RedisSolution fill:#0f4c75,color:#fff
style TraefikSolution fill:#0f4c75,color:#fff
style ConfigFix fill:#0f4c75,color:#fff
style PrismaFix fill:#0f4c75,color:#fff
style AuthFix fill:#0f4c75,color:#fff
style Restart fill:#3282b8,color:#fff
style DeepDebug fill:#1b262c,color:#fff
style IdentifyService fill:#1a1a40,color:#fff
style CheckLogs fill:#1a1a40,color:#fff
style ConnCheck fill:#1a1a40,color:#fff
style ContainerShell fill:#0f3460,color:#fff
style PrismaStudio fill:#0f3460,color:#fff
style RedisInspect fill:#0f3460,color:#fff
style APITest fill:#0f3460,color:#fff
Infrastructure Issues
Database (Neon/PostgreSQL)
Problem: P1001: Can't reach database server or Connection timed out
- Cause 1: Internet connectivity issues (Neon is cloud-based).
- Cause 2: Incorrect DATABASE_URL in .env.
- Cause 3: IP address blocked by Neon.
Solution:
- Verify internet connection:
ping neon.tech
- Check deployments/local/.env.local. The URL should look like:
postgres://user:pass@ep-xyz.aws.neon.tech/neondb
- Go to Neon Dashboard -> Settings and ensure "Allow all IPs" is enabled, or add your current IP.
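A typo in the host or database name of the connection string is easier to spot when the parts are printed separately. The sketch below uses only shell parameter expansion and assumes the URL has the shape shown above (scheme, optional user:pass@, host, optional :port, /database, optional ?query). The function names db_url_host and db_url_name are hypothetical, and credentials containing unencoded @ or / characters are not handled.

```shell
# Print the host part of a PostgreSQL connection URL.
db_url_host() {
  rest=${1#*://}        # drop the scheme
  rest=${rest#*@}       # drop user:password@ if present
  rest=${rest%%/*}      # drop /database and anything after it
  echo "${rest%%:*}"    # drop an optional :port
}

# Print the database name: the path segment before any ?query.
db_url_name() {
  rest=${1#*://}
  case "$rest" in
    */*) rest=${rest#*/} ;;
    *) rest="" ;;
  esac
  echo "${rest%%[?]*}"
}
```

Running `db_url_host "$DATABASE_URL"` and `db_url_name "$DATABASE_URL"` after loading the env file shows which host and database the services will actually try to reach, without echoing the password.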
Problem: P1003: Database does not exist
- Reason: You are connecting to the wrong database name.
- Fix: Check the end of your connection string (usually /neondb). If you are using a custom DB name, ensure it exists in Neon.
Redis
Problem: Redis connection refused or ECONNREFUSED
- Cause: Redis container is not running or port mapping is wrong.
Solution:
- Check Redis status:
docker-compose ps redis
- Restart Redis:
docker-compose restart redis
- Check logs:
docker-compose logs redis
- Connection string from services:
- Inside Docker: redis:6379
- From Host: localhost:6379
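Once the container is up, a PING round-trip confirms that Redis is actually answering. This is a small sketch; is_pong is a hypothetical helper that only inspects the reply text, so it can be fed the output of whichever redis-cli invocation fits your setup.

```shell
# Succeed only when the given reply is exactly PONG.
# Carriage returns and newlines are stripped first, because replies
# captured through a TTY can end in \r\n.
is_pong() {
  reply=$(printf '%s' "$1" | tr -d '\r\n')
  [ "$reply" = "PONG" ]
}
```

A possible use from the host, with the -T flag disabling TTY allocation so the output is clean: `is_pong "$(docker-compose exec -T redis redis-cli ping)" && echo "Redis OK"`.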
Traefik Gateway
Problem: 404 Not Found when accessing APIs (e.g., http://localhost/api/v1/auth)
- Cause: Service is down or Labels are misconfigured.
Solution:
- Check Traefik Dashboard at http://localhost:8080.
- Look for "HTTP Routers" and "Services".
- If your service is missing, check the labels in docker-compose.yml.
- Verify that PathPrefix in the labels matches your request.
- "traefik.http.routers.iam.rule=PathPrefix(`/api/v1/auth`)"
- Check if the service passed health checks (Health status in dashboard).
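To rule out a simple mismatch between the request path and the router rule, the comparison can be done by hand. The sketch below assumes PathPrefix behaves as a plain string-prefix match on the request path. path_has_prefix is a hypothetical helper, and it does not evaluate other matchers (Host, Method, and so on) that a rule may combine.

```shell
# Succeed when the request path ($1) starts with the router prefix ($2).
# Quoting "$2" inside the case pattern makes it match literally.
path_has_prefix() {
  case "$1" in
    "$2"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `path_has_prefix /api/v1/auth/login /api/v1/auth` succeeds, while `path_has_prefix /api/v2/auth /api/v1/auth` fails, which would explain a 404 from the Gateway.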
Problem: Bad Gateway or Gateway Timeout
- Cause: Service is crashing or taking too long to respond.
- Fix: Check the specific service logs (docker-compose logs iam-service).
Service Issues
Service Fails to Start
Symptom: Container status is Exited (1) or Restarting.
Debugging:
- Check logs immediately:
docker-compose logs iam-service
- Common Error:
Config validation error
- Fix: Check environment variables. Running ./scripts/setup/init-project.sh ensures .env exists.
- Common Error:
PrismaClientInitializationError
- Fix: Database connectivity issue (see Infrastructure section).
Prisma/Database Errors
Error: P2025: Record to update not found
- Fix: Logic error. Ensure the ID exists before updating.
Error: P2002: Unique constraint failed
- Fix: You are trying to insert duplicate data (e.g., same email).
Error: Migration failed
- Fix:
- Delete the prisma/migrations folder (only in dev!).
- Reset database:
pnpm prisma migrate reset
- Regenerate client:
pnpm prisma generate
Authentication Errors
Problem: 401 Unauthorized despite valid token
- Cause 1: Token expired.
- Cause 2: Public key mismatch (Service can't verify token signed by IAM).
- Cause 3: Clock skew (Docker time vs Host time).
Solution:
- Check server logs for JWT verification errors.
- Restart services to refresh keys.
- Sync Docker time: restart Docker Desktop.
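Before restarting Docker Desktop, clock skew can be measured directly by comparing epoch seconds on the host and inside a container. This is a rough sketch: clock_skew_seconds is a hypothetical helper, and what counts as too much skew depends on the token lifetime and any tolerance configured in the services.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
clock_skew_seconds() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  echo "$diff"
}
```

One way to use it: `clock_skew_seconds "$(date +%s)" "$(docker-compose exec -T iam-service date +%s)"`. A result of a second or two is just command latency; a result of minutes or more points to Cause 3.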
Debugging Tools
1. Accessing Container Shell
To inspect files or run commands inside a running container:
docker-compose exec iam-service sh
# or /bin/bash
2. Inspecting Database (via Prisma Studio)
Use Prisma Studio to view/edit data visually:
pnpm --filter @goodgo/iam-service prisma studio
# Opens http://localhost:5555
3. Inspecting Redis
docker-compose exec redis redis-cli
> PING
PONG
> KEYS *
1) "user:123:session"
4. Direct API Testing
Use curl or Postman.
# Health Check
curl -v http://localhost/api/v1/auth/health/live
# Login (example)
curl -X POST http://localhost/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com", "password":"password"}'
FAQ
Q: Why is my change not reflecting?
A: If you changed .env or docker-compose.yml, you must restart:
docker-compose down && docker-compose up -d
If you changed code, hot-reloading (nodemon) should pick it up. If it does not, restart the container.
Q: How do I reset everything?
A: Be careful, this deletes all data!
docker-compose down -v
# -v removes volumes (Redis data, etc.)
Q: My computer is slow when running everything.
A: Docker consumes RAM.
- Stop unused services (e.g., future-service).
- Increase Docker resource limits in Docker Desktop settings.
Quick Tips
Debugging Shortcuts
# Quick health check: list anything that is not Up (header lines are also printed)
docker-compose ps | grep -v "Up"
# Tail logs for all services
docker-compose logs -f --tail=50
# Restart specific service without rebuilding
docker-compose restart iam-service
# Rebuild and restart service
docker-compose up -d --build iam-service
# Check resource usage
docker stats --no-stream
# Clean up unused resources (also removes unused images and volumes; deletes data!)
docker system prune -a --volumes
Common Error Patterns
| Error Pattern | Likely Cause | Quick Fix |
|---|---|---|
| ECONNREFUSED | Service not running | docker-compose restart <service> |
| P1001 | Database unreachable | Check DATABASE_URL and internet |
| P2002 | Duplicate entry | Check unique constraints |
| 401 Unauthorized | Token expired/invalid | Refresh token or re-login |
| 404 Not Found | Wrong route/service down | Check Traefik dashboard |
| 502 Bad Gateway | Service crashed | Check service logs |
| Config validation error | Missing env vars | Run init-project.sh |
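The table above can be turned into a small lookup for triaging log lines. This is an illustrative sketch, not a platform tool: suggest_fix is a hypothetical function, it matches only the patterns listed in the table, and the first matching pattern wins.

```shell
# Print the likely cause and quick fix for a log line, based on the table.
# Prints nothing and returns 1 when no known pattern is found.
suggest_fix() {
  case "$1" in
    *ECONNREFUSED*) echo "Service not running: docker-compose restart <service>" ;;
    *P1001*) echo "Database unreachable: check DATABASE_URL and internet" ;;
    *P2002*) echo "Duplicate entry: check unique constraints" ;;
    *"401 Unauthorized"*) echo "Token expired/invalid: refresh token or re-login" ;;
    *"404 Not Found"*) echo "Wrong route/service down: check Traefik dashboard" ;;
    *"502 Bad Gateway"*) echo "Service crashed: check service logs" ;;
    *"Config validation error"*) echo "Missing env vars: run init-project.sh" ;;
    *) return 1 ;;
  esac
}
```

Feeding recent logs through it line by line, for instance `docker-compose logs --tail=100 iam-service | while IFS= read -r line; do suggest_fix "$line"; done | sort | uniq -c`, gives a quick count of which known problems appear.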
Log Analysis Tips
What to look for in logs:
- Server listening on port XXXX = Service started successfully
- Warning: = Non-critical issues
- Error: = Critical issues requiring attention
- Trace: = Detailed execution flow
Useful grep patterns:
# Find all errors
docker-compose logs | grep -i error
# Find specific service errors
docker-compose logs iam-service | grep -i "error\|failed"
# Find database connection issues
docker-compose logs | grep -i "prisma\|database\|p1001\|p1003"
# Find auth issues
docker-compose logs | grep -i "unauthorized\|401\|jwt\|token"
Resource Management
Recommended Docker Resources:
- RAM: Minimum 4GB, Recommended 8GB
- CPU: Minimum 2 cores, Recommended 4 cores
- Disk: Minimum 10GB free space
Check resource usage:
# Overall system
docker system df
# Per container
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
Cleanup commands:
# Remove stopped containers
docker container prune
# Remove unused images
docker image prune -a
# Remove unused volumes (deletes data!)
docker volume prune
# Nuclear option (removes everything!)
docker system prune -a --volumes
Best Practices
- Always check logs first before making changes
- Use Traefik Dashboard (http://localhost:8080) to verify routing
- Keep .env.local updated with correct credentials
- Don't delete volumes unless you want to lose data
- Restart Docker Desktop if experiencing weird networking issues
- Run docker-compose down && docker-compose up -d after .env changes
- Keep the services you're actively developing running
- Stop services you're not using to save resources
Visual Indicators
When reading logs, look for these patterns:
- [INFO] = Normal operation
- [WARN] = Something to watch
- [ERROR] = Needs immediate attention
- [DEBUG] = Detailed information
- [TRACE] = Very detailed execution flow
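A per-level count gives a fast first impression of a noisy log. The sketch below assumes log lines carry the bracketed tags listed above; count_log_levels is a hypothetical helper that reads log text on standard input, counts the first tag found on each line, and ignores lines without one.

```shell
# Count log lines per level, reading log text from standard input.
# Output is one LEVEL=count line per level, most severe first.
count_log_levels() {
  awk '
    match($0, /\[(INFO|WARN|ERROR|DEBUG|TRACE)\]/) {
      # Strip the surrounding brackets from the matched tag.
      level = substr($0, RSTART + 1, RLENGTH - 2)
      count[level]++
    }
    END {
      n = split("ERROR WARN INFO DEBUG TRACE", order, " ")
      for (i = 1; i <= n; i++)
        printf "%s=%d\n", order[i], count[order[i]]
    }
  '
}
```

For example, `docker-compose logs --tail=500 iam-service | count_log_levels` summarizes the most recent 500 lines of one service.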
Related Resources
- Local Deployment Guide - Setup instructions
- Development Guide - Development workflow
- Kubernetes Local Guide - K8s troubleshooting
- Neon Database Guide - Database management