Ho Ngoc Hai 9ba4a478ee feat(docs): Enhance deployment and development guides with improved clarity and structure
- Updated Mermaid diagrams in the deployment and development guides for better visual representation and consistency.
- Improved formatting and clarity in the Kubernetes local deployment and IAM migration guides, including detailed workflows and troubleshooting sections.
- Enhanced the Vietnamese documentation to align with the English version, ensuring consistency across guides.
- Added quick tips and common issues sections to facilitate user navigation and understanding.
2026-01-08 17:10:06 +07:00


# Troubleshooting Guide
> **Note**: This guide focuses on debugging the GoodGo Microservices Platform in a local development environment (Docker Compose).
## Table of Contents

1. [General Diagnosis](#general-diagnosis)
2. [Infrastructure Issues](#infrastructure-issues)
   - [Database (Neon/PostgreSQL)](#database-neonpostgresql)
   - [Redis](#redis)
   - [Traefik Gateway](#traefik-gateway)
3. [Service Issues](#service-issues)
   - [Service Fails to Start](#service-fails-to-start)
   - [Prisma/Database Errors](#prismadatabase-errors)
   - [Authentication Errors](#authentication-errors)
4. [Debugging Tools](#debugging-tools)
5. [FAQ](#faq)
6. [Quick Tips](#quick-tips)
---
## General Diagnosis

When something goes wrong, work through this checklist:

1. **Check Service Status**:

   ```bash
   cd deployments/local
   docker-compose ps
   ```

   *All services should show `Up` (or `running`) in the status column.*

2. **Check Logs**:

   ```bash
   # Follow logs for a specific service
   docker-compose logs -f <service-name>

   # Show the last 100 lines for all services
   docker-compose logs --tail=100
   ```

3. **Check Connectivity**:
   * Can you reach the gateway? `curl http://localhost/health`
   * Can you reach the Traefik dashboard? http://localhost:8080
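
To spot stopped services without scanning the whole `docker-compose ps` table, the status check can be scripted. This is a minimal sketch, assuming bash, a working directory of `deployments/local`, and a Compose version whose `ps` supports `--services` with `--filter "status=running"`; `missing_from` and `down_services` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Print the lines of list $1 that do not appear in list $2 (both newline-separated).
missing_from() {
  comm -23 <(printf '%s\n' "$1" | sort) <(printf '%s\n' "$2" | sort)
}

# Services defined in docker-compose.yml that are not currently running.
# Run from deployments/local; prints nothing when every service is running.
down_services() {
  missing_from "$(docker-compose config --services)" \
               "$(docker-compose ps --services --filter 'status=running')"
}

# Usage: down_services
```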
### Troubleshooting Flowchart
```mermaid
flowchart TD
Start([Issue Detected]) --> CheckStatus{Check Service<br/>Status}
CheckStatus -->|All Running| CheckLogs[Check Logs]
CheckStatus -->|Some Down| IdentifyService[Identify Failed<br/>Service]
IdentifyService --> ServiceType{Service Type?}
ServiceType -->|Infrastructure| InfraCheck[Infrastructure<br/>Check]
ServiceType -->|Application| AppCheck[Application<br/>Check]
InfraCheck --> DBCheck{Database?}
InfraCheck --> RedisCheck{Redis?}
InfraCheck --> TraefikCheck{Traefik?}
DBCheck -->|Yes| DBSolution[Check DATABASE_URL<br/>Verify Neon connection<br/>Check IP allowlist]
RedisCheck -->|Yes| RedisSolution[Restart Redis<br/>Check port mapping<br/>Verify connection string]
TraefikCheck -->|Yes| TraefikSolution[Check labels<br/>Verify PathPrefix<br/>Check health status]
AppCheck --> ErrorType{Error Type?}
ErrorType -->|Config| ConfigFix[Check .env variables<br/>Run init-project.sh]
ErrorType -->|Prisma| PrismaFix[Check migrations<br/>Regenerate client<br/>Reset database]
ErrorType -->|Auth| AuthFix[Check token expiry<br/>Verify keys<br/>Sync Docker time]
CheckLogs --> LogAnalysis{Log Shows<br/>Error?}
LogAnalysis -->|Yes| ErrorType
LogAnalysis -->|No| ConnCheck[Check Connectivity]
ConnCheck --> GatewayTest{Gateway<br/>Reachable?}
GatewayTest -->|No| TraefikCheck
GatewayTest -->|Yes| ServiceTest{Service<br/>Reachable?}
ServiceTest -->|No| AppCheck
ServiceTest -->|Yes| Resolved([Issue Resolved])
DBSolution --> Restart[Restart Services]
RedisSolution --> Restart
TraefikSolution --> Restart
ConfigFix --> Restart
PrismaFix --> Restart
AuthFix --> Restart
Restart --> Verify{Issue<br/>Fixed?}
Verify -->|Yes| Resolved
Verify -->|No| DeepDebug[Deep Debugging<br/>Required]
DeepDebug --> ContainerShell[Access Container Shell]
DeepDebug --> PrismaStudio[Use Prisma Studio]
DeepDebug --> RedisInspect[Inspect Redis]
DeepDebug --> APITest[Direct API Testing]
style Start fill:#1a1a2e,color:#fff
style Resolved fill:#0f3460,color:#fff
style CheckStatus fill:#16213e,color:#fff
style ServiceType fill:#16213e,color:#fff
style ErrorType fill:#16213e,color:#fff
style DBCheck fill:#16213e,color:#fff
style RedisCheck fill:#16213e,color:#fff
style TraefikCheck fill:#16213e,color:#fff
style GatewayTest fill:#16213e,color:#fff
style ServiceTest fill:#16213e,color:#fff
style Verify fill:#16213e,color:#fff
style LogAnalysis fill:#16213e,color:#fff
style InfraCheck fill:#1a1a40,color:#fff
style AppCheck fill:#1a1a40,color:#fff
style DBSolution fill:#0f4c75,color:#fff
style RedisSolution fill:#0f4c75,color:#fff
style TraefikSolution fill:#0f4c75,color:#fff
style ConfigFix fill:#0f4c75,color:#fff
style PrismaFix fill:#0f4c75,color:#fff
style AuthFix fill:#0f4c75,color:#fff
style Restart fill:#3282b8,color:#fff
style DeepDebug fill:#1b262c,color:#fff
style IdentifyService fill:#1a1a40,color:#fff
style CheckLogs fill:#1a1a40,color:#fff
style ConnCheck fill:#1a1a40,color:#fff
style ContainerShell fill:#0f3460,color:#fff
style PrismaStudio fill:#0f3460,color:#fff
style RedisInspect fill:#0f3460,color:#fff
style APITest fill:#0f3460,color:#fff
```
---
## Infrastructure Issues

### Database (Neon/PostgreSQL)

**Problem**: `P1001: Can't reach database server` or `Connection timed out`

* **Cause 1**: Internet connectivity issues (Neon is a cloud-hosted database).
* **Cause 2**: Incorrect `DATABASE_URL` in `.env`.
* **Cause 3**: Your IP address is blocked by Neon.

**Solution**:

1. Verify your internet connection: `ping neon.tech`.
2. Check `DATABASE_URL` in `deployments/local/.env.local`. It should look like:
   `postgres://user:pass@ep-xyz.aws.neon.tech/neondb?sslmode=require`
   (Neon only accepts SSL connections, hence `sslmode=require`.)
3. In the Neon Console, open your project's **Settings** and either allow all IPs or add your current IP address.

**Problem**: `P1003: Database does not exist`

* **Reason**: You are connecting to the wrong database name.
* **Fix**: Check the database name at the end of your connection string (usually `/neondb`). If you use a custom database name, make sure it exists in Neon.
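
Before blaming the network for `P1001` or `P1003`, it can help to sanity-check the connection string itself. The function below is a minimal sketch (the name `check_database_url` is hypothetical) that only inspects the string and never connects to Neon; it assumes the URL format shown above, with the database name as the path segment after the host.

```bash
#!/usr/bin/env bash
# Sanity-check a PostgreSQL/Neon DATABASE_URL without connecting to it.
# Prints the database name plus any problems; returns non-zero on errors.
check_database_url() {
  local url="$1" rc=0
  local scheme_re='^postgres(ql)?://'
  local db_re='^[a-z]+://[^/]+/([^?]+)'

  if [[ ! "$url" =~ $scheme_re ]]; then
    echo "ERROR: scheme must be postgres:// or postgresql://"
    rc=1
  fi
  # The path after host[:port], up to any '?', is the database name (P1003 if wrong)
  if [[ "$url" =~ $db_re ]]; then
    echo "database: ${BASH_REMATCH[1]}"
  else
    echo "ERROR: no database name after the host (e.g. /neondb)"
    rc=1
  fi
  # Neon only accepts SSL connections
  if [[ "$url" != *sslmode=* ]]; then
    echo "WARN: no sslmode parameter; Neon expects sslmode=require"
  fi
  return "$rc"
}

# Usage (assumes an unquoted KEY=value line):
# check_database_url "$(grep '^DATABASE_URL=' .env.local | cut -d= -f2-)"
```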
### Redis

**Problem**: `Redis connection refused` or `ECONNREFUSED`

* **Cause**: The Redis container is not running, or the service is using the wrong host or port.

**Solution**:

1. Check Redis status: `docker-compose ps redis`.
2. Restart Redis: `docker-compose restart redis`.
3. Check logs: `docker-compose logs redis`.
4. Make sure the service uses the right address for where it runs:
   * **Inside Docker** (service-to-service): `redis:6379`
   * **From the host**: `localhost:6379`
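
A frequent source of `ECONNREFUSED` is a service inside Docker connecting to `localhost:6379`, which points at that service's own container rather than at Redis. The sketch below picks the address based on where it runs. It relies on the common but not guaranteed heuristic that Docker creates a `/.dockerenv` file inside containers, and `redis_addr` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Pick the Redis address for the current context.
# Heuristic: Docker usually creates /.dockerenv inside containers.
# The marker path can be overridden (useful for testing).
redis_addr() {
  local marker="${1:-/.dockerenv}"
  if [ -f "$marker" ]; then
    echo "redis:6379"      # inside Docker: use the Compose service name
  else
    echo "localhost:6379"  # on the host: use the published port
  fi
}

# Usage: addr=$(redis_addr); redis-cli -h "${addr%:*}" -p "${addr##*:}" ping
```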
### Traefik Gateway

**Problem**: `404 Not Found` when accessing APIs (e.g., `http://localhost/api/v1/auth`)

* **Cause**: The service is down or its Traefik labels are misconfigured.

**Solution**:

1. Open the Traefik dashboard at http://localhost:8080.
   * Look under "HTTP Routers" and "Services".
   * If your service is missing, check its labels in `docker-compose.yml`.
2. Verify that the `PathPrefix` rule in the labels matches your request path:

   ```yaml
   - "traefik.http.routers.iam.rule=PathPrefix(`/api/v1/auth`)"
   ```

3. Check whether the service passed its health check (health status in the dashboard).
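
To double-check the `PathPrefix` step without the dashboard, you can compare a request path against the prefix in a router label. This is a sketch assuming the label contains a single `PathPrefix(...)` rule and that matching is a plain string-prefix comparison; `label_prefix` and `route_matches` are hypothetical helpers and do not reproduce Traefik's full rule syntax (e.g. `&&`, `Host(...)`).

```bash
#!/usr/bin/env bash
# Extract the prefix from a label like: traefik.http.routers.iam.rule=PathPrefix(`/api/v1/auth`)
label_prefix() {
  printf '%s\n' "$1" | sed -n 's/.*PathPrefix(`\([^`]*\)`).*/\1/p'
}

# Succeed if the request path ($2) starts with the label's ($1) PathPrefix.
# Returns 2 if the label contains no PathPrefix rule.
route_matches() {
  local prefix
  prefix=$(label_prefix "$1")
  [ -n "$prefix" ] || return 2
  [[ "$2" == "$prefix"* ]]
}

# Usage:
# route_matches 'traefik.http.routers.iam.rule=PathPrefix(`/api/v1/auth`)' /api/v1/auth/login && echo routed
```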

**Problem**: `502 Bad Gateway` or `504 Gateway Timeout`

* **Cause**: The service is crashing or taking too long to respond.
* **Fix**: Check the specific service's logs (e.g., `docker-compose logs iam-service`).
---
## Service Issues

### Service Fails to Start

**Symptom**: The container status is `Exited (1)` or `Restarting`.

**Debugging**:

1. Check the logs immediately:

   ```bash
   docker-compose logs iam-service
   ```

2. **Common Error**: `Config validation error`
   * **Fix**: Check the environment variables. Running `./scripts/setup/init-project.sh` ensures the `.env` file exists.
3. **Common Error**: `PrismaClientInitializationError`
   * **Fix**: This usually indicates a database connectivity issue (see [Database (Neon/PostgreSQL)](#database-neonpostgresql)).
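
For `Config validation error`, a quick way to narrow things down is to list which required keys are missing or empty in the env file. A minimal sketch: `missing_env_vars` is a hypothetical helper, and apart from `DATABASE_URL` the variable names in the usage line are placeholders, so check the service's config schema for the real list. It only understands plain `KEY=value` lines (not `export KEY=...`).

```bash
#!/usr/bin/env bash
# Print each required variable that is absent or empty in an env file.
# Returns non-zero if the file or any variable is missing.
missing_env_vars() {
  local file="$1" var rc=0
  shift
  if [ ! -f "$file" ]; then
    echo "env file not found: $file"
    return 1
  fi
  for var in "$@"; do
    # Only a plain KEY=value line with a non-empty value counts as set
    if ! grep -Eq "^${var}=.+" "$file"; then
      echo "$var"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (placeholder names):
# missing_env_vars deployments/local/.env.local DATABASE_URL REDIS_URL JWT_PUBLIC_KEY
```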
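### Prisma/Database Errors

**Error**: `P2025: Record to update not found`

* **Fix**: This is an application logic error. Make sure a record with that ID exists before updating it.

**Error**: `P2002: Unique constraint failed`

* **Fix**: You are trying to insert duplicate data into a unique field (e.g., the same email twice).

**Error**: `Migration failed`

* **Fix** (development only; this deletes all data in the database):
  1. Delete the `prisma/migrations` folder.
  2. Reset the database: `pnpm prisma migrate reset`.
  3. Recreate an initial migration from the current schema: `pnpm prisma migrate dev --name init`.
  4. Regenerate the client: `pnpm prisma generate`.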
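
When logs are noisy, counting which Prisma error codes appear can tell you whether you are facing a connectivity problem (`P1xxx` codes) or a query/data problem (`P2xxx` codes). This is a sketch using standard `grep`/`sort`/`uniq`; `prisma_error_summary` is a hypothetical name, and it assumes the codes appear as standalone words such as `P2002` in the log text.

```bash
#!/usr/bin/env bash
# Count Prisma error codes (P followed by four digits) read from stdin,
# most frequent first. Output lines look like "<count> <code>".
prisma_error_summary() {
  grep -owE 'P[0-9]{4}' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage: docker-compose logs --no-color iam-service | prisma_error_summary
```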
### Authentication Errors

**Problem**: `401 Unauthorized` even though a token is sent

* **Cause 1**: The token has expired.
* **Cause 2**: Public key mismatch (the service cannot verify a token signed by the IAM service).
* **Cause 3**: Clock skew between Docker and the host.

**Solution**:

1. Check the server logs for JWT verification errors.
2. Restart the services so they reload the keys.
3. Resync the Docker clock by restarting Docker Desktop.
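
To tell an expired token (Cause 1) apart from a key mismatch or clock skew, you can read the token's `exp` claim locally. The sketch below decodes the JWT payload without verifying the signature, so it is only a diagnostic aid. The helper names are hypothetical, it assumes a `base64` that accepts `-d` (as GNU coreutils does), and it extracts `exp` with a regex rather than a real JSON parser.

```bash
#!/usr/bin/env bash
# Decode one base64url segment (JWTs use -_ instead of +/ and drop '=' padding).
b64url_decode() {
  local s
  s=$(printf '%s' "$1" | tr '_-' '/+')
  case $(( ${#s} % 4 )) in
    2) s="${s}==" ;;
    3) s="${s}=" ;;
  esac
  printf '%s' "$s" | base64 -d
}

# Print the exp claim (Unix seconds) of a JWT. Does NOT verify the signature.
jwt_exp() {
  local payload
  payload=$(printf '%s' "$1" | cut -d. -f2)
  b64url_decode "$payload" \
    | grep -oE '"exp"[[:space:]]*:[[:space:]]*[0-9]+' \
    | grep -oE '[0-9]+$'
}

# Succeed if the token is expired at time $2 (default: now),
# allowing $3 seconds of clock-skew leeway (default: 0).
# Returns 2 if no exp claim could be read.
jwt_expired() {
  local exp now="${2:-$(date +%s)}" skew="${3:-0}"
  exp=$(jwt_exp "$1") || return 2
  [ "$now" -ge $(( exp + skew )) ]
}

# Usage: jwt_expired "$TOKEN" && echo "token expired: log in again"
```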
---
## Debugging Tools
### 1. Accessing Container Shell
To inspect files or run commands inside a running container:
```bash
docker-compose exec iam-service sh
# or /bin/bash
```
### 2. Inspecting Database (via Prisma Studio)
Use Prisma Studio to view/edit data visually:
```bash
pnpm --filter @goodgo/iam-service prisma studio
# Opens http://localhost:5555
```
### 3. Inspecting Redis
```bash
docker-compose exec redis redis-cli
> PING
PONG
> KEYS *
1) "user:123:session"
```
### 4. Direct API Testing
Use `curl` or Postman.
```bash
# Health Check
curl -v http://localhost/api/v1/auth/health/live
# Login (example)
curl -X POST http://localhost/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com", "password":"password"}'
```
---
## FAQ
**Q: Why aren't my changes taking effect?**

A: If you changed `.env` or `docker-compose.yml`, recreate the containers:

```bash
docker-compose down && docker-compose up -d
```

If you changed code, hot reloading (nodemon) should pick it up. If it doesn't, restart the container (e.g., `docker-compose restart iam-service`).

**Q: How do I reset everything?**

A: Be careful: this deletes all data stored in the project's Docker volumes. If your database is on Neon, it is hosted remotely and is not affected.

```bash
docker-compose down -v
# -v removes volumes (Redis data, etc.)
```

**Q: My computer is slow when running everything.**

A: Running every service in Docker uses a lot of RAM.

1. Stop services you are not using (e.g., `docker-compose stop future-service`).
2. Increase Docker's resource limits in Docker Desktop settings.
---
## Quick Tips
### Debugging Shortcuts
```bash
# List services that are not "Up" (the header line is printed too)
docker-compose ps | grep -v "Up"
# Tail logs for all services
docker-compose logs -f --tail=50
# Restart a specific service without rebuilding
docker-compose restart iam-service
# Rebuild and restart a service
docker-compose up -d --build iam-service
# Check resource usage
docker stats --no-stream
# Clean up unused resources (WARNING: also removes unused volumes and their data!)
docker system prune -a --volumes
```
### Common Error Patterns
| Error Pattern | Likely Cause | Quick Fix |
|--------------|--------------|-----------|
| `ECONNREFUSED` | Service not running | `docker-compose restart <service>` |
| `P1001` | Database unreachable | Check `DATABASE_URL` and internet |
| `P2002` | Duplicate entry | Check unique constraints |
| `401 Unauthorized` | Token expired/invalid | Refresh token or re-login |
| `404 Not Found` | Wrong route/service down | Check Traefik dashboard |
| `502 Bad Gateway` | Service crashed | Check service logs |
| `Config validation error` | Missing env vars | Run `init-project.sh` |
### Log Analysis Tips
**What to look for in logs:**
- `Server listening on port XXXX` = Service started successfully
- `Warning:` = Non-critical issues
- `Error:` = Critical issues requiring attention
- `Trace:` = Detailed execution flow

**Useful grep patterns:**
```bash
# Find all errors
docker-compose logs | grep -i error
# Find errors for a specific service (-E enables the | alternation)
docker-compose logs iam-service | grep -iE "error|failed"
# Find database connection issues
docker-compose logs | grep -i "prisma\|database\|p1001\|p1003"
# Find auth issues
docker-compose logs | grep -i "unauthorized\|401\|jwt\|token"
```
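
`docker-compose logs` prefixes each line with the container name and a `|`, so the grep patterns above can be extended to show *which* service is producing errors. A sketch, assuming that `name | message` prefix (use `--no-color` so ANSI color codes do not end up in the name); `errors_by_service` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Count log lines mentioning "error" (case-insensitive) per container,
# reading `docker-compose logs --no-color` output from stdin.
# Output lines look like "<count> <container>", busiest first.
errors_by_service() {
  awk -F'|' 'tolower($0) ~ /error/ {
      name = $1
      sub(/ +$/, "", name)   # drop the padding before the "|"
      count[name]++
    }
    END { for (n in count) print count[n], n }' | sort -rn
}

# Usage: docker-compose logs --no-color | errors_by_service
```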
### Resource Management
**Recommended Docker Resources:**
- **RAM**: Minimum 4GB, Recommended 8GB
- **CPU**: Minimum 2 cores, Recommended 4 cores
- **Disk**: Minimum 10GB free space

**Check resource usage:**
```bash
# Overall system
docker system df
# Per container
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
```
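
When the machine feels slow, it helps to single out the containers using the most memory. The sketch below filters `docker stats` output produced with a `{{.Name}}\t{{.MemPerc}}` format; `high_mem_containers` and its 20% default threshold are arbitrary choices for illustration.

```bash
#!/usr/bin/env bash
# Read "name<TAB>mem%" lines and print containers above a memory threshold (percent).
high_mem_containers() {
  local threshold="${1:-20}"
  awk -F'\t' -v t="$threshold" '{
    pct = $2
    sub(/%$/, "", pct)            # "35.50%" -> "35.50"
    if (pct + 0 > t + 0) print $1, $2
  }'
}

# Usage:
# docker stats --no-stream --format '{{.Name}}\t{{.MemPerc}}' | high_mem_containers 20
```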
**Cleanup commands:**
```bash
# Remove stopped containers
docker container prune
# Remove unused images
docker image prune -a
# Remove unused volumes (deletes data!)
docker volume prune
# Nuclear option: removes all unused containers, networks, images, and volumes (deletes data!)
docker system prune -a --volumes
```
### Best Practices
1. **Always check logs first** before making changes
2. **Use Traefik Dashboard** (http://localhost:8080) to verify routing
3. **Keep `.env.local` updated** with correct credentials
4. **Don't delete volumes** unless you want to lose data
5. **Restart Docker Desktop** if you run into unexplained networking issues
6. **Use `docker-compose down && docker-compose up -d`** after `.env` changes
7. **Run only the services you're actively developing**, and stop the rest to save resources
### Visual Indicators
When reading logs, look for these patterns:
- `[INFO]` = Normal operation
- `[WARN]` = Something to watch
- `[ERROR]` = Needs immediate attention
- `[DEBUG]` = Detailed information
- `[TRACE]` = Very detailed execution flow
### Related Resources
- [Local Deployment Guide](./local-deployment.md) - Setup instructions
- [Development Guide](./development.md) - Development workflow
- [Kubernetes Local Guide](./kubernetes-local.md) - K8s troubleshooting
- [Neon Database Guide](./neon-database.md) - Database management