goodgo-platform/docs/audits/INFRASTRUCTURE_QUICK_REFERENCE.md
Ho Ngoc Hai b8512ebff4 docs: consolidate audit and analysis reports into docs/audits/
Move 36 root-level audit/analysis documents and 7 web app audit documents
into docs/audits/ directory to declutter the project root. Remove stale
EXPLORATION_SUMMARY.txt.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-11 01:37:50 +07:00

GoodGo Platform — Infrastructure Quick Reference

🚀 Quick Start

# Development
docker compose up -d --wait

# Production
docker compose -f docker-compose.prod.yml up -d --wait

# CI/E2E
docker compose -f docker-compose.ci.yml up -d --wait

📊 Service Map (Dev)

| Service | Port | Health | Status |
|---|---|---|---|
| API (NestJS) | 3001 | GET /health | 🟢 |
| Web (Next.js) | 3000 | GET / | 🟢 |
| AI Services (Python) | 8000 | GET /health | 🟢 |
| PostgreSQL + PostGIS | 5432 | pg_isready | 🟢 |
| Redis | 6379 | PING | 🟢 |
| Typesense | 8108 | GET /health | 🟢 |
| MinIO | 9000 | mc ready local | 🟢 |
| Prometheus | 9090 | GET /-/healthy | 🟢 |
| Grafana | 3002 | GET /api/health | 🟢 |
| Loki | 3100 | GET /ready | 🟢 |
| Promtail | 9080 | (passive) | 🟢 |

📊 Service Map (Prod)

Same as dev, plus PgBouncer (6432) for connection pooling.

🗄️ Database

  • Type: PostgreSQL 16 + PostGIS
  • Schema: 22 Prisma models
  • Backup: Daily 02:00 UTC, 7-day retention
  • Connection Pooling (Prod): PgBouncer (transaction mode, 20 connections)
  • Verification: Weekly automated restore test
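
A minimal sketch of how the 7-day retention window could be enforced on a backup directory. The `prune_backups` helper and the `*.sql.gz` file naming are assumptions for illustration; the real logic lives in scripts/backup/pg-backup.sh and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete compressed dumps older than the retention window.
# The *.sql.gz naming is an assumption, not taken from pg-backup.sh.
prune_backups() {
  local backup_dir="$1"
  local retention_days="${2:-7}"
  # find truncates file age to whole days, so "-mtime +7" matches
  # files last modified at least 8 full days ago.
  find "$backup_dir" -maxdepth 1 -type f -name '*.sql.gz' \
    -mtime "+${retention_days}" -print -delete
}
```

Calling `prune_backups /path/to/backups 7` prints each file it removes, which makes the result easy to log from a cron job.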

Key Tables:

  • User, RefreshToken, OAuthAccount, Agent
  • Property, PropertyMedia, Listing
  • SavedSearch, Transaction, Inquiry, Lead
  • Payment (VNPAY/MOMO/ZALOPAY/BANK_TRANSFER)
  • Plan, Subscription, UsageRecord
  • Valuation, MarketIndex, NotificationLog, AdminAuditLog, Review

Cache and Search:

  • Redis: 512 MB (prod), 256 MB (dev), AOF persistence, LRU eviction
  • Typesense: Full-text search on listings, geo-indexing support

📈 Monitoring

  • Prometheus: 30-day retention (prod), 15-day (dev)
  • Grafana: Pre-provisioned dashboards (7 total)
  • Loki: 15-day log retention, Pino JSON parsing
  • Alerts: p99 latency > 1s (warn) or > 3s (critical); 5xx error rate > 1%
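
The latency thresholds can be checked by hand against Prometheus. The sketch below assumes the API exports a histogram named `http_request_duration_seconds`; that metric name and the helper names are hypothetical, so compare them with monitoring/prometheus/alert-rules.yml before relying on them.

```shell
#!/usr/bin/env bash
# Map a p99 latency reading (in seconds) onto the alert levels listed above.
classify_p99() {
  awk -v v="$1" 'BEGIN {
    if (v > 3)      print "critical"
    else if (v > 1) print "warn"
    else            print "ok"
  }'
}

# Build a PromQL expression for p99 latency.
# Assumed metric name; adjust to whatever the API actually exports.
p99_query() {
  local window="${1:-5m}"
  printf 'histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[%s])))\n' "$window"
}

# Run an instant query against the Prometheus HTTP API on the dev port.
query_prometheus() {
  curl -fsS --get 'http://localhost:9090/api/v1/query' --data-urlencode "query=$1"
}
```

For example, `query_prometheus "$(p99_query)" | jq '.data.result'` returns the current value, which can then be passed to `classify_p99`.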

💳 Payment Integration

| Gateway | Provider | Status Tracking | Callback Verification |
|---|---|---|---|
| VNPay | VNPAY | | HMAC SHA-256 |
| MoMo | MOMO | | HMAC |
| ZaloPay | ZALOPAY | | Key 1/2 |
| Bank Transfer | BANK_TRANSFER | Manual | N/A |
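
As a rough illustration of HMAC SHA-256 callback verification, the sketch below recomputes a signature with `openssl` and compares it with the one received. How each gateway builds the signed string, and which field carries the signature, is gateway-specific and not shown; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Compute a hex HMAC-SHA256 of a payload. "-r" gives "<hex> *stdin" output,
# which is stable across OpenSSL versions.
hmac_sha256_hex() {
  local secret="$1" payload="$2"
  printf '%s' "$payload" | openssl dgst -sha256 -hmac "$secret" -r | cut -d' ' -f1
}

# Succeeds only if the received signature matches the recomputed one.
# Hex case is normalised because gateways differ on upper/lower case.
# Note: a shell string comparison is not constant-time; this is for manual
# debugging, not a replacement for the API's own verification.
verify_callback() {
  local secret="$1" payload="$2" received="$3"
  local expected
  expected="$(hmac_sha256_hex "$secret" "$payload")"
  received="$(printf '%s' "$received" | tr 'A-F' 'a-f')"
  [ "$expected" = "$received" ]
}
```

This is handy when a callback is rejected: replay the logged payload through `verify_callback` with the configured secret to see whether the secret or the signed string is at fault.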

Callback Handler:

  • Idempotent (updateIfStatus pattern)
  • Atomic state transitions (PENDING → COMPLETED/FAILED)
  • Domain event publishing (triggers downstream actions)
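
The idempotent, atomic transition can be pictured as a guarded UPDATE that only fires while the row is still PENDING. This is a sketch of the idea, not the API's actual `updateIfStatus` code: the `"Payment"` table name and the `id` and `status` columns are assumptions inferred from the model list, and both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the guarded UPDATE for a callback result.
# Table and column names are assumptions, not confirmed against schema.prisma.
callback_update_sql() {
  case "$1" in
    COMPLETED|FAILED) ;;
    *) echo "invalid target status: $1" >&2; return 1 ;;
  esac
  cat <<SQL
UPDATE "Payment"
SET status = '$1'
WHERE id = :'payment_id' AND status = 'PENDING'
RETURNING id;
SQL
}

# Apply it through psql; :'payment_id' is quoted by psql from the -v variable.
# A replayed callback matches zero rows, so nothing is returned or changed.
apply_callback() {
  callback_update_sql "$2" |
    docker compose exec -T postgres psql -U goodgo -d goodgo -qAt -v payment_id="$1"
}
```

Because the status check and the write happen in one statement, two concurrent callbacks for the same payment cannot both succeed; only the caller that gets a row back should publish the domain event.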

🏥 Health Checks

GET /health          # Liveness (always 200)
GET /health/ready    # Readiness (checks DB + Redis)
GET /health/db       # Database only
GET /health/redis    # Redis only
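
For deploy scripts that cannot use `docker compose up --wait`, a small polling loop around the readiness endpoint is one option. The sketch assumes `/health/ready` answers with a non-2xx status while a dependency is down; `PROBE_CMD` is a hypothetical override that lets the loop be exercised without a running stack.

```shell
#!/usr/bin/env bash
# Poll the readiness endpoint until it answers 2xx or the attempts run out.
wait_for_ready() {
  local url="${1:-http://localhost:3001/health/ready}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local probe="${PROBE_CMD:-curl -fsS -o /dev/null}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # curl -f exits non-zero on HTTP >= 400, so a failing check keeps looping.
    if $probe "$url"; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after ${attempts} attempts: ${url}" >&2
  return 1
}
```

With the defaults, `wait_for_ready` gives the API roughly a minute to come up before failing the script.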

🔐 Environment Variables (Critical)

# Database
DB_USER=goodgo
DB_PASSWORD=<required>
DATABASE_URL_DIRECT=postgresql://...  # For migrations

# Redis (Prod requires password)
REDIS_PASSWORD=<required-in-prod>

# Typesense API Key
TYPESENSE_API_KEY=<required>

# JWT Secrets (REQUIRED, min 32 chars)
JWT_SECRET=<openssl rand -base64 48>
JWT_REFRESH_SECRET=<openssl rand -base64 48>

# KYC Encryption (Prod only)
KYC_ENCRYPTION_KEY=<openssl rand -hex 32>  # 64 hex chars

# Payment Gateways (optional if disabled)
VNPAY_TMN_CODE=
MOMO_PARTNER_CODE=
ZALOPAY_APP_ID=
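
A pre-flight check along these lines can catch a missing or malformed secret before the stack starts. It is a sketch based only on the rules listed above; the `validate_env` name and the `dev`/`prod` argument are hypothetical, and it treats `KYC_ENCRYPTION_KEY` as optional but format-checked when set.

```shell
#!/usr/bin/env bash
# Returns non-zero if any rule from the variable list above is violated.
validate_env() {
  local errors=0 pair name value

  # Variables marked <required>.
  for pair in \
    "DB_PASSWORD=${DB_PASSWORD:-}" \
    "TYPESENSE_API_KEY=${TYPESENSE_API_KEY:-}" \
    "JWT_SECRET=${JWT_SECRET:-}" \
    "JWT_REFRESH_SECRET=${JWT_REFRESH_SECRET:-}"; do
    name="${pair%%=*}"   # text before the first "="
    value="${pair#*=}"   # text after the first "=" (may itself contain "=")
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      errors=$((errors + 1))
    fi
  done

  # JWT secrets need at least 32 characters.
  for value in "${JWT_SECRET:-}" "${JWT_REFRESH_SECRET:-}"; do
    if [ -n "$value" ] && [ "${#value}" -lt 32 ]; then
      echo "JWT secret shorter than 32 characters" >&2
      errors=$((errors + 1))
    fi
  done

  # Redis password is only mandatory in prod.
  if [ "${1:-dev}" = "prod" ] && [ -z "${REDIS_PASSWORD:-}" ]; then
    echo "missing: REDIS_PASSWORD" >&2
    errors=$((errors + 1))
  fi

  # When set, the KYC key must be exactly 64 hex characters.
  if [ -n "${KYC_ENCRYPTION_KEY:-}" ] &&
     ! printf '%s' "$KYC_ENCRYPTION_KEY" | grep -Eq '^[0-9a-fA-F]{64}$'; then
    echo "KYC_ENCRYPTION_KEY must be 64 hex characters" >&2
    errors=$((errors + 1))
  fi

  [ "$errors" -eq 0 ]
}
```

Running `validate_env prod && docker compose -f docker-compose.prod.yml up -d --wait` stops the deployment early if a secret is missing.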

📦 Deployment

Containers:

  • goodgo-api:${IMAGE_TAG} — NestJS API
  • goodgo-web:${IMAGE_TAG} — Next.js Frontend
  • goodgo-ai-services:${IMAGE_TAG} — Python FastAPI

Registry: ghcr.io/goodgo/

CI/CD: GitHub Actions

  • ci.yml — Test, build, lint on push
  • deploy.yml — Build images, deploy to staging (auto) or prod (manual)
  • backup-verify.yml — Weekly restore verification
  • e2e.yml — End-to-end test suite

🆘 Troubleshooting

API not healthy?

docker compose exec api curl http://localhost:3001/health/ready
docker compose logs api --tail=50

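Database connection pool full?

# Admin commands must target the virtual "pgbouncer" database; cl_waiting > 0 in SHOW POOLS means clients are queued
docker compose -f docker-compose.prod.yml exec pgbouncer psql -h 127.0.0.1 -p 6432 -U pgbouncer_stats pgbouncer -c "SHOW POOLS;" -c "SHOW STATS;"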

Redis down?

docker compose exec redis redis-cli ping
# App continues working (DB fallback), but slower

Typesense not indexing?

curl http://localhost:8108/collections/listings -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
# Reindex: docker compose exec api npx ts-node scripts/reindex-listings.ts

Payment callback failing?

docker compose logs api | grep -i callback
# Check VNPAY_HASH_SECRET / MOMO_SECRET_KEY / ZALOPAY_KEY1

Backup stuck?

docker compose exec pg-backup bash -c "tail -f /var/log/pg-backup.log"
docker compose exec postgres psql -U goodgo -d goodgo -c "SELECT * FROM pg_stat_activity WHERE wait_event_type IS NOT NULL;"

📝 Key Files

| Path | Purpose |
|---|---|
| docker-compose.yml | Dev (no resource limits, all services) |
| docker-compose.prod.yml | Prod (PgBouncer, limits, secrets) |
| docker-compose.ci.yml | Test (tmpfs, minimal services) |
| INFRASTRUCTURE_RUNBOOK.md | Full documentation (this file's companion) |
| prisma/schema.prisma | Complete data model |
| infra/pgbouncer/pgbouncer.ini | Connection pooling config |
| monitoring/prometheus/alert-rules.yml | Alert definitions |
| scripts/backup/pg-backup.sh | Daily backup automation |
| scripts/backup/pg-verify-backup.sh | Restore verification |
| .github/workflows/*.yml | CI/CD pipelines |

📞 Common Commands

# View all services
docker compose ps

# Tail logs
docker compose logs -f api web postgres redis

# Execute command in container
docker compose exec api npx prisma db push
docker compose exec postgres psql -U goodgo -d goodgo

# Restart single service
docker compose restart api

# Full cleanup (dev only!)
docker compose down -v && docker compose up -d --wait

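# Database backup & restore (plain-SQL dump, so restore with psql, not pg_restore)
# -T disables the pseudo-TTY so piped data is not corrupted
docker compose exec -T postgres pg_dump -U goodgo -d goodgo | gzip > backup.sql.gz
gunzip -c backup.sql.gz | docker compose exec -T postgres psql -U goodgo -d goodgo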

# Check health endpoints
curl http://localhost:3001/health/ready | jq
curl http://localhost:3001/health/db | jq
curl http://localhost:3001/health/redis | jq

For detailed information, see INFRASTRUCTURE_RUNBOOK.md

Last updated: April 11, 2026