Commit Graph

8 Commits

Author SHA1 Message Date
Ho Ngoc Hai
e78d706b42 chore: update infrastructure configs, audit docs, and env template
- Update Docker Compose configs for Redis, Typesense, and MinIO services
- Update GitHub Actions deploy workflow with improved caching and steps
- Extend .env.example with Stringee, Zalo OA, and FCM config keys
- Update audit documentation with latest findings and recommendations
- Update CHANGELOG and README with recent feature additions

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-16 05:17:38 +07:00
Ho Ngoc Hai
e5f7acf7da feat: production infra — nginx configs, deploy script, security hardening
Some checks failed
CI / Lint → Typecheck → Test → Build (22) (push) Failing after 58s
Deploy / Build Web Image (push) Failing after 14s
Deploy / Rollback Production (push) Has been skipped
CI / E2E Tests (push) Has been skipped
Deploy / Build API Image (push) Failing after 3m8s
Deploy / Build AI Services Image (push) Failing after 10s
E2E Tests / Playwright E2E (push) Failing after 1m21s
Deploy / Deploy to Staging (push) Has been skipped
Deploy / Smoke Test Staging (push) Has been skipped
Deploy / Deploy to Production (push) Has been skipped
Deploy / Smoke Test Production (push) Has been skipped
Deploy / Rollback Staging (push) Has been skipped
- Add Nginx reverse-proxy configs for api.goodgo.vn and platform.goodgo.vn
  with SSL, gzip, rate limiting, security headers, and WebSocket support
- Add Cloudflare DNS setup script for A/AAAA/CNAME records
- Add server-setup.sh for Ubuntu provisioning (Docker, fail2ban, UFW,
  swap, unattended-upgrades)
- Add deploy-production.sh for manual production deployments
- Add env.production.example with all required environment variables
- Bind container ports to 127.0.0.1 in docker-compose.prod.yml
  (security: prevent direct access bypassing Nginx)
- Fix deploy workflow: add -T flag to exec, sync Nginx configs,
  copy pgbouncer and backup configs to server

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
2026-04-13 14:11:25 +07:00
Ho Ngoc Hai
9409706c58 feat(monitoring): add comprehensive alerting rules, Alertmanager, and DR validation
Expand production monitoring with full alert coverage for database connections,
Redis memory/connections, container resources, disk usage, service health, and
backup integrity. Add Alertmanager service with Slack routing for critical and
warning alerts, and add automated backup verification to the pg-backup cron
schedule. Update runbook with DR validation procedures and quarterly checklist.

- Expand Prometheus alert rules from 4 to 24 alerts across 7 groups
- Add Alertmanager container (prom/alertmanager:v0.27.0) with Slack routing
- Configure inhibition rules (critical suppresses warning for same service)
- Schedule automated backup verification at 04:00 UTC daily
- Add Alertmanager datasource to Grafana provisioning
- Update runbook with Section 9: DR Validation (automated + manual procedures)
- Add SLACK_WEBHOOK_URL and Grafana vars to .env.example

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-11 20:15:36 +07:00
Ho Ngoc Hai
05abbc5250 feat(infra): add PgBouncer connection pooling for production PostgreSQL
Introduces PgBouncer as a connection pooler between the API service and
PostgreSQL in docker-compose.prod.yml, reducing connection overhead and
improving concurrency under production load.

- Add PgBouncer service (edoburu/pgbouncer:1.23.1-p2) with transaction
  pool mode, max_client_conn=200, default_pool_size=20
- Route API DATABASE_URL through PgBouncer (port 6432), keep direct
  connection (DATABASE_URL_DIRECT) for Prisma migrations/introspection
- Create infra/pgbouncer/ config: pgbouncer.ini, userlist template,
  and entrypoint script with runtime env-var substitution
- Update prisma.config.ts to prefer DATABASE_URL_DIRECT for migrations
- Add K6 load test (e2e/load/pgbouncer-pool-test.js) with ramp-up to
  200 VUs, pool exhaustion detection, and p95 < 2s threshold
- Add PgBouncer env vars to .env.example

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-10 20:15:21 +07:00
Ho Ngoc Hai
60830d00d0 feat(devops): improve multi-stage production Dockerfile for NestJS API
- Use pnpm deploy --prod for pruned production node_modules (smaller image)
- Add docker-entrypoint.sh with optional Prisma migration support (RUN_MIGRATIONS)
- Copy generated Prisma client explicitly into production stage
- Add OCI image labels for container registry metadata
- Update .dockerignore: exclude apps/web, libs/ai-services, agent configs, Python artifacts
- Add build directive + RUN_MIGRATIONS env to docker-compose.prod.yml
- Maintain non-root user, dumb-init signal handling, and healthcheck

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-09 01:23:06 +07:00
Ho Ngoc Hai
767afb56d5 fix(docker): harden production deployment config for all services
- Add resource limits (memory/CPU) and reservations for all services
- Add security hardening: read_only, no-new-privileges, tmpfs for temp dirs
- Add missing prod services: loki, promtail, pg-backup from dev compose
- Fix API healthcheck to include catch() for proper exit codes
- Add json-file logging driver with rotation limits across all services
- Remove exposed PostgreSQL port in prod (internal only)
- Add shm_size for PostgreSQL shared memory
- Add non-root user (appuser) to AI services Dockerfile
- Add --chown=node:node to COPY directives in API/Web Dockerfiles
- Harden .dockerignore: exclude IDE files, OS files, docker-compose files
- Fix Redis URL to include password authentication
- Add JWT_REFRESH_SECRET to API environment
- Add Grafana dependency on Loki for log datasource

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-08 13:44:44 +07:00
Ho Ngoc Hai
e60b95cdec fix(infra): harden AI service — graceful shutdown, rate limiting, API key auth, pinned deps, Grafana secrets
- Add dumb-init + --timeout-graceful-shutdown 30 to AI service Dockerfile
- Add slowapi rate limiting (configurable via AI_RATE_LIMIT) and X-API-Key auth middleware
- Pin all Python dependencies to exact versions for reproducible builds
- Move Grafana admin credentials from env vars to Docker secrets in production compose

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-08 06:13:29 +07:00
Ho Ngoc Hai
7c9f682046 feat(deploy): add production Dockerfiles and CI/CD pipeline
- Multi-stage Dockerfile for apps/api (NestJS) and apps/web (Next.js standalone)
- Production docker-compose.prod.yml with all services, health checks, and security
- Real deploy.yml pipeline: build → push to GHCR → deploy staging/production
- .dockerignore for optimized build context
- Enable Next.js standalone output mode

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-08 04:03:27 +07:00