Commit Graph

22 Commits

Author SHA1 Message Date
Ho Ngoc Hai
e78d706b42 chore: update infrastructure configs, audit docs, and env template
- Update Docker Compose configs for Redis, Typesense, and MinIO services
- Update GitHub Actions deploy workflow with improved caching and steps
- Extend .env.example with Stringee, Zalo OA, and FCM config keys
- Update audit documentation with latest findings and recommendations
- Update CHANGELOG and README with recent feature additions

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-16 05:17:38 +07:00
Ho Ngoc Hai
baaeb56849 docs: fix Next.js 14→15 version refs, add libs to CLAUDE.md
- Update stale Next.js 14 references to 15 in audit docs
- Add libs/ai-services and libs/mcp-servers to CLAUDE.md project structure

Resolves TEC-2259

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-16 04:05:39 +07:00
Ho Ngoc Hai
8039b47795 docs: fix Next.js 14→15 references, add libs READMEs
- Fix remaining "Next.js 14" references in:
  - docs/architecture/IMPLEMENTATION_QUICK_REFERENCE.md
  - docs/load-testing/K6_LOAD_TESTING_GUIDE.md
- Create README.md for libs/ai-services/ (FastAPI AVM, moderation, NLP)
- Create README.md for libs/mcp-servers/ (MCP tool server library)
- Note: CLAUDE.md, README.md, and docs/architecture.md were already
  updated in a prior pass

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-15 11:30:00 +07:00
Ho Ngoc Hai
20b79acf08 fix(deploy): tag rollback images before pull, prune after smoke test
Previously, `docker image prune` ran immediately after deploying new
containers, potentially deleting the old images needed for rollback
if smoke tests subsequently failed. Now the deploy pipeline:

1. Tags current images as :rollback before pulling new versions
2. Only runs `docker image prune` after smoke tests pass
3. Uses explicit :rollback tags for rollback instead of relying on
   Docker layer cache (which is fragile)

Applied to:
- scripts/deploy-production.sh (manual deploy script)
- .github/workflows/deploy.yml (staging + production CI jobs)
- docs/deployment.md (updated rollback documentation)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-15 11:17:32 +07:00
Ho Ngoc Hai
b93c28fa01 chore: organize docs — move 37 files from root into docs/ subfolders
Root now contains only essential files:
  README.md, CLAUDE.md, CHANGELOG.md, CONTRIBUTING.md

Reorganized into:
  docs/audits/       — all audit reports & checklists (71 files)
  docs/architecture/  — codebase overview, implementation plan
  docs/guides/        — auth guide, implementation checklist
  docs/load-testing/  — k6 load test guides & endpoints
  docs/security/      — payment & security reviews

Also removed 5 untracked debug/investigation files and
cleaned up playwright-report/ & test-results/ artifacts.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
2026-04-13 12:09:14 +07:00
Ho Ngoc Hai
505455b6f8 docs: add production readiness checklist and sign-off document
Comprehensive 12-item production readiness assessment covering:
- Load testing, security, monitoring, backups, incident response
- Database schema freeze, CI/CD, E2E tests, performance benchmarks
- SSL/TLS, DNS, CDN infrastructure readiness

Identified 5 critical blockers and 1 high-priority blocker with
assigned owners and required actions for each.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-12 00:14:57 +07:00
Ho Ngoc Hai
9409706c58 feat(monitoring): add comprehensive alerting rules, Alertmanager, and DR validation
Expand production monitoring with full alert coverage for database connections,
Redis memory/connections, container resources, disk usage, service health, and
backup integrity. Add Alertmanager service with Slack routing for critical and
warning alerts, and add automated backup verification to the pg-backup cron
schedule. Update runbook with DR validation procedures and quarterly checklist.

- Expand Prometheus alert rules from 4 to 24 alerts across 7 groups
- Add Alertmanager container (prom/alertmanager:v0.27.0) with Slack routing
- Configure inhibition rules (critical suppresses warning for same service)
- Schedule automated backup verification at 04:00 UTC daily
- Add Alertmanager datasource to Grafana provisioning
- Update runbook with Section 9: DR Validation (automated + manual procedures)
- Add SLACK_WEBHOOK_URL and Grafana vars to .env.example

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-11 20:15:36 +07:00
Ho Ngoc Hai
514aa507db docs: move 8 audit report files to docs/audits/
Move remaining root-level audit and CQRS handler analysis files
to the centralized docs/audits/ directory for consistency.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-11 19:15:24 +07:00
Ho Ngoc Hai
25b22ea9bd docs: move additional exploration docs to docs/audits/
Move 6 recently generated inquiry and MCP exploration documents
to the centralized audit directory.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-11 01:41:23 +07:00
Ho Ngoc Hai
642b593884 docs: move remaining analysis docs to docs/audits/
Move MCP module exploration, quick reference, and inquiries exploration
documents to the centralized audit directory.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-11 01:38:14 +07:00
Ho Ngoc Hai
b8512ebff4 docs: consolidate audit and analysis reports into docs/audits/
Move 36 root-level audit/analysis documents and 7 web app audit documents
into docs/audits/ directory to declutter the project root. Remove stale
EXPLORATION_SUMMARY.txt.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-11 01:37:50 +07:00
Ho Ngoc Hai
64c6074735 feat(devops): add staging auto-deploy pipeline on develop branch
- Trigger deploy workflow on push to `develop` branch (in addition to `master`)
- Add `staging-latest` Docker image tag for develop branch builds
- Add `rollback-staging` job: auto-reverts to previous images on smoke test failure
- Add Slack success notification for staging deploys (previously only failure was notified)
- Record pre-deploy image digests for rollback capability
- Update deployment docs with CI/CD pipeline details, rollback procedures, and required secrets

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-11 01:18:37 +07:00
Ho Ngoc Hai
f27b13f712 docs: add production operational runbook
Create comprehensive docs/RUNBOOK.md covering all 14 production services,
health checks, 10 common incident scenarios with diagnosis/resolution,
recovery procedures (DB restore, Redis flush, rolling restart, rollback),
escalation matrix, monitoring dashboards, and PromQL queries.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-11 00:28:02 +07:00
Ho Ngoc Hai
18b5980f29 docs: consolidate and update project documentation
- Fix port numbers across all docs (API :3001, Web :3000)
- Add 6 missing modules to README, CLAUDE.md, and architecture doc
  (agents, health, inquiries, leads, reviews, metrics/web-vitals)
- Add Swagger UI reference and /api/v1 prefix notes
- Create docs/api-endpoints.md with complete REST API reference
- Create docs/README.md as documentation index
- Update deployment guide with Loki, Promtail, pg-backup services
- Update health check table with all current endpoints

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-10 23:32:00 +07:00
Ho Ngoc Hai
59272e9321 chore(docs): consolidate 22 audit files from root into docs/audits/
Root directory had accumulated audit/exploration markdown files cluttering
the project root. Moved all audit-related files to docs/audits/ with a
README.md index, and updated cross-references in K6_LOAD_TESTING_GUIDE.md
and README_FRONTEND_DOCS.md.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-10 23:16:00 +07:00
Ho Ngoc Hai
f5ef9d8c86 docs: add comprehensive API error codes reference for frontend consumption
Document all 33 structured errorCode values from DomainException/ErrorCode
enum across all modules (auth, user, listing, property, media, payment,
subscription, course). Includes HTTP status mapping, Vietnamese error
messages, usage examples per module, alphabetical quick-reference table,
and TypeScript integration guide for frontend error handling.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-10 20:11:12 +07:00
Ho Ngoc Hai
6a40ab4555 docs: comprehensive backup & DR documentation
- Added RTO/RPO targets (RPO ≤24h, RTO ≤30m)
- Added Redis backup/restore procedures (volume + BGSAVE)
- Added Typesense backup/restore + rebuild from source
- Added DR runbook: DB failure, service crash, host failure, data corruption
- Restructured doc with clear sections per service

Ref: TEC-1572

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-09 08:41:40 +07:00
Ho Ngoc Hai
03231271ca fix(security): remove MinIO hardcoded credentials & add presigned URL support
- Remove hardcoded minioadmin/minioadmin_secret fallback from docker-compose.yml,
  require MINIO_ACCESS_KEY/MINIO_SECRET_KEY env vars (fail-fast with :? syntax)
- Align docker-compose.yml env var names with .env.example (MINIO_ACCESS_KEY/SECRET_KEY)
- Update CI e2e workflow to use GitHub vars with non-default fallbacks
- Update .env.test to use non-default test credentials
- Add @aws-sdk/s3-request-presigner and getPresignedUploadUrl() method to
  MinioMediaStorageService for properly signed client-side uploads
- Remove hardcoded credentials from dev-environment docs

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-08 22:44:50 +07:00
Ho Ngoc Hai
ea10c28539 docs: add architecture, deployment guides and update dev environment docs
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-08 05:03:29 +07:00
Ho Ngoc Hai
775eb7b374 feat(ops): add database backup strategy and log aggregation stack
- Add pg-backup container with daily automated pg_dump (02:00 UTC) and 7-day retention
- Add backup/restore scripts with documented recovery procedure
- Add Loki + Promtail for centralized log aggregation from all Docker containers
- Add Loki as Grafana datasource with correlation ID derived fields
- Add Grafana logs dashboard with volume, error rate, HTTP request, and log viewer panels
- Configure Promtail to parse Pino structured JSON logs with level/context labels
- Enhance LoggerService with string-level formatter and service base field
- Configure 15-day log retention in Loki

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-08 04:04:32 +07:00
Ho Ngoc Hai
83d55de65b feat: add ESLint flat config, Prettier, dependency-cruiser, and Husky
Setup code quality tooling for the monorepo:
- ESLint 9 flat config with TypeScript, import ordering, and NestJS rules
- Prettier with consistent formatting across all files
- dependency-cruiser enforcing module boundary rules (no cross-module internals, no circular deps)
- Husky + lint-staged for pre-commit hooks
- Auto-fixed existing files for type imports and import ordering

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-07 23:57:28 +07:00
Ho Ngoc Hai
e1e5fa6252 feat: scaffold monorepo with Turborepo + NestJS + Next.js
- Turborepo monorepo with pnpm workspaces
- apps/api: NestJS 11.x with CQRS module
- apps/web: Next.js 14 App Router + TailwindCSS
- src/modules/shared: base entities, Result pattern, value objects
- TypeScript 5.7+ strict mode, shared tsconfig base
- Build pipeline: dev, build, lint, test, typecheck

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-07 23:52:33 +07:00