Backend:
- Multi-branch shop management: SetDefaultShop, TransferShop commands, GetMerchantShops paginated query
- Shop aggregate: IsDefault field, SetAsDefault/ClearDefault/TransferOwnership behavior methods
- 2 new domain events: ShopSetAsDefaultDomainEvent, ShopTransferredDomainEvent
Frontend:
- Revenue Dashboard (MudChart line/donut/bar, 4 KPI cards, top products table)
- Staff Performance (sortable table, color-coded completion rates, CSV export)
- Customer QR Menu page (/menu/{ShopId}, mobile-first, Vietnamese labels)
- QR Code Generator admin page (batch generate, print-all, per-table QR)
- Responsive POS layout (collapsible sidebar, slide-out order drawer, touch-friendly CSS)
- ResponsiveOrderPanel component (desktop inline / tablet drawer / mobile overlay)
Infrastructure:
- Production K8s manifests: 8 services (3 replicas, 512Mi-1Gi, HPA min3/max10), Redis with persistence
- Production ingress: api.goodgo.vn, cert-manager TLS, rate-limit middleware
- Deploy script: pre-flight checks, dry-run, single-service deploy, rollback support
- CI/CD: deploy-production.yml with environment approval, commit SHA tags
- Prometheus full scrape config (11 targets), docker-compose observability stack
- Production deployment checklist (80+ items)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# GoodGo Platform -- Production Deployment Checklist

> Version: 1.0
> Last updated: 2026-03-06
> Owner: DevOps + CTO
> Domain: goodgo.vn (production), admin.goodgo.vn (admin panel)

---

## Pre-Deployment

- [ ] All E2E tests passing on staging (Playwright + functional tests)
- [ ] Security audit completed (rate limiting, input validation, RLS)
- [ ] Database migrations reviewed and tested on staging (EF Core)
- [ ] Secrets rotated (JWT signing keys, DB passwords, API keys, MinIO credentials)
- [ ] SSL/TLS certificates configured (goodgo.vn, api.goodgo.vn, admin.goodgo.vn)
- [ ] DNS records configured (A/CNAME for all subdomains)
- [ ] CDN configured for static assets (Blazor WASM _framework/, images)
- [ ] Backup strategy verified (daily PostgreSQL backups via Neon, point-in-time recovery)
- [ ] Load testing completed on staging (target: 100 concurrent users minimum)
- [ ] Rollback plan reviewed and approved by CTO

---

## Infrastructure

### Kubernetes Cluster (RKE2)
- [ ] K8s cluster provisioned and healthy (minimum 3 nodes)
- [ ] Namespace `production` created
- [ ] Resource limits set per service (256Mi-512Mi mem, 250m-500m CPU)
- [ ] HPA (Horizontal Pod Autoscaler) configured (min 2, max 10 replicas)
- [ ] PersistentVolumeClaims provisioned for MinIO and Redis
- [ ] Ingress + TLS configured via Traefik IngressClass
- [ ] Network policies enforced (service-to-service only, deny external by default)
- [ ] Node affinity / anti-affinity rules for HA (spread pods across nodes)

### External Services
- [ ] Neon PostgreSQL production database provisioned
- [ ] Redis production instance running (persistence enabled, AOF + RDB)
- [ ] RabbitMQ production cluster (mirrored queues, 2+ nodes)
- [ ] MinIO production buckets created with proper access policies
- [ ] Traefik v3 gateway deployed with production TLS config

---

## Services (repeat per service)

> 8 core services: iam, merchant, order, fnb-engine, wallet, catalog, inventory, chat

### Per-Service Checklist
- [ ] Docker image tagged with commit SHA (NEVER use :latest)
- [ ] Image pushed to Docker Hub (goodgo/{service}:{sha})
- [ ] Environment variables set in K8s Secrets (not ConfigMaps for sensitive data)
- [ ] Health checks responding: `/health/live` (liveness), `/health/ready` (readiness)
- [ ] Database migrated (EF Core migrations applied via `dotnet ef database update`)
- [ ] Seed data loaded (if applicable)
- [ ] Connection string pointing to Neon PostgreSQL production
- [ ] Redis connection string configured
- [ ] RabbitMQ connection configured
- [ ] API versioning header `X-Api-Version` tested
- [ ] Logging level set to `Information` (not `Debug`)
- [ ] Serilog structured logging outputting to stdout (for Promtail collection)
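
The image-tagging rule in this checklist can be checked mechanically before a deploy. The following is a minimal sketch, assuming image references follow the `goodgo/{service}:{sha}` form given above and that the SHA is a 7 to 40 character lowercase hex string (the length range is an assumption). `is_valid_image_ref` is a hypothetical helper, not part of the deploy script.

```python
import re

# Assumed format: goodgo/<service>:<commit-sha>.
# The 7-40 hex character range for the SHA is an assumption, not a project rule.
IMAGE_REF = re.compile(r"goodgo/(?P<service>[a-z0-9-]+):(?P<tag>[0-9a-f]{7,40})")


def is_valid_image_ref(ref: str) -> bool:
    """Return True only for goodgo/<service>:<sha> references.

    Rejects ':latest', untagged images, and images from other repositories.
    """
    return IMAGE_REF.fullmatch(ref) is not None
```

A pre-flight step could run this over every image in the rendered manifests and abort on the first reference that fails.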

### Service-Specific

| Service | Extra Checks |
|---------|-------------|
| iam-service | JWT signing key (RS256) deployed, OIDC discovery endpoint live, MFA configured |
| merchant-service | Subscription plans seeded, shop lifecycle tested |
| order-service | SignalR PosHub accessible, Redis backplane connected, MessagePack configured |
| fnb-engine | Kitchen ticket flow tested, inventory deduction verified |
| wallet-service | VNPay production credentials configured, IPN callback URL registered |
| catalog-service | Product categories seeded |
| inventory-service | Reorder level alerts configured |
| chat-service | SignalR hub accessible, Redis backplane connected |

---

## Monitoring

- [ ] Prometheus deployed and scraping all 8 services on `/metrics`
- [ ] Grafana deployed with GoodGo Overview dashboard loaded
- [ ] Alert rules active in Prometheus (service down, high error rate, high latency, DB pool, disk, memory)
- [ ] Alert notifications configured (Slack channel #goodgo-alerts and/or PagerDuty)
- [ ] Loki deployed and receiving logs from all containers via Promtail
- [ ] Structured logging (Serilog JSON) verified in Loki queries
- [ ] Grafana Loki datasource configured and queryable
- [ ] Dashboard access restricted (admin credentials changed from defaults)

---

## Security

### Authentication & Authorization
- [ ] JWT signing key rotated from staging key (RS256 key pair)
- [ ] OIDC discovery endpoint (/.well-known/openid-configuration) returns production issuer
- [ ] Token expiry configured (access: 15min, refresh: 7 days)
- [ ] RBAC policies verified (Admin, Owner, Staff, Customer roles)

### Network & Transport
- [ ] CORS configured (allow only goodgo.vn, admin.goodgo.vn origins)
- [ ] HTTPS enforced (HTTP -> HTTPS redirect via Traefik middleware)
- [ ] Security headers configured via Traefik middleware:
  - `Strict-Transport-Security: max-age=63072000; includeSubDomains; preload`
  - `Content-Security-Policy: default-src 'self'`
  - `X-Frame-Options: DENY`
  - `X-Content-Type-Options: nosniff`
  - `Referrer-Policy: strict-origin-when-cross-origin`
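
The header list above lends itself to an automated check against a captured response. This is a minimal sketch that compares a plain dictionary of response headers with the values listed in this checklist; `missing_security_headers` is a hypothetical helper, and how the headers are fetched (for example from a smoke-test request) is left out.

```python
# Expected values copied from the checklist above.
EXPECTED_HEADERS = {
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload",
    "Content-Security-Policy": "default-src 'self'",
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "strict-origin-when-cross-origin",
}


def missing_security_headers(response_headers: dict[str, str]) -> list[str]:
    """Return the expected header names that are absent or carry a different value."""
    # HTTP header names are case-insensitive, so normalise them before comparing.
    actual = {name.lower(): value.strip() for name, value in response_headers.items()}
    return [
        name
        for name, expected in EXPECTED_HEADERS.items()
        if actual.get(name.lower()) != expected
    ]
```

An empty result means every listed header is present with the exact expected value; the comparison is deliberately strict, so a looser policy would need a different matching rule.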

### Rate Limiting
- [ ] Auth endpoints: 10 requests/min (brute force protection)
- [ ] Payment endpoints: 30 requests/min
- [ ] General API: 100 requests/min
- [ ] SignalR hub: 500 requests/min
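
To make the per-minute limits concrete, here is a minimal fixed-window limiter sketch using the four values above. It is an illustration only and does not describe how the Traefik rate-limit middleware works internally; the category keys (`auth`, `payment`, `api`, `signalr`) and the per-client keying are assumptions.

```python
from collections import defaultdict

# Limits from the checklist, in requests per minute.
# The category names are illustrative, not taken from the gateway config.
LIMITS_PER_MINUTE = {"auth": 10, "payment": 30, "api": 100, "signalr": 500}


class FixedWindowLimiter:
    """Counts requests per (client, category) in fixed 60-second windows."""

    def __init__(self, limits=LIMITS_PER_MINUTE, window_seconds=60):
        self.limits = limits
        self.window_seconds = window_seconds
        # (client, category, window index) -> requests seen in that window.
        # Old windows are never evicted here; a real limiter would expire them.
        self.counts = defaultdict(int)

    def allow(self, client: str, category: str, now: float) -> bool:
        """Return True and record the request if the client is under its limit."""
        window = int(now // self.window_seconds)
        key = (client, category, window)
        if self.counts[key] >= self.limits[category]:
            return False
        self.counts[key] += 1
        return True
```

With these values, an eleventh auth request from the same client inside one window is rejected, and the counter starts over when the next window begins.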

### Data Protection
- [ ] Row-Level Security (RLS) policies applied on all tenant databases
- [ ] Database user has minimal required permissions (no SUPERUSER)
- [ ] MinIO buckets have proper ACLs (private by default, signed URLs for access)
- [ ] No secrets in environment variables visible via K8s describe (use Secrets, not ConfigMaps)
- [ ] Sensitive fields excluded from Serilog logging (passwords, tokens, card numbers)

---

## Rollback Plan

- [ ] Previous Docker images retained in Docker Hub (at least 5 recent tags)
- [ ] Database rollback migration scripts prepared and tested
- [ ] Feature flags configured for new features (can disable without redeploy)
- [ ] Canary deployment strategy documented:
  1. Deploy to 1 replica first
  2. Monitor error rate for 10 minutes
  3. If error rate < 1%, proceed to full rollout
  4. If error rate > 5%, auto-rollback via K8s rollout undo
- [ ] `kubectl rollout undo` command documented per service
- [ ] Communication plan for downtime (status page, Slack notification)
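
The canary thresholds above can be expressed as a small decision function. This is a sketch under stated assumptions: the checklist does not say what happens when the error rate falls between 1% and 5%, so the code returns `"hold"` for that range (and for a window with no traffic) to signal manual review. `canary_decision` and its return values are hypothetical names.

```python
def canary_decision(errors: int, requests: int) -> str:
    """Map the canary's observed error rate to an action.

    Returns "promote" below 1%, "rollback" above 5%, and "hold" otherwise.
    The "hold" outcome is an assumption; the checklist leaves that range undefined.
    """
    if requests == 0:
        # No traffic reached the canary, so there is nothing to judge.
        return "hold"
    rate = errors / requests
    if rate > 0.05:
        return "rollback"
    if rate < 0.01:
        return "promote"
    return "hold"
```

The counts would come from the 10-minute monitoring window in step 2, for example from the Prometheus request metrics for the canary replica.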

---

## Post-Deployment Verification

### Smoke Tests (within 30 minutes)
- [ ] IAM: Login flow works (email + password)
- [ ] IAM: Token refresh works
- [ ] IAM: MFA enrollment works
- [ ] Merchant: Shop creation works
- [ ] Order: Create order -> add items -> submit
- [ ] Order: Pay order (cash flow)
- [ ] FnB: Kitchen ticket appears on KDS
- [ ] Wallet: VNPay payment redirect works (sandbox -> production)
- [ ] Catalog: Product listing loads
- [ ] Inventory: Stock levels queryable
- [ ] Chat: SignalR connection established
- [ ] Storage: File upload + signed URL access

### Functional Verification (within 2 hours)
- [ ] Full Karaoke POS workflow (room select -> order -> pay -> close)
- [ ] Full Restaurant POS workflow (table -> order -> kitchen -> serve -> pay)
- [ ] QR code menu accessible from customer phone
- [ ] EOD report generates correctly with real data
- [ ] Multi-browser session (concurrent POS users on same shop)

### Monitoring Verification (within 24 hours)
- [ ] Monitor error rates (target: < 0.1% 5xx)
- [ ] Monitor p95 latency (target: < 500ms)
- [ ] Monitor SignalR connection stability (no unexpected disconnects)
- [ ] Verify Grafana dashboards show live data
- [ ] Verify alert rules fire correctly (test with synthetic failure if needed)
- [ ] Review Loki logs for any unhandled exceptions
- [ ] Verify PostgreSQL connection pool utilization is healthy (< 50%)
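
The two numeric targets above (5xx rate below 0.1%, p95 latency below 500 ms) can be evaluated from a sample of requests. The sketch below assumes the sample is available as plain lists of status codes and latencies in milliseconds, and it uses the nearest-rank percentile definition, which may differ slightly from what a Prometheus histogram query reports. `p95` and `meets_targets` are hypothetical helpers.

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_targets(status_codes: list[int], latencies_ms: list[float]) -> bool:
    """Check the sample against the checklist targets: < 0.1% 5xx and p95 < 500 ms."""
    server_errors = sum(1 for code in status_codes if 500 <= code <= 599)
    # errors / total < 0.001, written with integers to keep the comparison exact.
    error_rate_ok = server_errors * 1000 < len(status_codes)
    return error_rate_ok and p95(latencies_ms) < 500
```

Both comparisons are strict, matching the "<" in the targets, so a sample sitting exactly at 0.1% errors or exactly 500 ms fails the check.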

---

## Sign-Off

| Role | Name | Date | Approved |
|------|------|------|:--------:|
| CTO | | | [ ] |
| Tech Lead | | | [ ] |
| DevOps Lead | | | [ ] |
| QA Lead | | | [ ] |

---

*This checklist must be completed and signed off before production traffic is routed to the new deployment.*