Backend:
- Multi-branch shop management: SetDefaultShop, TransferShop commands, GetMerchantShops paginated query
- Shop aggregate: IsDefault field, SetAsDefault/ClearDefault/TransferOwnership behavior methods
- 2 new domain events: ShopSetAsDefaultDomainEvent, ShopTransferredDomainEvent
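The commit describes `SetDefaultShop` and `TransferShop` commands acting on a Shop aggregate with an `IsDefault` field and `SetAsDefault`/`ClearDefault`/`TransferOwnership` methods. The .NET source is not part of this page. The following is only a Python sketch of the invariant those methods appear to protect: at most one default shop per merchant. Several details are hypothetical:

- the snake_case names
- the event fields
- the `set_default_shop` handler
- the rule that a transferred shop loses its default flag

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ShopSetAsDefaultDomainEvent:
    shop_id: str
    merchant_id: str


@dataclass(frozen=True)
class ShopTransferredDomainEvent:
    shop_id: str
    from_merchant_id: str
    to_merchant_id: str


@dataclass
class Shop:
    shop_id: str
    merchant_id: str
    is_default: bool = False
    domain_events: List[object] = field(default_factory=list)

    def set_as_default(self) -> None:
        # Raise the event only when the flag actually changes.
        if not self.is_default:
            self.is_default = True
            self.domain_events.append(
                ShopSetAsDefaultDomainEvent(self.shop_id, self.merchant_id))

    def clear_default(self) -> None:
        self.is_default = False

    def transfer_ownership(self, new_merchant_id: str) -> None:
        if new_merchant_id == self.merchant_id:
            raise ValueError("shop already belongs to this merchant")
        previous_owner = self.merchant_id
        self.merchant_id = new_merchant_id
        # Assumption: a transferred shop never arrives as the new owner's default.
        self.is_default = False
        self.domain_events.append(
            ShopTransferredDomainEvent(self.shop_id, previous_owner, new_merchant_id))


def set_default_shop(shops: List[Shop], shop_id: str) -> None:
    """Hypothetical SetDefaultShop handler: keep one default shop per merchant."""
    target: Optional[Shop] = next((s for s in shops if s.shop_id == shop_id), None)
    if target is None:
        raise KeyError(shop_id)
    # Clear the flag on the merchant's other shops before setting the new default.
    for shop in shops:
        if shop.merchant_id == target.merchant_id and shop is not target:
            shop.clear_default()
    target.set_as_default()
```

Clearing siblings inside the handler keeps the aggregate methods simple. A database-level partial unique index would be the usual backstop against concurrent requests.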
Frontend:
- Revenue Dashboard (MudChart line/donut/bar, 4 KPI cards, top products table)
- Staff Performance (sortable table, color-coded completion rates, CSV export)
- Customer QR Menu page (/menu/{ShopId}, mobile-first, Vietnamese labels)
- QR Code Generator admin page (batch generate, print-all, per-table QR)
- Responsive POS layout (collapsible sidebar, slide-out order drawer, touch-friendly CSS)
- ResponsiveOrderPanel component (desktop inline / tablet drawer / mobile overlay)
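The `ResponsiveOrderPanel` line names three presentations: desktop inline, tablet drawer and mobile overlay. The selection logic can be sketched as a width check. The pixel breakpoints below are assumptions for illustration, not values taken from the component:

```python
from enum import Enum


class PanelMode(Enum):
    INLINE = "inline"    # desktop: order panel docked beside the product grid
    DRAWER = "drawer"    # tablet: slide-out drawer
    OVERLAY = "overlay"  # mobile: full-screen overlay


# Assumed breakpoints in CSS pixels (hypothetical, illustrative only).
TABLET_MIN_WIDTH = 600
DESKTOP_MIN_WIDTH = 1024


def panel_mode(viewport_width: int) -> PanelMode:
    """Pick how the order panel is rendered for a given viewport width."""
    # Check the widest layout first so each threshold is a lower bound.
    if viewport_width >= DESKTOP_MIN_WIDTH:
        return PanelMode.INLINE
    if viewport_width >= TABLET_MIN_WIDTH:
        return PanelMode.DRAWER
    return PanelMode.OVERLAY
```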
Infrastructure:
- Production K8s manifests: 8 services (3 replicas, 512Mi-1Gi, HPA min3/max10), Redis with persistence
- Production ingress: api.goodgo.vn, cert-manager TLS, rate-limit middleware
- Deploy script: pre-flight checks, dry-run, single-service deploy, rollback support
- CI/CD: deploy-production.yml with environment approval, commit SHA tags
- Prometheus full scrape config (11 targets), docker-compose observability stack
- Production deployment checklist (80+ items)
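The deploy script supports single-service deploys, dry runs and rollback. The shape of those operations can be sketched as the `kubectl` commands such a script might issue. This is a hypothetical Python helper, not the actual script. It assumes:

- the Deployment and its container are both named after the service (e.g. `order`);
- everything lives in a `production` namespace;
- images follow `goodgo/{service}:{sha}`.

```python
import re
from typing import List

SERVICES = ["iam", "merchant", "order", "fnb-engine", "wallet",
            "catalog", "inventory", "chat"]
NAMESPACE = "production"  # assumption


def deploy_commands(service: str, sha: str, dry_run: bool = False) -> List[List[str]]:
    """Build (but do not run) kubectl commands for a single-service deploy."""
    # Pre-flight: only known services and commit-SHA tags are accepted.
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service}")
    if not re.fullmatch(r"[0-9a-f]{7,40}", sha):
        raise ValueError(f"not a commit SHA: {sha}")
    image = f"goodgo/{service}:{sha}"
    set_image = ["kubectl", "set", "image", f"deployment/{service}",
                 f"{service}={image}", "-n", NAMESPACE]
    if dry_run:
        # Server-side dry run: validated by the API server, nothing is persisted.
        return [set_image + ["--dry-run=server"]]
    # Block until the new ReplicaSet is rolled out (or the timeout expires).
    wait = ["kubectl", "rollout", "status", f"deployment/{service}",
            "-n", NAMESPACE, "--timeout=300s"]
    return [set_image, wait]


def rollback_command(service: str) -> List[str]:
    """Revert the Deployment to its previous revision."""
    return ["kubectl", "rollout", "undo", f"deployment/{service}", "-n", NAMESPACE]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the dry-run path trivially testable.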
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GoodGo Platform -- Production Deployment Checklist
Version: 1.0
Last updated: 2026-03-06
Owner: DevOps + CTO
Domain: goodgo.vn (production), admin.goodgo.vn (admin panel)
Pre-Deployment
- All E2E tests passing on staging (Playwright + functional tests)
- Security audit completed (rate limiting, input validation, RLS)
- Database migrations reviewed and tested on staging (EF Core)
- Secrets rotated (JWT signing keys, DB passwords, API keys, MinIO credentials)
- SSL/TLS certificates configured (goodgo.vn, api.goodgo.vn, admin.goodgo.vn)
- DNS records configured (A/CNAME for all subdomains)
- CDN configured for static assets (Blazor WASM _framework/, images)
- Backup strategy verified (daily PostgreSQL backups via Neon, point-in-time recovery)
- Load testing completed on staging (target: 100 concurrent users minimum)
- Rollback plan reviewed and approved by CTO
Infrastructure
Kubernetes Cluster (RKE2)
- K8s cluster provisioned and healthy (minimum 3 nodes)
- Namespace `production` created
- Resource limits set per service (256Mi-512Mi mem, 250m-500m CPU)
- HPA (Horizontal Pod Autoscaler) configured (min 2, max 10 replicas)
- PersistentVolumeClaims provisioned for MinIO and Redis
- Ingress + TLS configured via Traefik IngressClass
- Network policies enforced (service-to-service only, deny external by default)
- Node affinity / anti-affinity rules for HA (spread pods across nodes)
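The HPA is configured with min 2 and max 10 replicas. How the controller moves between those bounds follows the upstream Kubernetes rule: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). No change is made when the ratio is within a tolerance (0.1 by default), and the result is clamped to the configured range.

The Python below is a simplified sketch of that rule for one metric. The real controller also handles unready pods, missing metrics and a scale-down stabilization window (300 seconds by default), which are omitted here. The default bounds follow this checklist.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 2,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Approximate the HPA scaling decision for a single resource metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance: the controller leaves the replica count alone.
        proposed = current_replicas
    else:
        proposed = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured min/max bounds.
    return max(min_replicas, min(max_replicas, proposed))
```

For example, 2 pods at 90% CPU against a 60% target scale to 3. Sustained load that would call for 32 pods stops at the cap of 10, so the `max` value doubles as a cost ceiling.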
External Services
- Neon PostgreSQL production database provisioned
- Redis production instance running (persistence enabled, AOF + RDB)
- RabbitMQ production cluster (mirrored queues, 2+ nodes)
- MinIO production buckets created with proper access policies
- Traefik v3 gateway deployed with production TLS config
Services (repeat per service)
8 core services: iam, merchant, order, fnb-engine, wallet, catalog, inventory, chat
Per-Service Checklist
- Docker image tagged with commit SHA (NEVER use :latest)
- Image pushed to Docker Hub (goodgo/{service}:{sha})
- Environment variables set in K8s Secrets (not ConfigMaps for sensitive data)
- Health checks responding: `/health/live` (liveness), `/health/ready` (readiness)
- Database migrated (EF Core migrations applied via `dotnet ef database update`)
- Seed data loaded (if applicable)
- Connection string pointing to Neon PostgreSQL production
- Redis connection string configured
- RabbitMQ connection configured
- API versioning header `X-Api-Version` tested
- Logging level set to `Information` (not `Debug`)
- Serilog structured logging outputting to stdout (for Promtail collection)
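Because the checklist forbids `:latest` and expects `goodgo/{service}:{sha}`, a pipeline can reject any other image reference before anything is applied. The following is a minimal Python sketch. It assumes the tag is a lowercase hex git SHA of 7 to 40 characters, and that length range is itself an assumption.

```python
import re

# goodgo/{service}:{sha}; the 7-40 hex-character SHA length is an assumption.
IMAGE_RE = re.compile(r"^goodgo/(?P<service>[a-z0-9-]+):(?P<tag>[0-9a-f]{7,40})$")


def check_image_ref(image: str, expected_service: str) -> str:
    """Pre-flight check: return the SHA tag, or raise if the reference is unsafe."""
    # An untagged reference implicitly resolves to :latest.
    if ":" not in image or image.endswith(":latest"):
        raise ValueError(f"{image}: mutable or missing tag")
    match = IMAGE_RE.match(image)
    if match is None:
        raise ValueError(f"{image}: tag is not a commit SHA")
    if match.group("service") != expected_service:
        raise ValueError(f"{image}: built for {match.group('service')}, not {expected_service}")
    return match.group("tag")
```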
Service-Specific
| Service | Extra Checks |
|---|---|
| iam-service | JWT signing key (RS256) deployed, OIDC discovery endpoint live, MFA configured |
| merchant-service | Subscription plans seeded, shop lifecycle tested |
| order-service | SignalR PosHub accessible, Redis backplane connected, MessagePack configured |
| fnb-engine | Kitchen ticket flow tested, inventory deduction verified |
| wallet-service | VNPay production credentials configured, IPN callback URL registered |
| catalog-service | Product categories seeded |
| inventory-service | Reorder level alerts configured |
| chat-service | SignalR hub accessible, Redis backplane connected |
Monitoring
- Prometheus deployed and scraping all 8 services on `/metrics`
- Grafana deployed with GoodGo Overview dashboard loaded
- Alert rules active in Prometheus (service down, high error rate, high latency, DB pool, disk, memory)
- Alert notifications configured (Slack channel #goodgo-alerts and/or PagerDuty)
- Loki deployed and receiving logs from all containers via Promtail
- Structured logging (Serilog JSON) verified in Loki queries
- Grafana Loki datasource configured and queryable
- Dashboard access restricted (admin credentials changed from defaults)
Security
Authentication & Authorization
- JWT signing key rotated from staging key (RS256 key pair)
- OIDC discovery endpoint (/.well-known/openid-configuration) returns production issuer
- Token expiry configured (access: 15min, refresh: 7 days)
- RBAC policies verified (Admin, Owner, Staff, Customer roles)
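The token lifetimes above translate into JWT `exp` claims of issue time + 900 s (access) and issue time + 604,800 s (refresh). Under RFC 7519, a token must not be accepted on or after its `exp`. The following Python sketch illustrates the arithmetic. The function names and the optional leeway parameter are illustrative, not the IAM service's API.

```python
from datetime import datetime, timedelta, timezone

ACCESS_TOKEN_TTL = timedelta(minutes=15)
REFRESH_TOKEN_TTL = timedelta(days=7)


def token_expiries(issued_at: datetime) -> dict:
    """Return JWT-style `exp` values (Unix seconds) for an access/refresh pair."""
    return {
        "access_exp": int((issued_at + ACCESS_TOKEN_TTL).timestamp()),
        "refresh_exp": int((issued_at + REFRESH_TOKEN_TTL).timestamp()),
    }


def is_expired(exp: int, now: datetime, leeway_seconds: int = 0) -> bool:
    """RFC 7519: reject the token on or after `exp` (plus optional clock-skew leeway)."""
    return now.timestamp() >= exp + leeway_seconds
```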
Network & Transport
- CORS configured (allow only goodgo.vn, admin.goodgo.vn origins)
- HTTPS enforced (HTTP -> HTTPS redirect via Traefik middleware)
- Security headers configured via Traefik middleware:
  - `Strict-Transport-Security: max-age=63072000; includeSubDomains; preload`
  - `Content-Security-Policy: default-src 'self'`
  - `X-Frame-Options: DENY`
  - `X-Content-Type-Options: nosniff`
  - `Referrer-Policy: strict-origin-when-cross-origin`
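Those header values can be verified against a live response rather than by reading the Traefik config. Note that `max-age=63072000` is 730 days, or two years. The following is a minimal Python sketch that compares a response's headers with the expected policy. Header names are matched case-insensitively, as HTTP requires.

```python
from typing import Dict, List

EXPECTED_HEADERS = {
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload",
    "Content-Security-Policy": "default-src 'self'",
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "strict-origin-when-cross-origin",
}


def security_header_problems(response_headers: Dict[str, str]) -> List[str]:
    """List every expected header that is missing or has a different value."""
    normalized = {name.lower(): value.strip() for name, value in response_headers.items()}
    problems = []
    for name, expected in EXPECTED_HEADERS.items():
        actual = normalized.get(name.lower())
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, got {actual!r}")
    return problems
```

The input dict could come from `dict(urllib.request.urlopen(url).headers.items())` against any production endpoint. An empty result means the middleware is applied as specified.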
Rate Limiting
- Auth endpoints: 10 requests/min (brute force protection)
- Payment endpoints: 30 requests/min
- General API: 100 requests/min
- SignalR hub: 500 requests/min
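In production these limits are enforced by the ingress rate-limit middleware. To reason about how a per-minute budget behaves under bursts, here is a standalone token-bucket sketch in Python. It makes three assumptions:

- the category names;
- keying by client ID;
- a burst capacity equal to the per-minute limit.

It is not Traefik's implementation.

```python
import time
from typing import Callable, Dict, Tuple

# Per-minute budgets from the checklist; the category keys are hypothetical.
LIMITS_PER_MINUTE = {"auth": 10, "payment": 30, "api": 100, "signalr": 500}


class TokenBucketLimiter:
    """One bucket per (client, category), refilled at limit/60 tokens per second."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def allow(self, client_id: str, category: str) -> bool:
        capacity = float(LIMITS_PER_MINUTE[category])
        rate = capacity / 60.0
        now = self._clock()
        key = (client_id, category)
        # A new client starts with a full bucket.
        tokens, last = self._buckets.get(key, (capacity, now))
        # Refill for the elapsed time, never above the bucket capacity.
        tokens = min(capacity, tokens + (now - last) * rate)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True
        self._buckets[key] = (tokens, now)
        return False
```

With this design an attacker on the auth endpoint gets 10 immediate attempts and then roughly one every 6 seconds. The in-memory dict would need eviction, or a shared store such as Redis, in a multi-replica deployment.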
Data Protection
- Row-Level Security (RLS) policies enabled on all tenant-scoped tables
- Database user has minimal required permissions (no SUPERUSER or BYPASSRLS, since either attribute bypasses RLS)
- MinIO buckets have proper access policies (private by default, signed URLs for access)
- No secrets visible via `kubectl describe` (store them in Secrets, not ConfigMaps or literal env values)
- Sensitive fields excluded from Serilog logging (passwords, tokens, card numbers)
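In the .NET services this would typically be a Serilog destructuring policy or enricher. The Python sketch below only illustrates the key-based masking rule itself. The set of sensitive property names is hypothetical and would need to match the real log schema.

```python
from typing import Any

# Hypothetical sensitive property names, compared after normalization.
SENSITIVE_KEYS = {"password", "token", "accesstoken", "refreshtoken", "cardnumber"}
REDACTED = "***"


def _normalize(key: str) -> str:
    # "card_number", "Card-Number" and "CardNumber" all become "cardnumber".
    return key.replace("_", "").replace("-", "").lower()


def redact(value: Any) -> Any:
    """Return a copy of a structured log payload with sensitive properties masked."""
    if isinstance(value, dict):
        return {k: REDACTED if _normalize(str(k)) in SENSITIVE_KEYS else redact(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Key-based masking does not catch secrets interpolated into free-text messages. Those are best prevented by always logging values as structured properties.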
Rollback Plan
- Previous Docker images retained in Docker Hub (at least 5 recent tags)
- Database rollback migration scripts prepared and tested
- Feature flags configured for new features (can disable without redeploy)
- Canary deployment strategy documented:
  - Deploy to 1 replica first
  - Monitor error rate for 10 minutes
  - If error rate < 1%, proceed to full rollout
  - If error rate > 5%, auto-rollback via `kubectl rollout undo`
- `kubectl rollout undo` command documented per service
- Communication plan for downtime (status page, Slack notification)
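The canary thresholds reduce to a small decision function. The checklist does not say what happens between 1% and 5%, so this Python sketch returns `HOLD` for that band as an assumption. It also assumes "error" means a 5xx response counted on the canary replica.

```python
from enum import Enum


class CanaryDecision(Enum):
    WAIT = "wait"          # 10-minute observation window not finished yet
    PROMOTE = "promote"    # error rate < 1%: proceed to full rollout
    ROLLBACK = "rollback"  # error rate > 5%: kubectl rollout undo
    HOLD = "hold"          # 1%-5% or no traffic: keep canary, escalate (assumption)


OBSERVATION_SECONDS = 10 * 60


def canary_decision(errors: int, requests: int, elapsed_seconds: float) -> CanaryDecision:
    """Apply the checklist thresholds to traffic served by the canary replica."""
    if elapsed_seconds < OBSERVATION_SECONDS:
        return CanaryDecision.WAIT
    if requests == 0:
        # No traffic reached the canary, so there is nothing to judge yet.
        return CanaryDecision.HOLD
    error_rate = errors / requests
    if error_rate < 0.01:
        return CanaryDecision.PROMOTE
    if error_rate > 0.05:
        return CanaryDecision.ROLLBACK
    return CanaryDecision.HOLD
```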
Post-Deployment Verification
Smoke Tests (within 30 minutes)
- IAM: Login flow works (email + password)
- IAM: Token refresh works
- IAM: MFA enrollment works
- Merchant: Shop creation works
- Order: Create order -> add items -> submit
- Order: Pay order (cash flow)
- FnB: Kitchen ticket appears on KDS
- Wallet: VNPay payment redirect works against the production endpoint (switched from sandbox)
- Catalog: Product listing loads
- Inventory: Stock levels queryable
- Chat: SignalR connection established
- Storage: File upload + signed URL access
Functional Verification (within 2 hours)
- Full Karaoke POS workflow (room select -> order -> pay -> close)
- Full Restaurant POS workflow (table -> order -> kitchen -> serve -> pay)
- QR code menu accessible from customer phone
- EOD report generates correctly with real data
- Multi-browser session (concurrent POS users on same shop)
Monitoring Verification (within 24 hours)
- Monitor error rates (target: < 0.1% 5xx)
- Monitor p95 latency (target: < 500ms)
- Monitor SignalR connection stability (no unexpected disconnects)
- Verify Grafana dashboards show live data
- Verify alert rules fire correctly (test with synthetic failure if needed)
- Review Loki logs for any unhandled exceptions
- Verify PostgreSQL connection pool utilization is healthy (< 50%)
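In production these figures would normally come from Prometheus, for example `histogram_quantile(0.95, ...)` over a request-duration histogram. For spot-checking a raw sample (say, from a load-test run), the two targets can be computed directly. The following is a Python sketch using the nearest-rank percentile definition.

```python
from typing import Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty latency sample."""
    if not latencies_ms:
        raise ValueError("empty sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n: the 1-based nearest rank, free of float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_targets(latencies_ms: Sequence[float], status_codes: Sequence[int]) -> bool:
    """Checklist targets: p95 < 500 ms and fewer than 0.1% of responses are 5xx."""
    if not status_codes:
        raise ValueError("no responses to evaluate")
    server_errors = sum(1 for code in status_codes if 500 <= code <= 599)
    error_rate = server_errors / len(status_codes)
    return p95(latencies_ms) < 500 and error_rate < 0.001
```

Exactly one 5xx in 1,000 responses is 0.1%, which does not meet the strict "< 0.1%" target.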
Sign-Off
| Role | Name | Date | Approved |
|---|---|---|---|
| CTO | | | [ ] |
| Tech Lead | | | [ ] |
| DevOps Lead | | | [ ] |
| QA Lead | | | [ ] |
This checklist must be completed and signed off before production traffic is routed to the new deployment.