feat: Phase 2 close-out — multi-branch management, production K8s, revenue dashboard UI, responsive POS

Backend:
- Multi-branch shop management: SetDefaultShop, TransferShop commands, GetMerchantShops paginated query
- Shop aggregate: IsDefault field, SetAsDefault/ClearDefault/TransferOwnership behavior methods
- 2 new domain events: ShopSetAsDefaultDomainEvent, ShopTransferredDomainEvent

Frontend:
- Revenue Dashboard (MudChart line/donut/bar, 4 KPI cards, top products table)
- Staff Performance (sortable table, color-coded completion rates, CSV export)
- Customer QR Menu page (/menu/{ShopId}, mobile-first, Vietnamese labels)
- QR Code Generator admin page (batch generate, print-all, per-table QR)
- Responsive POS layout (collapsible sidebar, slide-out order drawer, touch-friendly CSS)
- ResponsiveOrderPanel component (desktop inline / tablet drawer / mobile overlay)

Infrastructure:
- Production K8s manifests: 8 services (3 replicas, 512Mi-1Gi, HPA min3/max10), Redis with persistence
- Production ingress: api.goodgo.vn, cert-manager TLS, rate-limit middleware
- Deploy script: pre-flight checks, dry-run, single-service deploy, rollback support
- CI/CD: deploy-production.yml with environment approval, commit SHA tags
- Prometheus full scrape config (11 targets), docker-compose observability stack
- Production deployment checklist (80+ items)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ho Ngoc Hai
2026-03-06 19:58:40 +07:00
parent a6ea9fa29b
commit 76b5e6afd0
40 changed files with 5582 additions and 165 deletions

@@ -0,0 +1,185 @@
# GoodGo Platform -- Production Deployment Checklist
> Version: 1.0
> Last updated: 2026-03-06
> Owner: DevOps + CTO
> Domain: goodgo.vn (production), admin.goodgo.vn (admin panel)
---
## Pre-Deployment
- [ ] All E2E tests passing on staging (Playwright + functional tests)
- [ ] Security audit completed (rate limiting, input validation, RLS)
- [ ] Database migrations reviewed and tested on staging (EF Core)
- [ ] Secrets rotated (JWT signing keys, DB passwords, API keys, MinIO credentials)
- [ ] SSL/TLS certificates configured (goodgo.vn, api.goodgo.vn, admin.goodgo.vn)
- [ ] DNS records configured (A/CNAME for all subdomains)
- [ ] CDN configured for static assets (Blazor WASM _framework/, images)
- [ ] Backup strategy verified (daily PostgreSQL backups via Neon, point-in-time recovery)
- [ ] Load testing completed on staging (target: 100 concurrent users minimum)
- [ ] Rollback plan reviewed and approved by CTO
---
## Infrastructure
### Kubernetes Cluster (RKE2)
- [ ] K8s cluster provisioned and healthy (minimum 3 nodes)
- [ ] Namespace `production` created
- [ ] Resource limits set per service (512Mi-1Gi mem, 250m-500m CPU)
- [ ] HPA (Horizontal Pod Autoscaler) configured (min 3, max 10 replicas)
- [ ] PersistentVolumeClaims provisioned for MinIO and Redis
- [ ] Ingress + TLS configured via Traefik IngressClass
- [ ] Network policies enforced (service-to-service only, deny external by default)
- [ ] Node affinity / anti-affinity rules for HA (spread pods across nodes)
### External Services
- [ ] Neon PostgreSQL production database provisioned
- [ ] Redis production instance running (persistence enabled, AOF + RDB)
- [ ] RabbitMQ production cluster (mirrored queues, 2+ nodes)
- [ ] MinIO production buckets created with proper access policies
- [ ] Traefik v3 gateway deployed with production TLS config
---
## Services (repeat per service)
> 8 core services: iam, merchant, order, fnb-engine, wallet, catalog, inventory, chat
### Per-Service Checklist
- [ ] Docker image tagged with commit SHA (NEVER use :latest)
- [ ] Image pushed to Docker Hub (goodgo/{service}:{sha})
- [ ] Environment variables set in K8s Secrets (not ConfigMaps for sensitive data)
- [ ] Health checks responding: `/health/live` (liveness), `/health/ready` (readiness)
- [ ] Database migrated (EF Core migrations applied via `dotnet ef database update`)
- [ ] Seed data loaded (if applicable)
- [ ] Connection string pointing to Neon PostgreSQL production
- [ ] Redis connection string configured
- [ ] RabbitMQ connection configured
- [ ] API versioning header `X-Api-Version` tested
- [ ] Logging level set to `Information` (not `Debug`)
- [ ] Serilog structured logging outputting to stdout (for Promtail collection)
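
A minimal sketch of how the image tagging rule above could be enforced before a push. The helper name `image_ref`, the SHA pattern, and the use of the short service names from the list of 8 core services are assumptions for illustration; they are not taken from the GoodGo deploy script.

```python
import re

# Assumption: image names use the short service names listed in this checklist.
SERVICES = {
    "iam", "merchant", "order", "fnb-engine",
    "wallet", "catalog", "inventory", "chat",
}

# Assumption: a tag is an abbreviated or full lowercase hex commit SHA.
SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def image_ref(service: str, sha: str) -> str:
    """Build goodgo/{service}:{sha}; reject unknown services and non-SHA tags."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    if not SHA_RE.fullmatch(sha):
        # Rejects ':latest' and any other mutable, human-chosen tag.
        raise ValueError(f"tag must be a commit SHA, got: {sha!r}")
    return f"goodgo/{service}:{sha}"
```

For example, `image_ref("order", "76b5e6afd0")` yields `goodgo/order:76b5e6afd0`, while `image_ref("order", "latest")` raises `ValueError`.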
### Service-Specific
| Service | Extra Checks |
|---------|-------------|
| iam-service | JWT signing key (RS256) deployed, OIDC discovery endpoint live, MFA configured |
| merchant-service | Subscription plans seeded, shop lifecycle tested |
| order-service | SignalR PosHub accessible, Redis backplane connected, MessagePack configured |
| fnb-engine | Kitchen ticket flow tested, inventory deduction verified |
| wallet-service | VNPay production credentials configured, IPN callback URL registered |
| catalog-service | Product categories seeded |
| inventory-service | Reorder level alerts configured |
| chat-service | SignalR hub accessible, Redis backplane connected |
---
## Monitoring
- [ ] Prometheus deployed and scraping all 8 services on `/metrics`
- [ ] Grafana deployed with GoodGo Overview dashboard loaded
- [ ] Alert rules active in Prometheus (service down, high error rate, high latency, DB pool, disk, memory)
- [ ] Alert notifications configured (Slack channel #goodgo-alerts and/or PagerDuty)
- [ ] Loki deployed and receiving logs from all containers via Promtail
- [ ] Structured logging (Serilog JSON) verified in Loki queries
- [ ] Grafana Loki datasource configured and queryable
- [ ] Dashboard access restricted (admin credentials changed from defaults)
---
## Security
### Authentication & Authorization
- [ ] JWT signing key rotated from staging key (RS256 key pair)
- [ ] OIDC discovery endpoint (/.well-known/openid-configuration) returns production issuer
- [ ] Token expiry configured (access: 15min, refresh: 7 days)
- [ ] RBAC policies verified (Admin, Owner, Staff, Customer roles)
### Network & Transport
- [ ] CORS configured (allow only goodgo.vn, admin.goodgo.vn origins)
- [ ] HTTPS enforced (HTTP -> HTTPS redirect via Traefik middleware)
- [ ] Security headers configured via Traefik middleware:
  - `Strict-Transport-Security: max-age=63072000; includeSubDomains; preload`
  - `Content-Security-Policy: default-src 'self'`
  - `X-Frame-Options: DENY`
  - `X-Content-Type-Options: nosniff`
  - `Referrer-Policy: strict-origin-when-cross-origin`
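
One way to verify the header list above is to compare a captured response against the expected values. This is a sketch only: `header_problems` is a hypothetical helper, and fetching the response headers (for example from `https://api.goodgo.vn`) is left to whatever HTTP client the team already uses.

```python
# Expected values copied from the checklist items above.
EXPECTED_HEADERS = {
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload",
    "Content-Security-Policy": "default-src 'self'",
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "strict-origin-when-cross-origin",
}


def header_problems(response_headers: dict) -> list:
    """Return a list of missing or mismatched security headers."""
    # HTTP header names are case-insensitive, so compare on lowercase names.
    actual = {name.lower(): value.strip() for name, value in response_headers.items()}
    problems = []
    for name, want in EXPECTED_HEADERS.items():
        got = actual.get(name.lower())
        if got is None:
            problems.append(f"missing: {name}")
        elif got != want:
            problems.append(f"mismatch: {name}: {got!r}")
    return problems
```

An empty list means every header is present with the exact expected value; anything else names the header to fix in the Traefik middleware.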
### Rate Limiting
- [ ] Auth endpoints: 10 requests/min (brute force protection)
- [ ] Payment endpoints: 30 requests/min
- [ ] General API: 100 requests/min
- [ ] SignalR hub: 500 requests/min
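
To make the limits above concrete, here is a small fixed-window counter using the same numbers. It is an illustration of the policy, not a description of how the production rate-limit middleware works; the class name, the category keys, and the idea of keying by a client identifier are all assumptions.

```python
import time
from collections import defaultdict

# Requests per minute, taken from the checklist; the category keys are hypothetical.
LIMITS_PER_MIN = {"auth": 10, "payment": 30, "api": 100, "signalr": 500}


class FixedWindowLimiter:
    """Counts requests per (client, category) in fixed 60-second windows."""

    def __init__(self, limits=LIMITS_PER_MIN, window_s=60, clock=time.monotonic):
        self.limits = limits
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        # Note: old windows are never evicted in this sketch.
        self.counts = defaultdict(int)

    def allow(self, client: str, category: str) -> bool:
        """Return True and count the request if the client is under its limit."""
        window = int(self.clock() // self.window_s)
        key = (client, category, window)
        if self.counts[key] >= self.limits[category]:
            return False
        self.counts[key] += 1
        return True
```

With this sketch, the eleventh `allow("203.0.113.7", "auth")` call inside one window returns `False`, and the counter resets when the next window starts.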
### Data Protection
- [ ] Row-Level Security (RLS) policies applied on all tenant databases
- [ ] Database user has minimal required permissions (no SUPERUSER)
- [ ] MinIO buckets have proper ACLs (private by default, signed URLs for access)
- [ ] No secrets exposed as plain-text environment variables visible via `kubectl describe` (use Secrets, not ConfigMaps)
- [ ] Sensitive fields excluded from Serilog logging (passwords, tokens, card numbers)
---
## Rollback Plan
- [ ] Previous Docker images retained in Docker Hub (at least 5 recent tags)
- [ ] Database rollback migration scripts prepared and tested
- [ ] Feature flags configured for new features (can disable without redeploy)
- [ ] Canary deployment strategy documented:
  1. Deploy to 1 replica first
  2. Monitor error rate for 10 minutes
  3. If error rate < 1%, proceed to full rollout
  4. If error rate > 5%, auto-rollback via K8s rollout undo
- [ ] `kubectl rollout undo` command documented per service
- [ ] Communication plan for downtime (status page, Slack notification)
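
The canary thresholds above can be expressed as a small decision function. This is a sketch under one explicit assumption: the checklist does not say what happens between 1% and 5%, so the code returns `"hold"` (keep observing, escalate to a human) for that band. The function name and return values are hypothetical.

```python
def canary_decision(errors: int, requests: int) -> str:
    """Map the canary's 10-minute error counts to promote / rollback / hold."""
    if requests <= 0:
        # No traffic observed: there is nothing to base a decision on.
        return "hold"
    # Integer comparisons avoid floating-point edge cases at the thresholds.
    if errors * 100 < requests:        # error rate < 1%
        return "promote"
    if errors * 100 > 5 * requests:    # error rate > 5%
        return "rollback"
    # Assumption: the 1%-5% band is not covered by the checklist.
    return "hold"
```

A `"rollback"` result corresponds to running `kubectl rollout undo` for the affected service; a `"promote"` result corresponds to proceeding with the full rollout.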
---
## Post-Deployment Verification
### Smoke Tests (within 30 minutes)
- [ ] IAM: Login flow works (email + password)
- [ ] IAM: Token refresh works
- [ ] IAM: MFA enrollment works
- [ ] Merchant: Shop creation works
- [ ] Order: Create order -> add items -> submit
- [ ] Order: Pay order (cash flow)
- [ ] FnB: Kitchen ticket appears on KDS
- [ ] Wallet: VNPay payment redirect works (switched from sandbox to production credentials)
- [ ] Catalog: Product listing loads
- [ ] Inventory: Stock levels queryable
- [ ] Chat: SignalR connection established
- [ ] Storage: File upload + signed URL access
### Functional Verification (within 2 hours)
- [ ] Full Karaoke POS workflow (room select -> order -> pay -> close)
- [ ] Full Restaurant POS workflow (table -> order -> kitchen -> serve -> pay)
- [ ] QR code menu accessible from customer phone
- [ ] EOD report generates correctly with real data
- [ ] Multi-browser sessions work (concurrent POS users on the same shop)
### Monitoring Verification (within 24 hours)
- [ ] Monitor error rates (target: < 0.1% 5xx)
- [ ] Monitor p95 latency (target: < 500ms)
- [ ] Monitor SignalR connection stability (no unexpected disconnects)
- [ ] Verify Grafana dashboards show live data
- [ ] Verify alert rules fire correctly (test with synthetic failure if needed)
- [ ] Review Loki logs for any unhandled exceptions
- [ ] Verify PostgreSQL connection pool utilization is healthy (< 50%)
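
The numeric targets above (5xx rate, p95 latency, connection pool utilization) can be checked mechanically once samples are exported. The sketch below assumes the samples are already available as plain Python lists; the function names and the nearest-rank percentile method are choices made for illustration, and Prometheus or Grafana may compute p95 differently.

```python
def p95(latencies_ms: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def failed_targets(status_codes, latencies_ms, pool_in_use, pool_size) -> list:
    """Return the names of the checklist targets that are not met."""
    if not status_codes:
        raise ValueError("no request samples")
    failures = []
    errors_5xx = sum(1 for code in status_codes if 500 <= code <= 599)
    if errors_5xx * 1000 >= len(status_codes):   # target: < 0.1% 5xx
        failures.append("5xx error rate")
    if p95(latencies_ms) >= 500:                 # target: p95 < 500 ms
        failures.append("p95 latency")
    if pool_in_use * 100 >= 50 * pool_size:      # target: < 50% pool utilization
        failures.append("connection pool utilization")
    return failures
```

An empty result means all three targets are met for the sampled window.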
---
## Sign-Off
| Role | Name | Date | Approved |
|------|------|------|:--------:|
| CTO | | | [ ] |
| Tech Lead | | | [ ] |
| DevOps Lead | | | [ ] |
| QA Lead | | | [ ] |
---
*This checklist must be completed and signed off before production traffic is routed to the new deployment.*