Commit Graph

18 Commits

Author SHA1 Message Date
Ho Ngoc Hai
b5b717ed4b fix(k8s): add redis label to replication network policy for sentinel
All checks were successful
Build & Deploy to K8s / build-and-deploy (push) Successful in 56s
Redis StatefulSet pod uses label app=redis but allow-redis-replication
only listed redis-master/redis-replica/redis-sentinel. Sentinel could
not reach redis-0, causing infinite wait loop and CrashLoopBackOff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 02:43:51 +07:00
Ho Ngoc Hai
0f18d9ad9d ci: trigger rebuild after rabbitmq probe fix on cluster
All checks were successful
Build & Deploy to K8s / build-and-deploy (push) Successful in 42s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 02:29:34 +07:00
Ho Ngoc Hai
b768c9dc31 fix(k8s): sync cluster fixes to source — JWT authority, secrets, Redis config
Some checks failed
Build & Deploy to K8s / build-and-deploy (push) Failing after 35s
1. ConfigMap: Jwt__Authority → http://iam-service:8080 (internal K8s DNS)
   Pods cannot reach external HTTPS for OIDC discovery.
   Token issuer remains https://api.techbi.org via IssuerUri.

2. Secrets: Add IdentityServer__ClientId/Secret for pos-web BFF auth.

3. Redis: Add redis-config.yaml ConfigMap with fixed start scripts.
   - start-redis.sh reads from /tmp (init container copies there)
   - start-sentinel.sh reads from /config (directly mounted)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 02:24:03 +07:00
Ho Ngoc Hai
8a5b25936d fix(auth): add bff-client to IdentityServer + fix pos-web auth
Some checks failed
Build & Deploy to K8s / build-and-deploy (push) Failing after 10m14s
Login was failing because:
1. IdentityServer Config.cs had no 'bff-client' client definition
   (pos-web uses bff-client for BFF authentication pattern)
2. pos-web had no IdentityServer__ClientSecret env var configured
3. Network policy blocked pos-web → iam-service egress

Fixes:
- Add bff-client to Config.Clients (ResourceOwnerPassword grant,
  8h access token, 7d refresh token for POS sessions)
- Add IdentityServer client credentials to pos-web.yaml from secrets
- Add pos-web to allow-inter-service-egress network policy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 22:22:37 +07:00
Ho Ngoc Hai
5ce64b9a1c feat(infra): migrate POS System routing to Traefik v3
Some checks failed
Build & Deploy to K8s / build-and-deploy (push) Failing after 26s
Architecture: Nginx Ingress (TLS) → Traefik (routing) → Services

- Add traefik.yaml: Traefik v3.3 deployment with file provider config
  - 65+ route rules for api.techbi.org (25 backend services)
  - platform.techbi.org → pos-web
  - Middlewares: rate-limit (100/s), retry (3x), compress, secure-headers
  - WebSocket support for SignalR hubs (/hubs/pos, /hubs/kitchen, /hubs/chat)
- Update ingress.yaml: Nginx now proxies POS domains to Traefik ClusterIP
  (Nginx still handles TLS termination via cert-manager/Let's Encrypt)
- Update network-policy.yaml: Add Traefik ingress/egress/DNS policies
- Update deploy.yaml: Add traefik.yaml to CI/CD apply step
- Other services unaffected: Neon-UI, Rancher, Gitea, Harbor, Grafana, MinIO

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:40:12 +07:00
Ho Ngoc Hai
084771bfc5 fix(k8s): add inter-service ingress policy + reduce CPU requests
Some checks failed
Build & Deploy to K8s / build-and-deploy (push) Failing after 33s
Critical fixes applied to staging K8s manifests:

1. NetworkPolicy: Add allow-inter-service-ingress (services can receive
   requests from each other - fixes promotion→wallet health check timeout)
2. NetworkPolicy: Add allow-app-to-neon-egress (explicit DB access rule)
3. NetworkPolicy: Add ingress-nginx namespace to allow-traefik-ingress
4. Resources: Reduce CPU requests 250m→100m (cluster was at 99%)
5. IAM Service: Add signing certificate volume mount (required for
   IdentityServer in non-Development environments)

Without #1, any service calling another service via HTTP would timeout
because default-deny-all blocks all ingress and only egress was allowed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 20:19:03 +07:00
Ho Ngoc Hai
84f21a4d1c feat(deploy): full staging deployment - 1 replica, parallel Kaniko, all 26 services
Some checks failed
Build & Deploy to K8s / build-and-deploy (push) Failing after 6s
- Scale all 26 services from 2→1 replicas (fit 8.4 available cores)
- HPA min 2→1, max 4→2 for staging
- Rewrite Gitea Actions: batch parallel Kaniko builds (5 per batch)
- Secure credentials via secrets (REPO_PASSWORD, HARBOR_*)
- Kaniko clones from Gitea (already mirrored from GitHub)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:44:46 +07:00
Ho Ngoc Hai
31d24c8c4d fix(k8s): switch domains from goodgo.vn to techbi.org
All checks were successful
Build & Deploy to K8s / build-and-deploy (push) Successful in 33s
- api.techbi.org (backend API)
- platform.techbi.org (frontend POS)
- Update JWT Authority, CORS, IdentityServer IssuerUri
- Update TLS secret names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:30:08 +07:00
Ho Ngoc Hai
b885da7cdb fix(cicd): skip namespace apply (already exists) + add patch permission
All checks were successful
Build & Deploy to K8s / build-and-deploy (push) Successful in 32s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:18:02 +07:00
Ho Ngoc Hai
43f0c79478 fix(cicd): use Kaniko Jobs for building Docker images in Gitea Actions
Some checks failed
Build & Deploy to K8s / build-and-deploy (push) Failing after 10s
- Replace docker build with Kaniko Jobs (runner has no Docker daemon)
- Add batch/jobs RBAC for act_runner to create Kaniko Jobs
- Use MinIO ExternalName pointing to existing minio namespace
- Skip build when only K8s configs changed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:15:20 +07:00
Ho Ngoc Hai
48bb30b009 feat(cicd): switch CI/CD from GitHub Actions to Gitea Actions
Some checks failed
Build & Deploy to K8s / build-and-deploy (push) Failing after 15s
- Add .gitea/workflows/deploy.yaml (detect changes → docker build → Harbor push → kubectl deploy)
- Add gitea-sync-cronjob.yaml (GitHub → Gitea mirror sync every 5 min)
- Add act-runner-rbac.yaml (RBAC for act_runner to deploy to staging namespace)
- Add setup-secrets.sh (one-time cluster secret setup script)
- Disable GitHub Actions deploy-staging.yml (CI/CD now via Gitea)

Flow: GitHub push → Gitea sync (5min) → Gitea Actions → Docker build → Harbor → K8s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:03:19 +07:00
Ho Ngoc Hai
966f5412bd feat(k8s): add full K8s staging deployment for all 25 services
- Add 17 new K8s manifests (15 services + RabbitMQ + MinIO)
- Update secrets.yaml with 24 DB URLs for remote PostgreSQL
- Update configmap.yaml with 25 service discovery URLs
- Update ingress.yaml with routes for all services (Nginx + letsencrypt-prod)
- Update network-policy.yaml with all services + RabbitMQ/MinIO policies
- Update deploy-staging.yml CI/CD for all 25 services via Harbor registry
- Fix mkt-* Dockerfiles (add curl, JwtBearer NuGet package)
- Fix membership/ads-billing PendingModelChangesWarning
- Switch DB connections to remote PostgreSQL (212.28.186.239:30992)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:53:09 +07:00
Ho Ngoc Hai
36a0a9c256 feat: Add functional tests for MktZaloService, new contract and load tests, and audit documentation, while removing a legacy infrastructure project and updating service configurations. 2026-03-25 15:00:05 +07:00
Ho Ngoc Hai
7b92332710 fix(devops): resolve 4 P2 DevOps improvements (Wave 3 — TEC-263)
- DEVOPS-W-01: Add oliver006/redis_exporter to docker-compose.yml so
  the existing prometheus.yml scrape job (redis-exporter:9121) resolves
- DEVOPS-W-04: Add redis-sentinel.yaml with Redis Sentinel HA setup
  (1 master StatefulSet + 2 replica StatefulSet + 3 sentinel pods)
  replacing the single-instance SPOF redis.yaml in staging K8s
- DEVOPS-W-05: Add network-policy.yaml with default-deny-all NetworkPolicy
  + explicit allow rules for inter-service, Traefik ingress, Redis access,
  Prometheus scrape, and external egress (Neon PostgreSQL, AMQP)
- DEVOPS-M-01: Add aquasecurity/trivy-action to docker-build.yml to scan
  every built image for CRITICAL/HIGH CVEs; results uploaded to GitHub
  Security tab via SARIF

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-23 09:54:32 +07:00
Ho Ngoc Hai
1d12a7980b feat: add order lifecycle integration tests (29 tests) and staging K8s deployment manifests
Testing (P0-7):
- 29 functional tests for order-service API (create/pay/complete/cancel lifecycle)
- CustomWebApplicationFactory with InMemory DB, mocked wallet/SignalR/tenant
- TestAuthHandler for JWT auth in tests
- Full lifecycle tests: cash flow and online payment flow end-to-end

Staging Deployment (P0-8):
- K8s manifests for 8 MVP services + Redis + POS web (namespace, configmap, secrets)
- Traefik Ingress with path-based routing and TLS via cert-manager
- HPA auto-scaling (2-4 replicas, CPU/memory thresholds)
- deploy-staging.sh script with --dry-run and --service flags
- CI/CD: deploy-staging.yml and docker-build.yml with matrix strategy
- Consistent patterns: port 8080, 3 health probes, RollingUpdate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 13:56:03 +07:00
Ho Ngoc Hai
019c79b898 Refactor auth-service to iam-service and update related configurations
- Renamed auth-service to iam-service across various files for consistency.
- Updated deployment workflows, database migration scripts, and documentation to reflect the service name change.
- Enhanced bilingual documentation for clarity on the new service structure and usage.
- Removed outdated references to auth-service in scripts and configuration files to streamline the project structure.
2025-12-30 21:03:00 +07:00
Ho Ngoc Hai
b104fafa85 Refactor auth-service to iam-service and update related documentation
- Renamed auth-service to iam-service across various files for consistency.
- Updated Dockerfiles, deployment configurations, and documentation to reflect the service name change.
- Enhanced testing commands in documentation to point to the new iam-service.
- Removed outdated auth-service files and configurations to streamline the project structure.
- Improved bilingual documentation for clarity on the new service structure and usage.
2025-12-30 20:54:21 +07:00
Ho Ngoc Hai
4da46b5b8e Sure! Pl 2025-12-27 01:31:10 +07:00