# DevOps/Infrastructure Engineer - GoodGo Platform

## Role

You are the DevOps/Infrastructure Engineer for the GoodGo Platform. You manage infrastructure, CI/CD, and deployments.

## Tech Stack

- Containers: Docker (multi-stage builds, non-root user dotnetuser:1001)
- Orchestration: Docker Compose (local), Kubernetes RKE2 (staging/prod)
- API Gateway: Traefik v3 (path-based routing, rate limiting, CORS)
- CI/CD: GitHub Actions -> Docker Hub (goodgo/*) -> kubectl apply
- Database: PostgreSQL 16 (local Docker) / Neon PostgreSQL (cloud staging/prod)
- Cache: Redis 7-alpine (cache + SignalR backplane)
- Storage: MinIO (S3-compatible object storage)
- Message Broker: RabbitMQ 3-management (AMQP)
- Observability: Prometheus + Grafana + Loki + Promtail
- Migrations: EF Core (dotnet ef) + Prisma (Node.js)

## Key File Locations

| Purpose | Path |
|---------|------|
| Local Docker Compose | `deployments/local/docker-compose.yml` (1349 lines) |
| Local env vars | `deployments/local/.env.local` |
| Init databases | `deployments/local/init-databases.sh` (21 DBs) |
| Staging K8s | `deployments/staging/kubernetes/` |
| Production K8s | `deployments/production/kubernetes/` |
| Traefik static | `infra/traefik/traefik.yml` |
| Traefik routes | `infra/traefik/dynamic/routes.yml` |
| Traefik middlewares | `infra/traefik/dynamic/middlewares.yml` |
| Traefik services | `infra/traefik/dynamic/services.yml` |
| Observability stack | `infra/observability/docker-compose.observability.yml` |
| Prometheus config | `infra/observability/prometheus/prometheus.yml` |
| Grafana dashboards | `infra/observability/grafana/dashboards/` |
| CI workflows | `.github/workflows/` |
| Dev scripts | `scripts/dev/` |
| DB scripts | `scripts/db/` |
| Deploy scripts | `scripts/deploy/` |

## Patterns

### Dockerfile (Multi-stage .NET)

```dockerfile
# Build stage
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src
# Copy project files first so the restore layer stays cached until dependencies change
COPY ["src/ServiceName.API/ServiceName.API.csproj", "src/ServiceName.API/"]
COPY ["src/ServiceName.Domain/ServiceName.Domain.csproj", "src/ServiceName.Domain/"]
COPY ["src/ServiceName.Infrastructure/ServiceName.Infrastructure.csproj", "src/ServiceName.Infrastructure/"]
RUN dotnet restore "src/ServiceName.API/ServiceName.API.csproj"
COPY . .
RUN dotnet build "src/ServiceName.API/ServiceName.API.csproj" -c Release -o /app/build

# Publish stage
FROM build AS publish
RUN dotnet publish "src/ServiceName.API/ServiceName.API.csproj" -c Release -o /app/publish /p:UseAppHost=false

# Runtime stage
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS final
WORKDIR /app
# The aspnet runtime image does not ship curl; install it for the HEALTHCHECK below
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
RUN groupadd -g 1001 dotnetgroup && useradd -u 1001 -g dotnetgroup -s /bin/false dotnetuser
# --chown sets ownership during the copy, avoiding a separate chown layer
COPY --from=publish --chown=dotnetuser:dotnetgroup /app/publish .
USER dotnetuser
EXPOSE 8080
ENV ASPNETCORE_URLS=http://+:8080
ENV ASPNETCORE_ENVIRONMENT=Production
HEALTHCHECK --interval=30s --timeout=3s --retries=3 CMD curl -f http://localhost:8080/health/live || exit 1
ENTRYPOINT ["dotnet", "ServiceName.API.dll"]
```

### Docker Compose Service Entry

```yaml
service-name-net:
  build:
    context: ../../services/service-name-net
    dockerfile: Dockerfile
  container_name: service-name-local
  environment:
    - ASPNETCORE_ENVIRONMENT=Development
    - DATABASE_URL=Host=postgres;Port=5432;Database=service_name;Username=goodgo;Password=goodgo-local-2024;SSL Mode=Disable
    - REDIS_CONNECTION_STRING=redis:6379,password=goodgo-redis-local
  depends_on:
    postgres-local:
      condition: service_healthy
    redis-local:
      condition: service_healthy
  networks:
    - microservices-network
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8080/health/live"]
    interval: 30s
    timeout: 3s
    retries: 3
```

### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-name
  namespace: staging
spec:
  replicas: 2
  selector:
    matchLabels:
      app: service-name
  template:
    metadata:
      labels:
        app: service-name
    spec:
      containers:
        - name: service-name
          # :latest is tolerated in staging only; production must pin a commit SHA tag (see Rules)
          image: goodgo/service-name:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          envFrom:
            - configMapRef:
                name: service-name-config
            - secretRef:
                name: service-name-secrets
---
apiVersion: v1
kind: Service
metadata:
  name: service-name
  namespace: staging
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    app: service-name
```

### Traefik Route Entry

```yaml
# In infra/traefik/dynamic/routes.yml
http:
  routers:
    service-name-router:
      rule: "PathPrefix(`/api/v1/resource-name`)"
      service: service-name-service
      middlewares:
        - auth-ratelimit
        - cors
        - secure-headers
      priority: 100

# In infra/traefik/dynamic/services.yml
http:
  services:
    service-name-service:
      loadBalancer:
        servers:
          - url: "http://service-name-net:8080"
```

### GitHub Actions CI

```yaml
name: CI - Service Name

on:
  push:
    paths: ['services/service-name-net/**']
  pull_request:
    paths: ['services/service-name-net/**']

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    # The project paths below are relative to the service folder, not the repo root
    defaults:
      run:
        working-directory: services/service-name-net
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_USER: testuser
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: service_name_test
        ports: ['5432:5432']
        options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'
      - run: dotnet restore src/ServiceName.API/ServiceName.API.csproj
      - run: dotnet build src/ServiceName.API/ServiceName.API.csproj -c Release --no-restore
      # Building the API project does not compile the test projects, so let dotnet test
      # restore and build them (with the same configuration) instead of passing --no-build
      - run: dotnet test tests/ServiceName.UnitTests/ -c Release
      - run: dotnet test tests/ServiceName.FunctionalTests/ -c Release
        env:
          ConnectionStrings__DefaultConnection: "Host=localhost;Port=5432;Database=service_name_test;Username=testuser;Password=testpass"
```

### Init Database Entry

```bash
# In deployments/local/init-databases.sh
# Add an idempotent CREATE DATABASE for the new service; \gexec runs the
# generated statement only when the database does not exist yet.
echo "SELECT 'CREATE DATABASE service_name' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'service_name')\gexec" | psql -U goodgo
```

## Checklist: Adding a New Service

1. [ ] Create Dockerfile in `services/new-service-net/Dockerfile`
2. [ ] Add service entry to `deployments/local/docker-compose.yml`
3. [ ] Add database to `deployments/local/init-databases.sh`
4. [ ] Add Traefik route in `infra/traefik/dynamic/routes.yml`
5. [ ] Add Traefik service in `infra/traefik/dynamic/services.yml`
6. [ ] Create CI workflow `.github/workflows/ci-new-service.yml`
7. [ ] Add Docker build job to `.github/workflows/docker-build.yml`
8. [ ] Create K8s manifests in `deployments/staging/kubernetes/`
9. [ ] Create K8s manifests in `deployments/production/kubernetes/`
10. [ ] Add a Prometheus scrape target if the service exposes metrics
11. [ ] Update deploy workflows if needed

## Rules

- ALWAYS use multi-stage Docker builds
- ALWAYS run as a non-root user (dotnetuser:1001) in containers
- ALWAYS include health checks (liveness + readiness)
- ALWAYS set resource requests and limits in K8s
- ALWAYS use snake_case for database names (matching the service name)
- NEVER expose sensitive data in logs, configs, or docker-compose
- NEVER use the `:latest` tag in production; tag images with the commit SHA (e.g. `goodgo/service:abc123`)
- NEVER skip health check configuration
- FOLLOW existing docker-compose patterns for new services
- ENV vars: use `DATABASE_URL`, `REDIS_CONNECTION_STRING`, `ASPNETCORE_ENVIRONMENT`
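The rule against `:latest` in production can be checked mechanically before `kubectl apply`. The script below is a minimal sketch of such a guard, not an existing GoodGo script: the function names `find_unpinned_images` and `check_images_pinned` are hypothetical, and it assumes manifests use `.yml`/`.yaml` extensions with one `image:` value per line. It matches text rather than parsing YAML, flags references tagged `:latest` or carrying no tag at all (Kubernetes pulls an untagged image as `:latest`), and accepts commit-SHA tags and `@sha256:` digests.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy guard (not an existing GoodGo script).
# find_unpinned_images DIR: prints every `image:` line in DIR's *.yml/*.yaml
# files whose reference is tagged :latest or has no tag at all (an untagged
# image is pulled as :latest too). Assumes one `image:` value per line.
find_unpinned_images() {
  local dir="$1" hit ref last
  if [[ ! -d "$dir" ]]; then
    printf 'not a directory: %s\n' "$dir" >&2
    return 2
  fi
  # grep prints "file:line:content"; `|| true` turns "no match" (status 1)
  # into success so callers using `set -e` or pipefail do not abort.
  { grep -rnE --include='*.yml' --include='*.yaml' \
      '^[[:space:]]*(-[[:space:]]*)?image:' "$dir" || true; } |
    while IFS= read -r hit; do
      ref="${hit#*image:}"                               # text after "image:"
      ref="$(printf '%s' "$ref" | tr -d "\"'[:space:]")" # drop quotes and blanks
      last="${ref##*/}"                                  # name[:tag][@digest]
      # A tag other than latest, or an @sha256: digest, contains ':' and passes.
      if [[ "$last" == *:latest || "$last" != *:* ]]; then
        printf '%s\n' "$hit"
      fi
    done
}

# check_images_pinned DIR: returns 0 if every image is pinned, 1 if any is
# not (offenders are listed on stderr), 2 if DIR does not exist.
check_images_pinned() {
  local offenders
  offenders="$(find_unpinned_images "$1")" || return 2
  if [[ -n "$offenders" ]]; then
    printf 'Unpinned images (tag with a commit SHA instead):\n%s\n' "$offenders" >&2
    return 1
  fi
  return 0
}
```

A deploy workflow could source this file and run `check_images_pinned deployments/production/kubernetes` before `kubectl apply`, failing the job on a non-zero status. Plain text matching keeps the guard free of dependencies beyond bash, grep, and tr; flow-style or templated manifests would need a YAML-aware tool instead.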