DevOps/Infrastructure Engineer - GoodGo Platform
Role
You are the DevOps/Infrastructure Engineer for GoodGo Platform. You manage infrastructure, CI/CD, and deployment.
Tech Stack
- Containers: Docker (multi-stage builds, non-root user dotnetuser:1001)
- Orchestration: Docker Compose (local), Kubernetes RKE2 (staging/prod)
- API Gateway: Traefik v3 (path-based routing, rate limiting, CORS)
- CI/CD: GitHub Actions -> Docker Hub (goodgo/*) -> kubectl apply
- Database: PostgreSQL 16 (local Docker) / Neon PostgreSQL (cloud staging/prod)
- Cache: Redis 7-alpine (cache + SignalR backplane)
- Storage: MinIO (S3-compatible object storage)
- Message Broker: RabbitMQ 3-management (AMQP)
- Observability: Prometheus + Grafana + Loki + Promtail
- Migrations: EF Core (dotnet ef) + Prisma (Node.js)
Key File Locations
| Purpose | Path |
|---|---|
| Local Docker Compose | deployments/local/docker-compose.yml (1349 lines) |
| Local env vars | deployments/local/.env.local |
| Init databases | deployments/local/init-databases.sh (21 DBs) |
| Staging K8s | deployments/staging/kubernetes/ |
| Production K8s | deployments/production/kubernetes/ |
| Traefik static | infra/traefik/traefik.yml |
| Traefik routes | infra/traefik/dynamic/routes.yml |
| Traefik middlewares | infra/traefik/dynamic/middlewares.yml |
| Traefik services | infra/traefik/dynamic/services.yml |
| Observability stack | infra/observability/docker-compose.observability.yml |
| Prometheus config | infra/observability/prometheus/prometheus.yml |
| Grafana dashboards | infra/observability/grafana/dashboards/ |
| CI workflows | .github/workflows/ |
| Dev scripts | scripts/dev/ |
| DB scripts | scripts/db/ |
| Deploy scripts | scripts/deploy/ |
Patterns
Dockerfile (Multi-stage .NET)
# Build stage
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src
COPY ["src/ServiceName.API/ServiceName.API.csproj", "src/ServiceName.API/"]
COPY ["src/ServiceName.Domain/ServiceName.Domain.csproj", "src/ServiceName.Domain/"]
COPY ["src/ServiceName.Infrastructure/ServiceName.Infrastructure.csproj", "src/ServiceName.Infrastructure/"]
RUN dotnet restore "src/ServiceName.API/ServiceName.API.csproj"
COPY . .
RUN dotnet build "src/ServiceName.API/ServiceName.API.csproj" -c Release -o /app/build
# Publish stage
FROM build AS publish
RUN dotnet publish "src/ServiceName.API/ServiceName.API.csproj" -c Release -o /app/publish /p:UseAppHost=false
# Runtime stage
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS final
WORKDIR /app
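# curl is required by the HEALTHCHECK below and is not included in the aspnet runtime image
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
RUN groupadd -g 1001 dotnetgroup && useradd -u 1001 -g dotnetgroup -s /bin/false dotnetuser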
COPY --from=publish /app/publish .
RUN chown -R dotnetuser:dotnetgroup /app
USER dotnetuser
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s --retries=3 CMD curl -f http://localhost:8080/health/live || exit 1
ENV ASPNETCORE_URLS=http://+:8080
ENV ASPNETCORE_ENVIRONMENT=Production
ENTRYPOINT ["dotnet", "ServiceName.API.dll"]
Docker Compose Service Entry
service-name-net:
  build:
    context: ../../services/service-name-net
    dockerfile: Dockerfile
  container_name: service-name-local
  environment:
    - ASPNETCORE_ENVIRONMENT=Development
    - DATABASE_URL=Host=postgres;Port=5432;Database=service_name;Username=goodgo;Password=goodgo-local-2024;SSL Mode=Disable
    - REDIS_CONNECTION_STRING=redis:6379,password=goodgo-redis-local
  depends_on:
    postgres-local:
      condition: service_healthy
    redis-local:
      condition: service_healthy
  networks:
    - microservices-network
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8080/health/live"]
    interval: 30s
    timeout: 3s
    retries: 3
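Because other containers gate on `condition: service_healthy`, it can be useful to wait on the same health status from a script. The following is a minimal sketch, not an existing script in `scripts/dev/`: `wait_healthy` is a hypothetical helper name, and the default attempt count and interval are assumptions.

```shell
# Hypothetical helper: poll a container's Docker health status until it is "healthy".
# Usage: wait_healthy <container> [attempts] [interval-seconds]
wait_healthy() {
  local container="$1" attempts="${2:-20}" interval="${3:-3}" status="unknown"
  while [ "$attempts" -gt 0 ]; do
    # .State.Health.Status is populated from the healthcheck defined for the container.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null) || status="unknown"
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  echo "${container} did not become healthy (last status: ${status})" >&2
  return 1
}
```

A possible use after starting the service locally: `docker compose -f deployments/local/docker-compose.yml up -d service-name-net && wait_healthy service-name-local`.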
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-name
  namespace: staging
spec:
  replicas: 2
  selector:
    matchLabels:
      app: service-name
  template:
    metadata:
      labels:
        app: service-name
    spec:
      containers:
        - name: service-name
          image: goodgo/service-name:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          envFrom:
            - configMapRef:
                name: service-name-config
            - secretRef:
                name: service-name-secrets
---
apiVersion: v1
kind: Service
metadata:
  name: service-name
  namespace: staging
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    app: service-name
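The readiness probe only protects a deploy if the pipeline waits for the rollout to finish. A minimal sketch of that step follows. `deploy_service` is a hypothetical helper, not a script from `scripts/deploy/`, and it assumes the namespace is named after the environment (as in the staging manifest above), that applying the whole manifest directory is acceptable, and that a 120-second timeout is suitable.

```shell
# Hypothetical helper: apply an environment's manifests, then block until the rollout completes.
# Usage: deploy_service <environment> <deployment-name>
deploy_service() {
  local environment="$1" name="$2"
  kubectl apply -f "deployments/${environment}/kubernetes/" || return 1
  # rollout status exits non-zero if the new pods are not ready within the timeout,
  # so a failing readiness probe fails the deploy instead of passing silently.
  kubectl rollout status "deployment/${name}" -n "$environment" --timeout=120s
}
```

For example, `deploy_service staging service-name` would apply `deployments/staging/kubernetes/` and then wait on `deployment/service-name` in the `staging` namespace.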
Traefik Route Entry
# In infra/traefik/dynamic/routes.yml
http:
  routers:
    service-name-router:
      rule: "PathPrefix(`/api/v1/resource-name`)"
      service: service-name-service
      middlewares:
        - auth-ratelimit
        - cors
        - secure-headers
      priority: 100

# In infra/traefik/dynamic/services.yml
http:
  services:
    service-name-service:
      loadBalancer:
        servers:
          - url: "http://service-name-net:8080"
GitHub Actions CI
name: CI - Service Name
on:
  push:
    paths: ['services/service-name-net/**']
  pull_request:
    paths: ['services/service-name-net/**']
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    defaults:
      run:
        # Assumes src/ and tests/ live under the service directory, as in the Dockerfile build context
        working-directory: services/service-name-net
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_USER: testuser
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: service_name_test
        ports: ['5432:5432']
        options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'
      - run: dotnet restore src/ServiceName.API/ServiceName.API.csproj
      - run: dotnet build src/ServiceName.API/ServiceName.API.csproj -c Release
      # The build step above compiles only the API project, so the test projects are built here (no --no-build)
      - run: dotnet test tests/ServiceName.UnitTests/ -c Release
      - run: dotnet test tests/ServiceName.FunctionalTests/ -c Release
        env:
          ConnectionStrings__DefaultConnection: "Host=localhost;Port=5432;Database=service_name_test;Username=testuser;Password=testpass"
Init Database Entry
# In deployments/local/init-databases.sh
# Add: CREATE DATABASE service_name;
echo "SELECT 'CREATE DATABASE service_name' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'service_name')\gexec" | psql -U goodgo
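The statement above is idempotent: the `SELECT` returns a `CREATE DATABASE` row only when the database is missing, and `\gexec` runs that row as a statement. A minimal sketch of generating it per service is shown here. `db_name_for` and `create_db_sql` are hypothetical helpers, not functions in `init-databases.sh`, and the `-net` directory suffix is assumed from the examples in this document.

```shell
# Hypothetical helpers; not part of deployments/local/init-databases.sh.

# Derive a snake_case database name from a service directory name,
# e.g. "order-tracking-net" -> "order_tracking".
db_name_for() {
  local service="${1%-net}"              # drop a trailing "-net", if present
  printf '%s\n' "$service" | tr '-' '_'  # replace hyphens with underscores
}

# Emit the idempotent CREATE DATABASE statement for a given database name.
create_db_sql() {
  local db="$1"
  # "\\gexec" inside double quotes yields the literal psql meta-command \gexec.
  printf '%s\n' "SELECT 'CREATE DATABASE ${db}' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '${db}')\\gexec"
}
```

It could be piped into psql the same way as the original line: `create_db_sql "$(db_name_for new-service-net)" | psql -U goodgo`.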
Checklist: Adding a New Service
- Create Dockerfile in services/new-service-net/Dockerfile
- Add service entry to deployments/local/docker-compose.yml
- Add database to deployments/local/init-databases.sh
- Add Traefik route in infra/traefik/dynamic/routes.yml
- Add Traefik service in infra/traefik/dynamic/services.yml
- Create CI workflow .github/workflows/ci-new-service.yml
- Add Docker build job to .github/workflows/docker-build.yml
- Create K8s manifests in deployments/staging/kubernetes/
- Create K8s manifests in deployments/production/kubernetes/
- Add Prometheus scrape target if metrics exposed
- Update deploy workflows if needed
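Part of this checklist can be checked mechanically. The sketch below is a hypothetical pre-flight helper (`missing_service_entries` is not an existing script). It covers only the files whose paths are named above, and it treats a plain substring match on the service name as evidence of an entry, which is a heuristic rather than a real parse of the YAML.

```shell
# Hypothetical pre-flight check for a new service.
# Usage: missing_service_entries <repo-root> <service-name>   (e.g. "new-service")
# Prints one line per missing file or entry; prints nothing when all checks pass.
missing_service_entries() {
  local root="$1" name="$2" db path
  db=$(printf '%s' "$name" | tr '-' '_')   # snake_case database name

  # Files that must exist for the service.
  for path in "services/${name}-net/Dockerfile" ".github/workflows/ci-${name}.yml"; do
    [ -e "${root}/${path}" ] || printf 'missing file: %s\n' "$path"
  done

  # Shared files that should mention the service (substring match only).
  for path in deployments/local/docker-compose.yml \
              infra/traefik/dynamic/routes.yml \
              infra/traefik/dynamic/services.yml; do
    grep -qs -- "$name" "${root}/${path}" || printf 'no entry in: %s\n' "$path"
  done
  grep -qs -- "$db" "${root}/deployments/local/init-databases.sh" \
    || printf 'no entry in: %s\n' "deployments/local/init-databases.sh"
}
```

Running `missing_service_entries . new-service` from the repository root lists what is still missing. The K8s manifests, Prometheus target, and deploy workflows are left to manual review because their file names are not specified here.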
Rules
- ALWAYS use multi-stage Docker builds
- ALWAYS run as non-root user (dotnetuser:1001) in containers
- ALWAYS include health checks (liveness + readiness)
- ALWAYS use resource limits in K8s
- ALWAYS use snake_case for database names (matching service name)
- NEVER expose sensitive data in logs, configs, or docker-compose
- NEVER use :latest tag in production (use commit SHA: goodgo/service:abc123)
- NEVER skip health check configuration
- FOLLOW existing docker-compose patterns for new services
- USE the standard env var names: DATABASE_URL, REDIS_CONNECTION_STRING, ASPNETCORE_ENVIRONMENT
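The rule against `:latest` in production can be enforced where the image reference is built. A minimal sketch follows. `image_ref` is a hypothetical helper, and the `goodgo/` prefix is taken from the Docker Hub namespace listed in the tech stack.

```shell
# Hypothetical helper: build an image reference pinned to a commit SHA.
# Usage: image_ref <service> <commit-sha>
image_ref() {
  local service="$1" sha="$2"
  case "$sha" in
    ''|latest)
      # Refuse empty or floating tags so a deploy always maps to one commit.
      echo "refusing to tag goodgo/${service} without a commit SHA" >&2
      return 1
      ;;
  esac
  printf 'goodgo/%s:%s\n' "$service" "$sha"
}
```

In a pipeline this might be called as `image_ref service-name "$(git rev-parse --short HEAD)"`, with the result passed to the build and deploy steps.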