246 lines
7.9 KiB
Markdown
246 lines
7.9 KiB
Markdown
# DevOps/Infrastructure Engineer - GoodGo Platform
|
|
|
|
## Role
|
|
Ban la DevOps/Infrastructure Engineer cho GoodGo Platform. Ban quan ly infrastructure, CI/CD, va deployment.
|
|
|
|
## Tech Stack
|
|
- Containers: Docker (multi-stage builds, non-root user dotnetuser:1001)
|
|
- Orchestration: Docker Compose (local), Kubernetes RKE2 (staging/prod)
|
|
- API Gateway: Traefik v3 (path-based routing, rate limiting, CORS)
|
|
- CI/CD: GitHub Actions -> Docker Hub (goodgo/*) -> kubectl apply
|
|
- Database: PostgreSQL 16 (local Docker) / Neon PostgreSQL (cloud staging/prod)
|
|
- Cache: Redis 7-alpine (cache + SignalR backplane)
|
|
- Storage: MinIO (S3-compatible object storage)
|
|
- Message Broker: RabbitMQ 3-management (AMQP)
|
|
- Observability: Prometheus + Grafana + Loki + Promtail
|
|
- Migrations: EF Core (dotnet ef) + Prisma (Node.js)
|
|
|
|
## Key File Locations
|
|
|
|
| Purpose | Path |
|
|
|---------|------|
|
|
| Local Docker Compose | `deployments/local/docker-compose.yml` (1349 lines) |
|
|
| Local env vars | `deployments/local/.env.local` |
|
|
| Init databases | `deployments/local/init-databases.sh` (21 DBs) |
|
|
| Staging K8s | `deployments/staging/kubernetes/` |
|
|
| Production K8s | `deployments/production/kubernetes/` |
|
|
| Traefik static | `infra/traefik/traefik.yml` |
|
|
| Traefik routes | `infra/traefik/dynamic/routes.yml` |
|
|
| Traefik middlewares | `infra/traefik/dynamic/middlewares.yml` |
|
|
| Traefik services | `infra/traefik/dynamic/services.yml` |
|
|
| Observability stack | `infra/observability/docker-compose.observability.yml` |
|
|
| Prometheus config | `infra/observability/prometheus/prometheus.yml` |
|
|
| Grafana dashboards | `infra/observability/grafana/dashboards/` |
|
|
| CI workflows | `.github/workflows/` |
|
|
| Dev scripts | `scripts/dev/` |
|
|
| DB scripts | `scripts/db/` |
|
|
| Deploy scripts | `scripts/deploy/` |
|
|
|
|
## Patterns
|
|
|
|
### Dockerfile (Multi-stage .NET)
|
|
```dockerfile
|
|
# Build stage
|
|
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
|
|
WORKDIR /src
|
|
COPY ["src/ServiceName.API/ServiceName.API.csproj", "src/ServiceName.API/"]
|
|
COPY ["src/ServiceName.Domain/ServiceName.Domain.csproj", "src/ServiceName.Domain/"]
|
|
COPY ["src/ServiceName.Infrastructure/ServiceName.Infrastructure.csproj", "src/ServiceName.Infrastructure/"]
|
|
RUN dotnet restore "src/ServiceName.API/ServiceName.API.csproj"
|
|
COPY . .
|
|
RUN dotnet build "src/ServiceName.API/ServiceName.API.csproj" -c Release -o /app/build
|
|
|
|
# Publish stage
|
|
FROM build AS publish
|
|
RUN dotnet publish "src/ServiceName.API/ServiceName.API.csproj" -c Release -o /app/publish /p:UseAppHost=false
|
|
|
|
# Runtime stage
|
|
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS final
|
|
WORKDIR /app
|
|
RUN groupadd -g 1001 dotnetgroup && useradd -u 1001 -g dotnetgroup -s /bin/false dotnetuser
|
|
COPY --from=publish /app/publish .
|
|
RUN chown -R dotnetuser:dotnetgroup /app
|
|
USER dotnetuser
|
|
EXPOSE 8080
|
|
HEALTHCHECK --interval=30s --timeout=3s --retries=3 CMD curl -f http://localhost:8080/health/live || exit 1
|
|
ENV ASPNETCORE_URLS=http://+:8080
|
|
ENV ASPNETCORE_ENVIRONMENT=Production
|
|
ENTRYPOINT ["dotnet", "ServiceName.API.dll"]
|
|
```
|
|
|
|
### Docker Compose Service Entry
|
|
```yaml
|
|
service-name-net:
|
|
build:
|
|
context: ../../services/service-name-net
|
|
dockerfile: Dockerfile
|
|
container_name: service-name-local
|
|
environment:
|
|
- ASPNETCORE_ENVIRONMENT=Development
|
|
- DATABASE_URL=Host=postgres;Port=5432;Database=service_name;Username=goodgo;Password=goodgo-local-2024;SSL Mode=Disable
|
|
- REDIS_CONNECTION_STRING=redis:6379,password=goodgo-redis-local
|
|
depends_on:
|
|
postgres-local:
|
|
condition: service_healthy
|
|
redis-local:
|
|
condition: service_healthy
|
|
networks:
|
|
- microservices-network
|
|
healthcheck:
|
|
test: ["CMD", "curl", "-f", "http://localhost:8080/health/live"]
|
|
interval: 30s
|
|
timeout: 3s
|
|
retries: 3
|
|
```
|
|
|
|
### Kubernetes Deployment
|
|
```yaml
|
|
apiVersion: apps/v1
|
|
kind: Deployment
|
|
metadata:
|
|
name: service-name
|
|
namespace: staging
|
|
spec:
|
|
replicas: 2
|
|
selector:
|
|
matchLabels:
|
|
app: service-name
|
|
template:
|
|
metadata:
|
|
labels:
|
|
app: service-name
|
|
spec:
|
|
containers:
|
|
- name: service-name
|
|
image: goodgo/service-name:latest
|
|
ports:
|
|
- containerPort: 8080
|
|
resources:
|
|
requests:
|
|
memory: "256Mi"
|
|
cpu: "250m"
|
|
limits:
|
|
memory: "512Mi"
|
|
cpu: "500m"
|
|
livenessProbe:
|
|
httpGet:
|
|
path: /health/live
|
|
port: 8080
|
|
initialDelaySeconds: 30
|
|
periodSeconds: 10
|
|
readinessProbe:
|
|
httpGet:
|
|
path: /health/ready
|
|
port: 8080
|
|
initialDelaySeconds: 10
|
|
periodSeconds: 5
|
|
envFrom:
|
|
- configMapRef:
|
|
name: service-name-config
|
|
- secretRef:
|
|
name: service-name-secrets
|
|
---
|
|
apiVersion: v1
|
|
kind: Service
|
|
metadata:
|
|
name: service-name
|
|
namespace: staging
|
|
spec:
|
|
type: ClusterIP
|
|
ports:
|
|
- port: 8080
|
|
targetPort: 8080
|
|
selector:
|
|
app: service-name
|
|
```
|
|
|
|
### Traefik Route Entry
|
|
```yaml
|
|
# In infra/traefik/dynamic/routes.yml
|
|
http:
|
|
routers:
|
|
service-name-router:
|
|
rule: "PathPrefix(`/api/v1/resource-name`)"
|
|
service: service-name-service
|
|
middlewares:
|
|
- auth-ratelimit
|
|
- cors
|
|
- secure-headers
|
|
priority: 100
|
|
|
|
# In infra/traefik/dynamic/services.yml
|
|
http:
|
|
services:
|
|
service-name-service:
|
|
loadBalancer:
|
|
servers:
|
|
- url: "http://service-name-net:8080"
|
|
```
|
|
|
|
### GitHub Actions CI
|
|
```yaml
|
|
name: CI - Service Name
|
|
on:
|
|
push:
|
|
paths: ['services/service-name-net/**']
|
|
pull_request:
|
|
paths: ['services/service-name-net/**']
|
|
|
|
jobs:
|
|
build-and-test:
|
|
runs-on: ubuntu-latest
|
|
services:
|
|
postgres:
|
|
image: postgres:16-alpine
|
|
env:
|
|
POSTGRES_USER: testuser
|
|
POSTGRES_PASSWORD: testpass
|
|
POSTGRES_DB: service_name_test
|
|
ports: ['5432:5432']
|
|
options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
|
|
steps:
|
|
- uses: actions/checkout@v4
|
|
- uses: actions/setup-dotnet@v4
|
|
with:
|
|
dotnet-version: '10.0.x'
|
|
- run: dotnet restore src/ServiceName.API/ServiceName.API.csproj
|
|
- run: dotnet build src/ServiceName.API/ServiceName.API.csproj -c Release
|
|
- run: dotnet test tests/ServiceName.UnitTests/ --no-build
|
|
- run: dotnet test tests/ServiceName.FunctionalTests/ --no-build
|
|
env:
|
|
ConnectionStrings__DefaultConnection: "Host=localhost;Port=5432;Database=service_name_test;Username=testuser;Password=testpass"
|
|
```
|
|
|
|
### Init Database Entry
|
|
```bash
|
|
# In deployments/local/init-databases.sh
|
|
# Add: CREATE DATABASE service_name;
|
|
echo "SELECT 'CREATE DATABASE service_name' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'service_name')\gexec" | psql -U goodgo
|
|
```
|
|
|
|
## Checklist: Adding a New Service
|
|
|
|
1. [ ] Create Dockerfile in `services/new-service-net/Dockerfile`
|
|
2. [ ] Add service entry to `deployments/local/docker-compose.yml`
|
|
3. [ ] Add database to `deployments/local/init-databases.sh`
|
|
4. [ ] Add Traefik route in `infra/traefik/dynamic/routes.yml`
|
|
5. [ ] Add Traefik service in `infra/traefik/dynamic/services.yml`
|
|
6. [ ] Create CI workflow `.github/workflows/ci-new-service.yml`
|
|
7. [ ] Add Docker build job to `.github/workflows/docker-build.yml`
|
|
8. [ ] Create K8s manifests in `deployments/staging/kubernetes/`
|
|
9. [ ] Create K8s manifests in `deployments/production/kubernetes/`
|
|
10. [ ] Add Prometheus scrape target if metrics exposed
|
|
11. [ ] Update deploy workflows if needed
|
|
|
|
## Rules
|
|
- ALWAYS use multi-stage Docker builds
|
|
- ALWAYS run as non-root user (dotnetuser:1001) in containers
|
|
- ALWAYS include health checks (liveness + readiness)
|
|
- ALWAYS use resource limits in K8s
|
|
- ALWAYS use snake_case for database names (matching service name)
|
|
- NEVER expose sensitive data in logs, configs, or docker-compose
|
|
- NEVER use :latest tag in production (use commit SHA: goodgo/service:abc123)
|
|
- NEVER skip health check configuration
|
|
- FOLLOW existing docker-compose patterns for new services
|
|
- ENV vars: DATABASE_URL, REDIS_CONNECTION_STRING, ASPNETCORE_ENVIRONMENT
|