pos-system/docs/en/skills/cicd-advanced-patterns.md
Ho Ngoc Hai 2640b351c3 Enhance documentation with detailed diagrams and structured flows
- Added request/response flow diagrams to api-design and api-gateway-advanced skills for better visualization of processes.
- Introduced configuration loading flow in configuration-management skill to clarify the configuration process.
- Included error propagation flow in error-handling-patterns skill to illustrate error handling across layers.
- Enhanced various skills with additional diagrams to improve understanding of complex concepts.

These updates aim to provide clearer guidance and improve the overall documentation experience for developers.
2026-01-01 23:22:54 +07:00


name: cicd-advanced-patterns
description: Advanced CI/CD patterns for GoodGo microservices including blue-green deployments, canary releases, automated rollback, deployment verification, and progressive delivery.

CI/CD Advanced Patterns

When to Use This Skill

Use this skill when:

  • Implementing blue-green deployments
  • Setting up canary releases
  • Implementing automated rollback mechanisms
  • Creating deployment verification pipelines
  • Implementing progressive delivery
  • Setting up deployment gates
  • Implementing smoke tests
  • Managing deployment strategies in Kubernetes

Core Concepts

Deployment Strategies

  1. Rolling Update: Gradually replaces old pods with new ones (the Kubernetes default)
  2. Blue-Green: Two identical environments; traffic switches from one to the other at once
  3. Canary: Gradual rollout to a subset of users
  4. Recreate: Stops all old pods before starting new ones (causes downtime)
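Of these four, only Rolling Update and Recreate are native values of a Kubernetes Deployment's `spec.strategy.type`; blue-green and canary are layered on top with labels, Service selectors, or a service mesh. A minimal sketch, assuming manifests are generated from TypeScript: the `buildStrategy` helper and its types are hypothetical, while the field names and the 25% defaults follow the Kubernetes Deployment API.

```typescript
// Hypothetical helper: builds the spec.strategy block of a Kubernetes Deployment.
// "RollingUpdate" and "Recreate" are the only native Deployment strategy types.
type DeploymentStrategy =
  | { type: "Recreate" }
  | {
      type: "RollingUpdate";
      rollingUpdate: { maxSurge: string | number; maxUnavailable: string | number };
    };

function buildStrategy(kind: "rolling" | "recreate"): DeploymentStrategy {
  if (kind === "recreate") {
    // All old pods are terminated before new ones start: expect downtime.
    return { type: "Recreate" };
  }
  // Kubernetes defaults: up to 25% extra pods and 25% unavailable during the update.
  return { type: "RollingUpdate", rollingUpdate: { maxSurge: "25%", maxUnavailable: "25%" } };
}

console.log(JSON.stringify(buildStrategy("rolling")));
// {"type":"RollingUpdate","rollingUpdate":{"maxSurge":"25%","maxUnavailable":"25%"}}
```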

Deployment Verification

  • Smoke tests
  • Health checks
  • Performance tests
  • Rollback triggers

Blue-Green Deployment

Blue-green deployment maintains two identical production environments (blue and green). At any time, only one environment serves live traffic. The new version is deployed to the idle environment, verified, and then traffic is switched.

flowchart TD
    Start([Deployment Triggered]) --> DeployGreen[Deploy to Green Environment]
    DeployGreen --> WaitRollout[Wait for Rollout Complete]
    WaitRollout --> RunSmokeTests[Run Smoke Tests]
    RunSmokeTests --> TestsPassed{Tests Passed?}
    
    TestsPassed -->|Yes| SwitchTraffic[Switch Service Selector to Green]
    TestsPassed -->|No| RollbackToBlue[Rollback: Keep Blue Active]
    
    SwitchTraffic --> MonitorHealth[Monitor Health Metrics]
    MonitorHealth --> HealthOK{Health OK?}
    
    HealthOK -->|Yes| Complete([Deployment Complete])
    HealthOK -->|No| AutoRollback[Auto Rollback to Blue]
    
    AutoRollback --> Fail
    RollbackToBlue --> Fail([Deployment Failed])
    
    style Start fill:#e1f5ff
    style Complete fill:#d4edda
    style Fail fill:#f8d7da
    style TestsPassed fill:#fff3cd
    style HealthOK fill:#fff3cd

Kubernetes Implementation

# deployments/production/kubernetes/user-service-blue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-blue
  labels:
    app: user-service
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
      version: blue
  template:
    metadata:
      labels:
        app: user-service
        version: blue
    spec:
      containers:
      - name: user-service
        image: goodgo/user-service:v1.0.0
        ports:
        - containerPort: 5000

---
# deployments/production/kubernetes/user-service-green.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-green
  labels:
    app: user-service
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
      version: green
  template:
    metadata:
      labels:
        app: user-service
        version: green
    spec:
      containers:
      - name: user-service
        image: goodgo/user-service:v1.1.0
        ports:
        - containerPort: 5000

---
# Service selector switches between blue/green
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
    version: blue  # Switch to green after verification
  ports:
  - port: 80
    targetPort: 5000
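The cutover in this setup is a single change to the Service selector, and rolling back is the same change in reverse, as long as the previously active Deployment is still running. A minimal sketch, assuming the `app`/`version` labels above; `idleColor` and `selectorPatch` are hypothetical helpers that produce a JSON merge patch you could pass to `kubectl patch service user-service --type merge -p '<patch>'`.

```typescript
type Color = "blue" | "green";

// The idle color is always the one not currently receiving traffic.
function idleColor(active: Color): Color {
  return active === "blue" ? "green" : "blue";
}

// Builds a JSON merge patch that points the Service selector at `target`.
// Both labels are kept so the selector stays as specific as the original.
function selectorPatch(app: string, target: Color): string {
  return JSON.stringify({ spec: { selector: { app, version: target } } });
}

const active: Color = "blue";
const target = idleColor(active); // deploy and verify the new version here first
console.log(selectorPatch("user-service", target));
// {"spec":{"selector":{"app":"user-service","version":"green"}}}
```

Rolling back is `selectorPatch("user-service", active)`, which is why blue-green rollback is close to instant.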

Canary Deployment

Canary deployment gradually rolls out changes to a small subset of users before making them available to everyone. This exposes the new version to real production traffic while limiting how many users a bad release can affect.

flowchart TD
    Start([Canary Deployment Started]) --> DeployCanary[Deploy Canary Version<br/>1 Replica]
    DeployCanary --> Route10[Route 10% Traffic to Canary]
    Route10 --> Wait10[Wait 5-10 minutes]
    Wait10 --> Check10{Health & Metrics OK?}
    
    Check10 -->|No| RollbackCanary[Rollback: Route 0% to Canary]
    Check10 -->|Yes| Route25[Route 25% Traffic to Canary]
    
    Route25 --> Wait25[Wait 5-10 minutes]
    Wait25 --> Check25{Health & Metrics OK?}
    
    Check25 -->|No| RollbackCanary
    Check25 -->|Yes| Route50[Route 50% Traffic to Canary]
    
    Route50 --> Wait50[Wait 5-10 minutes]
    Wait50 --> Check50{Health & Metrics OK?}
    
    Check50 -->|No| RollbackCanary
    Check50 -->|Yes| Route75[Route 75% Traffic to Canary]
    
    Route75 --> Wait75[Wait 5-10 minutes]
    Wait75 --> Check75{Health & Metrics OK?}
    
    Check75 -->|No| RollbackCanary
    Check75 -->|Yes| Route100[Route 100% Traffic to Canary]
    
    Route100 --> PromoteCanary[Promote Canary to Stable]
    PromoteCanary --> Complete([Canary Complete])
    
    RollbackCanary --> Fail([Canary Failed])
    
    style Start fill:#e1f5ff
    style Complete fill:#d4edda
    style Fail fill:#f8d7da
    style Check10 fill:#fff3cd
    style Check25 fill:#fff3cd
    style Check50 fill:#fff3cd
    style Check75 fill:#fff3cd
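The flowchart is a small state machine: hold each traffic weight, check health and metrics, then either advance or send all traffic back to stable. A minimal sketch; the weights mirror the diagram, `nextCanaryStep` is a hypothetical helper, and the `healthy` flag stands in for whatever metrics checks you run during each wait period.

```typescript
// Canary weights that are each held and checked before moving on (from the flowchart).
const CANARY_STEPS: readonly number[] = [10, 25, 50, 75];

type CanaryDecision =
  | { action: "advance"; weight: number } // route `weight`% to the canary next
  | { action: "promote" } // route 100% and promote the canary to stable
  | { action: "rollback" }; // route 0% to the canary

function nextCanaryStep(currentWeight: number, healthy: boolean): CanaryDecision {
  // Any failed check sends all traffic back to stable.
  if (!healthy) return { action: "rollback" };
  const idx = CANARY_STEPS.indexOf(currentWeight);
  if (idx === -1) throw new Error(`unexpected canary weight: ${currentWeight}`);
  // The last checked step passed: switch fully and promote.
  if (idx === CANARY_STEPS.length - 1) return { action: "promote" };
  return { action: "advance", weight: CANARY_STEPS[idx + 1] };
}

// Example: 10% looked healthy, so move to 25%.
console.log(JSON.stringify(nextCanaryStep(10, true))); // {"action":"advance","weight":25}
```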

Kubernetes Canary with Service Mesh

# deployments/production/kubernetes/user-service-canary.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-canary
  labels:
    app: user-service
    version: canary
spec:
  replicas: 1  # Istio route weights, not the replica count, set the traffic share
  selector:
    matchLabels:
      app: user-service
      version: canary
  template:
    metadata:
      labels:
        app: user-service
        version: canary
    spec:
      containers:
      - name: user-service
        image: goodgo/user-service:v1.1.0

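---
# DestinationRule defines the "stable" and "canary" subsets used below.
# Assumes the stable Deployment's pods are labeled version: stable.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: user-service
spec:
  host: user-service
  subsets:
    - name: stable
      labels:
        version: stable
    - name: canary
      labels:
        version: canary

---
# VirtualService splits traffic between the subsets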
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: user-service
spec:
  hosts:
    - user-service
  http:
    - match:
        - headers:
            canary:
              exact: "true"
      route:
        - destination:
            host: user-service
            subset: canary
          weight: 100
    - route:
        - destination:
            host: user-service
            subset: stable
          weight: 90
        - destination:
            host: user-service
            subset: canary
          weight: 10  # 10% traffic to canary
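Each canary step only rewrites the two weights in the default route. A minimal sketch, assuming the VirtualService is rendered from code rather than edited by hand; `defaultRoute` is a hypothetical helper that emits the same shape as the final `route` block above and keeps the stable and canary weights summing to 100, as Istio's VirtualService reference expects.

```typescript
interface RouteDestination {
  destination: { host: string; subset: "stable" | "canary" };
  weight: number;
}

// Builds the default (non-header-matched) route for a given canary percentage.
function defaultRoute(host: string, canaryPercent: number): { route: RouteDestination[] } {
  if (!Number.isInteger(canaryPercent) || canaryPercent < 0 || canaryPercent > 100) {
    throw new Error(`canary percent must be an integer in [0, 100], got ${canaryPercent}`);
  }
  return {
    route: [
      { destination: { host, subset: "stable" }, weight: 100 - canaryPercent },
      { destination: { host, subset: "canary" }, weight: canaryPercent },
    ],
  };
}

const r = defaultRoute("user-service", 25);
console.log(r.route.map((d) => `${d.destination.subset}=${d.weight}`).join(", "));
// stable=75, canary=25
```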

Automated Rollback

Automated rollback mechanisms detect deployment failures and automatically revert to the previous stable version, minimizing downtime and impact.

flowchart TD
    Start([Deployment Completed]) --> RunSmokeTests[Run Smoke Tests]
    RunSmokeTests --> SmokePassed{Smoke Tests Pass?}
    
    SmokePassed -->|No| GetPreviousRev[Get Previous Revision]
    GetPreviousRev --> RollbackDeploy[Rollback Deployment]
    RollbackDeploy --> VerifyRollback[Verify Rollback Success]
    VerifyRollback --> RollbackComplete([Rollback Complete])
    
    SmokePassed -->|Yes| MonitorHealth[Monitor Health Metrics]
    MonitorHealth --> HealthOK{Health OK?}
    
    HealthOK -->|Yes| MonitorErrors[Monitor Error Rates]
    HealthOK -->|No| GetPreviousRev
    
    MonitorErrors --> ErrorRateOK{Error Rate < Threshold?}
    
    ErrorRateOK -->|Yes| MonitorPerformance[Monitor Performance]
    ErrorRateOK -->|No| GetPreviousRev
    
    MonitorPerformance --> PerfOK{Performance OK?}
    
    PerfOK -->|Yes| DeploymentSuccess([Deployment Successful])
    PerfOK -->|No| GetPreviousRev
    
    style Start fill:#e1f5ff
    style DeploymentSuccess fill:#d4edda
    style RollbackComplete fill:#f8d7da
    style SmokePassed fill:#fff3cd
    style HealthOK fill:#fff3cd
    style ErrorRateOK fill:#fff3cd
    style PerfOK fill:#fff3cd
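Every branch in the flowchart funnels into the same rule: the first failing signal triggers a rollback. A minimal sketch of that decision; the signal names and the thresholds (1% error rate, 500 ms p95 latency) are illustrative assumptions, not values defined by GoodGo.

```typescript
// Signals gathered after a deployment (how they are collected is out of scope here).
interface PostDeploySignals {
  smokeTestsPassed: boolean;
  healthy: boolean; // e.g. pods Ready and /health returns 200
  errorRate: number; // fraction of failed requests, 0..1
  p95LatencyMs: number;
}

// Illustrative thresholds; tune them to your own SLOs.
const MAX_ERROR_RATE = 0.01;
const MAX_P95_LATENCY_MS = 500;

type Verdict = { rollback: false } | { rollback: true; reason: string };

// Checks run in the same order as the flowchart; the first failure wins.
function evaluateDeployment(s: PostDeploySignals): Verdict {
  if (!s.smokeTestsPassed) return { rollback: true, reason: "smoke tests failed" };
  if (!s.healthy) return { rollback: true, reason: "health checks failed" };
  if (s.errorRate >= MAX_ERROR_RATE) {
    return { rollback: true, reason: `error rate ${s.errorRate} >= ${MAX_ERROR_RATE}` };
  }
  if (s.p95LatencyMs > MAX_P95_LATENCY_MS) {
    return { rollback: true, reason: `p95 latency ${s.p95LatencyMs}ms > ${MAX_P95_LATENCY_MS}ms` };
  }
  return { rollback: false };
}

const verdict = evaluateDeployment({ smokeTestsPassed: true, healthy: true, errorRate: 0.002, p95LatencyMs: 180 });
console.log(JSON.stringify(verdict)); // {"rollback":false}
```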

Rollback Script

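#!/bin/bash
# scripts/deployment/rollback.sh
# Automated rollback to the previous deployment revision
set -euo pipefail

SERVICE_NAME=${1:?usage: rollback.sh <service-name> [namespace]}
NAMESPACE=${2:-production}

# List revision numbers (ascending). The last one is the current revision,
# so the previous revision is the second-to-last entry.
REVISIONS=$(kubectl rollout history "deployment/$SERVICE_NAME" -n "$NAMESPACE" | awk '$1 ~ /^[0-9]+$/ {print $1}')
REVISION_COUNT=$(printf '%s\n' "$REVISIONS" | grep -c . || true)

if [ "$REVISION_COUNT" -lt 2 ]; then
  echo "No previous revision found"
  exit 1
fi

PREVIOUS_REVISION=$(printf '%s\n' "$REVISIONS" | tail -n 2 | head -n 1)

echo "Rolling back to revision $PREVIOUS_REVISION"

# Rollback deployment
kubectl rollout undo "deployment/$SERVICE_NAME" -n "$NAMESPACE" --to-revision="$PREVIOUS_REVISION"

# Wait for rollout (fail instead of hanging indefinitely)
kubectl rollout status "deployment/$SERVICE_NAME" -n "$NAMESPACE" --timeout=5m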

echo "Rollback complete"
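The revision lookup is the subtle part of a rollback script: `kubectl rollout history` lists revisions in ascending order, so the last entry is the revision currently deployed and the previous one is second-to-last. The same lookup as a minimal TypeScript sketch (hypothetical `previousRevision` helper), which is easy to unit-test against captured command output:

```typescript
// Returns the revision before the current one from `kubectl rollout history` output,
// or null when there is nothing to roll back to.
function previousRevision(historyOutput: string): number | null {
  const revisions = historyOutput
    .split("\n")
    .map((line) => line.trim().split(/\s+/)[0])
    .filter((token) => /^\d+$/.test(token)) // skip the "deployment.apps/..." and "REVISION" lines
    .map(Number)
    .sort((a, b) => a - b);
  // The highest revision is the one currently deployed.
  return revisions.length >= 2 ? revisions[revisions.length - 2] : null;
}

const sample = [
  "deployment.apps/user-service",
  "REVISION  CHANGE-CAUSE",
  "3         <none>",
  "4         <none>",
  "5         <none>",
].join("\n");

console.log(previousRevision(sample)); // 4
```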

Automated Rollback on Failure

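# .github/workflows/deploy-production.yml
name: Deploy Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Assumes kubectl is already authenticated against the production cluster
      # (for example, by a cluster-specific setup step before this one).
      - name: Deploy to Kubernetes
        run: |
          kubectl apply -f deployments/production/kubernetes/ -n production
          kubectl rollout status deployment/user-service -n production --timeout=5m

      - name: Run Smoke Tests
        run: ./scripts/deployment/smoke-tests.sh user-service

      - name: Rollback on Failure
        # failure() is true if any earlier step in this job failed
        if: failure()
        run: ./scripts/deployment/rollback.sh user-service production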

Deployment Verification

Smoke Tests

// scripts/deployment/smoke-tests.ts
// Smoke tests for deployment verification
import axios from 'axios';

const SERVICE_URL = process.env.SERVICE_URL || 'http://localhost';

async function runSmokeTests(): Promise<boolean> {
  try {
    // Health check (axios rejects non-2xx responses by default, so those land in the catch block)
    const healthResponse = await axios.get(`${SERVICE_URL}/health`, { timeout: 5000 });
    if (healthResponse.status !== 200) {
      console.error('Health check failed');
      return false;
    }

    // Basic functionality test
    const testResponse = await axios.get(`${SERVICE_URL}/api/v1/users`, {
      timeout: 5000,
    });
    
    if (testResponse.status !== 200) {
      console.error('Functionality test failed');
      return false;
    }

    console.log('Smoke tests passed');
    return true;
  } catch (error) {
    console.error('Smoke tests failed', error);
    return false;
  }
}

runSmokeTests().then((success) => {
  process.exit(success ? 0 : 1);
});

Health Check Script

#!/bin/bash
# scripts/deployment/health-checks.sh
# Comprehensive health checks

SERVICE_NAME=${1:?usage: health-checks.sh <service-name> [namespace]}
NAMESPACE=${2:-production}

echo "Running health checks for $SERVICE_NAME"

# Count pods whose Ready condition is True (phase Running alone does not mean Ready)
READY_PODS=$(kubectl get pods -n "$NAMESPACE" -l app="$SERVICE_NAME" \
  -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
  | grep -c '^True$' || true)

if [ "$READY_PODS" -eq 0 ]; then
  echo "No ready pods found"
  exit 1
fi

# Check service endpoints (counts ready addresses across all subsets)
ENDPOINTS=$(kubectl get endpoints "$SERVICE_NAME" -n "$NAMESPACE" -o jsonpath='{.subsets[*].addresses[*].ip}' | wc -w)

if [ "$ENDPOINTS" -eq 0 ]; then
  echo "No service endpoints found"
  exit 1
fi

# Check health endpoint via the load balancer if one exists; otherwise fall back
# to cluster DNS, which only resolves when this script runs inside the cluster
LB_HOST=$(kubectl get service "$SERVICE_NAME" -n "$NAMESPACE" -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

if [ -n "$LB_HOST" ]; then
  SERVICE_URL="http://$LB_HOST"
else
  SERVICE_URL="http://$SERVICE_NAME.$NAMESPACE.svc.cluster.local"
fi

# curl prints 000 if the request fails entirely, which also fails the check
HTTP_CODE=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$SERVICE_URL/health")

if [ "$HTTP_CODE" -ne 200 ]; then
  echo "Health endpoint returned $HTTP_CODE"
  exit 1
fi

echo "Health checks passed"
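A pod check should look at readiness rather than phase: a pod can be `Running` while its readiness probe fails, in which case the Service sends it no traffic. A minimal sketch of counting ready pods from the JSON printed by `kubectl get pods -l app=<service> -o json`; `countReadyPods` is hypothetical and the interfaces model only the fields it reads.

```typescript
// Only the fields of the Pod list JSON that this check needs.
interface PodCondition {
  type: string;
  status: "True" | "False" | "Unknown";
}
interface Pod {
  metadata: { name: string };
  status: { phase?: string; conditions?: PodCondition[] };
}
interface PodList {
  items: Pod[];
}

// A pod counts only if its Ready condition is "True"; phase "Running" is not enough.
function countReadyPods(list: PodList): number {
  return list.items.filter((pod) =>
    (pod.status.conditions ?? []).some((c) => c.type === "Ready" && c.status === "True")
  ).length;
}

const pods: PodList = {
  items: [
    { metadata: { name: "user-service-a" }, status: { phase: "Running", conditions: [{ type: "Ready", status: "True" }] } },
    { metadata: { name: "user-service-b" }, status: { phase: "Running", conditions: [{ type: "Ready", status: "False" }] } },
    { metadata: { name: "user-service-c" }, status: { phase: "Pending" } },
  ],
};

console.log(countReadyPods(pods)); // 1 (pod b is Running but not Ready)
```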

Deployment Gates

Deployment gates add checkpoints in the CI/CD pipeline that must pass before proceeding to the next stage.

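# .github/workflows/deploy-with-gates.yml
name: Deploy with Gates

on:
  workflow_dispatch:  # manual trigger, shown as an example

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write  # manual-approval opens an issue to collect approvals
    steps:
      - uses: actions/checkout@v4

      - name: Deploy
        run: kubectl apply -f deployments/

      - name: Wait for Rollout
        run: kubectl rollout status deployment/service --timeout=5m

      - name: Smoke Tests Gate
        id: smoke-tests
        run: ./scripts/deployment/smoke-tests.sh

      # Steps are skipped by default once an earlier step fails; the explicit
      # conditions make the dependency on the smoke-test gate visible.
      - name: Performance Tests Gate
        if: steps.smoke-tests.outcome == 'success'
        run: ./scripts/deployment/performance-tests.sh

      - name: Manual Approval Gate
        if: steps.smoke-tests.outcome == 'success'
        uses: trstringer/manual-approval@v1
        with:
          secret: ${{ secrets.GITHUB_TOKEN }}
          approvers: team-leads
          minimum-approvals: 1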
          issue-title: "Approve deployment"
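Conceptually, a gated pipeline is an ordered list of checks where the first failure stops promotion. A minimal sketch of that control flow; `Gate`, `runGates`, and the gate names are hypothetical stand-ins for the workflow steps, and real gates would be asynchronous calls to test scripts or an approval service.

```typescript
interface Gate {
  name: string;
  check: () => boolean; // true = gate passed
}

type GateResult = { passed: true } | { passed: false; blockedBy: string };

// Evaluate gates in order and stop at the first failure, like a CI job whose
// later steps are skipped once an earlier step fails.
function runGates(gates: Gate[]): GateResult {
  for (const gate of gates) {
    if (!gate.check()) return { passed: false, blockedBy: gate.name };
  }
  return { passed: true };
}

const result = runGates([
  { name: "smoke-tests", check: () => true },
  { name: "performance-tests", check: () => false },
  { name: "manual-approval", check: () => true }, // never reached
]);
console.log(JSON.stringify(result)); // {"passed":false,"blockedBy":"performance-tests"}
```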

Best Practices

  1. Blue-Green: Use for zero-downtime cutover with an instant rollback path
  2. Canary: Use for gradual rollouts with monitoring
  3. Automated Rollback: Always have a rollback plan
  4. Smoke Tests: Run immediately after deployment
  5. Health Checks: Monitor health continuously
  6. Gates: Use deployment gates for critical deployments

Common Mistakes

  1. No Rollback Plan: No way to recover from a failed deployment

    # ✅ Always have rollback command ready
    kubectl rollout undo deployment/service
    
  2. Skipping Smoke Tests: Issues are caught too late

    # ✅ Run smoke tests immediately after deploy
    - name: Smoke Tests
      run: ./scripts/smoke-tests.sh
    
  3. 100% Traffic Switch: A bad release reaches every user at once

    # ❌ BAD: Immediate full switch
    # ✅ GOOD: Gradual rollout (10% → 50% → 100%)
    
  4. No Health Monitoring: Post-deployment issues go unnoticed

    # ✅ At minimum, fail the job if the rollout does not finish within 5 minutes
    - name: Monitor Health
      run: kubectl rollout status deployment/service --timeout=5m
    

Quick Reference

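| Strategy   | Risk   | Downtime | Resource Cost                   |
|------------|--------|----------|---------------------------------|
| Blue-Green | Low    | Zero     | 2x (temporary)                  |
| Canary     | Low    | Zero     | +10-20%                         |
| Rolling    | Medium | Zero     | 1x (up to +25% during rollout)  |
| Recreate   | High   | Yes      | 1x                              |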
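The resource-cost column follows from how many pods run at once at the peak of each rollout. A worked sketch with `peakPods`, a hypothetical helper; the 10 replicas and single canary replica are illustrative, and the rolling case assumes the Kubernetes default `maxSurge: 25%`, which Kubernetes rounds up to a whole pod.

```typescript
type Strategy = "blue-green" | "canary" | "rolling" | "recreate";

// Peak number of pods running at once during a rollout of `replicas` pods.
// Assumptions: rolling uses maxSurge as a percentage (rounded up),
// canary runs `canaryReplicas` extra pods next to the full stable set.
function peakPods(strategy: Strategy, replicas: number, maxSurgePercent = 25, canaryReplicas = 1): number {
  switch (strategy) {
    case "recreate":
      return replicas; // old pods are gone before new ones start
    case "rolling":
      return replicas + Math.ceil((replicas * maxSurgePercent) / 100);
    case "canary":
      return replicas + canaryReplicas;
    case "blue-green":
      return replicas * 2; // both environments run until the idle one is scaled down
    default:
      throw new Error(`unknown strategy: ${String(strategy)}`);
  }
}

for (const s of ["blue-green", "canary", "rolling", "recreate"] as const) {
  console.log(`${s}: ${peakPods(s, 10)} pods at peak`);
}
// blue-green: 20 pods at peak
// canary: 11 pods at peak
// rolling: 13 pods at peak
// recreate: 10 pods at peak
```

Rolling updates briefly exceed 1x by the surge amount, while blue-green holds a full second copy until the idle environment is scaled down.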

Deployment Commands:

# Apply deployment
kubectl apply -f kubernetes/

# Check rollout status
kubectl rollout status deployment/service

# Rollback
kubectl rollout undo deployment/service

# Canary traffic split (Istio)
kubectl apply -f virtualservice-canary.yaml

GitHub Actions Triggers:

on:
  push:
    branches: [main]      # Deploy to prod
    tags: ['v*']          # Release
  pull_request:
    branches: [main]      # PR checks

Deployment Gates:

Build → Test → Security Scan → Deploy Staging → Smoke Tests → Manual Approval → Deploy Prod

Resources