| name | description |
|---|---|
| cicd-advanced-patterns | Advanced CI/CD patterns for GoodGo microservices including blue-green deployments, canary releases, automated rollback, deployment verification, and progressive delivery. |
CI/CD Advanced Patterns
When to Use This Skill
Use this skill when:
- Implementing blue-green deployments
- Setting up canary releases
- Implementing automated rollback mechanisms
- Creating deployment verification pipelines
- Implementing progressive delivery
- Setting up deployment gates
- Implementing smoke tests
- Managing deployment strategies in Kubernetes
Core Concepts
Deployment Strategies
- Rolling Update: Gradually replaces old pods with new ones (the Kubernetes default)
- Blue-Green: Two identical environments; traffic switches from one to the other
- Canary: Gradual rollout to a subset of users
- Recreate: Stops the old version, then starts the new one (causes downtime)
Deployment Verification
- Smoke tests
- Health checks
- Performance tests
- Rollback triggers
Blue-Green Deployment
Blue-green deployment maintains two identical production environments (blue and green). At any time, only one environment serves live traffic. The new version is deployed to the idle environment, verified, and then traffic is switched.
flowchart TD
Start([Deployment Triggered]) --> DeployGreen[Deploy to Green Environment]
DeployGreen --> WaitRollout[Wait for Rollout Complete]
WaitRollout --> RunSmokeTests[Run Smoke Tests]
RunSmokeTests --> TestsPassed{Tests Passed?}
TestsPassed -->|Yes| SwitchTraffic[Switch Service Selector to Green]
TestsPassed -->|No| RollbackToBlue[Rollback: Keep Blue Active]
SwitchTraffic --> MonitorHealth[Monitor Health Metrics]
MonitorHealth --> HealthOK{Health OK?}
HealthOK -->|Yes| Complete([Deployment Complete])
HealthOK -->|No| AutoRollback[Auto Rollback to Blue]
AutoRollback --> Fail
RollbackToBlue --> Fail([Deployment Failed])
style Start fill:#e1f5ff
style Complete fill:#d4edda
style Fail fill:#f8d7da
style TestsPassed fill:#fff3cd
style HealthOK fill:#fff3cd
Kubernetes Implementation
# deployments/production/kubernetes/user-service-blue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-blue
  labels:
    app: user-service
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
      version: blue
  template:
    metadata:
      labels:
        app: user-service
        version: blue
    spec:
      containers:
        - name: user-service
          image: goodgo/user-service:v1.0.0
          ports:
            - containerPort: 5000
---
# deployments/production/kubernetes/user-service-green.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-green
  labels:
    app: user-service
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
      version: green
  template:
    metadata:
      labels:
        app: user-service
        version: green
    spec:
      containers:
        - name: user-service
          image: goodgo/user-service:v1.1.0
          ports:
            - containerPort: 5000
---
# Service selector switches between blue/green
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
    version: blue # Switch to green after verification
  ports:
    - port: 80
      targetPort: 5000
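The traffic switch comes down to changing the `version` label in the Service selector. As a minimal sketch, assuming the `app` and `version` labels used in the manifests above, the TypeScript helpers below pick the idle color and build a patch body that could be passed to `kubectl patch service user-service -p '<json>'`. The names `DeployColor`, `idleColor`, `selectorPatch`, and `colorAfterVerification` are hypothetical and not part of the GoodGo codebase.

```typescript
// Hypothetical helpers; names are illustrative, not from the GoodGo repo.
type DeployColor = "blue" | "green";

// The idle environment is whichever color is not currently live.
function idleColor(live: DeployColor): DeployColor {
  return live === "blue" ? "green" : "blue";
}

// Build a patch body that points the Service at the given color.
// Assumes pods carry the labels app=<service> and version=<color>.
function selectorPatch(service: string, target: DeployColor): string {
  return JSON.stringify({
    spec: { selector: { app: service, version: target } },
  });
}

// Decide which color should serve traffic after verification:
// switch only when smoke tests passed, otherwise keep the live color.
function colorAfterVerification(live: DeployColor, smokeTestsPassed: boolean): DeployColor {
  return smokeTestsPassed ? idleColor(live) : live;
}
```

For example, `selectorPatch("user-service", "green")` returns `{"spec":{"selector":{"app":"user-service","version":"green"}}}`. Keeping the decision in one function makes the "keep blue active" path explicit: a failed smoke test never changes the selector.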
Canary Deployment
Canary deployment gradually rolls out changes to a small subset of users before making them available to everyone. This allows for real-world testing with minimal risk.
flowchart TD
Start([Canary Deployment Started]) --> DeployCanary[Deploy Canary Version<br/>1 Replica]
DeployCanary --> Route10[Route 10% Traffic to Canary]
Route10 --> Wait10[Wait 5-10 minutes]
Wait10 --> Check10{Health & Metrics OK?}
Check10 -->|No| RollbackCanary[Rollback: Route 0% to Canary]
Check10 -->|Yes| Route25[Route 25% Traffic to Canary]
Route25 --> Wait25[Wait 5-10 minutes]
Wait25 --> Check25{Health & Metrics OK?}
Check25 -->|No| RollbackCanary
Check25 -->|Yes| Route50[Route 50% Traffic to Canary]
Route50 --> Wait50[Wait 5-10 minutes]
Wait50 --> Check50{Health & Metrics OK?}
Check50 -->|No| RollbackCanary
Check50 -->|Yes| Route75[Route 75% Traffic to Canary]
Route75 --> Wait75[Wait 5-10 minutes]
Wait75 --> Check75{Health & Metrics OK?}
Check75 -->|No| RollbackCanary
Check75 -->|Yes| Route100[Route 100% Traffic to Canary]
Route100 --> PromoteCanary[Promote Canary to Stable]
PromoteCanary --> Complete([Canary Complete])
RollbackCanary --> Fail([Canary Failed])
style Start fill:#e1f5ff
style Complete fill:#d4edda
style Fail fill:#f8d7da
style Check10 fill:#fff3cd
style Check25 fill:#fff3cd
style Check50 fill:#fff3cd
style Check75 fill:#fff3cd
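The progression in the flowchart can be modeled as a small step function. This is a sketch only: the step list mirrors the percentages in the diagram, while `CANARY_STEPS`, `CanaryDecision`, and `nextCanaryDecision` are hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch of the canary progression shown in the flowchart.
const CANARY_STEPS: number[] = [10, 25, 50, 75, 100];

interface CanaryDecision {
  action: "advance" | "promote" | "rollback";
  weight: number; // percentage of traffic the canary should receive next
}

// Given the current canary weight and whether health and metrics look OK,
// return the next action. Any failed check routes 0% to the canary.
function nextCanaryDecision(currentWeight: number, healthy: boolean): CanaryDecision {
  if (!healthy) {
    return { action: "rollback", weight: 0 };
  }
  for (let i = 0; i < CANARY_STEPS.length; i++) {
    if (CANARY_STEPS[i] > currentWeight) {
      return { action: "advance", weight: CANARY_STEPS[i] };
    }
  }
  // Already at 100%: the canary becomes the new stable version.
  return { action: "promote", weight: 100 };
}
```

A pipeline would call this after each 5-10 minute wait, apply the returned weight, and stop as soon as the action is `rollback` or `promote`.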
Kubernetes Canary with Service Mesh
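# deployments/production/kubernetes/user-service-canary.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-canary
  labels:
    app: user-service
    version: canary
spec:
  replicas: 1 # Start with 1 replica; the traffic share is set by the VirtualService weights
  selector:
    matchLabels:
      app: user-service
      version: canary
  template:
    metadata:
      labels:
        app: user-service
        version: canary
    spec:
      containers:
        - name: user-service
          image: goodgo/user-service:v1.1.0
---
# VirtualService splits traffic
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: user-service
spec:
  hosts:
    - user-service
  http:
    # Requests carrying the header "canary: true" always reach the canary
    - match:
        - headers:
            canary:
              exact: "true"
      route:
        - destination:
            host: user-service
            subset: canary
          weight: 100
    # All other requests are split by weight
    - route:
        - destination:
            host: user-service
            subset: stable
          weight: 90
        - destination:
            host: user-service
            subset: canary
          weight: 10 # 10% traffic to canary
---
# DestinationRule defines the subsets referenced by the VirtualService.
# Assumption: pods of the stable Deployment carry the label version: stable.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: user-service
spec:
  host: user-service
  subsets:
    - name: stable
      labels:
        version: stable
    - name: canary
      labels:
        version: canary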
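When the canary percentage changes at each step, the two weighted destinations have to be regenerated together. The sketch below, with the hypothetical names `WeightedRoute` and `canaryRoutes`, produces the route entries for a given canary percentage and assumes the subset names `stable` and `canary` used in the VirtualService above.

```typescript
// Hypothetical helper that mirrors the weighted routes in the VirtualService.
interface WeightedRoute {
  destination: { host: string; subset: string };
  weight: number;
}

// Split traffic between the stable and canary subsets so that the two
// weights always add up to 100, as in the manifest above.
function canaryRoutes(host: string, canaryPercent: number): WeightedRoute[] {
  const isWholeNumber = Math.floor(canaryPercent) === canaryPercent;
  if (!isWholeNumber || canaryPercent < 0 || canaryPercent > 100) {
    throw new Error("canaryPercent must be an integer between 0 and 100");
  }
  return [
    { destination: { host: host, subset: "stable" }, weight: 100 - canaryPercent },
    { destination: { host: host, subset: "canary" }, weight: canaryPercent },
  ];
}
```

Deriving the stable weight from the canary weight avoids the two values drifting apart when a manifest is edited by hand.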
Automated Rollback
Automated rollback mechanisms detect deployment failures and automatically revert to the previous stable version, minimizing downtime and impact.
flowchart TD
Start([Deployment Completed]) --> RunSmokeTests[Run Smoke Tests]
RunSmokeTests --> SmokePassed{Smoke Tests Pass?}
SmokePassed -->|No| GetPreviousRev[Get Previous Revision]
GetPreviousRev --> RollbackDeploy[Rollback Deployment]
RollbackDeploy --> VerifyRollback[Verify Rollback Success]
VerifyRollback --> RollbackComplete([Rollback Complete])
SmokePassed -->|Yes| MonitorHealth[Monitor Health Metrics]
MonitorHealth --> HealthOK{Health OK?}
HealthOK -->|Yes| MonitorErrors[Monitor Error Rates]
HealthOK -->|No| GetPreviousRev
MonitorErrors --> ErrorRateOK{Error Rate < Threshold?}
ErrorRateOK -->|Yes| MonitorPerformance[Monitor Performance]
ErrorRateOK -->|No| GetPreviousRev
MonitorPerformance --> PerfOK{Performance OK?}
PerfOK -->|Yes| DeploymentSuccess([Deployment Successful])
PerfOK -->|No| GetPreviousRev
style Start fill:#e1f5ff
style DeploymentSuccess fill:#d4edda
style RollbackComplete fill:#f8d7da
style SmokePassed fill:#fff3cd
style HealthOK fill:#fff3cd
style ErrorRateOK fill:#fff3cd
style PerfOK fill:#fff3cd
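The decision chain in the flowchart can be expressed as one function that returns the first reason to roll back. This is a sketch under stated assumptions: the metric fields, the threshold fields, and the names `DeployMetrics`, `RollbackThresholds`, and `rollbackReason` are hypothetical, and real values would come from your monitoring system.

```typescript
// Hypothetical metric and threshold shapes; concrete values are up to you.
interface DeployMetrics {
  smokeTestsPassed: boolean;
  healthyPodRatio: number; // 0..1, share of pods reporting healthy
  errorRate: number; // 0..1, share of failed requests
  p95LatencyMs: number;
}

interface RollbackThresholds {
  minHealthyPodRatio: number;
  maxErrorRate: number;
  maxP95LatencyMs: number;
}

// Evaluate the checks in the same order as the flowchart and return
// the first reason to roll back, or null when the deployment looks good.
function rollbackReason(m: DeployMetrics, t: RollbackThresholds): string | null {
  if (!m.smokeTestsPassed) return "smoke tests failed";
  if (m.healthyPodRatio < t.minHealthyPodRatio) return "health check failed";
  // The flowchart continues only while the error rate is below the threshold.
  if (m.errorRate >= t.maxErrorRate) return "error rate at or above threshold";
  if (m.p95LatencyMs > t.maxP95LatencyMs) return "performance degraded";
  return null;
}
```

Returning a reason string rather than a boolean gives the rollback step something to log or attach to an alert.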
Rollback Script
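#!/bin/bash
# scripts/deployment/rollback.sh
# Automated rollback to previous version
set -euo pipefail
SERVICE_NAME=${1:?usage: rollback.sh SERVICE_NAME [NAMESPACE]}
NAMESPACE=${2:-production}
# Get previous deployment revision: the second-highest revision number in the
# rollout history (the highest one is the revision currently deployed).
# Prints nothing when fewer than two revisions exist.
PREVIOUS_REVISION=$(kubectl rollout history "deployment/$SERVICE_NAME" -n "$NAMESPACE" \
  | awk '$1 ~ /^[0-9]+$/ {print $1}' | sort -n | tail -2 \
  | awk 'NR == 1 {first = $1} END {if (NR == 2) print first}')
if [ -z "$PREVIOUS_REVISION" ]; then
  echo "No previous revision found"
  exit 1
fi
echo "Rolling back to revision $PREVIOUS_REVISION"
# Rollback deployment
kubectl rollout undo "deployment/$SERVICE_NAME" -n "$NAMESPACE" --to-revision="$PREVIOUS_REVISION"
# Wait for rollout
kubectl rollout status "deployment/$SERVICE_NAME" -n "$NAMESPACE"
echo "Rollback complete"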
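Selecting the previous revision means taking the second-highest revision number in the rollout history, because the highest one is the revision currently deployed. The TypeScript sketch below shows that selection on the text printed by `kubectl rollout history deployment/<name>`. The function name `previousRevision` is hypothetical, and the parsing assumes each revision row starts with its number.

```typescript
// Parse the text printed by `kubectl rollout history deployment/<name>` and
// return the revision just before the current (highest) one, or null if
// fewer than two revisions are listed.
function previousRevision(historyOutput: string): number | null {
  const revisions: number[] = [];
  const lines = historyOutput.split("\n");
  for (let i = 0; i < lines.length; i++) {
    // Revision rows start with a number; header rows do not.
    const match = /^(\d+)(\s|$)/.exec(lines[i].trim());
    if (match) {
      revisions.push(parseInt(match[1], 10));
    }
  }
  revisions.sort(function (a, b) {
    return a - b;
  });
  return revisions.length >= 2 ? revisions[revisions.length - 2] : null;
}
```

Note that `kubectl rollout undo` without `--to-revision` also rolls back to the previous revision, so an explicit lookup like this is mainly useful for logging or for validating that a rollback target exists.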
Automated Rollback on Failure
# .github/workflows/deploy-production.yml
name: Deploy Production
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to Kubernetes
        run: |
          kubectl apply -f deployments/production/kubernetes/
          kubectl rollout status deployment/user-service
      - name: Run Smoke Tests
        run: ./scripts/deployment/smoke-tests.sh user-service
      - name: Rollback on Failure
        if: failure()
        run: ./scripts/deployment/rollback.sh user-service production
Deployment Verification
Smoke Tests
// scripts/deployment/smoke-tests.ts
// Smoke tests for deployment verification
import axios from 'axios';
const SERVICE_URL = process.env.SERVICE_URL || 'http://localhost';
async function runSmokeTests(): Promise<boolean> {
  try {
    // Health check
    const healthResponse = await axios.get(`${SERVICE_URL}/health`);
    if (healthResponse.status !== 200) {
      console.error('Health check failed');
      return false;
    }
    // Basic functionality test
    const testResponse = await axios.get(`${SERVICE_URL}/api/v1/users`, {
      timeout: 5000,
    });
    if (testResponse.status !== 200) {
      console.error('Functionality test failed');
      return false;
    }
    console.log('Smoke tests passed');
    return true;
  } catch (error) {
    console.error('Smoke tests failed', error);
    return false;
  }
}
runSmokeTests().then((success) => {
  process.exit(success ? 0 : 1);
});
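As the number of endpoints grows, the checks can be kept in a table and evaluated in one place. The following is a sketch only: `SmokeCheck`, `SmokeObservation`, `SMOKE_CHECKS`, and `failedSmokeChecks` are hypothetical names, and the two entries simply restate the `/health` and `/api/v1/users` checks from the script above. The HTTP calls themselves are left out so the evaluation logic stays easy to test.

```typescript
// Hypothetical shapes for table-driven smoke checks.
interface SmokeCheck {
  name: string;
  path: string;
  expectedStatus: number;
}

interface SmokeObservation {
  path: string;
  status: number; // HTTP status actually returned, or 0 if the request failed
}

const SMOKE_CHECKS: SmokeCheck[] = [
  { name: "health", path: "/health", expectedStatus: 200 },
  { name: "list users", path: "/api/v1/users", expectedStatus: 200 },
];

// Return the names of checks whose observed status does not match.
// A check with no observation counts as failed.
function failedSmokeChecks(checks: SmokeCheck[], observed: SmokeObservation[]): string[] {
  const failed: string[] = [];
  for (let i = 0; i < checks.length; i++) {
    let status = -1;
    for (let j = 0; j < observed.length; j++) {
      if (observed[j].path === checks[i].path) {
        status = observed[j].status;
      }
    }
    if (status !== checks[i].expectedStatus) {
      failed.push(checks[i].name);
    }
  }
  return failed;
}
```

An empty result means every check passed; any other result lists exactly which checks to report before exiting with a non-zero code.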
Health Check Script
#!/bin/bash
# scripts/deployment/health-checks.sh
# Comprehensive health checks
SERVICE_NAME=$1
NAMESPACE=${2:-production}
echo "Running health checks for $SERVICE_NAME"
# Check pods are ready. A pod in the Running phase is not necessarily Ready,
# so count the pods whose Ready condition is True.
READY_PODS=$(kubectl get pods -n "$NAMESPACE" -l "app=$SERVICE_NAME" \
  -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
  | grep -c True)
if [ "$READY_PODS" -eq 0 ]; then
  echo "No ready pods found"
  exit 1
fi
# Check service endpoints
ENDPOINTS=$(kubectl get endpoints "$SERVICE_NAME" -n "$NAMESPACE" -o jsonpath='{.subsets[*].addresses[*].ip}' | wc -w)
if [ "$ENDPOINTS" -eq 0 ]; then
  echo "No service endpoints found"
  exit 1
fi
# Check health endpoint (fall back to the in-cluster DNS name when the
# Service has no load balancer hostname)
SERVICE_URL=$(kubectl get service "$SERVICE_NAME" -n "$NAMESPACE" -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
if [ -z "$SERVICE_URL" ]; then
  SERVICE_URL="http://$SERVICE_NAME.$NAMESPACE.svc.cluster.local"
fi
HTTP_CODE=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$SERVICE_URL/health")
if [ "$HTTP_CODE" -ne 200 ]; then
  echo "Health endpoint returned $HTTP_CODE"
  exit 1
fi
echo "Health checks passed"
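A pod in the `Running` phase can still be failing its readiness probe, so a readiness check should look at the pod's `Ready` condition rather than its phase. The sketch below counts ready pods from the `items` array of `kubectl get pods -o json`. The interfaces cover only the fields used here, and `PodSummary` and `countReadyPods` are hypothetical names.

```typescript
// Minimal subset of the Pod fields returned by `kubectl get pods -o json`.
interface PodCondition {
  type: string;
  status: string;
}

interface PodSummary {
  status: { phase: string; conditions?: PodCondition[] };
}

// Count pods whose Ready condition is "True". The Running phase alone is not
// enough: a Running pod can still be failing its readiness probe.
function countReadyPods(pods: PodSummary[]): number {
  let ready = 0;
  for (let i = 0; i < pods.length; i++) {
    const conditions = pods[i].status.conditions || [];
    for (let j = 0; j < conditions.length; j++) {
      if (conditions[j].type === "Ready" && conditions[j].status === "True") {
        ready++;
        break;
      }
    }
  }
  return ready;
}
```

In a script this would be fed with `JSON.parse(output).items`, where `output` is the JSON printed by kubectl.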
Deployment Gates
Deployment gates add checkpoints in the CI/CD pipeline that must pass before proceeding to the next stage.
# .github/workflows/deploy-with-gates.yml
name: Deploy with Gates
on: workflow_dispatch # Assumed trigger; the snippet does not specify one
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so the manifests and scripts are available
      - uses: actions/checkout@v3
      - name: Deploy
        run: kubectl apply -f deployments/
      - name: Wait for Rollout
        run: kubectl rollout status deployment/service
      - name: Smoke Tests Gate
        id: smoke-tests
        run: ./scripts/deployment/smoke-tests.sh
      - name: Performance Tests Gate
        if: steps.smoke-tests.outcome == 'success'
        run: ./scripts/deployment/performance-tests.sh
      - name: Manual Approval Gate
        if: steps.smoke-tests.outcome == 'success'
        uses: trstringer/manual-approval@v1
        with:
          secret: ${{ secrets.GITHUB_TOKEN }}
          approvers: team-leads
          minimum-approvals: 1
          issue-title: "Approve deployment"
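The gate behavior, where each checkpoint must pass before the next one runs, can be sketched as a short sequential runner. The names `DeployGate`, `GateResult`, and `runGates` are hypothetical; each gate's `check` stands in for a real step such as smoke tests or performance tests.

```typescript
// Hypothetical gate model: each gate is a named check that passes or fails.
interface DeployGate {
  name: string;
  check: () => boolean;
}

interface GateResult {
  passed: boolean;
  failedGate: string | null; // first gate that failed, if any
  executed: string[]; // gates that actually ran, in order
}

// Run gates in order and stop at the first failure, so later gates
// (for example manual approval) never run after an earlier gate fails.
function runGates(gates: DeployGate[]): GateResult {
  const executed: string[] = [];
  for (let i = 0; i < gates.length; i++) {
    executed.push(gates[i].name);
    if (!gates[i].check()) {
      return { passed: false, failedGate: gates[i].name, executed: executed };
    }
  }
  return { passed: true, failedGate: null, executed: executed };
}
```

Recording which gates executed makes it clear in logs that a failed smoke test prevented the performance and approval gates from running at all.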
Best Practices
- Blue-Green: Use for zero-downtime deployments
- Canary: Use for gradual rollouts with monitoring
- Automated Rollback: Always have a rollback plan
- Smoke Tests: Run immediately after deployment
- Health Checks: Monitor health continuously
- Gates: Use deployment gates for critical deployments
Common Mistakes
- No Rollback Plan: Can't recover from failed deployment
    # ✅ Always have rollback command ready
    kubectl rollout undo deployment/service
- Skipping Smoke Tests: Catching issues too late
    # ✅ Run smoke tests immediately after deploy
    - name: Smoke Tests
      run: ./scripts/smoke-tests.sh
- 100% Traffic Switch: All-or-nothing failures
    # ❌ BAD: Immediate full switch
    # ✅ GOOD: Gradual rollout (10% → 50% → 100%)
- No Health Monitoring: Missing deployment issues
    # ✅ Monitor health after deployment
    - name: Monitor Health
      run: kubectl rollout status deployment/service --timeout=5m
Quick Reference
| Strategy | Risk | Downtime | Resource Cost |
|---|---|---|---|
| Blue-Green | Low | Zero | 2x (temporary) |
| Canary | Low | Zero | +10-20% |
| Rolling | Medium | Zero | 1x |
| Recreate | High | Yes | 1x |
Deployment Commands:
# Apply deployment
kubectl apply -f kubernetes/
# Check rollout status
kubectl rollout status deployment/service
# Rollback
kubectl rollout undo deployment/service
# Canary traffic split (Istio)
kubectl apply -f virtualservice-canary.yaml
GitHub Actions Triggers:
on:
  push:
    branches: [main] # Deploy to prod
    tags: ['v*'] # Release
  pull_request:
    branches: [main] # PR checks
Deployment Gates:
Build → Test → Security Scan → Deploy Staging
→ Smoke Tests → Manual Approval → Deploy Prod
Resources
- Kubernetes Deployment
- Istio Traffic Management
- Deployment Kubernetes - K8s deployment patterns
- Testing Patterns - Testing strategies
- Project Rules - GoodGo coding standards
- Skill Source: .cursor/skills/cicd-advanced-patterns/SKILL.md