---
name: resilience-patterns
description: Resilience patterns for GoodGo microservices including circuit breaker, retry strategies, timeout handling, and graceful degradation. Use when implementing fault tolerance, handling external service failures, or improving system reliability.
---

# Resilience Patterns

## When to Use This Skill

Use this skill when:

- Implementing circuit breaker patterns for external services
- Adding retry logic for transient failures
- Setting timeout handling for long-running operations
- Implementing graceful degradation strategies
- Handling external service failures
- Improving system fault tolerance

## Core Concepts

### Resilience Patterns

1. **Circuit Breaker**: Prevents cascading failures by stopping calls to failing services
2. **Retry**: Automatically retries failed operations with backoff
3. **Timeout**: Sets maximum time limits for operations
4. **Bulkhead**: Isolates failures to prevent spread
5. **Graceful Degradation**: Provides fallback behavior when services fail

## Patterns

### Circuit Breaker Pattern

Protects against cascading failures. The circuit breaker has three states and moves between them based on the error rate and the reset timeout:

```mermaid
stateDiagram-v2
    [*] --> CLOSED: Initial State
    CLOSED --> OPEN: Errors exceed threshold (errorThresholdPercentage = 50%)
    OPEN --> HALF_OPEN: Reset timeout expires (resetTimeout = 30s)
    HALF_OPEN --> CLOSED: Request succeeds
    HALF_OPEN --> OPEN: Request fails
    CLOSED --> [*]: Normal operation
    OPEN --> [*]: Circuit open (rejecting requests)
    HALF_OPEN --> [*]: Testing recovery
```

**Circuit Breaker States:**

- **CLOSED**: Normal operation, requests pass through
- **OPEN**: Circuit is open, requests are immediately rejected
- **HALF-OPEN**: Testing whether the service has recovered, allows limited requests

```typescript
import CircuitBreaker from 'opossum';
import { logger } from '@goodgo/logger';

export const createCircuitBreaker = <TArgs extends unknown[], TResult>(
  action: (...args: TArgs) => Promise<TResult>,
  name: string,
  options: Partial<CircuitBreaker.Options> = {}
): CircuitBreaker<TArgs, TResult> => {
  const breaker = new CircuitBreaker(action, {
    timeout: 3000, // fail calls that take longer than 3s
    errorThresholdPercentage: 50, // open once 50% of calls fail
    resetTimeout: 30000, // wait 30s before moving to half-open
    ...options,
    name,
  });

  // Log every state transition so it can be monitored
  breaker.on('open', () => {
    logger.warn(`Circuit Breaker OPEN: ${name}`);
  });

  breaker.on('halfOpen', () => {
    logger.info(`Circuit Breaker HALF-OPEN: ${name}`);
  });

  breaker.on('close', () => {
    logger.info(`Circuit Breaker CLOSED: ${name}`);
  });

  return breaker;
};

// Usage
const externalApiBreaker = createCircuitBreaker(
  async (data: unknown) => externalApi.call(data),
  'external-api'
);

try {
  const result = await externalApiBreaker.fire(requestData);
} catch (error) {
  // fire() rejects if the call fails, times out, or the circuit is open.
  // Handle the error or fall back here.
}
```

### Retry Pattern

Retry transient failures with exponential backoff. The retry pattern attempts an operation multiple times, doubling the delay after each failed attempt:

```mermaid
flowchart TD
    Start([Start Operation]) --> Attempt[Attempt Operation]
    Attempt --> Success{Success?}
    Success -->|Yes| Return([Return Result])
    Success -->|No| CheckRetries{"Attempt < Max Retries?"}
    CheckRetries -->|No| ThrowError([Throw Error])
    CheckRetries -->|Yes| CalculateDelay["Calculate Delay:<br/>baseDelay × 2^attempt"]
    CalculateDelay --> Wait[Wait for Delay]
    Wait --> IncrementAttempt[Increment Attempt]
    IncrementAttempt --> Attempt

    style Start fill:#e1f5e1
    style Return fill:#e1f5e1
    style ThrowError fill:#ffe1e1
    style CalculateDelay fill:#fff4e1
```

**Exponential Backoff Example** (`baseDelay = 1000`; each delay follows a failed attempt):

- After attempt 1: 1s delay
- After attempt 2: 2s delay
- After attempt 3: 4s delay
- After attempt 4: 8s delay (reached only when `maxRetries` is 4 or more)

```typescript
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseDelay: number = 1000
): Promise<T> {
  // attempt 0 is the initial call; attempts 1..maxRetries are retries
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Out of retries: surface the last error to the caller
      if (attempt === maxRetries) throw error;

      const delay = baseDelay * Math.pow(2, attempt);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  // Unreachable; present only to satisfy the compiler's return check
  throw new Error('Retry exhausted');
}
```

### Timeout Pattern

Set maximum time limits. The timeout pattern uses `Promise.race` to enforce a maximum execution time:

```mermaid
sequenceDiagram
    participant Client
    participant TimeoutWrapper
    participant Operation
    participant TimeoutTimer

    Client->>TimeoutWrapper: Execute with timeout
    TimeoutWrapper->>Operation: Start operation
    TimeoutWrapper->>TimeoutTimer: Start timeout timer

    alt Operation completes first
        Operation-->>TimeoutWrapper: Return result
        TimeoutWrapper-->>Client: Return result
        TimeoutWrapper->>TimeoutTimer: Cancel timer
    else Timeout expires first
        TimeoutTimer-->>TimeoutWrapper: Timeout error
        TimeoutWrapper->>Operation: (Operation may continue)
        TimeoutWrapper-->>Client: Reject with timeout error
    end
```

**Timeout Behavior:**

- Uses `Promise.race()` to race the operation against a timer
- Whichever settles first (resolves or rejects) wins
- The operation may keep running after a timeout, but its result is ignored

```typescript
async function withTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Operation timeout')), timeoutMs);
  });

  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Cancel the timer once the race settles so it does not linger
    clearTimeout(timer);
  }
}

// Usage
try {
  const result = await withTimeout(
    externalService.call(),
    5000 // 5 second timeout
  );
} catch (error) {
  if (error instanceof Error && error.message === 'Operation timeout') {
    // Handle timeout
  }
}
```

### Graceful Degradation

Provide fallback behavior when the primary dependency fails:

```typescript
async function getDataWithFallback() {
  try {
    return await primaryDataSource.get();
  } catch (error) {
    logger.warn('Primary source failed, using fallback', { error });
    return await fallbackDataSource.get();
  }
}
```

## Best Practices

1. **Circuit Breaker**: Use for external service calls
2. **Retry**: Retry only transient failures (network errors, timeouts)
3. **Timeout**: Set an explicit timeout on every external call
4. **Fallback**: Always provide fallback behavior
5. **Monitoring**: Monitor circuit breaker states and retry rates
6. **Logging**: Log all resilience actions for debugging

## Common Mistakes

1. **Retrying Non-Retryable Errors**: Retrying 4xx client errors, which generally fail the same way on every attempt
2. **No Timeout**: Missing timeouts on external calls
3. **No Fallback**: No graceful degradation strategy
4. **Too Many Retries**: Excessive retries that add latency and put extra load on a service that is already failing

## Resources

- [Circuit Breaker](../../services/iam-service/src/modules/common/circuit-breaker.ts) - Circuit breaker implementation
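
## Example: Retrying Only Transient Failures

The advice under Best Practices and Common Mistakes to retry only transient failures can be sketched as a retry helper that takes a predicate. This is a minimal illustration, not the GoodGo implementation: `statusOf`, `isTransient`, and `retryTransient` are hypothetical names, and the rule that errors carry a numeric `status` field, with 5xx, 408, and 429 treated as retryable, is an assumption made for this example.

```typescript
// Hypothetical helper: read an HTTP-style status code off an unknown error, if present.
function statusOf(error: unknown): number | undefined {
  if (typeof error === 'object' && error !== null && 'status' in error) {
    const status = (error as { status: unknown }).status;
    return typeof status === 'number' ? status : undefined;
  }
  return undefined;
}

// Assumption: 5xx, 408 (request timeout) and 429 (too many requests) are transient.
// Other 4xx errors are not. Errors without a status (for example network
// failures) are treated as transient.
function isTransient(error: unknown): boolean {
  const status = statusOf(error);
  if (status === undefined) return true;
  return status >= 500 || status === 408 || status === 429;
}

// Retry with exponential backoff, but fail fast on non-retryable errors.
async function retryTransient<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseDelay: number = 1000,
  shouldRetry: (error: unknown) => boolean = isTransient
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up when retries are exhausted or the error is not worth retrying
      if (attempt >= maxRetries || !shouldRetry(error)) throw error;
      const delay = baseDelay * Math.pow(2, attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Passing the predicate as a parameter keeps the backoff loop generic, so each caller can decide what counts as transient for its own dependency.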