| name | description |
|---|---|
| resilience-patterns | Resilience patterns for GoodGo microservices including circuit breaker, retry strategies, timeout handling, and graceful degradation. Use when implementing fault tolerance, handling external service failures, or improving system reliability. |
# Resilience Patterns

## When to Use This Skill

Use this skill when:
- Implementing circuit breaker patterns for external services
- Adding retry logic for transient failures
- Setting timeout handling for long-running operations
- Implementing graceful degradation strategies
- Handling external service failures
- Improving system fault tolerance

## Core Concepts

### Resilience Patterns

- Circuit Breaker: Prevents cascading failures by stopping calls to failing services
- Retry: Automatically retries failed operations with backoff
- Timeout: Sets maximum time limits for operations
- Bulkhead: Isolates failures to prevent spread
- Graceful Degradation: Provides fallback behavior when services fail
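
Bulkhead is the only concept in this list without a pattern section of its own. Below is a minimal sketch. It assumes that a per-dependency concurrency cap is an acceptable bulkhead for your service, and the `Bulkhead` class is hypothetical (not a GoodGo or opossum API). Each dependency gets its own instance, so a slow dependency can exhaust only its own slots, not every request handler.

```typescript
// Hypothetical helper: caps concurrent calls to one dependency.
// Calls beyond the cap are rejected immediately instead of queueing without bound.
class Bulkhead {
  private active = 0;
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  get inFlight(): number {
    return this.active;
  }

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // Shed load: the failure stays inside this compartment
      throw new Error('Bulkhead full');
    }
    this.active++;
    try {
      return await fn();
    } finally {
      // Release the slot whether the call succeeded or failed
      this.active--;
    }
  }
}
```

Rejecting instead of queueing keeps latency bounded. A variant that waits for a free slot (a semaphore with a bounded queue) is another common design.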

## Patterns

### Circuit Breaker Pattern

A circuit breaker protects against cascading failures. It has three states and moves between them based on the error rate and a reset timeout:

```mermaid
stateDiagram-v2
    [*] --> CLOSED: Initial State
    CLOSED --> OPEN: Errors exceed threshold<br/>(errorThresholdPercentage: 50%)
    OPEN --> HALF_OPEN: Reset timeout expires<br/>(resetTimeout: 30s)
    HALF_OPEN --> CLOSED: Trial request succeeds
    HALF_OPEN --> OPEN: Trial request fails
```

**Circuit Breaker States:**
- CLOSED: Normal operation; requests pass through
- OPEN: Requests are rejected immediately without calling the service
- HALF_OPEN: Testing whether the service has recovered; a trial request is let through

```typescript
import CircuitBreaker from 'opossum';
import { logger } from '@goodgo/logger';

export const createCircuitBreaker = <TArgs extends unknown[], TResult>(
  action: (...args: TArgs) => Promise<TResult>,
  name: string,
  options: Partial<CircuitBreaker.Options> = {}
): CircuitBreaker<TArgs, TResult> => {
  const breaker = new CircuitBreaker(action, {
    timeout: 3000, // calls slower than 3s are rejected and count as failures
    errorThresholdPercentage: 50, // open when the error rate exceeds 50%
    resetTimeout: 30000, // after 30s in OPEN, allow a trial request (HALF_OPEN)
    ...options,
    name,
  });

  // Log every state transition so breaker behavior is visible when debugging
  breaker.on('open', () => {
    logger.warn(`Circuit Breaker OPEN: ${name}`);
  });
  breaker.on('halfOpen', () => {
    logger.info(`Circuit Breaker HALF-OPEN: ${name}`);
  });
  breaker.on('close', () => {
    logger.info(`Circuit Breaker CLOSED: ${name}`);
  });

  return breaker;
};

// Usage (externalApi and requestData stand in for your own client and payload)
const externalApiBreaker = createCircuitBreaker(
  async (data) => await externalApi.call(data),
  'external-api'
);

try {
  const result = await externalApiBreaker.fire(requestData);
} catch (error) {
  // Rejected because the call failed or timed out, or because the circuit is OPEN;
  // handle the error or fall back here
}
```
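
The three-state cycle in the diagram can also be written out by hand, which makes the transitions easy to follow. This sketch is for illustration only, and the class name `SimpleCircuitBreaker` and its parameters are hypothetical. It differs from opossum in two ways: it opens after a fixed number of consecutive failures rather than on a rolling error percentage, and it does not limit how many trial requests pass through in HALF_OPEN. For real services, the opossum wrapper above is the better choice.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Hypothetical, illustration-only breaker; not part of opossum or any GoodGo package.
class SimpleCircuitBreaker<TArgs extends unknown[], TResult> {
  private state: BreakerState = 'CLOSED';
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly action: (...args: TArgs) => Promise<TResult>;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    action: (...args: TArgs) => Promise<TResult>,
    failureThreshold = 5, // consecutive failures that open the circuit
    resetTimeoutMs = 30_000, // time spent OPEN before a trial request
    now: () => number = Date.now // injectable clock, handy in tests
  ) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  get currentState(): BreakerState {
    return this.state;
  }

  async fire(...args: TArgs): Promise<TResult> {
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit is open'); // fail fast without calling the service
      }
      this.state = 'HALF_OPEN'; // reset timeout expired: let a trial request through
    }
    try {
      const result = await this.action(...args);
      this.consecutiveFailures = 0;
      this.state = 'CLOSED'; // any success, including a HALF_OPEN trial, closes the circuit
      return result;
    } catch (error) {
      this.consecutiveFailures++;
      // A failed trial reopens immediately; otherwise open once the threshold is reached
      if (this.state === 'HALF_OPEN' || this.consecutiveFailures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```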

### Retry Pattern

Retry transient failures with exponential backoff: a failed operation is attempted again after a delay that doubles with each attempt.

```mermaid
flowchart TD
    Start([Start Operation]) --> Attempt[Attempt Operation]
    Attempt --> Success{Success?}
    Success -->|Yes| Return([Return Result])
    Success -->|No| CheckRetries{Attempt < Max Retries?}
    CheckRetries -->|No| ThrowError([Throw Error])
    CheckRetries -->|Yes| CalculateDelay[Calculate Delay:<br/>baseDelay × 2^attempt]
    CalculateDelay --> Wait[Wait for Delay]
    Wait --> IncrementAttempt[Increment Attempt]
    IncrementAttempt --> Attempt

    style Start fill:#e1f5e1
    style Return fill:#e1f5e1
    style ThrowError fill:#ffe1e1
    style CalculateDelay fill:#fff4e1
```

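**Exponential Backoff Example** (`baseDelay = 1000`, `maxRetries = 3`, so at most four attempts in total):
- Retry 1: wait 1s (1000 ms × 2^0)
- Retry 2: wait 2s (1000 ms × 2^1)
- Retry 3: wait 4s (1000 ms × 2^2)
- A fourth retry (only with `maxRetries = 4`) would wait 8s (1000 ms × 2^3)
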
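```typescript
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseDelay: number = 1000
): Promise<T> {
  // attempt 0 is the initial call; attempts 1..maxRetries are retries
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error; // out of retries: surface the last error
      const delay = baseDelay * 2 ** attempt; // 1s, 2s, 4s, ... with the defaults
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // Unreachable, but satisfies TypeScript's return-path check
  throw new Error('Retry exhausted');
}
```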
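
`retryWithBackoff` retries every error, but the Best Practices below say to retry only transient failures. The following is a minimal sketch of a variant that respects that advice. The names `retryTransient` and `RetryOptions` and the options `isRetryable`, `maxDelay`, and `sleep` are all hypothetical. The variant rethrows errors the predicate marks as permanent and caps the delay at `maxDelay`. It also applies "full jitter", a random delay between zero and the capped backoff, so many clients do not retry in lockstep.

```typescript
// Hypothetical options shape; not part of any GoodGo package.
interface RetryOptions {
  maxRetries?: number;
  baseDelay?: number; // ms
  maxDelay?: number; // ms, upper bound on any single delay
  isRetryable?: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryTransient<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelay = 1000,
    maxDelay = 30_000,
    isRetryable = () => true,
    sleep = defaultSleep,
  } = opts;

  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up when out of retries or when the error is permanent (e.g. most 4xx)
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      // Exponential backoff capped at maxDelay, then full jitter: uniform in [0, cap)
      const cap = Math.min(maxDelay, baseDelay * 2 ** attempt);
      await sleep(Math.random() * cap);
    }
  }
}
```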

### Timeout Pattern

Set a maximum time limit on an operation. The timeout pattern uses `Promise.race()` to race the operation against a timer:

```mermaid
sequenceDiagram
    participant Client
    participant TimeoutWrapper
    participant Operation
    participant TimeoutTimer
    Client->>TimeoutWrapper: Execute with timeout
    TimeoutWrapper->>Operation: Start operation
    TimeoutWrapper->>TimeoutTimer: Start timeout timer
    alt Operation completes first
        Operation-->>TimeoutWrapper: Return result
        TimeoutWrapper->>TimeoutTimer: Cancel timer
        TimeoutWrapper-->>Client: Return result
    else Timeout expires first
        TimeoutTimer-->>TimeoutWrapper: Timeout error
        Note over Operation: May keep running; its result is ignored
        TimeoutWrapper-->>Client: Reject with timeout error
    end
```

**Timeout Behavior:**
- Uses `Promise.race()` to race the operation against a timeout; the first promise to settle (resolve or reject) wins
- The operation may keep running after the timeout, but its result is ignored

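```typescript
async function withTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Operation timeout')), timeoutMs);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Cancel the timer whichever promise wins, so it doesn't keep the event loop alive
    clearTimeout(timer);
  }
}

// Usage
try {
  const result = await withTimeout(
    externalService.call(),
    5000 // 5 second timeout
  );
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading .message
  if (error instanceof Error && error.message === 'Operation timeout') {
    // Handle timeout
  }
}
```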
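
`withTimeout` stops waiting, but it cannot stop the operation itself. When the operation accepts an `AbortSignal` (as `fetch` does in browsers and Node.js 18+), a variant can also tell it to stop. Below is a minimal sketch; the name `withAbortableTimeout` is hypothetical, and cancellation only happens if the operation actually listens to the signal.

```typescript
// Hypothetical helper: like withTimeout, but also aborts the operation on timeout.
async function withAbortableTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      // Reject first so the race settles with the timeout error,
      // then signal the operation to stop its work
      reject(new Error('Operation timeout'));
      controller.abort();
    }, timeoutMs);
  });
  try {
    return await Promise.race([operation(controller.signal), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Usage: fetch cancels the HTTP request when the signal fires
// const res = await withAbortableTimeout((signal) => fetch(url, { signal }), 5000);
```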

### Graceful Degradation

Provide fallback behavior:

```typescript
async function getDataWithFallback() {
  try {
    return await primaryDataSource.get();
  } catch (error) {
    logger.warn('Primary source failed, using fallback', { error });
    return await fallbackDataSource.get();
  }
}
```
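
`getDataWithFallback` does not tell its caller whether the fallback was used. Below is a minimal sketch of a reusable helper that also reports whether the response is degraded, so callers can, for example, avoid caching a fallback answer. The name `withFallback` and the `Degradable` result shape are hypothetical. As in the original, an error thrown by the fallback itself still propagates.

```typescript
// Hypothetical result shape: lets callers tell a degraded answer from a normal one.
interface Degradable<T> {
  value: T;
  degraded: boolean;
}

async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => T | Promise<T>,
  onFailure: (error: unknown) => void = () => {}
): Promise<Degradable<T>> {
  try {
    return { value: await primary(), degraded: false };
  } catch (error) {
    // e.g. pass a function that calls logger.warn(...) here
    onFailure(error);
    return { value: await fallback(), degraded: true };
  }
}
```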

## Best Practices

- Circuit Breaker: Use for external service calls
- Retry: Retry only transient failures (network, timeout)
- Timeout: Set appropriate timeouts for all external calls
- Fallback: Always provide fallback behavior
- Monitoring: Monitor circuit breaker states and retry rates
- Logging: Log all resilience actions for debugging
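
## Common Mistakes

- Retrying Non-Retryable Errors: Retrying most 4xx errors (client errors), which fail the same way on every attempt; 408 (Request Timeout) and 429 (Too Many Requests) are the usual exceptions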
- No Timeout: Missing timeouts on external calls
- No Fallback: No graceful degradation strategy
- Too Many Retries: Excessive retries causing performance issues
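
One way to avoid the first mistake is to classify an error before retrying it. Below is a minimal sketch. It assumes the HTTP client attaches a numeric `status` to its errors and that Node.js network errors carry a string `code`; both shapes vary by client, so adapt the checks. `isTransientError` is a hypothetical helper, and the status list is a common convention, not a standard.

```typescript
// Statuses that are often transient: timeouts, rate limiting, and gateway/server hiccups
const RETRYABLE_STATUS = new Set([408, 429, 500, 502, 503, 504]);
// Node.js network-level error codes that usually indicate a transient problem
const RETRYABLE_CODES = new Set(['ECONNRESET', 'ECONNREFUSED', 'ETIMEDOUT', 'EAI_AGAIN']);

// Hypothetical helper; the `status` and `code` properties are assumptions about the error shape.
function isTransientError(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) return false;
  const { status, code } = error as { status?: unknown; code?: unknown };
  if (typeof status === 'number') return RETRYABLE_STATUS.has(status);
  if (typeof code === 'string') return RETRYABLE_CODES.has(code);
  // Unknown failures (e.g. programming errors) are treated as permanent
  return false;
}
```

A check like this can gate the retry inside the `catch` block of `retryWithBackoff`, so that permanent errors are rethrown immediately instead of being retried.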

## Resources

- Circuit Breaker - Circuit breaker implementation