---
name: resilience-patterns
description: Resilience patterns for GoodGo microservices including circuit breaker, retry strategies, timeout handling, and graceful degradation. Use when implementing fault tolerance, handling external service failures, or improving system reliability.
---

# Resilience Patterns

## When to Use This Skill

Use this skill when:

- Implementing circuit breaker patterns for external services
- Adding retry logic for transient failures
- Setting timeout handling for long-running operations
- Implementing graceful degradation strategies
- Handling external service failures
- Improving system fault tolerance

## Core Concepts

### Resilience Patterns

1. **Circuit Breaker**: Prevents cascading failures by stopping calls to failing services
2. **Retry**: Automatically retries failed operations with backoff
3. **Timeout**: Sets maximum time limits for operations
4. **Bulkhead**: Isolates failures to prevent spread
5. **Graceful Degradation**: Provides fallback behavior when services fail
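
Bulkhead is the one pattern in this list without a dedicated section. As a rough illustration only, isolation can be sketched as a per-dependency concurrency limiter, so that a slow dependency can occupy only its own slots. The `Bulkhead` class, its `maxConcurrent` and `maxQueued` parameters, and the `'Bulkhead full'` error are hypothetical names chosen for this sketch, not part of the GoodGo codebase.

```typescript
// Hypothetical helper: caps how many calls to one dependency run at once.
class Bulkhead {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrent: number;
  private readonly maxQueued: number;

  constructor(maxConcurrent: number, maxQueued: number) {
    this.maxConcurrent = maxConcurrent;
    this.maxQueued = maxQueued;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // All slots busy: reject immediately once the queue is also full.
      if (this.waiting.length >= this.maxQueued) {
        throw new Error('Bulkhead full');
      }
      // Park the caller until a finishing task hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }

    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // the slot is transferred, so `active` stays unchanged
      } else {
        this.active--;
      }
    }
  }
}
```

One instance per external dependency keeps the limits independent; the right slot and queue sizes depend on the dependency and are left open here.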

## Patterns

### Circuit Breaker Pattern

A circuit breaker protects against cascading failures by rejecting calls to a dependency that is already failing. It moves between three states based on the error rate and a reset timeout:

```mermaid
stateDiagram-v2
    [*] --> CLOSED: Initial State
    CLOSED --> OPEN: Errors exceed threshold<br/>(errorThresholdPercentage: 50%)
    OPEN --> HALF_OPEN: Reset timeout expires<br/>(resetTimeout: 30s)
    HALF_OPEN --> CLOSED: Request succeeds
    HALF_OPEN --> OPEN: Request fails
    CLOSED --> [*]: Normal operation
    OPEN --> [*]: Circuit open (rejecting requests)
    HALF_OPEN --> [*]: Testing recovery
```

**Circuit Breaker States:**

- **CLOSED**: Normal operation, requests pass through
- **OPEN**: Circuit is open, requests are immediately rejected
- **HALF-OPEN**: Testing if service has recovered, allows limited requests

```typescript
import CircuitBreaker from 'opossum';
import { logger } from '@goodgo/logger';

export const createCircuitBreaker = <TArgs extends any[], TResult>(
  action: (...args: TArgs) => Promise<TResult>,
  name: string,
  options: Partial<CircuitBreaker.Options> = {}
): CircuitBreaker<TArgs, TResult> => {
  const breaker = new CircuitBreaker(action, {
    timeout: 3000,
    errorThresholdPercentage: 50,
    resetTimeout: 30000,
    ...options,
    name,
  });

  breaker.on('open', () => {
    logger.warn(`Circuit Breaker OPEN: ${name}`);
  });

  breaker.on('halfOpen', () => {
    logger.info(`Circuit Breaker HALF-OPEN: ${name}`);
  });

  breaker.on('close', () => {
    logger.info(`Circuit Breaker CLOSED: ${name}`);
  });

  return breaker;
};

// Usage
const externalApiBreaker = createCircuitBreaker(
  async (data) => await externalApi.call(data),
  'external-api'
);

try {
  const result = await externalApiBreaker.fire(requestData);
} catch (error) {
  // Handle circuit breaker error or fallback
}
```
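
To make the three state transitions concrete without depending on a library, here is a minimal hand-rolled sketch. It is not how opossum is implemented: it opens after a number of consecutive failures instead of an error percentage, and it does not limit how many trial requests run while half-open. `SimpleCircuitBreaker`, `failureThreshold`, `resetTimeoutMs`, and the injectable `now` clock are hypothetical names for illustration.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Simplified breaker: trips on consecutive failures (an assumption of this sketch).
class SimpleCircuitBreaker<T> {
  private state: BreakerState = 'CLOSED';
  private failures = 0;
  private openedAt = 0;
  private readonly action: () => Promise<T>;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    action: () => Promise<T>,
    failureThreshold = 5,
    resetTimeoutMs = 30000,
    now: () => number = () => Date.now()
  ) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  getState(): BreakerState {
    // OPEN -> HALF_OPEN once the reset timeout has elapsed.
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN';
    }
    return this.state;
  }

  async fire(): Promise<T> {
    if (this.getState() === 'OPEN') {
      // Fail fast without calling the protected action.
      throw new Error('Circuit open');
    }
    try {
      const result = await this.action();
      // Any success closes the circuit and resets the failure count.
      this.failures = 0;
      this.state = 'CLOSED';
      return result;
    } catch (error) {
      this.failures++;
      // A failed trial request, or too many failures, opens the circuit.
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

The clock is passed in so the reset timeout can be exercised in tests without waiting 30 seconds.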

### Retry Pattern

Retry transient failures with exponential backoff. The operation is attempted up to a maximum number of times, and the delay doubles between attempts:

```mermaid
flowchart TD
    Start([Start Operation]) --> Attempt[Attempt Operation]
    Attempt --> Success{Success?}
    Success -->|Yes| Return([Return Result])
    Success -->|No| CheckRetries{Attempt < Max Retries?}
    CheckRetries -->|No| ThrowError([Throw Error])
    CheckRetries -->|Yes| CalculateDelay[Calculate Delay:<br/>baseDelay × 2^attempt]
    CalculateDelay --> Wait[Wait for Delay]
    Wait --> IncrementAttempt[Increment Attempt]
    IncrementAttempt --> Attempt

    style Start fill:#e1f5e1
    style Return fill:#e1f5e1
    style ThrowError fill:#ffe1e1
    style CalculateDelay fill:#fff4e1
```
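
**Exponential Backoff Example** (`baseDelay` of 1000 ms):

- After attempt 1 fails: 1s delay
- After attempt 2 fails: 2s delay
- After attempt 3 fails: 4s delay
- After attempt 4 fails: 8s delay (only reached when `maxRetries` is 4 or more; with the default of 3, the error is thrown instead)
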
```typescript
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseDelay: number = 1000
): Promise<T> {
  // attempt 0 is the initial call; attempts 1..maxRetries are retries
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Out of retries: surface the last error to the caller
      if (attempt === maxRetries) throw error;

      // Delay doubles each time: baseDelay, 2 × baseDelay, 4 × baseDelay, ...
      const delay = baseDelay * Math.pow(2, attempt);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  // Unreachable in practice; satisfies the compiler's return-path check
  throw new Error('Retry exhausted');
}
```
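
The function above retries every error. A possible variant, sketched here under stated assumptions, retries only errors classified as transient and randomizes the delay ("full jitter") so that many clients do not retry in lockstep. `retryTransient`, `RetryOptions`, and `isTransient` are hypothetical names; the classifier assumes failed HTTP calls expose a numeric `status` property, and treating 408 and 429 as retryable exceptions among 4xx codes is a choice of this sketch, not something the source prescribes.

```typescript
interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  isRetryable: (error: unknown) => boolean;
}

// Assumption: HTTP failures carry a numeric `status`; anything without one
// is treated as a network-level failure and retried.
function isTransient(error: unknown): boolean {
  const status = (error as { status?: unknown } | null)?.status;
  if (typeof status !== 'number') return true;
  return status >= 500 || status === 429 || status === 408;
}

async function retryTransient<T>(
  fn: () => Promise<T>,
  options: Partial<RetryOptions> = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelayMs = 1000,
    maxDelayMs = 30000,
    isRetryable = isTransient,
  } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up when retries are exhausted or the error will not go away.
      if (attempt >= maxRetries || !isRetryable(error)) throw error;

      // Full jitter: pick a random delay up to the capped exponential ceiling.
      const ceiling = Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt));
      const delay = Math.random() * ceiling;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A caller with different error shapes can pass its own `isRetryable` predicate instead of relying on the `status` assumption.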

### Timeout Pattern

Set a maximum time limit on an operation. The timeout pattern uses `Promise.race` to enforce the limit:

```mermaid
sequenceDiagram
    participant Client
    participant TimeoutWrapper
    participant Operation
    participant TimeoutTimer

    Client->>TimeoutWrapper: Execute with timeout
    TimeoutWrapper->>Operation: Start operation
    TimeoutWrapper->>TimeoutTimer: Start timeout timer

    alt Operation completes first
        Operation-->>TimeoutWrapper: Return result
        TimeoutWrapper-->>Client: Return result
        TimeoutWrapper->>TimeoutTimer: Cancel timer
    else Timeout expires first
        TimeoutTimer-->>TimeoutWrapper: Timeout error
        TimeoutWrapper->>Operation: (Operation may continue)
        TimeoutWrapper-->>Client: Reject with timeout error
    end
```

**Timeout Behavior:**

- Uses `Promise.race()` to race the operation against a timer
- The first promise to settle (resolve or reject) wins
- The operation may keep running after the timeout, but its result is ignored

```typescript
async function withTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Operation timeout')), timeoutMs);
  });

  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Cancel the timer so it does not keep the event loop alive
    clearTimeout(timer);
  }
}

// Usage
try {
  const result = await withTimeout(
    externalService.call(),
    5000 // 5 second timeout
  );
} catch (error) {
  // `error` is `unknown` in strict mode, so narrow before reading `message`
  if (error instanceof Error && error.message === 'Operation timeout') {
    // Handle timeout
  }
}
```
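
Because the operation may keep running after the timeout, it can help to give it a way to stop. The following is a sketch, not the project's implementation: it passes an `AbortSignal` to the operation and aborts it when the timer fires. `withAbortableTimeout` is a hypothetical name, and the operation is assumed to honor the signal (for example by forwarding it to `fetch`).

```typescript
async function withAbortableTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  // Rejects as soon as the controller aborts, even if the operation ignores the signal.
  const timedOut = new Promise<never>((_, reject) => {
    controller.signal.addEventListener(
      'abort',
      () => reject(new Error('Operation timeout')),
      { once: true }
    );
  });

  try {
    return await Promise.race([operation(controller.signal), timedOut]);
  } finally {
    // Stop the timer when the operation settles first.
    clearTimeout(timer);
  }
}

// Example call shape (the URL is a placeholder):
// const response = await withAbortableTimeout(
//   (signal) => fetch('https://example.com/data', { signal }),
//   5000
// );
```

An operation that ignores the signal still gets rejected at the caller, exactly as with `Promise.race` alone; the signal only adds the chance to release resources early.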

### Graceful Degradation

Provide fallback behavior when the primary source fails:

```typescript
async function getDataWithFallback() {
  try {
    return await primaryDataSource.get();
  } catch (error) {
    logger.warn('Primary source failed, using fallback', { error });
    return await fallbackDataSource.get();
  }
}
```
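
When no second data source exists, one option is to serve the last value that was fetched successfully and tell the caller the result is degraded. This is a sketch under assumptions: `withLastKnownGood`, the `Degradable` shape, and the `degraded` flag are hypothetical names, and the cached value never expires here, which a real service would likely want to bound.

```typescript
interface Degradable<T> {
  value: T;
  degraded: boolean; // true when the value did not come from a fresh fetch
}

// Wraps a fetcher so failures fall back to the last successful value,
// or to `defaultValue` if nothing has succeeded yet.
function withLastKnownGood<T>(
  fetchFresh: () => Promise<T>,
  defaultValue: T
): () => Promise<Degradable<T>> {
  let lastGood: T | undefined;
  let hasLastGood = false;

  return async () => {
    try {
      const value = await fetchFresh();
      lastGood = value;
      hasLastGood = true;
      return { value, degraded: false };
    } catch {
      const value = hasLastGood ? (lastGood as T) : defaultValue;
      return { value, degraded: true };
    }
  };
}
```

Returning the `degraded` flag lets callers decide whether stale data is acceptable for their use case instead of hiding the failure.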

## Best Practices

1. **Circuit Breaker**: Use for external service calls
2. **Retry**: Retry only transient failures (network, timeout)
3. **Timeout**: Set appropriate timeouts for all external calls
4. **Fallback**: Always provide fallback behavior
5. **Monitoring**: Monitor circuit breaker states and retry rates
6. **Logging**: Log all resilience actions for debugging

## Common Mistakes

1. **Retrying Non-Retryable Errors**: Retrying 4xx errors (client errors)
2. **No Timeout**: Missing timeouts on external calls
3. **No Fallback**: No graceful degradation strategy
4. **Too Many Retries**: Excessive retries causing performance issues

## Resources

- [Circuit Breaker](../../services/iam-service/src/modules/common/circuit-breaker.ts) - Circuit breaker implementation