pos-system/docs/en/skills/resilience-patterns.md
Ho Ngoc Hai 2640b351c3 Enhance documentation with detailed diagrams and structured flows
- Added request/response flow diagrams to api-design and api-gateway-advanced skills for better visualization of processes.
- Introduced configuration loading flow in configuration-management skill to clarify the configuration process.
- Included error propagation flow in error-handling-patterns skill to illustrate error handling across layers.
- Enhanced various skills with additional diagrams to improve understanding of complex concepts.

These updates aim to provide clearer guidance and improve the overall documentation experience for developers.
2026-01-01 23:22:54 +07:00


name: resilience-patterns
description: Resilience patterns for GoodGo microservices including circuit breaker, retry strategies, timeout handling, and graceful degradation. Use when implementing fault tolerance, handling external service failures, or improving system reliability.

Resilience Patterns

When to Use This Skill

Use this skill when:

  • Implementing circuit breaker patterns for external services
  • Adding retry logic for transient failures
  • Setting timeout handling for long-running operations
  • Implementing graceful degradation strategies
  • Handling external service failures
  • Improving system fault tolerance

Core Concepts

Resilience Patterns

  1. Circuit Breaker: Prevents cascading failures by stopping calls to failing services
  2. Retry: Automatically retries failed operations with backoff
  3. Timeout: Sets maximum time limits for operations
  4. Bulkhead: Isolates failures to prevent spread
  5. Graceful Degradation: Provides fallback behavior when services fail
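
Bulkhead is listed above but has no dedicated section, so here is a minimal sketch of the idea: cap the number of concurrent calls to one dependency so that a slow service cannot consume every available slot. `createBulkhead` and its `limit` parameter are hypothetical names chosen for illustration, not part of any GoodGo package.

```typescript
// Minimal bulkhead sketch: at most `limit` calls run at once; the rest wait in a queue.
// `createBulkhead` is a hypothetical helper, not an existing GoodGo or library API.
function createBulkhead(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function run<T>(fn: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // No free slot: wait until a finishing call hands its slot over.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await fn();
    } finally {
      const next = waiting.shift();
      if (next) {
        next(); // Hand the slot directly to the next waiter; `active` stays the same.
      } else {
        active--; // Nobody is waiting, so free the slot.
      }
    }
  };
}
```

Creating one bulkhead per dependency keeps a slowdown in one service from starving calls to the others. A production version would usually also bound the queue length and reject when it is full.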

Patterns

Circuit Breaker Pattern

A circuit breaker protects against cascading failures by failing fast once a dependency is unhealthy. It has three states and moves between them based on the error rate and a reset timeout:

stateDiagram-v2
    [*] --> CLOSED: Initial State
    CLOSED --> OPEN: Errors exceed threshold<br/>(errorThresholdPercentage: 50%)
    OPEN --> HALF_OPEN: Reset timeout expires<br/>(resetTimeout: 30s)
    HALF_OPEN --> CLOSED: Trial request succeeds
    HALF_OPEN --> OPEN: Trial request fails

Circuit Breaker States:

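  • CLOSED: Normal operation; requests pass through and failures are counted
  • OPEN: Requests are rejected immediately without calling the service
  • HALF-OPEN: After the reset timeout, a limited number of trial requests are allowed through to test whether the service has recovered

The factory below wraps an async function in an opossum breaker and logs each state change:
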
import CircuitBreaker from 'opossum';
import { logger } from '@goodgo/logger';

// Wraps an async action in a circuit breaker and logs every state change.
export const createCircuitBreaker = <TArgs extends unknown[], TResult>(
  action: (...args: TArgs) => Promise<TResult>,
  name: string,
  options: Partial<CircuitBreaker.Options> = {}
): CircuitBreaker<TArgs, TResult> => {
  const breaker = new CircuitBreaker(action, {
    timeout: 3000, // Count a call as failed if it takes longer than 3s
    errorThresholdPercentage: 50, // Open the circuit when 50% of calls fail
    resetTimeout: 30000, // Move to HALF-OPEN and try again after 30s
    ...options, // Caller-supplied overrides for the defaults above
    name, // Applied last so the breaker name always matches the argument
  });

  breaker.on('open', () => {
    logger.warn(`Circuit Breaker OPEN: ${name}`);
  });

  breaker.on('halfOpen', () => {
    logger.info(`Circuit Breaker HALF-OPEN: ${name}`);
  });

  breaker.on('close', () => {
    logger.info(`Circuit Breaker CLOSED: ${name}`);
  });

  return breaker;
};

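// Usage
const externalApiBreaker = createCircuitBreaker(
  (data: unknown) => externalApi.call(data),
  'external-api'
);

try {
  const result = await externalApiBreaker.fire(requestData);
  // Use result
} catch (error) {
  // fire() rejects when the call fails, times out, or the circuit is open
  // (opossum marks the open-circuit case with error.code === 'EOPENBREAKER').
  // Handle the error or return a fallback value here.
}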
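
To make the state transitions concrete without depending on a library, here is a hand-rolled sketch of the same state machine. It is not how opossum is implemented: for brevity it opens after a number of consecutive failures rather than an error percentage, and it does not limit concurrent trial requests in HALF_OPEN. `createSimpleBreaker`, `failureThreshold`, and the injectable `now` clock are hypothetical names used only for illustration.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Illustrative only: a consecutive-failure breaker, not opossum's algorithm.
function createSimpleBreaker<T>(
  action: () => Promise<T>,
  failureThreshold = 3,
  resetTimeoutMs = 30_000,
  now: () => number = () => Date.now(), // injectable clock (assumption, eases testing)
) {
  let state: BreakerState = 'CLOSED';
  let failures = 0;
  let openedAt = 0;

  async function fire(): Promise<T> {
    if (state === 'OPEN') {
      // Fail fast until the reset timeout has elapsed.
      if (now() - openedAt < resetTimeoutMs) throw new Error('Breaker is open');
      state = 'HALF_OPEN'; // Let this call through as a trial request.
    }
    try {
      const result = await action();
      failures = 0;
      state = 'CLOSED'; // Any success closes the circuit.
      return result;
    } catch (error) {
      failures++;
      // A failed trial request, or too many consecutive failures, opens the circuit.
      if (state === 'HALF_OPEN' || failures >= failureThreshold) {
        state = 'OPEN';
        openedAt = now();
      }
      throw error;
    }
  }

  return { fire, getState: (): BreakerState => state };
}
```

For real services, a maintained library such as opossum is usually the better choice; the sketch only shows why an OPEN breaker rejects immediately and how a single trial request decides whether it closes again.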

Retry Pattern

Retry transient failures with exponential backoff. The operation is attempted up to a maximum number of times, and the delay between attempts doubles after each failure:

flowchart TD
    Start([Start Operation]) --> Attempt[Attempt Operation]
    Attempt --> Success{Success?}
    Success -->|Yes| Return([Return Result])
    Success -->|No| CheckRetries{Attempt < Max Retries?}
    CheckRetries -->|No| ThrowError([Throw Error])
    CheckRetries -->|Yes| CalculateDelay[Calculate Delay:<br/>baseDelay × 2^attempt]
    CalculateDelay --> Wait[Wait for Delay]
    Wait --> IncrementAttempt[Increment Attempt]
    IncrementAttempt --> Attempt
    
    style Start fill:#e1f5e1
    style Return fill:#e1f5e1
    style ThrowError fill:#ffe1e1
    style CalculateDelay fill:#fff4e1

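Exponential Backoff Example (baseDelay = 1s):

  • After failed attempt 1: wait 1s
  • After failed attempt 2: wait 2s
  • After failed attempt 3: wait 4s
  • After failed attempt 4: wait 8s (only reached when maxRetries is 4 or more)

With the default maxRetries of 3, the function below makes at most 4 attempts and waits 1s, 2s, and 4s between them.
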
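async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseDelay: number = 1000
): Promise<T> {
  // attempt is zero-based: 1 initial call plus up to maxRetries retries
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Out of retries: surface the last error to the caller
      if (attempt === maxRetries) throw error;

      // Double the delay after every failed attempt: 1s, 2s, 4s, ...
      const delay = baseDelay * Math.pow(2, attempt);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  // Unreachable when maxRetries >= 0; kept so every code path returns or throws
  throw new Error('Retry exhausted');
}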
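
The function above retries every error and uses fixed delays. The variant below is a sketch that adds two common refinements: a predicate so only transient errors are retried, and random jitter so many clients do not retry in lockstep. `retryWithJitter`, `RetryOptions`, and the option names are hypothetical, and the "full jitter" strategy (a random delay between 0 and the exponential ceiling) is one option among several.

```typescript
interface RetryOptions {
  maxRetries?: number;
  baseDelayMs?: number;
  maxDelayMs?: number; // Upper bound on any single delay
  isRetryable?: (error: unknown) => boolean; // Return false to fail immediately
}

// Sketch only: exponential backoff with full jitter and a retryability check.
async function retryWithJitter<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {},
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelayMs = 1000,
    maxDelayMs = 30_000,
    isRetryable = () => true,
  } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up when retries are exhausted or the error is not worth retrying.
      if (attempt >= maxRetries || !isRetryable(error)) throw error;

      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      const delay = Math.random() * ceiling; // Full jitter: anywhere in [0, ceiling)
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Capping the delay with `maxDelayMs` keeps a large `maxRetries` from producing waits of several minutes.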

Timeout Pattern

Set a maximum time limit on an operation by racing it against a timer with Promise.race:

sequenceDiagram
    participant Client
    participant TimeoutWrapper
    participant Operation
    participant TimeoutTimer
    
    Client->>TimeoutWrapper: Execute with timeout
    TimeoutWrapper->>Operation: Start operation
    TimeoutWrapper->>TimeoutTimer: Start timeout timer
    
    alt Operation completes first
        Operation-->>TimeoutWrapper: Return result
        TimeoutWrapper-->>Client: Return result
        TimeoutWrapper->>TimeoutTimer: Cancel timer
    else Timeout expires first
        TimeoutTimer-->>TimeoutWrapper: Timeout error
        TimeoutWrapper->>Operation: (Operation may continue)
        TimeoutWrapper-->>Client: Reject with timeout error
    end

Timeout Behavior:

  • Uses Promise.race() to race the operation against a timeout
  • Whichever settles first (resolves or rejects) wins
  • Promise.race() does not cancel the loser: the operation may keep running after the timeout, but its result is ignored

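async function withTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Operation timeout')), timeoutMs);
  });

  try {
    // Whichever settles first decides the outcome
    return await Promise.race([promise, timeout]);
  } finally {
    // Cancel the timer so it does not linger after the race is decided
    clearTimeout(timer);
  }
}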

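// Usage
try {
  const result = await withTimeout(
    externalService.call(),
    5000 // 5 second timeout
  );
} catch (error) {
  // error is `unknown` in strict TypeScript, so narrow it before reading .message
  if (error instanceof Error && error.message === 'Operation timeout') {
    // Handle timeout
  } else {
    throw error; // Not a timeout: let other failures propagate
  }
}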
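
Because Promise.race leaves the losing operation running, a timeout alone does not free the underlying resources. When the operation accepts an AbortSignal (as `fetch` does), the timer can cancel it instead. The sketch below assumes the operation honors the signal; `withAbortableTimeout` is a hypothetical helper name.

```typescript
// Sketch only: aborts the operation when the timer fires.
// The timeout is only enforced if `operation` actually reacts to the signal.
async function withAbortableTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await operation(controller.signal);
  } finally {
    clearTimeout(timer); // Avoid a stray abort after the operation has settled
  }
}

// Example call shape (the URL is a placeholder):
// const response = await withAbortableTimeout(
//   (signal) => fetch('https://example.com/api', { signal }),
//   5000,
// );
```

An operation that ignores the signal will run to completion regardless, so this approach complements the Promise.race version rather than replacing it.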

Graceful Degradation

Provide fallback behavior:

async function getDataWithFallback() {
  try {
    return await primaryDataSource.get();
  } catch (error) {
    logger.warn('Primary source failed, using fallback', { error });
    return await fallbackDataSource.get();
  }
}
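
When no second data source exists, a common fallback is the last value that was fetched successfully. The sketch below assumes that slightly stale data is acceptable to the caller; `withLastKnownGood` and the `degraded` flag are hypothetical names chosen for illustration.

```typescript
// Sketch only: serve the last successful value (or a default) when the fetch fails.
function withLastKnownGood<T>(fetchFresh: () => Promise<T>, defaultValue: T) {
  let lastGood: T | undefined;
  let hasGood = false;

  return async (): Promise<{ value: T; degraded: boolean }> => {
    try {
      lastGood = await fetchFresh();
      hasGood = true;
      return { value: lastGood, degraded: false };
    } catch {
      // Fall back to the cached value if there is one, otherwise to the default.
      return { value: hasGood ? (lastGood as T) : defaultValue, degraded: true };
    }
  };
}
```

Returning the `degraded` flag lets callers decide whether to show a notice or skip work that needs fresh data.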

Best Practices

  1. Circuit Breaker: Wrap calls to external services so a failing dependency is cut off quickly
  2. Retry: Retry only transient failures (network errors, timeouts), and only operations that are safe to repeat
  3. Timeout: Set an explicit timeout on every external call
  4. Fallback: Always provide fallback behavior
  5. Monitoring: Monitor circuit breaker states and retry rates
  6. Logging: Log all resilience actions for debugging

Common Mistakes

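  1. Retrying Non-Retryable Errors: Retrying 4xx client errors, which usually fail the same way on every attempt (408 and 429 are common exceptions)
  2. No Timeout: Missing timeouts on external calls, so a hung dependency ties up the caller
  3. No Fallback: No graceful degradation strategy
  4. Too Many Retries: Excessive retries that add load to a service that is already struggling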
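
The first mistake can be avoided with a small classification helper. The sketch below follows one common convention for HTTP status codes and is not a GoodGo standard: `isRetryableStatus` is a hypothetical name, and individual APIs may define their own rules.

```typescript
// Sketch only: one common convention for deciding whether an HTTP status is worth retrying.
function isRetryableStatus(status: number): boolean {
  // 408 Request Timeout and 429 Too Many Requests are transient client-side statuses.
  if (status === 408 || status === 429) return true;
  // 5xx indicates a server-side problem that may clear up on a later attempt.
  return status >= 500 && status <= 599;
}
```

For 429 and 503 responses, servers may send a Retry-After header, which is a better delay hint than a computed backoff when it is present.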

Resources