name: resilience-patterns
description: Resilience patterns for GoodGo microservices including circuit breaker, retry strategies, timeout handling, and graceful degradation. Use when implementing fault tolerance, handling external service failures, or improving system reliability.

Resilience Patterns

When to Use This Skill

Use this skill when:

  • Implementing circuit breaker patterns for external services
  • Adding retry logic for transient failures
  • Setting timeouts for long-running operations
  • Implementing graceful degradation strategies
  • Handling external service failures
  • Improving system fault tolerance

Core Concepts

Resilience Patterns

  1. Circuit Breaker: Prevents cascading failures by stopping calls to failing services
  2. Retry: Automatically retries failed operations with backoff
  3. Timeout: Sets maximum time limits for operations
  4. Bulkhead: Isolates resources (for example, by capping concurrent calls per dependency) so a failure in one area cannot exhaust the rest
  5. Graceful Degradation: Provides fallback behavior when services fail

Patterns

Circuit Breaker Pattern

Protects against cascading failures:

import CircuitBreaker from 'opossum';
import { logger } from '@goodgo/logger';

export const createCircuitBreaker = <TArgs extends any[], TResult>(
  action: (...args: TArgs) => Promise<TResult>,
  name: string,
  options: Partial<CircuitBreaker.Options> = {}
): CircuitBreaker<TArgs, TResult> => {
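  const breaker = new CircuitBreaker(action, {
    timeout: 3000, // fail any call that takes longer than 3 s
    errorThresholdPercentage: 50, // open the circuit once 50% of calls fail
    resetTimeout: 30000, // after 30 s, go half-open and allow a trial call
    ...options,
    name,
  });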

  breaker.on('open', () => {
    logger.warn(`Circuit Breaker OPEN: ${name}`);
  });

  breaker.on('halfOpen', () => {
    logger.info(`Circuit Breaker HALF-OPEN: ${name}`);
  });

  breaker.on('close', () => {
    logger.info(`Circuit Breaker CLOSED: ${name}`);
  });

  return breaker;
};

// Usage
const externalApiBreaker = createCircuitBreaker(
  async (data) => await externalApi.call(data),
  'external-api'
);

try {
  const result = await externalApiBreaker.fire(requestData);
} catch (error) {
  // Handle circuit breaker error or fallback
}
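
To make the `open`, `halfOpen`, and `close` events above concrete, here is a minimal hand-rolled sketch of the state machine behind them. It is not how opossum is implemented: it counts consecutive failures instead of using an error percentage, and `SimpleBreaker`, `failureThreshold`, and the injectable `now` clock are hypothetical names introduced only for illustration.

```typescript
type BreakerState = 'closed' | 'open' | 'halfOpen';

// Hypothetical teaching aid, not a replacement for opossum.
class SimpleBreaker<T> {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly action: () => Promise<T>;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    action: () => Promise<T>,
    failureThreshold = 3,
    resetTimeoutMs = 30000,
    now: () => number = Date.now, // injectable clock, useful in tests
  ) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  async fire(): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt >= this.resetTimeoutMs) {
        // The reset timeout has elapsed: let a trial call through.
        this.state = 'halfOpen';
      } else {
        // Fail fast without touching the failing service.
        throw new Error('Breaker is open');
      }
    }
    try {
      const result = await this.action();
      // Any success closes the circuit and clears the failure count.
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (error) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === 'halfOpen' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

This sketch ignores concurrency (several calls can pass through while half-open) and has no per-call timeout, which are among the reasons to prefer a maintained library in production code.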

Retry Pattern

Retry transient failures with exponential backoff:

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseDelay: number = 1000
): Promise<T> {
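  // attempt 0 is the initial call, so fn runs at most maxRetries + 1 times.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error;

      // Exponential backoff: baseDelay, then 2x, 4x, ...
      const delay = baseDelay * Math.pow(2, attempt);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  // Unreachable; satisfies the compiler's return-path check.
  throw new Error('Retry exhausted');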
}
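
`retryWithBackoff` above retries every error on a fixed doubling schedule. The variant below is a sketch of two common refinements: a predicate that skips non-transient errors, and random jitter so that many clients do not retry in lockstep. The names `retryWithJitter`, `RetryOptions`, `isRetryable`, `maxDelayMs`, and the injectable `sleep` are assumptions made for this example, not part of any GoodGo package.

```typescript
interface RetryOptions {
  maxRetries?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
  // Decides whether an error is worth retrying; defaults to retrying everything.
  isRetryable?: (error: unknown) => boolean;
  // Injectable for tests; defaults to a real timer-based sleep.
  sleep?: (ms: number) => Promise<void>;
}

async function retryWithJitter<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {},
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelayMs = 1000,
    maxDelayMs = 30000,
    isRetryable = () => true,
    sleep = (ms: number) =>
      new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up on the last attempt, or on errors a retry cannot fix.
      if (attempt >= maxRetries || !isRetryable(error)) throw error;

      // "Full jitter": a random delay in [0, min(cap, base * 2^attempt)).
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * ceiling);
    }
  }
}
```

A caller might pass `isRetryable: (e) => e instanceof Error && e.name === 'TimeoutError'`, or any other check that matches how the upstream client reports transient failures.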

Timeout Pattern

Set maximum time limits:

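async function withTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Operation timeout')), timeoutMs);
  });

  try {
    // Note: losing the race does not cancel the underlying operation.
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the timer so it cannot keep the process alive after settlement.
    clearTimeout(timer);
  }
}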

// Usage
try {
  const result = await withTimeout(
    externalService.call(),
    5000 // 5 second timeout
  );
} catch (error) {
  if (error instanceof Error && error.message === 'Operation timeout') {
    // Handle timeout
  }
}
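
The `Promise.race` approach above stops waiting but leaves the underlying operation running. Where the operation accepts an `AbortSignal` (as `fetch` does), a timeout can also cancel it. The following is a minimal sketch under that assumption; `withAbortableTimeout` is a hypothetical helper name, and it only works if the operation actually rejects when its signal is aborted.

```typescript
async function withAbortableTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  // Abort the operation, with a descriptive reason, once the deadline passes.
  const timer = setTimeout(
    () => controller.abort(new Error('Operation timeout')),
    timeoutMs,
  );
  try {
    return await operation(controller.signal);
  } finally {
    // Always clear the timer, whether the operation succeeded or failed.
    clearTimeout(timer);
  }
}
```

A call could look like `withAbortableTimeout((signal) => fetch(url, { signal }), 5000)`, where `url` stands in for the real endpoint.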

Graceful Degradation

Provide fallback behavior:

async function getDataWithFallback() {
  try {
    return await primaryDataSource.get();
  } catch (error) {
    logger.warn('Primary source failed, using fallback', { error });
    return await fallbackDataSource.get();
  }
}
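
Bulkhead Pattern

Bulkhead is listed under Core Concepts but has no example of its own. One common way to realize it is a concurrency limit per dependency, so that a slow service cannot absorb every available connection or worker. The class below is a minimal sketch of that idea; `Bulkhead`, `maxConcurrent`, and `run` are hypothetical names chosen for this example.

```typescript
class Bulkhead {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // All slots are busy: wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active += 1;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        // Pass the slot straight to the next waiter; active is unchanged.
        next();
      } else {
        this.active -= 1;
      }
    }
  }
}
```

The waiting queue here is unbounded. A stricter variant would cap the queue length and reject new work when it is full, so that callers fail fast instead of piling up behind a slow dependency.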

Best Practices

  1. Circuit Breaker: Wrap calls to external services in a circuit breaker
  2. Retry: Retry only transient failures (network errors, timeouts)
  3. Timeout: Set an explicit timeout on every external call
  4. Fallback: Define fallback behavior for each external dependency
  5. Monitoring: Monitor circuit breaker states and retry rates
  6. Logging: Log all resilience actions for debugging

Common Mistakes

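  1. Retrying Non-Retryable Errors: Retrying 4xx client errors, which fail the same way on every attempt (408 and 429 are common exceptions)
  2. No Timeout: Omitting timeouts on external calls, so a hung dependency ties up resources indefinitely
  3. No Fallback: Having no graceful degradation strategy when a dependency fails
  4. Too Many Retries: Retrying excessively, which adds latency and puts extra load on a service that is already struggling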
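
Mistakes 1 and 4 can both be guarded against with small helpers. The sketch below assumes HTTP upstreams and uses a conventional, not universal, set of retryable status codes; `isRetryableStatus`, `retryDelays`, and `budgetMs` are hypothetical names, and the status list should be adjusted to each upstream API's contract.

```typescript
// Statuses commonly treated as transient (an assumption, not a standard).
const RETRYABLE_STATUSES = new Set([408, 429, 500, 502, 503, 504]);

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Returns the backoff delays that fit within a total time budget, so the
// number of retries is bounded by elapsed time as well as by maxRetries.
function retryDelays(
  maxRetries: number,
  baseDelayMs: number,
  budgetMs: number,
): number[] {
  const delays: number[] = [];
  let total = 0;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const delay = baseDelayMs * 2 ** attempt;
    if (total + delay > budgetMs) break;
    delays.push(delay);
    total += delay;
  }
  return delays;
}
```

For example, `retryDelays(5, 1000, 5000)` yields `[1000, 2000]`: the third delay of 4000 ms would push the total past the 5000 ms budget, so retrying stops there.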

Resources