Data Consistency Patterns

Patterns for maintaining data consistency in a distributed microservices architecture

Overview Diagram

graph TD
    subgraph "Consistency Patterns"
        Saga[Saga Pattern<br/>Distributed Transactions]
        Outbox[Outbox Pattern<br/>Reliable Events]
        Idempotency[Idempotency<br/>Retry Safety]
        OptimisticLock[Optimistic Locking<br/>Concurrent Updates]
        CQRS[CQRS<br/>Read/Write Separation]
    end
    
    Service1[Service A] --> Saga
    Service2[Service B] --> Outbox
    Service3[Service C] --> Idempotency
    
    Saga --> EventualConsistency[Eventual Consistency]
    Outbox --> EventualConsistency
    Idempotency --> EventualConsistency
    OptimisticLock --> StrongConsistency[Strong Consistency]
    CQRS --> EventualConsistency
    
    style Saga fill:#e1f5ff
    style Outbox fill:#fff4e1
    style Idempotency fill:#f0e1ff
    style CQRS fill:#d4edda

Architecture Description

Architecture Overview

The GoodGo platform combines several consistency patterns to handle distributed data:

Core Challenges:

  • No distributed transactions (2PC too slow)
  • Services own their data (database per service)
  • Network failures can cause partial completion
  • Need to maintain data integrity across services

Pattern Selection:

  • Saga: For multi-service workflows
  • Outbox: For guaranteed event publishing
  • Idempotency: For safe retries
  • Optimistic Locking: For concurrent updates
  • CQRS: For read/write optimization

System Context

C4Context
    title System Context for Data Consistency in GoodGo Platform
    
    Person(user, "User", "End user performing actions")
    
    System_Boundary(goodgo, "GoodGo Microservices") {
        System(order_service, "Order Service", "Manages orders with Saga")
        System(payment_service, "Payment Service", "Processes payments")
        System(inventory_service, "Inventory Service", "Manages stock")
        System(saga_orchestrator, "Saga Orchestrator", "Coordinates distributed transactions")
        System(outbox_processor, "Outbox Processor", "Publishes events reliably")
    }
    
    System_Ext(db_order, "Order DB", "PostgreSQL with Outbox table")
    System_Ext(db_payment, "Payment DB", "PostgreSQL with version field")
    System_Ext(db_inventory, "Inventory DB", "PostgreSQL")
    System_Ext(kafka, "Event Bus", "Kafka - Event streaming")
    System_Ext(redis, "Cache", "Redis - Idempotency keys")
    
    Rel(user, order_service, "Places order", "HTTPS")
    Rel(order_service, saga_orchestrator, "Starts saga", "Internal")
    Rel(saga_orchestrator, payment_service, "Process payment", "HTTP")
    Rel(saga_orchestrator, inventory_service, "Reserve stock", "HTTP")
    
    Rel(order_service, db_order, "Writes + Outbox", "SQL")
    Rel(payment_service, db_payment, "Updates with version", "SQL")
    Rel(inventory_service, db_inventory, "Reads/Writes", "SQL")
    
    Rel(outbox_processor, db_order, "Polls outbox", "SQL")
    Rel(outbox_processor, kafka, "Publishes events", "Kafka Protocol")
    Rel(order_service, redis, "Checks idempotency key", "Redis Protocol")
    
    UpdateRelStyle(saga_orchestrator, payment_service, $lineColor="red", $textColor="red")
    UpdateRelStyle(saga_orchestrator, inventory_service, $lineColor="red", $textColor="red")

The GoodGo platform uses a database-per-service architecture where each service owns its data. Data consistency across services is achieved through patterns like Saga (for coordinated workflows), Outbox (for reliable event publishing), Idempotency (for safe retries), and Optimistic Locking (for concurrent updates). These patterns enable eventual consistency while maintaining data integrity.

Saga Pattern

sequenceDiagram
    participant Orchestrator
    participant OrderService
    participant PaymentService
    participant InventoryService
    
    Orchestrator->>OrderService: 1. Create Order
    OrderService-->>Orchestrator: Order Created
    
    Orchestrator->>PaymentService: 2. Process Payment
    PaymentService-->>Orchestrator: Payment Success
    
    Orchestrator->>InventoryService: 3. Reserve Inventory
    
    alt Inventory Reserved
        InventoryService-->>Orchestrator: Success
        Orchestrator->>Orchestrator: Complete Saga ✓
    else Inventory Failed
        InventoryService-->>Orchestrator: Failed ✗
        Orchestrator->>PaymentService: Compensate: Refund
        PaymentService-->>Orchestrator: Refunded
        Orchestrator->>OrderService: Compensate: Cancel Order
        OrderService-->>Orchestrator: Cancelled
    end

Description: A saga manages a distributed transaction as a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails.

Implementation:

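// IDs stay null until the corresponding step has succeeded.
// Assumes each service call returns a string ID.
interface SagaContext {
  orderId: string | null;
  paymentId: string | null;
  inventoryId: string | null;
}

// Saga orchestrator
class OrderSaga {
  async execute(orderData: OrderData): Promise<void> {
    const sagaContext: SagaContext = {
      orderId: null,
      paymentId: null,
      inventoryId: null
    };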
    
    try {
      // Step 1: Create order
      sagaContext.orderId = await orderService.create(orderData);
      
      // Step 2: Process payment
      sagaContext.paymentId = await paymentService.process(orderData.payment);
      
      // Step 3: Reserve inventory
      sagaContext.inventoryId = await inventoryService.reserve(orderData.items);
      
      // All success - commit
      await this.completeSaga(sagaContext);
    } catch (error) {
      // Compensate in reverse order
      await this.compensate(sagaContext, error);
      throw error;
    }
  }
  
  private async compensate(context: SagaContext, error: Error): Promise<void> {
    if (context.inventoryId) {
      await inventoryService.release(context.inventoryId);
    }
    if (context.paymentId) {
      await paymentService.refund(context.paymentId);
    }
    if (context.orderId) {
      await orderService.cancel(context.orderId);
    }
  }
}
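
The hard-coded compensation above can be generalized into a list of steps, each carrying its own undo action. The following is a minimal sketch under stated assumptions, not the GoodGo orchestrator: `SagaStep`, `SagaResult`, and `runSaga` are hypothetical names, and saga state is kept in memory rather than persisted. It also keeps compensating when one undo action fails, so a single failed refund does not leave the order uncancelled.

```typescript
// Hypothetical generic saga runner; names are illustrative, not GoodGo APIs.
interface SagaStep<C> {
  name: string;
  execute(ctx: C): Promise<void>;
  compensate(ctx: C): Promise<void>;
}

interface SagaResult {
  status: 'completed' | 'compensated';
  failedStep?: string;
  compensationErrors: string[];
}

async function runSaga<C>(steps: SagaStep<C>[], ctx: C): Promise<SagaResult> {
  const done: SagaStep<C>[] = [];
  for (const step of steps) {
    try {
      await step.execute(ctx);
      done.push(step);
    } catch {
      const compensationErrors: string[] = [];
      // Undo completed steps in reverse order; continue even if one undo fails.
      for (const completed of [...done].reverse()) {
        try {
          await completed.compensate(ctx);
        } catch (err) {
          compensationErrors.push(`${completed.name}: ${String(err)}`);
        }
      }
      return { status: 'compensated', failedStep: step.name, compensationErrors };
    }
  }
  return { status: 'completed', compensationErrors: [] };
}
```

Any entries in `compensationErrors` would typically need manual follow-up or a retry queue, since the saga is then only partially rolled back.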

Outbox Pattern

sequenceDiagram
    participant Service
    participant DB as Database
    participant OutboxTable as Outbox Table
    participant Processor as Outbox Processor
    participant Kafka
    
    Service->>DB: Begin Transaction
    Service->>DB: Update Business Data
    Service->>OutboxTable: Insert Event
    Service->>DB: Commit Transaction
    
    loop Every 5 seconds
        Processor->>OutboxTable: SELECT unpublished events
        OutboxTable-->>Processor: Events
        Processor->>Kafka: Publish Events
        Kafka-->>Processor: Ack
        Processor->>OutboxTable: Mark as published
    end

Description: Guarantees event publishing by storing each event in the database within the same transaction as the business data.

Implementation:

// Store event in outbox
async createUser(userData: CreateUserDto): Promise<User> {
  return await prisma.$transaction(async (tx) => {
    // Business operation
    const user = await tx.user.create({ data: userData });
    
    // Store event in outbox (same transaction)
    await tx.outbox.create({
      data: {
        aggregateId: user.id,
        aggregateType: 'User',
        eventType: 'user.created.v1',
        payload: JSON.stringify(user),
        createdAt: new Date()
      }
    });
    
    return user;
  });
}

// Outbox processor (runs periodically)
async processOutbox(): Promise<void> {
  const events = await prisma.outbox.findMany({
    where: { publishedAt: null },
    orderBy: { createdAt: 'asc' }, // publish oldest first to preserve event order
    take: 100
  });
  
  for (const event of events) {
    try {
      await kafkaProducer.send({
        topic: event.eventType,
        messages: [{ value: event.payload }]
      });
      
      await prisma.outbox.update({
        where: { id: event.id },
        data: { publishedAt: new Date() }
      });
    } catch (error) {
      logger.error('Failed to publish event', { event, error });
    }
  }
}
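
Because the processor publishes to Kafka first and marks the row as published afterwards, a crash between those two steps causes the same event to be published again on the next poll. Delivery is therefore at-least-once, and consumers need to tolerate duplicates. The sketch below shows one way a consumer might do that. It assumes the processor forwards the outbox row id with each message (the code above sends only the payload), and `OutboxMessage` and `DedupingConsumer` are hypothetical names. The in-memory `Set` stands in for a durable store.

```typescript
// Hypothetical message shape: assumes the outbox row id travels with the event.
interface OutboxMessage {
  eventId: string;
  eventType: string;
  payload: string;
}

class DedupingConsumer {
  private readonly seen = new Set<string>();
  private readonly handler: (msg: OutboxMessage) => void;

  constructor(handler: (msg: OutboxMessage) => void) {
    this.handler = handler;
  }

  // Returns true if the message was handled, false if it was a duplicate.
  consume(msg: OutboxMessage): boolean {
    if (this.seen.has(msg.eventId)) {
      return false;
    }
    this.handler(msg);
    // Record the id only after the handler succeeded, so a failure is retried.
    this.seen.add(msg.eventId);
    return true;
  }
}
```

In a real consumer the processed ids would usually be written to the consumer's own database in the same transaction as the handler's side effects.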

Idempotency Pattern

graph LR
    Request1[Request with<br/>Idempotency Key]
    Request2[Retry with<br/>Same Key]
    
    Request1 --> Check{Key Exists?}
    Check -->|No| Process[Process Request]
    Check -->|Yes| Return[Return Cached Result]
    
    Process --> Store[Store Result<br/>with Key]
    Store --> Response1[Response]
    
    Request2 --> Check
    Return --> Response2[Same Response]
    
    style Check fill:#fff3cd
    style Store fill:#d4edda

Description: Ensures an operation can be retried safely: a repeated request with the same idempotency key returns the stored result instead of executing the operation again.

Implementation:

// Idempotency middleware
async function idempotentOperation<T>(
  key: string,
  operation: () => Promise<T>,
  ttl: number = 86400 // 24 hours
): Promise<T> {
  // Check if already processed
  const cached = await redis.get(`idempotency:${key}`);
  if (cached) {
    return JSON.parse(cached);
  }
  
  // Process operation
  const result = await operation();
  
  // Store result
  await redis.setex(`idempotency:${key}`, ttl, JSON.stringify(result));
  
  return result;
}

// Usage in controller
async createPayment(req: Request, res: Response): Promise<void> {
  const idempotencyKey = req.headers['idempotency-key'] as string;
  
  if (!idempotencyKey) {
    res.status(400).json({ error: 'Idempotency-Key header required' });
    return;
  }
  
  const result = await idempotentOperation(
    idempotencyKey,
    () => paymentService.process(req.body)
  );
  
  res.json({ success: true, data: result });
}
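
The middleware above checks for the key and stores the result in two separate steps, so two concurrent requests with the same key can both miss the cache and both run the operation. One common way to close that gap is to reserve the key atomically before running the operation; with Redis this is typically done with `SET key value NX EX ttl`. The sketch below illustrates the idea with an in-memory stand-in, so `InMemoryKeyStore`, `Outcome`, and `runOnce` are hypothetical names rather than part of the GoodGo codebase.

```typescript
// In-memory stand-in for Redis; setIfAbsent mimics SET with the NX option.
type Entry = { state: 'in_progress' } | { state: 'done'; result: string };

class InMemoryKeyStore {
  private readonly entries = new Map<string, Entry>();

  setIfAbsent(key: string, entry: Entry): boolean {
    if (this.entries.has(key)) return false;
    this.entries.set(key, entry);
    return true;
  }
  get(key: string): Entry | undefined {
    return this.entries.get(key);
  }
  set(key: string, entry: Entry): void {
    this.entries.set(key, entry);
  }
  delete(key: string): void {
    this.entries.delete(key);
  }
}

type Outcome =
  | { kind: 'executed'; result: string }
  | { kind: 'replayed'; result: string }
  | { kind: 'in_progress' };

async function runOnce(
  store: InMemoryKeyStore,
  key: string,
  operation: () => Promise<string>
): Promise<Outcome> {
  // Reserve the key first so that only one caller can run the operation.
  if (!store.setIfAbsent(key, { state: 'in_progress' })) {
    const existing = store.get(key);
    if (existing && existing.state === 'done') {
      return { kind: 'replayed', result: existing.result };
    }
    return { kind: 'in_progress' }; // a controller could map this to HTTP 409
  }
  try {
    const result = await operation();
    store.set(key, { state: 'done', result });
    return { kind: 'executed', result };
  } catch (err) {
    store.delete(key); // release the reservation so a later retry can run
    throw err;
  }
}
```

A production version would also put a TTL on the reservation so that a crashed request does not block its key indefinitely.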

Optimistic Locking

sequenceDiagram
    participant User1
    participant User2
    participant Service
    participant DB
    
    User1->>Service: Read (version=1)
    User2->>Service: Read (version=1)
    
    User1->>Service: Update (version=1)
    Service->>DB: UPDATE WHERE version=1
    DB-->>Service: Success, version→2
    Service-->>User1: Success
    
    User2->>Service: Update (version=1)
    Service->>DB: UPDATE WHERE version=1
    DB-->>Service: No rows updated
    Service-->>User2: Conflict - version mismatch
    User2->>Service: Read (version=2)
    User2->>Service: Update (version=2)
    Service-->>User2: Success

Description: Prevents lost updates by checking version on update.

Implementation:

// Prisma schema
model User {
  id      String @id @default(cuid())
  email   String @unique
  name    String
  version Int    @default(1)  // Version field
}

// Update with optimistic locking
async updateUser(userId: string, data: UpdateUserDto, currentVersion: number): Promise<User> {
  const result = await prisma.user.updateMany({
    where: {
      id: userId,
      version: currentVersion  // Check version
    },
    data: {
      ...data,
      version: { increment: 1 }  // Increment version
    }
  });
  
  if (result.count === 0) {
    throw new ConflictError('Version mismatch - data was modified by another user');
  }
  
  // findUniqueOrThrow rejects if the row was deleted after the update
  return await prisma.user.findUniqueOrThrow({ where: { id: userId } });
}
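
The sequence diagram shows the losing writer re-reading the record and trying again. A minimal sketch of that retry loop with exponential backoff follows. The helper name `updateWithRetry`, its parameters, and the local `ConflictError` class are assumptions made for illustration; the `write` callback is expected to throw `ConflictError` when the version check fails.

```typescript
// Local stand-in for the ConflictError used above.
class ConflictError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, ConflictError.prototype);
    this.name = 'ConflictError';
  }
}

interface Versioned {
  version: number;
}

async function updateWithRetry<T extends Versioned>(
  read: () => Promise<T>,
  write: (current: T) => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 50
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    const current = await read(); // re-read to pick up the latest version
    try {
      return await write(current);
    } catch (err) {
      // Give up on non-conflict errors or once the attempt budget is spent.
      if (!(err instanceof ConflictError) || attempt >= maxAttempts) throw err;
      // Exponential backoff with full jitter: random delay up to base * 2^(attempt-1).
      const delay = Math.random() * baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

The jitter spreads out competing writers so they are less likely to collide again on the same retry.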

CQRS Pattern

graph LR
    subgraph "Write Side"
        Command[Command] --> WriteModel[Write Model<br/>Normalized]
        WriteModel --> Events[Domain Events]
    end
    
    subgraph "Read Side"
        Events --> Projection[Event Projection]
        Projection --> ReadModel[Read Model<br/>Denormalized]
        Query[Query] --> ReadModel
    end
    
    WriteModel --> DB1[(Write DB)]
    ReadModel --> DB2[(Read DB<br/>Optimized)]
    
    style WriteModel fill:#f0e1ff
    style ReadModel fill:#d4edda

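Description: Separates the read and write models so each can be optimized independently. The read model is rebuilt from domain events, so it is eventually consistent with the write model.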
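
The read side of the diagram can be sketched as a projection that folds domain events into a denormalized view. This is an illustrative sketch only: the event shapes, `OrderSummary`, and `OrderSummaryProjection` are hypothetical, and the `Map` stands in for the read database.

```typescript
// Hypothetical event and read-model shapes, for illustration only.
type OrderEvent =
  | { type: 'order.created'; orderId: string; customerId: string; total: number }
  | { type: 'order.cancelled'; orderId: string };

interface OrderSummary {
  orderId: string;
  customerId: string;
  total: number;
  status: 'open' | 'cancelled';
}

class OrderSummaryProjection {
  private readonly byId = new Map<string, OrderSummary>();

  // Projection: apply one domain event to the denormalized read model.
  apply(event: OrderEvent): void {
    switch (event.type) {
      case 'order.created':
        this.byId.set(event.orderId, {
          orderId: event.orderId,
          customerId: event.customerId,
          total: event.total,
          status: 'open',
        });
        break;
      case 'order.cancelled': {
        const existing = this.byId.get(event.orderId);
        if (existing) existing.status = 'cancelled';
        break;
      }
    }
  }

  // Query side: reads never touch the write model.
  openOrdersFor(customerId: string): OrderSummary[] {
    return Array.from(this.byId.values()).filter(
      (o) => o.customerId === customerId && o.status === 'open'
    );
  }
}
```

Both handlers are safe to apply twice, which matters when events arrive with at-least-once delivery.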

Performance Characteristics

Performance metrics and optimization strategies for data consistency patterns.

| Pattern | Latency Impact | Throughput | Notes |
| --- | --- | --- | --- |
| Saga Execution | 500ms - 2s | 100-500 sagas/s | Depends on number of steps and compensation |
| Outbox Processing | < 100ms | 10,000 events/s | Async processing, minimal user impact |
| Idempotency Check | < 10ms | 50,000 checks/s | Redis lookup, very fast |
| Optimistic Lock Update | < 50ms | 5,000 updates/s | Single DB operation with version check |
| CQRS Projection | 100ms - 1s | 1,000 events/s | Event processing to read model |
| Compensation Execution | 200ms - 1s | Varies | Rollback operations in saga |

Performance Optimization Strategies

Saga Pattern:

  • Minimize number of steps (< 5 steps ideal)
  • Parallel execution where possible
  • Cache service responses
  • Set appropriate timeouts (30s default)

Outbox Pattern:

  • Batch process outbox events (100-500 per batch)
  • Index publishedAt column for performance
  • Archive processed events periodically
  • Use connection pooling for Kafka

Idempotency:

  • Use Redis for fast key lookups
  • Set TTL to 24-48 hours
  • Hash long idempotency keys
  • Let Redis expire keys through their TTL; clean up manually only for keys stored without one

Optimistic Locking:

  • Works best for low-contention scenarios
  • Implement retry with exponential backoff
  • Monitor conflict rates (should be < 5%)
  • Consider pessimistic locking if conflicts > 10%
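
The two conflict-rate thresholds above can be expressed as a small decision helper. This is a sketch under assumptions: `adviseLocking` and its return labels are hypothetical, and the "investigate" band between 5% and 10% is an interpretation of the gap between the two stated thresholds.

```typescript
type LockingAdvice = 'optimistic' | 'investigate' | 'consider-pessimistic';

// Maps an observed conflict rate to the guidance listed above.
function adviseLocking(conflicts: number, attempts: number): LockingAdvice {
  if (attempts <= 0) return 'optimistic'; // no data yet
  const rate = conflicts / attempts;
  if (rate > 0.1) return 'consider-pessimistic'; // conflicts > 10%
  if (rate >= 0.05) return 'investigate'; // above the < 5% target (assumed band)
  return 'optimistic';
}
```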

Security Considerations

Security measures for protecting data consistency operations.

Saga Security

Compensation Protection:

  • Validate saga execution permissions at each step
  • Encrypt sensitive data in saga context
  • Log all saga executions for audit
  • Implement timeout to prevent hanging sagas

// Secure saga context
interface SecureSagaContext {
  sagaId: string;
  userId: string; // User who initiated
  permissions: string[]; // Required permissions
  encryptedData: string; // Encrypted sensitive data
  auditLog: AuditEntry[]; // Audit trail
}
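
The timeout item in the list above could be implemented by racing each saga step against a timer. The following is a minimal sketch: `withTimeout` and `SagaTimeoutError` are hypothetical names, and the 30-second default mirrors the timeout mentioned in the optimization strategies.

```typescript
class SagaTimeoutError extends Error {
  constructor(stepName: string, timeoutMs: number) {
    super(`Saga step "${stepName}" timed out after ${timeoutMs}ms`);
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, SagaTimeoutError.prototype);
    this.name = 'SagaTimeoutError';
  }
}

function withTimeout<T>(stepName: string, work: Promise<T>, timeoutMs = 30_000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new SagaTimeoutError(stepName, timeoutMs)), timeoutMs);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Note that the race does not cancel the underlying call. After a timeout the step's outcome is unknown, so its compensation should be safe to run whether or not the step eventually succeeded.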

Outbox Security

Event Payload Encryption:

  • Encrypt PII (Personally Identifiable Information) before storing in outbox
  • Use AES-256-GCM for event payload encryption
  • Decrypt only when publishing to Kafka
  • Rotate encryption keys quarterly
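
As an illustration of the AES-256-GCM item above, the sketch below encrypts and decrypts a payload with Node.js's built-in `crypto` module. The `EncryptedPayload` shape and the function names are assumptions, and key storage and rotation are out of scope here. The key must be 32 bytes.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

interface EncryptedPayload {
  iv: string; // base64, 12-byte nonce, unique per message
  authTag: string; // base64 GCM authentication tag
  ciphertext: string; // base64
}

function encryptPayload(plaintext: string, key: Buffer): EncryptedPayload {
  const iv = randomBytes(12); // never reuse a nonce with the same key
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return {
    iv: iv.toString('base64'),
    authTag: cipher.getAuthTag().toString('base64'),
    ciphertext: ciphertext.toString('base64'),
  };
}

function decryptPayload(payload: EncryptedPayload, key: Buffer): string {
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(payload.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(payload.authTag, 'base64'));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(payload.ciphertext, 'base64')),
    decipher.final(), // throws if the ciphertext or tag was tampered with
  ]);
  return plaintext.toString('utf8');
}
```

The three fields could be serialized together into the outbox `payload` column, with the processor calling `decryptPayload` just before publishing.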

Access Control:

  • Restrict outbox table access to outbox processor only
  • Use database roles and permissions
  • Monitor outbox table access patterns

Idempotency Security

Key Security:

  • Use cryptographic hashing for idempotency keys (SHA-256)
  • Include user context in key generation
  • Validate key ownership before processing
  • Clear keys on user logout for sensitive operations

// Secure idempotency key generation
import { createHash } from 'crypto';

function generateIdempotencyKey(
  operation: string,
  userId: string,
  data: unknown
): string {
  // JSON.stringify depends on property order, so callers must build `data` consistently
  const payload = JSON.stringify({ operation, userId, data });
  return createHash('sha256').update(payload).digest('hex');
}
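
Hashing the output of `JSON.stringify` means two logically equal payloads with different property order produce different keys. One way to avoid that is to serialize with sorted keys before hashing. The sketch below assumes plain JSON data (no `Date`, `Map`, or class instances); `canonicalize` and `stableIdempotencyKey` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// Serializes with object keys sorted, so logically equal inputs hash the same.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(',');
    return `{${body}}`;
  }
  // Primitives; undefined has no JSON form, so it is written as null.
  const primitive = JSON.stringify(value);
  return primitive === undefined ? 'null' : primitive;
}

function stableIdempotencyKey(operation: string, userId: string, data: unknown): string {
  const payload = canonicalize({ operation, userId, data });
  return createHash('sha256').update(payload).digest('hex');
}
```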

Optimistic Locking Security

Version Tampering Prevention:

  • Compare the client-supplied version with the stored version on the server
  • Never let the client set the new version value; increment it server-side
  • Log version conflicts for security monitoring
  • Rate limit update attempts per user

Deployment

How data consistency patterns are deployed and scaled.

graph TD
    subgraph "Production Deployment"
        subgraph "Order Service Cluster"
            OS1[Order Service<br/>Pod 1]
            OS2[Order Service<br/>Pod 2]
            OS3[Order Service<br/>Pod 3]
        end
        
        subgraph "Saga Orchestrator"
            SO1[Saga Orchestrator<br/>Pod 1]
            SO2[Saga Orchestrator<br/>Pod 2]
        end
        
        subgraph "Outbox Processor"
            OP1[Outbox Processor<br/>Pod 1]
            OP2[Outbox Processor<br/>Pod 2]
        end
        
        OS1 & OS2 & OS3 --> DB[(Order DB<br/>with Outbox)]
        OS1 & OS2 & OS3 --> Redis[(Redis<br/>Idempotency Keys)]
        
        SO1 & SO2 --> PS[Payment Service]
        SO1 & SO2 --> IS[Inventory Service]
        
        OP1 & OP2 --> DB
        OP1 & OP2 --> Kafka[Kafka Cluster<br/>5 brokers]
    end
    
    style SO1 fill:#e1f5ff
    style SO2 fill:#e1f5ff
    style OP1 fill:#fff4e1
    style OP2 fill:#fff4e1
    style DB fill:#d4edda
    style Kafka fill:#ffe1e1

Deployment Configuration

| Component | Replicas | Resources | HA Strategy |
| --- | --- | --- | --- |
| Saga Orchestrator | 2-3 | 512Mi RAM, 500m CPU | Leader election with etcd |
| Outbox Processor | 2-5 | 256Mi RAM, 250m CPU | Distributed lock per event batch |
| Services with Outbox | 3+ | Varies | Standard service scaling |
| Redis (Idempotency) | 3 nodes | 1Gi RAM each | Redis Cluster with replication |

Scaling Strategy

Saga Orchestrator:

  • Scale based on pending saga count
  • Use queue-based load distribution
  • Monitor saga execution duration

Outbox Processor:

  • Scale with database sharding (1 processor per shard)
  • Increase batch size before adding replicas
  • Monitor outbox table size and age

Idempotency Store (Redis):

  • Scale Redis cluster horizontally
  • Rely on Redis Cluster's hash-slot sharding for key distribution
  • Monitor memory usage (should be < 70%)

Monitoring & Observability

Monitoring strategies for data consistency patterns.

Key Metrics

Saga Metrics:

  • saga_executions_total - Total saga executions (success/failure)
  • saga_duration_seconds - Saga execution time histogram
  • saga_compensations_total - Total compensation executions
  • saga_timeout_total - Sagas that timed out
  • saga_pending_count - Sagas currently executing

Outbox Metrics:

  • outbox_events_total - Events written to outbox
  • outbox_published_total - Events published to Kafka
  • outbox_processing_lag_seconds - Time from write to publish
  • outbox_table_size - Outbox table row count
  • outbox_failed_events_total - Failed event publications

Idempotency Metrics:

  • idempotency_checks_total - Total idempotency checks
  • idempotency_hits_total - Duplicate requests prevented
  • idempotency_key_ttl_seconds - Average key TTL
  • idempotency_redis_errors_total - Redis failures

Optimistic Lock Metrics:

  • optimistic_lock_attempts_total - Update attempts with a version check
  • optimistic_lock_conflicts_total - Version conflicts detected
  • optimistic_lock_retries_total - Retry attempts after conflict
  • optimistic_lock_success_rate - Update success percentage

Alerts

Critical Alerts:

# Saga timeout rate too high
- alert: HighSagaTimeoutRate
  expr: rate(saga_timeout_total[5m]) > 0.05
  for: 5m
  labels:
    severity: critical

# Outbox processing lag
- alert: OutboxProcessingLag
  expr: outbox_processing_lag_seconds > 300
  for: 10m
  labels:
    severity: critical

# High optimistic lock conflict rate
- alert: HighOptimisticLockConflicts
  expr: rate(optimistic_lock_conflicts_total[5m]) / rate(optimistic_lock_attempts_total[5m]) > 0.1
  for: 5m
  labels:
    severity: warning

Monitoring Dashboard

Grafana Panels:

  1. Saga Orchestration Overview:

    • Saga execution rate (success/failure)
    • Average saga duration
    • Compensation rate
    • Pending saga count
  2. Outbox Processing Health:

    • Outbox publishing rate
    • Processing lag (P95, P99)
    • Failed events
    • Table size trend
  3. Idempotency Effectiveness:

    • Duplicate prevention rate
    • Redis hit rate
    • Key distribution
  4. Data Consistency SLA:

    • Overall consistency rate (target: 99.9%)
    • Mean time to consistency (MTTC)
    • Conflict resolution success rate

Distributed Tracing

Trace Saga Execution:

// Traced saga step
import { trace, SpanStatusCode } from '@opentelemetry/api';

async function executeStepWithTracing(
  step: SagaStep,
  context: SagaContext
): Promise<void> {
  const tracer = trace.getTracer('saga-orchestrator');
  const span = tracer.startSpan(`saga.step.${step.name}`, {
    attributes: {
      'saga.id': context.sagaId,
      'saga.step': step.name,
      'saga.attempt': context.currentAttempt
    }
  });

  try {
    await step.execute(context);
    span.setStatus({ code: SpanStatusCode.OK });
  } catch (error) {
    // Caught values are `unknown` in strict TypeScript; normalize before use
    const err = error instanceof Error ? error : new Error(String(error));
    span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
    span.recordException(err);
    throw error;
  } finally {
    span.end();
  }
}

Last Updated: 2026-01-07
Author: VelikHo (hongochai10@icloud.com)
Reviewers: To be assigned