
Ho Ngoc Hai 495618ded7 docs: Add Vietnamese-language documentation on security architecture, event-driven architecture, data consistency, observability, and caching, and update the existing guides and architecture documents.

2026-01-07 10:22:42 +07:00


Caching Architecture

Multi-layer caching strategy for optimal performance

Overview Diagram

graph TD
    Request[API Request] --> L1{L1 Cache<br/>Memory}
    
    L1 -->|Hit| Return1[Return<br/>< 1ms]
    L1 -->|Miss| L2{L2 Cache<br/>Redis}
    
    L2 -->|Hit| WarmL1[Warm L1]
    WarmL1 --> Return2[Return<br/>< 5ms]
    
    L2 -->|Miss| DB[(Database)]
    DB --> StoreL2[Store L2 + L1]
    StoreL2 --> Return3[Return<br/>< 50ms]
    
    style L1 fill:#d4edda
    style L2 fill:#fff4e1
    style DB fill:#f0e1ff

System Context

C4Context
    title Caching System Context

    System(service, "Microservice", "Client service using cache")
    System_Ext(db, "Neon PostgreSQL", "Primary database")
    
    Boundary(caching, "Caching Layer") {
        System(l1, "L1 Cache", "In-memory NodeCache")
        System(l2, "L2 Cache", "Redis Cluster")
    }

    Rel(service, l1, "Reads/Writes", "In-process")
    Rel(service, l2, "Reads/Writes", "Redis Protocol")
    Rel(l1, l2, "Fills from", "On miss")
    Rel(service, db, "Reads on cache miss (cache-aside)", "SQL")

Context Description

Service: Checks the in-process L1 cache first for the lowest latency, then L2.
L1 Cache: Local to each instance (not shared); entries expire automatically after a short TTL.
L2 Cache: Shared Redis cluster; holds data longer and gives all instances the same view of cached data.
Database: Source of truth; read only when both L1 and L2 miss.

Architecture Description

Multi-Layer Caching

The GoodGo platform uses two-layer caching to improve performance:

L1 Cache (Memory):

In-memory cache per service instance
Very fast access (< 1ms)
Limited capacity (10k keys default)
Short TTL (60 seconds default, max 5 minutes)
Not shared across instances

L2 Cache (Redis):

Shared distributed cache
Fast access (< 5ms)
Large capacity
Longer TTL (configurable, typically 5-15 minutes)
Shared across all service instances

Cache Flow:

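Request → L1 → L2 → Database
L1: 40-50% hit rate (share of all requests)
L2: 80-90% hit rate (share of requests that missed L1)
Database: 10-20% miss rate (share of L2 lookups that also miss)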
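The figures above do not say what the L2 rate is measured against. The sketch below assumes it is measured only over requests that missed L1. Under that reading, the combined hit rate is L1 + (1 − L1) × L2. The function name is illustrative. With the quoted ranges, the result spans 0.88 to 0.95. So the > 90% combined target under Performance Targets needs the middle or upper part of both ranges. Only (1 − L1) × (1 − L2), about 5-12% of all requests, reaches the database.

```typescript
// Combined hit rate of a two-layer cache, assuming the L2 hit rate is
// measured only over the requests that missed L1.
export function combinedHitRate(l1HitRate: number, l2HitRate: number): number {
  if ([l1HitRate, l2HitRate].some((r) => r < 0 || r > 1)) {
    throw new RangeError('hit rates must be between 0 and 1');
  }
  // Requests served by L1, plus the L1 misses that L2 then serves
  return l1HitRate + (1 - l1HitRate) * l2HitRate;
}

// Lower and upper ends of the ranges quoted above
const worst = combinedHitRate(0.4, 0.8); // ≈ 0.88
const best = combinedHitRate(0.5, 0.9);  // ≈ 0.95
```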

Cache Implementation

Multi-Layer Cache Service

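import NodeCache from 'node-cache';
import Redis from 'ioredis';
import { logger } from './logger'; // project logger (module path assumed)

export class MultiLayerCache {
  private l1Cache: NodeCache;
  private l2Cache: Redis;
  
  constructor() {
    // L1: Memory cache
    this.l1Cache = new NodeCache({
      stdTTL: 60,        // 60 seconds default
      maxKeys: 10000,    // Max 10k keys; set() throws once full (no eviction)
      checkperiod: 120   // Check for expired keys every 2 min
    });
    
    // L2: Redis cache
    this.l2Cache = new Redis({
      host: process.env.REDIS_HOST,
      port: parseInt(process.env.REDIS_PORT ?? '6379', 10),
      db: 0
    });
  }

  // Write to L1 without failing the request when L1 is full
  private setL1(key: string, value: unknown, ttl?: number): void {
    try {
      if (ttl === undefined) {
        this.l1Cache.set(key, value);
      } else {
        this.l1Cache.set(key, value, ttl);
      }
    } catch (err) {
      logger.debug('L1 cache write skipped', { key, err });
    }
  }
  
  async get<T>(key: string): Promise<T | null> {
    // Try L1 first (node-cache returns undefined on a miss)
    const l1Value = this.l1Cache.get<T>(key);
    if (l1Value !== undefined) {
      logger.debug('L1 cache hit', { key });
      return l1Value;
    }
    
    // Try L2 (ioredis returns null on a miss)
    const l2Value = await this.l2Cache.get(key);
    if (l2Value !== null) {
      logger.debug('L2 cache hit', { key });
      const parsed = JSON.parse(l2Value) as T;
      
      // Warm L1 cache with the default L1 TTL
      this.setL1(key, parsed);
      return parsed;
    }
    
    logger.debug('Cache miss', { key });
    return null;
  }
  
  async set(key: string, value: unknown, ttl: number = 300): Promise<void> {
    // Store in both L1 and L2; L1 TTL is capped at 5 minutes
    this.setL1(key, value, Math.min(ttl, 300));
    await this.l2Cache.setex(key, ttl, JSON.stringify(value));
  }
  
  async del(key: string): Promise<void> {
    this.l1Cache.del(key);
    await this.l2Cache.del(key);
  }
  
  async invalidatePattern(pattern: string): Promise<void> {
    // L1: node-cache has no pattern delete, so clear this instance's L1.
    // Other instances keep their L1 copies until the L1 TTL expires.
    this.l1Cache.flushAll();
    
    // L2: iterate with SCAN; KEYS blocks Redis while it walks the whole keyspace
    const stream = this.l2Cache.scanStream({ match: pattern, count: 100 });
    for await (const keys of stream) {
      if (keys.length > 0) {
        await this.l2Cache.del(...keys);
      }
    }
  }
}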
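MultiLayerCache only stores and returns values. The database step in the overview diagram (read on a full miss, then store in L2 + L1) is left to the caller. Below is a minimal cache-aside sketch. `getOrLoad` and `ReadThroughCache` are hypothetical names, and the interface only mirrors the `get`/`set` methods of MultiLayerCache.

```typescript
// Minimal shape of the cache used here (matches MultiLayerCache.get/set).
interface ReadThroughCache {
  get<T>(key: string): Promise<T | null>;
  set(key: string, value: unknown, ttl?: number): Promise<void>;
}

// Hypothetical cache-aside helper: L1 -> L2 -> loader (database), then store.
export async function getOrLoad<T>(
  cache: ReadThroughCache,
  key: string,
  loader: () => Promise<T>,
  ttlSeconds = 300,
): Promise<T> {
  const cached = await cache.get<T>(key);
  if (cached !== null) {
    return cached; // served from L1 or L2
  }
  const fresh = await loader();            // database read on a full miss
  await cache.set(key, fresh, ttlSeconds); // populates L2 and L1
  return fresh;
}

// Usage: const user = await getOrLoad(cache, key, () => userRepository.findById(id));
```

Concurrent misses on the same key each call the loader. This helper does not deduplicate them.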

Cache Key Naming

Pattern: {service}:{entity}:{identifier}[:{sub-resource}] (the sub-resource segment is optional)

Examples:

const cacheKeys = {
  user: (userId: string) => `iam:user:${userId}`,
  userPermissions: (userId: string) => `iam:user:${userId}:permissions`,
  userRoles: (userId: string) => `iam:user:${userId}:roles`,
  session: (sessionId: string) => `iam:session:${sessionId}`,
};

// Usage
const user = await cache.get(cacheKeys.user('user_123'));
const permissions = await cache.get(cacheKeys.userPermissions('user_123'));
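This naming scheme matters for `invalidatePattern`, because Redis key patterns are glob-style. A pattern such as `iam:user:{id}:*` matches every sub-resource key of that user. It does not match the base key `iam:user:{id}`, because that key has no trailing colon. The trailing colon also keeps `user_123` from matching `user_1234`. The matcher below is a simplified, hypothetical stand-in that supports only `*` and `?`. Real Redis patterns also support `[...]` classes and `\` escapes.

```typescript
// Simplified Redis-style glob matcher: `*` = any run of characters, `?` = one character.
export function matchesGlob(pattern: string, key: string): boolean {
  const regexSource = pattern
    .split('')
    .map((ch) => {
      if (ch === '*') return '.*';
      if (ch === '?') return '.';
      // Escape regex metacharacters so they match literally
      return ch.replace(/[.+^${}()|[\]\\]/g, '\\$&');
    })
    .join('');
  return new RegExp(`^${regexSource}$`).test(key);
}

matchesGlob('iam:user:user_123:*', 'iam:user:user_123:permissions'); // true
matchesGlob('iam:user:user_123:*', 'iam:user:user_123');             // false: base key
matchesGlob('iam:user:user_123:*', 'iam:user:user_1234:roles');      // false: other user
```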

TTL Strategies

graph LR
    subgraph "TTL Tiers"
        Short[Short TTL<br/>60-300s<br/>Frequently changing]
        Medium[Medium TTL<br/>300-1800s<br/>Moderately changing]
        Long[Long TTL<br/>1800-3600s<br/>Rarely changing]
    end
    
    Short --> Permissions[User Permissions]
    Short --> Sessions[Session Data]
    
    Medium --> UserProfiles[User Profiles]
    Medium --> OrgData[Organization Data]
    
    Long --> Config[Static Config]
    Long --> RefData[Reference Data]
    
    style Short fill:#f8d7da
    style Medium fill:#fff3cd
    style Long fill:#d4edda

TTL Guidelines:

Data Type	TTL	Reason
User permissions	5 min	Security-sensitive
Session data	Varies	Based on session length
User profiles	10 min	Moderate update frequency
Organization data	15 min	Infrequent updates
Static config	30-60 min	Very stable
Reference data	1-2 hours	Almost never changes
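Keeping the table's values as named constants avoids scattering magic numbers such as `300` across services. The sketch below is hypothetical. The names are illustrative, it takes the lower bound of each range, and it assumes a session entry should never outlive the session. The L1 cap mirrors the 5-minute limit in `MultiLayerCache.set`.

```typescript
// Hypothetical TTL constants in seconds, following the TTL Guidelines table.
export const TTL_SECONDS = {
  userPermissions: 5 * 60,
  userProfile: 10 * 60,
  organization: 15 * 60,
  staticConfig: 30 * 60,  // lower bound of 30-60 min
  referenceData: 60 * 60, // lower bound of 1-2 hours
} as const;

// L1 never keeps an entry longer than 5 minutes.
export const L1_MAX_TTL_SECONDS = 300;

export function l1TtlSeconds(ttl: number): number {
  return Math.min(ttl, L1_MAX_TTL_SECONDS);
}

// Session entries: cache no longer than the session remains valid.
// Returns 0 for expired sessions so callers can skip caching them
// (Redis SETEX rejects a TTL of 0).
export function sessionTtlSeconds(expiresAt: Date, now: Date = new Date()): number {
  const remainingMs = expiresAt.getTime() - now.getTime();
  return remainingMs > 0 ? Math.floor(remainingMs / 1000) : 0;
}
```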

Cache Invalidation

sequenceDiagram
    participant API
    participant Service
    participant Cache
    participant DB
    
    API->>Service: Update User
    Service->>DB: UPDATE user
    DB-->>Service: Success
    
    Service->>Cache: Invalidate iam:user:123
    Service->>Cache: Invalidate iam:user:123:permissions
    Service->>Cache: Invalidate iam:user:123:roles
    Cache-->>Service: Cleared
    
    Service-->>API: Success
    
    Note over Service,Cache: Next request will fetch fresh data

Invalidation Strategies:

// 1. Single key invalidation
async updateUser(userId: string, data: UpdateUserDto): Promise<User> {
  const user = await userRepository.update(userId, data);
  
  // Invalidate user cache
  await cache.del(cacheKeys.user(userId));
  
  return user;
}

// 2. Pattern-based invalidation
async updateUserRole(userId: string, roleId: string): Promise<void> {
  await userRoleRepository.assign(userId, roleId);
  
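  // Invalidate the user entry and all of its sub-resources.
  // The `:*` pattern does not match the base key iam:user:{id}, so delete it explicitly.
  await cache.del(cacheKeys.user(userId));
  await cache.invalidatePattern(`iam:user:${userId}:*`);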
}

// 3. Time-based invalidation (TTL expiry)
// Automatically handled by cache

Cache Warming

// Preload frequently accessed data
async function warmCache(): Promise<void> {
  logger.info('Starting cache warming');
  
  // Warm user permissions for active users
  const activeUsers = await userRepository.findActive({ limit: 1000 });
  
  for (const user of activeUsers) {
    const permissions = await rbacService.getUserPermissions(user.id);
    
    await cache.set(
      cacheKeys.userPermissions(user.id),
      permissions,
      300 // 5 minutes
    );
  }
  
  logger.info('Cache warming completed', { count: activeUsers.length });
}

// Run on service startup
warmCache().catch(err => logger.error('Cache warming failed', { err }));
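The loop above warms up to 1,000 users one at a time. That means 1,000 sequential permission lookups and cache writes, which counts against the < 30s warmup target under Performance Targets. One option is to keep a bounded number of lookups in flight so the database and Redis are not flooded. `forEachWithConcurrency` is a hypothetical helper, not an existing API.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
export async function forEachWithConcurrency<T>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<void>,
): Promise<void> {
  let next = 0;
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  // Each runner repeatedly claims the next unprocessed index; this is safe
  // because JavaScript runs this code on a single thread between awaits.
  const runners = Array.from({ length: runnerCount }, async () => {
    while (next < items.length) {
      const index = next++;
      await worker(items[index]);
    }
  });
  await Promise.all(runners);
}
```

Inside `warmCache`, the `for` loop would become `await forEachWithConcurrency(activeUsers, 20, async (user) => { ... })` with the same body. The limit of 20 is an arbitrary starting point to tune against database load.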

Design Decisions

Decision 1: Multi-layer Caching (L1 + L2)

Context: We need to reduce load on Redis and achieve ultra-low latency for hot data.
Decision: Combine an in-process L1 cache (NodeCache) with a shared L2 cache (Redis).
Consequences:

✅ Latency < 1ms for 40-50% of requests (L1 hits).
✅ Reduced network traffic to Redis.
❌ Synchronization complexity: after an update, L1 entries on other instances stay stale until their L1 TTL expires (60 seconds by default, at most 5 minutes).

Performance Characteristics

Performance Targets

Metric	Target	Notes
L1 Hit Latency	< 0.5ms	In-memory lookup
L2 Hit Latency	< 5ms	Network RTT + Redis processing
Combined Hit Rate	> 90%	L1 + L2 combined
L1 Capacity	10k items	Per instance limit to protect heap
Cache Warmup Time	< 30s	At service startup

Security Considerations

Cache Security

Encryption: Sensitive data (PII) MUST be encrypted (AES-256) before it is stored in L2 Redis. L1 may hold plaintext because it lives in process memory, although that plaintext would be exposed in a memory dump.
Isolation: The Redis instance is protected by a password and a Kubernetes NetworkPolicy that allows only internal cluster traffic.
TLS: Connect to Redis via TLS 1.2+.
Data Sanitization: Do not cache entire user objects if they contain password hashes or secrets.
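The Encryption rule names AES-256 but no cipher mode or storage format. One possible approach uses Node's built-in `crypto` module with AES-256-GCM, which detects tampering as well as encrypting. The function names and the `iv.tag.ciphertext` base64 layout below are assumptions. The 32-byte key would come from a secret store, not from source code. With this approach, L2 stores the encrypted string while L1 can keep the decoded value.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Encrypt a JSON-serialisable value before writing it to L2 (Redis).
export function encryptForCache(value: unknown, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(value), 'utf8'),
    cipher.final(),
  ]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  // '.' never appears in base64, so it is a safe separator
  return [iv, tag, ciphertext].map((b) => b.toString('base64')).join('.');
}

// Decrypt a value read from L2; throws if the data, tag, or key is wrong.
export function decryptFromCache<T>(payload: string, key: Buffer): T {
  const [iv, tag, ciphertext] = payload.split('.').map((p) => Buffer.from(p, 'base64'));
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  return JSON.parse(plaintext.toString('utf8')) as T;
}
```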

Deployment

graph TD
    subgraph "Kubernetes Pod"
        Service[Microservice Container]
        L1["L1 Cache (RAM)"]
        Service --- L1
    end

    subgraph "Infrastructure"
        RedisMaster[Redis Master]
        RedisSlave1[Redis Replica 1]
        RedisSlave2[Redis Replica 2]
    end

    Service -->|Write| RedisMaster
    Service -->|Read| RedisSlave1
    Service -->|Read| RedisSlave2

    RedisMaster -.->|Replication| RedisSlave1
    RedisMaster -.->|Replication| RedisSlave2

    style Service fill:#e1f5ff
    style L1 fill:#d4edda
    style RedisMaster fill:#fff4e1

Deployment Description:

L1: Embedded directly in Microservice process, scales with number of Pods.
L2: Redis Cluster (or Sentinel) with at least 3 nodes for High Availability.
Connection Management: Reuse one long-lived ioredis client per process. ioredis multiplexes concurrent commands over a single connection per client and does not maintain a connection pool.
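The service connects through ioredis, so the password and TLS 1.2+ requirements from Security Considerations map onto client options. The helper below is hypothetical, and so are the environment variable names. It assumes ioredis forwards its `tls` option to Node's `tls.connect`, where `minVersion: 'TLSv1.2'` refuses older protocol versions.

```typescript
// Hypothetical helper building an options object for `new Redis(...)` (ioredis).
export function redisOptionsFromEnv(env: Record<string, string | undefined>) {
  return {
    host: env.REDIS_HOST ?? 'localhost',
    port: Number.parseInt(env.REDIS_PORT ?? '6379', 10),
    password: env.REDIS_PASSWORD, // sent with AUTH when the connection opens
    db: 0,
    // Forwarded to Node's tls.connect(); rejects anything older than TLS 1.2
    tls: { minVersion: 'TLSv1.2' as const },
  };
}

// Usage, one long-lived client per process:
// const redis = new Redis(redisOptionsFromEnv(process.env));
```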

Monitoring & Observability

Monitoring Metrics

Metrics: Prometheus metrics for hit rate, miss rate, latency, memory usage.
Logs: Log cache miss/hit at debug level (sampled), log connection errors at error level.
Health Checks: Readiness probe checks connection to Redis.
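A readiness check only needs one cheap round trip to Redis. The sketch below assumes a client with a `ping()` method (ioredis provides one that resolves to `'PONG'`). The function name and the 500 ms timeout are assumptions. Exposing the result on an HTTP endpoint for the Kubernetes probe is not shown.

```typescript
// Minimal client shape; an ioredis client satisfies it.
interface Pingable {
  ping(): Promise<string>;
}

// Hypothetical readiness check: ready only if Redis answers PONG within timeoutMs.
export async function isRedisReady(client: Pingable, timeoutMs = 500): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<false>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    // A rejected ping (connection error) counts as "not ready"
    const reply = client.ping().then((r) => r === 'PONG').catch(() => false);
    return await Promise.race([reply, timedOut]);
  } finally {
    clearTimeout(timer);
  }
}
```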

Monitoring Code

Cache Hit Rates:

// Track cache performance
export class CacheMetrics {
  // ... Prometheus Implementation ...
}
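The Prometheus implementation is elided above. To show what the hit-rate metrics measure, here is a dependency-free sketch of the counting logic only. `CacheHitCounter` is a hypothetical name. In production these counts would be exported as Prometheus counters, for example through the prom-client library. The sketch assumes every request checks L1 and only L1 misses reach L2, as in `MultiLayerCache.get`.

```typescript
type Layer = 'l1' | 'l2';

// Hypothetical in-process counters for cache lookups per layer.
export class CacheHitCounter {
  private counts: Record<Layer, { hits: number; misses: number }> = {
    l1: { hits: 0, misses: 0 },
    l2: { hits: 0, misses: 0 },
  };

  record(layer: Layer, hit: boolean): void {
    if (hit) this.counts[layer].hits += 1;
    else this.counts[layer].misses += 1;
  }

  // Hit rate of one layer, relative to the lookups that reached it.
  hitRate(layer: Layer): number {
    const { hits, misses } = this.counts[layer];
    const total = hits + misses;
    return total === 0 ? 0 : hits / total;
  }

  // Share of all requests served by either layer (every request checks L1).
  combinedHitRate(): number {
    const requests = this.counts.l1.hits + this.counts.l1.misses;
    return requests === 0 ? 0 : (this.counts.l1.hits + this.counts.l2.hits) / requests;
  }
}
```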

Expected Performance:

Metric	L1 Cache	L2 Cache	Database
Latency	< 1ms	< 5ms	< 50ms
Hit Rate	40-50%	80-90% (of L1 misses)	-
Capacity	10k keys	Limited by Redis memory (maxmemory)	-

Best Practices

DO:

✅ Use cache for frequently accessed data
✅ Set appropriate TTLs based on data change frequency
✅ Invalidate cache on data updates
✅ Use cache key namespacing
✅ Monitor cache hit rates
✅ Warm cache on startup for critical data

DON'T:

❌ Cache data that changes very frequently
❌ Set TTL too long (stale data risk)
❌ Set TTL too short (negates cache benefit)
❌ Cache sensitive data without encryption
❌ Ignore cache invalidation on updates
❌ Use cache as primary data store
