# Kiến trúc Caching / Caching Architecture

> **VI**: Chiến lược caching nhiều tầng để tối ưu hiệu suất
>
> **EN**: Multi-layer caching strategy for optimal performance

## Sơ đồ Tổng quan / Overview Diagram

```mermaid
graph TD
    Request[API Request] --> L1{L1 Cache<br/>Memory}
    L1 -->|Hit| Return1[Return<br/>< 1ms]
    L1 -->|Miss| L2{L2 Cache<br/>Redis}
    L2 -->|Hit| WarmL1[Warm L1]
    WarmL1 --> Return2[Return<br/>< 5ms]
    L2 -->|Miss| DB[(Database)]
    DB --> StoreL2[Store L2 + L1]
    StoreL2 --> Return3[Return<br/>< 50ms]

    style L1 fill:#d4edda
    style L2 fill:#fff4e1
    style DB fill:#f0e1ff
```

## Bối cảnh Hệ thống / System Context

```mermaid
C4Context
    title Sơ đồ Bối cảnh Hệ thống Caching / Caching System Context

    System(service, "Microservice", "Client service using cache")
    System_Ext(db, "Neon PostgreSQL", "Primary database")

    Boundary(caching, "Caching Layer") {
        System(l1, "L1 Cache", "In-memory NodeCache")
        System(l2, "L2 Cache", "Redis Cluster")
    }

    Rel(service, l1, "Reads/Writes", "In-process")
    Rel(service, l2, "Reads/Writes", "Redis Protocol")
    Rel(l1, l2, "Fills from", "On miss")
    Rel(l2, db, "Cache aside", "On miss")
```

### VI Mô tả Bối cảnh

- **Service**: Giao tiếp trực tiếp với L1 Cache (in-memory) để đạt độ trễ thấp nhất.
- **L1 Cache**: Cache cục bộ, không chia sẻ, tự động hết hạn (TTL ngắn).
- **L2 Cache**: Redis cluster chia sẻ, giữ dữ liệu lâu dài hơn và đồng bộ giữa các instances.
- **Database**: Nguồn dữ liệu gốc (source of truth), chỉ được truy cập khi cache miss.

### EN Context Description

- **Service**: Communicates directly with L1 Cache (in-memory) for lowest latency.
- **L1 Cache**: Local cache, not shared, automatic expiration (short TTL).
- **L2 Cache**: Shared Redis cluster, holds data longer and syncs across instances.
- **Database**: Source of truth, accessed only on cache miss.
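The read path described above (L1 first, then L2, then the database, filling the faster layers on the way back) can be sketched as a small helper. This is a minimal illustration under stated assumptions, not the platform's implementation: `CacheLayer`, `MapLayer`, and `getOrLoad` are hypothetical names, and the Map-backed layers stand in for NodeCache and Redis so that the snippet runs on its own.

```typescript
// Hypothetical interface: any store with async get/set and a TTL in seconds.
interface CacheLayer {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory stand-in used for both layers in this sketch (assumption:
// the real L1 would be NodeCache and the real L2 would be Redis).
class MapLayer implements CacheLayer {
  private store = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | undefined> {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= Date.now()) {
      this.store.delete(key); // lazily drop expired entries
      return undefined;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside read: L1, then L2 (warming L1 on a hit), then the loader.
async function getOrLoad<T>(
  key: string,
  l1: CacheLayer,
  l2: CacheLayer,
  loadFromDb: () => Promise<T>,
  ttl = { l1Seconds: 60, l2Seconds: 300 }
): Promise<T> {
  const fromL1 = await l1.get(key);
  if (fromL1 !== undefined) return JSON.parse(fromL1) as T;

  const fromL2 = await l2.get(key);
  if (fromL2 !== undefined) {
    await l1.set(key, fromL2, ttl.l1Seconds); // warm L1 from L2
    return JSON.parse(fromL2) as T;
  }

  // Miss in both layers: read the source of truth and fill L2, then L1.
  const value = await loadFromDb();
  const serialized = JSON.stringify(value);
  await l2.set(key, serialized, ttl.l2Seconds);
  await l1.set(key, serialized, ttl.l1Seconds);
  return value;
}
```

Passing the loader as a callback keeps the database query next to the call site, so callers cannot forget to populate the cache after a miss.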
## Mô tả Kiến trúc / Architecture Description

### VI: Caching Nhiều Tầng

Nền tảng GoodGo sử dụng caching 2 tầng để tối ưu hiệu suất:

**L1 Cache (Memory)**:
- In-memory cache trên mỗi service instance
- Truy cập rất nhanh (< 1ms)
- Dung lượng giới hạn (10k keys mặc định)
- TTL ngắn (60 giây mặc định, tối đa 5 phút)
- Không share giữa instances

**L2 Cache (Redis)**:
- Shared distributed cache
- Truy cập nhanh (< 5ms)
- Dung lượng lớn
- TTL dài hơn (configurable, thường 5-15 phút)
- Share giữa tất cả service instances

**Cache Flow**:
```
Request → L1 → L2 → Database
          ↓    ↓       ↓
        40-50% 80-90% 10-20%
        hit    hit    cache miss
        rate   rate   rate
```

### EN: Multi-Layer Caching

The GoodGo platform uses 2-layer caching for performance:

**L1 Cache (Memory)**:
- In-memory cache per service instance
- Very fast access (< 1ms)
- Limited capacity (10k keys default)
- Short TTL (60 seconds default, max 5 minutes)
- Not shared across instances

**L2 Cache (Redis)**:
- Shared distributed cache
- Fast access (< 5ms)
- Large capacity
- Longer TTL (configurable, typically 5-15 minutes)
- Shared across all service instances

**Cache Flow**:
```
Request → L1 → L2 → Database
          ↓    ↓       ↓
        40-50% 80-90% 10-20%
        hit    hit    cache miss
        rate   rate   rate
```

## Triển khai Cache / Cache Implementation

### Multi-Layer Cache Service

```typescript
// VI: Triển khai multi-layer cache
// EN: Multi-layer cache implementation
import NodeCache from 'node-cache';
import Redis from 'ioredis';
// VI: `logger` là structured logger dùng chung của service (import được lược bỏ)
// EN: `logger` is the service's shared structured logger (import omitted)

export class MultiLayerCache {
  private l1Cache: NodeCache;
  private l2Cache: Redis;

  constructor() {
    // VI: L1: Memory cache
    // EN: L1: Memory cache
    this.l1Cache = new NodeCache({
      stdTTL: 60,       // VI: 60 giây mặc định / EN: 60 seconds default
      maxKeys: 10000,   // VI: Tối đa 10k keys / EN: Max 10k keys
      checkperiod: 120  // VI: Kiểm tra expired keys mỗi 2 phút / EN: Check for expired keys every 2min
    });

    // VI: L2: Redis cache
    // EN: L2: Redis cache
    this.l2Cache = new Redis({
      host: process.env.REDIS_HOST,
      port: parseInt(process.env.REDIS_PORT ?? '6379', 10),
      db: 0
    });
  }

  async get<T>(key: string): Promise<T | null> {
    // VI: Thử L1 trước
    // EN: Try L1 first
    const l1Value = this.l1Cache.get<T>(key);
    // VI: So sánh với undefined để giá trị falsy (0, '', false) vẫn được tính là hit
    // EN: Compare against undefined so falsy values (0, '', false) still count as hits
    if (l1Value !== undefined) {
      logger.debug('L1 cache hit', { key });
      return l1Value;
    }

    // VI: Thử L2
    // EN: Try L2
    const l2Value = await this.l2Cache.get(key);
    if (l2Value !== null) {
      logger.debug('L2 cache hit', { key });
      const parsed = JSON.parse(l2Value) as T;

      // VI: Làm ấm L1 cache
      // EN: Warm L1 cache
      this.l1Cache.set(key, parsed);
      return parsed;
    }

    logger.debug('Cache miss', { key });
    return null;
  }

  async set(key: string, value: unknown, ttl: number = 300): Promise<void> {
    // VI: Lưu vào cả L1 và L2
    // EN: Store in both L1 and L2
    this.l1Cache.set(key, value, Math.min(ttl, 300)); // VI: L1 tối đa 5 phút / EN: L1 max 5min
    await this.l2Cache.setex(key, ttl, JSON.stringify(value));
  }

  async del(key: string): Promise<void> {
    this.l1Cache.del(key);
    await this.l2Cache.del(key);
  }

  async invalidatePattern(pattern: string): Promise<void> {
    // VI: L1: Xóa tất cả (cách đơn giản)
    // EN: L1: Clear all (simple approach)
    this.l1Cache.flushAll();

    // VI: L2: Xóa theo pattern, dùng SCAN thay vì KEYS để không block Redis
    // EN: L2: Delete by pattern, using SCAN instead of KEYS so Redis is not blocked
    const stream = this.l2Cache.scanStream({ match: pattern, count: 100 });
    for await (const keys of stream) {
      if (keys.length > 0) {
        await this.l2Cache.del(...keys);
      }
    }
  }
}
```

### Quy ước Đặt tên Key / Cache Key Naming

**Pattern**: `{service}:{entity}:{identifier}:{sub-resource}`

**Ví dụ / Examples**:

```typescript
// VI: User cache keys
// EN: User cache keys
const cacheKeys = {
  user: (userId: string) => `iam:user:${userId}`,
  userPermissions: (userId: string) => `iam:user:${userId}:permissions`,
  userRoles: (userId: string) => `iam:user:${userId}:roles`,
  session: (sessionId: string) => `iam:session:${sessionId}`,
};

// VI: Sử dụng
// EN: Usage
const user = await cache.get(cacheKeys.user('user_123'));
const permissions = await cache.get(cacheKeys.userPermissions('user_123'));
```

## Chiến lược TTL / TTL Strategies

```mermaid
graph LR
    subgraph "TTL Tiers"
        Short[Short TTL<br/>60-300s<br/>Frequently changing]
        Medium[Medium TTL<br/>300-1800s<br/>Moderately changing]
        Long[Long TTL<br/>1800-3600s<br/>Rarely changing]
    end

    Short --> Permissions[User Permissions]
    Short --> Sessions[Session Data]
    Medium --> UserProfiles[User Profiles]
    Medium --> OrgData[Organization Data]
    Long --> Config[Static Config]
    Long --> RefData[Reference Data]

    %% style Short fill:#f8d7da
    %% style Medium fill:#fff3cd
    %% style Long fill:#d4edda
```

**Hướng dẫn TTL / TTL Guidelines**:

| Loại Dữ liệu / Data Type | TTL | Lý do / Reason |
|---------------------------|-----|----------------|
| User permissions | 5 min | Security-sensitive / Nhạy cảm bảo mật |
| Session data | Varies | Based on session length / Dựa trên độ dài session |
| User profiles | 10 min | Moderate update frequency / Tần suất cập nhật vừa phải |
| Organization data | 15 min | Infrequent updates / Cập nhật không thường xuyên |
| Static config | 30-60 min | Very stable / Rất ổn định |
| Reference data | 1-2 hours | Almost never changes / Hầu như không thay đổi |

## Vô hiệu hóa Cache / Cache Invalidation

```mermaid
sequenceDiagram
    participant API
    participant Service
    participant Cache
    participant DB

    API->>Service: Update User
    Service->>DB: UPDATE user
    DB-->>Service: Success
    Service->>Cache: Invalidate user:123
    Service->>Cache: Invalidate user:123:permissions
    Service->>Cache: Invalidate user:123:roles
    Cache-->>Service: Cleared
    Service-->>API: Success

    Note over Service,Cache: Next request will fetch fresh data
```

**Chiến lược Invalidation / Invalidation Strategies**:

```typescript
// VI: Các method dưới đây thuộc một service class (phần thân class được lược bỏ)
// EN: The methods below belong to a service class (class body omitted)

// VI: 1. Invalidation single key
// EN: 1. Single key invalidation
async updateUser(userId: string, data: UpdateUserDto): Promise<User> {
  const user = await userRepository.update(userId, data);

  // VI: Vô hiệu hóa user cache
  // EN: Invalidate user cache
  await cache.del(cacheKeys.user(userId));

  return user;
}

// VI: 2. Invalidation theo pattern
// EN: 2. Pattern-based invalidation
async updateUserRole(userId: string, roleId: string): Promise<void> {
  await userRoleRepository.assign(userId, roleId);

  // VI: Vô hiệu hóa tất cả cache liên quan đến user
  // EN: Invalidate all user-related cache
  await cache.invalidatePattern(`iam:user:${userId}:*`);
}

// VI: 3. Invalidation theo thời gian (TTL expiry)
// EN: 3. Time-based invalidation (TTL expiry)
// VI: Tự động xử lý bởi cache
// EN: Automatically handled by cache
```

## Làm ấm Cache / Cache Warming

```typescript
// VI: Preload dữ liệu thường xuyên truy cập
// EN: Preload frequently accessed data
async function warmCache(): Promise<void> {
  logger.info('Starting cache warming');

  // VI: Làm ấm user permissions cho active users
  // EN: Warm user permissions for active users
  const activeUsers = await userRepository.findActive({ limit: 1000 });

  for (const user of activeUsers) {
    const permissions = await rbacService.getUserPermissions(user.id);
    await cache.set(
      cacheKeys.userPermissions(user.id),
      permissions,
      300 // VI: 5 phút / EN: 5 minutes
    );
  }

  logger.info('Cache warming completed', { count: activeUsers.length });
}

// VI: Chạy khi service khởi động
// EN: Run on service startup
warmCache().catch(err => logger.error('Cache warming failed', { err }));
```

## Quyết định Thiết kế / Design Decisions

### Quyết định 1 / Decision 1: Multi-layer Caching (L1 + L2)

**VI Bối cảnh**: Cần giảm tải cho Redis và đạt độ trễ cực thấp cho dữ liệu hot.

**VI Quyết định**: Sử dụng kết hợp L1 (NodeCache) và L2 (Redis).

**VI Hậu quả**:
- ✅ Độ trễ < 1ms cho 40-50% requests.
- ✅ Giảm network traffic tới Redis.
- ❌ Phức tạp trong đồng bộ (L1 có thể stale trong thời gian ngắn).

**EN Context**: Need to reduce load on Redis and achieve ultra-low latency for hot data.

**EN Decision**: Use a combination of L1 (NodeCache) and L2 (Redis).

**EN Consequences**:
- ✅ Latency < 1ms for 40-50% of requests.
- ✅ Reduced network traffic to Redis.
- ❌ Synchronization complexity (an L1 entry on one instance can stay stale for a short time, until its TTL expires).
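To make the latency consequence concrete, the figures quoted in this document can be combined into an expected per-request latency. The sketch below is a worked calculation under stated assumptions: it reads "80-90%" as the cumulative share of requests answered by L1 or L2 (so 10-20% reach the database), and it treats each latency bound (1 ms, 5 ms, 50 ms) as the full cost of that path. `LayerProfile` and `expectedLatencyMs` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape describing where requests are served and at what cost.
interface LayerProfile {
  l1HitRate: number;         // fraction of requests answered by L1
  cumulativeHitRate: number; // fraction answered by L1 or L2 (assumed reading of "80-90%")
  l1LatencyMs: number;
  l2LatencyMs: number;
  dbLatencyMs: number;
}

// Weighted average of the three paths: L1 hit, L2 hit, database fallback.
function expectedLatencyMs(p: LayerProfile): number {
  if (p.l1HitRate < 0 || p.cumulativeHitRate > 1 || p.l1HitRate > p.cumulativeHitRate) {
    throw new RangeError('hit rates must satisfy 0 <= l1HitRate <= cumulativeHitRate <= 1');
  }
  const l2Share = p.cumulativeHitRate - p.l1HitRate;
  const dbShare = 1 - p.cumulativeHitRate;
  return p.l1HitRate * p.l1LatencyMs + l2Share * p.l2LatencyMs + dbShare * p.dbLatencyMs;
}

// Midpoints of the documented ranges: 45% L1, 85% cumulative.
const midpoint = expectedLatencyMs({
  l1HitRate: 0.45,
  cumulativeHitRate: 0.85,
  l1LatencyMs: 1,
  l2LatencyMs: 5,
  dbLatencyMs: 50,
});
console.log(`expected latency ≈ ${midpoint.toFixed(2)} ms`); // 0.45*1 + 0.40*5 + 0.15*50 ≈ 9.95
```

Under these assumptions the database path contributes 7.5 ms of the roughly 9.95 ms total, so raising the combined hit rate moves the average more than shaving L1 latency does.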
## Đặc điểm Hiệu suất / Performance Characteristics

### VI: Mục tiêu Hiệu suất

| Chỉ số / Metric | Mục tiêu / Target | Ghi chú / Notes |
|-----------------|-------------------|-----------------|
| **L1 Hit Latency** | < 0.5ms | In-memory lookup |
| **L2 Hit Latency** | < 5ms | Network RTT + Redis processing |
| **Combined Hit Rate** | > 90% | L1 + L2 combined |
| **L1 Capacity** | 10k items | Per instance limit to protect heap |
| **Cache Warmup Time** | < 30s | At service startup |

### EN: Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| **L1 Hit Latency** | < 0.5ms | In-memory lookup |
| **L2 Hit Latency** | < 5ms | Network RTT + Redis processing |
| **Combined Hit Rate** | > 90% | L1 + L2 combined |
| **L1 Capacity** | 10k items | Per instance limit to protect heap |
| **Cache Warmup Time** | < 30s | At service startup |

## Cân nhắc Bảo mật / Security Considerations

### VI: Bảo mật Cache

- **Encryption**: Dữ liệu nhạy cảm (PII) PHẢI được mã hóa trước khi lưu vào L2 Redis (AES-256). L1 có thể lưu plaintext vì nằm trong memory của process (chỉ lộ khi có memory dump).
- **Isolation**: Redis instance được bảo vệ bằng mật khẩu và Network Policy (chỉ allow traffic từ nội bộ K8s).
- **TLS**: Kết nối tới Redis qua TLS 1.2+.
- **Data Sanitization**: Không cache toàn bộ user object nếu chứa password hash hoặc secrets.

### EN: Cache Security

- **Encryption**: Sensitive data (PII) MUST be encrypted before it is stored in L2 Redis (AES-256). L1 can hold plaintext because it lives in process memory (exposed only through a memory dump).
- **Isolation**: The Redis instance is protected by a password and a Network Policy (internal K8s traffic only).
- **TLS**: Connect to Redis via TLS 1.2+.
- **Data Sanitization**: Do not cache entire user objects if they contain password hashes or secrets.
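The encryption rule above can be illustrated with Node's built-in `node:crypto` module. This is a minimal sketch, not the platform's implementation: the text only specifies AES-256, so the GCM mode, the `iv | tag | ciphertext` payload layout, and the helper names `encryptForCache` and `decryptFromCache` are assumptions made for the example. Key management (loading the 32-byte key from a secret store, rotation) is out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

const IV_BYTES = 12;  // nonce size commonly used with GCM
const TAG_BYTES = 16; // GCM authentication tag size

// Serialize and encrypt a value before it is written to the shared L2 cache.
// Assumed payload layout: base64(iv | authTag | ciphertext).
export function encryptForCache(value: unknown, key: Buffer): string {
  const iv = randomBytes(IV_BYTES); // fresh random nonce per write
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(value), 'utf8'),
    cipher.final(),
  ]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString('base64');
}

// Decrypt and parse a payload read from L2. Throws if the key is wrong
// or the payload was modified, because the GCM tag no longer verifies.
export function decryptFromCache<T>(payload: string, key: Buffer): T {
  const raw = Buffer.from(payload, 'base64');
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);

  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  return JSON.parse(plaintext.toString('utf8')) as T;
}
```

An authenticated mode is used in the sketch so that a tampered cache entry fails on read instead of being parsed, which also fits the cache poisoning concern raised later in this document.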
## Triển khai / Deployment

```mermaid
graph TD
    subgraph "Kubernetes Pod"
        Service[Microservice Container]
        L1["L1 Cache (RAM)"]
        Service --- L1
    end

    subgraph "Infrastructure"
        RedisMaster[Redis Master]
        RedisSlave1[Redis Slave 1]
        RedisSlave2[Redis Slave 2]
    end

    Service -->|Write| RedisMaster
    Service -->|Read| RedisSlave1
    Service -->|Read| RedisSlave2
    RedisMaster -.->|Replication| RedisSlave1
    RedisMaster -.->|Replication| RedisSlave2

    style Service fill:#e1f5ff
    style L1 fill:#d4edda
    style RedisMaster fill:#fff4e1
```

**VI Mô tả Triển khai**:
- **L1**: Nhúng trực tiếp trong process của Microservice, scale theo số lượng Pods.
- **L2**: Cụm Redis (Cluster hoặc Sentinel) với ít nhất 3 nodes cho High Availability.
- **Connection Pooling**: Sử dụng ioredis với connection pooling để quản lý kết nối hiệu quả.

**EN Deployment Description**:
- **L1**: Embedded directly in the Microservice process; scales with the number of Pods.
- **L2**: Redis Cluster (or Sentinel) with at least 3 nodes for High Availability.
- **Connection Pooling**: Use ioredis with connection pooling for efficient connection management.

## Giám sát & Khả năng quan sát / Monitoring & Observability

### VI: Các chỉ số giám sát

- **Metrics**: Prometheus metrics cho hit rate, miss rate, latency, memory usage.
- **Logs**: Log cache miss/hit ở level debug (sample), log connection errors ở level error.
- **Health Checks**: Readiness probe kiểm tra kết nối tới Redis.

### EN: Monitoring Metrics

- **Metrics**: Prometheus metrics for hit rate, miss rate, latency, memory usage.
- **Logs**: Log cache miss/hit at debug level (sampled), log connection errors at error level.
- **Health Checks**: Readiness probe checks connection to Redis.
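The "sampled" debug logging mentioned above can be done with a thin wrapper around the logger. The following is one possible sketch, assuming a deterministic every-Nth policy; `DebugLog` and `createSampledLogger` are hypothetical names, and the callback signature only mirrors the `logger.debug(message, fields)` calls used elsewhere in this document.

```typescript
// Hypothetical logger signature matching logger.debug(message, fields).
type DebugLog = (message: string, fields: Record<string, unknown>) => void;

// Wraps a debug logger so that only every Nth call is forwarded.
// The sample rate is attached so dashboards can scale counts back up.
function createSampledLogger(log: DebugLog, everyN: number): DebugLog {
  if (!Number.isInteger(everyN) || everyN < 1) {
    throw new RangeError('everyN must be a positive integer');
  }
  let seen = 0;
  return (message, fields) => {
    seen += 1;
    if (seen % everyN === 0) {
      log(message, { ...fields, sampleRate: 1 / everyN });
    }
  };
}

// Usage: forward 1 in 100 cache hit/miss debug lines.
const sampledDebug = createSampledLogger((msg, fields) => console.debug(msg, fields), 100);
sampledDebug('L1 cache hit', { key: 'iam:user:user_123' });
```

Counting calls keeps the sample share exact and reproducible; a random sampler would work as well when several call sites share one wrapper and lockstep patterns are a concern.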
### Code Giám sát / Monitoring Code

**Cache Hit Rates**:

```typescript
// VI: Theo dõi hiệu suất cache
// EN: Track cache performance
import { Counter } from 'prom-client';

export class CacheMetrics {
  private hits = new Counter({
    name: 'cache_hits_total',
    help: 'Total cache hits',
    labelNames: ['layer', 'key_prefix']
  });

  private misses = new Counter({
    name: 'cache_misses_total',
    help: 'Total cache misses',
    labelNames: ['layer', 'key_prefix']
  });

  recordHit(layer: 'l1' | 'l2', key: string): void {
    const prefix = key.split(':')[0];
    this.hits.inc({ layer, key_prefix: prefix });
  }

  recordMiss(key: string): void {
    const prefix = key.split(':')[0];
    this.misses.inc({ layer: 'db', key_prefix: prefix });
  }
}
```

**Hiệu suất Kỳ vọng / Expected Performance**:

| Chỉ số / Metric | L1 Cache | L2 Cache | Database |
|-----------------|----------|----------|----------|
| Độ trễ / Latency | < 1ms | < 5ms | < 50ms |
| Tỷ lệ Hit / Hit Rate | 40-50% | 80-90% | - |
| Dung lượng / Capacity | 10k keys | Bounded by Redis `maxmemory` | - |

## Best Practices

**NÊN / DO**:
- ✅ Sử dụng cache cho dữ liệu thường xuyên truy cập / Use cache for frequently accessed data
- ✅ Đặt TTL phù hợp dựa trên tần suất thay đổi dữ liệu / Set appropriate TTLs based on data change frequency
- ✅ Vô hiệu hóa cache khi cập nhật dữ liệu / Invalidate cache on data updates
- ✅ Sử dụng cache key namespacing / Use cache key namespacing
- ✅ Giám sát cache hit rates / Monitor cache hit rates
- ✅ Làm ấm cache khi khởi động cho dữ liệu quan trọng / Warm cache on startup for critical data

**KHÔNG NÊN / DON'T**:
- ❌ Cache dữ liệu thay đổi rất thường xuyên / Cache data that changes very frequently
- ❌ Đặt TTL quá dài (nguy cơ dữ liệu cũ) / Set TTL too long (stale data risk)
- ❌ Đặt TTL quá ngắn (mất lợi ích cache) / Set TTL too short (negates cache benefit)
- ❌ Cache dữ liệu nhạy cảm không mã hóa / Cache sensitive data without encryption
- ❌ Bỏ qua cache invalidation khi cập nhật / Ignore cache invalidation on updates
- ❌ Sử dụng cache làm primary data store / Use cache as primary data store

## Bối cảnh Hệ thống / System Context

```mermaid
C4Context
    title Sơ đồ Bối cảnh Caching Architecture

    System(services, "Microservices", "Application services")
    System_Ext(redis, "Redis Cluster", "L2 distributed cache")
    System_Ext(db, "Neon PostgreSQL", "Primary data store")
    System_Ext(monitoring, "Monitoring", "Cache metrics & alerts")

    Rel(services, redis, "Cache operations", "Redis Protocol")
    Rel(services, db, "Data operations", "PostgreSQL")
    Rel(redis, monitoring, "Sends metrics", "Prometheus")
    BiRel(services, redis, "L2 cache miss → DB query")
```

**VI Mô tả**:
- **Microservices**: Sử dụng multi-layer cache (L1: Memory, L2: Redis)
- **Redis Cluster**: L2 cache shared giữa tất cả service instances
- **PostgreSQL**: Primary data store, fallback khi cache miss
- **Monitoring**: Thu thập cache metrics (hit rate, latency, evictions)

**EN Description**:
- **Microservices**: Use multi-layer cache (L1: Memory, L2: Redis)
- **Redis Cluster**: L2 cache shared across all service instances
- **PostgreSQL**: Primary data store, fallback on cache miss
- **Monitoring**: Collects cache metrics (hit rate, latency, evictions)

## Cân nhắc Bảo mật / Security Considerations

### VI: Phần Tiếng Việt

**Access Control**:
- Redis AUTH password cho authentication
- Network isolation: Redis chỉ accessible từ service pods
- Kubernetes Network Policies: Whitelist specific services

**Encryption**:
- TLS cho Redis connections (optional, recommended for production)
- Encryption at rest: Redis persistence files encrypted
- Sensitive data: Encrypt before caching (AES-256-GCM)

**Data Sensitivity**:
- **KHÔNG cache**: Passwords, tokens, credit cards, SSN
- **Cache với encryption**: PII (email, phone, address)
- **Cache plaintext**: Non-sensitive data (public info, configs)

**Cache Poisoning Prevention**:
- Validate data before caching
- Use signed cache keys để prevent tampering
- Implement cache key namespacing per service

**TTL Management**:
- Short TTL (< 5 min) cho security-sensitive data
- Invalidate cache immediately khi data changes
- Auto-expire sessions on logout

**Audit**:
- Log cache access cho sensitive data
- Monitor unusual cache patterns (high miss rate, frequent invalidations)
- Alert on cache security events

### EN: English Section

**Access Control**:
- Redis AUTH password for authentication
- Network isolation: Redis only accessible from service pods
- Kubernetes Network Policies: Whitelist specific services

**Encryption**:
- TLS for Redis connections (optional, recommended for production)
- Encryption at rest: Redis persistence files encrypted
- Sensitive data: Encrypt before caching (AES-256-GCM)

**Data Sensitivity**:
- **DON'T cache**: Passwords, tokens, credit cards, SSN
- **Cache with encryption**: PII (email, phone, address)
- **Cache plaintext**: Non-sensitive data (public info, configs)

**Cache Poisoning Prevention**:
- Validate data before caching
- Use signed cache keys to prevent tampering
- Implement cache key namespacing per service

**TTL Management**:
- Short TTL (< 5 min) for security-sensitive data
- Invalidate cache immediately when data changes
- Auto-expire sessions on logout

**Audit**:
- Log cache access for sensitive data
- Monitor unusual cache patterns (high miss rate, frequent invalidations)
- Alert on cache security events

## Triển khai / Deployment

```mermaid
graph TD
    subgraph "Redis Cluster"
        subgraph "Masters"
            M1[Redis Master 1<br/>Slots: 0-5460]
            M2[Redis Master 2<br/>Slots: 5461-10922]
            M3[Redis Master 3<br/>Slots: 10923-16383]
        end

        subgraph "Slaves"
            S1[Redis Slave 1<br/>Replica of M1]
            S2[Redis Slave 2<br/>Replica of M2]
            S3[Redis Slave 3<br/>Replica of M3]
        end

        M1 --> S1
        M2 --> S2
        M3 --> S3

        Sentinel[Redis Sentinel<br/>3 nodes]
        Sentinel -.->|Monitor| M1
        Sentinel -.->|Monitor| M2
        Sentinel -.->|Monitor| M3
    end

    subgraph "Services"
        Service1[Service A]
        Service2[Service B]
        Service3[Service C]
    end

    Service1 --> M1
    Service1 --> M2
    Service1 --> M3
    Service2 --> M1
    Service2 --> M2
    Service2 --> M3
    Service3 --> M1
    Service3 --> M2
    Service3 --> M3

    style M1 fill:#e1f5ff
    style M2 fill:#fff4e1
    style M3 fill:#d4edda
    style Sentinel fill:#f0e1ff
```

### VI: Chiến lược Triển khai

**Redis Cluster Configuration**:
- **Mode**: Cluster mode với 3 masters + 3 slaves
- **Replication**: Mỗi master có 1 slave cho high availability
- **Sentinel**: 3-node Sentinel ensemble cho automatic failover
- **Sharding**: 16384 hash slots phân chia đều giữa 3 masters
- **Persistence**: RDB snapshots mỗi 5 phút, AOF disabled (performance)

**Resource Allocation**:

| Component | CPU | Memory | Disk | Replicas |
|-----------|-----|--------|------|----------|
| **Redis Master** | 1 core | 2GB | 10GB SSD | 3 |
| **Redis Slave** | 1 core | 2GB | 10GB SSD | 3 |
| **Sentinel** | 500m | 512MB | 5GB | 3 |

**Redis Configuration**:

```conf
# redis.conf
maxmemory 2gb
# Evict least recently used keys
maxmemory-policy allkeys-lru
# Close idle connections after 5min
timeout 300
tcp-keepalive 60
# RDB snapshot every 5min if 10+ keys changed
save 300 10
# Disable AOF for performance
appendonly no

# Cluster config
cluster-enabled yes
cluster-node-timeout 5000
cluster-replica-validity-factor 0
```

**High Availability**:
- Automatic failover với Redis Sentinel
- Slave promotion khi master fails
- Client-side retry logic
- Connection pooling (max 50 connections per service)

**Scaling Strategy**:
- **Vertical**: Tăng memory per node (2GB → 4GB → 8GB)
- **Horizontal**: Thêm master nodes (3 → 5 → 7)
- **Read Scaling**: Route reads to slaves
- **Monitoring**: Auto-alert khi memory usage > 80%

### EN: Deployment Strategy

**Redis Cluster Configuration**:
- **Mode**: Cluster mode with 3 masters + 3 slaves
- **Replication**: Each master has 1 slave for
high availability
- **Sentinel**: 3-node Sentinel ensemble for automatic failover
- **Sharding**: 16384 hash slots distributed evenly across 3 masters
- **Persistence**: RDB snapshots every 5 minutes, AOF disabled (performance)

**Resource Allocation**:

| Component | CPU | Memory | Disk | Replicas |
|-----------|-----|--------|------|----------|
| **Redis Master** | 1 core | 2GB | 10GB SSD | 3 |
| **Redis Slave** | 1 core | 2GB | 10GB SSD | 3 |
| **Sentinel** | 500m | 512MB | 5GB | 3 |

**Redis Configuration**:

```conf
# redis.conf
maxmemory 2gb
# Evict least recently used keys
maxmemory-policy allkeys-lru
# Close idle connections after 5min
timeout 300
tcp-keepalive 60
# RDB snapshot every 5min if 10+ keys changed
save 300 10
# Disable AOF for performance
appendonly no

# Cluster config
cluster-enabled yes
cluster-node-timeout 5000
cluster-replica-validity-factor 0
```

**High Availability**:
- Automatic failover with Redis Sentinel
- Slave promotion when master fails
- Client-side retry logic
- Connection pooling (max 50 connections per service)

**Scaling Strategy**:
- **Vertical**: Increase memory per node (2GB → 4GB → 8GB)
- **Horizontal**: Add master nodes (3 → 5 → 7)
- **Read Scaling**: Route reads to slaves
- **Monitoring**: Auto-alert when memory usage > 80%

## Giám sát & Khả năng quan sát / Monitoring & Observability

### Chỉ số Chính / Key Metrics

**Cache Performance Metrics**:

```typescript
// VI: Custom metrics cho cache performance
// EN: Custom metrics for cache performance
import { Counter, Histogram, Gauge } from 'prom-client';

export const cacheHits = new Counter({
  name: 'cache_hits_total',
  help: 'Total cache hits',
  labelNames: ['layer', 'key_prefix'] // layer: l1/l2, key_prefix: user/session/etc
});

export const cacheMisses = new Counter({
  name: 'cache_misses_total',
  help: 'Total cache misses',
  labelNames: ['key_prefix']
});

export const cacheLatency = new Histogram({
  name: 'cache_operation_duration_seconds',
  help: 'Cache operation duration',
  labelNames: ['operation', 'layer'], // operation: get/set/del
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1]
});

export const cacheSize = new Gauge({
  name: 'cache_size_bytes',
  help: 'Cache size in bytes',
  labelNames: ['layer']
});

export const cacheEvictions = new Counter({
  name: 'cache_evictions_total',
  help: 'Total cache evictions',
  labelNames: ['layer', 'reason'] // reason: ttl_expired/memory_full
});
```

**Redis Metrics**:
- `redis_connected_clients` - Connected clients
- `redis_used_memory_bytes` - Memory usage
- `redis_memory_fragmentation_ratio` - Memory fragmentation
- `redis_keyspace_hits_total` - Cache hits
- `redis_keyspace_misses_total` - Cache misses
- `redis_evicted_keys_total` - Evicted keys
- `redis_expired_keys_total` - Expired keys
- `redis_commands_processed_total` - Commands processed

**Calculated Metrics**:

```promql
# Overall cache hit rate (sum() removes labels so hits and misses can be combined)
sum(rate(cache_hits_total[5m]))
  / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))

# Share of hits served by L1
sum(rate(cache_hits_total{layer="l1"}[5m])) / sum(rate(cache_hits_total[5m]))

# Share of hits served by L2
sum(rate(cache_hits_total{layer="l2"}[5m])) / sum(rate(cache_hits_total[5m]))

# p95 cache operation latency
histogram_quantile(0.95, sum by (le) (rate(cache_operation_duration_seconds_bucket[5m])))

# Memory usage percentage
redis_used_memory_bytes / redis_maxmemory_bytes * 100
```

**Alerting Rules**:

```yaml
# VI: Quy tắc cảnh báo cho cache
# EN: Alerting rules for cache
groups:
  - name: cache_alerts
    interval: 30s
    rules:
      # Low cache hit rate
      - alert: LowCacheHitRate
        expr: |
          sum(rate(cache_hits_total[5m]))
            / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m]))) < 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low cache hit rate"
          description: "Cache hit rate is {{ $value | humanizePercentage }}"

      # High memory usage
      - alert: HighRedisMemoryUsage
        expr: redis_used_memory_bytes / redis_maxmemory_bytes > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High Redis memory usage"
          description: "Redis memory usage is {{ $value | humanizePercentage }}"

      # High eviction rate
      - alert: HighEvictionRate
        expr: rate(redis_evicted_keys_total[5m]) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High cache eviction rate"
          description: "Eviction rate is {{ $value }}/sec"

      # Redis down
      - alert: RedisDown
        expr: redis_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis is down"

      # High replication lag
      - alert: HighReplicationLag
        expr: redis_replication_lag_seconds > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High Redis replication lag"
          description: "Replication lag is {{ $value }}s"
```

**Dashboards**:
- **Cache Overview**: Hit rate, miss rate, latency, size
- **Redis Cluster**: Memory usage, connections, commands/sec
- **Performance**: L1 vs L2 hit rates, operation latency
- **Evictions**: Eviction rate, reasons, trends

**Logging**:

```typescript
// VI: Structured logging cho cache operations
// EN: Structured logging for cache operations
logger.debug('Cache operation', {
  operation: 'get',
  layer: 'l1',
  key: cacheKey,
  hit: true,
  latency: duration,
  correlationId: req.correlationId
});

logger.warn('Cache eviction', {
  layer: 'l2',
  reason: 'memory_full',
  evictedKeys: count,
  memoryUsage: usagePercent
});

logger.error('Cache error', {
  operation: 'set',
  layer: 'l2',
  error: error.message,
  key: cacheKey
});
```

**Health Checks**:

```typescript
// VI: Health check cho Redis
// EN: Health check for Redis
// VI: `parseMemoryUsage` là helper của project (không hiển thị ở đây), trả về tỷ lệ used/max
// EN: `parseMemoryUsage` is a project helper (not shown) returning the used/max ratio
async function checkRedisHealth(): Promise<boolean> {
  try {
    await redis.ping();
    const info = await redis.info('memory');
    const memoryUsage = parseMemoryUsage(info);
    return memoryUsage < 0.9; // Healthy if < 90% memory
  } catch (error) {
    logger.error('Redis health check failed', { error });
    return false;
  }
}
```

## Tài liệu Liên quan / Related Documentation

- [System Design](./system-design.md) - Kiến trúc tổng thể với caching / Overall architecture with caching
- [Data Consistency Patterns](./data-consistency-patterns.md) - Cache invalidation patterns

---

**Cập nhật Lần cuối / Last Updated**: 2026-01-07

**Tác giả / Authors**: GoodGo Architecture Team