# Caching Architecture

> Multi-layer caching strategy for optimal performance

## Overview Diagram

```mermaid
graph TD
    Request[API Request] --> L1{L1 Cache<br/>Memory}

    L1 -->|Hit| Return1["Return<br/>< 1ms"]
    L1 -->|Miss| L2{L2 Cache<br/>Redis}

    L2 -->|Hit| WarmL1[Warm L1]
    WarmL1 --> Return2["Return<br/>< 5ms"]

    L2 -->|Miss| DB[(Database)]
    DB --> StoreL2[Store L2 + L1]
    StoreL2 --> Return3["Return<br/>< 50ms"]

    style L1 fill:#d4edda
    style L2 fill:#fff4e1
    style DB fill:#f0e1ff
```

## System Context

```mermaid
C4Context
    title Caching System Context

    System(service, "Microservice", "Client service using cache")
    System_Ext(db, "Neon PostgreSQL", "Primary database")

    Boundary(caching, "Caching Layer") {
        System(l1, "L1 Cache", "In-memory NodeCache")
        System(l2, "L2 Cache", "Redis Cluster")
    }

    Rel(service, l1, "Reads/Writes", "In-process")
    Rel(service, l2, "Reads/Writes", "Redis Protocol")
    Rel(l1, l2, "Fills from", "On miss")
    Rel(l2, db, "Cache aside", "On miss")
```

### Context Description

- **Service**: Communicates directly with the L1 cache (in-memory) for the lowest latency.
- **L1 Cache**: Local to each instance, not shared, and expires automatically (short TTL).
- **L2 Cache**: Shared Redis cluster that holds data longer and gives all instances the same view.
- **Database**: Source of truth, accessed only on a cache miss.

## Architecture Description

### Multi-Layer Caching

The GoodGo platform uses two cache layers to optimize performance:

**L1 Cache (Memory)**:

- In-memory cache per service instance
- Very fast access (< 1ms)
- Limited capacity (10k keys by default)
- Short TTL (60 seconds by default, capped at 5 minutes)
- Not shared across instances

**L2 Cache (Redis)**:

- Shared distributed cache
- Fast access (< 5ms)
- Large capacity
- Longer TTL (configurable, typically 5-15 minutes)
- Shared across all service instances

**Cache Flow**:

```
Request  →  L1        →  L2        →  Database
            ↓            ↓            ↓
            40-50%       80-90%       10-20%
            hit rate     hit rate     cache miss rate
```
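
As a rough worked example of what these rates imply, the sketch below estimates mean read latency. It rests on an interpretation that the flow above does not state explicitly: 45% of requests are served by L1, 15% miss both layers, and the remaining 40% are served by L2. The latencies are the upper bounds from the overview diagram (1 ms, 5 ms, 50 ms). `TrafficMix` and `expectedLatencyMs` are illustrative names, not part of the platform code.

```typescript
// Hypothetical helper: estimates mean read latency from a traffic mix.
interface TrafficMix {
  l1Share: number; // fraction of requests served by L1
  l2Share: number; // fraction served by L2 after an L1 miss
  dbShare: number; // fraction that misses both layers
}

// Latency upper bounds in milliseconds, taken from the overview diagram.
const LATENCY_MS = { l1: 1, l2: 5, db: 50 };

function expectedLatencyMs(mix: TrafficMix): number {
  const total = mix.l1Share + mix.l2Share + mix.dbShare;
  if (Math.abs(total - 1) > 1e-9) {
    throw new Error(`shares must sum to 1, got ${total}`);
  }
  return (
    mix.l1Share * LATENCY_MS.l1 +
    mix.l2Share * LATENCY_MS.l2 +
    mix.dbShare * LATENCY_MS.db
  );
}

const estimate = expectedLatencyMs({ l1Share: 0.45, l2Share: 0.4, dbShare: 0.15 });
console.log(estimate.toFixed(2)); // upper-bound estimate in milliseconds
```

Under these assumptions the estimate is 0.45 + 2.0 + 7.5 ≈ 9.95 ms, and the database share contributes most of it, which is why lowering the miss rate matters more than shaving L1 latency.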

## Cache Implementation

### Multi-Layer Cache Service

```typescript
// Multi-layer cache implementation
import NodeCache from 'node-cache';
import Redis from 'ioredis';

// `logger` is assumed to be the service's shared structured logger.
export class MultiLayerCache {
  private l1Cache: NodeCache;
  private l2Cache: Redis;

  constructor() {
    // L1: in-memory cache
    this.l1Cache = new NodeCache({
      stdTTL: 60,       // 60 seconds default
      maxKeys: 10000,   // Max 10k keys
      checkperiod: 120  // Check for expired keys every 2 minutes
    });

    // L2: Redis cache
    this.l2Cache = new Redis({
      host: process.env.REDIS_HOST,
      port: Number(process.env.REDIS_PORT ?? 6379), // fall back to the default Redis port
      db: 0
    });
  }

  // node-cache throws once maxKeys is reached; a full L1 must not fail the request
  private setL1(key: string, value: unknown, ttl = 60): void {
    try {
      this.l1Cache.set(key, value, ttl);
    } catch {
      logger.debug('L1 cache full, skipping write', { key });
    }
  }

  async get<T>(key: string): Promise<T | null> {
    // Try L1 first; compare with undefined so falsy values (0, '', false) still count as hits
    const l1Value = this.l1Cache.get<T>(key);
    if (l1Value !== undefined) {
      logger.debug('L1 cache hit', { key });
      return l1Value;
    }

    // Try L2
    const l2Value = await this.l2Cache.get(key);
    if (l2Value !== null) {
      logger.debug('L2 cache hit', { key });
      const parsed = JSON.parse(l2Value) as T;

      // Warm L1 cache (default 60-second TTL)
      this.setL1(key, parsed);
      return parsed;
    }

    logger.debug('Cache miss', { key });
    return null;
  }

  async set(key: string, value: unknown, ttl: number = 300): Promise<void> {
    // Store in both L1 and L2
    this.setL1(key, value, Math.min(ttl, 300)); // L1 max 5 minutes
    await this.l2Cache.setex(key, ttl, JSON.stringify(value));
  }

  async del(key: string): Promise<void> {
    this.l1Cache.del(key);
    await this.l2Cache.del(key);
  }

  async invalidatePattern(pattern: string): Promise<void> {
    // L1: clear all (simple approach)
    this.l1Cache.flushAll();

    // L2: delete by pattern. KEYS blocks Redis while it walks the whole keyspace,
    // so iterate with SCAN instead. With Redis Cluster, each master must be scanned separately.
    const stream = this.l2Cache.scanStream({ match: pattern, count: 100 });
    for await (const batch of stream) {
      const keys = batch as string[];
      if (keys.length > 0) {
        await this.l2Cache.del(...keys);
      }
    }
  }
}
```
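
The class above returns `null` on a miss and leaves the database fallback ("Store L2 + L1" in the overview diagram) to the caller. A minimal sketch of that cache-aside step follows. `CacheLike`, `getOrLoad`, `inFlight`, and `MapCache` are hypothetical names introduced for illustration, and `MapCache` is only an in-memory stand-in so the example runs without Redis.

```typescript
// Minimal cache contract matching MultiLayerCache's get/set signatures.
interface CacheLike {
  get<T>(key: string): Promise<T | null>;
  set(key: string, value: unknown, ttl?: number): Promise<void>;
}

// Loads in progress, so concurrent misses for one key share a single database query.
const inFlight = new Map<string, Promise<unknown>>();

// Hypothetical helper: cache-aside read with a loader fallback.
async function getOrLoad<T>(
  cache: CacheLike,
  key: string,
  loader: () => Promise<T>,
  ttl = 300
): Promise<T> {
  const cached = await cache.get<T>(key);
  if (cached !== null) return cached;

  const pending = inFlight.get(key) as Promise<T> | undefined;
  if (pending) return pending;

  const load = (async () => {
    const value = await loader(); // e.g. a database query
    await cache.set(key, value, ttl);
    return value;
  })().finally(() => inFlight.delete(key));

  inFlight.set(key, load);
  return load;
}

// In-memory stand-in so the sketch runs without Redis.
class MapCache implements CacheLike {
  private store = new Map<string, unknown>();

  async get<T>(key: string): Promise<T | null> {
    return this.store.has(key) ? (this.store.get(key) as T) : null;
  }

  async set(key: string, value: unknown): Promise<void> {
    this.store.set(key, value);
  }
}
```

Sharing the in-flight promise is one way to keep a burst of misses on a hot key from turning into a burst of identical database queries. It only deduplicates within a single process, so several instances can still load the same key at once.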

### Cache Key Naming

**Pattern**: `{service}:{entity}:{identifier}:{sub-resource}` (the sub-resource segment is optional)

**Examples**:

```typescript
// User cache keys
const cacheKeys = {
  user: (userId: string) => `iam:user:${userId}`,
  userPermissions: (userId: string) => `iam:user:${userId}:permissions`,
  userRoles: (userId: string) => `iam:user:${userId}:roles`,
  session: (sessionId: string) => `iam:session:${sessionId}`,
};

// Usage
const user = await cache.get(cacheKeys.user('user_123'));
const permissions = await cache.get(cacheKeys.userPermissions('user_123'));
```
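
The naming pattern can also be enforced in code rather than by convention. The sketch below is one way to do that. `buildCacheKey` is a hypothetical helper, and the rule that segments may not contain `:`, `*`, or whitespace is an assumption of this example, not a documented platform rule.

```typescript
// Hypothetical helper: builds keys of the form service:entity:identifier[:sub-resource].
function buildCacheKey(
  service: string,
  entity: string,
  identifier: string,
  subResource?: string
): string {
  const segments = [service, entity, identifier];
  if (subResource !== undefined) segments.push(subResource);

  for (const segment of segments) {
    // Reject empty segments and characters that would corrupt the key layout
    // (":" is the separator, "*" is the wildcard used for pattern invalidation).
    if (segment.length === 0 || /[:*\s]/.test(segment)) {
      throw new Error(`invalid cache key segment: "${segment}"`);
    }
  }
  return segments.join(':');
}

console.log(buildCacheKey('iam', 'user', 'user_123'));                // iam:user:user_123
console.log(buildCacheKey('iam', 'user', 'user_123', 'permissions')); // iam:user:user_123:permissions
```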

## TTL Strategies

```mermaid
graph LR
    subgraph "TTL Tiers"
        Short[Short TTL<br/>60-300s<br/>Frequently changing]
        Medium[Medium TTL<br/>300-1800s<br/>Moderately changing]
        Long[Long TTL<br/>1800-3600s<br/>Rarely changing]
    end

    Short --> Permissions[User Permissions]
    Short --> Sessions[Session Data]

    Medium --> UserProfiles[User Profiles]
    Medium --> OrgData[Organization Data]

    Long --> Config[Static Config]
    Long --> RefData[Reference Data]

    %% style Short fill:#f8d7da
    %% style Medium fill:#fff3cd
    %% style Long fill:#d4edda
```

**TTL Guidelines**:

| Data Type | TTL | Reason |
|-----------|-----|--------|
| User permissions | 5 min | Security-sensitive |
| Session data | Varies | Based on session length |
| User profiles | 10 min | Moderate update frequency |
| Organization data | 15 min | Infrequent updates |
| Static config | 30-60 min | Very stable |
| Reference data | 1-2 hours | Almost never changes |
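
The guidelines above could be kept in one place as a lookup, so callers do not hard-code TTLs. The sketch below is illustrative: `TTL_SECONDS`, `DataType`, and `ttlFor` are hypothetical names, the lower bound is used where the table gives a range, and session data is left out because its TTL varies.

```typescript
// TTLs in seconds, mirroring the guidelines table. Where the table gives a range,
// the lower bound is used here; that choice is an assumption of this sketch.
const TTL_SECONDS = {
  userPermissions: 5 * 60,
  userProfile: 10 * 60,
  organization: 15 * 60,
  staticConfig: 30 * 60,
  referenceData: 60 * 60,
} as const;

type DataType = keyof typeof TTL_SECONDS;

const L1_MAX_TTL_SECONDS = 300; // L1 entries are capped at 5 minutes

// Hypothetical helper: resolves the TTL to use in each cache layer.
function ttlFor(dataType: DataType): { l1: number; l2: number } {
  const l2 = TTL_SECONDS[dataType];
  return { l1: Math.min(l2, L1_MAX_TTL_SECONDS), l2 };
}

console.log(ttlFor('userPermissions')); // { l1: 300, l2: 300 }
console.log(ttlFor('referenceData'));   // { l1: 300, l2: 3600 }
```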

## Cache Invalidation

```mermaid
sequenceDiagram
    participant API
    participant Service
    participant Cache
    participant DB

    API->>Service: Update User
    Service->>DB: UPDATE user
    DB-->>Service: Success

    Service->>Cache: Invalidate user:123
    Service->>Cache: Invalidate user:123:permissions
    Service->>Cache: Invalidate user:123:roles
    Cache-->>Service: Cleared

    Service-->>API: Success

    Note over Service,Cache: Next request will fetch fresh data
```

**Invalidation Strategies**:

```typescript
// Excerpt from a service class (the class name is illustrative); `cache`, `cacheKeys`,
// and the repositories are assumed to be in scope.
class UserService {
  // 1. Single key invalidation
  async updateUser(userId: string, data: UpdateUserDto): Promise<User> {
    const user = await userRepository.update(userId, data);

    // Invalidate user cache
    await cache.del(cacheKeys.user(userId));

    return user;
  }

  // 2. Pattern-based invalidation
  async updateUserRole(userId: string, roleId: string): Promise<void> {
    await userRoleRepository.assign(userId, roleId);

    // Invalidate all sub-resource keys for the user (permissions, roles).
    // The pattern does not match the base key iam:user:{id}; delete that
    // separately with cache.del() if it must be refreshed as well.
    await cache.invalidatePattern(`iam:user:${userId}:*`);
  }
}

// 3. Time-based invalidation (TTL expiry)
// Handled automatically by the cache
```
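
`invalidatePattern` clears the whole L1 cache for simplicity. If that turns out to evict too much, a selective variant is possible; the sketch below shows the idea on a plain `Map`. `globToRegExp` and `invalidateLocal` are hypothetical helpers, and only the `*` and `?` wildcards are handled (Redis glob character classes such as `[abc]` are treated as literals here).

```typescript
// Converts a Redis-style glob ("*" and "?" only) into an anchored RegExp.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  const source = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp(`^${source}$`);
}

// Hypothetical helper: deletes only the matching keys and returns how many were removed.
function invalidateLocal(store: Map<string, unknown>, pattern: string): number {
  const matcher = globToRegExp(pattern);
  let removed = 0;
  for (const key of Array.from(store.keys())) {
    if (matcher.test(key)) {
      store.delete(key);
      removed += 1;
    }
  }
  return removed;
}

const l1 = new Map<string, unknown>([
  ['iam:user:user_123:permissions', ['read']],
  ['iam:user:user_123:roles', ['admin']],
  ['iam:user:user_123', { id: 'user_123' }],
  ['iam:user:user_456:roles', ['viewer']],
]);

console.log(invalidateLocal(l1, 'iam:user:user_123:*')); // 2
console.log(l1.has('iam:user:user_123'));                // true (base key is not matched)
```

With NodeCache the same approach would list keys with `keys()` and remove the matches with `del()`. Either way this only affects the local instance; other instances keep their L1 entries until the TTL expires.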

## Cache Warming

```typescript
// Preload frequently accessed data
async function warmCache(): Promise<void> {
  logger.info('Starting cache warming');

  // Warm user permissions for active users
  const activeUsers = await userRepository.findActive({ limit: 1000 });

  for (const user of activeUsers) {
    const permissions = await rbacService.getUserPermissions(user.id);

    await cache.set(
      cacheKeys.userPermissions(user.id),
      permissions,
      300 // 5 minutes
    );
  }

  logger.info('Cache warming completed', { count: activeUsers.length });
}

// Run on service startup
warmCache().catch(err => logger.error('Cache warming failed', { err }));
```
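
The loop above warms one user at a time, so 1000 users mean 1000 sequential round trips. One possible way to shorten warm-up without flooding the database is to run a bounded number of loads in parallel, as sketched below. `mapWithConcurrency` is a hypothetical helper, not an existing platform utility.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim the next item; safe because JS runs one callback at a time
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

In `warmCache`, the `for` loop body could be passed as the worker, for example `mapWithConcurrency(activeUsers, 10, async (user) => { ... })`, where the limit of 10 is an arbitrary illustrative value. The first failed load rejects the whole call, which matches the existing `catch` on `warmCache()`.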

## Design Decisions

### Decision 1: Multi-layer Caching (L1 + L2)

**Context**: Need to reduce load on Redis and achieve ultra-low latency for hot data.

**Decision**: Use a combination of L1 (NodeCache) and L2 (Redis).

**Consequences**:

- ✅ Latency < 1ms for 40-50% of requests.
- ✅ Reduced network traffic to Redis.
- ❌ Synchronization complexity (L1 may be stale for a short time).

## Performance Characteristics

### Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| **L1 Hit Latency** | < 0.5ms | In-memory lookup |
| **L2 Hit Latency** | < 5ms | Network RTT + Redis processing |
| **Combined Hit Rate** | > 90% | L1 + L2 combined |
| **L1 Capacity** | 10k items | Per-instance limit to protect heap |
| **Cache Warmup Time** | < 30s | At service startup |

## Security Considerations

### Cache Security

- **Encryption**: Sensitive data (PII) MUST be encrypted (AES-256) before it is stored in L2 Redis. L1 may hold plaintext because it lives in process memory, although a memory dump would expose it.
- **Isolation**: The Redis instance is protected by a password and a Network Policy that allows only internal K8s traffic.
- **TLS**: Connect to Redis over TLS 1.2+.
- **Data Sanitization**: Do not cache entire user objects if they contain password hashes or secrets.
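
The data sanitization rule can be applied at the point where objects enter the cache. The sketch below strips secret fields from a copy of the object before it is cached. `toCacheable` is a hypothetical helper, and the field names in `SENSITIVE_FIELDS` are illustrative assumptions, not the real user schema.

```typescript
// Field names below are illustrative assumptions, not taken from the real user schema.
const SENSITIVE_FIELDS = ['passwordHash', 'mfaSecret', 'apiToken'] as const;

// Hypothetical helper: returns a shallow copy that is safe to cache.
function toCacheable<T extends Record<string, unknown>>(entity: T): Partial<T> {
  const copy: Record<string, unknown> = { ...entity };
  for (const field of SENSITIVE_FIELDS) {
    delete copy[field];
  }
  return copy as Partial<T>;
}

const user = { id: 'user_123', displayName: 'Example User', passwordHash: 'not-a-real-hash' };
console.log(toCacheable(user)); // { id: 'user_123', displayName: 'Example User' }
```

A denylist like this is easy to retrofit but misses fields added later; an allowlist of cacheable fields would be the stricter choice.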

## Deployment

```mermaid
graph TD
    subgraph "Kubernetes Pod"
        Service[Microservice Container]
        L1["L1 Cache (RAM)"]
        Service --- L1
    end

    subgraph "Infrastructure"
        RedisMaster[Redis Master]
        RedisSlave1[Redis Slave 1]
        RedisSlave2[Redis Slave 2]
    end

    Service -->|Write| RedisMaster
    Service -->|Read| RedisSlave1
    Service -->|Read| RedisSlave2

    RedisMaster -.->|Replication| RedisSlave1
    RedisMaster -.->|Replication| RedisSlave2

    style Service fill:#e1f5ff
    style L1 fill:#d4edda
    style RedisMaster fill:#fff4e1
```

**Deployment Description**:

- **L1**: Embedded directly in the microservice process; scales with the number of Pods.
- **L2**: Redis Cluster (or Sentinel) with at least 3 nodes for high availability.
- **Connection Pooling**: Use ioredis with connection pooling for efficient connection management.

## Monitoring & Observability

### Monitoring Metrics

- **Metrics**: Prometheus metrics for hit rate, miss rate, latency, and memory usage.
- **Logs**: Log cache hits and misses at debug level (sampled); log connection errors at error level.
- **Health Checks**: The readiness probe checks the connection to Redis.

### Monitoring Code

**Cache Hit Rates**:

```typescript
// Track cache performance
import { Counter } from 'prom-client';

export class CacheMetrics {
  private hits = new Counter({
    name: 'cache_hits_total',
    help: 'Total cache hits',
    labelNames: ['layer', 'key_prefix']
  });

  private misses = new Counter({
    name: 'cache_misses_total',
    help: 'Total cache misses',
    labelNames: ['layer', 'key_prefix']
  });

  recordHit(layer: 'l1' | 'l2', key: string): void {
    // First key segment, i.e. the service name (e.g. 'iam')
    const prefix = key.split(':')[0];
    this.hits.inc({ layer, key_prefix: prefix });
  }

  recordMiss(key: string): void {
    // A miss in both layers falls through to the database, hence layer: 'db'
    const prefix = key.split(':')[0];
    this.misses.inc({ layer: 'db', key_prefix: prefix });
  }
}
```

**Expected Performance**:

| Metric | L1 Cache | L2 Cache | Database |
|--------|----------|----------|----------|
| Latency | < 1ms | < 5ms | < 50ms |
| Hit Rate | 40-50% | 80-90% | - |
| Capacity | 10k keys | Large (bounded by Redis `maxmemory`) | - |

## Best Practices

**DO**:

- ✅ Use cache for frequently accessed data
- ✅ Set appropriate TTLs based on data change frequency
- ✅ Invalidate cache on data updates
- ✅ Use cache key namespacing
- ✅ Monitor cache hit rates
- ✅ Warm cache on startup for critical data

**DON'T**:

- ❌ Cache data that changes very frequently
- ❌ Set TTL too long (stale data risk)
- ❌ Set TTL too short (negates cache benefit)
- ❌ Cache sensitive data without encryption
- ❌ Ignore cache invalidation on updates
- ❌ Use cache as primary data store

## System Context

```mermaid
C4Context
    title Caching Architecture Context Diagram

    System(services, "Microservices", "Application services")

    System_Ext(redis, "Redis Cluster", "L2 distributed cache")
    System_Ext(db, "Neon PostgreSQL", "Primary data store")
    System_Ext(monitoring, "Monitoring", "Cache metrics & alerts")

    Rel(services, redis, "Cache operations", "Redis Protocol")
    Rel(services, db, "Data operations", "PostgreSQL")
    Rel(redis, monitoring, "Sends metrics", "Prometheus")

    BiRel(services, redis, "L2 cache miss → DB query")
```

**Description**:

- **Microservices**: Use the multi-layer cache (L1: Memory, L2: Redis)
- **Redis Cluster**: L2 cache shared across all service instances
- **PostgreSQL**: Primary data store, fallback on cache miss
- **Monitoring**: Collects cache metrics (hit rate, latency, evictions)

## Security Considerations

**Access Control**:

- Redis AUTH password for authentication
- Network isolation: Redis is accessible only from service pods
- Kubernetes Network Policies: Whitelist specific services

**Encryption**:

- TLS for Redis connections (optional, recommended for production)
- Encryption at rest: Redis persistence files encrypted
- Sensitive data: Encrypt before caching (AES-256-GCM)

**Data Sensitivity**:

- **DON'T cache**: Passwords, tokens, credit cards, SSN
- **Cache with encryption**: PII (email, phone, address)
- **Cache plaintext**: Non-sensitive data (public info, configs)

**Cache Poisoning Prevention**:

- Validate data before caching
- Use signed cache keys to prevent tampering
- Implement cache key namespacing per service

**TTL Management**:

- Short TTL (< 5 min) for security-sensitive data
- Invalidate cache immediately when data changes
- Auto-expire sessions on logout

**Audit**:

- Log cache access for sensitive data
- Monitor unusual cache patterns (high miss rate, frequent invalidations)
- Alert on cache security events
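
The "encrypt before caching (AES-256-GCM)" rule above can be sketched with Node's built-in `crypto` module. This is a minimal illustration, not the platform's implementation: `encryptForCache` and `decryptFromCache` are hypothetical names, the payload layout (nonce, tag, ciphertext, base64-encoded) is an assumption of this example, and key management is out of scope.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Hypothetical helper: encrypts a string for storage in L2.
function encryptForCache(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, unique per encryption
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  return Buffer.concat([iv, tag, ciphertext]).toString('base64');
}

// Hypothetical helper: reverses encryptForCache and verifies integrity.
function decryptFromCache(payload: string, key: Buffer): string {
  const raw = Buffer.from(payload, 'base64');
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the payload was modified
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const key = randomBytes(32); // in practice, load a 256-bit key from a secret store
const stored = encryptForCache(JSON.stringify({ email: 'a@example.com' }), key);
console.log(decryptFromCache(stored, key)); // {"email":"a@example.com"}
```

Because GCM authenticates the ciphertext, a value altered in Redis fails to decrypt instead of being returned to the caller, which also supports the cache poisoning controls listed above.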

## Deployment

```mermaid
graph TD
    subgraph "Redis Cluster"
        subgraph "Masters"
            M1[Redis Master 1<br/>Slots: 0-5460]
            M2[Redis Master 2<br/>Slots: 5461-10922]
            M3[Redis Master 3<br/>Slots: 10923-16383]
        end

        subgraph "Slaves"
            S1[Redis Slave 1<br/>Replica of M1]
            S2[Redis Slave 2<br/>Replica of M2]
            S3[Redis Slave 3<br/>Replica of M3]
        end

        M1 --> S1
        M2 --> S2
        M3 --> S3

        Sentinel[Redis Sentinel<br/>3 nodes]

        Sentinel -.->|Monitor| M1
        Sentinel -.->|Monitor| M2
        Sentinel -.->|Monitor| M3
    end

    subgraph "Services"
        Service1[Service A]
        Service2[Service B]
        Service3[Service C]
    end

    Service1 --> M1
    Service1 --> M2
    Service1 --> M3

    Service2 --> M1
    Service2 --> M2
    Service2 --> M3

    Service3 --> M1
    Service3 --> M2
    Service3 --> M3

    style M1 fill:#e1f5ff
    style M2 fill:#fff4e1
    style M3 fill:#d4edda
    style Sentinel fill:#f0e1ff
```

### Deployment Strategy

**Redis Cluster Configuration**:

- **Mode**: Cluster mode with 3 masters + 3 slaves
- **Replication**: Each master has 1 slave for high availability
- **Sentinel**: 3-node Sentinel ensemble for automatic failover (Redis Cluster also promotes replicas on its own; Sentinel is normally paired with non-clustered master/replica deployments)
- **Sharding**: 16384 hash slots distributed evenly across 3 masters
- **Persistence**: RDB snapshots every 5 minutes, AOF disabled (performance)

**Resource Allocation**:

| Component | CPU | Memory | Disk | Replicas |
|-----------|-----|--------|------|----------|
| **Redis Master** | 1 core | 2GB | 10GB SSD | 3 |
| **Redis Slave** | 1 core | 2GB | 10GB SSD | 3 |
| **Sentinel** | 500m | 512MB | 5GB | 3 |

**Redis Configuration**:

```conf
# redis.conf
# Comments sit on their own lines because redis.conf does not accept trailing comments.
maxmemory 2gb
# Evict least recently used keys
maxmemory-policy allkeys-lru
# Close idle connections after 5 minutes
timeout 300
tcp-keepalive 60
# RDB snapshot every 5 minutes if at least 10 keys changed
save 300 10
# Disable AOF for performance
appendonly no

# Cluster config
cluster-enabled yes
cluster-node-timeout 5000
cluster-replica-validity-factor 0
```

**High Availability**:

- Automatic failover with Redis Sentinel
- Slave promotion when master fails
- Client-side retry logic
- Connection pooling (max 50 connections per service)

**Scaling Strategy**:

- **Vertical**: Increase memory per node (2GB → 4GB → 8GB)
- **Horizontal**: Add master nodes (3 → 5 → 7)
- **Read Scaling**: Route reads to slaves
- **Monitoring**: Auto-alert when memory usage > 80%
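
To make the sharding bullet concrete, the sketch below computes which hash slot a key lands in, following the published Redis Cluster rule (CRC16 of the key modulo 16384, hashing only the `{...}` hash tag when one is present). `crc16`, `keySlot`, and `ownerOf` are illustrative names, and `ownerOf` simply encodes the three slot ranges from the diagram.

```typescript
// CRC16-CCITT (XMODEM variant: polynomial 0x1021, initial value 0), as used by Redis Cluster.
function crc16(bytes: Uint8Array): number {
  let crc = 0;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i] << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// Maps a key to one of the 16384 hash slots. If the key contains a non-empty
// "{...}" hash tag, only the tag content is hashed.
function keySlot(key: string): number {
  let hashed = key;
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    if (close > open + 1) hashed = key.slice(open + 1, close);
  }
  return crc16(new TextEncoder().encode(hashed)) % 16384;
}

// Names the master that owns a slot under the even three-way split shown in the diagram.
function ownerOf(slot: number): 'M1' | 'M2' | 'M3' {
  if (slot <= 5460) return 'M1';
  if (slot <= 10922) return 'M2';
  return 'M3';
}

const permissionsSlot = keySlot('iam:user:{user_123}:permissions');
const rolesSlot = keySlot('iam:user:{user_123}:roles');
console.log(permissionsSlot === rolesSlot, ownerOf(permissionsSlot)); // same slot for both keys
```

This matters for invalidation: in cluster mode a multi-key command such as `DEL key1 key2` requires all keys to hash to the same slot. Keys named `iam:user:user_123:permissions` and `iam:user:user_123:roles` are not guaranteed to share a slot, whereas wrapping the identifier in a hash tag, as in the example, would keep one user's keys together. Adopting hash tags would be a change to the key naming convention, so treat it as an option to evaluate, not a current practice.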

## Monitoring & Observability

### Key Metrics

**Cache Performance Metrics**:

```typescript
// Custom metrics for cache performance

import { Counter, Histogram, Gauge } from 'prom-client';

export const cacheHits = new Counter({
  name: 'cache_hits_total',
  help: 'Total cache hits',
  labelNames: ['layer', 'key_prefix'] // layer: l1/l2, key_prefix: first key segment, e.g. iam
});

export const cacheMisses = new Counter({
  name: 'cache_misses_total',
  help: 'Total cache misses',
  labelNames: ['key_prefix']
});

export const cacheLatency = new Histogram({
  name: 'cache_operation_duration_seconds',
  help: 'Cache operation duration',
  labelNames: ['operation', 'layer'], // operation: get/set/del
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1]
});

export const cacheSize = new Gauge({
  name: 'cache_size_bytes',
  help: 'Cache size in bytes',
  labelNames: ['layer']
});

export const cacheEvictions = new Counter({
  name: 'cache_evictions_total',
  help: 'Total cache evictions',
  labelNames: ['layer', 'reason'] // reason: ttl_expired/memory_full
});
```

**Redis Metrics**:

- `redis_connected_clients` - Connected clients
- `redis_used_memory_bytes` - Memory usage
- `redis_memory_fragmentation_ratio` - Memory fragmentation
- `redis_keyspace_hits_total` - Cache hits
- `redis_keyspace_misses_total` - Cache misses
- `redis_evicted_keys_total` - Evicted keys
- `redis_expired_keys_total` - Expired keys
- `redis_commands_processed_total` - Commands processed

**Calculated Metrics**:

```promql
# Overall cache hit rate (sum() removes the labels that differ between the two counters)
sum(rate(cache_hits_total[5m]))
  / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))

# Share of hits served by L1
sum(rate(cache_hits_total{layer="l1"}[5m])) / sum(rate(cache_hits_total[5m]))

# Share of hits served by L2
sum(rate(cache_hits_total{layer="l2"}[5m])) / sum(rate(cache_hits_total[5m]))

# p95 cache operation latency
histogram_quantile(0.95, sum by (le) (rate(cache_operation_duration_seconds_bucket[5m])))

# Memory usage percentage
redis_used_memory_bytes / redis_maxmemory_bytes * 100
```

**Alerting Rules**:

```yaml
# Alerting rules for cache

groups:
  - name: cache_alerts
    interval: 30s
    rules:
      # Low cache hit rate
      - alert: LowCacheHitRate
        expr: |
          sum(rate(cache_hits_total[5m])) /
          (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m]))) < 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low cache hit rate"
          description: "Cache hit rate is {{ $value | humanizePercentage }}"

      # High memory usage
      - alert: HighRedisMemoryUsage
        expr: redis_used_memory_bytes / redis_maxmemory_bytes > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High Redis memory usage"
          description: "Redis memory usage is {{ $value | humanizePercentage }}"

      # High eviction rate
      - alert: HighEvictionRate
        expr: rate(redis_evicted_keys_total[5m]) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High cache eviction rate"
          description: "Eviction rate is {{ $value }}/sec"

      # Redis down
      - alert: RedisDown
        expr: redis_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis is down"

      # High replication lag
      - alert: HighReplicationLag
        expr: redis_replication_lag_seconds > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High Redis replication lag"
          description: "Replication lag is {{ $value }}s"
```

**Dashboards**:

- **Cache Overview**: Hit rate, miss rate, latency, size
- **Redis Cluster**: Memory usage, connections, commands/sec
- **Performance**: L1 vs L2 hit rates, operation latency
- **Evictions**: Eviction rate, reasons, trends

**Logging**:

```typescript
// Structured logging for cache operations

logger.debug('Cache operation', {
  operation: 'get',
  layer: 'l1',
  key: cacheKey,
  hit: true,
  latency: duration,
  correlationId: req.correlationId
});

logger.warn('Cache eviction', {
  layer: 'l2',
  reason: 'memory_full',
  evictedKeys: count,
  memoryUsage: usagePercent
});

logger.error('Cache error', {
  operation: 'set',
  layer: 'l2',
  error: error.message,
  key: cacheKey
});
```

**Health Checks**:

```typescript
// Health check for Redis
// `redis` is the shared ioredis client; `parseMemoryUsage` is an application helper
// that returns used memory as a fraction of maxmemory.
async function checkRedisHealth(): Promise<boolean> {
  try {
    await redis.ping();
    const info = await redis.info('memory');
    const memoryUsage = parseMemoryUsage(info);

    return memoryUsage < 0.9; // Healthy if < 90% memory
  } catch (error) {
    logger.error('Redis health check failed', { error });
    return false;
  }
}
```
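
The health check relies on a `parseMemoryUsage` helper whose body is not given. One possible implementation is sketched below, assuming the helper should return `used_memory / maxmemory` from the `INFO memory` output and treat an unset limit (`maxmemory:0`) as 0% usage. That fallback is an assumption of this sketch.

```typescript
// Hypothetical implementation of the parseMemoryUsage helper.
// Returns used memory as a fraction of maxmemory, or 0 when no limit is configured.
function parseMemoryUsage(info: string): number {
  const fields = new Map<string, string>();
  for (const line of info.split(/\r?\n/)) {
    if (line === '' || line.startsWith('#')) continue; // skip blanks and section headers
    const separator = line.indexOf(':');
    if (separator === -1) continue;
    fields.set(line.slice(0, separator), line.slice(separator + 1));
  }

  const used = Number(fields.get('used_memory'));
  const max = Number(fields.get('maxmemory'));
  if (!Number.isFinite(used) || !Number.isFinite(max) || max <= 0) {
    return 0; // maxmemory 0 means "no limit", so a ratio is not meaningful
  }
  return used / max;
}

const sample = '# Memory\r\nused_memory:1073741824\r\nmaxmemory:2147483648\r\n';
console.log(parseMemoryUsage(sample)); // 0.5
```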

## Related Documentation

- [System Design](./system-design.md) - Overall architecture with caching
- [Data Consistency Patterns](./data-consistency-patterns.md) - Cache invalidation patterns

---

**Last Updated**: 2026-01-07

**Authors**: GoodGo Architecture Team