docs: Update architecture documentation for GoodGo Platform

- Translated and revised architecture documents to enhance clarity and accessibility for both English and Vietnamese audiences.
- Improved diagrams and descriptions for caching, data consistency, event-driven architecture, microservices communication, observability, and security architecture.
- Ensured consistent formatting and terminology across all documents to facilitate better understanding and navigation.
- Added quick tips and troubleshooting sections to assist developers in implementing and managing the architecture effectively.
This commit is contained in:
Ho Ngoc Hai
2026-01-14 13:07:19 +07:00
parent c851fd97eb
commit 3ed499ef7c
8 changed files with 2859 additions and 1722 deletions

View File

@@ -1,8 +1,8 @@
# Caching Architecture
# Kiến trúc Caching
> Multi-layer caching strategy for optimal performance
> Chiến lược caching nhiều tầng để tối ưu hiệu suất
## Overview Diagram
## Sơ đồ Tổng quan
```mermaid
graph TD
@@ -18,16 +18,21 @@ graph TD
DB --> StoreL2[Store L2 + L1]
StoreL2 --> Return3[Return<br/>< 50ms]
style L1 fill:#d4edda
style L2 fill:#fff4e1
style DB fill:#f0e1ff
classDef memory fill:#1b5e20,stroke:#2e7d32,color:#fff
classDef redis fill:#e65100,stroke:#ef6c00,color:#fff
classDef db fill:#212121,stroke:#424242,color:#fff
classDef default fill:#202020,stroke:#505050,color:#fff
class L1,Return1,WarmL1 memory
class L2,Return2,StoreL2 redis
class DB,Return3 db
```
## System Context
## Bối cảnh Hệ thống
```mermaid
C4Context
title Caching System Context
title Sơ đồ Bối cảnh Hệ thống Caching
System(service, "Microservice", "Client service using cache")
System_Ext(db, "Neon PostgreSQL", "Primary database")
@@ -41,33 +46,38 @@ C4Context
Rel(service, l2, "Reads/Writes", "Redis Protocol")
Rel(l1, l2, "Fills from", "On miss")
Rel(l2, db, "Cache aside", "On miss")
UpdateElementStyle(service, $fontColor="white", $bgColor="#1a237e", $borderColor="#3949ab")
UpdateElementStyle(db, $fontColor="white", $bgColor="#212121", $borderColor="#424242")
UpdateElementStyle(l1, $fontColor="white", $bgColor="#1b5e20", $borderColor="#2e7d32")
UpdateElementStyle(l2, $fontColor="white", $bgColor="#e65100", $borderColor="#ef6c00")
```
### Context Description
- **Service**: Communicates directly with L1 Cache (in-memory) for lowest latency.
- **L1 Cache**: Local cache, not shared, automatic expiration (short TTL).
- **L2 Cache**: Shared Redis cluster, holds data longer and syncs across instances.
- **Database**: Source of truth, accessed only on cache miss.
### Mô tả Bối cảnh
- **Service**: Giao tiếp trực tiếp với L1 Cache (in-memory) để đạt độ trễ thấp nhất.
- **L1 Cache**: Cache cục bộ, không chia sẻ, tự động hết hạn (TTL ngắn).
- **L2 Cache**: Redis cluster chia sẻ, giữ dữ liệu lâu dài hơn và đồng bộ giữa các instances.
- **Database**: Nguồn dữ liệu gốc (source of truth), chỉ được truy cập khi cache miss.
## Architecture Description
## Mô tả Kiến trúc
### Multi-Layer Caching
### Caching Nhiều Tầng
GoodGo platform uses 2-layer caching for performance:
Nền tảng GoodGo sử dụng caching 2 tầng để tối ưu hiệu suất:
**L1 Cache (Memory)**:
- In-memory cache per service instance
- Very fast access (< 1ms)
- Limited capacity (10k keys default)
- Short TTL (60 seconds default, max 5 minutes)
- Not shared across instances
- In-memory cache trên mỗi service instance
- Truy cập rất nhanh (< 1ms)
- Dung lượng giới hạn (10k keys mặc định)
- TTL ngắn (60 giây mặc định, tối đa 5 phút)
- Không share giữa instances
**L2 Cache (Redis)**:
- Shared distributed cache
- Fast access (< 5ms)
- Large capacity
- Longer TTL (configurable, typically 5-15 minutes)
- Shared across all service instances
- Truy cập nhanh (< 5ms)
- Dung lượng lớn
- TTL dài hơn (configurable, thường 5-15 phút)
- Share giữa tất cả service instances
**Cache Flow**:
```
@@ -77,97 +87,99 @@ Request → L1 → L2 → Database
hit rate hit rate rate
```
## Cache Implementation
## Triển khai Cache
### Multi-Layer Cache Service
### Multi-Layer Cache Service (.NET)
```typescript
export class MultiLayerCache {
private l1Cache: NodeCache;
private l2Cache: Redis;
constructor() {
// L1: Memory cache
this.l1Cache = new NodeCache({
stdTTL: 60, // 60 seconds default
maxKeys: 10000, // Max 10k keys
checkperiod: 120 // Check for expired keys every 2min
});
```csharp
// EN: Multi-layer cache implementation
// VI: Triển khai cache đa lớp
public class MultiLayerCacheService : ICacheService
{
private readonly IMemoryCache _l1Cache;
private readonly IConnectionMultiplexer _redis;
private readonly IDatabase _l2Cache;
private readonly ILogger<MultiLayerCacheService> _logger;
// L2: Redis cache
this.l2Cache = new Redis({
host: process.env.REDIS_HOST,
port: parseInt(process.env.REDIS_PORT),
db: 0
});
}
async get<T>(key: string): Promise<T | null> {
// Try L1 first
const l1Value = this.l1Cache.get<T>(key);
if (l1Value) {
logger.debug('L1 cache hit', { key });
return l1Value;
public MultiLayerCacheService(
IMemoryCache l1Cache,
IConnectionMultiplexer redis,
ILogger<MultiLayerCacheService> logger)
{
_l1Cache = l1Cache;
_redis = redis;
_l2Cache = redis.GetDatabase();
_logger = logger;
}
// Try L2
const l2Value = await this.l2Cache.get(key);
if (l2Value) {
logger.debug('L2 cache hit', { key });
const parsed = JSON.parse(l2Value) as T;
// Warm L1 cache
this.l1Cache.set(key, parsed);
return parsed;
public async Task<T?> GetAsync<T>(string key, CancellationToken ct = default)
{
// L1: Memory cache check
if (_l1Cache.TryGetValue(key, out T? l1Value))
{
_logger.LogDebug("L1 cache hit for key: {Key}", key);
return l1Value;
}
// L2: Redis cache check
var l2Value = await _l2Cache.StringGetAsync(key);
if (!l2Value.IsNullOrEmpty)
{
_logger.LogDebug("L2 cache hit for key: {Key}", key);
var parsed = JsonSerializer.Deserialize<T>(l2Value!);
// Warm L1 cache
_l1Cache.Set(key, parsed, TimeSpan.FromMinutes(1));
return parsed;
}
_logger.LogDebug("Cache miss for key: {Key}", key);
return default;
}
logger.debug('Cache miss', { key });
return null;
}
async set(key: string, value: any, ttl: number = 300): Promise<void> {
// Store in both L1 and L2
this.l1Cache.set(key, value, Math.min(ttl, 300)); // L1 max 5min
await this.l2Cache.setex(key, ttl, JSON.stringify(value));
}
async del(key: string): Promise<void> {
this.l1Cache.del(key);
await this.l2Cache.del(key);
}
async invalidatePattern(pattern: string): Promise<void> {
// L1: Clear all (simple approach)
this.l1Cache.flushAll();
// L2: Delete by pattern
const keys = await this.l2Cache.keys(pattern);
if (keys.length > 0) {
await this.l2Cache.del(...keys);
public async Task SetAsync<T>(string key, T value, TimeSpan? ttl = null, CancellationToken ct = default)
{
var expiry = ttl ?? TimeSpan.FromMinutes(5);
var l1Expiry = TimeSpan.FromMinutes(Math.Min(expiry.TotalMinutes, 5));
// L1: Memory cache (max 5 min)
_l1Cache.Set(key, value, l1Expiry);
// L2: Redis cache
var json = JsonSerializer.Serialize(value);
await _l2Cache.StringSetAsync(key, json, expiry);
}
public async Task RemoveAsync(string key, CancellationToken ct = default)
{
_l1Cache.Remove(key);
await _l2Cache.KeyDeleteAsync(key);
}
}
}
```
### Cache Key Naming
### Quy ước Đặt tên Key
**Pattern**: `{service}:{entity}:{identifier}:{sub-resource}`
**Examples**:
```typescript
const keys = {
user: (userId: string) => `iam:user:${userId}`,
userPermissions: (userId: string) => `iam:user:${userId}:permissions`,
userRoles: (userId: string) => `iam:user:${userId}:roles`,
session: (sessionId: string) => `iam:session:${sessionId}`,
};
**Ví dụ (C#)**:
```csharp
// Cache key constants
public static class CacheKeys
{
public static string User(string userId) => $"iam:user:{userId}";
public static string UserPermissions(string userId) => $"iam:user:{userId}:permissions";
public static string UserRoles(string userId) => $"iam:user:{userId}:roles";
public static string Session(string sessionId) => $"iam:session:{sessionId}";
public static string UserQuota(string userId) => $"storage:quota:{userId}";
}
// Usage
const user = await cache.get(keys.user('user_123'));
const permissions = await cache.get(keys.userPermissions('user_123'));
// Sử dụng
var user = await _cache.GetAsync<UserDto>(CacheKeys.User("user_123"));
var permissions = await _cache.GetAsync<List<string>>(CacheKeys.UserPermissions("user_123"));
```
## TTL Strategies
## Chiến lược TTL
```mermaid
graph LR
@@ -186,22 +198,28 @@ graph LR
Long --> Config[Static Config]
Long --> RefData[Reference Data]
style Short fill:#f8d7da
style Medium fill:#fff3cd
style Long fill:#d4edda
classDef tier fill:#202020,stroke:#505050,color:#fff
classDef short fill:#b71c1c,stroke:#f44336,color:#fff
classDef medium fill:#e65100,stroke:#ef6c00,color:#fff
classDef long fill:#1b5e20,stroke:#2e7d32,color:#fff
class Short short
class Medium medium
class Long long
class Permissions,Sessions,UserProfiles,OrgData,Config,RefData tier
```
**TTL Guidelines**:
| Data Type | TTL | Reason |
|-----------|-----|--------|
| User permissions | 5 min | Security-sensitive |
| Session data | Varies | Based on session length |
| User profiles | 10 min | Moderate update frequency |
| Organization data | 15 min | Infrequent updates |
| Static config | 30-60 min | Very stable |
| Reference data | 1-2 hours | Almost never changes |
**Hướng dẫn TTL**:
| Loại Dữ liệu | TTL | Lý do |
|---------------------------|-----|----------------|
| User permissions | 5 min | Nhạy cảm bảo mật |
| Session data | Varies | Dựa trên độ dài session |
| User profiles | 10 min | Tần suất cập nhật vừa phải |
| Organization data | 15 min | Cập nhật không thường xuyên |
| Static config | 30-60 min | Rất ổn định |
| Reference data | 1-2 hours | Hầu như không thay đổi |
## Cache Invalidation
## Vô hiệu hóa Cache
```mermaid
sequenceDiagram
@@ -224,39 +242,39 @@ sequenceDiagram
Note over Service,Cache: Next request will fetch fresh data
```
**Invalidation Strategies**:
**Chiến lược Invalidation**:
```typescript
// 1. Single key invalidation
// 1. Invalidation single key
async updateUser(userId: string, data: UpdateUserDto): Promise<User> {
const user = await userRepository.update(userId, data);
// Invalidate user cache
// Vô hiệu hóa user cache
await cache.del(cacheKeys.user(userId));
return user;
}
// 2. Pattern-based invalidation
// 2. Invalidation theo pattern
async updateUserRole(userId: string, roleId: string): Promise<void> {
await userRoleRepository.assign(userId, roleId);
// Invalidate all user-related cache
// Vô hiệu hóa tất cả cache liên quan đến user
await cache.invalidatePattern(`iam:user:${userId}:*`);
}
// 3. Time-based invalidation (TTL expiry)
// Automatically handled by cache
// 3. Invalidation theo thời gian (TTL expiry)
// Tự động xử lý bởi cache
```
## Cache Warming
## Làm ấm Cache
```typescript
// Preload frequently accessed data
// Preload dữ liệu thường xuyên truy cập
async warmCache(): Promise<void> {
logger.info('Starting cache warming');
// Warm user permissions for active users
// Làm ấm user permissions cho active users
const activeUsers = await userRepository.findActive({ limit: 1000 });
for (const user of activeUsers) {
@@ -265,118 +283,354 @@ async warmCache(): Promise<void> {
await cache.set(
cacheKeys.userPermissions(user.id),
permissions,
300 // 5 minutes
300 // 5 phút
);
}
logger.info('Cache warming completed', { count: activeUsers.length });
}
// Run on service startup
// Chạy khi service khởi động
warmCache().catch(err => logger.error('Cache warming failed', { err }));
```
## Design Decisions
## Quyết định Thiết kế
### Decision 1: Multi-layer Caching (L1 + L2)
### Quyết định 1: Multi-layer Caching (L1 + L2)
**Context**: Need to reduce load on Redis and achieve ultra-low latency for hot data.
**Decision**: Use combination of L1 (NodeCache) and L2 (Redis).
**Consequences**:
-Latency < 1ms for 40-50% requests.
-Reduced network traffic to Redis.
-Synchronization complexity (L1 might be stale for short duration).
**Bối cảnh**: Cần giảm tải cho Redis và đạt độ trễ cực thấp cho dữ liệu hot.
**Quyết định**: Sử dụng kết hợp L1 (NodeCache) L2 (Redis).
**Hậu quả**:
-Độ trễ < 1ms cho 40-50% requests.
-Giảm network traffic tới Redis.
-Phức tạp trong đồng bộ (L1 có thể stale trong thời gian ngắn).
## Performance Characteristics
## Đặc điểm Hiệu suất
### Performance Targets
| Metric | Target | Notes |
|--------|--------|-------|
### Mục tiêu Hiệu suất
| Chỉ số | Mục tiêu | Ghi chú |
|-----------------|-------------------|-----------------|
| **L1 Hit Latency** | < 0.5ms | In-memory lookup |
| **L2 Hit Latency** | < 5ms | Network RTT + Redis processing |
| **Combine Hit Rate** | > 90% | L1 + L2 combined |
| **L1 Capacity** | 10k items | Per instance limit to protect heap |
| **Cache Warmup Time** | < 30s | At service startup |
## Security Considerations
## Cân nhắc Bảo mật
### Cache Security
- **Encryption**: Sensitive data (PII) MUST be encrypted before storing in L2 Redis (AES-256). L1 can store plaintext as it is in process memory (unless memory dump).
- **Isolation**: Redis instance protected by password and Network Policy (allow internal K8s traffic only).
- **TLS**: Connect to Redis via TLS 1.2+.
- **Data Sanitization**: Do not cache entire user objects if they contain password hashes or secrets.
### Bảo mật Cache
- **Encryption**: Dữ liệu nhạy cảm (PII) PHẢI được mã hóa trước khi lưu vào L2 Redis (AES-256). L1 có thể lưuplaintext vì nằm trong memory process (trừ khi memory dump).
- **Isolation**: Redis instance được bảo vệ bằng mật khẩu và Network Policy (chỉ allow traffic từ nội bộ K8s).
- **TLS**: Kết nối tới Redis qua TLS 1.2+.
- **Data Sanitization**: Không cache toàn bộ user object nếu chứa password hash hoặc secrets.
## Deployment
## Triển khai
```mermaid
graph TD
subgraph "Kubernetes Pod"
Service[Microservice Container]
L1[L1 Cache (RAM)]
Service --- L1
subgraph "Redis Cluster"
subgraph "Masters"
M1[Redis Master 1<br/>Slots: 0-5460]
M2[Redis Master 2<br/>Slots: 5461-10922]
M3[Redis Master 3<br/>Slots: 10923-16383]
end
subgraph "Slaves"
S1[Redis Slave 1<br/>Replica of M1]
S2[Redis Slave 2<br/>Replica of M2]
S3[Redis Slave 3<br/>Replica of M3]
end
M1 --> S1
M2 --> S2
M3 --> S3
Sentinel[Redis Sentinel<br/>3 nodes]
Sentinel -.->|Monitor| M1
Sentinel -.->|Monitor| M2
Sentinel -.->|Monitor| M3
end
subgraph "Infrastructure"
RedisMaster[Redis Master]
RedisSlave1[Redis Slave 1]
RedisSlave2[Redis Slave 2]
subgraph "Services"
Service1[Service A]
Service2[Service B]
Service3[Service C]
end
Service -->|Write| RedisMaster
Service -->|Read| RedisSlave1
Service -->|Read| RedisSlave2
RedisMaster -.->|Replication| RedisSlave1
RedisMaster -.->|Replication| RedisSlave2
style Service fill:#e1f5ff
style L1 fill:#d4edda
style RedisMaster fill:#fff4e1
Service1 --> M1
Service1 --> M2
Service1 --> M3
Service2 --> M1
Service2 --> M2
Service2 --> M3
Service3 --> M1
Service3 --> M2
Service3 --> M3
classDef master fill:#e65100,stroke:#ef6c00,color:#fff
classDef slave fill:#f57c00,stroke:#e65100,color:#fff
classDef sentinel fill:#4a148c,stroke:#7b1fa2,color:#fff
classDef service fill:#1a237e,stroke:#3949ab,color:#fff
classDef default fill:#202020,stroke:#505050,color:#fff
class M1,M2,M3 master
class S1,S2,S3 slave
class Sentinel sentinel
class Service1,Service2,Service3 service
```
**Deployment Description**:
- **L1**: Embedded directly in Microservice process, scales with number of Pods.
- **L2**: Redis Cluster (or Sentinel) with at least 3 nodes for High Availability.
- **Connection Pooling**: Use ioredis with connection pooling for efficient connection management.
### Chiến lược Triển khai
## Monitoring & Observability
**Redis Cluster Configuration**:
- **Mode**: Cluster mode với 3 masters + 3 slaves
- **Replication**: Mỗi master có 1 slave cho high availability
- **Sentinel**: 3-node Sentinel ensemble cho automatic failover
- **Sharding**: 16384 hash slots phân chia đều giữa 3 masters
- **Persistence**: RDB snapshots mỗi 5 phút, AOF disabled (performance)
### Monitoring Metrics
- **Metrics**: Prometheus metrics for hit rate, miss rate, latency, memory usage.
- **Logs**: Log cache miss/hit at debug level (sampled), log connection errors at error level.
- **Health Checks**: Readiness probe checks connection to Redis.
**Resource Allocation**:
| Component | CPU | Memory | Disk | Replicas |
|-----------|-----|--------|------|----------|
| **Redis Master** | 1 core | 2GB | 10GB SSD | 3 |
| **Redis Slave** | 1 core | 2GB | 10GB SSD | 3 |
| **Sentinel** | 500m | 512MB | 5GB | 3 |
### Monitoring Code
**Redis Configuration**:
```yaml
# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru # Evict least recently used keys
timeout 300 # Close idle connections after 5min
tcp-keepalive 60
save 300 10 # RDB snapshot every 5min if 10+ keys changed
appendonly no # Disable AOF for performance
**Cache Hit Rates**:
# Cluster config
cluster-enabled yes
cluster-node-timeout 5000
cluster-replica-validity-factor 0
```
**High Availability**:
- Automatic failover với Redis Sentinel
- Slave promotion khi master fails
- Client-side retry logic
- Connection pooling (max 50 connections per service)
**Scaling Strategy**:
- **Vertical**: Tăng memory per node (2GB → 4GB → 8GB)
- **Horizontal**: Thêm master nodes (3 → 5 → 7)
- **Read Scaling**: Route reads to slaves
- **Monitoring**: Auto-alert khi memory usage > 80%
## Giám sát & Khả năng quan sát
### Chỉ số Chính
**Cache Performance Metrics**:
```typescript
// Track cache performance
export class CacheMetrics {
// ... Prometheus Implementation ...
// Custom metrics cho cache performance
import { Counter, Histogram, Gauge } from 'prom-client';
export const cacheHits = new Counter({
name: 'cache_hits_total',
labelNames: ['layer', 'key_prefix'] // layer: l1/l2, key_prefix: user/session/etc
});
export const cacheMisses = new Counter({
name: 'cache_misses_total',
help: 'Tổng số cache misses',
labelNames: ['key_prefix']
});
export const cacheLatency = new Histogram({
name: 'cache_operation_duration_seconds',
help: 'Thời gian thực hiện cache operation',
labelNames: ['operation', 'layer'], // operation: get/set/del
buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1]
});
export const cacheSize = new Gauge({
name: 'cache_size_bytes',
help: 'Kích thước cache (bytes)',
labelNames: ['layer']
});
export const cacheEvictions = new Counter({
name: 'cache_evictions_total',
help: 'Tổng số cache evictions',
labelNames: ['layer', 'reason'] // reason: ttl_expired/memory_full
});
```
**Redis Metrics**:
- `redis_connected_clients` - Connected clients
- `redis_used_memory_bytes` - Memory usage
- `redis_memory_fragmentation_ratio` - Memory fragmentation
- `redis_keyspace_hits_total` - Cache hits
- `redis_keyspace_misses_total` - Cache misses
- `redis_evicted_keys_total` - Evicted keys
- `redis_expired_keys_total` - Expired keys
- `redis_commands_processed_total` - Commands processed
**Calculated Metrics**:
```promql
# Cache hit rate
rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))
# L1 hit rate
rate(cache_hits_total{layer="l1"}[5m]) / rate(cache_hits_total[5m])
# L2 hit rate
rate(cache_hits_total{layer="l2"}[5m]) / rate(cache_hits_total[5m])
# Average cache latency
histogram_quantile(0.95, cache_operation_duration_seconds_bucket)
# Memory usage percentage
redis_used_memory_bytes / redis_maxmemory_bytes * 100
```
**Alerting Rules**:
```yaml
# Quy tắc cảnh báo cho cache
groups:
- name: cache_alerts
interval: 30s
rules:
# Low cache hit rate
- alert: LowCacheHitRate
expr: |
rate(cache_hits_total[5m]) /
(rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) < 0.5
for: 10m
labels:
severity: warning
annotations:
summary: "Tỷ lệ cache hit thấp"
description: "Tỷ lệ cache hit là {{ $value | humanizePercentage }}"
# High memory usage
- alert: HighRedisMemoryUsage
expr: redis_used_memory_bytes / redis_maxmemory_bytes > 0.8
for: 5m
labels:
severity: warning
annotations:
summary: "Sử dụng bộ nhớ Redis cao"
description: "Bộ nhớ Redis sử dụng là {{ $value | humanizePercentage }}"
# High eviction rate
- alert: HighEvictionRate
expr: rate(redis_evicted_keys_total[5m]) > 100
for: 5m
labels:
severity: warning
annotations:
summary: "Tỷ lệ cache eviction cao"
description: "Tỷ lệ eviction là {{ $value }}/giây"
# Redis down
- alert: RedisDown
expr: redis_up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Redis bị down"
# High replication lag
- alert: HighReplicationLag
expr: redis_replication_lag_seconds > 5
for: 2m
labels:
severity: warning
annotations:
summary: "Độ trễ replication cao"
description: "Độ trễ replication là {{ $value }}s"
```
**Dashboards**:
- **Cache Overview**: Hit rate, miss rate, latency, size
- **Redis Cluster**: Memory usage, connections, commands/sec
- **Performance**: L1 vs L2 hit rates, operation latency
- **Evictions**: Eviction rate, reasons, trends
**Logging**:
```typescript
// Structured logging cho cache operations
logger.debug('Cache operation', {
operation: 'get',
layer: 'l1',
key: cacheKey,
hit: true,
latency: duration,
correlationId: req.correlationId
});
logger.warn('Cache eviction', {
layer: 'l2',
reason: 'memory_full',
evictedKeys: count,
memoryUsage: usagePercent
});
logger.error('Cache error', {
operation: 'set',
layer: 'l2',
error: error.message,
key: cacheKey
});
```
**Health Checks**:
```typescript
// Health check cho Redis
async function checkRedisHealth(): Promise<boolean> {
try {
await redis.ping();
const info = await redis.info('memory');
const memoryUsage = parseMemoryUsage(info);
return memoryUsage < 0.9; // Healthy if < 90% memory
} catch (error) {
logger.error('Redis health check failed', { error });
return false;
}
}
```
**Expected Performance**:
| Metric | L1 Cache | L2 Cache | Database |
|--------|----------|----------|----------|
| Latency | < 1ms | < 5ms | < 50ms |
| Hit Rate | 40-50% | 80-90% | - |
| Capacity | 10k keys | Unlimited | - |
## Best Practices
## Tài liệu Liên quan
**DO**:
- ✅ Use cache for frequently accessed data
- ✅ Set appropriate TTLs based on data change frequency
- ✅ Invalidate cache on data updates
- ✅ Use cache key namespacing
- ✅ Monitor cache hit rates
- ✅ Warm cache on startup for critical data
- [System Design](./system-design.md) - Kiến trúc tổng thể với caching
- [Data Consistency Patterns](./data-consistency-patterns.md) - Cache invalidation patterns
**DON'T**:
- ❌ Cache data that changes very frequently
- ❌ Set TTL too long (stale data risk)
- ❌ Set TTL too short (negates cache benefit)
- ❌ Cache sensitive data without encryption
- ❌ Ignore cache invalidation on updates
- ❌ Use cache as primary data store
---
**Cập nhật Lần cuối**: 2026-01-14
**Tác giả**: GoodGo Architecture Team
## Quick Tips
### Mermaid Common Issues
- **Arrow Syntax**: Use `-->` for solid arrows, `-.->` for dotted arrows.
- **Node IDs**: Avoid spaces/special chars in IDs (e.g., `Node-A` not `Node A`).
- **Subgraphs**: Ensure `subgraph` names are unique and descriptive.
### Color Pattern Quick Reference
| Element | Dark Color | Text Color |
|---------|------------|------------|
| **Service (Blue)** | `#1a237e` | `#ffffff` |
| **Storage (Gray)** | `#212121` | `#ffffff` |
| **Cache L2 (Orange)** | `#e65100` | `#ffffff` |
| **Cache L1 (Green)** | `#1b5e20` | `#ffffff` |
| **Monitoring (Purple)** | `#4a148c` | `#ffffff` |
### Visual Indicators
-**Recommended / Khuyên dùng**
-**Not Recommended / Không khuyên dùng**
- ⚠️ **Warning / Cảnh báo**

View File

@@ -1,8 +1,8 @@
# Data Consistency Patterns
# Kiến trúc Patterns Đồng bộ Dữ liệu
> Patterns for maintaining data consistency in distributed microservices architecture
> Các patterns để duy trì tính nhất quán dữ liệu trong kiến trúc microservices phân tán
## Overview Diagram
## Sơ đồ Tổng quan
```mermaid
graph TD
@@ -24,32 +24,39 @@ graph TD
OptimisticLock --> StrongConsistency[Strong Consistency]
CQRS --> EventualConsistency
style Saga fill:#e1f5ff
style Outbox fill:#fff4e1
style Idempotency fill:#f0e1ff
style CQRS fill:#d4edda
%% Dark color palette with white text
style Saga fill:#1d4ed8,stroke:#3b82f6,color:#ffffff
style Outbox fill:#b45309,stroke:#f59e0b,color:#ffffff
style Idempotency fill:#7e22ce,stroke:#a855f7,color:#ffffff
style OptimisticLock fill:#15803d,stroke:#22c55e,color:#ffffff
style CQRS fill:#15803d,stroke:#22c55e,color:#ffffff
style EventualConsistency fill:#374151,stroke:#6b7280,color:#ffffff
style StrongConsistency fill:#374151,stroke:#6b7280,color:#ffffff
style Service1 fill:#4527a0,stroke:#7c4dff,color:#ffffff
style Service2 fill:#4527a0,stroke:#7c4dff,color:#ffffff
style Service3 fill:#4527a0,stroke:#7c4dff,color:#ffffff
```
## Architecture Description
## Mô tả Kiến trúc
### Architecture Overview
### Tổng quan Kiến trúc
GoodGo platform uses multiple consistency patterns to handle distributed data:
Nền tảng GoodGo sử dụng nhiều consistency patterns để xử lý dữ liệu phân tán:
**Core Challenges**:
- No distributed transactions (2PC too slow)
- Services own their data (database per service)
- Network failures can cause partial completion
- Need to maintain data integrity across services
**Thách thức Cốt lõi**:
- Không có distributed transactions (2PC quá chậm)
- Services sở hữu dữ liệu riêng (database per service)
- Network failures có thể gây partial completion
- Cần maintain data integrity giữa các services
**Pattern Selection**:
- **Saga**: For multi-service workflows
- **Outbox**: For guaranteed event publishing
- **Idempotency**: For safe retries
- **Optimistic Locking**: For concurrent updates
- **CQRS**: For read/write optimization
**Lựa chọn Pattern**:
- **Saga**: Cho workflows nhiều services
- **Outbox**: Cho event publishing đảm bảo
- **Idempotency**: Cho retries an toàn
- **Optimistic Locking**: Cho concurrent updates
- **CQRS**: Cho tối ưu read/write
## System Context
## Bối cảnh Hệ thống
```mermaid
C4Context
@@ -88,9 +95,9 @@ C4Context
UpdateRelStyle(saga_orchestrator, inventory_service, $lineColor="red", $textColor="red")
```
The GoodGo platform uses a database-per-service architecture where each service owns its data. Data consistency across services is achieved through patterns like Saga (for coordinated workflows), Outbox (for reliable event publishing), Idempotency (for safe retries), and Optimistic Locking (for concurrent updates). These patterns enable eventual consistency while maintaining data integrity.
Nền tảng GoodGo sử dụng kiến trúc database-per-service nơi mỗi service sở hữu dữ liệu riêng. Tính nhất quán dữ liệu giữa các services đạt được thông qua các patterns như Saga (cho workflows phối hợp), Outbox (cho event publishing đáng tin cậy), Idempotency (cho retries an toàn), và Optimistic Locking (cho concurrent updates). Các patterns này cho phép eventual consistency đồng thời duy trì data integrity.
## Saga Pattern
## Pattern Saga
```mermaid
sequenceDiagram
@@ -119,9 +126,9 @@ sequenceDiagram
end
```
**Description**: Saga manages distributed transactions as sequence of local transactions with compensation.
**Mô tả**: Saga quản lý distributed transactions dưới dạng chuỗi local transactions với compensation.
**Implementation**:
**Triển khai**:
```typescript
// Saga orchestrator
class OrderSaga {
@@ -133,19 +140,19 @@ class OrderSaga {
};
try {
// Step 1: Create order
// Bước 1: Tạo đơn hàng
sagaContext.orderId = await orderService.create(orderData);
// Step 2: Process payment
// Bước 2: Xử lý thanh toán
sagaContext.paymentId = await paymentService.process(orderData.payment);
// Step 3: Reserve inventory
// Bước 3: Đặt trước kho
sagaContext.inventoryId = await inventoryService.reserve(orderData.items);
// All success - commit
// Tất cả thành công - commit
await this.completeSaga(sagaContext);
} catch (error) {
// Compensate in reverse order
// Compensate theo thứ tự ngược lại
await this.compensate(sagaContext, error);
throw error;
}
@@ -165,7 +172,7 @@ class OrderSaga {
}
```
## Outbox Pattern
## Pattern Outbox
```mermaid
sequenceDiagram
@@ -189,57 +196,54 @@ sequenceDiagram
end
```
**Description**: Guarantees event publishing by storing events in database within same transaction as business data.
**Mô tả**: Đảm bảo event publishing bằng cách lưu events trong database cùng transaction với business data.
**Implementation**:
```typescript
// Store event in outbox
async createUser(userData: CreateUserDto): Promise<User> {
return await prisma.$transaction(async (tx) => {
// Business operation
const user = await tx.user.create({ data: userData });
**Triển khai (.NET với EF Core)**:
```csharp
// EN: Save event in outbox with business data
// VI: Lưu event trong outbox cùng với business data
public async Task<User> CreateUserAsync(CreateUserDto dto, CancellationToken ct)
{
await using var transaction = await _context.Database.BeginTransactionAsync(ct);
// Store event in outbox (same transaction)
await tx.outbox.create({
data: {
aggregateId: user.id,
aggregateType: 'User',
eventType: 'user.created.v1',
payload: JSON.stringify(user),
createdAt: new Date()
}
});
return user;
});
}
// Outbox processor (runs periodically)
async processOutbox(): Promise<void> {
const events = await prisma.outbox.findMany({
where: { publishedAt: null },
take: 100
});
for (const event of events) {
try {
await kafkaProducer.send({
topic: event.eventType,
messages: [{ value: event.payload }]
});
await prisma.outbox.update({
where: { id: event.id },
data: { publishedAt: new Date() }
});
} catch (error) {
logger.error('Failed to publish event', { event, error });
try
{
// Business operation
var user = new User
{
Id = Guid.NewGuid(),
Email = dto.Email,
FirstName = dto.FirstName,
LastName = dto.LastName
};
_context.Users.Add(user);
// Lưu event trong outbox (cùng transaction)
var outboxEvent = new OutboxMessage
{
Id = Guid.NewGuid(),
AggregateId = user.Id.ToString(),
AggregateType = nameof(User),
EventType = "user.created.v1",
Payload = JsonSerializer.Serialize(user),
CreatedAt = DateTime.UtcNow
};
_context.OutboxMessages.Add(outboxEvent);
await _context.SaveChangesAsync(ct);
await transaction.CommitAsync(ct);
return user;
}
catch
{
await transaction.RollbackAsync(ct);
throw;
}
}
}
```
## Idempotency Pattern
## Pattern Idempotency
```mermaid
graph LR
@@ -256,36 +260,43 @@ graph LR
Request2 --> Check
Return --> Response2[Same Response]
style Check fill:#fff3cd
style Store fill:#d4edda
%% Dark color palette with white text
style Request1 fill:#374151,stroke:#6b7280,color:#ffffff
style Request2 fill:#374151,stroke:#6b7280,color:#ffffff
style Check fill:#b45309,stroke:#f59e0b,color:#ffffff
style Process fill:#1d4ed8,stroke:#3b82f6,color:#ffffff
style Store fill:#15803d,stroke:#22c55e,color:#ffffff
style Return fill:#7e22ce,stroke:#a855f7,color:#ffffff
style Response1 fill:#15803d,stroke:#22c55e,color:#ffffff
style Response2 fill:#15803d,stroke:#22c55e,color:#ffffff
```
**Description**: Ensures operations can be safely retried without side effects by using idempotency keys.
**Mô tả**: Đảm bảo operations có thể retry an toàn mà không có side effects bằng cách sử dụng idempotency keys.
**Implementation**:
**Triển khai**:
```typescript
// Idempotency middleware
async function idempotentOperation<T>(
key: string,
operation: () => Promise<T>,
ttl: number = 86400 // 24 hours
ttl: number = 86400
): Promise<T> {
// Check if already processed
// Kiểm tra đã xử lý chưa
const cached = await redis.get(`idempotency:${key}`);
if (cached) {
return JSON.parse(cached);
}
// Process operation
// Xử lý operation
const result = await operation();
// Store result
// Lưu kết quả
await redis.setex(`idempotency:${key}`, ttl, JSON.stringify(result));
return result;
}
// Usage in controller
// Sử dụng trong controller
async createPayment(req: Request, res: Response): Promise<void> {
const idempotencyKey = req.headers['idempotency-key'] as string;
@@ -302,7 +313,7 @@ async createPayment(req: Request, res: Response): Promise<void> {
}
```
## Optimistic Locking
## Khóa Lạc quan (Optimistic Locking)
```mermaid
sequenceDiagram
@@ -328,38 +339,49 @@ sequenceDiagram
Service-->>User2: Success
```
**Description**: Prevents lost updates by checking version on update.
**Mô tả**: Ngăn chặn lost updates bằng cách kiểm tra version khi update.
**Implementation**:
```prisma
// Prisma schema
model User {
id String @id @default(cuid())
email String @unique
name String
version Int @default(1) // Version field
**Triển khai (.NET với EF Core)**:
```csharp
// EN: Entity with concurrency token
// VI: Entity với concurrency token
public class User
{
public Guid Id { get; set; }
public string Email { get; set; } = default!;
public string Name { get; set; } = default!;
[ConcurrencyCheck]
public int Version { get; set; } = 1;
// Or use RowVersion for SQL Server
// [Timestamp]
// public byte[] RowVersion { get; set; }
}
```
```typescript
// Update with optimistic locking
async updateUser(userId: string, data: UpdateUserDto, currentVersion: number): Promise<User> {
const result = await prisma.user.updateMany({
where: {
id: userId,
version: currentVersion // Check version
},
data: {
...data,
version: { increment: 1 } // Increment version
// EN: Update with optimistic locking
// VI: Update với optimistic locking
public async Task<User> UpdateUserAsync(
Guid userId,
UpdateUserDto dto,
CancellationToken ct)
{
var user = await _context.Users.FindAsync([userId], ct)
?? throw new UserNotFoundException(userId);
user.Name = dto.Name;
user.Version++; // Increment version
try
{
await _context.SaveChangesAsync(ct);
return user;
}
catch (DbUpdateConcurrencyException)
{
throw new ConcurrencyConflictException(
"Data was modified by another user. Please refresh and try again.");
}
});
if (result.count === 0) {
throw new ConflictError('Version mismatch - data was modified by another user');
}
return await prisma.user.findUnique({ where: { id: userId } });
}
```
@@ -381,97 +403,104 @@ graph LR
WriteModel --> DB1[(Write DB)]
ReadModel --> DB2[(Read DB<br/>Optimized)]
style WriteModel fill:#f0e1ff
style ReadModel fill:#d4edda
%% Dark color palette with white text
style Command fill:#1d4ed8,stroke:#3b82f6,color:#ffffff
style WriteModel fill:#7e22ce,stroke:#a855f7,color:#ffffff
style Events fill:#b45309,stroke:#f59e0b,color:#ffffff
style Projection fill:#1d4ed8,stroke:#3b82f6,color:#ffffff
style ReadModel fill:#15803d,stroke:#22c55e,color:#ffffff
style Query fill:#15803d,stroke:#22c55e,color:#ffffff
style DB1 fill:#374151,stroke:#6b7280,color:#ffffff
style DB2 fill:#374151,stroke:#6b7280,color:#ffffff
```
**Description**: Separates read and write models for optimal performance.
**Mô tả**: Tách biệt read write models để tối ưu hiệu suất.
## Performance Characteristics
## Đặc điểm Hiệu suất
Performance metrics and optimization strategies for data consistency patterns.
Chỉ số hiệu suất và chiến lược tối ưu cho patterns đồng bộ dữ liệu.
| Pattern | Latency Impact | Throughput | Notes |
|---------|----------------|------------|-------|
| **Saga Execution** | 500ms - 2s | 100-500 sagas/s | Depends on number of steps and compensation |
| **Outbox Processing** | < 100ms | 10,000 events/s | Async processing, minimal user impact |
| **Idempotency Check** | < 10ms | 50,000 checks/s | Redis lookup, very fast |
| **Optimistic Lock Update** | < 50ms | 5,000 updates/s | Single DB operation with version check |
| **CQRS Projection** | 100ms - 1s | 1,000 events/s | Event processing to read model |
| **Compensation Execution** | 200ms - 1s | Varies | Rollback operations in saga |
| Pattern | Tác động Độ trễ | Thông lượng | Ghi chú |
|---------|-----------------|-------------|---------|
| **Thực thi Saga** | 500ms - 2s | 100-500 sagas/s | Phụ thuộc số bước và compensation |
| **Xử lý Outbox** | < 100ms | 10,000 events/s | Xử lý bất đồng bộ, tác động tối thiểu |
| **Kiểm tra Idempotency** | < 10ms | 50,000 checks/s | Redis lookup, rất nhanh |
| **Cập nhật Optimistic Lock** | < 50ms | 5,000 updates/s | Single DB operation với version check |
| **CQRS Projection** | 100ms - 1s | 1,000 events/s | Xử lý event sang read model |
| **Thực thi Compensation** | 200ms - 1s | Varies | Rollback operations trong saga |
### Performance Optimization Strategies
### Chiến lược Tối ưu Hiệu suất
**Saga Pattern**:
- Minimize number of steps (< 5 steps ideal)
- Parallel execution where possible
- Giảm thiểu số bước (< 5 bước lý tưởng)
- Thực thi song song khi có thể
- Cache service responses
- Set appropriate timeouts (30s default)
- Đặt timeouts phù hợp (30s mặc định)
**Outbox Pattern**:
- Batch process outbox events (100-500 per batch)
- Index `publishedAt` column for performance
- Archive processed events periodically
- Use connection pooling for Kafka
- Batch process outbox events (100-500 mỗi batch)
- Index cột `publishedAt` cho hiệu suất
- Archive processed events định kỳ
- Sử dụng connection pooling cho Kafka
**Idempotency**:
- Use Redis for fast key lookups
- Set TTL to 24-48 hours
- Sử dụng Redis cho fast key lookups
- Đặt TTL 24-48 giờ
- Hash long idempotency keys
- Clean expired keys regularly
- Clean expired keys thường xuyên
**Optimistic Locking**:
- Works best for low-contention scenarios
- Implement retry with exponential backoff
- Monitor conflict rates (should be < 5%)
- Consider pessimistic locking if conflicts > 10%
- Hoạt động tốt nhất cho low-contention scenarios
- Triển khai retry với exponential backoff
- Giám sát conflict rates (nên < 5%)
- Cân nhắc pessimistic locking nếu conflicts > 10%
## Security Considerations
## Cân nhắc Bảo mật
Security measures for protecting data consistency operations.
Biện pháp bảo mật để bảo vệ các operations đồng bộ dữ liệu.
### Saga Security
### Bảo mật Saga
**Compensation Protection**:
- Validate saga execution permissions at each step
- Encrypt sensitive data in saga context
- Log all saga executions for audit
- Implement timeout to prevent hanging sagas
**Bảo vệ Compensation**:
- Xác thực saga execution permissions ở mỗi bước
- Mã hóa sensitive data trong saga context
- Log tất cả saga executions cho audit
- Triển khai timeout để ngăn hanging sagas
```typescript
// Secure saga context
// Saga context bảo mật
interface SecureSagaContext {
sagaId: string;
userId: string; // User who initiated
permissions: string[]; // Required permissions
encryptedData: string; // Encrypted sensitive data
auditLog: AuditEntry[]; // Audit trail
userId: string;
permissions: string[];
encryptedData: string;
auditLog: AuditEntry[];
}
```
### Outbox Security
### Bảo mật Outbox
**Event Payload Encryption**:
- Encrypt PII (Personally Identifiable Information) before storing in outbox
- Use AES-256-GCM for event payload encryption
- Decrypt only when publishing to Kafka
- Rotate encryption keys quarterly
**Mã hóa Event Payload**:
- Mã hóa PII (Personally Identifiable Information) trước khi lưu trong outbox
- Sử dụng AES-256-GCM cho event payload encryption
- Giải mã chỉ khi publishing sang Kafka
- Rotate encryption keys hàng quý
**Access Control**:
- Restrict outbox table access to outbox processor only
- Use database roles and permissions
- Monitor outbox table access patterns
**Kiểm soát Truy cập**:
- Hạn chế truy cập outbox table chỉ cho outbox processor
- Sử dụng database roles permissions
- Giám sát outbox table access patterns
### Idempotency Security
### Bảo mật Idempotency
**Key Security**:
- Use cryptographic hashing for idempotency keys (SHA-256)
- Include user context in key generation
- Validate key ownership before processing
- Clear keys on user logout for sensitive operations
**Bảo mật Key**:
- Sử dụng cryptographic hashing cho idempotency keys (SHA-256)
- Bao gồm user context trong key generation
- Xác thực key ownership trước khi xử lý
- Clear keys khi user logout cho sensitive operations
```typescript
// Secure idempotency key generation
// Tạo idempotency key bảo mật
function generateIdempotencyKey(
operation: string,
userId: string,
@@ -482,17 +511,17 @@ function generateIdempotencyKey(
}
```
### Optimistic Locking Security
### Bảo mật Optimistic Lock
**Version Tampering Prevention**:
- Validate version field on server-side only
- Never accept version from client directly
- Log version conflicts for security monitoring
**Ngăn chặn Giả mạo Version**:
- Xác thực version field chỉ ở server-side
- Không bao giờ chấp nhận version từ client trực tiếp
- Log version conflicts cho security monitoring
- Rate limit update attempts per user
## Deployment
## Triển khai
How data consistency patterns are deployed and scaled.
Cách các patterns đồng bộ dữ liệu được triển khai và mở rộng.
```mermaid
graph TD
@@ -523,76 +552,83 @@ graph TD
OP1 & OP2 --> Kafka[Kafka Cluster\n5 brokers]
end
style SO1 fill:#e1f5ff
style SO2 fill:#e1f5ff
style OP1 fill:#fff4e1
style OP2 fill:#fff4e1
style DB fill:#d4edda
style Kafka fill:#ffe1e1
%% Dark color palette with white text
style OS1 fill:#4527a0,stroke:#7c4dff,color:#ffffff
style OS2 fill:#4527a0,stroke:#7c4dff,color:#ffffff
style OS3 fill:#4527a0,stroke:#7c4dff,color:#ffffff
style SO1 fill:#1d4ed8,stroke:#3b82f6,color:#ffffff
style SO2 fill:#1d4ed8,stroke:#3b82f6,color:#ffffff
style OP1 fill:#b45309,stroke:#f59e0b,color:#ffffff
style OP2 fill:#b45309,stroke:#f59e0b,color:#ffffff
style DB fill:#15803d,stroke:#22c55e,color:#ffffff
style Redis fill:#7e22ce,stroke:#a855f7,color:#ffffff
style Kafka fill:#b91c1c,stroke:#ef4444,color:#ffffff
style PS fill:#374151,stroke:#6b7280,color:#ffffff
style IS fill:#374151,stroke:#6b7280,color:#ffffff
```
### Deployment Configuration
### Cấu hình Triển khai
| Component | Replicas | Resources | HA Strategy |
|-----------|----------|-----------|-------------|
| **Saga Orchestrator** | 2-3 | 512Mi RAM, 500m CPU | Leader election with etcd |
| Thành phần | Replicas | Resources | HA Strategy |
|------------|----------|-----------|-------------|
| **Saga Orchestrator** | 2-3 | 512Mi RAM, 500m CPU | Leader election với etcd |
| **Outbox Processor** | 2-5 | 256Mi RAM, 250m CPU | Distributed lock per event batch |
| **Services with Outbox** | 3+ | Varies | Standard service scaling |
| **Redis (Idempotency)** | 3 nodes | 1Gi RAM each | Redis Cluster with replication |
| **Services với Outbox** | 3+ | Varies | Standard service scaling |
| **Redis (Idempotency)** | 3 nodes | 1Gi RAM each | Redis Cluster với replication |
### Scaling Strategy
### Chiến lược Mở rộng
**Saga Orchestrator**:
- Scale based on pending saga count
- Use queue-based load distribution
- Monitor saga execution duration
- Scale dựa trên pending saga count
- Sử dụng queue-based load distribution
- Giám sát saga execution duration
**Outbox Processor**:
- Scale with database sharding (1 processor per shard)
- Increase batch size before adding replicas
- Monitor outbox table size and age
- Scale với database sharding (1 processor per shard)
- Tăng batch size trước khi thêm replicas
- Giám sát outbox table size age
**Idempotency Store (Redis)**:
- Scale Redis cluster horizontally
- Use consistent hashing for key distribution
- Monitor memory usage (should be < 70%)
- Sử dụng consistent hashing cho key distribution
- Giám sát memory usage (nên < 70%)
## Monitoring & Observability
## Giám sát & Khả năng quan sát
Monitoring strategies for data consistency patterns.
Chiến lược giám sát cho patterns đồng bộ dữ liệu.
### Key Metrics
### Chỉ số Chính
**Saga Metrics**:
- `saga_executions_total` - Total saga executions (success/failure)
- `saga_executions_total` - Tổng saga executions (success/failure)
- `saga_duration_seconds` - Saga execution time histogram
- `saga_compensations_total` - Total compensation executions
- `saga_timeout_total` - Sagas that timed out
- `saga_pending_count` - Sagas currently executing
- `saga_compensations_total` - Tổng compensation executions
- `saga_timeout_total` - Sagas timeout
- `saga_pending_count` - Sagas đang thực thi
**Outbox Metrics**:
- `outbox_events_total` - Events written to outbox
- `outbox_published_total` - Events published to Kafka
- `outbox_processing_lag_seconds` - Time from write to publish
- `outbox_table_size` - Outbox table row count
- `outbox_events_total` - Events ghi vào outbox
- `outbox_published_total` - Events published sang Kafka
- `outbox_processing_lag_seconds` - Thời gian từ write đến publish
- `outbox_table_size` - Số dòng outbox table
- `outbox_failed_events_total` - Failed event publications
**Idempotency Metrics**:
- `idempotency_checks_total` - Total idempotency checks
- `idempotency_checks_total` - Tổng idempotency checks
- `idempotency_hits_total` - Duplicate requests prevented
- `idempotency_key_ttl_seconds` - Average key TTL
- `idempotency_redis_errors_total` - Redis failures
**Optimistic Lock Metrics**:
- `optimistic_lock_conflicts_total` - Version conflicts detected
- `optimistic_lock_retries_total` - Retry attempts after conflict
- `optimistic_lock_retries_total` - Retry attempts sau conflict
- `optimistic_lock_success_rate` - Update success percentage
### Alerts
### Cảnh báo
**Critical Alerts**:
```yaml
# Saga timeout rate too high
# Saga timeout rate quá cao
alert: HighSagaTimeoutRate
expr: rate(saga_timeout_total[5m]) > 0.05
for: 5m
@@ -611,23 +647,23 @@ for: 5m
severity: warning
```
### Monitoring Dashboard
### Dashboard Giám sát
**Grafana Panels**:
1. **Saga Orchestration Overview**:
1. **Tổng quan Saga Orchestration**:
- Saga execution rate (success/failure)
- Average saga duration
- Compensation rate
- Pending saga count
2. **Outbox Processing Health**:
2. **Sức khỏe Outbox Processing**:
- Outbox publishing rate
- Processing lag (P95, P99)
- Failed events
- Table size trend
3. **Idempotency Effectiveness**:
3. **Hiệu quả Idempotency**:
- Duplicate prevention rate
- Redis hit rate
- Key distribution
@@ -637,11 +673,11 @@ severity: warning
- Mean time to consistency (MTTC)
- Conflict resolution success rate
### Distributed Tracing
### Tracing Phân tán
**Trace Saga Execution**:
```typescript
// Traced saga step
// Saga step được trace
async function executeStepWithTracing(
step: SagaStep,
context: SagaContext
@@ -668,17 +704,48 @@ async function executeStepWithTracing(
}
```
## Related Documentation
## Tài liệu Liên quan
- [Event-Driven Architecture](./event-driven-architecture.md) - Event sourcing and Kafka
- [System Design](./system-design.md) - Overall architecture
- [Microservices Communication](./microservices-communication.md) - Service communication patterns
- [Resilience Patterns](../skills/resilience-patterns.md) - Circuit breaker, retry for saga steps
- [Caching Patterns](../skills/caching-patterns.md) - Caching for idempotency keys
- [Database Prisma](../skills/database-prisma.md) - Prisma transactions for outbox pattern
- [Event-Driven Architecture](./event-driven-architecture.md) - Event sourcing Kafka
- [System Design](./system-design.md) - Kiến trúc tổng thể
- [Microservices Communication](./microservices-communication.md) - Patterns giao tiếp service
- [Resilience Patterns](../skills/resilience-patterns.md) - Circuit breaker, retry cho saga steps
- [Caching Patterns](../skills/caching-patterns.md) - Caching cho idempotency keys
- [Database Prisma](../skills/database-prisma.md) - Prisma transactions cho outbox pattern
---
**Last Updated**: 2026-01-07
**Author**: VelikHo (hongochai10@icloud.com)
**Reviewers**: To be assigned
**Cập nhật Lần cuối**: 2026-01-14
**Tác giả**: GoodGo Architecture Team
## Quick Tips
### Mermaid Common Issues
- ⚠️ **Syntax Error**: Kiểm tra dấu `(` `)` `[` `]` `{` `}`
- ⚠️ **Render Error**: Kiểm tra `graph` vs `flowchart`, sử dụng `graph` cho compatibility
- ⚠️ **Arrow Direction**: Sử dụng `-->` (solid) hoặc `-.->` (dashed)
-**Color**: Luôn sử dụng dark palette với white text
### Color Palette Reference
| Color | Fill | Stroke | Use Case |
|-------|------|--------|----------|
| **Blue** | `#1d4ed8` | `#3b82f6` | Primary Components, Saga |
| **Green** | `#15803d` | `#22c55e` | Success, DB, Stable States |
| **Purple** | `#7e22ce` | `#a855f7` | Feature, Logic, Idempotency |
| **Orange** | `#b45309` | `#f59e0b` | Warning, External, Outbox |
| **Red** | `#b91c1c` | `#ef4444` | Error, Failure, Critical |
| **Gray** | `#374151` | `#6b7280` | Background, Secondary |
**Pattern áp dụng**:
```
style NodeName fill:#1d4ed8,stroke:#3b82f6,color:#ffffff
```
### Visual Indicators
-**Recommended**: Best practices, khuyến nghị sử dụng
- ⚠️ **Warning**: Cần chú ý, có điều kiện
-**Avoid**: Anti-patterns, tránh sử dụng
- 🔒 **Security**: Liên quan đến bảo mật
-**Performance**: Liên quan đến hiệu suất

View File

@@ -1,8 +1,8 @@
# Event-Driven Architecture
# Kiến trúc Hướng Sự kiện
> Event-driven architecture for asynchronous communication using Apache Kafka
> Kiến trúc hướng sự kiện cho giao tiếp bất đồng bộ sử dụng Apache Kafka
## Overview Diagram
## Sơ đồ Tổng quan
```mermaid
graph TD
@@ -27,28 +27,36 @@ graph TD
Topics -->|Subscribe| Consumer1
Topics -->|Subscribe| Consumer2
style Kafka fill:#e1f5ff
style Topics fill:#fff4e1
style IAM fill:#5E35B1,stroke:#4527A0,color:#ffffff
style Service1 fill:#5E35B1,stroke:#4527A0,color:#ffffff
style Kafka fill:#1E88E5,stroke:#1565C0,color:#ffffff
style Topics fill:#FB8C00,stroke:#EF6C00,color:#ffffff
style Consumer1 fill:#43A047,stroke:#2E7D32,color:#ffffff
style Consumer2 fill:#43A047,stroke:#2E7D32,color:#ffffff
```
## Architecture Description
## Mô tả Kiến trúc
The GoodGo platform implements Event-Driven Architecture (EDA) for asynchronous communication between microservices.
Nền tảng GoodGo triển khai Kiến trúc Hướng Sự kiện (EDA) cho giao tiếp bất đồng bộ giữa microservices.
**Core Principles**:
1. **Event-First Design**: All state changes emit domain events
2. **Loose Coupling**: Services communicate through events
3. **Eventual Consistency**: Accept temporary inconsistency
4. **Event Sourcing**: Store changes as event sequence
5. **CQRS Pattern**: Separate read/write operations
> [!IMPORTANT]
> **Trạng thái hiện tại**: Event Bus với RabbitMQ/Kafka chưa được triển khai. Hiện tại sử dụng MediatR cho domain events nội bộ trong mỗi service.
**Technology Stack**:
- Apache Kafka - Event streaming platform
- Schema Registry - Avro schemas for validation
- KafkaJS - Node.js client library
- Event Sourcing - Custom implementation in IAM
**Nguyên tắc Cốt lõi**:
1. **Event-First Design**: Mọi thay đổi trạng thái phát ra domain events
2. **Loose Coupling**: Services giao tiếp qua events (nội bộ hoặc message broker)
3. **Eventual Consistency**: Chấp nhận inconsistency tạm thời
4. **CQRS Pattern**: Tách biệt read/write operations với MediatR
## Event Flow
**Công nghệ Hiện tại**:
- **MediatR** - Domain events nội bộ service
- **Entity Framework Core** - Domain event dispatch qua DbContext
**Công nghệ Planned**:
- **RabbitMQ + MassTransit** - Inter-service events (Roadmap)
- **Outbox Pattern** - Reliable event publishing
## Luồng Sự kiện
```mermaid
sequenceDiagram
@@ -62,33 +70,46 @@ sequenceDiagram
Consumer-->>Kafka: Acknowledge
```
**Steps**: Publish → Distribute → Consume → Retry (if failed) → DLQ (after max retries) → Acknowledge
**Các Bước**: Publish → Distribute → Consume → Retry (nếu thất bại) → DLQ (sau retry tối đa) → Acknowledge
## Event Structure
## Cấu trúc Sự kiện
```typescript
interface BaseEvent {
eventId: string; // UUID
eventType: string; // user.created.v1
eventVersion: string; // 1.0.0
timestamp: string; // ISO 8601
source: string; // iam-service
correlationId?: string; // Request correlation
data: unknown; // Event payload
}
```
### Domain Events (MediatR - Hiện tại)
**Example**:
```json
```csharp
// EN: Base domain event interface
// VI: Interface domain event cơ bản
public interface IDomainEvent : INotification
{
"eventId": "550e8400-e29b-41d4-a716-446655440000",
"eventType": "user.created.v1",
"timestamp": "2024-01-15T10:30:00Z",
"source": "iam-service",
"data": {
"userId": "user_123",
"email": "user@example.com"
}
Guid EventId { get; }
DateTime OccurredOn { get; }
}
// EN: Example domain event
// VI: Ví dụ domain event
public record UserCreatedEvent : IDomainEvent
{
public Guid EventId { get; } = Guid.NewGuid();
public DateTime OccurredOn { get; } = DateTime.UtcNow;
public required Guid UserId { get; init; }
public required string Email { get; init; }
public required string FirstName { get; init; }
}
// EN: Event handler
// VI: Handler xử lý event
public class UserCreatedEventHandler : INotificationHandler<UserCreatedEvent>
{
private readonly ILogger<UserCreatedEventHandler> _logger;
public async Task Handle(UserCreatedEvent notification, CancellationToken ct)
{
_logger.LogInformation("User created: {UserId}, {Email}",
notification.UserId, notification.Email);
// TODO: Send welcome email, create profile, etc.
}
}
```
@@ -100,19 +121,19 @@ graph LR
AuthLogin[auth.login.success<br/>Partitions: 5]
AuditEvents[audit.events<br/>Partitions: 10]
style UserCreated fill:#e1f5ff
style AuthLogin fill:#fff4e1
style AuditEvents fill:#f8d7da
style UserCreated fill:#1E88E5,stroke:#1565C0,color:#ffffff
style AuthLogin fill:#43A047,stroke:#2E7D32,color:#ffffff
style AuditEvents fill:#E53935,stroke:#C62828,color:#ffffff
```
**Naming Convention**: `{domain}.{action}.{version}`
**Quy ước Đặt tên**: `{domain}.{action}.{version}`
**Examples**:
**Ví dụ**:
- `user.created.v1`
- `auth.login.success.v1`
- `audit.event.logged.v1`
## Error Handling
## Xử lý Lỗi
```mermaid
graph TD
@@ -121,19 +142,26 @@ graph TD
Process -->|Failure| Retry[Retry 3x]
Retry -->|Max Retries| DLQ[Dead Letter Queue]
DLQ --> Alert[Alert Team]
style Event fill:#757575,stroke:#616161,color:#ffffff
style Process fill:#1E88E5,stroke:#1565C0,color:#ffffff
style Ack fill:#43A047,stroke:#2E7D32,color:#ffffff
style Retry fill:#FB8C00,stroke:#EF6C00,color:#ffffff
style DLQ fill:#E53935,stroke:#C62828,color:#ffffff
style Alert fill:#E53935,stroke:#C62828,color:#ffffff
```
**Strategy**:
1. Retry with exponential backoff (100ms → 200ms → 400ms)
2. Max 3 attempts
3. Move to DLQ after max retries
4. Manual review and reprocess
**Chiến lược**:
1. Retry với exponential backoff (100ms → 200ms → 400ms)
2. Tối đa 3 lần thử
3. Chuyển sang DLQ sau retry tối đa
4. Xem xét thủ công và xử lý lại
## System Context
## Bối cảnh Hệ thống
```mermaid
C4Context
title Event-Driven Architecture Context
title Sơ đồ Bối cảnh Event-Driven Architecture
System(iam, "IAM Service", "Event producer")
System(service_a, "Service A", "Event producer")
@@ -152,17 +180,17 @@ C4Context
Rel(kafka, monitoring, "Sends metrics", "JMX")
```
**Context Description**:
- **Producers**: IAM Service and other services publish domain events
- **Kafka**: Central event broker, manages topics and partitions
- **Consumers**: Notification and Audit services consume events
- **Schema Registry**: Manages and validates Avro schemas
- **Monitoring**: Collects metrics from Kafka cluster
**Mô tả Các Thành phần**:
- **Producers**: IAM Service và các services khác publish domain events
- **Kafka**: Event broker trung tâm, quản lý topics partitions
- **Consumers**: Notification Audit services consume events
- **Schema Registry**: Quản lý và validate Avro schemas
- **Monitoring**: Thu thập metrics từ Kafka cluster
## Performance Characteristics
## Đặc điểm Hiệu suất
| Metric | Target | Notes |
|--------|--------|-------|
| Chỉ số | Mục tiêu | Ghi chú |
|-----------------|-------------------|-----------------|
| **Event Publish Latency (P95)** | < 10ms | Fire-and-forget, async |
| **Event Delivery Latency (P95)** | < 100ms | End-to-end from publish to consume |
| **Throughput** | 10,000 events/s | Per topic, scalable with partitions |
@@ -171,43 +199,43 @@ C4Context
| **Retention** | 7 days | Default, configurable per topic |
| **Replication Factor** | 3 | For fault tolerance |
**Performance Optimizations**:
- **Batch Publishing**: Group multiple events to reduce network overhead
- **Compression**: Use Snappy or LZ4 compression
- **Partitioning**: Divide topics into multiple partitions for parallel processing
- **Consumer Groups**: Multiple consumers in same group for horizontal scaling
- **Async Publishing**: Fire-and-forget pattern, don't block request handlers
**Tối ưu hóa Hiệu suất**:
- **Batch Publishing**: Group multiple events để giảm network overhead
- **Compression**: Sử dụng Snappy hoặc LZ4 compression
- **Partitioning**: Phân chia topics thành multiple partitions cho parallel processing
- **Consumer Groups**: Multiple consumers trong cùng group để scale horizontally
- **Async Publishing**: Fire-and-forget pattern, không block request handlers
## Security Considerations
## Cân nhắc Bảo mật
**Event Encryption**:
- TLS in-transit for all Kafka connections
- Optional payload encryption for sensitive data
- End-to-end encryption with custom encryption layer
**Mã hóa Sự kiện**:
- TLS in-transit cho tất cả Kafka connections
- Optional payload encryption cho sensitive data
- End-to-end encryption với custom encryption layer
**Access Control**:
**Kiểm soát Truy cập**:
- Kafka ACLs (Access Control Lists) per topic
- SASL/SCRAM authentication for producers and consumers
- Separate credentials per service
- Principle of least privilege - grant only necessary permissions
- SASL/SCRAM authentication cho producers consumers
- Separate credentials cho mỗi service
- Principle of least privilege - chỉ grant quyền cần thiết
**Schema Validation**:
- Avro schemas in Schema Registry
- Schema evolution with backward/forward compatibility
- Reject events that don't match schema
**Xác thực Schema**:
- Avro schemas trong Schema Registry
- Schema evolution với backward/forward compatibility
- Reject events không match schema
**Audit**:
- Log all event publishes and consumes
- Correlation IDs to trace event flow
- Retention policy for audit logs (7 years)
**Kiểm toán**:
- Log tất cả event publishes consumes
- Correlation IDs để trace event flow
- Retention policy cho audit logs (7 years)
**Data Retention**:
**Lưu trữ Dữ liệu**:
- Default 7 days retention
- Configurable per topic
- Automatic deletion after retention period
- GDPR compliance (right to erasure)
- Automatic deletion sau retention period
- Compliance với GDPR (right to erasure)
## Deployment
## Triển khai
```mermaid
graph TD
@@ -253,27 +281,33 @@ graph TD
Broker2 --> Audit
Broker3 --> Audit
style Broker1 fill:#e1f5ff
style Broker2 fill:#fff4e1
style Broker3 fill:#d4edda
style ZK fill:#f0e1ff
style Broker1 fill:#1E88E5,stroke:#1565C0,color:#ffffff
style Broker2 fill:#1E88E5,stroke:#1565C0,color:#ffffff
style Broker3 fill:#1E88E5,stroke:#1565C0,color:#ffffff
style ZK fill:#8E24AA,stroke:#7B1FA2,color:#ffffff
style IAM fill:#5E35B1,stroke:#4527A0,color:#ffffff
style ServiceA fill:#5E35B1,stroke:#4527A0,color:#ffffff
style Notification fill:#43A047,stroke:#2E7D32,color:#ffffff
style Audit fill:#43A047,stroke:#2E7D32,color:#ffffff
```
**Kafka Cluster Configuration**:
### Chiến lược Triển khai
**Cấu hình Kafka Cluster**:
- **Brokers**: 3 brokers minimum (5 for production)
- **Replication Factor**: 3 (for fault tolerance)
- **Min In-Sync Replicas**: 2 (ensure data durability)
- **Partitions**: 3-10 per topic (based on throughput needs)
- **Zookeeper**: 3-node ensemble (for coordination)
**Resource Allocation**:
**Phân bổ Tài nguyên**:
| Component | CPU | Memory | Disk |
|-----------|-----|--------|------|
| **Kafka Broker** | 2 cores | 4GB RAM | 100GB SSD |
| **Zookeeper** | 1 core | 2GB RAM | 20GB SSD |
| **Schema Registry** | 500m | 1GB RAM | 10GB |
**Topic Configuration**:
**Cấu hình Topic**:
```yaml
user.created:
partitions: 3
@@ -294,15 +328,15 @@ audit.events:
compression-type: lz4
```
**High Availability**:
- Multiple brokers with partition replication
- Automatic leader election when broker fails
**Tính Sẵn sàng Cao**:
- Multiple brokers với partition replication
- Automatic leader election khi broker fails
- Consumer group rebalancing
- Monitoring and alerting for broker health
- Monitoring alerting cho broker health
## Monitoring & Observability
## Giám sát & Khả năng quan sát
**Key Metrics**:
### Chỉ số Chính
**Kafka Broker Metrics**:
- `kafka_server_brokertopicmetrics_messagesinpersec` - Messages in/sec
@@ -323,7 +357,7 @@ audit.events:
**Application Metrics**:
```typescript
// Custom metrics for event processing
// Custom metrics cho event processing
const eventPublished = new Counter({
name: 'events_published_total',
help: 'Total events published',
@@ -344,6 +378,42 @@ const eventProcessingDuration = new Histogram({
});
```
**Quy tắc Cảnh báo**:
```yaml
# High consumer lag
- alert: HighConsumerLag
expr: kafka_consumer_fetch_manager_records_lag_max > 10000
for: 5m
severity: warning
annotations:
summary: "High consumer lag detected"
description: "Consumer lag is {{ $value }} messages"
# Broker down
- alert: KafkaBrokerDown
expr: kafka_server_kafkaserver_brokerstate != 3
for: 1m
severity: critical
annotations:
summary: "Kafka broker is down"
# Under-replicated partitions
- alert: UnderReplicatedPartitions
expr: kafka_server_replicamanager_underreplicatedpartitions > 0
for: 5m
severity: warning
annotations:
summary: "Under-replicated partitions detected"
# Offline partitions
- alert: OfflinePartitions
expr: kafka_controller_kafkacontroller_offlinepartitionscount > 0
for: 1m
severity: critical
annotations:
summary: "Offline partitions detected"
```
**Dashboards**:
- Kafka Cluster Overview (brokers, topics, partitions)
- Producer Performance (throughput, latency, errors)
@@ -352,7 +422,7 @@ const eventProcessingDuration = new Histogram({
**Logging**:
```typescript
// Structured logging for events
// Structured logging cho events
logger.info('Event published', {
eventId: event.eventId,
eventType: event.eventType,
@@ -369,7 +439,33 @@ logger.info('Event consumed', {
});
```
## Related Documentation
- [System Design](./system-design.md) - Overall architecture
- [IAM Architecture](./iam-proposal.md) - Event sourcing implementation
## Tài liệu Liên quan
- [System Design](./system-design.md) - Kiến trúc tổng thể
- [IAM Architecture](./iam-proposal.md) - Triển khai Event sourcing
---
**Cập nhật Lần cuối**: 2026-01-14
**Tác giả**: GoodGo Architecture Team
## Mẹo Nhanh
### Bảng Màu Mermaid
| Loại Node | Màu Nền | Màu Viền | Màu Chữ | Sử dụng |
|-----------|------------|--------------|------------|-------|
| **Core/Broker** | `#1E88E5` (Blue) | `#1565C0` | `#ffffff` | Kafka Brokers, Main Components |
| **Topic/Data** | `#FB8C00` (Orange) | `#EF6C00` | `#ffffff` | Topics, Queues, Data Stores |
| **Success/Safe** | `#43A047` (Green) | `#2E7D32` | `#ffffff` | Successful flows, Safe states |
| **Error/Danger** | `#E53935` (Red) | `#C62828` | `#ffffff` | Errors, DLQ, Critical issues |
| **Coordination** | `#8E24AA` (Purple) | `#7B1FA2` | `#ffffff` | Zookeeper, Orchestrators |
### Các Chỉ báo Trực quan
- 🔄 **Retry Loop**: Chỉ báo thử lại tự động
- ⚠️ **DLQ/Warning**: Đường dẫn xử lý lỗi
- 📝 **Log/Audit**: Điểm ghi log
- 🔐 **Lock/Auth**: Kiểm tra bảo mật

View File

@@ -4,17 +4,15 @@ Tài liệu này mô tả đề xuất kiến trúc cho IAM Service (Identity an
## Tổng Quan: Auth Service → IAM Service
**Auth Service hiện tại** tập trung vào:
- Authentication (xác thực)
- Authorization (phân quyền)
- Session & Token management
- RBAC/ABAC
**IAM Service** cung cấp:
- **OAuth2/OpenID Connect** với OpenIddict
- **ASP.NET Core Identity** cho user management
- **Role-Based Access Control (RBAC)**
- **JWT Tokens** (Access 15min, Refresh 7 days)
- **MFA Support** (TOTP)
**IAM Service** mở rộng thêm:
- **Identity Management** (quản lý danh tính toàn diện)
- **Access Governance** (quản trị truy cập)
- **Compliance & Reporting** (tuân thủ và báo cáo)
- **Lifecycle Management** (quản lý vòng đời tài khoản)
> [!NOTE]
> IAM Service đã được triển khai với .NET 10, Clean Architecture tại `services/iam-service-net/`
---
@@ -90,56 +88,90 @@ Tài liệu này mô tả đề xuất kiến trúc cho IAM Service (Identity an
---
## 2. Kiến Trúc Module Structure
## 2. Kiến Trúc Module Structure (Thực Tế)
```
services/iam-service/
services/iam-service-net/
├── src/
│ ├── config/ # Configuration files
│ ├── core/
│ │ ├── cache/ # Multi-layer cache
│ │ ├── security/ # Zero-trust, encryption
│ │ ├── events/ # Event sourcing
│ │ └── workflows/ # Workflow engine (NEW)
├── modules/
│ ├── auth/ # ✅ Core authentication
│ │ ├── rbac/ # ✅ RBAC system
│ │ ├── social/ # ✅ Social authentication
│ │ ├── oidc/ # ✅ OIDC implementation
│ │ ── token/ # ✅ JWT & Cookie management
│ ├── session/ # ✅ Session management
├── mfa/ # ✅ Multi-factor auth
│ │
── identity/ # 🆕 Identity Management
│ │ │ ├── user/ # User lifecycle
│ │ ├── profile/ # Profile management
│ │ ├── verification/ # Identity verification
│ │ │ └── organization/ # Organizations & groups
│ │ │
│ │ ├── access/ # 🆕 Access Management
│ │ │ ├── request/ # Access requests
│ │ │ ├── review/ # Access reviews
│ │ │ ├── pam/ # Privileged access
│ │ │ └── analytics/ # Access analytics
│ │ │
│ │ ├── governance/ # 🆕 Governance & Compliance
│ │ │ ├── compliance/ # Compliance reporting
│ │ │ ├── policy/ # Policy governance
│ │ │ ├── risk/ # Risk management
│ │ │ └── reporting/ # Reporting & dashboards
│ │ │
│ │ └── workflow/ # 🆕 Workflow Engine
│ │ ├── engine/ # Workflow engine
│ │ ├── approval/ # Approval workflows
│ │ └── automation/ # Automated workflows
│ │
│ ├── middlewares/ # Express middlewares
│ ├── repositories/ # Data access layer
│ └── routes/ # Route definitions
└── prisma/
└── schema.prisma # Database schema (mở rộng)
│ ├── IamService.API/ # Presentation Layer
│ ├── Controllers/ # AuthController, UsersController, RolesController
│ │ ├── Application/ # CQRS Commands, Queries, Handlers
│ │ │ ├── Commands/ # RegisterUserCommand, ChangePasswordCommand
│ │ │ ├── Queries/ # GetUserQuery, GetUsersQuery
│ │ │ └── Validators/ # FluentValidation validators
│ └── Program.cs # App entry point
│ ├── IamService.Domain/ # Domain Layer
│ │ ├── AggregatesModel/ # ApplicationUser, ApplicationRole
│ │ ├── Events/ # UserCreatedEvent, UserDeletedEvent
│ │ ├── Exceptions/ # UserNotFoundException, InvalidCredentialsException
│ │ ── SeedWork/ # Entity, IAggregateRoot, IRepository
└── IamService.Infrastructure/ # Infrastructure Layer
├── IamServiceContext.cs # DbContext with Identity + OpenIddict
├── Repositories/ # UserRepository, RoleRepository
── Services/ # EmailService, TokenService
├── tests/
├── IamService.UnitTests/
└── IamService.FunctionalTests/
├── docs/
├── Dockerfile
└── IamService.slnx
```
### Sơ Đồ Kiến Trúc Clean Architecture
```mermaid
graph TD
%% Styling Configuration
classDef base fill:#202020,stroke:#505050,color:#fff,stroke-width:1px;
classDef core fill:#1a237e,stroke:#3949ab,color:#fff,stroke-width:1px;
classDef newModule fill:#1b5e20,stroke:#43a047,color:#fff,stroke-width:1px;
classDef database fill:#4a148c,stroke:#7b1fa2,color:#fff,stroke-width:1px;
%% Main Service Node
IAM[IAM Service]:::core
%% Identity Management Subgraph
subgraph Identity [Identity Management]
direction TB
User[User Lifecycle]:::newModule
Profile[Profile Mgmt]:::newModule
Verify[Verification]:::newModule
Org[Org & Groups]:::newModule
end
%% Access Management Subgraph
subgraph Access [Access Management]
direction TB
Req[Access Requests]:::newModule
Review[Access Reviews]:::newModule
PAM[PAM]:::newModule
Analytics[Analytics]:::newModule
end
%% Governance Subgraph
subgraph Governance [Governance & Compliance]
direction TB
Comp[Compliance]:::newModule
Policy[Policy Gov]:::newModule
Risk[Risk Mgmt]:::newModule
end
%% Database
DB[(Neon Database)]:::database
%% Relationships
IAM --> Identity
IAM --> Access
IAM --> Governance
Identity -.-> DB
Access -.-> DB
Governance -.-> DB
%% Internal Dependencies
Access --> Identity
Governance ---> Access
```
---
## 3. Database Schema Mở Rộng
@@ -168,39 +200,35 @@ services/iam-service/
---
## 4. API Endpoints Mở Rộng
## 4. API Endpoints (Thực Tế)
### 4.1 Identity Management APIs
### 4.1 Authentication APIs
```
# User Management
GET /api/v1/identity/users
POST /api/v1/identity/users
GET /api/v1/identity/users/:id
PUT /api/v1/identity/users/:id
DELETE /api/v1/identity/users/:id
POST /api/v1/identity/users/bulk-import
GET /api/v1/identity/users/bulk-export
| Method | Endpoint | Mô tả | Auth |
|--------|----------|-------|------|
| `POST` | `/api/v1/auth/register` | Đăng ký user mới | ❌ |
| `POST` | `/connect/token` | OAuth2 token endpoint (login, refresh) | ❌ |
| `POST` | `/api/v1/auth/change-password` | Đổi mật khẩu | ✅ |
| `POST` | `/api/v1/auth/logout` | Đăng xuất (revoke tokens) | ✅ |
# Profile Management
GET /api/v1/identity/users/:id/profile
PUT /api/v1/identity/users/:id/profile
POST /api/v1/identity/users/:id/profile/avatar
### 4.2 User Management APIs
# Identity Verification
POST /api/v1/identity/verification/email/request
POST /api/v1/identity/verification/email/verify
POST /api/v1/identity/verification/phone/request
POST /api/v1/identity/verification/phone/verify
| Method | Endpoint | Mô tả | Auth |
|--------|----------|-------|------|
| `GET` | `/api/v1/users` | Danh sách users (paginated) | ✅ |
| `GET` | `/api/v1/users/me` | Thông tin user hiện tại | ✅ |
| `GET` | `/api/v1/users/{id}` | Lấy user theo ID | ✅ |
| `PUT` | `/api/v1/users/{id}` | Cập nhật user | ✅ |
| `DELETE` | `/api/v1/users/{id}` | Xóa user (soft delete) | ✅ |
# Organizations & Groups
GET /api/v1/identity/organizations
POST /api/v1/identity/organizations
GET /api/v1/identity/organizations/:id/groups
POST /api/v1/identity/organizations/:id/groups
GET /api/v1/identity/groups/:id/members
POST /api/v1/identity/groups/:id/members
```
### 4.3 Role Management APIs
| Method | Endpoint | Mô tả | Auth |
|--------|----------|-------|------|
| `GET` | `/api/v1/roles` | Danh sách roles | ✅ |
| `POST` | `/api/v1/roles` | Tạo role mới | ✅ Admin |
| `PUT` | `/api/v1/roles/{id}` | Cập nhật role | ✅ Admin |
| `DELETE` | `/api/v1/roles/{id}` | Xóa role | ✅ Admin |
### 4.2 Access Management APIs
@@ -337,3 +365,32 @@ GET /api/v1/governance/reports/security-events
- **Workflow automation** linh hoạt
Điều này biến service từ authentication/authorization cơ bản thành một IAM platform toàn diện, phù hợp cho enterprise.
---
## Quick Tips
### Mermaid Common Issues
- **Syntax Error**: Kiểm tra kỹ các dấu ngoặc `[]`, `{}`, `()` trong node label.
- **Connection**: Đảm bảo các mũi tên `-->`, `-.->` đúng cú pháp.
- **Indentation**: Subgraph cần thụt đầu dòng đúng cách.
### Color Pattern Reference
| Element | Fill Color | Stroke | Text | Usage |
|---------|------------|--------|------|-------|
| **Base** | `#202020` | `#505050` | `#fff` | Node thông thường |
| **Core** | `#1a237e` | `#3949ab` | `#fff` | Node trung tâm, quan trọng |
| **Module**| `#1b5e20` | `#43a047` | `#fff` | Module, service con |
| **DB** | `#4a148c` | `#7b1fa2` | `#fff` | Database, storage |
| **Warn** | `#b71c1c` | `#f44336` | `#fff` | Cảnh báo, lỗi |
### Visual Indicators
| Icon | Meaning |
|------|---------|
| ✅ | Đã hoàn thành / Tốt |
| 🔄 | Đang xử lý / Thay đổi |
| ⚠️ | Cảnh báo / Lưu ý |
| ❌ | Lỗi / Không khuyến khích |

View File

@@ -1,8 +1,81 @@
# Microservices Communication
# Kiến trúc Giao tiếp Microservices
> Communication patterns and protocols for inter-service communication
> Các patterns protocols giao tiếp giữa các services
## Overview Diagram
## Quick Overview
Hướng dẫn nhanh về các patterns giao tiếp cơ bản trong hệ thống GoodGo.
### Mô hình Giao tiếp Cơ bản
```mermaid
graph TD
%% Nodes
Client[Web App / Mobile App]
Traefik[Traefik API Gateway]
Auth[Auth Service]
Notify[Notification Service]
%% Relationships
Client -->|HTTP Request| Traefik
Traefik -->|Routing| Auth
Auth -.->|Internal HTTP| Notify
%% Styles using dark color palette
style Client fill:#1565c0,stroke:#fff,stroke-width:2px,color:#fff
style Traefik fill:#0f4c81,stroke:#fff,stroke-width:2px,color:#fff
style Auth fill:#283593,stroke:#fff,stroke-width:2px,color:#fff
style Notify fill:#4527a0,stroke:#fff,stroke-width:2px,color:#fff
```
### Giao tiếp Đồng bộ (HTTP/REST)
Các service giao tiếp đồng bộ qua HTTP REST APIs thông qua Traefik API Gateway.
**Ví dụ Client → Service:**
```typescript
// Web App -> Auth Service
const response = await fetch('http://api.goodgo.vn/api/v1/auth/login', {
method: 'POST',
body: JSON.stringify({ email, password }),
});
```
**Ví dụ Service → Service:**
```typescript
// Auth Service -> Notification Service
const response = await fetch('http://notification-service:5003/api/v1/notifications', {
method: 'POST',
headers: { 'X-Service-Auth': process.env.INTERNAL_API_KEY },
body: JSON.stringify({ userId, message }),
});
```
### API Gateway Routing
Traefik định tuyến requests dựa trên:
- **Host header**: `api.goodgo.vn`
- **Path prefix**: `/api/v1/auth`, `/api/v1/users`
### Format Error Response Chuẩn
Tất cả services tuân theo định dạng error response nhất quán:
```json
{
"success": false,
"error": {
"code": "AUTH_001",
"message": "Invalid credentials",
"details": {}
},
"timestamp": "2024-01-01T00:00:00.000Z"
}
```
---
## Sơ đồ Tổng quan
```mermaid
graph TD
@@ -27,7 +100,7 @@ graph TD
class SD green
```
## System Context
## Bối cảnh Hệ thống
```mermaid
C4Context
@@ -57,11 +130,11 @@ C4Context
Rel(services, external_api, "Integrates", "HTTPS")
```
The GoodGo platform uses a microservices architecture where all client requests flow through an API Gateway (Traefik), which routes them to appropriate microservices. Services communicate synchronously via REST/HTTP for request-response patterns and asynchronously via Kafka for event-driven workflows. Service discovery is handled by Docker DNS in local environments and Kubernetes DNS in production.
Nền tảng GoodGo sử dụng kiến trúc microservices nơi tất cả client requests đi qua API Gateway (Traefik), được route đến các microservices phù hợp. Các services giao tiếp đồng bộ qua REST/HTTP cho patterns request-response và bất đồng bộ qua Kafka cho workflows event-driven. Service discovery được xử lý bởi Docker DNS trong môi trường local và Kubernetes DNS trong production.
## Communication Protocols
## Protocols Giao tiếp
### Protocol Comparison
### So sánh Protocols
| Protocol | Latency | Complexity | Use Case |
|----------|---------|------------|----------|
@@ -70,7 +143,7 @@ The GoodGo platform uses a microservices architecture where all client requests
| **Events** | Async | Medium | Decoupled workflows |
| **GraphQL** | Medium | Medium | Complex data fetching |
### REST/HTTP Pattern
### Pattern REST/HTTP
```mermaid
sequenceDiagram
@@ -87,30 +160,53 @@ sequenceDiagram
Gateway-->>Client: JSON Response
```
Synchronous request-response using HTTP/REST.
Request-response đồng bộ sử dụng HTTP/REST.
**Implementation**:
```typescript
// Service-to-service HTTP client
import axios from 'axios';
export class UserServiceClient {
private client = axios.create({
baseURL: process.env.USER_SERVICE_URL,
timeout: 5000,
headers: {
'x-service-auth': process.env.INTERNAL_API_KEY
**Triển khai (.NET với IHttpClientFactory)**:
```csharp
// EN: Service-to-service HTTP client
// VI: HTTP client cho giao tiếp giữa services
public class IamServiceClient : IIamServiceClient
{
private readonly HttpClient _httpClient;
private readonly ILogger<IamServiceClient> _logger;
public IamServiceClient(HttpClient httpClient, ILogger<IamServiceClient> logger)
{
_httpClient = httpClient;
_logger = logger;
}
public async Task<UserDto?> GetUserAsync(Guid userId, CancellationToken ct)
{
try
{
var response = await _httpClient.GetAsync($"/api/v1/users/{userId}", ct);
response.EnsureSuccessStatusCode();
return await response.Content.ReadFromJsonAsync<UserDto>(ct);
}
catch (HttpRequestException ex)
{
_logger.LogError(ex, "Failed to get user {UserId}", userId);
throw;
}
}
});
async getUser(userId: string): Promise<User> {
const response = await this.client.get(`/users/${userId}`);
return response.data;
}
}
// EN: Registration in Program.cs
// VI: Đăng ký trong Program.cs
builder.Services.AddHttpClient<IIamServiceClient, IamServiceClient>(client =>
{
client.BaseAddress = new Uri("http://iam-service-net:8080");
client.DefaultRequestHeaders.Add("X-Service-Name", "storage-service");
client.Timeout = TimeSpan.FromSeconds(5);
})
.AddPolicyHandler(GetRetryPolicy())
.AddPolicyHandler(GetCircuitBreakerPolicy());
```
### Event-Driven Pattern
### Pattern Event-Driven
```mermaid
sequenceDiagram
@@ -129,9 +225,9 @@ sequenceDiagram
end
```
Asynchronous event-based communication via Kafka.
Giao tiếp bất đồng bộ dựa trên events qua Kafka.
### Service Discovery
### Khám phá Dịch vụ
**Local (Docker Compose)**:
```yaml
@@ -147,7 +243,7 @@ http://service-name.namespace.svc.cluster.local
http://iam-service.default.svc.cluster.local:3001
```
## API Gateway Pattern
## Pattern API Gateway
```mermaid
graph LR
@@ -166,58 +262,66 @@ graph LR
LB --> Service1A[Instance A]
LB --> Service1B[Instance B]
classDef blue fill:#253041,stroke:#4b6584,color:#ffffff
class Gateway blue
%% Dark color palette with white text
classDef clientBlue fill:#1565c0,stroke:#fff,stroke-width:2px,color:#fff
classDef gatewayBlue fill:#0f4c81,stroke:#fff,stroke-width:2px,color:#fff
classDef featurePurple fill:#4527a0,stroke:#fff,stroke-width:2px,color:#fff
classDef serviceGreen fill:#1e3a29,stroke:#3c7a52,stroke-width:2px,color:#fff
class Client clientBlue
class Gateway gatewayBlue
class Route,LB,Auth,Rate,CORS featurePurple
class Service1,Service2,Service1A,Service1B serviceGreen
```
Single entry point for all client requests with routing, auth, rate limiting.
Điểm vào duy nhất cho tất cả client requests với routing, auth, rate limiting.
## Performance Characteristics
## Đặc điểm Hiệu suất
Performance expectations and optimization strategies for inter-service communication.
Kỳ vọng hiệu suất và chiến lược tối ưu cho giao tiếp giữa các services.
| Metric | Target | Notes |
|--------|--------|-------|
| **REST API Response Time** | < 100ms | P95 for internal service-to-service calls |
| **Event Publishing Latency** | < 50ms | Time to publish to Kafka |
| **Service Discovery Lookup** | < 10ms | DNS resolution time |
| **Gateway Routing Overhead** | < 20ms | Additional latency added by Traefik |
| **Throughput** | 10,000 req/s | Per service instance |
| **Kafka Event Processing** | < 500ms | P95 end-to-end event processing |
| Chỉ số | Mục tiêu | Ghi chú |
|------------------|-------------------|-----------------|
| **Thời gian phản hồi REST API** | < 100ms | P95 cho các cuộc gọi service-to-service nội bộ |
| **Độ trễ publish event** | < 50ms | Thời gian publish tới Kafka |
| **Service discovery lookup** | < 10ms | Thời gian phân giải DNS |
| **Chi phí routing của Gateway** | < 20ms | Độ trễ thêm vào bởi Traefik |
| **Thông lượng** | 10,000 req/s | Mỗi service instance |
| **Xử lý Kafka event** | < 500ms | P95 xử lý event end-to-end |
**Optimization Strategies**:
- **Connection Pooling**: Reuse HTTP connections between services
- **Circuit Breaker**: Prevent cascading failures with Opossum library
- **Retry with Backoff**: Exponential backoff for transient failures
- **Compression**: Enable gzip for large payloads
- **Caching**: Cache service discovery results and responses
**Chiến lược Tối ưu**:
- **Connection Pooling**: Tái sử dụng HTTP connections giữa services
- **Circuit Breaker**: Ngăn chặn cascading failures với thư viện Opossum
- **Retry with Backoff**: Exponential backoff cho transient failures
- **Compression**: Bật gzip cho payloads lớn
- **Caching**: Cache kết quả service discovery responses
## Security Considerations
## Cân nhắc Bảo mật
Security measures for protecting inter-service communication.
Biện pháp bảo mật để bảo vệ giao tiếp giữa các services.
### Service-to-Service Authentication
### Xác thực Service-to-Service
- **Internal API Keys**: Services authenticate using `x-service-auth` header
- **JWT Tokens**: For user context propagation between services
- **Mutual TLS (mTLS)**: Optional for production environments (Kubernetes service mesh)
- **Internal API Keys**: Services xác thực sử dụng `x-service-auth` header
- **JWT Tokens**: Để truyền user context giữa services
- **Mutual TLS (mTLS)**: Tùy chọn cho môi trường production (Kubernetes service mesh)
### Network Security
### Bảo mật Mạng
- **Network Policies**: Kubernetes NetworkPolicies restrict service-to-service traffic
- **Service Mesh**: Istio/Linkerd for advanced security policies (optional)
- **Private Networks**: Services communicate within private VPC/cluster network
- **Network Policies**: Kubernetes NetworkPolicies hạn chế traffic service-to-service
- **Service Mesh**: Istio/Linkerd cho security policies nâng cao (tùy chọn)
- **Private Networks**: Services giao tiếp trong private VPC/cluster network
### Data Protection
### Bảo vệ Dữ liệu
- **Encryption in Transit**: TLS 1.2+ for all external communication
- **Event Payload Encryption**: Sensitive data encrypted before publishing to Kafka
- **API Gateway**: Traefik handles SSL termination and request validation
- **Encryption in Transit**: TLS 1.2+ cho mọi external communication
- **Event Payload Encryption**: Dữ liệu nhạy cảm được mã hóa trước khi publish tới Kafka
- **API Gateway**: Xử lý SSL termination request validation
### Security Best Practices
### Best Practices Bảo mật
```typescript
// Service client with authentication
// Service client với xác thực
export class SecureServiceClient {
private client = axios.create({
baseURL: process.env.SERVICE_URL,
@@ -227,48 +331,53 @@ export class SecureServiceClient {
'x-correlation-id': generateCorrelationId()
},
httpsAgent: new https.Agent({
rejectUnauthorized: true // Verify SSL certificates
rejectUnauthorized: true // Xác minh SSL certificates
})
});
}
```
## Deployment
## Triển khai
How microservices communication is deployed and scaled across environments.
Cách giao tiếp microservices được triển khai và mở rộng qua các môi trường.
```mermaid
graph TD
subgraph "Production Cluster"
LB[Load Balancer] --> Gateway[API Gateway\n3 replicas]
LB[Load Balancer] --> Gateway[API Gateway<br/>3 replicas]
Gateway --> ServiceA1[Service A\nInstance 1]
Gateway --> ServiceA2[Service A\nInstance 2]
Gateway --> ServiceB1[Service B\nInstance 1]
Gateway --> ServiceB2[Service B\nInstance 2]
Gateway --> ServiceA1[Service A<br/>Instance 1]
Gateway --> ServiceA2[Service A<br/>Instance 2]
Gateway --> ServiceB1[Service B<br/>Instance 1]
Gateway --> ServiceB2[Service B<br/>Instance 2]
ServiceA1 & ServiceA2 --> Kafka[Kafka Cluster\n3 brokers]
ServiceA1 & ServiceA2 --> Kafka[Kafka Cluster<br/>3 brokers]
ServiceB1 & ServiceB2 --> Kafka
ServiceA1 & ServiceA2 --> DB[(PostgreSQL\nPrimary + Replica)]
ServiceA1 & ServiceA2 --> DB[(PostgreSQL<br/>Primary + Replica)]
ServiceB1 & ServiceB2 --> DB
ServiceA1 & ServiceA2 --> Redis[(Redis Cluster\n3 nodes)]
ServiceA1 & ServiceA2 --> Redis[(Redis Cluster<br/>3 nodes)]
ServiceB1 & ServiceB2 --> Redis
end
classDef blue fill:#253041,stroke:#4b6584,color:#ffffff
classDef orange fill:#3a2e1e,stroke:#7a5f3c,color:#ffffff
classDef green fill:#1e3a29,stroke:#3c7a52,color:#ffffff
classDef red fill:#3a1e1e,stroke:#7a3c3c,color:#ffffff
%% Dark color palette with white text and white strokes
classDef lbGrey fill:#424242,stroke:#fff,stroke-width:2px,color:#fff
classDef gatewayBlue fill:#0f4c81,stroke:#fff,stroke-width:2px,color:#fff
classDef servicePurple fill:#4527a0,stroke:#fff,stroke-width:2px,color:#fff
classDef kafkaOrange fill:#3a2e1e,stroke:#fff,stroke-width:2px,color:#fff
classDef dbGreen fill:#1e3a29,stroke:#fff,stroke-width:2px,color:#fff
classDef redisRed fill:#3a1e1e,stroke:#fff,stroke-width:2px,color:#fff
class Gateway blue
class Kafka orange
class DB green
class Redis red
class LB lbGrey
class Gateway gatewayBlue
class ServiceA1,ServiceA2,ServiceB1,ServiceB2 servicePurple
class Kafka kafkaOrange
class DB dbGreen
class Redis redisRed
```
### Deployment Environments
### Môi trường Triển khai
| Environment | Gateway | Services | Kafka | Service Discovery |
|-------------|---------|----------|-------|-------------------|
@@ -276,18 +385,18 @@ graph TD
| **Staging** | Traefik (2 replicas) | 2 replicas per service | 3 brokers | Kubernetes DNS |
| **Production** | Traefik (3+ replicas) | 3+ replicas per service | 5+ brokers | Kubernetes DNS + Service Mesh |
### Scaling Strategy
### Chiến lược Mở rộng
- **Horizontal Pod Autoscaler (HPA)**: Auto-scale based on CPU/memory
- **Kafka Partitions**: Scale event processing by increasing partitions
- **Load Balancing**: Kubernetes Service load balances across pod replicas
- **Gateway Scaling**: Traefik scales independently from backend services
- **Horizontal Pod Autoscaler (HPA)**: Tự động scale dựa trên CPU/memory
- **Kafka Partitions**: Scale event processing bằng cách tăng partitions
- **Load Balancing**: Cân bằng tải giữa pod replicas
- **Gateway Scaling**: Traefik scale độc lập với backend services
## Monitoring & Observability
## Giám sát & Khả năng quan sát
How to monitor and observe microservices communication.
Cách giám sát và quan sát giao tiếp microservices.
### Key Metrics
### Chỉ số Chính
**Service-to-Service Metrics**:
- `http_request_duration_seconds` - Request latency histogram
@@ -305,16 +414,16 @@ How to monitor and observe microservices communication.
- `kafka_consumer_lag` - Consumer lag
- `kafka_consumer_records_consumed_total` - Events consumed
### Health Checks
### Kiểm tra Sức khỏe
**Service Endpoints**:
```typescript
// Liveness - is service running?
// Liveness - service có đang chạy không?
app.get('/health/live', (req, res) => {
res.json({ status: 'ok', timestamp: new Date().toISOString() });
});
// Readiness - can service handle traffic?
// Readiness - service có thể xử lý traffic không?
app.get('/health/ready', async (req, res) => {
const checks = {
database: await checkDatabase(),
@@ -344,13 +453,13 @@ readinessProbe:
periodSeconds: 5
```
### Distributed Tracing
### Tracing Phân tán
- **OpenTelemetry**: Instrument all service-to-service calls
- **Jaeger**: Visualize distributed traces
- **Correlation IDs**: Propagate via `x-correlation-id` header for request tracking
- **OpenTelemetry**: Instrument tất cả service-to-service calls
- **Jaeger**: Hiển thị distributed traces
- **Correlation IDs**: Truyền qua `x-correlation-id` header để tracking requests
### Monitoring Dashboard
### Dashboard Giám sát
**Grafana Panels**:
- Service Communication Overview (request rate, latency, errors)
@@ -358,9 +467,9 @@ readinessProbe:
- Event Bus Health (Kafka lag, throughput)
- Service Dependencies (service map from traces)
## Related Documentation
## Tài liệu Liên quan
- [System Design](./system-design.md) - Overall architecture
- [System Design](./system-design.md) - Kiến trúc tổng thể
- [Event-Driven Architecture](./event-driven-architecture.md) - Event patterns
- [API Gateway Advanced](../skills/api-gateway-advanced.md) - Gateway patterns
- [Inter-Service Communication](../skills/inter-service-communication.md) - Communication patterns
@@ -393,6 +502,6 @@ readinessProbe:
---
**Last Updated**: 2026-01-07
**Authors**: GoodGo Architecture Team
**Cập nhật lần cuối / Last Updated**: 2026-01-14
**Tác giả / Authors**: GoodGo Architecture Team
**Reviewers**: To be assigned

View File

@@ -1,8 +1,8 @@
# Observability Architecture
# Kiến trúc Khả năng Quan sát
> **Note**: Comprehensive observability with metrics, logging, and tracing
> **Note**: Khả năng quan sát toàn diện với metrics, logging tracing
## Overview Diagram
## Sơ đồ Tổng quan
```mermaid
graph TD
@@ -42,11 +42,11 @@ graph TD
class Grafana,GrafanaLogs dashboard;
```
## System Context
## Bối cảnh Hệ thống
```mermaid
C4Context
title Observability System Context
title Sơ đồ Bối cảnh Khả năng Quan sát
Person(dev, "Developer", "Uses dashboards to monitor system")
Person(sre, "SRE", "Manages infrastructure & alerts")
@@ -68,12 +68,12 @@ C4Context
UpdateElementStyle(k8s, $fontColor="white", $bgColor="#4A5568", $borderColor="white")
```
### Context Description
- **Observability Stack**: Central hub for collecting and displaying data (Prometheus, Loki, Jaeger, Grafana).
- **Microservices**: Send logs, metrics, and traces (OpenTelemetry).
- **Developer/SRE**: Use Grafana to monitor system health and debug.
### Mô tả Bối cảnh
- **Observability Stack**: Trung tâm thu thập và hiển thị dữ liệu (Prometheus, Loki, Jaeger, Grafana).
- **Microservices**: Gửi logs, metrics traces (OpenTelemetry).
- **Developer/SRE**: Sử dụng Grafana để theo dõi sức khỏe hệ thống và debug.
## Three Pillars of Observability
## Ba Trụ cột Khả năng Quan sát
### 1. Metrics (Prometheus + Grafana)
@@ -94,9 +94,9 @@ graph LR
class Grafana grafana;
```
**Description**: Numerical measurements over time (requests/sec, latency, errors).
**Mô tả**: Các phép đo số theo thời gian (requests/sec, latency, errors).
**Implementation**:
**Triển khai**:
```typescript
import { Counter, Histogram, Gauge } from 'prom-client';
@@ -119,7 +119,7 @@ export const activeRequests = new Gauge({
help: 'Number of active HTTP requests'
});
// Middleware to track metrics
// Middleware để track metrics
export function metricsMiddleware(req, res, next) {
const start = Date.now();
activeRequests.inc();
@@ -145,19 +145,19 @@ export function metricsMiddleware(req, res, next) {
}
```
### 2. Logging (Winston + Loki)
### 2. Logging (Serilog + Loki)
```mermaid
sequenceDiagram
participant Service
participant Winston as Winston Logger
participant Serilog as Serilog Logger
participant Loki
participant Grafana
Service->>Winston: Log event
Winston->>Winston: Format JSON
Winston->>Winston: Add metadata<br/>(correlation ID, trace ID)
Winston->>Loki: Push logs
Service->>Serilog: Log event
Serilog->>Serilog: Format JSON
Serilog->>Serilog: Add metadata<br/>(correlation ID, trace ID)
Serilog->>Loki: Push logs
Loki->>Loki: Index & store
User->>Grafana: Query logs
@@ -165,52 +165,49 @@ sequenceDiagram
Loki-->>Grafana: Log results
```
**Description**: Structured logging with correlation IDs for request tracing.
**Mô tả**: Structured logging với correlation IDs để tracing requests.
**Implementation**:
```typescript
import winston from 'winston';
**Triển khai (.NET)**:
```csharp
// Program.cs - Serilog configuration
builder.Host.UseSerilog((context, config) => config
.ReadFrom.Configuration(context.Configuration)
.Enrich.FromLogContext()
.Enrich.WithProperty("Service", serviceName)
.Enrich.WithProperty("Environment", environment)
.WriteTo.Console(new JsonFormatter())
.WriteTo.GrafanaLoki(
"http://loki:3100",
labels: new [] { new LokiLabel { Key = "app", Value = serviceName } }
));
export const logger = winston.createLogger({
level: process.env.LOG_LEVEL || 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
defaultMeta: {
service: process.env.SERVICE_NAME || 'unknown-service',
environment: process.env.NODE_ENV || 'development'
},
transports: [
new winston.transports.Console(),
// Loki transport (if configured)
]
});
// Logger middleware
export function loggerMiddleware(req, res, next) {
const correlationId = req.headers['x-correlation-id'] || generateId();
req.correlationId = correlationId;
req.logger = logger.child({ correlationId });
req.logger.info('Incoming request', {
method: req.method,
path: req.path,
ip: req.ip
});
res.on('finish', () => {
req.logger.info('Request completed', {
method: req.method,
path: req.path,
status: res.statusCode,
duration: Date.now() - req.startTime
});
});
next();
// Middleware - Add correlation ID
public class CorrelationIdMiddleware
{
private readonly RequestDelegate _next;
private readonly ILogger<CorrelationIdMiddleware> _logger;
public async Task InvokeAsync(HttpContext context)
{
var correlationId = context.Request.Headers["X-Correlation-Id"].FirstOrDefault()
?? Guid.NewGuid().ToString();
context.Items["CorrelationId"] = correlationId;
context.Response.Headers["X-Correlation-Id"] = correlationId;
using (LogContext.PushProperty("CorrelationId", correlationId))
{
_logger.LogInformation("Request started: {Method} {Path}",
context.Request.Method, context.Request.Path);
var sw = Stopwatch.StartNew();
await _next(context);
sw.Stop();
_logger.LogInformation("Request completed: {StatusCode} in {Duration}ms",
context.Response.StatusCode, sw.ElapsedMilliseconds);
}
}
}
```
@@ -238,82 +235,71 @@ graph LR
class Jaeger jaeger;
```
**Description**: Distributed tracing to track requests across services.
**Mô tả**: Distributed tracing để track requests giữa các services.
**Implementation**:
```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';
> [!NOTE]
> Distributed Tracing với Jaeger đang trong kế hoạch triển khai. Hiện tại sử dụng correlation IDs cho request tracking.
// Create traced function
export function traced<T>(
name: string,
fn: () => Promise<T>
): Promise<T> {
const tracer = trace.getTracer('app');
const span = tracer.startSpan(name);
return fn()
.then(result => {
span.setStatus({ code: SpanStatusCode.OK });
return result;
})
.catch(error => {
span.setStatus({
code: SpanStatusCode.ERROR,
message: error.message
});
span.recordException(error);
throw error;
})
.finally(() => {
span.end();
});
}
**Triển khai (.NET với OpenTelemetry)**:
```csharp
// Program.cs - OpenTelemetry configuration (planned)
builder.Services.AddOpenTelemetry()
.WithTracing(tracing => tracing
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddEntityFrameworkCoreInstrumentation()
.AddJaegerExporter(options =>
{
options.AgentHost = "jaeger";
options.AgentPort = 6831;
}));
// Usage
async getUserWithTracing(userId: string): Promise<User> {
return traced('getUserById', async () => {
return await userRepository.findById(userId);
});
// Manual span creation
public async Task<User?> GetUserByIdAsync(Guid userId, CancellationToken ct)
{
using var activity = ActivitySource.StartActivity("GetUserById");
activity?.SetTag("user.id", userId.ToString());
try
{
var user = await _context.Users.FindAsync([userId], ct);
activity?.SetStatus(ActivityStatusCode.Ok);
return user;
}
catch (Exception ex)
{
activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
throw;
}
}
```
## Health Checks
## Kiểm tra Sức khỏe
```typescript
// Liveness probe - is service running?
app.get('/health/live', (req, res) => {
res.json({ status: 'ok', timestamp: new Date().toISOString() });
// Health check (.NET)
app.MapHealthChecks("/health", new HealthCheckOptions
{
ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});
// Readiness probe - is service ready for traffic?
app.get('/health/ready', async (req, res) => {
const checks = {
database: await checkDatabase(),
redis: await checkRedis(),
disk: await checkDiskSpace()
};
const ready = Object.values(checks).every(check => check === true);
res.status(ready ? 200 : 503).json({
ready,
checks,
timestamp: new Date().toISOString()
});
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
Predicate = _ => false // Liveness - always return healthy
});
async function checkDatabase(): Promise<boolean> {
try {
await prisma.$queryRaw`SELECT 1`;
return true;
} catch {
return false;
}
}
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
Predicate = check => check.Tags.Contains("ready")
});
// Health check registration
builder.Services.AddHealthChecks()
.AddNpgSql(connectionString, name: "database", tags: new[] { "ready" })
.AddRedis(redisConnectionString, name: "redis", tags: new[] { "ready" });
```
## Alerting Rules
## Quy tắc Cảnh báo
```yaml
# Prometheus alerting rules
@@ -321,7 +307,7 @@ groups:
- name: service_alerts
interval: 30s
rules:
# High error rate
# Tỷ lệ lỗi cao
- alert: HighErrorRate
expr: |
rate(http_requests_total{status=~"5.."}[5m]) > 0.05
@@ -332,7 +318,7 @@ groups:
summary: "High error rate detected"
description: "Error rate is {{ $value }} (> 5%)"
# High latency
# Độ trễ cao
- alert: HighLatency
expr: |
histogram_quantile(0.95, http_request_duration_seconds_bucket) > 1
@@ -353,11 +339,11 @@ groups:
summary: "Service is down"
```
## Performance Targets
## Đặc điểm Hiệu suất
### Performance Goals
| Metric | Target | Notes |
|--------|--------|-------|
### Mục tiêu Hiệu suất
| Chỉ số | Mục tiêu | Ghi chú |
|--------|----------|---------|
| **Metric Scrape Interval** | 15s | Critical services |
| **Log Ingestion Latency** | < 1s | Time from emit to queryable |
| **Trace Sampling Rate** | 10% | Production (100% in Dev/Staging) |
@@ -365,15 +351,15 @@ groups:
| **Alert Evaluation** | Every 1m | Evaluation interval |
| **Retention Policy** | 14 days | Logs & Traces (Metrics: 30 days) |
## Security Considerations
## Cân nhắc Bảo mật
### Observability Security
- **Log Scrubbing**: Automatically remove PII (emails, ssn, credit cards) and secrets from logs before ingestion.
- **Access Control**: Grafana integrated with OAuth2/OIDC, with Viewer/Editor/Admin roles.
- **Network Policy**: Only allow traffic from internal namespace to ingestion ports (9090, 3100, 14268).
- **TLS**: Encrypt traffic between agents and collectors.
### Bảo mật Observability
- **Log Scrubbing**: Tự động loại bỏ PII (emails, ssn, credit cards) secrets khỏi logs trước khi ingestion.
- **Access Control**: Grafana integrated với OAuth2/OIDC, phân quyền Viewer/Editor/Admin.
- **Network Policy**: Chỉ cho phép traffic từ namespace nội bộ tới các cổng ingestion (9090, 3100, 14268).
- **TLS**: Mã hóa traffic giữa agents collectors.
## Deployment
## Triển khai
```mermaid
graph TD
@@ -415,15 +401,15 @@ graph TD
class App,Agent app;
```
**Deployment Description**:
- **Agent**: Promtail or Grafana Agent runs as DaemonSet or Sidecar to collect logs.
- **Pull Model**: Prometheus scrapes metrics from `/metrics` endpoints.
- **Push Model**: Traces and Logs are pushed to collectors.
- **Resources**: Dedicated nodes for monitoring stack in production to avoid impacting main workload.
**Mô tả Triển khai**:
- **Agent**: Promtail hoặc Grafana Agent chạy như DaemonSet hoặc Sidecar để thu thập logs.
- **Pull Model**: Prometheus scrape metrics từ endpoints `/metrics`.
- **Push Model**: Traces Logs được push tới collectors.
- **Resources**: Dedicated nodes cho monitoring stack trong production để tránh ảnh hưởng workload chính.
## Related Documentation
## Tài liệu Liên quan
- [System Design](./system-design.md) - Overall architecture
- [System Design](./system-design.md) - Kiến trúc tổng thể
- [Caching Architecture](./caching-architecture.md) - Cache metrics
## Quick Tips
@@ -459,5 +445,5 @@ graph TD
---
**Last Updated**: 2026-01-10
**Author**: GoodGo Architecture Team
**Cập nhật Lần cuối**: 2026-01-14
**Tác giả**: GoodGo Architecture Team

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff