docs: Add Vietnamese-language architecture documentation for security, event-driven design, data consistency, observability, and caching, and update the existing guides and architecture documents.

2026-01-07 10:22:42 +07:00
parent d8faffd41d
commit 495618ded7
17 changed files with 7357 additions and 779 deletions
--- a/docs/en/architecture/caching-architecture.md
+++ b/docs/en/architecture/caching-architecture.md
@@ -1,9 +1,8 @@
-# Caching Architecture / Kiến trúc Caching
+# Caching Architecture
-> **EN**: Multi-layer caching strategy for optimal performance
+> Multi-layer caching strategy for optimal performance
 > **VI**: Chiến lược caching nhiều tầng để tối ưu hiệu suất
-## Overview Diagram / Sơ đồ Tổng quan
+## Overview Diagram
 ```mermaid
 graph TD
@@ -24,9 +23,35 @@ graph TD
    style DB fill:#f0e1ff
 ```
-## Architecture Description / Mô tả Kiến trúc
+## System Context
-### EN: Multi-Layer Caching
+```mermaid
 C4Context
    title Caching System Context
    System(service, "Microservice", "Client service using cache")
    System_Ext(db, "Neon PostgreSQL", "Primary database")
    Boundary(caching, "Caching Layer") {
        System(l1, "L1 Cache", "In-memory NodeCache")
        System(l2, "L2 Cache", "Redis Cluster")
    }
    Rel(service, l1, "Reads/Writes", "In-process")
    Rel(service, l2, "Reads/Writes", "Redis Protocol")
    Rel(l1, l2, "Fills from", "On miss")
    Rel(l2, db, "Cache aside", "On miss")
 ```
 ### Context Description
 - **Service**: Communicates directly with L1 Cache (in-memory) for lowest latency.
 - **L1 Cache**: Local cache, not shared, automatic expiration (short TTL).
 - **L2 Cache**: Shared Redis cluster, holds data longer and syncs across instances.
 - **Database**: Source of truth, accessed only on cache miss.
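The read path described above (L1, then L2, then the database, with each layer filled on the way back) can be sketched as follows. This is a minimal illustration, not the platform's `MultiLayerCache`: plain in-memory maps stand in for NodeCache and Redis, and `TtlStore`, `TwoLayerCache`, and `loadFromDb` are hypothetical names. The default TTLs (60 seconds for L1, 5 minutes for L2) follow the values quoted in this document.

```typescript
// Sketch only: plain Maps stand in for NodeCache (L1) and Redis (L2).
// TtlStore, TwoLayerCache, and loadFromDb are hypothetical names.
type Source = 'l1' | 'l2' | 'db';

class TtlStore<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  get(key: string, now: number): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlMs: number, now: number): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }
}

class TwoLayerCache<V> {
  private l1 = new TtlStore<V>();
  private l2 = new TtlStore<V>();
  private loadFromDb: (key: string) => Promise<V>;
  private l1TtlMs: number;
  private l2TtlMs: number;

  constructor(loadFromDb: (key: string) => Promise<V>, l1TtlMs = 60000, l2TtlMs = 300000) {
    this.loadFromDb = loadFromDb;
    this.l1TtlMs = l1TtlMs;
    this.l2TtlMs = l2TtlMs;
  }

  // `now` is injectable so expiry can be exercised without real waiting.
  async get(key: string, now: number = Date.now()): Promise<{ value: V; source: Source }> {
    const fromL1 = this.l1.get(key, now);
    if (fromL1 !== undefined) return { value: fromL1, source: 'l1' };

    const fromL2 = this.l2.get(key, now);
    if (fromL2 !== undefined) {
      this.l1.set(key, fromL2, this.l1TtlMs, now); // fill L1 from L2 on an L1 miss
      return { value: fromL2, source: 'l2' };
    }

    // Both layers missed: read the source of truth and populate both layers.
    const fromDb = await this.loadFromDb(key);
    this.l2.set(key, fromDb, this.l2TtlMs, now);
    this.l1.set(key, fromDb, this.l1TtlMs, now);
    return { value: fromDb, source: 'db' };
  }
}
```

Because L1 expires sooner than L2, a key that has aged out of one instance's memory is usually still served from the shared layer, so the database is read only when both layers miss.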
 ## Architecture Description
 ### Multi-Layer Caching
 GoodGo platform uses 2-layer caching for performance:
@@ -52,30 +77,11 @@ Request → L1 → L2 → Database
 hit rate hit rate        rate
 ```
-### VI: Multi-Layer Caching
+## Cache Implementation
 The GoodGo platform uses two-layer caching to optimize performance:
 **L1 Cache (Memory)**:
 - In-memory cache on each service instance
 - Very fast access (< 1ms)
 - Limited capacity (10k keys by default)
 - Short TTL (60 seconds by default, 5 minutes maximum)
 - Not shared between instances
 **L2 Cache (Redis)**:
 - Shared distributed cache
 - Fast access (< 5ms)
 - Large capacity
 - Longer TTL (configurable, typically 5-15 minutes)
 - Shared across all service instances
 ## Cache Implementation / Triển khai Cache
 ### Multi-Layer Cache Service
 ```typescript
 // Multi-layer cache implementation
 export class MultiLayerCache {
  private l1Cache: NodeCache;
  private l2Cache: Redis;
@@ -143,13 +149,12 @@ export class MultiLayerCache {
 }
 ```
-### Cache Key Naming / Quy ước Đặt tên Key
+### Cache Key Naming
 **Pattern**: `{service}:{entity}:{identifier}:{sub-resource}`
 **Examples**:
 ```typescript
 // User cache keys
 const keys = {
  user: (userId: string) => `iam:user:${userId}`,
  userPermissions: (userId: string) => `iam:user:${userId}:permissions`,
@@ -162,7 +167,7 @@ const user = await cache.get(keys.user('user_123'));
 const permissions = await cache.get(keys.userPermissions('user_123'));
 ```
-## TTL Strategies / Chiến lược TTL
+## TTL Strategies
 ```mermaid
 graph LR
@@ -196,7 +201,7 @@ graph LR
 | Static config | 30-60 min | Very stable |
 | Reference data | 1-2 hours | Almost never changes |
-## Cache Invalidation / Vô hiệu hóa Cache
+## Cache Invalidation
 ```mermaid
 sequenceDiagram
@@ -244,7 +249,7 @@ async updateUserRole(userId: string, roleId: string): Promise<void> {
 // Automatically handled by cache
 ```
-## Cache Warming / Làm ấm Cache
+## Cache Warming
 ```typescript
 // Preload frequently accessed data
@@ -271,33 +276,83 @@ async warmCache(): Promise<void> {
 warmCache().catch(err => logger.error('Cache warming failed', { err }));
 ```
-## Performance Metrics / Chỉ số Hiệu suất
+## Design Decisions
 ### Decision 1: Multi-layer Caching (L1 + L2)
 **Context**: Need to reduce load on Redis and achieve ultra-low latency for hot data.
 **Decision**: Use a combination of L1 (NodeCache) and L2 (Redis).
 **Consequences**:
 - ✅ Latency < 1ms for the 40-50% of requests served from L1.
 - ✅ Reduced network traffic to Redis.
 - ❌ Synchronization complexity (L1 can be stale for a short time, bounded by its TTL).
 ## Performance Characteristics
 ### Performance Targets
 | Metric | Target | Notes |
 |--------|--------|-------|
 | **L1 Hit Latency** | < 0.5ms | In-memory lookup |
 | **L2 Hit Latency** | < 5ms | Network RTT + Redis processing |
 | **Combined Hit Rate** | > 90% | L1 + L2 combined |
 | **L1 Capacity** | 10k items | Per instance limit to protect heap |
 | **Cache Warmup Time** | < 30s | At service startup |
 ## Security Considerations
 ### Cache Security
 - **Encryption**: Sensitive data (PII) MUST be encrypted (AES-256) before it is stored in L2 Redis. L1 may hold plaintext because it lives in process memory, although a memory dump would still expose it.
 - **Isolation**: Redis instance protected by password and Network Policy (allow internal K8s traffic only).
 - **TLS**: Connect to Redis via TLS 1.2+.
 - **Data Sanitization**: Do not cache entire user objects if they contain password hashes or secrets.
 ## Deployment
 ```mermaid
 graph TD
    subgraph "Kubernetes Pod"
        Service[Microservice Container]
        L1["L1 Cache (RAM)"]
        Service --- L1
    end
    subgraph "Infrastructure"
        RedisMaster[Redis Master]
        RedisSlave1[Redis Slave 1]
        RedisSlave2[Redis Slave 2]
    end
    Service -->|Write| RedisMaster
    Service -->|Read| RedisSlave1
    Service -->|Read| RedisSlave2
    RedisMaster -.->|Replication| RedisSlave1
    RedisMaster -.->|Replication| RedisSlave2
    style Service fill:#e1f5ff
    style L1 fill:#d4edda
    style RedisMaster fill:#fff4e1
 ```
 **Deployment Description**:
 - **L1**: Embedded directly in Microservice process, scales with number of Pods.
 - **L2**: Redis Cluster (or Sentinel) with at least 3 nodes for High Availability.
 - **Connection Pooling**: Use ioredis with connection pooling for efficient connection management.
 ## Monitoring & Observability
 ### Monitoring Metrics
 - **Metrics**: Prometheus metrics for hit rate, miss rate, latency, memory usage.
 - **Logs**: Log cache miss/hit at debug level (sampled), log connection errors at error level.
 - **Health Checks**: Readiness probe checks connection to Redis.
 ### Monitoring Code
 **Cache Hit Rates**:
 ```typescript
 // Track cache performance
 export class CacheMetrics {
-  private hits = new Counter({
+  // Prometheus counter for cache hits
+  private hits = new Counter({
    name: 'cache_hits_total',
    help: 'Total cache hits',
    labelNames: ['layer', 'key_prefix']
  });
  private misses = new Counter({
    name: 'cache_misses_total',
    help: 'Total cache misses',
    labelNames: ['layer', 'key_prefix']
  });
  recordHit(layer: 'l1' | 'l2', key: string): void {
    const prefix = key.split(':')[0];
    this.hits.inc({ layer, key_prefix: prefix });
  }
  recordMiss(key: string): void {
    const prefix = key.split(':')[0];
    this.misses.inc({ layer: 'db', key_prefix: prefix });
  }
 }
 ```
@@ -308,7 +363,7 @@ export class CacheMetrics {
 | Hit Rate | 40-50% | 80-90% | - |
 | Capacity | 10k keys | Unlimited | - |
-## Best Practices / Best Practices
+## Best Practices
 **DO**:
 - ✅ Use cache for frequently accessed data
@@ -325,13 +380,3 @@ export class CacheMetrics {
 - ❌ Cache sensitive data without encryption
 - ❌ Ignore cache invalidation on updates
 - ❌ Use cache as primary data store
 ## Related Documentation / Tài liệu Liên quan
 - [System Design](./system-design.md) - Overall architecture with caching
 - [Data Consistency Patterns](./data-consistency-patterns.md) - Cache invalidation patterns
 ---
 **Last Updated**: 2024-01-15  
 **Authors**: GoodGo Architecture Team
--- a/docs/en/architecture/event-driven-architecture.md
+++ b/docs/en/architecture/event-driven-architecture.md
@@ -1,9 +1,8 @@
-# Event-Driven Architecture / Kiến trúc Hướng Sự kiện
+# Event-Driven Architecture
-> **EN**: Event-driven architecture for asynchronous communication using Apache Kafka
+> Event-driven architecture for asynchronous communication using Apache Kafka
 > **VI**: Kiến trúc hướng sự kiện cho giao tiếp bất đồng bộ sử dụng Apache Kafka
-## Overview Diagram / Sơ đồ Tổng quan
+## Overview Diagram
 ```mermaid
 graph TD
@@ -32,9 +31,7 @@ graph TD
    style Topics fill:#fff4e1
 ```
-## Architecture Description / Mô tả Kiến trúc
+## Architecture Description
 ### EN: English Section
 The GoodGo platform implements Event-Driven Architecture (EDA) for asynchronous communication between microservices.
@@ -47,28 +44,11 @@ The GoodGo platform implements Event-Driven Architecture (EDA) for asynchronous
 **Technology Stack**:
 - Apache Kafka - Event streaming platform
- Schema Registry - Avro schemas for validation  
+- Schema Registry - Avro schemas for validation
 - KafkaJS - Node.js client library
 - Event Sourcing - Custom implementation in IAM
-### VI: Vietnamese Section
+## Event Flow
 The GoodGo platform implements Event-Driven Architecture (EDA) for asynchronous communication between microservices.
 **Core Principles**:
 1. **Event-First Design**: Every state change emits domain events
 2. **Loose Coupling**: Services communicate through events
 3. **Eventual Consistency**: Temporary inconsistency is accepted
 4. **Event Sourcing**: Changes are stored as a sequence of events
 5. **CQRS Pattern**: Read and write operations are separated
 **Technology Stack**:
 - Apache Kafka - Event streaming platform
 - Schema Registry - Avro schemas for validation
 - KafkaJS - Node.js client library
 - Event Sourcing - Custom implementation in IAM
 ## Event Flow / Luồng Sự kiện
 ```mermaid
 sequenceDiagram
@@ -82,11 +62,9 @@ sequenceDiagram
    Consumer-->>Kafka: Acknowledge
 ```
-**EN Steps**: Publish → Distribute → Consume → Retry (if failed) → DLQ (after max retries) → Acknowledge
+**Steps**: Publish → Distribute → Consume → Retry (if failed) → DLQ (after max retries) → Acknowledge
-**VI Steps**: Publish → Distribute → Consume → Retry (if failed) → DLQ (after max retries) → Acknowledge
+## Event Structure
 ## Event Structure / Cấu trúc Sự kiện
 ```typescript
 interface BaseEvent {
@@ -114,7 +92,7 @@ interface BaseEvent {
 }
 ```
-## Kafka Topics / Kafka Topics
+## Kafka Topics
 ```mermaid
 graph LR
@@ -134,7 +112,7 @@ graph LR
 - `auth.login.success.v1`
 - `audit.event.logged.v1`
-## Error Handling / Xử lý Lỗi
+## Error Handling
 ```mermaid
 graph TD
@@ -151,12 +129,247 @@ graph TD
 3. Move to DLQ after max retries
 4. Manual review and reprocess
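The retry-then-DLQ handling listed above can be sketched as a small wrapper around a consumer handler. This is an illustration under stated assumptions rather than the platform's consumer: `consumeWithRetry`, `sendToDlq`, and `RetryOptions` are hypothetical names, the exponential backoff schedule is an assumed policy, and no Kafka client is involved. The handler and the DLQ publisher are passed in as plain functions.

```typescript
// Sketch only: consumeWithRetry, sendToDlq, and RetryOptions are hypothetical names.
type Handler<E> = (event: E) => Promise<void>;

interface RetryOptions {
  maxRetries: number;  // retries allowed after the first attempt
  baseDelayMs: number; // delay before the first retry; doubled on each further retry (assumed policy)
}

const realSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function consumeWithRetry<E>(
  event: E,
  handler: Handler<E>,
  sendToDlq: (event: E, error: unknown) => Promise<void>,
  options: RetryOptions,
  sleep: (ms: number) => Promise<void> = realSleep,
): Promise<'processed' | 'dead-lettered'> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
    try {
      await handler(event);
      return 'processed'; // the caller can now acknowledge the message
    } catch (error) {
      lastError = error;
      if (attempt < options.maxRetries) {
        await sleep(options.baseDelayMs * 2 ** attempt); // back off before retrying
      }
    }
  }
  // All attempts failed: park the event for manual review and reprocessing.
  await sendToDlq(event, lastError);
  return 'dead-lettered';
}
```

Returning a status instead of throwing lets the caller acknowledge the original message in both cases, so a poisoned event does not block its partition once it has been moved to the DLQ.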
-## Related Documentation / Tài liệu Liên quan
+## System Context
 ```mermaid
 C4Context
    title Event-Driven Architecture Context
    System(iam, "IAM Service", "Event producer")
    System(service_a, "Service A", "Event producer")
    System(notification, "Notification Service", "Event consumer")
    System(audit, "Audit Service", "Event consumer")
    System_Ext(kafka, "Apache Kafka", "Event streaming platform")
    System_Ext(registry, "Schema Registry", "Schema management")
    System_Ext(monitoring, "Monitoring", "Kafka metrics & alerts")
    Rel(iam, kafka, "Publishes events", "Kafka Protocol")
    Rel(service_a, kafka, "Publishes events", "Kafka Protocol")
    Rel(kafka, notification, "Delivers events", "Kafka Protocol")
    Rel(kafka, audit, "Delivers events", "Kafka Protocol")
    Rel(kafka, registry, "Validates schemas", "HTTP")
    Rel(kafka, monitoring, "Sends metrics", "JMX")
 ```
 **Context Description**:
 - **Producers**: IAM Service and other services publish domain events
 - **Kafka**: Central event broker, manages topics and partitions
 - **Consumers**: Notification and Audit services consume events
 - **Schema Registry**: Manages and validates Avro schemas
 - **Monitoring**: Collects metrics from Kafka cluster
 ## Performance Characteristics
 | Metric | Target | Notes |
 |--------|--------|-------|
 | **Event Publish Latency (P95)** | < 10ms | Fire-and-forget, async |
 | **Event Delivery Latency (P95)** | < 100ms | End-to-end from publish to consume |
 | **Throughput** | 10,000 events/s | Per topic, scalable with partitions |
 | **Consumer Lag** | < 1000 messages | Per partition, monitored |
 | **Event Size** | < 1MB | Recommended max size |
 | **Retention** | 7 days | Default, configurable per topic |
 | **Replication Factor** | 3 | For fault tolerance |
 **Performance Optimizations**:
 - **Batch Publishing**: Group multiple events to reduce network overhead
 - **Compression**: Use Snappy or LZ4 compression
 - **Partitioning**: Divide topics into multiple partitions for parallel processing
 - **Consumer Groups**: Multiple consumers in same group for horizontal scaling
 - **Async Publishing**: Fire-and-forget pattern, don't block request handlers
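Partitioning gives parallelism without losing per-entity ordering because every event with the same key lands on the same partition. The sketch below illustrates that mapping only. `partitionFor` and `fnv1a32` are hypothetical helpers, and the FNV-1a-style hash is a stand-in chosen for brevity: real Kafka clients apply their own partitioner (the Java client hashes keys with murmur2), so the partition numbers produced here will not match a real cluster.

```typescript
// Sketch only: an FNV-1a-style hash over UTF-16 code units, used purely for illustration.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 unsigned bits
  }
  return hash >>> 0;
}

// Map a message key to a partition index in [0, partitionCount).
function partitionFor(key: string, partitionCount: number): number {
  if (!Number.isInteger(partitionCount) || partitionCount <= 0) {
    throw new Error('partitionCount must be a positive integer');
  }
  return fnv1a32(key) % partitionCount;
}

// Events for the same user always map to the same partition, preserving their order.
const first = partitionFor('user_123', 3);
const second = partitionFor('user_123', 3);
console.log(first === second); // true
```

One consequence worth keeping in mind: changing the partition count changes the key-to-partition mapping, so ordering guarantees only hold for events published under the same partition count.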
 ## Security Considerations
 **Event Encryption**:
 - TLS in-transit for all Kafka connections
 - Optional payload encryption for sensitive data
 - End-to-end encryption with custom encryption layer
 **Access Control**:
 - Kafka ACLs (Access Control Lists) per topic
 - SASL/SCRAM authentication for producers and consumers
 - Separate credentials per service
 - Principle of least privilege - grant only necessary permissions
 **Schema Validation**:
 - Avro schemas in Schema Registry
 - Schema evolution with backward/forward compatibility
 - Reject events that don't match schema
 **Audit**:
 - Log all event publishes and consumes
 - Correlation IDs to trace event flow
 - Retention policy for audit logs (7 years)
 **Data Retention**:
 - Default 7 days retention
 - Configurable per topic
 - Automatic deletion after retention period
 - GDPR compliance (right to erasure)
 ## Deployment
 ```mermaid
 graph TD
    subgraph "Kafka Cluster"
        subgraph "Brokers"
            Broker1[Kafka Broker 1<br/>Leader for partitions 0,3,6]
            Broker2[Kafka Broker 2<br/>Leader for partitions 1,4,7]
            Broker3[Kafka Broker 3<br/>Leader for partitions 2,5,8]
        end
        subgraph "Coordination"
            ZK[Zookeeper Ensemble<br/>3 nodes]
        end
        Broker1 --> ZK
        Broker2 --> ZK
        Broker3 --> ZK
    end
    subgraph "Producers"
        IAM[IAM Service]
        ServiceA[Service A]
    end
    subgraph "Consumers"
        Notification[Notification Service<br/>Consumer Group: notifications]
        Audit[Audit Service<br/>Consumer Group: audit]
    end
    IAM --> Broker1
    IAM --> Broker2
    IAM --> Broker3
    ServiceA --> Broker1
    ServiceA --> Broker2
    ServiceA --> Broker3
    Broker1 --> Notification
    Broker2 --> Notification
    Broker3 --> Notification
    Broker1 --> Audit
    Broker2 --> Audit
    Broker3 --> Audit
    style Broker1 fill:#e1f5ff
    style Broker2 fill:#fff4e1
    style Broker3 fill:#d4edda
    style ZK fill:#f0e1ff
 ```
 **Kafka Cluster Configuration**:
 - **Brokers**: 3 brokers minimum (5 for production)
 - **Replication Factor**: 3 (for fault tolerance)
 - **Min In-Sync Replicas**: 2 (ensure data durability)
 - **Partitions**: 3-10 per topic (based on throughput needs)
 - **Zookeeper**: 3-node ensemble (for coordination)
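The interaction between the replication factor and min in-sync replicas is easy to misread, so here is the arithmetic as a small sketch. It assumes producers publish with `acks=all` and that each replica of a partition sits on a different broker; the type and function names are hypothetical.

```typescript
// Sketch only: names are hypothetical; assumes acks=all and one replica per broker.
interface TopicDurability {
  replicationFactor: number;
  minInSyncReplicas: number;
}

function assertValid(topic: TopicDurability): void {
  const { replicationFactor, minInSyncReplicas } = topic;
  if (!Number.isInteger(replicationFactor) || !Number.isInteger(minInSyncReplicas)) {
    throw new Error('values must be integers');
  }
  if (minInSyncReplicas < 1 || minInSyncReplicas > replicationFactor) {
    throw new Error('minInSyncReplicas must be between 1 and replicationFactor');
  }
}

// Brokers a partition can lose while still accepting acks=all writes.
function failuresToleratedForWrites(topic: TopicDurability): number {
  assertValid(topic);
  return topic.replicationFactor - topic.minInSyncReplicas;
}

// An acknowledged write is on at least minInSyncReplicas brokers,
// so it is guaranteed to survive the loss of one fewer than that.
function failuresSurvivedByAcknowledgedWrite(topic: TopicDurability): number {
  assertValid(topic);
  return topic.minInSyncReplicas - 1;
}

const production: TopicDurability = { replicationFactor: 3, minInSyncReplicas: 2 };
console.log(failuresToleratedForWrites(production));          // 1
console.log(failuresSurvivedByAcknowledgedWrite(production)); // 1
```

With the values above, one broker can fail without interrupting writes or losing acknowledged events. Raising min in-sync replicas to 3 would strengthen durability but make every single broker failure block writes.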
 **Resource Allocation**:
 | Component | CPU | Memory | Disk |
 |-----------|-----|--------|------|
 | **Kafka Broker** | 2 cores | 4GB RAM | 100GB SSD |
 | **Zookeeper** | 1 core | 2GB RAM | 20GB SSD |
 | **Schema Registry** | 500m | 1GB RAM | 10GB |
 **Topic Configuration**:
 ```yaml
 user.created:
  partitions: 3
  replication-factor: 3
  retention-ms: 604800000  # 7 days
  compression-type: snappy
 auth.login.success:
  partitions: 5
  replication-factor: 3
  retention-ms: 604800000
  compression-type: snappy
 audit.events:
  partitions: 10
  replication-factor: 3
  retention-ms: 220752000000  # 7 years
  compression-type: lz4
 ```
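The `retention-ms` values in the topic configuration are plain millisecond counts, which are easy to mistype. A small helper makes them checkable; `retentionMs` is a hypothetical name, and the 7-year figure counts 365-day years, which is how the configured value works out.

```typescript
// Sketch only: retentionMs is a hypothetical helper for sanity-checking topic settings.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function retentionMs(days: number): number {
  if (!Number.isFinite(days) || days <= 0) {
    throw new Error('days must be a positive number');
  }
  return days * MS_PER_DAY;
}

console.log(retentionMs(7));       // 604800000    (7 days)
console.log(retentionMs(7 * 365)); // 220752000000 (7 years of 365 days)
```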
 **High Availability**:
 - Multiple brokers with partition replication
 - Automatic leader election when broker fails
 - Consumer group rebalancing
 - Monitoring and alerting for broker health
 ## Monitoring & Observability
 **Key Metrics**:
 **Kafka Broker Metrics**:
 - `kafka_server_brokertopicmetrics_messagesinpersec` - Messages in/sec
 - `kafka_server_brokertopicmetrics_bytesinpersec` - Bytes in/sec
 - `kafka_server_brokertopicmetrics_bytesoutpersec` - Bytes out/sec
 - `kafka_controller_kafkacontroller_activecontrollercount` - Active controller
 - `kafka_server_replicamanager_underreplicatedpartitions` - Under-replicated partitions
 **Consumer Metrics**:
 - `kafka_consumer_fetch_manager_records_lag_max` - Max consumer lag
 - `kafka_consumer_fetch_manager_records_consumed_rate` - Records consumed/sec
 - `kafka_consumer_coordinator_commit_latency_avg` - Commit latency
 **Producer Metrics**:
 - `kafka_producer_record_send_total` - Total records sent
 - `kafka_producer_record_error_total` - Total send errors
 - `kafka_producer_request_latency_avg` - Request latency
 **Application Metrics**:
 ```typescript
 // Custom metrics for event processing
 const eventPublished = new Counter({
  name: 'events_published_total',
  help: 'Total events published',
  labelNames: ['event_type', 'topic']
 });
 const eventConsumed = new Counter({
  name: 'events_consumed_total',
  help: 'Total events consumed',
  labelNames: ['event_type', 'topic', 'consumer_group']
 });
 const eventProcessingDuration = new Histogram({
  name: 'event_processing_duration_seconds',
  help: 'Event processing duration',
  labelNames: ['event_type'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5]
 });
 ```
 **Dashboards**:
 - Kafka Cluster Overview (brokers, topics, partitions)
 - Producer Performance (throughput, latency, errors)
 - Consumer Performance (lag, throughput, errors)
 - Topic Metrics (messages/sec, bytes/sec, retention)
 **Logging**:
 ```typescript
 // Structured logging for events
 logger.info('Event published', {
  eventId: event.eventId,
  eventType: event.eventType,
  topic: 'user.created',
  correlationId: event.correlationId
 });
 logger.info('Event consumed', {
  eventId: event.eventId,
  eventType: event.eventType,
  topic: 'user.created',
  consumerGroup: 'notifications',
  processingTime: duration
 });
 ```
 ## Related Documentation
 - [System Design](./system-design.md) - Overall architecture
 - [IAM Architecture](./iam-proposal.md) - Event sourcing implementation
 ---
 **Last Updated**: 2024-01-15  
 **Authors**: GoodGo Architecture Team
--- a/docs/en/architecture/security-architecture.md
+++ b/docs/en/architecture/security-architecture.md
@@ -1,9 +1,8 @@
-# Security Architecture / Kiến trúc Bảo mật
+# Security Architecture
-> **EN**: Comprehensive security architecture for GoodGo platform with zero-trust model, RBAC, and compliance
+> Comprehensive security architecture for GoodGo platform with zero-trust model, RBAC, and compliance
 > **VI**: Kiến trúc bảo mật toàn diện cho nền tảng GoodGo với mô hình zero-trust, RBAC và compliance
-## Overview Diagram / Sơ đồ Tổng quan
+## Overview Diagram
 ```mermaid
 graph TD
@@ -26,9 +25,7 @@ graph TD
    style Audit fill:#fff4e1
 ```
-## Architecture Description / Mô tả Kiến trúc
+## Architecture Description
 ### EN: English Section
 The GoodGo Security Architecture implements defense-in-depth with multiple security layers:
@@ -47,26 +44,7 @@ The GoodGo Security Architecture implements defense-in-depth with multiple secur
 - Event Sourcing for Audit Trail
 - Compliance (GDPR, SOC2, ISO27001, HIPAA)
-### VI: Vietnamese Section
+## Authentication Flow
 The GoodGo Security Architecture implements defense-in-depth with multiple security layers:
 **Security Principles**:
 1. **Zero Trust**: Never trust, always verify
 2. **Least Privilege**: Minimum permissions required
 3. **Defense in Depth**: Multiple security layers
 4. **Audit Everything**: Complete audit trail
 5. **Encryption**: Encrypt data at rest and in transit
 **Key Components**:
 - JWT Authentication (15min access, 7-day refresh)
 - RBAC + ABAC Authorization
 - Zero-Trust Device Validation
 - AES-256-GCM Encryption
 - Event Sourcing for Audit Trail
 - Compliance (GDPR, SOC2, ISO27001, HIPAA)
 ## Authentication Flow / Luồng Xác thực
 ```mermaid
 sequenceDiagram
@@ -93,7 +71,7 @@ sequenceDiagram
    end
 ```
-**EN: Authentication Details**:
+**Authentication Details**:
 **1. Password Hashing**:
 - Algorithm: bcrypt with cost factor 12
@@ -116,30 +94,7 @@ sequenceDiagram
 - Backup codes (10 single-use)
 - Recovery email verification
-**VI: Authentication Details**:
+## Authorization Model
 **1. Password Hashing**:
 - Algorithm: bcrypt with cost factor 12
 - Never store plaintext passwords
 - Minimum password length: 8 characters with complexity rules
 **2. JWT Tokens**:
 - Access Token: 15-minute expiry
 - Refresh Token: 7-day expiry
 - Algorithm: RS256 (asymmetric signing)
 - Payload: userId, roles, permissions
 **3. Token Storage**:
 - Access: httpOnly cookie (secure, sameSite)
 - Refresh: Database SHA-256 hash
 - Rotation: New refresh token on every use
 **4. MFA Support**:
 - TOTP (Time-based One-Time Password)
 - Backup codes (10 single-use)
 - Recovery email verification
 ## Authorization Model / Mô hình Phân quyền
 ```mermaid
 graph TD
@@ -162,7 +117,7 @@ graph TD
    style Perm fill:#fff4e1
 ```
-**EN: RBAC (Role-Based Access Control)**
+**RBAC (Role-Based Access Control)**:
 **1. Role Hierarchy**:
 ```
@@ -186,24 +141,7 @@ SuperAdmin > OrgAdmin > Manager > User > Guest
 // Invalidate on: role change, permission change
 ```
-**VI: RBAC (Role-Based Access Control)**
+## Zero-Trust Architecture
 **1. Role Hierarchy**:
 ```
 SuperAdmin > OrgAdmin > Manager > User > Guest
 ```
 **2. Permission Format**: `resource:action:scope`
 - Resource: `users`, `roles`, `permissions`
 - Action: `create`, `read`, `update`, `delete`
 - Scope: `own`, `org`, `global`
 **Examples**:
 - `users:read:own` - Read your own profile
 - `users:update:org` - Update users in the organization
 - `roles:create:global` - Create roles globally
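A check against the `resource:action:scope` format can be sketched as below. The parsing follows the format shown above; the rule that a wider scope also satisfies a narrower one (`global` covers `org`, which covers `own`) is an assumption made for this sketch and is not stated by the source. `parsePermission` and `isAllowed` are hypothetical names.

```typescript
// Sketch only: parsePermission and isAllowed are hypothetical helpers.
type Scope = 'own' | 'org' | 'global';

interface Permission {
  resource: string;
  action: string;
  scope: Scope;
}

// Ordered from narrowest to widest. Assumption: a wider scope implies the narrower ones.
const SCOPES: readonly Scope[] = ['own', 'org', 'global'];

function parsePermission(value: string): Permission {
  const parts = value.split(':');
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    throw new Error(`Invalid permission: ${value}`);
  }
  const [resource, action, scope] = parts;
  if (SCOPES.indexOf(scope as Scope) === -1) {
    throw new Error(`Unknown scope in permission: ${value}`);
  }
  return { resource, action, scope: scope as Scope };
}

function isAllowed(granted: string[], required: string): boolean {
  const need = parsePermission(required);
  return granted.map(parsePermission).some(
    (have) =>
      have.resource === need.resource &&
      have.action === need.action &&
      SCOPES.indexOf(have.scope) >= SCOPES.indexOf(need.scope),
  );
}

console.log(isAllowed(['users:update:org'], 'users:update:own')); // true
console.log(isAllowed(['users:read:own'], 'users:read:org'));     // false
```

If scopes are meant to be matched exactly instead, the comparison in `isAllowed` becomes a strict equality on `scope`.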
 ## Zero-Trust Architecture / Kiến trúc Zero-Trust
 ```mermaid
 graph TD
@@ -221,7 +159,7 @@ graph TD
    style Allow fill:#d4edda
 ```
-**EN: Zero-Trust Components**:
+**Zero-Trust Components**:
 **1. Device Fingerprinting**:
 - Browser: User-Agent, Canvas, WebGL
@@ -245,22 +183,9 @@ graph TD
 - Bind session to IP address
 - Invalidate on mismatch
-**VI: Zero-Trust Components**:
+## Data Protection
-**1. Device Fingerprinting**:
+**Encryption Strategy**:
 - Browser: User-Agent, Canvas, WebGL
 - Screen resolution, timezone, language
 - Plugin detection, available fonts
 - Hash fingerprint → Store with session
 **2. IP Address Validation**:
 - Whitelist known IPs for the user
 - Alert on new IP + require MFA
 - Block suspicious IPs (VPN, Tor)
 ## Data Protection / Bảo vệ Dữ liệu
 **EN: Encryption Strategy**:
 **1. Data at Rest**:
 - PII: AES-256-GCM encryption
@@ -279,22 +204,9 @@ graph TD
 - Rotate keys quarterly
 - Never hardcode secrets
-**VI: Encryption Strategy**:
+## Compliance & Audit
-**1. Data at Rest**:
+**Compliance Requirements**:
 - PII: AES-256-GCM encryption
 - Passwords: bcrypt (cost 12)
 - Tokens: SHA-256 hash
 - Keys: Environment variables + K8s secrets
 **2. Data in Transit**:
 - TLS 1.2+ for all communication
 - HTTPS enforcement
 - Certificate pinning (mobile clients)
 ## Compliance & Audit / Tuân thủ & Kiểm toán
 **EN: Compliance Requirements**:
 **1. GDPR**:
 - Right to erasure (soft delete + hard delete after 90 days)
@@ -308,7 +220,6 @@ graph TD
 - Audit logging (7-year retention)
 - Incident response plan
 **3. Audit Trail**:
 ```typescript
 // Event sourcing for all auth events
 {
@@ -321,27 +232,338 @@ graph TD
 }
 ```
-**VI: Compliance Requirements**:
+## System Context
-**1. GDPR**:
+```mermaid
- Right to erasure (soft delete + hard delete after 90 days)
+C4Context
- Data portability (export user data)
+    title Security Architecture Context
- Consent management
+    
- Breach notification (72 hours)
+    Person(user, "User", "End user accessing platform")
    Person(admin, "Admin", "System administrator")
    Person(attacker, "Attacker", "Potential threat actor")
    System(iam, "IAM Service", "Authentication & Authorization")
    System_Ext(db, "Neon PostgreSQL", "Encrypted user credentials & sessions")
    System_Ext(cache, "Redis", "Permission & session cache")
    System_Ext(audit, "Audit Service", "Security event logging")
    System_Ext(mfa, "MFA Provider", "TOTP verification")
    System_Ext(monitoring, "Security Monitoring", "SIEM & alerting")
    Rel(user, iam, "Authenticates", "HTTPS + TLS 1.2+")
    Rel(admin, iam, "Manages permissions", "HTTPS + TLS 1.2+")
    Rel(attacker, iam, "Blocked by security layers", "")
    Rel(iam, db, "Stores credentials", "PostgreSQL + TLS")
    Rel(iam, cache, "Caches permissions", "Redis + TLS")
    Rel(iam, audit, "Logs security events", "Kafka")
    Rel(iam, mfa, "Verifies MFA", "HTTPS")
    Rel(iam, monitoring, "Sends security metrics", "Prometheus + Loki")
 ```
-**2. SOC2**:
+**Context Description**:
- Access controls (RBAC)
+- **IAM Service**: Central authentication and authorization
- Encryption at rest and in transit
+- **Database**: Stores encrypted credentials, sessions, permissions
- Audit logging (7-year retention)
+- **Cache**: Caches permissions and sessions to reduce database load
- Incident response plan
+- **Audit Service**: Receives and stores all security events
 - **MFA Provider**: External TOTP verification service (Google Authenticator compatible)
 - **Security Monitoring**: SIEM (Security Information and Event Management) and alerting
-## Related Documentation / Tài liệu Liên quan
+## Database Architecture
- [System Design](./system-design.md) - Overall architecture
+```mermaid
- [IAM Architecture](./iam-proposal.md) - IAM service implementation
+erDiagram
- [Event-Driven Architecture](./event-driven-architecture.md) - Audit event streaming
+    User ||--o{ Session : has
    User ||--o{ UserRole : has
    User ||--o{ UserPermission : has
    User ||--o{ MFADevice : has
    User ||--o{ LoginHistory : has
    User ||--o{ DeviceFingerprint : has
    Role ||--o{ UserRole : assigned_to
    Role ||--o{ RolePermission : has
    Permission ||--o{ RolePermission : granted_to
    Permission ||--o{ UserPermission : granted_to
    Organization ||--o{ User : contains
    Organization ||--o{ Role : defines
    User {
        string id PK "CUID"
        string email UK "Unique, indexed"
        string passwordHash "bcrypt cost 12"
        string organizationId FK
        boolean mfaEnabled "MFA required?"
        datetime lastLoginAt "Tracking"
        datetime createdAt "Timestamp"
        datetime updatedAt "Timestamp"
        datetime deletedAt "Soft delete"
    }
    Session {
        string id PK "CUID"
        string userId FK
        string refreshTokenHash "SHA-256"
        string deviceFingerprint "Hashed"
        string ipAddress "IPv4/IPv6"
        string userAgent "Browser info"
        datetime expiresAt "7 days TTL"
        datetime lastActivityAt "Tracking"
        datetime createdAt "Timestamp"
    }
    Role {
        string id PK "CUID"
        string name "role-name"
        string organizationId FK
        int hierarchy "Priority level"
        boolean isSystem "Built-in?"
        datetime createdAt "Timestamp"
    }
    Permission {
        string id PK "CUID"
        string resource "users, roles, etc"
        string action "create, read, update, delete"
        string scope "own, org, global"
        datetime createdAt "Timestamp"
    }
    MFADevice {
        string id PK "CUID"
        string userId FK
        string type "totp, backup"
        string secret "Encrypted TOTP secret"
        boolean verified "Verified?"
        datetime lastUsedAt "Tracking"
        datetime createdAt "Timestamp"
    }
    LoginHistory {
        string id PK "CUID"
        string userId FK
        boolean success "Success/Failure"
        string ipAddress "IPv4/IPv6"
        string deviceFingerprint "Hashed"
        string failureReason "If failed"
        datetime timestamp "Event time"
    }
    DeviceFingerprint {
        string id PK "CUID"
        string userId FK
        string fingerprint "Hashed"
        boolean trusted "Auto-approved?"
        datetime firstSeenAt "First use"
        datetime lastSeenAt "Last use"
    }
 ```
---
+**Description**:
 - **User**: Stores hashed credentials, MFA settings, organization membership
 - **Session**: Stores hashed refresh tokens, device fingerprint, IP tracking
 - **Role & Permission**: RBAC hierarchy with system roles and custom roles
 - **MFADevice**: TOTP secrets (encrypted), backup codes
 - **LoginHistory**: Audit trail for all login attempts (success/failure)
 - **DeviceFingerprint**: Trusted device tracking for zero-trust model
-**Last Updated**: 2024-01-15  
+**Database Security**:
-**Authors**: GoodGo Security Team
+- Password hashes: bcrypt with cost factor 12
 - Token hashes: SHA-256
 - MFA secrets: AES-256-GCM encryption
 - Soft deletes: `deletedAt` field, hard delete after 90 days (GDPR)
 - Indexes: email (unique), userId (foreign keys), timestamps
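The soft-delete rule above (mark with `deletedAt`, hard delete after 90 days) comes down to a date comparison. The sketch below shows one way to select rows that are due for purging; `SoftDeletable`, `isDueForHardDelete`, and `selectForPurge` are hypothetical names, and the actual deletion query is left out.

```typescript
// Sketch only: type and function names are hypothetical; the database call is omitted.
const HARD_DELETE_AFTER_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface SoftDeletable {
  id: string;
  deletedAt: Date | null; // null means the record is still active
}

function isDueForHardDelete(record: SoftDeletable, now: Date): boolean {
  if (record.deletedAt === null) return false;
  const elapsedMs = now.getTime() - record.deletedAt.getTime();
  return elapsedMs >= HARD_DELETE_AFTER_DAYS * MS_PER_DAY;
}

// Returns the ids a scheduled purge job would hard delete at `now`.
function selectForPurge(records: SoftDeletable[], now: Date): string[] {
  return records.filter((record) => isDueForHardDelete(record, now)).map((record) => record.id);
}

const now = new Date(Date.UTC(2026, 0, 7));
const users: SoftDeletable[] = [
  { id: 'active', deletedAt: null },
  { id: 'recent', deletedAt: new Date(Date.UTC(2025, 11, 1)) }, // 37 days before `now`
  { id: 'expired', deletedAt: new Date(Date.UTC(2025, 8, 1)) }, // 128 days before `now`
];
console.log(selectForPurge(users, now)); // [ 'expired' ]
```

Passing `now` in explicitly keeps the rule testable and avoids differences between the purge job's clock and the database's.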
 ## Design Decisions
 ### Decision 1: JWT with RS256 (Asymmetric)
 **Context**: Need stateless authentication with ability to verify tokens in multiple services
 **Decision**: Use JWT with RS256 (RSA asymmetric signing) instead of HS256 (HMAC symmetric)
 **Consequences**:
 - ✅ **Positive**:
  - Services can verify tokens with the public key and never need the private key
  - Easier key rotation (only the new public key has to be distributed)
  - Higher security (private key only in IAM service)
  - Compliance: Clear audit trail of who signs tokens
 - ❌ **Negative**:
  - Slower than HS256, since RSA signing and verification cost more than an HMAC
  - More complex key management
  - The private key must be carefully protected
 **Alternatives**: HS256 (symmetric), EdDSA, OAuth 2.0 with Opaque Tokens
 ### Decision 2: Zero-Trust Model with Device Fingerprinting
 **Context**: Need to protect against credential theft, session hijacking, and unauthorized access
 **Decision**: Implement zero-trust model with device fingerprinting, IP validation, behavioral analysis
 **Consequences**:
 - ✅ **Positive**:
  - Detect anomalies (new device, new IP, unusual behavior)
  - Increased security by detecting and blocking suspicious activities
  - Compliance: SOC2, ISO27001 requirements
  - User experience: Auto-approve trusted devices
 - ❌ **Negative**:
  - Higher complexity
  - Potential false positives (legitimate users blocked)
  - Performance overhead (fingerprint hash, IP check)
  - Privacy concerns (tracking devices, IPs)
 **Alternatives**: Basic authentication only, IP whitelist only, MFA required for all
 ### Decision 3: Event Sourcing for Audit Trail
 **Context**: Need immutable audit trail for compliance (GDPR, SOC2, HIPAA) and security forensics
 **Decision**: Use event sourcing pattern to store all auth/security events
 **Consequences**:
 - ✅ **Positive**:
  - Immutable audit trail (cannot modify/delete)
  - Complete history of all security events
  - Compliance: GDPR, SOC2 (7-year audit log retention), HIPAA
  - Security forensics: Trace back attacks, breaches
  - Replay events to reconstruct state
 - ❌ **Negative**:
  - High storage cost (retain 7 years)
  - Complexity in event schema versioning
  - Performance: Event publishing overhead
  - Data privacy: Must anonymize PII after retention period
 **Alternatives**: Database audit logs only, External SIEM only, No audit trail
 ## Performance Characteristics
 | Metric | Target | Notes |
 |--------|--------|-------|
 | **Login Time (P95)** | < 500ms | Including bcrypt verification |
 | **Login Time (P99)** | < 1s | Peak load |
 | **Token Generation (P95)** | < 50ms | JWT sign with RS256 |
 | **Token Verification (P95)** | < 10ms | JWT verify with public key |
 | **Permission Check (P95)** | < 5ms | From cache (L1 or L2) |
 | **Permission Check (Cache Miss)** | < 50ms | Database query |
 | **MFA Verification (P95)** | < 100ms | TOTP validation |
 | **Session Lookup (P95)** | < 10ms | Redis cache |
 | **Password Hash (P95)** | < 200ms | bcrypt cost 12 |
 | **Device Fingerprint Hash** | < 5ms | SHA-256 |
 | **Failed Login Rate Limit** | 5 attempts / 15min | Per user |
 | **Auth Throughput** | 500 req/s | Per IAM instance |
 **Performance Optimizations**:
 - **Permission Caching**: L1 (memory) + L2 (Redis), TTL 5 minutes
 - **Token Caching**: Cache public key in memory for JWT verification
 - **Connection Pooling**: Reuse database connections
 - **Async Operations**: Event publishing, audit logging (fire-and-forget)
 - **Rate Limiting**: Prevent brute force attacks, reduce load
 - **Horizontal Scaling**: Multiple IAM service instances
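 The failed-login limit in the table (5 attempts per 15 minutes, per user) could be enforced with a sliding window like the one below. This is a sketch under assumptions: `FailedLoginLimiter` is a hypothetical name, the limit is read as "block once five failures fall inside the window", and state is kept in process memory. With several IAM instances the counters would have to live in a shared store such as Redis.
 ```typescript
 const MAX_ATTEMPTS = 5;
 const WINDOW_MS = 15 * 60 * 1000;
 class FailedLoginLimiter {
   private readonly failures = new Map<string, number[]>();
   // Record a failed attempt at time `now` (milliseconds since epoch).
   recordFailure(userId: string, now: number): void {
     const recent = this.recentFailures(userId, now);
     recent.push(now);
     this.failures.set(userId, recent);
   }
   // A user is blocked once MAX_ATTEMPTS failures fall inside the window.
   isBlocked(userId: string, now: number): boolean {
     return this.recentFailures(userId, now).length >= MAX_ATTEMPTS;
   }
   // Keep only the failures that are still inside the sliding window.
   private recentFailures(userId: string, now: number): number[] {
     const all = this.failures.get(userId) ?? [];
     return all.filter((timestamp) => now - timestamp < WINDOW_MS);
   }
 }
 ```
 Passing `now` explicitly keeps the class deterministic in tests; callers would normally pass `Date.now()`.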
 ## Deployment
 ```mermaid
 graph TD
    subgraph "Security Layer"
        LB[Load Balancer<br/>TLS Termination]
        WAF[WAF / Firewall<br/>Rate Limiting<br/>DDoS Protection]
    end
    subgraph "IAM Service Layer"
        IAM1[IAM Service Pod 1<br/>Stateless]
        IAM2[IAM Service Pod 2<br/>Stateless]
        IAM3[IAM Service Pod 3<br/>Stateless]
    end
    subgraph "Data Layer"
        DB[(Neon PostgreSQL<br/>Encrypted at Rest)]
        Cache[(Redis Cluster<br/>TLS Enabled)]
        Vault[Secrets Manager<br/>K8s Secrets]
    end
    subgraph "Security Monitoring"
        SIEM[SIEM / Security Monitoring]
        Alerts[Alerting System]
    end
    Client[Clients] --> LB
    LB --> WAF
    WAF --> IAM1
    WAF --> IAM2
    WAF --> IAM3
    IAM1 --> DB
    IAM1 --> Cache
    IAM1 --> Vault
    IAM2 --> DB
    IAM2 --> Cache
    IAM2 --> Vault
    IAM3 --> DB
    IAM3 --> Cache
    IAM3 --> Vault
    IAM1 -.->|Security Events| SIEM
    IAM2 -.->|Security Events| SIEM
    IAM3 -.->|Security Events| SIEM
    SIEM -.->|Alerts| Alerts
    style LB fill:#d4edda
    style WAF fill:#fff3cd
    style DB fill:#f0e1ff
    style Cache fill:#fff4e1
    style Vault fill:#f8d7da
    style SIEM fill:#e1f5ff
 ```
 **Deployment Strategy**:
 **Security Deployment**:
 - **TLS 1.2+ Enforcement**: All connections require TLS
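 - **Network Policies (K8s)**: Deny all traffic by default; allow only explicitly listed services
 - **Pod Security Standards**: Non-root user, read-only filesystem, no privilege escalation (enforced through Pod Security Admission; PodSecurityPolicy was removed in Kubernetes 1.25)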
 - **Secrets Management**: Kubernetes secrets with encryption at rest
 - **Image Scanning**: Trivy/Clair scan before deployment
 - **RBAC (K8s)**: Least privilege for service accounts
 **Resource Allocation**:
 | Component | CPU | Memory | Replicas |
 |-----------|-----|--------|----------|
 | **IAM Service** | 500m | 1GB | 3-10 (HPA) |
 | **Redis** | 1 core | 2GB | 3 masters + 3 slaves |
 **Security Configuration**:
 ```yaml
 # K8s Network Policy
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
 metadata:
  name: iam-service-policy
 spec:
  podSelector:
    matchLabels:
      app: iam-service
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - protocol: TCP
      port: 5000
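  egress:
  # Database traffic. If PostgreSQL is external (e.g. Neon), a podSelector cannot match it;
  # an ipBlock or egress-gateway rule would be needed instead.
  - to:
    - podSelector:
        matchLabels:
          app: postgresql
    ports:
    - protocol: TCP
      port: 5432
  # Assumption: Redis runs in-cluster with the label app=redis on its default port.
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
  # DNS lookups. Once Egress is restricted, name resolution is blocked unless allowed here.
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53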
 ```
--- a/docs/en/guides/deployment.md
+++ b/docs/en/guides/deployment.md
@@ -1,106 +1,234 @@
 # Deployment Guide
 > **Note**: This guide covers deployment strategies for GoodGo Microservices Platform across Local, Staging, and Production environments using Kubernetes and Neon PostgreSQL.
 ## Table of Contents
 1. [Deployment Architecture](#deployment-architecture)
 2. [Prerequisites](#prerequisites)
 3. [Database Setup (Neon)](#database-setup-neon)
 4. [Local Deployment](#local-deployment)
 5. [CI/CD Pipeline](#cicd-pipeline)
 6. [Staging Deployment](#staging-deployment)
 7. [Production Deployment](#production-deployment)
 8. [Scaling & Resilience](#scaling--resilience)
 9. [Rollback Procedures](#rollback-procedures)
 ---
 ## Deployment Architecture
 ```mermaid
 graph TD
    subgraph "CI/CD Pipeline (GitHub Actions)"
        Code[Code Push] --> Test[Run Tests]
        Test --> Build[Build Docker Image]
        Build --> Registry[Push to Registry]
        Registry --> Deploy[Deploy to K8s]
    end
    subgraph "Infrastructure (Kubernetes)"
        Ingress[Traefik Ingress] --> Service[K8s Service]
        Service --> Pods[Application Pods]
        Pods --> Secrets[K8s Secrets]
    end
    subgraph "External Services"
        Pods --> Neon[(Neon PostgreSQL)]
        Pods --> Redis[(Redis Cloud)]
    end
    Deploy --> Ingress
 ```
 ---
 ## Prerequisites
 Before deploying, ensure you have:
 *   **Tools**: `kubectl`, `helm`, `docker` installed.
 *   **Access**:
    *   Kubernetes Cluster (EKS/GKE/DigitalOcean).
    *   Container Registry (GHCR/DockerHub).
    *   Neon Console Account.
 *   **Configuration**:
    *   `KUBECONFIG` file set up.
    *   GitHub Secrets configured for CI/CD.
 ---
 ## Database Setup (Neon)
-All environments use **Neon PostgreSQL**. Setup once before deployment:
+We use **Neon Serverless PostgreSQL** for all environments to leverage branching and auto-scaling.
-1. Create Neon project at https://neon.tech
+1.  **Create Project**: Log in to [neon.tech](https://neon.tech) and create a project `goodgo-platform`.
-2. Create branches: `main` (dev), `staging`, `production`
+2.  **Create Branches**:
-3. Get connection strings for each branch
+    *   `main` -> For Development/Local.
-4. Configure in environment variables (see below)
+    *   `staging` -> For Staging environment.
    *   `production` -> For Production environment (Protected).
 3.  **Get Connection Strings**:
    *   Note the connection string for each branch (Pooler mode recommended).
-See [Neon Setup Guide](../../infra/databases/neon/README.md) for details.
+---
 ## Local Deployment
-```bash
+For local development, we use Docker Compose.
 # Setup Neon database URL
 cp deployments/local/env.local.example deployments/local/.env.local
 # Edit .env.local and add your Neon DATABASE_URL
-# Start services (no PostgreSQL container needed)
+```bash
 # 1. Setup Environment
 cp deployments/local/env.local.example deployments/local/.env.local
 # Edit .env.local with Neon `main` branch connection string
 # 2. Start Infrastructure (Redis, Traefik, etc.)
 cd deployments/local
 docker-compose up -d
 # 3. Start Services (Hot-reload)
 pnpm dev
 ```
 ---
 ## CI/CD Pipeline
 We use GitHub Actions for automated deployments.
 | Workflow | Trigger | Description |
 | :--- | :--- | :--- |
 | `ci-check.yml` | Pull Request | Runs unit tests, linting, and build check. |
 | `deploy-staging.yml` | Push to `develop` | Build image -> Deploy to Staging Namespace. |
 | `deploy-prod.yml` | Release / Tag | Build image -> Deploy to Production Namespace. |
 ### Secrets Configuration (GitHub)
 Set these secrets in your repository settings:
 *   `NEON_DATABASE_URL_STAGING`: Connection string for staging branch.
 *   `NEON_DATABASE_URL_PRODUCTION`: Connection string for production branch.
 *   `KUBECONFIG_STAGING`: Base64 encoded kubeconfig for staging.
 *   `KUBECONFIG_PRODUCTION`: Base64 encoded kubeconfig for production.
 *   `DOCKER_REGISTRY_TOKEN`: For pushing images.
 ---
 ## Staging Deployment
-### Prerequisites
+Staging mirrors production but uses cost-effective resources.
 - Kubernetes cluster access
 - kubectl configured
 - KUBECONFIG set
 - Neon staging branch created
 - GitHub Secrets configured:
  - `NEON_DATABASE_URL_STAGING`
  - `KUBECONFIG_STAGING`
-### Setup Secrets
+### Manual Deployment
 ```bash
-# Create Kubernetes secret
+# 1. Create Secrets
 kubectl create secret generic iam-service-secrets \
-  --from-literal=database-url='postgresql://user:pass@ep-xxx.region.neon.tech/dbname?sslmode=require&pgbouncer=true' \
+  --from-literal=database-url='<STAGING_NEON_URL>' \
-  --from-literal=jwt-secret='your-staging-jwt-secret' \
+  --from-literal=jwt-secret='<RANDOM_SECRET>' \
  --from-literal=jwt-refresh-secret='<RANDOM_SECRET>' \
  -n staging
 # 2. Apply Manifests
 kubectl apply -f deployments/staging/kubernetes/ -n staging
 # 3. Verify
 kubectl get pods -n staging
 ```
-### Deploy
+### Via CI/CD
-```bash
+Push code to `develop` branch. The action will:
-./scripts/deploy/deploy-staging.sh
+1.  Run tests.
-```
+2.  Run `prisma migrate deploy` against Staging DB.
 3.  Update Kubernetes deployment image.
-Or manually:
+---
 ```bash
 kubectl apply -f deployments/staging/kubernetes/
 ```
 **Note**: Migrations run automatically in CI/CD before deployment.
 ## Production Deployment
-### Prerequisites
+Production uses high-availability configurations.
 - Production Kubernetes cluster
 - kubectl configured with production context
 - Neon production branch created
 - GitHub Secrets configured:
  - `NEON_DATABASE_URL_PRODUCTION`
  - `KUBECONFIG_PRODUCTION`
-### Setup Secrets
+### 1. Database Preparation
 *   Ensure Production branch in Neon is **protected**.
 *   Configure **Point-in-Time Recovery (PITR)** window (e.g., 7 days).
 ### 2. Manual Deployment Steps
 ```bash
-# Create Kubernetes secret
+# 1. Create Namespace
 kubectl create namespace production
 # 2. Create Sealed Secrets (Recommended) or Standard Secrets
 kubectl create secret generic iam-service-secrets \
-  --from-literal=database-url='postgresql://user:pass@ep-xxx.region.neon.tech/dbname?sslmode=require&pgbouncer=true' \
+  --from-literal=database-url='<PROD_NEON_URL>' \
-  --from-literal=jwt-secret='your-production-jwt-secret' \
+  --from-literal=jwt-secret='<SECURE_RANDOM_SECRET>' \
-  --from-literal=jwt-refresh-secret='your-production-refresh-secret' \
+  --from-literal=jwt-refresh-secret='<SECURE_RANDOM_SECRET>' \
  -n production
 # 3. Deploy
 kubectl apply -f deployments/production/kubernetes/ -n production
 ```
-### Deploy
+### 3. Verification
 ```bash
-./scripts/deploy/deploy-prod.sh
+# Check Rollout Status
 kubectl rollout status deployment/iam-service -n production
 # Check Logs
 kubectl logs -l app=iam-service -n production
 ```
-**Note**: Migrations run automatically in CI/CD before deployment (with approval).
+---
-### Rollback
+## Scaling & Resilience
 ### Horizontal Pod Autoscaler (HPA)
 We use HPA to scale pods automatically. The example below scales on CPU utilization; a memory metric can be added in the same way.
 ```yaml
 # Example HPA Config
 apiVersion: autoscaling/v2
 kind: HorizontalPodAutoscaler
 metadata:
  name: iam-service-hpa
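 spec:
  # Required: the workload this HPA scales
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: iam-service
  minReplicas: 2
  maxReplicas: 10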
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
 ```
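 As a rough guide to how this config behaves, the Kubernetes HPA documentation gives the scaling rule as `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, bounded by `minReplicas` and `maxReplicas`. The sketch below applies that rule to the values above. `HpaConfig` and `desiredReplicas` are illustrative names, and the controller's tolerance band and stabilization window are deliberately ignored.
 ```typescript
 interface HpaConfig {
   minReplicas: number;
   maxReplicas: number;
   targetUtilization: number; // percent, e.g. 70
 }
 // Apply the documented HPA rule, then clamp to the configured replica bounds.
 function desiredReplicas(current: number, observedUtilization: number, config: HpaConfig): number {
   const raw = Math.ceil((current * observedUtilization) / config.targetUtilization);
   return Math.min(config.maxReplicas, Math.max(config.minReplicas, raw));
 }
 ```
 For example, three pods averaging 90% CPU against a 70% target give `ceil(3 × 90 / 70) = 4` replicas, while three pods at 35% give `ceil(1.5) = 2`.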
 ### Zero-Downtime Deployment
 Kubernetes handles this via Rolling Updates.
 *   **`maxSurge`**: 25% (new pods are added before old ones are removed).
 *   **`maxUnavailable`**: 0 (no pod is taken down until its replacement is ready).
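 To see what these two settings mean for a given replica count, the arithmetic can be sketched as below. Kubernetes rounds a percentage surge up and a percentage unavailability down; `rollingUpdateBounds` is a hypothetical helper written for illustration, not part of any Kubernetes client.
 ```typescript
 interface RollingUpdateBounds {
   maxPodsDuringUpdate: number;
   minAvailablePods: number;
 }
 // Percentages are converted to pod counts: surge rounds up, unavailable rounds down.
 function rollingUpdateBounds(
   replicas: number,
   maxSurgePercent: number,
   maxUnavailablePercent: number,
 ): RollingUpdateBounds {
   const surge = Math.ceil((replicas * maxSurgePercent) / 100);
   const unavailable = Math.floor((replicas * maxUnavailablePercent) / 100);
   return {
     maxPodsDuringUpdate: replicas + surge,
     minAvailablePods: replicas - unavailable,
   };
 }
 ```
 With three replicas, a 25% surge, and 0% unavailable, at most four pods run during the rollout and at least three stay available, so the cluster needs headroom for one extra pod.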
 ---
 ## Rollback Procedures
 If a deployment fails or introduces a critical bug:
 ### Kubernetes Rollback
 ```bash
 # Undo last deployment
 kubectl rollout undo deployment/iam-service -n production
 # Undo to specific revision
 kubectl rollout undo deployment/iam-service -n production --to-revision=2
 ```
-## Health Checks
+### Database Rollback
- Liveness: `GET /health/live`
+Since Neon supports branching and PITR:
- Readiness: `GET /health/ready`
+1.  Go to Neon Console.
- Health: `GET /health`
+2.  Restore the `production` branch to a timestamp before the bad migration.
-
+3.  **Warning**: Restoring discards every transaction committed after the chosen timestamp, so use it with caution.
 ## Monitoring
 - Prometheus: http://prometheus:9090
 - Grafana: http://grafana:3000
 - Traefik Dashboard: http://traefik:8080
--- a/docs/en/guides/development.md
+++ b/docs/en/guides/development.md
@@ -1,111 +1,211 @@
 # Development Guide
 > **Note**: This guide provides comprehensive standards and workflows for contributing to the GoodGo Microservices Platform.
 ## Table of Contents
 1. [Project Structure](#project-structure)
 2. [Code Standards](#code-standards)
 3. [Git Workflow](#git-workflow)
 4. [Backend Development](#backend-development)
 5. [Testing Strategy](#testing-strategy)
 6. [Database Workflow](#database-workflow)
 7. [Kubernetes Deployment](#kubernetes-deployment)
 ---
 ## Project Structure
 We follow a strict monorepo structure managed by PNPM Workspaces.
 ```
-├── apps/              # Frontend applications
+Base/
-├── services/          # Backend microservices
+├── apps/                 # Frontend applications
-├── packages/          # Shared libraries
+│   ├── web-client/       # Next.js 14+ (App Router)
-├── infra/             # Infrastructure configs
+│   └── mobile-client/    # Flutter
-├── deployments/       # Deployment configs
+├── services/             # Backend microservices
-├── scripts/           # Automation scripts
+│   ├── _template/        # Template for new services
-└── docs/              # Documentation
+│   ├── iam-service/      # Identity & Access Management
 │   └── ...
 ├── packages/             # Shared libraries
 │   ├── logger/           # Structured logging (Winston)
 │   ├── types/            # Shared DTOs & Interfaces
 │   ├── http-client/      # Internal Service Client
 │   └── tracing/          # OpenTelemetry configuration
 ├── infra/                # Infrastructure-as-Code
 │   ├── traefik/          # API Gateway
 │   └── databases/        # Database setup scripts
 └── docs/                 # Documentation (EN & VI)
 ```
-## Development Workflow
+---
-### 1. Create a Feature Branch
+## Code Standards
 ### Naming Conventions
 *   **Files**: lowercase `kebab-case` names with a dot-separated role suffix (e.g., `user.controller.ts`, `app.config.ts`)
 *   **Classes**: `PascalCase` (e.g., `UserController`, `AuthService`)
 *   **Functions/Variables**: `camelCase` (e.g., `getUserById`, `isValid`)
 *   **Constants**: `UPPER_SNAKE_CASE` (e.g., `MAX_RETRIES`, `DEFAULT_TIMEOUT`)
 *   **Interfaces**: `PascalCase` (e.g., `User`, `CreateUserDto`) - *No 'I' prefix*
 ### Bilingual Comments
 For core logic and public APIs, write comments in both English and Vietnamese, because both international and Vietnamese developers read the code.
 ```typescript
 /**
 * EN: Validates user credentials and returns a token
 * VI: Xác thực thông tin người dùng và trả về token
 */
 async login(dto: LoginDto): Promise<TokenResponse> { ... }
 ```
 ### TypeScript Usage
 *   **Strict Mode**: Enabled in `tsconfig.json`. No `any` allowed (use `unknown` if needed).
 *   **DTOs**: Use Zod for runtime validation and type inference.
 *   **Return Types**: Explicitly declare return types for all public methods.
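 The "`unknown` instead of `any`" rule can look like this in practice. The sketch narrows an untyped JSON payload with a type guard and declares every return type explicitly; `HealthResponse`, `isHealthResponse`, and `parseHealth` are hypothetical names used only for illustration.
 ```typescript
 interface HealthResponse {
   status: string;
   uptimeSeconds: number;
 }
 // Type guard: narrows `unknown` to HealthResponse only after runtime checks pass.
 function isHealthResponse(value: unknown): value is HealthResponse {
   if (typeof value !== "object" || value === null) return false;
   const record = value as Record<string, unknown>;
   return typeof record["status"] === "string" && typeof record["uptimeSeconds"] === "number";
 }
 // JSON.parse returns `any`; assigning it to `unknown` forces the check below.
 function parseHealth(json: string): HealthResponse {
   const parsed: unknown = JSON.parse(json);
   if (!isHealthResponse(parsed)) {
     throw new Error("Unexpected health payload");
   }
   return parsed;
 }
 ```
 For DTOs, Zod performs the same narrowing through `parse`, so hand-written guards like this are mainly useful for small internal shapes.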
 ---
 ## Git Workflow
 ### Branching Strategy
 *   `main`: Production-ready code.
 *   `develop`: Integration branch for next release.
 *   `feature/xyz`: New features (branch off `develop`).
 *   `fix/xyz`: Bug fixes (branch off `develop`).
 *   `hotfix/xyz`: Critical fixes (branch off `main`).
 ### Commit Messages
 We follow [Conventional Commits](https://www.conventionalcommits.org/):
 ```
 feat(iam): add multi-factor authentication
 fix(db): correct unique constraint on email
 docs(guide): update development setup
 style: format code with prettier
 refactor: simplify auth middleware
 test: add unit tests for user service
 chore: update dependencies
 ```
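 A commit-message check for the format above could be sketched as a single regular expression. This is an illustration rather than the project's actual hook: `COMMIT_TYPES` only covers the types used in the examples, and the full Conventional Commits specification allows more (for instance a `!` marker for breaking changes).
 ```typescript
 // Commit types taken from the examples above; extend the list as needed.
 const COMMIT_TYPES = ["feat", "fix", "docs", "style", "refactor", "test", "chore"] as const;
 // Expected shape of the subject line: <type>(<optional-scope>): <description>
 const COMMIT_PATTERN = new RegExp(`^(${COMMIT_TYPES.join("|")})(\\([a-z0-9-]+\\))?: .+$`);
 // Validate only the first line (subject) of a commit message.
 function isConventionalCommit(subject: string): boolean {
   return COMMIT_PATTERN.test(subject);
 }
 ```
 In practice a tool such as commitlint is the usual way to enforce this; the function only shows what the rule checks.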
 ---
 ## Backend Development
 ### Creating a New API Endpoint
 1.  **Define DTO** (`modules/user/user.dto.ts`):
    ```typescript
    import { z } from 'zod';
    export const CreateUserDto = z.object({
      email: z.string().email(),
      name: z.string().min(2),
    });
    export type CreateUserDto = z.infer<typeof CreateUserDto>;
    ```
 2.  **Create Service Method** (`modules/user/user.service.ts`):
    *   Implement business logic.
    *   Use `BaseRepository`.
    *   Throw an `HttpError` subclass (e.g., `NotFoundError`, `ConflictError`).
 3.  **Create Controller** (`modules/user/user.controller.ts`):
    *   Parse body with DTO: `const dto = CreateUserDto.parse(req.body)`.
    *   Call service.
    *   Return success response: `res.json({ success: true, data: result })`.
 4.  **Register Route** (`modules/user/index.ts`):
    *   Add to Express router with middlewares.
 ### Error Handling
 Always use the custom error classes from `core/errors`:
 ```typescript
 import { NotFoundError, ConflictError } from '../../core/errors';
 if (!user) {
  throw new NotFoundError('User not found');
 }
 ```
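 The contents of `core/errors` are not shown in this guide, so the following is only a guess at how such classes might be shaped and how a central handler could map them to responses. `HttpError`, its `statusCode` field, and `toErrorResponse` are assumptions made for this sketch.
 ```typescript
 // Hypothetical base class; the real `core/errors` module may differ.
 class HttpError extends Error {
   readonly statusCode: number;
   constructor(statusCode: number, message: string) {
     super(message);
     // Restore the prototype chain so `instanceof` works on older compile targets.
     Object.setPrototypeOf(this, new.target.prototype);
     this.statusCode = statusCode;
     this.name = new.target.name;
   }
 }
 class NotFoundError extends HttpError {
   constructor(message: string) {
     super(404, message);
   }
 }
 class ConflictError extends HttpError {
   constructor(message: string) {
     super(409, message);
   }
 }
 interface ErrorResponse {
   status: number;
   body: { success: false; message: string };
 }
 // Known errors keep their status and message; anything else becomes a generic 500.
 function toErrorResponse(error: unknown): ErrorResponse {
   if (error instanceof HttpError) {
     return { status: error.statusCode, body: { success: false, message: error.message } };
   }
   return { status: 500, body: { success: false, message: "Internal server error" } };
 }
 ```
 Mapping unknown errors to a generic message avoids leaking internal details to API clients.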
 ---
 ## Testing Strategy
 ### Unit Tests (`*.test.ts`)
 *   **Scope**: Individual classes/functions.
 *   **Mocking**: Mock all external dependencies (DB, other services) using `jest-mock-extended`.
 *   **Location**: Co-located with source files.
 *   **Run**: `pnpm test`
 ### E2E Tests (`tests/**/*.e2e.ts`)
 *   **Scope**: Full API flows (Controller -> Service -> DB).
 *   **Database**: Use a separate test database (Dockerized).
 *   **Run**: `pnpm test:e2e`
 ### Linting & Formatting
 *   **Lint**: `pnpm lint` (ESLint)
 *   **Format**: `pnpm format` (Prettier)
 *   **Typecheck**: `pnpm typecheck` (TSC)
 ---
 ## Database Workflow
 We use **Prisma** with **Neon PostgreSQL**.
 ### Migrations
 1.  Modify `prisma/schema.prisma`.
 2.  Create migration (Dev):
    ```bash
    ./scripts/db/migrate.sh iam-service dev --name add_user_profile
    ```
 3.  Apply to Production (CI/CD):
    ```bash
    ./scripts/db/migrate.sh iam-service deploy
    ```
 ### Seed Data
 Populate database with initial data:
 ```bash
-git checkout -b feature/my-feature
+./scripts/db/seed.sh iam-service
 ```
-### 2. Make Changes
+### Visualizing Data
 - Write code following TypeScript strict mode
 - Add tests for new functionality
 - Update documentation if needed
 ### 3. Run Tests Locally
 Use Prisma Studio:
 ```bash
-# All tests
+pnpm --filter @goodgo/iam-service prisma studio
 pnpm test
 # Specific service
 pnpm --filter @goodgo/iam-service test
 ```
-### 4. Lint and Format
+---
 ```bash
 pnpm lint
 pnpm format
 ```
 ### 5. Create Pull Request
 - Push your branch
 - Create PR targeting `develop`
 - CI/CD will run automatically
 ## Adding a New Service
 1. Use the template:
   ```bash
   ./scripts/utils/create-service.sh my-new-service
   ```
 2. Update service configuration
 3. Implement business logic
 4. Add tests
 5. Update documentation
 ## Adding a New Package
 1. Create package in `packages/new-package`
 2. Add to workspace in `pnpm-workspace.yaml`
 3. Export from `index.ts`
 4. Add tests
 5. Document usage
 ## Database Migrations
 ```bash
 # Create migration (dev)
 ./scripts/db/migrate.sh iam-service dev
 # Apply migrations (production)
 ./scripts/db/migrate.sh iam-service deploy
 ```
 ## Kubernetes Deployment
-### Local Kubernetes (Docker Desktop)
+For local Kubernetes testing (Docker Desktop / Minikube):
 ```bash
-# Enable Kubernetes in Docker Desktop
+# 1. Build images
-# Settings → Kubernetes → Enable Kubernetes
+docker build -t goodgo/iam-service:latest -f services/iam-service/Dockerfile .
-# Deploy service
+# 2. Deploy
 cd deployments/local/kubernetes
 ./deploy.sh
-# Verify deployment
+# 3. Verify
 kubectl get pods -n iam-local
-kubectl logs -f -n iam-local -l app=iam-service
+kubectl logs -f -l app=iam-service -n iam-local
 # Port forward for testing
 kubectl port-forward svc/iam-service 5002:80 -n iam-local
 curl http://localhost:5002/health/live
 ```
-**See detailed guide**: [Kubernetes Local Deployment Guide](./kubernetes-local.md)
+See [Kubernetes Guide](./kubernetes-local.md) for detailed setup.
 ## Debugging
 - Use logger from `@goodgo/logger`
 - Check Traefik logs: `docker logs traefik-local`
 - Check service logs: `./scripts/dev/logs.sh iam-service`
--- a/docs/en/guides/getting-started.md
+++ b/docs/en/guides/getting-started.md
@@ -1,81 +1,214 @@
 # Getting Started
 > **Note**: This guide assumes you are setting up the project on macOS or Linux. Windows users should use WSL2.
 ## Table of Contents
 1. [Prerequisites](#prerequisites)
 2. [Architecture Overview](#architecture-overview)
 3. [Project Structure](#project-structure)
 4. [Installation & Setup](#installation--setup)
 5. [Development Workflow](#development-workflow)
 6. [Common Commands](#common-commands)
 7. [Troubleshooting](#troubleshooting)
 ## Prerequisites
- Node.js >= 20.0.0
+Before starting, ensure you have the following installed:
 - PNPM >= 8.0.0
 - Docker & Docker Compose
 - Git
 - Neon account (https://neon.tech) - for database
-## Initial Setup
+*   **Node.js**: v20.0.0 or higher
    ```bash
    node -v
    # v20.10.0
    ```
 *   **PNPM**: v8.0.0 or higher (we use pnpm workspaces)
    ```bash
    pnpm -v
    # 8.12.0
    ```
 *   **Docker & Docker Compose**: For local infrastructure
    ```bash
    docker -v
    # Docker version 24.0.0
    ```
 *   **Git**: For version control
 *   **Neon Account**: Serverless PostgreSQL (https://neon.tech)
-1. **Clone the repository**
+## Architecture Overview
   ```bash
   git clone <repository-url>
   cd Base
   ```
-2. **Setup Neon Database**
+GoodGo Platform uses a microservices architecture with a shared infrastructure layer.
   ```bash
   # Run setup script
   ./scripts/db/setup-neon.sh
   # Or manually:
   # 1. Create Neon project at https://neon.tech
   # 2. Create branches: main (dev), staging, production
   # 3. Get connection strings
   # 4. Update deployments/local/.env.local
   ```
   See [Neon Setup Guide](../../infra/databases/neon/README.md) for details.
-3. **Initialize the project**
+```mermaid
-   ```bash
+graph TD
-   ./scripts/setup/init-project.sh
+    Client[Client Apps] --> Traefik[Traefik Gateway]
-   ```
+    
    Traefik --> IAM[IAM Service]
    Traefik --> Template[Template Service]
    IAM --> DB[(Neon PostgreSQL)]
    IAM --> Redis[(Redis Cache)]
    IAM --> Kafka[Kafka Events]
    style Traefik fill:#e1f5ff
    style DB fill:#f0e1ff
    style Redis fill:#fff4e1
 ```
-4. **Start infrastructure** (Redis, Traefik - no PostgreSQL needed)
+## Project Structure
   ```bash
   cd deployments/local
   docker-compose up -d
   cd ../..
   ```
-5. **Run database migrations**
+The repository follows a monorepo structure:
   ```bash
   ./scripts/db/migrate.sh iam-service dev
   ```
-6. **Seed the database**
+```
-   ```bash
+Base/
-   ./scripts/db/seed.sh iam-service
+├── apps/                 # Frontend applications
-   ```
+│   ├── web-client/       # Next.js web application
 │   └── mobile-client/    # Flutter mobile application
 ├── services/             # Backend microservices
 │   ├── iam-service/      # Authentication & Authorization
 │   └── _template/        # Template for new services
 ├── packages/             # Shared libraries
 │   ├── logger/           # Structured logging
 │   ├── types/            # Shared TypeScript types
 │   └── http-client/      # Internal HTTP client
 ├── infra/                # Infrastructure configuration
 │   ├── traefik/          # API Gateway config
 │   └── databases/        # Database setup scripts
 ├── deployments/          # Deployment configurations
 │   ├── local/            # Docker Compose for dev
 │   └── k8s/              # Kubernetes manifests
 └── docs/                 # Documentation
 ```
-7. **Start all services**
+## Installation & Setup
   ```bash
   ./scripts/dev/start-all.sh
   ```
-## Access Points
+### 1. Clone the Repository
- **API Gateway**: http://localhost/api/v1
+```bash
- **Auth Service**: http://localhost:5001
+git clone <repository-url>
- **Web Admin**: http://admin.localhost or http://localhost:3000
+cd Base
- **Web Client**: http://localhost or http://localhost:3001
+```
 - **Traefik Dashboard**: http://localhost:8080
-## Database
+### 2. Configure Environment
-This project uses **Neon PostgreSQL** for all environments:
+Each service and the local infrastructure need environment variables. We provide templates for these.
 - **Development**: Neon main branch
 - **Staging**: Neon staging branch
 - **Production**: Neon production branch
-No local PostgreSQL needed! See [Neon Setup](../../infra/databases/neon/README.md) for details.
+```bash
 # Initialize project setup (copies .env.example to .env)
 ./scripts/setup/init-project.sh
 ```
 ### 3. Setup Neon Database
 We use Neon (Serverless PostgreSQL) for all environments (Dev, Staging, Prod).
 1.  Create a project at [neon.tech](https://neon.tech).
 2.  Create a branch named `dev` (or use `main`).
 3.  Get the Connection String from the Neon dashboard.
 4.  Update `deployments/local/.env.local`:
 ```env
 DATABASE_URL="postgres://user:pass@ep-xyz.region.neon.tech/neondb"
 ```
 ### 4. Start Infrastructure
 Start the supporting infrastructure (Redis, Traefik, Observability) using Docker Compose.
 ```bash
 cd deployments/local
 docker-compose up -d
 # Expected output: Containers for traefik, redis, kafka created
 ```
 ### 5. Install Dependencies
 ```bash
 pnpm install
 ```
 ### 6. Setup Database Schema
 Apply the Prisma migrations to your Neon database.
 ```bash
 # Run migrations for IAM service
 pnpm --filter @goodgo/iam-service prisma migrate dev
 ```
 ### 7. Start Services
 Start all backend services in development mode.
 ```bash
 pnpm dev
 # or start specific service
 pnpm --filter @goodgo/iam-service dev
 ```
 ## Development Workflow
 ### Creating a New Service
 1.  Copy the template:
    ```bash
    cp -r services/_template services/my-new-service
    ```
 2.  Update `package.json` name.
 3.  Add logic in `src/modules/`.
 4.  Register in `deployments/local/docker-compose.yml`.
 ### Making Changes
 1.  Create a new branch: `feature/my-feature`.
 2.  Implement changes.
 3.  Run tests: `pnpm test`.
 4.  Commit with conventional commits: `feat(iam): add login endpoint`.
 ## Common Commands
 | Command | Description |
 | :--- | :--- |
 | `pnpm install` | Install all dependencies |
 | `pnpm dev` | Start all services in dev mode |
 | `pnpm build` | Build all packages and services |
 | `pnpm test` | Run unit tests |
 | `pnpm lint` | Lint code |
 | `docker-compose up -d` | Start local infra |
 | `docker-compose down` | Stop local infra |
 ## Troubleshooting
 ### Port Conflicts
 **Error**: `Bind for 0.0.0.0:80 failed: port is already allocated`
 **Solution**: Check what's using port 80 (likely another web server) and stop it, or change Traefik ports in `docker-compose.yml`.
 ```bash
 lsof -i :80
 kill -9 <PID>
 ```
 ### Database Connection Failed
 **Error**: `P1001: Can't reach database server`
 **Solution**:
 1.  Check your internet connection (Neon is cloud-based).
 2.  Verify `DATABASE_URL` in `deployments/local/.env.local`.
 3.  Ensure your IP is allowed in Neon dashboard settings.
 ### Service Not Found in Gateway
 **Error**: `404 Not Found` from api.localhost
 **Solution**:
 1.  Check if service is running.
 2.  Check Traefik dashboard at http://localhost:8080.
 3.  Verify `PathPrefix` labels in `docker-compose.yml`.
 ## Next Steps
- Read [Development Guide](development.md)
+*   [Development Guide](development.md) - Deep dive into coding standards
- Check [API Documentation](../api/openapi/)
+*   [API Documentation](../api/openapi/) - Explore the APIs
- Review [Architecture Overview](../architecture/system-design.md)
+*   [Architecture](../architecture/system-design.md) - Understand the system design
--- a/docs/en/guides/troubleshooting.md
+++ b/docs/en/guides/troubleshooting.md
@@ -1,57 +1,218 @@
 # Troubleshooting Guide
-## Common Issues
+> **Note**: This guide focuses on debugging the GoodGo Microservices Platform in a local development environment (Docker Compose).
-### Database Connection Failed
+## Table of Contents
-**Symptoms**: Service can't connect to database
+1. [General Diagnosis](#general-diagnosis)
 2. [Infrastructure Issues](#infrastructure-issues)
   - [Database (Neon/PostgreSQL)](#database-neonpostgresql)
   - [Redis](#redis)
   - [Traefik Gateway](#traefik-gateway)
 3. [Service Issues](#service-issues)
   - [Service Fails to Start](#service-fails-to-start)
   - [Prisma/Database Errors](#prismadatabase-errors)
   - [Authentication Errors](#authentication-errors)
 4. [Debugging Tools](#debugging-tools)
 5. [FAQ](#faq)
-**Solutions**:
+---
 1. Check if PostgreSQL is running: `docker ps`
 2. Verify DATABASE_URL in .env
 3. Check network connectivity: `docker network ls`
 4. Review logs: `docker logs postgres-auth-local`
-### Port Already in Use
+## General Diagnosis
-**Symptoms**: Service fails to start with port error
+When something goes wrong, follow this checklist:
-**Solutions**:
+1.  **Check Service Status**:
-1. Find process using port: `lsof -i :5001`
+    ```bash
-2. Kill process or change PORT in .env
+    cd deployments/local
-3. Check docker-compose for port conflicts
+    docker-compose ps
    ```
    *All services should be `Up` or `Running`.*
-### Prisma Client Not Generated
+2.  **Check Logs**:
    ```bash
    # View logs for a specific service
    docker-compose logs -f <service-name>
    # View last 100 lines for all
    docker-compose logs --tail=100
    ```
-**Symptoms**: Import errors for Prisma Client
+3.  **Check Connectivity**:
    *   Can you reach the Gateway? `curl http://localhost/health`
    *   Can you reach the Dashboard? http://localhost:8080
 ---
 ## Infrastructure Issues
 ### Database (Neon/PostgreSQL)
 **Problem**: `P1001: Can't reach database server` or `Connection timed out`
 *   **Cause 1**: Internet connectivity issues (Neon is cloud-based).
 *   **Cause 2**: Incorrect `DATABASE_URL` in `.env`.
 *   **Cause 3**: IP address blocked by Neon.
 **Solution**:
 1.  Verify internet connection: `ping neon.tech`.
 2.  Check `deployments/local/.env.local`. The URL should look like:
    `postgres://user:pass@ep-xyz.aws.neon.tech/neondb`
 3.  In the Neon Dashboard, open Settings and either enable "Allow all IPs" or add your current IP.
 **Problem**: `P1003: Database does not exist`
 *   **Reason**: You are connecting to the wrong database name.
 *   **Fix**: Check the database name at the end of your connection string (usually `/neondb`). If you use a custom database name, ensure it exists in Neon.
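 Several of the database problems above come down to a malformed `DATABASE_URL`, so a quick structural check can save time before digging into network issues. The helper below is a hypothetical diagnostic sketch (`inspectDatabaseUrl` is not part of the project) built on the standard `URL` class; it does not contact the database.
 ```typescript
 interface DatabaseUrlReport {
   host: string;
   database: string;
   problems: string[];
 }
 // Parse the connection string and list structural problems without connecting.
 function inspectDatabaseUrl(raw: string): DatabaseUrlReport {
   const url = new URL(raw); // throws a TypeError if the string is not a URL at all
   const problems: string[] = [];
   if (url.protocol !== "postgres:" && url.protocol !== "postgresql:") {
     problems.push(`unexpected scheme "${url.protocol}"`);
   }
   if (url.username === "") problems.push("missing user");
   const database = url.pathname.replace(/^\//, "");
   if (database === "") problems.push("missing database name");
   return { host: url.hostname, database, problems };
 }
 ```
 An empty `problems` array only means the string is well formed; wrong credentials or a blocked IP still surface as `P1001` at connection time.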
 ### Redis
 **Problem**: `Redis connection refused` or `ECONNREFUSED`
 *   **Cause**: Redis container is not running or port mapping is wrong.
 **Solution**:
 1.  Check Redis status: `docker-compose ps redis`.
 2.  Restart Redis: `docker-compose restart redis`.
 3.  Check logs: `docker-compose logs redis`.
 4.  Connection string from services:
    *   **Inside Docker**: `redis:6379`
    *   **From Host**: `localhost:6379`
 ### Traefik Gateway
 **Problem**: `404 Not Found` when accessing APIs (e.g., `http://localhost/api/v1/auth`)
 *   **Cause**: Service is down or Labels are misconfigured.
 **Solution**:
 1.  Check Traefik Dashboard at http://localhost:8080.
    *   Look for "HTTP Routers" and "Services".
    *   If your service is missing, check `docker-compose.yml` labels.
 2.  Verify `PathPrefix` in labels matches your request.
    ```yaml
    - "traefik.http.routers.iam.rule=PathPrefix(`/api/v1/auth`)"
    ```
 3.  Check if the service passed health checks (Health status in dashboard).
 **Problem**: `Bad Gateway` or `Gateway Timeout`
 *   **Cause**: Service is crashing or taking too long to respond.
 *   **Fix**: Check the specific service logs (`docker-compose logs iam-service`).
 ---
 ## Service Issues
 ### Service Fails to Start
 **Symptom**: Container status is `Exited (1)` or `Restarting`.
 **Debugging**:
 1.  Check logs immediately:
    ```bash
    docker-compose logs iam-service
    ```
 2.  **Common Error**: `Config validation error`
    *   **Fix**: Check environment variables. Running `./scripts/setup/init-project.sh` creates `.env` from the template.
 3.  **Common Error**: `PrismaClientInitializationError`
    *   **Fix**: Database connectivity issue (see Infrastructure section).
 ### Prisma/Database Errors
 **Error**: `P2025: Record to update not found`
 *   **Fix**: Logic error. Ensure the ID exists before updating.
 **Error**: `P2002: Unique constraint failed`
 *   **Fix**: You are trying to insert duplicate data (e.g., same email).
 **Error**: `Migration failed`
 *   **Fix**:
    1.  Delete `prisma/migrations` folder (only in dev!).
    2.  Reset database (this drops all data): `pnpm prisma migrate reset`.
    3.  Regenerate client: `pnpm prisma generate`.
 ### Authentication Errors
 **Problem**: `401 Unauthorized` despite valid token
 *   **Cause 1**: Token expired.
 *   **Cause 2**: Public key mismatch (Service can't verify token signed by IAM).
 *   **Cause 3**: Clock skew (Docker time vs Host time).
 **Solution**:
 1.  Check server logs for JWT verification errors.
 2.  Restart services to refresh keys.
 3.  Sync Docker time: restart Docker Desktop.
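 To tell an expired token apart from clock skew, it helps to read the `exp` claim directly. The sketch below decodes the payload without verifying the signature, so it is for diagnostics only; `decodeJwtPayload`, `classifyExpiry`, and the leeway value are assumptions for illustration, not the platform's verification code.
 ```typescript
 // Decode the payload segment of a JWT. The signature is NOT verified here.
 function decodeJwtPayload(token: string): Record<string, unknown> {
   const segments = token.split(".");
   const payloadSegment = segments[1];
   if (segments.length !== 3 || payloadSegment === undefined) {
     throw new Error("Not a JWT: expected three dot-separated segments");
   }
   const json = Buffer.from(payloadSegment, "base64url").toString("utf8");
   return JSON.parse(json) as Record<string, unknown>;
 }
 type ExpiryStatus = "valid" | "within_skew" | "expired";
 // Compare `exp` (seconds since epoch) with the local clock, allowing a small leeway.
 function classifyExpiry(exp: number, nowSeconds: number, leewaySeconds: number): ExpiryStatus {
   if (nowSeconds < exp) return "valid";
   return nowSeconds - exp <= leewaySeconds ? "within_skew" : "expired";
 }
 ```
 A `within_skew` result on a freshly issued token points to a clock difference between containers rather than a genuinely expired token.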
 ---
 ## Debugging Tools
 ### 1. Accessing Container Shell
 To inspect files or run commands inside a running container:
 **Solutions**:
 ```bash
-cd services/iam-service
+docker-compose exec iam-service sh
-pnpm prisma generate
+# or /bin/bash
 ```
-### Build Failures
+### 2. Inspecting Database (via Prisma Studio)
-**Symptoms**: TypeScript or build errors
+Use Prisma Studio to view/edit data visually:
-**Solutions**:
+```bash
-1. Clean build artifacts: `./scripts/utils/cleanup.sh`
+pnpm --filter @goodgo/iam-service prisma studio
-2. Reinstall dependencies: `pnpm install`
+# Opens http://localhost:5555
-3. Check TypeScript errors: `pnpm typecheck`
+```
-### Traefik Not Routing
+### 3. Inspecting Redis
-**Symptoms**: 404 errors from Traefik
+```bash
 docker-compose exec redis redis-cli
 > PING
 PONG
 > KEYS *
 1) "user:123:session"
 ```
-**Solutions**:
+### 4. Direct API Testing
 1. Check Traefik dashboard: http://localhost:8080
 2. Verify service labels in docker-compose
 3. Check routes.yml configuration
 4. Review Traefik logs: `docker logs traefik-local`
-## Getting Help
+Use `curl` or Postman.
-1. Check service logs: `./scripts/dev/logs.sh <service>`
+```bash
-2. Review GitHub Issues
+# Health Check
-3. Contact team lead
+curl -v http://localhost/api/v1/auth/health/live
 # Login (example)
 curl -X POST http://localhost/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@example.com", "password":"password"}'
 ```
 ---
 ## FAQ
 **Q: Why is my change not reflecting?**
 A: If you changed `.env` or `docker-compose.yml`, you must restart:
 ```bash
 docker-compose down && docker-compose up -d
 ```
 If you changed code, hot-reloading (nodemon) should pick it up. If it does not, restart the container (`docker-compose restart <service>`).
 **Q: How do I reset everything?**
 A: Be careful, this deletes all data!
 ```bash
 docker-compose down -v
 # -v removes volumes (Redis data, etc.)
 ```
 **Q: My computer is slow when running everything.**
 A: Every running container consumes RAM and CPU. To reduce the load:
 1.  Stop unused services (e.g., `future-service`).
 2.  Increase Docker resource limits in Docker Desktop settings.
--- a/docs/vi/architecture/caching-architecture.md
+++ b/docs/vi/architecture/caching-architecture.md
--- a/docs/vi/architecture/data-consistency-patterns.md
+++ b/docs/vi/architecture/data-consistency-patterns.md
@@ -0,0 +1,745 @@
 # Patterns Đồng bộ Dữ liệu / Data Consistency Patterns
 > **VI**: Các patterns để duy trì tính nhất quán dữ liệu trong kiến trúc microservices phân tán
 > **EN**: Patterns for maintaining data consistency in distributed microservices architecture
 ## Sơ đồ Tổng quan / Overview Diagram
 ```mermaid
 graph TD
    subgraph "Consistency Patterns"
        Saga[Saga Pattern<br/>Distributed Transactions]
        Outbox[Outbox Pattern<br/>Reliable Events]
        Idempotency[Idempotency<br/>Retry Safety]
        OptimisticLock[Optimistic Locking<br/>Concurrent Updates]
        CQRS[CQRS<br/>Read/Write Separation]
    end
    Service1[Service A] --> Saga
    Service2[Service B] --> Outbox
    Service3[Service C] --> Idempotency
    Saga --> EventualConsistency[Eventual Consistency]
    Outbox --> EventualConsistency
    Idempotency --> EventualConsistency
    OptimisticLock --> StrongConsistency[Strong Consistency]
    CQRS --> EventualConsistency
    style Saga fill:#e1f5ff
    style Outbox fill:#fff4e1
    style Idempotency fill:#f0e1ff
    style CQRS fill:#d4edda
 ```
 ## Mô tả Kiến trúc / Architecture Description
 ### VI: Tổng quan Kiến trúc
 Nền tảng GoodGo sử dụng nhiều consistency patterns để xử lý dữ liệu phân tán:
 **Thách thức Cốt lõi**:
 - Không có distributed transactions (2PC quá chậm)
 - Services sở hữu dữ liệu riêng (database per service)
 - Network failures có thể gây partial completion
 - Cần maintain data integrity giữa các services
 **Lựa chọn Pattern**:
 - **Saga**: Cho workflows nhiều services
 - **Outbox**: Cho event publishing đảm bảo
 - **Idempotency**: Cho retries an toàn
 - **Optimistic Locking**: Cho concurrent updates
 - **CQRS**: Cho tối ưu read/write
 ### EN: Architecture Overview
 GoodGo platform uses multiple consistency patterns to handle distributed data:
 **Core Challenges**:
 - No distributed transactions (2PC too slow)
 - Services own their data (database per service)
 - Network failures can cause partial completion
 - Need to maintain data integrity across services
 **Pattern Selection**:
 - **Saga**: For multi-service workflows
 - **Outbox**: For guaranteed event publishing
 - **Idempotency**: For safe retries
 - **Optimistic Locking**: For concurrent updates
 - **CQRS**: For read/write optimization
 ## Bối cảnh Hệ thống / System Context
 ```mermaid
 C4Context
    title System Context for Data Consistency in GoodGo Platform
    Person(user, "User", "End user performing actions")
    System_Boundary(goodgo, "GoodGo Microservices") {
        System(order_service, "Order Service", "Manages orders with Saga")
        System(payment_service, "Payment Service", "Processes payments")
        System(inventory_service, "Inventory Service", "Manages stock")
        System(saga_orchestrator, "Saga Orchestrator", "Coordinates distributed transactions")
        System(outbox_processor, "Outbox Processor", "Publishes events reliably")
    }
    System_Ext(db_order, "Order DB", "PostgreSQL with Outbox table")
    System_Ext(db_payment, "Payment DB", "PostgreSQL with version field")
    System_Ext(db_inventory, "Inventory DB", "PostgreSQL")
    System_Ext(kafka, "Event Bus", "Kafka - Event streaming")
    System_Ext(redis, "Cache", "Redis - Idempotency keys")
    Rel(user, order_service, "Places order", "HTTPS")
    Rel(order_service, saga_orchestrator, "Starts saga", "Internal")
    Rel(saga_orchestrator, payment_service, "Process payment", "HTTP")
    Rel(saga_orchestrator, inventory_service, "Reserve stock", "HTTP")
    Rel(order_service, db_order, "Writes + Outbox", "SQL")
    Rel(payment_service, db_payment, "Updates with version", "SQL")
    Rel(inventory_service, db_inventory, "Reads/Writes", "SQL")
    Rel(outbox_processor, db_order, "Polls outbox", "SQL")
    Rel(outbox_processor, kafka, "Publishes events", "Kafka Protocol")
    Rel(order_service, redis, "Checks idempotency key", "Redis Protocol")
    UpdateRelStyle(saga_orchestrator, payment_service, $lineColor="red", $textColor="red")
    UpdateRelStyle(saga_orchestrator, inventory_service, $lineColor="red", $textColor="red")
 ```
 **VI**: Nền tảng GoodGo sử dụng kiến trúc database-per-service nơi mỗi service sở hữu dữ liệu riêng. Tính nhất quán dữ liệu giữa các services đạt được thông qua các patterns như Saga (cho workflows phối hợp), Outbox (cho event publishing đáng tin cậy), Idempotency (cho retries an toàn), và Optimistic Locking (cho concurrent updates). Các patterns này cho phép eventual consistency đồng thời duy trì data integrity.
 **EN**: The GoodGo platform uses a database-per-service architecture where each service owns its data. Data consistency across services is achieved through patterns like Saga (for coordinated workflows), Outbox (for reliable event publishing), Idempotency (for safe retries), and Optimistic Locking (for concurrent updates). These patterns enable eventual consistency while maintaining data integrity.
 ## Pattern Saga / Saga Pattern
 ```mermaid
 sequenceDiagram
    participant Orchestrator
    participant OrderService
    participant PaymentService
    participant InventoryService
    Orchestrator->>OrderService: 1. Create Order
    OrderService-->>Orchestrator: Order Created
    Orchestrator->>PaymentService: 2. Process Payment
    PaymentService-->>Orchestrator: Payment Success
    Orchestrator->>InventoryService: 3. Reserve Inventory
    alt Inventory Reserved
        InventoryService-->>Orchestrator: Success
        Orchestrator->>Orchestrator: Complete Saga ✓
    else Inventory Failed
        InventoryService-->>Orchestrator: Failed ✗
        Orchestrator->>PaymentService: Compensate: Refund
        PaymentService-->>Orchestrator: Refunded
        Orchestrator->>OrderService: Compensate: Cancel Order
        OrderService-->>Orchestrator: Cancelled
    end
 ```
 **VI Mô tả**: Saga quản lý distributed transactions dưới dạng chuỗi local transactions với compensation.
 **EN Description**: A saga manages a distributed transaction as a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails.
 **Triển khai / Implementation**:
 ```typescript
 // VI: Saga orchestrator
 // EN: Saga orchestrator
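 // Each ID stays null until its step succeeds (string IDs are an assumption).
 interface SagaContext {
  orderId: string | null;
  paymentId: string | null;
  inventoryId: string | null;
 }
 class OrderSaga {
  async execute(orderData: OrderData): Promise<void> {
    const sagaContext: SagaContext = {
      orderId: null,
      paymentId: null,
      inventoryId: null
    };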
    try {
      // VI: Bước 1: Tạo đơn hàng
      // EN: Step 1: Create order
      sagaContext.orderId = await orderService.create(orderData);
      // VI: Bước 2: Xử lý thanh toán
      // EN: Step 2: Process payment
      sagaContext.paymentId = await paymentService.process(orderData.payment);
      // VI: Bước 3: Đặt trước kho
      // EN: Step 3: Reserve inventory
      sagaContext.inventoryId = await inventoryService.reserve(orderData.items);
      // VI: Tất cả thành công - commit
      // EN: All success - commit
      await this.completeSaga(sagaContext);
    } catch (error) {
      // VI: Compensate theo thứ tự ngược lại
      // EN: Compensate in reverse order
      await this.compensate(sagaContext, error);
      throw error;
    }
  }
  private async compensate(context: SagaContext, error: Error): Promise<void> {
    if (context.inventoryId) {
      await inventoryService.release(context.inventoryId);
    }
    if (context.paymentId) {
      await paymentService.refund(context.paymentId);
    }
    if (context.orderId) {
      await orderService.cancel(context.orderId);
    }
  }
 }
 ```
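In the orchestrator above, a compensation call that throws would prevent the remaining compensations from running. One way to address that is a generic runner that records completed steps, undoes them in reverse order, and keeps going when an individual compensation fails. The following is a minimal in-memory sketch; `SagaStep`, `SagaResult`, and `runSaga` are hypothetical names, not existing GoodGo APIs.

```typescript
// Hypothetical generic saga runner: each step pairs an action with its compensation.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

interface SagaResult {
  completed: boolean;
  compensated: string[];          // steps undone successfully, in undo order
  compensationFailures: string[]; // steps whose compensation threw (need follow-up)
}

async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const done: SagaStep[] = [];
  try {
    for (const step of steps) {
      await step.action();
      done.push(step); // only completed steps are ever compensated
    }
    return { completed: true, compensated: [], compensationFailures: [] };
  } catch {
    const compensated: string[] = [];
    const compensationFailures: string[] = [];
    // Undo in reverse order; continue even if one compensation fails.
    for (const step of [...done].reverse()) {
      try {
        await step.compensate();
        compensated.push(step.name);
      } catch {
        compensationFailures.push(step.name);
      }
    }
    return { completed: false, compensated, compensationFailures };
  }
}
```

Entries in `compensationFailures` are the cases that need a retry or manual intervention, since the system is left partially rolled back.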
 ## Pattern Outbox / Outbox Pattern
 ```mermaid
 sequenceDiagram
    participant Service
    participant DB as Database
    participant OutboxTable as Outbox Table
    participant Processor as Outbox Processor
    participant Kafka
    Service->>DB: Begin Transaction
    Service->>DB: Update Business Data
    Service->>OutboxTable: Insert Event
    Service->>DB: Commit Transaction
    loop Every 5 seconds
        Processor->>OutboxTable: SELECT unpublished events
        OutboxTable-->>Processor: Events
        Processor->>Kafka: Publish Events
        Kafka-->>Processor: Ack
        Processor->>OutboxTable: Mark as published
    end
 ```
 **VI**: Đảm bảo event publishing bằng cách lưu events trong database cùng transaction với business data.
 **EN**: Guarantees event publishing by storing events in the database within the same transaction as the business data, so an event is recorded if and only if the business change commits.
 **Triển khai / Implementation**:
 ```typescript
 // VI: Lưu event trong outbox
 // EN: Store event in outbox
 async createUser(userData: CreateUserDto): Promise<User> {
  return await prisma.$transaction(async (tx) => {
    // VI: Business operation
    // EN: Business operation
    const user = await tx.user.create({ data: userData });
    // VI: Lưu event trong outbox (cùng transaction)
    // EN: Store event in outbox (same transaction)
    await tx.outbox.create({
      data: {
        aggregateId: user.id,
        aggregateType: 'User',
        eventType: 'user.created.v1',
        payload: JSON.stringify(user),
        createdAt: new Date()
      }
    });
    return user;
  });
 }
 // VI: Outbox processor (chạy định kỳ)
 // EN: Outbox processor (runs periodically)
 async processOutbox(): Promise<void> {
   const events = await prisma.outbox.findMany({
     where: { publishedAt: null },
     orderBy: { createdAt: 'asc' }, // publish oldest first to preserve event order
     take: 100
   });
  for (const event of events) {
    try {
      await kafkaProducer.send({
        topic: event.eventType,
        messages: [{ value: event.payload }]
      });
      await prisma.outbox.update({
        where: { id: event.id },
        data: { publishedAt: new Date() }
      });
    } catch (error) {
      logger.error('Failed to publish event', { event, error });
    }
  }
 }
 ```
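The processor above logs a failed publish and moves on to the next event, which can deliver a later event before an earlier one. The sketch below shows an alternative that stops at the first failure. It uses an in-memory array in place of the outbox table, and `OutboxRow`, `Publish`, and `drainOutbox` are hypothetical names for illustration only.

```typescript
interface OutboxRow {
  id: number;
  eventType: string;
  payload: string;
  createdAt: number;          // epoch millis
  publishedAt: number | null; // null until the broker acknowledges
}

// Hypothetical publisher signature; in practice this would wrap the Kafka producer.
type Publish = (row: OutboxRow) => Promise<void>;

// Publishes pending rows oldest-first and stops at the first failure, so a
// later event is never published ahead of an earlier one that is still pending.
async function drainOutbox(
  rows: OutboxRow[],
  publish: Publish,
  batchSize = 100,
): Promise<number> {
  const pending = rows
    .filter((row) => row.publishedAt === null)
    .sort((a, b) => a.createdAt - b.createdAt || a.id - b.id)
    .slice(0, batchSize);

  let published = 0;
  for (const row of pending) {
    try {
      await publish(row);
    } catch {
      break; // leave this row and everything after it for the next poll
    }
    row.publishedAt = Date.now(); // a crash before this line causes a re-publish
    published++;
  }
  return published;
}
```

Because a row is marked only after the broker acknowledges it, delivery is at-least-once, so consumers should tolerate duplicates.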
 ## Pattern Idempotency / Idempotency Pattern
 ```mermaid
 graph LR
    Request1[Request with<br/>Idempotency Key]
    Request2[Retry with<br/>Same Key]
    Request1 --> Check{Key Exists?}
    Check -->|No| Process[Process Request]
    Check -->|Yes| Return[Return Cached Result]
    Process --> Store[Store Result<br/>with Key]
    Store --> Response1[Response]
    Request2 --> Check
    Return --> Response2[Same Response]
    style Check fill:#fff3cd
    style Store fill:#d4edda
 ```
 **VI**: Đảm bảo operations có thể retry an toàn mà không có side effects bằng cách sử dụng idempotency keys.
 **EN**: Ensures operations can be retried safely, without duplicating side effects, by using idempotency keys.
 **Triển khai / Implementation**:
 ```typescript
 // VI: Idempotency middleware
 // EN: Idempotency middleware
 async function idempotentOperation<T>(
  key: string,
  operation: () => Promise<T>,
  ttl: number = 86400 // VI: 24 giờ / EN: 24 hours
 ): Promise<T> {
  // VI: Kiểm tra đã xử lý chưa
  // EN: Check if already processed
  const cached = await redis.get(`idempotency:${key}`);
  if (cached) {
    return JSON.parse(cached);
  }
  // VI: Xử lý operation
  // EN: Process operation
  const result = await operation();
  // VI: Lưu kết quả
  // EN: Store result
  await redis.setex(`idempotency:${key}`, ttl, JSON.stringify(result));
  return result;
 }
 // VI: Sử dụng trong controller
 // EN: Usage in controller
 async createPayment(req: Request, res: Response): Promise<void> {
  const idempotencyKey = req.headers['idempotency-key'] as string;
   if (!idempotencyKey) {
     // Send the error, then return nothing so the handler matches Promise<void>
     res.status(400).json({ error: 'Idempotency-Key header required' });
     return;
   }
  const result = await idempotentOperation(
    idempotencyKey,
    () => paymentService.process(req.body)
  );
  res.json({ success: true, data: result });
 }
 ```
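The check-then-store sequence in the middleware above is not atomic: two concurrent requests with the same key can both miss the cache and both run the operation. The sketch below illustrates claiming the key before the work starts. It uses an in-memory `Map` as a stand-in for Redis, where the equivalent claim would be a single `SET key value NX EX ttl` command; the class name `IdempotencyStore` is hypothetical and TTL handling is left out for brevity.

```typescript
// In-memory stand-in for Redis, used only to illustrate the claim-first idea.
class IdempotencyStore<T> {
  private readonly entries = new Map<string, Promise<T>>();

  // Runs `operation` at most once per key; concurrent and later callers
  // with the same key receive the first call's result.
  run(key: string, operation: () => Promise<T>): Promise<T> {
    const existing = this.entries.get(key);
    if (existing) return existing;

    // The lookup and the set below run synchronously, so no other caller
    // can slip in between them within a single Node.js process.
    const inFlight = operation().catch((error: unknown) => {
      this.entries.delete(key); // a failed attempt must not block retries
      throw error;
    });
    this.entries.set(key, inFlight);
    return inFlight;
  }
}
```

Usage would look like `store.run(idempotencyKey, () => paymentService.process(body))`. Removing the key on failure is a design choice: it lets the client retry, at the cost of re-running an operation that may have partially succeeded.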
 ## Khóa Lạc quan / Optimistic Locking
 ```mermaid
 sequenceDiagram
    participant User1
    participant User2
    participant Service
    participant DB
    User1->>Service: Read (version=1)
    User2->>Service: Read (version=1)
    User1->>Service: Update (version=1)
    Service->>DB: UPDATE WHERE version=1
    DB-->>Service: Success, version→2
    Service-->>User1: Success
    User2->>Service: Update (version=1)
    Service->>DB: UPDATE WHERE version=1
    DB-->>Service: No rows updated
    Service-->>User2: Conflict - version mismatch
    User2->>Service: Read (version=2)
    User2->>Service: Update (version=2)
    Service-->>User2: Success
 ```
 **VI**: Ngăn chặn lost updates bằng cách kiểm tra version khi update.
 **EN**: Prevents lost updates by checking version on update.
 **Triển khai / Implementation**:
 ```prisma
 // VI: Prisma schema
 // EN: Prisma schema
 model User {
  id      String @id @default(cuid())
  email   String @unique
  name    String
  version Int    @default(1)  // VI: Trường version / EN: Version field
 }
 ```
 ```typescript
 // VI: Update với optimistic locking
 // EN: Update with optimistic locking
 async updateUser(userId: string, data: UpdateUserDto, currentVersion: number): Promise<User> {
  const result = await prisma.user.updateMany({
    where: {
      id: userId,
      version: currentVersion  // VI: Kiểm tra version / EN: Check version
    },
    data: {
      ...data,
      version: { increment: 1 }  // VI: Tăng version / EN: Increment version
    }
  });
  if (result.count === 0) {
    throw new ConflictError('Version mismatch - data was modified by another user');
  }
   // findUniqueOrThrow returns User (never null), matching the declared return type
   return await prisma.user.findUniqueOrThrow({ where: { id: userId } });
 }
 ```
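When a version conflict occurs, the caller usually re-reads the record and re-applies its change, as User2 does in the diagram. A minimal sketch of that retry loop follows, using an in-memory store in place of the database. `VersionedStore` and `updateWithRetry` are hypothetical names, and backoff between attempts is omitted to keep the example short.

```typescript
interface Versioned<T> {
  value: T;
  version: number;
}

// In-memory stand-in for a table with a version column.
class VersionedStore<T> {
  private readonly rows = new Map<string, Versioned<T>>();

  insert(id: string, value: T): void {
    this.rows.set(id, { value, version: 1 });
  }

  read(id: string): Versioned<T> | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined; // copy, so callers hold a snapshot
  }

  // Equivalent of UPDATE ... WHERE id = ? AND version = ?; false means conflict.
  compareAndSet(id: string, expectedVersion: number, value: T): boolean {
    const row = this.rows.get(id);
    if (!row || row.version !== expectedVersion) return false;
    this.rows.set(id, { value, version: expectedVersion + 1 });
    return true;
  }
}

// Re-reads and re-applies the change on conflict, up to maxAttempts times.
function updateWithRetry<T>(
  store: VersionedStore<T>,
  id: string,
  change: (current: T) => T,
  maxAttempts = 3,
): Versioned<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = store.read(id);
    if (!current) throw new Error(`Record ${id} not found`);
    const next = change(current.value);
    if (store.compareAndSet(id, current.version, next)) {
      return { value: next, version: current.version + 1 };
    }
    // Conflict: another writer got in first, so loop to re-read and try again.
  }
  throw new Error(`Version conflict on ${id} after ${maxAttempts} attempts`);
}
```

The `change` function is applied to the freshly read value on every attempt, so it must be safe to call more than once.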
 ## CQRS Pattern
 ```mermaid
 graph LR
    subgraph "Write Side"
        Command[Command] --> WriteModel[Write Model<br/>Normalized]
        WriteModel --> Events[Domain Events]
    end
    subgraph "Read Side"
        Events --> Projection[Event Projection]
        Projection --> ReadModel[Read Model<br/>Denormalized]
        Query[Query] --> ReadModel
    end
    WriteModel --> DB1[(Write DB)]
    ReadModel --> DB2[(Read DB<br/>Optimized)]
    style WriteModel fill:#f0e1ff
    style ReadModel fill:#d4edda
 ```
 **VI**: Tách biệt read và write models để tối ưu hiệu suất.
 **EN**: Separates read and write models for optimal performance.
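To make the read side concrete, the sketch below folds domain events into a denormalized per-user summary that a query can read directly. The event shapes, the `UserSummary` read model, and `project` are assumptions made for illustration, not the platform's actual types.

```typescript
// Hypothetical domain events emitted by the write side.
type UserEvent =
  | { type: "user.created"; userId: string; email: string }
  | { type: "user.email_changed"; userId: string; email: string }
  | { type: "order.placed"; userId: string; total: number };

// Denormalized read model: one row per user, shaped for the query that needs it.
interface UserSummary {
  userId: string;
  email: string;
  orderCount: number;
  totalSpent: number;
}

// Applies one event to the read model (the "Event Projection" box in the diagram).
function project(readModel: Map<string, UserSummary>, event: UserEvent): void {
  switch (event.type) {
    case "user.created":
      readModel.set(event.userId, {
        userId: event.userId,
        email: event.email,
        orderCount: 0,
        totalSpent: 0,
      });
      break;
    case "user.email_changed": {
      const row = readModel.get(event.userId);
      if (row) row.email = event.email;
      break;
    }
    case "order.placed": {
      const row = readModel.get(event.userId);
      if (row) {
        row.orderCount += 1;
        row.totalSpent += event.total;
      }
      break;
    }
  }
}
```

This projection is not idempotent for `order.placed`: a redelivered event would be counted twice, so a production version would likely track processed event IDs.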
 ## Đặc điểm Hiệu suất / Performance Characteristics
 **VI**: Chỉ số hiệu suất và chiến lược tối ưu cho patterns đồng bộ dữ liệu.
 **EN**: Performance metrics and optimization strategies for data consistency patterns.
 | Pattern | Latency Impact | Throughput | Notes |
 |---------|----------------|------------|-------|
 | **Saga Execution** | 500ms - 2s | 100-500 sagas/s | Depends on number of steps and compensation |
 | **Outbox Processing** | < 100ms | 10,000 events/s | Async processing, minimal user impact |
 | **Idempotency Check** | < 10ms | 50,000 checks/s | Redis lookup, very fast |
 | **Optimistic Lock Update** | < 50ms | 5,000 updates/s | Single DB operation with version check |
 | **CQRS Projection** | 100ms - 1s | 1,000 events/s | Event processing to read model |
 | **Compensation Execution** | 200ms - 1s | Varies | Rollback operations in saga |
 ### Chiến lược Tối ưu Hiệu suất / Performance Optimization Strategies
 **Saga Pattern**:
 - Giảm thiểu số bước (< 5 bước lý tưởng) / Minimize number of steps (< 5 steps ideal)
 - Thực thi song song khi có thể / Parallel execution where possible
 - Cache service responses
 - Đặt timeouts phù hợp (30s mặc định) / Set appropriate timeouts (30s default)
 **Outbox Pattern**:
 - Batch process outbox events (100-500 mỗi batch / per batch)
 - Index cột `publishedAt` cho hiệu suất / Index `publishedAt` column for performance
 - Archive processed events định kỳ / Archive processed events periodically
 - Sử dụng connection pooling cho Kafka / Use connection pooling for Kafka
 **Idempotency**:
 - Sử dụng Redis cho fast key lookups / Use Redis for fast key lookups
 - Đặt TTL 24-48 giờ / Set TTL to 24-48 hours
 - Hash long idempotency keys
 - Clean expired keys thường xuyên / Clean expired keys regularly
 **Optimistic Locking**:
 - Hoạt động tốt nhất cho low-contention scenarios / Works best for low-contention scenarios
 - Triển khai retry với exponential backoff / Implement retry with exponential backoff
 - Giám sát conflict rates (nên < 5%) / Monitor conflict rates (should be < 5%)
 - Cân nhắc pessimistic locking nếu conflicts > 10% / Consider pessimistic locking if conflicts > 10%
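The conflict-rate guidance for optimistic locking can be expressed as a small check. In the sketch below, the 5% and 10% thresholds come from the bullets above, while the function name `adviseLocking` and the intermediate "watch" band are assumptions.

```typescript
type LockingAdvice = "optimistic-ok" | "watch" | "consider-pessimistic";

// Thresholds follow the 5% and 10% guidance; the middle "watch" band is an assumption.
function adviseLocking(conflicts: number, attempts: number): LockingAdvice {
  if (attempts <= 0) return "optimistic-ok"; // no traffic, nothing to act on
  const rate = conflicts / attempts;
  if (rate > 0.1) return "consider-pessimistic";
  if (rate >= 0.05) return "watch";
  return "optimistic-ok";
}
```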
 ## Cân nhắc Bảo mật / Security Considerations
 **VI**: Biện pháp bảo mật để bảo vệ các operations đồng bộ dữ liệu.
 **EN**: Security measures for protecting data consistency operations.
 ### Bảo mật Saga / Saga Security
 **Bảo vệ Compensation / Compensation Protection**:
 - Xác thực saga execution permissions ở mỗi bước / Validate saga execution permissions at each step
 - Mã hóa sensitive data trong saga context / Encrypt sensitive data in saga context
 - Log tất cả saga executions cho audit / Log all saga executions for audit
 - Triển khai timeout để ngăn hanging sagas / Implement timeout to prevent hanging sagas
 ```typescript
 // VI: Saga context bảo mật
 // EN: Secure saga context
 interface SecureSagaContext {
  sagaId: string;
  userId: string; // VI: User khởi tạo / EN: User who initiated
  permissions: string[]; // VI: Quyền yêu cầu / EN: Required permissions
  encryptedData: string; // VI: Dữ liệu nhạy cảm đã mã hóa / EN: Encrypted sensitive data
  auditLog: AuditEntry[]; // VI: Audit trail / EN: Audit trail
 }
 ```
 ### Bảo mật Outbox / Outbox Security
 **Mã hóa Event Payload / Event Payload Encryption**:
 - Mã hóa PII (Personally Identifiable Information) trước khi lưu trong outbox / Encrypt PII before storing in outbox
 - Sử dụng AES-256-GCM cho event payload encryption / Use AES-256-GCM for event payload encryption
 - Giải mã chỉ khi publishing sang Kafka / Decrypt only when publishing to Kafka
 - Rotate encryption keys hàng quý / Rotate encryption keys quarterly
 **Kiểm soát Truy cập / Access Control**:
 - Hạn chế truy cập outbox table chỉ cho outbox processor / Restrict outbox table access to outbox processor only
 - Sử dụng database roles và permissions / Use database roles and permissions
 - Giám sát outbox table access patterns / Monitor outbox table access patterns
 ### Bảo mật Idempotency / Idempotency Security
 **Bảo mật Key / Key Security**:
 - Sử dụng cryptographic hashing cho idempotency keys (SHA-256) / Use cryptographic hashing for idempotency keys (SHA-256)
 - Bao gồm user context trong key generation / Include user context in key generation
 - Xác thực key ownership trước khi xử lý / Validate key ownership before processing
 - Clear keys khi user logout cho sensitive operations / Clear keys on user logout for sensitive operations
 ```typescript
 import * as crypto from 'node:crypto';

 // Secure idempotency key generation
 function generateIdempotencyKey(
  operation: string,
  userId: string,
  data: any
 ): string {
  const payload = JSON.stringify({ operation, userId, data });
  return crypto.createHash('sha256').update(payload).digest('hex');
 }
 ```
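One caveat with hashing `JSON.stringify` output is that it depends on property order, so `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` produce different keys. A possible remedy is to serialize with sorted keys before hashing. The helper below is a sketch for plain JSON-like values; `canonicalize` is a hypothetical name, and it does not handle `Date`, `Map`, or cyclic structures.

```typescript
// Serializes a JSON-like value with object keys sorted, so that equal data
// always yields the same string regardless of property insertion order.
function canonicalize(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const body = Object.keys(record)
      .sort()
      .filter((key) => record[key] !== undefined) // mirror JSON.stringify, which drops undefined
      .map((key) => `${JSON.stringify(key)}:${canonicalize(record[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}
```

The result of `canonicalize({ operation, userId, data })` could then be passed to the SHA-256 hash in place of the `JSON.stringify` output.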
 ### Bảo mật Optimistic Lock / Optimistic Locking Security
 **Ngăn chặn Giả mạo Version / Version Tampering Prevention**:
 - Only the server increments the version; a client-supplied version is used solely as the expected value in the `WHERE version = ?` comparison
 - Never write a client-supplied version directly to the database
 - Log version conflicts cho security monitoring / Log version conflicts for security monitoring
 - Rate limit update attempts per user
 ## Triển khai / Deployment
 **VI**: Cách các patterns đồng bộ dữ liệu được triển khai và mở rộng.
 **EN**: How data consistency patterns are deployed and scaled.
 ```mermaid
 graph TD
    subgraph "Production Deployment"
        subgraph "Order Service Cluster"
            OS1[Order Service\nPod 1]
            OS2[Order Service\nPod 2]
            OS3[Order Service\nPod 3]
        end
        subgraph "Saga Orchestrator"
            SO1[Saga Orchestrator\nPod 1]
            SO2[Saga Orchestrator\nPod 2]
        end
        subgraph "Outbox Processor"
            OP1[Outbox Processor\nPod 1]
            OP2[Outbox Processor\nPod 2]
        end
        OS1 & OS2 & OS3 --> DB[(Order DB\nwith Outbox)]
        OS1 & OS2 & OS3 --> Redis[(Redis\nIdempotency Keys)]
        SO1 & SO2 --> PS[Payment Service]
        SO1 & SO2 --> IS[Inventory Service]
        OP1 & OP2 --> DB
        OP1 & OP2 --> Kafka[Kafka Cluster\n5 brokers]
    end
    style SO1 fill:#e1f5ff
    style SO2 fill:#e1f5ff
    style OP1 fill:#fff4e1
    style OP2 fill:#fff4e1
    style DB fill:#d4edda
    style Kafka fill:#ffe1e1
 ```
 ### Cấu hình Triển khai / Deployment Configuration
 | Thành phần / Component | Replicas | Resources | HA Strategy |
 |------------------------|----------|-----------|-------------|
 | **Saga Orchestrator** | 2-3 | 512Mi RAM, 500m CPU | Leader election với etcd / Leader election with etcd |
 | **Outbox Processor** | 2-5 | 256Mi RAM, 250m CPU | Distributed lock per event batch |
 | **Services với Outbox / Services with Outbox** | 3+ | Varies | Standard service scaling |
 | **Redis (Idempotency)** | 3 nodes | 1Gi RAM each | Redis Cluster với replication / Redis Cluster with replication |
 ### Chiến lược Mở rộng / Scaling Strategy
 **Saga Orchestrator**:
 - Scale dựa trên pending saga count / Scale based on pending saga count
 - Sử dụng queue-based load distribution / Use queue-based load distribution
 - Giám sát saga execution duration / Monitor saga execution duration
 **Outbox Processor**:
 - Scale với database sharding (1 processor per shard) / Scale with database sharding (1 processor per shard)
 - Tăng batch size trước khi thêm replicas / Increase batch size before adding replicas
 - Giám sát outbox table size và age / Monitor outbox table size and age
 **Idempotency Store (Redis)**:
 - Scale Redis cluster horizontally
 - Sử dụng consistent hashing cho key distribution / Use consistent hashing for key distribution
 - Giám sát memory usage (nên < 70%) / Monitor memory usage (should be < 70%)
 ## Giám sát & Khả năng quan sát / Monitoring & Observability
 **VI**: Chiến lược giám sát cho patterns đồng bộ dữ liệu.
 **EN**: Monitoring strategies for data consistency patterns.
 ### Chỉ số Chính / Key Metrics
 **Saga Metrics**:
 - `saga_executions_total` - Tổng saga executions (success/failure) / Total saga executions (success/failure)
 - `saga_duration_seconds` - Saga execution time histogram
 - `saga_compensations_total` - Tổng compensation executions / Total compensation executions
 - `saga_timeout_total` - Sagas timeout / Sagas that timed out
 - `saga_pending_count` - Sagas đang thực thi / Sagas currently executing
 **Outbox Metrics**:
 - `outbox_events_total` - Events ghi vào outbox / Events written to outbox
 - `outbox_published_total` - Events published sang Kafka / Events published to Kafka
 - `outbox_processing_lag_seconds` - Thời gian từ write đến publish / Time from write to publish
 - `outbox_table_size` - Số dòng outbox table / Outbox table row count
 - `outbox_failed_events_total` - Failed event publications
 **Idempotency Metrics**:
 - `idempotency_checks_total` - Tổng idempotency checks / Total idempotency checks
 - `idempotency_hits_total` - Duplicate requests prevented
 - `idempotency_key_ttl_seconds` - Average key TTL
 - `idempotency_redis_errors_total` - Redis failures
 **Optimistic Lock Metrics**:
 - `optimistic_lock_conflicts_total` - Version conflicts detected
 - `optimistic_lock_retries_total` - Retry attempts sau conflict / Retry attempts after conflict
 - `optimistic_lock_success_rate` - Update success percentage
 ### Cảnh báo / Alerts
 **Critical Alerts**:
 ```yaml
 groups:
   - name: data-consistency
     rules:
       # Saga timeouts exceed 0.05 per second over 5 minutes
       - alert: HighSagaTimeoutRate
         expr: rate(saga_timeout_total[5m]) > 0.05
         for: 5m
         labels:
           severity: critical
       # Outbox processing lag above 5 minutes
       - alert: OutboxProcessingLag
         expr: outbox_processing_lag_seconds > 300
         for: 10m
         labels:
           severity: critical
       # More than 10% of optimistic-lock updates conflict
       # (assumes an optimistic_lock_attempts_total counter is exported)
       - alert: HighOptimisticLockConflicts
         expr: rate(optimistic_lock_conflicts_total[5m]) / rate(optimistic_lock_attempts_total[5m]) > 0.1
         for: 5m
         labels:
           severity: warning
 ```
 ### Dashboard Giám sát / Monitoring Dashboard
 **Grafana Panels**:
 1. **Tổng quan Saga Orchestration / Saga Orchestration Overview**:
   - Saga execution rate (success/failure)
   - Average saga duration
   - Compensation rate
   - Pending saga count
 2. **Sức khỏe Outbox Processing / Outbox Processing Health**:
   - Outbox publishing rate
   - Processing lag (P95, P99)
   - Failed events
   - Table size trend
 3. **Hiệu quả Idempotency / Idempotency Effectiveness**:
   - Duplicate prevention rate
   - Redis hit rate
   - Key distribution
 4. **Data Consistency SLA**:
   - Overall consistency rate (target: 99.9%)
   - Mean time to consistency (MTTC)
   - Conflict resolution success rate
 ### Tracing Phân tán / Distributed Tracing
 **Trace Saga Execution**:
 ```typescript
 import { trace, SpanStatusCode } from '@opentelemetry/api';

 // Traced saga step
 async function executeStepWithTracing(
  step: SagaStep,
  context: SagaContext
 ): Promise<void> {
  const tracer = trace.getTracer('saga-orchestrator');
  const span = tracer.startSpan(`saga.step.${step.name}`, {
    attributes: {
      'saga.id': context.sagaId,
      'saga.step': step.name,
      'saga.attempt': context.currentAttempt
    }
  });
  try {
    await step.execute(context);
    span.setStatus({ code: SpanStatusCode.OK });
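   } catch (error) {
     // `error` is `unknown` under strict TypeScript, so normalize it before use
     const err = error instanceof Error ? error : new Error(String(error));
     span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
     span.recordException(err);
     throw error;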
  } finally {
    span.end();
  }
 }
 ```
 ## Tài liệu Liên quan / Related Documentation
 - [Event-Driven Architecture](./event-driven-architecture.md) - Event sourcing và Kafka / Event sourcing and Kafka
 - [System Design](./system-design.md) - Kiến trúc tổng thể / Overall architecture
 - [Microservices Communication](./microservices-communication.md) - Patterns giao tiếp service / Service communication patterns
 - [Resilience Patterns](../skills/resilience-patterns.md) - Circuit breaker, retry cho saga steps / Circuit breaker, retry for saga steps
 - [Caching Patterns](../skills/caching-patterns.md) - Caching cho idempotency keys / Caching for idempotency keys
 - [Database Prisma](../skills/database-prisma.md) - Prisma transactions cho outbox pattern / Prisma transactions for outbox pattern
 ---
 **Cập nhật Lần cuối / Last Updated**: 2026-01-07  
 **Tác giả / Authors**: GoodGo Architecture Team  
 **Người Đánh giá / Reviewers**: To be assigned
--- a/docs/vi/architecture/event-driven-architecture.md
+++ b/docs/vi/architecture/event-driven-architecture.md
@@ -0,0 +1,639 @@
 # Kiến trúc Hướng Sự kiện / Event-Driven Architecture
 > **VI**: Kiến trúc hướng sự kiện cho giao tiếp bất đồng bộ sử dụng Apache Kafka
 > **EN**: Event-driven architecture for asynchronous communication using Apache Kafka
 ## Sơ đồ Tổng quan / Overview Diagram
 ```mermaid
 graph TD
    subgraph "Event Producers"
        IAM[IAM Service]
        Service1[Service A]
    end
    subgraph "Event Broker"
        Kafka[Apache Kafka]
        Topics[Topics: user.events, auth.events]
    end
    subgraph "Event Consumers"
        Consumer1[Notification Service]
        Consumer2[Audit Service]
    end
    IAM -->|Publish| Kafka
    Service1 -->|Publish| Kafka
    Kafka --> Topics
    Topics -->|Subscribe| Consumer1
    Topics -->|Subscribe| Consumer2
    style Kafka fill:#e1f5ff
    style Topics fill:#fff4e1
 ```
 ## Mô tả Kiến trúc / Architecture Description
 ### VI: Phần Tiếng Việt
 Nền tảng GoodGo triển khai Kiến trúc Hướng Sự kiện (EDA) cho giao tiếp bất đồng bộ giữa microservices.
 **Nguyên tắc Cốt lõi**:
 1. **Event-First Design**: Mọi thay đổi trạng thái phát ra domain events
 2. **Loose Coupling**: Services giao tiếp qua events
 3. **Eventual Consistency**: Chấp nhận inconsistency tạm thời  
 4. **Event Sourcing**: Lưu thay đổi dưới dạng chuỗi event
 5. **CQRS Pattern**: Tách biệt read/write operations
 **Công nghệ**:
 - Apache Kafka - Nền tảng event streaming
 - Schema Registry - Avro schemas để validation
 - KafkaJS - Thư viện Node.js client  
 - Event Sourcing - Triển khai tùy chỉnh trong IAM
 ### EN: English Section
 The GoodGo platform implements Event-Driven Architecture (EDA) for asynchronous communication between microservices.
 **Core Principles**:
 1. **Event-First Design**: All state changes emit domain events
 2. **Loose Coupling**: Services communicate through events
 3. **Eventual Consistency**: Accept temporary inconsistency
 4. **Event Sourcing**: Store changes as event sequence
 5. **CQRS Pattern**: Separate read/write operations
 **Technology Stack**:
 - Apache Kafka - Event streaming platform
 - Schema Registry - Avro schemas for validation  
 - KafkaJS - Node.js client library
 - Event Sourcing - Custom implementation in IAM
 ## Luồng Sự kiện / Event Flow
 ```mermaid
 sequenceDiagram
    participant Producer as IAM Service
    participant Kafka as Kafka Broker
    participant Consumer as Notification Service
    Producer->>Kafka: Publish Event (user.created)
    Kafka->>Consumer: Deliver Event
    Consumer->>Consumer: Process Event
    Consumer-->>Kafka: Acknowledge
 ```
 **VI Các Bước**: Publish → Distribute → Consume → Retry (nếu thất bại) → DLQ (sau retry tối đa) → Acknowledge
 **EN Steps**: Publish → Distribute → Consume → Retry (if failed) → DLQ (after max retries) → Acknowledge
 ## Cấu trúc Sự kiện / Event Structure
 ```typescript
 interface BaseEvent {
  eventId: string;         // UUID
  eventType: string;       // user.created.v1
  eventVersion: string;    // 1.0.0
  timestamp: string;       // ISO 8601
  source: string;          // iam-service
  correlationId?: string;  // Request correlation
  data: unknown;           // Event payload
 }
 ```
 **Ví dụ / Example**:
 ```json
 {
  "eventId": "550e8400-e29b-41d4-a716-446655440000",
  "eventType": "user.created.v1",
  "eventVersion": "1.0.0",
  "timestamp": "2024-01-15T10:30:00Z",
  "source": "iam-service",
  "data": {
    "userId": "user_123",
    "email": "user@example.com"
  }
 }
 ```
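A minimal sketch of how a producer might build this envelope so that every event carries the required fields. The `createEvent` helper, its `options` parameter, and the generic parameter on `BaseEvent` are illustrative assumptions, not part of the documented API; the default version `1.0.0` is taken from the interface comment above.

```typescript
import { randomUUID } from 'node:crypto';

// Mirrors the BaseEvent interface above; the generic parameter is an addition.
interface BaseEvent<T = unknown> {
  eventId: string;
  eventType: string;
  eventVersion: string;
  timestamp: string;
  source: string;
  correlationId?: string;
  data: T;
}

// Hypothetical helper: fills in the envelope so callers only supply the payload.
function createEvent<T>(
  eventType: string,
  source: string,
  data: T,
  options: { eventVersion?: string; correlationId?: string } = {}
): BaseEvent<T> {
  return {
    eventId: randomUUID(),                          // UUID
    eventType,
    eventVersion: options.eventVersion ?? '1.0.0',
    timestamp: new Date().toISOString(),            // ISO 8601
    source,
    // Only include correlationId when the caller provides one.
    ...(options.correlationId ? { correlationId: options.correlationId } : {}),
    data,
  };
}

const userCreated = createEvent('user.created.v1', 'iam-service', {
  userId: 'user_123',
  email: 'user@example.com',
});
```

Centralising envelope creation in one helper keeps `eventId`, `timestamp` and `eventVersion` from being forgotten or formatted differently per service.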
 ## Kafka Topics
 ```mermaid
 graph LR
    UserCreated[user.created<br/>Partitions: 3]
    AuthLogin[auth.login.success<br/>Partitions: 5]
    AuditEvents[audit.events<br/>Partitions: 10]
    style UserCreated fill:#e1f5ff
    style AuthLogin fill:#fff4e1
    style AuditEvents fill:#f8d7da
 ```
 **Quy ước Đặt tên / Naming Convention** (event types): `{domain}.{action}.{version}`, where `{action}` may span several dot-separated segments
 **Ví dụ / Examples**:
 - `user.created.v1`
 - `auth.login.success.v1`
 - `audit.event.logged.v1`
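The convention can be checked mechanically before an event is published. The sketch below is one possible validator; `parseEventType` and `ParsedEventType` are hypothetical names, and it assumes lowercase alphanumeric segments and that everything between the domain and the version belongs to the action (as in `auth.login.success.v1`).

```typescript
interface ParsedEventType {
  domain: string;
  action: string;   // may contain dots, e.g. "login.success"
  version: number;  // numeric part of "v1"
}

// Returns the parsed parts, or null when the name does not follow
// `{domain}.{action}.{version}`.
function parseEventType(eventType: string): ParsedEventType | null {
  const segments = eventType.split('.');
  if (segments.length < 3) return null;

  const versionMatch = /^v([1-9][0-9]*)$/.exec(segments[segments.length - 1]);
  const nameSegments = segments.slice(0, -1);
  const isWord = (segment: string) => /^[a-z][a-z0-9]*$/.test(segment);
  if (!versionMatch || !nameSegments.every(isWord)) return null;

  return {
    domain: nameSegments[0],
    action: nameSegments.slice(1).join('.'),
    version: Number(versionMatch[1]),
  };
}
```

A name without a version suffix, such as `user.created`, is rejected by this sketch.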
 ## Xử lý Lỗi / Error Handling
 ```mermaid
 graph TD
    Event[Event] --> Process[Process]
    Process -->|Success| Ack[Acknowledge]
    Process -->|Failure| Retry[Retry 3x]
    Retry -->|Max Retries| DLQ[Dead Letter Queue]
    DLQ --> Alert[Alert Team]
 ```
 **Chiến lược / Strategy**:
 1. Retry với exponential backoff (100ms → 200ms → 400ms)
 2. Tối đa 3 lần thử / Max 3 attempts
 3. Chuyển sang DLQ sau retry tối đa / Move to DLQ after max retries
 4. Xem xét thủ công và xử lý lại / Manual review and reprocess
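The strategy above can be sketched as a small wrapper around a consumer handler. This is an illustration under stated assumptions, not the platform's implementation: it reads "Retry 3x" as one initial attempt followed by up to three retries with waits of 100 ms, 200 ms and 400 ms, and `processWithRetry`, `sendToDlq` and `sleep` are hypothetical names.

```typescript
type Handler<E> = (event: E) => Promise<void>;

interface RetryOptions<E> {
  maxRetries: number;                                     // 3 in the strategy above
  baseDelayMs: number;                                    // 100 in the strategy above
  sendToDlq: (event: E, error: unknown) => Promise<void>; // hypothetical DLQ publisher
  sleep?: (ms: number) => Promise<void>;                  // injectable so tests do not wait
}

// Delay before retry n (1-based): base, 2 * base, 4 * base, ...
function backoffDelayMs(baseDelayMs: number, retryNumber: number): number {
  return baseDelayMs * 2 ** (retryNumber - 1);
}

async function processWithRetry<E>(
  event: E,
  handler: Handler<E>,
  options: RetryOptions<E>
): Promise<'processed' | 'dead-lettered'> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;

  for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
    // Wait only before retries, never before the first attempt.
    if (attempt > 0) await sleep(backoffDelayMs(options.baseDelayMs, attempt));
    try {
      await handler(event);
      return 'processed';
    } catch (error) {
      lastError = error;
    }
  }

  // Retries exhausted: park the event for manual review and reprocessing.
  await options.sendToDlq(event, lastError);
  return 'dead-lettered';
}
```

Because an event may be handled more than once before it succeeds, handlers used with a wrapper like this should be idempotent.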
 ## Bối cảnh Hệ thống / System Context
 ```mermaid
 C4Context
    title Sơ đồ Bối cảnh Event-Driven Architecture
    System(iam, "IAM Service", "Event producer")
    System(service_a, "Service A", "Event producer")
    System(notification, "Notification Service", "Event consumer")
    System(audit, "Audit Service", "Event consumer")
    System_Ext(kafka, "Apache Kafka", "Event streaming platform")
    System_Ext(registry, "Schema Registry", "Schema management")
    System_Ext(monitoring, "Monitoring", "Kafka metrics & alerts")
    Rel(iam, kafka, "Publishes events", "Kafka Protocol")
    Rel(service_a, kafka, "Publishes events", "Kafka Protocol")
    Rel(kafka, notification, "Delivers events", "Kafka Protocol")
    Rel(kafka, audit, "Delivers events", "Kafka Protocol")
    Rel(kafka, registry, "Validates schemas", "HTTP")
    Rel(kafka, monitoring, "Sends metrics", "JMX")
 ```
 **VI Mô tả**:
 - **Producers**: IAM Service và các services khác publish domain events
 - **Kafka**: Event broker trung tâm, quản lý topics và partitions
 - **Consumers**: Notification và Audit services consume events
 - **Schema Registry**: Quản lý và validate Avro schemas
 - **Monitoring**: Thu thập metrics từ Kafka cluster
 **EN Description**:
 - **Producers**: IAM Service and other services publish domain events
 - **Kafka**: Central event broker, manages topics and partitions
 - **Consumers**: Notification and Audit services consume events
 - **Schema Registry**: Manages and validates Avro schemas
 - **Monitoring**: Collects metrics from Kafka cluster
 ## Đặc điểm Hiệu suất / Performance Characteristics
 | Chỉ số / Metric | Mục tiêu / Target | Ghi chú / Notes |
 |-----------------|-------------------|-----------------|
 | **Event Publish Latency (P95)** | < 10ms | Fire-and-forget, async |
 | **Event Delivery Latency (P95)** | < 100ms | End-to-end from publish to consume |
 | **Throughput** | 10,000 events/s | Per topic, scalable with partitions |
 | **Consumer Lag** | < 1000 messages | Per partition, monitored |
 | **Event Size** | < 1MB | Recommended max size |
 | **Retention** | 7 days | Default, configurable per topic |
 | **Replication Factor** | 3 | For fault tolerance |
 **VI Tối ưu hóa Hiệu suất**:
 - **Batch Publishing**: Group multiple events để giảm network overhead
 - **Compression**: Sử dụng Snappy hoặc LZ4 compression
 - **Partitioning**: Phân chia topics thành multiple partitions cho parallel processing
 - **Consumer Groups**: Multiple consumers trong cùng group để scale horizontally
 - **Async Publishing**: Fire-and-forget pattern, không block request handlers
 **EN Performance Optimizations**:
 - **Batch Publishing**: Group multiple events to reduce network overhead
 - **Compression**: Use Snappy or LZ4 compression
 - **Partitioning**: Divide topics into multiple partitions for parallel processing
 - **Consumer Groups**: Multiple consumers in the same group for horizontal scaling
 - **Async Publishing**: Fire-and-forget pattern, don't block request handlers
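As an illustration of batch publishing, the sketch below buffers events in memory and hands them to the producer in groups. `EventBatcher` and `sendBatch` are hypothetical names; `sendBatch` stands in for whatever producer call the service actually uses, and a real implementation would also flush on a timer and on shutdown.

```typescript
type SendBatch<E> = (events: E[]) => void;

// Hypothetical in-memory batcher: collects events and sends them in groups.
class EventBatcher<E> {
  private buffer: E[] = [];
  private readonly sendBatch: SendBatch<E>;
  private readonly maxBatchSize: number;

  constructor(sendBatch: SendBatch<E>, maxBatchSize: number) {
    this.sendBatch = sendBatch;
    this.maxBatchSize = maxBatchSize;
  }

  // Queue an event and flush as soon as the batch is full.
  add(event: E): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatchSize) this.flush();
  }

  // Send whatever is buffered; a no-op when the buffer is empty.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sendBatch(batch);
  }
}
```

Larger batches reduce per-request overhead but add latency for the first event in each batch, so the batch size has to be weighed against the publish latency target in the table above.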
 ## Cân nhắc Bảo mật / Security Considerations
 ### VI: Phần Tiếng Việt
 **Event Encryption**:
 - TLS in-transit cho tất cả Kafka connections
 - Optional payload encryption cho sensitive data
 - End-to-end encryption với custom encryption layer
 **Access Control**:
 - Kafka ACLs (Access Control Lists) per topic
 - SASL/SCRAM authentication cho producers và consumers
 - Separate credentials cho mỗi service
 - Principle of least privilege - chỉ grant quyền cần thiết
 **Schema Validation**:
 - Avro schemas trong Schema Registry
 - Schema evolution với backward/forward compatibility
 - Reject events không match schema
 **Audit**:
 - Log tất cả event publishes và consumes
 - Correlation IDs để trace event flow
 - Retention policy cho audit logs (7 years)
 **Data Retention**:
 - Default 7 days retention
 - Configurable per topic
 - Automatic deletion sau retention period
 - Compliance với GDPR (right to erasure)
 ### EN: English Section
 **Event Encryption**:
 - TLS in-transit for all Kafka connections
 - Optional payload encryption for sensitive data
 - End-to-end encryption with custom encryption layer
 **Access Control**:
 - Kafka ACLs (Access Control Lists) per topic
 - SASL/SCRAM authentication for producers and consumers
 - Separate credentials per service
 - Principle of least privilege - grant only necessary permissions
 **Schema Validation**:
 - Avro schemas in Schema Registry
 - Schema evolution with backward/forward compatibility
 - Reject events that don't match schema
 **Audit**:
 - Log all event publishes and consumes
 - Correlation IDs to trace event flow
 - Retention policy for audit logs (7 years)
 **Data Retention**:
 - Default 7 days retention
 - Configurable per topic
 - Automatic deletion after retention period
 - GDPR compliance (right to erasure)
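For the optional payload encryption, one possible approach is to encrypt the `data` field with AES-256-GCM from Node's built-in `crypto` module before publishing. This is a sketch under assumptions: the document does not specify an algorithm, `encryptPayload`, `decryptPayload` and `EncryptedPayload` are hypothetical names, and key distribution and rotation are out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

interface EncryptedPayload {
  iv: string;         // base64, 12 random bytes generated per message
  authTag: string;    // base64 GCM authentication tag
  ciphertext: string; // base64
}

// Encrypts a JSON-serialisable payload with a 32-byte key.
function encryptPayload(data: unknown, key: Buffer): EncryptedPayload {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(data), 'utf8'),
    cipher.final(),
  ]);
  return {
    iv: iv.toString('base64'),
    authTag: cipher.getAuthTag().toString('base64'),
    ciphertext: ciphertext.toString('base64'),
  };
}

// Decrypts and parses the payload; throws if the data or tag was altered.
function decryptPayload(payload: EncryptedPayload, key: Buffer): unknown {
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(payload.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(payload.authTag, 'base64'));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(payload.ciphertext, 'base64')),
    decipher.final(),
  ]);
  return JSON.parse(plaintext.toString('utf8'));
}
```

Encrypting only `data` leaves the envelope fields (`eventType`, `source`, `correlationId`) readable for routing and tracing.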
 ## Triển khai / Deployment
 ```mermaid
 graph TD
    subgraph "Kafka Cluster"
        subgraph "Brokers"
            Broker1[Kafka Broker 1<br/>Leader for partitions 0,3,6]
            Broker2[Kafka Broker 2<br/>Leader for partitions 1,4,7]
            Broker3[Kafka Broker 3<br/>Leader for partitions 2,5,8]
        end
        subgraph "Coordination"
            ZK[Zookeeper Ensemble<br/>3 nodes]
        end
        Broker1 --> ZK
        Broker2 --> ZK
        Broker3 --> ZK
    end
    subgraph "Producers"
        IAM[IAM Service]
        ServiceA[Service A]
    end
    subgraph "Consumers"
        Notification[Notification Service<br/>Consumer Group: notifications]
        Audit[Audit Service<br/>Consumer Group: audit]
    end
    IAM --> Broker1
    IAM --> Broker2
    IAM --> Broker3
    ServiceA --> Broker1
    ServiceA --> Broker2
    ServiceA --> Broker3
    Broker1 --> Notification
    Broker2 --> Notification
    Broker3 --> Notification
    Broker1 --> Audit
    Broker2 --> Audit
    Broker3 --> Audit
    style Broker1 fill:#e1f5ff
    style Broker2 fill:#fff4e1
    style Broker3 fill:#d4edda
    style ZK fill:#f0e1ff
 ```
 ### VI: Chiến lược Triển khai
 **Kafka Cluster Configuration**:
 - **Brokers**: 3 brokers minimum (5 for production)
 - **Replication Factor**: 3 (for fault tolerance)
 - **Min In-Sync Replicas**: 2 (ensure data durability)
 - **Partitions**: 3-10 per topic (based on throughput needs)
 - **Zookeeper**: 3-node ensemble (for coordination)
 **Resource Allocation**:
 | Component | CPU | Memory | Disk |
 |-----------|-----|--------|------|
 | **Kafka Broker** | 2 cores | 4GB RAM | 100GB SSD |
 | **Zookeeper** | 1 core | 2GB RAM | 20GB SSD |
 | **Schema Registry** | 0.5 cores | 1GB RAM | 10GB |
 **Topic Configuration**:
 ```yaml
 user.created:
  partitions: 3
  replication-factor: 3
  retention-ms: 604800000  # 7 days
  compression-type: snappy
 auth.login.success:
  partitions: 5
  replication-factor: 3
  retention-ms: 604800000
  compression-type: snappy
 audit.events:
  partitions: 10
  replication-factor: 3
  retention-ms: 220752000000  # 7 years
  compression-type: lz4
 ```
 **High Availability**:
 - Multiple brokers với partition replication
 - Automatic leader election khi broker fails
 - Consumer group rebalancing
 - Monitoring và alerting cho broker health
 ### EN: Deployment Strategy
 **Kafka Cluster Configuration**:
 - **Brokers**: 3 brokers minimum (5 for production)
 - **Replication Factor**: 3 (for fault tolerance)
 - **Min In-Sync Replicas**: 2 (ensure data durability)
 - **Partitions**: 3-10 per topic (based on throughput needs)
 - **Zookeeper**: 3-node ensemble (for coordination)
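The relationship between replication factor and minimum in-sync replicas can be made explicit with a little arithmetic. The sketch assumes producers publish with `acks=all` (not stated in this document), in which case a write is rejected once fewer than the minimum number of replicas are in sync; `toleratedFailures` is a hypothetical helper name.

```typescript
interface ReplicationSettings {
  replicationFactor: number;
  minInSyncReplicas: number;
}

// How many replicas of a partition can be lost before writes
// (with acks=all) or all copies of the data become unavailable.
function toleratedFailures(settings: ReplicationSettings): { forWrites: number; forData: number } {
  return {
    forWrites: settings.replicationFactor - settings.minInSyncReplicas,
    forData: settings.replicationFactor - 1,
  };
}

const documented = toleratedFailures({ replicationFactor: 3, minInSyncReplicas: 2 });
```

With the documented values, a partition keeps accepting writes with one replica down, and at least one copy of the data survives the loss of two.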
 **Resource Allocation**:
 | Component | CPU | Memory | Disk |
 |-----------|-----|--------|------|
 | **Kafka Broker** | 2 cores | 4GB RAM | 100GB SSD |
 | **Zookeeper** | 1 core | 2GB RAM | 20GB SSD |
 | **Schema Registry** | 0.5 cores | 1GB RAM | 10GB |
 **Topic Configuration**:
 ```yaml
 user.created:
  partitions: 3
  replication-factor: 3
  retention-ms: 604800000  # 7 days
  compression-type: snappy
 auth.login.success:
  partitions: 5
  replication-factor: 3
  retention-ms: 604800000
  compression-type: snappy
 audit.events:
  partitions: 10
  replication-factor: 3
  retention-ms: 220752000000  # 7 years
  compression-type: lz4
 ```
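The `retention-ms` values above are plain day counts converted to milliseconds. A small helper (the name `retentionMs` is illustrative) makes the arithmetic checkable; the seven-year figure uses 365-day years and ignores leap days.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000

// Converts a retention period in days to the millisecond value used in topic config.
function retentionMs(days: number): number {
  return days * MS_PER_DAY;
}

const sevenDays = retentionMs(7);        // 604800000
const sevenYears = retentionMs(7 * 365); // 220752000000
```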
 **High Availability**:
 - Multiple brokers with partition replication
 - Automatic leader election when a broker fails
 - Consumer group rebalancing
 - Monitoring and alerting for broker health
 ## Giám sát & Khả năng quan sát / Monitoring & Observability
 ### VI: Chỉ số Chính
 **Kafka Broker Metrics**:
 - `kafka_server_brokertopicmetrics_messagesinpersec` - Messages in/sec
 - `kafka_server_brokertopicmetrics_bytesinpersec` - Bytes in/sec
 - `kafka_server_brokertopicmetrics_bytesoutpersec` - Bytes out/sec
 - `kafka_controller_kafkacontroller_activecontrollercount` - Active controller
 - `kafka_server_replicamanager_underreplicatedpartitions` - Under-replicated partitions
 **Consumer Metrics**:
 - `kafka_consumer_fetch_manager_records_lag_max` - Max consumer lag
 - `kafka_consumer_fetch_manager_records_consumed_rate` - Records consumed/sec
 - `kafka_consumer_coordinator_commit_latency_avg` - Commit latency
 **Producer Metrics**:
 - `kafka_producer_record_send_total` - Total records sent
 - `kafka_producer_record_error_total` - Total send errors
 - `kafka_producer_request_latency_avg` - Request latency
 **Application Metrics**:
 ```typescript
 // VI: Custom metrics cho event processing
 // EN: Custom metrics for event processing
 const eventPublished = new Counter({
  name: 'events_published_total',
  help: 'Total events published',
  labelNames: ['event_type', 'topic']
 });
 const eventConsumed = new Counter({
  name: 'events_consumed_total',
  help: 'Total events consumed',
  labelNames: ['event_type', 'topic', 'consumer_group']
 });
 const eventProcessingDuration = new Histogram({
  name: 'event_processing_duration_seconds',
  help: 'Event processing duration',
  labelNames: ['event_type'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5]
 });
 ```
 **Alerting Rules**:
 ```yaml
 # VI: Quy tắc cảnh báo
 # EN: Alerting rules
 # High consumer lag
 - alert: HighConsumerLag
   expr: kafka_consumer_fetch_manager_records_lag_max > 10000
   for: 5m
   labels:
     severity: warning
   annotations:
     summary: "High consumer lag detected"
     description: "Consumer lag is {{ $value }} messages"
 # Broker down
 - alert: KafkaBrokerDown
   expr: kafka_server_kafkaserver_brokerstate != 3
   for: 1m
   labels:
     severity: critical
   annotations:
     summary: "Kafka broker is down"
 # Under-replicated partitions
 - alert: UnderReplicatedPartitions
   expr: kafka_server_replicamanager_underreplicatedpartitions > 0
   for: 5m
   labels:
     severity: warning
   annotations:
     summary: "Under-replicated partitions detected"
 # Offline partitions
 - alert: OfflinePartitions
   expr: kafka_controller_kafkacontroller_offlinepartitionscount > 0
   for: 1m
   labels:
     severity: critical
   annotations:
     summary: "Offline partitions detected"
 ```
 **Dashboards**:
 - Kafka Cluster Overview (brokers, topics, partitions)
 - Producer Performance (throughput, latency, errors)
 - Consumer Performance (lag, throughput, errors)
 - Topic Metrics (messages/sec, bytes/sec, retention)
 **Logging**:
 ```typescript
 // VI: Structured logging cho events
 // EN: Structured logging for events
 logger.info('Event published', {
  eventId: event.eventId,
  eventType: event.eventType,
  topic: 'user.created',
  correlationId: event.correlationId
 });
 logger.info('Event consumed', {
  eventId: event.eventId,
  eventType: event.eventType,
  topic: 'user.created',
  consumerGroup: 'notifications',
  processingTime: duration
 });
 ```
 ### EN: Key Metrics
 **Kafka Broker Metrics**:
 - `kafka_server_brokertopicmetrics_messagesinpersec` - Messages in/sec
 - `kafka_server_brokertopicmetrics_bytesinpersec` - Bytes in/sec
 - `kafka_server_brokertopicmetrics_bytesoutpersec` - Bytes out/sec
 - `kafka_controller_kafkacontroller_activecontrollercount` - Active controller
 - `kafka_server_replicamanager_underreplicatedpartitions` - Under-replicated partitions
 **Consumer Metrics**:
 - `kafka_consumer_fetch_manager_records_lag_max` - Max consumer lag
 - `kafka_consumer_fetch_manager_records_consumed_rate` - Records consumed/sec
 - `kafka_consumer_coordinator_commit_latency_avg` - Commit latency
 **Producer Metrics**:
 - `kafka_producer_record_send_total` - Total records sent
 - `kafka_producer_record_error_total` - Total send errors
 - `kafka_producer_request_latency_avg` - Request latency
 **Application Metrics**:
 ```typescript
 // Custom metrics for event processing
 const eventPublished = new Counter({
  name: 'events_published_total',
  help: 'Total events published',
  labelNames: ['event_type', 'topic']
 });
 const eventConsumed = new Counter({
  name: 'events_consumed_total',
  help: 'Total events consumed',
  labelNames: ['event_type', 'topic', 'consumer_group']
 });
 const eventProcessingDuration = new Histogram({
  name: 'event_processing_duration_seconds',
  help: 'Event processing duration',
  labelNames: ['event_type'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5]
 });
 ```
 **Alerting Rules**:
 ```yaml
 # Alerting rules
 # High consumer lag
 - alert: HighConsumerLag
   expr: kafka_consumer_fetch_manager_records_lag_max > 10000
   for: 5m
   labels:
     severity: warning
   annotations:
     summary: "High consumer lag detected"
     description: "Consumer lag is {{ $value }} messages"
 # Broker down
 - alert: KafkaBrokerDown
   expr: kafka_server_kafkaserver_brokerstate != 3
   for: 1m
   labels:
     severity: critical
   annotations:
     summary: "Kafka broker is down"
 # Under-replicated partitions
 - alert: UnderReplicatedPartitions
   expr: kafka_server_replicamanager_underreplicatedpartitions > 0
   for: 5m
   labels:
     severity: warning
   annotations:
     summary: "Under-replicated partitions detected"
 # Offline partitions
 - alert: OfflinePartitions
   expr: kafka_controller_kafkacontroller_offlinepartitionscount > 0
   for: 1m
   labels:
     severity: critical
   annotations:
     summary: "Offline partitions detected"
 ```
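The `HighConsumerLag` rule above depends on consumer lag, which for a single partition is the distance between the end of the log and the group's committed position. A minimal sketch of that calculation follows; the type and function names are illustrative, and offsets are modelled as plain numbers for brevity even though Kafka offsets are 64-bit integers.

```typescript
interface PartitionOffsets {
  partition: number;
  logEndOffset: number;    // offset the next produced message will receive
  committedOffset: number; // next offset the consumer group will read
}

// Messages written to the partition but not yet processed by the group.
function partitionLag(offsets: PartitionOffsets): number {
  return Math.max(0, offsets.logEndOffset - offsets.committedOffset);
}

// Worst lag across all partitions of a topic.
function maxLag(partitions: PartitionOffsets[]): number {
  return partitions.reduce((worst, p) => Math.max(worst, partitionLag(p)), 0);
}

// Mirrors the alert threshold used in the rule above.
function isLagAlerting(partitions: PartitionOffsets[], threshold = 10000): boolean {
  return maxLag(partitions) > threshold;
}
```

Lag that grows steadily usually means consumers are slower than producers, which is the case the consumer group scaling described earlier is meant to address.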
 **Dashboards**:
 - Kafka Cluster Overview (brokers, topics, partitions)
 - Producer Performance (throughput, latency, errors)
 - Consumer Performance (lag, throughput, errors)
 - Topic Metrics (messages/sec, bytes/sec, retention)
 **Logging**:
 ```typescript
 // Structured logging for events
 logger.info('Event published', {
  eventId: event.eventId,
  eventType: event.eventType,
  topic: 'user.created',
  correlationId: event.correlationId
 });
 logger.info('Event consumed', {
  eventId: event.eventId,
  eventType: event.eventType,
  topic: 'user.created',
  consumerGroup: 'notifications',
  processingTime: duration
 });
 ```
 ## Tài liệu Liên quan / Related Documentation
 - [System Design](./system-design.md) - Kiến trúc tổng thể / Overall architecture
 - [IAM Architecture](./iam-proposal.md) - Triển khai Event sourcing / Event sourcing implementation
 ---
 **Cập nhật Lần cuối / Last Updated**: 2026-01-07  
 **Tác giả / Authors**: GoodGo Architecture Team
--- a/docs/vi/architecture/observability-architecture.md
+++ b/docs/vi/architecture/observability-architecture.md
@@ -0,0 +1,450 @@
 # Kiến trúc Khả năng Quan sát / Observability Architecture
 > **VI**: Khả năng quan sát toàn diện với metrics, logging và tracing
 > **EN**: Comprehensive observability with metrics, logging, and tracing
 ## Sơ đồ Tổng quan / Overview Diagram
 ```mermaid
 graph TD
    subgraph "Services"
        Service1[Service A]
        Service2[Service B]
    end
    subgraph "Metrics"
        Service1 -->|/metrics| Prom[Prometheus]
        Service2 -->|/metrics| Prom
        Prom --> Grafana[Grafana<br/>Dashboards]
    end
    subgraph "Logging"
        Service1 -->|JSON Logs| Loki
        Service2 -->|JSON Logs| Loki
        Loki --> GrafanaLogs[Grafana<br/>Log Explorer]
    end
    subgraph "Tracing"
        Service1 -->|Spans| Jaeger
        Service2 -->|Spans| Jaeger
        Jaeger --> JaegerUI[Jaeger UI]
    end
    style Prom fill:#d4edda
    style Loki fill:#fff4e1
    style Jaeger fill:#e1f5ff
 ```
 ## Bối cảnh Hệ thống / System Context
 ```mermaid
 C4Context
    title Sơ đồ Bối cảnh Khả năng Quan sát / Observability System Context
    Person(dev, "Developer", "Uses dashboards to monitor system")
    Person(sre, "SRE", "Manages infrastructure & alerts")
    System(obs, "Observability Stack", "Prometheus, Loki, Jaeger, Grafana")
    System_Ext(service, "Microservices", "Sends telemetry data")
    System_Ext(k8s, "Kubernetes", "Sends cluster metrics")
    Rel(dev, obs, "Views Dashboards", "HTTPS")
    Rel(sre, obs, "Configures Alerts", "HTTPS")
    Rel(service, obs, "Push/Pull Telemetry", "HTTP/gRPC")
    Rel(k8s, obs, "Exposes Metrics", "HTTP")
 ```
 ### VI Mô tả Bối cảnh
 - **Observability Stack**: Trung tâm thu thập và hiển thị dữ liệu (Prometheus, Loki, Jaeger, Grafana).
 - **Microservices**: Gửi logs, metrics và traces (OpenTelemetry).
 - **Developer/SRE**: Sử dụng Grafana để theo dõi sức khỏe hệ thống và debug.
 ### EN Context Description
 - **Observability Stack**: Central collection and visualization (Prometheus, Loki, Jaeger, Grafana).
 - **Microservices**: Send logs, metrics, and traces (OpenTelemetry).
 - **Developer/SRE**: Use Grafana to monitor system health and debug.
 ## Ba Trụ cột Khả năng Quan sát / Three Pillars of Observability
 ### 1. Metrics (Prometheus + Grafana)
 ```mermaid
 graph LR
    Service[Service] -->|Expose /metrics| Prom[Prometheus]
    Prom -->|Scrape every 15s| Metrics[Time Series DB]
    Metrics --> Grafana[Grafana]
    Grafana --> Dashboard1[Request Dashboard]
    Grafana --> Dashboard2[Error Dashboard]
    Grafana --> Dashboard3[Performance Dashboard]
    style Prom fill:#d4edda
    style Grafana fill:#e1f5ff
 ```
 **VI**: Các phép đo số theo thời gian (requests/sec, latency, errors).
 **EN**: Numerical measurements over time (requests/sec, latency, errors).
 **Triển khai / Implementation**:
 ```typescript
 import { Counter, Histogram, Gauge } from 'prom-client';
 // VI: HTTP request metrics
 // EN: HTTP request metrics
 export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.001, 0.01, 0.05, 0.1, 0.5, 1, 2, 5]
 });
 export const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status']
 });
 export const activeRequests = new Gauge({
  name: 'http_requests_active',
  help: 'Number of active HTTP requests'
 });
 // VI: Middleware để track metrics
 // EN: Middleware to track metrics
 export function metricsMiddleware(req, res, next) {
  const start = Date.now();
  activeRequests.inc();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    httpRequestDuration.observe(
      { method: req.method, route: req.route?.path || req.path, status: res.statusCode },
      duration
    );
    httpRequestTotal.inc({
      method: req.method,
      route: req.route?.path || req.path,
      status: res.statusCode
    });
    activeRequests.dec();
  });
  next();
 }
 ```
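A Prometheus histogram such as `http_request_duration_seconds` is stored as cumulative bucket counts, and percentiles like P95 are estimated from those counts by linear interpolation inside the bucket that contains the target rank. The sketch below is a simplified approximation of that idea for illustration, not Prometheus's actual implementation; `estimateQuantile` and `CumulativeBucket` are hypothetical names.

```typescript
interface CumulativeBucket {
  le: number;    // upper bound of the bucket (Infinity for the last one)
  count: number; // number of observations <= le
}

// Estimates the q-quantile (0..1) from cumulative histogram buckets.
function estimateQuantile(q: number, buckets: CumulativeBucket[]): number {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted.length > 0 ? sorted[sorted.length - 1].count : 0;
  if (total === 0) return NaN;

  const rank = q * total;
  let lowerBound = 0;
  let lowerCount = 0;
  for (const bucket of sorted) {
    if (bucket.count >= rank) {
      // The open-ended bucket has no upper bound to interpolate towards.
      if (!Number.isFinite(bucket.le)) return lowerBound;
      const inBucket = bucket.count - lowerCount;
      if (inBucket === 0) return bucket.le;
      // Assume observations are spread evenly within the bucket.
      return lowerBound + (bucket.le - lowerBound) * ((rank - lowerCount) / inBucket);
    }
    lowerBound = bucket.le;
    lowerCount = bucket.count;
  }
  return lowerBound;
}
```

The estimate can never be more precise than the bucket boundaries, so bucket edges are best placed near the latency thresholds that matter for alerting.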
 ### 2. Logging (Winston + Loki)
 ```mermaid
 sequenceDiagram
    participant Service
    participant Winston as Winston Logger
    participant Loki
    participant Grafana
    Service->>Winston: Log event
    Winston->>Winston: Format JSON
    Winston->>Winston: Add metadata<br/>(correlation ID, trace ID)
    Winston->>Loki: Push logs
    Loki->>Loki: Index & store
    User->>Grafana: Query logs
    Grafana->>Loki: LogQL query
    Loki-->>Grafana: Log results
 ```
 **VI**: Structured logging với correlation IDs để tracing requests.
 **EN**: Structured logging with correlation IDs for request tracing.
 **Triển khai / Implementation**:
 ```typescript
 import winston from 'winston';
 export const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: process.env.SERVICE_NAME || 'unknown-service',
    environment: process.env.NODE_ENV || 'development'
  },
  transports: [
    new winston.transports.Console(),
    // VI: Loki transport (nếu configured)
    // EN: Loki transport (if configured)
  ]
 });
 // VI: Logger middleware
 // EN: Logger middleware
 export function loggerMiddleware(req, res, next) {
  // generateId() is assumed to be an ID helper defined elsewhere (e.g. a UUID generator)
  const correlationId = req.headers['x-correlation-id'] || generateId();
  // Capture the start time here so the duration logged on finish is well-defined
  const startTime = Date.now();
  req.correlationId = correlationId;
  req.logger = logger.child({ correlationId });
  req.logger.info('Incoming request', {
    method: req.method,
    path: req.path,
    ip: req.ip
  });
  res.on('finish', () => {
    req.logger.info('Request completed', {
      method: req.method,
      path: req.path,
      status: res.statusCode,
      duration: Date.now() - startTime
    });
  });
  next();
 }
 ```
 ### 3. Tracing (OpenTelemetry + Jaeger)
 ```mermaid
 graph LR
    Request[Incoming Request] --> Trace[Create Trace]
    Trace --> SpanA[Span: HTTP Request]
    SpanA --> SpanB[Span: DB Query]
    SpanA --> SpanC[Span: Cache Check]
    SpanA --> SpanD[Span: External API]
    SpanB --> Jaeger[Jaeger]
    SpanC --> Jaeger
    SpanD --> Jaeger
    Jaeger --> Timeline[Trace Timeline]
    style Trace fill:#e1f5ff
    style Jaeger fill:#d4edda
 ```
 **VI**: Distributed tracing để track requests giữa các services.
 **EN**: Distributed tracing to track requests across services.
 **Triển khai / Implementation**:
 ```typescript
 import { trace, SpanStatusCode } from '@opentelemetry/api';
 // VI: Tạo traced function
 // EN: Create traced function
 export function traced<T>(
  name: string,
  fn: () => Promise<T>
 ): Promise<T> {
  const tracer = trace.getTracer('app');
  const span = tracer.startSpan(name);
  return fn()
    .then(result => {
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    })
    .catch(error => {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: error.message
      });
      span.recordException(error);
      throw error;
    })
    .finally(() => {
      span.end();
    });
 }
 // VI: Sử dụng
 // EN: Usage
 async function getUserWithTracing(userId: string): Promise<User> {
  return traced('getUserById', async () => {
    return await userRepository.findById(userId);
  });
 }
 ```
 ## Kiểm tra Sức khỏe / Health Checks
 ```typescript
 // VI: Liveness probe - service có đang chạy không?
 // EN: Liveness probe - is service running?
 app.get('/health/live', (req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
 });
 // VI: Readiness probe - service có sẵn sàng nhận traffic không?
 // EN: Readiness probe - is service ready for traffic?
 app.get('/health/ready', async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    disk: await checkDiskSpace()
  };
  const ready = Object.values(checks).every(check => check === true);
  res.status(ready ? 200 : 503).json({
    ready,
    checks,
    timestamp: new Date().toISOString()
  });
 });
 async function checkDatabase(): Promise<boolean> {
  try {
    await prisma.$queryRaw`SELECT 1`;
    return true;
  } catch {
    return false;
  }
 }
 ```
 ## Quy tắc Cảnh báo / Alerting Rules
 ```yaml
 # VI: Prometheus alerting rules
 # EN: Prometheus alerting rules
 groups:
  - name: service_alerts
    interval: 30s
    rules:
      # VI: Tỷ lệ lỗi cao
      # EN: High error rate
      - alert: HighErrorRate
        expr: |
           sum(rate(http_requests_total{status=~"5.."}[5m]))
             / sum(rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} (> 5%)"
      # VI: Độ trễ cao
      # EN: High latency
      - alert: HighLatency
        expr: |
           histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "P95 latency is {{ $value }}s"
      # VI: Service down
      # EN: Service down
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
 ```
 ## Đặc điểm Hiệu suất / Performance Characteristics
 ### VI: Mục tiêu Hiệu suất
 | Chỉ số / Metric | Mục tiêu / Target | Ghi chú / Notes |
 |-----------------|-------------------|-----------------|
 | **Metric Scrape Interval** | 15s | Critical services |
 | **Log Ingestion Latency** | < 1s | Time from emit to queryable |
 | **Trace Sampling Rate** | 10% | Production (100% in Dev/Staging) |
 | **Dashboard Load Time** | < 2s | P95 Latency |
 | **Alert Evaluation** | Every 1m | Evaluation interval |
 | **Retention Policy** | 14 days | Logs & Traces (Metrics: 30 days) |
 ### EN: Performance Targets
 | Metric | Target | Notes |
 |--------|--------|-------|
 | **Metric Scrape Interval** | 15s | Critical services |
 | **Log Ingestion Latency** | < 1s | Time from emit to queryable |
 | **Trace Sampling Rate** | 10% | Production (100% in Dev/Staging) |
 | **Dashboard Load Time** | < 2s | P95 Latency |
 | **Alert Evaluation** | Every 1m | Evaluation interval |
 | **Retention Policy** | 14 days | Logs & Traces (Metrics: 30 days) |
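The 10% trace sampling target implies a decision that is made once per trace and is the same everywhere the trace is seen. The sketch below shows one simple way to derive such a decision from the trace ID. It is an illustration only, not the algorithm used by the OpenTelemetry SDK samplers, and `shouldSample` is a hypothetical name.

```typescript
// Deterministic ratio sampling: the same trace ID always yields the same
// decision, so independent services agree without coordinating.
function shouldSample(traceIdHex: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Interpret the first 8 hex characters as an unsigned 32-bit number.
  const prefix = parseInt(traceIdHex.slice(0, 8), 16);
  if (Number.isNaN(prefix)) return false;
  // Keep the trace when the prefix falls in the lowest `ratio` share of the range.
  return prefix < ratio * 0x100000000;
}
```

This relies on trace IDs being uniformly random; with that assumption, roughly 10% of traces are kept at `ratio = 0.1`.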
 ## Cân nhắc Bảo mật / Security Considerations
 ### VI: Bảo mật Observability
 - **Log Scrubbing**: Tự động loại bỏ PII (emails, ssn, credit cards) và secrets khỏi logs trước khi ingestion.
 - **Access Control**: Grafana integrated với OAuth2/OIDC, phân quyền Viewer/Editor/Admin.
 - **Network Policy**: Chỉ cho phép traffic từ namespace nội bộ tới các cổng ingestion (9090, 3100, 14268).
 - **TLS**: Mã hóa traffic giữa agents và collectors.
 ### EN: Observability Security
 - **Log Scrubbing**: Automatically scrub PII (emails, ssn, credit cards) and secrets from logs before ingestion.
 - **Access Control**: Grafana integrated with OAuth2/OIDC, roles for Viewer/Editor/Admin.
 - **Network Policy**: Allow traffic only from internal namespaces to ingestion ports (9090, 3100, 14268).
 - **TLS**: Encrypt traffic between agents and collectors.
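Log scrubbing can be approximated in the service itself before a record reaches the logger. The sketch below redacts email addresses inside strings and replaces the values of a few sensitive keys. The key list, the placeholder strings and the `scrub` function are assumptions for illustration; a production rule set would also need to cover other PII such as card numbers.

```typescript
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Keys whose values are always replaced, compared case-insensitively.
const SENSITIVE_KEYS = new Set(['password', 'token', 'secret', 'authorization', 'ssn']);

// Returns a scrubbed copy of a log value without mutating the input.
function scrub(value: unknown): unknown {
  if (typeof value === 'string') {
    return value.replace(EMAIL_PATTERN, '[REDACTED_EMAIL]');
  }
  if (Array.isArray(value)) {
    return value.map((item) => scrub(item));
  }
  if (value !== null && typeof value === 'object') {
    const result: Record<string, unknown> = {};
    for (const [key, nested] of Object.entries(value)) {
      result[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : scrub(nested);
    }
    return result;
  }
  return value;
}
```

A function like this could be applied to log metadata before it is passed to the logger, so that nothing sensitive is written to any transport.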
 ## Triển khai / Deployment
 ```mermaid
 graph TD
    subgraph "Kubernetes Monitoring Namespace"
        Grafana[Grafana]
        Prom[Prometheus Server]
        Loki[Loki Gateway]
        Jaeger[Jaeger Collector]
    end
    subgraph "App Namespace"
        App[Application Pods]
        Agent[Grafana Agent / Promtail]
    end
    App -->|Push Logs| Agent
    Agent -->|Push| Loki
    Prom -->|Pull Metrics| App
    Prom -->|Pull Metrics| Agent
    App -->|Push Traces| Jaeger
    Grafana --> Prom
    Grafana --> Loki
    Grafana --> Jaeger
    style Grafana fill:#ffe1e1
    style Prom fill:#d4edda
    style Loki fill:#fff4e1
    style Jaeger fill:#e1f5ff
 ```
 **VI Mô tả Triển khai**:
 - **Agent**: Promtail hoặc Grafana Agent chạy như DaemonSet hoặc Sidecar để thu thập logs.
 - **Pull Model**: Prometheus scrape metrics từ endpoints `/metrics`.
 - **Push Model**: Traces và Logs được push tới collectors.
 - **Resources**: Dedicated nodes cho monitoring stack trong production để tránh ảnh hưởng workload chính.
 **EN Deployment Description**:
 - **Agent**: Promtail or Grafana Agent runs as DaemonSet or Sidecar to collect logs.
 - **Pull Model**: Prometheus scrapes metrics from `/metrics` endpoints.
 - **Push Model**: Traces and Logs are pushed to collectors.
 - **Resources**: Dedicated nodes for monitoring stack in production to prevent impact on main workload.
 ## Tài liệu Liên quan / Related Documentation
 - [System Design](./system-design.md) - Kiến trúc tổng thể / Overall architecture
 - [Caching Architecture](./caching-architecture.md) - Cache metrics
 ---
 **Cập nhật Lần cuối / Last Updated**: 2026-01-07  
 **Tác giả / Authors**: GoodGo Architecture Team
--- a/docs/vi/architecture/security-architecture.md
+++ b/docs/vi/architecture/security-architecture.md
--- a/docs/vi/architecture/system-design.md
+++ b/docs/vi/architecture/system-design.md
@@ -1,81 +1,928 @@
-# Thiết Kế Hệ Thống
+# Thiết Kế Hệ Thống / System Design
-## Tổng Quan
+> **VI**: Kiến trúc tổng thể của nền tảng GoodGo Microservices
 > **EN**: Overall architecture of GoodGo Microservices Platform
-GoodGo Microservices Platform được xây dựng sử dụng kiến trúc microservices với các nguyên tắc sau:
+## Sơ đồ Tổng quan / Overview Diagram
- **Độc Lập Service**: Mỗi service có database riêng và có thể deploy độc lập
+```mermaid
- **API Gateway**: Traefik xử lý routing, load balancing, và các concerns xuyên suốt
+graph TD
- **Shared Libraries**: Chức năng chung được trích xuất vào shared packages
+    subgraph "Client Layer"
- **Infrastructure as Code**: Tất cả cấu hình infrastructure đều được version
+        Web[Web App<br/>Next.js]
- **Observability**: Đầy đủ khả năng monitoring, logging, và tracing
+        Mobile[Mobile App<br/>Flutter]
-
+    end
-## Sơ Đồ Kiến Trúc
+    
-
+    subgraph "API Gateway Layer"
-```
+        Traefik[Traefik<br/>API Gateway]
-┌─────────────┐     ┌─────────────┐
+    end
-│   Web App   │     │ Mobile App  │
+    
-│  (Next.js)  │     │ (React Native)
+    subgraph "Services Layer"
-└──────┬──────┘     └──────┬──────┘
+        IAM[IAM Service<br/>Auth & RBAC]
-       │                   │
+        Future1[Future Service 1]
-       └──────────┬────────┘
+        Future2[Future Service 2]
-                  │
+    end
-         ┌────────▼────────┐
+    
-         │   Traefik       │
+    subgraph "Infrastructure Layer"
-         │  (API Gateway)   │
+        DB[(Neon PostgreSQL<br/>Primary Database)]
-         └────────┬─────────┘
+        Cache[(Redis<br/>Cache & Session)]
-                  │
+        Kafka[Apache Kafka<br/>Event Streaming]
-    ┌─────────────┼─────────────┐
+    end
-    │             │             │
+    
-┌───▼────┐   ┌───▼────┐   ┌───▼────┐
+    subgraph "Observability Layer"
-│ Auth  │   │ Future │   │ Future │
+        Prom[Prometheus<br/>Metrics]
-│Service │   │Service │   │Service │
+        Loki[Loki<br/>Logs]
-└───┬────┘   └───┬────┘   └───┬────┘
+        Jaeger[Jaeger<br/>Tracing]
-    │            │            │
+        Grafana[Grafana<br/>Dashboards]
-    └────────────┼────────────┘
+    end
-                 │
+    
-    ┌────────────┼────────────┐
+    Web --> Traefik
-    │            │            │
+    Mobile --> Traefik
-┌───▼────┐  ┌───▼────┐  ┌───▼────┐
+    
-│Postgres│  │ Redis  │  │Prometheus│
+    Traefik --> IAM
-└────────┘  └────────┘  └─────────┘
+    Traefik --> Future1
    Traefik --> Future2
    IAM --> DB
    IAM --> Cache
    IAM --> Kafka
    Future1 --> DB
    Future1 --> Cache
    Future1 --> Kafka
    Future2 --> DB
    Future2 --> Cache
    Future2 --> Kafka
    IAM -.->|metrics| Prom
    Future1 -.->|metrics| Prom
    Future2 -.->|metrics| Prom
    IAM -.->|logs| Loki
    Future1 -.->|logs| Loki
    Future2 -.->|logs| Loki
    IAM -.->|traces| Jaeger
    Future1 -.->|traces| Jaeger
    Future2 -.->|traces| Jaeger
    Prom --> Grafana
    Loki --> Grafana
    Jaeger --> Grafana
    style Traefik fill:#e1f5ff
    style DB fill:#f0e1ff
    style Cache fill:#fff4e1
    style Kafka fill:#d4edda
    style Grafana fill:#ffe1e1
 ```
-## Các Thành Phần
+## Mô tả Kiến trúc / Architecture Description
 ### VI: Phần Tiếng Việt
 GoodGo Platform được xây dựng theo kiến trúc microservices với các nguyên tắc sau:
 **Nguyên tắc Cốt lõi**:
 1. **Độc Lập Service**: Mỗi service có database riêng và có thể deploy độc lập
 2. **API Gateway Pattern**: Traefik xử lý routing, load balancing, và cross-cutting concerns
 3. **Shared Libraries**: Chức năng chung được trích xuất vào shared packages (`@goodgo/*`)
 4. **Infrastructure as Code**: Tất cả cấu hình infrastructure được version control
 5. **Observability First**: Đầy đủ metrics, logging, và distributed tracing
 **Công nghệ Stack**:
 - **Frontend**: Next.js 14+ (App Router), Flutter 3.x
 - **Backend**: Node.js 20+, TypeScript 5+, Express
 - **Database**: Neon PostgreSQL (serverless)
 - **Cache**: Redis (multi-layer caching)
 - **Message Broker**: Apache Kafka
 - **API Gateway**: Traefik
 - **Observability**: Prometheus, Grafana, Loki, Jaeger
 ### EN: English Section
 GoodGo Platform is built on a microservices architecture with the following principles:
 **Core Principles**:
 1. **Service Independence**: Each service has its own database and can be deployed independently
 2. **API Gateway Pattern**: Traefik handles routing, load balancing, and cross-cutting concerns
 3. **Shared Libraries**: Common functionality extracted into shared packages (`@goodgo/*`)
 4. **Infrastructure as Code**: All infrastructure configuration is version controlled
 5. **Observability First**: Complete metrics, logging, and distributed tracing
 **Technology Stack**:
 - **Frontend**: Next.js 14+ (App Router), Flutter 3.x
 - **Backend**: Node.js 20+, TypeScript 5+, Express
 - **Database**: Neon PostgreSQL (serverless)
 - **Cache**: Redis (multi-layer caching)
 - **Message Broker**: Apache Kafka
 - **API Gateway**: Traefik
 - **Observability**: Prometheus, Grafana, Loki, Jaeger
 ## Bối cảnh Hệ thống / System Context
 ```mermaid
 C4Context
    title Sơ đồ Bối cảnh Hệ thống GoodGo Platform
    Person(user, "Người dùng / User", "End users accessing the platform")
    Person(admin, "Quản trị viên / Admin", "System administrators")
    Person(developer, "Nhà phát triển / Developer", "Platform developers")
    System(platform, "GoodGo Platform", "Microservices platform for business applications")
    System_Ext(neon, "Neon PostgreSQL", "Serverless PostgreSQL database")
    System_Ext(redis, "Redis", "In-memory cache and session store")
    System_Ext(kafka, "Apache Kafka", "Event streaming platform")
    System_Ext(monitoring, "Monitoring Stack", "Prometheus + Grafana + Loki + Jaeger")
    Rel(user, platform, "Uses", "HTTPS")
    Rel(admin, platform, "Manages", "HTTPS")
    Rel(developer, platform, "Develops & Deploys", "Git, CI/CD")
    Rel(platform, neon, "Stores data", "PostgreSQL Protocol")
    Rel(platform, redis, "Caches data", "Redis Protocol")
    Rel(platform, kafka, "Publishes/Consumes events", "Kafka Protocol")
    Rel(platform, monitoring, "Sends metrics, logs, traces", "HTTP, gRPC")
 ```
 ## Thành phần / Components
 ### Frontend Layer
 - **Web App**: Next.js application using the App Router
 - **Mobile App**: Flutter application
-### API Gateway
+#### Web App (Next.js)
- **Traefik**: Reverse proxy, load balancer, SSL termination
+**Description**: Web application built with Next.js 14+ and the App Router
 **Key features**:
 - Server-side rendering (SSR) and Static Site Generation (SSG)
 - API routes for the BFF (Backend for Frontend) pattern
 - Optimized image loading with next/image
 - Built-in routing and code splitting
 **Technologies**:
 - Next.js 14+, React 18+, TypeScript
 - Tailwind CSS, Zustand (state management)
 - `@goodgo/http-client`, `@goodgo/types`
 **File location**: `apps/web-client/`
 #### Mobile App (Flutter)
 **Description**: Cross-platform mobile application built with Flutter
 **Key features**:
 - Cross-platform (iOS, Android)
 - Native performance
 - Provider pattern for state management
 - Offline-first with local storage
 **Technologies**:
 - Flutter 3.x, Dart
 - Provider, Dio (HTTP client)
 **File location**: `apps/mobile-client/`
 ### API Gateway Layer
 #### Traefik
 **Description**: Reverse proxy and API gateway that handles routing, load balancing, and SSL termination
 **Key features**:
 - Dynamic service discovery
 - Automatic HTTPS with Let's Encrypt
 - Load balancing and health checks
 - Rate limiting and circuit breaker
 - Middleware chains (CORS, auth, logging)
 **Technologies**:
 - Traefik 2.x
 - Docker labels for dynamic configuration
 **File location**: `infra/traefik/`
 ### Services Layer
- **Auth Service**: Authentication and authorization
+
- **Future Services**: Payment, Order, Notification, etc.
+#### IAM Service
 **Description**: Identity and Access Management service that handles authentication and authorization
 **Key features**:
 - JWT authentication (RS256)
 - RBAC (Role-Based Access Control)
 - ABAC (Attribute-Based Access Control)
 - Event sourcing for the audit trail
 - Zero-trust device validation
 **Technologies**:
 - Node.js, Express, TypeScript
 - Prisma ORM, bcrypt, jsonwebtoken
 - `@goodgo/logger`, `@goodgo/tracing`
 **File location**: `services/iam-service/`
 #### Future Services
 **Description**: Services planned for future development
 **Planned**:
 - Payment Service - Payment processing
 - Order Service - Order management
 - Notification Service - Sending notifications
 - Analytics Service - Data analytics
 ### Infrastructure Layer
 - **PostgreSQL**: Primary database
 - **Redis**: Caching and session storage
 - **Prometheus**: Metrics collection
 - **Grafana**: Metrics visualization
 - **Loki**: Log aggregation
-## Communication Patterns
+#### Neon PostgreSQL
 **Description**: Serverless PostgreSQL database with auto-scaling
- **Synchronous**: HTTP/REST for request-response patterns
+**Key features**:
- **Asynchronous**: Message queues (future implementation)
+- Serverless with auto-scaling
- **Service Discovery**: Docker networking and Kubernetes DNS
+- Branching for development/staging
 - Point-in-time recovery
 - Connection pooling
-## Data Management
+**File location**: Database schemas live in each service (`services/*/prisma/schema.prisma`)
- **Database per Service**: Each service owns its data
+#### Redis
- **API Composition**: Services expose APIs for data access
+**Description**: In-memory cache and session store
 - **Event Sourcing**: Under consideration for future audit trails
-## Security
+**Key features**:
 - Multi-layer caching (L1: Memory, L2: Redis)
 - Session storage
 - Rate limiting counters
 - Pub/Sub for real-time features
- **Authentication**: JWT tokens with refresh token rotation
+**File location**: `infra/redis/`
- **Authorization**: Role-based access control (RBAC)
+
- **Network Security**: TLS/SSL, rate limiting, CORS
+#### Apache Kafka
- **Secrets Management**: Environment variables, Kubernetes secrets
+**Description**: Event streaming platform for asynchronous communication
 **Key features**:
 - Event-driven architecture
 - Event sourcing
 - Eventual consistency
 - Dead letter queue (DLQ)
 **File location**: `infra/kafka/`
 ## Luồng Dữ liệu / Data Flow
 ```mermaid
 sequenceDiagram
    participant Client
    participant Traefik as API Gateway
    participant Service
    participant Cache as Redis
    participant DB as PostgreSQL
    participant Kafka
    Client->>Traefik: HTTPS Request
    Traefik->>Traefik: Rate Limiting
    Traefik->>Traefik: JWT Validation
    Traefik->>Service: Route to Service
    Service->>Cache: Check Cache
    alt Cache Hit
        Cache-->>Service: Return Cached Data
    else Cache Miss
        Service->>DB: Query Database
        DB-->>Service: Return Data
        Service->>Cache: Store in Cache (TTL: 5min)
    end
    Service->>Service: Process Business Logic
    Service->>DB: Update Data (if needed)
    Service->>Kafka: Publish Event (async)
    Service-->>Traefik: Response
    Traefik-->>Client: HTTPS Response
    Note over Kafka: Event consumers process asynchronously
 ```
 **VI Giải thích chi tiết**:
 1. **Request**: Client gửi HTTPS request đến Traefik
 2. **Gateway Processing**: Traefik thực hiện rate limiting và JWT validation
 3. **Routing**: Traefik route request đến service phù hợp
 4. **Cache Check**: Service kiểm tra L1 (memory) → L2 (Redis) cache
 5. **Database Query**: Nếu cache miss, query từ PostgreSQL
 6. **Cache Update**: Lưu kết quả vào cache với TTL phù hợp
 7. **Business Logic**: Xử lý logic nghiệp vụ
 8. **Event Publishing**: Publish domain events đến Kafka (async)
 9. **Response**: Trả về response cho client qua Traefik
 **EN Detailed Explanation**:
 1. **Request**: Client sends HTTPS request to Traefik
 2. **Gateway Processing**: Traefik performs rate limiting and JWT validation
 3. **Routing**: Traefik routes request to appropriate service
 4. **Cache Check**: Service checks L1 (memory) → L2 (Redis) cache
 5. **Database Query**: If cache miss, query from PostgreSQL
 6. **Cache Update**: Store result in cache with appropriate TTL
 7. **Business Logic**: Process business logic
 8. **Event Publishing**: Publish domain events to Kafka (async)
 9. **Response**: Return response to client via Traefik
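Steps 4 to 6 above can be sketched as a read-through lookup. This is a minimal illustration under stated assumptions, not the platform's actual cache code: `TtlCache`, `readThrough`, and the L1 TTL are hypothetical names and values, and plain in-memory maps stand in for Redis and PostgreSQL. Only the 5-minute TTL comes from the sequence diagram.

```typescript
// Two-layer cache-aside sketch. All names are illustrative. Plain Maps stand
// in for the in-process L1 cache, Redis (L2), and PostgreSQL.
type CacheEntry = { value: string; expiresAt: number };

class TtlCache {
  private entries = new Map<string, CacheEntry>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

type ReadSource = "l1" | "l2" | "db" | "none";

function readThrough(
  key: string,
  l1: TtlCache,
  l2: TtlCache,
  db: Map<string, string>,
): { value: string | undefined; source: ReadSource } {
  const fromL1 = l1.get(key);
  if (fromL1 !== undefined) return { value: fromL1, source: "l1" };

  const fromL2 = l2.get(key);
  if (fromL2 !== undefined) {
    l1.set(key, fromL2); // backfill L1 so the next read stays in-process
    return { value: fromL2, source: "l2" };
  }

  const fromDb = db.get(key);
  if (fromDb === undefined) return { value: undefined, source: "none" };
  l2.set(key, fromDb); // populate both layers after a database read
  l1.set(key, fromDb);
  return { value: fromDb, source: "db" };
}
```

The clock is injected so that expiry can be exercised without waiting. Giving L1 a much shorter TTL than L2 limits how long one instance can serve a value that another instance has already changed.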
 ## Kiến trúc Database / Database Architecture
 ```mermaid
 erDiagram
    User ||--o{ Session : has
    User ||--o{ UserRole : has
    User ||--o{ UserPermission : has
    User ||--o{ MFADevice : has
    User ||--o{ AuditEvent : triggers
    Role ||--o{ UserRole : assigned_to
    Role ||--o{ RolePermission : has
    Permission ||--o{ RolePermission : granted_to
    Permission ||--o{ UserPermission : granted_to
    Organization ||--o{ User : contains
    Organization ||--o{ Role : defines
    User {
        string id PK
        string email UK
        string passwordHash
        string organizationId FK
        boolean mfaEnabled
        datetime createdAt
        datetime updatedAt
    }
    Session {
        string id PK
        string userId FK
        string refreshTokenHash
        string deviceFingerprint
        string ipAddress
        datetime expiresAt
        datetime createdAt
    }
    Role {
        string id PK
        string name
        string organizationId FK
        int hierarchy
        datetime createdAt
    }
    Permission {
        string id PK
        string resource
        string action
        string scope
        datetime createdAt
    }
    AuditEvent {
        string id PK
        string userId FK
        string eventType
        json eventData
        datetime timestamp
    }
 ```
 **VI Mô tả**:
 - **Database per Service**: Mỗi service có database schema riêng
 - **Shared Database**: Hiện tại sử dụng shared Neon PostgreSQL, schemas isolated bằng Prisma
 - **Event Sourcing**: Audit events lưu tất cả thay đổi quan trọng
 - **Soft Delete**: Sử dụng `deletedAt` field thay vì hard delete
 **EN Description**:
 - **Database per Service**: Each service has its own database schema
 - **Shared Database**: All services currently share one Neon PostgreSQL instance, with schemas isolated by Prisma
 - **Event Sourcing**: Audit events record all important changes
 - **Soft Delete**: Use a `deletedAt` field instead of hard deletes
 ## Quyết định Thiết kế / Design Decisions
 ### Quyết định 1: Microservices Architecture
 **VI Bối cảnh**: Cần khả năng scale độc lập và deploy riêng biệt cho từng business domain
 **VI Quyết định**: Sử dụng microservices architecture với database per service pattern
 **VI Hậu quả**:
 - ✅ **Tích cực**:
  - Scale độc lập từng service theo nhu cầu
  - Deploy riêng biệt, giảm risk khi release
  - Fault isolation - lỗi một service không ảnh hưởng toàn bộ
  - Technology flexibility - mỗi service có thể dùng tech stack khác
 - ❌ **Tiêu cực**:
  - Phức tạp hơn monolith (distributed systems challenges)
  - Eventual consistency thay vì strong consistency
  - Distributed transactions phức tạp (Saga pattern)
  - Operational overhead (monitoring, deployment)
 **VI Các lựa chọn thay thế**: Monolith, Modular Monolith
 **EN Context**: Need independent scaling and deployment for each business domain
 **EN Decision**: Use microservices architecture with database per service pattern
 **EN Consequences**:
 - ✅ **Positive**:
  - Independent scaling per service based on demand
  - Independent deployment, reduced release risk
  - Fault isolation - a failure in one service doesn't affect the entire system
  - Technology flexibility - each service can use a different tech stack
 - ❌ **Negative**:
  - More complex than monolith (distributed systems challenges)
  - Eventual consistency instead of strong consistency
  - Complex distributed transactions (Saga pattern)
  - Operational overhead (monitoring, deployment)
 **EN Alternatives**: Monolith, Modular Monolith
 ---
 ### Quyết định 2: Traefik as API Gateway
 **VI Bối cảnh**: Cần reverse proxy, load balancing, SSL termination, và service discovery
 **VI Quyết định**: Sử dụng Traefik thay vì Kong, NGINX, hoặc AWS API Gateway
 **VI Hậu quả**:
 - ✅ **Tích cực**:
  - Auto service discovery với Docker labels
  - Dynamic configuration không cần restart
  - Built-in Let's Encrypt support
  - Native Kubernetes integration
  - Built-in metrics và tracing
 - ❌ **Tiêu cực**:
  - Learning curve cao hơn NGINX
  - Plugin ecosystem nhỏ hơn Kong
  - Community nhỏ hơn NGINX
 **VI Các lựa chọn thay thế**: Kong, NGINX, AWS API Gateway, Envoy
 **EN Context**: Need reverse proxy, load balancing, SSL termination, and service discovery
 **EN Decision**: Use Traefik instead of Kong, NGINX, or AWS API Gateway
 **EN Consequences**:
 - ✅ **Positive**:
  - Auto service discovery with Docker labels
  - Dynamic configuration without restart
  - Built-in Let's Encrypt support
  - Native Kubernetes integration
  - Built-in metrics and tracing
 - ❌ **Negative**:
  - Steeper learning curve than NGINX
  - Smaller plugin ecosystem than Kong
  - Smaller community than NGINX
 **EN Alternatives**: Kong, NGINX, AWS API Gateway, Envoy
 ---
 ### Quyết định 3: Neon PostgreSQL (Serverless)
 **VI Bối cảnh**: Cần database với auto-scaling, branching, và cost-effective cho development
 **VI Quyết định**: Sử dụng Neon PostgreSQL (serverless) thay vì self-hosted PostgreSQL hoặc AWS RDS
 **VI Hậu quả**:
 - ✅ **Tích cực**:
  - Auto-scaling theo usage
  - Database branching cho dev/staging
  - Pay-per-use pricing model
  - Automatic backups và point-in-time recovery
  - No infrastructure management
 - ❌ **Tiêu cực**:
  - Vendor lock-in
  - Cold start latency (mitigated by connection pooling)
  - Limited control over database configuration
 **VI Các lựa chọn thay thế**: Self-hosted PostgreSQL, AWS RDS, Google Cloud SQL
 **EN Context**: Need a database with auto-scaling, branching, and cost-effective pricing for development
 **EN Decision**: Use Neon PostgreSQL (serverless) instead of self-hosted PostgreSQL or AWS RDS
 **EN Consequences**:
 - ✅ **Positive**:
  - Auto-scaling based on usage
  - Database branching for dev/staging
  - Pay-per-use pricing model
  - Automatic backups and point-in-time recovery
  - No infrastructure management
 - ❌ **Negative**:
  - Vendor lock-in
  - Cold start latency (mitigated by connection pooling)
  - Limited control over database configuration
 **EN Alternatives**: Self-hosted PostgreSQL, AWS RDS, Google Cloud SQL
 ## Đặc điểm Hiệu suất / Performance Characteristics
 | Chỉ số / Metric | Mục tiêu / Target | Ghi chú / Notes |
 |-----------------|-------------------|-----------------|
 | **API Response Time (P95)** | < 200ms | Excluding external API calls |
 | **API Response Time (P99)** | < 500ms | Peak load scenarios |
 | **Throughput** | 1000 req/s | Per service instance |
 | **Database Query Time (P95)** | < 50ms | Simple queries with indexes |
 | **Cache Hit Rate (L1)** | > 40% | In-memory cache |
 | **Cache Hit Rate (L2)** | > 80% | Redis cache |
 | **Event Publish Latency (P95)** | < 10ms | Kafka fire-and-forget |
 | **Service Availability** | > 99.9% | Monthly uptime target |
 | **Error Rate** | < 1% | 4xx + 5xx errors |
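As a rough worked example of what the targets in the table imply, the sketch below turns two of them into numbers. It assumes the L2 hit rate is measured only on requests that already missed L1, which the table does not state, and the function names are hypothetical.

```typescript
// Fraction of reads that still reach the database, assuming the L2 hit rate
// is measured only on requests that missed L1 (an assumption of this sketch).
function dbReadFraction(l1HitRate: number, l2HitRate: number): number {
  return (1 - l1HitRate) * (1 - l2HitRate);
}

// Downtime allowed by an availability target over a period of `days` days.
function downtimeBudgetMinutes(availability: number, days: number): number {
  const totalMinutes = days * 24 * 60;
  return totalMinutes * (1 - availability);
}
```

At the target floors (L1 at 40%, L2 at 80%), `dbReadFraction(0.4, 0.8)` is about 0.12, so roughly 12% of reads would reach PostgreSQL. For the 99.9% availability target, `downtimeBudgetMinutes(0.999, 30)` is about 43.2 minutes of downtime in a 30-day month.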
 **VI Tối ưu hóa Hiệu suất**:
 - Multi-layer caching (L1: Memory, L2: Redis)
 - Connection pooling cho database
 - Pagination cho list endpoints (max 100 items)
 - Database indexes cho frequently queried fields
 - Async event publishing (fire-and-forget)
 - CDN cho static assets (Next.js)
 **EN Performance Optimizations**:
 - Multi-layer caching (L1: Memory, L2: Redis)
 - Connection pooling for database
 - Pagination for list endpoints (max 100 items)
 - Database indexes for frequently queried fields
 - Async event publishing (fire-and-forget)
 - CDN for static assets (Next.js)
 ## Cân nhắc Bảo mật / Security Considerations
 ### VI: Phần Tiếng Việt
 **Authentication**:
 - JWT với RS256 (asymmetric signing)
 - Access token: 15 phút expiry
 - Refresh token: 7 ngày expiry, rotation on use
 - httpOnly cookies cho token storage
 - MFA support (TOTP, backup codes)
 **Authorization**:
 - RBAC (Role-Based Access Control)
 - ABAC (Attribute-Based Access Control)
 - Permission format: `resource:action:scope`
 - Permission caching (5 min TTL)
 - Zero-trust device validation
 **Network Security**:
 - TLS 1.2+ enforcement
 - HTTPS-only (HSTS headers)
 - Rate limiting: 100 req/15min (standard), 10 req/hour (strict)
 - CORS whitelist từ environment variables
 - Network policies (Kubernetes)
 **Data Protection**:
 - AES-256-GCM encryption cho PII at rest
 - bcrypt (cost 12) cho password hashing
 - SHA-256 hashing cho tokens before storage
 - Database encryption at rest (Neon)
 - TLS in-transit cho tất cả connections
 **Secrets Management**:
 - Kubernetes secrets cho production
 - Environment variables validation với Zod
 - No hardcoded secrets in code
 - Quarterly secret rotation
 **Audit Trail**:
 - Event sourcing cho tất cả auth events
 - 7-year retention cho compliance
 - Immutable audit logs
 - Correlation IDs cho request tracing
 ### EN: English Section
 **Authentication**:
 - JWT with RS256 (asymmetric signing)
 - Access token: 15 minutes expiry
 - Refresh token: 7 days expiry, rotation on use
 - httpOnly cookies for token storage
 - MFA support (TOTP, backup codes)
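The "rotation on use" rule for refresh tokens can be sketched as follows. This is a simplified, hypothetical model: `RefreshTokenStore` and its methods are not from the IAM service, the token generator is a deterministic placeholder rather than a secure random source, and expiry and hashing are left out. Revoking the whole session when an old token is replayed is an assumption of this sketch; the source only states that tokens rotate on use.

```typescript
// Refresh-token rotation sketch with replay detection (illustrative names).
type RefreshSession = { userId: string; currentToken: string; revoked: boolean };

class RefreshTokenStore {
  private sessionsByToken = new Map<string, RefreshSession>();
  private issued = 0;

  // Placeholder generator so the example is deterministic; not secure.
  private nextToken(): string {
    this.issued += 1;
    return `refresh-${this.issued}`;
  }

  login(userId: string): string {
    const token = this.nextToken();
    this.sessionsByToken.set(token, { userId, currentToken: token, revoked: false });
    return token;
  }

  // Exchanges a refresh token for a new one, or returns undefined if rejected.
  rotate(presented: string): string | undefined {
    const session = this.sessionsByToken.get(presented);
    if (session === undefined || session.revoked) return undefined;
    if (session.currentToken !== presented) {
      session.revoked = true; // an already-rotated token was replayed
      return undefined;
    }
    const replacement = this.nextToken();
    session.currentToken = replacement;
    this.sessionsByToken.set(replacement, session);
    return replacement;
  }
}
```

Old tokens stay in the map on purpose: keeping them lets the store recognize a replayed token and revoke the session, instead of treating the replay as an unknown token.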
 **Authorization**:
 - RBAC (Role-Based Access Control)
 - ABAC (Attribute-Based Access Control)
 - Permission format: `resource:action:scope`
 - Permission caching (5 min TTL)
 - Zero-trust device validation
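To make the `resource:action:scope` format concrete, here is a small sketch of parsing and checking permissions. The `*` wildcard and the function names are assumptions made for illustration; the source specifies only the three-part format.

```typescript
// Illustrative check for permissions written as `resource:action:scope`.
type ParsedPermission = { resource: string; action: string; scope: string };

function parsePermission(text: string): ParsedPermission {
  const [resource = "", action = "", scope = "", ...extra] = text.split(":");
  if (extra.length > 0 || resource === "" || action === "" || scope === "") {
    throw new Error(`Invalid permission: ${text}`);
  }
  return { resource, action, scope };
}

// A granted part matches when it is identical or the assumed `*` wildcard.
function partMatches(granted: string, required: string): boolean {
  return granted === "*" || granted === required;
}

function isAllowed(grantedPermissions: string[], required: string): boolean {
  const need = parsePermission(required);
  return grantedPermissions.map(parsePermission).some(
    (have) =>
      partMatches(have.resource, need.resource) &&
      partMatches(have.action, need.action) &&
      partMatches(have.scope, need.scope),
  );
}
```

With the grants `["orders:read:own", "users:*:org"]`, a request for `users:delete:org` is allowed through the wildcard action, while `orders:write:own` is denied.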
 **Network Security**:
 - TLS 1.2+ enforcement
 - HTTPS-only (HSTS headers)
 - Rate limiting: 100 req/15min (standard), 10 req/hour (strict)
 - CORS whitelist from environment variables
 - Network policies (Kubernetes)
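The standard limit of 100 requests per 15 minutes could be enforced with a counter per client. The source states the limits but not the algorithm, so the fixed-window approach below is an assumption, as are the class name and the in-memory map (the document lists Redis as the home for rate limiting counters).

```typescript
// Fixed-window rate limiter sketch (illustrative, in-memory only).
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` is allowed at `nowMs`.
  allow(key: string, nowMs: number): boolean {
    const current = this.windows.get(key);
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 }); // open a new window
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

A fixed window is simple, but a client can send up to twice the limit in a short burst that straddles a window boundary. A sliding window or token bucket avoids that at the cost of more state.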
 **Data Protection**:
 - AES-256-GCM encryption for PII at rest
 - bcrypt (cost 12) for password hashing
 - SHA-256 hashing for tokens before storage
 - Database encryption at rest (Neon)
 - TLS in-transit for all connections
 **Secrets Management**:
 - Kubernetes secrets for production
 - Environment variables validation with Zod
 - No hardcoded secrets in code
 - Quarterly secret rotation
 **Audit Trail**:
 - Event sourcing for all auth events
 - 7-year retention for compliance
 - Immutable audit logs
 - Correlation IDs for request tracing
 ## Triển khai / Deployment
 ```mermaid
 graph TD
    subgraph "Kubernetes Cluster"
        subgraph "Ingress"
            LB[Load Balancer<br/>External IP]
            Traefik[Traefik Pods<br/>Replicas: 2]
        end
        subgraph "Services"
            IAM[IAM Service Pods<br/>Replicas: 2-10 HPA]
            Service1[Service 1 Pods<br/>Replicas: 2-10 HPA]
            Service2[Service 2 Pods<br/>Replicas: 2-10 HPA]
        end
        subgraph "Infrastructure"
            Redis[Redis Cluster<br/>3 Masters + 3 Slaves]
            Kafka[Kafka Cluster<br/>3 Brokers]
        end
        subgraph "Observability"
            Prom[Prometheus<br/>Replicas: 2]
            Loki[Loki<br/>Replicas: 2]
            Jaeger[Jaeger<br/>Replicas: 2]
            Grafana[Grafana<br/>Replicas: 2]
        end
    end
    subgraph "External"
        DB[(Neon PostgreSQL<br/>Serverless)]
    end
    LB --> Traefik
    Traefik --> IAM
    Traefik --> Service1
    Traefik --> Service2
    IAM --> Redis
    IAM --> Kafka
    IAM --> DB
    Service1 --> Redis
    Service1 --> Kafka
    Service1 --> DB
    Service2 --> Redis
    Service2 --> Kafka
    Service2 --> DB
    IAM -.->|metrics| Prom
    Service1 -.->|metrics| Prom
    Service2 -.->|metrics| Prom
    IAM -.->|logs| Loki
    Service1 -.->|logs| Loki
    Service2 -.->|logs| Loki
    IAM -.->|traces| Jaeger
    Service1 -.->|traces| Jaeger
    Service2 -.->|traces| Jaeger
    Prom --> Grafana
    Loki --> Grafana
    Jaeger --> Grafana
    style LB fill:#e1f5ff
    style DB fill:#f0e1ff
    style Redis fill:#fff4e1
    style Kafka fill:#d4edda
    style Grafana fill:#ffe1e1
 ```
 ### VI: Chiến lược Triển khai
 **Deployment Strategy**:
 - Rolling updates (maxSurge: 1, maxUnavailable: 0)
 - Zero-downtime deployments
 - Blue-green deployment cho major releases
 - Canary deployment cho high-risk changes
 **Auto-scaling**:
 - Horizontal Pod Autoscaler (HPA)
  - Min replicas: 2
  - Max replicas: 10
  - Target CPU: 70%
  - Target Memory: 80%
 **Resource Allocation**:
 | Service | Requests | Limits |
 |---------|----------|--------|
 | **Microservices** | 256Mi RAM, 250m CPU | 512Mi RAM, 500m CPU |
 | **Traefik** | 512Mi RAM, 500m CPU | 1Gi RAM, 1000m CPU |
 | **Redis** | 2Gi RAM, 1 CPU | 4Gi RAM, 2 CPU |
 | **Prometheus** | 4Gi RAM, 2 CPU | 8Gi RAM, 4 CPU |
 **Health Checks**:
 - Liveness probe: `/health/live` (K8s restarts if fails)
 - Readiness probe: `/health/ready` (K8s removes from LB if fails)
 - Startup probe: `/health/live` (initial delay 30s)
 **Environments**:
 - **Local**: Docker Compose
 - **Staging**: Kubernetes cluster (shared)
 - **Production**: Kubernetes cluster (dedicated)
 ### EN: Deployment Strategy
 **Deployment Strategy**:
 - Rolling updates (maxSurge: 1, maxUnavailable: 0)
 - Zero-downtime deployments
 - Blue-green deployment for major releases
 - Canary deployment for high-risk changes
 **Auto-scaling**:
 - Horizontal Pod Autoscaler (HPA)
  - Min replicas: 2
  - Max replicas: 10
  - Target CPU: 70%
  - Target Memory: 80%
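The way these HPA settings interact can be shown with the scaling formula from the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), taking the largest result when several metrics are configured. The sketch below is a simplification with hypothetical names; it leaves out the tolerance band and stabilization windows that the real controller applies.

```typescript
// Simplified HPA replica calculation (illustrative names).
type MetricReading = { currentPercent: number; targetPercent: number };

function desiredReplicas(
  currentReplicas: number,
  metrics: MetricReading[],
  minReplicas: number,
  maxReplicas: number,
): number {
  // One proposal per metric; the largest proposal wins.
  const proposals = metrics.map((m) =>
    Math.ceil((currentReplicas * m.currentPercent) / m.targetPercent),
  );
  const wanted = Math.max(minReplicas, ...proposals);
  return Math.min(maxReplicas, wanted);
}
```

For example, with 4 replicas at 90% CPU (target 70%) and 40% memory (target 80%), the CPU proposal is ceil(4 × 90 / 70) = 6 and the memory proposal is 2, so the sketch returns 6. The result is always clamped to the configured range of 2 to 10 replicas.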
 **Resource Allocation**:
 | Service | Requests | Limits |
 |---------|----------|--------|
 | **Microservices** | 256Mi RAM, 250m CPU | 512Mi RAM, 500m CPU |
 | **Traefik** | 512Mi RAM, 500m CPU | 1Gi RAM, 1000m CPU |
 | **Redis** | 2Gi RAM, 1 CPU | 4Gi RAM, 2 CPU |
 | **Prometheus** | 4Gi RAM, 2 CPU | 8Gi RAM, 4 CPU |
 **Health Checks**:
 - Liveness probe: `/health/live` (K8s restarts if fails)
 - Readiness probe: `/health/ready` (K8s removes from LB if fails)
 - Startup probe: `/health/live` (initial delay 30s)
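The difference between the liveness and readiness probes can be sketched without any web framework. The function names, dependency names, and the 200/503 status convention are assumptions for illustration; the source defines only the endpoint paths and how Kubernetes reacts to failures.

```typescript
// Framework-agnostic sketch of the two probe handlers (illustrative names).
type DependencyCheck = { name: string; healthy: boolean };
type ProbeResult = { status: 200 | 503; body: { state: string; failed: string[] } };

// Liveness only answers "is the process running?". It ignores dependencies,
// so a database outage does not cause pod restarts.
function liveness(): ProbeResult {
  return { status: 200, body: { state: "alive", failed: [] } };
}

// Readiness fails while any required dependency is unavailable, so traffic
// is routed away from the pod without restarting it.
function readiness(checks: DependencyCheck[]): ProbeResult {
  const failed = checks.filter((c) => !c.healthy).map((c) => c.name);
  return failed.length === 0
    ? { status: 200, body: { state: "ready", failed } }
    : { status: 503, body: { state: "not ready", failed } };
}
```

Keeping dependency checks out of the liveness handler is a deliberate choice: restarting a pod rarely fixes an unreachable database, while removing it from load balancing does protect clients.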
 **Environments**:
 - **Local**: Docker Compose
 - **Staging**: Kubernetes cluster (shared)
 - **Production**: Kubernetes cluster (dedicated)
 ## Giám sát & Khả năng quan sát / Monitoring & Observability
 ### VI: Chỉ số Chính
 **Application Metrics**:
 - `http_requests_total` - Total HTTP requests (counter)
 - `http_request_duration_seconds` - Request duration (histogram)
 - `http_requests_active` - Active requests (gauge)
 - `cache_hits_total` / `cache_misses_total` - Cache performance
 - `db_query_duration_seconds` - Database query duration
 **Infrastructure Metrics**:
 - CPU usage, Memory usage per pod
 - Network I/O, Disk I/O
 - Pod restart count
 - Node resource utilization
 **Business Metrics**:
 - User registrations per day
 - Login success/failure rate
 - API usage by endpoint
 - Error rate by service
 **Kiểm tra Sức khỏe**:
 - `/health/live` - Liveness probe (service running?)
 - `/health/ready` - Readiness probe (ready for traffic?)
 - `/metrics` - Prometheus metrics endpoint
 **Alerting Rules**:
 ```yaml
 # High error rate: more than 5% of requests return 5xx
 - alert: HighErrorRate
   expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
   for: 2m
   labels:
     severity: warning
 # High latency: P95 above 500ms
 - alert: HighLatency
   expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
   for: 5m
   labels:
     severity: warning
 # Service down
 - alert: ServiceDown
   expr: up == 0
   for: 1m
   labels:
     severity: critical
 # High memory usage (containers without a memory limit are filtered out)
 - alert: HighMemoryUsage
   expr: container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0) > 0.85
   for: 5m
   labels:
     severity: warning
 ```
 **Logging**:
 - Structured JSON logging với Winston
 - Correlation IDs cho request tracing
 - Log levels: error, warn, info, debug
 - Log aggregation với Loki
 - 7 days retention
 **Distributed Tracing**:
 - OpenTelemetry instrumentation
 - Jaeger backend
 - Trace sampling: 10% in production, 100% in staging
 - Span attributes: service, operation, user_id, correlation_id
 ### EN: Key Metrics
 **Application Metrics**:
 - `http_requests_total` - Total HTTP requests (counter)
 - `http_request_duration_seconds` - Request duration (histogram)
 - `http_requests_active` - Active requests (gauge)
 - `cache_hits_total` / `cache_misses_total` - Cache performance
 - `db_query_duration_seconds` - Database query duration
 **Infrastructure Metrics**:
 - CPU usage, Memory usage per pod
 - Network I/O, Disk I/O
 - Pod restart count
 - Node resource utilization
 **Business Metrics**:
 - User registrations per day
 - Login success/failure rate
 - API usage by endpoint
 - Error rate by service
 **Health Checks**:
 - `/health/live` - Liveness probe (service running?)
 - `/health/ready` - Readiness probe (ready for traffic?)
 - `/metrics` - Prometheus metrics endpoint
 **Alerting Rules**:
 ```yaml
 # High error rate: more than 5% of requests return 5xx
 - alert: HighErrorRate
   expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
   for: 2m
   labels:
     severity: warning
 # High latency: P95 above 500ms
 - alert: HighLatency
   expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
   for: 5m
   labels:
     severity: warning
 # Service down
 - alert: ServiceDown
   expr: up == 0
   for: 1m
   labels:
     severity: critical
 # High memory usage (containers without a memory limit are filtered out)
 - alert: HighMemoryUsage
   expr: container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0) > 0.85
   for: 5m
   labels:
     severity: warning
 ```
 **Logging**:
 - Structured JSON logging with Winston
 - Correlation IDs for request tracing
 - Log levels: error, warn, info, debug
 - Log aggregation with Loki
 - 7 days retention
 **Distributed Tracing**:
 - OpenTelemetry instrumentation
 - Jaeger backend
 - Trace sampling: 10% in production, 100% in staging
 - Span attributes: service, operation, user_id, correlation_id
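The sampling rates above can be illustrated with a deterministic, ratio-based decision: hash the trace ID to a number in [0, 1) and keep the trace when it falls below the ratio. This is a sketch only. The FNV-1a hash and the function names are illustrative choices rather than the sampler OpenTelemetry ships, and treating every non-production environment as 100% is an assumption.

```typescript
// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Same trace ID always gives the same decision, so every service that sees
// the ID agrees on whether the trace is kept.
function shouldSample(traceId: string, ratio: number): boolean {
  return fnv1a(traceId) / 0x100000000 < ratio;
}

// 10% in production; everything else (including staging) at 100% here.
function samplingRatioFor(environment: string): number {
  return environment === "production" ? 0.1 : 1.0;
}
```

Deciding from the trace ID instead of a random number per service avoids partial traces, where some spans of a request are recorded and others are dropped.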
 ## Tài liệu Liên quan / Related Documentation
 - [Event-Driven Architecture](./event-driven-architecture.md) - Kiến trúc hướng sự kiện / Event-driven architecture
 - [Caching Architecture](./caching-architecture.md) - Chiến lược caching / Caching strategy
 - [Security Architecture](./security-architecture.md) - Kiến trúc bảo mật / Security architecture
 - [Observability Architecture](./observability-architecture.md) - Khả năng quan sát / Observability
 - [Data Consistency Patterns](./data-consistency-patterns.md) - Mẫu nhất quán dữ liệu / Data consistency patterns
 - [Microservices Communication](./microservices-communication.md) - Giao tiếp microservices / Microservices communication
 ## Tham khảo / References
 - [Microservices Patterns](https://microservices.io/patterns/index.html) - Microservices pattern catalog
 - [Twelve-Factor App](https://12factor.net/) - Best practices for cloud-native apps
 - [C4 Model](https://c4model.com/) - Software architecture diagrams
 - [Kubernetes Documentation](https://kubernetes.io/docs/) - Kubernetes official docs
 - [Traefik Documentation](https://doc.traefik.io/traefik/) - Traefik official docs
 ---
 **Cập nhật Lần cuối / Last Updated**: 2026-01-07  
 **Tác giả / Authors**: GoodGo Architecture Team  
 **Người review / Reviewers**: GoodGo Development Team
--- a/docs/vi/guides/deployment.md
+++ b/docs/vi/guides/deployment.md
@@ -1,106 +1,234 @@
 # Deployment Guide
-## Database Setup (Neon)
+> **Note**: This guide covers deployment strategies for the GoodGo Microservices Platform across the Local, Staging, and Production environments, using Kubernetes and Neon PostgreSQL.
-All environments use **Neon PostgreSQL**. Set it up once before deploying:
+## Table of Contents
-1. Create a Neon project at https://neon.tech
+1. [Deployment Architecture](#deployment-architecture)
-2. Create branches: `main` (dev), `staging`, `production`
+2. [Prerequisites](#prerequisites)
-3. Get the connection string for each branch
+3. [Database Setup (Neon)](#database-setup-neon)
-4. Configure them in environment variables (see below)
+4. [Local Deployment](#local-deployment)
 5. [CI/CD Pipeline](#cicd-pipeline)
 6. [Staging Deployment](#staging-deployment)
 7. [Production Deployment](#production-deployment)
 8. [Scaling & Resilience](#scaling--resilience)
 9. [Rollback Procedure](#quy-trình-rollback)
-See the [Neon Setup Guide](../../infra/databases/neon/README.md) for details.
+---
 ## Deployment Architecture
 ```mermaid
 graph TD
    subgraph "CI/CD Pipeline (GitHub Actions)"
        Code[Code Push] --> Test[Run Tests]
        Test --> Build[Build Docker Image]
        Build --> Registry[Push to Registry]
        Registry --> Deploy[Deploy to K8s]
    end
    subgraph "Infrastructure (Kubernetes)"
        Ingress[Traefik Ingress] --> Service[K8s Service]
        Service --> Pods[Application Pods]
        Pods --> Secrets[K8s Secrets]
    end
    subgraph "External Services"
        Pods --> Neon[(Neon PostgreSQL)]
        Pods --> Redis[(Redis Cloud)]
    end
    Deploy --> Ingress
 ```
 ---
 ## Prerequisites
 Before deploying, make sure you have:
 *   **Tools**: `kubectl`, `helm`, and `docker` installed.
 *   **Access**:
    *   Kubernetes Cluster (EKS/GKE/DigitalOcean).
    *   Container Registry (GHCR/DockerHub).
    *   A Neon Console account.
 *   **Configuration**:
    *   A `KUBECONFIG` file that is already set up.
    *   GitHub Secrets configured for CI/CD.
 ---
 ## Database Setup (Neon)
 We use **Neon Serverless PostgreSQL** in every environment to take advantage of branching and auto-scaling.
 1.  **Create the project**: Log in to [neon.tech](https://neon.tech) and create the `goodgo-platform` project.
 2.  **Create branches**:
    *   `main` -> For Development/Local.
    *   `staging` -> For the Staging environment.
    *   `production` -> For the Production environment (Protected).
 3.  **Get the connection strings**:
    *   Save the connection string for each branch (Pooler mode is recommended).
 ---
 ## Local Deployment
-```bash
+For local development, we use Docker Compose.
 # Setup Neon database URL
 cp deployments/local/env.local.example deployments/local/.env.local
 # Edit .env.local and add your Neon DATABASE_URL
-# Start services (no PostgreSQL container needed)
+```bash
 # 1. Set up environment variables
 cp deployments/local/env.local.example deployments/local/.env.local
 # Edit .env.local with the connection string of the Neon `main` branch
 # 2. Start infrastructure (Redis, Traefik, etc.)
 cd deployments/local
 docker-compose up -d
 # 3. Start services (hot reload)
 pnpm dev
 ```
 ---
 ## CI/CD Pipeline
 We use GitHub Actions to automate deployments.
 | Workflow | Trigger | Description |
 | :--- | :--- | :--- |
 | `ci-check.yml` | Pull Request | Runs unit tests, linting, and build checks. |
 | `deploy-staging.yml` | Push to `develop` | Build image -> Deploy to the Staging namespace. |
 | `deploy-prod.yml` | Release / Tag | Build image -> Deploy to the Production namespace. |
 ### Secrets Configuration (GitHub)
 Set these secrets in the repository settings:
 *   `NEON_DATABASE_URL_STAGING`: Connection string for the staging branch.
 *   `NEON_DATABASE_URL_PRODUCTION`: Connection string for the production branch.
 *   `KUBECONFIG_STAGING`: Base64-encoded kubeconfig for staging.
 *   `KUBECONFIG_PRODUCTION`: Base64-encoded kubeconfig for production.
 *   `DOCKER_REGISTRY_TOKEN`: Used to push images.
 ---
 ## Staging Deployment
-### Yêu Cầu
+Staging mirrors production but uses fewer resources.
 - Access to the Kubernetes cluster
 - kubectl configured
 - KUBECONFIG set
 - Neon staging branch created
 - GitHub Secrets configured:
  - `NEON_DATABASE_URL_STAGING`
  - `KUBECONFIG_STAGING`
-### Thiết Lập Secrets
+### Deployment Thủ công
 ```bash
-# Create the Kubernetes secret
+# 1. Create Secrets
 kubectl create secret generic iam-service-secrets \
-  --from-literal=database-url='postgresql://user:pass@ep-xxx.region.neon.tech/dbname?sslmode=require&pgbouncer=true' \
+  --from-literal=database-url='<STAGING_NEON_URL>' \
-  --from-literal=jwt-secret='your-staging-jwt-secret' \
+  --from-literal=jwt-secret='<RANDOM_SECRET>' \
  --from-literal=jwt-refresh-secret='your-staging-refresh-secret' \
  -n staging
 # 2. Apply Manifests
 kubectl apply -f deployments/staging/kubernetes/ -n staging
 # 3. Verify
 kubectl get pods -n staging
 ```
-### Deploy
+### Qua CI/CD
-```bash
+Push code to the `develop` branch. The Action will:
-./scripts/deploy/deploy-staging.sh
+1.  Run the tests.
-```
+2.  Run `prisma migrate deploy` against the Staging database.
 3.  Update the Kubernetes deployment image.
-Or manually:
+---
 ```bash
 kubectl apply -f deployments/staging/kubernetes/
 ```
 **Note**: Migrations run automatically in CI/CD before the deployment.
 ## Production Deployment
-### Yêu Cầu
+Production uses a high-availability (HA) configuration.
 - Production Kubernetes cluster
 - kubectl configured with the production context
 - Neon production branch created
 - GitHub Secrets configured:
  - `NEON_DATABASE_URL_PRODUCTION`
  - `KUBECONFIG_PRODUCTION`
-### Thiết Lập Secrets
+### 1. Chuẩn bị Database
 *   Make sure the Production branch on Neon is set to **protected**.
 *   Configure the **Point-in-Time Recovery (PITR)** window (for example, 7 days).
 ### 2. Các bước Deployment Thủ công
 ```bash
-# Create the Kubernetes secret
+# 1. Create the namespace
 kubectl create namespace production
 # 2. Create Secrets (Sealed Secrets recommended) or standard Secrets
 kubectl create secret generic iam-service-secrets \
-  --from-literal=database-url='postgresql://user:pass@ep-xxx.region.neon.tech/dbname?sslmode=require&pgbouncer=true' \
+  --from-literal=database-url='<PROD_NEON_URL>' \
-  --from-literal=jwt-secret='your-production-jwt-secret' \
+  --from-literal=jwt-secret='<SECURE_RANDOM_SECRET>' \
-  --from-literal=jwt-refresh-secret='your-production-refresh-secret' \
+  --from-literal=jwt-refresh-secret='<SECURE_RANDOM_SECRET>' \
  -n production
 # 3. Deploy
 kubectl apply -f deployments/production/kubernetes/ -n production
 ```
-### Deploy
+### 3. Xác minh
 ```bash
-./scripts/deploy/deploy-prod.sh
+# Check the rollout status
 kubectl rollout status deployment/iam-service -n production
 # View logs
 kubectl logs -l app=iam-service -n production
 ```
-**Note**: Migrations run automatically in CI/CD before the deployment (approval required).
+---
-### Rollback
+## Scaling & Resilience
 ### Horizontal Pod Autoscaler (HPA)
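 We use an HPA to scale the number of pods automatically based on CPU/memory usage.
 ```yaml
 # Example HPA configuration.
 # scaleTargetRef is required by the API; the target is assumed to be the iam-service Deployment.
 apiVersion: autoscaling/v2
 kind: HorizontalPodAutoscaler
 metadata:
   name: iam-service-hpa
 spec:
   scaleTargetRef:
     apiVersion: apps/v1
     kind: Deployment
     name: iam-service
   minReplicas: 2
   maxReplicas: 10
   metrics:
   - type: Resource
     resource:
       name: cpu
       target:
         type: Utilization
         averageUtilization: 70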
 ```
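 To make the numbers above concrete, the sketch below applies the scaling rule from the Kubernetes HPA documentation, `desired = ceil(current * currentMetric / targetMetric)`, and clamps the result to `minReplicas`/`maxReplicas`. The names `HpaBounds`, `desiredReplicas`, and `exampleHpa` are illustrative only, and the controller's tolerance band and stabilization windows are deliberately left out.
 ```typescript
 // Illustrative model of the HPA replica calculation (not the controller's actual code).
 interface HpaBounds {
   minReplicas: number;
   maxReplicas: number;
   targetUtilization: number; // percent, e.g. 70
 }
 // desired = ceil(current * currentUtilization / targetUtilization), clamped to the bounds.
 function desiredReplicas(current: number, currentUtilization: number, hpa: HpaBounds): number {
   const raw = Math.ceil(current * (currentUtilization / hpa.targetUtilization));
   return Math.min(hpa.maxReplicas, Math.max(hpa.minReplicas, raw));
 }
 // Values taken from the example manifest above.
 const exampleHpa: HpaBounds = { minReplicas: 2, maxReplicas: 10, targetUtilization: 70 };
 ```
 With these values, 4 pods averaging 90% CPU would be scaled to 6, while very low utilization never drops the Deployment below 2 pods.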
 ### Zero-Downtime Deployment
 Kubernetes handles this through rolling updates.
 *   **MaxSurge**: 25% (new pods are added before old pods are removed).
 *   **MaxUnavailable**: 0 (no existing pod is taken out of service until its replacement is ready, so there is no downtime during the update).
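 As a rough illustration of what these two settings mean for pod counts, the sketch below converts the percentages into absolute numbers. Kubernetes rounds a `maxSurge` percentage up and a `maxUnavailable` percentage down; the function name `rollingUpdateBounds` and its return shape are assumptions made for this example.
 ```typescript
 // Illustrative helper: pod-count bounds during a rolling update.
 function rollingUpdateBounds(
   replicas: number,
   maxSurgePercent: number,
   maxUnavailablePercent: number,
 ): { maxPodsDuringRollout: number; minAvailablePods: number } {
   // maxSurge percentages are rounded up to a whole pod.
   const surge = Math.ceil((replicas * maxSurgePercent) / 100);
   // maxUnavailable percentages are rounded down to a whole pod.
   const unavailable = Math.floor((replicas * maxUnavailablePercent) / 100);
   return {
     maxPodsDuringRollout: replicas + surge,
     minAvailablePods: replicas - unavailable,
   };
 }
 ```
 For 10 replicas with the settings above, up to 13 pods may exist during the rollout and at least 10 stay available, which is why the cluster needs spare capacity for the surge.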
 ---
 ## Quy trình Rollback
 If a deployment fails or causes a critical error:
 ### Kubernetes Rollback
 ```bash
 # Undo the most recent deployment
 kubectl rollout undo deployment/iam-service -n production
 # Roll back to a specific revision
 kubectl rollout undo deployment/iam-service -n production --to-revision=2
 ```
-## Health Checks
+### Database Rollback
- Liveness: `GET /health/live`
+Because Neon supports branching and PITR:
- Readiness: `GET /health/ready`
+1.  Open the Neon Console.
- Health: `GET /health`
+2.  Restore the `production` branch to a timestamp before the failed migration.
-
+3.  **Warning**: This can lose transaction data written after that timestamp. Proceed with caution.
 ## Monitoring
 - Prometheus: http://prometheus:9090
 - Grafana: http://grafana:3000
 - Traefik Dashboard: http://traefik:8080
--- a/docs/vi/guides/development.md
+++ b/docs/vi/guides/development.md
@@ -1,111 +1,211 @@
 # Hướng Dẫn Development
 > **Note**: This guide provides comprehensive standards and workflows for contributing to the GoodGo Microservices Platform.
 ## Mục lục
 1. [Cấu Trúc Dự Án](#cấu-trúc-dự-án)
 2. [Tiêu Chuẩn Code](#tiêu-chuẩn-code)
 3. [Quy Trình Git](#quy-trình-git)
 4. [Phát Triển Backend](#phát-triển-backend)
 5. [Chiến Lược Testing](#chiến-lược-testing)
 6. [Quy Trình Database](#quy-trình-database)
 7. [Triển Khai Kubernetes](#triển-khai-kubernetes)
 ---
 ## Cấu Trúc Dự Án
 We follow a monorepo structure managed by PNPM Workspaces.
 ```
-├── apps/              # Frontend applications
+Base/
-├── services/          # Backend microservices
+├── apps/                 # Frontend applications
-├── packages/          # Shared libraries
+│   ├── web-client/       # Next.js 14+ (App Router)
-├── infra/             # Infrastructure configs
+│   └── mobile-client/    # Flutter
-├── deployments/       # Deployment configs
+├── services/             # Backend microservices
-├── scripts/           # Automation scripts
+│   ├── _template/        # Template for new services
-└── docs/              # Documentation
+│   ├── iam-service/      # Identity & Access Management
 │   └── ...
 ├── packages/             # Shared libraries
 │   ├── logger/           # Structured logging (Winston)
 │   ├── types/            # Shared DTOs & interfaces
 │   ├── http-client/      # Internal Service Client
 │   └── tracing/          # OpenTelemetry configuration
 ├── infra/                # Infrastructure-as-Code
 │   ├── traefik/          # API Gateway
 │   └── databases/        # Database setup scripts
 └── docs/                 # Documentation (EN & VI)
 ```
-## Quy Trình Development
+---
-### 1. Tạo Feature Branch
+## Tiêu Chuẩn Code
 ### Quy ước Đặt tên
 *   **Files**: `kebab-case.ts` (e.g. `user.controller.ts`, `app.config.ts`)
 *   **Classes**: `PascalCase` (e.g. `UserController`, `AuthService`)
 *   **Functions/Variables**: `camelCase` (e.g. `getUserById`, `isValid`)
 *   **Constants**: `UPPER_SNAKE_CASE` (e.g. `MAX_RETRIES`, `DEFAULT_TIMEOUT`)
 *   **Interfaces**: `PascalCase` (e.g. `User`, `CreateUserDto`) - *no 'I' prefix*
 ### Bilingual Comments (Bình luận Song ngữ)
 For core logic and public APIs, assume that both international and Vietnamese developers will read the code.
 ```typescript
 /**
 * EN: Validates user credentials and returns a token
 * VI: Xác thực thông tin người dùng và trả về token
 */
 async login(dto: LoginDto): Promise<TokenResponse> { ... }
 ```
 ### TypeScript Usage
 *   **Strict Mode**: Enabled in `tsconfig.json`. `any` is not allowed (use `unknown` if needed).
 *   **DTOs**: Use Zod for runtime validation and type inference.
 *   **Return Types**: Declare explicit return types for all public methods.
 ---
 ## Quy Trình Git
 ### Chiến lược Nhánh (Branching Strategy)
 *   `main`: Production-ready code.
 *   `develop`: Integration branch for the next release.
 *   `feature/xyz`: New features (branched from `develop`).
 *   `fix/xyz`: Bug fixes (branched from `develop`).
 *   `hotfix/xyz`: Critical fixes (branched from `main`).
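 The branching rules above can be expressed as a small lookup, for example in a pre-push hook or a CI check. This is only a sketch: `baseBranchFor` and `BaseBranch` are hypothetical names, not part of the repository.
 ```typescript
 // Hypothetical helper: which branch should a work branch be created from (and merged back into)?
 type BaseBranch = 'main' | 'develop';
 function baseBranchFor(branch: string): BaseBranch | undefined {
   // Critical fixes start from production-ready code.
   if (branch.startsWith('hotfix/')) return 'main';
   // Features and regular fixes start from the integration branch.
   if (branch.startsWith('feature/') || branch.startsWith('fix/')) return 'develop';
   // Anything else does not follow the documented convention.
   return undefined;
 }
 ```
 Returning `undefined` for unknown prefixes lets a caller reject branch names that do not follow the convention.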
 ### Commit Messages
 We follow [Conventional Commits](https://www.conventionalcommits.org/):
 ```
 feat(iam): add multi-factor authentication
 fix(db): correct unique constraint on email
 docs(guide): update development setup
 style: format code with prettier
 refactor: simplify auth middleware
 test: add unit tests for user service
 chore: update dependencies
 ```
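 A commit header in this style can be checked mechanically. The sketch below is a simplified validator that accepts only the types shown in the examples above and an optional lowercase scope; it does not cover the full Conventional Commits grammar (for instance the `!` breaking-change marker), and `isConventionalHeader` is a name invented for this example.
 ```typescript
 // Types taken from the examples above; extend the list if the team allows more.
 const COMMIT_TYPES = ['feat', 'fix', 'docs', 'style', 'refactor', 'test', 'chore'] as const;
 // type, optional "(scope)", then ": " and a non-empty description.
 const COMMIT_HEADER = new RegExp(`^(${COMMIT_TYPES.join('|')})(\\([a-z0-9-]+\\))?: \\S.*$`);
 // Returns true when the first line of a commit message matches the simplified pattern.
 function isConventionalHeader(header: string): boolean {
   return COMMIT_HEADER.test(header);
 }
 ```
 A check like this is typically run from a commit-msg hook so that malformed messages are rejected before they reach review.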
 ---
 ## Phát Triển Backend
 ### Tạo API Endpoint Mới
 1.  **Define the DTO** (`modules/user/user.dto.ts`):
    ```typescript
    import { z } from 'zod';
    export const CreateUserDto = z.object({
      email: z.string().email(),
      name: z.string().min(2),
    });
    export type CreateUserDto = z.infer<typeof CreateUserDto>;
    ```
 2.  **Create the service method** (`modules/user/user.service.ts`):
    *   Implement the business logic.
    *   Use `BaseRepository`.
    *   Throw an `HttpError` (e.g. `NotFound`, `BadRequest`).
 3.  **Create the controller** (`modules/user/user.controller.ts`):
    *   Parse the body with the DTO: `const dto = CreateUserDto.parse(req.body)`.
    *   Call the service.
    *   Return a success response: `res.json({ success: true, data: result })`.
 4.  **Register the route** (`modules/user/index.ts`):
    *   Add it to the Express router together with its middlewares.
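 The parse -> service -> response flow from the steps above can be sketched without any framework. Everything here is a stand-in: `parseCreateUser` is a hand-written substitute for `CreateUserDto.parse`, `createUser` is a hypothetical service that skips `BaseRepository`, and the `{ success: false, error }` shape is an assumption (the guide only specifies the success envelope).
 ```typescript
 interface CreateUserInput { email: string; name: string; }
 type ApiResponse<T> = { success: true; data: T } | { success: false; error: string };
 // Stand-in for CreateUserDto.parse(): throws when the body is invalid.
 function parseCreateUser(body: unknown): CreateUserInput {
   const record = (body ?? {}) as Record<string, unknown>;
   const email = record['email'];
   const name = record['name'];
   if (typeof email !== 'string' || !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) {
     throw new Error('Invalid email');
   }
   if (typeof name !== 'string' || name.length < 2) {
     throw new Error('Name must be at least 2 characters');
   }
   return { email, name };
 }
 // Hypothetical service method; a real one would persist through a repository.
 function createUser(input: CreateUserInput): CreateUserInput & { id: string } {
   return { id: 'user-1', ...input };
 }
 // Controller step: parse the body, call the service, wrap the result.
 function handleCreateUser(body: unknown): ApiResponse<CreateUserInput & { id: string }> {
   try {
     const dto = parseCreateUser(body);
     return { success: true, data: createUser(dto) };
   } catch (err) {
     return { success: false, error: err instanceof Error ? err.message : 'Unknown error' };
   }
 }
 ```
 Keeping validation at the controller boundary means the service only ever receives well-formed input.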
 ### Xử lý Lỗi (Error Handling)
 Always use the custom error classes from `core/errors`:
 ```typescript
 import { NotFoundError, ConflictError } from '../../core/errors';
 if (!user) {
  throw new NotFoundError('User not found');
 }
 ```
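 For readers who have not opened `core/errors`, the sketch below shows one way such classes are commonly structured. It is an assumption about the shape, not the repository's actual code: only the class names `HttpError`, `NotFoundError`, and `ConflictError` come from this guide, while `statusCode` and `toErrorResponse` are invented for the example.
 ```typescript
 // Assumed base class: carries an HTTP status code alongside the message.
 class HttpError extends Error {
   constructor(public readonly statusCode: number, message: string) {
     super(message);
     this.name = new.target.name;
     // Keeps instanceof working when compiled to older JavaScript targets.
     Object.setPrototypeOf(this, new.target.prototype);
   }
 }
 class NotFoundError extends HttpError {
   constructor(message = 'Not found') { super(404, message); }
 }
 class ConflictError extends HttpError {
   constructor(message = 'Conflict') { super(409, message); }
 }
 // Hypothetical mapper an error-handling middleware could use.
 function toErrorResponse(err: unknown): { status: number; body: { success: false; message: string } } {
   if (err instanceof HttpError) {
     return { status: err.statusCode, body: { success: false, message: err.message } };
   }
   // Unknown errors are hidden behind a generic 500.
   return { status: 500, body: { success: false, message: 'Internal server error' } };
 }
 ```
 Throwing typed errors from services lets a single middleware translate them into responses, so controllers do not need their own status-code logic.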
 ---
 ## Chiến Lược Testing
 ### Unit Tests (`*.test.ts`)
 *   **Scope**: Individual classes/functions.
 *   **Mocking**: Mock all external dependencies (DB, other services) with `jest-mock-extended`.
 *   **Location**: In the same directory as the source file.
 *   **Run**: `pnpm test`
 ### E2E Tests (`tests/**/*.e2e.ts`)
 *   **Scope**: Full API flows (Controller -> Service -> DB).
 *   **Database**: Uses a separate, Dockerized test database.
 *   **Run**: `pnpm test:e2e`
 ### Linting & Formatting
 *   **Lint**: `pnpm lint` (ESLint)
 *   **Format**: `pnpm format` (Prettier)
 *   **Typecheck**: `pnpm typecheck` (TSC)
 ---
 ## Quy Trình Database
 We use **Prisma** with **Neon PostgreSQL**.
 ### Migrations
 1.  Edit `prisma/schema.prisma`.
 2.  Create a migration (dev):
    ```bash
    ./scripts/db/migrate.sh iam-service dev --name add_user_profile
    ```
 3.  Apply to Production (CI/CD):
    ```bash
    ./scripts/db/migrate.sh iam-service deploy
    ```
 ### Seed Data
 Load sample data into the database:
 ```bash
 ./scripts/db/seed.sh iam-service
 ```
 ### Xem Dữ liệu Trực quan
 Use Prisma Studio:
 ```bash
 pnpm --filter @goodgo/iam-service prisma studio
 ```
 ---
 ## Triển Khai Kubernetes
 To test Kubernetes locally (Docker Desktop / Minikube):
 ```bash
-git checkout -b feature/my-feature
+# 1. Build images
-```
+docker build -t goodgo/iam-service:latest -f services/iam-service/Dockerfile .
-### 2. Thực Hiện Thay Đổi
+# 2. Deploy
 - Write code that follows TypeScript strict mode
 - Add tests for new functionality
 - Update the documentation if needed
 ### 3. Chạy Tests Locally
 ```bash
 # All tests
 pnpm test
 # A specific service
 pnpm --filter @goodgo/iam-service test
 ```
 ### 4. Lint và Format
 ```bash
 pnpm lint
 pnpm format
 ```
 ### 5. Tạo Pull Request
 - Push your branch
 - Open a PR targeting `develop`
 - CI/CD runs automatically
 ## Thêm Service Mới
 1. Use the template:
   ```bash
   ./scripts/utils/create-service.sh my-new-service
   ```
 2. Update the service configuration
 3. Implement the business logic
 4. Add tests
 5. Update the documentation
 ## Thêm Package Mới
 1. Create the package in `packages/new-package`
 2. Add it to the workspace in `pnpm-workspace.yaml`
 3. Export from `index.ts`
 4. Add tests
 5. Document how to use it
 ## Database Migrations
 ```bash
 # Create a migration (dev)
 ./scripts/db/migrate.sh iam-service dev
 # Apply migrations (production)
 ./scripts/db/migrate.sh iam-service deploy
 ```
 ## Kubernetes Deployment
 ### Local Kubernetes (Docker Desktop)
 ```bash
 # Enable Kubernetes in Docker Desktop
 # Settings → Kubernetes → Enable Kubernetes
 # Deploy the service
 cd deployments/local/kubernetes
 ./deploy.sh
-# Verify deployment
+# 3. Verify
 kubectl get pods -n iam-local
-kubectl logs -f -n iam-local -l app=iam-service
+kubectl logs -f -l app=iam-service -n iam-local
 # Port-forward for testing
 kubectl port-forward svc/iam-service 5002:80 -n iam-local
 curl http://localhost:5002/health/live
 ```
-**Xem hướng dẫn chi tiết**: [Kubernetes Local Deployment Guide](./kubernetes-local.md)
+See the [Kubernetes Guide](./kubernetes-local.md) for details.
 ## Debugging
 - Use the logger from `@goodgo/logger`
 - Check the Traefik logs: `docker logs traefik-local`
 - Check the service logs: `./scripts/dev/logs.sh iam-service`
--- a/docs/vi/guides/getting-started.md
+++ b/docs/vi/guides/getting-started.md
@@ -1,81 +1,214 @@
-# Bắt Đầu
+# Hướng Dẫn Bắt Đầu
-## Yêu Cầu
+> **Note**: This guide assumes you are installing on macOS or Linux. Windows users should use WSL2.
- Node.js >= 20.0.0
+## Mục lục
 - PNPM >= 8.0.0
 - Docker & Docker Compose
 - Git
 - Neon account (https://neon.tech) for the database
-## Thiết Lập Ban Đầu
+1. [Yêu cầu tiên quyết](#yêu-cầu-tiên-quyết)
 2. [Tổng quan Kiến trúc](#tổng-quan-kiến-trúc)
 3. [Cấu trúc Dự án](#cấu-trúc-dự-án)
 4. [Cài đặt & Thiết lập](#cài-đặt--thiết-lập)
 5. [Quy trình Phát triển](#quy-trình-phát-triển)
 6. [Các Lệnh Thường dùng](#các-lệnh-thường-dùng)
 7. [Xử lý Sự cố](#xử-lý-sự-cố)
-1. **Clone repository**
+## Yêu cầu tiên quyết
   ```bash
   git clone <repository-url>
   cd Base
   ```
-2. **Thiết Lập Neon Database**
+Before you begin, make sure you have installed:
   ```bash
   # Run the setup script
   ./scripts/db/setup-neon.sh
   # Or manually:
   # 1. Create a Neon project at https://neon.tech
   # 2. Create branches: main (dev), staging, production
   # 3. Get the connection strings
   # 4. Update deployments/local/.env.local
   ```
   See the [Neon Setup Guide](../../infra/databases/neon/README.md) for details.
-3. **Khởi tạo project**
+*   **Node.js**: v20.0.0 or later
-   ```bash
+    ```bash
-   ./scripts/setup/init-project.sh
+    node -v
-   ```
+    # v20.10.0
    ```
 *   **PNPM**: v8.0.0 or later (the project uses pnpm workspaces)
    ```bash
    pnpm -v
    # 8.12.0
    ```
 *   **Docker & Docker Compose**: For the local infrastructure
    ```bash
    docker -v
    # Docker version 24.0.0
    ```
 *   **Git**: For version control
 *   **Neon account**: Serverless PostgreSQL (https://neon.tech)
-4. **Khởi động infrastructure** (Redis, Traefik - không cần PostgreSQL)
+## Tổng quan Kiến trúc
   ```bash
   cd deployments/local
   docker-compose up -d
   cd ../..
   ```
-5. **Chạy database migrations**
+GoodGo Platform uses a microservices architecture with a shared infrastructure layer.
   ```bash
   ./scripts/db/migrate.sh iam-service dev
   ```
-6. **Seed database**
+```mermaid
-   ```bash
+graph TD
-   ./scripts/db/seed.sh iam-service
+    Client[Client Apps] --> Traefik[Traefik Gateway]
-   ```
+    
    Traefik --> IAM[IAM Service]
    Traefik --> Template[Template Service]
    IAM --> DB[(Neon PostgreSQL)]
    IAM --> Redis[(Redis Cache)]
    IAM --> Kafka[Kafka Events]
    style Traefik fill:#e1f5ff
    style DB fill:#f0e1ff
    style Redis fill:#fff4e1
 ```
-7. **Khởi động tất cả services**
+## Cấu trúc Dự án
   ```bash
   ./scripts/dev/start-all.sh
   ```
-## Điểm Truy Cập
+The repository follows a monorepo structure:
- **API Gateway**: http://localhost/api/v1
+```
- **Auth Service**: http://localhost:5001
+Base/
- **Web Admin**: http://admin.localhost or http://localhost:3000
+├── apps/                 # Frontend applications
- **Web Client**: http://localhost or http://localhost:3001
+│   ├── web-client/       # Next.js web application
- **Traefik Dashboard**: http://localhost:8080
+│   └── mobile-client/    # Flutter mobile application
 ├── services/             # Backend microservices
 │   ├── iam-service/      # Authentication & authorization service
 │   └── _template/        # Template for new services
 ├── packages/             # Shared libraries
 │   ├── logger/           # Structured logging
 │   ├── types/            # Shared TypeScript types
 │   └── http-client/      # Internal HTTP client
 ├── infra/                # Infrastructure configuration
 │   ├── traefik/          # API Gateway configuration
 │   └── databases/        # Database setup scripts
 ├── deployments/          # Deployment configuration
 │   ├── local/            # Docker Compose for development
 │   └── k8s/              # Kubernetes manifests
 └── docs/                 # Documentation
 ```
-## Database
+## Cài đặt & Thiết lập
-Project này sử dụng **Neon PostgreSQL** cho tất cả môi trường:
+### 1. Clone Repository
 - **Development**: Neon main branch
 - **Staging**: Neon staging branch
 - **Production**: Neon production branch
-Không cần PostgreSQL local! Xem [Thiết Lập Neon](../../infra/databases/neon/README.md) để biết chi tiết.
+```bash
 git clone <repository-url>
 cd Base
 ```
-## Các Bước Tiếp Theo
+### 2. Cấu hình Môi trường
- Read the [Development Guide](development.md)
+Each service and the local infrastructure need environment variables. We provide ready-made templates.
- See the [API Documentation](../api/openapi/)
+
- Review the [Architecture Overview](../architecture/system-design.md)
+```bash
 # Initialize the project setup (copies .env.example to .env)
 ./scripts/setup/init-project.sh
 ```
 ### 3. Thiết lập Neon Database
 The project uses Neon (Serverless PostgreSQL) for every environment.
 1.  Create a project at [neon.tech](https://neon.tech).
 2.  Create a branch named `dev` (or `main`).
 3.  Copy the connection string from the dashboard.
 4.  Update the `deployments/local/.env.local` file:
 ```env
 DATABASE_URL="postgres://user:pass@ep-xyz.region.neon.tech/neondb"
 ```
 ### 4. Khởi động Infrastructure
 Start the supporting services (Redis, Traefik, Observability) with Docker Compose.
 ```bash
 cd deployments/local
 docker-compose up -d
 # Expected output: Containers for traefik, redis, kafka created
 ```
 ### 5. Cài đặt Dependencies
 ```bash
 pnpm install
 ```
 ### 6. Setup Database Schema
 Apply the Prisma schema to the Neon database.
 ```bash
 # Run the migration for the IAM service
 pnpm --filter @goodgo/iam-service prisma migrate dev
 ```
 ### 7. Khởi động Services
 Run all backend services in development mode.
 ```bash
 pnpm dev
 # or run a specific service
 pnpm --filter @goodgo/iam-service dev
 ```
 ## Quy trình Phát triển
 ### Tạo Service Mới
 1.  Copy from the template:
    ```bash
    cp -r services/_template services/my-new-service
    ```
 2.  Update the name in `package.json`.
 3.  Add your logic under `src/modules/`.
 4.  Register the service in `deployments/local/docker-compose.yml`.
 ### Thực hiện Thay đổi
 1.  Create a new branch: `feature/my-feature`.
 2.  Implement the change.
 3.  Run the tests: `pnpm test`.
 4.  Commit using Conventional Commits: `feat(iam): add login endpoint`.
 ## Các Lệnh Thường dùng
 | Command | Description |
 | :--- | :--- |
 | `pnpm install` | Install dependencies |
 | `pnpm dev` | Run all services (dev mode) |
 | `pnpm build` | Build all packages and services |
 | `pnpm test` | Run unit tests |
 | `pnpm lint` | Run lint checks |
 | `docker-compose up -d` | Start the local infrastructure |
 | `docker-compose down` | Stop the local infrastructure |
 ## Xử lý Sự cố
 ### Xung đột Port (Port Conflicts)
 **Error**: `Bind for 0.0.0.0:80 failed: port is already allocated`
 **Solution**: Find the process that is using port 80 (usually another web server) and stop it, or change the Traefik port in `docker-compose.yml`.
 ```bash
 lsof -i :80
 kill -9 <PID>
 ```
 ### Lỗi Kết nối Database
 **Error**: `P1001: Can't reach database server`
 **Solution**:
 1.  Check your internet connection (Neon is a cloud database).
 2.  Check `DATABASE_URL` in `deployments/local/.env.local`.
 3.  Make sure your IP address is allowed in the Neon dashboard.
 ### Gateway Không Tìm Thấy Service
 **Error**: `404 Not Found` from api.localhost
 **Solution**:
 1.  Check that the service is running.
 2.  Check the Traefik dashboard at http://localhost:8080.
 3.  Check the `PathPrefix` labels in `docker-compose.yml`.
 ## Bước Tiếp Theo
 *   [Development Guide](development.md) - Coding standards and workflows in detail
 *   [API Documentation](../api/openapi/) - Explore the API endpoints
 *   [System Architecture](../architecture/system-design.md) - Understand the system design
--- a/docs/vi/guides/troubleshooting.md
+++ b/docs/vi/guides/troubleshooting.md
@@ -1,57 +1,218 @@
-# Hướng Dẫn Xử Lý Sự Cố
+# Hướng Dẫn Xử Lý Sự Cố (Troubleshooting)
-## Các Vấn Đề Thường Gặp
+> **Note**: This guide focuses on debugging the GoodGo Microservices Platform in a local development environment (Docker Compose).
-### Kết Nối Database Thất Bại
+## Mục lục
-**Triệu chứng**: Service không thể kết nối database
+1. [Chẩn đoán Chung](#chẩn-đoán-chung)
 2. [Vấn đề Infrastructure](#vấn-đề-infrastructure)
   - [Database (Neon/PostgreSQL)](#database-neonpostgresql)
   - [Redis](#redis)
   - [Traefik Gateway](#traefik-gateway)
 3. [Vấn đề Service](#vấn-đề-service)
   - [Service Không Khởi Động](#service-không-khởi-động)
   - [Lỗi Prisma/Database](#lỗi-prismadatabase)
   - [Lỗi Authentication](#lỗi-authentication)
 4. [Công cụ Debug](#công-cụ-debug)
 5. [Câu hỏi Thường Gặp (FAQ)](#câu-hỏi-thường-gặp-faq)
 ---
 ## Chẩn đoán Chung
 When something goes wrong, work through this checklist:
 1.  **Check the service status**:
    ```bash
    cd deployments/local
    docker-compose ps
    ```
    *All services should be in the `Up` or `Running` state.*
 2.  **View the logs**:
    ```bash
    # View the logs of a specific service
    docker-compose logs -f <service-name>
    # View the last 100 lines of every service
    docker-compose logs --tail=100
    ```
 3.  **Check connectivity**:
    *   Can you reach the Gateway? `curl http://localhost/health`
    *   Can you reach the Dashboard? http://localhost:8080
 ---
 ## Vấn đề Infrastructure
 ### Database (Neon/PostgreSQL)
 **Problem**: `P1001: Can't reach database server` or `Connection timed out`
 *   **Cause 1**: No internet connectivity (Neon is a cloud database).
 *   **Cause 2**: Wrong `DATABASE_URL` in `.env`.
 *   **Cause 3**: Your IP address is blocked by Neon.
 **Solution**:
-1. Check that PostgreSQL is running: `docker ps`
+1.  Run a quick connectivity check: `ping neon.tech`.
-2. Verify DATABASE_URL in .env
+2.  Check the `deployments/local/.env.local` file. The URL should look like:
-3. Check the network connection: `docker network ls`
+    `postgres://user:pass@ep-xyz.aws.neon.tech/neondb`
-4. View the logs: `docker logs postgres-auth-local`
+3.  Go to Neon Dashboard -> Settings and make sure "Allow all IPs" is selected, or add your current IP address.
-### Port Đã Được Sử Dụng
+**Problem**: `P1003: Database does not exist`
-**Symptom**: The service fails to start with a port error
+*   **Cause**: You are connecting with the wrong database name.
 *   **Fix**: Check the end of the connection string (usually `/neondb`). If you use a custom database name, make sure it has been created on Neon.
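 The URL checks described above (scheme, Neon host, database name at the end) can be automated before starting a service. This is a minimal sketch under stated assumptions: `checkDatabaseUrl` is a hypothetical helper, the regular expression is a simplification rather than a full URL parser, and the `.neon.tech` host check only reflects the example URLs in this guide.
 ```typescript
 interface DbUrlCheck { ok: boolean; problems: string[]; }
 // Hypothetical pre-flight check for DATABASE_URL; it does not open a connection.
 function checkDatabaseUrl(raw: string): DbUrlCheck {
   // scheme://[userinfo@]host[:port][/database]
   const match = /^(postgres(?:ql)?):\/\/(?:[^@\/]+@)?([^:\/?#]+)(?::\d+)?(?:\/([^?#]*))?/.exec(raw);
   if (!match) {
     return { ok: false, problems: ['not a postgres:// or postgresql:// URL'] };
   }
   const problems: string[] = [];
   const host = match[2];
   const database = match[3] || '';
   if (!host.endsWith('.neon.tech')) problems.push('host is not a neon.tech endpoint');
   if (database.length === 0) problems.push('database name is missing');
   return { ok: problems.length === 0, problems };
 }
 ```
 A check like this catches a malformed or incomplete URL early; it cannot detect a wrong password or a blocked IP, which still surface as `P1001` at connection time.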
 ### Redis
 **Problem**: `Redis connection refused` or `ECONNREFUSED`
 *   **Cause**: The Redis container is not running, or the port mapping is wrong.
 **Solution**:
-1. Find the process using the port: `lsof -i :5001`
+1.  Check the status: `docker-compose ps redis`.
-2. Kill the process or change PORT in .env
+2.  Restart it: `docker-compose restart redis`.
-3. Check docker-compose for port conflicts
+3.  View the logs: `docker-compose logs redis`.
 4.  Connection string used by services:
    *   **Inside Docker**: `redis:6379`
    *   **From the host**: `localhost:6379`
-### Prisma Client Chưa Được Generate
+### Traefik Gateway
-**Triệu chứng**: Lỗi import cho Prisma Client
+**Problem**: `404 Not Found` when calling the API (e.g. `http://localhost/api/v1/auth`)
 *   **Cause**: The service is down or its labels are misconfigured.
 **Solution**:
 1.  Check the Traefik Dashboard at http://localhost:8080.
    *   Look under "HTTP Routers" and "Services".
    *   If the service is not listed, check its labels in `docker-compose.yml`.
 2.  Make sure the `PathPrefix` matches the request:
    ```yaml
    - "traefik.http.routers.iam.rule=PathPrefix(`/api/v1/auth`)"
    ```
 3.  Check the health checks to confirm the service is healthy.
 **Problem**: `Bad Gateway` or `Gateway Timeout`
 *   **Cause**: The service crashed or takes too long to respond.
 *   **Fix**: Look at the logs of that specific service (`docker-compose logs iam-service`).
 ---
 ## Vấn đề Service
 ### Service Không Khởi Động
 **Symptom**: The container status is `Exited (1)` or `Restarting`.
 **Debug**:
 1.  Check the logs right away:
    ```bash
    docker-compose logs iam-service
    ```
 2.  **Common error**: `Config validation error`
    *   **Fix**: Check the environment variables. Run `./scripts/setup/init-project.sh` to copy `.env`.
 3.  **Common error**: `PrismaClientInitializationError`
    *   **Fix**: This is a database connection error (see the Infrastructure section).
 ### Lỗi Prisma/Database
 **Error**: `P2025: Record to update not found`
 *   **Fix**: This is a logic error. Make sure the ID exists before updating.
 **Error**: `P2002: Unique constraint failed`
 *   **Fix**: You are trying to insert duplicate data (e.g. a duplicate email).
 **Error**: `Migration failed`
 *   **Fix**:
    1.  Delete the `prisma/migrations` directory (development only; this discards the migration history).
    2.  Reset the database: `pnpm prisma migrate reset`.
    3.  Regenerate the client: `pnpm prisma generate`.
 ### Lỗi Authentication
 **Problem**: `401 Unauthorized` even though the token looks valid
 *   **Cause 1**: The token has expired.
 *   **Cause 2**: Public key mismatch (the service cannot verify a token signed by IAM).
 *   **Cause 3**: Clock skew (the Docker clock differs from the host clock).
 **Solution**:
 1.  Check the logs for the specific JWT verification error.
 2.  Restart the services to refresh the keys.
 3.  Sync the Docker clock: restart Docker Desktop.
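 Causes 1 and 3 are easier to reason about with the expiry check written out. The sketch below compares a token's `exp` claim (seconds since the Unix epoch) with the current time and an optional leeway; `isTokenExpired` and the `clockSkewSeconds` parameter are illustrative names, and the real check lives in whichever JWT library the services use.
 ```typescript
 // Illustrative expiry check: a token is expired once "now" reaches exp plus the allowed leeway.
 function isTokenExpired(expSeconds: number, nowMs: number, clockSkewSeconds = 0): boolean {
   const nowSeconds = Math.floor(nowMs / 1000);
   return nowSeconds >= expSeconds + clockSkewSeconds;
 }
 ```
 If a container clock runs 20 seconds ahead of the host, a token with `exp = 1000` is already rejected when the host clock reads 980; a leeway of 30 seconds would still accept it until 1030.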
 ---
 ## Công cụ Debug
 ### 1. Truy cập Shell của Container
 To inspect files or run commands inside a container:
 ```bash
-cd services/iam-service
+docker-compose exec iam-service sh
-pnpm prisma generate
+# or /bin/bash
 ```
-### Build Failures
+### 2. Kiểm tra Database (Prisma Studio)
-**Triệu chứng**: Lỗi TypeScript hoặc build
+A visual interface for viewing and editing data:
-**Giải pháp**:
+```bash
-1. Xóa build artifacts: `./scripts/utils/cleanup.sh`
+pnpm --filter @goodgo/iam-service prisma studio
-2. Cài đặt lại dependencies: `pnpm install`
+# Open http://localhost:5555
-3. Kiểm tra lỗi TypeScript: `pnpm typecheck`
+```
-### Traefik Không Routing
+### 3. Kiểm tra Redis
-**Triệu chứng**: Lỗi 404 từ Traefik
+```bash
 docker-compose exec redis redis-cli
 > PING
 PONG
 > KEYS *
 1) "user:123:session"
 ```
-**Giải pháp**:
+### 4. Test API Trực tiếp
 1. Check the Traefik dashboard: http://localhost:8080
 2. Verify the service labels in docker-compose
 3. Check the routes.yml configuration
 4. View the Traefik logs: `docker logs traefik-local`
-## Tìm Kiếm Trợ Giúp
+Use `curl` or Postman.
-1. Kiểm tra service logs: `./scripts/dev/logs.sh <service>`
+```bash
-2. Xem lại GitHub Issues
+# Health Check
-3. Liên hệ team lead
+curl -v http://localhost/api/v1/auth/health/live
 # Login (example)
 curl -X POST http://localhost/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@example.com", "password":"password"}'
 ```
 ---
 ## Câu hỏi Thường Gặp (FAQ)
 **Q: Why are my changes not picked up?**
 A: If you edit `.env` or `docker-compose.yml`, you have to restart:
 ```bash
 docker-compose down && docker-compose up -d
 ```
 If you edit code, hot reloading should pick it up automatically. If it does not, restart the container.
 **Q: How do I reset the whole system?**
 A: Careful, this command deletes all local data!
 ```bash
 docker-compose down -v
 # -v also removes volumes (Redis data, etc.)
 ```
 **Q: My machine becomes very slow when the project is running.**
 A: Docker uses a lot of RAM.
 1.  Stop the services you are not using (e.g. `future-service`).
 2.  Raise the resource limits in the Docker Desktop settings.