docs: Add Vietnamese documentation for the security, event-driven, data consistency, observability, and caching architectures, and update the existing guides and architecture documents.

2026-01-07 10:22:42 +07:00
parent d8faffd41d
commit 495618ded7
17 changed files with 7357 additions and 779 deletions
--- a/docs/en/architecture/caching-architecture.md
+++ b/docs/en/architecture/caching-architecture.md
@@ -1,9 +1,8 @@
-# Caching Architecture / Kiến trúc Caching
+# Caching Architecture

-> **EN**: Multi-layer caching strategy for optimal performance
-> **VI**: Multi-layer caching strategy to optimize performance
+> Multi-layer caching strategy for optimal performance

-## Overview Diagram / Sơ đồ Tổng quan
+## Overview Diagram

 ```mermaid
 graph TD
@@ -24,9 +23,35 @@ graph TD
    style DB fill:#f0e1ff
 ```

-## Architecture Description / Mô tả Kiến trúc
+## System Context

-### EN: Multi-Layer Caching
+```mermaid
+C4Context
+    title Caching System Context
+
+    System(service, "Microservice", "Client service using cache")
+    System_Ext(db, "Neon PostgreSQL", "Primary database")
+    
+    Boundary(caching, "Caching Layer") {
+        System(l1, "L1 Cache", "In-memory NodeCache")
+        System(l2, "L2 Cache", "Redis Cluster")
+    }
+
+    Rel(service, l1, "Reads/Writes", "In-process")
+    Rel(service, l2, "Reads/Writes", "Redis Protocol")
+    Rel(l1, l2, "Fills from", "On miss")
+    Rel(l2, db, "Cache aside", "On miss")
+```
+
+### Context Description
+- **Service**: Checks the in-process L1 cache first for the lowest latency, then L2 on an L1 miss.
+- **L1 Cache**: Per-instance and not shared; entries expire automatically after a short TTL.
+- **L2 Cache**: Shared Redis cluster; holds entries longer and is visible to all service instances.
+- **Database**: Source of truth, accessed only on cache miss.
+
+## Architecture Description
+
+### Multi-Layer Caching

 The GoodGo platform uses 2-layer caching for performance:

@@ -52,30 +77,11 @@ Request → L1 → L2 → Database
 hit rate hit rate        rate
 ```

-### VI: Multi-Layer Caching
-
-The GoodGo platform uses 2-layer caching to optimize performance:
-
-**L1 Cache (Memory)**:
-- In-memory cache on each service instance
-- Very fast access (< 1ms)
-- Limited capacity (10k keys by default)
-- Short TTL (60 seconds by default, maximum 5 minutes)
-- Not shared between instances
-
-**L2 Cache (Redis)**:
-- Shared distributed cache
-- Fast access (< 5ms)
-- Large capacity
-- Longer TTL (configurable, typically 5-15 minutes)
-- Shared across all service instances
-
-## Cache Implementation / Triển khai Cache
+## Cache Implementation

 ### Multi-Layer Cache Service

 ```typescript
-// Multi-layer cache implementation
 export class MultiLayerCache {
  private l1Cache: NodeCache;
  private l2Cache: Redis;
@@ -143,13 +149,12 @@ export class MultiLayerCache {
 }
 ```

-### Cache Key Naming / Quy ước Đặt tên Key
+### Cache Key Naming

 **Pattern**: `{service}:{entity}:{identifier}:{sub-resource}`

 **Examples**:
 ```typescript
-// User cache keys
 const keys = {
  user: (userId: string) => `iam:user:${userId}`,
  userPermissions: (userId: string) => `iam:user:${userId}:permissions`,
@@ -162,7 +167,7 @@ const user = await cache.get(keys.user('user_123'));
 const permissions = await cache.get(keys.userPermissions('user_123'));
 ```

-## TTL Strategies / Chiến lược TTL
+## TTL Strategies

 ```mermaid
 graph LR
@@ -196,7 +201,7 @@ graph LR
 | Static config | 30-60 min | Very stable |
 | Reference data | 1-2 hours | Almost never changes |

-## Cache Invalidation / Vô hiệu hóa Cache
+## Cache Invalidation

 ```mermaid
 sequenceDiagram
@@ -244,7 +249,7 @@ async updateUserRole(userId: string, roleId: string): Promise<void> {
 // Automatically handled by cache
 ```

-## Cache Warming / Làm ấm Cache
+## Cache Warming

 ```typescript
 // Preload frequently accessed data
@@ -271,33 +276,83 @@ async warmCache(): Promise<void> {
 warmCache().catch(err => logger.error('Cache warming failed', { err }));
 ```

-## Performance Metrics / Chỉ số Hiệu suất
+## Design Decisions
+
+### Decision 1: Multi-layer Caching (L1 + L2)
+
+**Context**: Need to reduce load on Redis and achieve ultra-low latency for hot data.
+**Decision**: Use combination of L1 (NodeCache) and L2 (Redis).
+**Consequences**:
+- ✅ Latency < 1ms for 40-50% of requests (L1 hits).
+- ✅ Reduced network traffic to Redis.
+- ❌ Synchronization complexity: after an update, other instances' L1 copies can stay stale until their short TTL expires.
+
+## Performance Characteristics
+
+### Performance Targets
+| Metric | Target | Notes |
+|--------|--------|-------|
+| **L1 Hit Latency** | < 0.5ms | In-memory lookup |
+| **L2 Hit Latency** | < 5ms | Network RTT + Redis processing |
+| **Combined Hit Rate** | > 90% | L1 + L2 combined |
+| **L1 Capacity** | 10k items | Per instance limit to protect heap |
+| **Cache Warmup Time** | < 30s | At service startup |
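The combined target can be checked against the per-layer hit rates this document lists (L1 40-50%, L2 80-90%). The following is a minimal sketch. It assumes the L2 rate is measured only over requests that already missed L1. The document does not say which denominator it uses.

```typescript
// Combined hit rate of a two-layer cache (L1 in-process, L2 Redis).
// Assumption: l2HitRate is conditional, i.e. measured only on requests that missed L1.
function combinedHitRate(l1HitRate: number, l2HitRate: number): number {
  for (const rate of [l1HitRate, l2HitRate]) {
    if (rate < 0 || rate > 1) throw new RangeError('hit rates must be between 0 and 1');
  }
  // Requests served by L1, plus the L1 misses that L2 then serves.
  return l1HitRate + (1 - l1HitRate) * l2HitRate;
}

// Low and high ends of the per-layer ranges (L1 40-50%, L2 80-90%).
console.log(combinedHitRate(0.4, 0.8).toFixed(2)); // 0.88
console.log(combinedHitRate(0.5, 0.9).toFixed(2)); // 0.95
```

Under that assumption, the > 90% combined target holds at the upper end of both ranges. At the lower end it does not: the combined rate there is about 88%.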
+
+## Security Considerations
+
+### Cache Security
+- **Encryption**: Sensitive data (PII) MUST be encrypted (AES-256) before it is stored in L2 Redis. L1 may hold plaintext because it lives in process memory, but that data is exposed if a memory dump is taken.
+- **Isolation**: Redis instance protected by password and Network Policy (allow internal K8s traffic only).
+- **TLS**: Connect to Redis via TLS 1.2+.
+- **Data Sanitization**: Do not cache entire user objects if they contain password hashes or secrets.
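Encrypting a value before it reaches L2 could look like the sketch below. It uses AES-256-GCM from Node's built-in `crypto` module, which is the algorithm the Security Architecture document names for PII. The function names, the 12-byte nonce and the `iv | tag | ciphertext` storage layout are assumptions for illustration. They are not an existing GoodGo API.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

const IV_BYTES = 12;  // 96-bit nonce, the recommended size for GCM
const TAG_BYTES = 16; // 128-bit authentication tag

// Encrypts a value before it is written to L2 (Redis).
// Output layout (an assumption of this sketch): base64(iv | authTag | ciphertext).
function encryptForCache(plaintext: string, key: Buffer): string {
  if (key.length !== 32) throw new RangeError('AES-256 needs a 32-byte key');
  const iv = randomBytes(IV_BYTES); // fresh nonce per value; never reuse one with the same key
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString('base64');
}

// Decrypts a value read from L2; throws if it was modified or the key is wrong.
function decryptFromCache(encoded: string, key: Buffer): string {
  const data = Buffer.from(encoded, 'base64');
  const iv = data.subarray(0, IV_BYTES);
  const tag = data.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  const plaintext = Buffer.concat([decipher.update(data.subarray(IV_BYTES + TAG_BYTES)), decipher.final()]);
  return plaintext.toString('utf8');
}

// Usage: in practice the key comes from the secrets store, not randomBytes().
const cacheKey = randomBytes(32);
const stored = encryptForCache(JSON.stringify({ email: 'user@example.com' }), cacheKey);
console.log(decryptFromCache(stored, cacheKey)); // {"email":"user@example.com"}
```

GCM authenticates as well as encrypts. A value altered in Redis therefore fails to decrypt instead of returning corrupted data.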
+
+## Deployment
+
+```mermaid
+graph TD
+    subgraph "Kubernetes Pod"
+        Service[Microservice Container]
+        L1["L1 Cache (RAM)"]
+        Service --- L1
+    end
+
+    subgraph "Infrastructure"
+        RedisMaster[Redis Master]
+        RedisSlave1[Redis Slave 1]
+        RedisSlave2[Redis Slave 2]
+    end
+
+    Service -->|Write| RedisMaster
+    Service -->|Read| RedisSlave1
+    Service -->|Read| RedisSlave2
+
+    RedisMaster -.->|Replication| RedisSlave1
+    RedisMaster -.->|Replication| RedisSlave2
+
+    style Service fill:#e1f5ff
+    style L1 fill:#d4edda
+    style RedisMaster fill:#fff4e1
+```
+
+**Deployment Description**:
+- **L1**: Embedded directly in Microservice process, scales with number of Pods.
+- **L2**: Redis Cluster (or Sentinel) with at least 3 nodes for High Availability.
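+- **Connection Reuse**: Create one long-lived ioredis client per process and share it. ioredis sends concurrent commands over a single connection, so there is no need to open a connection per request.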
+
+## Monitoring & Observability
+
+### Monitoring Metrics
+- **Metrics**: Prometheus metrics for hit rate, miss rate, latency, memory usage.
+- **Logs**: Log cache miss/hit at debug level (sampled), log connection errors at error level.
+- **Health Checks**: Readiness probe checks connection to Redis.
+
+### Monitoring Code

 **Cache Hit Rates**:
 ```typescript
 // Track cache performance
 export class CacheMetrics {
-  private hits = new Counter({
-    name: 'cache_hits_total',
-    help: 'Total cache hits',
-    labelNames: ['layer', 'key_prefix']
-  });
-  
-  private misses = new Counter({
-    name: 'cache_misses_total',
-    help: 'Total cache misses',
-    labelNames: ['layer', 'key_prefix']
-  });
-  
-  recordHit(layer: 'l1' | 'l2', key: string): void {
-    const prefix = key.split(':')[0];
-    this.hits.inc({ layer, key_prefix: prefix });
-  }
-  
-  recordMiss(key: string): void {
-    const prefix = key.split(':')[0];
-    this.misses.inc({ layer: 'db', key_prefix: prefix });
-  }
+  // ... Prometheus Implementation ...
 }
 ```
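The stub above omits the Prometheus wiring. The class below is a dependency-free sketch of the same bookkeeping. It counts hits per layer and key prefix, plus misses that fall through to the database. This mirrors the `layer` / `key_prefix` labels of the counters this commit removes. `CacheMetricsSketch` and `hitRate` are hypothetical names. In production, the maps would be prom-client `Counter`s.

```typescript
type Layer = 'l1' | 'l2';

// In-memory stand-in for Prometheus counters such as cache_hits_total and
// cache_misses_total, labelled by layer and key prefix.
class CacheMetricsSketch {
  private hits = new Map<string, number>();   // `${layer}|${prefix}` -> count
  private misses = new Map<string, number>(); // prefix -> count (lookup fell through to the DB)

  // Prefix = service segment of `{service}:{entity}:...`, e.g. "iam".
  private static prefixOf(key: string): string {
    const [service] = key.split(':');
    return service ?? key;
  }

  recordHit(layer: Layer, key: string): void {
    const id = `${layer}|${CacheMetricsSketch.prefixOf(key)}`;
    this.hits.set(id, (this.hits.get(id) ?? 0) + 1);
  }

  recordMiss(key: string): void {
    const prefix = CacheMetricsSketch.prefixOf(key);
    this.misses.set(prefix, (this.misses.get(prefix) ?? 0) + 1);
  }

  // Combined (L1 + L2) hit rate for one prefix; 0 when nothing was recorded.
  hitRate(prefix: string): number {
    const hits = (this.hits.get(`l1|${prefix}`) ?? 0) + (this.hits.get(`l2|${prefix}`) ?? 0);
    const total = hits + (this.misses.get(prefix) ?? 0);
    return total === 0 ? 0 : hits / total;
  }
}

const metrics = new CacheMetricsSketch();
metrics.recordHit('l1', 'iam:user:user_123');
metrics.recordHit('l2', 'iam:user:user_123:permissions');
metrics.recordMiss('iam:user:user_456');
console.log(metrics.hitRate('iam').toFixed(3)); // 0.667
```

With real Prometheus counters, the hit rate is usually derived in PromQL from the two counters rather than computed in process.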

@@ -308,7 +363,7 @@ export class CacheMetrics {
 | Hit Rate | 40-50% | 80-90% | - |
 | Capacity | 10k keys | Unlimited | - |

-## Best Practices / Best Practices
+## Best Practices

 **DO**:
 - ✅ Use cache for frequently accessed data
@@ -325,13 +380,3 @@ export class CacheMetrics {
 - ❌ Cache sensitive data without encryption
 - ❌ Ignore cache invalidation on updates
 - ❌ Use cache as primary data store
-
-## Related Documentation / Tài liệu Liên quan
-
-- [System Design](./system-design.md) - Overall architecture with caching
-- [Data Consistency Patterns](./data-consistency-patterns.md) - Cache invalidation patterns
-
---
-
-**Last Updated**: 2024-01-15  
-**Authors**: GoodGo Architecture Team
--- a/docs/en/architecture/event-driven-architecture.md
+++ b/docs/en/architecture/event-driven-architecture.md
@@ -1,9 +1,8 @@
-# Event-Driven Architecture / Kiến trúc Hướng Sự kiện
+# Event-Driven Architecture

-> **EN**: Event-driven architecture for asynchronous communication using Apache Kafka
-> **VI**: Event-driven architecture for asynchronous communication using Apache Kafka
+> Event-driven architecture for asynchronous communication using Apache Kafka

-## Overview Diagram / Sơ đồ Tổng quan
+## Overview Diagram

 ```mermaid
 graph TD
@@ -32,9 +31,7 @@ graph TD
    style Topics fill:#fff4e1
 ```

-## Architecture Description / Mô tả Kiến trúc
-
-### EN: English Section
+## Architecture Description

 The GoodGo platform implements Event-Driven Architecture (EDA) for asynchronous communication between microservices.

@@ -47,28 +44,11 @@ The GoodGo platform implements Event-Driven Architecture (EDA) for asynchronous

 **Technology Stack**:
 - Apache Kafka - Event streaming platform
-- Schema Registry - Avro schemas for validation  
+- Schema Registry - Avro schemas for validation
 - KafkaJS - Node.js client library
 - Event Sourcing - Custom implementation in IAM

-### VI: Vietnamese Section
-
-The GoodGo platform implements Event-Driven Architecture (EDA) for asynchronous communication between microservices.
-
-**Core Principles**:
-1. **Event-First Design**: Every state change emits domain events
-2. **Loose Coupling**: Services communicate via events
-3. **Eventual Consistency**: Accept temporary inconsistency
-4. **Event Sourcing**: Store changes as a sequence of events
-5. **CQRS Pattern**: Separate read/write operations
-
-**Technology Stack**:
-- Apache Kafka - Event streaming platform
-- Schema Registry - Avro schemas for validation
-- KafkaJS - Node.js client library
-- Event Sourcing - Custom implementation in IAM
-
-## Event Flow / Luồng Sự kiện
+## Event Flow

 ```mermaid
 sequenceDiagram
@@ -82,11 +62,9 @@ sequenceDiagram
    Consumer-->>Kafka: Acknowledge
 ```

-**EN Steps**: Publish → Distribute → Consume → Retry (if failed) → DLQ (after max retries) → Acknowledge
+**Steps**: Publish → Distribute → Consume → Retry (if failed) → DLQ (after max retries) → Acknowledge

-**VI Steps**: Publish → Distribute → Consume → Retry (if failed) → DLQ (after max retries) → Acknowledge
-
-## Event Structure / Cấu trúc Sự kiện
+## Event Structure

 ```typescript
 interface BaseEvent {
@@ -114,7 +92,7 @@ interface BaseEvent {
 }
 ```

-## Kafka Topics / Kafka Topics
+## Kafka Topics

 ```mermaid
 graph LR
@@ -134,7 +112,7 @@ graph LR
 - `auth.login.success.v1`
 - `audit.event.logged.v1`

-## Error Handling / Xử lý Lỗi
+## Error Handling

 ```mermaid
 graph TD
@@ -151,12 +129,247 @@ graph TD
 3. Move to DLQ after max retries
 4. Manual review and reprocess
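The retry-then-DLQ steps can be sketched as a wrapper around a consumer's handler. This is an illustration under assumptions. The envelope type, the `sendToDlq` callback and the exponential backoff are hypothetical; the document does not specify a backoff strategy. No KafkaJS API is used.

```typescript
// Hypothetical minimal envelope; the real BaseEvent defines more fields.
interface EventEnvelope {
  eventId: string;
  eventType: string;
  payload: unknown;
}

interface RetryPolicy {
  maxRetries: number;  // retries after the first attempt
  baseDelayMs: number; // delay before the first retry; doubled on each further retry
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs the handler, retrying with exponential backoff; once retries are
// exhausted the event goes to the dead-letter queue instead of blocking the partition.
async function processWithRetry(
  event: EventEnvelope,
  handler: (e: EventEnvelope) => Promise<void>,
  sendToDlq: (e: EventEnvelope, error: unknown) => Promise<void>,
  policy: RetryPolicy,
): Promise<'processed' | 'dead-lettered'> {
  for (let attempt = 0; attempt <= policy.maxRetries; attempt++) {
    try {
      await handler(event);
      return 'processed';
    } catch (error) {
      if (attempt === policy.maxRetries) {
        await sendToDlq(event, error);
        return 'dead-lettered';
      }
      await sleep(policy.baseDelayMs * 2 ** attempt);
    }
  }
  throw new Error('maxRetries must be >= 0');
}

// Usage: a handler that fails twice, then succeeds on the third attempt.
let calls = 0;
const flakyHandler = async (): Promise<void> => {
  calls += 1;
  if (calls < 3) throw new Error('transient failure');
};
processWithRetry(
  { eventId: 'evt_1', eventType: 'user.created', payload: {} },
  flakyHandler,
  async (e) => console.log('moved to DLQ:', e.eventId),
  { maxRetries: 3, baseDelayMs: 10 },
).then((outcome) => console.log(outcome, calls)); // processed 3
```

Returning after dead-lettering lets the consumer commit the offset, which corresponds to the Acknowledge step. The retry count and delays are placeholders.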

-## Related Documentation / Tài liệu Liên quan
+## System Context
+
+```mermaid
+C4Context
+    title Event-Driven Architecture Context
+    
+    System(iam, "IAM Service", "Event producer")
+    System(service_a, "Service A", "Event producer")
+    System(notification, "Notification Service", "Event consumer")
+    System(audit, "Audit Service", "Event consumer")
+    
+    System_Ext(kafka, "Apache Kafka", "Event streaming platform")
+    System_Ext(registry, "Schema Registry", "Schema management")
+    System_Ext(monitoring, "Monitoring", "Kafka metrics & alerts")
+    
+    Rel(iam, kafka, "Publishes events", "Kafka Protocol")
+    Rel(service_a, kafka, "Publishes events", "Kafka Protocol")
+    Rel(kafka, notification, "Delivers events", "Kafka Protocol")
+    Rel(kafka, audit, "Delivers events", "Kafka Protocol")
+    Rel(iam, registry, "Registers & fetches schemas", "HTTP")
+    Rel(kafka, monitoring, "Sends metrics", "JMX")
+```
+
+**Context Description**:
+- **Producers**: IAM Service and other services publish domain events
+- **Kafka**: Central event broker, manages topics and partitions
+- **Consumers**: Notification and Audit services consume events
+- **Schema Registry**: Manages and validates Avro schemas
+- **Monitoring**: Collects metrics from Kafka cluster
+
+## Performance Characteristics
+
+| Metric | Target | Notes |
+|--------|--------|-------|
+| **Event Publish Latency (P95)** | < 10ms | Fire-and-forget, async |
+| **Event Delivery Latency (P95)** | < 100ms | End-to-end from publish to consume |
+| **Throughput** | 10,000 events/s | Per topic, scalable with partitions |
+| **Consumer Lag** | < 1000 messages | Per partition, monitored |
+| **Event Size** | < 1MB | Recommended max size |
+| **Retention** | 7 days | Default, configurable per topic |
+| **Replication Factor** | 3 | For fault tolerance |
+
+**Performance Optimizations**:
+- **Batch Publishing**: Group multiple events to reduce network overhead
+- **Compression**: Use Snappy or LZ4 compression
+- **Partitioning**: Divide topics into multiple partitions for parallel processing
+- **Consumer Groups**: Multiple consumers in same group for horizontal scaling
+- **Async Publishing**: Fire-and-forget pattern, don't block request handlers
+
+## Security Considerations
+
+**Event Encryption**:
+- TLS in transit for all Kafka connections
+- Optional end-to-end payload encryption for sensitive data, via a custom encryption layer
+
+**Access Control**:
+- Kafka ACLs (Access Control Lists) per topic
+- SASL/SCRAM authentication for producers and consumers
+- Separate credentials per service
+- Principle of least privilege - grant only necessary permissions
+
+**Schema Validation**:
+- Avro schemas in Schema Registry
+- Schema evolution with backward/forward compatibility
+- Reject events that don't match schema
+
+**Audit**:
+- Log all event publishes and consumes
+- Correlation IDs to trace event flow
+- Retention policy for audit logs (7 years)
+
+**Data Retention**:
+- Default 7 days retention
+- Configurable per topic
+- Automatic deletion after retention period
+- GDPR compliance (right to erasure)
+
+## Deployment
+
+```mermaid
+graph TD
+    subgraph "Kafka Cluster"
+        subgraph "Brokers"
+            Broker1[Kafka Broker 1<br/>Leader for partitions 0,3,6]
+            Broker2[Kafka Broker 2<br/>Leader for partitions 1,4,7]
+            Broker3[Kafka Broker 3<br/>Leader for partitions 2,5,8]
+        end
+        
+        subgraph "Coordination"
+            ZK[Zookeeper Ensemble<br/>3 nodes]
+        end
+        
+        Broker1 --> ZK
+        Broker2 --> ZK
+        Broker3 --> ZK
+    end
+    
+    subgraph "Producers"
+        IAM[IAM Service]
+        ServiceA[Service A]
+    end
+    
+    subgraph "Consumers"
+        Notification[Notification Service<br/>Consumer Group: notifications]
+        Audit[Audit Service<br/>Consumer Group: audit]
+    end
+    
+    IAM --> Broker1
+    IAM --> Broker2
+    IAM --> Broker3
+    
+    ServiceA --> Broker1
+    ServiceA --> Broker2
+    ServiceA --> Broker3
+    
+    Broker1 --> Notification
+    Broker2 --> Notification
+    Broker3 --> Notification
+    
+    Broker1 --> Audit
+    Broker2 --> Audit
+    Broker3 --> Audit
+    
+    style Broker1 fill:#e1f5ff
+    style Broker2 fill:#fff4e1
+    style Broker3 fill:#d4edda
+    style ZK fill:#f0e1ff
+```
+
+**Kafka Cluster Configuration**:
+- **Brokers**: 3 brokers minimum (5 for production)
+- **Replication Factor**: 3 (for fault tolerance)
+- **Min In-Sync Replicas**: 2 (ensure data durability)
+- **Partitions**: 3-10 per topic (based on throughput needs)
+- **Zookeeper**: 3-node ensemble (for coordination)
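Replication factor and min in-sync replicas together determine how many replica failures a partition can survive while still accepting writes. Below is a worked check of the values above. It assumes producers use `acks=all`; the document does not state the producer `acks` setting. The function name is illustrative.

```typescript
interface TopicDurability {
  replicationFactor: number;
  minInSyncReplicas: number;
}

// Replica failures a partition can absorb while acks=all writes still succeed:
// the leader rejects writes once the in-sync set drops below min.insync.replicas.
function writableFailureTolerance({ replicationFactor, minInSyncReplicas }: TopicDurability): number {
  if (minInSyncReplicas < 1 || minInSyncReplicas > replicationFactor) {
    throw new RangeError('min.insync.replicas must be between 1 and the replication factor');
  }
  return replicationFactor - minInSyncReplicas;
}

// Values from the cluster configuration above.
console.log(writableFailureTolerance({ replicationFactor: 3, minInSyncReplicas: 2 })); // 1
```

With 3 brokers and a replication factor of 3, every broker holds a replica of each partition. A second broker outage would therefore cause `acks=all` producers to receive not-enough-replicas errors until a broker returns.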
+
+**Resource Allocation**:
+| Component | CPU | Memory | Disk |
+|-----------|-----|--------|------|
+| **Kafka Broker** | 2 cores | 4GB RAM | 100GB SSD |
+| **Zookeeper** | 1 core | 2GB RAM | 20GB SSD |
+| **Schema Registry** | 0.5 core | 1GB RAM | 10GB |
+
+**Topic Configuration**:
+```yaml
+user.created:
+  partitions: 3
+  replication-factor: 3
+  retention-ms: 604800000  # 7 days
+  compression-type: snappy
+
+auth.login.success:
+  partitions: 5
+  replication-factor: 3
+  retention-ms: 604800000
+  compression-type: snappy
+
+audit.events:
+  partitions: 10
+  replication-factor: 3
+  retention-ms: 220752000000  # 7 years
+  compression-type: lz4
+```
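The `retention-ms` values above are day counts converted to milliseconds. Kafka's own topic-level setting is named `retention.ms`. A small helper makes the arithmetic explicit; `retentionMs` is a hypothetical name.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000 ms

// retention.ms value for a whole number of days.
function retentionMs(days: number): number {
  if (!Number.isInteger(days) || days <= 0) throw new RangeError('days must be a positive integer');
  return days * MS_PER_DAY;
}

console.log(retentionMs(7));       // 604800000: user.created, auth.login.success
console.log(retentionMs(7 * 365)); // 220752000000: audit.events
```

The 7-year figure counts 365-day years. It ignores leap days, so it falls one or two days short of seven calendar years.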
+
+**High Availability**:
+- Multiple brokers with partition replication
+- Automatic leader election when broker fails
+- Consumer group rebalancing
+- Monitoring and alerting for broker health
+
+## Monitoring & Observability
+
+**Key Metrics**:
+
+**Kafka Broker Metrics**:
+- `kafka_server_brokertopicmetrics_messagesinpersec` - Messages in/sec
+- `kafka_server_brokertopicmetrics_bytesinpersec` - Bytes in/sec
+- `kafka_server_brokertopicmetrics_bytesoutpersec` - Bytes out/sec
+- `kafka_controller_kafkacontroller_activecontrollercount` - Active controller
+- `kafka_server_replicamanager_underreplicatedpartitions` - Under-replicated partitions
+
+**Consumer Metrics**:
+- `kafka_consumer_fetch_manager_records_lag_max` - Max consumer lag
+- `kafka_consumer_fetch_manager_records_consumed_rate` - Records consumed/sec
+- `kafka_consumer_coordinator_commit_latency_avg` - Commit latency
+
+**Producer Metrics**:
+- `kafka_producer_record_send_total` - Total records sent
+- `kafka_producer_record_error_total` - Total send errors
+- `kafka_producer_request_latency_avg` - Request latency
+
+**Application Metrics**:
+```typescript
+import { Counter, Histogram } from 'prom-client';
+
+// Custom metrics for event processing
+const eventPublished = new Counter({
+  name: 'events_published_total',
+  help: 'Total events published',
+  labelNames: ['event_type', 'topic']
+});
+
+const eventConsumed = new Counter({
+  name: 'events_consumed_total',
+  help: 'Total events consumed',
+  labelNames: ['event_type', 'topic', 'consumer_group']
+});
+
+const eventProcessingDuration = new Histogram({
+  name: 'event_processing_duration_seconds',
+  help: 'Event processing duration',
+  labelNames: ['event_type'],
+  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5]
+});
+```
+
+**Dashboards**:
+- Kafka Cluster Overview (brokers, topics, partitions)
+- Producer Performance (throughput, latency, errors)
+- Consumer Performance (lag, throughput, errors)
+- Topic Metrics (messages/sec, bytes/sec, retention)
+
+**Logging**:
+```typescript
+// Structured logging for events; `logger`, `event` and `duration`
+// come from the surrounding consumer handler (not shown)
+logger.info('Event published', {
+  eventId: event.eventId,
+  eventType: event.eventType,
+  topic: 'user.created',
+  correlationId: event.correlationId
+});
+
+logger.info('Event consumed', {
+  eventId: event.eventId,
+  eventType: event.eventType,
+  topic: 'user.created',
+  consumerGroup: 'notifications',
+  processingTime: duration
+});
+```
+
+## Related Documentation

 - [System Design](./system-design.md) - Overall architecture
 - [IAM Architecture](./iam-proposal.md) - Event sourcing implementation
-
---
-
-**Last Updated**: 2024-01-15  
-**Authors**: GoodGo Architecture Team
--- a/docs/en/architecture/security-architecture.md
+++ b/docs/en/architecture/security-architecture.md
@@ -1,9 +1,8 @@
-# Security Architecture / Kiến trúc Bảo mật
+# Security Architecture

-> **EN**: Comprehensive security architecture for GoodGo platform with zero-trust model, RBAC, and compliance
-> **VI**: Comprehensive security architecture for the GoodGo platform with a zero-trust model, RBAC, and compliance
+> Comprehensive security architecture for GoodGo platform with zero-trust model, RBAC, and compliance

-## Overview Diagram / Sơ đồ Tổng quan
+## Overview Diagram

 ```mermaid
 graph TD
@@ -26,9 +25,7 @@ graph TD
    style Audit fill:#fff4e1
 ```

-## Architecture Description / Mô tả Kiến trúc
-
-### EN: English Section
+## Architecture Description

 The GoodGo Security Architecture implements defense-in-depth with multiple security layers:

@@ -47,26 +44,7 @@ The GoodGo Security Architecture implements defense-in-depth with multiple secur
 - Event Sourcing for Audit Trail
 - Compliance (GDPR, SOC2, ISO27001, HIPAA)

-### VI: Vietnamese Section
-
-The GoodGo Security Architecture implements defense-in-depth with multiple security layers:
-
-**Security Principles**:
-1. **Zero Trust**: Never trust, always verify
-2. **Least Privilege**: Minimum necessary permissions
-3. **Defense in Depth**: Multiple security layers
-4. **Audit Everything**: Complete audit trail
-5. **Encryption**: Encrypt data at rest and in transit
-
-**Key Components**:
-- JWT Authentication (15min access, 7-day refresh)
-- RBAC + ABAC Authorization
-- Zero-Trust Device Validation
-- AES-256-GCM Encryption
-- Event Sourcing for Audit Trail
-- Compliance (GDPR, SOC2, ISO27001, HIPAA)
-
-## Authentication Flow / Luồng Xác thực
+## Authentication Flow

 ```mermaid
 sequenceDiagram
@@ -93,7 +71,7 @@ sequenceDiagram
    end
 ```

-**EN: Authentication Details**:
+**Authentication Details**:

 **1. Password Hashing**:
 - Algorithm: bcrypt with cost factor 12
@@ -116,30 +94,7 @@ sequenceDiagram
 - Backup codes (10 single-use)
 - Recovery email verification

-**VI: Authentication Details**:
-
-**1. Password Hashing**:
-- Algorithm: bcrypt with cost factor 12
-- Never store plaintext passwords
-- Minimum password: 8 characters with complexity rules
-
-**2. JWT Tokens**:
-- Access Token: 15 minutes expiry
-- Refresh Token: 7 days expiry
-- Algorithm: RS256 (asymmetric signing)
-- Payload: userId, roles, permissions
-
-**3. Token Storage**:
-- Access: httpOnly cookie (secure, sameSite)
-- Refresh: Database SHA-256 hash
-- Rotation: New refresh token on every use
-
-**4. MFA Support**:
-- TOTP (Time-based One-Time Password)
-- Backup codes (10 single-use)
-- Recovery email verification
-
-## Authorization Model / Mô hình Phân quyền
+## Authorization Model

 ```mermaid
 graph TD
@@ -162,7 +117,7 @@ graph TD
    style Perm fill:#fff4e1
 ```

-**EN: RBAC (Role-Based Access Control)**
+**RBAC (Role-Based Access Control)**:

 **1. Role Hierarchy**:
 ```
@@ -186,24 +141,7 @@ SuperAdmin > OrgAdmin > Manager > User > Guest
 // Invalidate on: role change, permission change
 ```

-**VI: RBAC (Role-Based Access Control)**
-
-**1. Role Hierarchy**:
-```
-SuperAdmin > OrgAdmin > Manager > User > Guest
-```
-
-**2. Permission Format**: `resource:action:scope`
-- Resource: `users`, `roles`, `permissions`
-- Action: `create`, `read`, `update`, `delete`
-- Scope: `own`, `org`, `global`
-
-**Examples**:
-- `users:read:own` - Read own profile
-- `users:update:org` - Update users in the organization
-- `roles:create:global` - Create roles globally
-
-## Zero-Trust Architecture / Kiến trúc Zero-Trust
+## Zero-Trust Architecture

 ```mermaid
 graph TD
@@ -221,7 +159,7 @@ graph TD
    style Allow fill:#d4edda
 ```

-**EN: Zero-Trust Components**:
+**Zero-Trust Components**:

 **1. Device Fingerprinting**:
 - Browser: User-Agent, Canvas, WebGL
@@ -245,22 +183,9 @@ graph TD
 - Bind session to IP address
 - Invalidate on mismatch

-**VI: Zero-Trust Components**:
+## Data Protection

-**1. Device Fingerprinting**:
-- Browser: User-Agent, Canvas, WebGL
-- Screen resolution, timezone, language
-- Plugin and available-font detection
-- Hash fingerprint → store with session
-
-**2. IP Address Validation**:
-- Whitelist known IPs for the user
-- Alert on new IP + require MFA
-- Block suspicious IPs (VPN, Tor)
-
-## Data Protection / Bảo vệ Dữ liệu
-
-**EN: Encryption Strategy**:
+**Encryption Strategy**:

 **1. Data at Rest**:
 - PII: AES-256-GCM encryption
@@ -279,22 +204,9 @@ graph TD
 - Rotate keys quarterly
 - Never hardcode secrets

-**VI: Encryption Strategy**:
+## Compliance & Audit

-**1. Data at Rest**:
-- PII: AES-256-GCM encryption
-- Passwords: bcrypt (cost 12)
-- Tokens: SHA-256 hash
-- Keys: Environment variables + K8s secrets
-
-**2. Data in Transit**:
-- TLS 1.2+ for all communication
-- HTTPS enforcement
-- Certificate pinning (mobile clients)
-
-## Compliance & Audit / Tuân thủ & Kiểm toán
-
-**EN: Compliance Requirements**:
+**Compliance Requirements**:

 **1. GDPR**:
 - Right to erasure (soft delete + hard delete after 90 days)
@@ -308,7 +220,6 @@ graph TD
 - Audit logging (7-year retention)
 - Incident response plan

-**3. Audit Trail**:
 ```typescript
 // Event sourcing for all auth events
 {
@@ -321,27 +232,338 @@ graph TD
 }
 ```

-**VI: Compliance Requirements**:
+## System Context

-**1. GDPR**:
-- Right to erasure (soft delete + hard delete after 90 days)
-- Data portability (user data export)
-- Consent management
-- Breach notification (72 hours)
+```mermaid
+C4Context
+    title Security Architecture Context
+    
+    Person(user, "User", "End user accessing platform")
+    Person(admin, "Admin", "System administrator")
+    Person(attacker, "Attacker", "Potential threat actor")
+    
+    System(iam, "IAM Service", "Authentication & Authorization")
+    
+    System_Ext(db, "Neon PostgreSQL", "Encrypted user credentials & sessions")
+    System_Ext(cache, "Redis", "Permission & session cache")
+    System_Ext(audit, "Audit Service", "Security event logging")
+    System_Ext(mfa, "MFA Provider", "TOTP verification")
+    System_Ext(monitoring, "Security Monitoring", "SIEM & alerting")
+    
+    Rel(user, iam, "Authenticates", "HTTPS + TLS 1.2+")
+    Rel(admin, iam, "Manages permissions", "HTTPS + TLS 1.2+")
+    Rel(attacker, iam, "Blocked by security layers", "")
+    
+    Rel(iam, db, "Stores credentials", "PostgreSQL + TLS")
+    Rel(iam, cache, "Caches permissions", "Redis + TLS")
+    Rel(iam, audit, "Logs security events", "Kafka")
+    Rel(iam, mfa, "Verifies MFA", "HTTPS")
+    Rel(iam, monitoring, "Sends security metrics", "Prometheus + Loki")
+```

-**2. SOC2**:
-- Access controls (RBAC)
-- Encryption at rest and in transit
-- Audit logging (7-year retention)
-- Incident response plan
+**Context Description**:
+- **IAM Service**: Central authentication and authorization
+- **Database**: Stores encrypted credentials, sessions, permissions
+- **Cache**: Caches permissions and sessions to reduce database load
+- **Audit Service**: Receives and stores all security events
+- **MFA Provider**: External TOTP verification service (Google Authenticator compatible)
+- **Security Monitoring**: SIEM (Security Information and Event Management) and alerting

-## Related Documentation / Tài liệu Liên quan
+## Database Architecture

-- [System Design](./system-design.md) - Overall architecture
-- [IAM Architecture](./iam-proposal.md) - IAM service implementation
-- [Event-Driven Architecture](./event-driven-architecture.md) - Audit event streaming
+```mermaid
+erDiagram
+    User ||--o{ Session : has
+    User ||--o{ UserRole : has
+    User ||--o{ UserPermission : has
+    User ||--o{ MFADevice : has
+    User ||--o{ LoginHistory : has
+    User ||--o{ DeviceFingerprint : has
+    
+    Role ||--o{ UserRole : assigned_to
+    Role ||--o{ RolePermission : has
+    
+    Permission ||--o{ RolePermission : granted_to
+    Permission ||--o{ UserPermission : granted_to
+    
+    Organization ||--o{ User : contains
+    Organization ||--o{ Role : defines
+    
+    User {
+        string id PK "CUID"
+        string email UK "Unique, indexed"
+        string passwordHash "bcrypt cost 12"
+        string organizationId FK
+        boolean mfaEnabled "MFA required?"
+        datetime lastLoginAt "Tracking"
+        datetime createdAt "Timestamp"
+        datetime updatedAt "Timestamp"
+        datetime deletedAt "Soft delete"
+    }
+    
+    Session {
+        string id PK "CUID"
+        string userId FK
+        string refreshTokenHash "SHA-256"
+        string deviceFingerprint "Hashed"
+        string ipAddress "IPv4/IPv6"
+        string userAgent "Browser info"
+        datetime expiresAt "7 days TTL"
+        datetime lastActivityAt "Tracking"
+        datetime createdAt "Timestamp"
+    }
+    
+    Role {
+        string id PK "CUID"
+        string name "role-name"
+        string organizationId FK
+        int hierarchy "Priority level"
+        boolean isSystem "Built-in?"
+        datetime createdAt "Timestamp"
+    }
+    
+    Permission {
+        string id PK "CUID"
+        string resource "users, roles, etc"
+        string action "create, read, update, delete"
+        string scope "own, org, global"
+        datetime createdAt "Timestamp"
+    }
+    
+    MFADevice {
+        string id PK "CUID"
+        string userId FK
+        string type "totp, backup"
+        string secret "Encrypted TOTP secret"
+        boolean verified "Verified?"
+        datetime lastUsedAt "Tracking"
+        datetime createdAt "Timestamp"
+    }
+    
+    LoginHistory {
+        string id PK "CUID"
+        string userId FK
+        boolean success "Success/Failure"
+        string ipAddress "IPv4/IPv6"
+        string deviceFingerprint "Hashed"
+        string failureReason "If failed"
+        datetime timestamp "Event time"
+    }
+    
+    DeviceFingerprint {
+        string id PK "CUID"
+        string userId FK
+        string fingerprint "Hashed"
+        boolean trusted "Auto-approved?"
+        datetime firstSeenAt "First use"
+        datetime lastSeenAt "Last use"
+    }
+```

---
+**Description**:
+- **User**: Stores hashed credentials, MFA settings, organization membership
+- **Session**: Stores hashed refresh tokens, device fingerprint, IP tracking
+- **Role & Permission**: RBAC hierarchy with system roles and custom roles
+- **MFADevice**: TOTP secrets (encrypted), backup codes
+- **LoginHistory**: Audit trail for all login attempts (success/failure)
+- **DeviceFingerprint**: Trusted device tracking for zero-trust model
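A permission check against the `resource:action:scope` format stored by the Permission entity might look like the sketch below. It assumes a broader scope covers the narrower ones: `global` covers `org`, which covers `own`. The document does not spell that rule out, and the function names are hypothetical.

```typescript
type Scope = 'own' | 'org' | 'global';

// Assumption: a broader scope implies the narrower ones (global covers org covers own).
const SCOPE_RANK: Record<Scope, number> = { own: 0, org: 1, global: 2 };

interface ParsedPermission {
  resource: string;
  action: string;
  scope: Scope;
}

function isScope(value: string): value is Scope {
  return value === 'own' || value === 'org' || value === 'global';
}

// Parses "resource:action:scope", e.g. "users:update:org".
function parsePermission(value: string): ParsedPermission {
  const parts = value.split(':');
  if (parts.length !== 3) throw new Error(`invalid permission: ${value}`);
  const [resource, action, scope] = parts as [string, string, string];
  if (!isScope(scope)) throw new Error(`invalid scope in: ${value}`);
  return { resource, action, scope };
}

// True when some granted permission has the same resource and action
// and a scope at least as broad as the required one.
function isAllowed(granted: readonly string[], required: string): boolean {
  const need = parsePermission(required);
  return granted
    .map(parsePermission)
    .some((p) => p.resource === need.resource && p.action === need.action
      && SCOPE_RANK[p.scope] >= SCOPE_RANK[need.scope]);
}

const granted = ['users:update:org', 'users:read:own'];
console.log(isAllowed(granted, 'users:update:own'));    // true (org covers own)
console.log(isAllowed(granted, 'users:read:org'));      // false
console.log(isAllowed(granted, 'roles:create:global')); // false
```

An `own` grant still needs a separate ownership check (the caller must own the target resource). This sketch leaves that out.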

-**Last Updated**: 2024-01-15  
-**Authors**: GoodGo Security Team
+**Database Security**:
+- Password hashes: bcrypt with cost factor 12
+- Token hashes: SHA-256
+- MFA secrets: AES-256-GCM encryption
+- Soft deletes: `deletedAt` field, hard delete after 90 days (GDPR)
+- Indexes: email (unique), userId (foreign keys), timestamps
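Storing only a SHA-256 hash of the refresh token means a database leak does not expose usable tokens. Below is a sketch using Node's `crypto` module. It assumes refresh tokens are opaque random strings; the same hashing applies if they are JWTs. The helper names are hypothetical.

```typescript
import { createHash, randomBytes, timingSafeEqual } from 'node:crypto';

function sha256Hex(value: string): string {
  return createHash('sha256').update(value, 'utf8').digest('hex');
}

// Issues an opaque refresh token; only its hash goes into Session.refreshTokenHash.
function issueRefreshToken(): { token: string; hash: string } {
  const token = randomBytes(32).toString('base64url'); // 256 bits of randomness
  return { token, hash: sha256Hex(token) };
}

// Hashes the presented token and compares it with the stored hash in constant time.
function matchesStoredHash(presented: string, storedHash: string): boolean {
  const actual = Buffer.from(sha256Hex(presented), 'hex');
  const expected = Buffer.from(storedHash, 'hex');
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

const { token, hash } = issueRefreshToken();
console.log(matchesStoredHash(token, hash));       // true
console.log(matchesStoredHash(`${token}x`, hash)); // false
```

For rotation, each successful refresh would issue a new token and overwrite the session's `refreshTokenHash`. A replayed old token then no longer matches.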
+
+## Design Decisions
+
+### Decision 1: JWT with RS256 (Asymmetric)
+
+**Context**: Need stateless authentication with ability to verify tokens in multiple services
+
+**Decision**: Use JWT with RS256 (RSA asymmetric signing) instead of HS256 (HMAC symmetric)
+
+**Consequences**:
+- ✅ **Positive**:
+  - Services can verify tokens with public key, don't need secret
+  - Easier key rotation (only distribute new public key)
+  - Higher security (private key only in IAM service)
+  - Compliance: Clear audit trail of who signs tokens
+- ❌ **Negative**:
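+  - Slower than HS256: RSA signing costs considerably more CPU than HMAC, although verification is much cheaper than signing
+  - More complex key management
+  - The private key must be carefully protected, and the public key distributed reliably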
+
+**Alternatives**: HS256 (symmetric), EdDSA, OAuth 2.0 with Opaque Tokens
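Downstream services need only the public key to verify tokens. The dependency-free RS256 sketch below illustrates this using Node's built-in `crypto` (`generateKeyPairSync`, `sign`, `verify`). RSA keys with SHA-256 and the default PKCS#1 v1.5 padding correspond to RS256. The sketch is illustrative only and checks a single claim (`exp`). A maintained library such as `jose` would normally be used instead.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from 'node:crypto';

const toB64url = (data: Buffer): string => data.toString('base64url');
const fromB64urlJson = (part: string): Record<string, unknown> =>
  JSON.parse(Buffer.from(part, 'base64url').toString('utf8'));

// Signs a JWT with RS256 (RSASSA-PKCS1-v1_5 + SHA-256); only IAM holds privateKey.
function signJwt(payload: Record<string, unknown>, privateKey: KeyObject): string {
  const header = toB64url(Buffer.from(JSON.stringify({ alg: 'RS256', typ: 'JWT' })));
  const body = toB64url(Buffer.from(JSON.stringify(payload)));
  const signature = sign('sha256', Buffer.from(`${header}.${body}`), privateKey);
  return `${header}.${body}.${toB64url(signature)}`;
}

// Verifies with the public key only, as any downstream service could; null if invalid.
function verifyJwt(token: string, publicKey: KeyObject): Record<string, unknown> | null {
  try {
    const parts = token.split('.');
    if (parts.length !== 3) return null;
    const [header, body, signature] = parts as [string, string, string];
    if (fromB64urlJson(header).alg !== 'RS256') return null; // reject alg substitution such as "none"
    const valid = verify('sha256', Buffer.from(`${header}.${body}`), publicKey, Buffer.from(signature, 'base64url'));
    if (!valid) return null;
    const claims = fromB64urlJson(body);
    const exp = claims.exp;
    if (typeof exp !== 'number' || exp * 1000 <= Date.now()) return null; // expired or missing exp
    return claims;
  } catch {
    return null; // malformed base64 or JSON
  }
}

// Usage: a 15-minute access token.
const { publicKey, privateKey } = generateKeyPairSync('rsa', { modulusLength: 2048 });
const accessToken = signJwt({ sub: 'user_123', exp: Math.floor(Date.now() / 1000) + 15 * 60 }, privateKey);
console.log(verifyJwt(accessToken, publicKey)?.sub); // user_123
```

Pinning `alg` to `RS256` during verification matters. Trusting whatever algorithm the token header names is a common JWT pitfall.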
+
+### Decision 2: Zero-Trust Model with Device Fingerprinting
+
+**Context**: Need to protect against credential theft, session hijacking, and unauthorized access
+
+**Decision**: Implement zero-trust model with device fingerprinting, IP validation, behavioral analysis
+
+**Consequences**:
+- ✅ **Positive**:
+  - Detect anomalies (new device, new IP, unusual behavior)
+  - Increased security by detecting and blocking suspicious activities
+  - Compliance: SOC2, ISO27001 requirements
+  - User experience: Auto-approve trusted devices
+- ❌ **Negative**:
+  - Higher complexity
+  - Potential false positives (legitimate users blocked)
+  - Performance overhead (fingerprint hash, IP check)
+  - Privacy concerns (tracking devices, IPs)
+
+**Alternatives**: Basic authentication only, IP whitelist only, MFA required for all
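The fingerprinting described here reduces device signals to a hash stored with the session. Below is a minimal sketch. It assumes a fixed subset of signals and exact-match comparison; both are hypothetical, since the document does not define a canonical form.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical subset of the signals listed under "Device Fingerprinting".
interface DeviceSignals {
  userAgent: string;
  screenResolution: string; // e.g. "1920x1080"
  timezone: string;         // IANA name, e.g. "Asia/Ho_Chi_Minh"
  language: string;         // e.g. "vi-VN"
  canvasHash?: string;
}

// Canonicalizes the signals (fixed order, trimmed, language lowercased) and hashes
// them with SHA-256, so only the hash needs to be stored with the session.
function fingerprintHash(signals: DeviceSignals): string {
  const canonical = [
    signals.userAgent.trim(),
    signals.screenResolution.trim(),
    signals.timezone.trim(),
    signals.language.trim().toLowerCase(),
    signals.canvasHash?.trim() ?? '',
  ].join('\n');
  return createHash('sha256').update(canonical, 'utf8').digest('hex');
}

// A trusted device must match exactly; anything else is treated as new.
function isKnownDevice(signals: DeviceSignals, trustedHashes: ReadonlySet<string>): boolean {
  return trustedHashes.has(fingerprintHash(signals));
}

const laptop: DeviceSignals = {
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64)',
  screenResolution: '1920x1080',
  timezone: 'Asia/Ho_Chi_Minh',
  language: 'vi-VN',
};
const trusted = new Set([fingerprintHash(laptop)]);
console.log(isKnownDevice(laptop, trusted)); // true
console.log(isKnownDevice({ ...laptop, timezone: 'Europe/Berlin' }, trusted)); // false
```

Exact matching drives the false positives listed above. A browser update changes the User-Agent and therefore the hash, so the device looks new.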
+
+### Decision 3: Event Sourcing for Audit Trail
+
+**Context**: Need immutable audit trail for compliance (GDPR, SOC2, HIPAA) and security forensics
+
+**Decision**: Use event sourcing pattern to store all auth/security events
+
+**Consequences**:
+- ✅ **Positive**:
+  - Immutable audit trail (cannot modify/delete)
+  - Complete history of all security events
+  - Compliance: SOC2 (7-year audit log retention), GDPR, HIPAA
+  - Security forensics: Trace back attacks, breaches
+  - Replay events to reconstruct state
+- ❌ **Negative**:
+  - High storage cost (retain 7 years)
+  - Complexity in event schema versioning
+  - Performance: Event publishing overhead
+  - Data privacy: Must anonymize PII after retention period
+
+**Alternatives**: Database audit logs only, External SIEM only, No audit trail
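The append-only and replay properties can be illustrated in memory. In this sketch, the event types, state fields and class name are hypothetical. A real store would persist events durably rather than keeping them in an array.

```typescript
// Hypothetical security events; the real schemas are defined by the audit pipeline.
type SecurityEvent =
  | { type: 'LoginSucceeded'; userId: string; at: string }
  | { type: 'LoginFailed'; userId: string; at: string; reason: string }
  | { type: 'MfaEnabled'; userId: string; at: string };

interface UserSecurityState {
  failedLoginsSinceLastSuccess: number;
  lastLoginAt: string | null;
  mfaEnabled: boolean;
}

// Models the append-only contract: events can be added and read, never changed or removed.
class AppendOnlyEventStore {
  private readonly events: SecurityEvent[] = [];

  append(event: SecurityEvent): void {
    this.events.push(Object.freeze({ ...event }));
  }

  forUser(userId: string): readonly SecurityEvent[] {
    return this.events.filter((e) => e.userId === userId);
  }
}

const initialState: UserSecurityState = { failedLoginsSinceLastSuccess: 0, lastLoginAt: null, mfaEnabled: false };

// Rebuilds current state by folding the user's events in order.
function replay(events: readonly SecurityEvent[]): UserSecurityState {
  return events.reduce<UserSecurityState>((state, e) => {
    switch (e.type) {
      case 'LoginSucceeded':
        return { ...state, failedLoginsSinceLastSuccess: 0, lastLoginAt: e.at };
      case 'LoginFailed':
        return { ...state, failedLoginsSinceLastSuccess: state.failedLoginsSinceLastSuccess + 1 };
      case 'MfaEnabled':
        return { ...state, mfaEnabled: true };
      default:
        return state;
    }
  }, initialState);
}

const store = new AppendOnlyEventStore();
store.append({ type: 'LoginFailed', userId: 'user_123', at: '2026-01-07T03:00:00Z', reason: 'bad_password' });
store.append({ type: 'LoginFailed', userId: 'user_123', at: '2026-01-07T03:01:00Z', reason: 'bad_password' });
store.append({ type: 'LoginSucceeded', userId: 'user_123', at: '2026-01-07T03:02:00Z' });
store.append({ type: 'MfaEnabled', userId: 'user_123', at: '2026-01-07T03:05:00Z' });
console.log(replay(store.forUser('user_123')).lastLoginAt); // 2026-01-07T03:02:00Z
console.log(replay(store.forUser('user_123').slice(0, 2)).failedLoginsSinceLastSuccess); // 2
```

Replaying a prefix of the log reconstructs the state at that point, for example using only the events before an incident. This is what makes the trail useful for forensics.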
+
+## Performance Characteristics
+
+| Metric | Target | Notes |
+|--------|--------|-------|
+| **Login Time (P95)** | < 500ms | Including bcrypt verification |
+| **Login Time (P99)** | < 1s | Peak load |
+| **Token Generation (P95)** | < 50ms | JWT sign with RS256 |
+| **Token Verification (P95)** | < 10ms | JWT verify with public key |
+| **Permission Check (P95)** | < 5ms | From cache (L1 or L2) |
+| **Permission Check (Cache Miss)** | < 50ms | Database query |
+| **MFA Verification (P95)** | < 100ms | TOTP validation |
+| **Session Lookup (P95)** | < 10ms | Redis cache |
+| **Password Hash (P95)** | < 200ms | bcrypt cost 12 |
+| **Device Fingerprint Hash** | < 5ms | SHA-256 |
+| **Failed Login Rate Limit** | 5 attempts / 15min | Per user |
+| **Auth Throughput** | 500 req/s | Per IAM instance |
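+
+The failed-login limit (5 attempts per 15 minutes per user) can be sketched as a sliding-window counter. This in-memory version is illustrative only, and the class and method names are hypothetical. With several IAM instances, the attempt timestamps would have to live in Redis (for example in a sorted set) so that every instance sees the same count.
+
+```typescript
+const MAX_FAILED_ATTEMPTS = 5;
+const WINDOW_MS = 15 * 60 * 1000;
+
+// Hypothetical single-instance limiter; per-user timestamps of recent failures
+class FailedLoginLimiter {
+  private readonly attempts = new Map<string, number[]>();
+
+  constructor(
+    private readonly maxAttempts = MAX_FAILED_ATTEMPTS,
+    private readonly windowMs = WINDOW_MS,
+  ) {}
+
+  // Drop timestamps that have slid out of the window ending at `now`
+  private recent(userId: string, now: number): number[] {
+    const kept = (this.attempts.get(userId) ?? []).filter((t) => now - t < this.windowMs);
+    this.attempts.set(userId, kept);
+    return kept;
+  }
+
+  isBlocked(userId: string, now = Date.now()): boolean {
+    return this.recent(userId, now).length >= this.maxAttempts;
+  }
+
+  recordFailure(userId: string, now = Date.now()): void {
+    this.recent(userId, now).push(now);
+  }
+
+  // e.g. after a successful login
+  reset(userId: string): void {
+    this.attempts.delete(userId);
+  }
+}
+```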
+
+**Performance Optimizations**:
+- **Permission Caching**: L1 (memory) + L2 (Redis), TTL 5 minutes
+- **Public Key Caching**: Keep the JWT verification public key in memory instead of loading it per request
+- **Connection Pooling**: Reuse database connections
+- **Async Operations**: Event publishing, audit logging (fire-and-forget)
+- **Rate Limiting**: Prevent brute force attacks, reduce load
+- **Horizontal Scaling**: Multiple IAM service instances
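+
+The permission caching above (L1 in memory, L2 in Redis, 5-minute TTL) can be sketched as a read-through cache. `RemoteCache` is a hypothetical stand-in for a Redis client, and `loadPermissions` stands in for the database query. The `perm:<userId>` key format is also an assumption.
+
+```typescript
+// Hypothetical minimal interface standing in for a Redis client (not a real library API)
+interface RemoteCache {
+  get(key: string): Promise<string | null>;
+  set(key: string, value: string, ttlMs: number): Promise<void>;
+}
+
+const PERMISSION_TTL_MS = 5 * 60 * 1000;
+
+class PermissionCache {
+  private readonly l1 = new Map<string, { value: string[]; expiresAt: number }>();
+
+  constructor(
+    private readonly l2: RemoteCache,
+    private readonly loadPermissions: (userId: string) => Promise<string[]>, // database query
+    private readonly ttlMs = PERMISSION_TTL_MS,
+    private readonly now: () => number = Date.now,
+  ) {}
+
+  async get(userId: string): Promise<string[]> {
+    const key = `perm:${userId}`;
+    const local = this.l1.get(key);
+    if (local && local.expiresAt > this.now()) return local.value; // L1 hit
+
+    const shared = await this.l2.get(key); // L2 lookup
+    const value = shared !== null ? (JSON.parse(shared) as string[]) : await this.loadPermissions(userId);
+    if (shared === null) await this.l2.set(key, JSON.stringify(value), this.ttlMs); // fill L2 on miss
+
+    this.l1.set(key, { value, expiresAt: this.now() + this.ttlMs }); // fill L1
+    return value;
+  }
+}
+
+// In-memory RemoteCache with expiry, for local testing only
+class InMemoryRemoteCache implements RemoteCache {
+  private readonly store = new Map<string, { value: string; expiresAt: number }>();
+  constructor(private readonly now: () => number = Date.now) {}
+  async get(key: string): Promise<string | null> {
+    const entry = this.store.get(key);
+    return entry && entry.expiresAt > this.now() ? entry.value : null;
+  }
+  async set(key: string, value: string, ttlMs: number): Promise<void> {
+    this.store.set(key, { value, expiresAt: this.now() + ttlMs });
+  }
+}
+```
+
+Each instance keeps its own L1 copy. A permission change can therefore take up to one L1 TTL to reach every instance unless the cache is explicitly invalidated, and a shorter L1 TTL narrows that window.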
+
+## Deployment
+
+```mermaid
+graph TD
+    subgraph "Security Layer"
+        LB[Load Balancer<br/>TLS Termination]
+        WAF[WAF / Firewall<br/>Rate Limiting<br/>DDoS Protection]
+    end
+    
+    subgraph "IAM Service Layer"
+        IAM1[IAM Service Pod 1<br/>Stateless]
+        IAM2[IAM Service Pod 2<br/>Stateless]
+        IAM3[IAM Service Pod 3<br/>Stateless]
+    end
+    
+    subgraph "Data Layer"
+        DB[(Neon PostgreSQL<br/>Encrypted at Rest)]
+        Cache[(Redis Cluster<br/>TLS Enabled)]
+        Vault[Secrets Manager<br/>K8s Secrets]
+    end
+    
+    subgraph "Security Monitoring"
+        SIEM[SIEM / Security Monitoring]
+        Alerts[Alerting System]
+    end
+    
+    Client[Clients] --> LB
+    LB --> WAF
+    WAF --> IAM1
+    WAF --> IAM2
+    WAF --> IAM3
+    
+    IAM1 --> DB
+    IAM1 --> Cache
+    IAM1 --> Vault
+    
+    IAM2 --> DB
+    IAM2 --> Cache
+    IAM2 --> Vault
+    
+    IAM3 --> DB
+    IAM3 --> Cache
+    IAM3 --> Vault
+    
+    IAM1 -.->|Security Events| SIEM
+    IAM2 -.->|Security Events| SIEM
+    IAM3 -.->|Security Events| SIEM
+    
+    SIEM -.->|Alerts| Alerts
+    
+    style LB fill:#d4edda
+    style WAF fill:#fff3cd
+    style DB fill:#f0e1ff
+    style Cache fill:#fff4e1
+    style Vault fill:#f8d7da
+    style SIEM fill:#e1f5ff
+```
+
+**Deployment Strategy**:
+
+**Security Deployment**:
+- **TLS 1.2+ Enforcement**: All connections require TLS
+- **Network Policies (K8s)**: Deny all by default, whitelist specific services
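+- **Pod Security**: Pod Security Admission with the `restricted` profile (PodSecurityPolicy was removed in Kubernetes 1.25), plus `securityContext` settings for a non-root user, read-only root filesystem, and no privilege escalation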
+- **Secrets Management**: Kubernetes secrets with encryption at rest
+- **Image Scanning**: Trivy/Clair scan before deployment
+- **RBAC (K8s)**: Least privilege for service accounts
+
+**Resource Allocation**:
+| Component | CPU | Memory | Replicas |
+|-----------|-----|--------|----------|
+| **IAM Service** | 500m | 1GB | 3-10 (HPA) |
+| **Redis** | 1 core | 2GB | 3 primaries + 3 replicas |
+
+**Security Configuration**:
+```yaml
+# K8s Network Policy
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: iam-service-policy
+spec:
+  podSelector:
+    matchLabels:
+      app: iam-service
+  policyTypes:
+  - Ingress
+  - Egress
+  ingress:
+  - from:
+    - podSelector:
+        matchLabels:
+          app: api-gateway
+    ports:
+    - protocol: TCP
+      port: 5000
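+  egress:
+  # Allow DNS: once Egress is listed in policyTypes, all other egress is denied
+  - to:
+    - namespaceSelector: {}
+      podSelector:
+        matchLabels:
+          k8s-app: kube-dns
+    ports:
+    - protocol: UDP
+      port: 53
+    - protocol: TCP
+      port: 53
+  # In-cluster PostgreSQL only. Redis (6379) and external endpoints such as Neon
+  # need their own rules (ipBlock for anything outside the cluster)
+  - to:
+    - podSelector:
+        matchLabels:
+          app: postgresql
+    ports:
+    - protocol: TCP
+      port: 5432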
+```
--- a/docs/en/guides/deployment.md
+++ b/docs/en/guides/deployment.md
@@ -1,106 +1,234 @@
 # Deployment Guide

+> **Note**: This guide covers deployment strategies for GoodGo Microservices Platform across Local, Staging, and Production environments using Kubernetes and Neon PostgreSQL.
+
+## Table of Contents
+
+1. [Deployment Architecture](#deployment-architecture)
+2. [Prerequisites](#prerequisites)
+3. [Database Setup (Neon)](#database-setup-neon)
+4. [Local Deployment](#local-deployment)
+5. [CI/CD Pipeline](#cicd-pipeline)
+6. [Staging Deployment](#staging-deployment)
+7. [Production Deployment](#production-deployment)
+8. [Scaling & Resilience](#scaling--resilience)
+9. [Rollback Procedures](#rollback-procedures)
+
+---
+
+## Deployment Architecture
+
+```mermaid
+graph TD
+    subgraph "CI/CD Pipeline (GitHub Actions)"
+        Code[Code Push] --> Test[Run Tests]
+        Test --> Build[Build Docker Image]
+        Build --> Registry[Push to Registry]
+        Registry --> Deploy[Deploy to K8s]
+    end
+    
+    subgraph "Infrastructure (Kubernetes)"
+        Ingress[Traefik Ingress] --> Service[K8s Service]
+        Service --> Pods[Application Pods]
+        Pods --> Secrets[K8s Secrets]
+    end
+    
+    subgraph "External Services"
+        Pods --> Neon[(Neon PostgreSQL)]
+        Pods --> Redis[(Redis Cloud)]
+    end
+    
+    Deploy --> Ingress
+```
+
+---
+
+## Prerequisites
+
+Before deploying, ensure you have:
+
+*   **Tools**: `kubectl`, `helm`, `docker` installed.
+*   **Access**:
+    *   Kubernetes Cluster (EKS/GKE/DigitalOcean).
+    *   Container Registry (GHCR/DockerHub).
+    *   Neon Console Account.
+*   **Configuration**:
+    *   `KUBECONFIG` file set up.
+    *   GitHub Secrets configured for CI/CD.
+
+---
+
 ## Database Setup (Neon)

-All environments use **Neon PostgreSQL**. Setup once before deployment:
+We use **Neon Serverless PostgreSQL** for all environments to leverage branching and auto-scaling.

-1. Create Neon project at https://neon.tech
-2. Create branches: `main` (dev), `staging`, `production`
-3. Get connection strings for each branch
-4. Configure in environment variables (see below)
+1.  **Create Project**: Log in to [neon.tech](https://neon.tech) and create a project `goodgo-platform`.
+2.  **Create Branches**:
+    *   `main` -> For Development/Local.
+    *   `staging` -> For Staging environment.
+    *   `production` -> For Production environment (Protected).
+3.  **Get Connection Strings**:
+    *   Note the connection string for each branch (Pooler mode recommended).

-See [Neon Setup Guide](../../infra/databases/neon/README.md) for details.
+---

 ## Local Deployment

-```bash
-# Setup Neon database URL
-cp deployments/local/env.local.example deployments/local/.env.local
-# Edit .env.local and add your Neon DATABASE_URL
+For local development, we use Docker Compose.

-# Start services (no PostgreSQL container needed)
+```bash
+# 1. Setup Environment
+cp deployments/local/env.local.example deployments/local/.env.local
+# Edit .env.local with Neon `main` branch connection string
+
+# 2. Start Infrastructure (Redis, Traefik, etc.)
 cd deployments/local
 docker-compose up -d
+
+# 3. Start Services (Hot-reload)
+pnpm dev
 ```

+---
+
+## CI/CD Pipeline
+
+We use GitHub Actions for automated deployments.
+
+| Workflow | Trigger | Description |
+| :--- | :--- | :--- |
+| `ci-check.yml` | Pull Request | Runs unit tests, linting, and build check. |
+| `deploy-staging.yml` | Push to `develop` | Build image -> Deploy to Staging Namespace. |
+| `deploy-prod.yml` | Release / Tag | Build image -> Deploy to Production Namespace. |
+
+### Secrets Configuration (GitHub)
+
+Set these secrets in your repository settings:
+
+*   `NEON_DATABASE_URL_STAGING`: Connection string for staging branch.
+*   `NEON_DATABASE_URL_PRODUCTION`: Connection string for production branch.
+*   `KUBECONFIG_STAGING`: Base64-encoded kubeconfig for staging.
+*   `KUBECONFIG_PRODUCTION`: Base64-encoded kubeconfig for production.
+*   `DOCKER_REGISTRY_TOKEN`: For pushing images.
+
+---
+
 ## Staging Deployment

-### Prerequisites
- Kubernetes cluster access
- kubectl configured
- KUBECONFIG set
- Neon staging branch created
- GitHub Secrets configured:
-  - `NEON_DATABASE_URL_STAGING`
-  - `KUBECONFIG_STAGING`
+Staging mirrors production but uses cost-effective resources.

-### Setup Secrets
+### Manual Deployment

 ```bash
-# Create Kubernetes secret
+# 1. Create Secrets
 kubectl create secret generic iam-service-secrets \
-  --from-literal=database-url='postgresql://user:pass@ep-xxx.region.neon.tech/dbname?sslmode=require&pgbouncer=true' \
-  --from-literal=jwt-secret='your-staging-jwt-secret' \
-  --from-literal=jwt-refresh-secret='your-staging-refresh-secret' \
+  --from-literal=database-url='<STAGING_NEON_URL>' \
+  --from-literal=jwt-secret='<RANDOM_SECRET>' \
+  --from-literal=jwt-refresh-secret='<RANDOM_SECRET>' \
  -n staging
+
+# 2. Apply Manifests
+kubectl apply -f deployments/staging/kubernetes/ -n staging
+
+# 3. Verify
+kubectl get pods -n staging
 ```
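+
+Replace the `<RANDOM_SECRET>` placeholders with high-entropy values. Below is a small sketch using Node's built-in `crypto` module. The 32-byte (256-bit) length is an assumption, not a documented project requirement.
+
+```typescript
+import { randomBytes } from 'node:crypto';
+
+// 32 random bytes encoded as unpadded base64url: 43 characters, safe inside single quotes in a shell
+function generateSecret(bytes = 32): string {
+  return randomBytes(bytes).toString('base64url');
+}
+
+console.log(generateSecret());
+```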

-### Deploy
+### Via CI/CD

-```bash
-./scripts/deploy/deploy-staging.sh
-```
+Push code to `develop` branch. The action will:
+1.  Run tests.
+2.  Run `prisma migrate deploy` against Staging DB.
+3.  Update Kubernetes deployment image.

-Or manually:
-```bash
-kubectl apply -f deployments/staging/kubernetes/
-```
-
-**Note**: Migrations run automatically in CI/CD before deployment.
+---

 ## Production Deployment

-### Prerequisites
- Production Kubernetes cluster
- kubectl configured with production context
- Neon production branch created
- GitHub Secrets configured:
-  - `NEON_DATABASE_URL_PRODUCTION`
-  - `KUBECONFIG_PRODUCTION`
+Production uses high-availability configurations.

-### Setup Secrets
+### 1. Database Preparation
+
+*   Ensure Production branch in Neon is **protected**.
+*   Configure **Point-in-Time Recovery (PITR)** window (e.g., 7 days).
+
+### 2. Manual Deployment Steps

 ```bash
-# Create Kubernetes secret
+# 1. Create Namespace
+kubectl create namespace production
+
+# 2. Create secrets (Sealed Secrets recommended; a plain Secret is shown here)
 kubectl create secret generic iam-service-secrets \
-  --from-literal=database-url='postgresql://user:pass@ep-xxx.region.neon.tech/dbname?sslmode=require&pgbouncer=true' \
-  --from-literal=jwt-secret='your-production-jwt-secret' \
-  --from-literal=jwt-refresh-secret='your-production-refresh-secret' \
+  --from-literal=database-url='<PROD_NEON_URL>' \
+  --from-literal=jwt-secret='<SECURE_RANDOM_SECRET>' \
+  --from-literal=jwt-refresh-secret='<SECURE_RANDOM_SECRET>' \
  -n production
+
+# 3. Deploy
+kubectl apply -f deployments/production/kubernetes/ -n production
 ```

-### Deploy
+### 3. Verification

 ```bash
-./scripts/deploy/deploy-prod.sh
+# Check Rollout Status
+kubectl rollout status deployment/iam-service -n production
+
+# Check Logs
+kubectl logs -l app=iam-service -n production
 ```

-**Note**: Migrations run automatically in CI/CD before deployment (with approval).
+---

-### Rollback
+## Scaling & Resilience
+
+### Horizontal Pod Autoscaler (HPA)
+
+We use HPA to scale pods automatically based on resource metrics. The example below scales on CPU; a memory target can be added as a second `Resource` metric.
+
+```yaml
+# Example HPA Config
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: iam-service-hpa
+spec:
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: iam-service
+  minReplicas: 2
+  maxReplicas: 10
+  metrics:
+  - type: Resource
+    resource:
+      name: cpu
+      target:
+        type: Utilization
+        averageUtilization: 70
+```
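+
+`averageUtilization: 70` turns into a replica count through the core HPA rule from the Kubernetes documentation: `desired = ceil(current × currentUtilization / targetUtilization)`. The HPA skips scaling when the ratio is within a tolerance (0.1 by default) and clamps the result to `minReplicas`/`maxReplicas`. The sketch below is a simplification that ignores stabilization windows, missing metrics, and not-ready pods. The function and type names are hypothetical.
+
+```typescript
+// Simplified model of the HPA decision; names are illustrative only
+interface HpaSpec {
+  minReplicas: number;
+  maxReplicas: number;
+  targetUtilization: number; // percent, e.g. 70
+}
+
+const TOLERANCE = 0.1; // Kubernetes default (--horizontal-pod-autoscaler-tolerance)
+
+function desiredReplicas(spec: HpaSpec, currentReplicas: number, currentUtilization: number): number {
+  const ratio = currentUtilization / spec.targetUtilization;
+  if (Math.abs(ratio - 1) <= TOLERANCE) return currentReplicas; // close enough: no change
+  const raw = Math.ceil(currentReplicas * ratio);
+  return Math.min(spec.maxReplicas, Math.max(spec.minReplicas, raw));
+}
+
+const spec: HpaSpec = { minReplicas: 2, maxReplicas: 10, targetUtilization: 70 };
+console.log(desiredReplicas(spec, 3, 140)); // 6
+console.log(desiredReplicas(spec, 3, 75)); // 3 (within the 10% tolerance)
+console.log(desiredReplicas(spec, 8, 280)); // 10 (capped at maxReplicas)
+```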
+
+### Zero-Downtime Deployment
+
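+Kubernetes handles this with the `RollingUpdate` deployment strategy:
+*   **maxSurge**: 25% (new pods are added before old ones are removed).
+*   **maxUnavailable**: 0 (no old pod is removed until a new one is ready).
+
+This relies on readiness probes: without one, Kubernetes treats a pod as ready as soon as its containers start.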
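+
+When these values are percentages, Kubernetes rounds `maxSurge` up and `maxUnavailable` down to absolute pod counts, so the effect depends on the replica count. Below is a small sketch of that arithmetic; the function name is hypothetical.
+
+```typescript
+// Hypothetical helper reproducing the Deployment rounding rules for percentage values
+interface RollingUpdateBounds {
+  maxPods: number; // upper bound on pods during the rollout
+  minAvailable: number; // pods that must stay available
+}
+
+function rollingUpdateBounds(replicas: number, maxSurgePct: number, maxUnavailablePct: number): RollingUpdateBounds {
+  const surge = Math.ceil((replicas * maxSurgePct) / 100); // maxSurge rounds up
+  const unavailable = Math.floor((replicas * maxUnavailablePct) / 100); // maxUnavailable rounds down
+  return { maxPods: replicas + surge, minAvailable: replicas - unavailable };
+}
+
+console.log(rollingUpdateBounds(3, 25, 0)); // { maxPods: 4, minAvailable: 3 }
+```
+
+With 3 replicas, 25% surge rounds up to one extra pod, so the rollout replaces pods one at a time while all 3 stay available.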
+
+---
+
+## Rollback Procedures
+
+If a deployment fails or introduces a critical bug:
+
+### Kubernetes Rollback

 ```bash
+# Undo last deployment
 kubectl rollout undo deployment/iam-service -n production
+
+# Undo to specific revision
+kubectl rollout undo deployment/iam-service -n production --to-revision=2
 ```

-## Health Checks
+### Database Rollback

- Liveness: `GET /health/live`
- Readiness: `GET /health/ready`
- Health: `GET /health`
-
-## Monitoring
-
- Prometheus: http://prometheus:9090
- Grafana: http://grafana:3000
- Traefik Dashboard: http://traefik:8080
+Since Neon supports branching and point-in-time restore (PITR):
+1.  Open the Neon Console.
+2.  Restore the `production` branch to a timestamp before the bad migration.
+
+**Warning**: Transactions committed after the chosen timestamp will not be in the restored branch, which can mean data loss. Use with caution.
--- a/docs/en/guides/development.md
+++ b/docs/en/guides/development.md
@@ -1,111 +1,211 @@
 # Development Guide

+> **Note**: This guide provides comprehensive standards and workflows for contributing to the GoodGo Microservices Platform.
+
+## Table of Contents
+
+1. [Project Structure](#project-structure)
+2. [Code Standards](#code-standards)
+3. [Git Workflow](#git-workflow)
+4. [Backend Development](#backend-development)
+5. [Testing Strategy](#testing-strategy)
+6. [Database Workflow](#database-workflow)
+7. [Kubernetes Deployment](#kubernetes-deployment)
+
+---
+
 ## Project Structure

+We follow a strict monorepo structure managed by PNPM Workspaces.
+
 ```
-├── apps/              # Frontend applications
-├── services/          # Backend microservices
-├── packages/          # Shared libraries
-├── infra/             # Infrastructure configs
-├── deployments/       # Deployment configs
-├── scripts/           # Automation scripts
-└── docs/              # Documentation
+Base/
+├── apps/                 # Frontend applications
+│   ├── web-client/       # Next.js 14+ (App Router)
+│   └── mobile-client/    # Flutter
+├── services/             # Backend microservices
+│   ├── _template/        # Template for new services
+│   ├── iam-service/      # Identity & Access Management
+│   └── ...
+├── packages/             # Shared libraries
+│   ├── logger/           # Structured logging (Winston)
+│   ├── types/            # Shared DTOs & Interfaces
+│   ├── http-client/      # Internal Service Client
+│   └── tracing/          # OpenTelemetry configuration
+├── infra/                # Infrastructure-as-Code
+│   ├── traefik/          # API Gateway
+│   └── databases/        # Database setup scripts
+└── docs/                 # Documentation (EN & VI)
 ```

-## Development Workflow
+---

-### 1. Create a Feature Branch
+## Code Standards

+### Naming Conventions
+
+*   **Files**: `kebab-case.ts` (e.g., `user.controller.ts`, `app.config.ts`)
+*   **Classes**: `PascalCase` (e.g., `UserController`, `AuthService`)
+*   **Functions/Variables**: `camelCase` (e.g., `getUserById`, `isValid`)
+*   **Constants**: `UPPER_SNAKE_CASE` (e.g., `MAX_RETRIES`, `DEFAULT_TIMEOUT`)
+*   **Interfaces**: `PascalCase` (e.g., `User`, `CreateUserDto`) - *No 'I' prefix*
+
+### Bilingual Comments
+
+For core logic and public APIs, write comments in both English and Vietnamese, since both international and Vietnamese developers read the code.
+
+```typescript
+/**
+ * EN: Validates user credentials and returns a token
+ * VI: Xác thực thông tin người dùng và trả về token
+ */
+async login(dto: LoginDto): Promise<TokenResponse> { ... }
+```
+
+### TypeScript Usage
+
+*   **Strict Mode**: Enabled in `tsconfig.json`. No `any` allowed (use `unknown` if needed).
+*   **DTOs**: Use Zod for runtime validation and type inference.
+*   **Return Types**: Explicitly declare return types for all public methods.
+
+---
+
+## Git Workflow
+
+### Branching Strategy
+
+*   `main`: Production-ready code.
+*   `develop`: Integration branch for next release.
+*   `feature/xyz`: New features (branch off `develop`).
+*   `fix/xyz`: Bug fixes (branch off `develop`).
+*   `hotfix/xyz`: Critical fixes (branch off `main`).
+
+### Commit Messages
+
+We follow [Conventional Commits](https://www.conventionalcommits.org/):
+
+```
+feat(iam): add multi-factor authentication
+fix(db): correct unique constraint on email
+docs(guide): update development setup
+style: format code with prettier
+refactor: simplify auth middleware
+test: add unit tests for user service
+chore: update dependencies
+```
+
+---
+
+## Backend Development
+
+### Creating a New API Endpoint
+
+1.  **Define DTO** (`modules/user/user.dto.ts`):
+    ```typescript
+    import { z } from 'zod';
+
+    export const CreateUserDto = z.object({
+      email: z.string().email(),
+      name: z.string().min(2),
+    });
+    export type CreateUserDto = z.infer<typeof CreateUserDto>;
+    ```
+
+2.  **Create Service Method** (`modules/user/user.service.ts`):
+    *   Implement business logic.
+    *   Use `BaseRepository`.
+    *   Throw `HttpError` (e.g., `NotFound`, `BadRequest`).
+
+3.  **Create Controller** (`modules/user/user.controller.ts`):
+    *   Parse body with DTO: `const dto = CreateUserDto.parse(req.body)`.
+    *   Call service.
+    *   Return success response: `res.json({ success: true, data: result })`.
+
+4.  **Register Route** (`modules/user/index.ts`):
+    *   Add to Express router with middlewares.
+
+### Error Handling
+
+Always use the custom error classes from `core/errors`:
+
+```typescript
+import { NotFoundError, ConflictError } from '../../core/errors';
+
+if (!user) {
+  throw new NotFoundError('User not found');
+}
+```
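+
+The contents of `core/errors` are not shown here. The rough sketch below shows how such classes could map to HTTP responses. Several names are assumptions: the `HttpError` base class, the `statusCode`/`code` fields, and the `{ success: false, error }` shape, which mirrors the `{ success: true, data }` success format above.
+
+```typescript
+// Hypothetical shape of core/errors; the real module may differ
+class HttpError extends Error {
+  constructor(
+    public readonly statusCode: number,
+    public readonly code: string,
+    message: string,
+  ) {
+    super(message);
+    this.name = new.target.name;
+    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working when compiled to ES5
+  }
+}
+
+class NotFoundError extends HttpError {
+  constructor(message = 'Resource not found') {
+    super(404, 'NOT_FOUND', message);
+  }
+}
+
+class ConflictError extends HttpError {
+  constructor(message = 'Resource already exists') {
+    super(409, 'CONFLICT', message);
+  }
+}
+
+interface ErrorResponse {
+  status: number;
+  body: { success: false; error: { code: string; message: string } };
+}
+
+// Map any thrown value to a response; unknown errors become a generic 500 that leaks no internals
+function toErrorResponse(err: unknown): ErrorResponse {
+  if (err instanceof HttpError) {
+    return { status: err.statusCode, body: { success: false, error: { code: err.code, message: err.message } } };
+  }
+  return { status: 500, body: { success: false, error: { code: 'INTERNAL_ERROR', message: 'Internal server error' } } };
+}
+```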
+
+---
+
+## Testing Strategy
+
+### Unit Tests (`*.test.ts`)
+
+*   **Scope**: Individual classes/functions.
+*   **Mocking**: Mock all external dependencies (DB, other services) using `jest-mock-extended`.
+*   **Location**: Co-located with source files.
+*   **Run**: `pnpm test`
+
+### E2E Tests (`tests/**/*.e2e.ts`)
+
+*   **Scope**: Full API flows (Controller -> Service -> DB).
+*   **Database**: Use a separate test database (Dockerized).
+*   **Run**: `pnpm test:e2e`
+
+### Linting & Formatting
+
+*   **Lint**: `pnpm lint` (ESLint)
+*   **Format**: `pnpm format` (Prettier)
+*   **Typecheck**: `pnpm typecheck` (TSC)
+
+---
+
+## Database Workflow
+
+We use **Prisma** with **Neon PostgreSQL**.
+
+### Migrations
+
+1.  Modify `prisma/schema.prisma`.
+2.  Create migration (Dev):
+    ```bash
+    ./scripts/db/migrate.sh iam-service dev --name add_user_profile
+    ```
+3.  Apply to Production (CI/CD):
+    ```bash
+    ./scripts/db/migrate.sh iam-service deploy
+    ```
+
+### Seed Data
+
+Populate database with initial data:
 ```bash
-git checkout -b feature/my-feature
+./scripts/db/seed.sh iam-service
 ```

-### 2. Make Changes
-
- Write code following TypeScript strict mode
- Add tests for new functionality
- Update documentation if needed
-
-### 3. Run Tests Locally
+### Visualizing Data

+Use Prisma Studio:
 ```bash
-# All tests
-pnpm test
-
-# Specific service
-pnpm --filter @goodgo/iam-service test
+pnpm --filter @goodgo/iam-service prisma studio
 ```

-### 4. Lint and Format
-
-```bash
-pnpm lint
-pnpm format
-```
-
-### 5. Create Pull Request
-
- Push your branch
- Create PR targeting `develop`
- CI/CD will run automatically
-
-## Adding a New Service
-
-1. Use the template:
-   ```bash
-   ./scripts/utils/create-service.sh my-new-service
-   ```
-
-2. Update service configuration
-3. Implement business logic
-4. Add tests
-5. Update documentation
-
-## Adding a New Package
-
-1. Create package in `packages/new-package`
-2. Add to workspace in `pnpm-workspace.yaml`
-3. Export from `index.ts`
-4. Add tests
-5. Document usage
-
-## Database Migrations
-
-## Database Migrations
-
-```bash
-# Create migration (dev)
-./scripts/db/migrate.sh iam-service dev
-
-# Apply migrations (production)
-./scripts/db/migrate.sh iam-service deploy
-```
+---

 ## Kubernetes Deployment

-### Local Kubernetes (Docker Desktop)
+For local Kubernetes testing (Docker Desktop / Minikube):

 ```bash
-# Enable Kubernetes in Docker Desktop
-# Settings → Kubernetes → Enable Kubernetes
+# 1. Build images
+docker build -t goodgo/iam-service:latest -f services/iam-service/Dockerfile .

-# Deploy service
+# 2. Deploy
 cd deployments/local/kubernetes
 ./deploy.sh

-# Verify deployment
+# 3. Verify
 kubectl get pods -n iam-local
-kubectl logs -f -n iam-local -l app=iam-service
-
-# Port forward for testing
-kubectl port-forward svc/iam-service 5002:80 -n iam-local
-curl http://localhost:5002/health/live
+kubectl logs -f -l app=iam-service -n iam-local
 ```

-**See detailed guide**: [Kubernetes Local Deployment Guide](./kubernetes-local.md)
-
-## Debugging
-
- Use logger from `@goodgo/logger`
- Check Traefik logs: `docker logs traefik-local`
- Check service logs: `./scripts/dev/logs.sh iam-service`
+See [Kubernetes Guide](./kubernetes-local.md) for detailed setup.
--- a/docs/en/guides/getting-started.md
+++ b/docs/en/guides/getting-started.md
@@ -1,81 +1,214 @@
 # Getting Started

+> **Note**: This guide assumes you are setting up the project on macOS or Linux. Windows users should use WSL2.
+
+## Table of Contents
+
+1. [Prerequisites](#prerequisites)
+2. [Architecture Overview](#architecture-overview)
+3. [Project Structure](#project-structure)
+4. [Installation & Setup](#installation--setup)
+5. [Development Workflow](#development-workflow)
+6. [Common Commands](#common-commands)
+7. [Troubleshooting](#troubleshooting)
+
 ## Prerequisites

- Node.js >= 20.0.0
- PNPM >= 8.0.0
- Docker & Docker Compose
- Git
- Neon account (https://neon.tech) - for database
+Before starting, ensure you have the following installed:

-## Initial Setup
+*   **Node.js**: v20.0.0 or higher
+    ```bash
+    node -v
+    # v20.10.0
+    ```
+*   **PNPM**: v8.0.0 or higher (we use pnpm workspaces)
+    ```bash
+    pnpm -v
+    # 8.12.0
+    ```
+*   **Docker & Docker Compose**: For local infrastructure
+    ```bash
+    docker -v
+    # Docker version 24.0.0
+    ```
+*   **Git**: For version control
+*   **Neon Account**: Serverless PostgreSQL (https://neon.tech)

-1. **Clone the repository**
-   ```bash
-   git clone <repository-url>
-   cd Base
-   ```
+## Architecture Overview

-2. **Setup Neon Database**
-   ```bash
-   # Run setup script
-   ./scripts/db/setup-neon.sh
-   
-   # Or manually:
-   # 1. Create Neon project at https://neon.tech
-   # 2. Create branches: main (dev), staging, production
-   # 3. Get connection strings
-   # 4. Update deployments/local/.env.local
-   ```
-   
-   See [Neon Setup Guide](../../infra/databases/neon/README.md) for details.
+GoodGo Platform uses a microservices architecture with a shared infrastructure layer.

-3. **Initialize the project**
-   ```bash
-   ./scripts/setup/init-project.sh
-   ```
+```mermaid
+graph TD
+    Client[Client Apps] --> Traefik[Traefik Gateway]
+    
+    Traefik --> IAM[IAM Service]
+    Traefik --> Template[Template Service]
+    
+    IAM --> DB[(Neon PostgreSQL)]
+    IAM --> Redis[(Redis Cache)]
+    IAM --> Kafka[Kafka Events]
+    
+    style Traefik fill:#e1f5ff
+    style DB fill:#f0e1ff
+    style Redis fill:#fff4e1
+```

-4. **Start infrastructure** (Redis, Traefik - no PostgreSQL needed)
-   ```bash
-   cd deployments/local
-   docker-compose up -d
-   cd ../..
-   ```
+## Project Structure

-5. **Run database migrations**
-   ```bash
-   ./scripts/db/migrate.sh iam-service dev
-   ```
+The repository follows a monorepo structure:

-6. **Seed the database**
-   ```bash
-   ./scripts/db/seed.sh iam-service
-   ```
+```
+Base/
+├── apps/                 # Frontend applications
+│   ├── web-client/       # Next.js web application
+│   └── mobile-client/    # Flutter mobile application
+├── services/             # Backend microservices
+│   ├── iam-service/      # Authentication & Authorization
+│   └── _template/        # Template for new services
+├── packages/             # Shared libraries
+│   ├── logger/           # Structured logging
+│   ├── types/            # Shared TypeScript types
+│   └── http-client/      # Internal HTTP client
+├── infra/                # Infrastructure configuration
+│   ├── traefik/          # API Gateway config
+│   └── databases/        # Database setup scripts
+├── deployments/          # Deployment configurations
+│   ├── local/            # Docker Compose for dev
+│   ├── staging/          # Staging Kubernetes manifests
+│   └── production/       # Production Kubernetes manifests
+└── docs/                 # Documentation
+```

-7. **Start all services**
-   ```bash
-   ./scripts/dev/start-all.sh
-   ```
+## Installation & Setup

-## Access Points
+### 1. Clone the Repository

- **API Gateway**: http://localhost/api/v1
- **Auth Service**: http://localhost:5001
- **Web Admin**: http://admin.localhost or http://localhost:3000
- **Web Client**: http://localhost or http://localhost:3001
- **Traefik Dashboard**: http://localhost:8080
+```bash
+git clone <repository-url>
+cd Base
+```

-## Database
+### 2. Configure Environment

-This project uses **Neon PostgreSQL** for all environments:
- **Development**: Neon main branch
- **Staging**: Neon staging branch
- **Production**: Neon production branch
+Every service, as well as the local infrastructure, needs environment variables; templates are provided for each.

-No local PostgreSQL needed! See [Neon Setup](../../infra/databases/neon/README.md) for details.
+```bash
+# Initialize project setup (copies .env.example to .env)
+./scripts/setup/init-project.sh
+```
+
+### 3. Set Up Neon Database
+
+We use Neon (Serverless PostgreSQL) for all environments (Dev, Staging, Prod).
+
+1.  Create a project at [neon.tech](https://neon.tech).
+2.  Create a branch named `dev` (or use `main`).
+3.  Get the Connection String from the Neon dashboard.
+4.  Update `deployments/local/.env.local`:
+
+```env
+DATABASE_URL="postgres://user:pass@ep-xyz.region.neon.tech/neondb?sslmode=require"
+```
+
+### 4. Start Infrastructure
+
+Start the supporting infrastructure (Redis, Traefik, Observability) using Docker Compose.
+
+```bash
+cd deployments/local
+docker-compose up -d
+# Expected output: Containers for traefik, redis, kafka created
+```
+
+### 5. Install Dependencies
+
+```bash
+pnpm install
+```
+
+### 6. Set Up Database Schema
+
+Apply the Prisma migrations to your Neon database.
+
+```bash
+# Run migrations for IAM service
+pnpm --filter @goodgo/iam-service prisma migrate dev
+```
+
+### 7. Start Services
+
+Start all backend services in development mode.
+
+```bash
+pnpm dev
+# or start specific service
+pnpm --filter @goodgo/iam-service dev
+```
+
+## Development Workflow
+
+### Creating a New Service
+
+1.  Copy the template:
+    ```bash
+    cp -r services/_template services/my-new-service
+    ```
+2.  Update `package.json` name.
+3.  Add logic in `src/modules/`.
+4.  Register in `deployments/local/docker-compose.yml`.
+
+### Making Changes
+
+1.  Create a new branch: `feature/my-feature`.
+2.  Implement changes.
+3.  Run tests: `pnpm test`.
+4.  Commit with conventional commits: `feat(iam): add login endpoint`.
+
+## Common Commands
+
+| Command | Description |
+| :--- | :--- |
+| `pnpm install` | Install all dependencies |
+| `pnpm dev` | Start all services in dev mode |
+| `pnpm build` | Build all packages and services |
+| `pnpm test` | Run unit tests |
+| `pnpm lint` | Lint code |
+| `docker-compose up -d` | Start local infra |
+| `docker-compose down` | Stop local infra |
+
+## Troubleshooting
+
+### Port Conflicts
+
+**Error**: `Bind for 0.0.0.0:80 failed: port is already allocated`
+
+**Solution**: Check what's using port 80 (likely another web server) and stop it, or change Traefik ports in `docker-compose.yml`.
+
+```bash
+lsof -i :80
+kill -9 <PID>
+```
+
+### Database Connection Failed
+
+**Error**: `P1001: Can't reach database server`
+
+**Solution**:
+1.  Check your internet connection (Neon is cloud-based).
+2.  Verify `DATABASE_URL` in `deployments/local/.env.local`.
+3.  Ensure your IP is allowed in the Neon dashboard settings.
+
+### Service Not Found in Gateway
+
+**Error**: `404 Not Found` from the gateway at `http://localhost/api/v1`
+
+**Solution**:
+1.  Check if service is running.
+2.  Check Traefik dashboard at http://localhost:8080.
+3.  Verify `PathPrefix` labels in `docker-compose.yml`.

 ## Next Steps

- Read [Development Guide](development.md)
- Check [API Documentation](../api/openapi/)
- Review [Architecture Overview](../architecture/system-design.md)
+*   [Development Guide](development.md) - Deep dive into coding standards
+*   [API Documentation](../api/openapi/) - Explore the APIs
+*   [Architecture](../architecture/system-design.md) - Understand the system design
--- a/docs/en/guides/troubleshooting.md
+++ b/docs/en/guides/troubleshooting.md
@@ -1,57 +1,218 @@
 # Troubleshooting Guide

-## Common Issues
+> **Note**: This guide focuses on debugging the GoodGo Microservices Platform in a local development environment (Docker Compose).

-### Database Connection Failed
+## Table of Contents

-**Symptoms**: Service can't connect to database
+1. [General Diagnosis](#general-diagnosis)
+2. [Infrastructure Issues](#infrastructure-issues)
+   - [Database (Neon/PostgreSQL)](#database-neonpostgresql)
+   - [Redis](#redis)
+   - [Traefik Gateway](#traefik-gateway)
+3. [Service Issues](#service-issues)
+   - [Service Fails to Start](#service-fails-to-start)
+   - [Prisma/Database Errors](#prismadatabase-errors)
+   - [Authentication Errors](#authentication-errors)
+4. [Debugging Tools](#debugging-tools)
+5. [FAQ](#faq)

-**Solutions**:
-1. Check if PostgreSQL is running: `docker ps`
-2. Verify DATABASE_URL in .env
-3. Check network connectivity: `docker network ls`
-4. Review logs: `docker logs postgres-auth-local`
+---

-### Port Already in Use
+## General Diagnosis

-**Symptoms**: Service fails to start with port error
+When something goes wrong, follow this checklist:

-**Solutions**:
-1. Find process using port: `lsof -i :5001`
-2. Kill process or change PORT in .env
-3. Check docker-compose for port conflicts
+1.  **Check Service Status**:
+    ```bash
+    cd deployments/local
+    docker-compose ps
+    ```
+    *All services should be `Up` or `Running`.*

-### Prisma Client Not Generated
+2.  **Check Logs**:
+    ```bash
+    # View logs for a specific service
+    docker-compose logs -f <service-name>
+    
+    # View last 100 lines for all
+    docker-compose logs --tail=100
+    ```

-**Symptoms**: Import errors for Prisma Client
+3.  **Check Connectivity**:
+    *   Can you reach the Gateway? `curl http://localhost/health`
+    *   Can you reach the Dashboard? http://localhost:8080
+
+---
+
+## Infrastructure Issues
+
+### Database (Neon/PostgreSQL)
+
+**Problem**: `P1001: Can't reach database server` or `Connection timed out`
+
+*   **Cause 1**: Internet connectivity issues (Neon is cloud-based).
+*   **Cause 2**: Incorrect `DATABASE_URL` in `.env`.
+*   **Cause 3**: IP address blocked by Neon.
+
+**Solution**:
+1.  Verify internet connection: `ping neon.tech`.
+2.  Check `deployments/local/.env.local`. The URL should look like:
+    `postgres://user:pass@ep-xyz.aws.neon.tech/neondb?sslmode=require`
+3.  In the Neon Dashboard, open Settings and either allow all IPs or add your current IP.
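+
+To rule out Cause 2 before restarting anything, parse the URL with the built-in `URL` class. The checks below are assumptions based on the URL format above, and this helper is not part of the project's scripts. They cover:
+
+*   the scheme
+*   a `.neon.tech` host
+*   credentials
+*   the database name
+*   `sslmode=require`, since Neon requires TLS
+
+```typescript
+// Hypothetical debugging helper; returns a list of problems (empty means the URL looks plausible)
+function checkDatabaseUrl(raw: string | undefined): string[] {
+  if (!raw) return ['DATABASE_URL is not set'];
+  let url: URL;
+  try {
+    url = new URL(raw);
+  } catch {
+    return ['DATABASE_URL is not a valid URL'];
+  }
+  const problems: string[] = [];
+  if (url.protocol !== 'postgres:' && url.protocol !== 'postgresql:') problems.push(`unexpected scheme "${url.protocol}"`);
+  if (!url.hostname.endsWith('.neon.tech')) problems.push(`host "${url.hostname}" is not a Neon endpoint`);
+  if (!url.username || !url.password) problems.push('missing user or password');
+  if (url.pathname.length <= 1) problems.push('missing database name (e.g. /neondb)');
+  if (url.searchParams.get('sslmode') !== 'require') problems.push('sslmode=require is missing');
+  return problems;
+}
+
+console.log(checkDatabaseUrl(process.env.DATABASE_URL));
+```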
+
+**Problem**: `P1003: Database does not exist`
+
+*   **Reason**: You are connecting to the wrong database name.
+*   **Fix**: Check the database name at the end of your connection string (usually `/neondb`). If you use a custom database name, make sure it exists in Neon.
+
+### Redis
+
+**Problem**: `Redis connection refused` or `ECONNREFUSED`
+
+*   **Cause**: Redis container is not running or port mapping is wrong.
+
+**Solution**:
+1.  Check Redis status: `docker-compose ps redis`.
+2.  Restart Redis: `docker-compose restart redis`.
+3.  Check logs: `docker-compose logs redis`.
+4.  Connection string from services:
+    *   **Inside Docker**: `redis:6379`
+    *   **From Host**: `localhost:6379`
+
+### Traefik Gateway
+
+**Problem**: `404 Not Found` when accessing APIs (e.g., `http://localhost/api/v1/auth`)
+
+*   **Cause**: Service is down or Labels are misconfigured.
+
+**Solution**:
+1.  Check Traefik Dashboard at http://localhost:8080.
+    *   Look for "HTTP Routers" and "Services".
+    *   If your service is missing, check `docker-compose.yml` labels.
+2.  Verify `PathPrefix` in labels matches your request.
+    ```yaml
+    - "traefik.http.routers.iam.rule=PathPrefix(`/api/v1/auth`)"
+    ```
+3.  Check if the service passed health checks (Health status in dashboard).
+
+**Problem**: `Bad Gateway` or `Gateway Timeout`
+
+*   **Cause**: Service is crashing or taking too long to respond.
+*   **Fix**: Check the specific service logs (`docker-compose logs iam-service`).
+
+---
+
+## Service Issues
+
+### Service Fails to Start
+
+**Symptom**: Container status is `Exited (1)` or `Restarting`.
+
+**Debugging**:
+1.  Check logs immediately:
+    ```bash
+    docker-compose logs iam-service
+    ```
+2.  **Common Error**: `Config validation error`
+    *   **Fix**: Check environment variables. Using `./scripts/setup/init-project.sh` ensures `.env` exists.
+3.  **Common Error**: `PrismaClientInitializationError`
+    *   **Fix**: Database connectivity issue (see Infrastructure section).
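+
+When a container keeps restarting, it can help to scan its logs for the two errors above. The hypothetical helper `diagnose_startup_log` is a minimal sketch that recognizes only those two strings. Extend the patterns as needed.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: read service logs on stdin and print a hint for each known startup error.
+# Only the two patterns from this guide are recognized (hypothetical helper).
+
+diagnose_startup_log() {
+  local log found=0
+  log="$(cat)"
+  if grep -q 'Config validation error' <<<"$log"; then
+    echo "config: check environment variables; run ./scripts/setup/init-project.sh to make sure .env exists"
+    found=1
+  fi
+  if grep -q 'PrismaClientInitializationError' <<<"$log"; then
+    echo "database: check DATABASE_URL and connectivity to Neon"
+    found=1
+  fi
+  (( found )) || echo "no known startup error found; read the full log"
+}
+
+# Usage:
+#   docker-compose logs iam-service | diagnose_startup_log
+```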
+
+### Prisma/Database Errors
+
+**Error**: `P2025: Record to update not found`
+
+*   **Fix**: The record you tried to update does not exist. This is usually a logic error: check that the ID exists before updating.
+
+**Error**: `P2002: Unique constraint failed`
+
+*   **Fix**: You are trying to insert duplicate data (e.g., same email).
+
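+**Error**: `Migration failed`
+
+*   **Fix** (development only; these steps delete all data in the target database):
+    1.  Reset the database and re-apply all migrations: `pnpm prisma migrate reset`.
+    2.  If the migration history itself is broken, delete the `prisma/migrations` folder and create a fresh baseline: `pnpm prisma migrate dev --name init`.
+    3.  Regenerate the client: `pnpm prisma generate`.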
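+
+You can keep the Prisma codes from this guide in one lookup, so that a code grepped from the logs maps straight to a fix. The codes are Prisma's own. The helper name `prisma_hint` and its hint text are this guide's, written as a sketch. P1001 is Prisma's "can't reach database server" error, which matches the connectivity causes in the Infrastructure section.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: map Prisma error codes to this guide's suggested fixes (hypothetical helper).
+
+prisma_hint() {
+  case "$1" in
+    P1001) echo "Can't reach database server: check network, DATABASE_URL, Neon IP Allow list" ;;
+    P1003) echo "Database does not exist: check the database name at the end of DATABASE_URL" ;;
+    P2002) echo "Unique constraint failed: the row duplicates an existing unique value (e.g. email)" ;;
+    P2025) echo "Record not found: make sure the ID exists before updating" ;;
+    *)     echo "unknown code $1: see the Prisma error reference"; return 1 ;;
+  esac
+}
+
+# Usage: take the first Prisma code from the logs and look it up
+#   code=$(docker-compose logs iam-service | grep -oE 'P[0-9]{4}' | head -n 1)
+#   prisma_hint "$code"
+```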
+
+### Authentication Errors
+
+**Problem**: `401 Unauthorized` despite valid token
+
+*   **Cause 1**: Token expired.
+*   **Cause 2**: Public key mismatch (Service can't verify token signed by IAM).
+*   **Cause 3**: Clock skew (Docker time vs Host time).
+
+**Solution**:
+1.  Check server logs for JWT verification errors.
+2.  Restart the services so they reload the IAM public key.
+3.  Fix clock skew between Docker and the host by restarting Docker Desktop.
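+
+For Cause 1 and Cause 3, you can read the token's `exp` claim locally and compare it with the current time plus some leeway. This is a minimal sketch with a few limits:
+
+*   It decodes the payload without verifying the signature, so never use it for authorization.
+*   It assumes a `base64` that supports `-d`.
+*   The helper names are hypothetical.
+*   The 60-second default leeway is an arbitrary choice, not the services' actual setting.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: read a JWT's "exp" claim WITHOUT verifying its signature.
+# Debugging aid only. Assumes base64 supports -d and "exp" is numeric.
+
+jwt_payload() {
+  local p
+  p="$(cut -d. -f2 <<<"$1" | tr '_-' '/+')"  # base64url -> standard base64
+  while (( ${#p} % 4 )); do p+="="; done      # restore stripped padding
+  base64 -d <<<"$p"
+}
+
+jwt_exp() {
+  jwt_payload "$1" | grep -oE '"exp" *: *[0-9]+' | grep -oE '[0-9]+$'
+}
+
+# Succeeds (exit 0) if the token is expired at time $2 (default: now),
+# allowing $3 seconds of leeway for clock skew (default: 60, an assumption).
+jwt_expired() {
+  local exp now skew
+  exp="$(jwt_exp "$1")" || return 2
+  now="${2:-$(date +%s)}"
+  skew="${3:-60}"
+  (( now > exp + skew ))
+}
+
+# Usage:
+#   jwt_exp "$TOKEN"; date +%s    # compare expiry with the current time
+#   jwt_expired "$TOKEN" && echo "token expired (or clock skew over 60s)"
+```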
+
+---
+
+## Debugging Tools
+
+### 1. Accessing Container Shell
+
+To inspect files or run commands inside a running container:

-**Solutions**:
 ```bash
-cd services/iam-service
-pnpm prisma generate
+docker-compose exec iam-service sh
+# or use bash instead of sh, if the image includes it (Alpine-based images usually don't)
 ```

-### Build Failures
+### 2. Inspecting Database (via Prisma Studio)

-**Symptoms**: TypeScript or build errors
+Use Prisma Studio to view/edit data visually:

-**Solutions**:
-1. Clean build artifacts: `./scripts/utils/cleanup.sh`
-2. Reinstall dependencies: `pnpm install`
-3. Check TypeScript errors: `pnpm typecheck`
+```bash
+pnpm --filter @goodgo/iam-service prisma studio
+# Opens http://localhost:5555
+```

-### Traefik Not Routing
+### 3. Inspecting Redis

-**Symptoms**: 404 errors from Traefik
+```bash
+docker-compose exec redis redis-cli
+> PING
+PONG
+> KEYS *
+1) "user:123:session"
+```

-**Solutions**:
-1. Check Traefik dashboard: http://localhost:8080
-2. Verify service labels in docker-compose
-3. Check routes.yml configuration
-4. Review Traefik logs: `docker logs traefik-local`
+### 4. Direct API Testing

-## Getting Help
+Use `curl` or Postman.

-1. Check service logs: `./scripts/dev/logs.sh <service>`
-2. Review GitHub Issues
-3. Contact team lead
+```bash
+# Health Check
+curl -v http://localhost/api/v1/auth/health/live
+
+# Login (example)
+curl -X POST http://localhost/api/v1/auth/login \
+  -H "Content-Type: application/json" \
+  -d '{"email":"admin@example.com", "password":"password"}'
+```
+
+---
+
+## FAQ
+
+**Q: Why is my change not reflecting?**
+A: If you changed `.env` or `docker-compose.yml`, recreate the containers (a plain `docker-compose restart` does not apply these changes):
+```bash
+docker-compose down && docker-compose up -d
+```
+If you changed code, hot-reloading (nodemon) should pick it up. If it does not, restart the container: `docker-compose restart iam-service`.
+
+**Q: How do I reset everything?**
+A: Be careful: this deletes all data in the local Docker volumes (the cloud-hosted Neon database is not affected).
+```bash
+docker-compose down -v
+# -v removes volumes (Redis data, etc.)
+```
+
+**Q: My computer is slow when running everything.**
+A: Running every service in Docker uses a lot of RAM.
+1.  Stop services you are not using (e.g., `docker-compose stop future-service`).
+2.  Increase Docker resource limits in Docker Desktop settings.