
Thiết Kế Hệ Thống / System Design

VI: Kiến trúc tổng thể của nền tảng GoodGo Microservices
EN: Overall architecture of the GoodGo Microservices Platform

Sơ đồ Tổng quan / Overview Diagram

graph TD
    subgraph "Client Layer"
        Web[Web App<br/>Next.js]
        Mobile[Mobile App<br/>Flutter]
    end
    
    subgraph "API Gateway Layer"
        Traefik[Traefik<br/>API Gateway]
    end
    
    subgraph "Services Layer"
        IAM[IAM Service<br/>Auth & RBAC]
        Future1[Future Service 1]
        Future2[Future Service 2]
    end
    
    subgraph "Infrastructure Layer"
        DB[(Neon PostgreSQL<br/>Primary Database)]
        Cache[(Redis<br/>Cache & Session)]
        Kafka[Apache Kafka<br/>Event Streaming]
    end
    
    subgraph "Observability Layer"
        Prom[Prometheus<br/>Metrics]
        Loki[Loki<br/>Logs]
        Jaeger[Jaeger<br/>Tracing]
        Grafana[Grafana<br/>Dashboards]
    end
    
    Web --> Traefik
    Mobile --> Traefik
    
    Traefik --> IAM
    Traefik --> Future1
    Traefik --> Future2
    
    IAM --> DB
    IAM --> Cache
    IAM --> Kafka
    
    Future1 --> DB
    Future1 --> Cache
    Future1 --> Kafka
    
    Future2 --> DB
    Future2 --> Cache
    Future2 --> Kafka
    
    IAM -.->|metrics| Prom
    Future1 -.->|metrics| Prom
    Future2 -.->|metrics| Prom
    
    IAM -.->|logs| Loki
    Future1 -.->|logs| Loki
    Future2 -.->|logs| Loki
    
    IAM -.->|traces| Jaeger
    Future1 -.->|traces| Jaeger
    Future2 -.->|traces| Jaeger
    
    Prom --> Grafana
    Loki --> Grafana
    Jaeger --> Grafana
    
    style Traefik fill:#e1f5ff
    style DB fill:#f0e1ff
    style Cache fill:#fff4e1
    style Kafka fill:#d4edda
    style Grafana fill:#ffe1e1

Mô tả Kiến trúc / Architecture Description

VI: Phần Tiếng Việt

GoodGo Platform được xây dựng theo kiến trúc microservices với các nguyên tắc sau:

Nguyên tắc Cốt lõi:

Độc Lập Service: Mỗi service có database riêng và có thể deploy độc lập
API Gateway Pattern: Traefik xử lý routing, load balancing, và cross-cutting concerns
Shared Libraries: Chức năng chung được trích xuất vào shared packages (@goodgo/*)
Infrastructure as Code: Tất cả cấu hình infrastructure được version control
Observability First: Đầy đủ metrics, logging, và distributed tracing

Công nghệ Stack:

Frontend: Next.js 14+ (App Router), Flutter 3.x
Backend: Node.js 20+, TypeScript 5+, Express
Database: Neon PostgreSQL (serverless)
Cache: Redis (multi-layer caching)
Message Broker: Apache Kafka
API Gateway: Traefik
Observability: Prometheus, Grafana, Loki, Jaeger

EN: English Section

GoodGo Platform is built on a microservices architecture that follows these principles:

Core Principles:

Service Independence: Each service owns its data (its own database schema) and can be deployed independently
API Gateway Pattern: Traefik handles routing, load balancing, and cross-cutting concerns
Shared Libraries: Common functionality is extracted into shared packages (@goodgo/*)
Infrastructure as Code: All infrastructure configuration is version controlled
Observability First: Metrics, logging, and distributed tracing are built in from the start

Technology Stack:

Frontend: Next.js 14+ (App Router), Flutter 3.x
Backend: Node.js 20+, TypeScript 5+, Express
Database: Neon PostgreSQL (serverless)
Cache: Redis (multi-layer caching)
Message Broker: Apache Kafka
API Gateway: Traefik
Observability: Prometheus, Grafana, Loki, Jaeger

Bối cảnh Hệ thống / System Context

C4Context
    title GoodGo Platform System Context Diagram
    
    Person(user, "Người dùng / User", "End users accessing the platform")
    Person(admin, "Quản trị viên / Admin", "System administrators")
    Person(developer, "Nhà phát triển / Developer", "Platform developers")
    
    System(platform, "GoodGo Platform", "Microservices platform for business applications")
    
    System_Ext(neon, "Neon PostgreSQL", "Serverless PostgreSQL database")
    System_Ext(redis, "Redis", "In-memory cache and session store")
    System_Ext(kafka, "Apache Kafka", "Event streaming platform")
    System_Ext(monitoring, "Monitoring Stack", "Prometheus + Grafana + Loki + Jaeger")
    
    Rel(user, platform, "Uses", "HTTPS")
    Rel(admin, platform, "Manages", "HTTPS")
    Rel(developer, platform, "Develops & Deploys", "Git, CI/CD")
    
    Rel(platform, neon, "Stores data", "PostgreSQL Protocol")
    Rel(platform, redis, "Caches data", "Redis Protocol")
    Rel(platform, kafka, "Publishes/Consumes events", "Kafka Protocol")
    Rel(platform, monitoring, "Sends metrics, logs, traces", "HTTP, gRPC")

Thành phần / Components

Frontend Layer

Web App (Next.js)

Description: Web application built with Next.js 14+ and the App Router

Key features:

Server-side rendering (SSR) and Static Site Generation (SSG)
API routes for the BFF (Backend for Frontend) pattern
Optimized image loading with next/image
Built-in routing and code splitting

Technologies:

Next.js 14+, React 18+, TypeScript
Tailwind CSS, Zustand (state management)
@goodgo/http-client, @goodgo/types

File location: apps/web-client/

Mobile App (Flutter)

Description: Cross-platform mobile application built with Flutter

Key features:

Cross-platform (iOS, Android)
Native performance
Provider pattern for state management
Offline-first with local storage

Technologies:

Flutter 3.x, Dart
Provider, Dio (HTTP client)

File location: apps/mobile-client/

API Gateway Layer

Traefik

Description: Reverse proxy and API gateway that handles routing, load balancing, and SSL termination

Key features:

Dynamic service discovery
Automatic HTTPS with Let's Encrypt
Load balancing and health checks
Rate limiting and circuit breaker
Middleware chains (CORS, auth, logging)

Technologies:

Traefik 2.x
Docker labels for dynamic configuration

File location: infra/traefik/

Services Layer

IAM Service

Description: Identity and Access Management service that handles authentication and authorization

Key features:

JWT authentication (RS256)
RBAC (Role-Based Access Control)
ABAC (Attribute-Based Access Control)
Event sourcing for the audit trail
Zero-trust device validation

Technologies:

Node.js, Express, TypeScript
Prisma ORM, bcrypt, jsonwebtoken
@goodgo/logger, @goodgo/tracing

File location: services/iam-service/

Future Services

Description: Services to be developed in the future

Planned:

Payment Service - Payment processing
Order Service - Order management
Notification Service - Sending notifications
Analytics Service - Data analysis

Infrastructure Layer

Neon PostgreSQL

Description: Serverless PostgreSQL database with auto-scaling

Key features:

Serverless with auto-scaling
Branching for development/staging
Point-in-time recovery
Connection pooling

File location: Database schemas live in each service (services/*/prisma/schema.prisma)

Redis

Description: In-memory cache and session store

Key features:

Multi-layer caching (L1: Memory, L2: Redis)
Session storage
Rate limiting counters
Pub/Sub for real-time features

File location: infra/redis/

Apache Kafka

Description: Event streaming platform for asynchronous communication

Key features:

Event-driven architecture
Event sourcing
Eventual consistency
Dead letter queue (DLQ)

File location: infra/kafka/
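
The dead letter queue listed for Kafka can be illustrated with a small retry loop. This is only a sketch under stated assumptions: plain in-memory arrays stand in for Kafka topics, no Kafka client is used, and the names `EventMessage`, `DeadLetter`, `consumeWithDlq`, and the limit of three attempts are hypothetical and not taken from the platform's code.

```typescript
// Hypothetical message shape; real events would carry richer metadata.
interface EventMessage {
  key: string;
  payload: string;
}

// A message that could not be processed, plus the reason it was parked.
interface DeadLetter {
  message: EventMessage;
  error: string;
  attempts: number;
}

// Processes each message, retrying up to maxAttempts times.
// Messages that still fail are returned as dead letters instead of
// blocking the rest of the batch.
function consumeWithDlq(
  messages: EventMessage[],
  handler: (message: EventMessage) => void,
  maxAttempts: number = 3, // assumed retry limit
): DeadLetter[] {
  const deadLetters: DeadLetter[] = [];
  for (const message of messages) {
    let handled = false;
    let lastError = "";
    for (let attempt = 1; attempt <= maxAttempts && !handled; attempt++) {
      try {
        handler(message);
        handled = true;
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err);
      }
    }
    if (!handled) {
      deadLetters.push({ message, error: lastError, attempts: maxAttempts });
    }
  }
  return deadLetters;
}
```

In a real consumer the returned dead letters would be published to a dedicated DLQ topic so they can be inspected and replayed later.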

Luồng Dữ liệu / Data Flow

sequenceDiagram
    participant Client
    participant Traefik as API Gateway
    participant Service
    participant Cache as Redis
    participant DB as PostgreSQL
    participant Kafka
    
    Client->>Traefik: HTTPS Request
    Traefik->>Traefik: Rate Limiting
    Traefik->>Traefik: JWT Validation
    Traefik->>Service: Route to Service
    
    Service->>Cache: Check Cache
    alt Cache Hit
        Cache-->>Service: Return Cached Data
    else Cache Miss
        Service->>DB: Query Database
        DB-->>Service: Return Data
        Service->>Cache: Store in Cache (TTL: 5min)
    end
    
    Service->>Service: Process Business Logic
    Service->>DB: Update Data (if needed)
    Service->>Kafka: Publish Event (async)
    
    Service-->>Traefik: Response
    Traefik-->>Client: HTTPS Response
    
    Note over Kafka: Event consumers process asynchronously

VI Giải thích chi tiết:

Request: Client gửi HTTPS request đến Traefik
Gateway Processing: Traefik thực hiện rate limiting và JWT validation
Routing: Traefik route request đến service phù hợp
Cache Check: Service kiểm tra L1 (memory) → L2 (Redis) cache
Database Query: Nếu cache miss, query từ PostgreSQL
Cache Update: Lưu kết quả vào cache với TTL phù hợp
Business Logic: Xử lý logic nghiệp vụ
Event Publishing: Publish domain events đến Kafka (async)
Response: Trả về response cho client qua Traefik

EN Detailed Explanation:

Request: The client sends an HTTPS request to Traefik
Gateway Processing: Traefik applies rate limiting and JWT validation
Routing: Traefik routes the request to the appropriate service
Cache Check: The service checks the L1 (memory) cache, then the L2 (Redis) cache
Database Query: On a cache miss, the service queries PostgreSQL
Cache Update: The service stores the result in the cache with an appropriate TTL (5 minutes in the diagram)
Business Logic: The service runs its business logic and updates the database if needed
Event Publishing: The service publishes domain events to Kafka asynchronously
Response: The service returns the response to the client via Traefik
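
The cache check, database query, and cache update steps above can be sketched as a two-level read-through cache. This is a minimal illustration, not the platform's implementation: `RemoteCache` is a hypothetical interface standing in for a Redis client, `FakeRemoteCache` exists only to make the sketch runnable, and the single 5 minute TTL for both levels is taken from the sequence diagram.

```typescript
// Hypothetical stand-in for a remote cache such as Redis (not a real client API).
interface RemoteCache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlMs: number): Promise<void>;
}

interface L1Entry {
  value: string;
  expiresAt: number;
}

class TwoLevelCache {
  private readonly l1 = new Map<string, L1Entry>();
  private readonly l2: RemoteCache;
  private readonly ttlMs: number;

  constructor(l2: RemoteCache, ttlMs: number = 5 * 60 * 1000) {
    this.l2 = l2;
    this.ttlMs = ttlMs;
  }

  // Looks in L1 (memory), then L2 (remote), and only then calls the loader,
  // which represents the database query.
  async getOrLoad(key: string, load: () => Promise<string>): Promise<string> {
    const local = this.l1.get(key);
    if (local && local.expiresAt > Date.now()) {
      return local.value; // L1 hit
    }
    const remote = await this.l2.get(key);
    if (remote !== undefined) {
      this.remember(key, remote); // L2 hit: refill L1
      return remote;
    }
    const fresh = await load(); // cache miss: go to the database
    await this.l2.set(key, fresh, this.ttlMs);
    this.remember(key, fresh);
    return fresh;
  }

  private remember(key: string, value: string): void {
    this.l1.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// In-memory fake so the sketch runs without a Redis server (it ignores the TTL).
class FakeRemoteCache implements RemoteCache {
  readonly store = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.store.get(key);
  }

  async set(key: string, value: string, _ttlMs: number): Promise<void> {
    this.store.set(key, value);
  }
}
```

A caller would wrap its database read as the loader, for example `cache.getOrLoad("user:1", () => loadUserJson("1"))`, where `loadUserJson` is a hypothetical function. Cache invalidation on writes is deliberately left out of this sketch.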

Kiến trúc Database / Database Architecture

erDiagram
    User ||--o{ Session : has
    User ||--o{ UserRole : has
    User ||--o{ UserPermission : has
    User ||--o{ MFADevice : has
    User ||--o{ AuditEvent : triggers
    
    Role ||--o{ UserRole : assigned_to
    Role ||--o{ RolePermission : has
    
    Permission ||--o{ RolePermission : granted_to
    Permission ||--o{ UserPermission : granted_to
    
    Organization ||--o{ User : contains
    Organization ||--o{ Role : defines
    
    User {
        string id PK
        string email UK
        string passwordHash
        string organizationId FK
        boolean mfaEnabled
        datetime createdAt
        datetime updatedAt
    }
    
    Session {
        string id PK
        string userId FK
        string refreshTokenHash
        string deviceFingerprint
        string ipAddress
        datetime expiresAt
        datetime createdAt
    }
    
    Role {
        string id PK
        string name
        string organizationId FK
        int hierarchy
        datetime createdAt
    }
    
    Permission {
        string id PK
        string resource
        string action
        string scope
        datetime createdAt
    }
    
    AuditEvent {
        string id PK
        string userId FK
        string eventType
        json eventData
        datetime timestamp
    }

VI Mô tả:

Database per Service: Mỗi service có database schema riêng
Shared Database: Hiện tại sử dụng shared Neon PostgreSQL, schemas isolated bằng Prisma
Event Sourcing: Audit events lưu tất cả thay đổi quan trọng
Soft Delete: Sử dụng deletedAt field thay vì hard delete

EN Description:

Database per Service: Each service owns its own database schema
Shared Database: All services currently share one Neon PostgreSQL instance, and each service keeps its tables separate through its own Prisma schema
Event Sourcing: Audit events record every significant change
Soft Delete: Records are marked with a deletedAt timestamp instead of being hard-deleted

Quyết định Thiết kế / Design Decisions

Quyết định 1: Microservices Architecture

VI Bối cảnh: Cần khả năng scale độc lập và deploy riêng biệt cho từng business domain

VI Quyết định: Sử dụng microservices architecture với database per service pattern

VI Hậu quả:

✅ Tích cực:
- Scale độc lập từng service theo nhu cầu
- Deploy riêng biệt, giảm risk khi release
- Fault isolation - lỗi một service không ảnh hưởng toàn bộ
- Technology flexibility - mỗi service có thể dùng tech stack khác
❌ Tiêu cực:
- Phức tạp hơn monolith (distributed systems challenges)
- Eventual consistency thay vì strong consistency
- Distributed transactions phức tạp (Saga pattern)
- Operational overhead (monitoring, deployment)

VI Các lựa chọn thay thế: Monolith, Modular Monolith

EN Context: Need independent scaling and deployment for each business domain

EN Decision: Use microservices architecture with database per service pattern

EN Consequences:

✅ Positive:
- Each service scales independently based on demand
- Independent deployment reduces release risk
- Fault isolation: a failure in one service does not take down the entire system
- Technology flexibility: each service can use a different tech stack
❌ Negative:
- More complex than a monolith (distributed systems challenges)
- Eventual consistency instead of strong consistency
- Distributed transactions are complex (Saga pattern)
- Operational overhead (monitoring, deployment)

EN Alternatives: Monolith, Modular Monolith

Quyết định 2: Traefik as API Gateway

VI Bối cảnh: Cần reverse proxy, load balancing, SSL termination, và service discovery

VI Quyết định: Sử dụng Traefik thay vì Kong, NGINX, hoặc AWS API Gateway

VI Hậu quả:

✅ Tích cực:
- Auto service discovery với Docker labels
- Dynamic configuration không cần restart
- Built-in Let's Encrypt support
- Native Kubernetes integration
- Built-in metrics và tracing
❌ Tiêu cực:
- Learning curve cao hơn NGINX
- Plugin ecosystem nhỏ hơn Kong
- Community nhỏ hơn NGINX

VI Các lựa chọn thay thế: Kong, NGINX, AWS API Gateway, Envoy

EN Context: Need reverse proxy, load balancing, SSL termination, and service discovery

EN Decision: Use Traefik instead of Kong, NGINX, or AWS API Gateway

EN Consequences:

✅ Positive:
- Auto service discovery with Docker labels
- Dynamic configuration without restart
- Built-in Let's Encrypt support
- Native Kubernetes integration
- Built-in metrics and tracing
❌ Negative:
- Steeper learning curve than NGINX
- Smaller plugin ecosystem than Kong
- Smaller community than NGINX

EN Alternatives: Kong, NGINX, AWS API Gateway, Envoy

Quyết định 3: Neon PostgreSQL (Serverless)

VI Bối cảnh: Cần database với auto-scaling, branching, và cost-effective cho development

VI Quyết định: Sử dụng Neon PostgreSQL (serverless) thay vì self-hosted PostgreSQL hoặc AWS RDS

VI Hậu quả:

✅ Tích cực:
- Auto-scaling theo usage
- Database branching cho dev/staging
- Pay-per-use pricing model
- Automatic backups và point-in-time recovery
- No infrastructure management
❌ Tiêu cực:
- Vendor lock-in
- Cold start latency (mitigated by connection pooling)
- Limited control over database configuration

VI Các lựa chọn thay thế: Self-hosted PostgreSQL, AWS RDS, Google Cloud SQL

EN Context: Need a database with auto-scaling and branching that is cost-effective for development

EN Decision: Use Neon PostgreSQL (serverless) instead of self-hosted PostgreSQL or AWS RDS

EN Consequences:

✅ Positive:
- Auto-scaling based on usage
- Database branching for dev/staging
- Pay-per-use pricing model
- Automatic backups and point-in-time recovery
- No infrastructure management
❌ Negative:
- Vendor lock-in
- Cold start latency (mitigated by connection pooling)
- Limited control over database configuration

EN Alternatives: Self-hosted PostgreSQL, AWS RDS, Google Cloud SQL

Đặc điểm Hiệu suất / Performance Characteristics

| Chỉ số / Metric | Mục tiêu / Target | Ghi chú / Notes |
| --- | --- | --- |
| API Response Time (P95) | < 200ms | Excluding external API calls |
| API Response Time (P99) | < 500ms | Peak load scenarios |
| Throughput | 1000 req/s | Per service instance |
| Database Query Time (P95) | < 50ms | Simple queries with indexes |
| Cache Hit Rate (L1) | > 40% | In-memory cache |
| Cache Hit Rate (L2) | > 80% | Redis cache |
| Event Publish Latency (P95) | < 10ms | Kafka fire-and-forget |
| Service Availability | > 99.9% | Monthly uptime target |
| Error Rate | < 1% | 4xx + 5xx errors |
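
To make the 99.9% monthly availability target concrete, the allowed downtime can be computed directly. The helper below is a small worked example, assuming a 30-day month; the function name `downtimeBudgetMinutes` is illustrative only.

```typescript
// Converts an availability target (for example 0.999) into the downtime
// allowed over a period, in minutes. Assumes a 30-day month by default.
function downtimeBudgetMinutes(availabilityTarget: number, days: number = 30): number {
  if (availabilityTarget <= 0 || availabilityTarget > 1) {
    throw new RangeError("availabilityTarget must be in (0, 1]");
  }
  const totalMinutes = days * 24 * 60; // 43,200 minutes in 30 days
  return (1 - availabilityTarget) * totalMinutes;
}

console.log(downtimeBudgetMinutes(0.999).toFixed(1)); // prints "43.2"
```

In other words, a 99.9% target leaves roughly 43 minutes of total downtime per month across all incidents and deployments.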

VI Tối ưu hóa Hiệu suất:

Multi-layer caching (L1: Memory, L2: Redis)
Connection pooling cho database
Pagination cho list endpoints (max 100 items)
Database indexes cho frequently queried fields
Async event publishing (fire-and-forget)
CDN cho static assets (Next.js)

EN Performance Optimizations:

Multi-layer caching (L1: Memory, L2: Redis)
Connection pooling for database
Pagination for list endpoints (max 100 items)
Database indexes for frequently queried fields
Async event publishing (fire-and-forget)
CDN for static assets (Next.js)

Cân nhắc Bảo mật / Security Considerations

VI: Phần Tiếng Việt

Authentication:

JWT với RS256 (asymmetric signing)
Access token: 15 phút expiry
Refresh token: 7 ngày expiry, rotation on use
httpOnly cookies cho token storage
MFA support (TOTP, backup codes)

Authorization:

RBAC (Role-Based Access Control)
ABAC (Attribute-Based Access Control)
Permission format: resource:action:scope
Permission caching (5 min TTL)
Zero-trust device validation

Network Security:

TLS 1.2+ enforcement
HTTPS-only (HSTS headers)
Rate limiting: 100 req/15min (standard), 10 req/hour (strict)
CORS whitelist từ environment variables
Network policies (Kubernetes)

Data Protection:

AES-256-GCM encryption cho PII at rest
bcrypt (cost 12) cho password hashing
SHA-256 hashing cho tokens before storage
Database encryption at rest (Neon)
TLS in-transit cho tất cả connections

Secrets Management:

Kubernetes secrets cho production
Environment variables validation với Zod
No hardcoded secrets in code
Quarterly secret rotation

Audit Trail:

Event sourcing cho tất cả auth events
7-year retention cho compliance
Immutable audit logs
Correlation IDs cho request tracing

EN: English Section

Authentication:

JWT with RS256 (asymmetric signing)
Access token: 15 minutes expiry
Refresh token: 7 days expiry, rotation on use
httpOnly cookies for token storage
MFA support (TOTP, backup codes)

Authorization:

RBAC (Role-Based Access Control)
ABAC (Attribute-Based Access Control)
Permission format: resource:action:scope
Permission caching (5 min TTL)
Zero-trust device validation
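
The `resource:action:scope` permission format lends itself to a simple string check. The sketch below shows one possible way to parse and match such permissions. The `*` wildcard and the function names `parsePermission` and `isAllowed` are assumptions for illustration; the document does not specify wildcard semantics or how the IAM service evaluates permissions.

```typescript
interface Permission {
  resource: string;
  action: string;
  scope: string;
}

// Splits "resource:action:scope" into its three parts and rejects anything else.
function parsePermission(text: string): Permission {
  const parts = text.split(":");
  const [resource = "", action = "", scope = ""] = parts;
  if (parts.length !== 3 || !resource || !action || !scope) {
    throw new Error(`Invalid permission "${text}", expected resource:action:scope`);
  }
  return { resource, action, scope };
}

// Assumption: a granted segment of "*" matches any required value.
function segmentMatches(granted: string, required: string): boolean {
  return granted === "*" || granted === required;
}

// Returns true when at least one granted permission covers the required one.
function isAllowed(grantedPermissions: string[], required: string): boolean {
  const need = parsePermission(required);
  return grantedPermissions.map(parsePermission).some(
    (have) =>
      segmentMatches(have.resource, need.resource) &&
      segmentMatches(have.action, need.action) &&
      segmentMatches(have.scope, need.scope),
  );
}
```

For example, a user granted `orders:read:own` passes a check for `orders:read:own` but not for `orders:read:org`. With the 5 minute permission cache described above, the granted list would typically be looked up once and reused for subsequent checks.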

Network Security:

TLS 1.2+ enforcement
HTTPS-only (HSTS headers)
Rate limiting: 100 req/15min (standard), 10 req/hour (strict)
CORS whitelist from environment variables
Network policies (Kubernetes)
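
The two rate limits above (100 requests per 15 minutes and 10 requests per hour) can be illustrated with a fixed-window counter. This is a simplified, in-process sketch: the fixed-window algorithm is an assumption, since the document does not name one, and in the platform the counters live in Redis and the gateway applies its own limits. The class name `FixedWindowRateLimiter` is hypothetical.

```typescript
interface WindowState {
  windowStart: number;
  count: number;
}

class FixedWindowRateLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed and counts it.
  // `now` is injectable so the behaviour can be checked without waiting.
  allow(clientKey: string, now: number = Date.now()): boolean {
    const state = this.windows.get(clientKey);
    if (!state || now - state.windowStart >= this.windowMs) {
      // First request from this client, or the previous window has ended.
      this.windows.set(clientKey, { windowStart: now, count: 1 });
      return true;
    }
    if (state.count >= this.limit) {
      return false; // over the limit for this window
    }
    state.count += 1;
    return true;
  }
}

// The two tiers from the list above.
const standardLimiter = new FixedWindowRateLimiter(100, 15 * 60 * 1000);
const strictLimiter = new FixedWindowRateLimiter(10, 60 * 60 * 1000);
```

A fixed window is the simplest option but allows short bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more state per client.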

Data Protection:

AES-256-GCM encryption for PII at rest
bcrypt (cost 12) for password hashing
SHA-256 hashing for tokens before storage
Database encryption at rest (Neon)
TLS in-transit for all connections
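
Refresh token rotation and SHA-256 hashing before storage fit together naturally: the server keeps only the hash, and every use of a refresh token replaces it. The sketch below uses Node's built-in `crypto` module with an in-memory `Map` standing in for the Session table. The class `RefreshTokenStore` and its method names are hypothetical and do not describe the IAM service's actual code.

```typescript
import { createHash, randomBytes } from "node:crypto";

const REFRESH_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days, as stated above

// Only this hash is persisted; the raw token is returned to the client once.
function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

interface StoredSession {
  userId: string;
  expiresAt: number;
}

class RefreshTokenStore {
  // Keyed by token hash; stands in for the Session table.
  private readonly sessions = new Map<string, StoredSession>();

  issue(userId: string, now: number = Date.now()): string {
    const token = randomBytes(32).toString("hex");
    this.sessions.set(hashToken(token), { userId, expiresAt: now + REFRESH_TTL_MS });
    return token;
  }

  // Rotation on use: the presented token is invalidated and a new one is issued.
  // Returns undefined for unknown, already used, or expired tokens.
  rotate(presentedToken: string, now: number = Date.now()): string | undefined {
    const key = hashToken(presentedToken);
    const session = this.sessions.get(key);
    if (!session) return undefined;
    this.sessions.delete(key); // each refresh token is single-use
    if (session.expiresAt <= now) return undefined;
    return this.issue(session.userId, now);
  }
}
```

Because only hashes are stored, a leaked session table does not directly expose usable refresh tokens, and a replayed old token is rejected once it has been rotated.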

Secrets Management:

Kubernetes secrets for production
Environment variables validation with Zod
No hardcoded secrets in code
Quarterly secret rotation

Audit Trail:

Event sourcing for all auth events
7-year retention for compliance
Immutable audit logs
Correlation IDs for request tracing

Triển khai / Deployment

graph TD
    subgraph "Kubernetes Cluster"
        subgraph "Ingress"
            LB[Load Balancer<br/>External IP]
            Traefik[Traefik Pods<br/>Replicas: 2]
        end
        
        subgraph "Services"
            IAM[IAM Service Pods<br/>Replicas: 2-10 HPA]
            Service1[Service 1 Pods<br/>Replicas: 2-10 HPA]
            Service2[Service 2 Pods<br/>Replicas: 2-10 HPA]
        end
        
        subgraph "Infrastructure"
            Redis[Redis Cluster<br/>3 Masters + 3 Slaves]
            Kafka[Kafka Cluster<br/>3 Brokers]
        end
        
        subgraph "Observability"
            Prom[Prometheus<br/>Replicas: 2]
            Loki[Loki<br/>Replicas: 2]
            Jaeger[Jaeger<br/>Replicas: 2]
            Grafana[Grafana<br/>Replicas: 2]
        end
    end
    
    subgraph "External"
        DB[(Neon PostgreSQL<br/>Serverless)]
    end
    
    LB --> Traefik
    Traefik --> IAM
    Traefik --> Service1
    Traefik --> Service2
    
    IAM --> Redis
    IAM --> Kafka
    IAM --> DB
    
    Service1 --> Redis
    Service1 --> Kafka
    Service1 --> DB
    
    Service2 --> Redis
    Service2 --> Kafka
    Service2 --> DB
    
    IAM -.->|metrics| Prom
    Service1 -.->|metrics| Prom
    Service2 -.->|metrics| Prom
    
    IAM -.->|logs| Loki
    Service1 -.->|logs| Loki
    Service2 -.->|logs| Loki
    
    IAM -.->|traces| Jaeger
    Service1 -.->|traces| Jaeger
    Service2 -.->|traces| Jaeger
    
    Prom --> Grafana
    Loki --> Grafana
    Jaeger --> Grafana
    
    style LB fill:#e1f5ff
    style DB fill:#f0e1ff
    style Redis fill:#fff4e1
    style Kafka fill:#d4edda
    style Grafana fill:#ffe1e1

VI: Chiến lược Triển khai

Deployment Strategy:

Rolling updates (maxSurge: 1, maxUnavailable: 0)
Zero-downtime deployments
Blue-green deployment cho major releases
Canary deployment cho high-risk changes

Auto-scaling:

Horizontal Pod Autoscaler (HPA)
- Min replicas: 2
- Max replicas: 10
- Target CPU: 70%
- Target Memory: 80%

Resource Allocation:

| Service | Requests | Limits |
| --- | --- | --- |
| Microservices | 256Mi RAM, 250m CPU | 512Mi RAM, 500m CPU |
| Traefik | 512Mi RAM, 500m CPU | 1Gi RAM, 1000m CPU |
| Redis | 2Gi RAM, 1 CPU | 4Gi RAM, 2 CPU |
| Prometheus | 4Gi RAM, 2 CPU | 8Gi RAM, 4 CPU |

Health Checks:

Liveness probe: /health/live (K8s restarts if fails)
Readiness probe: /health/ready (K8s removes from LB if fails)
Startup probe: /health/live (initial delay 30s)

Environments:

Local: Docker Compose
Staging: Kubernetes cluster (shared)
Production: Kubernetes cluster (dedicated)

EN: Deployment Strategy

Deployment Strategy:

Rolling updates (maxSurge: 1, maxUnavailable: 0)
Zero-downtime deployments
Blue-green deployment for major releases
Canary deployment for high-risk changes

Auto-scaling:

Horizontal Pod Autoscaler (HPA)
- Min replicas: 2
- Max replicas: 10
- Target CPU: 70%
- Target Memory: 80%
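
The effect of these HPA settings can be estimated with the scaling formula from the Kubernetes documentation, `desired = ceil(current * currentMetric / targetMetric)`, evaluated per metric with the largest result winning. The sketch below applies it to the CPU and memory targets above. It ignores the HPA tolerance band and stabilization windows, and the names `HpaConfig` and `desiredReplicas` are illustrative.

```typescript
interface HpaConfig {
  minReplicas: number;
  maxReplicas: number;
  targetCpuPercent: number;
  targetMemoryPercent: number;
}

// Values from the auto-scaling settings above.
const serviceHpa: HpaConfig = {
  minReplicas: 2,
  maxReplicas: 10,
  targetCpuPercent: 70,
  targetMemoryPercent: 80,
};

// Computes the replica count the HPA would aim for, given the current
// average utilization (as a percentage of requests) across the pods.
function desiredReplicas(
  currentReplicas: number,
  cpuPercent: number,
  memoryPercent: number,
  config: HpaConfig = serviceHpa,
): number {
  const byCpu = Math.ceil(currentReplicas * (cpuPercent / config.targetCpuPercent));
  const byMemory = Math.ceil(currentReplicas * (memoryPercent / config.targetMemoryPercent));
  const wanted = Math.max(byCpu, byMemory); // the most demanding metric wins
  return Math.min(config.maxReplicas, Math.max(config.minReplicas, wanted));
}
```

For instance, 2 replicas running at 140% average CPU would be scaled toward 4 replicas, while a very quiet service never drops below the minimum of 2.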

Resource Allocation:

| Service | Requests | Limits |
| --- | --- | --- |
| Microservices | 256Mi RAM, 250m CPU | 512Mi RAM, 500m CPU |
| Traefik | 512Mi RAM, 500m CPU | 1Gi RAM, 1000m CPU |
| Redis | 2Gi RAM, 1 CPU | 4Gi RAM, 2 CPU |
| Prometheus | 4Gi RAM, 2 CPU | 8Gi RAM, 4 CPU |

Health Checks:

Liveness probe: /health/live (K8s restarts if fails)
Readiness probe: /health/ready (K8s removes from LB if fails)
Startup probe: /health/live (initial delay 30s)

Environments:

Local: Docker Compose
Staging: Kubernetes cluster (shared)
Production: Kubernetes cluster (dedicated)

Giám sát & Khả năng quan sát / Monitoring & Observability

VI: Chỉ số Chính

Application Metrics:

http_requests_total - Total HTTP requests (counter)
http_request_duration_seconds - Request duration (histogram)
http_requests_active - Active requests (gauge)
cache_hits_total / cache_misses_total - Cache performance
db_query_duration_seconds - Database query duration

Infrastructure Metrics:

CPU usage, Memory usage per pod
Network I/O, Disk I/O
Pod restart count
Node resource utilization

Business Metrics:

User registrations per day
Login success/failure rate
API usage by endpoint
Error rate by service

Kiểm tra Sức khỏe:

/health/live - Liveness probe (service running?)
/health/ready - Readiness probe (ready for traffic?)
/metrics - Prometheus metrics endpoint

Alerting Rules:

# High error rate: more than 5% of requests return 5xx
- alert: HighErrorRate
  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning

# High latency: P95 above 500ms
- alert: HighLatency
  expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
  for: 5m
  labels:
    severity: warning

# Service down
- alert: ServiceDown
  expr: up == 0
  for: 1m
  labels:
    severity: critical

# High memory usage
- alert: HighMemoryUsage
  expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.85
  for: 5m
  labels:
    severity: warning

Logging:

Structured JSON logging với Winston
Correlation IDs cho request tracing
Log levels: error, warn, info, debug
Log aggregation với Loki
7 days retention

Distributed Tracing:

OpenTelemetry instrumentation
Jaeger backend
Trace sampling: 10% in production, 100% in staging
Span attributes: service, operation, user_id, correlation_id

EN: Key Metrics

Application Metrics:

http_requests_total - Total HTTP requests (counter)
http_request_duration_seconds - Request duration (histogram)
http_requests_active - Active requests (gauge)
cache_hits_total / cache_misses_total - Cache performance
db_query_duration_seconds - Database query duration

Infrastructure Metrics:

CPU usage, Memory usage per pod
Network I/O, Disk I/O
Pod restart count
Node resource utilization

Business Metrics:

User registrations per day
Login success/failure rate
API usage by endpoint
Error rate by service

Health Checks:

/health/live - Liveness probe (service running?)
/health/ready - Readiness probe (ready for traffic?)
/metrics - Prometheus metrics endpoint

Alerting Rules:

# High error rate: more than 5% of requests return 5xx
- alert: HighErrorRate
  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning

# High latency: P95 above 500ms
- alert: HighLatency
  expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
  for: 5m
  labels:
    severity: warning

# Service down
- alert: ServiceDown
  expr: up == 0
  for: 1m
  labels:
    severity: critical

# High memory usage
- alert: HighMemoryUsage
  expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.85
  for: 5m
  labels:
    severity: warning

Logging:

Structured JSON logging with Winston
Correlation IDs for request tracing
Log levels: error, warn, info, debug
Log aggregation with Loki
7 days retention

Distributed Tracing:

OpenTelemetry instrumentation
Jaeger backend
Trace sampling: 10% in production, 100% in staging
Span attributes: service, operation, user_id, correlation_id
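
The 10% production and 100% staging sampling rates imply a per-trace decision that every service must make consistently. One way to do that is to derive the decision from the trace ID itself, as sketched below. This is a simplified illustration and not the algorithm used by the OpenTelemetry SDK; the function `shouldSample` and the `samplingRatio` map are hypothetical.

```typescript
// Sampling ratios from the list above.
const samplingRatio: Record<string, number> = {
  production: 0.1,
  staging: 1.0,
};

// Decides deterministically from the trace ID, so every service that sees
// the same trace makes the same decision. Uses the first 8 hex characters
// of the ID as a number between 0 and 2^32 - 1.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  const bucket = parseInt(traceId.slice(0, 8), 16);
  if (Number.isNaN(bucket)) return false; // not a hex trace ID
  return bucket < ratio * 0x100000000; // 2^32
}
```

With randomly generated trace IDs, about one trace in ten falls below the threshold at a ratio of 0.1, and the whole trace is either kept or dropped as a unit.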

Tài liệu Liên quan / Related Documentation

Event-Driven Architecture - Kiến trúc hướng sự kiện / Event-driven architecture
Caching Architecture - Chiến lược caching / Caching strategy
Security Architecture - Kiến trúc bảo mật / Security architecture
Observability Architecture - Khả năng quan sát / Observability
Data Consistency Patterns - Mẫu nhất quán dữ liệu / Data consistency patterns
Microservices Communication - Giao tiếp microservices / Microservices communication

Tham khảo / References

Microservices Patterns - Microservices pattern catalog
Twelve-Factor App - Best practices for cloud-native apps
C4 Model - Software architecture diagrams
Kubernetes Documentation - Kubernetes official docs
Traefik Documentation - Traefik official docs

Cập nhật Lần cuối / Last Updated: 2026-01-07
Tác giả / Authors: GoodGo Architecture Team
Người review / Reviewers: GoodGo Development Team
