Thiết Kế Hệ Thống / System Design
VI: Kiến trúc tổng thể của nền tảng GoodGo Microservices
EN: Overall architecture of GoodGo Microservices Platform
Sơ đồ Tổng quan / Overview Diagram
graph TD
subgraph "Client Layer"
Web[Web App<br/>Next.js]
Mobile[Mobile App<br/>Flutter]
end
subgraph "API Gateway Layer"
Traefik[Traefik<br/>API Gateway]
end
subgraph "Services Layer"
IAM[IAM Service<br/>Auth & RBAC]
Future1[Future Service 1]
Future2[Future Service 2]
end
subgraph "Infrastructure Layer"
DB[(Neon PostgreSQL<br/>Primary Database)]
Cache[(Redis<br/>Cache & Session)]
Kafka[Apache Kafka<br/>Event Streaming]
end
subgraph "Observability Layer"
Prom[Prometheus<br/>Metrics]
Loki[Loki<br/>Logs]
Jaeger[Jaeger<br/>Tracing]
Grafana[Grafana<br/>Dashboards]
end
Web --> Traefik
Mobile --> Traefik
Traefik --> IAM
Traefik --> Future1
Traefik --> Future2
IAM --> DB
IAM --> Cache
IAM --> Kafka
Future1 --> DB
Future1 --> Cache
Future1 --> Kafka
Future2 --> DB
Future2 --> Cache
Future2 --> Kafka
IAM -.->|metrics| Prom
Future1 -.->|metrics| Prom
Future2 -.->|metrics| Prom
IAM -.->|logs| Loki
Future1 -.->|logs| Loki
Future2 -.->|logs| Loki
IAM -.->|traces| Jaeger
Future1 -.->|traces| Jaeger
Future2 -.->|traces| Jaeger
Prom --> Grafana
Loki --> Grafana
Jaeger --> Grafana
style Traefik fill:#e1f5ff
style DB fill:#f0e1ff
style Cache fill:#fff4e1
style Kafka fill:#d4edda
style Grafana fill:#ffe1e1
Mô tả Kiến trúc / Architecture Description
VI: Phần Tiếng Việt
GoodGo Platform được xây dựng theo kiến trúc microservices với các nguyên tắc sau:
Nguyên tắc Cốt lõi:
- Độc Lập Service: Mỗi service có database riêng và có thể deploy độc lập
- API Gateway Pattern: Traefik xử lý routing, load balancing, và cross-cutting concerns
- Shared Libraries: Chức năng chung được trích xuất vào shared packages (`@goodgo/*`)
- Infrastructure as Code: Tất cả cấu hình infrastructure được version control
- Observability First: Đầy đủ metrics, logging, và distributed tracing
Công nghệ Stack:
- Frontend: Next.js 14+ (App Router), Flutter 3.x
- Backend: Node.js 20+, TypeScript 5+, Express
- Database: Neon PostgreSQL (serverless)
- Cache: Redis (multi-layer caching)
- Message Broker: Apache Kafka
- API Gateway: Traefik
- Observability: Prometheus, Grafana, Loki, Jaeger
EN: English Section
GoodGo Platform is built on a microservices architecture with the following principles:
Core Principles:
- Service Independence: Each service has its own database and can be deployed independently
- API Gateway Pattern: Traefik handles routing, load balancing, and cross-cutting concerns
- Shared Libraries: Common functionality extracted into shared packages (`@goodgo/*`)
- Infrastructure as Code: All infrastructure configuration is version controlled
- Observability First: Complete metrics, logging, and distributed tracing
Technology Stack:
- Frontend: Next.js 14+ (App Router), Flutter 3.x
- Backend: Node.js 20+, TypeScript 5+, Express
- Database: Neon PostgreSQL (serverless)
- Cache: Redis (multi-layer caching)
- Message Broker: Apache Kafka
- API Gateway: Traefik
- Observability: Prometheus, Grafana, Loki, Jaeger
Bối cảnh Hệ thống / System Context
C4Context
title Sơ đồ Bối cảnh Hệ thống GoodGo Platform
Person(user, "Người dùng / User", "End users accessing the platform")
Person(admin, "Quản trị viên / Admin", "System administrators")
Person(developer, "Nhà phát triển / Developer", "Platform developers")
System(platform, "GoodGo Platform", "Microservices platform for business applications")
System_Ext(neon, "Neon PostgreSQL", "Serverless PostgreSQL database")
System_Ext(redis, "Redis", "In-memory cache and session store")
System_Ext(kafka, "Apache Kafka", "Event streaming platform")
System_Ext(monitoring, "Monitoring Stack", "Prometheus + Grafana + Loki + Jaeger")
Rel(user, platform, "Uses", "HTTPS")
Rel(admin, platform, "Manages", "HTTPS")
Rel(developer, platform, "Develops & Deploys", "Git, CI/CD")
Rel(platform, neon, "Stores data", "PostgreSQL Protocol")
Rel(platform, redis, "Caches data", "Redis Protocol")
Rel(platform, kafka, "Publishes/Consumes events", "Kafka Protocol")
Rel(platform, monitoring, "Sends metrics, logs, traces", "HTTP, gRPC")
Thành phần / Components
Frontend Layer
Web App (Next.js)
Description: Web application built with Next.js 14+ and the App Router
Key features:
- Server-side rendering (SSR) and Static Site Generation (SSG)
- API routes for the BFF (Backend for Frontend) pattern
- Optimized image loading with next/image
- Built-in routing and code splitting
Technologies:
- Next.js 14+, React 18+, TypeScript
- Tailwind CSS, Zustand (state management)
- `@goodgo/http-client`, `@goodgo/types`
File location: apps/web-client/
Mobile App (Flutter)
Description: Cross-platform mobile application built with Flutter
Key features:
- Cross-platform (iOS, Android)
- Native performance
- Provider pattern for state management
- Offline-first with local storage
Technologies:
- Flutter 3.x, Dart
- Provider, Dio (HTTP client)
File location: apps/mobile-client/
API Gateway Layer
Traefik
Description: Reverse proxy and API gateway that handles routing, load balancing, and SSL termination
Key features:
- Dynamic service discovery
- Automatic HTTPS with Let's Encrypt
- Load balancing and health checks
- Rate limiting and circuit breaker
- Middleware chains (CORS, auth, logging)
Technologies:
- Traefik 2.x
- Docker labels for dynamic configuration
File location: infra/traefik/
Services Layer
IAM Service
Description: Identity and Access Management service that handles authentication and authorization
Key features:
- JWT authentication (RS256)
- RBAC (Role-Based Access Control)
- ABAC (Attribute-Based Access Control)
- Event sourcing for the audit trail
- Zero-trust device validation
Technologies:
- Node.js, Express, TypeScript
- Prisma ORM, bcrypt, jsonwebtoken
- `@goodgo/logger`, `@goodgo/tracing`
File location: services/iam-service/
Future Services
Description: Services to be developed in the future
Planned:
- Payment Service - Payment processing
- Order Service - Order management
- Notification Service - Sending notifications
- Analytics Service - Data analysis
Infrastructure Layer
Neon PostgreSQL
Description: Serverless PostgreSQL database with auto-scaling
Key features:
- Serverless with auto-scaling
- Branching for development/staging
- Point-in-time recovery
- Connection pooling
File location: Database schemas live in each service (services/*/prisma/schema.prisma)
Redis
Description: In-memory cache and session store
Key features:
- Multi-layer caching (L1: Memory, L2: Redis)
- Session storage
- Rate limiting counters
- Pub/Sub for real-time features
File location: infra/redis/
Apache Kafka
Description: Event streaming platform for asynchronous communication
Key features:
- Event-driven architecture
- Event sourcing
- Eventual consistency
- Dead letter queue (DLQ)
File location: infra/kafka/
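To illustrate the dead letter queue item above, here is a minimal sketch of a consumer wrapper that retries a handler and, after the final failure, republishes the message to a DLQ topic. The `EventMessage` shape, the `.dlq` topic suffix, the default of three attempts, and the injected `publish` function are all assumptions made for illustration; none of them comes from the GoodGo codebase or from a Kafka client library. The handler is synchronous only to keep the sketch short.

```typescript
// Hypothetical message shape; a real Kafka client exposes its own record type.
interface EventMessage {
  topic: string;
  key: string;
  payload: string;
}

type Publish = (message: EventMessage) => void;
type Outcome = "processed" | "dead-lettered";

// Try the handler up to maxAttempts times, then park the message on "<topic>.dlq".
function consumeWithDlq(
  message: EventMessage,
  handler: (message: EventMessage) => void,
  publish: Publish,
  maxAttempts: number = 3, // assumed default, not taken from the document
): Outcome {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(message);
      return "processed";
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  // Keep the original payload and the failure reason so the event can be replayed later.
  publish({
    topic: `${message.topic}.dlq`,
    key: message.key,
    payload: JSON.stringify({ original: message.payload, error: lastError, attempts: maxAttempts }),
  });
  return "dead-lettered";
}
```

Keeping the original key on the DLQ record preserves per-key ordering if the parked events are replayed later.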
Luồng Dữ liệu / Data Flow
sequenceDiagram
participant Client
participant Traefik as API Gateway
participant Service
participant Cache as Redis
participant DB as PostgreSQL
participant Kafka
Client->>Traefik: HTTPS Request
Traefik->>Traefik: Rate Limiting
Traefik->>Traefik: JWT Validation
Traefik->>Service: Route to Service
Service->>Cache: Check Cache
alt Cache Hit
Cache-->>Service: Return Cached Data
else Cache Miss
Service->>DB: Query Database
DB-->>Service: Return Data
Service->>Cache: Store in Cache (TTL: 5min)
end
Service->>Service: Process Business Logic
Service->>DB: Update Data (if needed)
Service->>Kafka: Publish Event (async)
Service-->>Traefik: Response
Traefik-->>Client: HTTPS Response
Note over Kafka: Event consumers process asynchronously
VI Giải thích chi tiết:
- Request: Client gửi HTTPS request đến Traefik
- Gateway Processing: Traefik thực hiện rate limiting và JWT validation
- Routing: Traefik route request đến service phù hợp
- Cache Check: Service kiểm tra L1 (memory) → L2 (Redis) cache
- Database Query: Nếu cache miss, query từ PostgreSQL
- Cache Update: Lưu kết quả vào cache với TTL phù hợp
- Business Logic: Xử lý logic nghiệp vụ
- Event Publishing: Publish domain events đến Kafka (async)
- Response: Trả về response cho client qua Traefik
EN Detailed Explanation:
- Request: Client sends HTTPS request to Traefik
- Gateway Processing: Traefik performs rate limiting and JWT validation
- Routing: Traefik routes request to appropriate service
- Cache Check: Service checks L1 (memory) → L2 (Redis) cache
- Database Query: If cache miss, query from PostgreSQL
- Cache Update: Store result in cache with appropriate TTL
- Business Logic: Process business logic
- Event Publishing: Publish domain events to Kafka (async)
- Response: Return response to client via Traefik
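The cache check, database query, and cache update steps above can be sketched as a two-level cache-aside read. This is an illustration under stated assumptions, not the platform's implementation: `CacheLayer`, `MemoryLayer`, and `getWithCache` are hypothetical names, and the L2 layer is an in-memory stand-in where a real deployment would wrap a Redis client. The 5-minute TTL is the one shown in the sequence diagram.

```typescript
// Minimal async key-value contract; a real L2 would wrap a Redis client.
interface CacheLayer {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlMs: number): Promise<void>;
}

// In-memory layer with per-entry expiry, used here for L1 and as the L2 stand-in.
class MemoryLayer implements CacheLayer {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | undefined> {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= Date.now()) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlMs: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlMs });
  }
}

const CACHE_TTL_MS = 5 * 60 * 1000; // TTL from the sequence diagram

// Cache-aside read: L1, then L2, then the database, back-filling the layers that missed.
async function getWithCache(
  key: string,
  l1: CacheLayer,
  l2: CacheLayer,
  loadFromDb: (key: string) => Promise<string>,
): Promise<{ value: string; source: "l1" | "l2" | "db" }> {
  const fromL1 = await l1.get(key);
  if (fromL1 !== undefined) return { value: fromL1, source: "l1" };

  const fromL2 = await l2.get(key);
  if (fromL2 !== undefined) {
    await l1.set(key, fromL2, CACHE_TTL_MS);
    return { value: fromL2, source: "l2" };
  }

  const fromDb = await loadFromDb(key);
  await l2.set(key, fromDb, CACHE_TTL_MS);
  await l1.set(key, fromDb, CACHE_TTL_MS);
  return { value: fromDb, source: "db" };
}
```

Returning the `source` alongside the value makes it straightforward to feed separate L1 and L2 hit-rate counters.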
Kiến trúc Database / Database Architecture
erDiagram
User ||--o{ Session : has
User ||--o{ UserRole : has
User ||--o{ UserPermission : has
User ||--o{ MFADevice : has
User ||--o{ AuditEvent : triggers
Role ||--o{ UserRole : assigned_to
Role ||--o{ RolePermission : has
Permission ||--o{ RolePermission : granted_to
Permission ||--o{ UserPermission : granted_to
Organization ||--o{ User : contains
Organization ||--o{ Role : defines
User {
string id PK
string email UK
string passwordHash
string organizationId FK
boolean mfaEnabled
datetime createdAt
datetime updatedAt
}
Session {
string id PK
string userId FK
string refreshTokenHash
string deviceFingerprint
string ipAddress
datetime expiresAt
datetime createdAt
}
Role {
string id PK
string name
string organizationId FK
int hierarchy
datetime createdAt
}
Permission {
string id PK
string resource
string action
string scope
datetime createdAt
}
AuditEvent {
string id PK
string userId FK
string eventType
json eventData
datetime timestamp
}
VI Mô tả:
- Database per Service: Mỗi service có database schema riêng
- Shared Database: Hiện tại sử dụng shared Neon PostgreSQL, schemas isolated bằng Prisma
- Event Sourcing: Audit events lưu tất cả thay đổi quan trọng
- Soft Delete: Sử dụng `deletedAt` field thay vì hard delete
EN Description:
- Database per Service: Each service owns its own database schema
- Shared Database: For now, all services share one Neon PostgreSQL instance, with schemas kept isolated through each service's own Prisma schema
- Event Sourcing: Audit events record all important changes
- Soft Delete: Set a `deletedAt` field instead of hard-deleting rows
Quyết định Thiết kế / Design Decisions
Quyết định 1: Microservices Architecture
VI Bối cảnh: Cần khả năng scale độc lập và deploy riêng biệt cho từng business domain
VI Quyết định: Sử dụng microservices architecture với database per service pattern
VI Hậu quả:
- ✅ Tích cực:
- Scale độc lập từng service theo nhu cầu
- Deploy riêng biệt, giảm risk khi release
- Fault isolation - lỗi một service không ảnh hưởng toàn bộ
- Technology flexibility - mỗi service có thể dùng tech stack khác
- ❌ Tiêu cực:
- Phức tạp hơn monolith (distributed systems challenges)
- Eventual consistency thay vì strong consistency
- Distributed transactions phức tạp (Saga pattern)
- Operational overhead (monitoring, deployment)
VI Các lựa chọn thay thế: Monolith, Modular Monolith
EN Context: Need independent scaling and deployment for each business domain
EN Decision: Use microservices architecture with database per service pattern
EN Consequences:
- ✅ Positive:
- Independent scaling per service based on demand
- Independent deployment, reduced release risk
- Fault isolation - one service failure doesn't affect entire system
- Technology flexibility - each service can use different tech stack
- ❌ Negative:
- More complex than monolith (distributed systems challenges)
- Eventual consistency instead of strong consistency
- Complex distributed transactions (Saga pattern)
- Operational overhead (monitoring, deployment)
EN Alternatives: Monolith, Modular Monolith
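The Saga pattern named among the negative consequences can be sketched as an orchestrator that runs steps in order and, on failure, compensates the completed ones in reverse. `SagaStep` and `runSaga` are hypothetical names chosen for this sketch, and the design is one common variant (orchestration) rather than a statement of how GoodGo implements distributed transactions.

```typescript
// Each step pairs a forward action with the action that undoes it.
interface SagaStep {
  name: string;
  execute: () => Promise<void>;
  compensate: () => Promise<void>;
}

interface SagaResult {
  ok: boolean;
  failedStep?: string;
}

// Run steps sequentially; if one throws, undo the completed steps in reverse order.
async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.execute();
      completed.push(step);
    } catch {
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      return { ok: false, failedStep: step.name };
    }
  }
  return { ok: true };
}
```

A production orchestrator would also persist saga state and retry failed compensations; both are left out here to keep the control flow visible.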
Quyết định 2: Traefik as API Gateway
VI Bối cảnh: Cần reverse proxy, load balancing, SSL termination, và service discovery
VI Quyết định: Sử dụng Traefik thay vì Kong, NGINX, hoặc AWS API Gateway
VI Hậu quả:
- ✅ Tích cực:
- Auto service discovery với Docker labels
- Dynamic configuration không cần restart
- Built-in Let's Encrypt support
- Native Kubernetes integration
- Built-in metrics và tracing
- ❌ Tiêu cực:
- Learning curve cao hơn NGINX
- Plugin ecosystem nhỏ hơn Kong
- Community nhỏ hơn NGINX
VI Các lựa chọn thay thế: Kong, NGINX, AWS API Gateway, Envoy
EN Context: Need reverse proxy, load balancing, SSL termination, and service discovery
EN Decision: Use Traefik instead of Kong, NGINX, or AWS API Gateway
EN Consequences:
- ✅ Positive:
- Auto service discovery with Docker labels
- Dynamic configuration without restart
- Built-in Let's Encrypt support
- Native Kubernetes integration
- Built-in metrics and tracing
- ❌ Negative:
- Higher learning curve than NGINX
- Smaller plugin ecosystem than Kong
- Smaller community than NGINX
EN Alternatives: Kong, NGINX, AWS API Gateway, Envoy
Quyết định 3: Neon PostgreSQL (Serverless)
VI Bối cảnh: Cần database với auto-scaling, branching, và cost-effective cho development
VI Quyết định: Sử dụng Neon PostgreSQL (serverless) thay vì self-hosted PostgreSQL hoặc AWS RDS
VI Hậu quả:
- ✅ Tích cực:
- Auto-scaling theo usage
- Database branching cho dev/staging
- Pay-per-use pricing model
- Automatic backups và point-in-time recovery
- No infrastructure management
- ❌ Tiêu cực:
- Vendor lock-in
- Cold start latency (mitigated by connection pooling)
- Limited control over database configuration
VI Các lựa chọn thay thế: Self-hosted PostgreSQL, AWS RDS, Google Cloud SQL
EN Context: Need a database that offers auto-scaling and branching and is cost-effective for development
EN Decision: Use Neon PostgreSQL (serverless) instead of self-hosted PostgreSQL or AWS RDS
EN Consequences:
- ✅ Positive:
- Auto-scaling based on usage
- Database branching for dev/staging
- Pay-per-use pricing model
- Automatic backups and point-in-time recovery
- No infrastructure management
- ❌ Negative:
- Vendor lock-in
- Cold start latency (mitigated by connection pooling)
- Limited control over database configuration
EN Alternatives: Self-hosted PostgreSQL, AWS RDS, Google Cloud SQL
Đặc điểm Hiệu suất / Performance Characteristics
| Chỉ số / Metric | Mục tiêu / Target | Ghi chú / Notes |
|---|---|---|
| API Response Time (P95) | < 200ms | Excluding external API calls |
| API Response Time (P99) | < 500ms | Peak load scenarios |
| Throughput | 1000 req/s | Per service instance |
| Database Query Time (P95) | < 50ms | Simple queries with indexes |
| Cache Hit Rate (L1) | > 40% | In-memory cache |
| Cache Hit Rate (L2) | > 80% | Redis cache |
| Event Publish Latency (P95) | < 10ms | Kafka fire-and-forget |
| Service Availability | > 99.9% | Monthly uptime target |
| Error Rate | < 1% | 4xx + 5xx errors |
VI Tối ưu hóa Hiệu suất:
- Multi-layer caching (L1: Memory, L2: Redis)
- Connection pooling cho database
- Pagination cho list endpoints (max 100 items)
- Database indexes cho frequently queried fields
- Async event publishing (fire-and-forget)
- CDN cho static assets (Next.js)
EN Performance Optimizations:
- Multi-layer caching (L1: Memory, L2: Redis)
- Connection pooling for database
- Pagination for list endpoints (max 100 items)
- Database indexes for frequently queried fields
- Async event publishing (fire-and-forget)
- CDN for static assets (Next.js)
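As an illustration of the pagination cap above, the following sketch normalizes raw query-string values into a bounded page and limit. The 100-item maximum comes from the list; the default page size of 20, the `page`/`limit` parameter names, and the function names are assumptions.

```typescript
const MAX_PAGE_SIZE = 100; // cap stated in the optimization list
const DEFAULT_PAGE_SIZE = 20; // assumed default, not taken from the document

interface Pagination {
  page: number;
  limit: number;
  offset: number;
}

// Parse a positive integer, falling back when the value is missing or invalid.
function toPositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Clamp the requested page size so that a client cannot request unbounded lists.
function parsePagination(query: { page?: string; limit?: string }): Pagination {
  const page = toPositiveInt(query.page, 1);
  const limit = Math.min(toPositiveInt(query.limit, DEFAULT_PAGE_SIZE), MAX_PAGE_SIZE);
  return { page, limit, offset: (page - 1) * limit };
}
```

For example, `parsePagination({ page: "3", limit: "500" })` yields page 3 with the limit clamped to 100 and an offset of 200.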
Cân nhắc Bảo mật / Security Considerations
VI: Phần Tiếng Việt
Authentication:
- JWT với RS256 (asymmetric signing)
- Access token: 15 phút expiry
- Refresh token: 7 ngày expiry, rotation on use
- httpOnly cookies cho token storage
- MFA support (TOTP, backup codes)
Authorization:
- RBAC (Role-Based Access Control)
- ABAC (Attribute-Based Access Control)
- Permission format: `resource:action:scope`
- Permission caching (5 min TTL)
- Zero-trust device validation
Network Security:
- TLS 1.2+ enforcement
- HTTPS-only (HSTS headers)
- Rate limiting: 100 req/15min (standard), 10 req/hour (strict)
- CORS whitelist từ environment variables
- Network policies (Kubernetes)
Data Protection:
- AES-256-GCM encryption cho PII at rest
- bcrypt (cost 12) cho password hashing
- SHA-256 hashing cho tokens before storage
- Database encryption at rest (Neon)
- TLS in-transit cho tất cả connections
Secrets Management:
- Kubernetes secrets cho production
- Environment variables validation với Zod
- No hardcoded secrets in code
- Quarterly secret rotation
Audit Trail:
- Event sourcing cho tất cả auth events
- 7-year retention cho compliance
- Immutable audit logs
- Correlation IDs cho request tracing
EN: English Section
Authentication:
- JWT with RS256 (asymmetric signing)
- Access token: 15 minutes expiry
- Refresh token: 7 days expiry, rotation on use
- httpOnly cookies for token storage
- MFA support (TOTP, backup codes)
Authorization:
- RBAC (Role-Based Access Control)
- ABAC (Attribute-Based Access Control)
- Permission format: `resource:action:scope`
- Permission caching (5 min TTL)
- Zero-trust device validation
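A permission string in the `resource:action:scope` format can be parsed and checked roughly as follows. This is a sketch: the `*` wildcard, the example scopes such as `own` and `org`, and the function names are assumptions for illustration, since the document specifies only the three-part format.

```typescript
interface Permission {
  resource: string;
  action: string;
  scope: string;
}

// Split "resource:action:scope" and reject anything that does not have three non-empty parts.
function parsePermission(raw: string): Permission {
  const parts = raw.split(":");
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    throw new Error(`Invalid permission "${raw}"; expected resource:action:scope`);
  }
  const [resource = "", action = "", scope = ""] = parts;
  return { resource, action, scope };
}

// Assumed convention: "*" in a granted segment matches any required value.
function segmentMatches(granted: string, required: string): boolean {
  return granted === "*" || granted === required;
}

// True when at least one granted permission covers the required one.
function isGranted(grantedPermissions: string[], required: string): boolean {
  const need = parsePermission(required);
  return grantedPermissions.map((raw) => parsePermission(raw)).some(
    (have) =>
      segmentMatches(have.resource, need.resource) &&
      segmentMatches(have.action, need.action) &&
      segmentMatches(have.scope, need.scope),
  );
}
```

With grants `["user:read:own", "order:*:org"]`, the check passes for `order:create:org` and fails for `user:delete:own`.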
Network Security:
- TLS 1.2+ enforcement
- HTTPS-only (HSTS headers)
- Rate limiting: 100 req/15min (standard), 10 req/hour (strict)
- CORS whitelist from environment variables
- Network policies (Kubernetes)
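The two rate limits above can be modelled with a fixed-window counter. The sketch below keeps counters in process memory for clarity; per the Redis section, real counters would live in Redis so that all replicas share them. The class and policy names are hypothetical, and fixed-window counting is an assumed algorithm, since the document does not say which one is used.

```typescript
interface RateLimitPolicy {
  limit: number;
  windowMs: number;
}

// Limits taken from the list above.
const STANDARD_POLICY: RateLimitPolicy = { limit: 100, windowMs: 15 * 60 * 1000 };
const STRICT_POLICY: RateLimitPolicy = { limit: 10, windowMs: 60 * 60 * 1000 };

class FixedWindowLimiter {
  private readonly policy: RateLimitPolicy;
  private readonly counters = new Map<string, { windowStart: number; count: number }>();

  constructor(policy: RateLimitPolicy) {
    this.policy = policy;
  }

  // Returns true if the request is allowed; `now` is injectable to make testing easy.
  allow(clientKey: string, now: number = Date.now()): boolean {
    const windowStart = Math.floor(now / this.policy.windowMs) * this.policy.windowMs;
    const current = this.counters.get(clientKey);
    if (!current || current.windowStart !== windowStart) {
      // First request in a new window resets the counter.
      this.counters.set(clientKey, { windowStart, count: 1 });
      return true;
    }
    if (current.count >= this.policy.limit) return false;
    current.count += 1;
    return true;
  }
}
```

Fixed windows are simple but allow short bursts of up to twice the limit across a window boundary; a sliding window or token bucket trades more state for smoother enforcement.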
Data Protection:
- AES-256-GCM encryption for PII at rest
- bcrypt (cost 12) for password hashing
- SHA-256 hashing for tokens before storage
- Database encryption at rest (Neon)
- TLS in-transit for all connections
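Hashing tokens with SHA-256 before storage, as listed above, might look like the following sketch built on Node's built-in `node:crypto` module. The function names and the 32-byte token length are assumptions; only the hash, corresponding to the `refreshTokenHash` column in the ER diagram, would be persisted.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hex-encoded SHA-256 digest of the token; this is what gets stored.
function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Generate a random opaque token and the hash to store alongside the session.
// The 32-byte length is an assumption for this sketch.
function issueRefreshToken(): { token: string; tokenHash: string } {
  const token = randomBytes(32).toString("base64url");
  return { token, tokenHash: hashToken(token) };
}

// Compare in constant time so that timing does not leak how many characters matched.
function matchesStoredHash(presentedToken: string, storedHash: string): boolean {
  const presented = Buffer.from(hashToken(presentedToken), "hex");
  const stored = Buffer.from(storedHash, "hex");
  return presented.length === stored.length && timingSafeEqual(presented, stored);
}
```

A fast hash is appropriate here because the token is high-entropy random data; passwords, by contrast, use bcrypt as noted in the same list.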
Secrets Management:
- Kubernetes secrets for production
- Environment variables validation with Zod
- No hardcoded secrets in code
- Quarterly secret rotation
Audit Trail:
- Event sourcing for all auth events
- 7-year retention for compliance
- Immutable audit logs
- Correlation IDs for request tracing
Triển khai / Deployment
graph TD
subgraph "Kubernetes Cluster"
subgraph "Ingress"
LB[Load Balancer<br/>External IP]
Traefik[Traefik Pods<br/>Replicas: 2]
end
subgraph "Services"
IAM[IAM Service Pods<br/>Replicas: 2-10 HPA]
Service1[Service 1 Pods<br/>Replicas: 2-10 HPA]
Service2[Service 2 Pods<br/>Replicas: 2-10 HPA]
end
subgraph "Infrastructure"
Redis[Redis Cluster<br/>3 Masters + 3 Slaves]
Kafka[Kafka Cluster<br/>3 Brokers]
end
subgraph "Observability"
Prom[Prometheus<br/>Replicas: 2]
Loki[Loki<br/>Replicas: 2]
Jaeger[Jaeger<br/>Replicas: 2]
Grafana[Grafana<br/>Replicas: 2]
end
end
subgraph "External"
DB[(Neon PostgreSQL<br/>Serverless)]
end
LB --> Traefik
Traefik --> IAM
Traefik --> Service1
Traefik --> Service2
IAM --> Redis
IAM --> Kafka
IAM --> DB
Service1 --> Redis
Service1 --> Kafka
Service1 --> DB
Service2 --> Redis
Service2 --> Kafka
Service2 --> DB
IAM -.->|metrics| Prom
Service1 -.->|metrics| Prom
Service2 -.->|metrics| Prom
IAM -.->|logs| Loki
Service1 -.->|logs| Loki
Service2 -.->|logs| Loki
IAM -.->|traces| Jaeger
Service1 -.->|traces| Jaeger
Service2 -.->|traces| Jaeger
Prom --> Grafana
Loki --> Grafana
Jaeger --> Grafana
style LB fill:#e1f5ff
style DB fill:#f0e1ff
style Redis fill:#fff4e1
style Kafka fill:#d4edda
style Grafana fill:#ffe1e1
VI: Chiến lược Triển khai
Deployment Strategy:
- Rolling updates (maxSurge: 1, maxUnavailable: 0)
- Zero-downtime deployments
- Blue-green deployment cho major releases
- Canary deployment cho high-risk changes
Auto-scaling:
- Horizontal Pod Autoscaler (HPA)
- Min replicas: 2
- Max replicas: 10
- Target CPU: 70%
- Target Memory: 80%
Resource Allocation:
| Service | Requests | Limits |
|---|---|---|
| Microservices | 256Mi RAM, 250m CPU | 512Mi RAM, 500m CPU |
| Traefik | 512Mi RAM, 500m CPU | 1Gi RAM, 1000m CPU |
| Redis | 2Gi RAM, 1 CPU | 4Gi RAM, 2 CPU |
| Prometheus | 4Gi RAM, 2 CPU | 8Gi RAM, 4 CPU |
Health Checks:
- Liveness probe: `/health/live` (K8s restarts if fails)
- Readiness probe: `/health/ready` (K8s removes from LB if fails)
- Startup probe: `/health/live` (initial delay 30s)
Environments:
- Local: Docker Compose
- Staging: Kubernetes cluster (shared)
- Production: Kubernetes cluster (dedicated)
EN: Deployment Strategy
Deployment Strategy:
- Rolling updates (maxSurge: 1, maxUnavailable: 0)
- Zero-downtime deployments
- Blue-green deployment for major releases
- Canary deployment for high-risk changes
Auto-scaling:
- Horizontal Pod Autoscaler (HPA)
- Min replicas: 2
- Max replicas: 10
- Target CPU: 70%
- Target Memory: 80%
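To show how these targets translate into replica counts, the sketch below applies the scaling rule from the Kubernetes HPA documentation, `desired = ceil(current * currentMetric / targetMetric)`, taking the larger result across metrics and clamping it to the min/max above. It omits the HPA's tolerance band and stabilization windows, and the function and type names are hypothetical.

```typescript
interface HpaConfig {
  minReplicas: number;
  maxReplicas: number;
  targetCpuPct: number;
  targetMemoryPct: number;
}

// Values from the auto-scaling list above.
const HPA_CONFIG: HpaConfig = { minReplicas: 2, maxReplicas: 10, targetCpuPct: 70, targetMemoryPct: 80 };

// Simplified HPA rule: scale each metric proportionally, take the max, clamp to bounds.
function desiredReplicas(
  currentReplicas: number,
  avgCpuPct: number,
  avgMemoryPct: number,
  config: HpaConfig = HPA_CONFIG,
): number {
  const byCpu = Math.ceil((currentReplicas * avgCpuPct) / config.targetCpuPct);
  const byMemory = Math.ceil((currentReplicas * avgMemoryPct) / config.targetMemoryPct);
  const wanted = Math.max(byCpu, byMemory);
  return Math.min(config.maxReplicas, Math.max(config.minReplicas, wanted));
}
```

For instance, 4 replicas averaging 105% of requested CPU and 40% memory give `ceil(4 * 105 / 70) = 6` replicas, while 8 replicas at 140% CPU would ask for 16 and be capped at 10.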
Resource Allocation:
| Service | Requests | Limits |
|---|---|---|
| Microservices | 256Mi RAM, 250m CPU | 512Mi RAM, 500m CPU |
| Traefik | 512Mi RAM, 500m CPU | 1Gi RAM, 1000m CPU |
| Redis | 2Gi RAM, 1 CPU | 4Gi RAM, 2 CPU |
| Prometheus | 4Gi RAM, 2 CPU | 8Gi RAM, 4 CPU |
Health Checks:
- Liveness probe: `/health/live` (K8s restarts if fails)
- Readiness probe: `/health/ready` (K8s removes from LB if fails)
- Startup probe: `/health/live` (initial delay 30s)
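The difference between the liveness and readiness probes can be sketched as a small framework-independent function. The dependency names, the response shape, and `evaluateHealth` itself are assumptions; the checks are synchronous only for brevity, whereas real checks (for example, pinging the database) would be asynchronous.

```typescript
// A named dependency check, e.g. database, Redis, or Kafka connectivity.
interface ReadinessCheck {
  name: string;
  isHealthy: () => boolean;
}

interface HealthResult {
  statusCode: number;
  body: { status: "ok" | "unavailable"; failing: string[] };
}

function evaluateHealth(path: string, checks: ReadinessCheck[]): HealthResult {
  if (path === "/health/live") {
    // Liveness only confirms the process can respond; it should not depend on downstream systems,
    // otherwise a database outage would make Kubernetes restart healthy pods.
    return { statusCode: 200, body: { status: "ok", failing: [] } };
  }
  if (path === "/health/ready") {
    const failing = checks.filter((check) => !check.isHealthy()).map((check) => check.name);
    return failing.length === 0
      ? { statusCode: 200, body: { status: "ok", failing } }
      : { statusCode: 503, body: { status: "unavailable", failing } };
  }
  return { statusCode: 404, body: { status: "unavailable", failing: [] } };
}
```

An HTTP handler would map `statusCode` and `body` onto the response; returning 503 from `/health/ready` removes the pod from load balancing without restarting it.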
Environments:
- Local: Docker Compose
- Staging: Kubernetes cluster (shared)
- Production: Kubernetes cluster (dedicated)
Giám sát & Khả năng quan sát / Monitoring & Observability
VI: Chỉ số Chính
Application Metrics:
- `http_requests_total` - Total HTTP requests (counter)
- `http_request_duration_seconds` - Request duration (histogram)
- `http_requests_active` - Active requests (gauge)
- `cache_hits_total` / `cache_misses_total` - Cache performance
- `db_query_duration_seconds` - Database query duration
Infrastructure Metrics:
- CPU usage, Memory usage per pod
- Network I/O, Disk I/O
- Pod restart count
- Node resource utilization
Business Metrics:
- User registrations per day
- Login success/failure rate
- API usage by endpoint
- Error rate by service
Kiểm tra Sức khỏe:
- `/health/live` - Liveness probe (service running?)
- `/health/ready` - Readiness probe (ready for traffic?)
- `/metrics` - Prometheus metrics endpoint
Alerting Rules:
# High error rate (share of 5xx responses over the last 5 minutes)
- alert: HighErrorRate
  expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (job) (rate(http_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning
# High latency (P95 computed from per-bucket rates, not raw counters)
- alert: HighLatency
  expr: histogram_quantile(0.95, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
  for: 5m
  labels:
    severity: warning
# Service down
- alert: ServiceDown
  expr: up == 0
  for: 1m
  labels:
    severity: critical
# High memory usage
- alert: HighMemoryUsage
  expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.85
  for: 5m
  labels:
    severity: warning
Logging:
- Structured JSON logging với Winston
- Correlation IDs cho request tracing
- Log levels: error, warn, info, debug
- Log aggregation với Loki
- 7 days retention
Distributed Tracing:
- OpenTelemetry instrumentation
- Jaeger backend
- Trace sampling: 10% in production, 100% in staging
- Span attributes: service, operation, user_id, correlation_id
EN: Key Metrics
Application Metrics:
- `http_requests_total` - Total HTTP requests (counter)
- `http_request_duration_seconds` - Request duration (histogram)
- `http_requests_active` - Active requests (gauge)
- `cache_hits_total` / `cache_misses_total` - Cache performance
- `db_query_duration_seconds` - Database query duration
Infrastructure Metrics:
- CPU usage, Memory usage per pod
- Network I/O, Disk I/O
- Pod restart count
- Node resource utilization
Business Metrics:
- User registrations per day
- Login success/failure rate
- API usage by endpoint
- Error rate by service
Health Checks:
- `/health/live` - Liveness probe (service running?)
- `/health/ready` - Readiness probe (ready for traffic?)
- `/metrics` - Prometheus metrics endpoint
Alerting Rules:
# High error rate (share of 5xx responses over the last 5 minutes)
- alert: HighErrorRate
  expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (job) (rate(http_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning
# High latency (P95 computed from per-bucket rates, not raw counters)
- alert: HighLatency
  expr: histogram_quantile(0.95, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
  for: 5m
  labels:
    severity: warning
# Service down
- alert: ServiceDown
  expr: up == 0
  for: 1m
  labels:
    severity: critical
# High memory usage
- alert: HighMemoryUsage
  expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.85
  for: 5m
  labels:
    severity: warning
Logging:
- Structured JSON logging with Winston
- Correlation IDs for request tracing
- Log levels: error, warn, info, debug
- Log aggregation with Loki
- 7 days retention
Distributed Tracing:
- OpenTelemetry instrumentation
- Jaeger backend
- Trace sampling: 10% in production, 100% in staging
- Span attributes: service, operation, user_id, correlation_id
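One way to realize the 10% and 100% sampling rates above is a deterministic decision derived from the trace ID, so that every service reaches the same verdict for a given trace. The sketch below is a simplified illustration and does not claim to reproduce the OpenTelemetry SDK's built-in ratio sampler; `shouldSample` and `SAMPLING_RATIO` are hypothetical names.

```typescript
// Sampling ratios from the list above.
const SAMPLING_RATIO: { production: number; staging: number } = { production: 0.1, staging: 1.0 };

// Interpret the first 8 hex characters of the trace ID as a number in [0, 2^32)
// and keep the trace when it falls below ratio * 2^32.
function shouldSample(traceIdHex: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  const bucket = Number.parseInt(traceIdHex.slice(0, 8), 16);
  if (Number.isNaN(bucket)) return false; // malformed IDs are not sampled in this sketch
  return bucket < ratio * 0x100000000;
}
```

Because the decision depends only on the trace ID, a trace is either kept by every service it passes through or dropped by all of them, which avoids partial traces in Jaeger.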
Tài liệu Liên quan / Related Documentation
- Event-Driven Architecture - Kiến trúc hướng sự kiện / Event-driven architecture
- Caching Architecture - Chiến lược caching / Caching strategy
- Security Architecture - Kiến trúc bảo mật / Security architecture
- Observability Architecture - Khả năng quan sát / Observability
- Data Consistency Patterns - Mẫu nhất quán dữ liệu / Data consistency patterns
- Microservices Communication - Giao tiếp microservices / Microservices communication
Tham khảo / References
- Microservices Patterns - Microservices pattern catalog
- Twelve-Factor App - Best practices for cloud-native apps
- C4 Model - Software architecture diagrams
- Kubernetes Documentation - Kubernetes official docs
- Traefik Documentation - Traefik official docs
Cập nhật Lần cuối / Last Updated: 2026-01-07
Tác giả / Authors: GoodGo Architecture Team
Người review / Reviewers: GoodGo Development Team