# System Design Architecture

Overall architecture of the GoodGo Microservices platform.

## Overview Diagram

```mermaid
%%{init: {'theme':'base', 'themeVariables': {
  'primaryTextColor':'#000',
  'secondaryTextColor':'#000',
  'tertiaryTextColor':'#000',
  'textColor':'#000',
  'mainBkg':'#fff',
  'secondBkg':'#fff',
  'lineColor':'#333',
  'border1':'#000',
  'border2':'#000',
  'clusterBkg':'#fff',
  'clusterBorder':'#000',
  'titleColor':'#000',
  'edgeLabelBackground':'#fff',
  'nodeTextColor':'#fff'
}}}%%
graph TD
    subgraph "Client Layer"
        Web[Web App<br/>Next.js]
        Mobile[Mobile App<br/>Flutter]
    end

    subgraph "API Gateway Layer"
        Traefik[Traefik<br/>API Gateway]
    end

    subgraph "Services Layer"
        IAM[IAM Service<br/>Auth & RBAC]
        Future1[Future Service 1]
        Future2[Future Service 2]
    end

    subgraph "Infrastructure Layer"
        DB[(Neon PostgreSQL<br/>Primary Database)]
        Cache[(Redis<br/>Cache & Session)]
        Kafka[Apache Kafka<br/>Event Streaming]
    end

    subgraph "Observability Layer"
        Prom[Prometheus<br/>Metrics]
        Loki[Loki<br/>Logs]
        Jaeger[Jaeger<br/>Tracing]
        Grafana[Grafana<br/>Dashboards]
    end

    Web --> Traefik
    Mobile --> Traefik

    Traefik --> IAM
    Traefik --> Future1
    Traefik --> Future2

    IAM --> DB
    IAM --> Cache
    IAM --> Kafka

    Future1 --> DB
    Future1 --> Cache
    Future1 --> Kafka

    Future2 --> DB
    Future2 --> Cache
    Future2 --> Kafka

    IAM -.->|metrics| Prom
    Future1 -.->|metrics| Prom
    Future2 -.->|metrics| Prom

    IAM -.->|logs| Loki
    Future1 -.->|logs| Loki
    Future2 -.->|logs| Loki

    IAM -.->|traces| Jaeger
    Future1 -.->|traces| Jaeger
    Future2 -.->|traces| Jaeger

    Prom --> Grafana
    Loki --> Grafana
    Jaeger --> Grafana

    style Web fill:#1565c0,stroke:#fff,stroke-width:2px,color:#fff
    style Mobile fill:#1565c0,stroke:#fff,stroke-width:2px,color:#fff
    style Traefik fill:#0f4c81,stroke:#fff,stroke-width:2px,color:#fff
    style IAM fill:#283593,stroke:#fff,stroke-width:2px,color:#fff
    style Future1 fill:#4527a0,stroke:#fff,stroke-width:2px,color:#fff
    style Future2 fill:#4527a0,stroke:#fff,stroke-width:2px,color:#fff
    style DB fill:#5e35b1,stroke:#fff,stroke-width:2px,color:#fff
    style Cache fill:#ef6c00,stroke:#fff,stroke-width:2px,color:#fff
    style Kafka fill:#2e7d32,stroke:#fff,stroke-width:2px,color:#fff
    style Prom fill:#c62828,stroke:#fff,stroke-width:2px,color:#fff
    style Loki fill:#d84315,stroke:#fff,stroke-width:2px,color:#fff
    style Jaeger fill:#e65100,stroke:#fff,stroke-width:2px,color:#fff
    style Grafana fill:#b71c1c,stroke:#fff,stroke-width:2px,color:#fff
```

## Architecture Description

GoodGo Platform is built on a microservices architecture that follows these principles:

**Core Principles**:
1. **Service Independence**: Each service has its own database and can be deployed independently
2. **API Gateway Pattern**: Traefik handles routing, load balancing, and cross-cutting concerns
3. **Clean Architecture**: Each service follows Clean Architecture (API, Domain, Infrastructure)
4. **Infrastructure as Code**: All infrastructure configuration is kept under version control
5. **Observability First**: Comprehensive metrics, logging, and health checks

**Technology Stack**:
- **Frontend**: Next.js 14+ (App Router), Flutter 3.x
- **Backend**: .NET 10, ASP.NET Core, MediatR (CQRS)
- **Database**: Neon PostgreSQL (serverless), Entity Framework Core
- **Cache**: Redis (StackExchange.Redis)
- **Message Broker**: MediatR Domain Events (RabbitMQ planned)
- **API Gateway**: Traefik v3
- **Observability**: Prometheus, Grafana, Loki, Serilog

## System Context

```mermaid
C4Context
    title System Context Diagram for GoodGo Platform

    Person(user, "User", "End users accessing the platform")
    Person(admin, "Admin", "System administrators")
    Person(developer, "Developer", "Platform developers")

    System(platform, "GoodGo Platform", "Microservices platform for business applications")

    System_Ext(neon, "Neon PostgreSQL", "Serverless PostgreSQL database")
    System_Ext(redis, "Redis", "In-memory cache and session store")
    System_Ext(kafka, "Apache Kafka", "Event streaming platform")
    System_Ext(monitoring, "Monitoring Stack", "Prometheus + Grafana + Loki + Jaeger")

    Rel(user, platform, "Uses", "HTTPS")
    Rel(admin, platform, "Manages", "HTTPS")
    Rel(developer, platform, "Develops & Deploys", "Git, CI/CD")

    Rel(platform, neon, "Stores data", "PostgreSQL Protocol")
    Rel(platform, redis, "Caches data", "Redis Protocol")
    Rel(platform, kafka, "Publishes/Consumes events", "Kafka Protocol")
    Rel(platform, monitoring, "Sends metrics, logs, traces", "HTTP, gRPC")
```

## Components

### Frontend Layer

#### Web App (Next.js)
**Description**: Web application built with Next.js 14+ and the App Router

**Key Features**:
- Server-side rendering (SSR) and Static Site Generation (SSG)
- API routes for the BFF (Backend for Frontend) pattern
- Optimized image loading with next/image
- Built-in routing and code splitting

**Technologies**:
- Next.js 14+, React 18+, TypeScript
- Tailwind CSS, Zustand (state management)
- `@goodgo/http-client`, `@goodgo/types`

**File Location**: [`apps/web-client/`](file:///Users/velikho/Desktop/WORKING/Base/apps/web-client)

#### Mobile App (Flutter)
**Description**: Cross-platform mobile application built with Flutter

**Key Features**:
- Cross-platform (iOS, Android)
- Native performance
- Provider pattern for state management
- Offline-first with local storage

**Technologies**:
- Flutter 3.x, Dart
- Provider, Dio (HTTP client)

**File Location**: [`apps/mobile-client/`](file:///Users/velikho/Desktop/WORKING/Base/apps/mobile-client)

### API Gateway Layer

#### Traefik
**Description**: Reverse proxy and API gateway that handles routing, load balancing, and SSL termination

**Key Features**:
- Dynamic service discovery
- Automatic HTTPS with Let's Encrypt
- Load balancing and health checks
- Rate limiting and circuit breaker
- Middleware chains (CORS, auth, logging)

**Technologies**:
- Traefik 2.x
- Docker labels for dynamic configuration

**File Location**: [`infra/traefik/`](file:///Users/velikho/Desktop/WORKING/Base/infra/traefik)

### Services Layer

#### IAM Service (.NET)
**Description**: Identity and Access Management service that handles authentication and authorization

**Key Features**:
- OAuth2/OpenID Connect with OpenIddict
- JWT authentication (RS256)
- RBAC (Role-Based Access Control)
- ASP.NET Core Identity for user management
- MFA support (TOTP)

**Technologies**:
- .NET 10, ASP.NET Core, MediatR
- Entity Framework Core, OpenIddict
- Serilog, FluentValidation

**File Location**: [`services/iam-service-net/`](file:///Users/velikho/Desktop/WORKING/Base/services/iam-service-net)

#### Implemented Services

| Service | Description | Location |
|---------|-------------|----------|
| **Storage Service** | File storage with MinIO/Aliyun OSS | `services/storage-service-net/` |
| **Membership Service** | Membership and subscription management | `services/membership-service-net/` |
| **Organization Service** | Organization management | `services/organization-service-net/` |
| **Chat Service** | Chat and messaging | `services/chat-service-net/` |
| **Social Service** | Social features | `services/social-service-net/` |
| **Wallet Service** | E-wallet | `services/wallet-service-net/` |

### Infrastructure Layer

#### Neon PostgreSQL
**Description**: Serverless PostgreSQL database with auto-scaling

**Key Features**:
- Serverless with auto-scaling
- Branching for development/staging
- Point-in-time recovery
- Connection pooling

**File Location**: Database schemas live in each service (`services/*/prisma/schema.prisma`)

#### Redis
**Description**: In-memory cache and session store

**Key Features**:
- Multi-layer caching (L1: Memory, L2: Redis)
- Session storage
- Rate limiting counters
- Pub/Sub for real-time features

**File Location**: [`infra/redis/`](file:///Users/velikho/Desktop/WORKING/Base/infra/redis)

#### Apache Kafka
**Description**: Event streaming platform for asynchronous communication

**Key Features**:
- Event-driven architecture
- Event sourcing
- Eventual consistency
- Dead letter queue (DLQ)

**File Location**: [`infra/kafka/`](file:///Users/velikho/Desktop/WORKING/Base/infra/kafka)

## Data Flow

```mermaid
sequenceDiagram
    participant Client
    participant Traefik as API Gateway
    participant Service
    participant Cache as Redis
    participant DB as PostgreSQL
    participant Kafka

    Client->>Traefik: HTTPS Request
    Traefik->>Traefik: Rate Limiting
    Traefik->>Traefik: JWT Validation
    Traefik->>Service: Route to Service

    Service->>Cache: Check Cache
    alt Cache Hit
        Cache-->>Service: Return Cached Data
    else Cache Miss
        Service->>DB: Query Database
        DB-->>Service: Return Data
        Service->>Cache: Store in Cache (TTL: 5min)
    end

    Service->>Service: Process Business Logic
    Service->>DB: Update Data (if needed)
    Service->>Kafka: Publish Event (async)

    Service-->>Traefik: Response
    Traefik-->>Client: HTTPS Response

    Note over Kafka: Event consumers process asynchronously
```

**Step-by-step explanation**:
1. **Request**: The client sends an HTTPS request to Traefik
2. **Gateway Processing**: Traefik applies rate limiting and JWT validation
3. **Routing**: Traefik routes the request to the appropriate service
4. **Cache Check**: The service checks the L1 (memory) cache, then the L2 (Redis) cache
5. **Database Query**: On a cache miss, the service queries PostgreSQL
6. **Cache Update**: The result is stored in the cache with an appropriate TTL
7. **Business Logic**: The service runs its business logic
8. **Event Publishing**: Domain events are published to Kafka (async)
9. **Response**: The response is returned to the client through Traefik
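
The cache lookup in steps 4 to 6 can be sketched as follows. This is a minimal, synchronous illustration rather than the platform's implementation: `createTtlStore`, `createTwoLevelCache`, and `loadFromDb` are hypothetical names, the in-memory store stands in for Redis, and the 5-minute TTL is the one shown in the sequence diagram.

```typescript
// Hypothetical sketch of the L1 (memory) -> L2 (Redis) -> database lookup.
// A real service would call Redis and the database asynchronously.

type Entry<V> = { value: V; expiresAt: number };

interface TtlStore<V> {
  get(key: string): V | undefined;
  set(key: string, value: V, ttlMs: number): void;
}

// In-memory TTL store; used here for L1 and as a stand-in for Redis (L2).
function createTtlStore<V>(now: () => number): TtlStore<V> {
  const entries = new Map<string, Entry<V>>();
  return {
    get(key: string): V | undefined {
      const entry = entries.get(key);
      if (entry === undefined) return undefined;
      if (entry.expiresAt <= now()) {
        entries.delete(key); // expired entries count as a miss
        return undefined;
      }
      return entry.value;
    },
    set(key: string, value: V, ttlMs: number): void {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}

const CACHE_TTL_MS = 5 * 60 * 1000; // TTL shown in the sequence diagram

function createTwoLevelCache<V>(
  l1: TtlStore<V>,
  l2: TtlStore<V>,
  loadFromDb: (key: string) => V,
): (key: string) => V {
  return (key: string): V => {
    const fromL1 = l1.get(key);
    if (fromL1 !== undefined) return fromL1; // L1 hit

    const fromL2 = l2.get(key);
    if (fromL2 !== undefined) {
      l1.set(key, fromL2, CACHE_TTL_MS); // promote to L1
      return fromL2;
    }

    const fresh = loadFromDb(key); // miss on both levels: query the database
    l2.set(key, fresh, CACHE_TTL_MS);
    l1.set(key, fresh, CACHE_TTL_MS);
    return fresh;
  };
}
```

Passing the clock in as `now` keeps expiry testable without waiting for real time to pass.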

## Database Architecture

```mermaid
erDiagram
    User ||--o{ Session : has
    User ||--o{ UserRole : has
    User ||--o{ UserPermission : has
    User ||--o{ MFADevice : has
    User ||--o{ AuditEvent : triggers

    Role ||--o{ UserRole : assigned_to
    Role ||--o{ RolePermission : has

    Permission ||--o{ RolePermission : granted_to
    Permission ||--o{ UserPermission : granted_to

    Organization ||--o{ User : contains
    Organization ||--o{ Role : defines

    User {
        string id PK
        string email UK
        string passwordHash
        string organizationId FK
        boolean mfaEnabled
        datetime createdAt
        datetime updatedAt
    }

    Session {
        string id PK
        string userId FK
        string refreshTokenHash
        string deviceFingerprint
        string ipAddress
        datetime expiresAt
        datetime createdAt
    }

    Role {
        string id PK
        string name
        string organizationId FK
        int hierarchy
        datetime createdAt
    }

    Permission {
        string id PK
        string resource
        string action
        string scope
        datetime createdAt
    }

    AuditEvent {
        string id PK
        string userId FK
        string eventType
        json eventData
        datetime timestamp
    }
```

**Description**:
- **Database per Service**: Each service has its own database schema
- **Shared Database**: A shared Neon PostgreSQL instance is currently used, with schemas isolated via Prisma
- **Event Sourcing**: Audit events record all significant changes
- **Soft Delete**: A `deletedAt` field is used instead of hard deletes
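
The soft-delete convention can be illustrated with a small sketch. Only the `deletedAt` field comes from this document; the `SoftDeletable` interface and the `softDelete` and `activeOnly` helpers are hypothetical names used for illustration.

```typescript
// Hypothetical sketch of soft delete: rows are never removed, they are
// stamped with `deletedAt` and filtered out of normal reads.

interface SoftDeletable {
  id: string;
  deletedAt: Date | null;
}

// Returns a copy of the row marked as deleted at the given time.
function softDelete<T extends SoftDeletable>(row: T, now: Date): T {
  return { ...row, deletedAt: now };
}

// Default read path: only rows that have not been soft-deleted.
function activeOnly<T extends SoftDeletable>(rows: T[]): T[] {
  return rows.filter((row) => row.deletedAt === null);
}
```

Keeping the deleted row in place means audit events that reference it by `id` still resolve.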

## Design Decisions

### Decision 1: Microservices Architecture

**Context**: Each business domain needs to scale independently and be deployed separately

**Decision**: Use a microservices architecture with the database-per-service pattern

**Consequences**:
- ✅ **Positive**:
  - Each service scales independently based on demand
  - Separate deployments reduce release risk
  - Fault isolation: a failure in one service does not affect the whole system
  - Technology flexibility: each service can use a different tech stack
- ❌ **Negative**:
  - More complex than a monolith (distributed systems challenges)
  - Eventual consistency instead of strong consistency
  - Complex distributed transactions (Saga pattern)
  - Operational overhead (monitoring, deployment)

**Alternatives**: Monolith, Modular Monolith

---

### Decision 2: Traefik as API Gateway

**Context**: The platform needs a reverse proxy, load balancing, SSL termination, and service discovery

**Decision**: Use Traefik instead of Kong, NGINX, or AWS API Gateway

**Consequences**:
- ✅ **Positive**:
  - Automatic service discovery with Docker labels
  - Dynamic configuration without restarts
  - Built-in Let's Encrypt support
  - Native Kubernetes integration
  - Built-in metrics and tracing
- ❌ **Negative**:
  - Steeper learning curve than NGINX
  - Smaller plugin ecosystem than Kong
  - Smaller community than NGINX

**Alternatives**: Kong, NGINX, AWS API Gateway, Envoy

---

### Decision 3: Neon PostgreSQL (Serverless)

**Context**: The platform needs a database with auto-scaling and branching that is cost-effective for development

**Decision**: Use Neon PostgreSQL (serverless) instead of self-hosted PostgreSQL or AWS RDS

**Consequences**:
- ✅ **Positive**:
  - Auto-scaling based on usage
  - Database branching for dev/staging
  - Pay-per-use pricing model
  - Automatic backups and point-in-time recovery
  - No infrastructure management
- ❌ **Negative**:
  - Vendor lock-in
  - Cold start latency (mitigated by connection pooling)
  - Limited control over database configuration

**Alternatives**: Self-hosted PostgreSQL, AWS RDS, Google Cloud SQL

## Performance Characteristics

| Metric | Target | Notes |
|--------|--------|-------|
| **API Response Time (P95)** | < 200ms | Excluding external API calls |
| **API Response Time (P99)** | < 500ms | Peak load scenarios |
| **Throughput** | 1000 req/s | Per service instance |
| **Database Query Time (P95)** | < 50ms | Simple queries with indexes |
| **Cache Hit Rate (L1)** | > 40% | In-memory cache |
| **Cache Hit Rate (L2)** | > 80% | Redis cache |
| **Event Publish Latency (P95)** | < 10ms | Kafka fire-and-forget |
| **Service Availability** | > 99.9% | Monthly uptime target |
| **Error Rate** | < 1% | 4xx + 5xx errors |
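
To make the availability target concrete, the sketch below converts an uptime percentage into an allowed-downtime budget. The function name `downtimeBudgetMinutes` and the 30-day month are assumptions made for illustration; the 99.9% figure is the target from the table.

```typescript
// Converts an availability target into the downtime it allows over a period.
function downtimeBudgetMinutes(availabilityPercent: number, periodDays: number): number {
  const totalMinutes = periodDays * 24 * 60;
  return totalMinutes * (1 - availabilityPercent / 100);
}

// 99.9% over an assumed 30-day month leaves roughly 43.2 minutes of downtime.
const monthlyBudgetMinutes = downtimeBudgetMinutes(99.9, 30);
```

Under these assumptions, a single 45-minute outage in a month would already exceed the target.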

**Performance Optimizations**:
- Multi-layer caching (L1: Memory, L2: Redis)
- Connection pooling for the database
- Pagination for list endpoints (max 100 items)
- Database indexes for frequently queried fields
- Async event publishing (fire-and-forget)
- CDN for static assets (Next.js)
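
The pagination cap can be enforced with a small helper like the one below. This is a sketch under stated assumptions: the 100-item maximum comes from the list above, while the default page size of 20 and the names `toPageWindow`, `PageRequest`, and `PageWindow` are hypothetical.

```typescript
const MAX_PAGE_SIZE = 100;    // limit stated in this document
const DEFAULT_PAGE_SIZE = 20; // assumption, not from the source

interface PageRequest {
  page?: number;     // 1-based page number
  pageSize?: number; // requested items per page
}

interface PageWindow {
  offset: number;
  limit: number;
}

// Falls back when the value is missing or not a finite number; never below 1.
function toPositiveInt(value: number | undefined, fallback: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.max(1, Math.floor(value));
}

// Clamps client input so a list endpoint never returns more than the maximum.
function toPageWindow(request: PageRequest): PageWindow {
  const page = toPositiveInt(request.page, 1);
  const limit = Math.min(MAX_PAGE_SIZE, toPositiveInt(request.pageSize, DEFAULT_PAGE_SIZE));
  return { offset: (page - 1) * limit, limit };
}
```

For example, a request for page 3 with a page size of 500 is clamped to a limit of 100 and an offset of 200.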

## Security Considerations

**Authentication**:
- JWT with RS256 (asymmetric signing)
- Access token: 15-minute expiry
- Refresh token: 7-day expiry, rotated on use
- httpOnly cookies for token storage
- MFA support (TOTP, backup codes)

**Authorization**:
- RBAC (Role-Based Access Control)
- ABAC (Attribute-Based Access Control)
- Permission format: `resource:action:scope`
- Permission caching (5 min TTL)
- Zero-trust device validation
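
A check against the `resource:action:scope` format might look like the sketch below. Only the three-segment format comes from this document. The `*` wildcard, the example values such as `users:read:org`, and the names `parsePermission` and `isAllowed` are assumptions made for illustration.

```typescript
interface Permission {
  resource: string;
  action: string;
  scope: string;
}

// Parses "resource:action:scope" and rejects malformed values.
function parsePermission(value: string): Permission {
  const parts = value.split(":");
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    throw new Error(`Invalid permission: ${value}`);
  }
  const [resource, action, scope] = parts;
  return { resource, action, scope };
}

// Assumption: "*" in a granted permission matches any value in that segment.
function segmentMatches(granted: string, required: string): boolean {
  return granted === "*" || granted === required;
}

// True if any granted permission covers the required one.
function isAllowed(grantedPermissions: string[], required: string): boolean {
  const need = parsePermission(required);
  return grantedPermissions.map(parsePermission).some(
    (have) =>
      segmentMatches(have.resource, need.resource) &&
      segmentMatches(have.action, need.action) &&
      segmentMatches(have.scope, need.scope),
  );
}
```

With these assumptions, a grant of `users:*:org` allows `users:delete:org`, while `users:read:own` does not allow `users:read:org`.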

**Network Security**:
- TLS 1.2+ enforcement
- HTTPS-only (HSTS headers)
- Rate limiting: 100 req/15min (standard), 10 req/hour (strict)
- CORS whitelist loaded from environment variables
- Network policies (Kubernetes)
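
The two rate-limit tiers can be sketched with a fixed-window counter. The limits (100 requests per 15 minutes, 10 requests per hour) come from the list above. The fixed-window algorithm, the in-memory `Map` (a real deployment would keep counters in Redis), and the names `createRateLimiter`, `STANDARD_POLICY`, and `STRICT_POLICY` are assumptions.

```typescript
interface RateLimitPolicy {
  limit: number;    // max requests per window
  windowMs: number; // window length in milliseconds
}

const STANDARD_POLICY: RateLimitPolicy = { limit: 100, windowMs: 15 * 60 * 1000 };
const STRICT_POLICY: RateLimitPolicy = { limit: 10, windowMs: 60 * 60 * 1000 };

// Fixed-window limiter keyed by client (for example, IP address or user ID).
function createRateLimiter(
  policy: RateLimitPolicy,
  now: () => number,
): (clientKey: string) => boolean {
  const windows = new Map<string, { windowStart: number; count: number }>();
  return (clientKey: string): boolean => {
    const current = now();
    const state = windows.get(clientKey);
    if (state === undefined || current - state.windowStart >= policy.windowMs) {
      // First request, or the previous window has ended: start a new window.
      windows.set(clientKey, { windowStart: current, count: 1 });
      return true;
    }
    if (state.count >= policy.limit) return false; // over the limit
    state.count += 1;
    return true;
  };
}
```

A fixed window is the simplest variant; it permits short bursts at window boundaries, which a sliding window or token bucket would smooth out.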

**Data Protection**:
- AES-256-GCM encryption for PII at rest
- bcrypt (cost 12) for password hashing
- SHA-256 hashing for tokens before storage
- Database encryption at rest (Neon)
- TLS in transit for all connections

**Secrets Management**:
- Kubernetes secrets for production
- Environment variable validation with Zod
- No hardcoded secrets in code
- Quarterly secret rotation

**Audit Trail**:
- Event sourcing for all auth events
- 7-year retention for compliance
- Immutable audit logs
- Correlation IDs for request tracing

## Deployment

```mermaid
graph TD
    subgraph "Kubernetes Cluster"
        subgraph "Ingress"
            LB[Load Balancer<br/>External IP]
            Traefik[Traefik Pods<br/>Replicas: 2]
        end

        subgraph "Services"
            IAM[IAM Service Pods<br/>Replicas: 2-10 HPA]
            Service1[Service 1 Pods<br/>Replicas: 2-10 HPA]
            Service2[Service 2 Pods<br/>Replicas: 2-10 HPA]
        end

        subgraph "Infrastructure"
            Redis[Redis Cluster<br/>3 Masters + 3 Slaves]
            Kafka[Kafka Cluster<br/>3 Brokers]
        end

        subgraph "Observability"
            Prom[Prometheus<br/>Replicas: 2]
            Loki[Loki<br/>Replicas: 2]
            Jaeger[Jaeger<br/>Replicas: 2]
            Grafana[Grafana<br/>Replicas: 2]
        end
    end

    subgraph "External"
        DB[(Neon PostgreSQL<br/>Serverless)]
    end

    LB --> Traefik
    Traefik --> IAM
    Traefik --> Service1
    Traefik --> Service2

    IAM --> Redis
    IAM --> Kafka
    IAM --> DB

    Service1 --> Redis
    Service1 --> Kafka
    Service1 --> DB

    Service2 --> Redis
    Service2 --> Kafka
    Service2 --> DB

    IAM -.->|metrics| Prom
    Service1 -.->|metrics| Prom
    Service2 -.->|metrics| Prom

    IAM -.->|logs| Loki
    Service1 -.->|logs| Loki
    Service2 -.->|logs| Loki

    IAM -.->|traces| Jaeger
    Service1 -.->|traces| Jaeger
    Service2 -.->|traces| Jaeger

    Prom --> Grafana
    Loki --> Grafana
    Jaeger --> Grafana

    style LB fill:#1565c0,stroke:#fff,stroke-width:2px,color:#fff
    style Traefik fill:#0f4c81,stroke:#fff,stroke-width:2px,color:#fff
    style IAM fill:#283593,stroke:#fff,stroke-width:2px,color:#fff
    style Service1 fill:#4527a0,stroke:#fff,stroke-width:2px,color:#fff
    style Service2 fill:#4527a0,stroke:#fff,stroke-width:2px,color:#fff
    style DB fill:#5e35b1,stroke:#fff,stroke-width:2px,color:#fff
    style Redis fill:#ef6c00,stroke:#fff,stroke-width:2px,color:#fff
    style Kafka fill:#2e7d32,stroke:#fff,stroke-width:2px,color:#fff
    style Prom fill:#c62828,stroke:#fff,stroke-width:2px,color:#fff
    style Loki fill:#d84315,stroke:#fff,stroke-width:2px,color:#fff
    style Jaeger fill:#e65100,stroke:#fff,stroke-width:2px,color:#fff
    style Grafana fill:#b71c1c,stroke:#fff,stroke-width:2px,color:#fff
```

### Deployment Strategy

**Rollout Strategy**:
- Rolling updates (maxSurge: 1, maxUnavailable: 0)
- Zero-downtime deployments
- Blue-green deployment for major releases
- Canary deployment for high-risk changes

**Auto-scaling**:
- Horizontal Pod Autoscaler (HPA)
- Min replicas: 2
- Max replicas: 10
- Target CPU: 70%
- Target Memory: 80%
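
The sketch below shows how these settings translate into a replica count. It follows the scaling rule described in the Kubernetes HPA documentation (`ceil(currentReplicas * currentValue / targetValue)`, taking the largest result across metrics, with a default tolerance of 0.1) but simplifies it: readiness handling and stabilization windows are omitted, and `desiredReplicas` and `HPA_CONFIG` are hypothetical names.

```typescript
interface HpaConfig {
  minReplicas: number;
  maxReplicas: number;
  targetCpuPercent: number;
  targetMemoryPercent: number;
}

// Values from the auto-scaling settings in this document.
const HPA_CONFIG: HpaConfig = {
  minReplicas: 2,
  maxReplicas: 10,
  targetCpuPercent: 70,
  targetMemoryPercent: 80,
};

const HPA_TOLERANCE = 0.1; // Kubernetes default: ignore ratios within 10% of target

// Desired replicas for one metric, before clamping to min/max.
function desiredForMetric(currentReplicas: number, current: number, target: number): number {
  const ratio = current / target;
  if (Math.abs(ratio - 1) <= HPA_TOLERANCE) return currentReplicas; // close enough: no change
  return Math.ceil(currentReplicas * ratio);
}

// With several metrics, the HPA uses the largest desired replica count.
function desiredReplicas(
  currentReplicas: number,
  cpuPercent: number,
  memoryPercent: number,
  config: HpaConfig = HPA_CONFIG,
): number {
  const wanted = Math.max(
    desiredForMetric(currentReplicas, cpuPercent, config.targetCpuPercent),
    desiredForMetric(currentReplicas, memoryPercent, config.targetMemoryPercent),
  );
  return Math.min(config.maxReplicas, Math.max(config.minReplicas, wanted));
}
```

For example, 4 replicas at 105% CPU and 40% memory scale to 6, because the CPU ratio (105 / 70 = 1.5) dominates.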

**Resource Allocation**:
| Service | Requests | Limits |
|---------|----------|--------|
| **Microservices** | 256Mi RAM, 250m CPU | 512Mi RAM, 500m CPU |
| **Traefik** | 512Mi RAM, 500m CPU | 1Gi RAM, 1000m CPU |
| **Redis** | 2Gi RAM, 1 CPU | 4Gi RAM, 2 CPU |
| **Prometheus** | 4Gi RAM, 2 CPU | 8Gi RAM, 4 CPU |

**Health Checks**:
- Liveness probe: `/health/live` (Kubernetes restarts the pod if it fails)
- Readiness probe: `/health/ready` (Kubernetes removes the pod from load balancing if it fails)
- Startup probe: `/health/live` (initial delay 30s)

**Environments**:
- **Local**: Docker Compose
- **Staging**: Kubernetes cluster (shared)
- **Production**: Kubernetes cluster (dedicated)

## Monitoring & Observability

### Key Metrics

**Application Metrics**:
- `http_requests_total` - Total HTTP requests (counter)
- `http_request_duration_seconds` - Request duration (histogram)
- `http_requests_active` - Active requests (gauge)
- `cache_hits_total` / `cache_misses_total` - Cache performance
- `db_query_duration_seconds` - Database query duration

**Infrastructure Metrics**:
- CPU usage, Memory usage per pod
- Network I/O, Disk I/O
- Pod restart count
- Node resource utilization

**Business Metrics**:
- User registrations per day
- Login success/failure rate
- API usage by endpoint
- Error rate by service

**Health Checks**:
- `/health/live` - Liveness probe (service running?)
- `/health/ready` - Readiness probe (ready for traffic?)
- `/metrics` - Prometheus metrics endpoint

**Alerting Rules**:
```yaml
# High error rate: more than 5% of requests return 5xx
- alert: HighErrorRate
  expr: |
    sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
      / sum by (job) (rate(http_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning

# High latency: P95 above 500ms
- alert: HighLatency
  expr: histogram_quantile(0.95, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
  for: 5m
  labels:
    severity: warning

# Service down
- alert: ServiceDown
  expr: up == 0
  for: 1m
  labels:
    severity: critical

# High memory usage (only containers that have a memory limit set)
- alert: HighMemoryUsage
  expr: container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0) > 0.85
  for: 5m
  labels:
    severity: warning
```

**Logging**:
- Structured JSON logging with Winston
- Correlation IDs for request tracing
- Log levels: error, warn, info, debug
- Log aggregation with Loki
- 7 days retention

**Distributed Tracing**:
- OpenTelemetry instrumentation
- Jaeger backend
- Trace sampling: 10% in production, 100% in staging
- Span attributes: service, operation, user_id, correlation_id
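
The sampling rates can be applied deterministically by deriving the decision from the trace ID, so every service that handles the same trace reaches the same verdict. The sketch below is a simplified illustration of that idea, not OpenTelemetry's own sampler implementation; `shouldSample` and `SAMPLING_RATIO` are hypothetical names, and trace IDs are assumed to be hex strings.

```typescript
// Fraction of traces kept per environment (values from the list above).
const SAMPLING_RATIO: Record<string, number> = {
  production: 0.1,
  staging: 1.0,
};

// Uses the last 7 hex digits of the trace ID (28 bits) as a uniform bucket
// and keeps the trace when the bucket falls below ratio * 2^28.
function shouldSample(traceId: string, ratio: number): boolean {
  const bucket = parseInt(traceId.slice(-7), 16); // 0 .. 2^28 - 1
  if (Number.isNaN(bucket)) return false;         // not a hex trace ID
  return bucket < ratio * 0x10000000;
}
```

Because the decision depends only on the trace ID and the ratio, no coordination between services is needed to keep a trace complete.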

## Related Documents

- [Event-Driven Architecture](./event-driven-architecture.md) - Event-driven architecture
- [Caching Architecture](./caching-architecture.md) - Caching strategy
- [Security Architecture](./security-architecture.md) - Security architecture
- [Observability Architecture](./observability-architecture.md) - Observability
- [Data Consistency Patterns](./data-consistency-patterns.md) - Data consistency patterns
- [Microservices Communication](./microservices-communication.md) - Microservices communication

## References

- [Microservices Patterns](https://microservices.io/patterns/index.html) - Microservices pattern catalog
- [Twelve-Factor App](https://12factor.net/) - Best practices for cloud-native apps
- [C4 Model](https://c4model.com/) - Software architecture diagrams
- [Kubernetes Documentation](https://kubernetes.io/docs/) - Kubernetes official docs
- [Traefik Documentation](https://doc.traefik.io/traefik/) - Traefik official docs

---

**Last Updated**: 2026-01-14
**Authors**: GoodGo Architecture Team
**Reviewers**: GoodGo Development Team

## Quick Tips

### Mermaid Common Issues
- **Arrow Syntax**: Use `-->` for solid arrows, `-.->` for dotted arrows.
- **Node IDs**: Avoid spaces and special characters in IDs (e.g., `NodeA` or `Node_A`, not `Node A`).
- **Subgraphs**: Ensure `subgraph` names are unique and descriptive.

### Color Pattern Quick Reference
| Element | Dark Color | Text Color |
|---------|------------|------------|
| **Blue (Primary)** | `#0f4c81` | `#ffffff` |
| **Purple (DB)** | `#5e35b1` | `#ffffff` |
| **Orange (Cache)** | `#ef6c00` | `#ffffff` |
| **Green (Success)** | `#2e7d32` | `#ffffff` |
| **Red (Alert)** | `#c62828` | `#ffffff` |

### Visual Indicators
- ✅ **Recommended**
- ❌ **Not recommended**
- ⚠️ **Warning**