# Observability Architecture

> **Note**: End-to-end observability with metrics, logging, and tracing.

## Overview Diagram

```mermaid
graph TD
    subgraph "Services"
        Service1[Service A]
        Service2[Service B]
    end

    subgraph "Metrics"
        Service1 -->|/metrics| Prom[Prometheus]
        Service2 -->|/metrics| Prom
        Prom --> Grafana[Grafana<br/>Dashboards]
    end

    subgraph "Logging"
        Service1 -->|JSON Logs| Loki
        Service2 -->|JSON Logs| Loki
        Loki --> GrafanaLogs[Grafana<br/>Log Explorer]
    end

    subgraph "Tracing"
        Service1 -->|Spans| Jaeger
        Service2 -->|Spans| Jaeger
        Jaeger --> JaegerUI[Jaeger UI]
    end

    classDef service fill:#2D3748,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef metrics fill:#2F855A,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef logging fill:#C05621,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef tracing fill:#2C5282,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef dashboard fill:#4A5568,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;

    class Service1,Service2 service;
    class Prom metrics;
    class Loki logging;
    class Jaeger,JaegerUI tracing;
    class Grafana,GrafanaLogs dashboard;
```

## System Context

```mermaid
C4Context
    title Observability Context Diagram

    Person(dev, "Developer", "Uses dashboards to monitor the system")
    Person(sre, "SRE", "Manages infrastructure & alerts")

    System(obs, "Observability Stack", "Prometheus, Loki, Jaeger, Grafana")

    System_Ext(service, "Microservices", "Sends telemetry data")
    System_Ext(k8s, "Kubernetes", "Sends cluster metrics")

    Rel(dev, obs, "Views Dashboards", "HTTPS")
    Rel(sre, obs, "Configures Alerts", "HTTPS")
    Rel(service, obs, "Push/Pull Telemetry", "HTTP/gRPC")
    Rel(k8s, obs, "Exposes Metrics", "HTTP")

    UpdateElementStyle(dev, $fontColor="white", $bgColor="#2D3748", $borderColor="white")
    UpdateElementStyle(sre, $fontColor="white", $bgColor="#2D3748", $borderColor="white")
    UpdateElementStyle(obs, $fontColor="white", $bgColor="#2C5282", $borderColor="white")
    UpdateElementStyle(service, $fontColor="white", $bgColor="#4A5568", $borderColor="white")
    UpdateElementStyle(k8s, $fontColor="white", $bgColor="#4A5568", $borderColor="white")
```

### Context Description

- **Observability Stack**: The central hub that collects and visualizes telemetry (Prometheus, Loki, Jaeger, Grafana).
- **Microservices**: Emit logs, metrics, and traces (OpenTelemetry).
- **Developer/SRE**: Use Grafana to monitor system health and debug issues.

## The Three Pillars of Observability

### 1. Metrics (Prometheus + Grafana)

```mermaid
graph LR
    Service[Service] -->|Expose /metrics| Prom[Prometheus]
    Prom -->|Scrape every 15s| Metrics[Time Series DB]
    Metrics --> Grafana[Grafana]
    Grafana --> Dashboard1[Request Dashboard]
    Grafana --> Dashboard2[Error Dashboard]
    Grafana --> Dashboard3[Performance Dashboard]

    classDef default fill:#2D3748,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef prom fill:#2F855A,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef grafana fill:#2C5282,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;

    class Prom prom;
    class Grafana grafana;
```

**Description**: Numeric measurements recorded over time (requests/sec, latency, errors).

**Implementation**:
```typescript
// Assumes an Express application; `req.route` is Express-specific.
import type { NextFunction, Request, Response } from 'express';
import { Counter, Gauge, Histogram } from 'prom-client';

// HTTP request metrics
export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.001, 0.01, 0.05, 0.1, 0.5, 1, 2, 5],
});

export const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
});

export const activeRequests = new Gauge({
  name: 'http_requests_active',
  help: 'Number of active HTTP requests',
});

// Middleware that records metrics for every request
export function metricsMiddleware(req: Request, res: Response, next: NextFunction): void {
  const start = Date.now();
  activeRequests.inc();

  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    // Falling back to the raw path can create high-cardinality labels for unmatched routes.
    const labels = {
      method: req.method,
      route: req.route?.path || req.path,
      status: res.statusCode,
    };

    httpRequestDuration.observe(labels, duration);
    httpRequestTotal.inc(labels);
  });

  // 'close' also fires when the client disconnects early, so the gauge cannot leak.
  res.on('close', () => activeRequests.dec());

  next();
}
```

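To make the histogram buckets more concrete, the following sketch shows how observed durations are counted into cumulative `le` buckets and how a quantile can be estimated from them by linear interpolation, in the spirit of PromQL's `histogram_quantile`. It is a simplified illustration, not the actual implementation in prom-client or Prometheus; `cumulativeCounts` and `estimateQuantile` are hypothetical helper names.

```typescript
// Upper bounds copied from the http_request_duration_seconds histogram (seconds).
const BUCKETS = [0.001, 0.01, 0.05, 0.1, 0.5, 1, 2, 5];

// Count how many observations are <= each upper bound (cumulative buckets).
// The final entry plays the role of the implicit +Inf bucket.
function cumulativeCounts(durations: number[], bounds: number[]): number[] {
  const counts = bounds.map((le) => durations.filter((d) => d <= le).length);
  counts.push(durations.length); // +Inf bucket holds every observation
  return counts;
}

// Estimate the q-quantile by linear interpolation inside the bucket that
// contains the target rank. Assumes 0 < q <= 1 and at least one observation.
function estimateQuantile(q: number, bounds: number[], counts: number[]): number {
  const total = counts[counts.length - 1];
  const rank = q * total;
  const i = counts.findIndex((c) => c >= rank);
  if (i >= bounds.length) {
    // Rank falls in the +Inf bucket: report the highest finite bound.
    return bounds[bounds.length - 1];
  }
  const lower = i === 0 ? 0 : bounds[i - 1];
  const below = i === 0 ? 0 : counts[i - 1];
  const inBucket = counts[i] - below;
  return lower + (bounds[i] - lower) * ((rank - below) / inBucket);
}

// Ten sample request durations in seconds (made-up values).
const durations = [0.004, 0.02, 0.03, 0.08, 0.2, 0.3, 0.4, 0.7, 0.9, 1.5];
const counts = cumulativeCounts(durations, BUCKETS);
const p95 = estimateQuantile(0.95, BUCKETS, counts);
console.log(counts, p95);
```

With these ten samples, rank 9.5 falls in the bucket between 1 s and 2 s, so the estimate is 1.5 s. The estimate can never be more precise than the bucket boundaries, which is why the bucket list should bracket the latency targets that matter.
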
### 2. Logging (Serilog + Loki)

```mermaid
sequenceDiagram
    participant Service
    participant Serilog as Serilog Logger
    participant Loki
    participant Grafana

    Service->>Serilog: Log event
    Serilog->>Serilog: Format JSON
    Serilog->>Serilog: Add metadata<br/>(correlation ID, trace ID)
    Serilog->>Loki: Push logs
    Loki->>Loki: Index & store

    User->>Grafana: Query logs
    Grafana->>Loki: LogQL query
    Loki-->>Grafana: Log results
```

**Description**: Structured logging with correlation IDs to trace requests.

**Implementation (.NET)**:
```csharp
using System.Diagnostics;
using Serilog;
using Serilog.Context;
using Serilog.Formatting.Json;
using Serilog.Sinks.Grafana.Loki;

// Program.cs - Serilog configuration
builder.Host.UseSerilog((context, config) => config
    .ReadFrom.Configuration(context.Configuration)
    .Enrich.FromLogContext()
    .Enrich.WithProperty("Service", serviceName)
    .Enrich.WithProperty("Environment", environment)
    .WriteTo.Console(new JsonFormatter())
    .WriteTo.GrafanaLoki(
        "http://loki:3100",
        labels: new[] { new LokiLabel { Key = "app", Value = serviceName } }
    ));

// Middleware - Add correlation ID
public class CorrelationIdMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<CorrelationIdMiddleware> _logger;

    // Both dependencies are supplied by the ASP.NET Core pipeline and DI container.
    public CorrelationIdMiddleware(RequestDelegate next, ILogger<CorrelationIdMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Reuse the caller's ID when present so one ID follows the request across services.
        var correlationId = context.Request.Headers["X-Correlation-Id"].FirstOrDefault()
            ?? Guid.NewGuid().ToString();

        context.Items["CorrelationId"] = correlationId;
        context.Response.Headers["X-Correlation-Id"] = correlationId;

        // Every log event written inside this scope carries the CorrelationId property.
        using (LogContext.PushProperty("CorrelationId", correlationId))
        {
            _logger.LogInformation("Request started: {Method} {Path}",
                context.Request.Method, context.Request.Path);

            var sw = Stopwatch.StartNew();
            await _next(context);
            sw.Stop();

            _logger.LogInformation("Request completed: {StatusCode} in {Duration}ms",
                context.Response.StatusCode, sw.ElapsedMilliseconds);
        }
    }
}
```

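For Node.js services, the same correlation-ID logic can be sketched in TypeScript. This is a framework-agnostic illustration that assumes the same `X-Correlation-Id` header as the .NET middleware; `resolveCorrelationId` and `logLine` are hypothetical helper names, not part of any library.

```typescript
import { randomUUID } from "node:crypto";

type HeaderMap = Record<string, string | string[] | undefined>;

// Reuse the caller's X-Correlation-Id if present, otherwise generate a new one.
// Node.js lower-cases incoming header names, so the lookup key is lower-case.
function resolveCorrelationId(headers: HeaderMap): string {
  const incoming = headers["x-correlation-id"];
  const first = Array.isArray(incoming) ? incoming[0] : incoming;
  return first && first.trim() !== "" ? first : randomUUID();
}

// Build a structured (JSON) log line that always carries the correlation ID.
function logLine(
  correlationId: string,
  message: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    correlationId,
    message,
    ...fields,
  });
}

// An ID supplied by the caller is reused; a missing one is generated.
const reused = resolveCorrelationId({ "x-correlation-id": "abc-123" });
const generated = resolveCorrelationId({});
console.log(logLine(reused, "Request started", { method: "GET", path: "/users" }));
console.log(logLine(generated, "Request started", { method: "GET", path: "/health" }));
```

Keeping the ID resolution separate from the logger makes it easy to forward the same value on outgoing HTTP calls, which is what lets Loki queries follow one request across services.
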
### 3. Tracing (OpenTelemetry + Jaeger)

```mermaid
graph LR
    Request[Incoming Request] --> Trace[Create Trace]
    Trace --> SpanA[Span: HTTP Request]
    SpanA --> SpanB[Span: DB Query]
    SpanA --> SpanC[Span: Cache Check]
    SpanA --> SpanD[Span: External API]

    SpanB --> Jaeger[Jaeger]
    SpanC --> Jaeger
    SpanD --> Jaeger

    Jaeger --> Timeline[Trace Timeline]

    classDef default fill:#2D3748,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef trace fill:#2C5282,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef jaeger fill:#2F855A,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;

    class Trace trace;
    class Jaeger jaeger;
```

**Description**: Distributed tracing to follow requests across services.

> [!NOTE]
> Distributed tracing with Jaeger is planned but not yet deployed. For now, correlation IDs are used for request tracking.

**Implementation (.NET with OpenTelemetry)**:
```csharp
// Program.cs - OpenTelemetry configuration (planned)
// Note: OpenTelemetry.Exporter.Jaeger is deprecated upstream; the OTLP exporter is the suggested replacement.
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("GoodGo.Users") // illustrative source name; must match the ActivitySource below
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddEntityFrameworkCoreInstrumentation()
        .AddJaegerExporter(options =>
        {
            options.AgentHost = "jaeger";
            options.AgentPort = 6831;
        }));

// Manual span creation
// StartActivity is an instance method, so the class needs its own ActivitySource.
// The name "GoodGo.Users" is illustrative, not taken from the codebase.
private static readonly ActivitySource ActivitySource = new("GoodGo.Users");

public async Task<User?> GetUserByIdAsync(Guid userId, CancellationToken ct)
{
    // activity is null when no listener is registered, hence the null-conditional calls.
    using var activity = ActivitySource.StartActivity("GetUserById");
    activity?.SetTag("user.id", userId.ToString());

    try
    {
        var user = await _context.Users.FindAsync([userId], ct);
        activity?.SetStatus(ActivityStatusCode.Ok);
        return user;
    }
    catch (Exception ex)
    {
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
        throw;
    }
}
```

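Across service boundaries, trace context is typically carried in the W3C `traceparent` HTTP header, whose layout is `version-traceid-parentid-flags`. The sketch below parses that header and derives the header for an outgoing call. It is a simplified illustration of the propagation idea, not the OpenTelemetry SDK (whose instrumentation normally handles this automatically); `parseTraceparent` and `childTraceparent` are hypothetical helper names.

```typescript
import { randomBytes } from "node:crypto";

interface TraceContext {
  traceId: string;  // 32 lower-case hex chars, shared by every span in the trace
  parentId: string; // 16 lower-case hex chars, the ID of the calling span
  sampled: boolean; // lowest bit of the trace-flags byte
}

const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parse a version-00 traceparent header; return null when it is malformed.
function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT.exec(header.trim());
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  // All-zero IDs are invalid per the W3C Trace Context specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Build the header for an outgoing request: same trace ID, fresh span ID.
function childTraceparent(ctx: TraceContext): string {
  const spanId = randomBytes(8).toString("hex");
  return `00-${ctx.traceId}-${spanId}-${ctx.sampled ? "01" : "00"}`;
}

const incoming = parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
const outgoing = incoming ? childTraceparent(incoming) : null;
console.log(incoming, outgoing);
```

Because the trace ID is preserved and only the span ID changes, Jaeger can stitch spans from different services into one timeline.
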
## Health Checks

```csharp
// Health checks (.NET)
using HealthChecks.UI.Client;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;

// Health check registration (before builder.Build())
builder.Services.AddHealthChecks()
    .AddNpgSql(connectionString, name: "database", tags: new[] { "ready" })
    .AddRedis(redisConnectionString, name: "redis", tags: new[] { "ready" });

// Endpoint mapping (after builder.Build())
app.MapHealthChecks("/health", new HealthCheckOptions
{
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    // Liveness: runs no checks, so it reports healthy whenever the process can respond
    Predicate = _ => false
});

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    // Readiness: runs only the dependency checks tagged "ready"
    Predicate = check => check.Tags.Contains("ready")
});
```

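For Node.js services, the liveness/readiness split can be sketched in TypeScript. This is an illustrative, framework-agnostic version of the tag-filtering idea; the `HealthCheck` shape and the `runChecks` helper are assumptions made for this example, not an existing library API.

```typescript
type Status = "Healthy" | "Unhealthy";

interface HealthCheck {
  name: string;
  tags: string[];
  check: () => Promise<boolean>; // resolves true when the dependency is reachable
}

interface HealthReport {
  status: Status;
  entries: Record<string, Status>;
}

// Run only the checks selected by the predicate and aggregate the result.
// With no selected checks the report is Healthy, which mirrors the liveness probe.
async function runChecks(
  checks: HealthCheck[],
  predicate: (c: HealthCheck) => boolean,
): Promise<HealthReport> {
  const entries: Record<string, Status> = {};
  for (const c of checks.filter(predicate)) {
    try {
      entries[c.name] = (await c.check()) ? "Healthy" : "Unhealthy";
    } catch {
      entries[c.name] = "Unhealthy"; // a throwing check counts as a failure
    }
  }
  const healthy = Object.values(entries).every((s) => s === "Healthy");
  return { status: healthy ? "Healthy" : "Unhealthy", entries };
}

// Placeholder checks; real ones would ping PostgreSQL and Redis.
const checks: HealthCheck[] = [
  { name: "database", tags: ["ready"], check: async () => true },
  { name: "redis", tags: ["ready"], check: async () => false },
];

const liveness = runChecks(checks, () => false);
const readiness = runChecks(checks, (c) => c.tags.includes("ready"));
readiness.then((report) => console.log(report));
```

Keeping liveness free of dependency checks means a database outage marks the pod as not ready without causing Kubernetes to restart it.
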
## Alerting Rules

```yaml
# Prometheus alerting rules
groups:
  - name: service_alerts
    interval: 30s
    rules:
      # High error rate: share of 5xx responses among all requests, per job
      - alert: HighErrorRate
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} (> 5%)"

      # High latency: P95 computed from the bucket rates over the last 5 minutes
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum by (le, job) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "P95 latency is {{ $value }}s"

      # Service down
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
```

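An error rate expressed as a percentage is the ratio of the 5xx request rate to the total request rate. The sketch below reproduces that arithmetic on raw counter samples so the 5% threshold is easier to reason about. It is a simplified model: the real PromQL `rate()` also handles counter resets and extrapolates to the window boundaries, and the `Sample` type and helper names are hypothetical.

```typescript
interface Sample {
  timestamp: number; // seconds
  value: number;     // monotonically increasing counter value
}

// Average per-second increase between the first and last sample in the window.
function simpleRate(samples: Sample[]): number {
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const elapsed = last.timestamp - first.timestamp;
  return elapsed > 0 ? (last.value - first.value) / elapsed : 0;
}

// Fraction of requests that failed with a 5xx status over the window.
function errorRatio(errors: Sample[], total: Sample[]): number {
  const totalRate = simpleRate(total);
  return totalRate > 0 ? simpleRate(errors) / totalRate : 0;
}

// Five minutes of made-up samples: 3000 requests in total, 180 of them 5xx.
const totalSamples: Sample[] = [
  { timestamp: 0, value: 10000 },
  { timestamp: 300, value: 13000 },
];
const errorSamples: Sample[] = [
  { timestamp: 0, value: 200 },
  { timestamp: 300, value: 380 },
];

const ratio = errorRatio(errorSamples, totalSamples);
const shouldAlert = ratio > 0.05;
console.log(ratio, shouldAlert);
```

Here 180 errors out of 3000 requests give a ratio of 6%, which is above the 5% threshold. Comparing an absolute error rate (errors per second) to 0.05 would instead fire or stay silent depending on traffic volume rather than on the share of failing requests.
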
## Performance Characteristics

### Performance Targets
| Metric | Target | Notes |
|--------|--------|-------|
| **Metric Scrape Interval** | 15s | Critical services |
| **Log Ingestion Latency** | < 1s | Time from emit to queryable |
| **Trace Sampling Rate** | 10% | Production (100% in Dev/Staging) |
| **Dashboard Load Time** | < 2s | P95 latency |
| **Alert Evaluation** | Every 1m | Evaluation interval |
| **Retention Policy** | 14 days | Logs & Traces (Metrics: 30 days) |

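A 10% trace sampling target is usually implemented as a deterministic decision on the trace ID, so that every service makes the same keep/drop choice for a given trace. The following sketch illustrates the idea under simplifying assumptions (it reads only the last 8 hex characters of the trace ID) and is not the exact algorithm of any OpenTelemetry SDK; `shouldSample` and `samplingRatio` are hypothetical helpers.

```typescript
// Decide whether to keep a trace, based only on its ID and the target ratio.
// The last 8 hex characters are read as an unsigned 32-bit integer and
// compared against a threshold proportional to the ratio.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  const tail = parseInt(traceId.slice(-8), 16);
  if (Number.isNaN(tail)) return false; // malformed ID: drop rather than guess
  return tail < Math.floor(ratio * 0x100000000);
}

// Ratios from the targets table: 10% in production, 100% elsewhere.
function samplingRatio(environment: string): number {
  return environment === "production" ? 0.1 : 1.0;
}

const traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
const keptInProd = shouldSample(traceId, samplingRatio("production"));
const keptInDev = shouldSample(traceId, samplingRatio("development"));
console.log(keptInProd, keptInDev);
```

Because the decision depends only on the trace ID, a trace is either recorded in full by every service or not at all, which avoids timelines with missing spans.
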
## Security Considerations

### Observability Security
- **Log Scrubbing**: Automatically strip PII (emails, SSNs, credit card numbers) and secrets from logs before ingestion.
- **Access Control**: Grafana is integrated with OAuth2/OIDC, with Viewer/Editor/Admin roles.
- **Network Policy**: Only traffic from internal namespaces is allowed to reach the ingestion ports (9090, 3100, 14268).
- **TLS**: Traffic between agents and collectors is encrypted.

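Log scrubbing can be sketched as a list of regular-expression redactions applied before a log line leaves the service. The patterns below are deliberately simple assumptions (US-style SSNs, 13 to 16 digit card numbers, `key=value` secrets) and would need tuning against real data; `scrub` and `RULES` are hypothetical names.

```typescript
interface Rule {
  pattern: RegExp;
  replacement: string;
}

// Rules are applied in order; each one rewrites the output of the previous rule.
const RULES: Rule[] = [
  { pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, replacement: "[REDACTED_EMAIL]" },
  { pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: "[REDACTED_SSN]" },
  { pattern: /\b(?:\d[ -]?){12,15}\d\b/g, replacement: "[REDACTED_CARD]" },
  { pattern: /\b(password|token|secret|api[_-]?key)=\S+/gi, replacement: "$1=[REDACTED]" },
];

// Return a copy of the message with every matching value redacted.
function scrub(message: string): string {
  return RULES.reduce((text, rule) => text.replace(rule.pattern, rule.replacement), message);
}

const scrubbed = scrub(
  "user=jane@example.com ssn=123-45-6789 card=4111 1111 1111 1111 password=hunter2",
);
console.log(scrubbed);
```

Pattern-based scrubbing is a safety net rather than a guarantee; avoiding sensitive fields in log templates in the first place remains the more reliable control.
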
## Deployment

```mermaid
graph TD
    subgraph "Kubernetes Monitoring Namespace"
        Grafana[Grafana]
        Prom[Prometheus Server]
        Loki[Loki Gateway]
        Jaeger[Jaeger Collector]
    end

    subgraph "App Namespace"
        App[Application Pods]
        Agent[Grafana Agent / Promtail]
    end

    App -->|Push Logs| Agent
    Agent -->|Push| Loki

    Prom -->|Pull Metrics| App
    Prom -->|Pull Metrics| Agent

    App -->|Push Traces| Jaeger

    Grafana --> Prom
    Grafana --> Loki
    Grafana --> Jaeger

    classDef k8s fill:#2D3748,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef app fill:#4A5568,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef grafana fill:#2C5282,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef loki fill:#C05621,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef jaeger fill:#2F855A,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
    classDef prom fill:#2F855A,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;

    class Grafana grafana;
    class Loki loki;
    class Jaeger jaeger;
    class Prom prom;
    class App,Agent app;
```

**Deployment Description**:
- **Agent**: Promtail or Grafana Agent runs as a DaemonSet or sidecar to collect logs.
- **Pull Model**: Prometheus scrapes metrics from the `/metrics` endpoints.
- **Push Model**: Traces and logs are pushed to the collectors.
- **Resources**: In production, the monitoring stack runs on dedicated nodes so that it does not affect the main workload.

## Related Documentation

- [System Design](./system-design.md) - Overall architecture
- [Caching Architecture](./caching-architecture.md) - Cache metrics

## Quick Tips

### Mermaid Common Issues

| Issue | Solution |
|-------|----------|
| **Parse Error** | Check for special characters like `()` or `[]` inside node text without quotes. Use `"text"` for complex strings. |
| **Color Not Showing** | Ensure `style` or `classDef` definitions are correct and IDs match. |
| **Arrow Direction** | `TD` = Top-Down, `LR` = Left-Right. Choose appropriately for layout. |

### Color Pattern Quick Reference

| Element | Color | Hex | Use Case |
|---------|-------|-----|----------|
| **Primary** | Dark Blue | `#2D3748` | System components, core services |
| **Secondary** | Grey | `#4A5568` | Supporting modules, libraries |
| **Accent** | Blue | `#2C5282` | Databases, external APIs |
| **Highlight** | Teal | `#285E61` | User interactions, highlights |
| **Success** | Green | `#2F855A` | Successful states, active |
| **Warning** | Orange | `#C05621` | Warning/Caution states |
| **Error** | Red | `#C53030` | Error states, failures |

### Visual Indicators

| Indicator | Meaning |
|-----------|---------|
| 🟢 | Safe / Recommended |
| 🟡 | Warning / Caution |
| 🔴 | Danger / Anti-pattern |
| 💡 | Tip / Best Practice |

---

**Last Updated**: 2026-01-14
**Author**: GoodGo Architecture Team