Observability Architecture
Note
Comprehensive observability with metrics, logging, and tracing.
Overview Diagram
graph TD
subgraph "Services"
Service1[Service A]
Service2[Service B]
end
subgraph "Metrics"
Service1 -->|/metrics| Prom[Prometheus]
Service2 -->|/metrics| Prom
Prom --> Grafana[Grafana<br/>Dashboards]
end
subgraph "Logging"
Service1 -->|JSON Logs| Loki
Service2 -->|JSON Logs| Loki
Loki --> GrafanaLogs[Grafana<br/>Log Explorer]
end
subgraph "Tracing"
Service1 -->|Spans| Jaeger
Service2 -->|Spans| Jaeger
Jaeger --> JaegerUI[Jaeger UI]
end
classDef service fill:#2D3748,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef metrics fill:#2F855A,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef logging fill:#C05621,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef tracing fill:#2C5282,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef dashboard fill:#4A5568,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
class Service1,Service2 service;
class Prom metrics;
class Loki logging;
class Jaeger,JaegerUI tracing;
class Grafana,GrafanaLogs dashboard;
System Context
C4Context
title Observability Context Diagram
Person(dev, "Developer", "Uses dashboards to monitor system")
Person(sre, "SRE", "Manages infrastructure & alerts")
System(obs, "Observability Stack", "Prometheus, Loki, Jaeger, Grafana")
System_Ext(service, "Microservices", "Sends telemetry data")
System_Ext(k8s, "Kubernetes", "Sends cluster metrics")
Rel(dev, obs, "Views Dashboards", "HTTPS")
Rel(sre, obs, "Configures Alerts", "HTTPS")
Rel(service, obs, "Push/Pull Telemetry", "HTTP/gRPC")
Rel(k8s, obs, "Exposes Metrics", "HTTP")
UpdateElementStyle(dev, $fontColor="white", $bgColor="#2D3748", $borderColor="white")
UpdateElementStyle(sre, $fontColor="white", $bgColor="#2D3748", $borderColor="white")
UpdateElementStyle(obs, $fontColor="white", $bgColor="#2C5282", $borderColor="white")
UpdateElementStyle(service, $fontColor="white", $bgColor="#4A5568", $borderColor="white")
UpdateElementStyle(k8s, $fontColor="white", $bgColor="#4A5568", $borderColor="white")
Context Description
- Observability Stack: The central hub that collects and visualizes telemetry (Prometheus, Loki, Jaeger, Grafana).
- Microservices: Emit logs, metrics, and traces (OpenTelemetry).
- Developer/SRE: Use Grafana to monitor system health and debug issues.
The Three Pillars of Observability
1. Metrics (Prometheus + Grafana)
graph LR
Service[Service] -->|Expose /metrics| Prom[Prometheus]
Prom -->|Scrape every 15s| Metrics[Time Series DB]
Metrics --> Grafana[Grafana]
Grafana --> Dashboard1[Request Dashboard]
Grafana --> Dashboard2[Error Dashboard]
Grafana --> Dashboard3[Performance Dashboard]
classDef default fill:#2D3748,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef prom fill:#2F855A,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef grafana fill:#2C5282,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
class Prom prom;
class Grafana grafana;
Description: Numeric measurements over time (requests/sec, latency, errors).
Implementation (Node.js with prom-client):
import type { NextFunction, Request, Response } from 'express';
import { Counter, Gauge, Histogram } from 'prom-client';
// HTTP request metrics
export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.001, 0.01, 0.05, 0.1, 0.5, 1, 2, 5]
});
export const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status']
});
export const activeRequests = new Gauge({
  name: 'http_requests_active',
  help: 'Number of active HTTP requests'
});
// Express middleware to track metrics
export function metricsMiddleware(req: Request, res: Response, next: NextFunction): void {
  const endTimer = httpRequestDuration.startTimer(); // measures elapsed time in seconds
  activeRequests.inc();
  res.on('finish', () => {
    // Label with the matched route template (e.g. /users/:id), never the raw path,
    // so unmatched URLs such as 404 probes cannot create unbounded label values
    const labels = {
      method: req.method,
      route: req.route?.path ?? 'unmatched',
      status: String(res.statusCode)
    };
    endTimer(labels);
    httpRequestTotal.inc(labels);
  });
  // 'close' fires after 'finish' and also when the client disconnects early,
  // so the gauge is decremented exactly once per request
  res.on('close', () => activeRequests.dec());
  next();
}
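The following sketch shows what the `buckets` option above produces. It is plain TypeScript with no prom-client dependency and models how a Prometheus-style histogram stores observations. Each bucket counts every observation less than or equal to its upper bound `le`, so the counts are cumulative. A `+Inf` bucket, `_sum` and `_count` are exposed alongside the buckets. The class name `CumulativeHistogram` is hypothetical, and this is not prom-client's internal implementation.

```typescript
// Hypothetical illustration of cumulative histogram buckets, not prom-client internals.
const BUCKETS = [0.001, 0.01, 0.05, 0.1, 0.5, 1, 2, 5]; // same boundaries as httpRequestDuration

class CumulativeHistogram {
  private readonly bounds: number[];
  private readonly counts: number[];
  private sum = 0;
  private count = 0;

  constructor(bounds: number[]) {
    this.bounds = bounds;
    this.counts = bounds.map(() => 0);
  }

  observe(seconds: number): void {
    this.sum += seconds;
    this.count += 1;
    // Cumulative: the observation is added to every bucket whose bound is >= the value
    this.bounds.forEach((le, i) => {
      if (seconds <= le) this.counts[i] += 1;
    });
  }

  // Render the series in the Prometheus text exposition format served on /metrics
  render(metricName: string): string {
    const lines = this.bounds.map((le, i) => `${metricName}_bucket{le="${le}"} ${this.counts[i]}`);
    lines.push(`${metricName}_bucket{le="+Inf"} ${this.count}`); // +Inf always equals the total count
    lines.push(`${metricName}_sum ${this.sum}`);
    lines.push(`${metricName}_count ${this.count}`);
    return lines.join("\n");
  }
}

// Usage: four requests taking 4 ms, 20 ms, 300 ms and 3 s
const h = new CumulativeHistogram(BUCKETS);
[0.004, 0.02, 0.3, 3].forEach((s) => h.observe(s));
console.log(h.render("http_request_duration_seconds"));
```

The `le="0.05"` line reports 2 because both the 4 ms and 20 ms requests took 50 ms or less. Quantiles are later estimated from exactly these cumulative counts.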
2. Logging (Serilog + Loki)
sequenceDiagram
actor User
participant Service
participant Serilog as Serilog Logger
participant Loki
participant Grafana
Service->>Serilog: Log event
Serilog->>Serilog: Format JSON
Serilog->>Serilog: Add metadata<br/>(correlation ID, trace ID)
Serilog->>Loki: Push logs
Loki->>Loki: Index & store
User->>Grafana: Query logs
Grafana->>Loki: LogQL query
Loki-->>Grafana: Log results
Description: Structured logging with correlation IDs, so every log entry for a request can be found across services.
Implementation (.NET):
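// Program.cs - Serilog configuration
using Serilog;
using Serilog.Formatting.Json;
using Serilog.Sinks.Grafana.Loki;
var builder = WebApplication.CreateBuilder(args);
// Assumption: service name and environment come from the host environment
var serviceName = builder.Environment.ApplicationName;
var environment = builder.Environment.EnvironmentName;
builder.Host.UseSerilog((context, config) => config
    .ReadFrom.Configuration(context.Configuration)
    .Enrich.FromLogContext() // picks up the CorrelationId pushed by the middleware below
    .Enrich.WithProperty("Service", serviceName)
    .Enrich.WithProperty("Environment", environment)
    .WriteTo.Console(new JsonFormatter())
    .WriteTo.GrafanaLoki(
        "http://loki:3100",
        labels: new[] { new LokiLabel { Key = "app", Value = serviceName } }));
// CorrelationIdMiddleware.cs - add a correlation ID to every request and its logs
using System.Diagnostics;
using Serilog.Context;
public class CorrelationIdMiddleware
{
    private const string HeaderName = "X-Correlation-Id";
    private readonly RequestDelegate _next;
    private readonly ILogger<CorrelationIdMiddleware> _logger;
    public CorrelationIdMiddleware(RequestDelegate next, ILogger<CorrelationIdMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }
    public async Task InvokeAsync(HttpContext context)
    {
        // Reuse the caller's ID if present so one request keeps one ID across services
        var correlationId = context.Request.Headers[HeaderName].FirstOrDefault()
            ?? Guid.NewGuid().ToString();
        context.Items["CorrelationId"] = correlationId;
        context.Response.Headers[HeaderName] = correlationId;
        // Every log event written inside this scope carries the CorrelationId property
        using (LogContext.PushProperty("CorrelationId", correlationId))
        {
            _logger.LogInformation("Request started: {Method} {Path}",
                context.Request.Method, context.Request.Path);
            var sw = Stopwatch.StartNew();
            await _next(context);
            sw.Stop();
            _logger.LogInformation("Request completed: {StatusCode} in {Duration}ms",
                context.Response.StatusCode, sw.ElapsedMilliseconds);
        }
    }
}
// Registration (early in the pipeline): app.UseMiddleware<CorrelationIdMiddleware>();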
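The correlation-ID flow itself needs only a few functions when ASP.NET Core is removed. This framework-agnostic TypeScript sketch reuses the caller's `X-Correlation-Id` value when present and otherwise creates a new one. It then stamps the ID onto each JSON log line and onto outgoing calls, so downstream services log the same ID. The function names, the injected `newId` generator, and the log field names are illustrative assumptions. Unlike the .NET middleware, this sketch also treats a blank header as missing.

```typescript
// Hypothetical, framework-agnostic sketch of correlation-ID propagation.
type HeaderMap = Record<string, string | undefined>;

const HEADER = "x-correlation-id"; // header names are case-insensitive; stored lower-case here

// Reuse the caller's ID so one request keeps one ID across services; otherwise mint a new one
function resolveCorrelationId(incoming: HeaderMap, newId: () => string): string {
  const value = incoming[HEADER]?.trim();
  return value ? value : newId();
}

// Structured (JSON) log line carrying the correlation ID as a queryable field
function logLine(
  correlationId: string,
  level: string,
  message: string,
  fields: Record<string, unknown> = {}
): string {
  return JSON.stringify({ level, message, CorrelationId: correlationId, ...fields });
}

// Headers to attach when this service calls another one
function outgoingHeaders(correlationId: string): HeaderMap {
  return { [HEADER]: correlationId };
}

// Usage with a deterministic stand-in for a UUID generator
let counter = 0;
const newId = () => `generated-${++counter}`;
const reused = resolveCorrelationId({ "x-correlation-id": "abc-123" }, newId);
const minted = resolveCorrelationId({}, newId);
console.log(logLine(reused, "Information", "Request started", { Method: "GET", Path: "/users" }));
console.log(outgoingHeaders(minted));
```

In Loki, a query that filters on the `CorrelationId` field then returns every line for one request, regardless of which service wrote it.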
3. Tracing (OpenTelemetry + Jaeger)
graph LR
Request[Incoming Request] --> Trace[Create Trace]
Trace --> SpanA[Span: HTTP Request]
SpanA --> SpanB[Span: DB Query]
SpanA --> SpanC[Span: Cache Check]
SpanA --> SpanD[Span: External API]
SpanB --> Jaeger[Jaeger]
SpanC --> Jaeger
SpanD --> Jaeger
Jaeger --> Timeline[Trace Timeline]
classDef default fill:#2D3748,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef trace fill:#2C5282,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef jaeger fill:#2F855A,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
class Trace trace;
class Jaeger jaeger;
Description: Distributed tracing to follow requests as they travel between services.
Note
Distributed tracing with Jaeger is planned but not yet deployed. Requests are currently tracked with correlation IDs.
Implementation (.NET with OpenTelemetry):
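// Program.cs - OpenTelemetry configuration (planned)
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource(UserService.ActivitySourceName) // without this, manual spans are never recorded
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddEntityFrameworkCoreInstrumentation()
        // Note: OpenTelemetry.Exporter.Jaeger is deprecated; Jaeger also accepts OTLP (AddOtlpExporter)
        .AddJaegerExporter(options =>
        {
            options.AgentHost = "jaeger";
            options.AgentPort = 6831;
        }));
// UserService.cs - manual span creation
using System.Diagnostics;
public class UserService
{
    // Hypothetical source name; it must match the AddSource(...) registration above
    public const string ActivitySourceName = "UserService";
    private static readonly ActivitySource ActivitySource = new(ActivitySourceName);
    private readonly AppDbContext _context; // hypothetical EF Core DbContext with a Users set
    public UserService(AppDbContext context) => _context = context;
    public async Task<User?> GetUserByIdAsync(Guid userId, CancellationToken ct)
    {
        // StartActivity returns null when no listener is registered, hence the ?. calls
        using var activity = ActivitySource.StartActivity("GetUserById");
        activity?.SetTag("user.id", userId.ToString());
        try
        {
            var user = await _context.Users.FindAsync([userId], ct);
            activity?.SetStatus(ActivityStatusCode.Ok);
            return user;
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            throw;
        }
    }
}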
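By default, OpenTelemetry propagates trace context between services in the W3C Trace Context `traceparent` header. This is how spans emitted by different services end up in one Jaeger trace. The sketch below parses and builds that header (`version-traceid-parentid-flags`). It also derives a child context that keeps the trace ID and sampling flag but carries a new span ID. It accepts only version `00` and is a simplified illustration, not the OpenTelemetry propagator. The function names and the child span ID in the usage example are hypothetical.

```typescript
// Simplified W3C Trace Context (traceparent) handling; version 00 only.
interface SpanContext {
  traceId: string;  // 32 lowercase hex chars, shared by every span in the trace
  spanId: string;   // 16 lowercase hex chars, unique per span
  sampled: boolean; // bit 0x01 of the trace-flags field
}

const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const ALL_ZEROS = /^0+$/;

function parseTraceparent(header: string): SpanContext | null {
  const match = TRACEPARENT.exec(header.trim());
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero trace or parent IDs are invalid according to the specification
  if (ALL_ZEROS.test(traceId) || ALL_ZEROS.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 };
}

function formatTraceparent(ctx: SpanContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// A child span inherits the trace ID and sampling decision but gets its own span ID
function childContext(parentCtx: SpanContext, newSpanId: string): SpanContext {
  return { ...parentCtx, spanId: newSpanId };
}

// Usage: the example header from the W3C specification
const inbound = parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
if (inbound) {
  const child = childContext(inbound, "b7ad6b7169203331"); // hypothetical new span ID
  console.log(formatTraceparent(child));
}
```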
Health Checks
// Health check (.NET)
app.MapHealthChecks("/health", new HealthCheckOptions
{
ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
Predicate = _ => false // Liveness - always return healthy
});
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
Predicate = check => check.Tags.Contains("ready")
});
// Health check registration
builder.Services.AddHealthChecks()
.AddNpgSql(connectionString, name: "database", tags: new[] { "ready" })
.AddRedis(redisConnectionString, name: "redis", tags: new[] { "ready" });
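The three endpoints above differ only in which registered checks they run. The TypeScript sketch below models that selection and the way ASP.NET Core aggregates the results. The overall status is the worst individual status, and an empty selection (as on `/health/live`) reports Healthy. With the default `ResultStatusCodes`, Healthy and Degraded map to HTTP 200 and Unhealthy maps to 503. The check objects are hypothetical stand-ins for the `database` and `redis` registrations.

```typescript
// Hypothetical model of ASP.NET Core health-check selection and aggregation.
type HealthStatus = "Healthy" | "Degraded" | "Unhealthy";

interface HealthCheck {
  name: string;
  tags: string[];
  run: () => HealthStatus;
}

// Higher number = worse; the report takes the worst status among the selected checks
const SEVERITY: Record<HealthStatus, number> = { Healthy: 0, Degraded: 1, Unhealthy: 2 };

function evaluate(
  checks: HealthCheck[],
  predicate: (check: HealthCheck) => boolean
): { status: HealthStatus; httpStatus: number } {
  let worst: HealthStatus = "Healthy"; // no selected checks => Healthy
  for (const check of checks.filter(predicate)) {
    const result = check.run();
    if (SEVERITY[result] > SEVERITY[worst]) worst = result;
  }
  // Default ResultStatusCodes: Healthy and Degraded -> 200, Unhealthy -> 503
  return { status: worst, httpStatus: worst === "Unhealthy" ? 503 : 200 };
}

// Stand-ins for the "database" and "redis" registrations, with Redis currently failing
const checks: HealthCheck[] = [
  { name: "database", tags: ["ready"], run: () => "Healthy" },
  { name: "redis", tags: ["ready"], run: () => "Unhealthy" },
];

const live = evaluate(checks, () => false);                                            // /health/live
const ready = evaluate(checks, (check) => check.tags.some((tag) => tag === "ready")); // /health/ready
const full = evaluate(checks, () => true);                                             // /health
console.log(live, ready, full);
```

With Redis down, `/health/live` still returns 200, so Kubernetes keeps the pod running. Meanwhile, `/health/ready` returns 503, so the pod is removed from Service endpoints until Redis recovers.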
Alerting Rules
# Prometheus alerting rules
groups:
  - name: service_alerts
    interval: 30s
    rules:
      # High error rate: share of 5xx responses among all responses
      - alert: HighErrorRate
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} (> 5%)"
      # High latency: P95 estimated from per-second bucket rates
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "P95 latency is {{ $value }}s"
      # Service down
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
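`histogram_quantile()` never sees individual latencies, only cumulative `le` bucket counts, so a P95 is an estimate. This simplified TypeScript sketch follows the documented approach. It computes the target rank `q * total` and finds the first bucket whose cumulative count reaches that rank. It then interpolates linearly between that bucket's lower and upper bounds. If the rank falls in the `+Inf` bucket, it returns the highest finite bound. PromQL applies this calculation to per-bucket rates, so bucket series are normally wrapped in `rate()` and aggregated `by (le)` first. Edge cases such as NaN inputs and non-monotonic buckets are deliberately left out, and the bucket values in the usage example are made up.

```typescript
// Simplified estimate in the style of PromQL's histogram_quantile(); assumes 0 < q <= 1.
interface Bucket {
  le: number;    // upper bound; the last bucket must be Infinity (+Inf)
  count: number; // cumulative count (or per-second rate) of observations <= le
}

function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (buckets.length < 2) return NaN; // need at least one finite bucket plus +Inf
  const total = buckets[buckets.length - 1].count; // the +Inf bucket holds every observation
  if (total === 0) return NaN;
  const rank = q * total;
  for (let i = 0; i < buckets.length; i++) {
    if (buckets[i].count < rank) continue;
    // Rank falls in +Inf: there is no upper bound to interpolate towards
    if (buckets[i].le === Infinity) return buckets[i - 1].le;
    const lower = i === 0 ? 0 : buckets[i - 1].le;
    const countBelow = i === 0 ? 0 : buckets[i - 1].count;
    const inBucket = buckets[i].count - countBelow;
    // Linear interpolation inside the bucket that contains the rank
    return lower + (buckets[i].le - lower) * ((rank - countBelow) / inBucket);
  }
  return NaN;
}

// Usage with made-up cumulative request counts over a 5-minute window
const latency: Bucket[] = [
  { le: 0.1, count: 60 },
  { le: 0.5, count: 90 },
  { le: 1, count: 97 },
  { le: 2, count: 100 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, latency)); // about 0.857 s, below a 1 s threshold
```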
Performance Characteristics
Performance Targets
| Metric | Target | Notes |
|---|---|---|
| Metric Scrape Interval | 15s | Critical services |
| Log Ingestion Latency | < 1s | Time from emit to queryable |
| Trace Sampling Rate | 10% | Production (100% in Dev/Staging) |
| Dashboard Load Time | < 2s | P95 Latency |
| Alert Evaluation | Every 1m | Evaluation interval |
| Retention Policy | 14 days | Logs & Traces (Metrics: 30 days) |
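A 10% sampling rate keeps traces complete only if every service makes the same keep/drop decision for a given trace. A common way to achieve this is to derive the decision deterministically from the trace ID, which is the idea behind OpenTelemetry's `TraceIdRatioBased` sampler. The sketch below assumes trace IDs are random. It reads the last 8 hex digits of the trace ID as a number in [0, 2^32) and keeps the trace when that number is below `ratio * 2^32`. This illustrates the idea and is not the exact algorithm of any OpenTelemetry SDK. The function name and the synthetic trace IDs are hypothetical.

```typescript
// Hypothetical deterministic head sampler keyed on the trace ID (idea of TraceIdRatioBased).
const TWO_POW_32 = 0x100000000;

function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio <= 0) return false;
  if (ratio >= 1) return true; // e.g. 100% in Dev/Staging
  // Trace IDs are random, so their low 32 bits are (close to) uniformly distributed
  const value = parseInt(traceId.slice(-8), 16);
  return value < ratio * TWO_POW_32;
}

// Usage: estimate the kept fraction over 10,000 synthetic trace IDs at the production ratio
let kept = 0;
for (let i = 0; i < 10000; i++) {
  // Spread sequential numbers over 32 bits with Knuth's multiplicative hash, then hex-encode
  const low = (i * 2654435761) % TWO_POW_32;
  const traceId = "0af7651916cd43dd8448eb21" + ("00000000" + low.toString(16)).slice(-8);
  if (shouldSample(traceId, 0.1)) kept++;
}
console.log(kept / 10000); // close to 0.1
```

Because the decision depends only on the trace ID, a downstream service that sees the same ID reaches the same decision without any coordination.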
Security Considerations
Observability Security
- Log Scrubbing: Automatically remove PII (emails, SSNs, credit card numbers) and secrets from logs before ingestion.
- Access Control: Grafana is integrated with OAuth2/OIDC, with Viewer/Editor/Admin roles.
- Network Policy: Only traffic from internal namespaces may reach the ingestion ports (9090 Prometheus, 3100 Loki, 14268 Jaeger collector).
- TLS: Encrypt traffic between agents and collectors.
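Log scrubbing usually runs in the logging pipeline before events leave the service. The TypeScript sketch below applies a list of regex rules to a message. The patterns cover emails, US-style SSNs (`123-45-6789`), card numbers of 13 to 16 digits, and `key=value` secrets. They are simplified assumptions rather than a complete PII policy. The card rule will also mask other long digit runs, such as millisecond timestamps. In a .NET service, the same idea could live in a custom Serilog enricher or formatter. `RULES` and `scrub` are hypothetical names.

```typescript
// Hypothetical regex-based log scrubber; patterns are simplified, not an exhaustive PII policy.
const RULES: Array<[RegExp, string]> = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],    // US SSN format
  [/\b(?:\d[ -]?){12,15}\d\b/g, "[CARD]"], // 13-16 digits, optional space/hyphen separators
  // key=value or "key": "value" secrets; keeps the key, masks the value
  [/(password|secret|token|api[_-]?key)(["']?\s*[:=]\s*["']?)[^\s"',]+/gi, "$1$2[REDACTED]"],
];

function scrub(message: string): string {
  // Apply every rule in order; replace() with a /g regex replaces all matches
  return RULES.reduce((text, [pattern, replacement]) => text.replace(pattern, replacement), message);
}

console.log(scrub("user alice@example.com paid with 4111 1111 1111 1111, ssn 123-45-6789, password=hunter2"));
// → "user [EMAIL] paid with [CARD], ssn [SSN], password=[REDACTED]"
```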
Deployment
graph TD
subgraph "Kubernetes Monitoring Namespace"
Grafana[Grafana]
Prom[Prometheus Server]
Loki[Loki Gateway]
Jaeger[Jaeger Collector]
end
subgraph "App Namespace"
App[Application Pods]
Agent[Grafana Agent / Promtail]
end
App -->|Push Logs| Agent
Agent -->|Push| Loki
Prom -->|Pull Metrics| App
Prom -->|Pull Metrics| Agent
App -->|Push Traces| Jaeger
Grafana --> Prom
Grafana --> Loki
Grafana --> Jaeger
classDef k8s fill:#2D3748,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef app fill:#4A5568,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef grafana fill:#2C5282,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef loki fill:#C05621,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef jaeger fill:#2F855A,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
classDef prom fill:#2F855A,stroke:#FFFFFF,stroke-width:2px,color:#FFFFFF;
class Grafana grafana;
class Loki loki;
class Jaeger jaeger;
class Prom prom;
class App,Agent app;
Deployment Notes:
- Agent: Promtail or Grafana Agent runs as a DaemonSet or sidecar to collect logs.
- Pull Model: Prometheus scrapes metrics from each service's /metrics endpoint.
- Push Model: Traces and logs are pushed to collectors.
- Resources: In production, the monitoring stack runs on dedicated nodes so it does not compete with the main workloads.
Related Documents
- System Design - Overall architecture
- Caching Architecture - Cache metrics
Quick Tips
Mermaid Common Issues
| Issue | Solution |
|---|---|
| Parse Error | Check for special characters like () or [] inside node text without quotes. Use "text" for complex strings. |
| Color Not Showing | Ensure style or classDef definitions are correct and IDs match. |
| Arrow Direction | TD = Top-Down, LR = Left-Right. Choose appropriately for layout. |
Color Pattern Quick Reference
| Element | Color | Hex | Use Case |
|---|---|---|---|
| Primary | Dark Blue | #2D3748 | System components, core services |
| Secondary | Grey | #4A5568 | Supporting modules, libraries |
| Accent | Blue | #2C5282 | Databases, external APIs |
| Highlight | Teal | #285E61 | User interactions, highlights |
| Success | Green | #2F855A | Successful states, active |
| Warning | Orange | #C05621 | Warning/Caution states |
| Error | Red | #C53030 | Error states, failures |
Visual Indicators
| Indicator | Meaning |
|---|---|
| 🟢 | Safe / Recommended |
| 🟡 | Warning / Caution |
| 🔴 | Danger / Anti-pattern |
| 💡 | Tip / Best Practice |
Last Updated: 2026-01-14
Author: GoodGo Architecture Team