Observability & Monitoring / Khả Năng Quan Sát & Giám Sát
EN: Observability and monitoring patterns for GoodGo microservices. Use when adding metrics, implementing logging, setting up tracing, creating health checks, or debugging production issues.
VI: Các pattern observability và monitoring cho microservices GoodGo. Sử dụng khi thêm metrics, triển khai logging, thiết lập tracing, tạo health checks, hoặc debug các vấn đề production.
Overview / Tổng Quan
EN: This skill covers the three pillars of observability (logs, metrics, traces) and how to implement them in GoodGo microservices. It includes structured logging, Prometheus metrics, distributed tracing with OpenTelemetry, health checks, and error tracking.
VI: Skill này bao gồm ba trụ cột của observability (logs, metrics, traces) và cách triển khai chúng trong microservices GoodGo. Nó bao gồm structured logging, Prometheus metrics, distributed tracing với OpenTelemetry, health checks, và error tracking.
When to Use / Khi Nào Sử Dụng
EN: Use this skill when:
- Setting up logging infrastructure
- Implementing metrics collection
- Adding distributed tracing
- Creating health check endpoints
- Setting up monitoring dashboards
- Debugging production issues
- Implementing alerting rules
- Analyzing performance bottlenecks
VI: Sử dụng skill này khi:
- Thiết lập hạ tầng logging
- Triển khai thu thập metrics
- Thêm distributed tracing
- Tạo health check endpoints
- Thiết lập monitoring dashboards
- Debug các vấn đề production
- Triển khai alerting rules
- Phân tích performance bottlenecks
Key Concepts / Khái Niệm Chính
Three Pillars of Observability / Ba Trụ Cột Của Observability
EN:
- Logs: Event records for debugging and auditing
- Metrics: Numerical measurements over time (counters, gauges, histograms)
- Traces: Request flow across services (distributed tracing)
VI:
- Logs: Bản ghi sự kiện để debug và audit
- Metrics: Đo lường số học theo thời gian (counters, gauges, histograms)
- Traces: Luồng request qua các services (distributed tracing)
Tech Stack / Công Nghệ
EN:
- Logging: @goodgo/logger (Pino-based structured logging)
- Metrics: Prometheus + Grafana
- Tracing: OpenTelemetry + Jaeger (@goodgo/tracing)
- Correlation IDs: Request tracking across services
VI:
- Logging: @goodgo/logger (structured logging dựa trên Pino)
- Metrics: Prometheus + Grafana
- Tracing: OpenTelemetry + Jaeger (@goodgo/tracing)
- Correlation IDs: Theo dõi request qua các services
Common Patterns / Các Pattern Thường Dùng
Structured Logging / Logging Có Cấu Trúc
EN: Use structured logging with correlation IDs for request tracking.
VI: Sử dụng structured logging với correlation IDs để theo dõi request.
Example from codebase: services/iam-service/src/middlewares/logger.middleware.ts
import { Request, Response, NextFunction } from 'express';
import { logger } from '@goodgo/logger';
import { getCorrelationId, getRequestId } from './correlation.middleware';

export const requestLogger = (req: Request, res: Response, next: NextFunction): void => {
  // Skip detailed logging for health checks and metrics
  if (req.path.startsWith('/health') || req.path.startsWith('/metrics')) {
    return next();
  }

  const start = Date.now();

  // Log once the response has been sent, so status code and duration are known
  res.on('finish', () => {
    const duration = Date.now() - start;
    const correlationId = getCorrelationId(req);
    const requestId = getRequestId(req);

    logger.info('Request processed / Request đã xử lý', {
      correlationId,
      requestId,
      method: req.method,
      path: req.path,
      query: req.query,
      statusCode: res.statusCode,
      duration: `${duration}ms`,
      contentLength: res.get('Content-Length') || 0,
      userAgent: req.get('User-Agent'),
      ip: req.ip,
      userId: (req as any).user?.userId,
    });
  });

  next();
};
Correlation IDs / Correlation IDs
EN: Use correlation IDs to track requests across services.
VI: Sử dụng correlation IDs để theo dõi request qua các services.
Example from codebase: services/iam-service/src/middlewares/correlation.middleware.ts
import { Request, Response, NextFunction } from 'express';
import { randomUUID } from 'crypto';
import { logger } from '@goodgo/logger';

export const CORRELATION_ID_HEADER = 'x-correlation-id';
export const REQUEST_ID_HEADER = 'x-request-id';

export const correlationMiddleware = (
  options: {
    headerName?: string;
    generateId?: () => string;
    skipPaths?: string[];
  } = {}
) => {
  const {
    headerName = CORRELATION_ID_HEADER,
    generateId = randomUUID,
    skipPaths = ['/health', '/metrics', '/favicon.ico'],
  } = options;

  return (req: Request, res: Response, next: NextFunction) => {
    // Get correlation ID from header or generate new one
    // (a header value may be a string or a string array, so take the first entry)
    const headerValue = req.headers[headerName.toLowerCase()];
    const correlationId = (Array.isArray(headerValue) ? headerValue[0] : headerValue) || generateId();
    const requestId = generateId();

    // Attach to request object
    // (requires the Express Request type to be augmented with these two fields)
    req.correlationId = correlationId;
    req.requestId = requestId;

    // Add to response headers
    res.setHeader(headerName, correlationId);
    res.setHeader(REQUEST_ID_HEADER, requestId);

    // Log request start, except for noisy paths listed in skipPaths
    // (assumption: skipPaths is meant to suppress only this log line)
    if (!skipPaths.some((path) => req.path.startsWith(path))) {
      logger.info('Request started / Request bắt đầu', {
        correlationId,
        requestId,
        method: req.method,
        url: req.url,
        userAgent: req.get('User-Agent'),
        ip: req.ip,
      });
    }

    next();
  };
};
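The logger and metrics middlewares import `getCorrelationId` and `getRequestId` from this module, but the excerpt does not show them. A minimal sketch, assuming they simply read the values the middleware attached to the request; the `TrackedRequest` type is a hypothetical stand-in for an Express `Request` carrying those two properties, and the real helpers may differ.

```typescript
// Hypothetical minimal shape; in the service this would be an augmented Express Request.
interface TrackedRequest {
  correlationId?: string;
  requestId?: string;
}

// Return the correlation ID attached by the middleware, or undefined if none was set.
const getCorrelationId = (req: TrackedRequest): string | undefined => req.correlationId;

// Return the per-request ID attached by the middleware, or undefined if none was set.
const getRequestId = (req: TrackedRequest): string | undefined => req.requestId;
```

Returning `undefined` rather than throwing lets callers choose their own fallback, as the metrics middleware does with `|| 'unknown'`.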
Metrics Collection / Thu Thập Metrics
EN: Expose Prometheus metrics for monitoring and alerting.
VI: Expose Prometheus metrics để monitoring và alerting.
Example from codebase: services/iam-service/src/middlewares/metrics.middleware.ts
import { Request, Response, NextFunction } from 'express';
import client from 'prom-client';

// Use the default global registry
const register = client.register;

// Collect default metrics
client.collectDefaultMetrics({ register });

// Create histogram for HTTP request duration.
// Correlation IDs are deliberately not a label: they are unique per request,
// so each one would create a new time series (unbounded cardinality).
const httpRequestDurationSeconds = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10],
});

// Create counter for total HTTP requests
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
});

// Create gauge for active requests
const activeRequests = new client.Gauge({
  name: 'http_active_requests',
  help: 'Number of active HTTP requests',
});

// Create counter for HTTP request errors
const httpRequestErrors = new client.Counter({
  name: 'http_request_errors_total',
  help: 'Total number of HTTP request errors',
  labelNames: ['method', 'route', 'error_type'],
});

export const metricsMiddleware = (req: Request, res: Response, next: NextFunction) => {
  // Increment active requests
  activeRequests.inc();

  // Start timer
  const start = process.hrtime.bigint();

  res.on('finish', () => {
    // Decrement active requests
    activeRequests.dec();

    // Calculate duration (hrtime is in nanoseconds)
    const end = process.hrtime.bigint();
    const durationInSeconds = Number(end - start) / 1e9;

    // Normalize path to avoid high cardinality
    const route = normalizeRoutePath(req);
    const statusCode = res.statusCode.toString();

    // Record duration
    httpRequestDurationSeconds
      .labels(req.method, route, statusCode)
      .observe(durationInSeconds);

    // Increment request counter
    httpRequestsTotal
      .labels(req.method, route, statusCode)
      .inc();

    // Track errors
    if (res.statusCode >= 400) {
      const errorType = res.statusCode >= 500 ? 'server_error' : 'client_error';
      httpRequestErrors
        .labels(req.method, route, errorType)
        .inc();
    }
  });

  next();
};

// Normalize route path to prevent high cardinality metrics
function normalizeRoutePath(req: Request): string {
  // Prefer the matched Express route pattern (e.g. /users/:id) when available
  if (req.route && req.route.path) {
    return req.route.path;
  }

  let path = req.path;
  // Replace UUIDs and numeric IDs with placeholders
  path = path.replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, ':uuid');
  path = path.replace(/\d+/g, ':id');
  return path;
}
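The fallback branch of `normalizeRoutePath` replaces every run of digits, so a version segment such as `/api/v1` also becomes `/api/v:id`. Cardinality stays low either way, but the labels get harder to read. One possible variant, shown as a standalone sketch (the name `normalizePath` is hypothetical and not part of the codebase), replaces only whole path segments:

```typescript
// Match a UUID or an all-digit value only when it fills an entire path segment.
const UUID_SEGMENT = /\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}(?=\/|$)/gi;
const NUMERIC_SEGMENT = /\/\d+(?=\/|$)/g;

// Replace ID-like segments with placeholders, leaving segments such as "v1" untouched.
function normalizePath(path: string): string {
  return path.replace(UUID_SEGMENT, '/:uuid').replace(NUMERIC_SEGMENT, '/:id');
}

// normalizePath('/api/v1/users/42')  → '/api/v1/users/:id'
// normalizePath('/users/123e4567-e89b-12d3-a456-426614174000/orders/7')  → '/users/:uuid/orders/:id'
```

The lookahead `(?=\/|$)` checks for the end of a segment without consuming the following slash, so consecutive ID segments are each replaced.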
Distributed Tracing / Distributed Tracing
EN: Use OpenTelemetry for distributed tracing across services.
VI: Sử dụng OpenTelemetry cho distributed tracing qua các services.
Example from codebase: packages/tracing/src/index.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';

export interface TracingConfig {
  serviceName: string;
  jaegerEndpoint?: string;
  enabled?: boolean;
}

export const initTracing = (config: TracingConfig): NodeSDK | null => {
  if (config.enabled === false) {
    return null;
  }

  // Create Jaeger exporter if endpoint is provided
  const jaegerExporter = config.jaegerEndpoint
    ? new JaegerExporter({
        endpoint: config.jaegerEndpoint,
      })
    : undefined;

  // Initialize OpenTelemetry NodeSDK with auto-instrumentations
  const sdk = new NodeSDK({
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: config.serviceName,
    }),
    traceExporter: jaegerExporter,
    instrumentations: [getNodeAutoInstrumentations()],
  });

  // Start the tracing SDK
  sdk.start();
  return sdk;
};
Usage in service:
// services/iam-service/src/main.ts
import { initTracing } from '@goodgo/tracing';

// Initialize tracing as early as possible: auto-instrumentation can only patch
// modules (http, express, ...) that are loaded after the SDK has started
if (process.env.TRACING_ENABLED === 'true') {
  initTracing({
    serviceName: process.env.SERVICE_NAME || 'iam-service',
    jaegerEndpoint: process.env.JAEGER_ENDPOINT,
    enabled: true,
  });
}
Health Checks / Kiểm Tra Sức Khỏe
EN: Implement liveness and readiness probes for Kubernetes.
VI: Triển khai liveness và readiness probes cho Kubernetes.
Example from codebase: services/iam-service/src/modules/health/health.controller.ts
import { Request, Response } from 'express';
import { prisma } from '../../config/database.config';
import { ApiResponse } from '@goodgo/types';

export class HealthController {
  /**
   * EN: Basic liveness probe
   * VI: Kiểm tra liveness cơ bản
   */
  health = async (_req: Request, res: Response): Promise<void> => {
    const response: ApiResponse<{ status: string; timestamp: string }> = {
      success: true,
      data: {
        status: 'ok',
        timestamp: new Date().toISOString(),
      },
      timestamp: new Date().toISOString(),
    };
    res.json(response);
  };

  /**
   * EN: Readiness probe (checks database connection)
   * VI: Kiểm tra readiness (kiểm tra kết nối database)
   */
  ready = async (_req: Request, res: Response): Promise<void> => {
    try {
      // Check database connection
      await prisma.$queryRaw`SELECT 1`;
      res.json({
        success: true,
        data: { status: 'ready' },
        timestamp: new Date().toISOString(),
      });
    } catch (error) {
      // Return 503 if database is not ready
      res.status(503).json({
        success: false,
        error: {
          code: 'HEALTH_001',
          message: 'Service not ready',
        },
        timestamp: new Date().toISOString(),
      });
    }
  };

  /**
   * EN: Alias for health check
   * VI: Alias cho kiểm tra sức khỏe
   */
  live = async (_req: Request, res: Response): Promise<void> => {
    res.json({
      success: true,
      data: { status: 'live' },
      timestamp: new Date().toISOString(),
    });
  };
}
Best Practices / Thực Hành Tốt Nhất
Logging / Logging
EN:
- Use structured logging (JSON format)
- Include correlation IDs for request tracing
- Log at appropriate levels (ERROR, WARN, INFO, DEBUG)
- Avoid logging sensitive data (passwords, tokens, PII)
- Use consistent log format across services
VI:
- Sử dụng structured logging (định dạng JSON)
- Bao gồm correlation IDs để theo dõi request
- Log ở mức độ phù hợp (ERROR, WARN, INFO, DEBUG)
- Tránh log dữ liệu nhạy cảm (mật khẩu, tokens, PII)
- Sử dụng format log nhất quán giữa các services
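The request logger shown earlier logs `req.query` verbatim, which can leak tokens passed as query parameters. One way to apply the "avoid logging sensitive data" rule is to mask known-sensitive keys before the fields reach the logger. The sketch below is an assumption-laden illustration: the `redact` function and the key list are hypothetical, not part of `@goodgo/logger`. Pino itself offers a `redact` option, but whether the GoodGo wrapper exposes it is not shown here.

```typescript
// Hypothetical list of keys to mask; adjust to the fields your services actually handle.
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'secret', 'apikey']);

// Return a shallow copy in which values of sensitive keys are replaced by a fixed marker.
// Key matching is case-insensitive; nested objects are not inspected.
function redact(fields: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(fields)) {
    safe[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : value;
  }
  return safe;
}

// redact({ page: '2', token: 'abc' })  → { page: '2', token: '[REDACTED]' }
```

In the request logger this could be applied as `query: redact(req.query)`. Because the copy is shallow, nested payloads would need a recursive variant.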
Metrics / Metrics
EN:
- Use standard metric types (Counter, Gauge, Histogram)
- Keep cardinality low (avoid high-cardinality labels)
- Define SLIs and SLOs for critical paths
- Monitor business metrics, not just technical ones
- Normalize route paths to prevent high cardinality
VI:
- Sử dụng các loại metric chuẩn (Counter, Gauge, Histogram)
- Giữ cardinality thấp (tránh high-cardinality labels)
- Định nghĩa SLIs và SLOs cho các đường dẫn quan trọng
- Giám sát business metrics, không chỉ technical metrics
- Chuẩn hóa route paths để tránh high cardinality
Tracing / Tracing
EN:
- Add traces for critical operations
- Include relevant context in spans
- Sample appropriately to control costs
- Use distributed tracing for microservices
- Propagate correlation IDs across service boundaries
VI:
- Thêm traces cho các thao tác quan trọng
- Bao gồm context liên quan trong spans
- Sample phù hợp để kiểm soát chi phí
- Sử dụng distributed tracing cho microservices
- Truyền correlation IDs qua ranh giới service
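Propagating a correlation ID means copying it onto every outbound call, so the downstream service's correlation middleware picks it up instead of generating a new one. A minimal sketch, assuming the same `x-correlation-id` header as the middleware; the helper name `withCorrelationId` is hypothetical:

```typescript
const CORRELATION_ID_HEADER = 'x-correlation-id';

// Return a copy of the outbound headers with the current correlation ID added.
// If no ID is available, the headers are returned unchanged (as a copy).
function withCorrelationId(
  headers: Record<string, string>,
  correlationId: string | undefined,
): Record<string, string> {
  if (!correlationId) {
    return { ...headers };
  }
  return { ...headers, [CORRELATION_ID_HEADER]: correlationId };
}

// withCorrelationId({ accept: 'application/json' }, 'abc-123')
//   → { accept: 'application/json', 'x-correlation-id': 'abc-123' }
```

A caller would pass the result as the `headers` option of whatever HTTP client the service uses. The input object is never mutated, so shared default headers stay safe to reuse.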
Alerting / Cảnh Báo
EN:
- Alert on symptoms, not causes
- Include runbook links in alerts
- Avoid alert fatigue with proper thresholds
- Test alerting rules regularly
- Use correlation IDs in alert context
VI:
- Cảnh báo về triệu chứng, không phải nguyên nhân
- Bao gồm links runbook trong alerts
- Tránh alert fatigue với thresholds phù hợp
- Test alerting rules thường xuyên
- Sử dụng correlation IDs trong alert context
Examples from Project / Ví Dụ Từ Dự Án
Logging Implementation / Triển Khai Logging
- Request Logger: services/iam-service/src/middlewares/logger.middleware.ts
- Correlation Middleware: services/iam-service/src/middlewares/correlation.middleware.ts
Metrics Implementation / Triển Khai Metrics
- Metrics Middleware: services/iam-service/src/middlewares/metrics.middleware.ts
- Metrics Endpoint: Exposed at /metrics in all services
Tracing Implementation / Triển Khai Tracing
- Tracing Package: packages/tracing/src/index.ts
- Service Integration: services/iam-service/src/main.ts
Health Checks / Health Checks
- Health Controller: services/iam-service/src/modules/health/health.controller.ts
Quick Reference / Tham Khảo Nhanh
Log Levels / Mức Độ Log
EN:
- ERROR: Errors that require immediate attention
- WARN: Warnings that may indicate issues
- INFO: Informational messages (default)
- DEBUG: Detailed debugging information
VI:
- ERROR: Lỗi cần chú ý ngay lập tức
- WARN: Cảnh báo có thể chỉ ra vấn đề
- INFO: Thông điệp thông tin (mặc định)
- DEBUG: Thông tin debug chi tiết
Metric Types / Loại Metrics
EN:
- Counter: Monotonically increasing value (e.g., request count)
- Gauge: Value that can go up or down (e.g., active connections)
- Histogram: Distribution of values (e.g., request duration)
VI:
- Counter: Giá trị tăng đơn điệu (ví dụ: số lượng request)
- Gauge: Giá trị có thể tăng hoặc giảm (ví dụ: kết nối đang hoạt động)
- Histogram: Phân phối giá trị (ví dụ: thời lượng request)
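Prometheus histograms store cumulative bucket counts: an observation increments every bucket whose upper bound (`le`) is greater than or equal to the observed value, plus the implicit `+Inf` bucket. The sketch below mimics that bookkeeping in plain TypeScript to show why bucket boundaries matter. It is an illustration only, not how `prom-client` is implemented, and the function name `bucketCounts` is hypothetical.

```typescript
// Count observations into cumulative buckets, the way Prometheus exposes them.
function bucketCounts(bounds: number[], observations: number[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const bound of bounds) counts.set(String(bound), 0);
  counts.set('+Inf', 0);

  for (const value of observations) {
    // An observation counts toward every bucket whose bound is >= the value
    for (const bound of bounds) {
      if (value <= bound) {
        counts.set(String(bound), (counts.get(String(bound)) ?? 0) + 1);
      }
    }
    // The +Inf bucket counts every observation
    counts.set('+Inf', (counts.get('+Inf') ?? 0) + 1);
  }
  return counts;
}

// bucketCounts([0.1, 0.5, 1], [0.05, 0.3, 0.7, 2])
//   → 0.1: 1, 0.5: 2, 1: 3, +Inf: 4
```

Any value above the largest bound lands only in `+Inf`, so percentile estimates for slow requests are only as precise as the highest finite buckets.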
Health Check Endpoints / Endpoints Health Check
EN:
- /health or /health/live: Liveness probe (service is running)
- /health/ready: Readiness probe (service is ready to accept traffic)
VI:
- /health hoặc /health/live: Liveness probe (service đang chạy)
- /health/ready: Readiness probe (service sẵn sàng nhận traffic)
Prometheus Queries / Truy Vấn Prometheus
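# Request rate (per second, over the last 5 minutes)
rate(http_requests_total[5m])
# Server error (5xx) request rate
rate(http_requests_total{status_code=~"5.."}[5m])
# Error ratio (share of requests answered with 5xx)
sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
# 95th percentile latency (computed from the histogram's _bucket series)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Active requests
http_active_requests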
Related Skills / Skills Liên Quan
- Kubernetes Deployment - For configuring health checks in K8s
- Security - For secure logging and monitoring
- Project Rules - For service structure and standards