Observability Stack Guide
This guide explains how to use the observability stack (Grafana, Prometheus, Loki, Promtail) included in the infrastructure.
Architecture Overview
Components
The stack consists of the following components:
- Prometheus: Metrics collection and storage
- Loki: Log aggregation system
- Promtail: Log collector agent
- Grafana: Unified visualization dashboard
Architecture Diagram
flowchart LR
subgraph Services["Microservices"]
IAM[IAM Service]
USER[User Service]
TRAEFIK[Traefik Gateway]
end
subgraph Collection["Data Collection"]
PROM[Prometheus]
PROMTAIL[Promtail]
end
subgraph Storage["Data Storage"]
PROM_DB[(Prometheus DB)]
LOKI_DB[(Loki DB)]
end
subgraph Visualization["Visualization"]
GRAFANA[Grafana Dashboard]
end
IAM -->|Metrics| PROM
USER -->|Metrics| PROM
TRAEFIK -->|Metrics| PROM
IAM -.->|Logs| PROMTAIL
USER -.->|Logs| PROMTAIL
TRAEFIK -.->|Logs| PROMTAIL
PROM -->|Store| PROM_DB
PROMTAIL -->|Push| LOKI_DB
PROM_DB -->|Query| GRAFANA
LOKI_DB -->|Query| GRAFANA
style Services fill:#2d3748
style Collection fill:#2c5282
style Storage fill:#2f855a
style Visualization fill:#744210
style IAM fill:#4a5568
style USER fill:#4a5568
style TRAEFIK fill:#4a5568
style PROM fill:#3182ce
style PROMTAIL fill:#3182ce
style PROM_DB fill:#38a169
style LOKI_DB fill:#38a169
style GRAFANA fill:#d69e2e
Data Flow
sequenceDiagram
participant S as Service
participant PT as Promtail
participant P as Prometheus
participant L as Loki
participant G as Grafana
Note over S,G: Metrics Flow
P->>S: Scrape /metrics endpoint (15s interval)
S-->>P: Return current metric values
G->>P: Query PromQL
P-->>G: Return metrics data
Note over S,G: Logs Flow
S->>PT: Write logs to stdout
PT->>PT: Parse & Label logs
PT->>L: Push logs via HTTP
G->>L: Query LogQL
L-->>G: Return log data
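To make the "Push logs via HTTP" step more concrete, here is a minimal sketch of the JSON body that Loki's push endpoint (`/loki/api/v1/push`) accepts. Promtail itself normally sends a compressed protobuf encoding of the same structure, so treat this as an illustration of the data shape rather than what Promtail emits byte for byte. The helper name `buildPushPayload` and its input shape are hypothetical.

```typescript
// One stream in a Loki push request (JSON form).
interface LokiStream {
  stream: Record<string, string>; // label set, e.g. { container: "iam-service" }
  values: [string, string][];     // [unix time in nanoseconds as a string, log line]
}

interface LokiPushPayload {
  streams: LokiStream[];
}

// Hypothetical helper: wraps log lines that share one label set into a push payload.
function buildPushPayload(
  labels: Record<string, string>,
  lines: { timestampMs: number; line: string }[],
): LokiPushPayload {
  const values = lines.map((entry): [string, string] => [
    // Loki expects nanoseconds; appending six zeros to whole milliseconds
    // avoids floating-point precision loss on large numbers.
    `${Math.trunc(entry.timestampMs)}000000`,
    entry.line,
  ]);
  return { streams: [{ stream: labels, values }] };
}
```

Labels identify the stream (this is what `{container="iam-service"}` selects on in LogQL), while each value pairs a timestamp with the raw log line.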
Getting Started
Prerequisites
- Docker and Docker Compose installed
- Existing `microservices-network` (created by the main application stack or manually)
Starting the Stack
Start the stack with the provided script:
./scripts/observability/start.sh
Or manually:
docker network create microservices-network || true
cd infra/observability
docker-compose -f docker-compose.observability.yml up -d
Check if all containers are running:
docker ps
You should see grafana, prometheus, loki, and promtail.
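If you prefer to script this check, the sketch below compares a list of running container names against the four expected ones. It assumes the names come from something like `docker ps --format '{{.Names}}'`, and it matches by substring because Compose may add a project prefix or numeric suffix. The function names are hypothetical.

```typescript
// Container names the guide expects after startup.
const EXPECTED_CONTAINERS = ["grafana", "prometheus", "loki", "promtail"];

// Splits raw command output (one name per line) into a clean list of names.
function parseNames(output: string): string[] {
  return output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Returns the expected names that no running container name contains.
function missingContainers(
  runningNames: string[],
  expected: string[] = EXPECTED_CONTAINERS,
): string[] {
  return expected.filter(
    (wanted) => !runningNames.some((running) => running.indexOf(wanted) !== -1),
  );
}
```

An empty result means all four components are up; anything else lists what still needs attention.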
Accessing Services
| Service | URL | Credentials | Description |
|---|---|---|---|
| Grafana | http://localhost:3001 | admin / admin | Main dashboard for visualization |
| Prometheus | http://localhost:9090 | N/A | Raw metrics and target status |
| Loki | http://localhost:3100 | N/A | Log aggregation API (no UI) |
Using Grafana
Initial Setup
- Login: Open http://localhost:3001 and log in with `admin/admin`
- Change Password: You'll be prompted to change the default password (recommended)
- Verify Datasources:
- Navigate to Configuration → Data Sources
- Ensure both Prometheus and Loki are connected
Exploring Data
Go to Explore (compass icon) in the sidebar:
- Select Loki from the datasource dropdown to search logs
- Select Prometheus from the datasource dropdown to query metrics
Viewing Logs (Loki)
In the Explore view with Loki selected:
- Click Label browser
- Select a label, e.g., `container`
- Choose a specific container (e.g., `iam-service` or `traefik`)
- Click Show logs
LogQL Query Examples:
{container="iam-service"}
{container="iam-service"} |= "error"
{container="iam-service"} |= "error" | json
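The three queries above follow one pattern: a label selector, optional line filters, and an optional parser stage. As a rough sketch, a small helper can assemble them from parts. The function `buildLogQL` is hypothetical, and quoting via `JSON.stringify` is an assumption that holds for ordinary strings but has not been checked against every LogQL escaping rule.

```typescript
type LogParser = "json" | "logfmt";

// Hypothetical helper: builds a LogQL query string from its parts.
function buildLogQL(
  labels: Record<string, string>,
  contains: string[] = [],
  parser?: LogParser,
): string {
  // Stream selector: {label="value", ...}
  const selector = Object.keys(labels)
    .map((key) => `${key}=${JSON.stringify(labels[key])}`)
    .join(", ");
  // Line filters: each |= "text" keeps only lines containing that text.
  const filters = contains.map((text) => ` |= ${JSON.stringify(text)}`).join("");
  // Parser stage: extracts fields from each line as labels.
  const stage = parser ? ` | ${parser}` : "";
  return `{${selector}}${filters}${stage}`;
}
```

For example, `buildLogQL({ container: "iam-service" }, ["error"], "json")` produces the third query in the list above.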
Viewing Metrics (Prometheus)
In the Explore view with Prometheus selected:
- Type a metric name in the query field (e.g., `up`, `container_memory_usage_bytes`)
- Click Run query
PromQL Query Examples:
up
rate(http_requests_total[5m])
container_memory_usage_bytes{container="iam-service"}
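To build intuition for `rate(http_requests_total[5m])`, the sketch below computes a per-second rate from raw counter samples. It is a simplification: Prometheus also extrapolates toward the window boundaries, which this version skips. The `CounterPoint` type and `simpleRate` function are hypothetical names.

```typescript
// One scraped counter value; t is the scrape time in seconds.
interface CounterPoint {
  t: number;
  v: number;
}

// Simplified per-second rate over a window of samples, ordered by time.
// Returns null when there are too few samples to compute a rate.
function simpleRate(samples: CounterPoint[]): number | null {
  if (samples.length < 2) return null;
  let first: CounterPoint | undefined;
  let prev: CounterPoint | undefined;
  let increase = 0;
  for (const cur of samples) {
    if (prev === undefined) {
      first = cur;
    } else {
      // A drop means the counter was reset (e.g. the service restarted),
      // so the new value counts as growth from zero.
      increase += cur.v >= prev.v ? cur.v - prev.v : cur.v;
    }
    prev = cur;
  }
  if (first === undefined || prev === undefined) return null;
  const span = prev.t - first.t;
  return span > 0 ? increase / span : null;
}
```

With samples of 100, 130, and 160 taken 15 seconds apart, the counter grew by 60 over 30 seconds, so the rate is 2 requests per second.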
Configuration
File Locations
- Prometheus: `infra/observability/prometheus/prometheus.yml`
- Promtail: `infra/observability/promtail/promtail-config.yml`
- Grafana: `infra/observability/grafana/`
Custom Metrics
To expose custom metrics from your service:
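import { Counter, Histogram } from 'prom-client';
// Counter: a value that only goes up, labeled here by method, route, and status.
const requestCounter = new Counter({
name: 'http_requests_total',
help: 'Total HTTP requests',
labelNames: ['method', 'route', 'status']
});
// Histogram: sorts observed durations into buckets.
// prom-client's default buckets apply because none are specified.
const requestDuration = new Histogram({
name: 'http_request_duration_seconds',
help: 'HTTP request duration',
labelNames: ['method', 'route']
});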
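For orientation, this is roughly what Prometheus reads when it scrapes a counter such as `http_requests_total`. The sketch renders the text exposition format by hand, without prom-client, purely to show the wire format. `renderCounter` is a hypothetical function, and it skips the escaping of special characters in label values that a real client library performs.

```typescript
interface CounterSeries {
  labels: Record<string, string>;
  value: number;
}

// Renders one counter metric in the Prometheus text exposition format.
function renderCounter(metric: string, help: string, series: CounterSeries[]): string {
  const lines = [`# HELP ${metric} ${help}`, `# TYPE ${metric} counter`];
  for (const s of series) {
    // Each label set becomes its own time series: metric{k="v",...} value
    const pairs = Object.keys(s.labels)
      .map((key) => `${key}="${s.labels[key]}"`)
      .join(",");
    lines.push(pairs ? `${metric}{${pairs}} ${s.value}` : `${metric} ${s.value}`);
  }
  return lines.join("\n") + "\n";
}
```

Every distinct combination of `method`, `route`, and `status` yields a separate line, which is why high-cardinality label values (user IDs, full URLs) are usually avoided.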
Custom Dashboards
Create custom dashboards in Grafana:
- Click + → Dashboard
- Add Panel
- Configure query (Prometheus or Loki)
- Save dashboard
Color Palette Reference
Diagrams use a dark color palette for better readability:
| Component Type | Fill Color | Stroke Color | Purpose |
|---|---|---|---|
| 🚀 Services | #e94560 | #c81e3b | Microservices (red) |
| 📊 Collectors | #f39c12 | #d68910 | Data collection (orange) |
| 💾 Storage | #3498db | #2874a6 | Storage (blue) |
| 📊 Visualization | #9b59b6 | #7d3c98 | Visualization (purple) |
| 📦 Subgraphs | #1a1a2e - #533483 | #16213e - #0f3460 | Logical groups |
All text uses color:#ffffff (white) for readability on dark backgrounds.
Quick Tips
Mermaid Common Issues
✅ DO:
- Use `flowchart LR` for left-to-right flow
- Use `sequenceDiagram` for time-based interactions
- Apply dark colors for better contrast
- Use descriptive node IDs
❌ DON'T:
- Mix `graph` and `flowchart` syntax
- Use special characters in node IDs without quotes
- Forget the closing `end` for subgraphs
LogQL Quick Reference
{label="value"}
{label="value"} |= "search"
{label="value"} |= "error" | json
{label="value"} | logfmt
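As an illustration of what the `| logfmt` stage extracts, here is a minimal logfmt parser sketch. It handles `key=value` and `key="quoted value"` pairs only; Loki's own parser covers more cases (bare keys, key sanitization), so this is an approximation. `parseLogfmt` is a hypothetical name.

```typescript
// Parses a logfmt line into key/value fields.
// Sketch only: bare keys without "=" are skipped.
function parseLogfmt(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  // Group 1: key. Group 2: quoted value (escapes allowed). Group 3: unquoted value.
  const pattern = /([^\s=]+)=(?:"((?:[^"\\]|\\.)*)"|(\S*))/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    const key = match[1];
    const quoted: string | undefined = match[2];
    const unquoted: string | undefined = match[3];
    // Unescape \" and \\ inside quoted values.
    fields[key] = quoted !== undefined ? quoted.replace(/\\(.)/g, "$1") : unquoted || "";
  }
  return fields;
}
```

After this stage, each extracted key can be used like a label, for example to filter on `level` in a later pipeline step.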
PromQL Quick Reference
metric_name
metric_name{label="value"}
rate(metric_name[5m])
sum(metric_name) by (label)
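The last query, `sum(metric_name) by (label)`, collapses many series into one total per label value. A minimal sketch of that aggregation over in-memory data, with hypothetical names (`LabeledValue`, `sumBy`):

```typescript
interface LabeledValue {
  labels: Record<string, string>;
  value: number;
}

// Sums series values grouped by a single label, like sum(...) by (label).
function sumBy(series: LabeledValue[], label: string): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const s of series) {
    // A series without the label is grouped under the empty value,
    // mirroring how PromQL treats a missing label as empty.
    const key = s.labels[label] ?? "";
    totals[key] = (totals[key] ?? 0) + s.value;
  }
  return totals;
}
```

Grouping three `http_requests_total` series by `method` drops the `status` label and leaves one total per HTTP method.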
Visual Indicators
- 📊 Metrics: Numerical time-series data
- 📝 Logs: Text-based event records
- 🎯 Query: Search/filter operations
- 🔍 Explore: Investigation interface
- 📈 Dashboard: Pre-configured visualizations
Troubleshooting
Common Issues
| Issue | Symptoms | Solution |
|---|---|---|
| ⚠️ No logs visible | Grafana Explore shows no logs | Check Promtail is running: `docker ps \| grep promtail` |
| 📊 Missing metrics | Services don't appear in Prometheus targets | Check service /metrics endpoint |
| 🔴 Container won't start | `docker ps` doesn't show container | View logs: `docker-compose logs <service-name>` |
| 🌐 Network issue | Services can't connect | Create network: docker network create microservices-network |
Logs Not Appearing in Loki
- Check Promtail logs: `docker logs promtail`
- Verify container labels are correct
- Ensure services are on `microservices-network`
Metrics Not Appearing in Prometheus
- Check Prometheus targets: http://localhost:9090/targets
- Verify the service exposes a `/metrics` endpoint
- Check the Prometheus scrape config
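The targets page is backed by Prometheus's HTTP API at `/api/v1/targets`, so the same check can be scripted. The sketch below filters an already-fetched response for targets that are not healthy. The interfaces list only the fields used here, and `unhealthyTargets` is a hypothetical helper.

```typescript
// Subset of one entry in data.activeTargets from /api/v1/targets.
interface ActiveTarget {
  labels: Record<string, string>;
  scrapeUrl: string;
  health: "up" | "down" | "unknown";
  lastError: string;
}

interface TargetsResponse {
  status: string;
  data: { activeTargets: ActiveTarget[] };
}

// Returns one readable line per target that Prometheus cannot scrape cleanly.
function unhealthyTargets(response: TargetsResponse): string[] {
  return response.data.activeTargets
    .filter((target) => target.health !== "up")
    .map((target) => {
      const job = target.labels["job"] ?? "unknown-job";
      // lastError is empty for targets that have not been scraped yet.
      const reason = target.lastError || target.health;
      return `${job} ${target.scrapeUrl}: ${reason}`;
    });
}
```

A typical use would be to fetch `http://localhost:9090/api/v1/targets`, parse the JSON, and pass it in. A `connection refused` reason usually points to a network or port problem, while a 404 suggests the service does not serve `/metrics`.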
Grafana Shows "No Data"
- Verify datasource connection (Configuration → Data Sources)
- Check time range in query
- Ensure data exists in Prometheus/Loki