pos-system/docs/en/guides/observability.md
Ho Ngoc Hai 9ba4a478ee feat(docs): Enhance deployment and development guides with improved clarity and structure
- Updated Mermaid diagrams in the deployment and development guides for better visual representation and consistency.
- Improved formatting and clarity in the Kubernetes local deployment and IAM migration guides, including detailed workflows and troubleshooting sections.
- Enhanced the Vietnamese documentation to align with the English version, ensuring consistency across guides.
- Added quick tips and common issues sections to facilitate user navigation and understanding.
2026-01-08 17:10:06 +07:00

# Observability Stack Guide
This guide explains how to use the observability stack (Grafana, Prometheus, Loki, Promtail) included in the infrastructure.
## Architecture Overview
### Components
The stack consists of the following components:
- **Prometheus**: Metrics collection and storage
- **Loki**: Log aggregation system
- **Promtail**: Log collector agent
- **Grafana**: Unified visualization dashboard
### Architecture Diagram
```mermaid
flowchart LR
subgraph Services["Microservices"]
IAM[IAM Service]
USER[User Service]
TRAEFIK[Traefik Gateway]
end
subgraph Collection["Data Collection"]
PROM[Prometheus]
PROMTAIL[Promtail]
end
subgraph Storage["Data Storage"]
PROM_DB[(Prometheus DB)]
LOKI_DB[(Loki DB)]
end
subgraph Visualization["Visualization"]
GRAFANA[Grafana Dashboard]
end
IAM -->|Metrics| PROM
USER -->|Metrics| PROM
TRAEFIK -->|Metrics| PROM
IAM -.->|Logs| PROMTAIL
USER -.->|Logs| PROMTAIL
TRAEFIK -.->|Logs| PROMTAIL
PROM -->|Store| PROM_DB
PROMTAIL -->|Push| LOKI_DB
PROM_DB -->|Query| GRAFANA
LOKI_DB -->|Query| GRAFANA
style Services fill:#2d3748
style Collection fill:#2c5282
style Storage fill:#2f855a
style Visualization fill:#744210
style IAM fill:#4a5568
style USER fill:#4a5568
style TRAEFIK fill:#4a5568
style PROM fill:#3182ce
style PROMTAIL fill:#3182ce
style PROM_DB fill:#38a169
style LOKI_DB fill:#38a169
style GRAFANA fill:#d69e2e
```
### Data Flow
```mermaid
sequenceDiagram
participant S as Service
participant PT as Promtail
participant P as Prometheus
participant L as Loki
participant G as Grafana
Note over S,G: Metrics Flow
S->>P: Expose /metrics endpoint
P->>P: Scrape metrics (15s interval)
G->>P: Query PromQL
P-->>G: Return metrics data
Note over S,G: Logs Flow
S->>PT: Write logs to stdout
PT->>PT: Parse & Label logs
PT->>L: Push logs via HTTP
G->>L: Query LogQL
L-->>G: Return log data
```
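To make the metrics flow concrete, the sketch below renders a counter in the Prometheus text exposition format, which is what a `/metrics` endpoint returns on each scrape. In a real service a client library such as `prom-client` produces this output; the `CounterSample` type and the `renderCounter` and `escapeLabelValue` functions are hypothetical names used only for illustration.

```typescript
// Hypothetical shape for one labelled counter value (illustration only).
interface CounterSample {
  labels: { [labelName: string]: string };
  value: number;
}

// Escape backslash, double quote, and newline, as the text format requires for label values.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render a counter as "# HELP", "# TYPE", and one line per sample.
function renderCounter(metric: string, help: string, samples: CounterSample[]): string {
  const lines = [`# HELP ${metric} ${help}`, `# TYPE ${metric} counter`];
  for (const s of samples) {
    const labelText = Object.keys(s.labels)
      .map((k) => `${k}="${escapeLabelValue(s.labels[k])}"`)
      .join(",");
    lines.push(labelText ? `${metric}{${labelText}} ${s.value}` : `${metric} ${s.value}`);
  }
  return lines.join("\n") + "\n";
}

const exampleBody = renderCounter("http_requests_total", "Total HTTP requests", [
  { labels: { method: "GET", route: "/health", status: "200" }, value: 42 },
]);
console.log(exampleBody);
```

Prometheus reads this text on every scrape and stores each labelled line as its own time series.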
## Getting Started
### Prerequisites
- Docker and Docker Compose installed
- Existing `microservices-network` (created by the main application stack or manually)
### Starting the Stack
Start the stack with the provided script:
```bash
./scripts/observability/start.sh
```
Or manually:
```bash
docker network create microservices-network || true
cd infra/observability
docker-compose -f docker-compose.observability.yml up -d
```
Check if all containers are running:
```bash
docker ps
```
You should see `grafana`, `prometheus`, `loki`, and `promtail`.
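If you want to script this check, the sketch below compares the output of `docker ps --format '{{.Names}}'` against the four expected names. It assumes the containers are named exactly as listed above; `missingContainers` and `EXPECTED_CONTAINERS` are hypothetical names, and the sample output is made up for the example.

```typescript
// Container names this guide expects; adjust if your compose file names them differently.
const EXPECTED_CONTAINERS = ["grafana", "prometheus", "loki", "promtail"];

// Given the output of `docker ps --format '{{.Names}}'` (one name per line),
// return the expected names that are not running.
function missingContainers(dockerPsNames: string, expected: string[]): string[] {
  const running = dockerPsNames
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return expected.filter((c) => running.indexOf(c) === -1);
}

// Made-up sample output in which Promtail is not running.
const sampleOutput = "grafana\nprometheus\nloki\niam-service\n";
console.log(missingContainers(sampleOutput, EXPECTED_CONTAINERS));
```

An empty result means every expected container is up.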
## Accessing Services
| Service | URL | Credentials | Description |
| :--- | :--- | :--- | :--- |
| **Grafana** | [http://localhost:3001](http://localhost:3001) | `admin` / `admin` | Main dashboard for visualization |
| **Prometheus** | [http://localhost:9090](http://localhost:9090) | N/A | Raw metrics and target status |
| **Loki** | [http://localhost:3100](http://localhost:3100) | N/A | Log aggregation API (no UI) |
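Each service also exposes a lightweight health endpoint that is handy for smoke tests. The sketch below builds those URLs from the ports in the table; the paths (`/api/health`, `/-/healthy`, `/ready`) are the standard ones for Grafana, Prometheus, and Loki, while `healthChecks` and `HealthCheck` are hypothetical names for this example.

```typescript
// One health probe per service (hypothetical helper type).
interface HealthCheck {
  service: string;
  url: string;
}

// Ports come from the table above; host is a parameter so the same list works remotely.
function healthChecks(host: string): HealthCheck[] {
  return [
    { service: "Grafana", url: `http://${host}:3001/api/health` },
    { service: "Prometheus", url: `http://${host}:9090/-/healthy` },
    { service: "Loki", url: `http://${host}:3100/ready` },
  ];
}

// Print one curl command per service; -f makes curl exit non-zero on HTTP errors.
for (const check of healthChecks("localhost")) {
  console.log(`curl -fsS ${check.url}   # ${check.service}`);
}
```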
## Using Grafana
### Initial Setup
1. **Log in**: Open [http://localhost:3001](http://localhost:3001) and log in with `admin`/`admin`
2. **Change Password**: You'll be prompted to change the default password (recommended)
3. **Verify Datasources**:
   - Navigate to **Configuration** → **Data Sources**
   - Ensure both **Prometheus** and **Loki** are connected
### Exploring Data
Go to **Explore** (compass icon) in the sidebar:
- Select **Loki** from the datasource dropdown to search logs
- Select **Prometheus** from the datasource dropdown to query metrics
### Viewing Logs (Loki)
In the **Explore** view with **Loki** selected:
1. Click **Label browser**
2. Select a label, e.g., `container`
3. Choose a specific container (e.g., `iam-service` or `traefik`)
4. Click **Show logs**
**LogQL Query Examples:**
```logql
{container="iam-service"}
{container="iam-service"} |= "error"
{container="iam-service"} |= "error" | json
```
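The three queries above follow one pattern: a label selector, an optional line filter, and an optional parser stage. As a minimal sketch, assuming you assemble queries in application code, the hypothetical `buildLogQL` helper below composes them and escapes quotes in user-supplied text.

```typescript
// Quote a string for LogQL by escaping backslashes and double quotes.
function quoteLogQL(value: string): string {
  return '"' + value.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Build `{k="v",...}`, then optionally append `|= "text"` and `| json`.
function buildLogQL(
  labels: { [labelName: string]: string },
  contains?: string,
  parseJson?: boolean
): string {
  const selector =
    "{" + Object.keys(labels).map((k) => `${k}=${quoteLogQL(labels[k])}`).join(",") + "}";
  let query = selector;
  if (contains !== undefined) query += ` |= ${quoteLogQL(contains)}`;
  if (parseJson) query += " | json";
  return query;
}

console.log(buildLogQL({ container: "iam-service" }));
console.log(buildLogQL({ container: "iam-service" }, "error"));
console.log(buildLogQL({ container: "iam-service" }, "error", true));
```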
### Viewing Metrics (Prometheus)
In the **Explore** view with **Prometheus** selected:
1. Type a metric name in the query field (e.g., `up`, `container_memory_usage_bytes`)
2. Click **Run query**
**PromQL Query Examples:**
```promql
up
rate(http_requests_total[5m])
container_memory_usage_bytes{container="iam-service"}
```
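The same queries can be run outside Grafana through the Prometheus HTTP API (`/api/v1/query`). The sketch below builds the request URL and reads an instant-vector response for the `up` query. The helper names are hypothetical, and the job names and ports in the sample response are made up for the example.

```typescript
// Build an instant-query URL; the PromQL expression must be URL-encoded.
function instantQueryUrl(baseUrl: string, promql: string): string {
  return `${baseUrl}/api/v1/query?query=${encodeURIComponent(promql)}`;
}

// Subset of the instant-vector response shape returned by /api/v1/query.
interface InstantVectorResponse {
  status: string;
  data: {
    resultType: string;
    result: { metric: { [label: string]: string }; value: [number, string] }[];
  };
}

// For the `up` metric, a sample value of "0" means the last scrape of that target failed.
function downInstances(resp: InstantVectorResponse): string[] {
  return resp.data.result
    .filter((r) => r.value[1] === "0")
    .map((r) => r.metric["instance"]);
}

// Made-up response for illustration; real instances depend on your scrape config.
const sampleUp: InstantVectorResponse = {
  status: "success",
  data: {
    resultType: "vector",
    result: [
      { metric: { job: "iam-service", instance: "iam-service:3000" }, value: [1700000000, "1"] },
      { metric: { job: "user-service", instance: "user-service:3000" }, value: [1700000000, "0"] },
    ],
  },
};

console.log(instantQueryUrl("http://localhost:9090", "rate(http_requests_total[5m])"));
console.log(downInstances(sampleUp));
```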
## Configuration
### File Locations
- **Prometheus**: `infra/observability/prometheus/prometheus.yml`
- **Promtail**: `infra/observability/promtail/promtail-config.yml`
- **Grafana**: `infra/observability/grafana/`
### Custom Metrics
To expose custom metrics from your service:
```typescript
import { Counter, Histogram } from 'prom-client';

// Counts every handled request, labelled by method, route, and status code.
const requestCounter = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status']
});

// Tracks request latency in seconds; prom-client's default buckets apply.
const requestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration',
  labelNames: ['method', 'route']
});
```
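Because every distinct label value creates a separate time series, the `route` label should carry a route template rather than the raw request path. The sketch below shows one possible normalization; `normalizeRoute` is a hypothetical helper, and the rules (numeric and UUID path segments become `:id`) are an assumption about how your routes are shaped.

```typescript
// Collapse raw request paths into a bounded set of route templates.
function normalizeRoute(path: string): string {
  return path
    // Drop the query string.
    .split("?")[0]
    // Replace UUID path segments.
    .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}(?=\/|$)/gi, "/:id")
    // Replace purely numeric path segments.
    .replace(/\/\d+(?=\/|$)/g, "/:id");
}

console.log(normalizeRoute("/users/123/orders/456?page=2"));
console.log(normalizeRoute("/users/3f2b8c1e-9a4d-4e6f-8b7a-1c2d3e4f5a6b"));
console.log(normalizeRoute("/health"));
```

The result would then be passed as the `route` label when incrementing the counter or observing the histogram, for example `requestCounter.inc({ method, route: normalizeRoute(path), status })`. If your framework already exposes the matched route pattern, prefer that over regex rewriting.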
### Custom Dashboards
Create custom dashboards in Grafana:
1. Click **+** → **Dashboard**
2. Add **Panel**
3. Configure query (Prometheus or Loki)
4. Save dashboard
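Dashboards built through these steps are stored as JSON, so they can also be generated and imported. The following is a rough sketch of that JSON for a few Prometheus panels. The field names follow Grafana's dashboard JSON model as commonly exported (`panels`, `gridPos`, `targets`, `expr`), but the exact schema varies by Grafana version, so export a dashboard from your instance to confirm. `buildDashboard` and `PanelSpec` are hypothetical names.

```typescript
// Minimal description of a panel: a title and one PromQL expression.
interface PanelSpec {
  title: string;
  expr: string;
}

// Produce a dashboard object with two panels per row on Grafana's 24-column grid.
// No datasource is set, so panels are assumed to fall back to the default datasource.
function buildDashboard(title: string, panels: PanelSpec[]) {
  return {
    title: title,
    time: { from: "now-1h", to: "now" },
    panels: panels.map((p, i) => ({
      id: i + 1,
      type: "timeseries",
      title: p.title,
      gridPos: { h: 8, w: 12, x: (i % 2) * 12, y: Math.floor(i / 2) * 8 },
      targets: [{ refId: "A", expr: p.expr }],
    })),
  };
}

const exampleDashboard = buildDashboard("Service Overview", [
  { title: "Targets up", expr: "up" },
  { title: "Request rate", expr: "rate(http_requests_total[5m])" },
  { title: "IAM memory", expr: 'container_memory_usage_bytes{container="iam-service"}' },
]);
console.log(JSON.stringify(exampleDashboard, null, 2));
```

The printed JSON can be pasted into Grafana's dashboard import dialog as a starting point.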
## Color Palette Reference
Diagrams in this guide use a dark color palette for better readability:
| Component Type | Subgraph Fill | Node Fill | Purpose |
|----------------|---------------|-----------|---------|
| Services | `#2d3748` | `#4a5568` | Microservices (gray) |
| Collectors | `#2c5282` | `#3182ce` | Data collection (blue) |
| Storage | `#2f855a` | `#38a169` | Storage (green) |
| Visualization | `#744210` | `#d69e2e` | Visualization (amber) |
**The diagrams above do not set a text color; add `color:#ffffff` to a node's `style` line if labels are hard to read on the dark fills.**
## Quick Tips
### Mermaid Common Issues
**DO:**
- Use `flowchart LR` for left-to-right flow
- Use `sequenceDiagram` for time-based interactions
- Apply dark colors for better contrast
- Use descriptive node IDs
**DON'T:**
- Mix `graph` and `flowchart` syntax
- Use special characters in node IDs without quotes
- Forget closing brackets for subgraphs
### LogQL Quick Reference
```logql
{label="value"}
{label="value"} |= "search"
{label="value"} |= "error" | json
{label="value"} | logfmt
```
### PromQL Quick Reference
```promql
metric_name
metric_name{label="value"}
rate(metric_name[5m])
sum(metric_name) by (label)
```
### Visual Indicators
- **Metrics**: Numerical time-series data
- **Logs**: Text-based event records
- **Query**: Search/filter operations
- **Explore**: Investigation interface
- **Dashboard**: Pre-configured visualizations
## Troubleshooting
### Common Issues
| Issue | Symptoms | Solution |
|-------|----------|----------|
| No logs visible | Grafana Explore shows no logs | Check Promtail is running: `docker ps \| grep promtail` |
| Missing metrics | Services don't appear in Prometheus targets | Check service `/metrics` endpoint |
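| Container won't start | Container is missing from `docker ps` (check `docker ps -a` for exited containers) | View logs from `infra/observability`: `docker-compose -f docker-compose.observability.yml logs <service-name>` |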
| Network issue | Services can't connect | Create network: `docker network create microservices-network` |
### Logs Not Appearing in Loki
1. Check Promtail logs: `docker logs promtail`
2. Verify container labels are correct
3. Ensure services are on `microservices-network`
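To separate "Loki is unreachable" from "Promtail is not shipping", you can push a test line straight to Loki's push endpoint (`POST /loki/api/v1/push`) and then search for it in Grafana. The sketch below only builds the JSON body; `buildLokiPush` is a hypothetical helper and the `container="manual-test"` label is an arbitrary value chosen for the example.

```typescript
// Build the JSON body for Loki's push API: one stream with one log line.
function buildLokiPush(
  labels: { [labelName: string]: string },
  line: string,
  nowMs: number
) {
  // Loki expects the timestamp as a string of Unix nanoseconds. Appending six zeros
  // to the millisecond value avoids exceeding JavaScript's safe integer range.
  const tsNs = String(Math.floor(nowMs)) + "000000";
  return { streams: [{ stream: labels, values: [[tsNs, line]] }] };
}

const testPayload = buildLokiPush(
  { container: "manual-test" },
  "hello from the troubleshooting guide",
  Date.now()
);

// Send this body to http://localhost:3100/loki/api/v1/push
// with the header Content-Type: application/json.
console.log(JSON.stringify(testPayload));
```

If the line shows up under `{container="manual-test"}` in **Explore**, Loki and the Grafana datasource are working and the problem is on the Promtail side.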
### Metrics Not Appearing in Prometheus
1. Check Prometheus targets: [http://localhost:9090/targets](http://localhost:9090/targets)
2. Verify service exposes `/metrics` endpoint
3. Check Prometheus scrape config
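The targets page is backed by the `/api/v1/targets` endpoint, which is convenient when checking from a script. As a minimal sketch, the function below lists targets whose `health` is not `up` together with their last scrape error. The interface covers only the fields used here, `unhealthyTargets` is a hypothetical name, and the sample response is made up.

```typescript
// Subset of the /api/v1/targets response used by this sketch.
interface TargetsResponse {
  status: string;
  data: {
    activeTargets: {
      labels: { [label: string]: string };
      scrapeUrl: string;
      health: string;
      lastError: string;
    }[];
  };
}

// Return one human-readable line per target that is not healthy.
function unhealthyTargets(resp: TargetsResponse): string[] {
  return resp.data.activeTargets
    .filter((t) => t.health !== "up")
    .map((t) => `${t.labels["job"]} ${t.scrapeUrl}: ${t.lastError || "no error reported"}`);
}

// Made-up response for illustration; job names and ports are assumptions.
const sampleTargets: TargetsResponse = {
  status: "success",
  data: {
    activeTargets: [
      { labels: { job: "iam-service" }, scrapeUrl: "http://iam-service:3000/metrics", health: "up", lastError: "" },
      { labels: { job: "user-service" }, scrapeUrl: "http://user-service:3000/metrics", health: "down", lastError: "connection refused" },
    ],
  },
};

console.log(unhealthyTargets(sampleTargets));
```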
### Grafana Shows "No Data"
1. Verify datasource connection (Configuration → Data Sources)
2. Check time range in query
3. Ensure data exists in Prometheus/Loki
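For step 3, querying Loki directly over a known time window rules out Grafana's time picker as the cause. The sketch below builds a `/loki/api/v1/query_range` URL for the last N minutes; `lokiRangeUrl` is a hypothetical helper and the limit of 10 entries is an arbitrary choice.

```typescript
// Build a Loki range-query URL covering `windowMinutes` up to `endMs` (Unix milliseconds).
function lokiRangeUrl(
  baseUrl: string,
  logql: string,
  endMs: number,
  windowMinutes: number
): string {
  const startMs = endMs - windowMinutes * 60 * 1000;
  // Loki accepts Unix nanoseconds; build them as strings to stay within safe integers.
  const toNs = (ms: number) => String(Math.floor(ms)) + "000000";
  return (
    `${baseUrl}/loki/api/v1/query_range` +
    `?query=${encodeURIComponent(logql)}` +
    `&start=${toNs(startMs)}&end=${toNs(endMs)}&limit=10`
  );
}

console.log(lokiRangeUrl("http://localhost:3100", '{container="iam-service"}', Date.now(), 15));
```

An empty `result` array in the response means no logs were ingested for that selector in the window, so widening the time range in Grafana will not help until ingestion is fixed.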