# Observability Stack Guide

This guide explains how to use the observability stack (Grafana, Prometheus, Loki, Promtail) included in the infrastructure.

## Architecture Overview

### Components

The stack consists of the following components:

- **Prometheus**: Metrics collection and storage
- **Loki**: Log aggregation system
- **Promtail**: Log collector agent
- **Grafana**: Unified visualization dashboard

### Architecture Diagram

```mermaid
flowchart LR
    subgraph Services["Microservices"]
        IAM[IAM Service]
        USER[User Service]
        TRAEFIK[Traefik Gateway]
    end

    subgraph Collection["Data Collection"]
        PROM[Prometheus]
        PROMTAIL[Promtail]
    end

    subgraph Storage["Data Storage"]
        PROM_DB[(Prometheus DB)]
        LOKI_DB[(Loki DB)]
    end

    subgraph Visualization["Visualization"]
        GRAFANA[Grafana Dashboard]
    end

    IAM -->|Metrics| PROM
    USER -->|Metrics| PROM
    TRAEFIK -->|Metrics| PROM

    IAM -.->|Logs| PROMTAIL
    USER -.->|Logs| PROMTAIL
    TRAEFIK -.->|Logs| PROMTAIL

    PROM -->|Store| PROM_DB
    PROMTAIL -->|Push| LOKI_DB

    PROM_DB -->|Query| GRAFANA
    LOKI_DB -->|Query| GRAFANA

    style Services fill:#2d3748
    style Collection fill:#2c5282
    style Storage fill:#2f855a
    style Visualization fill:#744210
    style IAM fill:#4a5568
    style USER fill:#4a5568
    style TRAEFIK fill:#4a5568
    style PROM fill:#3182ce
    style PROMTAIL fill:#3182ce
    style PROM_DB fill:#38a169
    style LOKI_DB fill:#38a169
    style GRAFANA fill:#d69e2e
```

### Data Flow

```mermaid
sequenceDiagram
    participant S as Service
    participant PT as Promtail
    participant P as Prometheus
    participant L as Loki
    participant G as Grafana

    Note over S,G: Metrics Flow
    S->>P: Expose /metrics endpoint
    P->>P: Scrape metrics (15s interval)
    G->>P: Query PromQL
    P-->>G: Return metrics data

    Note over S,G: Logs Flow
    S->>PT: Write logs to stdout
    PT->>PT: Parse & Label logs
    PT->>L: Push logs via HTTP
    G->>L: Query LogQL
    L-->>G: Return log data
```

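
To make the "Push logs via HTTP" step concrete, here is a minimal sketch of the JSON body that Loki's push endpoint (`POST /loki/api/v1/push`) accepts. Promtail builds and sends this payload for you, so nothing here is needed to run the stack. The helper name `buildPushBody`, the `LogEntry` type, and the sample labels are illustrative assumptions, not part of this repository.

```typescript
// JSON body accepted by Loki's push endpoint (POST /loki/api/v1/push).
interface LokiPushBody {
  streams: Array<{
    stream: Record<string, string>;  // label set that identifies the stream
    values: Array<[string, string]>; // [unix time in nanoseconds, log line]
  }>;
}

interface LogEntry {
  timestampMs: number; // unix time in milliseconds
  line: string;
}

// Hypothetical helper: wrap log lines for one label set into a push body.
function buildPushBody(labels: Record<string, string>, entries: LogEntry[]): LokiPushBody {
  return {
    streams: [
      {
        stream: labels,
        // Loki expects nanosecond timestamps as strings; appending six
        // zeros converts whole milliseconds to nanoseconds without rounding.
        values: entries.map((entry): [string, string] => [
          String(Math.trunc(entry.timestampMs)) + '000000',
          entry.line,
        ]),
      },
    ],
  };
}
```

Sending `JSON.stringify(buildPushBody({ container: 'iam-service' }, entries))` with `Content-Type: application/json` to `http://localhost:3100/loki/api/v1/push` should make those lines queryable under `{container="iam-service"}`.
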
## Getting Started

### Prerequisites

- Docker and Docker Compose installed
- Existing `microservices-network` (created by the main application stack or manually)

### Starting the Stack

Start the stack with the provided script:

```bash
./scripts/observability/start.sh
```

Or manually:

```bash
# Create the shared network if it does not exist yet
docker network create microservices-network || true

cd infra/observability
docker-compose -f docker-compose.observability.yml up -d
```

Check that all containers are running:

```bash
docker ps
```

You should see `grafana`, `prometheus`, `loki`, and `promtail`.

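
If you want to automate the container check, the sketch below compares a list of running container names against the four names above. It assumes the names are produced by `docker ps --format '{{.Names}}'` (one name per line) and that the containers are named exactly as listed in this guide. `missingContainers` is a hypothetical helper.

```typescript
// Container names this guide expects to see after startup.
const EXPECTED_CONTAINERS = ['grafana', 'prometheus', 'loki', 'promtail'];

// Hypothetical helper: given the output of `docker ps --format '{{.Names}}'`,
// return the expected containers that are not in the list.
function missingContainers(dockerPsNames: string): string[] {
  const running = dockerPsNames
    .split('\n')
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  return EXPECTED_CONTAINERS.filter((name) => running.indexOf(name) === -1);
}
```

An empty result means every expected container is running. Any name in the result is a candidate for `docker logs <name>`.
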
## Accessing Services

| Service | URL | Credentials | Description |
| :--- | :--- | :--- | :--- |
| **Grafana** | [http://localhost:3001](http://localhost:3001) | `admin` / `admin` | Main dashboard for visualization |
| **Prometheus** | [http://localhost:9090](http://localhost:9090) | N/A | Raw metrics and target status |
| **Loki** | [http://localhost:3100](http://localhost:3100) | N/A | Log aggregation API (no UI) |

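
Each of these services also exposes a lightweight health or readiness endpoint, which is handy because Loki has no UI to open. The sketch below lists those URLs using the ports from the table. The `healthChecks` helper and the `host` parameter are illustrative assumptions.

```typescript
interface HealthCheck {
  service: string;
  url: string;
}

// Hypothetical helper: health/readiness URLs for the services in the table.
// Ports come from this guide; the paths are each tool's standard endpoint.
function healthChecks(host: string = 'localhost'): HealthCheck[] {
  return [
    { service: 'Grafana', url: `http://${host}:3001/api/health` },
    { service: 'Prometheus', url: `http://${host}:9090/-/ready` },
    { service: 'Loki', url: `http://${host}:3100/ready` },
  ];
}
```

Requesting each URL (for example with `curl -f <url>`) should return HTTP 200 once the service is up. Loki may answer 503 for a short time while it is still starting.
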
## Using Grafana

### Initial Setup

1. **Login**: Open [http://localhost:3001](http://localhost:3001) and log in with `admin`/`admin`
2. **Change Password**: You'll be prompted to change the default password (recommended)
3. **Verify Datasources**:
   - Navigate to **Configuration** → **Data Sources**
   - Ensure both **Prometheus** and **Loki** are connected

### Exploring Data

Go to **Explore** (compass icon) in the sidebar:

- Select **Loki** from the datasource dropdown to search logs
- Select **Prometheus** from the datasource dropdown to query metrics

### Viewing Logs (Loki)

In the **Explore** view with **Loki** selected:

1. Click **Label browser**
2. Select a label, e.g., `container`
3. Choose a specific container (e.g., `iam-service` or `traefik`)
4. Click **Show logs**

**LogQL Query Examples:**

```logql
{container="iam-service"}
{container="iam-service"} |= "error"
{container="iam-service"} |= "error" | json
```

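
The same LogQL queries can be run outside Grafana through Loki's HTTP API (`GET /loki/api/v1/query_range`). The sketch below only builds the request URL, since the query must be URL-encoded. `lokiQueryRangeUrl` is a hypothetical helper, and the default base URL is the Loki address listed in this guide.

```typescript
// Hypothetical helper: build a URL for Loki's range-query endpoint.
// Without explicit start/end parameters Loki falls back to its own
// default time window.
function lokiQueryRangeUrl(
  logql: string,
  limit: number = 100,
  baseUrl: string = 'http://localhost:3100'
): string {
  const params = [
    'query=' + encodeURIComponent(logql), // braces, quotes and pipes must be escaped
    'limit=' + String(limit),
  ];
  return baseUrl + '/loki/api/v1/query_range?' + params.join('&');
}
```

For example, `lokiQueryRangeUrl('{container="iam-service"} |= "error"')` yields a URL you can pass to `curl` to fetch matching log lines as JSON.
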
### Viewing Metrics (Prometheus)

In the **Explore** view with **Prometheus** selected:

1. Type a metric name in the query field (e.g., `up`, `container_memory_usage_bytes`)
2. Click **Run query**

**PromQL Query Examples:**

```promql
up

rate(http_requests_total[5m])

container_memory_usage_bytes{container="iam-service"}
```

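
Prometheus also answers these queries over HTTP, for example `http://localhost:9090/api/v1/query?query=up`. As a rough sketch of how to read the result, the code below models the relevant part of an instant-query response and lists the targets whose `up` value is not `1`. The `downTargets` helper is hypothetical, and the interface covers only the fields used here.

```typescript
// Subset of the response returned by GET /api/v1/query for an instant vector.
interface InstantQueryResponse {
  status: string;
  data: {
    resultType: string;
    result: Array<{
      metric: Record<string, string>; // label set, e.g. job and instance
      value: [number, string];        // [unix timestamp, sample value as a string]
    }>;
  };
}

// Hypothetical helper: for a response to the `up` query, return
// "job/instance" for every target that Prometheus could not scrape.
function downTargets(response: InstantQueryResponse): string[] {
  return response.data.result
    .filter((sample) => sample.value[1] !== '1')
    .map((sample) => `${sample.metric.job}/${sample.metric.instance}`);
}
```

Sample values arrive as strings, which is why the comparison is against `'1'` rather than the number `1`.
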
## Configuration

### File Locations

- **Prometheus**: `infra/observability/prometheus/prometheus.yml`
- **Promtail**: `infra/observability/promtail/promtail-config.yml`
- **Grafana**: `infra/observability/grafana/`

### Custom Metrics

To expose custom metrics from your service:

```typescript
import { Counter, Histogram } from 'prom-client';

// Counts every handled HTTP request, split by method, route, and status
const requestCounter = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status']
});

// Tracks request latency in seconds, split by method and route
const requestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration',
  labelNames: ['method', 'route']
});
```

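
For orientation, this is roughly what Prometheus reads when it scrapes `/metrics`. `prom-client` generates this text for you. The hand-written formatter below is not part of `prom-client` and only illustrates the text exposition format for a counter. `renderCounter` and `CounterSample` are hypothetical names, and the sketch skips escaping of label values.

```typescript
interface CounterSample {
  labels: Record<string, string>;
  value: number;
}

// Illustrative only: render one counter in the Prometheus text exposition
// format (HELP line, TYPE line, then one line per label combination).
function renderCounter(name: string, help: string, samples: CounterSample[]): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const sample of samples) {
    const labelText = Object.keys(sample.labels)
      .map((key) => `${key}="${sample.labels[key]}"`)
      .join(',');
    lines.push(`${name}{${labelText}} ${sample.value}`);
  }
  return lines.join('\n') + '\n';
}
```

Each label combination becomes its own time series, which is why queries such as `rate(http_requests_total[5m])` return one result per `method`/`route`/`status` combination.
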
### Custom Dashboards

Create custom dashboards in Grafana:

1. Click **+** → **Dashboard**
2. Add a **Panel**
3. Configure the query (Prometheus or Loki)
4. Save the dashboard

## Color Palette Reference

Diagrams use a dark color palette for better readability:

| Component Type | Fill Color | Stroke Color | Purpose |
|----------------|------------|--------------|----------|
| Services | `#e94560` | `#c81e3b` | Microservices (red) |
| Collectors | `#f39c12` | `#d68910` | Data collection (orange) |
| Storage | `#3498db` | `#2874a6` | Storage (blue) |
| Visualization | `#9b59b6` | `#7d3c98` | Visualization (purple) |
| Subgraphs | `#1a1a2e` - `#533483` | `#16213e` - `#0f3460` | Logical groups |

**All text uses `color:#ffffff` (white) for readability on dark backgrounds**

## Quick Tips

### Mermaid Common Issues

**DO:**

- Use `flowchart LR` for left-to-right flow
- Use `sequenceDiagram` for time-based interactions
- Apply dark colors for better contrast
- Use descriptive node IDs

**DON'T:**

- Mix `graph` and `flowchart` syntax
- Use special characters in node IDs without quotes
- Forget the closing `end` for subgraphs

### LogQL Quick Reference

```logql
{label="value"}
{label="value"} |= "search"
{label="value"} |= "error" | json
{label="value"} | logfmt
```

### PromQL Quick Reference

```promql
metric_name
metric_name{label="value"}
rate(metric_name[5m])
sum(metric_name) by (label)
```

### Visual Indicators

- **Metrics**: Numerical time-series data
- **Logs**: Text-based event records
- **Query**: Search/filter operations
- **Explore**: Investigation interface
- **Dashboard**: Pre-configured visualizations

## Troubleshooting

### Common Issues

| Issue | Symptoms | Solution |
|-------|----------|----------|
| No logs visible | Grafana Explore shows no logs | Check Promtail is running: `docker ps \| grep promtail` |
| Missing metrics | Services don't appear in Prometheus targets | Check service `/metrics` endpoint |
| Container won't start | `docker ps` doesn't show container | View logs from `infra/observability`: `docker-compose -f docker-compose.observability.yml logs <service-name>` |
| Network issue | Services can't connect | Create network: `docker network create microservices-network` |

### Logs Not Appearing in Loki

1. Check Promtail logs: `docker logs promtail`
2. Verify container labels are correct
3. Ensure services are on `microservices-network`

### Metrics Not Appearing in Prometheus

1. Check Prometheus targets: [http://localhost:9090/targets](http://localhost:9090/targets)
2. Verify the service exposes a `/metrics` endpoint
3. Check the Prometheus scrape config (`infra/observability/prometheus/prometheus.yml`)

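
The information on the targets page is also available as JSON from `http://localhost:9090/api/v1/targets`, which helps when you want the scrape error as text. The sketch below models the fields of that response that matter here and summarizes every target that is not healthy. `unhealthyTargets` is a hypothetical helper, and the interface is a deliberately small subset.

```typescript
// Subset of the response returned by GET /api/v1/targets.
interface TargetsResponse {
  data: {
    activeTargets: Array<{
      scrapeUrl: string; // URL Prometheus scrapes, e.g. http://host:port/metrics
      health: string;    // "up", "down", or "unknown"
      lastError: string; // empty when the last scrape succeeded
    }>;
  };
}

// Hypothetical helper: one summary line per target that is not "up".
function unhealthyTargets(response: TargetsResponse): string[] {
  return response.data.activeTargets
    .filter((target) => target.health !== 'up')
    .map((target) => `${target.scrapeUrl}: ${target.lastError || 'no error reported'}`);
}
```

A service that is missing from `activeTargets` entirely usually points to the scrape config, while a target that is listed with an error points to the service or the network.
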
### Grafana Shows "No Data"

1. Verify datasource connection (Configuration → Data Sources)
2. Check time range in query
3. Ensure data exists in Prometheus/Loki