# Observability Stack Guide

This guide explains how to use the observability stack (Grafana, Prometheus, Loki, Promtail) included in the infrastructure.

## Architecture Overview

### Components

The stack consists of the following components:

- **Prometheus**: Metrics collection and storage
- **Loki**: Log aggregation system
- **Promtail**: Log collector agent
- **Grafana**: Unified visualization dashboard

### Architecture Diagram

```mermaid
flowchart LR
    subgraph Services["Microservices"]
        IAM[IAM Service]
        USER[User Service]
        TRAEFIK[Traefik Gateway]
    end

    subgraph Collection["Data Collection"]
        PROM[Prometheus]
        PROMTAIL[Promtail]
    end

    subgraph Storage["Data Storage"]
        PROM_DB[(Prometheus DB)]
        LOKI_DB[(Loki DB)]
    end

    subgraph Visualization["Visualization"]
        GRAFANA[Grafana Dashboard]
    end

    IAM -->|Metrics| PROM
    USER -->|Metrics| PROM
    TRAEFIK -->|Metrics| PROM

    IAM -.->|Logs| PROMTAIL
    USER -.->|Logs| PROMTAIL
    TRAEFIK -.->|Logs| PROMTAIL

    PROM -->|Store| PROM_DB
    PROMTAIL -->|Push| LOKI_DB

    PROM_DB -->|Query| GRAFANA
    LOKI_DB -->|Query| GRAFANA

    style Services fill:#2d3748
    style Collection fill:#2c5282
    style Storage fill:#2f855a
    style Visualization fill:#744210
    style IAM fill:#4a5568
    style USER fill:#4a5568
    style TRAEFIK fill:#4a5568
    style PROM fill:#3182ce
    style PROMTAIL fill:#3182ce
    style PROM_DB fill:#38a169
    style LOKI_DB fill:#38a169
    style GRAFANA fill:#d69e2e
```

### Data Flow

```mermaid
sequenceDiagram
    participant S as Service
    participant PT as Promtail
    participant P as Prometheus
    participant L as Loki
    participant G as Grafana

    Note over S,G: Metrics Flow
    S->>P: Expose /metrics endpoint
    P->>P: Scrape metrics (15s interval)
    G->>P: Query PromQL
    P-->>G: Return metrics data

    Note over S,G: Logs Flow
    S->>PT: Write logs to stdout
    PT->>PT: Parse & Label logs
    PT->>L: Push logs via HTTP
    G->>L: Query LogQL
    L-->>G: Return log data
```

## Getting Started

### Prerequisites

- Docker and Docker Compose installed
- Existing `microservices-network` (created by the main application stack or manually)

### Starting the Stack

Start the stack with the provided script:

```bash
./scripts/observability/start.sh
```

Or manually:

```bash
docker network create microservices-network || true
cd infra/observability
docker-compose -f docker-compose.observability.yml up -d
```

Check that all containers are running:

```bash
docker ps
```

You should see `grafana`, `prometheus`, `loki`, and `promtail`.

## Accessing Services

| Service | URL | Credentials | Description |
| :--- | :--- | :--- | :--- |
| **Grafana** | [http://localhost:3001](http://localhost:3001) | `admin` / `admin` | Main dashboard for visualization |
| **Prometheus** | [http://localhost:9090](http://localhost:9090) | N/A | Raw metrics and target status |
| **Loki** | [http://localhost:3100](http://localhost:3100) | N/A | Log aggregation API (no UI) |

## Using Grafana

### Initial Setup

1. **Login**: Open [http://localhost:3001](http://localhost:3001) and log in with `admin`/`admin`
2. **Change Password**: You'll be prompted to change the default password (recommended)
3. **Verify Datasources**:
   - Navigate to **Configuration** → **Data Sources**
   - Ensure both **Prometheus** and **Loki** are connected

### Exploring Data

Go to **Explore** (compass icon) in the sidebar:

- Select **Loki** from the datasource dropdown to search logs
- Select **Prometheus** from the datasource dropdown to query metrics

### Viewing Logs (Loki)

In the **Explore** view with **Loki** selected:

1. Click **Label browser**
2. Select a label, e.g., `container`
3. Choose a specific container (e.g., `iam-service` or `traefik`)
4. Click **Show logs**

**LogQL Query Examples:**

```logql
{container="iam-service"}
{container="iam-service"} |= "error"
{container="iam-service"} |= "error" | json
```

### Viewing Metrics (Prometheus)

In the **Explore** view with **Prometheus** selected:

1. Type a metric name in the query field (e.g., `up`, `container_memory_usage_bytes`)
2. Click **Run query**

**PromQL Query Examples:**

```promql
up
rate(http_requests_total[5m])
container_memory_usage_bytes{container="iam-service"}
```

## Configuration

### File Locations

- **Prometheus**: `infra/observability/prometheus/prometheus.yml`
- **Promtail**: `infra/observability/promtail/promtail-config.yml`
- **Grafana**: `infra/observability/grafana/`

### Custom Metrics

To expose custom metrics from your service:

```typescript
import { Counter, Histogram } from 'prom-client';

// Counts every HTTP request, labeled by method, route, and status code
const requestCounter = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status']
});

// Tracks request latency in seconds, labeled by method and route
const requestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration',
  labelNames: ['method', 'route']
});
```

### Custom Dashboards

Create custom dashboards in Grafana:

1. Click **+** → **Dashboard**
2. Add **Panel**
3. Configure query (Prometheus or Loki)
4. Save dashboard

## Color Palette Reference

Diagrams use a dark color palette for better readability:

| Component Type | Fill Color | Stroke Color | Purpose |
|----------------|------------|--------------|----------|
| Services | `#e94560` | `#c81e3b` | Microservices (red) |
| Collectors | `#f39c12` | `#d68910` | Data collection (orange) |
| Storage | `#3498db` | `#2874a6` | Storage (blue) |
| Visualization | `#9b59b6` | `#7d3c98` | Visualization (purple) |
| Subgraphs | `#1a1a2e` - `#533483` | `#16213e` - `#0f3460` | Logical groups |

**All text uses `color:#ffffff` (white) for readability on dark backgrounds**

## Quick Tips

### Mermaid Common Issues

**DO:**

- Use `flowchart LR` for left-to-right flow
- Use `sequenceDiagram` for time-based interactions
- Apply dark colors for better contrast
- Use descriptive node IDs

**DON'T:**

- Mix `graph` and `flowchart` syntax
- Use special characters in node IDs without quotes
- Forget closing brackets for subgraphs

### LogQL Quick Reference

```logql
{label="value"}
{label="value"} |= "search"
{label="value"} |= "error" | json
{label="value"} | logfmt
```

### PromQL Quick Reference

```promql
metric_name
metric_name{label="value"}
rate(metric_name[5m])
sum(metric_name) by (label)
```

### Visual Indicators

- **Metrics**: Numerical time-series data
- **Logs**: Text-based event records
- **Query**: Search/filter operations
- **Explore**: Investigation interface
- **Dashboard**: Pre-configured visualizations

## Troubleshooting

### Common Issues

| Issue | Symptoms | Solution |
|-------|----------|----------|
| No logs visible | Grafana Explore shows no logs | Check Promtail is running: `docker ps \| grep promtail` |
| Missing metrics | Services don't appear in Prometheus targets | Check service `/metrics` endpoint |
| Container won't start | `docker ps` doesn't show container | View logs: `docker-compose logs <service-name>` |
| Network issue | Services can't connect | Create network: `docker network create microservices-network` |

### Logs Not Appearing in Loki

1. Check Promtail logs: `docker logs promtail`
2. Verify container labels are correct
3. Ensure services are on `microservices-network`

### Metrics Not Appearing in Prometheus

1. Check Prometheus targets: [http://localhost:9090/targets](http://localhost:9090/targets)
2. Verify service exposes `/metrics` endpoint
3. Check Prometheus scrape config

### Grafana Shows "No Data"

1. Verify datasource connection (Configuration → Data Sources)
2. Check time range in query
3. Ensure data exists in Prometheus/Loki