
Observability Stack Guide

This guide explains how to use the observability stack (Grafana, Prometheus, Loki, Promtail) included in the infrastructure.

Architecture Overview

Components

The stack consists of the following components:

  • Prometheus: Metrics collection and storage
  • Loki: Log aggregation system
  • Promtail: Log collector agent
  • Grafana: Unified visualization dashboard

Architecture Diagram

flowchart LR
    subgraph Services["Microservices"]
        IAM[IAM Service]
        USER[User Service]
        TRAEFIK[Traefik Gateway]
    end
    
    subgraph Collection["Data Collection"]
        PROM[Prometheus]
        PROMTAIL[Promtail]
    end
    
    subgraph Storage["Data Storage"]
        PROM_DB[(Prometheus DB)]
        LOKI_DB[(Loki DB)]
    end
    
    subgraph Visualization["Visualization"]
        GRAFANA[Grafana Dashboard]
    end
    
    IAM -->|Metrics| PROM
    USER -->|Metrics| PROM
    TRAEFIK -->|Metrics| PROM
    
    IAM -.->|Logs| PROMTAIL
    USER -.->|Logs| PROMTAIL
    TRAEFIK -.->|Logs| PROMTAIL
    
    PROM -->|Store| PROM_DB
    PROMTAIL -->|Push| LOKI_DB
    
    PROM_DB -->|Query| GRAFANA
    LOKI_DB -->|Query| GRAFANA
    
    style Services fill:#2d3748
    style Collection fill:#2c5282
    style Storage fill:#2f855a
    style Visualization fill:#744210
    style IAM fill:#4a5568
    style USER fill:#4a5568
    style TRAEFIK fill:#4a5568
    style PROM fill:#3182ce
    style PROMTAIL fill:#3182ce
    style PROM_DB fill:#38a169
    style LOKI_DB fill:#38a169
    style GRAFANA fill:#d69e2e

Data Flow

sequenceDiagram
    participant S as Service
    participant PT as Promtail
    participant P as Prometheus
    participant L as Loki
    participant G as Grafana
    
    Note over S,G: Metrics Flow
    S->>S: Expose /metrics endpoint
    P->>S: Scrape /metrics (15s interval)
    S-->>P: Return current metric values
    G->>P: Query PromQL
    P-->>G: Return metrics data
    
    Note over S,G: Logs Flow
    S->>PT: Write logs to stdout
    PT->>PT: Parse & Label logs
    PT->>L: Push logs via HTTP
    G->>L: Query LogQL
    L-->>G: Return log data

Getting Started

Prerequisites

  • Docker and Docker Compose installed
  • An existing Docker network named microservices-network (created by the main application stack or manually)

Starting the Stack

Start the stack with the provided script:

./scripts/observability/start.sh

Or manually:

docker network create microservices-network || true

cd infra/observability
docker-compose -f docker-compose.observability.yml up -d

Check that all containers are running:

docker ps

The output should list the grafana, prometheus, loki, and promtail containers.
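
The same check can be scripted. The sketch below is a hypothetical helper (not part of this repository) that takes the text printed by `docker ps --format '{{.Names}}'` and reports which of the four expected containers are absent. It assumes the container names contain the service names listed above.

```typescript
// Hypothetical helper: given the text printed by
//   docker ps --format '{{.Names}}'
// report which expected containers are not running.
function findStoppedContainers(dockerPsOutput: string, expected: string[]): string[] {
  const running = dockerPsOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  // Substring match, because Compose may add a project prefix or numeric suffix.
  return expected.filter(
    (wanted) => !running.some((containerName) => containerName.indexOf(wanted) !== -1),
  );
}

const observabilityContainers = ["grafana", "prometheus", "loki", "promtail"];
// Made-up output in which Promtail has failed to start.
const exampleOutput = "grafana\nprometheus\nloki\n";
console.log(findStoppedContainers(exampleOutput, observabilityContainers)); // [ 'promtail' ]
```

An empty result means every expected container appears in the `docker ps` output.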

Accessing Services

| Service | URL | Credentials | Description |
|---|---|---|---|
| Grafana | http://localhost:3001 | admin / admin | Main dashboard for visualization |
| Prometheus | http://localhost:9090 | N/A | Raw metrics and target status |
| Loki | http://localhost:3100 | N/A | Log aggregation API (no UI) |

Using Grafana

Initial Setup

  1. Login: Open http://localhost:3001 and log in with admin / admin
  2. Change Password: You'll be prompted to change the default password (recommended)
  3. Verify Datasources:
    • Navigate to Configuration → Data Sources
    • Ensure both Prometheus and Loki are connected

Exploring Data

Go to Explore (compass icon) in the sidebar:

  • Select Loki from the datasource dropdown to search logs
  • Select Prometheus from the datasource dropdown to query metrics

Viewing Logs (Loki)

In the Explore view with Loki selected:

  1. Click Label browser
  2. Select a label, e.g., container
  3. Choose a specific container (e.g., iam-service or traefik)
  4. Click Show logs

LogQL Query Examples:

{container="iam-service"}
{container="iam-service"} |= "error"
{container="iam-service"} |= "error" | json
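
The three queries above share one shape: a label selector in braces, then optional line filters, then an optional parser stage. As a minimal sketch of that structure, the hypothetical `buildLogQL` helper below (not part of Loki or Grafana) assembles such a query from its parts.

```typescript
// Hypothetical helper for illustration; not part of Loki or Grafana.
type LogParser = "json" | "logfmt";

interface LogQuery {
  labels: Record<string, string>; // stream selector, e.g. { container: "iam-service" }
  contains?: string[];            // line filters, each rendered as |= "text"
  parser?: LogParser;             // optional parser stage, appended last
}

// Escape backslashes and double quotes so the value stays a valid quoted string.
function quoteLogQL(value: string): string {
  return '"' + value.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

function buildLogQL(q: LogQuery): string {
  const selector = Object.keys(q.labels)
    .map((key) => `${key}=${quoteLogQL(q.labels[key])}`)
    .join(", ");
  const filters = (q.contains ?? []).map((text) => ` |= ${quoteLogQL(text)}`).join("");
  const parserStage = q.parser ? ` | ${q.parser}` : "";
  return `{${selector}}${filters}${parserStage}`;
}

console.log(buildLogQL({ labels: { container: "iam-service" }, contains: ["error"], parser: "json" }));
// {container="iam-service"} |= "error" | json
```

The printed string can be pasted into the Explore query field as-is.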

Viewing Metrics (Prometheus)

In the Explore view with Prometheus selected:

  1. Type a metric name in the query field (e.g., up, or container_memory_usage_bytes if a container exporter such as cAdvisor is being scraped)
  2. Click Run query

PromQL Query Examples:

up

rate(http_requests_total[5m])

container_memory_usage_bytes{container="iam-service"}
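
To make the `rate(...[5m])` example concrete: `rate` returns the average per-second increase of a counter over the window. The sketch below reproduces that arithmetic in simplified form. It is an illustration only: Prometheus additionally extrapolates to the window boundaries, which is omitted here, and the sample values are made up.

```typescript
// Simplified illustration of what rate() computes; real Prometheus also
// extrapolates to the window boundaries, which this sketch omits.
interface Sample {
  timestampSec: number; // Unix time of the scrape, in seconds
  value: number;        // counter value at that scrape
}

function simpleRate(samples: Sample[]): number {
  if (samples.length < 2) return NaN; // a rate needs at least two samples
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].value - samples[i - 1].value;
    // A counter only goes up; a drop means the process restarted from zero.
    increase += delta >= 0 ? delta : samples[i].value;
  }
  const elapsed = samples[samples.length - 1].timestampSec - samples[0].timestampSec;
  if (elapsed <= 0) return NaN;
  return increase / elapsed;
}

// Four scrapes at the 15s interval from the Data Flow section (values are made up).
const scrapes: Sample[] = [
  { timestampSec: 0, value: 100 },
  { timestampSec: 15, value: 130 },
  { timestampSec: 30, value: 160 },
  { timestampSec: 45, value: 190 },
];
console.log(simpleRate(scrapes)); // 2 (90 requests over 45 seconds)
```

Treating a drop in value as a restart is why `rate` should only be applied to counters, never to gauges such as memory usage.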

Configuration

File Locations

  • Prometheus: infra/observability/prometheus/prometheus.yml
  • Promtail: infra/observability/promtail/promtail-config.yml
  • Grafana: infra/observability/grafana/

Custom Metrics

To expose custom metrics from your service:

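import { Counter, Histogram } from 'prom-client';

// Counts every handled request, labelled by method, route, and status code.
const requestCounter = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status']
});

// Records request latency in seconds; prom-client applies its default
// buckets when none are given.
const requestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route']
});

// Example: record one request (the label values here are illustrative).
const stopTimer = requestDuration.startTimer({ method: 'GET', route: '/health' });
requestCounter.inc({ method: 'GET', route: '/health', status: '200' });
stopTimer();

// To let Prometheus scrape these, serve the output of prom-client's
// register.metrics() from the service's /metrics route.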
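
When Prometheus scrapes `/metrics`, it reads a plain-text payload. In a real service prom-client generates that payload; the sketch below hand-renders a single counter only to show roughly what a scrape returns. `renderCounter` and the sample values are hypothetical, and label-value escaping is deliberately left out.

```typescript
// Hand-rolled rendering of the Prometheus text exposition format, for
// illustration only. In a real service prom-client generates this output.
interface CounterSample {
  labels: Record<string, string>;
  value: number;
}

function renderCounter(metric: string, help: string, samples: CounterSample[]): string {
  const lines = [`# HELP ${metric} ${help}`, `# TYPE ${metric} counter`];
  for (const sample of samples) {
    // Note: label values are not escaped here; real output escapes \, " and newlines.
    const labelText = Object.keys(sample.labels)
      .map((key) => `${key}="${sample.labels[key]}"`)
      .join(",");
    const selector = labelText.length > 0 ? `{${labelText}}` : "";
    lines.push(`${metric}${selector} ${sample.value}`);
  }
  return lines.join("\n") + "\n";
}

console.log(
  renderCounter("http_requests_total", "Total HTTP requests", [
    { labels: { method: "GET", route: "/health", status: "200" }, value: 42 },
  ]),
);
```

Each distinct combination of label values becomes its own line, and therefore its own time series in Prometheus.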

Custom Dashboards

Create custom dashboards in Grafana:

  1. Click + → Dashboard
  2. Add a panel
  3. Configure the query (Prometheus or Loki)
  4. Save the dashboard

Color Palette Reference

Diagrams use a dark color palette for better readability:

| Component Type | Fill Color | Stroke Color | Purpose |
|---|---|---|---|
| 🚀 Services | #e94560 | #c81e3b | Microservices (red) |
| 📊 Collectors | #f39c12 | #d68910 | Data collection (orange) |
| 💾 Storage | #3498db | #2874a6 | Storage (blue) |
| 📊 Visualization | #9b59b6 | #7d3c98 | Visualization (purple) |
| 📦 Subgraphs | #1a1a2e to #533483 | #16213e to #0f3460 | Logical groups |

All text uses color:#ffffff (white) for readability on dark backgrounds.

Quick Tips

Mermaid Common Issues

DO:

  • Use flowchart LR for left-to-right flow
  • Use sequenceDiagram for time-based interactions
  • Apply dark colors for better contrast
  • Use descriptive node IDs

DON'T:

  • Mix graph and flowchart syntax
  • Use special characters in node IDs without quotes
  • Forget the closing end keyword for subgraphs

LogQL Quick Reference

{label="value"}
{label="value"} |= "search"
{label="value"} |= "error" | json
{label="value"} | logfmt

PromQL Quick Reference

metric_name
metric_name{label="value"}
rate(metric_name[5m])
sum(metric_name) by (label)

Visual Indicators

  • 📊 Metrics: Numerical time-series data
  • 📝 Logs: Text-based event records
  • 🎯 Query: Search/filter operations
  • 🔍 Explore: Investigation interface
  • 📈 Dashboard: Pre-configured visualizations

Troubleshooting

Common Issues

  • ⚠️ No logs visible
    Symptoms: Grafana Explore shows no logs
    Solution: Check that Promtail is running: docker ps | grep promtail
  • 📊 Missing metrics
    Symptoms: Services don't appear in Prometheus targets
    Solution: Check the service's /metrics endpoint
  • 🔴 Container won't start
    Symptoms: docker ps doesn't list the container
    Solution: View logs: docker-compose logs <service-name>
  • 🌐 Network issue
    Symptoms: Services can't connect
    Solution: Create the network: docker network create microservices-network

Logs Not Appearing in Loki

  1. Check Promtail logs: docker logs promtail
  2. Verify container labels are correct
  3. Ensure services are on microservices-network
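
For step 2, Loki can report which values it has seen for a label through its HTTP API (`GET /loki/api/v1/label/container/values` on http://localhost:3100). As a rough sketch, the hypothetical helper below compares such a response with the container names you expect. The payload shown is made up.

```typescript
// Shape of Loki's label-values response, reduced to the fields used here.
interface LabelValuesResponse {
  status: string;
  data?: string[]; // may be absent when Loki has no values for the label
}

// Return the expected container names that Loki has not received logs from.
function findMissingContainers(response: LabelValuesResponse, expected: string[]): string[] {
  const present = response.data ?? [];
  return expected.filter((containerName) => present.indexOf(containerName) === -1);
}

// Made-up payload: Loki has seen traefik but nothing from iam-service.
const labelValues: LabelValuesResponse = { status: "success", data: ["traefik", "grafana"] };
console.log(findMissingContainers(labelValues, ["iam-service", "traefik"])); // [ 'iam-service' ]
```

A container missing from the list usually points at Promtail's discovery or relabeling configuration rather than at Loki itself.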

Metrics Not Appearing in Prometheus

  1. Check Prometheus targets: http://localhost:9090/targets
  2. Verify service exposes /metrics endpoint
  3. Check Prometheus scrape config
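
The targets page in step 1 is also available as JSON at http://localhost:9090/api/v1/targets. The sketch below, a hypothetical helper working on a reduced version of that response, lists every target that is not healthy together with its last scrape error. Job names and scrape URLs in the sample are assumptions, not values from this repository.

```typescript
// Subset of the Prometheus /api/v1/targets response used by this sketch.
interface ActiveTarget {
  labels: Record<string, string>; // includes "job" and "instance"
  scrapeUrl: string;
  health: "up" | "down" | "unknown";
  lastError: string;
}

interface TargetsResponse {
  status: string;
  data: { activeTargets: ActiveTarget[] };
}

// Return one readable line per target that Prometheus cannot scrape.
function describeUnhealthyTargets(response: TargetsResponse): string[] {
  return response.data.activeTargets
    .filter((target) => target.health !== "up")
    .map(
      (target) =>
        `${target.labels["job"]} (${target.scrapeUrl}): ${target.lastError || "no error reported"}`,
    );
}

// Made-up payload; job names and URLs are assumptions for illustration.
const sampleTargets: TargetsResponse = {
  status: "success",
  data: {
    activeTargets: [
      { labels: { job: "iam-service" }, scrapeUrl: "http://iam-service:3000/metrics", health: "up", lastError: "" },
      { labels: { job: "user-service" }, scrapeUrl: "http://user-service:3000/metrics", health: "down", lastError: "connection refused" },
    ],
  },
};
console.log(describeUnhealthyTargets(sampleTargets));
```

A service that does not appear in the response at all is missing from the scrape config, which is what step 3 checks.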

Grafana Shows "No Data"

  1. Verify datasource connection (Configuration → Data Sources)
  2. Check time range in query
  3. Ensure data exists in Prometheus/Loki
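
Steps 2 and 3 can be combined by querying Loki directly over an explicit time range, which takes Grafana out of the picture. The sketch below builds a URL for Loki's `/loki/api/v1/query_range` endpoint; the helper name and default values are hypothetical. Timestamps are sent as Unix nanoseconds, built as strings because the values exceed the safe integer range of a JavaScript number.

```typescript
// Hypothetical helper: build a Loki query_range URL covering the last
// `windowMinutes` before `endMs` (a Unix timestamp in milliseconds).
function lokiQueryRangeUrl(
  baseUrl: string,
  logql: string,
  endMs: number,
  windowMinutes: number,
  limit: number = 100,
): string {
  const startMs = endMs - windowMinutes * 60 * 1000;
  // Milliseconds to nanoseconds by appending six zeros, avoiding precision loss.
  const toNanos = (ms: number): string => `${Math.floor(ms)}000000`;
  const params: Array<[string, string]> = [
    ["query", logql],
    ["start", toNanos(startMs)],
    ["end", toNanos(endMs)],
    ["limit", String(limit)],
  ];
  const queryString = params
    .map((pair) => `${pair[0]}=${encodeURIComponent(pair[1])}`)
    .join("&");
  return `${baseUrl}/loki/api/v1/query_range?${queryString}`;
}

// Last 15 minutes of iam-service logs, relative to the current time.
console.log(lokiQueryRangeUrl("http://localhost:3100", '{container="iam-service"}', Date.now(), 15));
```

If a request to that URL returns log lines while the Grafana panel stays empty, the datasource or the panel's time range is the more likely culprit.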