goodgo-platform/monitoring/prometheus/prometheus.yml
Ho Ngoc Hai 90839cf542 feat(monitoring): add API latency Grafana dashboard and alerting rules
Create comprehensive Grafana dashboard for API latency monitoring with:
- p50/p95/p99 stat panels and time series for all endpoints
- Per-endpoint latency breakdown with route/method template variables
- Top 10 slowest endpoints table and bar chart (by p99)
- Request rate (by method) and error rate (4xx/5xx) panels
- Error rate percentage (5xx/total) with SLO threshold
- Latency heatmap and histogram distribution panels

Add Prometheus alerting rules:
- ApiLatencyP99High: p99 > 1s for 5m (warning)
- ApiEndpointLatencyP99High: per-endpoint p99 > 2s (warning)
- ApiLatencyP99Critical: p99 > 3s for 3m (critical/SLO breach)
- ApiErrorRate5xxHigh: 5xx rate > 1% for 5m (warning)

Fix api-overview.json, which used the wrong metric name
(http_request_duration_seconds → goodgo_api_request_duration_seconds).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-10 23:18:09 +07:00

25 lines · 605 B · YAML

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - 'alert-rules.yml'

scrape_configs:
  - job_name: 'goodgo-api'
    metrics_path: '/metrics'
    static_configs:
      # host.docker.internal for dev (API on host), api:3001 for prod (API in container)
      - targets: ['host.docker.internal:3001']
        labels:
          service: 'goodgo-api'
          environment: 'development'
      - targets: ['api:3001']
        labels:
          service: 'goodgo-api'
          environment: 'production'

  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
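The `rule_files` entry loads `alert-rules.yml`, whose contents are not shown here. As a hedged sketch only, the four rules listed in the commit message could be written roughly as follows. It assumes the API exposes a Prometheus histogram named `goodgo_api_request_duration_seconds` (the name given in the commit message) with `route` and `method` labels (suggested by the dashboard's template variables) and a hypothetical `status_code` label; the 5m rate window and the group name are also assumptions. The real file may differ.

```yaml
# Hypothetical sketch of alert-rules.yml, not the project's actual file.
# Assumed: histogram goodgo_api_request_duration_seconds with labels
# route, method and status_code (status_code is a guess), 5m rate window.
groups:
  - name: goodgo-api-alerts   # hypothetical group name
    rules:
      # p99 across all endpoints above 1s for 5 minutes
      - alert: ApiLatencyP99High
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(goodgo_api_request_duration_seconds_bucket[5m]))
          ) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "API p99 latency above 1s"

      # p99 per route/method above 2s; the commit message gives no
      # duration, so no 'for' clause is set here
      - alert: ApiEndpointLatencyP99High
        expr: |
          histogram_quantile(0.99,
            sum by (le, route, method) (rate(goodgo_api_request_duration_seconds_bucket[5m]))
          ) > 2
        labels:
          severity: warning
        annotations:
          summary: "p99 latency above 2s on {{ $labels.method }} {{ $labels.route }}"

      # p99 across all endpoints above 3s for 3 minutes (SLO breach)
      - alert: ApiLatencyP99Critical
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(goodgo_api_request_duration_seconds_bucket[5m]))
          ) > 3
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "API p99 latency above 3s (SLO breach)"

      # 5xx responses as a share of all requests above 1% for 5 minutes;
      # uses the histogram's _count series and the assumed status_code label
      - alert: ApiErrorRate5xxHigh
        expr: |
          sum(rate(goodgo_api_request_duration_seconds_count{status_code=~"5.."}[5m]))
            /
          sum(rate(goodgo_api_request_duration_seconds_count[5m]))
          > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "API 5xx error rate above 1%"
```

Relative `rule_files` paths are resolved against the directory containing prometheus.yml. Before reloading Prometheus, `promtool check config prometheus.yml` validates the main config together with the rule files it references, and `promtool check rules alert-rules.yml` checks the rule file on its own.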