Files
goodgo-platform/monitoring/grafana/provisioning/datasources/datasource.yml
Ho Ngoc Hai 9409706c58 feat(monitoring): add comprehensive alerting rules, Alertmanager, and DR validation
Expand production monitoring with full alert coverage for database connections,
Redis memory/connections, container resources, disk usage, service health, and
backup integrity. Add Alertmanager service with Slack routing for critical and
warning alerts, and add automated backup verification to the pg-backup cron
schedule. Update runbook with DR validation procedures and quarterly checklist.

- Expand Prometheus alert rules from 4 to 24 alerts across 7 groups
- Add Alertmanager container (prom/alertmanager:v0.27.0) with Slack routing
- Configure inhibition rules (critical suppresses warning for same service)
- Schedule automated backup verification at 04:00 UTC daily
- Add Alertmanager datasource to Grafana provisioning
- Update runbook with Section 9: DR Validation (automated + manual procedures)
- Add SLACK_WEBHOOK_URL and Grafana vars to .env.example

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-11 20:15:36 +07:00

33 lines
658 B
YAML

apiVersion: 1
datasources:
- name: Prometheus
uid: prometheus
type: prometheus
access: proxy
url: http://prometheus:9090
isDefault: true
editable: true
- name: Loki
uid: loki
type: loki
access: proxy
url: http://loki:3100
editable: true
jsonData:
derivedFields:
- datasourceUid: prometheus
matcherRegex: 'correlationId":"([^"]+)'
name: correlationId
url: '$${__value.raw}'
- name: Alertmanager
uid: alertmanager
type: alertmanager
access: proxy
url: http://alertmanager:9093
editable: true
jsonData:
implementation: prometheus