
GoodGo Platform Documentation

Getting Started

Document                 Description
-----------------------  ---------------------------------------------
Development Environment  Docker setup, local services, troubleshooting
Architecture             System design, data flow, module structure

API Reference

Document         Description
---------------  -----------------------------------------
API Endpoints    Complete REST API endpoint reference
API Error Codes  Error response format and all error codes

Operations

Document          Description
----------------  -----------------------------------------------
Deployment        Production deployment guide and checklists
Backup & Restore  Backup procedures and disaster recovery runbook

Audits

See audits/README.md for code quality, accessibility, and test coverage audit reports.