feat(ops): add database backup strategy and log aggregation stack

- Add pg-backup container with daily automated pg_dump (02:00 UTC) and 7-day retention
- Add backup/restore scripts with documented recovery procedure
- Add Loki + Promtail for centralized log aggregation from all Docker containers
- Add Loki as Grafana datasource with correlation ID derived fields
- Add Grafana logs dashboard with volume, error rate, HTTP request, and log viewer panels
- Configure Promtail to parse Pino structured JSON logs with level/context labels
- Enhance LoggerService with string-level formatter and service base field
- Configure 15-day log retention in Loki

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Ho Ngoc Hai
2026-04-08 04:04:32 +07:00
parent 7c9f682046
commit 775eb7b374
9 changed files with 563 additions and 0 deletions

docs/backup-restore.md Normal file

@@ -0,0 +1,102 @@
# Database Backup & Restore Procedures
## Overview
Automated daily PostgreSQL backups run inside the `pg-backup` Docker container. Each backup is a `pg_dump` archive in custom format with built-in compression, stored in the `pg_backups` Docker volume.
## Backup Configuration
| Setting | Default | Configured via |
|---------|---------|---------------------|
| Schedule | Daily at 02:00 UTC | Cron in `pg-backup` service |
| Retention | 7 days | `BACKUP_RETENTION_DAYS` |
| Format | Custom (`pg_dump --format=custom`) | — |
| Compression | Level 6 | — |
| Storage | `pg_backups` Docker volume | — |
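
The settings in the table can be sketched as a pair of shell functions. This is a hypothetical illustration only, not the contents of `/scripts/pg-backup.sh`: the function names `run_backup` and `prune_backups`, the cron line, and the reliance on `PGHOST`/`PGUSER`/`PGPASSWORD` for connection details are all assumptions.

```bash
# Hypothetical cron entry for the 02:00 schedule (assumes the container clock is UTC):
#   0 2 * * * /scripts/pg-backup.sh >> /var/log/pg-backup.log 2>&1

# run_backup DIR DB: dump DB into DIR in custom format at compression level 6.
# Connection details are assumed to come from the PG* environment variables.
run_backup() {
  dir="$1"
  db="$2"
  out="$dir/goodgo_$(date -u +%Y%m%d_%H%M%S).sql.gz"
  pg_dump --format=custom --compress=6 --file="$out" "$db"
}

# prune_backups DIR DAYS: delete goodgo_* files in DIR past the retention window.
# find's "-mtime +N" matches files whose age in whole days is greater than N.
prune_backups() {
  dir="$1"
  days="$2"
  find "$dir" -maxdepth 1 -type f -name 'goodgo_*' -mtime "+$days" -print -delete
}
```

A script built this way might end with `run_backup /backups goodgo && prune_backups /backups "${BACKUP_RETENTION_DAYS:-7}"`, so that old files are pruned only after a new dump succeeds.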
## Listing Backups
```bash
docker exec goodgo-pg-backup ls -lh /backups/
```
## Manual Backup
```bash
docker exec goodgo-pg-backup /scripts/pg-backup.sh
```
## Restore Procedure
### 1. Identify the backup to restore
```bash
docker exec goodgo-pg-backup ls -lht /backups/
```
### 2. Stop application services
```bash
docker compose stop ai-services
# Stop any NestJS API processes
```
### 3. Run restore
```bash
docker exec -it goodgo-pg-backup /scripts/pg-restore.sh /backups/goodgo_YYYYMMDD_HHMMSS.sql.gz
```
The restore script will:
- Terminate active database connections
- Drop and recreate the database
- Restore from the selected backup
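
The three steps above can be sketched as follows. This is an assumption about how such a script could be written, not the actual `/scripts/pg-restore.sh`: the function name `restore_db`, the `--no-owner` flag, and the use of the `postgres` maintenance database are illustrative choices, and connection details are again assumed to come from the `PG*` environment variables.

```bash
# restore_db BACKUP [DB]: hypothetical sketch of a destructive restore.
restore_db() {
  backup="$1"
  db="${2:-goodgo}"

  # 1. Terminate other sessions connected to the target database.
  #    (DB is interpolated directly; only pass trusted names.)
  psql -d postgres -v ON_ERROR_STOP=1 -c \
    "SELECT pg_terminate_backend(pid) FROM pg_stat_activity
     WHERE datname = '$db' AND pid <> pg_backend_pid();" || return 1

  # 2. Drop and recreate the database.
  dropdb --if-exists "$db" || return 1
  createdb "$db" || return 1

  # 3. Restore from the selected custom-format archive.
  pg_restore --no-owner --dbname="$db" "$backup"
}
```

Because step 2 discards the current database, a sketch like this should only be run after the application services have been stopped, as in step 2 of the procedure.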
### 4. Verify restore
```bash
docker exec goodgo-postgres psql -U goodgo -d goodgo -c '\dt'
docker exec goodgo-postgres psql -U goodgo -d goodgo -c 'SELECT count(*) FROM "User";'
```
### 5. Run Prisma migrations (if needed)
```bash
pnpm prisma migrate deploy
```
### 6. Restart services
```bash
docker compose up -d
```
## Backup Verification
Check the backup log:
```bash
docker exec goodgo-pg-backup cat /var/log/pg-backup.log
```
Verify backup integrity without restoring:
```bash
docker exec goodgo-pg-backup pg_restore --list /backups/goodgo_YYYYMMDD_HHMMSS.sql.gz
```
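To check every backup rather than a single file, the same `pg_restore --list` call can be looped over the backup directory. The helper below is a hypothetical sketch (the name `verify_backups` is not part of the repository) and assumes it runs somewhere `pg_restore` is on the `PATH`, such as inside the `goodgo-pg-backup` container.

```bash
# verify_backups DIR: list the table of contents of each goodgo_* archive in DIR.
# Prints OK/FAIL per file and returns non-zero if any archive is unreadable.
verify_backups() {
  dir="$1"
  failed=0
  for f in "$dir"/goodgo_*; do
    [ -e "$f" ] || continue   # no matching files: the glob stays literal
    if pg_restore --list "$f" >/dev/null 2>&1; then
      echo "OK   $f"
    else
      echo "FAIL $f"
      failed=1
    fi
  done
  return "$failed"
}
```

A readable table of contents shows that the archive header and index are intact. It does not prove that every row can be restored, so an occasional test restore into a scratch database is still worthwhile.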
## Disaster Recovery
For complete data loss (volume destroyed):
1. Retrieve a backup file from external storage (if configured)
2. Recreate the `pg_backups` volume and copy the backup file into it
3. Follow the restore procedure above
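
Step 2 could look like the sketch below. It assumes that bringing the `pg-backup` service up through Docker Compose recreates the `pg_backups` volume and mounts it at `/backups`; the helper name `stage_backup` is hypothetical.

```bash
# stage_backup FILE: copy a retrieved backup file into the pg-backup container.
stage_backup() {
  file="$1"
  [ -f "$file" ] || { echo "backup file not found: $file" >&2; return 1; }

  # Assumption: starting the service recreates the volume declared in the compose file.
  docker compose up -d pg-backup || return 1

  # Copy the file into the mounted backup directory and confirm it arrived.
  docker cp "$file" goodgo-pg-backup:/backups/ || return 1
  docker exec goodgo-pg-backup ls -lh "/backups/$(basename "$file")"
}
```

After staging, for example with `stage_backup ./goodgo_YYYYMMDD_HHMMSS.sql.gz`, the file can be restored with the procedure above.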
## Log Aggregation
Logs are aggregated via Loki + Promtail and viewable in Grafana:
- **Grafana**: http://localhost:3002 (dashboard: "GoodGo - Logs")
- **Loki**: http://localhost:3100
- **Log retention**: 15 days (configured in `monitoring/loki/loki-config.yml`)