feat(ops): add database backup strategy and log aggregation stack

- Add pg-backup container with daily automated pg_dump (02:00 UTC) and 7-day retention
- Add backup/restore scripts with documented recovery procedure
- Add Loki + Promtail for centralized log aggregation from all Docker containers
- Add Loki as Grafana datasource with correlation ID derived fields
- Add Grafana logs dashboard with volume, error rate, HTTP request, and log viewer panels
- Configure Promtail to parse Pino structured JSON logs with level/context labels
- Enhance LoggerService with string-level formatter and service base field
- Configure 15-day log retention in Loki

Co-Authored-By: Paperclip <noreply@paperclip.ing>
102
docs/backup-restore.md
Normal file
@@ -0,0 +1,102 @@
# Database Backup & Restore Procedures

## Overview

Automated daily PostgreSQL backups run inside the `pg-backup` Docker container, using `pg_dump` in its compressed custom format. Backups are stored in the `pg_backups` Docker volume.

## Backup Configuration

| Setting | Default | Configured via |
|---------|---------|----------------|
| Schedule | Daily at 02:00 UTC | Cron in `pg-backup` service |
| Retention | 7 days | `BACKUP_RETENTION_DAYS` |
| Format | Custom (`pg_dump --format=custom`) | n/a |
| Compression | Level 6 | n/a |
| Storage | `pg_backups` Docker volume | n/a |

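The settings in the table could translate into a script along the following lines. This is only a sketch and not the contents of the actual `/scripts/pg-backup.sh`: the `BACKUP_DIR` variable, the function names, and the reliance on libpq connection variables such as `PGHOST` and `PGUSER` are assumptions. Only `BACKUP_RETENTION_DAYS`, the custom format, compression level 6, and the `goodgo_YYYYMMDD_HHMMSS.sql.gz` naming come from this document.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a backup script; not the real /scripts/pg-backup.sh.
set -euo pipefail

BACKUP_DIR="${BACKUP_DIR:-/backups}"                  # assumed variable name
BACKUP_RETENTION_DAYS="${BACKUP_RETENTION_DAYS:-7}"   # documented setting

# Build a UTC-timestamped file name in the pattern used by this document.
backup_filename() {
  printf 'goodgo_%s.sql.gz\n' "$(date -u +%Y%m%d_%H%M%S)"
}

# Delete backups whose age exceeds the retention window.
# find truncates ages to whole days, so "-mtime +7" matches files
# that are at least 8 days old.
prune_old_backups() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name 'goodgo_*.sql.gz' -mtime +"$days" -delete
}

# Dump the database in custom format at compression level 6, then prune.
# Connection details are taken from the standard libpq variables
# (PGHOST, PGUSER, PGPASSWORD); the database name defaults to "goodgo".
run_backup() {
  local target
  target="${BACKUP_DIR}/$(backup_filename)"
  pg_dump --format=custom --compress=6 --file="$target" "${PGDATABASE:-goodgo}"
  prune_old_backups "$BACKUP_DIR" "$BACKUP_RETENTION_DAYS"
}
```

Pruning only after a successful dump means a failing backup run never removes the older backups that are still needed.
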
## Listing Backups

```bash
docker exec goodgo-pg-backup ls -lh /backups/
```

## Manual Backup

```bash
docker exec goodgo-pg-backup /scripts/pg-backup.sh
```

## Restore Procedure

### 1. Identify the backup to restore

List the available backups, newest first:

```bash
docker exec goodgo-pg-backup ls -lht /backups/
```

### 2. Stop application services

```bash
docker compose stop ai-services
# Stop any NestJS API processes
```

### 3. Run restore

```bash
docker exec -it goodgo-pg-backup /scripts/pg-restore.sh /backups/goodgo_YYYYMMDD_HHMMSS.sql.gz
```

The restore script will:
- Terminate active database connections
- Drop and recreate the database
- Restore from the selected backup

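The three steps listed for the restore script can be sketched as follows. This illustrates the general technique under stated assumptions and is not the actual `/scripts/pg-restore.sh`: the function name, the use of the `postgres` maintenance database, and connection settings coming from libpq variables (`PGHOST`, `PGUSER`, `PGPASSWORD`) are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the restore steps; not the real /scripts/pg-restore.sh.
set -euo pipefail

restore_backup() {
  local backup_file="$1"
  local db="${PGDATABASE:-goodgo}"

  [ -f "$backup_file" ] || { echo "backup not found: $backup_file" >&2; return 1; }

  # 1. Terminate active connections to the target database.
  #    Connect to the "postgres" maintenance database (assumed to exist)
  #    so that this session is not itself connected to the target.
  psql --dbname=postgres --set=ON_ERROR_STOP=1 --set=db="$db" <<'SQL'
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = :'db' AND pid <> pg_backend_pid();
SQL

  # 2. Drop and recreate the database.
  dropdb --if-exists "$db"
  createdb "$db"

  # 3. Restore the custom-format archive into the empty database.
  pg_restore --dbname="$db" "$backup_file"
}
```

Terminating connections first matters because `dropdb` refuses to drop a database that still has active sessions.
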
### 4. Verify restore

Confirm that the tables exist and that a core table contains rows:

```bash
docker exec goodgo-postgres psql -U goodgo -d goodgo -c '\dt'
docker exec goodgo-postgres psql -U goodgo -d goodgo -c 'SELECT count(*) FROM "User";'
```

### 5. Run Prisma migrations (if needed)

If the backup predates the latest schema migrations, apply the pending ones:

```bash
pnpm prisma migrate deploy
```

### 6. Restart services

```bash
docker compose up -d
```

## Backup Verification

Check the backup log:

```bash
docker exec goodgo-pg-backup cat /var/log/pg-backup.log
```

Check that a backup archive is readable without restoring it, by listing its table of contents:

```bash
docker exec goodgo-pg-backup pg_restore --list /backups/goodgo_YYYYMMDD_HHMMSS.sql.gz
```

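To apply the same readability check to every backup at once, a loop such as the one below could be used. This is a sketch: the `verify_backups` function name is hypothetical, and it has to run somewhere that `pg_restore` and the backup directory are both available, for example inside the `pg-backup` container.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "pg_restore --list" against every backup file.
set -euo pipefail

verify_backups() {
  local dir="$1" failed=0 file
  for file in "$dir"/goodgo_*.sql.gz; do
    # When nothing matches, the glob stays literal; skip it.
    [ -e "$file" ] || continue
    if pg_restore --list "$file" >/dev/null 2>&1; then
      echo "OK      $file"
    else
      echo "FAILED  $file"
      failed=1
    fi
  done
  # Non-zero exit status if any archive could not be read.
  return "$failed"
}
```

Listing the table of contents shows that the archive header and catalog are readable. It does not prove that every table's data restores cleanly, so an occasional test restore into a scratch database remains the stronger check.
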
## Disaster Recovery

For complete data loss (volume destroyed):

1. Retrieve the backup from external storage (if configured)
2. Recreate the `pg_backups` volume and copy the backup file into it
3. Follow the restore procedure above

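Step 2 could look like the sketch below, which recreates the volume and copies a retrieved backup file into it through a throwaway container. The function name, the `alpine` helper image, and the `/restore-src` mount point are assumptions. Docker Compose usually prefixes volume names with the project name, so check the real name with `docker volume ls` before relying on the `pg_backups` default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a backup file from the host into a Docker volume.
set -euo pipefail

copy_backup_into_volume() {
  local backup_file="$1"
  local volume="${2:-pg_backups}"   # actual volume name may carry a compose prefix

  [ -f "$backup_file" ] || { echo "backup not found: $backup_file" >&2; return 1; }

  local src_dir src_name
  src_dir="$(cd "$(dirname "$backup_file")" && pwd)"   # absolute path for the bind mount
  src_name="$(basename "$backup_file")"

  # Creating a volume that already exists is a no-op.
  docker volume create "$volume" >/dev/null

  # Mount the volume and the source directory (read-only), then copy.
  docker run --rm \
    --volume "${volume}:/backups" \
    --volume "${src_dir}:/restore-src:ro" \
    alpine cp "/restore-src/${src_name}" /backups/
}
```

A call such as `copy_backup_into_volume ./goodgo_YYYYMMDD_HHMMSS.sql.gz` would leave the file at `/backups/` inside the volume, where the restore procedure expects it.
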
## Log Aggregation

Logs are aggregated via Loki + Promtail and viewable in Grafana:

- **Grafana**: http://localhost:3002 (dashboard: "GoodGo - Logs")
- **Loki**: http://localhost:3100
- **Log retention**: 15 days (configured in `monitoring/loki/loki-config.yml`)

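Besides Grafana, Loki can be queried directly over its HTTP API, which is handy for a quick check from a terminal. The sketch below wraps the `/ready` and `/loki/api/v1/query_range` endpoints. The function names and the `LOKI_URL` variable are hypothetical, and the `level="error"` selector in the usage note assumes that Promtail exposes `level` as a label with string values, as the commit message suggests.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for querying Loki from the command line.
set -euo pipefail

LOKI_URL="${LOKI_URL:-http://localhost:3100}"   # assumed variable name

# Succeeds once Loki reports that it is ready to serve requests.
loki_ready() {
  curl --silent --fail "${LOKI_URL}/ready" >/dev/null
}

# Run a LogQL range query. Without explicit start/end parameters,
# Loki searches the last hour.
loki_query() {
  local logql="$1" limit="${2:-20}"
  curl --silent --fail --get "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode "query=${logql}" \
    --data-urlencode "limit=${limit}"
}
```

For example, `loki_query '{level="error"}' 5` would request up to five recent error entries as JSON.
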