
Observability — Read-Model / Projector (RFC-003 Phase 0)

Grafana dashboards and wiring notes for the read-model observability stack introduced in GOO-192 under GOO-94 §6 Phase 0.

Metrics

All metrics live in the existing NestJS metrics/ module (apps/api/src/modules/metrics/) and are scraped via the standard /metrics endpoint.

Metric | Type | Labels | Purpose
--- | --- | --- | ---
read_model_projector_lag_seconds | Gauge | handler | Seconds between the latest source event and the projector cursor.
read_model_refresh_duration_seconds | Histogram | view | Duration of read-model / materialised view refreshes, in seconds.
read_model_reconciliation_drift_total | Counter | model | Number of drift discrepancies found during reconciliation.
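To make the three metric types concrete, here is a minimal, dependency-free sketch of what each family stores and how it shows up on /metrics. `InMemoryReadModelMetrics` is a hypothetical stand-in, not the real MetricsService (which may well wrap a Prometheus client library); the histogram bucket bounds and the default drift count of 1 are assumptions.

```typescript
// Hypothetical, dependency-free stand-in for the three read-model metrics.
// Not the real MetricsService: it only illustrates what each metric type
// stores and how it appears in the Prometheus text format on /metrics.
class InMemoryReadModelMetrics {
  // Gauge: last value wins, one series per `handler` label value.
  private readonly lag = new Map<string, number>();
  // Histogram: cumulative bucket counts plus _sum and _count, one series per `view`.
  private readonly refresh = new Map<string, { buckets: number[]; sum: number; count: number }>();
  // Counter: only ever increases, one series per `model`.
  private readonly drift = new Map<string, number>();
  // Bucket upper bounds in seconds (assumed; the real module may use others).
  private readonly bounds: number[];

  constructor(bounds: number[] = [0.1, 0.5, 1, 5, 30]) {
    this.bounds = bounds;
  }

  setProjectorLag(handler: string, lagSeconds: number): void {
    this.lag.set(handler, lagSeconds);
  }

  recordReadModelRefresh(view: string, durationSeconds: number): void {
    const h = this.refresh.get(view) ?? { buckets: this.bounds.map(() => 0), sum: 0, count: 0 };
    // An observation lands in every bucket whose upper bound it does not exceed.
    this.bounds.forEach((le, i) => {
      if (durationSeconds <= le) h.buckets[i] += 1;
    });
    h.sum += durationSeconds;
    h.count += 1;
    this.refresh.set(view, h);
  }

  // Defaulting the optional count to 1 is an assumption.
  recordReconciliationDrift(model: string, count = 1): void {
    if (count < 0) throw new Error("counters cannot decrease");
    this.drift.set(model, (this.drift.get(model) ?? 0) + count);
  }

  // HELP/TYPE headers are written even when a family has no samples yet.
  render(): string {
    const out: string[] = [];
    const g = "read_model_projector_lag_seconds";
    out.push(`# HELP ${g} Seconds between latest source event and projector cursor.`);
    out.push(`# TYPE ${g} gauge`);
    for (const [handler, v] of this.lag) out.push(`${g}{handler="${handler}"} ${v}`);

    const h = "read_model_refresh_duration_seconds";
    out.push(`# HELP ${h} Duration of read-model / materialised view refreshes.`);
    out.push(`# TYPE ${h} histogram`);
    for (const [view, s] of this.refresh) {
      this.bounds.forEach((le, i) => out.push(`${h}_bucket{view="${view}",le="${le}"} ${s.buckets[i]}`));
      out.push(`${h}_bucket{view="${view}",le="+Inf"} ${s.count}`);
      out.push(`${h}_sum{view="${view}"} ${s.sum}`);
      out.push(`${h}_count{view="${view}"} ${s.count}`);
    }

    const c = "read_model_reconciliation_drift_total";
    out.push(`# HELP ${c} Count of drift discrepancies found during reconciliation.`);
    out.push(`# TYPE ${c} counter`);
    for (const [model, v] of this.drift) out.push(`${c}{model="${model}"} ${v}`);
    return out.join("\n") + "\n";
  }
}
```

The gauge is overwritten on every call, while the histogram and counter only accumulate; that difference is why the dashboard plots the gauge directly but applies `rate`/`increase` to the other two.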

Emit points

Inject MetricsService and call:

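metrics.setProjectorLag(handler, lagSeconds);            // gauge, label: handler
metrics.recordReadModelRefresh(view, durationSeconds);   // histogram observation, label: view
metrics.recordReconciliationDrift(model, count);         // counter increment, label: model; count is optional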
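A sketch of how a projector or refresh job might feed these calls. The `ReadModelMetrics` interface only mirrors the three documented method names; its parameter types, and the helpers `reportProjectorLag`, `timedRefresh` and `reportDrift`, are hypothetical.

```typescript
// Hypothetical shape of the three MetricsService methods documented above.
// Parameter types are assumptions; check the real service before copying.
interface ReadModelMetrics {
  setProjectorLag(handler: string, lagSeconds: number): void;
  recordReadModelRefresh(view: string, durationSeconds: number): void;
  recordReconciliationDrift(model: string, count?: number): void;
}

// Lag = seconds between the newest source event and the event the projector
// cursor points at. Clamped at 0 so clock skew never yields a negative gauge.
function reportProjectorLag(
  metrics: ReadModelMetrics,
  handler: string,
  latestEventAt: Date,
  cursorEventAt: Date,
): number {
  const lagSeconds = Math.max(0, (latestEventAt.getTime() - cursorEventAt.getTime()) / 1000);
  metrics.setProjectorLag(handler, lagSeconds);
  return lagSeconds;
}

// Runs a refresh and observes its elapsed time in seconds. performance.now()
// is monotonic, so system clock adjustments do not distort the duration.
// The duration is recorded in `finally`, i.e. for failed refreshes as well.
async function timedRefresh<T>(
  metrics: ReadModelMetrics,
  view: string,
  refresh: () => Promise<T>,
): Promise<T> {
  const start = performance.now();
  try {
    return await refresh();
  } finally {
    metrics.recordReadModelRefresh(view, (performance.now() - start) / 1000);
  }
}

// Only increments the drift counter when reconciliation found discrepancies.
function reportDrift(metrics: ReadModelMetrics, model: string, discrepancies: number): void {
  if (discrepancies > 0) metrics.recordReconciliationDrift(model, discrepancies);
}
```

Whether failed refreshes should land in the same histogram is a judgement call; moving the `recordReadModelRefresh` call after the `await` would restrict it to successful runs.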

Dashboard

  • File: read-models-dashboard.json (Grafana schema v38).
  • Import into Grafana (Dashboards → Import → Upload JSON), pick the Prometheus data source.
  • Variables: handler, view, model — derived from Prometheus label values.
  • Panels:
    1. Projector lag by handler (time series + thresholded)
    2. Max projector lag (stat; RAG thresholds: amber at 30s, red at 120s)
    3. Refresh duration p50/p95 by view
    4. Refresh throughput (refreshes/sec) by view
    5. Reconciliation drift rate by model (15m rate)
    6. Total drift events in the last 24h (stat; RAG thresholds: amber at 1, red at 10)
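The panel list gives each panel's intent but not its query. The following is a hypothetical set of PromQL expressions that would produce those panels and variables, kept as TypeScript constants so they can be checked in tests; the 5m rate window and the exact selectors are assumptions, and the committed read-models-dashboard.json is authoritative. The `rag` helper encodes one plausible reading of the red/amber/green thresholds.

```typescript
// Hypothetical PromQL for the dashboard variables and panels; the queries in
// the committed read-models-dashboard.json are authoritative and may differ.
const variableQueries = {
  handler: "label_values(read_model_projector_lag_seconds, handler)",
  view: "label_values(read_model_refresh_duration_seconds_count, view)",
  model: "label_values(read_model_reconciliation_drift_total, model)",
};

const panelQueries = {
  // 1. Lag per handler; =~ lets a multi-value $handler selection match several series.
  projectorLagByHandler: 'max by (handler) (read_model_projector_lag_seconds{handler=~"$handler"})',
  // 2. Single worst lag across all handlers.
  maxProjectorLag: "max(read_model_projector_lag_seconds)",
  // 3. Quantiles are estimated from the cumulative _bucket series; `le` must survive the sum.
  refreshP50: 'histogram_quantile(0.5, sum by (le, view) (rate(read_model_refresh_duration_seconds_bucket{view=~"$view"}[5m])))',
  refreshP95: 'histogram_quantile(0.95, sum by (le, view) (rate(read_model_refresh_duration_seconds_bucket{view=~"$view"}[5m])))',
  // 4. Every observation increments _count, so its rate is refreshes per second.
  refreshThroughput: 'sum by (view) (rate(read_model_refresh_duration_seconds_count{view=~"$view"}[5m]))',
  // 5. Per-second drift rate averaged over 15 minutes.
  driftRate15m: 'sum by (model) (rate(read_model_reconciliation_drift_total{model=~"$model"}[15m]))',
  // 6. Drift events accumulated over the last 24 hours.
  driftEvents24h: "sum(increase(read_model_reconciliation_drift_total[24h]))",
};

type Rag = "green" | "amber" | "red";

// Assumed reading of "RAG a / b": green below a, amber from a, red from b
// (Grafana threshold steps apply once the value reaches the step).
function rag(value: number, amber: number, red: number): Rag {
  if (value >= red) return "red";
  if (value >= amber) return "amber";
  return "green";
}
```

For example, `rag(45, 30, 120)` returns `"amber"`: a 45-second worst-case lag is past the first threshold but not the second.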

Local verification

pnpm --filter @goodgo/api dev
curl -s http://localhost:3001/metrics | grep read_model_

All three metric families should appear with # HELP / # TYPE headers even before any samples are recorded.
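The grep above is a manual check. A minimal sketch of the same check as code, assuming the standard Prometheus text format: `missingFamilies` is a hypothetical helper that returns every read-model family lacking a `# HELP` line or a `# TYPE` line with the expected type.

```typescript
// Hypothetical check for the local verification step: given the body of
// GET /metrics, list read-model families missing # HELP or the expected # TYPE.
const EXPECTED_TYPES: Record<string, string> = {
  read_model_projector_lag_seconds: "gauge",
  read_model_refresh_duration_seconds: "histogram",
  read_model_reconciliation_drift_total: "counter",
};

function missingFamilies(exposition: string): string[] {
  const help = new Set<string>();
  const types = new Map<string, string>();
  for (const line of exposition.split("\n")) {
    // Comment lines look like "# HELP <name> <docstring>" or "# TYPE <name> <type>".
    const m = /^# (HELP|TYPE) (\S+)(?: (.*))?$/.exec(line.trim());
    if (!m) continue;
    if (m[1] === "HELP") help.add(m[2]);
    else types.set(m[2], (m[3] ?? "").trim());
  }
  return Object.entries(EXPECTED_TYPES)
    .filter(([name, type]) => !help.has(name) || types.get(name) !== type)
    .map(([name]) => name);
}

// Usage against the dev server (Node 18+ global fetch):
// const body = await (await fetch("http://localhost:3001/metrics")).text();
// console.log(missingFamilies(body)); // empty array once all three families are registered
```

Checking the `# TYPE` value as well as its presence also catches a metric accidentally registered as the wrong type, which a plain grep would not.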