feat(auth): add row/size caps + streaming to export-user-data

- Add per-collection row cap (default 10k, env EXPORT_ROW_CAP) via Prisma take on all findMany calls
- Add total size cap (default 100MB, env EXPORT_SIZE_CAP_MB); throws PayloadTooLargeException (413) when exceeded
- Convert response to Node.js Readable stream piped via NestJS StreamableFile to avoid large in-memory buffers
- Export ExportUserDataResult interface (stream + truncated flag) from handler
- Update controller to set Content-Type/Content-Disposition headers and return StreamableFile
- Document EXPORT_ROW_CAP and EXPORT_SIZE_CAP_MB env vars in Swagger
- Extend tests: row-cap assertion (take arg), size-cap 413 path, stream assertions

Fixes GOO-223 (M-1 from GOO-200 audit).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-24 12:10:54 +07:00
parent b4bb05479e
commit fa3ba88f40
34 changed files with 1494 additions and 45 deletions
--- a/apps/api/docs/observability/README.md
+++ b/apps/api/docs/observability/README.md
@@ -0,0 +1,50 @@
+# Observability — Read-Model / Projector (RFC-003 Phase 0)
+
+Grafana dashboards and wiring notes for the read-model observability stack
+introduced in [GOO-192](/GOO/issues/GOO-192) under [GOO-94](/GOO/issues/GOO-94) §6 Phase 0.
+
+## Metrics
+
+All metrics live in the existing NestJS `metrics/` module
+(`apps/api/src/modules/metrics/`) and are scraped via the standard `/metrics`
+endpoint.
+
+| Metric                                  | Type      | Labels    | Purpose                                                   |
+| --------------------------------------- | --------- | --------- | --------------------------------------------------------- |
+| `read_model_projector_lag_seconds`      | Gauge     | `handler` | Seconds between latest source event and projector cursor. |
+| `read_model_refresh_duration_seconds`   | Histogram | `view`    | Duration of read-model / materialised view refreshes.     |
+| `read_model_reconciliation_drift_total` | Counter   | `model`   | Count of drift discrepancies found during reconciliation. |
+
+### Emit points
+
+Inject `MetricsService` and call:
+
+```ts
+metrics.setProjectorLag(handler, lagSeconds);
+metrics.recordReadModelRefresh(view, durationSeconds);
+metrics.recordReconciliationDrift(model, count); // count is optional
+```
+
+## Dashboard
+
+- File: `read-models-dashboard.json` (Grafana schema v38).
+- Import into Grafana (`Dashboards → Import → Upload JSON`), pick the Prometheus
+  data source.
+- Variables: `datasource` (Prometheus data source picker), plus `handler`, `view`, and `model`, populated from Prometheus label values.
+- Panels:
+  1. Projector lag by handler (time series, thresholds at 30s / 120s)
+  2. Max projector lag (stat, RAG 30s / 120s)
+  3. Refresh duration p50/p95 by view
+  4. Refresh throughput (refreshes/sec) by view
+  5. Reconciliation drift rate by model (15m rate)
+  6. Total drift events in last 24h (stat, RAG 1 / 10)
+
+## Local verification
+
+```bash
+pnpm --filter @goodgo/api dev
+curl -s http://localhost:3001/metrics | grep read_model_
+```
+
+All three metric families should appear with `# HELP` / `# TYPE` headers even
+before any samples are recorded.
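
The local verification step in the README above could be automated along these lines. This is only a sketch: it assumes the `/metrics` endpoint returns standard Prometheus text exposition output with `# TYPE <name> <type>` lines, and the helper names (`parseTypeHeaders`, `checkReadModelFamilies`) are hypothetical and not part of this commit. The expected types come from the README's metrics table.

```typescript
// Expected metric families and their Prometheus types, copied from the README table.
const EXPECTED_FAMILIES: Record<string, string> = {
  read_model_projector_lag_seconds: "gauge",
  read_model_refresh_duration_seconds: "histogram",
  read_model_reconciliation_drift_total: "counter",
};

// Collects `# TYPE <name> <type>` declarations from text exposition output.
function parseTypeHeaders(exposition: string): Record<string, string> {
  const types: Record<string, string> = {};
  for (const line of exposition.split("\n")) {
    const match = /^# TYPE (\S+) (\S+)\s*$/.exec(line);
    if (match) {
      types[match[1]] = match[2];
    }
  }
  return types;
}

// Returns one message per family that is absent or declared with an unexpected type.
function checkReadModelFamilies(exposition: string): string[] {
  const types = parseTypeHeaders(exposition);
  const problems: string[] = [];
  for (const name of Object.keys(EXPECTED_FAMILIES)) {
    const expected = EXPECTED_FAMILIES[name];
    const actual = types[name];
    if (actual === undefined) {
      problems.push(`${name}: missing`);
    } else if (actual !== expected) {
      problems.push(`${name}: expected ${expected}, got ${actual}`);
    }
  }
  return problems;
}
```

Feeding it the body of `curl -s http://localhost:3001/metrics` should yield an empty array when all three families are registered, including on a fresh process where no samples exist yet.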
--- a/apps/api/docs/observability/read-models-dashboard.json
+++ b/apps/api/docs/observability/read-models-dashboard.json
@@ -0,0 +1,77 @@
+{
+  "annotations": {
+    "list": [
+      {
+        "builtIn": 1,
+        "datasource": "-- Grafana --",
+        "enable": true,
+        "hide": true,
+        "iconColor": "rgba(0, 211, 255, 1)",
+        "name": "Annotations & Alerts",
+        "type": "dashboard"
+      }
+    ]
+  },
+  "editable": true,
+  "graphTooltip": 1,
+  "id": null,
+  "uid": "goodgo-read-models",
+  "title": "GoodGo · Read-Model Observability (RFC-003 Phase 0)",
+  "tags": ["goodgo", "rfc-003", "read-models", "observability"],
+  "timezone": "browser",
+  "schemaVersion": 38,
+  "version": 1,
+  "refresh": "30s",
+  "time": { "from": "now-6h", "to": "now" },
+  "templating": {
+    "list": [
+      { "name": "datasource", "type": "datasource", "query": "prometheus", "current": { "text": "Prometheus", "value": "Prometheus" } },
+      { "name": "handler", "type": "query", "datasource": "${datasource}", "query": "label_values(read_model_projector_lag_seconds, handler)", "includeAll": true, "multi": true, "refresh": 2 },
+      { "name": "view", "type": "query", "datasource": "${datasource}", "query": "label_values(read_model_refresh_duration_seconds_bucket, view)", "includeAll": true, "multi": true, "refresh": 2 },
+      { "name": "model", "type": "query", "datasource": "${datasource}", "query": "label_values(read_model_reconciliation_drift_total, model)", "includeAll": true, "multi": true, "refresh": 2 }
+    ]
+  },
+  "panels": [
+    {
+      "id": 1, "type": "timeseries", "title": "Projector lag (seconds) — by handler",
+      "datasource": "${datasource}", "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
+      "fieldConfig": { "defaults": { "unit": "s", "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }, { "color": "yellow", "value": 30 }, { "color": "red", "value": 120 }] } } },
+      "targets": [{ "expr": "read_model_projector_lag_seconds{handler=~\"$handler\"}", "legendFormat": "{{handler}}", "refId": "A" }]
+    },
+    {
+      "id": 2, "type": "stat", "title": "Max projector lag (current)",
+      "datasource": "${datasource}", "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
+      "fieldConfig": { "defaults": { "unit": "s", "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }, { "color": "yellow", "value": 30 }, { "color": "red", "value": 120 }] } } },
+      "options": { "reduceOptions": { "calcs": ["lastNotNull"] } },
+      "targets": [{ "expr": "max(read_model_projector_lag_seconds{handler=~\"$handler\"})", "refId": "A" }]
+    },
+    {
+      "id": 3, "type": "timeseries", "title": "Refresh duration p50/p95 — by view",
+      "datasource": "${datasource}", "gridPos": { "h": 8, "w": 12, "x": 0, "y": 8 },
+      "fieldConfig": { "defaults": { "unit": "s" } },
+      "targets": [
+        { "expr": "histogram_quantile(0.95, sum by (view, le) (rate(read_model_refresh_duration_seconds_bucket{view=~\"$view\"}[5m])))", "legendFormat": "p95 · {{view}}", "refId": "A" },
+        { "expr": "histogram_quantile(0.50, sum by (view, le) (rate(read_model_refresh_duration_seconds_bucket{view=~\"$view\"}[5m])))", "legendFormat": "p50 · {{view}}", "refId": "B" }
+      ]
+    },
+    {
+      "id": 4, "type": "timeseries", "title": "Refresh throughput (refreshes/sec) — by view",
+      "datasource": "${datasource}", "gridPos": { "h": 8, "w": 12, "x": 12, "y": 8 },
+      "fieldConfig": { "defaults": { "unit": "ops" } },
+      "targets": [{ "expr": "sum by (view) (rate(read_model_refresh_duration_seconds_count{view=~\"$view\"}[5m]))", "legendFormat": "{{view}}", "refId": "A" }]
+    },
+    {
+      "id": 5, "type": "timeseries", "title": "Reconciliation drift rate — by model",
+      "datasource": "${datasource}", "gridPos": { "h": 8, "w": 12, "x": 0, "y": 16 },
+      "fieldConfig": { "defaults": { "unit": "ops" } },
+      "targets": [{ "expr": "sum by (model) (rate(read_model_reconciliation_drift_total{model=~\"$model\"}[15m]))", "legendFormat": "{{model}}", "refId": "A" }]
+    },
+    {
+      "id": 6, "type": "stat", "title": "Total drift events (last 24h)",
+      "datasource": "${datasource}", "gridPos": { "h": 8, "w": 12, "x": 12, "y": 16 },
+      "fieldConfig": { "defaults": { "unit": "short", "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }, { "color": "yellow", "value": 1 }, { "color": "red", "value": 10 }] } } },
+      "options": { "reduceOptions": { "calcs": ["lastNotNull"] } },
+      "targets": [{ "expr": "sum by (model) (increase(read_model_reconciliation_drift_total{model=~\"$model\"}[24h]))", "legendFormat": "{{model}}", "refId": "A" }]
+    }
+  ]
+}
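
The RAG thresholds in panels 1, 2 and 6 can be reasoned about with a small sketch. It assumes Grafana's `absolute` threshold mode treats each step's `value` as an inclusive lower bound, with `null` standing for the base step, and that steps are listed in ascending order as they are in the dashboard JSON. The `activeColor` helper is hypothetical and written only to illustrate which colour a given reading would map to.

```typescript
// Shape of one entry in a panel's `thresholds.steps` array.
interface ThresholdStep {
  color: string;
  value: number | null; // null marks the base step (no lower bound)
}

// Steps copied from the projector-lag panels (ids 1 and 2).
const LAG_STEPS: ThresholdStep[] = [
  { color: "green", value: null },
  { color: "yellow", value: 30 },
  { color: "red", value: 120 },
];

// Steps copied from the 24h drift stat panel (id 6).
const DRIFT_STEPS: ThresholdStep[] = [
  { color: "green", value: null },
  { color: "yellow", value: 1 },
  { color: "red", value: 10 },
];

// Picks the colour of the last step whose lower bound the reading has reached.
// Assumes `steps` is non-empty and sorted by ascending value.
function activeColor(steps: ThresholdStep[], reading: number): string {
  let color = steps[0].color;
  for (const step of steps) {
    const lowerBound = step.value === null ? -Infinity : step.value;
    if (reading >= lowerBound) {
      color = step.color;
    } else {
      break;
    }
  }
  return color;
}
```

Under these assumptions a projector lag of exactly 30 seconds already shows as yellow, and a single drift event in the 24h window is enough to turn panel 6 yellow.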