diff --git a/docs/db/ERD.md b/docs/db/ERD.md new file mode 100644 index 0000000..bf4a0db --- /dev/null +++ b/docs/db/ERD.md @@ -0,0 +1,104 @@ +# GoodGo Platform — Entity Relationship Diagram + +> Generated from `prisma/schema.prisma` as part of [TEC-3010](/TEC/issues/TEC-3010) (Epic: [TEC-3006](/TEC/issues/TEC-3006) — Database Schema Hardening). + +- **Database**: PostgreSQL 16 + PostGIS +- **ORM**: Prisma +- **Models**: 41 +- **Enums**: 37 +- **Geospatial models**: `Property`, `ProjectDevelopment`, `IndustrialPark`, `POI`, `TransferListing`, `InfrastructureProject` (all use `geometry(Point, 4326)` indexed with GiST) + +## High-level domain map + +| Domain | Key models | +|---|---| +| Auth / identity | `User`, `Agent`, `RefreshToken`, `OAuthAccount`, `MfaChallenge` | +| Residential listings | `Property`, `PropertyMedia`, `Listing`, `PriceHistory`, `ProjectDevelopment` | +| Industrial / KCN | `IndustrialPark`, `IndustrialListing` | +| Transfer (furniture + premises) | `TransferListing`, `TransferItem` | +| Search & save | `SavedSearch`, `SavedListing` | +| Transactions / commerce | `Transaction`, `Inquiry`, `Lead`, `Order`, `Escrow`, `Payment` | +| Subscriptions | `Plan`, `Subscription`, `UsageRecord` | +| Messaging | `Conversation`, `ConversationParticipant`, `Message` | +| Analytics / AI | `Valuation`, `MarketIndex`, `Report`, `MacroeconomicData`, `InfrastructureProject` | +| Neighborhood | `POI`, `NeighborhoodScore` | +| Moderation / audit | `AdminAuditLog`, `ModerationAuditLog`, `Review` | +| Notifications | `NotificationLog`, `NotificationPreference` | +| System | `SystemSetting` | + +## Mermaid ERD + +```mermaid +erDiagram + User ||--o| Agent : has + User ||--o{ Listing : sells + User ||--o{ SavedSearch : owns + User ||--o| Subscription : holds + User ||--o{ Payment : makes + User ||--o{ Review : writes + User ||--o{ Inquiry : sends + User ||--o{ RefreshToken : owns + User ||--o{ OAuthAccount : links + User ||--o{ Transaction : buys + User ||--o{ Order : buys_or_sells + User ||--o{ MfaChallenge : authenticates + User ||--o{ TransferListing : lists + User ||--o{ Report : requests + User ||--o{ SavedListing : saves + User ||--o{ ProjectDevelopment : owns + User ||--o{ IndustrialPark : operates + + Agent ||--o{ Listing : represents + Agent ||--o{ Lead : manages + + ProjectDevelopment ||--o{ Property : contains + Property ||--o{ Listing : published_as + Property ||--o{ Valuation : valued_by + Property ||--o{ PropertyMedia : has + + Listing ||--o{ Transaction : spawns + Listing ||--o{ Inquiry : receives + Listing ||--o{ Order : generates + Listing ||--o{ PriceHistory : tracks + Listing ||--o{ SavedListing : saved_as + + Transaction ||--o{ Payment : settled_by + Order ||--o| Escrow : protected_by + Order ||--o{ Payment : paid_by + + Plan ||--o{ Subscription : sold_as + Subscription ||--o{ UsageRecord : meters + + IndustrialPark ||--o{ IndustrialListing : leases + + Conversation ||--o{ ConversationParticipant : has + Conversation ||--o{ Message : contains + + TransferListing ||--o{ TransferItem : contains +``` + +> The Mermaid diagram above intentionally omits cardinality-only links that do not have Prisma-level relations (e.g. `Review.targetId` polymorphic references, `ModerationAuditLog.targetId`, `AdminAuditLog.targetId`, `NotificationLog.userId`). + +## Detached / polymorphic references + +These fields reference other entities by id but lack a Prisma relation — they are polymorphic on purpose, but they sidestep referential integrity: + +- `Review.targetType` + `Review.targetId` → any listing/property/agent +- `ModerationAuditLog.targetType` + `.targetId` → any moderatable entity +- `AdminAuditLog.targetId` + `.targetType` → user/listing/subscription +- `NotificationLog.userId` → `User.id` (no FK) +- `ConversationParticipant.userId` → `User.id` (no FK) +- `Message.senderId` → `User.id` (no FK) +- `IndustrialListing.sellerId` / `.agentId` → `User.id` / `Agent.id` (no FK) +- `Lead.source` (string), `Review.targetType` (string) — no enum +- `Conversation.listingId` → `Listing.id` (no FK) + +See `schema-audit.md` → section **Findings** for severity. + +## Rendered diagram + +The Mermaid block above renders directly in GitHub, GitLab and most Markdown viewers. For an offline PNG/SVG, run: + +```bash +npx @mermaid-js/mermaid-cli -i docs/db/ERD.md -o docs/db/ERD.png +``` diff --git a/docs/db/schema-audit.md b/docs/db/schema-audit.md new file mode 100644 index 0000000..bbde97a --- /dev/null +++ b/docs/db/schema-audit.md @@ -0,0 +1,282 @@ +# GoodGo Platform — Prisma Schema Audit + +> Companion to [docs/db/ERD.md](./ERD.md). Generated for [TEC-3010](/TEC/issues/TEC-3010) (Epic [TEC-3006](/TEC/issues/TEC-3006)). + +Source: `prisma/schema.prisma` (1408 lines, 41 models, 37 enums) as of 2026-04-21. + +## Severity legend + +| Level | Meaning | +|---|---| +| **High** | Data-integrity, PII, or auth/billing correctness risk. Fix before next prod migration. | +| **Medium** | Query-perf, referential-integrity, or cleanup risk. Schedule within current epic. | +| **Low** | Cosmetic, consistency, or doc-only; track as follow-up. | + +## Summary by model + +| # | Model | # Findings | Max severity | +|---|---|---|---| +| 1 | User | 4 | High | +| 2 | MfaChallenge | 1 | Low | +| 3 | RefreshToken | 1 | Medium | +| 4 | OAuthAccount | 2 | Medium | +| 5 | Agent | 2 | Medium | +| 6 | ProjectDevelopment | 2 | Medium | +| 7 | Property | 3 | Medium | +| 8 | PropertyMedia | 1 | Low | +| 9 | Listing | 2 | Medium | +| 10 | PriceHistory | 1 | Low | +| 11 | SavedSearch | 1 | Low | +| 12 | SavedListing | 0 | — | +| 13 | Transaction | 2 | Medium | +| 14 | Inquiry | 2 | Medium | +| 15 | Lead | 2 | Medium | +| 16 | Payment | 2 | High | +| 17 | Order | 2 | Medium | +| 18 | Escrow | 1 | Medium | +| 19 | Plan | 1 | Low | +| 20 | Subscription | 1 | Medium | +| 21 | UsageRecord | 2 | Medium | +| 22 | Valuation | 1 | Low | +| 23 | MarketIndex | 1 | Low | +| 24 | NotificationLog | 2 | Medium | +| 25 | NotificationPreference | 1 | Low | +| 26 | AdminAuditLog | 1 | Medium | +| 27 | ModerationAuditLog | 2 | Medium | +| 28 | POI | 1 | Low | +| 29 | NeighborhoodScore | 1 | Low | +| 30 | Review | 3 | High | +| 31 | IndustrialPark | 1 | Low | +| 32 | IndustrialListing | 3 | High | +| 33 | Conversation | 2 | Medium | +| 34 | ConversationParticipant | 1 | Medium | +| 35 | Message | 2 | Medium | +| 36 | TransferListing | 2 | Medium | +| 37 | TransferItem | 1 | Low | +| 38 | Report | 1 | Low | +| 39 | MacroeconomicData | 1 | Low | +| 40 | InfrastructureProject | 2 | Medium | +| 41 | SystemSetting | 2 | High | + +## Findings + +### 1. User — High + +- **[High]** `email` is nullable and not unique, while `emailHash` carries the uniqueness. Application code must hash every lookup; add a DB-level CHECK that `(email IS NULL) = (emailHash IS NULL)` to prevent divergence. +- **[High]** `phone` is stored plaintext AND `phoneHash` is unique — duplicate-safe, but plaintext phone should be flagged for PII encryption at rest (roadmap: TDE or app-layer envelope). +- **[Medium]** `totpSecret` comment says "Encrypted TOTP secret" but schema has no marker to enforce that; add a Prisma-level `/// ENCRYPTED` doc comment convention and a lint. +- **[Medium]** `totpBackupCodes String[]` — array mutation races: when a backup code is consumed, an atomic `array_remove` is needed at the SQL layer; document. + +### 2. MfaChallenge — Low + +- **[Low]** `type` is a free-form string (`"totp" | "backup_code"`). Convert to enum. + +### 3. RefreshToken — Medium + +- **[Medium]** `token` is `@unique String` — should be stored as SHA-256 hash rather than raw token to mitigate DB leak. Add note or rename field to `tokenHash`. + +### 4. OAuthAccount — Medium + +- **[Medium]** `accessToken` / `refreshToken` stored plaintext — must be encrypted at rest; same encryption story as `User.totpSecret`. +- **[Low]** `expiresAt` is `DateTime?` but no index; a sweeper job that deletes expired provider tokens will do a seq scan. + +### 5. Agent — Medium + +- **[Medium]** `serviceAreas Json` — lose searchability. A `@@index` via `gin` on the JSONB would help, or normalize into a join table `AgentServiceArea`. +- **[Low]** `qualityScore Float @default(0)` — no CHECK `0 ≤ qualityScore ≤ 100`. + +### 6. ProjectDevelopment — Medium + +- **[Medium]** Denormalized location columns (`ward`, `district`, `city`) lack a FK to a canonical `District` / `Ward` table. When canonical admin-region data ships, schedule a normalization pass. +- **[Low]** `minPrice`/`maxPrice` have no CHECK constraint; enforce `maxPrice >= minPrice` and `minPrice >= 0`. + +### 7. Property — Medium + +- **[Medium]** `addressNormalized` is nullable while a comment notes a backfill pending; track in follow-up issue and add `@@index([addressNormalized])` exists ✓ but partial index excluding NULL would save space. +- **[Medium]** `viewType String[]`, `suitableFor String[]` are unbounded — consider enums + join table to allow facet search. +- **[Low]** `areaM2 Float` — CHECK `areaM2 > 0` missing. + +### 8. PropertyMedia — Low + +- **[Low]** `type String` with literal `"image" | "video"` — convert to enum. + +### 9. Listing — Medium + +- **[Medium]** `sellerId onDelete: Restrict` but `agentId onDelete: SetNull` — inconsistent. User deletion flow will fail on any listing. +- **[Low]** Many CHECKs added in recent migration (see inline comments) — good; keep `pricePerM2` derivation consistent with `priceVND / property.areaM2`. + +### 10. PriceHistory — Low + +- **[Low]** No `agentId` / `actorId` — cannot attribute a price change. + +### 11. SavedSearch — Low + +- **[Low]** `filters Json` is unbounded; no schema-level validation. + +### 12. SavedListing + +- No findings. + +### 13. Transaction — Medium + +- **[Medium]** `buyerId onDelete: Restrict` — user-deletion workflow must move or anonymize. +- **[Medium]** Missing `sellerId` direct FK — must resolve through `listing.sellerId`. Denormalize for reporting. + +### 14. Inquiry — Medium + +- **[Medium]** `phone` plaintext PII — hash or encrypt at rest. +- **[Low]** No `status` field (read/responded/closed) — relied on `isRead` only. + +### 15. Lead — Medium + +- **[Medium]** `phone` plaintext + `phoneHash` optional — make hash NOT NULL to match `User` pattern. +- **[Low]** `source String` free-form; convert to enum (`referral | organic | paid | imported`). + +### 16. Payment — High + +- **[High]** `idempotencyKey` is optional on a unique composite `(userId, provider, idempotencyKey)` — `NULL` is distinct in PG, so duplicate payments with NULL key are still possible. Enforce NOT NULL or fall back to a server-generated UUID. +- **[Medium]** `amountVND BigInt` has no CHECK `amountVND > 0`. + +### 17. Order — Medium + +- **[Medium]** `amountVND = platformFeeVND + sellerPayoutVND` invariant is not enforced at DB; add trigger or CHECK. +- **[Medium]** `Restrict` on buyer/seller/listing blocks cascading user deletes. + +### 18. Escrow — Medium + +- **[Medium]** `status` transitions (PENDING → HELD → RELEASED/REFUNDED/DISPUTED) are not enforced; add a state-machine trigger or rely on an app-layer guard. + +### 19. Plan — Low + +- **[Low]** Price fields `BigInt` with no CHECK ≥ 0. + +### 20. Subscription — Medium + +- **[Medium]** `@@unique` missing on `(userId)` via `userId String @unique` — already ✓, but no index on `currentPeriodEnd` for renewal sweeps. + +### 21. UsageRecord — Medium + +- **[Medium]** `metric String` — convert to enum aligned with `Plan.max*` fields; otherwise drift between code/DB is invisible. +- **[Low]** Missing uniqueness on `(subscriptionId, metric, periodStart)` — duplicate counters possible. + +### 22. Valuation — Low + +- **[Low]** `comparables Json` unbounded — cap size, or move to S3. + +### 23. MarketIndex — Low + +- **[Low]** `period String` free-form (e.g., "2025-Q4") — enum or regex CHECK. + +### 24. NotificationLog — Medium + +- **[Medium]** `userId` has no FK (no `@relation`) — silent referential drift. +- **[Medium]** Missing partitioning strategy; this table will grow fast — add TimescaleDB or partition by `createdAt` quarter. + +### 25. NotificationPreference — Low + +- **[Low]** `eventType String` free-form; enum or CHECK. + +### 26. AdminAuditLog — Medium + +- **[Medium]** Polymorphic `targetId` + `targetType` lacks FK. Acceptable in audit tables but document the tradeoff. + +### 27. ModerationAuditLog — Medium + +- **[Medium]** `targetType String` / `action String` — as-designed (to avoid enum migrations), but add a lint to reject unknown values on write. +- **[Medium]** `id String @default(uuid())` while sibling table uses `cuid()` — pick one convention. + +### 28. POI — Low + +- **[Low]** `osmId @unique` nullable — two rows with null `osmId` will co-exist but that is desired for non-OSM POIs. + +### 29. NeighborhoodScore — Low + +- **[Low]** Each score field should have CHECK `0 ≤ value ≤ 10`, `totalScore` `0..100`. + +### 30. Review — High + +- **[High]** Polymorphic `targetType` + `targetId` without FK — highest-volume polymorphic surface, will accumulate dangling references on entity delete. Plan: create per-target-type child tables or enforce with trigger. +- **[Medium]** `rating Int` — CHECK `rating BETWEEN 1 AND 5`. +- **[Medium]** No `moderationStatus` / `hiddenAt` — operational gap for abuse response. + +### 31. IndustrialPark — Low + +- **[Low]** `occupancyRate Float` needs CHECK `0..100`; `remainingAreaHa` should be derivable from `leasableAreaHa - occupied` — stored-derived risk. + +### 32. IndustrialListing — High + +- **[High]** Missing `@relation` for `agentId` → `Agent.id` and `sellerId` → `User.id`. No FK = no cascade, no referential integrity. +- **[Medium]** No `priceVND` — only `priceUsdM2`; cross-currency reporting requires a rate snapshot column (`fxRateVndPerUsd`). +- **[Medium]** `pricingUnit String` — convert to enum. + +### 33. Conversation — Medium + +- **[Medium]** `listingId` has no `@relation` — can reference deleted listing. +- **[Low]** `subject` should probably be required when `listingId` is null (CHECK). + +### 34. ConversationParticipant — Medium + +- **[Medium]** `userId` lacks `@relation` to `User` — orphan participants possible. + +### 35. Message — Medium + +- **[Medium]** `senderId` no `@relation` to `User`. +- **[Low]** Soft-delete `deletedAt` should be paired with an index for "exclude deleted" queries. + +### 36. TransferListing — Medium + +- **[Medium]** `contactPhone`, `contactName` plaintext — PII policy alignment with `User.phone`. +- **[Low]** `businessType String?` / `footTraffic String?` — enum candidates. + +### 37. TransferItem — Low + +- **[Low]** `dimensions Json` — consider splitting into typed columns (`widthCm Float?`, etc.) for range queries. + +### 38. Report — Low + +- **[Low]** `params Json` + `content Json` — large reports should live in object storage with a pointer column to keep row size small (already has `pdfUrl`). + +### 39. MacroeconomicData — Low + +- **[Low]** `indicator String` + `unit String` free-form — enum for `indicator` at minimum. + +### 40. InfrastructureProject — Medium + +- **[Medium]** `category String` / `status String` should both be enums to align with `IndustrialPark`. +- **[Low]** `impactRadius Float` needs CHECK ≥ 0. + +### 41. SystemSetting — High + +- **[High]** Inline schema comment admits values are plaintext and `isSecret` not actually encrypted — all AI provider credentials live here in plaintext. Escalate to a dedicated secret-store or add envelope encryption before shipping more secrets into this table. +- **[Medium]** `updatedBy String?` has no FK to `User.id`. + +## Cross-cutting findings + +| # | Finding | Severity | +|---|---|---| +| X-1 | **Polymorphic target patterns** (`Review`, `AdminAuditLog`, `ModerationAuditLog`) lack DB-level integrity. Standardize on either per-target tables or nightly reconciliation. | High | +| X-2 | **Missing `@relation` on foreign-key-shaped fields** in `NotificationLog`, `ConversationParticipant`, `Message`, `Conversation.listingId`, `IndustrialListing.sellerId|agentId`, `SystemSetting.updatedBy`. | High | +| X-3 | **Plaintext PII / secrets** — `User.phone`, `Inquiry.phone`, `Lead.phone`, `OAuthAccount.accessToken/refreshToken`, `User.totpSecret` (claimed encrypted), `TransferListing.contactPhone`, `SystemSetting.value`. | High | +| X-4 | **Money fields** (`BigInt`) missing CHECK `> 0` across: `Payment.amountVND`, `Order.amountVND/platformFeeVND/sellerPayoutVND`, `Escrow.amountVND/feeVND`, `Plan.price*`, `Transaction.agreedPrice/depositAmount`. Listing and PriceHistory already have CHECKs (recent migration). | Medium | +| X-5 | **Stored derived values** — `IndustrialPark.occupancyRate/remainingAreaHa`, `Property.pricePerM2` — risk drift. Add triggers or move to views. | Medium | +| X-6 | **ID conventions** mixed — 40 tables use `cuid()`, `ModerationAuditLog` uses `uuid()`. Pick one. | Low | +| X-7 | **Enum vs free-form string** — `MfaChallenge.type`, `PropertyMedia.type`, `Lead.source`, `UsageRecord.metric`, `NotificationPreference.eventType`, `MarketIndex.period`, `InfrastructureProject.category/status`, `IndustrialListing.pricingUnit`. | Medium | +| X-8 | **High-volume append tables** (`NotificationLog`, `AdminAuditLog`, `ModerationAuditLog`, `PriceHistory`) have no partitioning / retention plan. | Medium | +| X-9 | **`onDelete: Restrict` on commerce edges** — `Listing.sellerId`, `Transaction.buyerId`, `Order.buyerId/sellerId/listingId`, `Payment.userId`. User-deletion workflow (`User.deletedAt`) cannot hard-delete; document as soft-delete-only. | Medium | +| X-10 | **Geospatial consistency** — all six geo models use `geometry(Point, 4326)` + GiST ✓. Consider SRID check constraint. | Low | + +## Acceptance checklist (TEC-3010) + +- [x] ERD generated from `prisma/schema.prisma` +- [x] `docs/db/ERD.md` committed (Mermaid + rendering notes) +- [x] `docs/db/schema-audit.md` committed with findings + severity per model (41 models) +- [x] Cross-cutting findings documented + +## Recommended follow-ups (subtasks under [TEC-3006](/TEC/issues/TEC-3006)) + +1. Enforce `(userId, provider, idempotencyKey)` uniqueness on `Payment` (X-4). +2. Add missing `@relation` FKs (X-2) — single migration, data-checked. +3. Enum migration sweep (X-7). +4. PII/secret encryption story (X-3) — RFC + migration. +5. Retention/partitioning plan for append-only tables (X-8). +6. Polymorphic target strategy decision (X-1).