docs(db): add ERD + schema audit for TEC-3010

Generated from prisma/schema.prisma (41 models, 37 enums):
- docs/db/ERD.md: Mermaid ERD + domain map
- docs/db/schema-audit.md: per-model findings with severity + 10 cross-cutting findings

Co-Authored-By: Paperclip <noreply@paperclip.ing>
This commit is contained in:
Database Architect
2026-04-21 00:53:21 +07:00
parent 26b6b37cee
commit 1a77ab625e
2 changed files with 386 additions and 0 deletions

104
docs/db/ERD.md Normal file
View File

@@ -0,0 +1,104 @@
# GoodGo Platform — Entity Relationship Diagram
> Generated from `prisma/schema.prisma` as part of [TEC-3010](/TEC/issues/TEC-3010) (Epic: [TEC-3006](/TEC/issues/TEC-3006) — Database Schema Hardening).
- **Database**: PostgreSQL 16 + PostGIS
- **ORM**: Prisma
- **Models**: 41
- **Enums**: 37
- **Geospatial models**: `Property`, `ProjectDevelopment`, `IndustrialPark`, `POI`, `TransferListing`, `InfrastructureProject` (all use `geometry(Point, 4326)` indexed with GiST)
## High-level domain map
| Domain | Key models |
|---|---|
| Auth / identity | `User`, `Agent`, `RefreshToken`, `OAuthAccount`, `MfaChallenge` |
| Residential listings | `Property`, `PropertyMedia`, `Listing`, `PriceHistory`, `ProjectDevelopment` |
| Industrial / KCN | `IndustrialPark`, `IndustrialListing` |
| Transfer (furniture + premises) | `TransferListing`, `TransferItem` |
| Search & save | `SavedSearch`, `SavedListing` |
| Transactions / commerce | `Transaction`, `Inquiry`, `Lead`, `Order`, `Escrow`, `Payment` |
| Subscriptions | `Plan`, `Subscription`, `UsageRecord` |
| Messaging | `Conversation`, `ConversationParticipant`, `Message` |
| Analytics / AI | `Valuation`, `MarketIndex`, `Report`, `MacroeconomicData`, `InfrastructureProject` |
| Neighborhood | `POI`, `NeighborhoodScore` |
| Moderation / audit | `AdminAuditLog`, `ModerationAuditLog`, `Review` |
| Notifications | `NotificationLog`, `NotificationPreference` |
| System | `SystemSetting` |
## Mermaid ERD
```mermaid
erDiagram
User ||--o| Agent : has
User ||--o{ Listing : sells
User ||--o{ SavedSearch : owns
User ||--o| Subscription : holds
User ||--o{ Payment : makes
User ||--o{ Review : writes
User ||--o{ Inquiry : sends
User ||--o{ RefreshToken : owns
User ||--o{ OAuthAccount : links
User ||--o{ Transaction : buys
User ||--o{ Order : buys_or_sells
User ||--o{ MfaChallenge : authenticates
User ||--o{ TransferListing : lists
User ||--o{ Report : requests
User ||--o{ SavedListing : saves
User ||--o{ ProjectDevelopment : owns
User ||--o{ IndustrialPark : operates
Agent ||--o{ Listing : represents
Agent ||--o{ Lead : manages
ProjectDevelopment ||--o{ Property : contains
Property ||--o{ Listing : published_as
Property ||--o{ Valuation : valued_by
Property ||--o{ PropertyMedia : has
Listing ||--o{ Transaction : spawns
Listing ||--o{ Inquiry : receives
Listing ||--o{ Order : generates
Listing ||--o{ PriceHistory : tracks
Listing ||--o{ SavedListing : saved_as
Transaction ||--o{ Payment : settled_by
Order ||--o| Escrow : protected_by
Order ||--o{ Payment : paid_by
Plan ||--o{ Subscription : sold_as
Subscription ||--o{ UsageRecord : meters
IndustrialPark ||--o{ IndustrialListing : leases
Conversation ||--o{ ConversationParticipant : has
Conversation ||--o{ Message : contains
TransferListing ||--o{ TransferItem : contains
```
> The Mermaid diagram above intentionally omits cardinality-only links that do not have Prisma-level relations (e.g. `Review.targetId` polymorphic references, `ModerationAuditLog.targetId`, `AdminAuditLog.targetId`, `NotificationLog.userId`).
## Detached / polymorphic references
These fields reference other entities by id but lack a Prisma relation — they are polymorphic on purpose, but they sidestep referential integrity:
- `Review.targetType` + `Review.targetId` → any listing/property/agent
- `ModerationAuditLog.targetType` + `.targetId` → any moderatable entity
- `AdminAuditLog.targetId` + `.targetType` → user/listing/subscription
- `NotificationLog.userId``User.id` (no FK)
- `ConversationParticipant.userId``User.id` (no FK)
- `Message.senderId``User.id` (no FK)
- `IndustrialListing.sellerId` / `.agentId``User.id` / `Agent.id` (no FK)
- `Lead.source` (string), `Review.targetType` (string) — no enum
- `Conversation.listingId``Listing.id` (no FK)
See `schema-audit.md` → section **Findings** for severity.
## Rendered diagram
The Mermaid block above renders directly in GitHub, GitLab and most Markdown viewers. For an offline PNG/SVG, run:
```bash
npx @mermaid-js/mermaid-cli -i docs/db/ERD.md -o docs/db/ERD.png
```

282
docs/db/schema-audit.md Normal file
View File

@@ -0,0 +1,282 @@
# GoodGo Platform — Prisma Schema Audit
> Companion to [docs/db/ERD.md](./ERD.md). Generated for [TEC-3010](/TEC/issues/TEC-3010) (Epic [TEC-3006](/TEC/issues/TEC-3006)).
Source: `prisma/schema.prisma` (1408 lines, 41 models, 37 enums) as of 2026-04-21.
## Severity legend
| Level | Meaning |
|---|---|
| **High** | Data-integrity, PII, or auth/billing correctness risk. Fix before next prod migration. |
| **Medium** | Query-perf, referential-integrity, or cleanup risk. Schedule within current epic. |
| **Low** | Cosmetic, consistency, or doc-only; track as follow-up. |
## Summary by model
| # | Model | # Findings | Max severity |
|---|---|---|---|
| 1 | User | 4 | High |
| 2 | MfaChallenge | 1 | Low |
| 3 | RefreshToken | 1 | Medium |
| 4 | OAuthAccount | 2 | Medium |
| 5 | Agent | 2 | Medium |
| 6 | ProjectDevelopment | 2 | Medium |
| 7 | Property | 3 | Medium |
| 8 | PropertyMedia | 1 | Low |
| 9 | Listing | 2 | Medium |
| 10 | PriceHistory | 1 | Low |
| 11 | SavedSearch | 1 | Low |
| 12 | SavedListing | 0 | — |
| 13 | Transaction | 2 | Medium |
| 14 | Inquiry | 2 | Medium |
| 15 | Lead | 2 | Medium |
| 16 | Payment | 2 | High |
| 17 | Order | 2 | Medium |
| 18 | Escrow | 1 | Medium |
| 19 | Plan | 1 | Low |
| 20 | Subscription | 1 | Medium |
| 21 | UsageRecord | 2 | Medium |
| 22 | Valuation | 1 | Low |
| 23 | MarketIndex | 1 | Low |
| 24 | NotificationLog | 2 | Medium |
| 25 | NotificationPreference | 1 | Low |
| 26 | AdminAuditLog | 1 | Medium |
| 27 | ModerationAuditLog | 2 | Medium |
| 28 | POI | 1 | Low |
| 29 | NeighborhoodScore | 1 | Low |
| 30 | Review | 3 | High |
| 31 | IndustrialPark | 1 | Low |
| 32 | IndustrialListing | 3 | High |
| 33 | Conversation | 2 | Medium |
| 34 | ConversationParticipant | 1 | Medium |
| 35 | Message | 2 | Medium |
| 36 | TransferListing | 2 | Medium |
| 37 | TransferItem | 1 | Low |
| 38 | Report | 1 | Low |
| 39 | MacroeconomicData | 1 | Low |
| 40 | InfrastructureProject | 2 | Medium |
| 41 | SystemSetting | 2 | High |
## Findings
### 1. User — High
- **[High]** `email` is nullable and not unique, while `emailHash` carries the uniqueness. Application code must hash every lookup; add a DB-level CHECK that `(email IS NULL) = (emailHash IS NULL)` to prevent divergence.
- **[High]** `phone` is stored plaintext AND `phoneHash` is unique — duplicate-safe, but plaintext phone should be flagged for PII encryption at rest (roadmap: TDE or app-layer envelope).
- **[Medium]** `totpSecret` comment says "Encrypted TOTP secret" but schema has no marker to enforce that; add a Prisma-level `/// ENCRYPTED` doc comment convention and a lint.
- **[Medium]** `totpBackupCodes String[]` — array mutation races: when a backup code is consumed, an atomic `array_remove` is needed at the SQL layer; document.
### 2. MfaChallenge — Low
- **[Low]** `type` is a free-form string (`"totp" | "backup_code"`). Convert to enum.
### 3. RefreshToken — Medium
- **[Medium]** `token` is `@unique String` — should be stored as SHA-256 hash rather than raw token to mitigate DB leak. Add note or rename field to `tokenHash`.
### 4. OAuthAccount — Medium
- **[Medium]** `accessToken` / `refreshToken` stored plaintext — must be encrypted at rest; same encryption story as `User.totpSecret`.
- **[Low]** `expiresAt` is `DateTime?` but no index; a sweeper job that deletes expired provider tokens will do a seq scan.
### 5. Agent — Medium
- **[Medium]** `serviceAreas Json` — lose searchability. A `@@index` via `gin` on the JSONB would help, or normalize into a join table `AgentServiceArea`.
- **[Low]** `qualityScore Float @default(0)` — no CHECK `0 ≤ qualityScore ≤ 100`.
### 6. ProjectDevelopment — Medium
- **[Medium]** Denormalized location columns (`ward`, `district`, `city`) lack a FK to a canonical `District` / `Ward` table. When canonical admin-region data ships, schedule a normalization pass.
- **[Low]** `minPrice`/`maxPrice` have no CHECK constraint; enforce `maxPrice >= minPrice` and `minPrice >= 0`.
### 7. Property — Medium
- **[Medium]** `addressNormalized` is nullable while a comment notes a backfill pending; track in follow-up issue and add `@@index([addressNormalized])` exists ✓ but partial index excluding NULL would save space.
- **[Medium]** `viewType String[]`, `suitableFor String[]` are unbounded — consider enums + join table to allow facet search.
- **[Low]** `areaM2 Float` — CHECK `areaM2 > 0` missing.
### 8. PropertyMedia — Low
- **[Low]** `type String` with literal `"image" | "video"` — convert to enum.
### 9. Listing — Medium
- **[Medium]** `sellerId onDelete: Restrict` but `agentId onDelete: SetNull` — inconsistent. User deletion flow will fail on any listing.
- **[Low]** Many CHECKs added in recent migration (see inline comments) — good; keep `pricePerM2` derivation consistent with `priceVND / property.areaM2`.
### 10. PriceHistory — Low
- **[Low]** No `agentId` / `actorId` — cannot attribute a price change.
### 11. SavedSearch — Low
- **[Low]** `filters Json` is unbounded; no schema-level validation.
### 12. SavedListing
- No findings.
### 13. Transaction — Medium
- **[Medium]** `buyerId onDelete: Restrict` — user-deletion workflow must move or anonymize.
- **[Medium]** Missing `sellerId` direct FK — must resolve through `listing.sellerId`. Denormalize for reporting.
### 14. Inquiry — Medium
- **[Medium]** `phone` plaintext PII — hash or encrypt at rest.
- **[Low]** No `status` field (read/responded/closed) — relied on `isRead` only.
### 15. Lead — Medium
- **[Medium]** `phone` plaintext + `phoneHash` optional — make hash NOT NULL to match `User` pattern.
- **[Low]** `source String` free-form; convert to enum (`referral | organic | paid | imported`).
### 16. Payment — High
- **[High]** `idempotencyKey` is optional on a unique composite `(userId, provider, idempotencyKey)``NULL` is distinct in PG, so duplicate payments with NULL key are still possible. Enforce NOT NULL or fall back to a server-generated UUID.
- **[Medium]** `amountVND BigInt` has no CHECK `amountVND > 0`.
### 17. Order — Medium
- **[Medium]** `amountVND = platformFeeVND + sellerPayoutVND` invariant is not enforced at DB; add trigger or CHECK.
- **[Medium]** `Restrict` on buyer/seller/listing blocks cascading user deletes.
### 18. Escrow — Medium
- **[Medium]** `status` transitions (PENDING → HELD → RELEASED/REFUNDED/DISPUTED) are not enforced; add a state-machine trigger or rely on an app-layer guard.
### 19. Plan — Low
- **[Low]** Price fields `BigInt` with no CHECK ≥ 0.
### 20. Subscription — Medium
- **[Medium]** `@@unique` missing on `(userId)` via `userId String @unique` — already ✓, but no index on `currentPeriodEnd` for renewal sweeps.
### 21. UsageRecord — Medium
- **[Medium]** `metric String` — convert to enum aligned with `Plan.max*` fields; otherwise drift between code/DB is invisible.
- **[Low]** Missing uniqueness on `(subscriptionId, metric, periodStart)` — duplicate counters possible.
### 22. Valuation — Low
- **[Low]** `comparables Json` unbounded — cap size, or move to S3.
### 23. MarketIndex — Low
- **[Low]** `period String` free-form (e.g., "2025-Q4") — enum or regex CHECK.
### 24. NotificationLog — Medium
- **[Medium]** `userId` has no FK (no `@relation`) — silent referential drift.
- **[Medium]** Missing partitioning strategy; this table will grow fast — add TimescaleDB or partition by `createdAt` quarter.
### 25. NotificationPreference — Low
- **[Low]** `eventType String` free-form; enum or CHECK.
### 26. AdminAuditLog — Medium
- **[Medium]** Polymorphic `targetId` + `targetType` lacks FK. Acceptable in audit tables but document the tradeoff.
### 27. ModerationAuditLog — Medium
- **[Medium]** `targetType String` / `action String` — as-designed (to avoid enum migrations), but add a lint to reject unknown values on write.
- **[Medium]** `id String @default(uuid())` while sibling table uses `cuid()` — pick one convention.
### 28. POI — Low
- **[Low]** `osmId @unique` nullable — two rows with null `osmId` will co-exist but that is desired for non-OSM POIs.
### 29. NeighborhoodScore — Low
- **[Low]** Each score field should have CHECK `0 ≤ value ≤ 10`, `totalScore` `0..100`.
### 30. Review — High
- **[High]** Polymorphic `targetType` + `targetId` without FK — highest-volume polymorphic surface, will accumulate dangling references on entity delete. Plan: create per-target-type child tables or enforce with trigger.
- **[Medium]** `rating Int` — CHECK `rating BETWEEN 1 AND 5`.
- **[Medium]** No `moderationStatus` / `hiddenAt` — operational gap for abuse response.
### 31. IndustrialPark — Low
- **[Low]** `occupancyRate Float` needs CHECK `0..100`; `remainingAreaHa` should be derivable from `leasableAreaHa - occupied` — stored-derived risk.
### 32. IndustrialListing — High
- **[High]** Missing `@relation` for `agentId``Agent.id` and `sellerId``User.id`. No FK = no cascade, no referential integrity.
- **[Medium]** No `priceVND` — only `priceUsdM2`; cross-currency reporting requires a rate snapshot column (`fxRateVndPerUsd`).
- **[Medium]** `pricingUnit String` — convert to enum.
### 33. Conversation — Medium
- **[Medium]** `listingId` has no `@relation` — can reference deleted listing.
- **[Low]** `subject` should probably be required when `listingId` is null (CHECK).
### 34. ConversationParticipant — Medium
- **[Medium]** `userId` lacks `@relation` to `User` — orphan participants possible.
### 35. Message — Medium
- **[Medium]** `senderId` no `@relation` to `User`.
- **[Low]** Soft-delete `deletedAt` should be paired with an index for "exclude deleted" queries.
### 36. TransferListing — Medium
- **[Medium]** `contactPhone`, `contactName` plaintext — PII policy alignment with `User.phone`.
- **[Low]** `businessType String?` / `footTraffic String?` — enum candidates.
### 37. TransferItem — Low
- **[Low]** `dimensions Json` — consider splitting into typed columns (`widthCm Float?`, etc.) for range queries.
### 38. Report — Low
- **[Low]** `params Json` + `content Json` — large reports should live in object storage with a pointer column to keep row size small (already has `pdfUrl`).
### 39. MacroeconomicData — Low
- **[Low]** `indicator String` + `unit String` free-form — enum for `indicator` at minimum.
### 40. InfrastructureProject — Medium
- **[Medium]** `category String` / `status String` should both be enums to align with `IndustrialPark`.
- **[Low]** `impactRadius Float` needs CHECK ≥ 0.
### 41. SystemSetting — High
- **[High]** Inline schema comment admits values are plaintext and `isSecret` not actually encrypted — all AI provider credentials live here in plaintext. Escalate to a dedicated secret-store or add envelope encryption before shipping more secrets into this table.
- **[Medium]** `updatedBy String?` has no FK to `User.id`.
## Cross-cutting findings
| # | Finding | Severity |
|---|---|---|
| X-1 | **Polymorphic target patterns** (`Review`, `AdminAuditLog`, `ModerationAuditLog`) lack DB-level integrity. Standardize on either per-target tables or nightly reconciliation. | High |
| X-2 | **Missing `@relation` on foreign-key-shaped fields** in `NotificationLog`, `ConversationParticipant`, `Message`, `Conversation.listingId`, `IndustrialListing.sellerId|agentId`, `SystemSetting.updatedBy`. | High |
| X-3 | **Plaintext PII / secrets**`User.phone`, `Inquiry.phone`, `Lead.phone`, `OAuthAccount.accessToken/refreshToken`, `User.totpSecret` (claimed encrypted), `TransferListing.contactPhone`, `SystemSetting.value`. | High |
| X-4 | **Money fields** (`BigInt`) missing CHECK `> 0` across: `Payment.amountVND`, `Order.amountVND/platformFeeVND/sellerPayoutVND`, `Escrow.amountVND/feeVND`, `Plan.price*`, `Transaction.agreedPrice/depositAmount`. Listing and PriceHistory already have CHECKs (recent migration). | Medium |
| X-5 | **Stored derived values**`IndustrialPark.occupancyRate/remainingAreaHa`, `Property.pricePerM2` — risk drift. Add triggers or move to views. | Medium |
| X-6 | **ID conventions** mixed — 40 tables use `cuid()`, `ModerationAuditLog` uses `uuid()`. Pick one. | Low |
| X-7 | **Enum vs free-form string**`MfaChallenge.type`, `PropertyMedia.type`, `Lead.source`, `UsageRecord.metric`, `NotificationPreference.eventType`, `MarketIndex.period`, `InfrastructureProject.category/status`, `IndustrialListing.pricingUnit`. | Medium |
| X-8 | **High-volume append tables** (`NotificationLog`, `AdminAuditLog`, `ModerationAuditLog`, `PriceHistory`) have no partitioning / retention plan. | Medium |
| X-9 | **`onDelete: Restrict` on commerce edges** — `Listing.sellerId`, `Transaction.buyerId`, `Order.buyerId/sellerId/listingId`, `Payment.userId`. User-deletion workflow (`User.deletedAt`) cannot hard-delete; document as soft-delete-only. | Medium |
| X-10 | **Geospatial consistency** — all six geo models use `geometry(Point, 4326)` + GiST ✓. Consider SRID check constraint. | Low |
## Acceptance checklist (TEC-3010)
- [x] ERD generated from `prisma/schema.prisma`
- [x] `docs/db/ERD.md` committed (Mermaid + rendering notes)
- [x] `docs/db/schema-audit.md` committed with findings + severity per model (41 models)
- [x] Cross-cutting findings documented
## Recommended follow-ups (subtasks under [TEC-3006](/TEC/issues/TEC-3006))
1. Enforce `(userId, provider, idempotencyKey)` uniqueness on `Payment` (X-4).
2. Add missing `@relation` FKs (X-2) — single migration, data-checked.
3. Enum migration sweep (X-7).
4. PII/secret encryption story (X-3) — RFC + migration.
5. Retention/partitioning plan for append-only tables (X-8).
6. Polymorphic target strategy decision (X-1).