goodgo-platform/docs/osm-data-model.md
Ho Ngoc Hai 1e9ef567a9
docs(osm): note 2025 VN admin reform — vn_districts now holds ward/commune layer
Vietnam dropped the district administrative level in the 2025 reform
(the resolution on rearranging administrative units, "Nghị quyết về
sắp xếp đơn vị hành chính"). Only two levels remain:
province (level=4) and ward/commune (level=6).

OSM has updated tagging accordingly: every former commune/ward/township
(xã/phường/thị trấn) that survived the merge is now `admin_level=6`, and
there are no `admin_level=8` features for VN. Our sync confirmed this:
3,189 level=6 units were inserted across 33 provinces, and level=8
returns zero.

The schema column "vn_districts" stays as-is to avoid a cascade-rename
across IndustrialPark / ProjectDevelopment / Property FKs. Documented
the semantic shift in osm-data-model.md so future ops don't think
something is broken when wards are empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:13:26 +07:00

# OSM Data Model — GoodGo Platform
This document is the canonical reference for every OpenStreetMap-sourced
table in the GoodGo database, the sync pipelines that populate them, and
the query patterns that use them.
## Tables at a glance
| Table | Source | Geometry | Sync cadence | Used by |
|-------|--------|----------|--------------|---------|
| `vn_provinces` | OSM `boundary=administrative + admin_level=4` | MultiPolygon | Weekly (Mon 02:30 ICT) | `GeoLookupService`, KCN sync, address auto-fill |
| `vn_districts` | OSM `admin_level=6` | MultiPolygon | Weekly (Wed 02:30 ICT) | Same as above. **After the 2025 reform** this table effectively holds the new ward / commune layer (~3,200 units), since Vietnam dropped the district level. The schema name is kept for backwards-compat with goodgo's existing FK references. |
| `vn_wards` | OSM `admin_level=8` | MultiPolygon | Weekly (Sat 02:30 ICT) | Same as above. **Note**: after the 2025 admin reform Vietnam only uses level=4 (province) + level=6 (ward/commune). OSM doesn't currently tag any VN feature with admin_level=8, so this table will stay empty until/unless the policy changes. Kept for forward-compat. |
| `Poi` | OSM nodes/ways/relations matching 20 category selectors | Point | Daily, one category per run in rotation (02:00 ICT) | `/poi/nearby`, `/poi/by-bbox`, listing sidebar, search filter |
| `TransportLine` | OSM `route=subway\|train\|highway` relations | MultiLineString | Monthly | Distance scoring, planned for Phase 2 UX |
| `IndustrialPark` | OSM `landuse=industrial` ways/relations | Point + MultiPolygon boundary | Monthly (1st 03:00 ICT, 4 chunks) | `/industrial/parks/*`, KCN catalog |
| `OsmSyncRun` | Generated by orchestrator | — | Append-only audit | `/admin/osm` dashboard |
All sync writes are gated by `OSM_SYNC_ENABLED=true` so dev / staging
environments don't hit Overpass accidentally.
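
As a rough illustration of the flag gate and the daily one-category rotation described in the table, the sketch below uses two hypothetical helpers, `isSyncEnabled` and `categoryForDay`. Neither name comes from the GoodGo codebase, and the real cron may key its rotation on ICT dates or on stored state rather than on a UTC day index.

```typescript
// Hypothetical helpers for illustration only; not taken from the GoodGo codebase.

/** Sync is allowed only when the flag is exactly the string "true". */
function isSyncEnabled(env: Record<string, string | undefined>): boolean {
  return env["OSM_SYNC_ENABLED"] === "true";
}

/** Picks one category per calendar day by cycling through the list. */
function categoryForDay<T>(categories: readonly T[], date: Date): T {
  if (categories.length === 0) {
    throw new Error("categories must not be empty");
  }
  const MS_PER_DAY = 86_400_000;
  // Whole days since the Unix epoch in UTC (assumption: the real job may use ICT).
  const dayIndex = Math.floor(date.getTime() / MS_PER_DAY);
  const n = categories.length;
  // Double modulo keeps the index non-negative for dates before 1970.
  const idx = ((dayIndex % n) + n) % n;
  return categories[idx] as T;
}
```

With 20 categories, a rotation of this kind revisits each category once every 20 days.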
## GeoLookupService — the foundation
Every other layer depends on `vn_provinces.geometry` for PostGIS
`ST_Contains` lookups. The service exposes:
```ts
const r = await geo.lookup(lng, lat);
// → { province: { code, name }, district: { code, name }, ward: { code, name } }
const inside = await geo.isInVietnam(lng, lat);
// → boolean
const cov = await geo.coverage();
// → { provinces: { total, withGeometry, lastSyncedAt }, districts: ..., wards: ... }
```
It replaces the old `nearestProvince()` heuristic that walked a
hardcoded centroid table.
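
Because `vn_wards` is expected to stay empty after the 2025 reform, callers of `geo.lookup` probably need to tolerate a missing `ward`. The sketch below assumes the result shape shown in the comment above and assumes that absent levels come back as `null`; `GeoLookupResult` and `formatAdminPath` are hypothetical names, and the sample codes are made up.

```typescript
// Assumed shape, inferred from the snippet above; nullability is a guess.
interface AdminUnit {
  code: string;
  name: string;
}

interface GeoLookupResult {
  province: AdminUnit | null;
  district: AdminUnit | null; // holds the ward/commune layer after the 2025 reform
  ward: AdminUnit | null; // expected to be null while vn_wards is empty
}

/** Joins the available levels, most specific first, skipping empty ones. */
function formatAdminPath(r: GeoLookupResult): string {
  return [r.ward, r.district, r.province]
    .filter((u): u is AdminUnit => u !== null)
    .map((u) => u.name)
    .join(", ");
}

// Usage with made-up codes:
const label = formatAdminPath({
  province: { code: "P-01", name: "Hà Nội" },
  district: { code: "D-001", name: "Ba Đình" },
  ward: null,
});
```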
## Quality gates baked into sync scripts
1. **Geographic gate** — `isPointInVietnam(lng, lat)` from
`scripts/data/vn-country-polygon.ts` rejects rows whose centroid
falls outside the VN mainland polygon (catches China / Laos /
Cambodia bleed across the Overpass bbox chunks).
2. **Name gate** — rows whose `name` contains zero Latin/Vietnamese
letters (`/[A-Za-zÀ-ỹ]/`) are dropped (filters CJK / Khmer / Thai).
3. **Lock gate** — when an admin sets `osmLocked=true` or adds a column
to `lockedFields`, the next sync skips that row entirely (or that
column) so manual edits survive.
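
The three gates can be sketched as small pure functions. This is a minimal illustration, not the code in the sync scripts: `pointInRing`, `passesNameGate`, `writableFields`, and `LockState` are hypothetical names, and the polygon test handles a single ring without holes. One deliberate difference: the literal range `À-ỹ` runs from U+00C0 to U+1EF9, which also covers the Thai (U+0E00 to U+0E7F) and Khmer (U+1780 to U+17FF) blocks, so the sketch uses narrower Latin-only ranges instead.

```typescript
type LngLat = [number, number];

/** Ray-casting point-in-polygon test on a single ring (no holes). */
function pointInRing(lng: number, lat: number, ring: readonly LngLat[]): boolean {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i] as LngLat;
    const [xj, yj] = ring[j] as LngLat;
    // Edge straddles the horizontal line through the point and lies to its right.
    const crosses =
      (yi > lat) !== (yj > lat) &&
      lng < ((xj - xi) * (lat - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// ASCII letters, Latin-1 Supplement through Latin Extended-B, Latin Extended Additional.
// Assumption: this is narrower than the pattern quoted in the document.
const HAS_LATIN_LETTER = /[A-Za-z\u00C0-\u024F\u1E00-\u1EFF]/;

function passesNameGate(name: string | null | undefined): boolean {
  return typeof name === "string" && HAS_LATIN_LETTER.test(name);
}

interface LockState {
  osmLocked: boolean;
  lockedFields: readonly string[];
}

/** Returns the fields sync may write; empty when the whole row is locked. */
function writableFields(
  incoming: Record<string, unknown>,
  lock: LockState,
): Record<string, unknown> {
  if (lock.osmLocked) return {};
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(incoming)) {
    if (!lock.lockedFields.includes(key)) out[key] = value;
  }
  return out;
}
```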
## Adding a new POI category
1. Add the enum value to `PoiCategory` in `prisma/schema.prisma` and
create a Prisma migration that `ALTER TYPE "PoiCategory" ADD VALUE`.
2. Add the Overpass selector to `CATEGORY_QUERIES` in
`scripts/sync-osm-poi.ts`.
3. Append the same enum value to the `POI_CATEGORIES` rotation list in
`OsmSyncCronService` so the cron picks it up.
4. Add labels + icons + colour to `apps/web/lib/poi-api.ts` so the UI
chips render.
That's it. `OsmSyncService.findLayer('poi', 'YOUR_CAT')` will then return a
layer definition automatically, because `SYNC_LAYERS` is generated from the enum keys.
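
The sketch below shows one way a layer list can be derived from enum keys so that a new category needs no extra registration. It uses a two-value stand-in for `PoiCategory`, illustrative Overpass selectors, and a hypothetical `SyncLayerDef` shape; the real `CATEGORY_QUERIES`, `SYNC_LAYERS`, and `findLayer` may look different.

```typescript
// Stand-in for the Prisma-generated enum; the real one has more values.
const PoiCategory = { HOSPITAL: "HOSPITAL", SCHOOL: "SCHOOL" } as const;
type PoiCategory = (typeof PoiCategory)[keyof typeof PoiCategory];

// Record<PoiCategory, string> makes the compiler reject a missing selector.
// The selectors are illustrative Overpass QL, not the project's actual queries.
const CATEGORY_QUERIES: Record<PoiCategory, string> = {
  HOSPITAL: 'nwr["amenity"="hospital"]',
  SCHOOL: 'nwr["amenity"="school"]',
};

// Hypothetical layer definition shape.
interface SyncLayerDef {
  layer: "poi";
  category: PoiCategory;
  query: string;
}

// One definition per enum key, so adding a key adds a layer.
const SYNC_LAYERS: SyncLayerDef[] = (Object.keys(PoiCategory) as PoiCategory[]).map(
  (category): SyncLayerDef => ({
    layer: "poi",
    category,
    query: CATEGORY_QUERIES[category],
  }),
);

function findLayer(layer: string, category: string): SyncLayerDef | undefined {
  return SYNC_LAYERS.find((l) => l.layer === layer && l.category === category);
}
```

Typing the selector map as `Record<PoiCategory, string>` turns a forgotten step 2 into a compile error rather than a silent gap at sync time.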
## Operational runbook
* **Sync hangs / 504 from Overpass** — `kubectl describe pod` on the
Kaniko-style sync runner shows the chunk in flight. The script retries
the clone step up to 5 times, because HTTP 504 from Gitea is transient.
For Overpass itself, raise the `[out:json][timeout:N]` value in the
script. The defaults are 180 s for POI and 300 s for boundaries.
* **Runs stuck in `RUNNING` state** — `OsmSyncOrchestrator` writes the
row before spawning the script. If the script process dies without
emitting an `exit` event, the row stays RUNNING. Mitigation: a cron
job that flips rows left in RUNNING for more than 6 hours to FAILED
with `errorMessage='timeout'`.
* **Conflict logs** — when sync would overwrite a column that an admin
has locked, it skips that column silently. There is no separate
conflict table (yet). To audit, search Loki for
`[osm-sync] skipping locked field`.
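
The stuck-run mitigation above can be sketched as a pure function that a cron job could apply to rows loaded from `OsmSyncRun`. The `SyncRun` shape, the `startedAt` field, and the status values other than `RUNNING` and `FAILED` are assumptions made for illustration; only the 6-hour threshold and the `'timeout'` message come from the runbook.

```typescript
// Assumed row shape; field names are illustrative, not the actual schema.
type RunStatus = "RUNNING" | "SUCCESS" | "FAILED";

interface SyncRun {
  id: string;
  status: RunStatus;
  startedAt: Date;
  errorMessage?: string;
}

const STALE_AFTER_MS = 6 * 60 * 60 * 1000; // 6 hours

/** Returns a copy of the runs with RUNNING rows older than 6 h marked FAILED. */
function failStaleRuns(runs: readonly SyncRun[], now: Date): SyncRun[] {
  return runs.map((run): SyncRun => {
    const ageMs = now.getTime() - run.startedAt.getTime();
    if (run.status === "RUNNING" && ageMs > STALE_AFTER_MS) {
      return { ...run, status: "FAILED", errorMessage: "timeout" };
    }
    return run;
  });
}
```

Keeping the decision in a pure function makes the threshold easy to unit-test separately from the database update.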
## Phase status
| Phase | Status | Notes |
|-------|--------|-------|
| 0 — Admin boundaries + GeoLookupService | ✅ | Schema, sync, service done. Provinces synced (33), districts in progress |
| 1 — POI catalog + sync | ✅ | Schema + sync script + NestJS module + sidebar component done. Hospital category synced (~500 rows) |
| 2 — Transport (metro/railway/airport) | 🟡 | Stations synced via POI; lines layer pending |
| 3 — Buildings / landuse | ⏳ Deferred | Admin says low priority |
| 4 — Sync orchestrator + admin dashboard | ✅ | Service + cron + Prometheus-friendly stats + admin UI done |
| 5 — User-facing UX | 🟡 | Listing + KCN sidebar wired; search filter widget built; map overlays pending |
| 6 — Performance hardening | ⏳ | Materialized views + Redis cache pending |