goodgo-platform/scripts/prune-non-vietnam-osm.ts
Ho Ngoc Hai d6ac7c316f
feat(industrial): drop non-VN OSM rows + gate sync with country polygon
The OSM bbox sync was picking up `landuse=industrial` polygons that sit
just across the borders in Laos, Thailand, Cambodia and southern China.
After the bulk promote we ended up with 220 of those in the public
catalog: Vientiane SEZ, Phnom Penh SEZ, Sihanoukville SEZ, several
Thai industrial estates, etc.

Three-part fix:

1. `scripts/data/vn-country-polygon.ts` — a hand-traced ~30-vertex
   GeoJSON polygon that follows VN's land + sea border. The eastern
   edge is generous (110°E) so every coastal industrial zone (Vũng Áng
   / Formosa, Dung Quất, Nhơn Hội, Vũng Tàu / Long Sơn) sits comfortably
   inside; the western/northern edges trace the actual neighbour
   borders. Includes a pure-JS `isPointInVietnam(lng, lat)` ray-cast
   helper for the sync script (no extra dep).

2. `scripts/prune-non-vietnam-osm.ts` — one-shot cleaner. Uses PostGIS
   `ST_Within(location, polygon)` to delete every OSM row whose centroid
   falls outside. Verified the polygon doesn't reject genuine VN parks
   (Formosa Hà Tĩnh, Dung Quất, Nhơn Hội, KCN Đất Đỏ etc. all pass).

3. `sync-osm-industrial-parks.ts` `parseFeature()` now calls
   `isPointInVietnam` after computing the centroid and bails early on a
   miss, so the next monthly cron run won't re-import them.
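
The polygon file and the sync script are not shown here. What follows is a minimal sketch of the technique the commit describes: even-odd ray casting on a centroid, used as a gate. It assumes the polygon is stored as GeoJSON-style `[lng, lat]` rings (outer ring first, optional holes) and that the centroid is the area-weighted (shoelace) one. `isPointInRing`, `isPointInPolygon`, `ringCentroid` and `keepIfInside` are hypothetical names, not the repo's `isPointInVietnam` API.

```typescript
// GeoJSON-style positions: [lng, lat]. A polygon is [outerRing, ...holeRings].
type Position = [number, number];
type Ring = Position[];
type Polygon = Ring[];

// Even-odd ray casting: cast a ray from (lng, lat) towards +lng and toggle
// `inside` each time it crosses an edge. Works for closed or open rings.
export function isPointInRing(lng: number, lat: number, ring: Ring): boolean {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i]!;
    const [xj, yj] = ring[j]!;
    // Only edges that straddle the horizontal line through the point count.
    // This also excludes horizontal edges, so the division below is safe.
    if ((yi > lat) !== (yj > lat)) {
      const xCross = xj + ((lat - yj) * (xi - xj)) / (yi - yj);
      if (lng < xCross) inside = !inside;
    }
  }
  return inside;
}

// Inside the outer ring and not inside any hole.
export function isPointInPolygon(lng: number, lat: number, polygon: Polygon): boolean {
  const [outer, ...holes] = polygon;
  if (!outer || !isPointInRing(lng, lat, outer)) return false;
  return !holes.some((hole) => isPointInRing(lng, lat, hole));
}

// Area-weighted centroid via the shoelace formula; null for a zero-area ring.
export function ringCentroid(ring: Ring): Position | null {
  let twiceArea = 0;
  let sx = 0;
  let sy = 0;
  for (let i = 0; i < ring.length; i++) {
    const [x0, y0] = ring[i]!;
    const [x1, y1] = ring[(i + 1) % ring.length]!;
    const cross = x0 * y1 - x1 * y0;
    twiceArea += cross;
    sx += (x0 + x1) * cross;
    sy += (y0 + y1) * cross;
  }
  if (twiceArea === 0) return null;
  return [sx / (3 * twiceArea), sy / (3 * twiceArea)];
}

// Hypothetical gate in the spirit of the new parseFeature() check: compute the
// centroid, then drop the feature (return null) if it lies outside `country`.
export function keepIfInside(outerRing: Ring, country: Polygon): Position | null {
  const c = ringCentroid(outerRing);
  if (!c) return null;
  return isPointInPolygon(c[0], c[1], country) ? c : null;
}
```

The sketch treats lng/lat as planar coordinates. The prune script's `ST_Within(location::geometry, …)` does the same, because PostGIS evaluates `geometry` predicates in the plane. The sync-time check and the cleanup query should therefore agree, except for points that fall exactly on an edge.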

Run on dev: removed 220 rows. Final catalog: 1,483 KCN (industrial
parks), all inside the Vietnam mainland polygon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:27:37 +07:00


/**
 * Prune `IndustrialPark` rows whose centroid is outside the Vietnam
 * mainland polygon. Catches the cross-border bleed (Laos, Thailand,
 * Cambodia, southern China) that the Overpass bbox sync inevitably picks up.
 *
 * Usage:
 *   NODE_OPTIONS="-r dotenv/config" DOTENV_CONFIG_PATH=.env \
 *     pnpm tsx scripts/prune-non-vietnam-osm.ts [--dry-run]
 *
 * Strategy:
 *   1. Build a PostGIS polygon from `VN_COUNTRY_POLYGON_GEOJSON`.
 *   2. SELECT rows where `NOT ST_Within(location, polygon)`, scoped to
 *      OSM-sourced rows (we never want to delete a manually-curated
 *      row even if its centroid is wonky).
 *   3. DELETE in one statement (cascade removes any IndustrialListing
 *      rows attached to those parks).
 *
 * With `--dry-run` the script stops after step 2 and writes nothing.
 *
 * Safe to re-run: idempotent.
 */
import 'dotenv/config';
import { PrismaPg } from '@prisma/adapter-pg';
import { PrismaClient } from '@prisma/client';
import pg from 'pg';

import { VN_COUNTRY_POLYGON_GEOJSON } from './data/vn-country-polygon';

const pool = new pg.Pool({ connectionString: process.env['DATABASE_URL'] });
const adapter = new PrismaPg(pool);
const prisma = new PrismaClient({ adapter });

const dryRun = process.argv.includes('--dry-run');

async function main(): Promise<void> {
  // The GeoJSON is a trusted in-repo constant; single quotes are doubled so
  // it can be embedded as a SQL string literal.
  const polygonSql = `ST_SetSRID(ST_GeomFromGeoJSON('${VN_COUNTRY_POLYGON_GEOJSON.replace(
    /'/g,
    "''",
  )}'), 4326)`;

  const outsideRows = await prisma.$queryRawUnsafe<
    { id: string; name: string; province: string; lat: number; lng: number; ha: number }[]
  >(
    `SELECT id, name, province,
            ROUND(ST_Y(location::geometry)::numeric, 3)::float AS lat,
            ROUND(ST_X(location::geometry)::numeric, 3)::float AS lng,
            COALESCE("totalAreaHa", 0) AS ha
       FROM "IndustrialPark"
      WHERE "dataSource" IN ('OSM', 'OSM_PROMOTED')
        AND NOT ST_Within(location::geometry, ${polygonSql})
      ORDER BY ha DESC NULLS LAST`,
  );

  console.log(`📍 Found ${outsideRows.length} OSM rows OUTSIDE the VN polygon.`);
  if (outsideRows.length === 0) {
    console.log('✓ Catalog is clean.');
    return;
  }

  // Show the top 15 by area so the operator can sanity-check before deleting.
  console.log(` Top 15 by area (${dryRun ? 'would be' : 'will be'} deleted):`);
  for (const row of outsideRows.slice(0, 15)) {
    console.log(
      ` ${row.name.slice(0, 50).padEnd(50)} ${row.province.slice(0, 16).padEnd(16)} ${
        row.ha
      } ha (${row.lat}, ${row.lng})`,
    );
  }

  if (dryRun) {
    console.log('💡 --dry-run: no writes performed.');
    return;
  }

  console.log(`\n🗑 Deleting ${outsideRows.length} rows…`);
  // Re-evaluates the same predicate as the SELECT above.
  const result = await prisma.$executeRawUnsafe(
    `DELETE FROM "IndustrialPark"
      WHERE "dataSource" IN ('OSM', 'OSM_PROMOTED')
        AND NOT ST_Within(location::geometry, ${polygonSql})`,
  );
  console.log(`✓ Removed ${result} rows.`);
}

main()
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  })
  .finally(async () => {
    await prisma.$disconnect();
    await pool.end();
  });
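
The script splices the polygon into the SQL text, doubling single quotes first. It then evaluates the same predicate twice: once in the SELECT preview and once in the DELETE. One possible alternative is sketched below. It assumes the table and columns used by the queries above, binds the GeoJSON as a positional parameter, and lets a single `DELETE … RETURNING` report exactly which rows it removed. `PruneQuery` and `buildPruneQuery` are hypothetical names, and the code only builds query text and values; it does not touch a database.

```typescript
// Hypothetical helper: returns SQL text plus bound values instead of splicing
// the GeoJSON into the statement. Placeholders use PostgreSQL's $1 syntax.
export interface PruneQuery {
  text: string;
  values: [string];
}

// Shared predicate so the preview and the delete cannot drift apart.
// `$1::text` selects the text overload of ST_GeomFromGeoJSON.
const OUTSIDE_VN_OSM = `"dataSource" IN ('OSM', 'OSM_PROMOTED')
  AND NOT ST_Within(location::geometry,
                    ST_SetSRID(ST_GeomFromGeoJSON($1::text), 4326))`;

export function buildPruneQuery(
  polygonGeoJson: string,
  mode: 'preview' | 'delete',
): PruneQuery {
  const text =
    mode === 'preview'
      ? `SELECT id, name FROM "IndustrialPark" WHERE ${OUTSIDE_VN_OSM}`
      : `DELETE FROM "IndustrialPark" WHERE ${OUTSIDE_VN_OSM} RETURNING id, name`;
  // The GeoJSON travels as a parameter, so no quote escaping is needed.
  return { text, values: [polygonGeoJson] };
}
```

On PostgreSQL, Prisma's `$queryRawUnsafe(query, ...values)` accepts `$1`-style placeholders. The delete variant could therefore be run as `prisma.$queryRawUnsafe<{ id: string; name: string }[]>(q.text, ...q.values)`, and it returns the deleted rows. Separately, `ST_Within` returns false for a point lying exactly on the polygon boundary, so such a row would be pruned. `ST_Covers` is the boundary-inclusive predicate if that ever matters.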