fix(auth): wire dual-key JWT verification into TokenService for WebSocket auth

Extract shared `verifyWithRotation` helper and `makeSecretOrKeyProvider` into
`jwt-rotation.ts` so both REST (passport-jwt strategy) and WebSocket
(TokenService.verifyAccessToken) paths honour JWT_SECRET_PREVIOUS during
secret rotation. Add env-validation for optional previous secrets and
document the rotation policy for WebSocket sessions.

Resolves GOO-237

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
This commit is contained in:
Ho Ngoc Hai
2026-04-24 14:44:23 +07:00
parent 455c959f44
commit 3705193f97
8 changed files with 345 additions and 252 deletions

View File

@@ -0,0 +1,192 @@
# Payment Gateway Secret Rotation Policy
> **Status:** Active — GOO-197 / parent [GOO-102](/GOO/issues/GOO-102) (CLO data-security work).
> **Owner:** Security Engineer + Platform on-call.
> **Last reviewed:** 2026-04-24.
This document is the canonical policy for rotating all secrets that gate
access to GoodGo's payment gateways and adjacent integrations (OAuth,
storage, webhook signing, JWT). It ships alongside the `SecretProvider`
abstraction in
`apps/api/src/modules/shared/domain/ports/secret-provider.port.ts` and the
env-backed implementation in
`apps/api/src/modules/shared/infrastructure/env-secret-provider.service.ts`.
---
## 1. Why rotate
A stolen or leaked HMAC key for a payment gateway is the most direct path
to financial fraud against GoodGo. Rotation reduces the **window of abuse**
when a key is exposed (insider misuse, accidental git commit, third-party
breach, log scraping, etc.). It also forces us to verify that every
runtime that relies on the key can still read it — i.e. that we have not
lost the ability to rotate.
## 2. Scope (rotation-sensitive secrets)
The following secrets are in scope. Each is registered with the
`SecretProvider` by default (see `DEFAULT_REGISTERED_SECRETS`) and has a
matching entry in `env-validation.ts`.
| Secret env var | Purpose | Cadence | Owner |
| ------------------------------ | ------------------------------- | -------- | -------------- |
| `JWT_SECRET` | Access-token HMAC | 90 days | Auth |
| `JWT_REFRESH_SECRET` | Refresh-token HMAC | 90 days | Auth |
| `VNPAY_HASH_SECRET` | VNPay request/callback HMAC | 90 days | Payments |
| `MOMO_SECRET_KEY` | MoMo request/callback HMAC | 90 days | Payments |
| `ZALOPAY_KEY1` | ZaloPay order signing | 90 days | Payments |
| `ZALOPAY_KEY2` | ZaloPay callback signing | 90 days | Payments |
| `BANK_TRANSFER_WEBHOOK_SECRET` | Bank-transfer webhook signature | 90 days | Payments |
| `GOOGLE_CLIENT_SECRET` | Google OAuth | 180 days | Auth |
| `ZALO_APP_SECRET` | Zalo OAuth | 180 days | Auth |
| `ZALO_OA_ACCESS_TOKEN` | Zalo Official Account API token | 90 days | Notifications |
| `MINIO_SECRET_KEY` | Object-storage access key | 180 days | Platform |
| `FIELD_ENCRYPTION_KEY` | At-rest PII encryption key | annually | Platform + CLO |
Secrets **not** in this table (e.g. `DATABASE_URL` password, `REDIS_HOST`)
follow the platform-credential rotation policy and are out of scope here.
## 3. Cadence and triggers
- **Routine rotation:** every 90 days for HMAC/signing keys, 180 days for
OAuth client secrets, annually for the field-encryption key (which has
expensive data-rewrap implications).
- **Event-driven rotation (always immediately):**
- any commit accidentally containing a real value of one of the secrets
above (regardless of how briefly);
- departure of any individual with production access to the secret store;
- downstream provider notification that the credential may be exposed;
- confirmed or strongly suspected breach of any system that handled the
secret in plaintext (CI runner, dev laptop, log aggregator, …).
## 4. Operator workflow (env-backed backend)
1. **Generate** a new high-entropy value:
```bash
openssl rand -base64 48
```
2. **Stage the dual-key grace period.** Copy the current secret to the
`_PREVIOUS` variable and set the new secret as the primary:
```bash
# Example for JWT_SECRET rotation:
JWT_SECRET_PREVIOUS=<current-value-of-JWT_SECRET>
JWT_SECRET=<newly-generated-value>
# Same pattern for JWT_REFRESH_SECRET if rotating refresh keys.
```
The auth layer automatically tries the primary key first and falls
back to `_PREVIOUS`, so tokens signed with the old key continue to
validate during the grace period (≤ access-token TTL, typically 15 m).
3. **Deploy** the change. On boot, every API instance logs:
```
[EnvSecretProvider] Secret versions at boot: VNPAY_HASH_SECRET=2026-04-24, …
```
Verify the version field matches the staged version on every instance.
The raw value **must never** appear in this or any other log line.
4. **Smoke-test** payment flows for the rotated provider:
- issue one sandbox payment
- confirm callback verification succeeds
- confirm refund signing succeeds
Record the rotation in the security audit log
(`docs/security/secret-rotation-log.md` — append-only).
5. **Decommission** the old credential in the gateway's merchant portal.
6. **Remove the previous secret.** After the grace period (at least one
full access-token TTL cycle, typically 15 minutes), remove
`JWT_SECRET_PREVIOUS` (and/or `JWT_REFRESH_SECRET_PREVIOUS`) from the
environment and redeploy. This closes the dual-key window.
## 5. SecretProvider abstraction (developer workflow)
All new and existing code that consumes a rotation-sensitive secret MUST
go through the `SecretProvider` port:
```ts
import { Inject, Injectable } from '@nestjs/common';
import { SECRET_PROVIDER, type ISecretProvider } from '@modules/shared/domain/ports';
@Injectable()
export class VnpayService {
constructor(@Inject(SECRET_PROVIDER) private readonly secrets: ISecretProvider) {}
async sign(payload: string): Promise<string> {
const { value } = await this.secrets.getSecret('VNPAY_HASH_SECRET');
// … HMAC with `value`, never store it on `this`, never log it.
}
}
```
Rules:
- **Never** capture the raw value into a service field. Always re-read on
the request path so a rotation takes effect at the next request.
- **Never** include `material.value` in log messages, error messages, or
exception payloads. `material.version` is safe to log.
- **Never** stringify a `SecretMaterial` directly into a response body.
- For bootstrap-only contexts where `await` is awkward, use
`getSecretSync` — but note that a future remote backend may throw
`UnsupportedSyncReadError`.
## 6. Backends
- **Short term — `EnvSecretProvider` (current).** Reads from `process.env`
via `ConfigService`. Operationally identical to the pre-existing
`getOrThrow('VNPAY_HASH_SECRET')` calls, but with a stable audit surface
(versions logged, port-based DI).
- **Mid term — `AwsSecretsManagerSecretProvider` / `VaultSecretProvider`.**
Same port. Adds:
- automatic refresh from the remote store
- per-secret IAM / Vault-policy scoping
- native version ids (`AWSCURRENT` / `AWSPREVIOUS` etc.) surfaced as
`material.version`
- `getSecretSync` may throw `UnsupportedSyncReadError`; bootstrap
callers must migrate to `getSecret`.
Switching backends is a one-line change in `SharedModule` (replace
`EnvSecretProvider` with the new implementation under the
`SECRET_PROVIDER` token). No call sites change.
## 7. Logging discipline
- The `EnvSecretProvider` logs only `name=version` pairs at boot.
- The `version` is either an operator-provided `<NAME>_SECRET_VERSION` env
var, or a 10-char SHA-256 fingerprint of the value (40 bits of entropy;
non-invertible; useful for distinguishing rotations across instances).
- Negative tests in
`apps/api/src/modules/shared/infrastructure/__tests__/env-secret-provider.service.spec.ts`
assert the raw value never appears in logger output, error messages, or
serialized provider state.
- The repo also has a global `pii-masker` and `GlobalExceptionFilter` —
those are defence-in-depth, not the primary control. The primary control
is "never put the value into a string in the first place."
## 8. Incident response (suspected leak)
1. Open a P1 incident in `#sec-incident`. Page Security on-call.
2. Rotate the affected secret immediately following §4 — do not wait for
forensic confirmation.
3. Search logs / CI artifacts / git history for the leaked value
fingerprint (NOT the value itself; use `fingerprint()` from
`env-secret-provider.service.ts`).
4. Coordinate with the gateway's anti-fraud team where applicable (VNPay,
MoMo, ZaloPay merchant support).
5. File a post-mortem within 5 business days; update this policy if
process gaps were found.
## 9. References
- Source port: `apps/api/src/modules/shared/domain/ports/secret-provider.port.ts`
- Env-backed impl: `apps/api/src/modules/shared/infrastructure/env-secret-provider.service.ts`
- Env validation: `apps/api/src/modules/shared/infrastructure/env-validation.ts`
- Negative tests: `apps/api/src/modules/shared/infrastructure/__tests__/env-secret-provider.service.spec.ts`
- Parent issue: [GOO-102](/GOO/issues/GOO-102)
- This issue: [GOO-197](/GOO/issues/GOO-197)