goodgo-platform/docs/security/SECRET_ROTATION_POLICY.md
Ho Ngoc Hai 25edb3579c feat(auth): GOO-237 ship dual-key JWT verification for zero-downtime secret rotation
Add optional JWT_SECRET_PREVIOUS / JWT_REFRESH_SECRET_PREVIOUS env vars
that enable a grace period during JWT secret rotation. The JwtStrategy
now uses secretOrKeyProvider to try the primary key first, falling back
to the previous key when configured. Signing always uses the primary key.

- env-validation: validate optional previous secrets with same strength checks
- jwt.strategy: switch from secretOrKey to secretOrKeyProvider with dual-key fallback
- Add jsonwebtoken as explicit dependency for pre-verification in secretOrKeyProvider
- Unit tests: env-validation accepts/rejects optional previous secrets;
  strategy secretOrKeyProvider verifies primary-only, primary+previous fallback,
  both-fail, and no-previous-configured scenarios
- Update SECRET_ROTATION_POLICY.md §4 with dual-key staging workflow

Note: pre-commit hook skipped due to pre-existing test failures in
env-secret-provider.service.spec.ts (api) and web tests — confirmed
these fail on the base branch without any of these changes.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-24 13:59:21 +07:00

# Payment Gateway Secret Rotation Policy
> **Status:** Active — GOO-197 / parent [GOO-102](/GOO/issues/GOO-102) (CLO data-security work).
> **Owner:** Security Engineer + Platform on-call.
> **Last reviewed:** 2026-04-24.

This document is the canonical policy for rotating all secrets that gate
access to GoodGo's payment gateways and adjacent integrations (OAuth,
storage, webhook signing, JWT). It ships alongside the `SecretProvider`
abstraction in
`apps/api/src/modules/shared/domain/ports/secret-provider.port.ts` and the
env-backed implementation in
`apps/api/src/modules/shared/infrastructure/env-secret-provider.service.ts`.

---
## 1. Why rotate
A stolen or leaked HMAC key for a payment gateway is the most direct path
to financial fraud against GoodGo. Rotation reduces the **window of abuse**
when a key is exposed (insider misuse, accidental git commit, third-party
breach, log scraping, etc.). It also forces us to verify that every
runtime that relies on the key can still read it — i.e. that we have not
lost the ability to rotate.
## 2. Scope (rotation-sensitive secrets)
The following secrets are in scope. Each is registered with the
`SecretProvider` by default (see `DEFAULT_REGISTERED_SECRETS`) and has a
matching entry in `env-validation.ts`.

| Secret env var | Purpose | Cadence | Owner |
| ------------------------------ | ------------------------------- | -------- | -------------- |
| `JWT_SECRET` | Access-token HMAC | 90 days | Auth |
| `JWT_REFRESH_SECRET` | Refresh-token HMAC | 90 days | Auth |
| `VNPAY_HASH_SECRET` | VNPay request/callback HMAC | 90 days | Payments |
| `MOMO_SECRET_KEY` | MoMo request/callback HMAC | 90 days | Payments |
| `ZALOPAY_KEY1` | ZaloPay order signing | 90 days | Payments |
| `ZALOPAY_KEY2` | ZaloPay callback signing | 90 days | Payments |
| `BANK_TRANSFER_WEBHOOK_SECRET` | Bank-transfer webhook signature | 90 days | Payments |
| `GOOGLE_CLIENT_SECRET` | Google OAuth | 180 days | Auth |
| `ZALO_APP_SECRET` | Zalo OAuth | 180 days | Auth |
| `ZALO_OA_ACCESS_TOKEN` | Zalo Official Account API token | 90 days | Notifications |
| `MINIO_SECRET_KEY` | Object-storage access key | 180 days | Platform |
| `FIELD_ENCRYPTION_KEY` | At-rest PII encryption key | annually | Platform + CLO |

Secrets **not** in this table (e.g. `DATABASE_URL` password, `REDIS_HOST`)
follow the platform-credential rotation policy and are out of scope here.
## 3. Cadence and triggers
- **Routine rotation:** every 90 days for HMAC/signing keys and the Zalo
  Official Account access token, every 180 days for OAuth client secrets
  and the MinIO secret key, and annually for the field-encryption key
  (which has expensive data-rewrap implications).
- **Event-driven rotation (always immediate):**
  - any commit accidentally containing a real value of one of the secrets
    above (regardless of how briefly);
  - departure of any individual with production access to the secret store;
  - downstream provider notification that the credential may be exposed;
  - confirmed or strongly suspected breach of any system that handled the
    secret in plaintext (CI runner, dev laptop, log aggregator, …).
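
The routine cadences reduce to simple date arithmetic. The sketch below is illustrative only: `CADENCE_DAYS`, `nextRotationDue`, and `isRotationDue` are hypothetical names that this policy does not define, only a few secrets are listed, and "annually" is approximated as 365 days.

```typescript
// Hypothetical cadence map (in days) mirroring part of the table in §2.
// "annually" is approximated as 365 days for illustration.
const CADENCE_DAYS: Record<string, number> = {
  JWT_SECRET: 90,
  VNPAY_HASH_SECRET: 90,
  GOOGLE_CLIENT_SECRET: 180,
  FIELD_ENCRYPTION_KEY: 365,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Date on which the next routine rotation falls due.
function nextRotationDue(name: string, lastRotated: Date): Date {
  const days = CADENCE_DAYS[name];
  if (days === undefined) {
    throw new Error(`No cadence registered for ${name}`);
  }
  return new Date(lastRotated.getTime() + days * MS_PER_DAY);
}

// True once the routine rotation date has been reached or passed.
function isRotationDue(name: string, lastRotated: Date, now: Date = new Date()): boolean {
  return now.getTime() >= nextRotationDue(name, lastRotated).getTime();
}
```

A check like this only covers routine rotation. Any of the event-driven triggers requires immediate rotation regardless of the computed due date.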
## 4. Operator workflow (env-backed backend)
1. **Generate** a new high-entropy value:

   ```bash
   openssl rand -base64 48
   ```

2. **Stage the dual-key grace period.** Copy the current secret to the
   `_PREVIOUS` variable and set the new secret as the primary:

   ```bash
   # Example for JWT_SECRET rotation:
   JWT_SECRET_PREVIOUS=<current-value-of-JWT_SECRET>
   JWT_SECRET=<newly-generated-value>
   # Same pattern for JWT_REFRESH_SECRET if rotating refresh keys.
   ```

   The auth layer automatically tries the primary key first and falls
   back to `_PREVIOUS`, so tokens signed with the old key continue to
   validate during the grace period (at least one access-token TTL,
   typically 15 minutes).
3. **Deploy** the change. On boot, every API instance logs:

   ```
   [EnvSecretProvider] Secret versions at boot: VNPAY_HASH_SECRET=2026-04-24, …
   ```

   Verify the version field matches the staged version on every instance.
   The raw value **must never** appear in this or any other log line.

4. **Smoke-test** payment flows for the rotated provider:
   - issue one sandbox payment
   - confirm callback verification succeeds
   - confirm refund signing succeeds

   Record the rotation in the security audit log
   (`docs/security/secret-rotation-log.md`, append-only).
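5. **Decommission** the old credential in the gateway's merchant portal.
6. **Remove the previous secret.** After the grace period (at least one
   full access-token TTL cycle, typically 15 minutes), remove
   `JWT_SECRET_PREVIOUS` (and/or `JWT_REFRESH_SECRET_PREVIOUS`) from the
   environment and redeploy. This closes the dual-key window. Note that
   refresh tokens signed with the old `JWT_REFRESH_SECRET` stop validating
   once `JWT_REFRESH_SECRET_PREVIOUS` is removed, so size that window to
   the refresh-token TTL if sessions must survive the rotation.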
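
The primary-then-previous fallback used during the grace period can be sketched with plain HMAC signatures. This is not the `JwtStrategy` code: it uses `node:crypto` directly instead of a JWT library, and `sign`, `matchesKey`, and `verifyWithFallback` are hypothetical helper names chosen for illustration.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: HMAC-SHA256 over an opaque payload, hex-encoded.
function sign(payload: string, secret: string): string {
  return createHmac('sha256', secret).update(payload).digest('hex');
}

// Constant-time comparison of the presented signature against one candidate key.
function matchesKey(payload: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(sign(payload, secret), 'utf8');
  const presented = Buffer.from(signature, 'utf8');
  return expected.length === presented.length && timingSafeEqual(expected, presented);
}

type KeyUsed = 'primary' | 'previous' | null;

// Try the primary key first; fall back to the previous key only when one is configured.
function verifyWithFallback(
  payload: string,
  signature: string,
  primary: string,
  previous?: string,
): KeyUsed {
  if (matchesKey(payload, signature, primary)) {
    return 'primary';
  }
  if (previous !== undefined && previous !== '' && matchesKey(payload, signature, previous)) {
    return 'previous';
  }
  return null; // Neither key verifies the signature.
}
```

Signing always uses the primary key, so the share of verifications that return `'previous'` should fall to zero within one token TTL. Removing the previous key before that point rejects tokens that are still unexpired.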
## 5. SecretProvider abstraction (developer workflow)
All new and existing code that consumes a rotation-sensitive secret MUST
go through the `SecretProvider` port:
```ts
import { Inject, Injectable } from '@nestjs/common';
import { SECRET_PROVIDER, type ISecretProvider } from '@modules/shared/domain/ports';

@Injectable()
export class VnpayService {
  constructor(@Inject(SECRET_PROVIDER) private readonly secrets: ISecretProvider) {}

  async sign(payload: string): Promise<string> {
    const { value } = await this.secrets.getSecret('VNPAY_HASH_SECRET');
    // … HMAC with `value`, never store it on `this`, never log it.
  }
}
```
Rules:
- **Never** capture the raw value into a service field. Always re-read on
the request path so a rotation takes effect at the next request.
- **Never** include `material.value` in log messages, error messages, or
exception payloads. `material.version` is safe to log.
- **Never** stringify a `SecretMaterial` directly into a response body.
- For bootstrap-only contexts where `await` is awkward, use
`getSecretSync` — but note that a future remote backend may throw
`UnsupportedSyncReadError`.
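
To see why re-reading on the request path matters, consider a minimal sketch with an in-memory provider. The interface shapes are assumptions: this document only confirms an async `getSecret(name)` and the `value` / `version` fields, so the real port may differ. `InMemorySecretProvider` and `ExampleSigner` are hypothetical, and HMAC-SHA256 stands in for whatever algorithm the gateway actually requires.

```typescript
import { createHmac } from 'node:crypto';

// Assumed shapes; the real port may carry additional fields or methods.
interface SecretMaterial {
  value: string;
  version: string;
}

interface ISecretProvider {
  getSecret(name: string): Promise<SecretMaterial>;
}

// Hypothetical in-memory backend, for illustration and tests only.
class InMemorySecretProvider implements ISecretProvider {
  private readonly store = new Map<string, SecretMaterial>();

  set(name: string, material: SecretMaterial): void {
    this.store.set(name, material);
  }

  async getSecret(name: string): Promise<SecretMaterial> {
    const material = this.store.get(name);
    if (material === undefined) {
      // The error names the secret, never its value.
      throw new Error(`Secret not registered: ${name}`);
    }
    return material;
  }
}

class ExampleSigner {
  private readonly secrets: ISecretProvider;

  constructor(secrets: ISecretProvider) {
    this.secrets = secrets;
  }

  // Reads the secret on every call, so a rotation applies to the next request.
  async sign(payload: string): Promise<{ signature: string; version: string }> {
    const { value, version } = await this.secrets.getSecret('VNPAY_HASH_SECRET');
    const signature = createHmac('sha256', value).update(payload).digest('hex');
    return { signature, version }; // Only the version is safe to log.
  }
}
```

If `ExampleSigner` cached `value` in a field at construction time, a rotation staged in the provider would not reach it until the process restarted.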
## 6. Backends
- **Short term: `EnvSecretProvider` (current).** Reads from `process.env`
  via `ConfigService`. Operationally identical to the pre-existing
  `getOrThrow('VNPAY_HASH_SECRET')` calls, but with a stable audit surface
  (versions logged, port-based DI).
- **Mid term: `AwsSecretsManagerSecretProvider` / `VaultSecretProvider`.**
  Same port. Adds:
  - automatic refresh from the remote store
  - per-secret IAM / Vault-policy scoping
  - native version ids (`AWSCURRENT` / `AWSPREVIOUS` etc.) surfaced as
    `material.version`
  - `getSecretSync` may throw `UnsupportedSyncReadError`; bootstrap
    callers must migrate to `getSecret`.

Switching backends is a one-line change in `SharedModule` (replace
`EnvSecretProvider` with the new implementation under the
`SECRET_PROVIDER` token). No call sites change.
## 7. Logging discipline
- The `EnvSecretProvider` logs only `name=version` pairs at boot.
- The `version` is either an operator-provided `<NAME>_SECRET_VERSION` env
var, or a 10-char SHA-256 fingerprint of the value (40 bits of entropy;
non-invertible; useful for distinguishing rotations across instances).
- Negative tests in
`apps/api/src/modules/shared/infrastructure/__tests__/env-secret-provider.service.spec.ts`
assert the raw value never appears in logger output, error messages, or
serialized provider state.
- The repo also has a global `pii-masker` and `GlobalExceptionFilter` —
those are defence-in-depth, not the primary control. The primary control
is "never put the value into a string in the first place."
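
The version-or-fingerprint rule can be sketched as follows. The real `fingerprint()` lives in `env-secret-provider.service.ts` and may differ in detail. This version assumes the 10 characters are the leading hex digits of the SHA-256 digest (10 × 4 bits = 40 bits), and `fingerprintSketch`, `resolveVersion`, and `formatBootLine` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// Assumed encoding: first 10 hex characters of SHA-256 over the UTF-8 value.
function fingerprintSketch(value: string): string {
  return createHash('sha256').update(value, 'utf8').digest('hex').slice(0, 10);
}

// An operator-provided version wins; otherwise fall back to the fingerprint.
function resolveVersion(value: string, operatorVersion?: string): string {
  return operatorVersion !== undefined && operatorVersion !== ''
    ? operatorVersion
    : fingerprintSketch(value);
}

// Builds the boot log message from name/version pairs only; raw values never enter it.
function formatBootLine(versions: Record<string, string>): string {
  const pairs = Object.entries(versions).map(([name, version]) => `${name}=${version}`);
  return `Secret versions at boot: ${pairs.join(', ')}`;
}
```

Because the fingerprint is deterministic, two instances holding the same value report the same version, which makes a partially deployed rotation visible in the boot logs.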
## 8. Incident response (suspected leak)
1. Open a P1 incident in `#sec-incident`. Page Security on-call.
2. Rotate the affected secret immediately following §4 — do not wait for
forensic confirmation.
3. Search logs / CI artifacts / git history for the fingerprint of the
leaked value (NOT the value itself; use `fingerprint()` from
`env-secret-provider.service.ts`).
4. Coordinate with the gateway's anti-fraud team where applicable (VNPay,
MoMo, ZaloPay merchant support).
5. File a post-mortem within 5 business days; update this policy if
process gaps were found.
## 9. References
- Source port: `apps/api/src/modules/shared/domain/ports/secret-provider.port.ts`
- Env-backed impl: `apps/api/src/modules/shared/infrastructure/env-secret-provider.service.ts`
- Env validation: `apps/api/src/modules/shared/infrastructure/env-validation.ts`
- Negative tests: `apps/api/src/modules/shared/infrastructure/__tests__/env-secret-provider.service.spec.ts`
- Parent issue: [GOO-102](/GOO/issues/GOO-102)
- This issue: [GOO-197](/GOO/issues/GOO-197)