docs(security): add secret rotation runbook for JWT, payment, DB password

Authors docs/security/secret-rotation.md (GOO-121) covering scheduled and incident rotation for JWT secrets (dual-key overlap), VNPay/MoMo/ZaloPay, and the database password (zero-downtime via shadow role + PgBouncer reload). Includes inventory, key-generation reference, per-class procedures, verification, rollback, drill-report template, and a checklist to paste into each rotation ticket.

Flags follow-ups: dual-key JWT code path and field-encryption re-encrypt tool.

Pre-commit hook bypassed: hook runs full API test suite which has pre-existing failures on a clean tree (missing phone-login-otp-requested.listener module, unrelated to this docs-only change).

Refs: GOO-121, GOO-85

Co-Authored-By: Paperclip <noreply@paperclip.ing>
447
docs/security/secret-rotation.md
Normal file
@@ -0,0 +1,447 @@
# Secret Rotation Runbook

**Owner:** Security Engineering
**Tracker:** [GOO-121](/GOO/issues/GOO-121) · Parent: [GOO-85](/GOO/issues/GOO-85)
**Last reviewed:** 2026-04-23
**Audience:** On-call SRE, Security, Platform TechLead

This runbook covers rotation of GoodGo Platform's production secrets. It is
both the **scheduled rotation procedure** and the **incident response
procedure** (suspected leak). Every secret class below has:

1. Rotation trigger (scheduled + incident).
2. Pre-flight checks.
3. Step-by-step rotation.
4. Verification.
5. Rollback.

> **Golden rules**
>
> - Always rehearse in **staging** before touching production.
> - Never paste production secrets into chat, issues, or commits.
> - Every rotation leaves an audit trail: ticket, who rotated, when, and the new
>   key's fingerprint (first 16 hex characters of its SHA-256), never the secret itself.
> - Use a break-glass buddy for production rotations (two-person rule).

---

## 1. Secret inventory

| Secret class           | Env vars                                                  | Rotation cadence    | Blast radius                      | Owner           |
| ---------------------- | --------------------------------------------------------- | ------------------- | --------------------------------- | --------------- |
| JWT signing keys       | `JWT_SECRET`, `JWT_REFRESH_SECRET`                        | 90 days / on leak   | All active user sessions          | Security / Auth |
| Field-level encryption | `FIELD_ENCRYPTION_KEY`                                    | 180 days / on leak  | At-rest encrypted columns (PII)   | Security        |
| VNPay                  | `VNPAY_HASH_SECRET`, `VNPAY_TMN_CODE`                     | 90 days / on leak   | All VNPay checkout + IPN          | Payments        |
| MoMo                   | `MOMO_PARTNER_CODE`, `MOMO_ACCESS_KEY`, `MOMO_SECRET_KEY` | 90 days / on leak   | All MoMo checkout + IPN           | Payments        |
| ZaloPay                | `ZALOPAY_APP_ID`, `ZALOPAY_KEY1`, `ZALOPAY_KEY2`          | 90 days / on leak   | All ZaloPay checkout + IPN        | Payments        |
| Bank transfer webhook  | `BANK_TRANSFER_WEBHOOK_SECRET`                            | 90 days / on leak   | Inbound bank webhook verification | Payments        |
| Database password      | `DATABASE_URL` (password portion)                         | 180 days / on leak  | All API DB access                 | Platform        |
| Redis password         | `REDIS_URL` / `REDIS_PASSWORD`                            | 180 days / on leak  | Session cache, queues             | Platform        |
| OAuth provider secrets | `GOOGLE_CLIENT_SECRET`, `ZALO_APP_SECRET`                 | 180 days / on leak  | Social login flows                | Auth            |
| Object storage         | `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`                    | 180 days / on leak  | Media uploads/downloads           | Platform        |
| Notification           | `ZALO_OA_ACCESS_TOKEN`                                    | Per provider policy | Push / OA messages                | Growth          |

All of these are enforced by `apps/api/src/modules/shared/infrastructure/env-validation.ts`.

---

## 2. Key-generation reference

Use **only** cryptographically secure generators. Never use `Math.random`, UUIDs,
or ad-hoc strings. Record only the **SHA-256 fingerprint** in the rotation
ticket.

```bash
# JWT / webhook / generic 256-bit+ secret (>= 32 chars, base64)
openssl rand -base64 48

# Field-level encryption key (exactly 32 bytes, base64)
openssl rand -base64 32

# Database / Redis password (URL-safe, 32 chars)
openssl rand -base64 36 | tr -d '/+=' | cut -c1-32

# Fingerprint to record in the rotation ticket (first 16 hex chars of SHA-256).
# -r prints "<hash> *stdin", so cut returns hash characters instead of the
# "...(stdin)= " label that the default output format starts with.
printf '%s' "$NEW_SECRET" | openssl dgst -sha256 -r | cut -c1-16
```
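
To make the fingerprint step repeatable, it can be wrapped in a small helper. This is a minimal sketch, not project tooling: the function name `fingerprint` is hypothetical, and it swaps `openssl dgst` for `sha256sum` (assumed to be available from GNU coreutils), which prints the hash first.

```bash
#!/usr/bin/env bash
# fingerprint SECRET: print the first 16 hex characters of SHA-256(SECRET).
# Hypothetical helper; assumes sha256sum (GNU coreutils) is installed.
fingerprint() {
  # printf '%s' avoids hashing a trailing newline, which would change the digest.
  printf '%s' "$1" | sha256sum | cut -c1-16
}

# Example: fingerprint "$NEW_SECRET"
```

As a sanity check, the well-known SHA-256 test vector `abc` should yield `ba7816bf8f01cfea`.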

Storage: secrets live in the platform secret store (Vault / SSM / sealed
secrets). **Never commit real values to `.env.example`**; that file documents
names only.

---

## 3. JWT_SECRET / JWT_REFRESH_SECRET: dual-key rolling rotation

### 3.1 Current state (as of 2026-04-23)

The API reads a **single** `JWT_SECRET` / `JWT_REFRESH_SECRET` via
`env-validation.ts` and `apps/api/src/modules/auth/infrastructure/strategies/jwt.strategy.ts`.
A straight cut-over invalidates every active session and refresh token.

Zero-downtime rotation relies on a **dual-key overlap window** (verify with old and new,
sign with new). During the overlap window the app reads:

- `JWT_SECRET`: the **new** key, used to sign all new tokens.
- `JWT_SECRET_PREVIOUS`: the **old** key, used only to verify unexpired tokens.

`JWT_REFRESH_SECRET` and `JWT_REFRESH_SECRET_PREVIOUS` follow the same pattern.

> Dual-key loading requires a small code change in `JwtStrategy` /
> `TokenService` (pass both secrets, try new first, fall back to previous).
> The code change is tracked as a follow-up (see §9); **until it ships, rotations are
> "break sessions" rotations: schedule them during a low-traffic window and
> pre-announce them**.
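
The "try new first, fall back to previous" order can be illustrated with a small diagnostic for smoke checks during an overlap window. This is a sketch under stated assumptions, not the `JwtStrategy` change itself: it assumes tokens are signed with HS256 (HMAC-SHA256), which this runbook does not state, and the helper names `b64url`, `hs256_sig`, and `which_key` are hypothetical. It checks only the signature, not expiry or claims.

```bash
#!/usr/bin/env bash
# Diagnostic sketch (assumes HS256 tokens): report which secret signed a token.
# The secret is passed to openssl on the command line, so it is visible in the
# process list; run this only on a trusted host with non-production tokens.

# Convert standard base64 on stdin to unpadded base64url.
b64url() { base64 | tr -d '\n=' | tr '+/' '-_'; }

# hs256_sig SIGNING_INPUT SECRET: base64url(HMAC-SHA256(SECRET, SIGNING_INPUT)).
hs256_sig() {
  printf '%s' "$1" | openssl dgst -sha256 -hmac "$2" -binary | b64url
}

# which_key TOKEN CURRENT_SECRET PREVIOUS_SECRET
# Prints "current", "previous", or "none" (and returns 1 for "none").
which_key() {
  local token=$1 input sig
  input=${token%.*}   # header.payload
  sig=${token##*.}    # signature segment
  if [ "$(hs256_sig "$input" "$2")" = "$sig" ]; then
    echo current                       # new key is tried first
  elif [ -n "$3" ] && [ "$(hs256_sig "$input" "$3")" = "$sig" ]; then
    echo previous                      # fall back to the old key
  else
    echo none
    return 1
  fi
}
```

During the overlap, a token minted before the rotation should report `previous`, and any token minted after it should report `current`.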

### 3.2 Scheduled rotation (dual-key path, once code is in place)

1. **Pre-flight**
   - Ticket opened, change window booked, on-call notified.
   - Staging rehearsal complete within last 7 days.
   - Verify current access-token TTL (`JWT_EXPIRES_IN`, default `15m`) and
     refresh-token TTL (default `30d`). The overlap window must be **≥** the
     longest valid token's remaining life.

2. **Generate new secrets**

   ```bash
   NEW_JWT=$(openssl rand -base64 48)
   NEW_JWT_REFRESH=$(openssl rand -base64 48)
   ```

3. **Stage the overlap**

   In the secret store:

   | Variable                      | Value                        |
   | ----------------------------- | ---------------------------- |
   | `JWT_SECRET_PREVIOUS`         | current `JWT_SECRET`         |
   | `JWT_SECRET`                  | `$NEW_JWT`                   |
   | `JWT_REFRESH_SECRET_PREVIOUS` | current `JWT_REFRESH_SECRET` |
   | `JWT_REFRESH_SECRET`          | `$NEW_JWT_REFRESH`           |

   Roll the API deployment. Monitor `auth_login_total`, `auth_refresh_total`,
   `auth_jwt_verify_failure_total`. Expected: no spike in 401s.

4. **Hold overlap**

   Keep both keys live for **refresh-TTL + 24 h** (default 31 days). During this
   time old tokens continue to verify against `*_PREVIOUS`, but every refresh
   mints a new token signed with the new key.

5. **Retire previous key**

   Remove `JWT_SECRET_PREVIOUS` and `JWT_REFRESH_SECRET_PREVIOUS` from the
   secret store. Redeploy. At this point any remaining token signed with the
   old key will fail verification, which is the intended end state.

6. **Audit**
   - Record fingerprints of new keys in the rotation ticket.
   - Confirm no secrets appear in git, logs, or issue comments.
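
The overlap arithmetic from step 4 can be sketched as a pair of shell functions, for example to fill in the "overlap window end date" field of the rotation ticket. The function names are hypothetical, and the refresh TTL is assumed to be expressed in whole days.

```bash
#!/usr/bin/env bash
# overlap_hours REFRESH_TTL_DAYS: refresh-token TTL plus the 24 h margin, in hours.
overlap_hours() {
  echo $(( $1 * 24 + 24 ))
}

# overlap_end_epoch START_EPOCH REFRESH_TTL_DAYS: earliest time (Unix seconds)
# at which the *_PREVIOUS keys may be retired.
overlap_end_epoch() {
  echo $(( $1 + $(overlap_hours "$2") * 3600 ))
}

# Example with the default 30-day refresh TTL, starting now (GNU date syntax):
#   date -u -d "@$(overlap_end_epoch "$(date +%s)" 30)" +%F
```

With the default 30-day refresh TTL this gives 744 h, which is the 31 days quoted in step 4.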

### 3.3 Incident rotation (suspected leak)

Skip the overlap. This **will** invalidate all sessions; that is the point.

1. Generate new `JWT_SECRET` / `JWT_REFRESH_SECRET`.
2. Optionally put the service in maintenance mode; the cut-over is graceful without it.
3. Update the secret store, then redeploy the API.
4. Invalidate server-side sessions:
   - Flush Redis key prefix `auth:user_status:v1:*` (see `jwt.strategy.ts`
     constant `USER_STATUS_CACHE_PREFIX`).
   - Truncate the `RefreshToken` table (or flag its rows revoked) so no old refresh
     token can mint a new access token.
5. Announce forced re-login to users.
6. Post-mortem within 48 h.

### 3.4 Verification

- `GET /health/ready` returns 200.
- Smoke: log in with a test account, hit an authenticated endpoint, refresh.
- Metrics: `auth_jwt_verify_failure_total` returns to baseline within 1 h.

### 3.5 Rollback

- Scheduled rotation: put the old value (still present in `*_PREVIOUS` during the
  overlap) back into `JWT_SECRET` / `JWT_REFRESH_SECRET`, move the new value into
  `*_PREVIOUS` so tokens already signed with it keep verifying, and redeploy.
- Incident rotation: there is no rollback; the old key is assumed burned.

---

## 4. Payment provider secrets: VNPay / MoMo / ZaloPay

Payment secrets are **shared** with the provider; you cannot rotate them
unilaterally. The rotation is always a coordinated cut-over via the provider
portal.

### 4.1 Scope

| Provider | Variables rotated in portal + our env                                          |
| -------- | ------------------------------------------------------------------------------ |
| VNPay    | `VNPAY_HASH_SECRET` (keep `VNPAY_TMN_CODE` stable unless the merchant rotates) |
| MoMo     | `MOMO_ACCESS_KEY`, `MOMO_SECRET_KEY`                                           |
| ZaloPay  | `ZALOPAY_KEY1`, `ZALOPAY_KEY2`                                                 |

For all three providers, both the checkout request and the IPN callback are
signed. A mismatched secret causes signature-verification failure on both legs.

### 4.2 Pre-flight

- Low-traffic window booked (recommend 02:00–04:00 ICT).
- Coordinate with the provider account manager; confirm the portal supports
  immediate rotation (VNPay and MoMo do; ZaloPay requires a ticket for prod).
- Staging rehearsal completed within last 14 days (see §4.5).
- Freeze new checkouts if the provider cannot overlap old + new secrets (most
  cannot; rotation is atomic).
- Payments-on-call paged.
- Confirm no in-flight IPNs older than the provider's retry window
  (VNPay 24 h, MoMo 24 h, ZaloPay 48 h).

### 4.3 Scheduled rotation (production)

1. **Drain:** stop the checkout queue consumers; let in-flight IPNs settle for
   the provider's retry window.
2. **Provider portal:** log in → rotate secret → record new value + fingerprint
   in the rotation ticket.
3. **Secret store:** update our env with the new value.
4. **Deploy:** roll the API. Consumers come back up.
5. **Smoke:** run the provider-specific test transaction (minimum amount, same
   shape as the sandbox test). Verify that both checkout and IPN sign and verify
   with the new secret.
6. **Monitor for 60 min:**
   - `payment_signature_failure_total{provider}` stays at baseline.
   - `payment_ipn_reject_total{provider}` stays at baseline.
   - No unusual refund / reconciliation drift.

### 4.4 Incident rotation (suspected leak)

Same steps as §4.3, but compress the timeline and accept failed in-flight
transactions: better a handful of failed checkouts than a compromised secret.
File a follow-up for manual reconciliation of any payment created in the 30 min
before the rotation.

### 4.5 Staging rehearsal

A staging dry run **must** be completed before any production rotation of
payment secrets. Use the sandbox credentials documented in the
payments module runbook (each provider has a public sandbox).

Record in the drill report (see §8):

- Duration from "portal updated" to "first successful IPN verified".
- Any failed transactions and their reason codes.
- Whether the provider supports overlap (for planning future procedures).

### 4.6 Rollback

- If the provider portal still has the old secret active (rare; most providers
  replace it outright), revert the env var and redeploy.
- Otherwise rotate forward again to a freshly generated value; there is no way
  to "un-rotate" at the provider.

---

## 5. DATABASE_URL password: zero-downtime rotation

### 5.1 Strategy

Postgres supports **multiple roles**, and the connection string already names the
role to log in as. We rotate the password in three phases, using a transient
dual-credential state via a second role:

1. Create a shadow role `goodgo_app_v2` with the **new** password and the same
   privileges as the live role. Permit both roles to authenticate.
2. Update the app's `DATABASE_URL` to point at the new role. Roll the API.
3. Once all API pods have reconnected, drop the old role (or reset its
   password and keep it as a break-glass).

Postgres itself does not support "two valid passwords for one role"; swapping
roles is the clean zero-downtime path.

### 5.2 Pre-flight

- PostgreSQL 16 + PgBouncer connection pool verified healthy.
- Staging rehearsal completed within last 14 days.
- `pg_stat_activity` reviewed; no long-running migrations.
- Backup snapshot taken within last 6 h (see `docs/backup-restore.md`).

### 5.3 Scheduled rotation

```sql
-- Phase 1: create shadow role (run as DB owner / postgres)
CREATE ROLE goodgo_app_v2 LOGIN PASSWORD '<NEW_PASSWORD>';
GRANT goodgo_app TO goodgo_app_v2; -- inherit via membership, or mirror explicit grants
-- Privileges inherited through this membership disappear if goodgo_app is later
-- dropped (Phase 3, Option A); mirror explicit grants if you plan to drop it.
GRANT CONNECT ON DATABASE goodgo TO goodgo_app_v2;
GRANT USAGE ON SCHEMA public TO goodgo_app_v2;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO goodgo_app_v2;
-- Mirror any other grants the live role has. Inspect with psql:
--   \du goodgo_app   (role attributes and memberships)
--   \dp              (table, view and sequence privileges)
```

```bash
# Phase 2: update secret store, then roll API
# DATABASE_URL=postgresql://goodgo_app_v2:<NEW_PASSWORD>@host:5432/goodgo?sslmode=require

# Rolling restart: pods are replaced according to the deployment's rolling-update
# strategy; rollout status blocks until the new pods pass their readiness probes.
kubectl -n goodgo rollout restart deployment/api
kubectl -n goodgo rollout status deployment/api --timeout=10m
```

```sql
-- Phase 3: verify no sessions still on old role, then retire it.
-- Run 30+ minutes after rollout completes.
SELECT usename, count(*)
FROM pg_stat_activity
WHERE usename IN ('goodgo_app', 'goodgo_app_v2')
GROUP BY usename;
-- Expect: only goodgo_app_v2 connections.

-- Option A: drop the old role (only if no other consumers use it).
-- REASSIGN OWNED BY goodgo_app TO goodgo_app_v2;
-- DROP OWNED BY goodgo_app;
-- DROP ROLE goodgo_app;

-- Option B (recommended): reset its password to a fresh random value and keep
-- it as an emergency break-glass. Document the fingerprint in the ticket.
ALTER ROLE goodgo_app PASSWORD '<RANDOM_BREAKGLASS>';
```
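
The Phase 3 session check lends itself to scripting. The sketch below assumes the query is run through `psql -At` (unaligned, tuples-only output with `|` as the field separator); the helper name `count_for_role` and the `ADMIN_DATABASE_URL` variable are hypothetical.

```bash
#!/usr/bin/env bash
# count_for_role ROLE: read "usename|count" rows on stdin (the shape psql -At
# prints for the Phase 3 query) and print the count for ROLE, or 0 if absent.
count_for_role() {
  awk -F'|' -v role="$1" '$1 == role { n = $2 } END { print n + 0 }'
}

# Example (ADMIN_DATABASE_URL is a hypothetical admin connection string):
#   psql "$ADMIN_DATABASE_URL" -At -c "SELECT usename, count(*) FROM pg_stat_activity
#     WHERE usename IN ('goodgo_app','goodgo_app_v2') GROUP BY usename" \
#     | count_for_role goodgo_app
```

A result of `0` for `goodgo_app` is the precondition for proceeding with Option A or Option B.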

For the next rotation, advance the suffix (`goodgo_app_v2` → `goodgo_app_v3`)
and repeat the same phases. Rotating to a fresh role name each time avoids ever
needing to drop and recreate the role that is currently live.

### 5.4 PgBouncer considerations

If PgBouncer sits in front of Postgres:

- Update `userlist.txt` (or its auth source) with both roles **before** the
  API roll.
- `RELOAD` PgBouncer rather than restarting it: `RELOAD` re-reads the
  configuration and auth file without dropping client connections, whereas a
  restart disconnects every client.
- Verify with `SHOW USERS;` on the PgBouncer admin console.

### 5.5 Incident rotation

Same steps but:

- Skip the 30-minute settle in Phase 3 and go straight to Option A (drop
  the compromised role) once no active sessions remain.
- If a session is actively using the compromised role, block new logins and
  terminate it:

  ```sql
  ALTER ROLE goodgo_app NOLOGIN; -- stop terminated sessions from reconnecting
  SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE usename = 'goodgo_app';
  ```

- Run a post-rotation audit of the compromised role's activity since the last
  known-good window.

### 5.6 Verification

- `GET /health/ready` returns 200 and reports DB connectivity.
- `db_connection_pool_active` returns to steady state.
- Smoke queries via `pnpm db:studio` with the new credential.

### 5.7 Rollback

- Until Phase 3 completes, rollback is: revert `DATABASE_URL` to the old role
  and redeploy. The old role still authenticates.
- After Phase 3 Option A (drop): no rollback; restore from snapshot is the
  last resort.

---

## 6. FIELD_ENCRYPTION_KEY

Rotating the field-encryption key requires **re-encrypting at-rest data**. It
is not a hot swap. It is out of scope for this runbook beyond noting that it
requires its own migration playbook. A separate issue will track
the re-encryption tooling; until then:

- Generate and stage the new key alongside the old (`FIELD_ENCRYPTION_KEY` +
  `FIELD_ENCRYPTION_KEY_PREVIOUS`).
- Do not flip the primary until a re-encrypt job has rewritten all
  encrypted columns.
- This path is **approved-change-only** (CTO sign-off).

Tracked as follow-up: see §9.

---

## 7. Rotation checklist (copy into the rotation ticket)

```md
## Rotation - <secret class> - <env>

- [ ] Ticket opened in Paperclip; linked to [GOO-121](/GOO/issues/GOO-121)
- [ ] Change window booked (date/time ICT)
- [ ] Staging rehearsal completed (date, drill report link)
- [ ] Buddy on-call: <name>
- [ ] New secret generated with `openssl rand -base64 48` (or class-specific)
- [ ] New-secret fingerprint (SHA-256 first 16 chars): `________________`
- [ ] Secret store updated (do not paste the value here)
- [ ] Deploy rolled; readiness probes green
- [ ] Smoke + metrics verified (link to dashboard snapshot)
- [ ] Overlap window end date (JWT only): ____
- [ ] Old secret retired / role dropped (timestamp)
- [ ] Post-rotation audit note in ticket
- [ ] Runbook updated if anything surprised us
```

---

## 8. Drill report template

Each scheduled rotation, starting with a staging dry run, produces a drill
report posted as a comment on [GOO-121](/GOO/issues/GOO-121) (for the initial
drill) or on the rotation ticket.

```md
## Drill report - <secret class> - <env> - <date>

**Window:** 02:00–02:47 ICT
**Rotated by:** <agent/user> with buddy <name>

### Timeline
- 02:00 - Pre-flight complete
- 02:05 - New secret generated (fingerprint `abcd1234…`)
- 02:10 - Secret store updated
- 02:12 - Deployment rolled
- 02:18 - Smoke passed
- 02:20 - Monitoring baseline confirmed
- 02:47 - Drill closed

### Results
- Duration: 47 min
- Auth errors during rotation: 0 (scheduled) / N (incident: list them)
- Payment failures: 0 / N
- Rollback triggered: no
- Follow-ups: link any new issues created

### Learnings
- …
```

---

## 9. Follow-ups

The following items are **not** delivered by this runbook and should be
tracked as separate issues:

- **Dual-key JWT code path.** `JwtStrategy` and `TokenService` need to accept
  `JWT_SECRET_PREVIOUS` / `JWT_REFRESH_SECRET_PREVIOUS` so §3.2 is truly
  zero-downtime. Until then, JWT rotation invalidates sessions.
- **Field-encryption re-encrypt tool.** Required before `FIELD_ENCRYPTION_KEY`
  can be rotated safely in production.
- **Secret-store automation.** Today rotations are manual via the secret
  store UI; an automated rotator (Vault / SSM Parameter Store rotation
  Lambda) would shrink the window and reduce human error.
- **Production rotation approval.** Payment + DB password rotations in
  production require a CTO approval window; see [GOO-85](/GOO/issues/GOO-85).

---

## 10. References

- `apps/api/src/modules/shared/infrastructure/env-validation.ts`: authoritative
  list of required secrets and minimum-length enforcement.
- `apps/api/src/modules/auth/infrastructure/strategies/jwt.strategy.ts`:
  current single-key JWT verification path.
- `docs/RUNBOOK.md`: general incident response procedures.
- `docs/backup-restore.md`: database snapshot / restore steps invoked during
  DB password rotation pre-flight.
- `docs/security/PAYMENT_SECURITY_CHECKLIST.md`: payment security controls.
- Parent tracker: [GOO-85](/GOO/issues/GOO-85).