diff --git a/.cursor/plans/enterprise_auth_service_c4783a66.plan.md b/.cursor/plans/enterprise_auth_service_c4783a66.plan.md
new file mode 100644
index 00000000..f87cb1b0
--- /dev/null
+++ b/.cursor/plans/enterprise_auth_service_c4783a66.plan.md
@@ -0,0 +1,771 @@
+---
+name: Enterprise Auth Service
+overview: Refactor the entire auth-service into an enterprise-grade system serving 100+ million users, with an architecture that fits the GoodGo Platform and reuses the existing template and monorepo structure.
+todos:
+  - id: backup-current
+    content: Back up the current auth-service before deleting it
+    status: pending
+  - id: delete-auth-service
+    content: Delete the entire current auth-service
+    status: pending
+  - id: copy-template
+    content: Copy the template service and rename it to auth-service
+    status: pending
+  - id: setup-base-structure
+    content: Set up the base structure with the basic modules
+    status: pending
+  - id: implement-advanced-schema
+    content: Create the Prisma schema for RBAC, Social Login, Sessions
+    status: pending
+  - id: implement-multi-layer-cache
+    content: Implement a multi-layer cache with Redis and in-memory storage
+    status: pending
+  - id: implement-rbac-module
+    content: Create the RBAC module with permissions, roles, policies
+    status: pending
+  - id: implement-social-auth
+    content: Create the social auth module (Google, Facebook, GitHub)
+    status: pending
+  - id: implement-oidc-module
+    content: Implement the OIDC provider and client
+    status: pending
+  - id: implement-jwt-service
+    content: Create the JWT service with access/refresh/ID tokens
+    status: pending
+  - id: implement-cookie-service
+    content: Implement secure cookie management
+    status: pending
+  - id: implement-mfa-module
+    content: Create the MFA module with TOTP and WebAuthn
+    status: pending
+  - id: implement-zero-trust
+    content: Implement zero-trust security middleware
+    status: pending
+  - id: setup-event-sourcing
+    content: Set up event sourcing for audit logs
+    status: pending
+  - id: implement-rate-limiting
+    content: Dynamic rate limiting by role
+    status: pending
+  - id: setup-monitoring
+    content: Set up monitoring with Prometheus and Grafana
+    status: pending
+  - id: write-tests
+    content: Write unit and integration tests
+    status: pending
+  - id: update-docker-compose
+    content: Update docker-compose.yml with the new services
+    status: pending
+  - id: write-documentation
+    content: Write documentation and API specs
+    status: pending
+---
+
+# Enterprise Auth Service - Implementation Plan
+
+## 1. Challenges and Solutions at Large Scale
+
+### Performance Challenges
+
+- **100M+ users** = ~10-50K requests/second at peak time
+- **Token validation** must take < 10ms (in-memory cache)
+- **Login/Register** must take < 100ms (including the DB write)
+- **Social login** must handle provider timeouts
+
+### Security Challenges
+
+- **Distributed attacks** (DDoS, brute force from multiple regions)
+- **Token hijacking** and replay attacks
+- **Data breaches** with 100M+ records
+- **Compliance** (GDPR, SOC2, ISO 27001)
+
+### Availability Challenges
+
+- **99.999% uptime** = at most 5.26 minutes of downtime per year
+- **Multi-region failover** < 30 seconds
+- **Zero-downtime deployments**
+- **Database replication lag** < 100ms
+
+## 2. Architecture Overview
+
+```mermaid
+graph TB
+    subgraph clients [Client Layer]
+        web[Web Apps]
+        mobile[Mobile Apps]
+        api[API Clients]
+    end
+
+    subgraph cdn [CDN Layer]
+        cloudflare[Cloudflare]
+        fastly[Fastly Cache]
+    end
+
+    subgraph gateway [API Gateway Layer]
+        kong1[Kong Gateway Region 1]
+        kong2[Kong Gateway Region 2]
+        kong3[Kong Gateway Region 3]
+    end
+
+    subgraph auth [Auth Service Cluster]
+        subgraph region1 [US Region]
+            auth1[Auth Service Pods]
+            cache1[Redis Cluster]
+            db1[PostgreSQL Primary]
+        end
+
+        subgraph region2 [EU Region]
+            auth2[Auth Service Pods]
+            cache2[Redis Cluster]
+            db2[PostgreSQL Replica]
+        end
+
+        subgraph region3 [APAC Region]
+            auth3[Auth Service Pods]
+            cache3[Redis Cluster]
+            db3[PostgreSQL Replica]
+        end
+    end
+
+    subgraph external [External Services]
+        oidc[OIDC Providers]
+        social[Social Providers]
+        sms[SMS Gateway]
+        email[Email Service]
+    end
+
+    subgraph monitoring [Observability]
+        datadog[Datadog APM]
+        elastic[ELK Stack]
+        pagerduty[PagerDuty]
+    end
+
+    clients --> cdn
+    cdn --> gateway
+    gateway --> auth
+    auth --> external
+    auth --> monitoring
+```
+
+## 3. Microservices Architecture
+
+```mermaid
+graph LR
+    subgraph authCore [Auth Core Services]
+        authAPI[Auth API Service]
+        tokenService[Token Service]
+        sessionService[Session Service]
+        mfaService[MFA Service]
+    end
+
+    subgraph rbacServices [RBAC Services]
+        permissionService[Permission Service]
+        roleService[Role Service]
+        policyService[Policy Service]
+    end
+
+    subgraph socialServices [Social Services]
+        socialAuth[Social Auth Service]
+        oidcProvider[OIDC Provider]
+        oidcClient[OIDC Client]
+    end
+
+    subgraph supportServices [Support Services]
+        auditService[Audit Service]
+        notificationService[Notification Service]
+        analyticsService[Analytics Service]
+    end
+
+    subgraph dataLayer [Data Layer]
+        postgresCluster[PostgreSQL Cluster]
+        redisCluster[Redis Cluster]
+        elasticSearch[ElasticSearch]
+        kafka[Kafka Event Bus]
+    end
+
+    authAPI --> tokenService
+    authAPI --> sessionService
+    authAPI --> mfaService
+    authAPI --> rbacServices
+    authAPI --> socialServices
+
+    authCore --> kafka
+    rbacServices --> kafka
+    socialServices --> kafka
+    kafka --> supportServices
+
+    authCore --> dataLayer
+    rbacServices --> dataLayer
+    socialServices --> dataLayer
+    supportServices --> dataLayer
+```
+
+## 4. Project Structure in the GoodGo Monorepo
+
+```
+services/
+└── auth-service/                 # Single service with modules
+    ├── Dockerfile
+    ├── package.json
+    ├── prisma/
+    │   ├── schema.prisma         # Advanced schema with RBAC
+    │   └── migrations/
+    ├── src/
+    │   ├── config/
+    │   │   ├── app.config.ts
+    │   │   ├── database.config.ts
+    │   │   ├── redis.config.ts
+    │   │   ├── jwt.config.ts
+    │   │   └── social.config.ts
+    │   ├── core/                 # Core utilities
+    │   │   ├── cache/
+    │   │   │   ├── multi-layer-cache.ts
+    │   │   │   └── cache.service.ts
+    │   │   ├── security/
+    │   │   │   ├── zero-trust.validator.ts
+    │   │   │   └── encryption.service.ts
+    │   │   └── events/
+    │   │       ├── event-bus.ts
+    │   │       └── audit.service.ts
+    │   ├── modules/
+    │   │   ├── auth/             # Core authentication
+    │   │   │   ├── auth.controller.ts
+    │   │   │   ├── auth.service.ts
+    │   │   │   ├── auth.dto.ts
+    │   │   │   └── strategies/
+    │   │   ├── rbac/             # RBAC system
+    │   │   │   ├── rbac.controller.ts
+    │   │   │   ├── rbac.service.ts
+    │   │   │   ├── permission.service.ts
+    │   │   │   ├── role.service.ts
+    │   │   │   ├── policy.engine.ts
+    │   │   │   └── rbac.dto.ts
+    │   │   ├── social/           # Social authentication
+    │   │   │   ├── social.controller.ts
+    │   │   │   ├── social.service.ts
+    │   │   │   ├── providers/
+    │   │   │   │   ├── google.provider.ts
+    │   │   │   │   ├── facebook.provider.ts
+    │   │   │   │   └── github.provider.ts
+    │   │   │   └── circuit-breaker.ts
+    │   │   ├── oidc/             # OIDC implementation
+    │   │   │   ├── oidc.controller.ts
+    │   │   │   ├── oidc-provider.service.ts
+    │   │   │   ├── oidc-client.service.ts
+    │   │   │   └── multi-tenant.service.ts
+    │   │   ├── token/            # Token management
+    │   │   │   ├── jwt.service.ts
+    │   │   │   ├── cookie.service.ts
+    │   │   │   └── token-rotation.service.ts
+    │   │   ├── session/          # Session management
+    │   │   │   ├── session.service.ts
+    │   │   │   └── distributed-session.ts
+    │   │   ├── mfa/              # Multi-factor auth
+    │   │   │   ├── mfa.controller.ts
+    │   │   │   ├── mfa.service.ts
+    │   │   │   ├── totp.service.ts
+    │   │   │   └── webauthn.service.ts
+    │   │   └── health/
+    │   │       └── health.controller.ts
+    │   ├── middlewares/
+    │   │   ├── auth.middleware.ts
+    │   │   ├── rbac.middleware.ts
+    │   │   ├── rate-limit.middleware.ts
+    │   │   ├── zero-trust.middleware.ts
+    │   │   └── error.middleware.ts
+    │   ├── repositories/
+    │   │   ├── base.repository.ts
+    │   │   ├── user.repository.ts
+    │   │   ├── role.repository.ts
+    │   │   └── session.repository.ts
+    │   ├── routes/
+    │   │   └── index.ts
+    │   └── main.ts
+    └── tests/
+        ├── unit/
+        ├── integration/
+        └── load/
+```
+
+## 5. Database Architecture
+
+### 5.1 Sharding Strategy
+
+```mermaid
+graph TD
+    subgraph shardRouter [Shard Router]
+        router[Vitess/Citus Router]
+    end
+
+    subgraph shard1 [Shard 1: Users 0-33M]
+        master1[PostgreSQL Primary]
+        replica1a[Replica 1A]
+        replica1b[Replica 1B]
+    end
+
+    subgraph shard2 [Shard 2: Users 33M-66M]
+        master2[PostgreSQL Primary]
+        replica2a[Replica 2A]
+        replica2b[Replica 2B]
+    end
+
+    subgraph shard3 [Shard 3: Users 66M-100M+]
+        master3[PostgreSQL Primary]
+        replica3a[Replica 3A]
+        replica3b[Replica 3B]
+    end
+
+    router --> shard1
+    router --> shard2
+    router --> shard3
+```
+
+### 5.2 Schema Design
+
+```prisma
+// Optimized for sharding and performance
+model User {
+  id              String    @id @default(cuid()) // Use CUID for better distribution
+  shardKey        Int       // For sharding (hash of userId % shard_count)
+  email           String
+  username        String?
+  passwordHash    String?
+
+  // Denormalized for performance
+  primaryRole     String?   // Cache primary role
+  permissionCache Json?     // Cache computed permissions
+  lastLoginAt     DateTime?
+  loginCount      Int       @default(0)
+
+  @@unique([shardKey, email])
+  @@index([shardKey, id])
+  @@index([email])
+  @@map("users")
+}
+
+// Separate table for hot data
+model UserSession {
+  id        String   @id
+  userId    String
+  shardKey  Int
+  deviceId  String
+  ipAddress String
+  userAgent String
+  expiresAt DateTime
+
+  @@index([userId, deviceId])
+  @@index([expiresAt])
+  @@map("user_sessions")
+}
+
+// Event sourcing for audit
+model AuthEvent {
+  id        String   @id @default(cuid())
+  userId    String
+  eventType String   // LOGIN, LOGOUT, MFA_ENABLED, etc.
+  eventData Json
+  ipAddress String
+  userAgent String
+  timestamp DateTime @default(now())
+
+  @@index([userId, timestamp])
+  @@index([eventType, timestamp])
+  @@map("auth_events")
+}
+```
+
+## 6. Caching Strategy
+
+### 6.1 Multi-layer Cache
+
+```typescript
+// [services/auth-core/token-service/src/cache/multi-layer-cache.ts]
+
+export class MultiLayerCache {
+  private l1Cache: NodeCache; // In-memory (10MB)
+  private l2Cache: RedisCluster; // Redis (100GB)
+  private l3Cache: CDNCache; // CDN Edge (unlimited)
+
+  async get(key: string): Promise<string | null> {
+    // L1: Local memory (< 1ms)
+    let value: string | null | undefined = this.l1Cache.get<string>(key);
+    if (value) return value;
+
+    // L2: Redis cluster (< 5ms)
+    value = await this.l2Cache.get(key);
+    if (value) {
+      this.l1Cache.set(key, value, 60); // Cache 1 min
+      return value;
+    }
+
+    // L3: CDN edge cache (< 20ms)
+    value = await this.l3Cache.get(key);
+    if (value) {
+      await this.warmUpCache(key, value);
+      return value;
+    }
+
+    return null;
+  }
+}
+```
+
+### 6.2 Cache Invalidation
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant AuthAPI
+    participant Cache
+    participant DB
+    participant EventBus
+
+    User->>AuthAPI: Update Permission
+    AuthAPI->>DB: Write to DB
+    AuthAPI->>EventBus: Publish PERMISSION_CHANGED
+    EventBus->>Cache: Invalidate user cache
+    EventBus->>Cache: Invalidate permission cache
+    Cache-->>AuthAPI: Cache cleared
+    AuthAPI-->>User: Success
+```
+
+## 7. Security Implementation
+
+### 7.1 Zero-Trust Architecture
+
+```typescript
+// [services/auth-core/auth-api/src/security/zero-trust.ts]
+
+export class ZeroTrustValidator {
+  async validateRequest(req: Request): Promise<{ allowed?: boolean; requireMFA?: boolean }> {
+    const checks = await Promise.all([
+      this.validateDevice(req), // Device fingerprinting
+      this.validateLocation(req), // Geo-location check
+      this.validateBehavior(req), // ML-based behavior analysis
+      this.validateToken(req), // Token validation
+      this.validateRateLimit(req), // Rate limiting
+    ]);
+
+    const riskScore = this.calculateRiskScore(checks);
+
+    if (riskScore > 0.8) {
+      // High risk - require MFA
+      return { requireMFA: true };
+    } else if (riskScore > 0.5) {
+      // Medium risk - additional logging
+      await this.auditLog(req, 'MEDIUM_RISK');
+    }
+
+    return { allowed: true };
+  }
+}
+```
+
+### 7.2 Advanced RBAC with ABAC
+
+```typescript
+// [services/auth-rbac/policy-service/src/models/policy.ts]
+
+export class PolicyEngine {
+  async evaluate(context: PolicyContext): Promise<boolean> {
+    // Attribute-based access control
+    const policies = await this.loadPolicies(context.resource);
+
+    for (const policy of policies) {
+      if (!this.evaluateCondition(policy.condition, context)) {
+        return false;
+      }
+
+      // Role-based check, per policy (`policy` is only in scope inside the loop)
+      const hasRole = await this.checkRole(
+        context.userId,
+        policy.requiredRole
+      );
+
+      // Time-based access
+      const inTimeWindow = this.checkTimeWindow(
+        policy.timeRestriction
+      );
+      if (!hasRole || !inTimeWindow) return false;
+    }
+
+    // Permission check with scope
+    const hasPermission = await this.checkPermission(
+      context.userId,
+      context.resource,
+      context.action,
+      context.scope
+    );
+    return hasPermission;
+  }
+}
+```
+
+## 8. Social Login with Circuit Breaker
+
+```typescript
+// [services/auth-social/social-auth-service/src/providers/social-provider.ts]
+// Sketch: SocialProviderClient and SocialProfile are assumed types; logger and getCachedProfile are omitted.
+export class SocialAuthProvider {
+  private circuitBreaker: CircuitBreaker;
+
+  constructor(private readonly provider: SocialProviderClient) {
+    this.circuitBreaker = new CircuitBreaker({
+      timeout: 3000, // 3s timeout
+      errorThreshold: 50, // 50% error rate
+      resetTimeout: 30000, // Reset after 30s
+    });
+  }
+
+  async authenticate(code: string): Promise<SocialProfile> {
+    return this.circuitBreaker.execute(async () => {
+      // Fallback to cached profile if provider is down
+      try {
+        return await this.provider.getProfile(code);
+      } catch (error) {
+        const cached = await this.getCachedProfile(code);
+        if (cached) {
+          this.logger.warn('Using cached profile', { provider: this.provider.name });
+          return cached;
+        }
+        throw error;
+      }
+    });
+  }
+}
+```
+
+## 9. OIDC Provider with Multi-tenancy
+
+```typescript
+// [services/auth-social/oidc-provider/src/provider/multi-tenant.ts]
+
+export class MultiTenantOIDCProvider {
+  async getConfiguration(tenantId: string): Promise<Record<string, unknown>> {
+    const tenant = await this.tenantService.get(tenantId);
+
+    return {
+      issuer: `https://auth.goodgo.com/${tenantId}`,
+      clients: tenant.clients,
+      claims: tenant.customClaims,
+      features: {
+        introspection: { enabled: true },
+        revocation: { enabled: true },
+        deviceFlow: { enabled: tenant.features.deviceFlow },
+        mTLS: {
+          enabled: tenant.security.mtls,
+          certificateAuth: true
+        }
+      },
+      jwks: await this.keyRotation.getCurrentKeys(tenantId),
+      ttl: this.getTTLConfig(tenant.security.level)
+    };
+  }
+}
+```
+
+## 10. Implementation Phases
+
+### Phase 1: Core Refactoring (2 weeks)
+
+- Split auth-service into microservices
+- Implement database sharding
+- Set up Redis Cluster
+- Basic monitoring with Datadog
+
+### Phase 2: Performance Optimization (2 weeks)
+
+- Implement multi-layer caching
+- Optimize database queries
+- Add connection pooling
+- Load testing with K6
+
+### Phase 3: Security Enhancement (2 weeks)
+
+- Zero-trust architecture
+- Advanced RBAC with ABAC
+- MFA with TOTP/WebAuthn
+- Audit logging with ElasticSearch
+
+### Phase 4: High Availability (2 weeks)
+
+- Multi-region deployment
+- Database replication
+- Disaster recovery plan
+- Chaos engineering tests
+
+### Phase 5: Social & OIDC (1 week)
+
+- Social login with circuit breaker
+- Multi-tenant OIDC provider
+- Federation with enterprise IdPs
+- SSO implementation
+
+### Phase 6: Monitoring & Optimization (1 week)
+
+- Complete observability stack
+- Performance tuning
+- Security hardening
+- Documentation & training
+
+## 11. Detailed Implementation Steps
+
+### Step 1: Back Up and Set Up the Base (30 minutes)
+
+```bash
+# Backup current auth-service
+cp -r services/auth-service services/auth-service.backup
+
+# Delete current service
+rm -rf services/auth-service
+
+# Copy template
+cp -r services/_template services/auth-service
+
+# Update package.json name
+```
+
+### Step 2: Prisma Schema Setup (1 hour)
+
+- Create the schema with User, Role, Permission, Session, SocialAccount
+- Set up indexes for performance
+- Add sharding support fields
+
+### Step 3: Core Modules Implementation (2 days)
+
+- **Auth Module**: Login, Register, Logout, RefreshToken
+- **RBAC Module**: Roles, Permissions, Policies
+- **Token Module**: JWT service with rotation
+- **Session Module**: Distributed session management
+
+### Step 4: Advanced Features (3 days)
+
+- **Social Auth**: Google, Facebook, GitHub via Passport.js
+- **OIDC**: Provider and Client implementation
+- **MFA**: TOTP and WebAuthn
+- **Zero-Trust**: Device fingerprinting, geo-location
+
+### Step 5: Performance Optimization (2 days)
+
+- **Multi-layer Cache**: Memory → Redis → CDN
+- **Database Optimization**: Connection pooling, indexes
+- **Rate Limiting**: Dynamic, by role
+- **Load Testing**: K6 tests for 10K req/s
+
+### Step 6: Security & Monitoring (1 day)
+
+- **Audit Logging**: Event sourcing pattern
+- **Monitoring**: Prometheus metrics
+- **Security Headers**: Helmet.js
+- **Testing**: Unit & Integration tests
+
+### Step 7: Deployment (1 day)
+
+- Update docker-compose.yml
+- Configure Traefik routing
+- Set up environment variables
+- Documentation
+
+## 12. Key Technologies
+
+### Core Stack (Aligned with GoodGo)
+
+- **Express.js**: Web framework (kept as in the template)
+- **Prisma**: ORM with PostgreSQL/Neon
+- **Redis**: Caching layer
+- **TypeScript**: Type safety
+- **Zod**: Validation
+
+### Authentication Libraries
+
+- **jsonwebtoken**: JWT handling
+- **passport**: Social auth strategies
+- **bcryptjs**: Password hashing
+- **speakeasy**: TOTP for MFA
+- **@simplewebauthn/server**: WebAuthn
+
+### Security Libraries
+
+- **helmet**: Security headers
+- **express-rate-limit**: Rate limiting
+- **ioredis**: Redis client
+- **node-cache**: In-memory cache
+- **fingerprint.js**: Device fingerprinting
+
+### Monitoring (Already in GoodGo)
+
+- **Prometheus**: Metrics (existing)
+- **Grafana**: Dashboards (existing)
+- **Loki**: Logging (existing)
+- **@goodgo/logger**: Custom logger
+- **@goodgo/tracing**: OpenTelemetry
+
+## 13. Performance Targets (Realistic Starting Point)
+
+### Phase 1: MVP (Current Infrastructure)
+
+- **Authentication**: < 200ms p99
+- **Token Validation**: < 50ms p99
+- **Permission Check**: < 100ms p99
+- **Throughput**: 1,000 req/s
+- **Availability**: 99.9% uptime
+
+### Phase 2: Scale Up (3-6 months)
+
+- **Authentication**: < 100ms p99
+- **Token Validation**: < 20ms p99
+- **Throughput**: 10,000 req/s
+- **Availability**: 99.99% uptime
+
+### Phase 3: Enterprise (1+ year)
+
+- **Authentication**: < 50ms p99
+- **Token Validation**: < 10ms p99
+- **Throughput**: 50,000 req/s
+- **Availability**: 99.999% uptime
+
+## 14. File Structure to Create
+
+```bash
+services/auth-service/
+├── package.json              # Dependencies
+├── tsconfig.json             # TypeScript config
+├── .env.example              # Environment template
+├── Dockerfile                # Docker build
+├── jest.config.ts            # Test configuration
+├── prisma/
+│   └── schema.prisma         # Database schema
+├── src/
+│   ├── main.ts               # Entry point
+│   ├── config/*.ts           # Configurations
+│   ├── core/*                # Core utilities
+│   ├── modules/*             # Feature modules
+│   ├── middlewares/*.ts      # Express middlewares
+│   ├── repositories/*.ts     # Data access
+│   └── routes/index.ts       # Route definitions
+└── tests/
+    ├── unit/*                # Unit tests
+    └── integration/*         # Integration tests
+```
+
+## 15. Realistic Timeline
+
+### Week 1: Foundation
+
+- **Day 1-2**: Set up base structure, Prisma schema
+- **Day 3-4**: Core auth module (login, register, JWT)
+- **Day 5**: Basic RBAC (roles, permissions)
+
+### Week 2: Advanced Features
+
+- **Day 1-2**: Social authentication
+- **Day 3-4**: OIDC implementation
+- **Day 5**: MFA and security features
+
+### Week 3: Optimization & Deployment
+
+- **Day 1-2**: Performance optimization, caching
+- **Day 3**: Testing and bug fixes
+- **Day 4-5**: Documentation and deployment
+
+**Total: 3 weeks for a production-ready MVP**
\ No newline at end of file
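
Review note on section 5.2: the schema comment describes `shardKey` as "hash of userId % shard_count" but the plan does not say which hash to use. The sketch below shows one way this could be computed, assuming a 32-bit FNV-1a hash; the hash choice and the names `fnv1a32` and `shardKeyFor` are illustrative assumptions, not part of the plan.

```typescript
// Hypothetical helpers, not part of the plan: derive the `shardKey` column
// value from a user ID. FNV-1a is an assumption; the plan only says "hash".

/** 32-bit FNV-1a over the UTF-16 code units of `input` (adequate for ASCII IDs such as CUIDs). */
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime and keep the result as an unsigned 32-bit value.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

/** Maps a user ID to a shard index in the range [0, shardCount). */
function shardKeyFor(userId: string, shardCount: number): number {
  if (!Number.isInteger(shardCount) || shardCount <= 0) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return fnv1a32(userId) % shardCount;
}
```

With the three shards drawn in section 5.1, `shardKeyFor(user.id, 3)` would give the value to store in `shardKey`. A plain modulo remaps most users whenever the shard count changes, so a fixed count or a consistent-hashing scheme is probably worth deciding up front. The section 5.1 diagram labels shards by user ranges while the schema comment describes hashing, so one of the two may need to be reconciled.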