Refactor service template and improve documentation for better usability

- Updated service template with new features and improved structure.
- Enhanced bilingual documentation across various files for clarity.
- Refined configuration files for better guidance and usability.
- Improved error handling and logging mechanisms for enhanced debugging.
- Added new dependencies and updated existing ones for better performance.
Ho Ngoc Hai
2025-12-29 11:14:00 +07:00
parent ab44954bd6
commit 7a4bda8da7


@@ -0,0 +1,771 @@
---
name: Enterprise Auth Service
overview: Refactor the entire auth-service into an enterprise-grade system serving 100+ million users, with an architecture that fits the GoodGo Platform, using the existing template and monorepo structure.
todos:
  - id: backup-current
    content: Back up the current auth-service before deleting it
    status: pending
  - id: delete-auth-service
    content: Delete the current auth-service entirely
    status: pending
  - id: copy-template
    content: Copy the service template and rename it to auth-service
    status: pending
  - id: setup-base-structure
    content: Set up the base structure with the basic modules
    status: pending
  - id: implement-advanced-schema
    content: Create the Prisma schema for RBAC, Social Login, Sessions
    status: pending
  - id: implement-multi-layer-cache
    content: Implement a multi-layer cache with Redis and in-memory storage
    status: pending
  - id: implement-rbac-module
    content: Create the RBAC module with permissions, roles, policies
    status: pending
  - id: implement-social-auth
    content: Create the social auth module (Google, Facebook, GitHub)
    status: pending
  - id: implement-oidc-module
    content: Implement OIDC provider and client
    status: pending
  - id: implement-jwt-service
    content: Create the JWT service with access/refresh/id tokens
    status: pending
  - id: implement-cookie-service
    content: Implement secure cookie management
    status: pending
  - id: implement-mfa-module
    content: Create the MFA module with TOTP and WebAuthn
    status: pending
  - id: implement-zero-trust
    content: Implement zero-trust security middleware
    status: pending
  - id: setup-event-sourcing
    content: Set up event sourcing for audit logs
    status: pending
  - id: implement-rate-limiting
    content: Dynamic rate limiting by role
    status: pending
  - id: setup-monitoring
    content: Set up monitoring with Prometheus and Grafana
    status: pending
  - id: write-tests
    content: Write unit and integration tests
    status: pending
  - id: update-docker-compose
    content: Update docker-compose.yml with the new services
    status: pending
  - id: write-documentation
    content: Write documentation and API specs
    status: pending
---
# Enterprise Auth Service - Implementation Plan
## 1. Challenges and Solutions at Large Scale
### Performance Challenges
- **100M+ users** ≈ 10-50K requests/second at peak
- **Token validation** must take < 10ms (in-memory cache)
- **Login/Register** must take < 100ms (including the DB write)
- **Social login** must handle timeouts from providers
### Security Challenges
- **Distributed attacks** (DDoS, brute force from multiple regions)
- **Token hijacking** and replay attacks
- **Data breaches** with 100M+ records
- **Compliance** (GDPR, SOC2, ISO 27001)
### Availability Challenges
- **99.999% uptime** = at most 5.26 minutes of downtime per year
- **Multi-region failover** < 30 seconds
- **Zero-downtime deployments**
- **Database replication lag** < 100ms
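The 5.26-minute figure follows from simple arithmetic on the availability percentage. A minimal sketch, assuming an average year of 365.25 days; the function name `downtimeBudgetMinutes` is illustrative and not part of the plan:

```typescript
// Minutes in an average year (365.25 days), the convention behind the 5.26-minute figure.
const MINUTES_PER_YEAR = 365.25 * 24 * 60;

/** Returns the yearly downtime budget, in minutes, for an availability percentage. */
function downtimeBudgetMinutes(availabilityPercent: number): number {
  if (availabilityPercent < 0 || availabilityPercent > 100) {
    throw new RangeError("availabilityPercent must be between 0 and 100");
  }
  return MINUTES_PER_YEAR * (1 - availabilityPercent / 100);
}

// Print the budget for the availability targets used in this plan.
for (const target of [99.9, 99.99, 99.999]) {
  console.log(`${target}% -> ${downtimeBudgetMinutes(target).toFixed(2)} min/year`);
}
```

Each additional nine cuts the budget by a factor of ten, which is why the five-nines target is deferred to the later phases.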
## 2. Architecture Overview
```mermaid
graph TB
subgraph clients [Client Layer]
web[Web Apps]
mobile[Mobile Apps]
api[API Clients]
end
subgraph cdn [CDN Layer]
cloudflare[Cloudflare]
fastly[Fastly Cache]
end
subgraph gateway [API Gateway Layer]
kong1[Kong Gateway Region 1]
kong2[Kong Gateway Region 2]
kong3[Kong Gateway Region 3]
end
subgraph auth [Auth Service Cluster]
subgraph region1 [US Region]
auth1[Auth Service Pods]
cache1[Redis Cluster]
db1[PostgreSQL Primary]
end
subgraph region2 [EU Region]
auth2[Auth Service Pods]
cache2[Redis Cluster]
db2[PostgreSQL Replica]
end
subgraph region3 [APAC Region]
auth3[Auth Service Pods]
cache3[Redis Cluster]
db3[PostgreSQL Replica]
end
end
subgraph external [External Services]
oidc[OIDC Providers]
social[Social Providers]
sms[SMS Gateway]
email[Email Service]
end
subgraph monitoring [Observability]
datadog[Datadog APM]
elastic[ELK Stack]
pagerduty[PagerDuty]
end
clients --> cdn
cdn --> gateway
gateway --> auth
auth --> external
auth --> monitoring
```
## 3. Microservices Architecture
```mermaid
graph LR
subgraph authCore [Auth Core Services]
authAPI[Auth API Service]
tokenService[Token Service]
sessionService[Session Service]
mfaService[MFA Service]
end
subgraph rbacServices [RBAC Services]
permissionService[Permission Service]
roleService[Role Service]
policyService[Policy Service]
end
subgraph socialServices [Social Services]
socialAuth[Social Auth Service]
oidcProvider[OIDC Provider]
oidcClient[OIDC Client]
end
subgraph supportServices [Support Services]
auditService[Audit Service]
notificationService[Notification Service]
analyticsService[Analytics Service]
end
subgraph dataLayer [Data Layer]
postgresCluster[PostgreSQL Cluster]
redisCluster[Redis Cluster]
elasticSearch[ElasticSearch]
kafka[Kafka Event Bus]
end
authAPI --> tokenService
authAPI --> sessionService
authAPI --> mfaService
authAPI --> rbacServices
authAPI --> socialServices
authCore --> kafka
rbacServices --> kafka
socialServices --> kafka
kafka --> supportServices
authCore --> dataLayer
rbacServices --> dataLayer
socialServices --> dataLayer
supportServices --> dataLayer
```
## 4. Project Structure in the GoodGo Monorepo
```
services/
└── auth-service/  # Single service with modules
    ├── Dockerfile
    ├── package.json
    ├── prisma/
    │   ├── schema.prisma  # Advanced schema with RBAC
    │   └── migrations/
    ├── src/
    │   ├── config/
    │   │   ├── app.config.ts
    │   │   ├── database.config.ts
    │   │   ├── redis.config.ts
    │   │   ├── jwt.config.ts
    │   │   └── social.config.ts
    │   ├── core/  # Core utilities
    │   │   ├── cache/
    │   │   │   ├── multi-layer-cache.ts
    │   │   │   └── cache.service.ts
    │   │   ├── security/
    │   │   │   ├── zero-trust.validator.ts
    │   │   │   └── encryption.service.ts
    │   │   └── events/
    │   │       ├── event-bus.ts
    │   │       └── audit.service.ts
    │   ├── modules/
    │   │   ├── auth/  # Core authentication
    │   │   │   ├── auth.controller.ts
    │   │   │   ├── auth.service.ts
    │   │   │   ├── auth.dto.ts
    │   │   │   └── strategies/
    │   │   ├── rbac/  # RBAC system
    │   │   │   ├── rbac.controller.ts
    │   │   │   ├── rbac.service.ts
    │   │   │   ├── permission.service.ts
    │   │   │   ├── role.service.ts
    │   │   │   ├── policy.engine.ts
    │   │   │   └── rbac.dto.ts
    │   │   ├── social/  # Social authentication
    │   │   │   ├── social.controller.ts
    │   │   │   ├── social.service.ts
    │   │   │   ├── providers/
    │   │   │   │   ├── google.provider.ts
    │   │   │   │   ├── facebook.provider.ts
    │   │   │   │   └── github.provider.ts
    │   │   │   └── circuit-breaker.ts
    │   │   ├── oidc/  # OIDC implementation
    │   │   │   ├── oidc.controller.ts
    │   │   │   ├── oidc-provider.service.ts
    │   │   │   ├── oidc-client.service.ts
    │   │   │   └── multi-tenant.service.ts
    │   │   ├── token/  # Token management
    │   │   │   ├── jwt.service.ts
    │   │   │   ├── cookie.service.ts
    │   │   │   └── token-rotation.service.ts
    │   │   ├── session/  # Session management
    │   │   │   ├── session.service.ts
    │   │   │   └── distributed-session.ts
    │   │   ├── mfa/  # Multi-factor auth
    │   │   │   ├── mfa.controller.ts
    │   │   │   ├── mfa.service.ts
    │   │   │   ├── totp.service.ts
    │   │   │   └── webauthn.service.ts
    │   │   └── health/
    │   │       └── health.controller.ts
    │   ├── middlewares/
    │   │   ├── auth.middleware.ts
    │   │   ├── rbac.middleware.ts
    │   │   ├── rate-limit.middleware.ts
    │   │   ├── zero-trust.middleware.ts
    │   │   └── error.middleware.ts
    │   ├── repositories/
    │   │   ├── base.repository.ts
    │   │   ├── user.repository.ts
    │   │   ├── role.repository.ts
    │   │   └── session.repository.ts
    │   ├── routes/
    │   │   └── index.ts
    │   └── main.ts
    └── tests/
        ├── unit/
        ├── integration/
        └── load/
```
## 5. Database Architecture
### 5.1 Sharding Strategy
```mermaid
graph TD
subgraph shardRouter [Shard Router]
router[Vitess/Citus Router]
end
subgraph shard1 [Shard 1: Users 0-33M]
master1[PostgreSQL Primary]
replica1a[Replica 1A]
replica1b[Replica 1B]
end
subgraph shard2 [Shard 2: Users 33M-66M]
master2[PostgreSQL Primary]
replica2a[Replica 2A]
replica2b[Replica 2B]
end
subgraph shard3 [Shard 3: Users 66M-100M+]
master3[PostgreSQL Primary]
replica3a[Replica 3A]
replica3b[Replica 3B]
end
router --> shard1
router --> shard2
router --> shard3
```
### 5.2 Schema Design
```prisma
// Optimized for sharding and performance
model User {
id String @id @default(cuid()) // Use CUID for better distribution
shardKey Int // For sharding (hash of userId % shard_count)
email String
username String?
passwordHash String?
// Denormalized for performance
primaryRole String? // Cache primary role
permissionCache Json? // Cache computed permissions
lastLoginAt DateTime?
loginCount Int @default(0)
@@unique([shardKey, email])
@@index([shardKey, id])
@@index([email])
@@map("users")
}
// Separate table for hot data
model UserSession {
id String @id
userId String
shardKey Int
deviceId String
ipAddress String
userAgent String
expiresAt DateTime
@@index([userId, deviceId])
@@index([expiresAt])
@@map("user_sessions")
}
// Event sourcing for audit
model AuthEvent {
id String @id @default(cuid())
userId String
eventType String // LOGIN, LOGOUT, MFA_ENABLED, etc.
eventData Json
ipAddress String
userAgent String
timestamp DateTime @default(now())
@@index([userId, timestamp])
@@index([eventType, timestamp])
@@map("auth_events")
}
```
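The `shardKey` comment in the schema describes a hash of the user id modulo the shard count. A minimal sketch of that derivation, assuming SHA-256 as the hash and a shard count of 3 (taken from the sharding diagram); `computeShardKey` and `SHARD_COUNT` are hypothetical names:

```typescript
import { createHash } from "node:crypto";

// Hypothetical shard count; the sharding diagram shows three shards.
const SHARD_COUNT = 3;

/** Derives a stable shard index: first 4 bytes of SHA-256(userId), modulo the shard count. */
function computeShardKey(userId: string, shardCount: number = SHARD_COUNT): number {
  if (!Number.isInteger(shardCount) || shardCount <= 0) {
    throw new RangeError("shardCount must be a positive integer");
  }
  const digest = createHash("sha256").update(userId, "utf8").digest();
  return digest.readUInt32BE(0) % shardCount;
}

console.log(computeShardKey("user-123")); // always the same index for the same id
```

A plain modulo remaps most users whenever the shard count changes, so a production design would more likely rely on the router's own placement scheme or on consistent hashing.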
## 6. Caching Strategy
### 6.1 Multi-layer Cache
```typescript
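// [services/auth-core/token-service/src/cache/multi-layer-cache.ts]
import NodeCache from 'node-cache';

// Assumed minimal contract for the remote layers (Redis Cluster, CDN edge).
interface RemoteCache {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown, ttlSeconds: number): Promise<void>;
}

export class MultiLayerCache {
  constructor(
    private l1Cache: NodeCache,   // In-memory (10MB)
    private l2Cache: RemoteCache, // Redis (100GB)
    private l3Cache: RemoteCache, // CDN Edge (unlimited)
  ) {}

  async get(key: string): Promise<unknown> {
    // L1: Local memory (< 1ms)
    const local = this.l1Cache.get(key);
    if (local !== undefined) return local; // falsy values such as 0 or '' are valid hits

    // L2: Redis cluster (< 5ms)
    let value = await this.l2Cache.get(key);
    if (value != null) {
      this.l1Cache.set(key, value, 60); // Cache 1 min
      return value;
    }

    // L3: CDN edge cache (< 20ms)
    value = await this.l3Cache.get(key);
    if (value != null) {
      await this.warmUpCache(key, value);
      return value;
    }
    return null;
  }

  // Backfill the faster layers after an L3 hit (TTLs are assumptions).
  private async warmUpCache(key: string, value: unknown): Promise<void> {
    await this.l2Cache.set(key, value, 300);
    this.l1Cache.set(key, value, 60);
  }
}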
```
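The read-through behaviour can be exercised without Redis or a CDN. The following is a reduced two-layer sketch, not the planned implementation: `TwoLayerCache`, `RemoteStore`, and `FakeRemoteStore` are hypothetical names, and the injected clock exists only to make expiry testable.

```typescript
/** Minimal async key-value contract; a Redis client wrapper could implement it. */
interface RemoteStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

interface L1Entry {
  value: string;
  expiresAt: number; // epoch milliseconds
}

class TwoLayerCache {
  private readonly l1 = new Map<string, L1Entry>();
  private readonly l2: RemoteStore;
  private readonly l1TtlMs: number;
  private readonly now: () => number;

  constructor(l2: RemoteStore, l1TtlMs: number = 60_000, now: () => number = Date.now) {
    this.l2 = l2;
    this.l1TtlMs = l1TtlMs;
    this.now = now;
  }

  async get(key: string): Promise<string | null> {
    const hit = this.l1.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    this.l1.delete(key); // drop an expired entry, if any

    const remote = await this.l2.get(key);
    if (remote !== null) {
      // Populate L1 so the next read skips the network round trip.
      this.l1.set(key, { value: remote, expiresAt: this.now() + this.l1TtlMs });
    }
    return remote;
  }

  /** Removes the local copy; call this when an invalidation event arrives. */
  invalidate(key: string): void {
    this.l1.delete(key);
  }
}

/** In-memory stand-in for Redis so the sketch runs without a server. */
class FakeRemoteStore implements RemoteStore {
  readonly store = new Map<string, string>();
  reads = 0;

  async get(key: string): Promise<string | null> {
    this.reads += 1;
    return this.store.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}
```

Keeping the L1 TTL short bounds how long a pod can serve a stale token or permission entry if an invalidation message is lost.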
### 6.2 Cache Invalidation
```mermaid
sequenceDiagram
participant User
participant AuthAPI
participant Cache
participant DB
participant EventBus
User->>AuthAPI: Update Permission
AuthAPI->>DB: Write to DB
AuthAPI->>EventBus: Publish PERMISSION_CHANGED
EventBus->>Cache: Invalidate user cache
EventBus->>Cache: Invalidate permission cache
Cache-->>AuthAPI: Cache cleared
AuthAPI-->>User: Success
```
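The sequence above can be sketched in-process with Node's `EventEmitter` standing in for the event bus. This is an illustration only: `PermissionCache` and `updatePermissions` are hypothetical names, a `Map` stands in for the database, and invalidating across pods would need a real pub/sub channel such as Redis or Kafka.

```typescript
import { EventEmitter } from "node:events";

const PERMISSION_CHANGED = "PERMISSION_CHANGED";

interface PermissionChangedEvent {
  userId: string;
}

/** Per-process permission cache that drops an entry when a change event arrives. */
class PermissionCache {
  private readonly entries = new Map<string, string[]>();

  constructor(bus: EventEmitter) {
    bus.on(PERMISSION_CHANGED, (event: PermissionChangedEvent) => {
      this.entries.delete(event.userId);
    });
  }

  get(userId: string): string[] | undefined {
    return this.entries.get(userId);
  }

  set(userId: string, permissions: string[]): void {
    this.entries.set(userId, permissions);
  }
}

/** Writes first, then publishes, so a subscriber that re-reads sees the new data. */
function updatePermissions(
  db: Map<string, string[]>, // stand-in for the database
  bus: EventEmitter,
  userId: string,
  permissions: string[],
): void {
  db.set(userId, permissions);
  const event: PermissionChangedEvent = { userId };
  bus.emit(PERMISSION_CHANGED, event);
}

const bus = new EventEmitter();
const db = new Map<string, string[]>();
const cache = new PermissionCache(bus);
cache.set("user-1", ["read"]);
updatePermissions(db, bus, "user-1", ["read", "write"]);
console.log(cache.get("user-1")); // undefined: the next read must go back to the database
```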
## 7. Security Implementation
### 7.1 Zero-Trust Architecture
```typescript
// [services/auth-core/auth-api/src/security/zero-trust.ts]
export class ZeroTrustValidator {
  async validateRequest(req: Request): Promise<ValidationResult> {
    const checks = await Promise.all([
      this.validateDevice(req),    // Device fingerprinting
      this.validateLocation(req),  // Geo-location check
      this.validateBehavior(req),  // ML-based behavior analysis
      this.validateToken(req),     // Token validation
      this.validateRateLimit(req), // Rate limiting
    ]);
    const riskScore = this.calculateRiskScore(checks);
    if (riskScore > 0.8) {
      // High risk - require MFA
      return { requireMFA: true };
    } else if (riskScore > 0.5) {
      // Medium risk - additional logging
      await this.auditLog(req, 'MEDIUM_RISK');
    }
    return { allowed: true };
  }
}
```
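The plan does not define `calculateRiskScore`. One plausible reading is a weighted average of per-check risk values, mapped onto the 0.5 and 0.8 thresholds used above. The sketch below assumes that reading; the `CheckResult` shape, the weights, and the `decide` helper are all hypothetical.

```typescript
interface CheckResult {
  name: string;
  risk: number;   // 0 = no risk, 1 = maximum risk
  weight: number; // relative importance of the check
}

type Decision = "ALLOW" | "ALLOW_AND_AUDIT" | "REQUIRE_MFA";

function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

/** Weighted average of the individual risks; fails closed when there is nothing to weigh. */
function calculateRiskScore(checks: CheckResult[]): number {
  const totalWeight = checks.reduce((sum, check) => sum + check.weight, 0);
  if (totalWeight <= 0) return 1;
  const weighted = checks.reduce((sum, check) => sum + clamp01(check.risk) * check.weight, 0);
  return weighted / totalWeight;
}

/** Maps a score onto the thresholds from the validator: > 0.8 requires MFA, > 0.5 is audited. */
function decide(score: number): Decision {
  if (score > 0.8) return "REQUIRE_MFA";
  if (score > 0.5) return "ALLOW_AND_AUDIT";
  return "ALLOW";
}

const score = calculateRiskScore([
  { name: "device", risk: 0.9, weight: 2 },
  { name: "location", risk: 0.3, weight: 1 },
]);
console.log(score.toFixed(2), decide(score)); // 0.70 ALLOW_AND_AUDIT
```

Returning the maximum score for an empty check list is a deliberate fail-closed choice: a request that could not be evaluated is treated as high risk.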
### 7.2 Advanced RBAC with ABAC
```typescript
// [services/auth-rbac/policy-service/src/models/policy.ts]
export class PolicyEngine {
  async evaluate(context: PolicyContext): Promise<boolean> {
    const policies = await this.loadPolicies(context.resource);

    // Every policy attached to the resource must pass.
    for (const policy of policies) {
      // Attribute-based access control
      if (!this.evaluateCondition(policy.condition, context)) {
        return false;
      }
      // Role-based check
      if (!(await this.checkRole(context.userId, policy.requiredRole))) {
        return false;
      }
      // Time-based access
      if (!this.checkTimeWindow(policy.timeRestriction)) {
        return false;
      }
    }

    // Permission check with scope
    return this.checkPermission(
      context.userId,
      context.resource,
      context.action,
      context.scope
    );
  }
}
```
## 8. Social Login with Circuit Breaker
```typescript
// [services/auth-social/social-auth-service/src/providers/social-provider.ts]
export class SocialAuthProvider {
  private circuitBreaker: CircuitBreaker;

  constructor(
    private readonly providerName: string,
    private readonly provider: SocialProviderClient, // assumed client exposing getProfile()
    private readonly logger: Logger,                 // assumed logger, e.g. @goodgo/logger
  ) {
    this.circuitBreaker = new CircuitBreaker({
      timeout: 3000,       // 3s timeout
      errorThreshold: 50,  // 50% error rate
      resetTimeout: 30000, // Reset after 30s
    });
  }

  async authenticate(code: string): Promise<SocialUser> {
    try {
      // Only the provider call runs inside the breaker, so every failure is counted.
      return await this.circuitBreaker.execute(() => this.provider.getProfile(code));
    } catch (error) {
      // Fallback to cached profile if provider is down (or the circuit is open)
      const cached = await this.getCachedProfile(code);
      if (cached) {
        this.logger.warn('Using cached profile', { provider: this.providerName });
        return cached;
      }
      throw error;
    }
  }
}
```
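The `CircuitBreaker` class used above is not defined in the plan. As an illustration of the pattern, here is a minimal breaker that trips on consecutive failures rather than on the 50% error rate configured above, which is a deliberate simplification. `SimpleCircuitBreaker` and its parameters are hypothetical, and the injected clock only serves testing.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class SimpleCircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number = 5, resetTimeoutMs: number = 30_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  /** Returns true if a call may go through; moves OPEN to HALF_OPEN once the reset timeout has passed. */
  canRequest(): boolean {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "HALF_OPEN"; // let trial calls through
    }
    return this.state !== "OPEN";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "CLOSED";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call, or too many consecutive failures, opens the circuit.
    if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
    }
  }

  /** Runs the action through the breaker, rejecting immediately while the circuit is open. */
  async execute<T>(action: () => Promise<T>): Promise<T> {
    if (!this.canRequest()) {
      throw new Error("Circuit open: provider calls are suspended");
    }
    try {
      const result = await action();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }
}
```

A maintained library such as `opossum` covers timeouts and rate-based thresholds; this sketch only shows the state machine.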
## 9. OIDC Provider with Multi-tenancy
```typescript
// [services/auth-social/oidc-provider/src/provider/multi-tenant.ts]
export class MultiTenantOIDCProvider {
  async getConfiguration(tenantId: string): Promise<Configuration> {
    const tenant = await this.tenantService.get(tenantId);
    return {
      issuer: `https://auth.goodgo.com/${tenantId}`,
      clients: tenant.clients,
      claims: tenant.customClaims,
      features: {
        introspection: { enabled: true },
        revocation: { enabled: true },
        deviceFlow: { enabled: tenant.features.deviceFlow },
        mTLS: {
          enabled: tenant.security.mtls,
          certificateAuth: true
        }
      },
      jwks: await this.keyRotation.getCurrentKeys(tenantId),
      ttl: this.getTTLConfig(tenant.security.level)
    };
  }
}
```
## 10. Implementation Phases
### Phase 1: Core Refactoring (2 weeks)
- Split auth-service into microservices
- Implement database sharding
- Set up Redis Cluster
- Basic monitoring with Datadog
### Phase 2: Performance Optimization (2 weeks)
- Implement multi-layer caching
- Optimize database queries
- Add connection pooling
- Load testing with K6
### Phase 3: Security Enhancement (2 weeks)
- Zero-trust architecture
- Advanced RBAC with ABAC
- MFA with TOTP/WebAuthn
- Audit logging with ElasticSearch
### Phase 4: High Availability (2 weeks)
- Multi-region deployment
- Database replication
- Disaster recovery plan
- Chaos engineering tests
### Phase 5: Social & OIDC (1 week)
- Social login with circuit breaker
- Multi-tenant OIDC provider
- Federation with enterprise IdPs
- SSO implementation
### Phase 6: Monitoring & Optimization (1 week)
- Complete observability stack
- Performance tuning
- Security hardening
- Documentation & training
## 11. Detailed Implementation Steps
### Step 1: Backup and Base Setup (30 minutes)
```bash
# Backup current auth-service
cp -r services/auth-service services/auth-service.backup
# Delete current service
rm -rf services/auth-service
# Copy template
cp -r services/_template services/auth-service
# Update package.json name
```
### Step 2: Prisma Schema Setup (1 hour)
- Create the schema with User, Role, Permission, Session, SocialAccount
- Set up indexes for performance
- Add sharding support fields
### Step 3: Core Modules Implementation (2 days)
- **Auth Module**: Login, Register, Logout, RefreshToken
- **RBAC Module**: Roles, Permissions, Policies
- **Token Module**: JWT service with rotation
- **Session Module**: Distributed session management
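Token rotation in the Token Module usually means each refresh token is single-use, and replaying an old one revokes the whole token family. The plan does not specify the scheme, so the sketch below is an assumption: opaque random tokens, an in-memory store keyed by the token's SHA-256 hash, and hypothetical names (`RefreshTokenRotator`, `issue`, `rotate`).

```typescript
import { createHash, randomBytes } from "node:crypto";

interface RefreshRecord {
  userId: string;
  familyId: string; // all tokens descended from one login share a family
  used: boolean;
}

function sha256Hex(value: string): string {
  return createHash("sha256").update(value).digest("hex");
}

class RefreshTokenRotator {
  // Keyed by the token hash so raw tokens are never stored.
  private readonly records = new Map<string, RefreshRecord>();
  private readonly revokedFamilies = new Set<string>();

  /** Issues a new opaque refresh token, starting a new family unless one is given. */
  issue(userId: string, familyId: string = randomBytes(16).toString("hex")): string {
    const token = randomBytes(32).toString("hex");
    this.records.set(sha256Hex(token), { userId, familyId, used: false });
    return token;
  }

  /** Exchanges a token for its successor; returns null if it is unknown, revoked, or replayed. */
  rotate(token: string): string | null {
    const record = this.records.get(sha256Hex(token));
    if (!record || this.revokedFamilies.has(record.familyId)) return null;
    if (record.used) {
      // Reuse of a spent token suggests theft: revoke every token in the family.
      this.revokedFamilies.add(record.familyId);
      return null;
    }
    record.used = true;
    return this.issue(record.userId, record.familyId);
  }
}

const rotator = new RefreshTokenRotator();
const first = rotator.issue("user-1");
const second = rotator.rotate(first);
console.log(second !== null);               // true: normal rotation
console.log(rotator.rotate(first) === null); // true: replay is rejected
```

In the real service the records would live in Redis or PostgreSQL with an expiry, so that every pod sees the same revocation state.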
### Step 4: Advanced Features (3 days)
- **Social Auth**: Google, Facebook, GitHub with Passport.js
- **OIDC**: Provider and Client implementation
- **MFA**: TOTP and WebAuthn
- **Zero-Trust**: Device fingerprinting, geo-location
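The plan lists `speakeasy` for TOTP. To show what such a library computes, here is a minimal sketch of the algorithm itself (RFC 4226 HOTP and RFC 6238 TOTP with HMAC-SHA1) using only `node:crypto`. It is an illustration rather than a replacement for a maintained library, and the function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/** HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation. */
function hotp(secret: Buffer, counter: number, digits: number = 6): string {
  if (!Number.isSafeInteger(counter) || counter < 0) {
    throw new RangeError("counter must be a non-negative integer");
  }
  const counterBytes = Buffer.alloc(8);
  counterBytes.writeUInt32BE(Math.floor(counter / 0x100000000), 0); // high 32 bits
  counterBytes.writeUInt32BE(counter >>> 0, 4);                     // low 32 bits
  const hmac = createHmac("sha1", secret).update(counterBytes).digest();
  const offset = hmac.readUInt8(hmac.length - 1) & 0x0f;  // low nibble of the last byte
  const binary = hmac.readUInt32BE(offset) & 0x7fffffff;  // 31-bit value at that offset
  return (binary % Math.pow(10, digits)).toString().padStart(digits, "0");
}

/** TOTP (RFC 6238): HOTP with the counter set to the number of elapsed time steps. */
function totp(secret: Buffer, timestampMs: number = Date.now(), stepSeconds: number = 30, digits: number = 6): string {
  const counter = Math.floor(timestampMs / 1000 / stepSeconds);
  return hotp(secret, counter, digits);
}

/** Accepts 6-digit codes from the current step and `window` 30-second steps on either side. */
function verifyTotp(secret: Buffer, token: string, timestampMs: number = Date.now(), window: number = 1): boolean {
  const stepMs = 30_000;
  for (let drift = -window; drift <= window; drift++) {
    const at = timestampMs + drift * stepMs;
    if (at < 0) continue;
    const expected = Buffer.from(totp(secret, at));
    const actual = Buffer.from(token);
    if (expected.length === actual.length && timingSafeEqual(expected, actual)) {
      return true;
    }
  }
  return false;
}

// RFC 6238 test vector: ASCII secret "12345678901234567890", T = 59 s, 8 digits.
const rfcSecret = Buffer.from("12345678901234567890", "ascii");
console.log(totp(rfcSecret, 59_000, 30, 8)); // 94287082
```

The drift window trades a little security for tolerance of clock skew between the server and the authenticator app.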
### Step 5: Performance Optimization (2 days)
- **Multi-layer Cache**: Memory → Redis → CDN
- **Database Optimization**: Connection pooling, indexes
- **Rate Limiting**: Dynamic, based on role
- **Load Testing**: K6 tests for 10K req/s
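Role-based rate limiting can be illustrated with a fixed-window counter whose limit depends on the caller's role. The roles and per-minute limits below are placeholders, not values from the plan, and `RoleRateLimiter` is a hypothetical name. With several pods, the counters would have to move to Redis; the plan's `express-rate-limit` supports pluggable stores for that.

```typescript
type Role = "anonymous" | "user" | "admin";

// Hypothetical per-minute limits; tune them from real traffic data.
const LIMITS_PER_MINUTE: Record<Role, number> = {
  anonymous: 20,
  user: 100,
  admin: 1000,
};

interface WindowEntry {
  windowStart: number; // epoch milliseconds
  count: number;
}

class RoleRateLimiter {
  private readonly windows = new Map<string, WindowEntry>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  /** Returns true if the request fits within the role's limit for the current one-minute window. */
  allow(clientId: string, role: Role): boolean {
    const windowMs = 60_000;
    const current = this.now();
    const entry = this.windows.get(clientId);

    // Start a fresh window on the first request or once the previous window has elapsed.
    if (!entry || current - entry.windowStart >= windowMs) {
      this.windows.set(clientId, { windowStart: current, count: 1 });
      return true;
    }
    if (entry.count >= LIMITS_PER_MINUTE[role]) return false;
    entry.count += 1;
    return true;
  }
}
```

In an Express middleware, `clientId` would typically be the user id for authenticated requests and the client IP otherwise, with a rejected request answered by HTTP 429.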
### Step 6: Security & Monitoring (1 day)
- **Audit Logging**: Event sourcing pattern
- **Monitoring**: Prometheus metrics
- **Security Headers**: Helmet.js
- **Testing**: Unit & Integration tests
### Step 7: Deployment (1 day)
- Update docker-compose.yml
- Configure Traefik routing
- Setup environment variables
- Documentation
## 12. Key Technologies
### Core Stack (Matches GoodGo)
- **Express.js**: Web framework (kept as in the template)
- **Prisma**: ORM with PostgreSQL/Neon
- **Redis**: Caching layer
- **TypeScript**: Type safety
- **Zod**: Validation
### Authentication Libraries
- **jsonwebtoken**: JWT handling
- **passport**: Social auth strategies
- **bcryptjs**: Password hashing
- **speakeasy**: TOTP for MFA
- **@simplewebauthn/server**: WebAuthn
### Security Libraries
- **helmet**: Security headers
- **express-rate-limit**: Rate limiting
- **ioredis**: Redis client
- **node-cache**: In-memory cache
- **fingerprint.js**: Device fingerprinting
### Monitoring (Already in GoodGo)
- **Prometheus**: Metrics (existing)
- **Grafana**: Dashboards (existing)
- **Loki**: Logging (existing)
- **@goodgo/logger**: Custom logger
- **@goodgo/tracing**: OpenTelemetry
## 13. Performance Targets (Realistic Starting Points)
### Phase 1: MVP (Current Infrastructure)
- **Authentication**: < 200ms p99
- **Token Validation**: < 50ms p99
- **Permission Check**: < 100ms p99
- **Throughput**: 1,000 req/s
- **Availability**: 99.9% uptime
### Phase 2: Scale Up (3-6 months)
- **Authentication**: < 100ms p99
- **Token Validation**: < 20ms p99
- **Throughput**: 10,000 req/s
- **Availability**: 99.99% uptime
### Phase 3: Enterprise (1+ year)
- **Authentication**: < 50ms p99
- **Token Validation**: < 10ms p99
- **Throughput**: 50,000 req/s
- **Availability**: 99.999% uptime
## 14. File Structure to Create
```bash
services/auth-service/
├── package.json           # Dependencies
├── tsconfig.json          # TypeScript config
├── .env.example           # Environment template
├── Dockerfile             # Docker build
├── jest.config.ts         # Test configuration
├── prisma/
│   └── schema.prisma      # Database schema
├── src/
│   ├── main.ts            # Entry point
│   ├── config/*.ts        # Configurations
│   ├── core/*             # Core utilities
│   ├── modules/*          # Feature modules
│   ├── middlewares/*.ts   # Express middlewares
│   ├── repositories/*.ts  # Data access
│   └── routes/index.ts    # Route definitions
└── tests/
    ├── unit/*             # Unit tests
    └── integration/*      # Integration tests
```
## 15. Realistic Timeline
### Week 1: Foundation
- **Day 1-2**: Setup base structure, Prisma schema
- **Day 3-4**: Core auth module (login, register, JWT)
- **Day 5**: Basic RBAC (roles, permissions)
### Week 2: Advanced Features
- **Day 1-2**: Social authentication
- **Day 3-4**: OIDC implementation
- **Day 5**: MFA and security features
### Week 3: Optimization & Deployment
- **Day 1-2**: Performance optimization, caching
- **Day 3**: Testing and bug fixes
- **Day 4-5**: Documentation and deployment
**Total: 3 weeks for a production-ready MVP**