pos-system/services/iam-service/docs/ARCHITECTURE.en.md
Ho Ngoc Hai d8411abd24 Revise IAM Service Architecture documentation for clarity and comprehensiveness
- Updated the document title to reflect the focus on IAM Service Architecture.
- Expanded the overview section to provide a clearer description of the IAM Service's capabilities.
- Enhanced the table of contents for better navigation.
- Added detailed architecture diagrams illustrating the layered architecture and key components.
- Included comprehensive sections on authentication flows, authorization models, caching strategies, and security architecture.
- Improved overall structure and readability to facilitate understanding of the IAM Service's design and functionality.

These changes aim to provide developers with a thorough understanding of the IAM Service architecture and its components.
2026-01-02 00:30:26 +07:00


# IAM Service Architecture
> **Enterprise-Grade Identity & Access Management Platform**
>
> This document describes the complete architecture of the IAM Service, including system components, data flows, security model, and integration patterns.
## Table of Contents
- [Overview](#overview)
- [Overall Architecture](#overall-architecture)
- [Authentication Flows](#authentication-flows)
- [Authorization Model](#authorization-model)
- [Caching Strategy](#caching-strategy)
- [Module Dependencies](#module-dependencies)
- [Data Architecture](#data-architecture)
- [Security Architecture](#security-architecture)
- [Observability](#observability)
- [Deployment Architecture](#deployment-architecture)
---
## Overview
The IAM Service is a comprehensive Identity and Access Management platform providing:
### Core Capabilities
| Capability | Description | Modules |
|------------|-------------|---------|
| **Authentication** | User identity verification | Auth, Social, OIDC, MFA |
| **Authorization** | Access control and permissions | RBAC, ABAC, Policy Engine |
| **Identity Management** | User lifecycle and profiles | Users, Profiles, Organizations, Groups |
| **Access Governance** | Request workflows and reviews | Requests, Reviews, Analytics |
| **Compliance** | Regulatory compliance reporting | GDPR, SOC2, ISO27001, Risk Management |
| **Security** | Zero-trust and threat protection | Zero-Trust, Audit, Encryption |
| **Performance** | High-speed access with caching | Multi-layer Cache, Connection Pool |
### Key Metrics
- **50+ API Endpoints** across 10 feature modules
- **30+ Database Models** with comprehensive relationships
- **7 Authentication Methods** (password, social x3, OIDC, MFA x2)
- **Multi-layer Caching** (Memory → Redis → PostgreSQL)
- **Zero-Trust Security** with device fingerprinting
- **Enterprise Compliance** (GDPR, SOC2, ISO27001, HIPAA)
---
## Overall Architecture
The IAM Service follows a layered architecture pattern with clear separation of concerns:
```mermaid
graph TB
subgraph "Client Layer"
WebApp[Web Application<br/>React/Vue/Angular]
MobileApp[Mobile App<br/>iOS/Android/React Native]
Service[Other Microservices<br/>Product/Order/Payment]
end
subgraph "API Gateway"
Traefik[Traefik Gateway<br/>Load Balancer + Routing<br/>Port 80/443]
end
subgraph "IAM Service - Port 3001"
subgraph "Middleware Stack"
Correlation[Correlation ID<br/>Request Tracking]
Logger[Request Logger<br/>Winston]
ZeroTrust[Zero-Trust Validator<br/>Device + Location + Behavior]
RateLimit[Dynamic Rate Limiter<br/>Redis-backed<br/>50-1000 req/15min]
Auth[Authentication<br/>JWT Verification]
RBAC[Authorization<br/>RBAC + ABAC]
end
subgraph "Core Authentication Modules"
AuthModule[Auth Module<br/>Login/Register/Logout]
TokenModule[Token Module<br/>JWT + Cookie Management]
SessionModule[Session Module<br/>Device Fingerprinting]
SocialModule[Social Auth<br/>Google/FB/GitHub]
OIDCModule[OIDC Provider<br/>OpenID Connect]
MFAModule[MFA Module<br/>TOTP/WebAuthn]
RBACModule[RBAC Module<br/>Roles + Permissions]
end
subgraph "Identity Management Modules"
UserModule[User Lifecycle<br/>CRUD + Bulk Ops]
ProfileModule[Profile Management<br/>Avatar + Custom Fields]
VerificationModule[Identity Verification<br/>Email/Phone/Document]
OrgModule[Organizations<br/>Multi-tenant Hierarchy]
GroupModule[Groups<br/>Group-based Access]
end
subgraph "Access Management Modules"
RequestModule[Access Requests<br/>Approval Workflows]
ReviewModule[Access Reviews<br/>Certification Campaigns]
AnalyticsModule[Access Analytics<br/>Usage + Risk Analysis]
end
subgraph "Governance Modules"
ComplianceModule[Compliance<br/>GDPR/SOC2/ISO27001]
PolicyModule[Policy Governance<br/>Templates + Versioning]
RiskModule[Risk Management<br/>Scoring + Detection]
ReportingModule[Reporting<br/>Dashboards + Exports]
end
subgraph "Core Services"
CacheService[Multi-layer Cache<br/>Memory → Redis<br/>TTL: 60s-15min]
AuditService[Audit Logging<br/>Event Sourcing<br/>AuthEvent Model]
SecurityService[Security Services<br/>Encryption + Hashing]
end
end
subgraph "Data Layer"
PostgreSQL[(PostgreSQL 14+<br/>Primary Database<br/>Connection Pool<br/>30+ Models)]
Redis[(Redis 6+<br/>Cache + Sessions<br/>Rate Limits + Locks)]
end
subgraph "External Services (Future)"
EmailService[Email Service<br/>Verification + Notifications]
SMSService[SMS Service<br/>OTP + Phone Verification]
FileStorage[File Storage<br/>S3/GCS for Avatars]
end
subgraph "Observability Stack"
Prometheus[Prometheus<br/>Metrics Collection]
Jaeger[Jaeger<br/>Distributed Tracing]
Logs[Structured Logs<br/>Winston + Loki]
end
%% Client to Gateway
WebApp --> Traefik
MobileApp --> Traefik
Service --> Traefik
%% Gateway to Middleware
Traefik --> Correlation
Correlation --> Logger
Logger --> ZeroTrust
ZeroTrust --> RateLimit
RateLimit --> Auth
Auth --> RBAC
%% Middleware to Modules
RBAC --> AuthModule
RBAC --> UserModule
RBAC --> RequestModule
RBAC --> ComplianceModule
%% Module Dependencies
AuthModule --> TokenModule
AuthModule --> SessionModule
AuthModule --> SocialModule
AuthModule --> OIDCModule
AuthModule --> MFAModule
AuthModule --> RBACModule
UserModule --> ProfileModule
UserModule --> VerificationModule
UserModule --> OrgModule
UserModule --> GroupModule
RequestModule --> ReviewModule
RequestModule --> AnalyticsModule
ComplianceModule --> PolicyModule
ComplianceModule --> RiskModule
ComplianceModule --> ReportingModule
%% Core Services
AuthModule --> CacheService
UserModule --> CacheService
RBACModule --> CacheService
AuthModule --> AuditService
RequestModule --> AuditService
ComplianceModule --> AuditService
AuthModule --> SecurityService
%% Data Layer
CacheService --> Redis
CacheService --> PostgreSQL
AuthModule --> PostgreSQL
UserModule --> PostgreSQL
RequestModule --> PostgreSQL
ComplianceModule --> PostgreSQL
%% External Services (dashed - not yet integrated)
VerificationModule -.-> EmailService
VerificationModule -.-> SMSService
ProfileModule -.-> FileStorage
%% Observability
AuthModule -.-> Prometheus
UserModule -.-> Prometheus
RequestModule -.-> Prometheus
ComplianceModule -.-> Prometheus
CacheService -.-> Jaeger
AuditService -.-> Logs
style ZeroTrust fill:#ff9999
style CacheService fill:#99ccff
style PostgreSQL fill:#cc99ff
style Redis fill:#ffcc99
style AuditService fill:#99ff99
```
### Architecture Highlights
1. **Layered Middleware**: Every request passes through 6 middleware layers (correlation, logging, zero-trust, rate limiting, authentication, authorization)
2. **Modular Design**: 10 independent feature modules with clear boundaries
3. **Multi-layer Caching**: Memory (60s) → Redis (5-15min) → PostgreSQL for optimal performance
4. **Event Sourcing**: All authentication and authorization events logged for audit compliance
5. **Zero-Trust Security**: Continuous validation of device, location, and behavior
6. **Dynamic Rate Limiting**: Role-based limits (50-1000 req/15min)
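
The role-based limits in item 6 could be sketched as a fixed-window counter. This is a minimal sketch, not the service's implementation: the role names, the per-role numbers (the document only gives the 50-1000 range), and the `RoleRateLimiter` class are assumptions, and a `Map` stands in for the Redis counters.

```typescript
// Fixed-window rate limiter with per-role limits.
// Role names and individual limits are hypothetical; the document only
// states a range of 50-1000 requests per 15 minutes.
type RoleName = "GUEST" | "USER" | "MANAGER" | "ADMIN";

const WINDOW_MS = 15 * 60 * 1000;

const LIMITS_BY_ROLE: Record<RoleName, number> = {
  GUEST: 50,
  USER: 200,
  MANAGER: 500,
  ADMIN: 1000,
};

interface WindowCounter {
  windowStart: number; // epoch ms at which the counted window began
  count: number;
}

class RoleRateLimiter {
  // A Map keeps the sketch self-contained; a Redis-backed limiter would
  // keep these counters in Redis with an expiry instead.
  private counters = new Map<string, WindowCounter>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  /** Returns true if the request is allowed, false once the limit is reached. */
  allow(clientKey: string, role: RoleName): boolean {
    const limit = LIMITS_BY_ROLE[role];
    const t = this.now();
    const windowStart = t - (t % WINDOW_MS);
    const current = this.counters.get(clientKey);
    if (!current || current.windowStart !== windowStart) {
      // First request in a new window resets the counter.
      this.counters.set(clientKey, { windowStart, count: 1 });
      return true;
    }
    if (current.count >= limit) return false;
    current.count += 1;
    return true;
  }
}
```

A fixed window is the simplest variant; a sliding window or token bucket would smooth bursts at window boundaries at the cost of more state per client.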
---
## Authentication Flows
### Password-Based Authentication
Standard email/password authentication with bcrypt hashing and JWT token generation:
```mermaid
sequenceDiagram
participant Client
participant Middleware
participant AuthService
participant RBACService
participant Database
participant Redis
participant AuditService
Client->>Middleware: POST /api/v1/auth/login<br/>{email, password}
Middleware->>Middleware: 1. Correlation ID
Middleware->>Middleware: 2. Zero-Trust Validation<br/>(Device + IP + Behavior)
Middleware->>Middleware: 3. Rate Limit Check<br/>(5 login/15min)
Middleware->>AuthService: login(email, password)
AuthService->>Database: Find user by email
Database-->>AuthService: User record
AuthService->>AuthService: Verify password<br/>(bcrypt compare, cost=12)
alt Password Invalid
AuthService->>AuditService: Log LOGIN_FAILED event
AuditService->>Database: Save AuthEvent
AuthService-->>Client: 401 Unauthorized<br/>"Invalid credentials"
else Password Valid & MFA Disabled
AuthService->>RBACService: Get user roles + permissions
RBACService->>Database: Query UserRole, UserPermission
Database-->>RBACService: Roles + Permissions
RBACService-->>AuthService: ["ADMIN"], ["users:read:all"]
AuthService->>AuthService: Generate JWT tokens<br/>Access: 15min<br/>Refresh: 7 days
AuthService->>Database: Save refresh token
AuthService->>Redis: Cache user data<br/>TTL: 15min
AuthService->>Redis: Cache permissions<br/>TTL: 5min
AuthService->>Database: Create session<br/>(device fingerprint)
AuthService->>Database: Update lastLoginAt, loginCount
AuthService->>AuditService: Log LOGIN_SUCCESS event
AuthService-->>Client: 200 OK<br/>{user, tokens}<br/>Set-Cookie: refresh_token
else Password Valid & MFA Enabled
AuthService-->>Client: 200 OK<br/>{mfaRequired: true}
Note over Client: Prompt for MFA code
Client->>Middleware: POST /api/v1/mfa/totp/validate<br/>{userId, token: "123456"}
Note over Middleware,AuthService: MFA verification flow...
end
```
**Key Security Features:**
- Bcrypt password hashing (cost factor 12 in production)
- Token family tracking for rotation security
- Device fingerprinting for session management
- Zero-trust validation before authentication
- Comprehensive audit logging
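
"Token family tracking" is commonly implemented as refresh-token rotation with reuse detection. The following is a minimal sketch under that assumption; the `RefreshTokenStore` class, its method names, and the counter-based IDs are hypothetical, and the in-memory maps stand in for the `RefreshToken` table.

```typescript
// Refresh-token rotation with family tracking (reuse detection).
// All names are illustrative; this is not the service's actual code.
interface StoredRefreshToken {
  familyId: string;
  userId: string;
  used: boolean;
}

class RefreshTokenStore {
  private tokens = new Map<string, StoredRefreshToken>();
  private revokedFamilies = new Set<string>();
  private counter = 0;

  private nextId(prefix: string): string {
    // Placeholder generator; real tokens must be cryptographically random.
    this.counter += 1;
    return `${prefix}-${this.counter}`;
  }

  /** Called at login: issues the first token of a new family. */
  issue(userId: string): string {
    const token = this.nextId("rt");
    const familyId = this.nextId("fam");
    this.tokens.set(token, { familyId, userId, used: false });
    return token;
  }

  /**
   * Exchanges a refresh token for its successor.
   * Returns null for unknown tokens, revoked families, or reuse;
   * reuse also revokes the whole family.
   */
  rotate(token: string): string | null {
    const record = this.tokens.get(token);
    if (!record || this.revokedFamilies.has(record.familyId)) return null;
    if (record.used) {
      // An already-used token presented again suggests it was stolen.
      this.revokedFamilies.add(record.familyId);
      return null;
    }
    record.used = true;
    const next = this.nextId("rt");
    this.tokens.set(next, { familyId: record.familyId, userId: record.userId, used: false });
    return next;
  }
}
```

Revoking the whole family on reuse means both the legitimate client and the attacker lose access and the user has to log in again, which is the usual trade-off for this pattern.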
### Social Authentication Flow
OAuth 2.0 integration with Google, Facebook, and GitHub:
```mermaid
sequenceDiagram
participant Client
participant IAM
participant Google as Google OAuth
participant Database
participant Redis
Client->>IAM: GET /api/v1/auth/social/google
IAM->>IAM: Generate state token<br/>(CSRF protection)
IAM->>Redis: Store state token<br/>TTL: 10min
IAM-->>Client: 302 Redirect to Google<br/>with state param
Client->>Google: OAuth consent screen
Google-->>Client: Authorization code + state
Client->>IAM: GET /api/v1/auth/social/google/callback<br/>?code=xxx&state=yyy
IAM->>Redis: Verify state token
alt State Invalid
IAM-->>Client: 401 CSRF token invalid
else State Valid
IAM->>Google: Exchange code for tokens
Google-->>IAM: Access token + User profile
IAM->>Database: Find or create user<br/>by provider + providerId
alt User Found
IAM->>Database: Update social account tokens
else New User
IAM->>Database: Create user + social account
IAM->>Database: Assign default role (USER)
end
IAM->>IAM: Generate IAM JWT tokens
IAM->>Redis: Cache user + permissions
IAM->>Database: Create session
IAM-->>Client: 302 Redirect to app<br/>with tokens in URL/cookie
end
```
**Supported Providers:**
- Google OAuth 2.0
- Facebook OAuth
- GitHub OAuth
- Apple Sign-In (future)
- Microsoft OAuth (future)
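
The state-token check in the flow above can be sketched as a one-time store with a 10-minute TTL. This is an illustrative sketch: `OAuthStateStore` and its methods are hypothetical names, a `Map` stands in for Redis, and single-use consumption is an assumption the diagram does not state explicitly.

```typescript
// One-time OAuth `state` values with a TTL, for CSRF protection.
const STATE_TTL_MS = 10 * 60 * 1000; // 10 minutes, as in the diagram

class OAuthStateStore {
  private expiries = new Map<string, number>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  /** Remembers a freshly generated state value before redirecting to the provider. */
  save(state: string): void {
    this.expiries.set(state, this.now() + STATE_TTL_MS);
  }

  /** Validates the state returned on the callback; each value is accepted at most once. */
  consume(state: string): boolean {
    const expiresAt = this.expiries.get(state);
    if (expiresAt === undefined) return false;
    this.expiries.delete(state); // single use, even when already expired
    return this.now() < expiresAt;
  }
}
```

The state value itself should come from a cryptographically secure random source; generation is left out here to keep the sketch free of runtime-specific APIs.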
### MFA (Multi-Factor Authentication) Flow
TOTP-based two-factor authentication using authenticator apps:
```mermaid
sequenceDiagram
participant Client
participant IAM
participant Authenticator as Authenticator App
participant Database
Note over Client,Database: MFA Enrollment Phase
Client->>IAM: POST /api/v1/mfa/totp/enable
Note over Client: User must be authenticated
IAM->>IAM: Generate TOTP secret (32 chars)
IAM->>IAM: Generate QR code<br/>(otpauth://totp/...)
IAM-->>Client: {secret, qrCode, backupCodes}
Client->>Authenticator: Scan QR code
Authenticator-->>Client: Display 6-digit TOTP
Client->>IAM: POST /api/v1/mfa/totp/verify<br/>{token: "123456"}
IAM->>IAM: Verify TOTP token<br/>(30s window, ±1 interval)
alt Token Valid
IAM->>Database: Create MFADevice record<br/>(type: TOTP, secret: encrypted)
IAM->>Database: Update user.mfaEnabled = true
IAM-->>Client: 200 OK MFA enabled
else Token Invalid
IAM-->>Client: 401 Invalid TOTP token
end
Note over Client,Database: MFA Login Phase
Client->>IAM: POST /api/v1/auth/login<br/>{email, password}
IAM->>Database: Verify credentials
alt MFA Required
IAM-->>Client: 200 OK {mfaRequired: true}
Client->>Authenticator: Get current TOTP
Authenticator-->>Client: Current 6-digit code
Client->>IAM: POST /api/v1/mfa/totp/validate<br/>{userId, token}
IAM->>Database: Get MFADevice for user
IAM->>IAM: Verify TOTP token
alt Token Valid
IAM->>IAM: Generate full JWT tokens
IAM->>Database: Create session
IAM->>Database: Update device.lastUsedAt
IAM-->>Client: 200 OK {user, tokens}
else Token Invalid
IAM-->>Client: 401 Invalid MFA token
end
end
```
**MFA Features:**
- TOTP (Time-based One-Time Password) via authenticator apps
- QR code generation for easy enrollment
- Backup codes for account recovery
- Multiple MFA devices per user
- WebAuthn/FIDO2 framework (future implementation)
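
The "30s window, ±1 interval" check from the enrollment diagram can be sketched as follows. Only the time-step logic is shown; the function names are hypothetical, and the code generator is injected as `codeFor`, which in a real service would be the HMAC-based one-time password for that counter (RFC 6238 on top of RFC 4226).

```typescript
const TOTP_STEP_SECONDS = 30;

/** Time-step counters accepted at a given instant: the current step plus or minus `drift` steps. */
function acceptedCounters(unixSeconds: number, drift: number = 1): number[] {
  const current = Math.floor(unixSeconds / TOTP_STEP_SECONDS);
  const counters: number[] = [];
  for (let c = current - drift; c <= current + drift; c++) {
    if (c >= 0) counters.push(c);
  }
  return counters;
}

/**
 * Checks a submitted code against every accepted counter.
 * `codeFor` is an injected assumption standing in for the real
 * secret-based code generation.
 */
function verifyTotp(
  submitted: string,
  unixSeconds: number,
  codeFor: (counter: number) => string,
): boolean {
  return acceptedCounters(unixSeconds).some((c) => codeFor(c) === submitted);
}
```

Accepting one step on either side tolerates small clock drift between server and authenticator app. A production check would also compare codes in constant time and reject a code that was already used in the same step.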
---
## Authorization Model
The IAM Service implements a hybrid authorization model combining RBAC (Role-Based Access Control) and ABAC (Attribute-Based Access Control):
### Authorization Decision Flow
```mermaid
flowchart TD
Start[Request to Protected Resource] --> Auth{Authenticated?}
Auth -->|No| Deny401[401 Unauthorized<br/>Authentication required]
Auth -->|Yes| Cache{Permission<br/>in Cache?}
Cache -->|Hit| CacheValue{Cache<br/>Value?}
CacheValue -->|Allow| Allow[Access Granted]
CacheValue -->|Deny| Deny403Cache[403 Forbidden<br/>Cached denial]
Cache -->|Miss| LoadPerms[Load Permissions<br/>from Database]
LoadPerms --> DirectPerms{Has Direct<br/>User Permission?}
DirectPerms -->|Explicit Deny| Deny403A[403 Forbidden<br/>Explicit permission denial]
DirectPerms -->|Explicit Allow| Allow
DirectPerms -->|None| RolePerms{Has Role<br/>Permission?}
RolePerms -->|Yes| CheckExpiry{Role<br/>Expired?}
RolePerms -->|No| GroupPerms{Has Group<br/>Permission?}
CheckExpiry -->|Yes| GroupPerms
CheckExpiry -->|No| Allow
GroupPerms -->|Yes| Allow
GroupPerms -->|No| ABACCheck{ABAC Policy<br/>Exists?}
ABACCheck -->|No Policies| DefaultDeny[403 Forbidden<br/>Default deny - No permissions]
ABACCheck -->|Has Policies| EvaluatePolicies[Evaluate Policies<br/>by Priority]
EvaluatePolicies --> EvalConditions{Policy<br/>Conditions?}
EvalConditions -->|Time-based| CheckTime{Current time<br/>in range?}
EvalConditions -->|Location-based| CheckLocation{IP in<br/>allowed range?}
EvalConditions -->|Attribute-based| CheckAttrs{Attributes<br/>match?}
CheckTime -->|Yes| PolicyEffect{Policy<br/>Effect?}
CheckTime -->|No| NextPolicy{More<br/>Policies?}
CheckLocation -->|Yes| PolicyEffect
CheckLocation -->|No| NextPolicy
CheckAttrs -->|Yes| PolicyEffect
CheckAttrs -->|No| NextPolicy
PolicyEffect -->|ALLOW| Allow
PolicyEffect -->|DENY| Deny403Policy[403 Forbidden<br/>Policy denial]
NextPolicy -->|Yes| EvaluatePolicies
NextPolicy -->|No| DefaultDeny
Allow --> CacheResult[Cache Result<br/>TTL: 5min]
CacheResult --> AuditLog[Log Access Event<br/>to AuthEvent]
AuditLog --> Success[200 OK<br/>Process request]
style Auth fill:#e1f5fe
style DirectPerms fill:#fff3e0
style RolePerms fill:#f3e5f5
style GroupPerms fill:#e8f5e9
style ABACCheck fill:#fce4ec
style Allow fill:#c8e6c9
style Deny401 fill:#ffcdd2
style Deny403A fill:#ffcdd2
style Deny403Cache fill:#ffcdd2
style Deny403Policy fill:#ffcdd2
style DefaultDeny fill:#ffcdd2
```
### Permission Model
The IAM Service uses a hierarchical permission model:
**Permission Format**: `resource:action:scope`
- **Resource**: The entity being accessed (e.g., `users`, `products`, `orders`)
- **Action**: The operation being performed (e.g., `read`, `create`, `update`, `delete`)
- **Scope**: The access boundary (e.g., `own`, `team`, `all`)
**Examples**:
- `users:read:all` - Read all users
- `users:update:own` - Update own user profile
- `products:create:team` - Create products for team
- `orders:delete:all` - Delete any order (admin)
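
A minimal sketch of parsing and matching this format follows. The `*` wildcard appears in the RBAC diagram (`users:*:all`); the scope ordering `own` < `team` < `all` and the function names are assumptions made for illustration, not confirmed behavior of the service.

```typescript
interface ParsedPermission {
  resource: string;
  action: string;
  scope: string;
}

/** Splits "resource:action:scope" and rejects malformed values. */
function parsePermission(value: string): ParsedPermission {
  const parts = value.split(":");
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    throw new Error(`Invalid permission "${value}", expected resource:action:scope`);
  }
  const [resource = "", action = "", scope = ""] = parts;
  return { resource, action, scope };
}

// Assumed ordering: a wider scope satisfies a narrower one.
const SCOPE_RANK: Record<string, number> = { own: 1, team: 2, all: 3 };

function segmentMatches(granted: string, required: string): boolean {
  return granted === "*" || granted === required;
}

/** True if the granted permission covers the required one. */
function permissionSatisfies(granted: string, required: string): boolean {
  const g = parsePermission(granted);
  const r = parsePermission(required);
  const gRank = SCOPE_RANK[g.scope];
  const rRank = SCOPE_RANK[r.scope];
  const scopeOk =
    gRank !== undefined && rRank !== undefined ? gRank >= rRank : g.scope === r.scope;
  return segmentMatches(g.resource, r.resource) && segmentMatches(g.action, r.action) && scopeOk;
}
```

For example, `permissionSatisfies("users:*:all", "users:read:all")` is true, while `permissionSatisfies("users:update:own", "users:update:all")` is false.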
### Authorization Hierarchy
```
1. Direct User Permissions (HIGHEST PRIORITY)
↓ (if no direct permission exists; an explicit allow or deny stops here)
2. Role Permissions
↓ (if not found)
3. Group Permissions
↓ (if not found)
4. ABAC Policies (evaluated by priority)
↓ (if no policies match)
5. Default Deny (LOWEST PRIORITY)
```
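
The hierarchy above can be expressed as a single decision function. This is a sketch under simplifying assumptions: the inputs are pre-computed flags rather than database lookups, and the type and function names are hypothetical.

```typescript
type Verdict = "ALLOW" | "DENY";

interface AccessInputs {
  direct: Verdict | null; // explicit direct user permission, if any
  roleGrants: boolean; // some non-expired role grants the permission
  groupGrants: boolean; // some group grants the permission
  policyVerdict: Verdict | null; // effect of the first matching ABAC policy, if any
}

interface AccessDecision {
  verdict: Verdict;
  source: "direct" | "role" | "group" | "policy" | "default";
}

/** Walks the levels in priority order and stops at the first one that decides. */
function decideAccess(inputs: AccessInputs): AccessDecision {
  if (inputs.direct !== null) return { verdict: inputs.direct, source: "direct" };
  if (inputs.roleGrants) return { verdict: "ALLOW", source: "role" };
  if (inputs.groupGrants) return { verdict: "ALLOW", source: "group" };
  if (inputs.policyVerdict !== null) return { verdict: inputs.policyVerdict, source: "policy" };
  return { verdict: "DENY", source: "default" };
}
```

Returning the `source` alongside the verdict makes it straightforward to record in the audit log why a request was allowed or denied.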
### RBAC (Role-Based Access Control)
```mermaid
graph LR
User[User] --> UserRole1[UserRole]
User --> UserRole2[UserRole]
User --> UserPerm[UserPermission<br/>Direct Override]
UserRole1 --> Role1[ADMIN Role]
UserRole2 --> Role2[MANAGER Role]
Role1 --> RolePerm1[RolePermission]
Role1 --> RolePerm2[RolePermission]
Role2 --> RolePerm3[RolePermission]
RolePerm1 --> Perm1[users:*:all]
RolePerm2 --> Perm2[products:*:all]
RolePerm3 --> Perm3[orders:read:team]
UserPerm --> Perm4[analytics:read:all]
Group[Group] --> GroupMember[GroupMember]
GroupMember --> User
Group --> GroupPerm[GroupPermission]
GroupPerm --> Perm5[reports:read:all]
style User fill:#e1f5fe
style Role1 fill:#f3e5f5
style Role2 fill:#f3e5f5
style Group fill:#e8f5e9
style Perm1 fill:#fff3e0
style Perm2 fill:#fff3e0
style Perm3 fill:#fff3e0
style Perm4 fill:#ffccbc
style Perm5 fill:#c8e6c9
```
**Features:**
- Multiple roles per user
- Temporary role assignments with expiration
- Direct user permissions (can override roles)
- Group-based permissions
- Permission caching (5 minutes TTL)
### ABAC (Attribute-Based Access Control)
```mermaid
graph TD
Request[Access Request] --> PolicyEngine[Policy Engine]
PolicyEngine --> LoadPolicies[Load Policies<br/>for Resource]
LoadPolicies --> SortPolicies[Sort by Priority<br/>Descending]
SortPolicies --> EvaluatePolicy[Evaluate Policy<br/>Conditions]
EvaluatePolicy --> TimeCondition{Time-based<br/>Condition?}
EvaluatePolicy --> LocationCondition{Location-based<br/>Condition?}
EvaluatePolicy --> AttributeCondition{Attribute-based<br/>Condition?}
TimeCondition --> CheckTime[Check current time<br/>vs allowed hours]
LocationCondition --> CheckIP[Check IP address<br/>vs allowed ranges]
AttributeCondition --> CheckAttrs[Check user attributes<br/>vs required values]
CheckTime --> ConditionMet{All Conditions<br/>Met?}
CheckIP --> ConditionMet
CheckAttrs --> ConditionMet
ConditionMet -->|Yes| ApplyEffect{Policy<br/>Effect?}
ConditionMet -->|No| NextPolicy{More<br/>Policies?}
ApplyEffect -->|ALLOW| Allow[Access Granted]
ApplyEffect -->|DENY| Deny[Access Denied]
NextPolicy -->|Yes| EvaluatePolicy
NextPolicy -->|No| DefaultDeny[Default Deny]
style PolicyEngine fill:#f3e5f5
style ConditionMet fill:#fff3e0
style Allow fill:#c8e6c9
style Deny fill:#ffcdd2
style DefaultDeny fill:#ffcdd2
```
**Policy Example (JSON Logic)**:
```json
{
  "name": "Business Hours Only",
  "resource": "sensitive_data",
  "condition": {
    "and": [
      { ">=": [{ "var": "hour" }, 9] },
      { "<=": [{ "var": "hour" }, 17] },
      { "in": [{ "var": "day" }, [1, 2, 3, 4, 5]] }
    ]
  },
  "effect": "ALLOW",
  "priority": 100
}
```
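
To show how such a condition evaluates, here is a minimal sketch of an evaluator covering only the operators used in this example (`var`, `and`, `>=`, `<=`, `in`). It is not the JSON Logic library and does not reproduce all of its semantics (for instance, `and` here is eager and returns a boolean); `evaluateRule`, `Facts`, and the shape of the facts object are assumptions.

```typescript
type JsonValue = number | string | boolean | null | JsonValue[] | { [key: string]: JsonValue };
type Facts = Record<string, JsonValue>;

function evaluateRule(rule: JsonValue, facts: Facts): JsonValue {
  // Literals evaluate to themselves; arrays are evaluated element by element.
  if (rule === null || typeof rule !== "object") return rule;
  if (Array.isArray(rule)) return rule.map((item) => evaluateRule(item, facts));

  const keys = Object.keys(rule);
  if (keys.length !== 1) throw new Error("A rule object must have exactly one operator");
  const operator = keys[0] as string;
  const rawArgs = rule[operator] as JsonValue;

  if (operator === "var") {
    const value = facts[String(rawArgs)];
    return value === undefined ? null : value;
  }

  const list = Array.isArray(rawArgs) ? rawArgs : [rawArgs];
  const args = list.map((arg) => evaluateRule(arg, facts));
  const a = args[0];
  const b = args[1];

  switch (operator) {
    case "and":
      return args.every((value) => Boolean(value));
    case ">=":
      return typeof a === "number" && typeof b === "number" && a >= b;
    case "<=":
      return typeof a === "number" && typeof b === "number" && a <= b;
    case "in":
      return Array.isArray(b) && b.some((candidate) => candidate === a);
    default:
      throw new Error(`Unsupported operator: ${operator}`);
  }
}

// The condition from the policy example above.
const businessHours: JsonValue = {
  and: [
    { ">=": [{ var: "hour" }, 9] },
    { "<=": [{ var: "hour" }, 17] },
    { in: [{ var: "day" }, [1, 2, 3, 4, 5]] },
  ],
};
```

With facts such as `{ hour: 10, day: 3 }` the condition holds and the policy's `ALLOW` effect applies. Because the upper bound is `<= 17`, any time during hour 17 still matches; whether that is intended depends on how `hour` is derived.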
---
## Caching Strategy
The IAM Service implements a multi-layer caching strategy for optimal performance:
```mermaid
graph TB
Request[Incoming Request] --> CheckL1{L1 Cache<br/>Node.js Memory}
CheckL1 -->|Cache Hit| L1Hit[Fast Response<br/>< 1ms latency]
CheckL1 -->|Cache Miss| CheckL2{L2 Cache<br/>Redis}
CheckL2 -->|Cache Hit| L2Hit[Medium Response<br/>< 10ms latency]
CheckL2 -->|Cache Miss| Database[(PostgreSQL<br/>Source of Truth)]
Database --> UpdateL2[Update L2 Cache<br/>Write to Redis]
UpdateL2 --> UpdateL1[Update L1 Cache<br/>Write to Memory]
UpdateL1 --> Response[Return Response]
L2Hit --> UpdateL1
L1Hit --> Response
subgraph "Cache Layers"
L1Cache[L1: In-Memory Cache<br/>node-cache<br/>TTL: 60 seconds<br/>Hot data only]
L2Cache[L2: Redis Cache<br/>ioredis<br/>TTL: 5-15 minutes<br/>Distributed]
DBLayer[L3: PostgreSQL<br/>Prisma ORM<br/>Source of truth<br/>Persistent]
end
subgraph "Cached Data Types"
UserData[User Data<br/>TTL: 15min]
Permissions[Permissions<br/>TTL: 5min]
Tokens[Token Validation<br/>TTL: Token lifetime]
Sessions[Sessions<br/>TTL: Session lifetime]
Roles[User Roles<br/>TTL: 5min]
end
style L1Hit fill:#90EE90
style L2Hit fill:#87CEEB
style Database fill:#FFB6C1
style L1Cache fill:#c8e6c9
style L2Cache fill:#bbdefb
style DBLayer fill:#f8bbd0
```
### Cache Configuration
| Data Type | L1 TTL | L2 TTL | Invalidation Strategy |
|-----------|--------|--------|----------------------|
| User Data | 60s | 15min | On update/delete |
| Permissions | 60s | 5min | On role/permission change |
| User Roles | 60s | 5min | On role assignment/revocation |
| Token Validation | N/A | Token lifetime | On logout/revocation |
| Sessions | N/A | Session lifetime | On logout |
| Rate Limit Counters | N/A | 15min window | Time-based expiry |
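
The read-through behavior with the L1 and L2 TTLs from the table can be sketched as below. This is a simplified, synchronous in-process model: both layers use the same hypothetical `TtlStore` class, whereas the real L2 would be an asynchronous Redis client, and `LayeredCache` is an illustrative name rather than the service's `cacheService` API.

```typescript
interface TtlEntry<V> {
  value: V;
  expiresAt: number;
}

class TtlStore<V> {
  private entries = new Map<string, TtlEntry<V>>();
  private now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  delete(key: string): void {
    this.entries.delete(key);
  }
}

class LayeredCache<V> {
  private l1: TtlStore<V>;
  private l2: TtlStore<V>;
  private l1TtlMs: number;
  private l2TtlMs: number;

  constructor(now: () => number, l1TtlMs: number, l2TtlMs: number) {
    this.l1 = new TtlStore<V>(now);
    this.l2 = new TtlStore<V>(now);
    this.l1TtlMs = l1TtlMs;
    this.l2TtlMs = l2TtlMs;
  }

  /** Read-through: L1, then L2, then the loader (the database query). */
  get(key: string, load: () => V): V {
    const fromL1 = this.l1.get(key);
    if (fromL1 !== undefined) return fromL1;
    const fromL2 = this.l2.get(key);
    if (fromL2 !== undefined) {
      this.l1.set(key, fromL2, this.l1TtlMs); // promote to L1
      return fromL2;
    }
    const fresh = load();
    this.l2.set(key, fresh, this.l2TtlMs);
    this.l1.set(key, fresh, this.l1TtlMs);
    return fresh;
  }

  /** Drops the key from both layers, e.g. after a role or permission change. */
  invalidate(key: string): void {
    this.l1.delete(key);
    this.l2.delete(key);
  }
}
```

One caveat of a per-process L1 is that `invalidate` only clears the memory cache of the instance that handled the write; other replicas keep serving their L1 copy until its 60-second TTL expires unless invalidations are broadcast.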
### Cache Invalidation
```typescript
// Example: Invalidate user caches when permissions change
await rbacService.grantPermission(userId, permissionId);
// Automatically invalidates:
// - cacheService.keys.userPermissions(userId)
// - cacheService.keys.userRoles(userId)
```
---
## Module Dependencies
The IAM Service is organized into 10 feature modules with clear dependencies:
```mermaid
graph TD
subgraph "Presentation Layer"
Routes[Routes/Controllers<br/>50+ Endpoints]
end
subgraph "Business Logic Layer"
subgraph "Core Auth"
AuthModule[Auth Module<br/>Login/Register/Logout]
TokenModule[Token Module<br/>JWT + Cookies]
SessionModule[Session Module<br/>Device Tracking]
end
subgraph "Extended Auth"
RBACModule[RBAC Module<br/>Roles + Permissions]
SocialModule[Social Auth<br/>Google/FB/GitHub]
OIDCModule[OIDC Module<br/>Provider + Client]
MFAModule[MFA Module<br/>TOTP/WebAuthn]
end
subgraph "Identity"
IdentityModule[Identity Module<br/>Users + Profiles]
OrgModule[Organization Module<br/>Multi-tenancy]
GroupModule[Group Module<br/>Group Permissions]
end
subgraph "Access Governance"
AccessModule[Access Module<br/>Requests + Reviews]
AnalyticsModule[Analytics Module<br/>Usage Analysis]
end
subgraph "Compliance"
GovernanceModule[Governance Module<br/>Compliance + Risk]
end
end
subgraph "Core Services Layer"
CacheService[Cache Service<br/>Multi-layer]
AuditService[Audit Service<br/>Event Sourcing]
SecurityService[Security Service<br/>Encryption]
FeatureService[Feature Flags<br/>Runtime Config]
end
subgraph "Data Access Layer"
Repositories[Repositories<br/>Data Access<br/>Prisma ORM]
end
subgraph "Infrastructure"
Database[(PostgreSQL<br/>30+ Models)]
Redis[(Redis<br/>Cache + Locks)]
end
%% Routes to Modules
Routes --> AuthModule
Routes --> IdentityModule
Routes --> AccessModule
Routes --> GovernanceModule
%% Core Auth Dependencies
AuthModule --> TokenModule
AuthModule --> SessionModule
AuthModule --> SocialModule
AuthModule --> OIDCModule
AuthModule --> MFAModule
AuthModule --> RBACModule
%% Identity Dependencies
IdentityModule --> OrgModule
IdentityModule --> GroupModule
IdentityModule --> RBACModule
%% Access Dependencies
AccessModule --> AnalyticsModule
AccessModule --> RBACModule
%% Governance Dependencies
GovernanceModule --> RBACModule
%% Core Services
AuthModule --> CacheService
IdentityModule --> CacheService
RBACModule --> CacheService
AuthModule --> AuditService
AccessModule --> AuditService
GovernanceModule --> AuditService
AuthModule --> SecurityService
AuthModule --> FeatureService
%% Data Access
AuthModule --> Repositories
IdentityModule --> Repositories
AccessModule --> Repositories
GovernanceModule --> Repositories
RBACModule --> Repositories
Repositories --> Database
CacheService --> Redis
CacheService --> Database
style AuthModule fill:#e1f5fe
style RBACModule fill:#f3e5f5
style IdentityModule fill:#fff3e0
style AccessModule fill:#e8f5e9
style GovernanceModule fill:#fce4ec
style CacheService fill:#bbdefb
style AuditService fill:#c8e6c9
style SecurityService fill:#ffccbc
```
### Module Descriptions
| Module | Responsibility | Key Files |
|--------|---------------|-----------|
| **Auth** | User authentication and token management | `auth.service.ts`, `auth.controller.ts` |
| **Token** | JWT generation, validation, and refresh | `jwt.service.ts`, `cookie.service.ts` |
| **Session** | Session lifecycle and device management | `session.service.ts` |
| **RBAC** | Role and permission management | `rbac.service.ts`, `rbac.middleware.ts` |
| **Social** | OAuth integration with external providers | `social.service.ts`, `google.strategy.ts` |
| **OIDC** | OpenID Connect provider and client | `oidc-provider.service.ts` |
| **MFA** | Multi-factor authentication | `mfa.service.ts`, `totp.service.ts` |
| **Identity** | User lifecycle and profile management | `user.service.ts`, `profile.service.ts` |
| **Organization** | Multi-tenant organization support | `organization.service.ts` |
| **Group** | Group-based access control | `group.service.ts` |
| **Access** | Access request workflows and reviews | `request.service.ts`, `review.service.ts` |
| **Analytics** | Access analytics and reporting | `analytics.service.ts` |
| **Governance** | Compliance, policy, and risk management | `compliance.service.ts`, `risk.service.ts` |
| **Cache** | Multi-layer caching (Memory + Redis) | `cache.service.ts` |
| **Audit** | Event sourcing and audit logging | `audit.service.ts` |
| **Security** | Encryption, hashing, zero-trust | `zero-trust.validator.ts` |
---
## Data Architecture
The IAM Service uses PostgreSQL with 30+ Prisma models. See [DATA-MODEL.md](./concepts/DATA-MODEL.md) for the complete entity-relationship diagram.
### Model Categories
1. **Core Authentication** (7 models):
- User, Session, RefreshToken, AuthEvent, SocialAccount, MFADevice, Policy
2. **Authorization** (6 models):
- Role, Permission, UserRole, RolePermission, UserPermission, GroupPermission
3. **Identity Management** (5 models):
- Organization, Group, GroupMember, UserProfile, IdentityVerification
4. **Access Management** (4 models):
- AccessRequest, AccessRequestApprover, AccessReview, AccessReviewItem
5. **Governance** (3 models):
- ComplianceReport, PolicyTemplate, RiskScore
### Key Relationships
```
User (1) ─── (*) UserRole ─── (*) Role ─── (*) RolePermission ─── (*) Permission
User (1) ─── (*) UserPermission ─── (*) Permission
User (1) ─── (*) Session
User (1) ─── (1) UserProfile
User (1) ─── (*) AccessRequest
Organization (1) ─── (*) User
Organization (1) ─── (*) Group ─── (*) GroupMember ─── (*) User
```
---
## Security Architecture
The IAM Service implements defense-in-depth security with multiple layers:
### Security Layers
```mermaid
graph TB
Request[Incoming Request] --> Layer1[Layer 1: Network Security<br/>Traefik Gateway + TLS]
Layer1 --> Layer2[Layer 2: Zero-Trust Validation<br/>Device + Location + Behavior]
Layer2 --> Layer3[Layer 3: Rate Limiting<br/>Dynamic by Role<br/>50-1000 req/15min]
Layer3 --> Layer4[Layer 4: Authentication<br/>JWT Validation<br/>Token Expiry: 15min]
Layer4 --> Layer5[Layer 5: Authorization<br/>RBAC + ABAC<br/>Permission Checking]
Layer5 --> Layer6[Layer 6: Input Validation<br/>Zod Schemas<br/>Sanitization]
Layer6 --> Layer7[Layer 7: Audit Logging<br/>Event Sourcing<br/>All Actions Logged]
Layer7 --> ProcessRequest[Process Request]
style Layer1 fill:#ffccbc
style Layer2 fill:#ff9999
style Layer3 fill:#ffb74d
style Layer4 fill:#fff176
style Layer5 fill:#aed581
style Layer6 fill:#4dd0e1
style Layer7 fill:#9575cd
style ProcessRequest fill:#c8e6c9
```
### Security Features
| Feature | Implementation | Location |
|---------|---------------|----------|
| **Zero-Trust** | Device fingerprinting, location, behavior analysis | `zero-trust.validator.ts` |
| **Password Hashing** | bcrypt (cost factor 12) | `auth.service.ts:43` |
| **Token Security** | JWT with HS256, 15min expiry, token rotation | `jwt.service.ts` |
| **CSRF Protection** | State tokens for OAuth, SameSite cookies | `cookie.service.ts` |
| **Rate Limiting** | Redis-backed, dynamic by role | `rate-limit.middleware.ts` |
| **Input Validation** | Zod schemas, sanitization | `validation.middleware.ts` |
| **Audit Logging** | Event sourcing (AuthEvent model) | `audit.service.ts` |
| **Session Security** | Device fingerprinting, IP tracking | `session.service.ts` |
| **MFA** | TOTP with 30s time step (±1 step tolerance), backup codes | `mfa.service.ts` |
### Threat Mitigation
| Threat | Mitigation |
|--------|-----------|
| **Brute Force** | Login rate limiting (5 attempts/15min), account lockout |
| **Token Theft** | Short token lifetime (15min), token rotation, device binding |
| **CSRF** | SameSite cookies, state tokens for OAuth |
| **XSS** | Content Security Policy, HttpOnly cookies |
| **SQL Injection** | Prisma ORM parameterized queries |
| **Session Hijacking** | Device fingerprinting, IP validation |
| **Privilege Escalation** | Strict permission checks, audit logging |
| **Replay Attacks** | Token expiry, nonce for OAuth |
---
## Observability
The IAM Service provides comprehensive observability with metrics, logs, and traces:
### Observability Stack
```mermaid
graph TB
subgraph "IAM Service"
Application[Application Code]
Application --> MetricsCollector[Metrics Collector<br/>Prometheus Format]
Application --> Logger[Structured Logger<br/>Winston]
Application --> Tracer[Distributed Tracer<br/>Jaeger Client]
end
subgraph "Collection Layer"
Prometheus[Prometheus<br/>Metrics Storage]
Loki[Loki<br/>Log Aggregation]
Jaeger[Jaeger<br/>Trace Storage]
end
subgraph "Visualization Layer"
Grafana[Grafana<br/>Dashboards + Alerts]
end
MetricsCollector --> Prometheus
Logger --> Loki
Tracer --> Jaeger
Prometheus --> Grafana
Loki --> Grafana
Jaeger --> Grafana
Grafana --> Alerts[Alert Manager<br/>Notifications]
style Application fill:#e1f5fe
style Prometheus fill:#f3e5f5
style Loki fill:#fff3e0
style Jaeger fill:#e8f5e9
style Grafana fill:#fce4ec
```
### Metrics (Prometheus)
**Collected Metrics:**
- HTTP request duration (histogram)
- HTTP request count (counter)
- HTTP response status codes (counter)
- Active sessions (gauge)
- Cache hit/miss ratio (counter)
- Database query duration (histogram)
- Authentication success/failure rate (counter)
- Permission check duration (histogram)
**Endpoints:**
- `/metrics` - Prometheus metrics endpoint
- `/health/live` - Liveness probe
- `/health/ready` - Readiness probe
### Logging (Winston)
**Log Levels:** ERROR, WARN, INFO, DEBUG
**Structured Log Format:**
```json
{
  "level": "info",
  "message": "User logged in",
  "timestamp": "2024-01-01T00:00:00.000Z",
  "correlationId": "req-123-456",
  "userId": "user-789",
  "email": "user@example.com",
  "service": "iam-service"
}
```
### Tracing (Jaeger)
**Trace Spans:**
- HTTP request handling
- Database queries
- Cache operations
- External API calls
- Authentication flow
- Authorization checks
**Correlation IDs:**
- Every request gets a unique correlation ID
- Propagated across service calls
- Included in all logs and traces
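
A minimal sketch of how a correlation ID might be resolved, propagated, and attached to a log entry. The header name `x-correlation-id` and all function names are assumptions; the document does not specify them.

```typescript
// Header name is an assumption; the source does not name it.
const CORRELATION_HEADER = "x-correlation-id";

type HeaderMap = Record<string, string | undefined>;

/** Reuses the caller's correlation ID when present, otherwise generates a new one. */
function resolveCorrelationId(headers: HeaderMap, generate: () => string): string {
  const incoming = headers[CORRELATION_HEADER];
  return incoming !== undefined && incoming.trim() !== "" ? incoming : generate();
}

/** Headers to attach to outgoing calls so downstream services log the same ID. */
function outgoingHeaders(correlationId: string): HeaderMap {
  return { [CORRELATION_HEADER]: correlationId };
}

/** Builds a log entry following the structured log format shown earlier in this section. */
function buildLogEntry(level: string, message: string, correlationId: string, at: Date) {
  return {
    level,
    message,
    timestamp: at.toISOString(),
    correlationId,
    service: "iam-service",
  };
}
```

Reusing an incoming ID rather than always generating a new one is what lets a single request be followed across the gateway, the IAM Service, and other microservices.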
---
## Deployment Architecture
The IAM Service can be deployed in multiple configurations:
### Local Development
```mermaid
graph LR
Developer[Developer<br/>Localhost] --> LocalIAM[IAM Service<br/>pnpm dev<br/>Port 3001]
LocalIAM --> LocalDB[(PostgreSQL<br/>Docker<br/>Port 5432)]
LocalIAM --> LocalRedis[(Redis<br/>Docker<br/>Port 6379)]
style LocalIAM fill:#e1f5fe
style LocalDB fill:#f3e5f5
style LocalRedis fill:#fff3e0
```
### Docker Compose (Multi-Service)
```mermaid
graph TB
subgraph "Docker Compose Network"
Traefik[Traefik Gateway<br/>Port 80/443]
Traefik --> IAMService[IAM Service<br/>Port 3001]
Traefik --> ProductService[Product Service<br/>Port 3002]
Traefik --> OrderService[Order Service<br/>Port 3003]
IAMService --> SharedDB[(PostgreSQL<br/>Port 5432)]
IAMService --> SharedRedis[(Redis<br/>Port 6379)]
ProductService --> SharedDB
ProductService --> SharedRedis
OrderService --> SharedDB
OrderService --> SharedRedis
end
style Traefik fill:#ffecb3
style IAMService fill:#e1f5fe
style SharedDB fill:#f3e5f5
style SharedRedis fill:#fff3e0
```
### Kubernetes (Production)
```mermaid
graph TB
subgraph "Ingress Layer"
Ingress[Ingress Controller<br/>NGINX/Traefik<br/>TLS Termination]
end
subgraph "Application Layer"
IAMPod1[IAM Pod 1<br/>Replica 1]
IAMPod2[IAM Pod 2<br/>Replica 2]
IAMPod3[IAM Pod 3<br/>Replica 3]
IAMService[IAM Service<br/>ClusterIP]
end
subgraph "Data Layer"
PostgreSQL[(PostgreSQL<br/>StatefulSet<br/>Persistent Volume)]
Redis[(Redis<br/>StatefulSet<br/>Sentinel HA)]
end
subgraph "Observability"
Prometheus[Prometheus<br/>Metrics]
Jaeger[Jaeger<br/>Tracing]
Loki[Loki<br/>Logs]
end
Ingress --> IAMService
IAMService --> IAMPod1
IAMService --> IAMPod2
IAMService --> IAMPod3
IAMPod1 --> PostgreSQL
IAMPod1 --> Redis
IAMPod2 --> PostgreSQL
IAMPod2 --> Redis
IAMPod3 --> PostgreSQL
IAMPod3 --> Redis
IAMPod1 -.-> Prometheus
IAMPod1 -.-> Jaeger
IAMPod1 -.-> Loki
style Ingress fill:#ffecb3
style IAMPod1 fill:#e1f5fe
style IAMPod2 fill:#e1f5fe
style IAMPod3 fill:#e1f5fe
style PostgreSQL fill:#f3e5f5
style Redis fill:#fff3e0
```
### Production Best Practices
1. **High Availability**:
- Multiple IAM service replicas (3+)
- PostgreSQL replication (primary + standby)
- Redis Sentinel for failover
2. **Security**:
- TLS/SSL for all connections
- Network policies for pod-to-pod communication
- Secrets management (HashiCorp Vault, AWS Secrets Manager)
- Non-root containers
3. **Resource Limits**:
```yaml
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi
```
4. **Health Checks**:
```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 3001
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3001
  initialDelaySeconds: 10
  periodSeconds: 5
```
5. **Horizontal Pod Autoscaling**:
```yaml
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 70
```
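
For intuition about these autoscaling settings, the Horizontal Pod Autoscaler's core scaling rule is `ceil(currentReplicas × currentMetric / targetMetric)`, bounded by the minimum and maximum replica counts. The sketch below applies that rule to the values above; the function name is hypothetical, and the real controller additionally applies a tolerance band and stabilization windows that are omitted here.

```typescript
/**
 * Desired replica count for a CPU utilization target, clamped to the
 * configured bounds. Defaults mirror the autoscaling values above.
 */
function desiredReplicas(
  currentReplicas: number,
  currentCpuPercent: number,
  targetCpuPercent: number = 70,
  minReplicas: number = 3,
  maxReplicas: number = 10,
): number {
  const raw = Math.ceil(currentReplicas * (currentCpuPercent / targetCpuPercent));
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}
```

For example, three replicas averaging 140% of their CPU request scale to six, while three replicas at 35% stay at the minimum of three.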
---
## Next Steps
- **User Guides**: See [GETTING-STARTED.md](./guides/01-GETTING-STARTED.md) for setup instructions
- **API Reference**: See [API_REFERENCE.md](./API_REFERENCE.md) for complete endpoint documentation
- **Security Model**: See [SECURITY-MODEL.md](./concepts/SECURITY-MODEL.md) for security details
- **Data Model**: See [DATA-MODEL.md](./concepts/DATA-MODEL.md) for database schema
- **Deployment**: See [DEPLOYMENT-GUIDE.md](./deployment/DEPLOYMENT-GUIDE.md) for deployment instructions
---
## References
- **Security Skill**: [.cursor/skills/security/SKILL.md](../../../.cursor/skills/security/SKILL.md)
- **IAM Proposal**: [docs/en/architecture/iam-proposal.md](../../../docs/en/architecture/iam-proposal.md)
- **Migration Guide**: [docs/en/guides/iam-migration.md](../../../docs/en/guides/iam-migration.md)
- **Prisma Schema**: [prisma/schema.prisma](../prisma/schema.prisma)
- **Routes Definition**: [src/routes/index.ts](../src/routes/index.ts)
---
**Last Updated**: January 2026
**Version**: 1.0.0
**Status**: Production Ready