# IAM Service Architecture

> **Enterprise-Grade Identity & Access Management Platform**
>
> This document describes the complete architecture of the IAM Service, including system components, data flows, security model, and integration patterns.

## Table of Contents

- [Overview](#overview)
- [Overall Architecture](#overall-architecture)
- [Authentication Flows](#authentication-flows)
- [Authorization Model](#authorization-model)
- [Caching Strategy](#caching-strategy)
- [Module Dependencies](#module-dependencies)
- [Data Architecture](#data-architecture)
- [Security Architecture](#security-architecture)
- [Observability](#observability)
- [Deployment Architecture](#deployment-architecture)

---

## Overview

The IAM Service is a comprehensive Identity and Access Management platform. Its capabilities are summarized below.

### Core Capabilities

| Capability | Description | Modules |
|------------|-------------|---------|
| **Authentication** | User identity verification | Auth, Social, OIDC, MFA |
| **Authorization** | Access control and permissions | RBAC, ABAC, Policy Engine |
| **Identity Management** | User lifecycle and profiles | Users, Profiles, Organizations, Groups |
| **Access Governance** | Request workflows and reviews | Requests, Reviews, Analytics |
| **Compliance** | Regulatory compliance reporting | GDPR, SOC2, ISO27001, Risk Management |
| **Security** | Zero-trust and threat protection | Zero-Trust, Audit, Encryption |
| **Performance** | High-speed access with caching | Multi-layer Cache, Connection Pool |

### Key Metrics

- **50+ API Endpoints** across 10 feature modules
- **30+ Database Models** with comprehensive relationships
- **7 Authentication Methods** (password, social x3, OIDC, MFA x2: TOTP and WebAuthn, with WebAuthn not yet fully implemented)
- **Multi-layer Caching** (Memory → Redis → PostgreSQL)
- **Zero-Trust Security** with device fingerprinting
- **Enterprise Compliance** (GDPR, SOC2, ISO27001, HIPAA)

---

## Overall Architecture

The IAM Service follows a layered architecture pattern with clear separation of concerns:

```mermaid
graph TB
    subgraph "Client Layer"
        WebApp[Web Application<br/>React/Vue/Angular]
        MobileApp[Mobile App<br/>iOS/Android/React Native]
        Service[Other Microservices<br/>Product/Order/Payment]
    end

    subgraph "API Gateway"
        Traefik[Traefik Gateway<br/>Load Balancer + Routing<br/>Port 80/443]
    end

    subgraph "IAM Service - Port 3001"
        subgraph "Middleware Stack"
            Correlation[Correlation ID<br/>Request Tracking]
            Logger[Request Logger<br/>Winston]
            ZeroTrust[Zero-Trust Validator<br/>Device + Location + Behavior]
            RateLimit[Dynamic Rate Limiter<br/>Redis-backed<br/>50-1000 req/15min]
            Auth[Authentication<br/>JWT Verification]
            RBAC[Authorization<br/>RBAC + ABAC]
        end

        subgraph "Core Authentication Modules"
            AuthModule[Auth Module<br/>Login/Register/Logout]
            TokenModule[Token Module<br/>JWT + Cookie Management]
            SessionModule[Session Module<br/>Device Fingerprinting]
            SocialModule[Social Auth<br/>Google/FB/GitHub]
            OIDCModule[OIDC Provider<br/>OpenID Connect]
            MFAModule[MFA Module<br/>TOTP/WebAuthn]
            RBACModule[RBAC Module<br/>Roles + Permissions]
        end

        subgraph "Identity Management Modules"
            UserModule[User Lifecycle<br/>CRUD + Bulk Ops]
            ProfileModule[Profile Management<br/>Avatar + Custom Fields]
            VerificationModule[Identity Verification<br/>Email/Phone/Document]
            OrgModule[Organizations<br/>Multi-tenant Hierarchy]
            GroupModule[Groups<br/>Group-based Access]
        end

        subgraph "Access Management Modules"
            RequestModule[Access Requests<br/>Approval Workflows]
            ReviewModule[Access Reviews<br/>Certification Campaigns]
            AnalyticsModule[Access Analytics<br/>Usage + Risk Analysis]
        end

        subgraph "Governance Modules"
            ComplianceModule[Compliance<br/>GDPR/SOC2/ISO27001]
            PolicyModule[Policy Governance<br/>Templates + Versioning]
            RiskModule[Risk Management<br/>Scoring + Detection]
            ReportingModule[Reporting<br/>Dashboards + Exports]
        end

        subgraph "Core Services"
            CacheService[Multi-layer Cache<br/>Memory → Redis<br/>TTL: 60s-15min]
            AuditService[Audit Logging<br/>Event Sourcing<br/>AuthEvent Model]
            SecurityService[Security Services<br/>Encryption + Hashing]
        end
    end

    subgraph "Data Layer"
        PostgreSQL[(PostgreSQL 14+<br/>Primary Database<br/>Connection Pool<br/>30+ Models)]
        Redis[(Redis 6+<br/>Cache + Sessions<br/>Rate Limits + Locks)]
    end

    subgraph "External Services (Future)"
        EmailService[Email Service<br/>Verification + Notifications]
        SMSService[SMS Service<br/>OTP + Phone Verification]
        FileStorage[File Storage<br/>S3/GCS for Avatars]
    end

    subgraph "Observability Stack"
        Prometheus[Prometheus<br/>Metrics Collection]
        Jaeger[Jaeger<br/>Distributed Tracing]
        Logs[Structured Logs<br/>Winston + Loki]
    end

    %% Client to Gateway
    WebApp --> Traefik
    MobileApp --> Traefik
    Service --> Traefik

    %% Gateway to Middleware
    Traefik --> Correlation
    Correlation --> Logger
    Logger --> ZeroTrust
    ZeroTrust --> RateLimit
    RateLimit --> Auth
    Auth --> RBAC

    %% Middleware to Modules
    RBAC --> AuthModule
    RBAC --> UserModule
    RBAC --> RequestModule
    RBAC --> ComplianceModule

    %% Module Dependencies
    AuthModule --> TokenModule
    AuthModule --> SessionModule
    AuthModule --> SocialModule
    AuthModule --> OIDCModule
    AuthModule --> MFAModule
    AuthModule --> RBACModule

    UserModule --> ProfileModule
    UserModule --> VerificationModule
    UserModule --> OrgModule
    UserModule --> GroupModule

    RequestModule --> ReviewModule
    RequestModule --> AnalyticsModule

    ComplianceModule --> PolicyModule
    ComplianceModule --> RiskModule
    ComplianceModule --> ReportingModule

    %% Core Services
    AuthModule --> CacheService
    UserModule --> CacheService
    RBACModule --> CacheService

    AuthModule --> AuditService
    RequestModule --> AuditService
    ComplianceModule --> AuditService

    AuthModule --> SecurityService

    %% Data Layer
    CacheService --> Redis
    CacheService --> PostgreSQL

    AuthModule --> PostgreSQL
    UserModule --> PostgreSQL
    RequestModule --> PostgreSQL
    ComplianceModule --> PostgreSQL

    %% External Services (dashed - not yet integrated)
    VerificationModule -.-> EmailService
    VerificationModule -.-> SMSService
    ProfileModule -.-> FileStorage

    %% Observability
    AuthModule -.-> Prometheus
    UserModule -.-> Prometheus
    RequestModule -.-> Prometheus
    ComplianceModule -.-> Prometheus

    CacheService -.-> Jaeger
    AuditService -.-> Logs

    style ZeroTrust fill:#ff9999
    style CacheService fill:#99ccff
    style PostgreSQL fill:#cc99ff
    style Redis fill:#ffcc99
    style AuditService fill:#99ff99
```

### Architecture Highlights

1. **Layered Middleware**: Requests to protected endpoints pass through six middleware layers (correlation ID, logging, zero-trust, rate limiting, authentication, authorization)
2. **Modular Design**: 10 independent feature modules with clear boundaries
3. **Multi-layer Caching**: In-memory (60s TTL) → Redis (5-15min TTL) → PostgreSQL, so repeated reads of users, roles, and permissions can be served without a database query
4. **Event Sourcing**: All authentication and authorization events are logged for audit compliance
5. **Zero-Trust Security**: Continuous validation of device, location, and behavior
6. **Dynamic Rate Limiting**: Role-based limits (50-1000 req/15min)
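Role-based rate limiting can be sketched as a fixed-window counter keyed by client and window. This is a minimal illustration, not the service's `rate-limit.middleware.ts`: an in-memory `Map` stands in for Redis (where the same idea is usually an `INCR` plus `EXPIRE` on a per-window key), and the role-to-limit table is hypothetical apart from the 50 and 1000 bounds stated above.

```typescript
// Fixed-window rate limiter sketch. The Map stands in for Redis; in Redis the
// same logic is typically INCR on a per-window key followed by EXPIRE.
const WINDOW_MS = 15 * 60 * 1000; // 15-minute window, as documented

// Hypothetical role-to-limit table; only the 50 and 1000 bounds come from the docs.
const LIMITS_BY_ROLE: Record<string, number> = {
  ANONYMOUS: 50,
  USER: 200,
  ADMIN: 1000,
};

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number; // epoch ms when the current window ends
}

class FixedWindowRateLimiter {
  private counters = new Map<string, number>();

  check(clientId: string, role: string, nowMs: number = Date.now()): RateLimitResult {
    const limit = LIMITS_BY_ROLE[role] ?? LIMITS_BY_ROLE.ANONYMOUS;
    const windowStart = Math.floor(nowMs / WINDOW_MS) * WINDOW_MS;
    const key = `ratelimit:${clientId}:${windowStart}`;

    const count = (this.counters.get(key) ?? 0) + 1;
    this.counters.set(key, count);
    // Old windows are never pruned here; a Redis EXPIRE handles that in practice.

    return {
      allowed: count <= limit,
      remaining: Math.max(0, limit - count),
      resetAt: windowStart + WINDOW_MS,
    };
  }
}
```

A fixed window is the simplest scheme but allows a burst of up to twice the limit around a window boundary; sliding-window variants trade extra bookkeeping for smoother enforcement.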

---

## Authentication Flows

### Password-Based Authentication

Standard email/password authentication with bcrypt hashing and JWT token generation:

```mermaid
sequenceDiagram
    participant Client
    participant Middleware
    participant AuthService
    participant RBACService
    participant Database
    participant Redis
    participant AuditService

    Client->>Middleware: POST /api/v1/auth/login<br/>{email, password}

    Middleware->>Middleware: 1. Correlation ID
    Middleware->>Middleware: 2. Zero-Trust Validation<br/>(Device + IP + Behavior)
    Middleware->>Middleware: 3. Rate Limit Check<br/>(5 logins/15min)

    Middleware->>AuthService: login(email, password)

    AuthService->>Database: Find user by email
    Database-->>AuthService: User record

    AuthService->>AuthService: Verify password<br/>(bcrypt compare, cost=12)

    alt Password Invalid
        AuthService->>AuditService: Log LOGIN_FAILED event
        AuditService->>Database: Save AuthEvent
        AuthService-->>Client: 401 Unauthorized<br/>"Invalid credentials"
    else Password Valid & MFA Disabled
        AuthService->>RBACService: Get user roles + permissions
        RBACService->>Database: Query UserRole, UserPermission
        Database-->>RBACService: Roles + Permissions
        RBACService-->>AuthService: ["ADMIN"], ["users:read:all"]

        AuthService->>AuthService: Generate JWT tokens<br/>Access: 15min<br/>Refresh: 7 days

        AuthService->>Database: Save refresh token
        AuthService->>Redis: Cache user data<br/>TTL: 15min
        AuthService->>Redis: Cache permissions<br/>TTL: 5min
        AuthService->>Database: Create session<br/>(device fingerprint)

        AuthService->>Database: Update lastLoginAt, loginCount
        AuthService->>AuditService: Log LOGIN_SUCCESS event

        AuthService-->>Client: 200 OK<br/>{user, tokens}<br/>Set-Cookie: refresh_token
    else Password Valid & MFA Enabled
        AuthService-->>Client: 200 OK<br/>{mfaRequired: true}
        Note over Client: Prompt for MFA code
    Client->>Middleware: POST /api/v1/mfa/totp/validate<br/>{userId, token: "123456"}
        Note over Middleware,AuthService: MFA verification flow...
    end
```

**Key Security Features:**
- Bcrypt password hashing (cost factor 12 in production)
- Token family tracking for rotation security
- Device fingerprinting for session management
- Zero-trust validation before authentication
- Comprehensive audit logging
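The token step of this flow issues a 15-minute access token and a 7-day refresh token, and the Security Features table lists HS256 as the signing algorithm. The following is a minimal sketch of HS256 signing and verification using only `node:crypto`. The claim names (`typ`, `roles`, and a hypothetical `fam` family ID) and the helper names are assumptions, and production code would normally rely on a maintained JWT library rather than hand-rolled encoding.

```typescript
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

const ACCESS_TTL_S = 15 * 60; // 15-minute access token
const REFRESH_TTL_S = 7 * 24 * 60 * 60; // 7-day refresh token

type Claims = Record<string, unknown> & { sub: string; iat: number; exp: number };

const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

function hmacSha256(input: string, secret: string): Buffer {
  return createHmac("sha256", secret).update(input).digest();
}

// Sign header.payload with HMAC-SHA256 (the JWS "HS256" algorithm).
function signHs256(claims: Claims, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const signingInput = `${header}.${b64url(JSON.stringify(claims))}`;
  return `${signingInput}.${hmacSha256(signingInput, secret).toString("base64url")}`;
}

// Returns the claims if the signature is valid, the header says HS256, and the token has not expired.
function verifyHs256(token: string, secret: string, nowS = Math.floor(Date.now() / 1000)): Claims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  try {
    const expected = hmacSha256(`${header}.${payload}`, secret);
    const actual = Buffer.from(signature, "base64url");
    // timingSafeEqual throws on a length mismatch, so compare lengths first.
    if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
    // Reject tokens whose header names any other algorithm (e.g. "none").
    if (JSON.parse(Buffer.from(header, "base64url").toString("utf8")).alg !== "HS256") return null;
    const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as Claims;
    return typeof claims.exp === "number" && claims.exp > nowS ? claims : null;
  } catch {
    return null; // malformed base64url or JSON
  }
}

function issueTokenPair(userId: string, roles: string[], secret: string, nowS = Math.floor(Date.now() / 1000)) {
  const accessToken = signHs256({ sub: userId, roles, typ: "access", iat: nowS, exp: nowS + ACCESS_TTL_S }, secret);
  // Hypothetical "fam" claim: one ID shared by all refresh tokens rotated from this login.
  const refreshToken = signHs256(
    { sub: userId, typ: "refresh", fam: randomUUID(), iat: nowS, exp: nowS + REFRESH_TTL_S },
    secret,
  );
  return { accessToken, refreshToken };
}
```

Rotation (issuing a new refresh token on each refresh and revoking the old one) and the lookup against the persisted refresh token are outside this sketch.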

### Social Authentication Flow

OAuth 2.0 integration with Google, Facebook, and GitHub:

```mermaid
sequenceDiagram
    participant Client
    participant IAM
    participant Google as Google OAuth
    participant Database
    participant Redis

    Client->>IAM: GET /api/v1/auth/social/google
    IAM->>IAM: Generate state token<br/>(CSRF protection)
    IAM->>Redis: Store state token<br/>TTL: 10min
    IAM-->>Client: 302 Redirect to Google<br/>with state param

    Client->>Google: OAuth consent screen
    Google-->>Client: Authorization code + state

    Client->>IAM: GET /api/v1/auth/social/google/callback<br/>?code=xxx&state=yyy

    IAM->>Redis: Verify state token
    alt State Invalid
        IAM-->>Client: 401 CSRF token invalid
    else State Valid
        IAM->>Google: Exchange code for tokens
        Google-->>IAM: Access token + User profile

        IAM->>Database: Find or create user<br/>by provider + providerId

        alt User Found
            IAM->>Database: Update social account tokens
        else New User
            IAM->>Database: Create user + social account
            IAM->>Database: Assign default role (USER)
        end

        IAM->>IAM: Generate IAM JWT tokens
        IAM->>Redis: Cache user + permissions
        IAM->>Database: Create session

        IAM-->>Client: 302 Redirect to app<br/>with tokens in URL/cookie
    end
```

**Supported Providers:**
- Google OAuth 2.0
- Facebook OAuth
- GitHub OAuth
- Apple Sign-In (future)
- Microsoft OAuth (future)
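CSRF protection in the social flow rests on a random `state` value that is stored server-side with a 10-minute TTL and accepted only once. Below is a minimal sketch, assuming an in-memory `Map` in place of Redis (where `SET ... EX 600` plus an atomic get-and-delete such as `GETDEL`, available from Redis 6.2, would play the same role). The function names are hypothetical.

```typescript
import { randomBytes } from "node:crypto";

const STATE_TTL_MS = 10 * 60 * 1000; // 10 minutes, as in the flow above

interface PendingState {
  provider: string;
  expiresAt: number;
}

// Stand-in for Redis; real keys might look like `oauth:state:<value>`.
const stateStore = new Map<string, PendingState>();

// Called before redirecting to the provider; the value goes into the `state` query parameter.
function createOAuthState(provider: string, nowMs = Date.now()): string {
  const state = randomBytes(32).toString("base64url");
  stateStore.set(state, { provider, expiresAt: nowMs + STATE_TTL_MS });
  return state;
}

// Called on the callback. Deleting on first read makes each state single-use,
// so a replayed callback URL is rejected.
function consumeOAuthState(state: string, provider: string, nowMs = Date.now()): boolean {
  const entry = stateStore.get(state);
  stateStore.delete(state);
  if (!entry) return false;
  return entry.provider === provider && entry.expiresAt > nowMs;
}
```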

### MFA (Multi-Factor Authentication) Flow

TOTP-based two-factor authentication using authenticator apps:

```mermaid
sequenceDiagram
    participant Client
    participant IAM
    participant Authenticator as Authenticator App
    participant Database

    Note over Client,Database: MFA Enrollment Phase

    Client->>IAM: POST /api/v1/mfa/totp/enable
    Note over Client: User must be authenticated

    IAM->>IAM: Generate TOTP secret (32 chars)
    IAM->>IAM: Generate QR code<br/>(otpauth://totp/...)
    IAM-->>Client: {secret, qrCode, backupCodes}

    Client->>Authenticator: Scan QR code
    Authenticator-->>Client: Display 6-digit TOTP

    Client->>IAM: POST /api/v1/mfa/totp/verify<br/>{token: "123456"}

    IAM->>IAM: Verify TOTP token<br/>(30s step, ±1 step tolerance)

    alt Token Valid
        IAM->>Database: Create MFADevice record<br/>(type: TOTP, secret: encrypted)
        IAM->>Database: Update user.mfaEnabled = true
        IAM-->>Client: 200 OK MFA enabled
    else Token Invalid
        IAM-->>Client: 401 Invalid TOTP token
    end

    Note over Client,Database: MFA Login Phase

    Client->>IAM: POST /api/v1/auth/login<br/>{email, password}
    IAM->>Database: Verify credentials

    alt MFA Required
        IAM-->>Client: 200 OK {mfaRequired: true}

        Client->>Authenticator: Get current TOTP
        Authenticator-->>Client: Current 6-digit code

        Client->>IAM: POST /api/v1/mfa/totp/validate<br/>{userId, token}

        IAM->>Database: Get MFADevice for user
        IAM->>IAM: Verify TOTP token

        alt Token Valid
            IAM->>IAM: Generate full JWT tokens
            IAM->>Database: Create session
            IAM->>Database: Update device.lastUsedAt
            IAM-->>Client: 200 OK {user, tokens}
        else Token Invalid
            IAM-->>Client: 401 Invalid MFA token
        end
    end
```

**MFA Features:**
- TOTP (Time-based One-Time Password) via authenticator apps
- QR code generation for easy enrollment
- Backup codes for account recovery
- Multiple MFA devices per user
- WebAuthn/FIDO2 framework (future implementation)
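TOTP verification with a 30-second step and ±1 step of tolerance can be sketched with `node:crypto` alone, following RFC 6238 (HMAC-SHA1, 6 digits). This illustrates the standard algorithm and is not necessarily how `totp.service.ts` implements it (the service may use a library). The 32-character secret is assumed to be base32-encoded, as authenticator apps expect in `otpauth://` URIs.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

// Decode an RFC 4648 base32 string (the format used in otpauth:// URIs).
function base32Decode(input: string): Buffer {
  let bits = 0;
  let value = 0;
  const bytes: number[] = [];
  for (const ch of input.replace(/=+$/, "").toUpperCase()) {
    const idx = BASE32_ALPHABET.indexOf(ch);
    if (idx === -1) throw new Error(`Invalid base32 character: ${ch}`);
    value = ((value << 5) | idx) & 0xffff; // only the low bits are ever needed
    bits += 5;
    if (bits >= 8) {
      bytes.push((value >>> (bits - 8)) & 0xff);
      bits -= 8;
    }
  }
  return Buffer.from(bytes);
}

// HOTP (RFC 4226): HMAC-SHA1 over the 8-byte big-endian counter, then dynamic truncation.
function hotp(key: Buffer, counter: number, digits = 6): string {
  const msg = Buffer.alloc(8);
  msg.writeBigUInt64BE(BigInt(counter));
  const mac = createHmac("sha1", key).update(msg).digest();
  const offset = mac[mac.length - 1] & 0x0f;
  const binary =
    ((mac[offset] & 0x7f) << 24) |
    (mac[offset + 1] << 16) |
    (mac[offset + 2] << 8) |
    mac[offset + 3];
  return String(binary % 10 ** digits).padStart(digits, "0");
}

// TOTP (RFC 6238): the counter is the number of 30-second steps since the Unix epoch.
function generateTotp(secretBase32: string, nowMs = Date.now(), stepS = 30): string {
  return hotp(base32Decode(secretBase32), Math.floor(nowMs / 1000 / stepS));
}

// Accept the current step and `window` steps on either side to tolerate clock drift.
function verifyTotp(secretBase32: string, token: string, nowMs = Date.now(), stepS = 30, window = 1): boolean {
  if (!/^\d{6}$/.test(token)) return false;
  const key = base32Decode(secretBase32);
  const current = Math.floor(nowMs / 1000 / stepS);
  for (let i = -window; i <= window; i++) {
    const counter = current + i;
    if (counter < 0) continue;
    if (timingSafeEqual(Buffer.from(hotp(key, counter)), Buffer.from(token))) return true;
  }
  return false;
}
```

With the RFC 6238 test secret (`GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ`, the base32 form of `12345678901234567890`), `generateTotp` at Unix time 59 yields `287082`, the last six digits of the RFC's published value `94287082`.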

---

## Authorization Model

The IAM Service implements a hybrid authorization model combining RBAC (Role-Based Access Control) and ABAC (Attribute-Based Access Control):

### Authorization Decision Flow

```mermaid
flowchart TD
    Start[Request to Protected Resource] --> Auth{Authenticated?}

    Auth -->|No| Deny401[401 Unauthorized<br/>Authentication required]
    Auth -->|Yes| Cache{Permission<br/>in Cache?}

    Cache -->|Hit| CacheValue{Cache<br/>Value?}
    CacheValue -->|Allow| Allow[Access Granted]
    CacheValue -->|Deny| Deny403Cache[403 Forbidden<br/>Cached denial]

    Cache -->|Miss| LoadPerms[Load Permissions<br/>from Database]

    LoadPerms --> DirectPerms{Has Direct<br/>User Permission?}

    DirectPerms -->|Explicit Deny| Deny403A[403 Forbidden<br/>Explicit permission denial]
    DirectPerms -->|Explicit Allow| Allow
    DirectPerms -->|None| RolePerms{Has Role<br/>Permission?}

    RolePerms -->|Yes| CheckExpiry{Role<br/>Expired?}
    RolePerms -->|No| GroupPerms{Has Group<br/>Permission?}

    CheckExpiry -->|Yes| GroupPerms
    CheckExpiry -->|No| Allow

    GroupPerms -->|Yes| Allow
    GroupPerms -->|No| ABACCheck{ABAC Policy<br/>Exists?}

    ABACCheck -->|No Policies| DefaultDeny[403 Forbidden<br/>Default deny - No permissions]
    ABACCheck -->|Has Policies| EvaluatePolicies[Evaluate Policies<br/>by Priority]

    EvaluatePolicies --> EvalConditions{Policy<br/>Conditions?}

    EvalConditions -->|Time-based| CheckTime{Current time<br/>in range?}
    EvalConditions -->|Location-based| CheckLocation{IP in<br/>allowed range?}
    EvalConditions -->|Attribute-based| CheckAttrs{Attributes<br/>match?}

    CheckTime -->|Yes| PolicyEffect{Policy<br/>Effect?}
    CheckTime -->|No| NextPolicy{More<br/>Policies?}

    CheckLocation -->|Yes| PolicyEffect
    CheckLocation -->|No| NextPolicy

    CheckAttrs -->|Yes| PolicyEffect
    CheckAttrs -->|No| NextPolicy

    PolicyEffect -->|ALLOW| Allow
    PolicyEffect -->|DENY| Deny403Policy[403 Forbidden<br/>Policy denial]

    NextPolicy -->|Yes| EvaluatePolicies
    NextPolicy -->|No| DefaultDeny

    Allow --> CacheResult[Cache Result<br/>TTL: 5min]
    CacheResult --> AuditLog[Log Access Event<br/>to AuthEvent]
    AuditLog --> Success[200 OK<br/>Process request]

    style Auth fill:#e1f5fe
    style DirectPerms fill:#fff3e0
    style RolePerms fill:#f3e5f5
    style GroupPerms fill:#e8f5e9
    style ABACCheck fill:#fce4ec
    style Allow fill:#c8e6c9
    style Deny401 fill:#ffcdd2
    style Deny403A fill:#ffcdd2
    style Deny403Cache fill:#ffcdd2
    style Deny403Policy fill:#ffcdd2
    style DefaultDeny fill:#ffcdd2
```

### Permission Model

The IAM Service uses a hierarchical permission model:

**Permission Format**: `resource:action:scope`

- **Resource**: The entity being accessed (e.g., `users`, `products`, `orders`)
- **Action**: The operation being performed (e.g., `read`, `create`, `update`, `delete`)
- **Scope**: The access boundary (e.g., `own`, `team`, `all`)

**Examples**:
- `users:read:all` - Read all users
- `users:update:own` - Update own user profile
- `products:create:team` - Create products for team
- `orders:delete:all` - Delete any order (admin)
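A permission string can be parsed into its three segments and compared against a granted permission. The RBAC diagram in the next section uses `*` in the action position (`users:*:all`), so this sketch treats `*` as a per-segment wildcard. Whether a broader scope such as `all` also satisfies `own` or `team` is not stated here, so the sketch matches scopes exactly. The function names are hypothetical.

```typescript
interface Permission {
  resource: string;
  action: string;
  scope: string;
}

// Parse "resource:action:scope"; any other shape is rejected.
function parsePermission(value: string): Permission {
  const parts = value.split(":");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) {
    throw new Error(`Invalid permission format: "${value}"`);
  }
  const [resource, action, scope] = parts;
  return { resource, action, scope };
}

// A granted segment matches if it equals the required segment or is "*".
function segmentMatches(granted: string, required: string): boolean {
  return granted === "*" || granted === required;
}

function grants(granted: string, required: string): boolean {
  const g = parsePermission(granted);
  const r = parsePermission(required);
  return (
    segmentMatches(g.resource, r.resource) &&
    segmentMatches(g.action, r.action) &&
    segmentMatches(g.scope, r.scope)
  );
}

// True if any granted permission covers the required one.
function hasPermission(grantedList: string[], required: string): boolean {
  return grantedList.some((granted) => grants(granted, required));
}
```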

### Authorization Hierarchy

```
1. Direct User Permissions (HIGHEST PRIORITY)
   ↓ (if no direct permission; an explicit allow or deny ends evaluation)
2. Role Permissions
   ↓ (if not found)
3. Group Permissions
   ↓ (if not found)
4. ABAC Policies (evaluated by priority)
   ↓ (if no policies match)
5. Default Deny (LOWEST PRIORITY)
```
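The precedence above (direct user permissions, then non-expired roles, then groups, then ABAC policies, then default deny) can be expressed as a single function that follows the decision flowchart. This is a simplified sketch, assuming permissions are already loaded into memory and compared by exact string equality. The types and field names are hypothetical and only loosely mirror the `UserPermission`, `UserRole`, and `GroupPermission` models; ABAC conditions are passed in as callbacks.

```typescript
type Effect = "ALLOW" | "DENY";

interface DirectPermission { permission: string; effect: Effect }
interface RoleAssignment { role: string; permissions: string[]; expiresAt?: Date }
interface AbacPolicy {
  priority: number;
  effect: Effect;
  conditionsMet: (ctx: Record<string, unknown>) => boolean;
}
interface Subject {
  directPermissions: DirectPermission[];
  roles: RoleAssignment[];
  groupPermissions: string[];
}
interface Decision { allowed: boolean; reason: string }

function authorize(
  subject: Subject,
  required: string,
  policies: AbacPolicy[],
  ctx: Record<string, unknown>,
  now: Date = new Date(),
): Decision {
  // 1. Direct user permissions: an explicit DENY wins, then an explicit ALLOW.
  const direct = subject.directPermissions.filter((p) => p.permission === required);
  if (direct.some((p) => p.effect === "DENY")) return { allowed: false, reason: "direct deny" };
  if (direct.length > 0) return { allowed: true, reason: "direct allow" };

  // 2. Role permissions, ignoring role assignments that have expired.
  const activeRoles = subject.roles.filter((r) => !r.expiresAt || r.expiresAt > now);
  if (activeRoles.some((r) => r.permissions.includes(required))) return { allowed: true, reason: "role" };

  // 3. Group permissions.
  if (subject.groupPermissions.includes(required)) return { allowed: true, reason: "group" };

  // 4. ABAC policies, highest priority first; the first policy whose conditions hold decides.
  const ordered = [...policies].sort((a, b) => b.priority - a.priority);
  for (const policy of ordered) {
    if (policy.conditionsMet(ctx)) {
      return { allowed: policy.effect === "ALLOW", reason: `policy ${policy.effect}` };
    }
  }

  // 5. Default deny.
  return { allowed: false, reason: "default deny" };
}
```

Note that in this ordering an ABAC `DENY` policy cannot revoke access already granted by a role or group; only a direct user permission can override role grants.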

### RBAC (Role-Based Access Control)

```mermaid
graph LR
    User[User] --> UserRole1[UserRole]
    User --> UserRole2[UserRole]
    User --> UserPerm[UserPermission<br/>Direct Override]

    UserRole1 --> Role1[ADMIN Role]
    UserRole2 --> Role2[MANAGER Role]

    Role1 --> RolePerm1[RolePermission]
    Role1 --> RolePerm2[RolePermission]
    Role2 --> RolePerm3[RolePermission]

    RolePerm1 --> Perm1[users:*:all]
    RolePerm2 --> Perm2[products:*:all]
    RolePerm3 --> Perm3[orders:read:team]

    UserPerm --> Perm4[analytics:read:all]

    Group[Group] --> GroupMember[GroupMember]
    GroupMember --> User
    Group --> GroupPerm[GroupPermission]
    GroupPerm --> Perm5[reports:read:all]

    style User fill:#e1f5fe
    style Role1 fill:#f3e5f5
    style Role2 fill:#f3e5f5
    style Group fill:#e8f5e9
    style Perm1 fill:#fff3e0
    style Perm2 fill:#fff3e0
    style Perm3 fill:#fff3e0
    style Perm4 fill:#ffccbc
    style Perm5 fill:#c8e6c9
```

**Features:**
- Multiple roles per user
- Temporary role assignments with expiration
- Direct user permissions (can override roles)
- Group-based permissions
- Permission caching (5 minutes TTL)

### ABAC (Attribute-Based Access Control)

```mermaid
graph TD
    Request[Access Request] --> PolicyEngine[Policy Engine]

    PolicyEngine --> LoadPolicies[Load Policies<br/>for Resource]
    LoadPolicies --> SortPolicies[Sort by Priority<br/>Descending]

    SortPolicies --> EvaluatePolicy[Evaluate Policy<br/>Conditions]

    EvaluatePolicy --> TimeCondition{Time-based<br/>Condition?}
    EvaluatePolicy --> LocationCondition{Location-based<br/>Condition?}
    EvaluatePolicy --> AttributeCondition{Attribute-based<br/>Condition?}

    TimeCondition --> CheckTime[Check current time<br/>vs allowed hours]
    LocationCondition --> CheckIP[Check IP address<br/>vs allowed ranges]
    AttributeCondition --> CheckAttrs[Check user attributes<br/>vs required values]

    CheckTime --> ConditionMet{All Conditions<br/>Met?}
    CheckIP --> ConditionMet
    CheckAttrs --> ConditionMet

    ConditionMet -->|Yes| ApplyEffect{Policy<br/>Effect?}
    ConditionMet -->|No| NextPolicy{More<br/>Policies?}

    ApplyEffect -->|ALLOW| Allow[Access Granted]
    ApplyEffect -->|DENY| Deny[Access Denied]

    NextPolicy -->|Yes| EvaluatePolicy
    NextPolicy -->|No| DefaultDeny[Default Deny]

    style PolicyEngine fill:#f3e5f5
    style ConditionMet fill:#fff3e0
    style Allow fill:#c8e6c9
    style Deny fill:#ffcdd2
    style DefaultDeny fill:#ffcdd2
```

**Policy Example (JSON Logic)**:
```json
{
  "name": "Business Hours Only",
  "resource": "sensitive_data",
  "condition": {
    "and": [
      {
        ">=": [{"var": "hour"}, 9]
      },
      {
        "<=": [{"var": "hour"}, 17]
      },
      {
        "in": [{"var": "day"}, [1, 2, 3, 4, 5]]
      }
    ]
  },
  "effect": "ALLOW",
  "priority": 100
}
```
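The policy above is written in JSON Logic. The service may evaluate it with an existing JSON Logic library; the sketch below instead implements only the operators the example uses (`and`, `>=`, `<=`, `in`, and `var`) to show how a condition is checked against a request context. It assumes `hour` is the 0-23 hour and `day` follows JavaScript's `Date#getDay()` numbering (0 = Sunday), so `[1, 2, 3, 4, 5]` means Monday to Friday; the policy itself does not state either convention.

```typescript
type Context = Record<string, unknown>;

// Evaluate a JSON Logic rule, supporting only the operators used in the example policy.
function evaluate(rule: unknown, ctx: Context): unknown {
  // Arrays evaluate element-wise; numbers, strings, booleans, and null are literals.
  if (Array.isArray(rule)) return rule.map((item) => evaluate(item, ctx));
  if (rule === null || typeof rule !== "object") return rule;

  const entries = Object.entries(rule);
  if (entries.length !== 1) throw new Error("A JSON Logic rule must have exactly one operator");
  const [op, rawArgs] = entries[0];
  const args: unknown[] = Array.isArray(rawArgs) ? rawArgs : [rawArgs];
  const evalArgs = () => args.map((a) => evaluate(a, ctx));

  switch (op) {
    case "var":
      return ctx[String(args[0])];
    case "and": // simplified: returns a boolean rather than JSON Logic's last-evaluated value
      return args.every((a) => Boolean(evaluate(a, ctx)));
    case ">=": {
      const [a, b] = evalArgs();
      return Number(a) >= Number(b);
    }
    case "<=": {
      const [a, b] = evalArgs();
      return Number(a) <= Number(b);
    }
    case "in": {
      const [needle, haystack] = evalArgs();
      return Array.isArray(haystack) && haystack.includes(needle);
    }
    default:
      throw new Error(`Unsupported operator: ${op}`);
  }
}

// The condition from the "Business Hours Only" policy above.
const businessHoursCondition = {
  and: [
    { ">=": [{ var: "hour" }, 9] },
    { "<=": [{ var: "hour" }, 17] },
    { in: [{ var: "day" }, [1, 2, 3, 4, 5]] },
  ],
};
```

Because the upper bound is `<= 17`, the whole 17:00-17:59 hour passes; a policy meant to end access at 17:00 would use `<` with `17`.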

---

## Caching Strategy

The IAM Service implements a multi-layer caching strategy: hot data is served from process memory, then from Redis, with PostgreSQL as the source of truth:

```mermaid
graph TB
    Request[Incoming Request] --> CheckL1{L1 Cache<br/>Node.js Memory}

    CheckL1 -->|Cache Hit| L1Hit[Fast Response<br/>< 1ms latency]
    CheckL1 -->|Cache Miss| CheckL2{L2 Cache<br/>Redis}

    CheckL2 -->|Cache Hit| L2Hit[Medium Response<br/>< 10ms latency]
    CheckL2 -->|Cache Miss| Database[(PostgreSQL<br/>Source of Truth)]

    Database --> UpdateL2[Update L2 Cache<br/>Write to Redis]
    UpdateL2 --> UpdateL1[Update L1 Cache<br/>Write to Memory]

    UpdateL1 --> Response[Return Response]
    L2Hit --> UpdateL1
    L1Hit --> Response

    subgraph "Cache Layers"
        L1Cache[L1: In-Memory Cache<br/>node-cache<br/>TTL: 60 seconds<br/>Hot data only]
        L2Cache[L2: Redis Cache<br/>ioredis<br/>TTL: 5-15 minutes<br/>Distributed]
        DBLayer[L3: PostgreSQL<br/>Prisma ORM<br/>Source of truth<br/>Persistent]
    end

    subgraph "Cached Data Types"
        UserData[User Data<br/>TTL: 15min]
        Permissions[Permissions<br/>TTL: 5min]
        Tokens[Token Validation<br/>TTL: Token lifetime]
        Sessions[Sessions<br/>TTL: Session lifetime]
        Roles[User Roles<br/>TTL: 5min]
    end

    style L1Hit fill:#90EE90
    style L2Hit fill:#87CEEB
    style Database fill:#FFB6C1
    style L1Cache fill:#c8e6c9
    style L2Cache fill:#bbdefb
    style DBLayer fill:#f8bbd0
```

### Cache Configuration

| Data Type | L1 TTL | L2 TTL | Invalidation Strategy |
|-----------|--------|--------|----------------------|
| User Data | 60s | 15min | On update/delete |
| Permissions | 60s | 5min | On role/permission change |
| User Roles | 60s | 5min | On role assignment/revocation |
| Token Validation | N/A | Token lifetime | On logout/revocation |
| Sessions | N/A | Session lifetime | On logout |
| Rate Limit Counters | N/A | 15min window | Time-based expiry |

### Cache Invalidation

```typescript
// Example: Invalidate user caches when permissions change
await rbacService.grantPermission(userId, permissionId);

// Automatically invalidates:
// - cacheService.keys.userPermissions(userId)
// - cacheService.keys.userRoles(userId)
```
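The read path and invalidation described in this section can be sketched as a small two-layer cache-aside helper. This is an illustration under stated assumptions, not `cache.service.ts`: both layers are in-memory maps with per-entry expiry, the L2 map stands in for Redis and is synchronous for brevity (a real client such as ioredis returns Promises), and the key formats behind the `keys` helpers are hypothetical, borrowing only the names `userPermissions` and `userRoles` from the snippet above.

```typescript
interface Entry { value: unknown; expiresAt: number }

// A Map whose entries expire after a per-entry TTL.
class TtlMap {
  private store = new Map<string, Entry>();

  get<T>(key: string, now: number): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.store.delete(key);
      return undefined;
    }
    return entry.value as T;
  }

  set(key: string, value: unknown, ttlMs: number, now: number): void {
    this.store.set(key, { value, expiresAt: now + ttlMs });
  }

  delete(key: string): void {
    this.store.delete(key);
  }
}

// Hypothetical key helpers.
const keys = {
  userPermissions: (userId: string) => `user:${userId}:permissions`,
  userRoles: (userId: string) => `user:${userId}:roles`,
};

class TwoLayerCache {
  private l1 = new TtlMap(); // per-process memory
  private l2 = new TtlMap(); // stand-in for Redis, which would be shared across instances
  private readonly l1TtlMs: number;

  constructor(l1TtlMs = 60_000) {
    this.l1TtlMs = l1TtlMs;
  }

  // Cache-aside read: L1, then L2 (refilling L1), then the loader (refilling both).
  getOrLoad<T>(key: string, l2TtlMs: number, load: () => T, now = Date.now()): T {
    const fromL1 = this.l1.get<T>(key, now);
    if (fromL1 !== undefined) return fromL1;
    const fromL2 = this.l2.get<T>(key, now);
    if (fromL2 !== undefined) {
      this.l1.set(key, fromL2, this.l1TtlMs, now);
      return fromL2;
    }
    const value = load();
    this.l2.set(key, value, l2TtlMs, now);
    this.l1.set(key, value, this.l1TtlMs, now);
    return value;
  }

  // Remove keys from both layers. Other instances' L1 copies are untouched
  // here and only disappear when their L1 TTL runs out.
  invalidate(...cacheKeys: string[]): void {
    for (const key of cacheKeys) {
      this.l1.delete(key);
      this.l2.delete(key);
    }
  }
}
```

The short L1 TTL bounds how long another instance can serve a stale permission set after an invalidation; a pub/sub broadcast of invalidated keys is a common way to shrink that window further.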

---

## Module Dependencies

The IAM Service is organized into 10 feature modules with clear dependencies:

```mermaid
graph TD
    subgraph "Presentation Layer"
        Routes[Routes/Controllers<br/>50+ Endpoints]
    end

    subgraph "Business Logic Layer"
        subgraph "Core Auth"
            AuthModule[Auth Module<br/>Login/Register/Logout]
            TokenModule[Token Module<br/>JWT + Cookies]
            SessionModule[Session Module<br/>Device Tracking]
        end

        subgraph "Extended Auth"
            RBACModule[RBAC Module<br/>Roles + Permissions]
            SocialModule[Social Auth<br/>Google/FB/GitHub]
            OIDCModule[OIDC Module<br/>Provider + Client]
            MFAModule[MFA Module<br/>TOTP/WebAuthn]
        end

        subgraph "Identity"
            IdentityModule[Identity Module<br/>Users + Profiles]
            OrgModule[Organization Module<br/>Multi-tenancy]
            GroupModule[Group Module<br/>Group Permissions]
        end

        subgraph "Access Governance"
            AccessModule[Access Module<br/>Requests + Reviews]
            AnalyticsModule[Analytics Module<br/>Usage Analysis]
        end

        subgraph "Compliance"
            GovernanceModule[Governance Module<br/>Compliance + Risk]
        end
    end

    subgraph "Core Services Layer"
        CacheService[Cache Service<br/>Multi-layer]
        AuditService[Audit Service<br/>Event Sourcing]
        SecurityService[Security Service<br/>Encryption]
        FeatureService[Feature Flags<br/>Runtime Config]
    end

    subgraph "Data Access Layer"
        Repositories[Repositories<br/>Data Access<br/>Prisma ORM]
    end

    subgraph "Infrastructure"
        Database[(PostgreSQL<br/>30+ Models)]
        Redis[(Redis<br/>Cache + Locks)]
    end

    %% Routes to Modules
    Routes --> AuthModule
    Routes --> IdentityModule
    Routes --> AccessModule
    Routes --> GovernanceModule

    %% Core Auth Dependencies
    AuthModule --> TokenModule
    AuthModule --> SessionModule
    AuthModule --> SocialModule
    AuthModule --> OIDCModule
    AuthModule --> MFAModule
    AuthModule --> RBACModule

    %% Identity Dependencies
    IdentityModule --> OrgModule
    IdentityModule --> GroupModule
    IdentityModule --> RBACModule

    %% Access Dependencies
    AccessModule --> AnalyticsModule
    AccessModule --> RBACModule

    %% Governance Dependencies
    GovernanceModule --> RBACModule

    %% Core Services
    AuthModule --> CacheService
    IdentityModule --> CacheService
    RBACModule --> CacheService

    AuthModule --> AuditService
    AccessModule --> AuditService
    GovernanceModule --> AuditService

    AuthModule --> SecurityService

    AuthModule --> FeatureService

    %% Data Access
    AuthModule --> Repositories
    IdentityModule --> Repositories
    AccessModule --> Repositories
    GovernanceModule --> Repositories
    RBACModule --> Repositories

    Repositories --> Database
    CacheService --> Redis
    CacheService --> Database

    style AuthModule fill:#e1f5fe
    style RBACModule fill:#f3e5f5
    style IdentityModule fill:#fff3e0
    style AccessModule fill:#e8f5e9
    style GovernanceModule fill:#fce4ec
    style CacheService fill:#bbdefb
    style AuditService fill:#c8e6c9
    style SecurityService fill:#ffccbc
```

### Module Descriptions

| Module | Responsibility | Key Files |
|--------|---------------|-----------|
| **Auth** | User authentication and token management | `auth.service.ts`, `auth.controller.ts` |
| **Token** | JWT generation, validation, and refresh | `jwt.service.ts`, `cookie.service.ts` |
| **Session** | Session lifecycle and device management | `session.service.ts` |
| **RBAC** | Role and permission management | `rbac.service.ts`, `rbac.middleware.ts` |
| **Social** | OAuth integration with external providers | `social.service.ts`, `google.strategy.ts` |
| **OIDC** | OpenID Connect provider and client | `oidc-provider.service.ts` |
| **MFA** | Multi-factor authentication | `mfa.service.ts`, `totp.service.ts` |
| **Identity** | User lifecycle and profile management | `user.service.ts`, `profile.service.ts` |
| **Organization** | Multi-tenant organization support | `organization.service.ts` |
| **Group** | Group-based access control | `group.service.ts` |
| **Access** | Access request workflows and reviews | `request.service.ts`, `review.service.ts` |
| **Analytics** | Access analytics and reporting | `analytics.service.ts` |
| **Governance** | Compliance, policy, and risk management | `compliance.service.ts`, `risk.service.ts` |
| **Cache** | Multi-layer caching (Memory + Redis) | `cache.service.ts` |
| **Audit** | Event sourcing and audit logging | `audit.service.ts` |
| **Security** | Encryption, hashing, zero-trust | `zero-trust.validator.ts` |

---

## Data Architecture

The IAM Service uses PostgreSQL with 30+ Prisma models. See [DATA-MODEL.md](./concepts/DATA-MODEL.md) for the complete entity relationship diagram.

### Model Categories

1. **Core Authentication** (7 models):
   - User, Session, RefreshToken, AuthEvent, SocialAccount, MFADevice, Policy

2. **Authorization** (6 models):
   - Role, Permission, UserRole, RolePermission, UserPermission, GroupPermission

3. **Identity Management** (5 models):
   - Organization, Group, GroupMember, UserProfile, IdentityVerification

4. **Access Management** (4 models):
   - AccessRequest, AccessRequestApprover, AccessReview, AccessReviewItem

5. **Governance** (3 models):
   - ComplianceReport, PolicyTemplate, RiskScore

### Key Relationships

```
User (1) ─── (*) UserRole ─── (*) Role ─── (*) RolePermission ─── (*) Permission
User (1) ─── (*) UserPermission ─── (*) Permission
User (1) ─── (*) Session
User (1) ─── (1) UserProfile
User (1) ─── (*) AccessRequest
Organization (1) ─── (*) User
Organization (1) ─── (*) Group ─── (*) GroupMember ─── (*) User
```

---

## Security Architecture

The IAM Service implements defense-in-depth security with multiple layers:

### Security Layers

```mermaid
graph TB
    Request[Incoming Request] --> Layer1[Layer 1: Network Security<br/>Traefik Gateway + TLS]

    Layer1 --> Layer2[Layer 2: Zero-Trust Validation<br/>Device + Location + Behavior]

    Layer2 --> Layer3[Layer 3: Rate Limiting<br/>Dynamic by Role<br/>50-1000 req/15min]

    Layer3 --> Layer4[Layer 4: Authentication<br/>JWT Validation<br/>Token Expiry: 15min]

    Layer4 --> Layer5[Layer 5: Authorization<br/>RBAC + ABAC<br/>Permission Checking]

    Layer5 --> Layer6[Layer 6: Input Validation<br/>Zod Schemas<br/>Sanitization]

    Layer6 --> Layer7[Layer 7: Audit Logging<br/>Event Sourcing<br/>All Actions Logged]

    Layer7 --> ProcessRequest[Process Request]

    style Layer1 fill:#ffccbc
    style Layer2 fill:#ff9999
    style Layer3 fill:#ffb74d
    style Layer4 fill:#fff176
    style Layer5 fill:#aed581
    style Layer6 fill:#4dd0e1
    style Layer7 fill:#9575cd
    style ProcessRequest fill:#c8e6c9
```

### Security Features

| Feature | Implementation | Location |
|---------|---------------|----------|
| **Zero-Trust** | Device fingerprinting, location, behavior analysis | `zero-trust.validator.ts` |
| **Password Hashing** | bcrypt (cost factor 12) | `auth.service.ts:43` |
| **Token Security** | JWT with HS256, 15min expiry, token rotation | `jwt.service.ts` |
| **CSRF Protection** | State tokens for OAuth, SameSite cookies | `cookie.service.ts` |
| **Rate Limiting** | Redis-backed, dynamic by role | `rate-limit.middleware.ts` |
| **Input Validation** | Zod schemas, sanitization | `validation.middleware.ts` |
| **Audit Logging** | Event sourcing (AuthEvent model) | `audit.service.ts` |
| **Session Security** | Device fingerprinting, IP tracking | `session.service.ts` |
| **MFA** | TOTP (30s step, ±1 step tolerance), backup codes | `mfa.service.ts` |

### Threat Mitigation

| Threat | Mitigation |
|--------|-----------|
| **Brute Force** | Login rate limiting (5 attempts/15min), account lockout |
| **Token Theft** | Short token lifetime (15min), token rotation, device binding |
| **CSRF** | SameSite cookies, state tokens for OAuth |
| **XSS** | Content Security Policy, HttpOnly cookies |
| **SQL Injection** | Prisma ORM parameterized queries |
| **Session Hijacking** | Device fingerprinting, IP validation |
| **Privilege Escalation** | Strict permission checks, audit logging |
| **Replay Attacks** | Token expiry, nonce for OAuth |

---

## Observability

The IAM Service provides comprehensive observability with metrics, logs, and traces:

### Observability Stack

```mermaid
graph TB
    subgraph "IAM Service"
        Application[Application Code]

        Application --> MetricsCollector[Metrics Collector<br/>Prometheus Format]
        Application --> Logger[Structured Logger<br/>Winston]
        Application --> Tracer[Distributed Tracer<br/>Jaeger Client]
    end

    subgraph "Collection Layer"
        Prometheus[Prometheus<br/>Metrics Storage]
        Loki[Loki<br/>Log Aggregation]
        Jaeger[Jaeger<br/>Trace Storage]
    end

    subgraph "Visualization Layer"
        Grafana[Grafana<br/>Dashboards + Alerts]
    end

    MetricsCollector --> Prometheus
    Logger --> Loki
    Tracer --> Jaeger

    Prometheus --> Grafana
    Loki --> Grafana
    Jaeger --> Grafana

    Grafana --> Alerts[Alert Manager<br/>Notifications]

    style Application fill:#e1f5fe
    style Prometheus fill:#f3e5f5
    style Loki fill:#fff3e0
    style Jaeger fill:#e8f5e9
    style Grafana fill:#fce4ec
```

### Metrics (Prometheus)

**Collected Metrics:**
- HTTP request duration (histogram)
- HTTP request count (counter)
- HTTP response status codes (counter)
- Active sessions (gauge)
- Cache hits and misses (counters; the hit ratio is derived from them at query time)
- Database query duration (histogram)
- Authentication successes and failures (counters; rates are derived at query time)
- Permission check duration (histogram)
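Several of the metrics above are histograms. A Prometheus histogram is exported as cumulative `le` buckets plus `_sum` and `_count` series, and that is what `/metrics` serves. In the service this would normally come from a client library such as prom-client. The dependency-free sketch below only illustrates the semantics. The metric name and bucket boundaries are assumptions, and labels such as method, route, or status code are omitted for brevity.

```typescript
// Minimal sketch of Prometheus histogram semantics in the text exposition format.
class Histogram {
  readonly name: string;
  readonly help: string;
  private readonly buckets: number[];
  private readonly counts: number[];
  private sum = 0;
  private count = 0;

  constructor(name: string, help: string, buckets: number[]) {
    this.name = name;
    this.help = help;
    this.buckets = [...buckets].sort((a, b) => a - b); // upper bounds, ascending
    this.counts = this.buckets.map(() => 0);
  }

  observe(value: number): void {
    this.sum += value;
    this.count += 1;
    // Cumulative buckets: an observation counts toward every bound it does not exceed.
    this.buckets.forEach((le, i) => {
      if (value <= le) this.counts[i] += 1;
    });
  }

  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} histogram`];
    this.buckets.forEach((le, i) => lines.push(`${this.name}_bucket{le="${le}"} ${this.counts[i]}`));
    lines.push(`${this.name}_bucket{le="+Inf"} ${this.count}`); // +Inf bucket equals total count
    lines.push(`${this.name}_sum ${this.sum}`);
    lines.push(`${this.name}_count ${this.count}`);
    return lines.join("\n") + "\n";
  }
}
```

Because buckets are cumulative, Prometheus can estimate latency percentiles from them with `histogram_quantile`, whereas a plain average from `_sum / _count` would hide slow outliers.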

**Endpoints:**
- `/metrics` - Prometheus metrics endpoint
- `/health/live` - Liveness probe
- `/health/ready` - Readiness probe
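The two probe endpoints have different jobs. A common convention is that liveness only reports that the process is responsive, so a database outage does not make Kubernetes restart every pod. Readiness, by contrast, checks dependencies so that a pod is taken out of rotation while PostgreSQL or Redis is unreachable. The sketch below uses plain `node:http` and is not the service's router. The `Check` functions and response body shape are hypothetical placeholders for real PostgreSQL and Redis pings.

```typescript
import { createServer } from "node:http";

// Hypothetical dependency check: resolves true when the dependency answers a ping.
type Check = () => Promise<boolean>;

interface HealthResult {
  status: number;
  body: { status: "ok" | "unavailable"; checks?: Record<string, boolean> };
}

async function health(path: string, checks: Record<string, Check>): Promise<HealthResult | null> {
  // Liveness: no dependency checks, only "the process can answer".
  if (path === "/health/live") return { status: 200, body: { status: "ok" } };
  if (path !== "/health/ready") return null;
  // Readiness: run all checks in parallel; a thrown error counts as a failed check.
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      try {
        return [name, await check()] as const;
      } catch {
        return [name, false] as const;
      }
    }),
  );
  const ok = entries.every(([, passed]) => passed);
  return {
    status: ok ? 200 : 503,
    body: { status: ok ? "ok" : "unavailable", checks: Object.fromEntries(entries) },
  };
}

// Example wiring on a bare node:http server (port and checks supplied by the caller).
function startHealthServer(port: number, checks: Record<string, Check>) {
  return createServer(async (req, res) => {
    const result = await health(req.url ?? "", checks);
    if (result === null) {
      res.writeHead(404).end();
      return;
    }
    res.writeHead(result.status, { "content-type": "application/json" });
    res.end(JSON.stringify(result.body));
  }).listen(port);
}
```

Returning 503 from readiness makes Kubernetes stop routing traffic to the pod without restarting it. That is the behavior the readiness probe configuration under Production Best Practices relies on.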

### Logging (Winston)

**Log Levels:** ERROR, WARN, INFO, DEBUG

**Structured Log Format:**
```json
{
  "level": "info",
  "message": "User logged in",
  "timestamp": "2024-01-01T00:00:00.000Z",
  "correlationId": "req-123-456",
  "userId": "user-789",
  "email": "user@example.com",
  "service": "iam-service"
}
```
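The format above is one JSON object per line with a fixed `service` field and per-event metadata. With Winston this is typically configured through `winston.format.timestamp()`, `winston.format.json()`, and `defaultMeta: { service: "iam-service" }`, but the actual logger setup is not shown in this document. The dependency-free sketch below reproduces the shape and the four-level threshold. The `createLogger` function and its injectable `write` sink are hypothetical.

```typescript
type Level = "error" | "warn" | "info" | "debug";
type Meta = Record<string, unknown>;

// Lower number = more severe, matching ERROR > WARN > INFO > DEBUG.
const SEVERITY: Record<Level, number> = { error: 0, warn: 1, info: 2, debug: 3 };

function createLogger(minLevel: Level, write: (line: string) => void = (line) => console.log(line)) {
  const log = (level: Level) => (message: string, meta: Meta = {}) => {
    // Drop entries less severe than the configured threshold.
    if (SEVERITY[level] > SEVERITY[minLevel]) return;
    write(
      JSON.stringify({
        level,
        message,
        timestamp: new Date().toISOString(),
        ...meta,
        service: "iam-service", // applied last so metadata cannot override it
      }),
    );
  };
  return { error: log("error"), warn: log("warn"), info: log("info"), debug: log("debug") };
}
```

For example, `createLogger("info").info("User logged in", { correlationId: "req-123-456", userId: "user-789" })` prints a line with the same fields as the example above, while `debug` calls are suppressed at the `info` threshold.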

### Tracing (Jaeger)

**Trace Spans:**
- HTTP request handling
- Database queries
- Cache operations
- External API calls
- Authentication flow
- Authorization checks

**Correlation IDs:**
- Every request gets a unique correlation ID
- Propagated across service calls
- Included in all logs and traces
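One way to make a correlation ID available to every log call and outgoing request without threading it through function arguments is Node's `AsyncLocalStorage`, which keeps per-request context across `await` boundaries. The sketch below assumes an `x-correlation-id` header (this document does not name the header) and a 128-character cap on caller-supplied IDs. Both are assumptions, and the helper names are hypothetical.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

const CORRELATION_HEADER = "x-correlation-id"; // assumed header name
const context = new AsyncLocalStorage<{ correlationId: string }>();

// Runs `handler` with a correlation ID: reuse a single, reasonably short incoming value,
// otherwise mint a new UUID. Works for sync and async handlers (returns the handler's result).
function withCorrelation<T>(headers: Record<string, string | string[] | undefined>, handler: () => T): T {
  const incoming = headers[CORRELATION_HEADER];
  const correlationId =
    typeof incoming === "string" && incoming.length > 0 && incoming.length <= 128 ? incoming : randomUUID();
  return context.run({ correlationId }, handler);
}

// Current request's correlation ID (for log metadata); undefined outside a request.
function currentCorrelationId(): string | undefined {
  return context.getStore()?.correlationId;
}

// Headers for outgoing service calls so the ID propagates downstream.
function outgoingHeaders(extra: Record<string, string> = {}): Record<string, string> {
  const id = currentCorrelationId();
  return id === undefined ? extra : { ...extra, [CORRELATION_HEADER]: id };
}
```

In a request handler, the whole pipeline would run inside `withCorrelation(req.headers, () => ...)`. Loggers would then add `currentCorrelationId()` to their metadata, and HTTP clients would merge in `outgoingHeaders()`.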

---

## Deployment Architecture

The IAM Service can be deployed in multiple configurations:

### Local Development

```mermaid
graph LR
    Developer[Developer<br/>Localhost] --> LocalIAM[IAM Service<br/>pnpm dev<br/>Port 3001]

    LocalIAM --> LocalDB[(PostgreSQL<br/>Docker<br/>Port 5432)]
    LocalIAM --> LocalRedis[(Redis<br/>Docker<br/>Port 6379)]

    style LocalIAM fill:#e1f5fe
    style LocalDB fill:#f3e5f5
    style LocalRedis fill:#fff3e0
```
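In this setup the service runs on the host and reaches the Dockerized PostgreSQL and Redis through their published ports. The hypothetical config loader below shows how those endpoints might be resolved. The environment variable names (`PORT`, `DATABASE_URL`, `REDIS_URL`), the `postgres:postgres` credentials, and the `iam` database name are all assumptions. Only the ports come from the diagram.

```typescript
interface IamConfig {
  port: number;
  databaseUrl: string;
  redisUrl: string;
}

// Defaults mirror the local-development diagram; in production every value
// would normally be provided explicitly (e.g. from a secrets manager).
function loadConfig(env: Record<string, string | undefined>): IamConfig {
  const port = Number(env.PORT ?? "3001");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    databaseUrl: env.DATABASE_URL ?? "postgresql://postgres:postgres@localhost:5432/iam",
    redisUrl: env.REDIS_URL ?? "redis://localhost:6379",
  };
}
```

Calling `loadConfig(process.env)` at startup and failing fast on invalid values surfaces misconfiguration before the service accepts traffic.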

### Docker Compose (Multi-Service)

```mermaid
graph TB
    subgraph "Docker Compose Network"
        Traefik[Traefik Gateway<br/>Port 80/443]

        Traefik --> IAMService[IAM Service<br/>Port 3001]
        Traefik --> ProductService[Product Service<br/>Port 3002]
        Traefik --> OrderService[Order Service<br/>Port 3003]

        IAMService --> SharedDB[(PostgreSQL<br/>Port 5432)]
        IAMService --> SharedRedis[(Redis<br/>Port 6379)]

        ProductService --> SharedDB
        ProductService --> SharedRedis

        OrderService --> SharedDB
        OrderService --> SharedRedis
    end

    style Traefik fill:#ffecb3
    style IAMService fill:#e1f5fe
    style SharedDB fill:#f3e5f5
    style SharedRedis fill:#fff3e0
```

### Kubernetes (Production)

```mermaid
graph TB
    subgraph "Ingress Layer"
        Ingress[Ingress Controller<br/>NGINX/Traefik<br/>TLS Termination]
    end

    subgraph "Application Layer"
        IAMPod1[IAM Pod 1<br/>Replica 1]
        IAMPod2[IAM Pod 2<br/>Replica 2]
        IAMPod3[IAM Pod 3<br/>Replica 3]

        IAMService[IAM Service<br/>ClusterIP]
    end

    subgraph "Data Layer"
        PostgreSQL[(PostgreSQL<br/>StatefulSet<br/>Persistent Volume)]
        Redis[(Redis<br/>StatefulSet<br/>Sentinel HA)]
    end

    subgraph "Observability"
        Prometheus[Prometheus<br/>Metrics]
        Jaeger[Jaeger<br/>Tracing]
        Loki[Loki<br/>Logs]
    end

    Ingress --> IAMService
    IAMService --> IAMPod1
    IAMService --> IAMPod2
    IAMService --> IAMPod3

    IAMPod1 --> PostgreSQL
    IAMPod1 --> Redis
    IAMPod2 --> PostgreSQL
    IAMPod2 --> Redis
    IAMPod3 --> PostgreSQL
    IAMPod3 --> Redis

    IAMPod1 -.-> Prometheus
    IAMPod1 -.-> Jaeger
    IAMPod1 -.-> Loki

    style Ingress fill:#ffecb3
    style IAMPod1 fill:#e1f5fe
    style IAMPod2 fill:#e1f5fe
    style IAMPod3 fill:#e1f5fe
    style PostgreSQL fill:#f3e5f5
    style Redis fill:#fff3e0
```

### Production Best Practices

1. **High Availability**:
   - Multiple IAM service replicas (3+)
   - PostgreSQL replication (primary + standby)
   - Redis Sentinel for failover

2. **Security**:
   - TLS/SSL for all connections
   - Network policies for pod-to-pod communication
   - Secrets management (HashiCorp Vault, AWS Secrets Manager)
   - Non-root containers

3. **Resource Limits**:
   ```yaml
   resources:
     requests:
       cpu: 500m
       memory: 512Mi
     limits:
       cpu: 2000m
       memory: 2Gi
   ```

4. **Health Checks**:
   ```yaml
   livenessProbe:
     httpGet:
       path: /health/live
       port: 3001
     initialDelaySeconds: 30
     periodSeconds: 10

   readinessProbe:
     httpGet:
       path: /health/ready
       port: 3001
     initialDelaySeconds: 10
     periodSeconds: 5
   ```

5. **Horizontal Pod Autoscaling**:
   ```yaml
   # HorizontalPodAutoscaler spec fields (autoscaling/v1)
   minReplicas: 3
   maxReplicas: 10
   targetCPUUtilizationPercentage: 70
   ```
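The autoscaling settings interact with the resource requests in item 3. The HPA measures CPU utilization as a percentage of each pod's CPU *request* (500m), so a 70% target corresponds to roughly 350m of average usage per pod. The sketch below models only the core Kubernetes formula, `desired = ceil(current × currentUtilization / targetUtilization)`, together with the default 0.1 tolerance and the min/max bounds. The real controller also applies stabilization windows and special handling for unready pods, which are omitted here.

```typescript
// Core HPA replica calculation, clamped to the minReplicas/maxReplicas configured above.
function desiredReplicas(
  current: number,
  currentUtilizationPct: number, // average CPU usage as % of the pods' CPU request
  targetUtilizationPct = 70,
  minReplicas = 3,
  maxReplicas = 10,
  tolerance = 0.1, // Kubernetes default: ignore ratios within 10% of the target
): number {
  const ratio = currentUtilizationPct / targetUtilizationPct;
  const raw = Math.abs(1 - ratio) <= tolerance ? current : Math.ceil(current * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}
```

For example, three pods averaging 140% of their request (700m each) scale to `ceil(3 × 140 / 70) = 6` replicas. At 75% the ratio is within tolerance and the deployment stays at 3.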

---

## Next Steps

- **User Guides**: See [GETTING-STARTED.md](./guides/01-GETTING-STARTED.md) for setup instructions
- **API Reference**: See [API_REFERENCE.md](./API_REFERENCE.md) for complete endpoint documentation
- **Security Model**: See [SECURITY-MODEL.md](./concepts/SECURITY-MODEL.md) for security details
- **Data Model**: See [DATA-MODEL.md](./concepts/DATA-MODEL.md) for database schema
- **Deployment**: See [DEPLOYMENT-GUIDE.md](./deployment/DEPLOYMENT-GUIDE.md) for deployment instructions

---

## References

- **Security Skill**: [.cursor/skills/security/SKILL.md](../../../.cursor/skills/security/SKILL.md)
- **IAM Proposal**: [docs/en/architecture/iam-proposal.md](../../../docs/en/architecture/iam-proposal.md)
- **Migration Guide**: [docs/en/guides/iam-migration.md](../../../docs/en/guides/iam-migration.md)
- **Prisma Schema**: [prisma/schema.prisma](../prisma/schema.prisma)
- **Routes Definition**: [src/routes/index.ts](../src/routes/index.ts)

---

**Last Updated**: January 2026
**Version**: 1.0.0
**Status**: Production Ready