pos-system/services/iam-service/docs/ARCHITECTURE.en.md
Ho Ngoc Hai d8411abd24 Revise IAM Service Architecture documentation for clarity and comprehensiveness
- Updated the document title to reflect the focus on IAM Service Architecture.
- Expanded the overview section to provide a clearer description of the IAM Service's capabilities.
- Enhanced the table of contents for better navigation.
- Added detailed architecture diagrams illustrating the layered architecture and key components.
- Included comprehensive sections on authentication flows, authorization models, caching strategies, and security architecture.
- Improved overall structure and readability to facilitate understanding of the IAM Service's design and functionality.

These changes aim to provide developers with a thorough understanding of the IAM Service architecture and its components.
2026-01-02 00:30:26 +07:00

IAM Service Architecture

Enterprise-Grade Identity & Access Management Platform

This document describes the complete architecture of the IAM Service, including system components, data flows, security model, and integration patterns.

Overview

The IAM Service is a comprehensive Identity and Access Management platform providing:

Core Capabilities

| Capability | Description | Modules |
|---|---|---|
| Authentication | User identity verification | Auth, Social, OIDC, MFA |
| Authorization | Access control and permissions | RBAC, ABAC, Policy Engine |
| Identity Management | User lifecycle and profiles | Users, Profiles, Organizations, Groups |
| Access Governance | Request workflows and reviews | Requests, Reviews, Analytics |
| Compliance | Regulatory compliance reporting | GDPR, SOC2, ISO27001, Risk Management |
| Security | Zero-trust and threat protection | Zero-Trust, Audit, Encryption |
| Performance | High-speed access with caching | Multi-layer Cache, Connection Pool |

Key Metrics

  • 50+ API Endpoints across 10 feature modules
  • 30+ Database Models with comprehensive relationships
  • 7 Authentication Methods (password, social x3, OIDC, MFA x2)
  • Multi-layer Caching (Memory → Redis → PostgreSQL)
  • Zero-Trust Security with device fingerprinting
  • Enterprise Compliance (GDPR, SOC2, ISO27001, HIPAA)

Overall Architecture

The IAM Service follows a layered architecture pattern with clear separation of concerns:

graph TB
    subgraph "Client Layer"
        WebApp[Web Application<br/>React/Vue/Angular]
        MobileApp[Mobile App<br/>iOS/Android/React Native]
        Service[Other Microservices<br/>Product/Order/Payment]
    end

    subgraph "API Gateway"
        Traefik[Traefik Gateway<br/>Load Balancer + Routing<br/>Port 80/443]
    end

    subgraph "IAM Service - Port 3001"
        subgraph "Middleware Stack"
            Correlation[Correlation ID<br/>Request Tracking]
            Logger[Request Logger<br/>Winston]
            ZeroTrust[Zero-Trust Validator<br/>Device + Location + Behavior]
            RateLimit[Dynamic Rate Limiter<br/>Redis-backed<br/>50-1000 req/15min]
            Auth[Authentication<br/>JWT Verification]
            RBAC[Authorization<br/>RBAC + ABAC]
        end

        subgraph "Core Authentication Modules"
            AuthModule[Auth Module<br/>Login/Register/Logout]
            TokenModule[Token Module<br/>JWT + Cookie Management]
            SessionModule[Session Module<br/>Device Fingerprinting]
            SocialModule[Social Auth<br/>Google/FB/GitHub]
            OIDCModule[OIDC Provider<br/>OpenID Connect]
            MFAModule[MFA Module<br/>TOTP/WebAuthn]
            RBACModule[RBAC Module<br/>Roles + Permissions]
        end

        subgraph "Identity Management Modules"
            UserModule[User Lifecycle<br/>CRUD + Bulk Ops]
            ProfileModule[Profile Management<br/>Avatar + Custom Fields]
            VerificationModule[Identity Verification<br/>Email/Phone/Document]
            OrgModule[Organizations<br/>Multi-tenant Hierarchy]
            GroupModule[Groups<br/>Group-based Access]
        end

        subgraph "Access Management Modules"
            RequestModule[Access Requests<br/>Approval Workflows]
            ReviewModule[Access Reviews<br/>Certification Campaigns]
            AnalyticsModule[Access Analytics<br/>Usage + Risk Analysis]
        end

        subgraph "Governance Modules"
            ComplianceModule[Compliance<br/>GDPR/SOC2/ISO27001]
            PolicyModule[Policy Governance<br/>Templates + Versioning]
            RiskModule[Risk Management<br/>Scoring + Detection]
            ReportingModule[Reporting<br/>Dashboards + Exports]
        end

        subgraph "Core Services"
            CacheService[Multi-layer Cache<br/>Memory → Redis<br/>TTL: 60s-15min]
            AuditService[Audit Logging<br/>Event Sourcing<br/>AuthEvent Model]
            SecurityService[Security Services<br/>Encryption + Hashing]
        end
    end

    subgraph "Data Layer"
        PostgreSQL[(PostgreSQL 14+<br/>Primary Database<br/>Connection Pool<br/>30+ Models)]
        Redis[(Redis 6+<br/>Cache + Sessions<br/>Rate Limits + Locks)]
    end

    subgraph "External Services (Future)"
        EmailService[Email Service<br/>Verification + Notifications]
        SMSService[SMS Service<br/>OTP + Phone Verification]
        FileStorage[File Storage<br/>S3/GCS for Avatars]
    end

    subgraph "Observability Stack"
        Prometheus[Prometheus<br/>Metrics Collection]
        Jaeger[Jaeger<br/>Distributed Tracing]
        Logs[Structured Logs<br/>Winston + Loki]
    end

    %% Client to Gateway
    WebApp --> Traefik
    MobileApp --> Traefik
    Service --> Traefik

    %% Gateway to Middleware
    Traefik --> Correlation
    Correlation --> Logger
    Logger --> ZeroTrust
    ZeroTrust --> RateLimit
    RateLimit --> Auth
    Auth --> RBAC

    %% Middleware to Modules
    RBAC --> AuthModule
    RBAC --> UserModule
    RBAC --> RequestModule
    RBAC --> ComplianceModule

    %% Module Dependencies
    AuthModule --> TokenModule
    AuthModule --> SessionModule
    AuthModule --> SocialModule
    AuthModule --> OIDCModule
    AuthModule --> MFAModule
    AuthModule --> RBACModule

    UserModule --> ProfileModule
    UserModule --> VerificationModule
    UserModule --> OrgModule
    UserModule --> GroupModule

    RequestModule --> ReviewModule
    RequestModule --> AnalyticsModule

    ComplianceModule --> PolicyModule
    ComplianceModule --> RiskModule
    ComplianceModule --> ReportingModule

    %% Core Services
    AuthModule --> CacheService
    UserModule --> CacheService
    RBACModule --> CacheService

    AuthModule --> AuditService
    RequestModule --> AuditService
    ComplianceModule --> AuditService

    AuthModule --> SecurityService

    %% Data Layer
    CacheService --> Redis
    CacheService --> PostgreSQL

    AuthModule --> PostgreSQL
    UserModule --> PostgreSQL
    RequestModule --> PostgreSQL
    ComplianceModule --> PostgreSQL

    %% External Services (dashed - not yet integrated)
    VerificationModule -.-> EmailService
    VerificationModule -.-> SMSService
    ProfileModule -.-> FileStorage

    %% Observability
    AuthModule -.-> Prometheus
    UserModule -.-> Prometheus
    RequestModule -.-> Prometheus
    ComplianceModule -.-> Prometheus

    CacheService -.-> Jaeger
    AuditService -.-> Logs

    style ZeroTrust fill:#ff9999
    style CacheService fill:#99ccff
    style PostgreSQL fill:#cc99ff
    style Redis fill:#ffcc99
    style AuditService fill:#99ff99

Architecture Highlights

  1. Layered Middleware: Every request passes through 6 middleware layers (correlation, logging, zero-trust, rate limiting, authentication, authorization)
  2. Modular Design: 10 independent feature modules with clear boundaries
  3. Multi-layer Caching: Memory (60s) → Redis (5-15min) → PostgreSQL for optimal performance
  4. Event Sourcing: All authentication and authorization events logged for audit compliance
  5. Zero-Trust Security: Continuous validation of device, location, and behavior
  6. Dynamic Rate Limiting: Role-based limits (50-1000 req/15min)
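
The dynamic rate limiting in item 6 can be sketched as a fixed-window counter keyed by client and sized by role. This is a minimal illustration, not the service's implementation: only the 50-1000 range and the 15-minute window come from this document, while the role names, the per-role values, and the in-memory `Map` (standing in for the Redis-backed counters) are assumptions.

```typescript
// Hypothetical role-to-limit mapping: only the 50-1000 range and the
// 15-minute window come from this document; the per-role values are assumed.
type Role = "ANONYMOUS" | "USER" | "MANAGER" | "ADMIN";

const WINDOW_MS = 15 * 60 * 1000;

const LIMITS: Record<Role, number> = {
  ANONYMOUS: 50,
  USER: 200,
  MANAGER: 500,
  ADMIN: 1000,
};

interface WindowState {
  windowStart: number; // timestamp (ms) at which the current window began
  count: number;       // requests seen in the current window
}

// In-memory stand-in for the Redis-backed counters.
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, WindowState>();

  /** Returns true if the request is allowed, false if the limit is exceeded. */
  allow(clientKey: string, role: Role, now: number = Date.now()): boolean {
    const limit = LIMITS[role];
    const state = this.windows.get(clientKey);

    // Start a new window if none exists or the previous one has elapsed.
    if (!state || now - state.windowStart >= WINDOW_MS) {
      this.windows.set(clientKey, { windowStart: now, count: 1 });
      return true;
    }

    if (state.count >= limit) return false;
    state.count += 1;
    return true;
  }
}
```

A fixed window is the simplest variant to reason about; a sliding window or token bucket would smooth bursts at window boundaries at the cost of more state per client.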

Authentication Flows

Password-Based Authentication

Standard email/password authentication with bcrypt hashing and JWT token generation:

sequenceDiagram
    participant Client
    participant Middleware
    participant AuthService
    participant RBACService
    participant Database
    participant Redis
    participant AuditService

    Client->>Middleware: POST /api/v1/auth/login<br/>{email, password}

    Middleware->>Middleware: 1. Correlation ID
    Middleware->>Middleware: 2. Zero-Trust Validation<br/>(Device + IP + Behavior)
    Middleware->>Middleware: 3. Rate Limit Check<br/>(5 login/15min)

    Middleware->>AuthService: login(email, password)

    AuthService->>Database: Find user by email
    Database-->>AuthService: User record

    AuthService->>AuthService: Verify password<br/>(bcrypt compare, cost=12)

    alt Password Invalid
        AuthService->>AuditService: Log LOGIN_FAILED event
        AuditService->>Database: Save AuthEvent
        AuthService-->>Client: 401 Unauthorized<br/>"Invalid credentials"
    else Password Valid & MFA Disabled
        AuthService->>RBACService: Get user roles + permissions
        RBACService->>Database: Query UserRole, UserPermission
        Database-->>RBACService: Roles + Permissions
        RBACService-->>AuthService: ["ADMIN"], ["users:read:all"]

        AuthService->>AuthService: Generate JWT tokens<br/>Access: 15min<br/>Refresh: 7 days

        AuthService->>Database: Save refresh token
        AuthService->>Redis: Cache user data<br/>TTL: 15min
        AuthService->>Redis: Cache permissions<br/>TTL: 5min
        AuthService->>Database: Create session<br/>(device fingerprint)

        AuthService->>Database: Update lastLoginAt, loginCount
        AuthService->>AuditService: Log LOGIN_SUCCESS event

        AuthService-->>Client: 200 OK<br/>{user, tokens}<br/>Set-Cookie: refresh_token
    else Password Valid & MFA Enabled
        AuthService-->>Client: 200 OK<br/>{mfaRequired: true}
        Note over Client: Prompt for MFA code
        Client->>Middleware: POST /api/v1/mfa/totp/validate<br/>{userId, token: "123456"}
        Note over Middleware,AuthService: MFA verification flow...
    end

Key Security Features:

  • Bcrypt password hashing (cost factor 12 in production)
  • Token family tracking for rotation security
  • Device fingerprinting for session management
  • Zero-trust validation before authentication
  • Comprehensive audit logging
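
Token family tracking can be illustrated with a small rotation sketch. It assumes that each login starts a new family, that every refresh marks the presented token as used and issues a successor in the same family, and that presenting an already-used token revokes the entire family. The `RefreshTokenStore` class and its field names are hypothetical, and an in-memory `Map` stands in for the database table that holds refresh tokens.

```typescript
import { randomUUID } from "node:crypto";

interface RefreshTokenRecord {
  tokenId: string;
  familyId: string; // shared by every token descended from one login
  userId: string;
  used: boolean;    // true once the token has been exchanged
}

// Hypothetical in-memory store; the real service persists refresh tokens.
class RefreshTokenStore {
  private readonly tokens = new Map<string, RefreshTokenRecord>();
  private readonly revokedFamilies = new Set<string>();

  /** Issue the first token of a new family (for example at login). */
  issue(userId: string): RefreshTokenRecord {
    return this.create(userId, randomUUID());
  }

  /**
   * Exchange a refresh token for a new one in the same family.
   * Returns null when the token is unknown, revoked, or replayed.
   */
  rotate(tokenId: string): RefreshTokenRecord | null {
    const current = this.tokens.get(tokenId);
    if (!current || this.revokedFamilies.has(current.familyId)) return null;

    if (current.used) {
      // A used token presented again suggests theft: revoke the whole family.
      this.revokedFamilies.add(current.familyId);
      return null;
    }

    current.used = true;
    return this.create(current.userId, current.familyId);
  }

  private create(userId: string, familyId: string): RefreshTokenRecord {
    const record: RefreshTokenRecord = {
      tokenId: randomUUID(),
      familyId,
      userId,
      used: false,
    };
    this.tokens.set(record.tokenId, record);
    return record;
  }
}
```

Revoking the family rather than the single token means that both the attacker and the legitimate client lose access after a replay, which forces a fresh login.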

Social Authentication Flow

OAuth 2.0 integration with Google, Facebook, and GitHub:

sequenceDiagram
    participant Client
    participant IAM
    participant Google as Google OAuth
    participant Database
    participant Redis

    Client->>IAM: GET /api/v1/auth/social/google
    IAM->>IAM: Generate state token<br/>(CSRF protection)
    IAM->>Redis: Store state token<br/>TTL: 10min
    IAM-->>Client: 302 Redirect to Google<br/>with state param

    Client->>Google: OAuth consent screen
    Google-->>Client: Authorization code + state

    Client->>IAM: GET /api/v1/auth/social/google/callback<br/>?code=xxx&state=yyy

    IAM->>Redis: Verify state token
    alt State Invalid
        IAM-->>Client: 401 CSRF token invalid
    else State Valid
        IAM->>Google: Exchange code for tokens
        Google-->>IAM: Access token + User profile

        IAM->>Database: Find or create user<br/>by provider + providerId

        alt User Found
            IAM->>Database: Update social account tokens
        else New User
            IAM->>Database: Create user + social account
            IAM->>Database: Assign default role (USER)
        end

        IAM->>IAM: Generate IAM JWT tokens
        IAM->>Redis: Cache user + permissions
        IAM->>Database: Create session

        IAM-->>Client: 302 Redirect to app<br/>with tokens in URL/cookie
    end

Supported Providers:

  • Google OAuth 2.0
  • Facebook OAuth
  • GitHub OAuth
  • Apple Sign-In (future)
  • Microsoft OAuth (future)
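
The state-token steps in the flow above (generate, store with a 10-minute TTL, verify on callback) can be sketched as follows. This is an illustration under assumptions: a `Map` stands in for Redis, the function names `createState` and `consumeState` are hypothetical, and making the token single-use is a design choice of this sketch rather than something the flow states.

```typescript
import { randomBytes } from "node:crypto";

const STATE_TTL_MS = 10 * 60 * 1000; // 10 minutes, as in the flow above

// In-memory stand-in for Redis: state value -> expiry timestamp (ms).
const stateStore = new Map<string, number>();

/** Create an unguessable state value and remember it until it expires. */
function createState(now: number = Date.now()): string {
  const state = randomBytes(32).toString("hex");
  stateStore.set(state, now + STATE_TTL_MS);
  return state;
}

/**
 * Check the state returned on the callback. The entry is deleted on first
 * use so a captured callback URL cannot be replayed (assumed behaviour).
 */
function consumeState(state: string, now: number = Date.now()): boolean {
  const expiresAt = stateStore.get(state);
  if (expiresAt === undefined) return false;
  stateStore.delete(state);
  return now < expiresAt;
}
```

With Redis, the same effect is usually achieved by setting the key with an expiry and deleting it atomically when it is read back.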

MFA (Multi-Factor Authentication) Flow

TOTP-based two-factor authentication using authenticator apps:

sequenceDiagram
    participant Client
    participant IAM
    participant Authenticator as Authenticator App
    participant Database

    Note over Client,Database: MFA Enrollment Phase

    Client->>IAM: POST /api/v1/mfa/totp/enable
    Note over Client: User must be authenticated

    IAM->>IAM: Generate TOTP secret (32 chars)
    IAM->>IAM: Generate QR code<br/>(otpauth://totp/...)
    IAM-->>Client: {secret, qrCode, backupCodes}

    Client->>Authenticator: Scan QR code
    Authenticator-->>Client: Display 6-digit TOTP

    Client->>IAM: POST /api/v1/mfa/totp/verify<br/>{token: "123456"}

    IAM->>IAM: Verify TOTP token<br/>(30s window, ±1 interval)

    alt Token Valid
        IAM->>Database: Create MFADevice record<br/>(type: TOTP, secret: encrypted)
        IAM->>Database: Update user.mfaEnabled = true
        IAM-->>Client: 200 OK MFA enabled
    else Token Invalid
        IAM-->>Client: 401 Invalid TOTP token
    end

    Note over Client,Database: MFA Login Phase

    Client->>IAM: POST /api/v1/auth/login<br/>{email, password}
    IAM->>Database: Verify credentials

    alt MFA Required
        IAM-->>Client: 200 OK {mfaRequired: true}

        Client->>Authenticator: Get current TOTP
        Authenticator-->>Client: Current 6-digit code

        Client->>IAM: POST /api/v1/mfa/totp/validate<br/>{userId, token}

        IAM->>Database: Get MFADevice for user
        IAM->>IAM: Verify TOTP token

        alt Token Valid
            IAM->>IAM: Generate full JWT tokens
            IAM->>Database: Create session
            IAM->>Database: Update device.lastUsedAt
            IAM-->>Client: 200 OK {user, tokens}
        else Token Invalid
            IAM-->>Client: 401 Invalid MFA token
        end
    end

MFA Features:

  • TOTP (Time-based One-Time Password) via authenticator apps
  • QR code generation for easy enrollment
  • Backup codes for account recovery
  • Multiple MFA devices per user
  • WebAuthn/FIDO2 framework (future implementation)
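
The TOTP check in the flow above (30-second step, ±1 interval) follows RFC 6238, which builds on HOTP from RFC 4226. The sketch below shows the algorithm using only Node's `crypto` module; the function names are hypothetical, and it assumes a raw-byte secret, 6 digits, and HMAC-SHA1, whereas the service may decode a Base32 secret and delegate to a library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const STEP_SECONDS = 30;
const DIGITS = 6;

/** HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation. */
function hotp(secret: Buffer, counter: number): string {
  const counterBytes = Buffer.alloc(8);
  counterBytes.writeUInt32BE(Math.floor(counter / 2 ** 32), 0); // high 32 bits
  counterBytes.writeUInt32BE(counter >>> 0, 4);                 // low 32 bits
  const digest = createHmac("sha1", secret).update(counterBytes).digest();

  // The low nibble of the last byte selects a 4-byte slice of the digest.
  const offset = digest[digest.length - 1] & 0x0f;
  const binary =
    ((digest[offset] & 0x7f) << 24) |
    (digest[offset + 1] << 16) |
    (digest[offset + 2] << 8) |
    digest[offset + 3];

  return (binary % 10 ** DIGITS).toString().padStart(DIGITS, "0");
}

/** Accept the token if it matches the current time step or one step either side. */
function verifyTotp(secret: Buffer, token: string, nowSeconds: number, drift = 1): boolean {
  const current = Math.floor(nowSeconds / STEP_SECONDS);
  const given = Buffer.from(token);

  for (let delta = -drift; delta <= drift; delta++) {
    const counter = current + delta;
    if (counter < 0) continue;
    const expected = Buffer.from(hotp(secret, counter));
    // Constant-time comparison; lengths must match before timingSafeEqual.
    if (given.length === expected.length && timingSafeEqual(given, expected)) {
      return true;
    }
  }
  return false;
}
```

The ±1 step tolerance absorbs clock skew between the server and the authenticator app; widening it further makes each code valid for longer and weakens the second factor.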

Authorization Model

The IAM Service implements a hybrid authorization model combining RBAC (Role-Based Access Control) and ABAC (Attribute-Based Access Control):

Authorization Decision Flow

flowchart TD
    Start[Request to Protected Resource] --> Auth{Authenticated?}

    Auth -->|No| Deny401[401 Unauthorized<br/>Authentication required]
    Auth -->|Yes| Cache{Permission<br/>in Cache?}

    Cache -->|Hit| CacheValue{Cache<br/>Value?}
    CacheValue -->|Allow| Allow[Access Granted]
    CacheValue -->|Deny| Deny403Cache[403 Forbidden<br/>Cached denial]

    Cache -->|Miss| LoadPerms[Load Permissions<br/>from Database]

    LoadPerms --> DirectPerms{Has Direct<br/>User Permission?}

    DirectPerms -->|Explicit Deny| Deny403A[403 Forbidden<br/>Explicit permission denial]
    DirectPerms -->|Explicit Allow| Allow
    DirectPerms -->|None| RolePerms{Has Role<br/>Permission?}

    RolePerms -->|Yes| CheckExpiry{Role<br/>Expired?}
    RolePerms -->|No| GroupPerms{Has Group<br/>Permission?}

    CheckExpiry -->|Yes| GroupPerms
    CheckExpiry -->|No| Allow

    GroupPerms -->|Yes| Allow
    GroupPerms -->|No| ABACCheck{ABAC Policy<br/>Exists?}

    ABACCheck -->|No Policies| DefaultDeny[403 Forbidden<br/>Default deny - No permissions]
    ABACCheck -->|Has Policies| EvaluatePolicies[Evaluate Policies<br/>by Priority]

    EvaluatePolicies --> EvalConditions{Policy<br/>Conditions?}

    EvalConditions -->|Time-based| CheckTime{Current time<br/>in range?}
    EvalConditions -->|Location-based| CheckLocation{IP in<br/>allowed range?}
    EvalConditions -->|Attribute-based| CheckAttrs{Attributes<br/>match?}

    CheckTime -->|Yes| PolicyEffect{Policy<br/>Effect?}
    CheckTime -->|No| NextPolicy{More<br/>Policies?}

    CheckLocation -->|Yes| PolicyEffect
    CheckLocation -->|No| NextPolicy

    CheckAttrs -->|Yes| PolicyEffect
    CheckAttrs -->|No| NextPolicy

    PolicyEffect -->|ALLOW| Allow
    PolicyEffect -->|DENY| Deny403Policy[403 Forbidden<br/>Policy denial]

    NextPolicy -->|Yes| EvaluatePolicies
    NextPolicy -->|No| DefaultDeny

    Allow --> CacheResult[Cache Result<br/>TTL: 5min]
    CacheResult --> AuditLog[Log Access Event<br/>to AuthEvent]
    AuditLog --> Success[200 OK<br/>Process request]

    style Auth fill:#e1f5fe
    style DirectPerms fill:#fff3e0
    style RolePerms fill:#f3e5f5
    style GroupPerms fill:#e8f5e9
    style ABACCheck fill:#fce4ec
    style Allow fill:#c8e6c9
    style Deny401 fill:#ffcdd2
    style Deny403A fill:#ffcdd2
    style Deny403Cache fill:#ffcdd2
    style Deny403Policy fill:#ffcdd2
    style DefaultDeny fill:#ffcdd2

Permission Model

The IAM Service uses a hierarchical permission model:

Permission Format: resource:action:scope

  • Resource: The entity being accessed (e.g., users, products, orders)
  • Action: The operation being performed (e.g., read, create, update, delete)
  • Scope: The access boundary (e.g., own, team, all)

Examples:

  • users:read:all - Read all users
  • users:update:own - Update own user profile
  • products:create:team - Create products for team
  • orders:delete:all - Delete any order (admin)
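
A permission check against this format can be sketched as a segment-by-segment comparison. The helper names are hypothetical, and the wildcard handling is an assumption: the `*` segment appears elsewhere in this document (for example `users:*:all`), and the sketch treats it as matching any value in that position. Scope hierarchy (whether `all` implies `team` or `own`) is deliberately not modelled.

```typescript
interface Permission {
  resource: string;
  action: string;
  scope: string;
}

/** Split "resource:action:scope" into its three parts, rejecting malformed input. */
function parsePermission(value: string): Permission {
  const parts = value.split(":");
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    throw new Error(`Invalid permission "${value}", expected resource:action:scope`);
  }
  const [resource, action, scope] = parts;
  return { resource, action, scope };
}

// A granted segment matches when it is identical or the "*" wildcard (assumed semantics).
function segmentMatches(granted: string, required: string): boolean {
  return granted === "*" || granted === required;
}

/** True when a granted permission string covers the required one. */
function permits(granted: string, required: string): boolean {
  const g = parsePermission(granted);
  const r = parsePermission(required);
  return (
    segmentMatches(g.resource, r.resource) &&
    segmentMatches(g.action, r.action) &&
    segmentMatches(g.scope, r.scope)
  );
}
```

For example, `permits("users:*:all", "users:read:all")` is true, while `permits("users:update:own", "users:update:all")` is false because the scopes differ.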

Authorization Hierarchy

1. Direct User Permissions (HIGHEST PRIORITY)
   ↓ (if no explicit allow or deny exists)
2. Role Permissions
   ↓ (if not found)
3. Group Permissions
   ↓ (if not found)
4. ABAC Policies (evaluated by priority)
   ↓ (if no policies match)
5. Default Deny (LOWEST PRIORITY)
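
The precedence above can be expressed as a single decision function. This is a sketch of the ordering only: the `AccessContext` shape is hypothetical, and it assumes the lookups (direct, role, group, policy evaluation) have already been performed and that expired roles have been filtered out beforehand.

```typescript
type Effect = "ALLOW" | "DENY";

// Hypothetical pre-computed inputs to the decision.
interface AccessContext {
  directPermission?: Effect;           // explicit user-level allow or deny, if any
  rolePermission: boolean;             // granted through a non-expired role
  groupPermission: boolean;            // granted through group membership
  policyEffects: Array<Effect | null>; // ABAC results in priority order; null = conditions not met
}

interface Decision {
  allowed: boolean;
  reason: string;
}

function decide(ctx: AccessContext): Decision {
  // 1. Direct user permissions win, whether they allow or deny.
  if (ctx.directPermission === "DENY") return { allowed: false, reason: "direct-deny" };
  if (ctx.directPermission === "ALLOW") return { allowed: true, reason: "direct-allow" };

  // 2. Role permissions, then 3. group permissions.
  if (ctx.rolePermission) return { allowed: true, reason: "role" };
  if (ctx.groupPermission) return { allowed: true, reason: "group" };

  // 4. The first ABAC policy whose conditions are met decides.
  for (const effect of ctx.policyEffects) {
    if (effect === null) continue;
    return effect === "ALLOW"
      ? { allowed: true, reason: "policy-allow" }
      : { allowed: false, reason: "policy-deny" };
  }

  // 5. Nothing matched.
  return { allowed: false, reason: "default-deny" };
}
```

Returning a reason alongside the boolean makes the outcome easy to record in the audit log.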

RBAC (Role-Based Access Control)

graph LR
    User[User] --> UserRole1[UserRole]
    User --> UserRole2[UserRole]
    User --> UserPerm[UserPermission<br/>Direct Override]

    UserRole1 --> Role1[ADMIN Role]
    UserRole2 --> Role2[MANAGER Role]

    Role1 --> RolePerm1[RolePermission]
    Role1 --> RolePerm2[RolePermission]
    Role2 --> RolePerm3[RolePermission]

    RolePerm1 --> Perm1[users:*:all]
    RolePerm2 --> Perm2[products:*:all]
    RolePerm3 --> Perm3[orders:read:team]

    UserPerm --> Perm4[analytics:read:all]

    Group[Group] --> GroupMember[GroupMember]
    GroupMember --> User
    Group --> GroupPerm[GroupPermission]
    GroupPerm --> Perm5[reports:read:all]

    style User fill:#e1f5fe
    style Role1 fill:#f3e5f5
    style Role2 fill:#f3e5f5
    style Group fill:#e8f5e9
    style Perm1 fill:#fff3e0
    style Perm2 fill:#fff3e0
    style Perm3 fill:#fff3e0
    style Perm4 fill:#ffccbc
    style Perm5 fill:#c8e6c9

Features:

  • Multiple roles per user
  • Temporary role assignments with expiration
  • Direct user permissions (can override roles)
  • Group-based permissions
  • Permission caching (5 minutes TTL)

ABAC (Attribute-Based Access Control)

graph TD
    Request[Access Request] --> PolicyEngine[Policy Engine]

    PolicyEngine --> LoadPolicies[Load Policies<br/>for Resource]
    LoadPolicies --> SortPolicies[Sort by Priority<br/>Descending]

    SortPolicies --> EvaluatePolicy[Evaluate Policy<br/>Conditions]

    EvaluatePolicy --> TimeCondition{Time-based<br/>Condition?}
    EvaluatePolicy --> LocationCondition{Location-based<br/>Condition?}
    EvaluatePolicy --> AttributeCondition{Attribute-based<br/>Condition?}

    TimeCondition --> CheckTime[Check current time<br/>vs allowed hours]
    LocationCondition --> CheckIP[Check IP address<br/>vs allowed ranges]
    AttributeCondition --> CheckAttrs[Check user attributes<br/>vs required values]

    CheckTime --> ConditionMet{All Conditions<br/>Met?}
    CheckIP --> ConditionMet
    CheckAttrs --> ConditionMet

    ConditionMet -->|Yes| ApplyEffect{Policy<br/>Effect?}
    ConditionMet -->|No| NextPolicy{More<br/>Policies?}

    ApplyEffect -->|ALLOW| Allow[Access Granted]
    ApplyEffect -->|DENY| Deny[Access Denied]

    NextPolicy -->|Yes| EvaluatePolicy
    NextPolicy -->|No| DefaultDeny[Default Deny]

    style PolicyEngine fill:#f3e5f5
    style ConditionMet fill:#fff3e0
    style Allow fill:#c8e6c9
    style Deny fill:#ffcdd2
    style DefaultDeny fill:#ffcdd2

Policy Example (JSON Logic):

{
  "name": "Business Hours Only",
  "resource": "sensitive_data",
  "condition": {
    "and": [
      {
        ">=": [{"var": "hour"}, 9]
      },
      {
        "<=": [{"var": "hour"}, 17]
      },
      {
        "in": [{"var": "day"}, [1, 2, 3, 4, 5]]
      }
    ]
  },
  "effect": "ALLOW",
  "priority": 100
}
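
To show how the `condition` in this policy evaluates, here is a deliberately small evaluator that supports only the operators used above (`and`, `>=`, `<=`, `in`, `var`). It is an illustrative sketch, not the service's policy engine and not a full JSON Logic implementation: `and` is simplified to return a boolean, and the `hour` and `day` variables are assumed to be supplied by the caller.

```typescript
type Json = number | string | boolean | null | Json[] | { [key: string]: Json };
type Data = Record<string, Json>;

function evaluate(rule: Json, data: Data): Json {
  // Literals evaluate to themselves; arrays are evaluated element by element.
  if (rule === null || typeof rule !== "object") return rule;
  if (Array.isArray(rule)) return rule.map((item) => evaluate(item, data));

  const operators = Object.keys(rule);
  if (operators.length !== 1) throw new Error("A rule must have exactly one operator");
  const operator = operators[0];
  const rawArgs = rule[operator];

  // {"var": "name"} looks a value up in the supplied data.
  if (operator === "var") {
    const value = data[String(rawArgs)];
    return value === undefined ? null : value;
  }

  const args = (Array.isArray(rawArgs) ? rawArgs : [rawArgs]).map((arg) => evaluate(arg, data));
  const [left, right] = args;

  switch (operator) {
    case "and":
      return args.every((value) => Boolean(value));
    case ">=":
      return Number(left) >= Number(right);
    case "<=":
      return Number(left) <= Number(right);
    case "in":
      return Array.isArray(right) && right.includes(left);
    default:
      throw new Error(`Unsupported operator: ${operator}`);
  }
}

// The condition from the "Business Hours Only" policy above.
const businessHours: Json = {
  and: [
    { ">=": [{ var: "hour" }, 9] },
    { "<=": [{ var: "hour" }, 17] },
    { "in": [{ var: "day" }, [1, 2, 3, 4, 5]] },
  ],
};
```

Because the comparison is on the whole hour, `"<=": 17` admits any time up to 17:59; a policy meant to end at 17:00 sharp would compare against 16 or use minutes.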

Caching Strategy

The IAM Service implements a multi-layer caching strategy for optimal performance:

graph TB
    Request[Incoming Request] --> CheckL1{L1 Cache<br/>Node.js Memory}

    CheckL1 -->|Cache Hit| L1Hit[Fast Response<br/>< 1ms latency]
    CheckL1 -->|Cache Miss| CheckL2{L2 Cache<br/>Redis}

    CheckL2 -->|Cache Hit| L2Hit[Medium Response<br/>< 10ms latency]
    CheckL2 -->|Cache Miss| Database[(PostgreSQL<br/>Source of Truth)]

    Database --> UpdateL2[Update L2 Cache<br/>Write to Redis]
    UpdateL2 --> UpdateL1[Update L1 Cache<br/>Write to Memory]

    UpdateL1 --> Response[Return Response]
    L2Hit --> UpdateL1
    L1Hit --> Response

    subgraph "Cache Layers"
        L1Cache[L1: In-Memory Cache<br/>node-cache<br/>TTL: 60 seconds<br/>Hot data only]
        L2Cache[L2: Redis Cache<br/>ioredis<br/>TTL: 5-15 minutes<br/>Distributed]
        DBLayer[L3: PostgreSQL<br/>Prisma ORM<br/>Source of truth<br/>Persistent]
    end

    subgraph "Cached Data Types"
        UserData[User Data<br/>TTL: 15min]
        Permissions[Permissions<br/>TTL: 5min]
        Tokens[Token Validation<br/>TTL: Token lifetime]
        Sessions[Sessions<br/>TTL: Session lifetime]
        Roles[User Roles<br/>TTL: 5min]
    end

    style L1Hit fill:#90EE90
    style L2Hit fill:#87CEEB
    style Database fill:#FFB6C1
    style L1Cache fill:#c8e6c9
    style L2Cache fill:#bbdefb
    style DBLayer fill:#f8bbd0

Cache Configuration

| Data Type | L1 TTL | L2 TTL | Invalidation Strategy |
|---|---|---|---|
| User Data | 60s | 15min | On update/delete |
| Permissions | 60s | 5min | On role/permission change |
| User Roles | 60s | 5min | On role assignment/revocation |
| Token Validation | N/A | Token lifetime | On logout/revocation |
| Sessions | N/A | Session lifetime | On logout |
| Rate Limit Counters | N/A | 15min window | Time-based expiry |
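
The read path through the layers (memory, then Redis, then the database) can be sketched as a read-through cache. This is an illustration under assumptions: `RemoteCache` is a hypothetical interface standing in for the Redis client, `load` stands in for the database query, values are plain strings, and the injectable clock exists only to make the behaviour testable.

```typescript
// Hypothetical minimal interface for the L2 (Redis) layer.
interface RemoteCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

class LayeredCache {
  private readonly l1 = new Map<string, { value: string; expiresAt: number }>();

  constructor(
    private readonly l2: RemoteCache,
    private readonly l1TtlMs: number = 60_000, // 60s, as in the table above
    private readonly now: () => number = Date.now,
  ) {}

  /** Read through L1, then L2, then the loader, filling the faster layers on the way back. */
  async getOrLoad(key: string, l2TtlSeconds: number, load: () => Promise<string>): Promise<string> {
    const local = this.l1.get(key);
    if (local && local.expiresAt > this.now()) return local.value;

    const remote = await this.l2.get(key);
    if (remote !== null) {
      this.setLocal(key, remote);
      return remote;
    }

    const fresh = await load();
    await this.l2.set(key, fresh, l2TtlSeconds);
    this.setLocal(key, fresh);
    return fresh;
  }

  /** Drop the key from both layers, for example after a permission change. */
  async invalidate(key: string): Promise<void> {
    this.l1.delete(key);
    await this.l2.del(key);
  }

  private setLocal(key: string, value: string): void {
    this.l1.set(key, { value, expiresAt: this.now() + this.l1TtlMs });
  }
}
```

Note that `invalidate` only clears the local memory of the instance that calls it. With several replicas, other instances may serve a stale L1 entry until its 60-second TTL expires unless invalidations are broadcast.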

Cache Invalidation

// Example: Invalidate user caches when permissions change
await rbacService.grantPermission(userId, permissionId);

// Automatically invalidates:
// - cacheService.keys.userPermissions(userId)
// - cacheService.keys.userRoles(userId)

Module Dependencies

The IAM Service is organized into 10 feature modules with clear dependencies:

graph TD
    subgraph "Presentation Layer"
        Routes[Routes/Controllers<br/>50+ Endpoints]
    end

    subgraph "Business Logic Layer"
        subgraph "Core Auth"
            AuthModule[Auth Module<br/>Login/Register/Logout]
            TokenModule[Token Module<br/>JWT + Cookies]
            SessionModule[Session Module<br/>Device Tracking]
        end

        subgraph "Extended Auth"
            RBACModule[RBAC Module<br/>Roles + Permissions]
            SocialModule[Social Auth<br/>Google/FB/GitHub]
            OIDCModule[OIDC Module<br/>Provider + Client]
            MFAModule[MFA Module<br/>TOTP/WebAuthn]
        end

        subgraph "Identity"
            IdentityModule[Identity Module<br/>Users + Profiles]
            OrgModule[Organization Module<br/>Multi-tenancy]
            GroupModule[Group Module<br/>Group Permissions]
        end

        subgraph "Access Governance"
            AccessModule[Access Module<br/>Requests + Reviews]
            AnalyticsModule[Analytics Module<br/>Usage Analysis]
        end

        subgraph "Compliance"
            GovernanceModule[Governance Module<br/>Compliance + Risk]
        end
    end

    subgraph "Core Services Layer"
        CacheService[Cache Service<br/>Multi-layer]
        AuditService[Audit Service<br/>Event Sourcing]
        SecurityService[Security Service<br/>Encryption]
        FeatureService[Feature Flags<br/>Runtime Config]
    end

    subgraph "Data Access Layer"
        Repositories[Repositories<br/>Data Access<br/>Prisma ORM]
    end

    subgraph "Infrastructure"
        Database[(PostgreSQL<br/>30+ Models)]
        Redis[(Redis<br/>Cache + Locks)]
    end

    %% Routes to Modules
    Routes --> AuthModule
    Routes --> IdentityModule
    Routes --> AccessModule
    Routes --> GovernanceModule

    %% Core Auth Dependencies
    AuthModule --> TokenModule
    AuthModule --> SessionModule
    AuthModule --> SocialModule
    AuthModule --> OIDCModule
    AuthModule --> MFAModule
    AuthModule --> RBACModule

    %% Identity Dependencies
    IdentityModule --> OrgModule
    IdentityModule --> GroupModule
    IdentityModule --> RBACModule

    %% Access Dependencies
    AccessModule --> AnalyticsModule
    AccessModule --> RBACModule

    %% Governance Dependencies
    GovernanceModule --> RBACModule

    %% Core Services
    AuthModule --> CacheService
    IdentityModule --> CacheService
    RBACModule --> CacheService

    AuthModule --> AuditService
    AccessModule --> AuditService
    GovernanceModule --> AuditService

    AuthModule --> SecurityService

    AuthModule --> FeatureService

    %% Data Access
    AuthModule --> Repositories
    IdentityModule --> Repositories
    AccessModule --> Repositories
    GovernanceModule --> Repositories
    RBACModule --> Repositories

    Repositories --> Database
    CacheService --> Redis
    CacheService --> Database

    style AuthModule fill:#e1f5fe
    style RBACModule fill:#f3e5f5
    style IdentityModule fill:#fff3e0
    style AccessModule fill:#e8f5e9
    style GovernanceModule fill:#fce4ec
    style CacheService fill:#bbdefb
    style AuditService fill:#c8e6c9
    style SecurityService fill:#ffccbc

Module Descriptions

| Module | Responsibility | Key Files |
|---|---|---|
| Auth | User authentication and token management | auth.service.ts, auth.controller.ts |
| Token | JWT generation, validation, and refresh | jwt.service.ts, cookie.service.ts |
| Session | Session lifecycle and device management | session.service.ts |
| RBAC | Role and permission management | rbac.service.ts, rbac.middleware.ts |
| Social | OAuth integration with external providers | social.service.ts, google.strategy.ts |
| OIDC | OpenID Connect provider and client | oidc-provider.service.ts |
| MFA | Multi-factor authentication | mfa.service.ts, totp.service.ts |
| Identity | User lifecycle and profile management | user.service.ts, profile.service.ts |
| Organization | Multi-tenant organization support | organization.service.ts |
| Group | Group-based access control | group.service.ts |
| Access | Access request workflows and reviews | request.service.ts, review.service.ts |
| Analytics | Access analytics and reporting | analytics.service.ts |
| Governance | Compliance, policy, and risk management | compliance.service.ts, risk.service.ts |
| Cache | Multi-layer caching (Memory + Redis) | cache.service.ts |
| Audit | Event sourcing and audit logging | audit.service.ts |
| Security | Encryption, hashing, zero-trust | zero-trust.validator.ts |

Data Architecture

The IAM Service uses PostgreSQL with 30+ Prisma models. See DATA-MODEL.md for the complete entity-relationship diagram.

Model Categories

  1. Core Authentication (7 models):
    • User, Session, RefreshToken, AuthEvent, SocialAccount, MFADevice, Policy
  2. Authorization (6 models):
    • Role, Permission, UserRole, RolePermission, UserPermission, GroupPermission
  3. Identity Management (6 models):
    • Organization, Group, GroupMember, UserProfile, IdentityVerification
  4. Access Management (4 models):
    • AccessRequest, AccessRequestApprover, AccessReview, AccessReviewItem
  5. Governance (3 models):
    • ComplianceReport, PolicyTemplate, RiskScore

Key Relationships

User (1) ─── (*) UserRole ─── (*) Role ─── (*) RolePermission ─── (*) Permission
User (1) ─── (*) UserPermission ─── (*) Permission
User (1) ─── (*) Session
User (1) ─── (1) UserProfile
User (1) ─── (*) AccessRequest
Organization (1) ─── (*) User
Organization (1) ─── (*) Group ─── (*) GroupMember ─── (*) User

Security Architecture

The IAM Service implements defense-in-depth security with multiple layers:

Security Layers

graph TB
    Request[Incoming Request] --> Layer1[Layer 1: Network Security<br/>Traefik Gateway + TLS]

    Layer1 --> Layer2[Layer 2: Zero-Trust Validation<br/>Device + Location + Behavior]

    Layer2 --> Layer3[Layer 3: Rate Limiting<br/>Dynamic by Role<br/>50-1000 req/15min]

    Layer3 --> Layer4[Layer 4: Authentication<br/>JWT Validation<br/>Token Expiry: 15min]

    Layer4 --> Layer5[Layer 5: Authorization<br/>RBAC + ABAC<br/>Permission Checking]

    Layer5 --> Layer6[Layer 6: Input Validation<br/>Zod Schemas<br/>Sanitization]

    Layer6 --> Layer7[Layer 7: Audit Logging<br/>Event Sourcing<br/>All Actions Logged]

    Layer7 --> ProcessRequest[Process Request]

    style Layer1 fill:#ffccbc
    style Layer2 fill:#ff9999
    style Layer3 fill:#ffb74d
    style Layer4 fill:#fff176
    style Layer5 fill:#aed581
    style Layer6 fill:#4dd0e1
    style Layer7 fill:#9575cd
    style ProcessRequest fill:#c8e6c9

Security Features

| Feature | Implementation | Location |
|---|---|---|
| Zero-Trust | Device fingerprinting, location, behavior analysis | zero-trust.validator.ts |
| Password Hashing | bcrypt (cost factor 12) | auth.service.ts:43 |
| Token Security | JWT with HS256, 15min expiry, token rotation | jwt.service.ts |
| CSRF Protection | State tokens for OAuth, SameSite cookies | cookie.service.ts |
| Rate Limiting | Redis-backed, dynamic by role | rate-limit.middleware.ts |
| Input Validation | Zod schemas, sanitization | validation.middleware.ts |
| Audit Logging | Event sourcing (AuthEvent model) | audit.service.ts |
| Session Security | Device fingerprinting, IP tracking | session.service.ts |
| MFA | TOTP with 30s window, backup codes | mfa.service.ts |

Threat Mitigation

| Threat | Mitigation |
|---|---|
| Brute Force | Login rate limiting (5 attempts/15min), account lockout |
| Token Theft | Short token lifetime (15min), token rotation, device binding |
| CSRF | SameSite cookies, state tokens for OAuth |
| XSS | Content Security Policy, HttpOnly cookies |
| SQL Injection | Prisma ORM parameterized queries |
| Session Hijacking | Device fingerprinting, IP validation |
| Privilege Escalation | Strict permission checks, audit logging |
| Replay Attacks | Token expiry, nonce for OAuth |

Observability

The IAM Service provides comprehensive observability with metrics, logs, and traces:

Observability Stack

graph TB
    subgraph "IAM Service"
        Application[Application Code]

        Application --> MetricsCollector[Metrics Collector<br/>Prometheus Format]
        Application --> Logger[Structured Logger<br/>Winston]
        Application --> Tracer[Distributed Tracer<br/>Jaeger Client]
    end

    subgraph "Collection Layer"
        Prometheus[Prometheus<br/>Metrics Storage]
        Loki[Loki<br/>Log Aggregation]
        Jaeger[Jaeger<br/>Trace Storage]
    end

    subgraph "Visualization Layer"
        Grafana[Grafana<br/>Dashboards + Alerts]
    end

    MetricsCollector --> Prometheus
    Logger --> Loki
    Tracer --> Jaeger

    Prometheus --> Grafana
    Loki --> Grafana
    Jaeger --> Grafana

    Grafana --> Alerts[Alert Manager<br/>Notifications]

    style Application fill:#e1f5fe
    style Prometheus fill:#f3e5f5
    style Loki fill:#fff3e0
    style Jaeger fill:#e8f5e9
    style Grafana fill:#fce4ec

Metrics (Prometheus)

Collected Metrics:

  • HTTP request duration (histogram)
  • HTTP request count (counter)
  • HTTP response status codes (counter)
  • Active sessions (gauge)
  • Cache hit/miss ratio (counter)
  • Database query duration (histogram)
  • Authentication success/failure rate (counter)
  • Permission check duration (histogram)

Endpoints:

  • /metrics - Prometheus metrics endpoint
  • /health/live - Liveness probe
  • /health/ready - Readiness probe

Logging (Winston)

Log Levels: ERROR, WARN, INFO, DEBUG

Structured Log Format:

{
  "level": "info",
  "message": "User logged in",
  "timestamp": "2024-01-01T00:00:00.000Z",
  "correlationId": "req-123-456",
  "userId": "user-789",
  "email": "user@example.com",
  "service": "iam-service"
}

Tracing (Jaeger)

Trace Spans:

  • HTTP request handling
  • Database queries
  • Cache operations
  • External API calls
  • Authentication flow
  • Authorization checks

Correlation IDs:

  • Every request gets a unique correlation ID
  • Propagated across service calls
  • Included in all logs and traces
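
The three points above can be sketched as a pair of helpers: one resolves the ID for an incoming request, the other attaches it to outgoing calls and log lines. The header name `x-correlation-id` and the function names are assumptions for illustration; the log fields mirror the structured log format shown earlier.

```typescript
import { randomUUID } from "node:crypto";

const CORRELATION_HEADER = "x-correlation-id"; // assumed header name

type Headers = Record<string, string | undefined>;

/** Reuse the caller's correlation ID when present, otherwise generate one. */
function resolveCorrelationId(incoming: Headers): string {
  const existing = incoming[CORRELATION_HEADER];
  return existing && existing.trim().length > 0 ? existing.trim() : randomUUID();
}

/** Copy the ID onto the headers of a downstream service call. */
function outgoingHeaders(correlationId: string, base: Headers = {}): Headers {
  return { ...base, [CORRELATION_HEADER]: correlationId };
}

/** Build a structured log line that carries the ID. */
function logLine(correlationId: string, level: string, message: string): string {
  return JSON.stringify({
    level,
    message,
    timestamp: new Date().toISOString(),
    correlationId,
    service: "iam-service",
  });
}
```

Reusing an inbound ID rather than always generating a new one is what lets a single request be followed across the gateway, the IAM Service, and any downstream service.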

Deployment Architecture

The IAM Service can be deployed in multiple configurations:

Local Development

graph LR
    Developer[Developer<br/>Localhost] --> LocalIAM[IAM Service<br/>pnpm dev<br/>Port 3001]

    LocalIAM --> LocalDB[(PostgreSQL<br/>Docker<br/>Port 5432)]
    LocalIAM --> LocalRedis[(Redis<br/>Docker<br/>Port 6379)]

    style LocalIAM fill:#e1f5fe
    style LocalDB fill:#f3e5f5
    style LocalRedis fill:#fff3e0

Docker Compose (Multi-Service)

graph TB
    subgraph "Docker Compose Network"
        Traefik[Traefik Gateway<br/>Port 80/443]

        Traefik --> IAMService[IAM Service<br/>Port 3001]
        Traefik --> ProductService[Product Service<br/>Port 3002]
        Traefik --> OrderService[Order Service<br/>Port 3003]

        IAMService --> SharedDB[(PostgreSQL<br/>Port 5432)]
        IAMService --> SharedRedis[(Redis<br/>Port 6379)]

        ProductService --> SharedDB
        ProductService --> SharedRedis

        OrderService --> SharedDB
        OrderService --> SharedRedis
    end

    style Traefik fill:#ffecb3
    style IAMService fill:#e1f5fe
    style SharedDB fill:#f3e5f5
    style SharedRedis fill:#fff3e0

Kubernetes (Production)

graph TB
    subgraph "Ingress Layer"
        Ingress[Ingress Controller<br/>NGINX/Traefik<br/>TLS Termination]
    end

    subgraph "Application Layer"
        IAMPod1[IAM Pod 1<br/>Replica 1]
        IAMPod2[IAM Pod 2<br/>Replica 2]
        IAMPod3[IAM Pod 3<br/>Replica 3]

        IAMService[IAM Service<br/>ClusterIP]
    end

    subgraph "Data Layer"
        PostgreSQL[(PostgreSQL<br/>StatefulSet<br/>Persistent Volume)]
        Redis[(Redis<br/>StatefulSet<br/>Sentinel HA)]
    end

    subgraph "Observability"
        Prometheus[Prometheus<br/>Metrics]
        Jaeger[Jaeger<br/>Tracing]
        Loki[Loki<br/>Logs]
    end

    Ingress --> IAMService
    IAMService --> IAMPod1
    IAMService --> IAMPod2
    IAMService --> IAMPod3

    IAMPod1 --> PostgreSQL
    IAMPod1 --> Redis
    IAMPod2 --> PostgreSQL
    IAMPod2 --> Redis
    IAMPod3 --> PostgreSQL
    IAMPod3 --> Redis

    IAMPod1 -.-> Prometheus
    IAMPod1 -.-> Jaeger
    IAMPod1 -.-> Loki

    style Ingress fill:#ffecb3
    style IAMPod1 fill:#e1f5fe
    style IAMPod2 fill:#e1f5fe
    style IAMPod3 fill:#e1f5fe
    style PostgreSQL fill:#f3e5f5
    style Redis fill:#fff3e0

Production Best Practices

  1. High Availability:

    • Multiple IAM service replicas (3+)
    • PostgreSQL replication (primary + standby)
    • Redis Sentinel for failover
  2. Security:

    • TLS/SSL for all connections
    • Network policies for pod-to-pod communication
    • Secrets management (HashiCorp Vault, AWS Secrets Manager)
    • Non-root containers
  3. Resource Limits:

    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 2000m
        memory: 2Gi
    
  4. Health Checks:

    livenessProbe:
      httpGet:
        path: /health/live
        port: 3001
      initialDelaySeconds: 30
      periodSeconds: 10
    
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 3001
      initialDelaySeconds: 10
      periodSeconds: 5
    
  5. Horizontal Pod Autoscaling:

    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
    

Last Updated: January 2026 | Version: 1.0.0 | Status: Production Ready