# Security Architecture

> Comprehensive security architecture for GoodGo platform with zero-trust model, RBAC, and compliance

## Overview Diagram

```mermaid
graph TD
    Request[Client Request] --> TLS[TLS/HTTPS Layer]
    TLS --> RateLimit[Rate Limiting]
    RateLimit --> JWT[JWT Validation]
    JWT --> RBAC[RBAC Authorization]
    RBAC --> ZeroTrust[Zero-Trust Checks]
    ZeroTrust --> Service[Service Logic]

    Service --> Encrypt[Data Encryption<br/>AES-256-GCM]
    Encrypt --> DB[(Encrypted Data)]

    Service --> Audit[Audit Logging]
    Audit --> AuditDB[(Audit Trail<br/>7-year retention)]

    style TLS fill:#d4edda
    style JWT fill:#e1f5ff
    style Encrypt fill:#f8d7da
    style Audit fill:#fff4e1
```

## Architecture Description

The GoodGo Security Architecture implements defense-in-depth with multiple security layers:

**Security Principles**:
1. **Zero Trust**: Never trust, always verify
2. **Least Privilege**: Minimum required permissions
3. **Defense in Depth**: Multiple security layers
4. **Audit Everything**: Complete audit trail
5. **Encryption**: Data encrypted at rest and in transit

**Key Components**:
- JWT Authentication (15min access, 7d refresh)
- RBAC + ABAC Authorization
- Zero-Trust Device Validation
- AES-256-GCM Encryption
- Event Sourcing for Audit Trail
- Compliance (GDPR, SOC2, ISO27001, HIPAA)

## Authentication Flow

```mermaid
sequenceDiagram
    participant Client
    participant API as API Gateway
    participant IAM as IAM Service
    participant DB as Database
    participant Cache as Redis

    Client->>API: Login Request<br/>(email + password)
    API->>IAM: Forward Request
    IAM->>DB: Verify Credentials
    DB-->>IAM: User + Hash
    IAM->>IAM: bcrypt.compare()<br/>(cost 12)

    alt Valid Credentials
        IAM->>IAM: Generate Tokens<br/>(Access + Refresh)
        IAM->>DB: Store Refresh Token<br/>(hashed SHA-256)
        IAM->>Cache: Cache Permissions<br/>(5min TTL)
        IAM-->>API: Tokens + User
        API-->>Client: Set httpOnly Cookies
    else Invalid
        IAM-->>API: 401 Unauthorized
        API-->>Client: 401 Unauthorized
    end
```

**Authentication Details**:

**1. Password Hashing**:
- Algorithm: bcrypt with cost factor 12
- Never store plaintext passwords
- Minimum password length: 8 characters, with complexity rules

**2. JWT Tokens**:
- Access Token: 15 minutes expiry
- Refresh Token: 7 days expiry
- Algorithm: RS256 (asymmetric signing)
- Payload: userId, roles, permissions

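The token settings above (RS256, 15-minute expiry, `userId`/`roles`/`permissions` payload) can be sketched with Node's built-in `crypto` module. This is a minimal illustration, not the IAM service's actual code: the function names `signAccessToken` and `verifyAccessToken`, the `iat`/`exp` claims, and the throwaway key pair are assumptions made for the example.

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Claim names follow the payload listed above; iat/exp are standard JWT claims (assumed here).
interface AccessClaims {
  userId: string;
  roles: string[];
  permissions: string[];
  iat: number; // issued at, seconds since epoch
  exp: number; // expires at, seconds since epoch
}

const ACCESS_TTL_SECONDS = 15 * 60;

const encode = (value: object): string =>
  Buffer.from(JSON.stringify(value)).toString("base64url");

// Hypothetical helper: signs header.payload with the private key (RS256).
function signAccessToken(
  claims: Omit<AccessClaims, "iat" | "exp">,
  privateKeyPem: string,
  nowSeconds: number,
): string {
  const header = encode({ alg: "RS256", typ: "JWT" });
  const payload = encode({ ...claims, iat: nowSeconds, exp: nowSeconds + ACCESS_TTL_SECONDS });
  const signature = createSign("RSA-SHA256")
    .update(`${header}.${payload}`)
    .sign(privateKeyPem)
    .toString("base64url");
  return `${header}.${payload}.${signature}`;
}

// Hypothetical helper: any service holding only the public key can run this.
function verifyAccessToken(
  token: string,
  publicKeyPem: string,
  nowSeconds: number,
): AccessClaims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts as [string, string, string];
  try {
    const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
    if (alg !== "RS256") return null; // refuse any other algorithm
    const valid = createVerify("RSA-SHA256")
      .update(`${header}.${payload}`)
      .verify(publicKeyPem, Buffer.from(signature, "base64url"));
    if (!valid) return null;
    const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as AccessClaims;
    return claims.exp > nowSeconds ? claims : null;
  } catch {
    return null; // malformed header or payload
  }
}

// Throwaway key pair for the example only.
const { publicKey, privateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});
const issuedAt = 1_700_000_000;
const token = signAccessToken(
  { userId: "user_123", roles: ["User"], permissions: ["users:read:own"] },
  privateKey,
  issuedAt,
);
```

Passing the clock in as `nowSeconds` keeps expiry checks deterministic in tests; a production service would more likely rely on a maintained JWT library than on hand-rolled encoding.
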
**3. Token Storage**:
- Access: httpOnly cookie with the `Secure` and `SameSite` attributes
- Refresh: stored in the database as a SHA-256 hash
- Rotation: New refresh token on each use

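The storage and rotation rules above can be illustrated with an in-memory stand-in for the session table. The class `RefreshTokenStore`, its method names, and the 32-byte random token format are hypothetical; only the SHA-256 hashing, the 7-day expiry, and rotation on each use come from the text.

```typescript
import { createHash, randomBytes } from "node:crypto";

const REFRESH_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

interface StoredSession {
  userId: string;
  refreshTokenHash: string;
  expiresAt: number; // milliseconds since epoch
}

const hashToken = (token: string): string =>
  createHash("sha256").update(token).digest("hex");

// In-memory stand-in for the Session table (assumption for the sketch).
class RefreshTokenStore {
  private readonly sessions = new Map<string, StoredSession>();

  // Returns the raw token for the client; only its hash is kept server-side.
  issue(userId: string, now: number): string {
    const token = randomBytes(32).toString("base64url");
    const refreshTokenHash = hashToken(token);
    this.sessions.set(refreshTokenHash, {
      userId,
      refreshTokenHash,
      expiresAt: now + REFRESH_TTL_MS,
    });
    return token;
  }

  // Each refresh token is single-use: it is deleted and replaced by a new one.
  rotate(presentedToken: string, now: number): string | null {
    const key = hashToken(presentedToken);
    const session = this.sessions.get(key);
    if (!session) return null; // unknown or already used
    this.sessions.delete(key);
    if (session.expiresAt <= now) return null; // expired
    return this.issue(session.userId, now);
  }
}
```

Because a rotated token is removed immediately, presenting it a second time fails, which is one way a replayed refresh token can be noticed.
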
**4. MFA Support**:
- TOTP (Time-based One-Time Password)
- Backup codes (10 single-use)
- Recovery email verification

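For readers unfamiliar with TOTP, the following sketch shows the standard computation (RFC 6238 on top of RFC 4226 HOTP) using HMAC-SHA1, a 30-second step, and 6 digits. These parameters are the common authenticator-app defaults and are an assumption here, as is the one-step drift window in `verifyTotp`; the document does not state which values GoodGo uses.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const STEP_SECONDS = 30; // assumed default time step

// HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation.
function hotp(secret: Buffer, counter: number, digits = 6): string {
  const counterBytes = Buffer.alloc(8);
  counterBytes.writeBigUInt64BE(BigInt(counter));
  const digest = createHmac("sha1", secret).update(counterBytes).digest();
  const offset = digest[digest.length - 1]! & 0x0f;
  const truncated = digest.readUInt32BE(offset) & 0x7fffffff;
  return (truncated % 10 ** digits).toString().padStart(digits, "0");
}

// TOTP (RFC 6238): HOTP with the counter derived from the current time.
function totp(secret: Buffer, unixSeconds: number, digits = 6): string {
  return hotp(secret, Math.floor(unixSeconds / STEP_SECONDS), digits);
}

// Accepts codes from the current step and `window` steps either side (clock drift).
function verifyTotp(secret: Buffer, code: string, unixSeconds: number, window = 1): boolean {
  const current = Math.floor(unixSeconds / STEP_SECONDS);
  const given = Buffer.from(code);
  for (let drift = -window; drift <= window; drift++) {
    const counter = current + drift;
    if (counter < 0) continue;
    const expected = Buffer.from(hotp(secret, counter));
    if (expected.length === given.length && timingSafeEqual(expected, given)) {
      return true;
    }
  }
  return false;
}
```

The comparison uses `timingSafeEqual` so that verification time does not depend on how many leading digits match.
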
## Authorization Model

```mermaid
graph TD
    User[User] --> Roles[Roles]
    User --> DirectPerms[Direct Permissions]

    Roles --> RolePerms[Role Permissions]

    RolePerms --> Check{Permission Check}
    DirectPerms --> Check

    Check -->|Granted| Resource[Access Resource]
    Check -->|Denied| Reject[403 Forbidden]

    subgraph "Permission Model"
        Perm[Permission<br/>resource:action:scope]
    end

    style Check fill:#e1f5ff
    style Perm fill:#fff4e1
```

**RBAC (Role-Based Access Control)**:

**1. Role Hierarchy**:
```
SuperAdmin > OrgAdmin > Manager > User > Guest
```

**2. Permission Format**: `resource:action:scope`
- Resource: `users`, `roles`, `permissions`
- Action: `create`, `read`, `update`, `delete`
- Scope: `own`, `org`, `global`

**Examples**:
- `users:read:own` - Read own user profile
- `users:update:org` - Update users in organization
- `roles:create:global` - Create roles globally

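A permission check over the `resource:action:scope` format might look like the sketch below. It assumes that a broader scope satisfies a narrower one (`global` covers `org`, which covers `own`); the document does not say so explicitly, and the function names are hypothetical.

```typescript
// Scopes ordered from narrowest to broadest (ordering is an assumption of this sketch).
const SCOPES = ["own", "org", "global"] as const;
type Scope = (typeof SCOPES)[number];

interface Permission {
  resource: string;
  action: string;
  scope: Scope;
}

function parsePermission(value: string): Permission {
  const [resource, action, scope, ...rest] = value.split(":");
  if (!resource || !action || !scope || rest.length > 0) {
    throw new Error(`Invalid permission string: ${value}`);
  }
  if (!SCOPES.includes(scope as Scope)) {
    throw new Error(`Unknown scope: ${scope}`);
  }
  return { resource, action, scope: scope as Scope };
}

// Union of role-derived and directly assigned permissions, as in the diagram above.
function effectivePermissions(rolePermissions: string[][], direct: string[]): string[] {
  return [...new Set([...rolePermissions.flat(), ...direct])];
}

// True when some granted permission matches resource and action with an equal or broader scope.
function isGranted(granted: string[], required: string): boolean {
  const need = parsePermission(required);
  return granted.map(parsePermission).some(
    (have) =>
      have.resource === need.resource &&
      have.action === need.action &&
      SCOPES.indexOf(have.scope) >= SCOPES.indexOf(need.scope),
  );
}
```

For example, a user holding `users:update:org` would pass a check for `users:update:own` under this ordering, but not one for `users:update:global`.
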
**3. Permission Caching**:
```typescript
// Cache key: user:{userId}:permissions
// TTL: 5 minutes
// Invalidate on: role change, permission change
```

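The caching rules in the comments above can be sketched as a small TTL cache. This version keeps entries in process memory purely for illustration (the document places the cache in Redis), and the class name `PermissionCache` and its injected clock are assumptions.

```typescript
const PERMISSION_TTL_MS = 5 * 60 * 1000; // 5 minutes

interface CacheEntry {
  permissions: string[];
  expiresAt: number;
}

// In-memory stand-in for the Redis-backed cache (assumption for the sketch).
class PermissionCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  private key(userId: string): string {
    return `user:${userId}:permissions`;
  }

  get(userId: string): string[] | undefined {
    const entry = this.entries.get(this.key(userId));
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(this.key(userId)); // expired: treat as a miss
      return undefined;
    }
    return entry.permissions;
  }

  set(userId: string, permissions: string[]): void {
    this.entries.set(this.key(userId), {
      permissions,
      expiresAt: this.now() + PERMISSION_TTL_MS,
    });
  }

  // Call on role change or permission change.
  invalidate(userId: string): void {
    this.entries.delete(this.key(userId));
  }
}
```

Explicit invalidation matters because, without it, a revoked permission could remain effective for up to the full five-minute TTL.
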
## Zero-Trust Architecture

```mermaid
graph TD
    Request[Request] --> Device[Device Fingerprint]
    Device --> IP[IP Address Validation]
    IP --> Behavior[Behavioral Analysis]
    Behavior --> Session[Session Binding]

    Session -->|Valid| Allow[Allow Request]
    Session -->|Suspicious| MFA[Require MFA]
    Session -->|Anomaly| Block[Block + Alert]

    style Block fill:#f8d7da
    style MFA fill:#fff3cd
    style Allow fill:#d4edda
```

**Zero-Trust Components**:

**1. Device Fingerprinting**:
- Browser: User-Agent, Canvas, WebGL
- Screen resolution, timezone, language
- Plugin detection, fonts available
- Hash fingerprint → Store with session

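One way to turn the signals listed above into a stable fingerprint is to canonicalize them and hash the result with SHA-256. The `DeviceSignals` field names and the canonicalization (fixed field order, sorted plugin and font lists) are assumptions of this sketch, not the platform's documented scheme.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape for the client-collected signals listed above.
interface DeviceSignals {
  userAgent: string;
  canvasHash: string;
  webglRenderer: string;
  screenResolution: string;
  timezone: string;
  language: string;
  plugins: string[];
  fonts: string[];
}

// Serializes the signals in a fixed order so equal inputs always hash identically.
function fingerprintDevice(signals: DeviceSignals): string {
  const canonical = JSON.stringify([
    signals.userAgent,
    signals.canvasHash,
    signals.webglRenderer,
    signals.screenResolution,
    signals.timezone,
    signals.language,
    [...signals.plugins].sort(), // order reported by the browser may vary
    [...signals.fonts].sort(),
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}
```

Only the resulting hex digest would be stored with the session, which matches the "Hashed" annotation on the fingerprint columns in the data model.
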
**2. IP Address Validation**:
- Whitelist known IPs per user
- Alert on new IP + require MFA
- Block suspicious IPs (VPN, Tor)

**3. Behavioral Analysis**:
- Login patterns (time, location)
- API usage patterns
- Failed auth attempts
- Alert on anomalies

**4. Session Binding**:
- Bind session to device fingerprint
- Bind session to IP address
- Invalidate on mismatch

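The three outcomes in the zero-trust diagram (allow, require MFA, block) can be sketched as a single decision function over the session binding. The mapping used here is an assumption: a device-fingerprint mismatch is treated as an anomaly and blocked, while a new IP address on a known device is treated as suspicious and escalated to MFA. The actual policy may weigh these signals differently.

```typescript
type Decision = "allow" | "require_mfa" | "block";

// Values bound to the session at login.
interface BoundSession {
  deviceFingerprint: string;
  ipAddress: string;
}

// Values observed on the incoming request.
interface RequestContext {
  deviceFingerprint: string;
  ipAddress: string;
}

// Hypothetical policy: fingerprint mismatch blocks, IP change alone requires MFA.
function evaluateSessionBinding(session: BoundSession, request: RequestContext): Decision {
  if (session.deviceFingerprint !== request.deviceFingerprint) {
    return "block"; // session appears to have moved to another device
  }
  if (session.ipAddress !== request.ipAddress) {
    return "require_mfa"; // same device, new network
  }
  return "allow";
}
```

Treating an IP change as a step-up rather than a hard failure is one way to limit false positives for mobile users who switch networks.
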
## Data Protection

**Encryption Strategy**:

**1. Data at Rest**:
- PII: AES-256-GCM encryption
- Passwords: bcrypt (cost 12)
- Tokens: SHA-256 hash
- Keys: Environment variables + K8s secrets

**2. Data in Transit**:
- TLS 1.2+ for all communications
- HTTPS enforcement
- Certificate pinning (mobile clients)

**3. Key Management**:
- Unique IV (nonce) per encryption operation
- 32+ character ENCRYPTION_KEY
- Rotate keys quarterly
- Never hardcode secrets

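A minimal AES-256-GCM sketch using Node's `crypto` module is shown below. It assumes a 32-byte key is already available as a `Buffer`; how that key is derived from `ENCRYPTION_KEY` is not specified in this document, so the example generates a random key. The `iv || tag || ciphertext` layout and the function names are choices made for the example.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12; // 96-bit nonce, the size recommended for GCM
const TAG_BYTES = 16; // default GCM authentication tag length

// Encrypts with a fresh random IV and returns base64(iv || tag || ciphertext).
function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Throws if the key is wrong or the payload was modified (tag check fails).
function decryptField(payload: string, key: Buffer): string {
  const raw = Buffer.from(payload, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Example key only; a real key would come from the secrets store.
const exampleKey = randomBytes(32);
const sealed = encryptField("alice@example.com", exampleKey);
```

Reusing an IV with the same key breaks GCM's guarantees, which is why a new random IV is generated on every call and stored alongside the ciphertext.
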
## Compliance & Audit

**Compliance Requirements**:

**1. GDPR**:
- Right to erasure (soft delete + hard delete after 90 days)
- Data portability (export user data)
- Consent management
- Breach notification (72 hours)

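The erasure rule above (soft delete first, hard delete after 90 days) reduces to a date comparison on the `deletedAt` field. The sketch below uses hypothetical helper names and an in-memory record shape; it shows only the selection logic a purge job might apply, not the deletion itself.

```typescript
const HARD_DELETE_AFTER_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

// Minimal record shape for the sketch; `deletedAt` mirrors the soft-delete column.
interface UserRecord {
  id: string;
  deletedAt: Date | null;
}

// Soft delete: mark the record without removing it.
function softDelete(user: UserRecord, now: Date): UserRecord {
  return { ...user, deletedAt: now };
}

// True once the record has been soft-deleted for at least 90 days.
function isDueForHardDelete(user: UserRecord, now: Date): boolean {
  if (user.deletedAt === null) return false;
  return now.getTime() - user.deletedAt.getTime() >= HARD_DELETE_AFTER_DAYS * DAY_MS;
}

// Records a scheduled purge job would remove permanently.
function selectForPurge(users: UserRecord[], now: Date): UserRecord[] {
  return users.filter((user) => isDueForHardDelete(user, now));
}
```

A job running `selectForPurge` daily would hard-delete each record on the first run after its 90-day window closes.
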
**2. SOC2**:
- Access controls (RBAC)
- Encryption at rest and in transit
- Audit logging (7-year retention)
- Incident response plan

```typescript
// Event sourcing for all auth events
const loginSucceeded = {
  eventType: 'auth.login.success',
  userId: 'user_123',
  timestamp: '2024-01-15T10:30:00Z',
  ipAddress: '192.168.1.1',
  deviceFingerprint: 'fp_xyz',
  metadata: { /* ... */ },
};
```

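To make the event-sourcing idea concrete, the sketch below models an append-only log of events shaped like the example above and replays it to derive state. The `AuditLog` class, the `auth.login.failure` event type, and the `consecutiveFailedLogins` projection are hypothetical; in the described system events would be published to the audit service rather than held in memory.

```typescript
interface AuthEvent {
  eventType: string;
  userId: string;
  timestamp: string; // ISO 8601
  ipAddress: string;
  deviceFingerprint: string;
  metadata: Record<string, unknown>;
}

// Append-only: events are frozen on write and there is no update or delete method.
class AuditLog {
  private readonly events: Readonly<AuthEvent>[] = [];

  append(event: AuthEvent): void {
    this.events.push(
      Object.freeze({ ...event, metadata: Object.freeze({ ...event.metadata }) }),
    );
  }

  // Returns a copy so callers cannot reorder or truncate the log.
  all(): readonly Readonly<AuthEvent>[] {
    return [...this.events];
  }
}

// Replays the log to compute one piece of state: failures since the last successful login.
function consecutiveFailedLogins(events: readonly AuthEvent[], userId: string): number {
  let failures = 0;
  for (const event of events) {
    if (event.userId !== userId) continue;
    if (event.eventType === "auth.login.failure") failures += 1;
    if (event.eventType === "auth.login.success") failures = 0;
  }
  return failures;
}
```

Because state is derived by replaying events, a new projection (for example, logins per IP address) can be added later without changing what was recorded.
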
## System Context

```mermaid
C4Context
    title Security Architecture Context

    Person(user, "User", "End user accessing platform")
    Person(admin, "Admin", "System administrator")
    Person(attacker, "Attacker", "Potential threat actor")

    System(iam, "IAM Service", "Authentication & Authorization")

    System_Ext(db, "Neon PostgreSQL", "Encrypted user credentials & sessions")
    System_Ext(cache, "Redis", "Permission & session cache")
    System_Ext(audit, "Audit Service", "Security event logging")
    System_Ext(mfa, "MFA Provider", "TOTP verification")
    System_Ext(monitoring, "Security Monitoring", "SIEM & alerting")

    Rel(user, iam, "Authenticates", "HTTPS + TLS 1.2+")
    Rel(admin, iam, "Manages permissions", "HTTPS + TLS 1.2+")
    Rel(attacker, iam, "Blocked by security layers", "")

    Rel(iam, db, "Stores credentials", "PostgreSQL + TLS")
    Rel(iam, cache, "Caches permissions", "Redis + TLS")
    Rel(iam, audit, "Logs security events", "Kafka")
    Rel(iam, mfa, "Verifies MFA", "HTTPS")
    Rel(iam, monitoring, "Sends security metrics", "Prometheus + Loki")
```

**Context Description**:
- **IAM Service**: Central authentication and authorization
- **Database**: Stores encrypted credentials, sessions, permissions
- **Cache**: Caches permissions and sessions to reduce database load
- **Audit Service**: Receives and stores all security events
- **MFA Provider**: External TOTP verification service (Google Authenticator compatible)
- **Security Monitoring**: SIEM (Security Information and Event Management) and alerting

## Database Architecture

```mermaid
erDiagram
    User ||--o{ Session : has
    User ||--o{ UserRole : has
    User ||--o{ UserPermission : has
    User ||--o{ MFADevice : has
    User ||--o{ LoginHistory : has
    User ||--o{ DeviceFingerprint : has

    Role ||--o{ UserRole : assigned_to
    Role ||--o{ RolePermission : has

    Permission ||--o{ RolePermission : granted_to
    Permission ||--o{ UserPermission : granted_to

    Organization ||--o{ User : contains
    Organization ||--o{ Role : defines

    User {
        string id PK "CUID"
        string email UK "Unique, indexed"
        string passwordHash "bcrypt cost 12"
        string organizationId FK
        boolean mfaEnabled "MFA required?"
        datetime lastLoginAt "Tracking"
        datetime createdAt "Timestamp"
        datetime updatedAt "Timestamp"
        datetime deletedAt "Soft delete"
    }

    Session {
        string id PK "CUID"
        string userId FK
        string refreshTokenHash "SHA-256"
        string deviceFingerprint "Hashed"
        string ipAddress "IPv4/IPv6"
        string userAgent "Browser info"
        datetime expiresAt "7 days TTL"
        datetime lastActivityAt "Tracking"
        datetime createdAt "Timestamp"
    }

    Role {
        string id PK "CUID"
        string name "role-name"
        string organizationId FK
        int hierarchy "Priority level"
        boolean isSystem "Built-in?"
        datetime createdAt "Timestamp"
    }

    Permission {
        string id PK "CUID"
        string resource "users, roles, etc"
        string action "create, read, update, delete"
        string scope "own, org, global"
        datetime createdAt "Timestamp"
    }

    MFADevice {
        string id PK "CUID"
        string userId FK
        string type "totp, backup"
        string secret "Encrypted TOTP secret"
        boolean verified "Verified?"
        datetime lastUsedAt "Tracking"
        datetime createdAt "Timestamp"
    }

    LoginHistory {
        string id PK "CUID"
        string userId FK
        boolean success "Success/Failure"
        string ipAddress "IPv4/IPv6"
        string deviceFingerprint "Hashed"
        string failureReason "If failed"
        datetime timestamp "Event time"
    }

    DeviceFingerprint {
        string id PK "CUID"
        string userId FK
        string fingerprint "Hashed"
        boolean trusted "Auto-approved?"
        datetime firstSeenAt "First use"
        datetime lastSeenAt "Last use"
    }
```

**Description**:
- **User**: Stores hashed credentials, MFA settings, organization membership
- **Session**: Stores hashed refresh tokens, device fingerprint, IP tracking
- **Role & Permission**: RBAC hierarchy with system roles and custom roles
- **MFADevice**: TOTP secrets (encrypted), backup codes
- **LoginHistory**: Audit trail for all login attempts (success/failure)
- **DeviceFingerprint**: Trusted device tracking for zero-trust model

**Database Security**:
- Password hashes: bcrypt with cost factor 12
- Token hashes: SHA-256
- MFA secrets: AES-256-GCM encryption
- Soft deletes: `deletedAt` field, hard delete after 90 days (GDPR)
- Indexes: email (unique), userId (foreign keys), timestamps

## Design Decisions

### Decision 1: JWT with RS256 (Asymmetric)

**Context**: Need stateless authentication with ability to verify tokens in multiple services

**Decision**: Use JWT with RS256 (RSA asymmetric signing) instead of HS256 (HMAC symmetric)

**Consequences**:
- ✅ **Positive**:
  - Services verify tokens with the public key and never need the signing secret
  - Easier key rotation (only the new public key is distributed)
  - Higher security (private key only in IAM service)
  - Compliance: Clear audit trail of who signs tokens
- ❌ **Negative**:
  - Slightly slower than HS256 (~10-20%)
  - More complex key management
  - Private key must be carefully protected

**Alternatives**: HS256 (symmetric), EdDSA, OAuth 2.0 with Opaque Tokens

### Decision 2: Zero-Trust Model with Device Fingerprinting

**Context**: Need to protect against credential theft, session hijacking, and unauthorized access

**Decision**: Implement zero-trust model with device fingerprinting, IP validation, behavioral analysis

**Consequences**:
- ✅ **Positive**:
  - Detect anomalies (new device, new IP, unusual behavior)
  - Increased security by detecting and blocking suspicious activities
  - Compliance: SOC2, ISO27001 requirements
  - User experience: Auto-approve trusted devices
- ❌ **Negative**:
  - Higher complexity
  - Potential false positives (legitimate users blocked)
  - Performance overhead (fingerprint hash, IP check)
  - Privacy concerns (tracking devices, IPs)

**Alternatives**: Basic authentication only, IP whitelist only, MFA required for all

### Decision 3: Event Sourcing for Audit Trail

**Context**: Need immutable audit trail for compliance (GDPR, SOC2, HIPAA) and security forensics

**Decision**: Use event sourcing pattern to store all auth/security events

**Consequences**:
- ✅ **Positive**:
  - Immutable audit trail (cannot modify/delete)
  - Complete history of all security events
  - Compliance: supports the 7-year audit log retention listed under SOC2, plus GDPR and HIPAA audit requirements
  - Security forensics: Trace back attacks, breaches
  - Replay events to reconstruct state
- ❌ **Negative**:
  - High storage cost (retain 7 years)
  - Complexity in event schema versioning
  - Performance: Event publishing overhead
  - Data privacy: Must anonymize PII after retention period

**Alternatives**: Database audit logs only, External SIEM only, No audit trail

## Performance Characteristics

| Metric | Target | Notes |
|--------|--------|-------|
| **Login Time (P95)** | < 500ms | Including bcrypt verification |
| **Login Time (P99)** | < 1s | Peak load |
| **Token Generation (P95)** | < 50ms | JWT sign with RS256 |
| **Token Verification (P95)** | < 10ms | JWT verify with public key |
| **Permission Check (P95)** | < 5ms | From cache (L1 or L2) |
| **Permission Check (Cache Miss)** | < 50ms | Database query |
| **MFA Verification (P95)** | < 100ms | TOTP validation |
| **Session Lookup (P95)** | < 10ms | Redis cache |
| **Password Hash (P95)** | < 200ms | bcrypt cost 12 |
| **Device Fingerprint Hash** | < 5ms | SHA-256 |
| **Failed Login Rate Limit** | 5 attempts / 15min | Per user |
| **Auth Throughput** | 500 req/s | Per IAM instance |

**Performance Optimizations**:
- **Permission Caching**: L1 (memory) + L2 (Redis), TTL 5 minutes
- **Token Caching**: Cache public key in memory for JWT verification
- **Connection Pooling**: Reuse database connections
- **Async Operations**: Event publishing, audit logging (fire-and-forget)
- **Rate Limiting**: Prevent brute force attacks, reduce load
- **Horizontal Scaling**: Multiple IAM service instances

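The failed-login limit from the table above (5 attempts per 15 minutes, per user) could be enforced with a sliding-window counter like the one sketched here. The class name, the in-memory `Map`, and the choice of a sliding rather than fixed window are assumptions; with multiple IAM instances the counters would need to live in shared storage such as Redis.

```typescript
const MAX_FAILED_ATTEMPTS = 5;
const WINDOW_MS = 15 * 60 * 1000; // 15 minutes

// In-memory sliding-window limiter (per process; an assumption for the sketch).
class FailedLoginLimiter {
  private readonly failures = new Map<string, number[]>();

  // Keeps only the failure timestamps that are still inside the window.
  private recent(userId: string, now: number): number[] {
    const kept = (this.failures.get(userId) ?? []).filter((t) => now - t < WINDOW_MS);
    this.failures.set(userId, kept);
    return kept;
  }

  recordFailure(userId: string, now: number): void {
    this.recent(userId, now).push(now);
  }

  // Locked once the user has reached the limit within the last 15 minutes.
  isLocked(userId: string, now: number): boolean {
    return this.recent(userId, now).length >= MAX_FAILED_ATTEMPTS;
  }

  // Call after a successful login.
  reset(userId: string): void {
    this.failures.delete(userId);
  }
}
```

A login handler would check `isLocked` before running bcrypt, so locked accounts do not consume hashing time.
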
## Deployment

```mermaid
graph TD
    subgraph "Security Layer"
        LB[Load Balancer<br/>TLS Termination]
        WAF[WAF / Firewall<br/>Rate Limiting<br/>DDoS Protection]
    end

    subgraph "IAM Service Layer"
        IAM1[IAM Service Pod 1<br/>Stateless]
        IAM2[IAM Service Pod 2<br/>Stateless]
        IAM3[IAM Service Pod 3<br/>Stateless]
    end

    subgraph "Data Layer"
        DB[(Neon PostgreSQL<br/>Encrypted at Rest)]
        Cache[(Redis Cluster<br/>TLS Enabled)]
        Vault[Secrets Manager<br/>K8s Secrets]
    end

    subgraph "Security Monitoring"
        SIEM[SIEM / Security Monitoring]
        Alerts[Alerting System]
    end

    Client[Clients] --> LB
    LB --> WAF
    WAF --> IAM1
    WAF --> IAM2
    WAF --> IAM3

    IAM1 --> DB
    IAM1 --> Cache
    IAM1 --> Vault

    IAM2 --> DB
    IAM2 --> Cache
    IAM2 --> Vault

    IAM3 --> DB
    IAM3 --> Cache
    IAM3 --> Vault

    IAM1 -.->|Security Events| SIEM
    IAM2 -.->|Security Events| SIEM
    IAM3 -.->|Security Events| SIEM

    SIEM -.->|Alerts| Alerts

    style LB fill:#d4edda
    style WAF fill:#fff3cd
    style DB fill:#f0e1ff
    style Cache fill:#fff4e1
    style Vault fill:#f8d7da
    style SIEM fill:#e1f5ff
```

**Deployment Strategy**:

**Security Deployment**:
- **TLS 1.2+ Enforcement**: All connections require TLS
- **Network Policies (K8s)**: Deny all by default, whitelist specific services
- **Pod Security**: Non-root user, read-only filesystem, no privilege escalation, set through the pod `securityContext` (the PodSecurityPolicy API was removed in Kubernetes 1.25)
- **Secrets Management**: Kubernetes secrets with encryption at rest
- **Image Scanning**: Trivy/Clair scan before deployment
- **RBAC (K8s)**: Least privilege for service accounts

**Resource Allocation**:

| Component | CPU | Memory | Replicas |
|-----------|-----|--------|----------|
| **IAM Service** | 500m | 1GB | 3-10 (HPA) |
| **Redis** | 1 core | 2GB | 3 masters + 3 slaves |

**Security Configuration**:
```yaml
# K8s Network Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: iam-service-policy
spec:
  podSelector:
    matchLabels:
      app: iam-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 5000
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgresql
      ports:
        - protocol: TCP
          port: 5432
```