# Kiến trúc Hướng Sự kiện / Event-Driven Architecture

> **VI**: Kiến trúc hướng sự kiện cho giao tiếp bất đồng bộ sử dụng Apache Kafka
>
> **EN**: Event-driven architecture for asynchronous communication using Apache Kafka

## Sơ đồ Tổng quan / Overview Diagram

```mermaid
graph TD
    subgraph "Event Producers"
        IAM[IAM Service]
        Service1[Service A]
    end

    subgraph "Event Broker"
        Kafka[Apache Kafka]
        Topics[Topics: user.events, auth.events]
    end

    subgraph "Event Consumers"
        Consumer1[Notification Service]
        Consumer2[Audit Service]
    end

    IAM -->|Publish| Kafka
    Service1 -->|Publish| Kafka
    Kafka --> Topics
    Topics -->|Subscribe| Consumer1
    Topics -->|Subscribe| Consumer2

    style Kafka fill:#e1f5ff
    style Topics fill:#fff4e1
```

## Mô tả Kiến trúc / Architecture Description

### VI: Phần Tiếng Việt

Nền tảng GoodGo triển khai Kiến trúc Hướng Sự kiện (EDA) cho giao tiếp bất đồng bộ giữa các microservices.

**Nguyên tắc Cốt lõi**:
1. **Event-First Design**: Mọi thay đổi trạng thái đều phát ra domain events
2. **Loose Coupling**: Services giao tiếp qua events thay vì gọi trực tiếp lẫn nhau
3. **Eventual Consistency**: Chấp nhận inconsistency tạm thời giữa các services
4. **Event Sourcing**: Lưu các thay đổi trạng thái dưới dạng chuỗi events
5. **CQRS Pattern**: Tách biệt read/write operations

**Công nghệ**:
- Apache Kafka - Nền tảng event streaming
- Schema Registry - Avro schemas để validation
- KafkaJS - Thư viện Node.js client
- Event Sourcing - Triển khai tùy chỉnh trong IAM

### EN: English Section

The GoodGo platform implements Event-Driven Architecture (EDA) for asynchronous communication between microservices.

**Core Principles**:
1. **Event-First Design**: Every state change emits a domain event
2. **Loose Coupling**: Services communicate through events rather than calling each other directly
3. **Eventual Consistency**: Accept temporary inconsistency between services
4. **Event Sourcing**: Store state changes as a sequence of events
5. **CQRS Pattern**: Separate read/write operations

**Technology Stack**:
- Apache Kafka - Event streaming platform
- Schema Registry - Avro schemas for validation
- KafkaJS - Node.js client library
- Event Sourcing - Custom implementation in IAM

## Luồng Sự kiện / Event Flow

```mermaid
sequenceDiagram
    participant Producer as IAM Service
    participant Kafka as Kafka Broker
    participant Consumer as Notification Service

    Producer->>Kafka: Publish Event (user.created)
    Kafka->>Consumer: Deliver Event
    Consumer->>Consumer: Process Event
    Consumer-->>Kafka: Acknowledge (commit offset)
```

**VI Các Bước**: Publish → Distribute → Consume → Retry (nếu thất bại) → DLQ (sau số lần retry tối đa) → Acknowledge

**EN Steps**: Publish → Distribute → Consume → Retry (if failed) → DLQ (after max retries) → Acknowledge

## Cấu trúc Sự kiện / Event Structure

```typescript
interface BaseEvent {
  eventId: string;        // UUID
  eventType: string;      // e.g. user.created.v1
  eventVersion: string;   // e.g. 1.0.0
  timestamp: string;      // ISO 8601
  source: string;         // e.g. iam-service
  correlationId?: string; // Request correlation ID (optional)
  data: unknown;          // Event payload
}
```

**Ví dụ / Example**:

```json
{
  "eventId": "550e8400-e29b-41d4-a716-446655440000",
  "eventType": "user.created.v1",
  "eventVersion": "1.0.0",
  "timestamp": "2024-01-15T10:30:00Z",
  "source": "iam-service",
  "data": {
    "userId": "user_123",
    "email": "user@example.com"
  }
}
```

## Kafka Topics

```mermaid
graph LR
    UserCreated[user.created<br/>Partitions: 3]
    AuthLogin[auth.login.success<br/>Partitions: 5]
    AuditEvents[audit.events<br/>Partitions: 10]

    style UserCreated fill:#e1f5ff
    style AuthLogin fill:#fff4e1
    style AuditEvents fill:#f8d7da
```

**Quy ước Đặt tên / Naming Convention**: `{domain}.{action}.{version}` (`{action}` có thể gồm nhiều phân đoạn / may span several dot-separated segments, e.g. `login.success`)

**Ví dụ / Examples**:
- `user.created.v1`
- `auth.login.success.v1`
- `audit.event.logged.v1`

## Xử lý Lỗi / Error Handling

```mermaid
graph TD
    Event[Event] --> Process[Process]
    Process -->|Success| Ack[Acknowledge]
    Process -->|Failure| Retry[Retry 3x]
    Retry -->|Max Retries| DLQ[Dead Letter Queue]
    DLQ --> Alert[Alert Team]
```

**Chiến lược / Strategy**:
1. Retry với exponential backoff / Retry with exponential backoff (100ms → 200ms → 400ms)
2. Tối đa 3 lần retry sau lần xử lý đầu tiên / Max 3 retries after the first attempt
3. Chuyển sang DLQ sau retry tối đa / Move to DLQ after max retries
4. Xem xét thủ công và xử lý lại / Manual review and reprocessing

## Bối cảnh Hệ thống / System Context

```mermaid
C4Context
    title Sơ đồ Bối cảnh Event-Driven Architecture

    System(iam, "IAM Service", "Event producer")
    System(service_a, "Service A", "Event producer")
    System(notification, "Notification Service", "Event consumer")
    System(audit, "Audit Service", "Event consumer")
    System_Ext(kafka, "Apache Kafka", "Event streaming platform")
    System_Ext(registry, "Schema Registry", "Schema management")
    System_Ext(monitoring, "Monitoring", "Kafka metrics & alerts")

    Rel(iam, kafka, "Publishes events", "Kafka Protocol")
    Rel(service_a, kafka, "Publishes events", "Kafka Protocol")
    Rel(kafka, notification, "Delivers events", "Kafka Protocol")
    Rel(kafka, audit, "Delivers events", "Kafka Protocol")
    Rel(kafka, registry, "Validates schemas", "HTTP")
    Rel(kafka, monitoring, "Sends metrics", "JMX")
```

**VI Mô tả**:
- **Producers**: IAM Service và các services khác publish domain events
- **Kafka**: Event broker trung tâm, quản lý topics và partitions
- **Consumers**: Notification và Audit services consume events
- **Schema Registry**: Quản lý và validate Avro schemas
- **Monitoring**: Thu thập metrics từ Kafka cluster

**EN Description**:
- **Producers**: IAM Service and other services publish domain
events
- **Kafka**: Central event broker, manages topics and partitions
- **Consumers**: Notification and Audit services consume events
- **Schema Registry**: Manages and validates Avro schemas
- **Monitoring**: Collects metrics from the Kafka cluster

## Đặc điểm Hiệu suất / Performance Characteristics

| Chỉ số / Metric | Mục tiêu / Target | Ghi chú / Notes |
|-----------------|-------------------|-----------------|
| **Event Publish Latency (P95)** | < 10ms | Fire-and-forget, async |
| **Event Delivery Latency (P95)** | < 100ms | End-to-end from publish to consume |
| **Throughput** | 10,000 events/s | Per topic, scalable with partitions |
| **Consumer Lag** | < 1000 messages | Per partition, monitored |
| **Event Size** | < 1MB | Recommended max size |
| **Retention** | 7 days | Default, configurable per topic |
| **Replication Factor** | 3 | For fault tolerance |

**VI Tối ưu hóa Hiệu suất**:
- **Batch Publishing**: Gom nhiều events để giảm network overhead
- **Compression**: Sử dụng Snappy hoặc LZ4 compression
- **Partitioning**: Phân chia topics thành multiple partitions cho parallel processing
- **Consumer Groups**: Multiple consumers trong cùng group để scale horizontally (tối đa một consumer active cho mỗi partition)
- **Async Publishing**: Fire-and-forget pattern, không block request handlers

**EN Performance Optimizations**:
- **Batch Publishing**: Group multiple events to reduce network overhead
- **Compression**: Use Snappy or LZ4 compression
- **Partitioning**: Divide topics into multiple partitions for parallel processing
- **Consumer Groups**: Multiple consumers in the same group for horizontal scaling (at most one active consumer per partition)
- **Async Publishing**: Fire-and-forget pattern; don't block request handlers

## Cân nhắc Bảo mật / Security Considerations

### VI: Phần Tiếng Việt

**Event Encryption**:
- TLS in-transit cho tất cả Kafka connections
- Optional payload encryption cho sensitive data
- End-to-end encryption với custom encryption layer

**Access Control**:
- Kafka ACLs (Access Control Lists) per topic
- SASL/SCRAM
authentication cho producers và consumers
- Separate credentials cho mỗi service
- Principle of least privilege - chỉ grant quyền cần thiết

**Schema Validation**:
- Avro schemas trong Schema Registry
- Schema evolution với backward/forward compatibility
- Reject events không match schema

**Audit**:
- Log tất cả event publishes và consumes
- Correlation IDs để trace event flow
- Retention policy cho audit logs (7 years)

**Data Retention**:
- Default 7 days retention
- Configurable per topic
- Automatic deletion sau retention period
- Compliance với GDPR (right to erasure)

### EN: English Section

**Event Encryption**:
- TLS in-transit for all Kafka connections
- Optional payload encryption for sensitive data
- End-to-end encryption with a custom encryption layer

**Access Control**:
- Kafka ACLs (Access Control Lists) per topic
- SASL/SCRAM authentication for producers and consumers
- Separate credentials per service
- Principle of least privilege - grant only necessary permissions

**Schema Validation**:
- Avro schemas in Schema Registry
- Schema evolution with backward/forward compatibility
- Reject events that don't match the schema

**Audit**:
- Log all event publishes and consumes
- Correlation IDs to trace event flow
- Retention policy for audit logs (7 years)

**Data Retention**:
- Default 7 days retention
- Configurable per topic
- Automatic deletion after the retention period
- GDPR compliance (right to erasure)

## Triển khai / Deployment

```mermaid
graph TD
    subgraph "Kafka Cluster"
        subgraph "Brokers"
            Broker1[Kafka Broker 1<br/>Leader for partitions 0,3,6]
            Broker2[Kafka Broker 2<br/>Leader for partitions 1,4,7]
            Broker3[Kafka Broker 3<br/>Leader for partitions 2,5,8]
        end

        subgraph "Coordination"
            ZK[Zookeeper Ensemble<br/>3 nodes]
        end

        Broker1 --> ZK
        Broker2 --> ZK
        Broker3 --> ZK
    end

    subgraph "Producers"
        IAM[IAM Service]
        ServiceA[Service A]
    end

    subgraph "Consumers"
        Notification[Notification Service<br/>Consumer Group: notifications]
        Audit[Audit Service<br/>Consumer Group: audit]
    end

    IAM --> Broker1
    IAM --> Broker2
    IAM --> Broker3
    ServiceA --> Broker1
    ServiceA --> Broker2
    ServiceA --> Broker3

    Broker1 --> Notification
    Broker2 --> Notification
    Broker3 --> Notification
    Broker1 --> Audit
    Broker2 --> Audit
    Broker3 --> Audit

    style Broker1 fill:#e1f5ff
    style Broker2 fill:#fff4e1
    style Broker3 fill:#d4edda
    style ZK fill:#f0e1ff
```

### VI: Chiến lược Triển khai

**Kafka Cluster Configuration**:
- **Brokers**: 3 brokers minimum (5 for production)
- **Replication Factor**: 3 (for fault tolerance)
- **Min In-Sync Replicas**: 2 (đảm bảo durability khi producer dùng `acks=all`)
- **Partitions**: 3-10 per topic (based on throughput needs)
- **Zookeeper**: 3-node ensemble (for coordination)

**Resource Allocation**:

| Component | CPU | Memory | Disk |
|-----------|-----|--------|------|
| **Kafka Broker** | 2 cores | 4GB RAM | 100GB SSD |
| **Zookeeper** | 1 core | 2GB RAM | 20GB SSD |
| **Schema Registry** | 0.5 core (500m) | 1GB RAM | 10GB |

**Topic Configuration**:

```yaml
user.created:
  partitions: 3
  replication-factor: 3
  retention-ms: 604800000     # 7 days
  compression-type: snappy

auth.login.success:
  partitions: 5
  replication-factor: 3
  retention-ms: 604800000     # 7 days
  compression-type: snappy

audit.events:
  partitions: 10
  replication-factor: 3
  retention-ms: 220752000000  # 7 years (7 x 365 days)
  compression-type: lz4
```

**High Availability**:
- Multiple brokers với partition replication
- Automatic leader election khi broker fails
- Consumer group rebalancing
- Monitoring và alerting cho broker health

### EN: Deployment Strategy

**Kafka Cluster Configuration**:
- **Brokers**: 3 brokers minimum (5 for production)
- **Replication Factor**: 3 (for fault tolerance)
- **Min In-Sync Replicas**: 2 (ensures durability when producers use `acks=all`)
- **Partitions**: 3-10 per topic (based on throughput needs)
- **Zookeeper**: 3-node ensemble (for coordination)

**Resource Allocation**:

| Component | CPU | Memory | Disk |
|-----------|-----|--------|------|
| **Kafka Broker** | 2 cores | 4GB RAM | 100GB SSD |
| **Zookeeper** | 1 core | 2GB RAM | 20GB SSD |
| **Schema Registry** | 0.5 core (500m) | 1GB RAM | 10GB |

**Topic Configuration**:

```yaml
user.created:
  partitions: 3
  replication-factor: 3
  retention-ms: 604800000     # 7 days
  compression-type: snappy

auth.login.success:
  partitions: 5
  replication-factor: 3
  retention-ms: 604800000     # 7 days
  compression-type: snappy

audit.events:
  partitions: 10
  replication-factor: 3
  retention-ms: 220752000000  # 7 years (7 x 365 days)
  compression-type: lz4
```

**High Availability**:
- Multiple brokers with partition replication
- Automatic leader election when a broker fails
- Consumer group rebalancing
- Monitoring and alerting for broker health

## Giám sát & Khả năng quan sát / Monitoring & Observability

### VI: Chỉ số Chính

**Kafka Broker Metrics**:
- `kafka_server_brokertopicmetrics_messagesinpersec` - Messages in/sec
- `kafka_server_brokertopicmetrics_bytesinpersec` - Bytes in/sec
- `kafka_server_brokertopicmetrics_bytesoutpersec` - Bytes out/sec
- `kafka_controller_kafkacontroller_activecontrollercount` - Active controller (tổng trên toàn cluster phải bằng 1)
- `kafka_server_replicamanager_underreplicatedpartitions` - Under-replicated partitions

**Consumer Metrics**:
- `kafka_consumer_fetch_manager_records_lag_max` - Max consumer lag
- `kafka_consumer_fetch_manager_records_consumed_rate` - Records consumed/sec
- `kafka_consumer_coordinator_commit_latency_avg` - Commit latency

**Producer Metrics**:
- `kafka_producer_record_send_total` - Total records sent
- `kafka_producer_record_error_total` - Total send errors
- `kafka_producer_request_latency_avg` - Request latency

**Application Metrics**:

```typescript
// VI: Custom metrics cho event processing (giả định dùng thư viện prom-client)
// EN: Custom metrics for event processing (assuming the prom-client library)
import { Counter, Histogram } from 'prom-client';

const eventPublished = new Counter({
  name: 'events_published_total',
  help: 'Total events published',
  labelNames: ['event_type', 'topic']
});

const eventConsumed = new Counter({
  name: 'events_consumed_total',
  help: 'Total events consumed',
  labelNames: ['event_type', 'topic', 'consumer_group']
});

const eventProcessingDuration = new Histogram({
  name: 'event_processing_duration_seconds',
  help: 'Event processing duration',
  labelNames: ['event_type'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5]
});
```

**Alerting Rules**:

```yaml
# VI: Quy tắc cảnh báo (định dạng Prometheus rule file)
# EN: Alerting rules (Prometheus rule file format; group name is illustrative)
groups:
  - name: kafka-alerts
    rules:
      # High consumer lag
      - alert: HighConsumerLag
        expr: kafka_consumer_fetch_manager_records_lag_max > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High consumer lag detected"
          description: "Consumer lag is {{ $value }} messages"

      # Broker down
      - alert: KafkaBrokerDown
        expr: kafka_server_kafkaserver_brokerstate != 3
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Kafka broker is down"

      # Under-replicated partitions
      - alert: UnderReplicatedPartitions
        expr: kafka_server_replicamanager_underreplicatedpartitions > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Under-replicated partitions detected"

      # Offline partitions
      - alert: OfflinePartitions
        expr: kafka_controller_kafkacontroller_offlinepartitionscount > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Offline partitions detected"
```

**Dashboards**:
- Kafka Cluster Overview (brokers, topics, partitions)
- Producer Performance (throughput, latency, errors)
- Consumer Performance (lag, throughput, errors)
- Topic Metrics (messages/sec, bytes/sec, retention)

**Logging**:

```typescript
// VI: Structured logging cho events
// EN: Structured logging for events
// `logger`, `event` và `duration` đến từ handler xung quanh / come from the surrounding handler
logger.info('Event published', {
  eventId: event.eventId,
  eventType: event.eventType,
  topic: 'user.created',
  correlationId: event.correlationId
});

logger.info('Event consumed', {
  eventId: event.eventId,
  eventType: event.eventType,
  topic: 'user.created',
  consumerGroup: 'notifications',
  processingTime: duration
});
```

### EN: Key Metrics

**Kafka Broker Metrics**:
- `kafka_server_brokertopicmetrics_messagesinpersec` - Messages in/sec
- `kafka_server_brokertopicmetrics_bytesinpersec` - Bytes in/sec
- `kafka_server_brokertopicmetrics_bytesoutpersec` - Bytes out/sec
- `kafka_controller_kafkacontroller_activecontrollercount` - Active controller (the sum across the cluster should be exactly 1)
- `kafka_server_replicamanager_underreplicatedpartitions` - Under-replicated partitions

**Consumer Metrics**:
- `kafka_consumer_fetch_manager_records_lag_max` - Max consumer lag
- `kafka_consumer_fetch_manager_records_consumed_rate` - Records consumed/sec
- `kafka_consumer_coordinator_commit_latency_avg` - Commit latency

**Producer Metrics**:
- `kafka_producer_record_send_total` - Total records sent
- `kafka_producer_record_error_total` - Total send errors
- `kafka_producer_request_latency_avg` - Request latency

**Application Metrics**:

```typescript
// Custom metrics for event processing (assuming the prom-client library)
import { Counter, Histogram } from 'prom-client';

const eventPublished = new Counter({
  name: 'events_published_total',
  help: 'Total events published',
  labelNames: ['event_type', 'topic']
});

const eventConsumed = new Counter({
  name: 'events_consumed_total',
  help: 'Total events consumed',
  labelNames: ['event_type', 'topic', 'consumer_group']
});

const eventProcessingDuration = new Histogram({
  name: 'event_processing_duration_seconds',
  help: 'Event processing duration',
  labelNames: ['event_type'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5]
});
```

**Alerting Rules**:

```yaml
# Alerting rules (Prometheus rule file format; group name is illustrative)
groups:
  - name: kafka-alerts
    rules:
      # High consumer lag
      - alert: HighConsumerLag
        expr: kafka_consumer_fetch_manager_records_lag_max > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High consumer lag detected"
          description: "Consumer lag is {{ $value }} messages"

      # Broker down
      - alert: KafkaBrokerDown
        expr: kafka_server_kafkaserver_brokerstate != 3
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Kafka broker is down"

      # Under-replicated partitions
      - alert: UnderReplicatedPartitions
        expr: kafka_server_replicamanager_underreplicatedpartitions > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Under-replicated partitions detected"

      # Offline partitions
      - alert: OfflinePartitions
        expr: kafka_controller_kafkacontroller_offlinepartitionscount > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Offline partitions detected"
```

**Dashboards**:
- Kafka Cluster
Overview (brokers, topics, partitions)
- Producer Performance (throughput, latency, errors)
- Consumer Performance (lag, throughput, errors)
- Topic Metrics (messages/sec, bytes/sec, retention)

**Logging**:

```typescript
// Structured logging for events
// `logger`, `event`, and `duration` come from the surrounding handler
logger.info('Event published', {
  eventId: event.eventId,
  eventType: event.eventType,
  topic: 'user.created',
  correlationId: event.correlationId
});

logger.info('Event consumed', {
  eventId: event.eventId,
  eventType: event.eventType,
  topic: 'user.created',
  consumerGroup: 'notifications',
  processingTime: duration
});
```

## Tài liệu Liên quan / Related Documentation

- [System Design](./system-design.md) - Kiến trúc tổng thể / Overall architecture
- [IAM Architecture](./iam-proposal.md) - Triển khai Event sourcing / Event sourcing implementation

---

**Cập nhật Lần cuối / Last Updated**: 2026-01-07

**Tác giả / Authors**: GoodGo Architecture Team
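## Phụ lục / Appendix: Retry & DLQ Sketch

The retry strategy described under Xử lý Lỗi / Error Handling works as follows: one initial attempt, then up to 3 retries with 100ms → 200ms → 400ms backoff, then the DLQ. The TypeScript below is a minimal, framework-agnostic sketch of that policy and not the GoodGo implementation. `processWithRetry`, `backoffDelayMs`, and the `handle`/`sendToDlq` callbacks are hypothetical names. The sketch leaves out the KafkaJS wiring (producing to a DLQ topic, committing offsets).

```typescript
// Hypothetical sketch of the documented retry policy: one initial attempt,
// up to 3 retries with exponential backoff (100ms -> 200ms -> 400ms), then DLQ.

interface RetryOptions {
  maxRetries: number;  // retries after the first attempt
  baseDelayMs: number; // delay before the first retry
}

type Outcome = 'processed' | 'dead-lettered';

// Delay before retry number `retry` (1-based): base * 2^(retry - 1).
function backoffDelayMs(retry: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (retry - 1);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// `handle` processes one event; `sendToDlq` parks it after the last failure.
// Both are caller-supplied (hypothetical names, not a KafkaJS API).
// `wait` is injectable so tests can skip real timers.
async function processWithRetry<E>(
  event: E,
  handle: (event: E) => Promise<void>,
  sendToDlq: (event: E, error: unknown) => Promise<void>,
  opts: RetryOptions = { maxRetries: 3, baseDelayMs: 100 },
  wait: (ms: number) => Promise<void> = sleep,
): Promise<Outcome> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    if (attempt > 0) {
      // Back off before each retry, never before the first attempt.
      await wait(backoffDelayMs(attempt, opts.baseDelayMs));
    }
    try {
      await handle(event);
      return 'processed';
    } catch (err) {
      lastError = err; // keep the latest failure so it travels with the DLQ message
    }
  }
  await sendToDlq(event, lastError);
  return 'dead-lettered';
}
```

Injecting `wait` keeps the function testable without real timers. Inside a KafkaJS `eachMessage` handler, the offset would typically be committed only after `processWithRetry` resolves. An event is therefore acknowledged only once it is either processed or parked in the DLQ.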