# Microservices Communication
> Communication patterns and protocols for inter-service communication
## Overview Diagram
```mermaid
graph TD
Client[Client Apps] --> Gateway[API Gateway<br/>Traefik]
Gateway --> ServiceA[Service A]
Gateway --> ServiceB[Service B]
ServiceA <-->|REST/HTTP| ServiceB
ServiceA -->|Events| Kafka[Kafka Broker]
ServiceB <-.->|Sub| Kafka
ServiceA --> SD[Service Discovery<br/>Docker DNS / K8s DNS]
ServiceB --> SD
classDef blue fill:#253041,stroke:#4b6584,color:#ffffff
classDef orange fill:#3a2e1e,stroke:#7a5f3c,color:#ffffff
classDef green fill:#1e3a29,stroke:#3c7a52,color:#ffffff
class Gateway blue
class Kafka orange
class SD green
```
## System Context
```mermaid
C4Context
title System Context Diagram for GoodGo Microservices Communication
Person(client_web, "Web Client", "Browser/Mobile App")
Person(client_api, "API Consumer", "External API clients")
System_Boundary(goodgo, "GoodGo Platform") {
System(gateway, "API Gateway", "Traefik - Routes requests to services")
System(services, "Microservices", "IAM, User, Order, Product services")
System(kafka, "Event Bus", "Kafka - Async communication")
System(discovery, "Service Discovery", "Docker DNS / K8s DNS")
}
System_Ext(db, "Database", "Neon PostgreSQL")
System_Ext(cache, "Cache", "Redis")
System_Ext(external_api, "External APIs", "Payment, Email, SMS")
Rel(client_web, gateway, "Uses", "HTTPS")
Rel(client_api, gateway, "Calls", "HTTPS/REST")
Rel(gateway, services, "Routes to", "HTTP")
Rel(services, kafka, "Pub/Sub", "Kafka Protocol")
Rel(services, discovery, "Lookup", "DNS")
Rel(services, db, "Reads/Writes", "PostgreSQL")
Rel(services, cache, "Gets/Sets", "Redis Protocol")
Rel(services, external_api, "Integrates", "HTTPS")
```
The GoodGo platform uses a microservices architecture in which all client requests flow through an API Gateway (Traefik), which routes them to the appropriate microservice. Services communicate synchronously via REST/HTTP for request-response patterns and asynchronously via Kafka for event-driven workflows. Service discovery is handled by Docker DNS in local environments and by Kubernetes DNS in staging and production.
## Communication Protocols
### Protocol Comparison
| Protocol | Latency | Complexity | Use Case |
|----------|---------|------------|----------|
| **REST** | Medium | Low | External APIs, CRUD |
| **gRPC** | Low | High | Internal high-performance |
| **Events** | Async | Medium | Decoupled workflows |
| **GraphQL** | Medium | Medium | Complex data fetching |
### REST/HTTP Pattern
```mermaid
sequenceDiagram
participant Client
participant Gateway as API Gateway
participant ServiceA as Service A
participant ServiceB as Service B
Client->>Gateway: GET /api/v1/users/123
Gateway->>ServiceA: Forward Request
ServiceA->>ServiceB: GET /internal/permissions/123
ServiceB-->>ServiceA: Permissions
ServiceA-->>Gateway: User + Permissions
Gateway-->>Client: JSON Response
```
Synchronous request-response over HTTP/REST: the caller waits until the downstream service replies or the request times out.
**Implementation**:
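```typescript
// Service-to-service HTTP client
import axios from 'axios';

// Minimal user shape for illustration; extend it to match the User service response
interface User {
  id: string;
}

export class UserServiceClient {
  private client = axios.create({
    baseURL: process.env.USER_SERVICE_URL,
    timeout: 5000, // Fail after 5 s instead of blocking the caller indefinitely
    headers: {
      'x-service-auth': process.env.INTERNAL_API_KEY // Shared internal API key
    }
  });

  // Fetch a single user by ID from the User service
  async getUser(userId: string): Promise<User> {
    const response = await this.client.get<User>(`/users/${userId}`);
    return response.data;
  }
}
```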
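
The sequence diagram above, in which Service A enriches a user record with permissions fetched from Service B, could be sketched roughly as follows. This is an illustration under assumptions, not the platform's actual handler: the `User` and `UserPermissions` shapes, the `HttpGet` function type, and the response format of `/internal/permissions/:id` are all hypothetical. The HTTP call is injected so the composition logic can be exercised without a network.

```typescript
// Hypothetical shapes; the real services may return different fields.
interface User { id: string; name: string; }
interface UserPermissions { userId: string; roles: string[]; }
interface UserWithPermissions extends User { roles: string[]; }

// Injected transport: in a real service this could wrap axios or fetch.
type HttpGet = <T>(url: string) => Promise<T>;

async function getUserWithPermissions(
  userId: string,
  loadUser: (id: string) => Promise<User>,
  httpGet: HttpGet,
  permissionServiceUrl: string
): Promise<UserWithPermissions> {
  // Load the local record and the remote permissions concurrently.
  const [user, permissions] = await Promise.all([
    loadUser(userId),
    httpGet<UserPermissions>(
      `${permissionServiceUrl}/internal/permissions/${encodeURIComponent(userId)}`
    )
  ]);
  // Merge both results into the response returned to the gateway.
  return { ...user, roles: permissions.roles };
}
```

Running the two lookups with `Promise.all` keeps the added latency close to that of the slower call rather than the sum of both.
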
### Event-Driven Pattern
```mermaid
sequenceDiagram
participant ServiceA
participant Kafka
participant ServiceB
participant ServiceC
ServiceA->>Kafka: Publish: user.created
Kafka->>ServiceB: Deliver event
Kafka->>ServiceC: Deliver event
par Parallel Processing
ServiceB->>ServiceB: Send welcome email
ServiceC->>ServiceC: Create user profile
end
```
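Asynchronous, event-based communication via Kafka: the publisher does not wait for consumers, and each subscribed service processes the event independently.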
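
The fan-out shown in the diagram can be illustrated with a small in-memory stand-in for the broker. This is only a sketch of the pattern, not a Kafka client: `InMemoryEventBus`, `DomainEvent`, and the envelope fields are hypothetical names, and a real broker adds persistence, partitions, and consumer groups on top of this idea.

```typescript
// Hypothetical event envelope; field names are illustrative.
interface DomainEvent<T> {
  type: string;        // e.g. "user.created"
  occurredAt: string;  // ISO-8601 timestamp
  payload: T;
}

type Handler<T> = (event: DomainEvent<T>) => Promise<void>;

// In-memory stand-in for a broker: every subscriber to a topic receives each event.
class InMemoryEventBus {
  private handlers = new Map<string, Handler<unknown>[]>();

  subscribe<T>(topic: string, handler: Handler<T>): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler as Handler<unknown>);
    this.handlers.set(topic, list);
  }

  // Deliver to all subscribers in parallel. A failing subscriber
  // affects neither the publisher nor the other subscribers.
  async publish<T>(topic: string, payload: T): Promise<void> {
    const event: DomainEvent<T> = {
      type: topic,
      occurredAt: new Date().toISOString(),
      payload
    };
    const list = this.handlers.get(topic) ?? [];
    await Promise.all(list.map((handler) => handler(event).catch(() => undefined)));
  }
}
```

In the diagram's terms, Service B and Service C would each call `subscribe('user.created', ...)`, and Service A would call `publish('user.created', { userId })` without knowing who is listening.
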
### Service Discovery
**Local (Docker Compose)**:
```yaml
# Services discover via Docker DNS
http://service-name:port
http://iam-service:3001
```
**Kubernetes**:
```yaml
# Services discover via K8s DNS
http://service-name.namespace.svc.cluster.local:port
http://iam-service.default.svc.cluster.local:3001
```
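
A service that runs in both environments needs to build the right hostname for each. A minimal sketch, assuming the URL formats shown above: `buildServiceUrl`, `RuntimeEnv`, and `ServiceAddress` are hypothetical names, and `cluster.local` is the Kubernetes default cluster domain, which a cluster may override.

```typescript
type RuntimeEnv = 'docker' | 'kubernetes';

interface ServiceAddress {
  name: string;
  port: number;
  namespace?: string; // Only used on Kubernetes; falls back to "default"
}

function buildServiceUrl(env: RuntimeEnv, svc: ServiceAddress): string {
  if (env === 'docker') {
    // Docker Compose resolves the service name on the shared network.
    return `http://${svc.name}:${svc.port}`;
  }
  // Kubernetes DNS: <service>.<namespace>.svc.<cluster-domain>
  const namespace = svc.namespace ?? 'default';
  return `http://${svc.name}.${namespace}.svc.cluster.local:${svc.port}`;
}
```

For example, `buildServiceUrl('kubernetes', { name: 'iam-service', port: 3001 })` produces the Kubernetes URL listed above.
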
## API Gateway Pattern
```mermaid
graph LR
Client --> Gateway[API Gateway<br/>Traefik]
subgraph "Gateway Features"
Gateway --> Route[Routing]
Gateway --> LB[Load Balancing]
Gateway --> Auth[Authentication]
Gateway --> Rate[Rate Limiting]
Gateway --> CORS
end
Route --> Service1[Service 1]
Route --> Service2[Service 2]
LB --> Service1A[Instance A]
LB --> Service1B[Instance B]
classDef blue fill:#253041,stroke:#4b6584,color:#ffffff
class Gateway blue
```
The gateway is the single entry point for all client requests and handles routing, load balancing, authentication, rate limiting, and CORS.
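
To make the routing and load-balancing roles concrete, the sketch below resolves a request path to a backend instance using prefix matching and round-robin rotation. It is not Traefik configuration and does not claim to mirror Traefik's internals; `SimpleRouter`, `Route`, and the backend URLs are hypothetical and only illustrate the two responsibilities.

```typescript
interface Route {
  pathPrefix: string;
  backends: string[]; // Instance URLs serving this prefix
}

class SimpleRouter {
  private counters = new Map<string, number>();

  constructor(private routes: Route[]) {}

  // Pick the longest matching prefix, then rotate through its backends.
  resolve(path: string): string | undefined {
    const match = this.routes
      .filter((route) => path.startsWith(route.pathPrefix))
      .sort((a, b) => b.pathPrefix.length - a.pathPrefix.length)[0];
    if (!match || match.backends.length === 0) return undefined;

    const next = this.counters.get(match.pathPrefix) ?? 0;
    this.counters.set(match.pathPrefix, (next + 1) % match.backends.length);
    return match.backends[next];
  }
}
```

Successive calls for the same prefix alternate between its instances, which corresponds to the `LB --> Instance A / Instance B` edges in the diagram.
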
## Performance Characteristics
Performance expectations and optimization strategies for inter-service communication.
| Metric | Target | Notes |
|--------|--------|-------|
| **REST API Response Time** | < 100ms | P95 for internal service-to-service calls |
| **Event Publishing Latency** | < 50ms | Time to publish to Kafka |
| **Service Discovery Lookup** | < 10ms | DNS resolution time |
| **Gateway Routing Overhead** | < 20ms | Additional latency added by Traefik |
| **Throughput** | 10,000 req/s | Per service instance |
| **Kafka Event Processing** | < 500ms | P95 end-to-end event processing |
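
Several targets above are stated as P95 values. One way to check a set of measured latencies against such a target is the nearest-rank percentile, sketched below. The function names are hypothetical, and monitoring systems usually estimate percentiles from histogram buckets rather than raw samples.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// True when the p-th percentile is strictly below the target, e.g. P95 < 100 ms.
function meetsTarget(latenciesMs: number[], p: number, targetMs: number): boolean {
  return percentile(latenciesMs, p) < targetMs;
}
```

With few samples a single slow request dominates the P95, so a check like this is only meaningful over a reasonably large window.
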
**Optimization Strategies**:
- **Connection Pooling**: Reuse HTTP connections between services
- **Circuit Breaker**: Prevent cascading failures with Opossum library
- **Retry with Backoff**: Exponential backoff for transient failures
- **Compression**: Enable gzip for large payloads
- **Caching**: Cache service discovery results and responses
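
The retry strategy listed above can be sketched as follows. This is a minimal illustration with hypothetical names (`retryWithBackoff`, `RetryOptions`, `isTransient`), not the platform's implementation; it doubles the delay after each failed attempt and caps it at `maxDelayMs`.

```typescript
interface RetryOptions {
  maxAttempts: number; // Total attempts, including the first call
  baseDelayMs: number; // Delay before the first retry
  maxDelayMs: number;  // Upper bound for any single delay
}

// Delay before retry number `retry` (1-based): base * 2^(retry - 1), capped.
function backoffDelay(retry: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (retry - 1));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  opts: RetryOptions,
  isTransient: (error: unknown) => boolean = () => true
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Stop on non-transient errors or when attempts are exhausted.
      if (!isTransient(error) || attempt === opts.maxAttempts) break;
      await sleep(backoffDelay(attempt, opts));
    }
  }
  throw lastError;
}
```

Retries should be limited to idempotent operations and transient errors such as timeouts. Adding random jitter to each delay is a common refinement that keeps many clients from retrying at the same instant.
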
## Security Considerations
Security measures for protecting inter-service communication.
### Service-to-Service Authentication
- **Internal API Keys**: Services authenticate using `x-service-auth` header
- **JWT Tokens**: For user context propagation between services
- **Mutual TLS (mTLS)**: Optional for production environments (Kubernetes service mesh)
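
On the receiving side, the `x-service-auth` header has to be checked before a request is handled. The sketch below shows one way to do that with a constant-time comparison; `isValidServiceKey` and `authorizeInternalRequest` are hypothetical helpers, and how the expected key is loaded and rotated is left open.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

// Compare in constant time. Hashing first gives both buffers the same
// length, which timingSafeEqual requires.
function isValidServiceKey(presented: string | undefined, expected: string): boolean {
  if (!presented) return false;
  const a = createHash('sha256').update(presented).digest();
  const b = createHash('sha256').update(expected).digest();
  return timingSafeEqual(a, b);
}

// Node's http module exposes incoming header names in lower case.
function authorizeInternalRequest(
  headers: Record<string, string | string[] | undefined>,
  expectedKey: string
): { status: 200 | 401 } {
  const value = headers['x-service-auth'];
  const presented = Array.isArray(value) ? value[0] : value;
  return { status: isValidServiceKey(presented, expectedKey) ? 200 : 401 };
}
```

A plain `===` comparison would also work functionally, but it can leak how many leading characters matched through response timing.
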
### Network Security
- **Network Policies**: Kubernetes NetworkPolicies restrict service-to-service traffic
- **Service Mesh**: Istio/Linkerd for advanced security policies (optional)
- **Private Networks**: Services communicate within private VPC/cluster network
### Data Protection
- **Encryption in Transit**: TLS 1.2+ for all external communication
- **Event Payload Encryption**: Sensitive data encrypted before publishing to Kafka
- **API Gateway**: Traefik handles SSL termination and request validation
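
Encrypting sensitive fields before they are published could look like the sketch below, which uses AES-256-GCM from Node's built-in `crypto` module. The choice of cipher, the `EncryptedPayload` shape, and the function names are assumptions for illustration; the document does not specify the scheme the platform uses, and key distribution is out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// All fields are base64-encoded so the result can be embedded in a JSON event.
interface EncryptedPayload { iv: string; tag: string; data: string; }

function encryptPayload(plain: object, key: Buffer): EncryptedPayload {
  const iv = randomBytes(12); // 96-bit nonce, must be unique per message
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([
    cipher.update(JSON.stringify(plain), 'utf8'),
    cipher.final()
  ]);
  return {
    iv: iv.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'),
    data: data.toString('base64')
  };
}

function decryptPayload<T>(enc: EncryptedPayload, key: Buffer): T {
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(enc.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(enc.tag, 'base64'));
  // final() throws if the key is wrong or the ciphertext was modified.
  const plain = Buffer.concat([
    decipher.update(Buffer.from(enc.data, 'base64')),
    decipher.final()
  ]);
  return JSON.parse(plain.toString('utf8')) as T;
}
```

The key must be 32 bytes for AES-256. Because GCM is authenticated, a consumer holding the wrong key or receiving a tampered event gets an error instead of corrupted data.
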
### Security Best Practices
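```typescript
// Service client with authentication
import axios from 'axios';
import * as https from 'node:https';
import { randomUUID } from 'node:crypto';

export class SecureServiceClient {
  private client = axios.create({
    baseURL: process.env.SERVICE_URL,
    timeout: 5000,
    headers: {
      'x-service-auth': process.env.INTERNAL_API_KEY
    },
    httpsAgent: new https.Agent({
      rejectUnauthorized: true // Verify SSL certificates
    })
  });

  // Send the correlation ID per request, not per client instance, so that each
  // request chain is traceable. Pass the caller's ID through when one exists.
  async get<T>(path: string, correlationId: string = randomUUID()): Promise<T> {
    const response = await this.client.get<T>(path, {
      headers: { 'x-correlation-id': correlationId }
    });
    return response.data;
  }
}
```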
## Deployment
How microservices communication is deployed and scaled across environments.
```mermaid
graph TD
subgraph "Production Cluster"
LB[Load Balancer] --> Gateway[API Gateway<br/>3 replicas]
Gateway --> ServiceA1[Service A<br/>Instance 1]
Gateway --> ServiceA2[Service A<br/>Instance 2]
Gateway --> ServiceB1[Service B<br/>Instance 1]
Gateway --> ServiceB2[Service B<br/>Instance 2]
ServiceA1 & ServiceA2 --> Kafka[Kafka Cluster<br/>3 brokers]
ServiceB1 & ServiceB2 --> Kafka
ServiceA1 & ServiceA2 --> DB[(PostgreSQL<br/>Primary + Replica)]
ServiceB1 & ServiceB2 --> DB
ServiceA1 & ServiceA2 --> Redis[(Redis Cluster<br/>3 nodes)]
ServiceB1 & ServiceB2 --> Redis
end
classDef blue fill:#253041,stroke:#4b6584,color:#ffffff
classDef orange fill:#3a2e1e,stroke:#7a5f3c,color:#ffffff
classDef green fill:#1e3a29,stroke:#3c7a52,color:#ffffff
classDef red fill:#3a1e1e,stroke:#7a3c3c,color:#ffffff
class Gateway blue
class Kafka orange
class DB green
class Redis red
```
### Deployment Environments
| Environment | Gateway | Services | Kafka | Service Discovery |
|-------------|---------|----------|-------|-------------------|
| **Local** | Traefik (Docker) | Single instance per service | Single broker | Docker DNS |
| **Staging** | Traefik (2 replicas) | 2 replicas per service | 3 brokers | Kubernetes DNS |
| **Production** | Traefik (3+ replicas) | 3+ replicas per service | 5+ brokers | Kubernetes DNS + Service Mesh |
### Scaling Strategy
- **Horizontal Pod Autoscaler (HPA)**: Auto-scale based on CPU/memory
- **Kafka Partitions**: Scale event processing by increasing partitions
- **Load Balancing**: Kubernetes Service load balances across pod replicas
- **Gateway Scaling**: Traefik scales independently from backend services
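
The Horizontal Pod Autoscaler derives its replica count from the ratio of the observed metric to the target, following the formula given in the Kubernetes documentation: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below applies that formula and clamps the result to a replica range. The function name and parameters are hypothetical, and the real controller additionally applies a tolerance band and stabilization windows that are omitted here.

```typescript
function desiredReplicas(
  currentReplicas: number,
  currentMetric: number, // e.g. average CPU utilization in percent
  targetMetric: number,  // e.g. target CPU utilization in percent
  minReplicas: number,
  maxReplicas: number
): number {
  if (targetMetric <= 0) throw new Error('targetMetric must be positive');
  const raw = Math.ceil(currentReplicas * (currentMetric / targetMetric));
  // Keep the result inside the configured replica range.
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}
```

For example, three replicas running at 90% CPU against a 60% target give `ceil(3 * 1.5) = 5` replicas.
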
## Monitoring & Observability
How to monitor and observe microservices communication.
### Key Metrics
**Service-to-Service Metrics**:
- `http_request_duration_seconds` - Request latency histogram
- `http_requests_total` - Total requests counter
- `http_request_errors_total` - Failed requests counter
- `service_client_timeout_total` - Timeout counter
**Gateway Metrics**:
- `traefik_service_requests_total` - Requests per service
- `traefik_service_request_duration_seconds` - Request duration per backend service (histogram)
- `traefik_service_retries_total` - Retry attempts
**Kafka Metrics**:
- `kafka_producer_record_send_total` - Events published
- `kafka_consumer_lag` - Consumer lag
- `kafka_consumer_records_consumed_total` - Events consumed
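
Consumer lag is the difference between the newest offset in a partition and the offset the consumer group has committed. A minimal sketch of that calculation, assuming the offsets have already been fetched; `PartitionOffsets` and `consumerLag` are hypothetical names, and offsets are modelled as plain numbers, which is exact only up to 2^53.

```typescript
interface PartitionOffsets {
  partition: number;
  logEndOffset: number;    // Offset of the next message to be written
  committedOffset: number; // Last offset committed by the consumer group
}

function consumerLag(offsets: PartitionOffsets[]): {
  perPartition: Map<number, number>;
  total: number;
} {
  const perPartition = new Map<number, number>();
  let total = 0;
  for (const o of offsets) {
    // Clamp at zero in case the two offsets were read at slightly different times.
    const lag = Math.max(0, o.logEndOffset - o.committedOffset);
    perPartition.set(o.partition, lag);
    total += lag;
  }
  return { perPartition, total };
}
```

Lag that grows steadily on every partition suggests consumers are too slow overall, while lag on a single partition often points to a hot key or a stuck consumer.
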
### Health Checks
**Service Endpoints**:
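```typescript
// Liveness - is service running?
app.get('/health/live', (req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});

// Readiness - can service handle traffic?
// checkDatabase, checkRedis, and checkKafka are service-specific helpers that resolve to a boolean
app.get('/health/ready', async (req, res) => {
  try {
    // Run the dependency checks in parallel
    const [database, redis, kafka] = await Promise.all([
      checkDatabase(),
      checkRedis(),
      checkKafka()
    ]);
    const checks = { database, redis, kafka };
    const healthy = Object.values(checks).every(c => c);
    res.status(healthy ? 200 : 503).json({ ready: healthy, checks });
  } catch (error) {
    // A check that throws means the service is not ready
    res.status(503).json({ ready: false });
  }
});
```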
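
A dependency that hangs, rather than fails, can stall a readiness endpoint until the probe itself times out. One option is to bound each check with its own timeout, as sketched below. `runCheck`, `readiness`, and the 1000 ms default are hypothetical and would need tuning against the probe's `timeoutSeconds`.

```typescript
type Check = () => Promise<boolean>;

// Resolve to false if the check throws or takes longer than timeoutMs.
async function runCheck(check: Check, timeoutMs: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    return await Promise.race([check(), timeout]);
  } catch {
    return false;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Run all checks in parallel and build a response for the readiness endpoint.
async function readiness(checks: Record<string, Check>, timeoutMs = 1000) {
  const pairs = await Promise.all(
    Object.entries(checks).map(
      async ([name, check]) => [name, await runCheck(check, timeoutMs)] as const
    )
  );
  const report: Record<string, boolean> = {};
  for (const [name, ok] of pairs) report[name] = ok;
  const ready = pairs.every(([, ok]) => ok);
  return { status: ready ? 200 : 503, ready, checks: report };
}
```

A handler could then call `readiness({ database: checkDatabase, redis: checkRedis, kafka: checkKafka })` and send the returned `status` and body.
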
**Kubernetes Probes**:
```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
```
### Distributed Tracing
- **OpenTelemetry**: Instrument all service-to-service calls
- **Jaeger**: Visualize distributed traces
- **Correlation IDs**: Propagate via `x-correlation-id` header for request tracking
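
Propagating a correlation ID means reusing the incoming `x-correlation-id` when present and generating one only at the edge of the system. A minimal sketch with hypothetical helper names; in practice, OpenTelemetry context propagation may already carry trace identifiers alongside this header.

```typescript
import { randomUUID } from 'node:crypto';

const CORRELATION_HEADER = 'x-correlation-id';
type HeaderBag = Record<string, string | string[] | undefined>;

// Reuse the caller's ID if there is one; otherwise start a new chain.
function resolveCorrelationId(incoming: HeaderBag): string {
  const value = incoming[CORRELATION_HEADER];
  const first = Array.isArray(value) ? value[0] : value;
  return first && first.trim() !== '' ? first : randomUUID();
}

// Headers to attach to any downstream call made while handling the request.
function outgoingHeaders(
  incoming: HeaderBag,
  extra: Record<string, string> = {}
): Record<string, string> {
  return { ...extra, [CORRELATION_HEADER]: resolveCorrelationId(incoming) };
}
```

Logging the same ID in every service makes it possible to join log lines from one request across services, even where traces are sampled.
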
### Monitoring Dashboard
**Grafana Panels**:
- Service Communication Overview (request rate, latency, errors)
- Gateway Performance (routing time, backend health)
- Event Bus Health (Kafka lag, throughput)
- Service Dependencies (service map from traces)
## Related Documentation
- [System Design](./system-design.md) - Overall architecture
- [Event-Driven Architecture](./event-driven-architecture.md) - Event patterns
- [API Gateway Advanced](../skills/api-gateway-advanced.md) - Gateway patterns
- [Inter-Service Communication](../skills/inter-service-communication.md) - Communication patterns
- [Resilience Patterns](../skills/resilience-patterns.md) - Circuit breaker, retry
---
## Quick Tips
### Mermaid Common Issues
- **Arrow Syntax**: `-->` (solid), `-.->` (dotted), `==>` (thick)
- **Special Characters**: Wrap labels that contain special characters in double quotes `"`
- **Subgraphs**: Use `subgraph "Title"` ... `end`
### Color Pattern Quick Reference
| Element | Color | Fill | Stroke | Usage |
|---------|-------|-----|--------|-------|
| **Core** | Blue | `#253041` | `#4b6584` | Primary components |
| **Logic** | Purple | `#2e1e3a` | `#5f3c7a` | Processing steps |
| **Data** | Green | `#1e3a29` | `#3c7a52` | Database, Cache |
| **External** | Orange | `#3a2e1e` | `#7a5f3c` | External APIs |
| **Error** | Red | `#3a1e1e` | `#7a3c3c` | Failures, Alerts |
### Visual Indicators
- 🔵 **Blue**: Core Infrastructure
- 🟢 **Green**: Data Operations
- 🟠 **Orange**: Event/External
- 🔴 **Red**: Critical/Error
- ⚪ **Grey**: Neutral/Boundary
---
**Last Updated**: 2026-01-07
**Authors**: GoodGo Architecture Team
**Reviewers**: To be assigned