# Microservices Communication

> Communication patterns and protocols for inter-service communication

## Overview Diagram

```mermaid
graph TD
    Client[Client Apps] --> Gateway[API Gateway<br/>Traefik]

    Gateway --> ServiceA[Service A]
    Gateway --> ServiceB[Service B]

    ServiceA <-->|REST/HTTP| ServiceB
    ServiceA -->|Events| Kafka[Kafka Broker]
    ServiceB <-.->|Sub| Kafka

    ServiceA --> SD[Service Discovery<br/>Docker DNS / K8s DNS]
    ServiceB --> SD

    classDef blue fill:#253041,stroke:#4b6584,color:#ffffff
    classDef orange fill:#3a2e1e,stroke:#7a5f3c,color:#ffffff
    classDef green fill:#1e3a29,stroke:#3c7a52,color:#ffffff

    class Gateway blue
    class Kafka orange
    class SD green
```

## System Context

```mermaid
C4Context
    title System Context Diagram for GoodGo Microservices Communication

    Person(client_web, "Web Client", "Browser/Mobile App")
    Person(client_api, "API Consumer", "External API clients")

    System_Boundary(goodgo, "GoodGo Platform") {
        System(gateway, "API Gateway", "Traefik - Routes requests to services")
        System(services, "Microservices", "IAM, User, Order, Product services")
        System(kafka, "Event Bus", "Kafka - Async communication")
        System(discovery, "Service Discovery", "Docker DNS / K8s DNS")
    }

    System_Ext(db, "Database", "Neon PostgreSQL")
    System_Ext(cache, "Cache", "Redis")
    System_Ext(external_api, "External APIs", "Payment, Email, SMS")

    Rel(client_web, gateway, "Uses", "HTTPS")
    Rel(client_api, gateway, "Calls", "HTTPS/REST")
    Rel(gateway, services, "Routes to", "HTTP")
    Rel(services, kafka, "Pub/Sub", "Kafka Protocol")
    Rel(services, discovery, "Lookup", "DNS")
    Rel(services, db, "Reads/Writes", "PostgreSQL")
    Rel(services, cache, "Gets/Sets", "Redis Protocol")
    Rel(services, external_api, "Integrates", "HTTPS")
```

The GoodGo platform uses a microservices architecture in which all client requests flow through an API Gateway (Traefik) that routes them to the appropriate microservice. Services communicate synchronously via REST/HTTP for request-response interactions and asynchronously via Kafka for event-driven workflows. Service discovery is handled by Docker DNS in local environments and by Kubernetes DNS in the Kubernetes-based environments (staging and production).

## Communication Protocols

### Protocol Comparison

| Protocol | Latency | Complexity | Use Case |
|----------|---------|------------|----------|
| **REST** | Medium | Low | External APIs, CRUD |
| **gRPC** | Low | High | Internal high-performance calls |
| **Events** | Asynchronous (caller does not wait) | Medium | Decoupled workflows |
| **GraphQL** | Medium | Medium | Complex data fetching |

### REST/HTTP Pattern

```mermaid
sequenceDiagram
    participant Client
    participant Gateway as API Gateway
    participant ServiceA as Service A
    participant ServiceB as Service B

    Client->>Gateway: GET /api/v1/users/123
    Gateway->>ServiceA: Forward Request
    ServiceA->>ServiceB: GET /internal/permissions/123
    ServiceB-->>ServiceA: Permissions
    ServiceA-->>Gateway: User + Permissions
    Gateway-->>Client: JSON Response
```

Synchronous request-response over HTTP/REST: each caller waits for the downstream response (or a timeout) before it can reply to its own caller.

**Implementation**:

```typescript
// Service-to-service HTTP client
import axios from 'axios';

// Minimal response shape assumed for illustration; use the User service's real DTO
interface User {
  id: string;
  email: string;
}

export class UserServiceClient {
  private client = axios.create({
    baseURL: process.env.USER_SERVICE_URL,
    timeout: 5000, // fail fast instead of waiting indefinitely on a slow dependency
    headers: {
      'x-service-auth': process.env.INTERNAL_API_KEY
    }
  });

  async getUser(userId: string): Promise<User> {
    // Encode the ID so a malformed value cannot change the request path
    const response = await this.client.get<User>(`/users/${encodeURIComponent(userId)}`);
    return response.data;
  }
}
```

### Event-Driven Pattern

```mermaid
sequenceDiagram
    participant ServiceA
    participant Kafka
    participant ServiceB
    participant ServiceC

    ServiceA->>Kafka: Publish: user.created
    Kafka->>ServiceB: Deliver event
    Kafka->>ServiceC: Deliver event

    par Parallel Processing
        ServiceB->>ServiceB: Send welcome email
    and
        ServiceC->>ServiceC: Create user profile
    end
```

Asynchronous, event-based communication via Kafka: the publisher does not wait for consumers, and each subscribing service processes the event independently.
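The delivery semantics in the diagram can be made concrete with a small, dependency-free simulation. The sketch below is an in-memory stand-in for a Kafka topic and is not the Kafka or kafkajs API. The producer only appends to a log, and each consumer group keeps its own committed offset, so every subscribed group receives every event. Real Kafka also spreads a topic's partitions across the instances within one group, which this sketch omits. The `EventEnvelope` shape and the group names are assumptions for illustration.

```typescript
// In-memory stand-in for a Kafka topic: an append-only log plus one committed
// offset per consumer group. Illustrative only; this is NOT the kafkajs API.

interface EventEnvelope<T> {
  type: string;       // e.g. 'user.created'
  key: string;        // in Kafka, events with the same key land in the same partition (ordered)
  occurredAt: string; // ISO-8601 timestamp
  payload: T;
}

type Handler<T> = (event: EventEnvelope<T>) => void;

class InMemoryTopic<T> {
  private readonly log: EventEnvelope<T>[] = [];
  private readonly offsets = new Map<string, number>(); // group -> next offset to read
  private readonly handlers = new Map<string, Handler<T>>();

  // Each consumer group gets its own cursor, starting at the beginning of the log
  subscribe(groupId: string, handler: Handler<T>): void {
    this.handlers.set(groupId, handler);
    if (!this.offsets.has(groupId)) this.offsets.set(groupId, 0);
  }

  // Publishing only appends; the producer never waits for consumers
  publish(event: EventEnvelope<T>): void {
    this.log.push(event);
  }

  // Every group independently processes all events after its committed offset
  poll(): void {
    for (const [groupId, handler] of this.handlers) {
      let offset = this.offsets.get(groupId) ?? 0;
      while (offset < this.log.length) {
        handler(this.log[offset]);
        offset += 1;
        this.offsets.set(groupId, offset); // commit only after the handler succeeded
      }
    }
  }
}

// Usage mirroring the diagram: one user.created event, two hypothetical consumer groups
const topic = new InMemoryTopic<{ userId: string; email: string }>();
const handled: string[] = [];
topic.subscribe('notification-service', (e) => handled.push(`welcome-email:${e.payload.userId}`));
topic.subscribe('profile-service', (e) => handled.push(`create-profile:${e.payload.userId}`));

topic.publish({
  type: 'user.created',
  key: 'user-123',
  occurredAt: new Date().toISOString(),
  payload: { userId: 'user-123', email: 'jane@example.com' }
});
topic.poll();
```

Because each group tracks its own offset, a new consumer group can be added later without changing the publisher. This is the decoupling behind the "Decoupled workflows" use case in the protocol comparison table.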

### Service Discovery

**Local (Docker Compose)**:

```text
# Services discover each other via Docker DNS
http://service-name:port
http://iam-service:3001
```

**Kubernetes**:

```text
# Services discover each other via K8s DNS
http://service-name.namespace.svc.cluster.local:port
http://iam-service.default.svc.cluster.local:3001
# Within the same namespace, the short name also resolves
http://iam-service:3001
```
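Both naming schemes can be captured in one helper, so clients build base URLs from a service name, a port, and (in Kubernetes) a namespace instead of hard-coding full URLs. This is a sketch under a few assumptions: the `DISCOVERY_MODE` environment variable is hypothetical, the namespace falls back to `default`, and the cluster domain is `cluster.local`, which is the Kubernetes default but can be configured per cluster.

```typescript
// Builds an internal base URL following the Docker and Kubernetes DNS conventions above.
// DISCOVERY_MODE and the 'default' namespace fallback are illustrative assumptions.

type DiscoveryMode = 'docker' | 'kubernetes';

interface ServiceAddress {
  name: string;       // e.g. 'iam-service'
  port: number;       // e.g. 3001
  namespace?: string; // Kubernetes only
}

export function serviceBaseUrl(addr: ServiceAddress, mode: DiscoveryMode): string {
  if (mode === 'docker') {
    // Docker Compose resolves each service name through the network's embedded DNS
    return `http://${addr.name}:${addr.port}`;
  }
  // Fully qualified Kubernetes Service name (assumes the default cluster domain)
  const namespace = addr.namespace ?? 'default';
  return `http://${addr.name}.${namespace}.svc.cluster.local:${addr.port}`;
}

// Hypothetical switch: anything other than 'kubernetes' is treated as local Docker
export function currentMode(): DiscoveryMode {
  return process.env.DISCOVERY_MODE === 'kubernetes' ? 'kubernetes' : 'docker';
}
```

A client could then pass `serviceBaseUrl({ name: 'iam-service', port: 3001 }, currentMode())` as its axios `baseURL` rather than reading a complete URL from the environment.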

## API Gateway Pattern

```mermaid
graph LR
    Client --> Gateway[API Gateway<br/>Traefik]

    subgraph "Gateway Features"
        Gateway --> Route[Routing]
        Gateway --> LB[Load Balancing]
        Gateway --> Auth[Authentication]
        Gateway --> Rate[Rate Limiting]
        Gateway --> CORS
    end

    Route --> Service1[Service 1]
    Route --> Service2[Service 2]
    LB --> Service1A[Service 1<br/>Instance A]
    LB --> Service1B[Service 1<br/>Instance B]

    classDef blue fill:#253041,stroke:#4b6584,color:#ffffff
    class Gateway blue
```

Traefik is the single entry point for all client requests. It handles routing, load balancing across service instances, authentication, rate limiting, and CORS before a request reaches a service.
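Gateway rate limiting is commonly built on a token bucket: each client has a bucket that refills at a steady rate and allows short bursts up to its capacity. Traefik's `rateLimit` middleware is configured in similar terms (`average` and `burst`). The classes below are a self-contained illustration of the algorithm, not Traefik's implementation. The clock is injected so the behaviour can be tested deterministically.

```typescript
// Token bucket: refills at `ratePerSecond` tokens/s, up to `capacity` (the burst size).
export class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number,
    private readonly ratePerSecond: number,
    private readonly now: () => number = Date.now // injectable clock in milliseconds
  ) {
    this.tokens = capacity; // start full so an idle client can burst immediately
    this.lastRefillMs = now();
  }

  // true: let the request through; false: the gateway would answer HTTP 429
  tryConsume(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  private refill(): void {
    const nowMs = this.now();
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.ratePerSecond);
    this.lastRefillMs = nowMs;
  }
}

// One bucket per client key (e.g. API key or source IP); buckets are never evicted in this sketch
export class RateLimiter {
  private readonly buckets = new Map<string, TokenBucket>();

  constructor(
    private readonly capacity: number,
    private readonly ratePerSecond: number,
    private readonly now: () => number = Date.now
  ) {}

  allow(clientKey: string): boolean {
    let bucket = this.buckets.get(clientKey);
    if (!bucket) {
      bucket = new TokenBucket(this.capacity, this.ratePerSecond, this.now);
      this.buckets.set(clientKey, bucket);
    }
    return bucket.tryConsume();
  }
}
```

Keeping one bucket per client means a single noisy consumer exhausts only its own budget rather than the capacity shared by everyone.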

## Performance Characteristics

Performance expectations and optimization strategies for inter-service communication.

| Metric | Target | Notes |
|--------|--------|-------|
| **REST API Response Time** | < 100ms | P95 for internal service-to-service calls |
| **Event Publishing Latency** | < 50ms | Time to publish to Kafka |
| **Service Discovery Lookup** | < 10ms | DNS resolution time |
| **Gateway Routing Overhead** | < 20ms | Additional latency added by Traefik |
| **Throughput** | 10,000 req/s | Per service instance |
| **Kafka Event Processing** | < 500ms | P95 end-to-end event processing |

**Optimization Strategies**:

- **Connection Pooling**: Reuse HTTP connections between services
- **Circuit Breaker**: Prevent cascading failures with Opossum library
- **Retry with Backoff**: Exponential backoff for transient failures
- **Compression**: Enable gzip for large payloads
- **Caching**: Cache service discovery results and responses
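Two of these strategies are small enough to sketch without dependencies. `backoffDelayMs` and `withRetry` implement exponential backoff with "full jitter", which randomizes the whole delay so that many clients retrying at once do not synchronize. `CircuitBreaker` shows the closed/open/half-open state machine that libraries such as Opossum provide. This is not Opossum's API; all names, thresholds, and defaults are illustrative assumptions.

```typescript
// Exponential backoff with full jitter: a random delay in [0, min(maxMs, baseMs * 2^attempt)).
export function backoffDelayMs(
  attempt: number,
  baseMs = 100,
  maxMs = 5000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retries `fn` with backoff. Only wrap idempotent calls (e.g. GET): after a timeout,
// a retry can repeat a side effect that already happened.
export async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
      }
    }
  }
  throw lastError;
}

type BreakerState = 'closed' | 'open' | 'half-open';

// Opens after `failureThreshold` consecutive failures and fails fast while open.
// After `resetTimeoutMs` it becomes half-open: the next success closes it and
// the next failure reopens it.
export class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5,
    private readonly resetTimeoutMs = 10_000,
    private readonly now: () => number = Date.now
  ) {}

  getState(): BreakerState {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open';
    }
    return this.state;
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.getState() === 'open') {
      throw new Error('Circuit open: failing fast without calling the dependency');
    }
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }

  // Public so the state machine can be driven directly (e.g. in tests)
  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

One possible composition is `breaker.execute(() => withRetry(() => userClient.getUser(id)))`, where `breaker` and `userClient` are hypothetical instances. With this arrangement, one breaker attempt covers a short, bounded retry sequence.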

## Security Considerations

Security measures for protecting inter-service communication.

### Service-to-Service Authentication

- **Internal API Keys**: Services authenticate using `x-service-auth` header
- **JWT Tokens**: For user context propagation between services
- **Mutual TLS (mTLS)**: Optional for production environments (Kubernetes service mesh)
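The client examples send `x-service-auth`, but the receiving service also has to verify it. Below is a minimal sketch of that check, assuming the expected key comes from `INTERNAL_API_KEY`. Comparing with `crypto.timingSafeEqual` instead of `===` avoids leaking, through response timing, how many leading characters matched. Hashing both values first produces equal-length buffers, which `timingSafeEqual` requires. The `Req`/`Res` interfaces are simplified stand-ins for Express types.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

// Constant-time comparison of a presented service key against the expected key.
// Both values are hashed so the buffers always have the same length
// (timingSafeEqual throws if the lengths differ).
export function isValidServiceKey(presented: string | undefined, expected: string): boolean {
  if (!presented || !expected) return false; // fail closed if either side is missing
  const a = createHash('sha256').update(presented).digest();
  const b = createHash('sha256').update(expected).digest();
  return timingSafeEqual(a, b);
}

// Minimal Express-compatible shapes, kept local to avoid a dependency in this sketch
interface Req { headers: Record<string, string | string[] | undefined>; }
interface Res { status(code: number): { json(body: unknown): void }; }

// Hypothetical middleware name; rejects calls without a valid internal key
export function requireServiceAuth(req: Req, res: Res, next: () => void): void {
  const header = req.headers['x-service-auth'];
  const presented = Array.isArray(header) ? header[0] : header;
  if (!isValidServiceKey(presented, process.env.INTERNAL_API_KEY ?? '')) {
    res.status(401).json({ error: 'invalid service credentials' });
    return;
  }
  next();
}
```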

### Network Security

- **Network Policies**: Kubernetes NetworkPolicies restrict service-to-service traffic
- **Service Mesh**: Istio/Linkerd for advanced security policies (optional)
- **Private Networks**: Services communicate within a private VPC/cluster network

### Data Protection

- **Encryption in Transit**: TLS 1.2+ for all external communication
- **Event Payload Encryption**: Sensitive data is encrypted before publishing to Kafka
- **API Gateway**: Traefik handles TLS termination and request validation
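For event payload encryption, one possible approach is to encrypt sensitive fields with AES-256-GCM before publishing and decrypt them in the consumer. GCM is authenticated, so a modified message fails to decrypt instead of producing garbage. The sketch uses only `node:crypto`. The `EncryptedField` shape is an assumption, and so is key distribution: the 32-byte key is assumed to come from a secrets manager shared by producer and consumer.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Serializable form of an encrypted value, safe to embed in a JSON event payload
export interface EncryptedField {
  iv: string;   // base64, 12 random bytes (the recommended nonce size for GCM)
  tag: string;  // base64, 16-byte authentication tag
  data: string; // base64 ciphertext
}

export function encryptField(plaintext: string, key: Buffer): EncryptedField {
  const iv = randomBytes(12); // a fresh IV per message; never reuse one with the same key
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return {
    iv: iv.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'),
    data: data.toString('base64')
  };
}

export function decryptField(field: EncryptedField, key: Buffer): string {
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(field.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(field.tag, 'base64'));
  // final() throws if the ciphertext, IV, or tag was altered, or if the key is wrong
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(field.data, 'base64')),
    decipher.final()
  ]);
  return plaintext.toString('utf8');
}
```

With this approach, only the sensitive fields (for example an email address in `user.created`) are encrypted. Routing fields such as the event type and key stay readable to the broker and to consumers that do not hold the key.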
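
### Security Best Practices

```typescript
// Service client with authentication, TLS verification, and per-request correlation IDs
import axios from 'axios';
import * as https from 'node:https';
import { randomUUID } from 'node:crypto';

export class SecureServiceClient {
  private client = axios.create({
    baseURL: process.env.SERVICE_URL,
    timeout: 5000,
    headers: {
      'x-service-auth': process.env.INTERNAL_API_KEY
    },
    httpsAgent: new https.Agent({
      rejectUnauthorized: true // Verify TLS certificates (Node's default, kept explicit)
    })
  });

  constructor() {
    // Generate the correlation ID per request (axios 1.x headers API), not once per
    // client, and keep any ID already propagated from the upstream request
    this.client.interceptors.request.use((config) => {
      if (!config.headers.has('x-correlation-id')) {
        config.headers.set('x-correlation-id', randomUUID());
      }
      return config;
    });
  }
}
```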

## Deployment

How microservices communication is deployed and scaled across environments.

```mermaid
graph TD
    subgraph "Production Cluster"
        LB[Load Balancer] --> Gateway[API Gateway<br/>3 replicas]

        Gateway --> ServiceA1[Service A<br/>Instance 1]
        Gateway --> ServiceA2[Service A<br/>Instance 2]
        Gateway --> ServiceB1[Service B<br/>Instance 1]
        Gateway --> ServiceB2[Service B<br/>Instance 2]

        ServiceA1 & ServiceA2 --> Kafka[Kafka Cluster<br/>3 brokers]
        ServiceB1 & ServiceB2 --> Kafka

        ServiceA1 & ServiceA2 --> DB[(PostgreSQL<br/>Primary + Replica)]
        ServiceB1 & ServiceB2 --> DB

        ServiceA1 & ServiceA2 --> Redis[(Redis Cluster<br/>3 nodes)]
        ServiceB1 & ServiceB2 --> Redis
    end

    classDef blue fill:#253041,stroke:#4b6584,color:#ffffff
    classDef orange fill:#3a2e1e,stroke:#7a5f3c,color:#ffffff
    classDef green fill:#1e3a29,stroke:#3c7a52,color:#ffffff
    classDef red fill:#3a1e1e,stroke:#7a3c3c,color:#ffffff

    class Gateway blue
    class Kafka orange
    class DB green
    class Redis red
```

### Deployment Environments

| Environment | Gateway | Services | Kafka | Service Discovery |
|-------------|---------|----------|-------|-------------------|
| **Local** | Traefik (Docker) | Single instance per service | Single broker | Docker DNS |
| **Staging** | Traefik (2 replicas) | 2 replicas per service | 3 brokers | Kubernetes DNS |
| **Production** | Traefik (3+ replicas) | 3+ replicas per service | 5+ brokers | Kubernetes DNS + Service Mesh |

### Scaling Strategy

- **Horizontal Pod Autoscaler (HPA)**: Auto-scale based on CPU/memory
- **Kafka Partitions**: Scale event processing by increasing partitions; within a consumer group each partition is read by at most one consumer, so the partition count caps parallelism
- **Load Balancing**: Kubernetes Service load balances across pod replicas
- **Gateway Scaling**: Traefik scales independently from backend services
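The HPA's core rule, as documented by Kubernetes, is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. No scaling happens when that ratio is within a tolerance (0.1 by default) of 1.0. The function below reproduces this arithmetic for capacity planning, with clamping that mirrors `minReplicas`/`maxReplicas`. It is a simplification: it ignores stabilization windows, scaling policies, and pods that are not yet ready.

```typescript
// Reproduces the HPA replica calculation for a single metric (e.g. CPU utilization %).
export function desiredReplicas(
  currentReplicas: number,
  currentMetric: number, // e.g. average CPU utilization across pods, in %
  targetMetric: number,  // e.g. the HPA's averageUtilization target, in %
  minReplicas: number,
  maxReplicas: number,
  tolerance = 0.1
): number {
  const ratio = currentMetric / targetMetric;
  // Close enough to the target: keep the current replica count
  if (Math.abs(ratio - 1) <= tolerance) return currentReplicas;
  const raw = Math.ceil(currentReplicas * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

// Worked example: 3 pods averaging 90% CPU against a 60% target
// ratio = 1.5, ceil(3 * 1.5) = ceil(4.5) = 5 replicas
console.log(desiredReplicas(3, 90, 60, 3, 10)); // 5
```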

## Monitoring & Observability

How to monitor and observe microservices communication.

### Key Metrics

**Service-to-Service Metrics**:

- `http_request_duration_seconds` - Request latency histogram
- `http_requests_total` - Total requests counter
- `http_request_errors_total` - Failed requests counter
- `service_client_timeout_total` - Timeout counter

**Gateway Metrics**:

- `traefik_service_requests_total` - Requests per service
- `traefik_service_request_duration_seconds` - Request duration per backend service (includes backend response time, not only routing overhead)
- `traefik_service_retries_total` - Retry attempts

**Kafka Metrics**:

- `kafka_producer_record_send_total` - Events published
- `kafka_consumer_lag` - Consumer lag
- `kafka_consumer_records_consumed_total` - Events consumed
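Consumer lag is derived per partition as the partition's high watermark minus the consumer group's committed offset. Dashboards then sum it (total backlog) or take the maximum (the slowest partition). The sketch below shows only that arithmetic. How the offsets are fetched, whether through an admin client or an exporter, is left out, and the `PartitionOffsets` shape is an assumption.

```typescript
// Offsets for one partition of a topic, as seen by one consumer group
interface PartitionOffsets {
  partition: number;
  highWatermark: number;   // offset just past the last message consumers can read
  committedOffset: number; // next offset the group will read, as last committed
}

interface LagSummary {
  total: number;           // sum over partitions: the overall backlog
  maxPartitionLag: number; // the slowest partition
  perPartition: Map<number, number>;
}

export function consumerLag(offsets: PartitionOffsets[]): LagSummary {
  const perPartition = new Map<number, number>();
  let total = 0;
  let maxPartitionLag = 0;
  for (const { partition, highWatermark, committedOffset } of offsets) {
    // Partitions the group has never committed need separate handling (not covered here)
    const lag = Math.max(0, highWatermark - committedOffset);
    perPartition.set(partition, lag);
    total += lag;
    maxPartitionLag = Math.max(maxPartitionLag, lag);
  }
  return { total, maxPartitionLag, perPartition };
}
```

A growing total means consumers are falling behind overall. A large maximum with a small total usually points to one hot partition, which adding consumers alone will not fix.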
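
### Health Checks

**Service Endpoints**:

```typescript
import express from 'express';

// Dependency checks assumed to exist elsewhere; each resolves true if reachable
declare function checkDatabase(): Promise<boolean>;
declare function checkRedis(): Promise<boolean>;
declare function checkKafka(): Promise<boolean>;

// Treat a thrown error as "not ready" instead of leaving the request hanging
const safe = (check: Promise<boolean>) => check.catch(() => false);

const app = express();

// Liveness - is the process running? (no dependency checks, to avoid restart loops)
app.get('/health/live', (_req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});

// Readiness - can the service handle traffic? Checks run concurrently.
app.get('/health/ready', async (_req, res) => {
  const [database, redis, kafka] = await Promise.all(
    [checkDatabase(), checkRedis(), checkKafka()].map(safe)
  );
  const checks = { database, redis, kafka };
  const healthy = Object.values(checks).every(Boolean);
  res.status(healthy ? 200 : 503).json({ ready: healthy, checks });
});

app.listen(3000); // must match the port in the Kubernetes probes
```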

**Kubernetes Probes**:

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
```

### Distributed Tracing

- **OpenTelemetry**: Instrument all service-to-service calls
- **Jaeger**: Visualize distributed traces
- **Correlation IDs**: Propagate via `x-correlation-id` header for request tracking
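Correlation IDs only help if every outgoing call reuses the ID of the request currently being served. One way to achieve that in Node.js is to store the ID in an `AsyncLocalStorage` context at the service's entry point, then read it back when building outgoing headers. The helper names below are hypothetical. When OpenTelemetry instrumentation is enabled, trace context travels separately in the W3C `traceparent` header.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Holds the correlation ID for the request currently being handled
const correlationContext = new AsyncLocalStorage<string>();

// Reuse the caller's ID when present; otherwise this service starts a new chain
export function resolveCorrelationId(incoming: string | string[] | undefined): string {
  const value = Array.isArray(incoming) ? incoming[0] : incoming;
  return value && value.trim() !== '' ? value : randomUUID();
}

// Run the handling of one inbound request so everything it awaits sees the same ID
export function runWithCorrelationId<T>(incoming: string | string[] | undefined, fn: () => T): T {
  return correlationContext.run(resolveCorrelationId(incoming), fn);
}

// Headers to attach to any outgoing service-to-service call
export function outgoingHeaders(): Record<string, string> {
  const id = correlationContext.getStore();
  return id ? { 'x-correlation-id': id } : {};
}
```

In an Express service this would typically be installed as the first middleware, e.g. `app.use((req, _res, next) => runWithCorrelationId(req.headers['x-correlation-id'], next))`, so that handlers and the HTTP clients they call all see the same ID.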

### Monitoring Dashboard

**Grafana Panels**:

- Service Communication Overview (request rate, latency, errors)
- Gateway Performance (routing time, backend health)
- Event Bus Health (Kafka lag, throughput)
- Service Dependencies (service map from traces)

## Related Documentation

- [System Design](./system-design.md) - Overall architecture
- [Event-Driven Architecture](./event-driven-architecture.md) - Event patterns
- [API Gateway Advanced](../skills/api-gateway-advanced.md) - Gateway patterns
- [Inter-Service Communication](../skills/inter-service-communication.md) - Communication patterns
- [Resilience Patterns](../skills/resilience-patterns.md) - Circuit breaker, retry

---

## Quick Tips

### Mermaid Common Issues

- **Arrow Syntax**: `-->` (solid), `-.->` (dotted), `==>` (thick)
- **Special Characters**: Wrap the label in double quotes, e.g. `A["Label (with) symbols"]`
- **Subgraphs**: Use `subgraph "Title"` ... `end`

### Color Pattern Quick Reference

| Element | Color | Hex | Stroke | Usage |
|---------|-------|-----|--------|-------|
| **Core** | Blue | `#253041` | `#4b6584` | Primary components |
| **Logic** | Purple | `#2e1e3a` | `#5f3c7a` | Processing steps |
| **Data** | Green | `#1e3a29` | `#3c7a52` | Database, Cache |
| **External** | Orange | `#3a2e1e` | `#7a5f3c` | External APIs |
| **Error** | Red | `#3a1e1e` | `#7a3c3c` | Failures, Alerts |

### Visual Indicators

- 🔵 **Blue**: Core Infrastructure
- 🟢 **Green**: Data Operations
- 🟠 **Orange**: Event/External
- 🔴 **Red**: Critical/Error
- ⚪ **Grey**: Neutral/Boundary

---

**Last Updated**: 2026-01-07

**Authors**: GoodGo Architecture Team

**Reviewers**: To be assigned