# Microservices Communication

> Communication patterns and protocols for inter-service communication

## Overview Diagram

```mermaid
graph TD
    Client[Client Apps] --> Gateway[API Gateway<br/>Traefik]
    Gateway --> ServiceA[Service A]
    Gateway --> ServiceB[Service B]
    ServiceA <-->|REST/HTTP| ServiceB
    ServiceA -->|Events| Kafka[Kafka Broker]
    ServiceB <-.->|Sub| Kafka
    ServiceA --> SD[Service Discovery<br/>Docker DNS / K8s DNS]
    ServiceB --> SD

    style Gateway fill:#e1f5ff
    style Kafka fill:#fff4e1
    style SD fill:#d4edda
```

## System Context

```mermaid
C4Context
    title System Context Diagram for GoodGo Microservices Communication

    Person(client_web, "Web Client", "Browser/Mobile App")
    Person(client_api, "API Consumer", "External API clients")

    System_Boundary(goodgo, "GoodGo Platform") {
        System(gateway, "API Gateway", "Traefik - Routes requests to services")
        System(services, "Microservices", "IAM, User, Order, Product services")
        System(kafka, "Event Bus", "Kafka - Async communication")
        System(discovery, "Service Discovery", "Docker DNS / K8s DNS")
    }

    System_Ext(db, "Database", "Neon PostgreSQL")
    System_Ext(cache, "Cache", "Redis")
    System_Ext(external_api, "External APIs", "Payment, Email, SMS")

    Rel(client_web, gateway, "Uses", "HTTPS")
    Rel(client_api, gateway, "Calls", "HTTPS/REST")
    Rel(gateway, services, "Routes to", "HTTP")
    Rel(services, kafka, "Pub/Sub", "Kafka Protocol")
    Rel(services, discovery, "Lookup", "DNS")
    Rel(services, db, "Reads/Writes", "PostgreSQL")
    Rel(services, cache, "Gets/Sets", "Redis Protocol")
    Rel(services, external_api, "Integrates", "HTTPS")
```

The GoodGo platform uses a microservices architecture in which all client requests flow through an API Gateway (Traefik) that routes them to the appropriate microservice. Services communicate synchronously via REST/HTTP for request-response patterns and asynchronously via Kafka for event-driven workflows. Service discovery is handled by Docker DNS in local environments and by Kubernetes DNS in production.

## Communication Protocols

### Protocol Comparison

| Protocol | Latency | Complexity | Use Case |
|----------|---------|------------|----------|
| **REST** | Medium | Low | External APIs, CRUD |
| **gRPC** | Low | High | Internal high-performance |
| **Events** | Async | Medium | Decoupled workflows |
| **GraphQL** | Medium | Medium | Complex data fetching |

### REST/HTTP Pattern

```mermaid
sequenceDiagram
    participant Client
    participant Gateway as API Gateway
    participant ServiceA as Service A
    participant ServiceB as Service B

    Client->>Gateway: GET /api/v1/users/123
    Gateway->>ServiceA: Forward Request
    ServiceA->>ServiceB: GET /internal/permissions/123
    ServiceB-->>ServiceA: Permissions
    ServiceA-->>Gateway: User + Permissions
    Gateway-->>Client: JSON Response
```

Services call each other synchronously over HTTP/REST when the caller needs the result before it can respond.

**Implementation**:

```typescript
// Service-to-service HTTP client
import axios from 'axios';

// Minimal shape assumed for illustration; the real User type belongs to the service's own models.
export interface User {
  id: string;
}

export class UserServiceClient {
  private client = axios.create({
    baseURL: process.env.USER_SERVICE_URL,
    timeout: 5000, // milliseconds
    headers: { 'x-service-auth': process.env.INTERNAL_API_KEY }
  });

  // Fetches a single user; axios rejects on non-2xx responses and on timeout.
  async getUser(userId: string): Promise<User> {
    const response = await this.client.get<User>(`/users/${userId}`);
    return response.data;
  }
}
```

### Event-Driven Pattern

```mermaid
sequenceDiagram
    participant ServiceA
    participant Kafka
    participant ServiceB
    participant ServiceC

    ServiceA->>Kafka: Publish: user.created
    Kafka->>ServiceB: Deliver event
    Kafka->>ServiceC: Deliver event

    par Parallel Processing
        ServiceB->>ServiceB: Send welcome email
        ServiceC->>ServiceC: Create user profile
    end
```

Services publish events to Kafka and subscribers process them independently, so the publisher does not wait for downstream work to finish.

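
The fan-out in the event-driven diagram can be illustrated without a broker. The following is a minimal in-memory sketch of publish/subscribe for `user.created`, not the platform's Kafka integration; `InMemoryEventBus`, `EventEnvelope`, and the payload fields are hypothetical names introduced only for this example.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical event envelope; the field names are assumptions, not a GoodGo schema.
interface EventEnvelope<T> {
  id: string;
  topic: string;
  occurredAt: string;
  payload: T;
}

type Handler<T> = (event: EventEnvelope<T>) => void;

// In-memory stand-in for a broker: every subscriber of a topic receives each event.
class InMemoryEventBus {
  private handlers = new Map<string, Handler<any>[]>();

  subscribe<T>(topic: string, handler: Handler<T>): void {
    const existing = this.handlers.get(topic) ?? [];
    this.handlers.set(topic, [...existing, handler]);
  }

  publish<T>(topic: string, payload: T): EventEnvelope<T> {
    const event: EventEnvelope<T> = {
      id: randomUUID(),
      topic,
      occurredAt: new Date().toISOString(),
      payload,
    };
    // Deliver to subscribers in registration order.
    for (const handler of this.handlers.get(topic) ?? []) {
      handler(event);
    }
    return event;
  }
}

// Two independent subscribers react to the same event.
const bus = new InMemoryEventBus();
const actions: string[] = [];
bus.subscribe<{ userId: string }>("user.created", (e) => actions.push(`email:${e.payload.userId}`));
bus.subscribe<{ userId: string }>("user.created", (e) => actions.push(`profile:${e.payload.userId}`));
const published = bus.publish("user.created", { userId: "123" });
```

With Kafka itself, each subscribing service would typically use its own consumer group so that both services receive every event; consumers sharing a group split the events between them instead.
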
### Service Discovery

**Local (Docker Compose)**:

```yaml
# Services discover via Docker DNS
http://service-name:port
http://iam-service:3001
```

**Kubernetes**:

```yaml
# Services discover via K8s DNS
http://service-name.namespace.svc.cluster.local:port
http://iam-service.default.svc.cluster.local:3001
```

## API Gateway Pattern

```mermaid
graph LR
    Client --> Gateway[API Gateway<br/>Traefik]

    subgraph "Gateway Features"
        Gateway --> Route[Routing]
        Gateway --> LB[Load Balancing]
        Gateway --> Auth[Authentication]
        Gateway --> Rate[Rate Limiting]
        Gateway --> CORS
    end

    Route --> Service1[Service 1]
    Route --> Service2[Service 2]
    LB --> Service1A[Instance A]
    LB --> Service1B[Instance B]

    style Gateway fill:#e1f5ff
```

Traefik is the single entry point for all client requests and handles routing, load balancing, authentication, rate limiting, and CORS.

## Performance Characteristics

Performance expectations and optimization strategies for inter-service communication.

| Metric | Target | Notes |
|--------|--------|-------|
| **REST API Response Time** | < 100ms | P95 for internal service-to-service calls |
| **Event Publishing Latency** | < 50ms | Time to publish to Kafka |
| **Service Discovery Lookup** | < 10ms | DNS resolution time |
| **Gateway Routing Overhead** | < 20ms | Additional latency added by Traefik |
| **Throughput** | 10,000 req/s | Per service instance |
| **Kafka Event Processing** | < 500ms | P95 end-to-end event processing |

**Optimization Strategies**:

- **Connection Pooling**: Reuse HTTP connections between services
- **Circuit Breaker**: Prevent cascading failures with the Opossum library
- **Retry with Backoff**: Exponential backoff for transient failures
- **Compression**: Enable gzip for large payloads
- **Caching**: Cache service discovery results and responses

## Security Considerations

Security measures for protecting inter-service communication.

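
As one illustration of these measures, the receiving side of the `x-service-auth` header sent by the service clients in this document could be checked roughly as follows. This is a sketch under assumptions: `isValidServiceKey` and `authorizeInternalRequest` are hypothetical helpers, and the source does not describe how services actually validate the key.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: compares a presented service key with the expected one.
// Both values are hashed first so the buffers have equal length, which
// timingSafeEqual requires, and so the comparison time does not depend on
// how many leading characters match.
function isValidServiceKey(presented: string | undefined, expected: string): boolean {
  if (!presented || !expected) return false;
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Hypothetical header check. Node.js lowercases incoming header names,
// so the lookup uses the lowercase form.
function authorizeInternalRequest(
  headers: Record<string, string | string[] | undefined>,
  expectedKey: string
): boolean {
  const value = headers["x-service-auth"];
  return isValidServiceKey(Array.isArray(value) ? value[0] : value, expectedKey);
}

const accepted = authorizeInternalRequest({ "x-service-auth": "example-key" }, "example-key");
const rejected = authorizeInternalRequest({ "x-service-auth": "wrong-key" }, "example-key");
const missing = authorizeInternalRequest({}, "example-key");
```

A constant-time comparison is used here instead of `===` so that response timing does not leak how much of a guessed key was correct.
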
### Service-to-Service Authentication

- **Internal API Keys**: Services authenticate using the `x-service-auth` header
- **JWT Tokens**: For user context propagation between services
- **Mutual TLS (mTLS)**: Optional for production environments (Kubernetes service mesh)

### Network Security

- **Network Policies**: Kubernetes NetworkPolicies restrict service-to-service traffic
- **Service Mesh**: Istio/Linkerd for advanced security policies (optional)
- **Private Networks**: Services communicate within a private VPC/cluster network

### Data Protection

- **Encryption in Transit**: TLS 1.2+ for all external communication
- **Event Payload Encryption**: Sensitive data is encrypted before publishing to Kafka
- **API Gateway**: Traefik handles SSL termination and request validation

### Security Best Practices

```typescript
// Service client with authentication
import axios from 'axios';
import https from 'node:https';
import { randomUUID } from 'node:crypto';

// Assumed implementation: the original helper is not shown, so a random UUID is used here.
const generateCorrelationId = (): string => randomUUID();

export class SecureServiceClient {
  private client = axios.create({
    baseURL: process.env.SERVICE_URL,
    timeout: 5000,
    headers: {
      'x-service-auth': process.env.INTERNAL_API_KEY,
      // Default ID created once per client instance; when handling an incoming
      // request, forward its x-correlation-id instead so the trace stays intact.
      'x-correlation-id': generateCorrelationId()
    },
    httpsAgent: new https.Agent({
      rejectUnauthorized: true // Verify SSL certificates
    })
  });
}
```

## Deployment

How microservices communication is deployed and scaled across environments.

```mermaid
graph TD
    subgraph "Production Cluster"
        LB[Load Balancer] --> Gateway[API Gateway\n3 replicas]

        Gateway --> ServiceA1[Service A\nInstance 1]
        Gateway --> ServiceA2[Service A\nInstance 2]
        Gateway --> ServiceB1[Service B\nInstance 1]
        Gateway --> ServiceB2[Service B\nInstance 2]

        ServiceA1 & ServiceA2 --> Kafka[Kafka Cluster\n3 brokers]
        ServiceB1 & ServiceB2 --> Kafka

        ServiceA1 & ServiceA2 --> DB[(PostgreSQL\nPrimary + Replica)]
        ServiceB1 & ServiceB2 --> DB

        ServiceA1 & ServiceA2 --> Redis[(Redis Cluster\n3 nodes)]
        ServiceB1 & ServiceB2 --> Redis
    end

    style Gateway fill:#e1f5ff
    style Kafka fill:#fff4e1
    style DB fill:#d4edda
    style Redis fill:#ffe1e1
```

### Deployment Environments

| Environment | Gateway | Services | Kafka | Service Discovery |
|-------------|---------|----------|-------|-------------------|
| **Local** | Traefik (Docker) | Single instance per service | Single broker | Docker DNS |
| **Staging** | Traefik (2 replicas) | 2 replicas per service | 3 brokers | Kubernetes DNS |
| **Production** | Traefik (3+ replicas) | 3+ replicas per service | 5+ brokers | Kubernetes DNS + Service Mesh |

### Scaling Strategy

- **Horizontal Pod Autoscaler (HPA)**: Auto-scale based on CPU/memory
- **Kafka Partitions**: Scale event processing by increasing partitions
- **Load Balancing**: Kubernetes Service load balances across pod replicas
- **Gateway Scaling**: Traefik scales independently from backend services

## Monitoring & Observability

How to monitor and observe microservices communication.

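
Request tracking across services depends on every hop forwarding the same `x-correlation-id` header. The sketch below shows one way a service might reuse an incoming ID or create a new one before calling downstream; `resolveCorrelationId` and `withCorrelation` are hypothetical helpers, and using a UUID as the ID format is an assumption.

```typescript
import { randomUUID } from "node:crypto";

const CORRELATION_HEADER = "x-correlation-id";

// Reuse the caller's correlation ID when present; otherwise start a new one.
// (Hypothetical helper; a UUID is assumed as the ID format.)
function resolveCorrelationId(incoming: Record<string, string | undefined>): string {
  const existing = incoming[CORRELATION_HEADER]?.trim();
  return existing ? existing : randomUUID();
}

// Build the headers for a downstream call so the same ID follows the request.
function withCorrelation(
  incoming: Record<string, string | undefined>,
  extra: Record<string, string> = {}
): Record<string, string> {
  return { ...extra, [CORRELATION_HEADER]: resolveCorrelationId(incoming) };
}

// An ID received from upstream is forwarded unchanged.
const forwarded = withCorrelation({ "x-correlation-id": "abc-123" }, { accept: "application/json" });

// A request without an ID gets a freshly generated one.
const generated = withCorrelation({});
```

Logging the resolved ID alongside each request lets log lines from different services be joined on that one value.
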
### Key Metrics

**Service-to-Service Metrics**:

- `http_request_duration_seconds` - Request latency histogram
- `http_requests_total` - Total requests counter
- `http_request_errors_total` - Failed requests counter
- `service_client_timeout_total` - Timeout counter

**Gateway Metrics**:

- `traefik_service_requests_total` - Requests per service
- `traefik_service_request_duration_seconds` - Routing latency
- `traefik_service_retries_total` - Retry attempts

**Kafka Metrics**:

- `kafka_producer_record_send_total` - Events published
- `kafka_consumer_lag` - Consumer lag
- `kafka_consumer_records_consumed_total` - Events consumed

### Health Checks

**Service Endpoints**:

```typescript
import express from 'express';

// checkDatabase, checkRedis, and checkKafka are assumed to be defined elsewhere
// and to resolve to a boolean.
const app = express();

// Liveness - is service running?
app.get('/health/live', (req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});

// Readiness - can service handle traffic?
app.get('/health/ready', async (req, res) => {
  try {
    const checks = {
      database: await checkDatabase(),
      redis: await checkRedis(),
      kafka: await checkKafka()
    };
    const healthy = Object.values(checks).every(c => c);
    res.status(healthy ? 200 : 503).json({ ready: healthy, checks });
  } catch {
    // A check that throws means a dependency is unreachable, so report not ready.
    res.status(503).json({ ready: false });
  }
});
```

**Kubernetes Probes**:

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
```

### Distributed Tracing

- **OpenTelemetry**: Instrument all service-to-service calls
- **Jaeger**: Visualize distributed traces
- **Correlation IDs**: Propagate via the `x-correlation-id` header for request tracking

### Monitoring Dashboard

**Grafana Panels**:

- Service Communication Overview (request rate, latency, errors)
- Gateway Performance (routing time, backend health)
- Event Bus Health (Kafka lag, throughput)
- Service Dependencies (service map from traces)

## Related Documentation

- [System Design](./system-design.md) - Overall architecture
- [Event-Driven Architecture](./event-driven-architecture.md) - Event patterns
- [API Gateway Advanced](../skills/api-gateway-advanced.md) - Gateway patterns
- [Inter-Service Communication](../skills/inter-service-communication.md) - Communication patterns
- [Resilience Patterns](../skills/resilience-patterns.md) - Circuit breaker, retry

---

**Last Updated**: 2026-01-07

**Author**: VelikHo (hongochai10@icloud.com)

**Reviewers**: To be assigned