Files
pos-system/docs/audit/founding-engineer.md

290 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Audit Report: POS System — Founding Engineer Perspective
**Date**: 2026-03-20
**Auditor**: Founding Engineer
**Scope**: Code quality, shared patterns, template compliance, DRY violations
**Working Directory**: `/Users/velikho/Desktop/WORKING/pos-system`
---
## Executive Summary
The GoodGo Platform is a well-structured enterprise microservices monorepo with 23 .NET services, 5 frontend applications, and 6 shared Node.js packages. Clean Architecture and CQRS patterns are consistently applied across all services. However, three critical issues must be resolved before production: insecure CORS configuration (all 23 services), broken health check authentication (18 services), and ~5,100 lines of duplicated MediatR behaviors.
---
## Critical Issues
### C1 — Insecure CORS Configuration (All 23 Services)
**Severity**: HIGH — Security vulnerability
**Files**: `services/*/src/*/Program.cs` lines ~113120
Every service uses `AllowAnyOrigin()`:
```csharp
// Current (INSECURE)
policy.AllowAnyOrigin()
.AllowAnyMethod()
.AllowAnyHeader();
```
This allows any website to issue cross-origin requests to the API, enabling CSRF attacks. There is no origin validation. `AllowAnyOrigin()` is also incompatible with `AllowCredentials()`, silently breaking cookie/credential flows.
**Fix**:
```csharp
var allowedOrigins = builder.Configuration
.GetSection("Cors:AllowedOrigins")
.Get<string[]>() ?? Array.Empty<string>();
policy.WithOrigins(allowedOrigins)
.AllowAnyMethod()
.AllowAnyHeader()
.AllowCredentials();
```
Add `Cors__AllowedOrigins__0=https://goodgo.vn` per service in `docker-compose.yml` and Kubernetes manifests.
---
### C2 — Health Check Endpoints Require Authentication (18 of 23 Services)
**Severity**: HIGH — Breaks Kubernetes liveness/readiness probes
**Example**: `services/merchant-service-net/src/MerchantService.API/Program.cs` lines 132137
```csharp
// Current (missing .AllowAnonymous())
app.MapHealthChecks("/health");
app.MapHealthChecks("/health/live", new() { Predicate = _ => false });
app.MapHealthChecks("/health/ready");
```
Kubernetes kubelet cannot authenticate; probes return `401 Unauthorized` → pod marked unhealthy → restart loop. Docker Compose healthcheck (`curl -f http://localhost:8080/health/live`) also fails.
**Services without `.AllowAnonymous()`**: merchant, iam, order, wallet, storage, membership, booking, catalog, inventory, chat, social, mining, mission, promotion, ads-manager, ads-analytics, mkt-facebook, mkt-whatsapp (and others).
**Services correctly configured**: ads-serving, ads-tracking, ads-billing, mkt-x, mkt-zalo.
**Fix**:
```csharp
app.MapHealthChecks("/health").AllowAnonymous();
app.MapHealthChecks("/health/live", new() { Predicate = _ => false }).AllowAnonymous();
app.MapHealthChecks("/health/ready").AllowAnonymous();
```
---
### C3 — MediatR Behaviors Duplicated Across All 23 Services
**Severity**: HIGH — Maintenance risk; single bug requires 23 fixes
**Files**: `services/*/src/*/Application/Behaviors/`
Three behavior files are copied verbatim in every service:
| File | Lines per copy | Total copies | Total duplicated lines |
|------|---------------|--------------|----------------------|
| `LoggingBehavior.cs` | ~59 | 23 | ~1,357 |
| `ValidatorBehavior.cs` | ~41 | 23 | ~943 |
| `TransactionBehavior.cs` | ~50 | 23 | ~1,150 |
| **Total** | | | **~3,450** |
Example of identical files:
- `services/merchant-service-net/src/MerchantService.API/Application/Behaviors/LoggingBehavior.cs`
- `services/order-service-net/src/OrderService.API/Application/Behaviors/LoggingBehavior.cs`
Only the namespace differs. Any fix in one behavior must be manually propagated to 22 other copies.
**Fix**: Extract to a shared NuGet package `GoodGo.MediatR.Behaviors`:
```
packages/
GoodGo.MediatR.Behaviors/
LoggingBehavior.cs
ValidatorBehavior.cs
TransactionBehavior.cs
```
Reference from each service:
```xml
<PackageReference Include="GoodGo.MediatR.Behaviors" Version="1.0.0" />
```
---
### C4 — JWT Signature Validation Disabled in Development
**Severity**: MEDIUM — Allows any token to be accepted in dev environments
**Files**: `services/*/src/*/Program.cs` lines ~91108
```csharp
ValidateIssuerSigningKey = builder.Environment.IsDevelopment() ? false : true,
SignatureValidator = builder.Environment.IsDevelopment()
? (token, _) => new JsonWebToken(token)
: null,
```
In development, any crafted JWT with valid `exp`/`nbf` claims is accepted. Tokens are not signature-verified. Risk is mitigated by `IsDevelopment()` check but raises the question of why this is needed at all.
**Root Cause**: Likely the IAM service (Duende IdentityServer) uses a rotating signing key not pre-shared with other services in Docker Compose local dev.
**Fix**: Seed a deterministic development signing key via environment variable and share it across services in `docker-compose.yml`. This enables full signature validation in all environments.
---
## Warnings
### W1 — SeedWork Classes Duplicated Across Domain Layers
**Impact**: Medium — Changes to base patterns require multi-service updates
**Files**: `services/*/src/*/Domain/SeedWork/`
`Entity.cs`, `IAggregateRoot.cs`, `IRepository.cs`, `ValueObject.cs`, `IUnitOfWork.cs`, `Enumeration.cs` are copied into every service's Domain layer. This is partially justified by DDD isolation (Domain layer must have no external NuGet dependencies), but it means fixing a bug in `Entity.cs` requires updates in 23 places.
**Recommendation**: Document this as an intentional architectural trade-off in CLAUDE.md (currently undocumented). If the team wants single-source-of-truth, create `GoodGo.Domain.SeedWork` as a source-only NuGet (no DI dependencies).
---
### W2 — API Versioning Configured But Not Fully Utilized
**Impact**: Low — Unused NuGet package capability
**Files**: `services/*/src/*/Program.cs` lines 4256
```csharp
builder.Services.AddApiVersioning(options => {
options.DefaultApiVersion = new ApiVersion(1, 0);
options.AssumeDefaultVersionWhenUnspecified = true; // Bypasses versioning
});
```
Controllers use hardcoded routes (`[Route("api/v1/merchants")]`) rather than the `{version:apiVersion}` parameter. `Asp.Versioning` NuGet is installed but the API version attribute system is not used. Routes work correctly but versioning capability is nullified.
**Options**:
1. Accept current pattern (hardcoded v1 routes) and document it
2. Migrate to `[ApiVersion("1.0")]` on controllers and use `{version:apiVersion}` route parameter
---
### W3 — Missing README.md in 14 Services
**Impact**: Low — Developer onboarding friction
**Services with README**: iam-service-net, merchant-service-net, membership-service-net, mkt-facebook-service-net, mkt-whatsapp-service-net, mkt-x-service-net, mkt-zalo-service-net (7 of 23)
**Services without README**: order, wallet, storage, booking, catalog, inventory, chat, social, mining, mission, promotion, ads-manager, ads-serving, ads-tracking, ads-billing, ads-analytics (16 of 23)
---
### W4 — `init-databases.sh` Must Be Manually Maintained
**File**: `deployments/local/init-databases.sh`
When a new service is created, the database must be manually added to this script. There is no automated enforcement. If a developer creates a new service and forgets this step, the service's database won't be initialized in local dev.
**Recommendation**: Add a pre-start validation script or generate the init script from docker-compose.yml service names.
---
### W5 — One Incomplete Test TODO
**File**: `services/iam-service-net/tests/IamService.FunctionalTests/Controllers/AuthorizationPolicyTests.cs`
```csharp
/// TODO: Configure test authentication to properly inject role claims.
```
Minor — single comment in test code, not production code. But indicates an incomplete test scenario for authorization policies.
---
## Improvements
### I1 — Extract Common Infrastructure to Shared NuGet Packages
Priority: High
Create a `GoodGo.*` suite of shared NuGet packages:
- `GoodGo.MediatR.Behaviors` — LoggingBehavior, ValidatorBehavior, TransactionBehavior
- `GoodGo.AspNetCore.Extensions` — Common Program.cs middleware setup
- `GoodGo.Domain.SeedWork` *(optional, source-only)* — Entity, ValueObject, IAggregateRoot
This eliminates the bulk of cross-service duplication.
---
### I2 — Standardize CORS via Configuration
Priority: High
Move CORS allowed origins out of code and into configuration. Each service should read from `Cors:AllowedOrigins` environment variable. Docker Compose and Kubernetes manifests set per-environment values. This is a one-time templating effort with high security payoff.
---
### I3 — Add Health Check Tests to All Functional Test Suites
Priority: Medium
Verify that `/health/live` and `/health/ready` return 200 without authentication. Currently only some functional test suites include these tests. Adding them to all 23 services would catch the C2 issue automatically in CI.
```csharp
[Fact]
public async Task HealthLive_ReturnsOk_WithoutAuthentication()
{
var response = await _client.GetAsync("/health/live");
response.StatusCode.Should().Be(HttpStatusCode.OK);
}
```
---
### I4 — Centralize Docker Compose Environment Defaults
Priority: Medium
Several environment variables are repeated across 23+ service definitions in `docker-compose.yml`. Consider using YAML anchors or a `.env` fragment approach to reduce drift:
```yaml
x-common-env: &common-env
ASPNETCORE_ENVIRONMENT: Development
Serilog__MinimumLevel: Information
ConnectionStrings__Redis: redis:6379
```
---
### I5 — Add Consistent CI Health Check Validation
Priority: Low
Add a CI step that calls `/health/live` on each newly-built Docker image before pushing to Docker Hub. This catches health check regressions before staging deployment.
---
## Action Items (Prioritized)
| # | Action | Priority | Effort | Owner |
|---|--------|----------|--------|-------|
| 1 | Fix CORS — replace `AllowAnyOrigin()` with `WithOrigins()` in all 23 services | P0 | 2h | Founding Engineer |
| 2 | Fix health check authentication — add `.AllowAnonymous()` to 18 services | P0 | 1h | Founding Engineer |
| 3 | Extract MediatR behaviors to `GoodGo.MediatR.Behaviors` NuGet | P1 | 4h | Founding Engineer |
| 4 | Fix JWT development signing key — use deterministic key via env var | P1 | 2h | Founding Engineer |
| 5 | Add health check tests to all functional test suites | P1 | 3h | QA Engineer |
| 6 | Create README.md for 16 missing services | P2 | 8h | Tech Lead |
| 7 | Document SeedWork duplication trade-off in CLAUDE.md | P2 | 0.5h | Tech Lead |
| 8 — | Resolve IAM authorization test TODO | P2 | 1h | Backend Dev |
| 9 | Clarify API versioning strategy and document or clean up | P3 | 1h | CTO |
---
## Architecture Strengths
The following areas are well-implemented and should serve as reference patterns:
- **Clean Architecture**: All 23 services strictly enforce the Domain → no external dependencies rule
- **CQRS**: Commands, Queries, and Handlers follow naming conventions consistently
- **Entity Pattern**: Private fields + behavior methods + domain events is correctly applied everywhere
- **Test Structure**: Every service has both unit tests (xUnit + Moq + FluentAssertions) and functional tests (WebApplicationFactory + InMemory)
- **Dockerfile**: Multi-stage build, non-root user (dotnetuser:1001), health check executable installed, port 8080
- **Traefik**: Path-based routing, rate limiting, secure headers, CORS middleware all configured
- **Bilingual**: Comments, logs, validation messages in EN + VI across the entire codebase
- **Observability**: Prometheus + Grafana + Loki stack configured in `infra/observability/`
- **Monorepo**: pnpm workspaces + Turborepo correctly set up with shared `packages/`