docs(architecture): Revise documentation for Logical Folder Architecture and Data Sovereignty

- Updated the README and ARCHITECTURE documentation to emphasize the Logical Folder structure, clarifying that folders are a logical concept in the database rather than dependent on bucket structure.
- Highlighted the benefits of using UUID-based keys and a flat bucket structure, including improved performance, security, and scalability.
- Provided detailed examples of database schema, workflows, and performance comparisons to illustrate the advantages of the new approach over traditional methods.
- Enhanced explanations of folder management processes, including creation, renaming, and file uploads, to improve developer understanding and implementation.
Ho Ngoc Hai
2026-01-13 22:37:20 +07:00
parent 1bcdfcccac
commit ecacde83ea
2 changed files with 377 additions and 41 deletions


@@ -139,17 +139,257 @@ sequenceDiagram
end
```
### Logical Folder Architecture (Data Sovereignty)
> ⚠️ **IMPORTANT**: Following the **Data Sovereignty** principle in microservices, the Storage Service must fully own its data model.
#### ❌ Anti-pattern: Relying on Bucket Structure
```
BAD APPROACH - Folder structure reflected in bucket:
storage-bucket/
├── users/john/documents/report.pdf
├── users/john/images/photo.jpg
└── users/mary/work/presentation.pptx
PROBLEMS:
- Renaming folder "documents" → "docs" = Copying and deleting every file under it (O(n))
- Moving a file = Copy + Delete in the bucket (slow, non-atomic)
- Predictable, user-derived paths → Easy to guess keys; crafted names enable path traversal
- Doesn't scale to millions of users
- Difficult to migrate to another storage provider
```
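To see where the O(n) comes from, the toy model below treats the bucket as an in-memory dictionary keyed by object path. Object stores such as S3 or MinIO expose no rename operation, so renaming a "folder" prefix means a copy plus a delete for every object under it. `InMemoryBucket` and `RenamePrefix` are hypothetical names used only for this illustration, not real storage code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for an object store (key -> bytes). Hypothetical, for illustration only.
public sealed class InMemoryBucket
{
    public Dictionary<string, byte[]> Objects { get; } = new(StringComparer.Ordinal);

    // No rename primitive exists: every key under oldPrefix is copied to a new key,
    // then the old key is deleted. Returns the number of objects moved (O(n)).
    public int RenamePrefix(string oldPrefix, string newPrefix)
    {
        // Materialize the key list first; the dictionary is modified inside the loop.
        var affected = Objects.Keys
            .Where(k => k.StartsWith(oldPrefix, StringComparison.Ordinal))
            .ToList();

        foreach (var oldKey in affected)
        {
            var newKey = newPrefix + oldKey.Substring(oldPrefix.Length);
            Objects[newKey] = Objects[oldKey]; // copy
            Objects.Remove(oldKey);            // delete
        }
        return affected.Count;
    }
}
```

The return value equals the number of objects under the old prefix, so the work grows with the folder's contents instead of staying constant.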
#### ✅ Correct Approach: Logical Separation
**Principles:**
1. **Database** = Logical structure (folders, hierarchy, permissions)
2. **Bucket** = Physical storage (flat UUID keys)
```mermaid
graph TB
subgraph "Logical Layer - PostgreSQL Database"
F[Folders Table]
FL[Files Table]
F -->|parent_id| F
FL -->|folder_id| F
end
subgraph "Physical Layer - MinIO Bucket"
B[Flat UUID Structure]
B1[private/2026/01/13/uuid1.pdf]
B2[private/2026/01/13/uuid2.jpg]
B3[public/2026/01/14/uuid3.png]
end
FL -.->|storage_key| B
style F fill:#3498db,color:#fff
style FL fill:#2ecc71,color:#fff
style B fill:#e74c3c,color:#fff
```
#### Database Schema
```sql
-- Folders: Hierarchical tree structure
CREATE TABLE folders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id VARCHAR(255) NOT NULL,
    parent_id UUID REFERENCES folders(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    path VARCHAR(1000) NOT NULL,  -- Materialized path: /docs/work/2024
    level INT NOT NULL DEFAULT 0, -- Tree depth (0 = root folder)
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE (user_id, parent_id, name)
);
-- PostgreSQL has no inline INDEX clause; secondary indexes are separate statements
CREATE INDEX idx_user_path ON folders (user_id, path);
-- NULL parent_id values never collide in UNIQUE, so root-level names need their own index
CREATE UNIQUE INDEX uq_root_folder_name ON folders (user_id, name) WHERE parent_id IS NULL;
-- Files: Link to logical folders
CREATE TABLE storage_files (
    id UUID PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    folder_id UUID REFERENCES folders(id) ON DELETE SET NULL,
    file_name VARCHAR(255) NOT NULL,
    storage_key VARCHAR(500) UNIQUE, -- Physical UUID key in bucket (UNIQUE already indexes it)
    content_type VARCHAR(100),
    file_size_bytes BIGINT,
    access_level VARCHAR(20),        -- Private, Public, Shared
    bucket_name VARCHAR(255),
    provider INT,
    uploaded_at TIMESTAMP,
    is_deleted BOOLEAN DEFAULT false
);
CREATE INDEX idx_folder_id ON storage_files (folder_id);
```
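The `path` and `level` columns are derived from the parent row when a folder is inserted. A minimal sketch of that derivation, assuming `/` is reserved as the path separator and is therefore rejected in folder names; `FolderPaths.ForChild` is a hypothetical helper, not an existing API in the service.

```csharp
using System;

public static class FolderPaths
{
    // Computes the materialized path and tree depth for a new folder.
    // parentPath/parentLevel are null for a root folder (parent_id IS NULL).
    public static (string Path, int Level) ForChild(string parentPath, int? parentLevel, string name)
    {
        if (string.IsNullOrWhiteSpace(name))
            throw new ArgumentException("Folder name is required.", nameof(name));

        // Assumption: '/' separates path segments, so it cannot appear inside a name.
        if (name.Contains('/') || name == "." || name == "..")
            throw new ArgumentException("Folder name contains a reserved sequence.", nameof(name));

        return parentPath is null
            ? ("/" + name, 0)                                                  // root: depth 0
            : (parentPath.TrimEnd('/') + "/" + name, (parentLevel ?? 0) + 1); // child: parent depth + 1
    }
}
```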
#### Physical Key Generation Strategy
```csharp
using System;
using System.IO;

public class StorageKeyGenerator
{
    /// <summary>
    /// Generate a UUID-based key (NOT based on folder path).
    /// Pattern: {prefix}/{year}/{month}/{day}/{uuid}{ext}, where {ext} includes the leading dot.
    /// </summary>
    public string GenerateKey(FileAccessLevel access, string fileName)
    {
        var now = DateTime.UtcNow;
        var prefix = access switch
        {
            FileAccessLevel.Public => "public",
            FileAccessLevel.Shared => "shared",
            _ => "private",
        };
        var uuid = Guid.NewGuid().ToString("N"); // 32 hex chars, no hyphens
        var ext = Path.GetExtension(fileName);   // ".pdf", or "" when there is no extension
        // Invariant culture keeps the date segments Gregorian regardless of server culture.
        // Example: private/2026/01/13/d290f1ee6c544b0190e6d701748f0851.pdf
        return FormattableString.Invariant($"{prefix}/{now:yyyy}/{now:MM}/{now:dd}/{uuid}{ext}");
    }
}
```
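Because `/confirm-upload` receives the object key back from the client, the API should not trust it blindly. One possible check, sketched here as an assumption rather than existing behaviour, is to accept only keys that match the generator's pattern and that equal the key recorded at sign-upload time. `StorageKeyValidator` is a hypothetical name, and the extension rule (1 to 16 ASCII letters or digits) is an assumption to adjust to what uploads actually allow.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StorageKeyValidator
{
    // {prefix}/{yyyy}/{MM}/{dd}/{32 lowercase hex}{optional .ext}
    // [0-9] is used instead of \d, which in .NET also matches non-ASCII digits.
    private static readonly Regex KeyPattern = new(
        @"^(public|private|shared)/[0-9]{4}/(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/[0-9a-f]{32}(\.[A-Za-z0-9]{1,16})?\z",
        RegexOptions.CultureInvariant);

    // Rejects anything the generator would not produce: user-style paths, "..", extra segments.
    public static bool IsWellFormed(string key) => KeyPattern.IsMatch(key);

    // The key echoed to /confirm-upload must be exactly the key stored for that file record.
    public static bool MatchesRecord(string submittedKey, string recordedKey) =>
        IsWellFormed(submittedKey) &&
        string.Equals(submittedKey, recordedKey, StringComparison.Ordinal);
}
```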
#### Bucket Structure (Flat Physical)
```
goodgo/
├── private/
│ ├── 2026/
│ │ └── 01/
│ │ └── 13/
│ │ ├── d290f1ee6c544b0190e6d701748f0851.pdf
│ │ ├── a7f3b2c19d8e4f01bcd5e92f7a4d8b63.jpg
│ │ └── f9e4c1a2b7d36e5f8a0c4d9e2b1f7a8c.docx
├── public/
│ └── 2026/
│ └── 01/
│ └── 14/
│ └── c3d8f2e1a9b47c6e5d0f8a3b2e1c7d9f.png
└── shared/
└── 2026/
└── 01/
└── 15/
└── e1f2c3d4a5b6c7e8d9f0a1b2c3d4e5f6.xlsx
NO user/documents/... folders in bucket!
```
#### Workflows
**1. Create Folder (Database Only)**
```mermaid
sequenceDiagram
Client->>API: POST /api/v1/folders {name: "documents"}
API->>Database: INSERT INTO folders (name, path)
Database-->>API: folder_id
API-->>Client: {id, name, path: "/documents"}
Note over Client,API: Bucket is NOT touched
```
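The create-folder step touches only the metadata store. The in-memory sketch below stands in for the `INSERT`: it checks that a parent folder belongs to the caller, enforces the same (user, parent, name) uniqueness as the schema, and derives the materialized path. `FolderStore` and `FolderRow` are hypothetical, and concurrency is ignored here because in practice the database constraint would catch races.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record FolderRow(Guid Id, string UserId, Guid? ParentId, string Name, string Path, int Level);

public sealed class FolderStore
{
    private readonly List<FolderRow> _rows = new();

    public FolderRow Create(string userId, string name, Guid? parentId = null)
    {
        FolderRow parent = null;
        if (parentId is Guid pid)
        {
            // Ownership check: the parent must exist and belong to the same user.
            parent = _rows.SingleOrDefault(r => r.Id == pid && r.UserId == userId)
                     ?? throw new InvalidOperationException("Parent folder not found.");
        }

        // Same rule as UNIQUE (user_id, parent_id, name), with all root folders in one scope.
        if (_rows.Any(r => r.UserId == userId && r.ParentId == parentId && r.Name == name))
            throw new InvalidOperationException($"Folder '{name}' already exists here.");

        var row = new FolderRow(
            Guid.NewGuid(), userId, parentId, name,
            Path: parent is null ? "/" + name : parent.Path + "/" + name,
            Level: parent is null ? 0 : parent.Level + 1);

        _rows.Add(row); // Metadata only: no bucket call anywhere in this method.
        return row;
    }
}
```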
**2. Upload File to Folder**
```mermaid
sequenceDiagram
Client->>API: POST /sign-upload {fileName, folderId}
API->>Database: Validate folder ownership
API->>StorageKeyGen: Generate UUID key
StorageKeyGen-->>API: private/2026/01/13/{uuid}.pdf
API->>MinIO: Get pre-signed PUT URL
MinIO-->>API: pre-signed URL
API->>Database: INSERT file (folder_id, storage_key)
API-->>Client: {uploadUrl, objectKey}
Client->>MinIO: PUT file binary
Client->>API: POST /confirm-upload {objectKey}
```
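The sign-upload steps in the diagram can be read as: check folder ownership, mint a key that ignores the folder, ask the storage provider for a pre-signed PUT URL, and record a pending file row. In this sketch the provider call sits behind a hypothetical `PresignPut` delegate (a real implementation would call the MinIO/S3 SDK), the bucket name `goodgo` and the 15-minute expiry are assumptions, key generation is inlined with a fixed `private` prefix for brevity, and `SignUploadHandler`/`PendingFile` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical seam over the storage SDK call that returns a pre-signed PUT URL.
public delegate string PresignPut(string bucket, string objectKey, TimeSpan expiry);

public sealed record PendingFile(Guid FileId, string UserId, Guid FolderId, string FileName, string StorageKey, bool Confirmed);

public sealed class SignUploadHandler
{
    private readonly PresignPut _presign;
    private readonly Func<Guid, string> _folderOwner; // folderId -> owner userId, null if missing

    public List<PendingFile> Files { get; } = new();

    public SignUploadHandler(PresignPut presign, Func<Guid, string> folderOwner)
    {
        _presign = presign;
        _folderOwner = folderOwner;
    }

    public (string UploadUrl, string ObjectKey) Sign(string userId, Guid folderId, string fileName)
    {
        // 1. Logical layer: the folder must exist and belong to the caller.
        if (_folderOwner(folderId) != userId)
            throw new UnauthorizedAccessException("Folder not found or not owned by caller.");

        // 2. Physical key: date + random UUID only; the folder never appears in it.
        var now = DateTime.UtcNow;
        var key = FormattableString.Invariant(
            $"private/{now:yyyy}/{now:MM}/{now:dd}/{Guid.NewGuid():N}{Path.GetExtension(fileName)}");

        // 3. Pre-signed PUT so the client uploads directly to the bucket.
        var url = _presign("goodgo", key, TimeSpan.FromMinutes(15));

        // 4. Pending metadata row; /confirm-upload would later mark it confirmed.
        Files.Add(new PendingFile(Guid.NewGuid(), userId, folderId, fileName, key, Confirmed: false));
        return (url, key);
    }
}
```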
**3. List Files in Folder**
```sql
-- Database query (fast, indexed)
SELECT * FROM storage_files
WHERE folder_id = '{folder-uuid}'
  AND is_deleted = false
ORDER BY uploaded_at DESC;
```
Result returned to client:
```jsonc
{
  "folder": {
    "id": "folder-123",
    "path": "/documents/work"
  },
  "files": [
    {
      "id": "file-456",
      "name": "report.pdf",
      "logicalPath": "/documents/work/report.pdf",   // Displayed to user
      "storageKey": "private/2026/01/13/{uuid}.pdf"  // Physical key
    }
  ]
}
```
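The `logicalPath` in that response does not need to be stored; it can be derived from the folder's materialized path plus the file name, while `storageKey` is passed through untouched. A sketch with hypothetical record names (`FileRecord`, `FileListItem`, `FolderListing`) that mirrors the query's filter and ordering in memory:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record FolderInfo(Guid Id, string Path);
public sealed record FileRecord(Guid Id, Guid FolderId, string FileName, string StorageKey, DateTime UploadedAt, bool IsDeleted);
public sealed record FileListItem(Guid Id, string Name, string LogicalPath, string StorageKey);
public sealed record FolderListing(FolderInfo Folder, IReadOnlyList<FileListItem> Files);

public static class FolderListingBuilder
{
    // Mirrors: WHERE folder_id = @id AND is_deleted = false ORDER BY uploaded_at DESC
    public static FolderListing Build(FolderInfo folder, IEnumerable<FileRecord> files) =>
        new(folder,
            files.Where(f => f.FolderId == folder.Id && !f.IsDeleted)
                 .OrderByDescending(f => f.UploadedAt)
                 .Select(f => new FileListItem(
                     f.Id,
                     f.FileName,
                     LogicalPath: folder.Path.TrimEnd('/') + "/" + f.FileName, // shown to the user
                     StorageKey: f.StorageKey))                                  // physical key, unchanged
                 .ToList());
}
```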
**4. Rename Folder**
```sql
-- Only UPDATE database (instant, O(1))
UPDATE folders
SET name = 'docs',
path = '/docs' -- Update materialized path
WHERE id = '{folder-id}';
-- Update descendant paths
UPDATE folders
SET path = REPLACE(path, '/documents', '/docs')
WHERE path LIKE '/documents/%';
-- Files KEEP their storage_key - DON'T TOUCH bucket!
```
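The same prefix rewrite as an in-memory sketch over hypothetical `FolderNode` objects. It makes the two safety rules explicit: only paths that start with the old prefix followed by `/` are rewritten (a plain string replace would also hit `/documents-old` or a nested `/documents`), and only within the owning user's tree. No file row or storage key is visited.

```csharp
using System;
using System.Collections.Generic;

public sealed class FolderNode
{
    public Guid Id { get; init; }
    public string UserId { get; init; } = "";
    public string Name { get; set; } = "";
    public string Path { get; set; } = "";
}

public static class FolderRenamer
{
    // Returns how many folder rows changed: the folder itself plus its descendants.
    public static int Rename(IEnumerable<FolderNode> folders, FolderNode target, string newName)
    {
        var oldPath = target.Path; // captured before the target is modified
        var parentPath = oldPath.Substring(0, oldPath.LastIndexOf('/')); // "" for a root folder
        var newPath = parentPath + "/" + newName;
        var changed = 0;

        foreach (var f in folders)
        {
            if (f.UserId != target.UserId) continue; // never touch other users' trees

            if (f.Id == target.Id)
            {
                f.Name = newName;
                f.Path = newPath;
                changed++;
            }
            else if (f.Path.StartsWith(oldPath + "/", StringComparison.Ordinal))
            {
                // Replace only the leading prefix, keep the rest of the path as-is.
                f.Path = newPath + f.Path.Substring(oldPath.Length);
                changed++;
            }
        }
        return changed;
    }
}
```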
**5. Move File Between Folders**
```sql
-- Only UPDATE database (instant, O(1))
UPDATE storage_files
SET folder_id = '{new-folder-id}'
WHERE id = '{file-id}';
-- Physical file STAYS at old storage_key
-- No Copy/Delete needed in bucket
```
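A move is a single-row update. The sketch below, using hypothetical records, adds the check the SQL above leaves implicit: the destination folder must belong to the same user. The storage key is carried over unchanged, which is why no bucket copy is needed.

```csharp
using System;

public sealed record FileEntry(Guid Id, string UserId, Guid? FolderId, string StorageKey);
public sealed record FolderOwner(Guid Id, string UserId);

public static class FileMover
{
    public static FileEntry Move(FileEntry entry, FolderOwner destination)
    {
        // Ownership check: moving into another user's folder must be rejected.
        if (destination.UserId != entry.UserId)
            throw new UnauthorizedAccessException("Destination folder belongs to another user.");

        // Only folder_id changes; StorageKey is copied as-is, so the object stays put.
        return entry with { FolderId = destination.Id };
    }
}
```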
#### Performance Comparison
| Operation | Anti-pattern (Bucket-based) | Correct (Logical) | Improvement |
|-----------|----------------------------|-------------------|-------------|
| **Create folder** | N/A (folders are implicit key prefixes) | INSERT 1 row in DB | Instant |
| **Rename folder** | Copy + Delete every file under the prefix | UPDATE the folder row and descendant folder paths | No object I/O; cost independent of file count |
| **Move file** | Copy + Delete 1 object | UPDATE 1 row in DB | No object copy |
| **List files** | Bucket prefix scan | Indexed DB query | Index lookup instead of object listing |
| **Delete folder** | Delete every file under the prefix | Cascade delete + background cleanup | Async |
#### Data Sovereignty Benefits
1. **Performance:**
   - Rename folder: row updates for the folder and its subfolders instead of O(n) object copies
   - Move file: one row UPDATE instead of an object copy + delete
   - List files: Indexed query instead of bucket scan
2. **Security:**
   - UUID keys are unpredictable
   - Folder and file names never become key segments (only the extension is kept), closing off path traversal through crafted names
   - Access control in database
3. **Scalability:**
   - Database handles metadata (fast)
   - Bucket handles blobs (simple)
   - Easy horizontal scaling
4. **Flexibility:**
   - Easy to add features (sharing, versioning)
   - Easier migration between storage providers
   - Support for multiple buckets/regions
5. **Developer Experience:**
   - Clients see a familiar folder tree
   - Backend works with simple UUIDs
   - Clean separation of concerns
### Direct Upload Components
| Component | Purpose |


@@ -183,48 +183,144 @@ Response:
}
```
## Logical Folder Architecture (Data Sovereignty)
> ⚠️ **IMPORTANT**: Following the **Data Sovereignty** principle in microservices, folders are a **logical concept in the Database**, NOT dependent on bucket structure.
### Design Principles
Storage Service owns its own data model:
- **Database** manages logical folder structure (hierarchy, paths, permissions)
- **Bucket** only stores file binaries with UUID keys (flat structure)
### ❌ Anti-pattern (DON'T DO THIS)
```
Bucket structure based on user paths:
goodgo/
├── users/john/documents/report.pdf ← BAD
├── users/mary/images/photo.jpg ← BAD
└── users/bob/work/presentation.pptx ← BAD
PROBLEMS:
- Renaming folder = Copying every file under it (slow + risky)
- Predictable paths = Easy to guess keys; crafted names enable path traversal
- Doesn't scale to millions of users
```
> **Note**: The object key is returned in the `/sign-upload` response and must be sent back when calling `/confirm-upload`.
### ✅ Correct Architecture (Logical Separation)
#### 1️⃣ Database: Logical Structure
```sql
-- Folders table: Manages folder tree
CREATE TABLE folders (
id UUID PRIMARY KEY,
user_id VARCHAR(255) NOT NULL,
parent_id UUID REFERENCES folders(id),
name VARCHAR(255) NOT NULL,
path VARCHAR(1000) NOT NULL, -- Example: /docs/work/2024
created_at TIMESTAMP
);
-- Files table: Links to logical folder
CREATE TABLE files (
id UUID PRIMARY KEY,
user_id VARCHAR(255) NOT NULL,
folder_id UUID REFERENCES folders(id), -- Logical folder
file_name VARCHAR(255) NOT NULL,
storage_key VARCHAR(500) UNIQUE, -- Physical key (UUID)
size_bytes BIGINT,
content_type VARCHAR(100),
access_level VARCHAR(20),
created_at TIMESTAMP
);
```
#### 2️⃣ Bucket: Flat Physical Storage
Files are stored with **UUID-based keys** following this pattern:
```
{prefix}/{year}/{month}/{day}/{uuid}.{ext}
Example:
goodgo/
├── private/2026/01/13/d290f1ee6c544b0190e6d701748f0851.pdf
├── private/2026/01/13/a7f3b2c19d8e4f01bcd5e92f7a4d8b63.jpg
├── public/2026/01/14/f9e4c1a2b7d36e5f8a0c4d9e2b1f7a8c.png
└── shared/2026/01/15/c3d8f2e1a9b47c6e5d0f8a3b2e1c7d9f.docx
NO user/folder structure in bucket!
All are flat UUID keys.
```
#### 3️⃣ Storage Key Generation
```csharp
// Automatically generate physical key (NOT based on folder)
// Requires: using System; using System.IO;
public string GenerateStorageKey(FileAccessLevel access, string fileName)
{
    var now = DateTime.UtcNow;
    var prefix = access switch
    {
        FileAccessLevel.Public => "public",
        FileAccessLevel.Shared => "shared",
        _ => "private",
    };
    var uuid = Guid.NewGuid().ToString("N");
    var ext = Path.GetExtension(fileName); // includes the leading dot, or "" if none
    // Pattern: {prefix}/{year}/{month}/{day}/{uuid}{ext}; invariant culture keeps dates Gregorian
    return FormattableString.Invariant($"{prefix}/{now:yyyy}/{now:MM}/{now:dd}/{uuid}{ext}");
}
```
### Benefits
| Operation | Database (Logic) | Bucket (Physical) | Performance |
|-----------|------------------|-------------------|-------------|
| **Create folder** | INSERT 1 row | Nothing | Instant |
| **Rename folder** | UPDATE folder row + descendant paths | Nothing | No object I/O; cost grows with subfolders only |
| **Move file** | UPDATE File.folder_id | Nothing | O(1) - Instant |
| **Delete folder** | Cascade delete | Queue cleanup | Background job |

**Compared to Anti-pattern:**
- Rename folder: updates to the folder and subfolder rows vs O(n) object copies
- Security: UUID keys are unpredictable
- Migration: Easy to switch storage provider
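The "Queue cleanup" entry for deleting a folder splits the work in two: metadata changes happen in the request, and the storage keys of affected files go to a background job that deletes the objects later. A minimal sketch of the first half, assuming files are soft-deleted via `is_deleted` rather than relying on the foreign key, and using a plain `Queue<string>` as a stand-in for whatever job queue the service uses (all names hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Dir(Guid Id, Guid? ParentId);

public sealed class StoredFile
{
    public Guid Id { get; init; }
    public Guid FolderId { get; init; }
    public string StorageKey { get; init; } = "";
    public bool IsDeleted { get; set; }
}

public static class FolderDeletion
{
    // Soft-deletes every file under rootId (including subfolders) and enqueues its key.
    // Returns all folder ids in the subtree so the caller can delete those rows.
    public static HashSet<Guid> DeleteSubtree(
        Guid rootId, IReadOnlyList<Dir> dirs, IEnumerable<StoredFile> files, Queue<string> cleanupQueue)
    {
        // Breadth-first walk over parent_id links; the HashSet also guards against cycles.
        var subtree = new HashSet<Guid> { rootId };
        var pending = new Queue<Guid>();
        pending.Enqueue(rootId);
        while (pending.Count > 0)
        {
            var current = pending.Dequeue();
            foreach (var child in dirs.Where(d => d.ParentId == current))
                if (subtree.Add(child.Id)) pending.Enqueue(child.Id);
        }

        foreach (var sf in files.Where(f => !f.IsDeleted && subtree.Contains(f.FolderId)))
        {
            sf.IsDeleted = true;                 // metadata change happens now
            cleanupQueue.Enqueue(sf.StorageKey); // object deletion happens later, in the background
        }
        return subtree;
    }
}
```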
### Workflow Examples
**Upload file to folder:**
```bash
# 1. Create folder (Database only)
POST /api/v1/folders
{
"name": "documents",
"parentId": null
}
# 2. Upload file to folder
POST /api/v1/storage/sign-upload
{
"fileName": "report.pdf",
"folderId": "folder-uuid-123", # Logical folder
"fileSizeBytes": 1048576
}
# Response: Physical key DOES NOT contain folder name
{
"uploadUrl": "...",
"objectKey": "private/2026/01/13/d290f1ee6c544b0190e6d701748f0851.pdf"
}
```
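On a .NET client, the request bodies above could be produced with `System.Text.Json`; `JsonSerializerDefaults.Web` yields the camelCase property names used in these examples. The record names, and typing `folderId` as a `Guid`, are assumptions for illustration. The helper also shows the `objectKey` from `/sign-upload` being echoed unchanged into the `/confirm-upload` body.

```csharp
using System;
using System.Text.Json;

// Hypothetical DTOs mirroring the JSON in the example above.
public sealed record SignUploadRequest(string FileName, Guid FolderId, long FileSizeBytes);
public sealed record SignUploadResponse(string UploadUrl, string ObjectKey);
public sealed record ConfirmUploadRequest(string ObjectKey);

public static class StorageApiJson
{
    // Web defaults: camelCase property names, case-insensitive reads.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static string ToJson<T>(T value) => JsonSerializer.Serialize(value, Options);

    // Reads the sign-upload response and builds the confirm-upload body from its objectKey.
    public static string ConfirmBodyFor(string signUploadResponseJson)
    {
        var response = JsonSerializer.Deserialize<SignUploadResponse>(signUploadResponseJson, Options)
                       ?? throw new JsonException("Empty sign-upload response.");
        return ToJson(new ConfirmUploadRequest(response.ObjectKey));
    }
}
```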
**List files in folder:**
```bash
GET /api/v1/folders/{folderId}/files
# Database query: SELECT * FROM files WHERE folder_id = {folderId}
# Client sees: /documents/report.pdf (logical path)
# Bucket stores: private/2026/01/13/{uuid}.pdf (physical key)
```
> 📚 **Technical Details:** See [ARCHITECTURE.md](./ARCHITECTURE.md) for deeper understanding of this pattern.
## Legacy Upload Example (Via Backend)