docs(architecture): Revise documentation for Logical Folder Architecture and Data Sovereignty

- Updated the README and ARCHITECTURE documentation to emphasize the Logical Folder structure, clarifying that folders are a logical concept in the database and do not depend on bucket structure.
- Highlighted the benefits of UUID-based keys and a flat bucket structure, including improved performance, security, and scalability.
- Provided detailed examples of the database schema, workflows, and performance comparisons to illustrate the advantages of the new approach over the bucket-path approach.
- Expanded the explanations of folder management processes (creation, renaming, and file uploads) to improve developer understanding and implementation.
@@ -139,17 +139,257 @@ sequenceDiagram
|
||||
end
|
||||
```
|
||||
|
||||
### Path-based Access Control
|
||||
### Logical Folder Architecture (Data Sovereignty)
|
||||
|
||||
Files are organized with access level prefixes:
|
||||
> ⚠️ **IMPORTANT**: Following the **Data Sovereignty** principle in microservices, Storage Service must fully own its data model.
|
||||
|
||||
#### ❌ Anti-pattern: Relying on Bucket Structure
|
||||
|
||||
```
|
||||
BAD APPROACH - Folder structure reflected in bucket:
|
||||
storage-bucket/
|
||||
├── public/{userId}/{date}/{fileId}_{filename} → Publicly accessible
|
||||
├── private/{userId}/{date}/{fileId}_{filename} → Requires pre-signed URL
|
||||
└── shared/{userId}/{date}/{fileId}_{filename} → Access controlled by rules
|
||||
├── users/john/documents/report.pdf
|
||||
├── users/john/images/photo.jpg
|
||||
└── users/mary/work/presentation.pptx
|
||||
|
||||
PROBLEMS:
|
||||
- Renaming folder "documents" → "docs" = Moving millions of files (O(n))
|
||||
- Moving file = Copy + Delete on bucket (slow, risky)
|
||||
- Predictable paths → Keys are easy to guess or enumerate, and user-supplied names in keys invite path traversal
|
||||
- Doesn't scale with millions of users
|
||||
- Difficult to migrate to another storage provider
|
||||
```
|
||||
|
||||
#### ✅ Correct Approach: Logical Separation
|
||||
|
||||
**Principles:**
|
||||
1. **Database** = Logical structure (folders, hierarchy, permissions)
|
||||
2. **Bucket** = Physical storage (flat UUID keys)
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Logical Layer - PostgreSQL Database"
|
||||
F[Folders Table]
|
||||
FL[Files Table]
|
||||
F -->|parent_id| F
|
||||
FL -->|folder_id| F
|
||||
end
|
||||
|
||||
subgraph "Physical Layer - MinIO Bucket"
|
||||
B[Flat UUID Structure]
|
||||
B1[private/2026/01/13/uuid1.pdf]
|
||||
B2[private/2026/01/13/uuid2.jpg]
|
||||
B3[public/2026/01/14/uuid3.png]
|
||||
end
|
||||
|
||||
FL -.->|storage_key| B
|
||||
|
||||
style F fill:#3498db,color:#fff
|
||||
style FL fill:#2ecc71,color:#fff
|
||||
style B fill:#e74c3c,color:#fff
|
||||
```
|
||||
|
||||
#### Database Schema
|
||||
|
||||
```sql
-- Folders: Hierarchical tree structure
CREATE TABLE folders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id VARCHAR(255) NOT NULL,
    parent_id UUID REFERENCES folders(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    path VARCHAR(1000) NOT NULL, -- Materialized path: /docs/work/2024
    level INT NOT NULL DEFAULT 0, -- Tree depth
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),

    -- Note: NULL parent_id values never compare equal, so this constraint
    -- does not prevent duplicate names among root folders.
    UNIQUE (user_id, parent_id, name)
);

-- PostgreSQL has no inline INDEX clause; indexes are created separately.
CREATE INDEX idx_user_path ON folders (user_id, path);

-- Files: Link to logical folders
CREATE TABLE storage_files (
    id UUID PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    folder_id UUID REFERENCES folders(id) ON DELETE SET NULL,
    file_name VARCHAR(255) NOT NULL,
    storage_key VARCHAR(500) UNIQUE, -- Physical UUID key in bucket
    content_type VARCHAR(100),
    file_size_bytes BIGINT,
    access_level VARCHAR(20), -- Private, Public, Shared
    bucket_name VARCHAR(255),
    provider INT,
    uploaded_at TIMESTAMP,
    is_deleted BOOLEAN DEFAULT false
);

CREATE INDEX idx_folder_id ON storage_files (folder_id);
-- storage_key is already indexed by its UNIQUE constraint.
```
|
||||
|
||||
#### Physical Key Generation Strategy
|
||||
|
||||
```csharp
using System;
using System.IO;

public class StorageKeyGenerator
{
    /// <summary>
    /// Generate UUID-based key (NOT based on folder path)
    /// Pattern: {prefix}/{year}/{month}/{day}/{uuid}.{ext}
    /// </summary>
    public string GenerateKey(FileAccessLevel access, string fileName)
    {
        var now = DateTime.UtcNow;

        // Each access level gets its own prefix, matching the private/,
        // public/ and shared/ prefixes used in the bucket layout.
        var prefix = access switch
        {
            FileAccessLevel.Public => "public",
            FileAccessLevel.Shared => "shared",
            _ => "private"
        };

        var uuid = Guid.NewGuid().ToString("N"); // 32 chars, no hyphens
        var ext = Path.GetExtension(fileName);   // Includes the leading dot, e.g. ".pdf"

        // Example: private/2026/01/13/d290f1ee6c544b0190e6d701748f0851.pdf
        return $"{prefix}/{now:yyyy}/{now:MM}/{now:dd}/{uuid}{ext}";
    }
}
```
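
Because `fileName` comes from the client, the extension is the one piece of user input that still ends up in the object key. The following is a minimal sketch of one way to constrain it. The `SafeExtension` helper and its 10-character limit are assumptions made for illustration and are not part of the service.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: not part of the documented service.
public static class SafeExtension
{
    // Illustrative limit on the extension length (without the dot).
    private const int MaxLength = 10;

    // Returns a lowercase extension such as ".pdf", or "" when the
    // client-supplied name has no extension or an unusual one.
    public static string From(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName)) return string.Empty;

        var ext = Path.GetExtension(fileName).ToLowerInvariant();
        if (ext.Length < 2 || ext.Length > MaxLength + 1) return string.Empty;

        // Accept ASCII letters and digits only after the leading dot.
        var body = ext.Substring(1);
        return body.All(c => (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            ? ext
            : string.Empty;
    }
}
```

A generator could call `SafeExtension.From(fileName)` in place of `Path.GetExtension(fileName)`, so that a key never carries separators or other unexpected characters from the original file name.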
|
||||
|
||||
#### Bucket Structure (Flat Physical)
|
||||
|
||||
```
|
||||
goodgo/
|
||||
├── private/
|
||||
│ ├── 2026/
|
||||
│ │ └── 01/
|
||||
│ │ └── 13/
|
||||
│ │ ├── d290f1ee6c544b0190e6d701748f0851.pdf
|
||||
│ │ ├── a7f3b2c19d8e4f01bcd5e92f7a4d8b63.jpg
|
||||
│ │ └── f9e4c1a2b7d36e5f8a0c4d9e2b1f7a8c.docx
|
||||
├── public/
|
||||
│ └── 2026/
|
||||
│ └── 01/
|
||||
│ └── 14/
|
||||
│ └── c3d8f2e1a9b47c6e5d0f8a3b2e1c7d9f.png
|
||||
└── shared/
|
||||
└── 2026/
|
||||
└── 01/
|
||||
└── 15/
|
||||
└── e1f2c3d4a5b6c7e8d9f0a1b2c3d4e5f6.xlsx
|
||||
|
||||
NO user/documents/... folders in bucket!
|
||||
```
|
||||
|
||||
#### Workflows
|
||||
|
||||
**1. Create Folder (Database Only)**
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
Client->>API: POST /api/v1/folders {name: "documents"}
|
||||
API->>Database: INSERT INTO folders (name, path)
|
||||
Database-->>API: folder_id
|
||||
API-->>Client: {id, name, path: "/documents"}
|
||||
Note over Client,API: Bucket is NOT touched
|
||||
```
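
The `INSERT` in the diagram needs `path` and `level` values derived from the parent row. Below is a minimal sketch of that derivation, assuming a hypothetical `FolderRow` record that mirrors the `folders` columns and a hypothetical `FolderFactory` helper. The name validation rule is illustrative only.

```csharp
#nullable enable
using System;

// Hypothetical in-memory mirror of a row in the folders table.
public record FolderRow(Guid Id, string UserId, Guid? ParentId, string Name, string Path, int Level);

public static class FolderFactory
{
    // Builds the row for a new folder. Pass null as parent for a root folder.
    public static FolderRow Create(string userId, string name, FolderRow? parent)
    {
        if (string.IsNullOrWhiteSpace(name) || name.Contains('/'))
            throw new ArgumentException("Folder name must be non-empty and must not contain '/'.", nameof(name));
        if (parent is not null && parent.UserId != userId)
            throw new UnauthorizedAccessException("Parent folder belongs to another user.");

        // Root folders sit directly under "/"; children extend the parent's materialized path.
        var path = parent is null ? "/" + name : parent.Path + "/" + name;
        var level = parent is null ? 0 : parent.Level + 1;

        return new FolderRow(Guid.NewGuid(), userId, parent?.Id, name, path, level);
    }
}
```

Only database values are computed here, which matches the note in the diagram that the bucket is not touched.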
|
||||
|
||||
**2. Upload File to Folder**
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
Client->>API: POST /sign-upload {fileName, folderId}
|
||||
API->>Database: Validate folder ownership
|
||||
API->>StorageKeyGen: Generate UUID key
|
||||
StorageKeyGen-->>API: private/2026/01/13/{uuid}.pdf
|
||||
API->>MinIO: Get pre-signed PUT URL
|
||||
MinIO-->>API: pre-signed URL
|
||||
API->>Database: INSERT file (folder_id, storage_key)
|
||||
API-->>Client: {uploadUrl, objectKey}
|
||||
Client->>MinIO: PUT file binary
|
||||
Client->>API: POST /confirm-upload
|
||||
```
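
The server-side half of this sequence can be sketched as follows. Everything here is hypothetical: `SignUploadHandler`, `PendingFile`, `SignUploadResult`, and the three delegates stand in for the real database and MinIO calls, and the prefix is fixed to `private` to keep the example short.

```csharp
#nullable enable
using System;
using System.IO;

// Hypothetical shapes for illustration; not the service's actual types.
public record PendingFile(Guid Id, string UserId, Guid FolderId, string FileName, string StorageKey);
public record SignUploadResult(string UploadUrl, string ObjectKey);

public class SignUploadHandler
{
    private readonly Func<Guid, string?> _folderOwner; // folderId -> owning userId, null if missing
    private readonly Func<string, string> _presignPut; // storage key -> pre-signed PUT URL
    private readonly Action<PendingFile> _insertFile;  // persists the file row

    public SignUploadHandler(Func<Guid, string?> folderOwner, Func<string, string> presignPut, Action<PendingFile> insertFile)
    {
        _folderOwner = folderOwner;
        _presignPut = presignPut;
        _insertFile = insertFile;
    }

    public SignUploadResult Handle(string userId, Guid folderId, string fileName)
    {
        // 1. Validate folder ownership against the database.
        if (_folderOwner(folderId) != userId)
            throw new UnauthorizedAccessException("Folder not found or not owned by caller.");

        // 2. Generate a physical key that ignores the folder path.
        var now = DateTime.UtcNow;
        var key = $"private/{now:yyyy}/{now:MM}/{now:dd}/{Guid.NewGuid():N}{Path.GetExtension(fileName)}";

        // 3. Ask object storage for a pre-signed PUT URL.
        var url = _presignPut(key);

        // 4. Record the logical link between folder and physical key.
        _insertFile(new PendingFile(Guid.NewGuid(), userId, folderId, fileName, key));

        return new SignUploadResult(url, key);
    }
}
```

The folder appears only in the database row. The key handed to object storage carries no folder or user information.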
|
||||
|
||||
**3. List Files in Folder**
|
||||
```sql
-- Database query (fast, indexed)
SELECT * FROM storage_files
WHERE folder_id = '{folder-uuid}'
  AND is_deleted = false
ORDER BY uploaded_at DESC;

/* Result returned to client:
{
  "folder": {
    "id": "folder-123",
    "path": "/documents/work"
  },
  "files": [
    {
      "id": "file-456",
      "name": "report.pdf",
      "logicalPath": "/documents/work/report.pdf",  -- Displayed to user
      "storageKey": "private/2026/01/13/{uuid}.pdf" -- Physical key
    }
  ]
}
*/
```
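
The response shape above combines the folder's materialized path with each file name. A minimal sketch of that mapping, assuming hypothetical `FileRow` and `FileItem` records and a hypothetical `FolderListing.Build` helper:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mirrors of a storage_files row and a response item.
public record FileRow(Guid Id, string FileName, string StorageKey, DateTime UploadedAt, bool IsDeleted);
public record FileItem(Guid Id, string Name, string LogicalPath, string StorageKey);

public static class FolderListing
{
    // Maps rows for one folder to response items, newest first.
    // The logical path is derived at read time; it is never stored in the bucket.
    public static List<FileItem> Build(string folderPath, IEnumerable<FileRow> rows) =>
        rows.Where(r => !r.IsDeleted)
            .OrderByDescending(r => r.UploadedAt)
            .Select(r => new FileItem(r.Id, r.FileName, folderPath.TrimEnd('/') + "/" + r.FileName, r.StorageKey))
            .ToList();
}
```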
|
||||
|
||||
**4. Rename Folder**
|
||||
```sql
-- Database-only update: cost depends on the number of descendant folders,
-- not on the number of files
UPDATE folders
SET name = 'docs',
    path = '/docs' -- Update materialized path
WHERE id = '{folder-id}';

-- Update descendant paths: rewrite only the leading '/documents' prefix,
-- and only within the owner's folders
UPDATE folders
SET path = '/docs' || SUBSTRING(path FROM LENGTH('/documents') + 1)
WHERE user_id = '{user-id}'
  AND path LIKE '/documents/%';

-- Files KEEP their storage_key - DON'T TOUCH bucket!
```
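
When descendant paths are rewritten in application code, only the leading prefix should change. A blanket string replacement would also alter a nested folder that happens to share the name. A minimal sketch, assuming a hypothetical `FolderPaths.Rebase` helper:

```csharp
using System;

// Hypothetical helper: not part of the documented service.
public static class FolderPaths
{
    // Rewrites oldPrefix to newPrefix for the folder itself and its descendants.
    // Paths that only contain oldPrefix somewhere else are returned unchanged.
    public static string Rebase(string path, string oldPrefix, string newPrefix)
    {
        if (path == oldPrefix) return newPrefix;

        if (path.StartsWith(oldPrefix + "/", StringComparison.Ordinal))
            return newPrefix + path.Substring(oldPrefix.Length);

        return path;
    }
}
```

For example, `Rebase("/documents/a/documents", "/documents", "/docs")` yields `/docs/a/documents`, whereas replacing every occurrence would also rename the inner folder.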
|
||||
|
||||
**5. Move File Between Folders**
|
||||
```sql
|
||||
-- Only UPDATE database (instant, O(1))
|
||||
UPDATE storage_files
|
||||
SET folder_id = '{new-folder-id}'
|
||||
WHERE id = '{file-id}';
|
||||
|
||||
-- Physical file STAYS at old storage_key
|
||||
-- No Copy/Delete needed in bucket
|
||||
```
|
||||
|
||||
#### Performance Comparison
|
||||
|
||||
| Operation | Anti-pattern (Bucket-based) | Correct (Logical) | Improvement (rough estimate) |
|-----------|----------------------------|-------------------|-------------|
| **Create folder** | N/A (virtual) | INSERT to DB | Instant |
| **Rename folder** | Copy + Delete millions of files | UPDATE 1-N rows in DB | ~1000x faster |
| **Move file** | Copy + Delete 1 file | UPDATE 1 row in DB | ~100x faster |
| **List files** | Bucket prefix scan | Indexed DB query | ~50x faster |
| **Delete folder** | Delete millions of files | Cascade delete + background cleanup | Async |
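
For the delete-folder row, the background cleanup needs the storage keys of every file under the folder, collected before the database rows are removed. A minimal in-memory sketch of that collection step, assuming hypothetical `FolderNode` and `FileNode` records and a hypothetical `FolderDeletion.CollectKeys` helper. A real implementation would more likely run this as a query against the `folders` and `storage_files` tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal views of the folders and storage_files tables.
public record FolderNode(Guid Id, Guid? ParentId);
public record FileNode(Guid FolderId, string StorageKey);

public static class FolderDeletion
{
    // Returns the storage keys of all files in the folder and its descendants,
    // so a background job can delete the objects from the bucket later.
    public static List<string> CollectKeys(
        Guid rootId,
        IReadOnlyCollection<FolderNode> folders,
        IReadOnlyCollection<FileNode> files)
    {
        var childrenOf = folders
            .Where(f => f.ParentId.HasValue)
            .ToLookup(f => f.ParentId.Value, f => f.Id);

        // Breadth-first walk over the folder subtree.
        var subtree = new HashSet<Guid>();
        var pending = new Queue<Guid>();
        pending.Enqueue(rootId);
        while (pending.Count > 0)
        {
            var id = pending.Dequeue();
            if (!subtree.Add(id)) continue; // skip folders already visited
            foreach (var child in childrenOf[id]) pending.Enqueue(child);
        }

        return files.Where(f => subtree.Contains(f.FolderId))
                    .Select(f => f.StorageKey)
                    .ToList();
    }
}
```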
|
||||
|
||||
#### Data Sovereignty Benefits
|
||||
|
||||
1. **Performance:**
   - Rename folder: a database update over the folder and its descendant folders instead of an O(n) copy of every file
   - Move file: a single-row update instead of a bucket copy + delete
   - List files: Indexed query instead of bucket scan

2. **Security:**
   - UUID keys are unpredictable
   - User-supplied folder names never appear in object keys, which removes a path traversal vector
   - Access control in database

3. **Scalability:**
   - Database handles metadata (fast)
   - Bucket handles blobs (simple)
   - Easy horizontal scaling

4. **Flexibility:**
   - Easy to add features (sharing, versioning)
   - Easier migration between storage providers
   - Support for multiple buckets/regions

5. **Developer Experience:**
   - Clients see a familiar folder tree
   - Backend works with simple UUIDs
   - Clean separation of concerns
|
||||
|
||||
### Direct Upload Components
|
||||
|
||||
| Component | Purpose |
|
||||
|
||||
@@ -183,48 +183,144 @@ Response:
|
||||
}
|
||||
```
|
||||
|
||||
## Bucket Directory Structure
|
||||
## Logical Folder Architecture (Data Sovereignty)
|
||||
|
||||
Files are organized by access level and user ID with the following format:
|
||||
|
||||
```
|
||||
{bucket}/
|
||||
├── private/{userId}/{date}/{fileId}_{filename} → Owner access only (via pre-signed URL)
|
||||
├── public/{userId}/{date}/{fileId}_{filename} → Publicly accessible
|
||||
└── shared/{userId}/{date}/{fileId}_{filename} → Controlled by sharing rules
|
||||
```
|
||||
|
||||
### Object Key Format Details
|
||||
|
||||
| Component | Description | Example |
|
||||
|-----------|-------------|---------|
|
||||
| `{bucket}` | Bucket name (from config) | `goodgo` |
|
||||
| `{accessLevel}` | Access level prefix | `private`, `public`, `shared` |
|
||||
| `{userId}` | Uploader's user ID | `user123` |
|
||||
| `{date}` | Upload date (UTC) | `20260113` |
|
||||
| `{fileId}` | First 8 chars of GUID | `a1b2c3d4` |
|
||||
| `{filename}` | Sanitized file name | `document.pdf` |
|
||||
|
||||
### Real-World Example
|
||||
> ⚠️ **IMPORTANT**: Following the **Data Sovereignty** principle in microservices, folders are a **logical concept in the Database**, NOT dependent on bucket structure.
|
||||
|
||||
### Design Principles
|
||||
|
||||
Storage Service owns its own data model:
|
||||
- **Database** manages logical folder structure (hierarchy, paths, permissions)
|
||||
- **Bucket** only stores file binaries with UUID keys (flat structure)
|
||||
|
||||
### ❌ Anti-pattern (DON'T DO THIS)
|
||||
|
||||
```
|
||||
Bucket structure based on user paths:
|
||||
goodgo/
|
||||
├── private/
|
||||
│ └── user123/
|
||||
│ └── 20260113/
|
||||
│ ├── a1b2c3d4_document.pdf
|
||||
│ └── e5f6g7h8_image.jpg
|
||||
├── public/
|
||||
│ └── user456/
|
||||
│ └── 20260113/
|
||||
│ └── i9j0k1l2_avatar.png
|
||||
└── shared/
|
||||
└── user789/
|
||||
└── 20260113/
|
||||
└── m3n4o5p6_presentation.pptx
|
||||
├── users/john/documents/report.pdf ← BAD
|
||||
├── users/mary/images/photo.jpg ← BAD
|
||||
└── users/bob/work/presentation.pptx ← BAD
|
||||
|
||||
PROBLEMS:
|
||||
- Renaming folder = Moving millions of files (slow + risky)
|
||||
- Predictable paths = Vulnerable to attacks
|
||||
- Doesn't scale with millions of users
|
||||
```
|
||||
|
||||
> **Note**: The object key is returned in the `/sign-upload` response and must be sent back when calling `/confirm-upload`.
|
||||
### ✅ Correct Architecture (Logical Separation)
|
||||
|
||||
#### 1️⃣ Database: Logical Structure
|
||||
|
||||
```sql
|
||||
-- Folders table: Manages folder tree
|
||||
CREATE TABLE folders (
|
||||
id UUID PRIMARY KEY,
|
||||
user_id VARCHAR(255) NOT NULL,
|
||||
parent_id UUID REFERENCES folders(id),
|
||||
name VARCHAR(255) NOT NULL,
|
||||
path VARCHAR(1000) NOT NULL, -- Example: /docs/work/2024
|
||||
created_at TIMESTAMP
|
||||
);
|
||||
|
||||
-- Files table: Links to logical folder
|
||||
CREATE TABLE files (
|
||||
id UUID PRIMARY KEY,
|
||||
user_id VARCHAR(255) NOT NULL,
|
||||
folder_id UUID REFERENCES folders(id), -- Logical folder
|
||||
file_name VARCHAR(255) NOT NULL,
|
||||
storage_key VARCHAR(500) UNIQUE, -- Physical key (UUID)
|
||||
size_bytes BIGINT,
|
||||
content_type VARCHAR(100),
|
||||
access_level VARCHAR(20),
|
||||
created_at TIMESTAMP
|
||||
);
|
||||
```
|
||||
|
||||
#### 2️⃣ Bucket: Flat Physical Storage
|
||||
|
||||
Files are stored with **UUID-based keys** following the pattern:
|
||||
|
||||
```
|
||||
{prefix}/{year}/{month}/{day}/{uuid}.{ext}
|
||||
|
||||
Example:
|
||||
goodgo/
|
||||
├── private/2026/01/13/d290f1ee6c544b0190e6d701748f0851.pdf
|
||||
├── private/2026/01/13/a7f3b2c19d8e4f01bcd5e92f7a4d8b63.jpg
|
||||
├── public/2026/01/14/f9e4c1a2b7d36e5f8a0c4d9e2b1f7a8c.png
|
||||
└── shared/2026/01/15/c3d8f2e1a9b47c6e5d0f8a3b2e1c7d9f.docx
|
||||
|
||||
NO user/folder structure in bucket!
|
||||
All are flat UUID keys.
|
||||
```
|
||||
|
||||
#### 3️⃣ Storage Key Generation
|
||||
|
||||
```csharp
// Automatically generate physical key (NOT based on folder)
public string GenerateStorageKey(FileAccessLevel access, string fileName)
{
    var now = DateTime.UtcNow;

    // Each access level gets its own prefix, matching the private/,
    // public/ and shared/ prefixes shown in the bucket example.
    var prefix = access switch
    {
        FileAccessLevel.Public => "public",
        FileAccessLevel.Shared => "shared",
        _ => "private"
    };

    var uuid = Guid.NewGuid().ToString("N");
    var ext = Path.GetExtension(fileName); // Includes the leading dot

    // Pattern: {prefix}/{year}/{month}/{day}/{uuid}{ext}
    return $"{prefix}/{now:yyyy}/{now:MM}/{now:dd}/{uuid}{ext}";
}
```
|
||||
|
||||
### Benefits
|
||||
|
||||
| Operation | Database (Logic) | Bucket (Physical) | Performance |
|-----------|------------------|-------------------|-------------|
| **Create folder** | INSERT 1 row | Nothing | Instant |
| **Rename folder** | UPDATE folder row and descendant paths | Nothing | Independent of file count |
| **Move file** | UPDATE File.folder_id | Nothing | O(1) - Instant |
| **Delete folder** | Cascade delete | Queue cleanup | Background job |

**Compared to Anti-pattern:**
- Rename folder: no file copies vs O(n) copy + delete (estimated ~1000x faster)
- Security: UUID keys are unpredictable
- Migration: Easier to switch storage provider
|
||||
|
||||
### Workflow Examples
|
||||
|
||||
**Upload file to folder:**
|
||||
```bash
|
||||
# 1. Create folder (Database only)
|
||||
POST /api/v1/folders
|
||||
{
|
||||
"name": "documents",
|
||||
"parentId": null
|
||||
}
|
||||
|
||||
# 2. Upload file to folder
|
||||
POST /api/v1/storage/sign-upload
|
||||
{
|
||||
"fileName": "report.pdf",
|
||||
"folderId": "folder-uuid-123", # Logical folder
|
||||
"fileSizeBytes": 1048576
|
||||
}
|
||||
|
||||
# Response: Physical key DOES NOT contain folder name
|
||||
{
|
||||
"uploadUrl": "...",
|
||||
"objectKey": "private/2026/01/13/d290f1ee6c544b0190e6d701748f0851.pdf"
|
||||
}
|
||||
```
|
||||
|
||||
**List files in folder:**
|
||||
```bash
|
||||
GET /api/v1/folders/{folderId}/files
|
||||
|
||||
# Database query: SELECT * FROM files WHERE folder_id = {folderId}
|
||||
# Client sees: /documents/report.pdf (logical path)
|
||||
# Bucket stores: private/2026/01/13/{uuid}.pdf (physical key)
|
||||
```
|
||||
|
||||
> 📚 **Technical Details:** See [ARCHITECTURE.md](./ARCHITECTURE.md) for a deeper look at this pattern.
|
||||
|
||||
## Legacy Upload Example (Via Backend)
|
||||
|
||||
|
||||