feat: add documentation

parent a7dc96d1a5
commit e88617b430

@@ -0,0 +1,607 @@

# Banatie Database Design

## 📊 Database Schema for AI Image Generation System

This document describes the complete database structure for Banatie - an AI-powered image generation service with support for named references, flows, and prompt URL caching.

**Version:** 2.0
**Last Updated:** 2025-10-26
**Status:** Approved for Implementation

---

## 🏗️ Architecture Overview

### Core Principles

1. **Dual Alias System**: Project-level (global) and Flow-level (temporary) scopes
2. **Technical Aliases Computed**: `@last`, `@first`, `@upload` are calculated programmatically
3. **Audit Trail**: Complete history of all generations with performance metrics
4. **Referential Integrity**: Proper foreign keys and cascade rules
5. **Simplicity First**: Minimal tables, JSONB for flexibility

### Scope Resolution Order

```
Technical aliases (@last, @first, @upload) → Flow-scoped aliases (@hero in flow) → Project-scoped aliases (@logo global)
```

---

## 📋 Existing Tables (Unchanged)

### 1. ORGANIZATIONS

```typescript
organizations {
  id: UUID (PK)
  name: TEXT
  slug: TEXT UNIQUE
  email: TEXT UNIQUE
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
}
```

**Purpose:** Top-level entity for multi-tenant system

---

### 2. PROJECTS

```typescript
projects {
  id: UUID (PK)
  organization_id: UUID (FK -> organizations) CASCADE
  name: TEXT
  slug: TEXT
  created_at: TIMESTAMP
  updated_at: TIMESTAMP

  UNIQUE INDEX(organization_id, slug)
}
```

**Purpose:** Container for all project-specific data (images, generations, flows)

---

### 3. API_KEYS

```typescript
api_keys {
  id: UUID (PK)
  key_hash: TEXT UNIQUE
  key_prefix: TEXT DEFAULT 'bnt_'
  key_type: ENUM('master', 'project')

  organization_id: UUID (FK -> organizations) CASCADE
  project_id: UUID (FK -> projects) CASCADE

  scopes: JSONB DEFAULT ['generate']

  created_at: TIMESTAMP
  expires_at: TIMESTAMP
  last_used_at: TIMESTAMP
  is_active: BOOLEAN DEFAULT true

  name: TEXT
  created_by: UUID
}
```

**Purpose:** Authentication and authorization for API access

---

## 🆕 New Tables

### 4. FLOWS

```typescript
flows {
  id: UUID (PK)
  project_id: UUID (FK -> projects) CASCADE

  // Flow-scoped named aliases (user-assigned only)
  // Technical aliases (@last, @first, @upload) computed programmatically
  // Format: { "@hero": "image-uuid", "@product": "image-uuid" }
  aliases: JSONB DEFAULT {}

  meta: JSONB DEFAULT {}

  created_at: TIMESTAMP
  // Updates on every generation/upload activity within this flow
  updated_at: TIMESTAMP
}
```

**Purpose:** Temporary chains of generations with flow-scoped references

**Key Design Decisions:**
- No `status` field - computed from generations
- No `name`/`description` - flows are programmatic, not user-facing
- No `expires_at` - cleanup handled programmatically via `created_at`
- `aliases` stores only user-assigned aliases, not technical ones

**Indexes:**
```sql
CREATE INDEX idx_flows_project ON flows(project_id, created_at DESC);
```

---

### 5. IMAGES

```typescript
images {
  id: UUID (PK)

  // Relations
  project_id: UUID (FK -> projects) CASCADE
  generation_id: UUID (FK -> generations) SET NULL
  flow_id: UUID (FK -> flows) CASCADE
  api_key_id: UUID (FK -> api_keys) SET NULL

  // Storage (MinIO path format: orgSlug/projectSlug/category/YYYY-MM/filename.ext)
  storage_key: VARCHAR(500) UNIQUE
  storage_url: TEXT

  // File metadata
  mime_type: VARCHAR(100)
  file_size: INTEGER
  file_hash: VARCHAR(64) // SHA-256 for deduplication

  // Dimensions
  width: INTEGER
  height: INTEGER
  aspect_ratio: VARCHAR(10)

  // Focal point for image transformations (imageflow)
  // Normalized coordinates: { "x": 0.5, "y": 0.3 } where 0.0-1.0
  focal_point: JSONB

  // Source
  source: ENUM('generated', 'uploaded')

  // Project-level alias (global scope)
  // Flow-level aliases stored in flows.aliases
  alias: VARCHAR(100) // @product, @logo

  // Metadata
  description: TEXT
  tags: TEXT[]
  meta: JSONB DEFAULT {}

  // Audit
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
  deleted_at: TIMESTAMP // Soft delete
}
```

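The MinIO path format noted in the `storage_key` comment (`orgSlug/projectSlug/category/YYYY-MM/filename.ext`) can be sketched as a small helper. This is a minimal sketch only: `buildStorageKey`, its parameter names, and the use of UTC for the `YYYY-MM` segment are assumptions made for illustration, not part of the schema.

```typescript
// Hypothetical helper: builds a storage key in the documented format
// orgSlug/projectSlug/category/YYYY-MM/filename.ext
interface StorageKeyParts {
  orgSlug: string;
  projectSlug: string;
  category: string; // category values are not specified in this document
  filename: string; // includes the extension
  createdAt: Date;
}

function buildStorageKey(parts: StorageKeyParts): string {
  const year = parts.createdAt.getUTCFullYear();
  const month = parts.createdAt.getUTCMonth() + 1; // getUTCMonth() is zero-based
  const yearMonth = String(year) + "-" + (month < 10 ? "0" + String(month) : String(month));
  const key = [
    parts.orgSlug,
    parts.projectSlug,
    parts.category,
    yearMonth,
    parts.filename,
  ].join("/");
  // storage_key is VARCHAR(500), so reject keys that would not fit the column
  if (key.length > 500) {
    throw new Error("storage key exceeds 500 characters");
  }
  return key;
}
```

Rejecting over-long keys in the helper keeps the failure close to the caller instead of surfacing it as a database error on insert.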

**Purpose:** Centralized storage for all images (uploaded + generated)

**Key Design Decisions:**
- `flow_id` enables flow-scoped uploads
- `alias` is for project-scope only (global across project)
- Flow-scoped aliases are stored in the `flows.aliases` JSONB column
- `focal_point` for imageflow server integration
- `api_key_id` for audit trail of who created the image
- Soft delete via `deleted_at` for recovery

**Constraints:**
```sql
-- Caveat: ON DELETE SET NULL on generation_id nulls the column on 'generated'
-- rows, which this check rejects, so deleting such a generation fails unless
-- its images are removed first.
CHECK ((source = 'uploaded' AND generation_id IS NULL)
    OR (source = 'generated' AND generation_id IS NOT NULL))

CHECK (alias IS NULL OR alias ~ '^@[a-zA-Z0-9_-]+$')

CHECK (file_size > 0)

CHECK ((width IS NULL OR (width > 0 AND width <= 8192))
   AND (height IS NULL OR (height > 0 AND height <= 8192)))
```

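The alias `CHECK` above can also be mirrored in application code, so that invalid aliases are rejected before they reach the database. A minimal sketch: `isValidProjectAlias` is a hypothetical name, and the 100-character limit is taken from the `VARCHAR(100)` column definition.

```typescript
// Same pattern as the CHECK constraint: "@" followed by letters, digits, "_" or "-".
const ALIAS_PATTERN = /^@[a-zA-Z0-9_-]+$/;
const ALIAS_MAX_LENGTH = 100; // images.alias is VARCHAR(100)

// Hypothetical validator; null means "no alias", which the constraint allows.
function isValidProjectAlias(alias: string | null): boolean {
  if (alias === null) {
    return true;
  }
  return alias.length <= ALIAS_MAX_LENGTH && ALIAS_PATTERN.test(alias);
}
```

For example, `@logo` and `@a_b-1` pass, while `logo`, `@`, and `@my logo` are rejected.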

**Indexes:**
```sql
CREATE UNIQUE INDEX idx_images_project_alias
  ON images(project_id, alias)
  WHERE alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL;

CREATE INDEX idx_images_project_source
  ON images(project_id, source, created_at DESC)
  WHERE deleted_at IS NULL;

CREATE INDEX idx_images_flow ON images(flow_id) WHERE flow_id IS NOT NULL;
CREATE INDEX idx_images_generation ON images(generation_id);
CREATE INDEX idx_images_storage_key ON images(storage_key);
CREATE INDEX idx_images_hash ON images(file_hash);
```

---

### 6. GENERATIONS

```typescript
generations {
  id: UUID (PK)

  // Relations
  project_id: UUID (FK -> projects) CASCADE
  flow_id: UUID (FK -> flows) SET NULL
  api_key_id: UUID (FK -> api_keys) SET NULL

  // Status
  status: ENUM('pending', 'processing', 'success', 'failed') DEFAULT 'pending'

  // Prompts
  original_prompt: TEXT
  enhanced_prompt: TEXT // AI-enhanced version (if enabled)

  // Generation parameters
  aspect_ratio: VARCHAR(10)
  width: INTEGER
  height: INTEGER

  // AI Model
  model_name: VARCHAR(100) DEFAULT 'gemini-flash-image-001'
  model_version: VARCHAR(50)

  // Result
  output_image_id: UUID (FK -> images) SET NULL

  // Referenced images used in generation
  // Format: [{ "imageId": "uuid", "alias": "@product" }, ...]
  referenced_images: JSONB

  // Error handling
  error_message: TEXT
  error_code: VARCHAR(50)
  retry_count: INTEGER DEFAULT 0

  // Metrics
  processing_time_ms: INTEGER
  cost: INTEGER // In cents (USD)

  // Request context
  request_id: UUID // For log correlation
  user_agent: TEXT
  ip_address: INET

  // Metadata
  meta: JSONB DEFAULT {}

  // Audit
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
}
```

**Purpose:** Complete audit trail of all image generations

**Key Design Decisions:**
- `referenced_images` as JSONB instead of M:N table (simpler, sufficient for reference info)
- No `parent_generation_id` - not needed for MVP
- No `final_prompt` - redundant with `enhanced_prompt` or `original_prompt`
- No `completed_at` - use `updated_at` when `status` changes to success/failed
- `api_key_id` for audit trail of who made the request
- Technical aliases resolved programmatically, not stored

**Referenced Images Format:**
```json
[
  { "imageId": "uuid-1", "alias": "@product" },
  { "imageId": "uuid-2", "alias": "@style" }
]
```

**Constraints:**
```sql
-- Caveat: ON DELETE SET NULL on output_image_id nulls the column on 'success'
-- rows, which this check rejects, so hard-deleting an output image fails
-- unless the generation is updated or removed first.
CHECK ((status = 'success' AND output_image_id IS NOT NULL)
    OR (status != 'success'))

CHECK ((status = 'failed' AND error_message IS NOT NULL)
    OR (status != 'failed'))

CHECK (retry_count >= 0)

CHECK (processing_time_ms IS NULL OR processing_time_ms >= 0)

CHECK (cost IS NULL OR cost >= 0)
```

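The status constraints above can be checked in application code before a row is written, which gives clearer error messages than a constraint violation. A sketch under assumptions: the `GenerationRow` shape covers only the columns involved in the checks, and `checkGenerationInvariants` is a hypothetical helper name.

```typescript
type GenerationStatus = "pending" | "processing" | "success" | "failed";

// Only the columns that take part in the CHECK constraints above.
interface GenerationRow {
  status: GenerationStatus;
  outputImageId: string | null;
  errorMessage: string | null;
  retryCount: number;
  processingTimeMs: number | null;
  cost: number | null; // cents (USD)
}

// Returns the list of violated rules; an empty list means the row would pass.
function checkGenerationInvariants(row: GenerationRow): string[] {
  const violations: string[] = [];
  if (row.status === "success" && row.outputImageId === null) {
    violations.push("success requires output_image_id");
  }
  if (row.status === "failed" && row.errorMessage === null) {
    violations.push("failed requires error_message");
  }
  if (row.retryCount < 0) {
    violations.push("retry_count must be >= 0");
  }
  if (row.processingTimeMs !== null && row.processingTimeMs < 0) {
    violations.push("processing_time_ms must be >= 0");
  }
  if (row.cost !== null && row.cost < 0) {
    violations.push("cost must be >= 0");
  }
  return violations;
}
```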

**Indexes:**
```sql
CREATE INDEX idx_generations_project_status
  ON generations(project_id, status, created_at DESC);

CREATE INDEX idx_generations_flow
  ON generations(flow_id, created_at DESC)
  WHERE flow_id IS NOT NULL;

CREATE INDEX idx_generations_output ON generations(output_image_id);
CREATE INDEX idx_generations_request ON generations(request_id);
```

---

### 7. PROMPT_URL_CACHE

```typescript
prompt_url_cache {
  id: UUID (PK)

  // Relations
  project_id: UUID (FK -> projects) CASCADE
  generation_id: UUID (FK -> generations) CASCADE
  image_id: UUID (FK -> images) CASCADE

  // Cache keys (SHA-256 hashes)
  prompt_hash: VARCHAR(64)
  query_params_hash: VARCHAR(64)

  // Original request (for debugging/reconstruction)
  original_prompt: TEXT
  request_params: JSONB // { width, height, aspectRatio, template, ... }

  // Cache statistics
  hit_count: INTEGER DEFAULT 0
  last_hit_at: TIMESTAMP

  // Audit
  created_at: TIMESTAMP
}
```

**Purpose:** Deduplication and caching for Prompt URL feature

**Key Design Decisions:**
- Composite unique key: `project_id + prompt_hash + query_params_hash`
- No `expires_at` - entries persist until manually cleared, or until the referenced project, generation, or image is deleted (CASCADE)
- Tracks `hit_count` for analytics

**Constraints:**
```sql
CHECK (hit_count >= 0)
```

**Indexes:**
```sql
CREATE UNIQUE INDEX idx_cache_key
  ON prompt_url_cache(project_id, prompt_hash, query_params_hash);

CREATE INDEX idx_cache_generation ON prompt_url_cache(generation_id);
CREATE INDEX idx_cache_image ON prompt_url_cache(image_id);
CREATE INDEX idx_cache_hits
  ON prompt_url_cache(project_id, hit_count DESC, created_at DESC);
```

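A lookup against the composite key (`project_id`, `prompt_hash`, `query_params_hash`) only hits when equivalent requests hash to the same values, so the request parameters need a stable serialization before hashing. The sketch below shows one way to do that. `canonicalizeParams` and `buildCacheKey` are hypothetical names, sorting keys alphabetically is an assumption, only flat parameter objects are handled, and the SHA-256 function is passed in by the caller rather than tied to a particular crypto library.

```typescript
type ParamValue = string | number | boolean | null;

// Serializes flat request params with keys in sorted order, so that
// { width: 1, height: 2 } and { height: 2, width: 1 } give the same string.
function canonicalizeParams(params: Record<string, ParamValue>): string {
  const keys = Object.keys(params).sort();
  const pairs = keys.map(function (key) {
    return JSON.stringify(key) + ":" + JSON.stringify(params[key]);
  });
  return "{" + pairs.join(",") + "}";
}

interface CacheKey {
  projectId: string;
  promptHash: string;
  queryParamsHash: string;
}

// sha256Hex is supplied by the caller and is assumed to return the
// 64-character hex digest that fits the VARCHAR(64) hash columns.
function buildCacheKey(
  projectId: string,
  prompt: string,
  params: Record<string, ParamValue>,
  sha256Hex: (input: string) => string
): CacheKey {
  return {
    projectId: projectId,
    promptHash: sha256Hex(prompt),
    queryParamsHash: sha256Hex(canonicalizeParams(params)),
  };
}
```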

---

## 🔗 Relationships Summary

### One-to-Many (1:M)

1. **organizations → projects** (CASCADE)
2. **organizations → api_keys** (CASCADE)
3. **projects → api_keys** (CASCADE)
4. **projects → flows** (CASCADE)
5. **projects → images** (CASCADE)
6. **projects → generations** (CASCADE)
7. **projects → prompt_url_cache** (CASCADE)
8. **flows → images** (CASCADE)
9. **flows → generations** (SET NULL)
10. **generations → images** (SET NULL) - output image
11. **api_keys → images** (SET NULL) - who created
12. **api_keys → generations** (SET NULL) - who requested
13. **generations → prompt_url_cache** (CASCADE)
14. **images → prompt_url_cache** (CASCADE)

### Cascade Rules

**ON DELETE CASCADE:**
- Deleting organization → deletes all projects, api_keys
- Deleting project → deletes all flows, images, generations, cache
- Deleting flow → deletes all flow-scoped images
- Deleting generation → deletes its `prompt_url_cache` rows; nothing else is cascaded (orphaned references OK)
- Deleting image → deletes its `prompt_url_cache` rows

**ON DELETE SET NULL:**
- Deleting generation → sets `images.generation_id` to NULL
- Deleting image → sets `generations.output_image_id` to NULL
- Deleting flow → sets `generations.flow_id` to NULL
- Deleting api_key → sets audit references to NULL

---

## 🎯 Alias System

### Two-Tier Alias Scope

#### Project-Scoped (Global)
- **Storage:** `images.alias` column
- **Lifetime:** Permanent (until image deleted)
- **Visibility:** Across entire project
- **Examples:** `@logo`, `@brand`, `@header`
- **Use Case:** Reusable brand assets

#### Flow-Scoped (Temporary)
- **Storage:** `flows.aliases` JSONB
- **Lifetime:** Duration of flow
- **Visibility:** Only within specific flow
- **Examples:** `@hero`, `@product`, `@variant`
- **Use Case:** Conversational generation chains

#### Technical Aliases (Computed)
- **Storage:** None (computed on-the-fly)
- **Types:**
  - `@last` - Last generation in flow (any status)
  - `@first` - First generation in flow
  - `@upload` - Last uploaded image in flow
- **Implementation:** Query-based resolution

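The technical aliases can be computed from the rows that belong to a flow. The sketch below works on in-memory arrays for illustration; in practice these would be queries ordered by `created_at`. The row shapes, the name `resolveTechnicalAlias`, and the decision to return a tagged generation or image ID (rather than always an image ID) are assumptions, since this document does not say what `@last` and `@first` resolve to when a generation has no output image.

```typescript
interface FlowGeneration { id: string; flowId: string; createdAt: number; }
interface FlowImage { id: string; flowId: string; source: "generated" | "uploaded"; createdAt: number; }

type TechnicalTarget =
  | { kind: "generation"; id: string }
  | { kind: "image"; id: string }
  | null;

// Picks the earliest or latest row by createdAt (epoch milliseconds).
function pickByTime<T extends { createdAt: number }>(rows: T[], latest: boolean): T | null {
  let best: T | null = null;
  for (const row of rows) {
    if (best === null || (latest ? row.createdAt > best.createdAt : row.createdAt < best.createdAt)) {
      best = row;
    }
  }
  return best;
}

function resolveTechnicalAlias(
  alias: string,
  flowId: string,
  generations: FlowGeneration[],
  images: FlowImage[]
): TechnicalTarget {
  if (alias === "@last" || alias === "@first") {
    const inFlow = generations.filter(function (g) { return g.flowId === flowId; });
    const picked = pickByTime(inFlow, alias === "@last");
    return picked === null ? null : { kind: "generation", id: picked.id };
  }
  if (alias === "@upload") {
    const uploads = images.filter(function (i) {
      return i.flowId === flowId && i.source === "uploaded";
    });
    const picked = pickByTime(uploads, true);
    return picked === null ? null : { kind: "image", id: picked.id };
  }
  return null; // not a technical alias
}
```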

### Resolution Algorithm

```
1. Check if technical alias (@last, @first, @upload) → compute from flow data
2. Check flow.aliases for flow-scoped alias → return if found
3. Check images.alias for project-scoped alias → return if found
4. Return null (alias not found)
```

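The four steps above could look like the following. This is a sketch only: `AliasContext`, `resolveAlias`, and the `resolveTechnical` callback are hypothetical names, and project-scoped aliases are modelled as a plain in-memory lookup rather than a query against `images.alias`.

```typescript
const TECHNICAL_ALIASES = ["@last", "@first", "@upload"];

interface AliasContext {
  // Contents of flows.aliases for the current flow: alias -> image ID
  flowAliases: Record<string, string>;
  // Project-scoped aliases (images.alias): alias -> image ID
  projectAliases: Record<string, string>;
  // Computes a technical alias from flow data; null if nothing matches
  resolveTechnical: (alias: string) => string | null;
}

function hasKey(map: Record<string, string>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(map, key);
}

function resolveAlias(alias: string, ctx: AliasContext): string | null {
  // 1. Technical aliases are reserved names and are computed, never stored
  if (TECHNICAL_ALIASES.indexOf(alias) !== -1) {
    return ctx.resolveTechnical(alias);
  }
  // 2. Flow-scoped aliases shadow project-scoped ones
  if (hasKey(ctx.flowAliases, alias)) {
    return ctx.flowAliases[alias];
  }
  // 3. Project-scoped aliases
  if (hasKey(ctx.projectAliases, alias)) {
    return ctx.projectAliases[alias];
  }
  // 4. Alias not found
  return null;
}
```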

---

## 🔧 Dual Alias Assignment

### Uploads
```typescript
POST /api/images/upload
{
  file: <binary>,
  alias: "@product",    // Project-scoped (optional)
  flowAlias: "@hero",   // Flow-scoped (optional)
  flowId: "uuid"        // Required if flowAlias provided
}
```

**Result:**
- If `alias` provided → set `images.alias = "@product"`
- If `flowAlias` provided → add to `flows.aliases["@hero"] = imageId`
- Can have both simultaneously

### Generations
```typescript
POST /api/generations
{
  prompt: "hero image",
  assignAlias: "@brand",      // Project-scoped (optional)
  assignFlowAlias: "@hero",   // Flow-scoped (optional)
  flowId: "uuid"
}
```

**Result (after successful generation):**
- If `assignAlias` → set `images.alias = "@brand"` on output image
- If `assignFlowAlias` → add to `flows.aliases["@hero"] = outputImageId`

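The assignment rules for uploads and generations can be sketched as one function that runs once the image row exists (for generations, the `assignAlias` and `assignFlowAlias` fields would be mapped onto the same inputs). `AliasRequest` and `applyAliases` are hypothetical names, and the in-memory `ImageRecord` and `FlowRecord` shapes stand in for the `images` and `flows` rows.

```typescript
interface ImageRecord { id: string; alias: string | null; }
interface FlowRecord { id: string; aliases: Record<string, string>; }

interface AliasRequest {
  alias?: string;     // project-scoped (optional)
  flowAlias?: string; // flow-scoped (optional)
  flowId?: string;    // required if flowAlias is provided
}

function applyAliases(image: ImageRecord, flow: FlowRecord | null, request: AliasRequest): void {
  // A flow alias without a flow to attach it to is rejected up front
  if (request.flowAlias !== undefined && (request.flowId === undefined || flow === null)) {
    throw new Error("flowId is required when flowAlias is provided");
  }
  if (request.alias !== undefined) {
    // The partial unique index on images(project_id, alias) means any image
    // already holding this alias must be cleared first; that step is omitted here.
    image.alias = request.alias;
  }
  if (request.flowAlias !== undefined && flow !== null) {
    flow.aliases[request.flowAlias] = image.id;
  }
}
```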

---

## 📊 Performance Optimizations

### Critical Indexes

All indexes are listed in the individual table sections above. Key performance considerations:

1. **Alias Lookup:** Partial unique index on `images(project_id, alias)`, restricted to rows `WHERE alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL`
2. **Flow Activity:** Composite index on `generations(flow_id, created_at DESC)`
3. **Cache Hit:** Unique composite index on `prompt_url_cache(project_id, prompt_hash, query_params_hash)`
4. **Audit Queries:** Indexes on the `api_key_id` columns (not yet defined in the table sections above)

### Denormalization

**Avoided intentionally:**
- No counters (image_count, generation_count)
- Computed via COUNT(*) queries with proper indexes
- Simpler, more reliable, less trigger overhead

---

## 🧹 Data Lifecycle

### Soft Delete

**Tables with soft delete:**
- `images` - via `deleted_at` column

**Cleanup strategy:**
- Hard delete after 30 days of soft delete
- Implemented via cron job or manual cleanup script

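The 30-day rule can be sketched as a filter that a cron job or cleanup script would apply. `SOFT_DELETE_RETENTION_DAYS`, `isPurgeable`, `selectPurgeable`, and the row shape are hypothetical names; a real job would more likely express the same condition directly in a `DELETE ... WHERE deleted_at < ...` statement.

```typescript
const SOFT_DELETE_RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface SoftDeletableImage { id: string; deletedAt: Date | null; }

// True when the image was soft-deleted at least 30 days before `now`.
function isPurgeable(image: SoftDeletableImage, now: Date): boolean {
  if (image.deletedAt === null) {
    return false; // still live, never purge
  }
  const elapsedMs = now.getTime() - image.deletedAt.getTime();
  return elapsedMs >= SOFT_DELETE_RETENTION_DAYS * MS_PER_DAY;
}

// Returns the IDs of images that are due for hard deletion.
function selectPurgeable(images: SoftDeletableImage[], now: Date): string[] {
  return images
    .filter(function (image) { return isPurgeable(image, now); })
    .map(function (image) { return image.id; });
}
```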

### Hard Delete

**Tables with hard delete:**
- `generations` - cascade deletes
- `flows` - cascade deletes
- `prompt_url_cache` - cascade deletes

---

## 🔐 Security & Audit

### API Key Tracking

All mutations tracked via `api_key_id`:
- `images.api_key_id` - who uploaded/generated
- `generations.api_key_id` - who requested generation

### Request Correlation

- `generations.request_id` - correlate with application logs
- `generations.user_agent` - client identification
- `generations.ip_address` - rate limiting, abuse prevention

---

## 🚀 Migration Strategy

### Phase 1: Core Tables
1. Create `flows` table
2. Create `images` table
3. Create `generations` table
4. Add all indexes and constraints (including the `images.generation_id` foreign key, which cannot be created until `generations` exists)
5. Migrate existing MinIO data to `images` table

### Phase 2: Advanced Features
1. Create `prompt_url_cache` table
2. Add indexes
3. Implement cache warming for existing data (optional)

---

## 📝 Design Decisions Log

### Why JSONB for `flows.aliases`?
- Simple key-value structure
- No need for JOINs
- Flexible schema
- Atomic updates
- Trade-off: No referential integrity (acceptable for temporary data)

### Why JSONB for `generations.referenced_images`?
- Reference info is append-only
- No need for complex queries on references
- Simpler schema (one less table)
- Trade-off: No CASCADE on image deletion (acceptable)

### Why no `namespaces`?
- Adds complexity without clear benefit for MVP
- Flow-scoped + project-scoped aliases sufficient
- Can add later if needed

### Why no `generation_groups`?
- Not needed for core functionality
- Grouping can be done via tags or meta JSONB
- Can add later if analytics requires it

### Why `focal_point` as JSONB?
- Imageflow server expects normalized coordinates
- Format: `{ "x": 0.0-1.0, "y": 0.0-1.0 }`
- JSONB allows future extension (e.g., multiple focal points)

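Because the focal point is stored as normalized coordinates, it has to be validated on write and scaled to pixels wherever absolute positions are needed. A minimal sketch: `FocalPoint`, `isValidFocalPoint`, and `focalPointToPixels` are illustrative names, and rounding to the nearest pixel is an assumption.

```typescript
interface FocalPoint { x: number; y: number; }

// Both coordinates must be finite numbers in the closed range 0.0-1.0.
function isValidFocalPoint(point: FocalPoint): boolean {
  return (
    isFinite(point.x) && isFinite(point.y) &&
    point.x >= 0 && point.x <= 1 &&
    point.y >= 0 && point.y <= 1
  );
}

// Scales a normalized focal point to pixel coordinates for a given image size.
function focalPointToPixels(point: FocalPoint, width: number, height: number): FocalPoint {
  if (!isValidFocalPoint(point)) {
    throw new Error("focal point must be within 0.0-1.0 on both axes");
  }
  return {
    x: Math.round(point.x * width),
    y: Math.round(point.y * height),
  };
}
```

For a 1000x500 image, the example value `{ "x": 0.5, "y": 0.3 }` from the schema comment maps to the pixel position (500, 150).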

### Why track `api_key_id` in images/generations?
- Essential for audit trail
- Cost attribution per key
- Usage analytics
- Abuse detection

---

## 📚 References

- **Imageflow Focal Points:** https://docs.imageflow.io/querystring/focal-point
- **Drizzle ORM:** https://orm.drizzle.team/
- **PostgreSQL JSONB:** https://www.postgresql.org/docs/current/datatype-json.html
