# Banatie Database Design

## 📊 Database Schema for AI Image Generation System

This document describes the complete database structure for Banatie, an AI-powered image generation service with support for named references, flows, and prompt URL caching.

**Version:** 2.0

**Last Updated:** 2025-10-26

**Status:** Approved for Implementation

---

## 🏗️ Architecture Overview

### Core Principles

1. **Dual Alias System**: Project-level (global) and flow-level (temporary) scopes
2. **Computed Technical Aliases**: `@last`, `@first`, and `@upload` are calculated programmatically
3. **Audit Trail**: Complete history of all generations with performance metrics
4. **Referential Integrity**: Proper foreign keys and cascade rules
5. **Simplicity First**: Minimal tables, JSONB for flexibility

### Scope Resolution Order

```
Flow-scoped aliases (@hero in flow) → Project-scoped aliases (@logo global) → Technical aliases (@last, @first, @upload)
```

---

## 📋 Existing Tables (Unchanged)

### 1. ORGANIZATIONS

```typescript
organizations {
  id: UUID (PK)
  name: TEXT
  slug: TEXT UNIQUE
  email: TEXT UNIQUE
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
}
```

**Purpose:** Top-level entity for the multi-tenant system

---

### 2. PROJECTS

```typescript
projects {
  id: UUID (PK)
  organization_id: UUID (FK -> organizations) CASCADE
  name: TEXT
  slug: TEXT
  created_at: TIMESTAMP
  updated_at: TIMESTAMP

  UNIQUE INDEX(organization_id, slug)
}
```

**Purpose:** Container for all project-specific data (images, generations, flows)

---

### 3. API_KEYS

```typescript
api_keys {
  id: UUID (PK)
  key_hash: TEXT UNIQUE
  key_prefix: TEXT DEFAULT 'bnt_'
  key_type: ENUM('master', 'project')

  organization_id: UUID (FK -> organizations) CASCADE
  project_id: UUID (FK -> projects) CASCADE

  scopes: JSONB DEFAULT ['generate']

  created_at: TIMESTAMP
  expires_at: TIMESTAMP
  last_used_at: TIMESTAMP
  is_active: BOOLEAN DEFAULT true

  name: TEXT
  created_by: UUID
}
```

**Purpose:** Authentication and authorization for API access
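
The table stores only `key_hash` and `key_prefix` and never the plaintext key. Below is a minimal sketch of issuing and verifying a key. It assumes 32 random bytes, hex-encoded after the `bnt_` prefix, and SHA-256 as the hash; the document does not name an algorithm. `issueApiKey` and `hashApiKey` are hypothetical names. A fast unsalted hash is generally considered adequate for long random keys because, unlike passwords, they cannot be guessed. Verification is then an equality match on the `key_hash` UNIQUE index.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical result of issuing a key; only keyPrefix and keyHash are persisted.
interface IssuedKey {
  plaintext: string; // shown to the caller once, e.g. "bnt_3f9a..."
  keyPrefix: string; // api_keys.key_prefix
  keyHash: string;   // api_keys.key_hash (assumed: SHA-256 hex digest)
}

// Assumption: 32 random bytes, hex-encoded, appended to the prefix.
function issueApiKey(prefix = "bnt_"): IssuedKey {
  const plaintext = prefix + randomBytes(32).toString("hex");
  return { plaintext, keyPrefix: prefix, keyHash: hashApiKey(plaintext) };
}

// Verification path: hash the presented key and look it up by api_keys.key_hash.
function hashApiKey(plaintext: string): string {
  return createHash("sha256").update(plaintext, "utf8").digest("hex");
}
```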

---

## 🆕 New Tables

### 4. FLOWS

```typescript
flows {
  id: UUID (PK)
  project_id: UUID (FK -> projects) CASCADE

  // Flow-scoped named aliases (user-assigned only)
  // Technical aliases (@last, @first, @upload) computed programmatically
  // Format: { "@hero": "image-uuid", "@product": "image-uuid" }
  aliases: JSONB DEFAULT {}

  meta: JSONB DEFAULT {}

  created_at: TIMESTAMP
  // Updates on every generation/upload activity within this flow
  updated_at: TIMESTAMP
}
```

**Purpose:** Temporary chains of generations with flow-scoped references

**Key Design Decisions:**
- No `status` field - computed from generations
- No `name`/`description` - flows are programmatic, not user-facing
- No `expires_at` - cleanup handled programmatically via `created_at`
- `aliases` stores only user-assigned aliases, not technical ones
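
Because `aliases` holds only user-assigned names, writes to it need a guard. Below is a minimal sketch. It assumes flow aliases follow the same pattern as the `images.alias` CHECK constraint, and that re-assigning an existing alias overwrites it; the document does not specify conflict handling. `setFlowAlias` is a hypothetical helper that returns the new object to write back to the JSONB column.

```typescript
// Shape of flows.aliases: { "@hero": "image-uuid", ... }
type FlowAliases = Record<string, string>;

// Assumed to match the images.alias CHECK constraint.
const ALIAS_PATTERN = /^@[a-zA-Z0-9_-]+$/;

// Technical aliases are computed on the fly, never stored.
const TECHNICAL_ALIASES = new Set(["@last", "@first", "@upload"]);

// Returns a new aliases object mapping alias -> imageId; the input is not mutated.
// Assumption: assigning an alias that already exists overwrites the old image id.
function setFlowAlias(aliases: FlowAliases, alias: string, imageId: string): FlowAliases {
  if (!ALIAS_PATTERN.test(alias)) {
    throw new Error(`Invalid alias format: ${alias}`);
  }
  if (TECHNICAL_ALIASES.has(alias)) {
    throw new Error(`${alias} is a technical alias and cannot be assigned`);
  }
  return { ...aliases, [alias]: imageId };
}
```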

**Indexes:**
```sql
CREATE INDEX idx_flows_project ON flows(project_id, created_at DESC);
```

---

### 5. IMAGES

```typescript
images {
  id: UUID (PK)

  // Relations
  project_id: UUID (FK -> projects) CASCADE
  generation_id: UUID (FK -> generations) SET NULL
  flow_id: UUID (FK -> flows) CASCADE
  api_key_id: UUID (FK -> api_keys) SET NULL

  // Storage (MinIO path format: orgSlug/projectSlug/category/YYYY-MM/filename.ext)
  storage_key: VARCHAR(500) UNIQUE
  storage_url: TEXT

  // File metadata
  mime_type: VARCHAR(100)
  file_size: INTEGER
  file_hash: VARCHAR(64) // SHA-256 for deduplication

  // Dimensions
  width: INTEGER
  height: INTEGER
  aspect_ratio: VARCHAR(10)

  // Focal point for image transformations (imageflow)
  // Normalized coordinates: { "x": 0.5, "y": 0.3 } where 0.0-1.0
  focal_point: JSONB

  // Source
  source: ENUM('generated', 'uploaded')

  // Project-level alias (global scope)
  // Flow-level aliases stored in flows.aliases
  alias: VARCHAR(100) // @product, @logo

  // Metadata
  description: TEXT
  tags: TEXT[]
  meta: JSONB DEFAULT {}

  // Audit
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
  deleted_at: TIMESTAMP // Soft delete
}
```
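
The `storage_key` comment defines the object layout as `orgSlug/projectSlug/category/YYYY-MM/filename.ext`. Below is a sketch of building such a key. It assumes the month is taken in UTC and that no segment may contain `/`. This document does not define the valid `category` values, so any non-empty string is accepted. `buildStorageKey` is a hypothetical name.

```typescript
// Builds orgSlug/projectSlug/category/YYYY-MM/filename.ext.
// Assumption: category is free-form here because its allowed values are not specified.
function buildStorageKey(
  orgSlug: string,
  projectSlug: string,
  category: string,
  filename: string,
  date: Date = new Date(),
): string {
  for (const segment of [orgSlug, projectSlug, category, filename]) {
    if (segment.length === 0 || segment.includes("/")) {
      throw new Error(`Invalid path segment: "${segment}"`);
    }
  }
  // YYYY-MM in UTC, so the folder does not depend on the server's time zone.
  const month = `${date.getUTCFullYear()}-${String(date.getUTCMonth() + 1).padStart(2, "0")}`;
  const key = [orgSlug, projectSlug, category, month, filename].join("/");
  // images.storage_key is VARCHAR(500).
  if (key.length > 500) {
    throw new Error("storage_key exceeds 500 characters");
  }
  return key;
}
```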

**Purpose:** Centralized storage for all images (uploaded + generated)

**Key Design Decisions:**
- `flow_id` enables flow-scoped uploads
- `alias` is for project scope only (global across the project)
- Flow-scoped aliases are stored in the `flows.aliases` JSONB column
- `focal_point` for imageflow server integration
- `api_key_id` for audit trail of who created the image
- Soft delete via `deleted_at` for recovery

**Constraints:**
```sql
CHECK ((source = 'uploaded' AND generation_id IS NULL)
    OR (source = 'generated' AND generation_id IS NOT NULL))

CHECK (alias IS NULL OR alias ~ '^@[a-zA-Z0-9_-]+$')

CHECK (file_size > 0)

CHECK ((width IS NULL OR (width > 0 AND width <= 8192))
   AND (height IS NULL OR (height > 0 AND height <= 8192)))
```
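
Mirroring these CHECK constraints in application code lets the API return a readable error before the INSERT fails. Below is a minimal sketch. `NewImage` is a hypothetical camelCase view of the relevant columns, not a generated ORM type.

```typescript
// Hypothetical subset of images columns covered by the CHECK constraints.
interface NewImage {
  source: "generated" | "uploaded";
  generationId: string | null;
  alias: string | null;
  fileSize: number;
  width: number | null;
  height: number | null;
}

const ALIAS_PATTERN = /^@[a-zA-Z0-9_-]+$/;
const MAX_DIMENSION = 8192;

// Returns the violated rules; an empty list means all four CHECKs would pass.
function validateImage(img: NewImage): string[] {
  const errors: string[] = [];
  const sourceOk =
    (img.source === "uploaded" && img.generationId === null) ||
    (img.source === "generated" && img.generationId !== null);
  if (!sourceOk) errors.push("generation_id must be set if and only if source = 'generated'");
  if (img.alias !== null && !ALIAS_PATTERN.test(img.alias)) errors.push("alias must match ^@[a-zA-Z0-9_-]+$");
  if (!(img.fileSize > 0)) errors.push("file_size must be > 0");
  for (const [name, value] of [["width", img.width], ["height", img.height]] as const) {
    if (value !== null && !(value > 0 && value <= MAX_DIMENSION)) {
      errors.push(`${name} must be > 0 and <= ${MAX_DIMENSION}`);
    }
  }
  return errors;
}
```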

**Indexes:**
```sql
-- Images uploaded with a flow_id are excluded, so a project alias set on a
-- flow-scoped upload is not covered by this uniqueness check
CREATE UNIQUE INDEX idx_images_project_alias
  ON images(project_id, alias)
  WHERE alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL;

CREATE INDEX idx_images_project_source
  ON images(project_id, source, created_at DESC)
  WHERE deleted_at IS NULL;

CREATE INDEX idx_images_flow ON images(flow_id) WHERE flow_id IS NOT NULL;
CREATE INDEX idx_images_generation ON images(generation_id);
-- Redundant: the UNIQUE constraint on storage_key already creates an index
CREATE INDEX idx_images_storage_key ON images(storage_key);
CREATE INDEX idx_images_hash ON images(file_hash);
```

---

### 6. GENERATIONS

```typescript
generations {
  id: UUID (PK)

  // Relations
  project_id: UUID (FK -> projects) CASCADE
  flow_id: UUID (FK -> flows) SET NULL
  api_key_id: UUID (FK -> api_keys) SET NULL

  // Status
  status: ENUM('pending', 'processing', 'success', 'failed') DEFAULT 'pending'

  // Prompts
  original_prompt: TEXT
  enhanced_prompt: TEXT // AI-enhanced version (if enabled)

  // Generation parameters
  aspect_ratio: VARCHAR(10)
  width: INTEGER
  height: INTEGER

  // AI Model
  model_name: VARCHAR(100) DEFAULT 'gemini-flash-image-001'
  model_version: VARCHAR(50)

  // Result
  output_image_id: UUID (FK -> images) SET NULL

  // Referenced images used in generation
  // Format: [{ "imageId": "uuid", "alias": "@product" }, ...]
  referenced_images: JSONB

  // Error handling
  error_message: TEXT
  error_code: VARCHAR(50)
  retry_count: INTEGER DEFAULT 0

  // Metrics
  processing_time_ms: INTEGER
  cost: INTEGER // In cents (USD)

  // Request context
  request_id: UUID // For log correlation
  user_agent: TEXT
  ip_address: INET

  // Metadata
  meta: JSONB DEFAULT {}

  // Audit
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
}
```

**Purpose:** Complete audit trail of all image generations

**Key Design Decisions:**
- `referenced_images` as JSONB instead of an M:N table (simpler, sufficient for reference info)
- No `parent_generation_id` - not needed for MVP
- No `final_prompt` - redundant with `enhanced_prompt` or `original_prompt`
- No `completed_at` - use `updated_at` when `status` changes to `success`/`failed`
- `api_key_id` for audit trail of who made the request
- Technical aliases resolved programmatically, not stored

**Referenced Images Format:**
```json
[
  { "imageId": "uuid-1", "alias": "@product" },
  { "imageId": "uuid-2", "alias": "@style" }
]
```
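
JSONB values come back from the driver untyped, so a runtime check at the boundary keeps the format above honest. Below is a sketch. It assumes a NULL column means "no references", since the column has no default. `parseReferencedImages` is a hypothetical name.

```typescript
// One entry of generations.referenced_images.
interface ReferencedImage {
  imageId: string;
  alias: string;
}

// Validates a value read from the JSONB column and returns typed entries.
// Assumption: NULL (no references recorded) maps to an empty list.
function parseReferencedImages(value: unknown): ReferencedImage[] {
  if (value === null || value === undefined) return [];
  if (!Array.isArray(value)) throw new Error("referenced_images must be an array");
  return value.map((entry, i) => {
    if (
      typeof entry !== "object" || entry === null ||
      typeof (entry as Record<string, unknown>).imageId !== "string" ||
      typeof (entry as Record<string, unknown>).alias !== "string"
    ) {
      throw new Error(`referenced_images[${i}] must be { imageId, alias }`);
    }
    const { imageId, alias } = entry as ReferencedImage;
    return { imageId, alias };
  });
}
```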

**Constraints:**
```sql
CHECK ((status = 'success' AND output_image_id IS NOT NULL)
    OR (status != 'success'))

CHECK ((status = 'failed' AND error_message IS NOT NULL)
    OR (status != 'failed'))

CHECK (retry_count >= 0)

CHECK (processing_time_ms IS NULL OR processing_time_ms >= 0)

CHECK (cost IS NULL OR cost >= 0)
```
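
The status enum and the two status CHECKs imply a small state machine. Because there is no `completed_at`, the `updated_at` written on the final transition records completion. Below is a sketch built on assumptions the document does not state: transitions follow the enum order, and `failed → pending` models a retry that increments `retry_count`. `transition` and `GenerationState` are hypothetical.

```typescript
type GenerationStatus = "pending" | "processing" | "success" | "failed";

// Hypothetical in-memory view of the columns touched by a status change.
interface GenerationState {
  status: GenerationStatus;
  outputImageId: string | null;
  errorMessage: string | null;
  retryCount: number;
  updatedAt: Date;
}

// Assumed lifecycle; "failed -> pending" stands for a retry.
const ALLOWED: Record<GenerationStatus, GenerationStatus[]> = {
  pending: ["processing"],
  processing: ["success", "failed"],
  success: [],
  failed: ["pending"],
};

function transition(
  gen: GenerationState,
  next: GenerationStatus,
  patch: { outputImageId?: string; errorMessage?: string } = {},
  now: Date = new Date(),
): GenerationState {
  if (!ALLOWED[gen.status].includes(next)) {
    throw new Error(`Illegal transition ${gen.status} -> ${next}`);
  }
  const updated: GenerationState = {
    ...gen,
    status: next,
    outputImageId: patch.outputImageId ?? gen.outputImageId,
    errorMessage: patch.errorMessage ?? gen.errorMessage,
    retryCount: gen.status === "failed" && next === "pending" ? gen.retryCount + 1 : gen.retryCount,
    // No completed_at column: updated_at on the success/failed transition marks completion.
    updatedAt: now,
  };
  // Same rules as the CHECK constraints, so the error surfaces before the UPDATE.
  if (next === "success" && updated.outputImageId === null) {
    throw new Error("success requires output_image_id");
  }
  if (next === "failed" && updated.errorMessage === null) {
    throw new Error("failed requires error_message");
  }
  return updated;
}
```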

**Indexes:**
```sql
CREATE INDEX idx_generations_project_status
  ON generations(project_id, status, created_at DESC);

CREATE INDEX idx_generations_flow
  ON generations(flow_id, created_at DESC)
  WHERE flow_id IS NOT NULL;

CREATE INDEX idx_generations_output ON generations(output_image_id);
CREATE INDEX idx_generations_request ON generations(request_id);
```

---

### 7. PROMPT_URL_CACHE

```typescript
prompt_url_cache {
  id: UUID (PK)

  // Relations
  project_id: UUID (FK -> projects) CASCADE
  generation_id: UUID (FK -> generations) CASCADE
  image_id: UUID (FK -> images) CASCADE

  // Cache keys (SHA-256 hashes)
  prompt_hash: VARCHAR(64)
  query_params_hash: VARCHAR(64)

  // Original request (for debugging/reconstruction)
  original_prompt: TEXT
  request_params: JSONB // { width, height, aspectRatio, template, ... }

  // Cache statistics
  hit_count: INTEGER DEFAULT 0
  last_hit_at: TIMESTAMP

  // Audit
  created_at: TIMESTAMP
}
```

**Purpose:** Deduplication and caching for the Prompt URL feature

**Key Design Decisions:**
- Composite unique key: `project_id + prompt_hash + query_params_hash`
- No `expires_at` - entries live until manually cleared or until the referenced generation or image is deleted (CASCADE)
- Tracks `hit_count` for analytics
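
The cache key is the pair of SHA-256 hex digests, 64 characters each to match `VARCHAR(64)`. The document does not define how prompts and parameters are normalized before hashing. This sketch assumes the prompt is trimmed and the parameters are serialized as JSON with sorted keys and `undefined` values dropped, so the same request with reordered query parameters hits the same row. `cacheKey` is a hypothetical name. A lookup then filters on `project_id`, `prompt_hash`, and `query_params_hash`, which `idx_cache_key` serves.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (s: string): string => createHash("sha256").update(s, "utf8").digest("hex");

type ParamValue = string | number | boolean | undefined;

// Assumed normalization: trimmed prompt; params re-inserted in sorted key order
// (undefined dropped) so parameter order in the URL does not change the hash.
function cacheKey(prompt: string, params: Record<string, ParamValue>) {
  const canonical: Record<string, string | number | boolean> = {};
  for (const k of Object.keys(params).sort()) {
    const v = params[k];
    if (v !== undefined) canonical[k] = v;
  }
  return {
    promptHash: sha256Hex(prompt.trim()),
    queryParamsHash: sha256Hex(JSON.stringify(canonical)),
  };
}
```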

**Constraints:**
```sql
CHECK (hit_count >= 0)
```

**Indexes:**
```sql
CREATE UNIQUE INDEX idx_cache_key
  ON prompt_url_cache(project_id, prompt_hash, query_params_hash);

CREATE INDEX idx_cache_generation ON prompt_url_cache(generation_id);
CREATE INDEX idx_cache_image ON prompt_url_cache(image_id);
CREATE INDEX idx_cache_hits
  ON prompt_url_cache(project_id, hit_count DESC, created_at DESC);
```

---

## 🔗 Relationships Summary

### One-to-Many (1:M)

1. **organizations → projects** (CASCADE)
2. **organizations → api_keys** (CASCADE)
3. **projects → api_keys** (CASCADE)
4. **projects → flows** (CASCADE)
5. **projects → images** (CASCADE)
6. **projects → generations** (CASCADE)
7. **projects → prompt_url_cache** (CASCADE)
8. **flows → images** (CASCADE)
9. **flows → generations** (SET NULL)
10. **generations → images** (SET NULL) - images produced by a generation, via `images.generation_id`
11. **images → generations** (SET NULL) - output image, via `generations.output_image_id`
12. **generations → prompt_url_cache** (CASCADE)
13. **images → prompt_url_cache** (CASCADE)
14. **api_keys → images** (SET NULL) - who created
15. **api_keys → generations** (SET NULL) - who requested
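
### Cascade Rules

**ON DELETE CASCADE:**
- Deleting organization → deletes all projects, api_keys
- Deleting project → deletes all flows, images, generations, cache entries
- Deleting flow → deletes all flow-scoped images
- Deleting generation or image → deletes the `prompt_url_cache` entries that reference it

**ON DELETE SET NULL:**
- Deleting generation → sets `images.generation_id` to NULL
- Deleting image → sets `generations.output_image_id` to NULL
- Deleting flow → sets `generations.flow_id` to NULL
- Deleting api_key → sets audit references to NULL

**Note:** PostgreSQL enforces CHECK constraints on rows modified by `SET NULL`. As written, deleting a generation whose generated image still exists violates the `images` source/`generation_id` check. Likewise, hard-deleting the output image of a `success` generation violates the `generations` status/`output_image_id` check. One side of each pair has to change, for example by relaxing the CHECK or choosing a different ON DELETE action.

---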

## 🎯 Alias System

### Two-Tier Alias Scope

#### Project-Scoped (Global)
- **Storage:** `images.alias` column
- **Lifetime:** Permanent (until image deleted)
- **Visibility:** Across entire project
- **Examples:** `@logo`, `@brand`, `@header`
- **Use Case:** Reusable brand assets

#### Flow-Scoped (Temporary)
- **Storage:** `flows.aliases` JSONB
- **Lifetime:** Duration of flow
- **Visibility:** Only within specific flow
- **Examples:** `@hero`, `@product`, `@variant`
- **Use Case:** Conversational generation chains

#### Technical Aliases (Computed)
- **Storage:** None (computed on the fly)
- **Types:**
  - `@last` - Last generation in flow (any status)
  - `@first` - First generation in flow
  - `@upload` - Last uploaded image in flow
- **Implementation:** Query-based resolution
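
The technical aliases can be computed from a flow's generation and image rows. Below is a sketch under several assumptions. The rows are already filtered to one flow. `@first` and `@last` resolve to the generation's `output_image_id`, which can be NULL because `@last` counts generations of any status. `@upload` ignores soft-deleted images. The types and `resolveTechnicalAlias` are hypothetical. In SQL, each case becomes an `ORDER BY created_at ... LIMIT 1` query.

```typescript
interface FlowGeneration {
  id: string;
  outputImageId: string | null; // generations.output_image_id
  createdAt: Date;
}

interface FlowImage {
  id: string;
  source: "generated" | "uploaded";
  createdAt: Date;
  deletedAt: Date | null;
}

type TechnicalAlias = "@last" | "@first" | "@upload";

// Oldest first; copies the array so the caller's rows are not reordered.
function byCreatedAt<T extends { createdAt: Date }>(rows: T[]): T[] {
  return rows.slice().sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
}

// Returns the image id a technical alias points to, or null if nothing matches.
function resolveTechnicalAlias(
  alias: TechnicalAlias,
  generations: FlowGeneration[],
  images: FlowImage[],
): string | null {
  if (alias === "@first" || alias === "@last") {
    const sorted = byCreatedAt(generations);
    const gen = alias === "@first" ? sorted[0] : sorted[sorted.length - 1];
    // @last counts generations of any status, so its output may still be null.
    return gen ? gen.outputImageId : null;
  }
  // @upload: most recent uploaded image that is not soft-deleted.
  const uploads = byCreatedAt(images.filter((img) => img.source === "uploaded" && img.deletedAt === null));
  return uploads.length > 0 ? uploads[uploads.length - 1].id : null;
}
```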

### Resolution Algorithm

```
1. Check if technical alias (@last, @first, @upload) → compute from flow data
2. Check flows.aliases for flow-scoped alias → return if found
3. Check images.alias for project-scoped alias → return if found
4. Return null (alias not found)
```
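
Below is a sketch of the four steps, using in-memory stand-ins for the lookups: the flow's `aliases` object, a map of project aliases (non-deleted images with `alias` set), and a callback for technical aliases. `ResolutionContext` and `resolveAlias` are hypothetical. The overview lists technical aliases last, while this algorithm checks them first. The two orders agree only if technical names can never be stored as user aliases, which the sketch assumes. It also assumes technical aliases do not resolve outside a flow.

```typescript
const TECHNICAL = new Set(["@last", "@first", "@upload"]);

interface ResolutionContext {
  flowAliases: Record<string, string> | null;         // flows.aliases, or null outside a flow
  projectAliases: Map<string, string>;                // images.alias -> images.id (non-deleted)
  computeTechnical: (alias: string) => string | null; // e.g. a query over the flow's rows
}

interface ResolvedAlias {
  imageId: string | null; // a technical alias may point at a generation with no output yet
  scope: "technical" | "flow" | "project";
}

function resolveAlias(alias: string, ctx: ResolutionContext): ResolvedAlias | null {
  // 1. Technical aliases are computed from flow data, never looked up.
  if (TECHNICAL.has(alias)) {
    return ctx.flowAliases === null ? null : { imageId: ctx.computeTechnical(alias), scope: "technical" };
  }
  // 2. Flow scope shadows project scope.
  const flowHit = ctx.flowAliases?.[alias];
  if (flowHit !== undefined) return { imageId: flowHit, scope: "flow" };
  // 3. Project scope.
  const projectHit = ctx.projectAliases.get(alias);
  if (projectHit !== undefined) return { imageId: projectHit, scope: "project" };
  // 4. Not found.
  return null;
}
```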

---

## 🔧 Dual Alias Assignment

### Uploads
```typescript
POST /api/images/upload
{
  file: <binary>,
  alias: "@product",   // Project-scoped (optional)
  flowAlias: "@hero",  // Flow-scoped (optional)
  flowId: "uuid"       // Required if flowAlias provided
}
```

**Result:**
- If `alias` provided → set `images.alias = "@product"`
- If `flowAlias` provided → add to `flows.aliases["@hero"] = imageId`
- Can have both simultaneously
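
The upload rules reduce to a small validation step before any writes. Below is a sketch that assumes the request fields shown above. `planUploadAliases` and the returned plan shape are hypothetical, and the database writes themselves are omitted. Applying the plan in the same transaction as the image INSERT would keep a failed `flows.aliases` update from leaving a half-aliased upload.

```typescript
// Alias-related fields of the upload request body.
interface UploadAliasRequest {
  alias?: string;     // project-scoped -> images.alias
  flowAlias?: string; // flow-scoped -> flows.aliases
  flowId?: string;
}

// Hypothetical description of the writes to perform once the image row exists.
interface AliasWritePlan {
  imageAlias: string | null;                                  // value for images.alias
  flowAliasUpdate: { flowId: string; alias: string } | null;  // flows.aliases[alias] = new image id
}

function planUploadAliases(req: UploadAliasRequest): AliasWritePlan {
  if (req.flowAlias !== undefined && req.flowId === undefined) {
    throw new Error("flowId is required when flowAlias is provided");
  }
  return {
    imageAlias: req.alias ?? null,
    flowAliasUpdate:
      req.flowAlias !== undefined && req.flowId !== undefined
        ? { flowId: req.flowId, alias: req.flowAlias }
        : null,
  };
}
```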

### Generations
```typescript
POST /api/generations
{
  prompt: "hero image",
  assignAlias: "@brand",      // Project-scoped (optional)
  assignFlowAlias: "@hero",   // Flow-scoped (optional)
  flowId: "uuid"
}
```

**Result (after successful generation):**
- If `assignAlias` → set `images.alias = "@brand"` on output image
- If `assignFlowAlias` → add to `flows.aliases["@hero"] = outputImageId`

---

## 📊 Performance Optimizations

### Critical Indexes

All indexes are listed in the individual table sections above. Key performance considerations:

1. **Alias Lookup:** Partial unique index on `images(project_id, alias)` (non-deleted, non-flow images with an alias)
2. **Flow Activity:** Composite index on `generations(flow_id, created_at)`
3. **Cache Hit:** Unique composite index on `prompt_url_cache(project_id, prompt_hash, query_params_hash)`
4. **Audit Queries:** Indexes on the `api_key_id` columns of `images` and `generations` (not yet defined in the table sections above; add them if audit queries filter by key)

### Denormalization

**Avoided intentionally:**
- No counters (image_count, generation_count)
- Computed via COUNT(*) queries with proper indexes
- Simpler, more reliable, less trigger overhead

---

## 🧹 Data Lifecycle

### Soft Delete

**Tables with soft delete:**
- `images` - via `deleted_at` column

**Cleanup strategy:**
- Hard delete after 30 days of soft delete
- Implemented via cron job or manual cleanup script
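
The 30-day rule amounts to the predicate `deleted_at < now() - interval '30 days'`. Below is an in-memory sketch of the same selection; `purgeableImageIds` is a hypothetical name. A real job would presumably also remove the MinIO object at each row's `storage_key`, but that is an assumption, since object cleanup is not covered above.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface SoftDeletable {
  id: string;
  deletedAt: Date | null; // images.deleted_at
}

// Ids of images soft-deleted more than `retentionDays` before `now`.
function purgeableImageIds(images: SoftDeletable[], now: Date, retentionDays = 30): string[] {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return images
    .filter((img) => img.deletedAt !== null && img.deletedAt.getTime() < cutoff)
    .map((img) => img.id);
}
```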

### Hard Delete

**Tables with hard delete:**
- `generations` - cascade deletes
- `flows` - cascade deletes
- `prompt_url_cache` - cascade deletes

---

## 🔐 Security & Audit

### API Key Tracking

Image and generation writes are tracked via `api_key_id`:
- `images.api_key_id` - who uploaded/generated
- `generations.api_key_id` - who requested generation

### Request Correlation

- `generations.request_id` - correlate with application logs
- `generations.user_agent` - client identification
- `generations.ip_address` - rate limiting, abuse prevention

---

## 🚀 Migration Strategy

### Phase 1: Core Tables
1. Create `flows` table
2. Create `images` table
3. Create `generations` table
4. Add all indexes and constraints (the foreign keys between `images` and `generations` reference each other, so they can only be added once both tables exist)
5. Migrate existing MinIO data to `images` table

### Phase 2: Advanced Features
1. Create `prompt_url_cache` table
2. Add indexes
3. Implement cache warming for existing data (optional)

---

## 📝 Design Decisions Log

### Why JSONB for `flows.aliases`?
- Simple key-value structure
- No need for JOINs
- Flexible schema
- Atomic updates
- Trade-off: No referential integrity (acceptable for temporary data)

### Why JSONB for `generations.referenced_images`?
- Reference info is append-only
- No need for complex queries on references
- Simpler schema (one less table)
- Trade-off: No CASCADE on image deletion (acceptable)

### Why no `namespaces`?
- Adds complexity without clear benefit for MVP
- Flow-scoped + project-scoped aliases sufficient
- Can add later if needed

### Why no `generation_groups`?
- Not needed for core functionality
- Grouping can be done via tags or meta JSONB
- Can add later if analytics requires it

### Why `focal_point` as JSONB?
- Imageflow server expects normalized coordinates
- Format: `{ "x": 0.0-1.0, "y": 0.0-1.0 }`
- JSONB allows future extension (e.g., multiple focal points)
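
Below is a sketch of validating a stored `focal_point` and converting it to pixel coordinates for a given image size, for example when computing a crop. The helpers are hypothetical, and the pixel conversion is illustrative only; it is not the Imageflow query-string format, which is covered by the Imageflow reference listed under References.

```typescript
// images.focal_point: both coordinates normalized to [0.0, 1.0].
interface FocalPoint {
  x: number;
  y: number;
}

// Runtime guard for a value read from the JSONB column.
function isValidFocalPoint(fp: unknown): fp is FocalPoint {
  if (typeof fp !== "object" || fp === null) return false;
  const { x, y } = fp as Record<string, unknown>;
  return typeof x === "number" && typeof y === "number" && x >= 0 && x <= 1 && y >= 0 && y <= 1;
}

// Maps the normalized point onto an image of the given pixel size (rounded to whole pixels).
function focalPointToPixels(fp: FocalPoint, width: number, height: number): { px: number; py: number } {
  return { px: Math.round(fp.x * width), py: Math.round(fp.y * height) };
}
```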

### Why track `api_key_id` in images/generations?
- Essential for audit trail
- Cost attribution per key
- Usage analytics
- Abuse detection

---

## 📚 References

- **Imageflow Focal Points:** https://docs.imageflow.io/querystring/focal-point
- **Drizzle ORM:** https://orm.drizzle.team/
- **PostgreSQL JSONB:** https://www.postgresql.org/docs/current/datatype-json.html

---

*Document Version: 2.0*

*Last Updated: 2025-10-26*

*Status: Ready for Implementation*