# Banatie Database Design

## 📊 Database Schema for AI Image Generation System

This document describes the complete database structure for Banatie - an AI-powered image generation service with support for named references, flows, and prompt URL caching.

**Version:** 2.0
**Last Updated:** 2025-10-26
**Status:** Approved for Implementation

---

## 🏗️ Architecture Overview

### Core Principles

1. **Dual Alias System**: Project-level (global) and Flow-level (temporary) scopes
2. **Technical Aliases Computed**: `@last`, `@first`, `@upload` are calculated programmatically
3. **Audit Trail**: Complete history of all generations with performance metrics
4. **Referential Integrity**: Proper foreign keys and cascade rules
5. **Simplicity First**: Minimal tables, JSONB for flexibility

### Scope Resolution Order

```
Flow-scoped aliases (@hero in flow) → Project-scoped aliases (@logo global) → Technical aliases (@last, @first)
```

---

## 📋 Existing Tables (Unchanged)

### 1. ORGANIZATIONS

```typescript
organizations {
  id: UUID (PK)
  name: TEXT
  slug: TEXT UNIQUE
  email: TEXT UNIQUE
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
}
```

**Purpose:** Top-level entity for multi-tenant system

---

### 2. PROJECTS

```typescript
projects {
  id: UUID (PK)
  organization_id: UUID (FK -> organizations) CASCADE
  name: TEXT
  slug: TEXT
  created_at: TIMESTAMP
  updated_at: TIMESTAMP

  UNIQUE INDEX(organization_id, slug)
}
```

**Purpose:** Container for all project-specific data (images, generations, flows)

---

### 3. API_KEYS

```typescript
api_keys {
  id: UUID (PK)
  key_hash: TEXT UNIQUE
  key_prefix: TEXT DEFAULT 'bnt_'
  key_type: ENUM('master', 'project')

  organization_id: UUID (FK -> organizations) CASCADE
  project_id: UUID (FK -> projects) CASCADE

  scopes: JSONB DEFAULT ['generate']

  created_at: TIMESTAMP
  expires_at: TIMESTAMP
  last_used_at: TIMESTAMP
  is_active: BOOLEAN DEFAULT true

  name: TEXT
  created_by: UUID
}
```

**Purpose:** Authentication and authorization for API access

---

## 🆕 New Tables

### 4. FLOWS

```typescript
flows {
  id: UUID (PK)
  project_id: UUID (FK -> projects) CASCADE

  // Flow-scoped named aliases (user-assigned only)
  // Technical aliases (@last, @first, @upload) computed programmatically
  // Format: { "@hero": "image-uuid", "@product": "image-uuid" }
  aliases: JSONB DEFAULT {}

  meta: JSONB DEFAULT {}

  created_at: TIMESTAMP
  // Updates on every generation/upload activity within this flow
  updated_at: TIMESTAMP
}
```

**Purpose:** Temporary chains of generations with flow-scoped references

**Key Design Decisions:**
- No `status` field - computed from generations
- No `name`/`description` - flows are programmatic, not user-facing
- No `expires_at` - cleanup handled programmatically via `created_at`
- `aliases` stores only user-assigned aliases, not technical ones

**Indexes:**
```sql
CREATE INDEX idx_flows_project ON flows(project_id, created_at DESC);
```

---

### 5. IMAGES

```typescript
images {
  id: UUID (PK)

  // Relations
  project_id: UUID (FK -> projects) CASCADE
  generation_id: UUID (FK -> generations) SET NULL
  flow_id: UUID (FK -> flows) CASCADE
  api_key_id: UUID (FK -> api_keys) SET NULL

  // Storage (MinIO path format: orgSlug/projectSlug/category/YYYY-MM/filename.ext)
  storage_key: VARCHAR(500) UNIQUE
  storage_url: TEXT

  // File metadata
  mime_type: VARCHAR(100)
  file_size: INTEGER
  file_hash: VARCHAR(64) // SHA-256 for deduplication

  // Dimensions
  width: INTEGER
  height: INTEGER
  aspect_ratio: VARCHAR(10)

  // Focal point for image transformations (imageflow)
  // Normalized coordinates: { "x": 0.5, "y": 0.3 } where 0.0-1.0
  focal_point: JSONB

  // Source
  source: ENUM('generated', 'uploaded')

  // Project-level alias (global scope)
  // Flow-level aliases stored in flows.aliases
  alias: VARCHAR(100) // @product, @logo

  // Metadata
  description: TEXT
  tags: TEXT[]
  meta: JSONB DEFAULT {}

  // Audit
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
  deleted_at: TIMESTAMP // Soft delete
}
```

**Purpose:** Centralized storage for all images (uploaded + generated)

**Key Design Decisions:**
- `flow_id` enables flow-scoped uploads
- `alias` is for project scope only (global across the project)
- Flow-scoped aliases are stored in the `flows.aliases` JSONB column
- `focal_point` for imageflow server integration
- `api_key_id` for audit trail of who created the image
- Soft delete via `deleted_at` for recovery

**Constraints:**
```sql
CHECK ((source = 'uploaded' AND generation_id IS NULL)
    OR (source = 'generated' AND generation_id IS NOT NULL))
CHECK (alias IS NULL OR alias ~ '^@[a-zA-Z0-9_-]+$')
CHECK (file_size > 0)
CHECK ((width IS NULL OR (width > 0 AND width <= 8192))
   AND (height IS NULL OR (height > 0 AND height <= 8192)))
```

**Indexes:**
```sql
CREATE UNIQUE INDEX idx_images_project_alias
  ON images(project_id, alias)
  WHERE alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL;

CREATE INDEX idx_images_project_source
  ON images(project_id, source, created_at DESC)
  WHERE deleted_at IS NULL;

CREATE INDEX idx_images_flow ON images(flow_id) WHERE flow_id IS NOT NULL;
CREATE INDEX idx_images_generation ON images(generation_id);
CREATE INDEX idx_images_storage_key ON images(storage_key);
CREATE INDEX idx_images_hash ON images(file_hash);
```

---

### 6. GENERATIONS

```typescript
generations {
  id: UUID (PK)

  // Relations
  project_id: UUID (FK -> projects) CASCADE
  flow_id: UUID (FK -> flows) SET NULL
  api_key_id: UUID (FK -> api_keys) SET NULL

  // Status
  status: ENUM('pending', 'processing', 'success', 'failed') DEFAULT 'pending'

  // Prompts
  original_prompt: TEXT
  enhanced_prompt: TEXT // AI-enhanced version (if enabled)

  // Generation parameters
  aspect_ratio: VARCHAR(10)
  width: INTEGER
  height: INTEGER

  // AI Model
  model_name: VARCHAR(100) DEFAULT 'gemini-flash-image-001'
  model_version: VARCHAR(50)

  // Result
  output_image_id: UUID (FK -> images) SET NULL

  // Referenced images used in generation
  // Format: [{ "imageId": "uuid", "alias": "@product" }, ...]
  referenced_images: JSONB

  // Error handling
  error_message: TEXT
  error_code: VARCHAR(50)
  retry_count: INTEGER DEFAULT 0

  // Metrics
  processing_time_ms: INTEGER
  cost: INTEGER // In cents (USD)

  // Request context
  request_id: UUID // For log correlation
  user_agent: TEXT
  ip_address: INET

  // Metadata
  meta: JSONB DEFAULT {}

  // Audit
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
}
```

**Purpose:** Complete audit trail of all image generations

**Key Design Decisions:**
- `referenced_images` as JSONB instead of M:N table (simpler, sufficient for reference info)
- No `parent_generation_id` - not needed for MVP
- No `final_prompt` - redundant with `enhanced_prompt` or `original_prompt`
- No `completed_at` - use `updated_at` when `status` changes to success/failed
- `api_key_id` for audit trail of who made the request
- Technical aliases resolved programmatically, not stored

**Referenced Images Format:**
```json
[
  { "imageId": "uuid-1", "alias": "@product" },
  { "imageId": "uuid-2", "alias": "@style" }
]
```

**Constraints:**
```sql
CHECK ((status = 'success' AND output_image_id IS NOT NULL) OR (status != 'success'))
CHECK ((status = 'failed' AND error_message IS NOT NULL) OR (status != 'failed'))
CHECK (retry_count >= 0)
CHECK (processing_time_ms IS NULL OR processing_time_ms >= 0)
CHECK (cost IS NULL OR cost >= 0)
```

**Indexes:**
```sql
CREATE INDEX idx_generations_project_status
  ON generations(project_id, status, created_at DESC);

CREATE INDEX idx_generations_flow
  ON generations(flow_id, created_at DESC)
  WHERE flow_id IS NOT NULL;

CREATE INDEX idx_generations_output ON generations(output_image_id);
CREATE INDEX idx_generations_request ON generations(request_id);
```

---

### 7. PROMPT_URL_CACHE

```typescript
prompt_url_cache {
  id: UUID (PK)

  // Relations
  project_id: UUID (FK -> projects) CASCADE
  generation_id: UUID (FK -> generations) CASCADE
  image_id: UUID (FK -> images) CASCADE

  // Cache keys (SHA-256 hashes)
  prompt_hash: VARCHAR(64)
  query_params_hash: VARCHAR(64)

  // Original request (for debugging/reconstruction)
  original_prompt: TEXT
  request_params: JSONB // { width, height, aspectRatio, template, ... }

  // Cache statistics
  hit_count: INTEGER DEFAULT 0
  last_hit_at: TIMESTAMP

  // Audit
  created_at: TIMESTAMP
}
```

**Purpose:** Deduplication and caching for Prompt URL feature

**Key Design Decisions:**
- Composite unique key: `project_id + prompt_hash + query_params_hash`
- No `expires_at` - cache lives forever unless manually cleared
- Tracks `hit_count` for analytics

**Constraints:**
```sql
CHECK (hit_count >= 0)
```

**Indexes:**
```sql
CREATE UNIQUE INDEX idx_cache_key
  ON prompt_url_cache(project_id, prompt_hash, query_params_hash);

CREATE INDEX idx_cache_generation ON prompt_url_cache(generation_id);
CREATE INDEX idx_cache_image ON prompt_url_cache(image_id);
CREATE INDEX idx_cache_hits
  ON prompt_url_cache(project_id, hit_count DESC, created_at DESC);
```

---

## 🔗 Relationships Summary

### One-to-Many (1:M)

1. **organizations → projects** (CASCADE)
2. **organizations → api_keys** (CASCADE)
3. **projects → api_keys** (CASCADE)
4. **projects → flows** (CASCADE)
5. **projects → images** (CASCADE)
6. **projects → generations** (CASCADE)
7. **projects → prompt_url_cache** (CASCADE)
8. **flows → images** (CASCADE)
9. **flows → generations** (SET NULL)
10. **generations → images** (SET NULL) - output image
11. **api_keys → images** (SET NULL) - who created
12. **api_keys → generations** (SET NULL) - who requested
13. **generations → prompt_url_cache** (CASCADE)
14. **images → prompt_url_cache** (CASCADE)

### Cascade Rules

**ON DELETE CASCADE:**
- Deleting organization → deletes all projects, api_keys
- Deleting project → deletes all flows, images, generations, cache
- Deleting flow → deletes all flow-scoped images
- Deleting generation → deletes its `prompt_url_cache` entries; images are kept (orphaned references OK)
- Deleting image → deletes its `prompt_url_cache` entries

**ON DELETE SET NULL:**
- Deleting generation → sets `images.generation_id` to NULL
- Deleting image → sets `generations.output_image_id` to NULL
- Deleting flow → sets `generations.flow_id` to NULL
- Deleting api_key → sets audit references to NULL

---

## 🎯 Alias System

### Two-Tier Alias Scope

#### Project-Scoped (Global)
- **Storage:** `images.alias` column
- **Lifetime:** Permanent (until image deleted)
- **Visibility:** Across entire project
- **Examples:** `@logo`, `@brand`, `@header`
- **Use Case:** Reusable brand assets

#### Flow-Scoped (Temporary)
- **Storage:** `flows.aliases` JSONB
- **Lifetime:** Duration of flow
- **Visibility:** Only within specific flow
- **Examples:** `@hero`, `@product`, `@variant`
- **Use Case:** Conversational generation chains

#### Technical Aliases (Computed)
- **Storage:** None (computed on-the-fly)
- **Types:**
  - `@last` - Last generation in flow (any status)
  - `@first` - First generation in flow
  - `@upload` - Last uploaded image in flow
- **Implementation:** Query-based resolution

### Resolution Algorithm

```
1. Check if technical alias (@last, @first, @upload) → compute from flow data
2. Check flow.aliases for flow-scoped alias → return if found
3. Check images.alias for project-scoped alias → return if found
4. Return null (alias not found)
```

---

## 🔧 Dual Alias Assignment

### Uploads

```typescript
POST /api/images/upload
{
  file: ,
  alias: "@product",    // Project-scoped (optional)
  flowAlias: "@hero",   // Flow-scoped (optional)
  flowId: "uuid"        // Required if flowAlias provided
}
```

**Result:**
- If `alias` provided → set `images.alias = "@product"`
- If `flowAlias` provided → add to `flows.aliases["@hero"] = imageId`
- Can have both simultaneously

### Generations

```typescript
POST /api/generations
{
  prompt: "hero image",
  assignAlias: "@brand",      // Project-scoped (optional)
  assignFlowAlias: "@hero",   // Flow-scoped (optional)
  flowId: "uuid"
}
```

**Result (after successful generation):**
- If `assignAlias` → set `images.alias = "@brand"` on output image
- If `assignFlowAlias` → add to `flows.aliases["@hero"] = outputImageId`

---

## 📊 Performance Optimizations

### Critical Indexes

All indexes are listed in the individual table sections above. Key performance considerations:

1. **Alias Lookup:** Partial unique index on `images(project_id, alias)`, restricted to non-deleted, non-flow images that have an alias
2. **Flow Activity:** Composite index on `generations(flow_id, created_at)`
3. **Cache Hit:** Unique composite on `prompt_url_cache(project_id, prompt_hash, query_params_hash)`
4. **Audit Queries:** Indexes on `api_key_id` columns (none are defined in the table sections above yet)

### Denormalization

**Avoided intentionally:**
- No counters (image_count, generation_count)
- Computed via COUNT(*) queries with proper indexes
- Simpler, more reliable, less trigger overhead

---

## 🧹 Data Lifecycle

### Soft Delete

**Tables with soft delete:**
- `images` - via `deleted_at` column

**Cleanup strategy:**
- Hard delete after 30 days of soft delete
- Implemented via cron job or manual cleanup script

### Hard Delete

**Tables with hard delete:**
- `generations` - cascade deletes
- `flows` - cascade deletes
- `prompt_url_cache` - cascade deletes

---

## 🔐 Security & Audit

### API Key Tracking

All mutations tracked via `api_key_id`:
- `images.api_key_id` - who uploaded/generated
- `generations.api_key_id` - who requested generation

### Request Correlation

- `generations.request_id` - correlate with application logs
- `generations.user_agent` - client identification
- `generations.ip_address` - rate limiting, abuse prevention

---

## 🚀 Migration Strategy

### Phase 1: Core Tables
1. Create `flows` table
2. Create `images` table
3. Create `generations` table
4. Add all indexes and constraints
5. Migrate existing MinIO data to `images` table

### Phase 2: Advanced Features
1. Create `prompt_url_cache` table
2. Add indexes
3. Implement cache warming for existing data (optional)

---

## 📝 Design Decisions Log

### Why JSONB for `flows.aliases`?
- Simple key-value structure
- No need for JOINs
- Flexible schema
- Atomic updates
- Trade-off: No referential integrity (acceptable for temporary data)

### Why JSONB for `generations.referenced_images`?
- Reference info is append-only
- No need for complex queries on references
- Simpler schema (one less table)
- Trade-off: No CASCADE on image deletion (acceptable)

### Why no `namespaces`?
- Adds complexity without clear benefit for MVP
- Flow-scoped + project-scoped aliases sufficient
- Can add later if needed

### Why no `generation_groups`?
- Not needed for core functionality
- Grouping can be done via tags or meta JSONB
- Can add later if analytics requires it

### Why `focal_point` as JSONB?
- Imageflow server expects normalized coordinates
- Format: `{ "x": 0.0-1.0, "y": 0.0-1.0 }`
- JSONB allows future extension (e.g., multiple focal points)

### Why track `api_key_id` in images/generations?
- Essential for audit trail
- Cost attribution per key
- Usage analytics
- Abuse detection

---

## 📚 References

- **Imageflow Focal Points:** https://docs.imageflow.io/querystring/focal-point
- **Drizzle ORM:** https://orm.drizzle.team/
- **PostgreSQL JSONB:** https://www.postgresql.org/docs/current/datatype-json.html

---

*Document Version: 2.0*
*Last Updated: 2025-10-26*
*Status: Ready for Implementation*
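
The four-step Resolution Algorithm from the Alias System section can be sketched in TypeScript as below. This is a minimal in-memory illustration, not Banatie's implementation: the `Store` shape, the row interfaces, and `resolveAlias` are hypothetical names, arrays stand in for SQL queries, and the sketch assumes that `@last` and `@first` resolve to a generation's `output_image_id` (which may be null for a generation that did not succeed).

```typescript
// Hypothetical row shapes; field names mirror the schema, types are assumptions.
interface ImageRow {
  id: string;
  projectId: string;
  flowId: string | null;
  alias: string | null;
  source: "generated" | "uploaded";
  createdAt: number; // epoch ms, stands in for TIMESTAMP
  deletedAt: number | null;
}

interface GenerationRow {
  id: string;
  flowId: string | null;
  outputImageId: string | null;
  createdAt: number;
}

interface FlowRow {
  id: string;
  projectId: string;
  aliases: Record<string, string>; // e.g. { "@hero": "image-uuid" }
}

interface Store {
  images: ImageRow[];
  generations: GenerationRow[];
  flows: FlowRow[];
}

const TECHNICAL_ALIASES = new Set(["@last", "@first", "@upload"]);

// Returns the resolved image id, or null when the alias cannot be resolved.
function resolveAlias(
  store: Store,
  projectId: string,
  alias: string,
  flowId?: string
): string | null {
  const flow = flowId
    ? store.flows.find((f) => f.id === flowId && f.projectId === projectId)
    : undefined;

  // Step 1: technical aliases are computed from flow data, never stored.
  if (TECHNICAL_ALIASES.has(alias)) {
    if (!flow) return null; // technical aliases only make sense inside a flow
    if (alias === "@upload") {
      const uploads = store.images
        .filter((i) => i.flowId === flow.id && i.source === "uploaded" && i.deletedAt === null)
        .sort((a, b) => b.createdAt - a.createdAt); // newest first
      return uploads[0]?.id ?? null;
    }
    const gens = store.generations
      .filter((g) => g.flowId === flow.id)
      .sort((a, b) => a.createdAt - b.createdAt); // oldest first
    const gen = alias === "@first" ? gens[0] : gens[gens.length - 1];
    return gen?.outputImageId ?? null;
  }

  // Step 2: flow-scoped alias stored in flows.aliases.
  const flowHit = flow?.aliases[alias];
  if (flowHit) return flowHit;

  // Step 3: project-scoped alias stored in images.alias
  // (same filter as the partial unique index idx_images_project_alias).
  const image = store.images.find(
    (i) =>
      i.projectId === projectId &&
      i.alias === alias &&
      i.flowId === null &&
      i.deletedAt === null
  );

  // Step 4: null when nothing matched.
  return image?.id ?? null;
}
```

Checking technical aliases first mirrors the algorithm's ordering; it also means a user-assigned alias can never shadow `@last`, `@first`, or `@upload`, assuming those names are treated as reserved.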