Banatie Database Design

📊 Database Schema for AI Image Generation System

This document describes the complete database structure for Banatie - an AI-powered image generation service with support for named references, flows, and prompt URL caching.

Version: 2.0
Last Updated: 2025-10-26
Status: Approved for Implementation


🏗️ Architecture Overview

Core Principles

  1. Dual Alias System: Project-level (global) and Flow-level (temporary) scopes
  2. Technical Aliases Computed: @last, @first, @upload are calculated programmatically
  3. Audit Trail: Complete history of all generations with performance metrics
  4. Referential Integrity: Proper foreign keys and cascade rules
  5. Simplicity First: Minimal tables, JSONB for flexibility

Scope Resolution Order

Technical aliases (@last, @first, @upload) → Flow-scoped aliases (@hero in flow) → Project-scoped aliases (@logo global)

📋 Existing Tables (Unchanged)

1. ORGANIZATIONS

organizations {
  id: UUID (PK)
  name: TEXT
  slug: TEXT UNIQUE
  email: TEXT UNIQUE
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
}

Purpose: Top-level entity for multi-tenant system


2. PROJECTS

projects {
  id: UUID (PK)
  organization_id: UUID (FK -> organizations) CASCADE
  name: TEXT
  slug: TEXT
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
  
  UNIQUE INDEX(organization_id, slug)
}

Purpose: Container for all project-specific data (images, generations, flows)


3. API_KEYS

api_keys {
  id: UUID (PK)
  key_hash: TEXT UNIQUE
  key_prefix: TEXT DEFAULT 'bnt_'
  key_type: ENUM('master', 'project')
  
  organization_id: UUID (FK -> organizations) CASCADE
  project_id: UUID (FK -> projects) CASCADE
  
  scopes: JSONB DEFAULT ['generate']
  
  created_at: TIMESTAMP
  expires_at: TIMESTAMP
  last_used_at: TIMESTAMP
  is_active: BOOLEAN DEFAULT true
  
  name: TEXT
  created_by: UUID
}

Purpose: Authentication and authorization for API access


🆕 New Tables

4. FLOWS

flows {
  id: UUID (PK)
  project_id: UUID (FK -> projects) CASCADE
  
  // Flow-scoped named aliases (user-assigned only)
  // Technical aliases (@last, @first, @upload) computed programmatically
  // Format: { "@hero": "image-uuid", "@product": "image-uuid" }
  aliases: JSONB DEFAULT {}
  
  meta: JSONB DEFAULT {}
  
  created_at: TIMESTAMP
  // Updates on every generation/upload activity within this flow
  updated_at: TIMESTAMP
}

Purpose: Temporary chains of generations with flow-scoped references

Key Design Decisions:

  • No status field - computed from generations
  • No name/description - flows are programmatic, not user-facing
  • No expires_at - cleanup handled programmatically via created_at
  • aliases stores only user-assigned aliases, not technical ones

Indexes:

CREATE INDEX idx_flows_project ON flows(project_id, created_at DESC);

5. IMAGES

images {
  id: UUID (PK)
  
  // Relations
  project_id: UUID (FK -> projects) CASCADE
  generation_id: UUID (FK -> generations) SET NULL
  flow_id: UUID (FK -> flows) CASCADE
  api_key_id: UUID (FK -> api_keys) SET NULL
  
  // Storage (MinIO path format: orgSlug/projectSlug/category/YYYY-MM/filename.ext)
  storage_key: VARCHAR(500) UNIQUE
  storage_url: TEXT
  
  // File metadata
  mime_type: VARCHAR(100)
  file_size: INTEGER
  file_hash: VARCHAR(64) // SHA-256 for deduplication
  
  // Dimensions
  width: INTEGER
  height: INTEGER
  aspect_ratio: VARCHAR(10)
  
  // Focal point for image transformations (imageflow)
  // Normalized coordinates: { "x": 0.5, "y": 0.3 } where 0.0-1.0
  focal_point: JSONB
  
  // Source
  source: ENUM('generated', 'uploaded')
  
  // Project-level alias (global scope)
  // Flow-level aliases stored in flows.aliases
  alias: VARCHAR(100) // @product, @logo
  
  // Metadata
  description: TEXT
  tags: TEXT[]
  meta: JSONB DEFAULT {}
  
  // Audit
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
  deleted_at: TIMESTAMP // Soft delete
}

Purpose: Centralized storage for all images (uploaded + generated)
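The MinIO path format noted on storage_key can be illustrated with a small builder. This is a minimal sketch, not the service's implementation: the function name, the example slugs, and the "generated" category value are assumptions, since the document does not list the allowed categories.

```python
from datetime import datetime, timezone


def build_storage_key(org_slug: str, project_slug: str, category: str,
                      filename: str, created_at: datetime) -> str:
    """Build a key shaped like orgSlug/projectSlug/category/YYYY-MM/filename.ext."""
    month = created_at.strftime("%Y-%m")
    key = f"{org_slug}/{project_slug}/{category}/{month}/{filename}"
    if len(key) > 500:  # storage_key is declared as VARCHAR(500)
        raise ValueError("storage key exceeds 500 characters")
    return key


# Hypothetical slugs and category, used only for illustration.
key = build_storage_key("acme", "website", "generated", "hero.png",
                        datetime(2025, 10, 26, tzinfo=timezone.utc))
print(key)  # acme/website/generated/2025-10/hero.png
```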

Key Design Decisions:

  • flow_id enables flow-scoped uploads
  • alias is for project-scope only (global across project)
  • Flow-scoped aliases stored in the flows.aliases column
  • focal_point for imageflow server integration
  • api_key_id for audit trail of who created the image
  • Soft delete via deleted_at for recovery

Constraints:

CHECK ((source = 'uploaded' AND generation_id IS NULL)
    OR (source = 'generated' AND generation_id IS NOT NULL))

CHECK (alias IS NULL OR alias ~ '^@[a-zA-Z0-9_-]+$')

CHECK (file_size > 0)

CHECK ((width IS NULL OR (width > 0 AND width <= 8192))
   AND (height IS NULL OR (height > 0 AND height <= 8192)))
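An application-level mirror of these constraints lets the API reject a bad row before the database does. The sketch below is illustrative only; the function name and error strings are assumptions, and the database constraints remain the source of truth.

```python
import re
from typing import Optional

# Same character class as the alias CHECK; fullmatch() replaces the ^...$ anchors.
ALIAS_PATTERN = re.compile(r"@[a-zA-Z0-9_-]+")
MAX_DIMENSION = 8192


def image_row_errors(source: str, generation_id: Optional[str],
                     alias: Optional[str], file_size: int,
                     width: Optional[int] = None,
                     height: Optional[int] = None) -> list:
    """Return the violated constraints (an empty list means the row is valid)."""
    errors = []
    consistent = ((source == "uploaded" and generation_id is None)
                  or (source == "generated" and generation_id is not None))
    if not consistent:
        errors.append("source/generation_id mismatch")
    if alias is not None and not ALIAS_PATTERN.fullmatch(alias):
        errors.append("invalid alias")
    if file_size <= 0:
        errors.append("file_size must be positive")
    for name, value in (("width", width), ("height", height)):
        if value is not None and not 0 < value <= MAX_DIMENSION:
            errors.append(f"{name} out of range")
    return errors


valid = image_row_errors("uploaded", None, "@logo", 1024, 800, 600)
invalid = image_row_errors("generated", None, "logo", 0, 9000, 600)
```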

Indexes:

CREATE UNIQUE INDEX idx_images_project_alias 
  ON images(project_id, alias) 
  WHERE alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL;

CREATE INDEX idx_images_project_source 
  ON images(project_id, source, created_at DESC) 
  WHERE deleted_at IS NULL;

CREATE INDEX idx_images_flow ON images(flow_id) WHERE flow_id IS NOT NULL;
CREATE INDEX idx_images_generation ON images(generation_id);
CREATE INDEX idx_images_storage_key ON images(storage_key);
CREATE INDEX idx_images_hash ON images(file_hash);
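The effect of the partial unique index idx_images_project_alias can be checked in isolation. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL (an assumption made only so the example runs without a server) and a reduced images table with just the columns the index touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  flow_id TEXT,
  alias TEXT,
  deleted_at TEXT
);
CREATE UNIQUE INDEX idx_images_project_alias
  ON images(project_id, alias)
  WHERE alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL;
""")


def insert(image_id, alias, flow_id=None):
    """Insert a live image into the hypothetical project 'p1'."""
    conn.execute("INSERT INTO images VALUES (?, 'p1', ?, ?, NULL)",
                 (image_id, flow_id, alias))


insert("img-1", "@logo")
insert("img-4", "@logo", flow_id="flow-1")  # flow images are outside the index

try:
    insert("img-2", "@logo")  # second live project-level @logo
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Soft-deleting the first image frees the alias for reuse.
conn.execute("UPDATE images SET deleted_at = '2025-10-26' WHERE id = 'img-1'")
insert("img-3", "@logo")

live_logo = conn.execute(
    "SELECT id FROM images "
    "WHERE alias = '@logo' AND deleted_at IS NULL AND flow_id IS NULL"
).fetchall()
```

The index therefore guarantees at most one live, project-level image per alias, while soft-deleted rows and flow-scoped uploads never collide with it.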

6. GENERATIONS

generations {
  id: UUID (PK)
  
  // Relations
  project_id: UUID (FK -> projects) CASCADE
  flow_id: UUID (FK -> flows) SET NULL
  api_key_id: UUID (FK -> api_keys) SET NULL
  
  // Status
  status: ENUM('pending', 'processing', 'success', 'failed') DEFAULT 'pending'
  
  // Prompts
  original_prompt: TEXT
  enhanced_prompt: TEXT // AI-enhanced version (if enabled)
  
  // Generation parameters
  aspect_ratio: VARCHAR(10)
  width: INTEGER
  height: INTEGER
  
  // AI Model
  model_name: VARCHAR(100) DEFAULT 'gemini-flash-image-001'
  model_version: VARCHAR(50)
  
  // Result
  output_image_id: UUID (FK -> images) SET NULL
  
  // Referenced images used in generation
  // Format: [{ "imageId": "uuid", "alias": "@product" }, ...]
  referenced_images: JSONB
  
  // Error handling
  error_message: TEXT
  error_code: VARCHAR(50)
  retry_count: INTEGER DEFAULT 0
  
  // Metrics
  processing_time_ms: INTEGER
  cost: INTEGER // In cents (USD)
  
  // Request context
  request_id: UUID // For log correlation
  user_agent: TEXT
  ip_address: INET
  
  // Metadata
  meta: JSONB DEFAULT {}
  
  // Audit
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
}

Purpose: Complete audit trail of all image generations

Key Design Decisions:

  • referenced_images as JSONB instead of M:N table (simpler, sufficient for reference info)
  • No parent_generation_id - not needed for MVP
  • No final_prompt - redundant with enhanced_prompt or original_prompt
  • No completed_at - use updated_at when status changes to success/failed
  • api_key_id for audit trail of who made the request
  • Technical aliases resolved programmatically, not stored

Referenced Images Format:

[
  { "imageId": "uuid-1", "alias": "@product" },
  { "imageId": "uuid-2", "alias": "@style" }
]

Constraints:

CHECK ((status = 'success' AND output_image_id IS NOT NULL)
    OR (status != 'success'))

CHECK ((status = 'failed' AND error_message IS NOT NULL)
    OR (status != 'failed'))

CHECK (retry_count >= 0)

CHECK (processing_time_ms IS NULL OR processing_time_ms >= 0)

CHECK (cost IS NULL OR cost >= 0)

Indexes:

CREATE INDEX idx_generations_project_status 
  ON generations(project_id, status, created_at DESC);

CREATE INDEX idx_generations_flow 
  ON generations(flow_id, created_at DESC) 
  WHERE flow_id IS NOT NULL;

CREATE INDEX idx_generations_output ON generations(output_image_id);
CREATE INDEX idx_generations_request ON generations(request_id);

7. PROMPT_URL_CACHE

prompt_url_cache {
  id: UUID (PK)
  
  // Relations
  project_id: UUID (FK -> projects) CASCADE
  generation_id: UUID (FK -> generations) CASCADE
  image_id: UUID (FK -> images) CASCADE
  
  // Cache keys (SHA-256 hashes)
  prompt_hash: VARCHAR(64)
  query_params_hash: VARCHAR(64)
  
  // Original request (for debugging/reconstruction)
  original_prompt: TEXT
  request_params: JSONB // { width, height, aspectRatio, template, ... }
  
  // Cache statistics
  hit_count: INTEGER DEFAULT 0
  last_hit_at: TIMESTAMP
  
  // Audit
  created_at: TIMESTAMP
}

Purpose: Deduplication and caching for Prompt URL feature

Key Design Decisions:

  • Composite unique key: project_id + prompt_hash + query_params_hash
  • No expires_at - cache lives forever unless manually cleared
  • Tracks hit_count for analytics

Constraints:

CHECK (hit_count >= 0)

Indexes:

CREATE UNIQUE INDEX idx_cache_key 
  ON prompt_url_cache(project_id, prompt_hash, query_params_hash);

CREATE INDEX idx_cache_generation ON prompt_url_cache(generation_id);
CREATE INDEX idx_cache_image ON prompt_url_cache(image_id);
CREATE INDEX idx_cache_hits 
  ON prompt_url_cache(project_id, hit_count DESC, created_at DESC);
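The composite cache key (project_id + prompt_hash + query_params_hash) could be computed along these lines. This is a sketch under stated assumptions: the document only says the hashes are SHA-256, so hashing the prompt as-is and canonicalizing parameters as key-sorted JSON are illustrative choices, and the in-memory dict stands in for the prompt_url_cache table.

```python
import hashlib
import json


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest (64 characters, matching VARCHAR(64))."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cache_key(project_id: str, prompt: str, params: dict) -> tuple:
    # Assumption: sorting keys makes the hash independent of parameter order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return (project_id, sha256_hex(prompt), sha256_hex(canonical))


cache = {}  # in-memory stand-in for prompt_url_cache


def lookup_or_store(project_id: str, prompt: str, params: dict, image_id: str):
    """Return (image_id, was_hit); count hits on existing entries."""
    key = cache_key(project_id, prompt, params)
    entry = cache.get(key)
    if entry is None:
        cache[key] = {"image_id": image_id, "hit_count": 0}
        return image_id, False
    entry["hit_count"] += 1
    return entry["image_id"], True


first = lookup_or_store("p1", "a red fox", {"width": 512, "height": 512}, "img-1")
second = lookup_or_store("p1", "a red fox", {"height": 512, "width": 512}, "img-2")
```

The second call reuses the first image even though the parameters arrive in a different order, which is the deduplication behavior the unique index is meant to enforce.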

🔗 Relationships Summary

One-to-Many (1:M)

  1. organizations → projects (CASCADE)
  2. organizations → api_keys (CASCADE)
  3. projects → api_keys (CASCADE)
  4. projects → flows (CASCADE)
  5. projects → images (CASCADE)
  6. projects → generations (CASCADE)
  7. projects → prompt_url_cache (CASCADE)
  8. flows → images (CASCADE)
  9. flows → generations (SET NULL)
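  10. generations → images (SET NULL) - images.generation_id, source generation
  11. images → generations (SET NULL) - generations.output_image_id, output image
  12. generations → prompt_url_cache (CASCADE)
  13. images → prompt_url_cache (CASCADE)
  14. api_keys → images (SET NULL) - who created
  15. api_keys → generations (SET NULL) - who requested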

Cascade Rules

ON DELETE CASCADE:

  • Deleting organization → deletes all projects, api_keys
  • Deleting project → deletes all flows, images, generations, cache
  • Deleting flow → deletes all flow-scoped images
  • Deleting generation → deletes its prompt_url_cache entries
  • Hard-deleting image → deletes its prompt_url_cache entries

ON DELETE SET NULL:

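  • Deleting generation → sets images.generation_id to NULL (conflicts with the images CHECK that requires generation_id when source = 'generated'; resolve before implementation)
  • Deleting image → sets generations.output_image_id to NULL (conflicts with the generations CHECK that requires output_image_id when status = 'success'; resolve before implementation)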
  • Deleting flow → sets generations.flow_id to NULL
  • Deleting api_key → sets audit references to NULL
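The difference between the two rules for flows can be demonstrated on a reduced schema. The sketch below again uses in-memory SQLite as a stand-in for PostgreSQL (an assumption for runnability) and keeps only the flow_id foreign keys on images and generations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE flows (id TEXT PRIMARY KEY);
CREATE TABLE images (
  id TEXT PRIMARY KEY,
  flow_id TEXT REFERENCES flows(id) ON DELETE CASCADE
);
CREATE TABLE generations (
  id TEXT PRIMARY KEY,
  flow_id TEXT REFERENCES flows(id) ON DELETE SET NULL
);
INSERT INTO flows VALUES ('flow-1');
INSERT INTO images VALUES ('img-1', 'flow-1');
INSERT INTO generations VALUES ('gen-1', 'flow-1');
DELETE FROM flows WHERE id = 'flow-1';
""")

# The flow-scoped image is removed with the flow...
remaining_images = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
# ...while the generation survives with its flow reference cleared.
generation_flow = conn.execute(
    "SELECT flow_id FROM generations WHERE id = 'gen-1'"
).fetchone()[0]
```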

🎯 Alias System

Two-Tier Alias Scope

Project-Scoped (Global)

  • Storage: images.alias column
  • Lifetime: Permanent (until image deleted)
  • Visibility: Across entire project
  • Examples: @logo, @brand, @header
  • Use Case: Reusable brand assets

Flow-Scoped (Temporary)

  • Storage: flows.aliases JSONB
  • Lifetime: Duration of flow
  • Visibility: Only within specific flow
  • Examples: @hero, @product, @variant
  • Use Case: Conversational generation chains

Technical Aliases (Computed)

  • Storage: None (computed on-the-fly)
  • Types:
    • @last - Last generation in flow (any status)
    • @first - First generation in flow
    • @upload - Last uploaded image in flow
  • Implementation: Query-based resolution
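A query-based resolution of the three technical aliases might look like the sketch below, with plain lists standing in for query results. The row shape (id and created_at as ISO-8601 strings) and the function name are assumptions. The sketch returns generation IDs for @last and @first exactly as defined above, and leaves the mapping to an output image open because the document does not specify it.

```python
from typing import Optional


def technical_aliases(generations: list, uploads: list) -> dict:
    """Compute @first, @last and @upload for one flow.

    generations: rows of the flow's generations (any status)
    uploads: rows of the flow's images with source = 'uploaded'
    """
    gens = sorted(generations, key=lambda row: row["created_at"])
    ups = sorted(uploads, key=lambda row: row["created_at"])

    def pick(rows: list, index: int) -> Optional[str]:
        return rows[index]["id"] if rows else None

    return {
        "@first": pick(gens, 0),
        "@last": pick(gens, -1),
        "@upload": pick(ups, -1),
    }


resolved = technical_aliases(
    generations=[
        {"id": "gen-2", "created_at": "2025-10-26T10:05:00Z"},
        {"id": "gen-1", "created_at": "2025-10-26T10:00:00Z"},
    ],
    uploads=[{"id": "img-9", "created_at": "2025-10-26T09:59:00Z"}],
)
```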

Resolution Algorithm

1. Check if technical alias (@last, @first, @upload) → compute from flow data
2. Check flow.aliases for flow-scoped alias → return if found
3. Check images.alias for project-scoped alias → return if found
4. Return null (alias not found)
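The four steps above can be sketched as a single function. The dictionaries stand in for flows.aliases and for a lookup on images.alias, and compute_technical is a hypothetical callback for the query-based step; none of these names come from the actual codebase.

```python
from typing import Callable, Mapping, Optional

TECHNICAL_ALIASES = frozenset({"@last", "@first", "@upload"})


def resolve_alias(alias: str,
                  flow_aliases: Mapping[str, str],
                  project_aliases: Mapping[str, str],
                  compute_technical: Callable[[str], Optional[str]]) -> Optional[str]:
    # 1. Technical aliases are computed, never looked up.
    if alias in TECHNICAL_ALIASES:
        return compute_technical(alias)
    # 2. Flow scope wins over project scope.
    if alias in flow_aliases:
        return flow_aliases[alias]
    # 3. Fall back to the project-wide alias.
    if alias in project_aliases:
        return project_aliases[alias]
    # 4. Unknown alias.
    return None


flow_aliases = {"@hero": "img-flow-hero"}
project_aliases = {"@hero": "img-project-hero", "@logo": "img-logo"}
technical = {"@last": "img-latest"}.get

hero = resolve_alias("@hero", flow_aliases, project_aliases, technical)
logo = resolve_alias("@logo", flow_aliases, project_aliases, technical)
last = resolve_alias("@last", flow_aliases, project_aliases, technical)
missing = resolve_alias("@nope", flow_aliases, project_aliases, technical)
```

Because step 1 runs first, a user-assigned alias named @last would never be reached, so the technical names are effectively reserved.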

🔧 Dual Alias Assignment

Uploads

POST /api/images/upload
{
  file: <binary>,
  alias: "@product",        // Project-scoped (optional)
  flowAlias: "@hero",       // Flow-scoped (optional)
  flowId: "uuid"            // Required if flowAlias provided
}

Result:

  • If alias provided → set images.alias = "@product"
  • If flowAlias provided → set flows.aliases["@hero"] = imageId
  • Can have both simultaneously
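The upload rules above translate into a small amount of handler logic. The sketch below uses plain dictionaries in place of database rows; the function name and the in-memory shapes are assumptions, and only the request field names (alias, flowAlias, flowId) come from the document.

```python
def apply_upload_aliases(image: dict, flows: dict, request: dict) -> None:
    """Apply project- and flow-scoped aliases from an upload request."""
    alias = request.get("alias")
    flow_alias = request.get("flowAlias")
    flow_id = request.get("flowId")

    if flow_alias is not None and flow_id is None:
        raise ValueError("flowId is required when flowAlias is provided")

    if alias is not None:
        image["alias"] = alias  # project scope: images.alias
    if flow_alias is not None:
        # flow scope: one key in the flow's aliases JSONB object
        flows[flow_id]["aliases"][flow_alias] = image["id"]


image = {"id": "img-1", "alias": None}
flows = {"flow-1": {"aliases": {}}}
apply_upload_aliases(
    image, flows,
    {"alias": "@product", "flowAlias": "@hero", "flowId": "flow-1"},
)
```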

Generations

POST /api/generations
{
  prompt: "hero image",
  assignAlias: "@brand",    // Project-scoped (optional)
  assignFlowAlias: "@hero", // Flow-scoped (optional)
  flowId: "uuid"
}

Result (after successful generation):

  • If assignAlias → set images.alias = "@brand" on output image
  • If assignFlowAlias → set flows.aliases["@hero"] = outputImageId

📊 Performance Optimizations

Critical Indexes

All indexes are listed in the individual table sections above. Key performance considerations:

  1. Alias Lookup: Partial unique index on images(project_id, alias), restricted to live, non-flow images with an alias
  2. Flow Activity: Partial composite index on generations(flow_id, created_at DESC)
  3. Cache Hit: Unique composite on prompt_url_cache(project_id, prompt_hash, query_params_hash)
  4. Audit Queries: No index on the api_key_id columns is defined above; add one on images(api_key_id) and generations(api_key_id) if audit queries filter by key

Denormalization

Avoided intentionally:

  • No counters (image_count, generation_count)
  • Computed via COUNT(*) queries with proper indexes
  • Simpler, more reliable, less trigger overhead

🧹 Data Lifecycle

Soft Delete

Tables with soft delete:

  • images - via deleted_at column

Cleanup strategy:

  • Hard delete after 30 days of soft delete
  • Implemented via cron job or manual cleanup script
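The selection step of such a cleanup job could be as simple as the sketch below. It only picks the rows that are due; deleting the stored objects and the rows themselves is left out. The function name and row shape are assumptions, and the 30-day retention is the figure stated above.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)


def due_for_hard_delete(images: list, now: datetime) -> list:
    """Return IDs of images soft-deleted at least 30 days before `now`."""
    cutoff = now - RETENTION
    return [
        row["id"]
        for row in images
        if row["deleted_at"] is not None and row["deleted_at"] <= cutoff
    ]


now = datetime(2025, 10, 26, tzinfo=timezone.utc)
images = [
    {"id": "img-old", "deleted_at": now - timedelta(days=45)},
    {"id": "img-recent", "deleted_at": now - timedelta(days=5)},
    {"id": "img-live", "deleted_at": None},
]
expired = due_for_hard_delete(images, now)
```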

Hard Delete

Tables with hard delete:

  • generations - cascade deletes
  • flows - cascade deletes
  • prompt_url_cache - cascade deletes

🔐 Security & Audit

API Key Tracking

All mutations tracked via api_key_id:

  • images.api_key_id - who uploaded/generated
  • generations.api_key_id - who requested generation

Request Correlation

  • generations.request_id - correlate with application logs
  • generations.user_agent - client identification
  • generations.ip_address - rate limiting, abuse prevention

🚀 Migration Strategy

Phase 1: Core Tables

  1. Create flows table
  2. Create images table
  3. Create generations table
  4. Add all indexes and constraints
  5. Migrate existing MinIO data to images table

Phase 2: Advanced Features

  1. Create prompt_url_cache table
  2. Add indexes
  3. Implement cache warming for existing data (optional)

📝 Design Decisions Log

Why JSONB for flows.aliases?

  • Simple key-value structure
  • No need for JOINs
  • Flexible schema
  • Atomic updates
  • Trade-off: No referential integrity (acceptable for temporary data)

Why JSONB for generations.referenced_images?

  • Reference info is append-only
  • No need for complex queries on references
  • Simpler schema (one less table)
  • Trade-off: No CASCADE on image deletion (acceptable)

Why no namespaces?

  • Adds complexity without clear benefit for MVP
  • Flow-scoped + project-scoped aliases sufficient
  • Can add later if needed

Why no generation_groups?

  • Not needed for core functionality
  • Grouping can be done via tags or meta JSONB
  • Can add later if analytics requires it

Why focal_point as JSONB?

  • Imageflow server expects normalized coordinates
  • Format: { "x": 0.0-1.0, "y": 0.0-1.0 }
  • JSONB allows future extension (e.g., multiple focal points)
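To make the normalized format concrete, the sketch below converts a focal_point value into pixel coordinates for a given image size. It is purely illustrative: the function name is hypothetical, and nothing here describes how the imageflow server itself consumes the value.

```python
def focal_point_to_pixels(focal_point: dict, width: int, height: int) -> tuple:
    """Map normalized (0.0-1.0) coordinates to a pixel inside the image."""
    x, y = focal_point["x"], focal_point["y"]
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("focal point coordinates must be within 0.0-1.0")
    # Clamp so that 1.0 maps to the last pixel rather than one past it.
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


center_ish = focal_point_to_pixels({"x": 0.5, "y": 0.3}, 1024, 768)
corner = focal_point_to_pixels({"x": 1.0, "y": 1.0}, 1024, 768)
```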

Why track api_key_id in images/generations?

  • Essential for audit trail
  • Cost attribution per key
  • Usage analytics
  • Abuse detection
