Banatie Database Design

📊 Database Schema for AI Image Generation System

This document describes the complete database structure for Banatie - an AI-powered image generation service with support for named references, flows, and prompt URL caching.

Version: 2.0
Last Updated: 2025-10-26
Status: Approved for Implementation

🏗️ Architecture Overview

Core Principles

Dual Alias System: Project-level (global) and Flow-level (temporary) scopes
Technical Aliases Computed: @last, @first, @upload are calculated programmatically
Audit Trail: Complete history of all generations with performance metrics
Referential Integrity: Proper foreign keys and cascade rules
Simplicity First: Minimal tables, JSONB for flexibility

Scope Resolution Order

Technical aliases (@last, @first, @upload) → Flow-scoped aliases (@hero in flow) → Project-scoped aliases (@logo global)

📋 Existing Tables (Unchanged)

1. ORGANIZATIONS

organizations {
  id: UUID (PK)
  name: TEXT
  slug: TEXT UNIQUE
  email: TEXT UNIQUE
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
}

Purpose: Top-level entity for multi-tenant system

2. PROJECTS

projects {
  id: UUID (PK)
  organization_id: UUID (FK -> organizations) CASCADE
  name: TEXT
  slug: TEXT
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
  
  UNIQUE INDEX(organization_id, slug)
}

Purpose: Container for all project-specific data (images, generations, flows)

3. API_KEYS

api_keys {
  id: UUID (PK)
  key_hash: TEXT UNIQUE
  key_prefix: TEXT DEFAULT 'bnt_'
  key_type: ENUM('master', 'project')
  
  organization_id: UUID (FK -> organizations) CASCADE
  project_id: UUID (FK -> projects) CASCADE
  
  scopes: JSONB DEFAULT ['generate']
  
  created_at: TIMESTAMP
  expires_at: TIMESTAMP
  last_used_at: TIMESTAMP
  is_active: BOOLEAN DEFAULT true
  
  name: TEXT
  created_by: UUID
}

Purpose: Authentication and authorization for API access

🆕 New Tables

4. FLOWS

flows {
  id: UUID (PK)
  project_id: UUID (FK -> projects) CASCADE
  
  // Flow-scoped named aliases (user-assigned only)
  // Technical aliases (@last, @first, @upload) computed programmatically
  // Format: { "@hero": "image-uuid", "@product": "image-uuid" }
  aliases: JSONB DEFAULT {}
  
  meta: JSONB DEFAULT {}
  
  created_at: TIMESTAMP
  // Updates on every generation/upload activity within this flow
  updated_at: TIMESTAMP
}

Purpose: Temporary chains of generations with flow-scoped references

Key Design Decisions:

No status field - computed from generations
No name/description - flows are programmatic, not user-facing
No expires_at - cleanup handled programmatically via created_at
aliases stores only user-assigned aliases, not technical ones

Indexes:

CREATE INDEX idx_flows_project ON flows(project_id, created_at DESC);

5. IMAGES

images {
  id: UUID (PK)
  
  // Relations
  project_id: UUID (FK -> projects) CASCADE
  generation_id: UUID (FK -> generations) SET NULL
  flow_id: UUID (FK -> flows) CASCADE
  api_key_id: UUID (FK -> api_keys) SET NULL
  
  // Storage (MinIO path format: orgSlug/projectSlug/category/YYYY-MM/filename.ext)
  storage_key: VARCHAR(500) UNIQUE
  storage_url: TEXT
  
  // File metadata
  mime_type: VARCHAR(100)
  file_size: INTEGER
  file_hash: VARCHAR(64) // SHA-256 for deduplication
  
  // Dimensions
  width: INTEGER
  height: INTEGER
  aspect_ratio: VARCHAR(10)
  
  // Focal point for image transformations (imageflow)
  // Normalized coordinates: { "x": 0.5, "y": 0.3 } where 0.0-1.0
  focal_point: JSONB
  
  // Source
  source: ENUM('generated', 'uploaded')
  
  // Project-level alias (global scope)
  // Flow-level aliases stored in flows.aliases
  alias: VARCHAR(100) // @product, @logo
  
  // Metadata
  description: TEXT
  tags: TEXT[]
  meta: JSONB DEFAULT {}
  
  // Audit
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
  deleted_at: TIMESTAMP // Soft delete
}

Purpose: Centralized storage for all images (uploaded + generated)

Key Design Decisions:

flow_id enables flow-scoped uploads
alias is for project-scope only (global across project)
Flow-scoped aliases are stored in the flows.aliases JSONB column
focal_point for imageflow server integration
api_key_id for audit trail of who created the image
Soft delete via deleted_at for recovery
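The storage path format noted in the schema comment (orgSlug/projectSlug/category/YYYY-MM/filename.ext) can be sketched as a small helper. This is only an illustration: `buildStorageKey`, its parameter names, and the use of the UTC month for `YYYY-MM` are assumptions, not part of the approved design.

```typescript
// Hypothetical helper: builds a MinIO object key in the documented format
// orgSlug/projectSlug/category/YYYY-MM/filename.ext
interface StorageKeyParts {
  orgSlug: string;
  projectSlug: string;
  category: string; // category values are not specified in this document
  filename: string; // includes the extension
  createdAt: Date;
}

function buildStorageKey(parts: StorageKeyParts): string {
  const year = parts.createdAt.getUTCFullYear();
  // getUTCMonth() is zero-based, so add 1 and left-pad to two digits
  const month = String(parts.createdAt.getUTCMonth() + 1).padStart(2, "0");
  const key = [
    parts.orgSlug,
    parts.projectSlug,
    parts.category,
    `${year}-${month}`,
    parts.filename,
  ].join("/");
  // storage_key is VARCHAR(500), so reject keys that would not fit
  if (key.length > 500) {
    throw new Error("storage key exceeds 500 characters");
  }
  return key;
}
```

Because storage_key is UNIQUE, callers would still need to make the filename itself unique (for example by using the image UUID), which this sketch leaves to the caller.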

Constraints:

CHECK ((source = 'uploaded' AND generation_id IS NULL)
    OR (source = 'generated' AND generation_id IS NOT NULL))

CHECK (alias IS NULL OR alias ~ '^@[a-zA-Z0-9_-]+$')

CHECK (file_size > 0)

CHECK ((width IS NULL OR (width > 0 AND width <= 8192))
   AND (height IS NULL OR (height > 0 AND height <= 8192)))
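The constraints above can also be mirrored in application code so that invalid rows are rejected with readable messages before they reach the database. The following is a minimal sketch; `validateImageRow` and the camelCase field names are hypothetical and not taken from the Banatie codebase.

```typescript
// Application-side mirror of the CHECK constraints on the images table.
interface ImageRowInput {
  source: "generated" | "uploaded";
  generationId: string | null;
  alias: string | null;
  fileSize: number;
  width: number | null;
  height: number | null;
}

const ALIAS_PATTERN = /^@[a-zA-Z0-9_-]+$/;
const MAX_DIMENSION = 8192;

// Returns an error message for an out-of-range dimension, or null if it is valid.
function checkDimension(name: string, value: number | null): string | null {
  if (value === null) return null; // dimensions are optional
  if (value > 0 && value <= MAX_DIMENSION) return null;
  return `${name} must be between 1 and ${MAX_DIMENSION}`;
}

// Returns a list of violations; an empty list means the row passes all checks.
function validateImageRow(row: ImageRowInput): string[] {
  const errors: string[] = [];
  if (row.source === "uploaded" && row.generationId !== null) {
    errors.push("uploaded images must not have a generation_id");
  }
  if (row.source === "generated" && row.generationId === null) {
    errors.push("generated images must have a generation_id");
  }
  if (row.alias !== null && !ALIAS_PATTERN.test(row.alias)) {
    errors.push("alias must match ^@[a-zA-Z0-9_-]+$");
  }
  if (!(row.fileSize > 0)) {
    errors.push("file_size must be greater than 0");
  }
  for (const message of [checkDimension("width", row.width), checkDimension("height", row.height)]) {
    if (message !== null) errors.push(message);
  }
  return errors;
}
```

The database constraints remain the source of truth; a check like this only gives earlier and friendlier feedback.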

Indexes:

CREATE UNIQUE INDEX idx_images_project_alias 
  ON images(project_id, alias) 
  WHERE alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL;

CREATE INDEX idx_images_project_source 
  ON images(project_id, source, created_at DESC) 
  WHERE deleted_at IS NULL;

CREATE INDEX idx_images_flow ON images(flow_id) WHERE flow_id IS NOT NULL;
CREATE INDEX idx_images_generation ON images(generation_id);
CREATE INDEX idx_images_storage_key ON images(storage_key);
CREATE INDEX idx_images_hash ON images(file_hash);

6. GENERATIONS

generations {
  id: UUID (PK)
  
  // Relations
  project_id: UUID (FK -> projects) CASCADE
  flow_id: UUID (FK -> flows) SET NULL
  api_key_id: UUID (FK -> api_keys) SET NULL
  
  // Status
  status: ENUM('pending', 'processing', 'success', 'failed') DEFAULT 'pending'
  
  // Prompts
  original_prompt: TEXT
  enhanced_prompt: TEXT // AI-enhanced version (if enabled)
  
  // Generation parameters
  aspect_ratio: VARCHAR(10)
  width: INTEGER
  height: INTEGER
  
  // AI Model
  model_name: VARCHAR(100) DEFAULT 'gemini-flash-image-001'
  model_version: VARCHAR(50)
  
  // Result
  output_image_id: UUID (FK -> images) SET NULL
  
  // Referenced images used in generation
  // Format: [{ "imageId": "uuid", "alias": "@product" }, ...]
  referenced_images: JSONB
  
  // Error handling
  error_message: TEXT
  error_code: VARCHAR(50)
  retry_count: INTEGER DEFAULT 0
  
  // Metrics
  processing_time_ms: INTEGER
  cost: INTEGER // In cents (USD)
  
  // Request context
  request_id: UUID // For log correlation
  user_agent: TEXT
  ip_address: INET
  
  // Metadata
  meta: JSONB DEFAULT {}
  
  // Audit
  created_at: TIMESTAMP
  updated_at: TIMESTAMP
}

Purpose: Complete audit trail of all image generations

Key Design Decisions:

referenced_images as JSONB instead of M:N table (simpler, sufficient for reference info)
No parent_generation_id - not needed for MVP
No final_prompt - redundant with enhanced_prompt or original_prompt
No completed_at - use updated_at when status changes to success/failed
api_key_id for audit trail of who made the request
Technical aliases resolved programmatically, not stored

Referenced Images Format:

[
  { "imageId": "uuid-1", "alias": "@product" },
  { "imageId": "uuid-2", "alias": "@style" }
]
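Since referenced_images is stored as untyped JSONB, the application has to validate its shape on read. A minimal sketch of such a check follows; `ReferencedImage`, `isReferencedImage`, and `parseReferencedImages` are hypothetical names, and treating a NULL column as an empty list is an assumption.

```typescript
// Shape of one entry in generations.referenced_images, per the format above.
interface ReferencedImage {
  imageId: string;
  alias: string;
}

// Type guard: true only for objects with string imageId and alias fields.
function isReferencedImage(value: unknown): value is ReferencedImage {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["imageId"] === "string" && typeof candidate["alias"] === "string";
}

// Parses the raw JSONB value read from the database.
function parseReferencedImages(raw: unknown): ReferencedImage[] {
  // Assumption: a NULL column means "no references"
  if (raw === null || raw === undefined) return [];
  if (!Array.isArray(raw) || !raw.every(isReferencedImage)) {
    throw new Error("referenced_images must be an array of { imageId, alias }");
  }
  return raw;
}
```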

Constraints:

CHECK ((status = 'success' AND output_image_id IS NOT NULL)
    OR (status != 'success'))

CHECK ((status = 'failed' AND error_message IS NOT NULL)
    OR (status != 'failed'))

CHECK (retry_count >= 0)

CHECK (processing_time_ms IS NULL OR processing_time_ms >= 0)

CHECK (cost IS NULL OR cost >= 0)

Indexes:

CREATE INDEX idx_generations_project_status 
  ON generations(project_id, status, created_at DESC);

CREATE INDEX idx_generations_flow 
  ON generations(flow_id, created_at DESC) 
  WHERE flow_id IS NOT NULL;

CREATE INDEX idx_generations_output ON generations(output_image_id);
CREATE INDEX idx_generations_request ON generations(request_id);

7. PROMPT_URL_CACHE

prompt_url_cache {
  id: UUID (PK)
  
  // Relations
  project_id: UUID (FK -> projects) CASCADE
  generation_id: UUID (FK -> generations) CASCADE
  image_id: UUID (FK -> images) CASCADE
  
  // Cache keys (SHA-256 hashes)
  prompt_hash: VARCHAR(64)
  query_params_hash: VARCHAR(64)
  
  // Original request (for debugging/reconstruction)
  original_prompt: TEXT
  request_params: JSONB // { width, height, aspectRatio, template, ... }
  
  // Cache statistics
  hit_count: INTEGER DEFAULT 0
  last_hit_at: TIMESTAMP
  
  // Audit
  created_at: TIMESTAMP
}

Purpose: Deduplication and caching for Prompt URL feature

Key Design Decisions:

Composite unique key: project_id + prompt_hash + query_params_hash
No expires_at - entries persist until cleared manually or removed by a cascade (project, generation, or image deletion)
Tracks hit_count for analytics
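The composite cache key could be derived roughly as follows. This is a sketch under stated assumptions: the prompt is hashed exactly as received (no trimming or case folding), request parameters are flat, and they are serialized with sorted keys so that parameter order does not affect the hash. The names `sha256Hex`, `canonicalizeParams`, and `buildCacheKey` are hypothetical.

```typescript
import { createHash } from "node:crypto";

// SHA-256 as a 64-character lowercase hex string (fits VARCHAR(64)).
function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

type FlatParams = Record<string, string | number | boolean>;

// Serialize params as a sorted list of [key, value] pairs so that
// { width, height } and { height, width } produce the same string.
function canonicalizeParams(params: FlatParams): string {
  const sortedKeys = Object.keys(params).sort();
  return JSON.stringify(sortedKeys.map((key) => [key, params[key]]));
}

interface CacheKey {
  projectId: string;
  promptHash: string;
  queryParamsHash: string;
}

// Builds the three values covered by the unique index idx_cache_key.
function buildCacheKey(projectId: string, prompt: string, params: FlatParams): CacheKey {
  return {
    projectId,
    promptHash: sha256Hex(prompt),
    queryParamsHash: sha256Hex(canonicalizeParams(params)),
  };
}
```

Whether prompts should be normalized before hashing is a product decision this document does not settle; normalizing would raise the hit rate at the cost of treating slightly different prompts as identical.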

Constraints:

CHECK hit_count >= 0

Indexes:

CREATE UNIQUE INDEX idx_cache_key 
  ON prompt_url_cache(project_id, prompt_hash, query_params_hash);

CREATE INDEX idx_cache_generation ON prompt_url_cache(generation_id);
CREATE INDEX idx_cache_image ON prompt_url_cache(image_id);
CREATE INDEX idx_cache_hits 
  ON prompt_url_cache(project_id, hit_count DESC, created_at DESC);

🔗 Relationships Summary

One-to-Many (1:M)

organizations → projects (CASCADE)
organizations → api_keys (CASCADE)
projects → api_keys (CASCADE)
projects → flows (CASCADE)
projects → images (CASCADE)
projects → generations (CASCADE)
projects → prompt_url_cache (CASCADE)
flows → images (CASCADE)
flows → generations (SET NULL)
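generations → images (SET NULL) - source generation (images.generation_id)
images → generations (SET NULL) - output image (generations.output_image_id)
generations → prompt_url_cache (CASCADE)
images → prompt_url_cache (CASCADE)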
api_keys → images (SET NULL) - who created
api_keys → generations (SET NULL) - who requested

Cascade Rules

ON DELETE CASCADE:

Deleting organization → deletes all projects, api_keys
Deleting project → deletes all flows, images, generations, cache
Deleting flow → deletes all flow-scoped images
Deleting generation → deletes its prompt_url_cache entries (images are kept; see SET NULL below)
Deleting image → deletes its prompt_url_cache entries

ON DELETE SET NULL:

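Deleting generation → sets images.generation_id to NULL (note: this conflicts with the images CHECK that requires generation_id when source = 'generated')
Deleting image → sets generations.output_image_id to NULL (note: this conflicts with the generations CHECK that requires output_image_id when status = 'success')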
Deleting flow → sets generations.flow_id to NULL
Deleting api_key → sets audit references to NULL

🎯 Alias System

Two-Tier Alias Scope

Project-Scoped (Global)

Storage: images.alias column
Lifetime: Permanent (until image deleted)
Visibility: Across entire project
Examples: @logo, @brand, @header
Use Case: Reusable brand assets

Flow-Scoped (Temporary)

Storage: flows.aliases JSONB
Lifetime: Duration of flow
Visibility: Only within specific flow
Examples: @hero, @product, @variant
Use Case: Conversational generation chains

Technical Aliases (Computed)

Storage: None (computed on-the-fly)
Types:
- @last - Last generation in flow (any status)
- @first - First generation in flow
- @upload - Last uploaded image in flow
Implementation: Query-based resolution
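A rough sketch of how the three technical aliases might be computed from flow data is shown below. It works on in-memory arrays for clarity; in practice this would be a query ordered by created_at. The type and function names are hypothetical, and resolving @last and @first to the generation's output image id (which is null for a generation without output) is an assumption.

```typescript
interface FlowGeneration {
  id: string;
  outputImageId: string | null; // null while pending or after failure
  createdAt: Date;
}

interface FlowImage {
  id: string;
  source: "generated" | "uploaded";
  createdAt: Date;
}

type TechnicalAlias = "@last" | "@first" | "@upload";

// Returns a copy of the rows sorted oldest first.
function sortByCreatedAt<T extends { createdAt: Date }>(rows: T[]): T[] {
  return [...rows].sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
}

// Resolves a technical alias to an image id, or null if nothing matches.
function resolveTechnicalAlias(
  alias: TechnicalAlias,
  generations: FlowGeneration[],
  images: FlowImage[],
): string | null {
  const ordered = sortByCreatedAt(generations);
  switch (alias) {
    case "@first":
      return ordered[0]?.outputImageId ?? null;
    case "@last":
      // "any status": the newest generation is used even if it has no output
      return ordered[ordered.length - 1]?.outputImageId ?? null;
    case "@upload": {
      const uploads = sortByCreatedAt(images.filter((img) => img.source === "uploaded"));
      return uploads[uploads.length - 1]?.id ?? null;
    }
  }
}
```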

Resolution Algorithm

1. Check if technical alias (@last, @first, @upload) → compute from flow data
2. Check flows.aliases for flow-scoped alias → return if found
3. Check images.alias for project-scoped alias → return if found
4. Return null (alias not found)
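The four steps above can be sketched as a single lookup function. This is an illustration only: `AliasContext` and `resolveAlias` are hypothetical names, the alias maps stand in for flows.aliases and the images.alias column, and the technical alias computation is passed in as a callback rather than implemented here.

```typescript
const TECHNICAL_ALIASES: ReadonlySet<string> = new Set(["@last", "@first", "@upload"]);

interface AliasContext {
  // Contents of flows.aliases for the current flow (alias -> image id)
  flowAliases: Map<string, string>;
  // Project-scoped aliases from images.alias (alias -> image id)
  projectAliases: Map<string, string>;
  // Hypothetical callback that computes @last, @first, @upload from flow data
  resolveTechnical: (alias: string) => string | null;
}

// Returns the image id for an alias, or null when the alias is unknown.
function resolveAlias(alias: string, ctx: AliasContext): string | null {
  // 1. Technical aliases are computed, never looked up
  if (TECHNICAL_ALIASES.has(alias)) {
    return ctx.resolveTechnical(alias);
  }
  // 2. Flow-scoped aliases shadow project-scoped ones
  const flowHit = ctx.flowAliases.get(alias);
  if (flowHit !== undefined) return flowHit;
  // 3. Project-scoped aliases
  const projectHit = ctx.projectAliases.get(alias);
  if (projectHit !== undefined) return projectHit;
  // 4. Not found
  return null;
}
```

Checking technical aliases first means a user-assigned alias with a reserved name could never be reached, so rejecting such names at assignment time may be worth considering.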

🔧 Dual Alias Assignment

Uploads

POST /api/images/upload
{
  file: <binary>,
  alias: "@product",        // Project-scoped (optional)
  flowAlias: "@hero",       // Flow-scoped (optional)
  flowId: "uuid"            // Required if flowAlias provided
}

Result:

If alias provided → set images.alias = "@product"
If flowAlias provided → add to flows.aliases["@hero"] = imageId
An image can hold a project-scoped and a flow-scoped alias at the same time

Generations

POST /api/generations
{
  prompt: "hero image",
  assignAlias: "@brand",    // Project-scoped (optional)
  assignFlowAlias: "@hero", // Flow-scoped (optional)
  flowId: "uuid"
}

Result (after successful generation):

If assignAlias → set images.alias = "@brand" on output image
If assignFlowAlias → add to flows.aliases["@hero"] = outputImageId
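The assignment rules for uploads and generations follow the same pattern, which can be sketched as one pure function over in-memory records. The record shapes and the name `assignAliases` are hypothetical; persistence and uniqueness checks against the partial index are left out.

```typescript
interface ImageRecord {
  id: string;
  alias: string | null; // project-scoped alias (images.alias)
}

interface FlowRecord {
  id: string;
  aliases: Record<string, string>; // flow-scoped aliases (flows.aliases)
}

interface AliasAssignment {
  alias?: string; // project-scoped, optional
  flowAlias?: string; // flow-scoped, optional
}

// Applies one or both aliases and returns updated copies of the records.
function assignAliases(
  image: ImageRecord,
  flow: FlowRecord | null,
  assignment: AliasAssignment,
): { image: ImageRecord; flow: FlowRecord | null } {
  let nextImage = image;
  let nextFlow = flow;
  if (assignment.alias !== undefined) {
    nextImage = { ...image, alias: assignment.alias };
  }
  if (assignment.flowAlias !== undefined) {
    if (flow === null) {
      // Matches the rule that flowId is required when a flow alias is given
      throw new Error("a flow is required when a flow-scoped alias is provided");
    }
    nextFlow = { ...flow, aliases: { ...flow.aliases, [assignment.flowAlias]: image.id } };
  }
  return { image: nextImage, flow: nextFlow };
}
```

In a real implementation both writes would likely belong in one transaction so that an image never ends up with only one of the two requested aliases.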

📊 Performance Optimizations

Critical Indexes

All indexes are listed in the individual table sections above. Key performance considerations:

Alias Lookup: Partial unique index on images(project_id, alias) WHERE alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL
Flow Activity: Composite index on generations(flow_id, created_at DESC)
Cache Hit: Unique composite index on prompt_url_cache(project_id, prompt_hash, query_params_hash)
Audit Queries: Indexes on api_key_id columns (none are defined in the table sections above yet)

Denormalization

Avoided intentionally:

No counters (image_count, generation_count)
Computed via COUNT(*) queries with proper indexes
Simpler, more reliable, less trigger overhead

🧹 Data Lifecycle

Soft Delete

Tables with soft delete:

images - via deleted_at column

Cleanup strategy:

Hard delete after 30 days of soft delete
Implemented via cron job or manual cleanup script
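The selection step of such a cleanup job might look like the sketch below. It filters in-memory rows for clarity; a real job would express the same cutoff in a DELETE query. The names `selectForHardDelete` and `RETENTION_DAYS` are hypothetical.

```typescript
const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface SoftDeletable {
  id: string;
  deletedAt: Date | null; // null means the row is live
}

// Returns rows that were soft-deleted at least RETENTION_DAYS before `now`.
function selectForHardDelete<T extends SoftDeletable>(rows: T[], now: Date): T[] {
  const cutoff = now.getTime() - RETENTION_DAYS * MS_PER_DAY;
  return rows.filter((row) => row.deletedAt !== null && row.deletedAt.getTime() <= cutoff);
}
```

Passing `now` as a parameter instead of calling `new Date()` inside keeps the function deterministic and easy to test.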

Hard Delete

Tables with hard delete:

generations - cascade deletes
flows - cascade deletes
prompt_url_cache - cascade deletes

🔐 Security & Audit

API Key Tracking

All mutations tracked via api_key_id:

images.api_key_id - who uploaded/generated
generations.api_key_id - who requested generation

Request Correlation

generations.request_id - correlate with application logs
generations.user_agent - client identification
generations.ip_address - rate limiting, abuse prevention

🚀 Migration Strategy

Phase 1: Core Tables

Create flows table
Create images table
Create generations table
Add all indexes and constraints
Migrate existing MinIO data to images table

Phase 2: Advanced Features

Create prompt_url_cache table
Add indexes
Implement cache warming for existing data (optional)

📝 Design Decisions Log

Why JSONB for `flows.aliases`?

Simple key-value structure
No need for JOINs
Flexible schema
Atomic updates
Trade-off: No referential integrity (acceptable for temporary data)

Why JSONB for `generations.referenced_images`?

Reference info is append-only
No need for complex queries on references
Simpler schema (one less table)
Trade-off: No CASCADE on image deletion (acceptable)

Why no `namespaces`?

Adds complexity without clear benefit for MVP
Flow-scoped + project-scoped aliases sufficient
Can add later if needed

Why no `generation_groups`?

Not needed for core functionality
Grouping can be done via tags or meta JSONB
Can add later if analytics requires it

Why `focal_point` as JSONB?

Imageflow server expects normalized coordinates
Format: { "x": 0.0-1.0, "y": 0.0-1.0 }
JSONB allows future extension (e.g., multiple focal points)
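As an illustration of the normalized format, the sketch below converts a pixel position into a focal point and validates a stored value. The function names are hypothetical, and clamping out-of-range input to the 0.0-1.0 range is an assumption rather than documented behavior.

```typescript
interface FocalPoint {
  x: number; // 0.0 (left) to 1.0 (right)
  y: number; // 0.0 (top) to 1.0 (bottom)
}

// Converts a pixel coordinate into normalized focal point coordinates.
function toFocalPoint(px: number, py: number, width: number, height: number): FocalPoint {
  if (width <= 0 || height <= 0) {
    throw new Error("width and height must be positive");
  }
  const clamp = (value: number): number => Math.min(1, Math.max(0, value));
  return { x: clamp(px / width), y: clamp(py / height) };
}

// Checks that a value read from the focal_point column has the expected shape.
function isValidFocalPoint(value: unknown): value is FocalPoint {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const inRange = (n: unknown): boolean => typeof n === "number" && n >= 0 && n <= 1;
  return inRange(candidate["x"]) && inRange(candidate["y"]);
}
```

For example, a focal point at pixel (400, 150) in an 800x500 image corresponds to the `{ "x": 0.5, "y": 0.3 }` value used in the schema comment.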

Why track `api_key_id` in images/generations?

Essential for audit trail
Cost attribution per key
Usage analytics
Abuse detection

📚 References

Imageflow Focal Points: https://docs.imageflow.io/querystring/focal-point
Drizzle ORM: https://orm.drizzle.team/
PostgreSQL JSONB: https://www.postgresql.org/docs/current/datatype-json.html

Document Version: 2.0
Last Updated: 2025-10-26
Status: Ready for Implementation
