From e88617b430815ca3425ce855bd7ac27ef732e33b Mon Sep 17 00:00:00 2001
From: Oleg Proskurin
Date: Sun, 26 Oct 2025 22:26:02 +0700
Subject: [PATCH] feat: add documentation

---
 banatie-database-design.md | 607 +++++++++++++++++++++++++++++++++++++
 1 file changed, 607 insertions(+)
 create mode 100644 banatie-database-design.md

diff --git a/banatie-database-design.md b/banatie-database-design.md
new file mode 100644
index 0000000..8980422
--- /dev/null
+++ b/banatie-database-design.md
@@ -0,0 +1,607 @@
+# Banatie Database Design
+
+## 📊 Database Schema for AI Image Generation System
+
+This document describes the complete database structure for Banatie - an AI-powered image generation service with support for named references, flows, and prompt URL caching.
+
+**Version:** 2.0
+**Last Updated:** 2025-10-26
+**Status:** Approved for Implementation
+
+---
+
+## 🏗️ Architecture Overview
+
+### Core Principles
+
+1. **Dual Alias System**: Project-level (global) and Flow-level (temporary) scopes
+2. **Technical Aliases Computed**: `@last`, `@first`, `@upload` are calculated programmatically
+3. **Audit Trail**: Complete history of all generations with performance metrics
+4. **Referential Integrity**: Proper foreign keys and cascade rules
+5. **Simplicity First**: Minimal tables, JSONB for flexibility
+
+### Scope Resolution Order
+
+```
+Flow-scoped aliases (@hero in flow) → Project-scoped aliases (@logo global) → Technical aliases (@last, @first)
+```
+
+---
+
+## 📋 Existing Tables (Unchanged)
+
+### 1. ORGANIZATIONS
+
+```typescript
+organizations {
+  id: UUID (PK)
+  name: TEXT
+  slug: TEXT UNIQUE
+  email: TEXT UNIQUE
+  created_at: TIMESTAMP
+  updated_at: TIMESTAMP
+}
+```
+
+**Purpose:** Top-level entity for multi-tenant system
+
+---
+
+### 2. PROJECTS
+
+```typescript
+projects {
+  id: UUID (PK)
+  organization_id: UUID (FK -> organizations) CASCADE
+  name: TEXT
+  slug: TEXT
+  created_at: TIMESTAMP
+  updated_at: TIMESTAMP
+
+  UNIQUE INDEX(organization_id, slug)
+}
+```
+
+**Purpose:** Container for all project-specific data (images, generations, flows)
+
+---
+
+### 3. API_KEYS
+
+```typescript
+api_keys {
+  id: UUID (PK)
+  key_hash: TEXT UNIQUE
+  key_prefix: TEXT DEFAULT 'bnt_'
+  key_type: ENUM('master', 'project')
+
+  organization_id: UUID (FK -> organizations) CASCADE
+  project_id: UUID (FK -> projects) CASCADE
+
+  scopes: JSONB DEFAULT ['generate']
+
+  created_at: TIMESTAMP
+  expires_at: TIMESTAMP
+  last_used_at: TIMESTAMP
+  is_active: BOOLEAN DEFAULT true
+
+  name: TEXT
+  created_by: UUID
+}
+```
+
+**Purpose:** Authentication and authorization for API access
+
+---
+
+## 🆕 New Tables
+
+### 4. FLOWS
+
+```typescript
+flows {
+  id: UUID (PK)
+  project_id: UUID (FK -> projects) CASCADE
+
+  // Flow-scoped named aliases (user-assigned only)
+  // Technical aliases (@last, @first, @upload) computed programmatically
+  // Format: { "@hero": "image-uuid", "@product": "image-uuid" }
+  aliases: JSONB DEFAULT {}
+
+  meta: JSONB DEFAULT {}
+
+  created_at: TIMESTAMP
+  // Updates on every generation/upload activity within this flow
+  updated_at: TIMESTAMP
+}
+```
+
+**Purpose:** Temporary chains of generations with flow-scoped references
+
+**Key Design Decisions:**
+- No `status` field - computed from generations
+- No `name`/`description` - flows are programmatic, not user-facing
+- No `expires_at` - cleanup handled programmatically via `created_at`
+- `aliases` stores only user-assigned aliases, not technical ones
+
+**Indexes:**
+```sql
+CREATE INDEX idx_flows_project ON flows(project_id, created_at DESC);
+```
+
+---
+
+### 5. IMAGES
+
+```typescript
+images {
+  id: UUID (PK)
+
+  // Relations
+  project_id: UUID (FK -> projects) CASCADE
+  generation_id: UUID (FK -> generations) SET NULL
+  flow_id: UUID (FK -> flows) CASCADE
+  api_key_id: UUID (FK -> api_keys) SET NULL
+
+  // Storage (MinIO path format: orgSlug/projectSlug/category/YYYY-MM/filename.ext)
+  storage_key: VARCHAR(500) UNIQUE
+  storage_url: TEXT
+
+  // File metadata
+  mime_type: VARCHAR(100)
+  file_size: INTEGER
+  file_hash: VARCHAR(64) // SHA-256 for deduplication
+
+  // Dimensions
+  width: INTEGER
+  height: INTEGER
+  aspect_ratio: VARCHAR(10)
+
+  // Focal point for image transformations (imageflow)
+  // Normalized coordinates: { "x": 0.5, "y": 0.3 } where 0.0-1.0
+  focal_point: JSONB
+
+  // Source
+  source: ENUM('generated', 'uploaded')
+
+  // Project-level alias (global scope)
+  // Flow-level aliases stored in flows.aliases
+  alias: VARCHAR(100) // @product, @logo
+
+  // Metadata
+  description: TEXT
+  tags: TEXT[]
+  meta: JSONB DEFAULT {}
+
+  // Audit
+  created_at: TIMESTAMP
+  updated_at: TIMESTAMP
+  deleted_at: TIMESTAMP // Soft delete
+}
+```
+
+**Purpose:** Centralized storage for all images (uploaded + generated)
+
+**Key Design Decisions:**
+- `flow_id` enables flow-scoped uploads
+- `alias` is for project-scope only (global across project)
+- Flow-scoped aliases stored in the `flows.aliases` column
+- `focal_point` for imageflow server integration
+- `api_key_id` for audit trail of who created the image
+- Soft delete via `deleted_at` for recovery
+
+**Constraints:**
+```sql
+CHECK ((source = 'uploaded' AND generation_id IS NULL)
+  OR (source = 'generated' AND generation_id IS NOT NULL))
+
+CHECK (alias IS NULL OR alias ~ '^@[a-zA-Z0-9_-]+$')
+
+CHECK (file_size > 0)
+
+CHECK ((width IS NULL OR (width > 0 AND width <= 8192))
+  AND (height IS NULL OR (height > 0 AND height <= 8192)))
+```
+
+**Indexes:**
+```sql
+CREATE UNIQUE INDEX idx_images_project_alias
+  ON images(project_id, alias)
+  WHERE alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL;
+
+CREATE INDEX idx_images_project_source
+  ON images(project_id, source, created_at DESC)
+  WHERE deleted_at IS NULL;
+
+CREATE INDEX idx_images_flow ON images(flow_id) WHERE flow_id IS NOT NULL;
+CREATE INDEX idx_images_generation ON images(generation_id);
+CREATE INDEX idx_images_storage_key ON images(storage_key);
+CREATE INDEX idx_images_hash ON images(file_hash);
+```
+
+---
+
+### 6. GENERATIONS
+
+```typescript
+generations {
+  id: UUID (PK)
+
+  // Relations
+  project_id: UUID (FK -> projects) CASCADE
+  flow_id: UUID (FK -> flows) SET NULL
+  api_key_id: UUID (FK -> api_keys) SET NULL
+
+  // Status
+  status: ENUM('pending', 'processing', 'success', 'failed') DEFAULT 'pending'
+
+  // Prompts
+  original_prompt: TEXT
+  enhanced_prompt: TEXT // AI-enhanced version (if enabled)
+
+  // Generation parameters
+  aspect_ratio: VARCHAR(10)
+  width: INTEGER
+  height: INTEGER
+
+  // AI Model
+  model_name: VARCHAR(100) DEFAULT 'gemini-flash-image-001'
+  model_version: VARCHAR(50)
+
+  // Result
+  output_image_id: UUID (FK -> images) SET NULL
+
+  // Referenced images used in generation
+  // Format: [{ "imageId": "uuid", "alias": "@product" }, ...]
+  referenced_images: JSONB
+
+  // Error handling
+  error_message: TEXT
+  error_code: VARCHAR(50)
+  retry_count: INTEGER DEFAULT 0
+
+  // Metrics
+  processing_time_ms: INTEGER
+  cost: INTEGER // In cents (USD)
+
+  // Request context
+  request_id: UUID // For log correlation
+  user_agent: TEXT
+  ip_address: INET
+
+  // Metadata
+  meta: JSONB DEFAULT {}
+
+  // Audit
+  created_at: TIMESTAMP
+  updated_at: TIMESTAMP
+}
+```
+
+**Purpose:** Complete audit trail of all image generations
+
+**Key Design Decisions:**
+- `referenced_images` as JSONB instead of M:N table (simpler, sufficient for reference info)
+- No `parent_generation_id` - not needed for MVP
+- No `final_prompt` - redundant with `enhanced_prompt` or `original_prompt`
+- No `completed_at` - use `updated_at` when `status` changes to success/failed
+- `api_key_id` for audit trail of who made the request
+- Technical aliases resolved programmatically, not stored
+
+**Referenced Images Format:**
+```json
+[
+  { "imageId": "uuid-1", "alias": "@product" },
+  { "imageId": "uuid-2", "alias": "@style" }
+]
+```
+
+**Constraints:**
+```sql
+CHECK ((status = 'success' AND output_image_id IS NOT NULL)
+  OR (status != 'success'))
+
+CHECK ((status = 'failed' AND error_message IS NOT NULL)
+  OR (status != 'failed'))
+
+CHECK (retry_count >= 0)
+
+CHECK (processing_time_ms IS NULL OR processing_time_ms >= 0)
+
+CHECK (cost IS NULL OR cost >= 0)
+```
+
+**Indexes:**
+```sql
+CREATE INDEX idx_generations_project_status
+  ON generations(project_id, status, created_at DESC);
+
+CREATE INDEX idx_generations_flow
+  ON generations(flow_id, created_at DESC)
+  WHERE flow_id IS NOT NULL;
+
+CREATE INDEX idx_generations_output ON generations(output_image_id);
+CREATE INDEX idx_generations_request ON generations(request_id);
+```
+
+---
+
+### 7. PROMPT_URL_CACHE
+
+```typescript
+prompt_url_cache {
+  id: UUID (PK)
+
+  // Relations
+  project_id: UUID (FK -> projects) CASCADE
+  generation_id: UUID (FK -> generations) CASCADE
+  image_id: UUID (FK -> images) CASCADE
+
+  // Cache keys (SHA-256 hashes)
+  prompt_hash: VARCHAR(64)
+  query_params_hash: VARCHAR(64)
+
+  // Original request (for debugging/reconstruction)
+  original_prompt: TEXT
+  request_params: JSONB // { width, height, aspectRatio, template, ... }
+
+  // Cache statistics
+  hit_count: INTEGER DEFAULT 0
+  last_hit_at: TIMESTAMP
+
+  // Audit
+  created_at: TIMESTAMP
+}
+```
+
+**Purpose:** Deduplication and caching for Prompt URL feature
+
+**Key Design Decisions:**
+- Composite unique key: `project_id + prompt_hash + query_params_hash`
+- No `expires_at` - cache lives forever unless manually cleared
+- Tracks `hit_count` for analytics
+
+**Constraints:**
+```sql
+CHECK (hit_count >= 0)
+```
+
+**Indexes:**
+```sql
+CREATE UNIQUE INDEX idx_cache_key
+  ON prompt_url_cache(project_id, prompt_hash, query_params_hash);
+
+CREATE INDEX idx_cache_generation ON prompt_url_cache(generation_id);
+CREATE INDEX idx_cache_image ON prompt_url_cache(image_id);
+CREATE INDEX idx_cache_hits
+  ON prompt_url_cache(project_id, hit_count DESC, created_at DESC);
+```
+
+---
+
+## 🔗 Relationships Summary
+
+### One-to-Many (1:M)
+
+1. **organizations → projects** (CASCADE)
+2. **organizations → api_keys** (CASCADE)
+3. **projects → api_keys** (CASCADE)
+4. **projects → flows** (CASCADE)
+5. **projects → images** (CASCADE)
+6. **projects → generations** (CASCADE)
+7. **projects → prompt_url_cache** (CASCADE)
+8. **flows → images** (CASCADE)
+9. **flows → generations** (SET NULL)
+10. **generations → images** (SET NULL) - output image
+11. **api_keys → images** (SET NULL) - who created
+12. **api_keys → generations** (SET NULL) - who requested
+
+### Cascade Rules
+
+**ON DELETE CASCADE:**
+- Deleting organization → deletes all projects, api_keys
+- Deleting project → deletes all flows, images, generations, cache
+- Deleting flow → deletes all flow-scoped images
+- Deleting generation or image → deletes dependent `prompt_url_cache` entries (orphaned references OK)
+
+**ON DELETE SET NULL:**
+- Deleting generation → sets `images.generation_id` to NULL
+- Deleting image → sets `generations.output_image_id` to NULL
+- Deleting flow → sets `generations.flow_id` to NULL
+- Deleting api_key → sets audit references to NULL
+
+---
+
+## 🎯 Alias System
+
+### Two-Tier Alias Scope
+
+#### Project-Scoped (Global)
+- **Storage:** `images.alias` column
+- **Lifetime:** Permanent (until image deleted)
+- **Visibility:** Across entire project
+- **Examples:** `@logo`, `@brand`, `@header`
+- **Use Case:** Reusable brand assets
+
+#### Flow-Scoped (Temporary)
+- **Storage:** `flows.aliases` JSONB
+- **Lifetime:** Duration of flow
+- **Visibility:** Only within specific flow
+- **Examples:** `@hero`, `@product`, `@variant`
+- **Use Case:** Conversational generation chains
+
+#### Technical Aliases (Computed)
+- **Storage:** None (computed on-the-fly)
+- **Types:**
+  - `@last` - Last generation in flow (any status)
+  - `@first` - First generation in flow
+  - `@upload` - Last uploaded image in flow
+- **Implementation:** Query-based resolution
+
+### Resolution Algorithm
+
+```
+1. Check if technical alias (@last, @first, @upload) → compute from flow data
+2. Check flow.aliases for flow-scoped alias → return if found
+3. Check images.alias for project-scoped alias → return if found
+4. Return null (alias not found)
+```
+
+---
+
+## 🔧 Dual Alias Assignment
+
+### Uploads
+```typescript
+POST /api/images/upload
+{
+  file: <binary file data>,
+  alias: "@product", // Project-scoped (optional)
+  flowAlias: "@hero", // Flow-scoped (optional)
+  flowId: "uuid" // Required if flowAlias provided
+}
+```
+
+**Result:**
+- If `alias` provided → set `images.alias = "@product"`
+- If `flowAlias` provided → add to `flows.aliases["@hero"] = imageId`
+- Can have both simultaneously
+
+### Generations
+```typescript
+POST /api/generations
+{
+  prompt: "hero image",
+  assignAlias: "@brand", // Project-scoped (optional)
+  assignFlowAlias: "@hero", // Flow-scoped (optional)
+  flowId: "uuid"
+}
+```
+
+**Result (after successful generation):**
+- If `assignAlias` → set `images.alias = "@brand"` on output image
+- If `assignFlowAlias` → add to `flows.aliases["@hero"] = outputImageId`
+
+---
+
+## 📊 Performance Optimizations
+
+### Critical Indexes
+
+All indexes are listed in the individual table sections above. Key performance considerations:
+
+1. **Alias Lookup:** Partial unique index on `images(project_id, alias)` restricted by its WHERE conditions
+2. **Flow Activity:** Composite index on `generations(flow_id, created_at)`
+3. **Cache Hit:** Unique composite on `prompt_url_cache(project_id, prompt_hash, query_params_hash)`
+4. **Audit Queries:** Indexes on `api_key_id` columns (not yet defined in the table sections above)
+
+### Denormalization
+
+**Avoided intentionally:**
+- No counters (image_count, generation_count)
+- Computed via COUNT(*) queries with proper indexes
+- Simpler, more reliable, less trigger overhead
+
+---
+
+## 🧹 Data Lifecycle
+
+### Soft Delete
+
+**Tables with soft delete:**
+- `images` - via `deleted_at` column
+
+**Cleanup strategy:**
+- Hard delete after 30 days of soft delete
+- Implemented via cron job or manual cleanup script
+
+### Hard Delete
+
+**Tables with hard delete:**
+- `generations` - cascade deletes
+- `flows` - cascade deletes
+- `prompt_url_cache` - cascade deletes
+
+---
+
+## 🔐 Security & Audit
+
+### API Key Tracking
+
+All mutations tracked via `api_key_id`:
+- `images.api_key_id` - who uploaded/generated
+- `generations.api_key_id` - who requested generation
+
+### Request Correlation
+
+- `generations.request_id` - correlate with application logs
+- `generations.user_agent` - client identification
+- `generations.ip_address` - rate limiting, abuse prevention
+
+---
+
+## 🚀 Migration Strategy
+
+### Phase 1: Core Tables
+1. Create `flows` table
+2. Create `images` table
+3. Create `generations` table
+4. Add all indexes and constraints
+5. Migrate existing MinIO data to `images` table
+
+### Phase 2: Advanced Features
+1. Create `prompt_url_cache` table
+2. Add indexes
+3. Implement cache warming for existing data (optional)
+
+---
+
+## 📝 Design Decisions Log
+
+### Why JSONB for `flows.aliases`?
+- Simple key-value structure
+- No need for JOINs
+- Flexible schema
+- Atomic updates
+- Trade-off: No referential integrity (acceptable for temporary data)
+
+### Why JSONB for `generations.referenced_images`?
+- Reference info is append-only
+- No need for complex queries on references
+- Simpler schema (one less table)
+- Trade-off: No CASCADE on image deletion (acceptable)
+
+### Why no `namespaces`?
+- Adds complexity without clear benefit for MVP
+- Flow-scoped + project-scoped aliases sufficient
+- Can add later if needed
+
+### Why no `generation_groups`?
+- Not needed for core functionality
+- Grouping can be done via tags or meta JSONB
+- Can add later if analytics requires it
+
+### Why `focal_point` as JSONB?
+- Imageflow server expects normalized coordinates
+- Format: `{ "x": 0.0-1.0, "y": 0.0-1.0 }`
+- JSONB allows future extension (e.g., multiple focal points)
+
+### Why track `api_key_id` in images/generations?
+- Essential for audit trail
+- Cost attribution per key
+- Usage analytics
+- Abuse detection
+
+---
+
+## 📚 References
+
+- **Imageflow Focal Points:** https://docs.imageflow.io/querystring/focal-point
+- **Drizzle ORM:** https://orm.drizzle.team/
+- **PostgreSQL JSONB:** https://www.postgresql.org/docs/current/datatype-json.html
+
+---
+
+*Document Version: 2.0*
+*Last Updated: 2025-10-26*
+*Status: Ready for Implementation*
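The four-step Resolution Algorithm in the Alias System section can be illustrated with a small in-memory sketch. This is not the project's implementation: the `Store`, `ImageRow`, `GenerationRow`, and `FlowRow` types and the `resolveAlias` and `resolveTechnical` functions are hypothetical names that stand in for queries against the `flows`, `images`, and `generations` tables. It also assumes that `@last` and `@first` resolve to the output image of the matching generation, which the design does not state explicitly.

```typescript
// Sketch only: in-memory stand-ins for the flows, images, and generations tables.
// All type and function names here are hypothetical, not part of the Banatie codebase.
interface ImageRow {
  id: string;
  projectId: string;
  flowId: string | null;
  alias: string | null; // project-scoped alias (images.alias)
  source: "generated" | "uploaded";
  createdAt: number; // epoch millis standing in for created_at
  deletedAt: number | null; // soft delete marker
}

interface GenerationRow {
  id: string;
  flowId: string | null;
  outputImageId: string | null; // null while pending or after a failure
  createdAt: number;
}

interface FlowRow {
  id: string;
  projectId: string;
  aliases: Record<string, string>; // flows.aliases, e.g. { "@hero": "image-uuid" }
}

interface Store {
  images: ImageRow[];
  generations: GenerationRow[];
  flows: FlowRow[];
}

const TECHNICAL_ALIASES = new Set(["@last", "@first", "@upload"]);

// Step 1 helper: compute a technical alias from the activity inside one flow.
function resolveTechnical(store: Store, alias: string, flowId: string): string | null {
  if (alias === "@upload") {
    const uploads = store.images
      .filter((i) => i.flowId === flowId && i.source === "uploaded" && i.deletedAt === null)
      .sort((a, b) => a.createdAt - b.createdAt);
    return uploads.length > 0 ? uploads[uploads.length - 1].id : null;
  }
  const gens = store.generations
    .filter((g) => g.flowId === flowId)
    .sort((a, b) => a.createdAt - b.createdAt);
  if (gens.length === 0) return null;
  const picked = alias === "@first" ? gens[0] : gens[gens.length - 1];
  // Assumption: the alias resolves to the generation's output image, if it has one.
  return picked.outputImageId;
}

function resolveAlias(store: Store, projectId: string, alias: string, flowId?: string): string | null {
  // 1. Technical aliases are computed, never stored.
  if (TECHNICAL_ALIASES.has(alias)) {
    return flowId ? resolveTechnical(store, alias, flowId) : null;
  }
  // 2. Flow-scoped aliases live in flows.aliases.
  if (flowId) {
    const flow = store.flows.find((f) => f.id === flowId && f.projectId === projectId);
    if (flow && alias in flow.aliases) return flow.aliases[alias];
  }
  // 3. Project-scoped aliases live in images.alias (soft-deleted rows are skipped).
  const image = store.images.find(
    (i) => i.projectId === projectId && i.alias === alias && i.deletedAt === null,
  );
  // 4. Not found.
  return image ? image.id : null;
}

// Made-up sample data: one flow with an upload, a successful and a failed generation.
const demo: Store = {
  flows: [{ id: "flow-1", projectId: "proj-1", aliases: { "@hero": "img-2" } }],
  images: [
    { id: "img-1", projectId: "proj-1", flowId: null, alias: "@logo", source: "uploaded", createdAt: 1, deletedAt: null },
    { id: "img-2", projectId: "proj-1", flowId: "flow-1", alias: null, source: "generated", createdAt: 2, deletedAt: null },
    { id: "img-3", projectId: "proj-1", flowId: "flow-1", alias: null, source: "uploaded", createdAt: 3, deletedAt: null },
  ],
  generations: [
    { id: "gen-1", flowId: "flow-1", outputImageId: "img-2", createdAt: 2 },
    { id: "gen-2", flowId: "flow-1", outputImageId: null, createdAt: 4 },
  ],
};
```

With this sample data, `resolveAlias(demo, "proj-1", "@hero", "flow-1")` is answered from the flow's alias map, while `@logo` falls through to the project-scoped lookup. Because `@last` covers generations of any status, it can point at a generation with no output image, so callers would need to handle a `null` result.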