feature/api-development #1

Merged
usulpro merged 47 commits from feature/api-development into main 2025-11-29 23:03:01 +07:00
1 changed files with 607 additions and 0 deletions
Showing only changes of commit e88617b430

banatie-database-design.md Normal file

@ -0,0 +1,607 @@
# Banatie Database Design
## 📊 Database Schema for AI Image Generation System
This document describes the complete database structure for Banatie - an AI-powered image generation service with support for named references, flows, and prompt URL caching.
**Version:** 2.0
**Last Updated:** 2025-10-26
**Status:** Approved for Implementation
---
## 🏗️ Architecture Overview
### Core Principles
1. **Dual Alias System**: Project-level (global) and Flow-level (temporary) scopes
2. **Technical Aliases Computed**: `@last`, `@first`, `@upload` are calculated programmatically
3. **Audit Trail**: Complete history of all generations with performance metrics
4. **Referential Integrity**: Proper foreign keys and cascade rules
5. **Simplicity First**: Minimal tables, JSONB for flexibility
### Scope Resolution Order
```
Technical aliases (@last, @first, @upload) → Flow-scoped aliases (@hero in flow) → Project-scoped aliases (@logo global)
```
---
## 📋 Existing Tables (Unchanged)
### 1. ORGANIZATIONS
```typescript
organizations {
id: UUID (PK)
name: TEXT
slug: TEXT UNIQUE
email: TEXT UNIQUE
created_at: TIMESTAMP
updated_at: TIMESTAMP
}
```
**Purpose:** Top-level entity for multi-tenant system
---
### 2. PROJECTS
```typescript
projects {
id: UUID (PK)
organization_id: UUID (FK -> organizations) CASCADE
name: TEXT
slug: TEXT
created_at: TIMESTAMP
updated_at: TIMESTAMP
UNIQUE INDEX(organization_id, slug)
}
```
**Purpose:** Container for all project-specific data (images, generations, flows)
---
### 3. API_KEYS
```typescript
api_keys {
id: UUID (PK)
key_hash: TEXT UNIQUE
key_prefix: TEXT DEFAULT 'bnt_'
key_type: ENUM('master', 'project')
organization_id: UUID (FK -> organizations) CASCADE
project_id: UUID (FK -> projects) CASCADE
scopes: JSONB DEFAULT ['generate']
created_at: TIMESTAMP
expires_at: TIMESTAMP
last_used_at: TIMESTAMP
is_active: BOOLEAN DEFAULT true
name: TEXT
created_by: UUID
}
```
**Purpose:** Authentication and authorization for API access
---
## 🆕 New Tables
### 4. FLOWS
```typescript
flows {
id: UUID (PK)
project_id: UUID (FK -> projects) CASCADE
// Flow-scoped named aliases (user-assigned only)
// Technical aliases (@last, @first, @upload) computed programmatically
// Format: { "@hero": "image-uuid", "@product": "image-uuid" }
aliases: JSONB DEFAULT {}
meta: JSONB DEFAULT {}
created_at: TIMESTAMP
// Updates on every generation/upload activity within this flow
updated_at: TIMESTAMP
}
```
**Purpose:** Temporary chains of generations with flow-scoped references
**Key Design Decisions:**
- No `status` field - computed from generations
- No `name`/`description` - flows are programmatic, not user-facing
- No `expires_at` - cleanup handled programmatically via `created_at`
- `aliases` stores only user-assigned aliases, not technical ones
**Indexes:**
```sql
CREATE INDEX idx_flows_project ON flows(project_id, created_at DESC);
```
---
### 5. IMAGES
```typescript
images {
id: UUID (PK)
// Relations
project_id: UUID (FK -> projects) CASCADE
generation_id: UUID (FK -> generations) SET NULL
flow_id: UUID (FK -> flows) CASCADE
api_key_id: UUID (FK -> api_keys) SET NULL
// Storage (MinIO path format: orgSlug/projectSlug/category/YYYY-MM/filename.ext)
storage_key: VARCHAR(500) UNIQUE
storage_url: TEXT
// File metadata
mime_type: VARCHAR(100)
file_size: INTEGER
file_hash: VARCHAR(64) // SHA-256 for deduplication
// Dimensions
width: INTEGER
height: INTEGER
aspect_ratio: VARCHAR(10)
// Focal point for image transformations (imageflow)
// Normalized coordinates: { "x": 0.5, "y": 0.3 } where 0.0-1.0
focal_point: JSONB
// Source
source: ENUM('generated', 'uploaded')
// Project-level alias (global scope)
// Flow-level aliases stored in flows.aliases
alias: VARCHAR(100) // @product, @logo
// Metadata
description: TEXT
tags: TEXT[]
meta: JSONB DEFAULT {}
// Audit
created_at: TIMESTAMP
updated_at: TIMESTAMP
deleted_at: TIMESTAMP // Soft delete
}
```
**Purpose:** Centralized storage for all images (uploaded + generated)
**Key Design Decisions:**
- `flow_id` enables flow-scoped uploads
- `alias` is for project-scope only (global across project)
- Flow-scoped aliases are stored in the `flows.aliases` JSONB column, not on the image row
- `focal_point` for imageflow server integration
- `api_key_id` for audit trail of who created the image
- Soft delete via `deleted_at` for recovery
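The normalized `focal_point` format can be checked before it is written to the JSONB column. The following is a minimal sketch; the `FocalPoint` type and the `parseFocalPoint` helper are hypothetical names introduced for illustration and are not part of the schema.

```typescript
// Hypothetical shape of images.focal_point: both coordinates normalized to 0.0-1.0.
interface FocalPoint {
  x: number;
  y: number;
}

// Returns a clean FocalPoint, or null when the value does not match the documented format.
function parseFocalPoint(value: unknown): FocalPoint | null {
  if (typeof value !== "object" || value === null) return null;
  const { x, y } = value as Record<string, unknown>;
  const inRange = (n: unknown): n is number =>
    typeof n === "number" && Number.isFinite(n) && n >= 0 && n <= 1;
  return inRange(x) && inRange(y) ? { x, y } : null;
}
```

Rejecting out-of-range values at the application layer keeps the JSONB column consistent, since the schema defines no CHECK constraint for it.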
**Constraints:**
```sql
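-- Caveat: the first check rejects the ON DELETE SET NULL update on
-- generation_id for rows with source = 'generated', so deleting the
-- generation of such an image fails.
ALTER TABLE images
  ADD CHECK ((source = 'uploaded' AND generation_id IS NULL)
    OR (source = 'generated' AND generation_id IS NOT NULL)),
  ADD CHECK (alias IS NULL OR alias ~ '^@[a-zA-Z0-9_-]+$'),
  ADD CHECK (file_size > 0),
  ADD CHECK ((width IS NULL OR (width > 0 AND width <= 8192))
    AND (height IS NULL OR (height > 0 AND height <= 8192)));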
```
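The same rules can be mirrored in application code so that invalid rows are rejected with a readable message before they reach the database. This is a sketch under assumptions: `ImageRow` and `validateImageRow` are hypothetical names, and `fileSize` is treated as always present.

```typescript
type ImageSource = "generated" | "uploaded";

// Hypothetical subset of an images row, limited to the constrained columns.
interface ImageRow {
  source: ImageSource;
  generationId: string | null;
  alias: string | null;
  fileSize: number;
  width: number | null;
  height: number | null;
}

const ALIAS_PATTERN = /^@[a-zA-Z0-9_-]+$/;
const MAX_DIMENSION = 8192;

// Returns a list of violated rules; an empty list means the row passes all checks.
function validateImageRow(row: ImageRow): string[] {
  const errors: string[] = [];
  const sourceOk =
    (row.source === "uploaded" && row.generationId === null) ||
    (row.source === "generated" && row.generationId !== null);
  if (!sourceOk) errors.push("source and generation_id are inconsistent");
  if (row.alias !== null && !ALIAS_PATTERN.test(row.alias)) {
    errors.push("alias must match ^@[a-zA-Z0-9_-]+$");
  }
  if (!(row.fileSize > 0)) errors.push("file_size must be positive");
  const dimensionOk = (d: number | null): boolean =>
    d === null || (d > 0 && d <= MAX_DIMENSION);
  if (!dimensionOk(row.width) || !dimensionOk(row.height)) {
    errors.push("width and height must be between 1 and 8192");
  }
  return errors;
}
```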
**Indexes:**
```sql
CREATE UNIQUE INDEX idx_images_project_alias
ON images(project_id, alias)
WHERE alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL;
CREATE INDEX idx_images_project_source
ON images(project_id, source, created_at DESC)
WHERE deleted_at IS NULL;
CREATE INDEX idx_images_flow ON images(flow_id) WHERE flow_id IS NOT NULL;
CREATE INDEX idx_images_generation ON images(generation_id);
CREATE INDEX idx_images_storage_key ON images(storage_key);
CREATE INDEX idx_images_hash ON images(file_hash);
```
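The `storage_key` path format noted in the schema (`orgSlug/projectSlug/category/YYYY-MM/filename.ext`) could be assembled as in the sketch below. The `buildStorageKey` helper is hypothetical, the use of UTC for the month segment is an assumption, and the set of valid `category` values is not defined in this document.

```typescript
// Hypothetical helper: builds a MinIO object key in the documented path format.
// Assumption: the YYYY-MM segment is derived from the creation time in UTC.
function buildStorageKey(
  orgSlug: string,
  projectSlug: string,
  category: string,
  filename: string,
  createdAt: Date,
): string {
  const year = createdAt.getUTCFullYear();
  const month = String(createdAt.getUTCMonth() + 1).padStart(2, "0");
  const key = `${orgSlug}/${projectSlug}/${category}/${year}-${month}/${filename}`;
  // storage_key is VARCHAR(500), so longer keys would be rejected by the database.
  if (key.length > 500) throw new Error("storage_key exceeds 500 characters");
  return key;
}
```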
---
### 6. GENERATIONS
```typescript
generations {
id: UUID (PK)
// Relations
project_id: UUID (FK -> projects) CASCADE
flow_id: UUID (FK -> flows) SET NULL
api_key_id: UUID (FK -> api_keys) SET NULL
// Status
status: ENUM('pending', 'processing', 'success', 'failed') DEFAULT 'pending'
// Prompts
original_prompt: TEXT
enhanced_prompt: TEXT // AI-enhanced version (if enabled)
// Generation parameters
aspect_ratio: VARCHAR(10)
width: INTEGER
height: INTEGER
// AI Model
model_name: VARCHAR(100) DEFAULT 'gemini-flash-image-001'
model_version: VARCHAR(50)
// Result
output_image_id: UUID (FK -> images) SET NULL
// Referenced images used in generation
// Format: [{ "imageId": "uuid", "alias": "@product" }, ...]
referenced_images: JSONB
// Error handling
error_message: TEXT
error_code: VARCHAR(50)
retry_count: INTEGER DEFAULT 0
// Metrics
processing_time_ms: INTEGER
cost: INTEGER // In cents (USD)
// Request context
request_id: UUID // For log correlation
user_agent: TEXT
ip_address: INET
// Metadata
meta: JSONB DEFAULT {}
// Audit
created_at: TIMESTAMP
updated_at: TIMESTAMP
}
```
**Purpose:** Complete audit trail of all image generations
**Key Design Decisions:**
- `referenced_images` as JSONB instead of M:N table (simpler, sufficient for reference info)
- No `parent_generation_id` - not needed for MVP
- No `final_prompt` - redundant with `enhanced_prompt` or `original_prompt`
- No `completed_at` - use `updated_at` when `status` changes to success/failed
- `api_key_id` for audit trail of who made the request
- Technical aliases resolved programmatically, not stored
**Referenced Images Format:**
```json
[
{ "imageId": "uuid-1", "alias": "@product" },
{ "imageId": "uuid-2", "alias": "@style" }
]
```
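Because `referenced_images` is JSONB without referential integrity, it can help to validate its shape when reading it back. The following is a sketch only; `ReferencedImage` and `parseReferencedImages` are hypothetical names.

```typescript
// Shape of one entry in generations.referenced_images, as documented above.
interface ReferencedImage {
  imageId: string;
  alias: string;
}

// Parses a JSON string and verifies each entry has string imageId and alias fields.
function parseReferencedImages(json: string): ReferencedImage[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) throw new Error("referenced_images must be an array");
  return data.map((item: unknown, index: number): ReferencedImage => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`entry ${index} is not an object`);
    }
    const { imageId, alias } = item as Record<string, unknown>;
    if (typeof imageId !== "string" || typeof alias !== "string") {
      throw new Error(`entry ${index} needs string imageId and alias`);
    }
    return { imageId, alias };
  });
}
```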
**Constraints:**
```sql
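-- Caveat: the first check rejects the ON DELETE SET NULL update on
-- output_image_id for rows with status = 'success', so hard-deleting the
-- output image of a successful generation fails.
ALTER TABLE generations
  ADD CHECK ((status = 'success' AND output_image_id IS NOT NULL)
    OR (status != 'success')),
  ADD CHECK ((status = 'failed' AND error_message IS NOT NULL)
    OR (status != 'failed')),
  ADD CHECK (retry_count >= 0),
  ADD CHECK (processing_time_ms IS NULL OR processing_time_ms >= 0),
  ADD CHECK (cost IS NULL OR cost >= 0);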
```
**Indexes:**
```sql
CREATE INDEX idx_generations_project_status
ON generations(project_id, status, created_at DESC);
CREATE INDEX idx_generations_flow
ON generations(flow_id, created_at DESC)
WHERE flow_id IS NOT NULL;
CREATE INDEX idx_generations_output ON generations(output_image_id);
CREATE INDEX idx_generations_request ON generations(request_id);
```
---
### 7. PROMPT_URL_CACHE
```typescript
prompt_url_cache {
id: UUID (PK)
// Relations
project_id: UUID (FK -> projects) CASCADE
generation_id: UUID (FK -> generations) CASCADE
image_id: UUID (FK -> images) CASCADE
// Cache keys (SHA-256 hashes)
prompt_hash: VARCHAR(64)
query_params_hash: VARCHAR(64)
// Original request (for debugging/reconstruction)
original_prompt: TEXT
request_params: JSONB // { width, height, aspectRatio, template, ... }
// Cache statistics
hit_count: INTEGER DEFAULT 0
last_hit_at: TIMESTAMP
// Audit
created_at: TIMESTAMP
}
```
**Purpose:** Deduplication and caching for Prompt URL feature
**Key Design Decisions:**
- Composite unique key: `project_id + prompt_hash + query_params_hash`
- No `expires_at` - cache lives forever unless manually cleared
- Tracks `hit_count` for analytics
**Constraints:**
```sql
ALTER TABLE prompt_url_cache ADD CHECK (hit_count >= 0);
```
**Indexes:**
```sql
CREATE UNIQUE INDEX idx_cache_key
ON prompt_url_cache(project_id, prompt_hash, query_params_hash);
CREATE INDEX idx_cache_generation ON prompt_url_cache(generation_id);
CREATE INDEX idx_cache_image ON prompt_url_cache(image_id);
CREATE INDEX idx_cache_hits
ON prompt_url_cache(project_id, hit_count DESC, created_at DESC);
```
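The composite cache key could be derived as sketched below, using Node's built-in `crypto` module for SHA-256. The helper names are hypothetical, and the canonicalization of query parameters (sorting keys before hashing) is an assumption; this document does not specify how `query_params_hash` is computed.

```typescript
import { createHash } from "node:crypto";

type RequestParams = Record<string, string | number | boolean>;

// SHA-256 as a 64-character hex string, matching the VARCHAR(64) columns.
function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Assumption: keys are sorted so that parameter order does not change the hash.
function canonicalizeParams(params: RequestParams): string {
  return Object.keys(params)
    .sort()
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(params[key]))}`)
    .join("&");
}

interface CacheKey {
  projectId: string;
  promptHash: string;
  queryParamsHash: string;
}

// Builds the (project_id, prompt_hash, query_params_hash) lookup key.
function buildCacheKey(projectId: string, prompt: string, params: RequestParams): CacheKey {
  return {
    projectId,
    promptHash: sha256Hex(prompt),
    queryParamsHash: sha256Hex(canonicalizeParams(params)),
  };
}
```

Without a canonical ordering, two URLs that differ only in parameter order would produce separate cache rows and trigger duplicate generations.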
---
## 🔗 Relationships Summary
### One-to-Many (1:M)
1. **organizations → projects** (CASCADE)
2. **organizations → api_keys** (CASCADE)
3. **projects → api_keys** (CASCADE)
4. **projects → flows** (CASCADE)
5. **projects → images** (CASCADE)
6. **projects → generations** (CASCADE)
7. **projects → prompt_url_cache** (CASCADE)
8. **flows → images** (CASCADE)
9. **flows → generations** (SET NULL)
10. **generations → images** (SET NULL) - output image
11. **api_keys → images** (SET NULL) - who created
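12. **api_keys → generations** (SET NULL) - who requested
13. **images → generations** (SET NULL) - via `generations.output_image_id`
14. **generations → prompt_url_cache** (CASCADE)
15. **images → prompt_url_cache** (CASCADE)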
### Cascade Rules
**ON DELETE CASCADE:**
- Deleting organization → deletes all projects, api_keys
- Deleting project → deletes all flows, images, generations, cache entries, and project-level api_keys
- Deleting flow → deletes all flow-scoped images
- Deleting generation → deletes its `prompt_url_cache` entries
- Deleting image → deletes its `prompt_url_cache` entries
**ON DELETE SET NULL:**
- Deleting generation → sets `images.generation_id` to NULL
- Deleting image → sets `generations.output_image_id` to NULL
- Deleting flow → sets `generations.flow_id` to NULL
- Deleting api_key → sets audit references to NULL
---
## 🎯 Alias System
### Two-Tier Alias Scope
#### Project-Scoped (Global)
- **Storage:** `images.alias` column
- **Lifetime:** Permanent (until image deleted)
- **Visibility:** Across entire project
- **Examples:** `@logo`, `@brand`, `@header`
- **Use Case:** Reusable brand assets
#### Flow-Scoped (Temporary)
- **Storage:** `flows.aliases` JSONB
- **Lifetime:** Duration of flow
- **Visibility:** Only within specific flow
- **Examples:** `@hero`, `@product`, `@variant`
- **Use Case:** Conversational generation chains
#### Technical Aliases (Computed)
- **Storage:** None (computed on-the-fly)
- **Types:**
  - `@last` - Last generation in flow (any status)
  - `@first` - First generation in flow
  - `@upload` - Last uploaded image in flow
- **Implementation:** Query-based resolution
### Resolution Algorithm
```
1. Check if technical alias (@last, @first, @upload) → compute from flow data
2. Check flow.aliases for flow-scoped alias → return if found
3. Check images.alias for project-scoped alias → return if found
4. Return null (alias not found)
```
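The four steps above can be sketched in memory as follows. This is an illustration under assumptions, not the service's implementation: the `FlowContext` shape and all function names are hypothetical, `@last` and `@first` are taken to resolve to the output image of the matching generation (which may be null), and real code would query the database instead.

```typescript
interface GenerationRecord { outputImageId: string | null; createdAt: number; }
interface UploadRecord { imageId: string; createdAt: number; }

// Hypothetical in-memory view of one flow.
interface FlowContext {
  aliases: Record<string, string>; // mirrors flows.aliases
  generations: GenerationRecord[];
  uploads: UploadRecord[];
}

// Picks the item with the smallest or largest createdAt, or undefined for an empty list.
function pickByTime<T extends { createdAt: number }>(items: T[], newest: boolean): T | undefined {
  return items.reduce<T | undefined>((best, item) => {
    if (best === undefined) return item;
    const better = newest ? item.createdAt > best.createdAt : item.createdAt < best.createdAt;
    return better ? item : best;
  }, undefined);
}

function resolveAlias(
  alias: string,
  flow: FlowContext | null,
  projectAliases: Record<string, string>, // mirrors images.alias lookups
): string | null {
  // Step 1: technical aliases are computed from flow data.
  if (alias === "@last" || alias === "@first" || alias === "@upload") {
    if (flow === null) return null;
    if (alias === "@upload") {
      const upload = pickByTime(flow.uploads, true);
      return upload ? upload.imageId : null;
    }
    const generation = pickByTime(flow.generations, alias === "@last");
    return generation ? generation.outputImageId : null;
  }
  // Step 2: flow-scoped aliases.
  const flowHit = flow?.aliases[alias];
  if (flowHit !== undefined) return flowHit;
  // Steps 3 and 4: project-scoped aliases, then null.
  return projectAliases[alias] ?? null;
}
```

Checking technical aliases first means a user-assigned alias can never shadow `@last`, `@first`, or `@upload`.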
---
## 🔧 Dual Alias Assignment
### Uploads
```typescript
POST /api/images/upload
{
file: <binary>,
alias: "@product", // Project-scoped (optional)
flowAlias: "@hero", // Flow-scoped (optional)
flowId: "uuid" // Required if flowAlias provided
}
```
**Result:**
- If `alias` provided → set `images.alias = "@product"`
- If `flowAlias` provided → add to `flows.aliases["@hero"] = imageId`
- Can have both simultaneously
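The request rules above can be checked before any row is written. The sketch below is hypothetical: `validateUploadAliases` is not an existing function, applying the `images.alias` format to `flowAlias` is an assumption, and so is rejecting the technical alias names for user assignment.

```typescript
// Alias-related fields of the upload request body.
interface UploadAliasParams {
  alias?: string;
  flowAlias?: string;
  flowId?: string;
}

const ALIAS_FORMAT = /^@[a-zA-Z0-9_-]+$/;
// Assumption: technical aliases are reserved and cannot be assigned by users.
const TECHNICAL_ALIASES = new Set(["@last", "@first", "@upload"]);

// Returns a list of problems; an empty list means the request is acceptable.
function validateUploadAliases(params: UploadAliasParams): string[] {
  const errors: string[] = [];
  const named: Array<[string, string | undefined]> = [
    ["alias", params.alias],
    ["flowAlias", params.flowAlias],
  ];
  for (const [field, value] of named) {
    if (value === undefined) continue;
    if (!ALIAS_FORMAT.test(value)) {
      errors.push(`${field} must match ${ALIAS_FORMAT.source}`);
    } else if (TECHNICAL_ALIASES.has(value)) {
      errors.push(`${field} cannot be a technical alias`);
    }
  }
  if (params.flowAlias !== undefined && params.flowId === undefined) {
    errors.push("flowId is required when flowAlias is provided");
  }
  return errors;
}
```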
### Generations
```typescript
POST /api/generations
{
prompt: "hero image",
assignAlias: "@brand", // Project-scoped (optional)
assignFlowAlias: "@hero", // Flow-scoped (optional)
flowId: "uuid"
}
```
**Result (after successful generation):**
- If `assignAlias` → set `images.alias = "@brand"` on output image
- If `assignFlowAlias` → add to `flows.aliases["@hero"] = outputImageId`
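The assignment step after a successful generation can be sketched as a pure function over in-memory records. The types and the `assignAliases` name are hypothetical, and the sketch ignores the case where another image in the project already holds the same alias.

```typescript
interface OutputImage { id: string; alias: string | null; }
interface Flow { id: string; aliases: Record<string, string>; }
interface AliasOptions { assignAlias?: string; assignFlowAlias?: string; }

// Returns updated copies; the inputs are left unmodified.
function assignAliases(
  image: OutputImage,
  flow: Flow | null,
  options: AliasOptions,
): { image: OutputImage; flow: Flow | null } {
  // Project scope: stored on the image row itself (images.alias).
  const updatedImage =
    options.assignAlias !== undefined ? { ...image, alias: options.assignAlias } : image;

  // Flow scope: stored as a key in flows.aliases pointing at the image id.
  let updatedFlow = flow;
  if (options.assignFlowAlias !== undefined) {
    if (flow === null) throw new Error("assignFlowAlias requires a flow");
    updatedFlow = {
      ...flow,
      aliases: { ...flow.aliases, [options.assignFlowAlias]: image.id },
    };
  }
  return { image: updatedImage, flow: updatedFlow };
}
```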
---
## 📊 Performance Optimizations
### Critical Indexes
All indexes are listed in the individual table sections above. Key performance considerations:
1. **Alias Lookup:** Partial unique index on `images(project_id, alias)`, limited to rows where `alias IS NOT NULL AND deleted_at IS NULL AND flow_id IS NULL`
2. **Flow Activity:** Composite index on `generations(flow_id, created_at)`
3. **Cache Hit:** Unique composite on `prompt_url_cache(project_id, prompt_hash, query_params_hash)`
4. **Audit Queries:** Indexes on `api_key_id` columns (not yet defined in the table sections above; they need to be added to `images` and `generations`)
### Denormalization
**Avoided intentionally:**
- No counters (image_count, generation_count)
- Computed via COUNT(*) queries with proper indexes
- Simpler, more reliable, less trigger overhead
---
## 🧹 Data Lifecycle
### Soft Delete
**Tables with soft delete:**
- `images` - via `deleted_at` column
**Cleanup strategy:**
- Hard delete after 30 days of soft delete
- Implemented via cron job or manual cleanup script
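The 30-day retention rule can be expressed as a small predicate that a cleanup job would apply to each soft-deleted row. This is a sketch only; `isPurgeable` and the constant names are hypothetical.

```typescript
const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when an image was soft-deleted at least RETENTION_DAYS before `now`.
function isPurgeable(deletedAt: Date | null, now: Date): boolean {
  if (deletedAt === null) return false; // not soft-deleted
  return now.getTime() - deletedAt.getTime() >= RETENTION_DAYS * MS_PER_DAY;
}
```

Passing `now` as a parameter instead of reading the clock inside the function keeps the rule easy to test.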
### Hard Delete
**Tables with hard delete:**
- `generations` - cascade deletes
- `flows` - cascade deletes
- `prompt_url_cache` - cascade deletes
---
## 🔐 Security & Audit
### API Key Tracking
All mutations tracked via `api_key_id`:
- `images.api_key_id` - who uploaded/generated
- `generations.api_key_id` - who requested generation
### Request Correlation
- `generations.request_id` - correlate with application logs
- `generations.user_agent` - client identification
- `generations.ip_address` - rate limiting, abuse prevention
---
## 🚀 Migration Strategy
### Phase 1: Core Tables
1. Create `flows` table
2. Create `images` table
3. Create `generations` table
4. Add all indexes and constraints
5. Migrate existing MinIO data to `images` table
### Phase 2: Advanced Features
1. Create `prompt_url_cache` table
2. Add indexes
3. Implement cache warming for existing data (optional)
---
## 📝 Design Decisions Log
### Why JSONB for `flows.aliases`?
- Simple key-value structure
- No need for JOINs
- Flexible schema
- Atomic updates
- Trade-off: No referential integrity (acceptable for temporary data)
### Why JSONB for `generations.referenced_images`?
- Reference info is append-only
- No need for complex queries on references
- Simpler schema (one less table)
- Trade-off: No CASCADE on image deletion (acceptable)
### Why no `namespaces`?
- Adds complexity without clear benefit for MVP
- Flow-scoped + project-scoped aliases sufficient
- Can add later if needed
### Why no `generation_groups`?
- Not needed for core functionality
- Grouping can be done via tags or meta JSONB
- Can add later if analytics requires it
### Why `focal_point` as JSONB?
- Imageflow server expects normalized coordinates
- Format: `{ "x": 0.0-1.0, "y": 0.0-1.0 }`
- JSONB allows future extension (e.g., multiple focal points)
### Why track `api_key_id` in images/generations?
- Essential for audit trail
- Cost attribution per key
- Usage analytics
- Abuse detection
---
## 📚 References
- **Imageflow Focal Points:** https://docs.imageflow.io/querystring/focal-point
- **Drizzle ORM:** https://orm.drizzle.team/
- **PostgreSQL JSONB:** https://www.postgresql.org/docs/current/datatype-json.html