Technical Architecture and Functional Specification
Date: 2025-11-01
Version: 1.0
Status: ✅ Validated (current technical architecture)
Related docs: strategy/07-validated-icp-ai-developers.md, execution/09-mvp-scope.md, execution/12-the-current-tech-state.md
Platform Overview
Banatie is an API-first platform for programmatic generation and delivery of production-ready media assets. Unlike standalone image generators, Banatie combines AI image generation (powered by Google Gemini) with a complete production delivery pipeline (CDN, hosting, transformations).
Target audience: To be validated. Working hypothesis: developers, webmasters, and SaaS creators who need a single, optimized solution for automating content creation and embedding.
Technology Stack
| Component | Technology | Role in Architecture |
|---|---|---|
| Core Synthesizer | Gemini 2.5 Flash Image | High-speed image synthesis engine |
| AI Agent Models | Gemini 2.5 Flash (and other fast models) | Prompt Enhancement (prompt optimization) and Asset Analysis (metadata extraction and focal point detection) |
| Backend & API Gateway | Express (Node.js) | High-performance REST API server and Flow-Based Generation logic |
| Frontend & UI | Next.js | Main website, documentation, demo UIs |
| Account Management | nextjs/saas-starter (Template) | Foundation for auth architecture, organizations, and projects |
| Object Storage | MinIO (S3-compatible) | Primary, highly available storage for generated and uploaded assets |
| Image Transformation | Imageflow-Server | Dynamic asset transformation (resize, crop, format) via Query Params |
| Content Delivery (CDN) | Cloudflare | Global caching and optimized delivery of transformed images |
| Database | PostgreSQL | Relational storage for generation metadata, users, projects, and billing |
| Deployment | Docker / VPS | Containerization and service hosting |
Core Generation & Delivery Flow
The pipeline has six stages, which together turn a raw prompt into a production-ready asset:
Stage 1: User Input
Receive unstructured prompt (in any language) and additional parameters (style, aspect ratio).
Stage 2: Prompt Enhancement (AI Agent)
A specialized agent analyzes, translates, and optimizes the prompt, taking into account the selected style and Gemini best practices, to produce a detailed, highly effective request.
Stage 3: Core Image Synthesis
Optimized prompt is sent to Gemini API for image generation.
Stage 4: Asset Analysis & Metadata Extraction
Second AI agent analyzes the generated image, identifying the focal point and key metadata needed for proper automatic cropping/transformation.
Stage 5: Asset Persistence & Indexing
Image is saved to MinIO. Metadata (prompts, parameters, focal point) is indexed in PostgreSQL.
Stage 6: Production URL & Delivery
A permanent, cacheable URL is generated. On request, the image passes through Imageflow-Server (transformation) and is cached in Cloudflare CDN. The API response also includes a set of common transformation presets for convenient layout integration.
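The six stages can be sketched as a single orchestration function. This is a minimal illustration, not the actual Banatie implementation: every type and function name below (`GenerationRequest`, `PipelineDeps`, `runPipeline`, and so on) is hypothetical, and each stage is injected as a dependency so the Gemini, MinIO, and PostgreSQL calls can be supplied or mocked separately.

```typescript
// Hypothetical types: none of these names come from the Banatie codebase.
interface GenerationRequest {
  prompt: string;        // Stage 1: unstructured prompt, any language
  style?: string;
  aspectRatio?: string;
}

interface FocalPoint {
  x: number; // assumed to be normalized to 0..1
  y: number;
}

interface AssetMetadata {
  request: GenerationRequest;
  enhancedPrompt: string;
  focalPoint: FocalPoint;
}

interface StoredAsset extends AssetMetadata {
  id: string;
  url: string;
}

// Each stage is a dependency, so real services or test stubs can be plugged in.
interface PipelineDeps {
  enhancePrompt(req: GenerationRequest): Promise<string>;             // Stage 2
  synthesize(enhancedPrompt: string): Promise<Uint8Array>;            // Stage 3
  analyze(image: Uint8Array): Promise<FocalPoint>;                    // Stage 4
  persist(image: Uint8Array, meta: AssetMetadata): Promise<string>;   // Stage 5, returns asset id
  publicUrl(assetId: string): string;                                 // Stage 6
}

async function runPipeline(
  req: GenerationRequest,
  deps: PipelineDeps
): Promise<StoredAsset> {
  const enhancedPrompt = await deps.enhancePrompt(req);
  const image = await deps.synthesize(enhancedPrompt);
  const focalPoint = await deps.analyze(image);
  const meta: AssetMetadata = { request: req, enhancedPrompt, focalPoint };
  const id = await deps.persist(image, meta);
  return { ...meta, id, url: deps.publicUrl(id) };
}
```

The stages run strictly in sequence because each one consumes the output of the previous one. Transformation and CDN caching are not part of this function, since they happen later, when the returned URL is requested.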
Core Differentiating Features
| Feature | Description | Developer Value |
|---|---|---|
| Flow-Based Chained Generation | Programmatic sequence of generations where each new generation has access to context and results from previous Flow steps | Enables creation of complex, logically connected asset sets (character iterations, game assets) |
| On-Demand Generation via URL | Image generation triggered by GET request to URL with prompt in Query Params. Repeated requests return cached asset | Allows LLM agents to generate HTML pages with ready-made, optimized images |
| Contextual Asset Referencing | Ability to assign names to assets (@logo) and use these names directly in text prompts to pass reference images to the model | Simplifies Inpainting/Outpainting and content creation tied to brand or existing elements |
| Image Transformation Pipeline | Dynamic image transformation (resize, aspect ratio change, focal point cropping, formats) via Query Params in CDN link | Eliminates manual image processing, ensuring optimal load speed and quality across all devices |
| Namespaces & Styles | Virtual asset separation in projects with ability to set common system prompts and styles for visual consistency | Ideal for managing brand guidelines or styling different website sections |
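To illustrate how On-Demand Generation via URL and the transformation query parameters might combine, here is a small URL builder. It is a sketch under stated assumptions: the endpoint path and the parameter names (`prompt`, `w`, `h`, `format`) are placeholders chosen for illustration, not the documented Banatie URL scheme.

```typescript
// Sketch only: the endpoint and the parameter names (prompt, w, h, format)
// are assumptions for illustration, not the documented Banatie URL scheme.
interface TransformOptions {
  width?: number;
  height?: number;
  format?: "webp" | "jpg" | "png";
}

function buildOnDemandUrl(
  endpoint: string,
  prompt: string,
  transform: TransformOptions = {}
): string {
  // Parameters are emitted in a fixed order so that identical requests
  // produce identical URLs and can map to the same cache entry.
  const params: Array<[string, string]> = [["prompt", prompt]];
  if (transform.width !== undefined) params.push(["w", String(transform.width)]);
  if (transform.height !== undefined) params.push(["h", String(transform.height)]);
  if (transform.format !== undefined) params.push(["format", transform.format]);

  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `${endpoint}?${query}`;
}

const exampleUrl = buildOnDemandUrl(
  "https://cdn.example.com/generate",
  "red fox logo",
  { width: 400, format: "webp" }
);
// exampleUrl === "https://cdn.example.com/generate?prompt=red%20fox%20logo&w=400&format=webp"
```

Because the URL is a pure function of the prompt and the options, an LLM agent can place it directly in an `<img src>` attribute. Under this scheme the first request would trigger generation and later requests would be served from cache.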
Integration Channels
REST API
Primary channel providing full access to all features.
JS/TS SDK
High-level wrapper for convenient programmatic work with Flow-Based Generation.
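The SDK surface is not specified in this document, so the following is only a sketch of what a Flow wrapper could look like. The `Flow` class, `FlowStep` type, and `GenerateFn` callback are hypothetical names. The one idea taken from the feature description is that each step receives the results of the earlier steps as context.

```typescript
// Hypothetical SDK shape: Flow, FlowStep and GenerateFn are illustrative names.
interface FlowStep {
  name: string;
  prompt: string;
  assetUrl: string;
}

// Stand-in for the API call: receives the prompt plus all previous steps.
type GenerateFn = (
  prompt: string,
  previous: readonly FlowStep[]
) => Promise<string>;

class Flow {
  private readonly steps: FlowStep[] = [];
  private readonly generate: GenerateFn;

  constructor(generate: GenerateFn) {
    this.generate = generate;
  }

  // Runs one generation with a snapshot of the earlier steps as context,
  // then records the result so later steps can build on it.
  async step(name: string, prompt: string): Promise<FlowStep> {
    const assetUrl = await this.generate(prompt, this.steps.slice());
    const result: FlowStep = { name, prompt, assetUrl };
    this.steps.push(result);
    return result;
  }

  history(): readonly FlowStep[] {
    return this.steps.slice();
  }
}
```

A caller would create one `Flow` per asset set, for example a character and its variations, and call `step` once per generation. Passing a copy of the history keeps the generator callback from mutating the flow's internal state.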
Model Context Protocol (MCP)
MCP server exposing Banatie to LLMs and AI agents, optimized for contextual and sequential requests.
User Interface (UI)
Web interface for testing and debugging. Every generation includes Code Snippets for API, SDK, and MCP.
Authorization
Based on API keys (apikey). Each key is associated with an Organization/Project pair for access control and billing isolation.
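A minimal sketch of the key-to-scope check, assuming an in-memory map in place of the PostgreSQL lookup. The `KeyScope` and `AuthResult` types and the `authorize` function are hypothetical. How the key is transported (header name, query parameter) is left out because this document does not specify it.

```typescript
// Hypothetical types: the real schema and key transport are not specified here.
interface KeyScope {
  organizationId: string;
  projectId: string;
}

type AuthResult =
  | { ok: true; scope: KeyScope }
  | { ok: false; reason: "missing_key" | "unknown_key" | "project_mismatch" };

// `keys` stands in for a database lookup. A production system would
// typically store only a hash of each key, which this sketch omits.
function authorize(
  keys: Map<string, KeyScope>,
  apiKey: string | undefined,
  requestedProjectId: string
): AuthResult {
  if (!apiKey) return { ok: false, reason: "missing_key" };

  const scope = keys.get(apiKey);
  if (scope === undefined) return { ok: false, reason: "unknown_key" };

  // A key grants access only to the project it was issued for.
  if (scope.projectId !== requestedProjectId) {
    return { ok: false, reason: "project_mismatch" };
  }
  return { ok: true, scope };
}
```

Returning the resolved scope lets downstream code attribute usage to the right organization and project, which is what makes billing isolation possible.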
MVP Release Strategy
For the first public release, full functionality is required in the following key areas:
1. Core Generation
Fully functional Prompt Enhancement and Asset Persistence.
2. Delivery Pipeline
Working Image Transformation Pipeline with CDN, generating production-ready links.
3. Unique Features
On-Demand Generation via URL and basic Contextual Asset Referencing (@logo).
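Basic Contextual Asset Referencing could start with a parser that finds `@name` tokens in a prompt and maps them to stored assets. The sketch below is an assumption about how that might work: the token grammar (a letter followed by letters, digits, `_`, or `-`, preceded by start of text or whitespace) and the names `extractAssetNames` and `resolveReferences` are illustrative, not taken from the Banatie codebase.

```typescript
// Assumed token grammar: "@" + letter + [letters, digits, "_", "-"],
// only at the start of the prompt or after whitespace (so emails are skipped).
function extractAssetNames(prompt: string): string[] {
  const pattern = /(^|\s)@([A-Za-z][A-Za-z0-9_-]*)/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    const name = match[2];
    if (names.indexOf(name) === -1) names.push(name); // keep first occurrence only
  }
  return names;
}

interface ResolvedReferences {
  referenceUrls: string[]; // assets to pass to the model as reference images
  missing: string[];       // names used in the prompt but not registered
}

// `registry` stands in for a per-project lookup of named assets.
function resolveReferences(
  prompt: string,
  registry: Map<string, string>
): ResolvedReferences {
  const referenceUrls: string[] = [];
  const missing: string[] = [];
  for (const name of extractAssetNames(prompt)) {
    const url = registry.get(name);
    if (url === undefined) missing.push(name);
    else referenceUrls.push(url);
  }
  return { referenceUrls, missing };
}
```

Reporting unknown names separately, rather than silently dropping them, lets the API return a clear error when a prompt refers to an asset that was never registered.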
4. Authorization & Billing
Fully functional API Key system and Free Tier with usage limit enforcement.
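Usage limit enforcement for the Free Tier could be as simple as a per-project counter keyed by billing period. The following sketch assumes a monthly quota counted in generations and keeps the counters in memory. The quota unit, the monthly period, and the `FreeTierLimiter` class are all assumptions made for illustration, and a real deployment would persist counters in the database.

```typescript
// Assumptions: quota is counted per project, per calendar month (UTC),
// one unit per generation. Counters live in memory for illustration only.
class FreeTierLimiter {
  private readonly usage = new Map<string, number>();
  private readonly monthlyLimit: number;

  constructor(monthlyLimit: number) {
    this.monthlyLimit = monthlyLimit;
  }

  // Returns true and records one unit of usage if the project is under
  // its limit for the current month. Returns false without recording otherwise.
  tryConsume(projectId: string, now: Date = new Date()): boolean {
    const period = now.toISOString().slice(0, 7); // "YYYY-MM" in UTC
    const key = `${projectId}:${period}`;
    const used = this.usage.get(key) ?? 0;
    if (used >= this.monthlyLimit) return false;
    this.usage.set(key, used + 1);
    return true;
  }
}
```

Keying the counter by period means no reset job is needed: a new month simply starts from a fresh key. The check would run after authorization and before the generation call, so rejected requests never reach the model.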
Document owner: Oleg (technical lead)
Last updated: 2025-11-01
Next review: After ICP validation