Technical Architecture and Functional Specification

Date: 2025-11-01
Version: 1.0
Status: Validated (current technical architecture)
Related docs: strategy/07-validated-icp-ai-developers.md, execution/09-mvp-scope.md, execution/12-the-current-tech-state.md


Platform Overview

Banatie is an API-first platform for programmatic generation and delivery of production-ready media assets. Unlike standalone image generators, Banatie combines AI image generation (powered by Google Gemini) with a complete production delivery pipeline (CDN, hosting, transformations).

Target audience: still to be investigated. Working hypothesis: developers, webmasters, and SaaS creators who need a comprehensive, optimized solution for automating content creation and embedding.


Technology Stack

| Component | Technology | Role in Architecture |
|---|---|---|
| Core Synthesizer | Gemini 2.5 Flash Image | High-speed image synthesis engine |
| AI Agent Models | Gemini 2.5 Flash (and other fast models) | Prompt Enhancement (prompt optimization) and Asset Analysis (metadata extraction and focal point detection) |
| Backend & API Gateway | Express (Node.js) | High-performance REST API server and Flow-Based Generation logic |
| Frontend & UI | Next.js | Main website, documentation, demo UIs |
| Account Management | nextjs/saas-starter (Template) | Foundation for auth architecture, organizations, and projects |
| Object Storage | MinIO (S3-compatible) | Primary, highly-available storage for generated and uploaded assets |
| Image Transformation | Imageflow-Server | Dynamic asset transformation (resize, crop, format) via Query Params |
| Content Delivery (CDN) | Cloudflare | Global caching and optimized delivery of transformed images |
| Database | PostgreSQL | Relational storage for generation metadata, users, projects, and billing |
| Deployment | Docker / VPS | Containerization and service hosting |

Core Generation & Delivery Flow

The pipeline is divided into 6 stages to ensure production-ready assets:

Stage 1: User Input

Receive unstructured prompt (in any language) and additional parameters (style, aspect ratio).

Stage 2: Prompt Enhancement (AI Agent)

Specialized agent analyzes, translates, and optimizes the prompt (considering the selected style and Gemini best practices), producing a detailed, effective request.

Stage 3: Core Image Synthesis

Optimized prompt is sent to Gemini API for image generation.

Stage 4: Asset Analysis & Metadata Extraction

Second AI agent analyzes the generated image, identifying the focal point and key metadata needed for proper automatic cropping/transformation.
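To illustrate why the focal point matters for automatic cropping, here is a minimal sketch of a focal-point-aware crop calculation. The `FocalPoint` shape (normalized 0..1 coordinates) and the `focalCrop` function are assumptions made for illustration; this document does not specify how the focal point is stored or consumed.

```typescript
// Hypothetical metadata shape: focal point in normalized coordinates (0..1).
interface FocalPoint {
  x: number;
  y: number;
}

interface CropRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Returns the largest crop with the target aspect ratio (width / height)
// that fits inside the image, centered on the focal point as closely as
// the image bounds allow.
function focalCrop(
  imageWidth: number,
  imageHeight: number,
  focal: FocalPoint,
  targetAspect: number
): CropRect {
  if (imageWidth <= 0 || imageHeight <= 0 || targetAspect <= 0) {
    throw new RangeError("dimensions and aspect ratio must be positive");
  }
  const imageAspect = imageWidth / imageHeight;
  let width: number;
  let height: number;
  if (targetAspect >= imageAspect) {
    // Target is at least as wide as the image: keep full width, trim height.
    width = imageWidth;
    height = Math.round(imageWidth / targetAspect);
  } else {
    // Target is narrower than the image: keep full height, trim width.
    height = imageHeight;
    width = Math.round(imageHeight * targetAspect);
  }
  const left = clamp(Math.round(focal.x * imageWidth - width / 2), 0, imageWidth - width);
  const top = clamp(Math.round(focal.y * imageHeight - height / 2), 0, imageHeight - height);
  return { left, top, width, height };
}
```

For a 1600x900 image with the focal point at (0.8, 0.5), a square crop is shifted right until it touches the image edge, so the subject stays in frame instead of being cut off by a centered crop.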

Stage 5: Asset Persistence & Indexing

Image is saved to MinIO. Metadata (prompts, parameters, focal point) is indexed in PostgreSQL.

Stage 6: Production URL & Delivery

A permanent, cacheable URL is generated. On request, the image passes through Imageflow-Server (transformation) and is cached in Cloudflare CDN. The API response also includes a set of common transformation presets for convenient layout integration.
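A minimal sketch of how transformation presets could be expressed as query-string URLs. The host `cdn.example.com`, the preset names, and the helper functions are hypothetical. The parameter names (`width`, `height`, `mode`, `format`) follow the Imageflow querystring conventions as commonly documented and should be checked against the deployed Imageflow-Server configuration.

```typescript
// Hypothetical subset of transformation options.
interface Transform {
  width?: number;
  height?: number;
  mode?: "max" | "crop" | "pad";
  format?: "webp" | "jpg" | "png";
}

// Appends the transformation as query parameters in a fixed order,
// so the same transform always yields the same cacheable URL.
function transformUrl(assetUrl: string, t: Transform): string {
  const pairs: string[] = [];
  if (t.width !== undefined) pairs.push("width=" + t.width);
  if (t.height !== undefined) pairs.push("height=" + t.height);
  if (t.mode !== undefined) pairs.push("mode=" + t.mode);
  if (t.format !== undefined) pairs.push("format=" + t.format);
  return pairs.length === 0 ? assetUrl : assetUrl + "?" + pairs.join("&");
}

// Hypothetical presets; the actual preset set is not defined in this document.
const PRESETS: Record<string, Transform> = {
  thumbnail: { width: 200, height: 200, mode: "crop", format: "webp" },
  card: { width: 640, height: 360, mode: "crop", format: "webp" },
  hero: { width: 1920, format: "webp" },
};

// Builds the preset map that an API response could include next to the base URL.
function presetUrls(assetUrl: string): Record<string, string> {
  const urls: Record<string, string> = {};
  for (const name of Object.keys(PRESETS)) {
    urls[name] = transformUrl(assetUrl, PRESETS[name]);
  }
  return urls;
}
```

Emitting parameters in a fixed order is a deliberate choice in this sketch: a CDN typically keys its cache on the full URL, so two orderings of the same parameters would otherwise be cached twice.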


Core Differentiating Features

| Feature | Description | Developer Value |
|---|---|---|
| Flow-Based Chained Generation | Programmatic sequence of generations where each new generation has access to context and results from previous Flow steps | Enables creation of complex, logically connected asset sets (character iterations, game assets) |
| On-Demand Generation via URL | Image generation triggered by GET request to URL with prompt in Query Params. Repeated requests return cached asset | Allows LLM agents to generate HTML pages with ready-made, optimized images |
| Contextual Asset Referencing | Ability to assign names to assets (@logo) and use these names directly in text prompts to pass reference images to the model | Simplifies Inpainting/Outpainting and content creation tied to brand or existing elements |
| Image Transformation Pipeline | Dynamic image transformation (resize, aspect ratio change, focal point cropping, formats) via Query Params in CDN link | Eliminates manual image processing, ensuring optimal load speed and quality across all devices |
| Namespaces & Styles | Virtual asset separation in projects with ability to set common system prompts and styles for visual consistency | Ideal for managing brand guidelines or styling different website sections |
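Two of the features above, Contextual Asset Referencing and On-Demand Generation via URL, can be sketched with small helpers. Everything here is an assumption for illustration: the `@name` syntax rules, the `prompt` parameter name, and the endpoint `https://api.example.com/generate` are not specified in this document.

```typescript
// Finds @name references in a prompt, in order of first appearance,
// without duplicates. The leading group skips "@" that follows a word
// character, so email-like text ("me@site.com") is not treated as a reference.
function extractAssetRefs(prompt: string): string[] {
  const refs: string[] = [];
  const pattern = /(^|[^\w@])@([A-Za-z][\w-]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    const name = match[2];
    if (refs.indexOf(name) === -1) refs.push(name);
  }
  return refs;
}

// Builds a canonical on-demand URL: whitespace in the prompt is normalized
// and the remaining parameters are sorted by name, so equivalent requests
// map to the same URL and can hit the same cache entry.
function onDemandUrl(
  baseUrl: string,
  prompt: string,
  params: Record<string, string>
): string {
  const normalizedPrompt = prompt.trim().replace(/\s+/g, " ");
  const pairs = ["prompt=" + encodeURIComponent(normalizedPrompt)];
  for (const key of Object.keys(params).sort()) {
    pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
  }
  return baseUrl + "?" + pairs.join("&");
}
```

In this sketch, `extractAssetRefs("Place @logo on @bg, keep @logo small")` yields `["logo", "bg"]`, which a server could then resolve to stored reference images before calling the model.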

Integration Channels

REST API

Primary channel providing full access to all features.

JS/TS SDK

High-level wrapper for convenient programmatic work with Flow-Based Generation.
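As a rough illustration of what a Flow-Based Generation wrapper might look like, the sketch below runs steps sequentially and gives each step access to earlier results. The names `runFlow`, `FlowStep`, `Generate`, and `createStubGenerator` are hypothetical and do not describe the actual SDK surface; the stub stands in for a real API call.

```typescript
interface GeneratedAsset {
  id: string;
  url: string;
  prompt: string;
  referenceIds: string[];
}

// A generator receives the prompt plus all assets produced earlier in the flow.
type Generate = (prompt: string, references: GeneratedAsset[]) => Promise<GeneratedAsset>;

// A step builds its prompt from the results of the previous steps.
type FlowStep = (previous: GeneratedAsset[]) => string;

// Runs the steps one after another; each generation sees the flow's history.
async function runFlow(steps: FlowStep[], generate: Generate): Promise<GeneratedAsset[]> {
  const results: GeneratedAsset[] = [];
  for (const step of steps) {
    const prompt = step(results);
    // Pass a copy so the generator cannot mutate the flow's history.
    const asset = await generate(prompt, results.slice());
    results.push(asset);
  }
  return results;
}

// Stub generator for local experiments; it fabricates ids and example URLs
// instead of calling a real service.
function createStubGenerator(): Generate {
  let counter = 0;
  return async (prompt, references) => {
    counter += 1;
    return {
      id: "asset-" + counter,
      url: "https://cdn.example.com/asset-" + counter + ".png",
      prompt,
      referenceIds: references.map((r) => r.id),
    };
  };
}
```

A two-step flow such as `runFlow([() => "a knight, front view", (prev) => "same character as " + prev[0].id + ", side view"], createStubGenerator())` shows the intended chaining: the second prompt is derived from the first result.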

Model Context Protocol (MCP)

Specialized API/protocol for integration with LLMs and AI agents, optimized for contextual and sequential requests.

User Interface (UI)

Web interface for testing and debugging. Every generation includes Code Snippets for API, SDK, and MCP.

Authorization

Based on API keys (apikey). Each key is associated with an Organization/Project pair for access control and billing isolation.
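The key-to-scope association can be sketched as a lookup that returns the Organization/Project pair for a presented key. The `KeyStore` type, the sample key format, and both function names are assumptions for illustration; a real service would look keys up in PostgreSQL, ideally by a hash of the key, rather than in memory.

```typescript
interface KeyScope {
  organizationId: string;
  projectId: string;
}

// Hypothetical in-memory key store, keyed by the raw API key.
type KeyStore = { [apiKey: string]: KeyScope };

// Resolves the presented key to its scope, or null when the key is
// missing or unknown. hasOwnProperty guards against inherited names
// such as "toString" being accepted as keys.
function resolveScope(store: KeyStore, apiKey: string | undefined): KeyScope | null {
  if (!apiKey) return null;
  const key = apiKey.trim();
  return Object.prototype.hasOwnProperty.call(store, key) ? store[key] : null;
}

// A request may only touch resources inside the key's own
// Organization/Project pair.
function canAccess(scope: KeyScope, organizationId: string, projectId: string): boolean {
  return scope.organizationId === organizationId && scope.projectId === projectId;
}
```

Because every request is resolved to exactly one Organization/Project pair, the same scope can be used both for access checks and for attributing usage to the right billing account.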


MVP Release Strategy

For the first public release, full functionality is required in the following key areas:

1. Core Generation

Fully functional Prompt Enhancement and Asset Persistence.

2. Delivery Pipeline

Working Image Transformation Pipeline with CDN, generating production-ready links.

3. Unique Features

On-Demand Generation via URL and basic Contextual Asset Referencing (@logo).

4. Authorization & Billing

Fully functional API Key system and Free Tier with usage limit enforcement.
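Usage limit enforcement for a Free Tier can be sketched as a per-project monthly counter. This is an assumption-laden illustration: the `UsageMeter` class, the calendar-month period, and the limit passed to the constructor are hypothetical, and a production version would persist counters in the database rather than in memory.

```typescript
interface QuotaDecision {
  allowed: boolean;
  remaining: number;
}

// Counts generations per project and UTC calendar month.
class UsageMeter {
  private readonly monthlyLimit: number;
  private readonly counts: { [key: string]: number } = {};

  constructor(monthlyLimit: number) {
    this.monthlyLimit = monthlyLimit;
  }

  // Consumes one unit of quota if any is left in the current period.
  tryConsume(projectId: string, now: Date): QuotaDecision {
    const period = now.getUTCFullYear() + "-" + (now.getUTCMonth() + 1);
    const key = projectId + ":" + period;
    const used = this.counts[key] || 0;
    if (used >= this.monthlyLimit) {
      return { allowed: false, remaining: 0 };
    }
    this.counts[key] = used + 1;
    return { allowed: true, remaining: this.monthlyLimit - used - 1 };
  }
}
```

Keying the counter by project and period means the quota resets implicitly when a new month begins, without a scheduled reset job.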


Document owner: Oleg (technical lead)
Last updated: 2025-11-01
Next review: After ICP validation