# MVP Scope: Banatie for AI Developers
**Date:** October 20, 2025
**Target ICP:** AI-powered developers (Claude Code, Cursor users)
**Development Timeline:** 4-6 weeks
**Launch Goal:** First 5-10 beta users by end of November 2025
---
## MVP Philosophy
**Principle:** Build the MINIMUM to validate willingness to pay, not the complete vision.
**Goal:** Solve ONE core problem exceptionally well:
> "Generate production-ready images from Claude Code without context switching"
**Not the goal:** Build all planned features (Flow, Namespaces, On-demand URL generation)
---
## Core Value Proposition (MVP)
**What users get:**
1. MCP integration for Claude Code → generate images without leaving the environment
2. Prompt Enhancement → write in any language, get optimized results
3. Production CDN URLs → no manual download/upload/hosting
4. Contextual references → maintain consistency across assets (@logo, @hero)
5. Basic transformations → resize, format, optimize automatically
**What users DON'T get in MVP:**
- Flow-based chained generation (future)
- Namespaces / project organization (future)
- On-demand generation via URL (future)
- Advanced focal point analysis (future)
- Team collaboration features (future)
---
## MUST HAVE Features (Launch Blockers)
### 1. MCP Server for Claude Code ✅ CRITICAL
**Why it's critical:**
- This is the KILLER FEATURE
- Solves the #1 pain point (context switching)
- No competitor has this
- Differentiates from "just another AI image API"
**Functionality:**
```typescript
// MCP tools to implement, described by their input and result shapes.

// banatie_generate(input: GenerateInput) -> GenerateResult
interface GenerateInput {
  prompt: string;             // User's prompt (any language)
  name?: string;              // Optional name for reference (e.g., "logo")
  style?: string;             // Optional style preset
  aspectRatio?: string;       // e.g., "16:9", "1:1", "4:3"
  width?: number;             // Target width for optimization
  referenceImages?: string[]; // Array of @names to reference
}
interface GenerateResult {
  url: string;                             // "https://cdn.banatie.app/..."
  name?: string;                           // Present if a name was provided
  transformations: Record<string, string>; // Preset URLs
}

// banatie_upload(input: UploadInput) -> UploadResult
interface UploadInput {
  file: string; // Base64-encoded data or a URL
  name: string; // Required for referencing
}
interface UploadResult {
  name: string;
  url: string; // "https://cdn.banatie.app/..."
}

// banatie_list_images(input: ListImagesInput)
// Returns: array of previously generated/uploaded images
interface ListImagesInput {
  limit?: number;
}
```
**Implementation notes:**
- Follow MCP spec: https://modelcontextprotocol.io/
- Test with Claude Desktop + Claude Code
- Provide clear error messages
- Handle API key authentication
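
Because the MCP layer sits on top of the REST API, each tool call can be translated into an authenticated HTTP request. The sketch below shows one possible mapping for `banatie_generate`, with readable errors for a missing key or an empty prompt. The base URL `https://api.banatie.app` and the helper `buildGenerateRequest` are assumptions made for illustration; neither is defined in this document.

```typescript
// Hypothetical helper: maps banatie_generate arguments to a REST request.
// API_BASE is an assumed host; the real API host is not defined in this document.
const API_BASE = "https://api.banatie.app";

interface GenerateArgs {
  prompt: string;
  name?: string;
  style?: string;
  aspectRatio?: string;
  width?: number;
  referenceImages?: string[];
}

interface HttpRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildGenerateRequest(args: GenerateArgs, apiKey: string | undefined): HttpRequestSpec {
  // Fail early with messages the assistant can relay to the user as-is.
  if (!apiKey) {
    throw new Error("Banatie API key is missing.");
  }
  if (args.prompt.trim() === "") {
    throw new Error("Prompt is empty. Describe the image you want to generate.");
  }
  return {
    method: "POST",
    url: `${API_BASE}/v1/generate`,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(args),
  };
}

const generateRequest = buildGenerateRequest(
  { prompt: "hero image for an eco-friendly water bottle", aspectRatio: "16:9" },
  "bnt_xxx",
);
console.log(generateRequest.url);
```
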
**Time estimate:** 1-2 weeks
---
### 2. Prompt Enhancement Agent ✅ CRITICAL
**Why it's critical:**
- Solves #2 pain point (prompt engineering complexity)
- Massive value for non-native English speakers (Russian developers)
- Improves generation quality automatically
- Low-hanging fruit (you already have a working version)
**Functionality:**
**Input:** User's prompt (any language, casual description)
```
"маг эльфийской крови в средневековом городе на закате"
```
**Process:**
1. Detect language (if not English, translate)
2. Analyze intent and visual elements
3. Apply Gemini 2.5 Flash Image best practices:
   - Camera parameters for photorealism
   - Lighting descriptions
   - Composition guidelines
   - Style keywords
4. Generate optimized English prompt
**Output:** Production-ready prompt
```
"A photorealistic portrait of an elven-blooded wizard standing in a medieval European city street at golden hour sunset, warm amber lighting, shot with 85mm lens at f/2.8, shallow depth of field, detailed texture on stone buildings, volumetric light rays, cinematic composition"
```
**Implementation notes:**
- Use Gemini 2.0 Flash (fast, cheap)
- Include Google's official guidelines in agent prompt
- Cache common translations
- Show both original + enhanced prompt to user (educational)
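
One way to approach the caching note is to memoize enhancement results by a normalized form of the prompt. This is a minimal sketch, assuming an injected `enhance` function that stands in for the model call; `createCachedEnhancer`, the normalization rule, and the eviction policy are illustrative choices, and the real call would be asynchronous.

```typescript
// Hypothetical sketch: memoize enhancement results by normalized prompt.
type Enhancer = (prompt: string) => string;

interface EnhancementResult {
  original: string; // Shown to the user next to the enhanced prompt
  enhanced: string;
  cached: boolean;
}

function createCachedEnhancer(enhance: Enhancer, maxEntries = 1000) {
  const cache = new Map<string, string>();

  return (prompt: string): EnhancementResult => {
    // Normalize so differences in case or spacing hit the same entry.
    const key = prompt.trim().toLowerCase().replace(/\s+/g, " ");
    const hit = cache.get(key);
    if (hit !== undefined) {
      return { original: prompt, enhanced: hit, cached: true };
    }
    const enhanced = enhance(prompt);
    // Evict the oldest entry once the cache is full (Map keeps insertion order).
    if (cache.size >= maxEntries) {
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    cache.set(key, enhanced);
    return { original: prompt, enhanced, cached: false };
  };
}

// Stand-in for the real model call.
const fakeEnhance: Enhancer = (p) => `A photorealistic image of ${p}, golden hour lighting`;
const enhancePrompt = createCachedEnhancer(fakeEnhance);
console.log(enhancePrompt("elven wizard at sunset").cached);  // false
console.log(enhancePrompt("Elven  wizard at sunset").cached); // true
```
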
**Time estimate:** 3-5 days (since you already have a working version)
---
### 3. Asset Persistence + CDN URLs ✅ CRITICAL
**Why it's critical:**
- Solves #3 pain point (manual file management)
- "Production-ready" means hosted + optimized
- No competitor bundles this (they just return base64)
**Functionality:**
**Storage:**
- MinIO (S3-compatible) ✅ Already implemented
- Organized by user/project
- Permanent URLs (no expiry)
**CDN Delivery:**
- Cloudflare CDN integration
- Global caching
- Automatic optimization
**URL structure:**
```
https://cdn.banatie.app/u/{user_id}/{image_id}.webp?w=800&q=90
```
**Metadata stored (PostgreSQL):**
- image_id (UUID)
- user_id
- original_prompt (user's input)
- enhanced_prompt (after agent)
- generation_params (style, aspect ratio, etc.)
- reference_images (if used)
- name (if provided, for @references)
- created_at
- generation_cost (for billing)
**Time estimate:** 1 week (storage already done, need CDN setup)
---
### 4. Basic Image Transformations ✅ IMPORTANT
**Why it's important:**
- Solves #4 pain point (responsive images, formats)
- Demonstrates "production-ready" value
- Common use case (mobile vs. desktop)
**Functionality:**
**Via URL query parameters:**
```
?w=800 // Width resize
?h=600 // Height resize
?ar=16:9 // Aspect ratio crop
?f=webp // Format (webp, png, jpg, avif)
?q=85 // Quality (1-100)
?fit=cover // Fit mode (cover, contain, fill)
```
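
The parameters above can be parsed and validated before they reach the transformation service. The sketch below is one possible approach; `parseTransform` and the 4096-pixel cap on `w` and `h` are assumptions, not documented limits.

```typescript
// Sketch: parse and validate transformation parameters from a query string.
type ImageFormat = "webp" | "png" | "jpg" | "avif";
type FitMode = "cover" | "contain" | "fill";

interface TransformOptions {
  w?: number;
  h?: number;
  ar?: string;
  f?: ImageFormat;
  q?: number;
  fit?: FitMode;
}

const FORMATS: readonly string[] = ["webp", "png", "jpg", "avif"];
const FIT_MODES: readonly string[] = ["cover", "contain", "fill"];
const MAX_DIMENSION = 4096; // Assumed cap, not a documented limit

function parseBoundedInt(value: string | null, label: string, max: number): number | undefined {
  if (value === null) return undefined;
  if (!/^\d+$/.test(value)) throw new Error(`${label} must be a positive integer`);
  const n = Number(value);
  if (n < 1 || n > max) throw new Error(`${label} must be between 1 and ${max}`);
  return n;
}

function parseTransform(query: string): TransformOptions {
  const params = new URLSearchParams(query); // A leading "?" is ignored
  const options: TransformOptions = {};

  const w = parseBoundedInt(params.get("w"), "w", MAX_DIMENSION);
  if (w !== undefined) options.w = w;
  const h = parseBoundedInt(params.get("h"), "h", MAX_DIMENSION);
  if (h !== undefined) options.h = h;
  const q = parseBoundedInt(params.get("q"), "q", 100);
  if (q !== undefined) options.q = q;

  const ar = params.get("ar");
  if (ar !== null) {
    if (!/^\d+:\d+$/.test(ar)) throw new Error("ar must look like 16:9");
    options.ar = ar;
  }
  const f = params.get("f");
  if (f !== null) {
    if (!FORMATS.includes(f)) throw new Error(`f must be one of: ${FORMATS.join(", ")}`);
    options.f = f as ImageFormat;
  }
  const fit = params.get("fit");
  if (fit !== null) {
    if (!FIT_MODES.includes(fit)) throw new Error(`fit must be one of: ${FIT_MODES.join(", ")}`);
    options.fit = fit as FitMode;
  }
  return options;
}

console.log(parseTransform("?w=800&f=webp&q=85"));
```

Rejecting unknown values up front also keeps the transformation cache from filling with variants that differ only by invalid parameters.
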
**Preset transformations (returned in API response):**
```json
{
  "url": "https://cdn.banatie.app/u/123/img456.webp",
  "transformations": {
    "mobile": "...?w=400&f=webp&q=85",
    "tablet": "...?w=768&f=webp&q=85",
    "desktop": "...?w=1200&f=webp&q=90",
    "thumbnail": "...?w=150&h=150&fit=cover&f=webp"
  }
}
```
```
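
The preset URLs in the response above can be derived from the base CDN URL rather than stored. A minimal sketch, assuming a hypothetical helper `buildTransformations`; the preset values mirror the example response.

```typescript
// Sketch: derive preset transformation URLs from a base CDN URL.
const PRESETS = {
  mobile: { w: 400, f: "webp", q: 85 },
  tablet: { w: 768, f: "webp", q: 85 },
  desktop: { w: 1200, f: "webp", q: 90 },
  thumbnail: { w: 150, h: 150, fit: "cover", f: "webp" },
} as const;

type PresetName = keyof typeof PRESETS;

// Serializes parameters in insertion order, e.g. "w=400&f=webp&q=85".
function toQuery(params: Record<string, string | number>): string {
  return Object.entries(params)
    .map(([key, value]) => `${key}=${encodeURIComponent(String(value))}`)
    .join("&");
}

function buildTransformations(baseUrl: string): Record<PresetName, string> {
  const result = {} as Record<PresetName, string>;
  for (const preset of Object.keys(PRESETS) as PresetName[]) {
    result[preset] = `${baseUrl}?${toQuery(PRESETS[preset])}`;
  }
  return result;
}

const presetUrls = buildTransformations("https://cdn.banatie.app/u/123/img456.webp");
console.log(presetUrls.mobile);
// https://cdn.banatie.app/u/123/img456.webp?w=400&f=webp&q=85
```
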
**Implementation:**
- Imageflow-Server (as per tech spec) ✅ Planned
- OR Cloudflare Image Resizing (simpler for MVP)
- Cache transformed versions (don't regenerate)
**Time estimate:** 1 week
---
### 5. Contextual Asset Referencing (@name) ✅ UNIQUE FEATURE
**Why it's critical:**
- Solves #5 pain point (consistency across assets)
- UNIQUE to Banatie (no competitor has this)
- Enables powerful workflows (brand consistency, character consistency)
**Functionality:**
**Naming assets:**
```javascript
// Generate and name
banatie.generate("fictional water brand logo", {name: "logo"})
// Upload and name
banatie.upload("./logo.png", {name: "logo"})
```
**Referencing in future generations:**
```javascript
// Use @name in prompt
banatie.generate("product photo with @logo on table")
banatie.generate("hero banner with @logo in nature background")
```
**Behind the scenes:**
1. Parse prompt for @references
2. Fetch referenced images from storage
3. Include as image inputs to Gemini API
4. Generate with visual context
**Implementation notes:**
- Simple regex to find @names in prompts
- Replace @names with actual image references
- Support multiple @references in one prompt
- Show which references were used (transparency)
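
The parsing step can be sketched with a single regular expression. The naming rule used here (a letter followed by letters, digits, or underscores) and the helper `resolveReferences` are assumptions; the document does not define which characters a name may contain. The `Map` stands in for the metadata lookup.

```typescript
// Sketch: extract @references from a prompt and resolve them to stored image URLs.
interface ResolvedReferences {
  references: string[]; // Names that were found in storage, in order of first use
  imageUrls: string[];  // Image inputs to pass to the generation call
  missing: string[];    // Names used in the prompt with no stored asset
}

function resolveReferences(prompt: string, assets: Map<string, string>): ResolvedReferences {
  // "@" must not follow a word character, so addresses like me@example.com are ignored.
  const pattern = /(^|[^\w@])@([A-Za-z]\w*)/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    const refName = match[2];
    if (refName !== undefined && !names.includes(refName)) names.push(refName);
  }

  const references: string[] = [];
  const imageUrls: string[] = [];
  const missing: string[] = [];
  for (const refName of names) {
    const url = assets.get(refName);
    if (url === undefined) {
      missing.push(refName);
    } else {
      references.push(refName);
      imageUrls.push(url);
    }
  }
  return { references, imageUrls, missing };
}

const storedAssets = new Map([["logo", "https://cdn.banatie.app/u/123/img456.webp"]]);
console.log(resolveReferences("product photo with @logo next to @hero", storedAssets));
// references: ["logo"], missing: ["hero"]
```

Returning `references` and `missing` separately supports the transparency note: the response can list which references were used and which names did not match anything.
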
**Time estimate:** 3-5 days
---
### 6. REST API ✅ FOUNDATION
**Why it's critical:**
- MCP is built on top of this
- Enables direct integration for power users
- Future SDK/libraries depend on this
**Endpoints:**
```
POST /v1/generate
  Body: {
    prompt: string,
    name?: string,
    style?: string,
    aspectRatio?: string,
    width?: number,
    referenceImages?: string[]
  }
  Response: {
    id: string,
    url: string,
    enhanced_prompt: string,
    transformations: {...}
  }

POST /v1/upload
  Body: { file: base64, name: string }
  Response: { name: string, url: string }

GET /v1/images
  Query: ?limit=20
  Response: [ {...image objects} ]

GET /v1/images/:id
  Response: {...image object with metadata}

DELETE /v1/images/:id
  Response: { success: boolean }

GET /v1/account/usage
  Response: {
    generations_used: number,
    credits_remaining: number,
    tier: "free" | "credits" | "pro"
  }
```
**Authentication:**
- API key in header: `Authorization: Bearer bnt_xxx`
- Rate limiting (by tier)
- Usage tracking (for billing)
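
A minimal sketch of the header check follows. It assumes keys always carry the `bnt_` prefix shown in the example; `parseApiKey` and the error wording are hypothetical. Looking the key up (for example by its hash) and applying per-tier rate limits would happen after this step.

```typescript
// Sketch: extract and sanity-check the API key from an Authorization header.
type AuthResult =
  | { ok: true; apiKey: string }
  | { ok: false; status: 401; error: string };

function parseApiKey(header: string | undefined): AuthResult {
  if (!header) {
    return { ok: false, status: 401, error: "Missing Authorization header" };
  }
  // Accept "Bearer <token>" with any amount of whitespace in between.
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  const token = match?.[1];
  if (token === undefined) {
    return { ok: false, status: 401, error: "Expected 'Authorization: Bearer <key>'" };
  }
  if (!token.startsWith("bnt_")) {
    return { ok: false, status: 401, error: "API key must start with 'bnt_'" };
  }
  return { ok: true, apiKey: token };
}

console.log(parseApiKey("Bearer bnt_xxx")); // { ok: true, apiKey: "bnt_xxx" }
console.log(parseApiKey("Basic abc").ok);   // false
```
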
**Time estimate:** 1 week (partially done)
---
### 7. Simple UI / Playground ✅ IMPORTANT
**Why it's important:**
- First impression (users want to "see it work")
- Visual proof of quality
- Educational (shows code snippets)
- Low barrier to test
**Pages:**
**1. Homepage (Landing)**
- Value prop headline
- 3-step explanation (MCP → Generate → CDN)
- "Try Demo" CTA
- Features overview
- Pricing overview
**2. Demo/Playground**
- API key input (no registration needed for MVP)
- Prompt textarea (accepts Russian, English, etc.)
- Style dropdown (optional)
- Aspect ratio selector
- Generate button
- Results display:
  - Generated image
  - Original + enhanced prompt (side by side)
  - Transformation previews
  - **Code snippets panel** (cURL, Python, JS, MCP)
  - Copy-to-clipboard for URLs
**3. Dashboard (After API key entered)**
- Generation history (last 20)
- Usage stats (generations used, credits left)
- API key management
- Billing / credits (if applicable)
**Tech stack:**
- Next.js ✅ Already implemented
- Tailwind CSS ✅ Already used
- Simple, developer-focused design (not marketing fluff)
**Time estimate:** 1 week (refine existing demo UI)
---
### 8. Credit-Based Payment System ✅ REQUIRED FOR REVENUE
**Why it's critical:**
- Can't validate willingness to pay without payment system
- Credits model validated in ICP research
- Stripe integration straightforward
**Functionality:**
**Credit packs for purchase:**
- $20 = 200 generations (90-day expiry)
- $50 = 600 generations (90-day expiry)
- $100 = 1,500 generations (90-day expiry)
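
As a quick check of the volume discount, the effective price per generation for each pack can be computed directly. The pack values come from the list above; the type and function names are illustrative.

```typescript
// Worked arithmetic: effective price per generation for each credit pack.
interface CreditPack {
  priceUsd: number;
  generations: number;
}

const CREDIT_PACKS: CreditPack[] = [
  { priceUsd: 20, generations: 200 },
  { priceUsd: 50, generations: 600 },
  { priceUsd: 100, generations: 1500 },
];

function pricePerGeneration(pack: CreditPack): number {
  return pack.priceUsd / pack.generations;
}

for (const pack of CREDIT_PACKS) {
  console.log(`$${pack.priceUsd}: $${pricePerGeneration(pack).toFixed(4)} per generation`);
}
// $20: $0.1000 per generation
// $50: $0.0833 per generation
// $100: $0.0667 per generation
```

Relative to the $20 pack, the $50 pack is about 17% cheaper per generation and the $100 pack about 33% cheaper.
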
**Free tier:**
- 10 generations/month
- Resets monthly
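- Visible watermark on images (SynthID alone is not enough: it is an invisible watermark that Gemini embeds in every generated image)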
**Payment flow:**
1. User clicks "Buy Credits"
2. Stripe Checkout (hosted page)
3. Webhook on success → add credits to account
4. Credits deducted per generation
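
Steps 3 and 4 can be sketched as a small in-memory ledger. This is an illustration under stated assumptions, not the planned implementation: a purchase resets the 90-day expiry for the whole balance, expired leftovers are dropped, and `applyPurchase` and `consumeCredit` are hypothetical names. Crediting is keyed by the Stripe session ID because webhook events can be delivered more than once.

```typescript
// Sketch: in-memory credit ledger. A real version would run inside a database transaction.
const CREDIT_TTL_MS = 90 * 24 * 60 * 60 * 1000; // 90-day expiry

interface CreditAccount {
  balance: number;
  expiresAt: Date | null;
  processedSessions: Set<string>; // Stripe session IDs that were already credited
}

function newAccount(): CreditAccount {
  return { balance: 0, expiresAt: null, processedSessions: new Set() };
}

function isExpired(account: CreditAccount, now: Date): boolean {
  return account.expiresAt !== null && account.expiresAt.getTime() <= now.getTime();
}

// Step 3: called from the checkout.session.completed webhook handler.
// Returns false when the session was already processed (duplicate delivery).
function applyPurchase(account: CreditAccount, sessionId: string, generations: number, now: Date): boolean {
  if (account.processedSessions.has(sessionId)) return false;
  account.processedSessions.add(sessionId);
  // Assumption: expired leftovers are dropped before new credits are added.
  if (isExpired(account, now)) account.balance = 0;
  account.balance += generations;
  account.expiresAt = new Date(now.getTime() + CREDIT_TTL_MS);
  return true;
}

// Step 4: deduct one credit per generation. Returns false if none can be spent.
function consumeCredit(account: CreditAccount, now: Date): boolean {
  if (isExpired(account, now) || account.balance < 1) return false;
  account.balance -= 1;
  return true;
}

const demoAccount = newAccount();
const purchaseTime = new Date("2025-11-01T00:00:00Z");
applyPurchase(demoAccount, "cs_test_123", 200, purchaseTime);
applyPurchase(demoAccount, "cs_test_123", 200, purchaseTime); // Duplicate event, ignored
consumeCredit(demoAccount, purchaseTime);
console.log(demoAccount.balance); // 199
```
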
**Stripe setup:**
- Products: 3 credit packs
- Webhook handler for `checkout.session.completed`
- Customer portal (manage payment methods)
**Database schema additions:**
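```sql
-- Column types are illustrative; align them with the existing schema.
CREATE TABLE credits_transactions (
  id                UUID PRIMARY KEY,
  user_id           UUID NOT NULL,
  amount            INTEGER NOT NULL,
  pack_size         INTEGER NOT NULL,
  expires_at        TIMESTAMP NOT NULL,
  stripe_session_id TEXT UNIQUE,
  created_at        TIMESTAMP NOT NULL DEFAULT now()
);

ALTER TABLE users
  ADD COLUMN credits_balance INTEGER NOT NULL DEFAULT 0,
  ADD COLUMN credits_expiry TIMESTAMP;
```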
**Time estimate:** 1 week
---
## NICE TO HAVE (If Time Permits)
### 9. @last Reference (Shortcut)
**Functionality:**
```javascript
banatie.generate("hero in armor")
banatie.generate("make @last more detailed") // References previous generation
```
**Why nice-to-have:**
- Convenient for iteration
- Simple to implement (just cache last generation ID)
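
A minimal sketch of that cache, assuming one "last generation" slot per user. The in-memory `Map` stands in for whatever store the API uses, and `recordGeneration` and `resolveLast` are hypothetical names.

```typescript
// Sketch: resolve @last to the user's most recent generation.
const lastGenerationByUser = new Map<string, string>();

// Call after every successful generation.
function recordGeneration(userId: string, imageId: string): void {
  lastGenerationByUser.set(userId, imageId);
}

// Returns the image IDs to attach as references for this prompt.
function resolveLast(userId: string, prompt: string): string[] {
  // Match "@last" as a whole token, not "@lastly" or "me@last.fm".
  if (!/(^|[^\w@])@last\b/.test(prompt)) return [];
  const lastId = lastGenerationByUser.get(userId);
  if (lastId === undefined) {
    throw new Error("@last was used, but there is no previous generation for this user.");
  }
  return [lastId];
}

recordGeneration("user-1", "img-001");
console.log(resolveLast("user-1", "make @last more detailed")); // ["img-001"]
console.log(resolveLast("user-1", "hero in armor"));            // []
```
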
**Time estimate:** 1-2 days
---
### 10. Batch Generation (Multiple Prompts)
**Functionality:**
```javascript
banatie.generateBatch([
"hero level 1",
"hero level 2",
"hero level 3"
])
```
**Why nice-to-have:**
- Useful for game asset generation (Oleg's use case)
- Not critical for initial validation
**Time estimate:** 2-3 days
---
## CUT FROM MVP (Future Roadmap)
### ❌ Flow-Based Chained Generation
**Why cut:**
- Complex to build (requires state management, execution engine)
- Hard to explain (cognitive load)
- Niche use case (not every developer needs this)
- Can add after PMF
**Future priority:** HIGH (after 50+ users)
---
### ❌ Namespaces / Project Organization
**Why cut:**
- Users can manage with @names for MVP
- Adds UI complexity (project switcher, settings)
- Not blocking for core workflow
**Future priority:** MEDIUM (after 100+ users)
---
### ❌ On-Demand Generation via URL
**Why cut:**
- Clever feature but not core pain point
- Requires caching strategy, URL signing
- Better to validate core workflow first
**Future priority:** HIGH (cool differentiator, but after PMF)
---
### ❌ Advanced Focal Point Analysis
**Why cut:**
- Nice-to-have for auto-cropping
- Not critical if transformations are manual
- Can use basic center-crop for MVP
**Future priority:** LOW (automation, not essential)
---
### ❌ Style Presets / Fine-Tuning
**Why cut:**
- Gemini 2.5 Flash Image is already great
- Adds complexity (preset management, UI)
- Prompt Enhancement covers most use cases
**Future priority:** MEDIUM (after users request specific styles)
---
### ❌ Team Collaboration / Multi-User
**Why cut:**
- ICP is solo developers
- Adds auth complexity (invites, roles, permissions)
- Can add when agencies become customers
**Future priority:** MEDIUM (for agency expansion)
---
### ❌ Image Editing / Inpainting
**Why cut:**
- Out of scope (we're generation, not editing)
- Complex UI (selection tools, masks)
- Gemini supports it, but not MVP focus
**Future priority:** LOW (different product direction)
---
## Technical Architecture (MVP)
### Backend (Express + Node.js) ✅ Existing
**Core services:**
- API Gateway (REST endpoints)
- Prompt Enhancement Agent (Gemini 2.0 Flash)
- Image Generation (Gemini 2.5 Flash Image)
- Asset Manager (MinIO integration)
- Transformation Service (Imageflow or Cloudflare)
- Billing Service (Stripe webhooks)
**Database (PostgreSQL):**
- users (id, email, api_key, credits_balance, tier, created_at)
- images (id, user_id, prompt, enhanced_prompt, url, name, metadata, created_at)
- credits_transactions (id, user_id, amount, expires_at, stripe_session_id)
- api_keys (id, user_id, key_hash, last_used, created_at)
---
### Frontend (Next.js) ✅ Existing
**Pages:**
- `/` - Landing page
- `/demo` - Playground (API key input + generation)
- `/dashboard` - History + usage (after auth)
- `/pricing` - Credit packs
- `/docs` - API documentation
**Components:**
- ImageGenerator (prompt input + results)
- CodeSnippets (cURL, Python, JS, MCP examples)
- TransformationPreview (show different sizes/formats)
- ApiKeyInput (simple auth for demo)
---
### Infrastructure
**Hosting:**
- VPS (Contabo, Singapore) ✅ Existing
- Docker containers (backend + frontend)
**Storage:**
- MinIO (S3-compatible) ✅ Existing
**CDN:**
- Cloudflare (free tier OK for MVP)
**Payments:**
- Stripe (standard integration)
**Monitoring:**
- Basic logging (PM2 logs)
- Uptime monitoring (UptimeRobot free tier)
- Error tracking (Sentry free tier)
---
## Development Timeline (4-6 Weeks)
### Week 1: Core Generation Pipeline
- [ ] Finalize REST API endpoints
- [ ] Prompt Enhancement Agent (refine existing)
- [ ] Image generation with reference support
- [ ] Basic storage + CDN integration
**Deliverable:** Working API (generate + upload + references)
---
### Week 2: MCP Implementation
- [ ] MCP server setup (follow spec)
- [ ] Implement 3 tools (generate, upload, list)
- [ ] Test with Claude Desktop
- [ ] Documentation for MCP usage
**Deliverable:** Working MCP integration
---
### Week 3: Transformations + UI
- [ ] Image transformation service (Imageflow or Cloudflare)
- [ ] Refine demo UI (code snippets, previews)
- [ ] Dashboard (history, usage stats)
- [ ] API key management
**Deliverable:** Functional UI for testing
---
### Week 4: Payments + Polish
- [ ] Stripe integration (credit packs)
- [ ] Free tier limits enforcement
- [ ] Watermark for free tier
- [ ] Landing page copy + design
- [ ] API documentation
**Deliverable:** Monetization-ready product
---
### Week 5-6: Beta Testing + Iteration
- [ ] Invite 5-10 validated users from research
- [ ] High-touch onboarding (help with setup)
- [ ] Gather feedback, fix bugs
- [ ] Iterate on UX pain points
- [ ] Prepare for public launch
**Deliverable:** Product-market fit signals or pivot triggers
---
## Success Metrics (MVP)
### Technical Metrics
- [ ] MCP integration works reliably (95%+ success rate)
- [ ] Image generation latency <10 seconds (p95)
- [ ] CDN delivery is fast globally (<500ms response time)
- [ ] API uptime >99%
### Product Metrics
- [ ] 5-10 beta users onboarded
- [ ] 50+ generations completed
- [ ] 3+ users generate >10 images (engaged)
- [ ] 2+ users purchase credits (willingness to pay validated)
### Qualitative Metrics
- [ ] "This solves my problem" feedback (3+ users)
- [ ] Feature requests are refinements, not fundamental changes
- [ ] Users recommend to others
- [ ] Low churn (users stick around after trial)
---
## What "Done" Looks Like (MVP Launch)
**A developer can:**
1. Install Banatie MCP in Claude Desktop (5 min setup)
2. Use Claude Code to generate a Next.js site
3. Generate images via MCP without leaving Claude Code:
   ```
   Human: Create a hero image for this landing page about eco-friendly water bottles
   Claude: [calls banatie_generate MCP tool]
   I've generated a hero image and inserted the production CDN URL in the code.
   ```
4. Reference previous images for consistency:
   ```
   Human: Now create a product photo with the same bottle @hero
   Claude: [calls banatie_generate with @hero reference]
   Done! Product photo maintains the same bottle design.
   ```
5. See generated images in demo UI (history, transformations)
6. Copy code snippets (cURL, Python, JS) for direct API use
7. Purchase credits ($20 pack) via Stripe
8. Use credits for additional generations
**And it feels:**
- Fast (generation within seconds, CDN URL returned with the result)
- Seamless (no context switching)
- Professional (production-ready, not prototype)
- Trustworthy (stable, reliable, documented)
---
## Risk Mitigation
### Risk 1: MCP Integration Complexity
**Mitigation:**
- Study existing MCP servers (examples in repo)
- Test early and often with Claude Desktop
- Provide clear error messages
- Fallback: REST API works even if MCP has issues
---
### Risk 2: Prompt Enhancement Quality
**Mitigation:**
- Use Gemini 2.0 Flash (fast, capable)
- Include Google's official guidelines in agent prompt
- Show both prompts to user (transparency)
- Allow user to override/edit enhanced prompt
---
### Risk 3: CDN/Transformation Service Complexity
**Mitigation:**
- Start with Cloudflare Image Resizing (simpler)
- Fallback: Imageflow-Server if Cloudflare insufficient
- Precompute common transformations (cache)
---
### Risk 4: Payment Fraud / Abuse
**Mitigation:**
- Stripe handles fraud detection
- Rate limit free tier aggressively (10/month)
- Monitor usage patterns (flag anomalies)
- Require email verification for credits purchase
---
### Risk 5: Gemini API Costs Exceeding Revenue
**Mitigation:**
- Track cost per generation (Gemini API fees)
- Ensure pricing covers costs + margin
- Free tier is truly limited (10/month)
- Monitor burn rate daily
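
The margin check can be sketched as simple arithmetic. The API cost per generation is deliberately a parameter: the 0.04 USD used in the example is a placeholder, not a quoted Gemini price, and a real figure should also include the prompt-enhancement call. Function names are illustrative.

```typescript
// Sketch: gross margin per generation for a credit pack, given a measured API cost.
interface MarginReport {
  revenuePerGeneration: number;
  marginPerGeneration: number;
  marginPct: number;
}

function grossMargin(packPriceUsd: number, generations: number, apiCostPerGenerationUsd: number): MarginReport {
  const revenuePerGeneration = packPriceUsd / generations;
  const marginPerGeneration = revenuePerGeneration - apiCostPerGenerationUsd;
  return {
    revenuePerGeneration,
    marginPerGeneration,
    marginPct: (marginPerGeneration / revenuePerGeneration) * 100,
  };
}

// Worst-case monthly cost of the free tier: every free user spends all 10 generations.
function freeTierMonthlyCost(freeUsers: number, apiCostPerGenerationUsd: number): number {
  return freeUsers * 10 * apiCostPerGenerationUsd;
}

const PLACEHOLDER_COST_USD = 0.04; // Placeholder value, replace with tracked generation_cost
console.log(grossMargin(20, 200, PLACEHOLDER_COST_USD).marginPct.toFixed(1));   // "60.0"
console.log(grossMargin(100, 1500, PLACEHOLDER_COST_USD).marginPct.toFixed(1)); // "40.0"
console.log(freeTierMonthlyCost(100, PLACEHOLDER_COST_USD).toFixed(2));         // "40.00"
```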
---
## Post-MVP Roadmap (Prioritized)
### Phase 2 (After PMF):
1. Flow-based generation (chained workflows)
2. On-demand generation via URL (programmatic)
3. Pro subscription tier (500 gen/month included)
4. Advanced style presets
5. Batch generation
### Phase 3 (Agency Expansion):
1. Namespaces / project organization
2. Team collaboration (multi-user)
3. Usage analytics dashboard
4. White-label / reseller options
### Phase 4 (Platform Play):
1. Public API marketplace (share custom agents)
2. Community styles / presets
3. Integrations (Figma, Vercel, Netlify)
4. Enterprise tier (SLA, support, SSO)
---
## Decision Gates
**After 4 weeks (MVP complete):**
- [ ] Beta test with 5-10 users
- [ ] If 60%+ engaged → continue
- [ ] If <60% engaged → reassess features/ICP
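
The 4-week gate can be evaluated from per-user generation counts. This sketch assumes "engaged" means the same as in the product metrics (a user who generated more than 10 images); `engagementGate` is a hypothetical helper.

```typescript
// Sketch: evaluate the 4-week engagement gate from generation counts per beta user.
interface GateResult {
  engagedShare: number;
  decision: "continue" | "reassess";
}

function engagementGate(generationsPerUser: number[]): GateResult {
  if (generationsPerUser.length === 0) {
    return { engagedShare: 0, decision: "reassess" };
  }
  // Assumption: a user counts as engaged after more than 10 generations.
  const engagedUsers = generationsPerUser.filter((count) => count > 10).length;
  const engagedShare = engagedUsers / generationsPerUser.length;
  return { engagedShare, decision: engagedShare >= 0.6 ? "continue" : "reassess" };
}

console.log(engagementGate([25, 14, 3, 0, 12])); // 3 of 5 engaged: "continue"
console.log(engagementGate([25, 2, 3, 0, 12]));  // 2 of 5 engaged: "reassess"
```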
**After 6 weeks (Beta feedback):**
- [ ] If 2+ purchased credits → validated willingness to pay
- [ ] If 0 purchases → pricing issue or value unclear
- [ ] If bugs/UX issues → iterate 1-2 more weeks
**After 8 weeks (Soft Launch):**
- [ ] Post in r/ClaudeAI, Indie Hackers
- [ ] Track sign-ups, usage, conversion
- [ ] If $500+ MRR → full launch
- [ ] If <$500 MRR → pivot or iterate
---
**Document owner:** @men + Oleg (joint)
**Status:** Draft for MVP development
**Next review:** After validation complete (if GO decision)
**Related docs:** `07_validated_icp_ai_developers.md`, `10_pricing_strategy.md`