---
name: gen-image
description: >
Generate and modify images via Banatie API. Use this skill whenever the user
asks to generate, create, or make an image, picture, icon, illustration,
background, banner, hero image, photo, thumbnail, or any visual asset. Also
trigger when the user wants to modify, change, fix, adjust, or iterate on an
existing image — e.g. "too detailed", "change the background", "make it
darker", "remove X", "more like Y". Also trigger when the user mentions
Banatie, asks for a sticker, product photo, comic-style art, photorealistic
render, minimalist graphic, or needs to use reference images for generation.
Covers text-to-image, image modification via references, aspect ratios, and
enhancement templates.
---
# Image Generation Skill
Generate and modify images using the Banatie API. Parse user arguments, validate inputs, and run the bundled generation script.
## Arguments
Parse these from the user's message. Ask the user for any missing required arguments.
| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| **Prompt** | Yes | — | Image description |
| **Output path** | Yes | — | Where to save the file (e.g. `assets/icons/star.png`) |
| **Aspect ratio** | No | `1:1` | `1:1`, `16:9`, `9:16`, `3:2`, `4:3`, `3:4`, `21:9` |
| **Reference images** | No | — | Local file paths or `@alias` names (max 3) |
| **Enhancement template** | No | `general` | `general`, `photorealistic`, `illustration`, `minimalist`, `sticker`, `product`, `comic` |
| **Auto enhance** | No | `true` | Set to `false` to skip AI prompt enhancement and use the prompt as-is |
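
The table above can be turned into a small pre-flight check. The following is only a sketch: `checkArgs` and the shape of its input object are hypothetical and not part of the bundled script, while the allowed values and defaults are copied from the table.

```javascript
// Allowed values and limits copied from the arguments table.
const ASPECT_RATIOS = ["1:1", "16:9", "9:16", "3:2", "4:3", "3:4", "21:9"];
const TEMPLATES = [
  "general", "photorealistic", "illustration",
  "minimalist", "sticker", "product", "comic",
];
const MAX_REFS = 3;

// Hypothetical helper: applies the documented defaults and collects problems.
// An empty `errors` array means the arguments are ready to use.
function checkArgs(input) {
  const args = {
    prompt: input.prompt,
    output: input.output,
    aspectRatio: input.aspectRatio ?? "1:1",
    template: input.template ?? "general",
    autoEnhance: input.autoEnhance ?? true,
    refs: input.refs ?? [],
  };
  const errors = [];
  if (!args.prompt) errors.push("prompt is required");
  if (!args.output) errors.push("output path is required");
  if (!ASPECT_RATIOS.includes(args.aspectRatio)) {
    errors.push(`unsupported aspect ratio: ${args.aspectRatio}`);
  }
  if (!TEMPLATES.includes(args.template)) {
    errors.push(`unknown template: ${args.template}`);
  }
  if (args.refs.length > MAX_REFS) {
    errors.push(`too many reference images (max ${MAX_REFS})`);
  }
  return { args, errors };
}
```

Any entry in `errors` for a required argument is a cue to ask the user rather than guess.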
## Two Modes of Operation
### New image — generate from scratch
The user asks to create something new. No existing image is involved.
### Modify image — iterate on an existing image
The user wants to change, fix, or adjust an image that was already generated or exists in the project. Detect this mode when the user says things like "too detailed", "change the background", "make it brighter", "remove the text", "more like X", or any feedback about a previously generated image.
**In modification mode, always use the current image as a `--ref` argument.** The prompt should describe the desired result (not the diff). For example, if the user says "too many details, should look like an irregular boulder" about `assets/items/asteroids/asteroid1.png`, run:
```bash
node <skill-dir>/banatie-gen.mjs \
--prompt "simple irregular boulder, smooth rock with minimal details, in No Man's Sky style on white background" \
--output assets/items/asteroids/asteroid1.png \
--ref assets/items/asteroids/asteroid1.png \
--template minimalist
```
The reference image gives the AI a visual anchor (composition, colors, overall shape) while the prompt steers it toward the desired changes. This produces much better results than generating from scratch with a new prompt, because the output stays visually consistent with the original.
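
As a rough illustration of modification mode, the argument list can be assembled so that the same file serves as both reference and default output. `buildModifyArgs` is a hypothetical helper, not something the skill ships. Only the flags shown in the command above are used.

```javascript
// Hypothetical helper: builds the argument list for a modification run.
// The current image is always passed as --ref. Unless a different output
// is given (a variation), the original file is overwritten.
function buildModifyArgs({ image, prompt, template, output = image }) {
  const argv = ["--prompt", prompt, "--output", output, "--ref", image];
  if (template) argv.push("--template", template);
  return argv;
}

const argv = buildModifyArgs({
  image: "assets/items/asteroids/asteroid1.png",
  prompt: "simple irregular boulder, smooth rock with minimal details, in No Man's Sky style on white background",
  template: "minimalist",
});
```

Passing such a list to `execFile` from Node's `child_process` module, instead of concatenating a shell string, sidesteps quoting problems with prompts that contain apostrophes.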
## Reference Image Policy
**Never add `--ref` silently when creating a new image.** The rules:
1. **User explicitly provides a ref** (file path or @alias) → use it
2. **Modification mode** (user gives feedback on an existing image) → use the existing image as ref automatically
3. **New image, similar assets exist nearby** → **ask the user first**: "I see [filename] in the same folder. Would you like to use it as a reference for visual consistency, or generate from scratch?" Do not assume.
4. **New image, no similar context** → generate from scratch, no ref
The project's CLAUDE.md may override this policy with project-specific ref rules (e.g. "always use X as ref for assets in folder Y"). If CLAUDE.md provides ref guidance, follow it without asking.
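
The policy can be read as a short decision procedure. The sketch below is one possible encoding. The function name, the field names, and the exact precedence between the CLAUDE.md override and the numbered rules are assumptions, since the text above does not pin them down.

```javascript
// Hypothetical encoding of the reference policy.
// Returns { action: "use" | "ask" | "none", refs: string[] }.
function decideRefs({
  mode,                 // "new" or "modify"
  userRefs = [],        // paths or @aliases the user named explicitly
  currentImage = null,  // image being modified, if any
  similarNearby = [],   // similar assets found next to the output path
  projectRefs = null,   // refs dictated by CLAUDE.md, if any
}) {
  // Rule 2: modification mode always anchors on the existing image.
  // Assumption: refs the user also named are kept alongside it.
  if (mode === "modify" && currentImage) {
    const extra = userRefs.filter((ref) => ref !== currentImage);
    return { action: "use", refs: [currentImage, ...extra] };
  }
  // Rule 1: refs the user explicitly provided are used as given.
  if (userRefs.length > 0) return { action: "use", refs: userRefs };
  // CLAUDE.md guidance is followed without asking.
  if (projectRefs && projectRefs.length > 0) {
    return { action: "use", refs: projectRefs };
  }
  // Rule 3: similar assets nearby, so ask first and never assume.
  if (similarNearby.length > 0) return { action: "ask", refs: similarNearby };
  // Rule 4: nothing relevant, generate from scratch.
  return { action: "none", refs: [] };
}
```

The `"ask"` result carries the candidate files so the question to the user can name them.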
## Workflow
1. **Determine the mode.** Is this a new image or a modification of an existing one? If the user gives feedback on a recently generated image or asks to change something about an existing file, use modification mode.
2. **Parse arguments** from the user's message. Extract prompt, output path, aspect ratio, references, template, and auto-enhance flag.
3. **Fill missing required arguments.** Suggest an output path based on context. In modification mode, default to overwriting the original file unless the user asks for a variation.
4. **In modification mode:** automatically add the existing image path as `--ref`. Write the prompt as a full description of the desired result, incorporating the user's requested changes. Do not describe only the changes — describe what the final image should look like.
5. **Validate** that any referenced local files exist before proceeding.
6. **Read API docs** from the `docs/` subfolder of this skill when the user needs advanced features (references, flows, aliases). The docs are:
- `docs/image-generation.md` — basic generation, aspect ratios, prompt enhancement, templates
- `docs/image-generation-advanced.md` — reference images, aliases, flows, regeneration
- `docs/images-upload.md` — image upload, alias management
7. **Run generation** using the bundled script (path relative to this skill's directory):
```bash
node <skill-dir>/banatie-gen.mjs \
--prompt "<prompt>" \
--output <path> \
[--aspect-ratio <ratio>] \
[--template <template>] \
[--no-enhance] \
[--ref <file_or_alias>]...
```
Where `<skill-dir>` is the directory containing this SKILL.md (e.g. `.claude/skills/gen-image`).
The script handles polling automatically — if the API returns a pending/processing status, it waits until generation completes (up to 2 minutes).
8. **Evaluate the result.** View the generated image and assess whether it matches the user's request. If it clearly doesn't (wrong style, missing key elements, too different from what was asked), tell the user what went wrong and suggest another attempt with an adjusted prompt. This self-evaluation loop is encouraged.
9. **Handle errors.** If generation fails:
- `UNAUTHORIZED` → check that `BANATIE_KEY` is set in `.env` at the project root
- `RATE_LIMIT_EXCEEDED` → wait and retry, or inform the user (limit: 100 requests/hour)
- `VALIDATION_ERROR` → check prompt, aspect ratio, and reference file formats (PNG, JPEG, WebP, max 5MB)
- Timeout → the generation took too long, suggest retrying with a simpler prompt
10. **Report results**: output file path, image dimensions, and the full command used for reproducibility.
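
The error handling in step 9 amounts to a lookup from failure to next action. The sketch below assumes a failure is represented as `{ code, timedOut }`. That shape and the `adviceFor` name are illustrative only, because how the script surfaces errors is not specified here.

```javascript
// Advice strings paraphrase step 9 of the workflow.
const ERROR_ADVICE = new Map([
  ["UNAUTHORIZED", "Check that BANATIE_KEY is set in .env at the project root."],
  ["RATE_LIMIT_EXCEEDED", "Wait and retry, or inform the user (limit: 100 requests/hour)."],
  ["VALIDATION_ERROR", "Check the prompt, aspect ratio, and reference file formats (PNG, JPEG, WebP, max 5MB)."],
]);

// Hypothetical helper: maps a failure to the suggested next step.
function adviceFor(failure) {
  if (failure.timedOut) {
    return "Generation took too long. Suggest retrying with a simpler prompt.";
  }
  // Unknown codes are passed through rather than guessed at.
  return ERROR_ADVICE.get(failure.code) ?? `Unrecognized error: ${failure.code}`;
}
```

A `Map` is used instead of a plain object so that a code such as `toString` cannot accidentally match an inherited property.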
## Environment
The script reads `BANATIE_KEY` from `.env` in the project root. Rate limit: 100 requests per hour.
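
Before running the script, it can help to confirm that the key is actually present. The following is a deliberately simplified sketch that looks up one variable in the text of a `.env` file. `readEnvVar` is a hypothetical helper, and it does not claim to match how the bundled script parses `.env`.

```javascript
// Minimal sketch: find `name` in the contents of a .env file.
// Handles NAME=value lines, blank lines, # comments, and one pair of
// surrounding quotes. Multi-line values and escapes are not supported.
function readEnvVar(envText, name) {
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    if (line.slice(0, eq).trim() !== name) continue;
    const value = line.slice(eq + 1).trim();
    // Strip one pair of matching single or double quotes, if present.
    const quoted = value.match(/^(["'])(.*)\1$/);
    return quoted ? quoted[2] : value;
  }
  return undefined;
}
```

The function takes the file contents as a string, so the caller decides how to read the file, for example with `fs.readFileSync(".env", "utf8")`. An `undefined` or empty result would point to the same fix as an `UNAUTHORIZED` error.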