Fuuga.Image.cpu
1.0.12
dotnet add package Fuuga.Image.cpu --version 1.0.12
Fuuga.Image
This is an image-generation MCP client for Fuuga (or other agents). It lets the agent produce images with other models, such as Stable Diffusion models from HuggingFace.
Standalone CLI tool for image generation and captioning, built on stable-diffusion.cpp and Phi-3.5-vision.
Commands
fuuga-image txt2img Generate an image from a text prompt
fuuga-image img2img Transform an image guided by a text prompt
fuuga-image txt2video Generate a video from a text prompt (video models only)
fuuga-image img2video Animate a still image into a video (video models only)
fuuga-image vid2vid Edit/continue a video guided by text (VACE; video models only)
fuuga-image caption Generate a text description of an image
fuuga-image info Display model metadata
Quick Start
dotnet build
# Text-to-image
dotnet run -- txt2img --model path/to/model.ckpt --prompt "a cat sitting on a windowsill" --steps 20
# Image-to-image
dotnet run -- img2img --model path/to/model.ckpt --source input.png --prompt "in watercolor style" --strength 0.75
# Text-to-video (requires a video model, e.g. a Wan2.1/2.2 T2V GGUF)
dotnet run -- txt2video --model path/to/wan-t2v.gguf --prompt "a wave crashing on rocks" --frames 16 --fps 8 --format mp4
# Image-to-video (animate a still)
dotnet run -- img2video --model path/to/wan-i2v.gguf --source photo.png --prompt "gentle zoom" --frames 24 --format gif
# Video-to-video (edit/continue a video guided by text; --source is a video file or a frames dir)
dotnet run -- vid2vid --model path/to/wan-vace.gguf --source clip.mp4 --prompt "make it snowy" --format mp4
# Attach an audio track to a generated mp4 (models don't make audio; this muxes a provided file)
dotnet run -- txt2video --model path/to/wan-t2v.gguf --prompt "a marching band" --format mp4 --audio track.mp3
# Caption an image
dotnet run -- caption --model path/to/vision-model/ --image photo.jpg --detail standard
# Caption a video (samples frames and describes them; --video is a file or a frames dir)
dotnet run -- caption --model path/to/vision-model/ --video clip.mp4 --frame-samples 8 --detail detailed
# Transcribe/describe audio (needs an audio-capable model, e.g. Phi-4-multimodal — same runtime)
dotnet run -- caption --model path/to/phi-4-multimodal/ --audio speech.wav
# Model info
dotnet run -- info --model path/to/model.ckpt
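When the tool is driven from a script rather than typed by hand, it can help to assemble the argument list programmatically. The following is a minimal Python sketch, not part of Fuuga.Image. It assumes the tool is available on `PATH` under the name `fuuga-image` (as in the Commands list), and the helper name `build_txt2img_args` is hypothetical. Only flags documented in this README are used.

```python
import shlex


def build_txt2img_args(model, prompt, steps=20, width=512, height=512, seed=-1):
    """Return the argument list for a `fuuga-image txt2img` call.

    Assumption: an executable named `fuuga-image` is on PATH. When working
    from source, replace the first element with ["dotnet", "run", "--"].
    """
    return [
        "fuuga-image", "txt2img",
        "--model", model,
        "--prompt", prompt,
        "--steps", str(steps),
        "--width", str(width),
        "--height", str(height),
        "--seed", str(seed),
    ]


args = build_txt2img_args("path/to/model.ckpt", "a cat sitting on a windowsill")

# Print a shell-quoted form of the command for inspection.
print(shlex.join(args))

# To actually run it (requires the tool and a model file):
# import subprocess
# subprocess.run(args, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with prompts that contain spaces or quotes.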
Common Options
| Option | Description | Default |
|---|---|---|
| `--model <path>` | Path to model file or directory | (required) |
| `--output <dir>` | Output directory | `output` |
| `--format <fmt>` | Output format: `png` or `jpeg` | `png` |
| `--gpu` | Enable GPU acceleration | off |
| `--verbose` | Enable detailed logging | off |
txt2img Options
| Option | Description | Default |
|---|---|---|
| `--prompt <text>` | Text description of desired image | (required) |
| `--negative <text>` | What to avoid in the image | none |
| `--width <int>` | Image width in pixels (multiple of 8) | 512 |
| `--height <int>` | Image height in pixels (multiple of 8) | 512 |
| `--steps <int>` | Denoising steps | 20 |
| `--cfg <float>` | Classifier-free guidance scale | 7.0 |
| `--seed <int>` | Random seed (-1 for random) | -1 |
| `--sampler <name>` | `euler`, `euler_a`, `heun`, `dpm2`, `dpm++2s`, `lcm` | `euler_a` |
| `--batch <int>` | Number of images to generate | 1 |
| `--vae <path>` | External VAE model | none |
| `--lora <path>` | LoRA weights | none |
| `--quantization <q>` | `f32`, `f16`, `q8_0`, `q5_1`, `q4_0` | `f16` |
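Because `--width` and `--height` must be multiples of 8, a calling script may want to adjust arbitrary sizes before invoking the tool. The sketch below is one way to do that in Python. The helper name `snap_to_multiple_of_8` is hypothetical, and how Fuuga.Image itself reacts to a non-multiple (reject or adjust) is not stated here, so treat this purely as a client-side convenience.

```python
def snap_to_multiple_of_8(value, minimum=8):
    """Round a pixel size to the nearest multiple of 8 (ties round up).

    Sizes that would round down to 0 are raised to `minimum`.
    """
    snapped = (int(value) + 4) // 8 * 8
    return max(minimum, snapped)


for requested in (512, 500, 513, 3):
    print(requested, "->", snap_to_multiple_of_8(requested))
```

Integer arithmetic is used instead of `round()` so that sizes exactly halfway between two multiples behave predictably.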
img2img Options
Same as txt2img, plus:
| Option | Description | Default |
|---|---|---|
| `--source <path>` | Source image to transform | (required) |
| `--strength <float>` | Denoising strength 0.0-1.0 | 0.75 |
caption Options
Captioning uses whatever multimodal model directory you pass to `--model`. Provide one of
`--image` (a still), `--video` (a video file decoded via ffmpeg, or a directory of frames, sampled
down to `--frame-samples` frames), or `--audio` (transcribe/describe an audio clip). Audio needs an
audio-capable model such as Phi-4-multimodal. It runs on the same ONNX Runtime GenAI runtime
that is already bundled, so there is no extra dependency: just point `--model` at it.
| Option | Description | Default |
|---|---|---|
| `--image <path>` | Image to caption | (one of image/video/audio) |
| `--video <path>` | Video file or frames directory to describe | (one of image/video/audio) |
| `--audio <path>` | Audio file to transcribe/describe (audio-capable model) | (one of image/video/audio) |
| `--frame-samples <N>` | Frames sampled from the video | 8 |
| `--detail <level>` | `brief`, `standard`, `detailed` | `standard` |
| `--max-tokens <int>` | Maximum tokens to generate | 256 |
| `--prompt <text>` | Custom prompt for image captioning (must contain `<\|image_1\|>`) | none |
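Since a custom caption prompt must contain the `<|image_1|>` placeholder, a caller can check for it before launching the tool. This is a small Python sketch of such a pre-flight check. The function name `check_caption_prompt` is hypothetical and the check only mirrors the requirement stated in the table; it does not reproduce any validation the tool performs internally.

```python
IMAGE_PLACEHOLDER = "<|image_1|>"


def check_caption_prompt(prompt):
    """Return the prompt unchanged if it contains the image placeholder.

    Raises ValueError otherwise, so the mistake surfaces before a model
    is loaded.
    """
    if IMAGE_PLACEHOLDER not in prompt:
        raise ValueError(
            f"custom caption prompt must contain {IMAGE_PLACEHOLDER}"
        )
    return prompt


ok = check_caption_prompt("<|image_1|>\nList the objects visible in this photo.")
print(ok)
```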
txt2video / img2video / vid2vid Options
Video is generated only when the loaded model supports it (e.g. Wan2.1/2.2 text-to-video,
image-to-video, or VACE GGUF models); the tool checks the model's capability and errors clearly
otherwise. `img2video` additionally takes `--source <path>` (the starting image); `vid2vid` takes
`--source <path>` (a video file or a directory of frame images) plus `--vace-strength <f>`.
Video models produce no audio. `--audio <path>` muxes a user-provided audio file into an
mp4/webp output via ffmpeg (ignored for gif / image sequences).
| Option | Description | Default |
|---|---|---|
| `--frames <int>` | Number of frames to generate | 16 |
| `--fps <int>` | Frames per second of the output | 8 |
| `--width` / `--height` | Frame size | 512 |
| `--steps` / `--cfg` / `--seed` / `--sampler` | Same as txt2img | |
| `--format <fmt>` | `png` (image sequence, default), `mp4`, `gif`, `webp` | `png` |
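The `--frames` and `--fps` options together determine clip length: duration in seconds is frames divided by fps, so the defaults (16 frames at 8 fps) give a 2-second clip. The Python sketch below shows the arithmetic in both directions. The helper names are hypothetical and the code only illustrates the relationship, not anything the tool computes itself.

```python
import math


def clip_duration_seconds(frames, fps):
    """Length of the output clip in seconds."""
    return frames / fps


def frames_for_duration(seconds, fps):
    """Smallest frame count that covers at least `seconds` at `fps`."""
    return math.ceil(seconds * fps)


print(clip_duration_seconds(16, 8))   # defaults: 16 frames at 8 fps
print(frames_for_duration(2.5, 8))    # frames needed for a 2.5 s clip
```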
Output formats. png writes an image sequence (frame_0000.png …) into a per-run
subdirectory — this is dependency-free and always available. mp4, gif, and webp are
encoded from those frames with ffmpeg, which is an optional external tool (see below). If a
single-file format is requested but ffmpeg is not found, the tool falls back to an image
sequence and prints a note.
Optional: ffmpeg for mp4 / gif / webp
ffmpeg is not a .NET dependency and is not bundled. To enable single-file video output, install it
and make sure it is on PATH (or point FUUGA_FFMPEG at the executable):
- Windows: `winget install Gyan.FFmpeg` (or `choco install ffmpeg`), or download from https://ffmpeg.org/download.html and add the `bin` folder to `PATH`.
- macOS: `brew install ffmpeg`
- Linux: `sudo apt install ffmpeg` (Debian/Ubuntu) or your distro's package manager.
- Explicit path: set `FUUGA_FFMPEG=/full/path/to/ffmpeg` to use a specific build.
Without ffmpeg, --format png (image sequence) still works and can be turned into a video later
with any tool.
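A wrapper script can predict whether a single-file format will be available by looking for ffmpeg the same way the text describes: the `FUUGA_FFMPEG` variable or `PATH`. The Python sketch below is an approximation under stated assumptions. Checking `FUUGA_FFMPEG` before `PATH` is a guess at the precedence, and the function names `find_ffmpeg` and `effective_format` are hypothetical; the tool's own lookup may differ.

```python
import os
import shutil

SINGLE_FILE_FORMATS = ("mp4", "gif", "webp")


def find_ffmpeg(env=None):
    """Return a path to ffmpeg, or None if it cannot be located.

    Assumption: FUUGA_FFMPEG, when set to an existing file, wins over PATH.
    """
    env = os.environ if env is None else env
    explicit = env.get("FUUGA_FFMPEG")
    if explicit and os.path.isfile(explicit):
        return explicit
    return shutil.which("ffmpeg", path=env.get("PATH"))


def effective_format(requested, ffmpeg_path):
    """Mirror the documented fallback: no ffmpeg means an image sequence."""
    if requested in SINGLE_FILE_FORMATS and ffmpeg_path is None:
        return "png"
    return requested


ffmpeg = find_ffmpeg()
print("ffmpeg:", ffmpeg)
print("mp4 request will produce:", effective_format("mp4", ffmpeg))
```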
MCP Server Mode
Fuuga.Image can run as an MCP tool server over stdio, which lets Fuuga LLM integrate with it via MCP:
dotnet run -- --mcp-server
This exposes these tools via JSON-RPC 2.0 over stdin/stdout:
- `fuuga_image_txt2img` -- text-to-image generation
- `fuuga_image_img2img` -- image-to-image transformation
- `fuuga_image_txt2video` -- text-to-video generation (video models only)
- `fuuga_image_img2video` -- image-to-video animation (video models only)
- `fuuga_image_caption` -- image captioning
See architecture.md Section 18 for the full integration design.
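To illustrate what a client sends over stdin, here is a minimal Python sketch that builds a JSON-RPC 2.0 `tools/call` request in the shape MCP defines (`params.name` plus `params.arguments`) and serializes it as one line, which is how the MCP stdio transport frames messages. The argument keys (`prompt`, `steps`) are assumptions modeled on the CLI flags; the real input schema should be read from the server's `tools/list` response. A real session also starts with an `initialize` exchange, which is omitted here.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = make_tool_call(
    1,
    "fuuga_image_txt2img",
    # Hypothetical argument names, mirroring --prompt and --steps.
    {"prompt": "a cat sitting on a windowsill", "steps": 20},
)

# One JSON object per line; json.dumps escapes any newline inside strings.
line = json.dumps(request) + "\n"
print(line, end="")
```

The resulting line would be written to the stdin of the process started with `dotnet run -- --mcp-server`, and the response read back from its stdout.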
F# Script Demo
An interactive demo script shows how to use Fuuga.Image as an F# library — calling the modules directly with record-based configuration, validation, and I/O:
dotnet build Fuuga.Image
dotnet fsi examples/image-demo.fsx
The script runs without models (dry-run mode), exercising types, validation, image I/O, and prompt templates. Uncomment Section 5 and set your model paths for actual generation. See examples/image-demo.fsx for the full source.
Project Structure
Fuuga.Image/
Types.fs Layer 0: Domain types, error handling (ImageError DU)
Config.fs Layer 1: CLI argument parsing
ImageIO.fs Layer 2: Image load/save, format conversion, validation
Diffusion.fs Layer 3: Stable Diffusion model wrapper (txt2img, img2img)
Caption.fs Layer 4: Phi-3.5-vision captioning (ONNX Runtime GenAI)
McpServer.fs Layer 5: MCP JSON-RPC 2.0 server over stdio
Program.fs Layer 6: Entry point, subcommand routing, error handling
Architecture
- Image Extension Architecture -- standalone design
- Main Architecture Section 18 -- Fuuga LLM integration
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0
  - FSharp.Core (>= 10.1.301)
  - HPPH.SkiaSharp (>= 1.0.0)
  - Microsoft.ML.OnnxRuntimeGenAI (>= 0.14.1)
  - StableDiffusion.NET (>= 7.0.0)
  - StableDiffusion.NET.Backend.Cpu (>= 7.0.0)
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.12 | 44 | 7/3/2026 |
1.0.11: Compatibility release — the new `fuuga image` command in the Fuuga package drives this tool's txt2img / img2img / caption tools over MCP.