Fuuga.Image.cpu 1.0.12

dotnet add package Fuuga.Image.cpu --version 1.0.12
                    

Fuuga.Image

This is an image-generation MCP tool for Fuuga (or other agents). It lets the agent produce images with external models (Stable Diffusion), for example ones downloaded from HuggingFace.

Standalone CLI tool for image generation and captioning, built on stable-diffusion.cpp and Phi-3.5-vision.

Commands

fuuga-image txt2img     Generate an image from a text prompt
fuuga-image img2img     Transform an image guided by a text prompt
fuuga-image txt2video   Generate a video from a text prompt (video models only)
fuuga-image img2video   Animate a still image into a video (video models only)
fuuga-image vid2vid     Edit/continue a video guided by text (VACE; video models only)
fuuga-image caption     Generate a text description of an image
fuuga-image info        Display model metadata

Quick Start

dotnet build

# Text-to-image
dotnet run -- txt2img --model path/to/model.ckpt --prompt "a cat sitting on a windowsill" --steps 20

# Image-to-image
dotnet run -- img2img --model path/to/model.ckpt --source input.png --prompt "in watercolor style" --strength 0.75

# Text-to-video (requires a video model, e.g. a Wan2.1/2.2 T2V GGUF)
dotnet run -- txt2video --model path/to/wan-t2v.gguf --prompt "a wave crashing on rocks" --frames 16 --fps 8 --format mp4

# Image-to-video (animate a still)
dotnet run -- img2video --model path/to/wan-i2v.gguf --source photo.png --prompt "gentle zoom" --frames 24 --format gif

# Video-to-video (edit/continue a video guided by text; --source is a video file or a frames dir)
dotnet run -- vid2vid --model path/to/wan-vace.gguf --source clip.mp4 --prompt "make it snowy" --format mp4

# Attach an audio track to a generated mp4 (models don't make audio; this muxes a provided file)
dotnet run -- txt2video --model path/to/wan-t2v.gguf --prompt "a marching band" --format mp4 --audio track.mp3

# Caption an image
dotnet run -- caption --model path/to/vision-model/ --image photo.jpg --detail standard

# Caption a video (samples frames and describes them; --video is a file or a frames dir)
dotnet run -- caption --model path/to/vision-model/ --video clip.mp4 --frame-samples 8 --detail detailed

# Transcribe/describe audio (needs an audio-capable model, e.g. Phi-4-multimodal — same runtime)
dotnet run -- caption --model path/to/phi-4-multimodal/ --audio speech.wav

# Model info
dotnet run -- info --model path/to/model.ckpt

Common Options

| Option | Description | Default |
|---|---|---|
| --model <path> | Path to model file or directory | (required) |
| --output <dir> | Output directory | output |
| --format <fmt> | Output format: png or jpeg | png |
| --gpu | Enable GPU acceleration | off |
| --verbose | Enable detailed logging | off |

txt2img Options

| Option | Description | Default |
|---|---|---|
| --prompt <text> | Text description of desired image | (required) |
| --negative <text> | What to avoid in the image | none |
| --width <int> | Image width in pixels (multiple of 8) | 512 |
| --height <int> | Image height in pixels (multiple of 8) | 512 |
| --steps <int> | Denoising steps | 20 |
| --cfg <float> | Classifier-free guidance scale | 7.0 |
| --seed <int> | Random seed (-1 for random) | -1 |
| --sampler <name> | euler, euler_a, heun, dpm2, dpm++2s, lcm | euler_a |
| --batch <int> | Number of images to generate | 1 |
| --vae <path> | External VAE model | none |
| --lora <path> | LoRA weights | none |
| --quantization <q> | f32, f16, q8_0, q5_1, q4_0 | f16 |
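
The constraints in the table above (width and height as multiples of 8, a fixed set of sampler names) can be checked before the tool is launched. The following is a minimal sketch of such a wrapper, not part of Fuuga.Image: the function name build_txt2img_args is hypothetical, the steps >= 1 check is an assumption, and whether the tool itself rejects these values is not stated here. It only uses flags listed above.

```python
from typing import List, Optional

# Sampler names as listed in the txt2img options table.
SAMPLERS = {"euler", "euler_a", "heun", "dpm2", "dpm++2s", "lcm"}


def build_txt2img_args(model: str, prompt: str, *, width: int = 512,
                       height: int = 512, steps: int = 20, cfg: float = 7.0,
                       seed: int = -1, sampler: str = "euler_a",
                       negative: Optional[str] = None) -> List[str]:
    """Hypothetical helper: validate options, return argv for `fuuga-image txt2img`."""
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % 8 != 0:
            raise ValueError(f"--{name} must be a positive multiple of 8, got {value}")
    if steps < 1:  # assumption: at least one denoising step
        raise ValueError(f"--steps must be at least 1, got {steps}")
    if sampler not in SAMPLERS:
        raise ValueError(f"unknown --sampler {sampler!r}")

    args = ["fuuga-image", "txt2img", "--model", model, "--prompt", prompt,
            "--width", str(width), "--height", str(height),
            "--steps", str(steps), "--cfg", str(cfg),
            "--seed", str(seed), "--sampler", sampler]
    if negative:
        args += ["--negative", negative]
    return args
```

The returned list can be passed to subprocess.run as-is, which avoids shell quoting problems with prompts that contain spaces or quotes.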

img2img Options

Same as txt2img, plus:

| Option | Description | Default |
|---|---|---|
| --source <path> | Source image to transform | (required) |
| --strength <float> | Denoising strength 0.0-1.0 | 0.75 |

caption Options

Captioning uses whatever multimodal model directory you pass to --model. Provide one of --image (a still image), --video (a video file decoded via ffmpeg, or a directory of frames; either way sampled down to --frame-samples frames), or --audio (an audio clip to transcribe or describe). Audio needs an audio-capable model such as Phi-4-multimodal. It runs on the same bundled ONNX Runtime GenAI runtime, so there is no extra dependency: just point --model at it.

| Option | Description | Default |
|---|---|---|
| --image <path> | Image to caption | (one of image/video/audio) |
| --video <path> | Video file or frames directory to describe | (one of image/video/audio) |
| --audio <path> | Audio file to transcribe/describe (audio-capable model) | (one of image/video/audio) |
| --frame-samples <N> | Frames sampled from the video | 8 |
| --detail <level> | brief, standard, detailed | standard |
| --max-tokens <int> | Maximum tokens to generate | 256 |
| --prompt <text> | Custom prompt for image captioning (must contain <\|image_1\|>) | none |
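
Two of the rules above lend themselves to a small illustration: choosing one input kind and reducing a video to --frame-samples frames. The sketch below is illustrative only. Both function names are hypothetical. It assumes the three inputs are mutually exclusive, that the <|image_1|> check applies to image captioning only, and that frames are picked at evenly spaced positions; the sampling strategy the tool actually uses is not documented here.

```python
from typing import List, Optional

IMAGE_TAG = "<|image_1|>"  # placeholder a custom image prompt must contain


def check_caption_inputs(image: Optional[str] = None, video: Optional[str] = None,
                         audio: Optional[str] = None,
                         prompt: Optional[str] = None) -> str:
    """Return which input kind was given; raise if none or several were."""
    given = [name for name, value in
             (("image", image), ("video", video), ("audio", audio)) if value]
    if len(given) != 1:
        raise ValueError("provide one of --image, --video, --audio")
    # Assumption: the placeholder requirement only applies to image captioning.
    if given[0] == "image" and prompt is not None and IMAGE_TAG not in prompt:
        raise ValueError(f"custom prompt must contain {IMAGE_TAG}")
    return given[0]


def sample_frame_indices(total_frames: int, samples: int = 8) -> List[int]:
    """Pick `samples` evenly spaced frame indices (midpoint of each segment)."""
    if total_frames <= 0 or samples <= 0:
        raise ValueError("total_frames and samples must be positive")
    if samples >= total_frames:
        return list(range(total_frames))
    # Integer arithmetic keeps the result deterministic across platforms.
    return [(2 * i + 1) * total_frames // (2 * samples) for i in range(samples)]
```

For an 80-frame clip and the default of 8 samples, this picks frames 5, 15, 25, and so on up to 75.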

txt2video / img2video / vid2vid Options

Video is generated only when the loaded model supports it (e.g. Wan2.1/2.2 text-to-video, image-to-video, or VACE GGUF models); the tool checks the model's capability and errors clearly otherwise. img2video additionally takes --source <path> (the starting image); vid2vid takes --source <path> (a video file or a directory of frame images) plus --vace-strength <f>.

Video models produce no audio. --audio <path> muxes a user-provided audio file into an mp4/webp output via ffmpeg (ignored for gif / image sequences).

| Option | Description | Default |
|---|---|---|
| --frames <int> | Number of frames to generate | 16 |
| --fps <int> | Frames per second of the output | 8 |
| --width / --height | Frame size | 512 |
| --steps / --cfg / --seed / --sampler | Same as txt2img | |
| --format <fmt> | png (image sequence, default), mp4, gif, webp | png |

Output formats. png writes an image sequence (frame_0000.png …) into a per-run subdirectory — this is dependency-free and always available. mp4, gif, and webp are encoded from those frames with ffmpeg, which is an optional external tool (see below). If a single-file format is requested but ffmpeg is not found, the tool falls back to an image sequence and prints a note.
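
As a quick worked example of what a run produces: with the defaults of 16 frames at 8 fps, the clip lasts 16 / 8 = 2 seconds, and a png run writes 16 files. The sketch below assumes the numbering continues the frame_0000.png pattern with four zero-padded digits starting at 0; only the first file name is given above, so the rest is an assumption, and both function names are hypothetical.

```python
from typing import List


def frame_names(frames: int = 16) -> List[str]:
    """File names of an image sequence, assuming frame_0000.png, frame_0001.png, ..."""
    return [f"frame_{i:04d}.png" for i in range(frames)]


def clip_seconds(frames: int = 16, fps: int = 8) -> float:
    """Playback duration in seconds: number of frames divided by frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


if __name__ == "__main__":
    names = frame_names()
    print(names[0], "...", names[-1])   # first and last file of the default run
    print(clip_seconds(), "seconds")    # duration of the default run
```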

Optional: ffmpeg for mp4 / gif / webp

ffmpeg is not a .NET dependency and is not bundled. To enable single-file video output, install it and make sure it is on PATH (or point FUUGA_FFMPEG at the executable):

  • Windows: winget install Gyan.FFmpeg (or choco install ffmpeg), or download from https://ffmpeg.org/download.html and add the bin folder to PATH.
  • macOS: brew install ffmpeg
  • Linux: sudo apt install ffmpeg (Debian/Ubuntu) or your distro's package manager.
  • Explicit path: set FUUGA_FFMPEG=/full/path/to/ffmpeg to use a specific build.
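
The lookup and fallback behavior described in this section can be sketched as follows. This is not the tool's implementation: the function names are hypothetical, and the precedence (FUUGA_FFMPEG first, then PATH) is an assumption based on the "explicit path" wording above.

```python
import os
import shutil
from typing import Mapping, Optional

SINGLE_FILE_FORMATS = {"mp4", "gif", "webp"}  # formats that need ffmpeg


def find_ffmpeg(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a path to ffmpeg, or None if it cannot be found."""
    explicit = env.get("FUUGA_FFMPEG")
    if explicit and os.path.isfile(explicit):
        return explicit  # assumption: the explicit path wins over PATH
    # If env has no PATH entry, shutil.which falls back to the process PATH.
    return shutil.which("ffmpeg", path=env.get("PATH"))


def effective_format(requested: str, ffmpeg_path: Optional[str]) -> str:
    """Fall back to a png image sequence when ffmpeg is needed but missing."""
    if requested in SINGLE_FILE_FORMATS and ffmpeg_path is None:
        print("note: ffmpeg not found; writing a png image sequence instead")
        return "png"
    return requested


if __name__ == "__main__":
    print(effective_format("mp4", find_ffmpeg()))
```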

Without ffmpeg, --format png (image sequence) still works and can be turned into a video later with any tool.
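
One way to do that later conversion is ffmpeg's image-sequence input. The sketch below only builds the command line and does not run it. It assumes frames are named frame_0000.png, frame_0001.png, and so on, and that the ffmpeg build includes the libx264 encoder; the function name and the output file name are hypothetical.

```python
import os
from typing import List, Optional


def ffmpeg_encode_cmd(frames_dir: str, output: str = "out.mp4", fps: int = 8,
                      ffmpeg: str = "ffmpeg",
                      audio: Optional[str] = None) -> List[str]:
    """Build an ffmpeg command that encodes a png sequence into an H.264 mp4."""
    pattern = os.path.join(frames_dir, "frame_%04d.png")  # assumed naming scheme
    cmd = [ffmpeg, "-y", "-framerate", str(fps), "-i", pattern]
    if audio:
        # Second input is the audio track; stop at the shorter of the two streams.
        cmd += ["-i", audio, "-shortest"]
    # yuv420p keeps the file playable in most players.
    cmd += ["-c:v", "libx264", "-pix_fmt", "yuv420p", output]
    return cmd


if __name__ == "__main__":
    print(" ".join(ffmpeg_encode_cmd("output/run1", fps=8)))
```

Set fps to the value used at generation time so the clip plays at the intended speed.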

MCP Server Mode

Fuuga.Image can run as an MCP tool server over stdio, which lets Fuuga LLM (or another MCP client) call it:

dotnet run -- --mcp-server

This exposes these tools via JSON-RPC 2.0 over stdin/stdout:

  • fuuga_image_txt2img -- text-to-image generation
  • fuuga_image_img2img -- image-to-image transformation
  • fuuga_image_txt2video -- text-to-video generation (video models only)
  • fuuga_image_img2video -- image-to-video animation (video models only)
  • fuuga_image_caption -- image captioning

See architecture.md Section 18 for the full integration design.
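
To make the wire format concrete, here is a minimal sketch of the JSON-RPC 2.0 messages an MCP client would typically send over stdin. The method names (initialize, notifications/initialized, tools/list, tools/call) come from the MCP specification. The argument key "prompt", the client name, and the protocol version string are assumptions; the actual input schema of each tool should be read from the tools/list response.

```python
import json
from typing import Any, Dict, List, Optional


def rpc(method: str, params: Optional[Dict[str, Any]] = None,
        msg_id: Optional[int] = None) -> str:
    """Serialize one JSON-RPC 2.0 message; omitting the id makes it a notification."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)  # single line, as the stdio transport expects


def session_messages(prompt: str) -> List[str]:
    """Messages for a handshake, a tool listing, and one txt2img call."""
    return [
        rpc("initialize", {
            "protocolVersion": "2024-11-05",  # assumption: a version the server accepts
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.0"},
        }, msg_id=1),
        rpc("notifications/initialized"),
        rpc("tools/list", msg_id=2),
        rpc("tools/call", {
            "name": "fuuga_image_txt2img",
            "arguments": {"prompt": prompt},  # hypothetical argument name
        }, msg_id=3),
    ]


if __name__ == "__main__":
    for line in session_messages("a cat sitting on a windowsill"):
        print(line)
```

Each line would be written, followed by a newline, to the standard input of the process started with dotnet run -- --mcp-server, and responses read line by line from its standard output.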

F# Script Demo

An interactive demo script shows how to use Fuuga.Image as an F# library — calling the modules directly with record-based configuration, validation, and I/O:

dotnet build Fuuga.Image
dotnet fsi examples/image-demo.fsx

The script runs without models (dry-run mode), exercising types, validation, image I/O, and prompt templates. Uncomment Section 5 and set your model paths for actual generation. See examples/image-demo.fsx for the full source.

Project Structure

Fuuga.Image/
  Types.fs        Layer 0: Domain types, error handling (ImageError DU)
  Config.fs       Layer 1: CLI argument parsing
  ImageIO.fs      Layer 2: Image load/save, format conversion, validation
  Diffusion.fs    Layer 3: Stable Diffusion model wrapper (txt2img, img2img)
  Caption.fs      Layer 4: Phi-3.5-vision captioning (ONNX Runtime GenAI)
  McpServer.fs    Layer 5: MCP JSON-RPC 2.0 server over stdio
  Program.fs      Layer 6: Entry point, subcommand routing, error handling

Architecture

Target frameworks

.NET: net10.0 is compatible. Computed: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.

| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.12 | 44 | 7/3/2026 |

1.0.11: Compatibility release — the new `fuuga image` command in the Fuuga package drives this tool's txt2img / img2img / caption tools over MCP.