Fuuga.Image.cpu 1.0.12

.NET 10.0

dotnet add package Fuuga.Image.cpu --version 1.0.12

NuGet\Install-Package Fuuga.Image.cpu -Version 1.0.12

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Fuuga.Image.cpu" Version="1.0.12" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.

Directory.Packages.props:

<PackageVersion Include="Fuuga.Image.cpu" Version="1.0.12" />

Project file:

<PackageReference Include="Fuuga.Image.cpu" />

paket add Fuuga.Image.cpu --version 1.0.12

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: Fuuga.Image.cpu, 1.0.12"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package Fuuga.Image.cpu@1.0.12

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Install as a Cake Addin:

#addin nuget:?package=Fuuga.Image.cpu&version=1.0.12

Install as a Cake Tool:

#tool nuget:?package=Fuuga.Image.cpu&version=1.0.12

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Fuuga.Image

Fuuga.Image is an image generation MCP client for Fuuga (or other agents). It lets the agent produce images via other models, such as Stable Diffusion models from HuggingFace.

Standalone CLI tool for image generation and captioning, built on stable-diffusion.cpp and Phi-3.5-vision.

Commands

fuuga-image txt2img     Generate an image from a text prompt
fuuga-image img2img     Transform an image guided by a text prompt
fuuga-image txt2video   Generate a video from a text prompt (video models only)
fuuga-image img2video   Animate a still image into a video (video models only)
fuuga-image vid2vid     Edit/continue a video guided by text (VACE; video models only)
fuuga-image caption     Generate a text description of an image
fuuga-image info        Display model metadata

Quick Start

dotnet build

# Text-to-image
dotnet run -- txt2img --model path/to/model.ckpt --prompt "a cat sitting on a windowsill" --steps 20

# Image-to-image
dotnet run -- img2img --model path/to/model.ckpt --source input.png --prompt "in watercolor style" --strength 0.75

# Text-to-video (requires a video model, e.g. a Wan2.1/2.2 T2V GGUF)
dotnet run -- txt2video --model path/to/wan-t2v.gguf --prompt "a wave crashing on rocks" --frames 16 --fps 8 --format mp4

# Image-to-video (animate a still)
dotnet run -- img2video --model path/to/wan-i2v.gguf --source photo.png --prompt "gentle zoom" --frames 24 --format gif

# Video-to-video (edit/continue a video guided by text; --source is a video file or a frames dir)
dotnet run -- vid2vid --model path/to/wan-vace.gguf --source clip.mp4 --prompt "make it snowy" --format mp4

# Attach an audio track to a generated mp4 (models don't make audio; this muxes a provided file)
dotnet run -- txt2video --model path/to/wan-t2v.gguf --prompt "a marching band" --format mp4 --audio track.mp3

# Caption an image
dotnet run -- caption --model path/to/vision-model/ --image photo.jpg --detail standard

# Caption a video (samples frames and describes them; --video is a file or a frames dir)
dotnet run -- caption --model path/to/vision-model/ --video clip.mp4 --frame-samples 8 --detail detailed

# Transcribe/describe audio (needs an audio-capable model, e.g. Phi-4-multimodal — same runtime)
dotnet run -- caption --model path/to/phi-4-multimodal/ --audio speech.wav

# Model info
dotnet run -- info --model path/to/model.ckpt
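
When these commands are driven from a script, the argument list can be assembled programmatically. The following is a minimal Python sketch, not part of the package: the helper name `build_args` is hypothetical, and it only uses the subcommands and flags shown above.

```python
from typing import List, Mapping, Union

Value = Union[str, int, float, bool]


def build_args(command: str, options: Mapping[str, Value]) -> List[str]:
    """Turn a subcommand and an options mapping into an argv list.

    Hypothetical helper: it mirrors the `dotnet run -- <command> --flag value`
    shape of the Quick Start examples and nothing more.
    """
    argv = ["dotnet", "run", "--", command]
    for name, value in options.items():
        flag = f"--{name}"
        if isinstance(value, bool):
            # Switches such as --gpu or --verbose take no value.
            if value:
                argv.append(flag)
        else:
            argv.extend([flag, str(value)])
    return argv


args = build_args(
    "txt2img",
    {"model": "path/to/model.ckpt", "prompt": "a cat sitting on a windowsill", "steps": 20},
)
```

Passing the resulting list to `subprocess.run(args, check=True)` avoids shell quoting problems with prompts that contain spaces or quotes.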

Common Options

Option	Description	Default
`--model <path>`	Path to model file or directory	(required)
`--output <dir>`	Output directory	`output`
`--format <fmt>`	Output format: `png` or `jpeg` (image commands; video commands accept the formats listed under their own options)	`png`
`--gpu`	Enable GPU acceleration	off
`--verbose`	Enable detailed logging	off

txt2img Options

Option	Description	Default
`--prompt <text>`	Text description of desired image	(required)
`--negative <text>`	What to avoid in the image	none
`--width <int>`	Image width in pixels (multiple of 8)	512
`--height <int>`	Image height in pixels (multiple of 8)	512
`--steps <int>`	Denoising steps	20
`--cfg <float>`	Classifier-free guidance scale	7.0
`--seed <int>`	Random seed (-1 for random)	-1
`--sampler <name>`	`euler`, `euler_a`, `heun`, `dpm2`, `dpm++2s`, `lcm`	`euler_a`
`--batch <int>`	Number of images to generate	1
`--vae <path>`	External VAE model	none
`--lora <path>`	LoRA weights	none
`--quantization <q>`	`f32`, `f16`, `q8_0`, `q5_1`, `q4_0`	`f16`
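
The constraints in the table above (dimensions in multiples of 8, a fixed set of samplers and quantization types) can be checked before a long model load. This is a sketch in Python under the assumption that a caller wants to pre-validate arguments; the function names are hypothetical and the tool's own validation may differ.

```python
from typing import List

# Values copied from the txt2img options table.
SAMPLERS = {"euler", "euler_a", "heun", "dpm2", "dpm++2s", "lcm"}
QUANTIZATIONS = {"f32", "f16", "q8_0", "q5_1", "q4_0"}


def validate_txt2img(width: int = 512, height: int = 512,
                     sampler: str = "euler_a",
                     quantization: str = "f16") -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    errors = []
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % 8 != 0:
            errors.append(f"--{name} must be a positive multiple of 8, got {value}")
    if sampler not in SAMPLERS:
        errors.append(f"--sampler must be one of {sorted(SAMPLERS)}, got {sampler!r}")
    if quantization not in QUANTIZATIONS:
        errors.append(f"--quantization must be one of {sorted(QUANTIZATIONS)}, got {quantization!r}")
    return errors


def round_up_to_multiple_of_8(n: int) -> int:
    """Smallest multiple of 8 that is >= n (for snapping arbitrary sizes)."""
    return ((n + 7) // 8) * 8
```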

img2img Options

Same as txt2img, plus:

Option	Description	Default
`--source <path>`	Source image to transform	(required)
`--strength <float>`	Denoising strength 0.0-1.0	0.75

caption Options

Captioning uses whatever multimodal model directory you pass to --model. Provide one of --image (a still), --video (a video file decoded via ffmpeg, or a directory of frames, sampled to --frame-samples), or --audio (an audio clip to transcribe or describe). Audio needs an audio-capable model such as Phi-4-multimodal. Such a model runs on the same ONNX Runtime GenAI runtime that is already bundled, so no extra dependency is needed: just point --model at it.

Option	Description	Default
`--image <path>`	Image to caption	(one of image/video/audio)
`--video <path>`	Video file or frames directory to describe	(one of image/video/audio)
`--audio <path>`	Audio file to transcribe/describe (audio-capable model)	(one of image/video/audio)
`--frame-samples <N>`	Frames sampled from the video	8
`--detail <level>`	`brief`, `standard`, `detailed`	`standard`
`--max-tokens <int>`	Maximum tokens to generate	256
`--prompt <text>`	Custom prompt for image captioning (must contain `<|image_1|>`)	none
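
The input rules for caption can be expressed as a small pre-flight check. The Python sketch below assumes that "one of" means exactly one input source, and that the image placeholder token is only required when a custom prompt is combined with --image. Both are assumptions, and `check_caption_args` is a hypothetical name, not part of the tool.

```python
from typing import List, Optional

IMAGE_TOKEN = "<|image_1|>"  # placeholder required in custom image prompts
DETAIL_LEVELS = {"brief", "standard", "detailed"}


def check_caption_args(image: Optional[str] = None,
                       video: Optional[str] = None,
                       audio: Optional[str] = None,
                       prompt: Optional[str] = None,
                       detail: str = "standard") -> List[str]:
    """Return a list of problems with a caption invocation (empty if none)."""
    errors = []
    sources = [s for s in (image, video, audio) if s is not None]
    if len(sources) != 1:
        # Assumption: exactly one input source per invocation.
        errors.append("provide exactly one of --image, --video, or --audio")
    if detail not in DETAIL_LEVELS:
        errors.append(f"--detail must be one of {sorted(DETAIL_LEVELS)}")
    if prompt is not None and image is not None and IMAGE_TOKEN not in prompt:
        errors.append(f"custom --prompt must contain {IMAGE_TOKEN}")
    return errors
```

For example, `check_caption_args(image="photo.jpg", prompt="<|image_1|> Describe the lighting.")` reports no problems, while omitting the token does.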

txt2video / img2video / vid2vid Options

Video is generated only when the loaded model supports it (e.g. Wan2.1/2.2 text-to-video, image-to-video, or VACE GGUF models); the tool checks the model's capability and errors clearly otherwise. img2video additionally takes --source <path> (the starting image); vid2vid takes --source <path> (a video file or a directory of frame images) plus --vace-strength <f>.

Video models produce no audio. --audio <path> muxes a user-provided audio file into an mp4/webp output via ffmpeg (ignored for gif / image sequences).

Option	Description	Default
`--frames <int>`	Number of frames to generate	16
`--fps <int>`	Frames per second of the output	8
`--width` / `--height`	Frame size	512
`--steps` / `--cfg` / `--seed` / `--sampler`	Same as txt2img
`--format <fmt>`	`png` (image sequence, default), `mp4`, `gif`, `webp`	`png`

Output formats. png writes an image sequence (frame_0000.png …) into a per-run subdirectory — this is dependency-free and always available. mp4, gif, and webp are encoded from those frames with ffmpeg, which is an optional external tool (see below). If a single-file format is requested but ffmpeg is not found, the tool falls back to an image sequence and prints a note.
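
The fallback rule described above amounts to a small decision function. The following Python sketch illustrates that rule only; the function name and the wording of the note are hypothetical, not the tool's actual output.

```python
from typing import Optional, Tuple

# Formats that are encoded from the frame sequence with ffmpeg.
SINGLE_FILE_FORMATS = {"mp4", "gif", "webp"}


def resolve_output_format(requested: str,
                          ffmpeg_available: bool) -> Tuple[str, Optional[str]]:
    """Return (format_to_write, note).

    If a single-file format is requested without ffmpeg, fall back to the
    dependency-free PNG image sequence and return a note for the user.
    """
    if requested in SINGLE_FILE_FORMATS and not ffmpeg_available:
        note = f"ffmpeg not found; writing a PNG image sequence instead of {requested}"
        return "png", note
    return requested, None
```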

Optional: ffmpeg for mp4 / gif / webp

ffmpeg is not a .NET dependency and is not bundled. To enable single-file video output, install it and make sure it is on PATH (or point FUUGA_FFMPEG at the executable):

Windows: winget install Gyan.FFmpeg (or choco install ffmpeg), or download from https://ffmpeg.org/download.html and add the bin folder to PATH.
macOS: brew install ffmpeg
Linux: sudo apt install ffmpeg (Debian/Ubuntu) or your distro's package manager.
Explicit path: set FUUGA_FFMPEG=/full/path/to/ffmpeg to use a specific build.
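
A script that wraps the tool can locate ffmpeg the same way before requesting a single-file format. The sketch below is in Python and assumes that FUUGA_FFMPEG takes precedence over PATH; the text above does not state the precedence, so treat that order as an assumption. The `find_ffmpeg` name is hypothetical.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_ffmpeg(env: Optional[Mapping[str, str]] = None,
                which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return a path to ffmpeg, or None if it cannot be located.

    Assumed lookup order: the FUUGA_FFMPEG environment variable first,
    then a normal PATH search. `env` and `which` are injectable for testing.
    """
    env = os.environ if env is None else env
    explicit = env.get("FUUGA_FFMPEG")
    if explicit:
        return explicit
    return which("ffmpeg")
```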

Without ffmpeg, --format png (image sequence) still works and can be turned into a video later with any tool.
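
As one way to do that later conversion, the Python sketch below builds an ffmpeg command line for a frame sequence named frame_0000.png, frame_0001.png, and so on. The ffmpeg flags are standard, but the H.264 encoder (libx264) is an assumption about the local ffmpeg build, and `frames_to_mp4_command` is a hypothetical helper.

```python
from typing import List


def frames_to_mp4_command(frames_dir: str, output: str, fps: int = 8,
                          ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an ffmpeg argv that encodes frame_%04d.png files into an mp4."""
    # Forward slashes are accepted by ffmpeg on all platforms.
    pattern = f"{frames_dir}/frame_%04d.png"
    return [
        ffmpeg,
        "-framerate", str(fps),   # input frame rate of the image sequence
        "-i", pattern,
        "-c:v", "libx264",        # assumes the build includes libx264
        "-pix_fmt", "yuv420p",    # widely compatible pixel format
        output,
    ]


cmd = frames_to_mp4_command("output/run1", "clip.mp4", fps=8)
```

Run the list with `subprocess.run(cmd, check=True)`. The directory name `output/run1` is only an example, since the per-run subdirectory name is chosen by the tool.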

MCP Server Mode

Fuuga.Image can run as an MCP tool server over stdio, which lets Fuuga LLM integrate with it via MCP:

dotnet run -- --mcp-server

This exposes these tools via JSON-RPC 2.0 over stdin/stdout:

fuuga_image_txt2img -- text-to-image generation
fuuga_image_img2img -- image-to-image transformation
fuuga_image_txt2video -- text-to-video generation (video models only)
fuuga_image_img2video -- image-to-video animation (video models only)
fuuga_image_caption -- image captioning

See architecture.md Section 18 for the full integration design.
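
For orientation, a tool invocation over MCP is a JSON-RPC 2.0 request with the method `tools/call`, written to the server's stdin as a single line of JSON. The Python sketch below builds such a message. The argument names (`prompt`, `steps`) are hypothetical and simply mirror the CLI flags; the real input schema should be read from the server's `tools/list` response. A client must also complete the MCP `initialize` handshake before calling tools.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # JSON-RPC request ids must be unique per session


def tools_call_message(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps without indent emits no raw newlines, which the stdio
    # transport requires for message framing.
    return json.dumps(request) + "\n"


# Hypothetical argument names; check tools/list for the actual schema.
line = tools_call_message("fuuga_image_txt2img", {"prompt": "a cat", "steps": 20})
```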

F# Script Demo

An interactive demo script shows how to use Fuuga.Image as an F# library — calling the modules directly with record-based configuration, validation, and I/O:

dotnet build Fuuga.Image
dotnet fsi examples/image-demo.fsx

The script runs without models (dry-run mode), exercising types, validation, image I/O, and prompt templates. Uncomment Section 5 and set your model paths for actual generation. See examples/image-demo.fsx for the full source.

Project Structure

Fuuga.Image/
  Types.fs        Layer 0: Domain types, error handling (ImageError DU)
  Config.fs       Layer 1: CLI argument parsing
  ImageIO.fs      Layer 2: Image load/save, format conversion, validation
  Diffusion.fs    Layer 3: Stable Diffusion model wrapper (txt2img, img2img)
  Caption.fs      Layer 4: Phi-3.5-vision captioning (ONNX Runtime GenAI)
  McpServer.fs    Layer 5: MCP JSON-RPC 2.0 server over stdio
  Program.fs      Layer 6: Entry point, subcommand routing, error handling

Architecture

Image Extension Architecture -- standalone design
Main Architecture Section 18 -- Fuuga LLM integration

Product	Compatible and additional computed target framework versions.
.NET	net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


net10.0
- FSharp.Core (>= 10.1.301)
- HPPH.SkiaSharp (>= 1.0.0)
- Microsoft.ML.OnnxRuntimeGenAI (>= 0.14.1)
- StableDiffusion.NET (>= 7.0.0)
- StableDiffusion.NET.Backend.Cpu (>= 7.0.0)


Version	Downloads	Last Updated
1.0.12	44	7/3/2026

1.0.11: Compatibility release — the new `fuuga image` command in the Fuuga package drives this tool's txt2img / img2img / caption tools over MCP.