MagicOnnxRuntimeGenAi 0.4.0.3
dotnet add package MagicOnnxRuntimeGenAi --version 0.4.0.3
NuGet\Install-Package MagicOnnxRuntimeGenAi -Version 0.4.0.3
<PackageReference Include="MagicOnnxRuntimeGenAi" Version="0.4.0.3" />
paket add MagicOnnxRuntimeGenAi --version 0.4.0.3
#r "nuget: MagicOnnxRuntimeGenAi, 0.4.0.3"
// Install MagicOnnxRuntimeGenAi as a Cake Addin
#addin nuget:?package=MagicOnnxRuntimeGenAi&version=0.4.0.3

// Install MagicOnnxRuntimeGenAi as a Cake Tool
#tool nuget:?package=MagicOnnxRuntimeGenAi&version=0.4.0.3
MagicOnnxRuntimeGenAI
MagicOnnxRuntimeGenAI is an extension of the Microsoft.ML.OnnxRuntimeGenAI
library that removes the limitations associated with hardware utilization and platform compatibility. It allows you to run multiple AI models on different hardware environments (CPU, CUDA, DirectML) simultaneously in a single instance, solving the original library’s constraint of choosing only one type of hardware at a time. The goal is to maintain code similarity with the original library, while enhancing flexibility and scalability.
Nuget
CPU: https://www.nuget.org/packages/MagicOnnxRuntimeGenAi.Cpu/0.4.0.2
DirectML: https://www.nuget.org/packages/MagicOnnxRuntimeGenAi.DirectML/0.4.0.2
Cuda: (still in progress; the 0.4.0.1 release isn't fully working, and 0.4.0.2 will be a working version) https://www.nuget.org/packages/MagicOnnxRuntimeGenAi.Cuda/0.4.0.2
The main CUDA DLLs are hosted on HuggingFace and are downloaded at runtime due to free storage limitations (which is arguably the better approach anyway). They are the exact DLLs from Microsoft, so if you prefer, you can use Microsoft's official ones for the OnnxRuntimeGenAI library from their GitHub:
Microsoft GenAI Github: https://github.com/microsoft/onnxruntime-genai
My HuggingFace Dataset hosted DLLs: https://huggingface.co/datasets/magiccodingman/MagicOnnxRuntimeGenAI
Features
- Multi-hardware support: Run CPU, CUDA, and DirectML versions in parallel, enabling better performance scaling across platforms.
- Automatic library path handling: Dynamically manage paths to hardware-specific DLLs, eliminating conflicts from shared DLL names.
- Simple migration: Maintains close compatibility with Microsoft.ML.OnnxRuntimeGenAI. Just add "Magic" to the class names to switch to the enhanced version.
- Cross-platform AI scaling: Utilize different hardware setups on platforms like Android, iOS, Windows, Linux, and Mac.
- ASP.NET and client-side AI models: Run AI models on different devices and environments without being restricted to server-side execution.
- Automated DirectML setup: Automatically adds the DirectML.dll to your output directory.
- XUnit test support: Includes test samples showcasing CPU and DirectML models running in parallel for better validation.
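To make the "just add Magic" migration concrete, here is a minimal sketch. The `Magic*` constructor shapes match the examples in this README; the original `Model`/`Tokenizer` usage and the model paths shown are illustrative assumptions, not taken from this package's docs:

```csharp
// Before: the original Microsoft.ML.OnnxRuntimeGenAI types
// using Microsoft.ML.OnnxRuntimeGenAI;
// var model = new Model(modelPath);        // locked to one hardware backend per process
// var tokenizer = new Tokenizer(model);

// After: the Magic equivalents (same constructor shapes)
// using MagicOnnxRuntimeGenAi;
var cpuModel = new MagicModel(@"C:\models\phi3-mini-cpu");  // hypothetical path
var dmlModel = new MagicModel(@"C:\models\phi3-mini-dml");  // hypothetical path
var cpuTokenizer = new MagicTokenizer(cpuModel);
var dmlTokenizer = new MagicTokenizer(dmlModel);
// Both models now live side by side in one process, each on its own hardware backend.
```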
Motivation
The original OnnxRuntimeGenAI library imposes limitations on using hardware acceleration across different platforms. For instance, running AI tasks on CUDA restricts you to NVIDIA GPUs, while DirectML is Windows-only. These restrictions make it difficult to scale AI solutions across platforms like mobile, web, and desktop applications. With MagicOnnxRuntimeGenAI, you can overcome these barriers and run different models on various hardware configurations (CPU, GPU, NPU) in parallel.
Use Cases
- Running a text embedding LLM on one CPU thread while simultaneously running another LLM on another CPU thread, and utilizing DirectML for a larger LLM.
- Developing an AI-powered client-side application using a platform like MAUI Blazor, which can scale across different platforms.
- Creating an ASP.NET REST API that manages multiple AI models across various hardware environments.
- Maintaining flexibility and scalability while minimizing server-side dependencies, reducing latency, and improving control.
Library Structure
The key libraries (cpu, cuda, dml) are separated into different folders, avoiding conflicts due to identical DLL names. This allows you to utilize all three in a single application.
For DirectML users, MagicOnnxRuntimeGenAI automatically includes the required DirectML.dll in the output directory.
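A sketch of what the separated output layout might look like. The folder and file names here are illustrative assumptions; check your own build output for the exact layout:

```
bin/Release/net8.0/
├── cpu/           # onnxruntime-genai native DLLs built for CPU
├── cuda/          # onnxruntime-genai native DLLs built for CUDA
├── dml/           # onnxruntime-genai native DLLs built for DirectML
└── DirectML.dll   # copied automatically for DirectML users
```

Because each backend's DLLs sit in their own folder, the identically named native libraries no longer overwrite each other in the output directory.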
Quickstart Example
Here are some examples showcasing how to use MagicOnnxRuntimeGenAI:
CPU Model Example
```csharp
/// <summary>
/// Run a model using the CPU.
/// </summary>
[Fact]
public async Task Phi3MiniCpuResponse()
{
    var model = new MagicModel(GlobalSetup.CpuModelPath);
    var tokenizer = new MagicTokenizer(model);
    string systemPrompt = @"You're a helpful AI assistant.";
    string userPrompt = @"Write a very short story about a goblin becoming a hero and saving the princess.";

    var aiResponse = await new CallAi().GenerateAIResponseV6(model, tokenizer, systemPrompt, userPrompt, null, 4000, ConsoleColor.Red);

    var endAiMessage = aiResponse.UpdatedHistory.LastOrDefault().aiResponse;
    _output.WriteLine(endAiMessage);
    Assert.True(!string.IsNullOrWhiteSpace(endAiMessage));
}
```
DirectML Model Example (Windows Only)
```csharp
/// <summary>
/// Run a model using DirectML (Windows-only).
/// </summary>
[Fact]
public async Task Phi3MiniDmlResponse()
{
    var model = new MagicModel(GlobalSetup.DmlModelPath);
    var tokenizer = new MagicTokenizer(model);
    string systemPrompt = @"You're a helpful AI assistant.";
    string userPrompt = @"Write a very short story about a goblin becoming a hero and saving the princess.";

    var aiResponse = await new CallAi().GenerateAIResponseV6(model, tokenizer, systemPrompt, userPrompt, null, 4000, ConsoleColor.Red);

    var endAiMessage = aiResponse.UpdatedHistory.LastOrDefault().aiResponse;
    _output.WriteLine(endAiMessage);
    Assert.True(!string.IsNullOrWhiteSpace(endAiMessage));
}
```
Parallel Execution: CPU and DirectML Models
```csharp
/// <summary>
/// Run both CPU and DirectML models in parallel.
/// </summary>
[Fact]
public async Task Phi3MiniDmlAndCpuResponse()
{
    var cpuModel = new MagicModel(GlobalSetup.CpuModelPath);
    var dmlModel = new MagicModel(GlobalSetup.DmlModelPath);
    var cpuTokenizer = new MagicTokenizer(cpuModel);
    var dmlTokenizer = new MagicTokenizer(dmlModel);

    string systemPrompt = @"You're a helpful AI assistant.";
    string userPrompt = @"Write a very short story about a goblin becoming a hero and saving the princess.";

    // Start the CPU model response task
    var cpuResponseTask = Task.Run(() =>
        new CallAi().GenerateAIResponseV6(cpuModel, cpuTokenizer, systemPrompt, userPrompt, null, 4000, ConsoleColor.Red));

    // Start the DML model response task with a delay
    var dmlResponseTask = Task.Run(async () =>
    {
        await Task.Delay(6000); // Delay so the CPU model starts first
        return await new CallAi().GenerateAIResponseV6(dmlModel, dmlTokenizer, systemPrompt, userPrompt, null, 4000, ConsoleColor.Red);
    });

    // Await both tasks
    var results = await Task.WhenAll(cpuResponseTask, dmlResponseTask);

    // Extract and output responses
    var cpuResponse = results[0].UpdatedHistory.LastOrDefault().aiResponse;
    var dmlResponse = results[1].UpdatedHistory.LastOrDefault().aiResponse;
    _output.WriteLine(cpuResponse);
    _output.WriteLine(dmlResponse);

    Assert.True(!string.IsNullOrWhiteSpace(cpuResponse), "CPU model response should not be null or whitespace.");
    Assert.True(!string.IsNullOrWhiteSpace(dmlResponse), "DML model response should not be null or whitespace.");
}
```
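The underlying ONNX Runtime GenAI types hold native resources, so it is worth releasing models and tokenizers when you are done with them. A minimal sketch, assuming the Magic wrappers implement IDisposable like the original Model and Tokenizer; this is an assumption, not something this README confirms:

```csharp
// Assumption: MagicModel and MagicTokenizer implement IDisposable,
// mirroring Model and Tokenizer in Microsoft.ML.OnnxRuntimeGenAI.
using var model = new MagicModel(@"C:\models\phi3-mini-cpu"); // hypothetical path
using var tokenizer = new MagicTokenizer(model);
// ... generate responses ...
// Native handles are released when the using scope ends.
```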
Future Development
- CUDA support: Plans to extend the capabilities; currently replicates what's in the original GenAI.Cuda.
- OnnxRuntime support: There are plans to extend the Magic approach to the larger OnnxRuntime library.
- Automatic GenAI updates: Automating the update process to newer versions of OnnxRuntimeGenAI.
- New projects: Future projects will build on this, making AI easier to use with higher-level abstractions.
Contributing
Contributions are welcome! If you wish to add features, please ensure you include relevant XUnit tests to make merging easier. The project’s goal is to remain as close to the original OnnxRuntimeGenAI as possible, with minimal changes.
How to Contribute
- Fork the repository.
- Create a new branch (feature/my-feature).
- Write your code and accompanying unit tests.
- Submit a pull request.
Product | Compatible and additional computed target framework versions |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
- .NETStandard 2.0
  - Newtonsoft.Json (>= 13.0.3)
  - System.Memory (>= 4.5.5)
- net8.0
  - Newtonsoft.Json (>= 13.0.3)
  - System.Memory (>= 4.5.5)
NuGet packages (3)
Showing the top 3 NuGet packages that depend on MagicOnnxRuntimeGenAi:
Package | Description |
---|---|
MagicOnnxRuntimeGenAi.Cpu | OnnxRuntimeGenAI.CPU variant that removes the limitation of not being able to use CPU, Cuda, and DirectML at once. |
MagicOnnxRuntimeGenAi.DirectML | OnnxRuntimeGenAI.DirectML variant that removes the limitation of not being able to use CPU, Cuda, and DirectML at once. |
MagicOnnxRuntimeGenAi.Cuda | OnnxRuntimeGenAI.Cuda variant that removes the limitation of not being able to use CPU, Cuda, and DirectML at once. |