Microsoft.ML.OnnxRuntimeGenAI
0.1.0-rc4
dotnet add package Microsoft.ML.OnnxRuntimeGenAI --version 0.1.0-rc4
NuGet\Install-Package Microsoft.ML.OnnxRuntimeGenAI -Version 0.1.0-rc4
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" Version="0.1.0-rc4" />
paket add Microsoft.ML.OnnxRuntimeGenAI --version 0.1.0-rc4
#r "nuget: Microsoft.ML.OnnxRuntimeGenAI, 0.1.0-rc4"
// Install Microsoft.ML.OnnxRuntimeGenAI as a Cake Addin
#addin nuget:?package=Microsoft.ML.OnnxRuntimeGenAI&version=0.1.0-rc4&prerelease
// Install Microsoft.ML.OnnxRuntimeGenAI as a Cake Tool
#tool nuget:?package=Microsoft.ML.OnnxRuntimeGenAI&version=0.1.0-rc4&prerelease
ONNX Runtime Generative AI
Run generative AI models with ONNX Runtime.
This library provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.
Users can call a high-level generate() method, or run each iteration of the model in a loop.
- Supports greedy and beam search, plus Top P and Top K sampling, to generate token sequences
- Built-in logits processing such as repetition penalties
- Easy custom scoring
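To make the loop described above concrete, here is a minimal sketch in plain Python. It is not the library's implementation: `toy_logits` is a hypothetical stand-in for a model forward pass, and the repetition penalty uses the common divide-or-multiply scheme, which is an assumption about how such a penalty is typically applied. Each iteration runs the "model", processes the logits, and picks the next token greedily.

```python
def toy_logits(tokens, vocab_size=5):
    """Hypothetical stand-in for a model: one score per vocabulary token.

    It slightly prefers repeating the last token (2.0) over moving on to
    the next token id (1.8), so an unpenalized greedy search gets stuck.
    """
    last = tokens[-1]
    scores = [0.0] * vocab_size
    scores[(last + 1) % vocab_size] = 1.8
    scores[last] = 2.0
    return scores


def apply_repetition_penalty(logits, tokens, penalty):
    """Lower the score of every token that was already generated."""
    out = list(logits)
    for t in set(tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def generate_greedy(prompt_tokens, max_length, penalty=1.2, eos_token=None):
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        logits = toy_logits(tokens)                                 # 1. run the model
        logits = apply_repetition_penalty(logits, tokens, penalty)  # 2. process logits
        next_token = max(range(len(logits)), key=logits.__getitem__)  # 3. greedy pick
        tokens.append(next_token)
        if next_token == eos_token:
            break
    return tokens


print(generate_greedy([0], max_length=5, penalty=1.2))  # penalty breaks the repetition
print(generate_greedy([0], max_length=5, penalty=1.0))  # no penalty: repeats token 0
```

With the penalty set to 1.0 (disabled) the toy model keeps emitting the same token, while a penalty of 1.2 is enough to push the search on to new tokens.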
Features
- Supported model architectures:
  - Gemma
  - LLaMA
  - Mistral
  - Phi-2
- Supported targets:
  - CPU
  - GPU (CUDA)
- Supported sampling features:
  - Beam search
  - Greedy search
  - Top P/Top K
- APIs:
  - Python
  - C/C++
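As an illustration of the Top P/Top K sampling features listed above, the following sketch filters a logits vector and samples from what remains. It uses only the Python standard library and is not the library's code; the function names are hypothetical, and the choice to apply Top K first and then Top P over the original probabilities is an assumption (implementations differ on this detail).

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Return {token: renormalized probability} after Top K, then Top P."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda t: probs[t], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # Top K: keep the k most likely tokens
    kept, cumulative = [], 0.0
    for t in ranked:  # Top P: keep the smallest prefix whose mass reaches top_p
        kept.append(t)
        cumulative += probs[t]
        if cumulative >= top_p:
            break
    total = sum(probs[t] for t in kept)
    return {t: probs[t] / total for t in kept}


def sample(logits, top_k=0, top_p=1.0, rng=random):
    dist = top_k_top_p_filter(logits, top_k, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print(top_k_top_p_filter(logits, top_k=2))    # only tokens 0 and 1 survive
print(top_k_top_p_filter(logits, top_p=0.5))  # token 0 alone covers half the mass
print(sample(logits, top_k=3, top_p=0.9))
```

Setting `top_k=1` reduces this to greedy search, since only the most likely token is left to sample from.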
Coming very soon
- Support for the Whisper model architecture
- C# API
- Support for DirectML
Roadmap
- Automatic model download and cache
- More model architectures
Sample code for phi-2 in Python
Install onnxruntime-genai.
(Temporary) Build and install from source according to the instructions below.
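import onnxruntime_genai as og

# Assumption: this release selects the target via og.DeviceType; use og.DeviceType.CUDA for GPU.
device_type = og.DeviceType.CPU
model = og.Model('models/microsoft/phi-2', device_type)
tokenizer = og.Tokenizer(model)
prompt = '''def print_prime(n):
    """
    Print all primes between 1 and n
    """'''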
tokens = tokenizer.encode(prompt)
params = og.SearchParams(model)
params.set_search_options({"max_length":200})
params.input_ids = tokens
output_tokens = model.generate(params)
text = tokenizer.decode(output_tokens)
print("Output:")
print(text)
Build from source
This step requires cmake to be installed.
Clone this repo
git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai
Install ONNX Runtime
By default, the onnxruntime-genai build expects to find the ONNX Runtime include and binaries in a folder called ort in the root directory of onnxruntime-genai. You can put the ONNX Runtime files in a different location and specify this location to the onnxruntime-genai build. These instructions use ORT_HOME as the location.
Install from release
These instructions are for the Linux GPU build of ONNX Runtime. Replace the location with the operating system and target of choice.
cd $ORT_HOME
wget https://github.com/microsoft/onnxruntime/releases/download/v1.17.1/onnxruntime-linux-x64-gpu-1.17.1.tgz
tar xvzf onnxruntime-linux-x64-gpu-1.17.1.tgz
mv onnxruntime-linux-x64-gpu-1.17.1/include .
mv onnxruntime-linux-x64-gpu-1.17.1/lib .
Or build from source
git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
Create include and lib folders in the ORT_HOME directory
mkdir $ORT_HOME/include
mkdir $ORT_HOME/lib
Build from source and copy the include and libraries into ORT_HOME
On Windows
build.bat --config RelWithDebInfo --build_shared_lib --skip_tests --parallel [--use_cuda]
copy include\onnxruntime\core\session\onnxruntime_c_api.h $ORT_HOME\include
copy build\Windows\RelWithDebInfo\RelWithDebInfo\*.dll $ORT_HOME\lib
On Linux
./build.sh --config RelWithDebInfo --build_shared_lib --skip_tests --parallel [--use_cuda]
cp include/onnxruntime/core/session/onnxruntime_c_api.h $ORT_HOME/include
cp build/Linux/RelWithDebInfo/libonnxruntime*.so* $ORT_HOME/lib
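Before building onnxruntime-genai, it can help to confirm that the ONNX Runtime folder has the layout the steps above produce. The following is a small, hypothetical helper (not part of either project): it assumes the header is named onnxruntime_c_api.h and that the shared libraries match the file patterns used in the copy commands above.

```python
import os
from pathlib import Path


def missing_ort_files(ort_home):
    """Return the expected entries that are missing under ort_home."""
    root = Path(ort_home)
    missing = []

    # Header copied into <ort_home>/include by the steps above.
    header = root / "include" / "onnxruntime_c_api.h"
    if not header.is_file():
        missing.append(str(header))

    # Shared libraries copied into <ort_home>/lib (Linux .so or Windows .dll).
    lib_dir = root / "lib"
    libs = []
    if lib_dir.is_dir():
        libs += list(lib_dir.glob("libonnxruntime*.so*"))
        libs += list(lib_dir.glob("onnxruntime*.dll"))
    if not libs:
        missing.append(str(lib_dir / "<onnxruntime shared library>"))

    return missing


# Falls back to the default "ort" folder when ORT_HOME is not set.
problems = missing_ort_files(os.environ.get("ORT_HOME", "ort"))
print("ONNX Runtime layout looks complete" if not problems else f"Missing: {problems}")
```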
Build onnxruntime-genai
If you are building for CUDA, add the cuda_home argument.
cd ..
python build.py [--cuda_home <path_to_cuda_home>]
Install Python wheel
cd build/wheel
pip install *.whl
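A quick way to confirm that the wheel landed in the active Python environment is to check whether the module can be found, without importing it. This is a generic standard-library sketch; `is_installed` is a hypothetical helper name, and `onnxruntime_genai` is the module name used by the sample code above.

```python
import importlib.util


def is_installed(module_name):
    """True if a top-level module can be located in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("onnxruntime_genai"):
    print("onnxruntime_genai is importable")
else:
    print("onnxruntime_genai not found; check that the wheel was installed "
          "into this Python environment")
```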
Model download and export
ONNX models are run from a local folder, via a path string supplied to the Model() constructor.
To source microsoft/phi-2 optimized for your target, download and run the following script. You will need to be logged in to Hugging Face via the CLI to run the script.
Install model builder dependencies.
pip install numpy
pip install transformers
pip install torch
pip install onnx
pip install onnxruntime
Export int4 CPU version
huggingface-cli login --token <your HuggingFace token>
python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -p int4 -e cpu -o <model folder>
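If you want to drive the export from a script, the command above can be assembled as an argument list. This sketch only builds and prints the command using the flags shown above (-m, -p, -e, -o); `builder_command` is a hypothetical helper, and the output folder is an assumption chosen to match the path used in the Python sample.

```python
import sys


def builder_command(model_name, precision, execution_provider, output_dir):
    """Build the argument list for the model builder shown above."""
    return [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", model_name,          # Hugging Face model id
        "-p", precision,           # e.g. int4, as in the command above
        "-e", execution_provider,  # e.g. cpu, as in the command above
        "-o", output_dir,          # folder later passed to Model()
    ]


cmd = builder_command("microsoft/phi-2", "int4", "cpu", "models/microsoft/phi-2")
print(" ".join(cmd))

# To actually run the export (requires the dependencies and login above):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the output folder contains spaces.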
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.1 is compatible. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
native | native is compatible. |
Tizen | tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETCoreApp 0.0
  - Microsoft.ML.OnnxRuntime (>= 1.17.1)
  - Microsoft.ML.OnnxRuntimeGenAI.Managed (>= 0.1.0-rc4)
- .NETFramework 0.0
  - Microsoft.ML.OnnxRuntime (>= 1.17.1)
  - Microsoft.ML.OnnxRuntimeGenAI.Managed (>= 0.1.0-rc4)
- .NETStandard 0.0
  - Microsoft.ML.OnnxRuntime (>= 1.17.1)
  - Microsoft.ML.OnnxRuntimeGenAI.Managed (>= 0.1.0-rc4)
NuGet packages (3)
Showing the top 3 NuGet packages that depend on Microsoft.ML.OnnxRuntimeGenAI:
Package | Description |
---|---|
Microsoft.SemanticKernel.Connectors.Onnx | Semantic Kernel connectors for the ONNX runtime. Contains clients for text embedding generation. |
Microsoft.KernelMemory.AI.Onnx | Provide access to ONNX LLM models in Kernel Memory to generate text |
feiyun0112.SemanticKernel.Connectors.OnnxRuntimeGenAI.CPU | Semantic Kernel connector for Microsoft.ML.OnnxRuntimeGenAI. |
GitHub repositories (2)
Showing the top 2 popular GitHub repositories that depend on Microsoft.ML.OnnxRuntimeGenAI:
Repository | Description |
---|---|
microsoft/semantic-kernel | Integrate cutting-edge LLM technology quickly and easily into your apps |
microsoft/kernel-memory | RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns. |
Version | Downloads | Last updated |
---|---|---|
0.5.2 | 60 | 11/25/2024 |
0.5.1 | 640 | 11/13/2024 |
0.5.0 | 575 | 11/7/2024 |
0.4.0 | 32,818 | 8/21/2024 |
0.4.0-rc1 | 199 | 8/14/2024 |
0.3.0 | 22,213 | 6/21/2024 |
0.3.0-rc2 | 1,523 | 5/29/2024 |
0.3.0-rc1 | 189 | 5/22/2024 |
0.2.0 | 552 | 5/20/2024 |
0.2.0-rc7 | 246 | 5/14/2024 |
0.2.0-rc6 | 193 | 5/4/2024 |
0.2.0-rc4 | 482 | 4/25/2024 |
0.2.0-rc3 | 160 | 4/24/2024 |
0.1.0 | 379 | 4/8/2024 |
0.1.0-rc4 | 212 | 3/27/2024 |
Introducing the ONNX Runtime GenAI Library.