KokoroSharp 0.5.1

.NET 8.0

dotnet add package KokoroSharp --version 0.5.1

NuGet\Install-Package KokoroSharp -Version 0.5.1

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="KokoroSharp" Version="0.5.1" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

paket add KokoroSharp --version 0.5.1

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: KokoroSharp, 0.5.1"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

// Install KokoroSharp as a Cake Addin
#addin nuget:?package=KokoroSharp&version=0.5.1

// Install KokoroSharp as a Cake Tool
#tool nuget:?package=KokoroSharp&version=0.5.1

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

https://github.com/user-attachments/assets/869e13c1-675e-4ff8-b89a-215bb802ca39

KokoroSharp

KokoroSharp is a fully-featured inference engine for Kokoro TTS, built entirely in C# with ONNX runtime. It enables developers to perform flexible and fast text-to-speech synthesis utilizing multiple speakers and languages.

Features

Plug & Play integration via the NuGet package. All dependencies are handled automatically.
The NuGet package includes all voices released by hexgrad with the Kokoro 82M v1.0 release.
High-level interface designed to suit both beginners and power users.
Text-segment streaming for seamless text-to-speech. Responses feel instant.
Voice mixing with no restriction on the number of voices mixed, plus the ability to save and load mixed voices.
Linear job scheduling with background worker as dispatcher.
Optional multi-platform playback support with pre-integrated audio queue handling.
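
As a rough illustration of the voice-mixing feature listed above, the sketch below blends any number of voices as a normalized weighted average. It does not use KokoroSharp's API: it assumes, for illustration only, that a voice can be treated as a flat float array, and `MixVoices` is a hypothetical helper name.

```csharp
using System;
using System.Linq;

// Hypothetical helper, NOT part of KokoroSharp: blends voice embeddings by weight.
// Assumption: every voice is a float[] of the same length.
static float[] MixVoices((float[] Embedding, float Weight)[] voices)
{
    if (voices.Length == 0)
        throw new ArgumentException("At least one voice is required.");
    int length = voices[0].Embedding.Length;
    if (voices.Any(v => v.Embedding.Length != length))
        throw new ArgumentException("All voice embeddings must have the same length.");
    float totalWeight = voices.Sum(v => v.Weight);
    if (totalWeight <= 0f)
        throw new ArgumentException("Weights must sum to a positive value.");

    var mixed = new float[length];
    foreach (var (embedding, weight) in voices)
    {
        float share = weight / totalWeight; // normalize so the shares sum to 1
        for (int i = 0; i < length; i++)
            mixed[i] += embedding[i] * share;
    }
    return mixed;
}

// Usage: three parts of voice A to one part of voice B.
float[] voiceA = { 1f, 0f };
float[] voiceB = { 0f, 1f };
float[] blend = MixVoices(new[] { (voiceA, 3f), (voiceB, 1f) });
Console.WriteLine(string.Join(" ", blend));
```

Because the weights are normalized, the number of voices passed in is unconstrained, which matches the "no restriction" claim in spirit.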

Supports languages/accents:

[American English, British English, Spanish, French, Italian, Brazilian Portuguese].

With a custom phonemization solution, these additional languages are also supported:

[Mandarin Chinese, Japanese, Hindi].

How to set up

On Windows, Linux, and macOS: Install via NuGet (Package Manager or CLI), and you're set!
On other platforms: Developers are expected to provide their own phonemization solution. The built-in tokenizer supports raw (phonemes -> tokens) conversion.

The package is accessible on all .NET platforms, but integrated phonemization is currently only available through the eSpeak NG backend.
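
To make the raw (phonemes -> tokens) step mentioned above more concrete, here is a minimal sketch of a character-level lookup. It is not KokoroSharp's tokenizer: the vocabulary IDs are made up for illustration, and `Tokenize` is a hypothetical helper. The real model ships its own vocabulary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical vocabulary: these IDs are invented for illustration only.
var vocab = new Dictionary<char, int>
{
    ['h'] = 1, ['ə'] = 2, ['l'] = 3, ['o'] = 4, ['ʊ'] = 5, [' '] = 6,
};

// Maps each phoneme character to its ID, skipping characters outside the table.
static int[] Tokenize(string phonemes, IReadOnlyDictionary<char, int> table)
{
    var ids = new List<int>(phonemes.Length);
    foreach (char c in phonemes)
    {
        if (table.TryGetValue(c, out int id))
            ids.Add(id);
    }
    return ids.ToArray();
}

// Pre-phonemized input, e.g. produced by a phonemizer of your choice.
int[] result = Tokenize("həloʊ", vocab);
Console.WriteLine(string.Join(",", result)); // prints "1,2,3,4,5"
```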

Getting started

KokoroTTS tts = KokoroTTS.LoadModel(); // Load or download the model (~320MB for full precision)
KokoroVoice heartVoice = KokoroVoiceManager.GetVoice("af_heart"); // Grab a voice of your liking,
while (true) { tts.SpeakFast(Console.ReadLine(), heartVoice); } // .. and have it speak your text!
// Note: Language detection is automated based on what the loaded voice supports.

Above is a simple way to get started on the highest level. For more control, check out the example Program, which covers more advanced parts like job scheduling, voice mixing, and long-term, speaker-agnostic playback queuing.

Models can be found on taylorchu's releases, and can be loaded via `KokoroTTS.LoadModel("path/to/model")`, or downloaded automatically with `KokoroTTS.LoadModel()`. Check out the various overloads of `KokoroTTS.LoadModel` for background loading.

Notes

KokoroSharp prioritizes a smooth developer experience by logging potential misuse instead of throwing exceptions. Wherever possible, the library attempts to automatically resolve issues to minimize disruptions.
All communication with the AI model and playback devices happens on background threads, letting the main thread focus on rendering the UI in peace. The library is carefully designed with thread-safety in mind.
The voices folder is automatically copied to your build path when you build, so the voices are ready to be accessed. The same applies to the eSpeak NG backends. Developers may opt to remove them when shipping their apps.
Note that LoadVoicesFromPath exists as an option, in case developers want to implement their own voice-loading logic when shipping a project that uses KokoroSharp for text-to-speech synthesis.
In addition, the built-in tokenization (text -> tokens) is NOT mandatory and can be bypassed on platforms like Android/iOS, provided developers supply pre-phonemized input from their phonemization solution of choice.
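
The background-thread design described in these notes (jobs handled in order by a worker, away from the main thread) can be pictured as a single-consumer queue. The sketch below uses only standard .NET types and is not KokoroSharp's scheduler; the string jobs and the `spoken` list are hypothetical stand-ins for synthesis and playback.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a queue of synthesis jobs.
var jobs = new BlockingCollection<string>();
var spoken = new List<string>(); // written only by the worker, read after it finishes

// A single background worker drains the queue in first-in, first-out order.
Task worker = Task.Run(() =>
{
    foreach (string text in jobs.GetConsumingEnumerable())
    {
        // Placeholder for model inference and audio playback.
        spoken.Add(text);
    }
});

// The main thread only enqueues work and stays free for other tasks.
jobs.Add("First sentence.");
jobs.Add("Second sentence.");
jobs.CompleteAdding(); // signal that no more jobs will arrive
worker.Wait();

Console.WriteLine(string.Join(" | ", spoken)); // prints "First sentence. | Second sentence."
```

A single consumer keeps the ordering linear without locks around the job handler, at the cost of processing one job at a time.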

License

This project is licensed under the MIT License.
The Kokoro 82M model and its voices are released under the Apache License.
eSpeak NG is licensed under the GPLv3 License.

Product	Compatible and additional computed target framework versions.
.NET	net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed.


net8.0
- Microsoft.ML.OnnxRuntime (>= 1.20.1)
- NAudio (>= 2.2.1)
- NumSharp (>= 0.30.0)
- OpenTK.Audio.OpenAL (>= 5.0.0-pre.13)
- System.Numerics.Tensors (>= 9.0.1)

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last updated
0.5.1	0	2/7/2025