Tokenizers.DotNet
0.9.1
Additional Details
This version is compatible with the runtime package 0.5.0 or below. To use the runtime package 0.6.1 or above, use Tokenizers.DotNet 0.9.2 or above.
For the reason, see https://github.com/sappho192/Tokenizers.DotNet/issues/4
There is a newer version of this package available.
See the version list below for details.
dotnet add package Tokenizers.DotNet --version 0.9.1
NuGet\Install-Package Tokenizers.DotNet -Version 0.9.1
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Tokenizers.DotNet" Version="0.9.1" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Tokenizers.DotNet --version 0.9.1
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
#r "nuget: Tokenizers.DotNet, 0.9.1"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy it into the interactive tool or into the script's source code to reference the package.
// Install Tokenizers.DotNet as a Cake Addin
#addin nuget:?package=Tokenizers.DotNet&version=0.9.1
// Install Tokenizers.DotNet as a Cake Tool
#tool nuget:?package=Tokenizers.DotNet&version=0.9.1
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
Tokenizers.DotNet
.NET wrapper for the HuggingFace Tokenizers library
NuGet package list
Package | Description |
---|---|
Tokenizers.DotNet | Core library |
Tokenizers.DotNet.runtime.win | Native bindings for Windows x64 |
Requirements
- .NET 6 or above
Supported functionalities
- Download tokenizer files from the Hugging Face Hub
- Load a tokenizer file (.json) from local storage
- Decode embeddings (token IDs) to a string
How to use
(1) Install the packages
- From NuGet, install the Tokenizers.DotNet package
- Then install the Tokenizers.DotNet.runtime.win package as well
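For projects that use PackageReference, the two installs above amount to roughly the following project-file fragment. This is a sketch: the runtime version shown (0.5.0) is an assumption taken from the compatibility note for this release, so check the runtime package's version list before copying it.

```xml
<ItemGroup>
  <!-- Core library, version documented on this page -->
  <PackageReference Include="Tokenizers.DotNet" Version="0.9.1" />
  <!-- Native bindings for Windows x64; 0.5.0 is an assumed version (0.5.0 or below is required for 0.9.1) -->
  <PackageReference Include="Tokenizers.DotNet.runtime.win" Version="0.5.0" />
</ItemGroup>
```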
(2) Write the code
See the following example code:
using Tokenizers.DotNet;
// Download skt/kogpt2-base-v2/tokenizer.json from the hub
var hubName = "skt/kogpt2-base-v2";
var filePath = "tokenizer.json";
var fileFullPath = await HuggingFace.GetFileFromHub(hubName, filePath, "deps");
Console.WriteLine($"Downloaded {fileFullPath}");
// Create a tokenizer instance
var tokenizer = new Tokenizer(vocabPath: fileFullPath);
var tokens = new uint[] { 9330, 387, 12857, 9376, 18649, 9098, 7656, 6969, 8084, 1 };
var decoded = tokenizer.Decode(tokens);
Console.WriteLine($"Decoded: {decoded}");
Console.WriteLine($"Version of Tokenizers.DotNet.runtime.win: {tokenizer.GetVersion()}");
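The example above downloads the tokenizer file first. Since loading a local .json file is also listed as supported, here is a minimal sketch of that path, using only the constructor and Decode call shown above. The path deps/tokenizer.json is a hypothetical location, and the token IDs only make sense with the skt/kogpt2-base-v2 tokenizer used in the example.

```csharp
using System;
using System.IO;
using Tokenizers.DotNet;

// Hypothetical location: point this at a tokenizer.json you already have on disk.
var localPath = Path.GetFullPath(Path.Combine("deps", "tokenizer.json"));

// Fail early with a readable message instead of passing a bad path to the native library.
if (!File.Exists(localPath))
{
    Console.Error.WriteLine($"Tokenizer file not found: {localPath}");
    return;
}

// Same constructor and Decode call as in the example above.
var tokenizer = new Tokenizer(vocabPath: localPath);
var tokens = new uint[] { 9330, 387, 12857, 9376, 18649, 9098, 7656, 6969, 8084, 1 };
Console.WriteLine($"Decoded: {tokenizer.Decode(tokens)}");
```

Checking for the file up front is a design choice for clearer error messages; how the library itself reports a missing or malformed file is not documented here.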
How to build
- Prepare the following:
  - Rust build system (cargo)
  - .NET build system (dotnet 6.0)
  - PowerShell (7.4.2 or above recommended)
- Run build_all_clean.ps1
  - To build Tokenizers.DotNet.runtime.win only, run build_rust.ps1
  - To build Tokenizers.DotNet only, run build_dotnet.ps1
The build artifacts are placed in the nuget directory.
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
- net6.0
  - No dependencies.
- net7.0
  - No dependencies.
- net8.0
  - No dependencies.
NuGet packages (1)
Showing the top 1 NuGet packages that depend on Tokenizers.DotNet:
Package | Downloads |
---|---|
EDMTranslator: Text translator library based on LLM models, especially EncoderDecoderModel in HuggingFace | |
GitHub repositories
This package is not used by any popular GitHub repositories.