Lofcz.Forks.Tiktoken 1.1.4

There is a newer version of this package available.
See the version list below for details.
dotnet add package Lofcz.Forks.Tiktoken --version 1.1.4                
NuGet\Install-Package Lofcz.Forks.Tiktoken -Version 1.1.4                
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Lofcz.Forks.Tiktoken" Version="1.1.4" />                
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Lofcz.Forks.Tiktoken --version 1.1.4                
#r "nuget: Lofcz.Forks.Tiktoken, 1.1.4"                
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
// Install Lofcz.Forks.Tiktoken as a Cake Addin
#addin nuget:?package=Lofcz.Forks.Tiktoken&version=1.1.4

// Install Lofcz.Forks.Tiktoken as a Cake Tool
#tool nuget:?package=Lofcz.Forks.Tiktoken&version=1.1.4                

Tiktoken

Nuget package dotnet License: MIT Discord

This implementation aims for maximum performance, especially in the token count operation.
There's also a benchmark console app here for easy tracking of this.
We will be happy to accept any PR.

Implemented encodings

  • cl100k_base
  • r50k_base
  • p50k_base
  • p50k_edit

Usage

var encoding = Tiktoken.Encoding.ForModel("gpt-4");
var tokens = encoding.Encode("hello world"); // [15339, 1917]
var text = encoding.Decode(tokens); // hello world
var numberOfTokens = encoding.CountTokens(text); // 2
var stringTokens = encoding.Explore(text); // ["hello", " world"]

var encoding = Tiktoken.Encoding.Get(Encodings.P50KBase);
var tokens = encoding.Encode("hello world"); // [31373, 995]
var text = encoding.Decode(tokens); // hello world

Benchmarks

You can view the reports for each version here


BenchmarkDotNet v0.13.7, macOS Ventura 13.5.1 (22G90) [Darwin 22.6.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK 7.0.400
  [Host]     : .NET 7.0.10 (7.0.1023.36312), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 7.0.10 (7.0.1023.36312), Arm64 RyuJIT AdvSIMD


Method Categories Data Mean Ratio Gen0 Gen1 Allocated Alloc Ratio
SharpTokenV1_2_8_ CountTokens 1. (...)57. [19866] 1,450,007.0 ns 1.00 292.9688 146.4844 1846187 B 1.00
TiktokenSharpV1_0_6_ CountTokens 1. (...)57. [19866] 977,818.9 ns 0.67 250.0000 125.0000 1571155 B 0.85
TokenizerLibV1_3_2_ CountTokens 1. (...)57. [19866] 854,357.2 ns 0.59 246.0938 85.9375 1547673 B 0.84
Tiktoken_ CountTokens 1. (...)57. [19866] 355,029.1 ns 0.24 49.3164 - 309449 B 0.17
SharpTokenV1_2_8_ CountTokens Hello, World! 1,722.2 ns 1.00 0.5264 - 3304 B 1.00
TiktokenSharpV1_0_6_ CountTokens Hello, World! 6,291.2 ns 3.65 2.1820 0.0305 13728 B 4.15
TokenizerLibV1_3_2_ CountTokens Hello, World! 604.0 ns 0.35 0.2356 - 1480 B 0.45
Tiktoken_ CountTokens Hello, World! 247.0 ns 0.14 0.0420 - 264 B 0.08
SharpTokenV1_2_8_ CountTokens King(...)edy. [275] 15,377.1 ns 1.00 4.1199 0.1526 26008 B 1.00
TiktokenSharpV1_0_6_ CountTokens King(...)edy. [275] 14,758.1 ns 0.96 5.1117 0.1526 32096 B 1.23
TokenizerLibV1_3_2_ CountTokens King(...)edy. [275] 8,366.9 ns 0.54 3.0823 0.1373 19344 B 0.74
Tiktoken_ CountTokens King(...)edy. [275] 3,838.6 ns 0.25 0.6409 - 4032 B 0.16
SharpTokenV1_2_8_Encode Encode 1. (...)57. [19866] 1,393,026.6 ns 1.00 292.9688 146.4844 1846187 B 1.00
TiktokenSharpV1_0_6_Encode Encode 1. (...)57. [19866] 1,246,776.8 ns 0.90 250.0000 125.0000 1571155 B 0.85
TokenizerLibV1_3_2_Encode Encode 1. (...)57. [19866] 852,519.6 ns 0.61 246.0938 85.9375 1547673 B 0.84
Tiktoken_Encode Encode 1. (...)57. [19866] 378,546.7 ns 0.27 59.5703 2.4414 375665 B 0.20
SharpTokenV1_2_8_Encode Encode Hello, World! 1,719.3 ns 1.00 0.5264 - 3304 B 1.00
TiktokenSharpV1_0_6_Encode Encode Hello, World! 6,293.3 ns 3.66 2.1820 0.0305 13728 B 4.15
TokenizerLibV1_3_2_Encode Encode Hello, World! 607.6 ns 0.35 0.2356 - 1480 B 0.45
Tiktoken_Encode Encode Hello, World! 320.6 ns 0.19 0.1135 - 712 B 0.22
SharpTokenV1_2_8_Encode Encode King(...)edy. [275] 15,444.0 ns 1.00 4.1199 0.1526 26008 B 1.00
TiktokenSharpV1_0_6_Encode Encode King(...)edy. [275] 14,704.0 ns 0.95 5.1117 0.1526 32096 B 1.23
TokenizerLibV1_3_2_Encode Encode King(...)edy. [275] 8,556.8 ns 0.55 3.0823 0.1373 19344 B 0.74
Tiktoken_Encode Encode King(...)edy. [275] 4,136.4 ns 0.27 0.8011 - 5056 B 0.19

Support

Priority place for bugs: https://github.com/tryAGI/LangChain/issues
Priority place for ideas and general questions: https://github.com/tryAGI/LangChain/discussions
Discord: https://discord.gg/Ca2xhfBf3v

Product Compatible and additional computed target framework versions.
.NET net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.
  • net8.0

    • No dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
1.1.6 154 2/18/2024
1.1.5 124 2/18/2024
1.1.4 123 2/18/2024