Tiktoken 1.0.2
See the version list below for details.
dotnet add package Tiktoken --version 1.0.2
NuGet\Install-Package Tiktoken -Version 1.0.2
<PackageReference Include="Tiktoken" Version="1.0.2" />
paket add Tiktoken --version 1.0.2
#r "nuget: Tiktoken, 1.0.2"
// Install Tiktoken as a Cake Addin
#addin nuget:?package=Tiktoken&version=1.0.2

// Install Tiktoken as a Cake Tool
#tool nuget:?package=Tiktoken&version=1.0.2
Tiktoken
This implementation aims for maximum performance, especially for counting tokens.
There's also a benchmark console app here that makes it easy to track performance.
Pull requests are welcome.
Implemented encodings
- cl100k_base
- r50k_base
- p50k_base
- p50k_edit
Usage
// Get the encoding used by a model (gpt-4 uses cl100k_base)
var encoding = Tiktoken.Encoding.ForModel("gpt-4");
var tokens = encoding.Encode("hello world"); // [15339, 1917]
var text = encoding.Decode(tokens); // hello world
var numberOfTokens = encoding.CountTokens(text); // 2

// Or get an encoding by name
var p50kEncoding = Tiktoken.Encoding.Get("p50k_base");
var p50kTokens = p50kEncoding.Encode("hello world"); // [31373, 995]
var p50kText = p50kEncoding.Decode(p50kTokens); // hello world
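A common reason to count tokens is to keep input within a model's context limit. The following is a minimal sketch built only on the `CountTokens`, `Encode` and `Decode` calls shown above; it is not part of the library. `TokenBudget` and `TruncateToTokens` are hypothetical names, and the sketch assumes `Encode` returns an enumerable of token IDs and that `Decode` accepts a `List<int>`.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of the Tiktoken package) that trims text
// so it fits within a maximum number of tokens.
public static class TokenBudget
{
    public static string TruncateToTokens(Tiktoken.Encoding encoding, string text, int maxTokens)
    {
        if (maxTokens < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxTokens));
        }

        // Check the count first; in the benchmarks, CountTokens allocates less than Encode.
        if (encoding.CountTokens(text) <= maxTokens)
        {
            return text;
        }

        // Encode once, keep the first maxTokens token IDs, and decode them back to text.
        // Assumption: Decode accepts a List<int>.
        var kept = encoding.Encode(text).Take(maxTokens).ToList();
        return encoding.Decode(kept);
    }
}
```

For example, `TokenBudget.TruncateToTokens(Tiktoken.Encoding.ForModel("gpt-4"), "hello world", 1)` should return `"hello"`, since "hello world" encodes to the two tokens [15339, 1917]. Cutting at a token boundary can split a multi-byte UTF-8 character that spans several tokens, so the end of the decoded string may not round-trip cleanly for non-ASCII text.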
Benchmarks
You can view the reports for each version here
BenchmarkDotNet=v0.13.5, OS=macOS Ventura 13.4 (22F66) [Darwin 22.5.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK=7.0.304
[Host] : .NET 7.0.7 (7.0.723.27404), Arm64 RyuJIT AdvSIMD
Job-EVPJQP : .NET 7.0.7 (7.0.723.27404), Arm64 RyuJIT AdvSIMD DEBUG
BuildConfiguration=Debug
Method | Categories | Data | Mean | Ratio | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|
SharpTokenV1_0_28_ | CountTokens | 1. (...)57. [19866] | 5,416,309.6 ns | 1.00 | 601.5625 | 289.0625 | - | 3805771 B | 1.00 |
TiktokenSharpV1_0_5_ | CountTokens | 1. (...)57. [19866] | 1,532,135.7 ns | 0.28 | 250.0000 | 125.0000 | - | 1571155 B | 0.41 |
TokenizerLibV1_3_2_ | CountTokens | 1. (...)57. [19866] | 856,737.2 ns | 0.16 | 246.0938 | 87.8906 | - | 1547674 B | 0.41 |
Tiktoken_ | CountTokens | 1. (...)57. [19866] | 413,599.2 ns | 0.08 | 49.3164 | - | - | 309449 B | 0.08 |
SharpTokenV1_0_28_ | CountTokens | Hello, World! | 3,322.0 ns | 1.00 | 0.6752 | - | - | 4240 B | 1.00 |
TiktokenSharpV1_0_5_ | CountTokens | Hello, World! | 6,690.3 ns | 2.02 | 2.1820 | 0.0381 | - | 13728 B | 3.24 |
TokenizerLibV1_3_2_ | CountTokens | Hello, World! | 642.2 ns | 0.19 | 0.2356 | - | - | 1480 B | 0.35 |
Tiktoken_ | CountTokens | Hello, World! | 326.3 ns | 0.10 | 0.0420 | - | - | 264 B | 0.06 |
SharpTokenV1_0_28_ | CountTokens | King(...)edy. [275] | 71,849.3 ns | 1.00 | 8.5449 | 0.3662 | - | 54160 B | 1.00 |
TiktokenSharpV1_0_5_ | CountTokens | King(...)edy. [275] | 21,938.0 ns | 0.31 | 5.0964 | 0.2136 | - | 32096 B | 0.59 |
TokenizerLibV1_3_2_ | CountTokens | King(...)edy. [275] | 8,467.3 ns | 0.12 | 3.0823 | 0.1373 | - | 19344 B | 0.36 |
Tiktoken_ | CountTokens | King(...)edy. [275] | 4,616.9 ns | 0.06 | 0.6409 | - | - | 4032 B | 0.07 |
SharpTokenV1_0_28_Encode | Encode | 1. (...)57. [19866] | 5,639,328.3 ns | 1.00 | 601.5625 | 296.8750 | - | 3805771 B | 1.00 |
TiktokenSharpV1_0_5_Encode | Encode | 1. (...)57. [19866] | 1,658,228.7 ns | 0.29 | 251.9531 | 126.9531 | 1.9531 | 1571157 B | 0.41 |
TokenizerLibV1_3_2_Encode | Encode | 1. (...)57. [19866] | 926,416.4 ns | 0.16 | 248.0469 | 124.0234 | 1.9531 | 1547678 B | 0.41 |
Tiktoken_Encode | Encode | 1. (...)57. [19866] | 430,360.1 ns | 0.08 | 59.5703 | 29.7852 | - | 375665 B | 0.10 |
SharpTokenV1_0_28_Encode | Encode | Hello, World! | 3,417.4 ns | 1.00 | 0.6752 | 0.0038 | - | 4240 B | 1.00 |
TiktokenSharpV1_0_5_Encode | Encode | Hello, World! | 6,804.5 ns | 1.99 | 2.1820 | 0.0458 | - | 13728 B | 3.24 |
TokenizerLibV1_3_2_Encode | Encode | Hello, World! | 640.1 ns | 0.19 | 0.2356 | 0.0010 | - | 1480 B | 0.35 |
Tiktoken_Encode | Encode | Hello, World! | 479.1 ns | 0.14 | 0.1135 | - | - | 712 B | 0.17 |
SharpTokenV1_0_28_Encode | Encode | King(...)edy. [275] | 62,696.5 ns | 1.00 | 8.5449 | 0.4883 | - | 54160 B | 1.00 |
TiktokenSharpV1_0_5_Encode | Encode | King(...)edy. [275] | 27,168.7 ns | 0.43 | 5.0964 | 0.3052 | - | 32096 B | 0.59 |
TokenizerLibV1_3_2_Encode | Encode | King(...)edy. [275] | 12,317.4 ns | 0.20 | 3.0823 | 0.1831 | - | 19344 B | 0.36 |
Tiktoken_Encode | Encode | King(...)edy. [275] | 4,933.3 ns | 0.08 | 0.8011 | 0.0153 | - | 5056 B | 0.09 |
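The tables above were produced with BenchmarkDotNet. As a hedged sketch of how a similar measurement could be set up (this is not the project's benchmark console app; the class and method names are hypothetical, and it assumes the BenchmarkDotNet package is referenced), the harness below compares counting tokens via `Encode(...)` against the dedicated `CountTokens` method:

```csharp
using System.Linq;
using BenchmarkDotNet.Attributes;

// Hypothetical harness (not the project's benchmark app): compares counting
// tokens via Encode(...).Count() against the dedicated CountTokens method.
[MemoryDiagnoser] // reports Gen0/Gen1/Gen2 and Allocated columns
public class TokenCountBenchmarks
{
    private Tiktoken.Encoding _encoding = null!;

    // Input text for each run; add more values to cover longer inputs.
    [Params("Hello, World!")]
    public string Data { get; set; } = string.Empty;

    [GlobalSetup]
    public void Setup()
    {
        // Build the encoding once so its construction cost is not measured.
        _encoding = Tiktoken.Encoding.ForModel("gpt-4");
    }

    // Baseline: produce the token collection, then count it.
    [Benchmark(Baseline = true)]
    public int EncodeThenCount() => _encoding.Encode(Data).Count();

    // Candidate: ask the library for the count directly.
    [Benchmark]
    public int CountTokensDirect() => _encoding.CountTokens(Data);
}
```

Start it from an entry point with `BenchmarkRunner.Run<TokenCountBenchmarks>()` (namespace `BenchmarkDotNet.Running`) and build in Release, for example with `dotnet run -c Release`, since BenchmarkDotNet expects optimized builds. Marking one method with `Baseline = true` is what produces the Ratio and Alloc Ratio columns.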
Possible optimizations
- Modes: Fast (without the special-token regex) / Strict
- SIMD?
- Parallelism?
- Using string as the dictionary key?
Support
Preferred place for bug reports: https://github.com/tryAGI/LangChain/issues
Preferred place for ideas and general questions: https://github.com/tryAGI/LangChain/discussions
Discord: https://discord.gg/Ca2xhfBf3v
Product | Compatible and additional computed target framework versions |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 is compatible. |
.NET Framework | net461 is compatible. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
- .NETFramework 4.6.1
  - System.ValueTuple (>= 4.5.0)
- .NETStandard 2.0
  - No dependencies.
- .NETStandard 2.1
  - No dependencies.
- net6.0
  - No dependencies.
- net7.0
  - No dependencies.
NuGet packages (8)
Showing the top 5 NuGet packages that depend on Tiktoken:
Package | Description |
---|---|
tryAGI.OpenAI | Generated C# SDK based on the official OpenAI OpenAPI specification. Includes a C# Source Generator that lets you define functions natively through a C# interface, and provides extensions that make it easier to call this interface later. |
LangChain.Providers.HuggingFace | HuggingFace API LLM and Chat model provider. |
LangChain.Providers.Anyscale | Anyscale Endpoints Chat model provider. |
drittich.SemanticSlicer | A library for slicing text data into smaller chunks while attempting to preserve context. |
Aeolin.AwosFramework.OpenAIFunctionCalling | Package Description |
GitHub repositories (1)
Showing the top 1 popular GitHub repositories that depend on Tiktoken:
Repository | Description |
---|---|
Richasy/Rodel.Agent | An application that supports mainstream online AI services. |
Version | Downloads | Last updated |
---|---|---|
2.2.0 | 12,770 | 11/15/2024 |
2.1.1 | 2,811 | 11/9/2024 |
2.1.0 | 637 | 11/9/2024 |
2.0.3 | 133,171 | 7/10/2024 |
2.0.2 | 33,935 | 5/19/2024 |
2.0.1 | 141 | 5/18/2024 |
2.0.0 | 1,743 | 5/15/2024 |
1.2.0 | 133,143 | 2/18/2024 |
1.1.3 | 188,052 | 10/11/2023 |
1.1.2 | 138,213 | 8/28/2023 |
1.1.1 | 130,137 | 7/12/2023 |
1.1.0 | 2,603 | 6/28/2023 |
1.0.2 | 281 | 6/21/2023 |
1.0.1 | 351 | 6/16/2023 |
1.0.0 | 704 | 6/12/2023 |
0.9.7 | 290 | 6/11/2023 |
0.9.6 | 355 | 6/9/2023 |
0.9.5 | 273 | 6/9/2023 |
0.9.4 | 182 | 6/9/2023 |
0.9.3 | 177 | 6/9/2023 |
0.9.2 | 383 | 6/9/2023 |