Sep 0.1.0-preview.6
dotnet add package Sep --version 0.1.0-preview.6
NuGet\Install-Package Sep -Version 0.1.0-preview.6
<PackageReference Include="Sep" Version="0.1.0-preview.6" />
paket add Sep --version 0.1.0-preview.6
#r "nuget: Sep, 0.1.0-preview.6"
// Install Sep as a Cake Addin
#addin nuget:?package=Sep&version=0.1.0-preview.6&prerelease
// Install Sep as a Cake Tool
#tool nuget:?package=Sep&version=0.1.0-preview.6&prerelease
Sep - Possibly the World's Fastest .NET CSV Parser
Modern, minimal, fast, zero allocation, reading and writing of separated values
(csv, tsv etc.). Cross-platform, trimmable and AOT/NativeAOT compatible.
Featuring an opinionated API design and pragmatic implementation targeted at
machine learning use cases.
⭐ Modern - utilizes modern features such as Span<T>, Generic Math, ref struct and other .NET 7+/C# 11 features.
🔎 Minimal - a succinct yet expressive API with few moving parts, configurations and no hidden changes to input or output. What you read/write is what you get. This means there is no "automatic" escaping/unescaping of quotes, for example.
🗑️ Zero allocation - intelligent and efficient memory management allowing for zero allocations after warmup incl. supporting use cases of reading or writing arrays of values (e.g. features) easily without repeated allocations.
🚀 Fast - blazing fast with both architecture-specific and cross-platform SIMD vectorized parsing, using csFastFloat for fast parsing of floating-point values. Reads or writes one row at a time efficiently, with benchmarks to prove it.
🌐 Cross-platform - works on any platform, any architecture supported by .NET. 100% managed and written in beautiful modern C#.
✂️ Trimmable and 🤖 AOT/NativeAOT compatible - no problematic reflection or dynamic code generation. Hence, fully trimmable and Ahead-of-Time compatible, with a simple console tester program executable possible in just a few MBs. 💾
🗣️ Opinionated and pragmatic 🤔 - conforms to the essentials of RFC-4180, but takes an opinionated and pragmatic approach towards this, especially with regard to quoting and line endings. See section RFC-4180.
var text = """
A;B;C;D;E;F
Sep;🚀;1;1.2;0.1;0.5
CSV;✅;2;2.2;0.2;1.5
""";
using var reader = Sep.Reader().FromText(text); // Infers separator 'Sep' from header
using var writer = reader.Sep.Writer().ToText(); // Writer defined from reader 'Sep'
// Use .FromFile(...)/ToFile(...) for files
var idx = reader.Header.IndexOf("B");
var nms = new[] { "E", "F" };
foreach (var readRow in reader)           // Read one row at a time
{
    var a = readRow["A"].Span;            // Column as ReadOnlySpan<char>
    var b = readRow[idx].ToString();      // Column to string (allocates new string per call)
    var c = readRow["C"].Parse<int>();    // Parse any T : ISpanParsable<T>
    var d = readRow["D"].Parse<float>();  // Parse float/double fast via csFastFloat
    var s = readRow[nms].Parse<double>(); // Parse multiple columns as Span<T>
                                          // - Sep handles array allocation and reuse
    foreach (ref var v in s) { v *= 10; }

    using var writeRow = writer.NewRow(); // Start new row. Row written on Dispose.
    writeRow["A"].Set(a);                 // Set by ReadOnlySpan<char>
    writeRow["B"].Set(b);                 // Set by string
    writeRow["C"].Set($"{c * 2}");        // Set via InterpolatedStringHandler, no allocs
    writeRow["D"].Format(d / 2);          // Format any T : ISpanFormattable
    writeRow[nms].Format(s);              // Format multiple columns directly
}
var expected = """
A;B;C;D;E;F
Sep;🚀;2;0.6;1;5
CSV;✅;4;1.1;2;15
""";
Assert.AreEqual(expected, writer.ToString());
// Above example code is for demonstration purposes only.
// Short names and repeated constants are only for demonstration.
Benchmarks
Reader Comparison Benchmarks
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.2728/21H2/November2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.202
[Host] : .NET 7.0.4 (7.0.423.11508), X64 RyuJIT AVX2
Job-LOZXLZ : .NET 7.0.4 (7.0.423.11508), X64 RyuJIT AVX2
Runtime=.NET 7.0 Toolchain=net70 InvocationCount=1
MaxIterationCount=5 MinIterationCount=1 UnrollFactor=1
WarmupCount=3
Type | Method | Reader | Lines | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | MB | MB/s | ns/line | Gen2 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PackageAssetsBenchParseAsset | Sep__________ | UTF16 StringReader | 200000 | 140.4 ms | 22.62 ms | 5.87 ms | 1.00 | 0.00 | 4000.0000 | 3000.0000 | 116 | 831.4 | 702.1 | 1000.0000 | 54377.29 KB | 1.00 |
PackageAssetsBenchParseAsset | Sylvan_______ | UTF16 StringReader | 200000 | 174.2 ms | 23.79 ms | 6.18 ms | 1.24 | 0.07 | 4000.0000 | 3000.0000 | 116 | 670.4 | 870.8 | 1000.0000 | 54725.33 KB | 1.01 |
PackageAssetsBenchParseAsset | ReadSplitLine | UTF16 StringReader | 200000 | 469.3 ms | 124.65 ms | 32.37 ms | 3.34 | 0.23 | 28000.0000 | 27000.0000 | 116 | 248.8 | 2346.4 | 4000.0000 | 408584.26 KB | 7.51 |
PackageAssetsBenchParseAsset | CsvHelper____ | UTF16 StringReader | 200000 | 428.2 ms | 2.67 ms | 0.15 ms | 3.00 | 0.13 | 4000.0000 | 3000.0000 | 116 | 272.7 | 2140.9 | 1000.0000 | 54543.54 KB | 1.00 |
PackageAssetsBenchParseCsvOnly | Sep__________ | UTF16 StringReader | 1000000 | 104.5 ms | 1.31 ms | 0.34 ms | 1.00 | 0.00 | - | - | 583 | 5587.8 | 104.5 | - | 5.93 KB | 1.00 |
PackageAssetsBenchParseCsvOnly | Sylvan_______ | UTF16 StringReader | 1000000 | 167.5 ms | 2.96 ms | 0.77 ms | 1.60 | 0.01 | - | - | 583 | 3486.1 | 167.5 | - | 135.29 KB | 22.82 |
PackageAssetsBenchParseCsvOnly | ReadSplitLine | UTF16 StringReader | 1000000 | 271.2 ms | 16.60 ms | 4.31 ms | 2.60 | 0.04 | 108000.0000 | - | 583 | 2152.6 | 271.2 | - | 1772445.8 KB | 298,910.49 |
PackageAssetsBenchParseCsvOnly | CsvHelper____ | UTF16 StringReader | 1000000 | 1,392.3 ms | 59.08 ms | 15.34 ms | 13.33 | 0.13 | 57000.0000 | - | 583 | 419.3 | 1392.3 | - | 935255.09 KB | 157,724.18 |
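The MB/s and ns/line columns appear to be derived from the Mean, Lines and MB columns. The following is a minimal sketch of that arithmetic, assuming ns/line = Mean / Lines and MB/s = MB / Mean; the class and method names are hypothetical and are not part of Sep or BenchmarkDotNet.

```csharp
using System;

// Hypothetical helper, not part of Sep or BenchmarkDotNet.
// Assumes the derived columns are computed from Mean, Lines and MB.
public static class BenchmarkMathSketch
{
    // Mean is reported in milliseconds; convert to nanoseconds per line.
    public static double NsPerLine(double meanMs, long lines) =>
        meanMs * 1_000_000.0 / lines;

    // Throughput in megabytes per second from total MB and mean time in ms.
    public static double MbPerSecond(double megabytes, double meanMs) =>
        megabytes / (meanMs / 1000.0);
}
```

For the Sep rows, NsPerLine(140.4, 200000) gives about 702.0 and NsPerLine(104.5, 1000000) gives about 104.5, while MbPerSecond(583, 104.5) gives roughly 5579. The small differences from the table (702.1 and 5587.8) are presumably because the displayed Mean and MB values are rounded.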
RFC-4180
While RFC-4180 requires \r\n (CR,LF) as line ending, the well-known line
endings (\r\n, \n and \r) are supported, similar to .NET. Environment.NewLine
is used when writing. Quoting is supported by simply matching pairs of quotes,
no matter what, with no automatic escaping. Hence, you are responsible for and
in control of this at this time.
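To illustrate what "matching pairs of quotes" means in practice, here is a minimal sketch in plain C#. It is not Sep's implementation (Sep uses SIMD vectorized parsing); the class and method names are hypothetical. It only shows the rule: every quote toggles an inside/outside state, a separator counts only outside quotes, and the quotes stay in the column text untouched.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of pair-matched quoting, not Sep's actual code.
public static class QuotePairSketch
{
    // Splits one line into columns. Each quote character toggles the
    // quoted state; separators inside quotes are ignored. Nothing is
    // unescaped, so quotes remain part of the returned column text.
    public static List<string> SplitLine(string line, char separator = ';')
    {
        var cols = new List<string>();
        var insideQuotes = false;
        var start = 0;
        for (var i = 0; i < line.Length; i++)
        {
            var ch = line[i];
            if (ch == '"')
            {
                insideQuotes = !insideQuotes;
            }
            else if (ch == separator && !insideQuotes)
            {
                cols.Add(line.Substring(start, i - start));
                start = i + 1;
            }
        }
        cols.Add(line.Substring(start)); // Last column has no trailing separator
        return cols;
    }
}
```

With this sketch, the line `a;"b;c";d""e` splits into three columns: `a`, `"b;c"` and `d""e`. The quotes are returned exactly as they appear in the input, so any unescaping is left to the caller.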
Note that some libraries will claim conformance but the RFC is, perhaps
naturally, quite strict e.g. only comma is supported as separator/delimiter. Sep
defaults to using ; as separator if writing, while auto-detecting supported
separators when reading. This is decidedly non-conforming.
The RFC defines the following condensed ABNF grammar:
file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA = %x2C
CR = %x0D ;as per section 6.1 of RFC 2234 [2]
DQUOTE = %x22 ;as per section 6.1 of RFC 2234 [2]
LF = %x0A ;as per section 6.1 of RFC 2234 [2]
CRLF = CR LF ;as per section 6.1 of RFC 2234 [2]
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
Note how TEXTDATA is restricted too, yet many will allow any character incl.
emojis or similar (which Sep supports), but this is not in conformance with the
RFC. Quotes inside an escaped field e.g. "fie""ld" are only allowed as doubled
quotes. Sep currently allows any pairs of quotes and quoting doesn't need to be
at start of or end of field (col or column in Sep terminology).
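As a concrete reading of the TEXTDATA rule from the grammar above, the following sketch checks whether a value would be a valid non-escaped field under a strict reading of the RFC. The class and method names are hypothetical and Sep itself performs no such check.

```csharp
using System;

// Hypothetical strict-RFC check, only to show how narrow TEXTDATA is.
public static class Rfc4180Sketch
{
    // TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
    // That is printable ASCII except double quote (0x22) and comma (0x2C).
    public static bool IsTextData(char c) =>
        (c >= '\u0020' && c <= '\u0021') ||
        (c >= '\u0023' && c <= '\u002B') ||
        (c >= '\u002D' && c <= '\u007E');

    // non-escaped = *TEXTDATA, so every char must be TEXTDATA.
    public static bool IsNonEscapedField(ReadOnlySpan<char> field)
    {
        foreach (var c in field)
        {
            if (!IsTextData(c)) { return false; }
        }
        return true;
    }
}
```

Under this check `Sep` and `1;2` pass (a semicolon is ordinary TEXTDATA in the RFC), while `a,b` and `🚀` fail, which is why the example input at the top of this page is not strictly conforming even though Sep reads it without issue.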
All in all, Sep takes a pretty pragmatic approach here, as the primary use case is not exchanging data on the internet but use in machine learning pipelines or similar.
Product | Versions |
---|---|
.NET | net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
- net7.0
  - csFastFloat (>= 4.1.0)
NuGet packages (1)
Showing the top 1 NuGet packages that depend on Sep:
Package | Downloads |
---|---|
WoW2.Backbone.Data.Transportation.Cars: Global manufactured cars static data and data providers | |
GitHub repositories (3)
Showing the top 3 popular GitHub repositories that depend on Sep:
Repository | Stars |
---|---|
DataDog/dd-trace-dotnet: .NET Client Library for Datadog APM | |
joakimskoog/AnApiOfIceAndFire: An API of Ice And Fire is the world's greatest source for quantified and structured data from the universe of Ice and Fire (as well as the HBO series Game of Thrones). | |
JasonBock/Rocks: A mocking library based on the Compiler APIs (Roslyn + Mocks) | |
Version | Downloads | Last updated |
---|---|---|
0.5.5 | 22,429 | 10/8/2024 |
0.5.4 | 12,487 | 9/20/2024 |
0.5.3 | 81,307 | 7/11/2024 |
0.5.2 | 1,146,966 | 4/21/2024 |
0.5.1 | 158 | 4/20/2024 |
0.5.0 | 4,591 | 4/15/2024 |
0.4.6 | 2,599 | 4/4/2024 |
0.4.5 | 2,233 | 3/28/2024 |
0.4.4 | 870 | 3/20/2024 |
0.4.3 | 1,323 | 3/10/2024 |
0.4.2 | 320 | 3/8/2024 |
0.4.1 | 128 | 3/8/2024 |
0.4.0 | 18,848 | 1/1/2024 |
0.4.0-preview.1 | 117 | 12/23/2023 |
0.3.0 | 911,085 | 11/18/2023 |
0.2.7 | 5,113 | 10/12/2023 |
0.2.6 | 497 | 9/27/2023 |
0.2.5 | 295 | 9/14/2023 |
0.2.4 | 271 | 9/8/2023 |
0.2.3 | 298 | 9/5/2023 |
0.2.2 | 475,377 | 8/10/2023 |
0.2.1 | 184 | 8/10/2023 |
0.2.0 | 983 | 8/7/2023 |
0.2.0-preview.3 | 124 | 7/29/2023 |
0.1.0 | 837 | 5/30/2023 |
0.1.0-rc.1 | 108 | 5/26/2023 |
0.1.0-preview.8 | 88 | 5/26/2023 |
0.1.0-preview.7 | 118 | 5/8/2023 |
0.1.0-preview.6 | 108 | 4/24/2023 |
0.1.0-preview.5 | 117 | 3/19/2023 |
0.1.0-preview.4 | 120 | 12/31/2022 |
0.1.0-preview.3 | 120 | 12/4/2022 |
0.1.0-preview.2 | 144 | 3/21/2022 |
0.1.0-preview.1 | 149 | 1/28/2022 |