Sep 0.1.0-preview.7

This is a prerelease version of Sep.
There is a newer version of this package available.
See the version list below for details.
dotnet add package Sep --version 0.1.0-preview.7
NuGet\Install-Package Sep -Version 0.1.0-preview.7
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Sep" Version="0.1.0-preview.7" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Sep --version 0.1.0-preview.7
#r "nuget: Sep, 0.1.0-preview.7"
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
// Install Sep as a Cake Addin
#addin nuget:?package=Sep&version=0.1.0-preview.7&prerelease

// Install Sep as a Cake Tool
#tool nuget:?package=Sep&version=0.1.0-preview.7&prerelease

Sep - Possibly the World's Fastest .NET CSV Parser


Modern, minimal, fast, zero allocation reading and writing of separated values (csv, tsv etc.). Cross-platform, trimmable and AOT/NativeAOT compatible. Featuring an opinionated API design and pragmatic implementation targeted at machine learning use cases.

  • ⭐ Modern - utilizes modern features such as Span<T>, Generic Math, ref struct and other .NET 7+/C# 11 features.
  • 🔎 Minimal - a succinct yet expressive API with few moving parts and configurations, and no hidden changes to input or output. What you read/write is what you get. This means, for example, that there is no "automatic" escaping/unescaping of quotes.
  • 🚀 Fast - blazing fast, with both architecture-specific and cross-platform SIMD-vectorized parsing, and using csFastFloat for fast parsing of floating points. Reads or writes one row at a time efficiently, with detailed benchmarks to prove it.
  • 🗑️ Zero allocation - intelligent and efficient memory management allowing for zero allocations after warmup, including support for reading or writing arrays of values (e.g. features) easily without repeated allocations.
  • 🌐 Cross-platform - works on any platform, any architecture supported by .NET. 100% managed and written in beautiful modern C#.
  • ✂️ Trimmable and AOT/NativeAOT compatible - no problematic reflection or dynamic code generation. Hence, fully trimmable and Ahead-of-Time compatible. A simple console tester program can be published as an executable of just a few MB. 💾
  • 🗣️ Opinionated and pragmatic - conforms to the essentials of RFC-4180, but takes an opinionated and pragmatic approach, especially with regard to quoting and line endings. See section RFC-4180.
using nietras.SeparatedValues;

var text = """
           A;B;C;D;E;F
           Sep;🚀;1;1.2;0.1;0.5
           CSV;✅;2;2.2;0.2;1.5
           """;

using var reader = Sep.Reader().FromText(text);  // Infers separator 'Sep' from header
using var writer = reader.Sep.Writer().ToText(); // Writer defined from reader 'Sep'
                                                 // Use .FromFile(...)/ToFile(...) for files
var idx = reader.Header.IndexOf("B");
var nms = new[] { "E", "F" };

foreach (var readRow in reader)           // Read one row at a time
{
    var a = readRow["A"].Span;            // Column as ReadOnlySpan<char>
    var b = readRow[idx].ToString();      // Column to string (allocates new string per call)
    var c = readRow["C"].Parse<int>();    // Parse any T : ISpanParsable<T>
    var d = readRow["D"].Parse<float>();  // Parse float/double fast via csFastFloat
    var s = readRow[nms].Parse<double>(); // Parse multiple columns as Span<T>
                                          // - Sep handles array allocation and reuse
    foreach (ref var v in s) { v *= 10; }

    using var writeRow = writer.NewRow(); // Start new row. Row written on Dispose.
    writeRow["A"].Set(a);                 // Set by ReadOnlySpan<char>
    writeRow["B"].Set(b);                 // Set by string
    writeRow["C"].Set($"{c * 2}");        // Set via InterpolatedStringHandler, no allocs
    writeRow["D"].Format(d / 2);          // Format any T : ISpanFormattable
    writeRow[nms].Format(s);              // Format multiple columns directly
}

var expected = """
               A;B;C;D;E;F
               Sep;🚀;2;0.6;1;5
               CSV;✅;4;1.1;2;15
               """;
Assert.AreEqual(expected, writer.ToString());

// Above example code is for demonstration purposes only.
// Short names and repeated constants are only for demonstration.

Comparison Benchmarks

Reader Comparison Benchmarks

AMD 5950X Platform Information
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.2846/21H2/November2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
Machine Learning Reader Comparison Benchmarks

This benchmark demonstrates what Sep is built for, namely parsing floating points or features as in machine learning. A simple CSV file is randomly generated with 20 ground truth values, 20 predicted result values and some typical extra columns preceding those, which are not parsed or read in the benchmark. For Scope=Floats the benchmark parses these features as two spans of floats, one for ground truth values and one for predicted result values. Finally, as an example, it calculates the mean squared error (MSE), i.e. the loss, of those.
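The loss calculation described above can be sketched as follows, assuming the ground truth and predicted values of a row have already been parsed into two float spans (e.g. by parsing multiple columns at once, as in the reading example above, but with `Parse<float>()`). `MeanSquaredError` is a hypothetical helper written for illustration; it is not part of Sep's API and not the benchmark's exact code.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of Sep): mean squared error between
// ground truth values and predicted values, e.g. two spans of parsed floats.
static float MeanSquaredError(ReadOnlySpan<float> expected, ReadOnlySpan<float> predicted)
{
    if (expected.Length != predicted.Length)
        throw new ArgumentException("Spans must have the same length.");
    if (expected.Length == 0)
        return 0f;
    var sum = 0f;
    for (var i = 0; i < expected.Length; i++)
    {
        var diff = predicted[i] - expected[i];
        sum += diff * diff; // accumulate squared error
    }
    return sum / expected.Length;
}

float[] truth = { 1f, 2f, 3f, 4f };
float[] preds = { 1.5f, 2f, 2.5f, 4f };
var mse = MeanSquaredError(truth, preds);
Console.WriteLine(mse.ToString(CultureInfo.InvariantCulture)); // prints 0.125
```

Operating on spans keeps the calculation allocation free, which matches the zero allocation goal when the spans come straight from the reader.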

AMD 5950X - Machine Learning Benchmark Results
Method Scope Rows Mean Ratio MB MB/s ns/row Allocated Alloc Ratio
Sep______ Row 1000000 152.2 ms 1.00 1089 7160.3 152.2 38.52 KB 1.00
Sylvan___ Row 1000000 214.2 ms 1.41 1089 5088.7 214.2 140.26 KB 3.64
ReadLine_ Row 1000000 538.7 ms 3.54 1089 2022.9 538.7 3597953.38 KB 93,415.42
CsvHelper Row 1000000 2,061.0 ms 13.54 1089 528.8 2061.0 21.3 KB 0.55
Sep______ Cols 1000000 174.0 ms 1.00 1089 6263.5 174.0 38.52 KB 1.00
Sylvan___ Cols 1000000 307.9 ms 1.77 1089 3539.2 307.9 140.26 KB 3.64
ReadLine_ Cols 1000000 555.7 ms 3.19 1089 1961.2 555.7 3597953.38 KB 93,415.42
CsvHelper Cols 1000000 1,692.8 ms 9.73 1089 643.8 1692.8 1136460.33 KB 29,506.48
Sep______ Floats 100000 136.9 ms 1.00 109 796.0 1369.4 46.35 KB 1.00
Sylvan___ Floats 100000 290.3 ms 2.12 109 375.6 2902.5 52.33 KB 1.13
ReadLine_ Floats 100000 320.5 ms 2.34 109 340.1 3205.3 359872.43 KB 7,763.98
CsvHelper Floats 100000 545.0 ms 3.98 109 200.0 5449.8 87699.66 KB 1,892.05
NCsvPerf PackageAssets Reader Comparison Benchmarks
AMD 5950X - PackageAssets Benchmark Results
Method Scope Rows Mean Ratio MB MB/s ns/row Allocated Alloc Ratio
Sep______ Row 1000000 79.66 ms 1.00 583 7328.1 79.7 1.72 KB 1.00
Sylvan___ Row 1000000 118.37 ms 1.49 583 4931.7 118.4 135.29 KB 78.71
ReadLine_ Row 1000000 262.97 ms 3.29 583 2220.0 263.0 1772445.8 KB 1,031,241.19
CsvHelper Row 1000000 1,073.80 ms 13.52 583 543.7 1073.8 20.97 KB 12.20
Sep______ Cols 1000000 94.61 ms 1.00 583 6170.1 94.6 1.72 KB 1.00
Sylvan___ Cols 1000000 163.57 ms 1.73 583 3569.0 163.6 135.29 KB 78.71
ReadLine_ Cols 1000000 263.55 ms 2.79 583 2215.1 263.5 1772445.8 KB 1,031,241.19
CsvHelper Cols 1000000 1,696.07 ms 17.94 583 344.2 1696.1 446.63 KB 259.85
Sep______ Asset 1000000 813.55 ms 1.00 583 717.6 813.6 266663.45 KB 1.00
Sylvan___ Asset 1000000 948.49 ms 1.16 583 615.5 948.5 267014.27 KB 1.00
ReadLine_ Asset 1000000 2,001.81 ms 2.46 583 291.6 2001.8 2038832.84 KB 7.65
CsvHelper Asset 1000000 1,975.56 ms 2.43 583 295.5 1975.6 266833.67 KB 1.00
AMD 5950X - PackageAssets with Quotes Benchmark Results
Method Scope Rows Mean Ratio MB MB/s ns/row Allocated Alloc Ratio
Sep______ Row 1000000 181.1 ms 1.00 667 3687.8 181.1 1.72 KB 1.00
Sylvan___ Row 1000000 477.5 ms 2.64 667 1398.3 477.5 135.29 KB 78.71
ReadLine_ Row 1000000 307.9 ms 1.70 667 2168.3 307.9 2175928.97 KB 1,265,995.04
CsvHelper Row 1000000 1,237.7 ms 6.83 667 539.5 1237.7 20.97 KB 12.20
Sep______ Cols 1000000 196.8 ms 1.00 667 3393.0 196.8 1.72 KB 1.00
Sylvan___ Cols 1000000 532.7 ms 2.71 667 1253.3 532.7 135.29 KB 78.71
ReadLine_ Cols 1000000 314.1 ms 1.59 667 2125.6 314.1 2175928.97 KB 1,265,995.04
CsvHelper Cols 1000000 1,935.6 ms 9.84 667 345.0 1935.6 446.63 KB 259.85
Sep______ Asset 1000000 916.4 ms 1.00 667 728.7 916.4 266669.81 KB 1.00
Sylvan___ Asset 1000000 1,274.4 ms 1.39 667 523.9 1274.4 267016.83 KB 1.00
ReadLine_ Asset 1000000 2,692.4 ms 2.94 667 248.0 2692.4 2442316.47 KB 9.16
CsvHelper Asset 1000000 2,236.7 ms 2.44 667 298.5 2236.7 266832.63 KB 1.00

RFC-4180

While RFC-4180 requires \r\n (CR LF) as the line ending, Sep supports all the well-known line endings (\r\n, \n and \r), similar to .NET. Environment.NewLine is used when writing. Quoting is supported by simply matching pairs of quotes, no matter what, and there is no automatic escaping or unescaping. Hence, you are responsible for, and in control of, this at this time.
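To illustrate what supporting all three well-known line endings means, here is a minimal sketch that splits text into lines where \r\n, \n and a lone \r each end a line. `SplitLines` is a hypothetical helper, not Sep's implementation, and it ignores quoting entirely. .NET's `StringReader.ReadLine` follows the same line-ending convention.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not Sep's API): splits text into lines, treating
// "\r\n", "\n" and a lone "\r" as line endings. Quotes are ignored here.
static List<string> SplitLines(string text)
{
    var lines = new List<string>();
    var start = 0;
    for (var i = 0; i < text.Length; i++)
    {
        var c = text[i];
        if (c != '\r' && c != '\n') continue;
        lines.Add(text.Substring(start, i - start));
        // Treat "\r\n" as a single line ending.
        if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
        start = i + 1;
    }
    // A trailing line ending does not produce an extra empty line.
    if (start < text.Length) lines.Add(text.Substring(start));
    return lines;
}

// Mixed line endings: prints A;B, Sep;1, CSV;2 and TSV;3 on separate lines.
foreach (var line in SplitLines("A;B\r\nSep;1\nCSV;2\rTSV;3\r\n"))
    Console.WriteLine(line);
```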

Note that some libraries claim conformance, but the RFC is, perhaps naturally, quite strict; e.g. only comma is supported as separator/delimiter. Sep defaults to using ; as the separator when writing, while auto-detecting supported separators when reading. This is decidedly non-conforming.
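Auto-detecting the separator can be illustrated with a simple heuristic: look at the header line and pick the candidate separator that occurs most often. This is only a sketch under stated assumptions; the candidate set (`;`, `,`, tab, `|`), the tie-breaking rule and the fallback to `;` are illustrative choices, not necessarily what Sep does internally, and `DetectSeparator` is a hypothetical name.

```csharp
using System;

// Hypothetical heuristic, not Sep's actual implementation: choose the
// candidate separator that occurs most often in the header line.
// Ties go to the earliest candidate; ';' is returned if none occur.
static char DetectSeparator(ReadOnlySpan<char> headerLine)
{
    ReadOnlySpan<char> candidates = ";,\t|"; // assumed candidate set
    var best = ';';
    var bestCount = 0;
    foreach (var candidate in candidates)
    {
        var count = 0;
        foreach (var c in headerLine)
            if (c == candidate) count++;
        if (count > bestCount)
        {
            best = candidate;
            bestCount = count;
        }
    }
    return best;
}

Console.WriteLine(DetectSeparator("A;B;C;D;E;F")); // ;
Console.WriteLine(DetectSeparator("A,B,C"));       // ,
```

Counting over the header alone is cheap, but a header containing one of the candidate characters inside a column name could mislead such a heuristic.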

The RFC defines the following condensed ABNF grammar:

file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA = %x2C
CR = %x0D ;as per section 6.1 of RFC 2234 [2]
DQUOTE =  %x22 ;as per section 6.1 of RFC 2234 [2]
LF = %x0A ;as per section 6.1 of RFC 2234 [2]
CRLF = CR LF ;as per section 6.1 of RFC 2234 [2]
TEXTDATA =  %x20-21 / %x23-2B / %x2D-7E
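The escaped rule means that an RFC-conforming field like "fie""ld" stands for the value fie"ld. Since Sep does not unescape automatically, a column is read verbatim including its quotes. If RFC semantics are wanted, a helper such as the hypothetical `Unescape` below (not part of Sep) could be applied to the column text.

```csharp
using System;

// Hypothetical helper (not part of Sep): converts an RFC-4180 escaped
// field, DQUOTE *(... / 2DQUOTE) DQUOTE, to its unescaped value.
// Fields not wrapped in quotes are returned unchanged.
static string Unescape(string field)
{
    if (field.Length < 2 || field[0] != '"' || field[^1] != '"')
        return field;
    // Strip the enclosing quotes, then collapse each 2DQUOTE into one quote.
    return field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
}

Console.WriteLine(Unescape("\"fie\"\"ld\"")); // fie"ld
Console.WriteLine(Unescape("plain"));          // plain
```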

Note that TEXTDATA is restricted too: it covers only printable ASCII, excluding the double quote and the comma. Many parsers nonetheless allow any character, including emojis or similar (which Sep supports), even though this does not conform to the RFC.
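The TEXTDATA rule translates directly into a character predicate. The sketch below uses hypothetical helpers (`IsTextData`, `IsRfcNonEscapedField`, not part of Sep) to check whether an unquoted field consists only of characters the RFC permits. An emoji such as 🚀 fails the check, even though Sep reads and writes it fine.

```csharp
using System;

// TEXTDATA = %x20-21 / %x23-2B / %x2D-7E, i.e. printable ASCII
// except the double quote (0x22) and the comma (0x2C).
static bool IsTextData(char c) =>
    (c >= 0x20 && c <= 0x21) ||
    (c >= 0x23 && c <= 0x2B) ||
    (c >= 0x2D && c <= 0x7E);

// non-escaped = *TEXTDATA
static bool IsRfcNonEscapedField(string field)
{
    foreach (var c in field)
        if (!IsTextData(c)) return false;
    return true;
}

Console.WriteLine(IsRfcNonEscapedField("Sep")); // True
Console.WriteLine(IsRfcNonEscapedField("🚀"));  // False (UTF-16 surrogate pair, outside ASCII)
Console.WriteLine(IsRfcNonEscapedField("1,2")); // False (a comma requires an escaped field)
```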

Inside an escaped field, e.g. "fie""ld", a quote is only allowed as a doubled quote (""). Sep currently allows any pairs of quotes, and quoting does not need to be at the start or end of a field (col or column in Sep terminology).
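A minimal sketch of what "simply matching pairs of quotes" implies for splitting a row into columns: a separator only splits when it lies outside a quoted section, i.e. after an even number of quote characters so far in the row. Quotes may appear anywhere in a column, and columns are returned verbatim without unescaping. `SplitRow` is a hypothetical helper for illustration, not Sep's SIMD-based implementation, and it does not handle line endings inside quotes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not Sep's implementation: splits one row on a
// separator, toggling an "inside quotes" state on every '"'. Columns
// are returned verbatim: quotes are neither required at the start or
// end of a column nor removed or unescaped.
static List<string> SplitRow(string row, char separator)
{
    var cols = new List<string>();
    var inQuotes = false;
    var start = 0;
    for (var i = 0; i < row.Length; i++)
    {
        var c = row[i];
        if (c == '"') inQuotes = !inQuotes;
        else if (c == separator && !inQuotes)
        {
            cols.Add(row.Substring(start, i - start));
            start = i + 1;
        }
    }
    cols.Add(row.Substring(start)); // last column
    return cols;
}

// Prints "fie""ld", then a"b;c" (quoted mid-column) and then d.
foreach (var col in SplitRow("\"fie\"\"ld\";a\"b;c\";d", ';'))
    Console.WriteLine(col);
```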

All in all, Sep takes a pragmatic approach here, as the primary use case is not exchanging data on the internet but use in machine learning pipelines or similar.

Compatible and additional computed target framework versions:
.NET: net7.0 is compatible. Computed: net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows.

NuGet packages (1)

Showing the top NuGet package that depends on Sep:

WoW2.Backbone.Data.Transportation.Cars - Global manufactured cars static data and data providers

GitHub repositories (3)

Showing the top 3 popular GitHub repositories that depend on Sep:

DataDog/dd-trace-dotnet - .NET Client Library for Datadog APM
joakimskoog/AnApiOfIceAndFire - An API of Ice And Fire is the world's greatest source for quantified and structured data from the universe of Ice and Fire (as well as the HBO series Game of Thrones).
JasonBock/Rocks - A mocking library based on the Compiler APIs (Roslyn + Mocks)

Version Downloads Last updated
0.5.5 20,057 10/8/2024
0.5.4 12,415 9/20/2024
0.5.3 80,473 7/11/2024
0.5.2 1,128,724 4/21/2024
0.5.1 156 4/20/2024
0.5.0 4,298 4/15/2024
0.4.6 2,592 4/4/2024
0.4.5 2,200 3/28/2024
0.4.4 868 3/20/2024
0.4.3 1,317 3/10/2024
0.4.2 318 3/8/2024
0.4.1 126 3/8/2024
0.4.0 18,798 1/1/2024
0.4.0-preview.1 115 12/23/2023
0.3.0 910,950 11/18/2023
0.2.7 5,080 10/12/2023
0.2.6 495 9/27/2023
0.2.5 293 9/14/2023
0.2.4 269 9/8/2023
0.2.3 296 9/5/2023
0.2.2 475,375 8/10/2023
0.2.1 180 8/10/2023
0.2.0 980 8/7/2023
0.2.0-preview.3 122 7/29/2023
0.1.0 832 5/30/2023
0.1.0-rc.1 106 5/26/2023
0.1.0-preview.8 86 5/26/2023
0.1.0-preview.7 114 5/8/2023
0.1.0-preview.6 106 4/24/2023
0.1.0-preview.5 115 3/19/2023
0.1.0-preview.4 118 12/31/2022
0.1.0-preview.3 118 12/4/2022
0.1.0-preview.2 141 3/21/2022
0.1.0-preview.1 148 1/28/2022