Izumi.SICK
0.1.0-alpha.24
dotnet add package Izumi.SICK --version 0.1.0-alpha.24
NuGet\Install-Package Izumi.SICK -Version 0.1.0-alpha.24
<PackageReference Include="Izumi.SICK" Version="0.1.0-alpha.24" />
paket add Izumi.SICK --version 0.1.0-alpha.24
#r "nuget: Izumi.SICK, 0.1.0-alpha.24"
// Install Izumi.SICK as a Cake Addin
#addin nuget:?package=Izumi.SICK&version=0.1.0-alpha.24&prerelease

// Install Izumi.SICK as a Cake Tool
#tool nuget:?package=Izumi.SICK&version=0.1.0-alpha.24&prerelease
SICK: Streams of Independent Constant Keys
SICK is an approach to handling JSON-like structures, as well as the name of the various libraries implementing it.
SICK allows you to achieve the following:
- Store JSON-like data in an efficient, indexed binary form
- Avoid reading and parsing whole JSON files, and access the data you need just in time
- Store multiple JSON-like structures in one deduplicating storage
- Implement ideal streaming parsers for JSON-like data
- Efficiently stream updates for JSON-like data
The tradeoff for these benefits is a somewhat more complicated and less efficient encoder.
The problem
JSON has a Type-2 (context-free) grammar and requires a pushdown automaton to parse it, so it's not possible to implement an efficient streaming parser for JSON. Just imagine a huge hierarchy of nested JSON objects: you won't be able to finish parsing the top-level object until you reach the end of the file.
JSON is frequently used to store and transfer large amounts of data, and these payloads tend to grow over time. Just imagine a typical JSON config file for a large enterprise product.
The non-streaming nature of almost all JSON parsers means a lot of work has to be done every time you need to deserialize a huge chunk of JSON data: you need to read it from disk, parse it and, usually, map the raw JSON tree to object instances.
This may be very inefficient and cause unnecessary delays and pauses.
The idea
Let's assume that we have a small JSON document:
[
{"some key": "some value"},
{"some key": "some value"},
{"some value": "some key"}
]
Let's build a table containing every unique value in our JSON:
Type | index | Value | Is Root |
---|---|---|---|
string | 0 | "some key" | No |
string | 1 | "some value" | No |
object | 0 | [string:0, string:1] | No |
object | 1 | [string:1, string:0] | No |
array | 0 | [object:0, object:0, object:1] | Yes (file.json) |
This way we have flattened and deduplicated our JSON.
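The flattening step can be sketched in a few lines. This is a minimal illustration, not the Izumi.SICK implementation: the `Table` class and its `intern`/`add` methods are hypothetical names. It handles only strings, objects and arrays, which is all the example needs. Each value is stored once per type, and a reference is a `(type, index)` pair. Objects are stored as a flat `[key, value, ...]` list of references, matching the table above.

```python
from typing import Any, Dict, List, Tuple

# A reference is a (type tag, index) pair, e.g. ("string", 0).
Ref = Tuple[str, int]


class Table:
    """Hypothetical per-type value tables with deduplication (sketch only)."""

    def __init__(self) -> None:
        self.values: Dict[str, List[Any]] = {}
        self._seen: Dict[Tuple[str, Any], int] = {}

    def intern(self, tag: str, value: Any) -> Ref:
        """Store `value` under `tag` once and return its reference."""
        key = (tag, value)
        if key not in self._seen:
            self._seen[key] = len(self.values.setdefault(tag, []))
            self.values[tag].append(value)
        return (tag, self._seen[key])

    def add(self, node: Any) -> Ref:
        """Flatten a parsed JSON value bottom-up and return its reference."""
        if isinstance(node, str):
            return self.intern("string", node)
        if isinstance(node, dict):
            # Objects become a flat (key, value, key, value, ...) tuple of references.
            flat: List[Ref] = []
            for k, v in node.items():
                flat.extend([self.add(k), self.add(v)])
            return self.intern("object", tuple(flat))
        if isinstance(node, list):
            return self.intern("array", tuple(self.add(x) for x in node))
        raise TypeError(f"unsupported value in this sketch: {node!r}")


if __name__ == "__main__":
    t = Table()
    root = t.add([{"some key": "some value"},
                  {"some key": "some value"},
                  {"some value": "some key"}])
    print(root)                 # ('array', 0)
    print(t.values["object"])   # [(('string', 0), ('string', 1)), (('string', 1), ('string', 0))]
```

The two identical objects collapse into `object:0`. This is because a container is interned only after its children, so equal subtrees end up with equal reference tuples.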
Streaming
Now we can do many different things. For example, we may stream our table:
string:0 = "some key"
string:1 = "some value"
object:0.size = 1
object:0[string:0] = string:1
object:1.size = 1
object:1[string:1] = string:0
array:0.size = 3
array:0[0] = object:0
array:0[1] = object:0
array:0[2] = object:1
string:2 = "file.json"
root:0 = string:2, array:0
This particular encoding is inefficient, but it's streamable. Moreover, we can extend it with a removal message to support arbitrary updates:
array:0[0] = object:1
array:0[1] = remove
There is an interesting observation: the initial stream entries (when there are no removals) may be safely reordered, though the receiver may sometimes need to buffer some of them until the entries they depend on arrive.
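The receiving side can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's wire format. The tuple message shapes, the `Receiver` class and its `apply`/`ready`/`materialize` methods are all hypothetical. The sketch also assumes that a removal from an array closes the gap while keeping the remaining elements in positional order. The receiver simply stores every entry it gets, so arrival order doesn't matter. `ready` reports whether everything a reference depends on has arrived, which is the point where a buffering receiver could hand the value on.

```python
import random
from typing import Any, Dict, List, Tuple, Union

Ref = Tuple[str, int]
Slot = Union[int, Ref]  # array position, or object key reference

# Hypothetical message shapes mirroring the textual stream:
#   ("set", ref, value)           string:0 = "some key"
#   ("size", ref, n)              array:0.size = 3
#   ("entry", ref, slot, target)  array:0[1] = object:0
#   ("remove", ref, slot)         array:0[1] = remove
STREAM: List[Tuple[Any, ...]] = [
    ("set", ("string", 0), "some key"),
    ("set", ("string", 1), "some value"),
    ("size", ("object", 0), 1),
    ("entry", ("object", 0), ("string", 0), ("string", 1)),
    ("size", ("object", 1), 1),
    ("entry", ("object", 1), ("string", 1), ("string", 0)),
    ("size", ("array", 0), 3),
    ("entry", ("array", 0), 0, ("object", 0)),
    ("entry", ("array", 0), 1, ("object", 0)),
    ("entry", ("array", 0), 2, ("object", 1)),
]


class Receiver:
    def __init__(self) -> None:
        self.scalars: Dict[Ref, Any] = {}
        self.sizes: Dict[Ref, int] = {}
        self.entries: Dict[Ref, Dict[Slot, Ref]] = {}

    def apply(self, msg: Tuple[Any, ...]) -> None:
        kind, ref = msg[0], msg[1]
        if kind == "set":
            self.scalars[ref] = msg[2]
        elif kind == "size":
            self.sizes[ref] = msg[2]
        elif kind == "entry":
            self.entries.setdefault(ref, {})[msg[2]] = msg[3]
        elif kind == "remove":
            self.entries.get(ref, {}).pop(msg[2], None)
        else:
            raise ValueError(f"unknown message kind: {kind}")

    def ready(self, ref: Ref) -> bool:
        """True once `ref` and everything it points to has arrived (assumes no cycles)."""
        if ref[0] == "string":
            return ref in self.scalars
        slots = self.entries.get(ref, {})
        if ref not in self.sizes or len(slots) != self.sizes[ref]:
            return False
        children = list(slots.values())
        if ref[0] == "object":
            children += list(slots.keys())  # object keys are references too
        return all(self.ready(c) for c in children)

    def materialize(self, ref: Ref) -> Any:
        """Rebuild a plain Python value from the received entries."""
        if ref[0] == "string":
            return self.scalars[ref]
        slots = self.entries.get(ref, {})
        if ref[0] == "array":
            return [self.materialize(slots[i]) for i in sorted(slots)]
        if ref[0] == "object":
            return {self.materialize(k): self.materialize(v) for k, v in slots.items()}
        raise ValueError(f"unsupported tag in this sketch: {ref[0]}")


if __name__ == "__main__":
    shuffled = STREAM[:]
    random.shuffle(shuffled)  # initial entries may arrive in any order
    rx = Receiver()
    for m in shuffled:
        rx.apply(m)
    assert rx.ready(("array", 0))
    print(rx.materialize(("array", 0)))
    # The update from the text: replace element 0, remove element 1.
    rx.apply(("entry", ("array", 0), 0, ("object", 1)))
    rx.apply(("remove", ("array", 0), 1))
    print(rx.materialize(("array", 0)))
```

Once removals enter the picture, the declared `size` no longer matches the entry count. For that reason the sketch uses `ready` only for the initial, removal-free stream.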
Binary storage
We may note that the only complex data structures in our "Value" column are lists and (type, index) pairs. Let's call these pairs "references".
A reference can be represented as a pair of integers, so it would have a fixed byte length.
A list of references can be represented as an integer storing the list length, followed by the bytes of all the references. Note that such a binary structure is indexed: when we know the index of the element we want to access, we can compute its byte offset and read it immediately.
A list of any fixed-size scalar values can be represented the same way.
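Here is a minimal sketch of such a fixed-size reference list. The byte layout is an assumption for illustration and not necessarily the one SICK uses: little-endian, a 4-byte length prefix, and each reference stored as a 1-byte type tag plus a 4-byte index. The numeric tags and the function names `encode_refs`/`ref_at` are hypothetical. Because every element has the same size, element `i` starts at `4 + i * 5`, so it can be read without decoding the others.

```python
import struct
from typing import List, Tuple

Ref = Tuple[int, int]  # (numeric type tag, index); tag numbers are hypothetical

# Assumed layout: little-endian, no padding.
COUNT = struct.Struct("<I")   # 4-byte list length
REF = struct.Struct("<BI")    # 1-byte type tag + 4-byte index = 5 bytes


def encode_refs(refs: List[Ref]) -> bytes:
    """Length prefix followed by fixed-size reference records."""
    out = bytearray(COUNT.pack(len(refs)))
    for tag, idx in refs:
        out += REF.pack(tag, idx)
    return bytes(out)


def ref_at(buf: bytes, i: int) -> Ref:
    """Read the i-th reference directly from its computed offset."""
    (count,) = COUNT.unpack_from(buf, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    return REF.unpack_from(buf, COUNT.size + i * REF.size)


if __name__ == "__main__":
    OBJECT = 2  # hypothetical tag number for "object"
    buf = encode_refs([(OBJECT, 0), (OBJECT, 0), (OBJECT, 1)])
    print(len(buf))        # 19
    print(ref_at(buf, 2))  # (2, 1)
```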
A list of variable-size values (e.g. a list of strings) can be represented the following way:
{strings count}{list of string offsets}{all the strings concatenated}
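So, ["a", "bb", "ccc"] would become something like 3 0 1 3 a bb ccc (spaces added here only for readability): the count 3, the start offsets 0, 1 and 3, and then the concatenated strings.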
An important fact is that this encoding is indexed too, and it can be reused to store any list of variable-length data.
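The `{strings count}{list of string offsets}{all the strings concatenated}` layout can be sketched like this. It assumes little-endian 4-byte counts and offsets and UTF-8 string bytes, and `encode_strings`/`string_at` are hypothetical names. The sketch also assumes the buffer ends exactly where the list ends, so the last string's end is derived from the buffer length. A real container would more likely store an explicit total length.

```python
import struct
from typing import List

U32 = struct.Struct("<I")  # assumed: little-endian 4-byte count and offsets


def encode_strings(items: List[str]) -> bytes:
    """{count}{start offset of each string}{all string bytes concatenated}"""
    blobs = [s.encode("utf-8") for s in items]
    offsets, pos = [], 0
    for b in blobs:
        offsets.append(pos)  # start of this string inside the data section
        pos += len(b)
    header = U32.pack(len(blobs)) + b"".join(U32.pack(o) for o in offsets)
    return header + b"".join(blobs)


def string_at(buf: bytes, i: int) -> str:
    """Random access: two offset reads and one slice, no scanning."""
    (count,) = U32.unpack_from(buf, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    data = U32.size * (1 + count)  # where the concatenated strings begin
    start = U32.unpack_from(buf, U32.size * (1 + i))[0]
    if i + 1 < count:
        end = U32.unpack_from(buf, U32.size * (2 + i))[0]
    else:
        end = len(buf) - data  # last string runs to the end of the buffer
    return buf[data + start:data + end].decode("utf-8")


if __name__ == "__main__":
    buf = encode_strings(["a", "bb", "ccc"])
    print(buf[16:])           # b'abbccc'
    print(string_at(buf, 1))  # bb
```

For `["a", "bb", "ccc"]` the header holds the count 3 and the offsets 0, 1 and 3, which is the `3 0 1 3 a bb ccc` example in binary form.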
Additional capabilities over JSON
SICK encoding follows the compositional principles of JSON (a set of primitive types plus lists and dictionaries), though it is more powerful: it has a "reference" type and allows you to encode custom types.
(1) It's easy to note that our table may store circular references, something JSON can't do natively:
Type | index | Value | Is Root |
---|---|---|---|
object | 0 | [string:0, object:1] | No |
object | 1 | [string:1, object:0] | No |
This may be convenient in some complex cases.
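Circular references stay harmless as long as they are resolved lazily, on access, rather than by eagerly rebuilding a tree. The sketch below shows that idea. The contents of `string:0` and `string:1` are not given in the table, so the keys `"a"` and `"b"` are hypothetical. `LazyObject` and `resolve` are also illustrative names, not library API, and the sketch assumes object keys are strings.

```python
from typing import Any, Dict, List, Tuple

Ref = Tuple[str, int]

# The circular table above; the two string values are hypothetical.
TABLE: Dict[str, List[Any]] = {
    "string": ["a", "b"],
    "object": [
        [("string", 0), ("object", 1)],  # object:0 = {"a": object:1}
        [("string", 1), ("object", 0)],  # object:1 = {"b": object:0}
    ],
}


class LazyObject:
    """Resolves a key only when it is accessed, so cycles never loop forever."""

    def __init__(self, table: Dict[str, List[Any]], ref: Ref) -> None:
        self.table, self.ref = table, ref

    def __getitem__(self, key: str) -> Any:
        flat = self.table["object"][self.ref[1]]
        for k, v in zip(flat[0::2], flat[1::2]):  # (key ref, value ref) pairs
            if self.table["string"][k[1]] == key:
                return resolve(self.table, v)
        raise KeyError(key)


def resolve(table: Dict[str, List[Any]], ref: Ref) -> Any:
    tag, idx = ref
    if tag == "string":
        return table["string"][idx]
    if tag == "object":
        return LazyObject(table, ref)
    raise ValueError(f"unsupported tag in this sketch: {tag}")


if __name__ == "__main__":
    o = resolve(TABLE, ("object", 0))
    print(o["a"]["b"]["a"].ref)  # ('object', 1)
```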
(2) We may also note that we can happily store multiple JSON files in one table and get full deduplication over their content. We just need to introduce a separate attribute (Is Root) storing either nothing or the name of our "root entry" (the JSON file).
In a real implementation it's more convenient to create a separate "root" type. The value of a root entry is always a pair: a reference to its name and a reference to the actual JSON value we encoded:
Type | index | Value |
---|---|---|
string | 0 | "some key" |
string | 1 | "some value" |
string | 2 | "file.json" |
object | 0 | [string:0, string:1] |
object | 1 | [string:1, string:0] |
array | 0 | [object:0, object:0, object:1] |
root | 0 | [string:2, array:0] |
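A store holding several named roots can be sketched by adding a `root` type whose value is a `(name reference, value reference)` pair. `Store` and `add_root` are hypothetical names, and only strings, objects and arrays are handled. The value is added before its name so that the string indices match the table above. A second file that reuses an existing object only adds its own name.

```python
from typing import Any, Dict, List, Tuple

Ref = Tuple[str, int]


class Store:
    """Hypothetical deduplicating store holding several named JSON roots."""

    def __init__(self) -> None:
        self.values: Dict[str, List[Any]] = {}
        self._seen: Dict[Tuple[str, Any], int] = {}

    def _intern(self, tag: str, value: Any) -> Ref:
        key = (tag, value)
        if key not in self._seen:
            self._seen[key] = len(self.values.setdefault(tag, []))
            self.values[tag].append(value)
        return (tag, self._seen[key])

    def _add(self, node: Any) -> Ref:
        if isinstance(node, str):
            return self._intern("string", node)
        if isinstance(node, dict):
            flat: List[Ref] = []
            for k, v in node.items():
                flat += [self._add(k), self._add(v)]
            return self._intern("object", tuple(flat))
        if isinstance(node, list):
            return self._intern("array", tuple(self._add(x) for x in node))
        raise TypeError(f"unsupported value in this sketch: {node!r}")

    def add_root(self, name: str, node: Any) -> Ref:
        value_ref = self._add(node)  # content first, so names get later indices
        name_ref = self._add(name)
        # A root entry is a (name reference, value reference) pair.
        return self._intern("root", (name_ref, value_ref))


if __name__ == "__main__":
    s = Store()
    s.add_root("file.json", [{"some key": "some value"},
                             {"some key": "some value"},
                             {"some value": "some key"}])
    s.add_root("other.json", {"some key": "some value"})
    print(s.values["root"])  # [(('string', 2), ('array', 0)), (('string', 3), ('object', 0))]
```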
(3) We may encode custom scalar data types (e.g. timestamps) natively just by introducing new type tags.
(4) We may even store polymorphic types by introducing new type tags or even new type references.
Implementation
TODO
Streaming
TODO
Efficient binary indexed storage
TODO
Limitations
The current implementation has the following limitations:
- Maximum object size: 65534 keys
- The order of object keys is not preserved
- Maximum number of array elements: 2^32
- Maximum number of unique values of the same type: 2^32
These limitations may be lifted by using more bytes to store offset pointers and counts at the binary level, though it's hard to imagine a real application that would need that.
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.1 is compatible. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
.NETStandard 2.1
- Newtonsoft.Json (>= 13.0.1)