Axis.Dia
0.0.1
dotnet add package Axis.Dia --version 0.0.1
NuGet\Install-Package Axis.Dia -Version 0.0.1
<PackageReference Include="Axis.Dia" Version="0.0.1" />
paket add Axis.Dia --version 0.0.1
#r "nuget: Axis.Dia, 0.0.1"
// Install Axis.Dia as a Cake Addin
#addin nuget:?package=Axis.Dia&version=0.0.1
// Install Axis.Dia as a Cake Tool
#tool nuget:?package=Axis.Dia&version=0.0.1
Dia
Data representation specification and implementation, supporting both textual (human readable) and binary formats.
Contents
- Introduction
- Specification
- Annotation
- Bool
- Integer
- Decimal
- Instant
- String
- Symbol
- Clob
- Blob
- List
- Record
- Appendix (things like key-words, var-bytes, etc will appear here)
<a id="Introduction"></a> Introduction
Dia is yet another data representation format, born from the need for features that json lacks; as such, it is a superset of json.
<a id="Specification"></a> Specification
Dia recognizes 10 data types. The concept of types in Dia allows for the absence of values: in this case, a null is used. Dia types are:
0. Annotation
1. Bool
2. Int
3. Decimal
4. Instant
5. String
6. Symbol
7. Clob
8. Blob
9. List
10. Record
Every dia value represents data of the corresponding type, or the absence of data. All values may also have an optional annotation list attached to them.
As already stated, Dia supports Textual and Binary representation of a specific arrangement of the above types. The following sections will discuss the details of each of the types, as well as their representation in the 2 formats. However, before proceeding, a general overview of the binary format is necessary, as there are shared concepts among the types that need to be established first.
1. Data Packet
At the root of the Dia data model is a data packet. Strictly speaking, it is a list of 0 or more Dia values, where each Dia value can be of any of the supported types. It is in this form that data is transmitted, stored, or utilized.
2. Binary Representation
In binary representation, every value exists as a self-contained entity, independently from other values - meaning a series of bytes will represent each value, the reading and interpretation of which can be done without need for any other ancillary information. The only exception to this rule is for the symbol type, and will be explained in detail later.
Generally speaking, the first byte of each dia-value, henceforth called the type-metadata, represents its type: since dia supports only 10 distinct types, a single byte is more than sufficient to represent the 10 type identifiers: 1 - 10. For values of the bool type, a single byte is usually enough to encapsulate the entire data.
The first 4 bits (index 0 - 3) are reserved to represent actual type identifiers, bit 5 (index 4) is reserved for indicating if annotations exist on the value, bit 6 (index 5) is reserved to indicate if the value is null (if set), while the remaining 2 bits (index 6, 7) are left for each type to use as it pleases.
Immediately after the type-metadata is an optional byte group for annotations, and then a byte group for the type's payload.
Dia binary packets are a sequential list of dia binary values.
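
The bit layout of the type-metadata byte described above can be sketched as follows. This is only an illustration, not the library's API: the names `TypeMetadata` and `read_type_metadata` are hypothetical, and the type identifiers are taken from the per-type sections of this document (0x1 for Bool through 0xA for Record).

```python
from dataclasses import dataclass

# Type identifiers as listed in the per-type sections of this document.
TYPE_NAMES = {
    0x1: "Bool", 0x2: "Int", 0x3: "Decimal", 0x4: "Instant", 0x5: "String",
    0x6: "Symbol", 0x7: "Clob", 0x8: "Blob", 0x9: "List", 0xA: "Record",
}


@dataclass(frozen=True)
class TypeMetadata:  # hypothetical container, for illustration only
    type_name: str
    annotated: bool   # bit index 4
    is_null: bool     # bit index 5
    custom_bits: int  # bit indexes 6 and 7, meaning depends on the type


def read_type_metadata(byte: int) -> TypeMetadata:
    """Split a type-metadata byte into its documented fields."""
    type_id = byte & 0x0F  # bit indexes 0 - 3
    if type_id not in TYPE_NAMES:
        raise ValueError(f"unknown type identifier: {type_id:#x}")
    return TypeMetadata(
        type_name=TYPE_NAMES[type_id],
        annotated=bool(byte & 0x10),
        is_null=bool(byte & 0x20),
        custom_bits=(byte >> 6) & 0x03,
    )
```

For instance, `read_type_metadata(0b0001_0010)` describes an annotated, non-null Int.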
3. Text Representation
As stated previously, the textual representation of dia is a superset of json; it is essentially json + annotations + extra-data-types.
<a id="Annotation"></a> Annotation
An annotation is not a bona fide dia type, but a special use-case of the Symbol type. An annotation is used to tag a value with extra (possibly semantic) meaning, and is usually open to interpretation by the user of the data. Annotations come in 2 flavors:
- Tags, which are symbols in the `identifier` format, and cannot be equivalent to Dia keywords.
- Attributes (Value-Pair): a name/value pair defined as `<tag>@<text>`, where `<tag>` is the name, and `<text>` is the value. Attributes conform to the pattern `/^[a-zA-Z_](([.-])?[a-zA-Z0-9_])*@.+\z/`; a symbol matching it is called an `attribute`.
Note: Annotations are never null (null symbols).
Text Representation
When present in textual format, annotations present themselves as text with an end-delimiter of "::". The following examples illustrate this:
valid::annotations::<dia-value> // valid
in valid::att-ributes::<dia-value> // invalid
'scale@metric'::'expiry-notation@false positive'::<dia-value> // valid
'genre@horror'::<dia-value> // valid
abc@xyz::<dia-value> // invalid
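
As a rough illustration of the two annotation flavors, the sketch below classifies the unquoted text of an annotation using the attribute pattern quoted above and the identifier pattern quoted in the Symbol section. The function name `classify_annotation` is hypothetical, Python's `\Z` stands in for `\z`, and the keyword check is left out because the keyword list is not given in this document.

```python
import re

# Patterns transcribed from this document; "\Z" is Python's spelling of "\z".
IDENTIFIER = re.compile(r"^[a-zA-Z_](([.-])?[a-zA-Z0-9_])*\Z")
ATTRIBUTE = re.compile(r"^[a-zA-Z_](([.-])?[a-zA-Z0-9_])*@.+\Z")


def classify_annotation(text):
    """Return "tag", "attribute", or None for the unquoted annotation text.

    Assumption: Dia keywords are NOT rejected here, since this document
    does not list them.
    """
    if IDENTIFIER.match(text):
        return "tag"
    if ATTRIBUTE.match(text):
        return "attribute"
    return None
```

With this sketch, `valid` and `att-ributes` classify as tags, `scale@metric` as an attribute, and `in valid` as neither.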
Binary Representation
Identical to [Symbol](#Symbol), but with a var-byte count appearing first, representing the number of annotations that follow.
<a id="Bool"></a> 1. Bool
The Bool type represents typical boolean values: `true` and `false`, in addition to `null`.
Text Representation
Boolean values are represented by case insensitive "true" and "false", and the case sensitive "null.bool".
Binary Representation
Type-Metadata Byte
[.... 0001]
(0x1)
Description
The type-metadata byte is sufficient for encapsulating all instances of the bool value. So a bool value will be represented with only 1 byte.
- true:
[.1.. 0001]
- false:
[.0.. 0001]
- null:
[..1. 0001]
- annotated:
[...1 0001]
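
A minimal sketch of the single-byte Bool encoding, assuming the type identifier 0x1 stated above and the shared annotated/null bit positions. The function names are hypothetical, and the annotation bytes that would follow an annotated value are not modelled.

```python
BOOL_TYPE_ID = 0x01  # [.... 0001]
ANNOTATED = 0x10     # bit index 4
NULL = 0x20          # bit index 5
TRUE = 0x40          # bit index 6, the custom bit Bool uses for its value


def encode_bool(value, annotated=False):
    """Encode True, False or None (null) into the type-metadata byte."""
    byte = BOOL_TYPE_ID
    if annotated:
        byte |= ANNOTATED
    if value is None:
        byte |= NULL
    elif value:
        byte |= TRUE
    return byte


def decode_bool(byte):
    """Decode a Bool type-metadata byte back into True, False or None."""
    if byte & 0x0F != BOOL_TYPE_ID:
        raise ValueError("not a Bool value")
    if byte & NULL:
        return None
    return bool(byte & TRUE)
```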
<a id="Int"></a> 2. Integer
The int type represents signed, unlimited mathematical integer values, in addition to `null`.
Text Representation
Integers come in 3 flavors:
- Decimal integers: represented as an optional negative sign, and a series of digits possibly separated by an underscore.
- Hex integers: represented by the prefix "0x", followed by an unlimited sequence of any valid hexadecimal characters (0-9, a-f, A-F), possibly separated by an underscore.
- Binary integers: represented by the prefix "0b", followed by an unlimited sequence of zeros and ones, possibly separated by an underscore.

Examples:
// decimal integers
0 // valid
01 // valid
1 // valid
000 // valid
34565454 // valid
343_454_53 // valid
_545 // invalid
4345_ // invalid
// hex integers
0x01 // valid
0X0 // valid
0x // invalid
0x000a // valid
0xA // valid
0xx // invalid
0x545yt345 // invalid
// binary
0b // invalid
0B0 // invalid
0b0000011010110 // invalid
0b11010110 // valid
Binary Representation
Type-Metadata
[.... 0010]
(0x2)
Custom-Metadata
- annotated:
[...1 0010]
- null:
[..1. 0010]
- Int8:
[00.. 0010]
- Int16:
[01.. 0010]
- Int32:
[10.. 0010]
- Intxx:
[11.. 0010]
Description
Integers require additional bytes besides the type-metadata for their data to be represented.
Dia supports unlimited/arbitrary-length integers, so special consideration is taken to cater for this.
Examples
- To represent the value '42' as an Int8 value, we need 2 bytes:
- 00 [00.. 0010]
- 01 [0010 1010]
- To represent the value '1228' as an Int16 value, we need 3 bytes:
- 00 [01.. 0010]
- 01 [1100 1100]
- 02 [0000 0100]
- To represent the value '86443187' as an Int32 value, we need 5 bytes:
- 00 [10.. 0010]
- 01 [1011 0011]
- 02 [0000 0100]
- 03 [0010 0111]
- 04 [0000 0101]
- To represent an arbitrarily large integer, e.g. '9223372036854775807000981123', we need 15 bytes:
- 00 [11.. 0010]
- 01 [1000 0011]
- 02 [1101 1101]
- 03 [1101 0000]
- 04 [1010 0011]
- 05 [1111 1100]
- 06 [1111 1111]
- 07 [1111 1111]
- 08 [1111 1111]
- 09 [1111 1111]
- 10 [1111 1111]
- 11 [1001 0011]
- 12 [1110 1011]
- 13 [1101 1100]
- 14 [0000 0011]
See [var-byte](#Var-byte) for a better explanation of how `var-bytes` are represented.
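
The fixed-width examples above (Int8, Int16, Int32) can be reproduced with the sketch below. It assumes the payload is little-endian, which is what the example bytes show, that negative values use two's complement, and that the smallest fitting width is chosen; the last two points are assumptions, not statements from this document. The function names are hypothetical, and the Intxx form is left to the var-byte description in the appendix.

```python
INT_TYPE_ID = 0x02
# Custom bits (bit indexes 6 and 7) mapped to payload width in bytes.
WIDTHS = {0b00: 1, 0b01: 2, 0b10: 4}  # Int8, Int16, Int32


def encode_fixed_int(value: int) -> bytes:
    """Encode value using the smallest fixed width that can hold it."""
    for custom_bits, width in WIDTHS.items():
        try:
            # Assumption: signed two's complement, little-endian.
            payload = value.to_bytes(width, "little", signed=True)
        except OverflowError:
            continue
        return bytes([INT_TYPE_ID | (custom_bits << 6)]) + payload
    raise OverflowError("value needs the Intxx (var-byte) form")


def decode_fixed_int(data: bytes) -> int:
    """Decode a non-null, non-annotated fixed-width integer."""
    custom_bits = data[0] >> 6
    if custom_bits not in WIDTHS:
        raise ValueError("Intxx values are not handled by this sketch")
    width = WIDTHS[custom_bits]
    return int.from_bytes(data[1:1 + width], "little", signed=True)
```

For the value 1228 this yields the three bytes shown in the Int16 example: `[01.. 0010]`, `[1100 1100]`, `[0000 0100]`.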
<a id="Decimal"></a> 3. Decimal
The Decimal type represents signed, unlimited mathematical floating point values, in addition to `null`.
Text Representation
Decimals come in 2 flavors:
- Regular decimal notation
- Scientific/exponent decimal notation.

Examples:
// regular notation
0.0
123.0
-0.44544
// exponent notation
0.0e0
2.5E-1
2345.0E+06
With either notation, the fractional part of the number must always be present. Also, the exponent notation is essentially the regular notation with a "E<sign><digits>" concatenated to it.
Binary Representation
Type-Metadata
[.... 0011]
(0x3)
Custom-Metadata
- annotated:
[...1 0011]
- null:
[..1. 0011]
- Decimal16:
[00.. 0011]
- BigDecimal:
[01.. 0011]
Description
The binary representation for decimals comes in 2 flavors:
- The straightforward adaptation of the `dotnet` binary representation for decimal, ergo 16 bytes of data.
- The custom format for big (arbitrary size) decimal values. BigDecimals are a tuple of an int (scale), and an arbitrary-length integer (significand) represented by a var-byte value.
<a id="Instant"></a> 4. Instant
The Instant type represents a timestamp, in addition to `null`. Instants are represented in various precisions, with the unspecified precisions assuming the default values.
Text Representation
In text, instants are represented in the format: <year>-<month>-<day><delimiter><hour>:<minute>:<second>.<sub-seconds><time-zone>.
- Mandatory components: year, month, day, delimiter
- Optional components: hour, minute, second, sub-seconds, time-zone. Optional components must be used in the order of their appearance; e.g., `year-month-day-delimiter-second.sub-second` is invalid. The exception is the time-zone component, which can appear independently of all the other optional components.

Examples:
2023-02-13TZ // <year>-<month>-<day><delimiter><time-zone> -> 2023/02/13 00:00:00.0 +00:00
1993-09-27T12 +00:00 // <year>-<month>-<day><delimiter><hour><time-zone> -> 1993/09/27 12:00:00.0 +00:00
1993-09-27T12:31 // <year>-<month>-<day><delimiter><hour>:<minute> -> 1993/09/27 12:31:00.0
1993-09-27T12:31:08 // <year>-<month>-<day><delimiter><hour>:<minute>:<seconds> -> 1993/09/27 12:31:08.0
1993-09-27T12:31:08.0023319 // <year>-<month>-<day><delimiter><hour>:<minute>:<seconds>.<sub-second> -> 1993/09/27 12:31:08.0023319
Binary Representation
Type-Metadata
[.... 0100]
(0x4)
- annotated:
[...1 0100]
- null:
[..1. 0100]
Custom-Metadata
...
Description
The instant is made of 8 components, each with its own binary representation, the first 3 of which are mandatory.
In practice, however, there are only 5 components, because the `Hour`, `Minute` and `Second` components are stored together as a single unit, as are the `Month` and `Day` components.
Components include (in the order they appear in the data stream):
- Year
- MD (Month-Day)
- HMS (Hour-Minute-Seconds) (optional)
- Sub-seconds (optional)
- Time-zone (optional)
Year
The year is a special component. It represents an ever increasing value, and as such a definite amount of data cannot be reserved for it. Owing to this, it is stored using `var-bytes`. The unique ability of `var-bytes` to store an arbitrary stream of bits is taken advantage of, thus the year component also acts as a "bit-lender" for other components whose data representations overflow whole bytes.
Specifically speaking, the `Day` component borrows the needed 1 bit from the first bit of the `Year` component, meaning a single "right-shift" of the bits restores the original value of the year component.
- Data type: var-byte
- Capacity: variable
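
The bit-lending idea can be sketched as below: the year is shifted left by one position so that its lowest bit is free for another component, and a single right-shift restores it. The function names are hypothetical, and the sketch assumes a non-negative year.

```python
def lend_bit(year: int, borrowed_bit: int) -> int:
    """Store a borrowed bit in the lowest bit position of the year value."""
    return (year << 1) | (borrowed_bit & 1)


def reclaim(stored: int):
    """Return (year, borrowed_bit); the right-shift restores the year."""
    return stored >> 1, stored & 1
```

The resulting integer is what would then be written out as a var-byte sequence.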
Month + Day
12 months require 4 bits to encapsulate, while 31 days require 5 bits. Together, this is a byte and one bit. As stated above, the single bit is borrowed from the first bit-position of the `year` component.
- Data type: byte
- Capacity: 1.125
- Arrangement:
  - Month: `00 [.... xxxx]`
  - Day: `00 [xxxx ....], M0 [.... ...x]`

PS: `M0` represents the custom metadata byte[0].
Hour + Minute + Second
24 hours (0-23) require 5 bits; being identical, minutes and seconds require 6 bits each to store their range (0-59), yielding a total of 17 bits, or 2 bytes and an extra bit. The extra bit is borrowed from the CustomMetadata.
- Data type: byte
- Capacity: 2.125
- Arrangement:
  - Seconds: `00 [..xx xxxx]`
  - Minutes: `00 [xx.. ....], 01 [.... xxxx]`
  - Hours: `01 [xxxx ....], M0 [.... ..x.]`
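
The Month + Day and Hour + Minute + Second arrangements above can be sketched as plain bit packing. Several details are assumptions rather than statements from this document: months and days are stored as 1-12 and 1-31, the lower bits of a split component go into the lower-indexed byte, and the single bit that does not fit is the most significant one. The sketch returns that spare bit separately and leaves its storage location to the caller. All function names are hypothetical.

```python
def pack_month_day(month: int, day: int):
    """Return (byte, spare_bit): month in bits 0-3, day in bits 4-8."""
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("month or day out of range")
    combined = month | (day << 4)  # 9 bits in total
    return combined & 0xFF, combined >> 8


def unpack_month_day(byte: int, spare_bit: int):
    """Inverse of pack_month_day; returns (month, day)."""
    combined = byte | (spare_bit << 8)
    return combined & 0x0F, combined >> 4


def pack_hms(hour: int, minute: int, second: int):
    """Return (two_bytes, spare_bit): 6 + 6 + 5 = 17 bits in total."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59 and 0 <= second <= 59):
        raise ValueError("time component out of range")
    combined = second | (minute << 6) | (hour << 12)
    return bytes([combined & 0xFF, (combined >> 8) & 0xFF]), combined >> 16


def unpack_hms(two_bytes: bytes, spare_bit: int):
    """Inverse of pack_hms; returns (hour, minute, second)."""
    combined = two_bytes[0] | (two_bytes[1] << 8) | (spare_bit << 16)
    return (combined >> 12) & 0x1F, (combined >> 6) & 0x3F, combined & 0x3F
```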
Sub-seconds
The unit of the subsecond component is the tick, each of which is 100 nanoseconds. There are 10,000,000 ticks in a second, so the component ranges from 0 to 9,999,999, which fits in 24 bits, or 3 bytes.
- Data type: byte
- Capacity: 3
Time-Zone
The time zone is stored as a sign bit indicating the timezone direction (positive for east, negative for west), 4 bits for 12 hour range, and 6 bits for 59 minutes range, a total of 11 bits
- Data type: bits
- Capacity: 7
- Arrangement:
  - Sign: `M0 [.... .x..]`
  - Hour: `00 [xx.. ....], M0 [...x x...]`
  - Minutes: `00 [..xx xxxx]`
Overall arrangement
00 [type-metadata]
01 [custom-metadata]
02 [year]+
03 [month-day]
04 [HMS]
05* [HMS]
06 [sub-seconds]
07 [sub-seconds]
08 [sub-seconds]
07* [time-zone]
<a id="String"></a> 5. String
The string maintains its ubiquitous definition here: a sequence of unicode-encoded characters, in addition to `null`.
Text Representation
There are 2 flavors of this: Single-line string, Multi-line string. Both representations are delimiter-enclosed, and support escaping.
1. Single-line string
This is represented as a delimiter-enclosed group of characters that must be presented on a single line. This means literal characters that descend to the next line must be escaped. The enclosing delimiter for the single-line string is `"`.
Examples:
"a valid string"
"another valid string with \n escaped new-line"
"another valid string with \u11EC a unicode escaped character"
"invalid
string"
2. Multi-line string
This slightly more complex flavor supports new-line characters, and special escape sequences that allow for user-friendly representation of text. The enclosing delimiters for the multi-line string are `@"` and `"`.
Examples:
@"Valid string"
@"Valid string
with new line"
@"Valid string
with new line, and another \n escaped new line"
// special escaping for improved formatting/readability
@"\
This string has the special new-line escape that \
essentially \"swallows\" all white-space characters \
up until the next non-whitespace character, or the \
end of the text. to use a regular back-slash, do this '\\'.
In this case, all of the white spaces preceeding the regular \
back-slash are included in the text.
"
<a id="Escapes"></a> Escape sequences
The following escape sequences are supported:
Escape Sequence | Meaning |
---|---|
\0 | Null character |
\a | Bell |
\b | Backspace |
\t | Tab |
\v | Vertical Tab |
\n | New line |
\r | Carriage return |
\f | Form feed |
\" | Double quote |
\\ | Backslash |
\NL | Escape whitespaces. A backslash followed by a new line, and arbitrary number of whitespaces |
\xHH | 1 byte char escape |
\uHHHH | 2 byte char escape |
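
The escape table above can be applied with a single regular-expression pass, as in the sketch below. It is an illustration only: the function name `unescape` is hypothetical, unknown escapes are rejected by assumption, and a lone backslash at the very end of the text is left untouched.

```python
import re

# Single-character escapes from the table above.
SIMPLE = {
    "0": "\0", "a": "\a", "b": "\b", "t": "\t", "v": "\v",
    "n": "\n", "r": "\r", "f": "\f", '"': '"', "\\": "\\",
}

# \xHH, \uHHHH, backslash + new line (+ following whitespace), or any
# other single escaped character.
_ESCAPE = re.compile(
    r"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|(\r?\n)\s*|(.))", re.DOTALL
)


def unescape(text: str) -> str:
    """Replace the escape sequences in text with the characters they name."""
    def replace(match):
        hex2, hex4, newline, simple = match.groups()
        if hex2 is not None:
            return chr(int(hex2, 16))
        if hex4 is not None:
            return chr(int(hex4, 16))
        if newline is not None:
            return ""  # \NL: swallow the new line and following whitespace
        if simple in SIMPLE:
            return SIMPLE[simple]
        raise ValueError(f"unsupported escape sequence: \\{simple}")

    return _ESCAPE.sub(replace, text)
```

Because the scan runs left to right, an escaped backslash followed by `n` stays a backslash and an `n` rather than turning into a new line.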
Binary Representation
Type-Metadata
[.... 0101]
(0x5)
Custom-Metadata
- annotated:
[...1 0101]
- null:
[..1. 0101]
Description
String data is stored as unicode, i.e. 2 bytes per character. A string-count component signifies how many characters (groups of 2 bytes) the string contains; it comes right after the type-metadata, and is represented as a var-byte.
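
A minimal sketch of this layout for a non-null, non-annotated string. It assumes that "unicode" means UTF-16 little-endian (as it does in .NET's `Encoding.Unicode`) and that the var-byte emits its least significant 7-bit group first; both are assumptions, and the function names are hypothetical.

```python
STRING_TYPE_ID = 0x05


def encode_var_byte(value: int) -> bytes:
    """7 data bits per byte; the high bit flags that another byte follows."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode_string(text: str) -> bytes:
    """type-metadata, then the character count, then 2 bytes per character."""
    payload = text.encode("utf-16-le")  # assumption: UTF-16LE
    count = len(payload) // 2           # number of 2-byte groups
    return bytes([STRING_TYPE_ID]) + encode_var_byte(count) + payload
```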
<a id="Symbol"></a> 6. Symbol
A symbol is similar to a string, but with a few restrictions on it: It is a sequence of ONLY printable ascii characters. There are 3 types of symbols, each depending on the nature of the character sequence contained:
- When the sequence of characters conforms to the pattern `/^[a-zA-Z_](([.-])?[a-zA-Z0-9_])*\z/`, the symbol is called an `identifier`. In the identifier form, symbols cannot be equal to the Dia keywords.
- When the sequence of characters conforms to the pattern `/^[a-zA-Z_](([.-])?[a-zA-Z0-9_])*@.+\z/`, the symbol is called an `attribute`.
- All other character sequence arrangements not matching the 2 above are classified as general symbols.
Two symbols are equivalent if they contain the same sequence of characters. Also note that since the symbol is restricted to printable ascii characters, escape characters are never applied, but left in their escape format.
Symbols provide an API for extracting attributes, or identifiers where present; only when extracting attributes are escape sequences processed and reduced to their actual unicode characters.
Textual Representation
Textually, except for `identifier` symbols, symbols MUST be enclosed by the `'` delimiter. In the case of `identifier`s, the single-quotes can be omitted. When present however, the enclosing delimiters cannot be empty.
Escape sequences supported are all listed [here](#Escapes), excluding `\NL`, but including `\'`.
Examples:
null.symbol // valid, representing a null symbol
abc // valid identifier symbol
'abc' // valid identifier symbol, equivalent to the previous symbol
abc_xyz // valid
abc.xyz // valid
abc-xyz // valid
symbol // invalid
symbol.something.else // valid
null // invalid (keyword)
null_something // valid
null.something // valid
'another valid symbol' // valid
'also valid \n\x4e \u1c2f symbol' // valid
Binary Representation
Symbols are the only data type that needs contextual information while reading/writing from a binary stream. The reason is that symbols are repeated a lot, and can benefit from some form of compression. The compression process used is simple:
- Differentiate between a binary representation of a regular symbol, and a symbol ID.
- The first time a regular symbol is encountered, it is read/written as a regular symbol, and an ID is created for it using a sequentially incremented integer value, and stored in a symbol table. This table is built and used ONLY during the course of reading/writing.
- For reading scenarios, when a symbol ID is encountered, it means it has already been read before, so the actual value is resolved from the table mentioned above.
- For writing scenarios, subsequent encounters of the symbol will be resolved from the table, and the IDs will be written to the stream.
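
The steps above can be sketched with two small classes that exchange abstract tokens instead of bytes. This is not the library's implementation: the class names and the `("symbol", text)` / `("id", n)` token shapes are hypothetical, and starting the IDs at 0 is an assumption.

```python
class SymbolWriter:
    """Emits a regular symbol on first sight, and its ID afterwards."""

    def __init__(self):
        self._ids = {}  # symbol text -> sequential ID

    def write(self, symbol: str):
        if symbol in self._ids:
            return ("id", self._ids[symbol])
        self._ids[symbol] = len(self._ids)  # assumption: IDs start at 0
        return ("symbol", symbol)


class SymbolReader:
    """Rebuilds the same table while reading, and resolves IDs from it."""

    def __init__(self):
        self._symbols = []  # index == ID

    def read(self, token):
        kind, value = token
        if kind == "symbol":
            self._symbols.append(value)
            return value
        return self._symbols[value]
```

Both tables live only for the duration of one read or write pass, and because the reader assigns IDs in the same order as the writer, no table has to be transmitted.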
Type-Metadata
[.... 0110]
(0x6)
Custom-Metadata
- annotated:
[...1 0110]
- null:
[..1. 0110]
- regular symbol:
[.... 0110]
- symbol ID:
[.1.. 0110]
Description
In the same manner as with strings, data is stored as unicode, so the symbol uses a var-byte to store the character count (not byte count), and following that is a sequence of bytes for the characters.
<a id="Clob"></a> 7. Clob
The Clob type represents a sequence of unicode characters whose meaning is left for the interpretation of the applications/systems that utilize it.
Text Representation
The textual representation of Clobs is identical to the `multi-line` string representation with one exception: they have an extra escape sequence, owing to the distinct enclosing delimiters. Enclosing delimiters are `<<` and `>>`.
Escape sequences supported are all listed [here](#Escapes), including `\>>`, and while the `\NL` escape sequence is supported, the version that accepts alignment parameters is not recognized for clobs.
Examples:
example 1
<<
clob stuff. Can be anything at all, however, use of the greater-than symbol \> must be escaped.
>>
example 2
<<\
all of the white-spaces preceeding the '\\' are ignored.
>>
example 3
<<
function asString(x) {
return x.toString("some values come here, even \\n new lines are accepted");
}
>>
Binary Representation
Type-Metadata
[.... 0111]
(0x7)
Custom-Metadata
- annotated:
[...1 0111]
- null:
[..1. 0111]
Description
Following the type-metadata is a `var-byte` value that represents the number of expected bytes: since Clobs are ascii characters, each character is one byte. Following the `var-byte` value is the actual byte sequence for the Clob.
<a id="Blob"></a> 8. Blob
The blob is, as the name implies, a block of raw bytes.
Text Representation
Blob values are represented textually as a delimiter-enclosed base-64 encoding of the actual bytes. Enclosing delimiters are `<` and `>`. Whitespaces and comments are allowed between the delimiters and the base-64 data.
Examples:
<
VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4=
>
<VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4=>
< VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4= >
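
Reading the textual blob form can be sketched as follows. The function name `parse_blob` is hypothetical, and comments between the delimiters are not handled because the comment syntax is not described in this document.

```python
import base64


def parse_blob(text: str) -> bytes:
    """Decode a `<...>` enclosed base-64 blob, ignoring whitespace."""
    text = text.strip()
    if len(text) < 2 or not (text.startswith("<") and text.endswith(">")):
        raise ValueError("blob must be enclosed in '<' and '>'")
    body = "".join(text[1:-1].split())  # drop all whitespace
    return base64.b64decode(body, validate=True)
```

All three examples above decode to the same bytes, the ASCII text `The quick brown fox jumps over the lazy dog.`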
Binary Representation
Type-Metadata
[.... 1000]
(0x8)
Custom-Metadata
- annotated:
[...1 1000]
- null:
[..1. 1000]
Description
Following the type-metadata is a `var-byte` value that represents the number of expected bytes; following that is the actual byte sequence for the Blob.
<a id="List"></a> 9. List
A list is a sequence of zero or more `Dia` values.
Text Representation
Similar to json, this is represented as a delimiter enclosed sequence of comma separated values. Delimiters are `[` and `]`.
Between each value can appear whitespaces or comments.
Examples:
[ 234, 3.45, true, [], 1992-04-05T, <abcxyz>, << clob text>>, annotated::"string value"]
Binary Representation
Type-Metadata
[.... 1001]
(0x9)
Custom-Metadata
- annotated:
[...1 1001]
- null:
[..1. 1001]
Description
Similar to the blob, a `var-byte` value follows the type-metadata, representing the number of items in the list.
Following the count are the binary representations of each value, laid out serially.
<a id="Record"></a> 10. Record
A record is a sequence of zero or more `Dia` properties. A property is a key-value pair, where the key is a non-null symbol, and the value is any legal `Dia` value. Records do not support duplicated property names in their sequence of properties.
Text Representation
Similar to json, this is represented as a delimiter enclosed sequence of comma separated properties. Delimiters are `{` and `}`.
Between each property can appear whitespaces or comments. Each property in turn consists of a symbol and any dia-value, separated by a colon `:`. Again, whitespaces/comments can appear anywhere between these elements.
Worthy of note is that the symbols in this case can also be enclosed in `"`; this way, valid `json` also becomes a valid `dia` structure.
Examples:
// valid
{ something: true}
// valid
{
something: 2345.54,
annotation::"key" : < b64_bytes= >,
'Key@value'::again::'property' : bleh::34
}
Binary Representation
Type-Metadata
[.... 1010]
(0xA)
Custom-Metadata
- annotated:
[...1 1010]
- null:
[..1. 1010]
Description
The record is similar to the list, i.e. the `var-byte` count represents the number of properties in the record.
Following the property count, the properties are themselves laid out serially with the key/name first, then the value last.
<a id="Appendix"></a> Appendix
<a id="Var-byte"></a> Var-byte
A variable byte binary representation. This is a regular 1-byte integer number, except that the sign-bit now represents 'overflow': if the bit is set, it means another var-byte value follows, containing more bits for the data.
Reading the collection of `var-byte` data requires removing all the overflow bits, and concatenating the remaining bits.
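
The reading and writing rules can be sketched as below for non-negative values. The group order (least significant 7 bits first) is inferred from the arbitrary-length integer example in the Integer section rather than stated here, and the function names are hypothetical.

```python
def encode_var_bytes(value: int) -> bytes:
    """Split value into 7-bit groups, setting the overflow bit on all but the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # overflow bit: more bytes follow
        else:
            out.append(group)
            return bytes(out)


def decode_var_bytes(data: bytes):
    """Return (value, bytes_consumed) for the var-byte sequence at the start of data."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)  # strip the overflow bit
        if not byte & 0x80:
            return value, index + 1
    raise ValueError("var-byte sequence ended while the overflow bit was set")
```

Encoding `9223372036854775807000981123` this way yields 14 bytes beginning with `[1000 0011]`, matching the payload of the Intxx example.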
<a id="Comments"></a> Comments
Comments are...
<a id="Keywords"></a> Keywords
Keywords include...
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. |
-
net7.0
- Axis.Luna.Common (>= 6.0.22)
- Axis.Luna.Extensions (>= 6.0.23)
- Axis.Luna.FInvoke (>= 6.0.22)
- Axis.Pulsar.Grammar (>= 0.7.18)
- Axis.Pulsar.Languages (>= 0.7.18)
Version | Downloads | Last updated | |
---|---|---|---|
0.0.1 | 226 | 9/26/2023 |