HDF5.NET 1.0.0-alpha.21

Suggested Alternatives

PureHDF

Additional Details

The package has been renamed to PureHDF.

This is a prerelease version of HDF5.NET.
dotnet add package HDF5.NET --version 1.0.0-alpha.21                

See https://github.com/Apollo3zehn/HDF5.NET/issues/9 for not yet implemented features.

API Documentation
.NET Standard 2.0
.NET Standard 2.1
.NET 5
.NET 6

HDF5.NET

A pure C# library without native dependencies that makes reading of HDF5 files (groups, datasets, attributes, links, ...) very easy.

The minimum supported target framework is .NET Standard 2.0 which includes

  • .NET Framework 4.6.1+
  • .NET Core (all versions)
  • .NET 5+

This library runs on all platforms (ARM, x86, x64) and operating systems (Linux, Windows, MacOS, Raspbian, etc) that are supported by the .NET ecosystem without special configuration.

The implementation follows the HDF5 File Format Specification.

Overwhelmed by the number of different HDF5 libraries? Here is a comparison table.

1. Objects

// open HDF5 file, the returned H5File instance represents the root group ('/')
using var root = H5File.OpenRead(filePath);

1.1 Get Object

Group

// get nested group
var group = root.Group("/my/nested/group");

Dataset


// get dataset in group
var dataset = group.Dataset("myDataset");

// alternatively, use the full path
var dataset = group.Dataset("/my/nested/group/myDataset");

Committed Data Type

// get committed data type in group
var commitedDatatype = group.CommitedDatatype("myCommitedDatatype");

Any Object Type

When you do not know what kind of link to expect at a given path, use the following code:

// get H5Object (base class of all HDF5 object types)
var myH5Object = group.Get("/path/to/unknown/object");

1.2 Additional Info

With an external link pointing to a relative file path it might be necessary to provide a file prefix (see also this overview).

You can either set an environment variable:

Environment.SetEnvironmentVariable("HDF5_EXT_PREFIX", "/my/prefix/path");

Or you can pass the prefix as an overload parameter:

var linkAccess = new H5LinkAccess()
{
    ExternalLinkPrefix = prefix
};

var dataset = group.Dataset(path, linkAccess);

1.3 Iteration

Iterate through all links in a group:

foreach (var link in group.Children)
{
    // pattern variables are named so that they do not clash with the outer 'group' and 'dataset' variables
    var message = link switch
    {
        H5Group childGroup          => $"I am a group and my name is '{childGroup.Name}'.",
        H5Dataset childDataset      => $"I am a dataset, call me '{childDataset.Name}'.",
        H5CommitedDatatype datatype => $"I am the data type '{datatype.Name}'.",
        H5UnresolvedLink lostLink   => $"I cannot find my link target =( shame on '{lostLink.Name}'.",
        _                           => throw new Exception("Unknown link type")
    };

    Console.WriteLine(message);
}

An H5UnresolvedLink becomes part of the Children collection when a symbolic link is dangling, i.e. the link target does not exist or cannot be accessed.

2. Attributes

// get attribute of group
var attribute = group.Attribute("myAttributeOnAGroup");

// get attribute of dataset
var attribute = dataset.Attribute("myAttributeOnADataset");

3. Data

The following code samples work for datasets as well as attributes.

// class: fixed-point

    var data = dataset.Read<int>();

// class: floating-point

    var data = dataset.Read<double>();

// class: string

    var data = dataset.ReadString();

// class: bitfield

    [Flags]
    enum SystemStatus : ushort /* make sure the enum in HDF file is based on the same type */
    {
        MainValve_Open          = 0x0001,
        AuxValve_1_Open         = 0x0002,
        AuxValve_2_Open         = 0x0004,
        MainEngine_Ready        = 0x0008,
        FallbackEngine_Ready    = 0x0010,
        // ...
    }

    var data = dataset.Read<SystemStatus>();
    var readyToLaunch = data[0].HasFlag(SystemStatus.MainValve_Open | SystemStatus.MainEngine_Ready);

// class: opaque

    var data = dataset.Read<byte>();
    var data = dataset.Read<MyOpaqueStruct>();

// class: compound

    /* option 1 (faster) */
    var data = dataset.Read<MyNonNullableStruct>();
    /* option 2 (slower, for more info see the link below after this code block) */
    var data = dataset.ReadCompound<MyNullableStruct>();

// class: reference

    var data = dataset.Read<H5ObjectReference>();
    var firstRef = data.First();

    /* NOTE: Dereferencing would be quite fast if the object's name
     * was known. Instead, the library searches recursively for the  
     * object. Do not dereference using a parent (group) that contains
     * any circular soft links. Hard links are no problem.
     */

    /* option 1 (faster) */
    var firstObject = directParent.Get(firstRef);

    /* option 2 (slower, use if you don't know the object's parent) */
    var firstObject = root.Get(firstRef);

// class: enumerated

    enum MyEnum : short /* make sure the enum in HDF file is based on the same type */
    {
        MyValue1 = 1,
        MyValue2 = 2,
        // ...
    }

    var data = dataset.Read<MyEnum>();

// class: variable length

    var data = dataset.ReadString();

// class: array

    var data = dataset
        .Read<int>()
        /* dataset dims = int[2, 3] */
        /*   array dims = int[4, 5] */
        .ToArray4D(2, 3, 4, 5);

// class: time
// -> not supported (reason: the HDF5 C lib itself does not fully support H5T_TIME)

For more information on compound data, see section Reading compound data.

4. Partial I/O and Hyperslabs

4.1 Overview

Partial I/O is one of the strengths of HDF5 and is applicable to all dataset types (contiguous, compact and chunked). With HDF5.NET, the full dataset can be read with a simple call to dataset.Read(). However, if you want to read only parts of the dataset, hyperslab selections are your friend. The following code shows how to work with these selections using a three-dimensional dataset (source) and a two-dimensional memory buffer (target):

var dataset = root.Dataset("myDataset");
var memoryDims = new ulong[] { 75, 25 };

var datasetSelection = new HyperslabSelection(
    rank: 3,
    starts: new ulong[] { 2, 2, 0 },
    strides: new ulong[] { 5, 8, 2 },
    counts: new ulong[] { 5, 3, 2 },
    blocks: new ulong[] { 3, 5, 2 }
);

var memorySelection = new HyperslabSelection(
    rank: 2,
    starts: new ulong[] { 2, 1 },
    strides: new ulong[] { 35, 17 },
    counts: new ulong[] { 2, 1 },
    blocks: new ulong[] { 30, 15 }
);

var result = dataset
    .Read<int>(
        fileSelection: datasetSelection,
        memorySelection: memorySelection,
        memoryDims: memoryDims
    )
    .ToArray2D(75, 25);

All shown parameters are optional. For example, when the fileSelection parameter is unspecified, the whole dataset will be read. Note that the number of data points in the file selection must always match that of the memory selection.
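
The matching rule can be checked by hand. Under the usual HDF5 hyperslab semantics, each dimension contributes counts × blocks elements, and the total is the product over all dimensions. The following sketch is not part of HDF5.NET: CountElements is a hypothetical helper that only repeats this arithmetic for the two selections shown above.

```csharp
using System;

// Hypothetical helper (not part of HDF5.NET): the number of elements selected
// by a hyperslab is the product over all dimensions of counts[i] * blocks[i].
static ulong CountElements(ulong[] counts, ulong[] blocks)
{
    if (counts.Length != blocks.Length)
        throw new ArgumentException("counts and blocks must have the same rank.");

    ulong total = 1;

    for (var i = 0; i < counts.Length; i++)
        total *= counts[i] * blocks[i];

    return total;
}

// values taken from the dataset (file) selection above
var fileElements = CountElements(
    counts: new ulong[] { 5, 3, 2 },
    blocks: new ulong[] { 3, 5, 2 });

// values taken from the memory selection above
var memoryElements = CountElements(
    counts: new ulong[] { 2, 1 },
    blocks: new ulong[] { 30, 15 });

Console.WriteLine($"file: {fileElements}, memory: {memoryElements}");
// prints "file: 900, memory: 900"
```

Both selections describe 900 elements, so they satisfy the rule. Starts and strides only move the blocks around and do not change the count.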

Additionally, there is an overload method that allows you to provide your own buffer.

4.2 Experimental: IQueryable (1-dimensional data only)

Another way to build the file selection is to invoke the AsQueryable method which can then be used as follows:

var result = dataset.AsQueryable<int>()
    .Skip(5)    // start
    .Stride(5)  // stride
    .Repeat(2)  // count
    .Take(3)    // block
    .ToArray();

All methods are optional, i.e. the code

var result = dataset.AsQueryable<int>()
    .Skip(5)
    .ToArray();

will simply skip the first 5 elements and return the rest of the dataset.
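
To make the mapping between the query methods and the hyperslab parameters concrete, the following sketch lists the element indices that the first query would select, assuming the usual HDF5 meaning of start, stride, count and block. SelectedIndices is a hypothetical helper written for this illustration and does not reflect how HDF5.NET evaluates the query internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of HDF5.NET): enumerates the element indices
// picked by a 1-dimensional hyperslab with the given start/stride/count/block.
static List<ulong> SelectedIndices(ulong start, ulong stride, ulong count, ulong block)
{
    var result = new List<ulong>();

    for (ulong c = 0; c < count; c++)       // one iteration per block
    {
        var blockStart = start + c * stride;

        for (ulong b = 0; b < block; b++)   // consecutive elements within the block
            result.Add(blockStart + b);
    }

    return result;
}

// Skip(5) -> start, Stride(5) -> stride, Repeat(2) -> count, Take(3) -> block
var selected = SelectedIndices(start: 5, stride: 5, count: 2, block: 3);

Console.WriteLine(string.Join(", ", selected));
// prints "5, 6, 7, 10, 11, 12"
```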

This way of building a hyperslab / selection has been implemented in an effort to provide a more .NET-like experience when working with data.

5. Filters

5.1 Built-in Filters

  • Shuffle (hardware accelerated<sup>1</sup>, SSE2/AVX2)
  • Fletcher32
  • Deflate (zlib)
  • Scale-Offset

<sup>1</sup> .NET Standard 2.1 and above

5.2 External Filters

Before you can use external filters, you need to register them using H5Filter.Register(...). This method accepts a filter identifier, a filter name and the actual filter function.

This function could look like the following and should be adapted to your specific filter library:

public static Memory<byte> FilterFunc(
    H5FilterFlags flags, 
    uint[] parameters, 
    Memory<byte> buffer)
{
    // Decompressing
    if (flags.HasFlag(H5FilterFlags.Decompress))
    {
        // pseudo code
        byte[] decompressedData = MyFilter.Decompress(parameters, buffer.Span);
        return decompressedData;
    }
    // Compressing
    else
    {
        throw new Exception("Writing data chunks is not yet supported by HDF5.NET.");
    }
}
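
As an illustration of what the decompression step in such a function might delegate to, here is a minimal sketch for zlib-wrapped data using the framework's ZLibStream (available in .NET 6 and later). Inflate is a hypothetical helper name, the sample data are made up, and the wiring into FilterFunc is not shown. Deflate is already built in, so this only demonstrates the pattern.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper (not part of HDF5.NET): inflates a zlib-wrapped buffer
// using ZLibStream, which requires .NET 6 or later.
static byte[] Inflate(ReadOnlyMemory<byte> input)
{
    using var source = new MemoryStream(input.ToArray());
    using var zlib = new ZLibStream(source, CompressionMode.Decompress);
    using var target = new MemoryStream();

    zlib.CopyTo(target);
    return target.ToArray();
}

// Round trip on made-up sample data to show that the helper works.
var original = Encoding.UTF8.GetBytes("hello hello hello hello");

byte[] compressed;

using (var buffer = new MemoryStream())
{
    // disposing the compressor flushes the remaining compressed bytes
    using (var compressor = new ZLibStream(buffer, CompressionLevel.Optimal))
        compressor.Write(original, 0, original.Length);

    compressed = buffer.ToArray();
}

var restored = Inflate(compressed);

Console.WriteLine(Encoding.UTF8.GetString(restored));
// prints "hello hello hello hello"
```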

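5.3 Tested External Filters

  • Deflate (based on Intel ISA-L, see section 5.4)
  • Blosc / Blosc2 (using Blosc2.PInvoke, see section 5.5)
  • BZip2 (using SharpZipLib, see section 5.6)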

5.4 How to use Deflate (hardware accelerated)

(1) Install the P/Invoke package:

dotnet add package Intrinsics.ISA-L.PInvoke

(2) Add the Deflate filter registration helper function to your code.

(3) Register Deflate:

 H5Filter.Register(
     identifier: H5FilterID.Deflate, 
     name: "deflate", 
     filterFunc: DeflateHelper_Intel_ISA_L.FilterFunc);

(4) Enable unsafe code blocks in .csproj:

<PropertyGroup>
    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
</PropertyGroup>

5.5 How to use Blosc / Blosc2 (hardware accelerated)

(1) Install the P/Invoke package:

dotnet add package Blosc2.PInvoke

(2) Add the Blosc filter registration helper function to your code.

(3) Register Blosc:

 H5Filter.Register(
     identifier: (H5FilterID)32001, 
     name: "blosc2", 
     filterFunc: BloscHelper.FilterFunc);

5.6 How to use BZip2

(1) Install the SharpZipLib package:

dotnet add package SharpZipLib

(2) Add the BZip2 filter registration helper function and the MemorySpanStream implementation to your code.

(3) Register BZip2:

 H5Filter.Register(
     identifier: (H5FilterID)307, 
     name: "bzip2", 
     filterFunc: BZip2Helper.FilterFunc);

6. Reading Compound Data

There are three ways to read structs which are explained in the following sections. Here is an overview:

| method | return value | speed | restrictions |
|---|---|---|---|
| Read<T>() | T | fast | predefined type with correct field offsets required; nullable fields are not allowed |
| ReadCompound<T>() | T | slow | predefined type with matching names required |
| ReadCompound() | Dictionary<string, object> | slow | - |

6.1 Structs without nullable fields

Structs without any nullable fields (i.e. no strings and other reference types) can be read like any other dataset using a high performance copy operation:

[StructLayout(LayoutKind.Explicit, Size = 5)]
struct SimpleStruct
{
    [FieldOffset(0)]
    public byte ByteValue;

    [FieldOffset(1)]
    public ushort UShortValue;

    [FieldOffset(3)]
    public TestEnum EnumValue;
}

var compoundData = dataset.Read<SimpleStruct>();

Just make sure the field offset attributes match the field offsets defined in the HDF5 file when the dataset was created.

This method does not require the struct's field names to match, since the fields are simply mapped by their offset.
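
To illustrate what "mapped by their offset" means, the following sketch decodes one raw 5-byte record by hand, using the offsets 0, 1 and 3 from SimpleStruct above. The byte values are made up, and both little-endian byte order and a 2-byte (short-based) TestEnum are assumptions of this example. It is not a description of the library's internal copy operation.

```csharp
using System;
using System.Buffers.Binary;

// One made-up 5-byte record laid out like SimpleStruct above
// (little-endian byte order is an assumption of this sketch).
var record = new byte[] { 0x2A, 0x34, 0x12, 0x02, 0x00 };

// offset 0, 1 byte
var byteValue = record[0];

// offset 1, 2 bytes
var ushortValue = BinaryPrimitives.ReadUInt16LittleEndian(record.AsSpan(1, 2));

// offset 3, 2 bytes (the enum's underlying type is assumed to be short)
var enumValue = BinaryPrimitives.ReadInt16LittleEndian(record.AsSpan(3, 2));

Console.WriteLine($"{byteValue}, {ushortValue}, {enumValue}");
// prints "42, 4660, 2"
```

Only the position and size of each field matter here, so the names used on the .NET side are irrelevant.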

If your struct contains an array of fixed size (here: 3), you would need to add the unsafe modifier to the struct definition and define the struct as follows:

[StructLayout(LayoutKind.Explicit, Size = 17)] // 5 bytes from the fields above + 3 * 4 bytes for the float array
unsafe struct SimpleStructWithArray
{
    // ... all the fields from the struct above, plus:

    [FieldOffset(5)]
    public fixed float FloatArray[3];
}

var compoundData = dataset.Read<SimpleStructWithArray>();

6.2 Structs with nullable fields (strings, arrays)

If you have a struct with string or normal array fields, you need to use the slower ReadCompound method:

struct NullableStruct
{
    public float FloatValue;
    public string StringValue1;
    public string StringValue2;
    public byte ByteValue;
    public short ShortValue;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public float[] FloatArray;
}

var compoundData = dataset.ReadCompound<NullableStruct>();
var compoundData = attribute.ReadCompound<NullableStruct>();
  • Please note the use of the MarshalAs attribute on the array field. This attribute tells the runtime that this array is of fixed size (here: 3) and that it should be treated as a value that is embedded into the struct instead of being a separate object.

  • Nested structs with nullable fields are not supported with this method.

  • Arrays with nullable element type are not supported with this method.

  • It is mandatory that the field names match exactly those in the HDF5 file. If you would like to use custom field names, consider the following approach:


// Apply the H5NameAttribute to the field with custom name.
struct NullableStructWithCustomFieldName
{
    [H5Name("FloatValue")]
    public float FloatValueWithCustomName;

    // ... more fields
}

// Create a name translator.
Func<FieldInfo, string> converter = fieldInfo =>
{
    var attribute = fieldInfo.GetCustomAttribute<H5NameAttribute>(true);
    return attribute is not null ? attribute.Name : fieldInfo.Name;
};

// Use that name translator.
var compoundData = dataset.ReadCompound<NullableStructWithCustomFieldName>(converter);

6.3 Unknown structs

Do you have no idea what the struct in the H5 file looks like? Or is it so large that predefining it is no fun? In that case, you can fall back to the non-generic dataset.ReadCompound(), which returns a Dictionary<string, object?>[] whose dictionary values can be anything from simple value types to arrays or nested dictionaries (or even H5ObjectReference), depending on the kind of data in the file. Use the standard .NET dictionary methods to work with this kind of data.

The type mapping is as follows:

| H5 type | .NET type |
|---|---|
| fixed point, 1 byte, unsigned | byte |
| fixed point, 1 byte, signed | sbyte |
| fixed point, 2 bytes, unsigned | ushort |
| fixed point, 2 bytes, signed | short |
| fixed point, 4 bytes, unsigned | uint |
| fixed point, 4 bytes, signed | int |
| fixed point, 8 bytes, unsigned | ulong |
| fixed point, 8 bytes, signed | long |
| floating point, 4 bytes | float |
| floating point, 8 bytes | double |
| string | string |
| bitfield | byte[] |
| opaque | byte[] |
| compound | Dictionary<string, object?> |
| reference | H5ObjectReference |
| enumerated | <base type> |
| variable length, type = string | string |
| array | <base type>[] |

Unsupported data types such as time and variable length (type = sequence) are represented as null.
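
Because the values are typed as object?, working with the result usually involves pattern matching. The following sketch walks over a hand-built dictionary that stands in for a single element of the returned array. The field names and values are invented for illustration, and Describe is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: describes a value using the .NET types from the table above.
static string Describe(object? item) => item switch
{
    null                               => "null (unsupported type)",
    string text                        => $"string '{text}'",
    Dictionary<string, object?> nested => $"compound with {nested.Count} field(s)",
    Array array                        => $"array of length {array.Length}",
    _                                  => $"{item.GetType().Name} {item}"
};

// Hand-built stand-in for one element of the array returned by ReadCompound().
var row = new Dictionary<string, object?>
{
    ["id"] = 7,
    ["name"] = "probe",
    ["samples"] = new float[] { 1f, 2f, 3f },
    ["position"] = new Dictionary<string, object?> { ["x"] = 1.5, ["y"] = 2.5 },
    ["timestamp"] = null
};

foreach (var (key, fieldValue) in row)
    Console.WriteLine($"{key}: {Describe(fieldValue)}");
```

The null case is listed first so that unsupported types are handled explicitly before any type test runs.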

7. Advanced Scenarios

7.1 Memory-Mapped File

In some cases, it might be useful to read data from a memory-mapped file instead of a regular FileStream to reduce the number of (costly) system calls. Depending on the file structure this may heavily increase random access performance. Here is an example:

using var mmf = MemoryMappedFile.CreateFromFile(
    fileStream, 
    mapName: default, 
    capacity: 0, 
    MemoryMappedFileAccess.Read,
    HandleInheritability.None,
    leaveOpen: false);

using var mmfStream = mmf.CreateViewStream(
    offset: 0, 
    size: 0,
    MemoryMappedFileAccess.Read);

using var root = H5File.Open(mmfStream);

...

7.2 Reading Multidimensional Data

7.2.1 Generic Method

Sometimes you want to read the data as multidimensional arrays. In that case use one of the byte[] overloads like ToArray3D (there are overloads up to 6D). Here is an example:

var data3D = dataset
    .Read<int>()
    .ToArray3D(new long[] { -1, 7, 2 });

The method accepts a long[] with the new array dimensions. This feature works similarly to Matlab's reshape function. A slightly adapted citation explains the behavior:

When you use -1 to automatically calculate a dimension size, the dimensions that you do explicitly specify must divide evenly into the number of elements in the input array.
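
The rule from the citation can be written down in a few lines. The following sketch is not the library's implementation: ResolveDimensions is a hypothetical helper that replaces a single -1 entry with the size implied by the element count, and the element count of 42 is an assumed example value.

```csharp
using System;

// Hypothetical helper (not part of HDF5.NET): replaces a single -1 entry with
// the size that makes the product of all dimensions equal to elementCount.
static long[] ResolveDimensions(long[] dims, long elementCount)
{
    var resolved = (long[])dims.Clone();
    var unknownIndex = -1;
    long knownProduct = 1;

    for (var i = 0; i < dims.Length; i++)
    {
        if (dims[i] == -1)
        {
            if (unknownIndex >= 0)
                throw new ArgumentException("Only one dimension may be -1.");

            unknownIndex = i;
        }
        else if (dims[i] <= 0)
        {
            throw new ArgumentException("Dimensions must be positive or -1.");
        }
        else
        {
            knownProduct *= dims[i];
        }
    }

    if (unknownIndex >= 0)
    {
        // the explicitly specified dimensions must divide evenly into the element count
        if (elementCount % knownProduct != 0)
            throw new ArgumentException("The dimensions do not divide evenly into the element count.");

        resolved[unknownIndex] = elementCount / knownProduct;
    }
    else if (knownProduct != elementCount)
    {
        throw new ArgumentException("The dimensions do not match the element count.");
    }

    return resolved;
}

var shape = ResolveDimensions(new long[] { -1, 7, 2 }, elementCount: 42);

Console.WriteLine(string.Join(" x ", shape));
// prints "3 x 7 x 2"
```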

7.2.2 High-Performance Method (2D only)

The previously shown method (ToArrayXD) performs a copy operation. If you would like to avoid this, you might find the Span2D type interesting which is part of the CommunityToolkit.HighPerformance. To make use of it, run dotnet add package CommunityToolkit.HighPerformance and then use it like this:

using CommunityToolkit.HighPerformance;

var data2D = dataset
    .Read<int>()
    .AsSpan()
    .AsSpan2D(height: 20, width: 10);

No data are being copied and you can work with the array similar to a normal Span<T>, i.e. you may want to slice through it.

8. Asynchronous Data Access (.NET 6+)

HDF5.NET supports reading data asynchronously to allow the CPU to work on other tasks while waiting for the result.

Note: All async methods shown below are only truly asynchronous if the FileStream is opened with the useAsync parameter set to true:

var h5File = H5File.Open(
    filePath,
    FileMode.Open, 
    FileAccess.Read, 
    FileShare.Read, 
    useAsync: true);

// alternative
var stream = new FileStream(..., useAsync: true);
var h5File = H5File.Open(stream);

Sample 1: Load data of two datasets

async Task LoadDataAsynchronously()
{
    var data1Task = dataset1.ReadAsync<int>();
    var data2Task = dataset2.ReadAsync<int>();

    await Task.WhenAll(data1Task, data2Task);
}

Sample 2: Load data of two datasets and process it

async Task LoadAndProcessDataAsynchronously()
{
    var processedData1Task = Task.Run(async () => 
    {
        var data1 = await dataset1.ReadAsync<int>();
        ProcessData(data1);
    });

    var processedData2Task = Task.Run(async () => 
    {
        var data2 = await dataset2.ReadAsync<int>();
        ProcessData(data2);
    });

    await Task.WhenAll(processedData1Task, processedData2Task);
}

Sample 3: Load data of a single dataset and process it


async Task LoadAndProcessDataAsynchronously()
{
    var processedData1Task = Task.Run(async () => 
    {
        var fileSelection1 = new HyperslabSelection(start: 0, block: 50);
        var data1 = await dataset1.ReadAsync<int>(fileSelection1);

        ProcessData(data1);
    });

    var processedData2Task = Task.Run(async () => 
    {
        var fileSelection2 = new HyperslabSelection(start: 50, block: 50);
        var data2 = await dataset1.ReadAsync<int>(fileSelection2);

        ProcessData(data2);
    });

    await Task.WhenAll(processedData1Task, processedData2Task);
}
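
Sample 3 hard-codes two ranges of 50 elements each. For a different dataset length or more parallel tasks, the (start, block) pairs could be computed instead. The following sketch shows one way to do that. Partition is a hypothetical helper that is not part of HDF5.NET, and the length of 100 elements is an assumption that matches the sample.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of HDF5.NET): splits `total` elements into
// `parts` contiguous (start, block) ranges; the first ranges absorb the remainder.
static List<(ulong Start, ulong Block)> Partition(ulong total, ulong parts)
{
    if (parts == 0)
        throw new ArgumentOutOfRangeException(nameof(parts));

    var ranges = new List<(ulong Start, ulong Block)>();
    var baseSize = total / parts;
    var remainder = total % parts;
    ulong start = 0;

    for (ulong i = 0; i < parts; i++)
    {
        var block = baseSize + (i < remainder ? 1UL : 0UL);
        ranges.Add((start, block));
        start += block;
    }

    return ranges;
}

foreach (var (rangeStart, rangeBlock) in Partition(total: 100, parts: 2))
    Console.WriteLine($"start: {rangeStart}, block: {rangeBlock}");
// prints "start: 0, block: 50" followed by "start: 50, block: 50"
```

Each pair could then be passed to a HyperslabSelection in the same way as in Sample 3.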

9. Comparison Table

The following table considers only projects listed on NuGet.org.

| Name | Arch | Platform | Kind | Mode | Version | License | Maintainer | Comment |
|---|---|---|---|---|---|---|---|---|
| v1.10 | | | | | | | | |
| HDF5.NET | all | all | managed | ro | N/A | MIT | Apollo3zehn | version does not apply, standalone implementation |
| HDF5-CSharp | x86,x64 | Win,Lin,Mac | HL | rw | 1.10.6 | MIT | LiorBanai | |
| SciSharp.Keras.HDF5 | x86,x64 | Win,Lin,Mac | HL | rw | 1.10.5 | MIT | SciSharp | fork of HDF-CSharp |
| ILNumerics.IO.HDF5 | x64 | Win,Lin | HL | rw | ? | proprietary | IL_Numerics_GmbH | probably 1.10 |
| LiteHDF | x86,x64 | Win,Lin,Mac | HL | ro | 1.10.5 | MIT | silkfire | |
| hdflib | x86,x64 | Windows | HL | wo | 1.10.6 | MIT | bdebree | |
| Mbc.Hdf5Utils | x86,x64 | Win,Lin,Mac | HL | rw | 1.10.6 | Apache-2.0 | bqstony | |
| HDF.PInvoke | x86,x64 | Windows | bindings | rw | 1.8,1.10 | HDF5 | hdf,gheber | |
| HDF.PInvoke.1.10 | x86,x64 | Win,Lin,Mac | bindings | rw | 1.10.6 | HDF5 | hdf,Apollo3zehn | |
| HDF.PInvoke.NETStandard | x86,x64 | Win,Lin,Mac | bindings | rw | 1.10.5 | HDF5 | surban | |
| v1.8 | | | | | | | | |
| HDF5DotNet.x64 | x64 | Windows | HL | rw | 1.8 | HDF5 | thieum | |
| HDF5DotNet.x86 | x86 | Windows | HL | rw | 1.8 | HDF5 | thieum | |
| sharpHDF | x64 | Windows | HL | rw | 1.8 | MIT | bengecko | |
| HDF.PInvoke | x86,x64 | Windows | bindings | rw | 1.8,1.10 | HDF5 | hdf,gheber | |
| hdf5-v120-complete | x86,x64 | Windows | native | rw | 1.8 | HDF5 | daniel.gracia | |
| hdf5-v120 | x86,x64 | Windows | native | rw | 1.8 | HDF5 | keen | |

Abbreviations:

| Term | .NET API | Native dependencies |
|---|---|---|
| managed | high-level | none |
| HL | high-level | C-library |
| bindings | low-level | C-library |
| native | none | C-library |