ClickHouse.BulkExtension 1.0.3

Overview

ClickHouse.BulkExtension is a high-performance .NET library designed for efficient bulk insertion of data into ClickHouse databases via streaming. It supports both synchronous and asynchronous data sources, including streaming from non-materialized collections, and provides advanced features like dynamic code compilation and optional data compression.

Features

  • Support for IEnumerable and IAsyncEnumerable: Stream data directly from synchronous or asynchronous enumerables without materializing the entire collection in memory.
  • Dynamic Code Compilation: Uses dynamic code generation to avoid boxing of value types, improving performance and reducing memory allocations.
  • Optional Compression: Supports optional data compression (e.g., GZip) to reduce network bandwidth usage.
  • High Performance: Designed for high-throughput scenarios, optimized to minimize CPU usage and memory allocations.
  • Minimal Memory Allocations: Efficiently handles large datasets with minimal impact on memory consumption.
  • Easy Integration: Seamlessly integrates with existing projects using ClickHouse.
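To illustrate what streaming from a non-materialized collection means in practice, here is a minimal sketch that does not use this library's API. The names GenerateRows, GenerateRowsAsync, WriteRows and WriteRowsAsync are hypothetical, and a MemoryStream stands in for the network stream. Each row is written as soon as the enumerator produces it, so no list of rows is ever built.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical synchronous source: rows are produced lazily, one at a time.
IEnumerable<(long Id, double Value)> GenerateRows(int count)
{
    for (var i = 0; i < count; i++)
        yield return (i, i * 0.5);
}

// Hypothetical asynchronous source, e.g. rows arriving from a queue.
async IAsyncEnumerable<(long Id, double Value)> GenerateRowsAsync(int count)
{
    for (var i = 0; i < count; i++)
    {
        await Task.Yield();
        yield return (i, i * 0.5);
    }
}

// Writes every row to the output as it is enumerated; nothing is buffered in a list.
long WriteRows(IEnumerable<(long Id, double Value)> rows, Stream output)
{
    using var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true);
    long written = 0;
    foreach (var (id, value) in rows)
    {
        writer.Write(id);
        writer.Write(value);
        written++;
    }
    return written;
}

// Same loop for an asynchronous source, using await foreach.
async Task<long> WriteRowsAsync(IAsyncEnumerable<(long Id, double Value)> rows, Stream output)
{
    using var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true);
    long written = 0;
    await foreach (var (id, value) in rows)
    {
        writer.Write(id);
        writer.Write(value);
        written++;
    }
    return written;
}

using var sink = new MemoryStream(); // stand-in for a network stream
var syncCount = WriteRows(GenerateRows(1_000), sink);
var asyncCount = await WriteRowsAsync(GenerateRowsAsync(1_000), sink);
Console.WriteLine($"{syncCount + asyncCount} rows, {sink.Length} bytes");
```

Because the source is consumed lazily, peak memory depends on the size of one row rather than on the size of the whole data set.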

Benchmark Results

The benchmark results below compare different methods of bulk insertion and show where ClickHouse.BulkExtension performs better.

Environment:

  • Operating System: Windows 11 (10.0.22631.4169/23H2/2023Update/SunValley3)
  • Processor: AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
  • .NET SDK: 8.0.108

BenchmarkDotNet v0.14.0

Disclaimer: The benchmark results are based on specific hardware and software configurations. Your results may vary depending on your environment.

Method                                                  Count  Mean (ms)  Allocated (KB)
Traditional Bulk Insert                                10,000      31.05        7,211.42
BulkExtension (Complex Struct)                         10,000      60.53           10.85
BulkExtension (Complex Struct) without compression     10,000      52.29           10.57
Traditional Bulk Insert                               100,000     195.88       69,989.35
BulkExtension (Complex Struct)                        100,000     199.56           11.25
BulkExtension (Complex Struct) without compression    100,000     123.70           10.63
Traditional Bulk Insert                               300,000     582.16      210,602.02
BulkExtension (Complex Struct)                        300,000     518.25           12.77
BulkExtension (Complex Struct) without compression    300,000     285.06           11.55
Traditional Bulk Insert                             1,000,000   2,007.25      696,371.72
BulkExtension (Complex Struct)                      1,000,000   1,599.89           12.23
BulkExtension (Complex Struct) without compression  1,000,000     838.29           12.86

  • Traditional Bulk Insert: Bulk insertion of complex structures through ClickHouse.Client.Copy.
  • Traditional Bulk Insert without compression: Not measured, because ClickHouse.Client.Copy does not offer a way to switch compression off.
  • BulkExtension (Complex Struct): Bulk insertion of complex structures through ClickHouse.Client.BulkExtension.
  • BulkExtension (Complex Struct) without compression: The same as above, with compression disabled.

Interpretation of Results

  • Memory Allocations: The methods using ClickHouse.Client.BulkExtension allocate more than 99.8% less memory than the traditional method at every tested size. This matters most for large datasets.
  • Performance: With compression, BulkExtension is about twice as slow as the traditional method at 10,000 records (60.53 ms against 31.05 ms), on par at 100,000 records, and faster from 300,000 records up. At 1,000,000 records it takes 1,599.89 ms against 2,007.25 ms. Without compression it is faster at every size from 100,000 records up.
  • Scalability: Allocations stay between roughly 10.5 KB and 13 KB from 10,000 to 1,000,000 records, while the traditional method grows from about 7.2 MB to about 696 MB. Execution time grows no faster than the record count.
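
The figures in the table can be checked with a few lines of arithmetic. The sketch below copies the "Traditional Bulk Insert" and "BulkExtension (Complex Struct)" rows and derives the allocation reduction and the time ratio for each record count; the tuple field names are chosen here for illustration only.

```csharp
using System;

// (rows, traditional ms, traditional KB, BulkExtension ms, BulkExtension KB), copied from the table
var results = new (int Rows, double OldMs, double OldKb, double NewMs, double NewKb)[]
{
    (10_000, 31.05, 7_211.42, 60.53, 10.85),
    (100_000, 195.88, 69_989.35, 199.56, 11.25),
    (300_000, 582.16, 210_602.02, 518.25, 12.77),
    (1_000_000, 2_007.25, 696_371.72, 1_599.89, 12.23),
};

foreach (var r in results)
{
    var allocationReduction = (1 - r.NewKb / r.OldKb) * 100; // percent fewer KB allocated
    var timeRatio = r.NewMs / r.OldMs;                       // below 1 means BulkExtension is faster
    Console.WriteLine($"{r.Rows,9}: {allocationReduction:F2} % fewer allocations, time ratio {timeRatio:F2}");
}
```

A time ratio above 1 at the smallest batch and below 1 at the largest reflects the fixed cost that only pays off as batches grow.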

Resource Consumption

The library is designed to minimize resource consumption by efficiently streaming data to ClickHouse, reducing the impact on system resources, and improving overall performance.

The profile below covers a one-minute run that inserts batches of one million records, about 100 million records in total:

[Memory Profile 1]
Figure 1: Memory consumption over the operation period.

[Memory Profile 2]
Figure 2: Detailed memory allocation analysis.

For profiling, a structure with three fields and a total size of 24 bytes was used. With 100 million records inserted, the payload traffic amounts to approximately 2.4 gigabytes. The screenshots show that the library's memory consumption over the entire run was 200.6 kilobytes, which is about 0.008% of the total payload traffic.
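
The percentage above follows directly from the numbers in the text. A short check, assuming decimal units (1 KB = 1,000 bytes, 1 GB = 1,000,000,000 bytes):

```csharp
using System;

const long records = 100_000_000;      // rows inserted during the run (from the text)
const int bytesPerRecord = 24;         // size of the profiled structure (from the text)
const double allocatedBytes = 200.6e3; // 200.6 KB from the profiler; decimal units are an assumption

long payloadBytes = records * bytesPerRecord;          // 2,400,000,000 bytes
double share = allocatedBytes / payloadBytes * 100;    // allocated memory as a percentage of payload

Console.WriteLine($"payload: {payloadBytes / 1e9:F1} GB, share: {share:F4} %");
```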

The test console utility and the profiling file are located in the ClickHouse.BulkExtension.Profiling folder (ClickHouse.BulkExtension.Profiling.dmw for dotMemory).

Getting Started

Installation

You can install the library via NuGet:

  • To install the full package with the client:
    dotnet add package ClickHouse.Client.BulkExtension
  • If you only need the bulk copy functionality:
    dotnet add package ClickHouse.BulkExtension

Usage

The examples folder shows how to use ClickHouse.Client.BulkExtension for both synchronous and asynchronous data insertion.

Dynamic Code Compilation

ClickHouse.BulkExtension uses dynamic code compilation to generate optimized serialization code at runtime. Because the generated code reads each field with its concrete type, value types are never boxed, which reduces memory allocations and improves performance.
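
The general technique can be sketched with expression trees from System.Linq.Expressions. This is not the library's implementation: CompileWriter is a hypothetical helper, and BinaryWriter stands in for whatever writer the library uses internally. A reflection-based serializer would call FieldInfo.GetValue, which returns object and boxes every value-type field; the compiled delegate below reads each field with its static type instead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq.Expressions;
using System.Reflection;

// Builds, once per row type, a delegate equivalent to hand-written code:
//   (writer, row) => { writer.Write(row.Field1); writer.Write(row.Field2); ... }
Action<BinaryWriter, T> CompileWriter<T>() where T : struct
{
    var writerParam = Expression.Parameter(typeof(BinaryWriter), "writer");
    var rowParam = Expression.Parameter(typeof(T), "row");
    var calls = new List<Expression>();

    // Fields are visited in the order reflection returns them.
    foreach (var field in typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance))
    {
        // Pick the BinaryWriter.Write overload whose parameter matches the field type exactly.
        var writeMethod = typeof(BinaryWriter).GetMethod("Write", new[] { field.FieldType });
        if (writeMethod == null || writeMethod.GetParameters()[0].ParameterType != field.FieldType)
            throw new NotSupportedException($"No BinaryWriter.Write overload for {field.FieldType}");

        // row.Field is read as its own type, so no object (and no box) is created.
        calls.Add(Expression.Call(writerParam, writeMethod, Expression.Field(rowParam, field)));
    }

    return Expression
        .Lambda<Action<BinaryWriter, T>>(Expression.Block(calls), writerParam, rowParam)
        .Compile();
}

// Compile once, then reuse the delegate for every row.
var writeRow = CompileWriter<(long Id, double Value)>();

using var buffer = new MemoryStream();
using var binary = new BinaryWriter(buffer);
writeRow(binary, (42L, 3.5));
binary.Flush();
Console.WriteLine($"{buffer.Length} bytes written");
```

Compilation has a one-time cost per row type, so the delegate should be cached and reused across rows and batches.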

How It Works

  • Streaming Data: The library streams data directly to ClickHouse using HTTP streaming, supporting both synchronous (IEnumerable) and asynchronous (IAsyncEnumerable) data sources.
  • Optimized Serialization: By dynamically generating serialization code, the library efficiently converts your data types into the binary format expected by ClickHouse.
  • Compression: Optional compression reduces the amount of data transmitted over the network, which can improve performance when network bandwidth is a limiting factor.
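
The trade-off behind optional compression can be seen with the standard GZipStream class. The sketch below is an illustration under stated assumptions, not the library's code: WriteRows is a hypothetical helper, the row layout (Int64 plus Float64) is invented for the example, and MemoryStream stands in for the HTTP request body. It writes the same rows once raw and once through GZip and compares the sizes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: writes `count` rows of (Int64, Float64) into `target`.
void WriteRows(Stream target, int count)
{
    using var writer = new BinaryWriter(target, Encoding.UTF8, leaveOpen: true);
    for (var i = 0; i < count; i++)
    {
        writer.Write((long)i);
        writer.Write(i % 10 * 1.5);
    }
}

// Uncompressed: 16 bytes per row go to the output as they are.
using var raw = new MemoryStream();
WriteRows(raw, 100_000);

// Compressed: the same writer code, with a GZipStream placed in front of the output.
using var compressed = new MemoryStream();
using (var gzip = new GZipStream(compressed, CompressionLevel.Fastest, leaveOpen: true))
{
    WriteRows(gzip, 100_000);
} // disposing the GZipStream flushes the final compressed block

Console.WriteLine($"raw: {raw.Length} bytes, gzip: {compressed.Length} bytes");
```

Compression trades CPU time for bandwidth, which is consistent with the benchmark table: on a fast local connection the uncompressed runs finish sooner, while a bandwidth-limited link may favor compression.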

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! If you find a bug or have a feature request, please open an issue on the GitHub repository. Pull requests are also appreciated.

Support

For questions or assistance, feel free to open an issue or contact us at cuzitworks92@gmail.com.


Feel free to explore the library and integrate it into your projects for efficient and high-performance data insertion into ClickHouse. We look forward to your feedback and contributions!

Target frameworks

  • Compatible (included in the package): net6.0, net7.0, net8.0
  • Computed as compatible: the android, ios, maccatalyst, macos, tvos and windows variants of net6.0, net7.0 and net8.0, plus net8.0-browser

NuGet packages (1)

Packages that depend on ClickHouse.BulkExtension:

  • ClickHouse.Client.BulkExtension

Version  Downloads  Last updated
1.0.3          104  10/1/2024
1.0.2          104  9/30/2024
1.0.1           98  9/28/2024
1.0.0           89  9/28/2024