Borg.Audio 1.0.0

Borg.Audio .NET API library

Transcribe audio (convert speech to text) using the Borg API for as little as $0.06 per hour, with a free dev tier. Try in browser.

Get an API key, or use the string "null" for development.

Getting started

Prerequisites

To call the Borg REST API, you will need an API key. To obtain one, visit Borg API Keys, or use the free-tier key "null".

The free tier often returns HTTP 402 Payment Required errors, but it is still useful for testing.

Install the NuGet package

Add the client library to your .NET project by installing the NuGet package via your IDE or by running the following command in the .NET CLI:

dotnet add package Borg.Audio

If you would like to try the latest preview version, remember to append the --prerelease command option.

Note that the code examples included below were written using .NET 8. The Borg.Audio .NET library is compatible with all .NET Standard 2.0 applications, but the syntax used in some of the code examples in this document may depend on newer language features.

Using the client library

using Borg.Audio;

AudioClient client = new("whisper-1", Environment.GetEnvironmentVariable("BORG_API_KEY"));

string audioFilePath = Path.Combine("Assets", "audio_houseplant_care.mp3");

AudioTranscriptionOptions options = new()
{
    ResponseFormat = AudioTranscriptionFormat.Verbose,
    TimestampGranularities = AudioTimestampGranularities.Segment,
};

AudioTranscription transcription = client.TranscribeAudio(audioFilePath, options);

Console.WriteLine("Transcription:");
Console.WriteLine($"{transcription.Text}");

Console.WriteLine();
Console.WriteLine($"Segments:");
foreach (TranscribedSegment segment in transcription.Segments)
{
    Console.WriteLine($"  {segment.Text,90} : {segment.StartTime.TotalMilliseconds,5:0} - {segment.EndTime.TotalMilliseconds,5:0}");
}

While you can pass your API key directly as a string, it is highly recommended that you keep it in a secure location and instead access it via an environment variable or configuration file as shown above to avoid storing it in source control.
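
As a minimal sketch of this advice, the helper below reads the key from the BORG_API_KEY environment variable used in the example above and falls back to the free-tier key "null" when the variable is not set. The ApiKeyResolver class, its method name, and the fallback behavior are illustrative assumptions; they are not part of the Borg.Audio library.

```csharp
#nullable enable
using System;

// Hypothetical helper (not part of Borg.Audio): resolves the API key from the
// environment and falls back to the free-tier development key "null".
public static class ApiKeyResolver
{
    public const string DevelopmentKey = "null";

    public static string Resolve(string variableName = "BORG_API_KEY")
    {
        string? value = Environment.GetEnvironmentVariable(variableName);

        // Treat a missing or blank variable as "not configured".
        return string.IsNullOrWhiteSpace(value) ? DevelopmentKey : value.Trim();
    }
}
```

With this in place, the client from the example could be created as new("whisper-1", ApiKeyResolver.Resolve()), so the key itself never appears in source control.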

Namespace organization

The library is organized into namespaces by feature areas in the Borg REST API. Each namespace contains a corresponding client class.

Namespace       Client class        Notes
Borg.Audio      AudioClient
Borg.Batch      BatchClient         Experimental
Borg.Files      BorgFileClient
Borg.Models     BorgModelClient
Borg.Responses  BorgResponseClient

Using the async API

Every client method that performs a synchronous API call has an asynchronous variant in the same client class. For instance, the asynchronous variant of the AudioClient's TranscribeAudio method is TranscribeAudioAsync. To rewrite the call above using the asynchronous counterpart, simply await the call to the corresponding async variant:

var transcription = await client.TranscribeAudioAsync(audioFilePath, options);
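
One practical use of the async variants is transcribing several files concurrently. The sketch below is library-agnostic: ConcurrentTranscriptionSketch and its transcribeAsync delegate are hypothetical names, and the delegate stands in for whatever call you make to TranscribeAudioAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper (not part of Borg.Audio).
public static class ConcurrentTranscriptionSketch
{
    // transcribeAsync stands in for a wrapper around client.TranscribeAudioAsync
    // that returns the transcribed text for one file path.
    public static async Task<IReadOnlyDictionary<string, string>> TranscribeAllAsync(
        IEnumerable<string> audioFilePaths,
        Func<string, Task<string>> transcribeAsync)
    {
        string[] paths = audioFilePaths.ToArray();

        // Start every call before awaiting any of them so they run concurrently.
        Task<string>[] pending = paths.Select(transcribeAsync).ToArray();
        string[] texts = await Task.WhenAll(pending);

        // Task.WhenAll preserves input order, so texts[i] belongs to paths[i].
        var results = new Dictionary<string, string>();
        for (int i = 0; i < paths.Length; i++)
        {
            results[paths[i]] = texts[i];
        }
        return results;
    }
}
```

Concurrent requests are more likely to hit rate limits, so this pattern is best kept to small batches.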

Using the BorgClient class

In addition to the namespaces mentioned above, there is also the parent Borg namespace itself:

using Borg;

This namespace contains the BorgClient class, which offers certain conveniences when you need to work with multiple feature area clients. Specifically, you can use an instance of this class to create instances of the other clients and have them share the same implementation details, which might be more efficient.

You can create a BorgClient by specifying the API key that all clients will use for authentication:

BorgClient client = new(Environment.GetEnvironmentVariable("BORG_API_KEY"));

Next, to create an instance of an AudioClient, for example, you can call the BorgClient's GetAudioClient method by passing the Borg model that the AudioClient will use, just as if you were using the AudioClient constructor directly. If necessary, you can create additional clients of the same type to target different models.

AudioClient ttsClient = client.GetAudioClient("tts-1");
AudioClient whisperClient = client.GetAudioClient("whisper-1");

How to use chat completions with streaming

When you request a chat completion, the default behavior is for the server to generate it in its entirety before sending it back in a single response. Consequently, long chat completions can require waiting for several seconds before hearing back from the server. To mitigate this, the Borg REST API supports the ability to stream partial results back as they are being generated, allowing you to start processing the beginning of the completion before it is finished.

The client library offers a convenient approach to working with streaming chat completions. Rather than calling the ChatClient's CompleteChat method, call its CompleteChatStreaming method instead:

CollectionResult<StreamingChatCompletionUpdate> completionUpdates = client.CompleteChatStreaming("Say 'this is a test.'");

Notice that the returned value is a CollectionResult<StreamingChatCompletionUpdate> instance, which can be enumerated to process the streaming response chunks as they arrive:

Console.Write($"[ASSISTANT]: ");
foreach (StreamingChatCompletionUpdate completionUpdate in completionUpdates)
{
    if (completionUpdate.ContentUpdate.Count > 0)
    {
        Console.Write(completionUpdate.ContentUpdate[0].Text);
    }
}

Alternatively, you can do this asynchronously by calling the CompleteChatStreamingAsync method to get an AsyncCollectionResult<StreamingChatCompletionUpdate> and enumerate it using await foreach:

AsyncCollectionResult<StreamingChatCompletionUpdate> completionUpdates = client.CompleteChatStreamingAsync("Say 'this is a test.'");

Console.Write($"[ASSISTANT]: ");
await foreach (StreamingChatCompletionUpdate completionUpdate in completionUpdates)
{
    if (completionUpdate.ContentUpdate.Count > 0)
    {
        Console.Write(completionUpdate.ContentUpdate[0].Text);
    }
}

How to use chat completions with audio

Starting with the gpt-4o-audio-preview model, chat completions can process audio input and output.

This example demonstrates:

  1. Configuring the client with the supported gpt-4o-audio-preview model
  2. Supplying user audio input on a chat completion request
  3. Requesting model audio output from the chat completion operation
  4. Retrieving audio output from a ChatCompletion instance
  5. Using past audio output as ChatMessage conversation history

// Chat audio input and output is only supported on specific models, beginning with gpt-4o-audio-preview
ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("BORG_API_KEY"));

// Input audio is provided to a request by adding an audio content part to a user message
string audioFilePath = Path.Combine("Assets", "realtime_whats_the_weather_pcm16_24khz_mono.wav");
byte[] audioFileRawBytes = File.ReadAllBytes(audioFilePath);
BinaryData audioData = BinaryData.FromBytes(audioFileRawBytes);
List<ChatMessage> messages =
    [
        new UserChatMessage(ChatMessageContentPart.CreateInputAudioPart(audioData, ChatInputAudioFormat.Wav)),
    ];

// Output audio is requested by configuring ChatCompletionOptions to include the appropriate
// ResponseModalities values and corresponding AudioOptions.
ChatCompletionOptions options = new()
{
    ResponseModalities = ChatResponseModalities.Text | ChatResponseModalities.Audio,
    AudioOptions = new(ChatOutputAudioVoice.Alloy, ChatOutputAudioFormat.Mp3),
};

ChatCompletion completion = client.CompleteChat(messages, options);

void PrintAudioContent()
{
    if (completion.OutputAudio is ChatOutputAudio outputAudio)
    {
        Console.WriteLine($"Response audio transcript: {outputAudio.Transcript}");
        string outputFilePath = $"{outputAudio.Id}.mp3";
        using (FileStream outputFileStream = File.OpenWrite(outputFilePath))
        {
            outputFileStream.Write(outputAudio.AudioBytes);
        }
        Console.WriteLine($"Response audio written to file: {outputFilePath}");
        Console.WriteLine($"Valid on followup requests until: {outputAudio.ExpiresAt}");
    }
}

PrintAudioContent();

// To refer to past audio output, create an assistant message from the earlier ChatCompletion, use the earlier
// response content part, or use ChatMessageContentPart.CreateAudioPart(string) to manually instantiate a part.

messages.Add(new AssistantChatMessage(completion));
messages.Add("Can you say that like a pirate?");

completion = client.CompleteChat(messages, options);

PrintAudioContent();

Streaming works in a closely parallel way: StreamingChatCompletionUpdate instances can include an OutputAudioUpdate that may contain any of:

  • The Id of the streamed audio content, which can be referenced by subsequent AssistantChatMessage instances via ChatAudioReference once the streaming response is complete; this may appear across multiple StreamingChatCompletionUpdate instances but will always be the same value when present
  • The ExpiresAt value that describes when the Id will no longer be valid for use with ChatAudioReference in subsequent requests; this typically appears once and only once, in the final StreamingOutputAudioUpdate
  • Incremental TranscriptUpdate and/or AudioBytesUpdate values, which can be incrementally consumed and, when concatenated, form the complete audio transcript and audio output for the overall response; many of these typically appear
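
The concatenation described in the list above can be sketched as follows. AudioUpdateSketch is a hypothetical stand-in for the audio portion of a streaming update: its property names follow the list, but neither class is a library type, and a real handler would read the same values from each update as it arrives.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the audio portion of one streaming update.
public sealed class AudioUpdateSketch
{
    public string Id { get; set; }
    public DateTimeOffset? ExpiresAt { get; set; }
    public string TranscriptUpdate { get; set; }
    public byte[] AudioBytesUpdate { get; set; }
}

// Collects incremental values into the complete transcript and audio output.
public sealed class StreamedAudioAccumulator
{
    private readonly StringBuilder _transcript = new StringBuilder();
    private readonly MemoryStream _audio = new MemoryStream();

    public string Id { get; private set; }
    public DateTimeOffset? ExpiresAt { get; private set; }
    public string Transcript => _transcript.ToString();
    public byte[] AudioBytes => _audio.ToArray();

    public void Add(AudioUpdateSketch update)
    {
        // The Id may repeat across updates with the same value; keep the first one seen.
        Id ??= update.Id;

        // ExpiresAt typically arrives once, near the end of the stream.
        ExpiresAt ??= update.ExpiresAt;

        if (!string.IsNullOrEmpty(update.TranscriptUpdate))
        {
            _transcript.Append(update.TranscriptUpdate);
        }

        if (update.AudioBytesUpdate is { Length: > 0 } bytes)
        {
            _audio.Write(bytes, 0, bytes.Length);
        }
    }
}
```

Once the stream completes, Transcript and AudioBytes hold the full response, and Id can be used for follow-up requests until ExpiresAt.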

How to transcribe audio

In this example, an audio file is transcribed using the Whisper speech-to-text model, including both word- and audio-segment-level timestamp information.

using Borg.Audio;

AudioClient client = new("whisper-1", Environment.GetEnvironmentVariable("BORG_API_KEY"));

string audioFilePath = Path.Combine("Assets", "audio_houseplant_care.mp3");

AudioTranscriptionOptions options = new()
{
    ResponseFormat = AudioTranscriptionFormat.Verbose,
    TimestampGranularities = AudioTimestampGranularities.Word | AudioTimestampGranularities.Segment,
};

AudioTranscription transcription = client.TranscribeAudio(audioFilePath, options);

Console.WriteLine("Transcription:");
Console.WriteLine($"{transcription.Text}");

Console.WriteLine();
Console.WriteLine($"Words:");
foreach (TranscribedWord word in transcription.Words)
{
    Console.WriteLine($"  {word.Word,15} : {word.StartTime.TotalMilliseconds,5:0} - {word.EndTime.TotalMilliseconds,5:0}");
}

Console.WriteLine();
Console.WriteLine($"Segments:");
foreach (TranscribedSegment segment in transcription.Segments)
{
    Console.WriteLine($"  {segment.Text,90} : {segment.StartTime.TotalMilliseconds,5:0} - {segment.EndTime.TotalMilliseconds,5:0}");
}

Advanced scenarios

Using protocol methods

In addition to the client methods that use strongly-typed request and response objects, the .NET library also provides protocol methods that enable more direct access to the REST API. Protocol methods are "binary in, binary out": they accept BinaryContent as request bodies and provide BinaryData as response bodies.

For example, to use the protocol method variant of the AudioClient's CompleteChat method, pass the request body as BinaryContent:

AudioClient client = new("gpt-4o", Environment.GetEnvironmentVariable("BORG_API_KEY"));

BinaryData input = BinaryData.FromBytes("""
    {
       "model": "gpt-4o",
       "messages": [
           {
               "role": "user",
               "content": "Say 'this is a test.'"
           }
       ]
    }
    """u8.ToArray());

using BinaryContent content = BinaryContent.Create(input);
ClientResult result = client.CompleteChat(content);
BinaryData output = result.GetRawResponse().Content;

using JsonDocument outputAsJson = JsonDocument.Parse(output.ToString());
string message = outputAsJson.RootElement
    .GetProperty("choices"u8)[0]
    .GetProperty("message"u8)
    .GetProperty("content"u8)
    .GetString();

Console.WriteLine($"[ASSISTANT]: {message}");

Notice how you can then call the resulting ClientResult's GetRawResponse method and retrieve the response body as BinaryData via the PipelineResponse's Content property.
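
Because protocol methods return the raw body, the caller is responsible for handling responses that lack the expected fields. The sketch below reads the same choices[0].message.content path as the example above but returns null instead of throwing when any part of it is missing. RawResponseSketch is a hypothetical helper, not part of the library, and it takes the UTF-8 bytes directly so that it depends only on System.Text.Json.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of Borg.Audio).
public static class RawResponseSketch
{
    // Returns choices[0].message.content, or null if any part of that path is absent.
    public static string TryReadFirstMessage(ReadOnlyMemory<byte> utf8Json)
    {
        using JsonDocument document = JsonDocument.Parse(utf8Json);
        JsonElement root = document.RootElement;

        // TryGetProperty throws on non-object elements, so check ValueKind first.
        if (root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("choices", out JsonElement choices)
            && choices.ValueKind == JsonValueKind.Array
            && choices.GetArrayLength() > 0
            && choices[0].ValueKind == JsonValueKind.Object
            && choices[0].TryGetProperty("message", out JsonElement message)
            && message.ValueKind == JsonValueKind.Object
            && message.TryGetProperty("content", out JsonElement content)
            && content.ValueKind == JsonValueKind.String)
        {
            return content.GetString();
        }

        return null;
    }
}
```

With the example above, the call would be RawResponseSketch.TryReadFirstMessage(output.ToMemory()).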

Mock a client for testing

The Borg.Audio .NET library has been designed to support mocking, providing key features such as:

  • Client methods made virtual to allow overriding.
  • Model factories to assist in instantiating API output models that lack public constructors.

To illustrate how mocking works, suppose you want to validate the behavior of the following method using the Moq library. Given the path to an audio file, it determines whether it contains a specified secret word:

public bool ContainsSecretWord(AudioClient client, string audioFilePath, string secretWord)
{
    AudioTranscription transcription = client.TranscribeAudio(audioFilePath);
    return transcription.Text.Contains(secretWord);
}

Create mocks of AudioClient and ClientResult<AudioTranscription>, set up methods and properties that will be invoked, then test the behavior of the ContainsSecretWord method. Since the AudioTranscription class does not provide public constructors, it must be instantiated by the BorgAudioModelFactory static class:

// Instantiate mocks and the AudioTranscription object.

Mock<AudioClient> mockClient = new();
Mock<ClientResult<AudioTranscription>> mockResult = new(null, Mock.Of<PipelineResponse>());
AudioTranscription transcription = BorgAudioModelFactory.AudioTranscription(text: "I swear I saw an apple flying yesterday!");

// Set up mocks' properties and methods.

mockResult
    .SetupGet(result => result.Value)
    .Returns(transcription);

mockClient.Setup(client => client.TranscribeAudio(
        It.IsAny<string>(),
        It.IsAny<AudioTranscriptionOptions>()))
    .Returns(mockResult.Object);

// Perform validation.

AudioClient client = mockClient.Object;
bool containsSecretWord = ContainsSecretWord(client, "<audioFilePath>", "apple");

Assert.That(containsSecretWord, Is.True);

All namespaces have their corresponding model factory to support mocking with the exception of the Borg.Assistants and Borg.VectorStores namespaces, for which model factories are coming soon.

Automatically retrying errors

By default, the client classes will automatically retry the following errors up to three additional times using exponential backoff:

  • 408 Request Timeout
  • 429 Too Many Requests
  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout
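
The retry rules above can be sketched as a small decision helper. This only illustrates the documented behavior and is not the library's internal implementation: the RetrySketch class is hypothetical, and the one-second base delay and doubling schedule are assumptions, since the exact backoff timings are not stated here.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: mirrors the documented retry rules, not the library's internals.
public static class RetrySketch
{
    // Status codes listed above as retried by default.
    private static readonly HashSet<int> RetryableStatusCodes =
        new HashSet<int> { 408, 429, 500, 502, 503, 504 };

    // "Up to three additional times" after the initial attempt.
    public const int MaxRetries = 3;

    // retriesSoFar is 0 when only the initial attempt has been made.
    public static bool ShouldRetry(int statusCode, int retriesSoFar) =>
        retriesSoFar < MaxRetries && RetryableStatusCodes.Contains(statusCode);

    // Assumed schedule: the base delay doubles with each retry.
    public static TimeSpan GetDelay(int retriesSoFar, double baseDelaySeconds = 1.0) =>
        TimeSpan.FromSeconds(baseDelaySeconds * Math.Pow(2, retriesSoFar));
}
```

Note that 402 Payment Required, which the free tier often returns, is not in the list, so those responses surface to the caller immediately.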

Observability

The Borg.Audio .NET library supports experimental distributed tracing and metrics with OpenTelemetry. See Observability with OpenTelemetry for more details.

Product           Compatible and additional computed target framework versions
.NET              Compatible: net8.0. Computed: net5.0, net5.0-windows, net6.0, net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows.
.NET Core         Computed: netcoreapp2.0, netcoreapp2.1, netcoreapp2.2, netcoreapp3.0, netcoreapp3.1.
.NET Standard     Compatible: netstandard2.0. Computed: netstandard2.1.
.NET Framework    Computed: net461, net462, net463, net47, net471, net472, net48, net481.
MonoAndroid       Computed: monoandroid.
MonoMac           Computed: monomac.
MonoTouch         Computed: monotouch.
Tizen             Computed: tizen40, tizen60.
Xamarin.iOS       Computed: xamarinios.
Xamarin.Mac       Computed: xamarinmac.
Xamarin.TVOS      Computed: xamarintvos.
Xamarin.WatchOS   Computed: xamarinwatchos.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
1.0.0 148 4/1/2025