HtmlKit 1.2.0
Requires NuGet 2.12 or higher.
dotnet add package HtmlKit --version 1.2.0
NuGet\Install-Package HtmlKit -Version 1.2.0
<PackageReference Include="HtmlKit" Version="1.2.0" />
paket add HtmlKit --version 1.2.0
#r "nuget: HtmlKit, 1.2.0"
// Install HtmlKit as a Cake Addin
#addin nuget:?package=HtmlKit&version=1.2.0

// Install HtmlKit as a Cake Tool
#tool nuget:?package=HtmlKit&version=1.2.0
HtmlKit
What is HtmlKit?
HtmlKit is a cross-platform .NET framework for parsing HTML.
HtmlKit implements the HTML5 tokenizing state machine described in W3C's HTML5 Tokenization Specification.
Goals
I haven't fully figured that out yet.
So far the goal is tokenizing HTML with the intention of using it for MimeKit's HtmlToHtml text converter, replacing the quick & dirty HTML tokenizer I originally wrote.
Maybe someday I'll implement a DOM. Who knows.
License Information
HtmlKit is Copyright (C) 2015-2024 Jeffrey Stedfast and is licensed under the MIT license:
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Installing via NuGet
The easiest way to install HtmlKit is via NuGet.
In Visual Studio's Package Manager Console, simply enter the following command:
Install-Package HtmlKit
Getting the Source Code
First, you'll need to clone HtmlKit from my GitHub repository. To do this using the command-line version of Git, you'll need to issue the following command in your terminal:
git clone https://github.com/jstedfast/HtmlKit.git
If you are using TortoiseGit on Windows, you'll need to right-click in the directory where you'd like to clone HtmlKit and select Git Clone... in the menu. Once you do that, you'll get a dialog asking you to specify the repository you'd like to clone. In the textbox labeled URL:, enter https://github.com/jstedfast/HtmlKit.git and then click OK. This will clone HtmlKit onto your local machine.
Updating the Source Code
Occasionally you might want to update your local copy of the source code if I have made changes to HtmlKit since you downloaded it in the step above. To do this using the command-line version of Git, you'll need to issue the following command in your terminal from within the HtmlKit directory:
git pull
If you are using TortoiseGit on Windows, you'll need to right-click on the HtmlKit directory and select Git Sync... in the menu. Once you do that, you'll need to click the Pull button.
Building
Once you've opened the HtmlKit.sln solution file in Visual Studio, you can choose the Debug or Release build configuration and then build.
Both Visual Studio 2022 and Visual Studio 2019 should be able to build HtmlKit without any issues, but older versions such as Visual Studio 2015 and 2017 will likely require modifications to the projects in order to build correctly.
Note: The Release build will generate the XML API documentation, but the Debug build will not.
Using HtmlKit
Parsing HTML
The primary purpose of HtmlKit is parsing HTML.
using (var reader = new StreamReader (stream)) {
    var tokenizer = new HtmlTokenizer (reader);
    HtmlToken token;

    // ReadNextToken() returns `false` when the end of the stream is reached.
    while (tokenizer.ReadNextToken (out token)) {
        switch (token.Kind) {
        case HtmlTokenKind.ScriptData:
        case HtmlTokenKind.CData:
        case HtmlTokenKind.Data:
            // ScriptData, CData, and Data tokens contain text data.
            var text = (HtmlDataToken) token;

            Console.WriteLine ("{0}: {1}", token.Kind, text.Data);
            break;
        case HtmlTokenKind.Tag:
            // Tag tokens represent tags and their attributes.
            var tag = (HtmlTagToken) token;

            Console.Write ("<{0}{1}", tag.IsEndTag ? "/" : "", tag.Name);

            foreach (var attribute in tag.Attributes) {
                // Quote() is a helper that is not shown here; it is expected
                // to return the quoted attribute value.
                if (attribute.Value != null)
                    Console.Write (" {0}={1}", attribute.Name, Quote (attribute.Value));
                else
                    Console.Write (" {0}", attribute.Name);
            }

            Console.WriteLine (tag.IsEmptyElement ? "/>" : ">");
            break;
        case HtmlTokenKind.Comment:
            // Comment tokens carry the text of an HTML comment.
            var comment = (HtmlCommentToken) token;

            Console.WriteLine ("Comment: {0}", comment.Comment);
            break;
        case HtmlTokenKind.DocType:
            // DocType tokens carry the name and the optional public/system identifiers.
            var doctype = (HtmlDocTypeToken) token;

            if (doctype.ForceQuirksMode)
                Console.Write ("");

            Console.Write ("<!DOCTYPE");

            if (doctype.Name != null)
                Console.Write (" {0}", doctype.Name.ToUpperInvariant ());

            if (doctype.PublicIdentifier != null) {
                Console.Write (" PUBLIC \"{0}\"", doctype.PublicIdentifier);
                if (doctype.SystemIdentifier != null)
                    Console.Write (" \"{0}\"", doctype.SystemIdentifier);
            } else if (doctype.SystemIdentifier != null) {
                Console.Write (" SYSTEM \"{0}\"", doctype.SystemIdentifier);
            }

            Console.WriteLine (">");
            break;
        }
    }
}
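The example above calls a `Quote` method that the snippet does not define. A minimal sketch of such a helper follows, assuming it only needs to wrap the value in double quotes and escape the two characters that would otherwise break a double-quoted attribute value. The class name `HtmlExample` and the exact escaping rules are assumptions made for illustration, not part of HtmlKit's API.

```csharp
using System;
using System.Text;

static class HtmlExample
{
    // Hypothetical helper (not part of HtmlKit): wraps an attribute value in
    // double quotes, escaping '&' and '"' so the printed value stays well-formed.
    public static string Quote (string value)
    {
        if (value == null)
            throw new ArgumentNullException (nameof (value));

        var quoted = new StringBuilder (value.Length + 2);

        quoted.Append ('"');
        foreach (var c in value) {
            switch (c) {
            case '&': quoted.Append ("&amp;"); break;
            case '"': quoted.Append ("&quot;"); break;
            default: quoted.Append (c); break;
            }
        }
        quoted.Append ('"');

        return quoted.ToString ();
    }
}
```

To call it unqualified as `Quote (attribute.Value)`, either place the method in the same class as the parsing loop or add `using static HtmlExample;` at the top of the file.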
Contributing
The first thing you'll need to do is fork HtmlKit to your own GitHub repository. For instructions on how to do that, see the section titled Getting the Source Code.
If you use Visual Studio for Mac or MonoDevelop, all of the solution files are configured with the coding style used by HtmlKit. If you use Visual Studio on Windows or some other editor, please try to maintain the existing coding style as best as you can.
Once you've got some changes that you'd like to submit upstream to the official HtmlKit repository, send me a Pull Request and I will try to review your changes in a timely manner.
If you'd like to contribute but don't have any particular features in mind to work on, check out the issue tracker and look for something that might pique your interest!
Reporting Bugs
Have a bug or a feature request? Please open a new bug report or feature request.
Before opening a new issue, please search through any existing issues to avoid submitting duplicates.
If you are getting an exception from somewhere within HtmlKit, don't just provide the Exception.Message string. Please include the Exception.StackTrace as well. The Message, by itself, is often useless.
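One way to capture both pieces of information at once is to record `Exception.ToString()`, which includes the exception type, the message, and the stack trace. The sketch below is only an illustration: `ParseHtml` is a hypothetical stand-in for whatever HtmlKit call is failing in your code, and `BugReportExample` is an invented class name.

```csharp
using System;

static class BugReportExample
{
    // Hypothetical stand-in for the HtmlKit call that is failing in your code.
    static void ParseHtml ()
    {
        throw new InvalidOperationException ("example failure");
    }

    // Returns the text worth pasting into a bug report,
    // or an empty string if nothing was thrown.
    public static string Capture ()
    {
        try {
            ParseHtml ();
            return string.Empty;
        } catch (Exception ex) {
            // ToString() includes the exception type, the Message, and the StackTrace.
            return ex.ToString ();
        }
    }

    public static void Main ()
    {
        Console.WriteLine (Capture ());
    }
}
```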
Documentation
API documentation can be found in the source code in the form of XML doc comments.
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 is compatible. |
.NET Framework | net461 was computed. net462 is compatible. net463 was computed. net47 is compatible. net471 was computed. net472 was computed. net48 is compatible. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETFramework 4.6.2
  - System.Buffers (>= 4.6.0)
  - System.Memory (>= 4.6.0)
- .NETFramework 4.7
  - System.Buffers (>= 4.6.0)
  - System.Memory (>= 4.6.0)
- .NETFramework 4.8
  - System.Buffers (>= 4.6.0)
  - System.Memory (>= 4.6.0)
- .NETStandard 2.0
  - System.Buffers (>= 4.6.0)
  - System.Memory (>= 4.6.0)
- .NETStandard 2.1
  - System.Buffers (>= 4.6.0)
  - System.Memory (>= 4.6.0)
- net6.0
  - No dependencies.
- net8.0
  - No dependencies.
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories (1)
Showing the top 1 popular GitHub repositories that depend on HtmlKit:
Repository | Stars |
---|---|
bkaankose/Wino-Mail: Built-in Mail & Calendars app clone for Windows. | |
* Lazy-load the Id properties for HtmlTagToken and HtmlAttribute for performance improvements.
* Removed the need for reflection in HtmlTagId and HtmlAttributeId logic for converting between these enums and their string equivalents.
* Optimized HtmlTokenizer using a char[] buffer instead of reading 1 char at a time.
* Added HtmlTokenizer .ctors that take a Stream instead of a TextReader.
* Bumped System.Buffers dependency to 4.6.0
* Bumped System.Memory dependency to 4.6.0
* Dropped support for net7.0.
* Added support for net8.0.