ExcavatorSharp.WebScraper.x64
1.2.1
A newer version of this package is available. See the version list below for details.
dotnet add package ExcavatorSharp.WebScraper.x64 --version 1.2.1
NuGet\Install-Package ExcavatorSharp.WebScraper.x64 -Version 1.2.1
<PackageReference Include="ExcavatorSharp.WebScraper.x64" Version="1.2.1" />
paket add ExcavatorSharp.WebScraper.x64 --version 1.2.1
#r "nuget: ExcavatorSharp.WebScraper.x64, 1.2.1"
// Install ExcavatorSharp.WebScraper.x64 as a Cake Addin
#addin nuget:?package=ExcavatorSharp.WebScraper.x64&version=1.2.1
// Install ExcavatorSharp.WebScraper.x64 as a Cake Tool
#tool nuget:?package=ExcavatorSharp.WebScraper.x64&version=1.2.1
ExcavatorSharp is a multi-threaded server for scraping web data. It converts HTML into structured data. The library can scrape multiple sites in parallel within a single running application. You can create scraping tasks and run data extraction on a schedule.
The library is designed for professional extraction and parsing of large volumes of data. Under the hood it supports CSS selectors and XPath, data export to .csv/.xlsx/.sql/.json, online data export, proxy servers, dynamic content crawling, interaction with the site via JavaScript, and much more. The library uses .NET Sockets and the Chromium Embedded Framework (CEF).
The library can be used separately as a crawler or as a parser. It supports sitemap.xml and robots.txt, as well as gzip and deflate compression.
Attention! Only x64 builds are supported, on the .NET 4.5.2 and 4.6 platforms. AnyCPU builds are not supported: you will NOT be able to run the library when building for AnyCPU. This restriction comes from CEF.
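Because the package runs only in x64 builds, one way to avoid an accidental AnyCPU build is to pin the platform target in the project file. The fragment below is a minimal sketch for an MSBuild .csproj. Placing it in an unconditional PropertyGroup is an assumption about your project layout, not something the package documentation prescribes.

```xml
<!-- Sketch only: pins the build to x64, matching the x64-only requirement stated above. -->
<PropertyGroup>
  <!-- PlatformTarget is a standard MSBuild property; x64 replaces the AnyCPU default. -->
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
```

If the project already sets PlatformTarget inside per-configuration property groups, change the value there instead, since a later conditional group would override this one.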
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET Framework | net452 is compatible. net46 is compatible. net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
- cef.redist.x64 (>= 79.1.36)
- cef.redist.x86 (>= 79.1.36)
- CefSharp.Common (>= 75.1.360)
- CefSharp.OffScreen (>= 75.1.360)
- EPPlus (<= 4.5.3.3)
- HtmlAgilityPack (>= 1.11.23)
- HtmlAgilityPack.CssSelectors (>= 1.0.2)
- Newtonsoft.Json (>= 12.0.3)
- RestSharp (>= 106.10.1)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
1.2.8 | 790 | 8/10/2020 |
1.2.7 | 459 | 8/10/2020 |
1.2.3 | 521 | 5/20/2020 |
1.2.2 | 487 | 5/10/2020 |
1.2.1 | 488 | 5/5/2020 |
1.2.0 | 515 | 4/30/2020 |
1.1.0 | 489 | 4/23/2020 |
1.0.53 | 463 | 4/12/2020 |
1.0.52 | 488 | 4/11/2020 |
1.0.51 | 496 | 4/11/2020 |
1.0.6 | 484 | 4/23/2020 |
1.0.5 | 482 | 4/11/2020 |
1.0.4 | 530 | 4/3/2020 |
1.0.3 | 500 | 2/12/2020 |
1.0.2 | 529 | 1/30/2020 |
1.0.1 | 506 | 1/30/2020 |
1.0.0 | 445 | 1/23/2020 |
1) Fixed known errors and improved the overall stability of the application.
2) Fixed a problem with SSL / TLS protocols that are not available in some versions of the OS. The problem occurred when trying to set the ServicePointManager.SecurityProtocol property to certain values on certain operating systems.
3) Added basic site login functionality as a separate behavior for CEF. Now, if you want to extract data from a site that requires authentication by login and password, you can do so through CEFWebsiteAuthBehavior instead of CEFBehaviors (which requires some skill). It provides a simple set of fields, including a template script. In general, this greatly simplifies working with sites that require authentication.
4) Fixed the Excel export algorithm - the EPPlus dependency was downgraded to a stable build without additional license fees (EPPlus is no longer free starting from version 5).
5) Fixed Excel and CSV export in complex cases. Now, if one of the export results fails to be written, the overall export process does NOT stop.
6) Added a callback to the export mechanism that is called after each exported record. This lets you track the export progress instead of waiting a long time for the program to finish.
7) The behavior of the program when it is not run as an administrator on server versions of the operating system remains an open question. For now, if you run into problems, we recommend using the "Run as administrator" mechanism.
Bug fixes and improvements to the previous release. Improved thread safety in some code blocks. Improved logging.