Orobouros 1.0.10
A newer version of this package is available. See the version list below for details.
dotnet add package Orobouros --version 1.0.10
NuGet\Install-Package Orobouros -Version 1.0.10
<PackageReference Include="Orobouros" Version="1.0.10" />
paket add Orobouros --version 1.0.10
#r "nuget: Orobouros, 1.0.10"
// Install Orobouros as a Cake Addin
#addin nuget:?package=Orobouros&version=1.0.10
// Install Orobouros as a Cake Tool
#tool nuget:?package=Orobouros&version=1.0.10
The Orobouros Framework
Orobouros is a C# framework for scraping the web. Many web scrapers have been written in various languages, but Orobouros takes a different approach: the patented OrobourosModule™ system allows anyone to write their own plugin for any website.
Installation
Orobouros is available as a NuGet package and from the GitHub Actions page. Keep in mind that the pre-compiled builds on GitHub do not include dependencies. If you prefer the .NET CLI, you can also simply run:
dotnet add package Orobouros
On its own, Orobouros does nothing; it needs modules to function. A list of publicly available modules can be found on the GitHub repository.
Building
If you insist on compiling this yourself, all you need is the .NET 8 SDK. I would not recommend running the tests, as they require specific configurations I use in debugging.
Development
Take a look at the TestModule project included in this repo to get a general idea of how to use this framework. XML annotations are also provided. At some point I will create a wiki with relevant information, but for now the core functionality takes priority.
Obfuscated code is allowed (as in, the framework won't refuse to execute it) but strongly discouraged due to malware concerns. If you really feel the need to keep your source code hidden, just don't share your module.
Example code to submit a scrape request to the loaded module stack:
ScrapingManager.InitializeModules(); // Only call this once at the entry point of your application
List<ModuleContent> requestedInfo = new List<ModuleContent> { ModuleContent.Text }; // Content you want to request from the modules. How this is handled is entirely dependent on the module's developer.
ModuleData? data = ScrapingManager.ScrapeURL("https://www.test.com/posts/posthere", requestedInfo); // Perform scrape request and wait for the returned data.
ScrapingManager.FlushSupplementaryMethods(); // Stop background methods. This should be called at least once when the application is exiting.
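Since ScrapeURL is declared as returning ModuleData?, and FlushSupplementaryMethods should run when the application exits, one way to arrange the calls above is a try/finally with a null check. The following is only a sketch: the types at the bottom are hypothetical stand-ins that mimic the names used in this README so the snippet compiles on its own, and ScrapeReport is a helper invented for illustration. In a real application those stand-ins would be deleted and the types would come from the Orobouros package.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Initialize once at the entry point, scrape, and always flush on the way out.
ScrapingManager.InitializeModules();
try
{
    var requestedInfo = new List<ModuleContent> { ModuleContent.Text };
    ModuleData? data = ScrapingManager.ScrapeURL("https://www.test.com/posts/posthere", requestedInfo);
    Console.WriteLine(ScrapeReport.Describe(data));
}
finally
{
    // Runs even if the scrape throws, so background methods are stopped.
    ScrapingManager.FlushSupplementaryMethods();
}

// Hypothetical helper (not part of Orobouros): summarizes a possibly-null result.
public static class ScrapeReport
{
    public static string Describe(ModuleData? data) =>
        data is null ? "No data returned" : $"Received {data.Content.Count} item(s)";
}

// ---- Stand-ins (assumptions): stubs that only mirror the names shown in this README.
public enum ModuleContent { Text }

public sealed class ModuleData
{
    public List<object> Content { get; } = new();
}

public static class ScrapingManager
{
    public static void InitializeModules() { }
    public static ModuleData? ScrapeURL(string url, List<ModuleContent> requestedInfo) => null;
    public static void FlushSupplementaryMethods() { }
}
```

The stub ScrapeURL always returns null, so this sketch exercises the "no data" branch; with real modules loaded, the result depends entirely on what the module's developer returns.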
Example code to return a simple line of text from a module's scrape method:
ModuleData data = new ModuleData();
ProcessedScrapeData exampleInstance = new ProcessedScrapeData(ModuleContent.Text, parameters.URL, "Hello World!");
data.Content.Add(exampleInstance);
return data;
Please consult the XML documentation or the TestModule project for further code examples.
Copyright
This repository takes no responsibility for any modules that programmers develop for this framework. No copyrighted content is included in this repo, and none ever will be. If someone has made a module for your website and you don't like it, I cannot help you. You must get in contact with them to resolve such matters. This also applies to potentially illicit/illegal content scraped with modules created by the community.
TODO:
- Dynamic module loading
- Raw HTTP support
- Downloader service
- Attribute scanning
- Custom attributes
- Module init method
- Module supplementary methods
- Module scrape method
- Module options
- Module return data
- Module GUIDs
- Custom library support
- Referenced library support
- SQLite support
- Dynamic database support
- Website API support (separate from raw HTTP)
- Cross-module support
- XML annotations
- Module security checks
- Module sanity checks
- Multiple modules for same website support
- Improved module error handling
- SQLite module integration
- Public module downloader tool
- Better data sanitizing & JSON storage
- General framework configuration class
- Cross-language support (extremely advanced)
- Data language translation toolkit
- Overhaul download class & integrate better
- Logging overhaul
- Module developer web toolkit
- Framework-level exception handling
- Bulk data downloading functions (stored in RAM)
Credits
- Branden Stober - Main Project Lead
- ImSoupp - Reflection Help & Database Help
- CTAG - Database Help
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
- net8.0
  - Downloader (>= 3.0.6)
  - HtmlAgilityPack (>= 1.11.59)
  - Microsoft.ClearScript.V8 (>= 7.4.4)
  - Microsoft.EntityFrameworkCore.Sqlite (>= 8.0.2)
  - Microsoft.Net.Http.Headers (>= 8.0.2)
  - Newtonsoft.Json (>= 13.0.3)
  - Serilog (>= 3.1.1)
  - Serilog.Sinks.Console (>= 5.0.1)
  - Serilog.Sinks.File (>= 5.0.0)
  - SQLite (>= 3.13.0)
  - System.Drawing.Common (>= 8.0.2)
  - System.Net.Http (>= 4.3.4)
  - System.Resources.Extensions (>= 8.0.0)
Version | Downloads | Last updated |
---|---|---|
1.1.3 | 157 | 4/3/2024 |
1.1.2 | 127 | 3/25/2024 |
1.1.1 | 117 | 3/24/2024 |
1.1.0 | 126 | 3/11/2024 |
1.0.12 | 154 | 3/10/2024 |
1.0.11 | 128 | 3/10/2024 |
1.0.10 | 154 | 3/8/2024 |
1.0.9 | 129 | 3/6/2024 |
1.0.8 | 130 | 3/5/2024 |
1.0.7 | 127 | 3/5/2024 |
1.0.6 | 124 | 3/4/2024 |
1.0.5 | 122 | 3/4/2024 |
1.0.4 | 118 | 3/4/2024 |
1.0.3 | 123 | 3/4/2024 |
1.0.2 | 119 | 3/3/2024 |
1.0.1 | 124 | 3/3/2024 |
1.0.0 | 132 | 3/3/2024 |