Architecture

Kreuzcrawl is a Rust core crate (kreuzcrawl) with a small public surface, surrounded by polyglot bindings that all wrap the same core. The runtime is Tokio; HTTP fetching uses reqwest; browser-backed rendering and interaction can use either the chromiumoxide CDP backend or the in-process native backend.

Public surface

The crate root exports seven free functions over an opaque handle, two feature-gated server entry points, and serialisable configuration and result types:

| Symbol | Purpose |
| --- | --- |
| `create_engine(config: Option<CrawlConfig>) -> Result<CrawlEngineHandle, CrawlError>` | Build an engine from a validated `CrawlConfig`. |
| `scrape(&engine, url) -> Result<ScrapeResult, _>` | Fetch and extract a single page. |
| `crawl(&engine, url) -> Result<CrawlResult, _>` | Follow links from a seed up to `max_depth` / `max_pages`. |
| `map_urls(&engine, url) -> Result<MapResult, _>` | Discover URLs via sitemaps and link extraction. |
| `interact(&engine, url, actions) -> Result<InteractionResult, _>` | Navigate once and run ordered page actions. |
| `batch_scrape(&engine, urls) -> Result<Vec<BatchScrapeResult>, _>` | Scrape many URLs concurrently. |
| `batch_crawl(&engine, urls) -> Result<Vec<BatchCrawlResult>, _>` | Crawl many seeds concurrently. |
| `serve_api(...)` (feature `api`) / `start_mcp_server(...)` (feature `mcp`) | Long-running REST and MCP servers backed by the same engine. |

All other items in the source tree are internal — the public crate surface is intentionally narrow.
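
The "free functions over an opaque handle" shape can be illustrated with a minimal, self-contained sketch. It does not call the real crate: the types below are stand-ins that only borrow the names from the table, the fields other than `max_depth` and `max_pages` are invented, the validation rule is an assumption, and the functions are synchronous for brevity even though the real ones may not be.

```rust
// Stand-in types: these borrow names from the table above but are NOT the
// real kreuzcrawl definitions. Fields, defaults, and validation are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct CrawlConfig {
    pub max_depth: usize,
    pub max_pages: usize,
}

impl Default for CrawlConfig {
    fn default() -> Self {
        // Hypothetical defaults chosen for this example only.
        CrawlConfig { max_depth: 2, max_pages: 100 }
    }
}

#[derive(Debug, PartialEq)]
pub enum CrawlError {
    InvalidConfig(String),
}

// Opaque handle: the field is private, so code outside this module can only
// obtain a handle from `create_engine` and use it via the free functions.
#[derive(Debug)]
pub struct CrawlEngineHandle {
    config: CrawlConfig,
}

#[derive(Debug, PartialEq)]
pub struct ScrapeResult {
    pub url: String,
    pub markdown: String,
}

/// Validates the configuration once, then hands back the handle.
pub fn create_engine(config: Option<CrawlConfig>) -> Result<CrawlEngineHandle, CrawlError> {
    let config = config.unwrap_or_default();
    if config.max_pages == 0 {
        // Assumed rule, for illustration only.
        return Err(CrawlError::InvalidConfig("max_pages must be at least 1".to_string()));
    }
    Ok(CrawlEngineHandle { config })
}

/// Stand-in for `scrape`: performs no network access and returns a placeholder.
pub fn scrape(engine: &CrawlEngineHandle, url: &str) -> Result<ScrapeResult, CrawlError> {
    // A real engine would consult its validated configuration here.
    let _limits = (engine.config.max_depth, engine.config.max_pages);
    Ok(ScrapeResult { url: url.to_string(), markdown: String::new() })
}
```

Because validation happens once in `create_engine`, every later call can borrow the handle without re-checking the configuration.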

Data flow

graph LR
    A[create_engine] --> B[CrawlEngineHandle]
    B --> S[scrape / crawl / map_urls / interact / batch_*]
    S --> H[HTTP fetch + middleware]
    H --> R{Content type}
    R -->|HTML| E[Extraction pipeline]
    R -->|PDF / binary| D[DownloadedDocument]
    E --> M[Markdown conversion]
    M --> O[ScrapeResult / CrawlResult / MapResult]
    D --> O
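
The content-type branch in the graph can be sketched as a small routing function. This is only an illustration of the decision point: the `Route` enum, the `route_for` function, and the exact set of media types treated as HTML are assumptions, not the crate's actual logic.

```rust
/// Hypothetical routing outcome for a fetched response.
#[derive(Debug, PartialEq)]
pub enum Route {
    /// HTML goes through the extraction pipeline and Markdown conversion.
    Extraction,
    /// PDFs and other binaries are kept as downloaded documents.
    Download,
}

/// Chooses a route from a `Content-Type` header value.
pub fn route_for(content_type: &str) -> Route {
    // Drop parameters such as "; charset=utf-8" and normalise case.
    let media_type = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();

    match media_type.as_str() {
        // Assumed HTML media types for this sketch.
        "text/html" | "application/xhtml+xml" => Route::Extraction,
        _ => Route::Download,
    }
}
```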

The middleware stack between the engine and the network applies per-domain rate limiting, conditional caching, and User-Agent rotation, plus optional tracing spans. WAF responses can trigger an automatic browser fallback when `BrowserMode::Auto` is set.

`interact()` bypasses the crawl/extraction pipeline and keeps one browser page open while it executes `PageAction` values such as click, type, wait, screenshot, JavaScript evaluation, and scrape. Chromiumoxide screenshots are compositor captures; native screenshots are deterministic PNG snapshots derived from the post-action HTML and are intended for inspection, not pixel-perfect Chrome parity.

The extraction pipeline is described in detail in Content Extraction.
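
To make per-domain rate limiting concrete, here is a minimal sketch of one way such a limiter could work: each domain gets its own "next allowed" timestamp, so a slow domain never delays requests to another. The `DomainRateLimiter` type and its `reserve` method are hypothetical and are not taken from the kreuzcrawl source; the caller passes the current time in so the logic stays easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical limiter enforcing a minimum gap between requests per domain.
pub struct DomainRateLimiter {
    min_interval: Duration,
    next_allowed: HashMap<String, Instant>,
}

impl DomainRateLimiter {
    pub fn new(min_interval: Duration) -> Self {
        DomainRateLimiter { min_interval, next_allowed: HashMap::new() }
    }

    /// Reserves the next request slot for `domain` and returns how long the
    /// caller should wait before sending (zero if it can send immediately).
    pub fn reserve(&mut self, domain: &str, now: Instant) -> Duration {
        // The slot is either right now or the earliest time still pending.
        let slot = match self.next_allowed.get(domain) {
            Some(&pending) if pending > now => pending,
            _ => now,
        };
        // The following request to this domain must wait one more interval.
        self.next_allowed
            .insert(domain.to_string(), slot + self.min_interval);
        slot.saturating_duration_since(now)
    }
}
```

A caller would sleep for the returned duration before issuing the request; requests to different domains keep independent schedules.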

Bindings

Every binding consumes the same Rust core via FFI. The per-binding glue is generated by alef from the core types and a binding manifest (alef.toml); generated code lives under packages/<lang>/ and crates/kreuzcrawl-<binding>/. Binding-level differences (async runtimes, naming conventions, type marshalling) are handled by the generator — the core itself stays language-agnostic.

| Binding crate | Distribution | Mechanism |
| --- | --- | --- |
| `crates/kreuzcrawl-py` | PyPI `kreuzcrawl` | PyO3 + maturin |
| `crates/kreuzcrawl-node` | npm `@kreuzberg/kreuzcrawl` | NAPI-RS |
| `crates/kreuzcrawl-php` | Composer `kreuzberg-dev/kreuzcrawl` | ext-php-rs |
| `crates/kreuzcrawl-wasm` | npm `@kreuzberg/kreuzcrawl-wasm` | wasm-bindgen |
| `crates/kreuzcrawl-ffi` | Shared library + cbindgen header | C FFI |
| `packages/ruby/ext/...` | RubyGems `kreuzcrawl` | Magnus + rb-sys |
| `packages/elixir/native/...` | Hex `kreuzcrawl` | Rustler NIF |
| `packages/go` | Go module `github.com/kreuzberg-dev/kreuzcrawl/packages/go` | cgo over C FFI |
| `packages/java` | Maven Central `dev.kreuzberg.kreuzcrawl:kreuzcrawl` | Java 25 Panama FFM |
| `packages/kotlin-android` | Maven Central `dev.kreuzberg.kreuzcrawl:kreuzcrawl-android` | Android AAR with JNI `.so` libraries |
| `packages/csharp` | NuGet `Kreuzcrawl` | .NET 10 P/Invoke |
| `packages/dart` | pub.dev `kreuzcrawl` | Dart FFI |
| `packages/swift` | Swift Package Manager | Swift over C FFI |
| `packages/zig` | `zig fetch --save` | Zig over C FFI |
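
Several rows above go through a C FFI layer. The general pattern such a layer tends to follow is sketched below: the Rust side boxes its state, hands out a raw pointer as an opaque handle, and takes it back to free it. The `demo_*` function names, the `Engine` struct, and the integer status codes are hypothetical and do not reflect the actual `kreuzcrawl-ffi` header; a real export would also carry a no-mangle attribute, omitted here because its spelling depends on the Rust edition.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical engine state hidden behind the opaque pointer.
pub struct Engine {
    pages_scraped: u32,
}

/// Allocates an engine and transfers ownership to the caller.
pub extern "C" fn demo_engine_new() -> *mut Engine {
    Box::into_raw(Box::new(Engine { pages_scraped: 0 }))
}

/// Returns 0 on success, -1 for a null argument, -2 for an unusable URL.
/// Safety: `engine` must come from `demo_engine_new` and `url` must point to
/// a NUL-terminated string.
pub unsafe extern "C" fn demo_engine_scrape(engine: *mut Engine, url: *const c_char) -> i32 {
    if engine.is_null() || url.is_null() {
        return -1;
    }
    let url = match unsafe { CStr::from_ptr(url) }.to_str() {
        Ok(text) if !text.is_empty() => text,
        _ => return -2,
    };
    let _ = url; // A real binding would hand the URL to the core here.
    unsafe { (*engine).pages_scraped += 1 };
    0
}

/// Reads a counter through the handle; returns 0 for a null handle.
pub unsafe extern "C" fn demo_engine_pages(engine: *const Engine) -> u32 {
    if engine.is_null() {
        return 0;
    }
    unsafe { (*engine).pages_scraped }
}

/// Takes ownership back and frees the engine. Null is ignored.
pub unsafe extern "C" fn demo_engine_free(engine: *mut Engine) {
    if !engine.is_null() {
        drop(unsafe { Box::from_raw(engine) });
    }
}
```

Languages such as Go, Swift, and Zig can then treat the pointer as an opaque token and never need to know the layout of `Engine`.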

Feature gates

Cargo features keep the default build minimal — the default feature set is empty. The user-facing features are:

| Feature | Capability |
| --- | --- |
| `browser` | Headless-Chrome fallback for JS-heavy or WAF-protected pages. |
| `browser-native` | In-process native browser backend for rendering and page interaction. |
| `interact` | Compatibility alias for browser-backed page interaction. The public API is always compiled. |
| `tracing` | OpenTelemetry-compatible request spans. |
| `api` | `serve_api(...)` — Firecrawl v1-compatible REST server. |
| `mcp` | `start_mcp_server(...)` — Model Context Protocol server for AI-agent integration. |
| `mcp-http` | MCP over HTTP transport (implies `mcp` + `api`). |
| `warc` | WARC 1.1 output via `CrawlConfig::warc_output`. |
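
As a rough illustration of how Cargo feature gates keep a default build small, the sketch below uses `cfg` checks keyed on three of the feature names from the table. The `compiled_features` and `api_banner` functions are hypothetical and exist only for this example; with no features enabled, the gated function is not compiled at all.

```rust
/// Lists which of a few optional capabilities were compiled into this build.
pub fn compiled_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // `cfg!` evaluates at compile time to a plain `true` or `false`.
    if cfg!(feature = "browser") {
        enabled.push("browser");
    }
    if cfg!(feature = "api") {
        enabled.push("api");
    }
    if cfg!(feature = "mcp") {
        enabled.push("mcp");
    }
    enabled
}

/// Present only when the crate is built with the `api` feature; otherwise the
/// item is removed before type checking and adds nothing to the binary.
#[cfg(feature = "api")]
pub fn api_banner() -> &'static str {
    "REST server support compiled in"
}
```

A feature that implies others, as `mcp-http` does, is expressed in the manifest rather than in code, so enabling it switches on the `mcp` and `api` checks as well.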
