Contributing
Setup
Rust 1.95 is pinned in rust-toolchain.toml. Clone and build:
rustup will fetch 1.95 on first run if you don't have it. Most Rust work is under crates/kreuzcrawl/.
Pre-commit
Install pre-commit once, then wire up both hook types:
It runs automatically on git commit. To run it by hand:
Feature flags
The default build has no optional dependencies. Add features as needed:
| Feature | What it adds |
|---|---|
| browser | Browser-backed rendering and interaction |
| ai | LLM extraction via liter-llm |
| api | REST API server via Axum |
| mcp | Model Context Protocol server |
| mcp-http | MCP over HTTP (implies mcp + api) |
| tracing | OpenTelemetry spans |
| interact | Compatibility alias for browser-backed page interaction |
| warc | WARC archive output |
interact() is always part of the public API. Runtime support depends on the compiled browser backend:
chromiumoxide performs the actions, native performs supported actions in-process, and builds without a browser
backend return Unsupported.
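The backend-dependent behavior described above can be sketched roughly as follows. This is an illustration only, not the crate's actual code: the `Backend`, `Action`, and `InteractError` types, the `interact` signature, and the choice of which actions the native backend supports are all hypothetical. Returning `Unsupported` for an action the native backend cannot perform is also an assumption.

```rust
/// Hypothetical stand-in for the browser backend chosen at compile time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Chromiumoxide,
    Native,
    None,
}

/// Hypothetical page actions; the real action set may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Click,
    Scroll,
    Screenshot,
}

#[derive(Debug, PartialEq, Eq)]
enum InteractError {
    Unsupported,
}

/// Returns the name of the backend that handled the action.
/// Assumption for illustration: native handles Click and Scroll only.
fn interact(backend: Backend, action: Action) -> Result<&'static str, InteractError> {
    match (backend, action) {
        // The full browser backend performs every action.
        (Backend::Chromiumoxide, _) => Ok("chromiumoxide"),
        // The native backend performs its supported subset in-process.
        (Backend::Native, Action::Click | Action::Scroll) => Ok("native"),
        (Backend::Native, Action::Screenshot) => Err(InteractError::Unsupported),
        // Builds without a browser backend still expose the call.
        (Backend::None, _) => Err(InteractError::Unsupported),
    }
}

fn main() {
    for backend in [Backend::Chromiumoxide, Backend::Native, Backend::None] {
        for action in [Action::Click, Action::Scroll, Action::Screenshot] {
            println!("{:?} + {:?} -> {:?}", backend, action, interact(backend, action));
        }
    }
}
```

The point of this shape is that callers compile against one signature regardless of features and handle `Unsupported` at runtime.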
Tests
Run every test in the workspace with cargo test --workspace. This covers both unit and integration tests, and the integration tests may hit the network.
To run only the integration test binaries:
Browser tests need a running Chrome instance. Docker is the least painful way:
CI treats clippy warnings as errors, so run cargo clippy --workspace --all-features --all-targets -- -D warnings before pushing.
Code conventions
Formatting is configured in rustfmt.toml. cargo fmt (or the pre-commit hook) handles it.
Four things that aren't obvious from the code:
- Async runtime is tokio. Don't introduce async-std or smol.
- Public API types belong in crates/kreuzcrawl/src/types/. Internal types live next to the code that uses them.
- defaults/ is pub(crate). Don't re-export from it at the crate root without a discussion; the narrow public surface is intentional.
- Error types go in error.rs with thiserror. No anyhow in library code.
Commits
gitfluff lints commit messages on commit. Format:
Types: feat, fix, docs, refactor, test, chore. Scope is the crate or area — engine, api, cli, python, etc.
Good example: fix(engine): respect max_pages limit during batch crawl
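To make the header shape concrete, here is a minimal sketch of a checker for `type(scope): subject`, inferred from the good example above. It is not gitfluff's implementation and does not reproduce its rules. The function name `parse_commit_header` is hypothetical, and treating the scope as mandatory is an assumption.

```rust
/// Commit types listed in this guide.
const TYPES: [&str; 6] = ["feat", "fix", "docs", "refactor", "test", "chore"];

/// Splits a `type(scope): subject` header into its three parts.
/// Returns None when the header does not match that shape.
/// Assumption: the scope is required; the real linter may be more lenient.
fn parse_commit_header(header: &str) -> Option<(&str, &str, &str)> {
    // Everything before the first ": " is the `type(scope)` prefix.
    let (prefix, subject) = header.split_once(": ")?;
    let (kind, rest) = prefix.split_once('(')?;
    let scope = rest.strip_suffix(')')?;

    let known_type = TYPES.iter().any(|t| *t == kind);
    if !known_type || scope.is_empty() || subject.trim().is_empty() {
        return None;
    }
    Some((kind, scope, subject))
}

fn main() {
    let samples = [
        "fix(engine): respect max_pages limit during batch crawl",
        "feat: add thing",
        "wip(engine): try something",
    ];
    for header in samples {
        match parse_commit_header(header) {
            Some((kind, scope, subject)) => {
                println!("ok: type={kind} scope={scope} subject={subject}")
            }
            None => println!("rejected: {header}"),
        }
    }
}
```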
Pull requests
Branch from main using <type>/<short-description> — e.g. feat/streaming-results or fix/rate-limiter-overflow.
Before opening:
- cargo fmt --all
- cargo clippy --workspace --all-features --all-targets -- -D warnings
- cargo test --workspace
- Update docs/ if you changed a public API or added a feature.
PR descriptions don't need a template. Say what changed and why. Link the issue if there is one.
CI runs the full binding test suite on each PR. You don't need Ruby, Java, PHP, or Elixir set up locally to work on the Rust core.
License
Kreuzcrawl is under the Elastic License 2.0. Contributing means your changes are covered by it.