Antibot Strategy & Stealth Surfaces¶
Kreuzcrawl detects WAF/bot-mitigation signals, classifies them with hot-reloadable rules, and can escalate through the configured dispatch chain. Customize policy with AntibotStrategy, DispatchProfile, retry policy, domain state, and optional caller-supplied bypass providers.
Architecture¶
Three layers compose the antibot system:
- WAF Classifier — Matches HTTP responses against
crates/kreuzcrawl/rules/waf_fingerprints.tomland returns aWafSignalwithvendor,fingerprint_id, andweight. - Decision Hook —
AntibotStrategytrait pair:pre_request(warm external state) andpost_response(inspect response, decide next action). - Dispatch Policy —
DispatchProfilecombines escalation strategy, retry policy, classifier, domain state, budget, and optional bypass provider.
When a WAF signal is detected, the engine consults the strategy and retry policy to decide whether to retry, escalate to the browser tier, route through a caller-supplied bypass provider, or stop. Decision::RotateProxy is present but not implemented; it logs and falls through to Accept.
The AntibotStrategy trait¶
Implement this trait to intercept every request/response pair:
#[async_trait]
pub trait AntibotStrategy: Send + Sync {
/// Called once per attempt, before the tower-stack fetch fires.
/// Return Err to abort this attempt; the retry policy decides what happens.
async fn pre_request(&self, url: &str) -> Result<(), AntibotError>;
/// Called once per successful HTTP response, after WAF classification.
/// Return a Decision that overrides the retry policy for this attempt.
async fn post_response(
&self,
response: &HttpResponse,
waf_signal: Option<&WafSignal>,
) -> Decision;
}
The Decision enum controls what happens next:
pub enum Decision {
/// Continue normally — hand the response to the retry policy.
Accept,
/// Retry the same tier after `backoff` (Duration).
Retry { backoff: Duration },
/// Rotate to a different proxy and retry (not yet implemented).
/// Logs a warn!() and falls through to Accept for now.
RotateProxy,
/// Force escalation to the Browser tier, bypassing the retry policy.
EscalateBrowser,
}
Errors during strategy execution are returned as:
pub enum AntibotError {
/// The pre_request hook failed.
PreRequest(String),
/// The post_response hook failed.
PostResponse(String),
}
BrowserMode variants¶
The BrowserConfig.mode field controls when and how the browser is used. BrowserMode::Stealth is the new variant:
| Variant | Escalation | Stealth surfaces | Use case |
|---|---|---|---|
Auto (default) |
HTTP first, escalate to browser on WAF/403 | None | Balanced; handles most sites |
Always |
Skip HTTP entirely | None | JS-heavy SPAs |
Never |
No browser fallback | None | Performance-critical; no WAF expected |
Stealth |
Browser tier for every request | JS patches, native TLS spoofing, UA/viewport defaults | Stealth-hardened mode for challenging sites |
BrowserMode::Stealth behaves like Always for request routing (every page goes through the browser tier) but additionally enables:
- chromiumoxide JS patches (
crate::stealth::apply_stealth_patches) that spoofnavigator.webdriver,navigator.chromeFlags, andnavigator.plugins. - Native-backend TLS fingerprint spoofing (JA3 randomization).
- Stealth-aware default user-agent when no explicit user-agent is set.
- Forced viewport (1920×1080) to avoid detection via unusual screen sizes.
Previously, a BrowserConfig.stealth: bool field existed but had a bug: JS patches always ran regardless of the flag. This field has been removed pre-v1. Use BrowserMode::Stealth instead.
Worked example: per-vendor backoff¶
Wrap DefaultAntibotStrategy to inject custom backoff rules per WAF vendor:
use std::time::Duration;
use async_trait::async_trait;
use kreuzcrawl::{
AntibotStrategy, Decision, DefaultAntibotStrategy,
AntibotError, http::HttpResponse, WafSignal,
};
#[derive(Debug)]
struct VendorBackoffStrategy {
inner: DefaultAntibotStrategy,
}
#[async_trait]
impl AntibotStrategy for VendorBackoffStrategy {
async fn pre_request(&self, url: &str) -> Result<(), AntibotError> {
// Delegate to default (no-op)
self.inner.pre_request(url).await
}
async fn post_response(
&self,
resp: &HttpResponse,
signal: Option<&WafSignal>,
) -> Decision {
match signal.map(|s| s.vendor.as_str()) {
Some("cloudflare") => {
// Cloudflare: aggressive backoff
Decision::Retry {
backoff: Duration::from_secs(10),
}
}
Some("datadome") => {
// DataDome: skip to browser
Decision::EscalateBrowser
}
_ => {
// Everything else: use default logic
self.inner.post_response(resp, signal).await
}
}
}
}
// Wire it up
let strategy = std::sync::Arc::new(VendorBackoffStrategy {
inner: DefaultAntibotStrategy::new(),
});
let profile = DispatchProfile::builder()
.antibot_strategy(strategy)
.build();
Defaults¶
Without an attached strategy, the engine uses DefaultAntibotStrategy (defined at crates/kreuzcrawl/src/types/antibot.rs:132-164):
pre_requestis a no-op.post_responsereturnsDecision::EscalateBrowserwhen a WAF signal is present,Decision::Acceptotherwise.
This matches the pre-Cluster-5 escalation logic, so existing code continues to work unchanged.
WAF detection corpus v0.3¶
Kreuzcrawl classifies WAF fingerprints via TomlClassifier at crates/kreuzcrawl/rules/waf_fingerprints.toml. The rules currently cover Cloudflare, DataDome, PerimeterX/HUMAN Security, Imperva, AWS WAF, Akamai, F5, and generic block patterns.
Fingerprints match response headers, body substrings, or status code/header combinations. The classifier supports hot reload via TomlClassifier::watch() so rule updates can land without restarting the process.
Dispatch and EWMA state v0.3¶
The default retry and dispatch layer can combine WAF signals, transient errors, and EwmaDomainState observations. EWMA state lets callers track per-domain outcomes and feed a LearningRetryPolicy without requiring a database.
Bypass providers¶
Kreuzcrawl exposes the BypassProvider trait and BypassResponse type for caller-owned integrations. Providers are responsible for authentication, request shaping, response decoding, cost metadata, and error mapping.
Kreuzcrawl does not ship Bright Data, Zyte, ScrapingBee, or other vendor adapters in the core crate.