Types Reference

Types Reference¶

All types defined by the library, grouped by category. Types are shown using Rust as the canonical representation.

Result Types¶

InteractionResult¶

Result of executing a sequence of page interaction actions.

Field	Type	Default	Description
`action_results`	`Vec<ActionResult>`	`vec![]`	Results from each executed action.
`final_html`	`String`	—	Final page HTML after all actions completed.
`final_url`	`String`	—	Final page URL (may have changed due to navigation).
`screenshot`	`Option<Vec<u8>>`	`Default::default()`	Screenshot taken after all actions, if requested.

ActionResult¶

Result from a single page action execution.

Field	Type	Default	Description
`action_index`	`usize`	—	Zero-based index of the action in the sequence.
`action_type`	`String`	—	The type of action that was executed.
`success`	`bool`	—	Whether the action completed successfully.
`data`	`Option<serde_json::Value>`	`Default::default()`	Action-specific return data (screenshot bytes, JS return value, scraped HTML).
`error`	`Option<String>`	`Default::default()`	Error message if the action failed.

ScrapeResult¶

The result of a single-page scrape operation.

Field	Type	Default	Description
`status_code`	`u16`	—	The HTTP status code of the response.
`final_url`	`String`	—	The final URL after following all redirects.
`content_type`	`String`	—	The Content-Type header value.
`html`	`String`	—	The HTML body of the response.
`body_size`	`usize`	—	The size of the response body in bytes.
`metadata`	`PageMetadata`	—	Extracted metadata from the page.
`links`	`Vec<LinkInfo>`	`vec![]`	Links found on the page.
`images`	`Vec<ImageInfo>`	`vec![]`	Images found on the page.
`feeds`	`Vec<FeedInfo>`	`vec![]`	Feed links found on the page.
`json_ld`	`Vec<JsonLdEntry>`	`vec![]`	JSON-LD entries found on the page.
`is_allowed`	`bool`	—	Whether the URL is allowed by robots.txt.
`crawl_delay`	`Option<u64>`	`Default::default()`	The crawl delay from robots.txt, in seconds.
`noindex_detected`	`bool`	—	Whether a noindex directive was detected.
`nofollow_detected`	`bool`	—	Whether a nofollow directive was detected.
`x_robots_tag`	`Option<String>`	`Default::default()`	The X-Robots-Tag header value, if present.
`is_pdf`	`bool`	—	Whether the content is a PDF.
`was_skipped`	`bool`	—	Whether the page was skipped (binary or PDF content).
`detected_charset`	`Option<String>`	`Default::default()`	The detected character set encoding.
`auth_header_sent`	`bool`	—	Whether an authentication header was sent with the request.
`response_meta`	`Option<ResponseMeta>`	`Default::default()`	Response metadata extracted from HTTP headers.
`assets`	`Vec<DownloadedAsset>`	`vec![]`	Downloaded assets from the page.
`js_render_hint`	`bool`	—	Whether the page content suggests JavaScript rendering is needed.
`browser_used`	`bool`	—	Whether the browser fallback was used to fetch this page.
`markdown`	`Option<MarkdownResult>`	`Default::default()`	Markdown conversion of the page content.
`extracted_data`	`Option<serde_json::Value>`	`Default::default()`	Structured data extracted by LLM. Populated when extraction is configured.
`extraction_meta`	`Option<ExtractionMeta>`	`Default::default()`	Metadata about the LLM extraction pass (cost, tokens, model).
`screenshot`	`Option<Vec<u8>>`	`Default::default()`	Screenshot of the page as PNG bytes. Populated when browser is used and capture_screenshot is enabled.
`downloaded_document`	`Option<DownloadedDocument>`	`Default::default()`	Downloaded non-HTML document (PDF, DOCX, image, code, etc.).
`browser`	`Option<BrowserExtras>`	`Default::default()`	Browser-specific extras (eval result, network events, cookies). Only populated when `BrowserBackend.Native` was used for this request.

CrawlPageResult¶

The result of crawling a single page during a crawl operation.

Field	Type	Default	Description
`url`	`String`	—	The original URL of the page.
`normalized_url`	`String`	—	The normalized URL of the page.
`status_code`	`u16`	—	The HTTP status code of the response.
`content_type`	`String`	—	The Content-Type header value.
`html`	`String`	—	The HTML body of the response.
`body_size`	`usize`	—	The size of the response body in bytes.
`metadata`	`PageMetadata`	—	Extracted metadata from the page.
`links`	`Vec<LinkInfo>`	`vec![]`	Links found on the page.
`images`	`Vec<ImageInfo>`	`vec![]`	Images found on the page.
`feeds`	`Vec<FeedInfo>`	`vec![]`	Feed links found on the page.
`json_ld`	`Vec<JsonLdEntry>`	`vec![]`	JSON-LD entries found on the page.
`depth`	`usize`	—	The depth of this page from the start URL.
`stayed_on_domain`	`bool`	—	Whether this page is on the same domain as the start URL.
`was_skipped`	`bool`	—	Whether this page was skipped (binary or PDF content).
`is_pdf`	`bool`	—	Whether the content is a PDF.
`detected_charset`	`Option<String>`	`Default::default()`	The detected character set encoding.
`markdown`	`Option<MarkdownResult>`	`Default::default()`	Markdown conversion of the page content.
`extracted_data`	`Option<serde_json::Value>`	`Default::default()`	Structured data extracted by LLM. Populated when extraction is configured.
`extraction_meta`	`Option<ExtractionMeta>`	`Default::default()`	Metadata about the LLM extraction pass (cost, tokens, model).
`downloaded_document`	`Option<DownloadedDocument>`	`Default::default()`	Downloaded non-HTML document (PDF, DOCX, image, code, etc.).
`browser_used`	`bool`	—	Whether the browser fallback was used to fetch this page.

CrawlResult¶

The result of a multi-page crawl operation.

Field	Type	Default	Description
`pages`	`Vec<CrawlPageResult>`	`vec![]`	The list of crawled pages.
`final_url`	`String`	—	The final URL after following redirects.
`redirect_count`	`usize`	—	The number of redirects followed.
`was_skipped`	`bool`	—	Whether any page was skipped during crawling.
`error`	`Option<String>`	`Default::default()`	An error message, if the crawl encountered an issue.
`cookies`	`Vec<CookieInfo>`	`vec![]`	Cookies collected during the crawl.
`stayed_on_domain`	`bool`	—	Whether all crawled pages stayed on the same domain as the start URL.
`browser_used`	`bool`	—	Whether the browser fallback was used for any page in this crawl.
`normalized_urls`	`Vec<String>`	`vec![]`	Normalized URLs encountered during crawling (for deduplication counting).

MapResult¶

The result of a map operation, containing discovered URLs.

Field	Type	Default	Description
`urls`	`Vec<SitemapUrl>`	`vec![]`	The list of discovered URLs.

MarkdownResult¶

Rich markdown conversion result from HTML processing.

Field	Type	Default	Description
`content`	`String`	—	Converted markdown text.
`document_structure`	`Option<serde_json::Value>`	`Default::default()`	Structured document tree with semantic nodes.
`tables`	`Vec<serde_json::Value>`	`vec![]`	Extracted tables with structured cell data.
`warnings`	`Vec<String>`	`vec![]`	Non-fatal processing warnings.
`citations`	`bool`	—	Whether citation conversion was applied and produced at least one reference. `true` when the markdown contained inline links that were converted to numbered citation references. The converted content (with `[N]` markers) is available in `content`; the full reference list is accessible via `generate_citations` if needed separately.
`fit_content`	`Option<String>`	`Default::default()`	Content-filtered markdown optimized for LLM consumption.

CitationResult¶

Result of citation conversion.

Field	Type	Default	Description
`content`	`String`	—	Markdown with links replaced by numbered citations.
`references`	`Vec<CitationReference>`	`vec![]`	Numbered reference list: (index, url, text).

BatchScrapeResult¶

Result from a single URL in a batch scrape operation.

Field	Type	Default	Description
`url`	`String`	—	The URL that was scraped.
`result`	`Option<ScrapeResult>`	`Default::default()`	The scrape result, if successful.
`error`	`Option<String>`	`Default::default()`	The error message, if the scrape failed.

BatchCrawlResult¶

Result from a single URL in a batch crawl operation.

Field	Type	Default	Description
`url`	`String`	—	The seed URL that was crawled.
`result`	`Option<CrawlResult>`	`Default::default()`	The crawl result, if successful.
`error`	`Option<String>`	`Default::default()`	The error message, if the crawl failed.

BatchScrapeResults¶

Aggregate result of a batch scrape, exposing per-URL results plus precomputed counts.

The counts are derived once at construction so every binding language can read them as plain integer fields without re-iterating the results vector.

Field	Type	Default	Description
`results`	`Vec<BatchScrapeResult>`	`vec![]`	Per-URL scrape results, in the order URLs were submitted.
`total_count`	`usize`	—	Total number of URLs in the batch (equal to `results.len()`).
`completed_count`	`usize`	—	Number of URLs whose scrape succeeded (`error` is `None`).
`failed_count`	`usize`	—	Number of URLs whose scrape failed (`error` is `Some`).

BatchCrawlResults¶

Aggregate result of a batch crawl, exposing per-URL results plus precomputed counts.

The counts are derived once at construction so every binding language can read them as plain integer fields without re-iterating the results vector.

Field	Type	Default	Description
`results`	`Vec<BatchCrawlResult>`	`vec![]`	Per-URL crawl results, in the order seed URLs were submitted.
`total_count`	`usize`	—	Total number of seed URLs in the batch (equal to `results.len()`).
`completed_count`	`usize`	—	Number of seed URLs whose crawl succeeded (`error` is `None`).
`failed_count`	`usize`	—	Number of seed URLs whose crawl failed (`error` is `Some`).

Configuration Types¶

See Configuration Reference for detailed defaults and language-specific representations.

ProxyConfig¶

Proxy configuration for HTTP requests.

Field	Type	Default	Description
`url`	`String`	—	Proxy URL (e.g. "http://proxy:8080", "socks5://proxy:1080").
`username`	`Option<String>`	`Default::default()`	Optional username for proxy authentication.
`password`	`Option<String>`	`Default::default()`	Optional password for proxy authentication.

ContentConfig¶

Content extraction and conversion configuration.

Controls how HTML is converted to the output format. Uses html-to-markdown-rs as the conversion engine for all formats (markdown, plain text, djot).

Field	Type	Default	Description
`output_format`	`String`	`"markdown"`	Output format: `"markdown"` (default), `"plain"`, `"djot"`.
`preprocessing_preset`	`String`	`"standard"`	Preprocessing aggressiveness: `"minimal"`, `"standard"` (default), `"aggressive"`. - Minimal: only scripts/styles removed. - Standard: also removes nav, nav-hinted headers/footers/asides, forms. - Aggressive: removes all footers/asides unconditionally.
`remove_navigation`	`bool`	`true`	Remove navigation elements (nav, breadcrumbs, menus). Default: `true`.
`remove_forms`	`bool`	`true`	Remove form elements. Default: `true`.
`strip_tags`	`Vec<String>`	`vec![]`	HTML tag names to strip (render children only, remove the tag wrapper). Default: `["noscript"]`.
`preserve_tags`	`Vec<String>`	`vec![]`	HTML tag names to preserve as raw HTML in output.
`exclude_selectors`	`Vec<String>`	`vec![]`	CSS selectors for elements to exclude entirely (element + all content). Unlike `strip_tags` (which removes the wrapper but keeps children), excluded elements and all descendants are dropped. Supports CSS selectors: `.class`, `#id`, `[attribute]`, compound selectors. Example: `[".cookie-banner", "#ad-container", "[role='complementary']"]`
`skip_images`	`bool`	`false`	Skip image elements in output. Default: `false`.
`max_depth`	`Option<usize>`	`None`	Max DOM traversal depth. Prevents stack overflow on deeply nested HTML.
`wrap`	`bool`	`false`	Enable line wrapping. Default: `false`.
`wrap_width`	`usize`	`80`	Wrap width when `wrap` is enabled. Default: `80`.
`include_document_structure`	`bool`	`true`	Include document structure tree in output. Default: `true`.

BrowserConfig¶

Browser fallback configuration.

Field	Type	Default	Description
`mode`	`BrowserMode`	`BrowserMode::Auto`	When to use the headless browser fallback.
`backend`	`BrowserBackend`	`BrowserBackend::Chromiumoxide`	Browser backend used to render JavaScript-heavy pages.
`endpoint`	`Option<String>`	`None`	CDP WebSocket endpoint for connecting to an external browser instance.
`timeout`	`Duration`	`30000ms`	Timeout for browser page load and rendering (in milliseconds when serialized).
`wait`	`BrowserWait`	`BrowserWait::NetworkIdle`	Wait strategy after browser navigation.
`wait_selector`	`Option<String>`	`None`	CSS selector to wait for when `wait` is `Selector`.
`extra_wait`	`Option<Duration>`	`None`	Extra time to wait after the wait condition is met.
`stealth`	`bool`	`false`	Enable browser-realistic TLS fingerprint via the stealth HTTP client. Only honored by `BrowserBackend.Native` — chromiumoxide is already full-stealth via Chrome's TLS stack.
`proxy`	`Option<ProxyConfig>`	`None`	Proxy for browser fetches. Overrides `CrawlConfig.proxy` when set. Native backend supports http/https only (no SOCKS5).
`block_url_patterns`	`Vec<String>`	`vec![]`	URL patterns to block before the network request fires. Supports `*` wildcards. Useful for skipping ads/analytics/large images. Honored by `BrowserBackend.Native`; chromiumoxide ignores this field today.
`eval_script`	`Option<String>`	`None`	JavaScript snippet evaluated after navigation completes. Scraping captures the native backend result in `ScrapeResult.browser.eval_result`. Interactions run this script before page actions on both browser backends but do not include the script result in `InteractionResult`.
`robots_user_agent`	`Option<String>`	`None`	User-agent used when fetching robots.txt. Defaults to `BrowserConfig.user_agent` (or kreuzcrawl's default) if unset. Native only.
`capture_network_events`	`bool`	`false`	Capture the full network event stream into the result. Default false (only the document event is captured). Native only.
`session_affinity`	`bool`	`true`	Enable session affinity: reuse chromiumoxide Pages for same-domain requests so cookies + fingerprint + solved challenges persist. Default: true. When false, each request gets a fresh Page.

CrawlConfig¶

Configuration for crawl, scrape, and map operations.

Field	Type	Default	Description
`max_depth`	`Option<usize>`	`None`	Maximum crawl depth (number of link hops from the start URL).
`max_pages`	`Option<usize>`	`None`	Maximum number of pages to crawl.
`max_concurrent`	`Option<usize>`	`None`	Maximum number of concurrent requests.
`respect_robots_txt`	`bool`	`false`	Whether to respect robots.txt directives.
`soft_http_errors`	`bool`	`false`	When true, HTTP-level error responses (404 NotFound, 403 Forbidden, WAF blocks) are surfaced as `ScrapeResult` records with the matching `status_code` rather than raised as `CrawlError`. Default `false` preserves the historical throw-on-error contract for direct fetches. Independently of this flag, 404s reached at the end of a redirect chain are always surfaced softly — the user opted into redirect-following, so receiving a 404 there is part of the normal flow rather than an unexpected error.
`user_agent`	`Option<String>`	`None`	Custom user-agent string.
`stay_on_domain`	`bool`	`false`	Whether to restrict crawling to the same domain.
`allow_subdomains`	`bool`	`false`	Whether to allow subdomains when `stay_on_domain` is true.
`include_paths`	`Vec<String>`	`vec![]`	Regex patterns for paths to include during crawling.
`exclude_paths`	`Vec<String>`	`vec![]`	Regex patterns for paths to exclude during crawling.
`custom_headers`	`HashMap<String, String>`	`HashMap::new()`	Custom HTTP headers to send with each request.
`request_timeout`	`Duration`	`30000ms`	Timeout for individual HTTP requests (in milliseconds when serialized).
`rate_limit_ms`	`Option<u64>`	`None`	Per-domain rate limit in milliseconds. When set, enforces a minimum delay between requests to the same domain. Defaults to 200ms when `None`.
`max_redirects`	`usize`	`10`	Maximum number of redirects to follow.
`retry_count`	`usize`	`0`	Number of retry attempts for failed requests.
`retry_codes`	`Vec<u16>`	`vec![]`	HTTP status codes that should trigger a retry.
`cookies_enabled`	`bool`	`false`	Whether to enable cookie handling.
`auth`	`Option<AuthConfig>`	`None`	Authentication configuration.
`max_body_size`	`Option<usize>`	`None`	Maximum response body size in bytes.
`remove_tags`	`Vec<String>`	`vec![]`	CSS selectors for tags to remove from HTML before processing.
`content`	`ContentConfig`	—	Content extraction and conversion configuration.
`map_limit`	`Option<usize>`	`None`	Maximum number of URLs to return from a map operation.
`map_search`	`Option<String>`	`None`	Search filter for map results (case-insensitive substring match on URLs).
`download_assets`	`bool`	`false`	Whether to download assets (CSS, JS, images, etc.) from the page.
`asset_types`	`Vec<AssetCategory>`	`vec![]`	Filter for asset categories to download.
`max_asset_size`	`Option<usize>`	`None`	Maximum size in bytes for individual asset downloads.
`browser`	`BrowserConfig`	—	Browser configuration.
`proxy`	`Option<ProxyConfig>`	`None`	Proxy configuration for HTTP requests.
`user_agents`	`Vec<String>`	`vec![]`	List of user-agent strings for rotation. If non-empty, overrides `user_agent`.
`capture_screenshot`	`bool`	`false`	Whether to capture a screenshot when using the browser.
`download_documents`	`bool`	`true`	Whether to download non-HTML documents (PDF, DOCX, images, code, etc.) instead of skipping them.
`document_max_size`	`Option<usize>`	`Default::default()`	Maximum size in bytes for document downloads. Defaults to 50 MB.
`document_mime_types`	`Vec<String>`	`vec![]`	Allowlist of MIME types to download. If empty, uses built-in defaults.
`warc_output`	`Option<PathBuf>`	`None`	Path to write WARC output. If `None`, WARC output is disabled.
`browser_profile`	`Option<String>`	`None`	Named browser profile for persistent sessions (cookies, localStorage).
`save_browser_profile`	`bool`	`false`	Whether to save changes back to the browser profile on exit.
`dispatch`	`Option<String>`	`None`	Pluggable dispatch components: bypass provider, escalation strategy, retry policy, WAF classifier, domain state, escalation budget, and max_total_attempts. When `None`, the engine uses its built-in defaults (no bypass, `BrowserOnly` strategy, `SimpleRetryPolicy`, built-in WAF classifier, no domain state, unlimited budget, 10 total attempt cap). Not serializable — callers construct this at runtime and skip in TOML/JSON configs.

BrowserExtras¶

Browser-specific extras populated when the native browser backend was used.

Available on ScrapeResult.browser when BrowserBackend.Native handled the request.

Field	Type	Default	Description
`eval_result`	`Option<serde_json::Value>`	`Default::default()`	Return value of `BrowserConfig.eval_script`, if provided.
`network_events`	`Vec<ResponseMeta>`	`vec![]`	Network events captured during page navigation (only populated when `BrowserConfig.capture_network_events` is true).
`cookies`	`Vec<CookieInfo>`	`vec![]`	All non-expired cookies present in the browser's cookie jar after navigation completes (includes both prior cookies and server Set-Cookie).

DownloadedDocument¶

A downloaded non-HTML document (PDF, DOCX, image, code file, etc.).

When the crawler encounters non-HTML content and download_documents is enabled, it downloads the raw bytes and populates this struct instead of skipping the resource.

Field	Type	Default	Description
`url`	`String`	—	The URL the document was fetched from.
`mime_type`	`String`	—	The MIME type from the Content-Type header.
`content`	`Vec<u8>`	—	Raw document bytes. Skipped during JSON serialization.
`size`	`usize`	—	Size of the document in bytes.
`filename`	`Option<String>`	`Default::default()`	Filename extracted from Content-Disposition or URL path.
`content_hash`	`String`	—	SHA-256 hex digest of the content.
`headers`	`HashMap<String, String>`	`HashMap::new()`	Selected response headers.

SitemapUrl¶

A URL entry from a sitemap.

Field	Type	Default	Description
`url`	`String`	—	The URL.
`lastmod`	`Option<String>`	`Default::default()`	The last modification date, if present.
`changefreq`	`Option<String>`	`Default::default()`	The change frequency, if present.
`priority`	`Option<String>`	`Default::default()`	The priority, if present.

LinkInfo¶

Information about a link found on a page.

Field	Type	Default	Description
`url`	`String`	—	The resolved URL of the link.
`text`	`String`	—	The visible text of the link.
`link_type`	`LinkType`	`LinkType::Internal`	The classification of the link.
`rel`	`Option<String>`	`Default::default()`	The `rel` attribute value, if present.
`nofollow`	`bool`	—	Whether the link has `rel="nofollow"`.

ImageInfo¶

Information about an image found on a page.

Field	Type	Default	Description
`url`	`String`	—	The image URL.
`alt`	`Option<String>`	`Default::default()`	The alt text, if present.
`width`	`Option<u32>`	`Default::default()`	The width attribute, if present and parseable.
`height`	`Option<u32>`	`Default::default()`	The height attribute, if present and parseable.
`source`	`ImageSource`	`ImageSource::Img`	The source of the image reference.

FeedInfo¶

Information about a feed link found on a page.

Field	Type	Default	Description
`url`	`String`	—	The feed URL.
`title`	`Option<String>`	`Default::default()`	The feed title, if present.
`feed_type`	`FeedType`	`FeedType::Rss`	The type of feed.

JsonLdEntry¶

A JSON-LD structured data entry found on a page.

Field	Type	Default	Description
`schema_type`	`String`	—	The `@type` value from the JSON-LD object.
`name`	`Option<String>`	`Default::default()`	The `name` value, if present.
`raw`	`String`	—	The raw JSON-LD string.

CookieInfo¶

Information about an HTTP cookie received from a response.

Field	Type	Default	Description
`name`	`String`	—	The cookie name.
`value`	`String`	—	The cookie value.
`domain`	`Option<String>`	`Default::default()`	The cookie domain, if specified.
`path`	`Option<String>`	`Default::default()`	The cookie path, if specified.

DownloadedAsset¶

A downloaded asset from a page.

Field	Type	Default	Description
`url`	`String`	—	The original URL of the asset.
`content_hash`	`String`	—	The SHA-256 content hash of the asset.
`mime_type`	`Option<String>`	`Default::default()`	The MIME type from the Content-Type header.
`size`	`usize`	—	The size of the asset in bytes.
`asset_category`	`AssetCategory`	`AssetCategory::Image`	The category of the asset.
`html_tag`	`Option<String>`	`Default::default()`	The HTML tag that referenced this asset (e.g., "link", "script", "img").

HreflangEntry¶

An hreflang alternate link entry.

Field	Type	Default	Description
`lang`	`String`	—	The language code (e.g., "en", "fr", "x-default").
`url`	`String`	—	The URL for this language variant.

FaviconInfo¶

Information about a favicon or icon link.

Field	Type	Default	Description
`url`	`String`	—	The icon URL.
`rel`	`String`	—	The `rel` attribute (e.g., "icon", "apple-touch-icon").
`sizes`	`Option<String>`	`Default::default()`	The `sizes` attribute, if present.
`mime_type`	`Option<String>`	`Default::default()`	The MIME type, if present.

HeadingInfo¶

A heading element extracted from the page.

Field	Type	Default	Description
`level`	`u8`	—	The heading level (1-6).
`text`	`String`	—	The heading text content.

CrawlStreamRequest¶

Request to begin a single-URL streaming crawl.

Wraps a single seed URL for delivery through the streaming-adapter binding surface. Required as a struct because alef's streaming adapter requires a named request type — primitives are not supported.

Field	Type	Default	Description
`url`	`String`	—	The seed URL to crawl.

BatchCrawlStreamRequest¶

Request to begin a multi-URL streaming crawl.

Wraps a set of seed URLs for delivery through the streaming-adapter binding surface. Required as a struct because alef's streaming adapter requires a named request type — primitives are not supported.

Field	Type	Default	Description
`urls`	`Vec<String>`	`vec![]`	The seed URLs to crawl. Each URL is followed independently up to the engine's configured depth.

CitationReference¶

A single numbered reference in a citation list — produced by the citation extractor when content uses inline [N]-style markers.

Field	Type	Default	Description
`index`	`usize`	—	1-based reference number as it appears in the source text.
`url`	`String`	—	Resolved absolute URL for this reference.
`text`	`String`	—	Human-readable anchor text or title for the reference.

Metadata Types¶

ExtractionMeta¶

Metadata about an LLM extraction pass.

Field	Type	Default	Description
`cost`	`Option<f64>`	`Default::default()`	Estimated cost of the LLM call in USD.
`prompt_tokens`	`Option<u64>`	`Default::default()`	Number of prompt (input) tokens consumed.
`completion_tokens`	`Option<u64>`	`Default::default()`	Number of completion (output) tokens generated.
`model`	`Option<String>`	`Default::default()`	The model identifier used for extraction.
`chunks_processed`	`usize`	—	Number of content chunks sent to the LLM.

ArticleMetadata¶

Article metadata extracted from article:* Open Graph tags.

Field	Type	Default	Description
`published_time`	`Option<String>`	`Default::default()`	The article publication time.
`modified_time`	`Option<String>`	`Default::default()`	The article modification time.
`author`	`Option<String>`	`Default::default()`	The article author.
`section`	`Option<String>`	`Default::default()`	The article section.
`tags`	`Vec<String>`	`vec![]`	The article tags.

ResponseMeta¶

Response metadata extracted from HTTP headers.

Field	Type	Default	Description
`etag`	`Option<String>`	`Default::default()`	The ETag header value.
`last_modified`	`Option<String>`	`Default::default()`	The Last-Modified header value.
`cache_control`	`Option<String>`	`Default::default()`	The Cache-Control header value.
`server`	`Option<String>`	`Default::default()`	The Server header value.
`x_powered_by`	`Option<String>`	`Default::default()`	The X-Powered-By header value.
`content_language`	`Option<String>`	`Default::default()`	The Content-Language header value.
`content_encoding`	`Option<String>`	`Default::default()`	The Content-Encoding header value.

PageMetadata¶

Metadata extracted from an HTML page's <meta> tags and <title> element.

Field	Type	Default	Description
`title`	`Option<String>`	`Default::default()`	The page title from the `<title>` element.
`description`	`Option<String>`	`Default::default()`	The meta description.
`canonical_url`	`Option<String>`	`Default::default()`	The canonical URL from `<link rel="canonical">`.
`keywords`	`Option<String>`	`Default::default()`	Keywords from `<meta name="keywords">`.
`author`	`Option<String>`	`Default::default()`	Author from `<meta name="author">`.
`viewport`	`Option<String>`	`Default::default()`	Viewport content from `<meta name="viewport">`.
`theme_color`	`Option<String>`	`Default::default()`	Theme color from `<meta name="theme-color">`.
`generator`	`Option<String>`	`Default::default()`	Generator from `<meta name="generator">`.
`robots`	`Option<String>`	`Default::default()`	Robots content from `<meta name="robots">`.
`html_lang`	`Option<String>`	`Default::default()`	The `lang` attribute from the `<html>` element.
`html_dir`	`Option<String>`	`Default::default()`	The `dir` attribute from the `<html>` element.
`og_title`	`Option<String>`	`Default::default()`	Open Graph title.
`og_type`	`Option<String>`	`Default::default()`	Open Graph type.
`og_image`	`Option<String>`	`Default::default()`	Open Graph image URL.
`og_description`	`Option<String>`	`Default::default()`	Open Graph description.
`og_url`	`Option<String>`	`Default::default()`	Open Graph URL.
`og_site_name`	`Option<String>`	`Default::default()`	Open Graph site name.
`og_locale`	`Option<String>`	`Default::default()`	Open Graph locale.
`og_video`	`Option<String>`	`Default::default()`	Open Graph video URL.
`og_audio`	`Option<String>`	`Default::default()`	Open Graph audio URL.
`og_locale_alternates`	`Vec<String>`	`vec![]`	Open Graph locale alternates.
`twitter_card`	`Option<String>`	`Default::default()`	Twitter card type.
`twitter_title`	`Option<String>`	`Default::default()`	Twitter title.
`twitter_description`	`Option<String>`	`Default::default()`	Twitter description.
`twitter_image`	`Option<String>`	`Default::default()`	Twitter image URL.
`twitter_site`	`Option<String>`	`Default::default()`	Twitter site handle.
`twitter_creator`	`Option<String>`	`Default::default()`	Twitter creator handle.
`dc_title`	`Option<String>`	`Default::default()`	Dublin Core title.
`dc_creator`	`Option<String>`	`Default::default()`	Dublin Core creator.
`dc_subject`	`Option<String>`	`Default::default()`	Dublin Core subject.
`dc_description`	`Option<String>`	`Default::default()`	Dublin Core description.
`dc_publisher`	`Option<String>`	`Default::default()`	Dublin Core publisher.
`dc_date`	`Option<String>`	`Default::default()`	Dublin Core date.
`dc_type`	`Option<String>`	`Default::default()`	Dublin Core type.
`dc_format`	`Option<String>`	`Default::default()`	Dublin Core format.
`dc_identifier`	`Option<String>`	`Default::default()`	Dublin Core identifier.
`dc_language`	`Option<String>`	`Default::default()`	Dublin Core language.
`dc_rights`	`Option<String>`	`Default::default()`	Dublin Core rights.
`article`	`Option<ArticleMetadata>`	`Default::default()`	Article metadata from `article:*` Open Graph tags.
`hreflangs`	`Vec<HreflangEntry>`	`vec![]`	Hreflang alternate links.
`favicons`	`Vec<FaviconInfo>`	`vec![]`	Favicon and icon links.
`headings`	`Vec<HeadingInfo>`	`vec![]`	Heading elements (h1-h6).
`word_count`	`Option<usize>`	`Default::default()`	Computed word count of the page body text.

Other Types¶

CrawlEngineHandle¶

Opaque handle to a configured crawl engine.

Constructed via create_engine with an optional CrawlConfig. Default implementations for all pluggable components are used internally.

Opaque type — fields are not directly accessible.

Enums¶

AssetCategory¶

The category of a downloaded asset.

Variant	Wire value	Description
`Document`	`document`	A document file (PDF, DOC, etc.).
`Image`	`image`	An image file.
`Audio`	`audio`	An audio file.
`Video`	`video`	A video file.
`Font`	`font`	A font file.
`Stylesheet`	`stylesheet`	A CSS stylesheet.
`Script`	`script`	A JavaScript file.
`Archive`	`archive`	An archive file (ZIP, TAR, etc.).
`Data`	`data`	A data file (JSON, XML, CSV, etc.).
`Other`	`other`	An unrecognized asset type.

AuthConfig¶

Authentication configuration.

Variant	Wire value	Description
`Basic`	`basic`	HTTP Basic authentication. — Fields: `username`: `String`, `password`: `String`
`Bearer`	`bearer`	Bearer token authentication. — Fields: `token`: `String`
`Header`	`header`	Custom authentication header. — Fields: `name`: `String`, `value`: `String`

BrowserBackend¶

Browser backend used for JavaScript rendering.

Variant	Wire value	Description
`Chromiumoxide`	`chromiumoxide`	Existing Chromium/CDP backend powered by chromiumoxide.
`Native`	`native`	Kreuzcrawl-owned native browser backend derived from Obscura.

BrowserMode¶

When to use the headless browser fallback.

Variant	Wire value	Description
`Auto`	`auto`	Automatically detect when JS rendering is needed and fall back to browser.
`Always`	`always`	Always use the browser for every request.
`Never`	`never`	Never use the browser fallback.

BrowserWait¶

Wait strategy for browser page rendering.

Variant	Wire value	Description
`NetworkIdle`	`network_idle`	Wait until network activity is idle.
`Selector`	`selector`	Wait for a specific CSS selector to appear in the DOM.
`Fixed`	`fixed`	Wait for a fixed duration after navigation.

CrawlEvent¶

An event emitted during a streaming crawl operation.

Not available on wasm32 targets — streaming requires native concurrency primitives (tokio channels, JoinSet) that are not supported on wasm32.

Delivered to bindings via alef's streaming-adapter pattern. The crawl_stream / batch_crawl_stream binding wrappers in bindings.rs expose this as the per-language streaming idiom (Python AsyncIterator, Ruby Enumerator, PHP Generator, Elixir Stream.unfold, etc.).

Variant	Wire value	Description
`Page`	`page`	A single page has been crawled. — Fields: `result`: `CrawlPageResult`
`Error`	`error`	An error occurred while crawling a URL. — Fields: `url`: `String`, `error`: `String`
`Complete`	`complete`	The crawl has completed. — Fields: `pages_crawled`: `usize`

FeedType¶

The type of a feed (RSS, Atom, or JSON Feed).

Variant	Wire value	Description
`Rss`	`rss`	RSS feed.
`Atom`	`atom`	Atom feed.
`JsonFeed`	`json_feed`	JSON Feed.

ImageSource¶

The source of an image reference.

Variant	Wire value	Description
`Img`	`img`	An `<img>` tag.
`PictureSource`	`picture_source`	A `<source>` tag inside `<picture>`.
`OgImage`	`og:image`	An `og:image` meta tag.
`TwitterImage`	`twitter:image`	A `twitter:image` meta tag.

LinkType¶

The classification of a link.

Variant	Wire value	Description
`Internal`	`internal`	A link to the same domain.
`External`	`external`	A link to a different domain.
`Anchor`	`anchor`	A fragment-only link (e.g., `#section`).
`Document`	`document`	A link to a downloadable document (PDF, DOC, etc.).

PageAction¶

A single page interaction action.

Actions are serialized with a type tag using camelCase naming, except ExecuteJs which is explicitly renamed to "executeJs".

Variant	Wire value	Description
`Click`	`click`	Click on an element matching the given CSS selector. — Fields: `selector`: `String`
`TypeText`	`type`	Type text into an element matching the given CSS selector. — Fields: `selector`: `String`, `text`: `String`
`Press`	`press`	Press a keyboard key (e.g. "Enter", "Tab", "Escape"). — Fields: `key`: `String`
`Scroll`	`scroll`	Scroll the page or a specific element. — Fields: `direction`: `ScrollDirection`, `selector`: `String`, `amount`: `i64`
`Wait`	`wait`	Wait for a duration or for an element to appear. — Fields: `milliseconds`: `i64`, `selector`: `String`
`Screenshot`	`screenshot`	Take a screenshot of the current page. — Fields: `full_page`: `bool`
`ExecuteJs`	`executeJs`	Execute arbitrary JavaScript in the page context. Safety: The script runs with full page privileges in the browser context. Only execute scripts from trusted sources. — Fields: `script`: `String`
`Scrape`	`scrape`	Scrape the current page HTML.

ScrollDirection¶

Direction for a scroll action.

Variant	Wire value	Description
`Up`	`up`	Scroll upward.
`Down`	`down`	Scroll downward.

Edit this page on GitHub