Skip to content

MCP Tools Reference

Kreuzcrawl implements a Model Context Protocol (MCP) server that exposes web scraping, crawling, and mapping capabilities as tools for LLM agents. The server uses stdio transport and is started via kreuzcrawl mcp or programmatically with kreuzcrawl::mcp::start_mcp_server().

Server Info

Field Value
Name kreuzcrawl-mcp
Title Kreuzcrawl Web Crawling MCP Server
Transport stdio
Capabilities Tools

Tools

scrape

Scrape a single URL and extract content as markdown or JSON.

Annotations: read_only_hint = true, idempotent_hint = true

Parameters:

Parameter Type Required Default Description
url string Yes -- URL to scrape (must start with http:// or https://)
format string No "markdown" Output format: "markdown" or "json"
use_browser boolean No false Force browser rendering instead of HTTP fetch (requires browser feature)

Returns: Text content containing the page content in the requested format. Markdown format includes the converted page content. JSON format includes the full ScrapeResult as structured JSON.


crawl

Crawl a website following links up to a configured depth.

Annotations: read_only_hint = true

Parameters:

Parameter Type Required Default Description
url string Yes -- Starting URL for the crawl
max_depth integer No Engine default Maximum link depth from the start URL (max: 100)
max_pages integer No Engine default Maximum number of pages to crawl (1--100,000)
format string No "markdown" Output format: "markdown" or "json"
stay_on_domain boolean No Engine default Restrict crawling to the same domain

Returns: Text content with results for all discovered pages. In markdown format, each page is separated by ---. In JSON format, each page is a structured object.

Validation:

  • max_depth must be <= 100
  • max_pages must be between 1 and 100,000

map

Discover all pages on a website via links and sitemaps.

Annotations: read_only_hint = true, idempotent_hint = true

Parameters:

Parameter Type Required Default Description
url string Yes -- URL of the website to map
limit integer No No limit Maximum number of URLs to return
search string No -- Case-insensitive substring filter for discovered URLs
respect_robots_txt boolean No Engine default Whether to respect robots.txt directives

Returns: Text content with the list of discovered URLs and their sitemap metadata (last modified, change frequency, priority).


batch_scrape

Scrape multiple URLs concurrently.

Annotations: read_only_hint = true

Parameters:

Parameter Type Required Default Description
urls string[] Yes -- List of URLs to scrape (must not be empty)
format string No "markdown" Output format: "markdown" or "json"
concurrency integer No Engine default Maximum number of concurrent requests

Returns: Text content with results for each URL, separated by ---. Failed URLs include an error message.


download

Download a document from a URL.

Annotations: read_only_hint = true

Parameters:

Parameter Type Required Default Description
url string Yes -- URL to download
max_size integer No 50 MB Maximum document size in bytes

Returns: JSON text with document metadata:

{
  "url": "https://example.com/doc.pdf",
  "mime_type": "application/pdf",
  "size": 102400,
  "filename": "doc.pdf",
  "content_hash": "abc123..."
}

If the URL returns HTML instead of a downloadable document, the response includes a note field explaining this.


get_version

Get the current kreuzcrawl library version.

Annotations: read_only_hint = true, idempotent_hint = true

Parameters: None (empty object {}).

Returns: JSON text:

{
  "version": "0.1.0"
}

Planned Tools (Not Yet Implemented)

The following tools are defined but return placeholder messages. They require specific feature flags to be enabled at compile time.

screenshot

Capture a screenshot of a URL.

Requires: browser feature

Parameter Type Required Description
url string Yes URL to capture
full_page boolean No Capture full page vs viewport only

interact

Execute browser actions on a page (click, type, scroll, etc.).

Requires: interact feature

Parameter Type Required Description
url string Yes URL to navigate to before executing actions
actions object[] Yes Sequence of browser actions

research

AI-driven research across multiple pages.

Requires: ai feature

Parameter Type Required Description
query string Yes Research query or question
max_depth integer No Maximum crawl depth per seed URL
max_pages integer No Maximum total pages to visit
seed_urls string[] No Optional seed URLs to start from

crawl_status

Check the status of a crawl job.

Parameter Type Required Description
job_id string No Job ID to check status for

Error Handling

MCP tool errors are mapped from CrawlError variants:

CrawlError Variant MCP Error Code Description
InvalidConfig -32602 (INVALID_PARAMS) Invalid configuration parameter
All other variants -32603 (INTERNAL_ERROR) Network, browser, or server errors with descriptive context

Error messages preserve the original context to aid debugging.

Usage with Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "kreuzcrawl": {
      "command": "kreuzcrawl",
      "args": ["mcp"]
    }
  }
}