CLI Usage¶

The kreuzcrawl CLI provides commands for scraping, crawling, site mapping, and running the API and MCP servers.

kreuzcrawl <COMMAND> [OPTIONS]

Commands¶

scrape¶

Scrape a single URL and extract metadata.

kreuzcrawl scrape <URL> [OPTIONS]

Arguments:

Argument	Required	Description
`URL`	Yes	URL to scrape

Options:

Flag	Default	Description
`--format <FORMAT>`	`json`	Output format: `json` or `markdown`
`--proxy <URL>`	--	Proxy URL
`--user-agent <STRING>`	--	Custom user agent
`--timeout <MS>`	`30000`	Request timeout in milliseconds
`--respect-robots-txt`	`false`	Respect robots.txt
`--browser-mode <MODE>`	`auto`	Browser mode: `auto`, `always`, or `never`
`--browser-endpoint <WS_URL>`	--	CDP WebSocket endpoint for an external browser

Examples:

# Scrape a page as JSON (default)
kreuzcrawl scrape https://example.com

# Scrape as markdown
kreuzcrawl scrape https://example.com --format markdown

# Scrape through a proxy with custom timeout
kreuzcrawl scrape https://example.com --proxy http://proxy:8080 --timeout 60000

# Force browser rendering for a JS-heavy page
kreuzcrawl scrape https://quotes.toscrape.com/js/ --browser-mode always --format markdown

# Connect to an external browser via CDP
kreuzcrawl scrape https://example.com --browser-endpoint ws://127.0.0.1:9222/devtools/browser/...

Output:

json format: Prints the full ScrapeResult as pretty-printed JSON to stdout.
markdown format: Prints the markdown content to stdout. If no markdown content is available, prints an error to stderr.

crawl¶

Crawl a website following links.

kreuzcrawl crawl <URL>... [OPTIONS]

Arguments:

Argument	Required	Description
`URL`	Yes (one or more)	Seed URL(s) to crawl. Multiple URLs triggers batch mode.

Options:

Flag	Short	Default	Description
`--depth <N>`	`-d`	`2`	Maximum crawl depth
`--max-pages <N>`	`-n`	--	Maximum pages to crawl
`--concurrent <N>`	`-c`	`10`	Maximum concurrent requests
`--rate-limit <MS>`	--	`200`	Rate limit delay between requests per domain (ms)
`--format <FORMAT>`	--	`json`	Output format: `json` or `markdown`
`--proxy <URL>`	--	--	Proxy URL
`--user-agent <STRING>`	--	--	Custom user agent
`--timeout <MS>`	--	`30000`	Request timeout in milliseconds
`--respect-robots-txt`	--	`false`	Respect robots.txt
`--stay-on-domain`	--	`false`	Stay on the same domain
`--browser-mode <MODE>`	--	`auto`	Browser mode: `auto`, `always`, or `never`
`--browser-endpoint <WS_URL>`	--	--	CDP WebSocket endpoint for an external browser

Examples:

# Crawl with default settings (depth 2, 10 concurrent)
kreuzcrawl crawl https://example.com

# Crawl deeper with more concurrency
kreuzcrawl crawl https://example.com -d 5 -c 20 --max-pages 500

# Crawl and output as markdown
kreuzcrawl crawl https://example.com --format markdown --stay-on-domain

# Batch crawl multiple seed URLs
kreuzcrawl crawl https://example.com https://example.org -d 1

# Force browser rendering during crawl
kreuzcrawl crawl https://quotes.toscrape.com/js/ --browser-mode always --format markdown

Output:

Single URL, json format: Prints the CrawlResult as pretty-printed JSON.
Single URL, markdown format: Prints each page separated by --- with URL header.
Multiple URLs, json format: Prints an array of { seed_url, result } objects.
Multiple URLs, markdown format: Prints each page with seed URL and page URL headers.

map¶

Discover all URLs on a website via sitemaps and link extraction.

kreuzcrawl map <URL> [OPTIONS]

Arguments:

Argument	Required	Description
`URL`	Yes	URL to map

Options:

Flag	Default	Description
`--limit <N>`	--	Maximum URLs to return
`--search <STRING>`	--	Filter URLs by substring
`--respect-robots-txt`	`false`	Respect robots.txt

Examples:

# Discover all URLs
kreuzcrawl map https://example.com

# Limit results and filter
kreuzcrawl map https://example.com --limit 50 --search "/docs/"

Output: Prints one URL per line to stdout.

serve¶

Start the REST API server. Requires the api feature.

kreuzcrawl serve [OPTIONS]

Options:

Flag	Default	Description
`--host <ADDRESS>`	`0.0.0.0`	Host address to bind to
`--port <PORT>`	`3000`	Port to listen on

Examples:

# Start on default port
kreuzcrawl serve

# Start on custom host and port
kreuzcrawl serve --host 127.0.0.1 --port 8080

The server prints a startup message to stderr and runs until interrupted.

mcp¶

Start the MCP server using stdio transport. Requires the mcp feature.

kreuzcrawl mcp

No options. The server communicates via stdin/stdout using the MCP protocol. Startup messages are printed to stderr.

Usage with Claude Desktop:

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "kreuzcrawl": {
      "command": "kreuzcrawl",
      "args": ["mcp"]
    }
  }
}

Exit Codes¶

Code	Meaning
`0`	Success
`1`	Error (scrape/crawl/map failure, server startup failure)

Errors are printed to stderr. Successful output goes to stdout.

Edit this page on GitHub