Docker¶
Kreuzcrawl ships two Dockerfile variants: a full API server image and a minimal CLI image. Both use multi-stage builds with Debian Bookworm for reproducible, small runtime images.
Dockerfile variants¶
API server (docker/Dockerfile)¶
The default Dockerfile builds the full API server with these features enabled:
api-- REST API server (Firecrawl v1-compatible)mcp-- Model Context Protocol serverwarc-- WARC 1.1 archival outputai-- Deep research agenttracing-- Structured logging
The image exposes port 3000 and starts the API server by default.
CLI (docker/Dockerfile.cli)¶
A lighter image with only the warc feature enabled. No server, no AI. Intended for
one-shot scrape/crawl/map commands.
Building¶
API server image¶
CLI image¶
BuildKit caches
Both Dockerfiles use --mount=type=cache for the Cargo registry, git index, and
build target directory. Ensure BuildKit is enabled (DOCKER_BUILDKIT=1) for
significantly faster rebuilds.
Running¶
API server¶
The server binds to 0.0.0.0:3000 by default. Override with custom arguments:
CLI commands¶
# Scrape a single URL
docker run --rm kreuzcrawl-cli:latest scrape https://example.com
# Crawl with depth limit
docker run --rm kreuzcrawl-cli:latest crawl https://example.com --depth 2 --format markdown
# Map a site
docker run --rm kreuzcrawl-cli:latest map https://example.com --limit 50
MCP server¶
The MCP server uses stdio transport, so the container must run with -i (interactive stdin).
Environment variables¶
| Variable | Default | Description |
|---|---|---|
RUST_LOG |
info |
Log level filter. Supports tracing EnvFilter syntax (e.g. debug, kreuzcrawl=trace). |
Health checks¶
The API server image includes a Docker HEALTHCHECK directive:
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD ["curl", "-f", "http://localhost:3000/health"]
The /health endpoint returns:
Orchestrators (Docker Compose, Kubernetes, Cloud Run) can use this to detect when the service is ready.
Security¶
Both images follow container security best practices:
- Non-root user -- The runtime stage creates a dedicated
kreuzcrawluser and group. The binary runs as this unprivileged user. - Minimal base -- The runtime uses
debian:bookworm-slimwith onlyca-certificates(andcurlfor health checks in the API image). - No secrets in image -- Configuration is passed via environment variables or command-line arguments at runtime.
Docker Compose example¶
services:
kreuzcrawl:
build:
context: .
dockerfile: docker/Dockerfile
ports:
- "3000:3000"
environment:
RUST_LOG: info
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
interval: 30s
timeout: 10s
start_period: 5s
retries: 3
restart: unless-stopped
With a browser sidecar¶
For browser-based crawling inside Docker, run Chrome as a sidecar:
services:
kreuzcrawl:
build:
context: .
dockerfile: docker/Dockerfile
ports:
- "3000:3000"
environment:
RUST_LOG: info
depends_on:
chrome:
condition: service_healthy
chrome:
image: chromedp/headless-shell:latest
ports:
- "9222:9222"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:9222/json/version"]
interval: 10s
timeout: 5s
retries: 3
Then configure the browser endpoint to point to the Chrome sidecar:
CLI image has no browser
The CLI image (Dockerfile.cli) does not include Chrome or Chromium.
For browser-based features, use the API server image with an external browser endpoint.