Skip to content

Docker

Kreuzcrawl ships two Dockerfile variants: a full API server image and a minimal CLI image. Both use multi-stage builds with Debian Bookworm for reproducible, small runtime images.

Dockerfile variants

API server (docker/Dockerfile)

The default Dockerfile builds the full API server with these features enabled:

  • api -- REST API server (Firecrawl v1-compatible)
  • mcp -- Model Context Protocol server
  • warc -- WARC 1.1 archival output
  • ai -- Deep research agent
  • tracing -- Structured logging

The image exposes port 3000 and starts the API server by default.

CLI (docker/Dockerfile.cli)

A lighter image with only the warc feature enabled. No server, no AI. Intended for one-shot scrape/crawl/map commands.

Building

API server image

docker build -t kreuzcrawl:latest -f docker/Dockerfile .

CLI image

docker build -t kreuzcrawl-cli:latest -f docker/Dockerfile.cli .

BuildKit caches

Both Dockerfiles use --mount=type=cache for the Cargo registry, git index, and build target directory. Ensure BuildKit is enabled (DOCKER_BUILDKIT=1) for significantly faster rebuilds.

Running

API server

docker run -d \
  --name kreuzcrawl \
  -p 3000:3000 \
  -e RUST_LOG=info \
  kreuzcrawl:latest

The server binds to 0.0.0.0:3000 by default. Override with custom arguments:

docker run -d \
  -p 8080:8080 \
  kreuzcrawl:latest \
  serve --host 0.0.0.0 --port 8080

CLI commands

# Scrape a single URL
docker run --rm kreuzcrawl-cli:latest scrape https://example.com

# Crawl with depth limit
docker run --rm kreuzcrawl-cli:latest crawl https://example.com --depth 2 --format markdown

# Map a site
docker run --rm kreuzcrawl-cli:latest map https://example.com --limit 50

MCP server

docker run -i --rm kreuzcrawl:latest mcp

The MCP server uses stdio transport, so the container must run with -i (interactive stdin).

Environment variables

Variable Default Description
RUST_LOG info Log level filter. Supports tracing EnvFilter syntax (e.g. debug, kreuzcrawl=trace).

Health checks

The API server image includes a Docker HEALTHCHECK directive:

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD ["curl", "-f", "http://localhost:3000/health"]

The /health endpoint returns:

{ "status": "ok", "version": "0.1.0" }

Orchestrators (Docker Compose, Kubernetes, Cloud Run) can use this to detect when the service is ready.

Security

Both images follow container security best practices:

  • Non-root user -- The runtime stage creates a dedicated kreuzcrawl user and group. The binary runs as this unprivileged user.
  • Minimal base -- The runtime uses debian:bookworm-slim with only ca-certificates (and curl for health checks in the API image).
  • No secrets in image -- Configuration is passed via environment variables or command-line arguments at runtime.

Docker Compose example

services:
  kreuzcrawl:
    build:
      context: .
      dockerfile: docker/Dockerfile
    ports:
      - "3000:3000"
    environment:
      RUST_LOG: info
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      start_period: 5s
      retries: 3
    restart: unless-stopped

With a browser sidecar

For browser-based crawling inside Docker, run Chrome as a sidecar:

services:
  kreuzcrawl:
    build:
      context: .
      dockerfile: docker/Dockerfile
    ports:
      - "3000:3000"
    environment:
      RUST_LOG: info
    depends_on:
      chrome:
        condition: service_healthy

  chrome:
    image: chromedp/headless-shell:latest
    ports:
      - "9222:9222"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9222/json/version"]
      interval: 10s
      timeout: 5s
      retries: 3

Then configure the browser endpoint to point to the Chrome sidecar:

# In CrawlConfig or via the API
browser.endpoint = "ws://chrome:9222/devtools/browser/..."

CLI image has no browser

The CLI image (Dockerfile.cli) does not include Chrome or Chromium. For browser-based features, use the API server image with an external browser endpoint.