Local Web Access

Ralph Workflow is a free and open-source AI agent orchestrator built around a simple core loop inspired by the original Ralph loop. That simple core composes into a stronger workflow system for serious repo work, and the default workflow is already strong enough to start with before you customize anything.

Ralph Workflow supports three related web capabilities: search, visit, and crawl. They solve different problems, so the fastest way to stay productive is to pick the smallest tool that fits the job.

Search vs. visit vs. crawl

| Need | Tool | What you get |
|---|---|---|
| Find relevant pages | web_search | Titles, URLs, and snippets |
| Read one page | visit_url | Extracted readable text from a single URL |
| Traverse many pages or scrape a JS-heavy site | upstream crawler | Multi-page traversal and structured extraction |

The default choice: visit_url

visit_url is Ralph Workflow’s built-in page reader. It fetches a single URL and returns readable extracted text without any extra setup.

Use it when you want to:

  • read one documentation page

  • inspect a changelog, release note, or issue

  • follow up on a URL returned by web_search

It ships with Ralph Workflow and needs no separate MCP server or extra configuration.
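This page does not describe how visit_url extracts text. As a rough illustration of what "readable extracted text" means, the sketch below uses Python's standard html.parser to keep visible text, drop head, script, and style content, and collapse whitespace. ReadableTextExtractor, extract_readable_text, and the tag sets are hypothetical choices for the example, not Ralph Workflow's implementation.

```python
from html.parser import HTMLParser


class ReadableTextExtractor(HTMLParser):
    """Collect visible text, skipping <head>, script, and style content."""

    # Hypothetical choices: which tags hide text and which ones start a new line.
    SKIP_TAGS = {"head", "script", "style", "noscript", "template"}
    BLOCK_TAGS = {"p", "div", "li", "br", "tr", "pre",
                  "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace inside each line and drop blank lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def extract_readable_text(html: str) -> str:
    parser = ReadableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, extract_readable_text("<p>Hello <b>world</b></p>") returns "Hello world". Real extractors also strip navigation, footers, and boilerplate, which this sketch does not attempt.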

When to use an upstream crawler

For multi-page crawls, JavaScript-rendered SPAs, or structured extraction, Ralph Workflow can delegate to a local upstream MCP server.

Use an upstream crawler when you need to:

  • crawl an entire docs site

  • scrape a JavaScript-heavy application

  • extract structured content with selectors or schemas

This path requires extra configuration in .agent/mcp.toml because the crawler runs locally as a separate service.
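To make "multi-page traversal" concrete, here is a generic breadth-first crawl that stays on the start URL's host and stops after max_pages. It is not crawl4ai's algorithm or Ralph Workflow's delegation code. The get_links callback is a hypothetical stand-in for whatever fetches a page and returns its links, which keeps the example offline.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin, urlparse


def crawl_site(start_url: str,
               get_links: Callable[[str], Iterable[str]],
               max_pages: int = 50) -> List[str]:
    """Breadth-first crawl that stays on the start URL's host.

    get_links is a hypothetical callback: given a page URL, it returns the
    href values found on that page (fetching and parsing are left to it).
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments so each page is queued once.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

A production crawler adds fetching, JavaScript rendering, rate limiting, and retries on top of a loop like this one.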

Choosing between them

| Need | Tool |
|---|---|
| Fetch one static HTML page | visit_url |
| JavaScript-rendered SPA | ralph_upstream__crawl4ai__crawl |
| Multi-page crawl | ralph_upstream__crawl4ai__crawl_many |
| Structured extraction | ralph_upstream__crawl4ai__crawl with an extraction schema |
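The choice reduces to a lookup from need to tool. The sketch restates that guidance in Python. Need, TOOL_FOR_NEED, and pick_tool are hypothetical names, and the upstream_configured flag stands in for whether .agent/mcp.toml defines the crawler. In practice the agent picks tools itself; this only encodes the rule of thumb.

```python
from enum import Enum


class Need(Enum):
    FIND_PAGES = "find relevant pages"
    STATIC_PAGE = "fetch one static HTML page"
    JS_RENDERED = "JavaScript-rendered SPA"
    MULTI_PAGE = "multi-page crawl"
    STRUCTURED = "structured extraction"


# Smallest tool for each need, copied from the tables on this page.
TOOL_FOR_NEED = {
    Need.FIND_PAGES: "web_search",
    Need.STATIC_PAGE: "visit_url",
    Need.JS_RENDERED: "ralph_upstream__crawl4ai__crawl",
    Need.MULTI_PAGE: "ralph_upstream__crawl4ai__crawl_many",
    Need.STRUCTURED: "ralph_upstream__crawl4ai__crawl",  # called with an extraction schema
}

UPSTREAM_PREFIX = "ralph_upstream__"


def pick_tool(need: Need, upstream_configured: bool) -> str:
    """Return the tool name for a need; upstream tools require crawler setup."""
    tool = TOOL_FOR_NEED[need]
    if tool.startswith(UPSTREAM_PREFIX) and not upstream_configured:
        raise LookupError(f"{tool} needs an upstream crawler configured in .agent/mcp.toml")
    return tool
```

Raising instead of silently falling back to visit_url is a deliberate choice in this sketch: a single-page fetch of a JavaScript-rendered SPA would likely return little useful text.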

Safety posture

SSRF guard

The built-in visit_url tool blocks requests to loopback, private-network, link-local, multicast, and other reserved address ranges by default. That means it will reject localhost, 127.0.0.1, and private IPs unless you explicitly relax the guard.

This is intentional: it reduces the risk of server-side request forgery (SSRF), where a URL the agent is asked to fetch ends up exposing internal services by accident.
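The page does not say how the guard is implemented. A minimal sketch of the same idea, assuming the check resolves the host and rejects any address in a blocked range, can rely on the flags in Python's ipaddress module. is_blocked_address and check_url are hypothetical names; the allow_private_networks parameter mirrors the setting of that name in [web_visit].

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(ip: str) -> bool:
    """True for loopback, private, link-local, multicast, reserved, or unspecified IPs."""
    addr = ipaddress.ip_address(ip)
    # Check IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) as the IPv4 address it wraps.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return (
        addr.is_loopback
        or addr.is_private
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def check_url(url: str, allow_private_networks: bool = False) -> None:
    """Raise PermissionError if the URL's host resolves to a blocked address."""
    if allow_private_networks:
        return
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host in {url!r}")
    # Check every address the name resolves to, not just the literal hostname.
    for info in socket.getaddrinfo(host, None):
        ip = info[4][0]
        if is_blocked_address(ip):
            raise PermissionError(f"{url!r} resolves to blocked address {ip}")
```

A production guard also has to connect to the exact address it checked; otherwise DNS rebinding can swap in a private address between the check and the request. This sketch does not handle that.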

When to enable private-network access

Set allow_private_networks = true in [web_visit] only when you understand the trade-off and the environment is appropriately isolated.

Typical cases:

  • Ralph Workflow runs in an isolated container or VM

  • you intentionally want access to local development servers

  • a CI runner has its own dedicated network boundary

Timeouts and size limits

visit_url enforces:

  • a 15-second default timeout per request

  • a 2 MiB maximum response body size

These limits keep fetch operations from growing into unbounded background work.
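As an illustration only, the two limits map onto a few lines of standard-library Python. TIMEOUT_SECONDS, MAX_BODY_BYTES, read_capped, fetch, and ResponseTooLarge are hypothetical names; the values come from the list above. Note that urlopen's timeout bounds each blocking socket operation rather than the whole transfer, which may differ from how visit_url measures its timeout.

```python
import urllib.request
from typing import BinaryIO

TIMEOUT_SECONDS = 15               # 15-second default per request
MAX_BODY_BYTES = 2 * 1024 * 1024   # 2 MiB cap on the response body


class ResponseTooLarge(Exception):
    """Raised when a body exceeds the size cap."""


def read_capped(stream: BinaryIO, limit: int = MAX_BODY_BYTES,
                chunk_size: int = 64 * 1024) -> bytes:
    """Read a body of at most `limit` bytes, failing as soon as it is exceeded."""
    buf = bytearray()
    while True:
        # Never ask for more than one byte past the limit.
        chunk = stream.read(min(chunk_size, limit + 1 - len(buf)))
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > limit:
            raise ResponseTooLarge(f"response body exceeds {limit} bytes")


def fetch(url: str) -> bytes:
    # urlopen's timeout applies to each blocking socket operation (connect,
    # each read), not the total transfer; a hard overall deadline needs more work.
    with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as resp:
        return read_capped(resp)
```

Reading one byte past the limit is what distinguishes a body of exactly 2 MiB, which is accepted, from an oversized one, and the incremental loop stops downloading as soon as the cap is crossed.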

Tool names you will see

The most important names are:

  • web_search — search the web

  • visit_url — read one page

  • ralph_upstream__crawl4ai__crawl — crawl through an upstream crawler when configured

  • ralph_upstream__crawl4ai__crawl_many — batch crawling through that upstream crawler
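The upstream names appear to follow a ralph_upstream__<server>__<tool> pattern. That pattern is inferred from the two names listed here, not documented on this page. Under that assumption, a small helper can tell built-in tools apart from delegated ones; UpstreamTool and parse_tool_name are hypothetical.

```python
from typing import NamedTuple, Optional

UPSTREAM_PREFIX = "ralph_upstream__"  # assumed prefix, inferred from the listed names


class UpstreamTool(NamedTuple):
    server: str
    tool: str


def parse_tool_name(name: str) -> Optional[UpstreamTool]:
    """Split an upstream tool name; return None for built-ins like visit_url."""
    if not name.startswith(UPSTREAM_PREFIX):
        return None
    # Split on the first double underscore after the prefix: server, then tool.
    server, sep, tool = name[len(UPSTREAM_PREFIX):].partition("__")
    if not sep or not server or not tool:
        raise ValueError(f"malformed upstream tool name: {name!r}")
    return UpstreamTool(server, tool)
```

Splitting on the first double underscore keeps single underscores inside tool names such as crawl_many intact.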