Local Web Access¶
Ralph Workflow is a free and open-source AI agent orchestrator built around a simple core loop inspired by the original Ralph loop. That simple core composes into a stronger workflow system for serious repo work, and the default workflow is already strong enough to start with before you customize anything.
Ralph Workflow supports three related web capabilities: search, visit, and crawl. They solve different problems, so the fastest way to stay productive is to pick the smallest tool that fits the job.
Search vs. visit vs. crawl¶
Need |
Tool |
What you get |
|---|---|---|
Find relevant pages |
|
Titles, URLs, and snippets |
Read one page |
|
Extracted readable text from a single URL |
Traverse many pages or scrape a JS-heavy site |
upstream crawler |
Multi-page traversal and structured extraction |
The default choice: visit_url¶
visit_url is Ralph Workflow’s built-in page reader. It fetches a single URL and returns readable extracted text without any extra setup.
Use it when you want to:
read one documentation page
inspect a changelog, release note, or issue
follow up on a URL returned by
web_search
It ships with Ralph Workflow and works out of the box.
When to use an upstream crawler¶
For multi-page crawls, JavaScript-rendered SPAs, or structured extraction, Ralph Workflow can delegate to a local upstream MCP server.
Use an upstream crawler when you need to:
crawl an entire docs site
scrape a JavaScript-heavy application
extract structured content with selectors or schemas
This path requires extra configuration in .agent/mcp.toml because the crawler runs locally as a separate service.
Choosing between them¶
Need |
Tool |
|---|---|
Fetch one static HTML page |
|
JavaScript-rendered SPA |
|
Multi-page crawl |
|
Structured extraction |
|
Safety posture¶
SSRF guard¶
The built-in visit_url tool blocks requests to loopback, private-network, link-local, multicast, and other reserved address ranges by default. That means it will reject localhost, 127.0.0.1, and private IPs unless you explicitly relax the guard.
This is intentional: it reduces the risk of exposing internal services by accident.
When to enable private-network access¶
Set allow_private_networks = true in [web_visit] only when you understand the trade-off and the environment is appropriately isolated.
Typical cases:
Ralph Workflow runs in an isolated container or VM
you intentionally want access to local development servers
a CI runner has its own dedicated network boundary
Timeouts and size limits¶
visit_url enforces:
a 15-second default timeout per request
a 2 MiB maximum response body size
These limits keep fetch operations from growing into unbounded background work.
Tool names you will see¶
The most important names are:
web_search— search the webvisit_url— read one pageralph_upstream__crawl4ai__crawl— crawl through an upstream crawler when configuredralph_upstream__crawl4ai__crawl_many— batch crawling through that upstream crawler