Can You Actually Run AI Coding Agents Offline? A Practical Guide to Local LLM Development
Air-gapped development with Ollama, local models, and coding agents that don't phone home. What works, what breaks, and how to build an offline pipeline that produces real software.
Most AI coding tools assume you're online. With Claude, ChatGPT, or Copilot, your prompts and the code you send with them are processed in someone else's datacenter. That's fine for open-source side projects. It's not fine when you work on proprietary code, HIPAA data, defense contracts, or infrastructure code that can't leave your network.
So the question comes up constantly: can you actually do serious coding with a local LLM and no internet?
The short answer: yes, and the workflow layer matters more than the model.
What works (better than you'd expect)
Ollama + Continue + VS Code is the current sweet spot. Run a 14B–32B parameter model on an RTX 4090 or M-series MacBook and you get competent autocomplete, decent function generation, and solid refactoring. The models that matter:
| Model | Parameters | Strengths | RAM needed (approx.) |
|---|---|---|---|
| Codestral 22B | 22B | Fill-in-middle, function gen | ~16 GB |
| DeepSeek Coder V2 Lite | 16B (MoE) | Reasoning, architecture | ~12 GB |
| Qwen 2.5 Coder | 14B/32B | Tool calling, long context | 10–24 GB |
| Llama 4 Scout | 17B active, 109B total (MoE) | General coding + explanation | ~65 GB or more |
For most tasks shorter than about 200 lines of code, the difference from cloud models is often small enough that you won't notice it. The gap shows up in long-context refactoring and complex multi-file orchestration, which is where the workflow layer earns its keep.
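The RAM column in the table can be sanity-checked with simple arithmetic: weight size is roughly parameter count times bits per weight. The sketch below assumes about 4.5 bits per weight (typical of 4-bit quantized builds) and a 20% allowance for KV cache and runtime buffers. Both numbers are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9
    return params_billion * bits_per_weight / 8


def estimate_total_gb(params_billion: float, bits_per_weight: float,
                      overhead_fraction: float = 0.2) -> float:
    """Weights plus an assumed allowance for KV cache and runtime buffers."""
    weights = estimate_weight_gb(params_billion, bits_per_weight)
    return weights * (1 + overhead_fraction)


# For mixture-of-experts models, use the TOTAL parameter count here:
# every expert has to be resident in memory, not just the active ones.
for label, size in [("14B", 14), ("22B", 22), ("32B", 32)]:
    total = estimate_total_gb(size, bits_per_weight=4.5)
    print(f"{label}: roughly {total:.1f} GB")
```

Longer context windows push the overhead above 20%, so treat the result as a floor rather than a guarantee.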
What breaks (and how to fix it)
1. Tool calling degrades
Local models are generally less reliable at tool use than Claude or GPT-4. They'll drift off-spec, call the wrong tool, or hallucinate parameters. The fix: phase gates. Break your run into analysis → plan → implement → verify. Each phase produces a concrete artifact that must pass a gate check before the next phase starts. If the agent drifts during implementation, the verification phase catches it because it reads the plan artifact, not the agent's memory.
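A minimal sketch of that gate pattern, assuming each phase writes a text artifact to disk and verification compares files rather than model state. This is not Ralph Workflow's implementation; `Phase`, `run_pipeline`, `read_items`, and the `step:`/`done:` artifact format are all hypothetical, and the phases are stubs where a real run would prompt the local model.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Phase:
    name: str
    run: Callable[[Path], str]   # may read earlier artifacts; returns new artifact text
    gate: Callable[[str], bool]  # True when the artifact is good enough to continue


def run_pipeline(phases: list[Phase], workdir: Path) -> list[str]:
    """Run phases in order, persisting each artifact; stop at the first failed gate."""
    completed: list[str] = []
    for phase in phases:
        artifact = phase.run(workdir)
        # Persist before gating so a failed artifact can still be inspected.
        (workdir / f"{phase.name}.md").write_text(artifact, encoding="utf-8")
        if not phase.gate(artifact):
            raise RuntimeError(f"gate failed after phase '{phase.name}'")
        completed.append(phase.name)
    return completed


def read_items(path: Path) -> set[str]:
    """Parse 'label: item' lines from an artifact file into a set of items."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.split(": ", 1)[1] for line in lines if ": " in line}


def verify(workdir: Path) -> str:
    # Compares the two files on disk, not anything the model "remembers".
    planned = read_items(workdir / "plan.md")
    done = read_items(workdir / "implement.md")
    missing = sorted(planned - done)
    return "PASS" if not missing else "FAIL missing: " + ", ".join(missing)


# Stub phases standing in for model calls.
phases = [
    Phase("plan", lambda d: "step: add-parser\nstep: add-tests", lambda a: "step: " in a),
    Phase("implement", lambda d: "done: add-parser\ndone: add-tests", lambda a: bool(a.strip())),
    Phase("verify", verify, lambda a: a == "PASS"),
]

with tempfile.TemporaryDirectory() as tmp:
    print(run_pipeline(phases, Path(tmp)))
```

If the implement stub skipped a planned step, the verify gate would raise instead of letting the run continue, which is the behavior the paragraph above describes.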
2. Context window exhaustion
Frontier cloud models offer context windows of 128K to 200K tokens or more. Local models on consumer hardware are usually run at 32K–64K at most, because the context cache competes with the model weights for memory. That means you can't throw your entire codebase at the agent and ask it to refactor. The fix: scope phases to single files or small modules. The Ralph Workflow phase-gate architecture was designed for exactly this constraint: each phase gets a bounded input, produces a bounded output, and hands off cleanly.
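One way to keep each phase's input bounded is to estimate token counts per file and group files into batches that fit a budget. The sketch below uses a rough 4-characters-per-token heuristic, which is an assumption that varies by tokenizer and language. The function names are hypothetical.

```python
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model and language


def estimate_tokens(text: str) -> int:
    """Cheap token estimate that avoids loading a tokenizer."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def batch_by_budget(token_counts: dict[str, int], budget: int) -> list[list[str]]:
    """Greedily group files so each batch's estimated tokens stay within budget."""
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, tokens in token_counts.items():
        if tokens > budget:
            raise ValueError(f"{name} alone exceeds the budget; split it first")
        if current and used + tokens > budget:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        batches.append(current)
    return batches


# Hypothetical per-file estimates and a budget that leaves room for the reply.
counts = {"parser.py": 9_000, "lexer.py": 7_000, "cli.py": 12_000, "utils.py": 3_000}
print(batch_by_budget(counts, budget=16_000))
```

Leave headroom in the budget for the system prompt, the plan artifact, and the model's own output, since all of them share the same window.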
3. No web search / browsing
Your online agent finds the docs, reads the API reference, checks Stack Overflow. The offline agent can't. The fix: pre-load relevant docs into the workspace before starting the run. Copy API reference files, relevant documentation, and example code into the project directory. The agent can search local files as well as it can search the web; it just needs the right files.
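As an illustration of what "search local files" can mean in practice, here is a small case-insensitive search over a docs directory that an agent tool could call. It is a sketch under assumptions: the `search_docs` name, the suffix list, and the `docs/` layout are hypothetical, and a real setup might use ripgrep or an index instead.

```python
from pathlib import Path


def search_docs(root: Path, query: str,
                suffixes: tuple[str, ...] = (".md", ".txt", ".rst")) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line text) for each line containing query."""
    needle = query.lower()
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # errors="ignore" keeps the search going past oddly encoded files.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


# Example call, assuming reference material was copied into ./docs beforehand:
# for rel_path, lineno, line in search_docs(Path("docs"), "rate limit"):
#     print(f"{rel_path}:{lineno}: {line}")
```

Returning the file and line number lets the agent quote the exact passage it relied on, which makes its output easier to review.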
4. Model selection gets stuck on one size
The temptation is to run everything through the biggest local model you can fit. But a 32B reasoning model is overkill for "rename this variable" and a 7B speed model is underkill for "design the database schema." The fix: cost-aware model routing — even offline. Route simple tasks to smaller/faster models and reserve the big ones for architecture and review.
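A routing rule does not need to be sophisticated to help. The sketch below picks a model tag from keywords in the task description. The keyword lists and the `pick_model` function are hypothetical, and the tags are Qwen 2.5 Coder sizes from the Ollama library, used here only as an illustration of small, medium, and large tiers.

```python
SMALL = "qwen2.5-coder:7b"    # fast, for mechanical edits
MEDIUM = "qwen2.5-coder:14b"  # default for ordinary implementation work
LARGE = "qwen2.5-coder:32b"   # slow, for design and review

# Illustrative keyword lists; tune them to your own task vocabulary.
HEAVY_KEYWORDS = ("design", "architecture", "schema", "review")
LIGHT_KEYWORDS = ("rename", "format", "typo", "docstring")


def pick_model(task: str) -> str:
    """Route a task description to a model tier by simple keyword matching."""
    text = task.lower()
    # Check heavy keywords first so "review the rename" still gets the big model.
    if any(word in text for word in HEAVY_KEYWORDS):
        return LARGE
    if any(word in text for word in LIGHT_KEYWORDS):
        return SMALL
    return MEDIUM


for task in ("Rename this variable", "Design the database schema", "Add a retry helper"):
    print(f"{task!r} -> {pick_model(task)}")
```

In a phase-gated run, the phase name is an even simpler routing key than keywords: plan and verify go to the large model, implement goes to the medium one.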
The workflow layer is the differentiator
Here's the pattern that most people miss: much of the quality gap between a local 14B model and Claude comes not from the model but from what surrounds it. Claude Code has a built-in workflow (permission system, conversation management, tool orchestration), while your local model is just an API endpoint.
When you add a phase-gated workflow on top of a local model — one that enforces analysis artifacts, planning documents, and verification checkpoints — the local model's output approaches the quality of an attended cloud session. Not because the model got smarter, but because the workflow prevented the failure modes that eat 80% of the token budget.
Try it tonight on your own hardware
- Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
- Pull a coding model:
ollama pull qwen2.5-coder:14b
- Write a one-paragraph spec for a task you've been putting off
- Run it through a phase-gated workflow and see what comes out
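Once the model is pulled, a quick way to confirm it responds is to call Ollama's local HTTP API directly. This sketch assumes Ollama is serving on its default address, `http://localhost:11434`, and uses the non-streaming form of the `/api/generate` endpoint. The helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally served model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = json.loads(reply.read().decode("utf-8"))
    return body["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(generate("qwen2.5-coder:14b", "Write a Python function that reverses a string."))
```

Nothing in this request leaves the machine, which makes it a convenient smoke test before wiring the model into a larger workflow.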
The first run will surprise you. Not because the model is amazing — because the structure prevents the mistakes you didn't know you were accepting from attended cloud sessions.
Build your offline pipeline: First-task guide
Primary repo (Codeberg): RalphWorkflow/Ralph-Workflow
GitHub mirror: Ralph-Workflow
Ralph Workflow is vendor-neutral — it works with any API endpoint, including local Ollama instances. No cloud dependency required.