the free openclaw compute ladder
How builders run agents 80% free and only pay when deep reasoning actually matters.
Most AI agent setups fail for one reason.
They rely on one AI provider.
When that provider rate limits or goes down, the agent retries the same provider until everything stops working.
You hit 429 errors.
Your quota burns.
Workflows stall.
You don’t need unlimited AI.
You need a compute ladder.
Fast models handle most tasks.
Fallback providers keep the system alive.
Paid models only run when they actually matter.
This stack’s currently running in my own OpenClaw environment using Groq, OpenRouter, Ollama, and OpenAI.
the compute ladder
tier 4 openai/gpt-4o
tier 3 openrouter/moonshotai/kimi-k2:free
tier 2 openrouter/z-ai/glm-4.5-air:free
tier 1 groq/llama-3.1-8b-instant
tier 0 ollama/qwen2.5:0.5b
Most tasks finish on tier 1.
Tier 2 and tier 3 act as fallback providers if earlier models fail.
Tier 4’s the break-glass paid model for deep reasoning, complex coding, or long analysis.
Tier 0 runs locally so your system never completely dies even if every cloud provider fails.
how requests move through the ladder
request
↓
groq fast model
↓
glm reasoning fallback
↓
kimi heavier reasoning
↓
paid model if needed
↓
local model safety net
Most requests resolve at the top.
Lower tiers only activate when needed.
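OpenClaw handles this routing internally from the config in step 4, but the traversal logic can be sketched in Python. This is a hypothetical illustration, not OpenClaw's implementation: `call_model` stands in for a real provider SDK call, and a simulated Groq outage shows the fallback firing. Only the model IDs come from the ladder above.

```python
# Hypothetical sketch of ladder traversal. Model IDs match the ladder
# above; call_model and RateLimitError are stand-ins for real SDK calls.
LADDER = [
    "groq/llama-3.1-8b-instant",           # tier 1: fast free model
    "openrouter/z-ai/glm-4.5-air:free",    # tier 2: reasoning fallback
    "openrouter/moonshotai/kimi-k2:free",  # tier 3: heavier reasoning
    "openai/gpt-4o",                       # tier 4: break-glass paid
    "ollama/qwen2.5:0.5b",                 # tier 0: local safety net
]

FAILED = {"groq/llama-3.1-8b-instant"}  # simulate a Groq outage

class RateLimitError(Exception):
    pass

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real provider call; one provider is 'down'."""
    if model in FAILED:
        raise RateLimitError(model)
    return f"{model}: ok"

def route(prompt: str) -> str:
    for model in LADDER:
        try:
            return call_model(model, prompt)      # first success wins
        except (RateLimitError, TimeoutError):
            continue                              # 429/timeout: next tier
    raise RuntimeError("every tier failed")
```

With Groq simulated as down, a request lands on the tier 2 GLM model instead of looping on retries.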
before you start
This assumes you’ve already got OpenClaw installed and running.
If not, follow the beginner install guide here
You should already be able to run:
openclaw status
openclaw gateway status
openclaw doctor
step 1 install the local fallback model
Install Ollama if you don’t already have it.
https://ollama.com
Pull the local model:
ollama pull qwen2.5:0.5b
Test it:
ollama run qwen2.5:0.5b
Type:
say hello in one sentence
If it replies, your local fallback works.
Press Ctrl+C to exit.
step 2 create provider keys
You’ll need API keys for:
Groq
https://console.groq.com
OpenRouter
https://openrouter.ai
OpenAI
https://platform.openai.com
During OpenClaw onboarding you can connect these providers.
Note: OpenRouter free models have rate limits. Expect roughly 20 requests per minute and daily caps unless credits are added.
That’s why Groq sits at the top of the ladder.
step 3 open the openclaw dashboard
Run:
openclaw dashboard
If authentication’s required:
openclaw config get gateway.auth.token
If a token doesn’t exist:
openclaw doctor --generate-gateway-token
Paste the token when prompted.
step 4 configure the ladder
Open the dashboard.
Go to:
config → raw json
Paste this:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "groq/llama-3.1-8b-instant",
        "fallbacks": [
          "openrouter/z-ai/glm-4.5-air:free",
          "openrouter/moonshotai/kimi-k2:free",
          "openai/gpt-4o"
        ]
      },
      "heartbeat": {
        "model": "ollama/qwen2.5:0.5b"
      }
    }
  }
}
Save the configuration.
Restart OpenClaw if prompted.
step 5 verify providers
Run:
openclaw models status --probe
You should see:
providers authenticated
models resolving
fallback chain visible
That confirms your ladder’s working.
step 6 verify inside openclaw
Open a chat session and run:
/model status
You should see the active model and fallback chain.
step 7 add a routing guardrail
Send this once to the agent:
you are running on a multi provider ai ladder.
rules
use the primary model first
if a provider returns 429 or timeout move immediately to the next fallback
never retry the same provider more than once
use the paid model only for deep reasoning or coding
prefer free models for short tasks
That prevents retry loops that burn tokens.
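The guardrail above is a prompt, but the same rules can be expressed as a routing function to make the behavior concrete. A sketch under my own assumptions: the keyword heuristic for "deep reasoning" is hypothetical, not anything OpenClaw ships.

```python
# Sketch of the guardrail rules as code. The deep-task keywords are a
# made-up heuristic for illustration; model IDs match the ladder config.
FREE_LADDER = [
    "groq/llama-3.1-8b-instant",
    "openrouter/z-ai/glm-4.5-air:free",
    "openrouter/moonshotai/kimi-k2:free",
]
PAID = "openai/gpt-4o"
LOCAL = "ollama/qwen2.5:0.5b"

def pick_model(task: str, failed: set[str]) -> str:
    # Rule: paid model only for deep reasoning or coding.
    deep = any(w in task.lower() for w in ("debug", "refactor", "prove"))
    if deep and PAID not in failed:
        return PAID
    # Rule: prefer free models; never retry a provider that failed.
    for model in FREE_LADDER:
        if model not in failed:
            return model
    if PAID not in failed:
        return PAID
    return LOCAL  # local safety net always answers
```

Because `failed` accumulates per request, a provider that returned a 429 is never asked twice, which is exactly what the prompt rule enforces.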
step 8 test the ladder
Test 1
Ask:
summarize this article in three sentences
Expected result: fast response using Groq.
Test 2
Disable your Groq key temporarily.
Ask the same question.
Expected result: OpenClaw falls back to OpenRouter.
Test 3
Switch to the local model:
/model ollama/qwen2.5:0.5b
Ask:
confirm heartbeat is active
You should receive a local response.
example workflow using the ladder
Here’s a simple real task to test the stack.
Paste this prompt into your OpenClaw chat.
analyze this article and return:
1 short summary
3 key takeaways
1 action step
keep the response under 150 words
Most runs will complete on the Groq fast model.
If Groq fails, the task automatically moves down the ladder.
That’s the entire point.
Reliable agents without burning paid tokens.
important note about fallbacks
Fallback models activate when providers fail.
They don’t activate because a task’s complex.
If you want to intentionally use a stronger model:
/model openai/gpt-4o
the real goal
The goal isn’t just free AI.
The goal’s resilient AI infrastructure.
A system that:
stays online during provider outages
keeps most tasks free
escalates only when necessary
stays easy to debug
That’s what this stack provides.



