Discussion about this post

Giving Lab:

Loved the way you framed this as decision overload, not capability gaps. The practical test that worked for us: if a tool can’t show a measurable lift in accuracy-per-token, it stays out of the visible set. Have you tested wrong-first-tool rate as an early warning metric?
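For concreteness, that visibility test can be sketched roughly like this (a minimal sketch; the tool names, accuracy numbers, and threshold are illustrative, not from any real system):

```python
# Sketch of the visibility test described above: a tool stays in the
# visible set only if its measured lift in task accuracy, divided by the
# token overhead of exposing it, clears a threshold. All values are
# illustrative placeholders.

def accuracy_per_token_lift(acc_with: float, acc_without: float,
                            extra_tokens: int) -> float:
    """Accuracy gain from exposing the tool, per token of added overhead."""
    if extra_tokens <= 0:
        raise ValueError("tool must add measurable token overhead")
    return (acc_with - acc_without) / extra_tokens

# (accuracy with tool, accuracy without, extra prompt tokens) per tool
candidates = {
    "search_docs": (0.82, 0.70, 400),
    "fancy_graph": (0.71, 0.70, 900),
}

THRESHOLD = 1e-4  # minimum lift-per-token to remain visible

visible = [name for name, (w, wo, t) in candidates.items()
           if accuracy_per_token_lift(w, wo, t) >= THRESHOLD]
# search_docs: 0.12 / 400 = 3e-4  -> stays visible
# fancy_graph: 0.01 / 900 ~ 1e-5 -> dropped from the visible set
```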

Giving Lab:

This is one of the clearest explanations of tool overload I’ve seen. The “it doesn’t crash, it drifts” line is exactly what we’ve observed in production-style agent runs too.

One thing that helped us was adding a tiny pre-routing step before tool exposure (task class -> allowed tool set), then logging wrong-tool picks as a first-class metric. It reduced both latency variance and rework loops.

Curious if you’ve tested a hard cap per task type (e.g., max 3 visible tools) vs dynamic filtering at runtime — which one held up better for consistency?
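The pre-routing step is roughly the following (a minimal sketch; the task classes, tool names, and allowed sets are illustrative):

```python
# Minimal sketch of pre-routing: map a task class to the tool subset the
# model is allowed to see, then log any pick outside that set as a
# first-class "wrong-tool" metric. Names here are illustrative.

ALLOWED_TOOLS = {
    "retrieval": ["search_docs", "fetch_page"],
    "math": ["calculator"],
    "code": ["run_python", "lint"],
}

wrong_tool_picks = []  # first-class metric: out-of-set picks

def visible_tools(task_class: str) -> list[str]:
    """Return the capped tool set exposed to the model for this task class."""
    return ALLOWED_TOOLS.get(task_class, [])

def record_pick(task_class: str, picked_tool: str) -> bool:
    """Check a tool pick against the allowed set; log it if out-of-set."""
    ok = picked_tool in visible_tools(task_class)
    if not ok:
        wrong_tool_picks.append((task_class, picked_tool))
    return ok

# Example: a math task that reaches for a retrieval tool gets flagged.
record_pick("math", "calculator")    # in-set
record_pick("math", "search_docs")   # out-of-set, logged
wrong_tool_rate = len(wrong_tool_picks) / 2  # 0.5 over these two picks
```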
