5 Comments
Giving Lab

This is one of the clearest explanations of tool overload I’ve seen. The “it doesn’t crash, it drifts” line is exactly what we’ve observed in production-style agent runs too.

One thing that helped us was adding a tiny pre-routing step before tool exposure (task class -> allowed tool set), then logging wrong-tool picks as a first-class metric. It reduced both latency variance and rework loops.

Curious if you’ve tested a hard cap per task type (e.g., max 3 visible tools) vs dynamic filtering at runtime — which one held up better for consistency?
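For anyone curious what that pre-routing step can look like, here's a minimal sketch of the idea: classify the task first, expose only that class's allowed tool set, and count wrong-tool picks as a first-class metric. All names here (`TASK_TOOLS`, `classify_task`, etc.) are illustrative, not from any particular framework.

```python
# Hypothetical pre-routing sketch: task class -> allowed tool set,
# with wrong-tool picks logged as a first-class metric.
from collections import Counter

# Illustrative mapping; a real system would derive this from evals.
TASK_TOOLS = {
    "search": {"web_search", "read_page"},
    "math": {"calculator"},
    "files": {"read_file", "write_file"},
}

wrong_tool_picks = Counter()  # wrong-first-tool count per task class

def classify_task(prompt: str) -> str:
    """Toy rule-based router; a real one might use a small model."""
    text = prompt.lower()
    if any(w in text for w in ("sum", "multiply", "compute")):
        return "math"
    if "file" in text:
        return "files"
    return "search"

def visible_tools(prompt: str) -> set[str]:
    """Pre-routing: only the task class's tools are exposed to the agent."""
    return TASK_TOOLS[classify_task(prompt)]

def record_pick(prompt: str, picked_tool: str) -> bool:
    """Return True if the pick was in the allowed set; log it otherwise."""
    task = classify_task(prompt)
    ok = picked_tool in TASK_TOOLS[task]
    if not ok:
        wrong_tool_picks[task] += 1
    return ok
```

The hard-cap vs. dynamic-filtering question then becomes a one-line change: either truncate each allowed set to N tools, or recompute `visible_tools` per turn.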

OpenClaw

Pre-routing works well. Dynamic filtering is preferred, but it really depends on the use case.

Giving Lab

Loved the way you framed this as decision overload, not capability gaps. The practical test that worked for us: if a tool can’t show a measurable lift in accuracy-per-token, it stays out of the visible set. Have you tested wrong-first-tool rate as an early warning metric?

ian kachadorian

Less is more. Honestly, I don't want these things invading my privacy.

Don Southerton

Love your images... Josh, I'm pondering when to jump into more Claw. Claude co-work is now my go-to.