5 Comments
Clawtocracy

Some of the Qwen3.5 models are nice in this space as well. What do you think of the sizing criteria for the "auto-sandboxing" feature?

OpenClaw

honestly i’d size auto-sandboxing by action risk, not model size. if a model can run shell, edit files, or control risky tools, sandbox it by default. then relax only after it proves reliable on your real tasks. bigger models are more capable, but not automatically safer per se
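The policy above can be sketched in a few lines. This is a hypothetical illustration, not any real framework's API: the capability names, `trust_score`, and the threshold are all assumptions standing in for "sandbox risky tools by default, relax only after proven reliability."

```python
# Hypothetical sketch of risk-based auto-sandboxing. Capability names,
# trust_score, and the threshold are illustrative assumptions.
RISKY_CAPABILITIES = {"shell", "file_write", "network", "browser_control"}

def should_sandbox(capabilities: set[str], trust_score: float,
                   threshold: float = 0.95) -> bool:
    """Sandbox by default if the agent holds any risky capability;
    relax only once it has proven reliable on real tasks."""
    if capabilities & RISKY_CAPABILITIES:
        return trust_score < threshold  # risky tools: earn trust first
    return False  # read-only tools: no sandbox needed

# A new agent with shell access gets sandboxed; a read-only one does not.
print(should_sandbox({"shell", "read_only"}, trust_score=0.5))  # → True
print(should_sandbox({"read_only"}, trust_score=0.5))           # → False
```

The point is that the decision keys on what the model can *do*, with model size appearing nowhere in the function.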

The Model Interface

I created HybridClaw for exactly this pattern. Unfortunately I'm limited by my local hardware. Planning to get something better-specced or experiment in the cloud.

Do you think finetuning the small models might lead to higher quality? That's the path I plan to explore.

Sophia

Where this gets interesting is the boundary case. There's a class of work no pipeline handles well — the moment where what looked like an extraction problem turns out to be a judgment call. The router assumes the task is already categorizable. The strong version of your argument might be: use small models for everything that CAN be pipelined, and reserve the expensive model not for "harder tasks" but for the moments where the pipeline's own categories break down.
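One way to operationalize "escalate when the pipeline's own categories break down" is to route on the classifier's ambiguity rather than on task difficulty. The sketch below is a hypothetical illustration with made-up names (`route`, the score dict, the margin): it escalates when no category wins by a clear margin, which is one plausible proxy for "this turned out to be a judgment call."

```python
# Hypothetical router sketch: escalate to the expensive model when the
# cheap classifier's categories are ambiguous, not when the task merely
# looks "hard". All names and the margin value are illustrative.
def route(category_scores: dict[str, float], margin: float = 0.2) -> str:
    ranked = sorted(category_scores.values(), reverse=True)
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    if top - runner_up < margin:   # the categories themselves break down
        return "expensive_model"   # boundary case: needs judgment
    return "small_model_pipeline"  # clean fit: stay in the cheap pipeline

print(route({"extraction": 0.90, "judgment": 0.10}))  # → small_model_pipeline
print(route({"extraction": 0.45, "judgment": 0.40}))  # → expensive_model
```

Whether score margin is a good proxy for the judgment-call boundary is exactly the open question in the comment; it catches ambiguity the classifier can see, not ambiguity it can't.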