This resonated more than I expected. The framing around “human-paced” subscriptions vs agent-paced reality is exactly where the tension shows up in practice. What stood out to me is that most builders aren’t trying to extract unfair value — they’re just letting automation do what it’s explicitly marketed to do: run quietly, repeatedly, and reliably. When that behaviour collides with pricing models built around pauses and intent, the breakage feels less like abuse and more like a design mismatch. It’s encouraging to see this acknowledged openly. The next evolution clearly isn’t just better agents, but pricing and limits that assume AI is becoming infrastructure, not a chat box.
Yes, exactly. I see specific agent onboarding, with its own tier, coming to most platforms in the near future.
So people got mad that when they bypassed the TOS, they got banned?
Back when I was using Crew and Autogen, the go-to was to do something like this with ChatGPT. Instead of paying for the API, why not just use OAuth and pay the flat rate? It doesn't harm anyone to break the rules, right?
I mean, let's just ignore the security implications for a second and say it's all fine and dandy. GPU and RAM time are expensive. Like, incredibly so.
I think more than just flat-rate plans, we need better routing.
I said it in 2024 and I still think it now: agent frameworks need to really hammer in "use my local models for X, Y, and Z" and "determine which model is best for this prompt, and which agent / sub-agent to assign."
When your agents stop spamming Opus-class models with millions of tokens, start with your free and smaller models first, and use the super-ultra ones only for the absolute bare minimum, you end up spending so much less. Rather than more flat-rate subs popping up, I hope we see subscriptions to better routing.
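The "cheapest model first, frontier only when necessary" policy described above can be sketched as a tiered router. Everything here is hypothetical: the tier names, thresholds, and the keyword heuristic are placeholders for a real complexity classifier, not any product's actual API.

```python
# Hypothetical cost-tiered router: pick the cheapest model whose
# "ceiling" covers the estimated complexity of the prompt.
TIERS = [
    ("local-small", 0.3),  # free local model (e.g. served via Ollama)
    ("mid-hosted", 0.7),   # mid-range hosted model
    ("frontier", 1.0),     # expensive frontier model, last resort
]

def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a real classifier: longer prompts and
    'hard task' keywords score higher, capped at 1.0."""
    score = min(len(prompt) / 2000, 0.6)
    if any(kw in prompt.lower() for kw in ("prove", "refactor", "architecture")):
        score += 0.3
    return min(score, 1.0)

def pick_model(prompt: str) -> str:
    """Return the cheapest tier that can plausibly handle the prompt."""
    c = estimate_complexity(prompt)
    for name, ceiling in TIERS:
        if c <= ceiling:
            return name
    return TIERS[-1][0]  # fall through to the frontier tier
```

In a real setup the heuristic would itself be a small local model, but the escalation shape is the same: most prompts never touch the expensive tier.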
Yes. Orchestrating a model ascension layer is key.
Yeah, I tried with Crew back in the beginning and failed miserably. Granted, back then getting local models to tool-call was a headache in itself. I'm sure someone with much more experience than me will do it long before I figure it out, and I think it will make all the difference for average everyday folks: using your local GPU to run a small routing model that takes the prompt and feeds it to the proper agent / model. A small model that focuses exclusively on HITL and routing, with the rest of the hardware allocated to context/memory, choosing which model and agent / swarm to send to, then taking everything back from those agents and relaying it between them like a smart PM managing the flow between stakeholders.
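The "small PM" flow described here could look something like the sketch below. The agent labels, the keyword-based `classify` method, and the lambda agents are all hypothetical stand-ins for real local or hosted model calls; the point is just the shape of the dispatcher.

```python
# Hypothetical PM-style dispatcher: a small local router classifies
# each prompt, hands it to the matching specialist agent, and keeps
# a shared memory of every exchange.
from dataclasses import dataclass, field

@dataclass
class Router:
    agents: dict                          # label -> agent callable
    memory: list = field(default_factory=list)  # shared context/memory

    def classify(self, prompt: str) -> str:
        """Stand-in for a small local routing model."""
        p = prompt.lower()
        if "bug" in p or "error" in p:
            return "debugger"
        if "summar" in p:
            return "summarizer"
        return "generalist"

    def dispatch(self, prompt: str) -> str:
        """Route one prompt, record the exchange, return the reply."""
        label = self.classify(prompt)
        reply = self.agents[label](prompt)
        self.memory.append((label, prompt, reply))
        return reply

router = Router(agents={
    "debugger": lambda p: f"[debugger] looking at: {p}",
    "summarizer": lambda p: f"[summarizer] tl;dr of: {p}",
    "generalist": lambda p: f"[generalist] answering: {p}",
})
```

A real version would replace `classify` with an actual small model and the lambdas with agent/swarm calls, but the PM-like loop of classify, dispatch, and record is the core idea.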
Again -- likely someone smarter than myself though xD
Couldn't agree more. We built exactly this with Manifest (manifest.build). It routes each prompt to the right model based on complexity, supports local models through Ollama, and tracks cost per request. Open source. We just published a deep dive on where OpenClaw costs actually go and how to fix them: https://clawsnewsletter.substack.com/p/how-to-stop-burning-money-on-openclaw
Exactly 😂 Like I said, people smarter than me will figure this out. I literally spent all of 2024 trying to manage routing for Crew, and all I had to do was wait two years for it to be solved for me. Thank you for sharing this.
This has huge implications for the next frontier of models to be developed and for the new economy of the internet marketplace.
Once the incentive shifts to building for AI agents, we are officially in the abundance era, and the economics start looking and behaving crazy.
Agreed. I could see a subscription for agents coming to a lot of the apps we use, with instant API ramping.
I wonder how the pricing dynamics will play out as the smaller models get smarter. If something equivalent to Moore's law plays out (e.g., models half the size get 2x smarter every x months), then with the right scaffolding and a Mac mini, users may not need the frontier models often, and the $200 subscriptions with lots of tokens would remain mostly to smooth out cash flows.
The infrastructure math checks out, but I reckon it's the intended design, not a flaw. Anthropic prices consumer subs as acquisition, not profit; a moderate Pro user gets roughly a 92% discount on equivalent API value. Agent loops just broke the assumption that humans type at human speed. Did the full breakdown here: https://reading.sh/why-your-expensive-claude-subscription-is-actually-a-steal-02f10893940c?sk=65a39127cbd10532ba642181ba41fb8a