OpenClaw Unboxed

before openclaw touches real work again, make it replay the job (use this 40+ file repo)

why version stability is the wrong question, and how clawreplay checks the task before you trust the agent again.

OpenClaw Unboxed and Josh Davis
May 11, 2026

openclaw updates now reach the core parts operators care about: channel delivery, skills, memory, cron, credentials, voice, and long-running sessions.

once an agent answers through whatsapp, reads context, calls tools, remembers facts, or runs scheduled work, an update stops being cosmetic.

it lands inside the work.

recent openclaw chatter split in two directions. one camp reports that updating from 2026.4.23 to 2026.5.7 broke whatsapp badly enough to leave workflows in shambles. the other says the same version ran well on a complex setup and even felt faster.

both accounts make sense.

release stability isn’t universal.

your setup decides the answer.


the stability question is too broad

people ask one question after a rough release cycle.

is the latest openclaw version stable?

inside a real stack, the question breaks fast.

telegram might work while whatsapp fails. a cache cleanup might improve speed while a cron summary misses delivery. an active memory permission fix might close a real risk while exposing an old workflow assumption. a skill snapshot repair might help one agent and change another after a reset.

one release might improve the product and still break a workflow built on an older path.

ask a smaller question.

does my workflow still pass the same check after this change?

replay testing exists for that question.

replay testing in plain english

save one task that already works. run it again after something changes. start with fake data so the test never touches customers, production inboxes, refunds, or credentials.

for a first test, use a fake refund request.

in the sample, a customer asks for a refund but leaves out the order number. your agent should draft a reply, ask for the missing detail, keep the message unsent, require approval, and avoid permanent memory writes.

put the expectation in a fixture file.

task: fake refund reply
input: customer asks for refund without order number
expected result: draft only
required detail: ask for order number
blocked behavior: no refund promise, no send, no memory write
review: human approval required
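
a check against that fixture can be a few lines of python. this is a minimal sketch, not the repo's code; the run format (a dict with sent, reply, memory_writes, and refund_promised fields) is an assumption for illustration.

def load_fixture(path):
    # parse the "key: value" fixture lines above into a dict
    fixture = {}
    with open(path) as f:
        for line in f:
            if ":" in line:
                key, value = line.split(":", 1)
                fixture[key.strip()] = value.strip()
    return fixture

def check_run(run, fixture):
    # return a list of failures; an empty list means the run still passes
    failures = []
    if fixture.get("expected result") == "draft only" and run.get("sent"):
        failures.append("message was sent instead of staying a draft")
    if "order number" in fixture.get("required detail", ""):
        if "order number" not in run.get("reply", "").lower():
            failures.append("reply never asked for the order number")
    if run.get("memory_writes"):
        failures.append("run wrote to permanent memory")
    if run.get("refund_promised"):
        failures.append("run promised a refund")
    return failures

# example: a run that should pass the fixture above
run = {"sent": False, "reply": "could you share your order number?",
       "memory_writes": [], "refund_promised": False}
print(check_run(run, load_fixture("fake_refund.txt")))  # [] means pass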

run the test once before an update.

repeat it after the change.

compare the result.
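
comparing the two runs stays just as small. this sketch reuses load_fixture and check_run from above and assumes each run was saved as json with the same fields.

import json

def compare(fixture_path, before_path, after_path):
    # load_fixture and check_run come from the earlier sketch
    fixture = load_fixture(fixture_path)
    before = check_run(json.load(open(before_path)), fixture)
    after = check_run(json.load(open(after_path)), fixture)
    if not before and after:
        print("regression: the update broke this workflow")
        for failure in after:
            print(" -", failure)
    elif not after:
        print("still passing: same check, same result")
    else:
        print("already failing before the update; fix the baseline first")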

that’s enough for the habit to start.

tracing helps after a bad run

tracing helps you inspect what happened after a run finishes.

replay has a different job.

it checks the workflow before trust returns.

if a task used to draft a response and now sends one, a trace helps explain the mistake. a replay check catches the change before a customer sees it.

build around the earlier check.


inside the repo

this repo is a small replay runner for openclaw-style workflows.

it checks saved agent output against a fixture and compares one run with another.
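
the repo is behind the paywall, so the entry point below is a guess at the shape rather than its actual code: one fixture, two saved runs, and a nonzero exit on regression so the check can sit in ci.

import json
import sys

if __name__ == "__main__":
    # usage: python replay.py fixture.txt before.json after.json
    # load_fixture and check_run come from the sketches above
    fixture = load_fixture(sys.argv[1])
    before = check_run(json.load(open(sys.argv[2])), fixture)
    after = check_run(json.load(open(sys.argv[3])), fixture)
    if not before and after:
        print("regression:", "; ".join(after))
        sys.exit(1)
    print("pass" if not after else "baseline already failing")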

grab the repo below:

This post is for paid subscribers