Skip to main content
loomcycle
§ release note

We drove a fresh Claude Code session as the operator — and the first experiment found a DeepSeek bug we'd shipped without noticing.

To pressure-test loomcycle from the outside, we set up an isolated sandbox, installed the brew binary, and drove it through a fresh Claude Code session as the operator — talking to loomcycle over MCP, with no internal shortcuts and no API access we wouldn't expose to a real third party. Every experiment was designed in advance by us; every step was executed by Claude through the same tool surface (`spawn_run`, `list_agents`, `agentdef`, `interruption_resolve`, …) a community operator would see.

The point of running the experiments through Claude rather than a hand-written script is that the agent surfaces gaps a script wouldn't. A script does what the author wrote. An agent does what the system tells it is possible — and an honest agent surfaces the moments where the system's docs, error messages, and capability gates don't line up. We get an evaluation of the operator-facing wire surface at the same time we get the experiment's result.

The series is four experiments, each escalating one notch in surface area:

  1. exp1 — tool access. Can a coding agent actually use the built-in tools (Read / Write / Bash / WebSearch) the operator enabled? (This post.)
  2. exp2 — interruption. Can an agent ask a Yes/No question, hold its run while the human decides, and branch on the answer? (This post.)
  3. exp3 — multi-agent loop. Three agents iterating a shared artifact over Channels + Memory + Evaluation, with a third aggregating. (Tomorrow's post.)
  4. exp4 — real dev workflow. Real Gitea webhooks, third-party MCP, Telegram — closed-loop coder → PR → reviewer → merge. (Day after.)

This post covers exp1 + exp2 on v0.22.0 and what shipped in v0.23.0 to close the gaps each one surfaced. The headline finding — F10, a DeepSeek adapter bug — would have remained invisible without a real agent exercising the loop on a non-Anthropic provider.

Experiment 1 — can the agent actually use the built-in tools?

The simplest possible test: a generic code-guru agent ( allowed tools [Read, Write, Edit, Bash, Grep, Glob, WebSearch, WebFetch]; model deepseek-v4-pro) is asked to write a Python program generating the first 100 prime numbers, save it as exp1/generate_primes.py, run it, verify the output, and report PASS/FAIL.

We picked DeepSeek deliberately — Anthropic-OAuth is the "easy" path we already exercise during internal development, and provider-adapter bugs hide there. Real operators run mixed fleets.

Run 1 — failed at mkdir

F10 · v0.22.0 · openai-compat adapterThe first Bash call (mkdir -p exp1) returned empty stdout. The DeepSeek API returned 400: messages[3] missing field content. The loop died.

Tracing it: the agent's first tool_use was a Bash{mkdir -p exp1}. mkdir -p on success prints nothing — empty stdout. Loomcycle's openai-compat path (used for DeepSeek's API) was serializing the tool-result message with no content field when the content was empty, on the assumption that the OpenAI message schema treats content as optional. DeepSeek's implementation does not — it requires content on every message, even if it's the empty string. So the loop's second LLM call returned 400 mid-conversation, and the agent never got to write the file.

The agent had working tool access — the first Bash call dispatched, executed, returned. The provider adapter aborted the next round-trip. From the operator's perspective this looks like "the tools don't work"; from the runtime's perspective the tools worked perfectly.

The fix (#379, RFC Q in v0.23.0): the openai-compat serializer now always emits content on tool messages, defaulting to the empty string when the actual content is empty. The regression test makes the empty-content path explicit: a Bash that produces no stdout no longer trips DeepSeek. This unblocks every tool-using agent on DeepSeek — every other provider tolerated the missing field, so the bug had been invisible against Anthropic / OpenAI / Gemini.

Run 2 — PASS, with independent verification

Workaround on v0.22.0 was simple: make every shell command print something. The agent then completed the loop cleanly — wrote generate_primes.py (trial division to isqrt(n), even-skip), ran it, read the CSV back, reported PASS. Independent verification by recomputing primes in a separate process: header index,prime, exactly 100 data rows, indices 1..100, values match the first 100 primes (0 mismatches), first/last = 2 / 541. Total tokens: 38,317 in / 2,884 out.

Built-in tool access is real. Two real bugs surfaced along the way:

Plus one already-known gap that bit:

Experiment 2 — interruption (human-in-the-loop) gating

The second experiment: a yesno-agent asks one Yes/No question ("Approve the deployment?" — options Yes/No), blocks until answered, and outputs RESULT: PROCEED on Yes or RESULT: ABORT on No. Three checks:

  1. Pass 1 — Claude (the operator) self-answers the question. The agent should wake and branch.
  2. Pass 2 — Claude marshals the question to a human via the IDE UI; the human answers; the agent should wake on the human's answer.
  3. Timeout hold — confirm the agent is genuinely suspended (not crashed) while waiting.

Both passes worked end-to-end. The agent held cleanly, the answer arrived, the branch was correct. The interesting moment was which resolve endpoint we used.

F15 — cross-runtime wake fails silently

F15 · v0.22.0 · in-process busPass 1, first attempt: we resolved the interrupt via the Claude Code plugin's MCP server. The DB row flipped to resolved. The run never woke. After 85 seconds the run was still running — the only events on its SSE stream were : keepalive.

Root cause: Interruption.ask blocks on channels.Bus.Wait("intr:<id>") — an in-process bus. The resolve handler calls bus.Notify on whichever process received the resolve. The plugin MCP server and the HTTP server (:8788) on v0.22.0 were two separate processes sharing the SQLite store but each with its own in-process bus. The plugin's bus.Notify woke no one; the HTTP server's blocking ask never received the signal.

Resolving via the HTTP server's own MCP endpoint (POST /v1/_mcpinterruption_resolve) woke the agent immediately and the branch was correct. So the feature works — the bug was that which process you resolved from mattered, and the only error indication was the run silently never waking until the 1-hour timeout.

What shipped in v0.23.0: instead of fixing the cross-process notify directly, the runtime got a different topology — the thin-client MCP mode (#381, RFC R). loomcycle mcp --upstream <runtime-url> launches a stdio MCP server that proxies to a single full runtime, instead of starting its own. Every tool call lands on the one process that owns the in-process bus. The cross-runtime topology that triggered F15 stops arising in normal use; the operator-facing pattern is "one full runtime, N thin-client MCP entry points."

The breaking change in the same release (#383): loomcycle mcp --no-http is removed. The pattern that needed it — running two full runtimes side-by-side, one with HTTP suppressed — is the pattern that caused F15. Removing the option is more honest than documenting the footgun.

A real fix for the original code path — bus.Notify via durable cross-runtime wake — shipped two days later (#400) for the rare topology that genuinely needs multiple full runtimes (HA across separate hosts). The thin-client topology is the supported answer for everything else.

The engineering lesson worth keeping

Provider-adapter correctness is load-bearing in a way unit tests miss. The F10 bug had been live in internal/providers/openai/serialize.go for months. Unit tests against the OpenAI API (which tolerates the missing field) passed; integration against DeepSeek (which doesn't) had never been exercised on a Bash that returns empty stdout. The shape of the bug — "every other provider tolerated it" — is the hardest class to catch without a real agent driving the loop. Hand-written integration tests would have had to know to send an empty content deliberately; an agent doing mkdir -p sends it accidentally on the first call.

The discipline going forward: every provider in provider_priority must pass an end-to-end smoke test where the agent's first tool call returns empty stdout. The smoke test is now in the v0.23.x CI matrix. We don't ship a new openai-compat adapter without it.

And on F15 / F11 — the right answer to "this topology has subtle failure modes" is sometimes "stop supporting that topology." The thin-client pattern (RFC R) collapses what used to be a multi-process coordination problem into a single-process invariant. Less code, fewer ways to be wrong, and the failure mode that bit us in exp2 dissolves rather than being patched.

Next post: the MCP server wedged the IDE on a list — head-of-line blocking, and why killing the process was the only release. The same operator-driven framing, now finding a wedge bug that turned a long spawn_run into "your whole MCP transport is frozen for an hour."