Multi-agent refine loop, 0.92 → 0.98 in 5 hops — and the silent default-deny that almost made it look like the agents weren't talking.
Day three of the four-experiment series. Day one exercised one agent and surfaced a DeepSeek adapter bug. Day two hit a wedge in the MCP transport that turned a long spawn_run into a frozen IDE. Day three escalates: three agents iterating a shared artifact over Channels + Memory + Evaluation, with a third agent aggregating the best result. The question they're refining: What is recursion? The reward signal nudges them toward laconic + metaphoric answers. Five hops, then stop.
The convergence is satisfying. Hop 1 was already pretty good — "A mirror staring into a mirror — the same question, repeated in miniature, until it's small enough to answer outright." Score 0.92. By hop 5 the answerer (with iterative feedback from the evaluator) had tightened it to "A mirror facing a mirror — each reflection a smaller twin of the last, until there's nothing left to reflect." Score 0.98. A later sonnet-driven re-run pushed it further to "Mirrors facing mirrors, until one touches bedrock" at 1.0 laconic + 0.98 metaphor.
That's the easy part of the post. The interesting part — the part worth reading — is everything we had to fix to make the loop run cleanly. The substrate primitives all worked the way they were designed. The runtime just didn't tell us when something was missing.
The experiment
Three agents, on the same loomcycle instance, communicating over the substrate:
exp3-answerer— receives a question via channelexp3-ch1, generates an answer, writes it to user-scoped Memory underanswer:hopN, publishes a hop signal onexp3-ch2. Listens for the evaluator's advice and refines on the next hop.exp3-evaluator— wakes onexp3-ch2, reads the latest answer from Memory, scores it via theEvaluationtool on three rubrics (laconic, metaphor, faithfulness), writesadvice:hopNto Memory, and pingsexp3-ch1for the next hop. At hop 5 signalsexp3-ch3to wake the aggregator.exp3-aggregator— retrieves all 5 evaluations viaEvaluation.list_for_run, picks the highest-scoring hop, declares a winner, and retires the 12 memory keys.
Each agent self-loops via channel subscriptions; the loop self-terminates at hop 5. Two of the three are LLM agents (Deepseek-v4-pro); a later re-run pinned all three to claude-sonnet-4-6 via Anthropic-OAuth.
This is a real integration test. Channels carry the hop signals. Memory holds the shared artifact. Evaluation produces the score. Context (the self-describing meta-tool) lets each agent discover what's available. Every primitive in scope.
The convergence (the "easy" part)
| Hop | Laconic | Metaphor | Faithfulness | Overall |
|---|---|---|---|---|
| 1 | 0.90 | 0.95 | 0.90 | 0.92 |
| 2 | 0.95 | 0.93 | 0.92 | 0.93 |
| 3 | 0.97 | 0.98 | 0.95 | 0.97 |
| 4 | 0.96 | 0.97 | 0.96 | 0.96 |
| 5 | 0.98 | 0.99 | 0.97 | 0.98 ← winner |
Independent verification (by another agent, not the loop): answers extracted from Memory.set answer:hopN, scores from Evaluation.submit, aggregator's pick (hop 5) matches the true max. The answers measurably sharpened (laconic 0.90→0.98) while faithfulness held — the base case ("nothing left to reflect") and the self-reference ("smaller twin of the last") are both preserved in the final version.
Five things that almost made it look like the agents weren't talking
Before the loop converged, it ran through five distinct failure modes. None of them was the substrate behaving incorrectly. All of them were the runtime silently doing the right thing in a way that looked exactly like the wrong thing.
1. F21 — Memory silently refused every op
F21 · capability-gate footgunEvery Memory.set from the answerer returned is_error: true — no memory_scopes configured. The agent had Memory in its allowed_tools, but the per-agent memory_scopes field was empty. Loomcycle's posture is default-deny — the agent could ask for the tool, the agent got the tool, but every actual op was refused. The agent's transcript showed it dutifully trying, dutifully failing, dutifully retrying.
The default-deny posture is correct — an agent shouldn't get read/write access to some scope just because it can use the Memory tool; the operator has to explicitly say "this agent gets [user] scope" or "this agent gets [user, agent]." But the failure mode is hostile: the runtime knew at boot time that an agent was being created with the inconsistent combination "Memory tool enabled, no scopes declared." It didn't say so. The agent only discovered the gap at runtime, one denied operation at a time.
The fix (#389, F21 in v0.23.2): warn at boot when an agent has a capability tool in allowed_tools but its capability gate is empty or missing. Generalized to the family:
Memoryin tools + emptymemory_scopes→ warnEvaluationin tools + emptyevaluation_scopes→ warnChannelin tools + emptychannels→ warnInterruptionin tools + missinginterruption.enabled→ warn
Non-fatal — it's a warning, not a refusal — but loud at boot, once. The operator sees the gap before the agent does.
2. F18 — spawn_run user_id was being silently overridden
F18 · RFC L applyPrincipal interactionPassing user_id: "exp3" on spawn_run recorded runs.user_id = "default". Channel and Memory ops scoped to default, not exp3. The orchestrator's kickoff to scope_id=exp3 never reached the answerer because the answerer was running as the default user.
This one is interesting because the bug isn't where we expected. The first hypothesis was "the arg-decode path drops user_id." It doesn't — the decode + threading (req.UserID → RunInput.UserID → RunIdentity) are correct end-to-end. The drop happens in applyPrincipal (internal/api/http/auth_principal.go), which we shipped as part of RFC L (v0.17.0). The whole point of applyPrincipal was: the authenticated principal is authoritative over what the wire claims. A caller can't escalate by editing the request body. But the legacy LOOMCYCLE_AUTH_TOKEN fallback mints a fixed placeholder principal — {Subject: "default", Legacy: true} — so under the common single-token deployment, every spawn_run wire user_id was getting overridden to "default".
The fix (#388, F18 in v0.23.2): when the principal is Legacy, honor the wire user_id (fall back to the placeholder only when it's omitted). A real OperatorTokenDef principal keeps the strict override — the security boundary is unchanged. The "zero-disruption upgrade" promise for legacy operators now actually holds for per-user fairness, memory scoping, and run attribution.
3. F20 — channeldef CRUD lived only in REST
F20 · MCP surface gapThe operator could create / delete channels via POST /v1/_channels over REST, but the MCP meta-tool surface had publish_channel / subscribe_channel / peek_channel / ack_channel / list_channels — no channeldef. A Claude Code session managing the loomcycle substrate through MCP couldn't create channels.
Plus a related problem: yaml-declared channels were immutable through REST (409 channel_yaml_immutable), and stale buffered messages from a failed earlier run had to be cleared by a raw DB delete.
The fix (#395, F20 in v0.23.2, plus #401 for purge): a channeldef meta-tool with create / delete / purge ops, routed through the Connector so gRPC and the TypeScript adapter get it too. Purge clears channel_messages on a yaml channel without a restart. MCP/HTTP/gRPC/TS parity restored.
4. F22 — wait_ms was silently capped
F22 · silent truncationAn agent calling Channel.subscribe wait_ms: 60000 got an effective wait_ms of 30,000 — the default LOOMCYCLE_CHANNELS_LONGPOLL_CAP_MS. The cap is correct (you don't want an agent monopolizing a request goroutine for 5 minutes), but the truncation was silent. Agents that expected to long-poll for a minute ended up re-subscribing every 30 seconds, burning their max_iterations budget.
The fix (#390, F22 in v0.23.2): warn at load and once at runtime when an agent's Channel.subscribe wait_ms exceeds the cap. Document the iteration-budget interaction loudly.
5. F29 — runtime-substrate channels weren't usable for pub/sub
F29 · per-run policy gapA channel created via POST /v1/_channels (substrate, runtime) could be administered. But an agent trying to Channel.subscribe or Channel.publish on it got channel "exp3-ch1" is not declared in operator config. The per-run channel policy was built from cfg.Channels (yaml) only, never the runtime channel store.
So in the fully-dynamic configuration — every entity created at runtime, no static yaml — the multi-agent loop couldn't run. The agents authored cleanly; the channels existed; the wire said no.
The fix (#404, F29 in v0.23.3): the per-run policy and the admin publish/subscribe wire check both merge the runtime channel store with the yaml channels: block. Runtime-declared channels are now first-class citizens. The full dynamic arc — three agents created at runtime via POST /v1/_agentdef, three channels via POST /v1/_channels, no static yaml — completes the loop.
That dynamic re-run also revealed one prompt-side caveat worth naming: read ordering after the done-signal. The aggregator called Evaluation.list_for_run before the hop-5 submit landed (got 4 evals + a null answer for hop 5), then after receiving the done-signal it reused that stale snapshot instead of re-reading. The substrate served the data correctly; the agent's prompt forgot to refresh after the gate. The sonnet re-run, with a tightened prompt, picked the correct winner cleanly. The runtime's contract is "the evaluation is durable once submit returns" — the agent's contract has to be "re-read after the done-signal." That's a documentation fix, not a runtime fix.
The engineering lesson worth keeping
Default-deny is right. Default-silent-deny is a footgun. Every one of the five bugs above was a case where the runtime had the information at boot or at config-load to tell the operator "this combination is going to fail at runtime" — and didn't. The agent then discovered the gap by failing on every relevant operation, which from the operator's perspective looks identical to the runtime being broken.
Three disciplines fall out of the v0.23.2/v0.23.3 sweep:
- Boot-warn on every capability-tool/gate inconsistency. If
Memoryis inallowed_toolsbutmemory_scopesis empty, the runtime knows this is wrong. Say so, loudly, once, at boot. Same forEvaluation/Channel/Interruption. - Warn on every silent truncation. If a capping value is enforced, the request-time path must tell the caller their requested value was clipped. F22 shipped exactly this for the longpoll cap; the discipline generalizes.
- Authoring surface parity across MCP / HTTP / gRPC / TS. If
channeldefCRUD exists on REST, it exists on MCP too. F20 was an asymmetry between paths that all advertised "you can manage the substrate" — the gap looked like a bug to the operator who'd been told the substrate was authoritative.
The substrate primitives in this experiment worked. Channels carried hops. Memory held the shared artifact. Evaluation produced scores. Context let agents discover what was available. The integration was sound. The runtime just needed to be louder about what was missing — so the operator could fix the gap before the agent did.
Tomorrow's post (the headline of the four): the day the reviewer agent put the Gitea token in a Bash command — and v0.23.4 redacted it anyway. Real Gitea webhooks, real third-party MCP, real Telegram, and a real secret-handling incident that the runtime made structurally harmless.