loomcycle
§ release note

Agent ensembles arrive — scheduler-driven fan-out, Channel.await fan-in, and a clock for agents.

Experiment 5 ran end-to-end last night. Five RSS collectors (one each for Hacker News, Wired, Engadget, Ars Technica, and TechCrunch) were fired in parallel by loomcycle's scheduler every five minutes. Each collector, on success, pinged a fan-in channel via the scheduler's on_complete hook. A consolidator agent, scheduled one minute after each collector fire, called the brand-new Channel.await {channels, mode: at_least, n: 5, wait_ms: 120000} combinator and woke when all five collectors had reported in (or the budget elapsed). It loaded 25 items from user-scoped Memory, deduped them by normalized URL, wrote a consolidated record, and called mcp__telegram__send_message; Telegram returned {"ok": true, "message_id": 22}. After three cycles, max_fires=3 auto-retired every schedule. Zero workarounds.

That ensemble — five parallel agents converging on a shared fan-in point, terminating after a bounded number of cycles, producing one consolidated output — is the shape loomcycle's substrate now has primitives for. Today's v0.25 "agentic-ensemble" release ships RFC S: three new primitives that close the gaps the prior multi-agent experiments worked around. Three follow-up fixes — one per follow-up release across the next 24 hours — close the rough edges exp5 surfaced as it stress-tested them.

This is the fifth post in the operator-via-MCP experiment series. exp1 exercised one agent. exp2 added human-in-the-loop. exp3 ran a 3-agent refine loop. exp4 wired a closed-loop dev workflow against real Gitea + Telegram. exp5 is the first to own the term agent ensemble — multiple agents, coordinated through substrate primitives, with a clean fan-out / fan-in / self-terminate story.

What "agent ensemble" means here

Plenty of frameworks call themselves "multi-agent." Most of them mean "you can spawn a sub-agent from a parent agent and read its return value." That's worth having, and loomcycle's Agent.parallel_spawn covered it from v0.10. But it's a sub-agent tree, not an ensemble — the parent waits for its children, the children don't talk to each other, the whole thing collapses on the parent's run.

An agent ensemble is shaped differently:

The runtime is now ensemble-shaped, not just agent-shaped. That's the v0.25 framing change worth taking forward.

RFC S — the three primitives the ensemble shape needed

exp3's multi-agent loop and exp4's webhook-driven workflow each surfaced gaps in what the substrate could express. RFC S landed three primitives across #417 + #418 + #419 + #420 + the v0.25.0 promotion (de98558):

P0 — Context op=time (the agents finally have a clock)

Closes F34. Pre-RFC-S, Context had op=tools, op=doc, op=help, op=self — but no op=time. An agent that wanted to bucket events into 5-minute windows had to shell out to Bash date, which dragged Bash into its allowed_tools surface for no other reason. A consolidator that wanted to enforce a "this work is for cycle N, not cycle N-1" rule had no way to compute N.

Context op=time returns {unix_ms, rfc3339, monotonic_ns}. Cycle bucketing is now cycle = floor(unix_ms / 300000) directly in the agent's prompt. Deadline math is wall-clock subtraction. Both the collectors and the consolidator in exp5 compute the cycle from op=time — no Bash in either's allowed-tools list. The 5-minute bucket is exact.
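The bucketing arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula, not loomcycle code: the helper names and the sample payload values are hypothetical, and only the field names (unix_ms, rfc3339, monotonic_ns) and the 300000 ms window come from the text.

```python
CYCLE_MS = 300_000  # 5-minute window, as in cycle = floor(unix_ms / 300000)


def cycle_of(unix_ms: int) -> int:
    """Return the index of the 5-minute cycle containing this wall-clock time."""
    return unix_ms // CYCLE_MS


def cycle_bounds(cycle: int) -> tuple[int, int]:
    """Return the half-open [start_ms, end_ms) interval covered by a cycle."""
    start = cycle * CYCLE_MS
    return start, start + CYCLE_MS


def remaining_ms(now_unix_ms: int, deadline_unix_ms: int) -> int:
    """Deadline math as wall-clock subtraction, clamped at zero."""
    return max(0, deadline_unix_ms - now_unix_ms)


# Hypothetical op=time payload; the values are made up for illustration.
sample = {
    "unix_ms": 1_700_000_123_456,
    "rfc3339": "2023-11-14T22:15:23.456Z",
    "monotonic_ns": 987_654_321,
}

cycle = cycle_of(sample["unix_ms"])
start_ms, end_ms = cycle_bounds(cycle)
```

Because every agent derives the cycle from the same integer division, a collector and a consolidator that read the clock within the same window agree on the cycle without exchanging any state.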

P1 — Channel.await (the missing fan-in combinator) + Channel.broadcast (its symmetric twin)

Closes F35. Pre-RFC-S, Channel.subscribe was single-channel. There was no any/all/at_least_n combinator across multiple channels. A consolidator waiting for N collectors had to poll one channel in a loop, count distinct ids, and burn its max_iterations budget with no clean timeout. The only native AND-barrier (Agent.parallel_spawn) couples agents into a single run — exactly the coupling an ensemble is supposed to avoid.

Channel.await is one call:

Channel op=await
        channels=["exp5-pings"]
        mode="at_least"          // or "any" | "all"
        n=5
        wait_ms=120000

→ { satisfied: true,             // n_reached OR all/any satisfied
    timed_out: false,
    fired: ["exp5-pings"],
    results: { "exp5-pings": [...] },
    total_messages: 5 }

One round-trip, one budget, three modes. mode: any for "wake on the first ping from any channel" (race semantics). mode: all for "wake when every channel has at least one ping" (rendezvous semantics). mode: at_least with n=k for "wake when k pings have arrived across these channels" (the exp5 case). Each mode pairs with wait_ms for a hard deadline — timed_out: true is a real result, not an error.
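As a rough model of the three modes, the following Python sketch evaluates whether a wait is satisfied for a given snapshot of per-channel messages. It is a toy restatement of the semantics described above, not the substrate's implementation. The function name and the deadline_passed flag are hypothetical, and treating fired as "channels with at least one message" is an assumption.

```python
from typing import Dict, List


def evaluate_await(
    mode: str,
    results: Dict[str, List[dict]],
    n: int = 0,
    deadline_passed: bool = False,
) -> dict:
    """Evaluate a multi-channel wait against the messages seen so far."""
    # Assumption: a channel counts as "fired" once it holds at least one message.
    fired = [name for name, msgs in results.items() if msgs]
    total = sum(len(msgs) for msgs in results.values())

    if mode == "any":          # race: first ping from any channel
        satisfied = bool(fired)
    elif mode == "all":        # rendezvous: every channel has a ping
        satisfied = bool(results) and len(fired) == len(results)
    elif mode == "at_least":   # k pings in total across the channels
        satisfied = total >= n
    else:
        raise ValueError(f"unknown mode: {mode}")

    return {
        "satisfied": satisfied,
        # A timeout is a normal result, not an error.
        "timed_out": deadline_passed and not satisfied,
        "fired": fired,
        "total_messages": total,
    }


five = {"exp5-pings": [{"schedule_name": f"collector-{i}"} for i in range(5)]}
three = {"exp5-pings": [{"schedule_name": f"collector-{i}"} for i in range(3)]}

full_result = evaluate_await("at_least", five, n=5)
partial_result = evaluate_await("at_least", three, n=5, deadline_passed=True)
```

In this model the exp5 case is full_result (five pings, satisfied) and the degraded case is partial_result (three pings at the deadline, timed out but still usable).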

The bonus symmetric primitive shipped at the same time: Channel.broadcast publishes one payload to N channels in one call. The pair is the natural shape — await is fan-in, broadcast is fan-out, and an ensemble that wakes on one event and signals N consumers needs both halves.

P2 — schedule max_fires (self-terminating ensembles)

Closes F36. Pre-RFC-S, a ScheduledRun fired indefinitely until the operator manually called retire. A test ensemble that wanted to run "3 cycles then stop" needed an external watcher — a cron, a tool call, a human. No way to express "this is a bounded experiment."

max_fires: 3 on the schedule def is enough. The sweeper increments a fire_count column on each successful fire; when fire_count == max_fires, the def is auto-retired on the same tick. exp5's six schedule defs (five collectors + one consolidator) all set max_fires: 3; after three cycles the boot log emits scheduler: reached max_fires=3 — retired def ×6 and the ensemble is done. No watcher, no manual cleanup.

The escape hatch for forks: a forked def that explicitly sets max_fires: 0 lifts the cap, so a long-running production ensemble can be forked from a bounded test ensemble without inheriting its cap. Reviewed and added in #418.
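The counting and fork rules can be modeled with a small in-memory sketch. Everything here is illustrative: ScheduleDef, record_fire, and fork are hypothetical names, the real sweeper works against a fire_count column rather than a Python object, and the assumptions that a fork starts with a fresh fire_count and that an omitted max_fires inherits the parent's cap are mine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScheduleDef:
    name: str
    max_fires: int = 0   # 0 means "no cap" in this sketch
    fire_count: int = 0
    retired: bool = False


def record_fire(d: ScheduleDef) -> ScheduleDef:
    """Count one successful fire; retire on the same tick when the cap is hit."""
    if d.retired:
        return d
    d.fire_count += 1
    # The text describes fire_count == max_fires; >= is a defensive choice here.
    if d.max_fires > 0 and d.fire_count >= d.max_fires:
        d.retired = True
    return d


def fork(parent: ScheduleDef, name: str, max_fires: Optional[int] = None) -> ScheduleDef:
    """Fork a def: None inherits the parent's cap, an explicit 0 lifts it."""
    cap = parent.max_fires if max_fires is None else max_fires
    # Assumption: the fork starts counting from zero.
    return ScheduleDef(name=name, max_fires=cap)


bounded = ScheduleDef(name="exp5-collect-hn", max_fires=3)
for _ in range(3):
    record_fire(bounded)

unbounded = fork(bounded, name="prod-collect-hn", max_fires=0)
for _ in range(10):
    record_fire(unbounded)
```

After the loops, bounded is retired at three fires while unbounded keeps firing, which is the behavior the fork escape hatch is meant to allow.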

The exp5 pipeline — how the ensemble actually runs

Here's the topology in one diagram:

SCHEDULER (cron "*/5 * * * *", max_fires:3 each)
   ├── exp5-collect-hn       ──┐
   ├── exp5-collect-wired    ──┤  fire in PARALLEL
   ├── exp5-collect-engadget ──┤  every 5 minutes
   ├── exp5-collect-ars      ──┤
   └── exp5-collect-tc       ──┘
       │
       │  each: Context op=time → cycle
       │        HTTP GET feed-url → pick ≤5 AI/agentic items
       │        Memory.set scope=user key=digest:<cycle>:<feed>:<n>
       │
       └── on_complete: channel.publish → exp5-pings  (scope: global)

SCHEDULER (cron "1,6,11,…", max_fires:3)
   └── exp5-consolidate
       │
       │  Channel op=await
       │      channels=[exp5-pings]
       │      mode=at_least  n=5  wait_ms=120000
       │  → satisfied: true (5/5, n_reached)
       │
       │  Memory.list prefix="digest:<cycle>:" → 25 items
       │  dedup by normalized URL → 23 unique items
       │  Memory.set consolidated:<cycle>
       │
       └── mcp__telegram__send_message  → {"ok":true,"message_id":22}

The collectors don't talk to each other. They don't know about the consolidator. They don't even need the Channel tool — the scheduler's on_complete hook does the ping on their behalf, stamping each message with the firing schedule's name (so the consolidator can count distinct collectors by schedule_name, not by publisher identity).

The consolidator doesn't know the collectors' run_ids. It awaits the channel, loads memory by cycle prefix, deduplicates, sends the message. If two collectors failed (host outage, allowlist refusal), satisfied=false, timed_out=true, total_messages=3 — the consolidator sends a partial digest with k/5 sources in the header and the ensemble survives. No retry storm, no half-state in the run table.
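The consolidator's bookkeeping (dedupe by normalized URL, count distinct sources, label a partial digest) might look like the Python sketch below. The normalization rules are assumptions, since the post does not say which ones exp5 uses. The url item field and the header wording are also hypothetical. Only schedule_name as the per-ping stamp comes from the text.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

EXPECTED_SOURCES = 5


def normalize_url(url: str) -> str:
    """Assumed rules: lowercase scheme and host, drop utm_* params, fragment, trailing slash."""
    parts = urlsplit(url.strip())
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith("utm_")
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(sorted(query)), "")
    )


def dedupe(items: list) -> list:
    """Keep the first item seen for each normalized URL."""
    seen = set()
    unique = []
    for item in items:
        key = normalize_url(item["url"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def digest_header(pings: list, expected: int = EXPECTED_SOURCES) -> str:
    """Count distinct collectors by schedule_name and label the digest."""
    k = len({p["schedule_name"] for p in pings})
    label = "full" if k >= expected else "partial"
    return f"{label} digest: {k}/{expected} sources"


items = [
    {"url": "https://Example.com/a/?utm_source=rss#top"},
    {"url": "https://example.com/a"},
    {"url": "https://example.com/b"},
]
pings = [{"schedule_name": name} for name in ("hn", "wired", "ars")]
```

With these inputs, dedupe(items) keeps two items and digest_header(pings) reports three of five sources, which mirrors the partial-digest path described above.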

Three follow-up fixes — what stress-testing the ensemble surfaced

The first complete exp5 run landed on v0.25.0 with the three RFC S primitives wired in. It worked, with three rough edges. Each one was a real gap the experiment was the first thing to exercise, and each one shipped in the next release within hours.

F37 — on_complete channel.publish honors declared scope (#422, v0.25.1, RFC T)

F37 · scope mismatch on the scheduler hook

The collectors ran under user_id: exp5, so the scheduler's on_complete: channel.publish hook published the ping under user/exp5, even though the channel was declared scope: global. The consolidator's Channel.await on the global channel read 0 messages. The collectors had succeeded; the consolidator timed out.

Root cause: the scheduler's dispatch path inherited the run's user scope into the publish call, ignoring the channel's declared scope. The v0.25.0 workaround was to declare the fan-in channel scope: user, which sort-of worked because all five collectors shared user_id: exp5. But that's wrong — a fan-in channel is shared, the operator declared it global for a reason, and the hook is operator-authored config (not a model-authored choice that needs principal-overriding discipline).

The fix: a new ResolveChannelScope resolver in internal/api/http, wired into the scheduler so the publish hook resolves the channel's declared scope and uses that, not the run's scope. The same resolver also catches the static-vs-dynamic channel mismatch the existing channel CRUD code already handled correctly. exp5's global fan-in channel now works as authored.
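To make the before and after concrete, here is a toy Python model of the two behaviors. It does not reproduce ResolveChannelScope itself: the registry, the function names, and the storage-key format are all hypothetical, chosen only to show why a publish under the run's scope is invisible to a reader of the global channel.

```python
from typing import Dict

# Hypothetical registry of operator-declared channel scopes.
CHANNEL_SCOPES: Dict[str, str] = {"exp5-pings": "global"}


def run_scope_key(channel: str, run_user_id: str) -> str:
    """Pre-fix behavior as described: the run's user scope leaks into the publish."""
    return f"user/{run_user_id}/{channel}"


def declared_scope_key(channel: str, run_user_id: str) -> str:
    """Post-fix behavior: the channel's declared scope decides where the ping lands."""
    scope = CHANNEL_SCOPES.get(channel)
    if scope is None:
        raise KeyError(f"undeclared channel: {channel}")
    if scope == "global":
        return f"global/{channel}"
    if scope == "user":
        return f"user/{run_user_id}/{channel}"
    raise ValueError(f"unknown scope: {scope}")


before = run_scope_key("exp5-pings", "exp5")       # where v0.25.0 published
after = declared_scope_key("exp5-pings", "exp5")   # where the consolidator reads
```

In this model the two keys differ, so a consolidator reading the global key sees nothing that was published under the user key.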

F38 — scheduled runs resolve their agent in the def's tenant (#424, v0.25.2, RFC U)

F38 · scheduler tenant resolution

The fully-dynamic exp5 variant (every entity created at runtime via REST) had each scheduled collector fire fail with unknown agent: exp5-collector. The scheduler's fire path was resolving agents from the yaml-static registry only, not the AgentDef substrate store the dynamic variant lived in.

Same shape as the F30 webhook-spawn fix from exp4 — a runtime entity (webhook there, schedule here) needed to resolve a runtime agent under a consistent tenant, and the resolution path wasn't following the AgentDef store. The fix: scheduled runs now stamp the def's tenant onto the spawn, and the agent resolves against both static and dynamic registries.

With #424 in place, the dynamic exp5 collectors ran cleanly — the scheduler-fired runs spawned, executed, pinged, and the consolidator's fan-in landed at 5/5.
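The resolution rule can be pictured with a two-registry lookup. This Python sketch is a guess at the shape, not the scheduler's code: the data structures, the function name, the tenant name, and the choice to consult the static registry first are all assumptions. Only the error text and the agent name come from the failure described above.

```python
from typing import Dict, Tuple

StaticRegistry = Dict[str, dict]               # agent name -> def loaded from YAML
DynamicStore = Dict[Tuple[str, str], dict]     # (tenant, agent name) -> runtime def


def resolve_agent(
    name: str, tenant: str, static: StaticRegistry, dynamic: DynamicStore
) -> dict:
    """Look up an agent in the static registry, then in the def's tenant."""
    # Assumption: static defs win over dynamic ones; the post does not say.
    if name in static:
        return static[name]
    found = dynamic.get((tenant, name))
    if found is None:
        raise LookupError(f"unknown agent: {name}")
    return found


static_agents: StaticRegistry = {}
dynamic_agents: DynamicStore = {
    ("exp5-tenant", "exp5-collector"): {"name": "exp5-collector"},
}

# The scheduled run carries the def's tenant, so the dynamic agent is found.
resolved = resolve_agent("exp5-collector", "exp5-tenant", static_agents, dynamic_agents)
```

A lookup that consults only static_agents, or that uses a tenant other than the def's, fails with the same unknown agent error the dynamic variant hit.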

F39 — dynamic stdio MCP env interpolation (#426, v0.25.3, RFC V)

F39 · env not interpolated at spawn time

The fully-dynamic Telegram leg failed: mcp__telegram__send_message returned HTTP 404 because the bot received the literal string ${LOOMCYCLE_TELEGRAM_BOT_TOKEN} as its API token instead of the resolved value. Dynamic stdio MCPServerDef env values weren't being interpolated against the runtime's env at spawn time, only at YAML-load (which the dynamic path bypasses by design).

The same envelope-of-references pattern we hit in MCPServerDef header expansion back at v0.18 — there for HTTP MCP headers, here for stdio MCP env. The fix: expand ${ENV} references in the dynamic stdio MCP def's env map at spawn time, using the same prefix-allowlisted resolver the rest of the substrate uses. The Telegram bot now receives the resolved token; send_message returns {"ok": true, "message_id": 22} on the dynamic variant with no workaround.
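Spawn-time expansion with a prefix allowlist can be sketched as follows. This is a minimal Python illustration, not the substrate's resolver: the LOOMCYCLE_ prefix is inferred from the variable name in the failure, the TELEGRAM_BOT_TOKEN key is hypothetical, and raising on a disallowed or unset reference is an assumed policy.

```python
import re
from typing import Dict, Mapping

_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

# Assumption: only variables with this prefix may be referenced.
ALLOWED_PREFIXES = ("LOOMCYCLE_",)


def expand_env(env_map: Mapping[str, str], runtime_env: Mapping[str, str]) -> Dict[str, str]:
    """Expand ${NAME} references in a def's env map against the runtime env."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if not name.startswith(ALLOWED_PREFIXES):
            raise PermissionError(f"reference to {name} is not allowlisted")
        if name not in runtime_env:
            raise KeyError(f"{name} is not set in the runtime environment")
        return runtime_env[name]

    return {key: _REF.sub(lookup, value) for key, value in env_map.items()}


# Hypothetical dynamic stdio MCP def env and runtime environment.
def_env = {"TELEGRAM_BOT_TOKEN": "${LOOMCYCLE_TELEGRAM_BOT_TOKEN}"}
runtime = {"LOOMCYCLE_TELEGRAM_BOT_TOKEN": "123:example-token"}

spawn_env = expand_env(def_env, runtime)
```

Running the expansion at spawn time rather than at YAML load is what lets defs created at runtime take the same path; in production, os.environ would stand in for the runtime mapping used here.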

Three fixes, three releases, 24 hours. v0.25.1 closed F37, v0.25.2 closed F38, v0.25.3 closed F39. Each had a regression test that fails on the prior tag — no "we shipped a fix, did it work?" ambiguity. The dynamic exp5 variant is the soundness test: it ran end-to-end on v0.25.3 with zero workarounds.

Companion change — every experiment now ships as a self-contained example

The five experiments aren't just sandbox reports anymore. Every prior experiment now lives as a self-contained directory under loomcycle/examples/, with everything an operator needs to clone and exercise the substrate in minutes:

| Example | What it exercises | External deps |
| --- | --- | --- |
| exp1-tools-usage/ | Built-in tools (Read / Write / Edit / Bash / Web): a coding agent writes, runs, and verifies a program in a sandbox. | loomcycle + a provider |
| exp2-interruption/ | Interruption (human-in-the-loop): Yes/No gating with REST-resolved branches. | loomcycle + a provider |
| exp3-multiagent-loop/ | Channel + Memory + Evaluation + Context: a 3-agent, 5-hop refine/evaluate loop. | loomcycle + a provider |
| exp4-gitea-telegram/ | Inbound webhooks + 3rd-party MCP + Telegram: coder → PR → reviewer-merge → advisor → Telegram. | Gitea, Telegram, gitea-mcp binary |
| exp5-agent-ensemble/ | (joining shortly) Scheduler-driven news-digest ensemble with Channel.await fan-in. | RSS feed allowlist + Telegram |

Each directory carries its own loomcycle.yaml, a run.sh launcher that boots the substrate from ./work, a .env.local.example template (empty values; first run.sh copies it to .env.local for you to fill in), and a README.md with step-by-step reproduction + verification instructions. Every example routes Anthropic OAuth primary → DeepSeek v4-pro fallback via tier: middle, so an operator with either credential exercises the same agent.

The point of shipping the experiments as examples: the sandbox reports are how we test loomcycle from the outside; the examples are how operators learn from the same code path. Same primitives, same configs, same wire surface — minus the experiment-specific narration, plus the documentation and the reproducibility scaffolding.

The lesson worth keeping

A multi-agent system is not an ensemble. An ensemble needs primitives that survive the loss of any single agent's run. Agent.parallel_spawn ties N children's lifecycles to one parent. Channel.await ties N agents' coordination to a substrate channel that lives independently. The shift from "the parent waits for the children" to "the substrate is the connective tissue" is what makes the ensemble shape work — and it's what unlocks the scheduler-driven, cron-fireable, self-terminating, partial-tolerant pipelines real production agentic systems need.

What this means for the runtime going forward: parallel agent ensembles are now the design center, not a sub-case of multi-agent. The substrate primitives are sized for it. The examples directory shows it. The next experiments (long-running async, ensemble-of-ensembles, RAG-backed) build on this base.

Companion reading from the operator-via-MCP series: exp1 + exp2 — tool access and interruption · exp3 side analysis — the MCP wedge · exp3 main — the multi-agent refine loop · exp4 — Gitea + Telegram + secret redaction. Plus the upstream design: doc-internal/rfcs/agent-ensemble-primitives.md for the locked RFC S decisions.