§ production story

Collapsing four hallucinating LLM orchestrators into zero tokens. Two bugs the migration found.

2026-06-04 · by Dennis Gubsky · ~10 min read

JobEmber.ai's agentic pipeline had four batch orchestrators. Each one took some list of N items, fanned out N LLM worker agents in parallel, optionally reduced the workers' outputs, and emitted a single result. Three of them lived in TypeScript on the web side, hand-rolled with Promise.allSettled over runner.run(...). One had been lifted into the runtime as an LLM agent with a careful system prompt instructing it to "fire all N spawns in ONE iteration, then wait."

The orchestration work itself was deterministic. Partition a list of job sites round-robin into N slices. Chunk a list of matches into batches. Wrap each item in a worker prompt. Spawn one child per slice. Collect results. None of that needed a language model.
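To make "none of that needed a language model" concrete, here is a minimal sketch of the chunk-and-wrap step. The helper names `chunk` and `wrapBatch`, the batch size, and the prompt wording are all illustrative assumptions, not JobEmber.ai's actual code.

```javascript
// Hypothetical helper: split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: wrap one batch in a worker prompt (wording is made up).
function wrapBatch(batch, index, total) {
  return "Batch " + (index + 1) + " of " + total + ":\n" + JSON.stringify(batch);
}

const demoMatches = ["m1", "m2", "m3", "m4", "m5"];
const demoBatches = chunk(demoMatches, 2);
const demoPrompts = demoBatches.map((b, i) => wrapBatch(b, i, demoBatches.length));
```

Five items at a batch size of two give three batches, the last one holding the single leftover item. Nothing in this step depends on model judgment.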

It was costing us ~8,000 tokens per run on the LLM-orchestrator side. And the model occasionally violated its own prompt - the job-search-batch orchestrator was supposed to fire all N spawns in one iteration so they ran in parallel, but the weak-tier model would sometimes spawn one, wait for it to finish, then spawn the next. The fan-out it was hired to do, it was failing to do.

v0.20.0's inline code_body ingestion (see yesterday's post) made the right fix viable: replace all four orchestrators with deterministic code-js agents. JS for what JS is good at; LLM for what LLMs are good at; nothing in between burning tokens on logic that wasn't a language problem.

This post is the production-side account. The four orchestrators. The cost collapse to zero tokens. The hallucination that goes away when the orchestrator doesn't have to be coaxed into the right behavior. And two latent loomcycle bugs the migration surfaced - both of them in shapes that the LLM-orchestrator path had been hiding.

Why an LLM was doing the orchestration in the first place

It's worth being honest about how the orchestration ended up where it was. Two of the orchestrators were never LLM agents at all - they were TypeScript fan-outs with Promise.allSettled, no LLM in the orchestration layer - but that shape had two real production problems:

N orphan run-ids. Each Promise.allSettled over runner.run(...) spawned N independent run-ids that the orchestrator-as-TS-code had to monitor itself. There was no single "Stop" button - clicking cancel on the parent operation didn't actually cancel any of the children.
Not cron-fireable. The TS orchestrator only ran when the HTTP route was hit. For an end-to-end "search at 3 a.m." flow to work, the fan-out logic had to live in the runtime, where loomcycle's ScheduleDef primitive could fire it autonomously.

Lifting those two into the runtime as agents fixed both problems instantly. The only question was: does the agent need to be an LLM, or can it be deterministic JS? Pre-v0.20.0, the answer for cloud deploys was effectively "LLM, because shipping a JS body required a host filesystem mount." Post-v0.20.0, the answer is "deterministic JS - register the body inline via ensureCodeAgent and never touch a sidecar disk."

The other two looked like LLM work from the start because they involved parsing as well as fan-out - and parsing felt like LLM territory. In practice it wasn't. The ats-filter-batch reducer ran a chunked filter over relevance scores; the parsing was a JSON envelope an ES5 function could handle in twenty lines. The job-search-batch was a round-robin partition; the partitioning was four lines of arithmetic. Of the two, only job-search-batch actually ran as an LLM agent; ats-filter-batch stayed a TS chunked map.
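As a rough illustration of the "twenty lines of ES5" claim, a parse-and-dedup reducer might look like the sketch below. The envelope shape (`verdicts`, `id`, `keep`) and the function name `reduceVerdicts` are assumptions made for the example; the real worker output format is not shown in this post.

```javascript
// Assumed worker output: a JSON string such as
//   {"verdicts":[{"id":"a","keep":true},{"id":"b","keep":false}]}
// All field names here are hypothetical.
function reduceVerdicts(workerOutputs) {
  var seen = Object.create(null); // no prototype, so ids like "constructor" are safe
  var kept = [];
  var failed = 0;
  for (var i = 0; i < workerOutputs.length; i++) {
    var envelope;
    try {
      envelope = JSON.parse(workerOutputs[i]);
    } catch (e) {
      failed++; // count unparseable output instead of aborting the whole batch
      continue;
    }
    var verdicts = envelope && Array.isArray(envelope.verdicts) ? envelope.verdicts : [];
    for (var j = 0; j < verdicts.length; j++) {
      var v = verdicts[j];
      if (!v || seen[v.id]) continue; // dedup: the first verdict per id wins
      seen[v.id] = true;
      if (v.keep) kept.push(v.id);
    }
  }
  return { kept: kept, failed: failed };
}

var demoReduced = reduceVerdicts([
  '{"verdicts":[{"id":"a","keep":true},{"id":"b","keep":false}]}',
  "not json",
  '{"verdicts":[{"id":"a","keep":false},{"id":"c","keep":true}]}'
]);
```

Here the second worker's output is counted as a failure, and the duplicate verdict for "a" in the third output is ignored because the first one already won.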

The four orchestrators

Across three PRs (#115 + #117 + #118), we replaced four orchestrators with code-js. The shape is the same across all of them: parameters arrive via the v0.21.0 metadata channel (not the prompt), the body fans out with Agent.parallel_spawn, and the orchestrator is a pure function of input.metadata - replay-safe by construction.

Orchestrator	Was	Now	Shape
`employer-research-batch`	TS `Promise.allSettled`	code-js	Fire-and-forget - children self-ingest via `postResearchIngest`; orchestrator counts
`cv-cl-batch`	TS `Promise.allSettled`	code-js	Fire-and-forget - children self-ingest via `patchApplication`
`ats-filter-batch`	TS chunked map	code-js	Map-reduce - chunked filter workers, parse + dedup verdicts, single envelope back
`job-search-batch`	LLM agent (~8K tokens/run)	code-js	Round-robin partition `jobSites[]` into N slices, spawn one worker per slice

The fourth one is the headline economic story. job-search-batch was burning ~8,000 tokens per run to do round-robin partitioning - four lines of i % N arithmetic - because the runtime didn't yet have a way to register a JS body without a host filesystem. With v0.20.0's inline code_body, the agent's body is:

// Partition jobSites round-robin into N slices, fire N workers in parallel.
function run({ prompt, metadata }) {
  const jobSites = mcp__jobs__getAgentContext({}).jobSites;
  const N = computeN(jobSites.length, metadata.maxWorkers);
  const slices = roundRobin(jobSites, N);
  const childIds = [];
  for (let i = 0; i < N; i++) {
    childIds.push(Agent.spawn({
      agent: "job-searcher",
      prompt: buildWorkerPrompt(slices[i], metadata),
    }));
  }
  Agent.parallel_spawn(childIds);  // wait for all
  return { final_text: `dispatched ${N} workers across ${jobSites.length} sites` };
}

Zero tokens. Replay-deterministic (pure function of input.metadata + getAgentContext's recorded tool result). Single run-id that cascades cancel to all children. job-searcher workers stay LLM agents - searching, refining, scoring is genuinely a language task - but the orchestration itself stops costing tokens.
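The body above leans on computeN and roundRobin without showing them. A minimal sketch of what such helpers could look like follows; the clamping rule in computeN (at least one worker, never more workers than sites) is an assumption, since the post only says the partition is `i % N` arithmetic.

```javascript
// Assumed policy: at least one worker, never more workers than sites,
// and never more than the caller's cap.
function computeN(siteCount, maxWorkers) {
  const cap = Number.isInteger(maxWorkers) && maxWorkers > 0 ? maxWorkers : 1;
  return Math.max(1, Math.min(siteCount, cap));
}

// Round-robin partition: item i lands in slice i % n.
function roundRobin(items, n) {
  const slices = Array.from({ length: n }, () => []);
  items.forEach((item, i) => slices[i % n].push(item));
  return slices;
}

const demoSites = ["a", "b", "c", "d", "e"];
const demoWorkerCount = computeN(demoSites.length, 2);
const demoSlices = roundRobin(demoSites, demoWorkerCount);
```

With five sites and a cap of two workers, the slices come out as `["a","c","e"]` and `["b","d"]`. Both helpers are pure functions of their arguments, which is what keeps the orchestrator replay-safe.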

The "FIRE ALL N SPAWNS IN ONE ITERATION" hallucination is worth naming. The LLM version of job-search-batch had a system prompt explicitly instructing it to emit all N tool-use calls in one model turn so the loop dispatched them in parallel. The weak-tier model would sometimes emit one, wait for the result, then emit the next - serializing the workers. Each "I'll just spawn one, see the result, then continue" decision turned a 30-second parallel run into a 5-minute serial one. The JS version doesn't have an opinion about how to dispatch; it just calls Agent.parallel_spawn. The hallucination wasn't about the data - it was about the orchestration logic itself.

Two bugs the migration found

Migrating real fan-out orchestrators onto code-js surfaced two latent bugs in the v0.20.0 runtime. Both got fixed and landed in v0.21.0. Both are in shapes the LLM-orchestrator path had been hiding because no LLM agent ever reproduces them.

Bug #1 - The wall-clock budget was 120 seconds, and the timeout was misclassified

loomcycle · PR #359 · v0.21.0
Fan-out orchestrators tripped a CPU-oriented 120s timeout and surfaced as code_agent_threw: run: context deadline exceeded at <innocent source line>.

RFC J's code-js provider was sized for CPU-bound JS - a function that runs to completion in goja, no I/O, no waiting. The default wall-clock budget was 120 seconds. That assumption held for every code-js agent shipped to date, because every one had been a self-contained transform.

A fan-out orchestrator doesn't fit that shape. It calls Agent.parallel_spawn and parks waiting for LLM children that take three, five, ten minutes. The wall-clock budget keeps ticking while the orchestrator is parked. So the resume turn (when parallel_spawn returns) starts already over-budget, and the runtime interrupts the JS at the next interruptible bytecode - typically parseBatch or whatever line happened to be running when the deadline expired. The error message named that line, blaming entirely innocent code.

Worse, interruptWatch's timer branch fired rt.Interrupt(context.DeadlineExceeded) without cancelling the parent context. classifyRunErr only special-cased ctx.Err() != nil, so the timer-driven interrupt fell through to the default branch and was emitted as code_agent_threw instead of a distinct timeout error. The error said "the JS threw"; the JS hadn't thrown. The runtime had killed it.

The fix has three parts:

Distinct error class. interruptWatch's timer branch now sets replayState.timedOut (atomic) before the interrupt; classifyRunErr emits a code_agent_timeout stating the budget, attributing no source line, separate from code_agent_cancelled (ctx cancel) and code_agent_threw (real JS throw). The error message points at the override knobs.
Per-agent override. run_timeout_seconds on the AgentDef (the operational, not-in-content-hash field, mirroring retry_attempts). Threaded through the four mirrors (config.AgentDef, mergedDef, SubstrateAgentDef, ToConfigDef) and into RunMeta.
Per-run override. run_timeout_seconds on the /v1/runs + /v1/sessions/{id}/messages wire surfaces. Resolved server-side as per-run > per-agent > global default.
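The three parts above boil down to two small decisions: which budget applies, and which error class a failed run gets. The sketch below models both in JavaScript for illustration only; the real logic lives in loomcycle's Go runtime. The function shapes, the message text, and the rule that an override counts as "set" only when it is a positive number are assumptions.

```javascript
// Default wall-clock budget named in the post (120 seconds).
const GLOBAL_DEFAULT_TIMEOUT_SECONDS = 120;

// Assumption: an override is "set" only when it is a positive finite number.
function isSetBudget(value) {
  return Number.isFinite(value) && value > 0;
}

// Precedence from the post: per-run > per-agent > global default.
function resolveRunTimeoutSeconds(perRun, perAgent) {
  if (isSetBudget(perRun)) return perRun;
  if (isSetBudget(perAgent)) return perAgent;
  return GLOBAL_DEFAULT_TIMEOUT_SECONDS;
}

// Model of the three-way classification. A timeout names the budget and
// the override knob, and deliberately attributes no source line.
function classifyRunErr(state) {
  if (state.timedOut) {
    return {
      code: "code_agent_timeout",
      message: "wall-clock budget of " + state.budgetSeconds +
        "s exceeded; consider raising run_timeout_seconds",
    };
  }
  if (state.ctxCancelled) {
    return { code: "code_agent_cancelled", message: "run was cancelled" };
  }
  return { code: "code_agent_threw", message: String(state.thrown) };
}

const demoBudget = resolveRunTimeoutSeconds(undefined, 600);
const demoTimeoutErr = classifyRunErr({ timedOut: true, ctxCancelled: false, budgetSeconds: demoBudget });
```

Checking the timed-out flag first is the important design choice: the timer interrupt does not cancel the context, so a classifier that looks only at context cancellation cannot tell a runtime kill apart from a real JS throw.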

The per-agent budget threads through all four loop.Run entry sites (RunOnce, handleRuns, handleMessages, and - caught in self-review - runSubAgent, the one site that almost shipped without the override, exactly the fan-out topology the override was designed to serve). JobEmber.ai's batchRunTimeoutSeconds() scales the budget with the fan-out width: ceil(N / concurrency) waves, clamped to [180s, 1800s].
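A width-scaled budget along those lines can be sketched as follows. The wave count and the [180s, 1800s] clamp come from the description above; the per-wave allowance (`secondsPerWave`, defaulting to 300 here) is a made-up tuning value, and the real batchRunTimeoutSeconds() may compute its budget differently.

```javascript
const MIN_BUDGET_SECONDS = 180;
const MAX_BUDGET_SECONDS = 1800;

// Budget = number of waves times an assumed per-wave allowance, clamped.
// `secondsPerWave` is hypothetical and not a figure from the post.
function batchRunTimeoutSeconds(workerCount, concurrency, secondsPerWave = 300) {
  const waves = Math.ceil(workerCount / Math.max(1, concurrency));
  const raw = waves * secondsPerWave;
  return Math.min(MAX_BUDGET_SECONDS, Math.max(MIN_BUDGET_SECONDS, raw));
}

const demoTenWorkers = batchRunTimeoutSeconds(10, 4);   // 3 waves
const demoHugeFanOut = batchRunTimeoutSeconds(100, 4);  // 25 waves, hits the ceiling
const demoTinyFanOut = batchRunTimeoutSeconds(2, 4, 60); // 1 short wave, hits the floor
```

The floor keeps a small fan-out from inheriting a budget shorter than a single slow LLM child; the ceiling keeps a very wide fan-out from holding a run open indefinitely.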

Why this was hidden behind the LLM-orchestrator path. An LLM agent doesn't park in Agent.parallel_spawn - its tool calls are issued by the model, not by a synchronous JS function. The loop's outer ctx covers the LLM agent's wall-clock; the code-js inner budget never fires. Until we ran a fan-out orchestrator as code-js, the budget had never been load-bearing.

Bug #2 - Go map[string]any → JS object key order was non-deterministic

loomcycle · PR #366 · v0.21.0
The same input.metadata produced JS objects with different key order on each replay turn. JSON.stringify emitted different bytes on turn 1 vs. replay → code_agent_replay_divergence.

Code-js is replay-based. Each turn rebuilds the goja runtime, re-converts the run's metadata (a Go map[string]any) into input.metadata via rt.ToValue, fast-forwards through the run's transcript, and stops at the first un-recorded tool call. The replay must be byte-identical to the original execution; if it diverges (a tool call's input differs, an emitted text differs), the runtime fails the turn with code_agent_replay_divergence - the primitive that makes "deterministic by construction" a real promise rather than a hopeful one.
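A toy model of that replay loop may help. It is a sketch under simplifying assumptions, not loomcycle's implementation: the transcript is a plain array of `{ input, result }` records, tool inputs are compared as JSON strings, and the names `replayTurn`, `Parked`, and `ReplayDivergence` are invented for the example.

```javascript
class ReplayDivergence extends Error {}
class Parked extends Error {} // sentinel: reached the first un-recorded tool call

// Re-run `body` from the top, answering tool calls from the transcript.
function replayTurn(body, metadata, transcript) {
  let cursor = 0;
  const callTool = (input) => {
    const serialized = JSON.stringify(input);
    if (cursor >= transcript.length) {
      const parked = new Parked("first un-recorded tool call");
      parked.pendingInput = serialized;
      throw parked;
    }
    const recorded = transcript[cursor];
    if (recorded.input !== serialized) {
      throw new ReplayDivergence("tool call #" + cursor + ": input differs from transcript");
    }
    cursor++;
    return recorded.result;
  };
  try {
    return { status: "done", value: body(metadata, callTool) };
  } catch (err) {
    if (err instanceof Parked) return { status: "parked", pendingInput: err.pendingInput };
    if (err instanceof ReplayDivergence) {
      return { status: "code_agent_replay_divergence", message: err.message };
    }
    throw err;
  }
}

// A deterministic body replays cleanly.
const stableBody = (metadata, callTool) => "child said: " + callTool({ sites: metadata.sites });
const firstTurn = replayTurn(stableBody, { sites: ["a", "b"] }, []);
const resumedTurn = replayTurn(stableBody, { sites: ["a", "b"] },
  [{ input: firstTurn.pendingInput, result: "ok" }]);

// Hidden state outside the transcript makes a body diverge on replay.
let hiddenCounter = 0;
const flakyBody = (metadata, callTool) => callTool({ attempt: ++hiddenCounter });
const flakyFirst = replayTurn(flakyBody, {}, []);
const flakyReplay = replayTurn(flakyBody, {}, [{ input: flakyFirst.pendingInput, result: "ok" }]);
```

The first turn parks at the un-recorded call, the resumed turn fast-forwards through the recorded result and finishes, and the body with hidden state fails at tool call #0 because its second execution produces a different input.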

Go deliberately randomizes map iteration order: two iterations over the same map[string]any can yield keys in different orders. rt.ToValue on a Go map walks it and inserts keys into the JS object in iteration order. JS objects are insertion-ordered. So the JS object handed to the agent as input.metadata could have a different key order on every turn.

For 99% of code-js agents this didn't matter - they access input.metadata.foo by key, and key lookup is order-independent. The runtime never even noticed. But the moment an agent did JSON.stringify(input.metadata.matches) to build a prompt for a child worker - exactly what ats-filter-batch does - the stringified bytes depended on the JS object's key order, which depended on Go's randomized map iteration, which was different every turn.
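The difference between key lookup and serialization is easy to reproduce in plain JavaScript. The field names below are hypothetical; the two objects stand in for the same metadata converted on two different turns.

```javascript
// Same contents, different insertion order.
const turnOneMatch = {};
turnOneMatch.title = "Engineer";
turnOneMatch.score = 0.9;

const replayedMatch = {};
replayedMatch.score = 0.9;
replayedMatch.title = "Engineer";

// Lookup by key does not care about order.
const lookupAgrees =
  turnOneMatch.title === replayedMatch.title &&
  turnOneMatch.score === replayedMatch.score;

// Serialization does: JSON.stringify walks keys in insertion order.
const turnOneBytes = JSON.stringify(turnOneMatch);
const replayedBytes = JSON.stringify(replayedMatch);
const bytesAgree = turnOneBytes === replayedBytes;
```

`lookupAgrees` is true while `bytesAgree` is false, which is why the problem stayed invisible until an agent put a serialized metadata object into a tool call's input.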

The result: replay divergence on tool call #0. code_agent_replay_divergence, emitted with a message suggesting the operator set LOOMCYCLE_CODE_AGENTS_DETERMINISTIC=1 - which would not have helped, because that flag pins the RNG seed and the clock anchor, not Go-map key order.

The fix is stableJSValue(): a recursive value converter that materializes every Go map as a JS object with sorted keys. Arrays keep their order (Go slices are ordered, so they're already deterministic); only maps get reordered. JS objects are insertion-ordered, so inserting keys in sorted order makes both property iteration and JSON.stringify output stable across turns. Values and reserved-key precedence (user_id / agent) are unchanged; only the output key order becomes deterministic. The order was already non-deterministic, so nothing could have relied on the prior behavior.
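The real stableJSValue() is Go code inside loomcycle and is not reproduced here. The same idea can be sketched in JavaScript, which is also roughly what an agent would need for objects it builds itself; the name `stableValue` is made up for the sketch, and it assumes JSON-like data (plain objects, arrays, primitives).

```javascript
// Rebuild a JSON-like value with object keys inserted in sorted order.
// Arrays keep their element order; only object keys are reordered.
function stableValue(value) {
  if (Array.isArray(value)) return value.map(stableValue);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const key of Object.keys(value).sort()) {
      out[key] = stableValue(value[key]);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

const shuffledOne = { b: 1, a: { d: [{ z: 1, y: 2 }], c: 3 } };
const shuffledTwo = { a: { c: 3, d: [{ y: 2, z: 1 }] }, b: 1 };
const stableOne = JSON.stringify(stableValue(shuffledOne));
const stableTwo = JSON.stringify(stableValue(shuffledTwo));
```

Both inputs serialize to the same string after conversion, including the object nested inside the array. One caveat of the JS model: keys that look like array indices ("0", "1", ...) are always enumerated first in numeric order, which is still deterministic but is not plain string order.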

The error message was also corrected: it had wrongly implied LOOMCYCLE_CODE_AGENTS_DETERMINISTIC=1 fixes all divergence. It now names non-deterministic key order as a distinct cause and notes that loomcycle sorts input.metadata while the agent must normalize any other serialized object (e.g. an object built inside JS from non-deterministic sources).

The normalizeMatch deletion is the soundness test of the fix. JobEmber.ai's first response to the divergence was a defensive workaround: a fixed-key normalizeMatch function inside the JS agent that built a new object literal with the keys in a known order before serializing (commit 13832fd). That worked - replays stopped diverging. Once loomcycle PR #366 landed and was deployed, the next commit (7fc408c) deleted normalizeMatch and stringified the slice directly. Replay staying divergence-free without the workaround is the test that the fix is correct. Same shape as a backend regression test: it's not the patch that proves the fix; it's the absence of the workaround.
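For readers who have not seen the fixed-key pattern, the workaround had roughly this shape. The actual normalizeMatch from commit 13832fd is not shown in this post, so the field names (`id`, `title`, `score`) are hypothetical.

```javascript
// Build a fresh object literal so keys are always inserted in the same order,
// whatever order the incoming object happened to have.
function normalizeMatch(match) {
  return { id: match.id, title: match.title, score: match.score };
}

const orderedMatch = { id: "m1", title: "Engineer", score: 0.9 };
const scrambledMatch = { score: 0.9, id: "m1", title: "Engineer" };
const normalizedOrdered = JSON.stringify(normalizeMatch(orderedMatch));
const normalizedScrambled = JSON.stringify(normalizeMatch(scrambledMatch));
```

The trade-off is that a fixed-key literal silently drops any field it does not list, so every schema change has to touch the normalizer. That maintenance cost is part of why deleting it was the better outcome.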

Why this was hidden behind the LLM-orchestrator path. An LLM agent doesn't replay JS. It has no analogue to "rebuild the runtime each turn and re-convert metadata"; the LLM consumes RunInput as text. Map-key order has never mattered for LLM agents. The instant the orchestration moved to code-js and an orchestrator started serializing input.metadata inside JS, the latent non-determinism in rt.ToValue became visible - and visible only for chunk-in-JS orchestrators that pass objects through metadata, not pre-built prompt strings.

What this costs now vs what it cost before

The cost story per orchestrator is straightforward. Each LLM-orchestrator run was on the order of 8,000 tokens - model output is small (a few tool_use calls), but input grew with the system prompt + the worker prompt template + each worker's emitted result the orchestrator had to read to know whether to spawn more. On a tier-2 model at typical retail pricing that's a few cents per run; at JobEmber.ai's daily search cadence across active users it added up to a meaningful share of the agentic-pipeline bill.

Across the four orchestrators, job-search-batch was the only one that had been an LLM agent - the other three were TS Promise.allSettled, free per run. So the raw token-cost collapse is for one of four. But the other wins apply uniformly to all four:

One run-id to monitor + cancel. The TS Promise.allSettled shape gave N orphan run-ids and no cascading cancel. A code-js orchestrator is one run, and cancelling it cascades to every child via loomcycle's multi-replica cancel primitive. The "Stop" button on the UI now actually stops the work.
Scheduler-fireable. A TS route handler runs when the route is hit. A loomcycle agent fires on cron via ScheduleDef. The autonomous "3 a.m. job search" flow that's been on the roadmap doesn't need a separate scheduler component - the existing ScheduleDef points at the new code-js agent and the same fan-out runs every night.
Replay-deterministic by construction. A TS fan-out crashes mid-way and you've lost the work in progress. A code-js orchestrator that gets interrupted at any turn resumes from the transcript with the same partition, the same workers, the same metadata - and (per PR #366) the same byte-stable serialization. Cancel → resume → continue.
No hallucination on the orchestration logic. The JS version of job-search-batch doesn't have an opinion about whether to spawn workers serially or in parallel. It calls Agent.parallel_spawn. Every run.

The pattern worth taking forward

If a step in your agentic pipeline can be expressed as a 30-line deterministic function, it should not be an LLM agent. LLM agents are good at everything else - the work where a language model is genuinely irreplaceable. Routing, partitioning, chunking, reducing, formatting - those are not language tasks. Asking an LLM to do them costs tokens, introduces non-determinism, and occasionally ends with a model "creatively" reinterpreting its instructions.

The reason this pattern hadn't been more common in agentic systems is that the JS-orchestrator path required a host filesystem mount, which doesn't survive cloud / container / n8n interactive deployments. v0.20.0 closed that; v0.21.0 (the two fixes above) closed the runtime gaps the first real fan-out adopter surfaced. The path is now clear:

Identify the orchestration steps in your agentic pipeline that are deterministic.
Register them as code-js agents via ensureCodeAgent with inline code_body.
Pass non-secret parameters through the v0.21.0 metadata channel (the prompt stays for the LLM workers).
Set run_timeout_seconds on the AgentDef to cover the longest expected fan-out wave.
If you serialize input.metadata objects inside JS (the chunk-in-JS shape), loomcycle v0.21+ sorts their keys for you; normalize any other objects you serialize yourself.

Companion reading: Code agents without a host filesystem (the v0.20.0 inline code_body primitive that made this whole story possible), Code as agent - and the design we replaced before shipping (the original RFC J writeup that introduced code-js), and We inverted a startup race - and found four asymmetries (the same lesson - every dynamic-path seam must work the same as the static-path seam - applied to MCPServerDef).