§ release note

Code as agent. The design we replaced before shipping.

2026-05-31 · by Dennis Gubsky · ~9 min read

RFC J shipped in v0.16. provider: code-js runs operator-authored JavaScript via goja instead of calling an LLM API. An AgentDef references it like any other provider; from the outside it is an agent - same loop, same OTEL spans, same scheduler / webhook / A2A reachability, same sub-agent composition, at zero token cost. JobEmber.ai's nightly ATS scrape (four job boards, dedupe against per-user memory, publish fresh jobs to a channel) now runs as one of these. No LLM in the pipeline. ~200ms wall time. Zero tokens.

That's the feature. The engineering punchline is that we built it twice. The first design - a parked-goroutine continuation model with state held across Provider.Call invocations - was reviewed, integration-tested, working, and a few honest engineering concerns away from being unmaintainable. The second design - stateless replay using the run's existing transcript as a durable memoization log - superseded it before either shipped to a release. Both designs are in main history; PR #307's commit subject says it plainly: "stateless replay model (supersedes #306)."

The honest engineering core of v0.16 is the architectural reversal, not the feature. Three trade-offs forced it. The result is a runtime where suspend/resume is symmetric for free, not because we engineered the suspend half - because we removed it.

The fundamental problem - JS code that calls tools

The single design constraint that shaped everything: operator JavaScript needs to make tool calls. Not pretend tool calls. The real ones - Memory.get(...) through the substrate's pluggable backend, WebFetch({url}) through the loop's HTTP host-allowlist, mcp__server__tool(...) through the same MCP client an LLM agent would call. With the loop's existing context, hooks, OTEL spans, credential substitution. Otherwise we'd ship a code-agent that pretends to be an agent but isn't reachable through the substrate's existing surfaces.

That constraint locked a few decisions immediately. The provider streams EventToolCall + StopReason "tool_use" exactly like an LLM driver. The loop dispatches the tool call (its ctx, its hooks, its OTEL recorder, its credential substitution). The loop re-invokes Provider.Call with the result appended to the transcript. The provider never imports internal/tools. Leaf-package layering is preserved; nothing in this part of the design is novel. The interesting part is what happens between turns.

Operator JS is most naturally synchronous. Promise/async support in goja is functional but rough; an operator writing const text = WebFetch({url}) expects a string back, not a Promise to await. So the JS must look like it suspends - pausing on the tool call, resuming when the result arrives, synchronous from the operator's perspective. The question is what suspends and how. That's where the two designs diverge.
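To make the "synchronous from the operator's perspective" requirement concrete, here is a minimal sketch of the shape an operator script might take. The three bindings are in-file stand-ins so the snippet runs anywhere; their argument shapes (a key for Memory.get, a topic plus item for Channel.publish) and the example URL are assumptions for illustration, not the documented tool schemas.

```javascript
// Hypothetical stand-ins for the substrate's tool bindings. In a real run the
// provider injects these; the argument shapes below are assumptions.
const seen = new Set(["job-1"]);
const published = [];

const WebFetch = ({ url }) =>
  JSON.stringify([{ id: "job-1", board: url }, { id: "job-2", board: url }]);
const Memory = { get: (key) => seen.has(key) };
const Channel = { publish: (topic, item) => { published.push({ topic, item }); } };

// The operator script itself: plain synchronous code, no await anywhere.
function run() {
  // WebFetch hands back a string, not a Promise.
  const text = WebFetch({ url: "https://boards.example/jobs" });
  const fresh = JSON.parse(text).filter((job) => !Memory.get(job.id));
  for (const job of fresh) Channel.publish("fresh-jobs", job);
  return `published ${fresh.length} fresh job(s)`;
}

const summary = run();
console.log(summary);
```

Nothing in run() knows whether a tool call returned immediately or after a suspend; that is the property both designs had to deliver.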

Design A - the parked-goroutine continuation (PR #306)

The first design lived in PR #306 and called itself Appendix-A Mechanism 1 in the RFC. The shape:

The JS runs on its own goroutine inside the goja runtime. A bound Go function - the tool surface - parks the goroutine on a Go channel when the operator's JS calls a tool. The Go side returns EventToolCall to the loop; the loop dispatches; the result round-trips back. The provider's Call matches the result against a parked continuation (keyed by a token minted into the tool_use ID), unblocks the channel, and the JS goroutine resumes synchronously - const x = Memory.get(...) returns the value as if the parked time never happened.

It worked. The integration test drove the real loop.Run against two JS tool calls; each round-tripped via the loop's Dispatcher.Execute and the result returned synchronously into the JS. Operator JS was natural synchronous code. 24 concurrent isolated runs passed clean; a crafted resume failed loud; CPU-bound loops were killed by rt.Interrupt. Sound from a behavior perspective.

Three problems we couldn't unsee.

Problem 1 - the provider held state across Provider.Call invocations. Every other provider in the runtime (anthropic, openai, ollama, mock, mock-stable) is stateless across calls. The loop hands them the transcript; they return a response; they don't remember anything between calls. code-js needed to keep the parked goroutine and its goja runtime alive between the loop's dispatch turns - typically a registry keyed by run token, with a leak-backstop sweeper to clean up runs that never resumed. Every other provider in the runtime obeyed an invariant; this one didn't.

Problem 2 - cancellation depended on a goja quirk. goja's rt.Interrupt can stop running JS, but it cannot break a host function parked on a Go channel - goja issue #97. So the cancel path had to be a Go ctx deadline, with rt.Interrupt wired additionally for CPU-bound JS that never reaches a tool call. Two cancel mechanisms; one upstream issue documenting the constraint we'd inherited. Tests covered both paths, but the design depended on knowing about the quirk; a future goja upgrade resolving #97 wouldn't be a behavior change but the comments would lie about why.

Problem 3 - not resumable across restart or replica. The parked goroutine lives in process memory. If loomcycle restarts mid-call, the continuation is gone. The run is stranded. The transcript on disk shows the tool result but no JS state to resume into. We documented this as "Not resumable across restart" in the help topic and called it a sharp edge. The other providers don't have this problem - they're stateless, so a restart picks up at whatever the transcript says is the latest turn.

These three problems all pointed at the same thing: the design held state where the rest of the runtime doesn't, and the runtime's existing properties (statelessness, restart-resumability, cluster portability) didn't apply to code-js as a result. We'd built a feature that opted out of three architectural invariants the rest of the substrate paid for.
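The held-state problem is easier to see in miniature. The sketch below is an analogy in plain JavaScript, not the Go code from PR #306: a generator stands in for the parked goroutine, and a Map stands in for the registry of continuations. The token format, the program, and the tool results are all hypothetical.

```javascript
// Design A analog: the continuation is live in-memory state between calls.
const parked = new Map(); // token -> suspended generator (the state that must be held)
let nextToken = 0;

// Hypothetical operator program: each `yield` is a tool call that parks.
function* program() {
  const cursor = yield { name: "Memory.get", args: { key: "cursor" } };
  const page = yield { name: "WebFetch", args: { url: "https://boards.example/" + cursor } };
  return `fetched ${page.length} chars`;
}

// One provider call: start a run, or resume the continuation named by `token`.
function providerCall(token, toolResult) {
  let step;
  let gen;
  if (token === undefined) {
    gen = program();
    step = gen.next();
  } else {
    gen = parked.get(token);
    if (!gen) throw new Error("no parked continuation for token " + token);
    parked.delete(token);
    step = gen.next(toolResult);
  }
  if (step.done) return { type: "final", text: step.value };
  const newToken = `tok-${nextToken++}`;
  parked.set(newToken, gen); // held until the loop comes back with a result
  return { type: "tool_call", token: newToken, call: step.value };
}

let out = providerCall();
out = providerCall(out.token, "page-2");
const stateHeldMidRun = parked.size; // a live continuation exists between calls
out = providerCall(out.token, "<html></html>");
console.log(out.text);

// Simulated restart: process memory is gone, so a pending token cannot resume.
const pending = providerCall();
parked.clear();
let stranded = false;
try { providerCall(pending.token, "value"); } catch { stranded = true; }
console.log({ stateHeldMidRun, stranded });
```

The Map is the whole problem: it needs a sweeper for runs that never resume, and it does not survive a restart or exist on another replica.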

Design B - the stateless replay (PR #307)

The replay design started from a single observation: the transcript already contains every prior tool call and its result. The transcript is durable. The transcript is the memoization log.

The model becomes:

Each Provider.Call builds a fresh goja runtime. Nothing held across calls.
The runtime starts executing the operator's JS from the top.
Every tool binding is wired so that on call, it walks the run's existing transcript (in req.Messages) looking for an already-recorded result matching this call's position in execution order.
If found - the tool returns the recorded result synchronously. The JS keeps running.
If not found - this is the frontier. The tool calls rt.Interrupt; the JS stops; the provider returns EventToolCall to the loop.
The loop dispatches the tool (its ctx, hooks, OTEL, credentials), appends the result to the transcript, and re-invokes Provider.Call.
The next Call builds a fresh runtime, fast-forwards through the now-extended transcript, hits the next un-recorded call (or completes), repeat.

The JS is rebuilt and re-executed from the top on every turn. The transcript is the durable memoization log; every tool call dispatches exactly once because the transcript records its result the first time and replay returns the recorded value every subsequent time.
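The steps above can be sketched in a few dozen lines. This is a simulation in plain JavaScript under stated assumptions: the transcript is flattened to an array of { name, result } records (the real one lives in req.Messages), a thrown sentinel stands in for rt.Interrupt, and the program, tool results, and function names are hypothetical.

```javascript
// Sentinel thrown at the frontier; stands in for rt.Interrupt in this sketch.
class Frontier extends Error {
  constructor(call) { super("frontier"); this.call = call; }
}

// Hypothetical operator program. It receives one generic `tool` binding.
function program(tool) {
  const cursor = tool("Memory.get", { key: "cursor" });
  const page = tool("WebFetch", { url: "https://boards.example/" + cursor });
  return `fetched ${page.length} chars`;
}

// One stateless provider call: everything it needs arrives in `transcript`.
function providerCall(transcript) {
  let position = 0; // index of the next tool call in execution order
  const tool = (name, args) => {
    const recorded = transcript[position];
    if (recorded === undefined) throw new Frontier({ name, args }); // not recorded yet
    position += 1;
    return recorded.result; // replay: return the recorded value synchronously
  };
  try {
    return { type: "final", text: program(tool) };
  } catch (err) {
    if (err instanceof Frontier) return { type: "tool_call", call: err.call };
    throw err;
  }
}

// Stand-in for the loop: dispatch each tool once and append its result.
const dispatched = [];
function dispatch(call) {
  dispatched.push(call.name);
  return call.name === "Memory.get" ? "page-2" : "<html></html>";
}

const transcript = [];
let out = providerCall(transcript);
while (out.type === "tool_call") {
  transcript.push({ name: out.call.name, result: dispatch(out.call) });
  out = providerCall(transcript);
}
console.log(out.text, dispatched);
```

providerCall keeps no variables alive between invocations; the only thing that grows across turns is the transcript the loop owns.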

All three problems collapse:

No state across Provider.Call invocations. The runtime is built and discarded within each Call. The Provider contract every other driver obeys is restored. The leak-backstop sweeper, the registry of parked continuations, the run-token plumbing - all deleted.
Cancel is just the Call's ctx. No goja #97 dependency. No two-path cancel logic. The JS execution time inside a single Call is bounded by the run timeout; cancelling the ctx mid-Call stops the JS via rt.Interrupt and tears down the runtime on return. Single mechanism, same as every other provider.
Resumable across restart and replica for free. The transcript is in Postgres. A different replica's Call walks the same transcript, hits the same frontier, dispatches the same un-recorded tool. The substrate's existing restart-resumability and cluster portability apply to code-js without any code-js-specific work.
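The restart and replica claim follows directly from the provider being a pure function of the transcript. A compact sketch, with a JSON string standing in for the stored transcript and hypothetical names throughout:

```javascript
// Compact replay helper: returns the next un-recorded call, or the final text.
function replay(program, transcript) {
  const STOP = Symbol("frontier");
  let i = 0;
  let pending = null;
  const tool = (name, args) => {
    if (i < transcript.length) return transcript[i++].result; // already recorded
    pending = { name, args };
    throw STOP; // frontier reached
  };
  try {
    return { final: program(tool) };
  } catch (err) {
    if (err === STOP) return { frontier: pending };
    throw err;
  }
}

// Hypothetical operator program with two tool calls.
const program = (tool) => {
  const seenId = tool("Memory.get", { key: "seen" });
  const page = tool("WebFetch", { url: "https://boards.example/jobs" });
  return `${seenId}:${page}`;
};

// One process records a single tool result, then dies.
const stored = JSON.stringify([{ name: "Memory.get", result: "job-1" }]);

// Two cold starts, each with nothing but the stored transcript.
const onReplicaB = replay(program, JSON.parse(stored));
const onReplicaC = replay(program, JSON.parse(stored));
console.log(onReplicaB, onReplicaC);
```

Both cold starts stop at the same WebFetch call, which is all "resumable" has to mean here.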

Replay needs determinism - and determinism is now always-on

Replay only works if the JS executes the same way each time. Otherwise the fast-forward through the transcript could match a different sequence of tool calls than the recorded one, and the provider would dispatch something that doesn't match what already happened.

So ambient determinism is always on, not an opt-in. The runtime's Math.random() is seeded from a per-run seed (derived from the run ID). Date.now() and new Date() are anchored to the run's real start time + a monotonic per-call offset. No filesystem, no setTimeout, no fetch, no host-side I/O of any kind reaches the JS - the only sources of non-determinism inside the runtime are Math.random and the clock, and both are pinned per run: one seeded, the other anchored. Given a fixed transcript, the JS produces the same tool-call sequence every replay.
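A minimal sketch of what per-run ambient determinism could look like. The specific choices here are assumptions for illustration: FNV-1a to turn the run ID into a seed, mulberry32 as the PRNG, and a 1 ms step for the per-call clock offset. The text above only says the seed is derived from the run ID and the clock is anchored to the run's start time.

```javascript
// Hash a run ID to a 32-bit seed (FNV-1a; an illustrative choice).
function seedFromRunId(runId) {
  let h = 0x811c9dc5;
  for (let i = 0; i < runId.length; i++) {
    h ^= runId.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Replacement sources for Math.random and Date.now, rebuilt on every replay.
function makeAmbient(runId, startMs) {
  const random = mulberry32(seedFromRunId(runId));
  let calls = 0;
  return {
    random,
    now: () => startMs + calls++, // anchor + monotonic per-call offset
  };
}

// Two replays of the same run observe the same "random" values and timestamps.
const sample = (env) => [env.random(), env.random(), env.now(), env.now()];
const first = sample(makeAmbient("run-123", 1780185600000));
const replayed = sample(makeAmbient("run-123", 1780185600000));
console.log(JSON.stringify(first) === JSON.stringify(replayed));
```

For this to hold across restarts, startMs has to come from the durable run record rather than the replaying process's own clock.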

The --deterministic flag (LOOMCYCLE_CODE_AGENTS_DETERMINISTIC=1) didn't go away; it changed meaning. It now means freeze the seed and clock anchor across runs - useful for testing and replay scenarios where you want two independent runs of the same agent to produce identical output. Within-run determinism is the baseline; cross-run determinism is the opt-in.

The honest residual scope: a name mismatch against the recorded transcript sequence fails loud with code_agent_replay_divergence. If an operator modifies agent_code/scraper/index.js mid-run such that the JS's tool-call sequence on resume differs from what the transcript recorded, the provider doesn't silently dispatch the wrong tool - it surfaces the divergence as a terminal error. Loud, not silent, when state and code disagree.
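A sketch of the divergence check, assuming the comparison is on tool name at each position in execution order (the "name mismatch" described above). The error code is the one named in the text; the class name, message wording, and transcript shape are hypothetical.

```javascript
class ReplayDivergence extends Error {
  constructor(expected, got, position) {
    super(`transcript recorded ${expected} at position ${position}, code called ${got}`);
    this.code = "code_agent_replay_divergence";
  }
}

// Build a tool binding that replays `transcript` and checks names as it goes.
function replayTool(transcript) {
  let position = 0;
  return (name, args) => {
    const recorded = transcript[position];
    if (recorded === undefined) throw new Error("frontier"); // normal suspend path, elided here
    if (recorded.name !== name) throw new ReplayDivergence(recorded.name, name, position);
    position += 1;
    return recorded.result;
  };
}

const transcript = [{ name: "Memory.get", result: "job-1" }];

// Unmodified code: the first call matches what the transcript recorded.
const ok = replayTool(transcript)("Memory.get", { key: "seen" });

// Code edited mid-run: the first call is now WebFetch, which was never recorded there.
let failure = null;
try {
  replayTool(transcript)("WebFetch", { url: "https://boards.example/jobs" });
} catch (err) {
  failure = err;
}
console.log(ok, failure.code);
```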

The trade-offs we explicitly accepted

Three of them, named loudly because they're real.

O(N²) re-execution cost. A code-agent that makes N tool calls re-executes from the top on each of the N Provider.Call invocations. The first call dispatches tool 1, the second re-runs the prelude + tool-1-replay + dispatches tool 2, the third re-runs the prelude + tool-1-replay + tool-2-replay + dispatches tool 3, and so on. For a code-agent making 5 tool calls with a tiny prelude, this is invisible. For one making 50 tool calls with a heavy prelude (large data prep, complex parsing before the first tool call), it's measurable. The mitigation is operator-side: anything that runs before tool call k is re-executed on every Call from k onward, so keep heavy work out of the prelude and, where possible, inside the tools themselves. The bundled ats-scraper example does exactly this - the multi-board scraping happens via WebFetch tool calls; the JS's actual code is small. The pattern wants you to do the work in the tools, not in the JS preludes.
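The quadratic term is simple arithmetic, and counting it out makes the 5-versus-50 comparison concrete. The sketch assumes the model described earlier, and also counts the one final Call that replays every recorded result and runs to completion; the function name is hypothetical.

```javascript
// Count work under replay for an agent that makes n tool calls.
function replayCost(n) {
  let preludeRuns = 0;
  let replayedCalls = 0;
  let dispatches = 0;
  // Call k (1-based) replays k-1 recorded results, then dispatches tool k.
  for (let k = 1; k <= n; k++) {
    preludeRuns += 1;
    replayedCalls += k - 1;
    dispatches += 1;
  }
  // The final Call replays all n results and returns the agent's output.
  preludeRuns += 1;
  replayedCalls += n;
  return { preludeRuns, replayedCalls, dispatches };
}

console.log(replayCost(5));  // replayedCalls: 15
console.log(replayCost(50)); // replayedCalls: 1275
```

Dispatches stay linear (each tool runs once); only the cheap transcript lookups grow as N(N+1)/2 under this counting, while the prelude cost is multiplied by N+1. That is why a heavy prelude, not the tool count itself, is what shows up in wall time.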

The JS surface is synchronous-blocking. No async/await. No parallel tool calls from inside the JS. Operators wanting parallel tool calls fan out via Agent.spawn(...), which is already concurrent at the loomcycle layer - child runs execute in parallel by default. This was the trade-off in the parked-goroutine design too; replay didn't change it.

Allowed-tools is the only sandbox layer beyond goja. The sandbox is goja's runtime: no eval, no Function constructor (both deleted from the runtime at boot), no fetch, no filesystem, no setTimeout. The remaining surface is whatever the operator's allowed_tools permits. A tool not in allowed_tools has no JS binding at all - a code-agent without Memory in its allowed tools sees ReferenceError: Memory is not defined, not a permission-denied error. Default-deny by construction.
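Default-deny by construction can be illustrated without goja. In this sketch the host builds a scope containing only the allowed bindings, so a disallowed tool is simply an unbound identifier. It uses the Function constructor on the host side purely as a convenient way to control scope in plain JavaScript (the real sandbox removes that constructor from the operator's side); the helper name and stub implementations are hypothetical.

```javascript
// Run `source` with bindings for the allowed tools only.
function runWithAllowedTools(source, allowedTools, implementations) {
  const names = allowedTools.filter((name) => name in implementations);
  const values = names.map((name) => implementations[name]);
  // Each allowed tool becomes a parameter name; anything else stays unbound.
  const fn = new Function(...names, `"use strict"; ${source}`);
  return fn(...values);
}

// Stub tool implementations for the demo.
const implementations = {
  WebFetch: ({ url }) => `body of ${url}`,
  Memory: { get: (key) => null },
};

const source = `return typeof WebFetch + "/" + Memory.get("seen");`;

// With Memory allowed, the script runs.
const withMemory = runWithAllowedTools(source, ["WebFetch", "Memory"], implementations);

// Without it, the name does not exist at all.
let denied = null;
try {
  runWithAllowedTools(source, ["WebFetch"], implementations);
} catch (err) {
  denied = err;
}
console.log(withMemory, String(denied));
```

There is no permission check to get wrong: the binding is either present or it was never created.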

One naming detail that needed fixing during review

The early PR #307 commits used lowercase JS bindings (memory.get(...), channel.publish(...), agent.spawn(...)) while flat tools (WebFetch, WebSearch) kept their canonical CamelCase. The result was a three-way split - substrate Memory / yaml Memory / JS memory - and an inconsistency between meta-tools and flat tools.

The final commit (refactor(rfc-j): CamelCase tool names in JS - consistent with the substrate) reverses the split. The JS surface is now CamelCase everywhere: Memory.get(...), Channel.publish(...), Agent.spawn(...), WebFetch(...), mcp__server__tool(...). The only distinction in the JS is shape, not case: the three multi-op meta-tools (Memory, Channel, Agent) are objects with a method per op; every other allowed tool is a flat function. One rule: a tool's name is identical everywhere, no casing translation.
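The "shape, not case" rule fits in one small factory. In the sketch below, the op table lists only the ops named in this post, and folding the op into the tool input as an op field is an assumption made for illustration; the actual wire shape is not described here.

```javascript
// Hypothetical op table for the three multi-op meta-tools.
const META_TOOL_OPS = new Map([
  ["Memory", ["get"]],
  ["Channel", ["publish"]],
  ["Agent", ["spawn"]],
]);

// Build the JS binding for one allowed tool. `call(name, input)` is the single
// path every binding funnels into (a stand-in for the replay/dispatch hook).
function makeBinding(name, call) {
  const ops = META_TOOL_OPS.get(name);
  if (!ops) return (input) => call(name, input); // flat function
  const binding = {};
  for (const op of ops) {
    binding[op] = (input) => call(name, { op, ...input }); // object, one method per op
  }
  return binding;
}

// Record what reaches the dispatch path.
const calls = [];
const record = (name, input) => { calls.push({ name, input }); return "ok"; };

const Memory = makeBinding("Memory", record);
const WebFetch = makeBinding("WebFetch", record);
const mcp__server__tool = makeBinding("mcp__server__tool", record);

Memory.get({ key: "seen" });
WebFetch({ url: "https://boards.example/jobs" });
mcp__server__tool({ query: "x" });
console.log(calls.map((c) => c.name));
```

The name passed to call is the same string the operator typed and the same string in allowed_tools; no lookup table translates between casings.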

Small detail, real cognitive cost. Worth the late refactor.

The architectural insight is the meta-engineering one. The parked-goroutine design wasn't wrong about how to suspend JS. It was wrong about where state lives. The substrate already had a durable, content-addressed, replica-portable, restart-resumable record of every tool call the agent ever made: the transcript. The first design built a parallel state machine - registry, run tokens, cleanup sweepers, leak backstops - to hold the same information the substrate already held. Replay removed the parallel state. Suspend/resume is now symmetric for free because nothing is held across the loop's dispatch gap: the suspend half doesn't exist, so the resume half is just "read the transcript and pick up where you left off." The cheapest engineering is the engineering you don't do.

What you can do with it today

On v0.16.0, with LOOMCYCLE_CODE_AGENTS_ENABLED=1, an AgentDef declaring provider: code-js resolves agent_code/<name>/index.js from $LOOMCYCLE_CODE_AGENTS_ROOT (default ./agent_code) and runs it through the replay model. The bundled ats-scraper example shows the canonical shape - fetch a few job boards via WebFetch, dedupe against per-user Memory, publish fresh jobs to a Channel, return a summary string. Zero tokens. Scheduled at 3am via a ScheduleDef, the run gets the scheduler's existing per-tenant fairness, an OTEL span, and an A2A-reachable agent surface - same as any LLM agent.

The validator fails loud at boot for any provider: code-js agent whose JS is missing or unparsable, naming the agent and the path. Don't ship a code-agent that breaks at first scheduled fire when the operator's looking the other way.
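A rough sketch of that boot-time check, under stated assumptions: sources are read through an injected readSource function (an in-memory Map here), parsing is approximated with the host's Function constructor, which compiles without executing, and the error wording is invented. The real validator parses with goja and its messages may differ.

```javascript
// Return one error string per code-js agent whose script is missing or unparsable.
function validateCodeAgents(agents, root, readSource) {
  const errors = [];
  for (const { name } of agents) {
    const path = `${root}/${name}/index.js`;
    const source = readSource(path);
    if (source === undefined) {
      errors.push(`${name}: missing ${path}`);
      continue;
    }
    try {
      new Function(source); // compile only; the body is never called
    } catch (err) {
      errors.push(`${name}: unparsable ${path}: ${err.message}`);
    }
  }
  return errors;
}

// In-memory stand-in for the agent_code directory.
const files = new Map([
  ["agent_code/scraper/index.js", "const x = WebFetch({ url: 'https://boards.example' });"],
  ["agent_code/broken/index.js", "const = ;"],
]);

const errors = validateCodeAgents(
  [{ name: "scraper" }, { name: "broken" }, { name: "ghost" }],
  "agent_code",
  (path) => files.get(path),
);
console.log(errors);
```

A non-empty result at boot is the cue to refuse to start, rather than discovering the problem at the 3am fire.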

Companion reading: Scheduled runs at 30,000 fires (the scheduler code-agents fire from, with the same 100% completion guarantee for replay-based agents as LLM agents), Two memory interfaces (the Memory primitive the code-js sandbox routes through, with both flat-KV and layered paradigms now available), and the bundled Context.help code-agents topic on any post-v0.16 build for the full operator picture.