§ release digest

Budgets, costs, and encrypted credentials.

2026-07-04 · by Dennis Gubsky · v1.9.0 → v1.11.1 · ~14 min read

Three days. Five releases. Four arcs, deeply intertwined. Last weekend's release digest closed with a note about what a shared multi-tenant deployment needs next: a tenant's own provider keys, per-scope cost accounting, and enforceable budgets. All three shipped this week, and a whole-repo security review that shipped alongside them makes the substrate credible for anything above homelab use.

The four arcs, in the order they matter to an operator running loomcycle for a team: hardening (a proper security review closed 17 findings under v1.9.1, plus an MCP thin-client that self-recovers from dropped upstreams), encrypted credentials (CredentialDef lands as a substrate Def family under RFC AR, sealed per-tenant secrets, $cred: binding for HTTP MCP servers, tenant provider-key overrides by env-var name), cost attribution (RFC AV lands a per-call token-usage ledger, a GET /v1/_usage report grouped by any of tenant/user/provider/model/source, and a Web UI Usage page), and token budgets (RFC AW ships per-scope soft-warn and hard-cap tiers, a new EventLimit surfaced on every transport, and a Web UI Limits page that accepts K/M/G shorthand).

The short version. A whole-repo security review closes 17 findings across gRPC tenant isolation, A2A peer auth, mem9 SSRF, secret exposure, grep symlink escape, and provider driver correctness (v1.9.1). CredentialDef becomes a first-class substrate primitive: envelope AES-256-GCM, per-tenant DEK derived via HKDF from a deployment KEK, sealed rows only, plaintext never in the transcript. The tenant/user provider-key override lands the "a tenant bills its own Anthropic account" story that RFC L gestured at a year ago. RFC AV wires a per-call token-usage ledger with a source label so operator-key spend and tenant-key spend fall out of the same rows. RFC AW turns that ledger into an enforced budget: most-restrictive of operator/tenant/user wins, soft-warn Interruption before the ceiling, hard-cap 429 at admission, and a new EventLimit event on SSE, gRPC, and MCP so a UI or an adapter can render the crossing inline in the run's transcript.

Arc 1 — the hardening pass (v1.9.1)

Since v1.0 the loomcycle codebase has grown in surface faster than the small maintainer team could keep up with on code review alone. A proper whole-repo security review this week produced 17 findings across five categories: tenant-isolation gaps, secret-exposure gaps, provider-driver correctness bugs, MCP resilience, and a few concurrency edges. Each finding got its own fix, its own regression test, and its own merged PR. v1.9.1 is the tagged snapshot that folds all of them into the deploy pins.

Tenant-isolation fixes

Four different surfaces had places where the tenant boundary was resolved from the wire instead of from the authenticated principal. Under the RFC L trust model the runtime always resolves tenant from the bearer, never the payload; the review found four holes in that rule.

gRPC read + channel RPCs did not check that the caller's tenant matched the requested resource. A tenant operator could read another tenant's runs or subscribe to another tenant's channel by passing the target run/channel id. Fix: every gRPC read RPC now runs through grpcTenantScope, the same helper the HTTP handlers use. Cross-tenant reads return NotFound, not PermissionDenied, so tenant membership isn't leaked by the response shape.

A2A peer authentication was still going through the legacy pre-RFC L bearer path, which meant a peer runtime could authenticate but the resolved principal didn't carry the OperatorTokenDef's tenant/scope info. Fix: A2A now authenticates peers via the same operator-token substrate as everything else. A2A calls now carry tenant identity through the same code path as an HTTP call from the same operator.

mem9 base_url SSRF + API-key exfil. The mem9 memory backend takes a base URL and an API key; a model-authored config could set the base URL to a private internal host, causing the runtime to leak the mem9 API key to that host, or to fetch attacker-controlled content into the memory backend. Fix: block SSRF and API-key exfil at the mem9 config layer. Config-supplied base URLs are validated against an allowlist; model-authored config edits are refused if they change the base URL.

Run cancel + interrupt-resolve tenant gate. These two write RPCs (across HTTP and gRPC) resolved the target run by id but did not verify the caller's tenant against the run's tenant. Fix: both now tenant-gate the target before acting. Cross-tenant cancel/interrupt attempts return the opaque cross-tenant miss.

Secret-exposure fixes

Four gaps closed. The tool-call transcript was capturing full env-var payloads on MCP server startup (the payload was redacted on read but stored in full on write), so an operator opening a raw transcript row saw the plaintext. The audit-log writer was capturing full Authorization headers on inbound webhook rows. The Context op=self response was returning the operator-token pepper as a member of the config summary. And the boot log printed the OpenAI Bearer at INFO level when the driver retried a transient connect. All four now redact at write time, not read time, which is the only ordering that holds up. If the write is redacted at the source, no downstream reader (an admin, a snapshot restore, a log grep) can find the plaintext regardless of what redactor toggles are set.
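
As a rough illustration of redact-at-write, the sketch below scrubs sensitive headers before a row is persisted. The header list, the placeholder string, and the function name are assumptions, not loomcycle's actual redactor.

```go
package main

import "strings"

const redacted = "[REDACTED]"

// sensitiveHeaders lists header names whose values must never be stored.
// The set here is illustrative only.
var sensitiveHeaders = map[string]bool{
	"authorization":       true,
	"proxy-authorization": true,
	"x-api-key":           true,
}

// redactHeaders returns a copy that is safe to persist. The caller's map is
// left untouched so the live request can still use the real values.
func redactHeaders(h map[string]string) map[string]string {
	out := make(map[string]string, len(h))
	for k, v := range h {
		if sensitiveHeaders[strings.ToLower(k)] {
			out[k] = redacted
		} else {
			out[k] = v
		}
	}
	return out
}
```

The writer would persist redactHeaders(req) and nothing else, so the stored row never contains the secret in the first place.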

Provider driver correctness

Six correctness bugs surfaced by the review: four in provider drivers, plus one in the Grep builtin and one in the run-state event bus.

Anthropic: replay the thinking block on tool-use continuations. A run that got a thinking-mode response with a tool_use content block was continuing the run without replaying the earlier thinking content block back to Anthropic. Anthropic refused the continuation with an invalid_request_error pointing at the missing block. Fix: replay the thinking content block on every tool-use continuation, alongside the tool_use blocks. Reasoning models are now safe to combine with tool use inside a run.

OpenAI: reasoning models use max_completion_tokens, not max_tokens. Passing max_tokens to an o-series reasoning model was silently accepted at the API layer but ignored; the model would stop only at its own max_completion_tokens default (much higher). Fix: the OpenAI driver now uses max_completion_tokens for reasoning models. The per-agent max_tokens setting now actually caps the reasoning-model output.

Ollama: surface in-stream error frames. Ollama's SSE stream carries error frames as JSON objects with an error field. The driver was reading the frame, parsing the JSON, and returning empty text (silent success). Fix: surface the error frame as a proper Go error with the frame's text. An operator sees "model out of memory" or whatever the ollama daemon actually reported, instead of a silent success and a confusingly empty response.
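
The shape of that fix, sketched under assumptions: the error field is the one named above, while the response text field and the function name are guesses made for this example rather than the driver's real frame type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// streamFrame models only the two fields this sketch needs from one frame.
// "response" as the text field is an assumption.
type streamFrame struct {
	Response string `json:"response"`
	Error    string `json:"error"`
}

// decodeFrame returns the frame's text, or a Go error when the frame is
// malformed or carries an in-stream error.
func decodeFrame(raw []byte) (string, error) {
	var f streamFrame
	if err := json.Unmarshal(raw, &f); err != nil {
		return "", fmt.Errorf("decode stream frame: %w", err)
	}
	if f.Error != "" {
		// Previously this path fell through and returned empty text.
		return "", fmt.Errorf("ollama stream error: %s", f.Error)
	}
	return f.Response, nil
}
```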

DeepSeek: thinking-model downgrade goes to deepseek-chat, not -flash. When a thinking-mode DeepSeek run gets a rate-limit or unavailable error, the fallback logic tried to swap in deepseek-v4-flash as the non-thinking sibling. But -flash doesn't support thinking either, and the effort hint that had been set for the initial call was surviving the swap. The re-call re-enabled thinking on the wrong sibling and 400'd. Fix: downgrade to deepseek-chat, which is the correct non-thinking model, and drop the effort hint at the swap. Combines with the fallback-vision fix from v1.7.1 to make the fallback path capability-aware in general.

Grep: re-check symlink containment. The Grep builtin walks a directory tree and opens each file. The walk phase was resolving symlinks against the caller's confinement root; the open phase was not re-checking. So a symlink that resolved inside the root during the walk but pointed at an out-of-root target by open time (a TOCTOU) would open the out-of-root file. Fix: re-verify each path is inside the confinement root immediately before os.Open. Same shape as the resolveInsideRoot guard that already gates Read/Write; grep just needed the guard too.

Runstate: deliver events under the lock. The run-state event bus delivered to subscribers under a read lock, but the "close the channel and terminate" path took a write lock and closed the channel while a delivery was in flight. Result: a send-on-closed panic on a rare race. Fix: deliver events under the same write lock as the close-and-terminate path. Race window closes.
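
The one-lock pattern is easy to show in miniature. The eventBus type below is a hypothetical stand-in for the run-state bus, and dropping an event when a subscriber's buffer is full is a choice made for this sketch so the lock is never held across a blocking send.

```go
package main

import "sync"

// eventBus is a toy bus: one mutex guards both delivery and shutdown.
type eventBus struct {
	mu     sync.Mutex
	subs   []chan string
	closed bool
}

// subscribe registers a buffered channel; after shutdown it returns an
// already-closed channel.
func (b *eventBus) subscribe(buf int) <-chan string {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan string, buf)
	if b.closed {
		close(ch)
		return ch
	}
	b.subs = append(b.subs, ch)
	return ch
}

// publish delivers under the same lock shutdown takes, so a send can never
// overlap a close.
func (b *eventBus) publish(ev string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.closed {
		return
	}
	for _, ch := range b.subs {
		select {
		case ch <- ev:
		default: // subscriber is full: drop rather than block under the lock
		}
	}
}

// shutdown closes every subscriber channel exactly once.
func (b *eventBus) shutdown() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.closed {
		return
	}
	b.closed = true
	for _, ch := range b.subs {
		close(ch)
	}
	b.subs = nil
}
```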

MCP resilience: the thin client self-recovers

The final and most operator-visible fix. Every loomcycle user driving the runtime from Claude Code has hit the "session not found or expired" wall at some point: the loomcycle mcp --upstream thin client cached an upstream session id from its initial handshake, and any upstream restart (a container bounce, a snapshot restore, a network glitch) invalidated the id server-side. From then on every request from Claude Code got the session-not-found error until someone killed the thin-client subprocess and let it be re-spawned by /reload-plugins.

Last week's digest mentioned the v1.6.1 fix for a similar wedge on the HTTP transport. This week's fix hardens the same shape on both transports. On a 404 or a JSON-RPC -32001 ("session not found"), the thin client re-handshakes with the upstream, retries the original request, and returns the correct response to the client. Transparent to Claude Code. No visible reload. The "please reload plugins" dance is gone as a class of friction.
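
The recovery loop reduces to "on session-not-found, re-handshake once and retry". The sketch below models that with function fields; upstreamClient and errSessionNotFound are hypothetical names, and the mapping from an HTTP 404 or JSON-RPC -32001 to the sentinel error is assumed to happen inside send.

```go
package main

import "errors"

// errSessionNotFound stands in for a 404 or JSON-RPC -32001 from upstream.
var errSessionNotFound = errors.New("session not found or expired")

// upstreamClient is a toy thin client with a cached session id.
type upstreamClient struct {
	sessionID string
	handshake func() (sessionID string, err error)
	send      func(sessionID, request string) (string, error)
}

// call sends the request; on a stale session it re-handshakes and retries
// exactly once, so a persistent failure still reaches the caller.
func (u *upstreamClient) call(request string) (string, error) {
	resp, err := u.send(u.sessionID, request)
	if !errors.Is(err, errSessionNotFound) {
		return resp, err
	}
	id, herr := u.handshake()
	if herr != nil {
		return "", herr
	}
	u.sessionID = id
	return u.send(u.sessionID, request)
}
```

Capping the retry at one attempt keeps a genuinely broken upstream from turning into a handshake loop.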

Arc 2 — CredentialDef: the encrypted per-tenant credential store (RFC AR)

A month ago RFC L shipped multi-tenant isolation for runs, memory, and volumes. It didn't ship anything for the substrate's actual load-bearing secrets: provider API keys, third-party MCP server tokens, per-user Telegram or Slack bots. Those all lived in the deployment env at boot, which meant every tenant used the operator's Anthropic key, every user shared one Slack token per tenant, and rotating any of them meant restarting the runtime with a new env.

CredentialDef is the substrate answer. A new Def family, tenant-scoped, keyed on (tenant, scope, scope_id, name), holding sealed ciphertext or an external-backend pointer. Never plaintext at rest. Scope precedence is agent > user > tenant at resolve time, so a per-user token shadows a tenant default, and a tenant default shadows the operator host default. It landed across three tightly-scoped PRs.
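
The precedence walk can be sketched against an in-memory map. Everything here is illustrative: the type names are hypothetical, the map holds plain strings where the real table holds sealed rows, and using the tenant id as the tenant-scope scope_id is an assumption.

```go
package main

// credKey mirrors the (tenant, scope, scope_id, name) key described above.
type credKey struct {
	Tenant, Scope, ScopeID, Name string
}

// callerIdentity is a hypothetical per-run identity.
type callerIdentity struct {
	Tenant, User, Agent string
}

// resolveCredential walks agent, then user, then tenant scope and returns
// the first hit together with the scope that supplied it.
func resolveCredential(store map[credKey]string, c callerIdentity, name string) (value, scope string, ok bool) {
	candidates := []credKey{
		{c.Tenant, "agent", c.Agent, name},
		{c.Tenant, "user", c.User, name},
		{c.Tenant, "tenant", c.Tenant, name},
	}
	for _, k := range candidates {
		if k.ScopeID == "" {
			continue // caller has no identity at this scope
		}
		if v, found := store[k]; found {
			return v, k.Scope, true
		}
	}
	return "", "", false
}
```

A miss at all three scopes is where the operator host default would take over.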

The crypto + storage foundation

Envelope encryption, AES-256-GCM. Every deployment sets a master LOOMCYCLE_SECRET_KEY (the KEK). At store time the runtime derives a per-tenant DEK via HKDF-SHA256, seals the plaintext with a fresh nonce, and stores the ciphertext plus the KEK's fingerprint. The GCM AAD binds each ciphertext to its row identity (key_id | tenant | scope | scope_id | name), so a row copied to another tenant or renamed fails authentication at decrypt time. Fail-closed on a missing KEK: Seal and Open return ErrNoKey, never plaintext.

Key rotation supports current + previous KEK. LOOMCYCLE_SECRET_KEY_PREVIOUS opens old rows during the rotation window; a lazy re-encrypt migrates each row onto the current KEK the first time it's resolved. No forced re-encrypt sweep; the cost amortizes across normal reads.

The credential_defs table (migration 0051, both backends) holds sealed ciphertext or an external-backend pointer, never plaintext. Deliberately excluded from snapshots by omission: Snapshot and Restore don't serialize the definition column. The correct security posture is "secrets don't ride out on backup tapes," and CredentialDef inherits it from OperatorTokenDef's earlier precedent.

The $cred: binding for HTTP MCP servers

The consumption half. An HTTP or streamable-HTTP MCPServerDef declares a header like Authorization: Bearer $cred:slack_bot_token. At request time the per-run identity carries the caller's tenant + subject; the credential engine resolves slack_bot_token at the caller's scope (agent, then user, then tenant), opens the sealed row once for the call, injects the plaintext into the outbound header, and returns. The plaintext never lands in the transcript or the log.

An important shape point: the substitution is per-request, not per-server. The MCP client pool is a shared long-lived process; different runs served by the same pool get different resolved values on the wire. This is the load-bearing property that lets a shared Slack MCP server post as different users, from the same connection pool, without leaking a tenant's or a user's token to another run.
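
Per-request substitution can be sketched as a pure function from a shared header template to a fresh header map. The regular expression for credential names and the lookup callback signature are assumptions of this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// credRef matches $cred:<name>; the allowed name characters are assumed.
var credRef = regexp.MustCompile(`\$cred:([A-Za-z0-9_]+)`)

// bindHeaders builds the headers for one outbound request. The template is
// shared across runs and is never mutated; lookup resolves a name for the
// current caller only.
func bindHeaders(template map[string]string, lookup func(name string) (string, bool)) (map[string]string, error) {
	out := make(map[string]string, len(template))
	for k, v := range template {
		var missing string
		bound := credRef.ReplaceAllStringFunc(v, func(ref string) string {
			name := credRef.FindStringSubmatch(ref)[1]
			val, ok := lookup(name)
			if !ok {
				missing = name
				return ref
			}
			return val
		})
		if missing != "" {
			return nil, fmt.Errorf("credential %q not found", missing)
		}
		out[k] = bound
	}
	return out, nil
}
```

Because the resolved map is built per call and discarded afterwards, two runs sharing one pooled client never see each other's values.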

Stdio-transport MCP servers are excluded from per-user $cred: resolution because a stdio server is a pooled long-lived subprocess whose env is set once at spawn. Per-user tokens on a stdio MCP are documented as unsupported at v1; use an HTTP MCP wrapper if you need per-user routing.

Tenant + user provider-key override by env-var name

The pattern extends to the LLM provider drivers. Store a CredentialDef named ANTHROPIC_API_KEY at tenant scope, and every Anthropic call from that tenant's runs uses the tenant's Anthropic key instead of the operator's host key. Same shape for OPENAI_API_KEY, GEMINI_API_KEY, OLLAMA_API_KEY (hosted Ollama only; local-Ollama has no per-tenant key model). Same shape for BRAVE_API_KEY and DEEPSEEK_API_KEY.

Scope precedence carries. A user-scoped ANTHROPIC_API_KEY shadows the tenant default, which shadows the operator host. So a specific user can bill their own Anthropic account, while the rest of the tenant bills the tenant's account, while runs on the "no override" path bill the operator's account. The runtime decides at each call, from the ctx-carried CredentialResolver, without a config edit.

Model-availability probes stay on the operator key. The runtime needs to know which models exist before it can route a run to them; asking a tenant's key just to enumerate models is an operational and privacy misfit. So fetchModels uses the operator key, and only the inference path honors overrides. Same shape from the tenant's perspective: they pay for their own inference, not the operator's model catalog.

Fail-soft: no KEK, no stored credential, or a resolve error means the runtime falls back to the operator host key. Never a silent failure; never a run refused because a CredentialDef was misconfigured. The override is an opt-in improvement, not a required step.

Arc 3 — cost attribution (RFC AV)

The tenant/user override made the next question inescapable: whose key paid for which call? RFC AV lands the accounting layer. Per-call token-usage ledger, per-scope report, a Web UI page. Two-week arc squeezed into two days once RFC AR shipped.

The per-call ledger (Phase 1)

A new token_usage table (migration 0052, both backends) records one row per LLM call: the run id, the tenant/user/agent that owned the call, the provider and model that served it, four token buckets (input, cached-input, output, reasoning), the priced cost, and a credential_source label indicating whose key paid (operator, tenant, or user, with the paying scope's id in a companion column). Append-only. Per-call granularity is exact across a mid-run provider fallback, because each fallback's Provider.Call writes its own row.

Cost is DOUBLE PRECISION, nullable when unpriced (unknown model). Distinct from a genuine zero. A report row surfaces the count of unpriced calls in its group, so a Web UI can render "pricing incomplete for this group" instead of silently under-counting a tenant's spend.

The pricing table lives in the operator's yaml (pricing: section) with per-model $/Mtoken entries. Provider-reported cost wins when the driver returns one; otherwise loomcycle computes cost = (tokens / 1,000,000) × per-1M rate. Loomcycle owns the pricing table (as opposed to fetching it from a live pricing API) because a live-API dependency introduces a runtime failure mode where a pricing-service outage denies runs, and because per-model rates change slowly enough that the normal deploy cycle can keep up.
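
The pricing rule, sketched with a nil result standing in for the nullable cost column. The struct names and per-bucket rate fields are hypothetical, and the four buckets are treated as disjoint, which is an assumption about how the ledger counts cached input.

```go
package main

// modelRate holds USD per 1M tokens for each bucket (illustrative fields).
type modelRate struct {
	InputPerM, CachedInputPerM, OutputPerM, ReasoningPerM float64
}

// callUsage holds the four token buckets recorded per call.
type callUsage struct {
	Input, CachedInput, Output, Reasoning int64
}

// priceCall returns the cost of one call, or nil when the model has no
// pricing entry. nil is distinct from a genuine zero cost.
func priceCall(pricing map[string]modelRate, model string, u callUsage, providerCost *float64) *float64 {
	if providerCost != nil {
		return providerCost // provider-reported cost wins
	}
	r, ok := pricing[model]
	if !ok {
		return nil // unpriced: unknown model
	}
	cost := (float64(u.Input)*r.InputPerM +
		float64(u.CachedInput)*r.CachedInputPerM +
		float64(u.Output)*r.OutputPerM +
		float64(u.Reasoning)*r.ReasoningPerM) / 1e6
	return &cost
}
```

For example, 1,000,000 input tokens at $3 per 1M plus 200,000 output tokens at $15 per 1M comes to 3 + 3 = $6.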

One row per call means a run's total cost equals the sum of the run's ledger rows by construction: the runs.cost summary column is written at FinishRun as exactly that sum. A report grouped by run therefore gives the same total as the runs table.

The report endpoint (Phase 2a)

GET /v1/_usage returns an aggregated report over the ledger. Query parameters: group_by (any combination of tenant, user, provider, model, source), from/to (time window), tenant (admin-only focus filter).

The operator-vs-tenant split falls out of the group-by. Grouping by source gives you the operator bill (source = operator) vs the tenant-funded spend (source in tenant or user). Grouping by tenant gives each tenant's total consumption. Grouping by both gives a matrix.

Tenant-scoped by ScopeTenant. An admin sees everything with an optional ?tenant= focus. A substrate:tenant operator is confined to its own tenant's rows; cross-tenant leakage is not possible from a tenant bearer regardless of query.

Group-by dimensions are a whitelist; unknown dimensions return 400. This is the SQL-injection guard. Each query SELECTs the dimensions in a fixed canonical order (grouped columns first, ungrouped as empty string), so the scan can't drift on a stored dimension mismatch.
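
The whitelist step can be sketched as a parser that only ever returns names from a fixed list. The comma-separated query format and the function names are assumptions; the five dimensions are the ones listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalDims is the whitelist, in the order columns are emitted.
var canonicalDims = []string{"tenant", "user", "provider", "model", "source"}

func isKnownDim(d string) bool {
	for _, k := range canonicalDims {
		if d == k {
			return true
		}
	}
	return false
}

// parseGroupBy validates a group_by value and returns the selected
// dimensions in canonical order, regardless of the order requested.
func parseGroupBy(raw string) ([]string, error) {
	selected := map[string]bool{}
	for _, d := range strings.Split(raw, ",") {
		d = strings.TrimSpace(d)
		if d == "" {
			continue
		}
		if !isKnownDim(d) {
			return nil, fmt.Errorf("unknown group_by dimension %q", d)
		}
		selected[d] = true
	}
	var out []string
	for _, d := range canonicalDims {
		if selected[d] {
			out = append(out, d)
		}
	}
	return out, nil
}
```

Since the strings that reach the query builder are always copies from canonicalDims and never the caller's input, user-supplied text cannot end up in the SQL; the handler would turn the error into a 400.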

Retention: rollup-and-prune sweeper + old-run archiver (Phase 2b)

A ledger that grows unbounded is a ledger that eventually fails an INSERT with disk-full. Two sweepers keep the shape sustainable.

The rollup-and-prune sweeper compacts finished-run token_usage rows older than the retention window (default 30 days) into a summary row per day per group. Original per-call rows go to usage_archive, which sits on the same store but can be moved to cold storage on a schedule if it grows. The rollup is lossless with respect to reports at daily or coarser granularity; the live table loses per-call detail past the window.

The old-run archiver prunes runs whose completed-at timestamp is older than the run-retention window. The archiver exports each pruned run's transcript + metadata into runs_archive before deleting the live row. Same shape as the usage archiver; different table.

Both sweepers run on a ticker configured via env (defaults to hourly), respect a -race-safe batching size, and break the loop on a canceled context so a shutdown mid-sweep doesn't hang.

Cross-transport parity + the Web UI Usage page (Phase 2c)

The gRPC UsageReport RPC + @loomcycle/client TypeScript adapter's getUsage() + the Python adapter's usage_report() all speak the same shape as the HTTP endpoint. Same tenant-scoping. Same dimension whitelist.

A new Usage page lands in the Web UI (nav visibility tenant, following the RFC AS convention). Group-by chips (tenant / user / provider / model / source; default is tenant + source, which shows the operator-vs-tenant matrix). A from/to window picker. An admin tenant focus. A summary strip with total cost and, when grouped by source, the operator-bill vs tenant-funded split. An unpriced-calls indicator when any group has calls without a priced row.

A patch (v1.11.1) covers two operator-console fixes that landed after the initial merge: the page blanked on an empty report because the endpoint returned Go nil (JSON null) instead of []; fixed at the endpoint. And the K/M/G shorthand is now recognized in the token-limit editor (see Arc 4).

Arc 4 — token budgets (RFC AW)

The ledger tells you what you spent. Budgets tell the runtime to stop. RFC AW ships in two phases and a small polish patch.

Per-scope soft + hard tiers, most-restrictive wins

A budget is {tenant, scope, scope_id, window, soft, hard}. Scope is one of operator (global), tenant, or user. Window is a calendar month, UTC. Soft and hard are optional token counts; either can be set alone (soft-only for warn-and-continue behavior; hard-only for silent-then-refuse; both for warn-then-refuse, which is the recommended default).

Most-restrictive-of-the-three-scopes wins. An operator-wide soft of 10M, a tenant soft of 2M, and a user soft of 500K resolve to a 500K soft ceiling for that user. Same for hard. Missing rows are unlimited. So a fresh deploy has no ceilings; opting in is a matter of writing a row.
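
The resolution rule is a per-tier minimum over the rows that exist. In the sketch below nil means "unset, therefore unlimited"; the type and function names are hypothetical.

```go
package main

// scopeBudget holds optional soft and hard ceilings; nil means unlimited.
type scopeBudget struct {
	Soft, Hard *int64
}

// tighter returns the smaller of two optional limits.
func tighter(a, b *int64) *int64 {
	switch {
	case a == nil:
		return b
	case b == nil:
		return a
	case *b < *a:
		return b
	default:
		return a
	}
}

// effectiveBudget folds operator, tenant, and user rows into the ceilings
// that actually apply. Missing rows are passed as nil.
func effectiveBudget(rows ...*scopeBudget) scopeBudget {
	var out scopeBudget
	for _, r := range rows {
		if r == nil {
			continue
		}
		out.Soft = tighter(out.Soft, r.Soft)
		out.Hard = tighter(out.Hard, r.Hard)
	}
	return out
}
```

With the numbers from the paragraph above (10M, 2M, 500K soft), the effective soft ceiling is 500,000 and the hard ceiling stays unset.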

Enforcement points:

Admission (hard): at RunOnce's AcquireForUser site, after slot acquisition, the runtime calls limits.Check(tenant, user) and refuses the run if the resolved hard ceiling is exceeded. Returns runner.ErrTokenLimitExceeded, which becomes a 429 on HTTP and ResourceExhausted on gRPC. The slot releases; nothing runs.
In-flight (soft + hard): recordCallUsage increments the tracker on each ledger row, and if either threshold is newly crossed, emits an EventLimit event down the run's event channel. A soft crossing is a warning; the run continues. A hard crossing during a run also emits the event but doesn't abort the run mid-way (no partial rollback in the middle of a tool call); the run finishes, and any next run refuses at admission.

The in-memory tracker is calendar-month keyed and boot-seeds itself from the token_usage ledger at server start, so a restart doesn't reset counters. Cross-replica coordination is advisory at v1 (the runtime is honest about not being strict-consistent across replicas mid-crossing); a follow-up RFC handles the strict case.

The EventLimit event, on every transport

A new event type in the providers package: EventLimit, carrying a LimitInfo{scope, scope_id, severity, window, used, limit, message} block. The event is server-generated (at admission and at recordCallUsage), rides the same event channel as EventText/EventToolUse/EventThinking, and lands on every transport in Phase 2.

HTTP SSE consumers see the event inline in the run's stream. The persisted transcript captures it as a limit row (so an operator opening a transcript later sees where the crossing happened). gRPC consumers see the event on Run/Continue streams. The @loomcycle/client TypeScript adapter surfaces it as a "limit" EventType with a typed LimitInfo. The Python adapter has a LimitInfo dataclass and a limit field on AgentEvent. MCP tool calls that spawn a run (via spawn_run) get Limits alongside Usage on the result, so a budget crossing rides the tool return.

The Web UI's run terminal renders the event as a banner (amber for soft, red for hard) inline in the transcript, from both the live SSE stream and the persisted row.

/v1/_limits CRUD + gRPC parity + the Web UI Limits page

GET/PUT/DELETE /v1/_limits manages budget rows. Same tenant-scoped auth as the Usage endpoint: admin sees all, tenant operator confined to its own tenant. The gRPC TokenLimit(list|set|delete) RPC covers the same surface for programmatic access. Both flow through a single limits.ResolveWrite helper that owns the confinement rule (a tenant operator may write only its own tenant's tenant/user budgets; the operator-global scope is admin-only). One rule, two transports.

The Web UI's Limits page (nav visibility tenant) is CRUD over the endpoint plus a live "month-to-date used" column, sourced from the same in-memory tracker the enforcer reads. The operator sees, next to each budget row, how much of the budget has been consumed already, without a separate report query.

The token-limit editors accept K/M/G shorthand (500K, 5M, 2G, also B and T, decimals, thousands separators). The live "= 5,000,000" hint below the input confirms the parsed value. Small polish that shipped in v1.11.1 (a Web UI patch); nobody writes 5000000 without a typo.
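
A parser for that shorthand might look like the following. The Web UI's actual implementation is not shown in this digest; the decimal multipliers follow from the "5M = 5,000,000" hint, while treating G and B both as 10^9 and T as 10^12 is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// multipliers are decimal: K is 1,000, not 1,024.
var multipliers = map[byte]float64{'K': 1e3, 'M': 1e6, 'G': 1e9, 'B': 1e9, 'T': 1e12}

// parseTokenCount accepts inputs like "500K", "1.5m", or "5,000,000".
func parseTokenCount(s string) (int64, error) {
	s = strings.ToUpper(strings.ReplaceAll(strings.TrimSpace(s), ",", ""))
	if s == "" {
		return 0, fmt.Errorf("empty token count")
	}
	mult := 1.0
	if m, ok := multipliers[s[len(s)-1]]; ok {
		mult = m
		s = s[:len(s)-1]
	}
	n, err := strconv.ParseFloat(s, 64)
	if err != nil || math.IsNaN(n) || math.IsInf(n, 0) || n < 0 {
		return 0, fmt.Errorf("invalid token count %q", s)
	}
	v := math.Round(n * mult)
	if v >= float64(math.MaxInt64) {
		return 0, fmt.Errorf("token count too large")
	}
	return int64(v), nil
}
```

Echoing the parsed integer back to the user, as the editor's live hint does, is the useful half of the feature: it catches a stray suffix before the budget is saved.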

Small wins

Routing view (v1.9.0)

A GET /v1/_routing endpoint returns the live provider/model cascade the runtime uses to resolve a given tier. For each configured user_tier × tier combination, the endpoint returns the ordered fallback chain (which provider/model gets tried first, which is second, and so on). Admin sees availability status (whether each entry is reachable); a substrate:tenant operator sees the config cascade only.

A companion Web UI Routing page renders the cascade visually with the selected entry highlighted. Useful when a tenant asks "which model does chat resolve to in my tier?" and the operator wants a definitive answer without grepping yaml.

RFC AU — tenant import of Claude Code skills + MCP servers

A Web UI action for a tenant operator: paste a Claude Code .claude/ directory as a file bundle, and the runtime imports the skills and MCP servers into the tenant's Library as tenant-scoped Defs. Useful for tenants who've built up a workflow in Claude Code and want to run it under loomcycle with local LLMs, without re-authoring every skill by hand.

Skills import as SkillDefs; MCP servers import as MCPServerDefs (usually stdio, sometimes HTTP). The import flow flags any imports that reference a Claude Code proprietary tool (which loomcycle doesn't have) so the operator sees the gap before running.

Path VFS: implicit-directory synthesis in one-level ls

A small fix that surfaced during Wednesday's Path work. Path op=ls /loomcycle/ returned only the direct-leaf documents and not the implicit rfcs/ "directory" that had three documents living under it (/loomcycle/rfcs/agent-teams, etc.). The S3-style implicit-directory design was correct in principle; the ls op just wasn't synthesizing the parent-directory entries from the union of leaf paths.

Fixed. Path op=ls /loomcycle/ now returns both the direct-leaf documents and the synthesized directory entries. An operator navigating the Path tree in the Web UI sees the folder shape they'd expect from a Unix-ish filesystem.
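
Synthesizing one level of a listing from flat leaf paths takes only a prefix scan. The sketch below is illustrative: the function name is hypothetical, and marking directories with a trailing slash is a presentation choice made for the example.

```go
package main

import (
	"sort"
	"strings"
)

// listOneLevel returns the immediate children of dir: documents stored
// directly under it, plus one synthesized entry (with a trailing "/") for
// each implicit directory that has documents below it.
func listOneLevel(paths []string, dir string) []string {
	if !strings.HasSuffix(dir, "/") {
		dir += "/"
	}
	seen := map[string]bool{}
	for _, p := range paths {
		if !strings.HasPrefix(p, dir) {
			continue
		}
		rest := p[len(dir):]
		if rest == "" {
			continue
		}
		if i := strings.Index(rest, "/"); i >= 0 {
			seen[rest[:i+1]] = true // implicit directory, deduplicated
		} else {
			seen[rest] = true // direct leaf
		}
	}
	out := make([]string, 0, len(seen))
	for name := range seen {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}
```

The map deduplicates the directory entry, so three documents under rfcs/ still produce a single rfcs/ row.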

What this three-day arc unlocks

For a solo operator running loomcycle for themselves, this week's arc is invisible; the operator key still pays for everything, no budgets are needed, no other tenants exist. For a maintainer running loomcycle for one team, it's the difference between "one shared bill" and "a per-user bill you can inspect." For a maintainer running loomcycle for multiple teams, it's the difference between "one Anthropic account funds everyone" and "each tenant bills their own account, the operator's account funds only the operator's own agents, and every crossing renders inline in the run's transcript."

Together with last week's tenant-operator Web UI (RFC AS), this is the substrate for shared-tenancy at real scale. A tenant admin can plug in their own provider keys, browse their spend at any dimension, cap their users' spend, and have every over-budget refusal render as an EventLimit event in the client of their choice. Nothing about the substrate assumes a specific operator's business model; the same substrate serves a homelab, a small team's shared TrueNAS deployment, or a hosted multi-tenant service.

Next up: the loomboard chat surface using the CredentialDef store for its per-user Slack / Telegram / GitHub bindings; a TrueNAS Scale catalog-app manifest that includes the LOOMCYCLE_SECRET_KEY KEK generation in the install flow; and RFC AP (TeamDef), which lands the multi-agent workflow graph, the piece that finally makes "team of agents shipping a feature end-to-end" a substrate primitive instead of a hand-wired composition.

Companion reading: last week's digest (tenant-operator Web UI + TrueNAS deployment + thinking traces), three MCP tokens in one run (the earlier per-scope credential ergonomics work), and the RFC AR, AV, and AW documents in docs/ for the design details behind each substrate primitive.