loomcycle
§ release digest

Tenant surfaces, TrueNAS deployment, and thoughts on the wire.

Five days. Eight releases. Three arcs that landed in parallel, plus a handful of small wins that end old friction.

The three arcs, in the order they mattered: tenant surfaces (the tenant-operator Web UI shipped across five patches under RFC AS), TrueNAS deployment (docs + config fixes that moved loomcycle-on-TrueNAS from possible to supported), and thoughts on the wire (vision input across every provider, Ollama thinking traces, and a loop fix for reasoning traces that were being silently dropped for every provider). The small wins are the kind that quietly retire a class of bug: the MCP thin-client learned to re-handshake on session expiry, a new GET /v1/_models endpoint exposes the operator's alias map so a UI can offer aliases in its picker, and sub-agents finally inherit their parent's tenant.

The short version. RFC AS lands the tenant-operator Web UI so a substrate:tenant bearer sees and manages its own tenant's Library, Path/Documents, schedules, and audit log with the same nav shape an admin uses. TrueNAS deployment moves from possible to supported through Postgres-floor softening, secrets in env_file (not inline YAML), a CREATEROLE grant for SQL Memory, LOOMCYCLE_PUBLIC_URL wired in the compose, and a chat + chat-local bundle. Vision input lands across all providers + transports (RFC AT) with a fallback-target gate that closes the image-to-DeepSeek-400 leak the loomboard team surfaced mid-week. Ollama emits thinking traces via the effort hint. And the loop finally forwards EventThinking to consumers, a two-line fix for a bug that dropped every provider's reasoning trace silently.

Arc 1 — tenant surfaces (RFC AS): the operator UI a tenant can actually use

RFC L (v1.0's OSS multi-tenant authorization) shipped the substrate: an OperatorTokenDef bearer resolves to an authoritative (tenant, subject, scopes), all runs and memory are per-tenant isolated, and cross-tenant access returns opaque 404. That closed the trust boundary. It did not close the operator-experience gap.

A tenant operator with a substrate:tenant bearer could authenticate. Then the Web UI showed them either an empty page or a form that would 403 on the next click. The Library returned zero agents because the list endpoint gated on substrate:admin. Schedules 404'd. Path/Document browse was locked to the caller's own subject with no way to see another. Sub-agent runs 404'd on session open because the sub-agent's session had the wrong tenant. Audit logs showed nothing tenant-scoped. Every one of those was a design gap, not a security bug, and every one made a shared-tenancy deployment operationally painful.

RFC AS is the surface-by-surface fix. It lands across five patch releases, each shipping the pieces that were ready.

v1.6.3 — Phases 1 + 2

Two moves. First: tenant-scope the read side. The Library list endpoints (/v1/_library/agents, /v1/_library/skills, /v1/_library/mcp-servers) and the def-plane /names reads now filter by the authenticated tenant. An admin still sees everything; a substrate:tenant bearer sees its own tenant only. The route gate on the Library also opens to substrate:tenant so a tenant operator can reach it at all.
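The read-side rule is small enough to show directly. Below is a minimal Go sketch, assuming a hypothetical Principal type (tenant, subject, scopes) and a simplified AgentDef with an owning tenant. It also folds in the optional ?tenant= focus filter the post describes later for admins: a tenant operator's focus value is ignored, because the tenant comes from the principal, never the wire. None of these names are loomcycle's actual types.

```go
package main

// Principal is a hypothetical stand-in for the (tenant, subject, scopes)
// triple an OperatorTokenDef bearer resolves to.
type Principal struct {
	TenantID string
	Subject  string
	Scopes   []string
}

// AgentDef is a simplified, hypothetical Library entry with an owning tenant.
type AgentDef struct {
	Name     string
	TenantID string
}

// hasScope reports whether the principal carries the given scope.
func hasScope(p Principal, scope string) bool {
	for _, s := range p.Scopes {
		if s == scope {
			return true
		}
	}
	return false
}

// effectiveTenant picks the tenant to filter by. An admin may focus on one
// tenant (or see all when focus is empty); anyone else is pinned to their
// own tenant regardless of what the request asked for.
func effectiveTenant(p Principal, focus string) (tenant string, all bool) {
	if hasScope(p, "substrate:admin") {
		return focus, focus == ""
	}
	return p.TenantID, false
}

// filterLibrary applies the tenant rule to a Library list.
func filterLibrary(p Principal, focus string, defs []AgentDef) []AgentDef {
	tenant, all := effectiveTenant(p, focus)
	if all {
		return defs
	}
	var out []AgentDef
	for _, d := range defs {
		if d.TenantID == tenant {
			out = append(out, d)
		}
	}
	return out
}
```

The design choice worth noting is that the focus filter is a narrowing option for admins only; for a substrate:tenant bearer it cannot widen or redirect the result.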

Second: per-surface nav visibility (Phase 2). A three-way classification emerges. Admin sees the full nav. Tenant operator sees the surfaces they can act on (Library, runs, schedules, Path/Documents, audit) and doesn't see the ones they can't (global settings, token minting, cross-tenant lists). Read-only viewers see a narrower cut still. The class assignment lives in the UI's client-side auth resolver keyed on the bearer's scopes.

There's also a small footgun fix bundled here. The token-mint UI used to allow an operator to accidentally mint a substrate:admin token if they clicked the wrong scope checkbox with an admin bearer active. If the operator then let the admin bearer expire without another admin token in circulation, the runtime went into a lockout state where no one could mint the replacement. The UI now refuses to mint substrate:admin from any interactive click; you author admin tokens via yaml declared-principals or the CLI, deliberately.

v1.6.4 — RFC AS completion

The remaining tenant-scopable surfaces, all shipped in one release once the shared plumbing was clear.

Static / bundled agents visible to tenant operators. Operator-declared static AgentDefs live in yaml with no per-tenant owner; they used to be admin-only in the Library. Now a substrate:tenant operator sees them as read-only reference items, alongside their own tenant-scoped forks. Same for bundled skills and bundled MCP servers.

Schedules surface tenant-scoped. GET /v1/_scheduledef?tenant=... filters by the authenticated tenant when the caller isn't admin. The Web UI's schedules page uses the same filter.

Path / Document browse-by-subject. Two moves here that took a bit to line up. Backend: the off-run Path and Document handlers accept a caller-chosen subject parameter and validate it against the authenticated principal's tenant (admin sees any subject; a tenant operator sees any subject within their tenant, empty result if the subject doesn't exist there). The topbar UI grew a subject picker so a tenant operator can browse the Path tree for a specific user's namespace without impersonating them. The picker is scoped: a tenant operator only sees subjects that exist in their own tenant.
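The backend check and the picker follow the same rule. A minimal Go sketch, assuming a hypothetical in-memory subject registry keyed by subject id; a false result stands for "return an empty result", never a 403:

```go
package main

import "sort"

// Subject is a hypothetical user namespace inside a tenant.
type Subject struct {
	ID       string
	TenantID string
}

// Caller is a hypothetical resolved principal.
type Caller struct {
	TenantID string
	IsAdmin  bool
}

// resolveBrowseSubject validates a caller-chosen subject. ok=false means the
// handler answers with an empty result, whether the subject is missing or
// belongs to another tenant, so existence never leaks across tenants.
func resolveBrowseSubject(c Caller, requested string, known map[string]Subject) (Subject, bool) {
	s, exists := known[requested]
	if !exists {
		return Subject{}, false
	}
	if c.IsAdmin || s.TenantID == c.TenantID {
		return s, true
	}
	return Subject{}, false
}

// pickerSubjects lists the subjects the topbar picker should offer this caller.
func pickerSubjects(c Caller, known map[string]Subject) []string {
	var ids []string
	for id, s := range known {
		if c.IsAdmin || s.TenantID == c.TenantID {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids) // map iteration order is random; keep the picker stable
	return ids
}
```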

Audit log tenant-scoped via owning session. The event log is keyed by session; the session carries the tenant; the audit view filters by session ownership. A tenant operator sees only events from runs and sessions in their tenant.
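Because the event log has no tenant column of its own, the filter is a join through the session. A minimal Go sketch with hypothetical Session and Event types; events whose session can't be found are dropped rather than guessed at, which is an assumption of this sketch:

```go
package main

// Session is a hypothetical session record; it carries the tenant.
type Session struct {
	ID       string
	TenantID string
}

// Event is a hypothetical audit event, keyed only by session.
type Event struct {
	SessionID string
	Kind      string
}

// auditForTenant keeps only events whose owning session belongs to tenant.
func auditForTenant(tenant string, sessions map[string]Session, events []Event) []Event {
	var out []Event
	for _, ev := range events {
		s, ok := sessions[ev.SessionID]
		if ok && s.TenantID == tenant {
			out = append(out, ev)
		}
	}
	return out
}
```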

Bundled with this release: a docs fix for the SQL-Memory Postgres role. It needs CREATEROLE to provision per-scope least-privilege roles at runtime; the TrueNAS INSTALL had a plain CREATE USER, which fails silently until an agent tries to open a SQL-Memory scope and the role provisioning returns permission denied. Small fix; unblocks anyone running the TrueNAS compose past the SQL-Memory boundary.

v1.6.5 — bundled skills in Library + documents never orphaned

Two things surfaced by the live v1.6.4 UI. The Library handler had been reading only from the top-level skills: section of yaml, ignoring skills declared inline inside bundles: (RFC AQ). So a bundled skill was reachable by its agent at runtime but invisible in the Library browser. Fixed: the handler now scans both sources and merges them. Admin and tenant operators both see the full skill set.
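The merge can be sketched in a few lines of Go. The Config, Bundle, and SkillDef shapes here are hypothetical simplifications of the yaml, and the tie-break (a top-level skill wins if a bundle declares the same name) is an assumption, since the post doesn't say how clashes resolve:

```go
package main

import "sort"

// SkillDef, Bundle, and Config are hypothetical simplifications of the yaml.
type SkillDef struct {
	Name   string
	Source string // where the skill was declared, for display
}

type Bundle struct {
	Name   string
	Skills []SkillDef // skills declared inline inside the bundle
}

type Config struct {
	Skills  []SkillDef // the top-level skills: section
	Bundles []Bundle   // the bundles: section
}

// librarySkills merges both sources into one list for the Library browser.
func librarySkills(cfg Config) []SkillDef {
	byName := map[string]SkillDef{}
	for _, b := range cfg.Bundles {
		for _, s := range b.Skills {
			byName[s.Name] = s
		}
	}
	for _, s := range cfg.Skills {
		byName[s.Name] = s // assumed: top-level wins on a name clash
	}
	out := make([]SkillDef, 0, len(byName))
	for _, s := range byName {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}
```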

The other fix: documents were sometimes reachable only by UUID, not by a Path-tree name. If a caller ran Document op=create_document without a path: argument, the document existed in SQL Memory and Path resolution returned nothing for it. It was queryable but not browsable. Now create_document always registers a dirent (defaulting to /documents/<title> if no path is supplied), and a new Document op=set_path attaches or re-homes a Path name for an existing document. So a document is never orphaned from the Path/Library browser, and moving one to a new location takes a single op.
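The invariant ("every document has a dirent") is easiest to see against a toy store. This Go sketch is hypothetical throughout: an in-memory map of dirents to document ids, one Path name per document, and a refusal to overwrite a path already used by a different document. The real ops live in SQL Memory and may attach several names; only the /documents/<title> default comes from the release notes.

```go
package main

import (
	"errors"
	"path"
)

// Store is a hypothetical in-memory stand-in for documents plus Path dirents.
type Store struct {
	docs    map[string]string // document id -> title
	dirents map[string]string // Path name -> document id
}

func NewStore() *Store {
	return &Store{docs: map[string]string{}, dirents: map[string]string{}}
}

// CreateDocument always registers a dirent, defaulting to /documents/<title>.
func (s *Store) CreateDocument(id, title, p string) string {
	if p == "" {
		p = path.Join("/documents", title)
	}
	s.docs[id] = title
	s.dirents[p] = id
	return p
}

// SetPath re-homes an existing document by id. The body never moves; only
// the name pointing at it does, so there is no dirent-vs-body drift.
func (s *Store) SetPath(id, newPath string) error {
	if _, ok := s.docs[id]; !ok {
		return errors.New("document not found")
	}
	if other, taken := s.dirents[newPath]; taken && other != id {
		return errors.New("path already names another document")
	}
	for p, d := range s.dirents {
		if d == id {
			delete(s.dirents, p) // deleting during range is safe in Go
		}
	}
	s.dirents[newPath] = id
	return nil
}
```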

The set_path op is what fixed the migration I did earlier this week. The publishing plan Document lived at /loomcycle/marketing under one tenant and needed to move to the loomcycle-dev tenant, but the local server didn't have set_path yet (it hadn't been merged). I used Path op=mv as a workaround, which moves the dirent but not the document body. With v1.6.5 on TrueNAS, set_path is the correct call: attach a dirent to an existing document by id, with no dirent-vs-body drift.

v1.6.6 — sub-agent session inherits parent's tenant

A tenant operator opened a sub-agent's run in the Web UI. The parent run's tenant was correct. The sub-agent's session opened with an empty tenant, so GET /v1/sessions/<id> returned 404 (opaque cross-tenant miss). The operator saw "run exists, session doesn't" and had no way through.

The fix is one call site. runSubAgent was passing parentIdentity.TenantID to the run but not to openOrCreateSessionAndRun. The session got created with the runtime's default tenant (empty) instead of the parent's. Now the parent's tenant propagates into the session too. A sub-agent's Web UI page opens cleanly for the tenant operator who owns the parent run.
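The shape of the bug, in a minimal Go sketch. runSubAgentSketch, openOrCreateSession, and canOpen are hypothetical stand-ins for runSubAgent, openOrCreateSessionAndRun, and the opaque session lookup; the point is only that one identity value has to reach both call sites:

```go
package main

// Identity is a hypothetical parent identity.
type Identity struct {
	TenantID string
	Subject  string
}

// Session is a hypothetical session record.
type Session struct {
	ID       string
	TenantID string
}

// openOrCreateSession uses whatever tenant it is handed; an empty string is
// the runtime's default tenant.
func openOrCreateSession(id, tenantID string) Session {
	return Session{ID: id, TenantID: tenantID}
}

// runSubAgentSketch threads the parent's tenant into both the run and the
// session. Before v1.6.6 only the run got it, so the session was created
// in the default (empty) tenant.
func runSubAgentSketch(parent Identity, sessionID string) (runTenant string, s Session) {
	runTenant = parent.TenantID
	s = openOrCreateSession(sessionID, parent.TenantID)
	return runTenant, s
}

// canOpen models the opaque cross-tenant check behind GET /v1/sessions/<id>;
// false surfaces to the caller as a 404.
func canOpen(callerTenant string, s Session) bool {
	return s.TenantID == callerTenant
}
```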

v1.6.7 — Context op=self reports identity + server URL

An agent authenticated over MCP had no reliable way to answer "who am I?" from inside the run. Context op=self returned agent_id, agent_name, filesystem, allowed hosts, provider, model, user_id. It did not return tenant. So an agent running under a tenant bearer couldn't confirm the tenant it was running in, and couldn't include tenant metadata in a substrate call.

v1.6.7 fills the gap. Context op=self now reports a full principal block: subject, tenant_id, scopes, token_def_id, is_admin, legacy. Plus a server block: listen_addr (the internal port) and url (the operator-configured public URL, from LOOMCYCLE_PUBLIC_URL). So an agent knows its tenant, its principal type, and the URL an external caller would use to reach the runtime it's living in.

The LOOMCYCLE_PUBLIC_URL env var also lands in the TrueNAS deploy compose and INSTALL.md this release, so the value is set correctly on a fresh install without editing the yaml.

I used this the same day. Reconfiguring my MCP plugin to point at the TrueNAS instance instead of localhost, I called Context op=self to verify the new endpoint. The response showed the Tailscale URL, the loomcycle-dev tenant, the correct OperatorTokenDef subject. No more guessing which server the plugin is talking to.

The load-bearing shape of RFC AS

Every surface follows the same posture: admin sees all, with an optional ?tenant= focus filter; a substrate:tenant operator is confined to its own tenant by construction. Cross-tenant reads return empty results (never a 403; opaque 404-shape for existence questions). The Web UI's nav visibility is derived from the bearer's scopes, not hardcoded per-user. Bundled and static defs stay visible to tenant operators as read-only references. And the runtime resolves the tenant from the principal, never the wire, at every write site.

Once RFC AS finished landing, running a shared loomcycle for multiple tenants stopped being a maintainer-only operational feat. The auth boundary was there in v1.0; the surface parity to make it usable took five patch releases across three days.

Arc 2 — TrueNAS deployment: from "possible" to "supported"

Loomcycle has run as a docker-compose service on my TrueNAS box for a couple of weeks now. It worked, but "worked" meant several rounds of me fixing something that had been implicit knowledge. The last five days packaged that implicit knowledge into shippable docs and default config so the next person running loomcycle on TrueNAS doesn't hit the same fixes.

No single release owns this arc. The work is distributed across every version tag in the window as small docs-plus-config commits.

Postgres version wording (v1.6.2)

The compose and INSTALL originally pinned Postgres to 16. Postgres 14 works. Postgres 15 works. The runtime uses only standard SQL features plus LISTEN/NOTIFY, which have been stable since Postgres 9. Softened to "Postgres ≥ 14" so the operator can use whatever Postgres their environment has, including the version TrueNAS Scale ships in its catalog.

Secrets in env_file, not inline YAML (v1.6.2)

The compose used to have secrets inline (Anthropic API key, OpenAI API key, per-provider bearers, the operator-token pepper). This is fine for a demo but wrong for a real deployment: the compose file is meant to be readable, committable to a private repo, or shared across TrueNAS admin sessions. Secrets in it become secrets in git or secrets in a screenshot.

The fix: the compose points at an external env_file: /path/to/.env.local. The env file stays on disk with tight permissions, out of the compose. The compose file itself is now safe to commit or paste. Standard docker-compose pattern; loomcycle was late in adopting it.

SQL-Memory Postgres role needs CREATEROLE (v1.6.4)

SQL Memory (RFC AA, shipped v1.2.0) creates per-scope least-privilege roles at runtime. The role that connects on behalf of loomcycle needs CREATEROLE to do that; without it, the first attempt to open a SQL-Memory scope returns "permission denied to create role" and the agent's SQL calls fail. The TrueNAS INSTALL now grants CREATEROLE explicitly. Unblocks anyone whose agents use SQL Memory (which is anyone running the sample agents past the default set).

LOOMCYCLE_PUBLIC_URL wired in (v1.6.7)

The server.url field in Context op=self comes from LOOMCYCLE_PUBLIC_URL. If that env var isn't set, the runtime reports a placeholder that isn't useful. The TrueNAS deploy compose and INSTALL now set LOOMCYCLE_PUBLIC_URL to the operator's Tailscale hostname (or LAN IP) so agent introspection returns a real URL from a fresh install.

Chat + chat-local bundle (v1.6.2)

Two new bundled agents ship for chat use cases. chat targets the operator's tier: middle configuration (typically a cloud provider). chat-local targets local-medium (typically an Ollama-hosted model on the TrueNAS box's own inference hardware). Both are registered as agents with permissive allowed_tools, suitable for driving from the loomboard chat surface in development without operators authoring an AgentDef from scratch.

The chat-local agent is the natural landing point for the TrueNAS build I described in last week's field report. The build sits idle in Ollama; chat-local is the entry to actually use it.

ollama-local num_gpu knob (v1.6.2)

New env var: LOOMCYCLE_OLLAMA_LOCAL_NUM_GPU. Sets the num_gpu option on Ollama requests to the local provider. That's the model-loading-onto-GPU-layers tuning arg I called out in the TrueNAS field report as the big lever for iGPU inference performance. It was previously set via a model option in yaml, per-model; now it's a top-level knob so the whole ollama-local provider inherits it. Setting it to 99 pushes as many layers as possible onto the iGPU; Ollama fits what it can and spills the rest to CPU.

Composite effect across all these fixes: a fresh TrueNAS install of loomcycle now follows the INSTALL from top to bottom without me having to answer questions or debug environment gaps. That's the difference between "possible" and "supported." Next milestone: the TrueNAS Scale catalog app manifest (RFC AR), which is drafted and pending PR.

Arc 3 — thoughts on the wire: vision + thinking traces on every SSE stream

Two related landings under one theme: what the model produces as structured signal (image inputs going in, reasoning traces coming out) is now first-class on every transport, for every provider that supports it. Historically both were per-provider curiosities that the runtime either didn't propagate or propagated inconsistently.

v1.7.0 — RFC AT vision across all providers + transports

Image input in an agent's message reaches the model on Anthropic, OpenAI, Gemini, Ollama (via vision-capable local models like llava or llama3.2-vision), and OpenAI-compat (which covers DeepSeek but with vision unsupported at the DeepSeek endpoint; the runtime refuses image content there). Per-provider serialization: Anthropic native discriminated blocks; OpenAI/OpenAI-compat content array; Gemini inlineData; Ollama message images field. All handled by the driver.

The wire lift extends to every transport in the same release: HTTP POST /v1/runs accepts image content in messages; gRPC's RunRequest got image-block support in proto/loomcycle.proto; MCP's spawn_run tool accepts image content in its input; and both the @loomcycle/client TypeScript adapter and the Python adapter got image-attachment helpers, with version bumps for both packages.

There's also a capability gate. Every provider reports SupportsVision in its Capabilities(). Before Provider.Call, the loop scans the assembled messages for image content; if any message has an image and the resolved provider is text-only, the loop emits a clear error ("model X on provider Y does not support image input") and refuses the call. This is what makes an incorrectly-tiered fallback fail loudly instead of the image being silently dropped.
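The pre-call gate reduces to "any image block, and a provider without SupportsVision". A minimal Go sketch with hypothetical Provider, Message, and Block types; only SupportsVision and the error wording come from the release notes:

```go
package main

import "fmt"

// Capabilities mirrors the one flag this gate needs.
type Capabilities struct {
	SupportsVision bool
}

// Provider is a hypothetical resolved provider/model pair.
type Provider struct {
	Name  string
	Model string
	Caps  Capabilities
}

// Block and Message are hypothetical content shapes; Type is "text" or "image".
type Block struct{ Type string }
type Message struct{ Blocks []Block }

// hasImage scans the assembled messages for any image content.
func hasImage(msgs []Message) bool {
	for _, m := range msgs {
		for _, b := range m.Blocks {
			if b.Type == "image" {
				return true
			}
		}
	}
	return false
}

// checkVision runs before Provider.Call and refuses loudly instead of
// letting the image be dropped or rejected downstream.
func checkVision(p Provider, msgs []Message) error {
	if hasImage(msgs) && !p.Caps.SupportsVision {
		return fmt.Errorf("model %s on provider %s does not support image input", p.Model, p.Name)
	}
	return nil
}
```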

v1.7.1 — the fallback-target gate (§4.4 of RFC AT)

The vision gate above ran once, at the initially-resolved provider. Fallback logic (tryProviderFallback) could then swap in a text-only provider mid-run without re-checking the vision capability. So a run authored against Claude, carrying an image, that hit an Anthropic 503, would swap to DeepSeek-text on the fallback attempt, and the image_url content would reach DeepSeek unchecked. DeepSeek's response: a raw HTTP 400 with unknown variant 'image_url'. The image was leaked to a provider that couldn't handle it; the operator saw a confusing 400 instead of a clear vision-mismatch error.

The loomboard team surfaced this mid-week with a specific repro and a fix recipe. The patch is straightforward: re-run the SupportsVision check against the fallback target inside tryProviderFallback. If the run carries an image content block and the newly-resolved provider doesn't support vision, refuse the swap. Emit EventFallbackSuppressed with a clear reason ("fallback from X/Y to A/B suppressed: run carries an image but fallback target does not support vision (RFC AT §4.4)"). Return fallbackOutcomeNotEligible. The original error propagates unchanged. No image leaks.
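The fix is the same check, repeated at the swap point. A minimal Go sketch: tryFallbackSketch, the Event type, and fallbackOutcomeSwapped are hypothetical, while fallbackOutcomeNotEligible, the EventFallbackSuppressed name, and the reason string follow the release notes.

```go
package main

import "fmt"

// Provider is a hypothetical provider/model pair with its vision flag.
type Provider struct {
	Name           string
	Model          string
	SupportsVision bool
}

type fallbackOutcome int

const (
	fallbackOutcomeSwapped fallbackOutcome = iota // hypothetical "swap allowed"
	fallbackOutcomeNotEligible
)

// Event is a hypothetical emitted event.
type Event struct {
	Kind   string
	Reason string
}

// tryFallbackSketch re-checks vision against the fallback target before
// swapping. On refusal the caller keeps and propagates the original error.
func tryFallbackSketch(from, to Provider, runHasImage bool, emit func(Event)) fallbackOutcome {
	if runHasImage && !to.SupportsVision {
		emit(Event{
			Kind: "EventFallbackSuppressed",
			Reason: fmt.Sprintf(
				"fallback from %s/%s to %s/%s suppressed: run carries an image but fallback target does not support vision (RFC AT §4.4)",
				from.Name, from.Model, to.Name, to.Model),
		})
		return fallbackOutcomeNotEligible
	}
	return fallbackOutcomeSwapped
}
```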

A regression test pins the shape: an image-bearing run with a vision-capable initial provider that fails over to a non-vision target must call the vision-capable provider once, then refuse the fallback (never call the vision-incapable target), emit the suppressed event with the RFC-AT cite and both provider names, and propagate the original failure. DeepSeek's Capabilities().SupportsVision is confirmed false even though it wraps the OpenAI driver.

v1.8.0 — Ollama thinking traces via the effort hint

Cloud reasoning models (Claude Opus's extended-thinking mode, OpenAI's o-series, DeepSeek's deepseek-reasoner) emit a "thinking" or "reasoning" trace that the loop's provider drivers turn into EventThinking stream events. Local reasoning models on Ollama (qwen3, deepseek-r1, and their variants) also emit thinking traces, but under a different flag: Ollama's request format has a top-level think: true boolean the driver must set. Without it, the model generates the same reasoning inline in the response text; with it, the reasoning is separated into a thinking field.

v1.8.0 wires the effort hint to that flag. effort: medium or effort: high maps to think: true; effort: low maps to think: false; unset leaves the model default. So the same per-agent effort: field an operator sets for cloud reasoning models now drives the local ones too. The Ollama driver's Capabilities() reports SupportsThinking and SupportsEffort as true.

One behaviour change worth flagging: effort on a non-thinking Ollama model (e.g. llama3) now returns an error instead of silently ignoring the setting. That matches how the cloud drivers behave (Anthropic errors on effort for non-thinking models; OpenAI errors on reasoning for non-o-series). Consistent error posture across providers.
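Both rules from v1.8.0 fit in one function. A minimal Go sketch, assuming a hypothetical thinkFlag helper and a caller-supplied boolean for whether the model supports thinking (how the driver actually decides that isn't described). A nil result means "omit think and use the model default":

```go
package main

import "fmt"

// thinkFlag maps the per-agent effort hint to Ollama's top-level think field.
func thinkFlag(effort string, modelSupportsThinking bool) (*bool, error) {
	if effort == "" {
		return nil, nil // unset: leave think off the request
	}
	if !modelSupportsThinking {
		// Matches the cloud drivers: effort on a non-thinking model is an error.
		return nil, fmt.Errorf("effort %q set on a model without thinking support", effort)
	}
	var on bool
	switch effort {
	case "medium", "high":
		on = true
	case "low":
		on = false
	default:
		return nil, fmt.Errorf("unknown effort %q", effort)
	}
	return &on, nil
}
```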

v1.8.1 — CLI config-layering fix + Ollama think diagnostic

Two small landings. First: the CLI subcommands validate, agents, and doctor now honour the same config-layering the server uses. They read LOOMCYCLE_PRESETS, LOOMCYCLE_CONFIG_DIR, and LOOMCYCLE_CONFIG_FILES the same way the runtime does at boot. A common failure mode this fixes: a preset-defined model alias (e.g. deepseek-pro declared in ~/.config/loomcycle/presets/) resolved on the server but broke in loomcycle validate because the CLI hadn't loaded the preset layer. The three commands share a common loadLayeredConfig helper now.

Second: a diagnostic env var. LOOMCYCLE_OLLAMA_DEBUG_THINK=1 logs each Ollama request's model, effort, and think flag at INFO level. Useful when a local reasoning model isn't emitting the trace you expected: turn on the debug flag, run once, read the log to see whether the runtime resolved effort:medium to think:true and sent that on the wire. The log line is the single source of truth for "did the runtime ask for thinking, or not?"

v1.8.2 — EventThinking forwarded to consumers

This is a bug that had been quietly present since EventThinking was introduced. Every provider driver correctly parsed the thinking trace out of the provider's response and emitted an EventThinking event down the event channel. The loop's central event switch, which fans events out to the SSE stream and the gRPC event stream and the run's persisted transcript, had no case for EventThinking. And no default case. So the event hit the switch, matched nothing, and was silently dropped.

For every provider. For every consumer. Anthropic thinking, OpenAI reasoning, DeepSeek reasoner, Ollama thinking (once v1.8.0 turned it on), Gemini thinking. All silently dropped at the loop. The clients saw the model's final response but never the reasoning that produced it.

Fix: two lines, one new case in the switch: case providers.EventThinking: emit(ev). Now the reasoning trace flows through the loop the same way EventText and EventToolUse do. SSE consumers see it. gRPC consumers see it. The persisted transcript captures it. Adapters (TypeScript, Python) already had event types for it; they'd just never received one.
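A minimal Go sketch of the fan-out switch with the missing case restored. The EventKind constants are hypothetical stand-ins for the providers package's event types. The default branch is this sketch's own addition, not something the notes say loomcycle ships: it routes any future unhandled kind to a visible hook instead of letting it vanish the way EventThinking did.

```go
package main

// EventKind stands in for the providers package's event types.
type EventKind int

const (
	EventText EventKind = iota
	EventToolUse
	EventThinking
)

// Event is a hypothetical stream event.
type Event struct {
	Kind EventKind
	Data string
}

// fanOut forwards every known kind via emit (SSE, gRPC, transcript).
func fanOut(ev Event, emit func(Event), unhandled func(Event)) {
	switch ev.Kind {
	case EventText, EventToolUse:
		emit(ev)
	case EventThinking: // the case that was missing before v1.8.2
		emit(ev)
	default:
		unhandled(ev) // sketch-only: surface unknown kinds instead of dropping them
	}
}
```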

The reason this took as long to catch as it did: the transcript stored the final response text (which includes the reasoning for models that emit it inline as text), so from a user perspective the reasoning wasn't "missing" if you were reading the transcript. It was missing from the streaming event view. And most reasoning models emit inline text in the response body anyway, so the stream view happened to look right for cloud providers. Ollama's think:true is what separates reasoning into a distinct field; wiring Ollama thinking is what exposed the dropped-event bug for what it was, across every provider.

Small wins

MCP thin-client re-handshake (v1.6.1)

The loomcycle mcp --upstream thin client holds an upstream session id after the initial initialize handshake. When the upstream server rotates its session table (a container restart, a snapshot restore, a network glitch that dropped the session server-side), the thin client kept trying to reuse the stale id. Every request returned session not found or expired until someone killed the thin client subprocess and let Claude Code respawn it.

Anyone using loomcycle as an MCP server for Claude Code has hit this. It's the "please run /reload-plugins" dance. v1.6.1 fixes it: on a 404 / -32001 response, the thin client re-handshakes with the upstream (fresh initialize, fresh session id, retry the request) instead of returning the error to the client. Transparent to Claude Code. The whole class of session-expiry friction goes away.
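The retry is "re-initialize once, then resend". A minimal Go sketch: the Upstream interface, ThinClient, and errSessionExpired (standing in for the 404 / -32001 response) are hypothetical, and a fake upstream that can drop its session table is included so the flow can be exercised end to end.

```go
package main

import (
	"errors"
	"strconv"
)

// errSessionExpired stands in for a 404 / JSON-RPC -32001 from upstream.
var errSessionExpired = errors.New("session not found or expired")

// Upstream is a hypothetical view of the upstream MCP server.
type Upstream interface {
	Initialize() (sessionID string, err error)
	Send(sessionID, req string) (string, error)
}

// ThinClient caches the upstream session id between requests.
type ThinClient struct {
	up      Upstream
	session string
}

// Do sends req; on session expiry it re-handshakes once and retries, so the
// error never reaches the client that spawned the thin client.
func (c *ThinClient) Do(req string) (string, error) {
	if c.session == "" {
		id, err := c.up.Initialize()
		if err != nil {
			return "", err
		}
		c.session = id
	}
	resp, err := c.up.Send(c.session, req)
	if !errors.Is(err, errSessionExpired) {
		return resp, err
	}
	id, err := c.up.Initialize() // fresh initialize, fresh session id
	if err != nil {
		return "", err
	}
	c.session = id
	return c.up.Send(c.session, req)
}

// fakeUpstream simulates a server whose session table can be rotated.
type fakeUpstream struct {
	next  int
	valid string
	inits int
}

func (f *fakeUpstream) Initialize() (string, error) {
	f.inits++
	f.next++
	f.valid = "s" + strconv.Itoa(f.next)
	return f.valid, nil
}

func (f *fakeUpstream) Send(id, req string) (string, error) {
	if id != f.valid {
		return "", errSessionExpired
	}
	return "ok:" + req, nil
}

// Rotate drops all sessions, as a container restart would.
func (f *fakeUpstream) Rotate() { f.valid = "" }
```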

GET /v1/_models — expose configured model aliases

Loomcycle has always let the operator define named model aliases in yaml (models: {chat: {provider: anthropic, model: claude-sonnet-4-6}}) so agents can reference tiered aliases (chat, local-medium) instead of concrete provider-model pairs. The alias resolves at runtime; an operator changing the alias's target in yaml retargets every agent using it.

What was missing: a way for a UI to discover the alias set. GET /v1/_providers/{id}/models returned concrete per-provider models; the tier system returned tiered resolution paths; neither exposed the operator's alias map. So a UI wanting to offer aliases in a model-picker had no source. The picker fell back to concrete models, which meant a fork created through the UI stored a concrete model instead of an alias, which meant the fork didn't follow operator-level alias retargets.

New endpoint: GET /v1/_models returns the alias map. Non-secret config (provider names, model names). Tenant-readable (substrate:tenant gate matches the Library reads); the alias set is a global operator concern, so every authenticated caller sees the same list. Loomboard's model picker uses this to offer aliases as first-class picks; a fork created with chat stays aliased to chat.
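Why a stored alias beats a stored concrete model, in a minimal Go sketch. ModelAlias, resolveModel, and modelsBody are hypothetical; the body shape (alias name to provider/model) is an assumption about what GET /v1/_models returns, and the local model name is a placeholder.

```go
package main

import "encoding/json"

// ModelAlias is one entry in the operator's models: map.
type ModelAlias struct {
	Provider string `json:"provider"`
	Model    string `json:"model"`
}

// resolveModel looks an alias up at call time, so a fork that stores "chat"
// follows wherever the operator points "chat" today.
func resolveModel(aliases map[string]ModelAlias, name string) (ModelAlias, bool) {
	a, ok := aliases[name]
	return a, ok
}

// modelsBody renders the alias map as a response body. encoding/json sorts
// map keys, so the output is stable across calls.
func modelsBody(aliases map[string]ModelAlias) ([]byte, error) {
	return json.Marshal(aliases)
}
```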

What this five-day arc unlocks

The three arcs weave together more than they look. RFC AS makes multi-tenant deployment operationally realistic for the first time. TrueNAS hardening makes self-hosting operationally realistic on real prosumer hardware. Vision + thinking on every wire makes model output legible to consumers regardless of which provider ran under the hood.

Together, they're the substrate for a shared, self-hosted, cross-provider loomcycle that a small team can actually run in production without me on the phone. The v1.0 launch shipped the primitives. This week shipped the operational surface.

Next up: the RFC AR TrueNAS Scale catalog app (which packages the compose into a one-click TrueNAS install), and the loomboard chat surface (the first standalone product on top of loomcycle's substrate, currently pre-alpha in the loomboard repo). Both draw directly on what shipped this week: RFC AS for the tenant Web UI, chat + chat-local bundles for the default agents, Context op=self for the substrate that loomboard chat introspects, EventThinking forwarding for the reasoning trace loomboard renders inline.

Companion reading: last week's TrueNAS field report (the build this arc's TrueNAS hardening targets), v1.5.0 co-authoring (the RFC AG + AO substrate that made RFC AS tractable), v1.4.0 Path + Document (the primitive op=set_path extends).