loomcycle
§ release note

The day the reviewer agent inlined the Gitea token in a Bash command — and v0.23.4 redacted it anyway.

Day four of the operator-via-MCP series, and the final experiment. Day one exercised tool dispatch and found a DeepSeek bug. Day two hit the MCP wedge and shipped concurrent dispatch. Day three ran a multi-agent refine loop and surfaced five capability-gate footguns. Today's experiment was the real-world one: a closed-loop dev workflow against a real self-hosted Gitea repo, with a real third-party MCP server, ending in a real Telegram notification. End-to-end, fully un-bridged, no relay — Gitea on the tailnet talked directly to loomcycle's webhook receiver.

The loop runs. The coder agent picks up a task from a channel, writes the code, opens a real PR. A Gitea webhook auto-spawns the reviewer (with the full signed payload). The reviewer clones, reviews the diff, calls mcp__gitea__pull_request_review_write + pull_request_write{method:"merge"}, and the PR is merged. A second webhook auto-spawns the advisor, which checks the merge state and posts "✅ exp4: PR #10 merged" to Telegram. PR opens, PR merges, Telegram lights up — every step driven by the runtime, every agent created at runtime, no static yaml in the final dynamic re-run.

That's the headline result. The interesting moment came mid-merge, in the reviewer's transcript. The agent decided to debug its own environment by running env — and inlined the resolved GITEA_TOKEN in a Bash command. The token is now in the SQLite transcript store. This is exactly the secret-leak pattern operators run experiments to find. Pre-v0.23.4 would have persisted the literal token to disk. Post-v0.23.4 made it structurally harmless. This post is about how.

The loop

Three agents, one channel, one third-party MCP, one Telegram bot:

| Agent | Wakes on | Tools | What it does |
| --- | --- | --- | --- |
| exp4-coder | channel exp4-tasks | Read, Write, Bash, mcp__gitea-dyn__* | Writes code, opens PR via gitea-mcp |
| exp4-reviewer | Gitea webhook pull_request | Read, Bash, mcp__gitea-dyn__* | Clones, diffs, approves, merges |
| exp4-advisor | Gitea webhook pull_request (merged) | mcp__telegram-dyn__*, Context | Posts merge notification to Telegram |

The Gitea MCP server is gitea-mcp v1.3.0, a real third-party project — 53 tools (PR create/read/review, repo CRUD, file ops, branches, comments, the works). We registered it dynamically via POST /v1/_mcpserverdef using a token-safe stdio wrapper that maps loomcycle's inherited env onto gitea-mcp's expected variable name without ever putting the secret in a tool argument. The Telegram MCP is a small self-built stdio server (send_message only) backed by a Telegram bot account.

Tailnet ingress had no relay. Loomcycle bound the tailnet IP (100.68.197.107:8788); Gitea, on another tailnet host, reached /v1/_webhooks/* directly. No smee.io, no Cloudflare tunnel — just two services on the same private network agreeing on a TLS-or-bust contract.

Two webhook footguns first — F23 and F28

Getting to "the loop runs un-bridged" required closing two webhook discoverability footguns surfaced in the v0.23.0 incarnation of the experiment.

F23 — the HMAC secret allowlist had a stale name. The receiver was reading LOOMCYCLE_SCHEDULER_ENV_ALLOWLIST, not the intuitive LOOMCYCLE_ENV_ALLOWLIST the CLI hint referenced. Set the wrong env var: every webhook returned 503 secret_unresolvable. The fix (#385, v0.23.1): a three-rule model — LOOMCYCLE_* verify auto-allow, static-yaml auto-trust, or an explicit LOOMCYCLE_WEBHOOKS_ENV_ALLOWLIST — plus an auth.kind: none trusted-network option. Zero allowlist config needed for the common case.

F28 — the agent's payload was empty. The webhook receiver mapped the run's prompt segment from proj.Fields["goal"] via the def's payload_mapping. Without a mapping, goal was the empty string. The webhook fired, the agent spawned, the agent saw nothing, the agent silently no-op'd. The fix (#387, v0.23.2): when no goal mapping is configured, default the segment text to the compact-JSON raw signed body (fenced as untrusted-block). The "GitHub-pattern" expectation — that the agent receives the event body — now holds without ceremony.
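The F28 default can be sketched in a few lines. This is a minimal illustration, not loomcycle's implementation: the function name `default_goal_segment` and the `<untrusted-block>` fence syntax are assumptions made for the example, and the sketch re-serializes the body to get compact JSON, which the real receiver may or may not do.

```python
import json


def default_goal_segment(raw_body: bytes, mapped_goal: str = "") -> str:
    """Pick the prompt segment text for a webhook-spawned run.

    Hypothetical helper: if a payload mapping produced a goal, use it;
    otherwise fall back to the compact-JSON body, fenced as untrusted.
    """
    if mapped_goal:
        return mapped_goal
    # Compact form: no spaces after separators, key order preserved.
    compact = json.dumps(
        json.loads(raw_body), separators=(",", ":"), ensure_ascii=False
    )
    # The fence marks the body as data, not instructions (syntax assumed).
    return "<untrusted-block>\n" + compact + "\n</untrusted-block>"
```

With no mapping configured, the agent sees the whole event body instead of an empty string, which is what turns the silent no-op into a usable run.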

With those two closed, the loop ran fully un-bridged: PR #3 opened by the coder; Gitea's pull_request webhook auto-spawned the reviewer with the full payload; the reviewer cloned, reviewed, and auto-merged. Real PR. Real merge. No human in the loop.

The moment the agent put the token in a Bash command

We ran a later iteration of the same loop on v0.23.3 with a freshly cleared DB. The coder opens PR #8. The webhook fires. The reviewer agent spawns. It clones, diffs, calls mcp__gitea__pull_request_review_write, and hits the expected self-approval refusal (the agent's Gitea token is the same as the PR author's; Gitea blocks self-approval). The reviewer falls back to merging directly. Before issuing the merge call, it decides to debug its environment.

Here is what the reviewer's transcript shows:

F32 · pre-v0.23.4 · the leak in the wild

The reviewer agent emitted a Bash tool call where the resolved Gitea token was inlined literally in the shell command, and then it ran env, printing the token a second time in the result.

Specifically: GITEA_TOKEN="<literal-token-value>" git -c http.extraheader=… in the command, then env | grep GITEA in the result. Two distinct exposures: the call input and the result text both contained the literal secret. Pre-v0.23.4, both would have persisted to the tool_calls table in SQLite alongside every other tool event. The DB on disk would have contained the literal token, retrievable by anyone who grabbed the file.

This is the leak operators worry about. The runtime had been careful about configuration secrets — env vars referenced by name, never logged by value, MCP server connection strings carrying ${run.credentials.x} placeholders. But the runtime trust boundary stopped at the wire surface. Once a secret was substituted into a Bash command by the agent's own reasoning, it was just bytes in a tool call, and the transcript store recorded those bytes faithfully.

The fix — v0.23.4 #407, value-based redaction at rest

The fix that shipped (#407, v0.23.4) is the right shape for this class of leak:

  1. Value-based, not name-based. The runtime knows the resolved values of every LOOMCYCLE_* env var (it resolved them at boot). Before persisting a tool call's input or result text, the redactor scans for substring matches against those values. Every occurrence is replaced with [redacted:<env-var-name>].
  2. Catches the typo. The agent assigned the token to GITEA_TOKEN in the shell, but the env var the operator set is LOOMCYCLE_GITEA_TOKEN. Name-based redaction would have missed this — the agent named its shell var differently. Value-based catches it because the resolved string is the same.
  3. Preserves the env-var name for debuggability. The redaction tag includes the name ([redacted:LOOMCYCLE_GITEA_TOKEN]), so an operator reviewing the transcript can see which secret got used where, without seeing the value. Debug context preserved; secret gone.
  4. At rest, not just in-transit. The redaction happens on the persistence path. The DB on disk contains [redacted:LOOMCYCLE_GITEA_TOKEN], not the value. Anyone who grabs the file gets nothing.
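The four properties above can be sketched as a small value-based redactor. This is an illustration under stated assumptions, not the code that shipped in #407: the function names are hypothetical, and the `min_len` guard is an assumption added so that short non-secret values (for example a flag set to `1`) do not get redacted out of every transcript.

```python
def build_secret_table(environ, prefix="LOOMCYCLE_", min_len=8):
    """Map each resolved secret value to the env-var name that holds it.

    min_len is an assumed guard: very short values such as "1" would
    otherwise match almost any text.
    """
    table = {}
    for name, value in environ.items():
        if name.startswith(prefix) and len(value) >= min_len:
            # If two vars share a value, keep the first name seen.
            table.setdefault(value, name)
    return table


def redact(text, table):
    """Replace every occurrence of a known secret value with a named tag."""
    # Longest values first, so a secret containing another is replaced whole.
    for value in sorted(table, key=len, reverse=True):
        text = text.replace(value, f"[redacted:{table[value]}]")
    return text
```

Because matching is on the value, it does not matter that the agent called its shell variable GITEA_TOKEN: the tag still names the operator's LOOMCYCLE_GITEA_TOKEN. In this sketch, `redact` would be applied to both the tool call input and the result text on the persistence path.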

Reproduced under v0.23.4: the same reviewer-agent behavior (inline assignment + env dump). A whole-DB scan with the secret pulled from .env.local, piped through strings | grep -f, returned 0 literal token hits, 0 webhook-secret hits, and 32 env-var-name references. The discipline holds: store names, never values. Now enforced by the runtime, not by the agent's good behavior.
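A byte-level check in the same spirit as the whole-DB scan might look like the sketch below. It is not the command used in the experiment: the helper name is hypothetical, and it assumes secrets are stored as contiguous UTF-8 bytes, so a value split across SQLite pages or stored in another encoding would be missed.

```python
def count_literal_hits(blob: bytes, secrets: dict) -> dict:
    """Count raw occurrences of each secret value in a file's bytes.

    secrets maps env-var name -> resolved value (hypothetical shape).
    Assumes contiguous UTF-8 storage; this is a smoke test, not a proof.
    """
    return {
        name: blob.count(value.encode("utf-8"))
        for name, value in secrets.items()
    }
```

To use it against a real file, read the database in binary mode (`open(path, "rb").read()`) and pass the bytes in. Every count should be zero after redaction, while the env-var names themselves remain searchable.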

What you see in the transcript instead of the token:

"command": "GITEA_TOKEN=\"[redacted:LOOMCYCLE_GITEA_TOKEN]\" git -c http.extraheader=… …"
"text":    "LOOMCYCLE_GITEA_TOKEN=[redacted:LOOMCYCLE_GITEA_TOKEN]\n
            LOOMCYCLE_GITEA_WEBHOOK_SECRET=[redacted:LOOMCYCLE_GITEA_WEBHOOK_SECRET]\n…"

The residual caveat worth naming. The Bash process the agent spawns still has the real environment — that's how the command works at all. So at runtime, while the process is alive, the secret is in memory and a determined attacker with local-process access could grab it. The fix is "value-at-rest," not "value-out-of-process." A real sandbox boundary (e.g. running tool processes under a different UID with a scrubbed env, or strict capability-gated env passthrough) is a separate, larger design question. This redaction is the right discipline for the persistence boundary specifically — the one most operators worry about for compliance and post-mortem review.

The fully-dynamic arc — closing the substrate

Three more fixes in this same window completed a longer arc: every entity in the workflow now creatable at runtime, no static yaml required. The series across all four experiments has been moving in this direction; exp4 is where it lands.

F30 — webhook can spawn a dynamic agent (#403)

v0.23.2 had a gap: a runtime-created webhook tried to spawn a runtime-created agent, and the spawn refused with rejected_spawn_setup: unknown agent: exp4-reviewer-dyn. Root cause: the webhook-spawn path was resolving agents from yaml only, not from the AgentDef store. The fix: WebhookDef now stamps TenantID from the run identity (mirroring AgentDef), so a runtime webhook resolves a runtime agent under the same tenant.

F31 — dynamic stdio MCP, gated (#405)

Dynamic MCP registration in v0.23.2 was HTTP-only — a stdio server would run an arbitrary local command outside loomcycle's HTTP host-allowlist mediation, so the substrate refused it. The fix: LOOMCYCLE_MCP_ALLOW_DYNAMIC_STDIO=1 opt-in flag that permits POST /v1/_mcpserverdef with transport: stdio + command / args / env. Off by default; the operator explicitly opts in.

The token-safe pattern for an MCP server like gitea-mcp that wants its env named differently than the loomcycle convention: a tiny shell wrapper that maps inherited env on the way through. gitea-mcp-stdio.sh:

#!/bin/sh
exec env GITEA_ACCESS_TOKEN="$LOOMCYCLE_GITEA_TOKEN" gitea-mcp -t stdio

The secret flows env → env → env. It never enters a tool argument; it never enters a command line a shell would see; it never enters the MCPServerDef's persisted content. With v0.23.4's redaction, even if the agent does something silly with it inside the run, the persistence boundary holds.

F32 (the structural half) — MCPServerDef stores env reference (#406)

Related cleanup: when a dynamic MCPServerDef's headers (or HTTP URL) contain ${LOOMCYCLE_*}, the def now stores the literal reference, not the resolved value. The runtime resolves at dial time. So even the def's content hash and persisted shape don't contain the secret. Combined with #407's redaction, secrets-at-rest is closed at two distinct seams.
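The store-the-reference, resolve-at-dial-time split can be sketched as follows. The regex, the function name `resolve_refs`, and the example header are assumptions for illustration; only the `${LOOMCYCLE_*}` reference form comes from the behavior described above.

```python
import re

# Matches references such as ${LOOMCYCLE_GITEA_TOKEN} (pattern assumed).
_REF = re.compile(r"\$\{(LOOMCYCLE_[A-Z0-9_]+)\}")


def resolve_refs(template: str, environ: dict) -> str:
    """Substitute env references at dial time; fail loudly if one is unset."""

    def lookup(match):
        name = match.group(1)
        if name not in environ:
            raise KeyError(f"unresolved reference: {name}")
        return environ[name]

    return _REF.sub(lookup, template)
```

The persisted def keeps the template string, so hashing or dumping it never touches the secret. Only the short-lived, resolved copy built just before connecting carries the value.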

F33 — dynamic MCP tools advertised at run start (#409)

One last footgun, found while wiring up the advisor → Telegram leg. The advisor was created dynamically with allowed_tools: ["mcp__telegram-dyn__*"] and no Context. The OAuth-driven sonnet model emitted the send_message call as literal <function_calls> text — zero real tool_call events. The Telegram message was never sent.

Root cause: lazy MCP tool resolution. The dynamic telegram-dyn server's tool catalog wasn't advertised at run start — it would have lazy-registered on the first call, but the model never entered tool-calling mode because it had nothing in its advertised tool list. Adding Context to allowed_tools worked around it (the agent then discovered the tools via Context op=tools), but the underlying gap was real.

The fix: advertise dynamic MCP tools at run start, so a single-purpose MCP-only agent can enter tool-calling mode without the Context crutch. Verified on v0.23.5: the no-Context advisor emitted a real tool_call and Telegram delivered message_id: 9 for PR #10.
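As a rough sketch of what advertising at run start involves, the function below expands an agent's allowed_tools patterns against the catalogs of already-registered MCP servers. The function name and the catalog shape are hypothetical; the `mcp__<server>__<tool>` naming follows the tool names used in this post, and glob matching via `fnmatchcase` is an assumption about how the wildcard behaves.

```python
from fnmatch import fnmatchcase


def advertised_tools(allowed_patterns, catalogs):
    """Expand allowed_tools globs into concrete tool names at run start.

    catalogs maps server name -> list of tool names (hypothetical shape).
    """
    names = []
    for server, tools in catalogs.items():
        for tool in tools:
            full = f"mcp__{server}__{tool}"
            if any(fnmatchcase(full, pattern) for pattern in allowed_patterns):
                names.append(full)
    return names
```

With the list resolved up front, the model starts the run with `mcp__telegram-dyn__send_message` already in its advertised tools, so it has no reason to fall back to emitting the call as plain text.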

The engineering lesson worth keeping

Store names, never values — and enforce it value-based, so the agent's typo doesn't undo your discipline. Every config primitive in loomcycle has been "env-var name in the yaml, value resolved at dial time." That discipline is correct on the runtime side. But the agent — the part of the system we don't control — will sometimes write the value back out into a shell command, an HTTP body, or a log. Value-based redaction at the persistence boundary makes the runtime's discipline structurally hold, no matter what the agent does inside the run.

Four experiments, six and a half releases (v0.22.0 → v0.23.5), 33 findings, 8 already shipped by the time the experiments wrapped, the rest sequenced for v0.24 and beyond. The runtime grew up under the pressure of a real operator driving it through real wire surface for real tasks.

What's next? More experiments. The series wasn't a one-off; the four were just the first batch. exp5 is a real RAG agent against a real corpus; exp6 is a long-running async pipeline with snapshot/resume. Same framing: design the experiment in advance, run it through a fresh Claude Code session over MCP, file every gap the agent surfaces. The point is to find the gaps: the things the system tells the operator are possible but isn't quite ready for. That's how you ship a runtime that holds up in someone else's hands.

Companion reading: exp1 + exp2 — tool access and interruption, exp3 side analysis — the MCP wedge, exp3 main — the multi-agent refine loop. And RFC L (v0.17.0) for the principal model F18 interacts with.