§ architecture note

Route agents by data sensitivity: local where it matters, cloud where it doesn't.

2026-05-27 · by Dennis Gubsky · ~5 min read

"Can we use agents on our sensitive data?" is the question every enterprise asks before it lets an agentic system near anything real. The two obvious answers are both wrong.

"Run everything on local models" keeps the data on your infrastructure, but local models are meaningfully weaker than frontier cloud models - your whole agent fleet inherits the quality ceiling of whatever fits on your GPU. "Run everything on the cloud provider" gives you the best models, but now your customers' PII, your internal documents, and your proprietary data are crossing a network boundary to a third party - and "they promised not to train on it" is a contract you have to trust, not a fact you can prove.

The right answer is neither. It's to route by sensitivity: the handful of agents that actually touch sensitive data run on a local model, where the data physically never leaves your box; everything else uses the best cloud model for the job. Most agents in a real fleet don't touch sensitive data at all - they research public information, transform already-public text, call external APIs. Those should get the good models. Only the few that handle the sensitive material pay the local-model quality tax. That's the pragmatic operator default, and it's a per-agent decision.

Provider is a per-agent policy, not a global default

In loomcycle, which model an agent runs on isn't a global setting - it's part of the agent's own definition. The shared loomcycle.yaml holds the common provider-resolution structure: the (provider, tier) → model mapping every agent draws from by default. Each agent's body config can lean on that - just name a tier - or override it, pinning a specific provider, tier, and even a particular model. So you classify agents by sensitivity and direct the sensitive ones to a local model:

# .claude/agents/pii-handler.md  - touches customer PII
---
name: pii-handler
provider: ollama-local        # runs on your GPU; data never leaves the box
model: llama3.1:70b           # pin a specific local model directly...
allowed_tools: [Memory, Read]
---
You extract and normalise contact details from uploaded documents.
Everything you process is sensitive. Never call external tools.

# .claude/agents/web-researcher.md  - touches only public data
---
name: web-researcher
provider: anthropic           # best cloud model; only sees public web content
tier: high                    # ...or just name a tier, resolved via loomcycle.yaml
allowed_tools: [WebSearch, WebFetch, Memory]
---
You research public information on the open web and summarise findings.

Same runtime, same binary, same bearer token. The pii-handler agent's requests go to a local Ollama model on the same machine - no outbound HTTP to any provider. The web-researcher agent's requests go to Anthropic, because it only ever sees public web content, where a cloud model is the obvious choice. The operator sets the classification, and the agents can't escape it (the UNIX-style trust model makes operator config the floor).

The two examples show both styles: web-researcher names a tier and lets the common resolver in loomcycle.yaml pick the concrete model; pii-handler pins a specific model directly, because for the sensitive tier you want to know exactly which local weights are running, not "whatever the resolver maps middle-tier to today." Both are per-agent; both are operator-set.
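To make the two styles concrete, here is a minimal Python sketch of the resolution rule as described above: a pinned model wins, otherwise the (provider, tier) pair is looked up in a shared map. This is an illustration, not loomcycle's code. `AgentSpec`, `TIER_MAP`, and `resolve_model` are hypothetical names, and the model names in the map are placeholders rather than real loomcycle.yaml defaults.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the shared (provider, tier) -> model map.
# The model names are placeholders, not real loomcycle defaults.
TIER_MAP = {
    ("anthropic", "high"): "example-cloud-high",
    ("anthropic", "middle"): "example-cloud-middle",
    ("ollama-local", "middle"): "example-local-middle",
}


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical per-agent routing fields, mirroring the frontmatter above."""
    name: str
    provider: str
    tier: Optional[str] = None
    model: Optional[str] = None


def resolve_model(agent: AgentSpec, tier_map=TIER_MAP) -> str:
    """Return the concrete model for an agent: a pinned model wins over the tier map."""
    if agent.model is not None:
        return agent.model
    if agent.tier is None:
        raise ValueError(f"{agent.name}: needs either a pinned model or a tier")
    try:
        return tier_map[(agent.provider, agent.tier)]
    except KeyError:
        raise ValueError(
            f"{agent.name}: no mapping for ({agent.provider}, {agent.tier})"
        ) from None


pii_handler = AgentSpec("pii-handler", "ollama-local", model="llama3.1:70b")
web_researcher = AgentSpec("web-researcher", "anthropic", tier="high")

print(resolve_model(pii_handler))     # the pinned local model
print(resolve_model(web_researcher))  # whatever the map holds for (anthropic, high)
```

The design point the sketch captures: editing the shared map changes what web-researcher runs on, but never what pii-handler runs on, because its model is pinned.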

And the per-agent direction lives in whichever surface fits your workflow - the same three loomcycle exposes everywhere:

Agent .md frontmatter (Claude Code-compatible) - the format shown above; drop a file in .claude/agents/.
Static YAML agent blocks - declare the agent inline in loomcycle.yaml for fully-static deployments.
The Web UI Library editor - set provider / tier / model in the structured form, no file edit, no restart (writes to the AgentDef substrate, so it's runtime-mutable).

The supporting pieces all shipped well before v1.0:

ollama-local as a first-class provider (split out from cloud Ollama in v0.8.3) - the on-box execution target for the sensitive tier.
The common provider-resolution structure in loomcycle.yaml - the shared (provider, tier, effort) → model map every agent draws from unless it overrides; routing is policy you can change, not code you rewrite.
Per-agent override - provider, tier, and a specific model, settable from any of the three surfaces above.
PII redaction - for the awkward middle case where an agent genuinely needs a cloud model and touches some identifying data, identifying fields are masked before the prompt leaves the box. (See "Even with no-training contracts, the LLM should never see your name.")

Residency you can prove, not retention you have to trust

The local-model tier gives you something a cloud-only architecture structurally cannot: a residency guarantee instead of a retention promise.

Claim	What it means	How you verify it
Retention ("they won't train on it")	Your data still leaves your box; the provider contractually agrees not to retain or train on it.	You can't. You trust the contract and the provider's word.
Residency ("it never left")	The sensitive agent runs on a local model; the data never crosses your network boundary at all.	A packet capture. There's no outbound request to inspect.

In a security review, "we can show you the network trace - nothing left the host" ends the conversation in a way "here's the DPA clause about model training" never does. That's the dimension where routing the sensitive tier to a local model wins, and it's the dimension cloud-only agent platforms can't compete on no matter what their privacy policy says.
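One way to turn a packet capture into a yes/no answer is to check every destination address the sensitive agent's process talked to. The sketch below assumes you have already extracted the destination IPs from the capture with your tool of choice, and that the local model listens on the loopback interface of the same machine. `off_host_destinations` is a hypothetical helper, not part of loomcycle.

```python
import ipaddress


def off_host_destinations(destinations):
    """Return the destination addresses that are not loopback, i.e. left the host."""
    return [
        dst for dst in destinations
        if not ipaddress.ip_address(dst).is_loopback
    ]


# Destination IPs pulled from a capture taken while pii-handler was running.
# The addresses are illustrative (203.0.113.0/24 is a documentation range).
clean_capture = ["127.0.0.1", "::1", "127.0.0.1"]
leaky_capture = ["127.0.0.1", "203.0.113.7"]

print(off_host_destinations(clean_capture))
print(off_host_destinations(leaky_capture))
```

An empty result is the residency evidence; any address in the list is a request that crossed the boundary and needs explaining.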

Lead with the pragmatism, not the purity. This isn't "go fully local for safety" - that sacrifices quality across your whole fleet for a constraint that only a few agents actually have. It's "spend the local-model quality tax only where the data sensitivity demands it, and get frontier-cloud quality everywhere else." The win is that loomcycle lets you make that call per agent, on one runtime, instead of forcing one provider decision across everything.

The honest tradeoff

Local models are weaker. An agent pinned to ollama-local will produce lower-quality output than the same agent on a frontier cloud model - sometimes much lower, depending on the task and the model that fits your hardware. This is real, and pretending otherwise would be dishonest.

Two things make it workable in practice:

Sensitive-data agents are often the simpler ones. Extracting structured fields from a document, normalising contact details, classifying a record - these are tasks where a competent local model is adequate. The hard reasoning tasks (multi-step research, complex synthesis) tend to be the ones working over public or non-sensitive data, where you can use the cloud model anyway.
You can A/B it without re-architecting. Because provider is a per-agent field, you can run the sensitive agent on ollama-local in production and the same definition on a cloud provider in a staging eval, compare outputs, and decide whether the local quality clears your bar - per agent, with no code change. The AgentDef substrate even lets you fork-and-pin versions for the comparison.
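For extraction-style agents, the comparison step can be as simple as field-level agreement between the two runs. The sketch below is one possible scoring rule under stated assumptions: both agents return flat dictionaries, exact match is the right notion of "agree", and the 0.95 threshold is arbitrary. `field_agreement` and `clears_bar` are hypothetical helpers, not loomcycle APIs.

```python
def field_agreement(local_out: dict, cloud_out: dict) -> float:
    """Fraction of fields (over the union of keys) where both outputs match exactly."""
    keys = set(local_out) | set(cloud_out)
    if not keys:
        return 1.0  # two empty outputs trivially agree
    same = sum(
        1 for key in keys
        if key in local_out and key in cloud_out and local_out[key] == cloud_out[key]
    )
    return same / len(keys)


def clears_bar(pairs, threshold: float = 0.95) -> bool:
    """True if the mean agreement across (local, cloud) output pairs meets the threshold."""
    if not pairs:
        raise ValueError("need at least one (local, cloud) output pair")
    scores = [field_agreement(local, cloud) for local, cloud in pairs]
    return sum(scores) / len(scores) >= threshold


# Synthetic eval records, not real customer data.
local_run = {"name": "Ana Ruiz", "phone": "555-0100"}
cloud_run = {"name": "Ana Ruiz", "phone": "555-0199"}
print(field_agreement(local_run, cloud_run))
```

Exact match is a deliberately strict choice; a real eval might normalise values first or weight fields differently.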

If a sensitive agent's task genuinely needs frontier-model quality and the local option doesn't clear the bar, that's where PII redaction becomes the fallback: mask the identifying fields, send the structure to the cloud model, re-inflate on the way back. Residency-via-local-model is the strongest guarantee; redaction-on-cloud is the next-best when local quality won't do. loomcycle gives you both, selectable per agent.
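The mask / send / re-inflate loop can be sketched in a few lines. This is not loomcycle's redaction layer: it is a minimal illustration that handles only email addresses with a deliberately simple regex, and `mask` and `reinflate` are hypothetical names. A real redactor would cover many more field types.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask(text: str):
    """Replace each email with a placeholder token; return (masked_text, token_map)."""
    mapping = {}

    def _substitute(match):
        value = match.group(0)
        # Reuse the token when the same address appears more than once.
        for token, original in mapping.items():
            if original == value:
                return token
        token = f"<EMAIL_{len(mapping) + 1}>"
        mapping[token] = value
        return token

    return EMAIL_RE.sub(_substitute, text), mapping


def reinflate(text: str, mapping: dict) -> str:
    """Put the original values back into text that came back from the cloud model."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


prompt = "Contact ana@example.com or bob@example.org; ana@example.com is primary."
masked, token_map = mask(prompt)
print(masked)  # only this masked string would be sent to the cloud model
print(reinflate(masked, token_map) == prompt)
```

The token map never leaves the box, which is what keeps the identifying values local even though the prompt structure goes to the cloud.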

Who this is for

Any deployment where some agents touch data that can't leave your infrastructure and some don't:

Regulated industries - healthcare (PHI), finance (account data), legal (privileged material) - where a subset of agents handle the regulated data and the rest do general work.
Enterprises with data-residency requirements - GDPR, data-localisation laws, internal "no customer PII to third parties" policies.
Anyone who wants frontier-model quality for the 80% of agent work that's safe to send to the cloud, without compromising on the 20% that isn't.

The mechanism is in the open-source runtime today - per-agent provider policy plus a local-model provider is all it takes to build this. Configuration costs nothing; the open-source runtime does the routing. (The governance layer that enforces the classification and produces audit-log proof that sensitive agents only ever hit local models - for SOC 2 / GDPR / HIPAA evidence - is the kind of thing that belongs in a future enterprise tier, but the routing itself is yours today.)

Companion writeup: "Even with no-training contracts, the LLM should never see your name" - the PII-redaction layer that handles the case where a sensitive agent must use a cloud model. Residency-via-local-model and redaction-on-cloud are the two halves of the same privacy posture; you pick per agent.