<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Loomcycle blog</title>
    <link>https://loomcycle.dev/blog/</link>
    <atom:link href="https://loomcycle.dev/blog/feed.xml" rel="self" type="application/rss+xml" />
    <description>Engineering writeups from the loomcycle project. Benchmark findings, architecture decisions, lessons learned.</description>
    <language>en-us</language>
    <copyright>Copyright 2026 Dennis Gubsky. Apache-2.0 licensed software; blog posts are All Rights Reserved unless noted otherwise.</copyright>
    <lastBuildDate>Fri, 15 May 2026 10:00:00 GMT</lastBuildDate>
    <ttl>1440</ttl>

    <item>
      <title>The final bench scoreboard — 25 models, $21.92, all CAPABLE</title>
      <link>https://loomcycle.dev/blog/the-final-bench-scoreboard.html</link>
      <guid isPermaLink="true">https://loomcycle.dev/blog/the-final-bench-scoreboard.html</guid>
      <pubDate>Fri, 15 May 2026 10:00:00 GMT</pubDate>
      <author>denn@loomcycle.dev (Dennis Gubsky)</author>
      <description>Sweep #6 with v3 cases + multi-judge consensus across three provider families. Every model hit CAPABLE; the real signal is cost-per-pass and overall-pass count. ollama/deepseek-v4-pro topped both quality (0.91 semantic) and price ($0.0022/pass) — beating Opus at 1/75th the cost.</description>
    </item>

    <item>
      <title>How we selected agent- and tool-capable models with our own benchmark</title>
      <link>https://loomcycle.dev/blog/how-we-selected-agent-and-tool-capable-models-with-own-benchmark.html</link>
      <guid isPermaLink="true">https://loomcycle.dev/blog/how-we-selected-agent-and-tool-capable-models-with-own-benchmark.html</guid>
      <pubDate>Thu, 14 May 2026 22:30:00 GMT</pubDate>
      <author>denn@loomcycle.dev (Dennis Gubsky)</author>
      <description>We benchmarked five providers and all current flagship models for agentic tool-calling. Four sweeps in, we found a bug in our own bench harness that invalidated most of our conclusions. Here's what we learned, what the corrected findings actually say, and what's going into v2 of the bench.</description>
    </item>

    <item>
      <title>How I burned $80 on Claude Code in a Sunday afternoon</title>
      <link>https://loomcycle.dev/blog/the-80-dollars-i-burned-on-claude-code.html</link>
      <guid isPermaLink="true">https://loomcycle.dev/blog/the-80-dollars-i-burned-on-claude-code.html</guid>
      <pubDate>Thu, 07 May 2026 22:30:00 GMT</pubDate>
      <author>denn@loomcycle.dev (Dennis Gubsky)</author>
      <description>100 parallel claude --print instances. MacBook Pro M1 fan at maximum. ANTHROPIC_API_KEY inherited via execve. Opus 4.7 on a dumb classification task. The bill: $80. Anthropic's support bot denied reimbursement. The architectural lesson became loomcycle.</description>
    </item>
  </channel>
</rss>
