<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Loomcycle blog</title>
    <link>https://loomcycle.dev/blog/</link>
    <atom:link href="https://loomcycle.dev/blog/feed.xml" rel="self" type="application/rss+xml" />
    <description>Engineering writeups from the loomcycle project. Benchmark findings, architecture decisions, lessons learned.</description>
    <language>en-us</language>
    <copyright>Copyright 2026 Dennis Gubsky. Apache-2.0 licensed software; blog posts are All Rights Reserved unless noted otherwise.</copyright>
    <lastBuildDate>Fri, 15 May 2026 10:00:00 GMT</lastBuildDate>
    <ttl>1440</ttl>

    <item>
      <title>The final bench scoreboard — 25 models, $21.92, all CAPABLE</title>
      <link>https://loomcycle.dev/blog/the-final-bench-scoreboard.html</link>
      <guid isPermaLink="true">https://loomcycle.dev/blog/the-final-bench-scoreboard.html</guid>
      <pubDate>Fri, 15 May 2026 10:00:00 GMT</pubDate>
      <author>denn@loomcycle.dev (Dennis Gubsky)</author>
      <description>Sweep #6 with v3 cases + multi-judge consensus across three provider families. Every model hit CAPABLE; the real signal is cost-per-pass and overall-pass count. ollama/deepseek-v4-pro topped both quality (0.91 semantic) and price ($0.0022/pass) — beating Opus at 1/75th the cost.</description>
    </item>

    <item>
      <title>How we selected agent- and tool-capable models with our own benchmark</title>
      <link>https://loomcycle.dev/blog/how-we-selected-agent-and-tool-capable-models-with-own-benchmark.html</link>
      <guid isPermaLink="true">https://loomcycle.dev/blog/how-we-selected-agent-and-tool-capable-models-with-own-benchmark.html</guid>
      <pubDate>Thu, 14 May 2026 22:30:00 GMT</pubDate>
      <author>denn@loomcycle.dev (Dennis Gubsky)</author>
      <description>We benchmarked five providers and all current flagship models for agentic tool-calling. Four sweeps in, we found a bug in our own bench harness that invalidated most of our conclusions. Here's what we learned, what the corrected findings actually say, and what's going into v2 of the bench.</description>
    </item>

    <item>
      <title>How I burned $80 on Claude Code in a Sunday afternoon</title>
      <link>https://loomcycle.dev/blog/the-80-dollars-i-burned-on-claude-code.html</link>
      <guid isPermaLink="true">https://loomcycle.dev/blog/the-80-dollars-i-burned-on-claude-code.html</guid>
      <pubDate>Thu, 07 May 2026 22:30:00 GMT</pubDate>
      <author>denn@loomcycle.dev (Dennis Gubsky)</author>
      <description>100 parallel claude --print instances. MacBook Pro M1 fan at maximum. ANTHROPIC_API_KEY inherited via execve. Opus 4.7 on a dumb classification task. The bill: $80. Anthropic's support bot denied reimbursement. The architectural lesson became loomcycle.</description>
    </item>
  </channel>
</rss>
