<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Claude-Code on Isac Builds</title>
        <link>https://blog.isacbuilds.com/tags/claude-code/</link>
        <description>Recent content in Claude-Code on Isac Builds</description>
        <generator>Hugo</generator>
        <language>en-us</language>
        <lastBuildDate>Sun, 12 Apr 2026 00:00:00 +0000</lastBuildDate>
        <atom:link href="https://blog.isacbuilds.com/tags/claude-code/index.xml" rel="self" type="application/rss+xml" />
        <item>
            <title>Harness Engineering: The Skill That Separates AI-Native Devs</title>
            <link>https://blog.isacbuilds.com/posts/harness-engineering-guide-ai-native-developers/</link>
            <pubDate>Sun, 12 Apr 2026 00:00:00 +0000</pubDate>
            <guid>https://blog.isacbuilds.com/posts/harness-engineering-guide-ai-native-developers/</guid>
            <description>&lt;h1 id=&#34;harness-engineering-the-skill-that-separates-ai-native-devs&#34;&gt;Harness Engineering: The Skill That Separates AI-Native Devs&lt;/h1&gt;&#xA;&lt;p&gt;Harness engineering is the discipline of building everything around an LLM to make it a reliable production system. Not prompt engineering. Not picking the right model. The real skill is designing the tools, memory, guardrails, and orchestration that turn raw model intelligence into consistent, useful output. If you&amp;rsquo;ve been wondering why some developers ship 10x with AI while others struggle with the same models, this is the answer.&lt;/p&gt;</description>
            <content:encoded>
                <![CDATA[<h1 id="harness-engineering-the-skill-that-separates-ai-native-devs">Harness Engineering: The Skill That Separates AI-Native Devs</h1>
<p>Harness engineering is the discipline of building everything around an LLM to make it a reliable production system. Not prompt engineering. Not picking the right model. The real skill is designing the tools, memory, guardrails, and orchestration that turn raw model intelligence into consistent, useful output. If you&rsquo;ve been wondering why some developers ship 10x with AI while others struggle with the same models, this is the answer.</p>
<p>The equation is simple: <strong>Agent = Model + Harness.</strong> The model provides intelligence. The harness provides direction. And in 2026, the harness is where all the engineering value lives.</p>
<p>I&rsquo;ve been building this way for months without having a name for it. My Obsidian vault, my <code>CLAUDE.md</code> files, my custom CLI tools, my skills workflows. All of it maps directly to what <a href="https://martinfowler.com/articles/harness-engineering.html">Martin Fowler</a>, <a href="https://openai.com/index/harness-engineering/">OpenAI</a>, and <a href="https://www.anthropic.com/engineering/harness-design-long-running-apps">Anthropic</a> are now formalizing as harness engineering. Here&rsquo;s what the discipline actually looks like in practice.</p>
<h2 id="what-is-a-harness-exactly">What Is a Harness, Exactly?</h2>
<p>A harness is every piece of code, configuration, and infrastructure that is <strong>not</strong> the model itself. It&rsquo;s the interface between the LLM and the real world. It manages:</p>
<ul>
<li><strong>How context is loaded</strong> (what the model sees)</li>
<li><strong>Which tools are available</strong> (what the model can do)</li>
<li><strong>How failures are handled</strong> (what happens when things break)</li>
<li><strong>How state persists</strong> (what the model remembers across sessions)</li>
</ul>
<p>Think of it like this: the model is a powerful engine. The harness is the chassis, steering, transmission, and brakes that make it a usable vehicle. Without the harness, you just have raw power with no direction.</p>
<p><img src="/images/Pasted%20image%2020260413190319.png" alt="Pasted image 20260413190319.png"></p>
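<p>To make the four responsibilities concrete, here&rsquo;s a minimal sketch in Python. Everything in it is an assumption for illustration: the <code>Harness</code> class, the <code>Model</code> and <code>Tool</code> type aliases, and the idea that a &ldquo;model&rdquo; is just a function returning a tool name and an argument. Real runtimes are far richer, but the division of labor is the same.</p>

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration: a "model" here is any function that
# maps a prompt to a (tool name, tool argument) pair.
Model = Callable[[str], tuple[str, str]]
Tool = Callable[[str], str]


@dataclass
class Harness:
    load_context: Callable[[str], str]        # what the model sees
    tools: dict[str, Tool]                    # what the model can do
    on_failure: Callable[[Exception], str]    # what happens when things break
    state: dict[str, str] = field(default_factory=dict)  # what persists

    def run(self, model: Model, task: str) -> str:
        prompt = self.load_context(task)
        tool_name, argument = model(prompt)
        try:
            result = self.tools[tool_name](argument)
        except Exception as exc:  # unknown tool or a tool that crashed
            result = self.on_failure(exc)
        self.state[task] = result  # remembered for later tasks
        return result
```

<p>Notice that the model is the only part passed in from outside. Swap it for a different one and the harness stays exactly as it was.</p>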
<h2 id="the-six-core-principles">The Six Core Principles</h2>
<p>After synthesizing research from Martin Fowler&rsquo;s Thoughtworks team, OpenAI&rsquo;s Codex group, Anthropic&rsquo;s engineering blog, and practitioners like Dex Horthy, six principles keep showing up across every source.</p>
<h3 id="1-build-context-into-the-environment">1. Build Context into the Environment</h3>
<p>Stop cramming documents into chat windows. Instead, build a structured environment (a vault, a docs directory, a well-organized repo) that the AI can search and read as needed.</p>
<p>This is <strong>progressive disclosure</strong>: give the model a map, not a 1,000-page manual. A short <code>CLAUDE.md</code> or <code>AGENTS.md</code> acts as a table of contents pointing to deeper documentation, and the model pulls what it needs, when it needs it.</p>
<p>Because everything here is new, there&rsquo;s no perfect recipe. You test, evaluate against your own use case, and iterate. Harness engineering is exactly that: shaping the environment and refining it until it&rsquo;s resilient enough for agentic code.</p>
<p>For example, you can use hooks to preload context or make the LLM aware of its environment before it acts.</p>
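<p>Here&rsquo;s a rough sketch of the kind of script such a hook could run. It prints the head of the map file plus a top-level directory listing, so the model starts with a snapshot of where it is. The function name <code>build_context_preamble</code> and its parameters are my own invention, and how the script gets registered as a hook depends on your runtime, so that part is left out.</p>

```python
from pathlib import Path


def build_context_preamble(root: Path, map_name: str = "CLAUDE.md", max_lines: int = 40) -> str:
    """Return a short text snapshot: the head of the map file plus a top-level listing."""
    lines = [f"# Environment snapshot for {root.name}"]
    map_file = root / map_name
    if map_file.is_file():
        head = map_file.read_text(encoding="utf-8").splitlines()[:max_lines]
        lines.append(f"## {map_name} (first {len(head)} lines)")
        lines.extend(head)
    lines.append("## Top-level entries")
    for entry in sorted(root.iterdir()):
        suffix = "/" if entry.is_dir() else ""
        lines.append(f"- {entry.name}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumption: the hook runs from the project root and its stdout is
    # what the runtime feeds to the model.
    print(build_context_preamble(Path.cwd()))
```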
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-fallback" data-lang="fallback"><span style="display:flex;"><span>brain/
</span></span><span style="display:flex;"><span>├── CLAUDE.md           ← entry point, the &#34;map&#34;
</span></span><span style="display:flex;"><span>├── docs/
</span></span><span style="display:flex;"><span>│   ├── architecture.md
</span></span><span style="display:flex;"><span>│   ├── conventions.md
</span></span><span style="display:flex;"><span>│   └── decisions/
</span></span><span style="display:flex;"><span>├── .claude/
</span></span><span style="display:flex;"><span>│   └── skills/         ← automated workflows
</span></span><span style="display:flex;"><span>└── content/
</span></span><span style="display:flex;"><span>    └── ...</span></span></code></pre></div>
<h3 id="2-the-filesystem-is-king">2. The Filesystem Is King</h3>
<p>High-performance harnesses use plain markdown files and git history as the primary state mechanism. Not vector databases. Not complex RAG pipelines. Markdown files that are human-readable, version-controlled, and cheap to maintain.</p>
<p>This sounds almost too simple, but it works. Git gives you version history. Markdown gives you portability. The filesystem gives you a search interface the model already understands.</p>
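<p>A minimal sketch of what &ldquo;markdown as state&rdquo; can look like, assuming a notes directory with one file per topic. The helper names <code>append_note</code> and <code>search_notes</code> are hypothetical. The point is that persistence is an append to a text file and retrieval is a plain substring search.</p>

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_note(notes_dir: Path, topic: str, text: str, today: Optional[date] = None) -> Path:
    """Append one dated bullet to <topic>.md, creating the file if needed."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    note = notes_dir / f"{topic}.md"
    stamp = (today or date.today()).isoformat()
    with note.open("a", encoding="utf-8") as handle:
        handle.write(f"- {stamp}: {text}\n")
    return note


def search_notes(notes_dir: Path, term: str) -> list[tuple[str, int, str]]:
    """Case-insensitive search; returns (file name, line number, line) hits."""
    hits = []
    for path in sorted(notes_dir.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if term.lower() in line.lower():
                hits.append((path.name, number, line))
    return hits
```

<p>Commit the notes directory and git history becomes the audit log for free.</p>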
<h3 id="3-verification-multiplies-quality">3. Verification Multiplies Quality</h3>
<p>Giving a model a way to verify its own work (linters, tests, a dedicated evaluator agent) can improve output quality by <strong>2-3x</strong>. This is one of the most underappreciated principles.</p>
<p>Anthropic&rsquo;s approach uses a Generator-Evaluator architecture inspired by GANs. One agent produces the work. A separate, skeptical agent grades it against concrete criteria. The key insight: models are inherently poor at evaluating their own output. They&rsquo;ll praise mediocre work if you ask them to self-review.</p>
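<p>The control flow behind that split can be sketched in a few lines. This is my own simplification, not Anthropic&rsquo;s implementation: <code>generate</code> and <code>evaluate</code> stand in for two separate agent calls, and <code>Verdict</code> is a hypothetical container for the grader&rsquo;s result.</p>

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    feedback: str


def generate_with_review(
    generate: Callable[[str, str], str],   # (task, feedback) -> draft
    evaluate: Callable[[str], Verdict],    # draft -> verdict from a separate grader
    task: str,
    max_rounds: int = 3,
) -> tuple[str, bool, int]:
    """Loop until the evaluator passes the draft or the round budget runs out."""
    draft, feedback = "", ""
    for round_number in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        verdict = evaluate(draft)
        if verdict.passed:
            return draft, True, round_number
        feedback = verdict.feedback  # the generator sees only the critique
    return draft, False, max_rounds
```

<p>The generator never grades itself. It only receives the evaluator&rsquo;s critique on the next round.</p>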
<h3 id="4-feedforward-and-feedback-controls">4. Feedforward and Feedback Controls</h3>
<p>Martin Fowler&rsquo;s team frames harness components as two types of controls:</p>
<ul>
<li><strong>Guides (feedforward):</strong> Steer behavior <em>before</em> the model acts. Your <code>CLAUDE.md</code>, your coding standards, your project conventions. These are the guardrails.</li>
<li><strong>Sensors (feedback):</strong> Observe results <em>after</em> the model acts. Linters, test suites, type checkers. These let the model self-correct before a human reviews.</li>
</ul>
<p>The combination is powerful. Guides prevent errors. Sensors catch what slips through.</p>
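<p>As a small sketch of both control types, assuming the agent&rsquo;s output is Python source: <code>apply_guides</code> prepends conventions before the model acts, and the sensors inspect the result afterwards. All function names here are hypothetical, and a real setup would call your actual linter and test suite instead of these toy checks.</p>

```python
from typing import Callable

Sensor = Callable[[str], list[str]]  # source text -> list of findings


def apply_guides(task: str, guides: list[str]) -> str:
    """Feedforward: put the conventions in front of the task."""
    rules = "\n".join(f"- {guide}" for guide in guides)
    return f"Conventions:\n{rules}\n\nTask: {task}"


def syntax_sensor(source: str) -> list[str]:
    """Feedback: report a syntax error without executing the code."""
    try:
        compile(source, "agent_output.py", "exec")
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    return []


def line_length_sensor(source: str, limit: int = 100) -> list[str]:
    """Feedback: flag lines longer than the limit."""
    return [
        f"line {number} exceeds {limit} characters"
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


def run_sensors(source: str, sensors: list[Sensor]) -> list[str]:
    """Collect findings from every sensor so they can be fed back to the model."""
    findings: list[str] = []
    for sensor in sensors:
        findings.extend(sensor(source))
    return findings
```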
<h3 id="5-agent-legibility-over-human-aesthetics">5. Agent Legibility Over Human Aesthetics</h3>
<p>OpenAI is pushing hard on this idea: optimize your codebase for <strong>agent reasoning</strong>, not just human stylistic preferences. This means favoring predictable structures, &ldquo;boring&rdquo; technologies, and explicit boundaries that the AI can easily navigate.</p>
<p>Practically, this looks like:</p>
<ul>
<li>Consistent file naming conventions</li>
<li>Clear module boundaries with documented interfaces</li>
<li>Architectural decisions recorded in markdown, not in someone&rsquo;s head</li>
<li>Error messages that include remediation instructions (so the model can fix what it breaks)</li>
</ul>
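<p>The last bullet is the easiest to show in code. Here&rsquo;s a minimal sketch, with a hypothetical <code>RemediableError</code> class and a made-up <code>PORT</code> setting, where every failure says what went wrong and what to change:</p>

```python
class RemediableError(Exception):
    """An error whose message carries the fix alongside the problem."""

    def __init__(self, problem: str, remediation: str) -> None:
        super().__init__(f"{problem}\nHow to fix: {remediation}")
        self.problem = problem
        self.remediation = remediation


def load_port(config: dict[str, str]) -> int:
    """Read PORT from a config mapping, failing with actionable messages."""
    raw = config.get("PORT")
    if raw is None:
        raise RemediableError(
            "PORT is not set.",
            "Add a PORT entry, for example PORT=8080, to the config.",
        )
    if not raw.isdigit():
        raise RemediableError(
            f"PORT must be an integer, got {raw!r}.",
            "Replace the value with digits only, for example PORT=8080.",
        )
    return int(raw)
```

<p>An agent that hits this error has its next action spelled out in the traceback instead of having to guess.</p>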
<h3 id="6-react-loops-as-the-execution-model">6. ReAct Loops as the Execution Model</h3>
<p>Harnesses use a Reasoning and Acting (ReAct) loop: observe state, reason about the next step, take an action via a tool, observe the result. This is the fundamental execution pattern behind tools like Claude Code, Cursor, and every serious coding agent.</p>
<p>The loop is non-deterministic. Unlike traditional orchestration with rigid DAGs, agent loops evolve based on the model&rsquo;s reasoning. This is a fundamental shift from traditional software engineering.</p>
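<p>Stripped to its skeleton, the loop looks something like this. It&rsquo;s a sketch under heavy assumptions: <code>decide</code> stands in for the model call, <code>Step</code> is a hypothetical structure for its reply, and tools are plain functions from string to string.</p>

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    thought: str
    tool: Optional[str] = None  # None means the model is finished
    argument: str = ""
    answer: str = ""


def react_loop(
    decide: Callable[[str, list[str]], Step],  # (goal, observations) -> next step
    tools: dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 5,
) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        step = decide(goal, observations)          # reason about the next step
        if step.tool is None:
            return step.answer
        tool = tools.get(step.tool)
        if tool is None:
            # Feed the mistake back as an observation instead of crashing.
            observations.append(f"error: unknown tool {step.tool!r}")
            continue
        observations.append(tool(step.argument))   # act, then observe
    return "stopped: step limit reached"
```

<p>The path through the loop isn&rsquo;t written down anywhere. It depends entirely on what <code>decide</code> returns at each turn, which is why the step limit matters.</p>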
<h2 id="the-frameworks-rpi-and-qrspi">The Frameworks: RPI and QRSPI</h2>
<p>Two methodologies have emerged for structuring how agents work within a harness.</p>
<h3 id="rpi-research-plan-implement">RPI (Research, Plan, Implement)</h3>
<p>RPI keeps context windows small and focused by splitting work into three phases:</p>
<ol>
<li><strong>Research:</strong> Open a fresh context window. Scan the codebase objectively to understand the system. No preconceptions.</li>
<li><strong>Plan:</strong> Outline exact steps with file names, line snippets, and testing procedures. Build a vertical plan (mock API, then UI, then real database) instead of a horizontal one.</li>
<li><strong>Implement:</strong> Execute the plan in a <strong>clean context window</strong> to avoid &ldquo;context anxiety,&rdquo; the degradation that happens when a model&rsquo;s context fills past 40-60% capacity.</li>
</ol>
<p>The key insight is the separation between phases. Each runs in fresh context so the model stays in what practitioners call the &ldquo;Smart Zone,&rdquo; the performance sweet spot below 40% context usage.</p>
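<p>One way to picture the handoffs is the sketch below. Each phase is a function that receives only the previous phase&rsquo;s artifact, never the earlier transcript. The names, the 200,000-token window, and the four-characters-per-token estimate are all assumptions for illustration. The 40% budget is the figure from above.</p>

```python
from typing import Callable

Phase = Callable[[str], str]  # takes the handoff text, returns an artifact


def context_usage(text: str, window_tokens: int, chars_per_token: int = 4) -> float:
    """Rough fraction of the window a text would fill (character-count heuristic)."""
    return (len(text) / chars_per_token) / window_tokens


def run_rpi(
    task: str,
    research: Phase,
    plan: Phase,
    implement: Phase,
    window_tokens: int = 200_000,  # assumed window size
    budget: float = 0.40,
) -> dict[str, str]:
    artifacts: dict[str, str] = {}
    handoff = task
    for name, phase in (("research", research), ("plan", plan), ("implement", implement)):
        if context_usage(handoff, window_tokens) > budget:
            raise ValueError(f"{name} handoff exceeds {budget:.0%} of the window; compact it first")
        # Fresh context: the phase sees only the previous artifact.
        handoff = phase(handoff)
        artifacts[name] = handoff
    return artifacts
```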
<h3 id="qrspi-crispy">QRSPI (&ldquo;Crispy&rdquo;)</h3>
<p>QRSPI is an evolution of RPI that adds more structure for complex features:</p>
<p><strong>Q</strong>uestions, <strong>R</strong>esearch, <strong>D</strong>esign, <strong>S</strong>tructure, <strong>P</strong>lan, <strong>W</strong>orktree, <strong>I</strong>mplement, <strong>PR</strong></p>
<p>The critical addition is the Design and Structure phases. Before the agent writes thousands of lines of code, you align on a ~200-line markdown artifact. This is the human checkpoint. You review the design, not the implementation.</p>
<p>QRSPI also enforces an instruction budget: each phase stays under <strong>40 instructions</strong>. This comes from the finding that models reliably follow about 150-200 instructions total. Monolithic prompts with 85+ instructions lead to skipped steps and inconsistent output.</p>
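<p>You can check a budget like this mechanically. The sketch below makes a crude assumption that one bullet or numbered list item equals one instruction, which will miscount prose-heavy prompts. The function names are hypothetical.</p>

```python
import re

# Assumption: an "instruction" is any bullet ("-" or "*") or numbered list item.
INSTRUCTION_LINE = re.compile(r"^\s*(?:[-*]|\d+\.)\s+\S")


def count_instructions(prompt: str) -> int:
    """Count list items in a prompt as a rough proxy for instructions."""
    return sum(1 for line in prompt.splitlines() if INSTRUCTION_LINE.match(line))


def over_budget(phases: dict[str, str], per_phase_limit: int = 40) -> dict[str, int]:
    """Return the phases whose count is not strictly under the limit."""
    counts = {name: count_instructions(text) for name, text in phases.items()}
    return {name: count for name, count in counts.items() if count >= per_phase_limit}
```

<p>Run it over your phase prompts in CI and an overgrown prompt becomes a failing check instead of a vague feeling.</p>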
<h2 id="where-the-industry-disagrees">Where the Industry Disagrees</h2>
<p>The most interesting part of this research was finding where leading teams disagree. These aren&rsquo;t settled questions.</p>
<table>
  <thead>
      <tr>
          <th>Topic</th>
          <th>Position A</th>
          <th>Position B</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Should you read the code?</strong></td>
          <td>OpenAI: Steer, don&rsquo;t code. Corrections are cheap, human review is expensive.</td>
          <td>Dex Horthy: Tried not reading code for 6 months. &ldquo;Did not end well.&rdquo;</td>
      </tr>
      <tr>
          <td><strong>Context resets vs compaction</strong></td>
          <td>Anthropic: Full resets with clean handoffs produce better results.</td>
          <td>Others: Intentional compaction to markdown preserves valuable context.</td>
      </tr>
      <tr>
          <td><strong>Instruction budgets</strong></td>
          <td>Horthy: Keep prompts small (~150-200 instructions max).</td>
          <td>OpenAI/Anthropic: Use mechanical enforcement (linters, evaluators) instead.</td>
      </tr>
      <tr>
          <td><strong>Merge gates</strong></td>
          <td>OpenAI: Minimal blocking, agent-to-agent reviews.</td>
          <td>Horthy: Humans must own and review the code.</td>
      </tr>
  </tbody>
</table>
<p>My take: you need to read the code, especially at this stage. The models are good, not perfect. Skipping review is a shortcut that compounds into tech debt you won&rsquo;t understand because you didn&rsquo;t write it.</p>
<h2 id="what-this-means-for-your-career-in-2026">What This Means for Your Career in 2026</h2>
<p>The role of the software engineer is moving one level up. Your value isn&rsquo;t in writing implementation code. It&rsquo;s in:</p>
<ul>
<li><strong>Designing feedback loops</strong> that catch errors before they ship</li>
<li><strong>Building context systems</strong> that make agents more effective over time</li>
<li><strong>Specifying intent</strong> clearly enough that agents can execute reliably</li>
<li><strong>Reviewing and owning</strong> the output, because your name is still on it</li>
</ul>
<p>OpenAI reported a small team shipping <strong>1 million lines of code with 0 manually written lines</strong> at 3.5 PRs per engineer per day. That&rsquo;s not a future prediction. That&rsquo;s happening now.</p>
<p>The developers who learn harness engineering will ride that wave. The ones who keep pasting prompts into chat windows will wonder why their output stays flat.</p>
<h2 id="where-this-goes-next-harness-as-code-hac">Where This Goes Next: Harness as Code (HaC)</h2>
<p>Here&rsquo;s where I think this goes. If the harness is the new codebase, then the next step is codifying harnesses themselves. Developers won&rsquo;t just write application code; they&rsquo;ll write <strong>harness templates</strong> that spin up agent-ready environments: context maps, skills, hooks, evaluators, and guardrails all defined as code.</p>
<p>Call it <strong>HaC (Harness as Code)</strong>. The same way Terraform let teams scale infrastructure, HaC will let teams scale <em>agent environments</em>. That&rsquo;s where I think the real leverage shows up.</p>
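<p>Since HaC is still just an idea, treat this as a speculative sketch and not a description of any existing tool. A hypothetical <code>HarnessTemplate</code> declares the context map and docs, then materializes them into a directory, roughly the way an infrastructure template materializes resources.</p>

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class HarnessTemplate:
    project: str
    conventions: list[str] = field(default_factory=list)
    docs: dict[str, str] = field(default_factory=dict)  # relative path -> summary

    def render_map(self) -> str:
        """Build the CLAUDE.md text: conventions plus pointers to deeper docs."""
        lines = [f"# {self.project}", "", "## Conventions"]
        lines += [f"- {rule}" for rule in self.conventions]
        lines += ["", "## Where to look"]
        lines += [f"- `{path}`: {summary}" for path, summary in sorted(self.docs.items())]
        return "\n".join(lines) + "\n"

    def apply(self, root: Path) -> list[Path]:
        """Write the map and stub docs under root; return the files written."""
        root.mkdir(parents=True, exist_ok=True)
        map_file = root / "CLAUDE.md"
        map_file.write_text(self.render_map(), encoding="utf-8")
        written = [map_file]
        for rel_path, summary in sorted(self.docs.items()):
            target = root / rel_path
            if target.exists():
                continue  # never overwrite documentation a human already wrote
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(f"# {target.stem}\n\n{summary}\n", encoding="utf-8")
            written.append(target)
        return written
```

<p>Applying the same template twice regenerates the map but leaves existing docs alone, the kind of idempotence you&rsquo;d want from anything that calls itself &ldquo;as code.&rdquo;</p>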
<h2 id="how-to-start-today">How to Start Today</h2>
<p>You don&rsquo;t need to overhaul your workflow overnight. Start with these five steps:</p>
<ol>
<li><strong>Create a <code>CLAUDE.md</code> or <code>AGENTS.md</code></strong> in your project root. Define your agent&rsquo;s role, coding standards, and project context. This is the first layer of the harness.</li>
<li><strong>Structure your documentation</strong> in markdown files within the repo: architectural decisions, conventions, and design patterns. If the agent can&rsquo;t find it, it doesn&rsquo;t exist.</li>
<li><strong>Set up an agentic runtime.</strong> Claude Code, Cursor, or similar. The specific tool matters less than having one.</li>
<li><strong>Apply RPI to your next feature.</strong> Research in one context, plan in another, implement in a third. Notice the quality difference.</li>
<li><strong>Automate one repetitive workflow.</strong> Turn a recurring process into a slash-command skill. This is where the compound interest starts.</li>
</ol>
<p>The harness is the new codebase. Start building yours.</p>
<h2 id="faq">FAQ</h2>
<h3 id="whats-the-difference-between-harness-engineering-and-prompt-engineering">What&rsquo;s the difference between harness engineering and prompt engineering?</h3>
<p>Prompt engineering focuses on crafting individual messages to get better responses. Harness engineering is the broader discipline of building the entire infrastructure around the model: tools, memory, verification loops, context management, and orchestration. Prompts are one small piece of the harness.</p>
<h3 id="do-i-need-to-know-aiml-to-do-harness-engineering">Do I need to know AI/ML to do harness engineering?</h3>
<p>No. Harness engineering is software engineering applied to AI systems. You need to understand context management, tool orchestration, and system design. The model handles the ML part. You handle everything else.</p>
<h3 id="which-framework-should-i-start-with-rpi-or-qrspi">Which framework should I start with, RPI or QRSPI?</h3>
<p>Start with RPI. It&rsquo;s simpler and teaches the core principle of separating research, planning, and implementation into distinct phases. Move to QRSPI when you&rsquo;re building features complex enough to need the design/structure alignment step.</p>
<h3 id="what-is-harness-as-code-hac">What is Harness as Code (HaC)?</h3>
<p>HaC is the idea that harnesses themselves will be codified and shared, the same way infrastructure became Infrastructure as Code. Instead of hand-rolling a harness per project, developers will write harness templates that spin up agent-ready environments: context maps, skills, hooks, evaluators, and guardrails defined as code. It&rsquo;s an early concept, but I think it&rsquo;s where the real scale comes from.</p>
<h3 id="is-harness-engineering-only-for-coding-agents">Is harness engineering only for coding agents?</h3>
<p>No. The principles apply to any AI system: writing assistants, research tools, customer support agents, automation pipelines. Anywhere you have Agent = Model + Harness, the discipline applies.</p>
]]>
            </content:encoded>
        </item>
    </channel>
</rss>
