
Engineering the Infinite: Design Principles for Agentic Workflows

Context is a finite resource. By breaking down the engineering process into smaller, testable components, we can push the limits of agentic engineering and solve problems that seem impossible.

There's a specific kind of failure you run into when you start taking AI agents seriously for complex engineering work.

The task is well-defined. The model is capable. The tools are connected. And it still falls apart somewhere around step 7 of 12 — the agent loses the thread, makes an assumption that contradicts something from three steps ago, and you're left debugging not just the code, but the reasoning. By the time you've reconstructed what happened, you've spent more time troubleshooting than the original task would have taken.

The problem usually isn't capability. It's context architecture.

Context is a constraint, not a feature

We talk about context windows like they're just a number — 200K tokens, sure, great. But in practice, context behaves more like working memory. You have a finite amount. Everything you put in costs something. And as it fills up, earlier information doesn't disappear — it gets deprioritized.

That's the subtle failure mode: the model doesn't forget a decision from step 2, but by step 9, it's weighing that decision against a pile of newer information and making quiet assumptions you never explicitly authorized. You end up with a system that's technically coherent but wrong in ways that are hard to trace.

Agentic engineering isn't about finding a model that can do everything in one shot. It's about designing workflows where that's never required.

The real cost of chaos

A one-shot prompt that fails is expensive in a non-obvious way. You don't just lose the output — you lose the diagnostic signal.

When a 50-step agentic run goes sideways, figuring out where it went sideways means reading every tool call in sequence, reconstructing the state at each step, and working backwards from the wreckage. It's archaeology in a collapsed mine.

Break that same work into ten independently verifiable phases, and the failure mode changes completely. When step 3 produces garbage, you know immediately. You fix step 3. The rest of the work — validated, intact — is still there.

This is just good engineering. We decompose functions. We write tests. We define interfaces so that components can be reasoned about in isolation. Agentic workflows deserve exactly the same discipline.

Three principles I keep returning to

1. Phased delivery

Never aim for a single giant leap. Move through defined phases where each output can be checked before the next phase begins.

The question to ask at each boundary: "If this step produces garbage, will I know immediately?"

If the answer is no — if bad output could silently flow into steps 4, 5, and 7 before anything breaks visibly — the phase boundary is in the wrong place. Move it earlier.
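The principle can be sketched in a few lines. This is a minimal illustration, not a prescription: the phase names, step functions, and validators below are all hypothetical — the point is only that every phase output passes a check before the next phase sees it, so garbage fails loudly at the boundary.

```python
def run_phases(phases, initial):
    """Run (name, step, validate) triples in order.

    Each phase's output is validated before the next phase begins,
    so a bad output surfaces immediately at its own boundary instead
    of silently flowing downstream.
    """
    state = initial
    for name, step, validate in phases:
        state = step(state)
        if not validate(state):
            raise ValueError(f"phase {name!r} produced invalid output: {state!r}")
    return state

# Toy pipeline: parse -> transform -> render (illustrative only).
phases = [
    ("parse", lambda s: s.split(","), lambda xs: all(x for x in xs)),
    ("transform", lambda xs: [int(x) for x in xs],
     lambda xs: all(isinstance(x, int) for x in xs)),
    ("render", lambda xs: sum(xs), lambda n: isinstance(n, int)),
]

print(run_phases(phases, "1,2,3"))  # 6
```

The useful property: when the "transform" validator fires, you know the fault is in "transform" — the validated output of "parse" is still intact and reusable.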

2. Test in the actual environment

What works in isolation has to be proven in production-like conditions. This is obvious in theory and constantly violated in practice.

Agentic workflows make it especially easy to paper over the gap. A model can confidently execute an operation that looks correct in the abstract and fails in context — because the environment is part of the system, and you haven't tested the environment.

Sandboxes are for exploration. Real validation happens in real conditions.
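A concrete way this gap shows up: a check that's stubbed out in a sandbox proves nothing about the real environment. The example below is a deliberately simple sketch (the function names are mine, not from any framework) — the real check queries the actual filesystem, while the sandbox double always says yes.

```python
import shutil

def disk_has_space(path: str, needed_bytes: int) -> bool:
    """A check whose answer only means something in the real environment."""
    usage = shutil.disk_usage(path)  # queries the actual filesystem
    return usage.free >= needed_bytes

def stubbed_disk_has_space(path: str, needed_bytes: int) -> bool:
    """A sandbox double: passes in isolation, proves nothing in context."""
    return True
```

An agent that validates against the stub can confidently execute an operation that looks correct in the abstract and fails on the real machine — which is exactly the failure the section describes.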

3. Modular construction by task type

The way you structure a refactoring task should look different from how you structure a new feature implementation. Refactoring has a defined end state you can verify against. A new feature has ambiguity that needs to surface early, before you're eight steps into building the wrong thing.

A single CLAUDE.md that tries to describe everything usually describes nothing well. Match the instruction structure to the type of work.
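One way to act on this — a sketch, not an established Claude Code convention: instead of one monolithic instruction file, keep a focused instruction set per task type and select it up front. The file names and task categories below are hypothetical.

```python
# Hypothetical mapping from task type to a focused instruction file,
# instead of one CLAUDE.md that tries to describe everything.
INSTRUCTIONS = {
    "refactor": "instructions/refactor.md",  # has a verifiable end state
    "feature": "instructions/feature.md",    # surfaces ambiguity early
    "bugfix": "instructions/bugfix.md",      # reproduce before fixing
}

def instructions_for(task_type: str) -> str:
    """Pick the instruction file matching the type of work."""
    path = INSTRUCTIONS.get(task_type)
    if path is None:
        raise KeyError(f"no instruction template for task type {task_type!r}")
    return path
```

The design choice is the same one behind interfaces in ordinary code: each instruction set can be reasoned about, and improved, in isolation.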

What this actually enables

The interesting thing about engineering around context constraints: it doesn't just make agentic workflows more reliable. It makes them capable of tackling problems that look impossible at first glance.

A problem that's "too big" for a single context window is almost never actually too big. It's just undecomposed. Break it into phases, define the interfaces between them, validate at each boundary — and suddenly you're not fighting the constraint. You're working with it.

The real frontier in agentic engineering isn't model capability. It's workflow architecture. And that's a problem we already know how to think about.


A high-performance engine needs the right cooling, the right fuel, and a driver who understands the limits. Raw capability is only part of the equation.