
AI Agents Need Control Flow, Not More Prompts

The best prompts can't fix unreliable agents. Instead of adding more instructions, we need to programmatically manage agent behavior with control flow.

No matter how cleverly you craft a prompt, AI agents still fail at deterministic tasks. A recent Hacker News thread and a thoughtful blog post argue that the solution isn't better prompting: it's control flow.

Why Prompt Engineering Hits a Ceiling

In a post titled "Agents need control flow, not more prompts," Bsuh makes a compelling case: treating agents as pure prompt-to-action systems is fundamentally broken. Instead of piling on more instructions, skill lists, or few-shot examples, build agents with explicit control flow. Code decides when to call the LLM, when to validate, when to retry, and when to confidently execute without the model.

Current agent frameworks (LangChain, AutoGPT, etc.) often wrap LLMs in brittle loops: prompt → LLM → (maybe validate) → loop. This works for demos but fails in production because the LLM's output distribution is too wide for reliable automation. The alternative: let LLMs generate proposals, but always validate and coalesce results through deterministic code.
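
In code, the inversion is small but decisive: the LLM proposes, and deterministic code decides what counts as done. A minimal sketch, where llm_call and validate are hypothetical placeholders for your model client and your own checks:

def run_step(task: str, max_attempts: int = 3) -> dict:
    for _ in range(max_attempts):
        proposal = llm_call(f"Propose an action for: {task}")  # the LLM generates
        ok, result = validate(proposal)  # deterministic code judges
        if ok:
            return result  # only validated output leaves this function
    raise RuntimeError(f"no valid proposal after {max_attempts} attempts")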

Why the Hacker News Community Agrees

The thread resonated with builders who've hit the ceiling of prompt engineering. One commenter wrote:

"I wonder if a part of the problem isn't just the misapplication of LLMs in the first place. As has been mentioned elsewhere, perhaps the agent's prompt should be to write code to accomplish as much of the task in as repeatable/verifiable/deterministic a way as possible."

Another commenter expanded:

"When you hit the limit of prompting, you need to move from using LLMs at run time to accomplish a task to using LLMs to write software to accomplish the task."

The most upvoted comments all lean toward the same conclusion: trust the LLM for creative parts, but use code for logic.

How Control Flow Improves Reliability

I've been building with LLMs since GPT-3. The pattern is always the same: a promising agent demo falls apart in production because the LLM doesn't follow instructions reliably. The fundamental issue is that LLMs are probabilistic, not deterministic. No amount of prompt engineering can change that; it's an architectural constraint.

Control flow isn't about dismissing LLMs; it's about using them where they shine and compensating for their weaknesses with code. Consider an agent that processes customer support tickets. Prompting the LLM to "extract the user's name, email, and issue" is unreliable. Instead, prompt it to output structured data (JSON) and validate that data with a schema. Better yet, have the LLM generate code to parse the fields, then run that code in a sandbox.

This mirrors how robust systems are built: break the task into modules, each with explicit inputs and outputs. An LLM call is just one module, and it should be guarded, instrumented, and replaceable.
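
In Python, that contract might look like a small protocol; the names here are hypothetical, not from the post:

from typing import Protocol

class PipelineStep(Protocol):
    def run(self, payload: dict) -> dict: ...

class LLMExtractor:
    """Guarded LLM module: proposes, validates, and only then returns."""
    def run(self, payload: dict) -> dict:
        ...

class RegexExtractor:
    """Deterministic drop-in with the same interface, for when the LLM isn't needed."""
    def run(self, payload: dict) -> dict:
        ...

Because both implementations satisfy PipelineStep, the pipeline can swap the LLM module for deterministic code without changing anything downstream.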

Practical Patterns for Production Agents

If you're building agents, here's how to apply control flow:

  1. Separate planning from execution. Let the LLM produce a plan as structured output (e.g., a list of steps). Execute each step with deterministic code. Validate the plan's structure before execution (see the first sketch after this list).

  2. Validate LLM output immediately. Instead of hoping the LLM returns correct JSON, parse it and validate it against a schema. If it fails, retry with a corrective prompt or fall back to a deterministic path (the validation example below shows this).

  3. Use code generation where possible. For tasks like data extraction or file manipulation, have the LLM generate a script, run it in a sandboxed environment, and inspect the result (a sketch follows the validation example).

  4. Build in explicit retry and error handling. Don't just loop until the LLM gets it right. Write code that catches common errors, adjusts the context, and re-prompts with additional guidance.
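
For pattern 1, here's a minimal sketch of the plan/execute split. The PlanStep and Plan models, the handler registry, and the two stub handlers are illustrative assumptions, not code from the original post:

from pydantic import BaseModel

class PlanStep(BaseModel):
    action: str    # must name a registered handler
    argument: str

class Plan(BaseModel):
    steps: list[PlanStep]

def fetch_ticket(ticket_id: str) -> None:
    ...  # deterministic code, e.g., a database lookup

def send_reply(text: str) -> None:
    ...  # deterministic code, e.g., an email API call

HANDLERS = {"fetch_ticket": fetch_ticket, "send_reply": send_reply}

def execute(plan_json: str) -> None:
    plan = Plan.model_validate_json(plan_json)  # reject malformed plans up front
    for step in plan.steps:
        handler = HANDLERS.get(step.action)
        if handler is None:
            raise ValueError(f"unknown action: {step.action}")  # no silent improvisation
        handler(step.argument)  # each step runs as plain, deterministic code

The LLM never executes anything here; it only emits a Plan, and any action outside the registry is rejected.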

For patterns 2 and 4, here's a minimal validation-and-retry example in Python:

import json
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    name: str
    email: str
    issue: str

# llm_call stands in for your LLM client wrapper; in practice the prompt
# would also include the raw ticket text to extract from.
response = llm_call("""
Extract the user's info as JSON with the keys:
name, email, issue
Return valid JSON only.
""")

try:
    data = json.loads(response)
    ticket = Ticket.model_validate(data)
except (json.JSONDecodeError, ValidationError) as err:
    # One corrective retry: show the model its invalid output and the error.
    # If this attempt also fails, the exception propagates to the caller.
    fixed = llm_call(
        f"The previous JSON was invalid ({err}):\n{response}\n"
        "Output correct JSON with the keys name, email, issue."
    )
    data = json.loads(fixed)
    ticket = Ticket.model_validate(data)

This pattern costs an extra LLM call only when validation fails, and it dramatically improves reliability. You can extend it with more retries, fallback strategies, and logging.
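
For pattern 3, here's a sketch of running LLM-generated code outside the agent process. This is an outline, not a real sandbox: a subprocess with a timeout bounds the obvious failure modes, but production use would need OS-level isolation (containers, resource limits) on top. llm_call is the same stand-in as above, and report.txt is a made-up input:

import subprocess
import sys
import tempfile

def run_generated_script(code: str, timeout: float = 10.0) -> str:
    # Write the generated script to a temp file and run it in a fresh
    # interpreter process, so a crash or hang can't take down the agent.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(f"generated script failed: {result.stderr}")
    return result.stdout

script = llm_call("Write a Python script that prints the word count of report.txt.")
print(run_generated_script(script))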

When to Use Control Flow

If you're building agents for production, especially customer-facing or financially critical systems, prompt-only approaches will fail at scale. If you're prototyping or working in creative domains (story generation, brainstorming), existing tools may suffice. But for any agent that needs to perform concrete actions reliably, add control flow now. Don't wait for models to improve.

The LLMs of today are powerful but unreliable. Treat them as a component in a larger system, not the system itself. That means writing code to manage their behavior, not just prompts to describe their job.


Links: Original blog post, HN discussion, Pydantic validation, LangChain, OpenAI structured outputs.