Agentic AI Architecture: From Monolithic Scripts to Resilient Supervisors

The epilogue of Funderburk’s book is where everything clicks together. All the individual skills from earlier chapters (pipelines, RAG, tool contracts, Haystack components, LangGraph orchestration) get assembled into a single architectural argument. And that argument is surprisingly clear: separate the doing from the thinking.

The Agentic Inflection Point

Here’s the thing. Between 2023 and 2025, the entire AI industry was obsessed with capability. GPT-4 pushed reasoning boundaries. Gemini went after massive context windows. Llama 3 and DeepSeek-R1 brought frontier-level intelligence to local hardware. Everybody wanted bigger, smarter, faster models.

But here’s the problem. As organizations moved from “cool demo” to “we’re deploying this in production,” the focus shifted from “can it do it?” to “can we control it?” The gap between what a model can do and what we can reliably govern is what Funderburk calls the agentic reliability crisis.

And her point is that prompt engineering alone does not fix this. You can write the most perfect prompt in the world, and if your architecture is a mess, the system will still hallucinate when things go sideways. The fix is what she calls context engineering: managing what information the model sees and how it’s structured, not just what you ask it to do.

Context engineering has four strategies: write, select, compress, and isolate. We’ll see all four play out across three versions of the Yelp Navigator project.
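To give one of the four strategies a concrete shape, here's a minimal sketch of "compress" (my illustration, not the book's code): trim a message history to a rough character budget, keeping the system prompt and the most recent turns. A real system would count actual tokens and summarize what it drops.

```python
def compress_history(messages, max_chars=4000):
    """Keep the system message plus as many recent turns as fit the budget.

    A crude stand-in for the 'compress' strategy: real implementations
    would count tokens and summarize dropped turns rather than discard them.
    """
    system, rest = messages[:1], messages[1:]
    kept = []
    used = 0
    # Walk backwards so the newest turns survive the cut
    for msg in reversed(rest):
        if used + len(msg["content"]) > max_chars:
            break
        kept.append(msg)
        used += len(msg["content"])
    return system + list(reversed(kept))
```

The point is that compression is a deliberate, testable function, not something you hope the model does on its own.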

The Tool vs Orchestration Thesis

The central architectural thesis of this book is clean and simple: decouple doing from thinking.

The “doing” layer is your tool layer. In this case, Haystack pipelines deployed as microservices via Hayhooks. These are deterministic. They either succeed or fail. They don’t hallucinate a retrieval strategy. They’re robust directed graphs that handle data retrieval, preprocessing, embedding, and reranking.

The “thinking” layer is your orchestration layer. That’s LangGraph. It manages stateful reasoning, routing, loops, and decision-making. It’s where the LLM lives and reasons about what to do next.

Here’s the key insight. The agent doesn’t execute RAG pipeline code directly. It makes HTTP POST requests to local endpoints:

import requests
from typing import Any, Dict
from langchain_core.tools import tool

@tool
def search_businesses(query: str) -> Dict[str, Any]:
    """Search for businesses using natural language query."""
    response = requests.post(
        f"{BASE_URL}/business_search/run",
        json={"query": query}, timeout=30
    )
    response.raise_for_status()
    return response.json()

This separation has real consequences. The RAG pipeline can scale independently. You can update it without redeploying the agent. If the pipeline fails, you take it down, fix it, redeploy it. The agent treats the entire RAG process as a black box. It only needs to know the function signature and when to call it.

Haystack handles rigid, high-throughput data processing. LangGraph handles fluid, stateful reasoning. Each does what it’s good at.

Version 1: The Sequential Chain (The Shallow Agent)

The book walks through three versions of the Yelp Navigator to show how agentic architectures mature. Each version shares the same tools and prompts. What changes is the graph structure, the state management, and the control flow.

V1 is the shallow agent. It’s basically a script: prompt the LLM, parse the output, call a tool, return the result. It assumes the happy path. The user asks a clear question, the API is up, the model parses everything correctly, and the answer appears on the first attempt.

The problems are severe. If the search step returns junk because of a vague query, the sentiment step runs anyway and tries to analyze irrelevant businesses. The summarization step then hallucinates a coherent answer from disjointed data. Everything gets dumped into a single growing context window. As the conversation goes on, context rot sets in and the model starts ignoring its own system instructions.
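To make the "script" nature of V1 concrete, here's a minimal happy-path sketch (my illustration with stubbed tools, not the book's code). Notice there's no branching, no validation between steps, and every intermediate result piles into one growing context:

```python
def shallow_agent(user_query, llm, search_tool, sentiment_tool):
    """V1: prompt -> tool -> tool -> summarize, assuming every step works."""
    context = [f"User: {user_query}"]

    # Step 1: search, with no check that the results are relevant
    results = search_tool(user_query)
    context.append(f"Search results: {results}")

    # Step 2: sentiment runs unconditionally, even on junk results
    sentiment = sentiment_tool(results)
    context.append(f"Sentiment: {sentiment}")

    # Step 3: summarize the entire accumulated context in one shot
    return llm("\n".join(context) + "\nSummarize an answer for the user.")
```

Every failure mode the book lists falls out of this shape: a bad step 1 poisons steps 2 and 3, and the context only ever grows.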

Version 2: The Router Pattern

V2 introduces a supervisor node. Instead of each worker node deciding what to call next, a central supervisor evaluates the state and delegates work. This is a huge shift.

The first line of defense is a clarification node. It acts as a gatekeeper. If the user is just chatting, the query routes to a general chat node. If the user wants data, it goes to the supervisor. The supervisor then routes to specific worker nodes (search, details, sentiment) based on the state.

The state management also evolves. V1 dumped everything into a growing list of messages. V2 introduces structured fields with clean data for each tool, separating search context from raw conversation history. The supervisor works with Boolean flags (“did search return data?”) instead of re-reading entire JSON payloads.
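A plausible shape for that structured state looks like the sketch below (field names are my guesses, not the book's). The payoff is in the routing function: the supervisor reads cheap flags, never raw payloads.

```python
from typing import Any, Dict, List, Optional, TypedDict

class NavigatorState(TypedDict, total=False):
    """V2-style structured state: each tool writes its own clean field."""
    messages: List[Dict[str, str]]                   # conversation history only
    search_results: Optional[List[Dict[str, Any]]]   # clean search output
    sentiment: Optional[Dict[str, Any]]              # clean sentiment output
    has_search_data: bool                            # flag the supervisor routes on

def supervisor_route(state: NavigatorState) -> str:
    # Routing reads Boolean flags, not entire JSON payloads
    if not state.get("has_search_data"):
        return "search"
    if state.get("sentiment") is None:
        return "sentiment"
    return "summarize"
```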

V2 uses LangGraph’s Command class for routing:

from langgraph.graph import END
from langgraph.types import Command

def clarify_intent_node(state, config):
    # `decision` is the structured output of an LLM classification call
    # (helper name illustrative)
    decision = classify_intent(state["messages"])
    # If the user needs to provide more info, ask and end this turn
    if decision.need_clarification:
        return Command(goto=END, update={"messages": [...]})
    # If it's just general chat, skip the tool machinery
    if decision.intent == "general_chat":
        return Command(goto="general_chat")
    # Otherwise, route to supervisor for tool use
    return Command(goto="supervisor", update={...})

But V2 still assumes a benign user and a perfect world. No guardrails against prompt injection. No PII filtering. No retry logic when microservices fail. If a tool goes down, V2 either hallucinates a reason or crashes.

Version 3: The Resilient Supervisor

V3 wraps the V2 logic in layers of protection. Three additions matter most.

Input guardrails. Before anything touches the LLM, a deterministic node scans for PII and injection attacks using RegEx. No tokens spent. No LLM involved. Fast and rigid.
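A deterministic guardrail of this sort might look like the following sketch (the patterns are mine, for illustration; a production filter would use a vetted PII and injection-detection library, not a handful of regexes):

```python
import re

# Illustrative patterns only, not a production blocklist
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN shape
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
]

def guardrail_check(text: str) -> dict:
    """Deterministic pre-LLM scan: no tokens spent, fast and rigid."""
    flags = []
    if any(p.search(text) for p in PII_PATTERNS):
        flags.append("pii")
    if any(p.search(text) for p in INJECTION_PATTERNS):
        flags.append("injection")
    return {"blocked": bool(flags), "flags": flags}
```

Because this node is pure RegEx, it runs in microseconds and its behavior is fully predictable, which is exactly what you want at the system boundary.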

Retry policies at the infrastructure level. Instead of telling the LLM “try again if you fail” in a prompt, V3 uses LangGraph’s RetryPolicy on tool nodes:

retry_policy = RetryPolicy(
    max_attempts=3,
    initial_interval=1.0,
    backoff_factor=2.0,
    max_interval=10.0
)
workflow.add_node("search_tool", search_tool_node, retry_policy=retry_policy)

If a tool fails three times, the supervisor gets a structured failure signal. It can then gracefully shut down instead of spiraling into an error loop.
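The backoff arithmetic is worth spelling out. With the parameters above, the wait between attempts grows geometrically and is clipped at max_interval. A stdlib sketch of the same schedule (my approximation of the policy's behavior, not LangGraph internals):

```python
def backoff_schedule(max_attempts, initial_interval, backoff_factor, max_interval):
    """Wait (in seconds) before each retry: initial * factor**k, clipped."""
    return [
        min(initial_interval * backoff_factor**k, max_interval)
        for k in range(max_attempts - 1)  # no wait after the final attempt
    ]
```

So max_attempts=3 with the values above means two waits: 1 second, then 2 seconds, then a structured failure if the third attempt also dies.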

Memory via checkpointing. Using LangGraph’s MemorySaver, the agent remembers previous searches. If a user asks about a specific business after a search, the agent uses its existing data instead of triggering an entirely new search.
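Conceptually, a checkpointer is a per-conversation key-value store. A toy stand-in (not MemorySaver's actual implementation) makes the "reuse existing data" behavior easy to see:

```python
class ToyCheckpointer:
    """Toy per-thread state store, illustrating what checkpointing buys:
    a later turn in the same thread can reuse earlier tool results."""
    def __init__(self):
        self._store = {}

    def load(self, thread_id):
        return self._store.get(thread_id, {})

    def save(self, thread_id, state):
        self._store[thread_id] = state

def handle_turn(checkpointer, thread_id, query, search_tool):
    state = checkpointer.load(thread_id)
    # Only hit the search microservice if this thread has no cached results
    if "search_results" not in state:
        state["search_results"] = search_tool(query)
    checkpointer.save(thread_id, state)
    return state["search_results"]
```

The second turn in the same thread never touches the search tool, which is exactly the token and latency savings checkpointing is meant to buy.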

The evolution from V1 to V3 mirrors the shift from scripting to engineering. V3 is a software system that happens to use LLMs, not an LLM wrapper.

In the next post, we’ll see the hard numbers: how much V2 saves on tokens compared to V1, what happens when you deliberately break the microservices, and which architecture falls apart versus which one exits gracefully.


This is post 22 of 24 in the Building Natural Language and LLM Pipelines series.

Book series page

Previous: Future Trends in NLP and LLM Pipelines (Part 2)

Next: Token Economics, System Integrity, and the Sovereign Stack
