Context Engineering, Prompt Strategies, and Framework Wars - Chapter 2 Part 2
In Part 1, we covered how transformers work and how models split into small language models (SLMs) and reasoning language models (RLMs). Now Funderburk shifts to a big question: how do you actually interact with these models in a reliable way?
The short answer is that prompt engineering is not enough anymore. Welcome to context engineering.
From Prompt Engineering to Context Engineering
In 2024, prompt engineering was the main skill. You wrote a careful prompt, tweaked the wording, and hoped for the best. Funderburk calls this “prompt crafting.” It works fine for simple, one-shot tasks.
But here’s the problem. Modern AI systems are not one-shot. They are agents running in loops, calling tools, checking results, and making decisions over multiple steps. When an agent generates data on every loop iteration, a single hallucination in step 3 can pollute the context for steps 4 through 20. The agent starts chasing nonsense and there is no way to debug it.
This is what the book calls context rot. Your context window fills up with noisy, outdated, or contradictory information, and the model’s performance degrades. Related problems include context distraction (irrelevant information pulling attention away), context confusion (ambiguous signals), and context clash (contradictory facts in the same window).
Context engineering is the 2025 answer. Funderburk quotes Anthropic’s definition: “the art and science of curating what will go into the limited context window from a constantly evolving universe of possible information.”
It is a superset of prompt engineering. You still write good prompts, but you also manage everything else that goes into the context window: retrieved documents, tool definitions, chat history, and outputs from previous agent steps.
The Four Core Strategies
By 2025, four strategies form the backbone of context engineering: Write, Select, Compress, and Isolate.
Write means storing information outside the immediate context window for later use. This is how agent memory works. Techniques include scratchpads (working memory for intermediate calculations) and long-term memory (user preferences and facts stored in a vector database).
Select means dynamically pulling only the most relevant information into the context window at the moment it is needed. This is “just in time” retrieval. RAG is the obvious example. But it also includes memory selection (pulling relevant items from your vector store) and dynamic tool selection (only giving the model the tools it needs right now, not all 50 of them).
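Here is a toy version of Select, assuming keyword overlap as a stand-in for embedding similarity. It ranks both documents and tool names against the query and admits only the top matches into the context:

```python
import re

# Illustrative sketch of the "Select" strategy. Real systems would score
# candidates with embeddings; keyword overlap stands in here.

def tokens(text: str) -> set[str]:
    return set(re.split(r"[^a-z0-9]+", text.lower())) - {""}

def select(query: str, candidates: list[str], k: int = 2) -> list[str]:
    scored = [(len(tokens(query) & tokens(c)), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s > 0]   # drop irrelevant candidates
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:k]]

docs = [
    "Refund policy for international orders",
    "Shipping times and carriers",
    "How to reset a password",
]
tools = ["search_refund_policy", "track_shipment", "reset_password", "create_invoice"]

query = "what is the refund policy"
top_docs = select(query, docs, k=1)    # only the most relevant doc enters the context
top_tools = select(query, tools, k=2)  # only the tools this task actually needs
```

The same scoring step filters the tool list, which is the "don't give the model all 50 tools" idea from the text.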
Compress means distilling large pieces of information into smaller, token-efficient representations. An agent calls a tool and gets back a 500-line JSON blob. After extracting the one useful fact, you throw away the raw JSON and keep only the extracted fact. Summarization is another compression technique.
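The JSON example above can be made concrete. The payload fields below are made up; the pattern is simply "parse, extract one fact, discard the rest":

```python
import json

# Sketch of the "Compress" strategy: a tool returns a large JSON payload,
# but only one field matters for the agent's next step.

raw_tool_output = json.dumps({
    "request_id": "abc-123",
    "status": "ok",
    "diagnostics": {"latency_ms": 142, "retries": 0},
    "result": {"ticker": "ACME", "price": 41.50, "currency": "USD"},
    # ...imagine hundreds more lines of metadata here...
})

def compress(raw: str) -> str:
    r = json.loads(raw)["result"]
    return f'{r["ticker"]} is trading at {r["price"]} {r["currency"]}'

fact = compress(raw_tool_output)
# Only `fact` goes back into the context; the raw blob is thrown away.
```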
Isolate is the advanced pattern. You split a complex task into independent compartments, each with its own clean context window. A supervisor agent delegates a research subtask to a subagent. That subagent runs in its own isolated context, does all the messy work, and returns only a clean summary to the supervisor. The supervisor’s context never gets polluted by the subagent’s intermediate work.
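The supervisor/subagent split can be sketched with two ordinary functions, each owning its own context list. Everything here is illustrative; the mechanism is just that only the return value crosses the boundary:

```python
# Sketch of the "Isolate" strategy: the subagent accumulates messy
# intermediate work in its own context and returns only a clean summary.

def research_subagent(task: str) -> str:
    context: list[str] = []  # fresh, isolated context window
    context.append(f"task: {task}")
    context.append("searched 12 sources, discarded 9 as irrelevant")
    context.append("cross-checked two conflicting figures")
    # Only the distilled conclusion leaves this function:
    return "Summary: three credible sources agree on the 2024 figure."

supervisor_context = ["user question: what was the 2024 figure?"]
supervisor_context.append(research_subagent("find the 2024 figure"))
# The supervisor holds two clean entries and none of the subagent's noise.
```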
Here’s a simplified picture of how these strategies work together:
```
User question
      |
      v
[SELECT]   --> pull relevant docs from the vector store
[SELECT]   --> pick the right tools for this task
      |
      v
Agent reasons and calls tools
      |
      v
[COMPRESS] --> trim raw tool output to essential facts
[WRITE]    --> save important findings to long-term memory
      |
      v
[ISOLATE]  --> spin up a subagent for deep research (clean context)
           --> subagent returns a summary only
      |
      v
Agent produces final answer
```
The Framework Split: LangGraph vs. Haystack
Funderburk then turns to frameworks. In 2023, LangChain, LlamaIndex, and Haystack were all general-purpose NLP frameworks competing for the same ground. By 2025, each has specialized.
LangGraph (by LangChain) is the agentic orchestration layer. It is a low-level, graph-based runtime for building agents. It supports loops (the think-act-observe cycle), conditional branching, and durable state with checkpointing. If you need an agent that can pause, resume, and make complex multi-step decisions, LangGraph is the tool.
LangChain 1.0 was rebuilt on top of LangGraph 1.0. Its new create_agent abstraction uses middleware hooks like before_model and wrap_tool_call that let you control what goes into the context at every step. This is context engineering implemented as code.
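The middleware idea can be shown in plain Python rather than LangChain's actual API. The hook names mirror the ones the book mentions (`before_model`, `wrap_tool_call`), but the classes and signatures below are invented for illustration:

```python
from typing import Callable

# Stripped-down sketch of the middleware pattern: hooks run before each
# model call and around each tool call, giving you a place to prune or
# rewrite the context. Not LangChain's real API.

class ContextMiddleware:
    def before_model(self, messages: list[str]) -> list[str]:
        # Keep only the last few messages to fight context rot.
        return messages[-5:]

    def wrap_tool_call(self, tool: Callable[[str], str], arg: str) -> str:
        raw = tool(arg)
        # Compress verbose tool output before it re-enters the context.
        return raw[:200]

def run_step(mw: ContextMiddleware, messages: list[str],
             tool: Callable[[str], str], arg: str) -> list[str]:
    pruned = mw.before_model(messages)
    observation = mw.wrap_tool_call(tool, arg)
    return pruned + [f"tool result: {observation}"]

history = [f"msg {i}" for i in range(10)]
new_history = run_step(ContextMiddleware(), history, lambda q: "x" * 1000, "query")
# new_history: five pruned messages plus one trimmed tool result
```

This is what "context engineering implemented as code" means in practice: every model call and tool call passes through a chokepoint you control.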
Haystack 2.0 (by deepset) is the data and tool layer. It is a “pipeline-first” framework built on explicit directed graphs with strict input/output data contracts. Its strengths are enterprise-grade RAG (hybrid retrieval combining dense and sparse search), built-in evaluation nodes for quality metrics, and Hayhooks for deploying any pipeline as a REST API or MCP Server.
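A toy pipeline in plain Python conveys the "pipeline-first" idea: each step declares an explicit input/output contract, and data flows through a directed graph of components. This echoes Haystack's component-and-connect style without using Haystack itself; every function here is a stand-in:

```python
# Toy RAG pipeline with explicit data contracts, illustrating the
# "pipeline-first" idea. Not the real Haystack API.

def embed(query: str) -> list[float]:
    return [float(len(w)) for w in query.split()]  # stand-in embedding

def retrieve(query_vec: list[float], store: list[str]) -> list[str]:
    return store[:2]  # stand-in retriever: pretend these are the top-2 hits

def build_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def run_rag(query: str, store: list[str]) -> str:
    vec = embed(query)
    docs = retrieve(vec, store)
    return build_prompt(query, docs)  # the prompt an LLM node would receive

store = ["Doc A: refund policy", "Doc B: shipping rules", "Doc C: pricing"]
prompt = run_rag("what is the refund policy?", store)
```

Because each stage has a typed contract, a pipeline like this is easy to test, evaluate, and swap piece by piece, which is exactly the property that makes it deployable as a service.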
The book makes a strong argument: these frameworks are not competitors in 2025. They are complementary.
| Feature | LangGraph 1.0 | Haystack 2.0 |
|---|---|---|
| Core focus | Agentic control-flow | Reliable dataflow |
| Architecture | Stateful graph with cycles | Explicit directed graph |
| Strengths | Complex workflows, state persistence | Measurable RAG, hybrid retrieval |
| Ideal use case | Building the agent’s brain | Building reliable tools the agent calls |
The Hybrid Architecture
Here’s the thesis of the book: the right approach is not LangGraph OR Haystack. It is LangGraph AND Haystack.
LangGraph handles the orchestration layer. It decides what to do, what tools to call, and manages agent state. Haystack handles the tool layer. It builds the actual RAG pipelines, deployed as microservices through Hayhooks, that the agent calls when it needs information.
A typical workflow looks like this:
- User asks a question. It goes to the LangGraph agent.
- The agent decides it needs external data and calls a tool.
- That tool hits a Haystack pipeline deployed as a REST API.
- The Haystack pipeline embeds the query, retrieves relevant documents, builds a prompt, and gets an LLM to generate a grounded answer.
- The result comes back as clean JSON.
- The agent uses it to write the final answer.
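The steps above can be sketched end to end with a stubbed REST call. In a real deployment, `haystack_rag_tool` would POST to a Hayhooks endpoint; here it returns canned JSON so the control flow is visible, and every name and URL is illustrative:

```python
import json

def haystack_rag_tool(question: str) -> str:
    # Stand-in for an HTTP call to a deployed Haystack pipeline, e.g.
    # requests.post("https://rag.example.com/run", json={"query": question})
    return json.dumps({"answer": "Refunds are issued within 14 days.",
                       "sources": ["policy.pdf"]})

def langgraph_agent(question: str) -> str:
    # Steps 1-2: the agent decides it needs external data and calls a tool.
    raw = haystack_rag_tool(question)
    # Steps 3-5 happen inside the (stubbed) Haystack service.
    payload = json.loads(raw)  # step 6: clean JSON comes back
    # Step 7: the agent folds the grounded answer into its reply.
    return f'{payload["answer"]} (source: {payload["sources"][0]})'

reply = langgraph_agent("What is the refund policy?")
```

The agent code never parses documents or builds retrieval prompts; it only speaks JSON to a service, which is the separation of concerns the hybrid architecture is after.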
The key difference from doing everything in one framework is that the RAG pipeline is not a simple Python object. It is a scalable service that can handle thousands of documents in text, tabular, audio, and image formats. And because it is a separate service, you get clean separation of concerns.
Trying to force one framework to do both jobs leads to friction. Haystack is not built for complex agentic loops. LangGraph is not built for battle-tested RAG components. Using each for what it does best is the mature engineering choice.
This sets up the rest of the book. Chapters 3 through 7 focus on building the Haystack tool layer. Chapter 8 brings it all together with LangGraph for the full hybrid architecture.
This is post 5 of 24 in the Building Natural Language and LLM Pipelines series.