Hands-On NER Pipelines and Text Classification With Haystack: From Monolithic to Tool-Based Architecture
Chapter 8 is where Funderburk says: enough with single pipelines. Time to build tools. And then make an agent pick which tool to use.
The big idea is a shift from monolithic pipelines (one pipeline does everything) to discrete, high-performance tools that an agent can call. This is how production systems actually work. You don’t build one giant pipeline. You build small specialized ones and let a brain decide which to call.
The Agent Recap: Pipelines as Tools
Funderburk opens with a quick recap of how Haystack agents work. The hierarchy goes like this:
Components -> Pipelines -> SuperComponents -> Agents
You take a pipeline (say, a hybrid RAG pipeline), wrap it in a SuperComponent, then wrap that in a ComponentTool. Now the agent can use it.
tool_name = "internal_document_search"
tool_description = "Use this tool to search internal knowledge..."

internal_search_tool = ComponentTool(
    name=tool_name,
    component=hybrid_rag_sc,
    description=tool_description,
)
You give the agent an LLM, a system prompt, and a list of tools. The agent gets a question, picks the right tool, calls it, gets back data, and answers. Simple flow: query -> decide tool -> call tool -> get answer.
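That query -> decide tool -> call tool -> get answer loop can be sketched in plain Python without any Haystack install. The tool functions and the keyword-based "decision" below are illustrative stand-ins for what the LLM does in a real agent:

```python
# Plain-Python sketch of the agent flow: query -> decide tool -> call tool -> answer.
# Tool names and the keyword check are hypothetical stand-ins for an LLM's choice.

def internal_document_search(query: str) -> str:
    """Hypothetical tool: search internal knowledge."""
    return f"[internal docs matching '{query}']"

def web_search(query: str) -> str:
    """Hypothetical tool: search the public web."""
    return f"[web results for '{query}']"

TOOLS = {
    "internal_document_search": internal_document_search,
    "web_search": web_search,
}

def decide_tool(query: str) -> str:
    # In a real agent, an LLM reads the tool descriptions and picks one.
    # A keyword check stands in for that decision here.
    return "internal_document_search" if "policy" in query.lower() else "web_search"

def run_agent(query: str) -> str:
    tool_name = decide_tool(query)           # decide tool
    observation = TOOLS[tool_name](query)    # call tool
    return f"Answer based on {observation}"  # get answer
```

In Haystack, the decision step is the chat generator reading each tool's `description`, which is why writing a precise description matters as much as the pipeline behind it.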
Why Built-In Agents Hit a Wall
Here’s the problem. The Haystack agent has a fixed reasoning loop: thought -> action -> observation. That’s great for simple stuff. But when you need multiple agents talking to each other, supervisor patterns, human-in-the-loop approval, or dynamic replanning, things get messy.
Funderburk identifies two core limitations:
Complex manual loops. Building even a simple loop requires wiring ConditionalRouter, ToolInvoker, and MessageCollector together. Scaling that to multi-agent systems where agents communicate with each other (not just respond to users) creates spaghetti connections.
Rigid data flow. Haystack pipelines are designed for directed data flow. Forcing them to behave as dynamic state machines, where execution paths change based on intermediate results, means cramming stateful logic into stateless components.
There’s also the problem of context rot. As conversations grow, the context window fills up and the LLM starts forgetting things. Haystack handles memory through components like ConversationalMemory, but managing overflow is something the developer has to build manually.
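The manual overflow management mentioned above usually amounts to trimming the oldest turns once a token budget is exceeded. Here is a minimal sketch; the 4-characters-per-token estimate is a crude assumption for illustration, not how any particular tokenizer counts:

```python
# Sketch of manual context-overflow handling: drop the oldest turns once
# a rough token budget is exceeded, always keeping the system prompt.

def estimate_tokens(text: str) -> int:
    # Crude assumption: ~4 characters per token.
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """messages[0] is the system prompt; trim the oldest turns first."""
    system, rest = messages[0], messages[1:]
    while rest and sum(estimate_tokens(m["content"]) for m in [system] + rest) > budget:
        rest.pop(0)  # forget the oldest user/assistant turn
    return [system] + rest

history = [{"role": "system", "content": "You are a helpful agent."}]
history += [{"role": "user", "content": f"question {i} " * 50} for i in range(10)]
trimmed = trim_history(history, budget=300)
```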
Enter LangGraph
So Funderburk introduces LangGraph 1.0 as the solution for agentic orchestration. It’s not an agent itself. It’s a low-level library for building agents as graphs with explicit control.
LangGraph uses three core concepts: state, nodes, and edges. You define the agent’s logic as a state machine. Each node is a Python function that modifies a central state object. Edges connect nodes and can be conditional.
Here’s what LangGraph gives you that Haystack’s built-in agent doesn’t:
- Cyclical flows. Loops are native. Wire the LLM node to the tool node and back again. That IS the agent’s runtime.
- Guardrails as nodes. A guardrail is just another node in the graph. Put one before the LLM to filter bad queries, put another after the tool call to validate output.
- Observable state. The central state object gives you a step-by-step trace of what happened.
- Context engineering. LangGraph 1.0 has built-in middleware for preventing context pollution, plus a checkpointer system for durable, resumable agents.
The key insight from the chapter: Haystack’s graph is structural (how data flows between components). LangGraph’s graph is logical (a state machine where nodes are Python functions modifying persistent state). Different philosophies, both useful.
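The state/nodes/edges model is easy to see in miniature. This plain-Python sketch mirrors LangGraph's idea (a shared state object, nodes as functions that modify it, a conditional edge that loops back) without using LangGraph's actual API; the node names are made up:

```python
# Minimal state machine in LangGraph's style: shared state dict, nodes as
# functions, conditional edges, and a native cycle between LLM and tool.

def llm_node(state: dict) -> dict:
    # Pretend "reasoning": request a tool until we have an observation.
    state["next"] = "tool" if state.get("observation") is None else "end"
    return state

def tool_node(state: dict) -> dict:
    state["observation"] = f"result for {state['query']}"
    state["next"] = "llm"  # edge back to the LLM node: the loop is native
    return state

NODES = {"llm": llm_node, "tool": tool_node}

def run_graph(state: dict) -> dict:
    node = "llm"  # entry point
    while node != "end":
        state = NODES[node](state)
        node = state["next"]  # conditional edge decides the next node
    return state

final = run_graph({"query": "weather in London"})
```

A guardrail node would slot into `NODES` the same way, and the `state` dict at the end is exactly the step-by-step observable trace the chapter describes.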
Mini-Project: Named Entity Recognition
NER is the first hands-on project. It’s a fundamental NLP technique: take a sentence like “Schedule a meeting with Sarah next Friday at Blue Bottle Coffee in San Francisco” and extract structured entities:
- Person: Sarah
- Organization: Blue Bottle Coffee
- Location: San Francisco
- Date: Next Friday
Why does this matter for agents? An agent can’t act on raw text. It needs structured data. Before calling a get_weather tool, the agent needs to know the location is “London” and the date is “tomorrow.” NER is the tool that extracts those parameters.
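That mapping from entities to tool parameters is simple to sketch. The entity dicts and the `get_weather` parameter schema below are hand-written stand-ins for real NER output, not the chapter's code:

```python
# Sketch of why agents need NER: turning extracted entities into the
# structured parameters a get_weather-style tool expects.

def build_weather_call(entities: list[dict]) -> dict:
    """Map NER entities onto hypothetical tool parameters."""
    params = {}
    for ent in entities:
        if ent["label"] == "LOC":
            params["location"] = ent["text"]
        elif ent["label"] == "DATE":
            params["date"] = ent["text"]
    return params

entities = [{"text": "London", "label": "LOC"},
            {"text": "tomorrow", "label": "DATE"}]
call_args = build_weather_call(entities)
# -> {'location': 'London', 'date': 'tomorrow'}
```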
Funderburk uses Haystack’s NamedEntityExtractor component with the bert-base-NER model from Hugging Face. This model recognizes four entity types: LOC, ORG, PER, and MISC.
The NER pipeline chains several components:
- Web search finds relevant documents from allowed domains
- Link content fetcher grabs the HTML
- HTML to document converter turns raw HTML into Haystack Documents
- Document cleaner strips whitespace and junk
- Named entity extractor runs BERT-based NER
- NERPopulator (custom component) processes the annotations and structures entities into metadata
The whole thing gets wrapped as a SuperComponent and given to a Haystack agent as a tool. Feed it a query like “Find entities about Nikola Tesla from Britannica” and the agent finds articles, extracts entities (15 people, 13 organizations, 16 locations), and saves results to CSV.
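The post-processing step matters because BERT-style token classifiers emit BIO tags (B-PER, I-PER, O, ...), and something has to merge those token-level annotations into whole entities. This is a generic sketch of that merge, not the chapter's actual NERPopulator component:

```python
# Merge token-level BIO tags into entity spans, the kind of annotation
# processing a NERPopulator-style component performs.

def group_bio(tokens: list[str], tags: list[str]) -> list[dict]:
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):              # a new entity begins
            if current:
                entities.append(current)
            current = {"text": token, "label": tag[2:]}
        elif tag.startswith("I-") and current:
            current["text"] += " " + token    # continue the current entity
        else:                                 # "O" closes any open entity
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

tokens = ["Nikola", "Tesla", "worked", "at", "Edison", "Machine", "Works"]
tags   = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "I-ORG"]
# group_bio(tokens, tags)
# -> [{'text': 'Nikola Tesla', 'label': 'PER'},
#     {'text': 'Edison Machine Works', 'label': 'ORG'}]
```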
One important limitation Funderburk calls out: ambiguity. The word “bank” could mean a financial institution or a riverbank. NER systems struggle with this. The solution is entity linking, where you disambiguate entities by connecting them to entries in a knowledge base like Wikipedia. If “Jaguar” appears near “horsepower,” it’s the car brand, not the animal.
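The “Jaguar near horsepower” idea can be sketched as keyword overlap against knowledge-base candidates. Real entity linkers use embeddings and a KB like Wikipedia; the candidate entries and context keywords below are invented for illustration:

```python
# Toy entity linking: score each knowledge-base candidate by keyword
# overlap with the sentence around the mention.

KB = {
    "Jaguar (car)":    {"horsepower", "engine", "sedan"},
    "Jaguar (animal)": {"jungle", "predator", "prey"},
}

def link_entity(mention: str, context: str) -> str:
    words = set(context.lower().split())
    # Pick the KB entry whose keywords overlap most with the sentence.
    return max(KB, key=lambda entry: len(KB[entry] & words))

link_entity("Jaguar", "The Jaguar has 550 horsepower and a V8 engine")
# -> 'Jaguar (car)'
```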
Text Classification: Zero-Shot Routing
The second mini-project tackles text classification. Funderburk introduces two Haystack components:
TransformersTextRouter routes text based on pre-trained model labels. Good for tasks the model was specifically trained for, like sentiment analysis.
TransformersZeroShotTextRouter is the interesting one. It classifies text into categories it’s never seen during training. You just give it labels like “politics,” “sport,” “technology,” “entertainment,” “business” and it figures out which one fits. No fine-tuning required.
Funderburk tests it on a dataset of 2,225 text samples across five categories using the deberta-v3-large-zeroshot-v2.0 model. The results are solid: 91% overall accuracy. Sport hits a perfect 1.00 recall. Business has slightly lower recall at 0.79 but very high precision at 0.97.
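To unpack what a high-precision/lower-recall split like Business's means: precision asks "of everything labeled Business, how much really was?" and recall asks "of all true Business samples, how many were caught?" The counts below are invented to show the arithmetic, not the chapter's actual confusion matrix:

```python
# Precision vs. recall on invented counts: 79 of 100 true Business
# samples caught (recall 0.79), with only 2 false positives.

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

tp, fp, fn = 79, 2, 21
round(precision(tp, fp), 2)  # -> 0.98: almost nothing mislabeled as Business
round(recall(tp, fn), 2)     # -> 0.79: about a fifth of Business samples missed
```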
The classification pipeline follows the same pattern as NER: web search -> fetch content -> convert to documents -> clean -> classify. You query for “Elon Musk” articles from Yahoo Finance, and the pipeline automatically labels them as Politics, Business, or Technology based on content.
Both NER and text classification pipelines are designed to be serialized and deployed through Hayhooks as REST endpoints. That’s the setup for the final mini-project, where everything comes together in a multi-agent system. More on that in the next post.
This is post 17 of 24 in the Building Natural Language and LLM Pipelines series.