From Jupyter Notebooks to Production RAG: Docker, uv, SuperComponents, and Why Project Structure Matters

Chapter 6 is where the book shifts gears. Hard. Funderburk basically says: “Cool, you built a RAG pipeline. It works on your laptop. Now what?”

The answer is: now you make it real. And “real” means reproducible, testable, and something your team can actually maintain. This chapter is the bridge between “it works in my notebook” and “it works in production.”

Notebooks Are Great. Until They’re Not.

The first thing Funderburk tackles is the uncomfortable truth about Jupyter notebooks. They were perfect for learning. Chapters 4 and 5 used them heavily. But notebooks are terrible for production code. They don’t play well with version control. They hide dependencies. They make collaboration painful.

So the chapter moves everything into a proper Python project structure. Organized scripts, clear directory layout, real dependency management with uv, and Docker containers for the infrastructure.

Here’s the thing. This isn’t just “tidying up.” It’s a fundamental shift in how you think about your code. Your RAG pipeline is no longer a single-file experiment. It’s a system with moving parts that different people on your team will own.

The Project Layout That Maps to Your Team

The project structure Funderburk proposes is organized by function:

  • scripts/rag/ holds the core pipeline code: indexing, naive RAG, and hybrid RAG
  • scripts/synthetic_data_generation/ has the knowledge graph and test data components from Chapter 5
  • scripts/ragas_evaluation/ contains the evaluation logic
  • scripts/wandb_experiments/ handles monitoring and observability

But here’s what I like about her explanation. She doesn’t just say “put files in folders.” She connects the structure to actual team roles. The rag/ directory is the NLP engineer’s domain. The evaluation directories belong to QA engineers. The monitoring directory is for the DevOps/MLOps person.

This maps to a real production team. Clear ownership. Clear boundaries. Clear “data contracts” between parts of the system. That’s how you scale both a product and a team.

SuperComponents: Wrapping Pipelines Into Reusable Blocks

The key pattern in this chapter is Haystack’s SuperComponent abstraction. In previous chapters, you built pipelines by connecting individual components manually. Now you wrap an entire pipeline into a single reusable unit.

The blueprint looks roughly like this:

@super_component
class MyRAGPipeline:
    def __init__(self, config_params):
        # Store config, validate API keys
        self._build_pipeline()

    def _build_pipeline(self):
        # Initialize components
        # Create pipeline, connect everything
        # Define simple input/output mappings

The input/output mapping part is especially practical. Without it, calling a naive RAG pipeline means passing data to text_embedder.text, retriever.query, and prompt_builder.question separately. That’s three internal socket paths you have to know about.

With the SuperComponent mapping, you just pass query and the SuperComponent routes it to the right places internally. One input, three destinations. Clean interface, messy internals hidden away.

This gives you three benefits: interface abstraction (callers don’t need to know internal names), easy substitution (change internals without breaking external code), and the ability to map single inputs to multiple components at once.
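To make the fan-out concrete, here is a toy sketch of the input-mapping idea. This is not Haystack's implementation, just an illustration under assumed names of how one external input can be routed to several internal socket paths:

```python
# Toy illustration of the input-mapping idea behind SuperComponent.
# NOT Haystack's actual implementation -- just the routing concept:
# one external input fans out to multiple "component.socket" paths.

class MappedPipeline:
    def __init__(self, input_mapping):
        # e.g. {"query": ["text_embedder.text", "retriever.query",
        #                 "prompt_builder.question"]}
        self.input_mapping = input_mapping

    def run(self, **external_inputs):
        routed = {}
        for name, value in external_inputs.items():
            for socket_path in self.input_mapping.get(name, []):
                component, socket = socket_path.split(".")
                routed.setdefault(component, {})[socket] = value
        return routed  # what the wrapped pipeline's components would receive


pipe = MappedPipeline({
    "query": ["text_embedder.text", "retriever.query", "prompt_builder.question"]
})
print(pipe.run(query="What is hybrid RAG?"))
```

One `query` in, three destinations out. The caller never learns the internal socket names, which is exactly the point.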

Docker and the Dual-Elasticsearch Setup

Previous chapters used in-memory document stores. Fine for prototyping. Useless for production. This chapter introduces Elasticsearch as the persistent vector database, running inside Docker containers.

But here’s the interesting part. Funderburk doesn’t spin up just one Elasticsearch instance. She uses two. One for embeddings from text-embedding-3-small and one for text-embedding-3-large.

Why? Because of what she calls the “vector space singularity.” This is a critical rule: the same embedding model must be used for indexing, for generating synthetic test data, and for querying at runtime. If you index with one model and query with another, you’re using a map of Paris to navigate Tokyo. The vectors live in completely different mathematical spaces. Your search results will be nonsense.

So if you want to compare the performance of two different embedding models, you need separate vector stores. You can’t mix embeddings from different models in the same database. The dual-Elasticsearch setup in docker-compose.yml is how you run that comparison cleanly.
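A docker-compose.yml for the dual setup might look roughly like this. The service names, image tag, and port choices here are my assumptions, not the book's exact file; the heap sizes follow the chapter's sizing guidance:

```yaml
services:
  elasticsearch-small:   # vectors from text-embedding-3-small (1,536 dims)
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms512m -Xmx1g
    ports:
      - "9200:9200"

  elasticsearch-large:   # vectors from text-embedding-3-large (3,072 dims)
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms2g -Xmx4g
    ports:
      - "9201:9200"
```

Two isolated instances, two ports, two heap budgets. Nothing stops a vector from the wrong model landing in the wrong index except discipline, so the hard separation at the infrastructure level is the safety net.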

Why Two Embedding Models?

The dual setup is really a framework for A/B testing:

  • Pipeline A uses ES_SMALL_URL with vectors from the small embedding model
  • Pipeline B uses ES_LARGE_URL with vectors from the large embedding model

You send the same query to both. You measure quality with RAGAS. You track cost per query. You get real data on the cost-performance tradeoff.
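The comparison harness can be as simple as the sketch below. The ES_SMALL_URL and ES_LARGE_URL environment variables come from the chapter's setup; the `run_pipeline` function here is a stub standing in for a real SuperComponent's `run()` call, and the default URLs are my assumptions:

```python
import os

# Hypothetical A/B harness: same query, two pipelines, side-by-side results.
# run_pipeline is a stub for a real RAG SuperComponent; in practice you would
# also log RAGAS scores and cost per query for each variant.

ES_SMALL_URL = os.environ.get("ES_SMALL_URL", "http://localhost:9200")
ES_LARGE_URL = os.environ.get("ES_LARGE_URL", "http://localhost:9201")


def run_pipeline(es_url: str, query: str) -> dict:
    # Stand-in for the real pipeline call against one Elasticsearch instance.
    return {"es_url": es_url, "query": query, "answer": f"(answer via {es_url})"}


def ab_compare(query: str) -> dict:
    return {
        "A_small": run_pipeline(ES_SMALL_URL, query),
        "B_large": run_pipeline(ES_LARGE_URL, query),
    }


results = ab_compare("What does Chapter 6 cover?")
for variant, result in results.items():
    print(variant, "->", result["answer"])
```

Swap the stub for the real pipelines and feed both outputs into the evaluation step, and you have the cost-versus-quality comparison the chapter is building toward.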

The second benefit is resource allocation. The large embedding model produces 3,072-dimensional vectors; the small one produces 1,536-dimensional vectors. Larger vectors need more RAM and storage. By splitting them into separate Elasticsearch instances, you can provision each one appropriately: the small instance gets 512 MB to 1 GB of heap, the large one gets 2 to 4 GB. No over-provisioning, no wasted infrastructure spend.

The Shift From Developer to Architect

This first half of Chapter 6 is really about one idea: your pipeline code is not the product. The system around it is. Dependency management with uv. Containerization with Docker. Modular project structure. Clean abstractions with SuperComponents. These are the things that turn a working prototype into something you can deploy, monitor, and hand off to another engineer without a three-hour explanation.

In Part 2, we’ll get into the actual evaluation with RAGAS metrics, how Weights & Biases adds observability, and the real cost-performance numbers comparing small versus large embedding models.


This is post 13 of 24 in the Building Natural Language and LLM Pipelines series.

Based on Chapter 6 of “Building Natural Language and LLM Pipelines” by Laura Funderburk (ISBN: 978-1-83546-799-2, Packt Publishing, 2025).

Previous: Chapter 5, Part 2: Custom Haystack Components

Next: Chapter 6, Part 2: RAGAS Evaluation and Observability
