FastAPI, Docker, and Securing Your NLP Endpoints - Chapter 7 Part 1

Chapter 7 of Laura Funderburk’s book is where the rubber meets the road. You built a RAG pipeline in Chapter 6. Now you need to ship it. Get it out of a notebook and into something that real users can hit with HTTP requests.

Here’s the thing. A pipeline sitting in a Jupyter notebook is a prototype. It is not a product. This chapter walks you through two paths to make it a product. Part 1 covers the “do it yourself” path with FastAPI and Docker.

Why Deployment Matters

Funderburk lays out five things you need to think about before deploying any NLP pipeline:

  • Scalability: Your RAG pipeline needs to serve one user or a thousand users without falling over. This is the number one reason you move beyond a script. Container orchestration with Docker and Kubernetes is how you get there.
  • Accessibility: Making your pipeline reachable through a REST API turns it into a building block. Any frontend, mobile app, or backend service can consume it.
  • Resource management: LLMs and embedding models eat memory and GPU cycles. You need to manage those expensive resources or your cloud bill will explode.
  • Reliability: Users who see a 500 Internal Server Error will not trust your app. You need health checks, logging, and recovery processes.
  • Security: Your pipeline handles user queries and potentially sensitive documents. Data must be encrypted and access must be authenticated.

Two Deployment Strategies

The chapter presents two methods. Think of it as a trade-off between control and velocity.

Custom FastAPI
  • Goal: Total control, custom logic
  • Boilerplate: High (you write everything)
  • How it works: Import and run the Pipeline in your Python API code
  • Best for: Complex apps where the pipeline is one piece of a larger system

Hayhooks
  • Goal: Speed, simplicity
  • Boilerplate: Minimal to zero
  • How it works: Read a YAML file, auto-generate endpoints
  • Best for: Rapidly deploying RAG APIs where the pipeline IS the system

Both can be Dockerized. Both can scale with Kubernetes. The difference is how much code you write yourself. Funderburk teaches Method 1 (FastAPI) first so you understand the mechanics. Then Method 2 (Hayhooks) shows up to automate all that boilerplate away. Part 2 of this chapter covers Hayhooks.

Building the FastAPI Application

FastAPI is the de facto standard for Python ML APIs. Here’s why. It is built on Starlette for async performance and Pydantic for data validation. When your API calls an LLM, it is just waiting for an I/O response. Async support means the server can handle hundreds of other requests during that wait instead of blocking.

The sample app from the book exposes the hybrid RAG SuperComponent from Chapter 6 through multiple endpoints:

  • POST /query - Send a question, get an answer from the RAG pipeline plus source documents
  • GET /health - Check if Qdrant and the RAG component are alive
  • GET /info - See what models and settings are in use
  • GET / - Root endpoint with API status

The app connects to a Qdrant document store on startup and initializes the HybridRAGSuperComponent. Pydantic models validate incoming requests and structure outgoing responses. In simplified form:

from fastapi import FastAPI
from pydantic import BaseModel

class QueryRequest(BaseModel):
    query: str

class QueryResponse(BaseModel):
    answer: str
    sources: list

app = FastAPI()

@app.post("/query", response_model=QueryResponse)
async def query_documents(request: QueryRequest):
    # rag_pipeline is the HybridRAGSuperComponent initialized at startup
    result = rag_pipeline.run({"query": request.query})
    return QueryResponse(answer=result["answer"], sources=result["documents"])

@app.get("/health")
async def health_check():
    return {"qdrant": "connected", "rag": "initialized"}

Nothing fancy. But it works and you control every detail.

Dockerizing the Pipeline

Once you have the FastAPI app, you package it in Docker. A Dockerfile is the blueprint for your “shipping container.” The book recommends multi-stage builds. One stage installs dependencies. The final stage copies only what you need for a lean production image.

The key steps:

  1. Foundation: Start with Python 3.11 slim, install essentials like curl, copy the uv dependency manager
  2. Code and dependencies: Copy pyproject.toml and uv.lock first (for caching efficiency), then copy source code
  3. Install: Run uv sync, make startup scripts executable
  4. Security: Create a non-root app user and switch to it
  5. Runtime: Expose port 8000, add a HEALTHCHECK, set up a start script that runs the indexing pipeline first, then starts Uvicorn
  6. Execution: Default command runs the startup script
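The steps above can be sketched as a multi-stage Dockerfile. Paths, script names, and the uv image tag here are assumptions for illustration, not the book's exact file:

```dockerfile
# --- Build stage: install dependencies with uv ---
FROM python:3.11-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
# Copy lockfiles first so Docker caches the dependency layer
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

# --- Final stage: lean runtime image ---
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
COPY . .
RUN chmod +x start.sh && useradd --create-home appuser
USER appuser
ENV PATH="/app/.venv/bin:$PATH"
EXPOSE 8000
HEALTHCHECK CMD curl -f http://localhost:8000/health || exit 1
CMD ["./start.sh"]
```

Copying the lockfiles before the source code means dependency installation is only re-run when pyproject.toml or uv.lock changes, which makes rebuilds much faster.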

Build and run it like this:

docker build -t hybrid-rag-api .
docker run -d --name hybrid-rag -p 8000:8000 \
  -e OPENAI_API_KEY=your_key \
  -e RAG_API_KEY=your_secret \
  hybrid-rag-api

The portability of Docker is what lets you run this on any cloud and later scale it with Kubernetes.

Securing Your Endpoints

You do not want your RAG endpoint open to the world. The book shows API key authentication using FastAPI’s built-in dependency injection.

from fastapi import Depends, HTTPException, Security
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=True)

async def get_api_key(api_key: str = Security(api_key_header)):
    if api_key != settings.rag_api_key:
        raise HTTPException(status_code=401, detail="Invalid API Key")
    return api_key

@app.post("/query", response_model=QueryResponse)
async def query_documents(
    request: QueryRequest,
    api_key: str = Depends(get_api_key)
):
    # only runs if API key is valid
    ...

Configuration lives in a Settings class that inherits from Pydantic’s BaseSettings. Each field maps to an environment variable. If a required variable like RAG_API_KEY is missing, the app refuses to start. No silent failures.

When you run the Docker container, secrets get passed as environment variables at runtime. They never get baked into the image. To query the protected endpoint:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret-key" \
  -d '{"query": "What is retrieval-augmented generation?"}'

What I Think

This is solid, practical content. Funderburk does a good job of explaining not just the “how” but the “why” behind each decision. The multi-stage Docker build, the non-root user, the Pydantic validation, the API key dependency injection. These are real production patterns, not toy examples.

If you have shipped Python APIs before, a lot of this will feel familiar. But if you are coming from a data science background and your comfort zone ends at notebooks, this chapter is exactly what you need. It bridges the gap between “I built a cool pipeline” and “I deployed a cool pipeline.”

In Part 2, we will cover the faster path: CI/CD automation, pipeline serialization into YAML, and Hayhooks, which takes all the boilerplate from this post and makes it disappear.


This is post 15 of 24 in the Building Natural Language and LLM Pipelines series.

Previous: Chapter 6: Production RAG with Evaluation and Feedback - Part 2

Next: Chapter 7: Deploying Haystack Applications - Part 2
