Ep #90: Vector Databases 101 (Part 2): Optimizing RAG for Production
Moving beyond basic RAG: Advanced chunking, Hybrid Search, and how to stop your AI from hallucinating.
Breaking down complex system design components
By Amit Raghuvanshi | The Architect’s Notebook
🗓️ Mar 12, 2026 · Deep Dive
The Chunking Dilemma
In Part 1, we built a basic RAG pipeline. But if you deploy that today, you will hit a wall. The #1 cause of bad RAG performance isn’t the model; it’s the chunking.
The Problem: Consider this legal text:
“Section 3.2: The contractor shall deliver the final report by December 31, 2024.
Payment terms are outlined in Appendix B. The report must include all findings
from the site inspection conducted in October.”

If you naively split at 50 characters:
Chunk 1: “Section 3.2: The contractor shall deliver the rep”
Chunk 2: “ort by December 31, 2024. Payment terms are outli”
Both chunks are semantically broken. The vector for Chunk 1 won’t know about the date. The vector for Chunk 2 won’t know who the “contractor” is.
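To make the failure concrete, here is a minimal sketch of such a naive fixed-size splitter (illustrative only; the clause text mirrors the example above):

```python
def naive_split(text: str, size: int = 50) -> list[str]:
    # Split into fixed-size character chunks, ignoring word
    # and sentence boundaries entirely.
    return [text[i:i + size] for i in range(0, len(text), size)]

clause = ("Section 3.2: The contractor shall deliver the final "
          "report by December 31, 2024. Payment terms are "
          "outlined in Appendix B.")

for chunk in naive_split(clause):
    print(repr(chunk))
# Every boundary lands mid-word or mid-sentence, so no single
# chunk contains the full "who + what + when" of the clause.
```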
Smart Chunking Strategies
Semantic Chunking: Don’t split by character count. Split by sentences or paragraphs. Use an embedding model to calculate similarity between sentences; if the topic shifts, start a new chunk.
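A minimal sketch of that idea, using a toy bag-of-words vector as a stand-in for a real embedding model (in production you would embed each sentence with your embedding model instead); the 0.2 threshold is an arbitrary illustration:

```python
import math

def embed(sentence: str) -> dict[str, float]:
    # Toy stand-in for a sentence-embedding model:
    # a bag-of-words term-frequency vector.
    vec: dict[str, float] = {}
    for word in sentence.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def semantic_chunk(sentences: list[str], threshold: float = 0.2) -> list[str]:
    # Start a new chunk whenever similarity to the previous
    # sentence drops below the threshold (a topic shift).
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

With a real embedding model, consecutive sentences about the same contract clause score high and stay together, while an abrupt topic change starts a new chunk.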
Parent-Child Chunking: Store small chunks (for precise search) but link them to a larger “Parent” document. When you find the small chunk, you feed the Parent to the LLM for better context.
# Small chunk for retrieval
small_chunk = "Payment terms are outlined in Appendix B."
# Link to parent document ID for full context
parent_doc_id = "contract_2024_section3"

Overlapping Windows: Always maintain 10-20% overlap between chunks to prevent information loss at boundaries:
Chunk 1: “...contractor shall deliver the final report by December 31, 2024.”
Chunk 2: “by December 31, 2024. Payment terms are outlined in Appendix B...”

📖 Before You Close This Tab
If you found today’s breakdown valuable, you should look at my new book: The Architecture of Neural Scale: Volume 1(A) - Foundations of AI Systems.
The tech world is currently obsessed with treating LLMs as infinite black boxes. You send a prompt, you get a response. But what happens when that abstraction leaks? What happens when latency spikes or your cloud bill explodes?
This book strips away the vendor magic. It is a ground-up engineering guide to the physical reality of AI. We start at the mathematical foundations of neural networks and plunge straight into the “metal layer.” You will learn how to calculate VRAM, navigate memory bottlenecks, and design the infrastructure required to serve billions of parameters without your systems catching fire. No academic fluff. Just production engineering.
The 10% launch discount expires this weekend. If you are ready to graduate from API wrappers to actual AI architecture, grab your copy below.
Part 2: Vectors Are Bad at Keywords
Suppose a user searches for a specific part number: “Part-XH-992”. Vector search might return “Part-XH-993” because the two strings are semantically almost identical (both are “part numbers”). But for a mechanic, that is a catastrophic failure.
The Fix: Hybrid Search
Don’t rely on vectors alone:
Run a Keyword Search (BM25) to catch exact matches (“XH-992”).
Run a Vector Search to catch semantic intent (“replacement part”).
Combine results using Reciprocal Rank Fusion (RRF).
This gives you the best of both worlds: the precision of Elasticsearch and the understanding of ChatGPT.
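The fusion step can be sketched with the standard RRF formulation (score = Σ 1/(k + rank), with k commonly set to 60); the document IDs and result lists below are hypothetical:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: each result list contributes
    # 1 / (k + rank) for every document it returns; k damps
    # the dominance of any single list's top hit.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists for the query "Part-XH-992 replacement"
bm25_hits = ["doc_xh992_spec", "doc_parts_catalog", "doc_invoice"]
vector_hits = ["doc_replacement_guide", "doc_xh992_spec", "doc_parts_catalog"]

fused = rrf([bm25_hits, vector_hits])
# "doc_xh992_spec" ranks first: it appears near the top of BOTH lists.
```

Documents that both retrievers agree on float to the top, which is exactly the behavior you want for the part-number query above.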
🔒 Subscribe to read Evaluation, Cost & Future RAG
How do you know if your RAG system is good? A bad RAG system fails silently: the user gets an answer, but it’s wrong. You need observability.
In the rest of this deep dive, we will cover:
Choosing the Stack: Should you use Postgres or a dedicated vector database?
Distance Metrics: What are the different options for calculating distance?
RAG Evaluation: How to use “LLM-as-a-Judge” to grade your system’s accuracy automatically.
Query/Cost Optimization: Techniques to reduce embedding costs by 90%.
The Future: Agentic RAG and GraphRAG—where the LLM decides what to search for.