
Day 12 — Why RAG Systems Need Vector Databases

AI Engineering — Day by Day
My journey to becoming an AI Engineer




After learning about embeddings and chunking, I reached an interesting point in my AI engineering journey.

I understood:

  • How text becomes vectors
  • How retrieval works semantically
  • Why chunking affects answer quality

But then a much bigger question appeared:

What happens when the system has thousands or millions of chunks?

This is where I discovered:

Vector Databases


The Problem with Naive Retrieval

Initially, my retrieval pipeline looked something like this:


Query  
↓  
Generate embedding  
↓  
Compare against all embeddings  
↓  
Return closest match
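
Here is a minimal sketch of that brute-force approach (the embed function is a hypothetical stand-in for any embedding model that returns unit-length vectors):

# Brute-force retrieval: compare the query against every stored vector.
# `embed` is a hypothetical function returning a unit-length numpy vector.
import numpy as np

def retrieve(query, chunks, chunk_vecs, embed):
    q = embed(query)                       # generate the query embedding
    scores = chunk_vecs @ q                # cosine similarity (vectors normalized)
    return chunks[int(np.argmax(scores))]  # return the closest match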

This works perfectly for:

  • 5 chunks
  • 20 chunks
  • Small experiments

But it quickly breaks at scale.

Imagine:

100,000+ chunks

Now for every query:

  • The system compares against every vector
  • Latency increases
  • Memory usage grows
  • Performance drops significantly

At this point, I realized:

RAG systems need efficient retrieval infrastructure.

What is a Vector Database?

A vector database stores embeddings and performs efficient similarity search.

Traditional databases search using:

Exact values or keywords

Vector databases search using:

Semantic similarity

Then I wondered what the difference between a traditional database and a vector database actually is, and with the help of a GenAI tool I put together the table below.


Traditional Search vs Vector Search

Traditional Search | Vector Search
Keyword matching | Meaning matching
Exact terms | Semantic similarity
"refund" | "refund", "money back", "return"
String comparison | Vector similarity
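
A toy check makes this difference visible (a sketch assuming the sentence-transformers library and the all-MiniLM-L6-v2 model):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

doc = "Our refund policy allows returns within 30 days."
query = "How do I get my money back?"

# Keyword matching: no shared term, so the document is missed
print("refund" in query.lower())  # False

# Meaning matching: embeddings place the two texts close together
print(float(util.cos_sim(model.encode(query), model.encode(doc))))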

This difference completely changed how I think about retrieval systems.


The Full Retrieval Pipeline

At this point, the RAG pipeline started making much more sense to me:


Documents  
↓  
Chunking  
↓  
Embeddings  
↓  
Store in Vector Database  
↓  
User Query  
↓  
Query Embedding  
↓  
Similarity Search  
↓  
Top-k Chunks  
↓  
LLM Generation  

This was the moment where retrieval stopped feeling like “magic”.
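
Mapped to code, the pipeline looks roughly like this (a sketch assuming an embed function, a chunks list, a FAISS-style index with a search method, and a hypothetical llm callable):

# End-to-end RAG flow at a glance.
# `index`, `chunks`, and `embed` are assumed from the sketches in this post;
# `llm` is a hypothetical function that calls any LLM.
def answer(question, k=3):
    q_vec = embed(question).reshape(1, -1)            # query embedding
    _, ids = index.search(q_vec, k)                   # similarity search
    context = "\n\n".join(chunks[i] for i in ids[0])  # top-k chunks
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)                                # LLM generation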


Why Vector Databases Exist

The biggest realization for me was:

Vector databases do not exist because embeddings are cool.
They exist because brute-force retrieval does not scale.

Exact Search vs Approximate Search

This was another major concept I learned.


Exact Search

Compare query against ALL vectors

Pros:

  • Highly accurate

Cons:

  • Very slow at large scale

Approximate Nearest Neighbor (ANN)

Instead of checking every vector:

  • Use indexing structures
  • Search nearby regions only

Pros:

  • Extremely fast

Cons:

  • May occasionally miss the true nearest neighbor

This introduced another important engineering tradeoff:

Real-world AI systems often trade a small amount of accuracy for massive speed improvements.
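
FAISS makes this tradeoff easy to see side by side (a sketch with random vectors standing in for real embeddings):

import faiss
import numpy as np

d = 384                                                 # embedding dimension
vectors = np.random.rand(100_000, d).astype("float32")

# Exact search: compares the query against ALL vectors
exact = faiss.IndexFlatL2(d)
exact.add(vectors)

# Approximate search (IVF): clusters the vectors, then searches
# only a few nearby clusters instead of the whole set
quantizer = faiss.IndexFlatL2(d)
ann = faiss.IndexIVFFlat(quantizer, d, 100)
ann.train(vectors)       # IVF indexes must be trained before adding
ann.add(vectors)
ann.nprobe = 10          # clusters to probe: the speed/accuracy knob

query = np.random.rand(1, d).astype("float32")
print(exact.search(query, 5))  # ground truth, slower
print(ann.search(query, 5))    # much faster, may miss the true nearest neighbor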

My First FAISS Experiment

To understand this practically, I built a small vector search pipeline locally using:

  • Sentence Transformers
  • FAISS

The workflow was:


Generate embeddings  
↓  
Store vectors in FAISS index  
↓  
Convert query into embedding  
↓  
Search nearest vectors  
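
In code, that workflow came down to roughly this (a sketch assuming sentence-transformers and faiss-cpu are installed):

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Refunds are processed within 5 business days.",
    "You can return items within 30 days of purchase.",
    "Our support team is available 24/7.",
]

# Generate embeddings (normalized so inner product = cosine similarity)
embeddings = model.encode(chunks, normalize_embeddings=True)

# Store vectors in a FAISS index
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))

# Convert the query into an embedding and search the nearest vectors
query = model.encode(["How do I get my money back?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), 2)

for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {chunks[i]}")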

And seeing similarity search happen locally felt like a major milestone.


What Actually Gets Stored?

Initially, I thought vector databases stored “documents”.

But that’s not entirely correct.

A vector database typically stores:

Component | Purpose
Embeddings | Semantic representation
Metadata | Source info, tags, chunk IDs
References | Link back to original text
Indexes | Efficient similarity search
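
A single stored entry might look something like this (the field names here are invented for illustration; every vector database defines its own schema):

record = {
    "id": "doc-42-chunk-3",                            # chunk ID
    "embedding": [0.021, -0.134, 0.877],               # semantic representation (truncated)
    "metadata": {"source": "refund_policy.pdf", "page": 2},  # source info and tags
    "text_ref": "corpus/refund_policy.txt#chunk-3",    # link back to the original text
}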

A Huge Retrieval Problem I Realized

Something else became very clear during this learning:

Retrieval quality is not only about finding relevant chunks.
It is also about avoiding irrelevant chunks.

Suppose:

  • Top 5 chunks are retrieved
  • Only 1 chunk is actually relevant

This creates multiple problems:

  • Context pollution
  • Attention dilution
  • Noisy generation
  • Wasted context window tokens

At this point, I started understanding:

Bad retrieval → polluted context → degraded generation

Retrieval Precision vs Recall

This introduced another important tradeoff:

Concept | Meaning
Precision | How many of the retrieved chunks are actually relevant
Recall | How many of the relevant chunks are actually retrieved

Too many chunks:

  • Higher recall
  • More noise

Too few chunks:

  • Higher precision
  • Risk missing information
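
A toy calculation makes the tension concrete (the chunk IDs are invented; in practice the ground truth comes from an evaluation set):

retrieved = {3, 7, 12, 21, 40}   # top-5 retrieved chunk IDs
relevant = {7, 55, 90}           # chunks that actually answer the query

hits = retrieved & relevant
precision = len(hits) / len(retrieved)  # 1/5 = 0.20 -> noisy context
recall = len(hits) / len(relevant)      # 1/3 ≈ 0.33 -> missing information

print(f"precision={precision:.2f}, recall={recall:.2f}")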

This was another reminder that:

RAG systems are full of tradeoffs.

Code

I’ve uploaded the complete FAISS experiment and retrieval pipeline here:

VectorDB Code example

The Biggest Insight I Got

At this point, I can clearly see how all the pieces connect:


Chunking  
↓  
Controls embedding quality  

Embeddings  
↓  
Control semantic retrieval  

Vector Database  
↓  
Controls scalable retrieval

And together:

They form the foundation of modern RAG systems.

What’s Next

Now that I understand scalable retrieval, the next step is:

Connecting everything into one complete RAG pipeline.

In the next post (Day 13), I’ll build:

  • Query → Embed → Retrieve → Generate
  • A complete local RAG system

Final Thought

Before today, vector databases sounded like infrastructure details.

Now:

  • I understand why they are essential
  • I understand the scalability problem they solve
  • I understand how retrieval quality impacts generation quality

This is Day 12 of my AI engineering journey — and this was the first time RAG systems started feeling like real production architectures instead of isolated concepts.
