Day 12 — Why RAG Systems Need Vector Databases
AI Engineering — Day by Day
My journey to becoming an AI Engineer
After learning about embeddings and chunking, I reached an interesting point in my AI engineering journey.
I understood:
- How text becomes vectors
- How retrieval works semantically
- Why chunking affects answer quality
But then a much bigger question appeared:
What happens when the system has thousands or millions of chunks?
This is where I discovered:
Vector Databases
The Problem with Naive Retrieval
Initially, my retrieval pipeline looked something like this:
Query
↓
Generate embedding
↓
Compare against all embeddings
↓
Return closest match
This works perfectly for:
- 5 chunks
- 20 chunks
- Small experiments
But it quickly breaks at scale.
Imagine:
100,000+ chunks
Now for every query:
- The system compares against every vector
- Latency increases
- Memory usage grows
- Performance drops significantly
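To make this concrete, here's a minimal brute-force retrieval sketch (using random vectors in place of real embeddings, purely for illustration). Every query does one comparison per stored vector, so the work grows linearly with the corpus:

```python
import numpy as np

# 100,000 chunks with 384-dimensional embeddings (random stand-ins)
num_chunks, dim = 100_000, 384
chunk_embeddings = np.random.rand(num_chunks, dim).astype("float32")

# Normalize once so a dot product equals cosine similarity
normed = chunk_embeddings / np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)

def brute_force_search(query_embedding: np.ndarray, k: int = 5):
    query = query_embedding / np.linalg.norm(query_embedding)
    scores = normed @ query               # one comparison per stored vector
    top_k = np.argsort(scores)[-k:][::-1] # best k, highest score first
    return top_k, scores[top_k]

indices, scores = brute_force_search(np.random.rand(dim).astype("float32"))
print(indices, scores)
```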
At this point, I realized:
RAG systems need efficient retrieval infrastructure.
What is a Vector Database?
A vector database stores embeddings and performs efficient similarity search.
Traditional databases search using:
Exact values or keywords
Vector databases search using:
Semantic similarity
That made me wonder: what exactly is the difference between a traditional database and a vector database? With the help of a GenAI tool, I put together the table below.
Traditional Search vs Vector Search
| Traditional Search | Vector Search |
|---|---|
| Keyword matching | Meaning matching |
| Exact terms | Semantic similarity |
| "refund" | "refund", "money back", "return" |
| String comparison | Vector similarity |
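To see the difference in action, here's a small sketch (the model name is just a common default, and the documents are made up for illustration):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our return policy allows refunds within 30 days.",
    "How do I get my money back after a purchase?",
    "Shipping usually takes 3-5 business days.",
]

# Keyword search: only literal matches of "refund" are found
print([d for d in documents if "refund" in d.lower()])

# Vector search: "money back" surfaces too, because meaning is compared
doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode("refund", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for doc, score in sorted(zip(documents, scores), key=lambda x: float(x[1]), reverse=True):
    print(f"{float(score):.3f}  {doc}")
```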
This difference completely changed how I think about retrieval systems.
The Full Retrieval Pipeline
At this point, the RAG pipeline started making much more sense to me:
Documents
↓
Chunking
↓
Embeddings
↓
Store in Vector Database
↓
User Query
↓
Query Embedding
↓
Similarity Search
↓
Top-k Chunks
↓
LLM Generation
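As a rough sketch, the whole pipeline maps onto a handful of functions. The bodies are stubbed out here; wiring them together is exactly what the next post will cover:

```python
def chunk(documents: list[str], size: int = 500) -> list[str]:
    # Naive fixed-size chunking, purely illustrative
    return [doc[i:i + size] for doc in documents for i in range(0, len(doc), size)]

def embed(texts: list[str]):
    ...  # e.g. a sentence-transformers model.encode(texts)

def store(embeddings, chunks) -> None:
    ...  # add vectors to a FAISS index, keep chunks/metadata alongside

def retrieve(query: str, k: int = 5) -> list[str]:
    ...  # embed the query, similarity-search the index, return top-k chunks

def generate(query: str, context: list[str]) -> str:
    ...  # pass the query plus retrieved chunks to an LLM
```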
This was the moment retrieval stopped feeling like “magic”.
Why Vector Databases Exist
The biggest realization for me was:
Vector databases do not exist because embeddings are cool.
They exist because brute-force retrieval does not scale.
Exact Search vs Approximate Search
This was another major concept I learned.
Exact Search
Compare query against ALL vectors
Pros:
- Highly accurate
Cons:
- Very slow at large scale
Approximate Nearest Neighbor (ANN)
Instead of checking every vector:
- Use indexing structures
- Search nearby regions only
Pros:
- Extremely fast
Cons:
- May miss the mathematically perfect match
This introduced another important engineering tradeoff:
Real-world AI systems often trade a small amount of accuracy for massive speed improvements.
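FAISS makes this tradeoff easy to see side by side. Here's a minimal sketch with random vectors (the nlist and nprobe values are illustrative, not tuned):

```python
import faiss
import numpy as np

dim = 384
vectors = np.random.rand(100_000, dim).astype("float32")
query = np.random.rand(1, dim).astype("float32")

# Exact search: compares the query against every stored vector
exact_index = faiss.IndexFlatL2(dim)
exact_index.add(vectors)

# Approximate search (IVF): vectors are clustered, and each query
# only probes a few clusters instead of scanning everything
nlist = 100                              # number of clusters
quantizer = faiss.IndexFlatL2(dim)
ann_index = faiss.IndexIVFFlat(quantizer, dim, nlist)
ann_index.train(vectors)                 # IVF must be trained before adding
ann_index.add(vectors)
ann_index.nprobe = 10                    # clusters searched per query

_, exact_ids = exact_index.search(query, 5)
_, ann_ids = ann_index.search(query, 5)
print(exact_ids[0])  # ground truth
print(ann_ids[0])    # usually overlaps heavily, but not guaranteed identical
```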
My First FAISS Experiment
To understand this practically, I built a small vector search pipeline locally using:
- Sentence Transformers
- FAISS
The workflow was:
Generate embeddings
↓
Store vectors in FAISS index
↓
Convert query into embedding
↓
Search nearest vectors
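Here's a minimal sketch of that workflow (the chunks below are placeholders I made up; the actual experiment lives in the repo linked at the end of this post):

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Refunds are processed within 5 business days.",
    "Items can be returned within 30 days of delivery.",
    "We ship worldwide with standard and express options.",
]

# 1. Generate embeddings
embeddings = model.encode(chunks).astype("float32")

# 2. Store vectors in a FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# 3. Convert the query into an embedding
query = model.encode(["How do I get my money back?"]).astype("float32")

# 4. Search the nearest vectors
distances, ids = index.search(query, 2)
for rank, idx in enumerate(ids[0]):
    print(f"{rank + 1}. {chunks[idx]} (L2 distance {distances[0][rank]:.3f})")
```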
And seeing similarity search happen locally felt like a major milestone.
What Actually Gets Stored?
Initially, I thought vector databases stored “documents”.
But that’s not entirely correct.
A vector database typically stores:
| Component | Purpose |
|---|---|
| Embeddings | Semantic representation |
| Metadata | Source info, tags, chunk IDs |
| References | Link back to original text |
| Indexes | Efficient similarity search |
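With a bare FAISS index, for example, only the vectors live in the index itself; a common pattern (sketched below) is to keep metadata and references in a parallel structure keyed by vector position. Managed vector databases store these fields natively:

```python
import faiss
import numpy as np

dim = 384
index = faiss.IndexFlatL2(dim)
records = []  # parallel list: records[i] describes the i-th vector in the index

def add_chunk(embedding: np.ndarray, text: str, source: str, chunk_id: str) -> None:
    index.add(embedding.reshape(1, dim).astype("float32"))
    records.append({"text": text, "source": source, "chunk_id": chunk_id})

add_chunk(np.random.rand(dim), "Refunds take 5 business days.", "faq.md", "faq-001")

# A search returns positions, which map straight back to the metadata
_, ids = index.search(np.random.rand(1, dim).astype("float32"), 1)
print(records[ids[0][0]])
```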
A Huge Retrieval Problem I Realized
Something else became very clear during this learning:
Retrieval quality is not only about finding relevant chunks.
It is also about avoiding irrelevant chunks.
Suppose:
- Top 5 chunks are retrieved
- Only 1 chunk is actually relevant
This creates multiple problems:
- Context pollution
- Attention dilution
- Noisy generation
- Wasted context window tokens
At this point, I started understanding:
Bad retrieval → polluted context → degraded generation
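One simple mitigation (a sketch with made-up scores) is to filter retrieved chunks by a minimum similarity score instead of blindly passing the full top-k to the LLM:

```python
def filter_by_score(chunks: list[str], scores: list[float], threshold: float = 0.5):
    # Keep only chunks whose similarity clears the threshold,
    # so weak matches never pollute the prompt context
    return [(c, s) for c, s in zip(chunks, scores) if s >= threshold]

retrieved = ["refund policy ...", "shipping info ...", "careers page ..."]
similarities = [0.82, 0.41, 0.17]  # hypothetical cosine similarities

print(filter_by_score(retrieved, similarities))
# [('refund policy ...', 0.82)]  only the relevant chunk survives
```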
Retrieval Precision vs Recall
This introduced another important tradeoff:
| Concept | Meaning |
|---|---|
| Precision | How many of the retrieved chunks are actually relevant |
| Recall | How many of the relevant chunks were actually retrieved |
Retrieving too many chunks:
- Higher recall
- More noise
Retrieving too few chunks:
- Higher precision
- Risk of missing information
This was another reminder that:
RAG systems are full of tradeoffs.
Code
I’ve uploaded the complete FAISS experiment and retrieval pipeline here:
VectorDB Code example
The Biggest Insight I Got
At this point, I can clearly see how all the pieces connect:
Chunking
↓
Controls embedding quality
Embeddings
↓
Control semantic retrieval
Vector Database
↓
Controls scalable retrieval
And together:
They form the foundation of modern RAG systems.
What’s Next
Now that I understand scalable retrieval, the next step is:
Connecting everything into one complete RAG pipeline.
In the next post (Day 13), I’ll build:
- Query → Embed → Retrieve → Generate
- A complete local RAG system
Final Thought
Before today, vector databases sounded like infrastructure details.
Now:
- I understand why they are essential
- I understand the scalability problem they solve
- I understand how retrieval quality impacts generation quality
This is Day 12 of my AI engineering journey — and this was the first time RAG systems started feeling like real production architectures instead of isolated concepts.