Day 12 — Why RAG Systems Need Vector Databases
AI Engineering — Day by Day
My journey to becoming an AI Engineer
After learning about embeddings and chunking, I reached an interesting point in my AI engineering journey.
I understood:
- How text becomes vectors
- How retrieval works semantically
- Why chunking affects answer quality
But then a much bigger question appeared:
What happens when the system has thousands or millions of chunks?
This is where I discovered:
Vector Databases
The Problem with Naive Retrieval
Initially, my retrieval pipeline looked something like this:
Query
↓
Generate embedding
↓
Compare against all embeddings
↓
Return closest match
This works perfectly for:
- 5 chunks
- 20 chunks
- Small experiments
But it quickly breaks at scale.
Imagine:
100,000+ chunks
Now for every query:
- The system compares against every vector
- Latency increases
- Memory usage grows
- Performance drops significantly
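To make this concrete, here's a minimal brute-force retrieval sketch (using random vectors in place of real embeddings, purely for illustration). Every query does one comparison per stored vector, so the work grows linearly with the corpus:

```python
import numpy as np

# 100,000 chunks with 384-dimensional embeddings (random stand-ins)
num_chunks, dim = 100_000, 384
chunk_embeddings = np.random.rand(num_chunks, dim).astype("float32")

# Normalize once so a dot product equals cosine similarity
normed = chunk_embeddings / np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)

def brute_force_search(query_embedding: np.ndarray, k: int = 5):
    query = query_embedding / np.linalg.norm(query_embedding)
    scores = normed @ query               # one comparison per stored vector
    top_k = np.argsort(scores)[-k:][::-1] # best k, highest score first
    return top_k, scores[top_k]

indices, scores = brute_force_search(np.random.rand(dim).astype("float32"))
print(indices, scores)
```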
At this point, I realized:
RAG systems need efficient retrieval infrastructure.
What is a Vector Database?
A vector database stores embeddings and performs efficient similarity search.
Traditional databases search using:
Exact values or keywords
Vector databases search using:
Semantic similarity
That made me wonder: what exactly is the difference between a traditional database and a vector database? With the help of a GenAI tool, I put together the table below.
Traditional Search vs Vector Search
| Traditional Search | Vector Search |
|---|---|
| Keyword matching | Meaning matching |
| Exact terms | Semantic similarity |
| "refund" | "refund", "money back", "return" |
| String comparison | Vector similarity |
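To see the difference in action, here's a small sketch (the model name is just a common default, and the documents are made up for illustration):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our return policy allows refunds within 30 days.",
    "How do I get my money back after a purchase?",
    "Shipping usually takes 3-5 business days.",
]

# Keyword search: only literal matches of "refund" are found
print([d for d in documents if "refund" in d.lower()])

# Vector search: "money back" surfaces too, because meaning is compared
doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode("refund", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for doc, score in sorted(zip(documents, scores), key=lambda x: float(x[1]), reverse=True):
    print(f"{float(score):.3f}  {doc}")
```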
This difference completely changed how I think about retrieval systems.
The Full Retrieval Pipeline
At this point, the RAG pipeline started making much more sense to me:
Documents
↓
Chunking
↓
Embeddings
↓
Store in Vector Database
↓
User Query
↓
Query Embedding
↓
Similarity Search
↓
Top-k Chunks
↓
LLM Generation
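As a rough sketch, the whole pipeline maps onto a handful of functions. The bodies are stubbed out here; wiring them together is exactly what the next post will cover:

```python
def chunk(documents: list[str], size: int = 500) -> list[str]:
    # Naive fixed-size chunking, purely illustrative
    return [doc[i:i + size] for doc in documents for i in range(0, len(doc), size)]

def embed(texts: list[str]):
    ...  # e.g. a sentence-transformers model.encode(texts)

def store(embeddings, chunks) -> None:
    ...  # add vectors to a FAISS index, keep chunks/metadata alongside

def retrieve(query: str, k: int = 5) -> list[str]:
    ...  # embed the query, similarity-search the index, return top-k chunks

def generate(query: str, context: list[str]) -> str:
    ...  # pass the query plus retrieved chunks to an LLM
```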
This was the moment retrieval stopped feeling like “magic”.
Why Vector Databases Exist
The biggest realization for me was:
Vector databases do not exist because embeddings are cool.
They exist because brute-force retrieval does not scale.
Exact Search vs Approximate Search
This was another major concept I learned.
Exact Search
Compare query against ALL vectors
Pros:
- Highly accurate
Cons:
- Very slow at large scale
Approximate Nearest Neighbor (ANN)
Instead of checking every vector:
- Use indexing structures
- Search nearby regions only
Pros:
- Extremely fast
Cons:
- May miss the mathematically perfect match
This introduced another important engineering tradeoff:
Real-world AI systems often trade a small amount of accuracy for massive speed improvements.
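FAISS makes this tradeoff easy to see side by side. Here's a minimal sketch with random vectors (the nlist and nprobe values are illustrative, not tuned):

```python
import faiss
import numpy as np

dim = 384
vectors = np.random.rand(100_000, dim).astype("float32")
query = np.random.rand(1, dim).astype("float32")

# Exact search: compares the query against every stored vector
exact_index = faiss.IndexFlatL2(dim)
exact_index.add(vectors)

# Approximate search (IVF): vectors are clustered, and each query
# only probes a few clusters instead of scanning everything
nlist = 100                              # number of clusters
quantizer = faiss.IndexFlatL2(dim)
ann_index = faiss.IndexIVFFlat(quantizer, dim, nlist)
ann_index.train(vectors)                 # IVF must be trained before adding
ann_index.add(vectors)
ann_index.nprobe = 10                    # clusters searched per query

_, exact_ids = exact_index.search(query, 5)
_, ann_ids = ann_index.search(query, 5)
print(exact_ids[0])  # ground truth
print(ann_ids[0])    # usually overlaps heavily, but not guaranteed identical
```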
My First FAISS Experiment
To understand this practically, I built a small vector search pipeline locally using:
- Sentence Transformers
- FAISS
The workflow was:
Generate embeddings
↓
Store vectors in FAISS index
↓
Convert query into embedding
↓
Search nearest vectors
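Here's a minimal sketch of that workflow (the chunks below are placeholders I made up; the actual experiment lives in the repo linked at the end of this post):

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Refunds are processed within 5 business days.",
    "Items can be returned within 30 days of delivery.",
    "We ship worldwide with standard and express options.",
]

# 1. Generate embeddings
embeddings = model.encode(chunks).astype("float32")

# 2. Store vectors in a FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# 3. Convert the query into an embedding
query = model.encode(["How do I get my money back?"]).astype("float32")

# 4. Search the nearest vectors
distances, ids = index.search(query, 2)
for rank, idx in enumerate(ids[0]):
    print(f"{rank + 1}. {chunks[idx]} (L2 distance {distances[0][rank]:.3f})")
```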
And seeing similarity search happen locally felt like a major milestone.
What Actually Gets Stored?
Initially, I thought vector databases stored “documents”.
But that’s not entirely correct.
A vector database typically stores:
| Component | Purpose |
|---|---|
| Embeddings | Semantic representation |
| Metadata | Source info, tags, chunk IDs |
| References | Link back to original text |
| Indexes | Efficient similarity search |
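With a bare FAISS index, for example, only the vectors live in the index itself; a common pattern (sketched below) is to keep metadata and references in a parallel structure keyed by vector position. Managed vector databases store these fields natively:

```python
import faiss
import numpy as np

dim = 384
index = faiss.IndexFlatL2(dim)
records = []  # parallel list: records[i] describes the i-th vector in the index

def add_chunk(embedding: np.ndarray, text: str, source: str, chunk_id: str) -> None:
    index.add(embedding.reshape(1, dim).astype("float32"))
    records.append({"text": text, "source": source, "chunk_id": chunk_id})

add_chunk(np.random.rand(dim), "Refunds take 5 business days.", "faq.md", "faq-001")

# A search returns positions, which map straight back to the metadata
_, ids = index.search(np.random.rand(1, dim).astype("float32"), 1)
print(records[ids[0][0]])
```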
A Huge Retrieval Problem I Realized
Something else became very clear during this learning:
Retrieval quality is not only about finding relevant chunks.
It is also about avoiding irrelevant chunks.
Suppose:
- Top 5 chunks are retrieved
- Only 1 chunk is actually relevant
This creates multiple problems:
- Context pollution
- Attention dilution
- Noisy generation
- Wasted context window tokens
At this point, I started understanding:
Bad retrieval → polluted context → degraded generation
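One simple mitigation (a sketch with made-up scores) is to filter retrieved chunks by a minimum similarity score instead of blindly passing the full top-k to the LLM:

```python
def filter_by_score(chunks: list[str], scores: list[float], threshold: float = 0.5):
    # Keep only chunks whose similarity clears the threshold,
    # so weak matches never pollute the prompt context
    return [(c, s) for c, s in zip(chunks, scores) if s >= threshold]

retrieved = ["refund policy ...", "shipping info ...", "careers page ..."]
similarities = [0.82, 0.41, 0.17]  # hypothetical cosine similarities

print(filter_by_score(retrieved, similarities))
# [('refund policy ...', 0.82)]  only the relevant chunk survives
```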
Retrieval Precision vs Recall
This introduced another important tradeoff:
| Concept | Meaning |
|---|---|
| Precision | How many of the retrieved chunks are actually relevant |
| Recall | How many of the relevant chunks were actually retrieved |
Retrieving too many chunks:
- Higher recall
- More noise
Retrieving too few chunks:
- Higher precision
- Risk of missing information
This was another reminder that:
RAG systems are full of tradeoffs.
Code
I’ve uploaded the complete FAISS experiment and retrieval pipeline here:
VectorDB Code example
The Biggest Insight I Got
At this point, I can clearly see how all the pieces connect:
Chunking
↓
Controls embedding quality
Embeddings
↓
Control semantic retrieval
Vector Database
↓
Controls scalable retrieval
And together:
They form the foundation of modern RAG systems.
What’s Next
Now that I understand scalable retrieval, the next step is:
Connecting everything into one complete RAG pipeline.
In the next post (Day 13), I’ll build:
- Query → Embed → Retrieve → Generate
- A complete local RAG system
Final Thought
Before today, vector databases sounded like infrastructure details.
Now:
- I understand why they are essential
- I understand the scalability problem they solve
- I understand how retrieval quality impacts generation quality
This is Day 12 of my AI engineering journey — and this was the first time RAG systems started feeling like real production architectures instead of isolated concepts.