Day 10 — Chunking in RAG (The Most Underrated Part of AI Systems)
AI Engineering — Day by Day
My journey to becoming an AI Engineer
After learning about embeddings and semantic search, I started feeling like I finally understood how retrieval works.
But then I realized something important:
Even perfect embeddings cannot save a badly chunked system.
And honestly, this completely changed how I think about RAG pipelines.
The Question That Started Everything
Once I understood embeddings, the next question became:
What exactly are we embedding and retrieving?
The answer:
Chunks of text
And this process of splitting documents into smaller pieces is called:
Chunking
Why Chunking Exists
Documents are usually:
- Large
- Unstructured
- Too big for direct retrieval
For example:
- A 100-page PDF
- A large knowledge base
- Long policy documents
We cannot simply embed an entire document as one giant block.
So, we split it into smaller meaningful units.
What I Initially Thought
At first, chunking sounded trivial to me.
I thought:
“Just split text every few hundred words.”
But the deeper I explored, the more I realized:
Chunking is not text splitting.
It is context preservation engineering.
What Happens with Large Chunks
Suppose one chunk contains:
- Refund policy
- Shipping policy
- Account setup
Now imagine the user asks:
What is the refund policy?
The embedding generated for this chunk becomes:
A mixed representation of multiple topics
This creates a problem:
- The semantic meaning becomes diluted
- Retrieval accuracy decreases
This was a major realization for me:
Larger chunks don’t always mean better context.
What Happens with Small Chunks
Now let’s go to the opposite extreme.
Suppose the chunk is:
"Returned within 7 days"
What’s missing?
- What is being returned?
- Under what conditions?
This leads to:
- Loss of surrounding context
- Fragmented retrieval
- Incomplete answers
At this point, I realized:
Chunks that are too small improve precision… but lose meaning.
Why Overlapping Chunks Matter
This was one of the most interesting concepts.
Instead of splitting like this:
Chunk 1 → 1–500
Chunk 2 → 501–1000
We overlap:
Chunk 1 → 1–500
Chunk 2 → 400–900
Why?
Because ideas often continue across boundaries.
Overlap helps:
- Preserve continuity
- Reduce context loss
- Improve retrieval reliability
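The overlap idea above can be sketched in a few lines. This is a minimal character-based splitter (hypothetical helper, not from any specific library), where each chunk re-includes the tail of the previous one so that sentences crossing a boundary survive intact in at least one chunk:

```python
def chunk_with_overlap(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks, where each chunk
    starts `overlap` characters before the previous one ended."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than the chunk size")
    chunks = []
    step = size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With `size=500` and `overlap=100`, the last 100 characters of chunk 1 are also the first 100 characters of chunk 2 — exactly the `1–500` / `400–900` split shown above.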
Types of Chunking
1. Fixed Chunking
Split based on:
- Character count
- Token count
Example:
Every 500 tokens
Pros:
- Simple
- Fast
Cons:
- Breaks meaning
- Ignores structure
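A bare-bones fixed chunker might look like this sketch. Whitespace-separated words stand in for tokens here; a real pipeline would count tokens with the embedding model's own tokenizer instead:

```python
def fixed_chunks(text: str, max_tokens: int = 500) -> list[str]:
    """Split text every `max_tokens` words, ignoring structure entirely.
    Note: whitespace words approximate tokens; production code would use
    the embedding model's tokenizer for exact counts."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

Simple and fast — but as the cons above note, the cut can land in the middle of a sentence or a policy section, which is precisely what semantic chunking tries to avoid.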
2. Semantic Chunking
Split based on:
- Paragraphs
- Topics
- Meaning
Pros:
- Preserves context
- Improves retrieval quality
Cons:
- More complex
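A lightweight version of semantic chunking splits on paragraph boundaries and merges short paragraphs so each chunk remains a coherent unit. This is a rough sketch (the `min_chars` threshold and blank-line splitting are my own simplifying assumptions; fuller approaches cluster by topic or embedding similarity):

```python
def paragraph_chunks(text: str, min_chars: int = 200) -> list[str]:
    """Split on blank lines, then merge consecutive short paragraphs
    until each chunk reaches a minimum size, so no chunk is a lone
    fragment like "Returned within 7 days"."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # accumulate paragraphs into the current chunk
        current = f"{current}\n\n{para}".strip() if current else para
        if len(current) >= min_chars:
            chunks.append(current)
            current = ""
    if current:  # flush whatever is left over
        chunks.append(current)
    return chunks
```

The extra complexity buys context preservation: a refund-policy heading stays attached to the sentences that explain it, instead of being split off into its own tiny chunk.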
Questions I Had While Learning
Why do large chunks reduce retrieval accuracy?
Because large chunks contain multiple topics, causing embeddings to represent a broad mixture of meanings instead of a focused semantic concept.
Why do very small chunks reduce answer quality?
Because they often lose surrounding context and relationships between ideas, resulting in fragmented retrieval and incomplete answers.
Why is overlap important?
Overlap preserves continuity between chunks and ensures important contextual information spanning chunk boundaries is not lost during retrieval.
The Biggest Insight I Got
At this point, something became very clear:
Most RAG failures are not model failures.
They are retrieval and chunking failures.
This changed my perspective completely.
Earlier:
- I focused mainly on prompts and models
Now:
- I understand the retrieval pipeline is equally important
What’s Next
Now that I understand chunking conceptually, the next step is:
Actually implementing and experimenting with chunking strategies.
In the next post, I’ll:
- Create different chunking strategies
- Compare retrieval quality
- Observe real-world failures
Final Thought
Before today, chunking felt like a preprocessing step.
Now it feels like:
One of the most important design decisions in a RAG system.
This is Day 10 of my AI engineering journey — and this concept completely changed how I think about retrieval systems.