Day 2 — Tokens, Context Window & Sampling (Hidden Mechanics of LLMs)
If you’ve understood that LLMs predict the next token, the next question is:
What actually controls their behavior?
Three things shape every output:
- Tokenization
- Context window
- Sampling
Most people ignore these — and that’s why things break.
1. Tokenization
LLMs don’t read words. They read tokens.
"unbelievable" → ["un", "believ", "able"]
Even small changes matter:
- "Hello" ≠ " Hello"
- "AI" ≠ "AI."
Why it matters:
- Cost is per token
- Formatting affects output
- Debugging becomes tricky
The model works on token patterns, not language.
2. Context Window
This is the model’s memory limit.
Two problems:
- Hard limit: Old data gets removed
- Soft limit: Too much info → weaker attention
Impact:
- Long chats lose instructions
- Big inputs → worse answers
Context window is a performance constraint, not just a limit.
3. Sampling
The model predicts probabilities — but doesn’t always pick the top one.
Controls:
- Temperature: low = safe, high = creative
- Top-k: restricts choices
- Top-p: probability-based selection
Sampling controls how stable or creative the output is.
Why This Breaks Systems
Common mistake:
- Long prompts
- Unnecessary tokens
- High temperature
Result → unstable + hallucinated outputs
Final Mental Model
LLM behavior = Tokenization + Context + Sampling
Final Thought
LLMs are not unpredictable.
They follow rules most people don’t see.
Once you understand these — you stop guessing and start controlling.
This is where prompt engineering becomes real engineering.
What's next:
Day 3 - AI Engineering Journey- Prompt Engineering Is Not What You Think
Comments
Post a Comment