Day 2 — Tokens, Context Window & Sampling (Hidden Mechanics of LLMs)
If you’ve understood that LLMs predict the next token, the next question is:
What actually controls their behavior?
Three things shape every output:
- Tokenization
- Context window
- Sampling
Most people ignore these — and that’s why things break.
🔤 1. Tokenization
LLMs don’t read words. They read tokens.
"unbelievable" → ["un", "believ", "able"]
Even small changes matter:
- "Hello" ≠ " Hello"
- "AI" ≠ "AI."
Why it matters:
- Cost is per token
- Formatting affects output
- Debugging becomes tricky
The model works on token patterns, not language.
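The idea above can be sketched with a toy greedy tokenizer. The vocabulary here is made up for illustration; real tokenizers (BPE, as in tiktoken) learn merges from data, but the effect is the same: whitespace and punctuation change the token split.

```python
# Hypothetical mini-vocabulary, for illustration only.
VOCAB = ["un", "believ", "able", "Hello", " Hello", "AI", "."]

def tokenize(text):
    """Greedy longest-match split of `text` into vocab tokens."""
    tokens = []
    i = 0
    while i < len(text):
        for tok in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
        else:
            tokens.append(text[i])  # unknown char -> its own token
            i += 1
    return tokens

print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(tokenize("Hello"))         # ['Hello']
print(tokenize(" Hello"))        # [' Hello'] -- a *different* token
print(tokenize("AI."))           # ['AI', '.'] -- two tokens, not one
```

Since billing is per token, `len(tokenize(text))` is also a rough cost model: more tokens, more money.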
📏 2. Context Window
This is the model’s memory limit.
Two problems:
- Hard limit: once the window is full, the oldest tokens get truncated
- Soft limit: even within the window, more input dilutes attention
Impact:
- Long chats lose instructions
- Big inputs → worse answers
Context window is a performance constraint, not just a limit.
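The hard limit behaves roughly like a sliding window: only the most recent tokens fit. A minimal sketch, with "token = whitespace-split word" as a deliberate simplification:

```python
MAX_TOKENS = 8  # tiny window, chosen to make truncation visible

def visible_context(messages, max_tokens=MAX_TOKENS):
    """Keep only the most recent tokens that fit in the window."""
    tokens = " ".join(messages).split()
    return tokens[-max_tokens:]

chat = [
    "Always answer in French.",  # the original instruction
    "What is the capital of Spain?",
    "Madrid. What about Italy and its neighbors?",
]
print(visible_context(chat))
# The oldest tokens are gone -- including "Always answer in French."
# This is exactly how long chats "forget" their instructions.
```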
🎲 3. Sampling
The model predicts probabilities — but doesn’t always pick the top one.
Controls:
- Temperature: rescales probabilities — low = safe and repetitive, high = creative and risky
- Top-k: sample only from the k most likely tokens
- Top-p (nucleus): sample from the smallest set of tokens whose probabilities sum to p
Sampling controls how stable or creative the output is.
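All three controls can be shown in one small sketch. The logits below are invented for illustration; the mechanics (softmax with temperature, then top-k / top-p filtering, then renormalized sampling) follow the standard recipe:

```python
import math
import random

# Toy next-token scores (logits), invented for this example.
logits = {"the": 2.0, "a": 1.5, "banana": 0.3, "qux": -1.0}

def sample(logits, temperature=1.0, top_k=None, top_p=None):
    """Softmax with temperature, then optional top-k / top-p filtering."""
    # Temperature rescales logits: low T sharpens, high T flattens.
    scaled = {t: l / temperature for t, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = sorted(((t, math.exp(v) / z) for t, v in scaled.items()),
                   key=lambda kv: kv[1], reverse=True)
    if top_k is not None:        # keep only the k most likely tokens
        probs = probs[:top_k]
    if top_p is not None:        # keep the smallest set with mass >= p
        kept, mass = [], 0.0
        for t, p in probs:
            kept.append((t, p))
            mass += p
            if mass >= top_p:
                break
        probs = kept
    total = sum(p for _, p in probs)   # renormalize after filtering
    r, acc = random.random() * total, 0.0
    for t, p in probs:
        acc += p
        if acc >= r:
            return t
    return probs[-1][0]

print(sample(logits, temperature=0.1))           # near-greedy: almost always "the"
print(sample(logits, temperature=2.0, top_k=3))  # flatter distribution, "qux" excluded
```

Note that `top_k=1` (or a tiny `top_p`) makes the output fully deterministic, which is what you want for reproducible pipelines.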
⚠️ Why This Breaks Systems
Common mistakes:
- Long prompts
- Unnecessary tokens
- High temperature
Result → unstable + hallucinated outputs
🧩 Final Mental Model
LLM behavior = Tokenization + Context + Sampling
💭 Final Thought
LLMs are not unpredictable.
They follow rules most people don’t see.
Once you understand these, you stop guessing and start controlling.
This is where prompt engineering becomes real engineering.