JavaScript Solutions, Competitive programming in JavaScript, MCQ in JS

Wednesday, 22 April 2026

Day 2- AI Engineering Journey- Tokens, Context Window & Sampling (Hidden Mechanics of LLMs)

If you’ve understood that LLMs predict the next token, the next question is:

What actually controls their behavior?

Three things shape every output:

  • Tokenization
  • Context window
  • Sampling

Most people ignore these — and that’s why things break.


🔤 1. Tokenization

LLMs don’t read words. They read tokens.

"unbelievable" → ["un", "believ", "able"]

Even small changes matter:

  • "Hello" ≠ " Hello"
  • "AI" ≠ "AI."

Why it matters:

  • Cost and rate limits are per token
  • Formatting changes can shift tokenization and change the output
  • Prompts that look identical can tokenize differently, making debugging tricky

The model works on token patterns, not language.
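To make this concrete, here is a toy greedy longest-match tokenizer over a tiny hand-made vocabulary. Both the vocabulary and the matching rule are simplifications for illustration; real models learn subword merges (e.g. BPE) from data:

```javascript
// Toy greedy longest-match tokenizer over a tiny hand-made vocabulary.
// Real models use learned subword merges; this only illustrates that text
// becomes pieces, and that whitespace and punctuation are part of tokens.
const vocab = ["un", "believ", "able", "Hello", " Hello", "AI", ".", " "];

function tokenize(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    // Find the longest vocab entry that matches at position i.
    let best = null;
    for (const piece of vocab) {
      if (text.startsWith(piece, i) && (best === null || piece.length > best.length)) {
        best = piece;
      }
    }
    if (best === null) best = text[i]; // unknown character: fall back to 1 char
    tokens.push(best);
    i += best.length;
  }
  return tokens;
}

console.log(tokenize("unbelievable")); // ["un", "believ", "able"]
console.log(tokenize("Hello"));        // ["Hello"]   (one token)
console.log(tokenize(" Hello"));       // [" Hello"]  (a different token)
console.log(tokenize("AI."));          // ["AI", "."] (two tokens, not one)
```

Note how the leading space makes `" Hello"` a completely different token from `"Hello"`, and the trailing period turns `"AI."` into two tokens.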

📏 2. Context Window

This is the model’s working memory: the maximum number of tokens it can attend to in a single request.

Two problems:

  • Hard limit: once the window fills, the oldest tokens are truncated and forgotten
  • Soft limit: well before the hard cap, too much context dilutes attention

Impact:

  • Long chats lose instructions
  • Big inputs → worse answers

The context window is a performance constraint, not just a limit.
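A common mitigation is a sliding window: drop the oldest turns first while pinning the system prompt so its instructions are never evicted. A minimal sketch, where `countTokens` is a rough one-token-per-word stand-in (real apps should count with the model's own tokenizer):

```javascript
// Rough token estimate: ~1 token per whitespace-separated word.
// A stand-in only; use the model's real tokenizer in production.
function countTokens(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Keep the chat inside a fixed token budget by dropping the oldest
// user/assistant turns. System messages are pinned and always kept.
function fitToWindow(messages, budget) {
  const system = messages.filter(m => m.role === "system");
  const rest = messages.filter(m => m.role !== "system");
  let used = system.reduce((n, m) => n + countTokens(m.content), 0);

  // Walk backwards from the newest turn, keeping turns while budget allows.
  const kept = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = countTokens(rest[i].content);
    if (used + cost > budget) break; // everything older is dropped
    kept.unshift(rest[i]);
    used += cost;
  }
  return [...system, ...kept];
}

const chat = [
  { role: "system", content: "Answer briefly" },
  { role: "user", content: "first question about tokenization basics" },
  { role: "assistant", content: "first answer" },
  { role: "user", content: "latest question" },
];
console.log(fitToWindow(chat, 8).map(m => m.role)); // ["system", "assistant", "user"]
```

With a budget of 8, the oldest user turn is evicted but the system prompt survives, which is exactly the property long chats lose when trimming is naive.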

🎲 3. Sampling

The model predicts probabilities — but doesn’t always pick the top one.

Controls:

  • Temperature: low = deterministic and safe, high = varied and creative
  • Top-k: sample only from the k most likely tokens
  • Top-p (nucleus): sample from the smallest set of tokens whose cumulative probability reaches p

Sampling controls how stable or creative the output is.
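The three controls can be sketched on a toy next-token distribution. The tokens and probabilities below are made up for illustration; real inference stacks apply these operations to logits over the full vocabulary:

```javascript
// Temperature re-weights the distribution: p^(1/T), then renormalize.
// T < 1 sharpens toward the top token; T > 1 flattens toward uniform.
function applyTemperature(probs, temperature) {
  const scaled = probs.map(p => Math.pow(p, 1 / temperature));
  const sum = scaled.reduce((a, b) => a + b, 0);
  return scaled.map(p => p / sum);
}

// Top-k: keep only the k most likely tokens.
function topK(entries, k) {
  return [...entries].sort((a, b) => b.p - a.p).slice(0, k);
}

// Top-p: keep the smallest set of tokens whose cumulative probability reaches p.
function topP(entries, p) {
  const sorted = [...entries].sort((a, b) => b.p - a.p);
  const kept = [];
  let cum = 0;
  for (const e of sorted) {
    kept.push(e);
    cum += e.p;
    if (cum >= p) break;
  }
  return kept;
}

const next = [
  { token: "the", p: 0.5 },
  { token: "a", p: 0.3 },
  { token: "banana", p: 0.15 },
  { token: "zebra", p: 0.05 },
];

console.log(topK(next, 2).map(e => e.token));   // ["the", "a"]
console.log(topP(next, 0.9).map(e => e.token)); // ["the", "a", "banana"]
console.log(applyTemperature([0.5, 0.3, 0.15, 0.05], 0.5)); // mass shifts to "the"
```

Note that top-p adapts to the shape of the distribution: when the model is confident it keeps few tokens, when it is uncertain it keeps many, which is why it is often preferred over a fixed top-k.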

⚠️ Why This Breaks Systems

Common mistakes:

  • Long prompts
  • Unnecessary tokens
  • High temperature

Result → unstable + hallucinated outputs


🧩 Final Mental Model

LLM behavior = Tokenization + Context + Sampling

💭 Final Thought

LLMs are not unpredictable.

They follow rules most people don’t see.

Once you understand these, you stop guessing and start controlling.

This is where prompt engineering becomes real engineering.
