JavaScript Solutions, Competitive programming in JavaScript, MCQ in JS

Tuesday, 21 April 2026

Day 1 - AI Engineering Journey - How Large Language Models (LLMs) Actually Work — Explained Simply

How Large Language Models (LLMs) Actually Work — Explained Simply (But Correctly)

This is Day 1 of my AI engineering series. In this post I'll cover how an LLM actually works under the hood.


Most people say things like “LLMs understand language” or “they think like humans.”
That sounds nice — but it’s not how they actually work.

Let’s break it down properly.


The Core Idea

At its heart, an LLM is not thinking, reasoning, or understanding.

It is a system that does one thing extremely well:

It predicts the next token given the previous tokens.

That’s it.

Everything else — conversations, code, reasoning — emerges from this single capability.


🔤 Step 1: Text → Tokens

Before processing, your input is converted into tokens.

Tokens are not exactly words. They can be:

  • Full words → cat
  • Parts of words → un, believ, able
  • Symbols → +, =, ;

Example:

"unbelievable" → ["un", "believ", "able"]

This matters because:

  • Cost is based on tokens
  • Models have token limits (context window)
  • Poor tokenization can affect output quality
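Since this blog is all about JavaScript, here's a toy greedy longest-match tokenizer in JS. The vocabulary is tiny and hand-made purely for illustration; real models use learned BPE/WordPiece vocabularies with tens of thousands of entries.

```javascript
// Toy greedy longest-match tokenizer over a hand-made vocabulary.
const vocab = new Set(["un", "believ", "able", "the", "cat", " "]);

function tokenize(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    // Try the longest vocabulary entry that matches at position i.
    let match = null;
    for (let len = text.length - i; len > 0; len--) {
      const piece = text.slice(i, i + len);
      if (vocab.has(piece)) { match = piece; break; }
    }
    if (match === null) match = text[i]; // unknown char → its own token
    tokens.push(match);
    i += match.length;
  }
  return tokens;
}

console.log(tokenize("unbelievable")); // → ["un", "believ", "able"]
```

Notice how one word becomes three tokens — which is exactly why "word count" and "token count" are different things when you're billed per token.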

⚙️ Step 2: Transformer Processes the Input

Once tokenized, the input is passed into a transformer model.

The transformer:

  • Looks at the entire sequence at once
  • Weighs relationships between tokens using attention
  • Builds a contextual understanding of the input

Example:

“The animal didn’t cross the road because it was tired.”

Attention lets the model work out that:

  • “it” refers to animal, not road
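Here's a minimal sketch of the attention math behind that: a scaled dot-product followed by a softmax, computing how much weight a query token ("it") puts on each candidate token. The vectors here are hand-picked to make the point, not learned by any real model.

```javascript
// Toy scaled dot-product attention for one query token against a set of keys.
function attentionWeights(query, keys) {
  const d = query.length;
  const scores = keys.map(k =>
    k.reduce((sum, kv, i) => sum + kv * query[i], 0) / Math.sqrt(d)
  );
  const maxS = Math.max(...scores);
  const exps = scores.map(s => Math.exp(s - maxS)); // numerically stable softmax
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

const itQuery = [1, 0];                  // hand-picked query vector for "it"
const keys = [[0.9, 0.1], [0.1, 0.9]];   // hand-picked keys: "animal", "road"
console.log(attentionWeights(itQuery, keys));
// "animal" gets the larger weight because its key aligns with the query
```

In a real transformer these vectors come from learned weight matrices, and this happens in parallel across many heads and layers — but the core computation is this small.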

📊 Step 3: Predicting the Next Token

The model does NOT generate full sentences.

Instead, it calculates:

What is the probability of each possible next token?

Example:

"The sky is ___"

blue → 0.7  
green → 0.1  
falling → 0.05  
pizza → 0.001  
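Using those illustrative numbers, the simplest decoding strategy (greedy decoding) just takes the highest-probability token:

```javascript
// The example distribution above, as a probability map (illustrative numbers).
const nextTokenProbs = { blue: 0.7, green: 0.1, falling: 0.05, pizza: 0.001 };

// Greedy decoding: always pick the single most likely next token.
function greedyPick(probs) {
  return Object.entries(probs)
    .reduce((best, cur) => (cur[1] > best[1] ? cur : best))[0];
}

console.log(greedyPick(nextTokenProbs)); // → "blue"
```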

🎲 Step 4: Sampling (Why Outputs Change)

The model doesn’t always pick the highest probability token.

Instead, it samples based on parameters like:

  • Temperature
    • Low → safer, deterministic
    • High → creative, risky
  • Top-k / Top-p
    • Restrict which tokens can be chosen

This is why:

The same prompt can produce different outputs
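A toy temperature sampler makes this concrete. One common formulation is to re-weight each probability as p^(1/T) and renormalize (equivalent to dividing the logits by T before the softmax); real implementations also apply top-k/top-p filtering, which is omitted here.

```javascript
// Toy temperature sampling over the "The sky is ___" distribution.
function sample(probs, temperature = 1.0) {
  const entries = Object.entries(probs);
  // Re-weight: p^(1/T). Low T sharpens toward the top token; high T flattens.
  const weights = entries.map(([, p]) => Math.pow(p, 1 / temperature));
  const total = weights.reduce((a, b) => a + b, 0);
  let r = Math.random() * total; // random draw → different outputs each run
  for (let i = 0; i < entries.length; i++) {
    r -= weights[i];
    if (r <= 0) return entries[i][0];
  }
  return entries[entries.length - 1][0]; // float-rounding fallback
}

const probs = { blue: 0.7, green: 0.1, falling: 0.05, pizza: 0.001 };
console.log(sample(probs, 0.2)); // almost always "blue"
console.log(sample(probs, 2.0)); // noticeably more variety
```

The `Math.random()` call is the whole story: the distribution is fixed, but the draw is not — so the same prompt can produce different outputs.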

🔁 Step 5: Repeat the Loop

Once a token is selected:

  1. It gets appended to the sequence
  2. The model runs again
  3. Predicts the next token

This loop continues until the model emits a stop token or hits a length limit.
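The three steps above can be sketched as a loop. Here `fakeModel` is a hypothetical lookup table standing in for a full forward pass of the model:

```javascript
// Sketch of the autoregressive loop. `fakeModel` is a stand-in for the
// real model's "predict next token" step.
const fakeModel = {
  "The sky is": "blue",
  "The sky is blue": ".",
  "The sky is blue.": "<eos>",
};

function generate(prompt, maxTokens = 10) {
  let sequence = prompt;
  for (let step = 0; step < maxTokens; step++) {
    const next = fakeModel[sequence];              // 1. predict next token
    if (!next || next === "<eos>") break;          //    stop token ends the loop
    sequence += (next === "." ? "" : " ") + next;  // 2. append it
  }                                                // 3. run again on the new sequence
  return sequence;
}

console.log(generate("The sky is")); // → "The sky is blue."
```

The key point: the model never sees "the whole answer" — it only ever predicts one token, over and over, with its own output fed back in as input.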


⚠️ Why LLMs Hallucinate

LLMs are not optimized for truth.

They are optimized for:

Generating the most probable continuation

So if something sounds right, the model may generate it — even if it’s wrong.

Reasons include:

  • No real-world grounding
  • Imperfect training data
  • No built-in verification system

📏 Context Window Limitation

LLMs can only process a limited number of tokens at once.

If input is too large:

  • Older parts get truncated
  • Important context is lost

Even within limits:

  • Too much information → weaker attention → poorer answers
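Truncation itself is trivial to picture in code. The limit below is made up (real models have windows of thousands to millions of tokens), and real systems often use smarter strategies like summarizing old turns instead of dropping them:

```javascript
// Sketch of context-window truncation: when the sequence exceeds the
// limit, the oldest tokens are simply dropped.
const CONTEXT_LIMIT = 4; // made-up number; real limits are far larger

function fitToContext(tokens) {
  if (tokens.length <= CONTEXT_LIMIT) return tokens;
  return tokens.slice(tokens.length - CONTEXT_LIMIT); // keep only the newest
}

const story = ["Once", "upon", "a", "time", "there", "was"];
console.log(fitToContext(story)); // → ["a", "time", "there", "was"]
```

"Once upon" is gone — and so is any answer that depended on it. That's the context-window problem in one line of `slice`.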

🧩 Final Mental Model

If you remember just one thing, remember this:

An LLM converts text into tokens, processes them using a transformer, predicts the probability of the next token, selects one using sampling, and repeats this process step-by-step to generate output.

🚀 Why This Matters

Understanding this unlocks:

  • Better prompt engineering
  • Building RAG systems
  • Designing AI agents
  • Debugging hallucinations

💭 Final Thought

LLMs don’t “know” things.

They are incredibly powerful pattern predictors.

And once you understand that — you stop using them blindly, and start using them like an engineer.


If you're learning AI engineering, this is your foundation. Everything else builds on top of this.
