How Does It Work? (Part 2): Attention

How Does an LLM Work?

Why the model knows *France* matters more than *the*. The attention mechanism explained: queries, keys, values, multi-head attention.

Lesson 5 — How Does It Work? (Part 2): Attention

Understanding the Complex: How Does an LLM Work?


Back to the anchor example:

"The capital of France is ___"

The model has tokenized the sentence and converted each token to a vector. Five vectors, entering a 96-layer network. Now what?

The problem: without additional machinery, the model would treat all five tokens equally. In effect, it would average them. And an average of *The*, *capital*, *of*, *France*, and *is* would tell you nothing useful about what comes next, because *France* would count for no more than *The*.

The solution: attention.
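
To make the contrast concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not how a real model is implemented: the 3-dimensional vectors and the `query` vector are invented by hand for illustration, whereas a real model learns its vectors and computes its queries from learned parameters. The sketch compares plain averaging (every token weighted 1/5) with attention-style weighting (each token scored against a query, then normalized with softmax).

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real models learn vectors with far more dimensions.
tokens = ["The", "capital", "of", "France", "is"]
vectors = [
    [0.1, 0.0, 0.2],  # The
    [0.7, 0.1, 0.0],  # capital
    [0.0, 0.1, 0.1],  # of
    [0.2, 0.9, 0.3],  # France
    [0.0, 0.0, 0.4],  # is
]

def weighted_sum(weights, vecs):
    """Combine the vectors using one weight per token."""
    dim = len(vecs[0])
    return [sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(dim)]

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Plain averaging: every token gets the same weight, 1/5.
uniform_weights = [1 / len(tokens)] * len(tokens)

# Attention-style weighting: score each token against a query vector
# (hypothetical, hand-picked here), then normalize the scores with softmax.
query = [1.0, 4.0, 0.0]
scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
attention_weights = softmax(scores)

averaged = weighted_sum(uniform_weights, vectors)
attended = weighted_sum(attention_weights, vectors)

for tok, u, a in zip(tokens, uniform_weights, attention_weights):
    print(f"{tok:8s} uniform={u:.2f} attention={a:.2f}")
```

With these hand-picked numbers, *France* receives by far the largest attention weight, so the combined vector is dominated by *France* instead of being diluted equally across all five tokens. The query was chosen to produce that outcome; the point of the sketch is only the mechanism of unequal weights.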

