A clear mental model for how large language models actually work

A recurring problem in AI discussions — especially at leadership level — is that large language models are either treated as magic or dismissed as shallow pattern matchers. Both positions usually stem from the lack of a clear mental model of how these systems actually work.

A post shared by Andreas Horn (Head of AIOps, IBM) does an unusually good job of closing that gap by breaking LLMs down into their core mechanical components, without oversimplifying them.

At a high level, modern LLMs operate through a sequence of stages that transform text into mathematical representations, iteratively refine meaning through context and then generate outputs probabilistically.

Tokenisation and embeddings form the first step. Input text is broken into tokens, each of which is mapped into a high-dimensional vector space. This representation allows words with similar meanings or usage to cluster together mathematically, rather than being treated as isolated symbols.
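This first step can be sketched in a few lines. The vocabulary, token ids and vector values below are invented purely for illustration; real models use subword tokenisers (such as BPE) and learned embeddings with thousands of dimensions.

```python
# Hypothetical word-level tokeniser and embedding table.
# Real systems use subword tokenisation and learned, high-dimensional
# embeddings; the values here are toy stand-ins.
vocab = {"the": 0, "cat": 1, "sat": 2}
embedding_table = {
    0: [0.1, -0.3, 0.7],
    1: [0.9, 0.2, -0.1],
    2: [-0.4, 0.8, 0.5],
}

def tokenize(text):
    # Map each word to its integer token id.
    return [vocab[word] for word in text.lower().split()]

def embed(token_ids):
    # Look up the dense vector for each token id.
    return [embedding_table[t] for t in token_ids]

ids = tokenize("The cat sat")
vectors = embed(ids)  # one 3-dimensional vector per token
```

The key point the sketch preserves: after this stage, the model no longer operates on symbols but on vectors, and geometric closeness between vectors can encode similarity of meaning or usage.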

From there, self-attention mechanisms determine how tokens influence one another based on context. This is what allows a model to distinguish between different meanings of the same word depending on surrounding text: for example, “bank” in a financial sense versus a geographical one. Attention dynamically re-weights relationships between tokens rather than relying on fixed rules.
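The core computation can be illustrated with a minimal single-head sketch, assuming for simplicity that queries, keys and values are the token vectors themselves; real transformers first apply learned projection matrices and run many attention heads in parallel.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    # Simplified scaled dot-product self-attention.
    # Each output is a context-weighted blend of all token vectors.
    d = len(vectors[0])
    out = []
    for q in vectors:  # q plays the role of the query
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)  # how much each token matters here
        out.append([sum(w * v[j] for w, v in zip(weights, vectors))
                    for j in range(d)])
    return out

contextualised = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

The attention weights are recomputed for every input, which is what makes the re-weighting dynamic rather than rule-based: the same word receives a different blended representation in different sentences.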

Once contextualised, tokens pass through deep feed-forward neural layers. These layers progressively refine representations, learning increasingly abstract semantic relationships. This process is repeated across many layers, which is where the “deep” in deep learning becomes meaningful in practice.
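A single position-wise feed-forward sublayer can be sketched as follows, with toy hand-picked weights; in a real transformer these matrices are learned, far larger, and the block is repeated across dozens of layers interleaved with attention.

```python
def feed_forward(vector, w1, b1, w2, b2):
    # Two-layer MLP with a ReLU non-linearity, applied to each token's
    # vector independently of the other positions.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, vector)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

# Toy 2 -> 2 -> 2 weights chosen by hand for illustration.
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]

# The ReLU zeroes the negative component of the input vector.
refined = feed_forward([2.0, -3.0], w1, b1, w2, b2)
```

Stacking many such layers, each followed by attention, is what lets representations become progressively more abstract as they move through the network.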

Finally, the model performs prediction and sampling. The refined representation is used to generate a probability distribution over possible next tokens, from which outputs are sampled step by step. Importantly, this means LLMs do not retrieve answers; they generate them probabilistically based on learned structure.
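The sampling step can be sketched like this. The logits and temperature handling are simplified; production systems typically add further controls such as top-k or top-p filtering.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=None):
    # Softmax over temperature-scaled logits, then draw one token id.
    # Lower temperature sharpens the distribution (more deterministic);
    # higher temperature flattens it (more varied output).
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs)[0]

# With a very low temperature, the highest-scoring token dominates.
token = sample_next_token([5.0, 1.0, 0.0], temperature=0.01,
                          rng=random.Random(0))
```

Because the output is drawn from a distribution rather than looked up, the same prompt can yield different completions across runs, which is exactly the retrieval-versus-generation distinction made above.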

The value of this framing is not academic. A solid understanding of these mechanics is essential if you want to design AI systems that are scalable, controllable and responsible. Without it, organisations tend to over-index on surface behaviour while underestimating risks, limitations and integration challenges.

For anyone working seriously with GenAI, whether as a technologist, architect or decision-maker, this kind of grounding is not optional. It is foundational.

Primary Source (Citation)

Author: Andreas Horn
Role: Head of AIOps, IBM
Platform: LinkedIn
Original post: This is hands down one of the best visualizations of how LLMs actually work
URL: https://www.linkedin.com/posts/andreashorn1_%F0%9D%97%A7%F0%9D%97%B5%F0%9D%97%B6%F0%9D%98%80-%F0%9D%97%B6%F0%9D%98%80-%F0%9D%97%B5%F0%9D%97%AE%F0%9D%97%BB%F0%9D%97%B1%F0%9D%98%80-%F0%9D%97%B1%F0%9D%97%BC%F0%9D%98%84%F0%9D%97%BB-%F0%9D%97%BC%F0%9D%97%BB%F0%9D%97%B2-activity-7300037316503388160-dcvB
Captured from: LinkedIn post UI (“~10 months ago” label)
Published: ~13 March 2025
Captured: 13 Jan 2026

Secondary Referenced Material

3Blue1Brown: Neural Networks / Transformers visual explanation
https://lnkd.in/dAviqK_6

Attribution & Use Statement

This post is a summary and commentary written in my own words.
All original ideas, expressions and visual materials remain the intellectual property of their respective authors and publishers. This content is provided for analysis and educational commentary.
