How LLMs Predict the Next Word — AI Explained for Non-Engineers
When you ask ChatGPT a question, it looks like a person thought carefully and then answered. That is why many people assume an LLM is “a very smart search engine” or “a computer that thinks.” But what actually happens inside is simpler than you might expect.
A large language model, or LLM, does one thing that you can sum up in a single sentence. It predicts, by probability, the most likely next word that follows the text it has been given so far. In this article, I will walk you through how that one line turns into a conversation like ChatGPT, and why understanding this principle helps you use AI better and get fooled by it less.
An LLM Doesn’t Think: It Predicts the Next Word
Picture the feature on your phone keyboard that suggests the next word as you type a message. Type “the weather today is” and it offers a word like “nice.” It helps to think of an LLM as that same autocomplete, scaled up to a size that is hard to imagine. As a big-picture mental model, that is accurate.
The difference is scale and sophistication. Your phone’s autocomplete looks at only the last few words and makes a simple suggestion. An LLM, by contrast, takes in the entire context so far and computes the next word with far more refined probabilities. That is why its output goes beyond single sentences into long passages, code, and translations that flow naturally.
Here is the important point. An LLM does not understand the meaning of your question the way a person does, and it does not go fetch a correct answer from somewhere. It simply computes which word is most likely to come after this text.
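If you are curious what "computing the most likely next word" can look like in its simplest form, here is a minimal sketch of the phone-keyboard version. It is not how an LLM is built: the tiny corpus and the function names `build_counts` and `next_word_probs` are made up for illustration, and it only looks at the single previous word.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
corpus = [
    "the weather today is nice",
    "the weather today is nice",
    "the weather today is cold",
    "the meeting today is long",
]

def build_counts(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev_word):
    """Turn the raw counts for one preceding word into probabilities."""
    followers = counts[prev_word]
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}

counts = build_counts(corpus)
probs = next_word_probs(counts, "is")

# The suggestion is simply the follower with the highest probability.
suggestion = max(probs, key=probs.get)
print(probs)       # probability of each word seen after "is"
print(suggestion)
```

An LLM replaces this counting table with a far larger learned model that considers the whole preceding text, but the output has the same shape: a probability for each candidate next word.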
The Model Works with Tokens, Not Words
To be a bit more precise, an LLM does not work with words but with smaller units called tokens. A token can be a whole word, or it can be a fragment of one. For example, a single word like “autocomplete” may be split inside the model into two pieces such as “auto” and “complete.”
Why split things this way? Because handling text as combinations of frequently occurring fragments, rather than memorizing every word in the world whole, lets the model flexibly handle words it has never seen and even typos. So to put it exactly, an LLM predicts the “next token,” not the “next word.” That said, to keep things easy to follow, I will keep saying “word” in this article and treat it as roughly the same thing as “token.”
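To make the idea of fragments concrete, here is a rough sketch of splitting a word into known pieces. The fragment list `VOCAB` and the greedy longest-match rule are assumptions chosen for illustration; real tokenizers learn their vocabulary from data and use more careful algorithms.

```python
# Hypothetical fragment vocabulary; real tokenizers learn theirs from data.
VOCAB = {"auto", "complete", "token", "s"}

def tokenize(word, vocab=VOCAB):
    """Split a word into the longest known fragments, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible fragment first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known fragment starts here: fall back to one character.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("autocomplete"))  # a word made of two known fragments
print(tokenize("tokens"))        # a known fragment plus a suffix
print(tokenize("autocomplte"))   # a typo still splits into usable pieces
```

Notice that the misspelled word does not break anything. It just falls apart into smaller pieces, which is the flexibility described above.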
It Learns by Repeatedly Guessing the Next Token from Internet Text
So how does the model learn what counts as a plausible next word? The answer is training.
An LLM uses the vast amount of text piled up on the internet (books, articles, wikis, web pages) as its raw material. The training process is a surprisingly simple repetition. You show the model the beginning of a sentence, hide what comes next, and ask it to guess the next word. When it gets it wrong, you adjust its internal numbers (its parameters) little by little and let it guess again.
This process is repeated an astronomical number of times. As a result, the model does not memorize specific answers; it statistically absorbs the patterns embedded in language. Grammar rules, the habits of which words tend to appear together, frequently occurring facts, even the mood of a particular writing style — all of it settles inside the model in the form of “which word tends to come after which word.”
What looks like knowledge is, in fact, the product of this statistical process. Because the model has seen, countless times, that “Seoul” overwhelmingly follows the phrase “The capital of South Korea is,” it produces that answer with confidence.
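The guess-and-adjust loop can be sketched in a few lines. This is a toy under heavy assumptions: the model here is just three scores, one per candidate word, for the single fixed prompt "The capital of South Korea is", and the names `VOCAB`, `softmax`, and `train_step` are mine. A real LLM adjusts billions of numbers across countless different prompts, but the direction of each nudge follows the same idea.

```python
import math

# Three candidate next words for one fixed prompt (illustrative only).
VOCAB = ["Seoul", "Tokyo", "Paris"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(scores, target_index, learning_rate=0.5):
    """Nudge the scores so the correct word becomes more likely.

    This is one gradient-descent step on the cross-entropy loss:
    the correct word's score goes up, the others go down.
    """
    probs = softmax(scores)
    new_scores = []
    for i, (score, prob) in enumerate(zip(scores, probs)):
        expected = 1.0 if i == target_index else 0.0
        new_scores.append(score - learning_rate * (prob - expected))
    return new_scores

scores = [0.0, 0.0, 0.0]          # untrained: every word equally likely
target = VOCAB.index("Seoul")     # the word that actually came next
before = softmax(scores)[target]

for _ in range(50):               # "guess, get corrected, adjust" 50 times
    scores = train_step(scores, target)

after = softmax(scores)[target]
print(f"before training: {before:.2f}, after training: {after:.2f}")
```

Nothing in this loop stores the fact "Seoul is the capital" anywhere. The scores simply drift until the pattern seen in the training text becomes the most probable continuation.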
It Completes Sentences by Stringing Together One Word at a Time
Let’s also look at how a trained model actually produces an answer. The key is that it does not write a whole sentence at once; it strings words together one at a time.
The model first picks the single most plausible next word. Then it appends that word to the text so far, feeds the whole thing back in as input, and predicts the word after that. This process of feeding its own output back in as new input and growing the text one word at a time is called autoregressive generation. It is also why ChatGPT’s answer streams onto the screen a word or word fragment at a time, from left to right.
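Here is a minimal sketch of that loop. The lookup table `TOY_MODEL` is a made-up stand-in for a trained model, and for simplicity it only looks at the last word, whereas a real LLM looks at the entire text so far. The `<start>` and `<end>` markers are also assumptions for this example.

```python
# Hypothetical lookup table standing in for a trained model:
# last word -> probabilities for the next word.
TOY_MODEL = {
    "<start>": {"the": 1.0},
    "the": {"weather": 0.7, "meeting": 0.3},
    "weather": {"today": 1.0},
    "meeting": {"today": 1.0},
    "today": {"is": 1.0},
    "is": {"nice": 0.6, "cold": 0.4},
    "nice": {"<end>": 1.0},
    "cold": {"<end>": 1.0},
}

def predict_next(words):
    """Return the most probable next word given the text so far."""
    probs = TOY_MODEL[words[-1]]
    return max(probs, key=probs.get)

def generate(max_words=10):
    """Grow the text one word at a time until the model signals the end."""
    words = ["<start>"]
    for _ in range(max_words):
        next_word = predict_next(words)
        if next_word == "<end>":
            break
        # The word just produced becomes part of the next input.
        words.append(next_word)
    return " ".join(words[1:])

print(generate())
```

The important line is the `append`: each new word is added to the input before the next prediction is made, which is all that "autoregressive" means.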
The model does not always pick the highest-probability word, though. A setting called temperature controls how strongly it favors safe, predictable words over more varied, surprising ones: a low temperature sticks close to the top choice, and a higher one gives less likely words more of a chance. That is why the answer to the same question can come out a little differently each time.
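The effect of temperature can be sketched with a small probability table. The numbers and the helper names `apply_temperature` and `sample` are illustrative assumptions. Raising each probability to the power 1/T and renormalizing is one common way to express the rescaling, and it requires a temperature greater than zero.

```python
import random

def apply_temperature(probs, temperature):
    """Rescale probabilities: low temperature sharpens, high flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than zero")
    # Raise each probability to the power 1/T, then renormalize.
    scaled = {word: p ** (1.0 / temperature) for word, p in probs.items()}
    total = sum(scaled.values())
    return {word: value / total for word, value in scaled.items()}

def sample(probs, temperature, rng):
    """Pick one word at random, weighted by the adjusted probabilities."""
    adjusted = apply_temperature(probs, temperature)
    words = list(adjusted)
    weights = [adjusted[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Made-up probabilities for the word after "the weather today is".
probs = {"nice": 0.6, "cold": 0.3, "purple": 0.1}

cool = apply_temperature(probs, 0.5)   # favors the top choice even more
warm = apply_temperature(probs, 2.0)   # gives unlikely words more room
print(cool)
print(warm)

rng = random.Random()                  # unseeded, so runs can differ
print([sample(probs, 2.0, rng) for _ in range(5)])
```

At the lower temperature "nice" dominates even more than before, while at the higher temperature an odd choice like "purple" gets a real chance of being picked.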
The Context Given Beforehand Changes the Prediction
An LLM’s prediction does not happen in a vacuum. The model looks at the conversation and instructions given up to that point and computes the next word with them in view. This entire input is called the context.
Even the same word, like “spring,” leads to completely different probabilities for what follows, depending on whether the preceding text was about a season or about a metal coil. This is possible because the model takes the earlier content into account together with the current word.
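A crude way to see this effect is to count what follows "spring" separately for texts that mentioned a season-related word earlier and texts that mentioned a coil. The snippets and the function `next_after` are made up for this sketch, and filtering by a single cue word is only a stand-in for the much richer way a real model weighs everything in its context.

```python
from collections import Counter

# Made-up training snippets, for illustration only.
SNIPPETS = [
    "after winter the spring arrives",
    "each winter ends and spring arrives",
    "the steel coil spring compresses",
    "a tight coil spring stretches",
]

def next_after(target, context_cue):
    """Count words that follow `target` in snippets where
    `context_cue` appears somewhere earlier in the text."""
    counts = Counter()
    for snippet in SNIPPETS:
        words = snippet.split()
        for i, word in enumerate(words[:-1]):
            if word == target and context_cue in words[:i]:
                counts[words[i + 1]] += 1
    return counts

season_view = next_after("spring", "winter")
coil_view = next_after("spring", "coil")
print(season_view)  # continuations seen when the text was about seasons
print(coil_view)    # continuations seen when the text was about coils
```

The word being continued is identical in both cases. Only the earlier text differs, and that alone is enough to produce two completely different sets of likely next words.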
This is why people say the prompt matters. Depending on what context and instructions we place in front, the probabilities the model computes change wholesale, and its output changes with them. The more clearly you assign a role or spell out the format and conditions you want, the more the model picks words that fit that context. Getting a good answer is nearly the same task as building a good context.
And That Is Why Hallucinations Happen
Once you understand this principle, the LLM’s biggest weakness explains itself naturally. That weakness is hallucination — the phenomenon of confidently making up wrong content as if it were fact.
The model does not separately judge whether a piece of information is true or false. It simply strings together “plausible next words.” That is why it can produce book titles that do not exist, fake papers, and incorrect statistics in a grammatically smooth and plausible way. From the model’s point of view, there is no essential difference between real information and a plausible falsehood. Both are just sequences of high-probability words.
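You can watch a miniature version of this happen with a deliberately weak toy model. The corpus and the functions `train` and `complete` below are invented for illustration, and the model only looks at the last word of the prompt, so it is far cruder than any real LLM. The point is only that the code contains no step that checks whether the answer is true.

```python
from collections import Counter, defaultdict

# Made-up corpus in which France is mentioned more often than Korea.
CORPUS = [
    "the capital of france is paris",
    "the capital of france is paris",
    "the capital of korea is seoul",
]

def train(sentences):
    """Count which word follows which word in the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def complete(counts, prompt):
    """Append the most frequent follower of the prompt's last word."""
    last_word = prompt.split()[-1]
    followers = counts[last_word]
    if not followers:
        return prompt  # nothing ever followed this word in the corpus
    return prompt + " " + followers.most_common(1)[0][0]

counts = train(CORPUS)
answer = complete(counts, "the capital of korea is")
print(answer)  # fluent and grammatical, yet factually wrong
```

The output reads like a perfectly normal sentence and is wrong. Real LLMs make this particular mistake far less often because they use the whole context, but when the statistics point the wrong way, the same thing happens: the most probable continuation wins, true or not.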
On top of that, the text the model trained on has a cut-off point in time. Recent events or changed information that occurred after training are not contained inside the model. So for important facts, the latest information, numbers, and quotations, you need the habit of not trusting an LLM’s answer as-is and verifying it separately. These days, methods that supplement this weakness by also drawing on search or external sources are widely used, but the final check is still the human’s job.
Knowing the Principle Helps You Use It Better and Get Fooled Less
Let me wrap up everything so far. An LLM is not a machine that thinks or searches. It is a machine that learned the patterns of language by repeatedly guessing the next token from vast amounts of text, and then, following the context it is given, picks and appends words one at a time by probability.
This simple principle is the foundation of its remarkable ability to translate, summarize, write, and even code. At the same time, it is the reason it cannot guarantee facts. In other words, it is a powerful tool with clear limits.
Once you know the principle, how to use it becomes clear too. Carefully build good context and instructions, and verify for yourself the parts of the output that count as facts. Do that, and you can get far better results from the same tool — and get fooled by plausible falsehoods less often.
For the more fundamental story of how letters and words are ultimately handled as numbers inside a computer, see How Computers Represent Everything in Binary. It connects naturally with the fact that an LLM converts tokens into numbers to compute, so I’d recommend reading the two together.