How does AI actually work?

Listen to the podcast at our podcast host, Blubrry.com, or find it on your platform of choice, including iTunes, Spotify, Amazon, Audible, and YouTube. I also have this short (6 minute) video here.
Transcript
Can you complete this phrase: “Mary had a little ______.”
How about this one: “Jack and Jill went up the _______.”
OK, here’s a tougher one: “I think I’m going to be ______.”
Now hold on to those – I will come back to them.
This episode is about how Generative AI actually works. If you have fallen in love with its friendly nature and seemingly infinite wisdom, you might not want to listen to what goes on behind the curtain. But if you are concerned about Generative AI taking your job or becoming sentient to the point of becoming our robot overlords, you may want to stick around.
To get to a place where we can understand just how Generative AI works, I first want to take you to the movies. And I suggest we go revisit The Wizard of Oz from 1939. You might expect that I will use the metaphor of pulling back the curtain to reveal the real person behind the magic – spoiler alert – that is the key moment in that movie. Such a reference is completely appropriate and accurate, but it’s also too easy. We need to go deeper than that, so I will bring back the concept of movies and that movie in particular as we go through this story together. But here’s a hint: to add to my list of phrases I asked you to mentally complete at the start of this episode, here’s one more: “We’re off to see the _______.”
When it comes to Large Language Models, LLMs, like ChatGPT, Claude, Gemini and Copilot, there’s lots to enjoy. In my experience, and probably yours, the first is the speed and depth of the answers. The second is that we don’t have to make our question conform to any pre-set rule the way you have to do with Google. We can simply ask in our natural way of speaking. Third is the positive, supportive and enthusiastic way in which the LLM presents its answers. And fourth is how proactive it is in offering the next steps in the process.
If you have spent time using any of these tools, you’ve likely had at least one moment where you stopped mid-sentence and thought, “How does it know that?” It’s not just the facts, but the way it answers. The tone. The relevance. Like chatting with a friend, it seems to pick up on what you’re trying to do, sometimes even before you’ve fully figured it out yourself.
You might even ask a vague question and get a surprisingly clear answer. You might ask for help with a problem and be offered suggestions you had not considered. And occasionally, the system will even ask you a question or offer a next step: “Would you like me to expand on that?” or “Do you want to see an example?” or “Shall I draw up a printable step-by-step PDF?”
At that point, especially for those of us who grew up poring over Google search pages, this approach stops feeling like software and starts feeling like something much more sentient.
So, how does generative AI actually work? And why, and how, does it sometimes get things spectacularly right, and sometimes spectacularly wrong? Now I have to add a bit of a disclaimer here. This technology is advancing and maturing so quickly that some of the behaviors I describe here may be solved, improved or replaced within the next six to twelve months. But it’s still worth knowing.
Let’s start by clearing away a few myths. First, generative AI is not a database or a library. When you ask it a question, it doesn’t retrieve an answer from some vast storage space. It doesn’t even have an index. There’s no moment where the system says, “Ah yes, here is the stored file labeled ‘Leadership Best Practices,’ or “what your dog is trying to say.”
It’s also not a search engine. Even when it seems like it is summarizing key parts of the information available on the internet, it’s not browsing, crawling, or retrieving web pages the way Google does. Some systems can consult websites, but the process, as we will see, is more spontaneous and linear than a search engine’s.
Most importantly, your LLM is not thinking, reasoning, or understanding in the human sense. There is no awareness. No beliefs. No intention. It has no internal voice saying, “This seems helpful.” And yet, it produces language that feels purposeful, informed, and often remarkably aligned with what you need.
This is quite the paradox.
The Simplest Possible Explanation
The most honest, stripped-down explanation of how generative AI works is this: “it simply predicts what should come next.” That’s it. Every answer you see, every paragraph, every explanation, every follow-up question is built one small step at a time by predicting the most likely next piece of language. Not the best answer, not even the correct answer. Just the most probable continuation, given everything that came before. It’s like a global version of the autocomplete feature you have in Microsoft Word and in your SMS app.
When I say, “Mary had a little _______,” your mind just did its own autocomplete. Why did it do that?
The magic behind this process is how Generative AI predicts that next word, drawing on everything it learned about language during training.
Tokens, Not Words
A key fact here, and maybe a surprising one, is that generative AI doesn’t actually work with words at all. It works with tokens. A token might be a full word, a part of a word, a number, a piece of punctuation, or even a space.
When you type a prompt – that is, when you enter a question or comment – the system breaks your text into these tokens. Then it starts asking a question over and over again: “Given all these tokens so far, what token is most likely to come next?” And it chooses one. Then it repeats the process. Again. And again. And again. Thousands of times per response. That single mechanism – prediction repeated at scale – is responsible for everything you experience as intelligence.
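That loop can be sketched in a few lines of Python. To be clear, everything here is invented for illustration: a real model learns probabilities for tens of thousands of tokens across long contexts, whereas this toy table is written by hand and looks only one token back.

```python
# Toy "model": for each context (here, just the last token), a table of
# possible next tokens and their probabilities. These numbers are made up
# purely to illustrate the predict-one-token-at-a-time loop.
NEXT_TOKEN_PROBS = {
    "Mary":   {"had": 0.9, "went": 0.1},
    "had":    {"a": 1.0},
    "a":      {"little": 0.8, "big": 0.2},
    "little": {"lamb": 0.95, "dog": 0.05},
    "lamb":   {"<end>": 1.0},
}

def generate(prompt_tokens, max_steps=10):
    """Repeat the core question: given the tokens so far,
    which token is most likely to come next?"""
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        context = tokens[-1]  # toy model only looks at the last token
        choices = NEXT_TOKEN_PROBS.get(context, {})
        if not choices:
            break
        # Pick the single most probable continuation (greedy decoding).
        next_token = max(choices, key=choices.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens

print(" ".join(generate(["Mary"])))  # Mary had a little lamb
```

Real systems also sample from the distribution rather than always taking the top choice, which is why the same prompt can produce different answers on different runs.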
Where the “Knowledge” Comes From
So how does an LLM know anything? The answer lies in training. And the clue lies in its generic name, LLM. Before a generative AI system ever talks to a user, it goes through an enormous training process where it is exposed to massive amounts of human-created text: books, articles, explanations, conversations, instructions, examples, stories. A large language model. Lots and lots of language, hence the term LLM. But while it reviews all this material, it’s not memorizing the facts in there; instead, it’s learning patterns. What sort of patterns?
- How questions tend to be phrased
- How explanations tend to unfold
- What usually follows phrases like “In summary…”
- How arguments are structured
- How tone shifts between formal and casual
Over time, the model builds an internal map of how language works. It does not store answers. All it stores is relationships between language elements. That’s why it can explain something it has never seen before, by assembling familiar patterns in new ways.
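To make “storing relationships, not answers” concrete, here is a toy sketch in Python. The three-line “training corpus” and the one-word context are drastic simplifications I invented for this example; a real model tracks relationships across thousands of tokens at once.

```python
from collections import Counter, defaultdict

# A tiny "training corpus". Real training uses a large slice of
# everything humans have written.
corpus = [
    "mary had a little lamb",
    "jack and jill went up the hill",
    "the lamb went up the hill",
]

# "Training": count which word tends to follow which. We store
# relationships between language elements, never whole answers.
follows = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def most_likely_next(word):
    """The 'model' is just these counted relationships: to continue a
    phrase, look up the most common follower of the last word."""
    return follows[word].most_common(1)[0][0]

print(most_likely_next("little"))  # lamb
print(most_likely_next("the"))     # hill
```

Notice that nothing in `follows` resembles a stored answer; it is purely a map of which pieces of language tend to go together, which is the same principle, scaled up enormously, behind an LLM.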
Here’s an example. If you have ever said “thank you” to Copilot for generating a piece of good writing or an image, it will respond in kind. “Thank you,” I type, “this is great.” Copilot replies, “you’re welcome, Steve, this is a really great article. If you need a summary for LinkedIn or some key points for social media, just hit me up. I know you’ll do great.”
Gives you the warm and fuzzies, doesn’t it? It sure would be nice if I actually had a robot friend like that. But all that Copilot is doing is responding to the tokens inside the words, “Thank you.” As most of us do, it responds with the most natural-sounding next token, which, in English, is “You’re welcome.” Since this session on Copilot was to write an article, the next tokens address what happens with that article. I, Steve, will likely want to post it somewhere and then promote it on LinkedIn and on other socials. Those are tokens that follow a string of tokens that the LLM has seen a million times in its reading. People say “you’re welcome” after someone says “thank you.” People who write something usually want it to be seen by other people. If, after I said “thank you” to Copilot for helping me write the article, it said, “I hope you find the article delicious when you eat it for lunch,” it would clearly have pursued the wrong string of tokens.
Many users describe generative AI as feeling emotionally intelligent. They say, “it seems like it understands me.” But in reality, the AI system isn’t detecting emotion, it’s detecting linguistic patterns associated with emotion. We humans constantly broadcast emotion-related concepts such as intent, uncertainty, urgency, confidence, and frustration through our language. Think about sentence length, word choice, politeness, directness, or hesitation. The AI model has access to millions of examples of how people write when they’re stressed, curious, overwhelmed, or reflective, so when it responds appropriately, it’s not empathy – it’s pattern alignment.
But here’s the important part: from our user experience perspective, that difference often doesn’t matter.
This is where a movie analogy comes in. When you sit down to watch a movie – like The Wizard of Oz, or any other celluloid movie or digital video – your eyes are registering a series of still images. For movies it’s 24 frames per second, and for TV and live streaming, it’s 30. Your eyes see that many images each second and your mind stitches them all together to become an experience. If someone hands you a reel of movie film, or an SD card with a movie inside it, it does nothing for you until you insert it into the appropriate machine that allows you to see the sequence of images. Your mind puts it all together to create the illusion of motion. And that is what happens when we experience the illusion of personality inside an LLM. It puts the right words in the right order. We interpret this as an emotional experience.
At this point you might ask, “Isn’t this just what we do? If someone says ‘thank you,’ I will also say ‘you’re welcome.’” And this is true. But for us these responses are learned and retained inside our minds. We know what to say, and we don’t have to look it up somewhere. If you are introduced to somebody and they extend their hand for a handshake, you automatically know to respond by extending your own hand and shaking hands with the firmness and brevity that is expected, at least in many places in the world. You don’t have to do research on why this person is extending their hand toward you. But that’s what an LLM does each and every time. It remembers nothing. It researches everything. Every time.
That’s Why Generative AI Sometimes Gets Things Wrong
This model also explains the hallucinations – when AI confidently produces incorrect information. My personal favorite has always been images of people in early AI-generated art: contorted hands with eight fingers, more than two legs, an extra disembodied arm – a general wrongness and unnatural shape. Although this has improved somewhat, the fact that it happened at all shows how LLMs work. They don’t have an innate knowledge of what humans look like – they have to pull it from thousands of images of humans. In most cases, people in photos, paintings and drawings are doing something. Their hands are busy holding something or pointing. So many images of people’s hands have them contorted in some way like holding a pen, or pointing, that there was no single specific definition of what a hand looked like. It was just an undefined set of fingers based on the LLM’s collective experience.
Interestingly this allows us to draw another parallel with our human selves. Look at how kids draw people. The average three-year-old or four-year-old graduates from random scribbling to attempts to draw the people around them – parents, caregivers, siblings. These images are known as “tadpole people,” since they often have disproportionately large heads, with eyes high up on the face, and smaller stick-figure bodies. This is a reflection of the child’s own experience – call it passive research – in that their experience with their family members to this point has largely been very up close, with intense focus on the face and eyes of the person caring for them. So, the effect is the same – a four-year-old child draws their world based on what they have seen, as does an LLM.
In both cases the knowledge gets refined over time and with additional experience.
But with LLMs these hallucinations, as they are called, can also occur with facts, and can get caught up in the answers that it delivers so quickly and efficiently. From the system’s point of view, nothing has gone wrong. It wasn’t trying to be accurate. It was trying to be plausible. If a sentence sounds like it belongs in a certain context, the model may generate it, even if it isn’t true.
This is why generative AI is best understood as a language engine, not a truth engine. Accuracy and truth are things we value in fellow humans and that we tend to retain in our memory, as individuals and collectively as part of a community. They are therefore things we encourage through the design, feedback, and constraints of Generative AI systems. But they are not the core mechanism.
That’s why, when I use an LLM to do some research or to draft a document, I later use a different one to do some fact checking. This, by the way, is one of the many ways you can legitimately accept LLMs as a tool, not a cheating mechanism. When ChatGPT writes an email or a memo for you, it is performing a service it knows nothing about. It saves you time and often adds valuable ideas that you might never have thought of yourself, but it’s up to you to ensure the quality of the final product.
This reminds me of a similar, much-maligned technology: the calculator. Back when I was in high school, the great moral concern was whether students should be allowed to bring calculators to school, and worried parents and educators presented the exact same arguments we are now hearing with generative AI: it’s cheating. The machine is doing the work. How can students become employable if they rely on machines? Instead, we have greatly benefited from calculator technology. It has allowed people to skip much of the tedium and error of longhand calculation and to use that time, and the information the calculator delivers, to work on higher-level tasks that create the tools, technology and innovations that improve our lives.
How does it recognize images?
When a language model recognizes what’s in an image, it isn’t “seeing” the picture the way a human does. It’s analyzing patterns of pixels and translating them into probabilities. A photo is just a grid of colored dots, and the model has been trained on enormous numbers of images paired with human descriptions. During training, it learns that certain shapes, edges, textures, color groupings, and spatial relationships tend to correspond with concepts like “chair,” “table,” “window,” or “person.” It also learns how objects usually relate to one another: people tend to be upright, furniture rests on the floor, walls form vertical boundaries, and rooms have consistent spatial layouts. When you upload a photo, the model scans the image for these learned visual patterns, identifies the most likely objects and their relationships, and then converts that visual understanding into language. In other words, it doesn’t start with meaning and work downward. It starts with pixels, detects structure, assigns probabilities, and gradually builds a coherent interpretation of what the image most likely represents, given everything it has learned from millions of prior examples.
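The pixels-to-probabilities idea can be sketched with a deliberately tiny example. Everything below is invented for illustration: real vision models learn millions of features from labeled images, while this toy compares a 3×3 black-and-white grid against two hand-made “learned” patterns.

```python
# Hand-made stand-ins for learned visual patterns. A real model learns
# edges, textures and shapes from millions of captioned images; these
# two prototypes are invented purely for illustration.
PROTOTYPES = {
    "vertical line":   [(0, 1, 0),
                        (0, 1, 0),
                        (0, 1, 0)],
    "horizontal line": [(0, 0, 0),
                        (1, 1, 1),
                        (0, 0, 0)],
}

def classify(pixels):
    """Score each pattern by how many pixels match, then convert the
    scores into probabilities: pixels in, probabilities out."""
    scores = {}
    for label, proto in PROTOTYPES.items():
        scores[label] = sum(
            p == q
            for row_p, row_q in zip(pixels, proto)
            for p, q in zip(row_p, row_q)
        )
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

image = [(0, 1, 0),
         (0, 1, 0),
         (0, 1, 0)]
probs = classify(image)
print(max(probs, key=probs.get))  # vertical line
```

The model never “sees” a line; it just finds that one learned pattern is a more probable match than the others, and the final description is built from those probabilities.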
Where the jobs come from
What often gets overlooked is just how many humans are involved in creating and maintaining what we experience as an AI’s “personality.” Large language models don’t arrive fully formed and then run unattended. Teams numbering in the hundreds, and in some cases thousands, are involved across the lifecycle of a model. Researchers design the behavioral goals; engineers implement constraints and reward mechanisms. Human reviewers evaluate outputs, rating them for helpfulness, clarity, tone, safety, and alignment with real human expectations. Subject-matter experts are brought in to assess accuracy in sensitive domains. Policy teams define boundaries. User experience (UX) designers shape how responses feel in real use.
This extensive work doesn’t stop at launch: models are continuously monitored, re-trained, adjusted, and corrected based on new data, new risks, and new patterns of misuse. So, when an AI sounds polite, encouraging, cautious, or curious, that voice is not emerging spontaneously. It is the accumulated result of countless human judgments about what “good” communication should look like. There is no inner character, but there is a consistent style to its output.
The question that is often asked is “whether AI will take my job,” but in truth it creates much more work for people. The quality of the output from Generative AI relies on the quality and accuracy of the prompts that are given to it. Writing prompts is a human skill. People write the best prompts. This is human work. As is the art of teaching people how to write prompts. People also need to check the accuracy and appropriateness of the output from an LLM. This, too, is human work. You can’t trust LLM output to be completely accurate or appropriate. A person who once was responsible for typing up a report will now be needed to write the correct prompts to have the LLM do the writing, and will then be needed to assess the output as an editor, ethicist or quality control manager.
This may be seen as an application of The Iron Law of Automation, published in a paper in 1983 by Lisanne Bainbridge, in which she states, “the more advanced an automated system becomes, the more crucial, and demanding, the role of the human operator becomes.” In other words, automation does not remove human work; it shifts it upward into tasks such as oversight, exception handling, judgment calls, and system correction. With Generative AI, this shows up as prompt design, validation, bias detection, edge-case handling, and ethical oversight. (Bainbridge, Lisanne. “Ironies of Automation,” Automatica, Vol. 19, No. 6, pp. 775–779, 1983)
And this is not even counting all the more traditional jobs required, for example in construction. The data centers that generate the computing power behind Generative AI require a great many people to build and maintain the physical infrastructure of this seemingly invisible power. And inside our workplaces, the opportunities for advancing every area of work, from running more effective and hopefully fewer meetings all the way through to detecting high blood pressure from a standard blood test, actually mean more work, not less.
So not only does the Iron Law of Automation make sense, so too does Moore’s Law, coined to predict that the number of transistors that can fit on a microchip doubles roughly every two years while the cost of computing is halved, along with its cousin, the network effect described by Metcalfe’s Law: the more people use a technology, the more useful it becomes, as tools like the fax machine and email demonstrated. The same applied to the internet itself, and we will very likely see the same thing as Generative and Agentic AI settle into our working infrastructure. There will be more work done by more people, not less work done by fewer people.
Why It Asks, “Would You Like Me To…?”
This is one of the most unsettling features for new users, because it feels proactive, intentional and almost… thoughtful. But again, there’s a simpler explanation.
As I mentioned before, humans tend to follow predictable workflows. If someone says, “thank you,” you are likely to say, “you’re welcome.” Furthermore, if someone asks for an explanation, they often want examples. If they need a summary, they will often want key takeaways. If they are looking for a plan, they often want next steps. The LLM has seen these sequences countless times. So, when it asks, “Would you like me to expand on that?” it’s not making a decision, it is simply completing a pattern. This is goal completion, not agency.
In our human world, there are many areas where this happens that we have simply become accustomed to. Think of great customer service, for example. Let’s say you take a pricey vacation, maybe a cruise or a stay at a premium hotel. You as the customer will expect an elevated degree of customer service. The staff members at the hotel will be – or should be – incredibly friendly, proactive, thoughtful and aware of your every need, even before you need it. Their job is to make the experience of your vacation as positive as possible. But do they really care about you? Not so much. They are likely very caring people, but once you have checked out of your room and have headed back to the airport, they simply prepare the room – and their performance – for the next person or family. With rare exceptions that might come from regular visits over time where a genuine friendship is made, a great hospitality professional is essentially working from a script of actions that align with, and seek to exceed, the expectations of the customer.
In both cases, in the human world and in the AI world, it’s incredibly effective.
Why Understanding This Matters
If you think generative AI “knows things,” you may be over-trusting it. If you think it is “just autocomplete,” you may underuse it. The truth sits in between. As a simple example, let’s go back to the phrases I used at the start of this episode. Almost everyone will autocomplete in their minds the phrase “Mary had a little lamb.” For many of us it stems from childhood experience and is the most immediate “next word” that comes to mind. The same with “Jack and Jill went up the hill.” This is even more specific. With Mary, I could have been speaking about any number of “Marys” in culture, media, or even in my own world. The “Jack and Jill” pairing makes the expected outcome even more likely, given the relatively fewer instances of those two names appearing together. And “I think I’m going to be…” requires much more context. When I run this exercise in my classes, I receive answers ranging from “I think I’m going to be sick” to “I think I’m going to be OK” and many more. To autocomplete such a sentence, an LLM will require much more context.
Generative AI is best used as a high-speed and somewhat competent assistant: a thinking partner, a drafting assistant for documents, not just blueprints; a pattern amplifier; a reflection tool. When people ask the most common question, “Will AI take my job?” they do so as a reaction to three forces: first, not fully understanding what Generative AI is and isn’t; second, assuming it can do all the things that they currently do in their job; and third, because they hear news stories everywhere about companies laying people off or simply not hiring because of AI. As I mentioned earlier, employers should really be given solid grounding on what Generative AI is and isn’t before deciding to lay off staff. The best and most strategic line of thinking is always going to be, “How can this technology make my company more efficient and profitable?” And that, as the Iron Law of Automation puts forth, and Moore’s Law quickly follows, requires people and tools, not just tools.
The Real Magic is in our interpretation of AI
The reason that Generative AI appears to have an intelligence of its own is that human language itself is intelligent, and this technology has learned its structure at scale. As a result, it reflects us: specifically our knowledge, our habits and our own collective personality.
Once again, I go back to the experience we feel when we watch a movie, or even a music event or even a big-time magician or illusionist. A truly entertaining spectacle will give people a buzz, a glow as they become absorbed emotionally into the experience.
As concert goers and movie watchers, we all know it’s just a performance. After the show, the performers exchange their costumes for street clothes and head to the tour bus or the hotel. Exquisite concert lighting and lasers are replaced by the fluorescent house lights of the arena. After an engrossing and enjoyable movie, we get up, exit the cinema and trudge through the parking lot. But in those moments while the show lasts, we sit wrapped in a cloak of pleasure.
With generative AI, we experience the same cues and responses that we would when talking to a trusted friend or watching a great movie. Whereas a Google search has, for most of its existence, simply been the act of reviewing a list of websites that match the terms you are searching for, talking with ChatGPT or Copilot is like talking to that special librarian, or a sales assistant in a store, who understands you, seems to know exactly what you are looking for and can provide advice in a friendly manner. We soak up the experience of a computer finally engaging with us as humans, and then we start to hope, or even believe, that it has actually become human. In truth, we have all been down this yellow brick road before. When Samuel Morse created a way for communication to travel across wires almost instantly, and Tesla, Marconi, Edison, and Alexander Graham Bell followed suit, it changed the way we communicate forever. We adapted and built upon it. That’s how humans are. We adapt. We grow; we move on. That is, of course, what Dorothy Gale and her colleagues learned when they finally pulled back that curtain deep inside the Emerald City. What they learned was that they already had what they needed to continue on and thrive, and the illusion of a technique or person as the agent of enormous power was just that: an illusion, just like movies themselves.
Thank you for visiting. Do you have comments or thoughts about this episode? Feel free to get in touch through our Contact page.