Imagine yourself in class, sitting in a circle with five other people, taking turns reading a Chinese novel aloud. Now, let’s also suppose you don’t know any Chinese. Finally, let’s say that you know in advance which paragraphs you’ll have to read. If you phonetically memorize the paragraphs and pretend to be reading from the book when your turn comes, would anyone be able to tell you don’t understand Chinese?
Assuming your pronunciation and pretending skills are good, the answer is no.
The very same thing happens with artificial intelligence. With the rise of ChatGPT, an AI chatbot made by OpenAI, many people have started to wonder whether we are a step closer to the kind of artificial consciousness we see in sci-fi films such as Star Wars and Terminator. When you talk with these chatbots, it does feel as if they were conscious, fully capable of human-like understanding.
However, this couldn’t be further from the truth.
These chatbots, more accurately known as large language models, are doing exactly what you did a moment ago with the Chinese book: they pretend perfectly to understand, when in reality they use other methods to mimic understanding. Does this mean ChatGPT has memorized every possible conversation?
Not quite, but you could think about it that way.
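If we grossly simplify it, what ChatGPT does is statistically predict the most appropriate next word given the context. It does this with a type of neural network called a transformer. Below are the elements of a transformer and how they work. These steps follow the original transformer design, which has two halves, an encoder and a decoder; GPT models such as ChatGPT keep only the decoder half and process your prompt and their reply as one sequence, but the core ideas carry over: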
Encoder: The encoder splits your text into tokens (words or pieces of words) and turns each one into a vector (vectors are like coordinates, but instead of describing locations, they describe your words). Its attention mechanism weighs how relevant each word is to every other word, so the most important words in your input carry the most weight.
Decoder: Using the vectors from the encoder, the decoder produces the output one token at a time, where each token is a word or a piece of a word. To statistically predict the best next word, the decoder combines the following mechanisms, repeating the process until its response is complete:
- Attention mechanism (also called cross-attention): Weighs each vector from the encoder by its relevance to the word being produced, making sure the response addresses the key words from your text
- Self-attention mechanism: Weighs the tokens already generated, making sure the next word connects coherently to the rest of the sentence
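To make these steps a little more concrete, here is a toy Python sketch, not OpenAI’s code, of the two core ideas in miniature. The `self_attention` function implements scaled dot-product attention with a causal mask: each token’s vector becomes a weighted mix of the vectors it is allowed to see, and a token may only look at itself and the tokens before it, as in a decoder. The `pick_next_word` function then scores every word in a tiny vocabulary against the last vector and picks the most probable one. It follows the decoder-only layout that GPT models use, so there is no separate encoder. The function names, the 2-dimensional vectors and the three-word vocabulary are all hypothetical; a real model learns its vectors, derives queries, keys and values from separate learned projections, stacks many layers, and chooses among tens of thousands of tokens.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors, causal=True):
    """Scaled dot-product self-attention over a list of token vectors.

    Simplification (assumption): the same vectors serve as queries, keys
    and values; a real transformer computes each with learned matrices.
    With causal=True, token i only attends to tokens 0..i, as in a decoder.
    """
    d = len(vectors[0])
    outputs, weight_rows = [], []
    for i, query in enumerate(vectors):
        visible = list(range(i + 1)) if causal else list(range(len(vectors)))
        # Similarity of this token to every visible token, scaled by sqrt(d)
        scores = [dot(query, vectors[j]) / math.sqrt(d) for j in visible]
        weights = softmax(scores)
        # New vector for token i: weighted average of the visible vectors
        mixed = [sum(w * vectors[j][k] for w, j in zip(weights, visible))
                 for k in range(d)]
        outputs.append(mixed)
        weight_rows.append(weights)
    return outputs, weight_rows

def pick_next_word(hidden, vocab_vectors):
    """Score each vocabulary word against the last hidden vector and return
    the most probable word together with the full probability distribution."""
    words = list(vocab_vectors)
    probs = softmax([dot(hidden, vocab_vectors[w]) for w in words])
    best = max(range(len(words)), key=lambda i: probs[i])
    return words[best], dict(zip(words, probs))

# Hypothetical 2-D vectors for the context, e.g. "the", "cat", "sat"
context = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical vocabulary of candidate next words
vocab = {"on": [0.9, 0.4], "mat": [0.1, 0.2], "the": [-0.5, 0.3]}

outputs, weights = self_attention(context)
word, probs = pick_next_word(outputs[-1], vocab)
print(weights[0])  # [1.0]: the first token can only attend to itself
print(word)
```

The causal mask is what keeps generation honest: when the model is choosing a word, it can only draw on the words that come before it, never on words it has not produced yet. Picking the single most probable word each time, as here, is called greedy decoding; chatbots often sample from the probabilities instead, which makes their replies less repetitive.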
This deconstruction of ChatGPT clearly shows how chatbots differ from us. Although our writing may be indistinguishable from text produced by artificial intelligence, the way we produce it is fundamentally different. We are able to produce coherent sentences because we consciously understand how language works. We learn how language works not by reading thousands of texts and inferring which sentences are correct from previous samples, but by grasping abstract concepts that explain the rules governing syntax. We don’t need to run any statistical analysis or weigh specific words to predict the best next word.
In all, while ChatGPT and other AI chatbots have become incredibly advanced and can hold complex, human-like conversations, it is important to remember that being able to read a Chinese text aloud does not imply an understanding of Chinese.