
Introduction to LLMs


In this topic, we will cover what an LLM is at a high level, look at how LLMs can be broadly categorized into two types based on their output, and consider their applications beyond being a friendly AI assistant.

What is a large language model?

A large language model (LLM) is a type of artificial intelligence (AI) capable of understanding and generating human-like text. It is trained on vast amounts of text data, enabling it to recognize, summarize, translate, predict, and create content. At its core, an LLM uses a neural network, typically with a transformer architecture, to process language.

By analyzing the statistical relationships between words and phrases in its training data, it learns grammar, facts, and reasoning abilities. This allows it to perform many tasks, from answering questions and writing emails to generating code and maintaining conversations. The "large" in its name refers to the massive number of parameters—up to billions or even trillions—that the model uses to make predictions.
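To get a concrete feel for next-word prediction, here is a minimal sketch using the Hugging Face transformers library (an assumed dependency; gpt2 is a small open model chosen purely for illustration):

    # Minimal next-word prediction sketch with an assumed `transformers` install.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    result = generator("The capital of France is", max_new_tokens=5)
    print(result[0]["generated_text"])

The model continues the prompt with whatever tokens it judges most likely, which is exactly the prediction mechanism described above.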

LLM development usually involves two major training stages. The first stage trains the foundation model — a model that processes large amounts of text data and extracts various patterns from it (like grammar constructs, associations, contextual meaning, etc.). More formally, the model learns statistical representations and relationships between data at this point. This training happens in a self-supervised way, meaning there are no explicit labels, and the task focuses on extracting structures from the data.

The process of creating the foundation model is called pre-training. At this stage, the foundation model doesn't directly solve specific tasks. Depending on its type (covered in the next section), it might convert input text into numerical representations or perform autocomplete by predicting word sequences (both useful but limited functions).
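To make the self-supervised idea concrete, here is a toy sketch of how raw text supplies its own labels for the autocomplete-style objective. Whitespace tokenization is a deliberate simplification; real models use subword tokenizers:

    # Toy sketch: raw text provides its own (context -> next token) labels,
    # so no human annotation is needed.
    text = "the cat sat on the mat"
    tokens = text.split()

    # Each training example pairs a context with the token that follows it.
    examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
    for context, target in examples:
        print(" ".join(context), "->", target)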

Fine-tuning, typically the second stage, involves further training the foundation model in a supervised manner (using a labeled dataset that's usually much smaller than the pre-training dataset). Fine-tuning aims to either solve a specific task (like recognizing named entities) or adapt to a specific domain (such as medical texts).

Note that fine-tuning does not have to be supervised and can be based on other approaches (such as semi-supervised training or reinforcement learning), but we will not consider those in the scope of this topic.

This step enables the foundation model to handle various tasks (like classification or following instructions). The "Chat" in ChatGPT means that GPT (a foundation model) has been fine-tuned, among other things, on instructions (specifically, a labeled dataset of "task description" and "desired response" pairs) for conversation purposes.
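As an illustration, such an instruction dataset might look like the sketch below. The field names and formatting are illustrative, not a standard schema:

    # Hypothetical instruction fine-tuning pairs: "task description" -> "desired response".
    instruction_dataset = [
        {"instruction": "Summarize: LLMs are trained on large text corpora ...",
         "response": "LLMs learn language patterns from large text corpora."},
        {"instruction": "Translate to French: Good morning",
         "response": "Bonjour"},
    ]

    # During fine-tuning, each pair is typically rendered into one training string.
    for pair in instruction_dataset:
        print("Instruction:", pair["instruction"])
        print("Response:", pair["response"])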

A high-level overview of LLM training

Fine-tuning typically includes a sub-step called alignment. In this step, developers adjust the model to behave in a specific way (such as being friendly) to meet user expectations.

The two types

There are two main categories of LLMs based on their outputs: autoregressive models and representational (or embedding) models.

Autoregressive (or generative) models generate text by predicting the next token (word or subword) in a sequence, given all the previous tokens. They model the conditional probability of a token based on the tokens that have come before it. You have certainly interacted with an autoregressive model — ChatGPT, Claude, Gemini, Llama, Grok, and similar models. They all fall into this category. That's why you see them streaming tokens one by one as they produce the answer to your query — that's autoregression in plain sight.
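The loop below sketches this token-by-token process, assuming the transformers and torch libraries are installed; gpt2 and greedy decoding are illustrative choices, not how production chat models are actually served:

    # Autoregression made explicit: score every possible next token,
    # append the most likely one, and repeat.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tokenizer("The sky is", return_tensors="pt").input_ids
    for _ in range(5):
        with torch.no_grad():
            logits = model(ids).logits       # scores for every candidate next token
        next_id = logits[0, -1].argmax()     # greedy choice: the single most likely token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
        print(tokenizer.decode(next_id.item()), end="", flush=True)

Each pass through the loop conditions on all tokens produced so far, which is why chat interfaces can stream output one token at a time.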

Illustration of an autoregressive model

Representational models, on the other hand, are designed to represent the entire input sequence. They encode it into a latent representation using an embedding model. These models learn to represent words and text as vectors (also known as embeddings). They then make predictions based on that representation.

Some representational models, notably BERT, learn by predicting masked words from their surrounding context (although this is not the only way to train embedding models). Providers of autoregressive models typically offer representational models as well.
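Masked-word prediction is easy to try with the transformers library (an assumed dependency; bert-base-uncased is used here purely as an example):

    # Masked-word prediction: BERT fills in [MASK] from the surrounding context.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    for prediction in fill("The doctor prescribed a new [MASK]."):
        print(prediction["token_str"], round(prediction["score"], 3))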

Representational models can produce embeddings as their final output (this is what a foundation representational model does). They can later be fine-tuned to perform a narrower task, such as sentiment analysis, and output a predefined label ('positive', 'negative', or 'neutral') for a given input.

Illustration of a representational LLM
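As a sketch of such a fine-tuned model in action, the transformers sentiment-analysis pipeline downloads a default representational checkpoint fine-tuned for sentiment (note that this particular default model predicts only two labels, 'POSITIVE' and 'NEGATIVE', rather than the three mentioned above):

    # A representational model fine-tuned for sentiment classification.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")  # downloads a default fine-tuned checkpoint
    print(classifier("I absolutely loved this course!"))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]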

Representational models typically do not generate text. Autoregressive models, on the other hand, primarily focus on generation.

Building with LLMs

Developers are using LLMs to power various applications. A foundational technique is semantic search, which moves beyond simple keyword matching to find information based on contextual relevance; it is the backbone of modern Q&A systems, recommendation engines, and intelligent document retrieval. A fine-tuned representational model can also handle typical natural language processing tasks, such as text classification (for sentiment analysis), part-of-speech tagging, or determining whether two documents are similar. These are supervised tasks, so labels are required.
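Here is a hedged sketch of semantic search using the sentence-transformers library (an assumed dependency; all-MiniLM-L6-v2 is one commonly used small embedding model):

    # Semantic search: rank documents by embedding similarity, not keyword overlap.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")
    docs = ["How to reset a forgotten password",
            "Office opening hours and location",
            "Troubleshooting login failures"]

    doc_vecs = model.encode(docs, convert_to_tensor=True)
    query_vec = model.encode("I can't sign in to my account", convert_to_tensor=True)

    scores = util.cos_sim(query_vec, doc_vecs)[0]
    print(docs[scores.argmax().item()])  # matched by meaning, despite no shared keywords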

Autoregressive models are more general-purpose and are used for applications requiring content generation. These models, controlled by user prompts, are the engines behind conversational AI and chatbots. Because they generate text by predicting the most statistically likely next word, they are prone to "hallucinating"—creating plausible-sounding but factually incorrect information. Furthermore, they have no inherent memory of their training data, making it impossible for them to cite the sources of their information. They are useful for most general tasks, but they are unreliable for tasks that demand high accuracy and trustworthiness.

To solve this, various techniques, such as retrieval-augmented generation (RAG), have been popularized in recent years. RAG combines the indexing of document embeddings in a specific domain—a technique from representational models—with the generative aspect. This grounds the model with specific, verifiable data. The system first retrieves relevant facts from a knowledge base and then instructs the model to generate its response based only on that provided information. This makes the output more accurate, with the ability to reference the exact source of the output.
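A minimal RAG sketch under those assumptions might look like this; the final generate call is a hypothetical placeholder for any autoregressive model API:

    # RAG in miniature: retrieve the most relevant snippet, then ground
    # the generator's prompt in it.
    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    knowledge_base = [
        "Refunds are issued within 14 days of purchase.",
        "Support is available Monday through Friday, 9am-5pm.",
    ]

    question = "How long do refunds take?"
    kb_vecs = embedder.encode(knowledge_base, convert_to_tensor=True)
    q_vec = embedder.encode(question, convert_to_tensor=True)
    context = knowledge_base[util.cos_sim(q_vec, kb_vecs)[0].argmax().item()]

    prompt = ("Answer using only this context:\n" + context +
              "\nQuestion: " + question + "\nAnswer:")
    # answer = generate(prompt)  # hypothetical call to any generative LLM
    print(prompt)

Because the retrieved snippet is known, the system can cite it as the source of the answer.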

Beyond generating text based on provided context, the next frontier is building LLM-powered agents capable of autonomous action. This pattern uses an autoregressive model as a reasoning engine to interpret a goal, create a plan, and execute it by using external tools. The model's task is to decide what to do next, whether it's calling an API, running a script, or querying a database. This enables the creation of more capable personal assistants or automated systems that can perform complex tasks.
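The following toy sketch shows the shape of that loop; decide_next_action stands in for a real LLM call and is entirely hypothetical:

    # Agent pattern in miniature: the model picks an action, the program
    # dispatches it to a tool.
    def call_weather_api(city):
        return "Sunny in " + city  # stand-in for a real external API call

    TOOLS = {"get_weather": call_weather_api}

    def decide_next_action(goal):
        # A real agent would prompt an LLM to choose a tool and its arguments;
        # this stub returns a fixed decision for illustration.
        return {"tool": "get_weather", "argument": "Paris"}

    decision = decide_next_action("What's the weather in Paris?")
    print(TOOLS[decision["tool"]](decision["argument"]))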

Conclusion

The evolution of LLMs shows an impressive journey from understanding language to generating it and, with agents, taking action with it. LLMs fall into two main categories. Representational models create meaningful embeddings for tasks like semantic search and classification. Autoregressive models can perform a wide variety of tasks based on user prompts. Through architectures like RAG, developers can overcome limitations such as the lack of source memory. This enables us to build reliable systems grounded in factual information that transform industries in many ways.
