Transformer models are the current leading technology in the NLP domain. They can tackle a wide range of problems in the field, from text classification to text generation. In this topic, we will tell you about transformers and the tasks they can perform, as well as how to use them. We will also cover the main Transformer models and compare them.
Transformers as language models
The first transformer model, GPT, was introduced in 2018, a year after the invention of the transformer architecture. Over ten models have been introduced since that time. Here is a short timeline from 2018 to 2021:
All the transformers mentioned above are language representation models. Transformers have emerged as powerful language models, capable of understanding and generating human-like text. Language models are trained on large amounts of textual data and learn the statistical patterns and structures inherent in language. Transformers, in particular, have shown remarkable success in various language modeling tasks.
Transformers as language models exhibit several key characteristics:
- Contextual understanding: transformers can comprehend the contextual information within a text, considering the relationships between words and phrases. This understanding helps them generate more coherent and contextually appropriate responses.
- Creative text generation: transformers can generate text that goes beyond simple rule-based or template-based systems. They can produce creative and diverse outputs, often mimicking human-like language patterns.
- Fine-tuning for specific tasks: pre-trained transformer models can be fine-tuned on specific downstream tasks using smaller task-specific datasets. This process helps them adapt their language generation capabilities to more specialized domains, such as customer support, medical texts, or legal documents (see the sketch after this list).
- Language understanding and generation: transformers excel at both understanding and generating natural language. They can comprehend the meaning and nuances of the text, as well as produce text that is contextually appropriate and coherent.
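For example, here is a minimal fine-tuning sketch, assuming the Hugging Face transformers and datasets libraries are installed; the IMDB dataset and the distilbert-base-uncased checkpoint are only illustrative choices, not something prescribed by this topic.

```python
# A minimal fine-tuning sketch (illustrative dataset and checkpoint names).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # a small task-specific dataset for sentiment
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

encoded = dataset.map(tokenize, batched=True)

# Start from the pre-trained weights and add a fresh classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="finetuned-sentiment",
                         num_train_epochs=1,
                         per_device_train_batch_size=8)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),
                  eval_dataset=encoded["test"].select(range(500)))
trainer.train()
```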
Transformers as language models have revolutionized various domains, including chatbots, virtual assistants, content generation, and language translation. Their ability to understand and generate human-like text has opened up new possibilities for natural language understanding and generation applications, pushing the boundaries of what can be achieved with automated language processing.
Tasks performed with transformers
Transformers, a type of deep learning model, have significantly advanced natural language processing (NLP) tasks due to their ability to capture contextual information effectively. Here are several NLP tasks where transformers have demonstrated remarkable performance:
- Text Classification: Transformers can classify documents into predefined categories, such as sentiment analysis (determining whether a text expresses a positive or negative sentiment), topic classification, spam detection, and intent recognition.
- Named Entity Recognition (NER): Transformers can identify and classify named entities within a text, such as people, organizations, locations, dates, and other specific entities.
- Part-of-Speech Tagging (POS): Transformers can assign appropriate grammatical tags to each word in a sentence, such as a noun, verb, adjective, etc.
- Machine Translation: Transformers, starting with the original encoder-decoder model introduced in the "Attention Is All You Need" paper, have greatly improved machine translation systems by learning to understand the context and semantics of the source and target languages.
- Text Summarization: Transformers can generate concise summaries of longer texts, capturing the most important information and key points.
- Question Answering: Transformers can answer questions based on a given context, as demonstrated by models like BERT and OpenAI's GPT series.
- Text Generation: Generative transformers like GPT have the capability to generate coherent and contextually relevant text, making them useful for tasks such as chatbots, dialogue systems, and creative writing assistance.
- Sentiment Analysis: Transformers excel at determining the sentiment expressed in a piece of text, helping analyze public opinions, customer feedback, and social media sentiments.
- Document Classification: Transformers can categorize entire documents into predefined classes, enabling tasks such as document organization, news classification, and document routing.
These are just a few examples of the many NLP tasks to which transformers have made significant contributions. With their ability to model contextual relationships effectively, transformers have revolutionized the field of NLP and continue to drive advancements in language understanding and generation.
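Many of these tasks can be tried in a few lines with the Hugging Face pipeline API. Here is a minimal sketch, assuming the transformers library is installed; the input sentences are made-up examples.

```python
from transformers import pipeline

# Sentiment analysis / text classification
classifier = pipeline("sentiment-analysis")
print(classifier("I really enjoyed this course!"))  # e.g. label POSITIVE

# Named entity recognition
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hugging Face is based in New York City."))

# Question answering over a given context
qa = pipeline("question-answering")
print(qa(question="Where is Hugging Face based?",
         context="Hugging Face is based in New York City."))
```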
Let's find out more about the most popular transformer models!
Overview of transformers
The BERT transformer model was introduced in 2018 by Google. Unlike the original Transformer, which was designed for machine translation, BERT is a language representation model. BERT was trained with the help of masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is efficient at predicting masked tokens and at NLU in general, but it is not optimal for text generation.
BERT represents masked language modeling. MLM models such as BERT and BART are pre-trained to predict masked tokens: a random subset of the input tokens is replaced with a mask token, and the model then predicts the original token at each masked position. BERT is a good solution for classifying texts and tokens, NER, sentiment analysis, and so on.
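As a rough illustration of MLM, here is a minimal sketch using the Hugging Face fill-mask pipeline with the bert-base-uncased checkpoint; the input sentence is just an example.

```python
from transformers import pipeline

# BERT's mask token is [MASK]; the model predicts likely tokens for that position.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```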
T5 is built on transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task; this approach has emerged as a powerful technique in NLP. T5 comes in several sizes: t5-small, t5-base, t5-large, t5-3b, and t5-11b. Based on the original T5 model, Google developed a multilingual version called mT5. While the original model was trained on the C4 corpus, mT5 was trained on the mC4 corpus, which covers 101 languages.
Since T5 is a seq2seq model, it converts all NLP problems into text-to-text problems, so you always need two sequences to train the model. For machine translation tasks, the corpus consists of pairs: a sentence in language X and a sentence in language Y. The sentence in language X is fed to the encoder as input_ids. The sentence in language Y (also called the target sequence) is shifted to the right and fed to the decoder as decoder_input_ids.
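Here is a minimal sketch of this text-to-text setup with the t5-small checkpoint from the Hugging Face Hub; the translation pair is only an example. In practice you pass the target sequence as labels, and the library builds the shifted decoder_input_ids for you.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Inference: every task is phrased as plain text, here with a translation prefix.
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Training step: the target sentence goes in as labels; internally it is shifted
# to the right to form decoder_input_ids, as described above.
labels = tokenizer("Das Haus ist wunderbar.", return_tensors="pt").input_ids
loss = model(input_ids=inputs.input_ids, labels=labels).loss
print(float(loss))
```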
Besides machine translation and text summarization, T5 is also suitable for text classification (check the Colab Notebook on how to implement this task with T5), named entity recognition (check the Colab Notebook), and question answering (check the Colab Notebook). Note that the first two tasks are not naturally text-to-text ones, so T5 recasts them in the text-to-text format.
XLNet is an extension of the Transformer-XL model. It is trained with an autoregressive method that learns bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order of the input sequence. XLNet is pretty hard to train; that's why it is pre-trained using only a subset of the output tokens as targets, which are selected with the target_mapping input.
XLNet is good for classification tasks. Developers claim that XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
Moreover, XLNet is one of the few models that has no sequence length limit.
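A minimal classification sketch with XLNet and the Hugging Face transformers library is shown below; note that xlnet-base-cased is the plain pre-trained checkpoint, so its classification head is randomly initialized and would need fine-tuning on labeled data before the scores are meaningful.

```python
import torch
from transformers import XLNetForSequenceClassification, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetForSequenceClassification.from_pretrained("xlnet-base-cased",
                                                       num_labels=2)

inputs = tokenizer("This product exceeded my expectations.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities from the (still untrained) head
```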
BART uses a standard seq2seq/machine translation architecture with a bidirectional encoder and a left-to-right decoder. BART is outstanding for text generation tasks, though you can apply it to NLU tasks too. The model achieves new state-of-the-art results on a range of abstractive dialogue, question-answering, and summarization tasks, with gains of up to 6 ROUGE.
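For instance, here is a short summarization sketch with the facebook/bart-large-cnn checkpoint, a BART model fine-tuned on the CNN/DailyMail dataset; the input text is just an example.

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = ("Transformers are deep learning models built on self-attention. "
           "They have become the dominant architecture in NLP, powering models "
           "such as BERT, T5, BART, and GPT for tasks ranging from translation "
           "and question answering to summarization and open-ended generation.")
print(summarizer(article, max_length=40, min_length=10, do_sample=False))
```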
GPT, which stands for "Generative Pre-trained Transformer," is a type of language model developed by OpenAI. GPT is an autoregressive model, generating text sequentially from left to right. It predicts the next word in a sentence by considering the preceding words. The training of GPT follows a unidirectional (causal) language modeling objective. It focuses on predicting the next word based on contextual information from previous words. The GPT model can be used for various applications, including content creation, language translation, question-answering systems, chatbots, and more. The models have been particularly successful in generating human-like responses in conversational settings. The most recent version of GPT is GPT-4, which was released in 2023.
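Here is a minimal generation sketch using GPT-2, the openly available member of the GPT family on the Hugging Face Hub; the prompt is just an example.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Transformers have changed natural language processing because",
                   max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])  # the prompt continued left to right
```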
Comparison of different Transformer models:
| Feature\Model | BERT | T5 | XLNet | BART | GPT |
|---|---|---|---|---|---|
| Encoder or Decoder type | Encoder | Seq2seq | Decoder | Seq2seq | Decoder |
| Left-to-right encoder/decoder | - | - | - | + | + |
| Bidirectional encoder | + | - | + | + | - |
| Masked Language Modelling | + | - | - | + | - |
| Available in HuggingFace | + | + | + | + | + |
Conclusion
In this topic, you have learned more about:
- transformers as language models;
- the most popular transformers;
- the NLP tasks that can be performed with transformers.