You have already learned that Large Language Models are suited for tasks like producing text or code. Well, despite all their advantages, sometimes LLMs can be misleading!
In this topic, you'll learn how and when they tend to provide incorrect information and struggle to understand complex situations. You'll also learn how these limitations affect their practical usage.
Knowledge cut-offs
Large language models like GPT-3 and Claude have a knowledge cut-off, meaning they're unaware of events after their last training update. This is a challenge in software development, where new syntax, frameworks, and libraries evolve rapidly, and outdated code can lead to errors.
GPT-4, available in OpenAI's paid ChatGPT subscription, addressed this problem by introducing the 'Browse with Bing' feature, which allows the model to search the web directly and answer questions that require more recent information. With the release of GPT-4o, web search is included by default, giving users access to more current data. Other AI assistants, such as Perplexity AI and Google Gemini, can also retrieve up-to-date information via web search.
Hallucinations
Hallucination occurs when a language model confidently gives incorrect or fabricated answers to your prompts. For example, in the prompt snippet below, you may ask whether the Eigen library implements the vector cross product for complex numbers:
The model may claim that no such implementation exists. However, after looking at the Eigen documentation, it turns out that the vector cross product for complex numbers is in fact implemented, which shows that LLMs can hallucinate.
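One practical way to double-check such an answer is to try the API yourself. Below is a minimal C++ sketch (the vectors are made up purely for illustration) that calls Eigen's cross() on complex-valued 3D vectors; the fact that it compiles and runs supports the documentation rather than the hallucinated answer:

```cpp
#include <complex>
#include <iostream>
#include <Eigen/Dense>  // includes the Geometry module, which provides cross()

int main() {
    // Vector3cd is Eigen's typedef for a 3D vector of std::complex<double>
    Eigen::Vector3cd a, b;
    a << std::complex<double>(1, 2), std::complex<double>(0, 1), std::complex<double>(3, 0);
    b << std::complex<double>(2, 0), std::complex<double>(1, -1), std::complex<double>(0, 4);

    // cross() is available for complex-valued 3D vectors as well
    Eigen::Vector3cd c = a.cross(b);
    std::cout << "a x b =\n" << c << std::endl;
}
```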
Relying blindly on information from AI models can lead to inaccuracies, which in turn result in misguided decisions or misunderstandings of critical insights. Since AI is used extensively across many sectors and AI-generated content is increasingly prevalent, it is important to verify the accuracy of information manually.
Cross-verification and fact-checking help to catch hallucinations by comparing the results from AI with authoritative sources. Remember, this ensures that the information is trustworthy and keeps you from building on erroneous foundations!
Input and output limitations
The model derives what to generate from the input it receives. To process that input, LLMs rely on tokenization: the input text is divided into individual units known as tokens in a way that allows the model to understand and process it effectively.
In the following example, each word and each comma is counted as a single token, while the character count also includes the spaces between them.
Below are the rough token estimates provided by OpenAI (a quick sanity-check sketch follows the list):
1 token is approximately equivalent to 4 characters in English
1 token is approximately equivalent to ¾ of a word
100 tokens is approximately equivalent to 75 words
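These ratios are only heuristics, but they are handy for a quick check before sending a prompt. The sketch below (an illustration, not an exact tokenizer) estimates the token count of an English text from its character count using the ~4 characters per token rule:

```cpp
#include <iostream>
#include <string>

// Rough estimate only: real tokenizers split text by learned rules,
// so the actual count can differ noticeably.
std::size_t estimate_tokens(const std::string& text) {
    // ~4 characters per English token, spaces and punctuation included
    return (text.size() + 3) / 4;
}

int main() {
    std::string prompt = "Large language models split input text into tokens before processing it.";
    std::cout << "Characters: " << prompt.size() << "\n"
              << "Estimated tokens: " << estimate_tokens(prompt) << std::endl;
}
```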
The limit on the number of input tokens caps how much information can be passed to the LLM at once. As a result, the model generates output within a confined range, which helps keep it accurate with respect to the provided input. GPT-3.5 has a context window of 4,096 tokens, whereas GPT-4 has a context window of 8,192 tokens. GPT-4o offers a context window of 128k tokens, which allows a much larger input context.
Note: The pricing and token policies of models are subject to change due to the ongoing development of AI technologies. You are advised to check the latest information and pricing details on the official OpenAI pricing page.
LLMs cannot work effectively if the input exceeds their token limit, because earlier information falls outside the context window, which hurts the coherence of the response. If you want to know more about tokens and how to count them, you can read OpenAI's official documentation.
LLMs depend on the context window to generate output coherently. They have a 'short-term memory', meaning they only retain the context that lies within the current context window, so users have to remind them of important facts from previous prompts.
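To illustrate this 'short-term memory', here is a small hypothetical sketch (the messages and the token budget are made up) that keeps a chat history within a fixed budget by dropping the oldest messages first, using the same rough character-based estimate as above:

```cpp
#include <deque>
#include <iostream>
#include <string>

// Rough character-based token estimate (~4 characters per token).
std::size_t estimate_tokens(const std::string& text) {
    return (text.size() + 3) / 4;
}

// Drop the oldest messages until the whole history fits the budget,
// mimicking how content outside the context window is "forgotten".
void trim_history(std::deque<std::string>& history, std::size_t token_budget) {
    std::size_t total = 0;
    for (const auto& msg : history) total += estimate_tokens(msg);
    while (total > token_budget && !history.empty()) {
        total -= estimate_tokens(history.front());
        history.pop_front();
    }
}

int main() {
    std::deque<std::string> history = {
        "User: My project uses Eigen 3.4 and C++17.",            // important fact
        "Assistant: Noted, I'll keep that in mind.",
        "User: Now explain cross products for complex vectors."
    };
    trim_history(history, 20);  // tiny budget to force trimming
    // The oldest messages are gone, so the model would no longer
    // "remember" the project details unless the user repeats them.
    for (const auto& msg : history) std::cout << msg << "\n";
}
```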
Different language models count tokens in their own specific ways. So, to get a rough token estimate and make sure that your input lies within the limit, you can use OpenAI's "Tokenizer" tool to calculate the number of tokens in the prompt you are about to send to the LLM.
Conclusion
To use LLMs effectively, you have to be aware of their advantages and disadvantages when making decisions. It is advised to watch the number of input tokens you provide to the model, because exceeding the token limit will lead to truncated outputs.
Although LLMs mostly generate logical responses, the facts must still be cross-verified due to challenges like hallucinations and knowledge cut-offs. Keeping these few simple rules in mind might save you lots of time and effort when working with generative AI!