
Introduction to dialogue systems


Dialogue systems are one of the most interesting areas in NLP. Of course, you have already met them: dialogue systems include Siri and Cortana, various assistants on store websites, and many others. Let's learn more about them!

What is a dialogue system?

Dialogue systems, or conversational agents, are programs that allow a person and a computer to communicate in a way that resembles a conversation between two people. Communication takes place as follows: a person asks a question or gives a command, and the program answers the question or carries out the command.

Dialogue systems can be hosted on websites, in applications, or on social networks. The hosting platform determines how the dialogue starts. Sometimes the user must enter a special command, such as start, for the dialogue system to send the start message. On some platforms, the dialogue system itself starts the conversation by sending the start message as soon as the dialogue is opened.

The most important part of building a dialogue system is planning and writing a scenario. The scenario is a plan of what the dialogue system will do during the dialogue and which action it will perform after each user action. For example, the system can send a welcome message when the user enters the word "hello", or search the Internet for information about kittens and send it to the user in response to the query "who are the kittens?".
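
A scenario can be pictured as a table that maps expected user inputs to actions. The Python sketch below is only an illustration: the names `SCENARIO`, `send_welcome`, `search_kittens`, and `handle` are hypothetical, and the Internet search is replaced by a stub string.

```python
from typing import Callable, Dict


def send_welcome() -> str:
    # Action for the "hello" trigger.
    return "Welcome! How can I help you?"


def search_kittens() -> str:
    # Stub: a real system would query a search engine here.
    return "Here is what I found about kittens."


# The scenario: each expected user input is mapped to one action.
SCENARIO: Dict[str, Callable[[], str]] = {
    "hello": send_welcome,
    "who are the kittens?": search_kittens,
}


def handle(user_input: str) -> str:
    # Normalize the input so that " Hello " and "hello" hit the same trigger.
    action = SCENARIO.get(user_input.strip().lower())
    if action is None:
        return "Sorry, I don't know how to respond to that."
    return action()
```

Calling `handle("Hello")` runs the welcome action, while any input missing from the table gets the fallback reply. Exact string matching is used only to keep the sketch short.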

How the dialogue ends depends on the scenario: for example, the user can explicitly say goodbye to the dialogue system, and it will stop working. Alternatively, the dialogue can continue indefinitely, in which case the user can end it only by closing the page or application. This is a poor option because every open session keeps consuming memory.
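
Both ways of ending a dialogue can be sketched in a few lines of Python. Everything here is hypothetical: `run_dialog`, `GOODBYE_WORDS`, and `max_turns` are illustrative names, and the turn limit is just one assumed way to stop a session from running forever, not something a particular platform prescribes.

```python
from typing import Iterable, List

GOODBYE_WORDS = {"bye", "goodbye"}  # assumed farewell phrases


def run_dialog(user_messages: Iterable[str], max_turns: int = 50) -> List[str]:
    """Reply to messages until the user says goodbye or the turn limit is hit."""
    replies: List[str] = []
    for turn, message in enumerate(user_messages, start=1):
        if message.strip().lower() in GOODBYE_WORDS:
            replies.append("Goodbye!")
            break  # explicit end: the session can release its resources
        if turn > max_turns:
            replies.append("This session has timed out.")
            break  # safety net against an endless dialogue
        replies.append(f"You said: {message}")
    return replies
```

For instance, `run_dialog(["hi", "bye", "more"])` stops at the farewell and never processes the third message.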

There are some differences between human-human and human-computer communication. Computers don't truly understand human speech; they can only "talk" by using a set of NLP tools, so conversational agents have serious restrictions. They can't keep up a conversation on any topic, only on the topics covered by their databases and algorithms.

Let's learn more about what dialogue systems can do.

Chatbots and task-oriented dialogue systems

Dialogue systems differ considerably in functionality. They are divided into two types:

Task-oriented dialogue agents are tools for performing specific tasks, such as booking a hotel room, setting an alarm for 9 o'clock, or finding information on the Internet. They are used in digital assistants, commerce, entertainment, and many other areas.
Chatbots are used for chit-chat, so they don't perform any tasks. But they can also be useful: for example, people learning a foreign language can practice it with chatbots. The term "chatbot" is often used as a synonym for "dialogue system", but the two are not the same: a chatbot is one kind of dialogue system.

Dialogue systems are divided not only by functionality but also by the method of communication, which affects their implementation. You'll read more about it in the next section.

Dialogue system content: audio vs. text

There are two types of conversational agents in terms of communication channels: audio- and text-based dialogue systems.

You may encounter audio dialogue systems when you call a company that uses conversational agents to answer the most common questions. Digital assistants also use audio content. Audio dialogue systems can receive a query from users in two ways:

By pressing buttons: the dialogue system offers a menu, such as "If you want to know your card balance, press one", and so on. Such dialogue systems are simpler to implement, but they can be inconvenient because the menu may not contain the question the user needs. Another disadvantage is that the user spends a lot of time listening to the menu.
By the user's speech: these dialogue systems include speech-processing tools in their architecture. This approach is more complex, so such systems are harder to implement. It is convenient because the dialogue system can answer the question immediately instead of running through the entire menu. However, speech recognition is still imperfect, so recognition errors are common.
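
The button-driven variant is simple to implement because each key maps directly to one topic. Here is a minimal Python sketch, assuming a hypothetical `MENU` table and hypothetical `menu_prompt` and `handle_key` functions; the "opening hours" entry is made up for illustration.

```python
MENU = {
    "1": "card balance",   # option from the example menu prompt
    "2": "opening hours",  # hypothetical extra option
}


def menu_prompt() -> str:
    # The whole menu is read out, so the user has to listen to every option.
    return " ".join(f"For {topic}, press {key}." for key, topic in MENU.items())


def handle_key(key: str) -> str:
    # A key outside the menu cannot be answered at all.
    topic = MENU.get(key)
    if topic is None:
        return "Sorry, this option is not available."
    return f"Here is the information about {topic}."
```

The sketch also makes both drawbacks visible: the prompt grows with every added option, and a question that is missing from `MENU` cannot be asked.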

Audio-based conversational systems are convenient because you do not need to spend time typing a query. But they have a significant restriction: they can answer only by voice and can't use URLs, images, or videos.

Text-based dialogue systems are so called because users usually type their queries. They can use various types of content, not only text: images, videos, and URLs. They can be placed on websites or social networks, which makes them convenient for users. Some text-based dialogue systems accept user input not only as text but also through buttons. Because text-based dialogue systems have more implementation options and more functionality, they are more popular than audio-based ones.
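
A reply from a text-based system can be modeled as a small data structure that carries more than plain text. The following Python sketch is an assumption for illustration only: the `BotMessage` class and its fields are hypothetical and do not correspond to any particular messaging platform's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BotMessage:
    text: str
    image_url: Optional[str] = None                   # optional picture
    link_url: Optional[str] = None                    # optional URL
    buttons: List[str] = field(default_factory=list)  # quick-reply buttons

    def content_types(self) -> List[str]:
        # List the kinds of content this message carries.
        types = ["text"]
        if self.image_url:
            types.append("image")
        if self.link_url:
            types.append("url")
        if self.buttons:
            types.append("buttons")
        return types
```

A message such as `BotMessage("Choose a topic", buttons=["Book a ticket", "Flight rules"])` combines text with buttons, which a voice-only channel cannot do.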

In order to understand which system to choose, you need to think about the necessary functionality and existing restrictions of these types of dialogue systems.

Now you know a lot about different types of conversational agents. But how does such a program conduct a dialogue? Let's take a look at the standard algorithm.

General algorithm

Dialogue systems follow a standard algorithm. The description below leaves out features that are specific to particular types of dialogue systems; its purpose is to show the general structure of any dialogue system. Let's take a look at it step by step!

1. The dialogue system greets the user and describes its functions. This is an important step because it lets the user know what commands they can give or what questions they can ask. Dialogue systems usually have a name and an avatar to make the dialogue more user-friendly.
For example, you visit an airline site and see a pop-up window with a dialogue system. The window contains a message from the dialogue system: "Hello! I'm Leya, The Great Great Airlines' virtual assistant. I can show the takeoff and landing boards from any airport, book tickets, talk about flight rules, and many other things. Tell me, how can I help you?"
2. The user gives the system some information: asks a question or gives an instruction. There are different ways to do it: entering text, pressing a button, or speaking. The goal expressed by this query is called the intent.
Let's say you want to know this airline's conditions for transporting animals. So you enter the text: "Tell me how I can fly with my cat to San Francisco".
3. The system processes the query and tries to assign its intent to one of the categories that the dialogue system is able to answer. It's common practice in implementing conversational agents to create an "other" category, which catches the queries that don't belong to any category. In that case, the dialogue system says something like "Sorry, I can't get your question. Could you phrase it differently?" or hands the dialogue over to a human assistant if possible.
Luckily, in our example, the dialogue system identified the category correctly! It decided that your intent belongs to the "Rules for the transport of animals" category.
4. Once the program has identified the category of the query, it gets the corresponding instructions and acts on them: performs some action, returns an answer to the user, or does both. It's common practice to have several answers to the same question so as not to repeat one phrase all the time.
For example, Leya's dialogue system has the following answer variants:
a. "The cat during the flight must be in a carrier measuring 60*40*40 cm. Additional fee for an animal is $50."
b. "There are two simple rules: firstly, your cat during the flight must be in a carrier (60*40*40 cm). Secondly, you have to pay an extra $50 for bringing a pet."
The algorithm chose the second answer, so you see this message in the dialogue window.
5. After that, the program may ask the user if they want to do something else and wait for another query. If the user enters another query, the algorithm repeats.
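
The steps above can be sketched in Python as follows. This is a simplified illustration, not a production approach: categories are matched by keywords, whereas real systems usually rely on a trained classifier. The names `CATEGORIES`, `ANSWERS`, `classify`, and `respond` are hypothetical, and the keyword lists and the "Booking tickets" category are assumptions made for the example.

```python
import random
import re
from typing import Dict, List, Optional

ANIMALS = "Rules for the transport of animals"

# Assumed keyword lists; a real system would typically use a trained classifier.
CATEGORIES: Dict[str, List[str]] = {
    ANIMALS: ["cat", "pet", "animal"],
    "Booking tickets": ["book", "ticket"],  # hypothetical second category
}

# Several answer variants per category, so the reply is not always the same.
ANSWERS: Dict[str, List[str]] = {
    ANIMALS: [
        "The cat during the flight must be in a carrier measuring 60*40*40 cm. "
        "Additional fee for an animal is $50.",
        "There are two simple rules: firstly, your cat during the flight must be "
        "in a carrier (60*40*40 cm). Secondly, you have to pay an extra $50 for "
        "bringing a pet.",
    ],
    "Booking tickets": ["Sure, where would you like to fly?"],  # made-up reply
    "other": ["Sorry, I can't get your question. Could you phrase it differently?"],
}


def classify(query: str) -> str:
    """Return the first category whose keywords occur in the query, else "other"."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    for category, keywords in CATEGORIES.items():
        if words & set(keywords):
            return category
    return "other"


def respond(query: str, rng: Optional[random.Random] = None) -> str:
    """Pick one of the answer variants for the query's category."""
    chooser = rng or random
    return chooser.choice(ANSWERS[classify(query)])
```

With these assumptions, the query "Tell me how I can fly with my cat to San Francisco" falls into the animal transport category, and `respond` returns one of its two variants at random. A query with no known keyword lands in "other" and triggers the rephrasing request.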

You'll take a look at specific features and approaches to creating dialogue systems in the following topics.

Conclusion

Conversational agents can be used in a wide variety of fields, which is truly impressive! They have great advantages: they can help optimize many processes and are available at any time.

This topic has taught you the following:

the definition of dialogue systems;
different types of dialogue systems (audio vs. text, task-oriented vs. chatbots);
how dialogue systems process queries.

We'll continue to discuss the ways to implement the dialogue systems in the following topics, but now let's do some practice!
