
Like most people around the world, you probably interact with computers and mobile devices every day. Have you ever wondered how these machines manage to understand you? You may be a computer engineer, but most users are not, so they usually communicate with machines in their own language rather than in code. If you are curious how this is done, get ready to peek under the hood and explore the world of language technology.

Natural language

Apart from user-friendly interfaces, there is another important ingredient: computers need to understand human language. Scientists call our languages natural, as opposed to constructed languages such as Esperanto or Interlingua, and to formal languages, which include programming languages.

Natural languages were not designed on purpose by anyone. They evolved in human communities. The rules of our speech and writing are not fixed by law; they work more like a social contract. We acquire our native language as naturally as we learn to walk. No wonder it is difficult for computers to analyze "natural" texts: even we humans misunderstand each other. Understanding language requires a complex approach.

Figure: Indo-European language tree

Indeed, current language technologies are interdisciplinary: they draw on linguistics, psychology, computer science, machine learning, and even ethics. Let's move on to a slightly more formal introduction.

Natural language processing

Natural language processing, or NLP for short, is a branch of artificial intelligence that deals with the interaction between computers and human languages. It involves processing large amounts of text. Texts are usually gathered into a special collection called a corpus (plural: corpora). Text corpora are convenient to work with, but if you want to process language efficiently, it is crucial to know how language itself is organized.
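A first step with almost any corpus is to split its texts into words (tokens) and count them. The minimal Python sketch below does this for a tiny, hypothetical three-document corpus; the regular expression that keeps only lowercase letters is a simplifying assumption, and real tokenizers handle numbers, apostrophes, and other languages far more carefully.

```python
import re
from collections import Counter

# A toy corpus: a list of short documents (hypothetical example data).
corpus = [
    "Computers need to understand human language.",
    "Human language evolved in human communities.",
    "A corpus is a collection of texts.",
]

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Count how often each word occurs across the whole corpus.
frequencies = Counter(token for doc in corpus for token in tokenize(doc))

# "human" is the most frequent word in this toy corpus (3 occurrences).
print(frequencies.most_common(3))
```

Even this crude count hints at structure: frequent function words such as "a" sit next to content words such as "human" and "language", which is why later processing steps often need linguistic knowledge, not just counting.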

People who speak the same language share a common linguistic background. Computers lack it: they do not understand the words, grammatical rules, and connotations we choose. Natural languages have many levels of structure, such as morphology, syntax, and semantics. Analyzing a text at multiple levels reveals its internal structure, and this helps us solve various real-life problems.

Main applications

In NLP, we use computers to solve language-related tasks, and the field has already shown great promise.

Conversational agents from sci-fi books have made their way into reality. Nowadays, they converse with us to guide, train, support, or simply entertain us. Communication with a dialogue system usually consists of several stages. First, the program recognizes human speech, which can be difficult because of different accents, speech impairments, and individual pronunciation. The transcribed text is then analyzed and given a semantic representation so that the computer can understand it and respond. The last stage is speech synthesis, which turns the reply back into speech. It is an engaging field indeed, as diverse as human beings themselves.

Figure: conversational agents from sci-fi books

The architecture of a dialogue system may also include natural language generation. At this step, the program selects which parts of the data to focus on, structures the content, and puts the ideas into words. You can generate replies to keep the conversation flowing, or even work with non-textual data, for example, to describe what is in a picture. This task is called image captioning.
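The three generation steps mentioned above (selecting content, structuring it, and putting it into words) can be illustrated with a minimal template-based sketch in Python. The weather record, its field names, and the templates are all hypothetical, and template filling is only one simple way to do generation; many modern systems use neural models instead.

```python
# Hypothetical structured data a dialogue system might hold.
record = {"city": "Oslo", "temp_c": -3, "condition": "snow", "humidity": 81, "wind_kmh": 12}

# Hypothetical templates, one per type of fact.
TEMPLATES = {
    "condition": "Expect {value} in {city}",
    "temp_c": "with temperatures around {value} degrees Celsius",
}

def select_content(data: dict) -> dict:
    """Content selection: keep only the facts worth mentioning in a short reply."""
    return {key: data[key] for key in ("city", "condition", "temp_c")}

def order_facts(content: dict) -> list[str]:
    """Structuring: decide the order in which facts are presented."""
    return ["condition", "temp_c"]

def realize(content: dict, order: list[str]) -> str:
    """Realization: fill a template for each fact and join the clauses."""
    clauses = [
        TEMPLATES[key].format(value=content[key], city=content["city"])
        for key in order
    ]
    return ", ".join(clauses) + "."

content = select_content(record)
print(realize(content, order_facts(content)))
# Expect snow in Oslo, with temperatures around -3 degrees Celsius.
```

Humidity and wind speed are dropped at the selection step, which shows that deciding what *not* to say is part of generation too.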

Spell checkers and writing assistants may seem a bit down-to-earth, but they certainly make your life easier. They can correct your misspellings, check grammar, catch punctuation mistakes, and spot other issues. Many of these tools rely on statistical patterns found in a language to improve your writing.
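To see how statistical patterns can drive spelling correction, here is a minimal Python sketch in the spirit of Peter Norvig's well-known toy spelling corrector: it generates every string one edit away from an unknown word and picks the candidate that occurs most often in a reference text. The tiny reference text is hypothetical, and the sketch handles only single edits; real checkers use large corpora and richer error models.

```python
import re
from collections import Counter

# Hypothetical reference text; a real checker would use a large corpus.
TEXT = """the cat sat on the mat the cat likes the warm sun
a cat and a hat sat in the sun"""

WORD_COUNTS = Counter(re.findall(r"[a-z]+", TEXT.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word: str) -> str:
    """Return a known word as is, else its most frequent known neighbour."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word  # no known candidate: leave the word unchanged
    # The statistical part: prefer the candidate seen most often.
    return max(candidates, key=WORD_COUNTS.__getitem__)

print(correct("teh"))  # the
print(correct("dat"))  # cat
```

For "dat", several known words are one edit away ("cat", "sat", "mat", "hat"); the corrector chooses "cat" only because it is the most frequent of them in the reference text.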

Machine translation can be vitally important in some cases: why should there be language barriers? In terms of input and output, a text is translated from a source language into a target language, though of course there are plenty of difficulties in between. Machine translation is one of the earliest challenges in computational linguistics, and it now includes a vast number of approaches and applications.

Figure: machine translation in action

Sentiment analysis and opinion mining are generally applied to short texts, for example, tweets or online reviews. The idea is to separate what is said (an opinion) from how it is said (an emotion, or polarity, such as positive or negative). This is useful for identifying trends on social media and developing new marketing strategies.
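One of the simplest ways to estimate polarity is a lexicon-based approach: give each known word a score and add the scores up. The Python sketch below assumes a tiny, hypothetical word list and a naive rule that flips the score after "not", "never", or "no"; practical systems rely on large curated lexicons or trained classifiers, and this sketch addresses polarity only, not opinion extraction.

```python
import re

# A tiny, hypothetical polarity lexicon; real lexicons contain thousands of entries.
LEXICON = {
    "good": 1, "great": 2, "love": 2, "excellent": 2,
    "bad": -1, "terrible": -2, "hate": -2, "boring": -1,
}
NEGATIONS = {"not", "never", "no"}

def polarity(text: str) -> str:
    """Label a short text positive, negative, or neutral by summing word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        # Naive negation handling: "not good" counts as negative.
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this phone, the camera is great!"))   # positive
print(polarity("The plot was not good and rather boring."))  # negative
```

Negation is just one of the ways that how something is said changes its polarity; sarcasm and context defeat a word list like this one easily.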

Text summarization produces a concise summary of a document. First, the task is to identify the most important content, the gist. Then you can either extract and combine parts of the original text (extractive summarization) or rewrite it as a brand-new text (abstractive summarization). Either way, it is a superpower in a world overloaded with data.
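The extractive route can be sketched in a few lines of Python. This minimal version assumes that sentences built from the text's most frequent content words carry the gist: it scores each sentence by the average frequency of its non-stop words and keeps the top ones in their original order. The stop-word list and the naive sentence splitter are simplifying assumptions, not a production method.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "on", "for", "with"}

def content_words(text: str) -> list[str]:
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def summarize(text: str, n_sentences: int = 2) -> str:
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average rather than sum, so long sentences are not automatically favoured.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)

text = ("Language models process text. Text processing needs clean text. "
        "Cats sleep a lot. Clean text helps language models.")
print(summarize(text, 2))
# Language models process text. Text processing needs clean text.
```

The off-topic sentence about cats scores lowest because its words appear only once, so it never makes it into the summary.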

And this is only the beginning. We will recommend some resources for further study, but first, it's time to sum up.

Conclusion

Let's highlight the main points we've discussed:

  • Natural language processing (NLP) is a branch of artificial intelligence that builds a bridge between computers and human languages.

  • This research field relies on various disciplines, such as linguistics and computer science.

  • Natural language processing tackles a wide variety of language-related problems and has many real-world applications.

Hopefully, we have introduced NLP well enough to spark your interest (that's what we aimed for!). If you are bursting with curiosity now, check out Speech and Language Processing, a brilliant book by Dan Jurafsky and James H. Martin. You can also start learning Python, a great programming language with lots of NLP tools.
