
WordNet


You have probably used dictionaries, glossaries, and thesauri. WordNet is similar to them in many ways, and it is a valuable tool in NLP: just as you use a dictionary to learn new words in a foreign language, NLP systems use WordNet to look up what words mean and how they relate to each other. WordNet is also one of the standard resources for automatically telling apart the different senses of a word, which is why it is worth studying.

What is WordNet?

WordNet is a lexical database that describes the semantic relationships between words. Work on the first WordNet, for English, began at Princeton University in 1985. It was an innovative project at the time, and many researchers took up the idea of building their own WordNets, both for other languages and for specific domains (for example, BioWordNet, a biomedical extension). Today, WordNets exist for most widely spoken languages; some of them are listed below:

| WordNet | Language | Link |
|---|---|---|
| Arabic WordNet | Arabic | Website to download; Application of AWN |
| GermaNet | German | Official website |
| EuroWordNet | Dutch, Danish, Italian, Spanish, German, French, Czech, Russian, Portuguese, and some others | Official website (archived) |
| IndoWordNet | Sanskrit, Hindi, Nepali, and 15 other Indian languages | Official website |
| Open Dutch WordNet | Dutch | Official website |
| plWordNet | Polish | Official website (in Polish) |
| PolNet | Polish | Official website |
| RussNet | Russian | Official website |
| Yet Another RussNet | Russian | Official GitHub page |
| WordNet Libre du Français | French | Official website |
| Chinese Open WordNet | Mandarin Chinese | Official website |
| Persian WordNet | Persian (Farsi) | Download link |
| Thai WordNet | Thai | Download link |
| Japanese WordNet | Japanese | Download link |
| BalkaNet | Bulgarian, Czech, Greek, Romanian, Turkish, Serbian | Official website |
| ItalWordNet | Italian | Dataset page on "datahub" |
| Princeton WordNet | English | Official website |

Notice that some languages appear in two or more WordNets in this table, because different institutions often build their own. Czech, for example, is covered by both BalkaNet and EuroWordNet, and it may appear in other WordNets as well; Polish and Russian each have two separate projects. The table is not exhaustive either: the Institute of Linguistics at Academia Sinica (Taiwan), for instance, has built its own WordNet for Chinese.

WordNet is useful for many NLP tasks, such as word-sense disambiguation (WSD), text classification, text summarization, and machine translation. In WSD, the goal is to decide which sense of an ambiguous word is meant in a given context, and the semantic relationships between words help with that: they show how similar two words are and where their meanings differ. Like a dictionary, WordNet lists all the senses of each word, so a WSD system can pick the one that fits the context best.
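A classic way to use such a sense list for WSD is the Lesk algorithm, which picks the sense whose dictionary gloss shares the most words with the surrounding context (NLTK ships a version of it as `nltk.wsd.lesk`). The sketch below implements the simplified variant in plain Python over a tiny, hypothetical two-sense inventory for "bank"; the labels, glosses, and stopword list are illustrative assumptions, and in a real system the glosses would come from WordNet synset definitions.

```python
from __future__ import annotations

# Hypothetical toy sense inventory: sense label -> gloss. In practice the
# glosses would be WordNet synset definitions.
TOY_SENSES = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water such as a river",
        "bank.n.02": "a financial institution that accepts deposits and lends money",
    }
}

# A tiny, illustrative stopword list so function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "that", "on", "in",
             "i", "my", "such", "as", "beside"}


def tokenize(text: str) -> set[str]:
    """Lowercase, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(word: str, sentence: str) -> str | None:
    """Return the sense whose gloss shares the most words with the sentence."""
    context = tokenize(sentence) - {word.lower()}
    best_sense, best_overlap = None, 0
    for sense, gloss in TOY_SENSES.get(word, {}).items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:  # ties keep the earlier (lower-numbered) sense
            best_sense, best_overlap = sense, overlap
    return best_sense  # None if no gloss overlaps the context


print(simplified_lesk("bank", "I sat on the bank of the river and watched the water"))  # bank.n.01
print(simplified_lesk("bank", "The bank lends money to local firms"))                    # bank.n.02
```

Counting raw word overlap is the whole idea of simplified Lesk; it is cheap but sensitive to the exact wording of the glosses, which is why richer WordNet-based methods also use the relations described next.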

Semantic relationships

To grasp the basic concept of WordNet, let's discuss semantic relationships. WordNet covers the relationships between nouns, verbs, adjectives, and adverbs. The most common ones are:

  • Synonyms. These are words of similar meaning. Examples: "precise" - "accurate" - "particular";

  • Antonyms. Words with opposite meanings. Examples: "adored" - "hated", "naughty" - "obedient";

  • Hyponyms. A hyponym is a more specific word whose meaning falls under a more general one: every elephant is a mammal, but not every mammal is an elephant, so the inclusion goes only one way. Examples: "elephant" is a hyponym of "mammal", "Sun" is a hyponym of "star";

  • Hypernyms. The opposite of hyponyms: "mammal" is a hypernym of "elephant"; "star" is a hypernym of "Sun";

  • Meronyms. A meronym denotes a part of a whole. Examples: "screen" is a meronym of "laptop", "engine" is a meronym of "car";

  • Holonyms. The opposite of meronyms. Examples: "laptop" is a holonym of "screen" and "keyboard"; "car" is a holonym of "engine" and "wheel" (in both its senses);

  • Coordinate terms. Two words are coordinate terms if they have the same hypernym. Example: "elephant" and "giraffe" share the hypernym "mammal", so they are coordinate terms.
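To make these relations concrete, here is a minimal sketch that stores the examples above in plain Python dictionaries. The tables are a hypothetical, hand-made stand-in for WordNet's data (which links synsets rather than plain strings); the point is that each "opposite" relation is just the inverse lookup, and coordinate terms fall out of the hypernym table.

```python
# Hypothetical hand-made relation tables mirroring the examples above.
HYPERNYM = {"elephant": "mammal", "giraffe": "mammal", "Sun": "star"}
PART_OF = {"screen": "laptop", "keyboard": "laptop", "engine": "car", "wheel": "car"}


def hyponyms(word):
    """Inverse of the hypernym relation: words whose hypernym is `word`."""
    return sorted(w for w, h in HYPERNYM.items() if h == word)


def meronyms(whole):
    """Parts recorded for `whole`."""
    return sorted(p for p, w in PART_OF.items() if w == whole)


def holonym(part):
    """Inverse of meronymy: the whole that `part` belongs to (or None)."""
    return PART_OF.get(part)


def coordinate_terms(word):
    """Words sharing the hypernym of `word`, excluding `word` itself."""
    parent = HYPERNYM.get(word)
    if parent is None:
        return []
    return [w for w in hyponyms(parent) if w != word]


print(hyponyms("mammal"))            # ['elephant', 'giraffe']
print(meronyms("laptop"))            # ['keyboard', 'screen']
print(holonym("engine"))             # car
print(coordinate_terms("elephant"))  # ['giraffe']
```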

WordNet structure

WordNet is organized into hierarchies whose nodes are synsets (synonym sets). The main hierarchy is built on the hypernym-hyponym ("is-a") relationship.

A synset is a set of synonyms that together express one concept. Relations such as hypernymy hold between synsets, so all the words in a synset share the same hypernyms.

In the example below, you can see a segment of Princeton WordNet. Numbers indicate the level of each synset within this segment: 1 is the highest and 9 the lowest. Words in a synset are separated by /, and --> means that the synsets indented under it are its hyponyms:

1. creation  -->
  2. product/production  -->
    3. work/piece_of_work  -->
      4. publication  -->
        5.  book  -->
          6.  reference_book/reference/reference_work/book_of_facts  -->
            7. cookbook
            7. instruction_book
            7. word_book  -->
              8. dictionary/lexicon  -->
                9. bilingual_dictionary
                9. etymological_dictionary
                9. learner's_dictionary
                9. pocket_dictionary
              8. vocabulary
              8. glossary
              8. thesaurus  -->
                9. word_finder
            7. handbook/enchiridion/vade_mecum -->
              8. manual -->
                9. reference_manual
              8. bible  # meaning: a book regarded as authoritative in its field
              8. guidebook

According to this schema, we can make the following statements:

  1. creation is an inherited hypernym of guidebook;

  2. publication is a direct hypernym of book;

  3. dictionary and lexicon are synonyms and represent one common synset.
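These statements can be checked mechanically. The plain-Python sketch below encodes the book branch of the diagram as a child-to-parent mapping, naming each synset by its first word (a simplification: real WordNet links synset objects, and "work" stands for the work/piece_of_work synset). It treats an inherited hypernym as one reached through two or more hypernym links, which is our reading of the term.

```python
# The book branch of the diagram above, as child -> direct hypernym.
PARENT = {
    "product": "creation",
    "work": "product",
    "publication": "work",
    "book": "publication",
    "reference_book": "book",
    "handbook": "reference_book",
    "guidebook": "handbook",
    "word_book": "reference_book",
    "dictionary": "word_book",
}

# Synsets as sets of synonymous words (only the one needed here).
SYNSETS = {"dictionary": {"dictionary", "lexicon"}}


def hypernym_chain(word):
    """All hypernyms of `word`, from the direct one up to the top of the branch."""
    chain = []
    while word in PARENT:
        word = PARENT[word]
        chain.append(word)
    return chain


def is_direct_hypernym(upper, lower):
    return PARENT.get(lower) == upper


def is_inherited_hypernym(upper, lower):
    # Reachable through two or more hypernym links (skip the direct one).
    return upper in hypernym_chain(lower)[1:]


print(hypernym_chain("guidebook"))
# ['handbook', 'reference_book', 'book', 'publication', 'work', 'product', 'creation']
print(is_inherited_hypernym("creation", "guidebook"))  # True  (statement 1)
print(is_direct_hypernym("publication", "book"))      # True  (statement 2)
print("lexicon" in SYNSETS["dictionary"])             # True  (statement 3)
```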

Our example does not show meronyms/holonyms, so it is not the complete structure.

Because a word can have several senses, WordNet labels each sense with the word, its part of speech, and a sense number. For example, book.v.01 refers to the first sense of book as a verb, and book.n.05 to its fifth sense as a noun. The sense numbers come from WordNet itself, not from an external dictionary: senses are ordered roughly by how often they were tagged in a sense-annotated corpus, so the first sense is usually the most common one.
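Since the label format is fixed (lemma, part-of-speech letter, zero-padded sense number), it is easy to take apart. Below is a minimal plain-Python sketch; parse_synset_label is a hypothetical helper, and the letter codes are the ones WordNet and NLTK use (n, v, a, s for adjective satellites, r for adverbs). In NLTK itself you would simply pass such a label to wn.synset().

```python
POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective",
             "s": "adjective satellite", "r": "adverb"}


def parse_synset_label(label):
    """Split a label such as 'book.n.05' into (lemma, part of speech, sense number).

    rsplit from the right, so a lemma that itself contains dots stays intact.
    """
    lemma, pos, number = label.rsplit(".", 2)
    return lemma, POS_NAMES[pos], int(number)


print(parse_synset_label("book.v.01"))  # ('book', 'verb', 1)
print(parse_synset_label("book.n.05"))  # ('book', 'noun', 5)
```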

WordNet Search 3.1

WordNet Search 3.1 is available on the official website of Princeton University.

You can search for any English word. Let's try book:

[Screenshot: the WordNet Search 3.1 page with "book" typed into the search bar]

After you press Search WordNet, the site shows a typical dictionary entry with many definitions of the word book. We will work with the first definition in the Noun section.

If you click the S: link in front of this definition, a subsection with five lines appears:

[Screenshot: the output of the search query "book"]

From there, you can choose which relations to display: hyponyms, meronyms, and so on.

[Screenshot: the output of the search query "book" with "direct hyponyms" chosen]

The screenshot above shows all direct and inherited hyponyms of book. If you pick a word from this list (for example, crammer), you can either click the word itself to open its dictionary entry or click S: in front of it to browse its hypernyms and other relations.

WordNet in NLTK

Princeton WordNet (version 3.0) is available in NLTK as the wordnet corpus.

Let's look up the synsets that contain the word book. wn.synsets() returns them as a Python list, so we can pick one by index:

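import nltk
from nltk.corpus import wordnet as wn

# One-time download of Princeton WordNet and the Open Multilingual Wordnet data
nltk.download('wordnet')
nltk.download('omw-1.4')

# wn.synsets() returns all synsets containing "book"; print the second one
print(wn.synsets('book')[1])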


#  Synset('book.n.02')

Now we know that there is a synset labeled book.n.02. Let's look at the definition of the noun book in its second sense, its example(s), and all the words in this synset:

synset = wn.synset('book.n.02')
print(f"{synset}\n")

print('Definition:  ', synset.definition())
print('Examples: ', synset.examples()[0])  # first usage example
print('Synset name:  ', synset.name())

print('\nWords in synset: ')
# start=1 numbers the words from 1, matching the output below
for num, lemma in enumerate(synset.lemmas(), start=1):
    print(f"Word №{num}  - ", lemma.name())


#  Synset('book.n.02')

#  Definition:   physical objects consisting of a number of pages bound together
#  Examples:  he used a large book as a doorstop
#  Synset name:   book.n.02

#  Words in synset: 
#  Word №1  -  book
#  Word №2  -  volume

Be careful with the .examples() method: some synsets have no usage examples. In that case, it returns an empty list, and indexing it with [0], as in the code above, raises an IndexError. For example, the noun book in its fourth sense has no examples:

len(wn.synset('book.n.04').examples())

# 0

You can also check the direct hypernyms and hyponyms of this synset:

print("Hypernyms :  ", wn.synset('book.n.02').hypernyms())
print("Hyponyms :  ", wn.synset('book.n.02').hyponyms())


# Hypernyms :   [Synset('product.n.02')]
# Hyponyms :   [Synset('album.n.02'), Synset('coffee-table_book.n.01'), Synset('folio.n.03'), Synset('hardback.n.01'), Synset('journal.n.04'), Synset('notebook.n.01'), Synset('novel.n.02'), Synset('order_book.n.02'), Synset('paperback_book.n.01'), Synset('picture_book.n.01'), Synset('sketchbook.n.01')]

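In the original design of WordNet, the noun hierarchy was divided into about 25 top-level categories, sometimes called unique beginners. In WordNet 3.0, the version bundled with NLTK, all nouns are joined under a single root, entity.n.01, while verbs still have many separate top-level synsets. The synsets at the very top of a hierarchy are called root hypernyms, and NLTK can return them for any synset: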

print("Top hypernym :  ", wn.synset('book.n.05').root_hypernyms())


#  Top hypernym :   [Synset('entity.n.01')]

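You can also look up antonyms, for example for adjectives. Antonymy in WordNet is a relation between word forms (lemmas) rather than whole synsets, so you call .antonyms() on a Lemma object. Note that antonyms are recorded for only a relatively small number of words: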

print("Antonyms to strange:  ", wn.synset('strange.a.01').lemmas()[0].antonyms())


#  Antonyms to strange:   [Lemma('familiar.a.02.familiar')]

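Through the Open Multilingual Wordnet (the omw-1.4 data downloaded earlier), NLTK links English synsets to words in many other languages, including Japanese, French, and Arabic; wn.langs() returns the codes of the available languages. Let's translate the noun book in its first sense into Japanese: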

print(wn.synset('book.n.01').lemma_names('jpn'))


#  ['ご本', 'ブック', '単行本', '図書', '巻', '巻帙', '御本', '教科書', '書', '書典', '書冊', '書史', '書巻', '書帙', '書物', '書籍', '書誌', '本', '竹帛', '篇帙', '篇章', '編章', '著', '著作', '著作物', '著書', '読みもの', '述作', '韋編']

The output is the list of Japanese lemmas linked to the English synset. To pick a single word, index the list, for example with [0].

Conclusion

In this topic, we discussed:

  • The idea of WordNet and its structure;

  • How to use WordNet Search 3.1;

  • How to deal with WordNet in NLTK.

We have also discussed linguistic concepts such as hypernym, hyponym, meronym, and holonym. They helped us understand the usefulness of WordNet.


You can find more on this topic in Mastering Stemming and Lemmatization on Hyperskill Blog.
