
WordNet


You have probably dealt with dictionaries, glossaries, and thesauri. WordNet is similar to them in many ways and is a valuable tool in NLP. When you learn a language, you use a dictionary to learn new words; in the same way, we use WordNet to give a computer access to word meanings. A word often has several senses, and WordNet is one of the resources that help differentiate them automatically. That's why it is worth studying.

What is WordNet?

WordNet is a lexical database that records the semantic relationships between words. The first version of WordNet was started for English in 1985 at Princeton University. At that time, it was an innovation, and many researchers were attracted by the idea of building their own WordNets: for other languages, or for specific fields, such as BioWordnet, a biomedical extension. These days, there is a WordNet for most widely spoken languages:

WordNet	Language	Link
Arabic WordNet	Arabic	Website to download, Application of AWN
GermaNet	German	Official website
EuroWordNet	Dutch, Danish, Italian, Spanish, German, French, Czech, Russian, Portuguese, and some others	Official website (archived)
IndoWordNet	Sanskrit, Hindi, Nepali, and 15 other Indian languages	Official website
Open Dutch WordNet	Dutch	Official website
PLWORDNET	Polish	Official website in Polish
PolNet	Polish	Official website
RussNet	Russian	Official website
Yet Another RussNet	Russian	Official GitHub page
WordNet Libre du Français	French	Official website
Chinese Open WordNet	Mandarin Chinese	Official website
Persian WordNet	Persian (Farsi)	Download link
Thai WordNet	Thai	Download link
Japanese WordNet	Japanese	Download link
BalkaNet	Bulgarian, Czech, Greek, Romanian, Turkish, Serbian	Official website
ItalWordNet	Italian	Dataset page on "datahub"
Princeton WordNet	English	Official website

You may notice that some languages appear in two or more WordNets in this table. That is because different institutions build their own WordNets. Czech, for example, occurs in both BalkaNet and EuroWordNet, and it may occur in other WordNets too. Such examples are abundant: the Institute of Linguistics (Academia Sinica, Taiwan) has built its own WordNet for Chinese.

WordNet is a helpful tool for many NLP tasks: word-sense disambiguation (WSD), text classification, text summarization, machine translation, etc. For WSD, it is crucial to determine the kind of semantic relationship between words: how similar they are and what their key differences are. WordNet, like every thesaurus, lists the different meanings of the same word, so in WSD we can choose the one that fits the context best.

Semantic relationships

To grasp the basic concept of WordNet, let's discuss semantic relationships. WordNet concentrates on the semantic relationships between nouns, verbs, adjectives, and adverbs. We generally define the following relationships:

Synonyms. These are words of similar meaning. Examples: "precise" - "accurate" - "particular";
Antonyms. Words with opposite meanings. Examples: "adored" - "hated", "naughty" - "obedient";
Hyponyms. A hyponym is a word whose meaning is included in the meaning of another, more general word. Note that the meanings overlap only partially: every elephant is a mammal, but not every mammal is an elephant. Examples: "elephant" is a hyponym of "mammal", "Sun" is a hyponym of "star";
Hypernyms. The opposite of hyponyms: "mammal" is a hypernym for "elephant"; "star" is a hypernym for "Sun";
Meronyms. A meronym is a part of something whole. Examples: "screen" is a meronym for "laptop", "engine" is a meronym for "car";
Holonyms. The opposite of meronyms. Examples: "laptop" is a holonym for "screen" and "keyboard"; "car" is a holonym for "engine" and "wheel" (in both its senses);
Coordinate terms. We call two words coordinate terms if they have the same hypernym. Examples: "elephant" and "giraffe" have the same hypernym "mammal", so they are coordinate terms.
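The relationships above can be sketched with two plain Python dictionaries. This is toy data built only from the examples in the list, not real WordNet data, and the helper names (`hyponyms`, `meronyms`, `are_coordinate`) are made up for illustration.

```python
# Toy data: each word maps to its hypernym (the more general term).
HYPERNYM_OF = {
    "elephant": "mammal",
    "giraffe": "mammal",
    "Sun": "star",
}

# Toy data: each part (meronym) maps to its whole (holonym).
HOLONYM_OF = {
    "screen": "laptop",
    "keyboard": "laptop",
    "engine": "car",
    "wheel": "car",
}


def hyponyms(word):
    """All words whose hypernym is `word`."""
    return sorted(w for w, h in HYPERNYM_OF.items() if h == word)


def meronyms(whole):
    """All parts whose holonym is `whole`."""
    return sorted(p for p, w in HOLONYM_OF.items() if w == whole)


def are_coordinate(a, b):
    """Two different words are coordinate terms if they share a hypernym."""
    return a != b and a in HYPERNYM_OF and HYPERNYM_OF[a] == HYPERNYM_OF.get(b)


print(hyponyms("mammal"))                     # ['elephant', 'giraffe']
print(meronyms("laptop"))                     # ['keyboard', 'screen']
print(are_coordinate("elephant", "giraffe"))  # True
print(are_coordinate("elephant", "Sun"))      # False
```

Notice that hypernym/hyponym and holonym/meronym are the same link read in opposite directions, which is why one dictionary per pair is enough here.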

WordNet structure

WordNet is organized into hierarchies with synsets (synonym sets) on each level. The most common hierarchy is built on the hypernym-hyponym relationship.

A synset is a set of synonyms that share one meaning. All the words in a synset have the same hypernym.

In the example below, you can see a segment from Princeton WordNet. Numbers indicate the level of the synset: 1 is the highest, 9 is the lowest. Words in a synset are separated by /, and --> means that the synsets on the next level are hyponyms of this one:

1. creation  -->
  2. product/production  -->
    3. work/piece_of_work  -->
      4. publication  -->
        5. book  -->
          6. reference_book/reference/reference_work/book_of_facts  -->
            7. cookbook
            7. instruction_book
            7. word_book  -->
              8. dictionary/lexicon  -->
                9. bilingual_dictionary
                9. etymological_dictionary
                9. learner's_dictionary
                9. pocket_dictionary
              8. vocabulary
              8. glossary
              8. thesaurus  -->
                9. word_finder
            7. handbook/enchiridion/vade_mecum -->
              8. manual -->
                9. reference_manual
              8. bible  # meaning: a book regarded as authoritative in its field
              8. guidebook

According to this schema, we can make the following statements:

creation is an inherited hypernym of guidebook;
publication is a direct hypernym of book;
dictionary and lexicon are synonyms and represent one common synset.
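These statements can be checked with a small sketch. The dictionary below is a hand-copied part of the segment above (one word per synset, chosen for brevity), not data loaded from WordNet, and `inherited_hypernyms` is a hypothetical helper name.

```python
# Child -> direct hypernym, copied by hand from part of the segment above.
DIRECT_HYPERNYM = {
    "product": "creation",
    "piece_of_work": "product",
    "publication": "piece_of_work",
    "book": "publication",
    "reference_book": "book",
    "word_book": "reference_book",
    "dictionary": "word_book",
    "handbook": "reference_book",
    "guidebook": "handbook",
}


def inherited_hypernyms(word):
    """Follow direct hypernym links upward until the top is reached."""
    chain = []
    while word in DIRECT_HYPERNYM:
        word = DIRECT_HYPERNYM[word]
        chain.append(word)
    return chain


print(inherited_hypernyms("guidebook"))
# ['handbook', 'reference_book', 'book', 'publication',
#  'piece_of_work', 'product', 'creation']

print("creation" in inherited_hypernyms("guidebook"))  # True
print(DIRECT_HYPERNYM["book"])                         # publication
```

The first element of the chain is the direct hypernym; everything after it is inherited.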

Our example does not show meronyms/holonyms, so it is not the complete structure.

Generally, a label follows the word if it has multiple senses. For example, in WordNet, the word book has labels such as book.v.01 (the first sense of the verb) and book.n.05 (the fifth sense of the noun). A label consists of the word, its part of speech, and the sense number. In Princeton WordNet, senses are generally ordered from the most to the least frequently used, based on sense-tagged corpora.
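A label such as book.v.01 can be taken apart with plain string operations. The sketch below is only an illustration; `SenseLabel` and `parse_label` are hypothetical names, not part of any library.

```python
from typing import NamedTuple


class SenseLabel(NamedTuple):
    lemma: str   # the word itself, e.g. "book"
    pos: str     # part of speech, e.g. "n" for noun or "v" for verb
    sense: int   # sense number, starting from 1


def parse_label(label: str) -> SenseLabel:
    """Split a label like 'book.n.05' into its three parts."""
    # Split from the right, because the word itself may contain dots.
    lemma, pos, number = label.rsplit(".", 2)
    return SenseLabel(lemma, pos, int(number))


print(parse_label("book.v.01"))
# SenseLabel(lemma='book', pos='v', sense=1)
print(parse_label("book.n.05").sense)
# 5
```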

WordNet Search 3.1

WordNet Search 3.1 is available on the official website of Princeton University.

You can search for any English word. Let's try book:

This is a screenshot of WordNet site where "book" is written in the search bar

After you press Search WordNet, it will show you a typical dictionary entry. There, we have many definitions of the word book. We will be working with the first definition in the Noun section.

If you press S: before this definition, you will see a subsection with five lines:

There is an output of the search query "book"

You can choose what you want to see further: hyponyms, meronyms, etc.

There is an output of the search query "book", where "direct hyponyms" are chosen

In the screenshot above, you can see the hyponyms of the word book. If you pick any word in this list (for example, crammer), you can either click on the word to see its dictionary entry, or press S: and then choose hypernyms (or another relation) for this word.

WordNet in NLTK

Princeton WordNet is available in NLTK.

Let's find the synsets that contain the word book. wn.synsets() returns a Python list of all such synsets, and we take the second one:

import nltk
from nltk.corpus import wordnet as wn
nltk.download('wordnet')
nltk.download('omw-1.4')

wn.synsets('book')[1]


#  Synset('book.n.02')

Now we know that there is a synset labeled book.n.02. We can look at the definition of the noun book in its second sense, its example(s), and all the words of this synset:

print("Synset('book.n.02')\n")

print('Definition:  ', wn.synset('book.n.02').definition())
print("Examples: ", wn.synset('book.n.02').examples()[0])
print("Synset name:  ", wn.synset('book.n.02').name())


print('\nWords in synset: ')
# start=1 so that the numbering begins with 1 instead of 0
for num, lemma in enumerate(wn.synset('book.n.02').lemmas(), start=1):
    print(f"Word №{num}  - ", lemma.name())



#  Synset('book.n.02')

#  Definition:   physical objects consisting of a number of pages bound together
#  Examples:  he used a large book as a doorstop
#  Synset name:   book.n.02

#  Words in synset: 
#  Word №1  -  book
#  Word №2  -  volume

Beware of the .examples() function: some synsets do not have an example. In this case, .examples() returns an empty list, so taking its first element with [0] raises an IndexError. For example, the noun book in the 4th sense has no examples:

len(wn.synset('book.n.04').examples())

# 0

You can check direct hypernyms and hyponyms of this word:

print("Hypernyms :  ", wn.synset('book.n.02').hypernyms())
print("Hyponyms :  ", wn.synset('book.n.02').hyponyms())


# Hypernyms :   [Synset('product.n.02')]
# Hyponyms :   [Synset('album.n.02'), Synset('coffee-table_book.n.01'), Synset('folio.n.03'), Synset('hardback.n.01'), Synset('journal.n.04'), Synset('notebook.n.01'), Synset('novel.n.02'), Synset('order_book.n.02'), Synset('paperback_book.n.01'), Synset('picture_book.n.01'), Synset('sketchbook.n.01')]

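Early versions of WordNet grouped nouns under about 25 top-level concepts. In the version that ships with NLTK, the noun hierarchies ultimately lead to a single most general synset, entity. The synsets at the very top of a hierarchy are called root hypernyms, as they are general terms for all the words below them. In NLTK, you can check the root hypernym: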

print("Top hypernym :  ", wn.synset('book.n.05').root_hypernyms())


#  Top hypernym :   [Synset('entity.n.01')]

You can also find antonyms for adjectives. In NLTK, antonyms are attached to lemmas (the individual words of a synset) rather than to synsets, and only a small number of words have them:

print("Antonyms to strange:  ", wn.synset('strange.a.01').lemmas()[0].antonyms())


#  Antonyms to strange:   [Lemma('familiar.a.02.familiar')]

With the Open Multilingual WordNet data (the omw-1.4 package downloaded earlier), NLTK's WordNet can also translate English words into 25 languages, including Japanese, French, and Arabic. Let's translate the noun book in its first sense into Japanese:

print(wn.synset('book.n.01').lemma_names('jpn'))


#  ['ご本', 'ブック', '単行本', '図書', '巻', '巻帙', '御本', '教科書', '書', '書典', '書冊', '書史', '書巻', '書帙', '書物', '書籍', '書誌', '本', '竹帛', '篇帙', '篇章', '編章', '著', '著作', '著作物', '著書', '読みもの', '述作', '韋編']

The output will be a list of words in the Japanese synset linked to the English synset. You can choose the first element of the list with an index.

Conclusion

In this topic, we discussed:

The idea of WordNet and its structure;
How to use WordNet Search 3.1;
How to deal with WordNet in NLTK.

We have also discussed linguistic concepts such as hypernym, hyponym, meronym, and holonym. They helped us understand the usefulness of WordNet.

Now, it's time for some practice!

You can find more on this topic in Mastering Stemming and Lemmatization on Hyperskill Blog.
