You have probably dealt with dictionaries, glossaries, and thesauri. WordNet is similar to them in many ways and is a valuable tool in NLP. When you learn a language, you use a dictionary to learn new words; in the same way, we use WordNet to give a computer access to word meanings. It is also one of the resources used to differentiate word senses automatically, which is why it is worth studying.
What is WordNet?
WordNet is a lexical database that records the semantic relationships between words. Work on the first WordNet, for English, began in 1985 at Princeton University. It was an innovation at the time, and many researchers were attracted by the idea of building their own versions: WordNets for other languages, or field-specific ones such as BioWordnet, a biomedical extension. These days, there is a WordNet for most widely spoken languages:
WordNet | Language
Arabic WordNet | Arabic
GermaNet | German
EuroWordNet | Dutch, Danish, Italian, Spanish, German, French, Czech, Russian, Portuguese, and some others
IndoWordNet | Sanskrit, Hindi, Nepali, and 15 other Indian languages
Open Dutch WordNet | Dutch
PLWORDNET | Polish
PolNet | Polish
RussNet | Russian
Yet Another RussNet | Russian
WordNet Libre | French
Chinese Open WordNet | Mandarin Chinese
Persian WordNet | Persian (Farsi)
Thai WordNet | Thai
Japanese WordNet | Japanese
BalkaNet | Bulgarian, Czech, Greek, Romanian, Turkish, Serbian
ItalWordNet | Italian
Princeton WordNet | English
Notice that some languages appear in two or even more WordNets in this table. That is because different institutions build their own WordNets. Czech, for instance, occurs in both BalkaNet and EuroWordNet, and there may be other WordNets that cover it. Such examples are abundant. For example, the Institute of Linguistics (Academia Sinica, Taiwan) has built its own WordNet for Chinese.
WordNet is a helpful tool for many NLP tasks: word-sense disambiguation (WSD), text classification, text summarization, machine translation, and so on. For WSD, it is crucial to determine the kind of semantic relationship between words: how similar they are and what the key differences are. Like a thesaurus, WordNet lists the different meanings of the same word, so in WSD we can choose the meaning that fits the context best.
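To make the idea of "choosing the meaning that fits best" more concrete, here is a minimal sketch of one common heuristic, a simplified gloss-overlap (Lesk-style) comparison. It is not taken from this topic and does not use WordNet itself: the two glosses in TOY_SENSES are paraphrased toy definitions, and the names TOY_SENSES, STOPWORDS, tokenize, and pick_sense are hypothetical.

```python
# A toy sense inventory for the word "book" (assumed, paraphrased glosses).
TOY_SENSES = {
    "book_noun": "a written work that has been published",
    "book_verb": "arrange for and reserve something in advance",
}

# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "in", "for", "of", "and", "that", "has", "been"}


def tokenize(text):
    """Lowercase the text, split on whitespace, and drop stopwords."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def pick_sense(context, senses):
    """Return the sense label whose gloss shares the most words with the context."""
    context_words = tokenize(context)
    return max(senses, key=lambda label: len(context_words & tokenize(senses[label])))


print(pick_sense("we should reserve seats in advance", TOY_SENSES))
print(pick_sense("her first written novel was published last year", TOY_SENSES))
```

With a real WordNet, the glosses would come from the synset definitions instead of a hand-written dictionary.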
Semantic relationships
To grasp the basic concept of WordNet, let's discuss semantic relationships. WordNet concentrates on the semantic relationships between nouns, verbs, adjectives, and adverbs. We generally define the following relationships:
Synonyms. These are words of similar meaning. Examples: "precise" - "accurate" - "particular";
Antonyms. Words with opposite meanings. Examples: "adored" - "hated", "naughty" - "obedient";
Hyponyms. A hyponym is a word whose meaning is a more specific case of the meaning of another word: everything the hyponym denotes is also covered by the broader word, but not the other way around. Examples: "elephant" is a hyponym for "mammal", "Sun" is a hyponym for "star";
Hypernyms. The opposite of hyponyms: "mammal" is a hypernym for "elephant"; "star" is a hypernym for "Sun";
Meronyms. A meronym is a part of something whole. Examples: "screen" is a meronym for "laptop", "engine" is a meronym for "car";
Holonyms. The opposite of meronyms: a holonym is the whole that a part belongs to. Examples: "laptop" is a holonym for "screen" and "keyboard"; "car" is a holonym for "engine" and "wheel" (in both its senses);
Coordinate terms. We call two words coordinate terms if they have the same hypernym. Examples: "elephant" and "giraffe" have the same hypernym, "mammal", so they are coordinate terms.
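The definition of coordinate terms can be sketched with a toy data structure. The snippet below does not use WordNet: it stores the hypernym examples from the list above in a plain dictionary, and the names HYPERNYM_OF, hyponyms_by_hypernym, and coordinate_terms are hypothetical.

```python
from collections import defaultdict

# Toy hypernym map built from the examples above: word -> its hypernym.
HYPERNYM_OF = {
    "elephant": "mammal",
    "giraffe": "mammal",
    "Sun": "star",
}


def hyponyms_by_hypernym(hypernym_of):
    """Invert the map: hypernym -> set of its hyponyms."""
    inverse = defaultdict(set)
    for word, hypernym in hypernym_of.items():
        inverse[hypernym].add(word)
    return dict(inverse)


def coordinate_terms(word, hypernym_of):
    """Return all other words that share the same hypernym as `word`."""
    parent = hypernym_of.get(word)
    if parent is None:
        return set()
    return {other for other, hyper in hypernym_of.items()
            if hyper == parent and other != word}


print(hyponyms_by_hypernym(HYPERNYM_OF))
print(coordinate_terms("elephant", HYPERNYM_OF))
```

Hypernym and hyponym are two directions of the same link, which is why a single dictionary is enough to answer both questions.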
WordNet structure
WordNet is organized into hierarchies with synsets (synonym sets) on each level. The most common hierarchy is built on the hypernym-hyponym relationship.
A synset is a set of synonyms on a given level. All the words in a synset have one common hypernym.
In the example below, you can see a segment from Princeton WordNet. Numbers indicate the level of a synset: 1 is the highest, 9 is the lowest. Words in a synset are separated by /. --> means that the next synset is a hyponym of the previous one:
1. creation -->
  2. product/production -->
    3. work/piece_of_work -->
      4. publication -->
        5. book -->
          6. reference_book/reference/reference_work/book_of_facts -->
            7. cookbook
            7. instruction_book
            7. word_book -->
              8. dictionary/lexicon -->
                9. bilingual_dictionary
                9. etymological_dictionary
                9. learner's_dictionary
                9. pocket_dictionary
              8. vocabulary
              8. glossary
              8. thesaurus -->
                9. word_finder
            7. handbook/enchiridion/vade_mecum -->
              8. manual -->
                9. reference_manual
              8. bible # meaning: a book regarded as authoritative in its field
              8. guidebook
According to this schema, we can make the following statements:
creation is an inherited hypernym of guidebook;
publication is a direct hypernym of book;
dictionary and lexicon are synonyms and belong to one common synset.
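The statements above can be checked mechanically. The sketch below is a toy model, not part of WordNet or NLTK: it copies a slice of the schema into a dictionary (keying each synset by a single lemma, and writing level 3 as work, as in Princeton WordNet) and walks up the direct-hypernym links. The names DIRECT_HYPERNYM, hypernym_chain, and is_inherited_hypernym are made up for illustration.

```python
# Synset (keyed by one lemma) -> its direct hypernym, copied from the schema.
DIRECT_HYPERNYM = {
    "product": "creation",
    "work": "product",
    "publication": "work",
    "book": "publication",
    "reference_book": "book",
    "word_book": "reference_book",
    "dictionary": "word_book",
    "handbook": "reference_book",
    "guidebook": "handbook",
}


def hypernym_chain(word, direct_hypernym):
    """List all hypernyms of `word`, from the direct one up to the top."""
    chain = []
    while word in direct_hypernym:
        word = direct_hypernym[word]
        chain.append(word)
    return chain


def is_inherited_hypernym(candidate, word, direct_hypernym):
    """True if `candidate` is a hypernym of `word`, but not the direct one."""
    return candidate in hypernym_chain(word, direct_hypernym)[1:]


print(hypernym_chain("guidebook", DIRECT_HYPERNYM))
print(is_inherited_hypernym("creation", "guidebook", DIRECT_HYPERNYM))
print(is_inherited_hypernym("publication", "book", DIRECT_HYPERNYM))
```

The first element of the chain is the direct hypernym; everything after it is inherited.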
Our example does not show meronyms/holonyms, so it is not the complete structure.
Generally, a label follows the word if it has multiple senses. For example, in WordNet, the word book has labels such as book.v.01 (the first sense of the verb) and book.n.05 (the fifth sense of the noun). The senses are numbered in the order WordNet lists them, which generally runs from the most to the least frequently used sense.
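A label of this kind is easy to take apart in code. The following is a small sketch in plain Python; SenseLabel, POS_NAMES, and parse_label are hypothetical helper names, not part of WordNet or NLTK. The one-letter part-of-speech codes follow the WordNet convention (n, v, a, s, r).

```python
from typing import NamedTuple

# WordNet part-of-speech codes and their meanings.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


class SenseLabel(NamedTuple):
    lemma: str
    pos: str
    sense: int


def parse_label(label):
    """Split a label such as 'book.n.05' into lemma, POS code, and sense number."""
    # rsplit from the right, so a lemma that itself contains dots stays whole.
    lemma, pos, number = label.rsplit(".", 2)
    return SenseLabel(lemma, pos, int(number))


parsed = parse_label("book.n.05")
print(parsed, POS_NAMES[parsed.pos])
```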
WordNet Search 3.1
WordNet Search 3.1 is available on the official website of Princeton University.
You can search for any English word. Let's try book:
After you press Search WordNet, it will show you a typical dictionary entry. There, we have many definitions of the word book. We will be working with the first definition in the Noun section.
If you press S: before this definition, you will see a subsection with five lines:
You can choose what you want to see further: hyponyms, meronyms, etc.
In the screenshot above, you see all direct and inherited hyponyms of the word book. If you choose a random word in this list (for example, crammer), you can either click on the word to see its dictionary entry, or press S: to choose hypernyms (or another relationship) for this word.
WordNet in NLTK
Princeton WordNet is available in NLTK.
Let's find a synset that contains the word book. wn.synsets('book') returns a Python list of all synsets with this word, and we take its second element:
import nltk
from nltk.corpus import wordnet as wn
# Download the WordNet data and the multilingual data (needed only once)
nltk.download('wordnet')
nltk.download('omw-1.4')
# Index 1 is the second synset in the list
wn.synsets('book')[1]
# Synset('book.n.02')
Now we know that there is a synset labeled book.n.02. We can look at the definition of the noun book in its second sense, its example(s), and all the words of this synset:
book = wn.synset('book.n.02')
print(book)
print('Definition:', book.definition())
print('Example:', book.examples()[0])
print('Synset name:', book.name())
print('Words in synset:')
# start=1 so that the numbering begins with 1, not 0
for num, lemma in enumerate(book.lemmas(), start=1):
    print(f"Word №{num} - {lemma.name()}")
# Synset('book.n.02')
# Definition: physical objects consisting of a number of pages bound together
# Example: he used a large book as a doorstop
# Synset name: book.n.02
# Words in synset:
# Word №1 - book
# Word №2 - volume
Beware of indexing the result of .examples(): some synsets do not have an example. In this case, .examples() returns an empty list, and taking its first element with [0] raises an IndexError. For example, the noun book in its 4th sense has no examples:
len(wn.synset('book.n.04').examples())
# 0
You can check the direct hypernyms and hyponyms of this synset:
print("Hypernyms : ", wn.synset('book.n.02').hypernyms())
print("Hyponyms : ", wn.synset('book.n.02').hyponyms())
# Hypernyms : [Synset('product.n.02')]
# Hyponyms : [Synset('album.n.02'), Synset('coffee-table_book.n.01'), Synset('folio.n.03'), Synset('hardback.n.01'), Synset('journal.n.04'), Synset('notebook.n.01'), Synset('novel.n.02'), Synset('order_book.n.02'), Synset('paperback_book.n.01'), Synset('picture_book.n.01'), Synset('sketchbook.n.01')]
The most general synsets at the top of a WordNet hierarchy are called root hypernyms, as they are general terms for all the words further down the tree. Early versions of WordNet had 25 or so such top-level noun concepts; in the version shipped with NLTK, noun hierarchies lead up to a single root, entity. In NLTK, you can check the root hypernym:
print("Top hypernym : ", wn.synset('book.n.05').root_hypernyms())
# Top hypernym : [Synset('entity.n.01')]
You can also find antonyms for adjectives. Antonyms are attached to lemmas rather than to synsets, and only a limited number of words have them:
print("Antonyms to strange: ", wn.synset('strange.a.01').lemmas()[0].antonyms())
# Antonyms to strange: [Lemma('familiar.a.02.familiar')]
With WordNet in NLTK, we can also translate English words into about 25 languages, including Japanese, French, and Arabic. This relies on the omw-1.4 data downloaded earlier. Let's translate the noun book in its first sense into Japanese:
print(wn.synset('book.n.01').lemma_names('jpn'))
# ['ご本', 'ブック', '単行本', '図書', '巻', '巻帙', '御本', '教科書', '書', '書典', '書冊', '書史', '書巻', '書帙', '書物', '書籍', '書誌', '本', '竹帛', '篇帙', '篇章', '編章', '著', '著作', '著作物', '著書', '読みもの', '述作', '韋編']
The output is a list of the words in the Japanese synset linked to the English synset. You can pick a single word, such as the first element, by indexing the list.
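Both .examples() and .lemma_names() return plain Python lists, and such a list may be empty, so indexing it with [0] can fail. A small guard avoids that. The sketch below is only an illustration: first_or_default is a hypothetical helper, and the two lists are stand-ins for NLTK results so that the snippet runs without the WordNet data.

```python
def first_or_default(items, default=None):
    """Return the first element of a list, or `default` if the list is empty."""
    return items[0] if items else default


# Stand-ins for real NLTK results (assumed values for illustration):
no_examples = []                # like wn.synset('book.n.04').examples()
japanese = ['ご本', 'ブック']    # like the start of wn.synset('book.n.01').lemma_names('jpn')

print(first_or_default(no_examples, default='<no example>'))
print(first_or_default(japanese))

# With NLTK and the WordNet data installed, the same helper could be used as:
#   first_or_default(wn.synset('book.n.04').examples(), default='<no example>')
```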
Conclusion
In this topic, we discussed:
The idea of WordNet and its structure;
How to use WordNet Search 3.1;
How to deal with WordNet in NLTK.
We have also discussed linguistic concepts such as hypernym, hyponym, meronym, and holonym. They helped us understand the usefulness of WordNet.
Now, it's time for some practice!