N-gram and collocation measures

Find the best bigrams by PMI

Take the tokenized text of Shakespeare's Hamlet from NLTK's Gutenberg corpus:

import nltk
nltk.download('gutenberg')
from nltk.corpus import gutenberg

hamlet = gutenberg.words('shakespeare-hamlet.txt')

Preprocess the text by filtering out words shorter than three letters and words that are marked as stopwords. Then find the ten best bigrams according to the PMI (pointwise mutual information) association score.
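To make the scoring step concrete, here is a minimal pure-Python sketch of PMI ranking on a toy sentence. It assumes the score is log2(count(w1, w2) * N / (count(w1) * count(w2))), with N the number of tokens left after filtering, and that filtering happens before bigrams are formed. The function name `top_bigrams_by_pmi`, its parameters, and the toy stopword set are hypothetical and not part of the task. In NLTK itself, `BigramCollocationFinder` together with `BigramAssocMeasures.pmi` covers the same ground, though the result can differ depending on when the word filter is applied.

```python
import math
from collections import Counter


def top_bigrams_by_pmi(tokens, stopwords, n=10, min_len=3):
    """Return the n bigrams with the highest PMI (illustrative helper)."""
    # Assumption: filter first, then build bigrams from the remaining words.
    words = [t for t in tokens
             if len(t) >= min_len and t.lower() not in stopwords]
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    total = len(words)

    def pmi(pair):
        w1, w2 = pair
        # log2 of observed pair count over the count expected by chance
        return math.log2(bigrams[pair] * total / (unigrams[w1] * unigrams[w2]))

    # sorted() is stable, so tied bigrams keep their order of first appearance
    return sorted(bigrams, key=pmi, reverse=True)[:n]


# Toy data standing in for the Hamlet tokens and the NLTK stopword list
toy_tokens = "the old king and the old queen saw the ghost of the old king".split()
toy_stopwords = {"the", "and", "of"}
print(top_bigrams_by_pmi(toy_tokens, toy_stopwords, n=2))
```

PMI rewards pairs whose words rarely occur apart, so bigrams made of words seen only once tend to tie at the top of the ranking.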

Enter a list of the tuples (bigrams) you have found in the answer field, with each bigram on a new line.
For example,
[('First word', 'Second word'),
('First word', 'Second word'),
('First word', 'Second word')]
