Project

Simple Text Summarization

26 completions
~ 28 hours
4.6

You will learn how to use the TF-IDF metric for extractive text summarization, see how XML files are structured, and learn how to parse them. You will also get familiar with scikit-learn, a Python machine learning library, and with the BeautifulSoup library.

Provided by

JetBrains Academy

About

The modern world moves fast, and every day people read a lot of news about politics, science, the entertainment industry, programming, and so on. When there is so much information to digest, it gets really hard to deal with all of it! In this project, you will get familiar with something that might help: a simple text summarization technique. This method extracts the most important sentences from a given text and is a great baseline for further experiments with other summarization approaches.

Training project

This project allows you to practice and strengthen your coding skills, helping you get ready for more advanced tasks ahead.

What you'll learn

Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:
Extract the first √N sentences of the text (N being the total number of sentences), assuming that they carry the most important information in the text being summarized.
Implement the tokenization and lemmatization of each news article.
Create and fit a TfidfVectorizer object on preprocessed texts. Find the mean TF-IDF for each sentence and pick √N sentences with the highest scores.
Use your trained TfidfVectorizer object and add extra weights for words occurring in the headline.
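
To give a rough feel for these stages, here is a minimal Python sketch, not the project's reference solution. It makes several assumptions that the description above does not state: the text is already split into sentences, lemmatization is skipped, √N is rounded to the nearest integer, each sentence is treated as a separate document when fitting scikit-learn's TfidfVectorizer, the mean is taken over the non-zero TF-IDF values of a sentence, and headline words are boosted by a hypothetical factor `headline_boost`. The function names are illustrative only.

```python
import math

from sklearn.feature_extraction.text import TfidfVectorizer


def summary_size(n_sentences):
    # Assumption: sqrt(N) rounded to the nearest integer, never below 1.
    return max(1, round(math.sqrt(n_sentences)))


def first_sentences(sentences):
    """Baseline stage: keep the first sqrt(N) sentences."""
    return sentences[:summary_size(len(sentences))]


def tfidf_summary(sentences, headline="", headline_boost=3.0):
    """Keep the sqrt(N) sentences with the highest mean TF-IDF score."""
    vectorizer = TfidfVectorizer()
    # Assumption: every sentence is one "document" for the vectorizer.
    matrix = vectorizer.fit_transform(sentences).toarray()

    # Extra weight for words that also occur in the headline.
    # headline_boost is a hypothetical parameter chosen for illustration.
    analyzer = vectorizer.build_analyzer()
    for word in set(analyzer(headline)):
        column = vectorizer.vocabulary_.get(word)
        if column is not None:
            matrix[:, column] *= headline_boost

    # Assumption: average only over the words present in the sentence.
    scores = []
    for row in matrix:
        nonzero = row[row > 0]
        scores.append(float(nonzero.mean()) if nonzero.size else 0.0)

    k = summary_size(len(sentences))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    article = [
        "The city council approved a new budget on Monday.",
        "The budget increases funding for public transport.",
        "Several residents spoke during the meeting.",
        "Construction of the tram line starts next spring.",
    ]
    print(first_sentences(article))
    print(tfidf_summary(article, headline="Council approves transport budget"))
```

In a fuller solution, the sentence list would come from a tokenizer applied to each parsed news article, and the sentences would be lemmatized before they reach the vectorizer.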

Reviews

Raman But-Husaim
12 months ago
A really nice project that helps to grasp several NLP topics, including TF-IDF algorithm. Really enjoyed the process.
Christopher Christian
2 years ago
had a lot of fun learning more about web scraping and numpy operations. very informative.
Mauricio Alejandro Gil Prieto Palacios
3 years ago
I have learnt to summarize text in a straightforward way. I have gained a lot of knowledge about NLP!

4.6

Learners who completed this project within the course rated it as follows:
Usefulness
4.8
Fun
4.5
Clarity
4.3