Project

News Text Summarization

Hard
1 completion
~ 22 hours
New

Through this project, you will gain practical experience in applying abstractive text summarization using the BART model. You will learn to prepare a dataset for NLP tasks, fine-tune a pre-trained model, and evaluate its performance.

Provided by

JetBrains Academy

About

In this project, you will develop a practical news summarization system using the BART (Bidirectional and Auto-Regressive Transformers) model. Extractive summarization models select and reuse sentences from the source article. BART instead performs abstractive summarization: it generates new text that can rephrase and condense the original. You will fine-tune a pre-trained BART model to improve its performance on news articles. Throughout this project, you will learn how to preprocess real-world news data fetched from APIs and use the fine-tuned model to generate concise and coherent summaries.


Graduate project

This project covers the core topics of the MLOps Engineer course, making it sufficiently challenging to be a proud addition to your portfolio.

At least one graduate project is required to complete the course.

What you'll learn

Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:
In this stage, you will fine-tune a pre-trained BART model using the prepared news summarization dataset. This process involves training the model on the dataset to specialize in generating concise and accurate summaries of news articles.
This stage focuses on building a pipeline to collect and prepare news articles specifically for evaluating the fine-tuned BART model. This includes fetching articles from a news source, cleaning the raw text data, and storing the processed articles.
This final stage involves assessing the performance of your fine-tuned BART model on unseen news articles. You'll use the prepared news data to generate summaries and review their quality.
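To make the data preparation and evaluation stages more concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the project's actual solution: the function names (clean_article, to_jsonl, unigram_f1) are hypothetical, stripping tags with a regular expression is a simplification of what a real HTML parser would do, and the unigram-overlap F1 score is a simplified stand-in for ROUGE-1 (no stemming or tokenization rules), which the course may or may not use.

```python
import html
import json
import re
from collections import Counter

# Assumption: raw articles arrive as strings that may contain HTML tags
# and entities. A regex is a rough approximation of real HTML parsing.
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def clean_article(raw: str) -> str:
    """Strip tags, decode HTML entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return WS_RE.sub(" ", text).strip()


def to_jsonl(articles: list[str]) -> str:
    """Serialize cleaned articles as one JSON object per line."""
    return "\n".join(
        json.dumps({"id": i, "text": clean_article(a)})
        for i, a in enumerate(articles)
    )


def unigram_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a generated summary.

    A simplified, ROUGE-1-style score: lowercased whitespace tokens,
    clipped counts, no stemming.
    """
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped matching token count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    raw = ["<p>Stocks &amp; bonds   rallied.</p>"]
    print(to_jsonl(raw))
    print(unigram_f1("the cat sat on the mat", "the cat"))
```

An overlap score like this only measures shared words, so it tends to underrate abstractive summaries that rephrase the source. Reading a sample of generated summaries by hand remains a useful complement.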