Project

Spam Filter

Difficulty: Challenging
Completions: 28
Estimated time: ~34 hours
Rating: 3.9

Build a spam filter from scratch with Naive Bayes, one of the most common classification algorithms. But first, learn about data preprocessing and feature extraction from text with spaCy and pandas. Use functions and functional decomposition to handle repetitive tasks. Implement your own algorithms and measure how well they perform against the Multinomial Naive Bayes classifier from the scikit-learn library.
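The preprocessing and feature-extraction steps described above can be approximated in plain Python. This is a minimal sketch, not the project's spaCy/pandas pipeline: the regex tokenizer and the tiny `STOP_WORDS` set are hypothetical stand-ins, there is no lemmatization, and the bag of words is returned as a list of count rows rather than a pandas DataFrame.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only; the project itself
# relies on spaCy for stop words and lemmatization, which this sketch skips.
STOP_WORDS = {"a", "an", "the", "to", "you", "is", "for", "and", "of", "your"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, strip punctuation and digits, and drop stop words."""
    text = text.lower()
    # Replace anything that is not an ASCII lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one word-count row per document."""
    vocabulary = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


messages = ["WIN a FREE prize now!!!", "Are you free for lunch today?"]
tokens = [preprocess(m) for m in messages]
vocab, matrix = bag_of_words(tokens)
print(vocab)   # ['are', 'free', 'lunch', 'now', 'prize', 'today', 'win']
print(matrix)  # [[0, 1, 0, 1, 1, 0, 1], [1, 1, 1, 0, 0, 1, 0]]
```

Each row lines up column by column with the vocabulary, which is the same shape a pandas bag-of-words DataFrame would have with the vocabulary as its column labels.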

Provided by

JetBrains Academy

About

Beep! You received a new message! You reach for your smartphone only to find that it is just another piece of spam. It is becoming very hard to tell important messages apart from the loads of spam in your inbox. Manually checking the entire mailbox for junk is a lengthy process, and you would probably rather spend that time more wisely. Let's build a program that will help you identify and filter out spam messages!

Graduate project

This project covers the core topics of the Coding Machine Learning Algorithms course, making it challenging enough to be a worthy addition to your portfolio.

At least one graduate project is required to complete the course.

What you'll learn

Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:
Lemmatize tokens and remove punctuation marks and stop words.
Create a function that returns a dataframe with a bag of words.
Calculate the probabilities of words in ham and spam subsets.
Build a Naive Bayes classifier with Laplace smoothing to make better predictions.
Calculate the accuracy, precision, recall, and F1 score of your model.
Train a model, make predictions, and calculate the metrics with the Multinomial Naive Bayes algorithm from the scikit-learn library, then compare your model against it.
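The classifier and evaluation steps in the list above can be sketched in plain Python. This is a minimal sketch, assuming messages are already preprocessed into token lists; the function names `train`, `predict`, and `evaluate` and the toy messages are hypothetical, and ties between classes go to whichever class name sorts first. Word probabilities use Laplace smoothing: (count of the word in the class + alpha) / (total words in the class + alpha × vocabulary size).

```python
import math
from collections import Counter


def train(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods for each class."""
    vocabulary = {tok for doc in docs for tok in doc}
    log_prior, log_likelihood = {}, {}
    for cls in sorted(set(labels)):  # sorted -> deterministic class order
        class_docs = [doc for doc, lab in zip(docs, labels) if lab == cls]
        log_prior[cls] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for doc in class_docs for tok in doc)
        total = sum(counts.values())
        # Laplace smoothing: adding alpha to every count gives words never seen
        # in this class a small nonzero probability instead of zero.
        log_likelihood[cls] = {
            word: math.log((counts[word] + alpha) / (total + alpha * len(vocabulary)))
            for word in vocabulary
        }
    return vocabulary, log_prior, log_likelihood


def predict(doc, vocabulary, log_prior, log_likelihood):
    """Pick the class with the highest log posterior, skipping unseen words."""
    scores = {
        cls: log_prior[cls]
        + sum(log_likelihood[cls][tok] for tok in doc if tok in vocabulary)
        for cls in log_prior
    }
    return max(scores, key=scores.get)  # ties go to the first class in sorted order


def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall, and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: already-preprocessed token lists.
train_docs = [["win", "free", "prize"], ["free", "money", "now"],
              ["lunch", "today"], ["meeting", "today", "free"]]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = [["free", "prize", "now"], ["lunch", "meeting"], ["free", "today"]]
test_labels = ["spam", "ham", "spam"]

model = train(train_docs, train_labels)
predictions = [predict(doc, *model) for doc in test_docs]
print(predictions)  # ['spam', 'ham', 'ham']
print(evaluate(test_labels, predictions))
```

Working in log space avoids underflow when many small probabilities are multiplied. For the comparison step, scikit-learn's `MultinomialNB` applies the same additive smoothing through its `alpha` parameter (default `1.0`), so on the same word-count vectors its predictions should generally agree with a hand-written version like this one.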

Reviews

Andrei Raugas
2 years ago
I learned how to vectorize text data and use a bag of words model with a multinomial Naive Bayes classifier.
synth
2 years ago
Moderator
In this project, I worked with pandas, spaCy, and scikit-learn. I've trained a multinomial Naive Bayes model on the spam dataset using my own implementation as well as the one from the scikit-learn package. The additional Laplace smoothing method was used to deal with missing words between classes. ...
Christian Camilo Valencia Villegas
2 years ago
I have learned about natural language processing and the basics of binary classification.

Overall rating: 3.9

Learners who completed this project within the Coding Machine Learning Algorithms course rated it as follows:
Usefulness: 4.6
Fun: 4.0
Clarity: 3.1