Project

NBA Data Preprocessing

Challenging

110 completions

~ 31 hours

4.4

Learn how to handle missing values in numerical and categorical variables, clean a DataFrame using element-wise operations, handle high-cardinality features, and engineer new features from existing ones. Determine which features to keep and which to drop when multicollinearity is present, and get to know data transformation techniques.

Provided by

JetBrains Academy

About

Data preprocessing is one of the first steps in the machine learning workflow. The main idea is to transform raw data into a format that machine learning algorithms can work with. A model's predictive performance depends heavily on the quality of its input data, so it's essential to know how to improve that quality by removing features with low predictive value, engineering new ones, and dealing with multicollinearity. In this project, you'll apply these concepts to NBA data to produce a high-quality dataset ready to be fed to a linear model!
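To give a feel for what dealing with multicollinearity can look like in pandas, here is a minimal sketch. The column names, the toy values, the 0.8 threshold, and the rule of dropping whichever feature of a correlated pair is less correlated with the target are all assumptions made for illustration. They are not the project's actual data or its prescribed method.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative, not the project's schema.
df = pd.DataFrame({
    "age": [22, 25, 28, 31, 34, 37],
    "experience": [1, 3, 7, 9, 14, 16],         # moves almost in lockstep with age
    "bmi": [24.1, 22.8, 25.3, 23.9, 26.0, 23.2],
    "salary": [1.0, 2.5, 3.9, 6.2, 7.8, 9.1],   # assumed target variable
})

target = "salary"
features = df.drop(columns=target)

# Absolute pairwise correlations between features, and of each feature with the target.
corr = features.corr().abs()
target_corr = features.corrwith(df[target]).abs()

threshold = 0.8  # assumed cut-off for "too correlated"
to_drop = set()
cols = list(corr.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if a in to_drop or b in to_drop:
            continue
        if corr.loc[a, b] > threshold:
            # Keep the feature that is more correlated with the target (an assumed rule).
            weaker = a if target_corr[a] < target_corr[b] else b
            to_drop.add(weaker)

reduced = df.drop(columns=sorted(to_drop))
print(sorted(to_drop), list(reduced.columns))
```

In this toy frame only the age/experience pair exceeds the threshold, so exactly one of the two is removed while bmi and the target stay in place.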

Graduate project

This project covers the core topics of the Pandas for Data Analysis course and is challenging enough to be a worthy addition to your portfolio.

At least one graduate project is required to complete the course.

What you'll learn

Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:

Handle missing values, remove extraneous characters, and parse the features.

Create new numerical features out of the existing ones and deal with high cardinality.

Drop the multicollinear features by observing the correlation coefficients.

Apply the transformation techniques to numerical and categorical features.
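The steps above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the column names, the sample values, the "No Team" placeholder, and the USA / Not-USA grouping are hypothetical and may differ from what the project actually asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records; names and values are illustrative only.
raw = pd.DataFrame({
    "height": ["6-5 / 1.96", "6-10 / 2.08", "6-2 / 1.88", "6-8 / 2.03"],
    "salary": ["$2000000", "$15500000", "$950000", "$7200000"],
    "team": ["LAL", None, "BOS", "LAL"],
    "country": ["USA", "France", "USA", "Serbia"],
    "born": ["1995-03-12", "1990-11-02", "1998-07-21", "1993-01-30"],
    "season": [2020, 2020, 2020, 2020],
})
df = raw.copy()

# 1. Missing values: fill a categorical gap with an explicit placeholder.
df["team"] = df["team"].fillna("No Team")

# 2. Remove extraneous characters and parse strings into numbers.
df["salary"] = df["salary"].str.replace("$", "", regex=False).astype(float)
df["height"] = df["height"].str.split(" / ").str[1].astype(float)  # metres part

# 3. Engineer a new numerical feature from existing columns.
df["age"] = df["season"] - pd.to_datetime(df["born"]).dt.year

# 4. Reduce high cardinality by collapsing categories into two buckets.
df["country"] = np.where(df["country"] == "USA", "USA", "Not-USA")

# 5. Transform: standardize numerical features, one-hot encode categorical ones.
num_cols = ["height", "age"]
cat_cols = ["team", "country"]
scaled = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()  # sample std (ddof=1)
encoded = pd.get_dummies(df[cat_cols], columns=cat_cols)
prepared = pd.concat([scaled, encoded], axis=1)
print(prepared.head())
```

Standardization is done here with plain pandas arithmetic to keep the sketch dependency-free. A scaler and encoder from sklearn.preprocessing would be a common alternative.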

Reviews

Guillaume Konen

4 days ago

4.7

I learned to use pandas to import data and manipulate columns and rows. The project covers different steps to clean data and prepare it for machine learning by selecting, combining, and removing features. A good project to introduce the typical steps of preprocessing.

Bojan Gjokjevski

4 months ago

5.0

Great project to start learning and understanding sklearn.preprocessing data.

EVGENII MORGUNOV

10 months ago

5.0

I have learned how to handle missing values in both numerical and categorical variables, clean a DataFrame using element-wise operations, and manage high-cardinality features. Additionally, I have gained experience in engineering new features from existing ones, determining which features to keep or ...

4.4

Learners who completed this project within the Pandas for Data Analysis course rated it as follows:

Usefulness

4.7

Fun

4.3

Clarity

4.1