NBA Data Preprocessing

Challenging

107 completions

~ 31 hours

4.4

Learn how to handle missing values in numerical and categorical variables, clean a DataFrame using element-wise operations, handle high-cardinality features, and engineer new features from existing ones. Determine which features to keep and which to drop when features are multicollinear, and get to know data transformation techniques.

Provided by

JetBrains Academy

About

Data preprocessing is one of the first steps in the machine learning workflow. The main idea is to transform raw data into a format that machine learning algorithms can work with. The predictive performance of a machine learning model depends heavily on the quality of its input data. That makes it essential to know how to improve that quality by removing features with low predictive value, engineering new ones, and dealing with multicollinearity. With this project, you'll apply these concepts to NBA data to get a high-quality dataset ready to be fed to a linear model!
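To make the multicollinearity idea concrete, here is a minimal pandas sketch. It is not the project's reference solution: the column names, the toy values, the 0.5 threshold, and the rule "of two strongly correlated features, drop the one less correlated with the target" are all assumptions chosen for illustration.

```python
import pandas as pd


def drop_collinear(df: pd.DataFrame, target: str, threshold: float = 0.5) -> pd.DataFrame:
    """Drop one feature from every pair of numerical features whose absolute
    correlation exceeds `threshold`, keeping the one that is more strongly
    correlated with `target`. The threshold is an assumed, tunable value."""
    features = df.select_dtypes(include="number").drop(columns=[target])
    corr = features.corr().abs()                          # feature-to-feature
    target_corr = features.corrwith(df[target]).abs()     # feature-to-target

    to_drop = set()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in to_drop or b in to_drop:
                continue  # this pair is already resolved
            if corr.loc[a, b] > threshold:
                # Drop whichever feature explains the target less.
                to_drop.add(a if target_corr[a] < target_corr[b] else b)
    return df.drop(columns=sorted(to_drop))


# Hypothetical toy data: `age` and `experience` move together.
players = pd.DataFrame({
    "age": [20, 25, 30, 35, 40, 45],
    "experience": [1, 5, 12, 15, 22, 25],
    "bmi": [24, 22, 26, 23, 25, 21],
    "salary": [2, 10, 24, 30, 44, 50],  # target
})

reduced = drop_collinear(players, target="salary")
print(list(reduced.columns))
```

In this toy frame, `age` and `experience` are almost perfectly correlated, and `experience` tracks `salary` more closely, so `age` is the column that gets dropped.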

Graduate project

This project covers the core topics of the Pandas for Data Analysis course, making it sufficiently challenging to be a proud addition to your portfolio.

At least one graduate project is required to complete the course.

What you'll learn

Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:

Handle missing values, remove extraneous characters, and parse the features.

Create new numerical features out of the existing ones and deal with high cardinality.

Drop the multicollinear features by observing the correlation coefficients.

Apply the transformation techniques to numerical and categorical features.
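The steps listed above can be sketched end to end on a tiny, made-up frame. Everything here is hypothetical: the column names, the value formats, the fill strategies, and the cardinality cutoff are assumptions for illustration and are not taken from the project's dataset or its expected solution.

```python
import pandas as pd

# Hypothetical raw data with missing values and extraneous characters.
raw = pd.DataFrame({
    "player": ["A. Smith", "B. Jones", "C. Brown", "D. White"],
    "team": ["LAL", "BOS", None, "LAL"],
    "birth_date": ["1990-05-01", "1995-03-12", "1988-11-30", "1992-07-04"],
    "season": [2020, 2020, 2020, 2020],
    "salary": ["$1,500,000", "$2,000,000", None, "$900,000"],
})

clean = raw.copy()

# 1. Remove extraneous characters and parse the features.
clean["salary"] = pd.to_numeric(
    clean["salary"].str.replace(r"[$,]", "", regex=True)
)
clean["birth_date"] = pd.to_datetime(clean["birth_date"])

# 2. Handle missing values: median for numbers, a placeholder for categories.
clean["salary"] = clean["salary"].fillna(clean["salary"].median())
clean["team"] = clean["team"].fillna("No Team")

# 3. Engineer a numerical feature from existing ones, then drop its sources.
clean["age"] = clean["season"] - clean["birth_date"].dt.year
clean = clean.drop(columns=["birth_date", "season"])

# 4. Drop high-cardinality categorical features (assumed cutoff: > 3 values).
max_unique = 3
high_card = [c for c in ["player", "team"] if clean[c].nunique() > max_unique]
clean = clean.drop(columns=high_card)

# 5. Transform: standardize numerical features, one-hot encode categorical ones.
num_cols = ["salary", "age"]
scaled = (clean[num_cols] - clean[num_cols].mean()) / clean[num_cols].std(ddof=0)
encoded = pd.get_dummies(clean[["team"]], columns=["team"])
final = pd.concat([scaled, encoded], axis=1)

print(final)
```

Standardizing with `ddof=0` uses the population standard deviation, which is the convention scikit-learn's `StandardScaler` follows; with pandas' default `ddof=1` the scaled values would differ slightly.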

Reviews

Bojan Gjokjevski

3 months ago

5.0

Great project to start learning and understanding sklearn.preprocessing data.

EVGENII MORGUNOV

9 months ago

5.0

I have learned how to handle missing values in both numerical and categorical variables, clean a DataFrame using element-wise operations, and manage high-cardinality features. Additionally, I have gained experience in engineering new features from existing ones, determining which features to keep or ...

Aneurin Sutton

12 months ago

5.0

I increased my skill at pre-processing data for predictive models. Specifically, how to identify and remove collinear features, standardize numeric features, and encode nominal features.

4.4

Learners who completed this project within the Pandas for Data Analysis course rated it as follows:

Usefulness

4.7

Fun

4.3

Clarity

4.1