Salary Prediction. Stage 1/5

Linear regression with one independent variable

Description

In the first stage, let's start with the simplest linear model: salary is the dependent variable and the player's rating is the only predictor. Your goal is to fit this model, find its coefficients, and calculate the MAPE (mean absolute percentage error).

[Figure: player salary linear regression]

The scatterplot above shows the relationship between rating and salary and its linear approximation. The red line is given by the formula $\hat{\text{salary}} = k \cdot \text{rating} + b$, where $\hat{\text{salary}}$ is the player's predicted salary, $k$ is the slope of the linear regression model, and $b$ is its intercept. You need to find $k$ and $b$. After that, you also need to calculate the MAPE. You can do it with sklearn.metrics.mean_absolute_percentage_error.
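To make the slope, the intercept, and the MAPE concrete, here is a minimal pure-Python sketch. The helper names fit_line and mape and the toy numbers are illustrative assumptions, not part of the project; in the stage itself you would normally rely on scikit-learn instead.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ k * x + b with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    k = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - k * mean_x
    return k, b


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up toy numbers, not taken from the project dataset.
rating = [1.0, 2.0, 3.0, 4.0]
salary = [110.0, 190.0, 320.0, 380.0]

k, b = fit_line(rating, salary)
predicted = [k * r + b for r in rating]
error = mape(salary, predicted)
print(k, b, error)
```

Like this sketch, sklearn.metrics.mean_absolute_percentage_error returns a fraction rather than a value multiplied by 100.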

Objectives

  1. We have automated the data download process in the .py file provided to you. However, if that is inconvenient, feel free to download the dataset on your own;
  2. Load the DataFrame using the pandas.read_csv method;
  3. Make X a DataFrame with a predictor rating and y a series with a target salary;
  4. Split predictor and target into training and test sets. Use test_size=0.3 and random_state=100 parameters — they guarantee that the results will be as expected;
  5. Fit the linear regression model with the following formula on the training data: $\text{salary} \sim \text{rating}$;
  6. Predict a salary with the fitted model on test data and calculate the MAPE;
  7. Print three float numbers: the model intercept, the slope, and the MAPE rounded to five decimal places and separated by whitespace.
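The objectives above can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory CSV is a made-up stand-in for the real dataset (which has to be downloaded separately), and the variable names are arbitrary. Only the column names rating and salary and the split parameters come from the task.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the project CSV: ten made-up rows.
csv_text = """rating,salary
55,900000
60,1200000
65,1400000
70,2100000
72,2300000
75,2600000
80,3500000
85,4100000
90,5200000
95,6000000
"""
data = pd.read_csv(io.StringIO(csv_text))

X = data[["rating"]]  # double brackets keep X a DataFrame
y = data["salary"]    # single brackets give a Series

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100
)

# Fit salary ~ rating on the training part only.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out test part.
y_pred = model.predict(X_test)
mape = mean_absolute_percentage_error(y_test, y_pred)

# Intercept, slope, MAPE: rounded to five decimals, separated by spaces.
values = [model.intercept_, model.coef_[0], mape]
output = " ".join(str(round(float(v), 5)) for v in values)
print(output)
```

With the real dataset, the pandas.read_csv call would point at the downloaded file instead of the in-memory text, and the printed numbers would differ from those produced by this toy data.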

Example

Example 1: program output

123456.78901 987.65432 1.23456