SGD regression with feature scaling

You are given the following starter code:

from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

data = fetch_california_housing()
X = data.data
Y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, Y, train_size=0.8, random_state=42)

pipeline = Pipeline([
    ("Scaling", StandardScaler()),
    ("Regression", SGDRegressor(random_state=42))
])

Fit the pipeline on the training data by calling pipeline.fit(), make predictions on X_test with pipeline.predict(), and compute the mean absolute error of those predictions with mean_absolute_error(y_test, y_pred). Then fit the pipeline again without the scaling step, make predictions on X_test, and compute the mean absolute error once more.
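A minimal sketch of both runs, assuming scikit-learn is installed. The helper names make_scaled_pipeline and compare_scaling are hypothetical and exist only for this illustration. The un-scaled run reuses the same pipeline definition with its "Scaling" step replaced by "passthrough" (a value scikit-learn's Pipeline accepts in place of a step), so the scaler is the only difference between the two fits.

```python
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_scaled_pipeline():
    # Hypothetical helper: same steps and random_state as the starter pipeline.
    return Pipeline([
        ("Scaling", StandardScaler()),
        ("Regression", SGDRegressor(random_state=42)),
    ])


def compare_scaling(X_train, X_test, y_train, y_test):
    """Hypothetical helper returning (mae_scaled, mae_unscaled)."""
    # Run 1: StandardScaler feeds standardized features to SGDRegressor.
    scaled = make_scaled_pipeline()
    scaled.fit(X_train, y_train)
    y_pred = scaled.predict(X_test)
    mae_scaled = mean_absolute_error(y_test, y_pred)

    # Run 2: identical pipeline, but the "Scaling" step is replaced by
    # "passthrough", so SGDRegressor is trained on the raw features.
    unscaled = make_scaled_pipeline().set_params(Scaling="passthrough")
    unscaled.fit(X_train, y_train)
    y_pred = unscaled.predict(X_test)
    mae_unscaled = mean_absolute_error(y_test, y_pred)

    return mae_scaled, mae_unscaled
```

With the starter code's split, compare_scaling(X_train, X_test, y_train, y_test) returns the two scores to compare. Building a fresh pipeline for each run, rather than refitting one object in place, keeps the two fits independent of each other.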

Compare the two MAE scores and choose the option that describes them:

  1. Both scores fall in the [0, 100] range and the absolute difference between them is less than 20.

  2. One of the scores falls in the [0, 1] range, and the other in the [1, 70] range.

  3. The two scores differ by a factor greater than one thousand, with the scaled score being significantly closer to 0 than the un-scaled score.

  4. One of the scores lies in the [1, 50] range, and the other is a value in the [60, 120] range.
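The interval criteria in the options can be encoded as a small check when mapping two computed scores onto an option. This is a hypothetical helper, not part of the exercise. It assumes both scores are non-negative (as any MAE is) and returns every option whose conditions hold, so overlapping matches would be visible.

```python
def matching_options(mae_scaled, mae_unscaled):
    """Hypothetical helper: option numbers (1-4) whose criteria the scores satisfy."""
    a, b = mae_scaled, mae_unscaled
    matches = []
    # Option 1: both scores in [0, 100] and their absolute difference below 20.
    if 0 <= a <= 100 and 0 <= b <= 100 and abs(a - b) < 20:
        matches.append(1)
    # Option 2: one score in [0, 1], the other in [1, 70] (either order).
    if (0 <= a <= 1 and 1 <= b <= 70) or (0 <= b <= 1 and 1 <= a <= 70):
        matches.append(2)
    # Option 3: the scaled score is the smaller one and the ratio exceeds 1000.
    if 0 <= a < b and (a == 0 or b / a > 1000):
        matches.append(3)
    # Option 4: one score in [1, 50], the other in [60, 120] (either order).
    if (1 <= a <= 50 and 60 <= b <= 120) or (1 <= b <= 50 and 60 <= a <= 120):
        matches.append(4)
    return matches
```

For example, scores of 10.0 and 15.0 satisfy only option 1, while 0.5 and 30.0 satisfy only option 2.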

Your answer should contain the correct option as an integer.