
Feature scaling in scikit-learn

SGD regression with feature scaling


You are given the following starter code:

from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

data = fetch_california_housing()
X = data.data
Y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, Y, train_size=0.8, random_state=42)

pipeline = Pipeline([
    ("Scaling", StandardScaler()),
    ("Regression", SGDRegressor(random_state=42))
])

Fit the pipeline on the training data by calling pipeline.fit(X_train, y_train), make predictions on X_test with pipeline.predict(X_test), and compute the mean absolute error of those predictions with mean_absolute_error(y_test, y_pred). Then repeat the experiment without scaling: fit the same pipeline with the StandardScaler step removed, make predictions on X_test, and compute the mean absolute error again.
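The fit, predict, and score steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name fit_and_score and the small synthetic dataset are not part of the exercise, and the synthetic data stands in for the California housing split so that the snippet runs without a download. The numbers it prints say nothing about the answer to the task.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_score(model, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit on the training split, return MAE on the test split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return mean_absolute_error(y_test, y_pred)


# Synthetic stand-in data (an assumption for illustration, NOT the housing dataset):
# three features with mildly different spreads and a noisy linear target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3)) * np.array([1.0, 2.0, 5.0])
y_demo = X_demo @ np.array([1.5, -0.5, 0.2]) + rng.normal(scale=0.1, size=500)

# train_test_split returns X_train, X_test, y_train, y_test in that order.
splits = train_test_split(X_demo, y_demo, train_size=0.8, random_state=42)

# Pipeline with scaling, mirroring the starter code.
scaled = Pipeline([
    ("Scaling", StandardScaler()),
    ("Regression", SGDRegressor(random_state=42)),
])

# The same pipeline with the scaling step removed.
unscaled = Pipeline([
    ("Regression", SGDRegressor(random_state=42)),
])

mae_scaled = fit_and_score(scaled, *splits)
mae_unscaled = fit_and_score(unscaled, *splits)

print(f"MAE with scaling:    {mae_scaled:.4f}")
print(f"MAE without scaling: {mae_unscaled:.4f}")
```

To apply the same pattern to the task, pass X_train, X_test, y_train, y_test from the starter code to the helper instead of the synthetic splits.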

Observe the difference between the calculated MAE scores. Your answer can be one of the following options:

1. Both scores fall in the $[0, 100]$ range and the absolute difference between them is less than $20$.
2. One of the scores falls in the $[0, 1]$ range, and the other in the $[1, 70]$ range.
3. The two scores differ by a factor greater than one thousand, with the scaled score being significantly closer to $0$ than the unscaled score.
4. One of the scores lies in the $[1, 50]$ range, and the other lies in the $[60, 120]$ range.

Enter the number of the correct option (1 to 4, in the order listed above) as an integer.

