Description
In the previous stages, we implemented stochastic gradient descent for the mean squared error and log-loss cost functions.
At this stage, you need to train three models:
- Your implementation of logistic regression with the fit_mse cost function;
- The same logistic regression with the fit_log_loss cost function;
- The sklearn logistic regression algorithm.
Our cost functions determine the errors in the following way (y_hat_i is the prediction for the i-th training sample, y_i is its label, and n is the training-set size):

The mean squared error from fit_mse:

J_i = (y_hat_i - y_i)^2 / n

The log-loss from fit_log_loss:

J_i = -(y_i * ln(y_hat_i) + (1 - y_i) * ln(1 - y_hat_i)) / n
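As a quick sanity check, these per-sample error terms can be written as small helper functions. This is only an illustrative sketch; the names mse_error and log_loss_error are not part of the required API:

```python
import numpy as np

def mse_error(y_hat, y, n):
    # Per-sample squared error, scaled by the training-set size n
    return (y_hat - y) ** 2 / n

def log_loss_error(y_hat, y, n):
    # Per-sample log-loss, scaled by the training-set size n
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / n

# With zero initial weights, a sigmoid model predicts y_hat = 0.5 for every
# sample; with n = 455 training rows this reproduces the first values of
# mse_error_first and logloss_error_first in the Examples section below.
print(mse_error(0.5, 1, 455))       # ≈ 0.00054945
print(log_loss_error(0.5, 1, 455))  # ≈ 0.00152340
```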
Our goal is to minimize errors during the training process. In this stage, we are going to analyze their behavior.
Objectives
- Load the dataset and select the same independent and target variables as in the previous stage;
- Standardize X;
- Instantiate the CustomLogisticRegression class;
- Use the train-test split from Stage 1;
- Fit a model with the training set using the fit_log_loss method;
- Fit a model with the training set using the fit_mse method;
- Import LogisticRegression from sklearn.linear_model and fit it with the training set;
- Determine the error values during the first and the last epoch of training the custom logistic regression with the fit_mse method. Recalculate the error after each step of stochastic gradient descent, since the coefficients and the bias are updated at every step;
- Repeat the same operation for the fit_log_loss method;
- Predict y_hat values for the test set with all three models;
- Calculate the accuracy scores on the test set for all models;
- Print the accuracy scores of all models and the errors from the first and the last epochs of training the custom models as a Python dictionary. Print the accuracies and errors in the same order as in the Examples section.
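The sklearn part of these objectives could look roughly like the sketch below. It assumes the scikit-learn breast cancer dataset, whose feature names match the columns used in this stage; the CustomLogisticRegression steps are omitted because they depend on your previous-stage code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

columns = ['worst concave points', 'worst perimeter', 'worst radius']
data = load_breast_cancer(as_frame=True)
X = StandardScaler().fit_transform(data.data[columns])  # standardize X
y = data.target

# Same split parameters as in Stage 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=43)

sk_model = LogisticRegression(fit_intercept=True)
sk_model.fit(X_train, y_train)
y_hat = sk_model.predict(X_test)
print(accuracy_score(y_test, y_hat))
```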
Use the following parameters for all three models:
n_epoch = 1000 # although this parameter can be specified only for custom models
fit_intercept = True
l_rate = 0.01
columns = ['worst concave points', 'worst perimeter', 'worst radius'] # same as in the previous stage

If your solution works properly, you will receive graph.jpg in the current directory. It shows the errors plotted on four graphs. Explore these plots and answer the following questions:
What is the minimum MSE value for the first epoch?
What is the minimum MSE value for the last epoch?
What is the maximum Log-loss value for the first epoch?
What is the maximum Log-loss value for the last epoch?
Has the range of the MSE values expanded or narrowed? (expanded/narrowed)
Has the range of the Log-loss values expanded or narrowed? (expanded/narrowed)
Once you're done with the Objectives, provide answers to the questions in the format shown below. Round numbers to the fifth decimal place:
Answers to the questions: # the actual answers may differ
1) 0.00080
2) 0.00000
3) 0.00242
4) 0.00100
5) expanded
6) narrowed

Tip: In Python, you can use triple quotes ("""here is your multi-line string""") to print multi-line strings.
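For example, the rounding and the multi-line printing can be combined with an f-string. The numeric values below are placeholders, not the real answers:

```python
# Placeholder values -- read the real ones from graph.jpg
min_mse_first = 0.000801
min_mse_last = 0.0000004
max_logloss_first = 0.002423
max_logloss_last = 0.001001

answers = f"""Answers to the questions:
1) {min_mse_first:.5f}
2) {min_mse_last:.5f}
3) {max_logloss_first:.5f}
4) {max_logloss_last:.5f}
5) expanded
6) narrowed"""
print(answers)
```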
Examples
There are several examples below with different instantiation parameters and, therefore, different answers. The independent variables also differ, but the train-test split parameters always remain the same (train_size=0.8, random_state=43). Only part of each answer is shown because the error arrays are long. After each output example, there is an attached text file with the full answer, so you can test your program and make sure that it works correctly.
Example 1: instantiation parameters
n_epoch = 30
fit_intercept = False
l_rate = 0.01
columns = ['mean radius', 'mean smoothness'] # names of independent variables

Output (a Python dict):
{'mse_accuracy': 0.8947368421052632, 'logloss_accuracy': 0.8947368421052632, 'sklearn_accuracy': 0.9035087719298246, 'mse_error_first': [0.0005494505494505495, 0.0005487619260382545, ...], 'mse_error_last': [0.0002595746966187235, 3.58874250533432e-05, ...], 'logloss_error_first': [0.0015234003968350445, 0.001523388285406441, ...], 'logloss_error_last': [0.0014739026005740753, 0.0014436401701211706, ...]}

Download full output as a file
Example 2: instantiation parameters
n_epoch = 500
fit_intercept = True
l_rate = 1
columns = ['smoothness error', 'mean fractal dimension', 'texture error'] # names of independent variables

Output (a Python dict):
{'mse_accuracy': 0.6052631578947368, 'logloss_accuracy': 0.6666666666666666, 'sklearn_accuracy': 0.6666666666666666, 'mse_error_first': [0.0005494505494505495, 0.000604497455225717, ...], 'mse_error_last': [0.0009635572252935032, 0.0008034312168157345, ...], 'logloss_error_first': [0.0015234003968350445, 0.0015243461138814232, ...], 'logloss_error_last': [0.002177979476029927, 0.0010583180413733793, ...]}

Download full output as a file
{'mse_accuracy': 0.6052631578947368, 'logloss_accuracy': 0.6666666666666666, 'sklearn_accuracy': 0.6666666666666666, 'mse_error_first': [0.0005494505494505495, 0.000604497455225717, ...], 'mse_error_last': [0.0009635572252935032, 0.0008034312168157345, ...], 'logloss_error_first': [0.0015234003968350445, 0.0015243461138814232, ...], 'logloss_error_last': [0.002177979476029927, 0.0010583180413733793, ...]}Download full output as a file