Logistic Regression from Scratch. Stage 4/4

Visualize it!


Description

In the previous stages, we successfully implemented stochastic gradient descent for the mean squared error and log-loss cost functions.

At this stage, you need to train three models:

  • Your implementation of logistic regression with the fit_mse cost function;

  • The same logistic regression with the fit_log_loss cost function;

  • The sklearn logistic regression algorithm.

Our cost functions measure the error in the following way:

  • The Mean squared error from fit_mse:

J(b_0, b_1, \ldots) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y_i} - y_i)^2

  • The Log-loss from fit_log_loss:

J(b_0, b_1, \ldots) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \cdot \log(\hat{y_i}) + (1 - y_i) \cdot \log(1 - \hat{y_i}) \right]

Our goal is to minimize the error during training. In this stage, we are going to analyze how it behaves.
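For reference, here is a minimal NumPy sketch of the two cost computations. The names sigmoid, mse_cost, and log_loss_cost are illustrative, not part of the required interface:

import numpy as np

def sigmoid(t):
    # Logistic function: maps a raw linear score to a probability in (0, 1)
    return 1 / (1 + np.exp(-t))

def mse_cost(y_hat, y):
    # Mean squared error: J = (1/n) * sum((y_hat_i - y_i) ** 2)
    return np.mean((y_hat - y) ** 2)

def log_loss_cost(y_hat, y):
    # Log-loss: J = -(1/n) * sum(y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i))
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))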

Objectives

  1. Load the dataset and select the same independent and target variables as in the previous stage;

  2. Standardize X;

  3. Instantiate the CustomLogisticRegression class;

  4. Use the train-test split from Stage 1;

  5. Fit a model with the training set using the fit_log_loss method;

  6. Fit a model with the training set using the fit_mse method;

  7. Import LogisticRegression from sklearn.linear_model and fit it with the training set;

  8. Determine the error values during the first and the last epochs of training the custom logistic regression with the fit_mse method (see the sketch after this list);

    We need to recalculate the error after each step of stochastic gradient descent, as the coefficients and the bias are updated.

  9. Repeat the same operation for the fit_log_loss method;

  10. Predict y_hat values for the test set with all three models;

  11. Calculate the accuracy scores for the test set for all models;

  12. Print the accuracy scores of all models and the errors from the first and the last epochs of training the custom models as a Python dictionary. Print the accuracies and errors in the same order as in the Examples section.
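The sketch below shows one way to do the tracking for the fit_mse method. It is written as a standalone function rather than a class method, and, at every SGD step of the first and last epochs, it records the current sample's contribution to J (its squared error divided by n) -- the reading that reproduces the leading values in the Examples section. All names are illustrative:

import numpy as np

def fit_mse_tracking(X, y, l_rate=0.01, n_epoch=1000):
    # Sketch of stochastic gradient descent on the MSE cost with
    # fit_intercept=True: coef[0] is the bias, coef[1:] the weights
    n = len(y)
    coef = np.zeros(X.shape[1] + 1)
    error_first, error_last = [], []
    for epoch in range(n_epoch):
        for i, row in enumerate(X):
            y_hat = 1 / (1 + np.exp(-(coef[0] + row @ coef[1:])))
            if epoch in (0, n_epoch - 1):
                # Record this sample's contribution to J at every step
                err = (y_hat - y[i]) ** 2 / n
                (error_first if epoch == 0 else error_last).append(err)
            # One SGD step: gradient of the squared error through the sigmoid
            grad = (y_hat - y[i]) * y_hat * (1 - y_hat)
            coef[0] -= l_rate * grad
            coef[1:] -= l_rate * grad * row
    return coef, error_first, error_last

The fit_log_loss version is analogous: record the current sample's log-loss term divided by n instead of its squared error.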

Use the following parameters for all three models:

n_epoch = 1000  # although this parameter can be specified only for custom models
fit_intercept = True
l_rate = 0.01

columns = ['worst concave points', 'worst perimeter', 'worst radius'] # same as in the previous stage
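Putting the objectives together, the overall flow might look like the following sketch. It assumes the Wisconsin breast cancer dataset from the earlier stages (via sklearn.datasets.load_breast_cancer) and shows only the sklearn model end to end; the two custom models follow the same pattern with your CustomLogisticRegression class:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the data and select the independent and target variables
data = load_breast_cancer(as_frame=True)
X = data.data[['worst concave points', 'worst perimeter', 'worst radius']]
y = data.target

# Standardize X (z-score each column)
X = (X - X.mean()) / X.std()

# Same split as in Stage 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=43)

# Fit the sklearn model; fit your custom models on the same split with
# their fit_mse and fit_log_loss methods before assembling the dictionary
sk_model = LogisticRegression(fit_intercept=True)
sk_model.fit(X_train, y_train)
sklearn_accuracy = accuracy_score(y_test, sk_model.predict(X_test))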

If your solution works properly, graph.jpg will appear in the current directory. It shows the errors plotted on four graphs. Explore these plots and answer the following questions (a sketch for computing the answers directly from the error arrays follows the list):

  1. What is the minimum MSE value for the first epoch?

  2. What is the minimum MSE value for the last epoch?

  3. What is the maximum Log-loss value for the first epoch?

  4. What is the maximum Log-loss value for the last epoch?

  5. Has the range of the MSE values expanded or narrowed? (expanded/narrowed)

  6. Has the range of the Log-loss values expanded or narrowed? (expanded/narrowed)
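If you prefer to read the answers off the arrays rather than the plots, the minimums, maximums, and ranges can be computed directly. A sketch with placeholder lists (substitute the error lists collected during training):

# Placeholder data; substitute the error lists collected during training
mse_error_first = [0.00055, 0.00054, 0.00026]
mse_error_last = [0.00026, 0.00004, 0.00001]
logloss_error_first = [0.00152, 0.00150, 0.00148]
logloss_error_last = [0.00147, 0.00144, 0.00140]

def spread(errors):
    # Range of an error list: max minus min
    return max(errors) - min(errors)

print(f"1) {min(mse_error_first):.5f}")
print(f"2) {min(mse_error_last):.5f}")
print(f"3) {max(logloss_error_first):.5f}")
print(f"4) {max(logloss_error_last):.5f}")
print("5)", "narrowed" if spread(mse_error_last) < spread(mse_error_first) else "expanded")
print("6)", "narrowed" if spread(logloss_error_last) < spread(logloss_error_first) else "expanded")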

Once you're done with the Objectives, provide answers to the questions in the format shown below. Round numbers to the fifth decimal place:

Answers to the questions:  # the actual answers may differ
1) 0.00080
2) 0.00000
3) 0.00242
4) 0.00100
5) expanded
6) narrowed

Tip: In Python, you can use triple quotes """here is your multi-line string""" to print multi-line strings.
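For example, using the sample values from the block above:

print("""Answers to the questions:
1) 0.00080
2) 0.00000
3) 0.00242
4) 0.00100
5) expanded
6) narrowed""")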

Examples

There are several examples below with different instantiation parameters and, therefore, different answers. The independent variables also differ, but the train-test split parameters always remain the same (train_size=0.8, random_state=43). Only part of each answer is shown because the error arrays are long. After each output example, there is an attached text file with the full answer, so you can test your program and make sure it works correctly.

Example 1: instantiation parameters

n_epoch = 30
fit_intercept = False
l_rate = 0.01
columns = ['mean radius', 'mean smoothness'] # names of independent variables

Output (a Python dict):

{'mse_accuracy': 0.8947368421052632, 'logloss_accuracy': 0.8947368421052632, 'sklearn_accuracy': 0.9035087719298246, 'mse_error_first': [0.0005494505494505495, 0.0005487619260382545, ...], 'mse_error_last': [0.0002595746966187235, 3.58874250533432e-05, ...], 'logloss_error_first': [0.0015234003968350445, 0.001523388285406441, ...], 'logloss_error_last': [0.0014739026005740753, 0.0014436401701211706, ...]}

Download full output as a file

[Figure: Logistic Regression from Scratch, log-loss function charts for example 1]

Example 2: instantiation parameters

n_epoch = 500
fit_intercept = True
l_rate = 1
columns = ['smoothness error', 'mean fractal dimension', 'texture error'] # names of independent variables

Output (a Python dict):

{'mse_accuracy': 0.6052631578947368, 'logloss_accuracy': 0.6666666666666666, 'sklearn_accuracy': 0.6666666666666666, 'mse_error_first': [0.0005494505494505495, 0.000604497455225717, ...], 'mse_error_last': [0.0009635572252935032, 0.0008034312168157345, ...], 'logloss_error_first': [0.0015234003968350445, 0.0015243461138814232, ...], 'logloss_error_last': [0.002177979476029927, 0.0010583180413733793, ...]}

Download full output as a file

[Figure: Logistic Regression from Scratch, log-loss function charts for example 2]
