Logistic Regression from Scratch. Stage 2/4

Gradient descent with MSE

Description

In the previous stage, we provided you with the coef_ values. In this stage, you need to estimate the coef_ values yourself with gradient descent on the Mean squared error cost function. Gradient descent is an optimization technique for finding the local minimum of a cost function using its first-order derivatives. To be precise, we're going to implement Stochastic gradient descent (SGD).

The Mean squared error cost function can be expressed as:

J(b_0, b_1, \dots) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y_i} - y_i)^2

Where $i$ indexes the rows (observations), and:

\hat{y_i} = \frac{1}{1 + e^{-t_i}}, \quad t_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots

Here $\hat{y_i}$ is the predicted probability value for the $i^{th}$ row, while $y_i$ is its actual value. As usual, $x_{ij}$ is the value in the $i^{th}$ row and the $j^{th}$ column; in other words, it's the $i^{th}$ value of the $j^{th}$ independent variable. The weights are updated by their first-order derivatives in the training loop as follows:

b_1 = b_1 - l\_rate \cdot (\hat{y_i} - y_i) \cdot \hat{y_i} \cdot (1 - \hat{y_i}) \cdot x_{i1} \\
b_2 = b_2 - l\_rate \cdot (\hat{y_i} - y_i) \cdot \hat{y_i} \cdot (1 - \hat{y_i}) \cdot x_{i2} \\
\dots \\
b_j = b_j - l\_rate \cdot (\hat{y_i} - y_i) \cdot \hat{y_i} \cdot (1 - \hat{y_i}) \cdot x_{ij} \\
\dots

The bias $b_0$ can be updated by:

b_0 = b_0 - l\_rate \cdot (\hat{y_i} - y_i) \cdot \hat{y_i} \cdot (1 - \hat{y_i})
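To see the update rule in action, here is a minimal sketch of a single SGD step for one row; the feature values, label, and learning rate below are made-up numbers:

import numpy as np

# one row with two features (made-up values) and its actual label
x_i = np.array([0.5, -1.2])
y_i = 1.0
b0, b = 0.0, np.array([0.0, 0.0])           # bias and weights, initialized to zeros
l_rate = 0.01

t_i = b0 + np.dot(b, x_i)                   # t_i = b0 + b1*x_i1 + b2*x_i2
y_hat = 1 / (1 + np.exp(-t_i))              # sigmoid -> predicted probability
grad = (y_hat - y_i) * y_hat * (1 - y_hat)  # factor shared by all the updates
b0 = b0 - l_rate * grad                     # update the bias b0
b = b - l_rate * grad * x_i                 # update b1 and b2 element-wise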

For learning purposes, we will use the entire training set to update the weights sequentially. The number of epochs n_epoch is the number of passes over the training set. The training loop is a nested for-loop over n_epoch and over all the rows in the train set. If n_epoch = 10 and the number of rows in the training set is 100, the coefficients are updated 1000 times by the end of training:

# Training loop
for one_epoch in range(n_epoch):
    for i, row in enumerate(X_train):
        # update weight b0
        # update weight b1
        # update weight b2
        # ...

The initial values of the weights don't matter much; during training they are optimized toward the values that minimize the cost function. So, you can initialize the weights randomly or set them to zeros, as in the sketch below. The weight optimization takes place inside the fit_mse method.
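For instance, assuming the weights are kept in a numpy array, zero initialization could look like this (three features as in this stage, plus one extra slot for the bias when fit_intercept is True):

import numpy as np

n_features = 3                          # worst concave points, worst perimeter, worst radius
fit_intercept = True
n_weights = n_features + (1 if fit_intercept else 0)

coef_ = np.zeros(n_weights)             # all zeros...
# coef_ = np.random.random(n_weights)   # ...or random values; either starting point works
print(coef_)                            # [0. 0. 0. 0.]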

If a particular weight is updated in large increments, it descends the quadratic curve in an erratic way and may jump over to the opposite side of the curve. In that case, we may miss the weight value that minimizes the loss function. The learning rate l_rate scales the update so that the step size allows for a gradual descent along the curve with every iteration:

class CustomLogisticRegression:

    def __init__(self, fit_intercept=True, l_rate=0.01, n_epoch=100):
        self.fit_intercept = ...
        self.l_rate = ...
        self.n_epoch = ...

    def sigmoid(self, t):
        return ...

    def predict_proba(self, row, coef_):
        t = ...
        return self.sigmoid(t)

    def fit_mse(self, X_train, y_train):
        self.coef_ = ...  # initialized weights

        for _ in range(self.n_epoch):
            for i, row in enumerate(X_train):
                y_hat = self.predict_proba(row, self.coef_)
                # update all weights

    def predict(self, X_test, cut_off=0.5):
        ...
        for row in X_test:
            y_hat = self.predict_proba(row, self.coef_)
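            # compare y_hat with cut_off and append 0 or 1 to predictions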
        return predictions # predictions are binary values - 0 or 1
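To make the template concrete, here is one possible way to fill it in. It is only a sketch: it assumes X_train, y_train, and X_test are numpy arrays and stores the bias as the first element of coef_; both are implementation choices, not requirements.

import numpy as np

class CustomLogisticRegression:

    def __init__(self, fit_intercept=True, l_rate=0.01, n_epoch=100):
        self.fit_intercept = fit_intercept
        self.l_rate = l_rate
        self.n_epoch = n_epoch

    def sigmoid(self, t):
        return 1 / (1 + np.exp(-t))

    def predict_proba(self, row, coef_):
        # t = b0 + b1*x_i1 + b2*x_i2 + ...; the bias b0 sits in coef_[0] when fit_intercept is True
        if self.fit_intercept:
            t = coef_[0] + np.dot(row, coef_[1:])
        else:
            t = np.dot(row, coef_)
        return self.sigmoid(t)

    def fit_mse(self, X_train, y_train):
        n_weights = X_train.shape[1] + (1 if self.fit_intercept else 0)
        self.coef_ = np.zeros(n_weights)  # initialized weights

        for _ in range(self.n_epoch):
            for i, row in enumerate(X_train):
                y_hat = self.predict_proba(row, self.coef_)
                # common factor of all the update rules shown above
                grad = (y_hat - y_train[i]) * y_hat * (1 - y_hat)
                if self.fit_intercept:
                    self.coef_[0] -= self.l_rate * grad         # bias b0
                    self.coef_[1:] -= self.l_rate * grad * row  # b1, b2, ...
                else:
                    self.coef_ -= self.l_rate * grad * row

    def predict(self, X_test, cut_off=0.5):
        predictions = []
        for row in X_test:
            y_hat = self.predict_proba(row, self.coef_)
            predictions.append(1 if y_hat >= cut_off else 0)
        return np.array(predictions)  # predictions are binary values - 0 or 1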

The predict method calculates y_hat for each row in the test set and returns a numpy array with the resulting predictions. Since we are solving a binary classification problem, the predicted values can only be 0 or 1. The return of predict depends on the cut-off point: predict_proba probabilities that are less than the cut-off point are rounded to 0, while those that are equal to or greater than it are rounded to 1. Set the default cut-off value to 0.5. To determine the prediction accuracy of your model, use accuracy_score from sklearn.metrics.
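For example, turning probabilities into class labels with a 0.5 cut-off and scoring them could look like this (the probabilities and true labels below are made up):

import numpy as np
from sklearn.metrics import accuracy_score

probas = np.array([0.12, 0.85, 0.50, 0.47])    # hypothetical predict_proba outputs
y_true = np.array([0, 1, 1, 1])

cut_off = 0.5
predictions = (probas >= cut_off).astype(int)  # < cut_off -> 0, >= cut_off -> 1
print(predictions)                             # [0 1 1 0]
print(accuracy_score(y_true, predictions))     # 0.75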

Objectives

  1. Implement the fit_mse method;

  2. Implement the predict method;

  3. Load the dataset. Select the following columns as independent variables: worst concave points, worst perimeter, worst radius. The target variable remains the same;

  4. Standardize X;

  5. Instantiate the CustomLogisticRegression class with the following attributes:

    lr = CustomLogisticRegression(fit_intercept=True, l_rate=0.01, n_epoch=1000)
  6. Fit the model with the training set from the previous stage (train_size=0.8 and random_state=43) using fit_mse;

  7. Predict y_hat values for the test set;

  8. Calculate the accuracy score for the test set;

  9. Print coef_ array and accuracy score as a Python dictionary in the format shown in the Examples section.
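Put together, the objectives above could be wired up roughly as follows. The snippet assumes the Wisconsin breast cancer dataset from sklearn.datasets (as in the previous stage), z-score standardization, and the CustomLogisticRegression class sketched earlier; treat it as one possible arrangement, not the required one:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load the data and select the three independent variables for this stage
data = load_breast_cancer(as_frame=True)
X = data.data[['worst concave points', 'worst perimeter', 'worst radius']]
y = data.target

# standardize X (zero mean, unit variance)
X = (X - X.mean()) / X.std()

# the same split as in the previous stage
X_train, X_test, y_train, y_test = train_test_split(
    X.values, y.values, train_size=0.8, random_state=43)

lr = CustomLogisticRegression(fit_intercept=True, l_rate=0.01, n_epoch=1000)
lr.fit_mse(X_train, y_train)
y_pred = lr.predict(X_test)

print({'coef_': lr.coef_.tolist(), 'accuracy': accuracy_score(y_test, y_pred)})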

Examples

The training set in the examples below is the same as in the Objectives section. Only the test set and CustomLogisticRegression class attributes vary.

Example test set (features are standardized):

Standardized X_test and y_test data:

worst concave points    worst perimeter    worst radius      y
            0.320904           0.230304       -0.171560    1.0
           -1.743529          -0.954428       -0.899849    1.0
            1.014627           0.780857        0.773975    0.0
            1.432990          -0.132764       -0.123973    0.0

Example 1: processing the CustomLogisticRegression class with the following attributes

lr = CustomLogisticRegression(fit_intercept=True, l_rate=0.01, n_epoch=100)

Output (a Python dict):

{'coef_': [ 0.7219814 , -2.06824488, -1.44659819, -1.52869155], 'accuracy': 0.75}

Example 2: processing the CustomLogisticRegression class with the following attributes

lr = CustomLogisticRegression(fit_intercept=False, l_rate=0.01, n_epoch=100)

Output (a Python dict):

{'coef_': [-1.86289827, -1.60283708, -1.69204615], 'accuracy': 0.75}