SVM in scikit-learn

As you have seen in the corresponding topic, SVM is an algorithm used mainly in classification. The algorithm's goal is to find the optimal hyperplane that can separate data points of different classes. In this topic, we will go through a hands-on classification example with SVM using scikit-learn.

Available SVM implementations

scikit-learn comes with three SVM implementations for classification: SVC, NuSVC, and LinearSVC. In this topic, we will use the svm.SVC class to classify the data. Below is a brief comparison of the available implementations:

| SVM implementation | Description | Kernel | Notes |
|---|---|---|---|
| SVC | The most general implementation, which can apply the kernel trick. The main tunable parameter is C, which controls the trade-off between the margin of the decision boundary and the misclassification of training samples. | Radial Basis Function (RBF) by default, accepts any custom kernel | Will be slow for large datasets |
| NuSVC | Similar to SVC, but requires tuning the nu parameter instead of C. nu is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors. | Radial Basis Function (RBF) by default, accepts any custom kernel | Offers a more flexible approach w.r.t. the number of support vectors |
| LinearSVC | Similar to SVC, but a more efficient implementation for high-dimensional data with only a single (linear) kernel type. | Only linear kernel | Can handle both sparse and dense datasets; the underlying liblinear implementation converts the initial data representation into a sparse format, which creates a copy of the original data. Less sensitive to large values of C; however, setting the value too high results in performance degradation. |
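
To make the comparison more concrete, here is a minimal sketch that fits all three classifiers with their default settings on the breast cancer dataset used later in this topic. The scaling pipeline, the split parameters, and max_iter=10000 for LinearSVC are choices made for this illustration only, and the exact scores may differ between scikit-learn versions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, NuSVC, LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=27
)

# Default-configured classifiers; max_iter is raised for LinearSVC
# only to reduce the chance of a convergence warning (an assumption).
classifiers = {
    "SVC": SVC(),
    "NuSVC": NuSVC(),
    "LinearSVC": LinearSVC(max_iter=10000),
}

test_accuracy = {}
for name, clf in classifiers.items():
    # Scale the features inside a pipeline, then fit the classifier
    pipeline = make_pipeline(StandardScaler(), clf)
    pipeline.fit(X_train, y_train)
    test_accuracy[name] = pipeline.score(X_test, y_test)
    print(f"{name}: {test_accuracy[name]:.3f}")
```

On a dataset this small, all three train almost instantly; the speed differences mentioned in the table only become visible on much larger datasets.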

Data preparation

In this topic, we will use the breast cancer dataset provided by scikit-learn. We first load the dataset and split it into the training and testing datasets:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=27)

The dataset contains 30 features (stored in X), describing various characteristics of the cell nuclei, such as their radius, area, symmetry, etc. y, the target variable, contains only two classes represented by 0s and 1s. Label 0 indicates the tumor is malignant, and 1 indicates the tumor is benign.
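
If you would like to verify this description yourself, the short sketch below loads the dataset as a Bunch object (instead of using return_X_y=True) and prints its shape, class names, and class counts. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)             # (569, 30): 569 samples, 30 features
print(data.target_names)   # ['malignant' 'benign'], so 0 = malignant, 1 = benign

class_counts = np.bincount(y)
print(class_counts)        # [212 357]: 212 malignant, 357 benign

print(data.feature_names[:3])  # names of the first three features
```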

SVM is affected by the feature scales, so we perform standardization to bring all features to the same scale:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
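
Note that the scaler learns the mean and standard deviation from the training set only and then reuses them for the test set. As a quick sanity check (a sketch with illustrative variable names, repeating the preparation steps so that it runs on its own), we can confirm that every training feature ends up with a mean of about 0 and a standard deviation of about 1, while the test set is only approximately standardized:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=27
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean/std from the training set
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)

print(np.abs(train_means).max())                 # very close to 0
print(train_stds.min(), train_stds.max())        # both very close to 1
print(np.abs(X_test_scaled.mean(axis=0)).max())  # small, but not exactly 0
```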

Fitting SVC

Initializing the default SVC object is straightforward:

from sklearn.svm import SVC

model = SVC()

Below is a review of the main parameters of SVC:

  • C (float, default=1.0) — the regularization parameter, which controls how strongly misclassified training examples are penalized. SVM looks for a hyperplane that has the largest possible margin and also classifies as many samples as possible correctly, and these two goals create a bias-variance trade-off. A large value of C reduces the regularization strength: a hyperplane with a smaller margin will be chosen if it classifies the training samples better, so bigger values might lead to overfitting. On the other hand, a small value of C results in a larger-margin hyperplane, even if that hyperplane misclassifies more samples, which can result in underfitting. A smaller value of C is recommended if there are a lot of outliers.
  • kernel ('linear'/'poly'/'rbf'/'sigmoid'/'precomputed'/callable, default='rbf') — the kernel function to be used. Passing a callable lets you supply a custom kernel function.
  • degree (int, only if kernel='poly', default=3) — degree of the polynomial kernel function.
  • gamma ('scale'/'auto'/float, only for kernel in ['rbf', 'poly', 'sigmoid'], default='scale') — the kernel coefficient. Higher values of gamma narrow the region in which each training sample influences the decision function, so the decision boundary follows individual points more tightly, which can lead to poorer generalization and overfitting. Smaller values have the opposite effect: they simplify the decision boundary and tend to improve generalization, although values that are too small can underfit. The default value ('scale') is computed as 1 / (n_features * X.var()), while 'auto' uses 1 / n_features.
  • shrinking (bool, default=True) — whether to use the shrinking heuristic, which can speed up training but may hurt precision.
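
To get a feel for these parameters, here is a small experiment (a sketch, not a tuning recipe). It fits SVC with three arbitrarily chosen values of C and reports the number of support vectors together with the training and test accuracy. It then checks the formula for gamma='scale' by passing the computed value explicitly. The values of C and the variable names are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=27
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Effect of C: weaker regularization as C grows
results = {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(C=C).fit(X_train, y_train)
    results[C] = {
        "n_support_vectors": int(clf.n_support_.sum()),
        "train_accuracy": clf.score(X_train, y_train),
        "test_accuracy": clf.score(X_test, y_test),
    }
    print(C, results[C])

# gamma='scale' should match 1 / (n_features * X.var()) on the training data
manual_gamma = float(1.0 / (X_train.shape[1] * X_train.var()))
scale_pred = SVC(gamma="scale").fit(X_train, y_train).predict(X_test)
manual_pred = SVC(gamma=manual_gamma).fit(X_train, y_train).predict(X_test)

same_predictions = np.array_equal(scale_pred, manual_pred)
print(manual_gamma, same_predictions)
```

A large gap between the training and test accuracy for a big C would be a sign of the overfitting described above.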

Then, we fit the model on the training set:

model.fit(X_train, y_train)
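
After fitting, the estimator exposes what it has learned through attributes ending with an underscore. The sketch below repeats the preparation steps so that it runs on its own, then looks at the support vectors and at the raw decision function values that predict is based on.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=27
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = SVC()
model.fit(X_train, y_train)

print(model.classes_)                # [0 1]
print(model.support_vectors_.shape)  # (number of support vectors, 30)
print(model.n_support_)              # number of support vectors per class

# In the binary case, decision_function returns one signed score per sample:
# a positive score corresponds to class 1, a negative one to class 0.
scores = model.decision_function(X_test)
preds = model.predict(X_test)
print(scores[:5], preds[:5])
```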

Once SVC is fitted, we can move to the performance evaluation.

Inspecting the model

Let's quickly check the mean accuracy of the default SVC on the holdout set:

model.score(X_test, y_test)
# 0.9840425531914894
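
For classifiers, score is the same as accuracy_score computed on the predictions. Since accuracy alone hides per-class behavior, a per-class summary can also be useful. The sketch below repeats the setup so that it runs on its own; the class names passed to classification_report follow the label order 0 = malignant, 1 = benign.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=27
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = SVC().fit(X_train, y_train)
y_pred = model.predict(X_test)

# model.score and accuracy_score give the same number
accuracy = accuracy_score(y_test, y_pred)
mean_accuracy = model.score(X_test, y_test)
print(accuracy, mean_accuracy)

# Precision, recall, and F1 for each class
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```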

We can also build a confusion matrix:

from sklearn.metrics import confusion_matrix
import seaborn as sns
import numpy as np

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

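# For labels 0 and 1, confusion_matrix returns [[TN, FP], [FN, TP]]
group_names = ['True Negative', 'False Positive', 'False Negative', 'True Positive']
group_counts = cm.flatten()

# Combine each cell name with its count and arrange them as a 2x2 grid
labels = [f'{v1}\n{v2}' for v1, v2 in zip(group_names, group_counts)]
labels = np.asarray(labels).reshape(2, 2)

sns.heatmap(cm, annot=labels, fmt='')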

A confusion matrix for the classification results
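
The four cells of the confusion matrix are enough to compute precision and recall by hand. The sketch below does this and compares the results with scikit-learn's own functions. Keep in mind that the positive class here is label 1 (benign); to treat malignant tumors as the positive class, you could pass pos_label=0 to precision_score and recall_score.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=27
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

y_pred = SVC().fit(X_train, y_train).predict(X_test)
cm = confusion_matrix(y_test, y_pred)

# For a binary problem, ravel() yields the cells in this order
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found

print(precision, precision_score(y_test, y_pred))
print(recall, recall_score(y_test, y_pred))
```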

Hyperparameter tuning with GridSearchCV

Judging by the test score, it looks like we got lucky with the default hyperparameter set. Let's see whether grid search picks a different hyperparameter configuration. Grid search splits the training dataset into multiple folds and evaluates each combination of hyperparameters via cross-validation.

from sklearn.model_selection import GridSearchCV

kernel = ['rbf', 'linear', 'rbf', 'poly']  # 'rbf' is listed twice, so its combinations are evaluated twice
gamma = ['scale', 'auto']
C = [0.1, 1.0, 10, 100, 1000]

param_grid = {"kernel": kernel, "gamma": gamma, "C": C}

grid_search = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_search.fit(X_train, y_train)
grid_search.best_params_

The last line returns {'C': 0.1, 'gamma': 'scale', 'kernel': 'linear'}, whereas the default model used {'C': 1.0, 'gamma': 'scale', 'kernel': 'rbf'}.

grid_search.score(X_test, y_test)

This returns approximately 0.968, which is a drop from the default configuration. We can inspect the results further with grid_search.cv_results_:

   param_C param_gamma param_kernel  mean_test_score  rank_test_score
1      0.1       scale       linear         0.966029                1
5      0.1        auto       linear         0.966029                1
...
8      1.0       scale          rbf         0.963397                3

We can see that the mean_test_score (the mean score over the cross-validation folds) is only slightly higher for the chosen configuration than for the default one, and the difference is negligible. The lower score on the test set follows from how grid search selects hyperparameters: it ranks the configurations by their average score over cross-validation splits of the training set (so SVC was refit multiple times on different subsets), and this ranking does not have to match the ranking on one particular holdout set. The default configuration happened to do very well on our test split, which is not something to count on in general. If you look further into grid_search.cv_results_, you can see how the scores vary across the individual cross-validation splits. In any case, tuning the hyperparameters against the test set is likely to lead to overfitting on the test data, so a cross-validation-based technique such as grid search should be applied instead.
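
A convenient way to explore cv_results_ is to load it into a pandas DataFrame. The sketch below is self-contained and uses a smaller grid than the one above to keep it quick; the reduced grid and the selected columns are choices made for this illustration, so the row indices will not match the table shown earlier.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=27
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# A reduced, illustrative grid: 3 kernels x 2 gamma settings x 3 values of C
param_grid = {
    "kernel": ["rbf", "linear", "poly"],
    "gamma": ["scale", "auto"],
    "C": [0.1, 1.0, 10],
}
grid_search = GridSearchCV(estimator=SVC(), param_grid=param_grid)
grid_search.fit(X_train, y_train)

# One row per hyperparameter combination
results = pd.DataFrame(grid_search.cv_results_)
columns = [
    "param_C", "param_gamma", "param_kernel",
    "mean_test_score", "std_test_score", "rank_test_score",
]
print(results[columns].sort_values("rank_test_score").head())

# Per-fold scores are stored in split0_test_score, split1_test_score, ...
split_columns = [c for c in results.columns if c.startswith("split")]
print(results[split_columns].head())
```

The std_test_score column shows how much the score varies between folds, which helps to judge whether a small difference in mean_test_score is meaningful.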

Conclusion

In this topic, we learned:

  • how to use the sklearn.svm.SVC class;

  • how to perform SVC's hyperparameter tuning.
