
Agglomerative clustering in scikit-learn

Dimensionality reduction


Note that this task assumes scikit-learn version 1.2.2.

You have the following starter code:

from sklearn.decomposition import PCA
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.cluster import adjusted_rand_score
import numpy as np

X, y = load_wine(return_X_y=True)

To complete this task, perform the following steps:

  • First, standardize the data (X).
  • After standardization, use PCA to reduce the data to 2 components. You will need two versions of the data, with and without PCA; standardization is applied to both.
  • Fit AgglomerativeClustering() with 3 clusters on both the non-PCA and the PCA-reduced data, and obtain the cluster labels for each.
  • Calculate the adjusted Rand score of each clustering against the true labels (y).

The answer should be the absolute difference between the two adjusted Rand scores, rounded to three decimal places.
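The steps above could be sketched roughly as follows. This is one possible reading of the task, not necessarily the intended solution: it assumes the default AgglomerativeClustering() settings apart from the number of clusters, and that each clustering is scored against the true labels y. The variable names (X_scaled, X_pca, labels_full, labels_pca, ari_full, ari_pca, difference) are illustrative only.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics.cluster import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Step 1: standardize every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Step 2: project the standardized data onto its first 2 principal components.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Step 3: cluster both versions of the data into 3 clusters.
# AgglomerativeClustering has no separate predict(); fit_predict() returns the labels.
labels_full = AgglomerativeClustering(n_clusters=3).fit_predict(X_scaled)
labels_pca = AgglomerativeClustering(n_clusters=3).fit_predict(X_pca)

# Step 4: compare each clustering with the true class labels (assumed reference).
ari_full = adjusted_rand_score(y, labels_full)
ari_pca = adjusted_rand_score(y, labels_pca)

# Absolute difference between the two scores, rounded to three decimals.
difference = round(abs(ari_full - ari_pca), 3)
print(difference)
```

The adjusted Rand score is invariant to how the cluster labels are numbered, so the labels returned by the two models do not need to be aligned with y before scoring.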

