In this task, you will perform text clustering on the Wine Reviews dataset from Kaggle. The dataset's Kaggle page describes each column in detail.
The dataset consists of two CSV files. Load winemag-data_first150k.csv:
import pandas as pd
df = pd.read_csv('/content/wine-reviews/winemag-data_first150k.csv')
Note that the /content/ prefix applies only if you work in Google Colaboratory; on any other platform, change the path accordingly. The review texts are stored in df['description'].
Vectorize the reviews with TF-IDF using max_df=0.95 and min_df=5, and remember to remove English stop words (stop_words='english'). When splitting the data into training and test sets, use test_size=0.33 and random_state=42. Then cluster the texts with KMeans using n_clusters=7, max_iter=100, n_init=2, and random_state=42.
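A minimal scikit-learn sketch of these steps follows. The task does not say whether the split should happen before or after vectorization, so this sketch assumes the raw texts are split first and that both TfidfVectorizer and KMeans are fitted on the training part only. The helper name fit_review_clusters is hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split


def fit_review_clusters(texts: pd.Series):
    """Hypothetical helper: split texts, fit TF-IDF and KMeans on the training part.

    Assumption: the split is applied to the raw texts before vectorization,
    and the vectorizer is fitted on the training texts only.
    """
    # Split with the parameters required by the task
    train_texts, test_texts = train_test_split(texts, test_size=0.33, random_state=42)

    # TF-IDF with the required thresholds and scikit-learn's built-in English stop-word list
    vectorizer = TfidfVectorizer(stop_words="english", max_df=0.95, min_df=5)
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)  # reuse the training vocabulary

    # KMeans with the required settings, fitted on the training matrix
    kmeans = KMeans(n_clusters=7, max_iter=100, n_init=2, random_state=42)
    kmeans.fit(X_train)
    return vectorizer, kmeans, X_train, X_test
```

With the real data you would call it as fit_review_clusters(df['description']). If the intended setup fits the vectorizer on all reviews before splitting, move the fit_transform call above the split; the two variants can number the clusters differently, so check which one your course expects.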
The file below contains one review from this dataset. Find that review in df (that is, locate its row) and enter the number of the cluster it belongs to. For example, if the review falls into cluster 4, the answer is 4.
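One way to locate the review and read off its cluster is sketched below. It assumes the review in the file is copied verbatim from df['description'] (apart from surrounding whitespace) and that you already have a TF-IDF vectorizer and a KMeans model fitted with the parameters above. The function name find_review_cluster is hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def find_review_cluster(df: pd.DataFrame, review: str,
                        vectorizer: TfidfVectorizer, kmeans: KMeans):
    """Hypothetical helper: return (row_index, cluster) for a review text in df['description']."""
    # Exact text match after trimming whitespace; NaN descriptions never match
    matches = df.loc[df["description"].str.strip() == review.strip()].index
    if len(matches) == 0:
        raise ValueError("review not found in df['description']")
    row = matches[0]  # duplicated descriptions share the same text, hence the same cluster

    # Vectorize with the fitted TF-IDF model; predict assigns the nearest centroid
    vec = vectorizer.transform([df.at[row, "description"]])
    cluster = int(kmeans.predict(vec)[0])
    return row, cluster
```

Because kmeans.predict assigns the nearest centroid, this works whether the review ended up in the training set or the test set.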