Project

K-Means Clustering from Scratch

Medium
32 completions
~ 16 hours
3.8

Consolidate your knowledge of K-Means by creating a working algorithm from scratch. Find appropriate positions for the cluster centers in the training loop, select an appropriate k, and see how well the whole thing performs.

Provided by

JetBrains Academy

About

In this project, you'll implement one of the simplest algorithms for clustering data: K-Means. Build it from scratch using only NumPy, with matplotlib for visualization. We'll stick to the Wine dataset, so at least it will not be boring!

Training project

This project allows you to practice and strengthen your coding skills, helping you get ready for more advanced tasks ahead.

What you'll learn

Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:
Find a new cluster center using the information from the previous step.
Implement the whole fit-predict class and make the algorithm work.
Try using your coded algorithm with different values of k to find an appropriate one.
Automate the process of finding an appropriate k by writing a function for that.
Finish the task by using the power of your code to predict clusters for each object of the dataset and compare them to the "real" clusters (classes).
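
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's reference solution: the class name SimpleKMeans, the helper inertia_by_k, the random-sample initialization, the convergence tolerance, and the use of inertia (within-cluster sum of squared distances) to compare values of k are all assumptions made for the example.

```python
import numpy as np


class SimpleKMeans:
    """Minimal K-Means sketch; names and defaults are illustrative assumptions."""

    def __init__(self, k, max_iter=100, tol=1e-6, seed=0):
        self.k = k
        self.max_iter = max_iter
        self.tol = tol
        self.seed = seed
        self.centers_ = None

    def _nearest(self, X):
        # Euclidean distance from every point to every center: shape (n_samples, k).
        dists = np.linalg.norm(X[:, None, :] - self.centers_[None, :, :], axis=2)
        return dists.argmin(axis=1)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.seed)
        # Assumption: start from k distinct samples picked at random.
        self.centers_ = X[rng.choice(len(X), size=self.k, replace=False)].copy()
        for _ in range(self.max_iter):
            labels = self._nearest(X)
            new_centers = self.centers_.copy()
            for j in range(self.k):
                members = X[labels == j]
                if len(members) > 0:  # keep the old center if a cluster is empty
                    new_centers[j] = members.mean(axis=0)
            shift = np.linalg.norm(new_centers - self.centers_)
            self.centers_ = new_centers
            if shift < self.tol:  # centers stopped moving
                break
        return self

    def predict(self, X):
        return self._nearest(np.asarray(X, dtype=float))

    def inertia(self, X):
        # Sum of squared distances from each point to its assigned center.
        X = np.asarray(X, dtype=float)
        labels = self._nearest(X)
        return float(((X - self.centers_[labels]) ** 2).sum())


def inertia_by_k(X, k_values):
    """Fit one model per k and collect its inertia, e.g. for an elbow plot."""
    return {k: SimpleKMeans(k).fit(X).inertia(X) for k in k_values}
```

Calling SimpleKMeans(k=3).fit(X).predict(X) on a feature matrix X returns one cluster index per row. Cluster indices are arbitrary, so comparing them to the "real" classes requires matching clusters to classes first rather than checking the numbers for equality.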

Reviews

Krzysztof Kopel
5 months ago
Actually, quite a lot (although the project could be even bigger if you were to include more advanced topics).
Anton Teplov
7 months ago
I like this project. I had some issues with NumPy, but the concept of the project is very interesting and engaging.
synth
1 year ago
Moderator
I've learned about KMeans, the elbow rule, and silhouette scores. What seemed like an easy project turned out to be quite challenging. I kept struggling with the dimensions in NumPy and ended up failing over and over. 😬 🤣The color on the last plot changed between clusters. Is it due to mislabelin ...

3.8

Learners who completed this project within the Introduction to Data Science course rated it as follows:
Usefulness
4.2
Fun
3.9
Clarity
3.4