Project
K-Means Clustering from Scratch
Medium
32 completions
~ 16 hours
3.8
Consolidate your knowledge of K-Means by building a working algorithm from scratch. Find appropriate positions for the cluster centers in the training loop, select an appropriate k, and see how well the whole thing performs.
Provided by
JetBrains Academy
About
In this project, you'll implement one of the simplest algorithms for clustering data: K-Means. You'll build it from scratch using only NumPy, with matplotlib for visualization. We'll stick to the Wine dataset, so at least it will not be boring!
Training project
This project allows you to practice and strengthen your coding skills, helping you get ready for more advanced tasks ahead.
What you'll learn
Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:
Search for the nearest cluster center for each object.
Find a new cluster center using the information from the previous step.
Implement the whole fit-predict class and make the algorithm work.
Try using your coded algorithm with different values of k to find an appropriate one.
Automate the process of finding an appropriate k by writing a function for that.
Finish the task by using the power of your code to predict clusters for each object of the dataset and compare them to the "real" clusters (classes).
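The first steps above can be sketched roughly as follows. This is a minimal illustration, not the project's reference solution: the names `nearest_center`, `update_centers`, and `SimpleKMeans`, the random-row initialization, the stopping tolerance, and the toy data are all assumptions made for the example. The Wine dataset would take the place of `X_toy` in the actual project.

```python
import numpy as np


def nearest_center(X, centers):
    """For each row of X, return the index of the closest center (Euclidean)."""
    # Broadcasting: (n, 1, d) - (1, k, d) -> (n, k, d); the norm is taken over features.
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def update_centers(X, labels, centers):
    """Move each center to the mean of its assigned points (empty clusters stay put)."""
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members) > 0:
            new_centers[j] = members.mean(axis=0)
    return new_centers


class SimpleKMeans:
    """Hypothetical fit-predict class; the parameter names are assumptions."""

    def __init__(self, k, max_iter=100, tol=1e-6, seed=0):
        self.k = k
        self.max_iter = max_iter
        self.tol = tol
        self.seed = seed

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.seed)
        # Assumption: start from k distinct rows of X chosen at random.
        start = rng.choice(len(X), size=self.k, replace=False)
        self.centers_ = X[start].copy()
        for _ in range(self.max_iter):
            labels = nearest_center(X, self.centers_)
            new_centers = update_centers(X, labels, self.centers_)
            shift = np.linalg.norm(new_centers - self.centers_)
            self.centers_ = new_centers
            if shift < self.tol:  # centers stopped moving
                break
        return self

    def predict(self, X):
        return nearest_center(np.asarray(X, dtype=float), self.centers_)

    def inertia(self, X):
        """Sum of squared distances from each point to its assigned center."""
        X = np.asarray(X, dtype=float)
        labels = self.predict(X)
        return float(((X - self.centers_[labels]) ** 2).sum())


# Toy data standing in for the real dataset: two small, well-separated groups.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)

model = SimpleKMeans(k=2).fit(X_toy)
labels = model.predict(X_toy)

# Inertia for several k; looking for the "elbow" in these values is one way to pick k.
inertias = {k: SimpleKMeans(k=k).fit(X_toy).inertia(X_toy) for k in (1, 2, 3)}
print(labels, inertias)
```

Note that the cluster indices returned by `predict` are arbitrary, so when comparing them with the "real" classes the same group may carry a different number (or plot color) from run to run.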
Reviews
5 months ago
Actually, quite a lot (although the project could be even bigger if you were to include more advanced topics).
I've learned about KMeans, the elbow rule, and silhouette scores. What seemed like an easy project turned out to be quite challenging. I kept struggling with the dimensions in NumPy and ended up failing over and over. 😬 🤣 The color on the last plot changed between clusters. Is it due to mislabelin ...
Learners who completed this project within the Introduction to Data Science course rated it as follows: 3.8