KMeans clustering searches for clusters of data within a dataset. It is an unsupervised learning model: in plot 1 below we can easily see the clusters by eye, but the data is unlabeled (we haven’t told KMeans which cluster each datapoint belongs to). Even so, as the plot at the bottom of the page shows, the clusters have been correctly identified.
I have added detailed explanations throughout the code in comments:
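A minimal sketch of such a fit, assuming scikit-learn's `KMeans` and synthetic two-column data. The column names `var1` and `var2`, the blob centres and the choice of four clusters are illustrative assumptions, not the original dataset:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic, unlabeled data: four blobs around assumed centres (illustrative only).
rng = np.random.default_rng(0)
centres = np.array([[2.0, 2.0], [2.0, 8.0], [8.0, 2.0], [8.0, 8.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])
df = pd.DataFrame(points, columns=["var1", "var2"])

# Fit KMeans. We supply only the number of clusters, never the labels.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
df["cluster"] = model.fit_predict(df[["var1", "var2"]])

# One row per cluster: the central point (centroid) in var1/var2 space.
centroids = pd.DataFrame(model.cluster_centers_, columns=["var1", "var2"])
print(centroids.round(2))
```

The cluster numbers that KMeans assigns are arbitrary, so cluster 0 in one run may be cluster 2 in another unless `random_state` is fixed.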
Using the below, we can also add names to our clusters. To do this, I find the central point of each cluster and divide the chart into four quadrants. If the central point falls in the top left quarter of the chart, I say that variable1 is above the mean of all var1 data points and variable2 is below the mean of all var2 data points. I also use Pandas to output the dataframe to CSV with the new cluster names included. We get something like the chart below (the legend refers to the colours of the central point dots).
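The naming step could be sketched roughly as follows. This is only an illustration: the small hand-written dataframe, the `cluster` column standing in for KMeans output, the `name_cluster` helper and the `clusters_named.csv` filename are all hypothetical. Each central point is compared with the overall mean of each variable, which avoids depending on which variable sits on which axis:

```python
import pandas as pd

# Hypothetical clustered data: 'cluster' stands in for labels produced by KMeans.
df = pd.DataFrame({
    "var1": [1.0, 1.5, 9.0, 8.5, 1.2, 0.8, 9.2, 8.8],
    "var2": [9.0, 8.5, 9.5, 9.0, 1.0, 1.5, 0.5, 1.0],
    "cluster": [0, 0, 1, 1, 2, 2, 3, 3],
})

# Central point of each cluster (mean of its members) and the overall means.
centroids = df.groupby("cluster")[["var1", "var2"]].mean()
mean_var1, mean_var2 = df["var1"].mean(), df["var2"].mean()

def name_cluster(centre):
    """Describe a central point relative to the overall mean of each variable."""
    side1 = "above" if centre["var1"] > mean_var1 else "below"
    side2 = "above" if centre["var2"] > mean_var2 else "below"
    return f"var1 {side1} mean, var2 {side2} mean"

# Map each cluster number to its descriptive name, then attach it to every row.
names = centroids.apply(name_cluster, axis=1)
df["cluster_name"] = df["cluster"].map(names)

# Write the dataframe, including the new names, to CSV.
df.to_csv("clusters_named.csv", index=False)
```

If two central points land in the same quadrant they receive the same name, so this scheme works best when the clusters are spread across all four quarters.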