Maciej Pacula

k-means clustering example (Python)

I had to illustrate the k-means algorithm for my thesis, but I could not find any existing examples that were both simple and looked good on paper. See below for Python code that does just what I wanted.

#!/usr/bin/env python3

# Adapted from http://hackmap.blogspot.com/2007/09/k-means-clustering-in-scipy.html

import numpy
import matplotlib
matplotlib.use('Agg')  # render straight to a file; no display needed
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans2

# generate 3 sets of normally distributed points around
# different means with different variances
pt1 = numpy.random.normal(1, 0.2, (100, 2))
pt2 = numpy.random.normal(2, 0.5, (300, 2))
pt3 = numpy.random.normal(3, 0.3, (100, 2))

# slightly move sets 2 and 3 (for a prettier output)
pt2[:, 0] += 1
pt3[:, 0] -= 0.5

xy = numpy.concatenate((pt1, pt2, pt3))

# kmeans for 3 clusters: res holds the centroids, idx the cluster label of
# every point; xy is already an (N, 2) array, so it is passed in directly
res, idx = kmeans2(xy, 3)

# one color per cluster label
palette = [[0.4, 1, 0.4], [1, 0.4, 0.4], [0.1, 0.8, 1]]
colors = [palette[i] for i in idx]

# plot colored points
plt.scatter(xy[:, 0], xy[:, 1], c=colors)

# mark centroids as (X): a hollow circle with an x inside
plt.scatter(res[:, 0], res[:, 1], marker='o', s=500, linewidths=2,
            facecolors='none', edgecolors='k')
plt.scatter(res[:, 0], res[:, 1], marker='x', s=500, linewidths=2, c='k')

plt.savefig('/tmp/kmeans.png')

The output looks like this (also available in vector format here):

The X’s mark cluster centers. Feel free to use any of these files for whatever purposes. An attribution would be nice, but is not required :-).
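For readers who want to see what kmeans2 does between the raw points and the X’s, here is a minimal from-scratch sketch of the classic Lloyd iteration in plain NumPy. It is an illustration under stated assumptions (k distinct data points picked at random as the initial centroids, a fixed iteration cap, and an empty cluster keeping its previous centroid), not scipy's actual implementation; the function names lloyd_kmeans and nearest_centroid are hypothetical.

```python
import numpy


def nearest_centroid(points, centroids):
    # squared Euclidean distance from every point to every centroid, shape (n, k)
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd_kmeans(points, k, iterations=100, seed=0):
    """Lloyd's iteration (hypothetical helper): assign each point to its
    nearest centroid, then move each centroid to the mean of its points."""
    points = numpy.asarray(points, dtype=float)
    rng = numpy.random.default_rng(seed)
    # assumption: initialize with k distinct data points picked at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        labels = nearest_centroid(points, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)
        if numpy.allclose(new_centroids, centroids):
            break  # converged: reassignment no longer moves the centroids
        centroids = new_centroids
    # return labels that match the final centroids
    return centroids, nearest_centroid(points, centroids)
```

Calling lloyd_kmeans(xy, 3) on the xy array from the script should give clusters comparable to kmeans2(xy, 3), up to the order of the cluster labels and the luck of the initialization.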

Machine Learning, Programming, Python

April 27th, 2011

21 responses


Seth Brown

Thanks for posting your k-means example. I was having some trouble and I couldn’t find any examples until I stumbled onto your implementation. Thank you!

July 4, 2011 10:34 pm

Maciej Pacula

You’re welcome, Seth! Glad I could help.

July 6, 2011 7:09 pm

Joaquín

Maciej! Thank you very much, I have been looking for an example like this for a while. This helped a lot! 🙂
And by the way, the graphic rocks! Matplotlib gave me a big headache in the past.

December 11, 2011 7:18 pm

Maciej Pacula

You are most welcome 🙂

December 14, 2011 11:00 pm

Nizam

Thanks man!

March 14, 2012 8:21 am

Khan

Maciej,

Can I apply it to 1D data? I have 1D data, but this code does not work on it (obviously because it indexes columns). What changes would you recommend to make this script work for 1D data too?

December 18, 2012 6:11 pm
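For readers with the same question, one possible approach (an editorial sketch, not a reply from the thread) is to reshape the 1D values into a single-column (N, 1) array, which kmeans2 treats as N observations of one feature; scipy's documentation also lists a plain length-N array as valid input. The helper name kmeans_1d is hypothetical, and any extra keyword arguments (for example minit) are forwarded to kmeans2 unchanged.

```python
import numpy
from scipy.cluster.vq import kmeans2


def kmeans_1d(values, k, **kwargs):
    """Hypothetical helper: run kmeans2 on 1D data as N one-feature observations."""
    # reshape the flat sequence into an (N, 1) column: one row per observation
    column = numpy.asarray(values, dtype=float).reshape(-1, 1)
    centroids, labels = kmeans2(column, k, **kwargs)
    # centroids come back with shape (k, 1); flatten them back to 1D
    return centroids.ravel(), labels
```

With the points from the script, kmeans_1d(xy[:, 0], 3) would cluster on the x-coordinate alone; to plot the result, the values could be scattered against their index or a constant y.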

Alex Guerra

Thanks for the post, Maciej. I have used kmeans to identify clusters (rings) in a matrix of sea surface height. The objective is to identify the rings and determine their centroids. But kmeans, like kmeans2, requires the number of clusters to find as an input parameter. That is a problem because I usually do not know in advance how many rings will be present in the area. So I was wondering how to get around this limitation of kmeans. Do you have any ideas?
Regards,
Alex

March 15, 2013 9:10 am

Maciej Pacula

Hi Alex,

This may help: http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set

Maciej

March 19, 2013 4:58 pm
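One of the approaches on that page is the "elbow" heuristic: run k-means for a range of k, look at how the distortion falls, and pick the k after which it stops dropping sharply. Below is a minimal sketch using scipy's kmeans, which returns a (codebook, distortion) pair where the distortion is the mean non-squared Euclidean distance from each observation to its nearest centroid. The helper name distortion_curve and the max_k cutoff are assumptions for illustration.

```python
import numpy
from scipy.cluster.vq import kmeans


def distortion_curve(obs, max_k):
    """Hypothetical helper: k-means distortion for k = 1..max_k."""
    obs = numpy.asarray(obs, dtype=float)
    curve = []
    for k in range(1, max_k + 1):
        # kmeans keeps the best of several random restarts and returns
        # (codebook, distortion); only the distortion is needed here
        _, distortion = kmeans(obs, k)
        curve.append(float(distortion))
    return curve
```

The distortion generally keeps shrinking as k grows, so the bend in the curve is a judgment call rather than a hard rule; the linked Wikipedia page describes other criteria as well.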

sukard

Hello, thanks for the info.
Is it possible to use scipy's k-means for capacitated k-means?
If so, how?
Thanks.

July 29, 2013 6:52 pm

BBB

Hi Maciej,

Thank you so much for your post! It was extremely useful.
However, I might need your help. I’m working on a raw dataset of crimes in the city of Chicago, and I’m trying to cluster them by the type of crime committed using k-means. But I’m struggling to define the clusters and especially to write the code for them. Any chance I might get your help?

Thank you

February 24, 2016 10:44 am

Sidney Leal

Many thanks!!

September 29, 2016 2:24 pm

Rishab

Sir,
Kindly explain the code in detail.

September 25, 2018 8:30 am

Pamela

Hi, thank you. I am using your graphic for a blog post on ML/DL, and I wanted a clear demonstration of k-means in practice. I am crediting you by name and linking to your blog. Thanks 🙂

January 29, 2019 12:35 pm
