Data Mining stories: bisecting k-Means

Monday, August 6, 2012

Bisecting k-Means

Bisecting k-Means is like a combination of k-Means and hierarchical clustering.

It starts with all objects in a single cluster.

The pseudocode of the algorithm is displayed below:

Basic Bisecting K-means Algorithm for finding K clusters

Pick a cluster to split.

Find 2 sub-clusters using the basic k-Means algorithm (Bisecting step)

Repeat step 2, the bisecting step, for ITER times and take the split that produces the clustering with the highest overall similarity.

Repeat steps 1, 2 and 3 until the desired number of clusters is reached.

The critical part is which cluster to choose for splitting. And there are different ways to proceed, for example, you can choose the biggest cluster or the cluster with the worst quality or a combination of both.

Source: "A comparison of document clustering techniques", M. Steinbach, G. Karypis and V. Kumar. Workshop on Text Mining, KDD, 2000. [pdf]