The Isodata Algorithm

Let

x = the set of samples

f = the number of features

N = number of samples

 = the set of representatives

C = the number of clusters with representatives,

S (i) = indication of splitting of cluster

L(i) = indication of lumping of cluster

 

Initialize

K = number of cluster centers desired

M = the current number of clusters

 = the minimum cluster size

= the standard deviation for the splitting threshold

 = the splitting fraction [0,1]
= maximum number of samples in a cluster.
= lumping parameter (maximum allowable distance between clusters)
 = maximum number of pairs of clusters that can be lumped

 = tolerance
MAXITER = number of iterations allowed

S(i) = L(i) = 2, i = 1,N

t = 0

 

2.  Distribute the samples

if (t > MAXITER)

go to step 13

for all x, assign x to Cj where Cj = mink=1,m (x – Rk) (settle ties arbitrarily)

 

3. Remove small clusters

Any cluster with fewer than  samples is discarded with samples distributed as in 2.  Reduce M by 1 for each discarded.

 

4. Update representatives

Update the cluster representatives to the mean of the samples in the cluster.

5. Computer sample distances

Compute the average distance of each sample from the cluster center for its cluster.

            

6. Compute overall average distance

Computer the average distance of all samples from their respective cluster centers:

 

7. Determine need to split or lump

(a)  if  go to step 8.

(b)  if  and t odd, go to step 8

(c)    if  go to 10

(d)   if  and t even, go to step 10

8. Calculate parameters to split clusters

Find the standard deviation vectors for each feature in each cluster:

let  

9. Split clusters

if  do not split, otherwise

if 

split along the jth coordinate, where

recalculate the

C = C + 1

set S(t) = 1

else

if t > 1 AND L(t-1) == 0 AND C == 0, go to 12

if t > 1 AND L(t-1) == 0 AND C == 1, go to 2

if t > 1 AND L(t-1) <> 0, continue

if t==1, continue

10. Calculate the parameters for lumping

Set L(t) = 0

if c < 2, AND S(t) == 0 AND C == 0, go to step 12

if c < 2 AND S(t) == 0 AND C ==1, go to step 2

if c < 2 AND S(t) == 2, go to step 2

Calculate the distances between all cluster centers

 C

Sort the in ascending order and denote the number of distances that do not exceed . Then,

if k*==0 AND S(t) == 2, go to step 2

if k*==0 AND S(t) == 0, go to step 12

L(t) = 1

11. Lumping

for i=1,k*

lump i and j , where .  If for any pair, one or both have already been lumped, do not lump again.

Recalculate the ’s

Go to step 2

12. Output  and stop

13. Output “Number of iterations exceeded”,  and stop