The Isodata Algorithm
Let
x = the set of samples
f = the number of
features
N = number of samples
= the set of
representatives
C = the number of
clusters with representatives,
S (i) = indication of
splitting of cluster
L(i) = indication of
lumping of cluster
Initialize
K = number of cluster
centers desired
M = the current number
of clusters
= the minimum cluster
size
= the standard deviation for the splitting threshold
= the splitting
fraction [0,1]
= maximum number of samples in a cluster.
= lumping parameter (maximum allowable distance between
clusters)
= maximum number of
pairs of clusters that can be lumped
= tolerance
MAXITER = number of iterations allowed
S(i) = L(i) = 2, i = 1,N
t = 0
2. Distribute the samples
if (t > MAXITER)
go to step 13
for all x, assign x to Cj
where Cj = mink=1,m (x – Rk) (settle ties
arbitrarily)
3.
Remove small clusters
Any cluster with fewer
than samples is discarded with samples distributed as in 2. Reduce M by 1 for each discarded.
4.
Update representatives
Update the cluster
representatives to the mean of the samples in the cluster.
5.
Computer sample distances
Compute the average
distance of each sample from the cluster center for its cluster.
6.
Compute overall average distance
Computer the average
distance of all samples from their respective cluster centers:
7. Determine need to split or lump
(a) if go to step 8.
(b) if and t odd, go to step 8
(c) if go to 10
(d) if and t even, go to step 10
8. Calculate parameters to split clusters
Find the standard
deviation vectors for each feature in each cluster:
let
9. Split clusters
if do not split,
otherwise
if
split along the jth
coordinate, where
recalculate the
C = C + 1
set S(t) = 1
else
if t > 1 AND L(t-1)
== 0 AND C == 0, go to 12
if t > 1 AND L(t-1)
== 0 AND C == 1, go to 2
if t > 1 AND L(t-1)
<> 0, continue
if t==1, continue
10. Calculate the parameters for lumping
Set L(t) = 0
if c < 2, AND S(t) ==
0 AND C == 0, go to step 12
if c < 2 AND S(t) ==
0 AND C ==1, go to step 2
if c < 2 AND S(t) ==
2, go to step 2
Calculate the distances
between all cluster centers
C
Sort the in ascending order and denote the number of distances that do
not exceed
. Then,
if k*==0 AND S(t)
== 2, go to step 2
if k*==0 AND S(t)
== 0, go to step 12
L(t) = 1
11. Lumping
for i=1,k*
lump i and j
, where . If for any pair,
one or both have already been lumped, do not lump again.
Recalculate the ’s
Go to step 2
12. Output and stop
13. Output “Number of iterations exceeded”, and stop