partitions a matrix of dissimilarities into a given number of clusters, identifying an upper bound by biased-sampling complete link cluster analysis, and then using a branch-and-bound algorithm to minimize the 'partition diameter', that is, the maximum of the cluster 'diameters' (the maximum pairwise dissimilarity values within each group). The same process is used in CONPAR when partitioning the matrix of dissimilarities between the original set of subject matrices.
Data: 2-way, 1-mode dis/similarities
Clustering is
often used as an alternative or as a supplementary technique to the basic model
of MDS and takes the same form of data. In contrast to The
algorithm
The object is to
develop a partition of a matrix of pairwise dissimilarity measures
matrix into a series of clusters, each containing at least one item.
Clusters are to be mutually exclusive, and exhaustive, in that all items
are assigned to a cluster. A commonly-used criterion in clustering is to
minimize the within-cluster sum of pairwise dissimilarities, but this has a
tendency to produce clusters of approximately the same size, irrespective of the
data. For this reason, the enhanced branch and bound algorithm employed by
BBDIAM seeks instead to minimize the partition diameter, which
is related to Johnson's diameter method for hierarchical clustering. The
diameter of a cluster is the maximum pairwise dissimilarity index among objects
in that cluster. The partition diameter is defined as the maximum of the cluster
diameters. To minimize the diameter of the partition is to minimize the maximum
dissimilarity index accross all subsets. An advantage of the partition diameter
index is that it is not predisposed to produce clusters of particular
sizes. It is also computationally simpler than minimizing the
within-cluster sum of dissimilarities. Guaranteed minimum-diameter solutions can
be obtained using branch-and-bound methods comparable to those used for
colouring the nodes of a graph.
The efficiency
of the branch-and-bound algorithm in minimizing the partition
diameter depends on the quality of the upper bound. A good upper bound can
frequently be established heuristically using a complete-link clustering
algorithm, as in HICLUS Method(1).
An algorithm for partitioning then applies two kinds of
local-search operations - trial movement of each object from its current
subset to each of the other subsets, and pairwise interchange of objects w.r.t.
their subset memberships. These local-search operations are conducted until
no further improvement is possible in the upper bound.
Because the
resulting solutions may be sensitive to the initial
partition, BBDIAM applies a probabilistic process that uses
100 replications and selects linkages using biased sampling. In particular,
when considering the next linkage, the algorithm has a 50% chance of
selecting the best linkage and a 50% chance of selecting the second best
linkage. This biased-sampling version often produces better bounds than the
deterministic version.
Using
BBDIAM
BBDIAM expects
data in the form of a lower triangle or a full symmetric matrix of
(dis)similarity measures between a set of objects (stimuli). Any of the types of
data suitable for input to MINISSA are suitable. Note that data values must be
non-negative.
BBDIAM
will produce the series of partitions defined by the
CLUSTERS command, which may specify a list and/or a range of
numbers of partitions, for example 2,5,8 or
2 TO 5 or 2, 4 TO 6.
Specifying CLUSTERS to produce full enumeration of all
possible combinations is not recommended for matrices with many more than
15 variables because of the time this may take to
compute. Occasionally BBDIAM appears to be unable to reach an optimal
partition into a particular number of clusters. In this case, you should halt
the process and try again, requesting a different number of clusters.
Since the
method does not develop clusters hierarchically it is not
possible to display them as a dendrogram, as in Output
The
output simply lists the partitions requested, according to the members allocated
to the corresponding numbers of groups. Giving names to the variables or stimuli
when entering the matrix using the input Wizard, or use of the
LABELS command in the input, considerably clarifies the cluster
contents listed. INPUT
COMMANDS PARAMETERS Note: N
OF STIMULI may be replaced with N OF POINTS for BBDIAM.
Apart from
PRINT DATA, there are
no PRINT or PLOT options for
BBDIAM. PROGRAM
LIMITS See
also The NewMDSX commands in
full
Transform: Nonmetric, invariant to monotone
transformations
Model: Minimizing
partition diameter for a specified number of groups
Keyword Function
N
OF
STIMULI [number] Number
of stimuli in the
analysis
LABELS [followed
by a series Optionally
identify the
stimuli.
of
labels (<= 65 chars There should be
as many
labels
each
on a separate line] as there are stimuli.
CLUSTERS
[number]
The number of clusters to
be
[number
list] identified
by partitioning
the
[number] TO [number] input
matrix
READ MATRIX Start
reading the input matrix
Keyword
Default
Function
DATA
TYPE 1 0: Lower-triangle
matrix
of similarities
(high values mean high
similarities
between
points).
1: Lower-triangle
matrix of
dissimilarities
(high values mean high
dissimilarities
between
points).
2: Full-symmetric matrix
of similarities
(high values mean high
similarities
between
points).
3: Full-symmetric
matrix of
dissimilarities
(high values mean high
dissimilarities
between points).
Maximum no. stimuli = 300
Maximum no. of clusters =
300