BBDIAM : Branch-and-Bound clustering

BBDIAM 

partitions a matrix of dissimilarities into a given number of clusters, identifying an upper bound by biased-sampling complete link cluster analysis, and then using a branch-and-bound algorithm to minimize the 'partition diameter', that is, the maximum of the cluster 'diameters' (the maximum pairwise dissimilarity values within each group).  The same process is used in CONPAR when partitioning the matrix of dissimilarities between the original set of subject matrices.

Data:  2-way, 1-mode dis/similarities                  
Transform:  Nonmetric, invariant to monotone transformations            
Model:  Minimizing partition diameter for a specified number of groups                         

Clustering is often used as an alternative or as a supplementary technique to the basic model of MDS and takes the same form of data. In contrast to HICLUS, BBDIAM uses an efficient new algorithm (Brusco, 2003) to partition the input matrix into the number of clusters specified by the user. The process can be repeated in a single run for several different numbers of clusters. The content of  the various partitions produced by BBDIAM may usefully be compared with the results of applying MINISSA and HICLUS first to the same matrix. Given the number of ties that can occur in minimum diameter partitioning, it is likely that there are many alternative optima in large matrices which can cause difficulties for this algorithm.

The algorithm

The object is to develop a partition of a matrix of pairwise dissimilarity measures matrix into a series of clusters, each containing at least one item. Clusters are to be mutually exclusive, and exhaustive, in that all items are assigned to a cluster. A commonly-used criterion in clustering is to minimize the within-cluster sum of pairwise dissimilarities, but this has a tendency to produce clusters of approximately the same size, irrespective of the data. For this reason, the enhanced branch and bound algorithm employed by BBDIAM seeks instead to minimize the partition diameter, which is related to Johnson's diameter method for hierarchical clustering. The diameter of a cluster is the maximum pairwise dissimilarity index among objects in that cluster. The partition diameter is defined as the maximum of the cluster diameters. To minimize the diameter of the partition is to minimize the maximum dissimilarity index accross all subsets. An advantage of the partition diameter index is that it is not predisposed to produce clusters of particular sizes. It is also computationally simpler than minimizing the within-cluster sum of dissimilarities. Guaranteed minimum-diameter solutions can be obtained using branch-and-bound methods comparable to those used for colouring the nodes of a graph.

The efficiency of the branch-and-bound algorithm in minimizing the partition diameter depends on the quality of the upper bound. A good upper bound can frequently be established heuristically using a complete-link clustering algorithm, as in HICLUS Method(1). An algorithm for partitioning then applies two kinds of local-search operations - trial movement of each object from its current subset to each of the other subsets, and pairwise interchange of objects w.r.t. their subset memberships. These local-search operations are conducted until no further improvement is possible in the upper bound.

 

Because the resulting solutions may be sensitive to the initial partition,  BBDIAM applies a probabilistic process that uses 100 replications and selects linkages using biased sampling. In particular, when considering the next linkage, the algorithm has a 50% chance of selecting the best linkage and a 50% chance of selecting the second best linkage. This biased-sampling version often produces better bounds than the deterministic version.

Using BBDIAM

BBDIAM expects data in the form of a lower triangle or a full symmetric matrix of (dis)similarity measures between a set of objects (stimuli). Any of the types of data suitable for input to MINISSA are suitable. Note that data values must be non-negative.

BBDIAM will produce the series of partitions defined by the CLUSTERS command, which may specify a list and/or a range of numbers of partitions, for example  2,5,8  or 2 TO 5  or  2, 4 TO 6.  Specifying CLUSTERS to produce full enumeration of all possible combinations is not recommended for matrices with many more than 15 variables because of the time this may take to compute. Occasionally BBDIAM appears to be unable to reach an optimal partition into a particular number of clusters. In this case, you should halt the process and try again, requesting a different number of clusters.

Since the method does not develop clusters hierarchically it is not possible to display them as a dendrogram, as in HICLUS. The content of  the various partitions produced by BBDIAM may be identifiable, however, in the graphic display of the results of applying MINISSA to the same matrix.

Output

The output simply lists the partitions requested, according to the members allocated to the corresponding numbers of groups. Giving names to the variables or stimuli when entering the matrix using the input Wizard, or use of the LABELS command in the input, considerably clarifies the cluster contents listed.

INPUT COMMANDS
Keyword                                                             Function
N OF STIMULI    [number]                          Number of stimuli in the analysis
LABELS             [followed by a series          Optionally identify the stimuli.
                        of labels (<= 65 chars       There should be as many labels
                        each on a separate line]     as there are stimuli.
CLUSTERS        [number]                          The number of clusters to be
                       [number list]                      identified by partitioning the
                    [number] TO [number]          input matrix
READ MATRIX                                           Start reading the input matrix

PARAMETERS
Keyword            Default                                      Function
DATA TYPE            1                          0: Lower-triangle matrix of similarities
                                                            (high values mean high similarities
                                                             between points).
                                                        1: Lower-triangle matrix of dissimilarities
                                                             (high values mean high dissimilarities
                                                             between points).
                                                        2: Full-symmetric matrix of similarities
                                                             (high values mean high similarities
                                                              between points).
                                                        3: Full-symmetric matrix of dissimilarities
                                                           (high values mean high dissimilarities
                                                           between points).

Note: N OF STIMULI may be replaced with N OF POINTS for BBDIAM.

Apart from PRINT DATA,  there are no PRINT or PLOT options for BBDIAM.

PROGRAM LIMITS
Maximum no. stimuli = 300
Maximum no. of clusters = 300

See also The NewMDSX commands in full