HICLUS : HIerarchical CLUStering

HICLUS provides analysis of (dis)similarity data by means of a hierarchical (agglomerative) clustering scheme, whose results are ordinally invariant.

 


Data:       2-way, 1-mode non-negative (dis)similarities
Transform:  Monotonic
Model:      Ultra-metric distance

The method of hierarchical clustering implemented in HICLUS is often used as an alternative or as a supplementary technique to the basic model of MDS and takes the same form of data.

The matrix of (dis)similarities between a set of objects is used to define a set of non-overlapping clusters such that the more similar objects are joined together before less similar objects. The scheme consists of a series of clusterings (levels), each of which is a partition of the objects. At the initial level each object forms its own cluster, whilst at the highest level all the objects form a single cluster. In a hierarchical clustering scheme for p objects there are exactly (p-1) levels. Each intermediate level joins points or clusters present at the lower (finer) level. The clustering scheme is hierarchical in the sense that once two objects have been joined together at a lower level of the scheme, they may not be separated at a higher level. So the clusters at each level include those of the levels below it.

HICLUS expects data in the form of a lower triangle or a full symmetric matrix of (dis)similarity measures between a set of objects (stimuli). Any of the types of data suitable for input to MINISSA are suitable. Note that data values must be non-negative.

HICLUS implements Johnson's (1967) Hierarchical Clustering Schemes. If data conform exactly to the ultra-metric inequality, which defines a hierarchical clustering scheme, then there is no ambiguity in defining the distance between a cluster and a new point. However, most data do not conform perfectly, and the problem then becomes one of defining that distance unambiguously: a number of options (mean, median, mode) are possible, and different options will generally produce different clusterings. Johnson therefore proposes using the two extreme possibilities (the minimum and the maximum) as solutions, thus alerting the user to the full range of possible solutions.
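The ultra-metric inequality requires that, for any three objects x, y and z, d(x,z) <= max[d(x,y), d(y,z)]. The following Python fragment is a minimal sketch of how a full symmetric dissimilarity matrix could be checked against it. It is not part of HICLUS; the function name is_ultrametric and both small matrices are hypothetical and are used for illustration only.

```python
from itertools import permutations

def is_ultrametric(d, tol=1e-12):
    """Return True if the full symmetric dissimilarity matrix d satisfies
    d[i][k] <= max(d[i][j], d[j][k]) for every triple of distinct objects."""
    n = len(d)
    return all(
        d[i][k] <= max(d[i][j], d[j][k]) + tol
        for i, j, k in permutations(range(n), 3)
    )

# Hypothetical 3-object examples (not taken from the HICLUS documentation).
exact = [[0, 1, 3],
         [1, 0, 3],
         [3, 3, 0]]      # conforms: the two largest values in the triple are equal
inexact = [[0, 1, 3],
           [1, 0, 2],
           [3, 2, 0]]    # violates: 3 > max(1, 2)

print(is_ultrametric(exact), is_ultrametric(inexact))
```

For the second matrix the distance from a cluster {0, 1} to point 2 could be taken as 2, as 3, or as some value in between, which is the ambiguity the minimum and maximum methods resolve.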

The "minimum" method
Also known as the "connectedness" or "single-link" method, this approach defines the dissimilarity between a point and a cluster as the smallest of the dissimilarities between the external point and the constituent points in the cluster. This method tends to join single points to existing clusters, and the schemes resulting from it are often not easily amenable to substantive interpretation. The "level" value in this approach gives the length of the longest chain joining any two points in the cluster. The approach is chosen by specifying METHODS(1) in the PARAMETERS statement.

The "maximum" method
Also known as the "diameter" or "complete-link" method, this approach defines the dissimilarity between a point and a cluster to be the largest of the dissimilarities between it and the points constituting the cluster. In this case the "level" gives the size of the diameter of the largest cluster at that level. This method is chosen by specifying METHODS(2) in the PARAMETERS statement.

The default option METHODS(3) allows for both methods to be used sequentially.
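The two methods can be sketched in Python as follows. This is an illustration of the general single-link and complete-link procedure, not the HICLUS source code. It assumes dissimilarities in a full symmetric matrix, and it breaks ties by taking the first pair found (how HICLUS treats ties is not stated here). The function name cluster and the 4-object matrix d are hypothetical.

```python
def cluster(d, method="min"):
    """Agglomerative clustering of a full symmetric dissimilarity matrix.

    method="min" takes the smallest dissimilarity between two clusters
    (single link); method="max" takes the largest (complete link).
    Returns a list of (level, merged_cluster) pairs, one per merge.
    """
    link = min if method == "min" else max
    clusters = [frozenset([i]) for i in range(len(d))]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage value.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                value = link(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or value < best[0]:
                    best = (value, a, b)
        value, a, b = best
        merged = clusters[a] | clusters[b]
        # Replace the two joined clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        history.append((value, sorted(merged)))
    return history

# Hypothetical dissimilarities between 4 objects (illustration only).
d = [[0, 1, 4, 5],
     [1, 0, 2, 6],
     [4, 2, 0, 3],
     [5, 6, 3, 0]]

for method in ("min", "max"):
    print(method, cluster(d, method))
```

With these data the minimum method chains single points onto the first cluster, joining {0,1} at level 1, then {0,1,2} at level 2, then all four objects at level 3. The maximum method joins {0,1} at level 1, {2,3} at level 3, and all four objects at level 6. Replacing each value by its square changes the level values but not the clusters formed, which illustrates the ordinal invariance of the results.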

HICLUS is not a dimensional method; the solutions are presented as a dendrogram or hierarchical tree.

The method used to obtain a solution is not an iterative method, and no fit measure is produced.

See also: Displaying dendrograms
          The NewMDSX commands in full

INPUT COMMANDS
Keyword                                        Function
N OF STIMULI    [number]                       Number of stimuli in the analysis
LABELS          [followed by a series of       Optionally identify the stimuli.
                labels (<= 65 chars), each     There should be as many labels
                on a separate line]            as there are stimuli.
READ MATRIX                                    Start reading input data

PARAMETERS
Keyword      Default   Function
DATA TYPE    0         0: Lower-triangle matrix of similarities
                          (high values mean high similarities
                          between points).
                       1: Lower-triangle matrix of dissimilarities
                          (high values mean high dissimilarities
                          between points).
                       2: Full-symmetric matrix of similarities
                          (high values mean high similarities
                          between points).
                       3: Full-symmetric matrix of dissimilarities
                          (high values mean high dissimilarities
                          between points).
METHODS      3         1: Only the minimum method is used.
                       2: Only the maximum method is used.
                       3: Both methods are used.

The following statements are not valid with HICLUS:
N OF SUBJECTS
DIMENSIONS
ITERATIONS
PLOT
PUNCH

N OF STIMULI may be replaced by N OF POINTS.

The input should be specified as (non-negative) real numbers. When a lower-triangle matrix is used (DATA TYPE 0 or 1), it should be presented without the diagonal.
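As an illustration of the lower-triangle layout, the following Python fragment sketches how such a matrix (without diagonal) relates to the equivalent full symmetric matrix. It is not part of NewMDSX. The function name lower_triangle_to_full and the data values are hypothetical, and the zero diagonal is an assumption appropriate to dissimilarities.

```python
def lower_triangle_to_full(rows):
    """Expand a lower triangle without diagonal into a full symmetric matrix.

    rows[0] holds 1 value (object 2 with object 1), rows[1] holds 2 values
    (object 3 with objects 1 and 2), and so on.
    The diagonal is filled with 0.0 (an assumption suited to dissimilarities).
    """
    n = len(rows) + 1
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows, start=1):
        if len(row) != i:
            raise ValueError(f"row {i} should have {i} values, got {len(row)}")
        for j, value in enumerate(row):
            if value < 0:
                raise ValueError("data values must be non-negative")
            full[i][j] = full[j][i] = float(value)
    return full

# Hypothetical lower triangle for 4 stimuli (illustration only).
rows = [[1],
        [4, 2],
        [5, 6, 3]]
full = lower_triangle_to_full(rows)
for line in full:
    print(line)
```

A lower triangle for p stimuli therefore has p-1 rows and p(p-1)/2 values in total.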

PRINT option
Option           Description
HISTORY     A detailed history of the clustering is produced.

PROGRAM LIMIT
Maximum no. of stimuli = 300