MINI-Smallest-Space-Analysis (Nijmegen version) : MINISSA-N

MINISSA performs the basic model of non-metric MDS. It takes data in the form of a full square symmetric matrix of (dis)similarities (or its lower triangle), whose elements are transformed to give the distances of the solution. This transformation preserves the rank order of the input data.

Data:   2-way, 1-mode dis/similarities                 
Transform:  Monotonic                        
Model:  (Euclidean and other Minkowski) Distance

Output from MINISSA may in turn be used as input for PINDIS.

The aim of the algorithm is to position the elements as points in a space of minimum user-specified dimensionality so that a measure of departure from perfect fit between the (monotonically) rescaled data and the distances of the solution (STRESS1) is minimised. Perfect fit occurs if a monotone transformation of the dissimilarity data can be found which forms a set of actual distances.
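The measure of fit described above can be illustrated with a short sketch. It assumes that STRESS1 denotes Kruskal's stress formula 1, the square root of the sum of squared differences between distances and fitting values divided by the sum of squared distances; this page does not spell the formula out. The function name and the example numbers are hypothetical.

```python
import math

def stress1(distances, disparities):
    """Kruskal's stress formula 1 (assumed here to be what STRESS1 denotes)."""
    num = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)

# Hypothetical solution distances for the 6 pairs of 4 stimuli,
# listed in the rank order of the dissimilarities.
d = [1.0, 2.0, 1.5, 3.0, 4.0, 5.0]
# Monotone fitting values: the out-of-order pair (2.0, 1.5) is replaced
# by its mean, every other distance is already in order.
dhat = [1.0, 1.75, 1.75, 3.0, 4.0, 5.0]

print(stress1(d, dhat))   # small positive value: near-perfect fit
print(stress1(d, d))      # 0.0: distances already monotone with the data
```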

OPTIONS IN MINISSA

1. Ties in the data
It is possible to treat ties in the data in two ways when calculating STRESS, selected by the keyword TIES in the PARAMETERS command. The primary approach TIES (1) allows that if two data elements are equal, the assigned fitting values may be unequal, if in so doing STRESS is reduced. The secondary approach TIES (2) requires that the fitting values be equal for equal data. This constraint is more stringent, but is recommended when there are relatively few distinct values in the data. Choice of the secondary approach to ties will normally result in higher STRESS values.
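The difference between the two approaches can be sketched with a small monotone regression. This is an illustration under stated assumptions, not the program's own code: it uses the pool-adjacent-violators method, treats the data as dissimilarities, and the names pava and disparities are hypothetical.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators: weighted least-squares non-decreasing fit."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block holds [mean, total weight, number of inputs]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

def disparities(data, dist, ties=1):
    """Fitting values for dissimilarities `data` and distances `dist`."""
    n = len(data)
    if ties == 1:
        # Primary: tied data are ordered by their distances, so they never
        # violate monotonicity among themselves and may get unequal fits.
        order = sorted(range(n), key=lambda i: (data[i], dist[i]))
        fit_sorted = pava([dist[i] for i in order])
    else:
        # Secondary: each group of tied data is first replaced by its mean
        # distance (weighted by group size), so tied data share one value.
        order = sorted(range(n), key=lambda i: data[i])
        means, sizes = [], []
        i = 0
        while i < n:
            j = i
            while j < n and data[order[j]] == data[order[i]]:
                j += 1
            group = [dist[order[k]] for k in range(i, j)]
            means.append(sum(group) / len(group))
            sizes.append(len(group))
            i = j
        group_fit = pava(means, sizes)
        fit_sorted = [f for f, s in zip(group_fit, sizes) for _ in range(s)]
    fit = [0.0] * n
    for pos, idx in enumerate(order):
        fit[idx] = fit_sorted[pos]
    return fit

data = [1, 2, 2, 3]             # two tied dissimilarities
dist = [1.0, 3.0, 2.0, 2.5]     # hypothetical solution distances
print(disparities(data, dist, ties=1))  # [1.0, 2.75, 2.0, 2.75]
print(disparities(data, dist, ties=2))  # [1.0, 2.5, 2.5, 2.5]
```

In this example the squared residuals sum to 0.125 under the primary approach and 0.5 under the secondary approach, in line with the remark that the secondary approach normally gives higher STRESS.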

2. EPSILON
A further approach to tied data uses EPSILON in the PARAMETERS command. If the difference between any pair of values is less than the value specified, they will be regarded as tied. This approach is recommended if the user wishes to place little emphasis on smaller variations in the data.

3. The initial configuration
The benefits of a 'good' starting point for the iterative process include savings in machine time and avoidance of local minima. The program will generate a starting configuration with desirable numerical properties, using only the ordinal properties of the data. This has been found particularly useful in avoiding problems with local minima and implements the recommendations of Lingoes and Roskam (1973).
Alternatively, the user may supply a starting configuration. The matrix of coordinates is preceded by READ CONFIG, which may have an associated INPUT FORMAT statement to read real (F-type) values. The configuration may be input either as stimuli (rows) by dimensions (columns) or as dimensions (rows) by stimuli (columns). (In the latter case, the parameter MATFORM should be given the value (1) in the PARAMETERS command.)

4. Distances in the configuration
The user may choose how the distances between the points in the configuration are to be computed using the MINKOWSKI parameter. The default value of 2.0 applies the Euclidean metric, and 1.0 the 'city-block' metric, but any positive number may be used. It is however unwise to use large values as there is a risk of overflow.
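As an illustration of what the MINKOWSKI parameter controls, the sketch below computes the Minkowski distance between two points for a given exponent r. The function name is hypothetical and the code is not taken from the program.

```python
def minkowski_distance(x, y, r=2.0):
    """Minkowski distance between points x and y with exponent r > 0."""
    if r <= 0:
        raise ValueError("the Minkowski exponent must be positive")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski_distance(a, b, 2.0))  # Euclidean: 5.0
print(minkowski_distance(a, b, 1.0))  # city-block: 7.0
```

Because each coordinate difference is raised to the power r, large exponents can produce very large intermediate values, which is the overflow risk mentioned above.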

5. Dimensionality
As a general rule, solutions should be computed in several dimensionalities; the program always starts with the highest dimensionality and steps down to the lowest. Since a perfect fit will be obtained in p-2 dimensions (p = no. of stimuli), the trial dimensionalities should always be less than p-3. As a practical guide to the choice of trial dimensionalities, it is recommended that the data compression ratio (defined by Young as the number of input data elements divided by the product of stimuli x dimensions) should be greater than 2.
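The compression-ratio guideline can be checked before a run with a few lines of arithmetic. The sketch below makes two assumptions: the ratio is read as the number of input data elements divided by stimuli x dimensions, and the data elements are the p(p-1)/2 off-diagonal entries of the lower triangle. The function names are hypothetical; the cap of 8 dimensions is the program limit stated on this page.

```python
def compression_ratio(n_stimuli, n_dims):
    """Data elements per coordinate to be estimated (assumed definition)."""
    n_data = n_stimuli * (n_stimuli - 1) // 2   # off-diagonal lower triangle
    n_coords = n_stimuli * n_dims               # stimuli x dimensions
    return n_data / n_coords

def admissible_dimensions(n_stimuli, max_dims=8, threshold=2.0):
    """Dimensionalities whose compression ratio exceeds the threshold."""
    return [r for r in range(1, max_dims + 1)
            if compression_ratio(n_stimuli, r) > threshold]

print(compression_ratio(20, 4))     # 2.375
print(admissible_dimensions(20))    # [1, 2, 3, 4]
```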

INPUT COMMANDS

Keyword                          Function
N OF STIMULI [number]            Number of stimuli in the analysis

DIMENSIONS [number]              Dimensions for analysis.
           [number list]         Analyses are always performed from the
           [number] TO [number]  highest to the lowest dimensionality.

LABELS                           Optionally used to identify the stimuli.
  [followed by a series of       There should be as many labels as
  labels (<= 65 chars), each     stimuli.
  on a separate line]

READ MATRIX                      Start reading input data matrix
READ CONFIG                      Optional - followed by a starting
                                 configuration
COMPUTE                          Start computation
FINISH                           Final statement in the run

PARAMETERS
Keyword             Default  Function
DATA TYPE           0        0: Lower-triangle matrix of similarities
                                (high values mean high similarities
                                between points).
                             1: Lower-triangle matrix of dissimilarities
                                (high values mean high dissimilarities
                                between points).
                             2: Full-symmetric matrix of similarities
                                (high values mean high similarities
                                between points).
                             3: Full-symmetric matrix of dissimilarities
                                (high values mean high dissimilarities
                                between points).

MINIMUM ITERATIONS  6        Sets the minimum number of iterations to
                             be performed before the convergence test.

EPSILON             0.0      Data are to be considered tied if the
                             difference between them is less than
                             EPSILON.

MATFORM             0        (Relevant only when 'READ CONFIG' is used.)
                             0: The input configuration is stimuli (rows)
                                by dimensions (columns).
                             1: The input configuration is dimensions
                                (rows) by stimuli (columns).

TIES                1        1: Primary approach to ties in the data.
                             2: Secondary approach to ties in the data.

MINKOWSKI           2.0      1.0: Distances in the configuration are
                                  measured by 'city-block' metric.
                             2.0: Distances are measured by Euclidean
                                  metric.
                             Any positive number may be used here.

NOTES
1. N OF STIMULI may be replaced by N OF POINTS.
2. N OF SUBJECTS is not valid.
3. The program expects free-format real (F-type) numbers separated by one or more spaces. The INPUT FORMAT, if used, should read the longest row of the matrix (i.e. n-1 values for a lower-triangle matrix without diagonal, or n values for a full symmetric matrix, when there are n stimuli).
4. Similarity/dissimilarity values should preferably not be negative. If they are, a metric equivalent model (e.g. MRSCAL) should be used, or the values should be converted into distances by sqrt(1-r) (if they are product-moment correlations), or made non-negative by adding the absolute value of the largest negative value to all entries.
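The two data-preparation options mentioned in the note on negative values can be sketched as follows. Both helpers are hypothetical and not part of the program. They work on a matrix held as a list of rows, so a lower triangle with rows of unequal length is also accepted. Note that the shift is applied to every entry supplied, including any diagonal values.

```python
import math

def correlations_to_distances(r_matrix):
    """Convert product-moment correlations r into distances sqrt(1 - r)."""
    # max(0.0, ...) guards against tiny negative values from rounding when
    # r is marginally above 1.
    return [[math.sqrt(max(0.0, 1.0 - r)) for r in row] for row in r_matrix]

def shift_non_negative(matrix):
    """Add the absolute value of the most negative entry to all entries."""
    lowest = min(min(row) for row in matrix)
    offset = -lowest if lowest < 0 else 0.0
    return [[v + offset for v in row] for row in matrix]

# Lower triangle (without diagonal) of hypothetical correlations, 3 stimuli.
corr = [[0.0], [-1.0, 1.0]]
print(correlations_to_distances(corr))

# Lower triangle of hypothetical values containing negatives.
print(shift_non_negative([[-2.0], [1.0, 3.0]]))   # [[0.0], [3.0, 5.0]]
```

After converting correlations with sqrt(1-r), the values are dissimilarities, so DATA TYPE would be 1 or 3 rather than the default 0.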

PRINT options (to main output file)
Option      Form                Description
INITIAL     p x r matrix        Initial configuration, either generated by
                                the program or printed from the
                                user-provided configuration
                                (p = no. of stimuli, r = no. of
                                dimensions).
FINAL       p x r matrix        Final configuration, rotated to principal
                                components.
DISTANCES   lower triangular,   Solution distances between points,
            with diagonal       calculated according to MINKOWSKI
                                parameter.
FITTING     lower triangular,   Fitting values: the disparities
            with diagonal       (DHAT) values.
RESIDUALS   lower triangular,   The difference between the distances
            with diagonal       and the disparities.
HISTORY                         An iteration by iteration history
                                of STRESS and values.

By default only the final configuration and the final STRESS values are output.

For matrices of between 10 and 60 stimuli, the output also reports Spence’s approximation for Kruskal’s Stress based on random rankings (Spence, 1979). The standard deviation associated with the values produced by this approximation can be taken as 0.01, for those inclined to construct a formal test for the presence of random data. If the obtained value is, say, only a third or a half as large as the approximation, one can be fairly sure that the data are good.

PLOT options (to main output file)
Option                      Description
INITIAL              Up to r(r-1)/2 plots of the
                         initial configuration. (r = no. of
                         dimensions).
FINAL                 Up to r(r-1)/2 plots of final
                         configuration (r = no. of dimensions).
SHEPARD          The Shepard diagram of distances
                         plotted against data. Fitted values
                         are shown by *, actual data/distance
                         pairs by 0.
STRESS              Plot of STRESS values by iteration,
                         with a summary plot of stress
                         by dimensions.
POINT                Histogram of point contributions to
                         STRESS in descending order.
RESIDUALS        Histogram of residual values.

By default, the Shepard diagram and the final configuration will be plotted.
Configuration plots are calibrated both from 0 to 100 and from 0 to the
maximum coordinate value.

By default, no secondary output file is produced.

PROGRAM LIMITS
Maximum no. of stimuli = 300
Maximum no. of dimensions = 8

See also

  • The NewMDSX commands in full