MINISSA fits the basic model of non-metric MDS. It takes data in the form of a
full square symmetric matrix (or its lower triangle) of (dis)similarities, whose
elements are transformed to give the distances of the solution. The
transformation preserves the rank order of the input data.
Data: 2-way, 1-mode dis/similarities
Transform: Monotonic
Model: (Euclidean and other Minkowski) Distance
Output from MINISSA may in turn be used as input for PINDIS.
The aim of the algorithm is to position the elements as points in a space of
minimum user-specified dimensionality so that STRESS1, a measure of departure
from perfect fit between the (monotonically) rescaled data and the distances of
the solution, is minimised. Perfect fit occurs if a monotone transformation of
the dissimilarity data can be found which forms a set of actual distances.
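As an illustration of the fit measure, the following minimal sketch computes Kruskal's STRESS1 from a set of solution distances and their fitting values (disparities). The function name `stress1` and the example numbers are hypothetical and are not part of MINISSA; the sketch only shows the standard formula, not the program's own code.

```python
import math

def stress1(distances, disparities):
    """Kruskal's STRESS1: sqrt(sum((d - dhat)^2) / sum(d^2))."""
    num = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)

# Solution distances, listed in order of increasing dissimilarity.
perfect = [1.0, 2.0, 3.0, 4.0]
# The distances are already monotone in the data, so the fitting
# values equal the distances and the fit is perfect.
print(stress1(perfect, perfect))

# One order violation: the fitting values average the offending pair.
violating = [1.0, 3.0, 2.0, 4.0]
fitted = [1.0, 2.5, 2.5, 4.0]
print(round(stress1(violating, fitted), 4))
```

A value of zero corresponds to the perfect fit described above; any violation of the rank order of the data yields a positive value.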
OPTIONS IN MINISSA
1. Ties in the data
It is possible to treat ties in the data in two ways when
calculating STRESS, selected by the keyword TIES in the PARAMETERS
command. The primary approach TIES (1) allows that if two data elements
are equal, the assigned fitting values may be unequal, if in so doing STRESS is
reduced. The secondary approach TIES (2) requires that the fitting values
be equal for equal data. This constraint is more stringent, but is recommended
when there are relatively few distinct values in the data. Choice of the
secondary approach to ties will normally result in higher STRESS
values.
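The difference between the two approaches can be sketched with a small monotone regression. This is an illustrative implementation under the usual (Kruskal) definitions of the primary and secondary approaches, not MINISSA's own code; the names `pava` and `fitting_values` are hypothetical.

```python
def pava(values, weights):
    """Weighted least-squares monotone (non-decreasing) regression."""
    blocks = []  # each block holds [mean, weight, count]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while neighbouring blocks violate the order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    out = []
    for mean, _, count in blocks:
        out.extend([mean] * count)
    return out

def fitting_values(data, dist, ties=1):
    """Fitting values for each pair, returned in the original pair order."""
    n = len(data)
    result = [0.0] * n
    if ties == 1:
        # Primary: sort by data, breaking ties by distance, so that tied
        # data may receive unequal fitting values.
        order = sorted(range(n), key=lambda i: (data[i], dist[i]))
        fitted = pava([dist[i] for i in order], [1.0] * n)
        for i, f in zip(order, fitted):
            result[i] = f
        return result
    # Secondary: pool each group of equal data into one block (its mean
    # distance, weighted by group size), so tied data get equal values.
    groups = {}
    for i in range(n):
        groups.setdefault(data[i], []).append(i)
    keys = sorted(groups)
    means = [sum(dist[i] for i in groups[k]) / len(groups[k]) for k in keys]
    sizes = [float(len(groups[k])) for k in keys]
    for k, f in zip(keys, pava(means, sizes)):
        for i in groups[k]:
            result[i] = f
    return result

data = [1, 2, 2, 3]           # dissimilarities; two of them are tied
dist = [1.0, 3.0, 2.0, 2.5]   # current solution distances
print(fitting_values(data, dist, ties=1))
print(fitting_values(data, dist, ties=2))
```

In this example the primary approach leaves the two tied pairs with different fitting values, while the secondary approach forces them to be equal and so produces larger residuals, in line with the remark that it normally results in higher STRESS.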
2. EPSILON
A further approach to tied data uses EPSILON in the
PARAMETERS command. If the difference between any pair of values is less
than the value specified, they will be regarded as tied. This approach is
recommended if the user wishes to place little emphasis on smaller variations in
the data.
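One way to picture the effect of EPSILON is as a grouping of the data values into tie blocks. The sketch below is only an assumption about how such a tolerance could be applied (a value joins the current block when it lies less than `epsilon` above the block's smallest member); the manual does not specify MINISSA's exact rule, and the name `tie_groups` is hypothetical.

```python
def tie_groups(values, epsilon=0.0):
    """Return lists of indices whose values are treated as tied.

    Assumed rule: after sorting, a value joins the current group when it
    equals the group's first (smallest) member or exceeds it by less
    than epsilon.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = []
    for i in order:
        if groups:
            first = values[groups[-1][0]]
            if values[i] == first or values[i] - first < epsilon:
                groups[-1].append(i)
                continue
        groups.append([i])
    return groups

values = [0.10, 0.12, 0.30, 0.31, 0.50]
print(tie_groups(values))                # default 0.0: only exact ties
print(tie_groups(values, epsilon=0.05))  # small differences are ignored
```

With the default of 0.0 every distinct value forms its own group; a larger EPSILON merges near-equal values, which is how small variations in the data are given little emphasis.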
3. The initial configuration
The benefits of a 'good' starting point for the iterative
process include savings in machine time and avoidance of local minima. The
program will generate a starting configuration with desirable numerical
properties, using only the ordinal properties of the data. This has been found
particularly useful in avoiding problems with local minima and implements the
recommendations of Lingoes and Roskam (1973).
Alternatively, the user may
supply a starting configuration. The matrix of coordinates is preceded by
READ CONFIG, which may have an associated INPUT FORMAT
statement to read real (F-type) values. The configuration may be input either
as stimuli (rows) by dimensions (columns) or as dimensions (rows) by stimuli
(columns). In the latter case, the parameter MATFORM should be given
the value 1 in the PARAMETERS command.
4. Distances in the configuration
The user may choose how the distances between
the points in the configuration are to be computed using the MINKOWSKI
parameter. The default value of 2.0 applies the Euclidean metric, and 1.0 the
'city-block' metric, but any positive number may be used. It is however unwise
to use large values as there is a risk of overflow.
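The role of the MINKOWSKI parameter can be illustrated with the general Minkowski distance formula. This is a minimal sketch of the standard formula; the function name `minkowski_distance` is hypothetical and the code is not taken from MINISSA.

```python
def minkowski_distance(x, y, r=2.0):
    """Minkowski distance between two points with exponent r > 0."""
    if r <= 0:
        raise ValueError("the Minkowski exponent must be positive")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski_distance(a, b, r=2.0))  # Euclidean metric
print(minkowski_distance(a, b, r=1.0))  # 'city-block' metric

# Large exponents raise the coordinate differences to high powers,
# which is where the risk of overflow comes from.
try:
    minkowski_distance(a, b, r=1000.0)
except OverflowError as err:
    print("overflow:", err)
```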
5. Dimensionality
As a general rule, solutions should be computed in a number of
dimensionalities; the program always starts with the highest
dimensionality and steps down to the lowest. Since a perfect fit will be
obtained in p-2 dimensions (p = no. of stimuli), the trial dimensionalities
should always be less than p-3. As a practical guide to the choice of trial
dimensionalities, it is recommended that the data compression ratio (defined by
Young as the number of input data elements divided by the product of
stimuli x dimensions) should be greater than 2.
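The compression-ratio guideline can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that should be verified against the user's own data: the ratio is taken as the number of input data elements divided by stimuli x dimensions (the direction under which "greater than 2" can be satisfied), and the data elements are the p(p-1)/2 off-diagonal entries of the lower triangle. The function name `compression_ratio` is hypothetical.

```python
def compression_ratio(n_stimuli, n_dims):
    """Data elements per fitted coordinate (assumed definition)."""
    # Off-diagonal entries of the lower-triangle matrix.
    n_data = n_stimuli * (n_stimuli - 1) // 2
    return n_data / (n_stimuli * n_dims)

# For 10 stimuli, list the ratio for a few trial dimensionalities.
for dims in (3, 2, 1):
    ratio = compression_ratio(10, dims)
    verdict = "acceptable" if ratio > 2 else "too few data"
    print(dims, round(ratio, 2), verdict)
```

Under these assumptions the ratio simplifies to (p-1)/(2r) for p stimuli in r dimensions, so the guideline amounts to requiring p > 4r + 1.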
INPUT COMMANDS
Keyword                     Function
N OF STIMULI [number]       Number of stimuli in the analysis
DIMENSIONS [number]         Dimensions for analysis.
           [number list]    Analyses are always performed from the
           [number] TO      highest to the lowest dimensionality.
           [number]
LABELS                      Optionally used to identify the stimuli.
  [followed by a series     There should be as many labels as
  of labels (<= 65 chars),  stimuli.
  each on a separate line]
READ MATRIX                 Start reading input data matrix
READ CONFIG                 Optional - followed by a starting
                            configuration
COMPUTE                     Start computation
FINISH                      Final statement in the run
PARAMETERS
Keyword             Default Value  Function
DATA TYPE           0              0: Lower-triangle matrix of similarities
                                      (high values mean high similarities
                                      between points).
                                   1: Lower-triangle matrix of
                                      dissimilarities (high values mean high
                                      dissimilarities between points).
                                   2: Full-symmetric matrix of similarities
                                      (high values mean high similarities
                                      between points).
                                   3: Full-symmetric matrix of
                                      dissimilarities (high values mean high
                                      dissimilarities between points).
MINIMUM ITERATIONS  6              Sets the minimum number of iterations to
                                   be performed before the convergence test.
EPSILON             0.0            Data are to be considered tied if the
                                   difference between them is less than
                                   EPSILON.
MATFORM             0              (Relevant only when 'READ CONFIG' is
                                   used.)
                                   0: The input configuration is stimuli
                                      (rows) by dimensions (columns).
                                   1: The input configuration is dimensions
                                      (rows) by stimuli (columns).
TIES                1              1: Primary approach to ties in the data.
                                   2: Secondary approach to ties in the data.
MINKOWSKI           2.0            1.0: Distances in the configuration are
                                        measured by 'city-block' metric.
                                   2.0: Distances are measured by Euclidean
                                        metric.
                                   Any positive number may be used here.
NOTES
1. N OF STIMULI may be replaced by N OF POINTS.
2. N OF SUBJECTS is not valid.
3. The program expects free-format real (F-type) numbers separated by one or
more spaces. The INPUT FORMAT, if used, should read the longest row of this
matrix (i.e. n-1 values for a lower-triangle matrix without diagonal, or n
values for a full symmetric matrix, when there are n stimuli).
4. Similarity/dissimilarity values should preferably not be negative. If they
are, a metric equivalent model (e.g. MRSCAL) should be used, or the values
should be converted into distances by sqrt(1-r) (if they are product-moment
correlations), or made non-negative by adding the absolute value of the
largest negative entry to all entries.
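The two remedies for negative values mentioned in the notes can be sketched as follows. The helper names are hypothetical and the correlation values are made up for illustration; the input is a lower-triangle matrix without diagonal, so row i holds i values.

```python
import math

def correlations_to_distances(rows):
    """Convert product-moment correlations r into distances sqrt(1 - r)."""
    # max() guards against a tiny negative argument from rounding when r = 1.
    return [[math.sqrt(max(0.0, 1.0 - r)) for r in row] for row in rows]

def shift_non_negative(rows):
    """Add the absolute value of the most negative entry to all entries."""
    lowest = min(v for row in rows for v in row)
    shift = -lowest if lowest < 0 else 0.0
    return [[v + shift for v in row] for row in rows]

# Lower triangle for 3 stimuli: rows of 1 and 2 values (n-1 in the longest).
corr = [[0.75],
        [0.0, -0.44]]
print(correlations_to_distances(corr))
print(shift_non_negative(corr))
```

Note that the first conversion turns similarities (correlations) into dissimilarities, so DATA TYPE would have to describe the converted matrix accordingly, whereas the shift keeps the original direction of the values.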
PRINT options (to main output file)
Option     Form                Description
INITIAL    p x r matrix        Initial configuration, either generated by
                               the program or printed from the
                               user-provided configuration.
                               (p = no. of stimuli).
FINAL      p x r matrix        Final configuration, rotated to principal
                               components.
DISTANCES  lower triangular,   Solution distances between points,
           with diagonal       calculated according to MINKOWSKI parameter.
FITTING    lower triangular,   Fitting values: the disparities (DHAT)
           with diagonal       values.
RESIDUALS  lower triangular,   The difference between the distances and
           with diagonal       the disparities.
HISTORY                        An iteration by iteration history of STRESS
                               and values.
By default only the final configuration and the final STRESS values are output.
For matrices of between 10 and 60 stimuli, the output also reports Spence's approximation for Kruskal's Stress based on random rankings (Spence, 1979). The standard deviation associated with the values produced by this approximation can be taken as 0.01, for those inclined to construct a formal test for the presence of random data. If the obtained value is, say, only a third or a half as large as the approximation, one can be fairly sure that the data are not random.
PLOT options (to main output file)
Option     Description
INITIAL    Up to r(r-1)/2 plots of the initial configuration.
           (r = no. of dimensions).
FINAL      Up to r(r-1)/2 plots of final configuration
           (r = no. of dimensions).
SHEPARD    The Shepard diagram of distances plotted against data.
           Fitted values are shown by *, actual data/distance pairs by 0.
STRESS     Plot of STRESS values by iteration, with a summary plot of
           stress by dimensions.
POINT      Histogram of point contributions to STRESS in descending order.
RESIDUALS  Histogram of residual values.
By default, the Shepard diagram and the final configuration will be plotted.
Configuration plots are calibrated both from 0 to 100 and from 0 to the
maximum coordinate value.
By default, no secondary output file is produced.
PROGRAM LIMITS
Maximum no. of stimuli = 300
Maximum no. of dimensions = 8
See also