provides 'external' analysis of preference data: i.e. it seeks to
relate subjects' preferences for a set of stimuli to an existing configuration
of the stimulus points by means of a (strict) hierarchy of four different
models. These models are the four 'phases' of the program and they form a
hierarchy inasmuch as each phase is a special case of the one preceding it. The
subjects are mapped into the stimulus configuration as points of "maximum
preference" (or, in phase IV, as vectors) so that the preference scores are
reproduced as closely as possible by the distances from each subject's point to the stimulus
points. (The program also allows subject points to be negative, in which
case they represent “pessimal” points (least preference) on some or all
dimensions.)
PREFMAP may also be used to provide a quasi-internal
analysis by generating a stimulus configuration from the preference data.
PREFMAP expects two matrices to be input in the usual case:
i) a matrix defining the configuration of stimulus points, and
ii) a matrix of subjects' preferences.
The configuration of stimulus points into which the program maps subjects'
preferences may be an a priori arrangement of some sort or the result of a
previous MDS analysis – usually a prior scaling of the subjects’ similarity
judgments.
Alternatively, the program may generate a configuration from the preference data themselves,
thus providing a quasi-internal analysis. The analysis is not truly internal,
since once the configuration has been generated, it is not changed during the
course of the analysis to provide a better fit to the data. In this case the
value attached to the parameter GENERATE becomes relevant.
The preference matrix
As is usual in MDS, "preference" is used as a shorthand for
any proximity or (dis)similarity type of judgement or data. Any data which can
be thought of as resulting from a question of the type "which has the more of
attribute x" are amenable to analysis in PREFMAP and other "preference" analysis
programs in New MDSX.
Ranks vs. Scores
Preference judgements may be represented for PREFMAP (as in MDPREF and other programs) in four distinct ways. The
major distinction is that between a rank and a score.
Ranked data in which the ordering is most-preferred to least-preferred may be input to PREFMAP
in this form by specifying DATA TYPE (1). DATA TYPE (2) indicates
that the rank ordering is the reverse of this.
If instead scores are used, in which the highest number denotes the most
preferred stimulus and the lowest the least preferred, this is indicated
by DATA TYPE (3). DATA TYPE (4) refers to scores signifying the
opposite order of preference.
The Model
The four phases in PREFMAP are in fact four distinct models, which are
"nested" in the sense that each phase is a special case of the one preceding it.
In all four phases the program seeks to represent the preference information
within the given configuration. As the program moves from phase I to phase IV the
restrictions become stricter.
Phase I: The general unfolding model
In this model each subject is allowed
1. to rotate the axes of the similarity space orthogonally to his/her own reference dimensions;
2. to assign to each of his/her dimensions a different evaluative weight or salience.
Within the configuration space each subject is represented as a point located by his/her
most preferred position on each of the constituent dimensions of this space,
i.e. placed at his/her ideal point.
Phase II: The weighted unfolding model
In this model subjects
1. are assumed to share the same reference axes, but
2. are allowed to weight each of the common dimensions (axes) differentially.
The subjects' ideal points are again mapped into the space.
Phase III: The simple unfolding model
In this model the subjects
1. are assumed to share the same reference axes, and
2. are assumed to have the same (unit) dimensions (i.e. no differential weighting).
Each ideal point is then mapped into the common stimulus space.
Phase IV: The vector model
The vector model
represents each subject's preference as a vector directed towards their region
of maximum preference. The projections of the stimulus points onto the vector
reproduce the subject's preference values. Moreover, the angle which the vector
makes with each dimension can be thought of as representing the salience of that
dimension in the preference judgement.
The program thus forms a hierarchy
of models in the sense that each higher numbered phase is a special case of the
lower numbered.
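The nesting of the four models can be illustrated with a short sketch. The Python below is not part of PREFMAP: the function names are hypothetical, the rotation is restricted to two dimensions for brevity, and the formulas are the usual textbook statements of the unfolding and vector models rather than the program's own code.

```python
import math


def rotate2d(point, angle):
    """Express a 2-D point in axes rotated by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = point
    return (c * x + s * y, -s * x + c * y)


def sq_dist_phase1(stim, ideal, weights, angle):
    """Phase I (2-D case): rotate to the subject's own axes, then weight."""
    s, y = rotate2d(stim, angle), rotate2d(ideal, angle)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, s, y))


def sq_dist_phase2(stim, ideal, weights):
    """Phase II: common axes, subject-specific dimension weights."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, stim, ideal))


def sq_dist_phase3(stim, ideal):
    """Phase III: common axes and unit weights (plain squared distance)."""
    return sum((a - b) ** 2 for a, b in zip(stim, ideal))


def projection_phase4(stim, direction):
    """Phase IV: projection of a stimulus onto the subject's vector."""
    norm = math.sqrt(sum(c * c for c in direction))
    return sum(a * c for a, c in zip(stim, direction)) / norm
```

With a zero rotation angle the phase I function reduces to phase II, and with unit weights phase II reduces to phase III, which is the sense in which each phase is a special case of the one before it.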
The user may choose which of the models to apply by means
of the parameters S-PHASE and E-PHASE (Start-phase and End-phase) which may have
the same value, in which case only that one model is implemented. By default,
the program will begin at phase I and compute solutions by the other models in
order of increasing restriction to end at phase IV.
The program computes the F statistic as a measure of improvement as it moves from phase to
phase.
Phase IV of PREFMAP is analogous to PROFIT and is the external version of
MDPREF.
Transformations
In addition to containing four separate models
for representing preference data, PREFMAP allows the user to choose the form of
the transformation function linking data to the solution, which is tantamount to
defining the level of measurement of the preference data.
The linear option
By specifying FIT TYPE(0) in the PARAMETERS statement the user
signifies that, in phases I-III, the distances from the subject's ideal point to
the stimulus points will be (a least-squares approximation to) a linear
transformation of the rating values ascribed to the stimuli by the subject. In
phase IV it is the distances between the projections of the points onto the
subject's vector which are so related.
This option is applicable if the
data are believed to have interval (or, indeed ratio) level measurement
properties.
The non-linear (monotone) option
Alternatively, the user
may believe that the data will bear only ordinal interpretation and that the
solution distances should be allowed to be (as close as possible to) some
ordinal transformation of the original data values. This option is chosen by
specifying FIT TYPE(1) in the PARAMETERS statement. FIT TYPE(1) should, however,
be chosen only if there are no tied data values. If there are equal data values
then the user has the choice of two monotonic approaches:
The secondary approach to ties (FIT TYPE(2))
FIT TYPE(2) indicates that the information
contained in the ties is important and that ties should be matched by ties in
the fitting values. That is to say, any equal data values which have
associated with them unequal distances in the solution will decrease the
goodness-of-fit of solution to data.
The primary approach to ties (FIT TYPE(3))
If, however, the user believes that the existence of tied data values is
not significant information, they may specify FIT TYPE(3). The program will then
not be constrained to match equal data with equal fitting values, and will
'break' ties if in so doing the goodness-of-fit is improved.
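The difference between the two treatments of ties can be seen in a small monotone-regression sketch. This is an illustration only, not the PREFMAP code: it uses the standard pool-adjacent-violators method, the function names are hypothetical, and it assumes the data are oriented so that larger data values should go with larger distances. Setting `keep_ties=True` mimics the behaviour described for FIT TYPE(2) and `keep_ties=False` that described for FIT TYPE(3).

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators: best non-decreasing fit."""
    blocks = []  # each block is [mean, weight, number of input values]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def monotone_fit(data, dist, keep_ties):
    """Fitting values monotone in `data`, as close as possible to `dist`."""
    n = len(data)
    fitted = [0.0] * n
    if keep_ties:
        # Tied data values must receive one common fitting value.
        order = sorted(range(n), key=lambda i: data[i])
        groups = []
        for i in order:
            if groups and data[groups[-1][0]] == data[i]:
                groups[-1].append(i)
            else:
                groups.append([i])
        means = [sum(dist[i] for i in g) / len(g) for g in groups]
        group_fit = pava(means, [len(g) for g in groups])
        for g, f in zip(groups, group_fit):
            for i in g:
                fitted[i] = f
    else:
        # Tied data values may be ordered by their distances (ties broken).
        order = sorted(range(n), key=lambda i: (data[i], dist[i]))
        sorted_fit = pava([dist[i] for i in order], [1.0] * n)
        for i, f in zip(order, sorted_fit):
            fitted[i] = f
    return fitted
```

For data `[1, 2, 2, 3]` and distances `[1.0, 3.0, 2.0, 4.0]`, keeping ties forces the two tied items to share a fitting value, whereas breaking ties lets the fitting values follow the distances exactly.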
The PREFMAP algorithm
The linear procedure involves at each phase the solution of a
multiple regression equation. The coefficients of the equation are output as
beta-weights. The non-linear procedure solves first the linear regression then
proceeds to use the linear solution as a starting-point for the monotone
regression, which is an iterative procedure.
Quasi-Internal analysis
Although PREFMAP is primarily a program for external analysis, the facility exists to perform a quasi-internal analysis. If no configuration is input by
the user, the program will generate one from the preference data themselves by
an Eckart-Young factorisation of the minor product of the preference data
matrix, which may first have been transformed in some way (see below). The user has a
number of options in choosing how this configuration is formed, using the
parameter GENERATE:
GENERATE(0)
When GENERATE is given the value 0
the preference matrix is double-centred. This means that both row- and
column-means are extracted and the overall mean of the matrix is added back in,
so that in essence only interaction effects remain. The configuration is
then formed in the manner indicated above. When this option is chosen with FIT
TYPE(0) and S-PHASE(3), the resulting analysis is equivalent to internal metric
unfolding.
GENERATE(1)
Alternatively, only the row-means of the preference matrix are extracted before
the configuration is formed. This has the
effect of removing from the configuration any influence due to the actual values
used by different subjects, though not any effect due to the spread in the
scores.
GENERATE(2)
If, in addition, the user wishes to remove the
influence of the actual spread of preference scores from the initial
configuration, then choice of GENERATE(2) instructs the program to standardise
the preference scores before extracting the configuration.
GENERATE(3)
The user may wish simply to remove the row
effects and the column effects without fully double-centring the matrix. In this
case the user should choose GENERATE(3).
These options are not operative
when READ CONFIG is used to input a separate stimulus configuration.
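The four preprocessing choices can be sketched as follows. This is a hedged illustration, not the program's code: `centre_preferences` is a hypothetical name, the standardisation uses the population standard deviation (the text does not say which divisor PREFMAP uses), and the treatment of option 3 (subtracting row and column means without adding back the overall mean) is only one reading of the description above. The subsequent Eckart-Young factorisation is not shown.

```python
from statistics import mean, pstdev


def centre_preferences(matrix, generate):
    """Transform a subjects-by-stimuli matrix before a configuration is formed."""
    rows = [[float(x) for x in r] for r in matrix]
    row_means = [mean(r) for r in rows]
    col_means = [mean(c) for c in zip(*rows)]
    grand = mean(row_means)  # valid because all rows have the same length
    if generate == 0:
        # Double-centring: remove row and column means, add back grand mean.
        return [[x - rm - cm + grand for x, cm in zip(r, col_means)]
                for r, rm in zip(rows, row_means)]
    if generate == 1:
        # Remove row means only.
        return [[x - rm for x in r] for r, rm in zip(rows, row_means)]
    if generate == 2:
        # Standardise each row (fails for a row of constant scores).
        return [[(x - rm) / pstdev(r) for x in r]
                for r, rm in zip(rows, row_means)]
    if generate == 3:
        # Assumed reading: remove row and column means, no grand mean added.
        return [[x - rm - cm for x, cm in zip(r, col_means)]
                for r, rm in zip(rows, row_means)]
    raise ValueError("generate must be 0, 1, 2 or 3")
```

After double-centring, every row and every column of the result sums to zero, which is why only interaction effects remain.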
Normalisation of scale values
Independent of whether the analysis is
external or internal and what information is taken into account if and when
generating the configuration, the user may choose whether, in mapping the
preference data into the configuration, (s)he wishes to ipsatize the subjects'
data. If NORMALISE (1) is chosen each subject's preference scores are normalised
by removing the subject's mean score and dividing by their standard deviation.
The default option NORMALISE (0) leaves the scores unchanged.
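As a minimal sketch of what row normalisation does to one subject's scores (assuming the population standard deviation, since the divisor used by the program is not stated here; `normalise_subject` is a hypothetical name):

```python
from statistics import mean, pstdev


def normalise_subject(scores):
    """Remove a subject's mean score and divide by the standard deviation."""
    m, s = mean(scores), pstdev(scores)
    if s == 0:
        raise ValueError("a subject with constant scores cannot be normalised")
    return [(x - m) / s for x in scores]
```

After this step each subject's scores have mean zero and unit spread, so subjects who use different parts of a rating scale become comparable.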
Canonical rotation: the KEEP parameter
In phase II subjects are thought of as applying
evaluative weights to the dimensions of the stimulus space. The orientation of
these axes is then, in a substantive sense, non-arbitrary. As part of the
solution to phase I (where arbitrary rotation of the axes is allowed) the
program produces a canonically-rotated space. This is the stimulus configuration
with its axes rotated to those of an "average subject" whose data are formed by
averaging the subjects' preference scores. These axes have certain optimal
properties and will not normally correspond to (e.g.) the principal axes of the
configuration.
In the normal way the program, in passing from phase I to
phase II, will use this average subject space as the basis of its calculations
in this and subsequent (more restricted) phases. If, however, the user feels that the axes of the input
configuration have substantive significance and wishes to retain them in the
subsequent phases, then (s)he should set the parameter KEEP (1) in the PARAMETERS
statement.
If a solution is not being computed in phase I, then the
program will simply use the original input configuration.
Initial weight estimates
If the user begins the analysis at phase III (i.e. S-PHASE (3))
then (s)he has the option of having the program read in estimates of the
dimension weights. This is done by means of the READ WEIGHTS command. The
weights (one per dimension) should follow READ WEIGHTS, in free format, or
optionally according to an associated INPUT FORMAT specification.
The BETA weights
In the linear procedure the program seeks to represent the
preference data as a linear function of the (squared) distance between the
subject's ideal point/vector and the stimulus points. This is done in a multiple
regression. For this to be done the distance equation must be multiplied out
into its constituent terms. Since the form of the distance equation differs in
each phase with the differing weighting options, the number of terms involved in
this expansion will differ at each phase. The program estimates a standardised
regression coefficient for each of the terms in this expansion. These are the
so-called BETA weights.
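To make the "multiplying out" concrete, the sketch below works through the simple unfolding case (phase III) for one subject. It is an illustration under stated assumptions, not the PREFMAP routine: the function names are hypothetical, the coefficients are raw rather than standardised, and the scores are treated as a linear function of squared distance, s = a + b * d^2. Expanding d^2 gives one term in the sum of squared stimulus coordinates (coefficient b) and one term per dimension (coefficient -2 * b * y_k), so the ideal-point coordinate y_k can be read off the fitted coefficients.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def least_squares(design, y):
    """Ordinary least squares via the normal equations."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def phase3_ideal_point(stimuli, scores):
    """Regress scores on [1, x_1..x_r, sum of x_k^2]; recover the ideal point."""
    design = [[1.0] + list(p) + [sum(c * c for c in p)] for p in stimuli]
    coefs = least_squares(design, scores)
    slope = coefs[-1]  # coefficient of the squared-length term
    ideal = [-b / (2.0 * slope) for b in coefs[1:-1]]
    return ideal, slope
```

With noise-free scores generated from a known ideal point, the regression recovers that point, which is a quick check that the expansion has been set up consistently. The phase I and II expansions add further quadratic terms in the same way.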
The F-statistic
The F-statistic is computed
within phases, to ascertain whether the variance explained by the model
significantly differs from the residual (unexplained) variance.
Between phases it measures the extent to which the higher-level (more general) model explains
significantly more variation than the lower-level one.
In both cases the
F-statistic is only approximate since the stages are not
independent.
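The exact formulae used by the program are not given here, so the following is only a sketch of the standard regression F-ratios that the description suggests. The function names are hypothetical, and the predictor counts are an assumption obtained by multiplying out the distance equation for r dimensions: r linear terms in phase IV, one extra squared-length term in phase III, r squared terms in phase II, and all r(r+1)/2 squares and cross-products in phase I.

```python
def n_predictors(phase, r):
    """Assumed number of regression terms for each phase in r dimensions."""
    counts = {1: r * (r + 3) // 2, 2: 2 * r, 3: r + 1, 4: r}
    return counts[phase]


def within_phase_f(r2, k, n_stimuli):
    """F-ratio of explained to residual variance for one model."""
    df2 = n_stimuli - k - 1
    if df2 <= 0:
        raise ValueError("too few stimuli for this model")
    return (r2 / k) / ((1.0 - r2) / df2), k, df2


def between_phase_f(r2_general, r2_restricted, k_general, k_restricted, n_stimuli):
    """F-ratio for the extra variance explained by the more general model."""
    df1 = k_general - k_restricted
    df2 = n_stimuli - k_general - 1
    if df1 <= 0 or df2 <= 0:
        raise ValueError("degrees of freedom must be positive")
    f = ((r2_general - r2_restricted) / df1) / ((1.0 - r2_general) / df2)
    return f, df1, df2
```

For example, with 10 stimuli in two dimensions, comparing a phase III fit of R-squared 0.9 against a phase IV fit of 0.8 uses 3 and 2 predictors respectively, giving 1 and 6 degrees of freedom.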
Phases and strategies
It should be noted that choice of
S-PHASE affects the form of the subsequent analysis. In particular note the use
of the canonical reference space noted above. Further considerations include the
following:
When S-PHASE (1)
At the end of this phase the original
input configuration is replaced by the phase I "private space" of the average
subject. The rationale is that the average subject's configuration is likely to
provide a better fit than the original configuration to these data at subsequent
stages.
When S-PHASE (2)
At the end of this phase, the original input
configuration is replaced by the phase II "private space" of the average subject
(i.e. differentially weighted to conform best to the averaged data). This removes
the canonical reference space for subsequent phases.
When S-PHASE (3)
If the user begins at phase 1 or 2, it is quite possible that the
individual ideal point at phase 3 will be a "saddle point", i.e. a point which
is a mixture of optimal preference on some one or more dimensions (signalled by
a positive value) and pessimal preference (negative value) on others. However,
if one begins with S-PHASE (3), the original configuration is unchanged and all
ideal points will be positive. In many applications the user may wish to start
at a higher phase and finish at phase 3 or 4, but then perform a separate run
with S-PHASE(3) if the simple unfolding model turns out to be adequate. In this
case a solution is obtained where all ideal points are constrained to be
positive.
When S-PHASE (4)
The program will simply implement the
vector model.
INPUT COMMANDS
Keyword                  Function
N OF STIMULI [number]    Number of stimuli in the analysis
N OF SUBJECTS [number]   Number of subjects/cases
DIMENSIONS [number]      Dimensions for analysis
LABELS [followed by a    Optionally identify the stimuli and subjects.
series of labels         Labels should identify first the stimuli
(<= 65 chars), each on   (columns) and then the subjects (rows) of the
a separate line]         data matrix, without omissions.
PARAMETERS
Keyword     Default   Function
S-PHASE     1         Sets starting-phase for analysis.
E-PHASE     4         Sets end-phase for analysis.
NORMALISE   1         0: Preference matrix not normalised.
                      1: Matrix is row-normalised.
AVERAGE     1         0: Average subject's scores are computed in starting phase only.
                      1: Average subject's scores are computed at all phases.
DATA TYPE   1         1: Data are ranks: first stimulus most preferred.
                      2: Data are ranks: first stimulus least preferred.
                      3: Data are scores: highest value = most preferred.
                      4: Data are scores: lowest value = most preferred.
FIT TYPE    0         0: Linear fit.
                      1: Monotone fit: no ties.
                      2: Monotone fit: secondary approach to ties.
                      3: Monotone fit: primary approach to ties.
MATFORM     0         0: Input configuration stimuli (rows) by dimensions (columns).
                      1: Input configuration dimensions (rows) by stimuli (columns).
ORIGINAL    0         0: At each new phase begin with fitting values from previous phase.
                      1: At each new phase return to original preference matrix.
                      (Applicable only if FIT(0))
KEEP        1         0: Stimulus configuration becomes the canonically rotated space
                         from phase 1 or 2.
                      1: Return to original configuration.
CRITERION   0.005     Sets the criterion for terminating iterations.
                      (Not applicable if FIT(0))
GENERATE    0         0: Configuration is generated from the double-centred preference matrix.
                      1: Configuration is generated after row-means of preference matrix
                         are extracted.
                      2: Configuration is generated from preference matrix with
                         standardised rows.
                      3: Configuration is generated from preference matrix with both
                         row- and column-means extracted.
NOTES
1. In the parameters S-PHASE and E-PHASE, the hyphens are significant characters.
2. The preference data are submitted to the program as a matrix of real numbers.
3. The data matrix must have subjects as rows and stimuli as columns.
4. Only one dimensionality per run (task) may be analysed by PREFMAP.
PRINT, PLOT AND PUNCH OPTIONS
Since the output from PREFMAP is extensive, the form of the PRINT,
PLOT and PUNCH commands has been modified to limit the amount of information
generated.
The format of the commands remains the same, viz:
PRINT     ALL
PLOT      ALL BUT    keyword (arg.), keyword (arg.) ...
PUNCH     EXCEPT
In other NewMDSX procedures the argument takes the form of a number
(or numbers) which indicate the dimensionality (dimensionalities) for which the
particular output specified by the keyword is required. In PREFMAP the analysis
is performed in only one dimensionality per task and thus the argument refers to
the phase (or phases) for which the particular output is required.
The argument will therefore consist of numbers between 1 and 4 only.
Specification of ALL in the PRINT command will generate a
detailed set of output for all phases from S-PHASE to E-PHASE. This is detailed
and voluminous and its use is not recommended. Default options, as usual,
provide the essential information.
PRINT options
Option       Form                    Description
GENERATED    p x r                   If the program generates the stimulus configuration
                                     from the preference matrix then this is output.
STIMULI      p x r                   The 'solution' matrix of stimulus coordinates is
                                     output. This will differ at each phase unless
                                     KEEP CONFIG is specified.
SUBJECTS     N x r                   The matrix of subject locations is output. For
                                     phases I-III the matrix is of point coordinates.
                                     For phase IV direction cosines are output.
ROTATIONS    N matrices each r x r   Phase I only. For each subject the cosines of the
                                     angles through which each dimension is rotated
                                     are output.
WEIGHTS      N x r                   Phases I-III only. The weights which each subject
                                     assigns to each dimension are output.
COMPOSITE    N matrices each r x r   Phase I only. The composite transformation matrix
                                     for each subject is output. This is the matrix
                                     which transformed the original space into the
                                     weighted and rotated 'private' space.
IDEAL        N x r                   Phases I and II only. The coordinates of the
                                     subject ideal points in the 'private' spaces
                                     (i.e. the space as rotated and/or weighted)
                                     are output.
PRIVATE      N matrices each p x r   Phases I and II only. For each subject the
                                     coordinates of the stimuli with respect to the
                                     axes of the rotated and/or weighted space
                                     are output.
BETA         -                       The regression weights for each subject are output.
NORMALISED   N x p                   This keyword takes no argument and the matrix of
                                     preference values as normalised will be output
                                     once only.
FITTING      N x p                   For the linear procedure the estimated values are
                                     output. For the non-linear procedure the fitting
                                     values are output.
DISTANCES    N x p                   Phases I-III: The matrix of squared distances
                                     between each subject and the stimuli is output.
                                     Phase IV: The projections of the stimulus points
                                     onto each vector are output.
RESIDUALS    N x p                   The residual values (DISTANCES - FITTING)
                                     are output.
JOINT        N x r, p x r            Prints both the subject and stimuli final
                                     co-ordinates.
By default, the following are output:
STIMULI
SUBJECTS
ROTATIONS (phase I)
IDEAL (phases I-II)
Within- and between-phase F-tests
Individual correlation of data to solution
PLOT options
Option         Description
INITIAL        This keyword takes no argument. The configuration of stimulus
               points as input is plotted once only.
STIMULI        The configuration of stimulus points at each selected phase is
               plotted. There will be r(r-1)/2 two-way plots.
SUBJECTS       The configuration of subject ideal points (vectors) is plotted
               in the selected phases in the form of r(r-1)/2 plots.
JOINT          The configuration of stimulus points and subject points
               (vectors) is plotted.
SHEPARD        A Shepard diagram of the data values plotted against the
               (squared) distances is produced.
CORRELATIONS   A series of plots of individual (subject) correlations to the
               solution is produced.
RESIDUALS      A histogram of residual values is produced.
By default only the first two
dimensions of the joint space will be plotted.
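The count r(r-1)/2 quoted for the STIMULI and SUBJECTS plots is simply the number of distinct pairs of dimensions. A small sketch (the function name is hypothetical) lists them:

```python
from itertools import combinations


def plot_pairs(r):
    """All pairs of dimensions (numbered from 1) for two-way plots."""
    return list(combinations(range(1, r + 1), 2))
```

For the maximum of five dimensions this gives ten two-way plots per selected phase.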
PUNCH options (to secondary output file)
Option      Description
SPSS        Each line of the output matrix contains the following, in a
            fixed format, for each phase selected:
              I          : the subject index
              J          : the stimulus index
              INPUT      : the corresponding data value
              NORMALISED : the data value as normalised
              FITTING    : the corresponding fitting value
              DISTANCE   : the squared distance between I and J
              RESIDUAL   : the corresponding residual value
GENERATED   The matrix of stimulus values if this is generated by the program.
SOLUTION    The two solution matrices, i.e. the matrix of stimulus coordinates
            and the matrix of subject coordinates (or, in phase IV, cosines).
By default, no secondary output file is produced.
PROGRAM LIMITS
Maximum no. of subjects = 200
Maximum no. of stimuli = 200
Maximum no. of dimensions = 5
See also