PREFerence MAPping : PREFMAP

provides 'external' analysis of preference data: i.e. it seeks to relate subjects' preferences for a set of stimuli to an existing configuration of the stimulus points by means of a (strict) hierarchy of four different models. These models are the four 'phases' of the program and they form a hierarchy inasmuch as each phase is a special case of the one preceding it. The subjects are mapped into the stimulus configuration as points of "maximum preference" (or, in phase IV, as vectors) so that the preference scores are reproduced as closely as possible by the distances from each subject's point to the stimulus points. (The program also allows subject points to be negative, in which case they represent “pessimal” points (least preference) on some or all dimensions.)

PREFMAP may also be used to provide a quasi-internal analysis by generating a stimulus configuration from the preference data.

PREFMAP expects two matrices to be input in the usual case:
     i) a matrix defining the configuration of stimulus points, and
     ii) a matrix of subjects' preferences.

The configuration of stimulus points into which the program maps subjects' preferences may be an a priori arrangement of some sort or the result of a previous MDS analysis – usually a prior scaling of the subjects’ similarity judgments.

Alternatively, the program may generate a configuration from the preference data themselves, thus providing a quasi-internal analysis. The analysis is not truly internal, since once the configuration has been generated, it is not changed during the course of the analysis to provide a better fit to the data. In this case the value attached to the parameter GENERATE becomes relevant.

The preference matrix
As is usual in MDS, "preference" is used as a shorthand for any proximity or (dis)similarity type of judgement or data. Any data which can be thought of as resulting from a question of the type "which has the more of attribute x" are amenable to analysis in PREFMAP and other "preference" analysis programs in NewMDSX.

Ranks vs. Scores
Preference judgements may be represented for PREFMAP (as in MDPREF and other programs) in four distinct ways. The major distinction is that between a rank and a score.

Ranked data in which the ordering is most-preferred to least-preferred may be input to PREFMAP in this form by specifying DATA TYPE (1). DATA TYPE (2) indicates that the rank ordering is the reverse of this.

If instead scores are used, in which the highest number is used to denote the most preferred stimulus and the lowest the least preferred, the option is indicated by: DATA TYPE (3). DATA TYPE (4) refers to scores signifying the opposite order of preference.
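
The four DATA TYPE codes differ only in whether a row holds ranks or scores and in which direction preference runs. As an illustration (not PREFMAP's own code), the following Python sketch brings a single subject's row to one orientation in which a larger value means more preferred. The function name `to_preference_scores` is hypothetical, and the sketch assumes that a ranked row lists 1-based stimulus numbers in order of preference; how PREFMAP orients the data internally is not stated here.

```python
def to_preference_scores(row, data_type):
    """Return one value per stimulus, larger meaning more preferred."""
    p = len(row)
    if data_type in (1, 2):
        # Assumption: row[k] is the 1-based number of the stimulus in position k.
        scores = [0.0] * p
        for position, stimulus in enumerate(row):
            # The first position receives p, the last position receives 1.
            scores[int(stimulus) - 1] = float(p - position)
        if data_type == 2:
            # Listed least preferred first, so reverse the orientation.
            scores = [p + 1 - s for s in scores]
        return scores
    if data_type == 3:
        # Scores already run from low (least) to high (most preferred).
        return [float(v) for v in row]
    if data_type == 4:
        # Low score = most preferred: reflect the values about the row's range.
        hi, lo = max(row), min(row)
        return [float(hi + lo - v) for v in row]
    raise ValueError("data_type must be 1, 2, 3 or 4")
```

For example, `to_preference_scores([3, 1, 2], 1)` treats stimulus 3 as most preferred and stimulus 2 as least preferred.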

The Model
The four phases in PREFMAP are in fact four distinct models, which are "nested" in the sense that each phase is a special case of the one preceding it. In all four phases the program seeks to represent the preference information within the given configuration. As the program moves from phase 1 to phase 4 the restrictions become more strict.

Phase I: The general unfolding model
In this model each subject is allowed
     1. orthogonally to rotate the axes of the similarity space to his/her own reference dimensions;
     2. to assign to each of his/her dimensions a different evaluative weight or salience.

Within the configuration space each subject is represented as a point located by his/her most preferred position on each of the constituent dimensions of this space, i.e. placed at his/her ideal point.

Phase II: The weighted unfolding model
In this model subjects
     1. are assumed to share the same reference axes
but
     2. are allowed to weight each of the common dimensions (axes) differentially. The subjects' ideal points are again mapped into the space.

Phase III: The simple unfolding model
In this model the subjects

     1. are assumed to share the same reference axes
     2. and to have the same (unit) dimensions (i.e. no differential weighting)
     Then each ideal point is mapped into the common stimulus space.
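
Under the linear option, fitting the simple unfolding model for one subject amounts to a regression on the stimulus coordinates together with their sum of squares. The following Python/NumPy sketch illustrates the idea and is not PREFMAP's own code: the function name `fit_simple_unfolding` and the array names are hypothetical, and it assumes the subject's scale values `s` are modelled as a + b x (squared distance from the ideal point).

```python
import numpy as np

def fit_simple_unfolding(X, s):
    """X: (p, r) stimulus coordinates; s: (p,) one subject's scale values.

    Fits s ~ a + b * squared distance to an ideal point y and returns (y, b).
    """
    X = np.asarray(X, dtype=float)
    s = np.asarray(s, dtype=float)
    # Expanding b * sum_k (x_k - y_k)^2 gives a constant, r linear terms
    # (coefficients -2*b*y_k) and one sum-of-squares term (coefficient b).
    sumsq = (X ** 2).sum(axis=1, keepdims=True)
    design = np.hstack([np.ones((X.shape[0], 1)), X, sumsq])
    coef, *_ = np.linalg.lstsq(design, s, rcond=None)
    b = coef[-1]
    ideal = -coef[1:-1] / (2.0 * b)
    return ideal, b
```

The sign of `b` shows whether the scale values rise or fall with distance from the recovered point, which is what separates an ideal point from a "pessimal" one.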

Phase IV: The vector model
The vector model represents each subject's preference as a vector directed towards their region of maximum preference. The projections of the stimulus points onto the vector reproduce the subject's preference values. Moreover, the angle which the vector makes with each dimension can be thought of as representing the salience of that dimension in the preference judgement.
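
For the linear option, the vector model for a single subject can be sketched as an ordinary regression of the scale values on the stimulus coordinates. The Python/NumPy code below is an illustration under that assumption, not the program's routine; `fit_vector_model` and its argument names are hypothetical.

```python
import numpy as np

def fit_vector_model(X, s):
    """X: (p, r) stimulus coordinates; s: (p,) one subject's scale values.

    Returns the direction cosines of the subject vector and the projections
    of the stimulus points onto it.
    """
    X = np.asarray(X, dtype=float)
    s = np.asarray(s, dtype=float)
    # Regress the scale values on a constant plus the r coordinates.
    design = np.hstack([np.ones((X.shape[0], 1)), X])
    coef, *_ = np.linalg.lstsq(design, s, rcond=None)
    slopes = coef[1:]
    # Scaling the slopes to unit length gives the direction cosines.
    cosines = slopes / np.linalg.norm(slopes)
    projections = X @ cosines
    return cosines, projections
```

A large cosine on one dimension means the vector lies close to that axis, which is the sense in which the angle reflects the salience of the dimension.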

The program thus forms a hierarchy of models in the sense that each higher numbered phase is a special case of the lower numbered.

The user may choose which of the models to apply by means of the parameters S-PHASE and E-PHASE (Start-phase and End-phase) which may have the same value, in which case only that one model is implemented. By default, the program will begin at phase I and compute solutions by the other models in order of increasing restriction to end at phase IV.

The program computes the F statistic as a measure of improvement as it moves from phase to phase.

Phase IV of PREFMAP is analogous to PROFIT and is the external version of MDPREF.

Transformations
In addition to containing four separate models for representing preference data, PREFMAP allows the user to choose the form of the transformation function linking data to the solution, which is tantamount to defining the level of measurement of the preference data.

The linear option
By specifying FIT TYPE(0) in the PARAMETERS statement the user signifies that, in phases I-III, the distances from the subject's ideal point to the stimulus points will be (a least-squares approximation to) a linear transformation of the rating values ascribed to the stimuli by the subject. In phase IV it is the distances between the projections of the points onto the subject's vector which are so related.

This option is applicable if the data are believed to have interval (or, indeed ratio) level measurement properties.

The non-linear (monotone) option
Alternatively, the user may believe that the data will bear only ordinal interpretation and that the solution distances should be allowed to be (as close as possible to) some ordinal transformation of the original data values. This option is chosen by specifying FIT TYPE(1) in the PARAMETERS statement. FIT TYPE(1) should, however, be chosen only if there are no tied data values. If there are equal data values then the user has the choice of two monotonic approaches:

The secondary approach to ties (FIT TYPE(2))
FIT TYPE(2) indicates that the information contained in the ties is important and that ties should be matched by ties in the fitting values. That is to say, any equal data values which have unequal distances associated with them in the solution will decrease the goodness-of-fit of solution to data.

The primary approach to ties (FIT TYPE(3))
If, however, the user believes that the existence of tied data values is not significant information, they may specify FIT TYPE(3). The program will then not be constrained to match equal data with equal fitting values, and will 'break' ties if in so doing the goodness-of-fit is improved.

The PREFMAP algorithm
The linear procedure involves at each phase the solution of a multiple regression equation. The coefficients of the equation are output as beta-weights. The non-linear procedure first solves the linear regression and then uses the linear solution as the starting-point for the monotone regression, which is an iterative procedure.
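
The monotone regression step can be illustrated with the standard pool-adjacent-violators method. The Python sketch below is an assumption about the general technique rather than a description of PREFMAP's routine: the function name `monotone_regression` is hypothetical, it covers only data without ties, and it assumes larger data values should correspond to larger distances.

```python
def monotone_regression(data, distances):
    """Least-squares non-decreasing fit to `distances`, taken in the order of `data`."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    blocks = []  # each block holds [sum of distances, number of points]
    for i in order:
        blocks.append([distances[i], 1])
        # Pool adjacent blocks while the earlier mean exceeds the later mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Every point in a block receives the block mean as its fitting value.
    fitted = [0.0] * len(data)
    position = 0
    for total, count in blocks:
        for _ in range(count):
            fitted[order[position]] = total / count
            position += 1
    return fitted
```

In an iterative scheme of the kind described, such fitting values would replace the data in the next regression until the change in fit falls below a criterion.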

Quasi-Internal analysis
Although PREFMAP is primarily a program for external analysis, the facility exists to perform a quasi-internal analysis. If no configuration is input by the user, the program will generate one from the preference data themselves by an Eckart-Young factorisation of the minor product of the preference data matrix, which has often been transformed in some way (see below). The user has a number of options in choosing how this configuration is formed using the parameter GENERATE :

GENERATE(0)
When GENERATE is given the value 0 the preference matrix is double-centred: both row- and column-means are extracted and the overall mean of the matrix is added back in, so that in essence only interaction effects remain. The configuration is then formed in the manner indicated above. When this option is chosen with FIT TYPE(0) and S-PHASE(3), the resulting analysis is equivalent to internal metric unfolding.
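
Double-centring followed by a factorisation can be sketched in Python with NumPy as follows. This is an illustration only: the function name `generate_configuration` is hypothetical, the factorisation is obtained here from a singular value decomposition, and taking the stimulus coordinates as right singular vectors scaled by their singular values is an assumption, since the scaling PREFMAP applies is not stated here.

```python
import numpy as np

def generate_configuration(P, r):
    """P: (N, p) preference matrix, subjects by stimuli; r: number of dimensions.

    Returns a (p, r) stimulus configuration from the double-centred matrix.
    """
    P = np.asarray(P, dtype=float)
    # Remove row and column means, then add the grand mean back in.
    centred = (P
               - P.mean(axis=1, keepdims=True)
               - P.mean(axis=0, keepdims=True)
               + P.mean())
    # Singular value decomposition of the centred matrix.
    U, sv, Vt = np.linalg.svd(centred, full_matrices=False)
    # Assumed scaling: right singular vectors times singular values.
    return Vt[:r].T * sv[:r]
```

Because every row of the double-centred matrix sums to zero, each column of the resulting configuration is centred at the origin.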

GENERATE(1)
Alternatively, before the configuration is formed only the row-means of the preference matrix are extracted. This has the effect of removing from the configuration any influence due to the actual values used by different subjects, though not any effect due to the spread in the scores.

GENERATE(2)
If, in addition, the user wishes to remove influence of the actual spread of preference scores from the initial configuration, then choice of GENERATE(2) instructs the program to standardise the preference scores before extracting the configuration.

GENERATE(3)
The user may wish simply to remove the row effects and the column effects without fully double-centring the matrix. In this case the user should choose GENERATE(3).

These options are not operative when READ CONFIG is used to input a separate stimulus configuration.

Normalisation of scale values
Independent of whether the analysis is external or internal and what information is taken into account if and when generating the configuration, the user may choose whether, in mapping the preference data into the configuration, (s)he wishes to ipsatize the subjects' data. If NORMALISE (1) is chosen each subject's preference scores are normalised by removing the subject's mean score and dividing by their standard deviation. The default option NORMALISE (0) leaves the scores unchanged.
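
Row normalisation of this kind can be sketched in Python as below. The function name `normalise_rows` is hypothetical, and the use of the population standard deviation (dividing by the number of stimuli) is an assumption; the divisor PREFMAP uses is not stated here.

```python
from statistics import mean, pstdev

def normalise_rows(P):
    """Subtract each subject's mean score and divide by its standard deviation."""
    normalised = []
    for row in P:
        m = mean(row)
        sd = pstdev(row)  # population standard deviation (assumed)
        if sd == 0:
            raise ValueError("a subject with constant scores cannot be normalised")
        normalised.append([(v - m) / sd for v in row])
    return normalised
```

After this step every subject's scores have mean zero and unit spread, so differences between subjects in the level and range of the values they use no longer affect the mapping.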

Canonical rotation: the KEEP parameter
In phase II subjects are thought of as applying evaluative weights to the dimensions of the stimulus space. The orientation of these axes is then, in a substantive sense, non-arbitrary. As part of the solution to phase I (where arbitrary rotation of the axes is allowed) the program produces a canonically-rotated space. This is the stimulus configuration with its axes rotated to those of an "average subject" whose data are formed by averaging the subjects' preference scores. These axes have certain optimal properties and will not normally correspond to (e.g.) the principal axes of the configuration.

In the normal way the program, in passing from phase I to phase II, will use this average subject space as the basis of its calculations in this and lower phases. If, however, the user feels that the axes of the input configuration have substantive significance and wishes to retain them in the lower phases, then (s)he should set the parameter KEEP (1) in the PARAMETERS statement.

If a solution is not being computed in phase I, then the program will simply use the original input configuration.

Initial weight estimates
If the user begins the analysis at phase III (i.e. S-PHASE (3)) then (s)he has the option of having the program read in estimates of the dimension weights. This is done by means of the READ WEIGHTS command. The weights (one per dimension) should follow READ WEIGHTS, in free format, or optionally according to an associated INPUT FORMAT specification.

The BETA weights
In the linear procedure the program seeks to represent the preference data as a linear function of the (squared) distance between the subject's ideal point/vector and the stimulus points. This is done in a multiple regression. For this to be done the distance equation must be multiplied out into its constituent terms. Since the form of the distance equation differs in each phase with the differing weighting options, the number of terms involved in this expansion will differ at each phase. The program estimates a standardised regression coefficient for each of the terms in this expansion. These are the so-called BETA weights.
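
The way the number of terms grows from phase IV to phase I can be sketched by listing the regressors that the expanded squared-distance equation produces for one stimulus. This Python sketch is illustrative: the function name `regression_terms` is hypothetical, the constant term is left out, and whether the program reports exactly these terms as BETA weights is an assumption. For r dimensions it yields r terms in phase IV, r + 1 in phase III, 2r in phase II and r + r(r+1)/2 in phase I.

```python
from itertools import combinations_with_replacement

def regression_terms(x, phase):
    """Regressors for one stimulus with coordinates x (length r), constant excluded."""
    x = list(x)
    if phase == 4:
        # Vector model: the coordinates only.
        return x
    if phase == 3:
        # Simple unfolding: coordinates plus a single sum-of-squares term.
        return x + [sum(v * v for v in x)]
    if phase == 2:
        # Weighted unfolding: a separate squared term for each dimension.
        return x + [v * v for v in x]
    if phase == 1:
        # General unfolding: all squares and cross-products as well.
        return x + [a * b for a, b in combinations_with_replacement(x, 2)]
    raise ValueError("phase must be 1, 2, 3 or 4")
```

Each phase's regressors can be formed as linear combinations of those of the phase before it, which is the nesting that the between-phase comparisons rely on.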

The F-statistic
The F-statistic is computed within phases, to ascertain whether the variance explained by the model significantly differs from the residual (unexplained) variance.

Between phases it measures the extent to which the higher-level model explains significantly more variation than the lower level one.

In both cases the F-statistic is only approximate since the stages are not independent.
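
A between-phase comparison of this kind has the form of the usual F-test for nested regression models. The Python sketch below shows that textbook form only: the function name `between_phase_f` and its arguments are hypothetical, and the degrees of freedom PREFMAP actually uses are not given here.

```python
def between_phase_f(r2_general, r2_restricted, k_general, k_restricted, n_obs):
    """F-ratio for the gain in explained variance of the more general model.

    r2_*: squared multiple correlations; k_*: numbers of regression terms
    (constant excluded); n_obs: number of stimuli fitted for the subject.
    """
    df1 = k_general - k_restricted
    df2 = n_obs - k_general - 1
    if df1 <= 0 or df2 <= 0:
        raise ValueError("degrees of freedom must be positive")
    # Extra variance explained per added term, over residual variance per df.
    f = ((r2_general - r2_restricted) / df1) / ((1.0 - r2_general) / df2)
    return f, df1, df2
```

With few stimuli and many dimensions the denominator degrees of freedom fall quickly, which is one practical reason to treat the resulting values as a rough guide.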

Phases and strategies
It should be noted that choice of S-PHASE affects the form of the subsequent analysis. In particular note the use of the canonical reference space noted above. Further considerations include the following:

When S-PHASE (1)
At the end of this phase the original input configuration is replaced by the phase 1 "private space" of the average subject. The rationale is that the average subject's configuration is likely to provide a better fit than the original configuration to these data at subsequent stages.

When S-PHASE (2)
At the end of this phase, the original input configuration is replaced by the phase 2 "private space" of the average subject (i.e. differentially weighted to conform to the averaged data). This removes the canonical reference space for subsequent phases.

When S-PHASE (3)
If the user begins at phase 1 or 2, it is quite possible that the individual ideal point at phase 3 will be a "saddle point", i.e. a point which is a mixture of optimal preference on some one or more dimensions (signalled by a positive value) and pessimal preference (negative value) on others. However, if one begins with S-PHASE (3), the original configuration is unchanged and all ideal points will be positive. In many applications the user may wish to start at a higher phase and finish at phase 3 or 4, but then perform a separate run with S-PHASE(3) if the simple unfolding model turns out to be adequate. In this case a solution is obtained where all ideal points are constrained to be positive.

When S-PHASE (4)
The program will simply implement the vector model.

INPUT COMMANDS

Keyword                                          Function
N OF STIMULI    [number]                         Number of stimuli in the analysis
N OF SUBJECTS   [number]                         Number of subjects/cases
DIMENSIONS      [number]                         Dimensions for analysis
LABELS          [followed by a series            Optionally identify the stimuli
                of labels (<= 65 chars),         and subjects. Labels should identify
                each on a separate line]         first the stimuli (columns) and
                                                 then the subjects (rows) of the
                                                 data matrix, without omissions.

PARAMETERS
Keyword      Default Value   Function
S-PHASE      1               Sets starting-phase for analysis.
E-PHASE      4               Sets end-phase for analysis.
NORMALISE    1               0: Preference matrix not normalised.
                             1: Matrix is row-normalised.
AVERAGE      1               0: Average subject's scores are
                                computed in starting phase only.
                             1: Average subject's scores are
                                computed at all phases.
DATA TYPE    1               1: Data are ranks: first stimulus
                                most preferred.
                             2: Data are ranks: first stimulus
                                least preferred.
                             3: Data are scores: highest value
                                = most preferred.
                             4: Data are scores: lowest value
                                = most preferred.
FIT TYPE     0               0: Linear fit.
                             1: Monotone fit: no ties.
                             2: Monotone fit: secondary approach
                                to ties.
                             3: Monotone fit: primary approach
                                to ties.
MATFORM      0               0: Input configuration stimuli
                                (rows) by dimensions (columns).
                             1: Input configuration dimensions
                                (rows) by stimuli (columns).
ORIGINAL     0               0: At each new phase begin with
                                fitting values from previous phase.
                             1: At each new phase return to
                                original preference matrix.
                             (Applicable only if FIT TYPE(0))
KEEP         1               0: Stimulus configuration becomes
                                the canonically rotated space
                                from phase 1 or 2.
                             1: Return to original configuration.
CRITERION    0.005           Sets the criterion for terminating
                             iterations.
                             (Not applicable if FIT TYPE(0))
GENERATE     0               0: Configuration is generated from
                                the double-centred preference matrix.
                             1: Configuration is generated after
                                row-means of preference matrix
                                are extracted.
                             2: Configuration is generated from
                                preference matrix with standardised
                                rows.
                             3: Configuration is generated from
                                preference matrix with both row-
                                and column-means extracted.

NOTES
1. In the parameters S-PHASE and E-PHASE, the hyphens are significant characters.
2. The preference data are submitted to the program as a matrix of real numbers.
3. The data matrix must have subjects as rows and stimuli as columns.
4. Only one dimensionality per run (task) may be analysed by PREFMAP.

PRINT, PLOT AND PUNCH OPTIONS
Since the output from PREFMAP is extensive, the form of the PRINT, PLOT and PUNCH commands has been modified to limit the amount of information generated.

The format of the commands remains the same, viz:
PRINT           ALL
PLOT            ALL BUT   keyword (arg.), keyword(arg.)...
PUNCH          EXCEPT

In other NewMDSX procedures the argument takes the form of a number (or numbers) which indicate the dimensionality (dimensionalities) for which the particular output specified by the keyword is required. In PREFMAP the analysis is performed in only one dimensionality per task and thus the argument refers to the phase (or phases) for which the particular output is required. Obviously then, the argument will consist of numbers between 1 and 4 only.

Specification of ALL in the PRINT command will generate a detailed set of output for all phases from S-PHASE to E-PHASE. This output is voluminous and its use is not recommended. Default options, as usual, provide the essential information.

PRINT options
(N = number of subjects, p = number of stimuli, r = number of dimensions.)

Option       Form           Description
GENERATED    p x r          If the program generates the stimulus
                            configuration from the preference matrix
                            then this is output.
STIMULI      p x r          The 'solution' matrix of stimulus
                            coordinates is output. This will differ
                            at each phase unless KEEP CONFIG is
                            specified.
SUBJECTS     N x r          The matrix of subject locations is output.
                            For phases I-III the matrix is of point
                            coordinates. For phase IV direction
                            cosines are output.
ROTATIONS    N matrices     Phase I only. For each subject the cosines
             each r x r     of the angles through which each
                            dimension is rotated are output.
WEIGHTS      N x r          Phases I-III only. The weights which each
                            subject assigns to each dimension
                            are output.
COMPOSITE    matrices       Phase I only. The composite transformation
             each r x r     matrix for each subject is output.
                            This is the matrix which transformed the
                            original space into the weighted and
                            rotated 'private' space.
IDEAL        N x r          Phases I and II only. The coordinates of
                            the subject ideal points in the 'private'
                            spaces (i.e. the space as rotated and/or
                            weighted) are output.
PRIVATE      N matrices     Phases I and II only. For each subject
             each p x r     the coordinates of the stimuli with
                            respect to the axes of the rotated and/or
                            weighted space are output.
BETA         -              The regression weights for each subject
                            are output.
NORMALISED   N x p          This keyword takes no argument and the
                            matrix of preference values as normalised
                            will be output once only.
FITTING      N x p          For the linear procedure the estimated
                            values are output. For the non-linear
                            procedure the fitting values are output.
DISTANCES    N x p          Phases I-III: The matrix of squared
                            distances between each subject and the
                            stimuli is output.
                            Phase IV: The projections of the stimulus
                            points onto each vector are output.
RESIDUALS    N x p          The residual values (DISTANCES - FITTING)
                            are output.
JOINT        N x r          Prints both the subject and stimuli final
             p x r          co-ordinates.

By default, the following are output:
STIMULI
SUBJECTS
ROTATIONS (phase I)
IDEAL (phases I-II)
Within- and between-phase F-tests
Individual correlation of data to solution

PLOT options
Option                          Description
INITIAL               This keyword takes no argument. The configuration
                          of stimulus points as input is plotted once only.
STIMULI              The configuration of stimulus points at each
                          selected phase is plotted. There will be r(r-1)/2
                          two-way plots.
SUBJECTS           The configuration of subject ideal points (vectors)
                          is plotted in the selected phases in the form of
                           r(r-1)/2 plots.
JOINT                 The configuration of stimulus points and subject
                          points (vectors) is plotted.
SHEPARD            A Shepard diagram of the data values plotted
                          against the (squared) distances is produced.
CORRELATIONS   A series of plots of individual (subject)
                          correlations to the solution is produced.
RESIDUALS         A histogram of residual values is produced.

By default only the first two dimensions of the joint space will be plotted.

PUNCH options (to secondary output file)
Option                   Description
SPSS                  Each line of the output matrix contains the
                          following for each phase selected
                          I : the subject index
                          J : the stimulus index
                          INPUT : the corresponding data value
                          NORMALISED: the data value as normalised
                          FITTING : the corresponding fitting value
                          DISTANCE : the squared distance between I and J
                          RESIDUAL : the corresponding residual value
                          in a fixed format.
GENERATED        The matrix of stimulus values if this is generated
                          by the program.
SOLUTION           The two solution matrices, i.e. the matrix
                          of stimulus coordinates and the matrix of subject
                          coordinates (or, in phase IV, cosines).

By default, no secondary output file is produced.

PROGRAM LIMITS
Maximum no. of subjects = 200
Maximum no. of stimuli = 200
Maximum no. of dimensions = 5