Multi-Dimensional Scaling for SORTing data : MDSORT

The basic operation of sorting consists of subjects allocating a set of objects into categories of their own choosing. The researcher usually defines a common set of "objects" (stimuli, statements, names, artefacts, pictures) and then typically asks each of the n subjects to sort the p objects into a subject-chosen number c of groups/categories. Mathematically, a subject's sorting is a partition of the object set O = {o_1, ..., o_p} into c categories:

$$\{C_1, \ldots, C_c\}, \qquad C_k \cap C_l = \emptyset \quad (k \neq l), \qquad C_1 \cup \cdots \cup C_c = O.$$

The most important characteristic of a partition is that the categories of a subject's sorting must be mutually exclusive and exhaustive, i.e. each object must be sorted into one, and only one, category. This allows an object to be put into a category by itself, but it explicitly disallows overlapping categories. Sorting data are therefore, at least initially, at the nominal level of measurement. Note that sorting data can also be coded in Burt matrices and analysed with Correspondence Analysis (CORRESP).
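The sketch below shows the partition requirement for one subject's piles, given as lists of object identifiers. It checks that categories are mutually exclusive and exhaustive, and it allows singletons. The function name and the list-of-lists input are our own illustration, not part of MDSORT.

```python
def is_partition(groups, objects):
    """Hypothetical check: True if `groups` places every object in exactly one group.

    Singleton groups are allowed. Overlapping groups, omitted objects, unknown
    objects and empty groups are rejected.
    """
    placed = [obj for group in groups for obj in group]
    return (
        all(len(group) > 0 for group in groups)
        and len(placed) == len(set(placed))  # mutually exclusive: nothing placed twice
        and set(placed) == set(objects)      # exhaustive: everything placed, nothing extra
    )


print(is_partition([["a", "b"], ["c"]], ["a", "b", "c"]))       # True
print(is_partition([["a", "b"], ["b", "c"]], ["a", "b", "c"]))  # False: "b" overlaps
```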

The model implemented in MDSORT is designed specifically for the direct analysis of sorting data. It was developed to generate a joint representation of the objects and the subjects' categories, which simultaneously scales and represents the sorting data. Takane's (1980) model takes the data as a matrix F consisting of n row vectors, one for each subject i. Each column refers to a given object j, and the entry f(i,j) is the number of the category/group in which object j is located. The categories are numbered sequentially (but arbitrarily), that is:

$$F = [f_{ij}], \qquad i = 1, \ldots, n, \quad j = 1, \ldots, p, \qquad f_{ij} \in \{1, 2, \ldots, c_i\},$$

where c_i is the number of categories used by subject i, and f(i,j) = k means that object j occurs in category k of subject i's sorting.
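The category numbers in a row of F are arbitrary, so any relabelling that keeps the same objects together describes the same sorting. The sketch below renumbers each subject's labels to 1, 2, ..., c_i in order of first appearance and stacks the rows into F. The helper names are hypothetical, and this is not the program's own input routine.

```python
def renumber(row):
    """Map one subject's arbitrary category labels to 1, 2, ..., c_i by first appearance."""
    codes = {}
    for label in row:
        codes.setdefault(label, len(codes) + 1)
    return [codes[label] for label in row]


def build_f(sortings):
    """Stack renumbered rows into the n-by-p matrix F, one row per subject."""
    p = len(sortings[0])
    for i, row in enumerate(sortings, start=1):
        if len(row) != p:
            raise ValueError(f"subject {i} sorted {len(row)} objects, expected {p}")
    return [renumber(row) for row in sortings]


print(build_f([["a", "b", "a", "c"], [7, 7, 3, 3]]))  # [[1, 2, 1, 3], [1, 1, 2, 2]]
```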

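The data matrix F is then expanded into a set of individual indicator matrices G_i, one per subject. Each has p rows (objects) and c_i columns (categories), where c_i may differ from subject to subject in free sorting. Their entries are

$$g^{(i)}_{jk} = \begin{cases} 1 & \text{if } f_{ij} = k, \\ 0 & \text{otherwise.} \end{cases}$$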
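The sketch below expands one row of F into its indicator matrix G_i. It assumes the categories are already numbered 1..c_i. The column sums of G_i are the category sizes. Function names are hypothetical.

```python
def indicator_matrix(row):
    """Expand one row of F (categories 1..c_i) into the p-by-c_i 0/1 matrix G_i."""
    c = max(row)
    return [[1 if f_ij == k else 0 for k in range(1, c + 1)] for f_ij in row]


def category_sizes(g):
    """Column sums of G_i, i.e. how many objects fall in each category."""
    return [sum(column) for column in zip(*g)]


g = indicator_matrix([1, 2, 1, 3])
print(g)                  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(category_sizes(g))  # [2, 1, 1]
```

Each row of G_i contains exactly one 1, which is the partition property in matrix form.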

The program generates a joint scaling by decomposing the data matrix. The major feature of the model is that the decomposition simultaneously locates both the object points and each subject's category centroids. This is the form of individual difference the model allows: each subject is represented by a set of category centroids rather than by a single ideal point.

The intention is to obtain a configuration of stimulus/object points such that the sum of squared inter-category distances (averaged over subjects) is maximized under suitable normalization restrictions. MDSORT determines a matrix X of coordinates of the p objects in a minimal, user-chosen dimensionality r. The squared distances between category centroids can be expressed in terms of X, so the criterion becomes the maximization of tr(X'BX), where B is the mean of the subject-specific similarity matrices. It is important to note that these similarity values are scaled according to the size of the categories on which they are based. The similarity between two objects sorted into the same group is therefore inversely related to the size of that group. The raw co-occurrence counts may also be output and submitted to other scaling routines within NewMDSX for comparison.

With the added restriction for the multidimensional case:

$$X'X = I_r,$$

the X that maximizes tr(X'BX) is the matrix of normalized eigenvectors of B corresponding to its r dominant eigenvalues. The centering requirement is satisfied by excluding the constant eigenvector. Once X has been obtained in this way, the category centroids for each subject can be derived from it together with the original data matrix.
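The steps above can be sketched in plain Python. This sketch rests on several assumptions:

- B is taken as the average over subjects of G_i D_i^{-1} G_i', where D_i holds the category sizes. This matches the description of similarities scaled inversely by category size, but it is our reconstruction.
- The eigenvectors come from a textbook cyclic Jacobi routine, not from whatever method NewMDSX uses internally.
- Each centroid is taken as the mean of its objects' coordinates.
- The helper names are hypothetical.

```python
import math


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and orthonormal eigenvectors (columns of v) of a symmetric
    matrix, by the cyclic Jacobi rotation method."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # columns p, q:  a <- a R,  v <- v R
                    for k in range(n):
                        x, y = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * x - s * y, s * x + c * y
                for k in range(n):  # rows p, q:  a <- R' a
                    x, y = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * x - s * y, s * x + c * y
    return [a[i][i] for i in range(n)], v


def mdsort_sketch(f, r):
    """Hypothetical helper: top-r eigenvalues, object coordinates X (p x r) and
    per-subject category centroids from the n-by-p matrix F (categories 1..c_i)."""
    n, p = len(f), len(f[0])
    # Assumed B: each subject adds 1/(size of the shared category) for objects
    # sorted together; B is the average over the n subjects.
    b = [[0.0] * p for _ in range(p)]
    for row in f:
        size = {k: row.count(k) for k in set(row)}
        for j in range(p):
            for l in range(p):
                if row[j] == row[l]:
                    b[j][l] += 1.0 / (size[row[j]] * n)
    # Every row of B sums to 1, so the constant vector is an eigenvector with
    # eigenvalue 1. Subtracting 1/p from every entry moves that eigenvalue to 0
    # and leaves all eigenvectors orthogonal to it unchanged.
    bc = [[value - 1.0 / p for value in row] for row in b]
    vals, vecs = jacobi_eigh(bc)
    top = sorted(range(p), key=lambda k: vals[k], reverse=True)[:r]
    # Unit-length, mutually orthogonal columns, so X'X = I_r.
    x = [[vecs[j][k] for k in top] for j in range(p)]
    # Assumed: a category centroid is the mean of its objects' coordinates.
    centroids = []
    for row in f:
        subject = {}
        for k in sorted(set(row)):
            members = [j for j in range(p) if row[j] == k]
            subject[k] = [sum(x[j][d] for j in members) / len(members) for d in range(r)]
        centroids.append(subject)
    return [vals[k] for k in top], x, centroids


vals, x, centroids = mdsort_sketch([[1, 1, 2, 2], [1, 1, 2, 2]], r=1)
```

The sketch does not search for the constant eigenvector and drop it. Instead it subtracts 1/p from every entry of B, which sends that eigenvector's eigenvalue to zero and so has the same effect. This stays well defined even when B has other eigenvalues equal to 1, for example when all subjects agree.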

INPUT for MDSORT
The number of principal components (dimensions) to be listed is set by the DIMENSIONS statement. The number of columns in the input data is given by N OF STIMULI and the number of rows by N OF SUBJECTS, so each row holds one subject's category numbers for all the stimuli. The input matrix is read by the READ DATA command. By default, input is assumed to be in free format. If an INPUT FORMAT specification is used, it should read one line of integer values, one for each of the N OF STIMULI stimuli. The optional LABELS command, followed by a series of labels (at most 65 characters each), one per line, identifies the stimuli in order, with no omissions.
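The hypothetical helper below prepares a data file. It renders a sorting matrix in free format, with one subject per line and one integer per stimulus, and checks the label requirements described above. It does not write the surrounding NewMDSX commands, whose exact syntax is not reproduced in this section.

```python
def format_input(f, labels=None):
    """Hypothetical helper: render F as free-format data lines plus optional labels.

    Each data line holds one subject's integer category numbers, one per stimulus.
    Labels must name every stimulus, in order, and be at most 65 characters long.
    """
    p = len(f[0])
    for i, row in enumerate(f, start=1):
        if len(row) != p or not all(isinstance(v, int) for v in row):
            raise ValueError(f"subject {i}: expected {p} integer category numbers")
    data_lines = [" ".join(str(v) for v in row) for row in f]
    if labels is None:
        return data_lines, []
    if len(labels) != p:
        raise ValueError(f"expected {p} labels, one per stimulus, got {len(labels)}")
    too_long = [s for s in labels if len(s) > 65]
    if too_long:
        raise ValueError(f"labels longer than 65 characters: {too_long}")
    return data_lines, list(labels)


data, labs = format_input([[1, 1, 2], [1, 2, 2]], ["red", "green", "blue"])
print("\n".join(data))  # prints "1 1 2" then "1 2 2"
```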

OUTPUT

PRINT options (to main output file)
Option            Description
SIMILARITIES      Outputs the matrix of similarities B between the
                  stimuli, derived from the input data.
CO-OCCURRENCES    Outputs the matrix of raw co-occurrences of the
                  stimuli in categories.
CLUSTERS          Outputs the set of individual cluster centroids
                  corresponding to the overall similarities.
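The two matrices behind SIMILARITIES and CO-OCCURRENCES can be sketched as follows. The scaling is our reconstruction: subject i contributes 1/n_ik to b_jl when objects j and l share category k of size n_ik, and B averages over the n subjects. The raw co-occurrence count is simply the number of subjects who put j and l together. Exact fractions keep the arithmetic easy to check. The function name is hypothetical, and this is not the program's output format.

```python
from fractions import Fraction


def similarity_and_cooccurrence(f):
    """Hypothetical helper: return (B, counts) for the n-by-p sorting matrix F.

    Assumed scaling: subject i adds 1/(n * n_ik) to b[j][l] when objects j and l
    share category k of size n_ik, so B is the average of per-subject matrices.
    counts[j][l] is the number of subjects who put j and l in the same category.
    """
    n, p = len(f), len(f[0])
    b = [[Fraction(0)] * p for _ in range(p)]
    counts = [[0] * p for _ in range(p)]
    for row in f:
        size = {k: row.count(k) for k in set(row)}  # category sizes n_ik
        for j in range(p):
            for l in range(p):
                if row[j] == row[l]:
                    b[j][l] += Fraction(1, n * size[row[j]])
                    counts[j][l] += 1
    return b, counts


b, counts = similarity_and_cooccurrence([[1, 1, 2], [1, 2, 2]])
```

Under this scaling each row of B sums to exactly 1 for any input, which is why the constant vector is an eigenvector of B.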

PLOT options (to main output file)
Option            Description
STIMULI           Plots the stimulus configuration in the number of
                  normalized principal components specified by the
                  DIMENSIONS statement.
CLUSTERS          Plots the set of individual cluster centroid
                  configurations. If N OF SUBJECTS is more than a
                  small number, this option may produce a rather
                  large amount of output.

NOTES
1. The READ DATA command is obligatory in MDSORT.
2. No secondary output file is produced by MDSORT.
3. No PARAMETERS are used by MDSORT.

PROGRAM LIMITS
Maximum no. of subjects = 200
Maximum no. of stimuli = 200
Maximum dimensions = 8
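A pre-flight check against these limits might look like the hypothetical helper below. The final test is our own inference, not a stated program limit: DIMENSIONS should be at most N OF STIMULI - 1, because excluding the constant eigenvector leaves only p - 1 eigenvectors of B.

```python
MAX_SUBJECTS = 200
MAX_STIMULI = 200
MAX_DIMENSIONS = 8


def check_limits(n_subjects, n_stimuli, dimensions):
    """Hypothetical check: list the ways a planned run breaks the limits (empty = fits)."""
    problems = []
    if not 1 <= n_subjects <= MAX_SUBJECTS:
        problems.append(f"N OF SUBJECTS must be 1..{MAX_SUBJECTS}, got {n_subjects}")
    if not 1 <= n_stimuli <= MAX_STIMULI:
        problems.append(f"N OF STIMULI must be 1..{MAX_STIMULI}, got {n_stimuli}")
    if not 1 <= dimensions <= MAX_DIMENSIONS:
        problems.append(f"DIMENSIONS must be 1..{MAX_DIMENSIONS}, got {dimensions}")
    # Our inference, not a stated limit: with the constant eigenvector excluded,
    # only p - 1 eigenvectors of B remain to choose from.
    if dimensions > n_stimuli - 1:
        problems.append(f"DIMENSIONS cannot exceed N OF STIMULI - 1 = {n_stimuli - 1}")
    return problems


print(check_limits(50, 30, 3))  # []
```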

See also

  • The NewMDSX commands in full