CANonical DECOMPosition : CANDECOMP

CANDECOMP provides an internal analysis by decomposing a 3- to 7-way data matrix of (dis)similarities or correlations into a set of dimensional weights (one set per way) whose scalar products reproduce the data, under a linear transformation of the data.

DATA: N-way (‘rectangular’) matrix of dis/similarity or correlational measures
TRANSFORM: Linear
MODEL: N-way scalar-products

CANDECOMP takes as input a table of data values with between three and seven ways. In the solution, each way is represented by a configuration of points, one point per element of that way, in a space of chosen dimensionality. Each data value is regarded as the scalar product of the relevant elements, one from each way: the products of their coordinates on each dimension, summed over the dimensions. The program assumes that the data are at the interval level of measurement.
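The model can be sketched in a few lines, assuming Python with NumPy. This illustrates the scalar-product model only and is not the program's own code; the function name `nway_scalar_products` is hypothetical. Each way is a matrix with one row per element and one column per dimension, and every data value is the sum over dimensions of the products of the relevant coordinates.

```python
import numpy as np

def nway_scalar_products(factors):
    """Rebuild an N-way array from one coordinate matrix per way (illustrative sketch).

    factors[w] has shape (n_w, r): the n_w elements of way w in r dimensions.
    Entry (i1, ..., iN) is the sum over dimensions t of
    factors[0][i1, t] * factors[1][i2, t] * ... * factors[N-1][iN, t].
    """
    r = factors[0].shape[1]
    result = np.zeros(tuple(f.shape[0] for f in factors))
    for t in range(r):
        # The outer product of column t from every way is a rank-one N-way array.
        term = factors[0][:, t]
        for f in factors[1:]:
            term = np.multiply.outer(term, f[:, t])
        result += term
    return result

# Example: a three-way table (4 x 5 x 3) modelled in two dimensions.
rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(4, 2)), rng.normal(size=(5, 2)), rng.normal(size=(3, 2))
X = nway_scalar_products([A, B, C])
print(X.shape)  # (4, 5, 3)
```

The same function accepts four to seven coordinate matrices, giving one way per matrix.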

There are two basic forms of data input to CANDECOMP, which we will refer to as
1.  Canonical Decomposition analysis proper, and
2.  "Extended INDSCAL" analysis

1.  CANDECOMP proper

In the general CANDECOMP case all the ways are considered distinct, and up to seven ways are allowed. For example, a four-way CANDECOMP might consist of a set of subjects (W1) who make preference ratings of a set of objects (W2) in different experimental conditions (W3) at different points in time (W4). The default parameter values produce this analysis.

2.  The “extended INDSCAL” analysis

What we call the 'extended INDSCAL' analysis refers to the case where the user wishes to extend a conventional INDSCAL analysis to include not only a third way (e.g. subjects) but further ways as well. Because two of the ways refer to the same set of stimuli, the number of modes will always be one less than the number of ways in the data. For instance, the user may have a set of two-way one-mode dis/similarity matrices which exist both for individuals (third way) and for different points in time (fourth way). The solution will give a configuration (the Group Space) and a set of dimensional weights for each remaining way: in this case, one set for the individuals and another for the points in time.
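A minimal sketch, assuming NumPy and using hypothetical names, of how the extended INDSCAL model builds a four-way array with only three modes: the same Group Space is used on both stimulus ways, and each remaining way contributes only dimension weights. It illustrates the model, not the program's fitting procedure.

```python
import numpy as np

def extended_indscal_model(group_space, weights_per_way):
    """Model values for the 'extended INDSCAL' case (illustrative sketch).

    group_space: (n_stimuli, r) configuration shared by the two stimulus ways.
    weights_per_way: list of (n_w, r) weight matrices, one per remaining way
    (e.g. individuals, then points in time).
    Returns an array of shape (n_stimuli, n_stimuli, n_1, ..., n_k).
    """
    r = group_space.shape[1]
    shape = (group_space.shape[0],) * 2 + tuple(w.shape[0] for w in weights_per_way)
    result = np.zeros(shape)
    for t in range(r):
        x = group_space[:, t]
        term = np.multiply.outer(x, x)  # same configuration on both stimulus ways
        for w in weights_per_way:
            term = np.multiply.outer(term, w[:, t])  # weight for dimension t in this way
        result += term
    return result

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))          # Group Space: 6 stimuli in 2 dimensions
W_subj = rng.uniform(size=(4, 2))    # weights for 4 individuals
W_time = rng.uniform(size=(3, 2))    # weights for 3 points in time
S = extended_indscal_model(X, [W_subj, W_time])
print(S.shape)  # (6, 6, 4, 3)
```

Every two-way slice `S[:, :, k, l]` is symmetric, as a one-mode (dis)similarity matrix should be.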

For data of this type, SET MATRICES should be given the value 1 in the PARAMETERS command. The DATA TYPE parameter should also be given a suitable value. See below for a description of the use of the SIZES command.

3. Initial configuration

CANDECOMP is prone to suboptimal solutions, so users are advised to make a series of runs with different starting configurations. If several runs converge to similar solutions, this usually indicates that the global minimum has been found. When the extended INDSCAL analysis is requested (i.e. SET MATRICES(1)), an initial configuration may also be input.
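The effect of different starting configurations can be illustrated with a small alternating least squares (ALS) fit of a three-way model, the kind of algorithm commonly associated with CANDECOMP. This is a sketch under stated assumptions, not the program's algorithm: the names are hypothetical, `seed` only plays a role similar to the RANDOM parameter, and stopping when the improvement in the proportion of sum of squares accounted for falls below `criterion` is an assumed reading of CRITERION.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: row p * len(Q) + q holds P[p] * Q[q]."""
    return np.einsum('pr,qr->pqr', P, Q).reshape(-1, P.shape[1])

def cp_als(X, r, seed, criterion=0.005, max_iter=500):
    """Fit a 3-way CANDECOMP model by ALS from one random start (sketch).

    Returns the three coordinate matrices and the overall correlation between
    the data and the fitted scalar products.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.normal(size=(n, r)) for n in (I, J, K))
    # Unfoldings: X1[i, j*K + k], X2[j, i*K + k] and X3[k, i*J + j] all equal X[i, j, k].
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    total = np.sum(X ** 2)
    prev_fit = -np.inf
    for _ in range(max_iter):
        # Solve for each way's coordinates by least squares, holding the other two fixed.
        A = np.linalg.lstsq(khatri_rao(B, C), X1.T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T
        fitted = np.einsum('ir,jr,kr->ijk', A, B, C)
        fit = 1.0 - np.sum((X - fitted) ** 2) / total  # proportion of sum of squares accounted for
        if fit - prev_fit < criterion:  # assumed stopping rule, analogous to CRITERION
            break
        prev_fit = fit
    corr = np.corrcoef(X.ravel(), fitted.ravel())[0, 1]
    return A, B, C, corr

def best_of_random_starts(X, r, seeds, **kwargs):
    """Run cp_als from several random starts and keep the highest-correlation solution."""
    return max((cp_als(X, r, s, **kwargs) for s in seeds), key=lambda result: result[3])

# Example: exact rank-2 data; compare the overall correlation reached from three starts.
rng = np.random.default_rng(42)
X = np.einsum('ir,jr,kr->ijk', *(rng.normal(size=(n, 2)) for n in (5, 4, 3)))
for seed in (1, 2, 3):
    print(seed, round(cp_als(X, 2, seed, criterion=1e-8)[3], 4))
A, B, C, corr = best_of_random_starts(X, 2, seeds=range(5), criterion=1e-8)
```

Keeping the best of several starts mirrors the advice above; if the starts disagree markedly, more runs are warranted.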

To perform an external INDSCAL analysis, SET MATRICES(1) is also required, an initial configuration is input, and the FIX POINTS parameter is set to 1. This can be a useful option when there is a very large number of subjects: an INDSCAL Group Space obtained from a representative sample is used as the fixed initial configuration, and the program is then used to estimate the weights for successive batches of input subjects, each referring to the same fixed Group Space.
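With the Group Space held fixed, each subject's weights become a separate linear least-squares problem. The sketch below assumes NumPy, uses a hypothetical function name, and assumes the subjects' data have already been converted to symmetric scalar-product matrices; it illustrates the idea rather than the program's procedure.

```python
import numpy as np

def weights_for_fixed_space(group_space, matrices):
    """Estimate each subject's dimension weights with the Group Space held fixed (sketch).

    group_space: (n, r) fixed configuration, e.g. from a representative sample.
    matrices: sequence of (n, n) symmetric scalar-product matrices, one per subject
    (assumed already converted from the raw dis/similarities).
    Returns an array of shape (number of subjects, r).
    """
    # Column t is the vectorised rank-one matrix x_t x_t^T for dimension t.
    design = np.stack([np.outer(x, x).ravel() for x in group_space.T], axis=1)
    # Each subject's weights solve a separate linear least-squares problem.
    return np.array([np.linalg.lstsq(design, S.ravel(), rcond=None)[0] for S in matrices])

# Example: one batch of three subjects generated from known weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
true_w = rng.uniform(0.5, 2.0, size=(3, 2))
batch = [X @ np.diag(w) @ X.T for w in true_w]
estimated = weights_for_fixed_space(X, batch)
print(np.round(estimated, 3))
```

Further batches would be passed with the same `X`, so all batches refer to the same fixed Group Space.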

4. DATA

Data are read by the READ MATRIX instruction, in free format, or under an associated INPUT FORMAT specification. The dimensions of the input matrix are supplied by the SIZES command, which is peculiar to CANDECOMP. This replaces N OF SUBJECTS and N OF STIMULI, which are not recognized by this program. SIZES requires as its operand up to seven numbers, separated by commas or spaces, each of which is the number of objects in one of the ways of the matrix. As many numbers must be specified as there are ways in the data.
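The relationship between SIZES and the data can be sketched as follows, assuming NumPy. The function names are hypothetical, and the assumption that the last way varies fastest in the input stream is made only for illustration: the actual order is governed by the program's own conventions. The point shown is that the number of values read must equal the product of the SIZES entries.

```python
import numpy as np

def parse_sizes(operand):
    """Split a SIZES operand such as '4, 5 3' into a tuple of way sizes (sketch)."""
    sizes = tuple(int(tok) for tok in operand.replace(',', ' ').split())
    if not 3 <= len(sizes) <= 7:
        raise ValueError("CANDECOMP data must have between three and seven ways")
    return sizes

def read_nway(text, sizes):
    """Reshape free-format values into an N-way array of shape `sizes`.

    Assumption: values are listed with the last way varying fastest.
    """
    values = [float(tok) for tok in text.replace(',', ' ').split()]
    expected = int(np.prod(sizes))
    if len(values) != expected:
        raise ValueError(f"SIZES {sizes} needs {expected} values, got {len(values)}")
    return np.array(values).reshape(sizes)

sizes = parse_sizes("2, 3 2")
X = read_nway("1 2 3 4 5 6 7 8 9 10 11 12", sizes)
print(X[1, 0, 1])  # 8.0
```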

The order in which the ways are entered in the SIZES command is critical:

5. PARAMETERS in CANDECOMP

Keyword   Default     Function
DATA TYPE   0  0: An N-way table is input.
                       1: Lower triangle similarity matrix (without diagonals).
                       2: Lower triangle dissimilarity matrix (without diagonals).
                       3: Lower triangle Euclidean distances (without diagonals).
                       4: Lower triangle correlation matrix (without diagonals).
                       5: Lower-half covariance matrix (with diagonals).
                       6: Full symmetric similarity matrix (diagonals ignored).
                       7: Full symmetric dissimilarity matrix (diagonals ignored).

RANDOM    12345 (Any positive integer)
                        Seed for pseudo-random number generator.

SET MATRICES 0 0: The CANDECOMP analysis is performed.
                         1: The extended INDSCAL analysis is performed
                             (matrix 2 and matrix 3 are set equal).

FIX POINTS   0  0: Iterate and solve for all matrices.
                        1: One matrix is input, and held constant (external analysis).

CRITERION   0.005 (values between 0 and 1)
                            Sets improvement level for terminating iterations.

CENTRE        0  0: The matrices are not centred.
                       1: Each of the N ways is centred by subtracting
                           the appropriate mean (only applicable if DATA TYPE(0)).
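One plausible reading of CENTRE(1), sketched with NumPy under that assumption (the function name is hypothetical): subtract the mean along each way in turn. Because these operations commute, centring a later way does not disturb the zero means already obtained for earlier ways.

```python
import numpy as np

def centre_all_ways(X):
    """Subtract the mean along each way in turn (assumed reading of CENTRE(1))."""
    Xc = np.asarray(X, dtype=float)
    for axis in range(Xc.ndim):
        Xc = Xc - Xc.mean(axis=axis, keepdims=True)
    return Xc

rng = np.random.default_rng(0)
Xc = centre_all_ways(rng.normal(loc=5.0, size=(4, 3, 2)))
print(all(np.allclose(Xc.mean(axis=a), 0) for a in range(Xc.ndim)))  # True
```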

NOTES
1. The SIZES command is obligatory for CANDECOMP.
2. The commands N OF STIMULI and N OF SUBJECTS are not valid with CANDECOMP.
3. When DATA TYPE takes values 1 through 4, no diagonal is input; for value 5 the diagonal is input.
    For values 6 and 7 the diagonals are input but ignored.
4. In the parameters SET MATRICES and FIX POINTS the spaces are significant characters.
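For DATA TYPE values 1 to 4 each matrix arrives as a lower triangle without a diagonal. A minimal NumPy sketch of expanding such input to a full symmetric matrix, assuming row-wise order of the triangle; the function name and the fill value for the absent diagonal are hypothetical choices, not the program's.

```python
import numpy as np

def lower_triangle_to_full(values, n, diagonal=0.0):
    """Expand a lower triangle without diagonal into a full symmetric n x n matrix.

    Assumption: row-wise order, i.e. (2,1), (3,1), (3,2), (4,1), ... in 1-based terms.
    The diagonal, which is not input, is filled with `diagonal`.
    """
    values = np.asarray(values, dtype=float)
    if values.size != n * (n - 1) // 2:
        raise ValueError(f"expected {n * (n - 1) // 2} values for n = {n}")
    full = np.full((n, n), diagonal)
    rows, cols = np.tril_indices(n, k=-1)  # strictly-lower positions, row by row
    full[rows, cols] = values
    full[cols, rows] = values  # mirror into the upper triangle
    return full

M = lower_triangle_to_full([1, 2, 3, 4, 5, 6], 4)
print(M)
```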

6. PRINT options
(N denotes the number of ways in the analysis (3 ≤ N ≤ 7), and m the number of modes (2 ≤ m ≤ 7).)

Option          Form                 Description
INITIAL         N matrices are       The initial estimates of the configurations.
                output.              Each matrix contains the coordinates of the
                                     points in the required dimensions. If the
                                     user has input an initial configuration,
                                     matrices 2 and 3 will be identical.
FINAL           m matrices are       The solution configurations. Each matrix
                output.              contains the coordinates of the relevant
                                     number of points on the axes of the space.
                                     These are followed by the correlations
                                     between each subject's data and the
                                     solution. The matrix of cross-products
                                     between the dimensions is also output.
CORRELATIONS                         Correlations between the computed scores
                                     and the original data for each subject.
HISTORY                              The overall correlation at each iteration.
                                     The unnormalised matrices at convergence
                                     are also output (there will be N of these).

By default only the FINAL matrices and the overall correlation at convergence are output.

7. PLOT options
Option          Description
INITIAL         The initial configuration may be plotted only if one has
                been input by the user.
CORRELATIONS    The overall correlation at each iteration is plotted in the
                form of a histogram.
WAY1            For each way specified, r(r-1)/2 plots are produced,
WAY2            one for each pair of dimensions, where r is the
WAY3            dimensionality of the solution.
WAY4
WAY5
WAY6
WAY7

8. PUNCH options (to secondary output file)
Option          Description
FINAL           The configuration of points for each way in the chosen
                dimensionality is output in a fixed format.
CORRELATIONS    The overall correlation at each iteration is output
                in a fixed format.
Note: by default, no secondary output file is produced.

9. PROGRAM LIMITS
Maximum no. of ways = 7
Maximum no. of dimensions = 10
Maximum no. of elements per way = 100
Maximum Way1 x Way2 x Way3 = 18000
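A hypothetical helper for checking a planned analysis against the program limits before running it. It is a sketch in plain Python, assuming that the last limit is a maximum on the product of the first three SIZES values; it is not part of NewMDSX.

```python
def check_limits(sizes, n_dims):
    """Return a list of violations of the documented CANDECOMP limits (empty if none).

    Hypothetical helper; assumes Way1 x Way2 x Way3 means the product of the
    first three SIZES values.
    """
    problems = []
    if len(sizes) > 7:
        problems.append("more than 7 ways")
    if n_dims > 10:
        problems.append("more than 10 dimensions")
    for way, n in enumerate(sizes, start=1):
        if n > 100:
            problems.append(f"way {way} has {n} elements (maximum 100)")
    if len(sizes) >= 3 and sizes[0] * sizes[1] * sizes[2] > 18000:
        problems.append("Way1 x Way2 x Way3 exceeds 18000")
    return problems

print(check_limits((20, 20, 40), 2))  # []
print(check_limits((30, 30, 25), 3))  # ['Way1 x Way2 x Way3 exceeds 18000']
```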

See also

  • The NewMDSX commands in full