Canonical Correlation Analysis

Version info: Code for this page was tested in IBM SPSS 20.

Canonical correlation analysis is used to identify and measure the associations between two sets of variables. Canonical correlation is appropriate in the same situations where multiple regression would be, but where there are multiple intercorrelated outcome variables. Canonical correlation analysis determines a set of canonical variates, orthogonal linear combinations of the variables within each set that best explain the variability both within and between sets.

Please Note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.

Examples of canonical correlation analysis

Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.

Example 2. A researcher is interested in exploring associations among factors from two multidimensional personality tests, the MMPI and the NEO. She is interested in what dimensions are common between the tests and how much variance is shared between them. She is specifically interested in finding whether the neuroticism dimension from the NEO can account for a substantial amount of shared variance between the two tests.

Description of the data

Let’s pursue Example 1 from above.

We have included the data file, which can be obtained by clicking on mmreg.sav. The dataset has 600 observations on eight variables. The psychological variables are locus of control, self-concept and motivation. The academic variables are standardized tests in reading, writing, math and science. Additionally, the variable female is a zero-one indicator variable with the one indicating a female student.

Let’s look at the data.

get file='d:\data\mmreg.sav'.

descriptives
  variables=locus_of_control self_concept motivation 
  read write math science female
  /statistics=mean stddev min max.



frequencies
  variables=female .

Here are the correlations among the variables in the analysis.

correlations
  /variables=locus_of_control self_concept motivation read write math science female.

Analysis methods you might consider

Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.

Canonical correlation analysis, the focus of this page.
Separate OLS Regressions: You could analyze these data using separate OLS regression analyses for each variable in one set. The OLS regressions will not produce multivariate results and do not report information concerning dimensionality.
Multivariate multiple regression is a reasonable option if you have no interest in dimensionality.

Canonical correlation analysis

SPSS performs canonical correlation using the manova command. Don’t look for manova in the point-and-click analysis menu; it’s not there. The manova command is one of SPSS’s hidden gems that is often overlooked. Used with the discrim option, manova will compute the canonical correlation analysis.

Due to the length of the output, we will be making comments in several places along the way.

manova locus_of_control self_concept motivation WITH read write math science female
/ discrim all alpha(1) 
/ print=sig(eigen dim) .

The number of possible canonical variates, also known as canonical dimensions, is equal to the number of variables in the smaller set (the variables to the left of “WITH” in this example, called “DEPENDENT variables” in SPSS output). In our example, the first set has three variables and the second set has five (called “COVARIATES” in SPSS output). This leads to three possible canonical variates for each set, which corresponds to the three columns for each set and three canonical correlation coefficients in the output. Canonical dimensions are latent variables that are analogous to factors obtained in factor analysis, except that canonical variates also maximize the correlation between the two sets of variables. In general, not all the canonical dimensions will be statistically significant. A significant dimension corresponds to a significant canonical correlation and vice versa.
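The count of canonical dimensions can be illustrated outside SPSS. The following is a minimal Python sketch, assuming NumPy is available and using simulated data rather than mmreg.sav; the function name canonical_correlations and the simulated variables are hypothetical and are not part of the SPSS analysis. It computes canonical correlations from orthonormal bases of the two centered sets, and a set of three variables paired with a set of five yields three correlations.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return the canonical correlations between the columns of X and Y."""
    # Center each set so the result does not depend on the variable means.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two centered sets.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx'Qy are the canonical correlations,
    # returned in decreasing order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Simulated data (an assumption for illustration only): 600 cases,
# a 3-variable set and a 5-variable set sharing one latent factor.
rng = np.random.default_rng(0)
n = 600
latent = rng.normal(size=(n, 1))
X = latent @ rng.normal(size=(1, 3)) + rng.normal(size=(n, 3))
Y = latent @ rng.normal(size=(1, 5)) + rng.normal(size=(n, 5))

r = canonical_correlations(X, Y)
print(len(r))  # number of canonical dimensions = min(3, 5)
print(r)
```

Because the simulated sets share only one latent factor, the first correlation should dominate, which mirrors the idea that not every dimension needs to be meaningful.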

The output below begins with an overall multivariate test of the entire model using four different multivariate criteria. This is followed by the three canonical correlations and the multivariate tests of each of the dimensions. These results show that the first two of the three canonical correlations are statistically significant at the .05 level.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
The default error term in MANOVA has been changed from WITHIN CELLS to
WITHIN+RESIDUAL.  Note that these are the same for all full factorial designs.



* * * * * * * * * * * * * * * * * A n a l y s i s   o f   V a r i a n c e * * * * * * * * * * * * * * * * *


       600 cases accepted.
         0 cases rejected because of out-of-range factor values.
         0 cases rejected because of missing data.
         1 non-empty cell.

         1 design will be processed.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -



* * * * * * * * * * * * * * * * * A n a l y s i s   o f   V a r i a n c e -- Design   1 * * * * * * * * * * * * * * * * *

 EFFECT .. WITHIN CELLS Regression
 Multivariate Tests of Significance (S = 3, M = 1/2, N = 295 )

 Test Name             Value        Approx. F       Hypoth. DF         Error DF        Sig. of F

 Pillais                .25425         11.00057            15.00          1782.00             .000
 Hotellings             .31430         12.37633            15.00          1772.00             .000
 Wilks                  .75436         11.71573            15.00          1634.65             .000
 Roys                   .21538

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Eigenvalues and Canonical Correlations

 Root No.       Eigenvalue           Pct.      Cum. Pct.     Canon Cor.        Sq. Cor

        1           .27450       87.33628       87.33628         .46409         .21538
        2           .02887        9.18537       96.52164         .16751         .02806
        3           .01093        3.47836      100.00000         .10399         .01081

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dimension Reduction Analysis

 Roots              Wilks L.                F       Hypoth. DF         Error DF        Sig. of F

 1 TO 3               .75436         11.71573            15.00          1634.65             .000
 2 TO 3               .96143          2.94446             8.00          1186.00             .003
 3 TO 3               .98919          2.16461             3.00           594.00             .091

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Here we have the overall multivariate tests for dimensionality.
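The four multivariate criteria and the eigenvalue table are all functions of the squared canonical correlations. As a check on the arithmetic, here is a small Python sketch (assuming NumPy; the variable names are illustrative) that starts from the "Sq. Cor" column printed above and recovers the other values up to rounding. Note that the Roys value in this output equals the largest squared canonical correlation.

```python
import numpy as np

# Squared canonical correlations copied from the "Sq. Cor" column above.
sq_cor = np.array([0.21538, 0.02806, 0.01081])

# "Eigenvalue" column: r^2 / (1 - r^2) for each root.
eigenvalues = sq_cor / (1.0 - sq_cor)
# "Pct." column: each eigenvalue as a share of their sum.
pct = 100.0 * eigenvalues / eigenvalues.sum()

pillai = sq_cor.sum()             # Pillais: sum of squared correlations
hotelling = eigenvalues.sum()     # Hotellings: sum of eigenvalues
wilks = np.prod(1.0 - sq_cor)     # Wilks: product of (1 - r^2)
roy = sq_cor.max()                # Roys as printed in this output

print(np.round(eigenvalues, 5), np.round(pct, 2))
print(round(pillai, 5), round(hotelling, 5), round(wilks, 5), round(roy, 5))
```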

We also have the canonical correlations as well as how much variance of the dependent variables is explained by the dimensions. For this particular model there are three canonical dimensions, of which only the first two are statistically significant. The first test of dimensions tests whether all three dimensions combined are significant (they are). The next test tests whether dimensions 2 and 3 combined are significant (they are). Finally, the last test tests whether dimension 3, by itself, is significant (it is not). Because the combined tests are significant once dimension 3 is set aside, we conclude that dimensions 1 and 2 are each significant.
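The dimension reduction tests can be reproduced approximately from the squared canonical correlations. The sketch below (Python with NumPy) computes Wilks' lambda for roots k through 3 as the product of (1 - r^2) over the remaining roots, then applies Rao's F approximation. The constant m is an assumption chosen because it reproduces the error degrees of freedom printed in the output; small differences in F come from using rounded correlations.

```python
import numpy as np

sq_cor = np.array([0.21538, 0.02806, 0.01081])
n, p, q = 600, 3, 5          # cases, variables in set 1, variables in set 2
m = n - 1.5 - (p + q) / 2.0  # assumed constant; matches the printed error DF

results = []
for k in range(len(sq_cor)):
    # Wilks' lambda for roots k+1 through the last root.
    lam = np.prod(1.0 - sq_cor[k:])
    pk, qk = p - k, q - k            # remaining dimensions in each set
    df1 = pk * qk                    # hypothesis degrees of freedom
    denom = pk**2 + qk**2 - 5
    # Guard against a zero denominator when few dimensions remain.
    s = np.sqrt((pk**2 * qk**2 - 4) / denom) if denom > 0 else 1.0
    df2 = m * s - df1 / 2.0 + 1.0    # error degrees of freedom
    root = lam ** (1.0 / s)
    F = (1.0 - root) / root * df2 / df1
    results.append((lam, F, df1, df2))
    print(f"{k + 1} TO 3  Wilks={lam:.5f}  F={F:.3f}  df=({df1}, {df2:.2f})")
```

The printed values can be compared line by line with the "Dimension Reduction Analysis" table.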

 EFFECT .. WITHIN CELLS Regression (Cont.)
 Univariate F-tests with (5,594) D. F.

 Variable       Sq. Mul. R     Adj. R-sq.     Hypoth. MS       Error MS              F      Sig. of F

 locus_of           .18062         .17372        9.72160         .37123       26.18789           .000
 self_con           .01957         .01131        1.16669         .49212        2.37076           .038
 motivati           .07874         .07098        1.10799         .10913       10.15338           .000

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Raw canonical coefficients for DEPENDENT variables
           Function No.

 Variable                  1                2                3

 locus_of            1.25383          -.62148           .66169
 self_con            -.35135         -1.18769          -.82672
 motivati            1.26242          2.02726         -2.00023

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Standardized canonical coefficients for DEPENDENT variables
           Function No.

 Variable                  1                2                3

 locus_of             .84042          -.41656           .44352
 self_con            -.24788          -.83793          -.58326
 motivati             .43267           .69480          -.68554

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Correlations between DEPENDENT and canonical variables
           Function No.

 Variable                  1                2                3

 locus_of             .90405          -.38969           .17562
 self_con             .02084          -.70874          -.70516
 motivati             .56715           .35089          -.74513

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Variance in dependent variables explained by canonical variables

 CAN. VAR.       Pct Var DEP      Cum Pct DEP      Pct Var COV      Cum Pct COV

        1           37.97982         37.97982          8.17994          8.17994
        2           25.90966         63.88948           .72701          8.90694
        3           36.11052        100.00000           .39050          9.29745

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Raw canonical coefficients for COVARIATES
           Function No.

 COVARIATE                 1                2                3

 read                 .04462          -.00491          -.02138
 write                .03588           .04207          -.09131
 math                 .02342           .00423          -.00940
 science              .00503          -.08516           .10984
 female               .63212          1.08464          1.79465

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Standardized canonical coefficients for COVARIATES
           CAN. VAR.

 COVARIATE                 1                2                3

 read                 .45080          -.04961          -.21601
 write                .34896           .40921          -.88810
 math                 .22047           .03982          -.08848
 science              .04878          -.82660          1.06608
 female               .31504           .54057           .89443

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Correlations between COVARIATES and canonical variables
           CAN. VAR.

 Covariate                 1                2                3

 read                 .84045          -.35883          -.13536
 write                .87654           .06484          -.25456
 math                 .76395          -.29795          -.14776
 science              .65841          -.67680           .23036
 female               .36411           .75493           .54340

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Variance in covariates explained by canonical variables

 CAN. VAR.       Pct Var DEP      Cum Pct DEP      Pct Var COV      Cum Pct COV

        1           11.30458         11.30458         52.48769         52.48769
        2             .70132         12.00590         24.99409         77.48177
        3             .09804         12.10394          9.06617         86.54795

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
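The "Variance in dependent variables explained by canonical variables" table can be recovered from the correlations between the DEPENDENT variables and the canonical variables. In the Python sketch below (assuming NumPy), "Pct Var DEP" is the mean squared correlation in each column, and "Pct Var COV" is that value multiplied by the squared canonical correlation. Reading the second quantity as a redundancy measure is an interpretation, not something the output states.

```python
import numpy as np

# Correlations between DEPENDENT and canonical variables, copied from above.
# Rows: locus_of, self_con, motivati. Columns: functions 1, 2, 3.
loadings = np.array([
    [0.90405, -0.38969,  0.17562],
    [0.02084, -0.70874, -0.70516],
    [0.56715,  0.35089, -0.74513],
])
sq_cor = np.array([0.21538, 0.02806, 0.01081])

# Percent of variance in the dependent set captured by its own variates.
pct_var_dep = 100.0 * (loadings ** 2).mean(axis=0)
# Percent of variance in the dependent set explained through the other set.
pct_var_cov = pct_var_dep * sq_cor

print(np.round(pct_var_dep, 3))
print(np.round(pct_var_cov, 3))
```

The three "Pct Var DEP" values sum to 100 because three canonical variates span a set of three variables.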

The raw canonical coefficients above are used to generate the canonical variates, represented by the columns (1 2 3) in the coefficient tables, for each set. They are interpreted in a manner analogous to regression coefficients. For example, for the variable read, a one unit increase in reading leads to a .0446 increase in the first canonical variate of the COVARIATE set when all of the other variables are held constant. Similarly, being female leads to a .6321 increase in dimension 1 for the COVARIATE set with the other predictors held constant. When the variables in the model have very different standard deviations, the standardized coefficients allow for easier comparisons among the variables.
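To make the interpretation of raw coefficients concrete, here is a brief Python sketch (assuming NumPy). It forms the first canonical variate of the COVARIATE set as a weighted sum of the variables, using the raw coefficients from the output. The student profiles are hypothetical, and any centering constant is omitted, so only differences between scores are meaningful here.

```python
import numpy as np

# Raw canonical coefficients for COVARIATES, function 1, copied from above.
# Order: read, write, math, science, female.
raw_coef_1 = np.array([0.04462, 0.03588, 0.02342, 0.00503, 0.63212])

def variate_1(values):
    """Uncentered score on the first canonical variate of the COVARIATE set."""
    return float(raw_coef_1 @ np.asarray(values, dtype=float))

# Hypothetical profiles, not taken from the data file.
baseline = [50, 50, 50, 50, 0]
one_more_read = [51, 50, 50, 50, 0]   # reading score one unit higher
as_female = [50, 50, 50, 50, 1]       # same scores, female = 1

print(variate_1(one_more_read) - variate_1(baseline))  # change per unit of read
print(variate_1(as_female) - variate_1(baseline))      # change for female
```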

The raw canonical coefficients are followed by the standardized canonical coefficients. The standardized canonical coefficients are interpreted in a manner analogous to interpreting standardized regression coefficients. For example, for the variable read, a one standard deviation increase in reading leads to a 0.45 standard deviation increase in the score on the first canonical variate for the COVARIATE set when the other variables in the model are held constant.
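The link between the two sets of coefficients can be checked with simple arithmetic. Assuming each standardized coefficient is the raw coefficient multiplied by the standard deviation of its variable, the ratio of the two recovers an implied standard deviation. The Python sketch below does this for the first function of the COVARIATE set; the implied values are derived here from the output and are not quoted from the descriptives table.

```python
# Function 1 coefficients for the COVARIATE set, copied from the output above.
raw = {"read": 0.04462, "write": 0.03588, "math": 0.02342,
       "science": 0.00503, "female": 0.63212}
std = {"read": 0.45080, "write": 0.34896, "math": 0.22047,
       "science": 0.04878, "female": 0.31504}

# If standardized = raw * SD, then SD = standardized / raw.
implied_sd = {name: std[name] / raw[name] for name in raw}

for name, sd in implied_sd.items():
    print(f"{name:8s} implied SD = {sd:.3f}")
```

The test scores have implied standard deviations near 10 while female is near 0.5, which is why the raw coefficient for female looks large next to the others even though its standardized coefficient is not.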

Things to consider

As in the case of multivariate regression, MANOVA and so on, for valid inference, canonical correlation analysis requires the multivariate normality and homogeneity of variance assumptions.
Canonical correlation analysis assumes a linear relationship between the canonical variates and each set of variables.
Similar to multivariate regression, canonical correlation analysis requires a large sample size.

