Canonical Correlation: Linear and Nonlinear (Statistical Associates Publishers Blue Book Series)
Author: G. David Garson
Format: Kindle Edition
Publisher: Statistical Associates Publishers
Release Date: May 16, 2012
A canonical correlation is the correlation of two canonical (latent) variables, one representing a set of independent variables, the other a set of dependent variables. Each set may be considered a latent variable based on measured indicator variables in its set. The canonical correlation is optimized such that the linear correlation between the two latent variables is maximized. Whereas multiple regression is used for many-to-one relationships, canonical correlation is used for many-to-many relationships.
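As a rough illustration of the definition above, the following Python sketch computes canonical correlations with NumPy by orthogonalizing each centered variable set (QR decomposition) and taking the singular values of the cross-product. It is not taken from the book; the function name `canonical_correlations` and the synthetic data are assumptions made for illustration, and the sketch assumes each set has full column rank and more cases than variables.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return canonical correlations and weights for two variable sets.

    Illustrative helper (hypothetical name): assumes more rows than
    columns and no exact collinearity within either set.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each column so cross-products are proportional to covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two centered sets.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx'Qy are the canonical correlations, largest first.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    k = min(X.shape[1], Y.shape[1])
    # Map the singular vectors back to weights on the centered variables.
    A = np.linalg.solve(Rx, U[:, :k])
    B = np.linalg.solve(Ry, Vt.T[:, :k])
    return np.clip(s[:k], 0.0, 1.0), A, B

# Synthetic data for illustration only: three "independent" variables and
# two "dependent" variables, one of which depends on the first predictor.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([X[:, 0] + rng.normal(size=200), rng.normal(size=200)])
r, A, B = canonical_correlations(X, Y)
```

Here `r` holds one canonical correlation per dimension (at most the size of the smaller set), and the columns of `A` and `B` are the weights that form each pair of canonical variates. When each set contains a single variable, the result reduces to the absolute value of the ordinary Pearson correlation.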
There may be more than one such linear correlation relating the two sets of variables, with each such correlation representing a different dimension by which the independent set of variables is related to the dependent set. The purpose of canonical correlation is to explain the relation of the two sets of variables, not to model the individual variables. For each canonical variate one can also assess how strongly it is related to measured variables in its own set, or the set for the other canonical variate. Wilks's lambda is commonly used to test the significance of canonical correlation.
Analogous to ordinary correlation, the squared canonical correlation is the percent of variance in the dependent canonical variate explained by the independent canonical variate along a given dimension (there may be more than one). In addition to asking how strong the relationship is between two latent variables, canonical correlation is useful in determining how many dimensions are needed to account for that relationship. Canonical correlation finds the linear combination of the variables in one set that correlates most strongly with a linear combination of the variables in the second set. This pair of linear combinations, or "root," is extracted and the process is repeated for the residual data, with the constraint that each new linear combination must not correlate with those already extracted. The process is repeated until a successive root is no longer significant.
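The question of how many dimensions are significant is often approached with sequential Wilks's lambda tests. The sketch below is one common way to do this, using Bartlett's chi-square approximation; it is an illustration under stated assumptions rather than the book's own procedure. The function name `sequential_wilks` and the example correlations are hypothetical, `n` is the number of cases, and `p` and `q` are the numbers of variables in the two sets.

```python
import math

def sequential_wilks(canon_r, n, p, q):
    """Test successive groups of canonical roots (illustrative sketch).

    For each step k (0-based), Wilks's lambda is computed for roots
    k+1 through the last, with Bartlett's chi-square approximation.
    Assumes every correlation is strictly below 1 in absolute value.
    """
    results = []
    for k in range(len(canon_r)):
        # Product of the unexplained proportions for the remaining roots.
        lam = math.prod(1.0 - r * r for r in canon_r[k:])
        # Bartlett's approximation to the chi-square statistic.
        chi2 = -(n - 1 - (p + q + 1) / 2.0) * math.log(lam)
        # Degrees of freedom shrink as earlier roots are removed.
        df = (p - k) * (q - k)
        results.append((k + 1, lam, chi2, df))
    return results

# Hypothetical canonical correlations for p = 3 and q = 2 variables.
tests = sequential_wilks([0.8, 0.3], n=100, p=3, q=2)
```

Each tuple in `tests` gives the first root included in the test, Wilks's lambda, the approximate chi-square statistic, and its degrees of freedom. A p-value can then be obtained from the chi-square distribution, for example with `scipy.stats.chi2.sf(chi2, df)` if SciPy is available; extraction stops at the first step whose test is not significant.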
Canonical correlation is a member of the multivariate general linear hypothesis (MGLH) family and shares many of the assumptions of multiple regression and multivariate analysis of variance (MANOVA), such as linearity of relationships, homoscedasticity (the same level of relationship across the full range of the data), interval or near-interval data, untruncated variables, proper specification of the model, lack of high multicollinearity, and multivariate normality for purposes of hypothesis testing. It also shares with factor analysis the need to impute labels for the canonical variables based on structure correlations, which function as a form of canonical factor loading; different researchers may well impute different labels from the same data.
Nonlinear canonical correlation, which SPSS implements as its OVERALS procedure, is also treated in this Statistical Associates "blue book" volume.
See also the separate Statistical Associates "blue book" on partial least squares regression, which is sometimes used to predict one set of response variables from a set of independent variables.
Table of Contents
Key Concepts and Terms
Canonical variable or variate
Canonical correlation and canonical dimensions
Pooled canonical correlation
Canonical weights
Canonical scores
Canonical factor loadings
Canonical communality coefficients
Canonical variate adequacy coefficients
Redundancy coefficients
Pooled redundancy coefficients
Canonical plots
Likelihood ratio test
Linear canonical correlation in SPSS
The MANOVA method in SPSS
Significance of the model: Wilks's lambda