Abstract
Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance of the method with that of other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.
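The alternating regression idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes standardized data stored column-wise, estimates only the first canonical pair, solves each lasso by plain coordinate descent, and uses fixed penalty values. The names `sparse_cca_pair`, `lam_x`, `lam_y` and the simulated data are hypothetical choices made for illustration only.

```python
import math
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def standardize(cols):
    """Center each column and scale it to unit variance."""
    out = []
    for c in cols:
        m = sum(c) / len(c)
        s = math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        out.append([(v - m) / s for v in c])
    return out

def combine(cols, w):
    """Variate sum_j w[j] * cols[j], i.e. a linear combination of columns."""
    return [sum(wj * c[i] for wj, c in zip(w, cols)) for i in range(len(cols[0]))]

def lasso(cols, y, lam, n_sweeps=50):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by coordinate descent."""
    n = len(y)
    scale = [sum(v * v for v in c) / n for c in cols]
    b = [0.0] * len(cols)
    r = list(y)  # current residual y - Xb
    for _ in range(n_sweeps):
        for j, c in enumerate(cols):
            rho = sum(ci * ri for ci, ri in zip(c, r)) / n + scale[j] * b[j]
            new = soft_threshold(rho, lam) / scale[j]
            if new != b[j]:
                d = new - b[j]
                r = [ri - d * ci for ri, ci in zip(r, c)]
                b[j] = new
    return b

def unit_variance(cols, w):
    """Rescale w so the variate has unit variance (zero vectors are left as-is)."""
    u = combine(cols, w)
    s = math.sqrt(sum(v * v for v in u) / len(u))
    return [wj / s for wj in w] if s > 0 else w

def sparse_cca_pair(X, Y, lam_x, lam_y, n_iter=10):
    """Alternate lasso regressions of one canonical variate on the other dataset."""
    # Assumed starting value: put all weight on the first Y variable.
    b = unit_variance(Y, [1.0] + [0.0] * (len(Y) - 1))
    a = [0.0] * len(X)
    for _ in range(n_iter):
        a = unit_variance(X, lasso(X, combine(Y, b), lam_x))
        b = unit_variance(Y, lasso(Y, combine(X, a), lam_y))
    return a, b

# Simulated data (an assumption for this sketch): one latent signal z is
# shared by the first two variables of each dataset; the rest is pure noise.
random.seed(0)
n = 200
z = [random.gauss(0, 1) for _ in range(n)]

def signal():
    return [zi + 0.5 * random.gauss(0, 1) for zi in z]

def noise():
    return [random.gauss(0, 1) for _ in range(n)]

X = standardize([signal(), signal(), noise(), noise(), noise()])
Y = standardize([signal(), signal(), noise(), noise()])

a, b = sparse_cca_pair(X, Y, lam_x=0.3, lam_y=0.3)
u, v = combine(X, a), combine(Y, b)
rho = sum(ui * vi for ui, vi in zip(u, v)) / n  # both variates have unit variance
print("a =", a)
print("b =", b)
print("canonical correlation =", rho)
```

In this sketch, larger values of `lam_x` and `lam_y` set more entries of the canonical vectors exactly to zero. If a penalty is too large, the corresponding vector collapses to zero, so in practice the penalties would need to be tuned.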
Original language | English |
---|---|
Pages (from-to) | 834-851 |
Journal | Biometrical Journal |
Volume | 57 |
Issue number | 5 |
Publication status | Published - Sept 2015 |
Externally published | Yes |
Keywords
- Canonical correlation analysis
- Genomic data
- Lasso
- Penalized regression
- Sparsity