Sparse canonical correlation analysis from a predictive point of view

Ines Wilms*, Christophe Croux

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance of the method with that of other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.
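
The alternating-regression idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it estimates only a single pair of canonical vectors, uses a plain coordinate-descent lasso written from scratch, starts from equal weights, and fixes the penalties by hand, all of which are assumptions made for illustration. The function names (`lasso_cd`, `sparse_cca_pair`), the penalty values, and the toy data are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def center_columns(M):
    """Subtract each column's mean so no intercept is needed."""
    n = len(M)
    means = [sum(row[j] for row in M) / n for j in range(len(M[0]))]
    return [[x - m for x, m in zip(row, means)] for row in M]

def scores(M, w):
    """Linear combination M w, one value per observation."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in M]

def standardize(v):
    """Rescale a score vector to mean 0 and variance 1."""
    n = len(v)
    mean = sum(v) / n
    sd = (sum((x - mean) ** 2 for x in v) / n) ** 0.5
    if sd == 0.0:
        raise ValueError("score has zero variance; the penalty may be too large")
    return [(x - mean) / sd for x in v]

def lasso_cd(M, y, lam, n_sweeps=50, tol=1e-8):
    """Minimize (1/(2n)) * ||y - M w||^2 + lam * ||w||_1 by coordinate descent."""
    n, p = len(M), len(M[0])
    w = [0.0] * p
    resid = list(y)  # residual y - M w, updated in place
    col_ss = [sum(M[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            rho = sum(M[i][j] * (resid[i] + M[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= M[i][j] * delta
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w

def sparse_cca_pair(X, Y, lam_a, lam_b, n_alt=10):
    """Alternate two lasso regressions to get one sparse pair (a, b)."""
    b = [1.0] * len(Y[0])  # assumed starting value: equal weights
    a = [0.0] * len(X[0])
    for _ in range(n_alt):
        v = standardize(scores(Y, b))  # fix b, predict the Y-variate from X
        a = lasso_cd(X, v, lam_a)
        u = standardize(scores(X, a))  # fix a, predict the X-variate from Y
        b = lasso_cd(Y, u, lam_b)
    return a, b

# Toy data (hypothetical): only the first column of each set carries the shared signal.
random.seed(0)
n = 400
latent = [random.gauss(0, 1) for _ in range(n)]
X = center_columns([[z + 0.3 * random.gauss(0, 1)] + [random.gauss(0, 1) for _ in range(4)]
                    for z in latent])
Y = center_columns([[z + 0.3 * random.gauss(0, 1)] + [random.gauss(0, 1) for _ in range(3)]
                    for z in latent])

a, b = sparse_cca_pair(X, Y, lam_a=0.25, lam_b=0.25)
print("a =", [round(w, 3) for w in a])
print("b =", [round(w, 3) for w in b])
```

In practice the penalties would be chosen by a data-driven criterion rather than fixed by hand, and further pairs of canonical vectors would require some form of deflation; both are omitted here to keep the sketch short.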
Original language: English
Pages (from-to): 834-851
Journal: Biometrical Journal
Volume: 57
Issue number: 5
Publication status: Published - Sept 2015
Externally published: Yes

Keywords

  • Canonical correlation analysis
  • Genomic data
  • Lasso
  • Penalized regression
  • Sparsity
