Sparse canonical correlation analysis from a predictive point of view

Ines Wilms; Christophe Croux

doi:10.1002/bimj.201400226


Ines Wilms^*, Christophe Croux

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach together with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance with other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.
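The alternating regression idea in the abstract can be illustrated in a few lines. Hold one canonical vector fixed, obtain the other by a lasso regression of the fixed variate on the other dataset, then swap roles and repeat. The sketch below makes several assumptions and is not the authors' implementation. It estimates only the first canonical pair, uses plain coordinate-descent lasso on standardized columns, starts from equal weights, rescales each variate to unit variance after every step, and uses fixed, hand-picked penalties `lam_x` and `lam_y`. All function names (`sparse_cca_alternating`, `lasso`, `unit_variance`, and the rest) and the synthetic data are hypothetical.

```python
# Hypothetical sketch of sparse CCA by alternating lasso regressions;
# not the authors' code. Pure Python, first canonical pair only.
import math
import random

def standardize(cols):
    """Center each column and scale it so that (1/n) * sum(x**2) == 1."""
    out = []
    for c in cols:
        n = len(c)
        m = sum(c) / n
        s = math.sqrt(sum((x - m) ** 2 for x in c) / n)  # assumes no constant columns
        out.append([(x - m) / s for x in c])
    return out

def soft_threshold(z, lam):
    """Lasso shrinkage: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(cols, y, lam, max_sweeps=500, tol=1e-9):
    """Coordinate descent for (1/2n)||y - X beta||^2 + lam*||beta||_1, standardized X."""
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)
    for _ in range(max_sweeps):
        max_step = 0.0
        for j, xj in enumerate(cols):
            # (1/n) x_j'(partial residual); '+ beta[j]' uses (1/n)||x_j||^2 == 1
            rho = sum(xi * ri for xi, ri in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            step = new - beta[j]
            if step != 0.0:
                resid = [ri - step * xi for ri, xi in zip(resid, xj)]
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta

def variate(cols, w):
    """Linear combination sum_j w[j] * cols[j] (one value per observation)."""
    return [sum(wj * c[i] for wj, c in zip(w, cols)) for i in range(len(cols[0]))]

def unit_variance(cols, w):
    """Rescale w so that its variate has variance 1 (columns are centered)."""
    z = variate(cols, w)
    sd = math.sqrt(sum(v * v for v in z) / len(z))
    return [wj / sd for wj in w] if sd > 0 else w

def sparse_cca_alternating(X, Y, lam_x, lam_y, max_outer=50, tol=1e-6):
    """Alternate two lasso regressions until the canonical vectors a, b settle."""
    X, Y = standardize(X), standardize(Y)
    a = [0.0] * len(X)
    b = unit_variance(Y, [1.0] * len(Y))  # equal-weight start (an assumption)
    for _ in range(max_outer):
        # b fixed: lasso-regress the Y-variate on X to update a.
        a_new = unit_variance(X, lasso(X, variate(Y, b), lam_x))
        # a fixed: lasso-regress the X-variate on Y to update b.
        b_new = unit_variance(Y, lasso(Y, variate(X, a_new), lam_y))
        delta = max(abs(p - q) for p, q in zip(a_new + b_new, a + b))
        a, b = a_new, b_new
        if delta < tol:
            break
    return a, b

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic data (hypothetical): latent z drives X[0], X[1] and Y[0]; other columns are noise.
random.seed(0)
n = 300
z = [random.gauss(0, 1) for _ in range(n)]
X = [[zi + 0.5 * random.gauss(0, 1) for zi in z] for _ in range(2)]
X += [[random.gauss(0, 1) for _ in range(n)] for _ in range(6)]
Y = [[zi + 0.5 * random.gauss(0, 1) for zi in z]]
Y += [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]

a, b = sparse_cca_alternating(X, Y, lam_x=0.2, lam_y=0.2)
# a and b are weights on the standardized columns.
r = corr(variate(standardize(X), a), variate(standardize(Y), b))
print("a =", [round(w, 3) for w in a])
print("b =", [round(w, 3) for w in b])
print("correlation of the sparse canonical variates:", round(r, 3))
```

The rescaling after each regression matters here. Correlation does not change with the scale of a and b, but the lasso shrinks coefficients toward zero. Without renormalization the vectors could therefore collapse toward zero over successive passes. In this toy setting the nonzero weights should concentrate on the signal columns X[0], X[1] and Y[0]. The penalty values are arbitrary, though, and would need proper tuning on real data.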

Original language	English
Pages (from-to)	834-851
Journal	Biometrical Journal
Volume	57
Issue number	5
DOIs	https://doi.org/10.1002/bimj.201400226
Publication status	Published - Sept 2015
Externally published	Yes

Keywords

Canonical correlation analysis
Genomic data
Lasso
Penalized regression
Sparsity


Cite this

@article{2fcd57dd1e0d4329b9bddee9ed170522,
  title = "Sparse canonical correlation analysis from a predictive point of view",
  abstract = "Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach together with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance with other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.",
  keywords = "Canonical correlation analysis, Genomic data, Lasso, Penalized regression, Sparsity",
  author = "Ines Wilms and Christophe Croux",
  note = "Breast cancer data set available in Witten, D., Tibshirani, R. and Gross, S. (2011). Penalized Multivariate Analysis. R package version 1.0.7.1. Available on CRAN (http://cran.r-project.org/web/packages/PMA/index.html).",
  year = "2015",
  month = sep,
  doi = "10.1002/bimj.201400226",
  language = "English",
  volume = "57",
  pages = "834--851",
  journal = "Biometrical Journal",
  issn = "0323-3847",
  publisher = "Wiley",
  number = "5",
}

TY  - JOUR
T1  - Sparse canonical correlation analysis from a predictive point of view
AU  - Wilms, Ines
AU  - Croux, Christophe
N1  - Breast cancer data set available in Witten, D., Tibshirani, R. and Gross, S. (2011). Penalized Multivariate Analysis. R package version 1.0.7.1. Available on CRAN (http://cran.r-project.org/web/packages/PMA/index.html).
PY  - 2015/9
Y1  - 2015/9
N2  - Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach together with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance with other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.
AB  - Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach together with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance with other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.
KW  - Canonical correlation analysis
KW  - Genomic data
KW  - Lasso
KW  - Penalized regression
KW  - Sparsity
U2  - 10.1002/bimj.201400226
DO  - 10.1002/bimj.201400226
M3  - Article
C2  - 26147637
SN  - 0323-3847
VL  - 57
SP  - 834
EP  - 851
JO  - Biometrical Journal
JF  - Biometrical Journal
IS  - 5
ER  - 