TY - JOUR
T1 - Cellwise robust regularized discriminant analysis
AU - Aerts, Stephanie
AU - Wilms, Ines
N1 - Forest soil data is available in Todorov, V. (2016), rrcovHD: Robust Multivariate Methods for High Dimensional Data, R package version 0.2-5, available on CRAN (https://cran.r-project.org/web/packages/rrcovHD/index.html).
Phoneme data is available in Hastie, T., Tibshirani, R. and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition, Springer, New York, https://web.stanford.edu/~hastie/ElemStatLearn/.
PY - 2017/12
Y1 - 2017/12
N2 - Quadratic and linear discriminant analysis (qda and lda) are the most often applied classification rules under normality. In qda, a separate covariance matrix is estimated for each group. If there are more variables than observations in the groups, the usual estimates are singular and can no longer be used. Assuming homoscedasticity, as in lda, reduces the number of parameters to estimate. However, this rather strong assumption is rarely satisfied in practice. Regularized discriminant techniques that are computable in high dimension and cover the path between the two extremes qda and lda have been proposed in the literature. However, these procedures rely on sample covariance matrices. As such, they become inappropriate in the presence of cellwise outliers, a type of outlier that is very likely to occur in high-dimensional datasets. In this paper, we propose cellwise robust counterparts of these regularized discriminant techniques by inserting cellwise robust covariance matrices. Our methodology results in a family of discriminant methods that (1) are robust against outlying cells, (2) cover the gap between lda and qda, and (3) are computable in high dimension. The good performance of the new methods is illustrated through simulated and real data examples. As a by-product, visual tools are provided for the detection of outliers.
AB - Quadratic and linear discriminant analysis (qda and lda) are the most often applied classification rules under normality. In qda, a separate covariance matrix is estimated for each group. If there are more variables than observations in the groups, the usual estimates are singular and can no longer be used. Assuming homoscedasticity, as in lda, reduces the number of parameters to estimate. However, this rather strong assumption is rarely satisfied in practice. Regularized discriminant techniques that are computable in high dimension and cover the path between the two extremes qda and lda have been proposed in the literature. However, these procedures rely on sample covariance matrices. As such, they become inappropriate in the presence of cellwise outliers, a type of outlier that is very likely to occur in high-dimensional datasets. In this paper, we propose cellwise robust counterparts of these regularized discriminant techniques by inserting cellwise robust covariance matrices. Our methodology results in a family of discriminant methods that (1) are robust against outlying cells, (2) cover the gap between lda and qda, and (3) are computable in high dimension. The good performance of the new methods is illustrated through simulated and real data examples. As a by-product, visual tools are provided for the detection of outliers.
KW - cellwise robust precision matrix
KW - classification
KW - discriminant analysis
KW - penalized estimation
KW - inverse covariance estimation
KW - graphical lasso
KW - multivariate location
KW - scatter
U2 - 10.1002/sam.11365
DO - 10.1002/sam.11365
M3 - Article
SN - 1932-1864
VL - 10
SP - 436
EP - 447
JO - Statistical Analysis and Data Mining
JF - Statistical Analysis and Data Mining
IS - 6
ER -