Weight randomization test for the selection of the number of components in PLS models

Thanh Tran*, Ewa Szymanska, Jan Gerretzen, Lutgarde Buydens, Nelson Lee Afanador, Lionel Blanchet

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The selection of the optimal number of components remains a difficult but essential task in partial least squares (PLS) modeling. Randomization tests have the advantage of being automatic and of using the entire dataset, in contrast to the widely used cross-validation approaches. A PLS model may include one or more components that carry a large amount of irrelevant data variation, and this may affect the model, depending on the assigned y-loading (the regression coefficient in the latent domain). We recently showed this in the basic sequence framework, with respect to the underlying theory of the PLS algorithm, and presented it to the chemometrics society. In this work, we show that this irrelevant data variation is the root cause of the difficulty that current methods have in selecting the optimal number of components. For randomization tests, PLS models with nonsignificant components may produce false positives because of the incorrect assumption that "the components enter the model in a natural order". We introduce a new randomization test, the weight randomization test, for selecting the optimal number of components in PLS in light of the underlying theory of the PLS algorithm. In the proposed method, the null distribution is well characterized and efficiently determined, taking into account a newly defined model quality metric: the number of consecutive nonsignificant components (CNC). We illustrate the effectiveness of the weight randomization test in the optimization of preprocessing as well as in classification models; for the latter, results are compared with the double cross-validation procedure. This is an important step towards the full automation of PLS model development and routine updates.
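
To make the general idea of a randomization test concrete, the following is a minimal sketch of a generic y-permutation test for the first PLS1 component. It is not the weight randomization test proposed in the article, whose details are not given in this abstract. The function names, the test statistic (the norm of X^T y for centred data, i.e. the covariance captured by the first PLS1 weight vector), and the default of 999 permutations are assumptions chosen for illustration.

```python
import random


def first_component_statistic(X, y):
    """Norm of X^T y after centring X column-wise and y.

    For PLS1 the first weight vector is proportional to X^T y, so this
    norm measures the covariance captured by the first component.
    (Illustrative statistic; an assumption, not taken from the article.)
    """
    n = len(y)
    p = len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    w = [
        sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    return sum(v * v for v in w) ** 0.5


def permutation_p_value(X, y, n_permutations=999, seed=0):
    """Estimate a p-value by shuffling y to build the null distribution.

    All samples are used in every evaluation; no data are held out,
    which is the contrast with cross-validation drawn in the abstract.
    """
    rng = random.Random(seed)
    observed = first_component_statistic(X, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(y_perm)  # break the X-y relationship
        if first_component_statistic(X, y_perm) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as part of the null sample.
    return (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical toy data: y depends linearly on the first column of X.
    X = [[float(i), float((i * 7) % 5)] for i in range(20)]
    y = [2.0 * i for i in range(20)]
    print(permutation_p_value(X, y))
```

A small p-value suggests that the component captures more X-y covariance than would be expected by chance. Extending such a test to later components requires deflation and, as the abstract points out, care with the assumption that components enter the model in a natural order.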

Original language: English
Article number: e2887
Number of pages: 15
Journal: Journal of Chemometrics
Volume: 31
Issue number: 5
Publication status: Published - May 2017

Keywords

  • number of components
  • partial least squares
  • randomization test
  • ION MOBILITY SPECTROMETRY
  • PARTIAL LEAST-SQUARES
  • MULTIVARIATE CALIBRATION
  • VARIABLE IMPORTANCE
  • REGRESSION-MODELS
  • CROSS-VALIDATION
  • CHEMOMETRICS
  • DISTRIBUTIONS
  • OPTIMIZATION
  • SPECTROSCOPY
