Random Forest and Ensemble Methods

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic

Abstract

Recent advances in technology have led to growth in the volume and availability of different types of data. This has created various opportunities for the machine learning, data mining, chemometrics and data science fields. These fields have consequently been developing new approaches and algorithms for a wide range of applications, in biomedical, medical and -omics areas but also from daily life to national security. Ensemble techniques have become the backbone of the machine learning field. The term refers to an approach in which multiple independent (i.e., uncorrelated) predictive models are combined. The models can be combined, for instance, by simple averaging or voting. The advantage of ensemble techniques is their ability to yield models with very high predictive performance. The idea behind ensemble techniques is also present in our daily lives: we tend to seek the opinion of several specialists before making a final decision, for instance before purchasing an item, and before hiring a new employee we seek the judgment of several referees. In this chapter, three ensemble techniques (adaptive boosting, random forest and gradient boosting) are demonstrated in theory and in practice. Each technique is discussed from a theoretical perspective, followed by a presentation of its pros and cons. The last part of the chapter compares the techniques using two simulated data sets.
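
The two combination rules named in the abstract, voting and simple averaging, can be illustrated with a minimal sketch. It is not taken from the chapter: the function names `majority_vote` and `average_prediction` and the three model outputs are hypothetical, chosen only to show how several predictions for one sample might be merged into a single ensemble prediction.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return the class label predicted by the largest number of models."""
    # most_common(1) gives [(label, count)] for the most frequent label.
    return Counter(labels).most_common(1)[0][0]


def average_prediction(values):
    """Return the arithmetic mean of the models' numeric predictions."""
    return mean(values)


# Hypothetical outputs of three models for a single sample (illustrative only).
class_votes = ["A", "B", "A"]         # classification: one label per model
regression_outputs = [2.0, 3.0, 4.0]  # regression: one value per model

voted = majority_vote(class_votes)
averaged = average_prediction(regression_outputs)

print(voted, averaged)
```

In this sketch a tie between labels is resolved in favour of the label that appears first in the list; a real ensemble would need an explicit tie-breaking rule or an odd number of voters.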
Original language: English
Title of host publication: Comprehensive Chemometrics
Subtitle of host publication: Chemical and Biochemical Data Analysis
Editors: Steven Brown, Roma Tauler, Beata Walczak
Publisher: Elsevier BV
Chapter: 3.32
Pages: 661-672
Number of pages: 12
Edition: 2
ISBN (Print): 978-0-444-64166-3
Publication status: Published - 2020
