Small data materials design with machine learning: When the average model knows best

Danny Vanpoucke; Katrien Bernaerts; Siamak Mehrkanoon; Ko Hermans; Onno van Knippenberg

doi:10.1063/5.0012285

Danny Vanpoucke, Katrien Bernaerts^*, Siamak Mehrkanoon, Ko Hermans, Onno van Knippenberg

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Machine learning is quickly becoming an important tool in modern materials design. Where many of its successes are rooted in huge
datasets, the most common applications in academic and industrial materials design deal with datasets of at best a few tens of data points.
Harnessing the power of machine learning in this context is, therefore, of considerable importance. In this work, we investigate the
intricacies introduced by these small datasets. We show that individual data points introduce a significant chance factor in both model
training and quality measurement. This chance factor can be mitigated by the introduction of an ensemble-averaged model. This model
presents the highest accuracy, while at the same time, it is robust with regard to changing the dataset size. Furthermore, as only a single
model instance needs to be stored and evaluated, it provides a highly efficient model for prediction purposes, ideally suited for the practical
materials scientist.
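
The idea of collapsing an ensemble into one stored model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes models that are linear in their coefficients (so averaging coefficients is equivalent to averaging predictions), random subsampling without replacement, and hypothetical toy data. The function names `fit_linear`, `ensemble_average`, and `predict` are invented for this sketch.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def ensemble_average(X, y, n_models=200, train_fraction=0.8, seed=0):
    """Fit one model per random training subset and average the coefficients.

    Assumption: the model is linear in its coefficients, so the averaged
    coefficient vector gives the same predictions as averaging the
    predictions of all ensemble members.
    """
    rng = np.random.default_rng(seed)
    n_train = min(len(X), max(X.shape[1] + 1, int(train_fraction * len(X))))
    coefs = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=n_train, replace=False)
        coefs.append(fit_linear(X[idx], y[idx]))
    return np.mean(coefs, axis=0)


def predict(coef, X):
    """Evaluate the single averaged model."""
    return coef[0] + X @ coef[1:]


# Hypothetical toy dataset: 20 points, 2 features, noisy linear target.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(20, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 0.1, size=20)

avg_coef = ensemble_average(X, y)
y_hat = predict(avg_coef, X)
```

Only `avg_coef` needs to be kept for later predictions; the individual ensemble members can be discarded after averaging.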

Original language	English
Article number	054901
Number of pages	12
Journal	Journal of Applied Physics
Volume	128
Issue number	5
DOIs	https://doi.org/10.1063/5.0012285
Publication status	Published - 7 Aug 2020

Keywords

Machine Learning
Regression Analysis
adhesives
artificial intelligence (AI)
coating
featured article
materials science
scilight
REGRESSION
DIAMOND
INFORMATION
DISCOVERY
MULTITARGET OPTIMIZATION

Access to Document

10.1063/5.0012285

Full text: Final published version, 2.56 MB. Licence: Taverne

Cite this

@article{b9344a4f8e2f41ffa41826421b534c57,

title = "Small data materials design with machine learning: When the average model knows best",

abstract = "Machine learning is quickly becoming an important tool in modern materials design. Where many of its successes are rooted in huge datasets, the most common applications in academic and industrial materials design deal with datasets of at best a few tens of data points. Harnessing the power of machine learning in this context is, therefore, of considerable importance. In this work, we investigate the intricacies introduced by these small datasets. We show that individual data points introduce a significant chance factor in both model training and quality measurement. This chance factor can be mitigated by the introduction of an ensemble-averaged model. This model presents the highest accuracy, while at the same time, it is robust with regard to changing the dataset size. Furthermore, as only a single model instance needs to be stored and evaluated, it provides a highly efficient model for prediction purposes, ideally suited for the practical materials scientist.",

keywords = "Machine Learning, Regression Analysis, adhesives, artificial intelligence (AI), coating, featured article, materials science, scilight, REGRESSION, DIAMOND, INFORMATION, DISCOVERY, MULTITARGET OPTIMIZATION",

author = "Danny Vanpoucke and Katrien Bernaerts and Siamak Mehrkanoon and Ko Hermans and {van Knippenberg}, Onno",

note = "Funding Information: D.E.V.P. and K.V.B. acknowledge the project D-NL-HIT carried out in the framework of INTERREG-Program Deutschland-Nederland, which is co-financed by the European Union, the MWIDE NRW, the Ministerie van Economische Zaken en Klimaat, and the provinces of Limburg, Gelderland, Noord-Brabant, and Overijssel. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center) and funded by the Research Foundation Flanders (FWO) and the Flemish Government—Department EWI. Publisher Copyright: {\textcopyright} 2020 Author(s).",

year = "2020",

month = aug,

day = "7",

doi = "10.1063/5.0012285",

language = "English",

volume = "128",

journal = "Journal of Applied Physics",

issn = "0021-8979",

publisher = "American Institute of Physics Publishing LLC",

number = "5",

}

TY - JOUR

T1 - Small data materials design with machine learning: When the average model knows best

AU - Vanpoucke, Danny

AU - Bernaerts, Katrien

AU - Mehrkanoon, Siamak

AU - Hermans, Ko

AU - van Knippenberg, Onno

N1 - Funding Information: D.E.V.P. and K.V.B. acknowledge the project D-NL-HIT carried out in the framework of INTERREG-Program Deutschland-Nederland, which is co-financed by the European Union, the MWIDE NRW, the Ministerie van Economische Zaken en Klimaat, and the provinces of Limburg, Gelderland, Noord-Brabant, and Overijssel. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center) and funded by the Research Foundation Flanders (FWO) and the Flemish Government—Department EWI. Publisher Copyright: © 2020 Author(s).

PY - 2020/8/7

Y1 - 2020/8/7

N2 - Machine learning is quickly becoming an important tool in modern materials design. Where many of its successes are rooted in huge datasets, the most common applications in academic and industrial materials design deal with datasets of at best a few tens of data points. Harnessing the power of machine learning in this context is, therefore, of considerable importance. In this work, we investigate the intricacies introduced by these small datasets. We show that individual data points introduce a significant chance factor in both model training and quality measurement. This chance factor can be mitigated by the introduction of an ensemble-averaged model. This model presents the highest accuracy, while at the same time, it is robust with regard to changing the dataset size. Furthermore, as only a single model instance needs to be stored and evaluated, it provides a highly efficient model for prediction purposes, ideally suited for the practical materials scientist.

AB - Machine learning is quickly becoming an important tool in modern materials design. Where many of its successes are rooted in huge datasets, the most common applications in academic and industrial materials design deal with datasets of at best a few tens of data points. Harnessing the power of machine learning in this context is, therefore, of considerable importance. In this work, we investigate the intricacies introduced by these small datasets. We show that individual data points introduce a significant chance factor in both model training and quality measurement. This chance factor can be mitigated by the introduction of an ensemble-averaged model. This model presents the highest accuracy, while at the same time, it is robust with regard to changing the dataset size. Furthermore, as only a single model instance needs to be stored and evaluated, it provides a highly efficient model for prediction purposes, ideally suited for the practical materials scientist.

KW - Machine Learning

KW - Regression Analysis

KW - adhesives

KW - artificial intelligence (AI)

KW - coating

KW - featured article

KW - materials science

KW - scilight

KW - REGRESSION

KW - DIAMOND

KW - INFORMATION

KW - DISCOVERY

KW - MULTITARGET OPTIMIZATION

U2 - 10.1063/5.0012285

DO - 10.1063/5.0012285

M3 - Article

SN - 0021-8979

VL - 128

JO - Journal of Applied Physics

JF - Journal of Applied Physics

IS - 5

M1 - 054901

ER -