Small data materials design with machine learning: When the average model knows best

Danny Vanpoucke, Katrien Bernaerts*, Siamak Mehrkanoon, Ko Hermans, Onno van Knippenberg

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review



Machine learning is quickly becoming an important tool in modern materials design. Whereas many of its successes are rooted in huge datasets, the most common applications in academic and industrial materials design deal with datasets of at best a few tens of data points. Harnessing the power of machine learning in this context is, therefore, of considerable importance. In this work, we investigate the intricacies introduced by these small datasets. We show that individual data points introduce a significant chance factor in both model training and quality measurement. This chance factor can be mitigated by introducing an ensemble-averaged model. This model achieves the highest accuracy while also being robust to changes in dataset size. Furthermore, as only a single model instance needs to be stored and evaluated, it provides a highly efficient model for prediction purposes, ideally suited for the practical materials scientist.
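The chance factor and the ensemble-averaged model described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes an ordinary least-squares linear model as a stand-in (the abstract does not name the model class), random 80/20 train/test splits, and a synthetic 20-point dataset. The helper names (`fit_linear`, `ensemble_average_model`, and so on) are hypothetical. Each split yields its own test RMSE, and the spread of those values shows how much a quality estimate depends on which points land in the test set. Because the model is linear in its coefficients, averaging the coefficients collapses the whole ensemble into one model instance.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept column; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Evaluate the linear model b0 + X @ [b1..bk]."""
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def ensemble_average_model(X, y, n_splits=1000, test_fraction=0.2, seed=0):
    """Hypothetical helper: fit one model per random train/test split,
    record its test RMSE, and average all coefficient vectors into one model."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_fraction * n)))
    if n - n_test <= X.shape[1] + 1:
        raise ValueError("training split too small for the number of coefficients")
    coefs, test_errors = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n)           # random reshuffle of point indices
        test, train = perm[:n_test], perm[n_test:]
        coef = fit_linear(X[train], y[train])
        coefs.append(coef)
        test_errors.append(rmse(y[test], predict(coef, X[test])))
    # A single averaged coefficient vector is all that needs to be stored.
    return np.mean(coefs, axis=0), np.array(test_errors)


# Synthetic small dataset (assumed): 20 points, 2 features, light noise.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(20, 2))
y = 1.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.1, size=20)

avg_coef, errors = ensemble_average_model(X, y)
print("averaged coefficients:", avg_coef)
print("test RMSE across splits: min %.3f, max %.3f" % (errors.min(), errors.max()))
```

Averaging coefficients is equivalent to averaging the ensemble's predictions only for models that are linear in their parameters. For nonlinear model classes, the ensemble would have to be kept and its outputs averaged instead, which would lose the single-instance efficiency the abstract highlights.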
Original language: English
Article number: 054901
Number of pages: 12
Journal: Journal of Applied Physics
Issue number: 5
Publication status: Published, 7 Aug 2020


  • Machine Learning
  • Regression Analysis
  • adhesives
  • artificial intelligence (AI)
  • coating
  • featured article
  • materials science
  • scilight
