Quantification of biases in predictions of protein stability changes upon mutations

Fabrizio Pucci, Katrien Bernaerts, Jean Marc Kwasigroch, Marianne Rooman

Research output: Contribution to journal > Article > Academic > peer-review

Abstract

Motivation: Bioinformatics tools that predict protein stability changes upon point mutations have made a lot of progress in the last decades and have become accurate and fast enough to make computational mutagenesis experiments feasible, even on a proteome scale. Despite these achievements, they still suffer from important issues that must be solved to allow further improving their performances and utilizing them to deepen our insights into protein folding and stability mechanisms. One of these problems is their bias toward the learning datasets which, being dominated by destabilizing mutations, causes predictions to be better for destabilizing than for stabilizing mutations.

Results: We thoroughly analyzed the biases in the prediction of folding free energy changes upon point mutations (ΔΔG⁰) and proposed some unbiased solutions. We started by constructing a dataset S-sym of experimentally measured ΔΔG⁰ values with an equal number of stabilizing and destabilizing mutations, by collecting mutations for which the structure of both the wild-type and mutant protein is available. On this balanced dataset, we assessed the performances of 15 widely used ΔΔG⁰ predictors. After the astonishing observation that almost all these methods are strongly biased toward destabilizing mutations, especially those that use black-box machine learning, we proposed an elegant way to solve the bias issue by imposing physical symmetries under inverse mutations on the model structure, which we implemented in PoPMuSiCˢʸᵐ. This new predictor constitutes an efficient trade-off between accuracy and absence of biases. Some final considerations and suggestions for further improvement of the predictors are discussed.
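The symmetry under inverse mutations mentioned above means that the predicted free energy change of a mutation and that of its inverse should cancel: whatever the sign convention, ΔΔG⁰(mutant → wild-type) = −ΔΔG⁰(wild-type → mutant). The following is a minimal Python sketch of how such a bias could be measured and how an arbitrary predictor could be made antisymmetric by construction. It is an illustration only, not the authors' implementation of PoPMuSiCˢʸᵐ: the function names `bias_metrics`, `symmetrize` and `toy_predictor`, the predictor signature, and the toy scores are all assumptions introduced here.

```python
from statistics import mean


def bias_metrics(direct, inverse):
    """Return (r, mean_delta) for paired predictions.

    direct[i]  : predicted ddG for mutation i (wild-type -> mutant)
    inverse[i] : predicted ddG for the inverse mutation (mutant -> wild-type)

    An unbiased, antisymmetric predictor gives r close to -1 and a mean
    of (direct + inverse) close to 0.
    """
    if len(direct) != len(inverse) or len(direct) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_delta = mean(d + i for d, i in zip(direct, inverse))
    # Pearson correlation between direct and inverse predictions.
    md, mi = mean(direct), mean(inverse)
    cov = sum((d - md) * (i - mi) for d, i in zip(direct, inverse))
    var_d = sum((d - md) ** 2 for d in direct)
    var_i = sum((i - mi) ** 2 for i in inverse)
    if var_d == 0 or var_i == 0:
        raise ValueError("correlation undefined for constant predictions")
    return cov / (var_d * var_i) ** 0.5, mean_delta


def symmetrize(predict):
    """Wrap a predictor so its output is antisymmetric by construction.

    `predict(wild, mutant)` is a hypothetical callable returning a ddG value.
    """
    def symmetric_predict(wild, mutant):
        return 0.5 * (predict(wild, mutant) - predict(mutant, wild))
    return symmetric_predict


# Toy predictor with made-up residue scores and a constant offset that
# mimics a systematic bias; the numbers carry no physical meaning.
TOY_SCORES = {"A": 0.3, "G": -0.2}


def toy_predictor(wild, mutant):
    return TOY_SCORES[mutant] - TOY_SCORES[wild] + 1.0


if __name__ == "__main__":
    sym = symmetrize(toy_predictor)
    print(toy_predictor("A", "G") + toy_predictor("G", "A"))  # nonzero: biased
    print(sym("A", "G") + sym("G", "A"))  # cancels by construction
```

Averaging the direct and negated inverse predictions is only one simple way to enforce the symmetry after the fact; the abstract describes imposing it on the model structure itself, which this sketch does not attempt to reproduce.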

Original language: English
Pages (from-to): 3659-3665
Number of pages: 7
Journal: Bioinformatics
Volume: 34
Issue number: 21
DOIs: 10.1093/bioinformatics/bty348
Publication status: Published - 1 Nov 2018

Keywords

  • PERFORMANCE
  • SERVER

Cite this

Pucci, Fabrizio; Bernaerts, Katrien; Kwasigroch, Jean Marc; Rooman, Marianne. / Quantification of biases in predictions of protein stability changes upon mutations. In: Bioinformatics. 2018; Vol. 34, No. 21. pp. 3659-3665.
@article{ffd24531191b4ea19d347304eb261802,
title = "Quantification of biases in predictions of protein stability changes upon mutations",
abstract = "Motivation: Bioinformatics tools that predict protein stability changes upon point mutations have made a lot of progress in the last decades and have become accurate and fast enough to make computational mutagenesis experiments feasible, even on a proteome scale. Despite these achievements, they still suffer from important issues that must be solved to allow further improving their performances and utilizing them to deepen our insights into protein folding and stability mechanisms. One of these problems is their bias toward the learning datasets which, being dominated by destabilizing mutations, causes predictions to be better for destabilizing than for stabilizing mutations. Results: We thoroughly analyzed the biases in the prediction of folding free energy changes upon point mutations ($\Delta\Delta G^0$) and proposed some unbiased solutions. We started by constructing a dataset S-sym of experimentally measured $\Delta\Delta G^0$ values with an equal number of stabilizing and destabilizing mutations, by collecting mutations for which the structure of both the wild-type and mutant protein is available. On this balanced dataset, we assessed the performances of 15 widely used $\Delta\Delta G^0$ predictors. After the astonishing observation that almost all these methods are strongly biased toward destabilizing mutations, especially those that use black-box machine learning, we proposed an elegant way to solve the bias issue by imposing physical symmetries under inverse mutations on the model structure, which we implemented in PoPMuSiC$^{\rm sym}$. This new predictor constitutes an efficient trade-off between accuracy and absence of biases. Some final considerations and suggestions for further improvement of the predictors are discussed.",
keywords = "PERFORMANCE, SERVER",
author = "Fabrizio Pucci and Katrien Bernaerts and Kwasigroch, {Jean Marc} and Marianne Rooman",
year = "2018",
month = "11",
day = "1",
doi = "10.1093/bioinformatics/bty348",
language = "English",
volume = "34",
pages = "3659--3665",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "21",

}

