Finding a most parsimonious or likely tree in a network with respect to an alignment

Steven Kelk; Fabio Pardi; Celine Scornavacca; Leo Van Iersel

doi:10.1007/s00285-018-1282-2

Steven Kelk^*, Fabio Pardi, Celine Scornavacca, Leo Van Iersel

^*Corresponding author for this work

BioMathematics and BioInformatics

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Phylogenetic networks are often constructed by merging multiple conflicting phylogenetic signals into a directed acyclic graph. It is interesting to explore whether a network constructed in this way induces biologically relevant phylogenetic signals that were not present in the input. Here we show that, given a multiple alignment A for a set of taxa X and a rooted phylogenetic network N whose leaves are labelled by X, it is NP-hard to locate a most parsimonious phylogenetic tree displayed by N (with respect to A) even when the level of N (the maximum number of reticulation nodes within a biconnected component) is 1 and A contains only 2 distinct states. (If, additionally, gaps are allowed, the problem becomes APX-hard.) We also show that under the same conditions, and assuming a simple binary symmetric model of character evolution, finding a most likely tree displayed by the network is NP-hard. These negative results contrast with earlier work on parsimony in which it is shown that if A consists of a single column the problem is fixed-parameter tractable in the level. We conclude with a discussion of why, despite the NP-hardness, both the parsimony and likelihood problems can likely be solved well in practice.
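
To make the parsimony problem in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the authors' method. It assumes a binary rooted network given as a child-list dictionary, an alignment without gaps, and Fitch's small-parsimony algorithm for scoring each tree. All names (`fitch_score`, `best_displayed_tree`, the toy network and alignment) are hypothetical and chosen for illustration. The sketch tries every way of keeping one incoming edge per reticulation, so its running time grows as 2^r in the number r of reticulations, which is in keeping with the hardness result.

```python
from itertools import product


def fitch_score(children, root, states):
    """Fitch parsimony score of one alignment column on a rooted tree.

    `children` maps each internal node to its list of children and
    `states` maps each taxon (leaf) to its character state.
    Assumes at most two children per node, as in a binary network.
    """
    cost = 0

    def visit(node):
        nonlocal cost
        if node in states:                      # labelled leaf
            return {states[node]}
        sets = [s for s in (visit(c) for c in children.get(node, []))
                if s is not None]
        if not sets:                            # dead end left by edge deletion
            return None
        result = sets[0]
        for s in sets[1:]:
            if result & s:
                result = result & s             # agreement: no change needed
            else:
                result = result | s             # disagreement: one change
                cost += 1
        return result

    visit(root)
    return cost


def best_displayed_tree(children, root, alignment):
    """Return (score, kept parent per reticulation) of a most parsimonious
    displayed tree, found by exhaustive enumeration."""
    parents = {}
    for u, kids in children.items():
        for k in kids:
            parents.setdefault(k, []).append(u)
    reticulations = sorted(n for n, ps in parents.items() if len(ps) > 1)
    n_cols = len(next(iter(alignment.values())))

    best = None
    for choice in product(*(parents[r] for r in reticulations)):
        kept = dict(zip(reticulations, choice))
        # Keep every tree edge, and only the chosen edge into each reticulation.
        tree = {u: [k for k in kids if kept.get(k, u) == u]
                for u, kids in children.items()}
        score = sum(
            fitch_score(tree, root, {t: seq[i] for t, seq in alignment.items()})
            for i in range(n_cols)
        )
        if best is None or score < best[0]:
            best = (score, kept)
    return best


# Toy level-1 network on taxa a, b, c, d; node "r" is the only reticulation.
children = {
    "root": ["u", "v"],
    "u": ["a", "r"],
    "v": ["w", "d"],
    "w": ["c", "r"],
    "r": ["b"],
}
alignment = {"a": "000", "b": "110", "c": "111", "d": "001"}

score, kept = best_displayed_tree(children, "root", alignment)
print(score, kept)
```

In this toy instance the two displayed trees group b either with a or with c. Two of the three columns support grouping b with c, so the enumeration prefers the edge from `w` into the reticulation. Unary nodes left behind by deleted edges are passed through without cost, which gives the same score as suppressing them.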

Original language	English
Pages (from-to)	527-547
Number of pages	21
Journal	Journal of Mathematical Biology
Volume	78
Issue number	1-2
DOIs	https://doi.org/10.1007/s00285-018-1282-2
Publication status	Published - 1 Jan 2019

Keywords

Phylogenetic tree
Phylogenetic network
Maximum parsimony
Maximum likelihood
NP-hardness
APX-hardness
MAXIMUM-LIKELIHOOD
BAYESIAN-INFERENCE
RECOMBINATION
EVOLUTION
MODEL

Access to Document

10.1007/s00285-018-1282-2 (Licence: CC BY)

http://link.springer.com/10.1007/s00285-018-1282-2

Cite this

@article{0cebea1f8ecc487cba6d5a746e664c54,

title = "Finding a most parsimonious or likely tree in a network with respect to an alignment",

abstract = "Phylogenetic networks are often constructed by merging multiple conflicting phylogenetic signals into a directed acyclic graph. It is interesting to explore whether a network constructed in this way induces biologically relevant phylogenetic signals that were not present in the input. Here we show that, given a multiple alignment A for a set of taxa X and a rooted phylogenetic network N whose leaves are labelled by X, it is NP-hard to locate a most parsimonious phylogenetic tree displayed by N (with respect to A) even when the level of N (the maximum number of reticulation nodes within a biconnected component) is 1 and A contains only 2 distinct states. (If, additionally, gaps are allowed, the problem becomes APX-hard.) We also show that under the same conditions, and assuming a simple binary symmetric model of character evolution, finding a most likely tree displayed by the network is NP-hard. These negative results contrast with earlier work on parsimony in which it is shown that if A consists of a single column the problem is fixed-parameter tractable in the level. We conclude with a discussion of why, despite the NP-hardness, both the parsimony and likelihood problems can likely be solved well in practice.",

keywords = "Phylogenetic tree, Phylogenetic network, Maximum parsimony, Maximum likelihood, NP-hardness, APX-hardness, MAXIMUM-LIKELIHOOD, BAYESIAN-INFERENCE, RECOMBINATION, EVOLUTION, MODEL",

author = "Steven Kelk and Fabio Pardi and Celine Scornavacca and {Van Iersel}, Leo",

note = "Funding Information: Leo van Iersel was partly supported by the Netherlands Organization for Scientific Research (NWO), including Vidi grant 639.072.602, and partly by the 4TU Applied Mathematics Institute. Celine Scornavacca was partly supported by the French Agence Nationale de la Recherche Investissements d{\textquoteright}Avenir/Bioinformatique (ANR-10-BINF-01-02, Ancestrome). Publisher Copyright: {\textcopyright} 2018, The Author(s).",

year = "2019",

month = jan,

day = "1",

doi = "10.1007/s00285-018-1282-2",

language = "English",

volume = "78",

pages = "527--547",

journal = "Journal of Mathematical Biology",

issn = "0303-6812",

publisher = "Springer Verlag",

number = "1-2",

}

TY - JOUR

T1 - Finding a most parsimonious or likely tree in a network with respect to an alignment

AU - Kelk, Steven

AU - Pardi, Fabio

AU - Scornavacca, Celine

AU - Van Iersel, Leo

N1 - Funding Information: Leo van Iersel was partly supported by the Netherlands Organization for Scientific Research (NWO), including Vidi grant 639.072.602, and partly by the 4TU Applied Mathematics Institute. Celine Scornavacca was partly supported by the French Agence Nationale de la Recherche Investissements d’Avenir/Bioinformatique (ANR-10-BINF-01-02, Ancestrome). Publisher Copyright: © 2018, The Author(s).

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Phylogenetic networks are often constructed by merging multiple conflicting phylogenetic signals into a directed acyclic graph. It is interesting to explore whether a network constructed in this way induces biologically relevant phylogenetic signals that were not present in the input. Here we show that, given a multiple alignment A for a set of taxa X and a rooted phylogenetic network N whose leaves are labelled by X, it is NP-hard to locate a most parsimonious phylogenetic tree displayed by N (with respect to A) even when the level of N (the maximum number of reticulation nodes within a biconnected component) is 1 and A contains only 2 distinct states. (If, additionally, gaps are allowed, the problem becomes APX-hard.) We also show that under the same conditions, and assuming a simple binary symmetric model of character evolution, finding a most likely tree displayed by the network is NP-hard. These negative results contrast with earlier work on parsimony in which it is shown that if A consists of a single column the problem is fixed-parameter tractable in the level. We conclude with a discussion of why, despite the NP-hardness, both the parsimony and likelihood problems can likely be solved well in practice.

KW - Phylogenetic tree

KW - Phylogenetic network

KW - Maximum parsimony

KW - Maximum likelihood

KW - NP-hardness

KW - APX-hardness

KW - MAXIMUM-LIKELIHOOD

KW - BAYESIAN-INFERENCE

KW - RECOMBINATION

KW - EVOLUTION

KW - MODEL

U2 - 10.1007/s00285-018-1282-2

DO - 10.1007/s00285-018-1282-2

M3 - Article

C2 - 30121824

SN - 0303-6812

VL - 78

SP - 527

EP - 547

JO - Journal of Mathematical Biology

JF - Journal of Mathematical Biology

IS - 1-2

ER -