Pitfalls in supermatrix phylogenomics

Philippe, Hervé; Vienne, Damien M. De; Ranwez, Vincent; Roure, Béatrice; Baurain, Denis; Delsuc, Frédéric

doi:10.5852/ejt.2017.283

Article (Scientific journals)

2017 • In European Journal of Taxonomy, 283, p. 1-25

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/207688

Details

Keywords :

supermatrix; systematic error; data quality; incongruence

Abstract :

In the mid-2000s, molecular phylogenetics turned into phylogenomics, a development that improved the resolution of phylogenetic trees through a dramatic reduction in stochastic error. While some then predicted “the end of incongruence”, it soon appeared that analysing large amounts of sequence data without an adequate model of sequence evolution amplifies systematic error and leads to phylogenetic artefacts. With the increasing flood of (sometimes low-quality) genomic data resulting from the rise of high-throughput sequencing, a new type of error has emerged. Termed here “data errors”, it lumps together several kinds of issues affecting the construction of phylogenomic supermatrices (e.g., sequencing and annotation errors, contaminant sequences). While easy to deal with at a single-gene scale, such errors become very difficult to avoid at the genomic scale, both because hand curating thousands of sequences is prohibitively time-consuming and because the suitable automated bioinformatics tools are still in their infancy. In this paper, we first review the pitfalls affecting the construction of supermatrices and the strategies to limit their adverse effects on phylogenomic inference. Then, after discussing the relative non-issue of missing data in supermatrices, we briefly present the approaches commonly used to reduce systematic error.

Disciplines :

Biochemistry, biophysics & molecular biology
Genetics & genetic processes

Author, co-author :

Philippe, Hervé

Vienne, Damien M. De

Ranwez, Vincent

Roure, Béatrice

Baurain, Denis ; Université de Liège > Département des sciences de la vie > Phylogénomique des eucaryotes

Delsuc, Frédéric

Language :

English

Title :

Pitfalls in supermatrix phylogenomics

Publication date :

2017

Journal title :

European Journal of Taxonomy

ISSN :

2118-9773

Publisher :

MNHN, Paris, France

Volume :

283

Pages :

1-25

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

http://www.europeanjournaloftaxonomy.eu/index.php/ejt/article/view/407

Available on ORBi :

since 25 February 2017

