Handling Logical Character Dependency in Phylogenetic Inference: Extensive Performance Testing of Assumptions and Solutions Using Simulated and Empirical Data

Published:2023-02-11 Issue:3 Volume:72 Page:662-680
ISSN:1063-5157
Container-title:Systematic Biology
language:en

Author:

Simões Tiago R¹, Vernygora Oksana V², de Medeiros Bruno A S³, Wright April M⁴

Affiliation:

1. Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts, USA

2. Department of Entomology, University of Kentucky, Lexington, Kentucky, USA

3. Smithsonian Tropical Research Institute, Panama City, Panama

4. Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, USA

Abstract

Logical character dependency is a major conceptual and methodological problem in phylogenetic inference of morphological data sets, as it violates the assumption of character independence that is common to all phylogenetic methods. It is more frequently observed in higher-level phylogenies or in data sets characterizing major evolutionary transitions, as these represent parts of the tree of life where (primary) anatomical characters either originate or disappear entirely. As a result, secondary traits related to these primary characters become “inapplicable” across all sampled taxa in which that character is absent. Various solutions have been explored over the last three decades to handle character dependency, such as alternative character coding schemes and, more recently, new algorithmic implementations. However, the accuracy of the proposed solutions, or the impact of character dependency across distinct optimality criteria, has never been directly tested using standard performance measures. Here, we utilize simple and complex simulated morphological data sets analyzed under different maximum parsimony optimization procedures and Bayesian inference to test the accuracy of various coding and algorithmic solutions to character dependency. This is complemented by empirical analyses using a recoded data set on palaeognathid birds. We find that in small, simulated data sets, absent coding performs better than other popular coding strategies available (contingent and multistate), whereas in more complex simulations (larger data sets controlled for different tree structure and character distribution models) contingent coding is favored more frequently. Under contingent coding, a recently proposed weighting algorithm produces the most accurate results for maximum parsimony.
However, Bayesian inference outperforms all parsimony-based solutions to handle character dependency due to fundamental differences in their optimization procedures—a simple alternative that has been long overlooked. Yet, we show that the more primary characters bearing secondary (dependent) traits there are in a data set, the harder it is to estimate the true phylogenetic tree, regardless of the optimality criterion, owing to a considerable expansion of the tree parameter space. [Bayesian inference, character dependency, character coding, distance metrics, morphological phylogenetics, maximum parsimony, performance, phylogenetic accuracy.]
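
The abstract names three coding strategies (absent, contingent, and multistate) without defining them. As an illustration only, the sketch below scores one hypothetical primary character (tail absent/present) and one dependent character (tail colour) under each scheme, following definitions commonly used in the morphological phylogenetics literature. The taxa, traits, state numbering, and function names are all assumptions made for this example and are not taken from the paper.

```python
# Hypothetical toy data: tail colour is only applicable when a tail exists.
# None means the taxon has no tail, so the secondary character is inapplicable.
observations = {
    "taxon_A": None,
    "taxon_B": "red",
    "taxon_C": "blue",
    "taxon_D": None,
}
COLOURS = ["red", "blue"]  # illustrative state order, an assumption


def contingent_coding(colour):
    """Primary presence/absence, plus a secondary character scored '-' when inapplicable."""
    if colour is None:
        return ["0", "-"]
    return ["1", str(COLOURS.index(colour))]


def absent_coding(colour):
    """Primary presence/absence, plus a secondary character with an extra 'absent' state 0."""
    if colour is None:
        return ["0", "0"]
    return ["1", str(COLOURS.index(colour) + 1)]


def multistate_coding(colour):
    """One composite character: 0 = no tail, 1..n = tail of a given colour."""
    if colour is None:
        return ["0"]
    return [str(COLOURS.index(colour) + 1)]


def build_matrix(coder):
    """Return a taxon -> character-string mapping under the chosen coding scheme."""
    return {taxon: "".join(coder(colour)) for taxon, colour in observations.items()}


if __name__ == "__main__":
    for name, coder in [
        ("contingent", contingent_coding),
        ("absent", absent_coding),
        ("multistate", multistate_coding),
    ]:
        print(name, build_matrix(coder))
```

In this toy matrix, absent coding scores tail absence twice for tailless taxa (once in each character), contingent coding leaves the dependent character as an inapplicable token, and multistate coding folds both characters into a single one. Those differences are what the accuracy comparisons in the abstract concern.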

Funder

Natural Sciences and Engineering Research Council of Canada

National Institute of General Medical Sciences

Smithsonian Institution

Publisher

Oxford University Press (OUP)

Subject

Genetics; Ecology, Evolution, Behavior and Systematics

Link

https://academic.oup.com/sysbio/advance-article-pdf/doi/10.1093/sysbio/syad006/50147581/syad006.pdf
