Description
|
Abstract: While indel rate variation has been observed and analyzed in detail, it is not taken into account by current indel-aware phylogenetic reconstruction methods. In this work, we introduce a continuous-time stochastic process, the geometric Poisson indel process, that generalizes the Poisson indel process by allowing insertion and deletion rates to vary across sites. We design an efficient algorithm for computing the probability of a given multiple sequence alignment under our new indel model. We describe a method for constructing phylogeny estimates from a fixed alignment using neighbor joining. Using simulation studies, we show that ignoring indel rate variation may have a detrimental effect on the accuracy of the inferred phylogenies, and that our proposed method can sidestep this issue by inferring latent indel rate categories. We also show that our phylogenetic inference method may be more stable under taxon subsampling than methods that ignore either indels or indel rate variation. (2020-06-24)
Usage notes: Molluscan RNA data. This dataset was obtained from http://www.rna.icmb.utexas.edu/SIM/4D/Mollusk/alignment.gb. The nexus-format file provided here was converted from the original GenBank-format dataset using EMBOSS. The FASTA-format dataset used in the data analysis section of the paper can be obtained directly from https://github.com/yzhai220/geopip, together with the source code. File: molluscan.nexus.txt (2020-06-24)
|
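For readers who want a quick look at how gaps are distributed in the FASTA-format alignment before running any indel-aware analysis, the following is a minimal sketch using only the Python standard library. It is not part of the geopip source code and does not implement the geometric Poisson indel process; it only reads an alignment and reports the fraction of gap characters in each column. The function names `read_fasta` and `gap_fraction_per_column`, the choice of `-` and `.` as gap characters, and the toy sequences are assumptions made for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence name -> sequence string."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def gap_fraction_per_column(alignment, gap_chars="-."):
    """Return, for each alignment column, the fraction of sequences with a gap."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences differ in length; input is not an alignment")
    return [
        sum(s[i] in gap_chars for s in seqs) / len(seqs)
        for i in range(length)
    ]


# Toy alignment standing in for the real file (hypothetical data).
toy = StringIO(">taxonA\nAC-GU\n>taxonB\nAC--U\n>taxonC\nACGGU\n")
aln = read_fasta(toy)
fractions = gap_fraction_per_column(aln)
```

To apply the same functions to the real data, open the FASTA file from the repository in place of the `StringIO` object, for example `with open(path) as fh: aln = read_fasta(fh)`, where `path` is wherever the file was saved locally.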
Notes
| Dryad version number: 1
Version status: submitted
Dryad curation status: Published
Sharing link: https://datadryad.org/stash/share/8lPS2uIKU09qZwl0Y8iG_lK4CvPyUDS6yUXsIeX9hsI
Storage size: 84068
Visibility: public |