Abstract
Multiple sequence alignment plays a central role in many biological analyses. Aligning multiple biological sequences is a complex task, however, so many tools have been developed to align sequences under a biologically inspired objective function. These tools require a user-defined parameter vector which, if chosen incorrectly, can greatly impact downstream analysis. Parameter Advising addresses the challenge of selecting input-specific parameter vectors by comparing alignments produced by a carefully constructed set of parameter configurations. Ideally, accuracy would be used to rank the alignments, but in practice no reference alignment is available from which accuracy can be calculated. It is therefore necessary to estimate the accuracy in order to rank alignments. The accuracy estimator Facet computes an estimate of accuracy as a linear combination of efficiently computable feature functions. In this work we introduce two versions of Lead (short for Learned accuracy estimator from large datasets), which use the same underlying feature functions as Facet but are built on top of highly efficient machine learning protocols, allowing us to take advantage of a larger training corpus. This produces an estimator that is more correlated with accuracy. For Parameter Advising, Lead shows an increase of 6% on testing data over using only the default parameter vector.
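The advising step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the Facet or Lead implementation: the function names `estimate_accuracy` and `advise`, the configuration labels, the weights, and the feature values are all hypothetical and chosen only to show how a linear-combination estimator can rank candidate alignments.

```python
from typing import Dict, Sequence, Tuple


def estimate_accuracy(features: Sequence[float], weights: Sequence[float]) -> float:
    """Estimate accuracy as a linear combination of feature values."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def advise(
    candidates: Dict[str, Sequence[float]], weights: Sequence[float]
) -> Tuple[str, float]:
    """Pick the parameter configuration whose alignment scores highest.

    `candidates` maps a configuration label to the feature values computed
    on the alignment that configuration produced.
    """
    if not candidates:
        raise ValueError("need at least one candidate alignment")
    scored = {
        name: estimate_accuracy(feats, weights)
        for name, feats in candidates.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical weights and feature values, for illustration only.
weights = [0.5, 0.3, 0.2]
candidates = {
    "default": [0.6, 0.5, 0.4],
    "config_a": [0.8, 0.4, 0.5],
    "config_b": [0.5, 0.9, 0.3],
}

best, score = advise(candidates, weights)
print(best, round(score, 3))
```

In this sketch the estimator only ranks the candidates; no reference alignment is consulted, which mirrors the situation the abstract describes. A learned estimator would replace the fixed weights with a model fitted on training data.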
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.