FASTCAR: Rapid alignment-free prediction of sequence alignment identity scores using self-supervised general linear models

Published: 2018-07-31

Author:

James Benjamin T., Luczak Brian B., Girgis Hani Z.

Abstract

Motivation: Pairwise alignment is a predominant algorithm in bioinformatics. It runs in quadratic time, which makes it slow, especially on long sequences. Many applications use identity scores without the corresponding alignments. For these applications, we propose FASTCAR, which produces identity scores for pairs of DNA sequences using alignment-free methods and two self-supervised general linear models.

Results: For the first time, the new tool can predict the pairwise identity score in linear time and space. On two large-scale sequence databases, FASTCAR provided the best compromise between sensitivity and precision while being 40% faster than BLAST and 6–10 times faster than USEARCH. Further, FASTCAR can produce pairwise identity scores for long DNA sequences, namely bacterial genomes that are millions of nucleotides long; this task cannot be accomplished by any alignment-based tool.

Availability: FASTCAR is available at https://github.com/TulsaBioinformaticsToolsmith/FASTCAR and as Supplementary Dataset 1.

Contact: hani-girgis@utulsa.edu

Supplementary information: Supplementary data are available online.
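The general idea in the abstract, predicting an identity score from alignment-free statistics with a linear model trained on self-generated data, can be illustrated with a toy sketch. This is not FASTCAR's implementation. The abstract does not say which statistics or training procedure the tool uses, so everything below is an assumption made for illustration: the single k-mer histogram feature, the substitution-only mutation scheme, the positional identity label (only an approximation of an alignment identity score), and the one-feature least-squares fit, where the abstract mentions two general linear models. All function names are hypothetical.

```python
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in one pass (linear in len(seq))."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def histogram_similarity(a, b):
    """Sum of per-k-mer minimum counts divided by sum of maximum counts."""
    union = sum((a | b).values())  # Counter | Counter keeps the max count
    return sum((a & b).values()) / union if union else 0.0


def mutate(seq, rate, rng):
    """Substitute each base with probability `rate`.

    Returns the mutated copy and the fraction of unchanged positions,
    used here as a stand-in label for the identity score (assumption).
    """
    out, same = [], 0
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
            same += 1
    return "".join(out), same / len(seq)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train(k=8, length=2000, n_templates=20, seed=0):
    """Self-supervised training: labels come from our own mutations."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_templates):
        template = "".join(rng.choice("ACGT") for _ in range(length))
        template_counts = kmer_counts(template, k)
        for rate in (0.0, 0.05, 0.1, 0.2, 0.3, 0.4):
            copy, identity = mutate(template, rate, rng)
            xs.append(histogram_similarity(template_counts, kmer_counts(copy, k)))
            ys.append(identity)
    return fit_line(xs, ys)


def predict_identity(seq1, seq2, model, k=8):
    """Predict an identity score in [0, 1] without aligning the sequences."""
    slope, intercept = model
    x = histogram_similarity(kmer_counts(seq1, k), kmer_counts(seq2, k))
    return min(1.0, max(0.0, slope * x + intercept))
```

Calling `model = train()` once and then `predict_identity(a, b, model)` for each pair needs only a pass over each sequence to build its k-mer histogram, which is where the linear time and space behaviour of this kind of approach comes from. A single linear feature is a crude fit for the curved relationship between shared k-mers and identity, so the sketch shows the workflow rather than realistic accuracy.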

Publisher

Cold Spring Harbor Laboratory

References (56 articles)

1. A measure of the similarity of sets of sequences not requiring sequence alignment

2. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis

3. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

Cited by 4 articles.

1. LtrDetector: A tool-suite for detecting long terminal repeat retrotransposons de-novo;BMC Genomics;2019-06-03

2. MeShClust2: Application of alignment-free identity scores in clustering long DNA sequences;2018-10-24

3. Look4TRs: A de-novo tool for detecting simple tandem repeats using self-supervised hidden Markov models;2018-10-23

4. LtrDetector: A modern tool-suite for detecting long terminal repeat retrotransposons de-novo on the genomic scale;2018-10-22