Abstract
Motivation
Non-coding RNAs (ncRNAs) play important roles in various biological processes. In the past, homologous ncRNA search at genomic scale (e.g., searching for all house mouse ncRNAs homologous to several human ones) has been difficult, because explicitly considering secondary structure during alignment leads to impractical time and space complexity.
Results
In this study, we developed the CRAST algorithm, a "seed-and-extend" alignment algorithm based on adaptive seeds and RNA secondary structure context (motif probabilities), as outlined in Fig. 1. We implemented it in the program CRAST (Context RNA Alignment Search Tool), available at https://github.com/heartsh/crast together with the validation/test set used. Thanks to adaptive seeds, the algorithm runs in O(n) time, where n is the sum of the lengths of the target sequences, while implicitly considering both sequence and secondary structure. Its computation time is comparable to that of other BLAST-like tools and substantially lower than that of any variant of the Sankoff algorithm, which considers secondary structure explicitly during alignment. CRAST detects as many homologs as other BLAST-like tools while reporting the lowest number of non-homologous ncRNAs.
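To make the "seed-and-extend" idea with adaptive seeds concrete, here is a minimal, sequence-only sketch. It is not CRAST's implementation: it ignores the secondary structure context (motif probabilities) entirely, and it uses a naive suffix array rather than a linear-time index. The adaptive-seed rule shown (grow each seed until it occurs at most `max_freq` times in the target), the ungapped X-drop extension, and the parameters `max_freq`, `xdrop`, `match` and `mismatch` are illustrative assumptions, and all function names are hypothetical.

```python
from __future__ import annotations


def build_suffix_array(text: str) -> list[int]:
    # Naive construction (sort all suffixes); fine for a toy example,
    # but NOT the linear-time indexing a real tool would use.
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_range(text: str, sa: list[int], pattern: str) -> tuple[int, int]:
    # Binary search for the block of suffixes that start with `pattern`.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def adaptive_seed(query, i, target, sa, max_freq):
    # Hypothetical adaptive-seed rule: lengthen the seed starting at query[i]
    # until it occurs at most `max_freq` times in the target.
    for k in range(1, len(query) - i + 1):
        lo, hi = suffix_range(target, sa, query[i:i + k])
        if hi == lo:
            return None  # no exact match at all
        if hi - lo <= max_freq:
            return k, sorted(sa[lo:hi])
    return None  # still too frequent at the end of the query


def extend(query, qi, target, ti, k, match=1, mismatch=-1, xdrop=3):
    # Ungapped X-drop extension to both sides of an exact seed of length k.
    def ungapped(q_positions, t_positions):
        best, best_len, score = 0, 0, 0
        for n, (a, b) in enumerate(zip(q_positions, t_positions), start=1):
            score += match if query[a] == target[b] else mismatch
            if score > best:
                best, best_len = score, n
            elif best - score > xdrop:
                break  # score dropped too far below the best seen
        return best, best_len

    right, r_len = ungapped(range(qi + k, len(query)), range(ti + k, len(target)))
    left, l_len = ungapped(range(qi - 1, -1, -1), range(ti - 1, -1, -1))
    score = k * match + left + right
    # (score, query start, target start, alignment length)
    return score, qi - l_len, ti - l_len, l_len + k + right and l_len + k + r_len


def search(query, target, max_freq=2, xdrop=3):
    # Return the best-scoring ungapped hit, or None if no seed is found.
    sa = build_suffix_array(target)
    best = None
    for i in range(len(query)):
        seed = adaptive_seed(query, i, target, sa, max_freq)
        if seed is None:
            continue
        k, positions = seed
        for t in positions:
            hit = extend(query, i, target, t, k, xdrop=xdrop)
            if best is None or hit[0] > best[0]:
                best = hit
    return best


if __name__ == "__main__":
    print(search("GGCAUCCA", "AAAGGCAUCCAUUU"))  # (8, 0, 3, 8)
```

Rare seeds keep the number of extensions per query position bounded by `max_freq`, which is the intuition behind adaptive seeding; a structure-aware tool would additionally score extensions using structural context rather than plain base identity.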
Publisher
Cold Spring Harbor Laboratory