Longest Order Conserved Exemplar Subsequences

Author:

Zhang Shu, Pu Lianrong, Yang Runmin, Wang Luli, Zhu Daming, Jiang Haitao

Abstract

We propose a new problem whose input is two linear genomes together with two indexed gene subsequences of them. The problem asks for a longest common exemplar subsequence of the two genomes that contains a subsequence identical to the given indexed gene subsequences. We present an algorithm for this problem that takes less time and space as the number of indexed genes is increased. Even when more indexed genes were selected, the algorithm was verified to reach a solution whose length consistently comes very close to that of a true longest common exemplar subsequence of the two genomes.

On 23 human/gorilla chromosome pairs, we examined the algorithm's use in searching for longest common exemplar subsequences whose basic units are annotated genes, as well as pseudo genes, namely consecutive DNA subsequences. By comparing the pseudo gene common exemplar subsequences that the algorithm found for human chromosomes 7 and 16 and their gorilla homologues against the annotated genes in the human and gorilla chromosomes, we found more than 1 000 and 500 pseudo genes in human chromosomes 7 and 16, respectively, that occur in the same order as in gorilla chromosomes 7 and 16 and do not overlap with any annotated gene.

Author summary

A benefit of the algorithm is that it can reach a sufficiently long common exemplar subsequence of two linear genomes as fast as one requires, even if the given genomes contain many duplicated genes; this is done by increasing the number of indexed genes.

We developed Java software based on the algorithm, which is available for download at https://github.com/ShuZhang-sdu/LCES. Simply by setting the indexed gene sequences to null, our algorithm was verified to obtain the longest common exemplar subsequences of the annotated gene summary pairs extracted from the 23 human/gorilla chromosome pairs.

To help researchers find new motifs or conserved genes, we also applied the algorithm to the pseudo gene (i.e. consecutive DNA subsequence) summary pairs of the 23 human/gorilla chromosome pairs. For 20 of the pseudo gene summary pairs, the algorithm found longest common exemplar subsequences with null indexed gene sequences. For the other 3 pseudo gene summary pairs, the algorithm was verified to reach longest common exemplar subsequences that are required to contain subsequences identical to given indexed gene subsequences. The results show 2 353 and 1 148 pseudo genes in gorilla chromosomes 7 and 16, respectively, that occur in the same order as in human chromosomes 7 and 16 and do not overlap with any annotated gene. These pseudo genes should be significant for annotating the human or gorilla genome.
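
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the authors' algorithm and not their Java software. It rests on two assumptions that the abstract does not spell out: that an exemplar subsequence is one in which no gene occurs more than once, and that the indexed gene subsequences can be simplified to a single sequence of gene labels (`required`) that the answer must contain in order. The function names `is_subsequence` and `brute_force_lces` are hypothetical, and the exhaustive search is only practical for toy inputs.

```python
from itertools import combinations


def is_subsequence(small, big):
    """Return True if `small` can be obtained from `big` by deleting elements."""
    it = iter(big)
    # `gene in it` advances the iterator, so order is respected.
    return all(gene in it for gene in small)


def brute_force_lces(genome_a, genome_b, required=()):
    """Exhaustively search for a longest common subsequence of the two genomes
    that repeats no gene (assumed meaning of "exemplar") and contains
    `required` as a subsequence. Returns None if no such subsequence exists."""
    max_len = min(len(genome_a), len(genome_b))
    # Try the longest candidates first, so the first hit is a longest one.
    for length in range(max_len, -1, -1):
        for positions in combinations(range(len(genome_a)), length):
            candidate = [genome_a[p] for p in positions]
            if len(set(candidate)) != len(candidate):
                continue  # a gene occurs twice: not exemplar
            if is_subsequence(candidate, genome_b) and is_subsequence(required, candidate):
                return candidate
    return None


# Toy genomes: gene "a" is duplicated, and "b"/"c" appear in opposite orders.
genome_a = ["a", "b", "c", "a", "d"]
genome_b = ["a", "c", "b", "a", "d"]

print(brute_force_lces(genome_a, genome_b))          # ['a', 'b', 'd']
print(brute_force_lces(genome_a, genome_b, ("c",)))  # ['a', 'c', 'd']
```

Passing an empty `required` corresponds to the "null indexed gene sequences" setting mentioned above, where the search is unconstrained. Fixing more required genes narrows the set of admissible candidates, which is in the spirit of the time and space savings the summary describes, although this sketch does not exploit that narrowing.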

Publisher

Cold Spring Harbor Laboratory
