Abstract
Two-dimensional graphical dotplotting is adopted to identify sequence elements and their variants in lengths of DNA of up to 10 kb. The approach, named GCAT, identifies precisely defined short sequences and their variants, and it complements the precise matching performed by many computational programs, including BLAST. Short reiterated “search” sequences are entered on the Y-axis of the dotplot program and matched at their identical and near-identical (variant) sites in a sequence of interest entered on the X-axis. The result is a barcode-like representation of the identified sequence elements along the X-axis of the dotplot. Aligning searches with sequence landmarks allows composition and juxtapositions to be visualized. The method is described here through the characterization of three distinctive sequences available in the annotated Drosophila melanogaster reference genome (www.flybase.org): the Jonah 99C gene region, the transcript of Dipeptidase B (Dip-B), and the transposable element (TE) roo. Surprising observations emerging from these explorations include in-frame STOP codons in the large exonic intron of Dip-B, a high A content in the replicative strand of roo as an example TE, and similarities between its ORF and the large intron of Dip-B.
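The matching idea described above can be illustrated with a minimal Python sketch. This is not the GCAT workflow itself, which relies on a graphical dotplot program. It is a simplified stand-in that slides one short search sequence along a sequence of interest and marks identical and near-identical sites in a text "barcode". The function names `find_sites` and `barcode`, the `max_mismatches` parameter, and the example sequence are all hypothetical and chosen only for illustration.

```python
def find_sites(sequence, search, max_mismatches=1):
    """Return (position, mismatches) for each window of `sequence` that
    matches `search` with at most `max_mismatches` substitutions.

    Simplified assumption: variants are substitutions only (no indels).
    """
    sequence, search = sequence.upper(), search.upper()
    k = len(search)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        # Count positions where the window differs from the search sequence.
        mismatches = sum(a != b for a, b in zip(window, search))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


def barcode(sequence, search, max_mismatches=1):
    """Render one text row along the sequence (the X-axis):
    '|' marks the start of an identical site, ':' the start of a
    variant site, and '.' every other position."""
    row = ["."] * len(sequence)
    for pos, mismatches in find_sites(sequence, search, max_mismatches):
        row[pos] = "|" if mismatches == 0 else ":"
    return "".join(row)


if __name__ == "__main__":
    # Hypothetical toy sequence; "TAA" stands in for a short search sequence.
    toy = "ATGTAAGGTAGCCTAA"
    print(toy)
    print(barcode(toy, "TAA"))
```

Stacking one such row per search sequence would approximate the aligned searches described in the abstract, with each row playing the role of one Y-axis entry.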
Publisher
Cold Spring Harbor Laboratory