Abstract
Motivation
Statistical dependencies between nucleotides at different positions within a DNA sequence have been used for several purposes, including distinguishing coding from noncoding regions of a genome. Coding sequences show correlations within and between codon positions. This study asked whether such correlations between positions separated by short distances might also exist in noncoding DNA. To this end, positional nucleotide dependencies were examined in the promoter regions of four eukaryotic species: Homo sapiens (Hs), Mus musculus (Mm), Drosophila melanogaster (Dm), and Saccharomyces cerevisiae (Sc). The degree of dependency between pairs of positions across a set of aligned sequences was quantified by Mutual Information (MI) and visualized using a novel heatmap method.
Results
MI in promoter sequences aligned at their putative Transcription Start Site (TSS) generally decreased with increasing distance between two positions. It also showed a prominent increase at a distance of 6 base pairs (bp) (i.e., between nucleotides at x and x + 6 in a sequence) in the three multicellular species, but much less so in Sc. This dependency at a distance of 6 bp appears to reflect an N1…N7 homonucleotide bias in promoters.
Availability
R code and data files are available at github/dannemil/promoters.
Contact
dannemil@rice.edu
Supplementary information
Dannemiller-supplementary-documents.zip
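As an illustration of the MI measure named in the abstract, the following is a minimal sketch of a plug-in (frequency-based) estimate of Mutual Information between two columns of a set of aligned sequences, together with its average over all position pairs at a given separation. It is not the authors' R code: the function names `mutual_information` and `mi_by_distance` are hypothetical, logarithms are taken in base 2 (bits), and no small-sample bias correction is applied, all of which are assumptions made here for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(seqs, i, j):
    """Plug-in MI (in bits) between columns i and j of equal-length aligned sequences."""
    n = len(seqs)
    pair_counts = Counter((s[i], s[j]) for s in seqs)  # joint counts of (base at i, base at j)
    i_counts = Counter(s[i] for s in seqs)             # marginal counts at column i
    j_counts = Counter(s[j] for s in seqs)             # marginal counts at column j
    mi = 0.0
    for (a, b), c in pair_counts.items():
        # p(a,b) / (p(a) * p(b)) simplifies to c * n / (count_i[a] * count_j[b])
        ratio = c * n / (i_counts[a] * j_counts[b])
        mi += (c / n) * log2(ratio)
    return mi


def mi_by_distance(seqs, max_dist):
    """Average MI over all column pairs (x, x + d) for each distance d = 1..max_dist."""
    length = len(seqs[0])
    result = {}
    for d in range(1, max_dist + 1):
        values = [mutual_information(seqs, x, x + d) for x in range(length - d)]
        result[d] = sum(values) / len(values)
    return result


if __name__ == "__main__":
    # Toy alignment (hypothetical data): every column is identical, so MI is maximal.
    toy = ["AAA", "CCC", "GGG", "TTT"]
    print(mi_by_distance(toy, 2))
```

Under these assumptions, a peak at a separation of 6 bp would appear as a value at `mi_by_distance(seqs, max_dist)[6]` that is larger than the values at neighbouring distances. With real promoter sets, finite-sample bias in the plug-in estimate would need to be considered before interpreting small differences.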
Publisher
Cold Spring Harbor Laboratory