Abstract
Maximal homology alignment (MHA) is a new, biologically relevant approach to DNA sequence alignment that maps the internal, dispersed microhomology of individual sequences onto two dimensions. It departs from the current method of gapped alignment, which uses a simplified binary-state model of nucleotide position: in gapped alignment, a nucleotide position either has no relationship (1-to-None) or has an orthologous relationship (1-to-1) with nucleotides in other sequences. Maximal homology alignment, however, allows additional states such as 1-to-Many and Many-to-Many. It thus models both orthologous and paralogous relationships, which together comprise the main types of homology. MHA collects dispersed microparalogy into the same alignment columns on multiple rows, thereby generating a two-dimensional representation of a single sequence. Sequence alignment then proceeds as the alignment of two-dimensional topological objects. The operations of producing and aligning these two-dimensional auto-alignments motivate a need for tests of two-dimensional homological integrity. Here, I work out and implement basic principles for computationally testing the two dimensions of positional homology, which are inherent to biological sequences because of replication slippage and related errors. I then show that maximal homology alignment is more informative than gapped alignment in modeling the evolution of genetic sequences. In general, MHA is better suited when small insertions and deletions predominantly originate as local microparalogy. These results show that both conserved and non-conserved genomic sequences are enriched with a signature of replication slippage relative to their random permutations.
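The contrast between the binary state model of gapped alignment and the richer states allowed by MHA can be illustrated with a small sketch. Purely for illustration, it assumes that homology is given as a list of links between 0-based positions of two sequences, A and B, and it labels each position of A by the shape of its connected group of links: a lone position is 1-to-None, a single pair is 1-to-1, and groups with several positions on one or both sides are 1-to-Many, Many-to-1, or Many-to-Many. The function names `classify_positions` and `is_gapped` and this component-based rule are hypothetical; they are not the MHA algorithm or the integrity tests described here.

```python
from collections import defaultdict


def classify_positions(len_a, len_b, links):
    """Label each position of sequence A by the homology state of its link group.

    links: iterable of (i, j) pairs meaning position i of A is homologous to
    position j of B (0-based). Hypothetical illustration, not the MHA method.
    """
    parent = {}  # union-find forest over nodes ("A", i) and ("B", j)

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry

    # Every link merges the groups of its A position and its B position.
    for i, j in links:
        union(("A", i), ("B", j))

    # Count how many A and B positions fall into each connected group.
    counts = defaultdict(lambda: [0, 0])
    for i in range(len_a):
        counts[find(("A", i))][0] += 1
    for j in range(len_b):
        counts[find(("B", j))][1] += 1

    def label(n_a, n_b):
        left = "1" if n_a == 1 else "Many"
        right = {0: "None", 1: "1"}.get(n_b, "Many")
        return f"{left}-to-{right}"

    return [label(*counts[find(("A", i))]) for i in range(len_a)]


def is_gapped(links):
    """True if no position is linked more than once (the binary state model).

    Checks only the state restriction; it ignores the collinearity
    (non-crossing order of links) that a gapped alignment also requires.
    """
    a_pos = [i for i, _ in links]
    b_pos = [j for _, j in links]
    return len(set(a_pos)) == len(a_pos) and len(set(b_pos)) == len(b_pos)
```

For example, `classify_positions(5, 5, [(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (3, 4)])` returns `['1-to-1', '1-to-Many', 'Many-to-Many', 'Many-to-Many', '1-to-None']`. Under the binary state model of gapped alignment, only the 1-to-1 and 1-to-None labels can occur, which is the condition `is_gapped` checks.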
Publisher
Cold Spring Harbor Laboratory