Abstract
Plant cells have two major organelles with their own genomes: chloroplasts and mitochondria. While chloroplast genomes tend to be structurally conserved, the mitochondrial genomes of plants, which are much larger than those of animals, are characterized by complex structural variation. We introduce TIPP_plastid, a user-friendly, reference-free tool that assembles organellar genomes from PacBio high-fidelity (HiFi) long-read data without relying on genomes from related species or on nuclear genome information. TIPP_plastid employs a deep learning model for initial read classification and leverages k-mer counting for further refinement, significantly reducing the impact of nuclear insertions of organellar DNA on the assembly process. We used TIPP_plastid to assemble a set of 54 complete chloroplast genomes; no other tool was able to completely assemble this set. TIPP_plastid outperforms PMAT in mitochondrial genome assembly, especially with respect to the completeness of protein-coding genes. We also used the assembled organelle genomes to identify nuclear insertions of plastid DNA (NUPTs) and of mitochondrial DNA (NUMTs). The cumulative length of NUPTs/NUMTs correlates positively with the size of the nuclear genome, suggesting that insertions occur stochastically. NUPTs/NUMTs show predominantly C:G to T:A changes, with the mutated cytosines typically found in CG and CHG contexts, suggesting that degradation of NUPT and NUMT sequences is driven by the known elevated mutation rate of methylated cytosines. Small interfering RNA (siRNA) loci are enriched in NUPTs and NUMTs, consistent with the RNA-directed DNA methylation (RdDM) pathway mediating DNA methylation in these sequences.
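To make the CG/CHG observation concrete, the following is a minimal illustrative sketch, not the authors' pipeline, of how C-to-T changes between an organelle sequence and its nuclear copy could be tallied by cytosine context (H stands for A, C, or T). The function names `cytosine_context` and `count_c_to_t_by_context` are hypothetical, and the sketch assumes the two sequences are already aligned without gaps and considers the forward strand only; G-to-A changes, which reflect C-to-T on the opposite strand, would need the reverse complement.

```python
from collections import Counter


def cytosine_context(seq: str, pos: int) -> str:
    """Classify the cytosine at seq[pos] as CG, CHG, or CHH (H = A, C, or T)."""
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    downstream = seq[pos + 1 : pos + 3]  # up to two bases 3' of the cytosine
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) == 2 and downstream[0] in "ACT":
        if downstream[1] == "G":
            return "CHG"
        if downstream[1] in "ACT":
            return "CHH"
    return "unknown"  # too close to the sequence end, or an ambiguous base


def count_c_to_t_by_context(organelle: str, insertion: str) -> Counter:
    """Count forward-strand C-to-T differences, grouped by cytosine context.

    Assumption: both sequences are the same length and already aligned
    without gaps (hypothetical input format for this sketch).
    """
    if len(organelle) != len(insertion):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for i, (ref, alt) in enumerate(zip(organelle.upper(), insertion.upper())):
        if ref == "C" and alt == "T":
            counts[cytosine_context(organelle, i)] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    print(count_c_to_t_by_context("ACGTCAGTCTT", "ATGTTAGTCTT"))
```

The context is read from the organelle sequence rather than the nuclear copy, on the assumption that the organelle copy approximates the sequence at the time of insertion.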
Publisher
Cold Spring Harbor Laboratory