Abstract
Despite the convergent evolution of sex chromosomes in multiple organisms, little is known so far about the structure and content of the different Y (or W) chromosomes. The sequence characteristics acquired by the Y during its divergence from its X counterpart interfere with computational methods, making it difficult to correctly assemble and characterize Y-linked regions. Even when the rest of the genome is resolved at chromosome level, the Y chromosome appears fragmented into many unmapped scaffolds or erroneously assembled with autosomal regions. This limits insight into Y chromosomes and calls for computational tools that efficiently detect male-only regions in an assembly. Two novel Y-detection methods are presented here, R-CQ and KAMY, which revisit two classical approaches, one ratio-based and one k-mer-based, for the efficient characterization of Y-linked assembly regions. R-CQ and KAMY were benchmarked against their predecessors, CQ and YGS, on the identification of Y-linked regions in the genomes of two Tephritidae species, Bactrocera oleae and Ceratitis capitata, which have different Y morphologies and were sequenced with different methodologies. The efficiency and general applicability of the methods were further validated using the existing Y annotations of Drosophila melanogaster. The results indicate that R-CQ and KAMY outperform CQ and YGS in characterizing Y chromosomes from different lineages. In addition, the extensive manual curation of the results reported here describes the size and quality of the assembled sequence from the two Tephritidae Y chromosomes, and evaluates how the methods perform when dealing with validated mis-assemblies in contigs. We conclude with recommendations for future insect sequencing efforts that aim to include a contiguous Y-chromosome assembly.
Publisher
Cold Spring Harbor Laboratory