Author:
Luo Junwei,Gao Runtian,Chang Wenjing,Wang Junfeng
Abstract
The role and biological impact of structural variation (SV) are increasingly evident. Deletion accounts for 40% of SV and is an important type of SV. Therefore, it is of great significance to detect and genotype deletions. At present, high accurate long reads can be obtained as HiFi reads. And, through a combination of error-prone long reads and high accurate short reads, we can also get accurate long reads. These accurate long reads are helpful for detecting and genotyping SVs. However, due to the complexity of genome and alignment information, detecting and genotyping SVs remain a challenging task. Here, we propose LSnet, an approach for detecting and genotyping deletions with a deep learning network. Because of the ability of deep learning to learn complex features in labeled datasets, it is beneficial for detecting SV. First, LSnet divides the reference genome into continuous sub-regions. Based on the alignment between the sequencing data (the combination of error-prone long reads and short reads or HiFi reads) and the reference genome, LSnet extracts nine features for each sub-region, and these features are considered as signal of deletion. Second, LSnet uses a convolutional neural network and an attention mechanism to learn critical features in every sub-region. Next, in accordance with the relationship among the continuous sub-regions, LSnet uses a gated recurrent units (GRU) network to further extract more important deletion signatures. And a heuristic algorithm is present to determine the location and length of deletions. Experimental results show that LSnet outperforms other methods in terms of the F1 score. The source code is available from GitHub at https://github.com/eioyuou/LSnet.
Subject
Genetics (clinical),Genetics,Molecular Medicine