Pre-training with pseudo-labeling for regulatory sequence prediction

Published: 2023-12-23

Author:

Mourad Raphael

Abstract

Predicting molecular processes using deep learning is a promising approach to provide biological insights for non-coding SNPs identified in genome-wide association studies. However, most deep learning methods rely on supervised learning, which requires DNA sequences associated with functional data, the amount of which is severely limited by the finite size of the human genome. Conversely, the amount of mammalian DNA sequences is growing exponentially due to ongoing large-scale sequencing projects, but in most cases without functional data. To alleviate the limitations of supervised learning, we propose a novel semi-supervised learning approach based on pseudo-labeling, which makes it possible to exploit unannotated DNA sequences from numerous genomes during model pre-training. The approach is very flexible and can be used to train any neural architecture, including state-of-the-art models, and in certain situations shows strong predictive performance improvements compared to standard supervised learning.
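The abstract does not say how pseudo-labels are assigned, so the following is only a minimal sketch of generic pseudo-labeling (self-training) used as a pre-training step, not the author's method. Everything in it is an assumption made for illustration: the logistic-regression model standing in for a neural network, the teacher model that labels unannotated sequences, the confidence threshold, the toy "TATA" motif data, and all function names.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(seq):
    """Encode a DNA string as a flat one-hot vector (column order A, C, G, T)."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, idx[base]] = 1.0
    return x.ravel()

def predict_proba(X, w):
    """Logistic-regression probabilities; the last weight is the bias."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def train_logreg(X, y, w=None, epochs=200, lr=0.5):
    """Gradient descent on the logistic loss, optionally starting from weights w."""
    w = np.zeros(X.shape[1] + 1) if w is None else w.copy()
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * (Xb.T @ (p - y)) / len(y)
    return w

def pseudo_label_pretrain(X_lab, y_lab, X_unlab, threshold=0.8):
    """Hypothetical pipeline: teacher -> pseudo-labels -> pre-train -> fine-tune."""
    # 1. Train a teacher on the (small) labeled set.
    teacher = train_logreg(X_lab, y_lab)
    # 2. Pseudo-label unannotated sequences, keeping only confident predictions.
    p = predict_proba(X_unlab, teacher)
    keep = (p >= threshold) | (p <= 1.0 - threshold)
    pseudo_y = (p[keep] >= 0.5).astype(float)
    # 3. Pre-train on the pseudo-labeled sequences (skipped if none are kept).
    w = train_logreg(X_unlab[keep], pseudo_y) if keep.any() else None
    # 4. Fine-tune the pre-trained weights on the real labels.
    w = train_logreg(X_lab, y_lab, w=w)
    return w, int(keep.sum())

def random_seq(n, with_motif):
    """Toy sequence; positives carry a TATA motif at a fixed position."""
    seq = list(rng.choice(list("ACGT"), size=n))
    if with_motif:
        seq[2:6] = list("TATA")
    return "".join(seq)

# Toy usage: 40 labeled sequences, 400 unannotated ones.
labels = rng.integers(0, 2, size=40)
X_lab = np.array([one_hot(random_seq(12, bool(l))) for l in labels])
y_lab = labels.astype(float)
X_unlab = np.array([one_hot(random_seq(12, rng.random() < 0.5)) for _ in range(400)])

w, n_kept = pseudo_label_pretrain(X_lab, y_lab, X_unlab)
print(f"kept {n_kept} of {len(X_unlab)} pseudo-labeled sequences")
```

The same four-step structure would apply to any model with a fit and a predict step, which is consistent with the abstract's claim that the approach is architecture-agnostic. In practice the pseudo-labeled set would be far larger than the labeled one.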

Publisher

Cold Spring Harbor Laboratory

References: 22 articles.

1. 10 Years of GWAS Discovery: Biology, Function, and Translation

2. Genome-wide association studies of coronary artery disease and heart failure: where are we going?

3. The genetics of type 2 diabetes: what have we learned from GWAS?

4. Genome-wide association studies in psychiatry: what have we learned?

5. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA

Cited by 1 article.

1. Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation. Bioinformatics, 2024-03-29.