Abstract
Machine learning has advanced greatly over the past decade, owing to algorithmic innovations, hardware acceleration, and benchmark datasets for training in domains such as computer vision, natural-language processing, and, more recently, the life sciences. In particular, deep learning, a subfield of machine learning, has found applications in genomics, proteomics, and metabolomics. However, a thorough assessment of how the data preprocessing methods required to analyze life science data affect the performance of deep learning is lacking. This work helps fill that gap by assessing the impact of both commonly used and newly developed methods employed in metabolomics data preprocessing workflows, spanning the steps from raw data to processed data. The results of these analyses are summarized as a set of best practices that researchers can use as a starting point for downstream classification and reconstruction tasks with deep learning.
Subject
Molecular Biology, Biochemistry, Endocrinology, Diabetes and Metabolism