Authors:
Pfeifer Leah D., Patabandige Milani W., Desaire Heather
Abstract
Applying machine learning strategies to interpret mass spectrometry data has the potential to revolutionize the way in which disease is diagnosed, prognosed, and treated. A persistent and tedious obstacle, however, is relaying mass spectrometry data to the machine learning algorithm. Given the native format and large size of mass spectrometry data files, preprocessing is a critical step. To address this challenge, we sought to create an easy-to-use, continuous pipeline that runs from data acquisition to the machine learning algorithm. Here, we present a start-to-finish pipeline designed to facilitate supervised and unsupervised classification of mass spectrometry data. The input can be any ESI data set collected by LC-MS or flow injection, and the output is a machine-learning-ready matrix in which each row is a feature (the abundance at a particular m/z) and each column is a sample. This workflow provides automated handling of large mass spectrometry data sets for researchers who want to implement machine learning strategies but lack the programming expertise to format the data rapidly. We demonstrate how the pipeline can be used on two different mass spectrometry data sets: 1) ESI-MS of fingerprint lipid compositions acquired by direct infusion and 2) LC-MS of IgG glycopeptides. This workflow is uncomplicated and provides value via its simplicity and effectiveness.
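To make the output format concrete, the following minimal Python sketch builds a matrix with one row per m/z feature and one column per sample. It is not the authors' pipeline: it assumes each sample has already been reduced to a list of (m/z, intensity) pairs, and it uses fixed-width m/z binning with summed intensities as a stand-in for whatever feature definition the pipeline applies. The function name `build_feature_matrix`, its parameters, and the sample names are hypothetical.

```python
import numpy as np


def build_feature_matrix(spectra, mz_min, mz_max, bin_width):
    """Bin per-sample peak lists into a features-by-samples matrix.

    spectra: dict mapping sample name -> (mz values, intensity values).
    Returns (bin_starts, sample_names, matrix), where matrix[i, j] is the
    summed intensity of sample j within m/z bin i.
    Assumption: fixed-width bins; the published pipeline may differ.
    """
    n_bins = int(round((mz_max - mz_min) / bin_width))
    # Build edges by multiplication to avoid accumulating float error.
    edges = mz_min + bin_width * np.arange(n_bins + 1)
    sample_names = sorted(spectra)
    matrix = np.zeros((n_bins, len(sample_names)))
    for col, name in enumerate(sample_names):
        mz, intensity = spectra[name]
        # Weighted histogram: sums the intensities falling in each m/z bin.
        summed, _ = np.histogram(np.asarray(mz), bins=edges,
                                 weights=np.asarray(intensity))
        matrix[:, col] = summed
    return edges[:-1], sample_names, matrix


# Hypothetical toy data: two samples with a handful of peaks each.
toy_spectra = {
    "sample_a": ([100.2, 100.7, 101.5], [10.0, 5.0, 2.0]),
    "sample_b": ([100.1, 102.9], [1.0, 7.0]),
}
bin_starts, sample_names, matrix = build_feature_matrix(
    toy_spectra, mz_min=100.0, mz_max=103.0, bin_width=1.0
)
```

Most machine learning libraries expect samples as rows, so a matrix in this orientation would typically be transposed (`matrix.T`) before being passed to a classifier.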
Cited by
4 articles.