Affiliations:
1. Qilu University of Technology (Shandong Academy of Science)
2. East China Normal University
3. University of Science and Technology of China
Abstract
Molecular recognition usually relies on either matching against molecular spectral libraries or simulation-based ‘trial-and-error’ strategies. However, both approaches are largely limited by low coverage, high construction cost, and time-consuming procedures. Here we developed TranSpec, a deep learning model based on convolutional neural networks and a multi-head attention mechanism, that directly ‘translates’ molecular vibrational spectra into simplified molecular input line entry system (SMILES) representations. Using the QM9S dataset, which contains quantum-chemistry-simulated spectra of 130K molecules, as the benchmark, we demonstrate that greedy search (which generates only one SMILES) precisely identifies 90%-100% of functional groups and yields about 60% correct SMILES from either infrared (IR) or Raman spectra. To improve translation accuracy, we propose several strategies: using packed IR and Raman spectra as a joint input, employing threshold search to generate more SMILES candidates, and filtering candidates by molecular mass. Finally, by translating experimental infrared spectra, we show that TranSpec transfers well: threshold search correctly identified 21.8% and 55.9% of the molecules within the top 1 and top 10 SMILES candidates, respectively. TranSpec thus enables direct interpretation of molecular spectra and paves the way toward fast, real-time molecular recognition.
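To make the difference between greedy search, threshold search, and mass filtering concrete, here is a minimal sketch. It is not TranSpec's implementation: the abstract does not define threshold search, so we assume it means branching on every next token whose probability reaches a fixed threshold and ranking finished SMILES by cumulative log-probability. The decoder is replaced by a hypothetical `toy_model` with single-character tokens, and `TOY_MASS` holds approximate average molecular weights for the toy candidates. All names here (`greedy_decode`, `threshold_decode`, `filter_by_mass`, `toy_model`) are illustrative.

```python
import math
from typing import Callable, Dict, List, Tuple

END = "<eos>"
# Hypothetical decoder interface: tokens decoded so far -> {next token: probability}.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def greedy_decode(next_probs: NextTokenFn, max_len: int = 40) -> str:
    """Take the single most probable token at each step (yields one SMILES)."""
    tokens: List[str] = []
    for _ in range(max_len):
        probs = next_probs(tuple(tokens))
        tok = max(probs, key=probs.get)
        if tok == END:
            break
        tokens.append(tok)
    return "".join(tokens)


def threshold_decode(next_probs: NextTokenFn, threshold: float = 0.1,
                     max_len: int = 40, max_beams: int = 50) -> List[Tuple[str, float]]:
    """Assumed threshold search: branch on every token with p >= threshold.

    Returns finished SMILES ranked by cumulative log-probability (best first).
    """
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[str, float]] = []
    for _ in range(max_len):
        expanded = []
        for tokens, logp in beams:
            for tok, p in next_probs(tokens).items():
                if p < threshold:
                    continue  # prune low-probability branches
                score = logp + math.log(p)
                if tok == END:
                    finished.append(("".join(tokens), score))
                else:
                    expanded.append((tokens + (tok,), score))
        # Cap the number of partial sequences so the search stays tractable.
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:max_beams]
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def filter_by_mass(candidates: List[Tuple[str, float]], target_mass: float,
                   mass_of: Callable[[str], float], tol: float = 0.5) -> List[Tuple[str, float]]:
    """Keep candidates whose molecular mass lies within `tol` of the target."""
    return [(s, lp) for s, lp in candidates if abs(mass_of(s) - target_mass) <= tol]


# Hypothetical toy decoder over single-character SMILES tokens.
TOY: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"C": 0.9, "O": 0.1},
    ("C",): {"C": 0.5, "O": 0.4, END: 0.1},
    ("C", "C"): {END: 1.0},
    ("C", "O"): {END: 1.0},
    ("O",): {END: 1.0},
}


def toy_model(tokens: Tuple[str, ...]) -> Dict[str, float]:
    return TOY.get(tokens, {END: 1.0})


# Approximate average molecular weights (g/mol) of the toy candidates.
TOY_MASS = {"C": 16.04, "O": 18.02, "CC": 30.07, "CO": 32.04}

if __name__ == "__main__":
    print(greedy_decode(toy_model))                     # CC
    ranked = threshold_decode(toy_model, threshold=0.1)
    print([s for s, _ in ranked])                       # ['CC', 'CO', 'O', 'C']
    print(filter_by_mass(ranked, 32.04, TOY_MASS.__getitem__))  # keeps only 'CO'
```

Greedy search commits to "CC", whereas threshold search returns several ranked candidates, and the mass filter then recovers "CO" (methanol) when the measured mass is about 32 g/mol. With real SMILES, a mass function could be built on RDKit's `Descriptors.MolWt`, and the threshold and `max_beams` cap trade candidate coverage against decoding cost.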
Publisher
Research Square Platform LLC