Abstract
AbstractRecent successes of foundation models in artificial intelligence have prompted the emergence of large-scale chemical pre-trained models. Despite the growing interest in large molecular pre-trained models that provide informative representations for downstream tasks, attempts for multimodal pre-training approaches on the molecule domain were limited. To address this, here we present a multimodal molecular pre-trained model that incorporates the modalities of structure and biochemical properties, drawing inspiration from recent advances in multimodal learning techniques. Our proposed model pipeline of data handling and training objectives aligns the structure/property features in a common embedding space, which enables the model to regard bidirectional information between the molecules’ structure and properties. These contributions emerge synergistic knowledge, allowing us to tackle both multimodal and unimodal downstream tasks through a single model. Through extensive experiments, we demonstrate that our model has the capabilities to solve various meaningful chemical challenges, including conditional molecule generation, property prediction, molecule classification, and reaction prediction.
Publisher
Springer Science and Business Media LLC
Reference76 articles.
1. Ryu, S. & Lee, S. Accurate, reliable and interpretable solubility prediction of druglike molecules with attention pooling and Bayesian learning. arXiv preprint arXiv:2210.07145 (2022).
2. Kuenneth, C. & Ramprasad, R. Polybert: A chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat. Commun. 14, 4099 (2023).
3. Moon, S., Zhung, W., Yang, S., Lim, J. & Kim, W. Y. Pignet: A physics-informed deep learning model toward generalized drug–target interaction predictions. Chem. Sci. 13, 3661–3673 (2022).
4. Xu, C., Wang, Y. & Farimani, A. B. Transpolymer: a transformer-based language model for polymer property predictions. npj Comp. Mat. 9, 64 (2023).
5. Paul, D. et al. Artificial intelligence in drug discovery and development. Drug Discov. Today 26, 80–93 (2021).