Affiliation:
1. Global Institute of Future Technology Shanghai Jiaotong University Shanghai 200240 China
2. School of Computer Science and Engineering Sun Yat‐sen University Guangzhou 510000 China
3. Galixir Technologies Shanghai 200100 China
4. School of Informatics Xiamen University Xiamen 361005 China
5. IBENS, Ecole Normale Supérieure PSL Research Institute Paris France
Abstract
Constructing discriminative representations of molecules lies at the core of a number of domains such as drug discovery, chemistry, and medicine. State‐of‐the‐art methods employ graph neural networks and self‐supervised learning (SSL) to learn structural representations from unlabeled data, which can then be fine‐tuned for downstream tasks. Albeit powerful, these methods are pre‐trained solely on molecular structures and thus often struggle with tasks that involve intricate biological processes. Here, it is proposed to assist the learning of molecular representations by using high‐content microscopy images of perturbed cells at the phenotypic level. To incorporate this cross‐modal pre‐training, a unified framework is constructed that aligns the molecule and image modalities through multiple types of contrastive loss functions. The framework proves effective in newly formulated tasks that retrieve molecules from their corresponding images and vice versa. More importantly, the model can infer functional molecules according to cellular images generated by genetic perturbations. In parallel, the proposed model can transfer non‐trivially to molecular property predictions, and has shown great improvement on clinical outcome predictions. These results suggest that such cross‐modality learning can bridge molecules and phenotype to play important roles in drug discovery.
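The abstract does not specify the loss functions, so the following is only a minimal sketch of one common way to align two modalities: a symmetric InfoNCE-style contrastive loss over paired molecule and image embeddings, followed by nearest-neighbour retrieval. The function names, the temperature value, and the use of plain NumPy arrays in place of learned encoder outputs are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(mol_emb, img_emb, temperature=0.1):
    """Hypothetical symmetric contrastive loss (assumed, not from the paper).

    Row i of mol_emb and row i of img_emb are treated as a matched pair;
    every other row in the batch acts as a negative.
    """
    mol = l2_normalize(mol_emb)
    img = l2_normalize(img_emb)
    logits = mol @ img.T / temperature  # shape: (batch, batch)
    diag = np.arange(logits.shape[0])
    # Molecule -> image direction: softmax over images for each molecule.
    loss_m2i = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Image -> molecule direction: softmax over molecules for each image.
    loss_i2m = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_m2i + loss_i2m)


def retrieve_images(mol_emb, img_emb):
    """Return, for each molecule, the index of its most similar image."""
    sims = l2_normalize(mol_emb) @ l2_normalize(img_emb).T
    return sims.argmax(axis=1)


if __name__ == "__main__":
    # Toy embeddings: four perfectly aligned pairs versus a shuffled pairing.
    mol = np.eye(4)
    aligned = np.eye(4)
    shuffled = np.roll(aligned, 1, axis=0)
    print("aligned loss: ", symmetric_info_nce(mol, aligned))
    print("shuffled loss:", symmetric_info_nce(mol, shuffled))
    print("retrieved image indices:", retrieve_images(mol, aligned))
```

In this toy setting the loss is lower when matched pairs have similar embeddings than when the pairing is shuffled, which is the property a retrieval task in either direction relies on.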
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China