CLOOME: contrastive learning unlocks bioimaging databases for queries with chemical structures

Published:2023-11-13 Issue:1 Volume:14
ISSN:2041-1723
Container-title:Nature Communications
language:en
Short-container-title:Nat Commun

Author:

Sanchez-Fernandez Ana, Rumetshofer Elisabeth, Hochreiter Sepp, Klambauer Günter

Abstract

The field of bioimage analysis is currently impacted by a profound transformation, driven by the advancements in imaging technologies and artificial intelligence. The emergence of multi-modal AI systems could allow extracting and utilizing knowledge from bioimaging databases based on information from other data modalities. We leverage the multi-modal contrastive learning paradigm, which enables the embedding of both bioimages and chemical structures into a unified space by means of bioimage and molecular structure encoders. This common embedding space unlocks the possibility of querying bioimaging databases with chemical structures that induce different phenotypic effects. Concretely, in this work we show that a retrieval system based on multi-modal contrastive learning is capable of identifying the correct bioimage corresponding to a given chemical structure from a database of ~2000 candidate images with a top-1 accuracy >70 times higher than a random baseline. Additionally, the bioimage encoder demonstrates remarkable transferability to various further prediction tasks within the domain of drug discovery, such as activity prediction, molecule classification, and mechanism of action identification. Thus, our approach not only addresses the current limitations of bioimaging databases but also paves the way towards foundation models for microscopy images.
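The retrieval setup described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the embeddings here are random stand-ins rather than encoder outputs, and the names `l2_normalize`, `top_k_accuracy`, `image_emb`, and `mol_emb` are hypothetical. It only shows how a query in a shared embedding space is ranked against a database by cosine similarity, and how top-1 accuracy compares with the random baseline of 1/N (for about 2000 candidates, roughly 0.05%, so a 70-fold improvement would correspond to about 3.5%).

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def top_k_accuracy(query_emb: np.ndarray, image_emb: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose paired image (same row index) ranks in the top k."""
    sims = l2_normalize(query_emb) @ l2_normalize(image_emb).T  # (n_queries, n_images)
    # The score of the correct image for each query lies on the diagonal.
    correct = np.diag(sims)[:, None]
    # Rank = number of candidates scoring strictly higher than the correct one.
    ranks = (sims > correct).sum(axis=1)
    return float((ranks < k).mean())

# Assumption: random vectors stand in for the outputs of the two encoders.
rng = np.random.default_rng(0)
n, dim = 2000, 64
image_emb = rng.normal(size=(n, dim))
# Hypothetical "molecule" embeddings: noisy copies of the paired image embeddings.
mol_emb = image_emb + 2.0 * rng.normal(size=(n, dim))

random_baseline = 1.0 / n  # chance of picking the right image among n candidates
top1 = top_k_accuracy(mol_emb, image_emb, k=1)
print(f"random baseline: {random_baseline:.4%}")
print(f"top-1 accuracy:  {top1:.4%} ({top1 / random_baseline:.0f}x baseline)")
```

In a trained contrastive model the two embedding matrices would come from the molecule and bioimage encoders; the ranking and scoring steps would stay the same.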

Publisher

Springer Science and Business Media LLC

Subject

General Physics and Astronomy; General Biochemistry, Genetics and Molecular Biology; General Chemistry; Multidisciplinary

Link

https://www.nature.com/articles/s41467-023-42328-w.pdf

References: 81 articles.

1. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

2. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).

3. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

4. Karolchik, D. et al. The UCSC genome browser database. Nucleic Acids Res. 31, 51–54 (2003).

5. Burley, S. K. et al. Protein data bank (PDB): the single global macromolecular structure archive. Protein Crystallogr. 1607, 627–641 (2017).

Cited by 9 articles.

1. Decoding phenotypic screening: A comparative analysis of image representations;Computational and Structural Biotechnology Journal;2024-12

2. Morphological Profiling Dataset of EU-OPENSCREEN Bioactive Compounds Over Multiple Imaging Sites and Cell Lines;2024-08-27

3. Unleashing the potential of cell painting assays for compound activities and hazards prediction;Frontiers in Toxicology;2024-07-17

4. Transformer technology in molecular science;WIREs Computational Molecular Science;2024-07

5. Machine learning-aided generative molecular design;Nature Machine Intelligence;2024-06-18