FusOn-pLM: A Fusion Oncoprotein-Specific Language Model via Focused Probabilistic Masking
Author:
Vincoff Sophia, Goel Shrey, Kholina Kseniia, Pulugurta Rishab, Vure Pranay, Chatterjee Pranam
Abstract
Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, drive and sustain various cancers, particularly those impacting children. Unfortunately, due to their intrinsically disordered nature, large size, and lack of well-defined, druggable pockets, they have been historically challenging to target therapeutically: neither small molecule-based methods nor structure-based approaches for binder design are strong options for this class of molecules. Recently, protein language models (pLMs) have demonstrated success at representing protein sequences with information-rich embeddings, enabling downstream design applications from sequence alone. However, no current pLM has been trained on fusion oncoprotein sequences, so existing models may not produce optimal representations for these proteins. In this work, we introduce FusOn-pLM, a novel pLM that fine-tunes the state-of-the-art ESM-2 model on fusion oncoprotein sequences. We specifically introduce a novel masked language modeling (MLM) strategy, employing a binding-site probability predictor to focus masking on key amino acid residues, thereby generating improved fusion oncoprotein-aware embeddings. Our model improves performance on both fusion oncoprotein-specific benchmarks and disorder prediction tasks in comparison to baseline ESM-2 representations, as well as manually constructed biophysical embeddings, motivating downstream usage of FusOn-pLM embeddings for therapeutic design tasks targeting these fusions. We have made our model publicly available to the community at https://huggingface.co/ChatterjeeLab/FusOn-pLM.
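The focused masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 15% mask fraction (the common MLM default), and the sampling scheme (drawing masked positions without replacement, weighted by per-residue binding-site probability) are all assumptions made for illustration. The per-residue probabilities are taken as given inputs; the predictor that would produce them is not modeled here.

```python
import numpy as np


def sample_focused_mask(site_probs, mask_fraction=0.15, rng=None):
    """Choose positions to mask, favoring residues with a high
    (hypothetical) predicted binding-site probability."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(site_probs, dtype=float)
    # Number of residues to mask; at least one per sequence.
    n_mask = max(1, int(round(mask_fraction * probs.size)))
    # Small epsilon keeps zero-probability residues selectable,
    # which sampling without replacement requires.
    weights = probs + 1e-8
    weights = weights / weights.sum()
    idx = rng.choice(probs.size, size=n_mask, replace=False, p=weights)
    mask = np.zeros(probs.size, dtype=bool)
    mask[idx] = True
    return mask


def apply_mask(sequence, mask, mask_token="<mask>"):
    """Replace masked residues with the mask token (assumed to be "<mask>")."""
    return [mask_token if m else aa for aa, m in zip(sequence, mask)]


if __name__ == "__main__":
    # Illustrative probabilities only; not real predictor output.
    probs = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02, 0.6, 0.01, 0.5, 0.03]
    mask = sample_focused_mask(probs, mask_fraction=0.3)
    print(apply_mask("ACDEFGHIKL", mask))
```

Under these assumptions, residues with higher predicted probability are masked more often across training steps, so the model is asked to reconstruct likely binding-site residues more frequently than under uniform random masking.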
Publisher
Cold Spring Harbor Laboratory