Affiliation:
1. Swiss Data Science Center, ETH Zurich and EPFL, Zurich, Switzerland
2. Biognosys AG, Wagistrasse 21, 8952 Schlieren, Switzerland
3. Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
Abstract
As the volume of available protein data keeps growing, relevant and interpretable visualization becomes crucial, especially for tasks that require human expertise. Poincaré disk projection has previously proven effective for visualizing biological data such as single-cell RNAseq data. Here, we develop a new method, PoincaréMSA, for the visual representation of complex relationships between protein sequences, based on Poincaré maps embedding. We demonstrate its efficiency and potential for visualizing protein family topology and for the evolutionary and functional annotation of uncharacterized sequences. PoincaréMSA is implemented in open-source Python code, with interactive Google Colab notebooks available as described at https://www.dsimb.inserm.fr/POINCARE_MSA.
Funder
Ministry of Research
Université Paris Cité
National Institute for Health and Medical Research
Laboratory of Excellence GR-Ex
French National Research Agency
High Performance Computing
Institut du Développement et Des Ressources en Informatique Scientifique, France
Très Grand Centre de Calcul
Grand Equipement National de Calcul Intensif, France
Publisher
Oxford University Press (OUP)
Subject
Molecular Biology, Information Systems
Cited by
3 articles.