A literature-derived knowledge graph augments the interpretation of single cell RNA-seq datasets

Published: 2021-04-04

Author:

Doddahonnaiah Deeksha, Lenehan Patrick, Hughes Travis, Zemmour David, Garcia-Rivera Enrique, Venkatakrishnan AJ, Chilaka Ramakrisha, Khare Apoorv, Anand Akash, Barve Rakesh, Thiagarajan Viswanathan, Soundararajan Venky

Abstract

Technology to generate single cell RNA-sequencing (scRNA-seq) datasets and tools to annotate them have rapidly advanced in the past several years. Such tools generally rely on existing transcriptomic datasets or curated databases of cell type-defining genes, while the application of scalable natural language processing (NLP) methods to enhance analysis workflows has not been adequately explored. Here we deployed an NLP framework to objectively quantify associations between a comprehensive set of over 20,000 human protein-coding genes and over 500 cell type terms across over 26 million biomedical documents. The resultant gene-cell type associations (GCAs) are significantly stronger between a curated set of matched cell type-marker pairs than the complementary set of mismatched pairs (Mann-Whitney p < 6.15 × 10^-76, r = 0.24; Cohen’s D = 2.6). Building on this, we developed an augmented annotation algorithm that leverages GCAs to categorize cell clusters identified in scRNA-seq datasets, and we tested its ability to predict the cellular identity of 185 clusters in 13 datasets from human blood, pancreas, lung, liver, kidney, retina, and placenta. With the optimized settings, the true cellular identity matched the top prediction in 66% of tested clusters and was present among the top five predictions for 94% of clusters. Further, contextualization of differential expression analyses with these GCAs highlights poorly characterized markers of established cell types, such as CLIC6 and DNASE1L3 in retinal pigment epithelial cells and endothelial cells, respectively. Taken together, this study illustrates for the first time how the systematic application of a literature-derived knowledge graph can expedite and enhance the annotation and interpretation of scRNA-seq data.
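The abstract does not spell out the annotation algorithm, so the following is only a minimal sketch of the general idea, not the authors' method: rank candidate cell types for a cluster by the mean association score of its marker genes, then measure how often the true label appears among the top k predictions. The scoring rule (mean score, with missing pairs counted as zero), the function names `rank_cell_types` and `top_k_accuracy`, and all scores and clusters in the toy data are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy gene-cell type association (GCA) scores. The values are
# invented for illustration and are not taken from the paper.
TOY_GCA = {
    ("CD3E", "T cell"): 0.9,
    ("CD3E", "B cell"): 0.1,
    ("CD79A", "B cell"): 0.8,
    ("CD79A", "T cell"): 0.05,
    ("PECAM1", "endothelial cell"): 0.7,
    ("PECAM1", "T cell"): 0.2,
}

# Hypothetical clusters: (marker genes, true cell type).
TOY_CLUSTERS = [
    ({"CD3E"}, "T cell"),
    ({"CD79A"}, "B cell"),
    ({"PECAM1", "CD3E"}, "endothelial cell"),
]


def rank_cell_types(marker_genes, gca):
    """Rank cell types by mean GCA score over a cluster's marker genes.

    Gene-cell type pairs absent from `gca` contribute a score of zero
    (an assumption of this sketch).
    """
    if not marker_genes:
        return []
    totals = defaultdict(float)
    for (gene, cell_type), score in gca.items():
        if gene in marker_genes:
            totals[cell_type] += score
    n_genes = len(marker_genes)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(cell_type, total / n_genes) for cell_type, total in ranked]


def top_k_accuracy(clusters, gca, k):
    """Fraction of clusters whose true label is among the top-k predictions."""
    hits = 0
    for marker_genes, true_type in clusters:
        top_k = [ct for ct, _ in rank_cell_types(marker_genes, gca)[:k]]
        hits += true_type in top_k
    return hits / len(clusters)


if __name__ == "__main__":
    for k in (1, 2):
        print(f"top-{k} accuracy:", top_k_accuracy(TOY_CLUSTERS, TOY_GCA, k))
```

The top-1 and top-5 figures reported in the abstract (66% and 94%) are the same kind of metric computed over the 185 tested clusters; the toy numbers here say nothing about those results.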

Publisher

Cold Spring Harbor Laboratory

References: 90 articles.

1. Massively parallel digital transcriptional profiling of single cells

2. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex

3. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets

4. Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells

5. Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types

Cited by 2 articles.

1. On the origin of Omicron’s unique Spike gene insertion (2022-06-06)

2. Genetic alteration of human MYH6 is mimicked by SARS-CoV-2 polyprotein: mapping viral variants of cardiac interest (2021-11-29)