Author:
Ji Lanlan,Wang An,Sonthalia Shreyash,Naiman Daniel Q.,Younes Laurent,Colantuoni Carlo,Geman Donald
Abstract
1AbstractAccurate identification of cell classes across the tissues of living organisms is central in the analysis of growing atlases of single-cell RNA sequencing (scRNA-seq) data across biomedicine. Such analyses are often based on the existence of highly discriminating “marker genes” for specific cell classes which enables a deeper functional understanding of these classes as well as their identification in new, related datasets. Currently, marker genes are defined by methods that serially assess the level of differential expression (DE) of individual genes across landscapes of diverse cells. This serial approach has been extremely useful, but is limited because it ignores possible redundancy or complementarity across genes, that can only be captured by analyzing several genes at the same time. We wish to identify discriminatingpanelsof genes. To efficiently explore the vast space of possible marker panels, leverage the large number of cells often sequenced, and overcome zero-inflation in scRNA-seq data, we propose viewing panel selection as a variation of the “minimal set-covering problem” in combinatorial optimization which can be solved with integer programming. In this formulation, the covering elements are genes, and the objects to be covered are cells of a particular class, where a cell is covered by a gene if that gene is expressed in that cell. Our method, CellCover, identifies a panel of marker genes in scRNA-seq data that covers one class of cells within a population. We apply this method to generate covering marker gene panels which characterize cells of the developing mouse neocortex as postmitotic neurons are generated from neural progenitor cells (NPCs). We show that CellCover captures cell class-specific signals distinct from those defined by DE methods and that CellCover’s compact gene panels can be expanded to explore cell type specific function.Transfer learning experiments exploring these covering panels acrossin vivomouse, primate, and human scRNA-seq datasets demonstrate that CellCover identifies markers of conserved cell classes in neurogenesis, as well as markers of temporal progression in the molecular identity of these cell types across development of the mammalian neocortex. The gene covering panels we identify across cell types and developmental time can be freely explored in visualizations across all the public data we use in this report at with NeMo Analytics [1] throughhttps://nemoanalytics.org/p?l=CellCover. The code for CellCover is written in R and the Gurobi R interface and is available at [2].Graphical Abstract
Publisher
Cold Spring Harbor Laboratory
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献