Intrinsic entropy model for feature selection of scRNA-seq data

Author:

Li Lin (1, 2), Tang Hui (3), Xia Rui (1, 2), Dai Hao (1), Liu Rui (3), Chen Luonan (1, 4, 5, 6)

Affiliation:

1. State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, CAS Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China

2. University of Chinese Academy of Sciences, Beijing 100049, China

3. School of Mathematics, South China University of Technology, Guangzhou 510640, China

4. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China

5. Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China

6. Guangdong Institute of Intelligence Science and Technology, Zhuhai 519031, China

Abstract

Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have led to extensive studies of cellular heterogeneity and cell-to-cell variation. However, the high frequency of dropout events and noise in scRNA-seq data compromises the accuracy of downstream analyses such as clustering, whose accuracy depends heavily on the selected feature genes. Here, by deriving an entropy decomposition formula, we propose a feature selection method, the intrinsic entropy (IE) model, to identify informative genes for accurate clustering analysis. Specifically, by eliminating the ‘noisy’ fluctuation, or extrinsic entropy (EE), we extract the IE of each gene from its total entropy (TE), i.e. TE = IE + EE. We show that the IE of each gene reflects the regulatory fluctuation of that gene in a cellular process, and thus high-IE genes provide rich information for analyzing cell types or states. To validate the performance of the high-IE genes, we conduct computational analyses on both simulated and real single-cell datasets and compare the results with those of other representative methods. The results show that our IE model is not only broadly applicable and robust across different clustering and classification methods, but also sensitive to novel cell types. Our results also demonstrate that the intrinsic entropy/fluctuation of a gene serves as information rather than noise, in contrast to its total entropy/fluctuation.
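The abstract states the decomposition TE = IE + EE but not how each term is estimated. As a minimal sketch only, assuming (i) that a gene's total entropy can be approximated by the Shannon entropy of its equal-width-binned expression values across cells and (ii) that a per-gene EE estimate is already available from some separate noise model, IE follows by subtraction and can be used to rank genes. Both assumptions are illustrative, not the authors' procedure; the function names `shannon_entropy` and `select_high_ie_genes`, the `n_bins` parameter and the toy values are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, n_bins=10):
    """Shannon entropy (bits) of one gene's expression across cells.

    ASSUMPTION: equal-width binning is used as a simple stand-in for TE;
    the abstract does not specify how TE is estimated.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # constant expression: no fluctuation at all
    width = (hi - lo) / n_bins
    # Assign each value to a bin; the maximum value goes into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


def select_high_ie_genes(expr, extrinsic, k):
    """Rank genes by IE = TE - EE and return the top k plus all IE scores.

    expr: dict gene -> list of expression values across cells.
    extrinsic: dict gene -> EE estimate. HYPOTHETICAL input: its
    estimator is not described in the abstract.
    IE can come out negative if the supplied EE exceeds the binned TE.
    """
    ie = {gene: shannon_entropy(values) - extrinsic[gene]
          for gene, values in expr.items()}
    ranked = sorted(ie, key=ie.get, reverse=True)
    return ranked[:k], ie


# Toy data (made up for illustration): 8 cells, 3 genes.
expr = {
    "geneA": [0, 0, 5, 5, 0, 5, 0, 5],  # switches between two levels
    "geneB": [1, 2, 3, 4, 5, 6, 7, 8],  # widely spread values
    "geneC": [3, 3, 3, 3, 3, 3, 3, 3],  # constant
}
extrinsic = {"geneA": 0.1, "geneB": 2.5, "geneC": 0.0}  # assumed EE values

top, scores = select_high_ie_genes(expr, extrinsic, k=2)
print(top)  # ['geneA', 'geneB']
```

In this toy example geneB has the largest total entropy (3 bits versus 1 bit for geneA), but once its assumed extrinsic component is removed it ranks below geneA, which mirrors the abstract's point that total fluctuation alone is a poor guide to informative genes.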

Funder

National Key Research and Development Program of China

National Natural Science Foundation of China

Chinese Academy of Sciences

Japan Science and Technology Corporation

Publisher

Oxford University Press (OUP)

Subject

Cell Biology, Genetics, Molecular Biology, General Medicine

Cited by 7 articles.

1. Single-cell omics: experimental workflow, data analyses and applications;Science China Life Sciences;2024-07-23

2. A framework for scRNA-seq data clustering based on multi-view feature integration;Biomedical Signal Processing and Control;2024-03

3. scFED: Clustering Identifying Cell Types of scRNA-Seq Data Based on Feature Engineering Denoising;Interdisciplinary Sciences: Computational Life Sciences;2023-07-04

4. ESR: Optimizing Gene Feature Selection for scRNA-seq Data;2023 IEEE 10th International Conference on Cyber Security and Cloud Computing (CSCloud)/2023 IEEE 9th International Conference on Edge Computing and Scalable Cloud (EdgeCom);2023-07

5. Reference: An algorithm for recognizing the main melody of orchestral music based on artificial intelligence of music melody contour;Applied Mathematics and Nonlinear Sciences;2023-04-28