Correlation-Based Feature Selection of Single Cell Transcriptomics Data from Multiple Sources

Author:

Mitić Nenad S.1, Malkov Saša N.1, Ružičić Mirjana M. Maljković1, Veljković Aleksandar N.1, Čukić Ivan Lj.1, Lin Xin2, Lyu Minjie2, Brusić Vladimir2

Affiliation:

1. University of Belgrade

2. University of Nottingham Ningbo China

Abstract

Applying data mining or machine learning techniques to large and diverse datasets often requires constructing descriptive and predictive models. Descriptive models are used to discover relationships among the attributes of the data, while predictive models identify the characteristics of data that will be collected in the future. Bioinformatics data are high-dimensional, which makes it practically impossible to apply the majority of "classic" classification and clustering algorithms. Even when such algorithms are applicable, training on large multidimensional data significantly increases processing time. Algorithms specialized for high-dimensional data often cannot process large datasets with several thousand dimensions (features). Dimension reduction methods (such as PCA) do not provide satisfactory results and, in addition, obscure the meaning of the original attributes. To be usable, the constructed models must be scalable, because the amount of bioinformatics data collected daily is growing rapidly. Furthermore, the significance of individual features can differ from source to source. This work describes an attribute selection method for efficiently classifying high-dimensional (30,698 features) transcriptomics data collected from multiple sources. The proposed method was tested using 22 classification algorithms. The classification results for the selected sets of attributes are comparable to the results for the complete set of attributes.
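The abstract does not spell out the selection procedure, so the following is only a minimal sketch of what correlation-based feature selection can look like in general: each feature is scored by the absolute Pearson correlation between its values and a numeric class label, and the highest-scoring features are kept. The function names `pearson` and `top_k_by_correlation`, the parameter `k`, and the choice of Pearson correlation are assumptions made for illustration and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant, so that uninformative
    features are ranked last instead of raising a division error.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_k_by_correlation(rows, labels, k):
    """Rank features by |correlation with the label| and keep the top k.

    rows:   list of samples, each a list of feature values (hypothetical layout)
    labels: numeric class label per sample (e.g. 0/1)
    Returns (indices of the selected features, score of every feature).
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append(abs(pearson(column, labels)))
    # sorted() is stable, so tied features keep their original order.
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return ranked[:k], scores
```

For data from several sources, one option under the same assumptions would be to run the ranking per source and compare the selected index sets, since a feature's score may differ from source to source.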

Publisher

Research Square Platform LLC

