Mining Gene Expression Profile with Missing Values: An Integration of Kernel PCA and Robust Singular Values Decomposition

Author:

Islam Md. Saimul1, Hoque Md. Aminul1, Islam Md. Sahidul1, Ali Mohammad2, Hossen Md. Bipul3, Binyamin Md.4, Merican Amir Feisal5, Akazawa Kohei6, Kumar Nishith7, Sugimoto Masahiro1

Affiliation:

1. Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh

2. Statistics Discipline, Khulna University, Khulna-9208, Bangladesh

3. Department of Statistics, Begum Rokeya University, Rangpur-5400, Bangladesh

4. Department of Statistics, Mawlana Bhashani Science and Technology University, Santosh, Tangail-1902, Bangladesh

5. Institute of Biological Sciences, Faculty of Science and Centre of Research for Computational Sciences & Informatics for Biology, Bioindustry, Environment, Agriculture, and Healthcare (CRYSTAL), University of Malaya, Kuala Lumpur-50603, Malaysia

6. Department of Medical Informatics, Niigata University Medical and Dental Hospital, Asahimachidori 1-754, Niigata 951-8520, Japan

7. Department of Statistics, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh

Abstract

Background: Gene expression profiling and transcriptomics provide valuable information about the role of genes that are differentially expressed between two or more samples. Analysing high-throughput DNA microarray data that contain missing values under various experimental conditions is both important and challenging.

Objectives: Graphical visualizations of the expression of all genes in a particular cell provide holistic views of gene expression patterns, which improve our understanding of cellular systems under normal and pathological conditions. However, current visualization methods are sensitive to missing values, which are frequently observed in microarray-based gene expression profiling and can affect subsequent statistical analyses.

Methods: In this study, we examined the problem of missing values under different imputation methods using the gene expression biplot (GE biplot), one of the most popular gene visualization techniques. The effects of missing values on mining differentially expressed genes in gene expression data were evaluated using four well-known imputation methods: Robust Singular Value Decomposition (Robust SVD), Column Average (CA), Column Median (CM), and K-nearest Neighbors (KNN). The Frobenius norm and absolute distances were used to measure the accuracy of the methods.

Results: Three numerical experiments were performed to analyze the performance of each method, using (i) simulated data, (ii) publicly available colon cancer data, and (iii) publicly available leukemia data. The results showed that CM and KNN performed better than Robust SVD and CA in identifying the index gene profile in the biplot visualization, both in the simulation study and in the colon cancer and leukemia microarray datasets.

Conclusion: The impact of missing values on the GE biplot was smaller when the data matrix was imputed by KNN than by CM. This study concluded that KNN performed satisfactorily in generating a GE biplot in the presence of missing values in microarray data.
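The comparison described in the Methods can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes NumPy, a samples-by-genes matrix with missing entries coded as NaN, and hypothetical function names (`impute_column_stat`, `impute_knn`, `frobenius_error`, `absolute_error`). The KNN variant here averages the nearest rows by Euclidean distance over co-observed columns, and "absolute distance" is taken to be the sum of absolute differences; the abstract does not define either, so both are assumptions. Robust SVD imputation is omitted because the abstract does not specify its algorithm.

```python
import numpy as np


def impute_column_stat(X, stat=np.nanmean):
    """Fill NaNs with a per-column statistic: np.nanmean (CA) or np.nanmedian (CM)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    col_values = stat(X, axis=0)           # column statistic, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = col_values[cols]
    return filled


def impute_knn(X, k=3):
    """Fill each missing entry with the mean of that column over the k nearest rows.

    Distance is root-mean-square difference over columns observed in both rows
    (an assumed variant). Entries with no usable donor row stay NaN.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    n_rows = X.shape[0]
    for i in np.where(missing.any(axis=1))[0]:
        dists = np.full(n_rows, np.inf)
        for j in range(n_rows):
            if j == i:
                continue
            shared = ~missing[i] & ~missing[j]
            if shared.any():
                dists[j] = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
        order = np.argsort(dists)
        for c in np.where(missing[i])[0]:
            # Nearest rows that actually observe column c
            donors = [j for j in order
                      if np.isfinite(dists[j]) and not missing[j, c]][:k]
            if donors:
                filled[i, c] = X[donors, c].mean()
    return filled


def frobenius_error(X_true, X_imputed):
    """Frobenius norm of the difference between the complete and imputed matrices."""
    return np.linalg.norm(np.asarray(X_true) - np.asarray(X_imputed), ord="fro")


def absolute_error(X_true, X_imputed):
    """Sum of absolute differences (assumed meaning of 'absolute distance')."""
    return np.abs(np.asarray(X_true) - np.asarray(X_imputed)).sum()


if __name__ == "__main__":
    # Synthetic data only: hide about 10% of a random matrix, then compare methods.
    rng = np.random.default_rng(0)
    X_true = rng.normal(size=(20, 6))
    X_obs = X_true.copy()
    X_obs[rng.random(X_true.shape) < 0.1] = np.nan

    candidates = {
        "CA": impute_column_stat(X_obs, np.nanmean),
        "CM": impute_column_stat(X_obs, np.nanmedian),
        "KNN": impute_knn(X_obs, k=3),
    }
    for name, X_hat in candidates.items():
        print(name, frobenius_error(X_true, X_hat), absolute_error(X_true, X_hat))
```

Smaller values of either error indicate that the imputed matrix is closer to the complete one. Results on random synthetic data like this say nothing about the relative ranking reported for the microarray datasets.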

Publisher

Bentham Science Publishers Ltd.

Subject

Computational Mathematics, Genetics, Molecular Biology, Biochemistry
