Functional module extraction from gene expression data using data mining techniques

Author:

Jha Monica¹, Roy Swarup²

Affiliation:

1. Sikkim Manipal Institute of Technology, Majitar, Rangpo, Sikkim, India

2. Sikkim University, Tadong, Gangtok, Sikkim, India

Abstract

A set of correlated and co-expressed genes, often referred to as a functional module, plays a synergistic role in diseases and other biological activities. Genes participating in a common module may cause clinically similar diseases and share a common genetic origin for their associated disease phenotypes. Identifying such modules can support a system-level understanding of biological and cellular processes and of the pathophysiological basis of the associated diseases. As a result, detecting functional modules is an active research issue in computational biology. Many techniques have been proposed to find functional modules from gene co-regulation or co-expression data. These methods fall broadly into two categories: non-network-based gene expression clustering techniques, and network-based methods that extract modules from gene co-expression networks built from expression data. We surveyed the main approaches for obtaining modules and evaluated how well they find biologically significant gene modules on both microarray and RNA-Seq data. Apart from independent assessments, no prior effort has been made to evaluate their performance in an integrated way on both microarray and RNA-Seq data. We observed that each method relies on certain features while ignoring several others, and no single technique appears effective in all respects. Consequently, some significant modules may be missed. With this in mind, we set out to combine the strengths of all the methods into one. We proposed a multilayer ensemble approach that combines a few well-known module detection techniques, and observed that the ensemble improves the quality of modules in terms of biological significance. We also evaluated the effectiveness of the ensemble approach in detecting disease-specific modules.
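The abstract does not spell out how the multilayer ensemble merges its base techniques. As a point of reference only, here is a minimal sketch of one generic way to combine module assignments from several detection methods, co-association consensus; it is not the authors' method. It assumes each method yields one label per gene, with -1 for genes a method leaves unassigned; the function names, the toy labelings, and the 0.5 threshold are hypothetical choices for illustration.

```python
from itertools import combinations


def co_association(labelings, n_genes):
    """Fraction of base clusterings that put each gene pair in the same module.

    labelings[k][g] is the module id that method k assigned to gene g,
    or -1 if method k left gene g unassigned (hypothetical convention).
    """
    m = [[0.0] * n_genes for _ in range(n_genes)]
    for labels in labelings:
        for i, j in combinations(range(n_genes), 2):
            if labels[i] != -1 and labels[i] == labels[j]:
                m[i][j] += 1
                m[j][i] += 1
    k = len(labelings)
    return [[v / k for v in row] for row in m]


def consensus_modules(labelings, n_genes, threshold=0.5):
    """Link gene pairs co-assigned by more than `threshold` of the methods
    and return the connected components that contain at least two genes."""
    m = co_association(labelings, n_genes)
    parent = list(range(n_genes))  # union-find forest over genes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n_genes), 2):
        if m[i][j] > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for g in range(n_genes):
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(gs) for gs in groups.values() if len(gs) > 1)


if __name__ == "__main__":
    # Three toy base clusterings of six genes (hypothetical data).
    labelings = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, -1],
        [3, 3, 3, 4, 4, 4],
    ]
    print(consensus_modules(labelings, 6))  # [[0, 1, 2], [3, 4, 5]]
```

Taking connected components amounts to single linkage on the co-association matrix, which can chain loosely related modules together; a higher threshold or average linkage are common alternatives when that is a concern.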
Often, a set of genes is found to be responsible for two (or even more) functions, participating in the formation of multiple overlapping modules. A more compact form of overlapping module structure is the intrinsic structure, in which a subset of genes within a module plays an additional role beyond that of the parent module to which it belongs. We proposed a novel way of detecting such module structures using the ensemble of modules obtained by subspace clustering. We used frequent itemset mining, a step in Association Rule Mining, to derive such compact modules as subspace clusters. To the best of our knowledge, no prior work has attempted to detect overlapping and intrinsic modules simultaneously from ensemble outcomes. Finally, we proposed a ranking method to infer disease-responsible key genes from gene expression data. The ranking uses the top-ranked disease-significant modules derived by our ensemble method, together with the significance of each module with respect to disease pathways. We applied the ranking method to Breast Cancer and Alzheimer's Disease (AD), inferred the top genes, and assessed their relevance to each disease using various gene-disease association databases. Experimental results revealed that BRCA1, BRCA2, PTEN, ABI1 and CASP8 are the top key genes in Breast Cancer, whereas MAPK1, APP, CASP7, APOE and PSEN1 are the key players in AD.
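The abstract names frequent itemset mining as the tool for deriving compact modules from the subspace-clustering ensemble but gives no algorithmic detail. The following is a minimal sketch of that general idea under stated assumptions, not the authors' implementation: each module produced by the ensemble is treated as a transaction of gene identifiers, gene sets that co-occur in at least `min_support` modules are mined with classic Apriori, maximal frequent sets serve as candidate modules, and genes shared by more than one candidate are flagged as overlapping. The function names, the toy genes A to F, and the support threshold are hypothetical, and detecting intrinsic sub-modules would need further criteria not shown here.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(genes): support_count} for every gene set that
    occurs in at least `min_support` transactions (classic Apriori)."""
    transactions = [frozenset(t) for t in transactions]
    items = {g for t in transactions for g in t}
    current = [frozenset([g]) for g in items]
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-sets that have exactly k genes.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def maximal(frequent):
    """Keep frequent gene sets not strictly contained in another frequent set."""
    return [s for s in frequent if not any(s < t for t in frequent)]


def overlapping_genes(modules):
    """Genes that belong to more than one module."""
    counts = Counter(g for m in modules for g in m)
    return sorted(g for g, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical modules returned by several subspace clusterings.
    ensemble_modules = [
        {"A", "B", "C"},
        {"A", "B", "C", "D"},
        {"C", "D", "E"},
        {"C", "D", "E", "F"},
        {"A", "B"},
    ]
    freq = apriori(ensemble_modules, min_support=2)
    modules = sorted(sorted(s) for s in maximal(freq))
    print(modules)                     # [['A', 'B', 'C'], ['C', 'D', 'E']]
    print(overlapping_genes(modules))  # ['C']
```

In the toy run, gene C is grouped with A and B in some modules and with D and E in others, so it surfaces as the shared member of two candidate modules, which is the kind of dual-role gene the paragraph describes.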

Publisher

Association for Computing Machinery (ACM)