Abstract
Data generation using high-throughput technologies has led to the accumulation of diverse types of molecular data. These data have different types (discrete, real, string, etc.) and occur in various formats and sizes. Commonly used genomic datasets for studying biological processes include gene expression, miRNA expression, protein-DNA binding data (ChIP-Seq/ChIP-chip), mutation data (copy number variation, single nucleotide polymorphisms), GO annotations, protein-protein interaction and disease-gene association data. Each of them provides a unique, complementary and partly independent view of the genome and hence embeds essential information about its regulatory mechanisms. The information provided by any one of these datasets individually may not be sufficient to understand the functions of genes and proteins or to analyze the mechanisms arising from their interactions. Therefore, integrating these multi-omic data and inferring regulatory interactions from the integrated dataset provides system-level biological insight for predicting gene functions and their phenotypic outcomes. To study genome functionality through interaction networks, different methods have been proposed for collective mining of information from an integrated dataset. Here we survey data integration approaches using state-of-the-art techniques such as network integration, Bayesian networks, regularized regression (LASSO) and multiple kernel learning methods.
Cited by
5 articles.