
Infer related genes from large scale gene expression dataset with embedding

Published: 2018-07-05

Author:

Choy Chi Tung, Wong Chi Hang, Chan Stephen Lam

Abstract

Artificial neural networks (ANNs) have been used for classification and prediction tasks with remarkable accuracy. However, their potential for unsupervised data mining of molecular data remains under-explored. We adopted an unsupervised ANN method, word embedding, to extract biologically relevant information from the TCGA gene expression dataset. Ground-truth relationships, such as the cancer types of the input samples and the semantic meaning of genes, were shown to be retained in the resulting entity matrices. We also demonstrated the interpretability of these matrices and their use in shortlisting candidates from a long gene list. This method is feasible for mining large volumes of biological data and could be a valuable tool for discovering novel knowledge from omics data. The embedding matrices mined from the TCGA gene expression data can be explored interactively online (http://bit.ly/tcga-embedding-cancer) and could serve as an informative reference.
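The abstract does not say exactly how the embedding matrices are queried to shortlist genes. As a rough illustration only, the following sketch assumes a gene embedding matrix is already available as a mapping from gene name to vector, and ranks a candidate list by cosine similarity to a query gene. The gene names, the toy 3-dimensional vectors, and the `shortlist` helper are all hypothetical, and this is not necessarily the authors' procedure.

```python
import math

# Hypothetical toy gene embeddings: made-up 3-d vectors for illustration only.
# A real matrix would come from a trained embedding model, one row per gene.
GENE_EMBEDDINGS = {
    "GENE_A": [0.9, 0.1, 0.0],
    "GENE_B": [0.8, 0.2, 0.1],
    "GENE_C": [0.0, 1.0, 0.2],
    "GENE_D": [0.1, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def shortlist(query, candidates, embeddings, top_k=2):
    """Hypothetical helper: score each candidate gene by cosine similarity to the
    query gene's vector and return the top_k (gene, score) pairs, best first.
    Candidates missing from the embedding matrix, or equal to the query, are skipped."""
    q = embeddings[query]
    scored = [
        (gene, cosine(q, embeddings[gene]))
        for gene in candidates
        if gene != query and gene in embeddings
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for gene, score in shortlist("GENE_A", ["GENE_B", "GENE_C", "GENE_D"], GENE_EMBEDDINGS):
        print(f"{gene}\t{score:.3f}")
```

Cosine similarity is used here because it compares the direction of embedding vectors while ignoring their length, which is a common choice for word-embedding neighbourhoods. In this toy setup `GENE_B` ranks first because its vector points almost the same way as `GENE_A`'s.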

Publisher

Cold Spring Harbor Laboratory


Cited by 3 articles.

1. Domain-PFP: Protein Function Prediction Using Function-Aware Domain Embedding Representations; 2023-08-24

2. Learning functional properties of proteins with language models; Nature Machine Intelligence; 2022-03-21

3. Evaluation of Methods for Protein Representation Learning: A Quantitative Analysis; 2020-10-28