Dictionary learning for transcriptomics data reveals type-specific gene modules in a multi-class setting

Published:2020-03-11 Issue:3-4 Volume:62 Page:119-134
ISSN:2196-7032
Container-title:it - Information Technology
language:en

Author:

Rams Mona¹, Conrad Tim¹ ² ³

Affiliation:

1. Freie Universität Berlin, Institute for Mathematics, Arnimallee 6, Berlin, Germany

2. Zuse Institute Berlin, Takustraße 7, Berlin, Germany

3. Berlin Institute for the Foundations of Learning and Data, Berlin, Germany

Abstract

Extracting information from large biological datasets is challenging because of their size, high dimensionality, noise, and errors. Gene expression data records which gene products a cell has formed, and thus which genes have been read to activate a particular biological process. Understanding which gene products relate to which processes can, for example, give insight into how diseases evolve and hint at how to fight them. Next-generation RNA sequencing emerged over a decade ago and is now the state of the art in gene expression analysis. However, analyzing these large, complex datasets remains difficult, and many existing methods do not take the underlying structure of the data into account. In this paper, we present a new approach to RNA-sequencing data analysis based on dictionary learning. Dictionary learning is a sparsity-enforcing method that has been widely used in fields such as image processing, pattern classification, and signal denoising. We show that, for RNA-sequencing data, the atoms of the dictionary matrix can be interpreted as gene modules that either capture patterns specific to different sample types or are reused across different scenarios. We evaluate our approach on four large datasets with samples from multiple types. A Gene Ontology term analysis, a standard tool for understanding gene function, shows that the gene sets found agree with the biological context of the sample types. Further, we find that the sparse representations of samples in terms of the dictionary can be used to identify type-specific differences.
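
For orientation, dictionary learning is commonly posed as the sparse matrix factorization sketched below. This is the generic textbook objective, not necessarily the exact formulation used in the paper. The symbols are assumptions introduced here for illustration: X is a genes-by-samples expression matrix, D is the dictionary whose k columns are the atoms (read in the abstract as gene modules), R holds the sparse per-sample codes, and λ is a sparsity weight.

```latex
% Generic dictionary learning objective (illustrative; all symbols are assumed, not taken from the paper)
% X \in \mathbb{R}^{g \times n}: expression matrix, g genes by n samples
% D \in \mathbb{R}^{g \times k}: dictionary, one atom (gene module) per column
% R \in \mathbb{R}^{k \times n}: sparse codes, one column per sample
\min_{D,\,R} \; \tfrac{1}{2}\,\lVert X - D R \rVert_F^2 \;+\; \lambda \sum_{i=1}^{n} \lVert r_i \rVert_1
\quad \text{subject to} \quad \lVert d_j \rVert_2 \le 1, \quad j = 1, \dots, k
```

Under this reading, the large entries of an atom d_j indicate the genes that make up a module, and the few nonzero entries of a sample's code r_i indicate which modules that sample uses.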

Funder

Bundesministerium für Bildung und Forschung

Publisher

Walter de Gruyter GmbH

Subject

General Computer Science

Link

https://www.degruyter.com/document/doi/10.1515/itit-2019-0048/pdf

Reference: 25 articles.

1. Orly Alter, Patrick O. Brown, and David Botstein. Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences, 97(18):10101–10106, 2000.

2. Sven Bergmann, Jan Ihmels, and Naama Barkai. Iterative signature algorithm for the analysis of large-scale gene expression data. Physical Review E, 67(3):031902, 2003.

3. Brian Cleary, Le Cong, Anthea Cheung, Eric S. Lander, and Aviv Regev. Efficient generation of transcriptomic profiles by random composite measurements. Cell, 171(6):1424–1436, 2017.

4. Ronald R. Coifman and David L. Donoho. Translation-invariant de-noising. In Wavelets and Statistics, pages 125–150. Springer, 1995.

5. Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Research, 43:D1049–D1056, 2015.

Cited by 1 article.

1. Dictionary learning allows model-free pseudotime estimation of transcriptomic data; BMC Genomics; 2022-01-15