Discovering a sparse set of pairwise discriminating features in high-dimensional data

Published:2020-07-30 Issue:2 Volume:37 Page:202-212
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Melton Samuel¹, Ramanathan Sharad²³⁴

Affiliation:

1. Applied Mathematics, Harvard University, Cambridge, MA 02138, USA

2. Applied Physics, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA

3. Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA

4. Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA

Abstract

Motivation: Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as for making experimental predictions.

Results: We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace.

Availability and implementation: https://github.com/smelton/SMD.

Supplementary information: Supplementary data are available at Bioinformatics online.
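The Results paragraph summarizes the core idea: propose many cluster configurations without labels, train a discriminator between clusters in each configuration, and average the discriminators to rank features. The Python sketch below illustrates that idea under assumptions that are ours, not the paper's: configurations are proposed by plain k-means with a random k on random subsamples, every pair of proposed clusters gets an L1-penalized logistic regression fitted by proximal gradient descent, and each discriminator contributes its normalized absolute weights to a running average. The function names (`kmeans`, `l1_logistic`, `pairwise_feature_scores`) and all defaults are hypothetical; the SMD repository linked above holds the authors' actual implementation.

```python
import itertools

import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (hypothetical helper); returns one label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        # squared distance from every point to every center, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def l1_logistic(X, y, lam, n_iter=300):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    # 1/L step, where L bounds the Lipschitz constant of the loss gradient (with intercept)
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30.0, 30.0)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        w = w - step * (X.T @ (p - y)) / n
        b = b - step * np.mean(p - y)
        # soft-thresholding is the proximal operator of the L1 penalty (intercept unpenalized)
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w, b


def pairwise_feature_scores(X, n_proposals=30, k_range=(2, 4), subsample=0.8,
                            lam=0.05, min_size=5, seed=0):
    """Average |weights| of pairwise sparse discriminators over proposed clusterings."""
    rng = np.random.default_rng(seed)
    # z-score features so the L1 penalty treats them on a common scale
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    n, d = Xs.shape
    scores, n_fits = np.zeros(d), 0
    for _ in range(n_proposals):
        # one proposed configuration: random k on a random subsample of points
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        Xsub = Xs[idx]
        k = int(rng.integers(k_range[0], k_range[1] + 1))
        labels = kmeans(Xsub, k, rng)
        for a, c in itertools.combinations(range(k), 2):
            in_a, in_c = labels == a, labels == c
            if in_a.sum() < min_size or in_c.sum() < min_size:
                continue  # skip tiny clusters
            mask = in_a | in_c
            w, _ = l1_logistic(Xsub[mask], in_a[mask].astype(float), lam)
            total = np.abs(w).sum()
            if total > 0:
                scores += np.abs(w) / total  # each discriminator adds unit mass
                n_fits += 1
    return scores / max(n_fits, 1)


# Synthetic check: only features 0 and 1 separate the two groups.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
X[:100, :2] += 3.0
X[100:, :2] -= 3.0
scores = pairwise_feature_scores(X)
print(np.argsort(scores)[::-1][:2])  # indices of the two top-scoring features
```

On this synthetic data, the two highest scores should fall on features 0 and 1, since every discriminator between the two true groups relies on them while splits within a group pick up noise features in random directions that average out. Normalizing each discriminator to unit mass is one design choice among several; weighting per configuration rather than per cluster pair would be an equally plausible alternative.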

Funder

DARPA

S.M.

NIH

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa690/33979409/btaa690.pdf

Cited by 9 articles.

1. The specious art of single-cell genomics; PLOS Computational Biology; 2023-08-17

2. Controlling human organoid symmetry breaking reveals signaling gradients drive segmentation clock waves; Cell; 2023-02

3. Controlling organoid symmetry breaking uncovers an excitable system underlying human axial elongation; Cell; 2023-02

4. Coupled organoids reveal that signaling gradients drive traveling segmentation clock waves during human axial morphogenesis; 2022-05-11

5. Machine learning directed organoid morphogenesis uncovers an excitable system driving human axial elongation; 2022-05-11