Abstract
Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.
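The first two stages named above, a Wasserstein distance between profiles followed by a diffusion map, can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. The network is replaced by a hypothetical path graph with unit-length edges, chosen only because the Wasserstein-1 distance then has a closed form. Each profile is treated as a nonnegative mass distribution over the nodes. The function names `w1_on_path`, `pairwise_w1`, and `diffusion_map`, and the parameters `eps`, `n_coords`, and `t`, are illustrative. The Mapper stage is not covered.

```python
import numpy as np


def w1_on_path(p, q):
    """Wasserstein-1 distance between two mass distributions on a path graph.

    Assumption: nodes lie on a line with unit-length edges, so the optimal
    transport cost is the total absolute flow across the edges.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize each profile to unit total mass
    q = q / q.sum()
    # cumsum(p - q)[k] is the net mass that must cross the edge (k, k+1);
    # the last entry is zero (no edge after the final node) and is dropped.
    return np.abs(np.cumsum(p - q))[:-1].sum()


def pairwise_w1(X):
    """Symmetric matrix of W1 distances between the rows (samples) of X."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = w1_on_path(X[i], X[j])
    return D


def diffusion_map(D, eps, n_coords=2, t=1):
    """Diffusion-map coordinates computed from a distance matrix D."""
    K = np.exp(-(D ** 2) / eps)        # Gaussian affinity kernel
    d = K.sum(axis=1)                  # degree of each sample
    # Symmetric conjugate of the Markov matrix P = diag(1/d) @ K;
    # it has the same eigenvalues as P.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]     # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]   # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return psi[:, 1:n_coords + 1] * vals[1:n_coords + 1] ** t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((6, 5)) + 0.1       # synthetic data: 6 samples, 5 nodes
    D = pairwise_w1(X)
    eps = np.median(D[D > 0]) ** 2     # a common heuristic bandwidth choice
    print(diffusion_map(D, eps))
```

On a general network the closed form used in `w1_on_path` no longer applies, and the distance would have to be computed by solving a transport problem with a ground metric derived from the network. The resulting low-dimensional coordinates are the kind of input on which a Mapper-style analysis could then be run.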
Funder
Breast Cancer Research Foundation
Publisher
Springer Science and Business Media LLC