Authors
Mathews James, Nadeem Saad, Pouryahya Maryam, Belkhatir Zehor, Deasy Joseph O., Tannenbaum Allen
Abstract
We present a framework based on optimal mass transport that constructs, for a given network, a reduction hierarchy that can be used for interactive data exploration and community detection. Given a network and a set of numerical data samples for each node, we calculate a new, computationally efficient metric for comparing Gaussian Mixture Models, the Gaussian Mixture Transport distance, and use it to determine a series of merge simplifications of the network. If only a network is given, numerical samples are synthesized from the network topology. The method is based on the local connection structure of the network and on the joint distribution of the data associated with neighboring nodes. The analysis is benchmarked on networks with known community structures. We also analyze gene regulatory networks, including the PANTHER curated database and networks inferred from the GTEx lung and breast tissue RNA profiles. Gene Ontology annotations from the EBI GOA database are ranked and superimposed to explain the salient gene modules. We find that several gene modules related to highly specific biological processes are well coordinated in these tissues. We also find that 18 of the 50 genes of the PAM50 breast-tumor prognostic signature appear among the highly coordinated genes in a single gene module, in both the breast and lung samples. Moreover, these 18 genes are precisely the subset of the PAM50 recently identified as the basal-like markers.
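The abstract does not define the Gaussian Mixture Transport distance. One common optimal-transport construction for comparing Gaussian mixtures, assumed here and not necessarily the authors' exact definition, treats each mixture as a discrete distribution over its components. It uses the squared 2-Wasserstein distance between component Gaussians as the ground cost and then solves a small transport problem between the component weights. The sketch below restricts itself to one-dimensional components, where the squared W2 distance between N(m1, s1^2) and N(m2, s2^2) is (m1 - m2)^2 + (s1 - s2)^2. It also requires both mixtures to have K equally weighted components, so an optimal transport plan is a permutation and brute force over all K! assignments is exact for small K. The function names are hypothetical. For general weights, the permutation search would need to be replaced by a linear-programming transport solver.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple

# A 1-D Gaussian component, given as (mean, standard deviation).
Component = Tuple[float, float]


def gaussian_w2_squared(a: Component, b: Component) -> float:
    """Closed-form squared 2-Wasserstein distance between two 1-D Gaussians."""
    (m1, s1), (m2, s2) = a, b
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


def mixture_transport_distance(x: Sequence[Component], y: Sequence[Component]) -> float:
    """Hypothetical sketch of a transport distance between two mixtures of
    K equally weighted 1-D Gaussians.

    Each mixture is treated as a discrete measure with mass 1/K on each
    component, and the ground cost is the squared W2 distance between
    components. With uniform weights and equal K, an optimal plan is a
    permutation (Birkhoff-von Neumann), so all K! assignments are checked.
    This is only practical for small K.
    """
    k = len(x)
    if k == 0 or k != len(y):
        raise ValueError("mixtures must have the same, nonzero number of components")
    # Cheapest one-to-one matching of components under the W2^2 ground cost.
    best = min(
        sum(gaussian_w2_squared(x[i], y[p[i]]) for i in range(k))
        for p in permutations(range(k))
    )
    # Each matched pair carries mass 1/K; take the square root to get a distance.
    return math.sqrt(best / k)


if __name__ == "__main__":
    a = [(0.0, 1.0), (5.0, 1.0)]
    b = [(5.0, 1.0), (0.0, 2.0)]
    print(round(mixture_transport_distance(a, b), 4))  # 0.7071
```

In the example, the optimal plan matches the component at mean 0 in `a` with the component at mean 0 in `b`, and the component at mean 5 with the component at mean 5. This gives a total cost of 1 and a distance of sqrt(1/2). Working at the level of components rather than full densities keeps the transport problem only K by K in size, which is one reason such a distance can be cheap to compute.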
Publisher
Cold Spring Harbor Laboratory