Author:
Woicik Addie, Zhang Mingxin, Xu Hanwen, Mostafavi Sara, Wang Sheng
Abstract
Motivation
The exponential growth of genomic sequencing data has created ever-expanding repositories of gene networks. Unsupervised network integration methods are critical for learning informative representations of each gene, which are then used as features in downstream applications. However, these methods must be scalable, to accommodate the growing number of networks, and robust, to handle an uneven distribution of network types across hundreds of gene networks.
Results
To address these needs, we present Gemini, a novel network integration method that uses memory-efficient high-order pooling to represent each network and weight it according to its uniqueness. Gemini then mitigates the uneven distribution by mixing existing networks to create many new networks. When integrating hundreds of networks from BioGRID for protein function prediction, Gemini leads to more than a 10% improvement in F1 score, a 14% improvement in micro-AUPRC, and a 71% improvement in macro-AUPRC. Gemini's performance also improves significantly as more networks are added to the input network collection, whereas the comparison approach's performance deteriorates. Gemini thereby enables memory-efficient and informative integration of large gene networks, and can be used to integrate and analyze networks at scale in other domains.
Availability
Gemini is available at https://github.com/MinxZ/Gemini
Contact
addiewc@cs.washington.edu, swang@cs.washington.edu
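The abstract states only that Gemini mixes existing networks to create many new ones; it does not give the mixing rule. The sketch below illustrates one generic way to do this, assuming a mixup-style convex combination of two adjacency matrices that share the same gene ordering, with the pair of networks sampled in proportion to per-network weights. The names `mix_networks` and `augment_networks`, the `weights` argument (a hypothetical stand-in for Gemini's uniqueness-based weights), and the Beta(`alpha`, `alpha`) mixing coefficient are illustrative assumptions, not Gemini's actual implementation; the repository above holds the real code.

```python
import random
from typing import List, Sequence

# Dense adjacency matrix over a fixed, shared gene ordering (assumption).
Matrix = List[List[float]]


def mix_networks(a: Matrix, b: Matrix, lam: float) -> Matrix:
    """Return the convex combination lam * a + (1 - lam) * b of two networks."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("networks must share the same gene ordering and size")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [
        [lam * x + (1.0 - lam) * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def augment_networks(
    networks: Sequence[Matrix],
    weights: Sequence[float],
    n_new: int,
    alpha: float = 1.0,
    seed: int = 0,
) -> List[Matrix]:
    """Create n_new mixed networks (hypothetical helper, generic mixup).

    Both partners are drawn with probability proportional to `weights`, and
    the mixing coefficient is drawn from Beta(alpha, alpha).
    """
    rng = random.Random(seed)
    new_networks = []
    for _ in range(n_new):
        # Two independent weighted draws; the same network may be drawn twice.
        i, j = rng.choices(range(len(networks)), weights=weights, k=2)
        lam = rng.betavariate(alpha, alpha)
        new_networks.append(mix_networks(networks[i], networks[j], lam))
    return new_networks


# Usage on two toy 3-gene networks (made-up values for illustration).
ppi = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
coexp = [[0, 0.5, 0.5], [0.5, 0, 0], [0.5, 0, 0]]
print(mix_networks(ppi, coexp, 0.5))
# [[0.0, 0.75, 0.25], [0.75, 0.0, 0.5], [0.25, 0.5, 0.0]]
mixed = augment_networks([ppi, coexp], weights=[1.0, 3.0], n_new=4)
```

Because each new network is a convex combination of symmetric inputs, it stays symmetric, and over-represented network types can be down-weighted simply by lowering their entries in `weights`.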
Publisher
Cold Spring Harbor Laboratory