Abstract
AbstractData integration of single-cell data describes the task of embedding datasets obtained from different sources into a common space, so that cells with similar cell type or state end up close from one another in this representation independently from their dataset of origin. Data integration is a crucial early step in most data analysis pipelines involving multiple batches and allows informative data visualization, batch effect reduction, high resolution clustering, accurate label transfer and cell type inference. Many tools have been proposed over the last decade to tackle data integration, and some of them are routinely used today within data analysis workflows. Despite constant endeavors to conduct exhaustive benchmarking studies, a recent surge in the number of these methods has made it difficult to choose one objectively for a given use case. Furthermore, these tools are generally provided as rigid pieces of software allowing little to no user agency on their internal parameters and algorithms, which makes it hard to adapt them to a variety of use cases. In an attempt to address both of these issues at once we introducetransmorph, an ambitious unifying framework for data integration. It allows building complex data integration pipelines by combining existing and original algorithmic modules, and is supported by a rich software ecosystem to easily benchmark modules, analyze and report results. We demonstratetransmorphcapabilities and the value of its expressiveness by solving a variety of practical single-cell applications including supervised and unsupervised joint datasets embedding, RNA-seq integration in gene space and label transfer of cell cycle phase within cell cycle genes space. We providetransmorphas a free, open source and computationally efficient python library, with a particular effort to make it compatible with the other state-of-the-art tools and workflows.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献