Abstract
Background
Gene set enrichment analysis (GSEA) tools can be used to identify biological insights from transcriptional datasets and have become an integral part of gene expression-based cancer studies. Over the years, additional GSEA-based methods have been developed, providing the field with an ever-expanding range of options. Although several studies have compared the statistical performance of these tools, the downstream biological implications of choosing between pairwise and single-sample forms of GSEA remain understudied.
Methods
In this study, we compare the statistical and biological interpretation of results obtained using a variety of pre-ranking methods and options for pairwise GSEA and fast GSEA (fGSEA), alongside single sample GSEA (ssGSEA) and gene set variation analysis (GSVA). These analyses are applied to a well-established cohort of n=215 colon tumour samples, using the clinical feature of cancer recurrence status, non-relapse (NR) versus relapse (R), as an initial exemplar, in conjunction with the Molecular Signatures Database “Hallmark” gene sets.
Results
Despite minor fluctuations in statistical performance, pairwise analysis gave remarkably similar results across a range of gene pre-ranking methods and across the choice of GSEA versus fGSEA, with the same well-established prognostic signatures consistently returned as significantly associated with relapse status. In contrast, when the same statistically significant signatures, such as Interferon Gamma Response, were assessed using the ssGSEA and GSVA approaches, there was a complete absence of biological distinction between the two groups (NR and R).
Conclusions
The data presented here highlight how pairwise methods can overgeneralise biological enrichment within a group, assigning strong statistical significance to gene sets in a way that may be inadvertently interpreted as evidence of distinct biology. Importantly, single sample approaches allow users to clearly visualise and interpret statistical significance alongside biological distinction between samples within groups of interest, thus providing a more robust and reliable basis for discovery research.
Publisher
Cold Spring Harbor Laboratory