Abstract
Drug sensitivity prediction models for human cancer cell lines are important tools for identifying potential drivers of drug responsiveness in a pre-clinical setting. Integrating information from heterogeneous data sources is crucial but remains non-trivial, because differences in data structure can prevent fitting algorithms from assigning adequate weights to the complementary information contained in distinct omics data types. As a result, a single data type tends to dominate supposedly multi-omics models. To counteract this effect, we developed a novel tool that enables users to train single-omics models separately in a first step and to integrate them into a multi-omics model in a second step. Extensive ablation studies enable an in-depth evaluation of the contributions of individual data types and of their combinations, identifying redundancies and interdependencies between them. Moreover, the single-omics models are integrated using a range of distinct classification algorithms, which allows their performance to be compared. The tool returns sets of molecular events and tissue types found to be related to significant shifts in drug sensitivity, facilitating a comprehensive and straightforward analysis of potential drivers of drug responsiveness. Our two-step approach yields sets of genuinely multi-omics pan-cancer classification models that are highly predictive for a majority of drugs in the GDSC database. For targeted drugs with particular modes of action, its predictive performance compares favourably with that of classification models that incorporate multi-omics data in a simple one-step approach. Additionally, case studies demonstrate that it succeeds both in correctly identifying known key drivers of sensitivity to specific drug compounds and in providing sets of candidate additional drivers of drug sensitivity.
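The two-step idea described above (separate single-omics models first, an integrating classifier second, with ablation over data-type combinations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the tool's implementation: the data are synthetic, the block names `mutation` and `expression`, the helper functions, and the hold-out split used to train the second step are hypothetical, and a plain logistic regression stands in for whichever classification algorithms the tool actually offers.

```python
import math
import random
from itertools import combinations


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(probs, y):
    return sum((p > 0.5) == bool(yi) for p, yi in zip(probs, y)) / len(y)


def make_cell_line(rng):
    # Synthetic stand-in (assumption): sensitivity depends on one mutation
    # and on the expression of one gene, plus a little noise.
    mutation = [float(rng.random() < 0.5) for _ in range(3)]
    expression = [rng.gauss(0.0, 1.0) for _ in range(3)]
    score = 2.0 * mutation[0] - 1.0 + 1.5 * expression[0] + rng.gauss(0.0, 0.3)
    return {"mutation": mutation, "expression": expression}, int(score > 0)


def block(data, name):
    return [features[name] for features, _ in data]


def labels(data):
    return [label for _, label in data]


rng = random.Random(0)
samples = [make_cell_line(rng) for _ in range(600)]
# Separate splits so the integrating model never sees base-model training data.
base_train, meta_train, test = samples[:200], samples[200:400], samples[400:]

# Step 1: train one model per omics data type, each on its own features only.
base_models = {
    name: fit_logistic(block(base_train, name), labels(base_train))
    for name in ("mutation", "expression")
}


def meta_features(data, names):
    # One column per selected single-omics model: its predicted probability.
    columns = [predict_proba(base_models[n], block(data, n)) for n in names]
    return [list(row) for row in zip(*columns)]


# Step 2 plus ablation: integrate every non-empty subset of single-omics models.
results = {}
for k in (1, 2):
    for names in combinations(base_models, k):
        meta = fit_logistic(meta_features(meta_train, names), labels(meta_train))
        probs = predict_proba(meta, meta_features(test, names))
        results[names] = accuracy(probs, labels(test))

for names, acc in results.items():
    print("+".join(names), round(acc, 3))
```

Because the second-step model only sees one prediction per data type, each omics block contributes a single input regardless of how many raw features it contains, which is one way to keep a high-dimensional data type from dominating. Comparing the entries of `results` across subsets mirrors, in miniature, the kind of ablation comparison the abstract describes.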
Publisher
Cold Spring Harbor Laboratory