Abstract
AbstractHistopathologic diagnosis and classification of cancer plays a critical role in guiding treatment. Advances in next-generation sequencing have ushered in new complementary molecular frameworks. However, existing approaches do not independently assess both site-of-origin (e.g. prostate) and lineage (e.g. adenocarcinoma) and have minimal validation in metastatic disease, where classification is more difficult. Utilizing gradient-boosted machine learning, we developed ATLAS, a pair of separate AI Tumor Lineage and Site-of-origin models from RNA expression data on 8249 tumor samples. We assessed performance independently in 10,376 total tumor samples, including 1490 metastatic samples, achieving an accuracy of 91.4% for cancer site-of-origin and 97.1% for cancer lineage. High confidence predictions (encompassing the majority of cases) were accurate 98–99% of the time in both localized and remarkably even in metastatic samples. We also identified emergent properties of our lineage scores for tumor types on which the model was never trained (zero-shot learning). Adenocarcinoma/sarcoma lineage scores differentiated epithelioid from biphasic/sarcomatoid mesothelioma. Also, predicted lineage de-differentiation identified neuroendocrine/small cell tumors and was associated with poor outcomes across tumor types. Our platform-independent single-sample approach can be easily translated to existing RNA-seq platforms. ATLAS can complement and guide traditional histopathologic assessment in challenging situations and tumors of unknown primary.
Funder
U.S. Department of Health & Human Services | National Institutes of Health
U.S. Department of Defense
University of Wisconsin Carbone Cancer Center
Doris Duke Charitable Foundation
Publisher
Springer Science and Business Media LLC