Data Perturbation Independent Diagnosis and Validation of Breast Cancer Subtypes Using Clustering and Patterns

Author:

Alexe G. (1, 2), Dalgin G.S. (3), Ramaswamy R. (2, 4), Delisi C. (5), Bhanot G. (1, 2, 5, 6)

Affiliation:

1. Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A.

2. The Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ 08540, U.S.A.

3. Molecular Biology, Cell Biology and Biochemistry Program, Boston University, 2 Cummington Street, Boston, MA 02215, U.S.A.

4. School of Information Technology, Jawaharlal Nehru University, New Delhi 110 067, India.

5. Biomedical Engineering, Boston University, 44 Cummington Street, Boston, MA 02215, U.S.A.

6. Department of Biomedical Engineering and BioMaPS Institute, Rutgers University, Piscataway, NJ 08854.

Abstract

Molecular stratification of disease based on expression levels of sets of genes can help guide therapeutic decisions if such classifications can be shown to be stable against variations in sample source and data perturbation. Classifications inferred from one set of samples in one lab should be able to consistently stratify a different set of samples in another lab. We present a method for assessing such stability and apply it to the breast cancer (BCA) datasets of Sorlie et al. (2003) and Ma et al. (2003). We find that, within the now commonly accepted BCA categories identified by Sorlie et al., Luminal A and Basal are robust, but Luminal B and ERBB2+ are not. In particular, 36% of the samples identified as Luminal B and 55% of those identified as ERBB2+ cannot be assigned an accurate category because the classification is sensitive to data perturbation. We identify a “core cluster” of samples for each category, and from these we determine “patterns” of gene expression that distinguish the core clusters from each other. We find that the best markers for Luminal A and Basal are (ESR1, LIV1, GATA-3) and (CCNE1, LAD1, KRT5), respectively. Pathways enriched in the patterns regulate apoptosis, tissue remodeling and the immune response. We use a different dataset (Ma et al. 2003) to test the accuracy with which samples can be allocated to the four disease subtypes. We find, as expected, that the classification of samples identified as Luminal A and Basal is robust but classification into the other two subtypes is not.
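
The idea of testing a classification against data perturbation and keeping only a stable "core cluster" can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes Gaussian noise as the perturbation, a nearest-centroid rule as the classifier, and a toy two-feature dataset. The function names (`label_stability`, `core_cluster`), the noise level, the number of rounds and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import random

def centroid(rows):
    """Mean vector of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest(sample, centroids):
    """Label of the centroid closest to sample (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(sample, centroids[lab])),
    )

def label_stability(samples, labels, noise_sd=0.5, n_rounds=200, seed=0):
    """Fraction of perturbed rounds in which each sample keeps its label.

    Assumption: perturbation is additive Gaussian noise on every value, and
    a sample is re-assigned to the nearest centroid of the noisy data.
    """
    rng = random.Random(seed)
    kept = [0] * len(samples)
    for _ in range(n_rounds):
        noisy = [[x + rng.gauss(0.0, noise_sd) for x in row] for row in samples]
        groups = {}
        for row, lab in zip(noisy, labels):
            groups.setdefault(lab, []).append(row)
        cents = {lab: centroid(rows) for lab, rows in groups.items()}
        for i, row in enumerate(noisy):
            if nearest(row, cents) == labels[i]:
                kept[i] += 1
    return [k / n_rounds for k in kept]

def core_cluster(stability, threshold=0.9):
    """Indices of samples whose label survives at least `threshold` of rounds."""
    return [i for i, s in enumerate(stability) if s >= threshold]

# Toy data (hypothetical): two well-separated groups plus one borderline sample.
samples = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1],
           [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
           [2.3, 2.3]]
labels = ["A", "A", "A", "B", "B", "B", "B"]

stability = label_stability(samples, labels)
core = core_cluster(stability)
print(stability)
print(core)
```

In this toy setting the six well-separated samples keep their labels in essentially every round, while the borderline sample changes label often and is excluded from the core. This mirrors, in a much simplified form, the distinction the abstract draws between robust and perturbation-sensitive assignments.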

Publisher

SAGE Publications

Subject

Cancer Research, Oncology

Cited by 9 articles.

1. Cross-Study Replicability in Cluster Analysis;Statistical Science;2023-05-01

2. Validation of cluster analysis results on validation data: A systematic framework;WIREs Data Mining and Knowledge Discovery;2021-12-23

3. Breast cancer subtype predictors revisited: from consensus to concordance?;BMC Medical Genomics;2016-06-03

4. Pathway-based personalized analysis of breast cancer expression data;Molecular Oncology;2015-04-29

5. Predicting Cancer Survival Using Expression Patterns;Medical Biostatistics for Complex Diseases;2010-07-19
