Genetic signature of differentiated thyroid carcinoma susceptibility: a machine learning approach

Author:

Brigante Giulia (1, 2), Lazzaretti Clara (1), Paradiso Elia (1), Nuzzo Federico (1), Sitti Martina (1), Tüttelmann Frank (3), Moretti Gabriele (4), Silvestri Roberto (4), Gemignani Federica (4), Försti Asta (5, 6), Hemminki Kari (7, 8), Elisei Rossella (9), Romei Cristina (9), Zizzi Eric Adriano (10), Deriu Marco Agostino (10), Simoni Manuela (1, 2, 11), Landi Stefano (4), Casarini Livio (1, 11)

Affiliation:

1. Unit of Endocrinology, Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio Emilia, Modena, Italy

2. Unit of Endocrinology, Department of Medical Specialties, Azienda Ospedaliero-Universitaria, Modena, Italy

3. Institute of Reproductive Genetics, University of Münster, Münster, Germany

4. Department of Biology, University of Pisa, Pisa, Italy

5. Hopp Children’s Cancer Center (KiTZ), Heidelberg, Germany

6. Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), Heidelberg, Germany

7. Biomedical Center, Faculty of Medicine and Biomedical Center in Pilsen, Charles University in Prague, Pilsen, Czech Republic

8. Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany

9. Department of Endocrinology, University Hospital, Pisa, Italy

10. PolitoBIO Med Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Italy

11. Center for Genomic Research, University of Modena and Reggio Emilia, Modena, Italy

Abstract

To identify a specific genetic combination predisposing to differentiated thyroid carcinoma (DTC), we selected a set of single nucleotide polymorphisms (SNPs) associated with DTC risk and used a polygenic risk score (PRS), Bayesian statistics and a machine learning (ML) classifier to describe cases and controls in three different datasets. Dataset 1 (649 DTC, 431 controls) had been previously genotyped in a genome-wide association study (GWAS) of DTC in Italian individuals. Dataset 2 (234 DTC, 101 controls) and dataset 3 (404 DTC, 392 controls) were genotyped. Associations of 171 SNPs reported to predispose to DTC in candidate studies were extracted from the GWAS of dataset 1, and the SNPs associated with DTC risk (P < 0.05) were then tested for replication in dataset 2. The reliability of the identified SNPs was confirmed by PRS and Bayesian statistics after merging the three datasets. The SNPs were then used by an ML classifier to describe the case/control status of individuals. Of the 171 SNPs reported to be associated with DTC, 15 were associated with DTC risk in both datasets 1 and 2. Using these markers, PRS revealed that individuals in the fifth quintile had a seven-fold higher risk of DTC than those in the first. Bayesian inference confirmed that the selected 15 SNPs differentiate cases from controls. Results were corroborated by ML, which reached a maximum AUC of about 0.7. A restricted selection of only 15 DTC-associated SNPs is able to describe the underlying genetic structure of Italian individuals, and ML allows a fair prediction of case or control status based solely on the individual genetic background.
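The PRS comparison between quintiles mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the genotypes are synthetic, the per-SNP weights are made-up log odds ratios, the function names are hypothetical, and taking the quintile cut points from the pooled sample with a 0.5 continuity correction is an assumption made only for this example.

```python
import numpy as np

def polygenic_risk_score(genotypes, weights):
    """Weighted sum of risk-allele counts (0/1/2) per individual.

    genotypes: array of shape (n_individuals, n_snps)
    weights:   per-SNP weights, e.g. log odds ratios (assumed here)
    """
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)

def quintile_odds_ratio(scores, is_case):
    """Odds of being a case in the fifth quintile relative to the first."""
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    # Cut points from the pooled sample (an assumption of this sketch).
    cuts = np.quantile(scores, [0.2, 0.4, 0.6, 0.8])
    quintile = np.searchsorted(cuts, scores, side="right")  # values 0..4

    def odds(q):
        in_q = quintile == q
        cases = np.sum(in_q & is_case)
        controls = np.sum(in_q & ~is_case)
        # 0.5 continuity correction avoids division by zero in small bins.
        return (cases + 0.5) / (controls + 0.5)

    return odds(4) / odds(0)

# Synthetic data: 15 SNPs, risk alleles slightly more frequent in cases.
rng = np.random.default_rng(0)
n_snps = 15
weights = rng.uniform(0.1, 0.4, size=n_snps)          # hypothetical log ORs
controls = rng.binomial(2, 0.30, size=(500, n_snps))  # risk-allele counts
cases = rng.binomial(2, 0.40, size=(500, n_snps))

genotypes = np.vstack([controls, cases])
is_case = np.concatenate([np.zeros(500, dtype=bool), np.ones(500, dtype=bool)])

scores = polygenic_risk_score(genotypes, weights)
or_top_vs_bottom = quintile_odds_ratio(scores, is_case)
print(f"Odds ratio, fifth vs first quintile: {or_top_vs_bottom:.2f}")
```

With real data, the weights would come from the association estimates of the selected SNPs, and the resulting odds ratio would be reported with a confidence interval.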

Publisher

Bioscientifica

Subject

Endocrinology, Diabetes and Metabolism

Cited by 1 article.
