Comprehensive evaluation of the implementation of episignatures for diagnosis of neurodevelopmental disorders (NDDs)

Author:

Giuili Edoardo, Grolaux Robin, Macedo Catarina Z. N. M., Desmyter Laurence, Pichon Bruno, Neuens Sebastian, Vilain Catheline, Olsen Catharina, Van Dooren Sonia, Smits Guillaume, Defrance Matthieu

Abstract

Episignatures are popular tools for the diagnosis of rare neurodevelopmental disorders. They are commonly based on a set of differentially methylated CpGs used in combination with a support vector machine model. DNA methylation (DNAm) data often include missing values due to changes in data generation technology and batch effects. While many normalization methods exist for DNAm data, their impact on episignature performance has never been assessed. In addition, technologies to quantify DNAm evolve quickly, which may lead to poor transposition of existing episignatures generated on deprecated array versions to new ones. Indeed, probe removal between array versions, between technologies or during preprocessing leads to missing values. Thus, the effect of missing data on episignature performance must also be carefully evaluated and addressed through imputation or an innovative approach to episignature design. In this paper, we used data from patients with Kabuki and Sotos syndromes to evaluate the influence of normalization methods, classification models and missing data on the prediction performance of two existing episignatures. We compare how six popular normalization methods for methylation array data affect episignature classification performance in Kabuki and Sotos syndromes and provide best practice suggestions for building new episignatures. In this setting, we show that the Illumina, Noob and Funnorm normalization methods achieved higher classification performance on the testing sets than the Quantile, Raw and Swan normalization methods. We further show that penalized logistic regression and support vector machines perform best in the classification of Kabuki and Sotos syndrome patients.
Then, we describe a new paradigm to build episignatures based on the detection of differentially methylated regions (DMRs) and evaluate their performance compared to classical episignatures based on differentially methylated cytosines (DMCs) in the presence of missing data. We show that the performance of classical DMC-based episignatures suffers more from missing data than that of the DMR-based approach. We present a comprehensive evaluation of how the normalization of DNA methylation data affects episignature performance, using three popular classification models. We further evaluate how missing data affect those models' predictions. Finally, we propose a novel methodology to develop episignatures based on the identification of differentially methylated regions and show that this method slightly outperforms classical episignatures in the presence of missing data.
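
To illustrate why a region-based signature might tolerate missing probes better than a CpG-based one, here is a minimal Python sketch. The abstract does not state how region-level features are computed, so averaging the observed beta values within each region is an assumption made for illustration only. The probe identifiers, region names, beta values and the helper functions `dmc_features` and `dmr_features` are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

# A sample maps probe IDs to beta values (0..1); absent keys are missing probes.
Sample = Dict[str, float]


def dmc_features(sample: Sample, cpgs: List[str]) -> List[Optional[float]]:
    """One feature per CpG; None marks a probe that is missing in the sample."""
    return [sample.get(cpg) for cpg in cpgs]


def dmr_features(sample: Sample, regions: Dict[str, List[str]]) -> List[Optional[float]]:
    """One feature per region: the mean of the probes observed in that region.

    ASSUMPTION: averaging is an illustrative choice, not the paper's stated method.
    A region is only missing (None) when every one of its probes is missing.
    """
    features: List[Optional[float]] = []
    for probes in regions.values():
        observed = [sample[p] for p in probes if p in sample]
        features.append(mean(observed) if observed else None)
    return features


# Hypothetical regions and probe IDs.
regions = {
    "region_A": ["cg_a1", "cg_a2", "cg_a3"],
    "region_B": ["cg_b1", "cg_b2"],
}
cpgs = [probe for probes in regions.values() for probe in probes]

# Hypothetical sample in which cg_a2 and cg_b2 were dropped (e.g. by a newer array).
sample = {"cg_a1": 0.8, "cg_a3": 0.9, "cg_b1": 0.2}

dmc = dmc_features(sample, cpgs)
dmr = dmr_features(sample, regions)

print(sum(v is None for v in dmc), "missing DMC features")  # prints "2 missing DMC features"
print(sum(v is None for v in dmr), "missing DMR features")  # prints "0 missing DMR features"
```

In this toy case the CpG-level feature vector would need imputation for two of its five entries before it could be passed to a classifier, while both region-level features remain defined from the surviving probes.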

Funder

Fonds De La Recherche Scientifique - FNRS

Innoviris Foundation

Publisher

Springer Science and Business Media LLC

Subject

Genetics (clinical), Genetics
