Abstract
AbstractDNA methylation is one of the most commonly studied epigenetic biomarkers, due to its role in disease and development. The Illumina Infinium methylation arrays still remains the most common method to interrogate methylation across the human genome, due to its capabilities of screening over 480, 000 loci simultaneously. As such, initiatives such as The Cancer Genome Atlas (TCGA) have utilized this technology to examine the methylation profile of over 20,000 cancer samples. There is a growing body of methods for pre-processing, normalisation and analysis of array-based DNA methylation data. However, the shape and sampling distribution of probe-wise methylation that could influence the way data should be examined was rarely discussed. Therefore, this article introduces a pipeline that predicts the shape and distribution of normalised methylation patterns prior to selection of the most optimal inferential statistics screen for differential methylation. Additionally, we put forward an alternative pipeline, which employed feature selection, and demonstrate its ability to select for biomarkers with outstanding differences in methylation, which does not require the predetermination of the shape or distribution of the data of interest.AvailabilityThe Distribution test and the feature selection pipelines are available for download at: https://github.com/uqjlu8/DistributionTest
Publisher
Cold Spring Harbor Laboratory