Author:
Law Joseph, Gallon Richard, Teare Ethan, Koref Ivan Santibanez, Phelps Rachel, Burn John, Jackson Michael, Koref Mauro Santibanez
Abstract
Analysis of somatic mutation patterns is widely used to infer exposure to exogenous and endogenous mutagenic influences. This raises the question of how much sequence data is required to detect factors of interest. A common use of mutation pattern analysis is the identification of increased microsatellite instability to uncover mismatch repair (MMR) defects in tumours and normal tissues. Here we explore the effects of sequencing depth and the number of loci analysed on the ability to detect MMR deficiency, using artificial neural networks and publicly available amplicon sequencing data from colorectal tumours on 24 short quasi-monomorphic microsatellites (up to 12 bp in length, PMID 31471937), split into a training set (99 samples) and a test set (95 samples). We show that, at a sequencing depth of 200, pairs of mononucleotide repeats can achieve discrimination between MMR proficient and deficient colorectal tumours similar to that obtained with the full 24-marker panel, with accuracies above 97% and ROC AUCs in excess of 99% in the test set. Our results indicate that, for short monomorphic microsatellites, considering the length distribution of the different alleles at each locus, representing these distributions as two-dimensional structures and including convolutional layers in the network can facilitate discrimination between MMR deficient and proficient tumour material. They also indicate that, despite the limitations imposed by amplification, sequencing accuracy and the limited divergence time between the sequences from one locus, high-depth sequencing can be used to identify MMR deficiency from a limited number of loci. However, they also suggest that, for a fixed total number of reads per sample, increasing sensitivity by increasing the number of targets is more efficient than increasing per-target sequencing depth.
These results are of interest for screening large numbers of samples and for assessing the impact of MMR deficiency in different areas of the genome.
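The abstract does not specify how the allele length distributions are encoded, so the following is only a minimal sketch of one possible two-dimensional representation: a loci-by-length matrix of read frequencies, subsampled to a fixed depth. The function name, the input layout, the 0 to 12 bp length range and the default depth of 200 are assumptions made for illustration and are not taken from the authors' pipeline.

```python
import numpy as np

def length_distribution_matrix(read_lengths_by_locus, max_length=12, depth=200, seed=0):
    """Return (loci, matrix), where matrix[i, k] is the fraction of reads at
    locus i whose repeat length is k bp.

    read_lengths_by_locus is a hypothetical input: a dict mapping a locus
    name to a list of observed repeat lengths, one entry per read.
    """
    rng = np.random.default_rng(seed)
    loci = sorted(read_lengths_by_locus)
    matrix = np.zeros((len(loci), max_length + 1))
    for row, locus in enumerate(loci):
        lengths = np.asarray(read_lengths_by_locus[locus], dtype=int)
        if lengths.size == 0:
            continue  # no reads at this locus: leave the row as zeros
        # Subsample without replacement when more reads than `depth` exist.
        if lengths.size > depth:
            lengths = rng.choice(lengths, size=depth, replace=False)
        # Lengths outside 0..max_length are clipped into the boundary bins.
        counts = np.bincount(np.clip(lengths, 0, max_length), minlength=max_length + 1)
        matrix[row] = counts / counts.sum()
    return loci, matrix

# Toy data, invented for illustration only.
reads = {"locusA": [10] * 150 + [9] * 50, "locusB": [12] * 300}
loci, m = length_distribution_matrix(reads)
```

A matrix of this shape could be passed to a convolutional layer as a single-channel image. Under a fixed read budget, the number of rows (loci) and `depth` trade off against each other, which is the comparison the abstract describes.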
Publisher
Cold Spring Harbor Laboratory