Abstract
AbstractCopy number variations (CNVs) are an essential component of genetic variation distributed across large parts of the human genome. CNV detection from next-generation sequencing data and artificial intelligence algorithms has progressed in recent years. However, only a few tools have taken advantage of machine learning algorithms for CNV detection, and none propose using artificial intelligence to automatically detect probable CNV-positive samples. Furthermore, in general, most CNV software that is developed for specific data types has sub-optimal reliability for routine practice. In addition, the most developed approach is to use a reference or normal dataset to compare with the samples of interest, and it is well known that selecting appropriate normal samples represents a challenging task which dramatically influences the precision of results in all CNV-detecting tools. With careful consideration of these issues, we propose here ifCNV, a new software based on isolation forests that creates its own reference, available in R and python with customisable parameters. ifCNV combines artificial intelligence using two isolation forests and a comprehensive scoring method to faithfully detect CNVs among various samples. It was validated using datasets from diverse origins (capture and amplicon, germline and somatic), and it exhibits high sensitivity, specificity and accuracy. ifCNV is a publicly available open-source software that allows the detection of CNVs in many clinical situations.Key pointsCopy number variation detectionMachine learningLocalisation scoringBenchmark on various clinical situations and on various datasetsEasy-to-use R and Python open-source Package
Publisher
Cold Spring Harbor Laboratory
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献