Authors:
Zhai Yue, Bardel Claire, Vallée Maxime, Iwaz Jean, Roy Pascal
Abstract
Introduction: Ideally, evaluating next-generation sequencing performance requires a gold standard; in its absence, concordance between replicates is often used as a substitute standard. However, the appropriateness of the concordance-discordance criterion has rarely been evaluated. This study analyses the relationship between the probability of discordance and the probability of error under different conditions. Methods: This study used a conditional probability approach, first under conditional dependence and then under conditional independence between two sequencing results, and compared the probabilities of discordance and error under different theoretical conditions of sensitivity, specificity, and correlation between replicates, then on real sequencing results for genome NA12878. The study also examined covariate effects on discordance and error using generalized additive models with smooth functions. Results: With 99% sensitivity and 99.9% specificity under conditional independence, the probability of error for a positive concordant pair of calls is 0.1%. With additional hypotheses of 0.1% prevalence and 0.9 correlation between replicates, the probability of error for a positive concordant pair is 47.4%. With real data, the estimated sensitivity, specificity, and correlation between tests for variants are around 98.98%, 99.996%, and 93%, respectively, and the error rate for positive concordant calls is approximately 2.5%. In covariate effect analyses, the functional forms of the effects are similar in the discordance and error models, though the proportion of deviance explained by the covariates differs between the two. Conclusion: With conditional independence of two sequencing results, the concordance-discordance criterion seems acceptable as a substitute standard. However, with high correlation, the criterion becomes questionable because a high percentage of the concordant results are false concordant results.
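
The theoretical figures in the Results can be approximated with a short calculation. The sketch below is not the authors' code. It assumes that the two replicates are exchangeable, that a 0.1% prevalence applies to both scenarios, and that the same correlation holds among true variants and among non-variants, modelled as the correlation between two Bernoulli calls. The function names `p_both_positive` and `error_given_concordant_positive` are hypothetical, and the abstract does not state the exact parametrization of dependence used in the study.

```python
def p_both_positive(p_single, rho):
    """P(both replicates call positive) for two exchangeable Bernoulli calls
    with marginal probability p_single and correlation rho (assumed model)."""
    return p_single ** 2 + rho * p_single * (1.0 - p_single)


def error_given_concordant_positive(sens, spec, prev, rho=0.0):
    """P(no true variant | both replicates call a variant)."""
    # True variant present and both replicates detect it.
    true_concordant = prev * p_both_positive(sens, rho)
    # No variant present, yet both replicates call one (false concordant pair).
    false_concordant = (1.0 - prev) * p_both_positive(1.0 - spec, rho)
    return false_concordant / (true_concordant + false_concordant)


# Theoretical conditions quoted in the abstract; prevalence is an assumption
# for the independent case.
independent = error_given_concordant_positive(0.99, 0.999, 0.001)
correlated = error_given_concordant_positive(0.99, 0.999, 0.001, rho=0.9)

print(f"conditional independence: {independent:.4f}")
print(f"correlation 0.9:          {correlated:.4f}")
```

Under these assumptions the independent case comes out at about 0.1%, and the correlated case at just under one half, in the neighbourhood of the 47.4% reported. Exact agreement with the reported value would depend on how the study parametrizes the dependence between replicates.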