Affiliation:
1. INRIA, 78153 Le Chesnay, France
2. École Normale Supérieure, 75005 Paris, France
Abstract
We study and compare two classes of statistical criteria for assessing the significance of exceptional words. Z-score-like criteria, and the normal approximation that is strictly equivalent to them, suffer from several drawbacks in terms of sensitivity and specificity. Recent mathematical results that exploit the combinatorial structure of words have made it possible to compute the exact P-value. We study here the drawbacks of the Z-score, the choice of the threshold, and how tightly the Z-score approximates the exact P-value. A major conclusion is that the normal approximation is always very poor and overestimates statistical significance.
Publisher
World Scientific Pub Co Pte Ltd
Subject
Computer Science Applications, Molecular Biology, Biochemistry
Cited by
15 articles.