Affiliation:
1. MARS Research Unit, Faculty of Sciences of Monastir, University of Monastir, Monastir, Tunisia
Abstract
Choosing the optimal threshold for collocation extraction remains a manual task performed by experts. To date, no in-depth study has explored possible solutions for learning this threshold automatically in the field of statistical terminology. In this paper, the authors shed light on this problem by first exploring the performance evaluation techniques used in several scientific areas (such as biomedicine and biometrics) and then applying them to statistical terminology. The experimental study gives promising results. First, it shows the effectiveness of standard techniques for evaluating binary classification systems, such as ROC and Precision-Recall curves. Second, it provides a practical solution for automatically estimating optimal thresholds for collocation extraction systems.
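The abstract does not state which operating point is selected on each curve. A common choice, assumed here, is to maximise Youden's J statistic (TPR minus FPR) on the ROC curve and to maximise the F1 score on the Precision-Recall curve. The minimal sketch below treats collocation extraction as binary classification: every candidate whose association score reaches the threshold is accepted. The function names `confusion_at` and `best_thresholds`, the toy scores, and the expert labels are all hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def confusion_at(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) when every candidate scoring >= threshold is accepted."""
    tp = fp = tn = fn = 0
    for score, is_collocation in zip(scores, labels):
        accepted = score >= threshold
        if accepted and is_collocation:
            tp += 1
        elif accepted:
            fp += 1
        elif is_collocation:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_thresholds(
    scores: Sequence[float], labels: Sequence[bool]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (Youden's J threshold, max-F1 threshold) over all observed scores.

    Candidates are scanned in ascending order and only a strictly better value
    replaces the current best, so ties go to the lowest threshold.
    """
    best_j, best_j_t = float("-inf"), None
    best_f1, best_f1_t = float("-inf"), None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_at(scores, labels, t)
        tpr = tp / (tp + fn) if tp + fn else 0.0  # recall (ROC y-axis)
        fpr = fp / (fp + tn) if fp + tn else 0.0  # ROC x-axis
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * tpr / (precision + tpr)) if precision + tpr else 0.0
        j = tpr - fpr  # Youden's J: vertical distance above the ROC diagonal
        if j > best_j:
            best_j, best_j_t = j, t
        if f1 > best_f1:
            best_f1, best_f1_t = f1, t
    return best_j_t, best_f1_t


if __name__ == "__main__":
    # Hypothetical association scores (e.g. PMI) for eight candidate word pairs,
    # paired with hypothetical expert judgements (True = real collocation).
    scores = [9.1, 7.4, 6.8, 5.2, 4.9, 3.3, 2.0, 1.1]
    labels = [True, True, False, True, False, False, True, False]
    print(best_thresholds(scores, labels))  # (5.2, 5.2)
```

On this toy list both criteria agree on 5.2. On real data they can diverge, because F1 ignores true negatives while the ROC-based criterion does not, so the choice of curve matters when rejected non-collocations vastly outnumber true collocations.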
Cited by 11 articles.