Affiliation:
1. College of Automation, Southeast University, Nanjing, China
Abstract
Open-Domain Question Answering (ODQA) has attracted increasing interest owing to its extensive applications in search engines and smart robots. In experiments, it is observed that the convergence of a method has a strong effect on its generalization performance. Motivated by this observation, an unsupervised clustering technique (namely, ClusSampling) is proposed to improve both the convergence and the efficacy of existing ODQA methods. Specifically, unsupervised clustering is first performed, and negative samples with higher similarity to the questions are then selected. In addition, the authors propose using the gap statistic to determine the optimal number of clusters. Experimental results show that the method achieves a notable speedup during training and yields accuracy gains of 5.3% and 2.2% on two widely used benchmarks.
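The abstract outlines two technical steps: choosing the number of clusters with the gap statistic, and drawing negative samples from the cluster most similar to a question. The sketch below illustrates that general idea under stated assumptions; the encoder, data layout, and function names (within_cluster_dispersion, gap_statistic, cluster_negatives) are hypothetical and are not taken from the paper, whose ClusSampling procedure may differ in detail.

```python
# Minimal sketch of clustering-based hard-negative sampling with a gap-statistic
# choice of k. Assumes question/passage embeddings are already computed by some
# encoder; all names here are illustrative, not the paper's actual implementation.
import numpy as np
from sklearn.cluster import KMeans


def within_cluster_dispersion(X, labels, centers):
    """Sum of squared distances of points to their assigned cluster centers."""
    return sum(((X[labels == c] - centers[c]) ** 2).sum() for c in range(len(centers)))


def gap_statistic(X, k_candidates, n_refs=10, random_state=0):
    """Pick a cluster count via the gap statistic (Tibshirani et al., 2001)."""
    rng = np.random.default_rng(random_state)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, sks = [], []
    for k in k_candidates:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        log_wk = np.log(within_cluster_dispersion(X, km.labels_, km.cluster_centers_))
        # Reference dispersions from uniform samples over the data's bounding box.
        ref_logs = []
        for _ in range(n_refs):
            ref = rng.uniform(lo, hi, size=X.shape)
            km_r = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(ref)
            ref_logs.append(
                np.log(within_cluster_dispersion(ref, km_r.labels_, km_r.cluster_centers_))
            )
        gaps.append(np.mean(ref_logs) - log_wk)
        sks.append(np.std(ref_logs) * np.sqrt(1 + 1 / n_refs))
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; otherwise fall back to the largest.
    for i in range(len(k_candidates) - 1):
        if gaps[i] >= gaps[i + 1] - sks[i + 1]:
            return k_candidates[i]
    return k_candidates[-1]


def cluster_negatives(question_emb, passage_embs, n_per_question=8, k_candidates=(2, 4, 8, 16)):
    """Cluster passages, then take negatives from the cluster nearest the question,
    so sampled negatives are more similar (harder) than uniformly random ones."""
    k = gap_statistic(passage_embs, list(k_candidates))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(passage_embs)
    nearest = np.argmin(((km.cluster_centers_ - question_emb) ** 2).sum(axis=1))
    candidates = np.flatnonzero(km.labels_ == nearest)
    sims = passage_embs[candidates] @ question_emb
    return candidates[np.argsort(-sims)[:n_per_question]]
```

In this reading, the gap statistic replaces a hand-tuned cluster count, and restricting negatives to the question's nearest cluster is one simple way to realize "negative samples with higher similarity to the questions"; the published method may rank or sample within clusters differently.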
Publisher
Institution of Engineering and Technology (IET)