Affiliation:
1. IBM Watson, NY, USA
2. Yahoo Labs London, UK
3. Universitá degli Studi di Bari Aldo Moro, Italy
Abstract
Community question answering (CQA) sites use a collaborative paradigm to satisfy complex information needs. Although the task of matching questions to their best answers has been tackled for more than a decade, the social question-answering practice is a complex process. The factors influencing the accuracy of question-answer matching are many and hard to disentangle. We approach the task from an application-oriented perspective, probing the space of several dimensions relevant to this problem: features, algorithms, and topics. We gather under a learning to rank framework the most extensive feature set used in literature to date, including 225 features from five different families. We test the power of such features in predicting the best answer to a question on the largest dataset from Yahoo Answers used for this task so far (40M answers) and provide a faceted analysis of the results along different topical areas and question types. We propose a novel family of distributional semantics measures that most of the time can seamlessly replace widely used linguistic similarity features, being more than one order of magnitude faster to compute and providing greater predictive power. The best feature set reaches an improvement between 11% and 26% in P@1 compared to recent well-established state-of-the-art methods.
Funder
Ministry of Science and Innovation of Spain
SocialSensor project
European Community's Seventh Framework Programme
Spanish Centre for the Development of Industrial Technology under the CENIT program
(http://www.cenitsocialmedia.es) “Social Media”
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications,General Business, Management and Accounting,Information Systems
Cited by
25 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献