Affiliation:
1. Baruch College, City University of New York, USA
2. The College of Staten Island, City University of New York, USA
3. Queensland University of Technology, Australia
Abstract
Background knowledge has been actively investigated as a possible means to improve the performance of machine learning algorithms. Research has shown that background knowledge plays an especially critical role in three atypical text categorization tasks: short-text classification, classification with limited labeled data, and non-topical classification. This chapter explores the use of machine learning for non-hierarchical classification of search queries, and presents an approach to background knowledge discovery using information retrieval techniques. Two different sets of background knowledge obtained from the World Wide Web, one in 2006 and one in 2009, are used with the proposed approach to classify a commercial corpus of web query data by the age of the user. In the process, various classification scenarios are generated and executed, providing insight into the choice, significance, and range of tuning parameters, and exploring the impact of the dynamic web on classification results.