Authors:
Yoshida Minoru, Kita Kenji
Abstract
Both words and numerals are tokens found in almost all documents, but they have different properties. However, relatively little attention has been paid to numerals in text, and many systems have treated the numbers in a document in ad hoc ways: regarding them as mere strings in the same way as words, normalizing them to zeros, or simply ignoring them. The recent growth of natural language processing (NLP) research has changed this situation, and more and more attention is being paid to numeracy in documents. In this survey, we provide a quick overview of the history and recent advances in research on mining the relations between numerals and words found in text data.