Applications of Machine Learning for Linguistic Analysis of Texts

Published: 2012
Pages: 133-148
Container-title: Machine Learning Algorithms for Problem Solving in Computational Applications

Authors:

Torney Rosemary¹, Yearwood John², Vamplew Peter¹, Kelarev Andrei V.¹

Affiliation:

1. University of Ballarat, Australia

2. Federation University, Australia

Abstract

This chapter describes a novel multistage method for the linguistic clustering of large collections of texts available on the Internet, as a precursor to linguistic analysis of these texts. The method addresses the practical difficulty of clustering a very large set of text documents by combining unsupervised clustering with supervised classification. It relies on creating many independent clusterings of a randomized sample selected from the International Corpus of Learner English. Several consensus functions and other sophisticated algorithms are applied in two substages to combine these independent clusterings into one final consensus clustering, which is then used to train fast classifiers that can profile very large collections of text and web data. This approach makes it possible to use advanced, highly accurate clustering techniques that would otherwise be impractical on a very large collection, by combining them with fast supervised classification algorithms. A crucial question for the effectiveness of this multistage method is how well the supervised classification algorithms will perform at the final stage, when they are used to process large data sets available on the Internet. This performance may also serve as an indication of the quality of the consensus clustering obtained in the preceding stages. The authors’ experimental results compare several classification algorithms incorporated in this multistage scheme and demonstrate that several of them achieve very high precision and recall and can be used in practical implementations of the method.
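The abstract does not name the clustering algorithms, consensus functions, or classifiers the chapter uses, so the following is only a minimal Python sketch of the multistage idea under stated assumptions. Plain k-means with different random initializations stands in for the independent clusterings. A single co-association (evidence accumulation) consensus function stands in for the chapter's two consensus substages. A nearest-centroid classifier plays the role of the fast supervised classifier. Agreement between the classifier and the consensus labels on a held-out part of the sample, measured as macro-averaged precision and recall, mirrors the abstract's use of classification performance as an indication of consensus quality. All function names are hypothetical, and the 2-D points are toy stand-ins for text feature vectors, not data from the International Corpus of Learner English.

```python
import random
from collections import defaultdict

# Hypothetical sketch: k-means, co-association consensus and nearest-centroid
# classification are assumed stand-ins, not the chapter's actual algorithms.


def sqdist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    # Component-wise mean of a non-empty list of vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = rng.sample(points, k)  # k distinct random points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (unchanged if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def co_association_consensus(clusterings, threshold=0.5):
    """Evidence accumulation: link points i and j when they share a cluster in more
    than `threshold` of the runs, then return the connected components as labels."""
    n, runs = len(clusterings[0]), len(clusterings)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            together = sum(1 for labels in clusterings if labels[i] == labels[j])
            if together / runs > threshold:
                parent[find(i)] = find(j)
    roots = {}  # relabel components as 0, 1, 2, ...
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def train_nearest_centroid(points, labels):
    # The "fast" classifier: one centroid per consensus cluster.
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {lab: mean(ps) for lab, ps in groups.items()}


def predict(model, p):
    return min(model, key=lambda lab: sqdist(p, model[lab]))


def macro_precision_recall(y_true, y_pred):
    # Macro-averaged precision and recall over every label that occurs.
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        n_pred, n_true = y_pred.count(c), y_true.count(c)
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_true if n_true else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


def multistage_pipeline(corpus, sample_size, k, n_runs, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(corpus, sample_size)  # step 1: randomized sample
    clusterings = [kmeans(sample, k, random.Random(seed + r + 1)) for r in range(n_runs)]
    consensus = co_association_consensus(clusterings)  # step 2: consensus clustering
    cut = 2 * sample_size // 3  # step 3: train on part, check agreement on the rest
    model = train_nearest_centroid(sample[:cut], consensus[:cut])
    predicted = [predict(model, p) for p in sample[cut:]]
    return model, macro_precision_recall(consensus[cut:], predicted)


# Toy stand-in for document feature vectors: two well-separated 2-D groups.
data_rng = random.Random(42)
corpus = ([[data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)] for _ in range(150)]
          + [[data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)] for _ in range(150)])
model, (precision, recall) = multistage_pipeline(corpus, sample_size=90, k=2, n_runs=10)
print(len(model), round(precision, 2), round(recall, 2))
```

Because only pairs that share a cluster in a majority of runs are linked, an occasional k-means run that settles in a poor local optimum does not change the consensus. A real profiling setup would replace the toy vectors with linguistic feature vectors and the nearest-centroid model with whichever classifiers are being compared.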

Publisher

IGI Global
