Affiliation:
1. Institute of Business Administration (IBA), Karachi, Pakistan
Abstract
Document clustering techniques often produce clusters whose meaning cannot be interpreted without human intervention. Automatic cluster labeling is the process of assigning a meaningful phrase to a cluster as its label. This article proposes an unsupervised cluster labeling method based on noun phrase chunking. The proposed method is compared with four statistical methods (Z-Order, M-Order, T-Order, and YAKE) and with two graph-based techniques (TextRank and PositionRank). The experiments were performed on a corpus of news headlines in Urdu, a low-resource language. The effectiveness of the proposed approach was evaluated using cosine similarity, the Jaccard index, and feedback from human evaluators. The results show that the proposed method outperforms the other methods, and the labels it produces were found to be more relevant and semantically richer than those of the other approaches.
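As a rough illustration of the two automatic measures named in the abstract, the sketch below computes the Jaccard index and cosine similarity between a candidate label and a reference label. It assumes both are already tokenized and that cosine similarity is taken over raw token counts; the abstract does not say how the article represents labels, so the function names and the vectorization are hypothetical choices, not the authors' implementation.

```python
import math
from collections import Counter


def jaccard_index(label_tokens, reference_tokens):
    """Size of the token-set intersection divided by the size of the union."""
    a, b = set(label_tokens), set(reference_tokens)
    if not a and not b:
        return 0.0  # convention chosen here for two empty labels
    return len(a & b) / len(a | b)


def cosine_similarity(label_tokens, reference_tokens):
    """Cosine of the angle between the two token-count vectors (assumed representation)."""
    va, vb = Counter(label_tokens), Counter(reference_tokens)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder tokens; real input would be tokenized Urdu labels.
candidate = ["cricket", "world", "cup"]
reference = ["world", "cup", "final"]
print(jaccard_index(candidate, reference))
print(cosine_similarity(candidate, reference))
```

Both scores lie between 0 and 1 for count vectors, with higher values indicating more overlap between the generated label and the reference.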
Publisher
Association for Computing Machinery (ACM)