Affiliation:
1. Department of Information Technology, College of Technology and Design, University of Economics Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam
Abstract
Discovering customer intents from text or speech data plays a vital role in text mining and automated dialogue response. Processing thousands of customer interactions daily is challenging. Deep Embedded Clustering (DEC) and Improved DEC (IDEC), which are trained with a Kullback–Leibler (KL) loss, handle large volumes of data inefficiently because that loss is asymmetric. To address this challenge, an unsupervised learning approach is proposed that discovers intents and automatically produces labels from a collection of unlabeled utterances in the banking domain. The approach improves both the DEC and IDEC architectures by combining the Jensen–Shannon (JS) divergence, used to simultaneously learn feature representations and cluster assignments, with the Second-order Clipped Stochastic Optimization (Sophia) optimizer. In the second stage, a dependency parser generates a set of intent labels for each cluster. Experimental results show that the proposed approach generates meaningful intent labels and clusters short texts with high performance.
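The asymmetry of the KL loss mentioned in the abstract, and the symmetry of the JS divergence, can be illustrated with a minimal sketch. The distributions `p` and `q` below are toy values chosen for illustration, not data from the paper, and the function names and the `eps` smoothing term are assumptions of this sketch rather than the authors' implementation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as sequences.

    eps is a small smoothing constant (an assumption of this sketch)
    that avoids log(0) and division by zero.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def js_divergence(p, q):
    """JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy soft cluster assignments over three clusters (illustrative only).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
js_pq = js_divergence(p, q)
js_qp = js_divergence(q, p)

print(f"KL(p||q) = {kl_pq:.4f}, KL(q||p) = {kl_qp:.4f}")  # the two values differ
print(f"JS(p,q)  = {js_pq:.4f}, JS(q,p)  = {js_qp:.4f}")  # the two values match
```

Swapping the arguments changes the KL value but not the JS value, and with natural logarithms the JS divergence is bounded above by ln 2. How the JS term is wired into the DEC and IDEC objectives is described in the paper itself and is not reproduced here.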
Funder
University of Economics Ho Chi Minh City
Publisher
World Scientific Pub Co Pte Ltd