An intent recognition pipeline for conversational AI
-
Published:2023-12-29
-
ISSN:2511-2104
-
Container-title:International Journal of Information Technology
-
language:en
-
Short-container-title:Int. j. inf. tecnol.
Author:
Chandrakala C. B.,
Bhardwaj Rohit,
Pujari Chetana
Abstract
Natural Language Processing (NLP) is an application of Artificial Intelligence that enables computers to process and understand human language. NLP models are used to analyze large volumes of text and support tasks such as text summarization, language translation, context modeling, and sentiment analysis. Natural Language Understanding (NLU), a subset of NLP, turns natural language into structured data by performing intent classification and entity extraction. This paper focuses on a pipeline that maximizes the coverage of a conversational AI (chatbot) by extracting as many meaningful intents as possible from a data corpus. A conversational AI answers queries about a dataset best when it is trained on the largest set of intents that can be gathered from that dataset. The more intents we gather, the more of the dataset is covered when training the conversational AI. The pipeline is modularized into three broad stages: gathering the intents from the corpus, finding misspellings and synonyms of the intents, and deciding the order in which intents are picked up for training a classifier ML model. Several heuristic and machine-learning approaches are considered for optimum results. Misspellings and synonyms are extracted with text-vector, neural-network-based algorithms. The system concludes with a suggested priority list of intents to feed to a classification model. Finally, an example of three intents from the corpus is picked, and their order is suggested for the optimum functioning of the pipeline. The paper attempts to pick intents in descending order of their coverage in the corpus.
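The final stage described in the abstract, ordering intents by their coverage of the corpus, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rank_intents_by_coverage`, the whitespace tokenizer, the greedy selection rule, and the toy corpus are all assumptions made for illustration. The misspelling and synonym variants are supplied by hand here, whereas the paper obtains them from text-vector models.

```python
from typing import Dict, List, Set, Tuple


def tokenize(utterance: str) -> Set[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return set(utterance.lower().split())


def rank_intents_by_coverage(
    corpus: List[str],
    intent_terms: Dict[str, Set[str]],
) -> List[Tuple[str, int]]:
    """Greedily order intents by how many not-yet-covered utterances each adds.

    intent_terms maps a hypothetical intent name to its surface forms
    (canonical term plus misspellings and synonyms).
    """
    tokens = [tokenize(u) for u in corpus]
    # For each intent, the indices of utterances containing any of its terms.
    matches = {
        intent: {i for i, toks in enumerate(tokens) if toks & terms}
        for intent, terms in intent_terms.items()
    }
    covered: Set[int] = set()
    ranking: List[Tuple[str, int]] = []
    remaining = dict(matches)
    while remaining:
        # Largest marginal gain first; ties are broken alphabetically.
        best = min(remaining, key=lambda k: (-len(remaining[k] - covered), k))
        gain = len(remaining.pop(best) - covered)
        ranking.append((best, gain))
        covered |= matches[best]
    return ranking


# Toy data, invented for this example.
corpus = [
    "i want a refund for my order",
    "refund please",
    "where is my order",
    "track my ordr",
    "cancel my subscription",
    "how do i get a refnd",
]
intent_terms = {
    "refund": {"refund", "refnd"},
    "order_status": {"order", "ordr", "track"},
    "cancel": {"cancel"},
}
ranking = rank_intents_by_coverage(corpus, intent_terms)
print(ranking)
```

Each entry in `ranking` pairs an intent with the number of additional utterances it covers, so the list reads as a suggested training priority. Counting marginal rather than raw coverage is a design choice of this sketch: it avoids ranking two intents highly when they match mostly the same utterances.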
Funder
Manipal Academy of Higher Education, Manipal
Publisher
Springer Science and Business Media LLC
Subject
Electrical and Electronic Engineering,Applied Mathematics,Artificial Intelligence,Computational Theory and Mathematics,Computer Networks and Communications,Computer Science Applications,Information Systems