Abstract
In data mining, short text analysis is widely performed for many applications. Because traditional natural language processing tools rely on the syntax of the language, they are very difficult to apply to short text and often cannot be applied correctly. Short texts provide sparse and insufficient data, and their high noise and ambiguity make it difficult to identify semantic knowledge. In this paper, the authors propose replacing the cosine similarity coefficient with the Jaro-Winkler similarity measure to obtain the similarity match between pairs of texts (source text and target text). Jaro-Winkler does a better job of determining the similarity of strings because it takes order into account, using positional indices to estimate relevance. It is presumed that CACT driven by Jaro-Winkler offers better performance on one-to-many data links than CACT driven by cosine. In this paper, the ensemble algorithm of CACTS and SAE is adopted with the Jaro-Winkler similarity approach. The new algorithm is applied to short text analysis and yields better results. An evaluation of the proposed concept serves as validation.
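To illustrate the order-sensitivity argument above, the following minimal sketch compares a textbook Jaro-Winkler implementation with a cosine similarity computed over character counts. It is not the authors' CACT or CACTS implementation: the function names, the character-level cosine, and the conventional parameters (prefix scale 0.1, prefix length at most 4) are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def jaro(s: str, t: str) -> float:
    """Jaro similarity: counts matching characters within a sliding window
    and penalises matched characters that appear in a different order."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    if ls == 0 or lt == 0:
        return 0.0
    window = max(max(ls, lt) // 2 - 1, 0)
    s_matched = [False] * ls
    t_matched = [False] * lt
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(lt, i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters, each kept in its own string's order.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / ls + matches / lt
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def cosine_char(s: str, t: str) -> float:
    """Cosine similarity over character counts (assumed representation);
    it ignores the order in which characters occur."""
    a, b = Counter(s), Counter(t)
    dot = sum(a[c] * b[c] for c in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Same characters, reversed order: cosine cannot tell them apart.
    print(round(cosine_char("abcd", "dcba"), 4))       # 1.0
    print(round(jaro_winkler("abcd", "dcba"), 4))      # 0.5
    # A single adjacent swap keeps the Jaro-Winkler score high.
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

In this sketch the reversed pair scores 1.0 under the order-blind cosine but only 0.5 under Jaro-Winkler, which is the behaviour the abstract relies on. A token-level cosine over word vectors would give different numbers, but it would be equally insensitive to word order.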
Publisher
Blue Eyes Intelligence Engineering and Sciences Publication - BEIESP
Subject
Computer Science Applications, General Engineering, Environmental Engineering
Cited by
8 articles.