Affiliation:
1. Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
Abstract
The primary objective of this work is to classify Hindi and Telugu stories into three genres:
fable, folk-tale,
and
legend
. In this work, we are proposing a framework for story classification (SC) using keyword and part-of-speech (POS) features. For improving the performance of SC system, feature reduction techniques and combinations of various POS tags are explored. Further, we investigated the performance of SC by dividing the story into parts depending on its semantic structure. In this work, stories are (i) manually divided into parts based on their semantics as
introduction, main,
and
climax
; and (ii) automatically divided into equal parts based on number of sentences in a story as
initial, middle,
and
end
. We have also examined
sentence increment model,
which aims at determining an optimum number of sentences required to identify story genre by incremental selection of sentences in a story. Experiments are conducted on Hindi and Telugu story corpora consisting of 300 and 150 short stories, respectively. The performance of SC system is evaluated using different combinations of keyword and POS-based features, with three well-established machine learning classifiers: (i) Naive Bayes (NB), (ii) k-Nearest Neighbour (KNN), and (iii) Support Vector Machine (SVM). Performance of the classifier is evaluated using 10-fold cross-validation and effectiveness of classifier is measured using precision, recall, and F-measure. From the classification results, it is observed that adding linguistic information boosts the performance of story classification. In view of the structure of the story, main, and initial parts of the story have shown comparatively better performance. The results from the sentence incremental model have indicated that the first nine and seven sentences in Hindi and Telugu stories, respectively, are sufficient for better classification of stories. In most of the studies, SVM models outperformed the other models in classification accuracy.
Funder
Development of Text-to-Speech synthesis for Indian Languages Phase II
Publisher
Association for Computing Machinery (ACM)
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. AI-Based Secure Software-Defined Controller to Assist Alzheimer’s Patients in Their Daily Routines;Intelligent Data Engineering and Analytics;2023
2. Named entity recognition on different languages: A survey;PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON FRONTIER OF DIGITAL TECHNOLOGY TOWARDS A SUSTAINABLE SOCIETY;2023