A Hybrid Hindi Printed Document Classification System Using SVM and Fuzzy

Published:2019-10 Issue:4 Volume:12 Page:107-131
ISSN:1938-7857
Container-title:Journal of Information Technology Research
language:en
Author:

Puri Shalini¹, Singh Satya Prakash¹

Affiliation:

1. Birla Institute of Technology, Mesra, Ranchi, India

Abstract

This article introduces a Hindi printed document classification system built on tri-layered segmentation and a bi-level classifier. It assigns imaged documents to predefined, mutually exclusive categories, using SVM for character classification and Fuzzy matching for document classification. During training, the enhanced, noise-free image is segmented into lines and words by profiling. The system then extracts Shirorekha Less (SL) isolated characters, along with upper, left, and right modifier components, from the SL words. Each modifier component is associated only with its corresponding character, based on its location and the inter character-modifier component distance. Next, confidence values for all characters are calculated through SVM training, and the characters are mapped to Romanized labels to generate words. Finally, documents are classified by Fuzzy-based matching of the detected Romanized words against the predefined classes. The average execution times for SL characters are 0.22675 sec. and 0.20375 sec., and the classification accuracies are 74.61% and 80.73%, for training and testing, respectively.
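
The abstract names two generic techniques that can be illustrated briefly: profiling-based segmentation and fuzzy matching of Romanized words against class vocabularies. The following is a minimal sketch under stated assumptions, not the authors' implementation. All function names, the keyword lists, and the 0.75 threshold are hypothetical, and Python's `difflib.SequenceMatcher` similarity ratio stands in for whatever Fuzzy matching rule the article actually uses. The SVM character recognition stage is omitted.

```python
from difflib import SequenceMatcher


def segment_runs(profile):
    """Return (start, end) pairs for each run of non-zero profile values."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:                  # a run that touches the last index
        runs.append((start, len(profile)))
    return runs


def segment_lines(image):
    """Split a binary image (rows of 0/1 ink pixels) into text-line row ranges
    using the horizontal projection profile (ink count per row)."""
    row_profile = [sum(row) for row in image]
    return segment_runs(row_profile)


def classify_document(words, class_keywords, threshold=0.75):
    """Score each class by the similarity of the detected Romanized words to
    its keyword list and return (best_label, scores).

    Assumption: similarity is difflib's ratio, and only matches at or above
    `threshold` contribute. Returns None as the label if nothing matches.
    """
    scores = {label: 0.0 for label in class_keywords}
    for word in words:
        for label, keywords in class_keywords.items():
            best = max(SequenceMatcher(None, word, k).ratio() for k in keywords)
            if best >= threshold:
                scores[label] += best
    best_label = max(scores, key=scores.get)
    if scores[best_label] == 0.0:
        return None, scores
    return best_label, scores


# Hypothetical usage: a tiny 6-row image and made-up Romanized keywords.
image = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
lines = segment_lines(image)
label, scores = classify_document(
    ["khel", "kriket"],
    {"sports": ["cricket", "khel"], "politics": ["chunav", "sarkar"]},
)
```

The same `segment_runs` helper could be applied to a column-wise profile within each line to separate words, although a real system would also need a minimum gap width to avoid splitting characters inside a word.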

Publisher

IGI Global

Subject

General Computer Science
