Affiliation:
1. Department of Computer Engineering and Science, Yuan-Ze University, 135 Yuan-Yung Rd., Nei-Li, Chung-Li, Taoyuan, 32026, Taiwan, R.O.C.
Abstract
This paper proposes a novel character segmentation method for printed documents. Handling touching, overlapping, and broken characters simultaneously is very difficult. Our strategy is to adjust the binarization parameters so that broken characters are avoided; as a side effect, adjacent characters may merge into each other more severely. The character segmentation problem can therefore be reduced to touching-character detection and separation. In the proposed approach, touching characters are detected using the topological attributes of characters and the typographical relationships between characters. More specifically, the topological attributes are derived from the spatial organization of the concave residua contained in the convex hull enclosing the characters. A shortest-path algorithm, together with the convex-hull information, is then used to separate the composite. Because these convex-hull-based features are insensitive to character font and size, the method can handle touching characters across various fonts and sizes, including heavily touching characters and overlapping italic characters, without prior slant correction. The proposed method has been applied to extract isolated characters from technical journals, which contain characters of various fonts and sizes. The experimental results are promising and support the practicality and feasibility of the proposed method.
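To make the idea of separating a touching pair with a shortest path more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a binary image given as a list of rows (1 = ink, 0 = background) and finds the cheapest top-to-bottom path with dynamic programming over a layered grid, where crossing ink is penalized. The function name `min_cost_cut` and the parameters `ink_cost` and `step_cost` are hypothetical, and the convex-hull information that the abstract says guides the cut is omitted here.

```python
def min_cost_cut(image, ink_cost=10, step_cost=1):
    """Return (path, total) for the cheapest top-to-bottom cut.

    image: list of equal-length rows, 1 = ink, 0 = background (assumed format).
    path:  one column index per row; consecutive rows differ by at most 1.
    Illustrative sketch only: costs and connectivity are assumptions.
    """
    rows, cols = len(image), len(image[0])

    def cell(r, c):
        # Every step costs step_cost; cutting through ink costs extra.
        return step_cost + ink_cost * image[r][c]

    cost = [cell(0, c) for c in range(cols)]
    back = []  # back[r - 1][c] = column chosen in row r - 1 for cell (r, c)
    for r in range(1, rows):
        new_cost, new_back = [], []
        for c in range(cols):
            # Candidate predecessors; the straight-down move is listed first
            # so that ties are resolved in favour of a vertical cut.
            preds = [p for p in (c, c - 1, c + 1) if 0 <= p < cols]
            best = min(preds, key=lambda p: cost[p])
            new_cost.append(cost[best] + cell(r, c))
            new_back.append(best)
        cost = new_cost
        back.append(new_back)

    # Pick the cheapest end point in the last row, then trace the path back.
    c = min(range(cols), key=lambda j: cost[j])
    total = cost[c]
    path = [c]
    for row_back in reversed(back):
        c = row_back[c]
        path.append(c)
    path.reverse()
    return path, total


# Two blobs joined by a one-pixel bridge in the middle row.
touching = [
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
path, total = min_cost_cut(touching)
print(path, total)
```

In this toy image the cut runs straight down the middle column and crosses ink only at the bridge. A practical system would restrict the search region, for example to the neighbourhood of concavities in the convex hull, which is where the abstract's topological attributes would come in.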
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
8 articles.