Affiliation:
1. Department of Computer Science, University of Stuttgart, Azenbergstr 12, D-7000, Stuttgart, FR Germany
Abstract
Realizing the paper-free office is proving more difficult than expected. Good paper-computer interfaces are therefore necessary to transform paper documents into an electronic form that can be used by a filing and retrieval system. An electronic document page is an optically scanned and digitized representation of a printed page. Document analysis is the problem of interpreting and labeling the constituents of the document. Although very reliable optical character recognition (OCR) methods exist, the process can be very inefficient. To prune the search space and improve efficiency, search-supporting methods have to be developed. This article proposes an approach to identify the layout of a document page by dividing it recursively into nested rectangular areas. The procedure is used as a basis for a document layout model, which is able to control an automatic interpretation mechanism for deriving a high-level representation of the contents of a document. We have implemented our method in Common Lisp on a Symbolics 3640 workstation and have run it on a large population of office documents. The results obtained have been very encouraging and have convincingly confirmed the soundness of our approach.
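The idea of dividing a page recursively into nested rectangular areas can be illustrated with a minimal sketch. This is not the authors' Common Lisp implementation. It is a generic recursive cut along blank rows and columns, written in Python under several assumptions: the page is a small binary grid (1 = ink, 0 = background), the function names `ink_runs` and `segment` are hypothetical, and the `min_gap` threshold is an illustrative parameter not taken from the article.

```python
def ink_runs(flags, min_gap):
    """Group indices that contain ink into (start, end) runs, end exclusive.

    Runs separated by fewer than min_gap blank entries are merged.
    """
    runs = []
    for i, has_ink in enumerate(flags):
        if not has_ink:
            continue
        if runs and i - runs[-1][1] < min_gap:
            runs[-1][1] = i + 1        # gap too small: extend the previous run
        else:
            runs.append([i, i + 1])    # gap wide enough: start a new run
    return [(start, end) for start, end in runs]


def segment(page, top, left, bottom, right, min_gap=2):
    """Return a nested dict of rectangles (top, left, bottom, right).

    Assumption: page is a list of rows of 0/1 values; bottom and right
    are exclusive. Returns None for an area without ink.
    """
    rows = [any(page[y][left:right]) for y in range(top, bottom)]
    cols = [any(page[y][x] for y in range(top, bottom))
            for x in range(left, right)]
    row_runs = ink_runs(rows, min_gap)
    col_runs = ink_runs(cols, min_gap)
    if not row_runs:
        return None

    # Tighten the rectangle to the ink it actually contains.
    t, b = top + row_runs[0][0], top + row_runs[-1][1]
    l, r = left + col_runs[0][0], left + col_runs[-1][1]
    node = {"box": (t, l, b, r), "children": []}

    if len(row_runs) > 1:              # cut along blank horizontal bands
        parts = [(top + s, l, top + e, r) for s, e in row_runs]
    elif len(col_runs) > 1:            # otherwise cut along blank vertical bands
        parts = [(t, left + s, b, left + e) for s, e in col_runs]
    else:
        return node                    # no cut possible: leaf rectangle

    for part in parts:
        child = segment(page, *part, min_gap)
        if child is not None:
            node["children"].append(child)
    return node


# Toy page: a one-line header above two text columns.
page = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
]
tree = segment(page, 0, 0, len(page), len(page[0]))
```

On the toy page, the root rectangle is first cut horizontally into the header and the body, and the body is then cut vertically into two columns. The resulting tree of nested rectangles is the kind of structure a layout model could use to decide where to apply OCR.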
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
16 articles.