Error correction vs. query garbling for Arabic OCR document retrieval
-
Published:2007-11
Issue:1
Volume:26
Page:5
-
ISSN:1046-8188
-
Container-title:ACM Transactions on Information Systems
-
language:en
-
Short-container-title:ACM Trans. Inf. Syst.
Author:
Darwish Kareem (1),
Magdy Walid (1)
Affiliation:
1. IBM Technology Development Center, Cairo, Abou Rawash, Egypt
Abstract
Due to the existence of large numbers of legacy documents (such as old books and newspapers), improving retrieval effectiveness for OCR'ed documents continues to be an important problem. This article compares the effect of OCR error correction, with and without language modeling, and the effect of query garbling with weighted structured queries on the retrieval of OCR-degraded Arabic documents. The results suggest that moderate error correction does not yield statistically significant improvement in retrieval effectiveness when indexing and searching using n-grams. Also, reversing error correction models to perform query garbling in conjunction with weighted structured queries yields improved retrieval effectiveness. Lastly, using very good error correction that utilizes language modeling yields the best improvement in retrieval effectiveness.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Reference
36 articles.
1. Ahmed, M. 2000. A large-scale computational processor of Arabic morphology and applications. MSc thesis, Cairo University, Cairo, Egypt.
Cited by
4 articles.
1. Studying the effect and treatment of misspelled queries in Cross-Language Information Retrieval;Information Processing & Management;2016-07
2. Applications Exploiting Multimedia Semantics;Multimedia Ontology;2015-06-26
3. Arabic Information Retrieval;Foundations and Trends® in Information Retrieval;2014
4. Information Retrieval;Natural Language Processing of Semitic Languages;2014