Abstract
Traditional document processing relies on manually extracting and organizing the information in a document, which is labor-intensive, time-consuming, and error-prone. To improve the efficiency and accuracy of document data processing, we develop IntelliExtract, an end-to-end framework for document information extraction. The framework covers image text detection and recognition, information extraction, and intelligent document question answering. It employs several recent models and algorithms: OCR models that convert scanned documents into machine-readable text, layout analysis algorithms that capture the spatial arrangement of document elements, and information extraction techniques that derive structured data from unstructured documents. To evaluate the framework, we conducted experiments on a Chinese Talent Resumes Dataset and visualized the results. For named entity extraction, the confidence of the results extracted from the text in the images is generally above 0.95. The proposed framework gives enterprises, educational institutions, and other organizations a powerful tool for processing document information and holds promise for significant practical applications.
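The stage ordering described above (text detection and recognition, then layout analysis, then information extraction) can be illustrated with a minimal sketch. This is not the IntelliExtract implementation: the names `TextBox`, `Entity`, `order_by_layout`, `run_pipeline`, `stub_ocr`, and `stub_extract` are all hypothetical, the OCR and extraction stages are stand-in stubs with made-up confidence scores, and using 0.95 as a filtering threshold is an assumption borrowed from the confidence level reported in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TextBox:
    text: str
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class Entity:
    label: str
    value: str
    confidence: float


def order_by_layout(boxes: List[TextBox]) -> List[TextBox]:
    # Naive stand-in for layout analysis: read top-to-bottom, then left-to-right.
    return sorted(boxes, key=lambda b: (b.bbox[1], b.bbox[0]))


def run_pipeline(
    image,
    ocr: Callable[[object], List[TextBox]],
    extract: Callable[[str], List[Entity]],
    min_confidence: float = 0.95,  # assumed threshold, not from the paper
) -> List[Entity]:
    boxes = order_by_layout(ocr(image))
    text = "\n".join(b.text for b in boxes)
    # Keep only entities whose confidence meets the threshold.
    return [e for e in extract(text) if e.confidence >= min_confidence]


def stub_ocr(image) -> List[TextBox]:
    # Hypothetical OCR output, deliberately returned out of reading order.
    return [
        TextBox("Email: a@b.com", (10, 50, 200, 70)),
        TextBox("Name: Li Wei", (10, 10, 200, 30)),
    ]


def stub_extract(text: str) -> List[Entity]:
    # Hypothetical extractor: splits "Label: value" lines, toy confidences.
    scores = {"Name": 0.98, "Email": 0.80}
    entities = []
    for line in text.splitlines():
        label, _, value = line.partition(": ")
        entities.append(Entity(label, value, scores.get(label, 0.0)))
    return entities


result = run_pipeline(None, stub_ocr, stub_extract)
print(result)
```

Passing the OCR and extraction stages in as callables keeps the sketch independent of any particular model; a real system would substitute actual detection, recognition, and entity extraction components for the stubs.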
Cited by
1 article.
1. Fast and Accurate Resume Parsing Method Based on Multi-Task Learning;2023 International Conference on Asian Language Processing (IALP);2023-11-18