Affiliation:
1. The School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
2. National Information Center of GACC, Beijing 100010, China
Abstract
Customs is a crucial line of national security defense, yet its existing risk prevention and screening system falls short in both the intelligence and the diversity of its risk identification factors. Two issues in the risk identification system are therefore urgent: the intelligent extraction of key information from Customs Unstructured Accompanying Documents (CUADs), and the reliability of the extraction results. In the customs scenario, optical character recognition (OCR) is employed for M2M interactions, but current models have difficulty adapting to diverse image qualities and complex customs document content. To address this, we propose a hybrid mutual learning knowledge distillation (HMLKD) method that optimizes the performance of a pre-trained OCR model. In addition, current models do not effectively incorporate domain-specific knowledge, so their text recognition accuracy is insufficient for practical customs risk identification. We therefore construct a customs domain knowledge graph (CDKG) from CUAD knowledge and propose an integrated post-OCR correction method based on it (iCDKG-PostOCR). Results on real data show that accuracy improves to 97.70% for code text fields, 96.55% for character type fields, and 96.00% for numerical type fields, with a confidence rate exceeding 99% for each. Furthermore, the Customs Health Certificate Extraction System (CHCES) developed using the proposed method has been implemented and verified at Tianjin Customs in China, where it has shown outstanding operational performance.
Funder
National Key Research and Development Program of China