Variable indexing method in rule documents for ship design using extraction of portable document format elements

Author:

Kong Min-Chul (1), Roh Myung-Il (1, 2), Kim Ki-Su (3), Kim Jongoh (4), Kim Ju-Sung (4), Park Hogyun (4)

Affiliation:

1. Department of Naval Architecture and Ocean Engineering, Seoul National University, Gwanak-gu, Seoul, 08826, Republic of Korea

2. Research Institute of Marine Systems Engineering, Seoul National University, Gwanak-gu, Seoul, 08826, Republic of Korea

3. School of Naval Architecture and Ocean Engineering, University of Ulsan, Nam-gu, Ulsan, 44610, Republic of Korea

4. ICT Solution Team, Korean Register, Gangseo-gu, Busan, 46762, Republic of Korea

Abstract

Design rules for ships have become more extensive and detailed as ships have grown in size. Several of the variables and equations used in the rules are complex, and the sheer volume of the rules impedes their review. In addition, because the rules are constantly revised, professional investigators may miss changes. To prevent such confusion, a shipping register, which approves ship drawings, continually works to automate the search and review of the rules. This study therefore proposes a method for recognizing variables in rule documents so that the rules can be reviewed and relationships between variables can be built. To do so, each component of a document must be accurately identified. A rule document includes different components such as equations, figures, and strings. Because the rules are mainly converted to portable document format (PDF) for compatibility, it is challenging to extract each component as raw data. This study used a public library to extract elements from the PDF and used the positional relationships between the elements to identify the variables. The document was partitioned according to the table of contents by applying the Levenshtein distance algorithm, which measures the difference between two strings. The identified variables were then indexed to sections of the table of contents. Additionally, based on the indexed information, a data structure was proposed to show the equations, the definitions of variables, and their relationships. The method was applied to the Common Structural Rules, which are widely used in the shipbuilding industry. The effectiveness of the proposed method was confirmed by an F1 score of 0.93 in variable recognition and by an intuitive visualization of the relationships between the variables.
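The partitioning step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how extracted text is compared with table-of-contents entries, so the normalization, the `max_ratio` threshold, and the function names `levenshtein`, `normalize`, and `partition_by_toc` are all assumptions made for illustration. The sketch treats an extracted line as a section heading when its length-normalized Levenshtein distance to a table-of-contents title is small, and assigns the following lines to that section.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed preprocessing step)."""
    return " ".join(text.lower().split())


def partition_by_toc(lines, toc_titles, max_ratio=0.2):
    """Group extracted lines under the table-of-contents title they follow.

    max_ratio is a hypothetical threshold on distance / longer length;
    the source does not state the value or criterion actually used.
    """
    sections = {title: [] for title in toc_titles}
    current = None
    for line in lines:
        best_title, best_ratio = None, None
        for title in toc_titles:
            a, b = normalize(line), normalize(title)
            ratio = levenshtein(a, b) / max(len(a), len(b), 1)
            if best_ratio is None or ratio < best_ratio:
                best_title, best_ratio = title, ratio
        if best_ratio is not None and best_ratio <= max_ratio:
            current = best_title            # line matches a heading
        elif current is not None:
            sections[current].append(line)  # body text of current section
    return sections
```

A fuzzy match rather than an exact one is useful here because text extracted from a PDF may differ slightly from the table-of-contents entry (for example, a misread character or changed spacing), and a small edit distance tolerates such differences.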

Funder

Seoul National University

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computer Graphics and Computer-Aided Design, Human-Computer Interaction, Engineering (miscellaneous), Modeling and Simulation, Computational Mechanics


