Abstract
Many studies over the past decades have addressed the understanding of tables in documents, but most of them were not designed specifically for tables in engineering specification documents. Tables in engineering specifications have distinctive characteristics, such as widely varying table structures built from a restricted set of terms. A framework is developed to address the issues in understanding tables in engineering specification documents. The framework consists of three steps: (1) identifying minimal tables, (2) classifying cells, and (3) extending a domain knowledge map. A modified XY-tree algorithm was developed to find minimal tables, and a neural network algorithm was adopted to classify cells as labels or data. Then, domain-specific rules were developed to discover concepts and relationships from the terms in the classified cells. A domain ontology is assumed to be given, and it is extended with the new concepts and relationships extracted from tables. We illustrate how each step performs on examples of engineering tables. The proposed framework could be used to search product specifications and to discover hidden knowledge in tables of engineering specification documents.
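The abstract does not detail how the XY-tree algorithm was modified. As a rough illustration of the unmodified technique it builds on, the sketch below applies a plain recursive XY-cut to a table given as a grid of cell strings: it alternates between horizontal and vertical cuts, splitting wherever a row or column is entirely blank, and returns the leaf regions as candidate minimal tables. The grid representation, the half-open `(top, left, bottom, right)` boxes, the names `xy_cut` and `find_minimal_tables`, and the rule that any fully blank row or column is a cut point are all assumptions for illustration, not the authors' method.

```python
from typing import List, Optional, Tuple

Grid = List[List[str]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), half-open bounds


def _runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return maximal [start, end) runs where the projection profile is non-zero."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def xy_cut(grid: Grid, box: Box, horizontal: bool = True, tried_other: bool = False) -> List[Box]:
    """Plain recursive XY-cut (assumption: blank rows/columns are the only cut points)."""
    top, left, bottom, right = box
    # Projection profile: count of non-blank cells per row (horizontal) or per column (vertical).
    if horizontal:
        profile = [sum(1 for c in range(left, right) if grid[r][c].strip()) for r in range(top, bottom)]
    else:
        profile = [sum(1 for r in range(top, bottom) if grid[r][c].strip()) for c in range(left, right)]
    runs = _runs(profile)
    if not runs:
        return []  # the region is entirely blank

    def sub_box(s: int, e: int) -> Box:
        return (top + s, left, top + e, right) if horizontal else (top, left + s, bottom, left + e)

    if len(runs) == 1:
        trimmed = sub_box(*runs[0])  # no gap in this direction; trim blank margins
        if tried_other:
            return [trimmed]  # no gap in either direction: this is a leaf
        return xy_cut(grid, trimmed, not horizontal, True)
    leaves: List[Box] = []
    for s, e in runs:  # split at each gap and recurse in the other direction
        leaves.extend(xy_cut(grid, sub_box(s, e), not horizontal, False))
    return leaves


def find_minimal_tables(grid: Grid) -> List[Box]:
    """Hypothetical entry point: leaf regions of the XY-tree over the whole grid."""
    if not grid or not grid[0]:
        return []
    return xy_cut(grid, (0, 0, len(grid), len(grid[0])))
```

For example, a grid holding two side-by-side blocks separated by a blank column, followed by a blank row and a note cell, yields three leaves: `[(0, 0, 2, 2), (0, 3, 2, 5), (3, 0, 4, 1)]`. A real system would likely need extra criteria (for instance, ignoring narrow gaps or treating ruling lines as separators), which is presumably where a modified algorithm would differ.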
Funder
Ministry of Trade, Industry and Energy
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science