Automatic Text Classification of PDF Documents using NLP Techniques

Published: 2022-07 Issue: 1 Volume: 32 Page: 1320-1331
ISSN: 2334-5837
Container-title: INCOSE International Symposium
Language: en
Short-container-title: INCOSE International Symp

Author:

Abdoun Nabil¹, Chami Mohammad¹

Affiliation:

1. SysDICE GmbH Franz‐Volhard‐Str. 5 68167 Mannheim Germany

Abstract

One of the regular activities engineers perform during the design and development of technical systems is determining which sentences in a PDF specification document represent a requirement, functional architecture, design solution, variability, or other type of systems engineering (SE) data. Extracting such data from these PDF documents and transferring it into system model elements is still performed manually, requires high effort, and is error-prone. Automatic extraction and classification of such SE data therefore has great potential, but it remains relatively scarce and is a challenging task for engineers working with large PDF specification documents. One solution is to follow a suitable writing formulation, which provides an immediate and easy way to classify and analyze the PDF documents. However, such formulations are not always followed strictly. As part of our work towards adopting Artificial Intelligence (AI) for Model‐Based Systems Engineering (MBSE), we have been researching data extraction and data classification from PDF files in order to transfer the data to system model elements. In this paper, we present the early status of an AI-based solution that uses Natural Language Processing (NLP) techniques to label the SE data in PDF files, extract it, and classify it into predefined classes.

Publisher

Wiley

Subject

Automotive Engineering

Reference: 27 articles.

1. Automated demarcation of requirements in textual specifications: a machine learning-based approach

2. Bernardo, J. M., & Smith, A. F. (2009). Bayesian theory (Vol. 405). John Wiley & Sons.

3. Bloechle, J.-L., Rigamonti, M., Hadjar, K., Lalanne, D., & Ingold, R. (2006). XCDF: A canonical and structured document format. International Workshop on Document Analysis Systems, 141–152.

4. Brijain, M., Patel, R., Kushik, M., & Rana, K. (2014). A Survey on Decision Tree Algorithm For Classification.

Cited by 2 articles.

1. Integrating AI with MBSE for Data Extraction from Medical Standards;INCOSE International Symposium;2024-07

2. Advancements in Text Classification, A Comprehensive Review;2023 IEEE 11th Region 10 Humanitarian Technology Conference (R10-HTC);2023-10-16