Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus

Published: 2022-07-28 Volume: 44 Page: e60486
ISSN:1806-2563
Container-title:Acta Scientiarum. Technology
Short-container-title:Acta Sci Technol

Author:

Chen Liang-Ching, Chang Kuei-Hu, Yang Shu-Ching

Abstract

In modern information and communication technology (ICT), efficient and accurate corpus-based approaches to processing natural language data (NLD) are critical. Traditional corpus-based approaches to processing a corpus (i.e., the collected NLD) have mainly focused on quantifying and ranking words to help humans extract keywords. However, these approaches cannot identify the meanings behind the words, so they cannot properly extract terminology or the information associated with it. To address this issue, this paper proposes an integrated linguistic analysis approach that combines two corpus-based approaches with a rule-based natural language processing (NLP) approach. The integrated approach extracts and identifies terminology and creates a text database, then uses the terms as channels to retrieve deeper, domain-oriented core information from the target corpus. The military domain is an uncommon research field whose data are often classified as confidential, so little research has focused on it. Nevertheless, military information is vital to national security and should not be ignored. Hence, to verify the proposed approach for extracting terminology and its associated information, the researchers adopt US Army Field Manual (FM) 8-10-6 as the target corpus and empirical case. Compared with AntConc 3.5.8 and Tongpoon-Patanasorn’s hybrid approach, the results indicate that only the proposed approach handles all three tasks: terminology identification, text database creation, and domain knowledge extraction.

Publisher

Universidade Estadual de Maringá

Subject

General Earth and Planetary Sciences, General Physics and Astronomy, General Engineering, General Mathematics, General Chemistry, General Computer Science

Cited by 2 articles.

1. An entropy-based corpus method for improving keyword extraction: An example of sustainability corpus; Engineering Applications of Artificial Intelligence; 2024-07

2. Artificial Intelligence and Information Processing: A Systematic Literature Review; Mathematics; 2023-05-23