A Novel Statistic-Based Corpus Machine Processing Approach to Refine a Big Textual Data: An ESP Case of COVID-19 News Reports

Published:2020-08-09 Issue:16 Volume:10 Page:5505
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Chen Liang-Ching, Chang Kuei-Hu, Chung Hsiang-Yu

Abstract

With the development of modern information and communication technologies (ICTs), Industry 4.0 has launched big data analysis, natural language processing (NLP), and artificial intelligence (AI). Corpus analysis is also a part of big data analysis. In many cases where statistic-based corpus techniques are adopted to analyze English for specific purposes (ESP), researchers extract critical information by retrieving domain-oriented lexical units. However, even if corpus software embraces algorithms such as log-likelihood tests, log ratios, and BIC scores, the machine still cannot understand linguistic meanings. In many ESP cases, function words reduce the efficiency of corpus analysis, yet many studies still eliminate them manually. Manual annotation is inefficient and time-consuming, and can easily cause information distortion. To enhance the efficiency of big textual data analysis, this paper proposes a novel statistic-based corpus machine processing approach to refine big textual data. It uses COVID-19 news reports as a simulation example of big textual data to verify the efficacy of the machine optimizing process. The refined data show that the proposed approach can rapidly remove function words and meaningless words by machine processing and provide decision-makers with domain-specific corpus data for further purposes.

Funder

Ministry of Science and Technology, Taiwan

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/10/16/5505/pdf

References: 56 articles.

1. An Innovative Industry 4.0 Cloud Data Transfer Method for an Automated Waste Collection System

2. Looking at energy through the lens of Industry 4.0: A systematic literature review of concerns and challenges

3. Sustainability accounting and reporting in the industry 4.0

4. Identifying Data Dependencies as First Step to Obtain a Proactive Historian: Test Scenario in the Water Industry 4.0

5. Study on Reverse Logistics Focused on Developing the Collection Signal Algorithm Based on the Sensor Data and the Concept of Industry 4.0

Cited by 18 articles.

1. An entropy-based corpus method for improving keyword extraction: An example of sustainability corpus;Engineering Applications of Artificial Intelligence;2024-07

2. A machine-based corpus optimization method for extracting domain-oriented technical words: an example of COVID-19 corpus data;Journal of Intelligent & Fuzzy Systems;2024-04-18

3. A novel frequency-range analysis (FRA) method for determining critical words among English high-stakes tests;Journal of Intelligent & Fuzzy Systems;2023-12-02

4. The words that make fake stories go viral: A corpus-based approach to analyzing Russian Covid-19 disinformation;Russian Journal of Linguistics;2023-09-30

5. An Extended AHP-Based Corpus Assessment Approach for Handling Keyword Ranking of NLP: An Example of COVID-19 Corpus Data;Axioms;2023-07-28