Text mining techniques for identifying failure modes

Published:2023-02-06 Issue:3 Volume:29 Page:666-682
ISSN:1355-2511
Container-title:Journal of Quality in Maintenance Engineering
language:en
Short-container-title:JQME

Author:

Malan Francina, Jooste Johannes Lodewyk

Abstract

Purpose: The purpose of this paper is to compare the effectiveness of the various text mining techniques that can be used to classify maintenance work-order records into their respective failure modes, focussing on the choice of algorithm and preprocessing transforms. Three algorithms are evaluated, namely Bernoulli Naïve Bayes, multinomial Naïve Bayes and support vector machines.

Design/methodology/approach: The paper has both a theoretical and an experimental component. In the literature review, the various algorithms and preprocessing techniques used in text classification are considered from three perspectives: the domain-specific maintenance literature, the broader short-form literature and the general text classification literature. The experimental component consists of a 5 × 2 nested cross-validation with an inner optimisation loop performed using a randomised search procedure.

Findings: From the literature review, the aspects most affected by short document length are identified as the feature representation scheme, higher-order n-grams, document length normalisation, stemming, stop-word removal and algorithm selection. However, from the experimental analysis, the selection of preprocessing transforms seemed more dependent on the particular algorithm than on short document length. Multinomial Naïve Bayes performs marginally better than the other algorithms, but overall, the performances of the optimised models are comparable.

Originality/value: This work highlights the importance of model optimisation, including the selection of preprocessing transforms. Not only did the optimisation improve the performance of all the algorithms substantially, but it also affected model comparisons, with multinomial Naïve Bayes going from the worst to the best performing algorithm.
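The experimental design described in the abstract can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the toy work orders, the search space, the accuracy metric and the helper names `make_search` and `nested_cv_scores` are all assumptions, and "5 × 2" is read here as five repetitions of a two-fold outer loop (other readings are possible). Stemming is left out because scikit-learn has no built-in stemmer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import (
    RandomizedSearchCV,
    RepeatedStratifiedKFold,
    StratifiedKFold,
    cross_val_score,
)
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy work orders (NOT the paper's data): 8 per failure mode.
TEXTS = [
    "pump seal leaking oil", "oil leak at gearbox seal",
    "replace leaking valve gasket", "hydraulic hose leak found",
    "water leaking from pump flange", "seal leak on hydraulic cylinder",
    "gasket leaking coolant", "pipe joint leaking oil",
    "motor bearing noisy replace", "conveyor bearing seized",
    "bearing overheating on fan", "replace worn bearing on pump",
    "gearbox bearing vibration high", "fan bearing failed noisy",
    "worn bearing on conveyor roller", "bearing vibration on motor",
    "motor tripped on overload", "breaker tripped replace fuse",
    "cable fault on conveyor motor", "blown fuse in control panel",
    "faulty sensor wiring", "control panel short circuit",
    "starter contactor burnt out", "wiring fault tripped breaker",
]
LABELS = ["leak"] * 8 + ["bearing"] * 8 + ["electrical"] * 8


def make_search(clf, clf_params, seed):
    """Inner loop: randomised search over preprocessing and model settings."""
    pipe = Pipeline([("vec", TfidfVectorizer()), ("clf", clf)])
    space = {
        # Assumed search space covering the aspects named in the abstract.
        "vec__binary": [False, True],            # feature representation
        "vec__use_idf": [False, True],           # feature representation
        "vec__ngram_range": [(1, 1), (1, 2)],    # higher-order n-grams
        "vec__norm": [None, "l2"],               # document length normalisation
        "vec__stop_words": [None, "english"],    # stop-word removal
    }
    space.update(clf_params)
    inner = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    return RandomizedSearchCV(pipe, space, n_iter=5, cv=inner,
                              scoring="accuracy", random_state=seed)


def nested_cv_scores(texts, labels, seed=0):
    """Outer loop: 5 repetitions of 2-fold CV around each tuned model."""
    outer = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=seed)
    models = {
        "BernoulliNB": (BernoulliNB(), {"clf__alpha": [0.01, 0.1, 1.0]}),
        "MultinomialNB": (MultinomialNB(), {"clf__alpha": [0.01, 0.1, 1.0]}),
        "LinearSVC": (LinearSVC(random_state=seed), {"clf__C": [0.1, 1.0, 10.0]}),
    }
    return {
        name: cross_val_score(make_search(clf, params, seed), texts, labels,
                              cv=outer, scoring="accuracy")
        for name, (clf, params) in models.items()
    }


results = nested_cv_scores(TEXTS, LABELS)
for name, scores in results.items():
    print(f"{name}: mean accuracy {scores.mean():.2f} over {len(scores)} outer folds")
```

Because the search object is passed to `cross_val_score` as the estimator, the hyper-parameters are re-tuned inside every outer training split, so the outer scores are not biased by the tuning itself.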

Publisher

Emerald

Subject

Industrial and Manufacturing Engineering; Strategy and Management; Safety, Risk, Reliability and Quality

Reference: 42 articles.

1. A review of machine learning algorithms for text-documents classification;Journal of Advances in Information Technology,2010

2. Random search for hyper-parameter optimization;Journal of Machine Learning Research,2012

3. Classifying sentiment in microblogs: is brevity an advantage?,2010

Cited by 2 articles.

1. Research on the construction of human resources job competency model based on text mining;Third International Conference on Advanced Manufacturing Technology and Electronic Information (AMTEI 2023);2024-04-01

2. Multinomial Naïve Bayes Classifier for Sentiment Analysis of Internet Movie Database;Vietnam Journal of Computer Science;2023-08-09