Automatic Classification of National Health Service Feedback

Published: 2022-03-18
Volume: 10, Issue: 6, Page: 983
ISSN: 2227-7390
Journal: Mathematics
Language: en

Author:

Haynes Christopher, Palomino Marco A., Stuart Liz, Viira David, Hannon Frances, Crossingham Gemma, Tantam Kate

Abstract

Text datasets come in an abundance of shapes, sizes and styles. However, determining what factors limit classification accuracy remains a difficult task which is still the subject of intensive research. Using a challenging UK National Health Service (NHS) dataset, which contains many characteristics known to increase the complexity of classification, we propose an innovative classification pipeline. This pipeline switches between different text pre-processing, scoring and classification techniques during execution. Using this flexible pipeline, a high level of accuracy has been achieved in the classification of a range of datasets, attaining a micro-averaged F1 score of 93.30% on the Reuters-21578 “ApteMod” corpus. An evaluation of this flexible pipeline was carried out using a variety of complex datasets compared against an unsupervised clustering approach. The paper describes how classification accuracy is impacted by an unbalanced category distribution, the rare use of generic terms and the subjective nature of manual human classification.
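The abstract reports a micro-averaged F1 score but does not spell out how it is computed. The sketch below assumes the standard definition, in which true positives, false positives and false negatives are pooled across all categories before precision and recall are calculated. The function name `micro_f1` and the toy labels are illustrative only and are not taken from the paper.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], predicted_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label data (standard definition, assumed here).

    Each document is represented by a set of category labels. Counts are
    pooled over all documents and categories before computing the score.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not true
        fn += len(truth - pred)   # true labels that were missed
    if tp == 0:
        return 0.0                # no correct predictions at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example: four documents, some with more than one category.
truth = [{"earn"}, {"acq"}, {"grain", "wheat"}, {"earn"}]
pred = [{"earn"}, {"earn"}, {"grain"}, {"earn"}]
print(f"micro-F1 = {micro_f1(truth, pred):.4f}")
```

Because counts are pooled, frequent categories dominate the micro-averaged score, which is worth keeping in mind given the unbalanced category distribution the abstract mentions.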

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/10/6/983/pdf

Cited by 9 articles.

1. Explainable text-based features in predictive models of crowdfunding campaigns;Annals of Operations Research;2024-01-12

2. Automatic Product Classification Using Supervised Machine Learning Algorithms in Price Statistics;Mathematics;2023-03-24

3. Visualizing the recovery of patients in Critical Care Units;Information Visualization;2023-03-21

4. Statistical Depth for Text Data: An Application to the Classification of Healthcare Data;Mathematics;2023-01-02

5. Automatic Classification of Tweets Identifying Mental Health Conditions in Central American Population in a Pandemic;Communications in Computer and Information Science;2023