Text clustering based on pre-trained models and autoencoders

Published: 2024-01-05 Volume: 17
ISSN:1662-5188
Container-title:Frontiers in Computational Neuroscience
Short-container-title:Front. Comput. Neurosci.

Author:

Xu Qiang,Gu Hao,Ji ShengWei

Abstract

Text clustering is the task of grouping text data based on similarity, and it holds particular importance in the medical field. In healthcare, medical data clustering is a highly active and effective research area. It not only provides strong support for making correct medical decisions from medical datasets but also aids in patient record management and medical information retrieval. With the development of the healthcare industry, a large amount of medical data is being generated, and traditional medical data clustering faces significant challenges. Many existing text clustering algorithms are primarily based on the bag-of-words model, which suffers from high dimensionality, sparsity, and neglect of word positions and context. Pre-trained models are a deep learning-based approach that treats text as a sequence, which lets them capture word positions and context information accurately. Moreover, compared to traditional K-means and fuzzy C-means clustering models, deep learning-based clustering algorithms are better at handling high-dimensional, complex, and nonlinear data. In particular, clustering algorithms based on autoencoders can learn data representations and clustering information jointly, effectively reducing noise interference and errors during the clustering process. This paper combines pre-trained language models with deep embedding clustering models. Experimental results demonstrate that our model performs exceptionally well on four public datasets, outperforming most existing text clustering algorithms, and can be applied to medical data clustering.
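As a rough illustration of the clustering step the abstract refers to, the sketch below follows the commonly used deep embedding clustering formulation: a Student's t kernel gives soft cluster assignments, a sharpened target distribution is derived from them, and the KL divergence between the two serves as the clustering loss. It is not taken from the paper. The array `z` is a hypothetical stand-in for text embeddings that a pre-trained language model and an autoencoder's encoder would produce, and the function names, `alpha`, and the toy data are assumptions made only for this example.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Soft cluster assignments q[i, j] using a Student's t kernel."""
    # Squared Euclidean distance between every point and every center: shape (n, k).
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize each row so it is a probability distribution over clusters.
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square q, divide by soft cluster frequency, renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q), summed over all points and clusters."""
    return float(np.sum(p * np.log(p / q)))

# Toy stand-in for encoder outputs: two well-separated groups in 2-D (assumed data).
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(3.0, 0.1, (5, 2))])
centers = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical initial centers

q = soft_assign(z, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
labels = q.argmax(axis=1)  # hard labels read off the soft assignments
```

In a full pipeline this loss would be minimized by gradient descent over both the encoder weights and the cluster centers, with the centers typically initialized by K-means on the embeddings; those parts are omitted here.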

Publisher

Frontiers Media SA

Cited by 4 articles.

1. Automatic Extraction and Cluster Analysis of Natural Disaster Metadata Based on the Unified Metadata Framework;ISPRS International Journal of Geo-Information;2024-06-14

2. Masked self-supervised pre-training model for EEG-based emotion recognition;Computational Intelligence;2024-06

3. Neural Network Meaningful Learning Theory and its Application for Deep Text Clustering;IEEE Access;2024

4. CafeLLM: Context-Aware Fine-Grained Semantic Clustering Using Large Language Models;Communications in Computer and Information Science;2024