Exploration of Scientific Documents through Unsupervised Learning-Based Segmentation Techniques

Published: 2024-04-01 Volume: 3 Page: 68
ISSN: 3008-8127
Container-title: Seminars in Medical Writing and Education

Author:

Cherradi Mohamed, El Haddadi Anass

Abstract

Navigating the extensive landscape of scientific literature presents a significant challenge, prompting the development of innovative methodologies for efficient exploration. Our study introduces an approach for unsupervised segmentation, aimed at revealing thematic trends within articles and enhancing the accessibility of scientific knowledge. Leveraging three prominent clustering algorithms (K-Means, Hierarchical Agglomerative Clustering, and DBSCAN), we demonstrate their proficiency in generating meaningful clusters, validated through assessment metrics including the Silhouette Score, the Calinski-Harabasz Index, and the Davies-Bouldin Index. Methodologically, comprehensive web scraping of scientific databases, coupled with thorough data cleaning and preprocessing, forms the foundation of our approach. The efficacy of our methodology in accurately identifying scientific domains and uncovering interdisciplinary connections underscores its potential to improve the exploration of scientific publications. Future work will explore alternative unsupervised algorithms and extend the methodology to diverse data sources, fostering continuous innovation in scientific knowledge organization.
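The abstract names the pipeline only at a high level, so the following is a minimal, illustrative sketch rather than the authors' implementation. It vectorises a few toy documents with TF-IDF, clusters them with a small K-Means (Lloyd's algorithm), and scores the result with the Silhouette Score. The toy documents, the helper names (`tfidf`, `kmeans`, `silhouette`), the smoothed IDF formula, and the fixed initial centroids are all assumptions made for the example; in practice a library such as scikit-learn would normally be used for each step.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return L2-normalised TF-IDF vectors for a list of tokenised documents."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF (an assumption; other weighting schemes are possible).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        v = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, init_idx, iters=20):
    """Lloyd's algorithm; initial centroids are points[i] for i in init_idx."""
    centroids = [points[i][:] for i in init_idx]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient (needs >= 2 clusters; singletons score 0)."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to own cluster
        b = min(                 # mean distance to nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy corpus: two biomedical and two machine-learning snippets.
docs = [
    "gene expression cancer cells".split(),
    "cancer cells tumor gene".split(),
    "neural network training loss".split(),
    "training neural network gradient".split(),
]
X = tfidf(docs)
labels = kmeans(X, init_idx=[0, 2])  # deterministic start, for reproducibility
score = silhouette(X, labels)
print(labels, round(score, 3))
```

A score near 1 indicates compact, well-separated clusters, while a score near 0 indicates overlapping ones. The same scoring function could be applied to labels from any other clustering algorithm, which is how a single metric can compare several methods.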

Publisher

Salud, Ciencia y Tecnologia

References: 11 articles.

1. Afzali, M., & Kumar, S. (2019). Text Document Clustering: Issues and Challenges. International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), 263-268. https://doi.org/10.1109/COMITCon.2019.8862247

2. Cozzolino, I., & Ferraro, M. (2022). Document clustering. Wiley Interdisciplinary Reviews: Computational Statistics, 14. https://doi.org/10.1002/wics.1588

3. Mishra, S., Saini, N., Saha, S., & Bhattacharyya, P. (2022). Scientific document summarization in multi-objective clustering framework. Applied Intelligence, 52, 1-24. https://doi.org/10.1007/s10489-021-02376-5

4. Jalal, A., & Ali, B. (2021). Text documents clustering using data mining techniques. International Journal of Electrical and Computer Engineering, 11, 664-670. https://doi.org/10.11591/ijece.v11i1.pp664-670

5. Kim, S.-W., & Gil, J.-M. (2019). Research paper classification systems based on TF-IDF and LDA schemes. Human-centric Computing and Information Sciences, 9. https://doi.org/10.1186/s13673-019-0192-7

Cited by 2 articles.

1. Cytoprotection of Cecropia obtusifolia Bertol (Cecropiaceae) extract on the normal adherent cell line of human fibroblasts Hs68;Salud, Ciencia y Tecnología - Serie de Conferencias;2024-05-09

2. The Business Paradox: Exploring the interaction between the Business Clock and the Sustainable Development Goals through an ethical, sustainable and well-being prism;Salud, Ciencia y Tecnología - Serie de Conferencias;2024-05-07