Classification of Full Text Biomedical Documents: Sections Importance Assessment

Published:2021-03-17 Issue:6 Volume:11 Page:2674
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Oliveira Gonçalves Carlos Adriano, Camacho Rui, Gonçalves Célia Talma, Seara Vieira Adrián, Borrajo Diz Lourdes, Lorenzo Iglesias Eva

Abstract

The exponential growth of documents on the web makes it very hard for researchers to keep track of the relevant work being done within the scientific community. The task of efficiently retrieving information has therefore become an important research topic. The objective of this study is to test how the efficiency of text classification changes when different weights are assigned in advance to the sections that compose the documents. The proposal takes into account the place (section) where terms are located in the document, and each section has a weight that can be modified depending on the corpus. To carry out the study, an extended version of the OHSUMED corpus with full documents has been created. Using WEKA, we compared the use of abstracts only with that of full texts, and evaluated combinations of section weights to assess each section's significance in the scientific article classification process, using SMO (Sequential Minimal Optimization), WEKA's implementation of the Support Vector Machine (SVM) algorithm. The experimental results show that the proposed combinations of preprocessing techniques and feature selection achieve promising results for the task of full text scientific document classification. We also have evidence to conclude that datasets enriched with text from certain sections achieve better results than using only titles and abstracts.
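
The section-weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which runs in WEKA with the SMO classifier; it only shows one plausible way a term's count could be scaled by the weight of the section it appears in. The section names, the weight values, and the helper names `tokenize` and `weighted_term_counts` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical weights: the abstract only states that each section has a
# weight that can be tuned per corpus; these values are illustrative.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "methods": 1.0}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as terms."""
    return re.findall(r"[a-z]+", text.lower())


def weighted_term_counts(sections, weights, default=1.0):
    """Sum term counts over all sections, scaling each section's counts
    by that section's weight (sections without a weight use `default`)."""
    totals = Counter()
    for name, text in sections.items():
        weight = weights.get(name, default)
        for term, count in Counter(tokenize(text)).items():
            totals[term] += weight * count
    return totals


# A toy document split into sections (made-up text, not from OHSUMED).
doc = {
    "title": "Gene expression in tumors",
    "abstract": "We study gene expression.",
    "methods": "Expression was measured.",
}
features = weighted_term_counts(doc, SECTION_WEIGHTS)
```

Here a term that occurs in the title contributes more to the feature vector than the same term in the methods section. The resulting weighted counts could then be passed to any classifier; comparing several weight settings is the kind of experiment the abstract describes.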

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/11/6/2674/pdf

Cited by 5 articles.

1. Text mining for contexts and relationships in cancer genomics literature;Bioinformatics;2024-01-01

2. To Enhance Full-Text Biomedical Document Classification Through Semantic Enrichment;Lecture Notes in Computer Science;2023

3. An Incremental Approach to Classify Healthcare URLs Using a Novel ‘Web Document Classification Model’;ICT with Intelligent Applications;2022-10-01

4. A Novel Multi-View Ensemble Learning Architecture to Improve the Structured Text Classification;Information;2022-06-01

5. Last Teen Pixels for Arabic Font Size and Style Recognition;International Journal of Online and Biomedical Engineering (iJOE);2021-11-29