Affiliation:
1. SRM Institute of Science and Technology, India
Abstract
Labeled corpora for biomedical text classification are scarce, and annotation by qualified experts is expensive, yet the classifiers that support clinical decision-making need large amounts of labeled training text. Active learning (AL) is commonly used to cut the number of labeled documents needed to reach a target level of performance, and with it the labeling cost. Articles can be categorized in two ways: at the article level or at the journal level. In this chapter, the authors present a hybrid strategy in which classifiers are trained on article metadata (title, abstract, and keywords) labeled with the journal-level FoR (Fields of Research) classification, using natural language processing (NLP) embedding techniques. These classifiers are then applied at the article level to classify biomedical publications using PubMed metadata. Specifically, the authors trained BERT classifiers on FoR codes and used them to classify publications from their available metadata.