Using Transformer-based Language Models to Identify Publications from Clinical Trials that Use Nested Designs (Preprint)

Author:

Elaheh Elaheh, Murray David

Abstract

BACKGROUND

For the public health community, monitoring recently published articles is crucial for staying informed about the latest research developments. However, identifying publications that use specific research designs within the extensive public health literature remains a challenge with currently available methods. Search queries often have low sensitivity for this task and miss many qualifying publications. Although document classification techniques based on classical machine learning and deep learning models have been widely used to categorize biomedical documents, the use of state-of-the-art language models for this purpose has yet to be explored. Existing language models pre-trained on biomedical data perform suboptimally on specialized classification tasks such as identifying publications from clinical trials that use nested designs.

OBJECTIVE

Our objective is to fine-tune a pre-trained language model to accurately identify publications in the biomedical literature from clinical trials that use a Group- or Cluster-Randomized Trial (GRT), Individually Randomized Group-Treatment Trial (IRGT), or Stepped Wedge Group- or Cluster-Randomized Trial (SWGRT) design.

METHODS

We fine-tuned the BiomedBERT language model using a dataset of biomedical literature from the Office of Disease Prevention at the National Institutes of Health. The model was trained to classify publications into three categories of clinical trials that use nested designs. We evaluated model performance on unseen data, where it showed high sensitivity and specificity for each class.
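The abstract does not give implementation details, so the following is only a minimal sketch of how a BiomedBERT sequence classifier might be set up with the Hugging Face `transformers` library. Everything specific here is an assumption rather than the authors' method: the Hub identifier in `MODEL_NAME`, the four-class label set (taken from the classes reported under RESULTS), the choice of title plus abstract as input, and the helper names `load_classifier` and `encode`.

```python
from typing import Dict, List

# Assumption: label set mirrors the four classes reported under RESULTS.
LABELS: List[str] = ["Non-randomized", "GRT", "IRGT", "SWGRT"]
LABEL2ID: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}
ID2LABEL: Dict[int, str] = {i: label for label, i in LABEL2ID.items()}

# Assumption: the Hugging Face Hub identifier for the BiomedBERT checkpoint.
MODEL_NAME = "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext"


def load_classifier(model_name: str = MODEL_NAME):
    """Load a tokenizer and a BERT encoder with a fresh classification head.

    transformers is imported lazily so the label mapping above can be used
    without the library installed. Calling this downloads model weights.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


def encode(tokenizer, title: str, abstract: str, max_length: int = 512):
    """Tokenize a title/abstract pair (hypothetical input format).

    Returns PyTorch tensors, truncated to the encoder's input limit.
    """
    return tokenizer(
        title,
        abstract,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
```

The returned model can then be fine-tuned with any standard training loop on labeled publications; the training hyperparameters are not stated in the abstract and are deliberately left out here.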

RESULTS

When tested for generalizability on unseen data, our proposed model achieved high sensitivity and specificity, respectively, for each class: non-randomized trials (0.95 and 0.93), GRTs (0.94 and 0.90), IRGTs (0.81 and 0.97), and SWGRTs (0.96 and 0.99).
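For readers less familiar with per-class metrics in a multi-class setting, the sketch below shows one common way to compute them, treating each class as "positive" against all others (one-vs-rest). The abstract does not say how the reported values were calculated, so this is an illustration under that assumption; the function name and the toy labels are hypothetical and are not the study's data.

```python
from typing import Dict, List, Tuple


def one_vs_rest_metrics(
    y_true: List[str], y_pred: List[str], labels: List[str]
) -> Dict[str, Tuple[float, float]]:
    """Return {label: (sensitivity, specificity)} using one-vs-rest counts."""
    metrics: Dict[str, Tuple[float, float]] = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        tn = sum(t != label and p != label for t, p in pairs)  # true negatives
        # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        metrics[label] = (sensitivity, specificity)
    return metrics


# Toy example (made-up labels, for illustration only).
LABELS = ["Non-randomized", "GRT", "IRGT", "SWGRT"]
y_true = ["GRT", "GRT", "IRGT", "SWGRT", "Non-randomized", "GRT"]
y_pred = ["GRT", "IRGT", "IRGT", "SWGRT", "Non-randomized", "GRT"]

for name, (sens, spec) in one_vs_rest_metrics(y_true, y_pred, LABELS).items():
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data one GRT publication is misclassified as IRGT, which lowers GRT sensitivity and IRGT specificity while leaving the other classes unaffected.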

CONCLUSIONS

This model offers a valuable tool for the public health community to directly identify publications from clinical trials that use one of three classes of nested designs.

Publisher

JMIR Publications Inc.
