Abstract
Motivation
In the field of biomedical text mining, the ability to extract relations from literature is crucial for advancing both theoretical research and practical applications. There is a notable shortage of corpora designed to enhance the extraction of multiple types of relations, particularly focusing on proteins and protein-containing entities such as complexes and families, as well as chemicals.
Results
In this work we present RegulaTome, a corpus that overcomes the limitations of several existing biomedical relation extraction (RE) corpora, many of which concentrate on single-type relations at the sentence level. RegulaTome stands out by offering 16,962 relations annotated in over 2,500 documents, making it the most extensive dataset of its kind to date. This corpus is specifically designed to cover a broader spectrum of over 40 relation types beyond those traditionally explored, setting a new benchmark in the complexity and depth of biomedical RE tasks. Our corpus both broadens the scope of detected relations and allows for achieving noteworthy accuracy in RE. A Transformer-based model trained on this corpus has demonstrated a promising F1-score (66.6%) for a task of this complexity, underscoring the effectiveness of our approach in accurately identifying and categorizing a wide array of biological relations. This achievement highlights RegulaTome’s potential to significantly contribute to the development of more sophisticated, efficient, and accurate RE systems to tackle biomedical tasks. Finally, a run of the trained relation extraction system on all PubMed abstracts and PMC Open Access full-text documents resulted in over 18 million relations, extracted from the entire biomedical literature.
Availability
The corpus and all introduced resources are openly accessible via Zenodo (https://zenodo.org/doi/10.5281/zenodo.10808330) and GitHub (https://github.com/farmeh/RegulaTome_extraction).
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.