Abstract
Transformer-based approaches to natural language processing (NLP) tasks, such as BERT and GPT, are gaining popularity due to their high performance. These approaches benefit from pre-training on enormous amounts of data and from their ability to capture the context of words in a sentence. Their use in the information retrieval domain is expected to improve both effectiveness and efficiency. This paper demonstrates the implementation of a BERT-based method, CASBERT, to build a search tool over data compositely annotated with ontologies. The data is a collection of biosimulation models written in the CellML standard and stored in the Physiome Model Repository (PMR). Structurally, a biosimulation model consists of basic entities, constants and variables, which compose higher-level entities such as components, reactions, and the model itself. Finding these entities at their specific level is beneficial for purposes such as variable reuse, experiment setup, and model auditing. Initially, we created embeddings representing the compositely annotated entities at the lowest level, enabling constant and variable search. Then, these low-level entity embeddings were combined vertically and efficiently to create higher-level entity embeddings for searching components, models, images, and simulation setups. Our approach is general, so it can be used to build search tools over other data semantically annotated with ontologies, for example, biosimulation models encoded in the SBML format. Our tool is named the Biosimulation Model Search Engine (BMSE).
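To illustrate the layered-embedding idea described above, the following is a minimal sketch, not CASBERT's actual implementation: it assumes a sentence-transformers BERT-style encoder (the 'all-MiniLM-L6-v2' checkpoint as a stand-in), treats ontology annotations as plain label strings, and uses mean-pooling as the combination step; the helper names entity_embedding, higher_level_embedding, and search are hypothetical.

    # Minimal sketch of two-level entity embeddings for ontology-annotated data.
    # Assumptions (not from the paper): sentence-transformers as the BERT encoder,
    # annotations as label strings, and mean-pooling as the combination method.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def entity_embedding(annotation_labels: list[str]) -> np.ndarray:
        # Embed one compositely annotated entity (e.g., a CellML variable)
        # by encoding each ontology label and averaging the resulting vectors.
        vectors = encoder.encode(annotation_labels)  # shape: (n_labels, dim)
        return vectors.mean(axis=0)

    def higher_level_embedding(child_embeddings: list[np.ndarray]) -> np.ndarray:
        # Combine lower-level entity embeddings (constants, variables) into a
        # higher-level one (component, model) without re-encoding any text.
        return np.mean(child_embeddings, axis=0)

    def search(query: str, index: dict[str, np.ndarray], top_k: int = 5):
        # Rank indexed entities by cosine similarity to the query embedding.
        q = encoder.encode([query])[0]
        def cosine(v):
            return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        return sorted(index.items(), key=lambda kv: cosine(kv[1]), reverse=True)[:top_k]

Under these assumptions, a variable annotated with ontology terms would be indexed via entity_embedding of its label strings, and a component's embedding would then be higher_level_embedding of its variables' vectors, so the whole entity hierarchy can be indexed after a single encoding pass over the leaf annotations.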
Funder
Aotearoa Foundation
National Institutes of Health
Auckland Bioengineering Institute
Subject
General Pharmacology, Toxicology and Pharmaceutics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine
Cited by
2 articles.