Author:
Renner Robinette, Li Shengyu, Huang Yulong, van der Zijp-Tan Ada Chaeli, Tan Shaobo, Li Dongqi, Kasukurthi Mohan Vamsi, Benton Ryan, Borchert Glen M., Huang Jingshan, Jiang Guoqian
Abstract
Background
The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard that provides robust data point definitions. Another standard is the Biomedical Research Integrated Domain Group (BRIDG) model, a domain analysis model that provides a contextual framework for biomedical and clinical research data. Mapping the CDEs to the BRIDG model is important; in particular, it can facilitate mapping the CDEs to other standards. Unfortunately, manual mapping, which is the current method for creating the CDE mappings, is error-prone and time-consuming; this creates a significant barrier for researchers who utilize CDEs.
Methods
In this work, we developed a semi-automated algorithm to map CDEs to likely BRIDG classes. First, we extended and improved our previously developed artificial neural network (ANN) alignment algorithm. We then used a collection of 1284 CDEs with robust mappings to BRIDG classes as the gold standard to train the network and obtain appropriate weights for six CDE attributes. Afterward, we calculated the similarity between a CDE and each BRIDG class. Finally, the algorithm produces a list of candidate BRIDG classes to which the CDE of interest may belong.
Results
For CDEs semantically similar to those used in training, a match rate of over 90% was achieved. For those partially similar, a match rate of 80% was obtained, and for those with drastically different semantics, a match rate of up to 70% was achieved.
Discussion
Our semi-automated mapping process reduces the burden on domain experts. The weights of all six attributes are significant. Experimental results indicate that the availability of training data is more important than the semantic similarity of the testing data to the training data. We address the overfitting problem by selecting CDEs randomly and adjusting the ratio of training to verification samples.
Conclusions
Experimental results on real-world use cases demonstrate the effectiveness and efficiency of our proposed methodology in mapping CDEs to BRIDG classes, both for CDEs seen before and for new, unseen CDEs. In addition, it reduces the mapping burden and improves the mapping quality.
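The scoring step described in the Methods (a weighted similarity between a CDE and each BRIDG class, followed by a ranked candidate list) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' ANN-based implementation: the attribute names, the weights, the class names and descriptions, and the use of token-level Jaccard similarity are all hypothetical stand-ins, since the abstract does not specify them.

```python
from typing import Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def weighted_similarity(cde: Dict[str, str], class_text: str,
                        weights: Dict[str, float]) -> float:
    """Combine per-attribute similarities using attribute weights, normalized to [0, 1]."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(w * jaccard(cde.get(attr, ""), class_text)
                for attr, w in weights.items())
    return score / total


def rank_candidates(cde: Dict[str, str], classes: Dict[str, str],
                    weights: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k classes ordered by descending similarity to the CDE."""
    scored = [(name, weighted_similarity(cde, text, weights))
              for name, text in classes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical data: two attributes instead of six, invented weights and classes.
weights = {"name": 0.6, "definition": 0.4}
cde = {"name": "patient birth date", "definition": "date the patient was born"}
classes = {
    "SubjectClass": "a patient or subject with a birth date",
    "SpecimenClass": "a sample collected for laboratory analysis",
}

ranked = rank_candidates(cde, classes, weights)
for class_name, score in ranked:
    print(f"{class_name}: {score:.3f}")
```

In the approach the abstract describes, the weights would be learned from the gold-standard mappings rather than set by hand, and a domain expert would review the returned candidate list, which is what makes the process semi-automated.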
Publisher
Springer Science and Business Media LLC
Subject
Health Informatics, Health Policy, Computer Science Applications
Cited by
1 article.