Affiliation:
1. Federal University of São Paulo, Brazil; Federal Institute of Education, Science and Technology of São Paulo, Brazil
2. Federal University of São Paulo, Brazil
Abstract
Successful consumer health vocabulary (CHV) models have been built and updated by applying automatic term extraction techniques to online content. However, the relationships between terms have yet to be mapped. This study describes a CHV model for Brazilian Portuguese that is supported by a complex network. The method comprised three stages: (1) collecting and automatically extracting terms from structured data sources on the web, such as Unified Medical Language System (UMLS) vocabularies and DBpedia; (2) constructing a complex network; and (3) selecting terms with the support of clustering techniques. The resulting model, CHV.br, is backed by a complex network structure that connects the controlled vocabulary to the consumer vocabulary and maps semantic relationships as categories, synonyms, and related terms. CHV.br contains 146,956 terms, of which 31,439 are UMLS preferred terms and 83,279 are synonyms. CHV.br is available and is built on the Simple Knowledge Organization System (SKOS) and Resource Description Framework (RDF) standards. The method proved valid for selecting candidate terms by connecting terms from different reliable resources, and it also expanded the number of terms and their semantic relationships. The content and structure of CHV.br could play a vital role in the development of consumer-oriented health applications.
Subject
Library and Information Sciences, Information Systems