BACKGROUND
Big data research in the health field is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means researchers and health professionals often have different phenotype definitions of the same condition. This lack of agreement makes it difficult to compare different study findings and so hinders the field’s ability to do repeatable and reusable research.
OBJECTIVE
To examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, in the development of a data portal for phenotypes (a concept library).
METHODS
This was a qualitative study using interviews and a focus group. One-to-one interviews were conducted with researchers, clinicians, machine learning experts, and senior research managers in health data science (n=6) to explore their specific needs in the development of a concept library. In addition, a focus group with researchers (n=14) working with the SAIL Databank, a national e-health data linkage infrastructure, was held to perform a SWOT (strengths, weaknesses, opportunities, and threats) analysis of the current system for phenotyping and of the proposed concept library. The interviews and the focus group were both transcribed verbatim, and two thematic analyses were performed.
RESULTS
Most of the participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified many requirements that would need to be met in its development. Although all the participants stated that they were aware of some existing concept libraries, the majority expressed negative perceptions of them. The participants mentioned several facilitators that would encourage them to share their work or reuse the work of others, and several barriers that could inhibit them from doing so. They also suggested some developments they would like to see to improve reproducible research output using routine data.
CONCLUSIONS
The study indicated that most interviewees would value a concept library for phenotypes. However, only half of the participants felt they would contribute by providing definitions for the concept library, and they reported many barriers regarding sharing their work on a publicly accessible platform. Analysis of interviews and the focus group revealed that different stakeholders have different requirements, facilitators, barriers, and concerns about a prototype concept library.