Author:
Lesmann Hellen, Lyon Gholson J., Caro Pilar, Abdelrazek Ibrahim M., Moosa Shahida, Pantel Jean Tori, ten Hagen Merle, Rosnev Stanislav, Kamphans Tom, Meiswinkel Wolfgang, Li Jing-Mei, Klinkhammer Hannah, Hustinx Alexander, Javanmardi Behnam, Knaus Alexej, Uwineza Annette, Knopp Cordula, Marchi Elaine, Elbracht Miriam, Mattern Larissa, Jamra Rami Abou, Velmans Clara, Strehlow Vincent, Nabil Amira, Graziano Claudio, Artem Borovikov, Schnabel Franziska, Heuft Lara, Herrmann Vera, Höller Matthias, Alaaeldin Khoshoua, Jezela-Stanek Aleksandra, Mohamed Amal, Lasa-Aranzasti Amaia, Elmakkawy Gehad, Safwat Sylvia, Ebstein Frédéric, Küry Sébastien, Arlt Annabelle, Marbach Felix, Netzer Christian, Kaptain Sophia, Weiland Hannah, Devriendt Koen, Gripp Karen W., Mücke Martin, Verloes Alain, Schaaf Christian P., Nellåker Christoffer, Solomon Benjamin D., Waikel Rebekah, Abdalla Ebtesam, Nöthen Markus M., Krawitz Peter M., Hsieh Tzung-Chien
Abstract
The value of computer-assisted image analysis has been shown in several studies. The performance of artificial intelligence (AI) tools such as GestaltMatcher improves with the size and diversity of the training set, but properly labeled training data is currently the biggest bottleneck in developing next-generation phenotyping (NGP) applications. We therefore developed the GestaltMatcher Database (GMDB), a database for machine-readable medical image data that complies with the FAIR principles and improves the openness and accessibility of scientific findings in medical genetics.
An entry in GMDB consists of a medical image, such as a portrait, X-ray, or fundoscopy image, and machine-readable metadata, such as clinical features encoded as Human Phenotype Ontology (HPO) terms or a disease-causing mutation reported in HGVS format. Initially, data were collected mainly by curators who gathered images from the literature. Now, clinicians and individuals recruited from patient support groups provide their previously unpublished data; for this patient-centered approach, we developed a digital consent form. GMDB is a modern publication medium for case reports that complements preprint servers such as medRxiv. To enable inter-cohort comparisons, we implemented a research feature in GMDB that computes the pairwise syndromic similarity between hand-picked cases.
Through a community-driven effort, we compiled in GMDB an image collection of over 7,533 cases covering 792 disorders. Most of the data were collected from 2,058 publications. In addition, about 1,018 frontal images of 498 previously unpublished cases were obtained. The web interface supports gene- and phenotype-centered queries as well as infinite scrolling through the gallery. The digital consent form has led to increasing adoption of the approach by patients. We used the research app within GMDB to generate syndromic similarity matrices that characterize two novel phenotypes (CSNK2B and PSMC3).
GMDB is the first FAIR database for NGP, in which data are findable, accessible, interoperable, and reusable. It serves as a repository for medical images that cannot be included in medRxiv preprints. GMDB thereby connects clinicians who share an interest in particular phenotypes and improves the performance of AI.
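The abstract describes a GMDB entry as one medical image plus machine-readable metadata (HPO terms, an HGVS-formatted mutation), gene- and phenotype-centered queries, and a research feature that computes pairwise syndromic similarity between hand-picked cases. GMDB's actual schema and similarity computation are not specified here, so the following is only a minimal Python sketch under stated assumptions: the `CaseEntry` fields, the query helpers, the per-image feature vector (for example, one produced by a face encoder), and the choice of cosine similarity are all hypothetical illustrations, not the GMDB implementation.

```python
from dataclasses import dataclass, field
import math


@dataclass
class CaseEntry:
    """Hypothetical GMDB-style entry: one medical image plus machine-readable metadata."""
    case_id: str
    image_path: str   # e.g. a frontal portrait, X-ray, or fundoscopy image
    gene: str         # gene symbol of the disease-causing mutation
    hgvs: str         # the mutation in HGVS notation
    hpo_terms: list[str] = field(default_factory=list)    # clinical features as HPO IDs
    embedding: list[float] = field(default_factory=list)  # ASSUMED image feature vector


def cases_for_gene(cases, gene):
    """Gene-centered query: all cases whose mutation lies in `gene`."""
    return [c for c in cases if c.gene == gene]


def cases_with_feature(cases, hpo_id):
    """Phenotype-centered query: all cases annotated with the HPO term `hpo_id`."""
    return [c for c in cases if hpo_id in c.hpo_terms]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(cases):
    """Pairwise similarity between hand-picked cases (ASSUMED metric: cosine)."""
    return [[cosine_similarity(a.embedding, b.embedding) for b in cases] for a in cases]


# Toy usage with made-up identifiers, variants, and 2-D vectors (not real GMDB data).
# HP:0000316 is the HPO term for hypertelorism.
demo = [
    CaseEntry("case-1", "case1.jpg", "GENE_A", "c.1A>G", ["HP:0000316"], [1.0, 0.0]),
    CaseEntry("case-2", "case2.jpg", "GENE_A", "c.2T>C", ["HP:0000316"], [0.8, 0.6]),
    CaseEntry("case-3", "case3.jpg", "GENE_B", "c.3G>A", [], [0.0, 1.0]),
]
matrix = similarity_matrix(demo)
for row in matrix:
    print([round(v, 2) for v in row])
```

Cosine similarity is used here only because it is a common way to compare embedding vectors; a matrix built this way is symmetric, with ones on the diagonal for nonzero vectors, so similar cases within and across cohorts stand out as high off-diagonal values.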
Publisher
Cold Spring Harbor Laboratory