How Gaussian mixture modelling can help to verify reference intervals from laboratory data with a high proportion of pathological values

Author:

Hoffmann Georg (1), Allmeier Nina (1), Kuti Modupe (2, 3), Holdenrieder Stefan (1), Trulson Inga (1)

Affiliation:

1. German Heart Center Munich, Institute of Laboratory Medicine, Munich, Germany

2. College of Medicine, University of Ibadan, Ibadan, Nigeria

3. Synlab Nigeria Laboratories, Lagos, Nigeria

Abstract

Objectives: Although several indirect methods can be used to verify reference limits, they share a common weakness: they assume a low proportion of pathological values. This paper investigates whether a Gaussian decomposition algorithm can identify the non-pathological fraction even if it is not the main subset of mixed data.

Methods: All investigations are carried out in the R programming environment. The mclust package is used for Gaussian mixture modelling via the expectation maximization (EM) algorithm. For right-skewed distributions, logarithms of the original values are taken to approximate the Gaussian model. We use the Bayesian information criterion (BIC) to evaluate the results. The reflimR and refineR packages serve as comparison procedures.

Results: We generate synthetic data mixtures with known normal distributions to demonstrate the feasibility and reliability of our approach. Applying the algorithm to real data from a Nigerian and a German population produces results that help to interpret reflimR and refineR reference intervals that are obviously too wide. In the first example, the mclust analysis of hemoglobin in Nigerian women supports the medical hypothesis that an anemia rate of more than 50% leads to falsely low reference limits. Our algorithm proposes various scenarios based on the BIC values, one of which suggests reference limits that are close to published data for Nigeria but significantly lower than those established for the Caucasian population. In the second example, the standard statistical analysis of creatine kinase in German patients with predominantly cardiac diseases yields a reference interval that is clearly too wide. With mclust we identify overlapping fractions that explain this false result.

Conclusions: Gaussian mixture modelling does not replace standard methods for reference interval estimation but is a valuable adjunct when these methods produce discrepant or implausible results.
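The decomposition idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses the R package mclust, whereas the block below is a plain-Python EM for a one-dimensional Gaussian mixture, written only to show the steps (log transform, EM fit, BIC comparison, reference interval from one component). The function names `simulate_mixture`, `fit_gmm_1d` and `reference_interval`, as well as all simulation parameters (40% "healthy" values around 13.5, 60% "pathological" values around 10.5), are hypothetical and chosen for illustration; they are not taken from the paper. The BIC here follows the common lower-is-better convention, while mclust reports BIC with the opposite sign.

```python
import math
import random


def simulate_mixture(n=2000, seed=1):
    """Hypothetical hemoglobin-like data: a minority healthy fraction
    mixed with a majority pathological fraction (both log-normal)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        if rng.random() < 0.4:  # assumed healthy fraction
            data.append(math.exp(rng.gauss(math.log(13.5), 0.07)))
        else:                   # assumed pathological fraction
            data.append(math.exp(rng.gauss(math.log(10.5), 0.10)))
    return data


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(values, k, n_iter=150):
    """Fit a k-component 1-D Gaussian mixture with plain EM.
    Returns (weights, means, sds, bic); lower BIC is better here."""
    xs = sorted(values)
    n = len(xs)
    # Initialise the means from k equally sized quantile blocks.
    blocks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    mus = [sum(b) / len(b) for b in blocks]
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    sigmas = [sd] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, s)
                     for w, m, s in zip(weights, mus, sigmas)))
        for x in xs
    )
    n_params = 3 * k - 1  # k means, k sds, k - 1 free weights
    bic = n_params * math.log(n) - 2.0 * loglik
    return weights, mus, sigmas, bic


def reference_interval(mu_log, sigma_log):
    """Central 95% interval of a log-normal component, on the original scale."""
    return (math.exp(mu_log - 1.96 * sigma_log),
            math.exp(mu_log + 1.96 * sigma_log))


if __name__ == "__main__":
    logs = [math.log(v) for v in simulate_mixture()]
    for k in (1, 2, 3):
        w, m, s, bic = fit_gmm_1d(logs, k)
        print(f"k={k}  BIC={bic:.1f}")
        for wj, mj, sj in zip(w, m, s):
            lo, hi = reference_interval(mj, sj)
            print(f"   weight={wj:.2f}  interval={lo:.1f} to {hi:.1f}")
```

Each fitted component yields its own candidate interval, so the output is a set of scenarios rather than a single answer. Deciding which component represents the non-pathological fraction remains a medical judgement, in line with the abstract's point that mixture modelling supplements rather than replaces the standard methods.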

Publisher

Walter de Gruyter GmbH

