Information-Content-Informed Kendall-tau Correlation: Utilizing Missing Values

Published: 2022-02-28

Authors: Robert M Flight, Praneeth S Bhatt, Hunter NB Moseley

Abstract

Almost all correlation measures currently available are unable to handle missing values. Typically, missing values are either removed entirely or imputed, with the imputed values then used to calculate the correlation coefficient. In both cases, the resulting correlation value rests on the assumption that missing data carry no useful information. However, missing values occur in real data sets for a variety of reasons. In omics data sets derived from analytical measurements, the primary reason for missing values is that a specific measurable phenomenon falls below the detection limit of the analytical instrumentation. These values are not missing at random; their "missingness" itself carries information. Therefore, we propose an information-content-informed Kendall-tau (ICI-Kt) correlation coefficient that allows missing values to carry explicit information in the determination of concordant and discordant pairs. Using both simulated and real data sets from RNA-seq experiments, we demonstrate that ICI-Kt allows missing values to be included as interpretable information. Moreover, our implementation of ICI-Kt uses a mergesort-like algorithm that provides O(n log(n)) computational performance. Finally, we show that approximate ICI-Kt correlations can be calculated from smaller feature subsets of large data sets with significant time savings, which has practical computational value when the number of features is very large.

The ICI-Kt correlation calculation is available as an R package and a Python module on GitHub at https://github.com/moseleyBionformaticsLab/ICIKendallTau and https://github.com/moseleyBionformaticsLab/icikt, respectively.

Publisher: Cold Spring Harbor Laboratory

Cited by 2 articles.

1. Scan-Centric, Frequency-Based Method for Characterizing Peaks from Direct Injection Fourier Transform Mass Spectrometry Experiments; Metabolites; 2022-06-02

2. Scan-Centric, Frequency-Based Method for Characterizing Peaks from Direct Injection Fourier transform Mass Spectrometry Experiments; 2022-04-15