Abstract
A test of independence is commonly used to determine differences (or associations) between samples measured on a nominal scale. Fisher's exact test and the chi-square test are two of the most widely applied tests of independence, used in data analysis across areas such as information technology, biostatistics, psychology and the health sciences. In some cases, contingency tables with null entries (also called random zeros) arise, particularly when the sample size is small and the variables analyzed have multiple levels. This is a problem because, if one or more entries in a contingency table are zero or small, tests of independence can produce unreliable results. In this paper, we propose a method to address that issue. The method merges one or more levels of the variables analyzed to create contingency tables with only one degree of freedom, which avoids applying a test of independence to contingency tables with random zeros. The source code of the method (Python) is publicly available. The results obtained with our method give a complete picture of the associations between the variables of a data set. To show the effectiveness of our approach in finding dependencies between variables, we apply it to four publicly available data sets.
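The level-merging idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' published code: the function names (`fisher_exact_two_sided`, `binary_splits`, `collapse`, `scan`), the rule of skipping collapsed tables that still contain a zero, and the example counts are all assumptions made for illustration. The sketch enumerates every way of splitting the levels of each variable into two groups, collapses the table to 2x2 (one degree of freedom), and applies a two-sided Fisher's exact test.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def binary_splits(levels):
    """Yield each split of `levels` into two non-empty groups exactly once."""
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            group = {first, *extra}
            yield group, set(levels) - group


def collapse(table, row_groups, col_groups):
    """Merge levels of a {(row_level, col_level): count} table into 2x2 cells."""
    cells = [[0, 0], [0, 0]]
    for (r, c), count in table.items():
        i = 0 if r in row_groups[0] else 1
        j = 0 if c in col_groups[0] else 1
        cells[i][j] += count
    return cells


def scan(table):
    """Test every 2x2 collapse; return (row_group, col_group, p), smallest p first."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    results = []
    for rg in binary_splits(rows):
        for cg in binary_splits(cols):
            (a, b), (c, d) = collapse(table, rg, cg)
            if 0 in (a, b, c, d):
                continue  # assumption: skip collapses that still contain a zero
            p = fisher_exact_two_sided(a, b, c, d)
            results.append((sorted(rg[0]), sorted(cg[0]), p))
    return sorted(results, key=lambda item: item[2])


# Illustrative counts only (not from the paper): a 3x2 table with one zero cell.
example = {
    ("low", "yes"): 0, ("low", "no"): 6,
    ("mid", "yes"): 2, ("mid", "no"): 4,
    ("high", "yes"): 7, ("high", "no"): 1,
}
for row_group, col_group, p in scan(example):
    print(row_group, col_group, round(p, 4))
```

A variable with k levels yields 2^(k-1) - 1 binary splits, so the number of collapsed tables grows quickly with the number of levels. Because many tests are run on the same data, a multiple-comparison correction would normally be considered before interpreting the individual p-values.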
Funder
Universidad Autónoma del Estado de México
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
2 articles.
1. Dichotomization and Estimation of Interaction through a Boolean Framework;2023 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT);2023-05-15
2. Secure Global Software Development: A Practitioners’ Perspective;Applied Sciences;2023-02-14