Author:
Mallik Saurav,Seth Soumita,Si Amalendu,Bhadra Tapas,Zhao Zhongming
Abstract
Introduction: Association rule mining (ARM) is a powerful tool for exploring the informative relationships among multiple items (genes) in any dataset. The main problem of ARM is that it generates many rules containing different rule-informative values, which becomes a challenge for the user to choose the effective rules. In addition, few works have been performed on the integration of multiple biological datasets and variable cutoff values in ARM.Methods: To solve all these problems, in this article, we developed a novel framework MOOVARM (multi-objective optimized variable cutoff-based association rule mining) for multi-omics profiles.Results: In this regard, we identified the positive ideal solution (PIS), which maximized the profit and minimized the loss, and negative ideal solution (NIS), which minimized the profit and maximized the loss for all gene sets (item sets), belonging to each extracted rule. Thereafter, we computed the distance (d +) from PIS and distance (d −) from NIS for each gene set or product. These two distances played an important role in determining the optimized associations among various pairs of genes in the multi-omics dataset. We then globally estimated the relative closeness to PIS for ranking the gene sets. When the relative closeness score of the rule is greater than or equal to the pre-defined threshold value, the rule can be considered a final resultant rule. Moreover, MOOVARM evaluated the relative score of the rule based on the status of all genes instead of individual genes.Conclusions:MOOVARM produced the final rank of the extracted (multi-objective optimized) rules of correlated genes which had better disease classification than the state-of-the-art algorithms on gene signature identification.
Funder
Cancer Prevention and Research Institute of Texas