Affiliation:
1. Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182, USA
Abstract
Identifying the significant, or dominant, features is important for revealing cause-and-effect relations in many pattern recognition applications, such as medical diagnosis, gene analysis, cyber security, and finance and insurance fraud detection. Sparsely populated, binary-valued samples in highly imbalanced datasets make these features hard to identify. This paper explores an approach based on confusion matrix measurements of the feature values with respect to their potential classification outcomes. The approach computes the Discriminative Significances of the features and ranks them without bias with respect to the imbalance ratios of the datasets. Experimental results on real-world and experimental datasets show that the approach evaluated the features consistently and identified the most significant ones in the sparse, binary-valued samples of the class-imbalanced datasets.
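The abstract does not give the formula for Discriminative Significance, so the following is only a minimal sketch of the general idea: each binary feature is compared against the class label through a 2x2 confusion matrix, and features are ranked by a score that does not depend on the class ratio. The score used here is Youden's J (true positive rate minus false positive rate), a stand-in chosen for illustration and not the paper's measure. The function names `confusion_counts`, `youden_j`, and `rank_features` are hypothetical.

```python
def confusion_counts(feature, labels):
    """Count (tp, fp, fn, tn), treating feature value 1 as a 'positive' prediction."""
    tp = fp = fn = tn = 0
    for x, y in zip(feature, labels):
        if x == 1 and y == 1:
            tp += 1
        elif x == 1 and y == 0:
            fp += 1
        elif x == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def youden_j(feature, labels):
    """Stand-in score (assumption, not the paper's measure): TPR - FPR.

    Both rates are normalised within their own class, so the score
    does not change when one class is over- or under-represented.
    """
    tp, fp, fn, tn = confusion_counts(feature, labels)
    positives = tp + fn
    negatives = fp + tn
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    return tp / positives - fp / negatives


def rank_features(columns, labels):
    """Rank named binary feature columns by |J|, most discriminative first."""
    scores = {name: abs(youden_j(col, labels)) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy imbalanced data: 2 positive samples, 8 negative samples.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
columns = {
    "f_a": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # fires only on positives
    "f_b": [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],  # fires on one positive, one negative
}
ranking = rank_features(columns, labels)
```

Because each rate is computed within its own class, replicating the negative samples leaves the score of every feature unchanged, which is the kind of imbalance insensitivity the abstract describes.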
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
1 article.
1. Discrimination of Insurance Fraud Based on Machine Learning;Highlights in Business, Economics and Management;2023-08-02