Estimating the Class Prior in Positive and Unlabeled Data Through Decision Tree Induction

Published:2018-04-29 Issue:1 Volume:32 Page:
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Bekker Jessa, Davis Jesse

Abstract

For tasks such as medical diagnosis and knowledge base completion, a classifier may only have access to positive and unlabeled examples, where the unlabeled data consists of both positive and negative examples. One way to enable learning from this type of data is to know the true class prior. In this paper, we propose a simple yet effective method for estimating the class prior by estimating the probability that a positive example is selected to be labeled. Our key insight is that subdomains of the data give a lower bound on this probability. This lower bound gets closer to the real probability as the ratio of labeled examples increases. Finding such subsets can naturally be done via top-down decision tree induction. Experiments show that our method makes estimates that are as accurate as those of state-of-the-art methods, and is an order of magnitude faster.
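
The lower-bound idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes that every positive example is labeled with the same probability c (an assumption made here for illustration), takes the candidate subdomains as given index lists rather than finding them by decision tree induction, and applies no finite-sample correction. Under that assumption, the labeled fraction in any subdomain equals c times the fraction of positives there, so it cannot exceed c, and the class prior follows as the overall labeled fraction divided by c. The function names are hypothetical.

```python
from typing import Iterable, Sequence


def label_frequency_lower_bound(
    labels: Sequence[int],
    subdomains: Iterable[Sequence[int]],
    min_size: int = 1,
) -> float:
    """Return the largest labeled fraction found in any candidate subdomain.

    labels[i] is 1 if example i is labeled (hence positive) and 0 if unlabeled.
    Each subdomain is a list of example indices. Assuming a constant labeling
    probability c for positives, every such fraction is a lower bound on c,
    so the maximum is the tightest bound among the candidates.
    """
    best = 0.0
    for indices in subdomains:
        if len(indices) < min_size:
            continue  # skip subdomains too small to give a meaningful fraction
        fraction = sum(labels[i] for i in indices) / len(indices)
        best = max(best, fraction)
    return best


def class_prior_estimate(labels: Sequence[int], c: float) -> float:
    """Estimate the class prior as (overall labeled fraction) / c, capped at 1."""
    if c <= 0:
        raise ValueError("label frequency c must be positive")
    labeled_fraction = sum(labels) / len(labels)
    return min(1.0, labeled_fraction / c)


if __name__ == "__main__":
    # Toy data (made up for illustration): 10 examples, 3 of them labeled.
    s = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    # Two hypothetical subdomains, e.g. the two leaves of a one-split tree.
    candidates = [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
    c_hat = label_frequency_lower_bound(s, candidates)
    print("estimated label frequency:", c_hat)
    print("estimated class prior:", class_prior_estimate(s, c_hat))
```

Because the maximum over small subdomains is noisy, a practical estimator would need some guard against overestimating c on tiny subsets; the `min_size` parameter here is only a crude stand-in for such a guard.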

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 23 articles.

1. Joint empirical risk minimization for instance-dependent positive-unlabeled data;Knowledge-Based Systems;2024-11

2. Deep forests with tree-embeddings and label imputation for weak-label learning;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

3. Quantifying disparities in intimate partner violence: a machine learning method to correct for underreporting;npj Women's Health;2024-05-15

4. Modeling User Attention in Music Recommendation;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

5. Positive-unlabeled learning for coronary artery segmentation in CCTA images;Biomedical Signal Processing and Control;2024-01