Author:
Das Srijita, Ramanan Nandini, Kunapuli Gautam, Radivojac Predrag, Natarajan Sriraam
Abstract
We consider the problem of active feature elicitation: given some examples with all the features (say, the full Electronic Health Record) and many examples with only some of the features (say, demographics), the goal is to identify the set of examples on which more information (say, lab tests) needs to be collected. The motivating observation is that some features may be more expensive, personal, or cumbersome to collect than others. We propose a general active learning approach, independent of both the classifier and the similarity metric, that identifies examples that are dissimilar to the ones with the full set of data and acquires the complete set of features for these examples. Motivated by four real clinical tasks, our extensive evaluation demonstrates the effectiveness of this approach. To demonstrate the generalization capabilities of the proposed approach, we consider different divergence metrics and classifiers and present consistent results across the domains.
Funder
University of Texas at Dallas