Abstract
Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R and made publicly available as R packages, and those packages offer few options. The R package MXM offers a variety of feature selection algorithms and has unique features that make it advantageous over its competitors: a) it contains feature selection algorithms that can treat numerous types of target variables, including continuous, percentages, time to event (survival), binary, nominal, ordinal, clustered, counts, left censored, etc.; b) it contains a variety of regression models that can be plugged into the feature selection algorithms (for example, with time-to-event data the user can choose among Cox, Weibull, log-logistic, or exponential regression); c) it includes an algorithm for detecting multiple solutions, that is, many sets of statistically equivalent features (plainly speaking, two features carry statistically equivalent information when substituting one with the other does not affect the inference or the conclusions); and d) it includes memory-efficient algorithms for high-volume data, that is, data that cannot be loaded into R directly (on a machine with 16GB of RAM, for example, R cannot directly load a 16GB dataset; by utilizing the proper package, we load the data and then perform feature selection). In this paper, we qualitatively compare MXM with other relevant feature selection packages and discuss its advantages and disadvantages. Further, we provide a demonstration of MXM’s algorithms using real high-dimensional data from various applications.
Funder
Seventh Framework Programme
Subject
General Pharmacology, Toxicology and Pharmaceutics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine
Cited by
11 articles.