Authors:
Tripathi Rajesh Kumar, Raja Linesh, Kumar Ankit, Dadheech Pankaj, Kumar Abhishek, Nachappa M N
Abstract
Data repositories have grown enormously because organizations such as governments, corporations, and healthcare providers generate data in large amounts. Large volumes of data are produced, processed, collected, and analysed online, so this data needs to be transformed into valuable information. The process of extracting knowledge from large amounts of data is referred to as data mining. The proposed hybrid approach can be evaluated with different classifiers, such as Naïve Bayes and random forest. In the proposed methodology, we find that the SMOTE algorithm, which uses the K-nearest neighbour algorithm, is limited to some minority class instances when creating synthetic samples, which sometimes leads to overfitting; a more effective oversampling approach can therefore be developed.
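To illustrate the limitation noted in the abstract, the following is a minimal sketch of SMOTE-style oversampling, not the authors' hybrid method. It assumes numeric features and Euclidean distance; the function names `k_nearest` and `smote`, the parameters, and the toy data are hypothetical and chosen only for illustration. Each synthetic sample is placed on the line segment between a minority instance and one of its k nearest minority neighbours.

```python
import math
import random


def k_nearest(idx, points, k):
    """Return indices of the k nearest neighbours of points[idx] (Euclidean distance)."""
    dists = [
        (math.dist(points[idx], p), j)
        for j, p in enumerate(points)
        if j != idx
    ]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic samples by interpolating between minority
    instances and their nearest minority neighbours (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))           # pick a minority instance
        j = rng.choice(k_nearest(i, minority, k))  # pick one of its neighbours
        gap = rng.random()                         # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class (assumed data, for illustration only)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies between two existing minority instances, the new samples stay inside the region those instances already cover. This is one way to see why such oversampling can reinforce a small set of minority instances and contribute to overfitting.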