Abstract
A large dataset increases the computational burden on any method that operates over it. Given a large dataset, it is therefore natural to ask whether one can derive a smaller dataset, either a proper subset of the original patterns or a set of patterns extracted from them, whose cardinality is smaller than that of the original dataset. The patterns in this reduced set are representatives of the patterns in the original dataset, and together they form the Prototype set. Constructing a Prototype set falls broadly into two categories: 1) a Prototype set that is a proper subset of the original dataset, and 2) a Prototype set containing new patterns extracted by using the patterns in the original dataset. The same kind of reduction can also be applied to the features of the training set, not only to its patterns. The authors discuss the reduction of datasets in both directions. These methods are well known as Data Compaction Techniques.
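To make the first category concrete, the following is a minimal sketch of one classical way to obtain a Prototype set that is a proper subset of the original dataset: Hart's Condensed Nearest Neighbour rule, which keeps only those training patterns needed for a 1-NN classifier to label every original pattern correctly. The toy dataset and function name here are illustrative, not taken from the article.

```python
import math

def condensed_nn(X, y):
    """Hart's Condensed Nearest Neighbour (a prototype-selection sketch).

    Grow a 'store' of pattern indices: whenever a pattern is
    misclassified by 1-NN over the current store, add it. Repeat
    until a full pass makes no changes. The store is then a proper
    subset of the original patterns that classifies all of them
    correctly with the 1-NN rule.
    """
    store = [0]                       # seed the store with the first pattern
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            # classify pattern i with 1-NN over the current store
            j = min(store, key=lambda s: math.dist(X[i], X[s]))
            if y[j] != y[i] and i not in store:
                store.append(i)       # misclassified -> keep as a prototype
                changed = True
    return store

# toy 1-D dataset: two well-separated classes
X = [[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]]
y = [0, 0, 0, 1, 1, 1]
prototype_indices = condensed_nn(X, y)   # a small subset of the six patterns
```

On this toy data the store reduces six patterns to two, one per class, illustrating how a Prototype set of smaller cardinality can stand in for the original dataset.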