Abstract
In the age of the data deluge, many domains and applications are still restricted to small datasets. The ability to harness these small datasets through supervised learning can have a significant impact in many important areas. However, an insufficient amount of training data usually results in unsatisfactory performance of machine learning algorithms. This work aims to mitigate the small data problem by creating artificial instances that are added to the training data. The proposed algorithm, Geometric Small Data Oversampling Technique, uses geometric regions around existing samples to generate new high-quality instances. Experimental results show a significant improvement in accuracy compared with using the initial small dataset alone, as well as with other popular artificial data generation techniques.
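The idea of generating instances inside geometric regions around existing samples can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes, purely for illustration, that each region is a hypersphere centred on a randomly chosen sample, with a radius equal to a fraction of the distance to that sample's nearest neighbour, and that each new point inherits the label of its centre. The function name `geometric_oversample` and the parameters `scale` and `seed` are hypothetical.

```python
import numpy as np

def geometric_oversample(X, y, n_new, scale=0.5, seed=0):
    """Illustrative only: sample points uniformly inside hyperspheres
    centred on existing samples (assumed region shape, not the paper's)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Pairwise Euclidean distances; mask the diagonal so a sample
    # is never treated as its own nearest neighbour.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    radii = scale * dists.min(axis=1)

    # Pick a centre for every new point, with replacement.
    centers = rng.integers(0, n, size=n_new)

    # Uniform sampling in a d-ball: random unit direction times
    # radius * u**(1/d), with u uniform on [0, 1).
    directions = rng.normal(size=(n_new, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    lengths = radii[centers] * rng.random(n_new) ** (1.0 / d)

    X_new = X[centers] + directions * lengths[:, None]
    y_new = y[centers]  # assumption: new point keeps its centre's label
    return X_new, y_new

# Toy data (made up for this example): four samples, two classes.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])
X_new, y_new = geometric_oversample(X, y, n_new=20)

# The augmented set is what would be passed to a classifier.
X_aug = np.vstack([X, X_new])
y_aug = np.concatenate([y, y_new])
```

With `scale=0.5`, every generated point lies at least as close to its own centre as to any other original sample, which is one simple way to keep the inherited label plausible.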
Funder
Fundação para a Ciência e a Tecnologia
Publisher
Public Library of Science (PLoS)
Cited by
8 articles.