Abstract
Multiclass data classification, where the goal is to segment data into classes, is an important task in machine learning. However, the task is challenging for reasons that include the scarcity of labeled training data; most machine learning algorithms require a large number of labeled examples to perform well. Moreover, the accuracy of a classifier can depend on the accuracy of the training labels, which can be corrupted. In this paper, we present an efficient and unconditionally stable semi-supervised graph-based method for multiclass data classification that requires considerably less labeled training data than current techniques to classify a data set accurately, owing to properties such as the embedding of data into a similarity graph. In particular, it performs very well, and more accurately than current approaches, in the common scenario of few labeled training elements. Moreover, we show that the algorithm maintains good accuracy even with a large number of mislabeled examples and can also incorporate class size information. The proposed method uses a modified auction dynamics technique. Extensive experiments on benchmark data sets are performed, and the results are compared to other methods.
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Theoretical Computer Science