Affiliation:
1. LIRE Laboratory, Abdelhamid MEHRI, Constantine 2 University, Algeria
Abstract
With the exponential growth of biological data, labeling such data has become difficult and costly. Although unlabeled data are far more plentiful than labeled data, most supervised learning methods are not designed to exploit them. Semi-supervised learning methods are motivated by the availability of large unlabeled datasets alongside only a small number of labeled examples. However, incorporating unlabeled data into learning does not guarantee an improvement in classification performance. This paper introduces an approach based on a semi-supervised learning model, namely self-training combined with a deep learning algorithm, to predict missing class labels from labeled and unlabeled data. To assess the performance of the proposed approach, two datasets are used with four performance measures: precision, recall, F-measure, and area under the ROC curve (AUC).
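As an illustration of the general self-training scheme named above, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes a simple nearest-centroid classifier as a stand-in for the deep network, and the confidence threshold (0.95), the round limit, and all function names are hypothetical choices made for the example. A small helper for precision, recall, and F-measure is included; AUC is omitted for brevity.

```python
import math


def fit_centroids(X, y):
    """Base learner (assumed stand-in for a deep network): one mean vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, mu))
              for c, mu in centroids.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def self_train(X_lab, y_lab, X_unl, threshold=0.95, max_rounds=10):
    """Iteratively move confidently predicted unlabeled points into the labeled set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    model = fit_centroids(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                # Confident prediction: accept it as a pseudo-label.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break  # no new pseudo-labels, so retraining would change nothing
        model = fit_centroids(X_lab, y_lab)
    return model, pool


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy usage with made-up 2-D points (not data from the paper).
X_lab = [(0.0, 0.0), (5.0, 5.0)]
y_lab = [0, 1]
X_unl = [(0.2, -0.1), (4.8, 5.3), (0.5, 0.4), (5.1, 4.7), (2.5, 2.5)]
model, leftover = self_train(X_lab, y_lab, X_unl)
```

The threshold controls the trade-off that the abstract alludes to: a low value admits more pseudo-labels but risks reinforcing early mistakes, which is one reason unlabeled data do not always improve classification performance.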