Author:
Hasan Syed Sarfaraz, Baitha Arun, Gangwar Lal Singh, Kumar Sanjeev
Abstract
Deep learning is a class of machine learning algorithms that extract high-level features from raw input to support intelligent decisions. Identifying promising genotypes in varietal trials is one of many agricultural applications in which deep learning can support decisions based on varietal trial data. However, the varietal trial data used for this purpose are highly imbalanced, which poses a major challenge for classification with deep learning. For example, only 33 genotypes were identified as promising in the zonal varietal trials of the All India Coordinated Research Project (AICRP) on Sugarcane during 2016-21, whereas 148 fell in the non-promising class. Balancing the classes is crucial because a classification model trained on an imbalanced dataset tends to favor the majority class, so its overall accuracy largely reflects that class. One way to address this issue is resampling, which adjusts the ratio between the classes to make the data more balanced. This study implemented and evaluated four resampling techniques, viz. random undersampling, random oversampling, ensemble, and SMOTE, to balance a varietal trial dataset for building a deep learning model that identifies promising genotypes in sugarcane. The paper describes the methodology used to build the deep learning model with these resampling techniques and presents their comparative performance in identifying promising genotypes. The results indicate that SMOTE and random oversampling balanced the dataset well for developing the deep learning model, compared with using the imbalanced dataset without resampling. SMOTE outperformed the other resampling techniques, achieving high precision, recall, and F1 score for both the positive and negative classes.
However, the ensemble and random undersampling methods did not perform as well as SMOTE and random oversampling. The study will be useful for developing artificial intelligence based tools for the automatic identification of promising genotypes in varietal trials of sugarcane in particular and of other crops in general.
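As a rough illustration of the resampling ideas named in the abstract, the following Python sketch rebalances a dataset with the same class counts as the AICRP example (33 promising, 148 non-promising). It is not the authors' implementation: the feature values are synthetic placeholders, the function names are hypothetical, and `smote_like` is a simplified interpolation in the spirit of SMOTE, not the reference algorithm. The ensemble method is left out.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k, rng):
    """Add synthetic minority rows on the segment between a minority row
    and one of its k nearest minority neighbours (binary labels assumed)."""
    rows = [x for x, lab in zip(X, y) if lab == minority]
    needed = (len(y) - len(rows)) - len(rows)  # majority count minus minority count
    X_out, y_out = list(X), list(y)
    for _ in range(needed):
        base = rng.choice(rows)
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        X_out.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
        y_out.append(minority)
    return X_out, y_out


# Synthetic placeholder features; only the class counts come from the abstract.
rng = random.Random(0)
y = [1] * 33 + [0] * 148  # 1 = promising, 0 = non-promising
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in y]

# A model that always predicts the majority class already looks accurate.
baseline_accuracy = Counter(y)[0] / len(y)  # 148 / 181, about 0.818
print(f"majority-class accuracy: {baseline_accuracy:.3f}")

for name, (X_new, y_new) in {
    "oversample": random_oversample(X, y, rng),
    "undersample": random_undersample(X, y, rng),
    "smote_like": smote_like(X, y, minority=1, k=5, rng=rng),
}.items():
    print(name, dict(Counter(y_new)))
```

Oversampling and the SMOTE-style routine both bring the promising class up to 148 rows, while undersampling cuts the non-promising class down to 33. The baseline figure shows why accuracy alone is uninformative here: a model that never identifies a promising genotype would still be right about 82% of the time, which is why per-class precision, recall, and F1 score are the more telling measures.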
Publisher
The Indian Society of Genetics and Plant Breeding