Evaluation of resampling techniques for deep learning based identification of promising genotypes in sugarcane varietal trials

Authors:

Hasan Syed Sarfaraz, Baitha Arun, Gangwar Lal Singh, Kumar Sanjeev

Abstract

Deep learning is a class of machine learning algorithms that extract high-level features from raw input in order to make intelligent decisions. Identifying promising genotypes in varietal trials is one of many agricultural applications that call for deep learning to support decisions based on varietal trial data. However, the varietal trial data used for this identification are highly imbalanced, which poses serious challenges for classification with deep learning. For example, only 33 genotypes were identified as promising in the zonal varietal trials of the All India Coordinated Research Project (AICRP) on Sugarcane during 2016-21, against 148 in the non-promising class. Balancing the classes is crucial because a classification model trained on an imbalanced dataset tends to be biased toward the majority class, so its prediction accuracy largely reflects that class. One way to address this issue is resampling, which adjusts the ratio between the classes to make the data more balanced. This study implemented and evaluated four resampling techniques, viz. random undersampling, random oversampling, ensemble and SMOTE, to balance the varietal trial dataset for building a deep learning model that identifies promising genotypes in sugarcane. The paper describes the methodology used to build the deep learning model with these resampling techniques and then presents a comparison of how well each approach identifies promising genotypes. Results indicate that SMOTE and random oversampling performed well in balancing the imbalanced dataset for developing the deep learning model, compared with using the imbalanced dataset without resampling. SMOTE outperformed all other resampling techniques, achieving high precision, recall and F1 score for both the positive and negative classes.
However, the ensemble and random undersampling methods did not perform as well as SMOTE and random oversampling. The study will be useful in developing artificial intelligence based tools for automatically identifying promising genotypes in varietal trials of sugarcane in particular, and of other crops in general.
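The abstract does not describe how the resampling was implemented. As a minimal illustration only, the NumPy sketch below shows three of the four techniques named above (random oversampling, random undersampling and a basic form of SMOTE) on a hypothetical binary dataset with the same 33/148 class split; the ensemble variant is omitted because its exact form is not stated. The function names, the four random features and the choice of `k=5` neighbours are assumptions, not the authors' setup. A per-class precision/recall/F1 helper is included to show why accuracy alone is misleading on imbalanced data.

```python
import numpy as np

# Assumes a binary label vector y (1 = promising, 0 = non-promising).
# All function names and the synthetic data below are hypothetical.

def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    idx_min = np.flatnonzero(y == minority)
    extra = rng.choice(idx_min, size=counts.max() - counts.min(), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]


def random_undersample(X, y, rng):
    """Keep a random subset of majority rows equal in size to the minority class."""
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[np.argmax(counts)]
    idx_maj = np.flatnonzero(y == majority)
    idx_rest = np.flatnonzero(y != majority)
    kept_maj = rng.choice(idx_maj, size=counts.min(), replace=False)
    keep = np.concatenate([idx_rest, kept_maj])
    return X[keep], y[keep]


def smote(X, y, rng, k=5):
    """Basic SMOTE: create synthetic minority rows on the line segment between a
    minority sample and one of its k nearest minority-class neighbours."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    # Pairwise Euclidean distances among minority samples only.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 treating `label` as the positive class."""
    tp = np.sum((y_pred == label) & (y_true == label))
    fp = np.sum((y_pred == label) & (y_true != label))
    fn = np.sum((y_pred != label) & (y_true == label))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)


rng = np.random.default_rng(0)
X = rng.normal(size=(181, 4))        # hypothetical trait features
y = np.array([1] * 33 + [0] * 148)   # 33 promising, 148 non-promising

for name, resample in [("oversample", random_oversample),
                       ("undersample", random_undersample),
                       ("SMOTE", smote)]:
    _, y_res = resample(X, y, rng)
    print(name, np.bincount(y_res))  # class counts [non-promising, promising]

# A "model" that always predicts the majority class looks accurate but finds nothing.
always_zero = np.zeros_like(y)
print("accuracy", round(float(np.mean(always_zero == y)), 3))
print("promising P/R/F1", per_class_scores(y, always_zero, 1))
```

In practice, the imbalanced-learn library provides `RandomOverSampler`, `RandomUnderSampler` and `SMOTE` behind a common `fit_resample` method. Whichever implementation is used, resampling is normally applied to the training split only, so that evaluation metrics are computed on data with the original class distribution.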

Publisher

The Indian Society of Genetics and Plant Breeding
