Kennard-Stone method outperforms the Random Sampling in the selection of calibration samples in SNPs and NIR data

Authors:

Ferreira, Roberta de Amorim (1); Teixeira, Gabriely (2); Peternelli, Luiz Alexandre (2)

Affiliation:

1. Universidade Federal de Viçosa (UFV), Brazil; Instituto Federal de Minas Gerais (IFMG), Brazil

2. Universidade Federal de Viçosa (UFV), Brazil

Abstract

Splitting a dataset into training and testing subsets is a crucial step in optimizing predictive models. This study evaluated how the choice of the training subset influences both the construction of predictive models and their validation. For this purpose, we compared the Kennard-Stone (KS) and Random Sampling (RS) methods on near-infrared spectroscopy (NIR) data and on single nucleotide polymorphism (SNP) marker data. To our knowledge, no previous report in the literature has applied the KS method to SNP data. For building and validating the models, partial least squares (PLS) regression proved more efficient for the NIR data, and the Bayesian Lasso (BLASSO) proved more efficient for the SNP data. The predictive ability of the models fitted after each data partition was evaluated by the correlation between predicted and observed values and by the corresponding root mean squared error of prediction. For both datasets, the results of the KS and RS methods differed significantly according to the F test (P < 0.01), and KS was more efficient than RS in practically all repetitions. KS also has the advantages of being simple and fast to apply and, being deterministic, of always selecting the same samples, which is very beneficial in subsequent analyses.
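The abstract contrasts KS, which deterministically spreads the calibration samples over the predictor space, with RS. Below is a minimal Python sketch of the commonly described KS procedure, alongside a seeded random split and the two evaluation criteria named above: the correlation between observed and predicted values, and the root mean squared error of prediction. The sketch rests on several assumptions. The KS variant shown is seeded with the two most distant samples; some implementations start instead from the sample nearest the mean. Distances are Euclidean on the raw feature vectors. The function names are illustrative and not taken from the paper. The PLS and BLASSO models themselves are not implemented.

```python
import math
import random
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(X: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of k calibration samples chosen by Kennard-Stone (sketch).

    Seeds with the two mutually most distant samples, then repeatedly adds
    the candidate whose nearest already-selected sample is farthest away.
    The result depends only on X, so reruns give the same selection.
    """
    n = len(X)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    # Full pairwise distance matrix: O(n^2) memory, acceptable for a sketch.
    d = [[euclidean(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Seed pair: the two samples that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: d[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [c for c in range(n) if c not in selected]
    # min_dist[c]: distance from candidate c to its nearest selected sample.
    min_dist = {c: min(d[c][i0], d[c][j0]) for c in remaining}
    while len(selected) < k:
        best = max(remaining, key=lambda c: min_dist[c])
        selected.append(best)
        remaining.remove(best)
        for c in remaining:
            min_dist[c] = min(min_dist[c], d[c][best])
    return selected


def random_split(n: int, k: int, seed: int) -> List[int]:
    """Random Sampling: k calibration indices drawn without replacement."""
    return random.Random(seed).sample(range(n), k)


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    ss_o = sum((o - mo) ** 2 for o in obs)
    ss_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(ss_o * ss_p)


def rmsep(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error of prediction over the validation set."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [10.0], [5.0]]  # toy 1-D data, not from the paper
    print(kennard_stone(X, 3))  # [0, 3, 4]
    print(random_split(len(X), 3, seed=1))
```

In the setting described by the abstract, each row of `X` would be one sample (an NIR spectrum or a vector of coded SNP genotypes). The returned indices would form the calibration set, and the remaining samples the validation set. Because KS depends only on `X`, rerunning it reproduces the same split, whereas RS changes with the seed. For this reason, the RS comparison is repeated over many random draws.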

Publisher

FapUNIFESP (SciELO)

Subject

General Veterinary, Agronomy and Crop Science, Animal Science and Zoology

Cited by 14 articles.