Authors:
Kim Sunhee, Chu Sang-Ho, Park Yong-Jin, Lee Chang-Yong
Abstract
As genomic selection emerges as a promising breeding method for both plants and animals, numerous prediction methods have been introduced and applied to various real and simulated data sets. Research suggests that no single method is universally better than the others; rather, performance depends strongly on the characteristics of the data and the nature of the prediction task, which implies that each method has its own strengths and weaknesses. In this study, we exploit this observation and propose a different approach. Rather than comparing multiple methods to determine the best one for a particular study, we advocate combining multiple methods to achieve better performance than any single method in isolation. To this end, we develop a computational method based on stacked generalization, an ensemble technique in which a meta-model merges predictions from multiple base models to improve performance. We applied this method to plant and animal data and compared its performance with that of currently available methods using standard performance metrics. The proposed method yielded a lower or comparable mean squared error in predicting phenotypes and showed greater resistance to overfitting than the current methods. Statistical hypothesis testing further showed that the proposed method outperformed or matched the current methods. In summary, the proposed stacked generalization integrates currently available methods to achieve stable and improved performance. In this context, our study provides general recommendations for effective practice in genomic selection.
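To make the idea of a meta-model merging base-model predictions concrete, the following is a minimal sketch of stacked generalization and not the implementation used in this study. It rests on several assumptions that do not come from the abstract: the marker matrix and phenotypes are simulated, the base models are two ridge regressions with different penalties, the meta-model is an ordinary least-squares fit, and all function names are hypothetical. The key step is that the meta-model is trained on out-of-fold predictions, so it never sees predictions made on data a base model was fitted to.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Fit ridge regression on centered data; return (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

def ridge_predict(model, X):
    intercept, beta = model
    return intercept + X @ beta

def out_of_fold_predictions(X, y, lams, n_folds, rng):
    """One column per base model; each row is predicted by a model that did not see it."""
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    oof = np.zeros((n, len(lams)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        for j, lam in enumerate(lams):
            model = ridge_fit(X[train], y[train], lam)
            oof[held_out, j] = ridge_predict(model, X[held_out])
    return oof

def fit_meta_model(oof, y):
    """Least-squares meta-model: intercept plus one weight per base model."""
    design = np.column_stack([np.ones(len(y)), oof])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights

def stacked_predict(base_models, weights, X):
    preds = np.column_stack([ridge_predict(m, X) for m in base_models])
    return weights[0] + preds @ weights[1:]

# Simulated data (assumption): 200 individuals, 50 markers coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)
y = X @ rng.normal(0.0, 0.5, p) + rng.normal(0.0, 1.0, n)
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

lams = [1.0, 100.0]  # hypothetical penalties defining the two base models
oof = out_of_fold_predictions(X_train, y_train, lams, n_folds=5, rng=rng)
weights = fit_meta_model(oof, y_train)

# Refit each base model on all training data before predicting new individuals.
base_models = [ridge_fit(X_train, y_train, lam) for lam in lams]
y_hat = stacked_predict(base_models, weights, X_test)

mse_stacked = float(np.mean((y_test - y_hat) ** 2))
mse_base = [float(np.mean((y_test - ridge_predict(m, X_test)) ** 2)) for m in base_models]
```

Comparing `mse_stacked` with the entries of `mse_base` on held-out individuals mirrors the kind of mean squared error comparison described above. In this toy setting the two base models are closely related, so any gain from stacking may be small; the approach is expected to help most when the base models make different kinds of errors.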