Comparing Prediction Methods of Wheat Grain Quality With the Area Under the Receiver Operating Characteristic Curves
Abstract
Genomic selection is a widely used breeding method that uses genome-wide marker coverage to predict genotypic values for quantitative traits. It combines molecular and phenotypic data from a training population to obtain genomic estimated breeding values for individuals in a testing population that have been genotyped but not phenotyped. A popular method for this estimation is genomic best linear unbiased prediction (G-BLUP). To reduce the effort and cost of data collection, we developed models based on a linear model, a Bayesian linear model, K-nearest neighbors, and Random Forest to predict quality traits, and compared their predictive ability with that of G-BLUP using the Pearson correlation and the area under the receiver operating characteristic curve. The goal of this approach is to enable the analysis of large-scale data sets and to provide relatively accurate estimates of quality traits without the time and effort that marker analysis requires. Applied to spring wheat breeding data, the proposed methods, compared with G-BLUP, perform better in loaf volume prediction, perform poorly in flour extraction and bake absorption prediction, and perform reasonably well in mixograph prediction.
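The two comparison criteria named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that, for the area under the receiver operating characteristic curve (AUC), the continuous observed trait is first converted to a binary "selected / not selected" label by keeping a top fraction of lines; the 30% fraction, the helper names, and the toy numbers are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def top_fraction_labels(observed, fraction=0.3):
    """Label the top `fraction` of observed values as True (assumed selection rule)."""
    k = max(1, round(len(observed) * fraction))
    cutoff = sorted(observed, reverse=True)[k - 1]
    return [value >= cutoff for value in observed]


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for flag, s in zip(labels, scores) if flag]
    neg = [s for flag, s in zip(labels, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values only; they are not data from the study.
    observed = [710, 655, 690, 730, 640, 700, 675, 720, 660, 685]
    predicted = [705, 660, 670, 715, 650, 710, 680, 700, 645, 690]

    labels = top_fraction_labels(observed, fraction=0.3)
    print("Pearson correlation:", pearson(observed, predicted))
    print("AUC for top-30% selection:", auc(labels, predicted))
```

The Pearson correlation measures how closely predicted values track observed values across all lines, whereas the AUC measures how well the predictions rank the lines a breeder would want to keep above the rest, so the two criteria can rank prediction methods differently.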