ConvCGP: A convolutional neural network to predict genetic values of agronomic traits from compressed genome-wide polymorphisms

Source: PubMed "rice"
Plant Genome. 2026 Jun;19(2):e70223. doi: 10.1002/tpg2.70223.

ABSTRACT

The growing size of genome-wide polymorphism data in animal and plant breeding has raised concerns regarding computational load and time, particularly when predicting genetic values for target traits using genomic prediction. Several deep learning and conventional methods, including dimensionality reduction techniques such as principal component analysis (PCA) and autoencoders, have been proposed to address these challenges by selecting subsets of polymorphisms or compressing high-dimensional data for predictive analysis. However, these methods are often computationally intensive and time-consuming. A major challenge in applying deep learning models directly to high-dimensional genomic data is the substantial computational cost and time required for hyperparameter tuning and model training. To address these limitations, we propose a novel deep learning framework, Compression-based Genomic Prediction using Convolutional Neural Networks (ConvCGP), that integrates autoencoder-based nonlinear compression with convolutional neural network-based prediction in an end-to-end trainable pipeline. This method reduces data to a compact latent representation that retains meaningful information for prediction, thereby significantly reducing storage needs and computational load. We applied ConvCGP to high-dimensional rice datasets for agronomic trait prediction and further tested it on a large-scale maize dataset. The results show that ConvCGP maintained prediction accuracy comparable to models trained on uncompressed data, even under extreme compression where only 2% of the original features were retained. This demonstrates that ConvCGP not only scales effectively to massive datasets but also preserves predictive information under drastic dimensionality reduction. Moreover, ConvCGP consistently outperformed PCA-based models, genomic best linear unbiased prediction, LASSO (least absolute shrinkage and selection operator), support vector machine, and other methods, establishing it as a powerful, efficient, and scalable solution for modern genomic prediction.

PMID: 42003104 | DOI: 10.1002/tpg2.70223
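The abstract describes an autoencoder that compresses markers to a latent representation (down to 2% of the original features) feeding a convolutional predictor, trained end to end. The following is a minimal NumPy sketch of that data flow only, a forward pass with random weights and no training. The layer sizes, the 0/1/2 genotype coding, the single dense encoder layer, the mean pooling, and the weighted joint loss (`alpha`) are all illustrative assumptions, not the architecture or objective reported by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def init_params(n_markers, latent_dim, n_filters=4, kernel_size=3):
    """Random weights for a toy encoder, decoder, and 1D-conv predictor (hypothetical sizes)."""
    scale = 0.01
    return {
        "W_enc": rng.normal(0.0, scale, (n_markers, latent_dim)),
        "W_dec": rng.normal(0.0, scale, (latent_dim, n_markers)),
        "conv": rng.normal(0.0, scale, (n_filters, kernel_size)),
        "w_out": rng.normal(0.0, scale, n_filters),
    }

def encode(params, X):
    # Nonlinear compression: markers -> latent features.
    return relu(X @ params["W_enc"])

def decode(params, Z):
    # Linear reconstruction of the markers from the latent features.
    return Z @ params["W_dec"]

def conv_predict(params, Z):
    # Slide each filter along the latent axis (cross-correlation, no padding).
    k = params["conv"].shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(Z, k, axis=1)  # (n, L-k+1, k)
    feats = relu(windows @ params["conv"].T)                          # (n, L-k+1, filters)
    pooled = feats.mean(axis=1)                                       # (n, filters)
    return pooled @ params["w_out"]                                   # (n,)

def joint_loss(params, X, y, alpha=0.5):
    # Assumed objective: weighted sum of reconstruction and prediction error.
    Z = encode(params, X)
    recon = np.mean((decode(params, Z) - X) ** 2)
    pred = np.mean((conv_predict(params, Z) - y) ** 2)
    return alpha * recon + (1.0 - alpha) * pred

# Synthetic stand-in data: 20 lines, 1000 markers coded 0/1/2, one trait.
n_lines, n_markers = 20, 1000
latent_dim = n_markers * 2 // 100  # keep 2% of the original feature count
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
y = rng.normal(size=n_lines)

params = init_params(n_markers, latent_dim)
Z = encode(params, X)
y_hat = conv_predict(params, Z)
loss = joint_loss(params, X, y)
```

In a trainable version, the gradient of a loss like `joint_loss` would update the encoder and predictor weights together, which is one way to read "end-to-end"; the downstream model then only ever sees the `latent_dim` compressed features rather than the full marker matrix.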