Source:
Molecules - Scientific journal (MDPI)
Molecules, Vol. 31, Pages 750: Quantitative Analysis of Polyphenols in Lonicera caerulea Based on Mid-Infrared Spectroscopy and Hybrid Variable Selection
Molecules doi: 10.3390/molecules31040750
Authors:
Haiwei Wu
Xuexin Li
Jianwei Liu
Zhihao Wang
Yuchun Liu
Lonicera caerulea L. (blue honeysuckle) is rich in antioxidant polyphenols, and rapid, accurate determination of its polyphenol content is important for functional food quality control. This study proposed a hybrid variable selection strategy designed for high-dimensional, small-sample scenarios and developed a quantitative prediction model for polyphenol content based on mid-infrared (MIR) spectroscopy. A total of 191 Lonicera caerulea samples were collected from Northeast China, and spectra with 7468 variables each were acquired using a Fourier transform infrared spectrometer. Polyphenol reference values were determined by the Folin–Ciocalteu method. Samples were divided into calibration (n = 152) and prediction (n = 39) sets using the SPXY algorithm (sample set partitioning based on joint X–Y distances). Among the 10 preprocessing methods evaluated, multiplicative scatter correction (MSC) combined with a Savitzky–Golay first derivative achieved the best performance and was therefore used for subsequent modeling. The proposed hybrid variable selection method (VIP1.0∩RFR30%) intersected the variables with a partial least squares (PLS) variable importance in projection of VIP ≥ 1.0 with the top 30% of variables ranked by random forest regression (RFR) importance, selecting 984 key wavelengths and achieving an 86.8% dimensionality reduction. A three-stage hyperparameter tuning strategy was implemented across four models (PLS, RFR, support vector regression (SVR), and XGBoost) to validate feature stability and control overfitting. The optimized XGBoost model achieved excellent performance on the independent test set (R² = 0.92, RMSE = 0.098, RPD = 3.47). Compared with the classical competitive adaptive reweighted sampling (CARS) method (R² = 0.78, RPD = 2.14), R² improved by 16.3% and RPD improved by 55.2%. The results demonstrate that the proposed hybrid variable selection strategy can effectively address the challenges of high-dimensional MIR spectral data in small-sample modeling, providing a reliable tool for rapid and non-destructive quantitative analysis of polyphenols in Lonicera caerulea.
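The intersection step behind VIP1.0∩RFR30% can be illustrated with a minimal sketch. It assumes the PLS VIP scores and the random forest importances have already been computed elsewhere and are passed in as plain lists. The function names `hybrid_select` and `reduction_percent`, the rounding rule for the top 30% cut-off, and the toy values are all hypothetical and not taken from the paper.

```python
def hybrid_select(vip, rf_importance, vip_threshold=1.0, top_fraction=0.30):
    """Return the sorted indices of variables kept by BOTH filters.

    vip           : VIP score per variable (assumed precomputed from a PLS model)
    rf_importance : importance per variable (assumed precomputed from a random forest)
    """
    if len(vip) != len(rf_importance):
        raise ValueError("both score lists must cover the same variables")
    n = len(vip)

    # Filter 1: variables whose VIP score reaches the threshold.
    vip_keep = {i for i, score in enumerate(vip) if score >= vip_threshold}

    # Filter 2: the top fraction of variables by random forest importance.
    # Rounding to the nearest integer is an assumption; the source does not
    # state how the 30% cut-off was rounded.
    k = max(1, round(top_fraction * n))
    ranked = sorted(range(n), key=lambda i: rf_importance[i], reverse=True)
    rf_keep = set(ranked[:k])

    # Hybrid selection: keep only variables that pass both filters.
    return sorted(vip_keep & rf_keep)


def reduction_percent(n_total, n_selected):
    """Percentage of variables removed by the selection."""
    return 100.0 * (1.0 - n_selected / n_total)


# Toy scores for 10 variables (illustrative values only).
vip_scores = [1.4, 0.6, 1.1, 0.9, 1.8, 1.0, 0.3, 1.2, 0.7, 1.5]
rf_importances = [0.20, 0.05, 0.02, 0.22, 0.25, 0.01, 0.03, 0.04, 0.10, 0.16]

selected = hybrid_select(vip_scores, rf_importances)
print(selected)  # [0, 4]

# Check of the reduction figure reported in the abstract: 984 of 7468 kept.
print(round(reduction_percent(7468, 984), 1))  # 86.8
```

In the toy data, variable 3 ranks in the top 30% by random forest importance but has a VIP below 1.0, so the intersection drops it. Requiring agreement between a linear criterion and a nonlinear one is what makes the selection stricter than either filter alone.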