Sustainability, Vol. 18, Pages 4056: A Hybrid Framework for Sustainable Ecosystem Management Through Robust Litterfall Prediction Under Data Scarcity

Source: Sustainability - Scientific journal (MDPI)
Sustainability doi: 10.3390/su18084056
Authors:
Nourhan K. Elbahnasy
Fatma M. Najib
Wedad Hussein
Walaa Gad

Accurate ecological prediction is critical for sustainable environmental management and carbon cycle assessment, yet model development is often constrained by limited datasets and inconsistent preprocessing practices. Reliable litterfall prediction plays a key role in understanding nutrient cycling and supporting sustainable forest ecosystem management. Although gradient boosting models have shown promising performance in ecological applications, structured evaluations integrating preprocessing strategies with synthetic data augmentation remain limited under data-scarce conditions. This study proposes the Hybrid Preprocessing and Augmented Boosting Framework (HPABF), which combines multi-stage preprocessing (MICE imputation, log transformation, and feature engineering) with synthetic data augmentation to enhance predictive robustness. The framework was evaluated across eight machine learning models using a 968-sample forest ecological dataset. To mitigate data scarcity, 5000 synthetic samples were generated while preserving the statistical distribution and multivariate structure of the original data (91% fidelity). Fractal dimension analysis was further introduced as a geometric validation metric to assess prediction structure and stability beyond conventional performance measures. Within the HPABF, gradient boosting models achieved a 7% improvement over baseline performance (R² = 0.96, MAE = 0.06) under cross-validation strategies designed to reduce overfitting. Training with synthetic data further improved predictive accuracy (R² = 0.98), demonstrating the framework’s effectiveness for data-scarce ecological applications. By improving prediction reliability under limited data conditions, the proposed framework supports more accurate environmental monitoring, informed decision-making, and sustainable management of forest ecosystems.
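The abstract reports its results as the coefficient of determination and the mean absolute error (MAE) under cross-validation. As a minimal sketch of how these two metrics are conventionally computed, the following uses only the Python standard library. The function names (`mae`, `r2`, `kfold_indices`) and the toy numbers are illustrative assumptions, not the authors' code or data, and the contiguous k-fold split is just one common cross-validation scheme; the abstract does not state which strategy was used.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error: average of |observed - predicted|."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n samples.

    Hypothetical helper: real pipelines usually shuffle before splitting.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


# Toy values for illustration only (not taken from the paper's dataset).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print("MAE:", mae(observed, predicted))
print("R^2:", r2(observed, predicted))
```

In a cross-validated evaluation, the two metrics would be computed on each held-out fold from `kfold_indices` and then averaged, so that the reported score reflects data the model did not see during training.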