Source:
Sustainability - Scientific journal (MDPI)
Sustainability, Vol. 18, Pages 2022: An Assessment of the Multi-Input Spatiotemporal RF–XGBoost Hybrid Framework for PM10 Estimation in Lithuania
Sustainability doi: 10.3390/su18042022
Authors:
Mina Adel Shokry Fahim
Jūratė Sužiedelytė Visockienė
Air pollution remains a major public-health concern, and exposure to particulate matter (PM), particularly PM10 (with a diameter ≤ 10 µm), is associated with adverse respiratory and cardiovascular outcomes. Most research relies on a single model for PM10 surface estimation. This study assesses a national-scale, daily PM10 estimation framework for Lithuania (2019–2024) that uses a hybrid machine-learning method combining the Random Forest (RF) and extreme gradient boosting (XGBoost) algorithms. Hourly PM10 observations from 18 monitoring stations were aggregated to obtain daily means and temporal means. The predictors combined meteorological factors, such as temperature, wind, humidity, and precipitation, with satellite-based atmospheric composition from the Sentinel-5P Tropospheric Monitoring Instrument (TROPOMI). The atmospheric components include nitrogen dioxide (NO2), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), formaldehyde (HCHO), and the absorbing aerosol index (AI). The Moderate Resolution Imaging Spectroradiometer (MODIS) was used to obtain land-surface temperature, together with static spatial descriptors such as elevation, land cover, Normalized Difference Vegetation Index (NDVI), population, and road proximity. The dataset was partitioned temporally into training (70%), validation (20%), and testing (10%) subsets. The hybrid model achieved higher accuracy than single-model baselines, reaching a coefficient of determination (R2) of 0.739 in validation and R2 = 0.75 on the test set. The mean absolute error (MAE) was 3.15 µg/m3, and the root mean square error (RMSE) was 3.98 µg/m3. The results indicate a slight tendency to overestimate PM10 at lower concentration levels. Feature-importance analysis revealed that short-term temporal persistence is the key factor in daily PM10 prediction, while meteorological variables provide secondary contributions.
Temporal evaluation using consecutive two-year windows revealed a consistent improvement in predictive performance from 2019–2020 to 2023–2024, while station-level analysis showed moderate-to-strong agreement between the predicted and observed PM10 concentrations across monitoring stations, with R2 ranging from 0.455 to 0.760. The framework provides decision-support capabilities for air-quality management, the evaluation of mitigation measures, and the integration of air-pollution considerations into sustainable urban planning strategies that address public-health protection.
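
The temporal 70/20/10 partition and the three reported metrics (R2, MAE, RMSE) can be illustrated with a minimal sketch. This is not the authors' code: the function names `temporal_split` and `regression_metrics` are hypothetical, the rounding used to place the split boundaries is an assumption, and the numbers in the usage example are toy values rather than data from the study.

```python
from math import sqrt


def temporal_split(records, train_frac=0.7, val_frac=0.2):
    """Split chronologically ordered records into train/validation/test.

    No shuffling is applied, so the validation and test subsets always
    come later in time than the training subset. Boundary placement by
    rounding is an assumption made for this sketch.
    """
    n = len(records)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return records[:train_end], records[train_end:val_end], records[val_end:]


def regression_metrics(observed, predicted):
    """Return R2, MAE, and RMSE for paired observed/predicted values.

    Assumes the observed values are not all identical (otherwise the
    total sum of squares is zero and R2 is undefined).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": sqrt(ss_res / n),
    }


if __name__ == "__main__":
    # Toy daily values in µg/m3, purely illustrative.
    days = list(range(10))
    train, val, test = temporal_split(days)
    print(len(train), len(val), len(test))
    print(regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

Splitting by position rather than at random keeps future days out of the training subset, which matters when lagged PM10 values are among the predictors.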