Sustainability, Vol. 18, Pages 5611: Identifying Meteorological and Gaseous Pollutant Factors Across PM2.5 Pollution Levels for Sustainable Air Quality Management in the Beijing–Tianjin–Hebei Region Using CatBoost–SHAP: A 2021–2024 Analysis

Source: Sustainability - Scientific journal (MDPI)
Sustainability doi: 10.3390/su18115611
Authors:
Ling Zeng
Dandan Shuai
Daichi Xu
Linhai Jing

This study examines the meteorological and gaseous pollutant drivers of PM2.5 under mild, moderate, and severe pollution conditions in the Beijing–Tianjin–Hebei region, with the aim of supporting sustainable air quality management. Daily observations from approximately 65 monitoring stations from 1 November 2021 to 31 October 2024 were used, including PM2.5, four gaseous pollutants (SO2, NO2, CO, and O3), and five meteorological variables: temperature, pressure, relative humidity, precipitation, and wind speed. A CatBoost–SHAP framework was adopted, with CatBoost used for station-level spatial prediction of PM2.5 and SHAP applied to interpret variable contributions. Based on predefined PM2.5 thresholds, 425 pollution days were classified into these three pollution-level scenarios. These pollution days occurred mainly in winter and spring, with higher frequencies in Handan, Baoding, and Shijiazhuang, followed by Tianjin and Beijing. The model performed well across the three pollution-level scenarios. The severe-pollution scenario achieved the highest R², indicating a clearer spatial structure under high-PM2.5 conditions. Although absolute RMSE and MAE increased with pollution severity, their normalized values changed little, suggesting that larger errors mainly reflected stronger spatial heterogeneity at higher PM2.5 concentrations. SHAP results showed that CO, precipitation, wind speed, and temperature dominated the prediction structure. CO was the most stable and influential predictor, but its importance should be interpreted as an indicator of combustion-related pollution accumulation rather than direct causality. Precipitation represented event-dependent wet scavenging, wind speed reflected dispersion conditions, and temperature captured seasonal and thermal background effects. SHAP dependence analysis further indicated that CO had the clearest direct dependence, whereas wind speed and temperature were more background-dependent, and precipitation acted as an episodic nonlinear regulator.
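
The contrast between absolute and normalized errors can be illustrated with a small sketch. The abstract does not say how RMSE and MAE were normalized, so dividing by the mean observed PM2.5 is an assumption here. The function name `error_metrics` and all concentration values are hypothetical and are not taken from the study.

```python
import math


def error_metrics(observed, predicted):
    """Return RMSE, MAE, and mean-normalized versions of both.

    Normalizing by the mean observed value is an assumption; the
    abstract does not state which normalization the authors used.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    mean_obs = sum(observed) / n
    return {
        "rmse": rmse,
        "mae": mae,
        "nrmse": rmse / mean_obs,
        "nmae": mae / mean_obs,
    }


# Made-up station values (µg/m³), for illustration only.
mild_obs = [80.0, 90.0, 100.0, 110.0]
mild_pred = [84.0, 86.0, 104.0, 106.0]

# The same pattern scaled up to mimic a higher-concentration scenario.
severe_obs = [v * 2.5 for v in mild_obs]
severe_pred = [v * 2.5 for v in mild_pred]

mild = error_metrics(mild_obs, mild_pred)
severe = error_metrics(severe_obs, severe_pred)

print(f"mild:   RMSE={mild['rmse']:.2f}  nRMSE={mild['nrmse']:.4f}")
print(f"severe: RMSE={severe['rmse']:.2f}  nRMSE={severe['nrmse']:.4f}")
```

In this toy case the absolute RMSE and MAE grow with concentration while the normalized values stay the same. This is the pattern the abstract describes, though the real data will not scale this cleanly.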