Modelling and Forecasting – How the Process Works
The forecasting model itself uses XGBoost, a machine learning method based on gradient-boosted decision trees. This method has proven particularly effective at identifying complex, non-linear relationships across a wide range of data sources.
A separate model is trained for each relevant air pollutant: NO₂, PM₁₀, and PM₂.₅. Each model can therefore focus on the influencing factors and typical fluctuations specific to its pollutant. To stay current, the models are retrained monthly, which lets them account for changes in traffic behaviour, weather patterns, and other environmental variables.
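As a minimal sketch of the one-model-per-pollutant setup, assuming a shared feature matrix and one target series per pollutant, the training loop might look like the following. The names `train_per_pollutant`, `retraining_due`, and `default_factory` are hypothetical, the hyperparameters are placeholders rather than the values actually used, and "monthly" is interpreted here as once per calendar month.

```python
from datetime import date

POLLUTANTS = ("NO2", "PM10", "PM2.5")


def default_factory():
    """Create an untrained XGBoost regressor (placeholder hyperparameters)."""
    # Imported lazily so the rest of the sketch runs without xgboost installed.
    from xgboost import XGBRegressor
    return XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)


def train_per_pollutant(features, targets, make_model=default_factory):
    """Fit one independent model per pollutant on the same feature matrix.

    features: array-like of shape (n_samples, n_features), e.g. a NumPy array
    targets:  dict mapping pollutant name -> array-like of n_samples values
    """
    models = {}
    for name in POLLUTANTS:
        model = make_model()          # fresh model, nothing shared between pollutants
        model.fit(features, targets[name])
        models[name] = model
    return models


def retraining_due(last_trained: date, today: date) -> bool:
    """Assumption: 'monthly' means a new calendar month has started."""
    return (today.year, today.month) != (last_trained.year, last_trained.month)
```

Keeping the models in a dictionary keyed by pollutant makes it easy to retrain or replace one of them without touching the others.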
Forecasting is carried out on a high-resolution grid of 50 × 50 metre cells, so each cell receives its own predicted air pollution level. This fine resolution is key to capturing even small-scale variations in air quality across the city.
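To illustrate what a 50 m grid implies, the sketch below generates the cell centres for a rectangular area. It assumes projected coordinates in metres (for example UTM) and a hypothetical bounding box; the function name `grid_cell_centres` is illustrative only and does not describe how the actual grid is built.

```python
import math

CELL_SIZE_M = 50  # cell edge length stated in the text


def grid_cell_centres(x_min, y_min, x_max, y_max, cell=CELL_SIZE_M):
    """Return (x, y) centres of all cells needed to cover the bounding box.

    Coordinates are assumed to be in metres. Cells are listed row by row,
    starting at the lower-left corner.
    """
    n_cols = math.ceil((x_max - x_min) / cell)
    n_rows = math.ceil((y_max - y_min) / cell)
    return [
        (x_min + (col + 0.5) * cell, y_min + (row + 0.5) * cell)
        for row in range(n_rows)
        for col in range(n_cols)
    ]
```

At this resolution, one square kilometre contains 20 × 20 = 400 cells, each of which needs its own feature values and prediction.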
The model uses a wide range of features, including:
- temporal variables – year, day of the week, time of day, and special periods such as school holidays and public holidays,
- meteorological data – temperature, wind direction and speed, precipitation, and other weather-related variables,
- spatial factors – building density, proportion of green space, and the road network, all of which affect how pollutants disperse,
- traffic data – predicted vehicle counts and speeds for each grid cell, and
- past measurements – lagged variables or ‘lags’, which allow the model to identify and respond to short-term trends.
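
The temporal and lagged features listed above could be assembled for a single time step roughly as follows. This is a sketch under assumptions: hourly measurements, lags of 1 and 24 hours, and the feature names are all illustrative choices, not the project's actual configuration. Missing lags are encoded as NaN, which XGBoost treats as a missing value by default.

```python
import math
from datetime import date, datetime, timedelta
from typing import Dict, Iterable


def build_feature_row(
    series: Dict[datetime, float],
    when: datetime,
    holidays: Iterable[date] = (),
    lags_h: Iterable[int] = (1, 24),
) -> Dict[str, float]:
    """Build temporal and lag features for one forecast time.

    series:   hourly measurements, keyed by timestamp (assumed layout)
    holidays: dates flagged as school or public holidays
    lags_h:   lag offsets in hours (assumed values)
    """
    row = {
        "year": when.year,
        "weekday": when.weekday(),  # Monday = 0, Sunday = 6
        "hour": when.hour,
        "is_holiday": int(when.date() in holidays),
    }
    for lag in lags_h:
        # Look up the measurement `lag` hours earlier; NaN if unavailable.
        row[f"lag_{lag}h"] = series.get(when - timedelta(hours=lag), math.nan)
    return row
```

Meteorological, spatial, and traffic features would be added to the same row for each grid cell before it is passed to the model.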