At the heart of the model lies the intelligent integration of different types of data, which together provide a comprehensive picture of air pollution across the city. In total, three main categories of data feed into the model:
- Historical data – past measurements of air pollutants such as nitrogen dioxide (NO₂) and particulate matter (PM₁₀, PM₂.₅), along with weather conditions and traffic data. These provide the foundation for the model to recognise and learn the relationships between variables.
- Current, time-varying data – snapshots of traffic data and the latest weather information available in real time or with a short delay. These allow the model to respond to the present situation and improve short-term predictions.
- Spatially variable data – information that reflects how the city is structured, including land use (e.g. green spaces, residential areas), building density, and the layout of the road network. The impact of these factors on air quality varies by location.
To consolidate this information, the model draws on a variety of sources, including:
- BLUME, the Berlin Air Quality Monitoring Network (available in German only), providing accurate measurements of air pollution levels at numerous locations across the city,
- weather forecasts from Germany’s Meteorological Service (DWD),
- Copernicus CAMS, a European programme sharing environmental data on atmospheric conditions at continental and global scales,
- traffic data from Berlin’s viz.berlin.de platform, offering detailed insights into traffic volumes and vehicle movements,
- urban structure data from the Geoportal Berlin, including information on urban development, road layouts, and land use, and
- even school holiday schedules, as these are known to greatly influence traffic volumes and, in turn, air quality.
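
To make the three data categories more concrete, the following minimal sketch shows one way they could be merged into a single row of model inputs for one location and time. It is an illustration only, not the model's actual implementation: the class names, field names, units, prefixes, and the `build_feature_row` helper are all assumptions made for this example.

```python
from dataclasses import asdict, dataclass


# All names, fields and units below are hypothetical, chosen for illustration.
@dataclass
class HistoricalRecord:
    """Past pollutant measurements at a location (assumed unit: µg/m³)."""
    no2: float
    pm10: float
    pm25: float


@dataclass
class CurrentConditions:
    """Latest time-varying inputs (assumed units: vehicles/h, m/s, °C)."""
    traffic_volume: float
    wind_speed: float
    temperature: float


@dataclass
class SpatialContext:
    """Location-specific urban structure (assumed shares between 0 and 1)."""
    green_space_share: float
    building_density: float
    school_holiday: bool


def build_feature_row(
    historical: HistoricalRecord,
    current: CurrentConditions,
    spatial: SpatialContext,
) -> dict:
    """Flatten the three categories into one dict, prefixing each field
    with its category so the origin of every value stays visible."""
    row = {}
    parts = (("hist", historical), ("now", current), ("loc", spatial))
    for prefix, part in parts:
        for name, value in asdict(part).items():
            row[f"{prefix}_{name}"] = value
    return row


if __name__ == "__main__":
    # Made-up example values, not real measurements.
    example = build_feature_row(
        HistoricalRecord(no2=30.0, pm10=20.0, pm25=12.0),
        CurrentConditions(traffic_volume=1500.0, wind_speed=3.2, temperature=18.0),
        SpatialContext(green_space_share=0.25, building_density=0.6, school_holiday=False),
    )
    for key, value in example.items():
        print(key, value)
```

Keeping the category prefix in each key is a design choice for readability; a real pipeline would more likely join tables from the listed sources by time and location.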