key: cord-329414-zueqafmn
authors: Mallet, Marc Daniel
title: Meteorological normalisation of PM(10) using machine learning reveals distinct increases of nearby source emissions in the Australian mining town of Moranbah
date: 2020-08-17
journal: Atmos Pollut Res
DOI: 10.1016/j.apr.2020.08.001
sha:
doc_id: 329414
cord_uid: zueqafmn

The impacts of poor air quality on human health are becoming more apparent. Businesses and governments are implementing technologies and policies in order to improve air quality. Despite this, the PM(10) air quality in the mining town of Moranbah, Australia, has worsened since measurements commenced in 2011. The annual average PM(10) concentrations during 2012, 2017, 2018 and 2019 all exceeded the Australian National Environmental Protection Measure's standard, and the frequency of exceedances of the daily standard has increased. The average annual increase in PM(10) was 1.2 ± 0.5 μg m⁻³ yr⁻¹ between 2011 and 2019 and has been 2.5 ± 1.2 μg m⁻³ yr⁻¹ since 2014. The cause of this has not previously been established. Here, two machine learning algorithms (gradient boosted regression and random forest) have been implemented to model and then meteorologically normalise PM(10) mass concentrations measured in Moranbah. The best performing model, using the random forest algorithm, was able to explain 59% of the variance in PM(10) using a range of meteorological, environmental and temporal variables as predictors. After normalising for these factors, an increasing trend of 0.6 ± 0.5 μg m⁻³ yr⁻¹ since 2011 and 1.7 ± 0.3 μg m⁻³ yr⁻¹ since 2014 was found. These results indicate that more than half of the increase in PM(10) is due to a rise in local emissions in the region.
The remainder of the rise in PM(10) was found to be due to a decrease of soil water content in the surrounding region, which can facilitate higher dust emissions. Whether the presence of open-cut coal mines exacerbated the role of soil water content is unclear. Although fires can have drastic effects on the local air quality, changes in fire patterns are not responsible for the rising trend. PM(10) composition measurements or more detailed data relating to local sources are still needed to better isolate these emissions. Nonetheless, this study highlights the need and potential for action by industry and government to improve the air quality and reduce health risks for the nearby population.

[...] ods are yet to be realised in Australia and across the southern hemisphere (Rybarczyk and Zalakeviciute, 2018). Ensemble machine learning methods, such as random forests (Breiman, 2001) or gradient boosted regression (Freund and Schapire, 1995; Friedman, 2001), use a range of predictor variables and an ensemble of decision trees to make predictions. They offer a considerable advantage over other machine learning methods such as neural networks because the relationships between the predictor variables and the predicted variable can be fully interpreted (Fuller and Font, 2019). Furthermore, both numeric and categorical predictor variables can be used, which allows complex systems such as regional synoptic conditions or air mass origins to be [...]

Residents of Moranbah have reportedly been concerned with the high levels of dust appearing in households but, to date, a comprehensive investigation of trends and drivers of air quality in the township has not been done. The objective of this study is to exploit the recent advances in machine learning to investigate the trends in PM10 in Moranbah and assess the impact of changes in local industrial actions on air quality using open-access datasets and techniques.
The primary intent of this study is therefore to provide local and state governments, as well as industry, a starting point to assess how changes in industrial development, residential growth or modes of employment might influence the air quality to inform future policies or procedures. The secondary intent is to establish a methodology for this meteorological normalisation that accounts for the influence of nearby fires, which are an important source of particulate matter in the Australian dry season, as well as other environmental factors such as soil water content. This study will therefore provide an updated meteorological normalisation technique that can then be applied to the numerous datasets of long-term monitoring of air quality across Australia.

A K-means clustering analysis was then applied to the 6-hourly trajectories using the openair R package (Carslaw and Ropkins, 2012) (see Figure 3). This was done using both the Euclidean distance and the angular distance [...]

The random forest modelling was performed using the rmweather R package [...]

[...] forest machine learning techniques is that the partial dependencies between the predictor variables and predictand can be investigated. This is done by randomly sampling all but one of the predictor variables, one at a time. Exploiting this allows for the influence of each predictor variable on PM10 to be isolated.

It is difficult to identify the reason that the random forest algorithm outperformed the gradient boosted regression in this study. Even though a wide range of hyperparameters were tested with both models, the random forest models were able to explain more than 10% more of the variance in PM10 than the gradient boosted regression. One possible reason is that the random forest models were less prone to over-fitting than the gradient boosted regression for this data set.
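The partial-dependence procedure described above (hold one predictor at a chosen value, randomly sample the remaining predictors, and average the predictions) can be sketched in a few lines. This is a minimal illustration only, not the study's code: `toy_model` is a hypothetical stand-in for a trained random forest, and the predictor names (`temperature`, `soil_water`, `wind_speed`), coefficients and synthetic observations are all assumptions made for the example.

```python
import random
from statistics import mean


def partial_dependence(predict, rows, feature, grid, n_samples=200, seed=0):
    """Average prediction at each grid value of `feature`.

    The remaining predictors are drawn at random (with replacement) from the
    observed rows, so their influence averages out of the curve.
    """
    rng = random.Random(seed)
    # Draw the background sample once so that every grid value is evaluated
    # against the same mix of the other predictors.
    background = [rng.choice(rows) for _ in range(n_samples)]
    curve = {}
    for value in grid:
        preds = [predict({**row, feature: value}) for row in background]
        curve[value] = mean(preds)
    return curve


def toy_model(row):
    """Hypothetical stand-in for a trained model (NOT the paper's model)."""
    dryness = max(0.0, 0.4 - row["soil_water"])
    return 20.0 + 0.5 * row["temperature"] + 30.0 * dryness + 1.5 * row["wind_speed"]


# Synthetic observations; the ranges are illustrative assumptions only.
_rng = random.Random(42)
observations = [
    {
        "temperature": _rng.uniform(5.0, 40.0),
        "soil_water": _rng.uniform(0.05, 0.45),
        "wind_speed": _rng.uniform(0.0, 12.0),
    }
    for _ in range(500)
]

temperature_curve = partial_dependence(
    toy_model, observations, "temperature", grid=[10.0, 20.0, 30.0]
)
soil_curve = partial_dependence(
    toy_model, observations, "soil_water", grid=[0.1, 0.4]
)

for value, pm10 in temperature_curve.items():
    print(f"temperature={value:.0f} -> mean predicted PM10 = {pm10:.1f}")
```

Because the same background sample is reused for every grid value, differences along the curve reflect only the predictor of interest. In practice the `predict` callable would wrap the fitted model, and the same resampling idea, applied to the weather predictors while keeping the time variable fixed, underlies meteorological normalisation.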
The optimal random forest model (R² = 0.59, RMSE = 19.5) was [...] given it was also computationally much faster than a much higher number of trees. The optimal random forest model and gradient boosted regression model for [...]

This partial dependence gives an indication of the meteorologically normalised trend, which will be discussed later.

Temperature was an important predictor variable for the predicted PM10. There are many ways that temperature can influence air quality, from changes [...]

[...] indicates that winds from the north-east and south are responsible for the highest PM10 concentrations. This will be discussed further in the next section. The air mass backwards trajectory was not an influential variable on the predicted PM10, giving strong evidence that local, rather than regional or [...]

The scope of this study was to explore the alarming increase in PM10 to above safe levels in Moranbah. Beyond the local and regional area surrounding [...]

All raw data are available from free, publicly available sources that are outlined in the methods. All code that is used to load, clean, analyse and visualise data, as well as the prepared dataset used for gradient boosted regression and random forest modelling, is available on the public GitHub repository, https://github.com/marc-mallet/moranbah p m10.

Towards the development of a low cost airborne sensing system to monitor dust particles after blasting at open-pit mine sites
Can land use intensification in the Mallee, Australia increase the supply of soluble iron to the Southern Ocean? Scientific Reports
Unprecedented smoke-related health burden associated with the 2019-20 bushfires in eastern Australia
Random forests
Air pollution and health. The Lancet
deweather: Remove the influence of weather on air quality data
openair - an R package for air quality data analysis. Environmental Modelling Software 27-28
Copernicus Climate Change Service (C3S): ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate
Source apportionment of PM2.5 and PM10 aerosols in Brisbane (Australia) by receptor modelling
Apportionment of sources of fine and coarse particles in four major Australian cities by positive matrix factorisation
Influence of the 23 October 2002 dust storm on the air quality of four Australian cities
Mortality and morbidity in populations in the vicinity of coal mining: a systematic review. BMC Public Health 18
A review on the importance of metals and metalloids in atmospheric dust and aerosol from mining operations
Emission factors of trace gases and particles from tropical savanna fires in Australia
A decision-theoretic generalization of on-line learning and an application to boosting
Greedy function approximation: a gradient boosting machine
Keeping air pollution policies on track
Effect of moisture on fine dust emission from tillage operations on agricultural soils
Sentinel Hotspot
Characteristics of hazardous airborne dust around an Indian surface coal mining area. Environmental Monitoring and [...]
Using meteorological normalisation to detect interventions in air quality time series. Science of The Total Environment 653
Random forest meteorological normalisation models for Swiss PM10 trend analysis
Marine aerosol at southern mid-latitudes
gbm: Generalized boosted regression models
Organic aerosol formation from the oxidation of biogenic hydrocarbons
Characterisation of the impact of open biomass burning on urban air quality in Brisbane
Air pollution emissions 2008-2018 from Australian coal mining: Implications for public and occupational health
splitr: Use the HYSPLIT model from inside R
Long-term trends in PM2.5 mass and particle number concentrations in urban air: The impacts of mitigation measures and extreme events due to changing climates
Evaluation of interventions to reduce air pollution from biomass smoke on mortality in Launceston, Australia: retrospective analysis of daily mortality
The NCEP/NCAR 40-year reanalysis project
When smoke comes to town: The impact of biomass burning smoke on air quality
Quantification of secondary organic aerosol in an Australian urban location
[...] Journal of Occupational and Environmental Medicine/American College of [...]
Ambient particulate air pollution and daily mortality in 652 cities
A bagging-GBDT ensemble learning model for city air pollutant concentration prediction
Biomass burning emissions in north Australia during the early dry season: an overview of the 2014 SAFIRED campaign
Biomass burning emissions over northern Australia constrained by aerosol measurements: I-Modelling the distribution of hourly emissions. Atmospheric Environment 42
The ambient aerosol characterization during the prescribed bushfire season in Brisbane 2013
Diurnal variation of PM10 concentrations and its spatial distribution in the South East Queensland airshed
Effects of bushfire smoke on daily mortality and hospital admissions in Sydney
Australian Government Department of the Environment and Energy
Coal mine dust lung disease in the modern era
Origin, transport and deposition of aerosol iron to Australian coastal waters. Atmospheric Environment
Mining developments and social impacts on communities: Bowen Basin case studies
Queensland Government Statistician's Office. URL: www.qgso.qld.gov
Queensland Government Air Quality Monitoring
R: A language and environment for statistical computing
Size-resolved mass and chemical properties of dust aerosols from Australia's Lake Eyre Basin
Impact of smoke from biomass burning on air quality in rural communities in southern Australia
Characterization of particulate emissions from Australian open-cut coal mines: Toward improved emission estimates
Health effects of particulate air pollution: a review of epidemiological evidence
Machine learning approaches for outdoor air quality modelling: A systematic review
bomrang: fetch Australian Government Bureau of Meteorology data in R
Analysis and interpretation of particulate matter - PM10, PM2.5 and PM1 emissions from the heterogeneous traffic near an urban roadway
NOAA's HYSPLIT atmospheric transport and dispersion modeling system
Assessing the impact of clean air action on air quality trends in Beijing using a machine learning technique
Changing supersites: Assessing the impact of the southern UK EMEP supersite relocation on measured atmospheric composition
Severe air pollution events not avoided by reduced anthropogenic activities during COVID-19. Conservation and Recycling 158
Homelessness in rural and regional Queensland mining communities. Parity 30
WHO Air quality guidelines for particulate matter, ozone, nitrogen dioxide and sulfur dioxide: global update 2005: summary of risk assessment
Welcome to the tidyverse
Extending the Kolmogorov-Zurbenko filter: application to ozone, particulate matter, and meteorological trends
Meteorologically adjusted urban air quality trends in the southwestern United States
ranger: A fast implementation of random forests for high dimensional data in C++ and R
Testing and dating of structural changes in practice
Significant changes in chemistry of fine particles in wintertime Beijing from 2007 to 2017: Impact of clean air actions. Environmental Science & Technology

Highlights for "Meteorological normalisation of PM10 using machine learning reveals distinct increases of nearby source emissions in a mining town" by Mallet, 2020:
• PM10 concentrations are rising by 1.2 μg/m³ per year in Moranbah, Australia
• Machine learning methods can account for the influence of meteorology on air quality
• Meteorologically normalised PM10 shows rising source emissions from mining activity and drying soil

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of interests
☐ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☒ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: The author has family members who live in the town that is the focus of this study and are employed in local industry. These personal relationships did not influence the analyses or discussion presented in this study in any way.