key: cord-0731537-bopdmtks authors: Wang, Meng; Chen, Huichao; Lei, Mei title: Identifying potentially contaminated areas with MaxEnt model for petrochemical industry in China date: 2022-03-18 journal: Environ Sci Pollut Res Int DOI: 10.1007/s11356-022-19697-8 sha: 0729d5562a6031fe80c9cfb20a9cc050ee74afa9 doc_id: 731537 cord_uid: bopdmtks
Heavy metals and organic pollutants are present in the wastewater effluents, flue gases, and solid wastes of the petrochemical industry, so improper discharges can threaten the ecological environment and human health. Knowing the regional distribution of contaminated sites supports pollution control. This study explored the relationship between petrochemical contaminated areas and natural, socio-economic, and traffic factors. Ten indicators were selected as input variables, and the MaxEnt model was used to identify potentially contaminated areas. Moreover, among these 10 variables, the factors with the greatest impact on the results were determined according to the contribution of each variable. The results showed that the MaxEnt model performed well, with an AUC of 0.981 ± 0.004, and 90% of the measured contaminated sites were located in areas predicted to have a medium or high probability of contamination. The map of potentially contaminated areas indicated that areas with a high probability of contamination were distributed in the Yangtze River Delta, Beijing, Tianjin, southern Guangdong, Fujian coastal areas, central Hubei and northeast Hunan, central Sichuan, and southwest Chongqing. The response curves indicated that a high probability of petrochemical contamination tended to appear in cities with a developed economy, dense population, and convenient transportation. This study presents a novel way to identify potentially contaminated areas for petrochemical sites and provides a theoretical basis for formulating future management strategies.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11356-022-19697-8.
The petrochemical industry processes petroleum fractions and natural gas into petroleum and chemical products through complex processes (Liu et al. 2011). As a pillar industry worldwide, it has greatly promoted economic development (Fan et al. 2015). However, as a major source of organic and inorganic toxic pollutants, it also poses a great threat to the environment and human health (Gonzalez et al. 2021; Jephcote et al. 2020; Lin et al. 2021; Wu et al. 2016). During the early industrial period, outdated technology and equipment, together with a lack of proper operating practices, severely degraded the soil environment. To improve this situation, studies were conducted to optimize production technology and waste treatment processes (Abilov et al. 1999; Di Fabio et al. 2013; Muller and Craig 2016; Rejowski et al. 2009). Recently, digital modeling has been adopted to optimize process control systems, and machine learning methods have been applied to improve production safety and product quality based on industrial big data (Han et al. 2022; Pariyani et al. 2010; Wu et al. 2022). In addition to technological breakthroughs, identifying the spatial distribution of potentially contaminated areas is important for the prevention and remediation of soil contamination. Many studies have addressed soil contamination and risk assessment at single sites. These reports mainly concern pollutant concentrations (Han et al. 2020; Nadal et al. 2004), the extent of contamination (Zhang et al. 2013, 2014), and the human health and ecological risks at a specific site (Kim et al. 2001; Rovira et al. 2014).
However, few investigations have been conducted to proactively identify potentially contaminated areas at a national scale (Liu et al. 2010; Teng et al. 2015; Zhang et al. 2014). It is important to identify the areas with a high probability of contamination and to provide a basis for formulating future management strategies (Nadal et al. 2006; Wang et al. 2020b). In this work, a niche model was introduced to identify potentially contaminated areas at a large scale. A niche model is an effective tool for identifying areas suitable for a species, and it provides a quantitative framework for describing the relationship between environmental characteristics and geographical distribution (Sillero 2011; Soberon and Nakamura 2009). The MaxEnt model is a niche model based on maximum entropy theory, developed by Phillips and colleagues in 2004 (Phillips et al. 2006). Given a set of environmental constraints, the MaxEnt model finds the most probable distribution of a species in the study area (Elith et al. 2011). Recently, MaxEnt has been widely applied to the spatial distribution of species such as Gentiana rigescens (Shen et al. 2021), potatoes (Wang et al. 2021a), and antelopes (Wang et al. 2021a). MaxEnt has also been used to map the spatial distribution of other phenomena, such as foot-and-mouth disease (Gao and Ma 2021), dengue fever (Li et al. 2017), COVID-19 (Ren et al. 2020), and energy system sites (Tekin et al. 2021). These results show that MaxEnt can identify and predict the spatial distribution of a wide range of research objects. Essentially, the principle of maximum entropy frames the problem in terms of information entropy: among all probability distributions that satisfy a given set of constraints, the one with the maximum entropy is taken as the estimate of the target distribution. Moreover, an obvious advantage of this model is that it can give accurate results from a limited amount of data.
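The maximum entropy principle described above can be illustrated with a minimal sketch. This is not the MaxEnt software and not the authors' workflow: the five grid cells, the single scaled covariate, the target mean, and the function name are all hypothetical. The sketch finds the distribution of maximum entropy whose expected covariate value matches a target; that distribution has the exponential form p_i ∝ exp(λ f_i), and λ is found here by bisection.

```python
import numpy as np

def maxent_distribution(feature, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution over cells subject to E_p[feature] = target_mean.

    The solution has the form p_i proportional to exp(lam * feature_i).
    The expected feature value increases monotonically with lam,
    so bisection on lam is sufficient.
    """
    f = np.asarray(feature, dtype=float)
    if not f.min() < target_mean < f.max():
        raise ValueError("target_mean must lie strictly inside the feature range")

    def gibbs(lam):
        z = lam * f
        w = np.exp(z - z.max())  # subtract the max for numerical stability
        return w / w.sum()

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gibbs(mid) @ f < target_mean:
            lo = mid
        else:
            hi = mid
    return gibbs(0.5 * (lo + hi))

# Hypothetical example: one scaled covariate on five grid cells
feature = [0.0, 0.25, 0.5, 0.75, 1.0]
p = maxent_distribution(feature, target_mean=0.7)
```

When the target equals the plain average of the covariate (0.5 here), the constraint carries no information and the result is the uniform distribution, which is the behavior the principle is meant to guarantee.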
So far, the application of MaxEnt to estimating the probability of contamination has not been reported. Therefore, MaxEnt was applied to identify potentially contaminated areas of the petrochemical industry in China, and the probability distribution of contaminated areas was assumed to be related to natural, socio-economic, and traffic factors. The main purposes of this study were as follows: (1) to present the probability distribution of contaminated areas in the petrochemical industry; (2) to explore the quantitative relationship between natural, socio-economic, and traffic factors and the spatial distribution of potentially contaminated areas; (3) to reveal the thresholds of factors in areas with a high probability of contamination; (4) to provide a basis for the better development of the petrochemical industry. Additionally, this method can be extended to other industries, and the probability distribution of soil contamination can be obtained by superimposing the potentially contaminated areas of all industries in the study area.
China is one of the most important producers and consumers of petrochemical products in the world (Wang et al. 2020b; Zhang et al. 2009). The petrochemical industry accounts for 20% of the total industrial economy in China (Wang et al. 2020b). The study area covered mainland China. One hundred fifty contaminated sites of the petrochemical industry in China were collected from the official websites of ecological environment bureaus at all levels (Fig. 1). Only one occurrence within 10 km was kept to reduce the correlation between points (Kong et al. 2021). The spatial clusters of localities were eliminated by ENMtools in ArcGIS 10.2. As a result, 100 records of contaminated sites were retained for analysis. These records were exported to CSV format with Excel and used in the MaxEnt software as the input describing the actual distribution of contaminated sites.
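The rule of keeping only one occurrence within 10 km can be sketched as follows. The study performed this step with a toolbox in ArcGIS, so the greedy rule below is a swapped-in stand-in under stated assumptions: the function names and the example coordinates are hypothetical, distances are great-circle distances, and the result depends on the order of the input points.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def thin_occurrences(points, min_dist_km=10.0):
    """Greedily keep each point that is at least min_dist_km from every kept point."""
    kept = []
    for p in points:
        if all(haversine_km(p, q) >= min_dist_km for q in kept):
            kept.append(p)
    return kept

# Hypothetical (lon, lat) records: the first two are about 2 km apart
sites = [(118.78, 32.06), (118.80, 32.07), (121.47, 31.23)]
thinned = thin_occurrences(sites)
```

In this example the second record is dropped because it lies within 10 km of the first, and the third record is kept.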
Ten variables were considered in the model, including natural variables (Nat1-3), socio-economic variables (Soc1-4), and traffic variables (Tra1-3) (Table 1). The natural variables and two of the socio-economic variables, gross domestic product (Soc1) and population density (Soc2), were collected from the Resource and Environment Science and Data Center (http://www.resdc.cn/). The vector data for the traffic variables and the other two socio-economic variables, distance to residential area (Soc3) and distance to residential point (Soc4), were collected from the National Fundamental Geographic Information System (http://www.ngcc.cn/); the distances were calculated as Euclidean distances and converted into raster grids in ArcGIS 10.2. Details of the variables are given in Table 1. All variables were resampled and converted to ASCII raster grids at a resolution of 1 km × 1 km (Wei et al. 2021). Pearson's correlation analysis of the input variables was conducted in ArcGIS 10.2, and the absolute values of all correlation coefficients were less than 0.8 (Fig. 2). Therefore, all 10 variables were retained as model inputs (Su et al. 2021; Yang et al. 2013).
In this study, MaxEnt was used to map the potentially contaminated areas of the petrochemical industry and to investigate the relationship between their spatial distribution and the variables. The flowchart of the modelling process is shown in Figure S1. Seventy-five percent of the collected data were randomly selected as training data, and the remaining 25% were used as testing data (Guerra-Coss et al. 2021; Shabani et al. 2020). To ensure stability, the model was run with 10 replicates, and the final output was the average of the 10 replicates (Rodriguez-Basalo et al. 2021; Yadav et al. 2021). The receiver operating characteristic (ROC) curve was used to evaluate model performance (Manzoor et al. 2021).
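The screening and evaluation steps above can be sketched in a few lines. This is an illustration under assumptions, not the ArcGIS or MaxEnt implementation: the function names are hypothetical, the correlation check expects a table with one column per variable, and the AUC is computed with the rank-based definition (the probability that a contaminated site scores higher than a background point, with ties counted as one half).

```python
import numpy as np

def max_abs_correlation(X):
    """Largest absolute pairwise Pearson correlation among the columns of X."""
    r = np.corrcoef(X, rowvar=False)
    off_diag = ~np.eye(r.shape[0], dtype=bool)
    return float(np.abs(r[off_diag]).max())

def random_splits(n_records, n_rep=10, test_frac=0.25, seed=0):
    """Yield (train_idx, test_idx) for n_rep independent random splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(n_records * test_frac))
    for _ in range(n_rep):
        perm = rng.permutation(n_records)
        yield perm[n_test:], perm[:n_test]

def auc_rank(pos_scores, neg_scores):
    """Rank-based AUC: P(positive score > negative score), ties counted as 0.5."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

# With 100 records, each replicate uses 75 for training and 25 for testing
splits = list(random_splits(100))
```

In a full workflow, a model would be fitted on each training subset, `auc_rank` would be applied to the predicted scores of the held-out sites and of background points, and the 10 test AUC values would be summarized as mean ± standard deviation.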
The area under the ROC curve (AUC) of the testing set, which ranges from 0 to 1, was calculated in the MaxEnt software. An AUC close to 1 represents near-perfect prediction, while an AUC of 0.5 or below indicates poor performance (Wang et al. 2021b). Model performance was divided into five levels by AUC value: poor (0.5-0.6), fair (0.6-0.7), good (0.7-0.8), very good (0.8-0.9), and excellent (0.9-1.0) (Li et al. 2017). To assess the importance of each variable in identifying potentially contaminated areas, the percent contribution of the variables was evaluated. Moreover, response curves were used to show the relationship between the factors and the probability distribution.
In this study, the MaxEnt model showed excellent performance, with an AUC of 0.981 ± 0.004, in the identification of potentially contaminated areas. To describe the spatial distribution of potentially contaminated areas clearly, the predicted probability was divided into three levels according to the natural breaks method: low, medium, and high probability of contamination. The probability values of the 100 samples are summarized in Table 2. Ninety percent of the samples were in areas with a medium or high probability of contamination, and only 10% were in areas with a low probability of contamination. This also indicated that the MaxEnt model performed well in identifying potentially contaminated areas for petrochemical sites. The cities identified as areas with a medium or high probability of contamination require more attention to the soil contamination caused by industrial development.
Figure 3 shows the spatial distribution of potentially contaminated areas of petrochemical sites in China. A high probability of contamination occurred in the Yangtze River Delta, Beijing, Tianjin, southern Guangdong, Fujian coastal areas, central Hubei and northeast Hunan, central Sichuan, and southwest Chongqing. Comparing the map of current petrochemical enterprises (Figure S1) with Fig. 3 shows that potentially contaminated areas often coincided with areas of dense petrochemical enterprises. This indicated that the result was reasonable and consistent with the industrial distribution.
The contributions of the variables showed that Soc1 (48.7% contribution) was the most relevant factor in identifying potentially contaminated areas, followed by Soc3 (25.8%), Soc2 (10.2%), Nat3 (7.2%), and Tra1 (3.1%). The socio-economic variables were the most relevant, followed by the natural and traffic variables.
To eliminate the influence of correlation among variables and to further explore the relationship between the input factors and potentially contaminated areas, single-factor modeling was performed in the MaxEnt software, and the response curves between the contamination probability and each factor were plotted. The purpose was to find the threshold values of the variables, so as to formulate prevention and management policies for areas with a high probability of contamination (Seaborn et al. 2021). The response curves are shown in Fig. 4. According to the response curves, the probability of contamination first increased and then decreased with increasing rainfall and temperature, and it was highest when the rainfall was 757 to 2318 mm and the temperature was 13 to 21 °C. The soil types in areas with a high probability of contamination were yellow brown soil, cinnamon soil, lou soil, acid rocky soil, moisture soil, seashore saline soil, paddy soil, and dewatering paddy soil. The probability of contaminated site occurrence increased with rising gross domestic product (Xie et al. 2012). This showed that the soil contamination of petrochemical sites tended to occur in economically developed areas.
Fig. 4 Response curves of input factors based on MaxEnt (the soil type (Nat3) codes are shown on the website of the Resource and Environment Science and Data Center (http://www.resdc.cn/))
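One simple way to read a threshold range off a response curve is sketched below. The text does not specify the extraction procedure beyond reading parameter values at different probability levels, so the cutoff, the function name, and the curve values here are hypothetical; the sketch also assumes a single contiguous high-probability interval, as with the unimodal rainfall and temperature curves.

```python
import numpy as np

def high_probability_range(x, prob, cutoff):
    """Return (x_min, x_max) over sampled x values where prob >= cutoff, or None."""
    x = np.asarray(x, dtype=float)
    prob = np.asarray(prob, dtype=float)
    mask = prob >= cutoff
    if not mask.any():
        return None
    return float(x[mask].min()), float(x[mask].max())

# Hypothetical unimodal response curve (not values from the study)
x = [0, 500, 1000, 1500, 2000, 2500]
prob = [0.1, 0.4, 0.7, 0.8, 0.6, 0.2]
interval = high_probability_range(x, prob, cutoff=0.5)
```

For monotonic curves such as distance to railway, the same function returns a range whose lower or upper end coincides with the edge of the sampled values, which corresponds to a one-sided threshold.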
The probability of contaminated site occurrence increased sharply with increasing population density (Lv and Yu 2018). Moreover, the probability decreased sharply with increasing distance to residential areas and residential points. Additionally, traffic factors were important for the spatial distribution of contaminated sites. The probability of contamination decreased as the distance to roads, railways, and waterways increased. The parameter values at different levels were extracted, and the thresholds of the factors are described in Table 4. Attention should be paid to areas where the factors fall within these thresholds, because soil contamination is likely to occur there with high risk to human health.
The contributions of the input variables showed that the leading factors in the identification of potentially contaminated areas were Soc1, Soc3, Soc2, Nat3, and Tra1, which together accounted for 94.9% of the cumulative contribution rate. Considering this, the MaxEnt model was re-established with these five factors as inputs, and its performance was evaluated. The re-established model achieved an AUC of 0.979 ± 0.003 in the identification of potentially contaminated areas. This illustrated that potentially contaminated areas of the petrochemical industry can also be well identified with these five factors. In particular, a close relationship was found between the spatial distribution of potentially contaminated areas and socio-economic conditions. In China, all seven world-class petrochemical industry bases are located in coastal areas with a developed economy and dense population, and the midstream and downstream chemical industries rely on the oil refining industry, so the density of petrochemical enterprises decreases from the east coast to the western inland. Therefore, socio-economic factors played an important role in the distribution of petrochemical soil contamination (Wang et al. 2020c). The limited effect of Soc4 (the distance to residential points (yurts, grazing sites and ordinary houses)) may be because this variable differs little between the eastern region and the central region. Similarly, the transportation of petrochemical products is essential for the industry and is normally carried out by sea and by railway. Sea transportation is suitable for international trade and for long-distance transportation between domestic coastal areas, while railway is an important means of land transportation. Tra1 was therefore important for the distribution of petrochemical sites and thus of the potentially contaminated areas (Zou and Duan 2019). Additionally, soil types vary considerably among regions. Moreover, soil types affect the migration and transformation of pollutants in soil (Sukarjo et al. 2019; Yang et al. 2014). Therefore, these factors (Soc1, Soc3, Soc2, Nat3, and Tra1) played a decisive role in the identification of potentially contaminated areas and need special attention when formulating management strategies.
The contributions of the input variables showed that GDP (Soc1) was the most relevant factor, and Fig. 3 shows that the potentially contaminated areas of the petrochemical industry were mainly distributed in developed areas (Wang et al. 2020d), while there were few petrochemical enterprises in Western China, which has vast land, a sparse population, a less developed economy, and underdeveloped transportation. Therefore, the soil contamination of petrochemical sites needs more attention in developed areas. Jiangsu is a typical developed region. As shown in Fig. 5, the probability of soil contamination of petrochemical sites in Jiangsu was generally high, especially in the region along the Yangtze River (Jia et al. 2020, 2021; Wang et al. 2020a).
Therefore, developed regions like Jiangsu should be given priority in pollution research and risk management for the petrochemical industry (Qiu et al. 2019). Petrochemical sites are mainly distributed in developed areas, which is determined by socio-economic factors (Xie et al. 2012). To reduce the soil burden and control the risk caused by the petrochemical industry, small-scale petrochemical enterprises with serious pollution can be relocated away from developed areas. In areas whose natural conditions make pollution more likely, stricter access requirements for the petrochemical industry should be implemented. At the same time, the petrochemical industry should further promote its green transformation, phase out outdated production capacity with high energy consumption and heavy pollution discharge, and increase the proportion of environment-friendly green products (Tantisattayakul et al. 2016).
Although the dataset of contaminated petrochemical sites was limited, the performance of the MaxEnt model in identifying potentially contaminated areas was still excellent. Because of limited data availability, only a few factors are discussed in this study. If more detailed data become available, the methodology can be extended in the following ways: (1) Additional factors that may affect the probability distribution of contaminated areas, such as policies and pollution emissions, can be explored. (2) This paper explored only the probability of regional petrochemical pollution, not the degree and density of contamination. If the degree of contamination of the samples is available, the severity and density could be investigated in combination with a weight matrix that denotes the degree of contamination. In this study, the petrochemical industry was taken as an example to present the spatial distribution of potentially contaminated areas and to explore the important factors in identifying contaminated areas with the MaxEnt model.
This method can also be used to analyze other high-pollution industries, or to combine multiple industries and analyze the superposition of contamination.
In this study, a novel method based on the MaxEnt model was proposed to identify potentially contaminated areas for the petrochemical industry and to reveal the thresholds of the factors. The MaxEnt model performed well, with an AUC of 0.981 ± 0.004, for the spatial distribution of soil contamination caused by petrochemical activities. (1) The areas with a high probability of contamination tended to be located in developed regions and were distributed in the Yangtze River Delta, Beijing, Tianjin, southern Guangdong, Fujian coastal areas, central Hubei and northeast Hunan, central Sichuan, and southwest Chongqing. (2) Among the factors explored in this study, the socio-economic variables were the most relevant for identifying potentially contaminated areas, followed by the natural factors and the traffic factors. Gross domestic product, distance to residential area, population density, soil type, and distance to railway accounted for 94.9% of the cumulative contribution rate. (3) Gross domestic product, distance to residential area, population density, soil type, and distance to railway were used as inputs to re-establish the MaxEnt model. The results showed that the MaxEnt model based on these five factors performed well, with an AUC of 0.979 ± 0.003, for the potentially contaminated areas of the petrochemical industry. (4) The thresholds of the factors were as follows: Soc1 > 2650 (unit: 1000 yuan/km²); Soc3 < 3105 m; Soc2 > 578 people/km²; Nat3: yellow brown soil, cinnamon soil, lou soil, acid rocky soil, moisture soil, seashore saline soil, paddy soil, and dewatering paddy soil; Tra1 < 6320 m. Attention should be paid to soil contamination caused by petrochemical activities in areas where the factors fall within these thresholds.
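The thresholds listed in conclusion (4) can be turned into a simple screening check, sketched below. The threshold values and soil types are those reported above; the function name, the argument names, and the idea of reporting one flag per factor are illustrative assumptions, and the paper does not state whether all conditions must hold jointly.

```python
# Soil types reported for areas with a high probability of contamination (Nat3)
HIGH_RISK_SOILS = {
    "yellow brown soil", "cinnamon soil", "lou soil", "acid rocky soil",
    "moisture soil", "seashore saline soil", "paddy soil", "dewatering paddy soil",
}

def threshold_flags(gdp, dist_residential_m, pop_density, soil_type, dist_railway_m):
    """Return one boolean per factor: True if the value falls within the reported threshold.

    gdp is in units of 1000 yuan/km2, pop_density in people/km2, distances in metres.
    """
    return {
        "Soc1": gdp > 2650,
        "Soc3": dist_residential_m < 3105,
        "Soc2": pop_density > 578,
        "Nat3": soil_type.strip().lower() in HIGH_RISK_SOILS,
        "Tra1": dist_railway_m < 6320,
    }

# Hypothetical grid cell (values are illustrative, not from the study)
flags = threshold_flags(3000, 1000, 800, "Paddy soil", 5000)
n_within = sum(flags.values())
```

Counting the flags, rather than requiring all of them, leaves the choice of decision rule to the user; a cell meeting every threshold would be the most conservative definition of a priority area.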
Author contribution Meng Wang: conceptualization, data curation, investigation, methodology and writing-original draft. Huichao Chen: supervision, writing-review and editing. Mei Lei: project administration and funding acquisition. All the authors read and approved the final manuscript.
Funding National Key Research and Development Program of China (2018YFC1800104).
The datasets generated and/or analyzed during the current study are available on the website of National Fundamental Geographic Information System (http://www.ngcc.cn/) and Resource and Environment Science and Data Center (http://www.resdc.cn/).
Ethics approval and consent to participate Not applicable.
The authors declare no competing interests.
References
Optimization of oil-containing wastewater treatment processes
Optimization of membrane bioreactors for the treatment of petrochemical wastewater under transient conditions
A statistical explanation of MaxEnt for ecologists
Using LMDI method to analyze the influencing factors of carbon emissions in China's petrochemical industries
Novel transformer based on gated convolutional neural network for dynamic soft sensor modeling of industrial processes
Concentrations of arsenic and vanadium in environmental and biological samples collected in the neighborhood of petrochemical industries: a review of the scientific literature
Modelling and validation of the spatial distribution of suitable habitats for the recruitment of invasive plants on climate change scenarios: an approach from the regeneration niche
Investigation and assessment of pollution situation of soil and groundwater in abandoned petrochemical sites
Fault monitoring using novel adaptive kernel principal component analysis integrating grey relational analysis
A systematic review and meta-analysis of haematological malignancies in residents living near petrochemical facilities
Investigation of health risk assessment and odor pollution of volatile organic compounds from industrial activities in the Yangtze River Delta region
Insights into chemical composition, abatement mechanisms and regional transport of atmospheric pollutants in the Yangtze River Delta region, China during the COVID-19 outbreak control period
Efficacy of histopathology in detecting petrochemical-induced toxicity in wild cotton rats (Sigmodon hispidus)
Assessing the impact of climate change on the distribution of osmanthus fragrans using Maxent
Source identification and health risk assessment of persistent organic pollutants (POPs) in the topsoils of typical petrochemical industrial area in Beijing
Ecological niche modeling identifies fine-scale areas at high risk of dengue fever in the Pearl River Delta, China
Air pollution diffusion simulation and seasonal spatial risk analysis for industrial areas
Evaluation and optimization of petrochemical industrial spatial organization in China
Survey of soil and groundwater contamination in oil pollution site
Source identification and spatial distribution of metals in soils in a typical area of the lower Yellow River, eastern
Land use and climate change interaction triggers contrasting trajectories of biological invasion
Energy reduction for a dual circuit cooling water system using advanced regulatory control
Definition and GIS-based characterization of an integral risk index applied to a chemical/petrochemical area
Metal pollution of soils and vegetation in an area with petrochemical industry
Improving process safety and product quality using large databases
Maximum entropy modeling of species geographic distributions
Petrochemical and industrial sources of volatile organic compounds analyzed via regional wind-driven network in Shanghai
Sustain activities for real-time optimization models of ethylene plants
Early forecasting of the potential risk zones of COVID-19 in China's megacities
High resolution spatial distribution for the hexactinellid sponges Asconema setubalense and Pheronema carpenteri in the Central Cantabrian Sea
Asthma, respiratory symptoms and lung function in children living near a petrochemical site
Drivers of distributions and niches of North American cold-adapted amphibians: evaluating both climate and land use
Invasive weed species' threats to global biodiversity: future scenarios of changes in the number of invasive species in a changing climate
Assessing the impacts of climate change and habitat suitability on the distribution and quality of medicinal plant using multiple information integration: take Gentiana rigescens as an example
What does ecological modelling model? A proposed classification of ecological niche models based on their underlying methods
Niches and distributional areas: concepts, methods, and assumptions
Prediction of future natural suitable areas for rice under representative concentration pathways (RCPs)
The critical limit of cadmium in three types of soil texture with shallot as an indicator plant
Energy, environmental, and economic analysis of energy conservation measures in Thailand's upstream petrochemical industry
Selection of renewable energy systems sites using the MaxEnt model in the Eastern Mediterranean region in Turkey
Assessment of soil organic contamination in a typical petrochemical industry park in China
Interdecadal variation of potato climate suitability in China
Ozone pollution characteristics and sensitivity analysis using an observation-based model in Nanjing, Yangtze River Delta Region of China
Spatial distribution and assessment of the human health risks of heavy metals in a retired petrochemical industrial area, south China
Heavy metal contamination of urban topsoil in a petrochemical industrial city in Xinjiang
Simulating spatial change of mangrove habitat under the impact of coastal land use: coupling MaxEnt and Dyna-CLUE models
Spatial distribution and source analysis of heavy metals in soils influenced by industrial enterprise distribution: case study in Jiangsu Province
Chinese caterpillar fungus (Ophiocordyceps sinensis) in China: current distribution, trading, and futures under climate change and overexploitation
Production capacity assessment and carbon reduction of industrial processes based on novel radial basis function integrating multi-dimensional scaling
A heuristic approach for petrochemical plant layout considering steam pipeline length
County-scale distribution of polycyclic aromatic hydrocarbons in topsoil of the Yellow River Delta Region
Predicting impact of climate change on geographical distribution of major NTFP species in the Central India Region. Modeling Earth Systems and Environment
Nitrogen enrichment in runoff sediments as affected by soil texture in Beijing mountain area
Spatio-temporal variation in potential habitats for rare and endangered plants and habitat conservation based on the maximum entropy model
Distribution characteristics of volatile pollutants on wastewater treatment plant of petrochemical refinery
Distribution of petroleum hydrocarbons in soils and the underlying unsaturated subsurface at an abandoned petrochemical site
Spatial evolution of chemical industry and its influencing factors in the regions along the Yangtze River