Abstract
Over the years, humanity has deepened its understanding of its actions’ profound impacts on global temperatures. Beyond the rise in global temperatures driven by global warming, local biomes also significantly influence regional temperature variations, often due to changes in land use. Historically, climatologists have attributed local temperature increases to deforestation, urban development, and agricultural activities. Although these factors are well-known, there remains a need for methodologies to objectively quantify the impact of land use changes on temperature increases. This paper proposes an explanation for temperature increase utilizing advanced statistical models such as ARMAX and neural networks with explainable AI components aiming to quantify the effects of these land use changes on local temperature trends over several years. For this purpose, two cities were chosen to apply the methodology. São Félix do Xingu, in the Brazilian Amazon rainforest (a city that has experienced significant deforestation due to agriculture over the last decade), and Cajazeiras, in the Brazilian semi-arid region (a city that has had substantial urban development over the years). The Data for this analysis were sourced from the MapBiomas Brazilian database and temperature records collected by the Brazilian National Institute of Meteorology. Our approach offers insights into human activities’ climatic impact, giving weight to soil changes and providing a methodology for further interdisciplinary studies, potentially serving as a starting point for specialist systems across various domains.
Access provided by University of Notre Dame Hesburgh Library. Download conference paper PDF
Similar content being viewed by others
1 Introduction
The Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR6) [8] acknowledges that climate change is real, based on strong evidence such as declines in Arctic sea ice and other glaciers, thawing of permafrost, an increase in the frequency and severity of heat extremes accompanied by a decrease in cold extremes, a reduction in available water in dry seasons, and an increase in the occurrence of weather conditions that promote torrential rains or wildfires. Thus, global warming is one of the most significant concerns regarding climate change induced by modern society. Due to this phenomenon, practically all beings in the world are negatively affected, including people [1].
In this sense, although Greenhouse Gas (GHG) emissions are the most debated in the scientific community, land use and land cover change are also essential factors to be considered [9]. Humans have deforested vast areas to carry out their activities and utilize natural resources: extraction of natural resources, the establishment of farms, or expansion of urban sites [14]. Consequently, the most threatened areas have become ecological hotspots [11].
In the case of Brazil, and considering Myers’ classic definition [11], the Amazon region does not fit the hotspot criteria. Nevertheless, its rapid deforestation has raised significant concerns, as the collapse of this vegetation could have deleterious effects with global repercussions [6]. Hence, it is important to conduct studies in this area to demonstrate how anthropogenic alterations, encompassing changes in land, may be responsible for temperature fluctuations, aiming to scientifically support their preservation, as current policies on the subject seem not to have produced the desired effects [15].
Therefore, this study covers the period from 1990 to 2022 and utilized data on land occupation from the MapBiomas project [16] along with monthly temperature records from the Brazilian National Institute of Meteorology (INMET) in two different locations: Cajazeiras-PB and São Félix do Xingu-PA. The first city is Brazil’s Northeast region, with a semi-arid climate, while the second one is part of the Amazon rainforest biome. For both regions, the local station presented few data gaps and a significant contrast in land use and land cover change during the study period, with an increase in urban areas but primarily a large expansion of areas dedicated to agriculture and livestock, accompanied by a reduction in native forest.
In this vein, the main objective of the research is to gain insights from machine learning models on the effect of land use changes in local temperature. Since the exogenous variables are land occupation and the endogenous variable is temperature, an ARMAX model was used to perform regression on the temperature time series. Then, Random Forests and Multilayer Perceptron Networks (MPL) were trained with ARMAX outputs. Finally, by confronting the ARMAX model coefficients and XAI techniques for the machine learning models, we provide machine learning-powered quantification of the effects of urban, farming, and forest biome transformations on local temperature. The findings indicate a significant influence of urban and farming areas on increasing temperatures. However, each model provides a different perspective on the magnitude of the influence of each exogenous variable.
This work proposes a methodology to address the impacts of soil changes on local temperature. The remaining paper is organized as follows: Sect. 2 outlines related works on time series forecasting, focusing on temperature forecasting. Section 3 describes the data preparation steps. Section 4 presents the explainability methods applied in this work. Section 5 discusses the results achieved in the experiments. Section 6 provides the conclusions of the paper and suggests directions for further research.
2 Releated Works
Numerous studies have leveraged machine learning methods to enhance temperature prediction and downscaling across different geographical areas [2]. Cifuentes et al. [4] examined various machine learning approaches for temperature forecasting, emphasizing their critical role in planning and decision-making processes across governmental, industrial, and public sectors. This review noted that Deep Learning models generally exhibited lower error rates than conventional Artificial Neural Network structures, particularly at regional scales. At the same time, Support Vector Machines were favored worldwide for their simplicity and precision.
Hernanz et al. [7] expressed concerns about the extrapolation abilities of machine learning techniques when used for statistical downscaling of climate change projections. Through controlled experiments, they illustrated that widely used machine learning models, including artificial neural networks and support vector machines, often show limited performance under extrapolation conditions. This study stressed the importance of understanding the limitations inherent in applying machine learning to downscaling tasks.
Xinyi Li et al. [10] explored the efficacy of both machine learning (Long Short-Term Memory Networks and Support Vector Machines) and statistical (arithmetic ensemble mean and multiple linear regression) methods in downscaling daily temperatures in Ontario, Canada. They reported that machine learning and statistical methods effectively produced high-resolution daily temperature forecasts using multi-model ensembles, though challenges persisted in accurately predicting extreme temperature events.
While numerous studies highlight the utility of machine learning in temperature prediction and downscaling, there remains a significant gap in the literature, specifically concerning the impacts of urban expansion and deforestation on temperature increases. Recent works, such as those by [3, 5], and [13], focus on these areas, elucidating the relationship between local air temperatures and land use. Understanding and mitigating these impacts is crucial for developing strategies to mitigate the adverse effects of urbanization and land use changes on temperature trends.
Furthermore, research by Nascimento et al. [12] directly links temperature variations to land surface changes. This study utilized a network of thermometers spread throughout São Paulo-SP. It compared their readings with data from the LANDSAT 8 satellite’s TIRS sensor, affirming a correlation between air and surface temperatures. Notably, land cover, especially vegetated areas, significantly influences temperature, typically resulting in cooler conditions.
Building upon these insights, our study contributes to this field by proposing a methodology emphasizing soil data’s importance in temperature forecasting systems. Our approach demonstrates how changes in soil can be discerned through temperature variations over the years, providing a crucial understanding of how human land use impacts local temperatures. This methodology enhances our ability to predict temperature changes and deepens our understanding of the interplay between human activity and environmental conditions, paving the way for more informed urban and environmental planning.
3 Data Processing
3.1 Temperature Database: Cajazeiras-PB and São Félix do Xingu-PA Temperature Data
This article examines the average maximum temperatures for each month in the cities of Cajazeiras and São Félix do Xingu. The temperature data from Cajazeiras were collected from a weather station operated by the National Institute of Meteorology (INMET). This station (ID 82689) is situated in the São Gonçalo district, near Cajazeiras, within the municipality of Sousa-PB. The data from São Félix do Xingu were obtained from the INMET station with code 82688, located in the São Félix do Xingu Municipality, Pará State.
To mitigate the impact of seasonal patterns, both time series should undergo a 12-month differencing. However, to retain the long-term trend within the series, the differencing is conducted by comparing each month to the corresponding month in 1990. Consequently, the output yields two series of delta values relative to 1990, facilitating a more precise analysis of temperature changes over time while mitigating the effects of seasonal variability. Statistical tests such as Pettitt’s test were employed to analyze the behavior of the series. Pettitt’s test identified a significant change point in both series (CP) using a significance level of 0.05, indicating a lack of homogeneity. Figure 1 illustrates each differenced series with Pettitt’s u plotted over the series.
Soil Database: Cajazeiras-PB and Sã o Félix do Xingu-PA Soil Ocuppation. For the analysis of land use and cover, data were sourced from a Brazilian government platform called MapBiomas. This platform classifies each 30 \(\times \) 30 m plot of soil across Brazil using satellite imagery. In our study, the soil data for Cajazeiras and São Félix do Xingu were collected and treated as exogenous variables. The data from MapBiomas are collected annually; however, a specific transformation was required to integrate them with the monthly frequency of the temperature records. This transformation involved adjusting the temporal granularity of the MapBiomas data to align with the monthly frequency of the temperature records. Figures 2 and 3 depict the evolution of soil for Cajazeiras and São Félix do Xingu, respectively.
Another essential aspect of our methodology is the consideration of the relative frequency of each biome near each city. Our soil processing approach involves grouping pixels corresponding to each biome to form relative frequency data for each biome within the area of the cities. Subsequently, we employ linear interpolation to interpolate the annual data from MapBiomas, assigning data values for each month. Another important aspect of our data processing involves the classification of soil. While MapBiomas originally categorized over 30 different soil classes, its methodology allows these classes to be consolidated into five super-classes. These super-classes are utilized as exogenous variables in the experiments detailed in this paper. The classes were grouped into super-classes as defined in Table 1.
4 Fiting Model and Explainability Approach
Following the methodology described by Andrade et al. [5], we employed an ARIMAX model, initiating training fifty steps prior to the change point (CP) value to enhance our analysis of predictions near the change point identified by Pettit’s test. This approach was chosen to explore which exogenous variables exert the most influence near the change point. To determine the optimal parameters for the ARIMAX model, we conducted a grid search across the autoregressive term (p), integrative term (d), and moving average term (q). The model with the lowest Akaike Information Criterion (AIC) score was selected, ensuring the best fit for model simplicity and predictive accuracy. The specific CP values and the chosen p, d, and q parameters for each series are detailed in Table 2.
The exogenous variables were modeled using soil data, where the relative frequency of each biome within the city’s total area was pre-processed. Then, each biome was grouped into its superclass. For example, Biome 33, categorized as Mangrove, and Biome 1, categorized as Tropical Forest, were labeled under the superclass “Forest,” as described in Table 1. Table 3 lists the remaining biomes and their respective variation (2022 value divided by 1990 value) for each series.
Although both time series are categorized under the “Forest” biome, the specific forest types for each series differ significantly. São Félix do Xingu is part of the Amazon rainforest biome, characterized by dense, moisture-rich tropical forests. In contrast, the forest biome of Cajazeiras is composed of caatinga, a biome with characteristics well-suited to drier locations. For example, the outskirts of Cajazeiras, where this biome is prevalent, receive less than 1000 mm of precipitation annually, while São Félix do Xingu receives more than 2000 mm. These characteristics are crucial because they influence the biome’s ability to affect local temperatures. Since the primary vegetation in Cajazeiras is caatinga, it is expected to contribute less to urban cooling than the lush, dense vegetation found in the Amazon rainforest of São Félix do Xingu.
As noted in Table 3, the most significant soil variations in both cities were observed in the forest, farming, and urban biomes. Consequently, our approach will focus exclusively on these biomes for subsequent analyses, discarding the Non-Forest Natural Formation class due to its minimal impact on overall soil variation. Following this reduction of exogenous variables, the statistical model for each series was trained using a rolling ARMAX approach with zero integration (\(i=0\)). At each prediction point, coefficients of the exogenous variables were recorded for subsequent analysis to determine their influence on the response variable over time.
A Random Forest (RF) model and an Artificial Neural Network (ANN) were also trained to augment the analysis of the statistical model’s coefficients. The RF model utilized 15 maximum depth and 100 estimators, while the ANN was configured with two hidden layers. These machine-learning models used the output of the ARMAX model as the target and shared the same input variables as the statistical model. Two techniques of XAI (Explainable Artificial Intelligence) were used, SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), to provide a deeper understanding of the model outputs. This approach aims to offer an alternative explanation to the traditional statistical coefficient analysis. Because these machine learning models were trained on the ARMAX outputs, the SHAP/LIME values can provide an alternative viewpoint on the statistical model outputs from both the Random Forest and ANN perspectives.
5 Results and Discussion
The initial analysis primarily focused on scrutinizing the coefficients of the statistical model, evaluating the magnitude of each coefficient across multiple prediction steps. However, before delving into the interpretation of the statistical model, it is imperative to assess the model’s adequacy. Table 4 presents the Mean Absolute Percentage Error (MAPE) for each model applied to the target time series. The metrics were collected in the test set.
Upon examination of Table 4, both models exhibit satisfactory fits. Thus, we proceed with the analysis of the coefficients of the statistical model. Notably, given the proximity of change points in both models, our analysis will concentrate specifically on the window from steps 230 to 260, encapsulating the change point for both time series. Figure 4 illustrates the logarithm of the coefficient magnitudes for each exogenous variable.
The coefficients divulge that, for both datasets, the ARMAX model accords greater significance to the Urban Area, followed by the Farming biomes. In the case of the Cajazeiras time series, the coefficient magnitude for Farming closely rivals that of the Forest, albeit numerically higher. This suggests that the Urban biome significantly influences the observed temperature increase over the analyzed period. Another crucial aspect to consider is the employment of Explainable AI (XAI) techniques in the machine learning models, which were fitted to the statistical outputs generated by the ARMAX model. These models exhibited low errors, as corroborated in Table 5.
Furthermore, the utilization of Shapley Values in the Random Forest model, trained on the ARMAX model’s statistical output, offers insights into the impact of features. Higher positive SHAP values signify a positive contribution to the temperature increase from an exogenous variable, while higher negative SHAP values imply otherwise. Figure 5 provides an overview of the SHAP values measured for both time series.
The findings presented in Fig. 5 offer intriguing insights. Beyond the expected correlation indicating that higher urban areas correspond to increased temperatures, the SHAP values suggest that an increase in farming areas contributes to rising temperatures. Conversely, a decrease in forest coverage accelerates this increase. It is vital to acknowledge the distinct characteristics of the two cities analyzed. São Félix, nestled within the Amazon rainforest, benefits from the cooling properties of its dense forest.
In contrast, Cajazeiras, characterized by the caatinga biome, features sparser vegetation, offering lesser cooling effects. This distinction is mirrored in the SHAP values, which assign numerically higher importance to the forest of São Félix due to its more pronounced cooling effects. Additionally, a Multilayer Perceptron (MLP) was employed with LIME to analyze the data. The MLP underwent 250 epochs of training using the same inputs as the ARMAX statistical model. Figure 6 showcases the box plot results of the LIME analysis.
Similar to the SHAP values applied to the Random Forest model, the LIME technique underscores the significant influence of increases in Farming and Urban Areas on temperature rises in both cities. Notably, both models not only identify the increase in urban and farming biomes as pertinent to the temperature increase but also acknowledge the forest biome’s role as a cooling factor for both cities.
6 Conclusion
This study presented a comprehensive analysis of the impact of land use changes on local temperatures in two distinctly different Brazilian cities, Cajazeiras and São Félix do Xingu, using advanced statistical and machine learning models. By employing an ARMAX model alongside Multilayer Perceptron (MLP) and Random Forest models enhanced with Explainable AI techniques (SHAP and LIME), we quantified the effects of urban, farming, and forest biomes on temperature variations. The findings indicate a significant influence of urban and farming areas on increasing temperatures, confirming the critical role of land use in local climate change.
The approach proposed in this study combines statistical modeling with machine learning outputs. Future research could explore how these outputs (ARMAX coefficients, SHAP values, and LIME explanations) might be integrated to create a new explicability score for exogenous variables, providing enhanced clarity and understanding of these variables within the models. Although the current methodology already offers valuable insights for various applications, with each technique revealing different aspects of the data, further development could lead to even more robust understanding and applications. Further development can also explore more complex data. The current analisys has only 3 exogenous variables, future works can explore data with more exogenous data.
References
Adamo, N., Al-Ansari, N., Sissakian, V.: Review of climate change impacts on human environment: past, present and future projections. Engineering 13(11), 605–630 (2021). https://doi.org/10.4236/eng.2021.1311044
Anjali, T., Chandini, K., Anoop, K., Lajish, V.L.: Temperature prediction using machine learning approaches. In: 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), pp. 1264–1268. IEEE (2019). https://doi.org/10.1109/ICICICT46008.2019.8993316. https://ieeexplore.ieee.org/document/8993316/
Arshad, M.Z., Iqbal, Z., Kumar, M.S.: The urban expansion and its effect on temperature trends of Faisalabad City, Pakistan. Nat. Appl. Sci. Int. J. 5(1), 27–43 (2024)
Cifuentes, J., Marulanda, G., Bello, A., Reneses, J.: Air temperature forecasting using machine learning techniques: a review. Energies 13(16), 4215 (2020). https://doi.org/10.3390/en13164215. https://www.mdpi.com/1996-1073/13/16/4215
De Andrade, J.V.R., Junior, T.G.D.S., Leandro, H.D.S., Lucena, D.B., Fernandes, B.J.: Assessing the effect of urban expansion and deforestation on temperature rise in cajazeiras, brazil: a data-driven approach. In: 2023 IEEE Latin American Conference on Computational Intelligence (LA-CCI), pp. 1–6. IEEE (2023)
Flores, B.M., et al.: Critical transitions in the amazon forest system. Nature 626(7999), 555–564 (2024). https://doi.org/10.1038/s41586-023-06970-0
Hernanz, A., Garcíaâ-Valero, J.A., Domínguez, M., Rodríguez-Camino, E.: A critical view on the suitability of machine learning techniques to downscale climate change projections: illustration for temperature with a toy experiment. Atmos. Sci. Lett. 23(6) (2022). https://doi.org/10.1002/asl.1087. https://onlinelibrary.wiley.com/doi/10.1002/asl.1087
IPCC: Climate Change 2022: Mitigation of Climate Change. Contribution of Working Group III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, UK and New York, NY, USA (2022). https://doi.org/10.1017/9781009157926. https://www.ipcc.ch/report/ar6/wg3/downloads/report/IPCC_AR6_WGIII_FullReport.pdf
Kalnay, E., Cai, M.: Impact of urbanization and land-use change on climate. Nature 423(6939), 528–531 (2003). https://doi.org/10.1038/nature01675
Li, X., Li, Z., Huang, W., Zhou, P.: Performance of statistical and machine learning ensembles for daily temperature downscaling. Theor. Appl. Climatol. 140(1-2), 571–588 (2020). https://doi.org/10.1007/s00704-020-03098-3. http://link.springer.com/10.1007/s00704-020-03098-3
Myers, N.: Threatened biotas: “hot spots” in tropical forests. Environmentalist 8(3), 187–208 (1988). https://doi.org/10.1007/bf02240252
do Nascimento, A.C.L., Galvani, E., Gobo, J.P.A., Wollmann, C.A.: Comparison between air temperature and land surface temperature for the city of São Paulo, Brazil. Atmosphere 13(3), 491 (2022). https://doi.org/10.3390/atmos13030491. https://www.mdpi.com/2073-4433/13/3/491
Qi, T., Ren, Q., He, C., Zhang, X.: Dual effects on vegetation from urban expansion in the drylands of northern china: a multiscale investigation using the vegetation disturbance index. Sci. Total Environ. 172481 (2024)
Roser, M.: Humans destroyed forests for thousands of years – we can become the first generation that achieves a world in which forests expand. Our World in Data (2022). https://ourworldindata.org/global-forest-transition
da Silva, R.F.B., et al.: Slow-down of deforestation following a brazilian forest policy was less effective on private lands than in all conservation areas. Commun. Earth Environ. 4(1) (2023). https://doi.org/10.1038/s43247-023-00783-9
Souza, C.M., et al.: Reconstructing three decades of land use and land cover changes in brazilian biomes with landsat archive and earth engine. Remote Sens. 12(17), 2735 (2020). https://doi.org/10.3390/rs12172735
Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, and the Brazilian agencies FACEPE and CNPq.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
de Andrade, J.V.R. et al. (2025). Exploring Climatic Shifts in Brazilian Climates: Insights from ARMAX, Decision Trees, and Artificial Neural Networks. In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science(), vol 15415. Springer, Cham. https://doi.org/10.1007/978-3-031-79038-6_12
Download citation
DOI: https://doi.org/10.1007/978-3-031-79038-6_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-79037-9
Online ISBN: 978-3-031-79038-6
eBook Packages: Computer ScienceComputer Science (R0)







