Abstract
A considerable share of Brazilian municipalities experiences recurrent emergencies related to different types of disasters, which weaken both the public administration and the citizens, even as these municipalities are expected to carry out transitions to achieve the 2030 Sustainable Development Goals (SDG) Agenda. To mitigate the crises addressed by the 2030 Agenda, intersectoral consultation guided by an integrated approach to social justice, accountability and sustainability is required to renew social dynamics and the corresponding local public policies. The objective of this paper is to carry out an integrated analysis of such dynamics in their main social and economic components, modeling the indicators of some specific SDGs and emergency decree data as input variables to a classification problem and using explainable machine learning, namely the SHAP values technique, to assess their intertwining and/or synergies. Three classification models were tested on data from 2003 to 2020, and the model with the best accuracy was analyzed with SHAP values, revealing that different variables are decisive for each phenomenon associated with disasters; understanding them helps elucidate critical points to be addressed by managers.
1 Introduction
In the first decades of this century, several nations on the planet, including Brazil, have been facing a rapid materialization of events threatening the well-being of their population. Many of these events, although classified as natural, belong to different groups and subgroups, such as the geological (mass movement), hydrological (floods, torrents), meteorological and climatological (storms, droughts, dry season) and biological (viral infectious diseases) categories [1].
The greater the process of socioeconomic and environmental vulnerability to which local populations may be subject, the greater their fragility if exposed to such threatening factors, whether they are located in poorer or more affluent macro-regions of the country [2]. One aspect of this vulnerability is the asymmetry in the distribution of opportunities for decent work, income, education and safe housing; in other words, it concerns the structural challenges underlying the development model that the country has been following. Another aspect to consider is the relative institutional unpreparedness to protect these vulnerable populations. Systematically, there has been no success in improving preventive and preparedness civil protection actions in the country. This results in disasters, defined as a significant set of collective losses and damages that cause significant stress and social suffering [3,4,5], negatively affecting local socio-spatial dynamics, from wealth flows to difficulties in taking public action to recover damaged or destroyed infrastructure.
In the face of disasters, there is an institutional need for municipal public administration to declare an emergency. When the local public authority declares an emergency associated with a given manifested event, this means that the damage caused by it exceeds the capacity of local actors to respond to the situation. Emergency decrees associate the temporality of the event’s manifestation (generally short) with the bureaucratic temporality of the emergency (of medium duration) and point to the disasters themselves (an even longer social time of the disturbed daily life of the affected groups) [6]. The duration of an emergency decree allows the local administration to take or induce measures to repair the situation more strategically and urgently, enabling certain social and economic routines, territorial flows and infrastructures to be resumed, even if a certain precariousness must be dealt with for a longer period of time before a full recovery can be envisaged.
More than undermining well-being conditions in the affected place (city, state or country), disasters can also harm the achievement of the Sustainable Development Goals by 2030. The SDGs are a multilateral agreement to incisively combat the main planetary socioeconomic and socio-environmental problems. This includes commitments to stop precarious work, expand access to basic sanitation, protect tropical forests and eliminate world hunger. In the SDG Progress Report [7], the United Nations recognizes that more than half the world is being left behind, with disasters related to climate change, pandemics and loss of biodiversity, as well as their adverse economic and social consequences, weighing on developing countries where there are not enough resources to reverse the situation.
The use of Explainable AI (XAI) methods [8] with Machine Learning (ML) algorithms has become increasingly common over the last few years, and researchers even consider it a mandatory step in critical domains, where the model is expected not only to provide an answer but also to explain how and why the conclusion was reached. Among the methods currently available, Shapley Additive Explanations (SHAP) [9] stands out as one of the most used, offering several advantages such as ease of use, model agnosticism and the presentation of analyses through graphs that are both easy to interpret and informative. Ensemble machine learning models, specifically XGBoost [11] and Random Forest (RF) [10], combined with the SHAP technique, have been successfully applied in a broad variety of fields including health [12], finance [13] and the urban environment [14], among many others.
In the specific domain of monitoring and assessing threatening factors associated with disasters, the last few years have witnessed a significant increase in the number of works that explore combined ensemble model and SHAP approaches. In [15], the authors adopted an integrated approach using statistical modeling and machine learning techniques to assess the effect of urban expansion and deforestation on temperature rise in Cajazeiras, Brazil. The databases were created with time series representing the evolution of each biome and temperature from 1990 to 2020. RF with SHAP values revealed the importance of urbanization and the absence of savanna formation near the city. A landslide susceptibility evaluation model was constructed in [16]. Using the historical landslide information from three counties in Chongqing, China as the base data, a model was constructed based on the XGBoost algorithm, and the SHAP algorithm was applied to quantify the contribution of the influencing factors to landslide occurrence at both global and local levels. The study in [17] proposes a wildfire severity mapping approach that incorporates feature selection techniques within the XAI framework, aiming at enhancing precision and providing insights into the factors contributing to model decisions. Utilizing post-fire imagery, the authors developed an RF model and identified the most influential predictors to elucidate the qualitative and quantitative impacts on ML algorithm performance. In [18], an explainable ML pipeline using the XGBoost model and SHAP was proposed, based on a comprehensive database of drought impacts in the U.S., at both national and state levels. Data from 2011 to 2020 were obtained from online databases and processed to represent the occurrence of various drought indicators and drought impacts. The authors concluded that the patterns observed by the SHAP explainer were aligned with expert knowledge, revealing that abnormal dryness positively contributes to the occurrence of drought impacts. Tropical cyclone disaster loss was evaluated in [19] using ML algorithms, with SHAP values identifying the impact of specific features on the prediction. The indicators of hazard, vulnerability, and resilience were incorporated into the system as input variables in a dataset of 492 disaster events caused by tropical cyclones that occurred from 2000 to 2020 in different provinces in China. The study presented in [20] uses an XGBoost model and SHAP values to assess how urban morphology influences urban flooding susceptibility. The city of Shenzhen, China, was chosen as a case study due to its rapid urbanization, which led to increased residential population and ecosystem fragmentation. The findings underscore the varying impact of disaster variables on urban flooding, with morphological attributes becoming highly significant during severe inundations.
In the aforementioned studies, the necessity of explaining the model outputs was highlighted, and the benefits of ML models and SHAP values as a valuable XAI tool for decision-makers managing disasters were emphasized as the main contribution. Although relevant, the majority of related works address specific problems. In this sense, our work brings a novel and broader approach that helps reduce the lack of studies in Brazil and addresses the problem at a comprehensive level. Based on the Brazilian context, this study is a preliminary and original exercise on the connections between disasters associated with different types of events (geological, hydrological, meteorological, climatological and biological), which have been recognized by local and federal authorities as emergencies, and dozens of socioeconomic variables available in official open databases, covering the period 2003-2020.
The main objective of the study is to identify, by means of ML models and SHAP values, which socioeconomic and socioenvironmental variables are relevant in their association with the different types of events that led to the declaration of emergency in Brazil in the period 2003-2020. Once identified, preliminary interpretations were made about this connection, to be tested in future studies. Secondarily, the aim was to establish the link between the variables highlighted in each type of event (class) causing disaster and the SDGs, indicating the bottlenecks that must be addressed to comply with them.
2 Materials and Methods
2.1 Dataset
The dataset includes 44,285 examples created from emergency decrees in Brazilian municipalities from 2003 to 2020, each with 55 features, 11 of which are extracted from S2ID (footnote 1) and relate to the decrees. The other 44 external features are categorized as follows: housing data, HDI indicators and occupational sector data, from the IBGE Census (footnote 2); the share of value added in relation to the national GDP, from IBGE’s SIDRA platform (footnote 3); and education level investment data, from INEP (footnote 4). All external features are provided annually at the state level, except for data from INEP, which are provided at the national level. Table 1 shows the variables, their corresponding abbreviated names used in the graphs and sources.
Besides the complete dataset, two subsets were created: one with data from 2003 to 2011, comprising 13,407 examples, and the other with data from 2012 to 2020, comprising 30,878 examples. This separation was considered to allow the emergence of different variables among the most important ones in different periods, due to the changes that occurred in 2012, such as the new classification of disasters [1] and the standardization of validity periods for emergency situations (90 days) and states of public calamity (180 days).
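The period split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and the field names `year` and `cobrade` are hypothetical, and the five toy records stand in for the real decrees.

```python
from collections import Counter

# Hypothetical records: each decree is a dict with a "year" key and a
# "cobrade" class label. Field names and values are illustrative assumptions.
decrees = [
    {"year": 2003, "cobrade": "1.4.1.2.0"},
    {"year": 2011, "cobrade": "1.2.1.0.0"},
    {"year": 2012, "cobrade": "1.4.1.1.0"},
    {"year": 2020, "cobrade": "1.5.1.1.0"},
    {"year": 2016, "cobrade": "1.1.3.2.1"},
]


def split_by_period(records, cutoff_year=2012):
    """Return (period_1, period_2): records before the cutoff year and from it on."""
    period_1 = [r for r in records if r["year"] < cutoff_year]
    period_2 = [r for r in records if r["year"] >= cutoff_year]
    return period_1, period_2


period_1, period_2 = split_by_period(decrees)

# Class frequencies per period help check whether each class is represented.
class_counts_p2 = Counter(r["cobrade"] for r in period_2)
```

Because the two subsets partition the data by year, their sizes must add up to the size of the complete dataset, as the reported counts do (13,407 + 30,878 = 44,285).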
2.2 Machine Learning Algorithms and SHAP Values
Our problem was modeled as a classification task with the COBRADE category of events [1] as the class, aiming to uncover insights into the relationships between these events and several socioeconomic indicators. We employed tree-based models, namely XGBoost, Random Forest, and Decision Tree [22], for our classification task due to their effectiveness in handling complex relationships within data and their ability to capture nonlinearities and interactions among features. Model performance was evaluated with accuracy, precision, recall, and F1 score. To enhance interpretability, we used the SHAP library (footnote 5) to compute average variable importance scores for each class separately. These insights were crucial for evaluation by a disaster specialist, considering both geographical and socio-political aspects.
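To make concrete what a SHAP value measures, the following minimal sketch computes exact Shapley values by enumerating all feature coalitions for a toy scoring function. It is not the procedure used in the study: the SHAP library relies on much faster, tree-specific algorithms for models such as XGBoost, and `toy_model`, its three inputs and the background rows are hypothetical.

```python
from itertools import combinations
from math import factorial


def coalition_value(model, x, background, subset):
    """Mean model output when features in `subset` take their values from x
    and the remaining features take their values from each background row."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


def toy_model(features):
    # Hypothetical score for one event class; not a fitted model.
    agri_share, industry_share, hdi = features
    return 0.5 * agri_share - 0.2 * industry_share + 0.1 * agri_share * hdi


background = [[0.1, 0.3, 0.6], [0.4, 0.2, 0.7], [0.2, 0.5, 0.8]]
x = [0.05, 0.6, 0.9]

phi = shapley_values(toy_model, x, background)
base_value = coalition_value(toy_model, x, background, set())
```

The values are additive: `base_value + sum(phi)` reproduces `toy_model(x)`, which is the property that lets per-feature contributions be read directly off a plot.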
3 Experiments and Analysis
The algorithms were run on the entire dataset, spanning from 2003 to 2020, and on two separate datasets covering Period 1 (2003-2011) and Period 2 (2012-2020). In all executions we employed an 80-20 split, allocating 80% of the data for training and 20% for testing, and used the default parameters of each algorithm. Throughout the process, our focus was on ensuring adequate class representation to allow the model to generalize common behaviors across event occurrences. We acknowledge the inherent imbalance in our dataset; however, introducing synthetic data was deemed inappropriate, as it could distort variable importance and exacerbate biases. The performance measures for all algorithms and time periods are shown in Table 2. The model with the best accuracy (XGBoost) was selected for the analysis of feature importance by the SHAP method.
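The evaluation protocol above can be illustrated with a small, dependency-free sketch. Two points are assumptions rather than details stated in the text: the split is stratified by class (one simple way to preserve class representation without synthetic data), and precision, recall and F1 are macro-averaged. The class names and toy predictions are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping class proportions in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(classes)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {"accuracy": accuracy, "precision": sum(precisions) / k,
            "recall": sum(recalls) / k, "f1": sum(f1s) / k}


# Imbalanced toy labels (hypothetical class names).
labels = ["drought"] * 50 + ["flood"] * 30 + ["landslide"] * 20
train_idx, test_idx = stratified_split(labels)

# Toy predictions standing in for a fitted classifier's output.
y_true = ["drought", "drought", "flood", "flood", "landslide"]
y_pred = ["drought", "flood", "flood", "flood", "drought"]
scores = macro_scores(y_true, y_pred)
```

On imbalanced data, macro averaging gives each class equal weight, so a rare class that is never predicted correctly lowers the scores even when overall accuracy looks acceptable.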
As not all classes have enough examples, only the classes with the highest frequency in the dataset were considered for analysis (COBRADE classes in parentheses): Landslide (1.1.3.2.1), Storms and floods (1.2.1.0.0), Dry season (1.4.1.1.0), Drought (1.4.1.2.0), Forest fires (1.4.1.3.1) and Viral infectious diseases (1.5.1.1.0) (see Fig. 1). The SHAP beeswarm plot is used to present the 11 most important variables for each class in the period chosen for analysis: the entire period (2003-2020) or the separate periods (2003-2011 or 2012-2020), depending on the relevance identified in each case. The beeswarm plot ranks the features by importance and shows how variation in the feature values can impact each class. For each feature, every instance in the dataset is represented by a dot whose horizontal position is the SHAP value of that feature. Blue represents low feature values and red represents high ones.
It can be noted that date-related features such as the year or month of the beginning and end of the decree validity, as well as IBGE Code, appear frequently among the most important ones. Although they are important for determining the class, it is not possible to extract specific information from their values, except in some cases where the decree validity starts in specific years, for example Viral infectious diseases (1.5.1.1.0), which occurred mainly during the 2020 COVID-19 pandemic period, as seen in Fig. 1. For this reason, we do not comment on these variables in the analysis that follows. It is important to mention that IBGE Code is included as one of the attributes as an identifier of the municipalities.
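The ranking step and the decision to set aside date and identifier features can be sketched as follows. The sketch assumes that importance is the mean absolute SHAP value per feature within one class, a common ordering criterion for beeswarm plots; the feature names and the SHAP matrix are made up for illustration.

```python
def rank_features(shap_rows, feature_names, top_k=11):
    """Rank features by mean absolute SHAP value for one class; keep the top_k."""
    n_rows = len(shap_rows)
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_rows
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical feature names and SHAP values (rows = instances) for one class.
feature_names = ["agri_share", "industry_share", "hdi_income", "start_year"]
shap_rows = [
    [0.10, -0.30, 0.02, 0.50],
    [-0.20, 0.25, -0.01, -0.40],
    [0.15, -0.35, 0.03, 0.45],
]

top_features = rank_features(shap_rows, feature_names)

# Date and identifier features stay in the model but are not interpreted.
not_interpreted = {"start_year", "ibge_code"}
commented = [name for name, _ in top_features if name not in not_interpreted]
```

Filtering after ranking keeps the reported positions (1st, 3rd and so on) tied to the full ranking, which matches how positions are quoted in the per-class discussion.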
Below, we comment on the variables that appear among the most important and provide relevant information for the objectives of the article, for each of the selected classes.
Landslide. A landslide is defined as the movement of a mass on a slope due to the action of gravity and the effect of some force [21]. In the Brazilian case, intense rainfall over these areas triggers soil sliding (in the form of mud) or falling rocks and debris (including natural or planted flora), causing considerable destruction to human occupation in the path of the flow. Because of this, it can directly affect the share of agriculture in the municipal/state contribution when this productive activity is carried out on this type of land. This explanation accounts for both the low values of the agriculture share (3rd, Fig. 2a) and the high values of the industry share (1st, Fig. 2b). The importance of monetary value in high administration, observed more markedly in the second period, may be due to the significant volumes of extraordinary monetary resources accessed by the municipal executive power after the emergency decree, when this event reaches the fullness of its local destructive manifestation. This is because the damage caused, especially on trafficable roads and in inhabited areas, requires expeditious and exceptional technical intervention, carried out by public staff and existing public equipment combined with those of private companies hired on an emergency basis, to unblock and repair the affected roads as well as to contain unstable slopes. It is surprising, however, that in the second period (2012-2020) HDI income and longevity, both at high values, emerge and become significant in the model. This may indicate that the topographic and geomorphological characteristics of certain municipalities belonging to wealthy regions make the location more susceptible to this event. However, even though the general regional socioeconomic characteristics are prosperous, the most susceptible terrain in the affected locations may lie in the territories of the most socioeconomically disaffiliated communities; that is, these are prosperous regions with severe asymmetries in terms of socio-spatial susceptibility.
Storms and Floods. These events, characterized by climatological/atmospheric and hydrological origins, are important in both periods (2003-2011 and 2012-2020), although they occur at different times in the five Brazilian macro-regions. Since this class of events is relatively predictable regarding the months of the year in which it will occur, if the Federative Unit (UF) and the municipalities are (or remain) unprepared to deal with this type of event, disasters become inevitable. The regularity with which these threats manifest throughout the year is what would allow the local public administration to achieve SDGs 11 (particularly sub-goals 11.5 and 11.7.b) and 13 (sub-goal 13.1), if it used the periods outside the rainy seasons to plan and execute preventive and preparatory measures to face the next occurrence and avoid a new disaster. The low and medium number of vulnerable people dependent on the elderly (1st and 6th, Fig. 3) denotes that the locations currently susceptible to disasters associated with this class of events are socioeconomically prosperous; therefore, the chances are greater that the local administration will have its own human, technological and financial resources to adopt an agenda in accordance with the sustainable development sub-goals mentioned above.
Dry Season. Medium and high values for single mothers heading households and a high percentage of vulnerable people dependent on the elderly (1st and 4th, Fig. 4) indicate that the predominant social and economic characteristic associated with the dry season is structural poverty. This is confirmed by the importance, also demonstrated in the model, of low values in both the added value of agriculture and the added value of industry (3rd and 5th, Fig. 4). Thus, it is clear how far the localities experiencing disasters related to this class of event are from SDG 11 (sub-goal 11.5: protect the poor and people in vulnerable situations). It is important to highlight the unusual association between high investment in basic education (10th, Fig. 4) and disasters caused by the dry season, in apparent contradiction with the above interpretation. It turns out that, in less economically dynamic locations, the amounts transferred from the federal level to the municipalities, as well as mandatory spending on education (25% of revenue), stand out against the dispersion of other expense items with which the local public administration must deal. Thus, the importance of high investments in education in the model signals convergence with SDG 4, especially with regard to sub-goal 4.1 (ensure that all girls and boys complete primary and secondary education [...] that leads to relevant and effective learning outcomes).
Drought. In Period 1, the triad of low values in percentage of employment in the manufacturing industry, monetary value added to administrative activities and percentage of investments in basic education (7th, 4th and 9th, Fig. 5a) indicates that an inoperative public management goes together with a poorly dynamic economy and a precarious educational and cultural standard for the child population. Thus, the lack of commitment to the intellectual training of the new generations compromises their brightest occupational prospects, and both factors make it impossible for these sub-citizens to exercise greater social control over public policies. This context connects, in an unfavorable way, SDG 1 (eradication of poverty), particularly sub-goal 1.5.b (support accelerated investments in actions to eradicate poverty), to SDG 8 (decent work and economic growth), especially sub-goal 8.3 (promote development-oriented policies that support productive activities, generation of decent employment, entrepreneurship, creativity and innovation). Period 2 reconfirms the meanings of this same triad, although it does so through the variables low value added by agriculture and low HDI (5th, 9th, Fig. 5b), which clash with the same SDG sub-goals mentioned above (1.5.b and 8.3). The low HDI, however, also points unfavorably to the possibility of achieving SDG 3 (health and well-being) and SDG 4 (quality education), distancing localities in drought emergencies from an effectively sustainable local, state and regional development model.
Forest Fires. Forest fires are well demarcated in the model in relation to the beginning and end months of the season in both periods. In the first period (2003-2011), the model indicates direct correspondence with high percentages of employment in the service sector (5th, Fig. 6a), a sector probably harmed by atmospheric pollution and the destruction of forests, especially where it is linked to tourism services, as in the Brazilian Pantanal. In the second period (2012-2020), the lowest percentage of vulnerable people dependent on the elderly (8th, Fig. 6b) shows that these are economically prosperous locations. The regional characteristic (Central-West) and the added value of agriculture (large-scale production) stand out (7th, Fig. 6b), indicating the potentially huge economic loss to the primary sector. If, on the one hand, the possibility of specifying the location of these events (by region and municipality) would facilitate prevention strategies, which would converge with SDG 15, sub-goal 15.2 (stop deforestation, restore degraded forests), on the other hand, the recurrence of forest fires indicates that the actions taken are relatively ineffective and far from achieving this goal.
Viral Infectious Diseases. Viral infectious diseases gained prominence in the model only in the second period (2012-2020), mainly due to the COVID-19 pandemic in 2020, which led all regions of Brazil to declare an emergency. This is why the event’s starting year is the best demarcated variable in the model (1st, Fig. 7).
The set of high values for variables related to economic activities (high percentage of jobs in the service sector and added value of agriculture, 9th and 10th, Fig. 7) indicates the success of adaptive strategies in the face of a serious biological threat. Despite the mandatory change in collective behavior, through preventive measures of social isolation, it was possible to keep the production and distribution of essential and consumer goods that society needed flowing, which is in line with the purposes of SDG 8.8 (promote safe and secure working environments for all workers) and 3.d (strengthen the capacity [...] particularly of developing countries, for early warning, risk reduction and management of national [...] health risks).
In this case, even though the country’s health policies were unprepared to mitigate the situation that year, the health sector pursued SDG 3, sub-goal 3.9.b (support the research and development of vaccines and medicines for communicable and non-communicable diseases, which mainly affect developing countries).
4 Conclusions
With only seven years left for nations like Brazil to achieve the SDGs to which they had committed, United Nations Secretary-General António Guterres stated: “Unless we act now, the 2030 Agenda will become an epitaph for a world that might have been.” The findings above indicate that the large number of emergencies that occur in Brazil, indicative of weak institutional and social resilience to face geological, hydrological, climatic, biological and similar threats and prevent them from triggering disasters, is intertwined with cultural vulnerabilities and with historical economic and territorial issues that remain uncomfortably persistent in the face of multilateral aspirations. In other words, Brazil has been slow to induce sustainable intersectoral public policies that can reduce disaster risks by bringing their discussion into the content of quality basic education and into job training processes. There is still a lack of good strategies to protect subsistence and small-scale agriculture in the face of climate setbacks, as well as to expand slope containment measures and improve rainwater drainage systems, among others.
During data modeling, we encountered problems with inconsistent and missing data for several indicators, which could significantly impact conclusions and prevent us from exploring more recent periods for an up-to-date warning study. Despite an unbalanced dataset and the inability to create synthetic data, we obtained information that helped interpret events and achieved relevant performance using simple tree-based models. Future studies could expand the analysis with a wider range of classes and variables at both state and municipal levels, enhancing the understanding of emergency decrees. Improving the model would help express the association between emergencies and socioeconomic and socio-environmental variables. This would aid decision-makers in identifying underlying factors related to decrees, supporting a strong government commitment to investing in public institutions that collect, systematize, and disseminate updated open data, thus accelerating progress toward fulfilling the 2030 Agenda.
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
References
Brasil: Classificação e Codificação Brasileira de Desastres. Secretaria Nacional de Proteção e Defesa Civil-SEDEC/MDR, Brasília (2012)
Kowarick, L.: Viver em risco: sobre a vulnerabilidade socioeconômica e civil. Editora 34, São Paulo (2009)
Fritz, C.: Disaster. In: Merton, R., Nisbet, R. (eds.) Contemporary Social Problems, 1st ed., pp. 651–694. Harcourt Brace Jovanovich, New York (1961)
Quarantelli, E.: Uma agenda de pesquisa do século 21 em ciências sociais para os desastres: questões teóricas, metodológicas e empíricas, e suas implementações no campo professional. O Social em Questão. Rio de Janeiro, vol. 18, pp. 25–56 (2015)
Valencio, N.: Para além do ‘dia do desastre’: o caso brasileiro. Coleção Ciências Sociais. Ed. Appris, Curitiba (2012)
Valencio, N., Valencio, A.: O assédio em nome do bem: Dos sofrimentos conectados à dor moral coletiva de vítimas de desastres. LUMINA 12, 19–39 (2018)
United Nations: The Sustainable Development Goals Report 2023, Special United Nations Publications, New York (2023)
Arrieta, A., et al.: Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020)
Lundberg, S., Lee, S.-I.: A unified approach to interpreting model predictions. Adv. Neural. Inf. Process. Syst. 30, 4765–4774 (2017)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM, New York (2016)
Yi, F., Yang, H., Chen, D., et al.: XGBoost-SHAP-based interpretable diagnostic framework for Alzheimer’s disease. BMC Med. Inform. Decis. Mak. 23(137), 1–14 (2023)
Tan, B., Gan, Z., Wu, Y.: The measurement and early warning of daily financial stability index based on XGBoost and SHAP: evidence from China. Expert Syst. Appl. 227, 120325 (2023)
Hatami, F., Rahman, M., Nikparvar, B., Thill, J.-C.: Non-linear associations between the urban built environment and commuting modal split: a random forest approach and SHAP evaluation. IEEE Access 11, 12649–12662 (2023)
Andrade, J., De Souza Junior, T., Silva, L., Lucena, D., Fernandes, B.: Assessing the effect of urban expansion and deforestation on temperature rise in Cajazeiras, Brazil: a data-driven approach. In: 2023 IEEE LA-CCI, pp. 1–6. IEEE, Recife (2023)
Zhang, J., et al.: Insights into geospatial heterogeneity of landslide susceptibility based on the SHAP-XGBoost model. J. Environ. Manage. 332, 117357 (2023)
Van, L., Tran, V., Nguyen, G., Yeon, M., Do, M., Lee, G.: Enhancing wildfire mapping accuracy using mono-temporal Sentinel-2 data: a novel approach through qualitative and quantitative feature selection with explainable AI. Eco. Inform. 81, 102601 (2024)
Zhang, B., Salem, F., Hayes, M., Smith, K., Tadesse, T., Wardlow, B.: Explainable machine learning for the prediction and assessment of complex drought impacts. Sci. Total Environ. 898, 165509 (2023)
Liu, S., et al.: Evaluation of tropical cyclone disaster loss using machine learning algorithms with an explainable artificial intelligence approach. Sustainability 15, 12261 (2023)
Wang, M., et al.: An XGBoost-SHAP approach to quantifying morphological impact on urban flooding susceptibility. Ecol. Ind. 156, 111137 (2023)
Highland, L.M., Bobrowsky, P.: The landslide handbook—a guide to understanding landslides. U.S. Geological Survey Circular, vol. 1325 (2008)
Wu, X., et al.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2008)
Acknowledgments
This study was funded by FAPESP (Processes 2022/09136-1 and 2023/03000-3), CNPq (Processes 316828/2023-8 and 144465/2023-0) and CAPES.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Teixeira, L., Matos, A., Carvalho, G., Valencio, N., Camargo, H. (2025). Explainability of Machine Learning Models with XGBoost and SHAP Values in the Context of Coping with Disasters. In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science(), vol 15415. Springer, Cham. https://doi.org/10.1007/978-3-031-79038-6_11
DOI: https://doi.org/10.1007/978-3-031-79038-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-79037-9
Online ISBN: 978-3-031-79038-6
eBook Packages: Computer Science, Computer Science (R0)








