1 Introduction

In the first decades of this century, several nations, including Brazil, have been facing a rapid materialization of events that threaten the well-being of their populations. Many of these events, although classified as natural, belong to different groups and subgroups, spanning the geological (mass movement), hydrological (floods, torrents), meteorological and climatological (storms, droughts, dry season) and biological (viral infectious diseases) categories [1].

The greater the process of socioeconomic and environmental vulnerability to which local populations may be subject, the greater their fragility if exposed to such threatening factors, whether they are located in poorer or more affluent macro-regions of the country [2]. One aspect of this vulnerability is the asymmetry in the distribution of opportunities for decent work, income, education and safe housing; in other words, it concerns the structural challenges underlying the development model that the country has been following. Another aspect to consider is the relative institutional unpreparedness to protect these vulnerable populations. Systematically, there has been no success in improving preventive and preparedness civil protection actions in the country. This results in disasters, defined as a significant set of collective losses and damages that cause significant stress and social suffering [3,4,5], negatively affecting local socio-spatial dynamics, from wealth flows to difficulties in taking public action to recover damaged or destroyed infrastructure.

In the face of disasters, there is an institutional need for the municipal public administration to declare an emergency. When the local public authority declares an emergency associated with a given event, this means that the damage caused exceeds the capacity of local actors to respond to the situation. Emergency decrees associate the temporality of the event’s manifestation (generally short) with the bureaucratic temporality of the emergency (of medium duration) and point to the disasters themselves (an even longer social time of the disturbed daily life of the affected groups) [6]. The duration of an emergency decree allows the local administration to take or induce measures to repair the situation more strategically and urgently, enabling certain social and economic routines, territorial flows and infrastructures to be resumed, even if a certain precariousness persists for a longer period before a full recovery can be envisaged.

Beyond undermining well-being in the affected place (city, state or country), disasters can also hinder the achievement of the Sustainable Development Goals (SDGs) by 2030. The SDGs are a multilateral agreement to incisively combat the main planetary socioeconomic and socio-environmental problems. This includes commitments to stop precarious work, expand access to basic sanitation, protect tropical forests and eliminate world hunger. In the SDG Progress Report [7], the United Nations recognizes that more than half the world is being left behind, with disasters related to climate change, pandemics and loss of biodiversity, together with their adverse economic and social consequences, weighing on developing countries that lack the resources to reverse the situation.

The use of Explainable AI (XAI) methods [8] with Machine Learning (ML) algorithms has become increasingly common over the last few years, and researchers even consider it a mandatory step in critical domains, where the model is expected not only to provide an answer but also to explain how and why the conclusion was reached. Among the methods currently available, Shapley Additive Explanations (SHAP) [9] stands out as one of the most used, as it offers several advantages such as ease of use, being model agnostic and allowing analyses to be presented through graphs that are both easy to interpret and informative. Ensemble machine learning models, specifically XGBoost [11] and Random Forest (RF) [10], combined with the SHAP technique, have been successfully applied in a broad variety of fields including health [12], finance [13] and urban environment [14], among many others.

In the specific domain of monitoring and assessing threatening factors associated with disasters, the last few years have witnessed a significant increase in the number of works that explore combined ensemble model-SHAP approaches. In [15], the authors adopted an integrated approach using statistical modeling and machine learning techniques to assess the effect of urban expansion and deforestation on temperature rise in Cajazeiras, Brazil. The databases were created with time series representing the evolution of each biome and temperature from 1990 to 2020. RF with SHAP values revealed the importance of urbanization and of the absence of savanna formation near the city. A landslide susceptibility evaluation model was constructed in [16]. Using the historical landslide information from three counties in Chongqing, China, as the base data, a model was built with the XGBoost algorithm, and SHAP was applied to quantify the contribution of the influencing factors to landslide occurrence at both global and local levels. The study in [17] proposes a wildfire severity mapping approach that incorporates feature selection techniques within the XAI framework, aiming to enhance precision and provide insights into the factors contributing to model decisions. Using post-fire imagery, the authors developed an RF model and identified the most influential predictors to elucidate the qualitative and quantitative impacts on ML algorithm performance. In [18], an explainable ML pipeline using the XGBoost model and SHAP was proposed, based on a comprehensive database of drought impacts in the U.S., at both national and state levels. Data from 2011 to 2020 were obtained from online databases and processed to represent the occurrence of various drought indicators and drought impacts. The authors concluded that the patterns observed by the SHAP explainer were aligned with expert knowledge, revealing that abnormal dryness contributes positively to the occurrence of drought impacts. The tropical cyclone disaster loss was evaluated in [19] using ML algorithms, with SHAP values identifying the impact of specific features on the prediction. The indicators of hazard, vulnerability, and resilience were incorporated into the system as input variables in a dataset of 492 disaster events caused by tropical cyclones that occurred from 2000 to 2020 in different provinces in China. The study presented in [20] uses an XGBoost model and SHAP values to assess how urban morphology influences urban flooding susceptibility. The city of Shenzhen, China, was chosen as a case study due to its rapid urbanization, which led to increased residential population and ecosystem fragmentation. The findings underscore the varying impact of disaster variables on urban flooding, with morphological attributes becoming highly significant during severe inundations.

In the aforementioned studies, the necessity of explaining the model outputs was highlighted, and the benefits of ML models and SHAP values as a valuable XAI tool for decision-makers managing disasters were emphasized as the main contribution of the research. Although relevant, the majority of related works address specific problems. In this sense, our work brings a novel and broader approach that helps to reduce the lack of studies in Brazil and addresses the problem at a comprehensive level. Based on the Brazilian context, this study is a preliminary and original exercise on the connections between disasters associated with different types of events (geological, hydrological, meteorological, climatological and biological), which have been recognized by local and federal authorities as emergencies, and dozens of socioeconomic variables available in official open databases, covering the period 2003-2020.

The main objective of the study is to identify, by means of ML models and SHAP values, which socioeconomic and socioenvironmental variables are relevant in their association with the different types of events that led to the declaration of emergency in Brazil in the period 2003-2020. Once identified, preliminary interpretations were made about this connection, to be tested in future studies. Secondarily, the aim was to establish the link between the variables highlighted in each type of event (class) causing disaster and the SDGs, indicating the bottlenecks that must be addressed to comply with them.

2 Materials and Methods

2.1 Dataset

The dataset includes 44,285 examples created from emergency decrees in Brazilian municipalities from 2003 to 2020, each with 55 features, of which 11, extracted from S2ID (footnote 1), are related to the decrees. The other 44 external features are categorized as follows: housing data, HDI indicators and occupational sector data, from the IBGE Census (footnote 2); the share of value added in relation to the national GDP, from IBGE’s SIDRA platform (footnote 3); and education level investment data, from INEP (footnote 4). All external features are provided annually at the state level, except for data from INEP, which are provided at the national level. Table 1 shows the variables, their corresponding abbreviated names used in the graphs and their sources.

Besides the complete dataset, two subsets were created: one with data from 2003 to 2011, comprising 13,407 examples, and the other with data from 2012 to 2020, comprising 30,878 examples. This separation was considered to allow the emergence of different variables among the most important ones in different periods, due to the changes that occurred in 2012, such as the new classification of disasters [1] and the standardization of validity periods for emergency situations (90 days) and states of public calamity (180 days).
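The period split described above can be sketched in a few lines of plain Python. This is only an illustration: the record layout, the field names `year` and `cobrade`, and the toy rows are assumptions, not the actual S2ID schema.

```python
# Hypothetical decree records; the "year" and "cobrade" field names and the
# four toy rows are assumptions made for illustration only.
decrees = [
    {"year": 2003, "cobrade": "1.4.1.1.0"},
    {"year": 2011, "cobrade": "1.2.1.0.0"},
    {"year": 2012, "cobrade": "1.4.1.2.0"},
    {"year": 2020, "cobrade": "1.5.1.1.0"},
]


def split_by_period(rows, boundary_year=2012):
    """Return (period_1, period_2): rows before, and from, boundary_year."""
    period_1 = [r for r in rows if r["year"] < boundary_year]
    period_2 = [r for r in rows if r["year"] >= boundary_year]
    return period_1, period_2


period_1, period_2 = split_by_period(decrees)

# The two subsets partition the complete dataset, mirroring the counts
# reported in the text (13,407 + 30,878 = 44,285 examples).
assert len(period_1) + len(period_2) == len(decrees)
```

Using 2012 as the boundary year follows the change in disaster classification mentioned in the text; every decree lands in exactly one subset.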

Table 1. Selected variables.

2.2 Machine Learning Algorithms and SHAP Values

Our problem was modeled as a classification task with the COBRADE category of events [1] as the class, aiming to unravel insights into the relationships between these events and several socioeconomic indicators. We employed tree-based models, namely XGBoost, Random Forest, and Decision Tree [22], for our classification task due to their effectiveness in handling complex relationships within data and their ability to capture nonlinearities and interactions among features. Model performance was evaluated with accuracy, precision, recall, and F1 score. To enhance interpretability, we used the SHAP library (footnote 5) to compute average variable importance scores for each class separately. These insights were crucial for evaluation by a disaster specialist, considering both geographical and socio-political aspects.
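To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function. It is not the SHAP library and not the models used in the study: the function `toy_model`, the all-zero baseline, and the choice to "remove" a feature by replacing it with its baseline value are simplifying assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of each feature for one instance.

    A feature outside the coalition is replaced by its baseline value
    (a simplification of the expectation used by SHAP explainers).
    """
    n = len(instance)

    def value(coalition):
        x = [instance[j] if j in coalition else baseline[j] for j in range(n)]
        return model(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(x):
    # Hypothetical score: two linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[2]


instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)

# Additivity: contributions sum to the gap between the two predictions.
assert abs(sum(phi) - (toy_model(instance) - toy_model(baseline))) < 1e-9
```

The interaction term is shared equally between the two features involved, so the contributions are 3.5, 1.0 and 1.5. The enumeration grows exponentially with the number of features, which is why dedicated tree explainers are used in practice.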

3 Experiments and Analysis

The algorithms were run on the entire dataset, spanning from 2003 to 2020, and on two separate datasets covering Period 1 (2003-2011) and Period 2 (2012-2020). In all executions we employed an 80-20 split, allocating 80% of the data for training and 20% for testing, and used the default parameters. Throughout the process, our focus was on ensuring adequate class representation to allow the model to generalize common behaviors across event occurrences. We acknowledge the inherent imbalance in our dataset; however, introducing synthetic data was deemed inappropriate as it could distort variable importance and exacerbate biases. The performance measures for all algorithms and time periods are shown in Table 2. The model with the best accuracy (XGBoost) was selected for the analysis of feature importance with the SHAP method.
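One common way to keep class representation comparable between the training and test partitions of an imbalanced dataset is a stratified split. The text does not state that stratification was used, so the sketch below is an assumption about how an 80-20 split with preserved class proportions could look; the label names and counts are invented.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical, deliberately imbalanced class labels.
labels = ["drought"] * 50 + ["dry_season"] * 30 + ["landslide"] * 20
train_idx, test_idx = stratified_split(labels)

# Every example is assigned to exactly one partition.
assert len(train_idx) + len(test_idx) == len(labels)
```

No synthetic examples are created here: the split only controls which existing rows go to each partition, which is compatible with the decision not to oversample.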

Table 2. Model Performance Metrics across Time Periods
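For reference, the four reported metrics can be computed from true and predicted labels as sketched below. The averaging scheme is not stated in the text; macro averaging (an unweighted mean over classes) is assumed here, and the labels are toy values.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 score."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1_scores = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Toy labels: one example of class "a" is misclassified as "b".
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
metrics = classification_metrics(y_true, y_pred)
```

With imbalanced classes, accuracy is dominated by the frequent classes, whereas macro-averaged scores give each class equal weight; reporting both helps to read the results.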

As not all classes have enough examples, only the classes with the highest frequency in the dataset were considered for analysis (COBRADE classes in parentheses): Landslide (1.1.3.2.1), Storms and floods (1.2.1.0.0), Dry season (1.4.1.1.0), Drought (1.4.1.2.0), Forest fires (1.4.1.3.1) and Viral infectious diseases (1.5.1.1.0) (see Fig. 1). The SHAP beeswarm plot is used to present the 11 most important variables for each class in the period chosen for analysis: the entire period (2003-2020) or the separate periods (2003-2011 or 2012-2020), depending on the relevance identified in each case. The beeswarm plot ranks the features by their importance and shows how variation in the feature values can impact each class. For each feature, every instance in the dataset is represented by a dot whose horizontal position is the SHAP value of that feature. Blue represents low feature values and red represents high ones.
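The ranking shown in a beeswarm plot is usually derived from the mean absolute SHAP value of each feature. The sketch below reproduces that ordering for one class in plain Python; the feature names and SHAP values are invented for illustration and do not come from the study.

```python
def rank_features(shap_matrix, feature_names, top_k=11):
    """Rank features by mean absolute SHAP value (rows = instances)."""
    n_rows = len(shap_matrix)
    mean_abs = [
        sum(abs(row[j]) for row in shap_matrix) / n_rows
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


# Hypothetical per-instance SHAP values for a single class.
feature_names = ["agri_share", "industry_share", "hdi_income"]
shap_matrix = [
    [0.1, -0.4, 0.0],
    [-0.3, 0.2, 0.1],
    [0.2, -0.6, -0.1],
]

ranking = rank_features(shap_matrix, feature_names)
top_names = [name for name, _ in ranking]
```

Taking absolute values before averaging matters: a feature that pushes some instances strongly toward a class and others strongly away from it would otherwise average out to near zero and appear unimportant.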

It can be noted that date-related features, such as the year or month of the beginning and end of the decree validity, as well as the IBGE Code, appear frequently among the most important ones. Although they are important for determining the class, it is not possible to extract specific information from their values, except in some cases where the decree validity starts in specific years, for example Viral infectious diseases (1.5.1.1.0), which occurred mainly during the 2020 COVID-19 pandemic, as seen in Fig. 1. For this reason, we do not comment on these variables in the analysis that follows. It is important to mention that the IBGE Code is included as an attribute because it identifies the municipalities.

Fig. 1. Frequency of COBRADE classes over the years, 2003-2020.

Below, we comment on the variables that appear among the most important and provide relevant information for the objectives of the article, for each of the selected classes.

Fig. 2. Most important variables for class Landslide (1.1.3.2.1).

Landslide. A landslide is defined as the movement of a mass on a slope due to the action of gravity and the effect of some force [21]. In the Brazilian case, intense rainfall over these areas causes soil to slide (in the form of mud) or rocks and debris (including natural or planted flora) to fall, causing considerable destruction to human occupation in the path of the flow. Because of this, landslides can directly affect the share of agriculture in the municipal/state contribution when this productive activity is carried out on this type of land. This explanation accounts for both the low values of the agriculture share (3rd, Fig. 2a) and the high values of the industry share (1st, Fig. 2b). The importance of monetary value in high administration, observed more markedly in the second period, may be due to the significant volumes of extraordinary monetary resources accessed by the municipal executive power after the emergency decree, when this event reaches the fullness of its local destructive manifestation. This is because the damage caused, especially on trafficable roads and in inhabited areas, requires expeditious and exceptional technical intervention (carried out by public staff and equipment, combined with those of private companies hired on an emergency basis) to unblock and repair the affected roads as well as to contain unstable slopes. It is surprising, however, that in the second period (2012-2020) HDI income and longevity, both at high values, emerge and become significant in the model. This may indicate that the topographic and geomorphological characteristics of certain municipalities belonging to wealthy regions make the location more susceptible to this event. However, even though the regional socioeconomic characteristics, of a more general nature, are prosperous, the most susceptible terrain in the affected locations may lie in the territories of the most socioeconomically disaffiliated communities; that is, these are prosperous regions with severe asymmetries in terms of socio-spatial susceptibility.

Storms and Floods. These events, characterized by climatological/atmospheric and hydrological origins, are important in both periods (2003-2011 and 2012-2020), although they occur at different times in the five Brazilian macro-regions. Since this class of events is relatively predictable regarding the period of the year (the months) in which it will occur, if the Federative Unit (UF) and the municipalities are (or remain) unprepared to deal with this type of event, disasters become inevitable. The regularity with which these threats manifest throughout the year is what would allow the local public administration to achieve SDGs 11 (particularly sub-goals 11.5 and 11.7.b) and 13 (sub-goal 13.1), if it used the periods outside the rainy seasons to plan and execute preventive and preparatory measures to face the next occurrence and avoid a new disaster. The low and medium number of vulnerable people dependent on the elderly (1st and 6th, Fig. 3) denotes that the locations currently susceptible to disasters associated with this class of events are socioeconomically prosperous; therefore, the chances are greater that the local administration will have its own human, technological and financial resources to adopt an agenda in accordance with the sustainable development sub-goals mentioned above.

Fig. 3. Most important variables for class Storms and floods (1.2.1.0.0), 2003-2020.

Dry Season. Medium and high values for single mothers heading households and a high percentage of vulnerable people dependent on the elderly (1st and 4th, Fig. 4) indicate that the predominant social and economic characteristic associated with the dry season is structural poverty. This is confirmed by the importance, also demonstrated in the model, of the low values of two variables, added value of agriculture and added value of industry (3rd and 5th, Fig. 4). Thus, it is clear how far the localities experiencing disasters related to this class of event are from SDG 11 (sub-goal 11.5: protect the poor and people in vulnerable situations). It is important to highlight the unusual aspect of the variable high investment value in basic education (10th, Fig. 4) being associated with disasters caused by the dry season, in apparent contradiction with the above interpretation. It turns out that, in less economically dynamic locations, the amounts transferred from the federal level to the municipalities, as well as mandatory spending on education (25% of revenue), stand out against the spread of other expense items with which the local public administration must deal. Thus, the importance of high investments in education in the model signals a convergent direction with SDG 4, especially with regard to sub-goal 4.1 (ensure that all girls and boys complete primary and secondary education [...] that leads to relevant and effective learning outcomes).

Fig. 4. Most important variables for class Dry season (1.4.1.1.0), 2003-2020.

Drought. In period 1, the triad of low values in percentage of employment in the manufacturing industry, monetary value added to administrative activities and percentage of investments in basic education (7th, 4th and 9th, Fig. 5a) indicates that an inoperative public management goes together with a weakly dynamic economy and a precarious educational and cultural standard for the child population. Thus, the lack of commitment to the intellectual training of the new generations compromises their brightest occupational prospects, and both make it impossible for these sub-citizens to exercise greater social control over public policies. This context connects, in an unfavorable way, SDG 1 (eradication of poverty), particularly sub-goal 1.5.b (support accelerated investments in actions to eradicate poverty), to SDG 8 (decent work and economic growth), especially sub-goal 8.3 (promote development-oriented policies that support productive activities, generation of decent employment, entrepreneurship, creativity and innovation). Period 2 reconfirms the meanings of this same triad, although it does so through the variables low value added by agriculture and low HDI (5th, 9th, Fig. 5b), which clash with the same SDG sub-goals mentioned above (1.5.b and 8.3). The low HDI, however, also points unfavorably to the possibility of achieving SDG 3 (health and well-being) and SDG 4 (quality education), distancing localities in drought emergencies from an effectively sustainable local, state and regional development model.

Fig. 5. Most important variables for class Drought (1.4.1.2.0).

Forest Fires. Forest fires are well demarcated in the model in relation to the beginning and end months of the season in both periods. In the first period (2003-2011), the model indicates direct correspondence with high percentages of employment in the service sector (5th, Fig. 6a), which is probably harmed by atmospheric pollution and the destruction of forests, especially where it is linked to tourism services, as in the Brazilian Pantanal. In the second period (2012-2020), the lowest percentage of vulnerable people dependent on the elderly (8th, Fig. 6b) shows that these are economically prosperous locations. The regional characteristic (Central-West) and the added value of agriculture (large-scale production) stand out (7th, Fig. 6b), indicating the potentially huge economic loss to the primary sector. If, on the one hand, the possibility of specifying the location of these events (by region and municipality) would facilitate prevention strategies, which would converge with SDG 15, sub-goal 15.2 (stop deforestation, restore degraded forests), on the other hand, the recurrence of forest fires indicates that the actions taken are relatively ineffective and far from achieving this goal.

Fig. 6. Most important variables for class Forest fires (1.4.1.3.1).

Viral Infectious Diseases. Viral infectious diseases gained prominence in the model only in the second period (2012-2020), mainly due to the Covid-19 pandemic in 2020, which led to all regions of Brazil declaring an emergency, which is why the event’s starting year is the best demarcated variable in the model (1st, Fig. 7).

The set of high values for variables related to economic activities (high percentage of jobs in the service sector and added value of agriculture, 9th and 10th, Fig. 7) indicates the success of adaptive strategies in the face of a serious biological threat. Despite the mandatory change in collective behavior, through preventive measures of social isolation, it was possible to keep the production and distribution of essential and consumer goods that society needed flowing, which is in line with the purposes of SDG 8.8 (promote safe and secure working environments for all workers) and 3.d (strengthen the capacity [...] particularly of developing countries, for early warning, risk reduction and management of national [...] health risks).

In this case, even though the country’s health policies were unprepared to mitigate the situation that year, the health sector pursued SDG 3, sub-goal 3.9.b (support the research and development of vaccines and medicines for communicable and non-communicable diseases, which mainly affect developing countries).

Fig. 7. Most important variables for class Viral infectious diseases (1.5.1.1.0), 2012-2020.

4 Conclusions

With only seven years left for nations like Brazil to achieve the SDGs to which they had committed, United Nations Secretary-General António Guterres stated: “Unless we act now, the 2030 Agenda will become an epitaph for a world that might have been.” The findings above indicate that the large number of emergencies that occur in Brazil, indicative of weak institutional and social resilience to face geological, hydrological, climatic, biological and similar threats and prevent them from triggering disasters, is intertwined with cultural vulnerabilities and with historical economic and territorial issues that remain uncomfortably persistent in the face of multilateral aspirations. In other words, Brazil has been slow to induce sustainable inter-sectoral public policies that can reduce disaster risks by bringing their discussion into the content of quality basic education and into job training processes. There is still a lack of good strategies to protect subsistence and small-scale agriculture in the face of climate setbacks, as well as to expand slope containment measures and improve rainwater drainage systems, among others.

During data modeling, we encountered problems with inconsistent and missing data for several indicators, which could significantly impact conclusions and prevent us from exploring more recent periods for an up-to-date warning study. Despite an unbalanced dataset and the inability to create synthetic data, we obtained information that helped interpret events and achieved relevant performance using simple tree-based models. Future studies could expand the analysis with a wider range of classes and variables at both state and municipal levels, enhancing the understanding of emergency decrees. Improving the model would help express the association between emergencies and socioeconomic and socio-environmental variables. This would aid decision-makers in identifying underlying factors related to decrees, supporting a strong government commitment to investing in public institutions that collect, systematize, and disseminate updated open data, thus accelerating progress toward fulfilling the 2030 Agenda.