1 Introduction

Crime prediction has long been a topic of great interest, supporting law enforcement agencies in their decision-making processes [11]. In this context, machine learning (ML) techniques have played a major role, providing a multitude of analytical tools to scrutinize and predict crime-related phenomena [1, 8, 17, 20]. However, traditional machine learning algorithms often fail to capture complex spatiotemporal patterns inherent in non-random crime events [13]. A primary reason for their limited performance is that these models handle each data point independently, disregarding the interconnected nature of events over time and space [14].

Recent advancements in machine learning, particularly in Graph Neural Networks (GNNs) [31], bring new perspectives to these challenging scenarios. Specifically, GNNs have demonstrated remarkable success across various applications involving spatiotemporal data, including social network evolution [7], traffic prediction [12], and identification of urban structural patterns [30]. Despite their success in these domains, GNNs have not yet been fully explored for crime prediction tasks, where traditional ML models still dominate the field. Among the few works that utilize GNNs for crime analysis and forecasting is the model proposed by Han et al. [10], which leverages Graph Convolutional Networks (GCNs) and Long Short-Term Memory (LSTM) networks to capture spatial dependencies and temporal patterns of crimes. A similar methodology has been proposed by Jin et al. [13], in which Graph Convolutional Networks and Recurrent Neural Networks (RNNs) are combined to predict crime hotspots. These studies demonstrate the potential of GNNs for analyzing and forecasting crime events.

Effective data modeling and pre-processing are crucial when preparing datasets for Graph Neural Networks, particularly for crime prediction tasks. These processes dictate how raw data is transformed and integrated into graph structures, significantly impacting the model’s performance and its ability to learn and identify meaningful patterns. For instance, when street maps are used to generate a graph-based spatial discretization, the nodes of the graph can correspond to street segments, with edges connecting nodes whose street segments intersect. Additionally, attributes can be assigned to the nodes and edges of the graph. The challenge in making this assignment lies in integrating data from different modalities, discretizations, and resolutions. For instance, socioeconomic indicators, population education levels, and the number of points of interest (such as bars, bus stops, and schools) near each graph node or edge are typically discretized into census tracts and treated as static information. Climate data is provided at a few specific locations and varies over time, while crime events are geolocated and typically dispersed throughout the urban spatial domain. Properly integrating all these data sources into the nodes and edges of the graph is a problem for which no consolidated methodology currently exists.

This work presents a data modeling methodology capable of integrating data from different modalities and distinct discretization domains into a graph structure derived from street maps. The proposed data modeling method has been employed to structure the data used as input for two distinct spatiotemporal GNN models. Specifically, we evaluate and compare the Dynamic Self-Attention Network (DySAT) [26] and the Evolving Graph Convolutional Network (EvolveGCN) [22], both adapted to operate on crime data. DySAT employs a dynamic self-attention mechanism to model the temporal evolution of node representations, effectively capturing long-range dependencies in both space and time. EvolveGCN, on the other hand, uses recurrent neural network (RNN) architectures to dynamically update the parameters of the GCN layers, thus adapting to changes in the graph structure over time. The performance of DySAT and EvolveGCN is compared against a baseline to better understand the advantages and limitations of GNNs in the context of crime prediction.

Our Contributions. In summary, the main contributions of the present work are:

  1. Data modeling pipeline: We develop a robust data processing pipeline to integrate crime data (from São Paulo’s Department of Public Safety [27]) and urban infrastructure into a graph derived from a street map. This integration allows for a detailed and dynamic representation of the crime landscape in São Paulo. The proposed pipeline is versatile enough to be applied to various domains and geographical areas, requiring only that the data be geolocated.

  2. GNNs for Crime Data: We adapt two distinct spatiotemporal GNN architectures, DySAT and EvolveGCN, for crime prediction. To our knowledge, these models have not been previously applied to the context of crime prediction.

  3. Model Evaluation and Comparison: We thoroughly evaluate and compare DySAT and EvolveGCN using the processed São Paulo crime data, assessing their effectiveness in learning from historical data and predicting future crime occurrences.

This study explores the effectiveness of GNN architectures, specifically DySAT and EvolveGCN, in the crime domain by applying, evaluating, and comparing these models on crime data using our developed data pipeline. Integrating street maps and crime data, combined with advanced spatiotemporal modeling techniques, provides a novel approach to understanding and predicting crime dynamics in urban environments. Our findings offer valuable insights for researchers and practitioners working on crime prevention and public safety enhancement.

2 Related Work

Crime analytics has long been an essential area of study for urban safety and policy-making. Traditional studies primarily focus on statistical methods and classical machine learning models. For instance, Gorr and Harries [9] employed time-series regression models to forecast crime rates, concentrating primarily on the temporal elements of crime data. These models, while valuable, often failed to account for the spatial dependencies that inherently exist in crime patterns. Spatial analysis techniques, such as hotspot analysis and spatial clustering [6], have been used to identify areas with high crime rates. These methods highlight the geographic concentration of crimes but typically lack the temporal dimension essential for dynamic crime prediction.

Recent advancements in spatio-temporal data modeling have enabled the development of sophisticated analytical tools for studying complex phenomena [8]. For instance, Salinas et al. [25] proposed a city hub framework that integrates data from different sources on the nodes of a street map graph, aiming mainly to investigate crime-related phenomena. Although their goal was primarily to visualize and analyze crime events, they leveraged crime prediction models to support the analysis. To analyze human and drug trafficking crimes, Ahmed et al. [2] used entity resolution techniques to merge multiple state-wide and county-wide crime datasets into a geographic graph, combining incident reports, crime reports, and court records in a single dataset. However, their data do not account for urban-related variables, which are fundamental for crime analysis. The works by Salinas et al. [25] and Ahmed et al. [2] are examples of methods designed to integrate data into a graph structure. However, as far as we know, there is no consolidated methodology to perform such an integration.

Deep learning models such as Convolutional Neural Networks (CNNs) [21] and Recurrent Neural Networks (RNNs) [19] have been adapted for handling spatio-temporal data, with noteworthy success. Shi et al. [28], for instance, demonstrated the effectiveness of CNNs for spatial feature extraction and RNNs for capturing temporal dependencies in video data. Despite these advancements, such models frequently struggle with irregular and non-Euclidean data structures, which are natural real-world representations, particularly for urban analytics [4].

Graph Neural Networks (GNNs) have emerged as robust tools for modeling complex graph-structured data. The work by Kipf and Welling [15] on Graph Convolutional Networks (GCNs) indicated that GNNs can leverage connectivity information present in graph data to improve predictive performance. However, static GNNs are limited in their ability to model temporal dynamics, which is critical for many real-world applications, including crime prediction.

Dynamic GNNs extend static GNNs by integrating temporal information, making them suitable for spatio-temporal prediction tasks. For instance, the Dynamic Self-Attention Network (DySAT) [26] employs a self-attention mechanism to capture temporal dependencies within dynamic graphs. Similarly, Evolving Graph Convolutional Networks (EvolveGCN) [22] use recurrent neural networks to update node embeddings over time, effectively capturing the evolution of graph structures. These dynamic GNNs have shown substantial advancements in various applications, such as traffic prediction, social network analysis, and financial forecasting, effectively modeling the temporal evolution of graph data. In the context of crimes, Han et al. [10] developed a crime prediction framework using Graph Convolutional Networks to model the spatial dependencies between different regions and Long Short-Term Memory (LSTM) networks to capture temporal patterns. Jin et al. [13] introduced a Spatio-Temporal Graph Convolutional Network (ST-GCN) that combines GCNs with recurrent neural networks to predict crime hotspots, learning both spatial and temporal features from crime data.

The discussion above is intended only to contextualize our contribution. A more comprehensive discussion of the use of machine and deep learning for crime forecasting can be found in several surveys on the theme [5, 18, 29].

3 Data Modeling

For this study, we utilized crime data from the São Paulo Department of Public Safety [27], which includes detailed information on crime occurrences with latitude and longitude coordinates. To complement this data, we extracted street network data using the Osmnx Python package [3]. The street network data is used to construct graphs representing streets as nodes and their connections as edges. Each node is labeled to indicate the presence or absence of a crime, making the labels dynamic as they evolve over time.

The crime and street network data are integrated to form a spatiotemporal dataset. This integration is crucial for creating the dynamic graph structures required by our models. The resulting dataset consists of monthly snapshots of the graph spanning two years, resulting in 24 graphs in total. In this work, we used months as the time step, but our data modeling pipeline can also operate at daily or yearly granularity, depending on the analysis. We chose monthly intervals to reduce the sparsity of the data: if processed daily, the ratio of occurrences to non-occurrences would be around 1/100, whereas with monthly processing it is around 1/5.
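The monthly binning can be sketched in a few lines of pandas; the column names below are hypothetical stand-ins, not the actual schema of the Department of Public Safety records:

```python
import pandas as pd

# Hypothetical schema: one row per geolocated crime occurrence.
crimes = pd.DataFrame({
    "datetime": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03"]),
    "lat": [-23.545, -23.550, -23.548],
    "lon": [-46.635, -46.640, -46.633],
})

# Bin occurrences into monthly snapshots; each group labels one graph snapshot.
crimes["month"] = crimes["datetime"].dt.to_period("M").astype(str)
snapshots = {m: g for m, g in crimes.groupby("month")}

print(sorted(snapshots))  # ['2021-01', '2021-02']
```

Each group would then be projected onto the street graph to produce that month's node labels.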

In more detail, the graph construction process involves the following steps:

  1. Downloading the street graph for a given region: We use Osmnx [3] to download street graphs from OpenStreetMap. Our code accepts two types of parameters at this stage: a string naming the location from which we want to fetch the graph, or a polygon formed by a list of latitude/longitude coordinates defining the region of interest. In this study, we restricted our graph to the central region of São Paulo.

  2. Processing the street graph: Step 1 yields a directed graph in which streets are depicted as edges and intersections as vertices, encompassing diverse transportation networks. In this step, we unify all types of transportation networks and invert the roles of vertices and edges: street segments are redefined as vertices, while intersections become edges, resulting in an undirected graph.

  3. Extracting spatial features: For each vertex, we associate a feature vector containing spatial information. These features are static and sourced from the public Geosampa website [24], which provides tools for extracting various data about the city. The selected features include points of interest, such as the number of health, cultural, security, education, and social assistance services located within a predetermined distance from a node where a crime occurred.

  4. Projecting crime data: Crime data is extracted from São Paulo’s Department of Public Safety [27]. Each occurrence is associated with a latitude and longitude coordinate, which is then projected onto the respective street vertex. Crimes with coordinates outside the region defined in step 1 are removed.
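The role inversion of step 2 corresponds to taking the line graph of the street network. A minimal sketch with networkx, using a toy graph in place of the Osmnx download so it runs offline:

```python
import networkx as nx

# Toy stand-in for the directed street graph of step 1:
# intersections are vertices, street segments are directed edges.
G = nx.MultiDiGraph()
G.add_edges_from([("a", "b"), ("b", "c"), ("b", "d")])

# Step 2: drop direction and parallel edges, then swap roles —
# street segments become vertices, shared intersections become edges.
G_undirected = nx.Graph(G)
L = nx.line_graph(G_undirected)   # nodes of L are edges of G_undirected

print(sorted(L.nodes()))          # [('a', 'b'), ('b', 'c'), ('b', 'd')]
print(L.number_of_edges())        # 3 (all three segments meet at 'b')
```

With a real map, `G` would come from `osmnx.graph_from_place(...)` or `osmnx.graph_from_polygon(...)` instead of the toy edge list.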

The complete data processing pipeline can be found in our GitHub repository. The final result of the pipeline is illustrated in Fig. 1, which represents data from the month immediately following the training period. Street segments drawn in green represent non-crime nodes (streets), and those drawn in red represent crime nodes.

Fig. 1. Street Map Graph for the Central Region of São Paulo. Crime nodes are marked in red, and non-crime nodes are marked in green. (Color figure online)

4 Models

This paper utilizes two distinct graph neural network (GNN) architectures for crime prediction: DySAT [26] and EvolveGCN [22]. The choice of DySAT and EvolveGCN is motivated by their ability to handle dynamic graphs effectively [16], making them well suited to our analysis.

4.1 DySAT

The Dynamic Self-Attention Network (DySAT) [26] is an unsupervised graph embedding model that learns latent node representations capturing the dynamics of the graph. It computes node representations through self-attention along two dimensions (layers): structural neighborhood and temporal dynamics. The structural attention layer extracts features from local node neighborhoods in each snapshot through self-attentional aggregation. The temporal attention layer captures the temporal variations in graph structure over multiple time steps. DySAT leverages the self-attention mechanism to concentrate on the most relevant parts of the graph when updating node representations. This allows the model to weigh the importance of different nodes and edges, improving its ability to learn complex patterns. DySAT jointly learns both spatial (structural) and temporal patterns; by integrating these two types of information, it constructs more accurate and robust node embeddings.

The architecture of the model is as follows:

  • Input layer: The input to the DySAT model consists of snapshots of the dynamic graph over all time steps. Each snapshot describes the graph at a particular time step, with nodes and edges having associated features.

  • Spatial Self-Attention: Each snapshot undergoes a spatial self-attention layer to capture structural associations, estimating attention scores between nodes to aggregate information from relevant neighbors.

  • Temporal Self-Attention: Spatial embeddings are then processed through a temporal self-attention layer to learn temporal dependencies and how node representations evolve over time.

  • Output Layer: Final node embeddings are obtained by integrating spatial and temporal embeddings, used for tasks like node classification, link prediction, and anomaly detection. In this study, we used them for node classification.
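The core operation of both attention layers is scaled dot-product self-attention. The sketch below is a single-head NumPy illustration, not the official multi-head DySAT implementation: the structural pass masks attention to graph neighbours within one snapshot, while the temporal pass lets each node attend over its own history.

```python
import numpy as np

def self_attention(H, mask=None):
    """Single-head scaled dot-product self-attention over the rows of H.
    mask[i, j] == False blocks row i from attending to row j."""
    d = H.shape[-1]
    scores = H @ H.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ H

rng = np.random.default_rng(0)

# Structural pass: rows are the nodes of one snapshot, masked by adjacency
# (self-loops included so every node attends at least to itself).
H_nodes = rng.standard_normal((4, 8))
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]], dtype=bool)
Z_struct = self_attention(H_nodes, adj)

# Temporal pass: rows are one node's embeddings across 6 snapshots (no mask).
H_time = rng.standard_normal((6, 8))
Z_temp = self_attention(H_time)
print(Z_struct.shape, Z_temp.shape)  # (4, 8) (6, 8)
```

DySAT stacks multiple such heads per layer and adds learned projections; this sketch only isolates the attention mechanism itself.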

4.2 EvolveGCN

The main goal of Evolving Graph Convolutional Networks (EvolveGCN) [22] is to adapt GCNs, originally designed for static graphs, to effectively handle graphs that evolve continuously. This adaptation is crucial for real-world applications such as social networks, recommendation systems, and communication networks, where relationships between entities evolve over time. EvolveGCN achieves this by introducing an evolutionary approach to update GCN parameters as the graph evolves. It treats these parameters as temporal sequences and utilizes time-series models to predict and adjust them over time. The method focuses on two primary models for parameter evolution:

  • Hidden model: This model employs LSTM networks to capture temporal dependencies and evolve GCN parameters based on historical states.

  • Output model: In contrast, this model uses direct RNN architectures to update parameters based on output states from the previous layers.

In this work, we use the hidden model. Training EvolveGCN involves optimizing not only the base GCN’s parameters but also the parameters of the evolution model (LSTM or RNN). At each new time step, the GCN’s weights are updated using the selected evolution model, ensuring that the network adapts to the graph’s changing dynamics.
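The idea can be illustrated with a toy NumPy sketch, where a single tanh cell stands in for the LSTM of the hidden variant; the shapes and the recurrent cell are illustrative, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F = 4, 3                          # nodes, features per node

# Symmetrically normalized adjacency with self-loops, as in a standard GCN.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
A_self = A + np.eye(N)
deg = A_self.sum(axis=1)
A_hat = A_self / np.sqrt(np.outer(deg, deg))

W = rng.standard_normal((F, F))      # GCN weights at the first time step
U = rng.standard_normal((F, F))      # toy recurrent cell that evolves W

snapshots = [rng.standard_normal((N, F)) for _ in range(3)]
for X_t in snapshots:
    H_t = np.tanh(A_hat @ X_t @ W)   # GCN layer with the current weights
    W = np.tanh(U @ W)               # evolve the weights for the next snapshot

print(H_t.shape)  # (4, 3)
```

The key point is that the GCN weight matrix `W` is itself treated as a temporal sequence, updated by a recurrent cell at every snapshot rather than kept fixed.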

4.3 Baseline Models

We compare the GNN models against three traditional non-deep learning models: Support Vector Machine (SVM), Random Forest, and Logistic Regression. This analysis aims to highlight the predictive capabilities of GNNs in contrast to these established machine learning approaches. By evaluating the performance of these traditional models, we provide a benchmark to better understand the advantages and limitations of GNNs in the context of crime prediction.

5 Experimental Setup

We conducted experiments on a real-world crime records dataset collected in São Paulo from Jan. 1st, 2021, to Dec. 31st, 2022. We integrated the crime data with street map data and transformed them into dynamic graphs; the transformation is detailed in Sect. 3. A dynamic graph is a series of static graph snapshots, \(G = \{G^1, ..., G^t\}\), where t represents the number of time steps. In this work, each snapshot represents the crime data for one month; therefore, in total, we have 24 monthly graph snapshots. We generated 22 instances from the 24 snapshots using a sliding window approach, with each instance comprising three consecutive months (the first two for training and the last for evaluation). These were split in an 80/20 ratio into 17 instances for train/validation and 5 for testing, with the train/validation set further divided into 12 for training and 5 for validation. This method captures temporal dependencies while ensuring unbiased evaluation. Each graph has 4021 nodes and 19899 edges. The graph statistics are described in Table 1.
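The sliding-window instance generation described above can be sketched as:

```python
# 24 monthly snapshots -> 22 overlapping instances of 3 consecutive months.
snapshots = list(range(24))                       # stand-ins for monthly graphs
instances = [snapshots[i:i + 3] for i in range(len(snapshots) - 2)]

# 80/20 split into train/validation and test, then 12/5 train vs. validation.
train_val, test = instances[:17], instances[17:]
train, val = train_val[:12], train_val[12:]

print(len(instances), len(train), len(val), len(test))  # 22 12 5 5
```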

Table 1. Crime dataset summary.

5.1 Models Hyperparameters

We conducted several experiments varying the hyperparameters and selected the configuration offering the most advantageous trade-off among the metrics used. Ultimately, we chose the hyperparameters based on the stability of performance on the training and validation sets, which led to a significant improvement.

DySAT. In our DySAT experiments, we selected hyperparameters to balance model complexity and efficiency while avoiding overfitting. To address the dataset’s imbalance, we used ClassWeights with a 1:4 ratio for the majority and minority classes. We set the learning rate to 0.01 and applied a weight decay of 0.0005 to regularize the model. Dropout rates of 0.4 and 0.5 were used for the spatial and temporal attention layers, respectively, to enhance robustness. The model’s architecture included 16 attention heads and 128 units in the spatial and temporal layers, enabling it to capture complex relationships while maintaining computational efficiency.

Table 2. Models Hyperparameters.

EvolveGCN. Three primary hyperparameters in EvolveGCN significantly impact its performance. The first, ClassWeights, assigns weights to classes, with the second value weighting the target class (class 1) and the first value weighting class 0. The second, NumberOfSteps, defines the number of temporal layers (e.g., months) the model considers for predictions. The third, FeaturesPerNodes, determines the input and output dimensions between network layers, adjusting the state vector’s size during processing. Table 2 provides the selected hyperparameter values.

Baselines. We use the default parameters provided by the Scikit-Learn Python library [23] for all baseline methods. However, due to the dataset’s imbalanced nature, we set the ClassWeight parameter to 'balanced' across all of them. This adjustment addresses the class imbalance that would otherwise degrade performance. The input is a list of dataframes, one per time step, each holding the node features at that step. To prepare the data for the traditional baselines, we employed a sliding window of size 3: we selected three consecutive dataframes, concatenated the first two for training, used the third for testing, and averaged the results across all sliding windows.
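A minimal sketch of this baseline setup with Scikit-Learn, using a synthetic imbalanced dataset in place of the node-feature dataframes:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic imbalanced data standing in for two concatenated monthly dataframes.
X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

# class_weight='balanced' reweights classes inversely to their frequency.
models = {
    "logreg": LogisticRegression(class_weight="balanced", max_iter=1000),
    "rf": RandomForestClassifier(class_weight="balanced", random_state=0),
    "svm": SVC(class_weight="balanced"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 2))
```

In the actual pipeline, the fit/score pair would be repeated for each window and the metrics averaged across windows.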

6 Evaluation

We used precision, recall, and F1-score to compare the models’ performance. These metrics are calculated for both the positive class (crime occurrences) and the negative class (no crime occurrences).
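Per-class metrics of this kind can be computed with Scikit-Learn; the labels below are illustrative only:

```python
from sklearn.metrics import precision_recall_fscore_support

# Illustrative labels: 1 = crime occurrence, 0 = no occurrence.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

# average=None returns one value per class: index 0 = negative, 1 = positive.
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None)
print(round(prec[1], 3), round(rec[1], 3), round(f1[1], 3))  # 0.667 0.667 0.667
```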

6.1 Metrics Performance

In both deep learning models, as illustrated in Fig. 2, the training and validation loss curves are closely aligned and consistently decrease with the increase in epochs. The close alignment of these curves indicates that the models generalize well to unseen data, providing strong evidence that overfitting is not occurring. This observation is further affirmed by comparing validation and test metrics shown in Fig. 3. However, during the training of the EvolveGCN model, an unusual spike occurs around epoch 800. We hypothesize that this may be due to the model encountering a local valley in the loss landscape. Further investigation is required to confirm this hypothesis.

Fig. 2. Training and validation loss over epochs of the GNN models.

DySAT. In the context of highly imbalanced data, achieving high F1 scores is particularly challenging but essential for effective model performance. In Fig. 3a, the increasing F1 scores for both validation and test sets indicate that the DySAT model is adept at handling the imbalanced dataset. An F1 score of 0.57 on the validation set demonstrates that the model reasonably balances precision and recall: DySAT correctly identifies many positive instances while minimizing false positives and false negatives. Similarly, an F1 score of around 0.55 on the test set indicates that the model generalizes well to unseen data. Additionally, Table 3 reaffirms DySAT’s superior performance compared to the other methods. The DySAT model’s ability to manage the complexities of imbalanced data and maintain consistent performance across validation and test sets highlights its potential for real-world applications, particularly in areas where accurate minority-class predictions are critical.

Fig. 3. F1 scores over epochs of the GNN models.

EvolveGCN. Despite the EvolveGCN model’s metrics not ranking among the highest, plotting the model’s results on the map for hotspot analysis reveals an interesting outcome. As illustrated in Fig. 4, the metrics are not as high because the model tends to classify areas as safe (green) or unsafe (red) only when there is strong evidence to do so, which explains the large number of uncertain (blue) areas and the higher recall compared to all other models. This behavior has advantages and disadvantages: while the cautious approach may lead to more reliable identification of true hotspots, it can also overlook subtle but important patterns or changes in the data, thereby affecting the model’s overall precision and specificity.

Table 3. Performance Metrics.
Fig. 4. Model predictions projected onto the street graph. (Color figure online)

Baselines. When it comes to traditional machine learning models, Random Forest achieved results comparable to the DySAT model for class 1 metrics, while Logistic Regression performed similarly to the EvolveGCN model, as shown in Table 3. In contrast, the SVM model demonstrated notably poor performance. Despite the metrics being similar to those of graph-based models, there is a crucial caveat to consider: traditional models do not take into account the temporal dynamics of our data. Consequently, they fail to generalize their predictions over time. This limitation becomes particularly evident when plotting predictions on a map for hotspot construction, as we show in Fig. 4.

In these figures, green represents “safe” zones, red indicates “dangerous” zones, and blue signifies model uncertainty. Notably, the GNN models produce distinct safe and dangerous regions, which we discuss in the next section. In the traditional models, however, these regions are less clear, either because the models are often uncertain about their predictions (SVM, Logistic Regression, and even Random Forest show more blue lines) or because they make highly heterogeneous predictions across the region of interest.

6.2 Hotspot Analysis

As shown in Fig. 5, predictions produced by GNN methods, which consider the temporal dynamics of the events under study, can pinpoint several notable regions on the map, marked A, B, C, D, E, and F. These regions are significant as they align with known areas of high or low crime rates in central São Paulo, as shown in Fig. 1. A detailed description of each region is provided below:

  A. Encompasses the entire República neighborhood in São Paulo, which is widely known for its high risk of robbery and theft.

  B. Located just above the República neighborhood, this region is identified by our model as a safer area. Examination of the region reveals the presence of several banks, including Bradesco, Santander, and Itaú, in the nearby blocks.

  C. Includes the surroundings of the Estação da Luz in São Paulo, a well-known high-traffic area with a reputation for being dangerous.

  D. Covers the vicinity of Largo da Concórdia square in São Paulo, near a Lojão do Brás. This area, where most transportation is on foot, experiences a high incidence of pickpocketing, leading to many cell phone thefts.

  E. Encompasses region A and is very close to region C.

  F. Encompasses region D.

Fig. 5. Noteworthy predicted regions. (Color figure online)

In our hotspot maps, shown in Figs. 4 and 5, we used soft predictions for both baseline and GNN models to better represent prediction confidence. Probabilities were mapped into three regions: non-crime (green) for probabilities near 0, uncertain (blue) for probabilities around 0.5, and crime (red) for probabilities near 1. This mapping effectively highlights areas of high and low confidence.
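A sketch of such a probability-to-colour mapping; the 0.2 band width around 0.5 is a hypothetical choice, not the value used in our maps:

```python
def color_for(prob, band=0.2):
    """Map a soft crime probability to a map colour: green (non-crime) near 0,
    blue (uncertain) around 0.5, red (crime) near 1. Band width is illustrative."""
    if prob < 0.5 - band / 2:
        return "green"
    if prob > 0.5 + band / 2:
        return "red"
    return "blue"

print([color_for(p) for p in (0.05, 0.45, 0.95)])  # ['green', 'blue', 'red']
```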

When comparing traditional machine learning models to Graph Neural Networks (GNNs), the latter demonstrate significant advantages in capturing the complex spatial and temporal dependencies inherent in crime data. Among the traditional models tested (Logistic Regression, Random Forest, and SVM), Random Forest achieved the highest F1-score, slightly surpassing the GNN models, particularly DySAT. However, practical application reveals the GNNs’ superiority. When visualizing predictions on a street map, DySAT effectively identifies crime hotspots, with clear regions of red (crime nodes) and green (non-crime nodes) and minimal uncertainty (blue lines), as shown in Fig. 5. This contrasts with Random Forest, which shows more uncertainty and less distinct regions. Thus, despite slightly lower scores, the GNNs’ practical performance in hotspot identification demonstrates their superior effectiveness over traditional models. While somewhat subjective, this visual analysis enhances our understanding of spatial crime patterns.

7 Conclusion

This study demonstrates the potential of Graph Neural Networks (GNNs), specifically DySAT and EvolveGCN, in improving crime prediction accuracy by effectively modeling crime data’s dynamic and interconnected nature. The developed data modeling pipeline, integrating crime data with street map graphs, enables a detailed representation of crime dynamics in São Paulo. Our evaluation and comparison of DySAT and EvolveGCN reveal their effectiveness in learning from historical data and predicting future crime occurrences. These findings highlight the need for further exploration and adoption of GNNs in the crime prediction domain, offering valuable insights for enhancing public safety strategies and crime prevention efforts.