1 Introduction

Intelligent Transportation Systems (ITS) have recently gained attention due to rising road safety and efficiency concerns. Traffic prediction is crucial for route planning, optimizing vehicle dispatching, controlling traffic congestion, etc. [1, 2]. Traffic prediction (travel time, flows, speeds, occupancy, and demand) utilizes a trainable function to analyze past traffic data and forecast future traffic conditions. It relies on two primary types of data: traffic flow, indicating total detected vehicles over a period, and traffic speed, representing average vehicle velocity in the same area during the same time frame [3]. This study will refer to traffic flow and speed as traffic.

Forecasting traffic presents a formidable challenge given the non-linear, time-dependent, and spatio-temporal properties of traffic time series [1, 4]. Consequently, developing an accurate traffic prediction model is imperative. Accordingly, a wide range of traffic forecasting approaches have been presented in the literature, which can be broadly categorized into conventional parametric statistical models and non-parametric machine learning (ML) and deep learning (DL)-based methods [1, 5]. AutoRegressive Integrated Moving Average (ARIMA) models and their variants, including Seasonal ARIMA (SARIMA) and Vector ARIMA (VARIMA), are commonly used statistical techniques in traffic prediction, as discussed in [5, 6]. However, these models encounter challenges due to the complexity, non-stationarity, and non-linearity of traffic data. Additionally, ARIMA models generally necessitate a substantial volume of historical data to perform effectively, which may not always be accessible [5, 7].

To tackle these challenges, ML and DL methods are extensively utilized in traffic prediction tasks owing to their capacity to automatically extract crucial features from historical traffic data, thus eliminating the need for complex mathematical model design. Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and their variants, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have shown remarkable performance [5, 8]. In addition, Graph Neural Networks (GNNs) [4] are considered state-of-the-art methods particularly well-suited for traffic forecasting challenges due to their capacity to capture spatial dependencies. As discussed in [4], several types of GNNs have been developed, including graph recurrent neural networks (GRNN) [9], graph-structured recurrent neural networks (GSRNN) [10], and graph LSTM (GLSTM) [11], among others. Attention mechanisms and transformers represent another category of DL techniques that have proven effective in the field of traffic prediction [5, 12, 13].

Despite the considerable success and frequent usage of DL methods, they suffer from limitations in interpretability, parsimony, and computational cost in terms of time and resources. Fuzzy Time Series (FTS) forecasting methods offer an alternative due to their simplicity, interpretability, updatability, scalability, and capacity to handle uncertainty and complex systems [14]. FTS is a methodology for time series forecasting (TSF) that converts a numerical time series into a linguistic time series representation using fuzzy sets; the transitions between the fuzzy sets in the historical data are then used to derive forecasting rules. Fuzzy Cognitive Maps (FCMs), a subset of weighted FTS methods, represent a specialized category of fuzzy modeling and forecasting techniques [15]. FCMs, as interpretable ML models, have a good capacity for dealing with uncertainty and effectively simulating the dynamic behavior of non-linear and complex systems [15, 16]. Few studies among the proposed FTS methods have utilized FCMs for traffic prediction; examples include the interpretable deep attention FCM in [17], the interpretable deep FCM (DFCM) in [18], and the FCM learned with an evolutionary algorithm (FCMEVOL) in [19].

From another perspective, existing centralized ML and DL traffic prediction methods necessitate collecting raw data for model training, leading to significant privacy risks. DL algorithms typically require vehicles to transmit raw data, including sensitive information like location, to a central server for training the proposed models in a centralized manner [20]. If the central server is compromised, the entire forecasting system is vulnerable to a single point-of-failure attack, risking severe privacy breaches for vehicles. Additionally, heavy reliance on extensive data for centralized training increases communication overhead and the risk of data leakage and pollution [21]. To address these problems, federated learning (FL) [22], which shares model updates without exchanging raw data, has recently been introduced as an efficient solution [23]. FL performs local learning model calculations and then sends the local learning model parameters to the central server for global model aggregation. Accordingly, it can avoid direct interaction with original data and achieve the trade-off between model performance, communication overhead, and data privacy.

To the best of our knowledge, only two investigations have explored the combination of FL and FCMs. The authors in [24] implemented FL methods for FCMs in medical applications, specifically for classifying dengue in Colombian cities. A blind federated learning approach without an initial model was introduced in [25], proposing two innovative methodologies for PSO-based FL of FCMs to classify breast cancer and demographic features related to adult income. Consequently, there are no references regarding federated FCMs in forecasting applications. Thus, the central contribution of this study is to fill this gap by presenting, for the first time in the literature, a novel univariate forecasting method termed FL-RHFCM. FL-RHFCM is a hybrid method integrating FL and randomized high-order FCM (R-HFCM) to predict traffic flow and traffic speed. R-HFCM [26] is a new class of FCMs combining FTS, FCM, and echo state networks (ESN), trained via least squares (LS). It functions as an ESN with input, reservoir, and output layers. The reservoir contains L parallel sub-reservoirs with unalterable weights, a unique feature that sets R-HFCM apart from typical FCM-based methods. LS is then applied to train the output layer and determine the LS coefficients (\(\lambda _i\)). Therefore, in FL-RHFCM, each client is trained locally to obtain the optimal values of \(\lambda _i\). These values are then shared with the server, where they are aggregated by averaging to update the global model. Finally, the clients are updated locally, receiving the updated global model parameters from the aggregator. This process is executed for 15 rounds using the Flower platform.

The rest of this paper is structured as follows: Sect. 2 gives a brief review of traffic prediction; Sect. 3 provides a comprehensive presentation of the proposed approach; Sect. 4 describes the computational experiments; Sect. 5 details the results and discussion; and lastly, Sect. 6 encapsulates the paper's findings and delineates potential directions for future research.

2 Literature Review

Plenty of univariate and multivariate traffic forecasting techniques have been developed in the literature [3, 27], which can be grouped into three main classes: statistical methods, traditional ML methods, and DL methods. According to [27, 28], statistical methods, particularly ARIMA and its variants such as SARIMA, SARIMAX, Kohonen ARIMA (KARIMA), and Vector ARIMA, have been frequently employed to predict traffic. These methods are generally suitable for simpler, less complex datasets but require a substantial volume of historical data to perform effectively; consequently, they face limitations when dealing with non-stationary, non-linear, spatio-temporal, and complex datasets. ML models excel in generalization and adaptability to changing traffic network conditions compared to statistical methods [27, 28]. They are typically divided into three categories, namely feature-based methods, Gaussian process models, and state-space models, all capable of handling non-linear and complex time series data [29]. Each has its pros and cons, as explained in [27]. Artificial neural networks (ANN) [30], k-nearest neighbors (KNN) [31], and support vector regression (SVR) [32] are some examples of this category.

DL methods are extensively used in traffic prediction tasks because of their strong capacity to capture stochastic and nonlinear relationships in traffic data [27]. Based on this reference, Multi-layer Perceptrons (MLP), Autoencoders (AE), CNN, RNN, LSTM, GNNs, Restricted Boltzmann Machines (RBM), Deep Belief Networks (DBN), Graph Convolutional Networks (GCN), Wavelet Neural Networks (WNN), and attention-based models have been employed to predict traffic. The authors in [3] also reviewed 37 DL traffic forecasting models for predicting spatial and/or temporal traffic flow and speed, confirming the variety of DNN methods. For instance, in [33], DBN, k-means clustering, and Dempster-Shafer theory are utilized for traffic flow prediction, while in [34], a CNN with Pearson correlation-based theory is employed to predict traffic speed. GNNs combined with DL form another family of traffic forecasting methods, elaborated in [4], including models such as graph LSTM (GLSTM), the graph multi-attention network (GMAN), and the graph attention temporal convolutional network (GATCN), among others. The next sub-group of DNN forecasting methods comprises hybrid methods such as LSTM with bidirectional LSTM [35], encoder-decoder LSTM with an FNN-based attention module [36], encoder-decoder GRU with graph diffusion [37], and encoder-decoder GRU with FNN and a graph attention network [38], among others.

Fuzzy-based forecasting models are the last category of traffic forecasting techniques; they can alleviate some limitations of DL methods regarding interpretability, training time, scalability, and complexity. A new fuzzy-based CNN method in [39], an evolving fuzzy neural network (EFNN) in [40], and an adaptive hybrid fuzzy rule-based system in [41] are some studies in this group. Some researchers have also combined fuzzy logic and DL, such as the fuzzy deep convolution network (FDCN) in [42], the spatiotemporal fuzzy-graph convolutional network model in [43], and a combination of fuzzy logic, LSTM, and decision trees (DTs) in [44].

Although centralized methods forecast accurately, data privacy poses significant challenges. To address this, some researchers have utilized decentralized FL methods to trade off prediction accuracy against privacy preservation. An integration of FL and GRU in [45], a combination of FL and attention-based spatial-temporal GNN (ASTGNN) in [46], graph attention networks (GAT), LSTM, and FL in [21], clustering-based hierarchical and two-step-optimized FL in [47], federated community GCN (FCGCN) in [48], and FL with asynchronous GCN in [49] are some examples of FL-based models for traffic flow and speed forecasting.

FCMs, as a neuro-fuzzy method, have shown considerable success in capturing the dynamics of various complex systems and effectively handling uncertainties [26]. Despite this, there is a notable gap in the literature regarding using decentralized FCMs for predictive applications, such as traffic forecasting. To address this, our research introduces a novel federated FCM forecasting model in this field.

Fig. 1. Generic structure of the proposed R-HFCM technique

3 Proposed FL-RHFCM Method

As explained in [15], FCMs are composed of a set of concepts and signed directed connections, known as weights. Therefore, weight matrices form the core element of each FCM, and various training methods have been developed to optimize both the weights and structure.

This research introduces a novel hybrid univariate forecasting method called FL-RHFCM, a fusion of R-HFCM and FL, to predict traffic flow and speed. R-HFCM [26] is a centralized FCM-based forecasting method with a different structure from regular FCMs. Accordingly, this section is divided into two subsections: Sect. 3.1 details R-HFCM, and Sect. 3.2 presents our proposed FL-RHFCM technique.

3.1 Centralized R-HFCM Method

R-HFCM is a hybrid method combining the concepts of FTS, FCMs, and echo state networks (ESN) [26]. More precisely, as shown in Fig. 1, R-HFCM consists of three layers: the input layer, reservoir, and output layer. The reservoir layer is composed of a specific number of sub-reservoirs (L). Each sub-reservoir employs the HFCM-FTS method [50], with weights randomly initialized following the ESN approach and kept constant throughout training. Then, output from each sub-reservoir, generated from the defuzzification step, is fed to the output layer. Finally, LS is employed to train the output layer and identify the optimal values of LS coefficients (\(\lambda _i\)).

From another point of view, R-HFCM can be seen as an ESN (or reservoir computing) method such that only the output layer is trainable. In R-HFCM, the LS coefficients (\(\lambda _i\)) serve as the sole trainable parameters.

Thus, R-HFCM is not focused on training the weights among the concepts. This property distinguishes R-HFCM from other FCM-based methods and makes it much faster than methods trained via population-based techniques such as genetic algorithms (GA) or particle swarm optimization (PSO). Figure 2 shows a simple topology of the R-HFCM method with \(L=2\), \(k=5\) (concepts), and \(\varOmega =2\) (order). In this case, LS trains the output layer to find three coefficients (\(\lambda _0\), \(\lambda _1\), and \(\lambda _2\)); in general, the number of LS coefficients depends directly on the number of sub-reservoirs and is equal to \(L+1\). The training and forecasting procedures are described in detail as follows:

Fig. 2. A simple example of an R-HFCM model with \(L = 2\), \(\varOmega = 2\), and \(k = 5\).

A. Training Process

1) Weight and bias initialization: The weights between the concepts \(C_i\) and \(C_j\) are randomly drawn from a uniform distribution in [−1, 1] and scaled following the ESN weight-initialization strategy:

$$\begin{aligned} \textbf{W} = \epsilon \cdot \frac{\mathbf {W^{rand}}}{\mathbf {\rho _{max}}(\mathbf {W^{rand}})} \end{aligned}$$
(1)

where \(\mathbf {\rho _{max}}(\mathbf {W^{rand}})\) is the largest absolute eigenvalue of \(\mathbf {W^{rand}}\), and \(\mathbf {\epsilon } = 0.5\). The bias vector \(\mathbf {w^0}\) is initialized similarly:

$$\begin{aligned} \mathbf {w^0} = \epsilon \cdot \frac{\mathbf {w_{rand}^0}}{\textbf{S}} \end{aligned}$$
(2)

where \(\textbf{S}\) is the maximum singular value of \(\mathbf {w_{rand}^0}\).
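As an illustration, the initialization in Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is only an interpretive sketch, not the authors' code: the function name and the handling of one weight matrix per lag are our assumptions.

```python
import numpy as np

def init_weights(k, omega, eps=0.5, rng=None):
    """Sketch of the ESN-style initialization in Eqs. (1)-(2): draw uniform
    weights in [-1, 1], then rescale by the spectral radius (weights) or the
    largest singular value (bias) so the reservoir dynamics stay bounded."""
    rng = np.random.default_rng(rng)
    # One k x k weight matrix per order lag (Omega matrices in total).
    Ws = []
    for _ in range(omega):
        W_rand = rng.uniform(-1.0, 1.0, size=(k, k))
        rho_max = max(abs(np.linalg.eigvals(W_rand)))  # largest |eigenvalue|
        Ws.append(eps * W_rand / rho_max)              # Eq. (1)
    # Bias vector scaled by its largest singular value S (Eq. 2).
    w0_rand = rng.uniform(-1.0, 1.0, size=(k, 1))
    s_max = np.linalg.svd(w0_rand, compute_uv=False)[0]
    w0 = eps * w0_rand / s_max
    return Ws, w0.ravel()
```

After scaling, every \(\mathbf{W}^l\) has spectral radius exactly \(\epsilon = 0.5\), which is the contraction property ESNs rely on.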

2) Partitioning: Firstly, the Universe of Discourse (UoD) is determined using the formula \(UoD = [\min (Y_i)-D_1, \max (Y_i)+D_2]\), where \(D_1 = \min (Y_i)\times 0.2\) and \(D_2 = \max (Y_i)\times 0.2\). Then, UoD is partitioned into k even-length intervals (representing concepts of FCMs) using grid partitioning and the triangular membership function.

3) Fuzzification: In this step, the activation state \(a_i(t)\) of each concept \(C_i \in C\), \(\forall y(t) \in Y\), is computed to convert the crisp time series Y into a fuzzy series A. Each fuzzified sample \(a(t) \in A\) is given by \(a_i(t) = \mu _{C_i}(y(t))\), \(\forall i\in \{1, \ldots , k\}\).
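Steps 2 and 3 can be illustrated with a minimal sketch, assuming evenly spaced triangular membership functions with 50% overlap (the function names are ours, and the sign handling of \(D_1\), \(D_2\) follows the formula above literally, which assumes positive-valued traffic series):

```python
import numpy as np

def grid_partition(Y, k):
    """Step 2: UoD = [min(Y) - D1, max(Y) + D2] with D1 = 0.2*min(Y) and
    D2 = 0.2*max(Y), split into k evenly spaced triangular concepts."""
    lb = Y.min() - Y.min() * 0.2
    ub = Y.max() + Y.max() * 0.2
    return np.linspace(lb, ub, k)   # midpoints mp_i of the concepts C_i

def fuzzify(y, centers):
    """Step 3: activation a_i(t) = mu_{C_i}(y(t)) under triangular MFs."""
    step = centers[1] - centers[0]
    return np.maximum(0.0, 1.0 - np.abs(y - centers) / step)
```

With this 50% overlap, at most two concepts are active for any sample and their memberships sum to one inside the UoD.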

4) Activation: The state value of each concept within each sub-reservoir at time t+1 is updated based on the following formula:

$$\begin{aligned} \textbf{a}_j(t+1) = f\left( \textbf{w}^0+ \sum _{l=1}^\varOmega \textbf{W}^{l} \cdot \textbf{a}(t-l+1) \right) \end{aligned}$$
(3)

5) Defuzzification: The output of each sub-reservoir is generated in this step using the following equation:

$$\begin{aligned} \hat{y}_j(t+1)=\dfrac{\sum _{i=1}^k a_{ji}(t+1) \cdot mp_{i}}{\sum _{i=1}^k a_{ji}(t+1)} \end{aligned}$$
(4)

where \(a_j(t+1)\) is the activation state of each concept at time \(t+1\) and \(mp_i\) represents the center of each concept \(C_i\).
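Steps 4 and 5 (Eqs. 3 and 4) can be sketched as follows; this is an illustrative reading, not the authors' implementation, and the history layout is our assumption:

```python
import numpy as np

def activate(history, Ws, w0, f=np.tanh):
    """Eq. (3): update concept states from the last Omega fuzzified samples.
    `history` is a list [a(t), a(t-1), ...] of length Omega, most recent
    first, so Ws[l-1] multiplies a(t - l + 1)."""
    s = w0.copy()
    for lag, W in enumerate(Ws):
        s = s + W @ history[lag]
    return f(s)

def defuzzify(a, centers):
    """Eq. (4): activation-weighted average of the concept midpoints mp_i."""
    return float((a * centers).sum() / a.sum())
```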

6) Calculating LS coefficients: From the outputs \(\hat{y}_j(t+1)\), \(\forall j \in \{1,\ldots ,L\}\), and for each input sample \(y(t)\in Y\), \(t=1\ldots T\), a design matrix X is formed for the linear system \(Y = X\lambda \). The LS method is then used to find the coefficient vector \(\lambda \) that minimizes the mean squared error.
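A short sketch of this step using NumPy's least-squares solver; the column of ones providing the intercept \(\lambda _0\) is our reading of why there are \(L+1\) coefficients:

```python
import numpy as np

def fit_ls(sub_outputs, y_true):
    """Stack the L sub-reservoir forecasts into a design matrix
    X = [1, yhat_1, ..., yhat_L] and solve Y ~ X @ lambda by least squares.
    `sub_outputs` has shape (T, L); returns lambda of length L + 1,
    with the intercept lambda_0 first."""
    T, L = sub_outputs.shape
    X = np.column_stack([np.ones(T), sub_outputs])
    lam, *_ = np.linalg.lstsq(X, y_true, rcond=None)
    return lam
```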

B. Forecasting Process

1) Fuzzification: Same as the third stage of the training process.

2) Activation: Same as the fourth stage of the training process.

3) Defuzzification: First, the defuzzified value of each sub-reservoir is calculated using Eq. 4. Then, the linear combination of \(\hat{y}_j(t+1)\) and \(\lambda _j\) is used to generate the final predicted value as follows:

$$\begin{aligned} \hat{y}_{f}(t+1)=\lambda _0+\sum _{j=1}^{L} \lambda _{j}\cdot \hat{y}_{j}(t+1) \end{aligned}$$
(5)

3.2 Decentralized R-HFCM Method

This section introduces FL-RHFCM as a decentralized adaptation of the R-HFCM technique, wherein FL is incorporated into R-HFCM to enhance data privacy. Thus, in FL-RHFCM, data is not shared among the users, and each user is trained on its local dataset. Figure 3 shows our proposed FL-RHFCM approach. Consider m users, denoted \(\{U_1, U_2,\dots , U_m\}\), each associated with its local data \(\{Y_1, Y_2, \dots , Y_m\}\). In FL-RHFCM, the UoD for each client is calculated using \(UoD = [\min (Y_i)-D_1, \max (Y_i)+D_2]\), as expressed earlier.

Fig. 3. The generic architecture of FL-RHFCM, considering m clients with m different datasets

In Fig. 3, \(a_i\) and \(b_i\) respectively represent \(\min {(Y_i)}\) and \(\max {(Y_i)}\), \(\forall i \in \{1,\ldots ,m\}\). Furthermore, \(w_0\), W, and \(\lambda _{avg}\) represent the bias, weights, and average LS coefficients, respectively. As mentioned in Sect. 3.1, the only trainable parameters in the R-HFCM method are the LS coefficients. Accordingly, each local node trains a local model and sends it to the server node for aggregation. More specifically, the steps of our proposed federated method are as follows:

  1. Step 1: The server initializes \(w_0\), W, a, b, and \(\lambda \), then transmits them to the clients. Note that \(w_0\) and W remain fixed and identical across all executions.

  2. Step 2: Each client \(U_i\) is trained on its local time series \(Y_i\), computing \(a_i\), \(b_i\), and \(\lambda _i = \{\lambda _{0i},\lambda _{1i},\ldots ,\lambda _{Li}\}\).

  3. Step 3: The obtained \(a_i\), \(b_i\), and \(\lambda _i\) are transferred to the server for aggregation. The server updates a, b, and \(\lambda \) as follows: \( a=\min _{i=1}^{m} a_i\), \( b=\max _{i=1}^{m} b_i\), and \(\lambda =\frac{1}{m} \sum _{i=1}^{m} \lambda _i\).

  4. Step 4: The updated values are shared with the users, and the performance of the proposed model for each user is calculated in terms of accuracy metrics.

  5. Step 5: Steps 2 to 4 are repeated for 15 rounds.

It is worth noting that we deployed FL-RHFCM with three client nodes, each trained using distinct time series datasets, as explained in the following section.
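The server-side aggregation rule of Step 3 can be illustrated with a short sketch. The actual system runs on the Flower platform, so this is only a conceptual reading of the aggregation logic, with hypothetical names:

```python
import numpy as np

def aggregate(client_updates):
    """Server-side update of Step 3: the global UoD bounds take the min/max
    over clients, and the LS coefficient vectors are averaged (FedAvg-style).
    `client_updates` is a list of (a_i, b_i, lambda_i) tuples."""
    a = min(u[0] for u in client_updates)
    b = max(u[1] for u in client_updates)
    lam = np.mean([u[2] for u in client_updates], axis=0)
    return a, b, lam
```

Because only the \(L+1\) scalars in \(\lambda _i\) plus two bounds travel per round, the per-round communication cost is tiny compared with exchanging deep model weights.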

4 Computational Experiments

4.1 Case Studies

Two different traffic datasets are exploited to validate our approach:

  1. Traffic flow (hourly)Footnote 1: This dataset, collected from sensors, contains 48,120 observations of the number of vehicles at four junctions;

  2. Traffic speed (minutely)Footnote 2: contains 6 road segments of Xueyuan Road in Beijing, China.

Fig. 4. Plotting of 8,000 samples from three junctions (J1, J2, and J3)

Table 1. Summary statistics of the 6 datasets used in the experiments

Fig. 5. Plotting of 8,000 samples from three road segments (R3, R4, and R5)

This study employs datasets from three junctions (J1, J2, and J3) and three segments (R3, R4, and R5). Detailed information about these time series is reported in Table 1. In addition, Figs. 4 and 5 display traffic flow and traffic speed time series, respectively.

4.2 Experimental Methodology

Root mean squared error (RMSE) and normalized RMSE (NRMSE) are employed to evaluate the model’s accuracy. RMSE is calculated using

$$ RMSE = \sqrt{ \frac{1}{n} \sum _{i=1}^n (y_i - \hat{y}_i)^2 }, $$

where \(y_i\) represents the actual values, \(\hat{y}_i\) represents the predicted values, and n is the total number of samples. NRMSE is then obtained by normalizing RMSE as follows:

$$ NRMSE = \frac{RMSE}{y_{max}-y_{min}}. $$

These metrics provide insights into the model’s performance. It is noteworthy that 80% of each dataset is designated for training, while the remaining 20% is for testing. The model’s Python code is publicly available for replication via the provided link: https://github.com/OMIDUFMG2019/FL-RHFCM-model.
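For reference, the two metrics above translate directly into code (a straightforward sketch; function names are ours):

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error between actual and predicted values."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def nrmse(y, yhat):
    """RMSE normalized by the range of the actual values."""
    y = np.asarray(y, dtype=float)
    return rmse(y, yhat) / (y.max() - y.min())
```

Normalizing by the range makes the metric comparable across the six datasets, whose scales differ (vehicle counts vs. speeds).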

5 Results and Discussion

This section analyzes the accuracy of FL-RHFCM in comparison to various centralized methods, including R-HFCM, PWFTS, LSTM, CNN, CNN-LSTM, and ARIMA. As mentioned earlier, FL-RHFCM includes three client nodes. Two scenarios are considered: (i) in the first scenario, each node receives one traffic speed time series (R3, R4, or R5), and (ii) in the second scenario, each node receives one traffic flow time series (J1, J2, or J3).

To obtain the best performance of the FL-RHFCM technique, multiple experiments are conducted, exploring various combinations of hyperparameters (HPs), including \(k \in \{3,4,\ldots ,10,20,30\}\), \(L \in \{2,3,\ldots ,10,20,40\}\), \(\varOmega \in \{2,3,\ldots ,10\}\), and \(f \in \{sigmoid, tanh, ReLU\}\).

In Scenario 1, the best performance is achieved with \(L=20\), \(k=3\), \(\varOmega =5\), and \(f=tanh\), whereas \(L=8\), \(k=3\), \(\varOmega =5\), and \(f=tanh\) yield the best result in Scenario 2. It is worth noting that randomized grid search is used to find the best HPs for the competing methods.

Table 2 presents the experimental results of all methods, comparing the performance of FL-RHFCM with the other centralized techniques. The evaluation metrics are RMSE and NRMSE; the best result for each dataset is highlighted in bold and the second-best is underlined.

Table 2. Comparison of the FL-RHFCM method with other centralized models

The results in Table 2 suggest that FL-RHFCM predicts traffic speed better than traffic flow. In more detail, FL-RHFCM outperforms the competing methods on the R3 and J3 datasets, while centralized R-HFCM is superior on R4, R5, and J2. The average NRMSE enables a more rigorous overall comparison of the methods' accuracy: according to the table, R-HFCM is the most accurate predictor. Although PWFTS, CNN-LSTM, and LSTM outperform FL-RHFCM, their performance only marginally exceeds it. Furthermore, FL-RHFCM preserves data privacy and security, since data is not shared among clients, unlike centralized methods. Moreover, it is simpler, trains faster, and incurs lower communication costs than deep learning models. Figure 6 also indicates that FL-RHFCM converges for each client after the second round, demonstrating its efficiency in reaching stable predictions quickly. Thus, FL-RHFCM's competitive performance and additional benefits make it a valuable method for spatio-temporal forecasting in federated learning environments. The combination of accuracy, privacy, and efficiency positions FL-RHFCM as a noteworthy approach among forecasting methodologies.

5.1 Limitations

  1. The number of clients involved in the experiments is relatively low.

  2. The approach is not compared with complex state-of-the-art techniques in this area, such as GNNs.

  3. Although federated experiments were conducted to evaluate our proposed approach, a broader comparison with other federated learning approaches from the literature is still needed.

Fig. 6. The NRMSE accuracy of the FL-RHFCM method for each client per round.

6 Conclusion and Future Works

This study introduces FL-RHFCM, the first distributed framework for FCM-based forecasting methods. FL-RHFCM integrates FL and the centralized R-HFCM to predict both traffic speed and traffic flow. The experimental results illustrate that FL-RHFCM is a robust and efficient forecasting method, especially advantageous in scenarios requiring data privacy and decentralized learning. While centralized R-HFCM demonstrates the highest overall accuracy, FL-RHFCM's competitive performance and additional benefits make it a valuable method in spatio-temporal time series forecasting scenarios. Since traffic datasets are high-dimensional and non-stationary time series, future work will extend FL-RHFCM to predict multivariate time series, taking these aspects into account.