key: cord-0128782-fmoffpm8 authors: Kang, Jiawen; Xiong, Zehui; Jiang, Chunxiao; Liu, Yi; Guo, Song; Zhang, Yang; Niyato, Dusit; Leung, Cyril; Miao, Chunyan title: Scalable and Communication-efficient Decentralized Federated Edge Learning with Multi-blockchain Framework date: 2020-08-10 journal: nan DOI: nan sha: de1dffcb3dae887ce9817a125b3b01b0e05c7f11 doc_id: 128782 cord_uid: fmoffpm8

The emerging Federated Edge Learning (FEL) technique has drawn considerable attention because it not only ensures good machine learning performance but also solves the "data island" problem caused by data privacy concerns. However, large-scale FEL still faces the following crucial challenges: (i) there is no secure and communication-efficient model training scheme for FEL; (ii) there is no scalable and flexible FEL framework for managing local model updates and global model sharing (trading). To bridge these gaps, we first propose a blockchain-empowered secure FEL system with a hierarchical blockchain framework consisting of a main chain and subchains. This framework achieves scalable and flexible decentralized FEL by individually managing local model updates or model sharing records for performance isolation. A Proof-of-Verifying consensus scheme is then designed to remove low-quality model updates and manage qualified model updates in a decentralized and secure manner, thereby achieving secure FEL. To improve the communication efficiency of the blockchain-empowered FEL, a gradient compression scheme is designed to generate sparse but important gradients, which reduces communication overhead without compromising accuracy and further strengthens the privacy preservation of training data. The security analysis and numerical results indicate that the proposed schemes can achieve secure, scalable, and communication-efficient decentralized FEL.

With the rapid advancement of Artificial Intelligence, a large number of emerging applications empowered by machine learning technologies significantly enhance the quality of human life [1]. These applications, such as autonomous driving and smart healthcare, utilize advanced machine learning algorithms to train different learning tasks on massive user data from various edge nodes, e.g., smartphones. In traditional machine learning approaches, user data needs to be gathered and centralized in a central server for model training, such as chest CT image analysis for COVID-19 diagnosis. However, centralized learning approaches may bring serious data privacy leakage problems. The growing concerns about the security and privacy of user data have intensified the demand for new solutions. A promising machine learning technique named Federated Edge Learning (FEL) has been introduced to achieve privacy-preserving model training [2]. In FEL, the edge nodes collaboratively train a globally shared model on their local data and only send their local model updates instead of raw data to a central server [3]. The central server gathers all the local model updates to generate an updated global model for the next training iteration. Although FEL has great advantages for AI-based applications with data privacy protection requirements, two major challenges hinder its wide deployment: (I) The central server plays an important role in aggregating local model updates from edge devices and maintaining global model parameters, but it is vulnerable to security challenges, e.g., a single point of failure. An unstable central server may result in a system crash.
A compromised central server may generate a falsified global model to mislead model training and increase system resource consumption. (II) There is no communication-efficient FEL framework for scalable model training. In the existing FEL framework, edge devices need to frequently upload a large number of local model parameters to the central server for model aggregation, which causes excessive communication overhead and a high demand for network bandwidth [4].

For the security issues of a single central server, previous researchers have integrated blockchain into federated learning for secure model training [4] [5] [6]. Kim et al. presented a public blockchain-based federated learning framework in which local model updates are exchanged and verified among miners running the energy-hungry Proof-of-Work consensus algorithm [5]. Instead of a public blockchain, Lu et al. [6] proposed a hybrid blockchain framework with an asynchronous learning scheme for secure and efficient federated learning. Similarly, Li et al. [4] designed a decentralized federated learning framework using a permissioned blockchain. Although blockchain is an effective way to replace the central server with security guarantees, the process of sharing local model updates among miners brings data privacy leakage challenges to FEL, which is ignored in the existing work. Specifically, recent studies have shown that, even when only gradient parameters are shared, a compromised miner may launch an inference attack that infers features of the private training data, or even the training data itself, from the gradients publicly shared on the blockchain [7].

For the communication efficiency issues, existing studies have presented new consensus mechanisms for blockchain-based FEL to reduce communication cost [4] or developed communication-efficient stochastic gradient descent algorithms [2], e.g., gradient quantization and encoding [8]. However, the existing schemes cannot be straightforwardly applied to large-scale FEL because of the high communication overhead caused by the large number of gradients exchanged between edge devices and a central server (or miners). These challenges drive the urgent need for secure, decentralized, privacy-preserving and communication-efficient FEL.

To address these challenges, we first propose a Blockchain-empowered Federated Edge Learning (BFEL) framework that does not rely on a trusted centralized server. In BFEL, a consortium blockchain acts as a trusted and decentralized ledger to manage model updates from edge devices. To filter out malicious or poisoning model updates, we then propose a Proof-of-Verifying (PoV) consensus scheme to collaboratively verify the quality of local model updates among predefined miners. Only the verified model updates can be stored in a block for decentralized federated learning. Since communication efficiency is critically important for BFEL, we further integrate a gradient compression scheme into PoV without lowering learning accuracy. This scheme also mitigates inference attacks and thus improves the privacy protection of training data. Moreover, after model training, learning task publishers can share (trade) their models with other entities that lack the budget or resources to organize federated learning. For example, a map company can reuse and trade its traffic-prediction model to vehicles for economic benefit. For the sake of security, the sharing records are added to the blockchain.
However, if both model updates and model sharing records are stored in a single blockchain, the result is a larger block size and higher consensus delay, and miners with limited resources cannot synchronize block data in real time. To avoid this dilemma, we further design a scalable and flexible framework consisting of a public blockchain as the main chain and multiple consortium blockchains as subchains for performance isolation [9]. Specifically, according to data characteristics and service demands (e.g., access control), the model updates from edge devices are stored on individual subchains named "Model training subchains". Meanwhile, the model sharing records between task publishers and other entities are stored in a subchain named "Model trading subchain".

The main contributions of this paper are summarized as follows.
- Unlike single blockchain-based systems, we design a hierarchical blockchain framework with a main blockchain and multiple subchains to manage model updates and model sharing records in a secure, scalable and flexible manner.
- For model training subchains, we design a PoV consensus scheme that filters out unreliable model updates by allowing miners to collaboratively verify the quality of model updates for secure BFEL.
- We propose a gradient compression scheme to improve the communication efficiency of BFEL without compromising learning accuracy, and also to enhance privacy preservation by mitigating inference attacks.

As shown in Fig. 1, the considered federated edge learning system includes an application layer and a blockchain layer. In the application layer, each task publisher, e.g., a map company, sets a learning task (e.g., traffic prediction) and sends the collaborative machine learning request to nearby wireless communication infrastructures, e.g., RoadSide Units (RSUs) in vehicular networks or base stations in cellular networks (Step a in Fig. 1) [3, 10]. These infrastructures broadcast the learning task to edge devices with suitable data (e.g., vehicles or smartphones). Legitimate edge devices can join a task group and act as workers to train the learning task on their local datasets (Step b). Each dataset is generated by personal applications (e.g., navigation services) or collected from the surroundings (e.g., sensors on vehicles). Each worker trains the given global model from its task publisher and generates local model updates (Steps c, d, e). Considering the large communication overhead of transmitting local model updates to miners, a gradient compression scheme is performed to transform the model updates into compressed model updates with sparse gradients (more details are given in Section 4). Here, the miners can be pre-selected RSUs or base stations that establish a consortium blockchain called "Model training subchain". Next, the workers upload their compressed model updates to the miners for model quality evaluation. After executing the Proof-of-Verifying (PoV) consensus scheme (introduced in Section 3), the qualified model updates are included in a new block and stored in a model training subchain (Step f). Finally, the workers download the latest block data and calculate a new global model for the next iteration until the accuracy requirements of the task publisher are met. The final global model is sent back to the task publisher, and the task publisher rewards the workers according to their contributions [3].
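To make the worker-side aggregation step concrete, the following minimal Python (PyTorch) sketch shows how a worker could rebuild the global model from the qualified local updates downloaded from the latest block; the weighted FedAvg-style averaging rule and the contribution weights are illustrative assumptions, since the paper does not prescribe a specific aggregation formula.

import torch
from typing import Dict, List

def aggregate_global_model(qualified_updates: List[Dict[str, torch.Tensor]],
                           contributions: List[float]) -> Dict[str, torch.Tensor]:
    # Weighted average of the qualified local model updates stored in the latest
    # block of a model training subchain (illustrative FedAvg-style rule).
    total = sum(contributions)
    global_state: Dict[str, torch.Tensor] = {}
    for name in qualified_updates[0]:
        global_state[name] = sum((w / total) * update[name]
                                 for w, update in zip(contributions, qualified_updates))
    return global_state

The worker would then load this state into its local copy of the global model (e.g., via model.load_state_dict(global_state)) before starting the next training round.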
Furthermore, after training, task publishers with high-quality global models can act as model sellers and trade their models with model buyers (e.g., drivers) who lack sufficient cooperating workers or training budget. The model trading records are recorded in a consortium blockchain named "Model trading subchain" for secure storage (Steps g, h).

In the blockchain layer, blockchains play a significant role in federated edge learning by providing secure, traceable, tamper-proof storage of data (i.e., model updates and trading records), which removes the control of a centralized server suffering from security and privacy challenges. However, traditional blockchain systems based on a single blockchain are neither practical nor scalable for large-scale FEL because of limited throughput, long consensus delay, and large block size. Miners in a single blockchain are often overloaded because of constrained resources. Moreover, block data from different services or purposes, e.g., model training and model trading records, should be assigned different access permissions for different entities and stored in isolation to protect data privacy [9].

To this end, we propose a multi-blockchain system consisting of consortium blockchain-based subchains and a public blockchain-based main chain. Specifically, by treating model updates as "transactions" between workers and task publishers, the local model updates of workers and the workers' contributions are securely stored in their corresponding model training subchains [9]. Each subchain is only accessible to a task publisher and its participating workers. Meanwhile, to enable secure and reliable model trading, the model trading records are kept as tamper-proof records in the model trading subchain. Only the task publishers and their model buyers can access and obtain block data in this subchain. For the different subchains, miners are randomly chosen from communication infrastructures with sufficient computation and storage resources to execute efficient consensus algorithms (e.g., DPoS and PBFT), respectively. These miners are changed after each consensus round to reduce the effects of possible collusion among the miners. The miner selection schemes are out of the scope of this paper; readers can refer to the related work in [11].

To efficiently monitor all subchains and miner behaviors, all the subchains are anchored to the main chain after a time interval for effective governance. To solve the trust problem among blockchains, the block data in the individual subchains can be easily verified by following the notary mechanism in [9, 12]. The main chain periodically stores the Merkle tree root of the block data from the different subchains, rather than the original block data on the subchains, for privacy protection and for saving storage resources. This means that the main chain only manages and maintains the network addresses of model updates and model trading records. Model buyers can search for global models through the latest block data in the main chain and then send trading requests to complete the model trading.

In short, compared with traditional single blockchain-based systems, the proposed framework achieves: i) data privacy protection by setting access permissions on individual subchains, and ii) performance isolation through individual consensus algorithms. Each individual subchain maintains its own data locally, and all the subchains are anchored to the main blockchain periodically for publicly verifiable integrity of the subchains as well as for scalability and flexibility.
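To illustrate the anchoring step, the following minimal Python sketch computes the Merkle tree root over the hashes of a subchain's recent blocks and packages only that root (plus a subchain identifier) for submission to the main chain; the SHA-256 pairing rule, field names, and subchain identifier are illustrative assumptions rather than the exact notary protocol of [9, 12].

import hashlib
from typing import List

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(block_hashes: List[bytes]) -> bytes:
    # Pairwise-hash the block hashes level by level until one root remains.
    if not block_hashes:
        raise ValueError("no block data to anchor")
    level = list(block_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# The subchain anchors only the root, never the raw model updates or trading
# records, so the main chain can verify subchain integrity without learning
# any private block content.
anchor_tx = {"subchain_id": "model-training-subchain-1",
             "merkle_root": merkle_root([sha256(b"block-1"), sha256(b"block-2")]).hex()}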
Although federated learning can solve data privacy issues to a certain extent, it is subject to new security threats, in particular: i) poisoning attacks and ii) inference attacks. In a poisoning attack, malicious edge devices intentionally send malicious, poisonous or low-quality model updates to poison the global model, thereby misleading the model training process [3]. Poisoning attacks degrade the accuracy of learning tasks, increase the convergence time of the global model, and increase the probability of erroneous learning results. Regarding inference attacks, recent studies have shown that a compromised central server (i.e., the parameter server) can infer the underlying training data by analyzing the local gradients shared by edge devices using gradient-based reconstruction, which intrudes on the data privacy of edge devices illegally and silently [7, 13]. This attack becomes even more serious in blockchain-based federated learning systems, where more entities may obtain the shared gradients. Therefore, it is important to defend against poisoning attacks and inference attacks for secure and privacy-enhanced federated edge learning [3, 7].

Proof-of-Verifying Consensus Scheme for Training Subchain

In this paper, inspired by the Delegated Proof-of-Stake (DPoS) consensus algorithm, we propose an efficient consensus scheme named Proof-of-Verifying (PoV) that integrates model updating and quality evaluation into the consensus process, which can defend against poisoning attacks and achieve secure model updating and storage. The main steps involved in PoV are as follows. Meanwhile, the leader miner is responsible for aggregating all qualified local model updates and generating the pending block. After each round of consensus, for the sake of safety, the leader and the verifiers are changed randomly. Similar to DPoS, all miners must submit a deposit to a shared account under public supervision. If a miner behaves maliciously during the PoV consensus process or causes damage to the global model, the blockchain system confiscates the deposit and removes the miner.

- Step 3: Quality evaluation of local model updates: After finishing a local model training process, each worker (i.e., participating edge device) executes the gradient compression scheme to generate compressed local model updates. More details about the gradient compression scheme are given in Section 4. Then the worker sends its compressed model update to the nearest miner on the corresponding model training subchain. This miner (i.e., verifier) first evaluates the quality of the compressed model updates from nearby workers by using a testing dataset. This small testing dataset is verified and provided by the task publisher of each model training subchain and is considered a reliable dataset for verifying the training model. Only the qualified model updates, whose accuracy is higher than a given threshold, are picked to be stored in a pending block later. The threshold can be adjusted according to the security requirements of different task publishers. In this way, the model evaluation can prevent poisoning attacks incurred by malicious participants, thus improving the security of the proposed BFEL framework.
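As a concrete illustration of the quality-evaluation step, the following minimal Python (PyTorch) sketch shows how a verifier miner could score a reconstructed candidate model on the task publisher's testing dataset and accept the update only if its accuracy exceeds the publisher's threshold; the model object, data loader, and default threshold value are illustrative assumptions.

import torch

def verify_model_update(model: torch.nn.Module,
                        candidate_state: dict,
                        test_loader: torch.utils.data.DataLoader,
                        accuracy_threshold: float = 0.90) -> bool:
    # PoV quality check: load the candidate update into the model and measure
    # its accuracy on the publisher-provided testing dataset.
    model.load_state_dict(candidate_state)
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.numel()
    return (correct / total) >= accuracy_threshold

Only updates for which verify_model_update returns True would be forwarded to the leader miner for inclusion in the pending block.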
Gradient Compression Scheme

In blockchain-empowered federated edge learning, workers need to send a large amount of gradient information (i.e., local model update parameters) to miners for aggregating model updates in each training iteration. The workers not only bear a large communication overhead, but also suffer from inference attacks when sharing gradients. Fortunately, previous studies have shown that the sparsity of gradients is generally high, so only a few important gradients (i.e., gradients with large absolute values) have a positive effect on the accuracy of the model [13]. Inspired by this, we propose a gradient compression scheme to achieve communication-efficient and secure BFEL. Here, only the important gradients (those with large absolute values) are uploaded to the miners to reduce the communication overhead. The importance of a gradient is indicated by its magnitude: only the gradients whose absolute values are larger than a given threshold are transmitted. To maintain model performance, the gradient compression scheme applies momentum correction and local gradient clipping on top of the gradient sparsification to ensure no loss of accuracy [7]. As a result, the gradient compression scheme not only alleviates communication bandwidth problems through gradient sparsification (i.e., compressing the gradients), but also mitigates inference attacks by sharing only limited gradient information [7, 13].

More specifically, the workers only send the gradients with large absolute values to their miners. To avoid the information loss caused by gradient sparsification, the rest of the gradients are stored in the local buffer space of the workers and accumulated locally until they become large enough to be uploaded [7]. Here, we use distributed stochastic gradient descent for iterative updates and define the loss function to be optimized as follows [7, 14]:

F(\omega) = \frac{1}{|D|} \sum_{x \in D} f(x, \omega), \qquad D = \bigcup_{k=1}^{N} D_k,

\omega_{t+1} = \omega_{t} - \eta \frac{1}{N b} \sum_{k=1}^{N} \sum_{x \in B_{k,t}} \nabla f(x, \omega_{t}),

where F(ω) is the loss function, f(x, ω) is the loss calculated on a data sample x ∈ D_k of a worker, and ω is the weight vector of the neural network. The learning rate is denoted as η, B_{k,t} (1 ≤ k ≤ N) is the mini-batch sampled from D_k for the t-th round of training, and b is the size of each mini-batch.

Note that the model convergence time is affected when the sparsification degree of the gradients reaches a large value, e.g., 99% [7]. To address this convergence problem, we employ the momentum correction mechanism proposed in [7, 14] to mitigate this effect. With momentum correction, the accumulated small gradients of each worker converge toward the direction of the gradients with larger absolute values, thus accelerating the model convergence speed. Moreover, we also apply a gradient clipping mechanism to overcome gradient explosion. Specifically, following [14], gradient clipping is executed locally before adding the current gradients to the previous local gradient accumulation, so the gradient explosion problem is alleviated [14, 15].

We show that the gradient compression scheme has no impact on model convergence as follows [14]. We define g^{(i)} as the i-th gradient, u^{(i)} as the sum of the gradients under the optimization algorithm in [2], v^{(i)} as the sum of the gradients accumulated in the local buffer space, and m as the ratio of the remaining gradients to all gradients. If the i-th gradient does not exceed the threshold until the (t-1)-th iteration, at which point the accumulated value triggers the model update, we have

v_{t-1}^{(i)} = \sum_{\tau=0}^{t-1} g_{\tau}^{(i)} = u_{t-1}^{(i)},

thus we can update ω as

\omega_{t}^{(i)} = \omega_{t-1}^{(i)} - \eta\, v_{t-1}^{(i)}.

If instead the i-th gradient is larger than the threshold at the t-th iteration, the model update is triggered directly, and we have

u_{t}^{(i)} = u_{t-1}^{(i)} + g_{t}^{(i)}, \qquad v_{t}^{(i)} = v_{t-1}^{(i)} + g_{t}^{(i)}.

Next, we can obtain

\omega_{t+1}^{(i)} = \omega_{t}^{(i)} - \eta\, v_{t}^{(i)} = \omega_{t}^{(i)} - \eta\, u_{t}^{(i)}.

Therefore, the result of using the local gradient accumulation is consistent with that of the optimization algorithm in [2].
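To make the momentum correction and local gradient clipping steps concrete, the following minimal Python (PyTorch) sketch shows one per-worker accumulation step in the spirit of [14]; the momentum coefficient, the clipping norm, and the function name are illustrative assumptions rather than the paper's exact formulation.

import torch

def accumulate_with_momentum_correction(grad: torch.Tensor,
                                        u: torch.Tensor,
                                        v: torch.Tensor,
                                        momentum: float = 0.9,
                                        clip_norm: float = 1.0):
    # Local gradient clipping before accumulation mitigates gradient explosion.
    norm = grad.norm()
    if norm > clip_norm:
        grad = grad * (clip_norm / norm)
    # Momentum correction: accumulate the velocity u instead of the raw gradient,
    # so gradients held back by sparsification still push the model in the same
    # direction as dense momentum SGD would.
    u = momentum * u + grad
    v = v + u
    return u, v

Here u plays the role of the momentum velocity and v the local accumulation buffer; the sparsification step (selecting the top ρ% of the accumulated values by magnitude) is summarized in Algorithm 1 below.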
The detailed implementation of the gradient compression scheme is given in Algorithm 1 and consists of the following phases:

- Phase 1: Local Model Training: The workers train their local models on their own local datasets with the momentum correction and local gradient clipping mechanisms, which address the learning convergence and gradient explosion problems, respectively.
- Phase 2: Gradient Compression: Each worker executes the gradient compression process in Algorithm 1 to compress the gradients and uploads the sparse gradients (i.e., only the gradients whose absolute values are larger than a threshold) to the nearby miner. In addition, the workers send the remaining local gradients held in their buffer space to the nearby miner once the accumulated local gradient exceeds the threshold.

Algorithm 1: Gradient compression scheme.
Input: a set of workers N = {n_1, n_2, ..., n_i}, the local mini-batch size B, the local dataset D_k, the learning rate η, and the optimization function SGD.
Output: ω.
  Initialize ω_t; g^k ← 0
  for t = 0, 1, ... do
      g_t^k ← g_{t-1}^k
      Sample data x from D_k
      Compute the local gradient on x and add it to g_t^k (with momentum correction and local gradient clipping)
      Thr ← |Top ρ% of {g_t^k}|
      if |g_t^{k,(j)}| > Thr then
          Send this gradient to the nearby miner
          Send the remaining gradients to the buffer space of the worker
      else if the accumulated local gradient > Thr then
          Send this gradient to the nearby miner
      end if
      All-reduce g_t^k: g_t ← the aggregated sparse gradients of all workers, and update ω with SGD(ω_t, g_t)
  end for
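The per-tensor selection step in Phase 2 can be sketched in Python (PyTorch) as follows; the default value of ρ and the explicit residual return value are illustrative assumptions, and momentum correction and clipping are assumed to have been applied during accumulation as described above.

import torch

def compress_and_split(accum: torch.Tensor, rho: float = 0.3):
    # Top-rho% sparsification: keep only the rho% of accumulated gradient entries
    # with the largest magnitude. The sparse part is uploaded to the nearby miner;
    # the residual stays in the worker's local buffer for later rounds.
    k = max(1, int(accum.numel() * rho / 100.0))
    threshold = torch.topk(accum.abs().flatten(), k).values[-1]
    mask = accum.abs() >= threshold
    sparse_update = torch.where(mask, accum, torch.zeros_like(accum))
    residual = torch.where(mask, torch.zeros_like(accum), accum)
    return sparse_update, residual

With ρ = 0.3 (the value used in the experiments), roughly 0.3% of the gradient entries are transmitted each round, which is consistent with the reported compression of the gradient size by more than 300 times relative to ρ = 100.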
Blockchain-related Issues: The proposed decentralized federated learning framework with multiple blockchains is secure and reliable for the following reasons. (I) The framework enables performance isolation: each individual subchain maintains its own data locally without privacy concerns. (II) All the subchains are anchored to the main blockchain periodically for publicly verifiable integrity of the subchains. (III) Similar to the DPoS consensus algorithm, the proposed Proof-of-Verifying scheme is secure and reliable as long as the number of malicious miners does not exceed 1/3 of the total number of miners [4]. Malicious miners are punished and their deposits confiscated (as mentioned in Step 2 of the PoV consensus scheme), which deters malicious behavior. (IV) The local model update records and model training records are secure because of the tamper-proof, decentralized and traceable properties of blockchain technologies [16] [17] [18].

Federated Learning-related Issues: With the help of the PoV consensus scheme, both the local model updates and the global model updates are reliable and secure for federated edge learning. The reason is that, for the i-th round of local model updates, the miners mutually verify the quality of the local model updates using a given testing dataset and remove poisonous local model updates that may damage the global model. Only the high-quality model updates are added to the model training subchains to generate a new and reliable global model for the next iteration. Therefore, the PoV consensus scheme can defend against poisoning attacks and ensure secure decentralized federated edge learning. Moreover, the gradients from workers contain information about the distribution of the local training data. In an inference attack, the attackers analyze this distribution information and reconstruct the training data from the shared gradients by reverse engineering [7]. Therefore, we utilize the gradient compression scheme to generate sparse gradients and upload only these gradients to the miners without compromising learning accuracy. Using this approach, we prevent the attackers from obtaining the complete distribution of the local training data, which reduces gradient privacy issues during decentralized model learning. As a result, the gradient compression scheme not only improves the communication efficiency of BFEL, but also mitigates inference attacks caused by gradient leakage problems.

We evaluate the performance of the proposed BFEL framework and schemes by using the real-world MNIST and CIFAR-10 datasets. Each dataset is uniformly divided into a training set containing 70% of the data and a test set containing the remaining 30%. We implement the proposed BFEL framework using PyTorch, PySyft, and a blockchain platform named EOSIO with the DPoS scheme [3]. The experiment is conducted on a virtual workstation with the Ubuntu 18.04 operating system, an Intel(R) Core(TM) i7-4500U CPU, 16GB RAM, and a 512GB SSD. There are 2 task publishers, 22 miners, 20 workers, a model trading subchain, and 2 model training subchains in the simulation. All of the subchains apply the DPoS scheme as their consensus algorithm.

In our BFEL framework, the gradient compression scheme plays an important role in system performance. We first evaluate the effect of the hyperparameter ρ (i.e., the parameter that determines the threshold on gradient absolute values in Algorithm 1) on BFEL. A simple Convolutional Neural Network (CNN) (i.e., a CNN with 2 convolutional layers followed by 1 fully connected layer) is used to perform the classification tasks on the MNIST and CIFAR-10 datasets, respectively. The pixels in all datasets are normalized into the range [0,1]. In the simulation, we take a model training subchain with 10 workers and 11 miners as an example. The learning rate is η = 0.001, the number of training epochs is E = 1000, the mini-batch size is B = 128, and θ is set to 0.05. We compare the learning accuracy under different values of ρ to find the best setting for the gradient compression scheme in our simulation. Specifically, ρ takes values from the set {0.1, 0.2, 0.3, 0.5, 0.9, 1, 100} in the simulations on the MNIST and CIFAR-10 datasets. As shown in Fig. 2, we observe that a larger ρ leads to better accuracy. For the MNIST task, the results show that the accuracy is 97.25% when ρ = 0.3 and 99.08% when ρ = 100. This means that although the gradient size is increased by more than 300 times compared with ρ = 0.3, the learning accuracy is only improved by 1.83%. We therefore observe a trade-off between the gradient threshold and accuracy, and to balance this trade-off we set ρ = 0.3 as the best setting for the gradient compression scheme.

For the communication efficiency of the BFEL framework, we compare the BFEL framework with the Gradient Compression Scheme (GCS) against the traditional centralized FEL framework with and without GCS. We apply typical CNN, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), CNN-LSTM, and Support Vector Machine (SVM) methods with an identical simulation configuration. Among these methods, the CNN runs on the MNIST dataset to execute an image classification task, and the remaining methods run on a power demand dataset with time series data to perform a power consumption prediction task [19].
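For reference, the simple CNN used for the MNIST classification task above (two convolutional layers followed by one fully connected layer) can be sketched in PyTorch as follows; the channel counts, kernel sizes, and pooling layers are illustrative assumptions, since the paper does not list the exact layer configuration.

import torch.nn as nn

class SimpleCNN(nn.Module):
    # Two convolutional layers followed by one fully connected layer,
    # sized for 28x28 single-channel MNIST inputs.
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

The CIFAR-10 variant would differ only in the number of input channels (3 instead of 1) and the flattened feature size.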
In this comparison, the gradient threshold ρ of the GCS is set to 0.3. Similar to DPoS in the EOSIO platform, the consensus time of the PoV scheme in each round is set to 0.5 seconds for the BFEL framework [20]. Considering the communication overhead of each round as a fixed value, we compare the running time of the above methods in three scenarios (i.e., BFEL with GCS, and FEL with or without GCS) to indicate the communication efficiency. As shown in Fig. 3, we observe that the running time of the FEL framework with GCS is about 50% lower than that of FEL without GCS. The reason is that GCS reduces the number of gradients exchanged between the workers and the cloud aggregator. Because of the delay caused by the PoV scheme in BFEL, the running time of the BFEL framework with GCS is higher than that of FEL with GCS in the different scenarios, but much lower than that of FEL without GCS. Moreover, the BFEL framework with GCS can defend against poisoning attacks through the PoV scheme and removes the centralization security challenges through blockchain technology. Furthermore, GCS can compress the gradient size by 300 times with almost no reduction in accuracy. Therefore, the proposed BFEL framework is more secure, communication-efficient and practical for real-world applications.

In this paper, we propose BFEL, a scalable, communication-efficient, blockchain-based framework for federated edge learning. First, we introduce a hierarchical blockchain framework with multiple blockchains to manage training models and model trading records in a scalable and flexible way. Second, we propose a Proof-of-Verifying consensus scheme to defend against poisoning attacks and ensure reliable federated edge learning. Third, a gradient compression scheme is presented to reduce communication overhead and achieve communication-efficient federated edge learning. We evaluate the performance of the proposed framework and schemes on real-world datasets with different typical machine learning methods. The security analysis and numerical results indicate that the proposed framework not only ensures secure and scalable federated learning, but also achieves communication-efficient federated edge learning.

References
[1] Edge intelligence: Paving the last mile of artificial intelligence with edge computing
[2] Federated learning: Strategies for improving communication efficiency
[3] Incentive mechanism for reliable federated learning: A joint optimization approach to combining reputation and contract theory
[4] A blockchain-based decentralized federated learning framework with committee consensus
[5] Blockchained on-device federated learning
[6] Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles
[7] Deep leakage from gradients
[8] QSGD: Communication-efficient SGD via gradient quantization and encoding
[9] A platform architecture for multi-tenant blockchain-based systems
[10] Privacy-preserving traffic flow prediction: A federated learning approach
[11] An overview of blockchain technology: Architecture, consensus, and future trends
[12] Performance benchmarking and optimization for blockchain systems: A survey
[13] A framework for evaluating gradient leakage attacks in federated learning
[14] Deep gradient compression: Reducing the communication bandwidth for distributed training
[15] Gradient sparsification for communication-efficient distributed optimization
[16] Blockchain for internet of things: A survey
[17] Blockchain challenges and opportunities: A survey
[18] Recommending differentiated code to support smart contract update