key: cord-0070352-wll3v99l
authors: Li, Dun; Han, Dezhi; Weng, Tien-Hsiung; Zheng, Zibin; Li, Hongzhi; Liu, Han; Castiglione, Arcangelo; Li, Kuan-Ching
title: Blockchain for federated learning toward secure distributed machine learning systems: a systemic survey
date: 2021-11-20
journal: Soft comput
DOI: 10.1007/s00500-021-06496-5
sha: 61c6f05073d8b739d7a23376e0b4d2e0cac4e5ee
doc_id: 70352
cord_uid: wll3v99l

Federated learning (FL) is a promising decentralized deep learning technology, which allows users to update models cooperatively without sharing their data. FL is reshaping existing industry paradigms for mathematical modeling and analysis, enabling an increasing number of industries to build privacy-preserving, secure distributed machine learning models. However, the inherent characteristics of FL have led to problems such as privacy protection, communication cost, systems heterogeneity, and unreliable model uploads in actual operation. Interestingly, the integration with Blockchain technology provides an opportunity to further improve FL security and performance, besides increasing its scope of applications. We denote this integration of Blockchain and FL as the Blockchain-based federated learning (BCFL) framework. This paper presents an in-depth survey of BCFL and discusses the insights of this new paradigm. In particular, we first briefly introduce the FL technology and discuss the challenges it faces. Then, we summarize the Blockchain ecosystem. Next, we highlight the structural design and platform of BCFL. Furthermore, we present the attempts at improving FL performance with Blockchain and several combined applications of incentive mechanisms in FL. Finally, we summarize the industrial application scenarios of BCFL.

The quality and security of data are the keys to the development of machine learning and artificial intelligence (AI). However, rich data is often privacy sensitive and large in scale, which rules out the traditional approach of collecting it in a data center and training there. Besides, most of the data and resources needed for effective training of machine learning models are owned by a few large technology companies, which is detrimental to privacy protection and further leads to centralization problems. Thus, a novel, distributed learning approach that allows large-scale joint modeling without publishing raw data becomes imperative. In this context, federated learning (FL), proposed by Google (Konečnỳ et al. 2016; Aledhari et al. 2020; McMahan et al. 2017), has recently received great attention at both the research and application levels.

Specifically, FL is an emerging machine learning technology involving many mobile devices and a central storage server. This technology allows distributed model training using local datasets from large-scale nodes, such as mobile devices. FL updates the parameters without uploading the original training data and then builds a shared model by aggregating the locally computed updates. A typical example is the FedAVG algorithm, which is based on the iterative model averaging proposed in McMahan et al. (2017). This method is robust to unbalanced and non-independent and identically distributed (non-IID) data. The basic design structure of FL is shown in Fig. 1. Based on this, FL offers promising privacy protection for mobile devices while ensuring high learning performance.

However, despite the many benefits mentioned above, FL still faces serious challenges. The gradient aggregation mechanism used for FL makes the entire algorithmic model dependent on the control of a central node. Two trust issues therefore need to be addressed: one is to ensure that there is a central node that all participants trust, and the other is to ensure that information about the operations of the central node is transparent. First, FL relies on centralized databases and remains at risk of distributed denial of service (DDoS) attacks and privacy breaches. Second, current FL systems do not have suitable and transparent contribution evaluation and incentive mechanisms to ensure the continuous active participation of training nodes. Finally, an effective distributed system needs to identify and prevent malicious nodes. However, current FL systems do not provide adequate mechanisms to implement these operations.

Interestingly, Blockchain technology provides an opportunity to address the above challenges of FL. More precisely, through the combination of chain structure, tree structure, and graph structure, the Blockchain ensures secure storage and data traceability. Besides, through the consensus mechanism of proof-of-work (PoW), Blockchain makes data tamper-resistant. In more detail, because local training results are validated on the Blockchain, the BCFL framework can avoid the single point of failure (SPOF) and extend its federation scope to untrusted users in the public network. In addition, by providing rewards proportional to the size of the training samples, BCFL can realize effective incentives and thus facilitate the union of more devices with a large number of training samples. Therefore, the Blockchain can be seen as a natural complement to FL, providing it with improved interoperability, privacy, security, reliability, and scalability.

Although many papers involve different aspects of the BCFL paradigm, there is no systematic investigation on this paradigm. In this article, we present a survey on a new paradigm for integrating Blockchain and FL. This survey denotes such a synthesis of Blockchain and FL as the Blockchain-based federated learning (BCFL) framework. To present a complete picture of BCFL-related studies, we surveyed the related works focusing on structure design, performance enhancement attempts, incentive mechanism design, and industrial applications of BCFL, in a period ranging from 2016 to 2021. Given the previous work, we aim to (i) provide a conceptual introduction to FL and Blockchain technology, (ii) provide a systematic analysis of the potential of incorporating Blockchain into FL, and (iii) discuss the specific applications of BCFL in depth.

Fig. 1 The architecture of FL

In detail, the main contributions of this paper are summarized as follows.

• We provide an overview of the definition, architectural design, and deployed platform for Blockchain and FL convergence.
• We provide a systematic survey on the studies dedicated to improving the performance of FL by integrating Blockchain into FL systems.
• We survey the existing studies on effective incentive mechanisms for training nodes using Blockchain.
• We summarize the current feasible applications for BCFL in industrial applications.

The rest of this article is organized as follows. We first introduce the related work in Sect. 2. Section 3 then introduces the background and fundamentals of FL and Blockchain. Subsequently, Sect. 4 presents the convergence architecture and deployment platform of BCFL. Section 5 then summarizes the attempts to make appropriate improvements to the BCFL. Section 6 discusses the transparent contribution recognition and effective reward for clients in BCFL. Section 7 next summarizes the feasible application of BCFL. Finally, Sect. 8 concludes the paper.

[…] summarized the main information concerning the structure and characteristics of Blockchain. In this work, we take Yang's work and Zheng's work (Zheng et al. 2017) as baselines and organize the closely related research. As Fig. 2 shows, Yang's work is associated with more highly cited articles, and Zheng's work links more paper groups.

In conclusion, the technological development of FL has attracted much attention, and the related research has shown an explosive growth trend. However, as Table 1 shows, there is no existing survey related to the combination of Blockchain and FL in the literature. To fill this gap, we present the first survey that performs a thorough investigation of the relevant studies published in recent years on BCFL. We systematically present the structural designs, deployed platforms, performance improvement, node incentive mechanisms, and the industrial applications of BCFL. Finally, based on the related works, Table 2 lists the acronyms and definitions used in this survey.

In this section, we provide the background necessary to better understand and follow this paper. More precisely, we briefly introduce FL in Sect. 3.1 and present the Blockchain ecosystem in Sect. 3.2.

FL refers to the calculation process that enables each data owner $F_i$ to perform model training and obtain the model $M_{FED}$ without giving away its data $D_i$, while ensuring that the gap between the effect $V_{FED}$ of the model $M_{FED}$ and the effect $V_{SUM}$ of the model $M_{SUM}$ is small enough. This calculation can be expressed as follows.

$$|V_{FED} - V_{SUM}| < \delta$$

where $\delta$ is an arbitrarily small positive value, $1 \le i \le n$, and $n$ is the number of participants in the system.

The basis of FL is the data matrix. As shown in Fig. 3, based on the different distribution patterns of the sample space and feature space of data, FL can be divided into three categories: horizontal federated learning (HFL), vertical federated learning (VFL), and federated transfer learning (FTL), which divide the dataset horizontally (i.e., by user dimension), vertically (i.e., by feature dimension), and without partitioning along either dimension, respectively.

FL systems generally consist of data holders and central servers. The amount of local data or the number of features of each data holder may not be enough to support successful model training. Therefore, support from other data holders is required. Figure 4 illustrates the FL process for the client-server architecture.

In a typical cooperative modeling process of FL, the training of local data by the data holders occurs only locally to protect data privacy. Next, the gradients generated by the iterations are used as interaction information after desensitization and uploaded to a third-party trusted server instead of local data, waiting for the server to return the aggregated parameters to update the model. In detail, the steps of FL can be summarized as follows.

• Step 1. System Initialization. First, the central server publishes the modeling task and recruits clients to participate.
• Step 2. Local Calculation. After the joint modeling task is opened and the system parameters are initialized, each data holder performs local calculations on its own data.
• Step 3. Central Aggregation. After receiving the calculation results from multiple data holders, the central server aggregates the calculated values. In the aggregation process, efficiency, security, privacy, and other issues need to be considered.

Fig. 3 The category of data partition for FL

Notably, the work of the FL central server is similar to a distributed machine learning server, which collects the gradients of each data holder and then returns a new gradient after performing aggregation operations in the server.
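The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a toy linear model, plain gradient descent on each client, and sample-size-weighted averaging in the spirit of FedAVG. The function names (`local_update`, `fed_avg`, `run_rounds`), the learning rate, and the synthetic client data are hypothetical and are not taken from any of the surveyed systems.

```python
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_update(weights: Vector, data: Dataset, lr: float = 0.1, epochs: int = 5) -> Vector:
    """Step 2: a client fits y ~ w*x + b on its own data and returns new weights."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def fed_avg(client_weights: List[Vector], client_sizes: List[int]) -> Vector:
    """Step 3: the server averages client weights, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_rounds(clients: List[Dataset], rounds: int = 50) -> Vector:
    """Step 1 plus the training loop: broadcast the model, collect updates, aggregate."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = fed_avg(updates, [len(data) for data in clients])
    return global_weights


# Two clients whose raw samples never leave the client; all points lie on y = 2x + 1.
clients = [
    [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
    [(-1.0, -1.0), (-0.5, 0.0), (0.5, 2.0), (1.0, 3.0)],
]
model = run_rounds(clients)
print(model)  # should be close to w = 2, b = 1 for this data
```

Only the weight vectors cross the client boundary in this sketch, which is the property the steps above rely on; a real deployment would add the desensitization and secure aggregation concerns mentioned in Step 3.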

Currently, FL has been integrated with other emerging technologies by many scholars to enable industrial applications, such as the efficiency improvement of mobile and wireless communication (Konecný et al. 2016; Sattler et al. 2020; Reisizadeh et al. 2020; Niknam et al. 2020) and fog computing. It can be seen that FL is prominent in industrial applications for privacy-sensitive data and the processing of non-IID data. Practical industrial-scale applications are not yet sufficient, but theoretical preparations are relatively well established.

There are currently a few open-source frameworks for researchers and developers to build FL systems. A summary of such frameworks is listed in Table 3.

Blockchain is essentially a decentralized distributed database. All the interactive records (transactions) generated in the system are packed into blocks, linked into a chain, and stored on each node in chronological order. Furthermore, each transaction is protected by cryptography and the PoW algorithm so that it cannot be tampered with or forged, and each node in the system can therefore carry out secure peer-to-peer transactions. As Fig. 5 shows, a block consists of a block header containing metadata and some transaction records. These blocks are linked by the hash pointer of the block header to form a complete ledger, which is the narrow definition of Blockchain. More precisely, from the bottom to the top, the Blockchain is composed of the data layer, incentive mechanism, consensus layer, network layer, and application layer (Zheng et al. 2017; Fan et al. 2021; Zheng et al. 2018; Lu 2018). Based on different application scenarios and designed systems, the Blockchain is generally divided into public chain, consortium chain, and private chain. Table 4 presents the comparison of three different types of Blockchain. Generally, different types of Blockchain are selected according to the requirements of different business scenarios. However, in a broad sense, only the public chain can meet the original design intention of the Blockchain.

Fig. 5 The architecture of Blockchain
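The hash-pointer linking of block headers described above can be illustrated with a short sketch. It assumes SHA-256 over a JSON-serialized header and a header made of an index, the previous header's hash, and a digest of the transactions; these field names and the helper functions (`digest`, `make_block`, `build_chain`, `verify_chain`) are hypothetical simplifications rather than the layout of any real Blockchain platform.

```python
import copy
import hashlib
import json


def digest(obj) -> str:
    """SHA-256 of a JSON-serializable object (keys sorted for determinism)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(index, prev_hash, transactions):
    """A block is a header (metadata) plus the transaction records it commits to."""
    header = {"index": index, "prev_hash": prev_hash, "tx_digest": digest(transactions)}
    return {"header": header, "transactions": transactions}


def build_chain(batches):
    """Link blocks so that each header stores the hash of the previous header."""
    chain = []
    prev_hash = "0" * 64  # placeholder pointer for the first block
    for index, transactions in enumerate(batches):
        block = make_block(index, prev_hash, transactions)
        chain.append(block)
        prev_hash = digest(block["header"])
    return chain


def verify_chain(chain) -> bool:
    """Recompute every digest and hash pointer; any edit to history breaks a link."""
    prev_hash = "0" * 64
    for block in chain:
        header = block["header"]
        if header["prev_hash"] != prev_hash:
            return False
        if header["tx_digest"] != digest(block["transactions"]):
            return False
        prev_hash = digest(header)
    return True


chain = build_chain([["a->b:1"], ["b->c:2"], ["c->a:3"]])
tampered = copy.deepcopy(chain)
tampered[1]["transactions"][0] = "b->c:200"  # rewrite a past transaction
print(verify_chain(chain), verify_chain(tampered))
```

Because each header commits to both its transactions and its predecessor, altering any stored record invalidates the check for that block and, through the pointers, for everything after it.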

The most fundamental consensus mechanism of Blockchain is proof-of-work (PoW). A node stores the hash value of a specific block in the current block and then mines the current block. Once the block is successfully linked, the node has accepted the transactions of that block and of all previous blocks linked to it. In addition to PoW, there are many other types of consensus mechanisms. Table 5 lists several typical consensus mechanisms and gives a comparative explanation.
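As an illustration of the mining step, the following sketch searches for a nonce such that the SHA-256 hash of the previous block hash, the payload, and the nonce starts with a given number of zero hex digits. The string layout, the `difficulty` parameter, and the function names `mine` and `is_valid` are assumptions made for this example; production PoW schemes use a binary target and a richer block header.

```python
import hashlib


def pow_digest(prev_hash: str, payload: str, nonce: int) -> str:
    """Hash that binds the new block to its predecessor, its payload, and a nonce."""
    return hashlib.sha256(f"{prev_hash}|{payload}|{nonce}".encode("utf-8")).hexdigest()


def is_valid(prev_hash: str, payload: str, nonce: int, difficulty: int) -> bool:
    """Verification is a single hash evaluation, so any node can check it cheaply."""
    return pow_digest(prev_hash, payload, nonce).startswith("0" * difficulty)


def mine(prev_hash: str, payload: str, difficulty: int = 3):
    """Try nonces 0, 1, 2, ... until the digest meets the difficulty target."""
    nonce = 0
    while not is_valid(prev_hash, payload, nonce, difficulty):
        nonce += 1
    return nonce, pow_digest(prev_hash, payload, nonce)


nonce, block_digest = mine("0" * 64, "local model update", difficulty=3)
print(nonce, block_digest)
```

The asymmetry is the point of the design: finding the nonce takes many hash evaluations on average, while checking it takes one, and changing the payload or the previous hash forces the search to be repeated.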

The smart contract can digitally verify the negotiated or executed contracts and allow trusted transactions without a third party. Besides, these transactions are traceable and irreversible (Huang et al. 2019). The success of Ethereum has contributed to the realization of smart contracts. As shown in Fig. 6, Ethereum includes a transaction processing and preservation mechanism and a complete state machine for accepting and processing various smart contracts. Smart contracts bring great versatility and adaptability to the Blockchain. It is because of the smart contract functionality that various algorithms, including FL, can be deployed on the Blockchain.

This section outlines the main characteristics of the Blockchain-based federated learning (BCFL) framework. More precisely, in Sect. 4.1, we first introduce the BCFL architecture arising from the integration of Blockchain and FL. Then, we present the design of data storage and the deployed platform of BCFL in Sects. 4.2 and 4.3, respectively.

The first related research focused on the construction of BCFL was proposed by Kim et al. (2018). The main concept underlying the BCFL is to solve the issues of private exchange and reward mechanisms by using Blockchain (Hieu et al. 2020). Subsequent related studies, such as Mugunthan et al. (2020), Ma et al. (2020), and Majeed and Hong (2019), have also built on this foundation, but introduce only small-scale improvements. Besides, a demo of BCFL has been proposed to make an intuitive display. All of these follow-up studies adopted the basic design structure shown in Fig. 7.

Specifically, the Blockchain mainly serves as a central database for the FL system, which is fully decentralized and privacy-protected. Therefore, the main goal is to reward the clients according to the quality of their contributions while 

As with any distributed system, FL bears the privacy leakage challenge. For BCFL, the Blockchain plays a pivotal role in solving this problem. Indeed, the decentralized functioning of Blockchain makes FL fault-tolerant (Shayan et al. 2021) and can help to avoid attacks effectively. More precisely, to better solve the security problem of data storage, many studies try to make further improvements based on ordinary Blockchain. For example, a new ring decentralization algorithm and an innovative committee consensus mechanism were shown to be feasible solutions for improving decentralized FL performance and reducing consensus computation, respectively. In summary, the Blockchain data storage model can protect the privacy of a single client update and maintain the large-scale performance of the global model.

In BCFL, the functions of the Blockchain layer need to be implemented with the support of a platform. Different Blockchain platforms have different characteristics. For example, public chains provide stable performance, consortium chains provide robust security, and private chains provide more customization features. From a careful analysis of the literature, the current BCFL mainly adopts four platforms: Ethereum, Hyperledger Fabric, EOS, and Custom Blockchain. A feature comparison of these platforms is shown in Table 6.

As the earliest programmable Blockchain, Ethereum is Turing-complete (Buterin XXXX). The work proposed by Nagar (2019) deploys the BCFL platform using a permissionless side chain, a technique proposed for layer-2 scaling. Moreover, based on smart contracts in Ethereum, Mugunthan et al. (2020) propose the BlockFLow architecture, which realizes accountable and privacy-preserving FL through a novel contribution scoring procedure. Similarly, Baffle (Ramanan et al. 2020) and ChainFL (Korkmaz et al. 2020) are both Ethereum-based FL systems, which use smart contracts to coordinate round partitioning, model aggregation, and update tasks in FL.
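To make the coordinating role of a smart contract concrete, the sketch below simulates contract-like logic in plain Python: a state machine that accepts one update per registered participant per round, rejects stale or duplicate submissions, and aggregates once a quorum is reached. This is an assumption-laden illustration only. It is not Solidity, it does not reproduce the interfaces of BlockFLow, Baffle, or ChainFL, and the class name `RoundCoordinator`, its methods, and the quorum rule are hypothetical.

```python
class RoundCoordinator:
    """Toy stand-in for a contract that partitions FL training into rounds."""

    def __init__(self, participants, quorum):
        self.participants = set(participants)
        self.quorum = quorum
        self.round = 0
        self.pending = {}         # sender -> update submitted in the current round
        self.global_model = None  # result of the most recent aggregation
        self.log = []             # append-only record of accepted submissions

    def submit(self, sender, round_id, update):
        """Accept an update only if sender, round, and uniqueness checks pass."""
        if sender not in self.participants:
            raise PermissionError(f"{sender} is not a registered participant")
        if round_id != self.round:
            raise ValueError(f"round {round_id} is closed; current round is {self.round}")
        if sender in self.pending:
            raise ValueError(f"{sender} already submitted in round {self.round}")
        self.pending[sender] = list(update)
        self.log.append((self.round, sender))
        if len(self.pending) >= self.quorum:
            self._aggregate()

    def _aggregate(self):
        """Average the pending updates element-wise and open the next round."""
        updates = list(self.pending.values())
        dim = len(updates[0])
        self.global_model = [sum(u[i] for u in updates) / len(updates) for i in range(dim)]
        self.pending = {}
        self.round += 1


coordinator = RoundCoordinator(["a", "b", "c"], quorum=2)
coordinator.submit("a", 0, [1.0, 2.0])
coordinator.submit("b", 0, [3.0, 4.0])  # quorum reached: aggregate and advance
print(coordinator.round, coordinator.global_model)
```

On a real chain the same checks would run inside contract code and every accepted submission would be a transaction, so the round history becomes auditable by all participants.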

As an open-source project, Fabric is initiated by the Linux Foundation and maintained by several corporate organizations. On this platform, FL training of a neural network model on the physically distributed datasets of FL clients has been demonstrated. The underlying communication between the server and the client uses a new Blockchain-based protocol on the secure data exchange system.

The Enterprise Operating System (EOS) is a Blockchain-based operating system designed for commercial distributed applications (Grigg XXXX). For example, an EOS-based FL framework is proposed in Martinez et al. (2019), in which the model owner O has the total liability of payment for the device and producer work, as opposed to devices D needing to pay for their transactions.

Although there are many well-established public chains or consortium chains on the market, many researchers still choose to run FL systems on Custom Blockchains. The main reason is that a Custom Blockchain allows better flexibility, programmability, and extensibility. In particular, BlockFL is an architecture based on a Custom Blockchain in which local learning model updates are exchanged and validated. Similarly, another proposed system consists of two modules: a permissioned Blockchain module and an FL module.

FL is essentially a kind of machine learning. Therefore, its learning performance, efficiency, and security are important aspects to take into account. For this reason, several studies have been proposed to make appropriate improvements to the BCFL and enhance the above model performance. Table 7 summarizes the current effective attempts to improve the BCFL.

FL is a distributed machine learning method that supports local storage of data. In this method, the client implements training through interactive gradient values. Therefore, the underlying idea for improving the accuracy of the model is similar to classical machine learning.

ChainFL proposed in Korkmaz et al. (2020) achieves encouraging results on the Modified National Institute of Standards and Technology database digit recognition task (MNIST) and the Canadian Institute for Advanced Research image classification task (CIFAR-10). Such results demonstrate that the BCFL model can enhance the system fault tolerance without losing the corresponding model performance compared to the traditional FL model.

The ID labels of data samples have a significant impact on the accuracy of machine learning models. To address the problem that user-generated data samples across devices are likely to become non-IID, Jeong et al. (2018) proposed federated augmentation (FAug), a data augmentation scheme that collectively trains generative models on each device to enhance the local data and generate IID datasets.

For industrial areas such as languages and games, large-scale computations still place high demands on overall algorithm performance (Ogiela and Ogiela 2009). Thus, tracking and measuring the algorithm's efficiency is crucial.

The efficiency of the database has an appreciable impact on the efficiency of FL. The smart contract function in the Blockchain can replace the oracle service to achieve the data access function. The work of Drungilas et al. (2021) uses chaincode in Hyperledger Fabric instead of oracle services in the database and compares the runtime of functions executed using either chaincode or oracle services, showing that the differences between the implementations are negligible, which justifies a flexible choice of model.

Blockchain allows the performance of algorithms to be securely stored and recorded, and in particular, the long-term trend of FL can be tracked, depicting the overall situation and future dynamics of algorithm efficiency during operation. Therefore, weights based on each client's local learning accuracy and weights based on each client's frequency of participation can be used to achieve higher stability and faster convergence times to target accuracy. For instance, Kim and Hong (2019) propose a local learning weighting method for node recognition. This method selects nodes according to their participation frequency and data, and weights them to achieve fast convergence and stable learning accuracy.
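One way to picture accuracy-based and participation-based weighting is a convex combination of the two normalized signals, used as aggregation weights. The sketch below is a hypothetical construction for illustration and does not reproduce the method of Kim and Hong (2019); the mixing factor `alpha` and the function names `combined_weights` and `weighted_aggregate` are assumptions.

```python
def combined_weights(accuracies, participation_counts, alpha=0.5):
    """Mix normalized local accuracy and normalized participation frequency.

    alpha = 1 uses accuracy only, alpha = 0 uses participation only.
    The returned weights sum to 1.
    """
    if len(accuracies) != len(participation_counts):
        raise ValueError("one accuracy and one participation count per client")
    acc_total = sum(accuracies)
    part_total = sum(participation_counts)
    if acc_total <= 0 or part_total <= 0:
        raise ValueError("totals must be positive")
    return [
        alpha * a / acc_total + (1 - alpha) * p / part_total
        for a, p in zip(accuracies, participation_counts)
    ]


def weighted_aggregate(updates, weights):
    """Element-wise weighted sum of client updates."""
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)]


# Client 0 is more accurate; client 1 has participated in more rounds.
weights = combined_weights([0.9, 0.6], [10, 30], alpha=0.5)
print(weights, weighted_aggregate([[1.0, 0.0], [0.0, 1.0]], weights))
```

Both inputs are values that a ledger could record per round, which is why the text above ties this kind of weighting to Blockchain-side bookkeeping.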

Existing schemes have shown that the decentralized control mechanism of Blockchain can effectively prevent risks such as SPOF (Firdaus and Rhee 2021; Dwivedi et al. 2021; Ruggeri et al. 2020), DDoS attacks (Saad et al. 2019; Rodrigues et al. 2017; Houda et al. 2019; Elisa et al. 2020), and poisoning attacks (Barański and Konorski 2020; Rathore et al. 2019). However, the considerable computing power and storage cost of standard solutions are still critical challenges.

Another possibility to achieve low-cost security improvements is to use re-encryption algorithms. For example, CrowdSFL is a crowdsourcing framework in which a re-encryption algorithm based on the ElGamal cryptosystem is designed to ensure that interaction values and other information are not exposed to participants outside the workflow. In this way, users can realize crowdsourcing with less overhead and higher security.

As mentioned in Sect. 4.2, the consensus mechanism in the Blockchain can better ensure the security and privacy of FL's data storage. Therefore, the appropriate improvement of the consensus mechanism can make FL more suitable for different scenarios and data. For example, a reliable worker selection scheme for FL tasks introduces the concept of reputation as a metric to identify trusted and reliable workers and to prevent unreliable updates.
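A reputation metric of this kind can be sketched as a score that moves toward 1 after reliable updates and toward 0 after unreliable ones, with workers selected only above a threshold. The exponential-moving-average rule, the initial score of 0.5, the threshold, and the function names below are assumptions chosen for illustration; the surveyed scheme may compute reputation differently.

```python
def update_reputation(reputation, worker, reliable, step=0.1):
    """Move the worker's score a fraction `step` toward 1 (reliable) or 0 (unreliable)."""
    old = reputation.get(worker, 0.5)  # unseen workers start from a neutral score
    target = 1.0 if reliable else 0.0
    reputation[worker] = old + step * (target - old)
    return reputation[worker]


def select_workers(reputation, threshold=0.5, k=None):
    """Return workers at or above the threshold, best first, optionally the top k."""
    eligible = [w for w, score in reputation.items() if score >= threshold]
    eligible.sort(key=lambda w: (-reputation[w], w))
    return eligible if k is None else eligible[:k]


reputation = {"a": 0.5, "b": 0.5, "c": 0.5}
update_reputation(reputation, "a", reliable=True)
update_reputation(reputation, "b", reliable=False)
print(select_workers(reputation, threshold=0.5))
```

Storing the per-round reliability outcomes on the ledger would let every task publisher recompute the same scores, which is what makes the metric usable across untrusted parties.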

FL participants pay for computational resources. However, the training and commercialization of models are not instantaneous, and therefore, there is some delay before the federation reimburses participants. In this section, we outline the incentive mechanism underlying the BCFL. More precisely, in Sect. 6.1, we summarize the current attempts to apply Blockchain technology in handling lazy clients, while in Sects. 6.2 and 6.3 we assess the client contribution and compelling motivation, respectively.

Basic FL does not take into account the identification of lazy clients and lacks incentives for influential learning clients.

Some studies have already begun to explore node incentives for FL, such as Ng et al. (2020) and Khan et al. (2020). However, since there is no actual token mechanism design, these studies mainly focus on documentation, detection, and simulation. In contrast, Block-FL's incentive mechanism deals with lazy nodes more practically. Typically, existing works propose Blade-FL, evaluate its learning performance with bounds that are convex with respect to the total number of rounds K, and optimize the computational resource allocation to minimize the upper bound.

To sustain the long-term engagement of the high-quality data owners (especially enterprises), the FL system needs to provide appropriate incentives based on the accurate evaluation of computational contributions. The systems in FL can be synchronous or asynchronous, depending on whether they use communication or not. In practice, the system functionality of FL can be well realized only if the computational work of the participating nodes is reasonably and well evaluated. The current reliable methods mainly include a joint learning framework based on a Blockchain protocol and a new measurement standard based on verification error (Martinez et al. 2019). Similarly, some of these methods introduce the concept of competition to prevent workers from deviating from the protocol (Ogiela et al. 2016), rewarding only those who contribute (Toyoda et al. 2020).

Based on the contribution score assessment, part of the BCFL model attempts to incentivize highly reputable mobile devices with high-quality data to participate in FL (Kang et al. 2019).

The peer-to-peer payment system is a natural profit allocation mechanism in the Blockchain. Taking inspiration from that mechanism, one line of work proposes a support vector machine-based profit allocation framework built on a proof-of-Shapley protocol. Another proposed framework is based on evaluating the fractions of the dataset for the corresponding share of rewards and on a framework of reasonable contribution scores generated by both protocols.
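For readers unfamiliar with Shapley-based allocation, the sketch below computes exact Shapley values by averaging each participant's marginal contribution over all join orders, and then splits a reward pool in proportion to those values. It is a generic textbook computation under assumed inputs, not the proof-of-Shapley protocol itself; the utility function `value`, the sample counts, and the function names are hypothetical, and the exhaustive enumeration is only practical for a handful of participants.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all join orders.

    `value` maps a frozenset of players to the utility of that coalition.
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += value(joined) - value(coalition)
            coalition = joined
    return {p: total / len(orders) for p, total in totals.items()}


def allocate(reward_pool, scores):
    """Split the pool in proportion to non-negative contribution scores."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {p: reward_pool * s / total for p, s in scores.items()}


# Toy utility: a coalition is worth the number of training samples it holds.
samples = {"a": 100, "b": 300, "c": 600}
phi = shapley_values(list(samples), lambda coalition: sum(samples[p] for p in coalition))
print(phi, allocate(10.0, phi))
```

With this additive toy utility each participant's Shapley value equals its own sample count, so the allocation reduces to rewards proportional to dataset size; a utility based on model accuracy would instead credit participants for how much they improve each coalition they join.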

Due to the strong adaptability exhibited by BCFL, there is an increasing trend of its wide application. This section mainly studies the industrial applications of BCFL. As shown in Table 8, we divide these applications into nine areas and summarize the benefits and improvements brought by the corresponding research.

The health care industry is in a prominent position in using data to create value and improve human health. However, it has been shown that the traditional methods used to alleviate the privacy problems of health data are insufficient to protect personal interests. Medical data is, moreover, highly privacy sensitive. BCFL can be an effective solution to mitigate the problems mentioned above since it meets the data processing requirements in the field of medical and health care. In particular, BCFL not only completes the modeling requirements of physical therapy data but also avoids privacy leaks on relevant data. For instance, a new agent model based on BCFL is proposed in Dp et al. (2021) as a real-time medical data processing system. To strengthen the privacy of health care data, the model proposed in Passerat-Palmbach et al. (2019) integrates a privacy protection technology based on a protocol composed of protected hardware components and the native Ethereum cryptographic toolkit. Finally, the work of Passerat-Palmbach et al. (2020) uses a similar model and, on this basis, strengthens the incentive mechanism of data operation.

Open networks and service sharing scenarios are complex and varied, leading to serious security challenges. In the FL setting, adversaries have more opportunities to poison a local machine learning model with malicious training samples, thus affecting the results of FL and evading detection. However, the work of Preuveneers et al. (2018) shows that auditing machine learning models with an anomaly detection algorithm that inspects the incremental updates recorded on a Blockchain ledger can effectively prevent attacks. For the same purpose, the framework proposed by Desai et al. (2020) uses smart contracts to automatically detect attackers and punish them through fines.
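The idea of auditing updates recorded on a ledger can be illustrated with a simple robust outlier rule: compare each recorded update with the coordinate-wise median of all updates and flag those that lie unusually far away. This is a stand-in chosen for brevity and is not the anomaly detection algorithm of Preuveneers et al. (2018); the ledger format, the `factor` threshold, and the function name `flag_anomalous_updates` are assumptions.

```python
from math import dist  # Euclidean distance, available from Python 3.8
from statistics import median


def flag_anomalous_updates(ledger, factor=3.0):
    """Flag clients whose update is far from the coordinate-wise median update.

    `ledger` is a list of (client_id, update_vector) entries with unique client ids.
    An update is flagged when its distance to the median vector exceeds `factor`
    times the median of all such distances.
    """
    dim = len(ledger[0][1])
    center = [median(update[i] for _, update in ledger) for i in range(dim)]
    distances = {client: dist(update, center) for client, update in ledger}
    typical = max(median(distances.values()), 1e-12)  # guard against a zero baseline
    return [client for client, d in distances.items() if d > factor * typical]


ledger = [
    ("a", [1.0, 1.0]),
    ("b", [1.1, 0.9]),
    ("c", [0.9, 1.1]),
    ("d", [10.0, -10.0]),  # a poisoned-looking update
]
print(flag_anomalous_updates(ledger))
```

Because the ledger is append-only, such an audit can be rerun by any participant on the same history, and a contract in the style described above could attach a penalty to flagged submissions.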

Device fault detection is one of the most critical issues in the industrial Internet of Things (IIoT). However, in traditional IoT device fault detection, client devices need to upload raw data to a central server for model training, which carries the risk of leakage of sensitive business data.

Given the sensitivity, massive volume, fragmentation, and security of multi-party data computation in the IoT environment, the works of Yin et al. (2020) and Rahman et al. (2020) both propose a BCFL-based learning approach for device fault detection in IoT. In particular, to solve the data heterogeneity problem in IoT fault detection, Zhang et al. (2021) proposed a novel centroid distance weighted federated averaging (CDW_FedAvg) algorithm. In detail, this algorithm enhances the applicability and model accuracy by taking the distance between the positive and negative classes of each client dataset as the basis for calculation.
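The following sketch shows one plausible reading of centroid-distance weighting: each client computes the Euclidean distance between the centroid of its positive samples and the centroid of its negative samples, and the server uses those distances as aggregation weights. The survey does not state whether CDW_FedAvg weights by the distance, by its inverse, or in combination with dataset size, so the proportional rule here is an assumption, as are the function names and the toy data.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def centroid(points):
    """Coordinate-wise mean of a non-empty list of feature vectors."""
    if not points:
        raise ValueError("a class with no samples has no centroid")
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def class_centroid_distance(samples):
    """Distance between positive-class (label 1) and negative-class (label 0) centroids."""
    positives = [x for x, label in samples if label == 1]
    negatives = [x for x, label in samples if label == 0]
    return dist(centroid(positives), centroid(negatives))


def distance_weighted_average(models, distances):
    """Average client models with weights proportional to their centroid distances."""
    total = sum(distances)
    if total <= 0:
        raise ValueError("total distance must be positive")
    dim = len(models[0])
    return [sum(d * m[i] for m, d in zip(models, distances)) / total for i in range(dim)]


client_data = [([0.0, 0.0], 0), ([0.0, 2.0], 0), ([3.0, 5.0], 1), ([3.0, 5.0], 1)]
d = class_centroid_distance(client_data)
print(d, distance_weighted_average([[1.0], [3.0]], [1.0, 3.0]))
```

Only the scalar distance and the model leave each client in this sketch, so the weighting signal does not require sharing raw device data.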

In the Internet of Vehicles (IoV), sharing data between vehicles for collaborative analysis can improve the driving experience and service quality. However, efficiency, security, and privacy issues have become obstacles for data providers to participate in the data sharing process (Meng et al. 2021). Fortunately, the BCFL framework is a suitable solution to the contradiction between large-scale data sharing and privacy protection. More precisely, the fundamental applications of BCFL use the validation and consensus mechanisms within the Blockchain to secure IoV data and jointly ensure trustworthy shared training of machine learning models on decentralized end devices (Otoum et al. 2020). In detail, such operations are carried out by adapting instant block validation at the Blockchain level (Pokhrel 2020) and assessing the trustworthiness of vehicle observations during data collection (Chai et al. 2020). On this basis, the work of Pokhrel and Choi uses the consensus mechanism of Blockchain to manage data without any centralized training or coordination. Meanwhile, modeling the controllable network and BCFL parameters (such as retransmission limit, block size, block arrival rate, and frame size) makes it possible to capture their impact on system-level performance. Finally, some researchers have deployed SVM (Hua et al. 2020) and DRL algorithms to improve efficiency.

In recent years, a large number of new applications requiring different network services have emerged. To secure FL in 5G communication, the main current solutions are Blockchain authorization and decentralized federated slicing architecture. Furthermore, the work of Lu et al. (2021) proposed a digital twin wireless network (DTWN) scheme which moved real-time data processing and computing to the edge plane by merging digital twins into wireless networks.

Edge computing architecture can quickly process the data collected by the Internet of Things (IoT). Based on the concept of Blockchain reputation perception for fine-grained FL, the model proposed in Rehman et al. (2020) can ensure credible collaborative training in mobile edge computing systems. In addition, the work of Cui et al. (2020) proposes to apply a compression algorithm of FL, assisted by the Blockchain, to predict the content caching of files.

On the other hand, Shen et al. (2021) propose a new attribute inference attack that exploits the unexpected attribute leakage of Blockchain-aided FL in intelligent edge computing.

As an extension of cloud computing and the foundation of IoT, fog computing is experiencing rapid growth. Indeed, fog computing has the potential to alleviate some thorny issues, such as network congestion, latency, and local autonomy. However, privacy concerns and consequent inefficiencies are slowing down the performance of fog computing (Huang et al. 2019). FL-Block, proposed in Qu et al. (2020), modifies the structure of the fog server by storing global updates on the Blockchain to secure them, allowing the end devices to maintain the global model and coordinate based on distributed consensus.

Cognitive computing aims to teach a computer to think like a human brain rather than merely to build an artificial system. In particular, with the success of AlphaGo and other AI algorithms, cognitive computing has developed rapidly.

In this context, the work of Qu et al. (2021) introduces a BCFL-based customized reward system to promote public equipment to participate in high-performance industries by deploying Blockchain as the underlying architecture.

Defense organizations and armed forces are crucial elements for the protection and survival of a nation. However, supporting them requires robust networks and computing power to coordinate intelligence and information processing efficiently. Moreover, given the highly classified nature of national data, Sharma et al. (2020) propose a distributed computational defense framework for a sustainable society using Blockchain technology and FL features. In particular, the proposed framework makes it possible to infer battlefield states while protecting the privacy of sensitive data.

This paper presents a survey on the applicability and integration of Blockchain with federated learning (FL). More precisely, we denote this integration as the Blockchain-based federated learning (BCFL) framework and provide a comprehensive survey of issues related to BCFL implementation. In this paper, we first provide a basic description of the definitions and ecosystems characterizing Blockchain and FL. Then, we present the structural design of BCFL as a whole and summarize the feasible deployment platforms. Next, we discuss how introducing Blockchain improves FL models. After that, we survey the research on Blockchain incentives as a means to improve FL systems. Finally, we summarize the full range of possible applications of BCFL in industry.

In conclusion, the combination of Blockchain and FL is an auspicious research direction, as it can better ensure data security and privacy in the case of abundant data. In addition, this combination makes it possible for more application scenarios to adopt this distributed learning model that does not need to share raw data for joint modeling.

This survey aims to provide a clear view of this topic and to encourage more researchers to work on it. Future research could deepen and develop the following aspects:

(1) This paper does not use cross-referencing or quantitative measures to quantify the overall trends in relevant research. Therefore, future research could consider introducing these elements as a supplement.

(2) Future studies may consider summarizing and classifying the related works from a broader range of perspectives to uncover additional research information relevant to BCFL.

(3) BCFL may be applied to a growing number of industrial fields. Consequently, future research may examine its effects in different industrial fields and carry out more comparative studies.

Protecting personal healthcare record using blockchain and federated learning technologies

Federated learning: a survey on enabling technologies, protocols, and applications

Mitigation of fake data content poisoning attacks in ndn via blockchain

Decentralised learning in federated deployment environments: a system-level survey

Risk and advantages of federated learning for health care data collaboration

Towards federated learning at scale: system design

A review of privacy preserving federated learning for private iot analytics

Ethereum/wiki, github

Dynamic sample selection for federated learning with heterogeneous data in fog computing

2CP: decentralized protocols to transparently evaluate contributivity in blockchain federated learning environments

A hierarchical blockchain-enabled federated learning algorithm for knowledge sharing in internet of vehicles

Creat: Blockchain-assisted compression algorithm of federated learning for content caching in edge computing

A survey on application of machine learning for internet of things

BlockFLA: Accountable federated learning via hybrid blockchain architecture

Mitigating data poisoning attacks on a federated learning-edge computing network

Agent architecture of an intelligent medical system based on federated learning and blockchain technology

Towards blockchain-based federated machine learning: smart contract for model inference

Federated learning for vehicular internet of things: recent advances and open issues

Federated learning for vehicular internet of things: recent advances and open issues

Blockchain-based internet of things and industrial IoT: a comprehensive survey

A framework of blockchain-based secure and privacy-preserving e-government system

SBBS: A secure blockchain-based scheme for IoT data credibility in fog environment

Federated learning framework for mobile edge computing networks

Smart Ponzi scheme detection using federated learning

On blockchain-enhanced secure data storage and sharing in vehicular edge computing networks

End-to-end evaluation of federated learning and split learning for internet of things

Introducing tensorflow federated

From blockchain consensus back to byzantine consensus

Eos-an introduction

A traceable and revocable ciphertext-policy attribute-based encryption scheme based on privacy protection

FedML: a research library and benchmark for federated machine learning

Survey on blockchain based smart contracts: applications, opportunities and challenges

Resource management for blockchain-enabled federated learning: a deep reinforcement learning approach

Cochain-SC: an intra-and inter-domain DDoS mitigation scheme based on blockchain using SDN and smart contract

Age-optimal power allocation in industrial IoT: a risk-sensitive federated learning approach

Blockchain enabled federated slicing for 5G networks with AI accelerated optimization

Blockchain-based federated learning for intelligent control in heavy haul railway

Blockchain-based fair three-party contract signing protocol for fog computing

The oarf benchmark suite: characterization and implications for federated learning systems

GFL: a decentralized federated learning framework based on blockchain

Distributed sensing using smart end-user devices: Pathway to federated learning for autonomous IoT

Communication-efficient on-device machine learning: federated distillation and augmentation under non-iid private data

Federated learning in smart city sensing: challenges and opportunities

Towards utilizing unlabeled data in federated learning: a survey and prospective. arXiv e-prints

Advances and open problems in federated learning

Retrospective sensing based on federated learning in the IoT

Incentive mechanism for reliable federated learning: a joint optimization approach to combining reputation and contract theory

Reliable federated learning for mobile networks

Scalable and communication-efficient decentralized federated edge learning with multi-blockchain framework

Federated learning for edge networks: resource optimization and incentive mechanism

Blockchain-based node-aware dynamic weighting methods for improving federated learning performance

Blockchained on-device federated learning

A study on blockchain-based music distribution framework: focusing on copyright protection

On-device federated learning via blockchain and its latency analysis

Federated optimization: Distributed machine learning for on-device intelligence

Federated learning: strategies for improving communication efficiency

Federated learning: strategies for improving communication efficiency

Chain FL: Decentralized federated machine learning via blockchain

Fourth world conference on smart trends in systems, security and sustainability (WorldS4)

Blockchain: overview, practical implementation & its uses

Blockchain-federated-learning and deep learning models for COVID-19 detection using CT imaging

Modelchain: Decentralized privacy-preserving healthcare predictive modeling framework on private blockchain networks

Crowdbc: a blockchain-based decentralized framework for crowdsourcing

Crowdsfl: a secure crowd computing framework based on blockchain and federated learning

Federated learning: challenges, methods, and future directions

A survey on the security of blockchain systems

Design and implementation of an anomaly network traffic detection model integrating temporal and spatial features

A blockchain-based decentralized federated learning framework with committee consensus

Distributed blockchain-based data protection framework for modern power systems against cyber attacks

A secure fabric blockchain-based data transmission technique for industrial internet-of-things

An industrial network intrusion detection algorithm based on multifeature data clustering optimization model

Deep reinforcement learning for resource protection and real-time detection in iot environment

Secure data storage and recovery in industrial blockchain network environments

A mutual security authentication method for RFID-PUF circuit based on deep learning

Data fusion approach for collaborative anomaly intrusion detection in blockchain-based systems

Circuit copyright blockchain: blockchain-based homomorphic encryption for IP circuit protection

Federated learning in mobile edge networks: a comprehensive survey

Federated learning in mobile edge networks: a comprehensive survey

Federated reinforcement learning for training control policies on multiple IoT devices

GGS: general gradient sparsification for federated learning in edge computing

Blockchain assisted decentralized federated learning (BLADE-FL) with lazy clients

Blockchain assisted decentralized federated learning (BLADE-FL): performance analysis and resource allocation

LotteryFL: Personalized and communication-efficient federated learning with lottery ticket hypothesis on non-iid datasets

A secure federated learning framework for 5G networks

Fedvision: an online visual object detection platform powered by federated learning

A secure domain name resolution and management architecture based on blockchain

A survey on federated learning systems: vision, hype and reality for data privacy and protection

A systematic literature review on federated machine learning: from a software engineering perspective

Blockchain: a survey on functions, applications and open issues

Blockchain and federated learning for privacy-preserved data sharing in industrial IoT

Differentially private asynchronous federated learning for mobile edge computing in urban informatics

Low-latency federated learning and blockchain for edge association in digital twin empowered 6G networks

Threats to federated learning: a survey

PaddlePaddle: an open-source deep learning platform from industrial practice

Transparent contribution evaluation for secure federated learning on blockchain

FLchain: Federated learning via mec-enabled blockchain network

When federated learning meets blockchain: a new distributed learning paradigm. ArXiv

Record and reward federated learning contributions with blockchain

Communication-efficient learning of deep networks from decentralized data

A lightweight anonymous cross-regional mutual authentication scheme using blockchain technology for internet of vehicles

A survey on security and privacy of federated learning

BlockFLow: an accountable and privacy-preserving solution for federated learning

Privacy-preserving blockchain based federated learning with differential data sharing

A multi-player game for studying federated learning incentive schemes

Federated learning for wireless communications: motivation, opportunities, and challenges

Secure information splitting using grammar schemes. New challenges in computational collective intelligence

Efficiency of strategic data sharing and management protocols

Blockchain-supported federated learning for trustworthy vehicular networks

Blockchain-orchestrated machine learning for privacy preserving federated learning in electronic health data

A blockchain-orchestrated federated learning architecture for healthcare consortia

WITHDRAWN: towards efficient and reliable federated learning using blockchain for autonomous vehicles

Federated learning with blockchain for autonomous vehicles: analysis and design challenges

Improving TCP performance over WiFi for internet of vehicles: a federated learning approach

Chained anomaly detection models for federated learning: an intrusion detection case study

Particle swarm optimized federated learning for industrial IoT and smart city services

Decentralized privacy using blockchain-enabled federated learning in fog computing

A blockchained federated learning framework for cognitive computing in industry 4.0 networks

Blockchain-enabled 5g edge networks and beyond: an intelligent cross-silo federated learning approach

Secure and provenance enhanced internet of health things framework: a blockchain managed federated learning approach

Baffle: blockchain based aggregator free federated learning

Blockdeepnet: a blockchain-based secure deep learning for IoT network

Towards blockchain-based reputation-aware federated learning

FedPAQ: A communication-efficient federated learning method with periodic averaging and quantization

The future of digital health with federated learning

A blockchain-based architecture for collaborative DDoS mitigation with smart contracts. IFIP International conference on autonomous infrastructure, management and security

BCB-X3DH: a blockchain based improved version of the extended triple diffie-hellman protocol

A generic framework for privacy preserving deep learning

Exploring the attack surface of blockchain: a systematic overview

Distributed federated learning for ultra-reliable low-latency vehicular communications

Federated learning meets contract theory: energy-efficient framework for electric vehicle networks

Robust and communication-efficient federated learning from non-iid data

Federated learning with cooperating devices: a consensus approach for massive IoT networks

Blockchain and federated learning-based distributed computing defence framework for sustainable society

Biscotti: a blockchain system for private and secure federated learning

Exploiting unintended property leakage in blockchain-assisted federated learning for intelligent edge computing

From distributed machine learning to federated learning: In the view of data privacy and security. Practice and Experience

Federated machine learning in vehicular networks: A summary of recent applications

Blockchain-enabled federated learning with mechanism design

Federated machine learning: survey, multi-level classification, desirable criteria and future directions in communication and networking systems

Adaptive federated learning in resource constrained edge computing systems

In-edge AI: intelligentizing mobile edge computing, caching and communication by federated learning

Fate: An industrial grade federated learning framework

Enhancing IoT anomaly detection performance for federated learning

A survey of distributed consensus protocols for blockchain networks

A secure framework for data sharing in private blockchain-based WBANs

Verifynet: secure and verifiable federated learning

A blockchain-based roadside unit-assisted authentication and key agreement protocol for internet of vehicles

Federated machine learning: concept and applications

Federated machine learning for intelligent IoT via reconfigurable intelligent surface

FDC: A secure federated deep learning mechanism for data collaborations in the internet of things

A real-world service mashup platform based on data integration, information synthesis, and knowledge fusion

A federated learning framework for healthcare IoT devices

Systematic review of privacy-preserving distributed machine learning from federated databases in health care

A survey of incentive mechanism design for federated learning

Blockchain-based federated learning for device failure detection in industrial IoT

Demo: a blockchain based protocol for federated learning

Privacy-preserving blockchain-based federated learning for IoT devices

Blockchain challenges and opportunities: a survey

An overview of blockchain technology: Architecture, consensus, and future trends

Solutions to scalability of blockchain: a survey

Privacy-preserving federated learning in fog computing

Towards the optimality of service instance selection in mobile edge computing

Reputation-based regional federated learning for knowledge trading in blockchain-enhanced IoV. In: 2021 IEEE Wireless communications and networking conference (WCNC)

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.