1 Introduction

The large-scale generation of data, fundamental to modern systems, has become a powerful force integrating the physical and digital worlds. This transformation not only changes the way we interact with the devices around us but also reshapes entire industrial sectors, opening up opportunities for countless innovations. The Internet of Things (IoT) market is expected to reach a value of USD 1.39 trillion by 2026, making it a major contributor to this large-scale data generation. In Brazil, the technology has significant potential: the National Bank for Economic and Social Development (BNDES) estimates that the country will generate around USD 200 billion in revenue from IoT implementations by 2025 [5]. This growth reflects not only technological advancement but also the increasing need for personalized solutions to address specific challenges, particularly those related to privacy.

Even simple activities generate important inputs for Machine Learning (ML), empowering the development of intelligent tasks according to business needs. Hospitals routinely collect patient data (e.g., X-ray images), which constitutes a valuable resource for training ML models that aim at detecting diseases [14]. Yet, the volume of disease-specific data (e.g., pneumonia) collected by each hospital often falls short of supporting robust supervised ML models, particularly sophisticated architectures like Convolutional Neural Networks (CNNs) [8, 9]. Sharing data from multiple hospitals to establish a comprehensive repository (a large dataset) therefore offers a promising avenue to overcome this limitation, facilitating the development of more accurate ML models. Nevertheless, preserving patients’ privacy is paramount.

Recently, the American government issued a presidential executive order on safe, secure, and trustworthy Artificial Intelligence (AI) [18], which states the intention to protect Americans’ privacy and explicitly provides mechanisms to strengthen research into privacy-preserving technologies, prioritizing federal support to accelerate their development and use. This fact, alongside many others, such as the publication of regulations like the GDPR [1], highlights the need for new technologies capable of meeting privacy-preservation requirements while using techniques widely recognized in the context of AI.

Fig. 1. Federated learning training protocol. Adapted from Bonawitz et al. [3].

Federated Learning (FL) [7, 11] is a machine learning methodology tailored to situations where clients are decentralized and/or data privacy is a major concern, as exemplified by sensitive medical examination data. Its primary objective is to aggregate the knowledge acquired from diverse sources throughout the learning process. Training is executed collaboratively across multiple edge devices or servers, aggregating updates (i.e., CNN weights) from these entities without centralizing the data, thereby preserving privacy and reducing communication overhead [10, 12, 15]. Figure 1 illustrates the FL training approach. The server selects the available clients from among all n clients (a client may range from a mobile device to an institution, such as a hospital or a company). After selection, the server sends the model weights and settings to the clients to initiate local model training. Training time may differ from client to client, depending on the amount of data and the computing power available at each site. After the training stage, each client reports its results to the server by sending its model updates. Once all updates are received, the server aggregates the knowledge from all clients into a single model, and a new federated iteration begins.
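The round-based protocol above can be sketched in a few lines. In this illustration, `local_train` and the toy per-client data are hypothetical stand-ins for the actual on-device training; the aggregation shown is a plain (unweighted) average of the clients' updates:

```python
import copy

def local_train(weights, local_data):
    # Hypothetical stand-in: each client refines the received weights
    # on its own data and returns the updated weights.
    return [w + 0.1 * x for w, x in zip(weights, local_data)]

def federated_round(global_weights, clients):
    """One federated iteration: broadcast, local training, aggregation."""
    updates = []
    for local_data in clients:
        # The server sends a copy of the current global weights to each client.
        w = copy.deepcopy(global_weights)
        updates.append(local_train(w, local_data))
    # Aggregate the clients' updates into a single model (unweighted average).
    n = len(updates)
    return [sum(ws) / n for ws in zip(*updates)]

# Toy usage: a two-parameter model shared by three clients.
new_global = federated_round([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Note that the raw `local_data` never leaves its client; only the updated weights travel to the server, which is the core privacy property the protocol relies on.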

This work presents a federated-learning-based solution to train a supervised neural network with data from multiple sites (hospitals) for lung disease identification in chest X-ray images in a decentralized way. Throughout successive learning iterations, our approach individually trains a local model for each site, leveraging solely its own data. At the end of each round, the learned weights from all participating sites are shared with a central server, ensuring data privacy by refraining from sharing images. The central server then aggregates these weights into a global model, subsequently disseminating it back to each site for further refinement as part of a new federated iteration. This iterative process runs across multiple rounds.

We evaluated our solution within a simulated environment containing chest X-ray images from five distinct sites. Our approach relies on a dense convolutional neural network, DenseNet [6], based on the CheXNet architecture proposed by Rajpurkar et al. [14] for lung disease identification. Our solution presented promising results in identifying fourteen lung diseases when compared against three baselines trained on the whole dataset.

1.1 Motivation

We were motivated by two main factors. The first is the promotion of techniques that prioritize collaborative approaches over private data strategies. While many open datasets from different medical fields are available [16], a model trained on such datasets alone is unlikely to be as effective as a collaborative model, given the size of the federated dataset that widespread adoption by multiple entities could provide. Such adoption could usher in a new era of improved diagnostic accuracy.

The other motivating factor is the importance of human life. A recent study from the Johns Hopkins Armstrong Institute estimated that more than 795,000 Americans suffer serious harm due to misdiagnosis every year [13]. Some are left permanently disabled, while others lose their lives. We believe that, to address this situation, the medical field can leverage AI to support its professionals in making diagnoses. As shown in a recent study [17], even simple techniques like logistic regression and statistical analysis, when applied to tabular data, can provide crucial information to support better medical decision-making.

1.2 Organization

The remainder of this article is organized as follows. In Sect. 2, we present the materials and methods used in the experiments conducted in this work. In Sect. 3, we detail the obtained results and discuss them in comparison to other works. Finally, in Sect. 4, we conclude the article with a summary of our main findings and future work.

2 Materials and Methods

The presented solution employs the principles of federated learning to train a decentralized supervised neural network specifically designed to automate the identification of lung diseases within chest X-ray images. Tailored for multilabel classification, wherein each patient image can be assigned zero or more labels (in our case, lung diseases), the proposed solution is application-independent, thus adaptable for other domains and image modalities, such as identifying brain lesions in Magnetic Resonance (MR) images, for instance. However, a fundamental requirement for its implementation remains the utilization of neural networks.

Fig. 2. Sample images from the ChestX-Ray14 dataset. The text above each image denotes its corresponding labels (diseases).

2.1 Data

While our solution is application-independent, evaluating it proves challenging due to the absence of publicly available large annotated datasets for various medical conditions. In this context, we opted to utilize the ChestX-Ray14 dataset [19] for assessment, comprising a substantial collection of 112,120 frontal-view chest X-ray images sourced from 30,805 distinct patients. Figure 2 presents a few examples of images for multiple diseases. Each image was annotated with as many as 14 distinct lung disease labels, acquired through automated extraction methods applied to radiology reports. These labels encompass Atelectasis, Cardiomegaly, Consolidation, Edema, Effusion, Emphysema, Fibrosis, Hernia, Infiltration, Mass, Nodule, Pleural Thickening, Pneumonia, and Pneumothorax. Images with no diseases are labeled as ‘No Finding’.
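Since each image can carry several findings, the annotations are naturally represented as 14-dimensional multi-hot vectors. A minimal sketch follows; the ‘|’ separator and the exact label spellings are assumptions about the distributed annotation file, not details confirmed by this work:

```python
# The 14 disease labels of ChestX-Ray14, in a fixed order.
LABELS = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Effusion",
          "Emphysema", "Fibrosis", "Hernia", "Infiltration", "Mass", "Nodule",
          "Pleural Thickening", "Pneumonia", "Pneumothorax"]

def encode_labels(finding_str):
    """Convert a '|'-separated finding string into a 14-dim multi-hot vector.
    'No Finding' (or any unknown label) contributes no 1s, yielding all zeros."""
    findings = set(finding_str.split("|"))
    return [1 if label in findings else 0 for label in LABELS]

vec = encode_labels("Effusion|Infiltration")  # 1s at the Effusion and Infiltration slots
```

This multi-hot encoding is what a multilabel loss (e.g., per-label binary cross-entropy) consumes, one independent binary target per disease.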

2.2 Model

Our model is a convolutional neural network with 121 layers. The network is initialized with weights pre-trained on the ImageNet dataset [4]. The network is trained using the Adam optimizer with default parameters (\(\beta _{1}\) = 0.9 and \(\beta _{2}\) = 0.999). We train the model using mini-batches of size 16. Since the dataset has fourteen labels (see Sect. 2.1), we replaced the final layer with a 14-output fully connected layer, where we applied an element-wise sigmoidal function.
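The element-wise sigmoid head scores each disease independently, unlike softmax, whose outputs compete and must sum to 1. A minimal sketch of this multilabel output behavior (plain Python; the real layer operates on the network's 14 logits):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_head(logits):
    """Element-wise sigmoid over the 14 output logits: each disease gets an
    independent probability in (0, 1), so several diseases can simultaneously
    score high, which softmax would not allow."""
    return [sigmoid(z) for z in logits]

probs = multilabel_head([0.0] * 14)  # every probability is 0.5 at zero logits
```

This independence is what makes the architecture suitable for images annotated with zero, one, or many of the 14 labels.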

2.3 Federated Training

Following the federated learning protocol illustrated in Fig. 1, we conducted a simulation involving five hospitals, leveraging X-ray images from each hospital to facilitate the identification of lung diseases. Our methodology entails the utilization of local hospital datasets to train individual local models, one per hospital. After local training, the resultant model weights from all hospitals are sent to a central server, where they are aggregated through the Federated Averaging [12] method into a single global model that synthesizes the learned knowledge. This global model is then returned to all participating hospitals, perpetuating the federated training process. This process is shown in Fig. 3.
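Federated Averaging weights each client's parameters by its share of the total number of training examples. A minimal sketch of the aggregation step, with plain Python lists standing in for the real weight tensors:

```python
def fed_avg(client_weights, client_sizes):
    """Federated Averaging: scale each client's parameter vector by its share
    of the total training examples, then sum the scaled vectors."""
    total = sum(client_sizes)
    aggregated = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # this client's fraction of all training data
        for i, w in enumerate(weights):
            aggregated[i] += share * w
    return aggregated

# Two clients with equal data volumes reduce to a plain average.
fed_avg([[1.0, 2.0], [3.0, 4.0]], [100, 100])  # [2.0, 3.0]
```

With the near-equal hospital splits used in this work, the weighted average is close to a uniform one; the weighting matters when sites hold very different data volumes.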

Fig. 3. Federated training: experimental implementation.

For our experimental setup, we initially partitioned the ChestX-Ray14 dataset into two distinct subsets: a training set denoted as \(D_{train}\) (comprising \(80\%\) of the data) and a separate testing set labeled as \(D_{test}\) (accounting for \(20\%\) of the data). Subsequently, we further subdivided \(D_{train}\) into five distinct subsets, simulating a scenario involving five distinct hospitals. Each hospital dataset, referred to as \(H^i\), comprised approximately 18,000 images. Finally, we divided each hospital’s dataset into two distinct portions: a hospital training subset, designated as \(H^i_{train}\) (constituting \(75\%\) of \(H^i\)) to train the local model, and a separate hospital validation subset, denoted as \(H^i_{val}\) (comprising \(25\%\) of \(H^i\)) for local validation.
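The split proportions above imply roughly the following subset sizes. This small helper is a hypothetical reconstruction that makes the arithmetic explicit; the actual splits may round differently:

```python
def partition_sizes(n_total, n_hospitals=5, test_frac=0.20, val_frac=0.25):
    """Approximate subset sizes for the experimental splits:
    D_train/D_test, then per-hospital H^i, then H^i_train/H^i_val."""
    n_test = int(n_total * test_frac)          # 20% held-out global test set
    n_train = n_total - n_test                 # 80% federated training pool
    per_hospital = n_train // n_hospitals      # equal share per simulated site
    h_val = int(per_hospital * val_frac)       # 25% local validation
    h_train = per_hospital - h_val             # 75% local training
    return {"D_train": n_train, "D_test": n_test,
            "H_i": per_hospital, "H_i_train": h_train, "H_i_val": h_val}

partition_sizes(112120)  # each H^i holds roughly 18,000 images
```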

It is worth highlighting that after \(D_{train}\) is split into five subsets, the images themselves remain localized within each hospital site. Only the weights of each neural network, trained individually at each site, are shared with the central server. The federated learning process encompassed 30 rounds, each comprising 10 local epochs, resulting in a cumulative total of 300 training iterations per site. In each round, the global model, with the aggregated weights, was evaluated using \(D_{test}\).

The orchestration of training and communication across these training sites is coordinated via the Flower framework [2] in simulation mode. Our simulation required approximately 72 h to complete the training phase on a single computer with the following hardware: an 8th-generation Intel Core i7 CPU clocked at 1.80 GHz, 16 GB of RAM, and an Nvidia GeForce GPU with 4 GB of dedicated memory.

3 Results and Discussion

We used the per-class AUROC (Area Under the Receiver Operating Characteristic Curve) to evaluate and compare our results against three baselines: 1) Wang et al. [19], the work that released the ChestX-Ray14 dataset and the former state-of-the-art for 1 class; 2) Yao et al. [20], the former state-of-the-art for 13 classes; and 3) CheXNet [14], which, as far as we know, is the current state-of-the-art model for lung disease identification on the ChestX-Ray14 dataset. We highlight that all baselines were trained on the full training set \(D_{train}\), while our solution used roughly one-fifth of the training data per site. Although we do not expect our results to surpass state-of-the-art performance, we do expect them to be comparable.
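Per-class AUROC is computed independently for each of the 14 labels from the model's sigmoid scores. A minimal pure-Python sketch of the metric via its pairwise-ranking interpretation (a production evaluation would use a library implementation over the full test set):

```python
def auroc(y_true, y_score):
    """AUROC for a single class: the probability that a randomly chosen
    positive example is scored above a randomly chosen negative one,
    with ties counting as half. Equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auroc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])  # perfect ranking -> 1.0
```

Because AUROC depends only on the ranking of scores, it is insensitive to the classification threshold, which suits the heavily imbalanced disease labels in ChestX-Ray14.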

Table 1 shows the per-class AUROC of our solution (FL) compared to the baselines. We have highlighted in bold the diseases for which our method outperformed Wang et al. [19] and underlined the diseases for which we obtained virtually the same results when compared to other methods.

Table 1. Experimental results (per-class AUROC) for lung disease detection in the ChestX-Ray14 dataset.

Our solution demonstrated comparable performance to Wang et al. [19], yielding improved scores (in bold) for specific diseases such as Atelectasis, Infiltration, Nodule, Pneumonia, Consolidation, and Hernia, and virtually the same performance (underlined values) for Effusion, Mass, Pneumothorax, Fibrosis, and Pleural Thickening. Conversely, it presented inferior results to Yao et al. [20] and CheXNet [14] for all diseases. This discrepancy can be attributed to the fact that both baselines were trained on the complete training set \(D_{train}\), while FL utilized substantially less data: each hospital’s training set is approximately one-fifth the size of \(D_{train}\). Hence, this comparison lacks parity, and suboptimal results are to be expected. Nevertheless, FL presented promising results for certain diseases compared to Yao et al. and CheXNet, particularly evident in the cases of ‘Infiltration,’ ‘Consolidation,’ and ‘Fibrosis,’ as highlighted in Table 1.

It is important to highlight a few considerations. First, the effectiveness of the baseline methods, notably CheXNet, is contingent upon access to a large annotated dataset (e.g., tens of thousands of labeled images). Yet, this prerequisite proves impracticable in most clinical routines, given the challenge of obtaining such a volume of annotated data for a specific disease, coupled with the labor-intensive and time-consuming nature of data annotation. Consequently, these baseline methods tend to exhibit poorer results when these conditions cannot be met. In contrast, while prioritizing data privacy, FL operates under the premise of substantially smaller annotated datasets and strives to harness collective learning from diverse sources. This underscores both the potential inherent in our proposed solution and the substantial room for further enhancement.

4 Conclusion

This work presented a federated-learning-based solution for automatically identifying lung diseases in chest X-ray images, effectively tackling privacy challenges inherent to medical imaging and enhancing identification robustness through federated learning. This approach facilitates decentralized and collaborative model training, embracing data from diverse sources. Our solution was trained on considerably less data per site than the baselines, mirroring a more realistic clinical setting. Yet, it strives to compete with state-of-the-art methods that may not be feasible in a clinical scenario, since they demand large annotated datasets of sensitive data, which privacy constraints make difficult to assemble. Our federated learning model exhibits significant potential for improvement (e.g., fine-tuning), allowing it to contend with state-of-the-art benchmarks following further refinement.

While hospitals routinely collect patient data, sharing these data across multiple hospitals to establish a comprehensive large dataset is impractical due to privacy constraints. Our solution followed the federated learning protocol to train decentralized dense convolutional neural networks across data from multiple sites (hospitals). Throughout successive learning rounds, each hospital’s learned model weights (knowledge) are shared with a central server and aggregated to build a global model, thus preserving data privacy and enhancing robustness.

Our solution reported promising results in identifying fourteen lung diseases on a comprehensive chest X-ray dataset. Although it presented inferior results to state-of-the-art methods, their efficacy depends on a large annotated dataset, a demand often unfeasible within practical clinical workflows. The hurdles of assembling a substantial volume of annotated data for a particular ailment, combined with the laborious and time-intensive annotation process, underscore this limitation. In contrast, our solution improves lung disease identification by aggregating the knowledge learned from smaller datasets across multiple sites, mirroring a more realistic clinical setting.

For future work, we first intend to refine our dense CNN through fine-tuning, regularization methods, and addressing the class imbalance. Second, we aspire to assess other deep neural networks for lung disease identification and consider different data volumes per site, on non-IID scenarios, and various numbers of sites. Finally, we intend to evaluate our solution in other medical imaging problems, such as detecting brain lesions in MR images.