key: cord-131094-1zz8rd3h
authors: Parisi, L.; Neagu, D.; Ma, R.; Campean, F.
title: QReLU and m-QReLU: Two novel quantum activation functions to aid medical diagnostics
date: 2020-10-15
journal: nan
DOI: nan
sha: 
doc_id: 131094
cord_uid: 1zz8rd3h

The ReLU activation function (AF) has been extensively applied in deep neural networks, in particular Convolutional Neural Networks (CNN), for image classification despite its unresolved 'dying ReLU' problem, which poses challenges to reliable applications. This issue has important implications for critical applications, such as those in healthcare. Recent approaches only propose variations of the activation function that remain subject to the same unresolved 'dying ReLU' challenge. This contribution reports a different research direction by investigating the development of an innovative quantum approach to the ReLU AF that avoids the 'dying ReLU' problem by disruptive design. The Leaky ReLU was leveraged as a baseline on which the two quantum principles of entanglement and superposition were applied to derive the proposed Quantum ReLU (QReLU) and the modified-QReLU (m-QReLU) activation functions. Both QReLU and m-QReLU are implemented and made freely available in TensorFlow and Keras. This original approach is validated extensively in case studies that facilitate the detection of COVID-19 and Parkinson's Disease (PD) from medical images. The two novel AFs were evaluated in a two-layered CNN against nine ReLU-based AFs on seven benchmark datasets, including images of spiral drawings taken via graphic tablets from patients with Parkinson's Disease and healthy subjects, and point-of-care ultrasound images of the lungs of patients with COVID-19, those with pneumonia and healthy controls. Despite a higher computational cost, results indicated an overall higher classification accuracy, precision, recall and F1-score brought about by either quantum AF on five of the seven benchmark datasets, thus demonstrating their potential to be the new benchmark or gold standard AF in CNNs and to aid image classification tasks involved in critical applications, such as medical diagnoses of COVID-19 and PD.

SARS-CoV-2, the 'severe acute respiratory syndrome coronavirus 2' (Cohen & Normile, 2020), is responsible for COVID-19 and the current global pandemic announced by the World Health Organization (WHO, Mar 2020). This virus leads to respiratory disease in humans (Cui et al., 2019), but it may take from 2 to 14 days for the initial symptoms, e.g., fever and cough, to become manifest after an infection (Centers for Disease Control and Prevention, 2020). More severe symptoms can progress to viral pneumonia and typically require mechanical ventilation to assist patients with breathing (Verity et al., 2020). In some more severe cases, COVID-19 can also lead to worsened symptoms and even death (Zhou, et al., 2020), and it may be an aetiology of PD itself (Beauchamp et al., 2020).

Thus, it is important to be able to detect neurodegenerative co-morbidities, such as PD, in vulnerable undiagnosed patients promptly and non-invasively, for example via CNNs that can recognise patterns from spiral drawings, and then to apply non-ionising medical imaging techniques (Bhaskar et al., 2020), which are more appropriate for such patients, to facilitate a prompt diagnosis of COVID-19 and improve clinical outcomes. Whilst tremors can be detected from patterns in spiral drawings as indicators of early PD, ground-glass opacities, lung consolidation, bilateral patchy shadowing and other relevant lesion-like patterns can be detected as biomarkers to distinguish COVID-19-related pneumonia from any other types, including both viral and bacterial pneumonia (Shi et al., 2020). Improvements in the AFs of CNNs can help to improve generalisation in both these image classification tasks.

Different layers of a deep neural network represent various degrees of abstraction, thus capturing a varying extent of patterns from input images (Zeiler & Fergus, 2014). AFs provide the CNN with the non-linearity required to learn from non-linearly distributed data, even in the presence of a reasonable amount of noise. An AF defines the gradient of a layer, which depends on its domain and range. AFs are differentiable and can be either saturated or unsaturated. Table 1 summarises the main activation functions commonly used in deep neural networks, including the convolutional neural network (CNN), with their equations and references; they are introduced below.

Saturated AFs are continuous, with their outputs thresholded into finite boundaries, and are typically represented as S-shaped curves, also named 'sigmoidal' or 'squashing' AFs, e.g., the logistic sigmoidal function with its output in the range between 0 and 1 (Liew et al., 2016). Saturated AFs are typically applied in shallow neural networks, e.g., in MLPs. However, saturated AFs lead to the vanishing gradient issue whilst training a network with back-propagation (Cui, 2018), i.e., they result in gradients that are less than 1, which become smaller as they are multiplied through successive layers and ultimately become 0 or 'vanish'. Thus, changes in the activated neurons do not lead to modifications of any weights during back-propagation. Moreover, the exploding gradient problem can occur, which has an opposite effect to vanishing gradients, wherein the error gradient in the weight is so high that it leads to instability whilst updating the weights during back-propagation. Hyperbolic tangent or 'tanh' (see Table 1) is a further saturated AF, but it attempts to mitigate this issue by extending the range of the logistic function to between -1 and 1, centred at 0. Nevertheless, tanh still does not solve the vanishing gradient problem.
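To make the vanishing gradient argument concrete, the following minimal sketch (plain Python, not taken from the authors' code) multiplies the derivative of the logistic sigmoid along one path through a stack of layers. The depth of 10 layers, the pre-activation value of 0 and the omission of the weights are illustrative assumptions; at 0 the sigmoid derivative takes its maximum value of 0.25, so this is the most favourable case for the sigmoid.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic sigmoid; it never exceeds 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def chained_gradient(pre_activations):
    """Product of sigmoid derivatives along one path through the layers.

    Weights are ignored (treated as 1) to isolate the effect of the AF.
    """
    product = 1.0
    for x in pre_activations:
        product *= sigmoid_grad(x)
    return product


# Illustrative assumption: 10 layers, each with pre-activation 0 (best case).
depth = 10
gradient = chained_gradient([0.0] * depth)
print(f"gradient after {depth} sigmoid layers: {gradient:.3e}")
```

Even in this best case the back-propagated factor is below one millionth after ten layers, which is the shrinkage described above.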

Unsaturated functions are not bounded in any output ranges and are centred at 0. The Rectified Linear Unit (ReLU) (Table 1) is the most widely applied unsaturated AF in deep neural networks, e.g., in CNNs, which provides faster convergence than logistic sigmoidal (LeCun et al., 1998) and tanh AFs, as well as improved generalisation (Litjens et al., 2017). In fact, ReLU generally leads to more efficient updates of weights during the back-propagation training process (Gao et al., 2020). The ReLU's gradient (or slope) is either one for positive inputs or zero for negative ones, thus solving the vanishing gradient issue. Nevertheless, despite appropriate initialisation of the weights to small random values via the He initialisation stage (Glorot et al., 2011), with large weight updates the summed input to the ReLU activation function can become always negative (the 'dying ReLU' problem). This negative value yields a zero value at the output, and the corresponding nodes do not have any influence on the neural network (Abdelhafiz et al., 2019), which can lead to misclassification and thus to a lack of ability to detect a pathology accurately and reliably in an image classification task, such as for COVID-19 or PD diagnostics.
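The 'dying ReLU' behaviour described above can be sketched for a single neuron. This is an illustrative example in plain Python rather than the authors' code: the weights, the strongly negative bias (standing in for the result of a large weight update) and the inputs are all assumed values. Because every summed input is negative, both the ReLU output and its gradient are zero, so back-propagation cannot move the weights again, whereas a Leaky ReLU with a slope of 0.01 retains a small gradient.

```python
def relu(z: float) -> float:
    """ReLU: identity for positive inputs, zero otherwise."""
    return z if z > 0 else 0.0


def relu_grad(z: float) -> float:
    """Slope of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def leaky_relu_grad(z: float, alpha: float = 0.01) -> float:
    """Slope of Leaky ReLU: 1 for positive inputs, alpha otherwise."""
    return 1.0 if z > 0 else alpha


# Hypothetical neuron state after a large weight update (assumed values).
weights = [0.2, -0.1]
bias = -5.0
inputs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]]  # features in [0, 1]

pre_activations = [
    sum(w * x for w, x in zip(weights, row)) + bias for row in inputs
]
outputs = [relu(z) for z in pre_activations]
relu_grads = [relu_grad(z) for z in pre_activations]
leaky_grads = [leaky_relu_grad(z) for z in pre_activations]

print("pre-activations:", pre_activations)
print("ReLU outputs:   ", outputs)
print("ReLU gradients: ", relu_grads)
print("LReLU gradients:", leaky_grads)
```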

In an attempt to mitigate the 'dying ReLU' issue in CNNs and deeper architectures (AlexNet, VGG 16, ResNet, etc.), multiple variations of the ReLU AF have been introduced, such as the Leaky ReLU (LReLU), the Parametric ReLU (PReLU), the Randomised ReLU (RReLU) and the Concatenated ReLU (CReLU), as summarised in Table 1. Maas et al. (2013) introduced Leaky ReLU (LReLU) to provide a small non-zero gradient for negative inputs to a ReLU function, instead of a gradient of 0. A constant variable α, with a default value of 0.01, was used to compute the output for negative inputs. Another variant of ReLU, named 'Exponential Linear Unit' (ELU), is aimed at improving convergence (Maas et al., 2013) (Table 1), but it still does not solve the 'dying ReLU' issue either. Klambauer et al. (2017) introduced a variant of ELU called 'Scaled Exponential Linear Unit' (SELU) (Table 1), which is a self-normalising function that provides an output as a normal distribution graph, making it suitable for deep neural networks with the output converging to zero mean when passed through multiple layers. Although SELU attempts to avoid both vanishing and exploding gradient problems, it does not mitigate the 'dying ReLU' issue. He et al. (2015) proposed the Parametric Rectified Linear Unit (PReLU) in an attempt to provide a better performance than ReLU in large-scale image classification tasks, although the only difference from LReLU is that α is not a constant and is learned during training via back-propagation. Nevertheless, due to this, the PReLU does not solve the 'dying ReLU' issue either, as it is intrinsically a slight variation of the LReLU AF. Similarly, the Randomised Leaky Rectified Linear Unit is a randomised version of LReLU (Pedamonti, 2018), whereby α is a random number sampled from a uniform distribution, thus still being susceptible to the 'dying ReLU' issue too. Shang et al. (2016) proposed a further slight improvement to the ReLU named 'Concatenated ReLU' (CReLU), allowing for both a positive and a negative input activation, by applying ReLU after copying the input activations and concatenating them. Thus, CReLU is computationally expensive and prone to the 'dying ReLU' problem, although it generally leads to competitive classification performance with respect to the gold standard ReLU and LReLU AFs (Shang et al., 2016).

Table 1. The main activation functions commonly used in deep neural networks, including the convolutional neural network (CNN), with their equations and references. The ReLU and Leaky ReLU are the most common and reliable ones in CNNs.

Logistic Sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$; Han & Moraga (1995)
tanh: $f(x) = \tanh(x)$; Gold & Rangarajan (1996)
ArcTan: $f(x) = \tan^{-1}(x)$; Campbell et al. (1999)
SoftPlus
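The ReLU variants reviewed above differ mainly in how they treat negative inputs, which the following plain-Python sketch illustrates using their commonly cited textbook forms. It is not the authors' implementation: the function names are hypothetical, the RReLU sampling bounds and the ELU scale of 1.0 are assumptions chosen for illustration, and PReLU is not shown separately because it has the same form as `leaky_relu` with `alpha` learned during training.

```python
import math
import random


def relu(x: float) -> float:
    """ReLU: identity for positive inputs, zero otherwise."""
    return x if x > 0 else 0.0


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """LReLU: small fixed slope alpha for negative inputs."""
    return x if x > 0 else alpha * x


def randomised_relu(x: float, rng: random.Random,
                    low: float = 0.125, high: float = 0.333) -> float:
    """RReLU: like LReLU, but the negative slope is drawn from U(low, high).

    The bounds are illustrative assumptions, not values from the paper.
    """
    return x if x > 0 else rng.uniform(low, high) * x


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: exponential saturation towards -alpha for negative inputs."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def crelu(xs):
    """CReLU: concatenate ReLU(x) and ReLU(-x), doubling the output size."""
    return [relu(x) for x in xs] + [relu(-x) for x in xs]


rng = random.Random(0)
sample = [-2.0, -0.5, 0.0, 1.5]
print("LReLU:", [leaky_relu(x) for x in sample])
print("RReLU:", [round(randomised_relu(x, rng), 4) for x in sample])
print("ELU:  ", [round(elu(x), 4) for x in sample])
print("CReLU:", crelu(sample))
```

The doubled output length of `crelu` is one way to see why CReLU carries a higher computational cost than the other variants.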

Despite the wide application of DL-based algorithms for image classification in healthcare, such as the CNN (LeCun et al., 2015) described in 1.2, its classical AF, the ReLU, although it mitigates the vanishing gradient issue typical of sigmoid AFs, can still experience the 'dying ReLU' problem. As discussed in 1.2, none of the recently proposed AFs, such as the LReLU, the PReLU, ELU and SELU, has solved this issue yet, as they are still algorithmically similar in their ReLU-like implementations.

This issue can lead to lack of generalisation for CNNs, thus hindering their application in a clinical setting. It is worth noting that, as an example, in the last fully connected layer of the CNN in Kollias et al. (2018), which had 1,500 neurons, only 30 neurons yielded non-zero values due to the 'dying ReLU' problem. Even when a recurrent neural network (RNN) was coupled with their CNN, thus forming a CNN-RNN (Kollias et al., 2018) whose last layer was designed with 128 neurons, only about 20 of them led to non-zero values, whilst the remaining ones experienced the 'dying ReLU' issue, yielding negligible values. These two examples confirm that classical approaches to ReLU failed to solve its associated 'dying ReLU' problem, thus warranting a different approach, which the authors suggest should be of a quantum nature, as illustrated in 1.4 and motivated in 1.5.

Quantum ML is a relatively new field that blends the computational advantages brought by quantum computing and advances in ML beyond classical computation (Ciliberto, et al., 2018). Quantum ML has not only led to more effective algorithmic performance, but it has also made it possible to find the global minimum in the solutions sought after in ML with a higher probability (Ciliberto, et al., 2018). The main principles of quantum computing are those inherited from quantum physics, such as superposition, entanglement, and interference (Barabasi et al., 2019). According to the quantum principle of superposition, the fundamental quantum bit or qubit can have multiple states at any point in time, i.e., a qubit can have a value of either 0 or 1, like classical bits, but, differently from and beyond classical bits, a qubit can also have both values 0 and 1 concurrently (Barabasi et al., 2019). A quantum gate is the unification of two quantum states for them to stay 'entangled' into an individual quantum state, wherein a change in one state would affect the other one and vice versa (Jozsa & Linden, 2003). Thus, a system of qubits, each of which holds multiple bits of information concurrently, behaves as one via the quantum property named 'entanglement', hence enabling massive parallelism too (Cleve et al., 1998; Solenov et al., 2018).
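The two principles named above can be illustrated with state vectors in plain Python. This sketch shows only the textbook definitions of superposition and entanglement, not the authors' derivation of the proposed AFs; the helper names are hypothetical. A single qubit in equal superposition yields 0 or 1 with probability 0.5 each, the tensor (Kronecker) product combines two independent qubits into one two-qubit state, and a Bell state is an example of an entangled state that cannot be written as such a product.

```python
import math


def kron(u, v):
    """Tensor (Kronecker) product of two state vectors."""
    return [a * b for a in u for b in v]


def probabilities(state):
    """Measurement probabilities: squared magnitudes of the amplitudes."""
    return [abs(a) ** 2 for a in state]


def is_product_state(state, tol=1e-12):
    """A two-qubit state [a, b, c, d] factorises iff a*d - b*c == 0."""
    a, b, c, d = state
    return abs(a * d - b * c) < tol


ket0 = [1.0, 0.0]
ket1 = [0.0, 1.0]
h = 1.0 / math.sqrt(2.0)

plus = [h, h]               # equal superposition of |0> and |1>
product = kron(plus, ket0)  # separable two-qubit state
bell = [h, 0.0, 0.0, h]     # (|00> + |11>) / sqrt(2), entangled

print("P(0), P(1) for |+>:", probabilities(plus))
print("product state separable:", is_product_state(product))
print("Bell state separable:   ", is_product_state(bell))
```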

However, existing quantum approaches to implement AFs in deep neural networks have only adopted the repeat-until-success (RUS) technique to achieve pseudo non-linearity due to restrictions to linear and unitary operations in quantum mechanics (Nielsen & Chuang, 2002; Cao et al., 2017). This RUS approach to AFs involves an individual state preparation routine and the generation of various superimposed and entangled linear combinations to propagate the routine of an AF to all states in unison. Thus, a deep neural network leveraging this quantum RUS technique could theoretically approximate most non-linear AFs (Macaluso et al., 2020). Nevertheless, the practical applications of this approach are very limited due to the input range of the neurons in such architectures being bounded between 0 and π/2 as a trade-off of their theoretically generic AF formulation. Hu (2018) led a similar theoretical research effort in proposing a sigmoid-based non-linear AF, which is not periodic, to enable a more efficient gradient descent whilst leveraging the principle of superposition in training neurons with multiple states concurrently. However, the classical form of the approach of Hu (2018) is the traditional ReLU, thus still not solving the 'dying ReLU' problem either. Konarac et al. (2020) leveraged a similar quantum-based sigmoid AF in their Quantum-Inspired Self-Supervised Network (QIS-Net) to provide high accuracy (99%) and sensitivity (96.1%) in magnetic resonance image segmentation, improving performance by about 1% with respect to classical approaches.

Differently from the related studies mentioned above, the two properties of entanglement and superposition could be pivotal in devising a quantum-based approach to ReLU that has both a positive and a negative solution simultaneously and avoids the negative solution by preferring the positive one, whereas the traditional classical ReLU at times fails by leading to negative solutions only, i.e., the 'dying ReLU' problem. Moreover, this principle enables quantum systems to reduce computational cost with respect to classical approaches, since several optimisations in multiple states can be performed concurrently (Schuld et al., 2014).

As described in sections 1.1 and 1.2, DL is highly suitable for classifying medical images due to its intrinsic feature extraction mechanisms. As illustrated in 1.2 and 1.4, the importance of the AF is evident in both classical and quantum DL, respectively. Although numerous variants of ReLU functions have been proposed in classical DL models (as reviewed in Section 1.2), they have not been as widely adopted as ReLU and LReLU. These two AFs typically ensure accurate and reliable classification and are readily available in Python open source libraries, such as TensorFlow and Keras. Nevertheless, both these AFs and any recent AFs (see Section 1.2) have not solved the 'dying ReLU' problem yet. Moreover, vanishing and exploding gradient issues have not been fully resolved either. ELU and SELU may at times provide faster convergence than ReLU and LReLU, but they are not as reliable as those and are computationally more expensive (Pedamonti, 2018). Such unresolved issues lead to lack of generalisation that may hinder the diagnostic accuracy and reliability of an application leveraging DL techniques for the detection of COVID-19 or PD, thus resulting in a potentially high number of false negatives when the model's performance is evaluated on unseen patient data. The authors have hypothesised that this impaired generalisation is due to the classical approach underpinning such ReLU-based AFs, which has so far only been leveraged and moulded in various ways, without breaking its inherent functional limitations.

The present contribution proposes, for the first time, that a quantum-based methodology to ReLU would improve the learning and generalisation in CNNs with relevant impact for critical applications, such as the above-mentioned diagnostic tasks. In particular, by blending the two key quantum principles of entanglement of qubits and the effects of superposition to help reach the global minimum in the solution, thus avoiding negative solutions differently from classical approaches as in 1.3, this study investigates the development of a novel AF, the 'Quantum ReLU', to avoid the problem of the 'dying ReLU' in a quantistic manner. This builds on recent research efforts by Cong et al. (2019) to develop a Quantum CNN that, although demonstrating how quantum states can be recognised, has not yet addressed the 'dying ReLU' problem, as it simply leveraged the traditional ReLUs instead.

Patterns from lung ultrasound images and spiral drawings are known diagnostic biomarkers for COVID-19 and PD respectively, PD being at times a delicate co-morbidity of COVID-19 patients, and improvements in generalisation are key to an accurate and reliable early diagnosis that can improve outcomes, especially in the event of co-morbidities. Thus, the novel Quantum ReLU will be leveraged in a CNN to improve classification performance in such pattern recognition tasks, as quantified via clinically relevant and interpretable metrics, and compared against the same CNN with current gold-standard AFs, including ReLU and LReLU. The proposed added capability of a Quantum ReLU in a CNN is hypothesised to improve its generalisation for pattern recognition in image classification, such as detecting COVID-19 and PD from ultrasound scans and spiral drawings, respectively.

The remaining sections of the paper are structured as follows. Section 2 deals with the methods, including sub-section 2.1 illustrating the two novel quantum AFs, along with their mathematical formulation and respective implementations in Python code (in both TensorFlow and Keras libraries). Sub-section 2.2 provides a description of the benchmark datasets selected, along with a standardised data pre-processing strategy, whilst section 3 summarises the results obtained comparing the accuracy, reliability and computational time of a CNN with the proposed quantum AFs against salient gold standard AFs outlined in Table 1. Finally, section 4 provides a thorough discussion of the results and section 5 summarises the current work and outlines its access, impact, and future applications.

Despite appropriate initialisation of the weights to small random values via the He initialisation, with large weight updates the summed input to the traditional ReLU activation function can become always negative, regardless of the input values fed to the CNN. Current improvements to the ReLU, such as the Leaky ReLU, allow for a more non-linear output to either account for small negative values or facilitate the transition from positive to small negative values, without eliminating the problem though.

Consequently, this study investigates the development of a novel activation function to obviate the problem of the 'dying ReLU' in a quantistic manner, i.e., by achieving a positive solution where previously the solution was negative. Such an added novel capability in a CNN was hypothesised to improve its generalisation for pattern recognition in image classification, particularly important in critical applications, such as medical diagnoses of COVID-19 and PD.

Thus, using the same standard two-layered CNN in TensorFlow for MNIST data classification, after identifying the main reproducible (with associated codes available in TensorFlow and Keras) AFs following a critical review of the literature (section 1), the following nine classical activation functions were considered: ReLU, Leaky ReLU, CReLU, sigmoid, tanh, softmax, VLReLU, ELU and SELU.

A two-step quantum approach was applied to ReLU first, by selecting its solution for positive values ($R(z) = z, \forall z > 0$), and the Leaky ReLU's solution for negative values ($R(z) = \alpha \times z, \forall z \le 0$, where $\alpha = 0.01$) as a starting point to improve quantistically.

By applying the quantum principle of entanglement, the tensor product of the two candidate state spaces from ReLU and Leaky ReLU was performed and the following quantum-based combination of solutions was obtained:

Thus, keeping R(z) = z for positive values (z > 0) as in the ReLU, but with the added novelty of the entangled solution for negative values (1), the Quantum ReLU (QReLU) was attained (Fig. 1). The algorithms to describe the methodology and AF were implemented in TensorFlow and Keras, and presented in Listings 1 and 2 respectively, thus avoiding the 'dying ReLU' by maintaining the positivity of the solution mathematically via this new quantum state.

By leveraging the quantum principle of superposition on the QReLU's solution for positive and negative values, the following modified QReLU (m-QReLU) was obtained (Fig. 2). The algorithms to describe the methodology and AF were implemented in TensorFlow and Keras, and presented in Listings 3 and 4 respectively, still avoiding the 'dying ReLU' issue:

Listing 3 provides the snippet of code in Python to leverage the m-QReLU in TensorFlow, using 'py_func' per Listing 1.

Its usage in TensorFlow is the same as the 'QReLU' in Listing 1 but using 'tf_m_q_relu' as an activation function of the second convolutional layer ('conv2_act'). In Keras, the m-QReLU layer is added in place of the QReLU layer:

    model.add(layers.Conv2D(32, (3, 3), input_shape=(32, 32, 3)))
    # model.add(QReLU())
    model.add(m_QReLU())
    model.add(layers.MaxPooling2D((2, 2)))

The m-QReLU also satisfies the entanglement principle being derived via the tensor outer product of the solutions from the QReLU.

Thus, a quantum-based blend of both superposition and entanglement principles mathematically leads the QReLU and the m-QReLU to obviate the 'dying ReLU' problem intrinsically. As shown in (1) and (2), although the two proposed AFs are quantistic in nature, both QReLU and m-QReLU can be run on classical hardware, such as central processing unit (CPU), graphics processing unit (GPU) and tensor processing unit (TPU), the latter being the type of runtime used in this study via Google Colab (http://colab.research.google.com/) to perform the required evaluation on the datasets described in 2.1. The novel QReLU and m-QReLU were developed and tested using Python 3.6 and written to be compatible with both TensorFlow (1.12 and 1.15 tested, 1.15 supports TensorFlow serving to deploy the novel AFs on the cloud) and the Keras Sequential API. Thus, both AFs were programmed as new Keras layers for ease of use.

By selecting the positive quantum state of the summed input of the QReLU and m-QReLU, an optimal early diagnosis could be achieved for patients with COVID-19 and PD. Thus, this study demonstrates the QReLU and m-QReLU as a potential new benchmark activation function to use in CNNs for critical image classification tasks, particularly useful in medical diagnoses, wherein generalisation is key to improving patient outcomes.

To assess which AF was suitable for each of the pattern recognition tasks involved in classifying the seven benchmark datasets as per 2.1, the performance of the baseline CNN was assessed via the test or out-of-sample classification accuracy, precision, sensitivity/recall and F1-score. Precision, recall, and F1-score are important metrics to measure the reliability of the classification outcomes. 95% confidence intervals (CIs) were also reported.
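For reference, the four metrics named above can be computed from confusion-matrix counts as in the following plain-Python sketch. It is not the authors' evaluation code: it covers the binary case only (the multi-class datasets would additionally need an averaging scheme), the confusion counts are hypothetical, and the normal-approximation interval is an assumed method, since the text does not state how the 95% CIs were obtained.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1-score for a binary classifier."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def normal_ci(p: float, n: int, z: float = 1.96):
    """95% CI for a proportion via the normal approximation (assumed method)."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical confusion counts for 100 test images (not from the paper).
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
low, high = normal_ci(metrics["accuracy"], n=100)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"95% CI for accuracy: ({low:.3f}, {high:.3f})")
```

Recall is the metric most directly tied to false negatives, which is why it is reported alongside accuracy for the diagnostic tasks considered here.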

To enable reproducibility and replicability of the results obtained, publicly available benchmark datasets were gathered and used in this study, as mentioned below. Moreover, to this purpose, the full Python code (.py and .ipynb formats) in both TensorFlow (https://www.tensorflow.org/) and Keras (https://keras.io/) used for training the model, as well as for evaluating its performance, is also provided.

As a general benchmark dataset for any image classifiers, especially CNNs, the MNIST data (LeCun et al., 1998) , including 60,000 images of handwritten digits (50,000 images for training, 10,000 images for testing), was used for the initial model and AF validation. This dataset is in tensor format available in TensorFlow (https://www.tensorflow.org/datasets/catalog/mnist).

To address the specific needs to improve diagnosis of Parkinson's disease (PD) and that of COVID-19 dealt with in this study, further benchmark datasets were used. Four benchmark datasets were leveraged to identify PD based on patterns on spiral drawings (1290 subjects in total), as follows:

As in the MNIST dataset, images in all benchmark datasets were converted to grayscale and resized to be 28*28.
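A minimal sketch of this pre-processing step, written in plain Python so that it is self-contained, is given below. It is not the authors' pipeline: the luminance weights (ITU-R BT.601) and the nearest-neighbour resizing are assumptions, as the text does not specify the grayscale conversion or the interpolation method, and the synthetic input image is hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to luminance using BT.601 weights."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def resize_nearest(image, out_h: int = 28, out_w: int = 28):
    """Nearest-neighbour resize of a 2-D list to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


# Hypothetical 56 x 84 RGB image containing a horizontal intensity gradient.
height, width = 56, 84
rgb = [
    [(x * 255 // (width - 1),) * 3 for x in range(width)]
    for _ in range(height)
]

gray = to_grayscale(rgb)
small = resize_nearest(gray)
print(len(small), "x", len(small[0]))
```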

The two-layered CNN, designed as an MNIST classifier, was initially validated on the MNIST benchmark dataset itself, used for recognising handwritten digits. The QReLU and the m-QReLU were the best and second-best performing activation functions respectively, leading to an ACC and an F1-score of 0.99 (99%) and of 0.98 (98%) respectively (Table 2 ). The ReLU, the Leaky ReLU and the VLReLU also led to the best classification performance on the MNIST data (ACC = 0.99/99%, F1-score = 0.99/99%) (Table 2) . Thus, the proposed QReLU achieved gold standard classification performance on this benchmark dataset.

Notably, the QReLU and the m-QReLU led the same two-layered CNN architecture to achieve the best (ACC = 0.92/92%, F1-score = 0.93/93%) and third (ACC = 0.88/88%, F1-score = 0.90/90%) classification performance (Table 3) on the benchmark dataset named 'Spiral HandPD', which contains images of spiral drawings taken via graphic tablets from patients with PD and healthy subjects.

As illustrated in Table 4, competitive results were achieved by the QReLU and the m-QReLU versions on a further benchmark dataset on spiral drawings, the 'NewHandPD dataset', leading to the sixth and eighth classification performance respectively (ACC = 0.83/83%, F1-score = 0.83/83%; ACC = 0.79/79%, F1-score = 0.79/79%). Very competitive outcomes were obtained by the two proposed quantum AFs on the Kaggle Spiral Drawings dataset, with m-QReLU (ACC = 0.73/73%, F1-score = 0.70/70%) and QReLU (ACC = 0.67/67%, F1-score = 0.67/67%) leading to the second and fourth classification performance respectively (Table 5), as well as when evaluated against the UCI Spiral Drawings dataset (QReLU ranked fifth with ACC = 0.82/82% and F1-score = 0.74/74%; m-QReLU ranked sixth with ACC = 0.78/78% and F1-score = 0.68/68%) (Table 6).

The overall increased generalisation brought about by the two novel quantum AFs is evident in the outstanding and mutually consistent classification outcomes achieved on both benchmark lung US datasets to distinguish COVID-19 from both pneumonia and healthy subjects, with the best (Table 7: QReLU and m-QReLU with ACC = 0.73/73% and F1-score = 0.73/73%) and the second (Table 8: QReLU and m-QReLU with ACC = 0.60/60% and F1-score = 0.63/63%) classification performance respectively for both QReLU and m-QReLU.

Despite a higher computational cost (four-fold with respect to the other AFs, except for the CReLU, whose increase is almost three-fold), the results achieved by either or both the proposed QReLU and m-QReLU AFs, assessed on classification accuracy, precision, recall and F1-score, indicate an overall higher generalisation achieved on five of the seven benchmark datasets (Table 2 on the MNIST data, Tables 3 and 5 on PD-related spiral drawings, Tables 7 and 8 on COVID-19 lung US images). Consequently, the two quantum ReLU methods are the overall best performing AFs that can be applied for aiding diagnosis of both COVID-19 from lung US data and PD from spiral drawings.

Specifically, when using the novel quantum AFs (QReLU and m-QReLU) as compared to the traditional ReLU and Leaky ReLU AFs, the gold standard AFs in DNNs, the following percentage increases in ACC, precision, recall/sensitivity and F1-score were noted:

• An increase of 55.32% in ACC and sensitivity/recall via m-QReLU as compared to ReLU and by 37.74% with respect to Leaky ReLU, thus avoiding the 'dying ReLU' problem when the CNN was evaluated on the Kaggle Spiral Drawings benchmark dataset (Table 5).
• An increase by 65.91% in F1-score via both QReLU and m-QReLU as opposed to Leaky ReLU, hence obviating the 'dying ReLU' problem again but when tested on the COVID-19 Ultrasound benchmark dataset (Table 7).
• An increase of 50% in ACC and sensitivity/recall via both QReLU and m-QReLU with regards to both ReLU and Leaky ReLU, hence solving the 'dying ReLU' problem when evaluated on the POCUS 19 Ultrasound benchmark dataset (Table 8).
• An increase by 82,000% in ACC and sensitivity/recall via QReLU (82%) when compared to tanh (0% ACC and sensitivity/recall), thus avoiding the vanishing gradient problem too, as assessed on the UCI Spiral Drawings benchmark dataset (Table 6).
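The percentage increases listed above are relative increases, which can be checked with a short calculation. The following sketch is illustrative and not part of the authors' code; the function names are hypothetical, and the baseline values it prints are back-calculated from the reported percentages and the ACC values quoted in the text, rather than taken from Tables 5 and 8.

```python
def relative_increase(new: float, baseline: float) -> float:
    """Relative increase of `new` over `baseline`, in percent."""
    if baseline == 0:
        # A zero baseline has no finite relative increase.
        raise ZeroDivisionError("relative increase is undefined for a zero baseline")
    return (new - baseline) / baseline * 100.0


def implied_baseline(new: float, increase_percent: float) -> float:
    """Baseline implied by a reported relative increase (back-calculation)."""
    return new / (1.0 + increase_percent / 100.0)


# ACC values reported in the text for the quantum AFs.
kaggle_m_qrelu_acc = 0.73  # Kaggle Spiral Drawings, m-QReLU
pocus_quantum_acc = 0.60   # POCUS 19 Ultrasound, QReLU and m-QReLU

# Baselines implied by the reported increases (assumed, not quoted from tables).
relu_kaggle = implied_baseline(kaggle_m_qrelu_acc, 55.32)
lrelu_kaggle = implied_baseline(kaggle_m_qrelu_acc, 37.74)
relu_pocus = implied_baseline(pocus_quantum_acc, 50.0)

print(f"implied ReLU ACC (Kaggle):  {relu_kaggle:.2f}")
print(f"implied LReLU ACC (Kaggle): {lrelu_kaggle:.2f}")
print(f"implied ReLU ACC (POCUS):   {relu_pocus:.2f}")

round_trip = relative_increase(kaggle_m_qrelu_acc, round(relu_kaggle, 2))
print(f"round trip for Kaggle ReLU: {round_trip:.2f}%")
```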

Furthermore, it is worth noting the proposed quantum AFs led to improved classification outcomes as compared to recent advances in ReLU AFs, such as CReLU and VLReLU:

• QReLU led to ACC, precision, sensitivity/recall, and F1-score all higher by 1% than those obtained via CReLU when evaluating the CNN's classification performance on the MNIST data (Table 2).
• m-QReLU resulted in an ACC and a sensitivity/recall higher by 3% than CReLU, and an F1-score greater by 2% on the Spiral HandPD dataset (Table 3).
• m-QReLU led to an ACC and a sensitivity/recall greater by 11% than VLReLU, and an F1-score also higher by 11% on the Spiral HandPD dataset (Table 3).
• m-QReLU resulted in an ACC and a sensitivity/recall higher by 6% than VLReLU, and an F1-score greater by 3% on the Kaggle Spiral Drawings dataset (Table 5).
• QReLU and m-QReLU led to an ACC and a sensitivity/recall greater by 9% and 18% than CReLU and VLReLU respectively, and an F1-score higher by 5% and 14% on the COVID-19 Ultrasound dataset (Table 7).
• QReLU and m-QReLU resulted in an ACC and a sensitivity/recall higher by 20% than VLReLU, and an F1-score greater by 10% on the POCUS 19 Ultrasound dataset (Table 8).

The results obtained via the QReLU and m-QReLU in a two-layered CNN on the MNIST dataset (Table 2) The two-layered CNN's classification performance via the proposed m-QReLU (ACC = 92%, F1-score = 93%, Table 3 ) was also higher by over 2% than the best performing five-layered CNNs, whose hyperparameters were also optimised respectively via both the Bat Algorithm and Particle Swarm Optimisation (PSO) (Pereira et al., 2016c) , to aid diagnosis of PD from spiral drawings, such as using the 'Spiral HandPD' benchmark dataset.

A comparable precision was achieved by the two-layered CNN model (Table 7) when the QReLU and m-QReLU were used as AFs with respect to the best classifier so far on the COVID-19 Ultrasound dataset, i.e., the sixteen-layered POCOVID-Net model, which builds on the VGG 16 model (Born et al., 2020).

Table 5. Results on performance evaluation of the first Convolutional Neural Network having two convolutional layers, built in TensorFlow, and tested on the Kaggle Spiral Drawings benchmark dataset. The size of the images was set to 28*28, as per the MNIST benchmark dataset.

Table 6. Results on performance evaluation of the first Convolutional Neural Network having two convolutional layers, built in TensorFlow, and tested on the University of California Irvine (UCI) Spiral Drawings benchmark dataset. The Kaggle Spiral Drawings benchmark dataset, which includes drawings from both healthy subjects and patients with Parkinson's Disease, was used for training and the UCI Spiral Drawings benchmark dataset, which only has spiral drawings acquired during both static and dynamic tests from patients with PD, was deployed for testing. The size of the images was set to 28*28, as per the MNIST benchmark dataset.

Table 7. Results on performance evaluation of the first Convolutional Neural Network having two convolutional layers, built in TensorFlow, and tested on the COVID-19 Ultrasound benchmark dataset. The size of the images was set to 28*28, as per the MNIST benchmark dataset.

Further to the extensive review of existing ReLU AFs provided in Section 1.2, also considering that classical approaches have been unable to solve the 'dying ReLU' problem as reviewed in Section 1.3, and taking into account the advantages of quantum states in AFs (listed in Section 1.4), two novel quantum-based AFs were mathematically formulated in Section 2.2 and developed in both TensorFlow (Listings 1 and 3, https://www.tensorflow.org/) and Keras (Listings 2 and 4, https://keras.io/) to enable reproducibility and replicability. Thus, the two-layered CNN-based MNIST classifier in TensorFlow was selected as the baseline model to assess the impact of using either quantum AF (QReLU or m-QReLU) on the classification performance on the seven benchmark datasets described in Section 2.1, evaluated based on test ACC, precision, recall/sensitivity and F1-score, as mentioned in Section 2.2.
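The four evaluation metrics named above can be sketched as follows. This is a minimal illustration, assuming a single positive class and hypothetical labels. The function name `binary_metrics` and the example data are not taken from the study, and the averaging scheme used for the multi-class datasets is not stated in this section.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels (1 = patient, 0 = healthy control); not data from the study.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Because recall counts missed positive cases, a classifier that never predicts the positive class scores a recall and F1-score of zero regardless of its accuracy on the negative class.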

The finding that the proposed QReLU leads to the best classification performance on the MNIST benchmark dataset for handwritten digit recognition (ACC = 99%, F1-score = 99%, Table 2) serves as a regression test to validate the hypothesis whereby, using the baseline CNN-based MNIST classifier, the highest classification performance is achieved with the presumed best AF. This hypothesis has been further confirmed by the m-QReLU achieving the second-best classification performance (ACC = 99%, F1-score = 99%, Table 2) across all eleven AFs evaluated as in Section 2.2. Achieving the same classification performance as the gold standard reproducible and replicable AFs in CNNs (ReLU, the Leaky ReLU and the VLReLU), which are readily available in both TensorFlow and Keras, the QReLU can be granted the designation of benchmark AF for the task of handwritten digit recognition performed on the MNIST benchmark dataset.

The benefits of avoiding the 'dying ReLU' problem become evident when assessing the same two-layered CNN architecture, particularly with the QReLU (ACC = 0.92/92%, F1-score = 0.93/93%, Table 3), which achieved the best classification performance on critical image classification tasks, such as recognising PD-related patterns from spiral drawings in the 'Spiral HandPD' benchmark dataset. The higher generalisability achieved via the two proposed quantum AFs, in further support of the advantage of obviating the 'dying ReLU' issue, is evident from the best classification performance in differentiating COVID-19 from both bacterial pneumonia and healthy controls on the lung US data (Table 7, QReLU and m-QReLU with ACC = 0.73/73% and F1-score = 0.73/73%). Such an overall higher diagnostic performance is corroborated by the second-best classification outcomes attained on the second benchmark lung US dataset (Table 8).
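To make the 'dying ReLU' problem concrete, the sketch below compares the gradients of the traditional ReLU and of the Leaky ReLU (the baseline from which the proposed AFs were derived) for a unit whose pre-activations are all negative. It is illustrative only: the slope `alpha = 0.01` and the input values are assumptions, and the QReLU and m-QReLU themselves are defined in Section 2.2 and Listings 1-4 rather than reproduced here.

```python
def relu(x):
    """Traditional ReLU: passes positive inputs, zeroes out the rest."""
    return x if x > 0 else 0.0


def relu_grad(x):
    """Derivative of ReLU: exactly zero for every non-positive input."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: keeps a small slope (alpha, assumed here) for x <= 0."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.01):
    """Derivative of Leaky ReLU: alpha instead of zero for x <= 0."""
    return 1.0 if x > 0 else alpha


# Hypothetical pre-activations of one unit that only receives negative inputs.
pre_activations = [-3.0, -1.5, -0.2]

relu_grads = [relu_grad(z) for z in pre_activations]
leaky_grads = [leaky_relu_grad(z) for z in pre_activations]

# With ReLU the unit receives no gradient signal, so its weights stop updating.
relu_unit_is_dead = all(g == 0.0 for g in relu_grads)
```

With the ReLU, every gradient in this example is zero, so the unit cannot recover during training, whereas the Leaky ReLU retains a small non-zero gradient for the same inputs.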

Whilst traditional ReLU approaches show highly variable classification outcomes, especially when they experience the 'dying ReLU' problem (Tables 5, 7 and 8), both the QReLU and the m-QReLU were able to ensure a consistently higher classification performance and generalisation across the entire variety of image classification tasks involved, from the benchmark handwritten digit recognition task (MNIST), to recognising PD-related patterns from spiral drawings taken from graphic tablets, to aiding the differentiation of COVID-19 from bacterial pneumonia and healthy lungs based on US scans. The advantage of using the proposed AFs for COVID-19 detection lies in the potential for their translational applications in a clinical setting, i.e., in leveraging CNNs with the QReLU or m-QReLU to detect COVID-19 in patients with neurodegenerative co-morbidities, such as PD, via non-ionising medical imaging (e.g., US). This added capability will come in handy in future, as portable MRI and ML-enhanced MRI technologies become more affordable and widespread and, thus, improvable with deep learning models (e.g., the two-layered CNN with QReLU or m-QReLU AFs in this study). Solutions either on edge devices or on the cloud for tele-diagnosis and tele-monitoring, as required in pandemics similar to the current one (COVID-19), could soon be suitable for in-home diagnostic and prognostic assessments too, which should improve personalised care for shielded or vulnerable individuals.

Moreover, competitive outcomes were obtained via the QReLU and the m-QReLU on three further benchmark datasets, i.e., the 'NewHandPD' dataset, the Kaggle Spiral Drawings dataset and the UCI Spiral Drawings dataset, with ACC and F1-score mostly above 75% (Tables 4-6) using the relatively simple deep neural network leveraged in this study (the two-layered MNIST CNN classifier). Such results also demonstrate the added capability of the proposed QReLU and m-QReLU to avoid the vanishing gradient problem that occurred when using tanh (0% ACC and sensitivity/recall), as evaluated on the UCI Spiral Drawings benchmark dataset (Table 6).
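The vanishing gradient behaviour of tanh mentioned above follows from its derivative, 1 - tanh(x)^2, which approaches zero as |x| grows. The sketch below illustrates this with hypothetical input values. The helper `chained_gradient` is an assumption introduced purely for illustration: it idealises backpropagation through several saturated layers as a product of identical local derivatives.

```python
import math


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which shrinks as |x| grows."""
    return 1.0 - math.tanh(x) ** 2


# Hypothetical pre-activations, from the linear region to deep saturation.
inputs = [0.0, 2.0, 5.0, 10.0]
grads = [tanh_grad(x) for x in inputs]


def chained_gradient(x, depth):
    """Idealised product of `depth` identical local tanh derivatives.

    A simplification for illustration only: real networks also multiply
    by weights, and pre-activations differ from layer to layer.
    """
    return tanh_grad(x) ** depth


# Gradient magnitude after five saturated layers with pre-activation 2.0.
deep_grad = chained_gradient(2.0, 5)
```

The local derivative equals 1 at the origin and decays rapidly in the saturated regime, so chaining several saturated layers drives the backpropagated signal towards zero.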

Despite the overall increase in generalisability brought about by the QReLU and the m-QReLU, the computational cost of the CNN increased fourfold as compared to the other nine AFs evaluated, except for the CReLU, against which a threefold increase was reported (Tables 2-8). Nevertheless, considering that higher classification performance matters more than lower computational cost for diagnostic applications in a clinical setting, especially for the critical image classification tasks involved in this study, such as the detection of PD (Tables 3-6) and COVID-19 (Tables 7 and 8), this increase in computational cost is not expected to impair the wide application of the two novel quantum AFs to aid such diagnostic tasks and any other medical applications involving image classification.

In fact, the QReLU and m-QReLU have been demonstrated as considerably better than the current (undisputedly assumed) gold standard AFs in CNNs, i.e., the traditional ReLU and the Leaky ReLU. In particular, an increase by 50-66% in both accuracy and reliability (especially, sensitivity/recall and F1-score) metrics was reported across both pattern recognition tasks, i.e., detection of PD-related patterns from spiral drawings (Tables 5 and 6) and aiding diagnosis of COVID-19 from US scans (Table 7). The two proposed quantum AFs also outperformed more cutting-edge ReLU AFs, such as the CReLU and the VLReLU, by 5-20% across all classification tasks considered, i.e., MNIST data classification (Table 2), spiral drawings PD-related pattern recognition (in particular, Tables 3 and 5), and COVID-19 detection from US scans (Tables 7 and 8).

Moreover, the QReLU and the m-QReLU led the baseline two-layered CNN MNIST classifier to achieve a classification performance on the MNIST dataset comparable to that of deeper CNNs, ranging from three to four layers (LeCun et al., 1998; Siddique et al., 2019; Ahlawat et al., 2020), including deeper architectures, e.g., ResNet and DenseNet (Chen et al., 2018). It is worth noting that, when leveraging the QReLU and the m-QReLU, the two-layered CNN with hyperparameters based on the MNIST data outperformed (ACC = 92%, F1-score = 93%, Table 3) deeper, BA- and PSO-optimised CNNs from published studies by over 2% (Pereira et al., 2016c) in aiding the diagnosis of PD from patterns in spiral drawings (e.g., using the 'Spiral HandPD' benchmark data). The two-layered CNN model with either QReLU or m-QReLU as AFs achieved a comparable precision (Table 7) to the best-performing classifier on the COVID-19 Ultrasound dataset, i.e., the sixteen-layered POCOVID-Net model, which is an extension of the VGG 16 benchmark model (Born et al., 2020).

These outcomes show two main practical advantages of avoiding the 'dying ReLU' problem via the QReLU and the m-QReLU, which outweigh the overall higher computational cost that these two quantum AFs incur alongside the increased generalisation:

1. Using QReLU or m-QReLU can obviate the need for several convolutional layers in CNNs and any CNN-derived models, such as AlexNet, ResNet, DenseNet, CondenseNet, cCondensenet and VGG 16, as demonstrated above and in Section 3 (Results).
2. Leveraging QReLU or m-QReLU as AFs in CNNs can minimise the need for optimisation of the CNN's hyperparameters.

The implications of the two above-mentioned practical benefits are multiple. First, the two proposed AFs may improve not only generalisation but also computational cost when considering image classification tasks that involve deeper architectures than the two-layered CNN used in this study. Thus, the proposed AFs may be viable alternatives to the ReLU AF, which is the current gold standard AF in CNNs. Second, by improving both generalisation and computational cost when deeper architectures may be required, the QReLU and m-QReLU may be suitable for tasks that require scalability of deep neural networks. Third, the proposed quantum AFs may enable more effective transfer learning, such as for COVID-19 detection in multiple geographical areas, as well as extending trained deep nets to further diagnostic tasks, including prognostic applications, and aiding self-driving vehicles in image classification tasks essential to ensure passenger safety.

Overall, the avoidance of the 'dying ReLU' problem achieved via QReLU and m-QReLU is expected to radically shift the paradigm away from blindly relying on the traditional ReLU AF in CNNs and any CNN-derived models, towards embracing innovative approaches, including quantum-based ones, such as the two novel AFs designed, developed and validated in this study.

Further to a thorough analysis of the classification performance of the two-layered CNN MNIST classifier leveraging the two quantum AFs developed in this study, QReLU and m-QReLU, and evaluated against nine benchmark AFs, including ReLU and its main recent reproducible and replicable advances, as well as relevant published studies, the proposed QReLU and m-QReLU prove to be the first two AFs in the recorded history of deep learning to successfully avoid the 'dying ReLU' problem, by design. The novel algorithms describing their methodology and AFs were implemented in TensorFlow and Keras, and are presented in Listings 1-4. This added capability ensured accurate and reliable classification for recognising PD-related patterns from spiral drawings and detecting COVID-19 from non-ionising medical imaging (US) data.

Furthermore, their availability in both Google's TensorFlow and Keras, the two most popular libraries in Python for deep learning, facilitates their wide application beyond clinical diagnostics, including medical prognostics and any other applications involving image classification. Thus, the QReLU and m-QReLU can aid detection of COVID-19 during these unprecedented times of this pandemic, as well as deliver continuous added value in aiding the diagnosis of PD based on pattern recognition from spiral drawings.

Noteworthily, when leveraging the proposed quantum AFs, the baseline CNN model achieved comparable classification performance to deeper CNN and CNN-derived architectures across all image recognition tasks involved in this study, from handwritten digits recognition, to detection of PD-related patterns from spiral drawings and COVID-19 from lung US scans. Thus, these outcomes corroborate the benefit of using AFs that avoid the 'dying ReLU' problem for critical image classification tasks, such as for medical diagnoses, making them a viable alternative to the current gold standard AF in CNNs, i.e., the ReLU. This study is expected to have a radical impact in redefining the benchmark AFs in CNN and CNN-derived deep learning architectures for applications across academic research and industry.

Improved Handwritten Digit Recognition Using Convolutional Neural Networks (CNN)

Quantum Computing and Deep Learning Working Together to Solve Optimization Problems

Big Data and Machine Learning in Health Care

Parkinsonism as a Third Wave of the COVID-19 Pandemic?

Chronic Neurology in COVID-19 Era: Clinical Considerations and Recommendations from the REPROGRAM Consortium. Front. Neurol

POCOVID-Net: automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS)

Stability and bifurcation of a simple neural network with multiple time delays

Quantum Neuron: an elementary building block for machine learning on quantum computers. arXiv

Assessing four neural networks on handwritten digit recognition dataset (MNIST)

Quantum machine learning: a classical perspective

Quantum algorithms revisited

Quantum Convolutional Neural Networks

Applying Gradient Descent in Convolutional Neural Networks

Origin and evolution of pathogenic coronaviruses

Adaptive Convolution ReLUs. Thirty-Fourth AAAI Conference on Artificial Intelligence

Deep sparse rectifier neural networks

Softmax to softassign: Neural network algorithms for combinatorial optimization

The influence of the sigmoid function parameters on the speed of backpropagation learning

Sigmoid transfer functions in backpropagation neural networks

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Reducing the Dimensionality of Data with Neural Networks

Towards a Real Quantum Neuron

Improved spiral test using digitized graphics tablet for monitoring Parkinson's disease

On the role of entanglement in quantum-computational speed-up

Deep Learning Applications in Medical Image Analysis

Self-Normalizing Neural Networks. arXiv

Deep neural architectures for prediction in healthcare

A Quantum-Inspired Self-Supervised Network model for automatic segmentation of brain MR images

ImageNet Classification with Deep Convolutional Neural Networks

Convolutional networks for images, speech, and time-series

Gradient-based learning applied to document recognition. Proceedings of the IEEE

Deep Learning

Bounded activation functions for enhanced training stability of deep neural networks on visual pattern recognition problems

A survey on deep learning in medical image analysis

Rectifier Nonlinearities Improve Neural Network Acoustic Models. Proceedings of the 30th International Conference on Machine Learning

A Variational Algorithm for Quantum Neural Networks

Rectified Linear Units Improve Restricted Boltzmann Machines

Quantum Computation and Quantum Information

Comparison of non-linear activation functions for deep neural networks on MNIST classification task

A new computer vision-based approach to aid the diagnosis of Parkinson's disease

Deep learning-aided Parkinson's disease diagnosis from handwritten dynamics

Convolutional neural networks applied for Parkinson's disease identification

FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks

Learning representations by back-propagating errors

ImageNet Large Scale Visual Recognition Challenge

Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings

The quest for a Quantum Neural Network. Quantum Inf Process

Understanding and improving convolutional neural networks via concatenated rectified linear units

Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan

Recognition of handwritten digit using convolutional neural network in python with tensorflow and comparison of performance for various hidden layers

The Potential of Quantum Computing and Machine Learning to Advance Clinical Research and Change the Practice of Medicine

Leaky_Relu | Tensorflow Core V2.3.0

TensorFlow. 2020. Tf.Keras.Layers.Leakyrelu | Tensorflow Core V2.3.0

Estimates of the severity of coronavirus disease 2019: a model-based analysis

Empirical evaluation of rectified activations in convolutional network

Visualizing and Understanding Convolutional Networks

Distinguishing different stages of Parkinson's disease using composite index of speed and pen-pressure of sketching a spiral

Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study

The authors would like to thank two research assistants from the University of Bradford, Ms Smriti Kotiyal and Mr Rohit Trivedi, for their assistance to the background review relevant for this paper.

The authors declare that no ethical approval was required for carrying out the study, as the data used in it were taken from publicly available repositories and appropriately referenced in text. Moreover, the authors declare not to have any competing interests and an appropriate funding statement has been provided on the title page of this article.