key: cord- -f d zov authors: qiu, xi; liang, shen; zhang, yanchun title: simultaneous ecg heartbeat segmentation and classification with feature fusion and long term context dependencies date: - - journal: advances in knowledge discovery and data mining doi: . / - - - - _ sha: doc_id: cord_uid: f d zov arrhythmia detection by classifying ecg heartbeats is an important research topic for healthcare. recently, deep learning models have been increasingly applied to ecg classification. among them, most methods work in three steps: preprocessing, heartbeat segmentation and beat-wise classification. however, this methodology has two drawbacks. first, explicit heartbeat segmentation can undermine model simplicity and compactness. second, beat-wise classification risks losing inter-heartbeat context information that can be useful to achieving high classification performance. addressing these drawbacks, we propose a novel deep learning model that can simultaneously conduct heartbeat segmentation and classification. compared to existing methods, our model is more compact as it does not require explicit heartbeat segmentation. moreover, our model is more context-aware, for it takes into account the relationship between heartbeats. to achieve simultaneous segmentation and classification, we present a faster r-cnn based model that has been customized to handle ecg data. to characterize inter-heartbeat context information, we exploit inverted residual blocks and a novel feature fusion subroutine that combines average pooling with max-pooling. extensive experiments on the well-known mit-bih database indicate that our method can achieve competitive results for ecg segmentation and classification. arrhythmia occurs when the heart rhythms are irregular, which can lead to serious organ damage. arrhythmias can be caused by high blood pressure, heart diseases, etc [ ] . electrocardiogram (ecg) is one of the most popular tools for arrhythmia diagnosis. 
to manually handle long ecg recordings with thousands of heartbeats, clinicians have to determine the class of each heartbeat to detect arrhythmias, which is highly costly. therefore, great efforts have been made to create computer-aided diagnosis tools that can detect irregular heartbeats automatically. in recent years, deep learning models have been gradually applied to ecg classification. among them, most methods work in three steps: preprocessing, heartbeat segmentation and beat-wise classification (see sect. ). the preprocessing step removes various kinds of noise from raw signals, the heartbeat segmentation step identifies individual heartbeats, and the beat-wise classification step classifies each heartbeat. this methodology has the following drawbacks: first, explicit heartbeat segmentation can undermine model simplicity and compactness. traditional heartbeat segmentation methods explicitly extract ecg features for qrs detection. since deep learning methods can produce feature maps from raw data, heartbeat segmentation can be simultaneously conducted with classification with a single neural network. second, beat-wise classification uses isolated heartbeats, which risks losing inter-heartbeat context information that can be useful to boosting classification performance. addressing these drawbacks, we propose a novel deep learning model that can simultaneously conduct heartbeat segmentation and classification. compared to existing methods, our model is more compact as it does not require explicit heartbeat segmentation. the difference between our model and existing deep learning models is shown in fig. . as is shown, our model takes in a -d ecg sequence and outputs both the segmented heartbeats and their corresponding labels. besides, our model is more context-aware, for it takes into account the relationship between heartbeats. 
to achieve simultaneous segmentation and classification, we present a faster r-cnn [ ] based model that has been customized to handle ecg sequences. to capture inter-heartbeat context information, we exploit inverted residual blocks [ ] to produce multi-scale feature maps, which are then fused by a novel feature fusion mechanism. moreover, semantic and morphological information is extracted from the fused features to improve performance. our main contributions are as follows:
- we propose a novel deep learning model for simultaneous heartbeat segmentation and classification.
- we present a novel faster r-cnn based model that has been customized to handle ecg data.
- we use inverted residual blocks and a novel feature fusion subroutine to exploit long term inter-heartbeat dependencies for context awareness.
- we conduct extensive experiments on the well-known mit-bih database [ , ] to demonstrate the effectiveness of our model.
the rest of this paper is organized as follows. section reviews the related work. section presents our model. section reports the experimental results. section concludes this paper.

traditional arrhythmia detection methods extract handcrafted features from ecg data, such as r-r intervals [ , ], ecg morphology [ ], frequency [ ], etc. classifiers such as linear discriminant analysis models [ ], support vector machines [ ] and random forests [ ] are then built upon these features. in recent years, many researchers have turned to deep neural networks for heartbeat classification. the majority of deep learning models take raw signals as their input, omitting explicit feature extraction and selection steps. in [ ], kiranyaz et al. proposed a patient-specific -d cnn for ecg classification. in [ ], yildirim et al. designed a deep lstm network with wavelet-based layers for heartbeat classification. some methods [ , ] combine lstm with cnn. 
the aforementioned deep learning methods work in three steps: preprocessing, heartbeat segmentation and beat-wise classification. they do not explicitly utilize context information among heartbeats. by contrast, mousavi et al. [ ] proposed a sequence-to-sequence lstm based model which maps a sequence of heartbeats in time order to a sequence of labels, where context dependencies are captured by the cell state of the network. hannun et al. [ ] proposed a -layer neural network for arrhythmia detection which maps ecg data to a sequence of labels. however, the precise regions of arrhythmias cannot be obtained. oh et al. [ ] used a modified u-net to identify regions of heartbeats and the background from raw signals, yet this method needs extra steps to detect arrhythmias from the generated annotation. several studies apply faster r-cnn to ecg analysis. for example, ji et al. [ ] proposed a heartbeat classification framework based on faster r-cnn, in which -d heartbeats extracted from the original signals are converted to images as the input of the model, so sophisticated preprocessing is required before classification. he et al. [ ] and yu et al. [ ] use faster r-cnn to perform heartbeat segmentation and qrs complex detection. in our method, we present a modified faster r-cnn for arrhythmia detection which works in only two steps: preprocessing, and simultaneous heartbeat segmentation and classification.

the architecture of our model is shown in fig. . it takes a -d ecg sequence as its input and conducts heartbeat segmentation and classification simultaneously. to achieve this, our model consists of modules: a backbone network, a region proposal network (rpn), a region classification network (rcn), a filter block, a down sampling block and a region pooling block. the backbone network produces multi-scale feature maps from the ecg signal. the modules in the upper part of fig. perform heartbeat segmentation, while those in the lower part perform heartbeat classification. 
we now elaborate on the details of our method.

the preprocessing step removes noise from raw signals. here we employ a third-order butterworth bandpass filter with a frequency range of . hz- hz because this range contains the main components of ecg signals [ ].

the backbone network generates multi-scale semantic and morphological feature maps from raw ecg signals. for efficiency, we choose the inverted residual block [ ] as the building block. we customize it for ecg data in the following manner: ( ) we increase the kernel size from to to enlarge the receptive field. ( ) the activation function is replaced by elu for less information loss. ( ) the residual connection is added to the building block in stride condition. there are two branches in stride condition (fig. ). ( ) stride convolution is replaced by max-pooling to make the model more lightweight. there are layers in our backbone. each layer is composed of several building blocks (fig. ). besides, each layer downsamples the feature map by a factor of two. unlike most deep learning methods, which compute feature maps for a single heartbeat, our backbone model takes a long ecg sequence as its input. the produced feature maps encode not only morphological and semantic information of individual heartbeats, but also context information amongst multiple heartbeats. the bottom layers of the backbone generate feature maps with strong morphological information, while the top layers produce feature maps with strong semantic information [ ]. moreover, the receptive field increases from the bottom layers to the top layers, so the feature maps encode inter-heartbeat context dependencies over varying time spans.

we fuse multi-scale feature maps to utilize both morphological and semantic information of the heartbeats in segmentation and classification. besides, the fused feature maps can provide more context information. for efficiency, all feature maps except those in the first two layers are used. 
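as a rough illustration of how the receptive field grows from the bottom layers to the top layers, the sketch below applies the standard receptive-field recurrence to a stack of 1-d layers. the function name, the kernel size and the number of layers are hypothetical, and each layer is simplified to a single convolution followed by plain subsampling; only the downsampling factor of two per layer is taken from the text.

```python
def receptive_fields(num_layers, kernel_size, stride=2):
    """Return (receptive_field, cumulative_stride) after each layer.

    Assumption: every layer is one convolution of width `kernel_size`
    followed by subsampling by `stride`. This is a simplification of
    the backbone, used only to show the trend.
    """
    rf = 1    # receptive field, measured in input samples
    jump = 1  # distance between neighbouring outputs, in input samples
    out = []
    for _ in range(num_layers):
        rf += (kernel_size - 1) * jump  # the convolution widens the field
        jump *= stride                  # subsampling spreads outputs apart
        out.append((rf, jump))
    return out


# hypothetical configuration: 4 layers, kernel size 5
for layer, (rf, jump) in enumerate(receptive_fields(4, 5), start=1):
    print(f"layer {layer}: receptive field = {rf} samples, stride = {jump}")
```

under these assumptions the receptive field roughly doubles with every layer, which is why feature maps from the top layers can span several heartbeats while those from the bottom layers stay local.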
in the segmentation task, the feature maps need to be normalized to have equal dimensions by the downsampling block before fusion. feature maps are downsampled to a fixed length by a novel mechanism shown in eq. , which is a trade-off between performance and complexity. unlike convolution based down sampling methods [ ], our down sampling block is parameter-free. average pooling is exploited for less information loss during downsampling, while max-pooling highlights the discriminative features in the feature maps.

the rpn fuses the feature maps from the downsampling block and performs heartbeat segmentation. here we directly segment the heartbeats without qrs detection. as is shown in fig. (a), the rpn has two branches performing regression (top) and classification (bottom). the classification branch produces a binary label for each region indicating whether it contains a single heartbeat. the regression branch produces the endpoints of each region which encloses a heartbeat. intuitively, the regression task is far more difficult than binary classification for the rpn, thus we use multi-size convolutional layers ( , , in this paper) to further extract features before regression. following the practice of faster r-cnn [ ], at each position of a feature map, we pre-define three reference regions. these regions have different sizes ( , , for ecg heartbeats). to predict a region, we use the center of one of the three regions as the reference point and report its offset to this reference. however, the regions obtained by the rpn can overlap with nearby regions, undermining the efficiency of the model. in response, we use non-maximum suppression (nms) to filter these regions in the filter block. nms selects the region with the maximum confidence in each iteration and then computes the overlaps between each remaining region and the selected one. 
the regions whose overlap exceeds a pre-set threshold ( % in this paper) are discarded, as are the regions containing no heartbeats (confidence below . ).

in the heartbeat classification task, the region pooling block generates heartbeat feature maps for the predicted heartbeat regions [ ]. feature maps in the last four layers (with strides , , , ) are reused to extract heartbeat features. because these feature maps have different sizes, the predicted regions are mapped as: region = (start/stride, end/stride). moreover, each region is divided into fixed-size sub-regions. heartbeat feature maps are produced by average pooling on each sub-region. to keep sufficient morphological information, heartbeat feature maps in the bottom layers have larger sizes ( , , , for strides , , , ). heartbeat feature maps are then fed into the region classification network (rcn, fig. (b)) to classify the heartbeats inside each region. the rcn performs heartbeat classification by fusing the feature maps from the region pooling block. note that we do not fine-tune the regions as in faster r-cnn because it trades efficiency for only minor improvements in accuracy.

following common practices in the detection task [ , ], our backbone network is initialized with a pre-trained network. we extract heartbeats from the experimental database (to be discussed later) and pre-train the backbone network with extra layers on these heartbeats. then, the last few layers are removed while the remaining ones are used as the backbone network. we coarsely annotate the groundtruth heartbeat region for each heartbeat, which ranges from . s- . s around the r peak so as to cover most of the heartbeat. since our model can capture inter-heartbeat context information, finer annotation is not necessary. the offsets of the reference regions to the groundtruth ones [ ] are used to train the rpn regression branch. 
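the filtering performed by the filter block can be sketched as a greedy non-maximum suppression over 1-d regions. this is a minimal illustration rather than the implementation used in this paper: the function names are hypothetical, the overlap is assumed to be the jaccard index (intersection over union) of two intervals, and the two thresholds are left as parameters because their values are not reproduced here.

```python
def jaccard_overlap(a, b):
    """Intersection over union of two 1-D regions given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(regions, scores, overlap_thr, score_thr):
    """Greedy non-maximum suppression.

    Drops regions whose confidence is below `score_thr`, then repeatedly
    keeps the most confident remaining region and discards every other
    region whose overlap with it exceeds `overlap_thr`.
    Returns the indices of the kept regions, most confident first.
    """
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [
            i for i in order
            if jaccard_overlap(regions[best], regions[i]) <= overlap_thr
        ]
    return keep


# hypothetical proposals, in samples, with hypothetical thresholds
regions = [(0, 100), (10, 110), (200, 300), (205, 295)]
scores = [0.9, 0.8, 0.95, 0.2]
kept = nms_1d(regions, scores, overlap_thr=0.5, score_thr=0.5)
```

in this toy input the second proposal is suppressed by the first, and the fourth is removed by the confidence threshold before suppression starts.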
to train the classification branch, a positive label is assigned to a predicted region when the following criteria are met: ( ) its overlap with a groundtruth heartbeat region is over . . ( ) it has the highest overlap with a groundtruth heartbeat region. in rcn training, each predicted region takes the label of the heartbeat inside it. we use jaccard distance as the metric for overlap computation. similar to [ ], our entire training process has two steps: ) train the rpn with regression loss and binary classification loss. ) train the rcn with multi-classification loss. for better performance, we choose smooth l loss (eq. ) to train the regression branch and focal loss (eq. ) to train the rcn and rpn for classification, where p_t is the estimated probability for the class t. we set γ to , and α_t to [ . , ] for binary classification and [ , . , , . , ] for multi-classification.

we implemented our model using pytorch . our source code is available for reproducibility. the experiments were run on a linux server with an intel xeon w- cpu @ . ghz, gb memory and a single nvidia ti gpu. adadelta was used as the optimizer with weight decay e − . the learning rate was set to . for training, decaying every epochs exponentially as follows: lr = . × . ^(epoch/ ). the batch size was set to for both training and testing.

we used data from the well-known mit-bih database [ , ], which contains half-hour two-lead ecg recordings. we used the mlii lead in the experiments and excluded recordings with paced beats, following the ansi/aami ec standard [ ]. due to limited computational resources, we divided each recording into a series of long sequences with data points. the first and last s of each recording were discarded. note that our model can process much longer ecg recordings given abundant computational resources. we ran the experiments times, randomly dividing the dataset into training, validation and test sets for each run. the training set contained % of all data. 
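for reference, the two loss functions named above can be written down in their widely used textbook forms. the sketch assumes that the paper's equations follow these standard definitions (smooth l1 with a transition point at 1, and focal loss as the α-weighted, (1 − p_t)^γ-modulated cross entropy); the function names and all numeric values in the usage lines are hypothetical.

```python
import math


def smooth_l1(x):
    """Standard smooth L1 loss on a single regression error x."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def focal_loss(probs, target, alpha, gamma):
    """Standard focal loss for one sample.

    probs  : predicted class probabilities (sum to 1)
    target : index t of the true class
    alpha  : per-class weights alpha_t
    gamma  : focusing parameter; gamma = 0 gives weighted cross entropy
    """
    p_t = probs[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# hypothetical values: a confidently correct and a poorly classified sample
easy = focal_loss([0.9, 0.1], target=0, alpha=[1.0, 1.0], gamma=2.0)
hard = focal_loss([0.3, 0.7], target=0, alpha=[1.0, 1.0], gamma=2.0)
```

with γ greater than zero the loss of the well-classified sample is scaled down much more strongly than that of the poorly classified one, which is the property that makes focal loss attractive for imbalanced heartbeat classes.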
the validation and test sets included % and % of all data. the heartbeat labels were mapped into groups by the ansi/aami standard, namely n, s, v, f, q (see table ). we did not take the q class into consideration because of its scarcity. we applied the following metrics for evaluation: positive predictive value (ppv), sensitivity (sen), specificity (spe) and accuracy (acc).

to evaluate heartbeat segmentation performance, we define a true positive (tp) as: ) a predicted region contains only one heartbeat and ) its non-overlapping area with the groundtruth is less than ms. we define a false positive (fp) as: ) a predicted region encloses more than two heartbeats or ) its non-overlapping area with the groundtruth exceeds ms. we define a false negative (fn) as: a groundtruth heartbeat is not enclosed by any predicted region. for baselines, we used two qrs detection based heartbeat segmentation methods: pan-tompkins [ ] and wavedet [ ]. the results are shown in table . as is shown, our method is highly competitive against the baselines. it is worth noting that unlike the baselines, our model does not apply qrs detection for segmentation, thus there may be inconsistencies in the definitions of tp, fp and fn between our model and the baselines. the results nevertheless indicate that our model performs well enough to be applied in real-world scenarios.

we now evaluate the heartbeat classification performance. the baselines come from [ , , , ]. here we applied smote [ ] for data augmentation as it was also used in our baselines. figure shows the results. our model achieves an accuracy of . %, a sensitivity of . % and a specificity of . %. these results are similar to those obtained by [ ], which used an lstm-based sequence-to-sequence model to learn context information. the difference between our work and [ ] is that we learn context information from raw signals while [ ] did so using a sequence of individual heartbeats. besides, the lstm-based model has lower efficiency. 
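the four evaluation metrics can be computed per class in a one-vs-rest fashion, as sketched below. the function names and the toy label sequences are hypothetical, and the sketch assumes that each metric is reported per heartbeat class by treating that class as positive and all others as negative.

```python
def _ratio(num, den):
    """Safe division that returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def per_class_metrics(y_true, y_pred, classes):
    """One-vs-rest PPV, sensitivity, specificity and accuracy per class."""
    pairs = list(zip(y_true, y_pred))
    results = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        tn = len(pairs) - tp - fp - fn
        results[c] = {
            "ppv": _ratio(tp, tp + fp),
            "sen": _ratio(tp, tp + fn),
            "spe": _ratio(tn, tn + fp),
            "acc": _ratio(tp + tn, len(pairs)),
        }
    return results


# toy example with the four classes retained in the experiments
y_true = list("NNNNSVVF")
y_pred = list("NNNSSVNF")
metrics = per_class_metrics(y_true, y_pred, classes="NSVF")
```

because the n class dominates real recordings, accuracy alone can look high even when the minority classes are poorly detected, which is why ppv and sensitivity are reported alongside it.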
compared to other baselines which perform classification on individual heartbeats, our model has a simpler structure but achieves similar or better results on some metrics, highlighting the power of context-awareness.

we now investigate the impact of the key design features of our model. for better evaluation, we did not use smote [ ] here. to better capture long term dependencies, we have enlarged the receptive field to retain context information. also, our model captures inter-heartbeat dependencies by learning multi-scale feature maps with the rpn and rcn. to demonstrate the effectiveness of these design choices, we conducted the following ablation tests: ) setting the kernel size to for all convolution filters, ) using the feature maps in the top layers only, ) using the feature maps in the bottom layers only, ) equalizing the output sizes of multi-scale feature maps in the region pooling block. figure and table present the results in the segmentation and classification tasks. as is shown, while the effectiveness of these design features in the segmentation task is limited, they are indeed beneficial to the classification task.

figure and table show the results. by modifying the backbone, our model can learn stronger features to improve performance. our downsampling method also outperforms using only max-pooling or average pooling.

our model has more parameters in the regression branch of the rpn, based on the intuition that regression is more difficult than binary classification in the rpn. to evaluate this design, we conducted the following ablation tests: ) enlarging the classification branch. ) simplifying the regression branch. table presents the results. as is shown, simplifying the regression branch has a negative impact on performance, while enlarging the classification branch brings about no improvement.

in this paper, we have proposed a novel deep learning model that can simultaneously conduct heartbeat segmentation and classification. 
compared to existing methods, our model is more compact as it does not require explicit heartbeat segmentation. moreover, our model is context-aware by using feature fusion and long term context dependencies. in the future, we plan to extend our model to multi-lead ecg analysis tasks. cardiac arrhythmia: mechanisms, diagnosis, and management faster r-cnn: towards real-time object detection with region proposal networks proceedings of the ieee conference on computer vision and pattern recognition the impact of the mit-bih arrhythmia database physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals automatic classification of heartbeats using ecg morphology and heartbeat interval features classification of electrocardiogram signals with support vector machines and particle swarm optimization ecg beat classification using pca, lda, ica and discrete wavelet transform medical decision support system for diagnosis of heart arrhythmia using dwt and random forests classifier real-time patient-specific ecg classification by -d convolutional neural networks a novel wavelet sequence based on deep bidirectional lstm network model for ecg signal classification cardiac arrhythmia detection from ecg combining convolutional and long short-term memory networks a lstm and cnn based assemble neural network framework for arrhythmias classification inter-and intra-patient ecg heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network automated beat-wise arrhythmia diagnosis using modified u-net on extended electrocardiographic recordings with heterogeneous arrhythmia types electrocardiogram classification based on faster regions with convolutional neural network a deep learning method for heartbeat detection in ecg image qrs detection and measurement method of ecg paper based 
on convolutional neural networks frequency content and characteristics of ventricular conduction feature pyramid networks for object detection ansi/aami ec : -testing and reporting performance results of cardiac rhythm and st segment measurement algorithms a real-time qrs detection algorithm a wavelet-based ecg delineator: evaluation on standard databases arrhythmias classification by integrating stacked bidirectional lstm and two-dimensional cnn a selective ensemble learning framework for ecg-based heartbeat classification with imbalanced data smote: synthetic minority over-sampling technique acknowledgement. this work is funded by nsfc grant and dongguan innovative research team program . we sincerely thank prof chun liang and dr zhiqing he from department of cardiology, shanghai changzheng hospital for their valuable advice. key: cord- - zz rd h authors: parisi, l.; neagu, d.; ma, r.; campean, f. title: qrelu and m-qrelu: two novel quantum activation functions to aid medical diagnostics date: - - journal: nan doi: nan sha: doc_id: cord_uid: zz rd h the relu activation function (af) has been extensively applied in deep neural networks, in particular convolutional neural networks (cnn), for image classification despite its unresolved dying relu problem, which poses challenges to reliable applications. this issue has obvious important implications for critical applications, such as those in healthcare. recent approaches are just proposing variations of the activation function within the same unresolved dying relu challenge. this contribution reports a different research direction by investigating the development of an innovative quantum approach to the relu af that avoids the dying relu problem by disruptive design. the leaky relu was leveraged as a baseline on which the two quantum principles of entanglement and superposition were applied to derive the proposed quantum relu (qrelu) and the modified-qrelu (m-qrelu) activation functions. 
both qrelu and m-qrelu are implemented and made freely available in tensorflow and keras. this original approach is effective and validated extensively in case studies that facilitate the detection of covid- and parkinson disease (pd) from medical images. the two novel afs were evaluated in a two-layered cnn against nine relu-based afs on seven benchmark datasets, including images of spiral drawings taken via graphic tablets from patients with parkinson disease and healthy subjects, and point-of-care ultrasound images of the lungs of patients with covid- , those with pneumonia and healthy controls. despite a higher computational cost, results indicated an overall higher classification accuracy, precision, recall and f -score brought about by either quantum af on five of the seven benchmark datasets, thus demonstrating their potential to be the new benchmark or gold standard af in cnns and to aid image classification tasks involved in critical applications, such as medical diagnoses of covid- and pd.

sars-cov- , the 'severe acute respiratory syndrome coronavirus ', is responsible for covid- (cohen & normile, ) and the current global pandemic announced by the world health organization (who, mar ). this virus leads to respiratory disease in humans (cui et al., ), but it may take from to days for the initial symptoms, e.g., fever and cough, to become manifest after an infection (centers for disease control and prevention, ). however, more severe symptoms can progress to viral pneumonia and typically require mechanical ventilation to assist patients with breathing (verity et al., ). in some more severe cases, covid- can also lead to worsened symptoms and even death (zhou et al., ), and it may also be an aetiology of pd itself (beauchamp et al., ). 
thus, it is important to be able to detect neurodegenerative co-morbidities, such as pd, in vulnerable undiagnosed patients promptly and non-invasively, for example via cnns that can recognise patterns from spiral drawings, and then to apply non-ionising medical imaging techniques (bhaskar et al., ), which are more appropriate for such patients, to facilitate a prompt diagnosis of covid- and improve clinical outcomes. whilst tremors can be detected from patterns in spiral drawings as indicators of early pd, ground-glass opacities, lung consolidation, bilateral patchy shadowing and other relevant lesion-like patterns can be detected as biomarkers to distinguish covid- -related pneumonia from other types, including both viral and bacterial pneumonia (shi et al., ). improvements in the afs of cnns can help to improve generalisation in both these image classification tasks.

different layers of a deep neural network represent various degrees of abstraction, thus capturing a varying extent of patterns from input images (zeiler & fergus, ). afs provide the cnn with the non-linearity required to learn from non-linearly distributed data, even in the presence of a reasonable amount of noise. an af defines the gradient of a layer, which depends on its domain and range. afs are differentiable and can be either saturated or unsaturated. table summarises the main activation functions commonly used in deep neural networks, including the convolutional neural network (cnn), with their equations and references; they are introduced below.

saturated afs are continuous, with their outputs bounded within finite limits, and are typically represented as s-shaped curves, also named 'sigmoidal' or 'squashing' afs, e.g., the logistic sigmoidal function with its output in the range of and (liew et al., ). saturated afs are typically applied in shallow neural networks, e.g., in mlps. 
however, saturated afs lead to the vanishing gradient issue whilst training a network with back-propagation (cui, ) , i.e., results in gradients that are less than , which become smaller with multiple differentiations and ultimately become or 'vanish'. thus, changes in the activated neurons do not lead to modifications of any weights during back-propagation. moreover, the exploding gradient problem can occur, which has an opposite effect to vanishing gradients, wherein the error gradient in the weight is so high that it leads to instability whilst updating the weights during back-propagation. hyperbolic tangent or 'tanh' (see table ) is a further saturated af, but it attempts to mitigate this issue by extending the range of the logistic function from - to , centred at . nevertheless, tanh still does not solve the vanishing gradient problem. unsaturated functions are not bounded in any output ranges and are centred at . the rectified linear unit (relu) ( table ) is the most widely applied unsaturated af in deep neural networks, e.g., in cnns, which provides faster convergence than logistic sigmoidal (lecun et al., ) and tanh afs, as well as improved generalisation (litjens et al., ) . in fact, relu generally leads to more efficient updates of weights during the back-propagation training process (gao et al., ) . the relu's gradient (or slope) is either one for positive inputs or zero for negative ones, thus solving the vanishing gradient issue. nevertheless, despite providing appropriate initialisation of the weights to small random values via the he initialisation stage (glorot et al., ) , with large weight updates, the summed input to the relu activation function is always negative ('dying relu' problem) . 
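the 'dying relu' behaviour can be reproduced with a single neuron. the sketch below is a minimal illustration with hypothetical weights and inputs: once a large update has pushed the pre-activation below zero for every input, the gradient with respect to the weight and the bias is exactly zero, so gradient descent can no longer move the neuron out of that state.

```python
def relu(z):
    """Rectified linear unit."""
    return z if z > 0 else 0.0


def relu_grad(z):
    """Derivative of ReLU: one for positive inputs, zero otherwise."""
    return 1.0 if z > 0 else 0.0


def weight_gradients(w, b, inputs, upstream=1.0):
    """Gradients of sum_i relu(w * x_i + b) with respect to w and b."""
    grad_w = 0.0
    grad_b = 0.0
    for x in inputs:
        g = upstream * relu_grad(w * x + b)
        grad_w += g * x
        grad_b += g
    return grad_w, grad_b


inputs = [0.1, 0.5, 1.0]                                # hypothetical positive inputs
healthy = weight_gradients(w=1.0, b=0.0, inputs=inputs)
dead = weight_gradients(w=-3.0, b=-1.0, inputs=inputs)  # after a large update
```

in the second call every pre-activation is negative, so both gradients vanish and the neuron outputs zero for all of these inputs regardless of further training.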
this negative value yields a zero value at the output and the corresponding nodes do not have any influence on the neural network (abdelhafiz et al., ), which can lead to misclassification and thus to an inability to detect a pathology accurately and reliably in an image classification task, such as covid- or pd diagnostics.

in an attempt to mitigate the 'dying relu' issue in cnns and deeper networks (alexnet, vgg , resnet, etc.), multiple variations of the relu af have been introduced, such as the leaky relu (lrelu), the parametric relu (prelu), the randomised relu (rrelu) and the concatenated relu (crelu), as summarised in table . maas et al. ( ) introduced leaky relu (lrelu) to provide a small negative gradient for negative inputs into a relu function, instead of being . a constant variable , with a default value of . , was used to compute the output for negative inputs (table ). another variant of relu, named 'exponential linear unit' (elu), is aimed at improving convergence (maas et al., ) (table ), but it still does not solve the 'dying relu' issue either. klambauer et al. ( ) introduced a variant of elu called 'scaled exponential linear unit' (selu) (table ), which is a self-normalising function that yields normally distributed outputs, making it suitable for deep neural networks, with the output converging to zero mean when passed through multiple layers. although selu attempts to avoid both vanishing and exploding gradient problems, it does not mitigate the 'dying relu' issue. he et al. ( ) proposed the parametric rectified linear unit (prelu) in an attempt to provide a better performance than relu in large-scale image classification tasks, although the only difference from lrelu is that is not a constant and it is learned during training via back-propagation. nevertheless, due to this, the prelu does not solve the 'dying relu' issue either, as it is intrinsically a slight variation of the lrelu af. 
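the relu variants introduced so far differ only in how they treat negative inputs, which the following sketch makes explicit. it uses the commonly published definitions of these functions; the function names are hypothetical, the slope `alpha` is left as an explicit argument rather than a default, and the two selu constants are the rounded values usually quoted for that function.

```python
import math

# rounded SELU constants as commonly quoted in the literature
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def leaky_relu(x, alpha):
    """LReLU: identity for positive inputs, fixed slope alpha otherwise."""
    return x if x > 0 else alpha * x


def prelu_alpha_grad(x):
    """PReLU has the same form as LReLU, but alpha is learned.
    This is the derivative of the output with respect to alpha."""
    return 0.0 if x > 0 else x


def elu(x, alpha):
    """ELU: smooth exponential saturation towards -alpha for negative inputs."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def selu(x):
    """SELU: a scaled ELU with fixed alpha and scale."""
    return SELU_SCALE * elu(x, SELU_ALPHA)


# hypothetical slope value, chosen only for illustration
outputs = [leaky_relu(x, alpha=0.1) for x in (-2.0, 0.0, 3.0)]
```

all four functions reduce to a (possibly scaled) identity for positive inputs, so they share the behaviour of relu there and differ only on the negative half of the input range.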
similarly, the randomised leaky rectified linear unit is a randomised version of lrelu (pedamonti, ), whereby is a random number sampled from a uniform distribution, thus still being susceptible to the 'dying relu' issue too. shang et al. ( ) proposed a further slight improvement to the relu named 'concatenated relu' (crelu), allowing for both a positive and negative input activation, by applying relu after copying the input activations and concatenating them. thus, crelu is computationally expensive and prone to the 'dying relu' problem, although it generally leads to competitive classification performance with respect to the gold standard relu and lrelu afs (shang et al., ).

table . the main activation functions commonly used in deep neural networks, including the convolutional neural network (cnn), with their equations and references. the relu and leaky relu are the most common and reliable ones in cnns.
logistic sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$, han & moraga ( )
tanh: gold & rangarajan ( )
arctan: $f(x) = \tan^{-1}(x)$, campbell et al. ( )
softplus

despite the wide application of dl-based algorithms for image classification in healthcare, such as the cnn (lecun et al., ) described in . , its classical af, although it mitigates the vanishing gradient issue typical of sigmoid afs, can still experience the 'dying relu' problem. as discussed in . , none of the recently proposed afs, such as the lrelu, the prelu, elu and selu, has solved this issue yet, as they are still algorithmically similar in their relu-like implementations. this issue can lead to a lack of generalisation for cnns, thus hindering their application in a clinical setting. it is worth noting that, as an example, the last fully connected layer of the cnn in kollias et al. ( ), which had , neurons, ended up with only neurons yielding non-zero values due to the 'dying relu' problem. 
even by coupling a recurrent neural network (rnn) with their cnn, thus obtaining a cnn-rnn (kollias et al., ) whose last layer was designed with neurons, only about of them led to non-zero values, whilst the remaining ones experienced the 'dying relu' issue, yielding negligible values. these two examples confirm that classical approaches to relu have failed to solve its associated 'dying relu' problem, thus warranting a different approach, which the authors suggest should be of a quantum nature, as illustrated in . and motivated in . . quantum ml is a relatively new field that blends the computational advantages brought by quantum computing with advances in ml beyond classical computation (ciliberto, et al., ). quantum ml has not only led to more effective algorithmic performance, but has also made it possible to find, with a higher probability, the global minimum of the solutions sought in ml (ciliberto, et al., ). the main principles of quantum computing are those inherited from quantum physics, such as superposition, entanglement and interference (barabasi et al., ). according to the quantum principle of superposition, the fundamental quantum bit, or qubit, can have multiple states at any point in time, i.e., a qubit can have a value of either 0 or 1, like classical bits, but, differently from and beyond classical bits, it can also hold both values concurrently (barabasi et al., ). a quantum gate can unify two quantum states so that they stay 'entangled' in a single quantum state, wherein a change in one state affects the other and vice versa (jozsa & linden, ). thus, a system of qubits, each of which holds multiple bits of information concurrently, behaves as one via the quantum property named 'entanglement', hence enabling massive parallelism too (cleve et al., ; solenov et al., ).
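the notions of superposition and entanglement introduced above can be made concrete with a small state-vector simulation in plain python. this is a textbook illustration rather than part of the method of this study; the helper names `kron` and `apply_gate` are hypothetical. a hadamard gate puts one qubit into an equal superposition, the tensor product joins it with a second qubit, and a cnot gate entangles the pair.

```python
import math


def kron(a, b):
    """Tensor (Kronecker) product of two state vectors."""
    return [x * y for x in a for y in b]


def apply_gate(gate, state):
    """Apply a gate (square matrix as nested lists) to a state vector."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]


ZERO = [1.0, 0.0]  # basis state |0>
h = 1.0 / math.sqrt(2.0)
HADAMARD = [[h, h], [h, -h]]
# CNOT with the first qubit as control; basis order |00>, |01>, |10>, |11>.
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

plus = apply_gate(HADAMARD, ZERO)  # superposition of |0> and |1>
pair = kron(plus, ZERO)            # joint state of the two qubits
bell = apply_gate(CNOT, pair)      # entangled state (|00> + |11>) / sqrt(2)

probabilities = [amplitude ** 2 for amplitude in bell]
print(probabilities)  # weight only on |00> and |11>
```

after the cnot gate, measuring one qubit fixes the outcome of the other, which is the sense in which a change in one state affects the other.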
however, existing quantum approaches to implement afs in deep neural networks have only adopted the repeat-until-success (rus) technique to achieve pseudo non-linearity due to restrictions to linear and unitary operations in quantum mechanics (nielsen & chuang, ; cao et al., ) . this rus approach to afs involves an individual state preparation routine and the generation of various superimposed and entangled linear combinations to propagate the routine of an af to all states at unison. thus, a deep neural network leveraging this quantum rus technique could theoretically approximate most nonlinear afs (macaluso et al., ) . nevertheless, the practical applications of this approach are very limited due to the input range of the neurons in such architectures being bounded between and π/ as a trade-off of their theoretically generic af formulation. hu ( ) led a similar theoretical research effort in proposing a sigmoid-based non-linear af, which is not periodic to enable a more efficient gradient descent whilst leveraging the principle of superposition in training neurons with multiple states concurrently. however, the classical form of the approach of hu ( ) is the traditional relu, thus still not solving the 'dying relu' problem either. konarac et al. ( ) leveraged a similar quantum-based sigmoid af in their quantum-inspired self-supervised network (qis-net) to provide high accuracy ( %) and sensitivity ( . %) in magnetic resonance image segmentation, improving performance by about % with respect to classical approaches. differently from the related studies mentioned above, the two properties of entanglement and superposition could be pivotal in devising a quantum-based approach to relu in having both a positive solution and a negative one simultaneously, being able to avoid a negative solution by preferring the positive one, whereas traditional classical relu at times would fail by leading to negative solutions only, i.e., the 'dying relu' problem. 
moreover, this principle enables quantum systems to reduce computational cost with respect to classical approaches, since several optimisations in multiple states can be performed concurrently (schuld et al., ) . as described in sections . and . , dl is highly suitable in classifying medical images due to its intrinsic feature extraction mechanisms. as illustrated in both . and . , the importance of the af is evident in both classical and quantum dl, respectively. although numerous variants of relu functions have been proposed in classical dl models (as revised in section . ) they have not been widely adopted as relu and lrelu. these two afs typically ensure accurate and reliable classification and are readily available in python open source libraries, such as tensorflow and keras. nevertheless, both these afs and any recent afs (see section . ) have not solved the 'dying relu' problem yet. moreover, vanishing and exploding gradient issues have not been fully resolved either. elu and selu may at times provide faster convergence than relu and lrelu, but they are not as reliable as those and are computationally more expensive (pedamonti, ) . such unresolved issues lead to lack of generalisation that may hinder the diagnostic accuracy and reliability of an application leveraging dl techniques for the detection of covid- or pd, thus resulting in a potentially high number of false negatives when the model's performance is evaluated on unseen patient data. the authors have hypothesised that this impaired generalisation is due to the classical approach underpinning such relu-based afs that has been just leveraged and moulded in various ways so far, without breaking its inherent functional limitations. the hereby contribution proposes, for the first time, that a quantum-based methodology to relu would improve the learning and generalisation in cnns with relevant impact for critical applications, such as the above-mentioned diagnostic tasks. 
in particular, by blending the two key quantum principles of entanglement of qubits and the effects of superposition to help reach the global minimum in the solution, thus avoiding negative solutions differently from classical approaches as in . , this study investigates the development of a novel af 'quantum relu' to avoid the problem of the 'dying relu' in a quantistic manner. this builds on recent research efforts by cong et al. ( ) to develop a quantum cnn that, although demonstrating how quantum states can be recognised, have not yet addressed the 'dying relu' problem, as it simply leveraged the traditional relus instead. patterns from lung ultrasound images and spiral drawings are known diagnostic biomarkers for covid- and pd respectively, pd being at times a delicate co-morbidity of covid- patients, and improvements in generalisation are key to an accurate and reliable early diagnosis that can improve outcomes, especially in the event of co-morbidities. thus, the novel quantum relu will be leveraged in a cnn to improve classification performance in such pattern recognition tasks, as quantified via clinically relevant and interpretable metrics, and compared against the same cnn with current gold-standard afs, including relu and lrelu. the proposed added capability of a quantum relu in a cnn is hypothesised to improve its generalisation for pattern recognition in image classification, such as detecting covid- and pd from ultrasound scans and spiral drawings, respectively. the remaining sections of the paper are structured as follows. section deals with the methods, including sub-section . illustrating the two novel quantum afs, along with their mathematical formulation and respective implementations in python codes (in both tensorflow and keras libraries). sub-section . 
provides a description of the benchmark datasets selected, along with a standardised data pre-processing strategy, whilst section summarises the results obtained by comparing the accuracy, reliability and computational time of a cnn with the proposed quantum afs against the salient gold standard afs outlined in table . finally, section provides a thorough discussion of the results and section summarises the current work and outlines its access, impact, and future applications. despite appropriate initialisation of the weights to small random values via the he initialisation, large weight updates can make the summed input to the traditional relu activation function always negative, irrespective of the input values fed to the cnn. current improvements to the relu, such as the leaky relu, allow for a more non-linear output, either to account for small negative values or to facilitate the transition from positive to small negative values, without eliminating the problem though. consequently, this study investigates the development of a novel activation function to obviate the problem of the 'dying relu' in a quantistic manner, i.e., by achieving a positive solution where previously the solution was negative. such an added novel capability in a cnn was hypothesised to improve its generalisation for pattern recognition in image classification, which is particularly important in critical applications, such as medical diagnoses of covid- and pd. thus, the main reproducible afs (with associated codes available in tensorflow and keras) were first identified following a critical review of the literature (section ); then, using the same standard two-layered cnn in tensorflow for mnist data classification, the following nine classical activation functions were considered: relu, leaky relu, crelu, sigmoid, tanh, softmax, vlrelu, elu and selu.
a two-step quantum approach was applied to the relu first, by selecting its solution for positive values ($R(z) = z, \forall z > 0$) and the leaky relu's solution for negative values ($R(z) = \alpha \times z, \forall z \le 0$, where $\alpha$ is the leaky relu's constant slope) as a starting point to improve quantistically. by applying the quantum principle of entanglement, the tensor product of the two candidate state spaces from the relu and the leaky relu was performed, yielding a quantum-based combination of the two solutions for negative values. thus, keeping $R(z) = z$ for positive values ($z > 0$) as in the relu, but with the added novelty of the entangled solution for negative values ( ), the quantum relu (qrelu) was attained (fig. ). the algorithms describing the methodology and the af were implemented in tensorflow and keras, and are presented in listings and respectively; the qrelu avoids the 'dying relu' problem by mathematically maintaining the positivity of the solution via this new quantum state.
listing fragment (keras usage of the qrelu): model.add(layers.maxpooling d(( , )))
by leveraging the quantum principle of superposition on the qrelu's solution for positive and negative values, the modified qrelu (m-qrelu) was obtained (fig. ). the algorithms describing the methodology and the af were implemented in tensorflow and keras, and are presented in listings and respectively, still avoiding the 'dying relu' issue. listing provides the snippet of python code to leverage the m-qrelu in tensorflow, using 'py_func' as per listing . its usage in tensorflow is the same as that of the 'qrelu' in listing , but using 'tf_m_q_relu' as the activation function of the second convolutional layer ('conv _act').
listing fragment (keras usage of the m-qrelu): conv d( , ( , ), input_shape=( , , ))) #model.add(qrelu()) model.add(m_qrelu()) model.add(layers.maxpooling d(( , )))
the m-qrelu also satisfies the entanglement principle, being derived via the tensor outer product of the solutions from the qrelu. thus, a quantum-based blend of both superposition and entanglement principles mathematically leads the qrelu and the m-qrelu to obviate the 'dying relu' problem intrinsically.
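a minimal sketch of the two afs in plain python is given below. the positive branch, $R(z) = z$, is as stated above. the expression used for the negative branch is not legible in this text, so the form `alpha * z - k * z`, with `k = 2` for the qrelu and `k = 1` for the m-qrelu and `alpha = 0.01`, is an assumption adopted here only because it produces the positive outputs for negative inputs that the text describes. the function names are likewise hypothetical and do not reproduce the listings of this study.

```python
def q_relu(z, alpha=0.01, k=2.0):
    """Sketch of a QReLU-like AF.

    Positive inputs pass through unchanged, as in ReLU. For non-positive
    inputs the ASSUMED form alpha * z - k * z is used, which is positive
    whenever z < 0 and k > alpha.
    """
    return z if z > 0 else alpha * z - k * z


def m_q_relu(z, alpha=0.01, k=1.0):
    """Sketch of an m-QReLU-like AF: same ASSUMED form with a smaller k."""
    return z if z > 0 else alpha * z - k * z


# Hypothetical summed inputs, including negative ones that would be zeroed by ReLU.
pre_activations = [-2.0, -1.0, 0.0, 1.5]

q_out = [q_relu(z) for z in pre_activations]
m_q_out = [m_q_relu(z) for z in pre_activations]

print(q_out)    # negative inputs are mapped to positive outputs
print(m_q_out)  # same sign behaviour, smaller magnitude
```

under these assumptions, no unit with a non-zero summed input outputs zero, so the gradient with respect to its weights does not vanish identically as it does for a 'dead' relu unit.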
as shown in ( ) and ( ), although the two proposed afs are quantistic in nature, both qrelu and m-qrelu can be run on classical hardware, such as central processing unit (cpu), graphics processing unit (gpu) and tensor processing unit (tpu), the latter being the type of runtime used in this study via google colab (http://colab.research.google.com/) to perform the required evaluation on the datasets described in . . the novel qrelu and m-qrelu were developed and tested using python . and written to be compatible with both tensorflow ( . and . tested, . supports tensorflow serving to deploy the novel afs on the cloud) and the keras sequential api. thus, both afs were programmed as new keras layers for ease of use. by selecting the positive quantum state of the summed input of the qrelu and m-qrelu, an optimal early diagnosis could be achieved for patients with covid- and pd. thus, this study demonstrates the qrelu and m-qrelu as a potential new benchmark activation function to use in cnns for critical image classification tasks, particularly useful in medical diagnoses, wherein generalisation is key to improving patient outcomes. to assess which af was suitable for each of the pattern recognition tasks involved in classifying the seven benchmark datasets as per . , the performance of the baseline cnn was assessed via the test or out-of-sample classification accuracy, precision, sensitivity/recall and f -score. precision, recall, and f -score are important metrics to measure the reliability of the classification outcomes. % confidence intervals (cis) were also reported. to enable reproducibility and replicability of the results obtained, publicly available benchmark datasets were gathered and used in this study, as mentioned below. 
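the evaluation metrics named above can be computed from the counts of a binary confusion matrix, as sketched below in plain python. the function names and the example counts are hypothetical. the confidence interval uses the normal approximation for a proportion with `z = 1.96`, which is an assumption: the text does not state which interval method or level was used.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall (sensitivity) and F-score from counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return accuracy, precision, recall, f_score


def accuracy_interval(accuracy, n_samples, z=1.96):
    """Normal-approximation confidence interval for an accuracy (assumed method)."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


# Hypothetical counts on a test set of 100 images.
acc, prec, rec, f_score = classification_metrics(tp=40, fp=10, fn=5, tn=45)
low, high = accuracy_interval(acc, n_samples=100)

print(acc, prec, rec, f_score)
print(low, high)
```

recall matters most in a diagnostic setting, since a false negative corresponds to a missed case, whereas precision reflects how many flagged cases are true positives.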
moreover, to this purpose, the full python codes (.py and .ipynb formats) in both tensorflow (https://www.tensorflow.org/) and keras (https://keras.io/), showing how these datasets were used to train the model and to evaluate its performance, are also provided. as a general benchmark dataset for any image classifier, especially cnns, the mnist data (lecun et al., ), including , images of handwritten digits ( , images for training, , images for testing), was used for the initial model and af validation. this dataset is available in tensor format in tensorflow (https://www.tensorflow.org/datasets/catalog/mnist). to address the specific need to improve the diagnosis of parkinson's disease (pd) and of covid- dealt with in this study, further benchmark datasets were used. four benchmark datasets were leveraged to identify pd based on patterns in spiral drawings ( subjects in total). as in the mnist dataset, images in all benchmark datasets were converted to grayscale and resized to * . the two-layered cnn, designed as an mnist classifier, was initially validated on the mnist benchmark dataset itself, used for recognising handwritten digits. the qrelu and the m-qrelu were the best and second-best performing activation functions respectively, leading to an acc and an f -score of . ( %) and of . ( %) respectively (table ). the relu, the leaky relu and the vlrelu also led to the best classification performance on the mnist data (acc = . / %, f -score = . / %) (table ). thus, the proposed qrelu achieved gold standard classification performance on this benchmark dataset. notably, the qrelu and the m-qrelu led the same two-layered cnn architecture to achieve the best (acc = . / %, f -score = . / %) and third-best (acc = . / %, f -score = . / %) classification performance (table ) on the benchmark dataset named 'spiral handpd', which contains images of spiral drawings taken via graphic tablets from patients with pd and healthy subjects.
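the pre-processing mentioned above (conversion to grayscale and resizing to a common size) can be sketched in plain python as follows. the luminance weights, the nearest-neighbour resizing and the target size used in the example are assumptions made for illustration; the text does not specify which conversion or interpolation was used, and the function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert an image given as rows of (r, g, b) tuples to rows of gray values.

    Uses the common luminance weights 0.299, 0.587, 0.114 (an assumed choice).
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def resize_nearest(image, height, width):
    """Resize a 2-D image (list of rows) by nearest-neighbour sampling."""
    src_height, src_width = len(image), len(image[0])
    return [
        [image[i * src_height // height][j * src_width // width] for j in range(width)]
        for i in range(height)
    ]


# Hypothetical 4 x 4 colour image: white left half, black right half.
white, black = (255, 255, 255), (0, 0, 0)
rgb = [[white, white, black, black] for _ in range(4)]

gray = to_grayscale(rgb)
small = resize_nearest(gray, height=2, width=2)  # target size chosen for the example

print(len(small), len(small[0]))
print(small)
```

bringing every dataset to the same single-channel size allows the same baseline cnn, originally designed for mnist, to be reused unchanged across tasks.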
as illustrated in table , competitive results were achieved by the qrelu and the m-qrelu on a further benchmark dataset of spiral drawings, the 'newhandpd dataset', on which they ranked sixth and eighth in classification performance respectively (acc = . / %, f -score = . / %; acc = . / %, f -score = . / %). very competitive outcomes were obtained by the two proposed quantum afs on the kaggle spiral drawings dataset, with the m-qrelu (acc = . / %, f -score = . / %) and the qrelu (acc = . / %, f -score = . / %) ranking second and fourth respectively (table ), as well as on the uci spiral drawings dataset (qrelu ranked fifth with acc = . / % and f -score = . / %; m-qrelu ranked sixth with acc = . / % and f -score = . / %) (table ). the overall increased generalisation brought about by the two novel quantum afs is evident in the outstanding and mutually consistent classification outcomes achieved on both benchmark lung us datasets in distinguishing covid- from both pneumonia and healthy subjects, on which both the qrelu and the m-qrelu achieved the best (table , acc = . / % and f -score = . / %) and the second-best (table , acc = . / % and f -score = . / %) classification performance respectively. despite a higher computational cost (four-fold with respect to the other afs, except for the crelu, against which the increase was almost three-fold), the results achieved by either or both of the proposed qrelu and m-qrelu afs, assessed on classification accuracy, precision, recall and f -score, indicate an overall higher generalisation on five of the seven benchmark datasets (table on the mnist data, tables and on pd-related spiral drawings, tables and on covid- lung us images). consequently, the two quantum relu methods are the overall best performing afs that can be applied to aid the diagnosis of both covid- from lung us data and pd from spiral drawings.
specifically, when using the novel quantum afs (qrelu and m-qrelu) as compared to the traditional relu and leaky relu afs, the gold standard afs in dnns, the following percentage increases in acc, precision, recall/sensitivity and f -score were noted:
• an increase of . % in acc and sensitivity/recall via m-qrelu as compared to relu, and of . % with respect to leaky relu, thus avoiding the 'dying relu' problem when the cnn was evaluated on the kaggle spiral drawings benchmark dataset (table );
• an increase of . % in f -score via both qrelu and m-qrelu as opposed to leaky relu, hence obviating the 'dying relu' problem again, this time when tested on the covid- ultrasound benchmark dataset (table );
• an increase of % in acc and sensitivity/recall via both qrelu and m-qrelu with regard to both relu and leaky relu, hence solving the 'dying relu' problem when evaluated on the pocus ultrasound benchmark dataset (table );
• an increase of % in acc and sensitivity/recall via qrelu ( %) when compared to tanh ( % acc and sensitivity/recall), thus avoiding the vanishing gradient problem too, as assessed on the uci spiral drawings benchmark dataset (table ).
furthermore, it is worth noting that the proposed quantum afs led to improved classification outcomes as compared to recent advances in relu afs, such as crelu and vlrelu:
• qrelu led to acc, precision, sensitivity/recall and f -score all higher by % than those obtained via crelu when evaluating the cnn's classification performance on the mnist data (table ).
• m-qrelu resulted in an acc and a sensitivity/recall higher by % than crelu, and an f -score greater by %, on the spiral handpd dataset (table ).
• m-qrelu led to an acc and a sensitivity/recall greater by % than vlrelu, and an f -score also higher by %, on the spiral handpd dataset (table ).
• m-qrelu resulted in an acc and a sensitivity/recall higher by % than vlrelu, and an f -score greater by %, on the kaggle spiral drawings dataset (table ).
• qrelu and m-qrelu led to an acc and a sensitivity/recall greater by % and % than crelu and vlrelu respectively, and an f -score higher by % and %, on the covid- ultrasound dataset (table ).
• qrelu and m-qrelu resulted in an acc and a sensitivity/recall higher by % than vlrelu, and an f -score greater by %, on the pocus ultrasound dataset (table ).
the results obtained via the qrelu and m-qrelu in a two-layered cnn on the mnist dataset are reported in table . the two-layered cnn's classification performance via the proposed m-qrelu (acc = %, f -score = %, table ) was also higher by over % than that of the best performing five-layered cnns, whose hyperparameters were optimised via the bat algorithm and particle swarm optimisation (pso) respectively (pereira et al., c), in aiding the diagnosis of pd from spiral drawings, such as those in the 'spiral handpd' benchmark dataset. a comparable precision was achieved by the two-layered cnn model (table ) when the qrelu and m-qrelu were used as afs with respect to the best classifier so far on the covid ultrasound dataset, i.e., the sixteen-layered pocovid-net model, which builds on the vgg model (born et al., ).
table . results on performance evaluation of the first convolutional neural network having two convolutional layers, built in tensorflow, and tested on the kaggle spiral drawings benchmark dataset. the size of the images was set to * , as per the mnist benchmark dataset.
table . results on performance evaluation of the first convolutional neural network having two convolutional layers, built in tensorflow, and tested on the university california irvine (uci) spiral drawings benchmark dataset. the kaggle spiral drawings benchmark dataset, which includes drawings from both healthy subjects and patients with parkinson's disease, was used for training and the uci spiral drawings benchmark dataset, which only has spiral drawings acquired during both static and dynamic tests from patients with pd, was deployed for testing.
the size of the images was set to * , as per the mnist benchmark dataset.
table . results on performance evaluation of the first convolutional neural network having two convolutional layers, built in tensorflow, and tested on the covid- ultrasound benchmark dataset. the size of the images was set to * , as per the mnist benchmark dataset.
further to the extensive review of existing relu afs provided in section . , and considering both that classical approaches have been unable to solve the 'dying relu' problem (section . ) and the advantages of quantum states in afs (section . ), two novel quantum-based afs were mathematically formulated in section . and developed in both tensorflow (listings and , https://www.tensorflow.org/) and keras (listings and , https://keras.io/) to enable reproducibility and replicability. thus, the mnist two-layered cnn-based classifier in tensorflow was selected as the baseline model to assess the impact of using either quantum af (qrelu or m-qrelu) on the classification performance on the seven benchmark datasets described in section . , evaluated based on test acc, precision, recall/sensitivity and f -score, as mentioned in section . . the fact that the proposed qrelu leads to the best classification performance on the mnist benchmark dataset for recognising handwritten digits (acc = %, f -score = %, table ) serves as a regression test to validate the hypothesis whereby, using the baseline cnn-based mnist classifier, the highest classification performance is achieved with the presumed best af. this hypothesis has been further confirmed by the m-qrelu achieving the second-best classification performance (acc = %, f -score = %, table ) across all eleven afs evaluated as in . .
achieving the same classification performance as the gold standard reproducible and replicable afs in cnns (relu, the leaky relu and the vlrelu), which are readily available in both tensorflow and keras, the qrelu can be granted the designation of benchmark af for the task of handwritten digit recognition performed on the mnist benchmark dataset. the benefits of avoiding the 'dying relu' problem become evident when assessing the same two-layered cnn architecture with the qrelu especially (acc = . / %, f -score = . / %, table ), which achieved the best classification performance on critical image classification tasks, such as recognising pd-related patterns from spiral drawings in the 'spiral handpd' benchmark dataset. the higher generalisability achieved via the two proposed quantum afs, in further support of the advantage of obviating the 'dying relu' issue, is evident from the best classification performance in differentiating covid- from both bacterial pneumonia and healthy controls on the lung us data (table , qrelu and m-qrelu with acc = . / % and f -score = . / %). such an overall higher diagnostic performance is corroborated by the second-best classification outcomes attained on the second benchmark lung us dataset (table ). whilst traditional relu approaches show highly variable classification outcomes, especially when they experience the 'dying relu' problem (tables , and ), both the qrelu and the m-qrelu were able to ensure a consistently higher classification performance and generalisation across the entire variety of image classification tasks involved, from the benchmark handwritten digit recognition task (mnist), to recognising pd-related patterns from spiral drawings taken from graphic tablets, to aiding the differentiation of covid- from bacterial pneumonia and healthy lungs based on us scans.
the advantage of using the proposed afs for covid- detection lies in the potential for their translational applications in a clinical setting, i.e., in leveraging cnns with the qrelu or m-qrelu to detect covid- in patients with neurodegenerative co-morbidities, such as pd, via non-ionising medical imaging (e.g., us). this added capability will come handy in future, as portable mri and ml-enhanced mri technologies will also become more affordable and widespread, thus being improvable with deep learning models (e.g., the two-layered cnn with qrelu or m-qrelu afs in this study). solutions either on edge devices or on the cloud for tele-diagnosis and tele-monitoring required in pandemics similar to the current one (covid- ) could be soon suitable for in-home diagnostic and prognostic assessments too, which should improve personalised care for shielded or vulnerable individuals. moreover, competitive outcomes were obtained via the qrelu and the m-qrelu on three further benchmark datasets, e.g., 'newhandpd dataset', the kaggle and the uci spiral drawings benchmark datasets, with acc and f -score mostly above % (tables - ) using the relatively simple deep neural network leveraged in this study (the two-layered mnist cnn classifier). such results also demonstrate the added capability of the proposed qrelu and the m-qrelu to avoid the vanishing gradient problem occurred using tanh ( % acc and sensitivity/recall), as evaluated on the uci spiral drawings benchmark dataset (table ) . despite the overall increase in generalisability brought about by the qrelu and the m-qrelu, the computational cost of the cnn increased by four times as compared to the other nine afs evaluated, except for the crelu, against which a threefold increase was reported (tables - ) . 
nevertheless, considering the importance of achieving higher classification performance over lower computational cost for diagnostic applications in a clinical setting, especially for the critical image classification tasks involved in this study, such as the detection of pd (tables - ) and covid- (tables and ) , this increase in computational cost is not expected to impair the wide application of the two novel quantum afs to aid such diagnostic tasks and any other medical applications involving image classification. in fact, the qrelu and m-qrelu have been demonstrated as considerably better than the current (undisputedly assumed) gold standard afs in cnns, i.e., the traditional relu and the leaky relu. in particular, an increase by - % in both accuracy and reliability (especially, sensitivity/recall and f -score) metrics was reported across both pattern recognition tasks, i.e., detection of pd-related patterns from spiral drawings (tables and ) and aiding diagnosis of covid- from us scans ( table ) . the two proposed quantum afs also outperformed more cutting-edge relu afs, such as the crelu and the vlrelu, by - % across all classification tasks considered, i.e., mnist data classification (table ) , spiral drawings pd-related pattern recognition (in particular, tables and ) , and covid- detection from us scans (tables and ) . moreover, the qrelu and the m-qrelu led the baseline two-layered cnn mnist classifier to achieve a comparable classification performance on the mnist dataset as deeper cnns, ranging from three to four layers (lecun et al., ; siddique et al., ; ahlawat et al., ) , including deeper architectures, e.g., resnet and densenet (chen et al., ) . 
it is worth noting that, when leveraging the qrelu and the m-qrelu, the two-layered cnn with hyperparameters based on the mnist data outperformed (acc = %, f -score = %, table ) deeper and ba-and pso-optimised cnns from published studies by over % (pereira et al., c) in aiding the diagnosis of pd from patterns in spiral drawings (e.g., using the 'spiral handpd' benchmark data). the two-layered cnn model with either qrelu or m-qrelu as afs achieved a comparable precision (table ) to the best-performing classifier on the covid ultrasound dataset, i.e., the sixteen-layered pocovid-net model, which is an extension of the vgg benchmark model (born et al., ) . these outcomes show the two main practical advantages brought about by the avoidance of the 'dying relu' problem in qrelu and the m-qrelu that outweigh the initial consideration on these two quantum afs leading to an overall higher computational cost despite the increased generalisation, which are as follows: . using qrelu or m-qrelu can obviate the need for several convolutional layers in cnns and any cnn-derived models, such as alexnet, resnet, densenet, condensenet, ccondensenet and vgg , as demonstrated above and in section (results), . leveraging qrelu or m-qrelu as afs in cnn can minimise the need for optimisation of cnn's hyperparameters. the implications of the two above-mentioned practical benefits are multiple. firstly, the two proposed afs may not only improve generalisation but also computational cost when considering image classification tasks that involve deeper architectures than the two-layered cnn used in this study. thus, the proposed afs may be viable alternatives to the relu af, which is the current gold standard af in cnns. second, by improving both generalisation and computational cost when deeper architectures may be required, the qrelu and m-qrelu may be suitable for tasks that require scalability of deep neural networks. 
third, the proposed quantum afs may enable more effective transfer learning, such as for covid- detection in multiple geographical areas, as well as extending trained deep nets to further diagnostic tasks, including prognostic applications, and aiding self-driving vehicles in the image classification tasks essential to ensure passenger safety. overall, the avoidance of the 'dying relu' problem achieved via the qrelu and m-qrelu is expected to radically shift the paradigm away from blindly relying on the traditional relu af in cnns and any cnn-derived models, towards embracing innovative approaches, including quantum-based ones, such as the two novel afs designed, developed and validated in this study. further to a thorough analysis of the classification performance of the two-layered cnn mnist classifier leveraging the two quantum afs developed in this study, qrelu and m-qrelu, evaluated against nine benchmark afs, including relu and its main recent reproducible and replicable advances, as well as against relevant published studies, the proposed qrelu and m-qrelu prove to be the first two afs in the recorded history of deep learning to successfully avoid the 'dying relu' problem by design. the novel algorithms describing their methodology and afs were implemented in tensorflow and keras, and are presented in listings - . this added capability ensured accurate and reliable classification for recognising pd-related patterns from spiral drawings and detecting covid- from non-ionising medical imaging (us) data. furthermore, their availability in both google's tensorflow and keras, the two most popular python libraries for deep learning, facilitates their wide application beyond clinical diagnostics, including medical prognostics and any other applications involving image classification.
thus, the qrelu and m-qrelu can aid detection of covid- during these unprecedented times of this pandemic, as well as deliver continuous value added in aiding the diagnosis of pd based on pattern recognition from spiral drawings. noteworthily, when leveraging the proposed quantum afs, the baseline cnn model achieved comparable classification performance to deeper cnn and cnn-derived architectures across all image recognition tasks involved in this study, from handwritten digits recognition, to detection of pd-related patterns from spiral drawings and covid- from lung us scans. thus, these outcomes corroborate the benefit of using afs that avoid the 'dying relu' problem for critical image classification tasks, such as for medical diagnoses, making them a viable alternative to the current gold standard af in cnns, i.e., the relu. this study is expected to have a radical impact in redefining the benchmark afs in cnn and cnn-derived deep learning architectures for applications across academic research and industry.
the authors would like to thank two research assistants from the university of bradford, ms smriti kotiyal and mr rohit trivedi, for their assistance to the background review relevant for this paper. the authors declare that no ethical approval was required for carrying out the study, as the data used in it were taken from publicly available repositories and appropriately referenced in text. moreover, the authors declare not to have any competing interests and an appropriate funding statement has been provided on the title page of this article.
key: cord- -u ybz ds authors: yu, chanki; yang, sejung; kim, wonoh; jung, jinwoong; chung, kee-yang; lee, sang wook; oh, byungho title: acral melanoma detection using a convolutional neural network for dermoscopy images date: - - journal: plos one doi: . /journal.pone. sha: doc_id: cord_uid: u ybz ds background/purpose: acral melanoma is the most common type of melanoma in asians, and usually results in a poor prognosis due to late diagnosis. we applied a convolutional neural network to dermoscopy images of acral melanoma and benign nevi on the hands and feet and evaluated its usefulness for the early diagnosis of these conditions. methods: a total of dermoscopy images comprising acral melanoma ( images from patients) and benign nevi ( images from patients), and confirmed by histopathological examination, were analyzed in this study.
to perform the -fold cross validation, we split them into two mutually exclusive subsets: half of the total image dataset was selected for training and the rest for testing, and we calculated the accuracy of diagnosis and compared it with the dermatologist's and non-expert's evaluation. results: the accuracy (percentage of true positive and true negative from all images) of the convolutional neural network was . % and . %, which was higher than the non-expert's evaluation ( . %, . %) and close to that of the expert ( . %, . %). moreover, the convolutional neural network showed area-under-the-curve values of . , . and youden's index values of . , . , which were similar to those of the expert. conclusion: although further data analysis is necessary to improve their accuracy, convolutional neural networks would be helpful to detect acral melanoma from dermoscopy images of the hands and feet. in asians, melanoma is rare compared to its prevalence in caucasians, and usually occurs in acral areas such as the hands and feet. it can be misrecognized as benign nevi (bn), is occasionally hidden by calluses, and eventually results in late diagnosis at an advanced stage, with a poor prognosis [ ] [ ] [ ] . since effective anti-cancer agents for treating melanoma have not yet been developed, early detection and wide excision of the skin lesion are crucial for curing melanoma. recently, to aid the early diagnosis of melanoma and reduce unnecessary skin biopsies, dermoscopy has been widely used [ , ] . moreover, because dermoscopy is difficult for non-experts to use [ ] , artificial intelligence and deep-learning models have been applied to help physicians who are untrained to handle a digital dermoscope [ ] ; its use is expected to increase in the field of teledermatology. a convolutional neural network (cnn) is one of the representative models among the various deep-learning models.
it has already shown potential for general and highly variable tasks across many fine-grained object categories [ ] [ ] [ ] [ ] [ ] and has been shown to exceed human performance in object recognition [ ] . recently, it was applied to detect skin cancers in images, including from dermoscopy, and successfully demonstrated artificial intelligence capable of classifying skin cancer with a competence level comparable to that of dermatologists [ ] . for the success of cnn models, a large amount of training data labeled with class types to produce a rich feature hierarchy is necessary, and therefore, its usefulness in the diagnosis of rare diseases with insufficient data has not been fully established. in this study, we applied an end-to-end cnn framework to detect a rare disease in asians, acral melanoma (am), from the dermoscopy images of pigmentation on the hands and feet. to overcome the insufficiency of the datasets, we adopted a transfer learning technique to leverage learned features from a cnn model pre-trained on a large-scale natural image dataset [ ] . moreover, we also applied a half-training and half-trial method to validate its clinical usefulness for the early diagnosis of patients compared with the dermatologist's and non-expert's evaluation. a total of dermoscopy images were collected from january to march at the severance hospital in the yonsei university health system, seoul, korea, and from march to april at the dongsan hospital in the keimyung university health system, daegu, korea. among them, dermoscopy images were from patients with am and images were from patients with bn of the acral area (fig ) . a total of dermoscopy images were captured by the dermlite cam ( gen inc., usa), and images were captured by the dermlite hybrid ii ( gen inc., usa), connected to a digital camera (nikon coolpix p , japan). all diagnoses were histopathologically confirmed and multiple images were captured in cases of large lesions. 
we provide a strobe checklist for the study of diagnostic efficacy as supporting information (s table) . dermoscopy images of bn were divided into nine types, and am images into three types according to the reference [ ] , by two dermatologists. this study protocol was approved by the institutional review board of yonsei university, severance hospital and keimyung university, dongsan hospital and was conducted according to the declaration of helsinki principles. patient records/information was anonymized and deidentified prior to analysis. we have described the cnn architecture we adopted in section . and presented the training and inference methods for detecting melanoma in section . . . convolutional neural network. cnns are composed of several convolutional layers, each involving linear and nonlinear operators, as well as fully connected layers. the architecture for the state-of-the-art cnn has many parameters; for example, the vgg- model has million parameters, where the parameters are learned from the imagenet dataset containing . million general object images of , different object categories for training [ ] . deep neural networks are difficult to train using small datasets (i.e., a few hundred images). to circumvent this problem, we used the fine-tuning technique, which is one of the regularization techniques. we fine-tuned a modified vgg model with layers ( convolutional and three fully connected layers), which uses the convolution filters of the same size (i.e., × ) for all convolution layers, as seen in table . our network configuration is depicted in fig and table . each layer and feature map in the cnn is represented by a three-dimensional array of table ; "conv" represents a convolutional layer and "fc" represents a fully connected layer). 
the input, with a fixed size of × , was passed through a stack of convolutional layers, each followed by a rectified linear unit (relu) activation function, and max-pooling was performed over a × pixel window with a stride of . a series of convolutional layers (conv , conv , conv , conv , and conv ) was followed by three fully connected layers: the first two fully connected layers (fc and fc ) had , channels each, each followed by a relu activation function, while the last fully connected layer (fc ) had channels since our problem was a two-way classification problem (melanoma and non-melanoma class). it should be noted that the number of channels of the last fully connected layer was the same as the number of classes. hence, we replaced the original fully connected layer (fc : fc with channels) with a fully connected layer with two channels. the last layer had the soft-max activation function and predicted whether the input patch was a melanoma or non-melanoma lesion. moreover, the vgg- model pre-trained on the imagenet database was used to perform transfer learning, and the weights of the last convolutional layers (the last two layers of conv ) and the three fully connected layers (fc , fc , and fc ) were initialized using xavier weight initialization [ ] . in order to perform fine-tuning, we froze the imagenet pre-trained weights of conv , conv , conv , conv , and the first layer of conv , and trained the initialized weights on our dermoscopy image dataset. this procedure prevents the large gradients caused by randomly initialized weights from ruining the pre-trained weights. after several training epochs, we trained all the weights of our network without freezing any layer. our dataset consisted of images and associated labels, which were split into two mutually exclusive subsets (group a and b); half of the total image dataset was selected for training and the rest for testing.
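the half-training and half-trial split described above can be sketched as follows. this is a minimal illustration rather than the authors' code (their implementation used matconvnet): the function names, the image identifiers and the fixed random seed are all hypothetical, and the split is made per image, as the text states.

```python
import random


def split_half(items, seed=0):
    """Split items into two mutually exclusive halves (group A and group B)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def two_fold_rounds(group_a, group_b):
    """Yield (train, test) pairs: train on B and test on A, then the reverse."""
    yield group_b, group_a
    yield group_a, group_b


# Hypothetical image identifiers; the real dataset size is not assumed here.
image_ids = [f"img_{i:03d}" for i in range(10)]
group_a, group_b = split_half(image_ids, seed=42)
rounds = list(two_fold_rounds(group_a, group_b))
```

each image is used for testing exactly once, which is what allows results to be reported separately for group a and group b.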
the scale and location of a skin lesion in a captured image varied with the capture conditions. to resolve this issue, we adopted a sliding window strategy and used cropped patches instead of the full image at training and inference time. at inference time, we extracted about image patches from each test image on a regularly spaced grid with a partial overlap between neighboring patches, and each patch was then rescaled to the size of × pixels, as seen in fig . in addition, to increase the robustness of our cnn model to geometric transformations, the training dataset was artificially augmented at training. additional augmented data were formed by rotating and flipping images from the original training set. we generated image patches from a single image using rotations by ˚, ˚, ˚, and ˚, as well as left-right and top-bottom reflections. in addition, the patches that did not contain any melanoma lesions among the melanoma training images were manually removed, and the patches that did not contain any skin lesions among the non-melanoma training images were assigned to the non-melanoma class at training time. we randomly selected % of the training dataset as a validation set and the rest as a training set at the onset of training. the validation data were used to prevent overfitting to the training data and to provide guidance on when to stop training the network. the training of our cnn was stopped when the error on the validation set stopped decreasing. we trained the network using an adaptive stochastic sub-gradient method where the batch size is set to , and the momentum parameter, learning rate, and weight decay are set to . , . , and . , respectively. some of the filters learned from our melanoma dataset may be seen in fig . fig (a) shows learned filters at the st convolutional layer, where each represents a learned filter with a × kernel size.
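the sliding-window extraction and the rotation and reflection augmentation described above can be sketched in plain python. this is only an illustration under stated assumptions: the four rotation angles are assumed to be multiples of a right angle (the exact values are not legible in the text), patches are represented as nested lists, and the helper names, window size and stride are hypothetical.

```python
def rotate90(patch):
    """Rotate a 2-D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def flip_left_right(patch):
    """Mirror each row (left-right reflection)."""
    return [list(row[::-1]) for row in patch]


def flip_top_bottom(patch):
    """Reverse the row order (top-bottom reflection)."""
    return [list(row) for row in patch[::-1]]


def augment(patch):
    """Return four right-angle rotations (assumed angles) plus two reflections."""
    variants = []
    current = [list(row) for row in patch]
    for _ in range(4):
        variants.append(current)
        current = rotate90(current)
    variants.append(flip_left_right(patch))
    variants.append(flip_top_bottom(patch))
    return variants


def grid_origins(height, width, size, stride):
    """Top-left corners of size-by-size windows on a regular grid.

    A stride smaller than the window size gives partially overlapping patches.
    """
    return [(r, c)
            for r in range(0, height - size + 1, stride)
            for c in range(0, width - size + 1, stride)]


demo = [[1, 2], [3, 4]]
variants = augment(demo)
origins = grid_origins(height=4, width=4, size=2, stride=1)
```

in practice each extracted window would also be rescaled to the network input size before being passed to the cnn; that step is omitted here.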
the input of the first layer is an rgb image of × pixels; it is convolved with learned filters of × kernel size, as shown in fig (a), to generate feature maps of × size. the output feature maps are then used as the input of the next layer. fig (b) - (m) show filters among the learned filters from the nd to the th convolution layer, respectively, each with a × kernel size. at the time of inference, we interpreted image patches per test image, and when one or more patches were predicted as containing melanoma, the corresponding test image was interpreted as containing melanoma. each input of the network was an rgb image from which the average image, calculated over the entire training image dataset, had been subtracted. we implemented our method using matconvnet, a matlab-based cnn framework for computer vision applications [ ] . moreover, we fine-tuned a vgg model with layers downloaded from http://www.vlfeat.org/matconvnet/pretrained. to assess the clinical usefulness of the cnn, we compared its diagnostic rate with those of two dermatologists who had five or more years of clinical experience in dermoscopy (expert group) and two non-trained general physicians (non-expert group). all images on the computer screen were evaluated simultaneously. if the two physicians disagreed, they reached a conclusion by consensus. since images were randomly and equally the agreement between the pathologic result and each rater's diagnosis was measured using cohen's kappa coefficient, $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the accuracy (observed agreement) and $p_e$ is the hypothetical probability of a chance agreement. all statistical analyses were performed with medcalc software version . . among dermoscopy images, images were from the hands and fingers, and the others were from the feet and toes.
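the image-level decision rule and the agreement statistic described in the methods above can be illustrated with a short sketch. it is not the authors' analysis (they used medcalc): the function names and the toy labels are hypothetical, and cohen's kappa is computed from its standard definition, with the observed agreement and the chance agreement estimated from each rater's label frequencies.

```python
def image_is_melanoma(patch_predictions):
    """An image is called melanoma if one or more of its patches is."""
    return any(patch_predictions)


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_1) != len(rater_2) or len(rater_1) == 0:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_1)
    labels = set(rater_1) | set(rater_2)
    # observed agreement: fraction of cases where both raters give the same label
    p_o = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # chance agreement: sum over labels of the product of marginal frequencies
    p_e = sum((rater_1.count(k) / n) * (rater_2.count(k) / n) for k in labels)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (1 = melanoma, 0 = benign nevus); not data from the study.
pathology = [1, 1, 0, 0]
rater = [1, 0, 0, 0]
kappa = cohens_kappa(pathology, rater)
```

with these toy labels the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5.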
a total of am images included homogenous diffuse irregular pigmented, parallel ridge, and multicomponent patterns, while bn images included parallel furrow, fibrillar, lattice-like, reticular, globular, and homogenous patterns (s table) . in the group a results obtained by the training of group b images, cnn showed . % sensitivity and . % specificity, which were similar to those of the expert ( . % and . %, respectively). however, the non-expert showed lower sensitivity ( . %) and relatively higher specificity ( . %, table ). for diagnostic accuracy, both the cnn and expert group showed similar scores ( . % and . %, respectively), which were higher than that of the non-expert ( . %, fig ) . in the result of group b by the training of group a images, cnn also showed a higher diagnostic accuracy ( . %) than that of the non-expert ( . %) but was similar to that of the expert ( . %). for validating diagnostic reliability, both the cnn and expert showed an auc above . in group a and b (fig ) . however, the non-expert regarding the concordance rate between the cnn and expert group, cases ( / , . %) in group a (am: cases, bn: cases) were discordant. of these, cases ( . %) of the cnn and cases ( . %) of the expert were identical with the pathologic results. however, in the concordant cases between them, cases ( / , . %) differed from the pathology reports. in group b, cases (am: cases, bn: cases) showed discordance between the cnn and expert, and cases ( . %) of the cnn and cases ( . %) of the expert were identical with the pathologic results. among the concordant cases in group b, cases ( / , . %) differed from the pathology results. cohen's kappa between cnn and expert, cnn and non-expert, expert and non-expert is shown in table . to verify the performance of cnn architecture for the discrimination of acral melanoma, we perform the deep learning architecture, inception-v , in [ ] , the state-of-the-art publication for the classification of skin cancer. 
in [ ] , a single image was used for learning. meanwhile, we applied multiple images for learning. thus, we compared inception-v with a single image and inception-v with multiple images to cnn with multiple images. the results are shown in table . although non-invasive and automated diagnostic techniques have been introduced for the early detection of melanoma, they are still not easy to apply in the acral type [ , ] . this may be due to the overall low occurrence rate of melanoma in asians, which reflects ethnic differences and means that more time is needed to collect a dataset large enough to improve diagnostic accuracy. to overcome the problem of an insufficient dataset, we adopted a -fold cross validation method for the training and test groups. in addition, capturing images of one lesion at different places helps to construct a robust cnn. similarly, data augmentation, which generates virtual images from one image using rotation, translation, and different angle positioning, also helps to make the cnn robust. these procedures are necessary to construct an automated diagnosis system from small datasets due to the low occurrence rate of acral melanoma. for the effective screening of melanoma, higher sensitivity is required. thus, if there is a small compartment corresponding to the melanoma in one image, our system considers it as melanoma. also, our system recognizes one image as one patient. from the results, the accuracy of the cnn was above %, which was similar in both groups and was close to that of the expert. the cnn and expert also showed auc values above . , indicating good discrimination. generally, higher auc values are considered to demonstrate better discriminatory abilities as follows: excellent discrimination, auc of ≥ . ; good discrimination, . ≤ auc < . ; fair discrimination, . ≤ auc < . ; and poor discrimination, auc of < . [ ] . since the auc of the non-expert was lower than . 
, cnn can be a useful tool for the early detection of am by the physicians who are not familiar with the dermoscopic images. moreover, additional datasets of am images can improve the diagnostic accuracy of cnn [ ] , making it a more reliable tool for the evaluation of the need for skin biopsy for hand and feet pigmentation. there were several auto-classification methods independent of the size of training data using dermatologists' checklist, such as the abcd rule and -point scale [ ] [ ] [ ] [ ] . this method used particular features such as color, shape, size, the boundary of the skin lesion, and statistical features of wavelength, which showed . % of accuracy and . auc value [ ] . however, these cannot be directly applied to acral melanoma due to the different morphologic features such as ridge or furrow patterns. although there was a new dermoscopic algorithm reflecting these characteristics for diagnosing acral melanoma: braaff [ ] , it has not yet been applied to the automated diagnosis. in addition, although there is a state-of-art automated classification method for acral melanoma, these methods cannot be generalized and only work well for a particular pattern of acral melanoma, which is a ridge-and-furrow pattern [ ] . automated diagnosis methods using particular features are able to reflect experts' perception and the speed of performance is fast. however, it is not easy to catch experts' perception, although we are trying to reach the goal with significant features. on the other hand, deep learning does not require specific features as inputs. it automatically finds the most correlated features with expert's perception by learning. thus the accuracy is higher than feature-based methods. however, a large database is critical for the successful completion of deep learning. recently, the melanoma classification performance of cnn using , dermoscopy images was reported as having an auc of . [ ] , which was higher than noted in our results ( . , . ). 
our inferior results may be due to the characteristics of am; it occurs on the pressure area, thick skin, callus, etc., which can hinder and transform the classic pigmented lesion into an atypical case. because of this, experts in our experiment also showed an auc of . . therefore, if the datasets are analyzed separately considering these anatomic characters, cnn may perform a more precise discrimination. furthermore, if combined with images from noninvasive devices for melanoma diagnosis, which may overcome the problems presented by a thick skin, the accuracy of cnn can be markedly improved. several non-invasive devices such as confocal and photon microscopy are being introduced to provide convenient ways to diagnose melanoma early [ ] . however, they require much effort and time for a physician to gain expertise. an automated diagnostic system using a cnn, even with a small dataset, may alleviate the difficulty of learning how to use these newly developed devices. in conclusion, a half-training and half-trial method were useful for creating a comparatively accurate deep-learning model from a relatively small dataset. although further data analysis is necessary to improve its accuracy, cnn would be helpful for the early detection of am, which is usually associated with delayed diagnosis and poor prognosis. supporting information s conceptualization: sejung yang, byungho oh. 
treatment and outcomes of melanoma in acral location in korean patients
plantar malignant melanoma-a challenge for early recognition
improvement in survival rate of patients with acral melanoma observed in the past years in sendai
biopsy of the pigmented lesion-when and how
dermoscopy of pigmented skin lesions: results of a consensus meeting via the internet
diagnostic accuracy of dermoscopy
systematic review of dermoscopy and digital dermoscopy/artificial intelligence for the diagnosis of melanoma
deep learning
imagenet large scale visual recognition challenge
imagenet classification with deep convolutional neural networks. advances in neural information processing systems
rethinking the inception architecture for computer vision
deep residual learning for image recognition
dermatologist-level classification of skin cancer with deep neural networks
convolutional neural networks for medical image analysis: full training or fine tuning?
an atlas of dermoscopy
very deep convolutional networks for large-scale image recognition
understanding the difficulty of training deep feedforward neural networks
matconvnet: convolutional neural networks for matlab. proceedings of the rd acm international conference on multimedia
non-invasive tools for the diagnosis of cutaneous melanoma
can routine laboratory tests discriminate between severe acute respiratory syndrome and other causes of community-acquired pneumonia?
revisiting unreasonable effectiveness of data in deep learning era
computer image analysis in the diagnosis of melanoma
reliability of computer image analysis of pigmented skin lesions of australian adolescents
combination of features from skin pattern and abcd analysis for lesion classification
computer-aided diagnosis of melanoma using border and wavelet-based texture analysis
the braaff checklist: a new dermoscopic algorithm for diagnosing acral melanoma.
the british journal of dermatology
ridge and furrow pattern classification for acral lentiginous melanoma using dermoscopic images
innovations and developments in dermatologic non-invasive optical imaging and potential clinical applications
key: cord- - i bwlh authors: boudaya, amal; bouaziz, bassem; chaabene, siwar; chaari, lotfi; ammar, achraf; hökelmann, anita title: eeg-based hypo-vigilance detection using convolutional neural network date: - - journal: the impact of digital technologies on public health in developed and developing countries doi: . / - - - - _ sha: doc_id: cord_uid: i bwlh hypo-vigilance detection is becoming an important active research area in the biomedical signal processing field. for this purpose, electroencephalogram (eeg) is one of the most common modalities in drowsiness and awakeness detection. in this context, we propose a new eeg classification method for detecting fatigue state. our method makes use of a convolutional neural network (cnn) architecture. we define an experimental protocol using the emotiv epoc+ headset. after that, we evaluate our proposed method on a recorded and annotated dataset. the reported results demonstrate high detection accuracy ( %) and indicate that the proposed method is an efficient alternative for hypo-vigilance detection as compared with other methods. hypo-vigilance has been one of the major causes of accidents in many areas such as driving [ ] , aviation [ ] and the military sector [ ] . hence, the drowsiness problem has gained great interest from researchers. it is a pressing problem during the current covid- [ ] pandemic, in which medical staff are generally overworked.
in fact, the drowsy condition is expressed predominantly by the emergence of various behavioral signs such as slowed reactions, reduced reflexes, yawning, heaviness of the eyelids and/or difficulty keeping the head in the frontal position relative to the field of vision. many methods [ ] [ ] [ ] [ ] have been proposed to detect hypo-vigilance based on biomedical signals such as electroencephalogram (eeg), electrocardiogram (ecg), electromyogram (emg), and electrooculogram (eog). given its high temporal resolution, portability and reasonable cost, the present work focuses on eeg: we detect hypo-vigilance by analyzing the eeg signals that reflect the various functionalities of the brain, recorded by fourteen electrodes placed on the participant's scalp. on the other hand, deep learning networks offer great potential for biomedical signal analysis through the simplification of raw input signal processing (i.e., of the various steps including feature extraction, denoising and feature selection) and the improvement of the classification results. various deep learning architectures [ ] exist, such as convolutional neural network (cnn), recurrent cnn (r-cnn), auto-encoder (ae), deep belief network (dbn), long short-term memory (lstm) and gated recurrent units (gru). as reported in [ ] , the cnn architecture is the most widely used for biomedical signal analysis and provides a high classification accuracy. previous related work [ ] proposes a hypo-vigilance detection method that applies a cnn to facial features. this method showed a classification accuracy of . %. likewise, [ ] introduces an adaptive conditional representation learning system for driver drowsiness detection based on a d-cnn. 
the proposed system consists of four steps (spatio-temporal representation, data preprocessing, features combination and somnolence detection). the experimental results show a detection accuracy equal to . %. in this paper, we propose a cnn hypo-vigilance detection method using eeg data in order to classify drowsiness and awakeness states. accordingly, the proposed approach including used equipment are presented in sect. . section describes the experimental results and the evaluation of the employed method. finally, a conclusion and future work are drawn in sect. . as shown in fig. , the realization of the proposed approach is suggested by two primary procedures: data acquisition and data analysis. the following subsections provide a detailed explanation of each procedure. the eeg data acquisition procedure is made up of two main steps which are data collection and data preprocessing. to collect the raw eeg data from participants, we use an emotiv epoc+ headset as shown in fig. [a] for the data acquisition process. the key feature of this headset is a non-invasive brain computer interface (bci) tool designed for the development of human brain and contextual research [ ] . the emotiv epoc + helmet contains fourteen active electrodes with two reference electrodes (drl and cms), as shown in fig. [b]. the electrodes are placed around the participant's head in the structures of the following zones: frontal and anterior parietal (af , af , f , f , f , f , fc , fc ), temporal (t , t ) and occipital-parietal (o , o , p , p ). the specific preprocessing steps of the data revolve around the following points which are data preparation, data annotation and data augmentation. during data acquisition, our raw eeg signals may be influenced by various sources of artifacts and noise such as endogenous electrical properties, specific fabrics physical structure, dipolar size variation, muscle shifts and blinks. 
hence, data processing is a preliminary step to denoising the raw signals. we suggest using an infinite impulse response (iir) filter that manages an impulsive signal within time and frequency domains. other sophisticated denoising approaches could be considered at the expense of higher computational complexity [ , ] . - to evaluate each individual's state of exhaustion, we concentrate on the brain areas that are responsible for hypo-vigilance detection. in this regard, different brain waves are targeted such as [ ] : • delta waves refer to consciousness, sleep or deep sleep states. these waves were found in the temporal and occipital conditions with low frequency (less than hz) and high amplitude. • theta waves design the relaxation and hypnosis states with a range of frequency between and hz. theta waves are extracted from the temporal zone and are produced during the first phase of slow sleep or in deep relaxation state. • alpha waves refer to waking but relaxed states. these waves are captured in the posterior part, precisely the occipital region, with a frequency interval between and hz and a low amplitude interval between and µv. • beta waves relate to alertness states. these waves are captured from the temporal and occipital lobes of the brain. they are characterized by high frequency interval of to hz with a low amplitude interval of to µv. • gamma waves refer to hypervigilance states with a frequency interval between to hz. in the data annotation step, we only use the o and o electrodes of occipital zone which are responsible for drowsiness sensation. as an annotation example, fig. indicates the amplitudes of the alpha and theta signals from the two o and o electrodes reported for a participant in three periods of the day. the relaxation state has been indicated by alpha waves which have a frequency interval between to hz and an amplitude interval between to µv. 
the somnolence state is indicated by theta waves, which have a frequency interval between and hz and an amplitude interval between and µv. in order to reduce overfitting and increase testing accuracy, we use data augmentation [ ] , which enlarges the training set through label-preserving data transformations. the procedure extends the data by doubling the vectors from ( , ) to ( , ) where (resp. ) represents the vector size and represents the class number. the simple cnn used in our eeg drowsiness detection approach is shown in fig. . the proposed simple cnn model is composed of the following six main layers:
-the convolutional layers apply filters to the input signals and extract their features.
-the max-pooling- d blocks, a sample-based discretization step, sub-sample each input layer; reducing its dimensionality decreases the number of parameters to learn and thereby the computational cost.
our protocol is as follows: eight volunteers, four women and four men aged between twenty-six and fifty-eight, with normal mental health. for each participant, we make three recordings of sixteen minutes spread over three periods of the day (morning, afternoon and evening). to capture the condition of the participants, we split the signal into windows so that the different states can be identified accurately. the proposed simple cnn architecture for eeg signal classification is implemented with the keras deep learning library. the parameters filters, kernel_size, padding, kernel_initializer and activation of the four convolutional layers share the same values, respectively , , same, normal and relu. the parameter values of the remaining layers are as follows:
-the dropout layer value of . (resp. . ) is used to deactivate % (resp. %) of the neurons in order to prevent overfitting.
-the max-pooling d layer is used with a filter size of .
-the multi-dimensional output is flattened using a d flatten layer.
-for better classification results, two dropout layers are used.
the first hidden layer has neurons. since this is a binary classification problem, the second layer has . the choice of the optimization algorithm can make the difference between obtaining good results in minutes, hours or even days. there are various optimizers such as adam [ ] , sgd [ ] and rmsprop [ ] . in our model, we use the sgd optimizer, which is more popular [ ] . this optimizer is simple and effective for finding optimal values in a neural network. table presents the hyperparameter choices of our model. to select the configuration with the best accuracy, we compare the results obtained with different numbers of electrodes. in [ , ] , the authors find that the prefrontal and occipital cortex provide the most important channels for diagnosing the hypo-vigilance state. in this regard, we choose the following recorded data:
-data recorded by electrodes (o and o ) from the occipital area.
-data recorded by electrodes (t , t , o and o ) from the temporal and occipital areas.
-data recorded by electrodes (af , f , f , t , o , p , f ) from the prefrontal and occipital areas.
-data recorded by electrodes.
for the distribution of our data, we use % for training and % for testing. table presents the testing and training accuracy with two, four, seven and fourteen electrodes, respectively. after convergence, the optimum number of epochs for all electrode configurations is . the best results are given by the recording of electrodes from the occipital area. the testing and training curves for the data recorded by the o and o electrodes are shown in fig. . according to results obtained in fig. 
, we note that the test accuracy increases after a certain number of epochs while the test loss decreases. to assess our system's efficiency, we measured the precision, recall and f -score. table shows these measures for our experimental configuration. for comparison purposes, we compare the proposed method with a recent drowsiness detection method [ ] in which the authors propose driver hypovigilance detection using the emotiv epoc+ helmet. there, the common spatial pattern (csp) algorithm is used to optimize the accuracy of an extreme learning machine (elm). the values reported in table indicate that our method gives the best classification accuracy. the present work proposes a cnn-based approach for hypo-vigilance detection. in order to create an eeg dataset, we recorded raw eeg data using the epoc+ headset. the suggested system achieves an average classification accuracy of . % on a real dataset of eight participants. in future work, we will focus on improving classification accuracy with larger datasets. additionally, fusion with other biomedical signals should also be considered to improve the classification accuracy.
noise robustness analysis of performance for eeg-based driver fatigue detection using different entropy feature sets
fatigue detection in commercial flight operations: results using physiological measures
simulated sustained flight operations and performance, part : effects of fatigue
covid- pandemic by the "real-time" monitoring: the tunisian case and lessons for global epidemics in the context of pm strategies
electromyogram signal based hypovigilance detection
real-time ecg-based detection of fatigue driving using sample entropy
exploring neuro-physiological correlates of drivers' mental fatigue caused by sleep deprivation using simultaneous eeg, ecg, and fnirs data
muscle fatigue detections during arm movement using emg signal
a state-of-the-art survey on deep learning theory and architectures
-d convolutional neural networks for signal processing applications
drowsy driver detection using representation learning
driver drowsiness detection using condition-adaptive representation learning framework
analysis of performance metrics using emotiv epoc+
hybrid sparse regularization for magnetic resonance spectroscopy
sparse signal recovery using a bernoulli generalized gaussian prior
analysis of the meditation brainwave from consumer eeg device
a novel deep learning approach with data augmentation to classify motor imagery signals
deep learning for eeg data analytics: a survey
two classes classification using different optimizers in convolutional neural network
automatic microemboli characterization using convolutional neural networks and radio frequency signals
optimization of deep learning using various optimizers, loss functions and dropout
real time fatigue-driver detection from electroencephalography using emotiv epoc+
drowsiness analysis using common spatial pattern and extreme learning machine based on electroencephalogram signal
key: cord- -mtbo tnq authors: sun, yuliang; fei, tai; li, xibo; warnecke, alexander; warsitz, ernst; pohl, nils title: real-time
radar-based gesture detection and recognition built in an edge-computing platform date: - - journal: nan doi: . /jsen. . sha: doc_id: cord_uid: mtbo tnq in this paper, a real-time signal processing framework based on a ghz frequency-modulated continuous wave (fmcw) radar system to recognize gestures is proposed. in order to improve the robustness of the radar-based gesture recognition system, the proposed framework extracts a comprehensive hand profile, including range, doppler, azimuth and elevation, over multiple measurement-cycles and encodes them into a feature cube. rather than feeding the range-doppler spectrum sequence into a deep convolutional neural network (cnn) connected with recurrent neural networks, the proposed framework takes the aforementioned feature cube as input of a shallow cnn for gesture recognition to reduce the computational complexity. in addition, we develop a hand activity detection (had) algorithm to automate the detection of gestures in the real-time case. the proposed had captures the time-stamp at which a gesture finishes and feeds the hand profile of all the relevant measurement-cycles before this time-stamp into the cnn with low latency. since the proposed framework is able to detect and classify gestures at limited computational cost, it can be deployed for real-time applications on an edge-computing platform whose performance is notably inferior to that of a state-of-the-art personal computer. the experimental results show that the proposed framework has the capability of classifying gestures in real-time with a high f -score. radar sensors are widely used in many long-range applications for the purpose of target surveillance, such as in aircraft, ships and vehicles [ ] , [ ] . thanks to the continuous development of silicon techniques, various electric components can be integrated in a compact form at a low price [ ] , [ ] .
since radar sensors are becoming more and more affordable to the general public, numerous emerging short-range radar applications, e.g., non-contact hand gesture recognition, are gaining tremendous importance in efforts to improve the quality of human life [ ] , [ ] . hand gesture recognition enables users to interact with machines in a more natural and intuitive manner than conventional touchscreen-based and button-based human-machine-interfaces [ ] . for example, google has integrated a ghz radar into the smartphone pixel , which allows users to change songs without touching the screen [ ] . what's more, viruses and bacteria that survive on surfaces for a long time could contaminate the interface and cause health problems. for instance, in , tens of thousands of people have been infected with covid- by contacting such contaminated surfaces [ ] . (a video is available on https://youtu.be/ir nnzvzblk. this article will be published in a future issue of ieee sensors journal. doi: . /jsen. . ) radar-based hand gesture recognition allows people to interact with the machine in a touch-less way, which may reduce the risk of being infected with a virus in a public environment. unlike optical gesture recognition techniques, radar sensors are insensitive to the ambient light conditions; the electromagnetic waves can penetrate dielectric materials, which makes it possible to embed them inside devices. in addition, for privacy-preserving reasons, radar sensors are preferable to cameras in many circumstances [ ] . furthermore, computer vision techniques that extract hand motion information in every frame are usually not power efficient and are therefore not suitable for wearable and mobile devices [ ] . motivated by the benefits of radar-based touch-less hand gesture recognition, numerous approaches were developed in recent years.
the authors in [ ] , [ ] , [ ] extracted physical features from the micro-doppler signature [ ] in the time-doppler-frequency (tdf) domain to classify different gestures. li et al. [ ] extracted sparsity-based features from tdf spectrums for gesture recognition using a doppler radar. in addition to doppler information of hand gestures, the google soli project [ ] , [ ] utilized the range-doppler (rd) spectrums for gesture recognition via a ghz frequency-modulated continuous wave (fmcw) radar sensor. thanks to the wide available bandwidth ( ghz), their systems could recognize fine hand motions. similarly, the authors in [ ] - [ ] also extracted hand motions based on rd spectrums via an fmcw radar. in [ ] , [ ] , apart from the range and doppler information of hand gestures, the authors also considered the incident angle information by using multiple receive antennas to enhance the classification accuracy of their gesture recognition system. however, none of the aforementioned techniques exploited all the characteristics of a gesture simultaneously, i.e., range, doppler, azimuth, elevation and temporal information. for example, the systems in [ ] - [ ] could not differentiate two gestures that share similar range and doppler information. this restricts the design of gestures to be recognized. in order to classify different hand gestures, many research works employed artificial neural networks for this multiclass classification task. for example, the authors in [ ] , [ ] - [ ] considered the tdf spectrums or range profiles as images and directly fed them into a deep convolutional neural network (cnn). other research works [ ] , [ ] , [ ] considered the radar data over multiple measurement-cycles as a time-sequential signal, and utilized both cnns and recurrent neural networks (rnns) for gesture classification. the soli project [ ] employed a -dimensional ( -d) cnn with a long short-term memory (lstm) to extract both the spatial and temporal features, while the latern [ ] , [ ] replaced the -d cnn with a -d cnn [ ] followed by several lstm layers. because the -d cnn could extract not only the spatial but also the short-term temporal information from the rd spectrum sequence, it results in a better classification accuracy than the -d cnn [ ] . however, the proposed -d cnn, -d cnn and lstm for gesture classification require huge amounts of memory in the system, and are computationally inefficient. although choi et al. [ ] projected the range-doppler-measurement-cycles into range-time and doppler-time to reduce the input dimension of the lstm layer and achieved a good classification accuracy in real-time, the proposed algorithms were implemented on a personal computer with powerful computational capability. as a result, the aforementioned radar-based gesture recognition systems in [ ] , [ ] - [ ] , [ ] - [ ] are not applicable to most commercial embedded systems such as wearable devices and smartphones, in which both memory and computational power are limited. in this paper, we present a real-time gesture recognition system using a ghz fmcw radar in an edge-computing platform. the proposed system is expected to be applied in short-range applications (e.g., tablet, display, and smartphone) where the radar is assumed to be stationary to the user. the entire signal processing framework is depicted in fig. .
after applying the -dimensional finite fourier transform to the raw data, we select a certain number of points from the resulting rd spectrum as an intermediate step rather than directly putting the entire spectrum into deep neural networks. additionally, thanks to the l-shaped receive antenna array, the angle of arrival (aoa) information of the hand, i.e., azimuth and elevation, can be calculated. for every measurement-cycle, we store this information in a feature matrix with reduced dimensions. by selecting a few points from the rd spectrum, we reduce the input dimension of the classifier and limit the computational cost. further, we present a hand activity detection (had) algorithm called the short-term average/long-term average (sta/lta)-based gesture detector. it employs the concept of sta/lta [ ] to detect when a gesture comes to an end, i.e., the tail of a gesture. after detecting the tail of a gesture, we arrange the feature matrices belonging to the measurement-cycles that precede this tail into a feature cube. this feature cube constitutes a compact and comprehensive gesture profile which includes the features of all the dominant point scatterers of the hand. it is subsequently fed into a shallow cnn for classification. the main contributions are summarized as follows:
• the proposed signal processing framework is able to recognize more gestures ( gestures) than those reported in other works in the literature. the framework can run in real-time on an edge-computing platform with limited memory and computational capability.
• we develop a multi-feature encoder to construct the gesture profile, encoding range, doppler, azimuth, elevation and temporal information into a feature cube with reduced dimensions for the sake of data processing efficiency.
• we develop an had algorithm based on the concept of sta/lta to reliably detect the tail of a gesture.
• since the proposed multi-feature encoder has encoded all necessary information in a compact manner, it is possible to deploy a shallow cnn with a feature cube as its input to achieve a promising classification performance.
• the proposed framework is evaluated twofold: its performance is compared with the benchmark in the off-line scenario, and its recognition ability in the real-time case is assessed as well.
the remainder of this paper is organized as follows. section ii introduces the fmcw radar system. section iii describes the multi-feature encoder including the extraction of range, doppler and aoa information. in section iv, we introduce the had algorithm based on the concept of the sta/lta. in section v, we present the structure of the applied shallow cnn for gesture classification. in section vi, we describe the experimental scenario and the collected gesture dataset. in section vii, the performance is evaluated in both off-line and real-time cases. finally, conclusions are given in section viii. our ghz radar system adopts the linear chirp sequence frequency modulation [ ] to design the waveform. after mixing, filtering and sampling, the discrete beat signal consisting of I_T point scatterers of the hand in a single measurement-cycle from the z-th receive antenna can be approximated as [ ] : where the range and doppler frequencies f_ri and f_di are given as: respectively, r_i and v_ri are the range and relative velocity of the i-th point scatterer of the hand, f_B is the available bandwidth, T_c is the chirp duration, λ is the wavelength at ghz, c is the speed of light, the complex amplitude a_i^(z) contains the phase information, I_S is the number of sampling points in each chirp, I_C is the number of chirps in every measurement-cycle, and the sampling period is T_s = T_c/I_S. the ghz radar system applied for gesture recognition can be seen in fig. . it can also be seen that the radar system has an l-shaped receive antenna array.
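the signal model above can be illustrated with a small simulation. the sketch below is not the authors' implementation: it assumes the textbook chirp-sequence relations f_r = 2 f_B r / (c T_c) and f_d = 2 v_r / λ and a single-scatterer beat signal of the form a · exp(j2π(f_r u T_s + f_d v T_c)), whose sign conventions may differ from the paper's equations. all numeric values (bandwidth, chirp duration, carrier frequency, sample counts) are hypothetical placeholders, and a plain dft is used only to check where the peaks land.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def range_frequency(r, f_b, t_c):
    """Beat frequency of a scatterer at range r (assumed textbook relation)."""
    return 2.0 * f_b * r / (C * t_c)


def doppler_frequency(v_r, wavelength):
    """Doppler frequency for relative velocity v_r (assumed sign convention)."""
    return 2.0 * v_r / wavelength


def beat_signal(amplitude, f_r, f_d, i_s, i_c, t_c):
    """Single-scatterer beat signal b[v][u] for chirp v and fast-time sample u."""
    t_s = t_c / i_s  # sampling period T_s = T_c / I_S
    return [
        [amplitude * cmath.exp(2j * math.pi * (f_r * u * t_s + f_d * v * t_c))
         for u in range(i_s)]
        for v in range(i_c)
    ]


def dft_peak_bin(samples):
    """Index of the largest-magnitude bin of a plain (unwindowed) DFT."""
    n = len(samples)
    mags = [
        abs(sum(x * cmath.exp(-2j * math.pi * p * u / n)
                for u, x in enumerate(samples)))
        for p in range(n)
    ]
    return max(range(n), key=mags.__getitem__)


# hypothetical settings, NOT the paper's waveform parameters
F_B, T_C, I_S, I_C = 4e9, 1e-4, 32, 16
WAVELENGTH = C / 60e9
```

with these placeholder values, a scatterer placed exactly five range bins away and moving at three velocity bins produces dft peaks at bin 5 along fast time and bin 3 along the chirps, which is the behaviour the range-doppler processing in the next section relies on.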
to calculate the aoa in azimuth and elevation directions, the spatial distance between two receive antennas in both directions is d, where d = λ/ . a -d fft is applied to the discrete beat signal in ( ) to extract the range and doppler information in every measurement-cycle [ ] . the resulting complex-valued rd spectrum for the z-th receive antenna can be calculated as: where w(u, v) is a -d window function, and p and q are the range and doppler frequency indexes. the range and relative velocity resolution can be deduced as: where the range and doppler frequency resolutions ∆f_r and ∆f_d are 1/T_c and 1/(I_C T_c), respectively. to improve the signal-to-noise ratio (snr), we sum the rd spectrums of the three receive antennas incoherently, i.e., to obtain the range, doppler and aoa information of the hand in every measurement-cycle, we select the K points from rd(p, q) that have the largest magnitudes. the parameter K is predefined, and its choice will be discussed in section vii-a. then, we extract the range and doppler frequencies and the magnitudes of those K points, which are denoted as f_rk, f_dk and a_k, respectively, where k = 1, · · · , K. the aoa can be calculated from the phase difference of the extracted points at the same positions of the complex-valued rd spectrums belonging to two receive antennas. the aoa in azimuth and elevation of the k-th point can be calculated as: respectively, where ψ(·) stands for the phase of a complex value, and a_k^(z) is the complex amplitude B^(z)(f_rk, f_dk) from the z-th receive antenna. as a consequence, in every measurement-cycle, the k-th point in rd(p, q) has five attributes, i.e., range, doppler, azimuth, elevation and magnitude. as depicted in fig. , we encode the range, doppler, azimuth, elevation and magnitude of those K points with the largest magnitudes in rd(p, q) along I_L measurement-cycles into the feature cube v with dimension I_L × K × 5.
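the per-cycle encoding just described (incoherent sum, selection of the strongest points, aoa from phase differences) can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' code: the antenna spacing is assumed to be half a wavelength, the aoa is computed as arcsin(Δψ · λ / (2π d)) with an assumed sign convention, the antenna order (reference, azimuth neighbour, elevation neighbour) is hypothetical, and range and doppler are stored as bin indices rather than frequencies.

```python
import cmath
import math


def incoherent_sum(spectra):
    """rd[p][q] = sum over antennas of |B_z[p][q]| (magnitudes only, no phase)."""
    rows, cols = len(spectra[0]), len(spectra[0][0])
    return [[sum(abs(s[p][q]) for s in spectra) for q in range(cols)]
            for p in range(rows)]


def top_k_points(rd, k):
    """Return the k cells of rd with the largest magnitude as (p, q, magnitude)."""
    cells = [(rd[p][q], p, q) for p in range(len(rd)) for q in range(len(rd[0]))]
    cells.sort(reverse=True)
    return [(p, q, mag) for mag, p, q in cells[:k]]


def angle_from_phase(a_ref, a_other, spacing_over_wavelength=0.5):
    """Angle of arrival in radians from the phase difference of two antennas.

    The half-wavelength default spacing is an assumption of this sketch.
    """
    dpsi = cmath.phase(a_other * a_ref.conjugate())  # wrapped to (-pi, pi]
    x = dpsi / (2.0 * math.pi * spacing_over_wavelength)
    return math.asin(max(-1.0, min(1.0, x)))  # clamp against rounding


def encode_cycle(spectra, k):
    """One feature-matrix row set: (range bin, doppler bin, azimuth, elevation, magnitude)."""
    ref, az, el = spectra  # hypothetical antenna order
    rd = incoherent_sum(spectra)
    return [
        (p, q,
         angle_from_phase(ref[p][q], az[p][q]),
         angle_from_phase(ref[p][q], el[p][q]),
         mag)
        for p, q, mag in top_k_points(rd, k)
    ]
```

stacking the output of `encode_cycle` for consecutive measurement-cycles gives an array with one axis for cycles, one for the selected points and one for the five attributes, which is the layout of the feature cube.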
the v has five channels corresponding to the five attributes, and each element in v at the l-th measurement-cycle can be described as: where l = 1, · · · , I_L. similar to voice activity detection in automatic speech recognition systems, our gesture recognition system also needs to detect hand activity in advance, before forwarding the data to the classifier. this helps to design a power-efficient gesture recognition system, since the classifier is only activated when a gesture is detected rather than being kept active for every measurement-cycle. state-of-the-art event detection algorithms usually detect the start time-stamp of an event. for example, the authors in [ ] used the sta/lta and power spectral density methods to detect when a micro-seismic event occurs. in the case of radar-based gesture recognition, we could also theoretically detect the start time-stamp of a gesture and consider that a gesture event occurs within the following I_L measurement-cycles. however, detecting the start time-stamp and forwarding the hand data in the following I_L measurement-cycles to the classifier could cause a certain time delay, since the designed gestures usually differ in duration. as illustrated in fig. (a) , because the proposed multi-feature encoder requires I_L measurement-cycles and the duration of the gesture is usually shorter than I_L, a delay occurs if we detect the start time-stamp of the gesture. therefore, as depicted in fig. (b) , to reduce the time delay, our proposed had algorithm is designed to detect when a gesture finishes, i.e., the tail of a gesture, rather than detecting the start time-stamp. we propose a sta/lta-based gesture detector to detect the tail of a gesture.
the exponential moving average (ema) is used to detect the change of the magnitude signal at the l-th measurement-cycle, which is given as: where α ∈ [0, 1] is the predefined smoothing factor, x(l) is the range-weighted magnitude (rwm), and it is defined as: where a_max represents the maximal magnitude among the K points in rd(p, q) at the l-th measurement-cycle, f_rmax denotes the range corresponding to a_max, and the predefined coefficient β denotes the compensation factor. the radar cross section (rcs) of a target is independent of the propagation path loss between the radar and the target. according to the radar equation [ ] , the measured magnitude of a target is a function of many arguments, such as the path loss, rcs, etc. as deduced in ( ), we have built a coarse estimate of the rcs by multiplying the maximal range information with its measured magnitude to partially compensate the path loss. furthermore, we define sta(l) and lta(l) as the mean ema in the short and long windows at the l-th measurement-cycle: respectively, where L_1 and L_2 are the lengths of the short and long windows. the tail of a gesture is detected when the following conditions are fulfilled: where γ_1 and γ_2 are the predefined detection thresholds. fig. illustrates that the tails of two gestures are detected via the proposed sta/lta gesture detector. according to ( ) , one condition for detecting the tail of a gesture is that the average of rwm in the long window exceeds the threshold γ_1. it means that a hand motion appears in the long window. the other condition is that the ratio of the mean ema in the short window to that in the long window is lower than the threshold γ_2. in other words, it detects when the hand movement finishes. in practice, the parameters β, γ_1 and γ_2 in our had algorithm should be carefully chosen according to the application scenario. as discussed in section iii-d, the feature cube obtained by the multi-feature encoder has a dimension of I_L × K × 5.
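returning briefly to the detector of section iv, its logic can be sketched in a few lines. the code below is an illustration under several assumptions that are not confirmed by the text: the rwm is taken as a_max · f_rmax^β, the ema recursion is e(l) = α x(l) + (1 − α) e(l − 1), the short window covers the most recent cycles and the long window the cycles immediately before it, and only the first cycle of each run of detections is reported. the parameter names `gamma_activity` and `gamma_ratio` are hypothetical labels for the two thresholds.

```python
def range_weighted_magnitude(a_max, f_rmax, beta):
    """Assumed form of the RWM: magnitude scaled by range**beta."""
    return a_max * (f_rmax ** beta)


def ema(values, alpha):
    """Exponential moving average, e(l) = alpha*x(l) + (1-alpha)*e(l-1)."""
    out, prev = [], None
    for x in values:
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


def detect_tails(rwm, alpha, short_len, long_len, gamma_activity, gamma_ratio):
    """Return the cycle indices at which the tail of a gesture is first detected."""
    smoothed = ema(rwm, alpha)
    tails, armed = [], True
    for l in range(short_len + long_len - 1, len(smoothed)):
        short = smoothed[l - short_len + 1: l + 1]
        long_ = smoothed[l - short_len - long_len + 1: l - short_len + 1]
        sta = sum(short) / short_len
        lta = sum(long_) / long_len
        # condition 1: there was hand activity in the long window
        # condition 2: the recent activity has dropped relative to it
        fired = lta > 0 and lta >= gamma_activity and sta / lta <= gamma_ratio
        if fired and armed:
            tails.append(l)
        armed = not fired  # re-arm only once the conditions stop holding
    return tails
```

on a synthetic rwm sequence with two rectangular bursts of activity separated by silence, this sketch reports one tail shortly after the end of each burst, which is the moment at which the preceding measurement-cycles would be handed to the classifier.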
because the feature cube is already a multi-channel array, it can be fed to the cnn for classification without any reshaping operation. the structure of the cnn can be seen in fig. . we employ four convolutional (conv) layers, each of which has a kernel size of × , and the number of kernels in each conv layer is . in addition, the depth of the first kernel is five, since the input feature cube has five channels (i.e., range, doppler, azimuth, elevation and magnitude), while that of the other kernels in the following three conv layers is . we choose the rectified linear unit (relu) [ ] as activation function, since it alleviates the vanishing gradient problem and accelerates the convergence of training [ ] . then, the last conv layer is followed by two fully-connected (fc) layers, each of which has hidden units and is followed by a dropout layer that prevents the network from overfitting. the third fc layer with a softmax function is utilized as the output layer. the number of hidden units in the third fc layer is set equal to the number of classes in the dataset. the softmax function normalizes the output of the last fc layer to a probability distribution over the classes. through thorough network tuning (e.g., number of hidden layers, number of hidden units, depth number), we construct the cnn structure as shown in fig. . the designed network should (a) take the feature cube as input, (b) achieve a high classification accuracy, (c) consume few computational resources, and (d) be deployable in the edge-computing platform. in section vii, we will show that the designed network in fig. fulfills these criteria. as illustrated in fig. , we used the ghz fmcw radar in fig. to recognize gestures. our radar system has a detection range up to . m and an approx. ° antenna beam width in both azimuth and elevation directions. the parameter setting used in the waveform design is presented in table i , where the pulse repetition interval (pri) is ms.
the radar is connected to an edge-computing platform, i.e., nvidia jetson nano, which is equipped with a quad-core arm a at . ghz as central processing unit (cpu), a -core maxwell as graphics processing unit (gpu) and gb memory. we have built our entire radar-based gesture recognition framework described in fig. in the edge-computing platform in c/c++. the proposed multi-feature encoder and had have been implemented in a straightforward manner without any runtime optimization, while the implementation of the cnn is supported by tensorrt developed by nvidia. in addition, as depicted in fig we invited human subjects including both genders with various heights and ages to perform these gestures. among the subjects, the ages range from to years old, and the heights are from cm to cm. we divided the subjects into two groups. in the first group, ten subjects were taught how to perform the gestures in a normative way. in the second group, in order to increase the diversity of the dataset, only an example of each gesture was demonstrated to the other ten subjects, who then performed the gestures using their own interpretations. naturally, their gestures were no longer as normative as the ones performed by the ten taught subjects. furthermore, every subject repeated each gesture times. therefore, the total number of realizations in our gesture dataset is ( gestures)×( people)×( times), namely . we also found that the gestures performed in our dataset take less than . s. thus, to ensure that the entire hand movement of a gesture is included in the observation time, we set I_L to , which amounts to a duration of . s ( measurement-cycles × ms). in this section, the proposed approach is evaluated with a twofold objective: first, its performance is thoroughly compared with benchmarks in the literature through an off-line cross-validation, and secondly, its real-time capability is investigated with an on-line performance test.
in section vii-a, we discuss how the parameter K affects the classification accuracy. in section vii-b, we compare our proposed algorithm with the state-of-the-art radar-based gesture recognition algorithms in terms of classification accuracy and computational complexity based on leave-one-out cross-validation (loocv). that is, in each fold, we use the gestures from one subject as test set, and the rest as training set. in addition, section vii-c describes the real-time evaluation results of our system. the performances of taught and untaught subjects are evaluated separately. we randomly selected eight taught and eight untaught subjects as training sets, while the remaining two taught and two untaught subjects form the test sets. in the real-time performance evaluation, we performed the hardware-in-the-loop (hil) test, and fed the raw data recorded by the radar from the four test subjects into our edge-computing platform. a. determination of parameter K as described in section iii, we extract the K points with the largest magnitudes from rd(p, q) to represent the hand information in a single measurement-cycle. we define the average (avg.) accuracy as the avg. classification accuracy across the gestures based on loocv. in fig. , we let K vary from to , and compute the avg. accuracy in five trials. it can be seen that the mean avg. accuracy over five trials keeps increasing and reaches approx. % when K is . after that, increasing K barely improves the classification accuracy. as a result, in order to keep the computational complexity of the system low while achieving a high classification accuracy, we set K to . as a result, the feature cube v in our proposed multi-feature encoder has a dimension of × × .
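the subject-wise cross-validation protocol and the avg. accuracy defined above can be sketched as follows. this is a hedged illustration rather than the authors' evaluation code: the sample layout `(subject_id, features, label)` and the `fit` and `predict` callables are hypothetical, and the avg. accuracy is interpreted here as the mean of the per-gesture accuracies pooled over all folds.

```python
from collections import defaultdict


def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test), holding out one subject per fold.

    samples is a list of (subject_id, features, label) tuples (assumed layout).
    """
    subjects = sorted({subject for subject, _, _ in samples})
    for held_out in subjects:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def average_accuracy(samples, fit, predict):
    """Mean of the per-gesture accuracies, pooled over all folds."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for _, train, test in leave_one_subject_out(samples):
        model = fit(train)  # train on every subject except the held-out one
        for _, features, label in test:
            total[label] += 1
            correct[label] += int(predict(model, features) == label)
    return sum(correct[g] / total[g] for g in total) / len(total)
```

because a subject never appears in both the training and the test set of a fold, the score reflects how well the classifier generalizes to a person it has not seen, which is what matters for the untaught subjects discussed later.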
in the off-line case, we assumed that each gesture is perfectly detected by the had algorithm and compared our proposed multi-feature encoder + cnn with the -d cnn + lstm [ ] , the -d cnn + lstm [ ] , -d cnn + lstm (with aoa) and shallow -d cnn + lstm (with aoa) in terms of the avg. classification accuracy and computational complexity based on loocv. in our proposed multi-feature encoder + cnn, the feature cube v, which has the dimension of × × , was fed into the cnn described in fig. . the input of the -d cnn + lstm [ ] and the -d cnn + lstm [ ] is the rd spectrum sequence over measurement-cycles, which has the dimension of × × × . since [ ] did not include any aoa information in their system for gesture classification, the comparison might not be fair. thus, we added the aoa information according to ( ) and ( ) cnn but with reduced classification accuracy. to achieve a fair comparison, we optimized the structures and the hyperparameters as well as the training parameters of those models. the cnn shown in fig. in the proposed approach was trained for steps based on back propagation [ ] using the adam optimizer [ ] with an initial learning rate of × − , which was reduced to − , − and − after , and steps, respectively. the batch size is . ) classification accuracy and training loss curve: in table ii , we present the classification accuracy of each type of gesture based on the algorithms mentioned above. the avg. accuracies of the -d cnn + lstm [ ] and -d cnn + lstm [ ] are only . % and . %, respectively. since no aoa information is utilized, the rotate cw and rotate ccw gestures can hardly be distinguished, and similarly the four swipe gestures can hardly be separated either. in contrast, by considering the aoa information, the multi-feature encoder + cnn, the -d cnn + lstm (with aoa) and the shallow -d cnn + lstm (with aoa) are able to separate the two rotate gestures and the four swipe gestures. it should be mentioned that the avg. 
accuracy of our proposed multi-feature encoder is almost the same as that of the -d cnn + lstm (with aoa). however, it will be shown in the following section that our approach requires far fewer computational resources and much less memory than the other approaches. moreover, in fig. , we plot the training loss curves of the three neural network structures. the loss of the proposed cnn in fig. converges fastest among the three structures and approaches zero at around the -th training step. unlike the input of the -d cnn + lstm (with aoa) and the shallow -d cnn + lstm (with aoa), the feature cube contains sufficient gesture characteristics in spite of its compact form ( × × ). as a result, the cnn in fig. is easier to train than the other neural networks and achieves a high classification accuracy.

confusion matrix: in fig. , we plot two confusion matrices for ten taught and ten untaught subjects based on our proposed multi-feature encoder + cnn. for the normative gestures performed by the ten taught subjects, we reach approx. . % avg. accuracy. although the avg. accuracy degrades by approx. % in fig. (b) , where the gestures to be classified are performed by ten untaught subjects, it still reaches . %.

computational complexity and memory: the structures of the -d cnn + lstm (with aoa), the shallow -d cnn + lstm (with aoa) and the proposed multi-feature encoder + cnn are presented in table iii . we evaluated their computational complexity and required memory in terms of giga floating-point operations (gflops) and model size. the gflops of the different models were calculated with the built-in function in tensorflow, and the model size was read from tensorboard [ ] .
although the -d cnn + lstm (with aoa) offers almost the same classification accuracy as that of the proposed multi-feature encoder + cnn, it needs much more gflops than that of the multi-feature encoder + cnn ( . gflops vs. . gflops). its model size is also much larger than that of the proposed approach ( mb vs. . mb). although we could reduce its gflops using a shallow network structure, such as the shallow -d cnn + lstm (with aoa) in table iii , it results in the degradation of classification accuracy ( . %), as can be seen in table ii . we also found out that the cnn used in our approach has the least model size, since its input dimension is much smaller than that of other approaches. on the contrary, the input of the -d cnn + lstm (with aoa) contains lots of zeros due to the sparsity of rd spectrums. such large volumes usually need large amounts of coefficients in neural networks. whereas, we exploit the hand information in every measurement-cycle using only points, and the input dimension of the cnn is only × × , which requires much less computational complexity than the other approaches. as mentioned above, subjects are divided into taught and untaught groups, and each has ten subjects. in each group, eight subjects are randomly selected as training set, and the remaining two subjects constitute the test set, resulting in either group having true gestures in the test set. in the hil context, we directly fed the recorded raw data from the four test subjects into the edge-computing platform. in the realtime case, the system should be robust enough to distinguish true gestures from random motions (rms). thus, we also included a certain amount of rms as negative samples during the training phase. the scale of rms and true gestures is around : . 
precision, recall and f -score: to quantitatively analyze the real-time performance of our system, we introduce the precision, recall and f -score, which are calculated as

$$\text{precision} = \frac{tp}{tp + fp}, \qquad \text{recall} = \frac{tp}{tp + fn}, \qquad \text{f -score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}},$$

where tp, fp and fn denote the numbers of true positive, false positive and false negative estimates. for two subjects in the test set, we have realizations for each gesture, which means that tp + fn = . as presented in table iv , the avg. precision and recall over types of gestures using two taught subjects as test set are . % and . %, respectively, while those using two untaught subjects as test set are . % and . %. note that the off-line avg. accuracies in fig. , namely . % and . %, can also be regarded as the recall in the taught and untaught cases. compared with the off-line case, the recall in the real-time case degrades by approx. % and % for the taught and untaught subjects, respectively. the reason is that, in the off-line performance evaluation, we assumed that each gesture is detected perfectly. in the real-time case, however, the had missed some gestures or triggered the classifier before the gesture was completely finished, which reduces the recall. for example, due to the small movement of the hand, the had sometimes failed to detect the gesture "pinch index". similarly, the recall of the gesture "cross" is impaired, since this gesture has a turning point, which leads to a short pause. in some cases where the subject performs the gesture "cross" with low velocity, the had incorrectly treats the turning point as the end of "cross", resulting in a wrong classification. overall, in the taught and untaught cases, the f -score of our radar-based gesture recognition system reaches . % and . %, respectively.

detection matrix: we summarized the gesture detection results of our real-time system.
since we did not aim to evaluate the classification performance here, table v reports the detection results over all four test subjects. our system correctly detected true positive gestures and provoked false alarms among the total of test samples, in which there are true gestures and true negative rms, respectively. furthermore, we distinguish two types of miss-detections (mds): an md from the had means that the had fails to detect a gesture, while an md from the classifier means that the had detects the gesture but the classifier incorrectly rejects it as an rm. the false alarm rate (far) and miss-detection rate (mdr) of our system are . % and . %, respectively.

runtime: as listed in table vi , in the hil context we also recorded the avg. runtime of the multi-feature encoder, the had and the cnn over all classifications, which include true positives, true negatives, false alarms and mds from the classifier. the multi-feature encoder comprises the -d fft, the point selection, and the rd and aoa estimation. the multi-feature encoder and the had were executed on the cpu using unoptimized c/c++ code, while the cnn ran on the gpu using tensorrt. the multi-feature encoder and the had took only approx. . ms and . ms without any fft acceleration engine, while the cnn took only . ms on average. the overall runtime of our proposed radar-based gesture recognition system is only approx. ms.

we developed a real-time radar-based gesture recognition system built in an edge-computing platform. the proposed multi-feature encoder effectively encodes the gesture profile, i.e., range, doppler, azimuth, elevation and temporal information, as a feature cube, which is then fed into a shallow cnn for gesture classification. furthermore, to reduce the latency caused by the fixed number of required measurement-cycles in our system, we proposed the sta/lta-based gesture detector, which detects the tail of a gesture.
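as an illustration of the sta/lta idea mentioned above, the following is a minimal sketch of the classic short-term-average over long-term-average ratio on a 1-d non-negative energy signal. it is not the authors' had: the trailing-window form, the window lengths, the threshold of 0.5 and the synthetic signal are all assumptions made for this example.

```python
import numpy as np

def sta_lta(x, n_sta, n_lta, eps=1e-12):
    """Ratio of trailing short-term to trailing long-term average of x.
    Samples before the first full long window are set to 1.0 (neutral)."""
    x = np.asarray(x, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(x)))  # prefix sums for fast window means
    ratio = np.ones_like(x)
    for i in range(n_lta - 1, len(x)):
        sta = (c[i + 1] - c[i + 1 - n_sta]) / n_sta
        lta = (c[i + 1] - c[i + 1 - n_lta]) / n_lta
        ratio[i] = sta / (lta + eps)
    return ratio

# synthetic energy: quiet, then a "gesture" burst, then quiet again (assumed data)
energy = np.concatenate([np.full(50, 0.1), np.full(30, 1.0), np.full(50, 0.1)])
ratio = sta_lta(energy, n_sta=5, n_lta=20)

# the tail is flagged at the first sample where the ratio drops below an
# assumed threshold, i.e. the short window is much quieter than the long one
tail = int(np.argmax(ratio < 0.5))
```

in this toy signal the burst ends at sample 79 and the ratio falls below the threshold a few samples later, which is the kind of short delay a tail detector trades against robustness.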
in the off-line case, based on loocv, our proposed gesture recognition approach achieves . % and . % avg. accuracy using gestures from taught and untaught subjects, respectively. in addition, the trained shallow cnn has a small model size and requires few gflops. in the hil context, our approach achieves . % and . % f -scores with two taught and two untaught subjects as test sets, respectively. finally, our system can be built in the edge-computing platform and requires only approx. ms to recognize a gesture. thanks to its promising recognition performance and low computational complexity, the proposed radar-based gesture recognition system has the potential to be used in numerous applications, such as mobile and wearable devices. in future work, different gesture datasets with large diversity need to be constructed according to specific use cases. moreover, in use cases where the radar is not stationary relative to the user, the classification accuracy of the proposed system might decrease, and algorithms such as ego-motion compensation could be considered.
micro-doppler effect in radar: phenomenon, model, and simulation study millimeter-wave technology for automotive radar sensors in the ghz frequency band an ultra-wideband ghz fmcw radar system using a sige bipolar transceiver chip stabilized by a fractional-n pll synthesizer radar-based human-motion recognition with deep learning: promising applications for indoor monitoring radar signal processing for sensing in assisted living: the challenges associated with real-time implementation of emerging algorithms motion sensing using radar: gesture interaction and beyond google pixel and xl handson: this time, it's not about the camera persistence of coronaviruses on inanimate surfaces and its inactivation with biocidal agents gesture classification with handcrafted micro-doppler features using a fmcw radar soli: ubiquitous gesture sensing with millimeter wave radar hand gesture recognition based on radar micro-doppler signature envelopes hand gesture recognition using micro-doppler signatures with convolutional neural network sparsity-driven micro-doppler feature extraction for dynamic hand gesture recognition interacting with soli: exploring fine-grained dynamic gesture recognition in the radio-frequency spectrum ts-i d based hand gesture recognition method with radar sensor short-range radar based real-time hand gesture recognition using lstm encoder short-range radar-based gesture recognition system using d cnn with triplet loss hand-gesture recognition using two-antenna doppler radar with deep convolutional neural networks automatic radar-based gesture detection and classification via a region-based deep convolutional neural network u-deephand: fmcw radar-based unsupervised hand gesture feature learning using deep convolutional auto-encoder network latern: dynamic continuous hand gesture recognition using fmcw radar sensor riddle: real-time interacting with hand description via millimeter-wave sensor d convolutional neural networks for human action recognition 
multimodal gesture recognition using -d convolution and convolutional lstm comparison of the sta/lta and power spectral density methods for microseismic event detection new chirp sequence radar waveform a high-resolution framework for range-doppler frequency estimation in automotive radar systems two-dimensional subspace-based model order selection methods for fmcw automotive radar systems radar handbook rectified linear units improve restricted boltzmann machines empirical evaluation of rectified activations in convolutional network backpropagation applied to handwritten zip code recognition adam: a method for stochastic optimization tensorflow: a system for large-scale machine learning and the research institute for automotive electronics (e-lab) in collaboration with hella gmbh & co. kgaa, lippstadt, germany. his research interests are automotive radar signal processing, radar-based human motion recognition and machine learning collaboration with the signal processing group at tud, darmstadt, germany, where his research interest was the detection and classification of underwater mines in sonar imagery lippstadt, germany, where he is mainly responsible for the development of reliable signal processing algorithms for automotive radar systems xibo li received the b.sc. 
degree in mechanical engineering from beijing institute of technology his current research interests include automotive radar signal processing, machine learning and sensor fusion as a research associate at the institute for power electronic and electrical drives (isea), he was involved in several projects related to ageing of lithiumion batteries at the chair for electrochemical energy conversion and storage systems he joined the department of communications engineering of the university of paderborn in as a research staff member, where he was involved in several projects related to single-and multi-channel speech processing and automated speech recognition he is currently the head of the radar signal processing and signal validation department at hella gmbh & co. kgaa, lippstadt, germany. nils pohl (gsm' -m' -sm' ) received the dipl.-ing. and dr.-ing. degrees in electrical engineering from he has authored or coauthored more than scientific papers and has issued several patents. his current research interests include ultra-wideband mm-wave radar, design, and optimization of mm-wave integrated sige circuits and system concepts with frequencies up to ghz and above, as well as frequency synthesis and antennas. prof. pohl is a member of vde, itg, euma, and ursi. he was a corecipient of the the authors would like to thank the editor and anonymous reviewers for giving us fruitful suggestions, which significantly improve the quality of this paper. many thanks to the students for helping us collect the gesture dataset in this interesting work. key: cord- -r idtl authors: yasar, huseyin; ceylan, murat title: a new deep learning pipeline to detect covid- on chest x-ray images using local binary pattern, dual tree complex wavelet transform and convolutional neural networks date: - - journal: appl intell doi: . 
/s - - - sha: doc_id: cord_uid: r idtl in this study, which aims at early diagnosis of covid- disease using x-ray images, the deep-learning approach, a state-of-the-art artificial intelligence method, was used, and automatic classification of images was performed using convolutional neural networks (cnn). in the first training-test data set used in the study, there were x-ray images, of which were covid- and were non-covid- , while in the second training-test data set there were x-ray images, of which were covid- and were non-covid- . thus, classification results have been provided for two data sets, containing predominantly covid- images and predominantly non-covid- images, respectively. in the study, a -layer cnn architecture and a -layer cnn architecture were developed. within the scope of the study, the results were obtained using chest x-ray images directly in the training-test procedures and the sub-band images obtained by applying dual tree complex wavelet transform (dt-cwt) to the above-mentioned images. the same experiments were repeated using images obtained by applying local binary pattern (lbp) to the chest x-ray images. within the scope of the study, four new result generation pipeline algorithms having been put forward additionally, it was ensured that the experimental results were combined and the success of the study was improved. in the experiments carried out in this study, the training sessions were carried out using the k-fold cross validation method. here the k value was chosen as for the first and second training-test data sets. considering the average highest results of the experiments performed within the scope of the study, the values of sensitivity, specificity, accuracy, f- score, and area under the receiver operating characteristic curve (auc) for the first training-test data set were , , , , , , , and , respectively; while for the second training-test data set, they were , , , , , , , and , ; respectively. 
within the scope of the study, finally, all the images were combined and the training and testing processes were repeated for a total of x-ray images comprising covid- images and non-covid- images, by applying -fold cross. in this context, the average highest values of sensitivity, specificity, accuracy, f- score, and auc for this last training-test data set were found to be , , , , , , , and , ; respectively. in the last few months of , a new type of virus, which is a member of the family coronaviridae, emerged. the virus in question is considered to have had a zoonotic origin [ ] . the virus that emerged in the city of wuhan in hubei province in china affected this region first and then spread all over the world in a short time. the virus generally affects the upper and lower respiratory tract, lungs, and, less frequently, the heart muscles [ ] . while the virus generally affects young and middle-aged people and people who do not have any chronic diseases to a lesser extent, it can cause severe consequences, resulting in death, in people who suffer from diseases such as hypertension, cardiovascular disease, and diabetes [ ] . the epidemic, which was declared to be a pandemic in march by the world health organization; as of the first week of october of the same year, had a number of cases approaching thirty-six million, while the death toll reached one million hundred thousand. also, a modeling study carried out by hernandez-matamoros et al. [ ] indicates that the effects of the epidemic will become more severe in the future. in people suffering severely from the disease, the serious adverse effects are generally in the lungs [ ] . in this context, many literature studies have been carried out in a short time in which these effects of the disease in the lungs were shown using ct scans of lungs and chest x-ray imaging. 
literature studies indicate that radiological imaging, along with clinical symptoms, blood, and biochemical tests, is an effective and reliable diagnostic tool for the diagnosis of covid- disease. many clinical studies in which x-ray images were examined [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] have shown that covid- disease causes interstitial involvement, bilateral and irregular ground-glass opacities, parenchymal abnormalities, a unilobar reversed halo sign, and consolidation on the lungs. the recent review article published by long and ehrenfeld [ ] highlighted the importance of using artificial intelligence methods to quickly diagnose covid- disease and reduce the effects of the outbreak crisis. in this context, some literature studies have been carried out that diagnose covid- disease (covid- and non-covid- ) through x-ray images and using deep learning methods. table contains some summary information about the number of images, study methods, and study results used in these literature studies. ct imaging generally contains more data than x-ray imaging. however, it has some disadvantages for the follow-up of all stages of the disease due to the excess amount of radiation that the patients are exposed to. for this reason, an artificial intelligence application using x-ray images was created and tested in the study. in this study, which aims at early diagnosis of covid- disease with the help of x-ray images, a deep learning approach, which is an artificial intelligence method applying the latest technology, was used. in this context, automatic classification of the images was carried out through the two different convolutional neural networks (cnns). in the study, experiments were carried out for the use of images directly, using local binary pattern (lbp) as a pre-process and dual tree complex wavelet transform (dt-cwt) as a secondary operation, and the results of the automatic classification were calculated separately. 
within the scope of the study, four new classification approaches that involve performing the experiments together and combining the results through a result generation algorithm, have been proposed and tested. the results of the study show that in the diagnosis of covid- disease, the analysis of chest x-ray images using deep learning methods provides fast and highly accurate results. the chest x-ray images of patients with covid- used in the study were obtained by combining metadata data sets that were made open access over github after being created by cohen et al. [ ] and over kaggle after being created by dadario [ ] . the images that these data sets contain in common and the clinical notes related to these images were combined and a mixed covid- image data set consisting of chest x-ray images was created. in the study, images obtained while the patients were facing the x-ray device directly were used. in the studies, the images taken from the same patient were obtained on different days of the course of the disease and therefore do not contain exactly the same content. the dimensions of the images in question vary between px × px and px × px (px is pixel abbreviation) and show a wide variety. also, these images have different data formats such as png, jpg, jpeg and two different bit depths such as -bit (gray-level) and -bit (rgb). standardization of the images is an essential process for use in this study. in this context, all of the images have been converted to -bit gray-level images. then, to clarify the area of interest on the images, manual framing was performed so as to cover the chest area. after this process, all the images were rearranged to px × px and saved in png format. for the non-covid- x-ray images in the study, two data sets, a montgomery data set [ ] and a shenzhen data set [ ] , were used separately. these databases contain and non-covid- x-ray images, respectively. 
the first trainingtest data set contains a total of x-ray images, of which are covid- images and are non-covid- images, while the second training-test data set contains x-ray images, of which are covid- images and are non-covid- images. thus, it was ensured that classification results were obtained for the two data sets that contained predominantly covid- images and predominantly non-covid- images, respectively. the processes applied to the covid- images were likewise applied to the non-covid- images. in fig. , original and edited versions of the x-ray images are shown; one belonging to a patient with covid- and two belonging to people without covid- (non-covid- people). local binary pattern (lbp) is an approach that was proposed by ojala et al. [ ] to reveal local features. the method is basically based on comparing a pixel on the image to the neighboring pixels one by one, in terms of size. in fig. , the images obtained by applying the lbp operation to the x-ray images given in fig. are included. the purpose of benefiting from lbp operation within the scope of this study is to observe the effects of using lbp images, which reflect the local features in the cnn input on the study results, rather than the original images. additionally, the aim of the study is to increase the image feature depth used in the new result generation algorithm. dual tree complex wavelet transform (dt-cwt) was first introduced by kingsbury [ ] [ ] [ ] . this method is generally similar to the gabor wavelet transform. in the gabor wavelet transform, low-pass and high-pass filters are applied to the rows and columns of the image horizontally and vertically. in this way, two different sub-band groups are formed in rows and columns as low (l) and high (h). crossing is made during the conversion of the said one-dimensional bands into two dimensions. at the end of the process, a low sub-band, named ll, is obtained. in addition, three sub-bands containing high bands, lh, hl, and hh, are formed. 
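the row-and-column filtering that yields the ll, lh, hl and hh sub-bands can be sketched with a single-level separable decomposition. the sketch below uses haar filters purely as an assumption (the text does not name the filters), it is not the dt-cwt itself, and the lh/hl naming convention chosen here is one of several in use.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a separable 2-D Haar decomposition -> (LL, LH, HL, HH)."""
    x = np.asarray(img, dtype=float)
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]  # crop to even size
    s = np.sqrt(2.0)
    # low-pass / high-pass along each row, keeping every second sample
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # repeat along each column to cross the two 1-D bands into four 2-D bands
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh

rng = np.random.default_rng(0)
img = rng.random((8, 8))          # placeholder image, size is an assumption
ll, lh, hl, hh = haar_dwt2(img)   # each sub-band is half the size per axis
flat_ll, flat_lh, flat_hl, flat_hh = haar_dwt2(np.ones((8, 8)))
```

each sub-band has half the height and width of the input, and for a constant image all detail bands are zero, so only ll carries the content.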
further sub-bands (such as lll, llh) can be obtained by applying the same operations to the ll sub-band. unlike the gabor wavelet transform, which uses a single filter tree, dt-cwt uses two filter trees that work in parallel. these two trees produce the real and imaginary parts of complex coefficients. as a result, the dt-cwt process yields sub-bands covering more directions than the gabor wavelet transform. when dt-cwt is applied to an image, the processes are performed for six different directions, + , − , + , − , + , and − degrees. three of these directions represent real sub-bands and the other three represent imaginary sub-bands. figure shows the dt-cwt decomposition tree. fig. shows the real and imaginary sub-band images obtained by applying the dt-cwt process (scale = ) to the x-ray images given in fig. . within the scope of the study, the dt-cwt process was used with a scale (level) value of , and the dimensions of the sub-band images obtained were half the size of the original images. since the complex wavelet transform has been successful in many previous studies on medical images [ ] [ ] [ ] , this transform was preferred in the study.

deep learning has come to the fore in recent years as an artificial intelligence approach that provides successful results in many image processing applications, from image enhancement (such as [ ] ) to object identification (such as [ , ] ). the convolutional neural network (cnn) has been the preferred deep learning model in image processing applications in recent years. the cnn classifier, in general, consists of a convolution layer, activation functions, a pooling layer, a flatten layer, and fully connected layer components. in this context, fig. describes the general operation of the cnn classifier. more detailed information about the functions and operating modes of the layers in the cnn classifier can be found in the studies [ ] [ ] [ ] [ ] [ ] .

fig. a) x-ray image of a patient with covid- (phan et al. [ ] ) b) non-covid- x-ray image (montgomery data set [ ] ) c) non-covid- x-ray image (shenzhen data set [ ] )

within the scope of the study, a cnn architecture with a total of layers was designed. an efficient design was the aim, since increasing the number of layers in the cnn architecture increases the processing time of the training and classification processes. table contains details of the first cnn architecture used in the study. a second cnn architecture was also used to check whether the proposed pipeline approaches apply to other cnn architectures. for this purpose, an architecture modeled on vgg- cnn was used. however, to reduce the processing load, the number of filters and the fully connected layer sizes were reduced. additionally, normalization layers were added after the intermediate convolution layers. details of this second cnn architecture are given in table . matlab a was used as the software in the study. the layer names and parameters in tables and are the names and parameters used directly in the software. more than one experiment was carried out in the study, and the sizes of the input images used in the experiments differ. for this reason, the input layer has different sizes in tables and . these cnn architectures were used in all the experiments carried out within the scope of the study. the confusion matrix and the statistical parameters obtained from this matrix were used to evaluate the results. detailed information about the confusion matrix and the derived measures, i.e., sensitivity (sen), specificity (spe), accuracy (acc), and f- score (f- ), can be found in the studies [ ] . receiver operating characteristic (roc) analysis was also used to evaluate the results.
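the measures derived from the confusion matrix can be sketched as follows. the sketch assumes binary labels with 1 for covid- (abnormal) and 0 for non-covid- (normal), and it assumes the f- score is the usual harmonic mean of precision and sensitivity, since the digit is lost in the text; the function name and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and F-score from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a, b):
        # guard against empty classes; 0.0 is a convention chosen here
        return a / b if b else 0.0

    sen = ratio(tp, tp + fn)           # true positive rate
    spe = ratio(tn, tn + fp)           # true negative rate
    acc = ratio(tp + tn, len(pairs))
    prec = ratio(tp, tp + fp)
    f_score = ratio(2 * prec * sen, prec + sen)
    return {"sen": sen, "spe": spe, "acc": acc, "f": f_score}

# toy labels (assumed data): 4 positives, 4 negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

reporting sensitivity and specificity side by side matters for the two data sets used here, because accuracy alone can look high when one class dominates.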
in addition, the area under the roc curve (area under curve (auc)) was calculated. roc analysis plots the variation of sensitivity (sen) (y-axis) against 1 − spe (x-axis) as the decision threshold is stepped, with a certain precision, between the minimum and maximum outputs predicted by the classifier.

first of all, in the proposed pipeline algorithm, training and test procedures for images of size × were performed and results were obtained.
• before the experiments after the first experiment were conducted, dt-cwt was applied to the images of size
• in the third experiment, training and testing procedures were carried out and results were obtained for the case of giving the imaginary part of the ll sub-band image obtained by applying dt-cwt as input to the cnn.
• in the fourth experiment, training and testing procedures were carried out and results were obtained for the case of
• in the seventh experiment, results were obtained for the case of giving the real and imaginary parts of the ll, lh, hl sub-band images obtained by applying dt-cwt together as input to the cnn.

a block diagram of the experiments carried out in the study is shown in fig. . the first seven experiments were repeated using new images obtained by applying lbp to the x-ray images, which completed the first-stage experiments. since the image size decreases after lbp processing, these images were resized to px × px. in the next part of the study, four pipeline classification algorithms were designed using the principle of parallel operation. these algorithms combine the results of the previous experiments to obtain new results.
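the size reduction after lbp processing noted above can be seen in a minimal sketch of the basic 3 × 3 lbp operator. this is an illustrative version under stated assumptions (8 neighbours, "greater than or equal" comparison, clockwise bit order starting at the top-left neighbour); the exact variant used in the study is not given in the text.

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour local binary pattern on the interior pixels.
    Output is (H-2) x (W-2) because border pixels lack a full neighbourhood."""
    x = np.asarray(img)
    h, w = x.shape
    centre = x[1:-1, 1:-1]
    # neighbour offsets (dy, dx), clockwise from the top-left neighbour
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    out = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = x[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        # each neighbour that is >= the centre sets one bit of the code
        out += (neighbour >= centre).astype(np.uint8) * np.uint8(1 << bit)
    return out

flat = lbp_3x3(np.full((5, 5), 7))             # constant image
peak = lbp_3x3(np.array([[0, 0, 0],
                         [0, 10, 0],
                         [0, 0, 0]]))           # single bright centre pixel
```

a constant patch gives the all-ones code and an isolated bright pixel gives the all-zeros code, and in both cases the output loses one pixel on every border.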
the first two pipeline classification algorithms mentioned above work as follows:
• if the numbers of labeling (threshold value for , ) obtained in the experiments (with and without lbp) for an image are not equal to each other, the labeling result obtained in more than half of the experiments is taken as the algorithm's labeling result (covid- or non-covid- ).

the basic coding of the first two pipeline classification approaches is included in table . in the codes between tables and , result- and label- represent the actual test result and the label obtained without using lbp, while result- and label- represent the actual test result and the label obtained using lbp.

in the third and fourth pipeline algorithms, unlike the first two, when the labels obtained from the two classification experiments differ, the result obtained without applying lbp takes priority. accordingly, in the third pipeline algorithm, if the two labels differ and the label obtained without lbp is abnormal, the result is considered abnormal. in the fourth pipeline algorithm, if the two labels differ and the label obtained without lbp is normal, the result is considered normal. the other procedures are the same as for the first two pipeline algorithms. a mixing rate of % - % was applied in the third and fourth pipeline algorithms. the basic coding of the third and fourth pipeline classification approaches is given in tables and .

in this study, which aims to detect covid- disease early using x-ray images, a deep learning approach, a state-of-the-art artificial intelligence method, was used, and automatic classification of the images was performed using cnn.
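the priority rule of the third and fourth pipeline algorithms described above can be sketched as follows. this is not the code from the paper's tables, which is not reproduced in the text: the function name, the threshold of 0.5, the equal mixing weights and the weighted-score fallback used when the priority rule does not apply are all assumptions made for illustration. labels use 1 for abnormal and 0 for normal.

```python
def combine(result_1, result_2, priority_label, w1=0.5, threshold=0.5):
    """Combine a score without LBP (result_1) and with LBP (result_2).

    priority_label=1 mimics the third pipeline (abnormal wins on conflict),
    priority_label=0 mimics the fourth pipeline (normal wins on conflict).
    """
    label_1 = int(result_1 >= threshold)
    label_2 = int(result_2 >= threshold)
    if label_1 == label_2:
        return label_1                      # both experiments agree
    if label_1 == priority_label:
        return label_1                      # conflict: non-LBP result has priority
    # assumed fallback: mix the two scores and threshold the mixture
    mixed = w1 * result_1 + (1.0 - w1) * result_2
    return int(mixed >= threshold)

third = combine(0.6, 0.1, priority_label=1)   # conflict, non-LBP says abnormal
fourth = combine(0.6, 0.1, priority_label=0)  # conflict, falls back to mixing
```

the two variants differ only in which conflicting case is settled by the non-lbp result, which shifts the trade-off between sensitivity and specificity.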
in the first training-test data set used in the study, there were x-ray images, of which were covid- and were non-covid- , while in the second training-test data set there were x-ray images, of which were covid- and were non-covid- . thus, classification results were obtained separately for two data sets containing predominantly abnormal images and predominantly normal images, respectively. the information on the training-test data sets is given in table . within the scope of the study, the chest x-ray images were first manually framed to cover the lung region in order to determine the areas of interest on the image.

table : basic coding of the pipeline algorithms (pipeline- and - ) proposed in the study
table : basic coding of the pipeline algorithm (pipeline- ) proposed in the study

standardization was then carried out, since the images used were of very different sizes, formats, and bit depths. the areas of interest were resized so that the image size was px × px. after that, the images were saved in png format as gray-scale images with -bit depth. these operations were applied to all the abnormal and normal images used in the study. in the next part of the study, a -layer cnn architecture and a -layer cnn architecture, the details of which have been described previously, were designed and used. these cnn architectures were used in all the experiments; because more than one experiment was performed, only the sizes of the images given to the cnn input differ. in the experiments, training was carried out with the k-fold cross-validation method, and the k value was chosen as . since the first training-test data set consists of images, images, i.e., all except ten images at each stage (fold), were used for the training operations, and the remaining ten images were used for the testing operations.
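the k-fold procedure described above can be sketched as an index generator. the sketch is illustrative only: the sample count of 23, k = 5, the random shuffling and the seed are assumptions, and the original experiments were run in matlab rather than python.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    # array_split tolerates n_samples not divisible by k (fold sizes differ by 1)
    folds = np.array_split(perm, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=23, k=5))
fold_sizes = [len(test) for _, test in splits]
```

when the number of images is not a multiple of k, the folds simply differ in size by one image, and each image still appears in exactly one test fold.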
the second training-test data set consists of images, and, in the same way, except / ( groups consisting of images and seven groups of images) images, / images were used in the training operations, and the remaining / images were used in the testing operations. the test procedures were repeated times and classification results were obtained for all the images. finally, all the images were combined and the training and testing procedures were repeated by applying -fold cross validation for a total of x-ray images comprising covid-19 images and non-covid-19 images. considering the length of the study, the results shared here are only for the input data that provided the best results for the first and second data sets.

in this part of the study, a total of experiments were carried out. some initial weights and parameters in the cnn are randomly assigned. to make the results stable, each experiment was repeated five times and the average results are reported. the cpu time taken for an experiment to be completed entirely, including the training and testing, was divided by the total number of images processed, and the processing cpu time per image was measured. the experiments of this study were carried out using matlab (a) software running on a computer with gb ram and intel(r) xeon (r) cpu e - . ghz ( cpus).

table : basic coding of the pipeline algorithm (pipeline- ) proposed in the study

in the first experimental group within the scope of the study, the training and testing procedures were first performed using the chest x-ray images, and the results were obtained. lbp operation was then applied to the images in question, and then the training and testing procedures were repeated and the results were calculated.
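the two bookkeeping steps mentioned above, cpu time per image and averaging over five repetitions, can be sketched as below. the function names and the callable-based interface are assumptions for illustration; the study itself was run in matlab and its timing code is not shown in the text.

```python
import time


def cpu_time_per_image(run_experiment, n_images):
    """CPU seconds consumed by run_experiment(), divided by the number of images."""
    start = time.process_time()
    run_experiment()  # hypothetical callable covering both training and testing
    return (time.process_time() - start) / n_images


def mean_over_repeats(run_once, repeats=5):
    """Average a scalar result (e.g. accuracy) over several independent repetitions."""
    results = [run_once() for _ in range(repeats)]
    return sum(results) / len(results)
```

`time.process_time` counts cpu time of the current process rather than wall-clock time, which matches the quantity described in the text more closely than `time.time` would.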
finally, the results were calculated using the pipeline classification algorithms, the details of which were previously described and proposed within the scope of the study. due to the random assignment of some initial variables used in the internal structure of the cnn, each experiment group was repeated five times in order to make the results more stable. the image sizes given to the cnn as input for this experiment were × × . the results obtained from the experimental group are given in table (first training-test data set) and table (second training-test data set). in the second experimental group within the scope of the study, the training and testing procedures were performed using the real part of the ll sub-image obtained by applying dt-cwt to the chest x-ray images, and the results were obtained. then, the training and testing procedures were performed using the real part of the ll subimage obtained by applying the lbp and dt-cwt operations to the x-ray images, respectively. finally, the results were calculated using the pipeline classification algorithms, the details of which were previously described and proposed within the scope of the study. the image sizes given to the cnn as input for this experiment were × × . the results obtained from the experimental group are given in table (first training-test data set) and table (second training-test data set). in the third experimental group within the scope of the study, the training and testing procedures were performed using the imaginary part of the ll sub-image obtained by applying dt-cwt to the chest x-ray images, and the results were obtained. then, the training and testing procedures were performed using the imaginary part of the ll sub-image obtained by applying the lbp and dt-cwt operations to the x-ray images, respectively. finally, the results were calculated using the pipeline classification algorithms, the details of which were previously described and proposed within the scope of the study. 
the image sizes given to the cnn as input for this experiment were × × . the results obtained from the experimental group are given in table (first training-test data set) and table (second training-test data set). in the fourth experimental group within the scope of the study, the training and testing procedures were performed using the real part of the ll, lh and hl sub-images obtained by applying dt-cwt to the chest x-ray images, and the results were obtained. then, the training and testing procedures were performed using the real part of the ll, lh and hl sub-images obtained by applying the lbp and dt-cwt operations to the x-ray images, respectively. finally, the results were calculated using the pipeline classification algorithms, the details of which were previously described and proposed within the scope of the study. the image sizes given to the cnn as input for this experiment were × × . the results obtained from the experimental group are given in table (first training-test data set) and table (second training-test data set). in the fifth experimental group within the scope of the study, the training and testing procedures were performed using the imaginary part of the ll, lh and hl sub-images obtained by applying dt-cwt to the chest x-ray images, and the results were obtained. then, the training and testing procedures were performed using the imaginary part of the ll, lh and hl sub-images obtained by applying the lbp and dt-cwt operations to the x-ray images, respectively. finally, the results were calculated using the pipeline classification algorithms, the details of which were previously described and proposed within the scope of the study. the image in the sixth experimental group within the scope of the study, the training and testing procedures were performed using the real and imaginary parts of the ll sub-image obtained by applying dt-cwt to the chest x-ray images, and the results were obtained. 
then, the training and testing procedures were performed using the real and imaginary parts of the ll sub-image obtained by applying the lbp and dt-cwt operations to the x-ray images, respectively. finally, the results were calculated using the pipeline classification algorithms, the details of which were previously described and proposed within the scope of the study. the image sizes given to the cnn as input for this experiment were × × . the results obtained from the experimental group are given in table (first training-test data set) and table (second training-test data set).

in the seventh experimental group within the scope of the study, the training and testing procedures were performed using the real and imaginary parts of the ll, lh, hl sub-images obtained by applying dt-cwt to the chest x-ray images, and the results were obtained. then, the training and testing procedures were performed using the real and imaginary parts of the ll, lh, hl sub-images obtained by applying the lbp and dt-cwt operations to the x-ray images, respectively. finally, the results were calculated using the pipeline classification algorithms, the details of which were previously described and proposed within the scope of the study. the image sizes given to the cnn as input for this experiment were × × . the results obtained from the experimental group are given in table (first training-test data set) and table (second training-test data set).

finally, all the training-test data sets were combined to test the performance of the proposed method and the pipeline approaches. in this context, a collective training-test data set containing a total of x-ray images comprising covid-19 and non-covid-19 images was created. then the k value was determined as (cross training and testing for covid-19 and non-covid-19 images).
the training and testing processes were realized for the input images (original image and the ll (real sub-band)), ensuring the best results in the first and second training-test data sets. the results obtained are given in tables and . in this section, first of all, the results that were obtained without using pipeline algorithms are compared. when the results of the study given between tables and are examined within the scope of the study, it can be seen that the results of the study obtained without using lbp are generally better than the results of the study using lbp, for the same input image. in this context, it is understood that there are exceptions for the sensitivity parameter of some results obtained using the first cnn architecture for the first training-test data set. within the scope of the study, the highest mean sensitivity, specificity, accuracy, f- score, and auc values obtained without using the pipeline algorithms were, respectively; within the scope of the study, dt-cwt was used to reduce the image dimensions. in this way, dt-cwt tolerated the increase in result-producing time due to the use of the pipeline algorithm. in this context, when the results obtained using the original images and the ones obtained using dt-cwt are compared, it can be seen that there is no serious decrease in the results, in general. using dt-cwt, the image sizes were reduced successfully and a reduction in the result-producing times was achieved, in the study. the pipeline algorithms proposed within the scope of the study are based on combining the results obtained without using lbp and with using lbp, as detailed previously. after this stage, the study results obtained by using the pipeline algorithms were analyzed. with the introduction of the pipeline algorithms, improvements were achieved in all the parameters obtained by using both training-test data sets and the cnn architectures. 
in this context, an improvement was achieved in general, according to the highest results obtained without lbp and with using lbp, in terms of percentage ranging between , % and , % for the sensitivity parameter, between , % and , % for the specificity parameter, between % to , % for the accuracy parameter, between , % and , % for the f- score parameter, and between % and , % for the auc parameter. it was also observed that similar improvements were achieved for the experiments performed by combining all data and using -fold cross. in this context, according to the highest results obtained without lbp and with using lbp, an improvement was achieved generally in terms of percentage ranging between , % and , % for the sensitivity parameter, between , % and , % for the specificity parameter, between , % and , % for the accuracy parameter, between , % and , % for the f- score parameter, and between , % and , % for the auc parameter. when comparing the success of pipeline algorithms in improving the results in general, it can be seen that the algorithms of pipeline- and pipeline- obtain the highest sensitivity values; pipeline- obtains the highest specificity values; pipeline- and pipeline- obtain the highest accuracy values; pipeline- and pipeline- obtain the highest f- scores values; and pipeline- , pipeline- and pipeline- algorithms successfully obtained the highest auc values. when the input data with the best results obtained by using the pipeline algorithms are examined, it can be seen that using the real part of the ll sub-image band for the first training-test data set and using the original images for the second trainingtest data set provided the best results. experiments performed using the -fold cross by combining all the data also confirm this situation. for this reason, only the results of the experiments mentioned were included in the study, in consideration of the length of the study. 
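for reference, the sensitivity, specificity, accuracy and f-score parameters compared above follow directly from the confusion-matrix counts. the sketch below assumes that covid-19 is the positive class and that the f- score of the text is the usual f1 score; the function name and the example counts are hypothetical. auc is left out because it requires the classifier scores, not only the counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # true positive rate (recall)
    specificity = tn / (tn + fp)               # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }
```

with the made-up counts `tp=8, fp=2, tn=18, fn=2`, sensitivity is 8/10, specificity is 18/20 and accuracy is 26/30.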
the highest mean sensitivity, specificity, accuracy, f- score, and auc values obtained using the study pipeline algorithms are as follows, respectively; , , , , , , , , , for the first training-test data set and the first cnn architecture; , , , , , , , , , for the first training-test data set and the second cnn architecture; , , , , , , , , , for the second training-test data set and the first cnn architecture; and , , , , , , , , , for the second training-test and the second cnn architecture. the highest mean sensitivity, specificity, accuracy, f- score and auc values obtained in the experiments performed by combining all data and using the -fold cross were respectively; , , , , , , , , , for the first cnn architecture; and , , , , , , , , , for the second cnn architecture. within the scope of the study, the best results obtained before and after using the pipeline algorithm and the comparison of these results with the recent literature studies are given in table . as a result of our study on the automatic classification of chest x-ray images and using one of the deep learning methods, the cnn, some important and comprehensive test results were obtained for early diagnosis of covid- disease. when the results obtained within the scope of the study are compared with the literature studies detailed in tables and , the results of the study were found to be better than the out of the studies in which this value was calculated for the sensitivity parameter, than all the studies in which this value was calculated for the specificity parameter, than the out of the studies in which this value was calculated for the accuracy parameter, than the eight out of the nine studies in which this value was calculated for the f- score parameter, and than all the studies in which this value was calculated for the auc parameter. 
moreover, in terms of run-time, the proposed method produced a result at least three times faster than the study conducted by mohammed et al. [ ] and at least ten times faster than the study conducted by toraman et al. [ ] . these are the only two studies in which run-times were shared; no information was given about run-times in the other previous studies.

overall, the results obtained within the scope of the study lagged behind the results obtained in the studies conducted by tuncer et al. [ ] , benbrahim et al. [ ] , and loey et al. [ ] . however, for a more detailed comparison, the number of images used in these studies should be compared with the number used in our study, which is higher. in particular, the number of images used in our study is almost three times the number used by loey et al. [ ] . another important issue is the procedure for training and testing. there was no cross validation in the studies by benbrahim et al. [ ] and loey et al. [ ] . in our study, cross validation in the training-test processes is one of the important measures taken against the overfitting problem that occurs during the training of the network. cross validation is also known to improve the reliability of the results while balancing them. these issues should be taken into consideration when making a comparison.

when the results obtained by giving the images to the cnn as input directly are compared with those obtained after the lbp was applied, it can be seen that the images obtained by applying the lbp produced worse results than the original images.
however, the pipeline classification algorithm presented in this study enabled the results to be improved by combining the original and lbp-applied images. a significant part of the best results obtained in the study was provided using the pipeline classification algorithm. in this sense, the results of the study support some other literature studies [ ] [ ] [ ] [ ] [ ] [ ] where the cnn and lbp methods are used together and use of the lbp was shown to increase the success of the relevant study. the success achieved through the pipeline approaches is due to the fact that some classification results that could be revealed neither without the lbp alone nor with the lbp alone were revealed by using the two together. feeding the results from the two sources into the pipeline approaches increases the running time. however, the results obtained within the scope of the study show that this time cost can be eliminated by using dt-cwt. in this way, it has been observed that success can be increased significantly without time cost. it is considered that the model developed within the scope of the study can be used in many other deep learning studies.

[table: comparison of the best results of the study with recent literature studies. rows: [ ], ozturk et al. [ ], mohammed et al. [ ], khan et al. [ ], apostolopoulos and mpesiana [ ], waheed et al. [ ], mahmud et al. [ ], vaid et al. [ ], benbrahim et al. [ ], elaziz et al. [ ], martínez et al. [ ], loey et al. [ ], toraman et al. [ ], duran-lopez et al. [ ]]

it was evaluated that another important factor in achieving the successful results in this study was the framing process, which included the chest region and clarified the area of interest before the training and test procedures started.
hence, thanks to this pre-process carried out in this context, the parts lacking medical diagnostic information were removed from the images and only the relevant areas on the images were used in the procedures. as the size of the inputs given to the cnn increases, the time taken for the training and testing increases. the dt-cwt transformation used in the study reduces the size of the image by half. although the image sizes are reduced by half, there is no serious adverse effect on the study results. by contrast, some of the best results achieved in the study were obtained using the dt-cwt. in this context, although the pipeline classification algorithms proposed in the study increase the time to produce the results for the image, the times in question are less than half the time required for the images to be used directly without applying lbp and dt-cwt. also, all the training and test procedures provided in the study reflect the amount per image. however, approximately % of these periods are spent on the training procedures. in this context, in the case where the results obtained by the transfer learning approach are used with the pipeline classification algorithm proposed in the study, the periods mentioned will decrease accordingly. the pipeline algorithms revealed within the scope of the study were tested for data sets with different weights in terms of the number of covid- and non-covid- images, for different training-test ratios and different cnn architectures. the pipeline algorithms were successful for all these situations that may have affected the results. this shows that the proposed pipeline algorithms are not partial but are general solutions. from this point of view, it is obvious that if the pipeline algorithms mentioned above are added to the algorithms used in other literature studies, this would increase the success of these studies. 
the results of the study show that analyzing chest x-ray images in the diagnosis of covid- disease using deep learning methods will speed up the diagnosis and significantly reduce the burden on healthcare personnel. to further improve the results of the study, increasing the number of images in the training set, i.e., the creation of databases in which the clinical data of patients with covid- that are accessible to the public, is of prime importance. after this stage, it is aimed to realize applications using ct images of the lungs an important diagnostic tool, such as chest x-ray images, in covid- disease diagnosis. in addition, it is planned to analyze the effects of using the results obtained, through direct transfer learning in pipeline classification algorithms, on the study results. this is evaluated as another important application to classify the complex-valued sub-bands of images obtained by applying dt-cwt, with the help of using the complex-valued cnn directly. conflict of interest dr. ceylan declares that he has no conflict of interest. mr. yasar declares that he has no conflict of interest. ethical approval this article does not contain any studies with human participants or animals performed by any of the authors. 
a novel coronavirus from patients with pneumonia in china clinical features of patients infected with novel coronavirus in a review of coronavirus disease- forecasting of covid per regions using arima models and polynomial functions -novel coronavirus severe adult respiratory distress syndrome in two cases in italy: an uncommon radiological presentation featuring covid- cases via screening symptomatic patients with epidemiologic link during flu season in a medical center of central taiwan clinical characteristics of patients infected with sars-cov- in wuhan epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study a case of covid- and pneumonia returning from macau in taiwan: clinical course and anti-sars-cov- igg dynamic a locally transmitted case of sars-cov- infection in taiwan breadth of concomitant immune responses prior to patient recovery: a case report of non-severe covid- case of the index patient who caused tertiary transmission of covid- infection in korea: the application of lopinavir/ritonavir for the treatment of covid- infected pneumonia monitored by quantitative rt-pcr chest imaging appearance of covid- infection chest radiographic and ct findings of the novel coronavirus disease (covid- ): analysis of nine patients treated in korea clinical characteristics of imported cases of coronavirus disease (covid- ) in jiangsu province: a multicenter descriptive study coronavirus disease (covid- ): a perspective from china emerging novel coronavirus ( -ncov) pneumonia first case of novel coronavirus in the united states first case of coronavirus disease (covid- ) pneumonia in taiwan imaging profile of the covid- infection: radiologic findings and literature review evolution of ct manifestations in a patient recovered from novel coronavirus ( -ncov) pneumonia in wuhan first imported case of novel coronavirus in canada, presenting as mild pneumonia importation and human-to-human transmission of a 
novel coronavirus in vietnam the first vietnamese case of covid- acquired from china the role of augmented intelligence (ai) in detecting and preventing the spread of novel coronavirus an automated residual exemplar local binary pattern and iterative relieff based corona detection method using lung x-ray image application of deep learning for fast detection of covid- in x-rays using ncovnet automated detection of covid- cases using deep neural networks with x-ray images benchmarking methodology for selection of optimal covid- diagnostic model based on entropy and topsis methods coronet: a deep neural network for detection and diagnosis of covid- from chest x-ray images covid- : automatic detection from x-ray images utilizing transfer learning with convolutional neural networks covidgan: data augmentation using auxiliary classifier gan for improved covid- detection covxnet: a multidilation convolutional neural network for automatic covid- and other pneumonia detection from chest x-ray images with transferable multi-receptive feature optimization deep learning covid- detection bias: accuracy through artificial intelligence deep transfer learning with apache spark to detect covid- in chest x-ray images new machine learning method for image-based diagnosis of covid- performance evaluation of the nasnet convolutional network in the automatic identification of covid- within the lack of chest covid- x-ray dataset: a novel detection model based on gan and deep transfer learning convolutional capsnet: a novel artificial neural network approach to detect covid- disease from x-ray images using capsule networks covid-xnet: a custom deep learning system to diagnose and locate covid- in chest x-ray images deepcovid: predicting covid- from chest x-ray images using deep transfer learning covid- image data collection covid- x rays two public chest x-ray datasets for computer-aided screening of pulmonary diseases a comparative study of texture measures with classification based on 
featured distributions the dual-tree complex wavelet transform the dual-tree complex wavelet transform: a new efficient tool for image restoration and enhancement shift invariant properties of the dual-tree complex wavelet transform dual-tree complex wavelet transform and svd based medical image resolution enhancement a novel method for lung segmentation on chest ct images: complex-valued artificial neural network with complex wavelet transform blood vessel extraction from retinal images using complex wavelet transform and complex-valued artificial neural network improved adaptive image retrieval with the use of shadowed sets uncertainty-optimized deep learning model for small-scale person re-identification vehicle and wheel detection: a novel ssd-based approach and associated large-scale benchmark dataset kernel pooling for convolutional neural networks deep metric learning with angular loss a study on the cardinality of ordered average pooling in visual recognition data augmentation for eeg-based emotion recognition with deep convolutional neural networks a modified convolutional neural network for face sketch synthesis a novel comparative study for detection of covid- on ct lung images using texture analysis, machine learning, and deep learning methods a novel comparative study using multi-resolution transforms and convolutional neural network (cnn) for contactless palm print verification and identification a face recognition method based on lbp feature for cnn. in advanced information technology, electronic and automation control conference (iaeac) facial expression recognition algorithm basedon cnn and lbp feature fusion local binary convolutional neural networks. 
in: conference on computer vision and pattern recognition a novel face recognition algorithm based on the combination of lbp and cnn automated breast tumor diagnosis using local binary patterns (lbp) based on deep learning classification

publisher's note springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

key: cord- -aguwenwo authors: chatsiou, kakia title: text classification of manifestos and covid-19 press briefings using bert and convolutional neural networks date: - - journal: nan doi: nan sha: doc_id: cord_uid: aguwenwo

we build a sentence-level political discourse classifier using existing human expert annotated corpora of political manifestos from the manifestos project (volkens et al., a) and applying them to a corpus of covid-19 press briefings (chatsiou, ). we use manually annotated political manifestos as training data to train a local topic convolutional neural network (cnn) classifier; then apply it to the covid-19 press briefings corpus to automatically classify sentences in the test corpus. we report on a series of experiments with cnn trained on top of pre-trained embeddings for sentence-level classification tasks. we show that cnn combined with transformers like bert outperforms cnn combined with other embeddings (word2vec, glove, elmo) and that it is possible to use a pre-trained classifier to conduct automatic classification on different political texts without additional training.

a substantial share of citizen involvement in politics arises through written discourse especially in the digital space. through advanced, novel communication strategies, the public can play their part in constructing a political agenda, which has led politicians to increasingly use social media and other types of digital broadcasting to communicate (compared to mainstream press and traditional print media).
this is especially pertinent with crisis communication discourse and the recent covid- pandemic has created a great opportunity to study how similar topics get communicated in different countries and the narrative choices made by government and public health officials at different levels of governance (international, national, regional). to aid fellow scholars with the systematic study of such a large and dynamic set of unstructured data, we set out to employ a text categorization classifier trained on similar domains (like existing manually annotated sentences from political manifestos) and use it to classify press briefings about the pandemic in a more effective and scalable way. the main attraction behind using manually coded political manifestos (volkens et al., a) as training data is that the political science expert community have been manually collecting and annotating in a systematic way political parties' manifestos for years (since the s) around the world in order to apply content analysis methods and to advance political science. they have subsequently been used as training data in semi-supervised domain-specific classification tasks with good results (zirn et in this paper, we build variations of a cnn sentence-level political discourse classifier using existing annotated corpora of political manifestos from the manifestos project (volkens et al., a) . we test different cnn and word embedding architectures on the already annotated (english language) sentences of the manifestos project corpus. we then apply them to a corpus of covid- press briefings (chatsiou, ) , a subset of which was manually annotated by political scholars for the purposes of this work. the article is organised as follows: we first offer a brief overview of previous related work on the use of human expert annotated political manifestos for discourse classification. we then describe our framework including the training data used, data pre-processing performed and used architecture. 
we report on a series of experiments with cnn trained on top of pre-trained word vectors for sentence-level classification tasks. we conclude with evaluation of the bert+cnn architecture against other combinations (word2vec+cnn, glove+cnn, elmo+cnn) for both corpora. experimental results show that a cnn classifier combined with transformers like bert outperforms cnn combined with other embeddings (word2vec, glove, elmo).

the use of nlp methods to analyse political texts is a well-established field within political science and computational social science more generally (lazer et al., ; grimmer and stewart, ; benoit, laver, and mikhaylov, ) . researchers have used nlp methods to accomplish various classification tasks, such as political positioning on a left to right continuum (slapin and proksch, ; glavas, nanni, and ponzetto, ) , identification of political ideology differences from text

glavas, nanni, and ponzetto ( ) propose an approach for cross-lingual topical coding of sentences from electoral manifestos that uses cnns with word embeddings and induces a joint multilingual embedding space, taking as training data manually coded manifestos with a total of sentences in four languages (english, french, german and italian). they report that their multilingual classifier achieves better results than monolingual classifiers in english, french and italian, but worse results than a monolingual classifier in german. more recently, bilbao-jayo and almeida ( a) build a sentence classifier using multi-scale convolutional neural networks trained in seven different languages with sentences extracted from annotated parties' election manifestos. they use the full range of the domains defined by the manifestos project and they show that enhancing the multi-scale convolutional neural networks with context data improves their classification.
for a detailed discussion of different deep learning models for text classification and their technical contributions, similarities, and strengths, see chatsiou and mikhaylov ( ) and minaee et al. ( ).

using annotated political manifestos as the training dataset for classifying other types of political texts is gaining traction in the literature, especially with the boost in performance of deep learning methods for text. nanni et al. ( ) used expert annotated political manifestos in english and speeches to train a local supervised topic classifier (svm with a bag of words approach) that combines lexical with semantic textual similarity features at a sentence level. a sub-part of the training set was annotated manually by human experts, and the rest was labelled automatically with the global optimisation step performed via a markov logic network presented in zirn et al. ( ) . the advantage of such a domain transfer approach is that no manual topic annotation on the rest of the corpus is needed. they then classify the speeches from the , and us presidential campaign into the domains defined by the manifestos project, without the need for additional topic annotation.

bilbao-jayo and almeida ( b) used annotated political manifestos in spanish and the regional manifestos project taxonomy (alonso, gomez, and cabeza, ) to train a neural network sentence-level classifier (cnn) with word2vec word embeddings, also taking into account the context of the phrase (such as what was previously said and the political affiliation of the transmitter). they used this to analyse social media (twitter) data of the main spanish political parties during the and spanish general elections without the need for additional manual coding of the twitter data.
this paper builds on this area of research, presenting a cnn classifier trained on the manifestos project annotations for english and comparing more context-free (word2vec, glove, elmo) with context-sensitive (bert) word embeddings. we then apply this to a corpus of daily press briefings on the covid-19 status by government and public health authorities. the main attraction of using manually coded political manifestos (volkens et al., a) as training data is that the political science community has been systematically collecting and annotating political parties' manifestos for decades, in a combined effort to create a resource for systematic content analysis and to advance political science. the corpus is based on the work of the manifesto research group (mrg) and the comparative manifestos (cmp) projects (budge et al., ). classification annotations are described in the manifesto coding handbook, which has evolved over the years and provides information and instructions to the human annotators on how political parties' manifestos should be coded (latest version in volkens et al. ( b)). the handbook also includes a specific set of policy areas or 'domains' ( ) and subareas or 'subdomains' ( ) which are available to annotators to use (see figure ). for our training corpus, we use a subset of the corpus containing english manifestos with , annotated sentences. table shows the domain codes distribution in the dataset. the coronavirus (covid-19) press briefings corpus is a collection of daily briefings on the covid-19 status and policies from the uk and the world health organisation. the corpus is still in development, but we have selected example sentences from the uk and who, which were the ones available.
during the peak of the pandemic, most countries around the world informed their citizens of the status of the pandemic (usually involving an update on the number of infection cases and deaths) and of other policy-oriented decisions about dealing with the health crisis, such as advice about what to do to reduce the spread of the epidemic. at the moment the dataset includes briefings covering announcements between march and august from the uk (england, scotland, wales, northern ireland) and the world health organisation (who). word2vec uses a shallow neural network model to learn word associations from a large corpus of text. once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. glove is an unsupervised learning model for obtaining vector representations for words. this is achieved by mapping words into a meaningful space where the distance between words is related to semantic similarity. training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. we also obtained more context-sensitive word embeddings, namely elmo (peters et al., ) and bert (devlin et al., ). elmo is a deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). these word vectors are learned functions of the internal states of a deep bidirectional language model (bilm), which is pre-trained on a large text corpus.
they can be easily added to existing models and significantly improve the state of the art across a broad range of challenging nlp problems, including question answering, textual entailment and sentiment analysis. bert is a deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus. it includes a variant that uses the english wikipedia with . million words. unlike previous context-free models, which generate a single word embedding representation for each word in the vocabulary, bert takes into account the context of each occurrence of a given word, providing a contextualised embedding that is different for each sentence. since kim ( ) outlined the idea of using cnns (traditionally used for recognising visual patterns in images) for text classification, cnns have achieved very good performance in several text classification tasks (poria, cambria, and gelbukh, ; bilbao-jayo and almeida, b). cnns involve convolutional operations of moving frames or windows (filter sizes) which analyse and reduce different overlapping regions in a matrix, to extract different features. the ability to also bootstrap word embeddings in this type of neural network makes it an excellent candidate for extracting knowledge and classifying non-annotated texts. we therefore set up variations of the cnn classifier m , m , m , m as follows: . word vectors of the training dataset sentences are created using one of the following word embeddings: word2vec (m ), glove (m ), elmo (m ) and bert (m ). sentences are fed as sequences of words, then mapped to indexes, then to a sequence of word vectors. we have chosen as the word vector size and x d for the space where the convolution operations can be performed. vectors are fed to the neural network (cnn). we then perform convolution operations with filters and three different filter sizes ( x d, x d, and x d).
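the pipeline just described (words to indexes, indexes to word vectors, convolution with several filter sizes) can be illustrated with a small numpy sketch. this is not the authors' implementation: the vocabulary, the embedding size, the filter widths, the number of filters and the relu non-linearity are all hypothetical choices made for illustration, and random vectors stand in for the pre-trained embeddings. the last step applies the max-over-time pooling commonly paired with this kind of sentence cnn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy settings; the paper's actual sizes are not reproduced here.
EMBED_DIM = 8               # word vector size d
FILTER_WIDTHS = (2, 3, 4)   # number of consecutive words each filter size covers
NUM_FILTERS = 5             # filters per width

vocab = {"<unk>": 0, "we": 1, "will": 2, "protect": 3,
         "the": 4, "health": 5, "service": 6}
# Random stand-in for pre-trained vectors (word2vec, GloVe, ELMo or BERT).
embeddings = rng.normal(size=(len(vocab), EMBED_DIM))


def sentence_to_matrix(tokens):
    """Map words to indexes, then to a (sentence_length, d) matrix of vectors."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    return embeddings[ids]


def conv_feature_maps(sent, filters):
    """Slide each (width, d) filter over the sentence: one feature map per filter."""
    num_filters, width, _ = filters.shape
    steps = sent.shape[0] - width + 1
    maps = np.empty((num_filters, steps))
    for i in range(steps):
        window = sent[i:i + width]  # (width, d) slice of the sentence
        maps[:, i] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return np.maximum(maps, 0.0)    # ReLU, an assumed non-linearity


def sentence_features(tokens, filter_bank):
    """Keep the maximum of every feature map and concatenate across filter sizes."""
    sent = sentence_to_matrix(tokens)
    pooled = [conv_feature_maps(sent, f).max(axis=1) for f in filter_bank]
    return np.concatenate(pooled)


filter_bank = [rng.normal(size=(NUM_FILTERS, w, EMBED_DIM)) for w in FILTER_WIDTHS]
features = sentence_features("we will protect the health service".split(), filter_bank)
print(features.shape)
```

the resulting vector has one entry per filter, whatever the sentence length, which is what allows a fixed-size softmax layer to sit on top of it.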
we reduce the dimensionality of the feature maps generated by each group of filters using 1-max-pooling, and the pooled features are then concatenated (boureau, ponce, and lecun, ). a dropout rate of . is applied (srivastava et al., ) as regularisation to prevent overfitting. the layer with softmax computes the probability distribution over the labels. we perform optimization using the adam optimiser with the parameters of the original manuscript (kingma and ba, ). note that this is a sentence-level topic classifier that bases its predictions only on the information local to the sentence. in order to evaluate the different architectures, we divided our training dataset into different subsets: training and validation sets ( %) and test set ( %). as is typical, we used a validation set (or development set) separate from the test set, to ensure that our models do not overfit and that the evaluation of how each domain is classified is robust. we performed experiments, one for each combination of cnn and word embeddings (m : cnn with word2vec, and so on for the other embeddings). as shown in table , the performance of the classifier improves when more context-sensitive word embeddings are used. using bert with cnn (m ) provides a substantial increase in accuracy and f1, while elmo also performs very well. we also tested the performance of the same pre-trained models on the covid-19 corpus. we asked two political science scholars to annotate a subset of press briefings ( of each set), using the domains of the manifestos project. this resulted in a dataset of manually annotated sentences, with the domain distribution shown in table .
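accuracy and f1, the two scores used to compare the models, can be computed for a multi-class domain classifier as in the sketch below. the averaging scheme is an assumption: the text does not say whether f1 is macro-averaged, micro-averaged or weighted, so macro-averaging (the unweighted mean of per-domain f1) is shown purely as an example. the label strings and predictions are made up.

```python
def accuracy(y_true, y_pred):
    """Share of sentences whose predicted domain matches the annotation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-domain F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical annotations and predictions for six sentences.
y_true = ["economy", "welfare", "economy", "external", "welfare", "economy"]
y_pred = ["economy", "welfare", "welfare", "external", "welfare", "economy"]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

macro-averaging gives rare domains the same weight as frequent ones, which matters when the domain distribution is as uneven as the one reported for the manifestos corpus.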
note that the pre-trained models have been trained using the annotated manifestos from the manifestos project, without any additional training on the press briefings corpus sentences. as shown in table , the performance of the classifier also improves on the covid-19 press briefings corpus when more context-sensitive word embeddings are used. using bert with cnn (m ) provides a substantial increase in accuracy and f1, while elmo also performs very well. as expected there is some loss of accuracy, as we are porting the classifier to a slightly different domain of political text (from manifestos to press briefings). in this paper, we built a sentence-level political discourse classifier using existing human expert annotated corpora of english political manifestos from the manifestos project (volkens et al., a). we tested the accuracy and performance of a neural network classifier (cnn) using different word embeddings as part of the word to vector mapping, and we showed that sentence-level cnn classifiers combined with transformers like bert outperform models with other embeddings (word2vec, glove, elmo). we then applied the same pre-trained models to a different set of texts, the covid-19 press briefings corpus. we observe similar patterns in the accuracy and f1 scores, and additionally show that it is possible to use a pre-trained classifier to conduct automatic classification on different political texts without additional training. in the future, we aim to conduct similar experiments also considering the 'subdomain' categories of the manifesto corpus annotations. we also look forward to re-running these experiments for other languages in the manifestos project, testing the language-agnostic advantage of word embeddings and seeing whether we obtain different results. this paper follows the aaai publications ethics and malpractice statement and the aaai code of professional conduct.
we use publicly available text data to ensure transparency and reproducibility of the research. additionally, all code will be available as open source code (on github.com) at the end of the submission and reviewing process. the paper suggests ways to automatically extract topic information from political discourse texts, employing deep learning methods, which are usually associated with artificial intelligence and the ethical considerations around it. we do not envisage any ethical, social or legal concerns arising from the work outlined in this study, such as the impact of ai on humans, on economic growth or on inequality, amplifying bias, undermining political stability, or other issues described in recent reports on ethics in ai (see for example bird et al., ).
table : domain codes' distribution in the english subset of the manifestos corpus used for training the cnn classifier.
table : domain results of all models using political manifestos.
table : manifesto project domain codes' distribution in the manually annotated subset of the covid-19 corpus.
table : domain results of all models using covid-19
probabilistic latent semantic indexing
mapping policy preferences: estimates for parties, electors, and governments
latent dirichlet allocation
a deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus
automated classification of congressional legislation
a scaling model for estimating time-series party positions from texts
treating words as data with error: uncertainty in text statements of policy positions
life in the network: the coming age of computational social science
use of force and civil-military relations in russia: an automated content analysis
a theoretical analysis of feature pooling in visual recognition
affective news: the automated coding of sentiment in political texts
measuring centre-periphery preferences: the regional manifestos project
text as data: the promise and pitfalls of automatic content analysis methods for political texts
efficient estimation of word representations in vector space
measuring ideological proportions in political speeches
convolutional neural networks for sentence classification
glove: global vectors for word representation
dropout: a simple way to prevent neural networks from overfitting
deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis
crowd-sourced text analysis: reproducible and agile production of political data
entities as topic labels: combining entity linking and labeled lda to improve topic interpretability and evaluability
agreement and disagreement: comparison of points of view in the political domain
topfish: topic-based analysis of political position in us electoral campaigns
classifying topics and detecting topic shifts in political manifestos
understanding state preferences with text as data: introducing the un general debate corpus
cross-lingual classification of topics in political texts
adam: a method for stochastic optimization
building entity-centric event collections
automatic political discourse analysis with multi-scale convolutional neural networks and contextual data
political discourse classification in social networks using context sensitive convolutional neural networks
deep contextualized word representations
bert: pre-training of deep bidirectional transformers for language understanding
topic models meet discourse analysis: a quantitative tool for a qualitative approach
structural topic modeling for social scientists: a brief case study with social movement studies literature
the ethics of artificial intelligence: issues and initiatives. study, european parliament's panel for the future of science and technology, pe . . lu: publications office
covid-19 press briefings corpus. type: dataset
deep learning for political science
deep learning based text classification: a comprehensive review
manifesto project dataset. version number: a. type: dataset
the manifesto data collection. manifesto project (mrg/cmp/marpor). version
the author would like to acknowledge the support of the business and local government data research centre (es/s / ) funded by the economic and social research council (esrc) whilst undertaking this work.
key: cord- -qv pacau authors: polsinelli, matteo; cinque, luigi; placidi, giuseppe title: a light cnn for detecting covid-19 from ct scans of the chest date: - - journal: nan doi: nan sha: doc_id: cord_uid: qv pacau
covid-19 is a world-wide disease that has been declared a pandemic by the world health organization. computed tomography (ct) imaging of the chest seems to be a valid diagnosis tool to detect covid-19 promptly and to control the spread of the disease. deep learning has been extensively used in medical imaging, and convolutional neural networks (cnns) have also been used for classification of ct images. we propose a light cnn design based on the model of the squeezenet, for the efficient discrimination of covid-19 ct images from other ct images (community-acquired pneumonia and/or healthy images). on the tested datasets, the proposed modified squeezenet cnn achieved . % of accuracy, . % of sensitivity, . % of specificity, . % of precision and . of f1 score in a very efficient way ( . 
seconds on a medium-end laptop without gpu acceleration). besides performance, the average classification time is very competitive with respect to more complex cnn designs, thus allowing its use also on medium power computers. in the near future we aim at improving the performance of the method along two directions: 1) by increasing the training dataset (as soon as other ct images become available); 2) by introducing an efficient pre-processing strategy. figure : images extracted from dataset [ ]: a covid-19 image (a) and a non-covid-19 image also containing inflammations (b). coronavirus (covid-19) is a world-wide disease that was declared a pandemic by the world health organization on th march . to date, more than two million people have been infected and more than thousand have died. a quick diagnosis is fundamental to control the spread of the disease, and it increases the effectiveness of medical treatment and, consequently, the chances of survival without the need for intensive and sub-intensive care. this is a crucial point because hospitals have limited availability of equipment for intensive care. viral nucleic acid detection using real-time polymerase chain reaction (rt-pcr) is the accepted standard diagnostic method. however, many countries are unable to provide sufficient rt-pcr testing because the disease is very contagious, so only people with evident symptoms are tested. moreover, it takes several hours to furnish a result. therefore, faster and reliable screening techniques that could be further confirmed by the pcr test (or replace it) are required. computed tomography (ct) imaging seems to be a valid alternative to detect covid-19 [ ] with a higher sensitivity [ ] (up to % compared with % of rt-pcr). ct is likely to become increasingly important for the diagnosis and management of covid-19 pneumonia, considering the continuous increase in global cases.
early research shows a pathological pathway that might be amenable to early ct detection, particularly if the patient is scanned or more days after developing symptoms [ ]. nevertheless, the main bottleneck that radiologists experience in analysing radiography images is the visual scanning of small details. moreover, a large number of ct images have to be evaluated in a very short time, thus increasing the probability of misclassifications. this justifies the use of intelligent approaches that can automatically classify ct images of the chest. deep learning methods have been extensively used in medical imaging. in particular, convolutional neural networks (cnns) have been used both for classification and segmentation problems, also of ct images [ ]. though cnns have demonstrated promising performance in this kind of application, they require a lot of data to be correctly trained. in fact, ct images of the lungs can be easily misclassified, especially when the images being compared both contain damage due to pneumonia arising from different causes (figure ). until now, there are limited datasets for covid-19 and those available contain a limited number of ct images. for this reason, during the training phase it is necessary to avoid or reduce overfitting (which means that the cnn is not learning the discriminant features of covid-19 ct scans but only memorizing them). another critical point is that cnn inference requires a lot of computational power. in fact, cnns are usually executed on particularly expensive gpus equipped with specific hardware acceleration systems. however, expensive gpus are still the exception rather than the norm in normal computing clusters, which are usually cpu based [ ]. moreover, this type of machine may not be available in hospitals, especially in emergency situations and/or in developing countries.
in the present work, we aim at obtaining acceptable performance for an automatic method in recognizing covid-19 ct images of lungs while, at the same time, dealing with reduced datasets for training and validation and reducing the computational overhead imposed by more complex automatic systems. for this reason, in this work we started from the model of the squeezenet cnn, because it is able to reach the same accuracy as modern cnns but with fewer parameters [ ]. moreover, in a recent benchmark [ ], squeezenet achieved the best accuracy density (accuracy divided by number of parameters) and the best inference time. to date, some works on covid-19 detection from ct images are being published [ , , ]. all these works use heavy cnns (respectively resnet, inception and resnet) adapted to improve accuracy. in this work we developed, trained and tested a light cnn (based on the squeezenet) to discriminate between covid-19 and community-acquired pneumonia and/or healthy ct images. the hyper-parameters have been optimized with a bayesian method on two datasets [ , ]. in addition, class activation mapping (cam) [ ] has been used to understand which parts of the image are relevant for the cnn to classify it and to check that no over-fitting occurs. the paper is structured as follows: in the next section (materials and methods) the organization of the datasets, the processing equipment used and the proposed methodology are presented; section contains results and discussion, including a comparison with recent works on the same topic; finally, section concludes the paper and proposes future improvements. the datasets used herein are the zhao et al. dataset [ ] and the italian dataset [ ]. the zhao et al. dataset [ ] is composed of ct scans of covid-19 subjects and ct scans of other kinds of illnesses and/or healthy subjects. the italian dataset is composed of ct scans of covid-19. these datasets are continuously updated and their number of images keeps growing.
in this work we used two different arrangements of the datasets, one in which data from both datasets are used separately and the other containing data mixed from both datasets. the first arrangement contains two different test datasets (test- and test- ). in fact, the zhao dataset is used alone and divided into train, validation and test- . the italian dataset is integrated into a second test dataset, test- (table ), while the zhao dataset is always used in train, validation and test- (in test- , the non-covid-19 images of the zhao dataset are the same as in test- ). the first arrangement is used to check whether, even with a small training dataset, it is possible to train a cnn capable of working well also on a completely different and new dataset (the italian one). in the second arrangement, both datasets are mixed as indicated in table . in this arrangement the numbers of images from the italian dataset used for train, validation and test- are , and , respectively. the second arrangement represents a more realistic case in which both datasets are mixed to enlarge the training dataset as much as possible (at the expense of test- which, in this case, is absent). in both arrangements, the training dataset has been augmented with the following transformations: a rotation (with a random angle between and degrees), a scaling (with a random value between . and . ) and the addition of gaussian noise to the original image. to train and test the proposed cnns we used two hardware systems: 1) a high level computer with cpu intel core i - , ram gb and gpu nvidia geforce gtx with gb of dedicated memory; 2) a low level laptop with cpu intel core i processor, ram gb and no dedicated gpu. the first is used for hyperparameter optimization and to train, validate and test the cnns; the second is used just for testing, in order to demonstrate the computational efficiency of the proposed solution. in both cases we used the development environment matlab a.
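the three augmentation transformations listed above for the training set (random rotation, random scaling and additive gaussian noise) can be sketched as follows. the sketch is written in python with numpy rather than in the matlab environment the authors used, it uses nearest-neighbour resampling for brevity, and the angle range, scale range and noise level in the usage lines are hypothetical placeholders, since the actual ranges are not recoverable from the text.

```python
import numpy as np


def augment(image, angle_deg, scale, noise_std, rng):
    """Rotate and scale a 2-D image about its centre, then add Gaussian noise."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, locate the source pixel.
    dy, dx = (ys - cy) / scale, (xs - cx) / scale
    src_y = cy + np.cos(theta) * dy - np.sin(theta) * dx
    src_x = cx + np.sin(theta) * dy + np.cos(theta) * dx
    iy = np.rint(src_y).astype(int)   # nearest-neighbour sampling
    ix = np.rint(src_x).astype(int)
    inside = (iy >= 0) & (iy < h) & (ix >= 0) & (ix < w)
    out = np.zeros((h, w), dtype=float)   # pixels mapped from outside stay 0
    out[inside] = image[iy[inside], ix[inside]]
    return out + rng.normal(0.0, noise_std, size=(h, w))


rng = np.random.default_rng(0)
image = rng.random((64, 64))              # stand-in for a CT slice
augmented = augment(
    image,
    angle_deg=rng.uniform(0.0, 90.0),     # hypothetical angle range
    scale=rng.uniform(1.1, 1.3),          # hypothetical scale range
    noise_std=0.01,                       # hypothetical noise level
    rng=rng,
)
print(augmented.shape)
```

in practice a library routine with bilinear or bicubic interpolation would normally be preferred; the explicit inverse mapping is shown only to make the geometry visible.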
matlab integrates powerful toolboxes for the design of neural networks. moreover, with matlab it is possible to export the cnns in an open source format called "onnx", useful for sharing the cnns with the research community. when the high level computer is used, gpu acceleration is enabled in the matlab environment, based on the nvidia cuda cores provided by the gpu, which allow parallel computing. in this way we speed up the prototyping of the cnns. when final tests are performed on the low level hardware, no gpu acceleration is used. the squeezenet is capable of achieving the same level of accuracy as other, more complex cnn designs which have a huge number of layers and parameters [ ]. for example, squeezenet can achieve the same accuracy as alexnet [ ] on the imagenet dataset [ ] with x fewer parameters and a model size of less than . mb [ ]. the squeezenet is composed of blocks called "fire modules". as shown in figure .a, each block is composed of a squeeze convolution layer (which has x filters) feeding an expanding section of two convolution layers with x and x filters, respectively. each convolution layer is followed by a relu layer. the outputs of the relu layers of the expanding section are concatenated by a concatenation layer. to improve the training convergence and to reduce overfitting we added a batch normalization layer between the squeeze convolution layer and the relu layer (figure .b). each batch normalization layer adds % of computation overhead, and for this reason we chose to add them only before the expanding section, in order to make them more effective while, at the same time, limiting their number. moreover, we replaced all the relu layers with elu layers because, according to the literature [ ], elu networks without batch normalization significantly outperform relu networks with batch normalization. the squeezenet has fire modules in cascade configuration.
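a rough way to see why fire modules keep the parameter count low is to count the weights of one module and compare them with a plain convolution that produces the same number of output channels. the sketch below assumes the standard squeezenet layout (a 1x1 squeeze convolution feeding parallel 1x1 and 3x3 expand convolutions); the channel counts in the example are illustrative and are not taken from the proposed network, and the parameters of the added batch normalization layer are ignored.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square 2-D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def fire_module_params(in_channels, squeeze, expand_1x1, expand_3x3):
    """Parameters of one fire module: squeeze conv feeding two expand convs."""
    squeeze_p = conv_params(in_channels, squeeze, 1)
    expand1_p = conv_params(squeeze, expand_1x1, 1)
    expand3_p = conv_params(squeeze, expand_3x3, 3)
    return squeeze_p + expand1_p + expand3_p


def fire_module_out_channels(expand_1x1, expand_3x3):
    """The two expand outputs are concatenated along the channel axis."""
    return expand_1x1 + expand_3x3


# Illustrative channel counts (assumptions, not the paper's configuration).
fire = fire_module_params(in_channels=128, squeeze=16, expand_1x1=64, expand_3x3=64)
plain = conv_params(in_channels=128, out_channels=128, kernel_size=3)
print(fire, plain, round(plain / fire, 2))
```

the saving comes from the squeeze layer: the costly 3x3 filters only see the few squeezed channels instead of all input channels.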
however, two more complex architectures exist: one with simple bypass and another with complex bypass. the simple bypass configuration consists of skip connections added between fire module and fire module , fire module and fire module , fire module and fire module and, finally, between fire module and fire module . the complex bypass adds more skip connections (between the same fire modules) with a convolutional layer of filter size x . from the original paper [ ] it seems that the better accuracy is achieved by the simple bypass configuration. for this reason, in this work we test both squeezenet without any bypass (to have the most efficient model) and with simple bypass (to have the most accurate model), while the complex bypass configuration is not considered. besides, we also propose a further modified cnn (figure ) based on the squeezenet without any bypass. moreover, we added a transpose convolutional layer to the last custom fire module that expands the feature maps times along the width and height dimensions. these feature maps are concatenated in depth with the feature maps from the second custom fire module through a skip connection. a weighted sum is performed between them with a convolution layer with filters of size x . finally, all the feature maps are concatenated in depth and averaged with a global average pooling layer. this design allows combining spatial information (early layers) and feature information (last layers) to improve the accuracy. since we are using a light cnn to classify, the optimization of the training phase is crucial to achieve good results with a limited number of parameters. the training phase of a cnn depends strongly on the hyperparameter settings. hyperparameters are different from model weights: the former are set before the training phase, whereas the latter are optimised during the training phase. setting the hyperparameters is not trivial and different strategies can be adopted.
a first way is to select hyperparameters manually, though it would be preferable to avoid it because the number of different configurations is huge. for the same reason, approaches like grid search, which do not use past evaluations, are inefficient: a lot of time has to be spent evaluating bad hyperparameter configurations. instead, bayesian approaches, which use past evaluation results to build a surrogate probabilistic model mapping hyperparameters to a probability of a score on the objective function, seem to work better. in this work we used bayesian optimization for the following hyper-parameters:
1. initial learning rate: the rate used for updating weights during training;
2. momentum: this parameter influences the weight update by taking into account the update value of the previous iteration;
3. l2-regularization: a regularization term on the weights added to the loss function in order to reduce over-fitting.
. squeezenet with simple bypass but without transfer learning; . squeezenet with simple bypass and transfer learning;
regarding the arrangement , the results of the experiments are reported in table . for a better visualization of the results, we report just the best accuracy calculated with respect to all the attempts, the accuracy estimated by the objective function at the end of all attempts and the values of the hyperparameters. the best accuracy value is achieved with experiment # . both observed and estimated accuracy are the highest among all the experiments. for the original squeezenet [ ], there does not seem to be a relevant difference between the models without and with bypass. it is also interesting to note that using transfer learning (experiment # ) from the original weights of the squeezenet does not have a relevant effect. regarding the dataset arrangement , the results of the experiments are shown in table .
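the idea behind the bayesian approach described above (fit a surrogate model to past evaluations, then use it to choose the next configuration) can be sketched in a few lines. this is an illustrative toy and not the procedure the authors ran in matlab: the surrogate is a zero-mean gaussian process with an rbf kernel, the acquisition function is expected improvement, only one hyperparameter (the base-10 exponent of the learning rate) is searched, and `toy_validation_accuracy` is a made-up stand-in for training the cnn and measuring validation accuracy.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    k_oo = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    weights = np.linalg.solve(k_oo, k_on)       # K^-1 k(x_obs, x_new)
    mean = weights.T @ y_obs
    var = 1.0 - np.sum(k_on * weights, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected gain over the best score seen so far (maximisation)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def toy_validation_accuracy(log10_lr):
    """Hypothetical objective standing in for 'train the CNN, return accuracy'."""
    return 0.9 - 0.05 * (log10_lr + 3.0) ** 2


def bayesian_search(objective, low, high, n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(low, high, 200)
    tried = list(rng.choice(len(candidates), size=n_init, replace=False))
    scores = [objective(candidates[i]) for i in tried]
    for _ in range(n_iter):
        x_obs, y_obs = candidates[tried], np.array(scores)
        # Standardise the scores so they match the unit-variance GP prior.
        y_norm = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-9)
        mean, std = gp_posterior(x_obs, y_norm, candidates)
        ei = expected_improvement(mean, std, y_norm.max())
        ei[tried] = -np.inf                     # never re-evaluate a configuration
        nxt = int(np.argmax(ei))
        tried.append(nxt)
        scores.append(objective(candidates[nxt]))
    best = int(np.argmax(scores))
    return candidates[tried[best]], scores[best], len(scores)


best_log10_lr, best_score, n_evals = bayesian_search(toy_validation_accuracy, -5.0, -1.0)
print(best_log10_lr, best_score, n_evals)
```

unlike grid search, every new evaluation here depends on all previous ones through the surrogate, so poorly performing regions of the search space are visited less often.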
experiment # is still the best one, though experiment # is closer in terms of observed accuracy. however, we did not expect such a difference between the learning rate of experiment # in table and in table . moreover, the l2-regularization also changed a lot. this suggests that the cnn trained/validated on the dataset arrangement (that we call cnn- ) behaves differently from the cnn trained/validated on dataset arrangement (that we call cnn- ). however, the results shown in table and table suggest that the proposed cnn achieves better results when compared to different configurations of the original squeezenet. both cnn- and cnn- have been trained for more epochs, with a learning rate drop of . every epochs. after that, both cnns have been evaluated on the respective test- dataset with the following benchmark metrics: accuracy (measures the correct predictions of the cnn), sensitivity (measures the positives that are correctly identified), specificity (measures the negatives that are correctly identified), precision (measures the proportion of positive identifications that are actually correct) and f1 score (measures the balance between precision and recall). the results, shown in table , confirm the hypothesis of the previous section: cnn- and cnn- behave differently. this is clearly understandable by taking into account the sensitivity and specificity values. the cnn- has higher specificity ( . against . ), which means that it is better at recognizing non-covid-19 images. the cnn- has higher sensitivity ( . against . ), which means that it is better at recognizing covid-19 images. regarding the application of cnn- on test- , the results are disappointing. the accuracy reaches just . because the cnn is only able to recognize non-covid-19 images well (precision is . ) but has very poor performance on covid-19 images (sensitivity = . ). as stated before, the analysis of test- is very hard without a larger dataset of images.
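the five benchmark metrics defined above all follow from the four entries of a binary confusion matrix, with covid-19 taken as the positive class. the sketch below shows the formulas; the counts in the example are invented for illustration and are not results from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Benchmark metrics from confusion-matrix counts (positive = COVID-19)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)     # recall: positives correctly identified
    specificity = tn / (tn + fp)     # negatives correctly identified
    precision = tp / (tp + fp)       # positive predictions that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for a 200-image test set.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

reading sensitivity and specificity side by side, as done in the text, shows whether a classifier's errors fall mainly on covid-19 images or on the other images, something accuracy alone hides.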
in order to better understand the behaviour of cnn- and cnn- we used cam [ ], which gives visual explanations of the predictions of convolutional neural networks. this is useful to figure out what each cnn has learned and which part of the input of the network is responsible for the classification. it can be useful to identify biases in the training set and to increase model accuracy. with cam it is also possible to understand whether the cnns are overfitting. in fact, if the network has high accuracy on the training set but low accuracy on the test set, cam helps to verify whether the cnn is basing its predictions on the relevant features of the images or on the background. to this aim, we expect the activation maps to be focused on the lungs and especially on those parts affected by covid-19 (lighter regions with respect to healthy, darker zones of the lungs). figure shows examples of cams for each cnn and, to allow comparisons, we refer them to the same ct images (covid-19 diagnosed both by radiologists and by the cnns) extracted from the training dataset. for cnn- (figures .a, .b and .c), the activations are not localized inside the lungs. in figure .b the activations are just a little better than in figures .a and .c, because the red area is partially focused on the ill part of the right lung. the situation improves in the cams of cnn- (figures .d, .e, .f) because the activations are more localized on the ill parts of the lungs (this situation is perfectly represented in figure .f). figure shows examples of cams for each cnn (as in figure ) but with ct images of lungs not affected by covid-19 and correctly classified by both cnns. cnn- focuses on small isolated zones (figures .a, .b and .c): even if these zones are inside the lungs, it seems unreasonable to obtain a correct classification with so little information (and without having checked the rest of the lungs).
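for a network that ends with global average pooling followed by a linear classification layer, a class activation map is commonly computed as the weighted sum of the last convolutional feature maps, using the classifier weights of the class of interest. the sketch below shows this general formulation on random data; the array shapes and names are hypothetical, and the exact way cams were produced in the authors' matlab pipeline is not described in the text.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps.

    feature_maps: (K, H, W) activations of the last convolutional layer.
    class_weights: (K,) weights linking the pooled features to one class.
    """
    return np.tensordot(class_weights, feature_maps, axes=(0, 0))  # (H, W)


def normalise(cam):
    """Rescale a map to [0, 1] so it can be overlaid on the input image."""
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(4, 6, 6))   # hypothetical K=4 maps of size 6x6
weights = rng.normal(size=(2, 4))           # hypothetical 2-class linear layer

# Class scores as the network would compute them: pool first, then weight.
scores = weights @ feature_maps.mean(axis=(1, 2))

cam_covid = class_activation_map(feature_maps, weights[1])
heatmap = normalise(cam_covid)
print(heatmap.shape, float(cam_covid.mean()), float(scores[1]))
```

the mean of the unnormalised map equals the class score, so regions with high values are exactly those that push the prediction towards that class; in practice the map is upsampled to the size of the ct image before being overlaid.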
instead, in cnn- , the activations seem to take into consideration the whole region occupied by the lungs, as demonstrated in figures .d, .e and .f, which is a necessary step to correctly classify a lung ct image. in conclusion, cnn- behaves better than cnn- . since cnn- and cnn- have the same model design but different training datasets, we argue that the training dataset is responsible for their different behaviour. in fact, dataset arrangement- contains more training images (taken from the italian dataset) and cnn- seems to benefit from it. so, figure and figure suggest that the cnn model, even with a limited number of parameters, is capable of learning the discriminant features of this kind of image. therefore, enlarging the training dataset should also increase the performance of the cnn. we compare the results of our work (in particular cnn- ) with [ , , ] . since methods and datasets (training and test) differ, a rigorous quantitative comparison is difficult, but the respective results, summarized in table , still give a rough indication. the methods [ ] achieve better results than the method we propose. with respect to [ ] , our method achieves better results, especially regarding sensitivity which, in our method, is % higher: this suggests a better classification of covid- images. the average time required by our cnn to classify a single ct image is . seconds on our high-end workstation. as a comparison, the method in [ ] requires . seconds on a similar high-end workstation (intel xeon processor e - , gpu ram gb, gpu nvidia quadro m gb). on our medium-end laptop the cnn requires an average time of . seconds to classify a single image. this means that the proposed method could be used at scale on medium-end computers: a dataset of about images, roughly corresponding to patients [ ] , could be classified in about . hours.
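For reference, the class activation maps (cam) used in the analysis above can be sketched in a few lines. This is a minimal sketch that assumes the original cam setting, a last convolutional layer followed by global average pooling and one linear layer; the exact layer layout of the proposed cnn is not given here, so the names `feature_maps` and `class_weights` are hypothetical.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Return a normalised class activation map.

    feature_maps: array of shape (K, H, W), activations of the last
        convolutional layer for one image (assumed layout).
    class_weights: array of shape (K,), weights that connect the K
        globally averaged feature maps to the class of interest.
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    class_weights = np.asarray(class_weights, dtype=float)
    # Weighted sum over the channel axis gives one (H, W) heat map.
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))
    # Rescale to [0, 1] so maps of different images are comparable.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting map has the spatial size of the feature maps and is usually upsampled to the input resolution before being overlaid on the ct image.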
the improvement in efficiency of the proposed method with respect to the compared methods is shown in table , where the sensitivity value (the only metric reported by all the compared methods) is divided by the number of parameters used to reach it: the resulting ratio confirms that the proposed method is considerably more efficient than the others. in this study, we proposed a cnn design (starting from the model of the squeezenet cnn) to discriminate between covid- and other ct images (comprising both community-acquired pneumonia and healthy images). on both dataset arrangements, the proposed cnn outperforms the original squeezenet. in particular, on the test dataset the proposed cnn (cnn- ) achieved . % accuracy, . % sensitivity, . % specificity, . % precision and an f score of . . moreover, the proposed cnn is more efficient than other, more complex cnn designs. in fact, the average classification time is low both on a high-end computer ( . seconds for a single ct image) and on a medium-end laptop ( . seconds for a single ct image). this demonstrates that the proposed cnn is capable of analyzing thousands of images per day even with limited hardware resources. our next major goal is to improve the accuracy, sensitivity, specificity, precision and f score. to do so, since the cnn model seems to be robust, as shown by the cam tests, we aim at enlarging the training dataset as soon as new ct images become available. moreover, when we compared our method with those presented in [ , ] and in [ ] , we noticed that the last method, like ours, does not use pre-processing, unlike the first two. a possible explanation of the better results of methods [ , ] with respect to our method could be the use of pre-processing.
as a future work, we aim to study efficient pre-processing strategies that could improve accuracy while reducing computational overhead in order to preserve the efficiency. the role of ct in case ascertainment and management of covid- pneumonia in the uk: insights from high-incidence regions sensitivity of chest ct for covid- : comparison to rt-pcr efficient multiple organ localization in ct image using d region proposal network improving the speed of neural networks on cpus squeezenet: alexnet-level accuracy with x fewer parameters and< . mb model size benchmark analysis of representative deep neural network architectures a deep learning algorithm using ct images to screen for corona virus disease artificial intelligence distinguishes covid- from community acquired pneumonia on chest ct deep learning system to screen coronavirus disease pneumonia sirm dataset of covid- chest ct scan learning deep features for discriminative localization imagenet classification with deep convolutional neural networks imagenet: a large-scale hierarchical image database fast and accurate deep network learning by exponential linear units (elus) key: cord- -wpqdtdjs authors: qi, xiao; brown, lloyd; foran, david j.; hacihaliloglu, ilker title: chest x-ray image phase features for improved diagnosis of covid- using convolutional neural network date: - - journal: nan doi: nan sha: doc_id: cord_uid: wpqdtdjs recently, the outbreak of the novel coronavirus disease (covid- ) pandemic has seriously endangered human health and life. due to limited availability of test kits, the need for auxiliary diagnostic approach has increased. recent research has shown radiography of covid- patient, such as ct and x-ray, contains salient information about the covid- virus and could be used as an alternative diagnosis method. chest x-ray (cxr) due to its faster imaging time, wide availability, low cost and portability gains much attention and becomes very promising. 
computational methods with high accuracy and robustness are required for rapid triaging of patients and aiding radiologist in the interpretation of the collected data. in this study, we design a novel multi-feature convolutional neural network (cnn) architecture for multi-class improved classification of covid- from cxr images. cxr images are enhanced using a local phase-based image enhancement method. the enhanced images, together with the original cxr data, are used as an input to our proposed cnn architecture. using ablation studies, we show the effectiveness of the enhanced images in improving the diagnostic accuracy. we provide quantitative evaluation on two datasets and qualitative results for visual inspection. quantitative evaluation is performed on data consisting of , normal (healthy), , pneumonia, and , covid- cxr scans. in dataset- , our model achieves . % average accuracy for a three classes classification, % precision, recall, and f -scores for covid- cases. for dataset- , we have obtained . % average accuracy, and % precision, recall, and f -scores for detection of covid- . conclusions: our proposed multi-feature guided cnn achieves improved results compared to single-feature cnn proving the importance of the local phase-based cxr image enhancement. coronavirus disease (covid- ) is an infectious disease caused by severe acute respiratory syndrome coronavirus (sars-cov- ), a newly discovered coronavirus [ , ] . in march , the world health organization (who) declared the covid- outbreak a pandemic. up to now, more than . million cases have been reported across countries and territories, resulting in more than , deaths [ ] . early and accurate screening of infected population and isolation from public is an effective way to prevent and halt spreading of virus. currently, the gold standard method used for diagnosing covid- is real-time reverse transcription polymerase chain reaction (rt-pcr) [ ] . 
the disadvantages of rt-pcr include its complexity and problems associated with its sensitivity, reproducibility, and specificity [ ] . moreover, the limited availability of test kits makes it challenging to provide the sufficient diagnosis for every suspected patients in the hyper-endemic regions or countries. therefore, a faster, reliable and automatic screening technique is urgently required. in clinical practice, easily accessible imaging, such as chest x-ray (cxr), provides important assistance to clinicians in decision making. compared to computed tomography (ct) the main advantages of cxr are: enabling fast screening of patients, being portable, and easy to setup (can be setup in isolation rooms). however, the sensitivity and specificity (radiographic assessment accuracy) of cxr for diagnosing covid- is low compared to ct. this is especially problematic for identifying early stage covid- patients with mild symptoms. this causes larger intra-and inter-observer variability in reading the collected data by radiologists since qualitative indicators can be subtle. therefore, there is increased demand for computer aided diagnostic method to aid the radiologist during decision making for improved management of covid- disease. in view of these advantages and motivated by the need for accurate and automatic interpretation of cxr images, a number of studies based on deep convolutional neural networks (cnns) have shown quite promising results. ozturk et al. [ ] proposed a cnn architecture, termed darkcovidnet, and achieved . % three class classification accuracy. the method was evaluated on covid- , healthy and pneumonia cxr scans. covid- data was obtained from patients. wang et al. [ ] built a public dataset named covidx, which is comprised of a total of cxr images from patient case and developed covid-net, a deep learning model. their dataset had covid- images obtained from patients. their model achieved . 
% overall accuracy in classifying normal, pneumonia, and covid- scans. in [ ] a resnet- architecture was utilized to achieve a . % overall accuracy in classifying four classes, where pneumonia was split into bacterial pneumonia and viral pneumonia. however, there were only eight covid- cxr images used for testing. in [ ] , . % overall accuracy was reported on a dataset including normal, pneumonia and covid- scans. covid- data was collected from patients. in order to improve the performance of the proposed method, data augmentation was performed on the covid- dataset bringing the total covid- datasize to , . with data augmentation they have improved the overall accuracy . %. in [ ] , contrast limited adaptive histogram equalization (clahe) was used to enhance the cxr data. the authors proposed a depth-wise separable convolutional neural network (dscnn) architecture. evaluation was performed on normal, pneumonia, and covid- cxr scans. average reported multi-class accuracy was . %. number of patients for the covid- dataset was not available. in [ ] , a stacked cnn architecture achieved an average accuracy of . %. the evaluation dataset had covid- scans from patients, normal scans from patients, and pneumonia scans from patients. in [ ] , the reported multi-class average classification accuracy was s . %. the evaluation dataset included normal, pneumonia, and covid- cxr scans. the data was collected from various sources and patient information was not specified. in [ ] transfer learning was investigated for training the cnn architecture. the evaluation dataset included covid- , normal, and pneumonia images. . % average accuracy was reported for three-class classification. the average accuracy increased to . % if viral pneumonia was included in the evaluation. in [ ] , performance of three different, previously proposed, cnn architectures was evaluated for multi-class classification. with , covid- images, the study used the largest covid- dataset reported so far. 
average area under the curve (auc), for classification of covid- from regular pneumonia, was . [ ] . although numerous studies have shown the capability of cnns in effective identification of covid- from cxr images, none of these studies investigated local phase cxr image features as multi-feature input to a cnn architecture for improved diagnosis of covid- disease. furthermore, except [ , ] , most of the previous work was evaluated on a limited number of covid- cxr scans. in this work we show how local phase cxr features based image enhancement improves the accuracy of cnn architectures for covid- diagnosis. specifically, we extract three different cxr local phase image features which are combined as a multi-feature image. we design a new cnn architecture for processing multi-feature cxr data. we evaluate our proposed methods on large scale cxr images obtained from healthy subjects as well as subjects who are diagnosed with community acquired pneumonia and covid- . quantitative results show the usefulness of local phase image features for improved diagnosis of covid- disease from cxr scans. our proposed method is designed for processing cxr images and consists of two main stages as illustrated in figure : -we enhance the cxr images (cxr(x, y)) using local phase-based image processing method in order to obtain a multi-feature cxr image (m f (x, y)), and -we classify cxr(x, y) by designing a deep learning approach where multi feature cxr images (m f (x, y)), together with original cxr data (cxr(x, y)), is used for improving the classification performance. next, we describe how these two major processes are achieved. in order to enhance the collected cxr images, denoted as cxr(x, y), we use local phase-based image analysis [ ] . three different cxr(x, y) image phase features are extracted: -local weighted mean phase angle (lwp a(x, y)), -lwp a(x, y) weighted local phase energy (lp e(x, y)), and -enhanced local energy attenuation image (elea(x, y)). 
lp e(x, y) and lwp a(x, y) image features are extracted using monogenic signal theory where the monogenic signal image (cxr m (x,y)) is obtained by combining the bandpass filtered cxr(x, y) image, denoted as cxr b (x, y), with the riesz filtered components as: here h and h represent the vector valued odd filter (riesz filter) [ ] . α-scale space derivative quadrature filters (assd) are used for band-pass filtering due to their superior edge detection [ ] . the lwp a(x, y) image is calculated using: ). we do not employ noise compensation during the calculation of the lwp a(x, y) image in order to preserve the important structural details of cxr(x, y). the lp e(x, y) image is obtained by averaging the phase sum of the response vectors over many scales using: in the above equation sc represents the number of scales. lp e(x, y) image extracts the underlying tissue characteristics by accumulating the local energy of the image along several filter responses. the lp e(x, y) image is used in order to extract the third local phase image elea(x, y). this is achieved by using lp e(x, y) image feature as an input to an l norm based contextual regularization method. the image model, denoted as cxr image transmission map (cxr a (x, y)), enhances the visibility of lung tissue features inside a local region and assures that the mean intensity of the local region is less than the echogenicity of the lung tissue. the scattering and attenuation effects in the tissue are combined as: here ρ is a constant value representative of echogenicity in the tissue. in order to calculate elea(x, y), cxr a (x, y) is estimated first by minimizing the following objective function [ ] : in the above equation • represents element-wise multiplication, χ is an index set, and * is convolution operator. d j is calculated using a bank of high order differential filters [ ] . the filter bank enhances the cxr tissue features inside a local region while attenuating the image noise. 
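A minimal single-scale sketch of the monogenic signal construction described above. It assumes the standard frequency-domain form of the Riesz filters and substitutes a log-Gabor band-pass filter for the assd filters used by the authors. The parameters `wavelength` and `sigma_ratio` are illustrative and are not values from the paper. The multi-scale lwpa and lpe features and the elea regularization are not reproduced here.

```python
import numpy as np

def monogenic_signal(image, wavelength=8.0, sigma_ratio=0.55):
    """Return the band-passed image and its two Riesz-filtered components."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    u = np.fft.fftfreq(cols)[np.newaxis, :]   # horizontal frequencies
    v = np.fft.fftfreq(rows)[:, np.newaxis]   # vertical frequencies
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                        # avoid division by zero at DC

    # Log-Gabor band-pass filter centred at 1 / wavelength
    # (an assumed stand-in for the assd filter of the paper).
    log_gabor = np.exp(-(np.log(radius * wavelength)) ** 2
                       / (2 * np.log(sigma_ratio) ** 2))
    log_gabor[0, 0] = 0.0                     # remove the DC component

    # Vector-valued odd (Riesz) filter in the frequency domain.
    riesz_1 = -1j * u / radius
    riesz_2 = -1j * v / radius

    spectrum = np.fft.fft2(image) * log_gabor
    even = np.real(np.fft.ifft2(spectrum))
    odd_1 = np.real(np.fft.ifft2(spectrum * riesz_1))
    odd_2 = np.real(np.fft.ifft2(spectrum * riesz_2))
    return even, odd_1, odd_2

def local_energy_and_phase(even, odd_1, odd_2):
    """Single-scale local energy and local phase of a monogenic signal."""
    odd_magnitude = np.sqrt(odd_1 ** 2 + odd_2 ** 2)
    energy = np.sqrt(even ** 2 + odd_magnitude ** 2)
    phase = np.arctan2(odd_magnitude, even)
    return energy, phase
```

Because the band-pass filter removes the mean intensity and the phase depends only on ratios of the filter responses, a global rescaling of the image intensities leaves the local phase unchanged, which is the intensity invariance the method relies on.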
w j is a weighting matrix calculated using: equation the first part measures the dependence of cxr a (x, y) on lp e(x, y) and the second part models the contextual constraints of cxr a (x, y) [ ] . these two terms are balanced using a regularization parameter λ [ ] . after and is a small constant used to avoid division by zero [ ] . combination of these three types of local phase images as three-channel input creates a new multi-feature image, denoted as m f (x, y). qualitative results corresponding to the enhanced local phase images are displayed in figure . investigating figure we can observe that the enhanced local phase images extract new lung features that are not visible in the original cxr(x, y) images. since local phase image processing is intensity invariant, the enhancement results will not be affected from the intensity variations due to patient characteristics or x-ray machine acquisition settings. the multi-feature image m f (x, y) and the original cxr(x, y) image are used as an input to our proposed deep learning architecture which is explained in the next section. our proposed multi-feature cnn architecture consists of two same convolutional network streams for processing cxr(x, y) images and the corresponding m f (x, y) respectively. strategies for the optimal fusion of features from multi-modal images is an active area of research. generally, data is fused earlier when the image features are correlated, and later when they are less correlated [ ] . depending on the dataset, different types of fusion strategies outperform the other [ ] . in [ ] , our group has also investigated early, mid, and late-level fusion operations in the context of bone segmentation from ultrasound data. late-fusion operation has outperformed the other fusion operations. in [ ] , authors have also used late-fusion network, for segmenting brain tumors from mri data, has outperformed other fusion operations. 
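The two data-handling ideas above, stacking the three local phase images into one three-channel image and fusing the outputs of two network streams, can be sketched as follows. This is an illustration under assumptions: the function names, the channel order, and the `(C, H, W)` feature layout are hypothetical, and the fusion modes shown are common generic options rather than a description of the authors' implementation.

```python
import numpy as np

def build_multi_feature_image(lwpa, lpe, elea):
    """Stack three local phase images into one (H, W, 3) image."""
    channels = [np.asarray(c, dtype=float) for c in (lwpa, lpe, elea)]
    if len({c.shape for c in channels}) != 1:
        raise ValueError("all three feature images must share one shape")
    return np.stack(channels, axis=-1)

def fuse(stream_a, stream_b, mode="concat"):
    """Fuse two (C, H, W) feature tensors coming from two network streams."""
    stream_a = np.asarray(stream_a, dtype=float)
    stream_b = np.asarray(stream_b, dtype=float)
    if stream_a.shape != stream_b.shape:
        raise ValueError("both streams must produce tensors of equal shape")
    if mode == "sum":
        return stream_a + stream_b
    if mode == "max":
        return np.maximum(stream_a, stream_b)
    if mode == "average":
        return (stream_a + stream_b) / 2.0
    if mode == "concat":
        # Channel-wise concatenation doubles the channel count.
        return np.concatenate([stream_a, stream_b], axis=0)
    raise ValueError(f"unknown fusion mode: {mode}")
```

Applying `fuse` to intermediate feature maps corresponds to a mid-level fusion, while applying it to the outputs of the two encoders just before the classifier corresponds to a late fusion.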
in this work we design mid-fusion and late-fusion architectures (fig. ) . as part of this work we have also investigated several fusion operations: sum fusion, max fusion, averaging fusion, concatenation fusion, and convolution fusion. based on the performance of the fusion operations and fusion architectures in a preliminary experiment, we use the concatenation fusion operation for both of our architectures. we use the following network architectures as the encoder network: pretrained alexnet [ ] , resnet [ ] , sononet [ ] , xnet(xception) [ ] , inceptionv (inception-resnet-v ) [ ] and efficientnetb [ ] . pretrained alexnet [ ] and resnet [ ] have been incorporated into various medical image analysis tasks [ ] . sononet achieved excellent performance in both classification and localization tasks [ ] . xnet(xception) [ ] , inceptionv (inception-resnet-v ) [ ] and efficientnetb [ ] were chosen due to their outstanding performance on recent medical data classification tasks as well as on classification of covid- from chest ct data [ , ] . we use the following datasets to evaluate the performance of the proposed fusion network models: bimcv [ ] , covidx [ ] , and covid-cxnet [ ] . covid- cxr scans from the bimcv [ ] and covidx [ ] datasets were combined to generate the 'evaluation dataset' (table ) . for the normal and pneumonia datasets we have randomly selected a subset of images (from subjects) from the evaluation dataset (table ). in total images from each class (normal, pneumonia, covid- ) were used during -fold cross validation. table shows the data split for covid- data only. a similar split was also performed for the normal and pneumonia datasets. in order to provide additional testing for our proposed networks, we have designed a new test dataset which we call 'test dataset- ' ( table ). the images from normal and pneumonia cases which were not included in the 'evaluation dataset' were part of the 'test dataset- '.
furthermore, we have included all the covid- scans from covid-cxnet [ ] . in order to show the improvements achieved using our proposed multi-feature cnn architecture we also trained the same cnn architectures using only m f (x, y) or cxr(x, y) images. we refer to these architectures as mono-feature cnns. quantitative performance was evaluated by calculating average accuracy, precision, recall, and f -scores for each class [ , ] . the experiments were implemented in python using the pytorch framework. all models were trained using the stochastic gradient descent (sgd) optimizer, cross-entropy loss function, learning rate . for the first epoch and a learning rate decay of . every epochs with mini-batches of size . for local phase image enhancement, we have used sc = and the rest of the assd filter parameters were kept the same as reported in [ ] . for calculating elea(x, y) images we used λ = , = . , η = . , and ρ, the constant related to tissue echogenicity, was chosen as the mean intensity value of lp e(x, y). these values were determined empirically and kept constant during qualitative and quantitative analysis.
fig. : grad-cam images [ ] obtained by late fusion resnet architecture.
qualitative analysis: gradient-weighted class activation mapping (grad-cam) [ ] visualizations of normal, pneumonia, and covid- are presented as qualitative results in figure . investigating figure we can see the discriminative regions of interest localized in the normal, pneumonia, and covid- data. quantitative analysis of evaluation dataset: table shows the average accuracy of the -fold cross validation on the 'evaluation dataset' for mono-feature cnn architectures as well as the proposed multi-feature cnn architectures. a box and whisker plot is presented in figure . in most of the investigated network designs m f (x, y)-based mono-feature cnn architectures outperform cxr(x, y)-based mono-feature cnn architectures.
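The training schedule described earlier in this section (an initial learning rate that is multiplied by a decay factor every fixed number of epochs) corresponds to a step decay. The sketch below is plain python rather than the pytorch code used by the authors. The function name and the numeric values in the usage note are hypothetical, because the actual values are not legible in the text.

```python
def step_decay_lr(initial_lr, decay_factor, step_size, epoch):
    """Learning rate at a given epoch under a step-decay schedule.

    The rate is multiplied by `decay_factor` once every `step_size`
    epochs; epochs are counted from zero.
    """
    if step_size <= 0:
        raise ValueError("step_size must be a positive number of epochs")
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return initial_lr * decay_factor ** (epoch // step_size)
```

With the assumed values `initial_lr=0.01`, `decay_factor=0.1` and `step_size=10`, the rate stays at 0.01 for epochs 0 to 9 and drops to about 0.001 at epoch 10.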
the best average accuracy is obtained when using our proposed multi-feature resnet [ ] architecture. all multi-feature cnns with mid- and late-fusion operation compared with mono-feature cnns, with original cxr(x, y) images as input, achieved statistically significant difference in terms of classification accuracy (p< . using a paired t-test at % significance level). except sononet [ ] , xnet(xception) [ ] , and inceptionv (inception-resnet-v ) [ ] , all multi-feature cnns with mid-fusion operation compared with mono-feature cnns with m f (x, y) images as input show statistically significant difference in terms of classification accuracy (p< . using a paired t-test at % significance level). we did not find any statistically significant difference in the average accuracy results between the middle-level and late-fusion networks (p> . using a paired t-test at % significance level). figure presents confusion matrix results together with average precision, recall, and f -scores for all multi-feature late-fusion cnn architectures. one important observation from the presented results is that almost all the investigated multi-feature networks achieved very high precision, recall, and f -scores for covid- data, indicating that very few cases of other infection types were misclassified as covid- . quantitative analysis of test dataset- : multi-feature resnet provides the highest overall accuracy shown in table , which is consistent with the quantitative result achieved with the 'evaluation dataset'. figure shows a box and whisker plot for each network.
fig. : confusion matrix, and average precision, recall and f -scores obtained from -fold cross validation on 'evaluation data' using all multi-feature network models.
all multi-feature cnns with late-fusion operation compared with mono-feature cnns, with original cxr(x, y) images as input, achieved statistically significant difference in terms of classification accuracy (p< .
using a paired t-test at % significance level). except xnet(xception) [ ] , all the multi-feature cnns with mid-fusion operation compared with mono-feature cnns with original cxr(x, y) images as input achieved statistically significant difference in terms of classification accuracy (p< . using a paired t-test at % significance level). except xnet(xception) [ ] , all multi-feature cnns with mid-fusion operation compared with mono-feature cnns with m f (x, y) images as input show statistically significant difference in terms of classification accuracy (p< . using a paired t-test at % significance level). similar to the 'evaluation dataset' results, there was no statistically significant difference in the average accuracy results between the middle-level and late-fusion networks (p> . using a paired t-test at % significance level) except for the resnet [ ] and xnet(xception) [ ] architectures. confusion matrix results, together with average precision, recall and f -score values, for all multi-feature late-fusion cnn architectures evaluated are presented in figure . similar to the results presented for the 'evaluation dataset', high precision, recall, and f -score values are obtained for the covid- data. development of new computer aided diagnostic methods for robust and accurate diagnosis of covid- disease from cxr scans is important for improved management of this pandemic. in order to provide a solution to this need, in this work, we present a multi-feature deep learning model for classification of cxr images into three classes including covid- , pneumonia, and normal healthy subjects. our work was motivated by the need for enhanced representation of cxr images for achieving improved diagnostic accuracy. to this end we proposed a local phase-based cxr image enhancement method. we have shown that by using the enhanced cxr data, denoted as m f (x, y), in conjunction with the original cxr data, diagnostic accuracy of cnn architectures can be improved.
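The paired t-test used for the comparisons above works on per-fold accuracy differences between two models evaluated on the same folds. The sketch below computes only the test statistic and the degrees of freedom with the python standard library. The function name and the example scores are hypothetical, and the p-value lookup against a t distribution is left to a statistics package.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for two paired score lists.

    scores_a[i] and scores_b[i] must come from the same fold or split.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    if len(scores_a) < 2:
        raise ValueError("at least two pairs are required")
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(differences)          # sample standard deviation
    if spread == 0:
        raise ValueError("all differences are equal; t is undefined")
    t = mean(differences) / (spread / math.sqrt(len(differences)))
    return t, len(differences) - 1
```

With five folds the test has four degrees of freedom, so at an assumed two-sided 5% level the difference is significant when the absolute value of t exceeds roughly 2.776.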
our proposed multi-feature cnn architectures were trained on a large dataset in terms of the number of covid- cxr scans and have achieved improved classification accuracy across all classes. one very encouraging result is that the proposed models show high precision, recall, and f -scores on the covid- class for both testing datasets. in addition, except for alexnet [ ] , all multi-feature cnns with late fusion operation have fewer parameters than the corresponding multi-feature cnns with middle fusion operation ( figure ). since the image classifier of alexnet [ ] consists of three fully connected (fc) layers, which store the majority of the parameters, alexnet [ ] with late fusion operation has almost double the number of parameters compared with middle fusion operation. the rest of the networks have only one or no fc layer in the image classifiers. finally, compared to previously reported results, our work achieves the highest three class classification accuracy on a significantly larger covid- dataset (table ). this will ensure few false positive cases for the covid- detected from cxr images and will help alleviate the burden on the healthcare system by reducing the number of ct scans performed. while the obtained results are very promising, more evaluation studies are required, specifically for diagnosing early stage covid- from cxr images. our future work will involve the collection of cxr scans from early stage or asymptomatic covid- patients. we will also investigate the design of a cxr-based patient triaging system.
fig. : model size vs. overall accuracy
haghanifar et al.
[ ] unet+densenet training data: testing data: a review of coronavirus disease- (covid- ) coronavirus disease an interactive web-based dashboard to track covid- in real time detection of sars-cov- in different types of clinical specimens development of reverse transcription (rt)-pcr and real-time rt-pcr assays for rapid detection and quantification of viable yeasts and molds contaminating yogurts and pasteurized food products automated detection of covid- cases using deep neural networks with x-ray images covid-net: a tailored deep convolutional neural network design for detection of covid- cases from chest x-ray images covid-resnet: a deep learning framework for screening of covid from radiographs covidiagnosis-net: deep bayes-squeezenet based diagnostic of the coronavirus disease (covid- ) from x-ray images covidlite: a depth-wise separable deep neural network with white balance and clahe for detection of covid- stacked convolutional neural network for diagnosis of covid- disease from x-ray images covid-cxnet: detecting covid- in frontal chest x-ray images using deep learning covid- : automatic detection from x-ray images utilizing transfer learning with convolutional neural networks umls-chestnet: a deep convolutional neural network for radiological findings, differential diagnoses and localizations of covid- in chest x-rays localization of bone surfaces from ultrasound data using local phase information and signal transmission maps the monogenic signal α scale spaces filters for phase based edge detection in ultrasound images efficient image dehazing with boundary constraint and contextual regularization multimodal deep learning. 
in: icml a review: deep learning for medical image segmentation using multi-modality fusion automatic segmentation of bone surfaces from ultrasound using a filter-layer-guided cnn multi modal convolutional neural networks for brain tumor segmentation imagenet classification with deep convolutional neural networks deep residual learning for image recognition sononet: real-time detection and localisation of fetal standard scan planes in freehand ultrasound xception: deep learning with depthwise separable convolutions inception-v , inception-resnet and the impact of residual connections on learning efficientnet: rethinking model scaling for convolutional neural networks a survey on deep learning in medical image analysis identifying melanoma images using efficientnet ensemble: winning solution to the siim-isic melanoma classification challenge automatic detection of coronavirus disease (covid- ) in x-ray and ct images: a machine learningbased approach bimcv covid- +: a large annotated dataset of rx and ct images from covid- patients grad-cam: visual explanations from deep networks via gradient-based localization acknowledgements the authors are thankful to all the research groups, and national agencies worldwide who provided the open source x-ray images. funding: nothing to declare. conflict of interest the authors declare that they have no conflict of interest. key: cord- - a jfz authors: basly, hend; ouarda, wael; sayadi, fatma ezahra; ouni, bouraoui; alimi, adel m. title: cnn-svm learning approach based human activity recognition date: - - journal: image and signal processing doi: . / - - - - _ sha: doc_id: cord_uid: a jfz although it has been encountered for a long time, the human activity recognition remains a big challenge to tackle. recently, several deep learning approaches have been proposed to enhance the recognition performance with different areas of application. 
in this paper, we aim to combine a recent deep learning-based method with a traditional classifier that normally relies on hand-crafted feature extractors, in order to replace the hand-crafted feature extraction step with a learned one. to this end, we used a deep convolutional neural network that makes it possible to extract more powerful features from sequences of video frames. the resulting feature vector is then fed as an input to the support vector machine (svm) classifier to assign each instance to the corresponding label and thereby recognize the performed activity. the proposed architecture was trained and evaluated on the msr daily activity d dataset. compared to state-of-the-art methods, our proposed technique performs better. human activity recognition remains a very important research field in computer science because of its potential to provide adapted support for various applications such as human-computer interaction, ehealth applications and surveillance. nowadays, according to the method of feature extraction, a human activity recognition system can be classified as a classical or a deep model. a classical model is based on hand-crafted feature descriptors, which can be categorized in three types: local features, global features or a combination of the two. the global features treat the image as a whole to describe the entire human body motion, whereas the local features are extracted from a set of spatio-temporal interest points (stips) to describe the image patches of a human action. although global methods are able to represent more visual information by maintaining the spatio-temporal structures of the actions occurring in the video, they are very sensitive to background variations and partial occlusions. the local features treat the image as a set of small regions, which is computationally expensive in practice.
on the other hand, deep models using deep neural networks are a promising alternative in image analysis applications. the convolutional neural network (cnn) is considered one of the most successful deep models for image classification tasks. traditionally, to deal with such recognition problems, researchers had to precede their human activity recognition algorithms with a preprocessing stage that extracts a set of features using different types of descriptors such as hog d [ ] , extended surf [ ] and space time interest points (stips) [ ] before passing them to a specific classification algorithm such as hmm, svm or random forest [ ] [ ] [ ] . these approaches have been shown to lack robustness because of their poor performance and their time and memory requirements. recently, deep learning architectures have been employed to replace the feature engineering phase with automatic processing, in which deep neural networks are applied directly to the raw data to extract deep features without human intervention. since training a new cnn from scratch requires a huge amount of data and expensive computational resources, we used transfer learning and fine-tuned the parameters of a pretrained model. the initial cnn model was trained on a subset of the ilsvrc- of the large scale imagenet [ ] dataset. consequently, we decreased the training time and avoided overfitting by ensuring a suitable weight initialization given the quite small dataset used. in this study, we propose a human activity recognition method for video sequences using a cnn pretrained on the large scale imagenet dataset. the pretrained cnn extracts feature vectors that characterize frames from the raw data. the resulting deep sparse feature vectors are fed as input to a multi-class support vector machine to be classified.
since deep neural networks are more difficult to train, the residual learning approach underlying the resnet model was proposed to facilitate the training phase. the main contribution of the present work is a learning approach for human activity recognition, based on a cnn and an svm, that is able to classify activities in one shot. the proposed framework is trained and tested on a publicly available dataset, i.e., the msrdailyactivity d dataset [ ] . the obtained results show that the proposed method outperforms the state-of-the-art methods. the rest of this paper is organized as follows: sect. highlights some related works; in sect. , we describe our proposed approach; we present the experimental evaluation in sect. ; finally, in sect. , we conclude the paper. in human activity recognition, an activity has to be represented by a set of features. to represent complex activities, the authors of [ ] combined the histogram of oriented gradient (hog), the motion history image (mhi) and the foreground image (fi). the hog feature represents the magnitude and direction of corners and edges, the mhi feature characterizes motion direction, and the fi is obtained by background subtraction. all the resulting features are then merged and fed as input to a simulated annealing multiple instance learning support vector machine (smile-svm) classifier for human activity recognition. the work of [ ] extracted a motion space-time feature descriptor characterizing the video frames by combining the histogram of silhouette and the optical flow values. the first feature is obtained by background subtraction and the second is calculated using the lucas-kanade algorithm [ ] inside a normalized bounding box. a multi-class svm classifier was used to classify the activities. this system was designed to address the constraints of long training time and high feature vector dimensionality.
[ ] investigates a two-stream convnet architecture that includes spatial and temporal networks. in the spatial stream, action recognition is performed from rgb video frames, whereas in the temporal stream, actions are recognized from motion information obtained by stacking dense optical flow between consecutive frames. both streams are implemented as convnets and are finally combined by late fusion. two fusion methods were considered: fusion by averaging and fusion by a multi-class linear svm on softmax scores. the purpose of [ ] is to classify human actions from videos into different classes. the process consists of extracting interest points from each video, segmenting images and constructing motion history images. after selecting discriminating features and representing images by visual words, a histogram of visual words is built from the features extracted from the motion history images. finally, the extracted feature vectors are used to train a support vector machine for action classification. [ ] proposed a system that recognizes abnormal behavior and sends an alert to the appropriate user on his android mobile phone. features are extracted with the scale invariant feature transform (sift) descriptor from each video after dividing it into a number of frames. the extracted features are then used as input to two different types of classifiers, i.e., the k nearest neighbor (knn) and the support vector machine (svm), to classify the actions. as recent works [ , , ] have shown, deep hierarchical visual feature extractors currently outperform traditional hand-crafted descriptors, and are more generalizable and accurate when dealing with high levels of inherent noise. to describe the activities in a frame-wise way, we chose the cnn approach based on rgb data because of its widespread application in different areas.
cnns are also advantageous because they reduce the number of parameters and connections used in an artificial neural model, which facilitates their training phase. in this step, the question is how to represent the human actions in each frame extracted from the video. to extract the most pertinent and significant features from the raw rgb video frames, we employed a deep cnn architecture with parameters pre-trained on imagenet. the original cnn was trained on the . m high-resolution images of the ilsvrc classification training subset of the imagenet dataset. in the proposed method, however, we used a deep cnn architecture to generate a probability vector for each input frame, which represents the probability of the presence of different objects in that frame. a resnet model with parameters pre-trained on the imagenet database is applied to extract sparse and pertinent residual feature representations from the frames of each video sequence. the architecture is composed of several resnet blocks, each three layers deep, and comprises five composite convolutional layers with small kernels of size × , × and × . the network takes an input of size × , which is reduced five times in the network by a stride of . average pooling is applied to the final feature map of the network, followed by the fully connected layer. the vector produced by the last pooling layer is taken as the feature representation generated by the reused pretrained model in a feedforward pass. each convolution is followed by batch normalization and a relu. the residual units are represented as: $x_{l+1} = f(x_l + F(x_l, W_l))$, where $x_l$ and $x_{l+1}$ correspond to the input and the output of the $l$-th layer, $F$ denotes a nonlinear residual mapping characterized by the convolutional filter weights $W_l$, and $f$ corresponds to the relu function.
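as a minimal illustration of the residual unit described above, the following python sketch assumes the common form in which the output is the relu of the input plus a learned residual. the function names (relu, linear, residual_unit) and the use of small dense weight matrices in place of convolutions are assumptions made for brevity, and batch normalization is omitted; this is not the resnet implementation used in the experiments.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in vector]


def linear(vector, weights):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_unit(x, w1, w2):
    """Compute relu(x + F(x)), where F is two linear maps with a relu between.

    The dense maps w1 and w2 are a stand-in for the convolutional filters of
    a real residual block (an assumption made to keep the sketch short).
    """
    residual = linear(relu(linear(x, w1)), w2)
    # The skip connection adds the unchanged input to the residual branch.
    return relu([a + r for a, r in zip(x, residual)])
```

when the residual branch outputs zeros, the unit reduces to relu(x), which shows how the shortcut lets the input pass through even if the learned branch contributes nothing.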
the main advantage of residual units in such networks is that their skip connections, or "shortcuts", allow signals to propagate directly across all the network's layers. this design is especially advantageous during backpropagation: gradients are propagated directly from the loss layer to all the preceding layers, skipping intermediate layers that could otherwise cause the gradient signal to deteriorate or vanish. this strategy allows the network to benefit from the accuracy gained from deeper architectures. since training a new deep cnn model from scratch requires large amounts of data and substantial computational resources, we implemented a transfer learning procedure to fine-tune the parameters of a pre-trained model. we adopted a cnn model that was pretrained on a subset of a large-scale image classification dataset, namely imagenet. proceeding in this way, we reduced the time required for training and avoided overfitting on our dataset by ensuring a good initialization of the weights, given the quite small available dataset. in addition, the dataset was artificially augmented using three techniques: first, a random reflection of frames in the left-right direction; second, a random horizontal translation that moves frames along the horizontal direction; and finally, a random vertical translation that moves frames along the vertical direction. the last layer of the adopted cnn model is a classification layer; in the present study, we removed this layer and used the output of the preceding layer as frame features for the classification step. in place of the removed layer, an svm classifier is employed to predict the human activity label. figure summarizes the architecture of the proposed action recognition model.
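the three augmentation operations described above can be sketched in plain python as follows. this is only an illustration under stated assumptions: frames are represented as lists of pixel rows, vacated pixels are filled with zeros, the reflection is applied with probability one half, and the names reflect_left_right, translate, augment and max_shift are hypothetical rather than taken from the authors' matlab code.

```python
import random


def reflect_left_right(frame):
    """Mirror each pixel row of the frame."""
    return [row[::-1] for row in frame]


def translate(frame, dx, dy, fill=0):
    """Shift the frame by dx columns and dy rows, padding with `fill`."""
    height, width = len(frame), len(frame[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = frame[y][x]
    return shifted


def augment(frame, max_shift, rng=random):
    """Apply a random reflection, then a random horizontal and vertical shift."""
    if rng.random() < 0.5:  # assumed reflection probability
        frame = reflect_left_right(frame)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(frame, dx, dy)
```

calling augment on a frame at every training iteration yields a slightly different input each time, which is the sense in which the training set is enlarged without collecting new videos.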
svm is regarded as a machine learning classifier that gives good results in comparison with other types of classifiers. we decided to use it in this study because of its effectiveness on quite small datasets and its performance in high dimensional spaces [ , , [ ] [ ] [ ] [ ] [ ] . the principal idea behind svm is to apply a supervised learning algorithm that finds the optimal hyperplane separating the feature space. during training, the svm generates hyperplanes in a high dimensional space to separate the training dataset into different classes. if the training data are not linearly separable, a kernel function is used to map the data to a new vector space. svm performs well with large scale training datasets and yields accurate and effective results. consider a training dataset $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{\pm 1\}$ is the class label corresponding to each action in the dataset. to determine a decision function for a linear classification, the hyperplane separation is represented by: $f(x) = w \cdot x + b$. a generic hyperplane is defined by satisfying the condition: $w \cdot x + b = 0$. when delimited by margins, the set of hyperplanes can be written as: $y_i (w \cdot x_i + b) \geq 1$. to formulate the optimal hyperplane that separates the data, we should minimize: $\frac{1}{2} \lVert w \rVert^2$ subject to the constraints of eq. ( ). multi-class svm. even though svms were initially developed for binary classification, they can be extended to multiclass classification problems. the main strategy consists of splitting the multiclass problem into several binary problems and combining the outputs of all the binary sub-classifiers to provide the final class prediction for a sample. fundamentally, there are two main methods for multiclass svm.
the first type is called "one-against-one" [ ] ; it constructs one classifier per pair of classes and combines the binary classifiers into a multi-class classifier by selecting the most voted class. thus, $n(n-1)/2$ binary svm classifiers are needed, each trained on the samples of the two corresponding classes. the second method is called "one-against-all" [ ] and it considers all the classes of data in one optimization problem. for each classifier, the considered class is fitted against all the other classes, so n classes require n svm classifiers. when using the latter technique, the training process takes a long time. the "msrdailyactivity d" dataset [ ] is a dataset of rgb sequences that contains sixteen daily human activities. the database was captured by a kinect camera around various objects, with the subjects located at different distances from the camera. the activities are performed by ten different subjects, and most of them are categorized as "human object interactions". each activity was performed twice by each person in two different positions, i.e., "standing" and "sitting". the deep cnn model was trained using matlab . our cnn-based approach was run on a machine equipped with an nvidia geforce m gpu, gb memory and an intel core i - hq ( . ghz) processor. our dataset was artificially augmented, a technique that helps to avoid overfitting. each video from our dataset was split into frames, which serve as input to the pre-trained cnn model. in the training stage, a × frame is randomly reflected from the selected frame; it then undergoes a random horizontal and vertical translation. these operations are applied in such a way that the training dataset is augmented at each iteration. the dimensional activation vectors of the last pooling layer of the resnet model were computed for the training and testing subsets.
the resulting vectors were used as training and test data for the multi-class svm classifier. the training process is performed using mini-batch stochastic gradient descent with a momentum set to . to learn the network weights. at each iteration, a mini-batch of samples is constructed by sampling the training videos by , from which a single frame is selected randomly. during our experimentation, the learning rate is initially set to e − and the network is trained for epochs. we also tried increasing the number of epochs, but this always led to overfitting. for our multi-class svm classifier, we chose the linear kernel function to project the original linear or nonlinear dataset into a higher dimensional space in order to make it linearly separable and to give a better performance for the svm. the linear kernel is a simple kernel function based on the penalty parameter c described by the following format: during experimentation, we evaluated our method on the dataset described above: % of the data is used for the training stage and % for testing. first, each frame is resized to × resolution. we computed the confusion matrix of our proposed system to show the correspondence between the predicted labels along the x-axis and the true labels along the y-axis, and to represent the recognition performance for each action class in the msrdailyactivity d dataset. generally, a confusion matrix involves four groupings: tp (true positive) refers to the positive instances that are correctly identified as positive, fp (false positive) refers to the negative instances incorrectly identified as positive, tn (true negative) refers to the negative instances that are correctly predicted as negative, and fn (false negative) refers to the positive instances incorrectly predicted as negative. we also evaluate different performance metrics of our proposed approach by calculating the precision, recall and f-measure values as shown in table .
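the relation between the confusion matrix and the reported metrics can be sketched as follows. the sketch assumes the layout stated above (true labels along the rows, predicted labels along the columns) and the usual definitions precision = tp / (tp + fp), recall = tp / (tp + fn) and f-measure as their harmonic mean; the function name per_class_metrics and the toy matrix in the usage note are illustrative, not results from the paper.

```python
def per_class_metrics(confusion):
    """Return (precision, recall, f_measure) for each class.

    confusion[i][j] counts samples whose true class is i and whose
    predicted class is j.
    """
    n_classes = len(confusion)
    metrics = []
    for k in range(n_classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp  # true class k, predicted otherwise
        fp = sum(confusion[r][k] for r in range(n_classes)) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        metrics.append((precision, recall, f_measure))
    return metrics
```

for a toy two-class matrix [[8, 2], [1, 9]], the first class has tp = 8, fn = 2 and fp = 1, so its precision is 8/9 and its recall is 0.8.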
figure shows that most of the confusion is between the sit down and stand up labels. this misclassification can be explained by the similarity of a few steps in the two actions, both of which contain a person in a half-sitting position. in fact, the middle frames of the two classes sit down and stand up, which show a person in a half-sitting position, cause the confusion because they appear in both cases. nevertheless, more than half of the classes have been correctly classified at %. table shows that our approach achieves a good recognition performance and outperforms other state-of-the-art methods on the msrdailyactivity d dataset. the achieved performance confirms the generalization ability of our learned representations across domains. the work of [ ] obtained poor results on this dataset although it was based on the combination of two deep neural network models, namely cnn and lstm; however, the cnn model it used for feature extraction is not based on transfer learning. based on all these observations, we can deduce that pretraining a model on a large-scale dataset and fine-tuning its hyper-parameters on a small one is very effective for obtaining a good performance rate. we have also combined the same pretrained resnet model used to extract features once with a multi-layer perceptron (mlp) classifier and once with a long short-term memory (lstm) network. the obtained results show that using a multi-class svm classifier gives the best result. to investigate the effect of the choice of svm kernel, we also performed a classification using a radial basis function (rbf) kernel. the results offered no improvement, which we attribute to the relevance of the feature representation obtained from the convolutional neural network. in this study we presented a support vector machine approach for the human activity recognition task.
we proposed to use a pre-trained cnn based on the resnet model to extract spatial and temporal features from consecutive video frames. our proposed architecture was trained and tested on the msrdailyactivity d dataset and achieved a good recognition performance. in future work, we propose to use a combination of a genetic algorithm with support vector machines in order to optimize the weights of the cnn model and thereby automatically improve the performance. likewise, we would like to extend the proposed model to larger-scale datasets such as ntu rgb+d, because the dataset used here is small and the pretrained cnn model can be more effective when applied to a larger one. a spatio-temporal descriptor based on d-gradients an efficient dense and scale-invariant spatio-temporal interest point detector behavior recognition via sparse spatio-temporal features multi-sensor fusion for human daily activity recognition in robot-assisted living recognizing human activities from smartphone sensor signals unintrusive eating recognition using google glass imagenet large scale visual recognition challenge mining actionlet ensemble for action recognition with depth cameras action detection in complex scenes with spatial and temporal ambiguities faster human activity recognition with svm an iterative image registration technique with an application to stereo vision two-stream convolutional networks for action recognition in videos human action recognition: a construction of codebook by discriminative features selection approach human activity recognition on real time and offline dataset support-vector networks improving accuracy of intrusion detection model using pca and optimized svm support vector machines: a recent method for classification in chemometrics convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition spatio-temporal rgbd cuboids feature for human activity recognition spatio-temporal
depth cuboid similarity feature for activity recognition using depth camera learning actionlet ensemble for d human action recognition multimodal multipart learning for action recognition in depth videos action recognition from depth maps using deep convolutional neural networks deep neural network features for horses identity recognition using multiview horses' face pattern neural approach for context scene image classification based on geometric, texture and color information relidss: novel lie detection system from speech signal deepcolorfasd: face anti spoofing solution using a multi channeled color spaces cnn human gait identity recognition system based on gait pal and pal entropy (gppe) and distances features fusion towards human behavior recognition based on spatio temporal features and support vector machines key: cord- - yt uqyy authors: kassani, sara hosseinzadeh; kassasni, peyman hosseinzadeh; wesolowski, michal j.; schneider, kevin a.; deters, ralph title: automatic detection of coronavirus disease (covid- ) in x-ray and ct images: a machine learning-based approach date: - - journal: nan doi: nan sha: doc_id: cord_uid: yt uqyy the newly identified coronavirus pneumonia, subsequently termed covid- , is highly transmittable and pathogenic with no clinically approved antiviral drug or vaccine available for treatment. the most common symptoms of covid- are dry cough, sore throat, and fever. symptoms can progress to a severe form of pneumonia with critical complications, including septic shock, pulmonary edema, acute respiratory distress syndrome and multi-organ failure. while medical imaging is not currently recommended in canada for primary diagnosis of covid- , computer-aided diagnosis systems could assist in the early detection of covid- abnormalities and help to monitor the progression of the disease, potentially reduce mortality rates. 
in this study, we compare popular deep learning-based feature extraction frameworks for automatic covid- classification. to obtain the most accurate features, which are an essential component of learning, mobilenet, densenet, xception, resnet, inceptionv , inceptionresnetv , vggnet and nasnet were chosen from a pool of deep convolutional neural networks. the extracted features were then fed into several machine learning classifiers to classify subjects as either a case of covid- or a control. this approach avoided task-specific data pre-processing methods to support a better generalization ability for unseen data. the performance of the proposed method was validated on a publicly available covid- dataset of chest x-ray and ct images. the densenet feature extractor with bagging tree classifier achieved the best performance with % classification accuracy. the second-best learner was a hybrid of a resnet feature extractor and a lightgbm classifier, with an accuracy of %. a series of pneumonia cases of unknown etiology occurred in december , in wuhan, hubei province, china. on december , , unexplained cases of pneumonia were identified and found to be associated with so-called "wet markets", which sell fresh meat and seafood from a variety of animals including bats and pangolins. the pneumonia was found to be caused by a virus identified as "severe acute respiratory syndrome coronavirus " (sars-cov- ), with the associated disease subsequently termed coronavirus disease (covid- ). figure : the illustration of covid- , created at the centers for disease control and prevention (cdc) [ ] . the protein particles e, s, and m are located on the outer surface of the virus particle. the spherical viral particles, colorized blue, contain cross-sections through the viral genome, seen as black dots [ ] . processing techniques and deep learning algorithms could assist physicians as diagnostic aids for covid- and help provide a better understanding of the progression of the disease.
hemdan et al. [ ] developed a deep learning framework, covidx-net, to diagnose covid- in x-ray images. the authors provide a comparative study of different deep learning architectures including vgg , densenet , resnetv , inceptionv , inceptionresnetv , xception and mobilenetv . the public dataset of x-ray images was provided by dr. joseph cohen [ ] and dr. adrian rosebrock [ ] . the provided dataset included x-ray images, divided into two classes: normal cases and positive covid- cases. hemdan's results demonstrated that the vgg and densenet models achieved the best performance scores among their counterparts, with . % accuracy. barstugan et al. [ ] proposed a machine learning approach for covid- classification from ct images. patches of different sizes ( × , × , × , × ) were extracted from ct images. different hand-crafted feature extraction algorithms such as grey level co-occurrence matrix (glcm), local directional pattern (ldp), grey level run length matrix (glrlm), grey-level size zone matrix (glszm), and discrete wavelet transform (dwt) were employed. the extracted features were fed into a support vector machine (svm) [ ] classifier with -fold, -fold and -fold cross-validation. the best accuracy of . % was obtained by the glszm feature extractor with -fold cross-validation. wang and wong [ ] designed a tailored deep learning-based framework, covid-net, for covid- detection from chest x-ray images. the covid-net architecture was constructed from a combination of × convolutions, depth-wise convolutions and residual modules to enable a deeper architecture and avoid the vanishing gradient problem. the provided dataset consisted of a combination of the covid chest x-ray dataset provided by dr. joseph cohen [ ] and the kaggle chest x-ray images dataset [ ] for a multi-class classification of normal, bacterial infection, viral infection (non-covid) and covid- infection. the accuracy obtained in this study was . %. in a study conducted by maghdid et al.
[ ] , a deep learning-based method and a transfer learning strategy were used for automatic diagnosis of covid- pneumonia. the proposed architecture is a combination of a simple convolutional neural network (cnn) architecture (one convolutional layer with filters followed by batch normalization, a rectified linear unit (relu) and two fully-connected layers) and a modified alexnet [ ] architecture that supports transfer learning. the proposed modified architecture achieved an accuracy of . %. ghoshal and tucker [ ] investigated the diagnostic uncertainty and interpretability of deep learning-based methods for covid- detection in x-ray images. dropweights-based bayesian convolutional neural networks (bcnn) were used to estimate uncertainty in deep learning solutions and to provide a level of confidence in a computer-based diagnosis for a trusted clinician setting. to measure the relationship between accuracy and uncertainty, posterior-anterior (pa) lung x-ray images of covid- positive patients from the public dataset provided by dr. joseph cohen [ ] were selected and balanced with kaggle's chest x-ray images dataset [ ] . to prepare the dataset, all images were resized to × pixels. transfer learning and real-time data augmentation strategies were employed to overcome the limited size of the dataset. the proposed bayesian inference approach obtained a detection accuracy of . % on x-ray images using the vgg deep learning model. hall et al. [ ] used a vgg architecture and a transfer learning strategy with -fold cross-validation, trained on the dataset from dr. joseph cohen [ ] . all images were rescaled to × pixels and a data augmentation strategy was employed to increase the size of the dataset. the proposed approach achieved an overall accuracy of . % and an overall area under curve (auc) of . % on the provided dataset. farooq and hafeez [ ] proposed a fine-tuned and pre-trained resnet- architecture, covid-resnet, for covid- pneumonia screening.
to improve the generalization of the trained model, different data augmentation methods including vertical flip and random rotation (with an angle of degrees), along with model regularization, were used. the proposed method achieved an accuracy of . % on a multi-class classification dataset of normal, bacterial infection, viral infection (non-covid- ) and covid- infection. the main motivation of this study is to present a generic feature extraction method using convolutional neural networks that does not require handcrafted or very complex features from the input data while being easily applicable to different modalities such as x-ray and ct images. another primary goal is to reduce the generalization error while achieving a more accurate diagnosis. the contributions are summarized as follows: • deep convolutional feature representation [ , , ] is used to extract highly representative features using state-of-the-art deep cnn descriptors. the employed approach is able to discriminate between covid- and healthy subjects from chest x-ray and ct images and hence produces higher accuracy than other works presented in the literature. to the best of our knowledge, this research is the first comprehensive study of the application of machine learning (ml) algorithms ( deep cnn visual feature extractors and ml classifiers) for automatic diagnosis of covid- from x-ray and ct images. • to overcome the issue of over-fitting in deep learning due to the limited number of training images, a transfer learning strategy is adopted, as training very deep cnn models from scratch requires a large amount of training data. • no data augmentation or extensive pre-processing methods are applied to the dataset, in order to increase the generalization ability and reduce bias in the model performance.
• the proposed approach reduces the detection time dramatically while achieving satisfactory accuracy, which is a significant advantage for developing real-time or near real-time inference in clinical applications. • with extensive experiments, we show that the combination of a deep cnn with a bagging trees classifier achieves very good classification performance on covid- data despite the limited number of image samples. • finally, we developed an end-to-end web-based detection system to simulate a virtual clinical pipeline and facilitate the screening of suspicious cases. the rest of this paper is organized as follows. the proposed methodology for automatically classifying covid- and healthy cases is explained in section . the dataset description, experimental settings and performance metrics are given in section . a brief discussion and results analysis are provided in section , and finally, the conclusion is presented in section . few studies have been published on the application of deep cnn feature descriptors to x-ray and ct images. each of the cnn architectures is constructed from different modules and convolution layers that aid in extracting fundamental and prominent features from a given input image. briefly, in the first step, we collect the available public chest x-ray and ct images. in the next step, we pre-process the provided dataset using standard image normalization techniques to improve the quality of the visual information in the input data. once the input images are prepared, we feed them into the feature extraction phase, where state-of-the-art cnn descriptors extract deep features from each input image. for the training phase, the generated features are then fed into machine learning classifiers such as decision tree (dt) [ ] , random forest (rf) [ ] , xgboost [ ] , adaboost [ ] , bagging classifier [ ] and lightgbm [ ] . finally, the performance of the proposed approach is evaluated on test images.
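the classification stage of the pipeline above, in which extracted feature vectors are passed to a bagging classifier, can be sketched as follows. this is a toy stand-in under stated assumptions: the feature vectors are short hand-written lists rather than cnn descriptors, the base learners are one-level decision stumps rather than full trees, and a bootstrap sample that happens to contain a single class is redrawn. all function names are hypothetical and do not correspond to the library implementations used in the study.

```python
import random


def train_stump(samples, labels):
    """Return the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for j in range(len(samples[0])):
        for t in sorted({s[j] for s in samples}):
            for polarity in (1, -1):
                errors = sum(
                    1 for s, y in zip(samples, labels)
                    if (1 if polarity * (s[j] - t) >= 0 else 0) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    return best[1:]


def stump_predict(stump, sample):
    """Predict label 1 or 0 with a single threshold test."""
    j, t, polarity = stump
    return 1 if polarity * (sample[j] - t) >= 0 else 0


def train_bagging(samples, labels, n_estimators=15, seed=0):
    """Fit each stump on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(samples)
    ensemble = []
    while len(ensemble) < n_estimators:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        if len({labels[i] for i in idx}) < 2:
            continue  # assumption: redraw single-class bootstrap samples
        ensemble.append(
            train_stump([samples[i] for i in idx], [labels[i] for i in idx])
        )
    return ensemble


def bagging_predict(ensemble, sample):
    """Majority vote over the ensemble (1 = positive case, 0 = control)."""
    votes = sum(stump_predict(stump, sample) for stump in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0
```

in this sketch the ensemble averages out the variance of the individual learners through voting, which is the general motivation for bagging when the number of training samples is small.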
the concept of transfer learning has been introduced for solving deep learning problems arising from insufficient labeled data, or when the cnn model is too deep and complex. aiming to tackle these challenges, studies in a variety of computer vision tasks demonstrated the advantages of transfer learning strategies from an auxiliary domain in improving the detection rate and performance of a classifier [ ] [ ] [ ] . in a transfer learning strategy, we transfer the weights already learned on a cross-domain dataset into the current deep learning task instead of training a model from scratch. with the transfer learning strategy, the deep cnn can obtain general features from the source dataset that cannot be learned in the current task due to the limited size of its dataset. transfer learning strategies have various advantages, such as avoiding overfitting when the number of training samples is limited, reducing the computational resources required, and speeding up the convergence of the network [ ] [ ]. effective feature extraction is one of the most important steps toward learning rich and informative representations from raw input data to provide accurate and robust results. the small or imbalanced size of the training set poses a significant challenge for the training of a deep cnn, where data dimensionality is much larger than the number of samples, leading to over-fitting. although various strategies, e.g. data augmentation [ ] , transfer learning [ ] and fine-tuning [ ] , may reduce the problem of insufficient or imbalanced training data, the detection rate of the cnn model may still degrade due to over-fitting. since the overall performance obtained by a fine-tuning method in the initial experiments for this study was not significant, we employed a different approach inspired by [ ] [ ] [ ] known as deep convolutional feature representation.
in this method, we used pre-trained, well-established cnn models as visual feature extractors to encode the input images into a feature vector of sparse descriptors of low dimensionality. the encoded feature vectors produced by the cnn architectures are then fed into different classifiers, i.e. machine learning algorithms, to yield the final prediction. this lower-dimensional vector significantly reduces the risk of over-fitting and also the training time. different robust cnn architectures such as mobilenet, densenet, xception, inceptionv , inceptionresnetv , resnet, vggnet and nasnet are selected for feature extraction because they offer the advantage of transfer learning for limited datasets and have shown satisfying performance in different computer vision tasks [ , , , ] . figure illustrates the visual features extracted by the vggnet architecture from an x-ray image of a covid- positive patient. in order to evaluate the performance of our feature extracting and classifying approach, we used the public dataset of x-ray images provided by dr. joseph cohen, available from a github repository [ ] . we used the available chest x-ray images and ct images ( images in total) of covid- positive cases. to balance the dataset with both positive and normal cases, we also included x-ray images of healthy cases from the kaggle chest x-ray images (pneumonia) dataset available at [ ] and ct images of healthy cases from the kaggle rsna pneumonia detection dataset available at [ ] . figure shows examples of confirmed covid- images extracted from the provided dataset. the x-ray images of confirmed covid- infection demonstrate different shapes of "pure ground glass", also known as hazy lung opacity, with irregular linear opacity depending on the disease progression [ ] . the images within the dataset were collected from multiple imaging clinics with different equipment and image acquisition parameters; therefore, considerable variations exist in the images' intensity.
the proposed method in this study avoids extensive pre-processing steps to improve the generalization ability of the cnn architecture. this helps to make the model more robust to noise, artifacts and variations in input images during the feature extraction phase. hence, we only employed two standard pre-processing steps in training deep learning models to optimize the training process.
• resizing: the images in this dataset vary in resolution and dimension, ranging from × to × pixels; therefore, we re-scaled all images from their original size to × pixels to obtain a consistent dimension for all input images. the input images were also separately resized to × pixels and × pixels as required for the nasnetlarge and nasnetmobile architectures, respectively.
• image normalization: for image normalization, first, we re-scaled the intensity values of the pixels using imagenet mean subtraction as a pre-processing step. the imagenet mean is a pre-computed constant derived from the imagenet database [ ] . another essential pre-processing step is intensity normalization. to accomplish this, we rescaled the intensity values of all images from [ , ] to the intensity range of [ , ] by min-max normalization, which is computed as: $x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$ where x is the pixel intensity, and x min and x max are the minimum and maximum intensity values of the input image in equation . this operation helps to speed up the convergence of the model by removing the bias from the features and achieving a uniform intensity range across the dataset.
to measure the prediction performance of the methods in this study, we utilized common evaluation metrics such as recall, precision, accuracy and f -score. in equations ( ) ( ) ( ) ( ), true positive (tp) is the number of positive instances that are correctly predicted; false negative (fn) is the number of positive instances that are incorrectly predicted.
true negative (tn) is the number of negative instances that are predicted correctly, while false positive (fp) is the number of negative instances that are incorrectly predicted. given tp, tn, fp and fn, all evaluation metrics were calculated as follows: recall or sensitivity is the fraction of covid- cases that are correctly classified. recall is critical, especially in the medical field, and is given by: $\mathrm{recall} = \frac{tp}{tp + fn}$ precision or positive predictive value is defined as the percentage of cases predicted as positive that are truly positive, and is given as: $\mathrm{precision} = \frac{tp}{tp + fp}$ accuracy is the number of correctly classified cases divided by the total number of test images, and is defined as: $\mathrm{accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$ f -score, also known as f-measure, is defined as the harmonic mean of precision and recall, combining both into a single value. f-measure is expressed as: $f = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$
diagnostic imaging modalities, such as chest radiography and ct, are playing an important role in confirming the primary diagnosis from the polymerase chain reaction (pcr) test for covid- . medical imaging is also playing a critical role in monitoring the progression of the disease and in patient care. extracting features from radiology modalities is an essential step in training machine learning models, since the model performance directly depends on the quality of the extracted features. motivated by the success of deep learning models in computer vision, the focus of this research is to provide a comprehensive study on the classification of covid- pneumonia in chest x-ray and ct imaging using features extracted by state-of-the-art deep cnn architectures and classified by machine learning algorithms. the -fold cross-validation technique was adopted to evaluate the average generalization performance of the classifiers in each experiment. for all cnns, the network weights were initialized from the weights trained on imagenet. the windows-based computer system used for this work had an intel(r) core(tm) i - k . ghz processor with gb ram.
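the evaluation metrics defined above follow directly from the four confusion counts. the sketch below is illustrative only: the function names `confusion_counts` and `classification_metrics` and the toy label lists are assumptions, not part of the original study, and zero denominators are mapped to 0.0 by convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count tp, tn, fp and fn for a binary problem with the given positive label."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Compute recall, precision, accuracy and f1 from the confusion counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "accuracy": accuracy, "f1": f1}


# made-up labels: 1 = positive case, 0 = healthy case
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts, metrics)
```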
the training and testing process of the proposed architecture for this experiment was implemented in python using the keras package with tensorflow as the deep learning backend, and run on an nvidia geforce gtx ti gpu with gb ram. table and figure summarize the accuracy of six machine learning algorithms, namely dt, rf, xgboost, adaboost, bagging classifier and lightgbm, on the features extracted by deep cnns. each entry in table is in the format (µ ± σ), where µ is the average classification accuracy and σ is the standard deviation. according to table , the top result was obtained by the bagging classifier, with a maximum accuracy of . % ± . on features extracted by the densenet architecture (with feature extraction time of . seconds and training time of . seconds in table ), which is the highest result reported in the literature for covid- classification on this dataset. table also shows that the second-best result was obtained by the resnet feature extractor and lightgbm classifier (with feature extraction time of . seconds and training time of . seconds in table ), with an overall accuracy of . ± . . comparing the first and second winners among all combinations, the classification accuracy of densenet with bagging is slightly better ( %) than that of resnet with lightgbm, while the training time of the second winner is attractive, almost times shorter than that of the first winner. although bagging is a slow learner, it has the lowest standard deviation and hence is more stable than other learners. the results also demonstrate that the detection rate is worst on the features extracted by resnet v trained by the adaboost classifier, with . ± . accuracy. figure and figure show box-plot distributions of the classification accuracy of the deep cnn feature extractors from the -fold cross-validation. circles in figure represent outliers.
table : comparison of classification f -score metric of different machine learning models. the bold value indicates the best result; the underlined value represents the second-best result of the respective category.
according to tables , the best-performing trained visual feature extractors so far were densenet , mobilenet and inceptionv , rather than the counterpart architectures, for covid- image classification. although the approach presented here shows satisfying performance, it also has limitations in classifying more challenging instances with vague, low-contrast boundaries and the presence of artifacts. some examples of these cases are illustrated in figure . finally, a comparison of the feature extraction time using deep cnn models and the training time with ml algorithms is shown in table and . after training a model, the trained weights and models can be used as a predictive engine for cad systems to allow automatic classification of new data. a web-based application was implemented using standard web development tools and techniques such as python, javascript, html, and the flask web framework. figure shows the output of our web-based application for covid- pneumonia detection. this web application could help doctors benefit from our proposed method by providing an online tool that only requires uploading an x-ray or ct image. the application then provides the physician with a simple covid- positive or covid- negative observation. it should be noted that this application has yet to be clinically validated, is not yet approved for diagnostic use, and would simply serve as a diagnostic aid for the medical imaging specialist. the proposed method is generic as it does not need handcrafted features and can be easily adapted, requiring minimal pre-processing. the provided dataset is collected across multiple sources with different shapes, textures and morphological characteristics. the transfer learning strategy has successfully transferred knowledge from the source to the target domain despite the limited size of the provided dataset.
with the proposed approach, we observed no overfitting that adversely affected the classification accuracy. however, our study has some limitations. the training data samples are limited. extending the dataset with additional data sources can provide a better understanding of the proposed approach. also, employing pre-trained networks as feature extractors requires rescaling the input images to a certain dimension, which may discard valuable information. although the proposed methodology achieved satisfying performance with an accuracy of . %, the diagnostic performance of the deep learning visual feature extractor and machine learning classifier should be evaluated in real clinical study trials. the ongoing pandemic of covid- has been declared a global health emergency due to the relatively high infection rate of the disease. as of the time of this writing, there is no clinically approved therapeutic drug or vaccine available to treat covid- . early detection of covid- is important for interrupting human-to-human transmission and for patient care. currently, the isolation and quarantine of suspicious patients is the most effective way to prevent the spread of covid- . diagnostic modalities such as chest x-ray and ct are playing an important role in monitoring the progression and severity of the disease in covid- positive patients. this paper presents a feature extractor-based deep learning and machine learning classifier approach for computer-aided diagnosis of covid- pneumonia. several ml algorithms were trained on the features extracted by well-established cnn architectures to find the best combination of features and learners. considering the high visual complexity of image data, proper deep feature extraction is considered a critical step in developing deep cnn models.
the experimental results on the available chest x-ray and ct dataset demonstrate that the features extracted by the densenet architecture and classified by a bagging tree classifier generate a very accurate prediction of . % in terms of classification accuracy. covid- infection: origin, transmission, and characteristics of human coronaviruses thrombocytopenia is associated with severe coronavirus disease (covid- ) infections: a meta-analysis probable pangolin origin of sars-cov- associated with the covid- outbreak the impact of the covid- epidemic on the utilization of emergency dental services coronavirus disease (covid- ): a primer for emergency physicians the epidemiology and pathogenesis of coronavirus disease (covid- ) outbreak clinical and ct imaging features of the covid- pneumonia: focus on pregnant women and children covid- ) pandemic transmission potential and severity of covid- in south korea coronavirus infections -transmission electron microscopic image temporal changes of ct findings in patients with covid- pneumonia: a longitudinal study covidx-net: a framework of deep learning classifiers to diagnose covid- in x-ray images covid- image data collection detecting covid- in x-ray images with keras, tensorflow, and deep learning coronavirus (covid- ) classification using ct images by machine learning methods an introduction to support vector machines and other kernel-based learning methods covid-net: a tailored deep convolutional neural network design for detection of covid- cases from chest radiography images kaggle's chest x-ray images (pneumonia) dataset diagnosing covid- pneumonia from x-ray and ct images using deep learning and transfer learning algorithms imagenet classification with deep convolutional neural networks estimating uncertainty and interpretability in deep learning for coronavirus (covid- ) detection finding covid- from chest x-rays using deep learning on a small dataset covid-resnet: a deep learning framework for screening of covid from
radiographs a theoretical analysis of feature pooling in visual recognition deep convolutional neural networks for breast cancer histology image analysis deep learning for visual understanding: a review induction of decision trees random forests proceedings of the nd acm sigkdd international conference on knowledge discovery and data mining -kdd ' a decision-theoretic generalization of on-line learning and an application to boosting bagging predictors lightgbm: a highly efficient gradient boosting decision tree breast cancer diagnosis with transfer learning and global pooling a novel deep learning based framework for the detection and classification of breast cancer using transfer learning breast cancer histology images classification: training from scratch or transfer learning? pathological brain detection based on alexnet and transfer learning classification of histopathological biopsy images using ensemble of deep learning networks automatic diagnosis of fungal keratitis using data augmentation and image fusion with deep convolutional neural network a novel scene classification model combining resnet based transfer learning and data augmentation with a filter decision fusion-based fetal ultrasound image plane classification using convolutional neural networks automated identification and grading system of diabetic retinopathy using deep neural networks deep learning iot system for online stroke detection in skull computed tomography images detection of tumors on brain mri images using the hybrid convolutional neural network architecture mapgi: accurate identification of anatomical landmarks and diseased tissue in gastrointestinal tract using deep learning rsna pneumonia detection challenge key: cord- -ki gkoc authors: kikkisetti, s.; zhu, j.; shen, b.; li, h.; duong, t. title: deep-learning convolutional neural networks with transfer learning accurately classify covid lung infection on portable chest radiographs date: - - journal: nan doi: . / . . .
sha: doc_id: cord_uid: ki gkoc portable chest x-ray (pcxr) has become an indispensable tool in the management of coronavirus disease (covid- ) lung infection. this study employed deep-learning convolutional neural networks to classify covid- lung infections on pcxr from normal and related lung infections to potentially enable more timely and accurate diagnosis. this retrospective study employed a deep-learning convolutional neural network (cnn) with transfer learning to classify covid- pneumonia (n= ) on pcxr from normal (n= ), bacterial pneumonia (n= ), and non-covid viral pneumonia (n= ). the data was split into % training and % testing. a five-fold cross-validation was used. performance was evaluated using receiver-operating curve analysis. a comparison was made between cnn operated on the whole pcxr and on segmented lungs. cnn accurately classified covid- pcxr from those of normal, bacterial pneumonia, and non-covid- viral pneumonia patients in a multiclass model. the overall sensitivity, specificity, accuracy, and auc were . , . , . , and . , respectively (whole pcxr), and were . , . , . , and . (cxr of segmented lung). the performance was generally better using segmented lungs. heatmaps showed that cnn accurately localized areas of hazy appearance, ground glass opacity and/or consolidation on the pcxr. deep-learning convolutional neural network with transfer learning accurately classifies covid- on portable chest x-ray against normal, bacterial pneumonia or non-covid viral pneumonia. this approach has the potential to help radiologists and frontline physicians by providing more timely and accurate diagnosis. coronavirus disease (covid- ) is a highly infectious disease that causes severe respiratory illness ( , ) . it was first reported in wuhan, china in december ( ) and was declared a pandemic on mar , ( ) . the first confirmed case of coronavirus disease in the united states was reported from washington state on january , .
( ) soon after, washington, california and new york reported outbreaks. covid- has already infected million and killed more than . million people, and the united states has become the worst-affected country, with more than . million diagnosed cases and at least , deaths (https://coronavirus.jhu.edu, accessed jun , ). there are recent spikes of covid- infection cases across many states and around the world, and there will likely be second waves and recurrence. a definitive test of covid- infection is the reverse transcription polymerase chain reaction (rt-pcr) of a nasopharyngeal or oropharyngeal swab specimen ( , ) . although rt-pcr has high specificity, it has low sensitivity, a high false negative rate, and a long turn-around time ( , ) (currently ~ days, although it is improving and other tests are becoming available ( )). by contrast, portable chest x-ray (pcxr) is convenient to perform, has a fast turnaround, and is well suited for imaging contagious patients and for longitudinal monitoring of critically ill patients in the intensive care units because the equipment can be readily disinfected, preventing cross-infection. pcxr of covid- infection has certain unique characteristics, such as predominance of bilateral, peripheral, and lower lobe involvement, with ground-glass opacities with or without airspace consolidations as the disease progresses. these characteristics generally differ from other lung pathologies, such as bacterial pneumonia or other viral (non-covid- ) lung infection. based on cxr and laboratory findings, clinicians might start patients on empirical treatment before the rt-pcr results become available, or even if the rt-pcr comes back negative, due to the high false negative rate of rt-pcr. early treatment in covid- patients is associated with better clinical outcomes. similarly, computed tomography (ct), which offers relatively more detailed features (such as subtle ground-glass opacity ( , )), has also been used in the context of covid- .
however, the ct suite and equipment are more challenging to disinfect, and thus ct is much less suitable for examining patients suspected of or confirmed with contagious diseases in general and covid- in particular. longitudinal ct monitoring of critically ill patients in the intensive care units is also challenging. in short, pcxr has become an indispensable imaging tool in the management of covid- infection, is often one of the first examinations a patient suspected of covid- infection receives in the emergency room, and is ideally suited for longitudinal monitoring of critically ill patients in the intensive care units. the usage of pcxr under the covid- pandemic circumstances is unusual in many aspects. for instance, pcxr is preferred as it can be used at the bedside without moving the patients, but the imaging quality is not as good as conventional cxr ( ) . in addition, covid- patients may not be able to take full inspirations during the examination, obscuring possible pathology, especially in the lower lung fields. many sicker patients may be positioned on their side, which compromises imaging quality. thus, pcxr data under the covid- pandemic circumstances are suboptimal and may be more challenging to interpret. moreover, pcxr is increasingly read by non-chest radiologists in some hospitals due to increasing demands, resulting in reduced accuracy and efficiency. pcxr images contain important clinical features that could be easily missed by the naked eye. computer-aided methods can improve efficiency and accuracy of pcxr interpretations, which in turn provides more timely and relevant information to frontline physicians. medrxiv preprint, this version posted september , . https://doi.org/ . / . . . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. all rights reserved. no reuse allowed without permission.
deep-learning artificial intelligence (ai) has become increasingly popular for analyzing diagnostic images ( , ) . ai has the potential to facilitate disease diagnosis, staging of disease severity and longitudinal monitoring of disease progression. one common machine-learning algorithm is the convolutional neural network (cnn) ( , ) , which takes an input image, learns important features in the image such as size or intensity, and saves these parameters as weights and biases to differentiate types of images ( , ) . cnn architecture is ideally suited for analyzing images. moreover, the majority of machine learning algorithms to date are trained to solve specific tasks, working in isolation. models have to be rebuilt from scratch if the feature-space distribution changes. transfer learning overcomes the isolated learning paradigm by utilizing knowledge acquired for one task to solve related ones. transfer learning in ai is particularly important for small sample size data because the pre-trained weights enable more efficient training and improved performance ( , ). many artificial intelligence (ai) algorithms based on deep-learning convolutional neural networks have been deployed for pcxr applications ( ) ( ) ( ) ( ) ( ) and these algorithms can be readily repurposed for covid- pandemic circumstances. while there are already many papers describing prevalence and radiographic features on pcxr of covid- lung infection (see reviews ( , ) ), there are only a few peer-reviewed ai papers ( - ) and non-peer-reviewed papers ( - ) that classify cxrs of covid- patients from cxrs of normal subjects or related lung infections. the full potential of ai applications of pcxr under covid- pandemic circumstances is not yet fully realized. the goal of this pilot study is to employ deep-learning convolutional neural networks to classify normal, bacterial infection, and non-covid- viral infection (such as influenza)
against covid- infection on pcxr. the performance was evaluated using receiver-operating curve (roc) analysis. heatmaps were also generated to visualize and assess the performance of the ai algorithm. we recognized that this dataset was a public, community-driven dataset and that there are potential selection biases. a radiologist (bs) evaluated all images for quality and relevance, and each case was covid- positive based on available data. thus, this dataset is useful and valid for the purpose of algorithm development. the other datasets were taken from the established kaggle chest x-ray image (pneumonia) dataset (https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia). although the kaggle database has a large sample size, we randomly selected a sample size comparable to that of covid- to avoid asymmetric sample size bias that could skew sensitivity and specificity. the sample sizes chosen for bacterial pneumonia, non-covid- viral pneumonia, and normal pcxr were , and patients, respectively. similarly, a chest radiologist evaluated all images for quality. cnn: the cnn architecture was based on vgg , a convolutional neural network ( ) . the vgg model was used because it was pretrained on the imagenet database, which enables transfer learning and makes the training process efficient.
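the receiver-operating curve analysis mentioned above summarizes a classifier by its area under the curve (auc). as a hedged illustration, the sketch below computes the auc for one class against the rest using the rank-based (mann-whitney) formulation; the source does not specify how the auc was computed, and the function name `roc_auc` and the toy scores are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive) and real-valued scores.

    Equals the probability that a randomly chosen positive case receives a higher
    score than a randomly chosen negative case, counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# made-up predicted probabilities for the positive class
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))
```

for a multiclass model, one such value can be computed per class by treating that class as positive and all other classes as negative.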
the data was normalized first by transforming all files into rgb images and resizing them to x pixels to make them compatible with the vgg framework. next, the labels were one-hot-encoded and the data was split into % training and % testing. for data analysis, a batch size of was used to limit computational expense, and the model was trained for epochs. several optimizers were tested; however, the adam optimizer gave the lowest validation loss. the learning rate was lowered from the recommended . to . to prevent overshooting the global minimum loss. categorical cross entropy was used as the loss function since the loss value decreases as the predicted probability converges to the actual label. the vgg architecture was utilized for computational efficiency and ease of implementation, for immediate translation potential. figure shows examples of pcxr from a normal subject and from patients with different lung infections. covid- is often characterized by ground-glass opacities with or without nodular consolidation, with predominance of bilateral, peripheral and lower lobe involvement. non-covid- viral pneumonia is often characterized by diffuse interstitial opacities, usually bilaterally. bacterial pneumonia is often characterized by confluent areas of focal airspace consolidation. table . the precision, recall and f scores for the whole pcxr are shown in table .
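the one-hot encoding and categorical cross-entropy loss mentioned in the training description can be sketched in a few lines. this is a plain-python illustration under stated assumptions, not the keras code used in the study: the class order, the function names `one_hot` and `categorical_cross_entropy`, and the example probabilities are hypothetical.

```python
import math

# assumed class order, for illustration only
CLASSES = ["normal", "bacterial", "viral", "covid"]


def one_hot(index, num_classes):
    """Return a vector with 1.0 at the class index and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Negative log-likelihood of the true class; eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


target = one_hot(CLASSES.index("covid"), len(CLASSES))
weak_prediction = [0.10, 0.10, 0.10, 0.70]
strong_prediction = [0.03, 0.03, 0.04, 0.90]
print(categorical_cross_entropy(target, weak_prediction))
print(categorical_cross_entropy(target, strong_prediction))
```

the second loss is smaller than the first, which matches the statement that the loss decreases as the predicted probability converges to the actual label.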
the overall precision, recall and f scores showed good to excellent performance. for cnn with transfer learning performed on the whole pcxr, the overall sensitivity, specificity, accuracy, and auc were . , . , . , and . , respectively. for cnn performed on segmented lungs, the overall sensitivity, specificity, accuracy, and auc were . , . , . , and . , respectively. the performance was generally better using segmented lungs. to visualize the spatial locations on the images that the cnn networks were paying attention to for classification, heatmaps of the covid- versus normal pcxr are shown in figure . for cnn performed on the whole pcxr, the majority of the hot spots were reasonably localized to regions of ground glass opacities and/or consolidations, but some hot spots were located outside the lungs. for cnn performed on segmented lungs, the majority of the hot spots were reasonably localized to regions of ground glass opacities and/or consolidations, mostly as expected. this study developed and applied a deep-learning cnn algorithm with transfer learning to classify covid- cxr from normal, bacterial pneumonia, and non-covid viral pneumonia cxr in a multiclass model. heatmaps showed reasonable localization of abnormalities in the lungs. the overall sensitivity, specificity, accuracy, and auc were . , . , . , and . , respectively (segmented lungs). there are a few ai studies to date using machine learning methods to classify cxrs of covid- , normal and related lung infections. by the time this paper is reviewed many more
no-findings (n= ) vs. pneumonia (n= ) as well as a binary classification for covid vs. no-findings which achieved . % and . % accuracies, respectively ( ). pereira et al. pneumonia vs no-finding using resampling algorithms, texture descriptors, and cnn. this model achieved an f -score of . for the multiclass approach and an f score of . for the hierarchical classification ( ). auc and accuracy were not reported. a few non-peer-reviewed pre-prints using ai to classify covid- cxrs have also been reported ( - ). our study had one of the larger cohorts, balanced sample sizes, and a multi-class model. our approach is also amongst the simplest ai models with a comparable performance index, likely facilitating immediate clinical translation. together, these studies indicate that ai has the potential to assist frontline physicians in distinguishing covid- infection based on cxrs. heatmaps are informative tools to visualize the regions that the cnn algorithm pays attention to for detection. this is particularly important given that ai operates in a high-dimensional space. such heatmaps enable reality checks and make ai interpretable with respect to clinical findings. our algorithm showed that the majority of the hotspots were highly localized to abnormalities within the lungs, i.e., ground glass opacity and/or consolidation, albeit imperfectly. the majority of the above-mentioned machine learning studies to classify covid- cxrs did not provide heatmaps. we also noted that cnn on the whole pcxr image resulted in some hot spots located outside the lungs. cnn on segmented lungs solved this problem. another advantage of using segmented lungs is reduced computational cost during training. transfer learning also reduced computational cost, making this algorithm practical. the performance is generally better using segmented lungs.
most covid- positive patients showed significant abnormalities on pcxr ( ) . some early studies have even suggested that pcxr could be used as a primary tool for covid- screening in epidemic areas ( , ) , which could complement swab testing, which still has a long turnaround time and a non-negligible false negative rate. in some cases, imaging revealed chest abnormalities even before swab tests confirmed infection ( , ). in addition, pcxr can detect superimposed bacterial pneumonia, which necessitates urgent antibiotic treatment. pcxr can also suggest acute respiratory distress syndrome, which is associated with severe negative outcomes and necessitates immediate treatment. together with the anticipated widespread shortage of intensive care units and mechanical ventilators in many hospitals, pcxr also has the potential to play a critical role in decision-making, especially in regard to which patients to admit to the icu, which to put on mechanical ventilation, and when to safely extubate. a timely implementation of ai methods could help to realize the full potential of pcxr in this covid- pandemic. this pilot proof-of-principle study has several limitations. this is a retrospective study with a small sample size, and the data sets used for training had limited alternative diagnoses. although the kaggle database has a large sample size for non-covid- cxr, we chose the sample sizes to be comparable to that of covid- to avoid asymmetric sample sizes that could skew sensitivity and specificity. future studies will need to increase the covid- sample size and include additional lung pathologies. the spatiotemporal characteristics on pcxr of covid- infection and its relation to clinical outcomes are unknown.
future endeavors could include developing ai algorithms to stage severity, and predict progression, treatment response, recurrence, and survival, to inform and advise risk management and resource allocation associated with the covid- pandemic. in conclusion, deep learning convolutional neural networks with transfer learning accurately classify covid- pcxr from pcxr of normal, bacterial pneumonia, and non-covid viral pneumonia patients in a multiclass model. this approach has the potential to help radiologists and frontline physicians by providing efficient and accurate diagnosis.
table shows the precision and recall rate and f score (whole cxr).
the continuing -ncov epidemic threat of novel coronaviruses to global health -the latest novel coronavirus outbreak in wuhan, china outbreak of pneumonia of unknown etiology in wuhan, china: the mystery and the miracle early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia first case of novel coronavirus in the united states the laboratory diagnosis of covid- infection: current issues and challenges detection of sars-cov- in different types of clinical imaging and clinical features of patients with novel coronavirus sars-cov- portable versus fixed x-ray equipment: a review of the clinical effectiveness, cost-effectiveness, and guidelines.
ottawa (on) deep learning using deep convolutional neural networks to identify and classify tumor-associated stroma in diagnostic breast biopsies imagenet classification with deep convolutional neural networks improving neural networks by preventing co-adaptation of feature detectors very deep convolutional networks for large-scale image recognition deep machine learning-a new frontier in artificial transfer learning with deep convolutional neural network for liver steatosis assessment in ultrasound images artificial intelligence and machine learning in respiratory medicine a systematic review of the diagnostic accuracy of artificial intelligence-based computer programs to analyze chest x-rays for pulmonary tuberculosis attention-guided convolutional neural network for detecting pneumonia on chest x-rays deep learning algorithms with demographic information help to detect tuberculosis in chest radiographs in annual workers' health examination data explainable covid- predictions based on chest x-ray images an automated machine learning model to assist in the diagnosis of covid- infection in chest x-ray images automatic detection of coronavirus disease (covid- ) using x-ray images and deep convolutional neural networks very deep convolutional networks for large-scale image recognition learning deep features for discriminative localization association of inpatient use of angiotensin converting enzyme inhibitors and angiotensin ii receptor blockers with mortality among patients with hypertension hospitalized with covid- correlation of chest ct and rt-pcr testing in coronavirus key: cord- -hh hugqi authors: wang, jun; liu, qianying; xie, haotian; yang, zhaogang; zhou, hefeng title: boosted efficientnet: detection of lymph node metastases in breast cancer using convolutional neural network date: - - journal: nan doi: nan sha: doc_id: cord_uid: hh hugqi in recent years, advances in the development of whole-slide images have laid a foundation for the utilization of digital 
images in pathology. with the assistance of computer image analysis that automatically identifies tissue or cell types, they have greatly improved histopathologic interpretation and diagnostic accuracy. in this paper, the convolutional neural network (cnn) is adapted to predict and classify lymph node metastasis in breast cancer. unlike traditional image cropping methods, which are only suitable for large-resolution images, we propose a novel data augmentation method named random center cropping (rcc) that suits small-resolution images. rcc enriches the dataset while retaining the image resolution and the center area of images. in addition, we reduce the downsampling scale of the network to better suit small-resolution images. moreover, attention and feature fusion (ff) mechanisms are employed to improve the semantic information of images. experiments demonstrate that our methods boost the performance of basic cnn architectures, and the best-performing method achieves an accuracy of . % and an auc of . % on the rpcam dataset. even though excellent progress has been made in understanding cancers and in developing diagnostic and therapeutic methods, breast cancer is the most common malignant cancer diagnosed worldwide and the second leading cause of cancer-associated death in women [ ] [ ] [ ] . metastatic breast cancers (mbcs), the leading cause of breast cancer death owing to their incurable nature, start with local invasion of surrounding tissues, spread into the lymphatic and blood vessels, and end in distant organs. it is estimated that % to % of patients develop metastases despite being diagnosed with regular bc at the beginning. besides, the rate and site of metastasis are heterogeneous and depend on the primary tumor subtype. thus, accurate diagnosis, prognosis, and treatment of mbcs remain challenging.
for bc diagnosis, one essential task is staging, which includes recognizing axillary lymph node (aln) metastases; these are detectable in most node-positive patients using sentinel lymph node (sln) biopsies. evaluating microscopy images of slns is the conventional technique for assessing alns. however, it requires on-site pathologists to examine samples, which is time-consuming, laborious, and less reliable because of a certain degree of subjectivity, particularly in cases that contain small lesions or in which the lymph nodes are negative for cancer. consequently, digital pathology methods that assist microscopic diagnosis have developed significantly during the last decade. advances in scanning technology, cost reduction, spatial image quality, and magnification have made full digitalization feasible for evaluating histopathologic tissues. digital pathology has multiple advantages, including remote consultation and sample analysis, which improve the availability of samples and remove the need for on-site experts. still, it requires manual inspection, and the inconsistent diagnostic decisions of individual pathologists, which affect diagnostic accuracy, remain unresolved. in addition, hospitals are short of the professional equipment and pathologists needed to support digital pathology. it is reported that presumptive treatment may be widespread in developing countries because of the lack of well-trained pathologists and professional equipment. moreover, the majority of the population can barely access pathology and laboratory medicine (plam) services. taking cancer and cardiovascular disease as examples, only a few, unevenly distributed communities can access plam services [ ] [ ] [ ] .
to better facilitate digital pathology, reduce hospital costs, and alleviate the problems mentioned above, various analysis methods (e.g., deep learning, machine learning, and dedicated software) have been proposed to enhance the accuracy and sensitivity of metastatic cancer detection [ ] [ ] [ ] [ ] [ ] . the convolutional neural network (cnn) is the most successful deep learning method in computer vision owing to its robust feature extraction ability. it has been widely used for diseases diagnosed with microscopy (e.g., alzheimer's disease) [ ] [ ] [ ] [ ] . a cnn automatically learns image features along multiple dimensions from a large image dataset; these features are used to identify or classify structures, which makes cnns applicable to many automated image-recognition tasks in biomedicine. cnn-based cancer detection has proved to be a convenient way to distinguish tumours from other cells or tissues and has demonstrated satisfactory results [ ] [ ] [ ] [ ] . efficientnet is one of the most potent cnn architectures; it uses a compound scaling method to enlarge network depth, width, and resolution, obtaining state-of-the-art performance on various benchmark datasets while requiring fewer computational resources than other models. hence, efficientnet may show significant potential for medical image classification, although medical images differ considerably from conventional images. however, few studies have explored the performance of efficientnet on medical images, which motivates this research.
in this work, we propose three strategies to improve the capability of efficientnet: developing a cropping method called random center cropping (rcc) to retain significant features in the center area of images, reducing the downsampling scale of efficientnet to suit the small-resolution images of the rpcam dataset, and integrating attention and feature fusion mechanisms with efficientnet to obtain features containing rich semantic information. this work has three main contributions: ( ) to the best of our knowledge, this is the first study to explore the power of efficientnet on mbc classification, and elaborate experiments are conducted to compare its performance with that of other state-of-the-art cnn models, which might inspire researchers interested in image-based diagnosis using dl; ( ) we propose a novel data augmentation method, rcc, to enrich small-resolution datasets; ( ) all four of our technical improvements boost the performance of the original efficientnet. the best accuracy and auc reach . % and . %, respectively, confirming the applicability of cnn-based methods to bc diagnosis. digital pathology has been widely employed for early cancer detection, classification, and treatment-response monitoring, since it can be deployed readily and alleviates the uneven distribution of medical experts to a certain extent while saving their valuable time. the manual process of recognizing mbcs requires a high level of expertise and many auxiliary examinations (e.g., bone scanning, liver ultrasonography, and chest radiography), and it is time-consuming. in addition, judgments may be affected by factors such as fatigue. because of the unexpectedly low accuracy of manual mbc detection, the arbitration of conflicting double-reading opinions in was put forward. improving diagnostic accuracy remains a challenge.
therefore, computer-aided diagnosis (cad) systems were adopted to assist pathologists in interpreting medical images and to mitigate the problems mentioned above. traditional machine learning (ml) methods played a crucial role in early cad-based mbc classification. in , wu et al. used three-layer, feed-forward artificial neural networks to diagnose bc on mammograms and obtained a roc value over %, which outperformed the average performance of attending and resident radiologists alone. in , quinlan applied the decision tree method to bc classification and demonstrated a . % classification accuracy using the c . decision tree with -fold cross-validation. besides, hamilton et al. showed a % accuracy via the riac method, while ster and dobnikar obtained a . % accuracy via linear discriminant analysis. furthermore, abonyi and szeifert adopted the supervised fuzzy clustering (sfc) technique and achieved a . % accuracy. however, the training and testing datasets in these works are small, leading to low generalization ability. with the rapid development of computer vision technology, computer hardware, and big data technology, image recognition based on dl has matured. since alexnet won the imagenet competition, an increasing number of convnets have been proposed (e.g., vgg , inception , resnet , densenet ), leading to significant advances in computer vision tasks, including image classification and object detection. deep convolutional neural network (dcnn) models can automatically learn image features, classify images in various fields, and generalize better than traditional ml methods; they can distinguish different types of cells, which allows other lesions to be diagnosed. this technology has also achieved remarkable advances in medical fields. in recent decades, many articles have been published on applying cnns to cancer detection and diagnosis. for instance, albayrak et al.
developed a cnn-based feature extraction algorithm to detect mitosis in bc histopathological images. in this algorithm, the cnn model was used to extract features for training a support vector machine (svm) for mitosis detection. dl has also proved useful in lung detection on various image modalities. dcnns were adopted to predict patients' survival time directly from lung cancer pathological images. moreover, other groups used dl methods for medical image classification and achieved good results [ ] [ ] [ ] [ ] . for bc detection and diagnosis, agarwal et al. released a cnn method for automated mass detection in digital mammograms, which used transfer learning with three pre-trained models (vgg , resnet , and inceptionv ). in , ribli et al. proposed a faster r-cnn based method for the detection and classification of bc masses. the evaluation of their model on the inbreast dataset showed an auc over %. besides, shayma'a et al. used alexnet and googlenet to test bc masses on the national cancer institute (nci) and mias databases. alexnet achieved an accuracy of . % with an auc of . % on the nci database and an accuracy of . % with an auc of . % on the mias database. in comparison, googlenet achieved a . % accuracy with a . % auc and an . % accuracy with a . % auc. also, alantari et al. presented a dl method covering detection, segmentation, and classification of bc masses from digital x-ray mammograms. they used the cnn architecture yolo and obtained . % accuracy and an auc of . %. the accuracy of the method was competitive with that of pathologists, with an auc of . %. tan et al. proposed efficientnet, a state-of-the-art dcnn that maintains competitive performance while requiring remarkably fewer computational resources for image recognition. they presented a systematic study to balance the network depth, width, and resolution.
efficientnet has been applied with great success to many benchmark datasets. researchers have also explored its capability in medical image classification. gonçalo marques et al. used efficientnet to support the diagnosis of covid- and demonstrated a . % accuracy. this work also uses efficientnet as the backbone, as some of the aforementioned works do, but we focus on the mbc task. in addition, unlike past works, which usually use bc mass datasets with large resolution, our work detects lymph node metastases in breast cancer, and the dataset resolution is small. to the best of our knowledge, no research has explored the performance of efficientnet in detecting lymph node metastases in breast cancer. therefore, this work aims to examine and improve the capacity of efficientnet in bc detection. the performance of dl models depends heavily on the scale and quality of the training dataset. a large dataset allows researchers to train deeper networks and improves the generalization ability of models, thus enhancing the performance of dl methods. however, establishing large datasets is time-consuming and costly. to cope with this problem, data augmentation has been proposed to enrich a dataset without collecting new data. cropping is one of the most commonly used data augmentation methods in computer vision and is adopted in our work. however, as mentioned in . , the features that distinguish metastasis are concentrated in the central area ( * ) of an image, so traditional cropping methods (random cropping and center cropping) may truncate or lose these essential areas. therefore, we propose a cropping method named random center cropping (rcc), which preserves the central * area while selecting peripheral pixels randomly, allowing dataset enrichment.
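the cropping rule described above can be sketched as follows. this is a minimal numpy illustration, not the authors' implementation: the function name `random_center_crop` and the parameters `crop_size` and `center_size` are hypothetical, and the sketch assumes that rcc samples a square window whose offset is constrained so that the central square is always fully contained.

```python
import numpy as np


def random_center_crop(image, crop_size, center_size, rng=None):
    """Randomly crop a square window that always contains the central square.

    image: array of shape (H, W) or (H, W, C).
    crop_size: side length of the returned window (hypothetical parameter).
    center_size: side length of the central area that must be preserved
        (hypothetical parameter; the source does not give its value here).
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    if not (0 < center_size <= crop_size <= min(height, width)):
        raise ValueError("need 0 < center_size <= crop_size <= min(H, W)")

    # Top-left corner of the central square.
    center_top = (height - center_size) // 2
    center_left = (width - center_size) // 2

    # Offsets for which the window stays inside the image and still
    # covers the whole central square.
    top_min = max(0, center_top + center_size - crop_size)
    top_max = min(center_top, height - crop_size)
    left_min = max(0, center_left + center_size - crop_size)
    left_max = min(center_left, width - crop_size)

    # Generator.integers excludes the upper bound, hence the + 1.
    top = int(rng.integers(top_min, top_max + 1))
    left = int(rng.integers(left_min, left_max + 1))
    return image[top:top + crop_size, left:left + crop_size]
```

because the offset is drawn only from positions that still cover the central square, each call returns different peripheral context while the central area is never truncated.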
apart from retaining the significant center areas, rcc keeps more pixels, which suits small-resolution images and enables deeper network architectures. this section describes our methods for improving the performance of efficientnet on the rpcam dataset. we reduce the downsampling scale to maintain semantic information at an appropriate level in the features. besides, feature fusion (ff) and attention mechanisms are embedded, which enhance feature representation ability and increase the response of vital features. there are eight types of efficientnet, from efficientnet-b to efficientnet-b , with increasing network scale. efficientnet-b is selected as our backbone because it outperforms the other variants in our experiments on the rpcam dataset. the architecture of boosted efficientnet-b is shown in figure . the main building block is mbconv. components in red dashed rectangles differ from the original efficientnet-b . images are first sent through blocks containing multiple convolutional layers to extract image features. then, these features are weighted by the attention mechanism to increase the response of features that contribute to classification. next, the feature fusion mechanism is applied, enabling features to retain some low-level information. finally, images are classified according to the fused features. figure . the architecture of boosted-efficientnet-b . efficientnet first extracts image features with its convolutional layers. the attention mechanism is then used to reweight features, increasing the activation of significant parts. next, we perform ff on the outputs of several convolutional layers. after that, images are classified based on the fused features. details of these methods are described in the following sections.
although efficientnet has demonstrated competitive performance in many tasks, we observe a disparity in image resolution between the designed model inputs and the rpcam dataset. most models set their input resolution to * or larger, which balances performance and time complexity. the network depth is also designed for that input size. this setting performs well on most well-known baseline image datasets (e.g., imagenet, pascal voc), as their resolutions are usually more than * . however, the resolution of the rpcam dataset is * , which is much smaller than the designed model input of * . after efficientnet processing, the final feature map is times smaller than the input (from * to * ). this feature map is likely to be too abstract and to lose low-level features, which may impair the performance of efficientnet. to mitigate this problem, we adjust the downsampling multiple in efficientnet. our idea is implemented by modifying the stride of a convolution kernel in efficientnet, even though the receptive field of the convolution kernels might be reduced. however, the effect of this reduction should be slight because the input resolution is small. to select the best-performing downsampling scale, multiple elaborate experiments are conducted on the downsampling scales { , , , , }, and strategy outperforms the other settings. the feature map at the best-performing downsampling scale ( ) is * , which has twice the side length of that at the original downsampling multiple ( ). the change of the downsampling scale from to is implemented by changing the stride of the first convolution layer from two to one, as shown in the red dashed rectangles on the left half of figure . when viewing a picture, the human visual system selectively focuses on a specific part of the picture while ignoring other visible information because of limited visual processing resources.
for example, although the sky covers most of the figure, people can readily pick out the aeroplane in the image (figure ). to simulate this process in artificial neural networks, the attention mechanism was proposed and has achieved great success in many tasks such as image captioning, image classification, and object detection. attention can be interpreted simply as a means of increasing the response of the most informative parts and suppressing the activation of others. for instance (figure ), the response of the background is large because most of the image is background. however, this information is usually useless for classification, so its response should be suppressed. on the other hand, cancerous tissue is more informative and deserves higher activation, so its response is enhanced after processing by the attention mechanism. as stated before, the most informative features lie in the center area of images in the rpcam dataset, making attention more critical for this work. hence, this project also adopts the attention mechanism, implemented by the squeeze-and-excitation block proposed by hu et al. briefly, the essential components are the squeeze and the excitation. suppose the feature maps $U$ have $C$ channels and the size of the feature in each channel is $H \times W$. for the squeeze operation, global average pooling is applied to $U$, enabling features to gain a global receptive field. after the squeeze operation, the size of the feature maps changes from $H \times W \times C$ to $1 \times 1 \times C$. the results are denoted as $z$. more precisely, this change is given by

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$

where $u_c$ denotes the $c$-th channel of $U$, and $F_{sq}$ is the squeeze function. following the squeeze operation, the excitation operation learns the weight (scalar) of each channel and is implemented by a simple gating mechanism. specifically, two fully connected layers are employed to learn the weights of features, and the sigmoid and relu activation functions are applied to add non-linearity.
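a minimal numpy sketch of the squeeze and excitation steps described here, under stated assumptions: the function name `squeeze_excite`, the bias-free fully connected layers, and the weight shapes (a bottleneck of hypothetical size `C // r`) are illustrative choices and are not taken from the authors' code.

```python
import numpy as np


def squeeze_excite(u, w1, w2):
    """Reweight the channels of a feature map with squeeze-and-excitation.

    u:  feature maps of shape (C, H, W).
    w1: weights of the first fully connected layer, shape (C // r, C).
    w2: weights of the second fully connected layer, shape (C, C // r).
    Biases are omitted for brevity (an assumption of this sketch).
    Returns the reweighted maps and the per-channel weights.
    """
    # Squeeze: global average pooling, (C, H, W) -> (C,).
    z = u.mean(axis=(1, 2))

    # Excitation: FC -> ReLU -> FC -> sigmoid, one weight per channel.
    hidden = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))

    # Scale every channel of the original maps by its weight.
    return u * s[:, None, None], s
```

channels whose weight is close to one pass through almost unchanged, while channels with a weight close to zero are suppressed, which is the reweighting behaviour the text attributes to attention.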
besides adding non-linearity, the sigmoid function also ensures that the weight falls in the range $[0, 1]$. the calculation process of the scalar (weight) is shown in equation ( ).

$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \, \delta(W_1 z))$$

where $z$ is the output of the squeeze operation, $s$ is the result of the excitation operation, $F_{ex}$ is the excitation function, and $g$ refers to the gating function. $\sigma$ and $\delta$ denote the sigmoid and relu functions, respectively. $W_1$ and $W_2$ are the learnable parameters of the two fully connected layers. the final output is calculated by multiplying the scalar $s$ with the original feature maps $U$. in our work, the attention mechanism is combined with the feature fusion technique, as shown in figure . high-level features generated by deeper convolutional layers contain rich semantic information, but they usually lose details such as positions and colors that are helpful for classification. conversely, low-level features include more detailed information but introduce non-specific noise. ff is a technique that combines low-level and high-level features and has been adopted in many image recognition tasks to improve performance. detailed information is more consequential in our work, since complex textures and contours exist in the rpcam images despite their small resolution. accordingly, we adopt the ff technique to boost classification accuracy. the ff technique involves four steps ( figure ): ( ) during the forward pass, we save the outputs (features) of the convolutional layers in the th , th , th and th blocks. ( ) after the last convolutional layer extracts features, the attention mechanism is applied to the features recorded in step one to emphasize the essential information. ( ) low-level and high-level features are combined using the outputs of step after the attention mechanism. ( ) the fused features are then sent to the following layers for classification. this section first introduces the evaluation metrics used to verify the performance of our methods. implementation details are then described.
next, we present the performance of boosted efficientnet and compare it with other state-of-the-art models. after that, the influence of each method is investigated via ablation studies. overall, elaborate experiments are conducted to explore the effectiveness of the boosted efficientnet. we evaluate our method on the rectified patchcamelyon (rpcam) dataset. since the testing set is not provided, we split the original training set into a training set and a validation set and use the validation set to verify the performance of models. in detail, models are evaluated by five indicators: area under the curve (auc), accuracy (acc), sensitivity (sen), specificity (spe), and f-measure. our method is built on the efficientnet-b model and implemented with the pytorch deep learning framework using python. four gtx ti gpus are employed to accelerate training. all models are trained for epochs. the gradient optimizer is adam. before being fed into the network, images are normalized by the mean and standard deviation of their rgb channels. in addition to rcc, we also employ random horizontal and vertical flipping at training time to enrich the dataset. during training, the initial learning rate is . and is decayed by a factor of at the th and rd epochs. the batch size is set to . the parameter counts of boosted efficientnet and the other compared models are kept as close as possible to make the comparison credible. in detail, the parameter sizes of the three models increase in the order of the improved efficientnet, densenet , and resnet . experiments are conducted on the basic efficientnet and boosted-efficientnet to evaluate the effectiveness of our methods. moreover, we compare boosted efficientnet with two other state-of-the-art cnn models, resnet and densenet , to further demonstrate its superiority. the results are shown in table and figure .
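the count-based indicators used in this evaluation (acc, sen, spe, and f-measure) can be computed from confusion-matrix counts. the sketch below is illustrative only: the function name `binary_metrics` is hypothetical, the f-measure is assumed to be the balanced f1 score, and auc is omitted because it needs continuous prediction scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, and F1 from counts.

    tp, fp, tn, fn: numbers of true positives, false positives,
    true negatives, and false negatives for the positive (tumour) class.
    Assumes every denominator below is non-zero.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total          # fraction of correct predictions
    sen = tp / (tp + fn)             # sensitivity (recall on positives)
    spe = tn / (tn + fp)             # specificity (recall on negatives)
    precision = tp / (tp + fp)
    # F1: harmonic mean of precision and sensitivity (assumed F-measure).
    f1 = 2 * precision * sen / (precision + sen)
    return {"acc": acc, "sen": sen, "spe": spe, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, used only to show the call.
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

reporting sensitivity and specificity alongside accuracy shows whether a gain in accuracy comes from fewer missed metastases or from fewer false alarms.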
it can be seen that basic efficientnet outperforms boosted-efficientnet-b on the training set in both acc and auc, while the opposite pattern appears on the testing set. the main reason is that basic efficientnet overfits the training set, whereas boosted-efficientnet-b mitigates overfitting because rcc crops images randomly and thus improves the diversity of the training images. although improving a model that already performs well is difficult, compared with basic efficientnet-b , boosted-efficientnet-b significantly improves acc from . % to . % and modestly raises auc from . % to . %. besides, increases of more than % can be seen in sen, spe, and f-measure. the same patterns appear when efficientnet-b is compared with other cnn architectures. notably, resnet and densenet suffer significantly from overfitting. efficientnet-b performs better than resnet and densenet on all indicators on the testing dataset while using fewer parameters and computational resources (figure ). all these results confirm the capability of our methods, and we believe these methods can boost other state-of-the-art backbone networks. therefore, we intend to extend the application scope of these methods in the future. ablation studies are conducted to illustrate the effectiveness and coupling degree of the four methods, as shown in section . . in this part, we conduct ablation experiments to illustrate the capacity of our methods, including random center cropping (rcc), reducing the downsampling scale (rds), feature fusion (ff), and attention. auc and acc are used as the primary evaluation metrics. from the first two rows of table , we can observe that rcc significantly improves performance: auc increases from . % to . %, and acc increases from . % to .
% because rcc enhances the diversity of training images and mitigates the overfitting problem. as the first and third rows of table show, modest improvements in acc and auc ( . % and . %, respectively) result from the larger feature map. the image resolution of the rpcam dataset is much lower than the designed input of efficientnet-b , which yields smaller and more abstract features and thus impairs performance. it is worth noting that the improvement from rds is enhanced when it is combined with rcc. feature fusion (ff) combines low-level and high-level features to boost the performance of models. as shown in table , when only one mechanism is adopted, ff yields the largest auc increase and the second-highest acc increase among rcc, rds, and ff, indicating ff's adaptability and effectiveness in efficientnet. ff contributes a more remarkable improvement once rcc and rds are in use, since acc reaches the highest value and auc ranks second among all methods. it should be emphasized that the attention mechanism needs to be combined with ff in our work. using the attention mechanism to enhance the response of cancerous tissues and suppress the background can further boost performance. from the th and th rows of table , it can be seen that the attention mechanism improves the performance of the original architecture in both acc and auc, confirming its effectiveness. we then analyze the last four rows. when the first three strategies are employed, adding attention increases auc by . %, but acc remains at . %. meanwhile, attention brings a significant improvement over models that use only rcc and ff, since acc and auc increase from . % to . % and from . % to . %, respectively. although the model using all methods has the same auc as the model using only rcc, rds, and ff, it shows a . % acc improvement.
a possible reason for the minor improvement between these two models is that rds enlarges the final feature maps and thus preserves some low-level information, which is similar to the effect of ff and the attention mechanism. the purpose of this project is to facilitate the development of digital diagnosis of mbcs and to explore the applicability of a novel cnn architecture, efficientnet, to mbc. in this paper, we propose a boosted efficientnet cnn architecture to automatically detect the presence of cancer cells in the pathological tissue of breast cancers. we develop a data augmentation method, rcc, to retain the most informative parts of images and maintain the original image resolution. experiments demonstrate that this method significantly improves the performance of efficientnet-b . in addition, we propose reducing the downsampling scale of basic efficientnet by adjusting the architecture of efficientnet-b to better suit small-resolution training images. moreover, two mechanisms are employed to enrich the semantic information of features. as shown in the ablation studies, both of these methods boost the basic efficientnet-b , and more remarkable improvements can be obtained by combining some of them. boosted-efficientnet-b is also compared with two other state-of-the-art cnn architectures, resnet and densenet , and shows superior performance. we believe that our methods can be applied to other models and lead to improved performance in diagnosing other diseases, and we will explore this in the future. in summary, our boosted efficientnet-b achieves an accuracy of . % and an auc of . %, and hence may provide a reliable, efficient, and economical alternative for medical institutions in relevant areas. all data generated or analyzed during this study are included in this published article and its supplementary information files. the authors declare that they have no competing interests.
key: cord- -azpg yrh authors: mead, dylan j.t.; lunagomez, simón; gatherer, derek title: visualization of protein sequence space with force-directed graphs, and their application to the choice of target-template pairs for homology modelling date: - - journal: j mol graph model doi: . /j.jmgm. . .
sha: doc_id: cord_uid: azpg yrh the protein sequence-structure gap results from the contrast between rapid, low-cost deep sequencing, and slow, expensive experimental structure determination techniques. comparative homology modelling may have the potential to close this gap by predicting protein structure in target sequences using existing experimentally solved structures as templates. this paper presents the first use of force-directed graphs for the visualization of sequence space in two dimensions, and applies them to the choice of suitable rna-dependent rna polymerase (rdrp) target-template pairs within human-infective rna virus genera. measures of centrality in protein sequence space for each genus were also derived and used to identify centroid nearest-neighbour sequences (cnns) potentially useful for production of homology models most representative of their genera. homology modelling was then carried out for target-template pairs in different species, different genera and different families, and model quality assessed using several metrics. reconstructed ancestral rdrp sequences for individual genera were also used as templates for the production of ancestral rdrp homology models. high quality ancestral rdrp models were consistently produced, as were good quality models for target-template pairs in the same genus. homology modelling between genera in the same family produced mixed results and inter-family modelling was unreliable. we present a protocol for the production of optimal rdrp homology models for use in further experiments, e.g. docking to discover novel anti-viral compounds. since high-throughput sequencing technologies entered mainstream use towards the end of the first decade of the st century, there has been an explosion in available protein sequences. by contrast, there has been no corresponding high-throughput revolution in structural biology. obtaining solved structures of proteins at adequate resolution remains a painstaking task.
x-ray crystallography is still the gold standard for structure determination more than years after its first use in determining myoglobin structure [ ]. the result of this discrepancy between the rate of protein sequence determination and the rate of protein structure determination is the protein sequence-structure gap [ ]. homology modelling is a rapid computational technique for prediction of a protein's structure from (a) the protein's sequence, and (b) a solved structure of a related protein, referred to as the target and the template, respectively. since structural similarity often exists even where sequence similarity is low [ , ], homology modelling has the potential to reduce massively the size of the protein sequence-structure gap, provided the models produced can be considered reliable enough for use in further research. the rna-dependent rna polymerase (rdrp) of rna viruses presents an opportunity to test and expand this approach. rdrps are the best conserved proteins throughout the rna viruses, being essential for their replication [ ]. conservation is particularly high in structural regions that are involved in the replication process, for instance the indispensable rna-binding pocket [ ]. rdrps are also of immense medical importance as the principal targets for antiviral drugs. evolution of resistance against anti-viral drugs is a major concern for the future, and the design of novel anti-viral compounds is a highly active research area. solved structures of rdrps are of great assistance to these efforts, as they enable the use of docking protocols against large libraries of pharmaceutical candidate compounds [e.g. refs. [ , ]]. although some human-infective rna viruses have solved rdrp structures, there are still large areas within the virus taxonomy that lack any. this paper will first identify where the protein sequence-structure gap is at its widest in rdrps.
because of the sequence-structure gap, it is impossible in many genera to perform docking protocols against solved structures of rdrp for discovery of novel anti-viral compounds. under these circumstances, replacement of real solved structures with homology models for docking experiments requires that the homology models used should be both high quality and also optimally representative of their respective genera. our second task is to present several similarity metrics in sequence space that assist in the identification of the virus species having the rdrp sequence that is most representative of its genus as a whole. we then present the first use of force-directed graphs to produce an intuitive visualization of sequence space, and select target rdrps without solved structures for homology modelling. these are then used to perform homology modelling using template-target pairs within the same genus, between sister genera and between sister families, monitoring the quality of the models produced as the template becomes progressively more genetically distant from the target sequence being modelled. finally, we produce homology models for reconstructed common ancestral rdrp sequences. in the light of our results, we comment on the strengths and weaknesses of homology modelling to reduce the size of the protein sequence-structure gap for rdrps, and produce a flowchart of recommendations for docking experiments on rdrp proteins lacking a solved structure. we chose rdrps from human-infective viruses based on the list provided by woolhouse & brierley [ ]. given the global medical importance of aids, we also included lentivirus reverse transcriptases (rts) for analysis. solved structures for these proteins, where available, were downloaded from the rcsb protein data bank (pdb) [ ]. table presents our criteria for selecting suitable homology modelling candidates.
rdrp and rt amino acid sequences for all virus species satisfying the criteria of table were downloaded from genbank [ ]. alignment of sequence sets for each genus was performed using mafft [ ]. alignments were refined in mega [ ] using muscle [ ] where necessary, and the best substitution model determined. alignment of target sequences onto their solved structure templates for homology modelling was carried out using the molecular operating environment (moe v. . , chemical computing group, montreal h a r , canada). we define sequence space as a theoretical multi-dimensional space within which protein sequences may be represented by points. for an alignment of n related proteins, the necessary dimensionality of this sequence space is n-1, with the hyperspatial co-ordinates in each dimension for any protein determined by its genetic distance to the n-1 other proteins. for n = , direct visualization of all dimensions of sequence space is impractical at best, since a -dimensional space must be simulated in three dimensions, and is effectively impossible for n ≥ . the following methods were used to reduce sequence space to two and three dimensions for ease of visualization. to simplify calculations, we allow an extra dimension defined by the distance from each sequence to itself. the value of the co-ordinate in that dimension is always zero and our sequence space has n dimensions rather than n-1. the pairwise distance matrix ($m_d$) for each genus, calculated from the sequence alignment in mega, consists of entries $m_d(i,j)$ giving the genetic distance between each pair of sequences i and j, where $i, j \in \{1, \dots, n\}$ and $i \neq j$, for a set of n sequences. the range of n in our data set is given in the supplementary table. the distance matrix was converted into a similarity matrix, which was then used as input for r package qgraph [ ].
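the structure of the pairwise distance matrix can be illustrated with a minimal python sketch. it assumes a simple p-distance (the proportion of aligned sites that differ) in place of the model-based genetic distances computed in mega, and the toy alignment and function names are hypothetical. each row of the resulting matrix is the co-ordinate vector of one sequence in the n-dimensional sequence space, with a zero on the diagonal for the distance of a sequence to itself.

```python
def p_distance(a, b):
    """Proportion of aligned sites that differ, skipping sites gapped in either sequence.

    Assumption: p-distance is a stand-in for the substitution-model distances
    that MEGA would report; it is used here only to keep the sketch short.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Symmetric n x n matrix m_d with m_d[i][i] == 0 (the extra self-distance dimension)."""
    n = len(alignment)
    m_d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m_d[i][j] = m_d[j][i] = p_distance(alignment[i], alignment[j])
    return m_d


# Hypothetical toy alignment of four short sequences.
alignment = ["MKVLA", "MKVLG", "MRILG", "M-ILG"]
m_d = distance_matrix(alignment)
for row in m_d:
    print(row)  # row i = co-ordinates of sequence i in sequence space
```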
the "spring" layout option was chosen, which uses the fruchterman-reingold algorithm to produce a two-dimensional undirected graph in which edge thickness is proportional to absolute distance in n dimensions and node proximity in two dimensions is optimized for ease of viewing while attempting to ensure that those nodes closely related in the n-dimensional input are also close in the two-dimensional output [ ]. iterations were performed, or until convergence was achieved. for each alignment, the pairwise distance matrix ($m_d$) was used as input for the r function cmdscale, which uses multi-dimensional scaling to produce a three-dimensional graph from the n-dimensional input, with node proximity again reflecting relative similarity [ ]. spotfire analyst (tibco spotfire analyst, v. . . , ) was used to visualize the output of cmdscale. we define the centroid as a hypothetical protein sequence located at the centre point of the sequence space of an alignment. the real sequence closest to the hypothetical centroid is termed the centroid nearest neighbour (cnn). we calculate the position of the cnn in three ways.

table list of criteria used to select rna-dependent rna polymerases (rdrps) for homology modelling (criterion: purpose).
human-infective virus: importance to human health
ncbi refseq annotated genome: easy retrieval of high quality rdrp sequence
rdrp located at the end of polyprotein or on its own segment: eliminates unconventional rdrps
at least one solved rdrp at a range of different taxonomic levels, e.g. in same species, same genus, same family, same order: to be used as the templates in homology modelling at different levels of genetic distance

shortest-path centroid nearest neighbour
for a sequence $i \in \{1, \dots, n\}$ in an alignment of n sequences, its total path length d(i) to the other n-1 sequences may be calculated from the distance matrix $m_d$ as follows: $d(i) = \sum_{j=1}^{n} m_d(i,j)$, where the self-distance term $m_d(i,i)$ is zero.
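the total path length can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the authors' code: the distance matrix is a hypothetical toy example, and the function names are invented for the sketch. the index with the smallest total path length is taken as the shortest-path cnn.

```python
def total_path_lengths(m_d):
    """d(i): sum over j of m_d[i][j]; the zero self-distance is left in the sum."""
    return [sum(row) for row in m_d]


def shortest_path_cnn(m_d):
    """Return (i_star, d), where i_star is the index minimising d(i).

    If several sequences tie, the first one encountered is returned.
    """
    d = total_path_lengths(m_d)
    i_star = min(range(len(d)), key=d.__getitem__)
    return i_star, d


# Hypothetical symmetric distance matrix for four sequences.
m_d = [
    [0.0,  0.25, 0.75, 0.5],
    [0.25, 0.0,  0.25, 0.5],
    [0.75, 0.25, 0.0,  0.125],
    [0.5,  0.5,  0.125, 0.0],
]
i_star, d = shortest_path_cnn(m_d)
print(i_star, d)
```

in this toy matrix the second sequence has the smallest total path length, so it would be reported as the shortest-path cnn.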
the zero self-distance term may be omitted to enforce a strict n-1 dimensions for n input sequences, but we leave it in to simplify subsequent calculations. we define i* as the index that minimizes d(i). the shortest-path cnn is therefore sequence i*. for alignments where clusters of closely related sequences exist, giving many values of $m_d(i,j)$ close to zero, this method will tend to place the cnn within a cluster. to overcome this problem, the arithmetic mean and median, respectively, were used to determine the mean cnn and the median cnn. the values of d(i) may be averaged to produce the mean total path distance $\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d(i)$, where again n is the total number of sequences in the alignment. we now re-define i* as the index that minimizes $|d(i) - \bar{d}|$. in the event of this difference being zero, the mean cnn and the true centroid are identical. as with all variables using means, the mean cnn is liable to skewing by outliers. we generate a vector d over $i \in \{1, \dots, n\}$, in which each entry d(i) represents the total path length for sequence i. the values of vector d are then ranked in ascending order, $x_{s(1)}$ to $x_{s(n)}$, to produce vector $d_s$. the median cnn is the sequence with value d(i) situated in the middle of the array $d_s$, at d(m), where d(m) is either $d(m_{odd})$ or $d(m_{even})$ for alignments with odd or even numbers of sequences respectively. we now re-define i* as the index that minimizes $|d(i) - d(m)|$. again, in the event of this difference being zero, the median cnn and the true centroid are identical. as with all variables using medians, the median cnn is liable to skewing by the presence in the alignment of multiple sequences with the same value of d(i). the choice of solved structures as templates for homology modelling, and the choice of targets to be modelled, within each genus was governed by the following rules: (1) for each genus the solved structure that covered the highest proportion of the rdrp or rt sequence was chosen as the template for that genus.
(2) if more than one candidate template structure was found at this sequence length, the structure with the lowest resolution value in angstroms was selected. see table for the templates satisfying these two criteria. (3) within each genus, the sequence with the greatest genetic distance from the template was chosen as the target for homology modelling. see table for the template-target pairs satisfying this criterion. (4) the preceding criterion was applied to find template-target pairs in different genera (see table ) and different families (see table ), thus testing the limits of homology modelling at high genetic distances. homology modelling was carried out using the molecular operating environment (moe v. . , chemical computing group, montreal h a r , canada). ten intermediate models were produced using the amber :eht forcefield under medium refinement. the model that scored best under the generalised born/volume integral (gb/vi) was selected to undergo further energy minimisation using protonate3d, which predicts the location of hydrogen atoms using the model's 3d coordinates [ , ]. to assess the stereochemical quality of the homology models produced, ramachandran plots were derived in moe, and used to calculate the proportion of bad outlier φ-ψ angles in the model, after subtraction of the number of outlier φ-ψ angles in the template. generally, an outlier angle percentage below . % indicates a very high quality model, and a percentage below % indicates a good quality model [ ]. models were superposed with their templates in moe and the root-mean-square deviation (rmsd) value derived for the alpha carbons (cα) in the two structures. generally, an rmsd below Å indicates a good quality model [ ]. qualitative model energy analysis (qmean) was used to analyse models using both statistical and predictive methods [ ]. the qmean z-score is an overall measure of the quality of the model when compared to similar models from a pdb reference set of x-ray crystallography-solved structures.
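the template and target selection rules given earlier in this section can be sketched in python as follows. this is a minimal illustration only: the `Structure` record, its field names, the identifiers and all numerical values are hypothetical, and the distance matrix is a toy example rather than data from this study.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    pdb_id: str        # hypothetical identifier, not a real PDB entry
    seq_index: int     # row of this sequence in the genus distance matrix
    coverage: float    # fraction of the RdRp/RT sequence covered by the structure
    resolution: float  # resolution in angstroms (smaller value = finer detail)


def choose_template(structures):
    """Highest coverage first; ties broken by the lowest resolution value."""
    return min(structures, key=lambda s: (-s.coverage, s.resolution))


def choose_target(m_d, template_index):
    """Index of the sequence with the greatest genetic distance from the template."""
    row = m_d[template_index]
    candidates = [j for j in range(len(row)) if j != template_index]
    return max(candidates, key=lambda j: row[j])


# Hypothetical candidate structures and toy distance matrix for one genus.
structures = [
    Structure("AAAA", seq_index=0, coverage=0.95, resolution=2.5),
    Structure("BBBB", seq_index=2, coverage=0.95, resolution=1.9),
    Structure("CCCC", seq_index=1, coverage=0.60, resolution=1.5),
]
m_d = [
    [0.0,  0.25, 0.75, 0.5],
    [0.25, 0.0,  0.25, 0.5],
    [0.75, 0.25, 0.0,  0.125],
    [0.5,  0.5,  0.125, 0.0],
]
template = choose_template(structures)
target = choose_target(m_d, template.seq_index)
print(template.pdb_id, target)
```

here the two structures with equal coverage are separated by resolution, and the target is the sequence furthest from the chosen template in the toy matrix.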
a z-score of would indicate a model of the same quality as a similar high quality x-ray crystallographic structure, while a z-score below − . indicates a low quality model [ ]. maximum likelihood (ml) trees [ ] were produced for each genus in mega. the ml tree and the corresponding multiple sequence alignment were input into the ancestral reconstruction server, fastml [ ]. the reconstructed sequence for the root of the tree, i.e. the putative common ancestor rdrp or rt sequence for the genus, was used as the target for homology modelling in moe, using the template chosen according to the rules in section . . the reconstructed ancestral sequence was added to the alignment and the force-directed graph re-drawn. fig. b, showing the target-template pairs for homology modelling, may be compared with fig. c, showing the ancestor-template pairs. our first observation is that there are still large areas of the viral taxonomy where no solved rdrp structures exist. no suitable templates for homology modelling were found within the entire nidovirales order of rna viruses. this order contains several coronaviruses important to human health including severe acute respiratory syndrome-related coronavirus (sars-cov) and middle east respiratory syndrome-related coronavirus (mers-cov) [ ]. in the order mononegavirales, vesiculovirus was the only genus with a solved rdrp structure suitable for homology modelling. however, this order contains many medically important viruses such as zaire ebolavirus, hendra henipavirus, measles morbillivirus, and mumps rubulavirus [ ]. in the order bunyavirales, phenuiviridae stands out as an important family lacking a solved rdrp, despite it containing various human-infective arboviruses such as rift valley fever phlebovirus and sandfly fever naples phlebovirus [ ] (table ). fig. shows two-dimensional force-directed graphs of similarity for each genus with more than four rdrp reference sequences (or rt sequences in the case of lentivirus).
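the idea behind a force-directed layout can be sketched in python as follows. this is a simplified fruchterman-reingold style loop written for illustration, not the implementation used by qgraph: the weighting of the attractive force by similarity, the circular starting positions, the cooling schedule, the frame size and the toy similarity matrix are all assumptions of the sketch.

```python
import math


def spring_layout(similarity, iterations=200, width=2.0):
    """Place n nodes in 2D so that highly similar nodes are pulled together.

    Every pair of nodes repels with force k^2/dist; every pair attracts with
    force similarity * dist^2 / k. A falling 'temperature' caps the step size.
    """
    n = len(similarity)
    k = width / math.sqrt(n)  # ideal spacing between nodes
    half = width / 2
    # Deterministic start: nodes evenly spaced on a circle.
    pos = [[math.cos(2 * math.pi * i / n) * width / 4,
            math.sin(2 * math.pi * i / n) * width / 4] for i in range(n)]
    t0 = width / 10
    for step in range(iterations):
        temperature = t0 * (1.0 - step / iterations)
        disp = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                net = (k * k / dist) - similarity[i][j] * dist * dist / k
                disp[i][0] += dx / dist * net
                disp[i][1] += dy / dist * net
                disp[j][0] -= dx / dist * net
                disp[j][1] -= dy / dist * net
        for i in range(n):
            length = max(math.hypot(disp[i][0], disp[i][1]), 1e-9)
            move = min(length, temperature)
            # Move along the net force, then clamp to the drawing frame.
            pos[i][0] = max(-half, min(half, pos[i][0] + disp[i][0] / length * move))
            pos[i][1] = max(-half, min(half, pos[i][1] + disp[i][1] / length * move))
    return pos


# Hypothetical similarity matrix: sequences 0-1 and 2-3 form two close pairs.
similarity = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
]
positions = spring_layout(similarity)
for node, (x, y) in enumerate(positions):
    print(node, round(x, 3), round(y, 3))
```

the sketch returns only node co-ordinates; drawing edges with thickness proportional to similarity, as in the figures, would be a separate plotting step.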
in principle, it would be possible to draw force-directed graphs for entire families and even orders. however, the input to qgraph is the similarity matrix calculated from the distance matrix, and the distance matrix is calculated in mega from an alignment. once taxonomic distance begins to extend beyond genera, alignment becomes progressively less reliable, with all the downstream statistics tending to degrade as a consequence. we therefore confine our construction of force-directed graphs to intra-genus comparisons. it is evident from fig. that sequences are not necessarily evenly distributed in sequence space. clustering is noticeable in the genus flavivirus, with two sub-groups and an outlier sequence evident. mammarenavirus also shows division into two sub-groups. by contrast, picobirnavirus has only five relatively equidistant reference sequences, thus producing a highly regular pentagram. similarly, rotavirus has eight reference sequences, with four at each end of a fairly regular cuboid. fig. a also shows how the various methods (equations ( )–( )) for determining the cnn of sequence space for each genus are in poor agreement. only in rotavirus and picobirnavirus are mean and median cnns found in the same sequence. fig. a also shows that the best solved structure for the purposes of template choice in homology modelling is rarely close to the centre of sequence space. only in lentivirus is the optimal template also the mean cnn, and only in vesiculovirus is the optimal template a shortest-path cnn. fig. b shows the relations of the template-target pairs in sequence space, illustrating how intra-genus homology modelling template-target selection attempts to traverse the largest genetic distance available within the genus.

table solved structures of rdrps and reverse transcriptase (for hiv- ) selected as templates for homology modelling. all are derived by x-ray crystallography except a which is a cryo-electron microscopy structure. for protein coverage, indicates that the template covers more than % of the sequence, indicates less. for φ-ψ outliers and qmean z-score, indicates good-quality, indicates poor-quality, determined by the following thresholds: φ-ψ = %, qmean z-score = − . .

table homology modelling at intra-genus, inter-species level. templates are as given in table . targets are the rdrp (or reverse transcriptase for lentivirus) sequences from the reference genome accession numbers given. rmsd: root mean square deviation in angstroms between template and model when superposed in moe. indicates good quality, indicates poor quality, determined by the following thresholds: φ-ψ < %; qmean z-score > − . ; rmsd < Å. indicates good quality, but using a partial template (see table ). *imjin thottimvirus was reclassified in by the international committee on taxonomy of viruses (ictv) in a new genus thottimvirus.

table homology modelling at intra-family, inter-genus level. templates are as given in table . targets are the rdrp (or reverse transcriptase for spumavirus) sequences from the reference genome accession numbers given. rmsd: root mean square deviation in angstroms between template and model when superposed in moe. indicates good-quality, indicates poor-quality, determined by the following thresholds: φ-ψ < %; qmean z-score > − . ; rmsd < Å.

table homology modelling at intra-order, inter-family level. templates are as given in table . targets are the rdrp (or reverse transcriptase for lentivirus) sequences from the reference genome accession numbers given. rmsd: root mean square deviation in angstroms between template and model when superposed in moe. indicates good-quality, indicates poor-quality, determined by the following thresholds: φ-ψ < %; qmean z-score > − . ; rmsd < Å.

fig. . force-directed graph visualisations of similarity of rdrps (or reverse transcriptase for lentivirus) within genera. the genetic distance matrix for each alignment was converted into a similarity matrix (equations ( ) and ( )). the fruchterman-reingold algorithm ( minimisation iterations) was implemented in r package qgraph to produce a force-directed graph. relative similarity is represented by node proximity, and absolute similarity is proportional to edge thickness. the solved structure and the three types of centroid nearest neighbour (cnn) sequences are highlighted. the species names corresponding to the numbered nodes are listed in the supplementary table. cardiovirus has fewer than four reference sequences and is omitted. a: location of solved structure and the three cnns in sequence space (equations ( )–( )). some genera have two median cnns.

figs. and compare, for genera orthohantavirus and mammarenavirus respectively, the force-directed graphs of fig. with the three-dimensional equivalent output of multidimensional scaling. fig. shows a sequence clustering within orthohantavirus that is not readily apparent in the force-directed graph. the cnns are distributed among four clusters, as there is no sequence close to the geometrical centre of the three-dimensional space, where the notional centroid is located. the solved structure has other sequences in its proximity in the three-dimensional space, roughly equivalent to the lower right quadrant of the two-dimensional force-directed graph. similarly, the shortest-path cnn and mean cnn are both located within another three-dimensional cluster also containing sequences, which is roughly equivalent to the upper right quadrant of the two-dimensional force-directed graph. fig. presents a similar picture for mammarenavirus. the force-directed graph for mammarenavirus has more obvious clustering than for orthohantavirus, showing a lower-left to top-right split. in the three-dimensional representation, these are equivalent, respectively, to the three clusters on the right and two clusters on the left.
as with orthohantavirus, there is no cnn near the geometrical centre of the three-dimensional space, but the cnns are distributed around two clusters. three dimensional representations of all the genera in fig. are available from the link in the raw data section. homology modelling was carried out as follows: ( ) intra-genus, inter-species ( models, table ) ( ) intra-family, inter-genus ( models, table ) ( ) intra-order, inter-family ( models, table ) ( ) intra-genus, on reconstructed common ancestor ( models, table ). table shows that homology modelling with template and target within the same genus produced good quality models in most cases, as judged by percentage of φ-ψ outliers and rmsd within the high quality range. only the models for american bat vesiculovirus and tamana bat virus have percentages of φ-ψ outliers outside of the high quality range. qmean, however, is rather more critical of the output, with only the model for porcine picobirnavirus falling within the high quality range. the model for imjin thottimvirus scores eighth best on percentage of φ-ψ outliers and second best on rmsd, despite the re-classification (occurring after the completion of our experimental work) by the ictv of this virus, originally in genus orthohantavirus, into a new thottimvirus genus [ ]. it should be noted that the models for imjin thottimvirus, burana orthonairovirus and brazilian mammarenavirus were based on very short template structures (see table ). table shows that homology modelling with template and target within the same family but different genera still produced good quality models in most cases, as judged by percentage of φ-ψ outliers and rmsd within the high quality range. only the models for lleida bat lyssavirus and macaque simian foamy virus have percentages of φ-ψ outliers outside of the high quality range. however, once again, qmean assesses all models as outside the high quality range.
table shows that homology modelling with template and target within the same order but in different families is a far more difficult proposition than at the lower taxonomic levels. the model for mammalian orthobornavirus fails all three quality tests and only the model for rift valley fever phlebovirus manages to pass two out of three. table shows that modelling the structure of the reconstructed sequence of the common ancestor of each genus produces models of the same standard as intra-genus modelling (compare tables and ). by contrast with almost all the other models, the qmean scores are within the high quality range, with only two exceptions, the common ancestors of genera rotavirus and vesiculovirus. fig. c shows the force-directed graphs with the locations of the ancestral sequences added. table summarises the results of tables – inclusive. as the taxonomical distance increases, production of high quality homology models becomes more difficult. however, modelling the reconstructed ancestral sequence of each genus is typically productive of a better scoring model even than the real sequence targets chosen for intra-genus modelling. fig. shows representative examples of homology models of high and low quality superimposed with their template solved structure along with their corresponding ramachandran plots and qmean quality scores. all homology models in tables – are available from the link in the raw data section. the first objective of this study was to identify viral taxa which are comparatively lacking in solved structures for rna-dependent rna polymerase (rdrp). we observed that the entire order nidovirales, the families bornaviridae, filoviridae and paramyxoviridae within the order mononegavirales, and the family phenuiviridae within the order bunyavirales, fall into this category.
additionally, within the genera orthohantavirus, orthonairovirus and mammarenavirus, all within the order bunyavirales, the solved structure available for rdrp covers less than % of the protein sequence. given the medical importance of many viruses within these taxa, and the number of anti-viral drugs that target rdrps, we suggest that they are prioritized for x-ray crystallography to close the "sequence-structure gap". our second objective was to assess how well homology modelling could provide models that might serve for computer-assisted drug discovery of novel anti-viral compounds. to assist in the visualization of sequence space, we produced the first application of force-directed graphs to protein sequences (fig. ). we also applied multidimensional scaling for comparative purposes (figs. and ). force-directed graphs enable the visualization of complex data in two dimensions. the three dimensional visualization produced from multidimensional scaling is visually richer, but this benefit can only be appreciated when a viewing application such as spotfire is available so that the three-dimensional image can be rotated. force-directed graphs convey much of the information in a single image which may be printed on a page or viewed on screen. this two-dimensional collapsing of sequence space also allows for easy simultaneous comparison of multiple datasets, in the present case multiple genera, which cannot readily be performed if separate three-dimensional viewers need to be open. the most common method of visualizing sequence space is the phylogenetic tree. for instance, starting from a distance matrix, agglomerative hierarchical clustering, such as the upgma method [ ], can be performed to generate a tree. slightly more sophisticated methods, such as neighbour-joining [ ], can generate trees where the branch lengths are proportional to genetic distance.
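as a point of comparison with the force-directed graphs, agglomerative clustering of a distance matrix by upgma can be sketched in python as follows. this is a minimal textbook-style illustration, not code used in this study: the toy distance matrix, the labels and the nested-tuple tree representation `(left, right, height)` are assumptions of the sketch.

```python
def upgma(m_d, labels):
    """Repeatedly merge the two closest clusters; return a nested (left, right, height) tree.

    The distance from a merged cluster to any other cluster is the
    size-weighted average of the two old distances (the UPGMA rule).
    """
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (tree, size)
    dist = {frozenset((i, j)): m_d[i][j]
            for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)            # closest pair of clusters
        a, b = sorted(pair)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        height = dist.pop(pair) / 2               # node height = half the merge distance
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((next_id, c))] = (
                (size_a * d_ac + size_b * d_bc) / (size_a + size_b))
        clusters[next_id] = ((tree_a, tree_b, height), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical symmetric distance matrix for four sequences A-D.
m_d = [
    [0.0,  0.25, 0.75, 0.5],
    [0.25, 0.0,  0.25, 0.5],
    [0.75, 0.25, 0.0,  0.125],
    [0.5,  0.5,  0.125, 0.0],
]
tree = upgma(m_d, ["A", "B", "C", "D"])
print(tree)
```

the output is a single rooted tree per data set, which illustrates why several genera are harder to compare side by side as trees than as two-dimensional graphs.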
force-directed graphs do not represent genetic distance as accurately as phylogenetic trees, since the distances between nodes, although optimized to reflect relatedness, are constrained by the fruchterman-reingold algorithm to the best representation in two dimensions. however, force-directed graphs again allow easier simultaneous comparison of several data sets than phylogenetic trees. fig. would be impossible to create on a single page if trees were used instead of force-directed graphs. trees represent ancestral sequences as nodes on the tree, with only existing taxa as leaves. force-directed graphs, by contrast, allow ancestral sequences to be represented in the same way as existing ones. fig. c shows that ancestral sequences do not necessarily appear as outliers in force-directed graphs. indeed, for genera flavivirus, hepacivirus, orthobunyavirus and orthohantavirus in particular, the insertion of the reconstructed ancestral sequence into the force-directed graph in fig. c does not overly distort its original shape in fig. a-b. the reason for this becomes apparent when one considers a phylogenetic tree represented in unrooted "star" format. the ancestral sequence is then at the centre of the star topology and it can be seen that the genetic distance from the root to any particular leaf sequence may often be less than for many pairwise leaf sequence combinations. we did not calculate centroid nearest neighbours (cnns) for alignments incorporating reconstructed ancestral sequences, but we are tempted to speculate that many of the ancestral sequences would have been cnns, had they been included.

table homology modelling the common ancestor for each genus. templates are as given in table . targets are the reconstructed ancestral rdrp (or reverse transcriptase for lentivirus) sequences. rmsd: root mean square deviation in angstroms between template and model when superposed in moe.
indicates good-quality, indicates poor-quality, determined by the following thresholds: φ-ψ < %; qmean z-score > - . ; rmsd < Å.

table mean model (or structure) quality. the top line shows the mean quality scores for the solved structures used. the other lines show the mean quality scores for the models produced at various levels of taxonomic distance between template and target. indicates good-quality, indicates poor-quality, determined by the following thresholds: φ-ψ < %; qmean z-score > - . ; rmsd < Å. numbers in brackets indicate the revised scores if the model for imjin thottimvirus is moved out of the intra-genus category and into the intra-family category in the light of its subsequent transfer into the new genus thottimvirus.

), and outliers ( cross, text). the z-score graphics show model quality on a sliding scale: low-quality ( ), high-quality ( ). qmean shows the overall z-score, "all atom" shows the average z-score for all of the atoms in the model, "cbeta" the z-score for all cβ carbons, "solvation" is a measure of how accessible the residues are to solvents, and "torsion" is a measure of torsion angle for each residue compared to adjacent residues.

it is important to remember that homology models are theoretical constructions and caution must be exercised in treating them as input material for further experiments. among the various statistics for assessment of model quality, φ-ψ outlier percentage is a measure of the proportion of implausible dihedral angles in the model, and indicates where parts of the model backbone are likely to be incorrectly predicted. nevertheless, it is also important not to become too dependent on statistics such as φ-ψ outlier percentage, as "bad" angles do occasionally occur in solved structures. for instance in the present study, the thresholds of < . % for a very high quality model, and < % for a good quality model given by lovell et al.
[ ] would suggest that six of the twelve template solved structures used here ( table ) would not have been assessed as "very high quality" had they been models rather than solved structures. indeed the templates from indiana vesiculovirus and rotavirus a have more than . % φ-ψ outliers, and also have poor quality scores for qmean. these two structures also have the poorest resolution of any of our templates, at > Å. the poor quality scoring may therefore simply be a consequence of uncertainties in positioning of atoms in these structures. one might reasonably posit that the use of template solved structures having such issues might influence the resulting models to contain the same outliers. however, the model for rotavirus i has a lower level of φ-ψ outliers than its rotavirus a template ( table ). as might be expected, production of high quality models becomes more difficult as the genetic distance between target and template increases, as shown in tables e. nevertheless, even at the level of template-target pairs in separate genera (table ), the average performance is acceptable, as summarized in table . we therefore suggest that homology modelling may be used to produce rdrp models for research use even for genera where no solved structure exists, provided a template structure exists within the same family. here, we provide examples (table ) of such successful inter-genus, intra-family models for genera coltivirus and parechovirus. our inter-genus models for lyssavirus and spumavirus are slightly less successful. moving to the next taxonomic level, models with template-target pairs in separate families (table ) are generally less successful. one exception is our model for family phenuiviridae, which is better than some of the intra-family models. this is encouraging, since phenuiviridae is a family without any solved rdrp structure.
homology models have been produced at much larger taxonomic distances than those dealt with here, for instance from bacteria to eukaryotes [ ] , so it should be stressed that we make no claim for the generality of our findings outside of the viral orders under consideration, or for proteins other than rdrp. multi-domain proteins in particular, may produce higher quality models for some domains than others. one surprising result was the high quality of the models of reconstructed ancestral sequences (table , summarized in table ). as previously discussed, this may be due to the fact that the ancestral sequence is, assuming a regular molecular clock, potentially equally related to all descendent members of its genus. in this paper, we calculated centroid nearest neighbours (cnns) as the central points in sequence space for each genus (fig. ) . a reconstructed ancestral sequence may also be considered as a candidate central point. the value of central points is that they may serve as targets that could be used to make models representative of their genus as a whole. for instance, the shortest-path, mean and median cnns of genus orthohantavirus are sequences , and (see supplementary table for a list of sequences for each genus) , representing sin nombre orthohantavirus, rockport orthohantavirus and cao bang orthohantavirus respectively. the partial solved structure used as the template for modelling in the genus orthohantavirus in the present paper is from hantaan orthohantavirus ( ize, see table ) and the target used, imjin thottimvirus (sequence in orthohantavirus panel of fig. ) , is now classified as belonging to a new genus thottimvirus (table ) . the three cnns, sin nombre orthohantavirus, rockport orthohantavirus and cao bang orthohantavirus are %, % and % identical to ize respectively, whereas imjin thottimvirus is only % identical. 
the latter was of course chosen to test the effectiveness of intra-genus homology modelling over as wide a genetic distance as possible (see section . ). for the performance of subsequent experimental procedures on orthohantavirus rdrps, for instance docking to discover novel anti-viral compounds, a homology model corresponding to one of the three cnns mentioned above or to the reconstructed ancestor (table ) would be the preferred target, along with the existing solved structure. where a solved rdrp structure exists in a genus, it should be used. however, if that solved structure is not a cnn, a homology model of a cnn or ancestral sequence should be produced for comparative purposes. where no solved rdrp structure exists in a genus, a structure from another genus in the same family may be used. on the basis of our investigations, we recommend a procedural flowchart for selection of an rdrp structure for further study, for instance docking to discover novel anti-viral compounds, in any rna virus genus of interest (fig. ) . where a solved structure exists within a genus, it is the obvious choice for further experiments. however, where that solved structure is far from any of the cnn sequences of the genus, as judged by the force-directed graph, a cnn may also be homology modelled for comparative purposes, using the existing solved structure as a template. any differential performance of the solved structure and the homology model in, for instance, a docking experiment, may give clues as to the generality of conclusions derived from the solved structure alone. a reconstructed ancestral rdrp may also be used as an alternative to, or in addition to, a cnn. the limits of homology modelling would appear, on the basis of the results presented here, to be at the intrafamily, inter-genus level. template-target pairs in different viral families are unlikely to be of practical use, as the predicted quality of the resulting models is low. 
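the selection procedure described above can be sketched as a small decision function. this is only an illustrative reading of the text, not the published flowchart itself; the function name `choose_rdrp_structures`, its boolean arguments and the wording of the returned recommendations are assumptions made for the example.

```python
def choose_rdrp_structures(solved_in_genus, solved_is_near_cnn, solved_in_family):
    """Return a list of recommended structures or models for a genus.

    solved_in_genus:     a solved RdRp structure exists in the genus
    solved_is_near_cnn:  that structure lies close to a CNN of the genus
    solved_in_family:    a solved structure exists in another genus of the family
    """
    if solved_in_genus:
        plan = ["use the solved structure from the genus"]
        if not solved_is_near_cnn:
            # A comparative model guards against an unrepresentative structure.
            plan.append(
                "homology-model a CNN or reconstructed ancestor "
                "using the solved structure as template"
            )
        return plan
    if solved_in_family:
        return ["homology-model the target on a template from the same family"]
    # Inter-family template-target pairs gave low predicted quality here.
    return ["no reliable option: inter-family models are unlikely to be useful"]


# Example: a genus with a solved structure that is far from its CNNs.
for step in choose_rdrp_structures(True, False, True):
    print(step)
```

the third branch reflects the observation that template-target pairs in different families rarely produced models of acceptable predicted quality.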
our models were produced using moe, and we have not performed comparisons using other modelling tools, such as swiss-model [ ] or modeller [ ] . we feel that it is unlikely that significant differences in output would be produced, but when the object of the exercise is drug-discovery, we recommend that the protocol in fig. be implemented using several alternative modelling softwares. crystallographic structural genome projects are badly needed to close the sequence-structure gap. in the meantime, systematic attempts to fill the gaps via homology modelling may be useful. however, for many taxa e all of the order nidovirales and much of mononegavirales -the paucity of solved structures to act as templates remains a serious obstacle. all code, inputs and outputs are available from: https://doi.org/ . /lancaster/researchdata/ . a three-dimensional model of the myoglobin molecule obtained by x-ray analysis protein modeling: what happened to the "protein structure gap the high throughput sequence annotation service (ht-sas) -the shortcut from sequence to true medline words the evolution and emergence of rna viruses crystal structure of the full-length japanese encephalitis virus ns reveals a conserved methyltransferase-polymerase interface molecular docking revealed the binding of nucleotide/ side inhibitors to zika viral polymerase solved structures using bioinformatics tools for the discovery of dengue rna-dependent rna polymerase inhibitors epidemiological characteristics of humaninfective rna viruses the rcsb protein data bank: integrative view of protein, gene and d structural information reference sequence (refseq) database at ncbi: current status, taxonomic expansion, and functional annotation mafft: iterative refinement and additional methods mega : molecular evolutionary genetics analysis version . 
for bigger datasets muscle: multiple sequence alignment with high accuracy and high throughput network visualizations of of relationships in psychometric data graph drawing by force-directed placement some properties of classical multidimensional scaling protonate d: assignment of ionization states and hydrogen coordinates to macromolecular structures the generalized born/volume integral implicit solvent model: estimation of the free energy of hydration using london dispersion instead of atomic surface area structure validation by calpha geometry: phi,psi and cbeta deviation on the accuracy of homology modeling and sequence alignment methods applied to membrane proteins qmean: a comprehensive scoring function for model quality assessment toward the estimation of the absolute quality of individual protein structure models evolutionary trees from dna sequences: a maximum likelihood approach fastml: a web server for probabilistic reconstruction of ancestral sequences sars and mers: recent insights into emerging coronaviruses taxonomy of the order mononegavirales: second update emerging phleboviruses taxonomy of the order bunyavirales: second update construction of phylogenetic trees for proteins and nucleic acids: empirical evaluation of alternative matrix methods the neighbor-joining method: a new method for reconstructing phylogenetic trees swiss-model and the swiss-pdbviewer: an environment for comparative protein modeling modeller: generation and refinement of homology-based protein structure models supplementary data to this article can be found online at https://doi.org/ . /j.jmgm. . . . key: cord- -w gndka authors: ozkaya, umut; ozturk, saban; barstugan, mucahid title: coronavirus (covid- ) classification using deep features fusion and ranking technique date: - - journal: nan doi: nan sha: doc_id: cord_uid: w gndka coronavirus (covid- ) emerged towards the end of . world health organization (who) was identified it as a global epidemic. 
there is a consensus that computerized tomography (ct) techniques give both fast and accurate results for the early diagnosis of the pandemic disease. expert radiologists have stated that covid- displays different behaviours in ct images. in this study, a novel method that fuses and ranks deep features is proposed to detect covid- in its early phase. x (subset- ) and x (subset- ) patches were obtained from ct images to generate sub-datasets. within the scope of the proposed method, patch images were labelled as covid- or no finding for use in the training and testing phases. feature fusion and ranking were applied in order to increase the performance of the proposed method. the processed data were then classified with a support vector machine (svm). compared with other pre-trained convolutional neural network (cnn) models used in transfer learning, the proposed method shows high performance on subset- with . % accuracy, . % sensitivity, . % specificity, . % precision, . % f -score and . % matthews correlation coefficient (mcc).

to prevent the rapid spread of corona virus disease (covid- ), it is essential to apply the necessary quarantine conditions and to discover treatment methods. like other pandemic diseases, it has become a global epidemic and, according to world health organization (who) data, has caused patient deaths in china [ ] [ ] [ ] . early application of treatment procedures for individuals with covid- infection increases the patient's chances of survival. fever, cough and shortness of breath are the most important symptoms for the diagnosis of covid- in infected individuals. however, these symptoms may be absent in infected individuals, who then act as carriers. pathological tests performed in laboratories take more time, and their margin of error can be high. a fast and accurate diagnosis is necessary for an effective struggle against covid- .
for this reason, experts have started to use radiological imaging methods. these procedures are performed with computed tomography (ct) or x-ray imaging techniques. covid- cases have similar features in ct images in the early and late stages: the infection shows a circular, inward diffusion within the image [ ] . therefore, radiological imaging provides early detection of suspicious cases with an accuracy of %.

in the literature, shan et al. proposed a neural network model called vb-net to segment the covid- regions in ct images. the method was tested on new cases, and a recommendation system was used to make it easier for radiologists to mark infected areas within ct images [ ] . xu et al. analyzed ct images to distinguish healthy, covid- and other viral cases. the dataset used included covid- , viral disease and healthy images. they achieved . % overall classification accuracy with their deep learning method [ ] . apostolopoulos et al. proposed a transfer learning method to classify covid- and normal cases. they obtained . % accuracy, . % sensitivity, and . % specificity [ ] . shuai et al. were able to successfully diagnose covid- using deep learning models that could extract graphical features from ct images [ ] .

ct images were used in this study to classify covid- cases. two different datasets were generated from ct images. these datasets include × and × patch images. each dataset contains images labeled as covid- and no findings. deep features were obtained with pre-trained convolutional neural network (cnn) models. these deep features were fused and ranked to train a support vector machine (svm). the performance of the proposed method suggests that it can be used for the early diagnosis of covid- cases. this study consists of sections. the properties of the obtained patch images are visualized in section .
in section , the basics of deep learning methods, feature fusion and ranking techniques are described. comparative classification performances are given in section . a discussion and conclusion are given in section .

ct images of infected patients were obtained from the societa italiana di radiologia medica e interventistica to generate the datasets [ ] . patch images were obtained from infected and non-infected regions of the ct images. the properties of the two different patch sets are given in table . ct images were obtained. the process of obtaining patches is shown in figure .

in , geoffrey hinton showed that deep neural networks can be effectively trained by greedy layer-wise pre-training [ ] . other research groups used the same strategy to train many other deep networks. the term deep learning was popularized to draw attention to the theoretical importance of depth, both for designing better-performing neural networks and for motivating deeper networks. deep learning, which has become quite popular recently, is used in many areas, including e-mail filtering, search engine matching, smartphones, social media and e-commerce. academic studies pioneered its use in these areas. deep learning is also used for face recognition, object recognition, object detection, text classification and speech recognition. deep learning is a type of artificial neural network with multiple layers. increasing the number of layers generally allows greater accuracy to be achieved. while deep convolutional networks are successfully used in image, video, speech and sound processing, recurrent neural networks are used for sequential data such as text and speech. deep learning, which started to be used in , applies many layers of machine learning computation to large data sets; parameters that would otherwise need to be defined by hand can themselves be learned, perhaps yielding a better system for evaluating those parameters.
deep learning artificial neural networks are algorithms inspired by the functions of the brain. in machine learning, a deep belief network (dbn) is a generative graphical model or, alternatively, a class of deep neural network consisting of multiple layers of hidden nodes. when trained on a set of examples without supervision, a dbn can learn to probabilistically reconstruct its inputs. the layers then act as feature detectors. after this learning phase, a dbn can be further trained with supervision to perform classification. dbns can be seen as a composition of simple, unsupervised networks, such as restricted boltzmann machines (rbms) or autoencoders, in which the hidden layer of each sub-network serves as the visible layer of the next.

convolution is a mathematical operation; it is a special type of linear operation. convolutional neural networks (cnn) are a type of neural network with at least one convolution layer. however, the convolution used in deep learning differs somewhat from the convolution used in pure or engineering mathematics. convolutional neural networks consist of layers such as convolution, relu, pooling, normalization, fully connected and softmax layers. in convolutional neural networks, classification takes place in the fully connected layers and the softmax layer. in general, convolution is an operation on two real-valued functions. to describe the operation, we start with an example of two such functions. suppose the location of a spaceship is monitored with a laser sensor. the laser sensor produces a single output x(t), which is the position of the spaceship at time t. both x and t are real-valued, so a different reading can be obtained at any instant t. the sensor is also somewhat noisy. to obtain a less noisy estimate, the designer can average several measurements together.
naturally, more recent measurements are more relevant, so the average should give more weight to recent measurements. this can be done with a weighting function w(a), where a is the age of a measurement. if such a weighted average is applied at every moment, a new function s is obtained that allows the position to be estimated more accurately:

$$ s(t) = \int x(a)\, w(t - a)\, da $$

the above process is a convolution and is represented by a star:

$$ s(t) = (x * w)(t) $$

in cnn terminology, the first argument of the convolution (the function x) is called the input and the second argument (the function w) is called the kernel. the output is called the feature map. in the above example, the measurement is made without interruption, but this is not realistic. time is discretized when working on a computer. for a realistic measurement, suppose one measurement is taken per second. the time index t is then an integer, so x and w are defined only on integer t:

$$ s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a) $$

in machine learning applications, the input is usually a multidimensional array of data and the kernel is a multidimensional array of parameters. convolution is often applied over several axes at one time. so if the input is a two-dimensional image I, the kernel K is also a two-dimensional matrix:

$$ S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n) $$

the above equation means that the kernel is flipped relative to the input, which gives convolution its commutative property [ ] . but this property is not very important for machine learning libraries. instead, many machine learning libraries apply the kernel without flipping it, an operation called cross-correlation, which is closely related to convolution. because it resembles convolution, a network that uses it is still called a convolutional neural network:

$$ S(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n) $$

discrete convolution can be viewed as multiplication by a matrix. typical convolutional neural networks benefit from further specializations in order to deal effectively with large inputs.
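to make the difference between convolution and cross-correlation concrete, the following minimal sketch implements both for one-dimensional lists in plain python. the function names `convolve1d` and `cross_correlate1d` and the example values are assumptions made for illustration; deep learning libraries provide far more efficient implementations.

```python
def convolve1d(x, w):
    """Full discrete convolution: s[t] = sum over a of x[a] * w[t - a]."""
    n = len(x) + len(w) - 1
    s = [0.0] * n
    for t in range(n):
        for a in range(len(x)):
            if 0 <= t - a < len(w):
                s[t] += x[a] * w[t - a]
    return s


def cross_correlate1d(x, w):
    """'Valid' cross-correlation: slide the kernel over x without flipping it."""
    return [
        sum(x[t + a] * w[a] for a in range(len(w)))
        for t in range(len(x) - len(w) + 1)
    ]


x = [1, 2, 3, 4]
w = [1, 2]

full = convolve1d(x, w)
# Cross-correlating with the reversed kernel reproduces the fully
# overlapping part of the convolution.
valid_from_conv = full[len(w) - 1:len(x)]
valid_from_corr = cross_correlate1d(x, w[::-1])
print(full, valid_from_conv, valid_from_corr)
```

because the kernel weights are learned, a library that uses cross-correlation simply learns the flipped version of the kernel that true convolution would have learned.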
figure shows how the process occurs in convolutional neural networks: convolution provides three important ideas that can improve a machine learning system: sparse interactions, parameter sharing, and equivariant representations. furthermore, convolution can work with inputs of variable size. traditional neural network layers use multiplication by a matrix of parameters, with a separate parameter describing the link between each input unit and each output unit. this means that each output unit is connected to each input unit. however, cnns typically have sparse interactions (also called sparse links or sparse weights). this is achieved by making the kernel smaller than the input. because the number of pixels decreases after each convolution, features at the edges that should not be overlooked can be preserved by adding zeros around the rows and columns. this process is called padding. for example, in image processing the input image may consist of thousands or millions of pixels, but small, meaningful features such as edges can be detected with kernels that occupy only tens or hundreds of pixels. this means that fewer parameters need to be stored, which both reduces the memory requirements of the cnn model and increases its statistical efficiency. it also means that calculating the output requires fewer operations. these improvements in efficiency are generally quite large. parameter sharing refers to the use of the same parameter for more than one function in a model. in a conventional neural network, each element of the weight matrix is used exactly once when calculating the output of a layer: it is multiplied by one element of the input and never revisited. as a synonym for parameter sharing, one can say that a network has tied weights, because the value of the weight applied to one input is tied to the value of a weight applied elsewhere. in a cnn, each member of the kernel is used at every position of the input.
the parameter sharing used by the convolution operation means that instead of learning a separate set of parameters for each location, only one set is learned. for a three-dimensional image of size h x w x d and a kernel of size k x k, the number of pixels in the convolution output is determined by these sizes.

normalization: the size of the data in artificial neural networks is important. as the data grow, the memory they occupy increases, which reduces both the efficiency and the working speed of the artificial neural network. by compressing all dataset values into the range - , the operations are made easier. this is done by subtracting the mean of the whole data set, so that the data lie in the range - . the result of standardization is to rescale the features to a standard normal distribution, where μ and σ denote the mean and the standard deviation respectively. the standard score of each sample x is computed as follows:

$$ z = \frac{x - \mu}{\sigma} $$

after standardization the features have the mean and standard deviation of a standard normal distribution. this is also important for training many machine learning algorithms.

a pooling function replaces the output of the network at a specific location with a summary statistic of nearby outputs. for example, max-pooling returns the largest value within a rectangular neighbourhood. other popular pooling functions are mean and minimum pooling. when the number of parameters in the next layer depends on the input image or feature map size, any reduction in input size also increases statistical efficiency and reduces the memory requirements for storing parameters. the number of pixels of the pooling output is likewise determined by the input and window sizes.

the rectified linear unit (relu) is a type of activation function that has recently become popular. it computes the function $f(x) = \max(0, x)$. in other words, the activation is thresholded at zero. there are a number of pros and cons to the use of relu.
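the quantities discussed above can be sketched in a few lines of plain python. the output-size expressions below are the commonly used formulas with explicit padding and stride, which is an assumption here because the exact form used in this study is not shown; the function names and the example size of 32 pixels are likewise hypothetical.

```python
import math


def conv_output_size(n, k, padding=0, stride=1):
    """Output length along one axis: floor((n - k + 2 * padding) / stride) + 1."""
    return (n - k + 2 * padding) // stride + 1


def pool_output_size(n, window, stride=None):
    """Output length of pooling; the stride defaults to the window size."""
    stride = window if stride is None else stride
    return (n - window) // stride + 1


def standardize(values):
    """Standard scores z = (x - mu) / sigma using the population deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]


def relu(values):
    """Threshold each activation at zero: f(x) = max(0, x)."""
    return [max(0.0, v) for v in values]


print(conv_output_size(32, 3))             # no padding shrinks the axis
print(conv_output_size(32, 3, padding=1))  # zero padding preserves the size
print(pool_output_size(32, 2))             # 2 x 2 pooling halves the axis
print(standardize([1.0, 2.0, 3.0]))
print(relu([-1.0, 0.5]))
```

the second call shows why padding is used: with a 3 x 3 kernel and one pixel of zero padding the spatial size of the feature map is unchanged.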
it has been found that relu significantly accelerates the convergence of stochastic gradient descent compared to sigmoid / tanh functions. it is claimed that this originates from its linear, non-saturating form. whereas tanh / sigmoid neurons involve costly operations, relu can be implemented simply by thresholding an activation matrix at zero. however, relu units can be fragile during the training phase. for example, a large gradient flowing through a neuron with a relu activation function can cause the weights to be updated in such a way that the neuron never activates again on any data point. if this happens, the gradient flowing through the unit will be zero from that point on. that is, relu units can die irreversibly during training because they can be knocked off the data manifold. for example, if the learning rate is too high, % of the network may be dead. this occurs less frequently with an appropriate setting of the learning rate.

in fully connected layers, removing nodes below a certain threshold increased performance. it is thus observed that forgetting weak information improves learning. some properties of the dropout value are as follows. the dropout value is generally . . other values are also common. it varies according to the problem and data set. random elimination can also be used for dropout. the dropout value is defined as a value in the range [ , ] when used as the threshold value. it is not necessary to use the same dropout value on all layers; different dropout values can also be used.

the softmax function is a type of classifier. logistic regression is a binary classifier, and the softmax function is its generalization to multiple classes. the $1/\sum_j e^{f_j}$ term normalizes the distribution. that is, the values sum to one. therefore, it calculates the probability that the input belongs to each class.
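a minimal sketch of the softmax normalization in plain python is given below. the function name `softmax` and the example scores are assumptions for illustration; subtracting the maximum score before exponentiation is a standard numerical-stability measure that does not change the result.

```python
import math


def softmax(scores):
    """Map raw class scores f_j to probabilities e^{f_j} / sum_j e^{f_j}."""
    m = max(scores)  # shift for numerical stability; the ratios are unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

the largest score receives the largest probability, and the outputs always sum to one whatever the scale of the raw scores.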
given a test input x, the activation function is asked to estimate the probability p(y = j | x) for each value j = 1, …, k. that is, we wish to estimate the probability that the class label takes each of the different possible values. thus, the activation function produces a k-dimensional vector that gives the predicted probabilities. an error value must be calculated for learning to occur, and for the softmax function this error is calculated by the softmax loss function. in the softmax classifier, the function mapping $f(x_i; W) = W x_i$ remains unchanged, but the scores are now interpreted as unnormalized log probabilities for each class and the following form of cross-entropy loss is used:

$$ L_i = -\log\left( \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} \right) $$

vgg- , googlenet and resnet- models were used for feature extraction. the feature vectors obtained with these models were fused to obtain higher-dimensional fusion features. in this way, the effect of insufficient features obtained from a single cnn is minimized. however, there is a certain level of correlation and redundant information among the features, which increases time consumption and computational complexity. therefore, it is necessary to rank the features. the t-test technique was used for feature ranking. it calculates the difference between two features and determines whether the difference is statistically significant [ ] . in this way, it performs the ranking by taking into account the frequency of the same features in the feature vector and the frequency of finding the average feature. after feature fusion and ranking were performed, a binary svm classifier was trained for classification. the svm uses kernel functions to map features into a space where they can be classified better [ ] . a linear kernel function was used in the svm. the svm classifier was trained to minimize the squared hinge loss. the squared hinge loss is given in eq. . here, xn represents the fused and ranked feature vector.
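the fusion, ranking and loss steps described above can be sketched as follows. this is a toy illustration rather than the authors' implementation: fusion is taken to be simple concatenation, ranking uses welch's two-sample t-statistic because the exact t-test variant is not specified here, and the loss is the usual l2-regularized squared hinge loss with a penalty parameter `C`. all function names and data values are hypothetical.

```python
import math


def fuse(*feature_vectors):
    """Concatenate the feature vectors of several models into one vector."""
    return [v for vec in feature_vectors for v in vec]


def t_statistic(a, b):
    """Welch two-sample t-statistic for one feature measured in two classes."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(va / len(a) + vb / len(b))
    if denom == 0.0:
        return 0.0 if ma == mb else math.inf
    return (ma - mb) / denom


def rank_features(pos, neg):
    """Return feature indices ordered by decreasing |t| between the classes."""
    n = len(pos[0])
    scores = [
        abs(t_statistic([row[i] for row in pos], [row[i] for row in neg]))
        for i in range(n)
    ]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def squared_hinge_loss(w, b, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum_n max(0, 1 - y_n (w . x_n + b))^2, y_n in {-1, +1}."""
    reg = 0.5 * sum(wi * wi for wi in w)
    total = 0.0
    for xn, yn in zip(X, y):
        margin = yn * (sum(wi * xi for wi, xi in zip(w, xn)) + b)
        total += max(0.0, 1.0 - margin) ** 2
    return reg + C * total


# Hypothetical fused features: feature 1 separates the classes, feature 0 does not.
pos = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.9]]
neg = [[1.1, 0.0], [1.0, 0.2], [0.95, -0.1]]
order = rank_features(pos, neg)
print(order)
```

after ranking, only the highest-ranked features would be kept and passed to the linear svm, which is what reduces the redundancy introduced by fusing several networks.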
the penalty for misclassification is determined by the c hyperparameter in the loss function.

in the proposed method, pre-trained cnn networks were trained for subset- and subset- separately. vgg- , googlenet and resnet- models were used as pre-trained networks. patch images were given as input to the trained cnn structures during the test phase. the feature vectors ( × × ) obtained from these networks were combined into a new feature set by the fusion process. correlation values between features were taken into consideration in the fusion process. the obtained features were ranked by the t-test method. in the t-test ranking process, features close to each other were eliminated according to feature frequency. in the last stage, the fused and ranked deep features were evaluated with the svm classifier. the proposed method is visualized in figure .

subset- contains × ct patches. the data are equally distributed between the classes. % of these images were used for training and % for testing. table compares the classification performance of the pre-trained cnn networks and of the proposed method. subset- includes covid- and no finding × ct patches. comparative classification results for subset- are given in table . the proposed method showed the best performance on subset- , with . %. the proposed method achieved the highest f -score and mcc, with . % and . % respectively. in addition to the results in table and table , the confusion matrices obtained by the proposed method on the subset- and subset- datasets are shown in figure and figure . the confusion matrix for the proposed method on subset- is shown in figure . when the confusion matrix was evaluated per class, the covid- class was classified with an accuracy rate of . %. the performance on the no findings class was lower than on covid- : an accuracy rate of . % was obtained for this class. a classification accuracy of . % was obtained in the analysis of the positive class.
in the negative class, this rate is higher, with a value of . %. subset- was used in the training and testing process for the proposed method. figure shows the confusion matrix obtained for the test data. in the per-class analysis, an accuracy of . % was obtained for the covid- class. performance on the no findings class increased compared to subset- ; the accuracy for this class was . %. in the positive and negative class evaluation, classification accuracies of . % and . % were obtained respectively. the first case of covid- was found in the wuhan region of china. covid- is an epidemic disease that threatens the world health system and economy. the covid- virus behaves similarly to other pandemic viruses, which makes it difficult to detect covid- cases quickly. therefore, covid- is a candidate for a global epidemic. radiological imaging techniques are used for a more accurate diagnosis in the detection of covid- . it is therefore possible to obtain more detailed information about covid- using ct imaging techniques. when ct images are examined, shadows come to the fore in the regions where covid- is located. at the same time, a spread is observed from the outer to the inner parts. images obtained with different ct devices were used in the study. the grey levels in the images differed because of the different characteristics of the ct devices, which complicates the analysis of the images. in the study, deep features were obtained by using pre-trained cnn networks. then, the deep features were fused and ranked. the data set was generated by taking random patches from ct images.
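the fusion and ranking steps summarized above can be illustrated with a minimal sketch. it assumes that fusion is plain concatenation of the per-network feature vectors and that ranking uses the absolute welch t-statistic between the two classes; the text does not state either detail, and the function names `fuse`, `t_score` and `rank_features` are hypothetical.

```python
import math
from statistics import mean, variance


def fuse(*feature_vectors):
    """Fuse per-network feature vectors by concatenation (assumed fusion rule)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def t_score(a, b):
    """Absolute Welch t-statistic between two samples of a single feature."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(rows, labels):
    """Return feature indices ordered from most to least discriminative.

    rows   : list of fused feature vectors, one per image patch
    labels : 1 for the positive class, 0 for the negative class
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(t_score(pos, neg))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

keeping only the first few indices returned by `rank_features` would give the reduced feature set that is passed to the linear svm; how many features to keep is a tuning choice that the text does not specify.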
clinical features of patients infected with novel coronavirus in wuhan added value of computer-aided ct image features for early lung cancer diagnosis with small pulmonary nodules: a matched case-control study dermatologist-level classification of skin cancer with deep neural networks mining x-ray images of sars patients lung infection quantification of covid- in deep learning system to screen coronavirus disease covid- : automatic detection from x-ray images utilizing transfer learning with convolutional neural networks a deep learning algorithm using ct images to screen for corona virus disease (covid- ) improving neural networks by preventing co-adaptation of feature detectors non-native children speech recognition through transfer learning a modified t-test feature selection method and its application on the hapmap genotype data statistical learning theory: a tutorial evaluation of the confusion matrix method in the validation of an automated system for measuring feeding behaviour of cattle key: cord- -muh rwla authors: madichetty, sreenivasulu; m., sridevi title: a stacked convolutional neural network for detecting the resource tweets during a disaster date: - - journal: multimed tools appl doi: . /s - - - sha: doc_id: cord_uid: muh rwla social media platforms like twitter are among the primary sources for sharing real-time information during events such as disasters, political events, etc. detecting resource tweets during a disaster is an essential task because tweets contain different types of information such as infrastructure damage, resources, opinions and sympathies about disaster events, etc. tweets related to the need and availability of resources (nar) are posted by humanitarian organizations and victims. hence, reliable methodologies are required for detecting nar tweets during a disaster. existing works do not focus well on nar tweet detection and also perform poorly.
hence, this paper focuses on the detection of nar tweets during a disaster. existing works often use features and appropriate machine learning algorithms on several natural language processing (nlp) tasks. recently, convolutional neural networks (cnn) have been widely used in text classification problems. however, a cnn requires a large amount of manually labeled data, and no such large labeled dataset is available for nar tweets during a disaster. to overcome this problem, stacking of convolutional neural networks with traditional feature-based classifiers is proposed for detecting nar tweets. in our approach, several informative features such as aid, need, food, packets, earthquake, etc. are proposed and used in a classifier alongside the cnn. the learned features (the outputs of the cnn and of the classifier with informative features) are used in another classifier (the meta-classifier) for the detection of nar tweets. classifiers such as svm, knn, decision tree, and naive bayes are used in the proposed model. from the experiments, we found that using knn (base classifier) and svm (meta-classifier) in combination with cnn in the proposed model outperforms the other algorithms. this paper uses and nepal and italy earthquake datasets for experimentation. the experimental results show that the proposed model achieves the best accuracy compared to baseline methods. micro-blogging [ , , , ] sites like twitter, facebook, instagram, etc. are helpful for collecting situational information [ ] during disasters such as earthquakes, floods, disease outbreaks [ ] , etc. during these events, only a minority of the tweets posted are relevant to specific classes such as infrastructure damage, resources [ , ] , service requests [ ] , etc., and spam tweets, communal tweets and emotion information are also posted [ , , , , , ] .
therefore, powerful methodologies are required for the detection of specific class tweets (like need, availability of resources, etc.), so that relevant tweets can be automatically detected from a large set of tweets. the detection of specific class tweets [ , , , ] has received much attention in the last two years. in the next few years, the detection of specific class tweets is likely to become more important in social media. specifically, detecting the two types of tweets that contain information related to the need and availability of resources is a challenging task. during a disaster, victims post tweets indicating where essential resources such as food, water, medical aid, shelter, etc. are needed. similarly, humanitarian organizations post tweets indicating where specific resources such as medical resources, food, water packets, etc., are available in the affected area. examples of need and availability of resource tweets are shown in table . the first four tweets represent the need for resources such as mobile hospitals, password-free wi-fi, blood and ambulances. the next four tweets reflect the availability of resources, such as the italian army providing services to earthquake victims and the availability of shelter tents, money and ambulances. detecting need and availability of resource tweets is therefore very beneficial for both humanitarian organizations and victims during a disaster. the main objective of this work is to assist victims and humanitarian organizations in the event of a disaster by designing a method for the automatic identification of need and availability of resource (nar) tweets from twitter. the problem of detecting nar tweets can be treated as a multi-class classification problem. the classes are (i) need of resource tweet, (ii) availability of resource tweet and (iii) neither.
only a few existing works [ , , ] focus on extracting need and availability of resource tweets during a disaster. among them, most used information-retrieval methodologies such as word vec, a combination of word embeddings and character embeddings, etc. specifically, the authors in [ ] used both information-retrieval methodologies and classification methodologies (cnn with crisis word embeddings) to extract need and availability of resource tweets during a disaster. the main drawback of cnn with crisis embeddings is that it does not work well if the number of training tweets is small; in the case of information-retrieval methodologies, keywords must be given manually to identify need and availability of resource tweets. to overcome these issues, a novel method is proposed that uses the stacking mechanism [ ] to identify nar tweets during a disaster. the stacking mechanism uses two levels of classifiers. the first level uses multiple classifiers, whose outputs are used as the input of the second level, which uses only one classifier. (fragment of the example tweets in table : search and rescue dogs ambulances on the ground in #perugia following #earthquake volunteers from @crocerossa on the scene.) the stacking method does not produce improved results if the models used in the stacking method are stable. therefore, different models, namely a cnn and a knn classifier with domain-specific features, are used in this work. the cnn is used to capture the semantic similarity between words, even when the vocabulary differs in the testing phase. to overcome the problem of a small number of training tweets, new features are proposed and used in the knn classifier to detect nar tweets. the two models (cnn and knn with the proposed features) thus work differently in detecting tweets. the output of these two models is given as input to the svm (second-level) classifier.
the svm classifier is trained to determine the relationship between the outputs of the cnn and knn classifier models. it gives the final prediction of whether a tweet's label is resource need, resource availability or neither. the efficacy of the final prediction depends on the classifiers used in level- and level- . the reasons for selecting the knn and svm classifiers as first- and second-level classifiers are explained in sections . . and . . . the main contributions are summarized as: this paper is organized as follows. the second section examines the related work. the proposed approach for the detection of nar tweets during a disaster is described in the third section. experimental results and analysis are discussed in the fourth section. the last section concludes the paper. many studies [ , , , ] focused on the detection of tweets related to a disaster. preliminary work [ ] focused mainly on extracting features such as uni-gram and bi-gram frequency, parts-of-speech (pos), objective or subjective, personal or impersonal and formal or informal from tweets, and used classifiers to classify the tweets by relevance. naive bayes and max entropy classifiers are used for the detection of situational tweets related to the disaster. the authors explained that their work depends on the vocabulary of a specific event. in [ ] , the authors investigated and developed an application for detecting earthquakes based on features such as context words, keyword position, content words and tweet length. it is applicable only to japanese tweets. to overcome the problem of domain dependence, the authors in [ ] proposed a novel framework for classifying situational and non-situational information based on low-level lexical and syntactical features. after classification, the tweets are summarized based on the content words; the authors also concluded that the framework works across domains (domain independent).
however, all of these methods focus only on situational tweets related to a disaster and fail to address specific class tweets. in recent years, more researchers have focused on the detection of user-defined class tweets during a disaster. several studies, for instance [ , , , ] , have been proposed for different specific classes. the authors in [ ] suggest that a decision tree with context and content features gives the best recall and f -measure among classifiers such as svm, adaboost and random forest. however, it does not focus on nar tweets. in recent literature, the authors in [ ] developed a method that detects resource tweets during a disaster using features extracted from the most frequent words in the tweets. resources include both the availability and the need of resources. however, the method does not focus solely on availability and need of resource tweets during a disaster. the authors in [ ] designed the artificial intelligence disaster response (aidr) system for classifying tweets into user-defined categories in order to detect tweets related to the disaster. in aidr, uni-gram and bi-gram features are used for detecting tweets related to the user-defined categories. these features can be applied to detect any user-defined class during a disaster. in [ ] , the authors manually analyzed whatsapp messages for requirements of medical, human and infrastructural resources during a disaster, taking the nepal earthquake dataset as a case study. however, they did not propose an automatic method for identifying the resources. in [ ] , the authors found that neural network retrieval models that integrate character-level and word-level embeddings with pattern recognition techniques perform better than state-of-the-art models. the authors applied information retrieval techniques for detecting nar tweets.
in [ ] , the authors used a novel vector training approach for clustering tweets about emergency situations and compared their method with bag-of-words (bow), word vec-sum and doc vec. they described how clustering tweets would further help in identifying the different aspects of a topic in emergency situations. however, they did not propose a method for identifying nar tweets during a disaster. the problem can be defined as follows: given 'n' tweets $x = \{x_1, x_2, \ldots, x_n\}$, identify the tweets that belong to each of three classes: (i) need of the resource, (ii) availability of the resource and (iii) none of the above. this section describes the stacked convolutional neural network for identifying nar tweets during a crisis. an overview of the proposed stacked convolutional neural network is shown in fig. . the stacking mechanism [ ] combines the predictions of diverse classifiers by learning the relationship between the models. different classifiers make different prediction errors on the data: some classifiers mispredict an instance that other classifiers predict correctly. stacking exploits this to increase the generalization ability of the model and to reduce its misclassification rate, bias and variance. stacking-based classifiers give higher performance than individual classifier models due to this generalization ability [ ] . however, most resource detection systems focus on individual classifier models rather than ensemble methods (a combination of diverse classifiers). in this work, a stacked convolutional neural network is proposed for detecting resource tweets from social media during a disaster. it consists of two phases of classifiers. in the first phase, the convolutional neural network and the knn classifier are used and referred to as base-level classifiers. the svm classifier is used as a meta-level classifier in the second phase.
before the tweets are given as inputs to the base-level classifiers, the following pre-processing steps are performed:
-all tweets are converted to lower case letters to avoid multiple copies of the same words.
-the tweets are divided into words, referred to as tokens.
-user mentions (@users), hash-tags (#) and urls are removed from the tweets.
-similarly, stop-words, numerals and unknown symbols are omitted from the tweets.
for each tweet, two types of feature representation are generated, using the following techniques: we used pre-trained crisis word embeddings to represent each word in a tweet as a -dimensional vector. the embeddings are based on million crisis-related tweets collected during crisis events and were trained with the word vec tool. it uses the continuous bag of words (cbow) architecture with negative sampling to generate word embeddings. the χ²-statistic feature selection algorithm is used [ ] to extract the top-most informative words from tweets because it has already been shown to be one of the most efficient feature selection algorithms for text categorization. the svm classifier is used with the χ²-statistic feature selection algorithm because the authors in [ ] concluded that svm with χ²-statistic feature selection performed better than other traditional methods. the extracted domain-specific features are shown in table . the first, second, and third columns are the serial number, features and information category, respectively. the above two methods provide two feature vector representations for each tweet, which are given as input to the base-level classifiers, cnn and knn. cnn is suitable for eliciting local and deep features from natural language. the authors [ ] have shown that cnn achieves better results in sentence classification.
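the pre-processing steps listed earlier in this section can be sketched as follows. this is an illustration under stated assumptions, not the authors' code: the stop-word list is a tiny placeholder, whole hash-tag tokens are dropped (the text could also be read as removing only the '#' symbol), and the function name `preprocess` is hypothetical.

```python
import re

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "for", "of", "to", "and"}


def preprocess(tweet):
    """Lower-case a tweet, strip URLs, mentions and hash-tags, then tokenize.

    Numerals and unknown symbols disappear because only alphabetic
    tokens are kept; stop-words are filtered out at the end.
    """
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"[@#]\w+", " ", text)                 # user mentions and hash-tags
    tokens = re.findall(r"[a-z]+", text)                 # alphabetic tokens only
    return [tok for tok in tokens if tok not in STOP_WORDS]
```

the removal is done before tokenization here so that a url or mention never gets split into stray tokens; the order of the steps in the text is otherwise preserved.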
the authors in [ ] have extended a convolutional-recursive deep model for d object classification that combines convolutional and recursive neural networks (rnn). the cnn layer discovers low-level translation-stable features that are fed into multiple, fixed-tree rnns to formulate higher-order features. in [ ] , the authors have shown that cnn outperforms many traditional methods in biomedical text classification. embedding layer: this is the very first layer of the cnn. it takes a fixed number of words from the tweets as input and converts each into a corresponding -dimensional crisis word vector. the -dimensional tweet vector is passed through a series of convolution and pooling operations to learn high-level feature representations. in the convolution layer, a new feature $f_j$ is generated by applying a convolution kernel $u \in \mathbb{R}^{gd}$ to a window of g words (filter size), as shown in ( ): $f_j = f(u \cdot x_{j:j+g-1} + b)$, where $x_{j:j+g-1}$ is the concatenation of the input vectors $(x_j, x_{j+1}, \ldots, x_{j+g-1})$, 'b' is a bias term and 'f' is a non-linear activation function like 'sig', 'tanh', etc. the filter is applied to every window of 'g' words to obtain the feature map $\mathbf{f} = [f_1, f_2, \ldots, f_{n-g+1}]$ with $\mathbf{f} \in \mathbb{R}^{n-g+1}$, as shown in ( ). different 'g' values ( , , ) are used to capture different n-gram features from the tweet. this process is repeated times ( filters) to produce feature maps that learn complementary features for the same filter size. after the feature maps are obtained, maximum pooling is applied to each feature map, where $\mu_q(\mathbf{f}_i)$ refers to the maximum pooling operation [ ] applied to each window of 'q' features in the feature map $\mathbf{f}_i$. max-pooling reduces the output dimension while keeping the important features of each feature map. after the maximum pooling operation, different feature vectors are generated from the convolution layer with filter sizes ( , , ) . then, the concatenation operation is applied to the different feature vectors to form a single block.
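the convolution and pooling operations described above can be sketched in plain python. the sketch assumes a single kernel, tanh as the default activation, and non-overlapping pooling windows; none of these choices is confirmed by the text, and the names `conv_feature_map` and `max_pool` are hypothetical.

```python
import math


def conv_feature_map(x, u, b, g, activation=math.tanh):
    """Slide one kernel u (length g*d) over windows of g word vectors.

    x is a list of n word vectors, each of length d.
    The returned feature map has n - g + 1 entries.
    """
    feature_map = []
    for j in range(len(x) - g + 1):
        # Concatenate the g word vectors of the current window.
        window = [value for word in x[j:j + g] for value in word]
        score = sum(ui * wi for ui, wi in zip(u, window)) + b
        feature_map.append(activation(score))
    return feature_map


def max_pool(feature_map, q):
    """Take the maximum over consecutive, non-overlapping windows of q features."""
    return [max(feature_map[i:i + q]) for i in range(0, len(feature_map), q)]
```

running `conv_feature_map` once per filter and per filter size, pooling each result with `max_pool`, and concatenating the pooled vectors gives the single block that is passed on to the dense layer.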
a dense layer with the softmax activation function is used on top of the pooling layer to retain the features generated by the pooling layer, as shown in ( ), where 'w' is a weight matrix, 'b e ' is a bias vector and 'e' is a non-linear activation function. the input of the dense layer may be of variable length; it produces a fixed-length output 'z', which is given as input for classification. the output layer defines the probability distribution and uses a softmax function. the probability of the output label 't' is given by ( ) , where 'w t ' denotes the weights associated with class label 't' in the output layer. we adopted the k-nearest neighbour (knn) classifier as a base-level classifier in the proposed model to supply the feature vector of the tweet to the meta-level (second-level) classifier. it is used as a first-level classifier because it performs better than other classifiers (decision tree, naive bayes classifier); a detailed explanation is given in sections . and . . . it accepts domain-specific features such as aid, needs, etc., as the input feature vector of a tweet. the knn classifier scores the neighbors of a tweet among the training tweets and uses the class labels of the 'k' most similar neighbors to predict the probability vector of the tweet. we use the euclidean distance $e(tw, tw') = \sqrt{\sum_{i=1}^{n} (tw_i - tw'_i)^2}$ to measure the similarity between the tweets $tw$ and $tw'$, as shown in ( ), where 'n' is the dimension of the tweet vectors $tw$ and $tw'$. the classes of these neighbors are weighted using the similarity of each neighbor to t w, where 'knn(t w)' denotes the set of k-nearest neighbors of tweet tw, δ(t w j , c i ) represents the probability of t w j with respect to the class c i , and i ranges over the three classes: need of resource, availability of resource and neither. finally, it produces a three-dimensional probability vector for each tweet in the testing data.
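the similarity-weighted knn vote described above can be sketched as follows. the mapping from euclidean distance to similarity, 1 / (1 + distance), is an assumption made for this illustration because the weighting equation is not available in the text; the class names and the function `knn_probabilities` are likewise hypothetical.

```python
import math

CLASSES = ("need", "availability", "none")


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_probabilities(query, train_vectors, train_labels, k=3):
    """Return a three-dimensional probability vector for one tweet.

    The k nearest training tweets vote for their own class, and each
    vote is weighted by an assumed similarity 1 / (1 + distance).
    """
    nearest = sorted(
        (euclidean(query, vec), label)
        for vec, label in zip(train_vectors, train_labels)
    )[:k]
    scores = dict.fromkeys(CLASSES, 0.0)
    for dist, label in nearest:
        scores[label] += 1.0 / (1.0 + dist)
    total = sum(scores.values())
    return [scores[c] / total for c in CLASSES]
```

normalizing the scores makes the output directly comparable with the softmax output of the cnn, which matters when the two vectors are later concatenated.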
results indicate that the knn classifier also plays a significant role in the proposed model for detecting nar tweets. in this work, we have adopted the svm classifier [ ] , one of the traditional machine learning algorithms, in the proposed model. svm is used as the meta-level classifier because it performs better than other classifiers (decision tree, naive bayes classifier); a detailed explanation is given in sections . and . . . it accepts the concatenation of the predicted outputs of the cnn and knn classifiers as input features, so the input vector is six-dimensional. we used the radial basis function (rbf) kernel in the svm classifier to transform the data into a higher-dimensional feature space. given a set of testing tweets, the base-level classifiers produce six-dimensional output vectors. these are sent as input features to the meta-level classifier (svm classifier). the output of the svm (second-level classifier) is used as the final tweet prediction. later, the learned model can be used to detect nar tweets during a disaster. the main advantage of the proposed stacked convolutional neural network for detecting nar tweets during a disaster is that it works effectively even for small datasets, due to the use of domain-specific features, and, through the cnn model, even when the words differ between training and testing tweets. the summarization of the proposed method is shown in algorithm .
algorithm : the summarization of the proposed method.
cnn and knn with proposed features
: it represents a tweet related to the availability of resources
: it represents a tweet not related to the need and availability of resources
: it represents a tweet related to the need of resources
steps:
. the tweets are preprocessed by applying the following techniques.
-removal of stop-words, numerals and unknown symbols.
-changing to lower case letters.
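the two-level flow described in this section can be sketched as below. the base-level and meta-level models are passed in as plain callables, so the sketch shows only how the six-dimensional meta-level input is built and routed; it does not implement the cnn, the knn or the rbf-kernel svm. the names `meta_features` and `stack` are hypothetical.

```python
def meta_features(cnn_probs, knn_probs):
    """Concatenate two three-class probability vectors into one six-dimensional vector."""
    if len(cnn_probs) != 3 or len(knn_probs) != 3:
        raise ValueError("each base-level classifier must output three class probabilities")
    return list(cnn_probs) + list(knn_probs)


def stack(tweets, cnn_predict, knn_predict, meta_predict):
    """Run the two-level flow: base-level outputs feed the meta-level classifier.

    cnn_predict and knn_predict map a tweet to three class probabilities;
    meta_predict maps a six-dimensional vector to the final class label.
    """
    return [
        meta_predict(meta_features(cnn_predict(t), knn_predict(t)))
        for t in tweets
    ]
```

in a full system, `meta_predict` would be the prediction function of the trained meta-level svm, fitted on the six-dimensional vectors produced for the training tweets.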
in this section, we first introduce the datasets, the parameter details of the model and the metrics used for performance evaluation. subsequently, the experimental results are presented: the preliminary experiments, the classifier selection experiments for the proposed model and the ablation experiments. furthermore, the proposed approach is compared with existing approaches. the data were collected from the nepal and italy earthquakes that occurred during and , respectively. tweets were crawled through the twitter api using the tweet-ids, which were obtained from the authors of [ ] . out of the total tweets, % and % are used for training and testing the proposed model, respectively. the details of the disaster datasets are given in table . the code is made available to the public . the cnn model is trained by optimizing the sparse cross-entropy of ( ) using the adadelta [ ] algorithm. the maximum number of epochs is set at . mini-batch sizes of , , are tried; the mini-batch size of gives better results than the other batch sizes, as tabulated in table . filter sizes of , , are used. to avoid over-fitting, . dropout [ ] and early stopping based on the validation loss are used. all the experiments are performed using the python language scikit [ ] package. table lists the various methods and their abbreviations. the first, second and third columns indicate the serial number, method name and abbreviation, respectively. in the abbreviation, the methods before and after the '+' symbol are the base-level classifiers (first-level classifiers), '+' indicates the concatenation of the predicted outputs of the base-level classifiers, and the '→' symbol indicates the flow of the predicted output of the base-level classifiers as input to the meta-classifier. the method after the '→' symbol is the meta-level classifier (second-level classifier).
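the early-stopping criterion mentioned in the training setup above can be illustrated with a short sketch. the patience value is an assumption, since the text only states that stopping is based on the validation loss, and the function name `early_stopping_epoch` is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs through all epochs.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return len(val_losses)
```

in practice the weights from the epoch with the lowest validation loss would be kept, which this sketch leaves out.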
the performance of the proposed models is assessed using standard measures, namely accuracy, precision, recall and f -score, which are calculated using eqs. to , respectively:
$\text{accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$
$\text{precision} = \frac{tp}{tp + fp}$
$\text{recall} = \frac{tp}{tp + fn}$
$\text{f-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
where tp, tn, fp and fn denote the numbers of true positives, true negatives, false positives and false negatives, respectively. table reports the results for various batch sizes; the batch size of gave the best accuracy compared to the batch sizes of and , so it is used for the cnn in further experiments. this section explains the results of the preliminary experiments, the classifier selection experiments in the proposed model, and the ablation experiments. initially, an experiment is performed with the svm classifier using the proposed domain-specific features for the identification of nar tweets, and compared to the bow model, as shown in table . it highlights the impact of the proposed domain-specific features compared with the bow model. the features are beneficial for identifying tweets, especially for smaller datasets. later, various experiments are performed using the cnn model to determine the best batch size. batch sizes of , and are used. the accuracy of the cnn model for the different batch sizes is shown in table . the results show that the cnn model provides the best outcome for the batch size of compared to the others, and . therefore, for additional experiments, batch size is considered. note that the values reported in all tables are averages over the need and availability of resource classes. the following four experiments are performed for the proposed method to choose the most appropriate base-level and meta-level classifiers. . in the first experiment, the outputs of the cnn and svm (base-level classifiers) are given as features to the meta-level classifier. the results obtained by varying the meta-level classifier (svm, knn, decision tree and naive bayes) are reported in table . knn performs better than the other classifiers for the nepal earthquake dataset.
but in the case of the italy earthquake dataset, svm performs better than the other classifiers. . in the second experiment, the outputs of the cnn and the decision tree (base-level classifiers) are given as features to the meta-level classifier. the models obtained in the second experiment with different meta-level classifiers are cds, cdk, cdnb and cdd, and the results are reported in table . among these models, cdk gives the best accuracy for both the nepal and italy earthquake datasets. cdnb provides the same accuracy as cdk on the italy earthquake dataset. . in the third experiment, the outputs of the cnn and naive bayes classifiers (base-level classifiers) are given as features to the meta-level classifier. the models obtained in the third experiment by varying the meta-level classifier are cnbs, cnbk, cnbnb and cnbd, and the results are reported in table . cnbnb has the best accuracy among the models for both disaster datasets. cnbs gives the same accuracy as cnbnb on the italy earthquake dataset. . finally, in the fourth experiment, the outputs of the cnn and knn classifiers (base-level classifiers) are given as input to the meta-classifier. the models obtained in the fourth experiment by varying the meta-classifier are cks, ckk, cknb and ckd, and the results are tabulated in table . cks achieves the highest accuracy among the models for both disaster datasets. after performing the four experiments, the models that achieve the best f -score in each experiment, namely cdk, cks / ckk, cnbs and csk, are selected for both disaster datasets. in the same way, the models that achieve the highest precision on the nepal earthquake dataset, namely cknb, cdnb, cnbb / cnbd and csnb, are selected. similarly, csnb, cds, cnbnb and cks achieve the best precision for the italy earthquake dataset.
in terms of execution time, cds runs fastest on average over both disaster datasets. however, it does not give the best results compared to other models. finally, all models are compared, and csk is the model that achieves the best f -score for the nepal earthquake dataset. in terms of accuracy, the csk model gives the best performance for the nepal earthquake dataset but not for the italy earthquake dataset. in the overall comparison of all the models, cks performs better than the other models on both disaster datasets. therefore, cks is selected to identify nar tweets during a disaster. various experiments are conducted to assess the effectiveness of the individual components of the proposed model (cks) on the two datasets, nepal and italy earthquake. the proposed model is initially evaluated and the results for the two datasets are tabulated in table . later, experiments are performed by excluding the informative (domain-specific) features and the cnn individually from the proposed model, and the results are reported in table . the informative features play a crucial role in the proposed method for the italy earthquake dataset: removing them reduces the accuracy of the proposed model by almost . %. in the case of the nepal earthquake, the accuracy is reduced by approximately . %. removing the cnn model drastically reduces performance, by almost % and % for the nepal and italy earthquake datasets, respectively. this indicates that cnn plays a significant role for both disaster datasets. when both the cnn and svm classifiers are removed from the proposed model, the performance reduction is the same as when only the cnn is removed. this indicates that the svm classifier alone does not have much impact on the performance of the model. however, the proposed method (cks) provides better accuracy than any of its individual components for identifying nar tweets during a disaster.
this is also confirmed by statistical validation, which is given in section . . . this section provides a brief explanation of the methods that are compared with the proposed model. it is divided into two subsections: . classification methodologies. . statistical validation of the classifier models. this section describes the comparison of the proposed model with existing classification methodologies [ , , , ] . in [ ] , the authors presented the aidr platform for automatic classification of tweets into user-defined categories using uni-gram and bi-gram features. accordingly, in this paper, the svm classifier with uni-gram and bi-gram features is used as a baseline, and experiments are performed. in [ ] , the authors used features such as location, infrastructure damage, communication, etc., for identifying resources during a disaster, with an svm classifier for classification. the authors [ ] used cnn for sentence classification with hyperparameter tuning. similarly, a cnn is evaluated and compared with the proposed model. in [ ] , the authors used low-level lexical and syntactical features for identifying situational information during a disaster. the proposed cks model achieves the best accuracy compared to existing methods on both the nepal and italy earthquake datasets for identifying nar tweets, and the results are reported in table . the better accuracy of the proposed model compared to the existing methods is due to the use of informative features and traditional classifiers, which enhance the diversity of the model for identifying nar tweets. in general, stacking models give better accuracy than individual models when the models are diverse.
it is also observed from table that the italy earthquake dataset has a larger impact on the proposed method than the nepal earthquake dataset, owing to the small size of the dataset. in terms of execution time, the rudra model [ ] runs fastest and the bow model [ ] runs slowest compared to the other models. however, the fastest model does not give the best result for detecting nar tweets during the disaster.

in this section, we investigate the statistical significance of the differences between the classification models. the authors of [ ] suggest using the mcnemar statistical test for deep learning models. therefore, we use the mcnemar test [ ] to assess the statistical significance of the differences between the classification methods. the contingency table of the mcnemar test is shown in table . here, 'n ' represents the number of tweets correctly detected by both model a and model b. 'n ' represents the number of tweets correctly detected by model b and wrongly detected by model a. 'n ' represents the number of tweets correctly detected by model a and wrongly detected by model b. 'n ' represents the number of tweets wrongly detected by both model a and model b. the chi-squared (χ²) statistic can be defined as follows:

the hypotheses are: . null hypothesis (n ): there is no significant difference between the performances of the classifier models. . alternate hypothesis (n ): there is a significant difference between the performances of the classifier models. if n is accepted, then the probability (p) value is greater than . . if n is accepted, then the probability (p) value is less than . .

tables and show the results of the mcnemar statistical test for the variants of the proposed method and for the comparison with the existing methods. in the tables, '↑↑' indicates strong evidence that the difference between the proposed method and the other method is statistically significant, with a probability value less than . (p< . ).
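to make the test concrete, the following is a minimal sketch of the mcnemar computation from the two discordant counts of the contingency table. it assumes the commonly used continuity-corrected form of the statistic, (|b - c| - 1)^2 / (b + c), which may differ from the exact variant used in the paper, and it derives the p-value from the chi-squared distribution with one degree of freedom. the function name `mcnemar_test` and its parameter names are hypothetical.

```python
import math


def mcnemar_test(only_b_correct, only_a_correct, continuity=True):
    """McNemar's test on the two discordant cells of a 2x2 table.

    only_b_correct: tweets model B classifies correctly and model A wrongly.
    only_a_correct: tweets model A classifies correctly and model B wrongly.
    Returns (chi2, p_value). Assumption: continuity-corrected statistic
    by default; the paper's exact variant is not recoverable from the text.
    """
    discordant = only_b_correct + only_a_correct
    if discordant == 0:
        # The models never disagree, so there is no evidence of a difference.
        return 0.0, 1.0
    diff = abs(only_b_correct - only_a_correct)
    if continuity:
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / discordant
    # Survival function of chi-squared with 1 degree of freedom:
    # P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from the paper.
    chi2, p = mcnemar_test(only_b_correct=10, only_a_correct=30)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

the counts of tweets on which both models agree do not enter the statistic; only the disagreements carry information about which model is better.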
the '↑↑' marker corresponds to a confidence level of . % for the proposed method. '↑' indicates weak evidence that the difference between the proposed method and the other method is statistically significant, with a probability value between . and . ( .