key: cord-0316750-gc0w1y23 authors: Andreu-Perez, Javier; Pérez-Espinosa, Humberto; Timonet, Eva; Kiani, Mehrin; Girón-Pérez, Manuel I.; Benitez-Trinidad, Alma B.; Jarchi, Delaram; Rosales-Pérez, Alejandro; Gatzoulis, Nick; Reyes-Galaviz, Orion F.; Torres-García, Alejandro; Reyes-García, Carlos A.; Ali, Zulfiqar; Rivas, Francisco title: A Generic Deep Learning Based Cough Analysis System from Clinically Validated Samples for Point-of-Need Covid-19 Test and Severity Levels date: 2021-11-10 journal: nan DOI: 10.1109/tsc.2021.3061402 sha: ecbb898c16029ea1943f7dc72c30aa018c3afe4f doc_id: 316750 cord_uid: gc0w1y23 We seek to evaluate the detection performance of a rapid primary screening tool for Covid-19 based solely on the cough sound from 8,380 clinically validated samples with laboratory molecular tests (2,339 Covid-19 positives and 6,041 Covid-19 negatives). Samples were clinically labelled according to the results and severity based on quantitative RT-PCR (qRT-PCR) analysis, cycle threshold, and lymphocyte counts from the patients. Our proposed generic method is an algorithm based on Empirical Mode Decomposition (EMD), with subsequent classification based on a tensor of audio features and a deep artificial neural network classifier with convolutional layers called 'DeepCough'. Two different versions of DeepCough, based on the number of tensor dimensions, i.e. DeepCough2D and DeepCough3D, have been investigated. These methods have been deployed in a multi-platform proof-of-concept Web App, CoughDetect, to administer this test anonymously. Covid-19 recognition results achieved a promising AUC (Area Under Curve) of 98.80 ± 0.83%, sensitivity of 96.43 ± 1.85%, and specificity of 96.20 ± 1.74%, and an AUC of 81.08 ± 5.05% for the recognition of three severity levels. Our proposed web tool and underpinning algorithm for the robust, fast, point-of-need identification of Covid-19 facilitates the rapid detection of the infection. 
We believe that it has the potential to significantly hamper the Covid-19 pandemic across the world. The COrona VIrus Disease 2019 (Covid-19) is an infectious disease caused by the newly discovered severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Covid-19 bears stark similarities with the Severe Acute Respiratory Syndrome (SARS) as well as the common cold. According to the World Health Organization (WHO), the mild symptoms of Covid-19 can include fever, cough and shortness of breath, akin to the common cold [1]. Like SARS, in more severe cases Covid-19 also causes pneumonia and/or significant breathing difficulties, and in some rare instances the disease can be fatal, with the overall mortality rate estimated at 0.28% worldwide. The first cases of Covid-19 were initially diagnosed as pneumonia on 31 December 2019, and later re-diagnosed as Covid-19. Covid-19 has proven to be a very infectious disease, with the virus (SARS-CoV-2) spreading quickly upon close contact with an infected person (mean infection rate of 2.5). More specifically, according to the WHO, the Covid-19 virus is transmitted through direct contact with respiratory droplets of an infected person (generated through coughing and sneezing) [2]. The WHO declared it a global pandemic on 11 March 2020, within three months of the first reported cases in China. Covid-19 has put considerable strain on health systems worldwide, with even developed countries struggling to test enough people to stop its spread effectively. Hence, taking the context of the Covid-19 pandemic into consideration, it is important to re-think the classical approaches for timely case finding [3], as well as to utilise the limited resources available most effectively [4]. In past epidemics, such as malaria, a two-pronged approach to screening was successfully employed to combat the spread of a prevalent disease [5]. 
In these two-stage strategies, the primary stage focuses on greater accessibility and ease of screening in a cost-effective manner. The primary stage is to 'alert' a potential carrier if they test positive on a primary screening test. In most cases, only those who test positive on the primary test go on to the secondary test, hence reducing the burden on the health system and making the most of the resources available to conduct the secondary test. The secondary screening is where the null hypothesis that the participant is not carrying an infection is accepted or rejected. The current techniques employed for screening of Covid-19 use serology, and diagnosis is based on the presence of genetic material of the virus. Clinical molecular tests have robust diagnostic accuracy but require specialised equipment, as well as trained personnel to conduct the test. The turn-around time of these tests can vary from hours to several days. Given the established success of two-pronged screening mechanisms in hampering the spread of infectious diseases, in this work we aim to develop a web-based tool for the primary screening of Covid-19. The motivation is to identify Covid-19 carriers using a model trained with clinically validated cough signals, since Covid-19 affects the respiratory system [6]-[8]. Established works have evidenced the possibility of using the latent sound characteristics of coughs to identify respiratory diseases [9], [10]. In addition, prior works have also reported that voluntary coughs (asymptomatic) contain sound characteristics that allow detecting abnormal pulmonary functioning and respiratory diseases [11], [12]. 
The remainder of this paper is structured as follows: section II outlines a summary of contributions; section III gives an overview of related work; section IV describes the procedural and methodological stages of the development of this technology; section V evaluates the recognition and assessment results; section VI discusses the results and achievements; and section VII concludes. The main contributions of this work are manifold and listed as follows: 1) The proposed method 'DeepCough' achieves high accuracy without the necessity of using specific pre-trained models or transfer learning of data from other studies. Hence, unlike related work, the proposed methodology is generic, paving the way for derivative works. 2) In contrast to related work, we are able to evaluate the real capacity for detecting Covid-19 in a large clinically validated dataset (8,000+ samples) where all data samples are matched with a molecular test of Covid-19 viral infection dispensed to participants in certified laboratories. 3) Also unique to this work, the accompanying molecular tests (qRT-PCR) along with the cough samples allow us to predict the extent of the infection as well. This is studied in this work using either the cycle threshold (Ct) from the qRT-PCR test or lymphocyte counts. 4) Furthermore, a full-stack automatic processing framework, from a raw sound stream to the test results, is also presented. 5) Development of a tangible test service prototype, as a platform-independent web-app service, CoughDetect (https://coughdetect.com). In an attempt to better understand the Covid-19 infection and its associated symptoms, scientists have been collecting a wide spectrum of information in recent months. This includes, but is not limited to, the respiratory sounds related to Covid-19 [6], [8], [13], thermal imaging [14], digestive symptoms [15], as well as self-reported surveys. 
The motivation for collating Covid-19 related information is to develop robust mechanisms for early detection of Covid-19. The most common symptoms of Covid-19 have been linked to pneumonia (cough, fever, shortness of breath, among others). Therefore, the analysis of cough audio signals is considered a viable course of action for a primary Covid-19 diagnosis [8]. In general, three different respiratory sounds have been investigated to detect Covid-19 in patients: voice, breath and cough. The voice is a bio-signal that has been studied for many years to decode emotional, mental and physical aspects of a speaker. Usman et al. [16] conclude that there is a strong correlation between speech and Covid-19 symptoms, and therefore endorse the usage of speech signals for detecting Covid-19. Faezipour et al. [17] recommended the use of signal processing techniques in tandem with state-of-the-art machine learning and pattern recognition techniques for preliminary diagnosis of Covid-19 from breathing audio signals. However, neither study [16], [17] performs Covid-19 recognition at this stage, with the additional caveat that the quality of breath sounds hinges on the sensitivity of the microphone. Another notable work on breathing patterns is by Wang et al. [18], who developed a respiratory simulation model (RSM) for detecting abnormal respiratory patterns of people remotely and unobtrusively using a depth camera. However, their proposed RSM did not incorporate data from Covid-19 carriers. Moreover, the use of video cameras may raise privacy concerns. Imran et al. [19] presented AI4COVID, an approach to classify coughs using deep learning, and achieved an accuracy of 92.85%. However, their dataset contains only 70 Covid-19 cough samples, which renders their analysis inconclusive. Sharma et al. [20] presented Coswara, a database embodying respiratory sounds (cough, breath, and voice). This dataset is crowdsourced (volunteers from the web), i.e. 
not clinically controlled samples, with only eight positive Covid-19 samples at the time of writing of this study. Here, it is also important to note that sound modalities, especially voice, embody privacy concerns, since an individual can be identified from their voice [21]. Other notable database creation projects collecting data from the web include: Opensigma by MIT, collecting cough samples; Corona Voice Detect by Voca.ai and Carnegie Mellon University (CMU), collecting voice data; Covid Voice Detector, also by CMU, collecting further voice samples; and finally, the Covid-19 Sounds App by the University of Cambridge, collecting crowdsourced samples of voice, cough, and breath. A consensus derived from the related work referenced above is the challenge associated with the collection of clinically validated Covid-19 data which can subsequently be used for the training of Covid-19 recognition mechanisms. Towards this end, the data used in this study is collected following a strict protocol designed by expert immunologists at laboratories and hospitals dedicated to Covid-19 diagnosis. Another major strength of our proposed web-based app CoughDetect lies in the anonymity of the users. Cough sounds are inherently anonymous. By collecting only cough sounds, using in-house code, and following strict privacy-preserving practices, we have ensured that participants share their cough samples without exposing their personal information. This robust quality control of our collected samples is an advantage of our work with respect to other studies, e.g. those collecting clinical data via web questionnaires (crowdsourcing). Covid-19 web-app service from only cough sound samples: The cough samples are collected by means of an in-house developed web app named CoughDetect. The CoughDetect app (https://coughdetect.com) can be easily used with a laptop, mobile phone, or tablet, as shown in Fig. 1. 
The development of the whole stack for Covid-19 primary screening required the use of several technologies to capture, process, analyse and make the test available. An illustration of the proposed technology stack for the CoughDetect operational architecture is shown in Fig. 2. The app records (.wav) sound files at a 44,100 Hz sample rate and transfers them to a secure data server using HTTP over an SSL connection. The three stages of the development stack are: 1) sound stream processing and detection; 2) a recognition method based on the generation of an acoustic cough tensor and deep learning (DeepCough); 3) development and deployment of the framework in a web tool app (CoughDetect). A flow chart delineating the steps in the inference mechanism of DeepCough is shown in Fig. 3. The pre-processing of the raw sound signals is done to increase the signal-to-noise ratio and reduce the signal size. Cough bursts are detected in the recording and the rest of the signal is discarded. A set of low-level acoustic descriptors (a.k.a. sonographs) is extracted from a pre-processed cough sound. Two- and three-dimensional (2D and 3D) tensors are generated from these descriptors. These tensors are fed to a convolutional deep neural network that allows classification of positive and negative Covid-19 cough samples. Additionally, positive patients are sub-classified according to severity: borderline positive, standard positive, or high positive based on qRT-PCR values; and lymphopenia or normal lymphocytes based on their blood lymphocyte count, as shown in Fig. 3. Further details of the research ethics and the different stages for building the CoughTensor and classification are presented next. The collection of clinically validated cough data was carried out in collaboration with the Hospital Costa del Sol Health Agency in Málaga, Spain and the National Laboratory for Research in Food Safety (LANIIA) in Nayarit, Mexico. 
The collection of the data started at the peak of the pandemic in Spain and Mexico on the 4th of April 2020 and lasted until the 21st of September 2020. The clinical protocols and research ethics were approved by the respective local institutional ethics committees (Code: BIOETIC_HUM_2020_02, Mexico; Code: APP_Covid-19_03042020, Spain). The Nayarit Unit and Málaga hospital are both accredited centres for the molecular diagnosis of Covid-19 and are also ISO 9001 certified. The cough samples are collected from patients coming to the named institutions for a qRT-PCR test for detection of SARS-CoV-2 (Covid-19), by registered nurses trained to use the CoughDetect app. At all stages of the cough sample collection, the guidelines for interacting with potential Covid-19 patients recommended by the WHO are strictly followed. For instance, the nurse wears personal protective equipment at all times, and a protocol for smartphone disinfection, each time a cough is recorded, is observed. The user interface and control functions of the Web App have been developed with in-house code to uphold the anonymity of the users and minimise the possibility of information leakage to external entities. This is in conformity with both the EU General Data Protection Regulation (GDPR) and the UK Data Protection Act 2018. In addition, our research and application also meet the ethical standards of the Declaration of Helsinki. Written informed consent was collected from each participant prior to acquiring their data sample. Clinical data was collected for this study by healthcare professionals (Table I). ©2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. 
test and 6,041 coughs are from patients with a negative qRT-PCR test. Of those patients who tested negative in the qRT-PCR test, 47.46% had no symptoms and 52.54% had symptoms at the time of taking the samples. Of those patients who tested positive in the qRT-PCR test, 20.00% had no symptoms and 80.00% had symptoms at the time of taking the samples. Cough samples (.wav) were acquired at 44.1 kHz, in Pulse-code modulation (PCM) format, mono-channel. The raw sound data is low-pass filtered with a cut-off frequency of 1 kHz. A Chebyshev type-2 second-order filter with a transition frequency of 10 Hz is applied to retain the high-pitch sound of the cough while simultaneously attenuating background sounds. Before cough detection, the filtered sound signal is decimated. For an initial bout of sounds in the recording, such as an initial involuntary voice before coughing, envelope analysis detects the first peak amplitude and the signal is trimmed accordingly. The cough detection algorithm applied to the filtered audio signals is based on empirical mode decomposition (EMD) [22], [23]. EMD is a fully data-driven signal processing technique that does not employ basis functions. EMD splits a sequence into a set of smaller sequences, referred to as intrinsic mode functions (IMFs), or simply modes, whereby each mode contains the energy associated with a certain scale. EMD has become popular in many applications, e.g. wearable sensors [24], perhaps because the decomposition occurs in the same space as the original sequence. EMD is applied to find the modes that best reflect the coughing periods. These periods are empirically selected to essentially detect cough bursts in the filtered sound recordings. Individual IMFs, or a set of them, can be objectively used for signal filtering, peak detection and signal reconstruction. For cough detection, depending on the noise level of the signal, certain IMFs contain rich information related to the peaks associated with coughs. 
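The low-pass filtering and decimation step described above can be sketched as follows. The second-order Chebyshev type-II design and the 1 kHz cut-off come from the text; the 40 dB stop-band attenuation, the placement of the stop-band edge just above the cut-off, and the decimation factor of 10 are illustrative assumptions not stated by the authors.

```python
import numpy as np
from scipy import signal

def preprocess_cough(raw, fs=44100, cutoff_hz=1000.0, target_fs=4410):
    """Low-pass filter and decimate a raw cough recording.

    The filter is a second-order Chebyshev type-II low-pass, as in the
    paper; stop-band attenuation (40 dB) and decimation factor (10) are
    illustrative assumptions.
    """
    # Stop-band edge placed just above the cut-off (narrow transition band).
    wn = (cutoff_hz + 10.0) / (fs / 2.0)
    sos = signal.cheby2(2, 40, wn, btype="lowpass", output="sos")
    filtered = signal.sosfiltfilt(sos, raw)
    # Decimate to reduce the signal size before cough detection.
    factor = fs // target_fs
    return signal.decimate(filtered, factor, zero_phase=True)
```

On a one-second test tone containing 100 Hz and 5 kHz components, the 5 kHz component is attenuated well below the 40 dB stop-band floor while the 100 Hz component passes essentially unchanged.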
Based on testing a number of signals with various noise levels, the 5th and the 9th modes are found to be the prime IMFs essential for detection. The instantaneous amplitudes (IAs) of the selected modes (5th and 9th) are calculated via the Hilbert transform [22]. The IAs of the selected modes are averaged, low-pass filtered using a median filter with a window size of 500 signal samples, and normalised. Thresholding is performed using local signal peak detection: a signal sample is a local peak if it is a local maximum preceded (to the left) by a value difference of at least ∆ = 0.006. Thresholding the processed IAs partitions the original signal into cough and non-cough burst events. A summary of the EMD-based algorithm for cough detection is depicted in Fig. 5. The detection algorithm produces a sequence of binary values: ones for cough and zeros for non-cough segments. A post-processing step joins consecutive cough bursts (segments) which are part of a single or main cough. To do this, an additional threshold is specified to decide whether to join neighbouring cough bursts separated by fewer than 1,500 decimated signal samples (0.34 seconds). Once an entire cough sound is detected, the rest of the signal is discarded. In addition, segments of short duration (fewer than 400 signal samples) are discarded, as they were often found to be more representative of short spikes in the signal due to ambient noise than of part of a cough sound. The final output is a vector of indices that indicates where a cough is found in the raw sound stream. Following detection, the information contained in the audio signals is transformed into tensor form. We focused on representations that capture the main acoustic properties of the coughs. We used three types of sonographs: 1) Mel-frequency Cepstral Coefficients (MFCCs), 2) Mel-scaled spectrogram (MelSpec), and 3) Linear Predictive Coding Spectrum (LPCS) coefficients. 
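The envelope thresholding and burst-joining post-processing of the detection stage can be sketched as below. The input is assumed to be the averaged instantaneous amplitude of IMFs 5 and 9 (obtained via the Hilbert transform, as in the text); a plain amplitude threshold at ∆ stands in for the authors' local-peak rule, and the 501-sample median window approximates the stated 500-sample filter (medfilt requires an odd size).

```python
import numpy as np
from scipy.signal import medfilt

def detect_cough_segments(envelope, delta=0.006, join_gap=1500, min_len=400):
    """Post-process the averaged instantaneous amplitude of the selected
    IMFs into (start, end) cough segments, in decimated-sample indices."""
    # Median-filter (approx. 500-sample window) and normalise to [0, 1].
    env = medfilt(envelope, kernel_size=501)
    env = env / (env.max() + 1e-12)
    # Simplified thresholding: samples above delta are cough candidates
    # (the paper uses a local-peak rule with the same delta value).
    mask = env > delta
    d = np.diff(mask.astype(int))
    starts = list(np.where(d == 1)[0] + 1)
    ends = list(np.where(d == -1)[0] + 1)
    if mask[0]:
        starts.insert(0, 0)
    if mask[-1]:
        ends.append(len(mask))
    # Join bursts separated by fewer than `join_gap` decimated samples
    # (0.34 s), then drop segments shorter than `min_len` samples.
    joined = []
    for s, e in zip(starts, ends):
        if joined and s - joined[-1][1] < join_gap:
            joined[-1] = (joined[-1][0], e)
        else:
            joined.append((s, e))
    return [(int(s), int(e)) for s, e in joined if e - s >= min_len]
```

For a synthetic envelope with two nearby bursts and one short noise spike, the two bursts are merged into a single cough segment and the spike is discarded.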
These sound representations have specific properties for classification in intelligent audio analysis. We describe them here and discuss what they can tell us about cough sounds. 1) Mel-frequency Cepstral Coefficients: MFCCs take into account human auditory perception, in which low frequencies are resolved better than high frequencies. The frequency bands are logarithmically located according to the Mel scale, which simulates the human auditory response more appropriately than linearly spaced bands, while disregarding other information. This descriptor is robust to variations in speech across subjects as well as to variations in recording conditions. MFCCs have been widely used in frequency-domain speech recognition [25]-[27]. The computation of MFCCs involves the following main steps: (i) blocking of pre-processed cough sounds into overlapping windows to avoid loss of information at the ends of windows, (ii) applying a Hamming window to each frame to taper the ends of the frame to zero, so that spectral leakage is avoided when applying the Fourier Transform (FT), and (iii) computation of the power spectrum by applying the FT. Next, (iv) the computed spectrum is passed through Mel-spaced band-pass filters, where each filter provides the sum of energy for each frame. Finally, (v) the application of the discrete cosine transform yields the MFCCs. 2) Mel-scaled spectrogram (MelSpec): The MelSpec is a sonograph in which frequencies are converted to the Mel scale in order to visually assess the energy distribution in the signal. The distribution of the energy in the Mel-based spectrum is relevant for the detection of Covid-19 positive samples. Fig. 6 provides examples of the energy spectrum for positive and negative samples. It can be observed that when a Covid-19 patient starts coughing, the energy is in the low-frequency region. However, over time the energy shifts to the high-frequency region. 
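Returning to the MFCC computation, steps (i)-(v) can be sketched in NumPy as below. The hop length of 512 and the 33 coefficients match values given elsewhere in the paper; the FFT size of 2048 is an illustrative assumption.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular band-pass filters spaced evenly on the Mel scale."""
    pts = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mfcc(x, fs=22050, n_fft=2048, hop=512, n_coeff=33):
    """MFCC computation following steps (i)-(v) in the text."""
    # (i) block the signal into overlapping frames
    frames = np.array([x[s:s + n_fft]
                       for s in range(0, len(x) - n_fft + 1, hop)])
    # (ii) Hamming window tapers frame edges to limit spectral leakage
    frames = frames * np.hamming(n_fft)
    # (iii) power spectrum via the FFT
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # (iv) Mel-spaced band-pass filters summing the energy per band
    mel_energy = power @ mel_filterbank(n_coeff, n_fft, fs).T
    # (v) DCT of the log band energies yields the cepstral coefficients
    return dct(np.log(mel_energy + 1e-10), type=2, axis=1, norm="ortho")
```

One second of audio at 22,050 Hz thus yields a 40 × 33 coefficient matrix, i.e. one row of 33 MFCCs per frame.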
The lower frequencies at the start may be due to pain; later, the extra effort required for coughing perhaps makes the signal more irregular and complex over time. A similar trend is also observed in the voices of people who are suffering from pain due to vocal fold disorders. The extra effort in speaking renders the signal complex, resulting in an irregular spectrum (continuous voice breaks that disperse energy) compared to a healthy person [28], [29]. In contrast, for a Covid-19 negative person, the energy is uniformly distributed among all frequencies. Therefore, the stark differences between MelSpecs from Covid-19 positive and negative individuals can be leveraged for successful identification of Covid-19 infection. 3) Linear Predictive Coding Spectrum (LPCS) coefficients: LPCS models the emission source of an acoustic signal. LPCS is based on the source-filter model of phonatory signals. It is frequently used for the processing of speech and infant cries. Linear predictive coding analysis estimates the values of a signal as a linear function of previous samples. LPCS is a simplified vocal tract model that reflects the speech production system using a source-filter model. LPCS derives a compact representation of the spectral magnitude of brief-duration signals (e.g. coughs). Its parametric analysis allows more accurate spectral resolution than the non-parametric FT when the signal is stationary for only a short time [30]. This sound representation has been used for assessing the vocality of cough sounds [31] and for detecting coughs among other human sounds [32]. For each audio frame, we extracted MFCCs with 33 coefficients, MelSpec with 33 bands, and LPCS with 33 line spectral pair frequencies from 33 coefficients. We obtained three matrices of 33 columns by the number of frames of the audio sample. The three sonographs are stacked to
form a three-dimensional tensor. Since cough samples have different durations, they have different numbers of frames. For all samples, the tensor is padded with zeros to complete 100 × 33 × 3 matrices (see Fig. 3), so that all tensors have the same shape before being passed to the training stage. We set a sampling rate of 22,050 Hz and a hop length of 512. The 100 frames are equivalent to around 2.3 seconds. We chose this tensor length because the minimum duration of a cough event after pre-processing and detection (section IV-B) falls in this range. Additionally, using this length we ensure that no spurious noise is included in the audio input. The CoughTensors generated in Section IV-C are input to a stack of convolution blocks. Fig. 7 illustrates the architecture of DeepCough along with the dimensions of each layer. The sonograph tensor is fed to the convolution blocks in a manner analogous to how RGB images are processed. The first dimension corresponds to the horizontal axis of the sonograph (time frames), the second dimension is the vertical axis (frequencies, bands, coefficients), and the third dimension is the type of sonograph. For comparison purposes we defined two types of DeepCough: 1) DeepCough2D: the CoughTensor includes only the 2D MelSpec, making a tensor spanning two dimensions (frequency and time), i.e. 100 × 33 × 1. 2) DeepCough3D: the CoughTensor stacks all sonographs described in section IV-C, with a third dimension added for each sonograph, hence rendering a tensor of size 100 × 33 × 3. 
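Assembling the fixed-size CoughTensor from the variable-length sonograph matrices can be sketched as below. The text specifies zero-padding up to 100 frames; truncating coughs longer than 100 frames is our assumption.

```python
import numpy as np

def pad_cough_tensor(sonographs, n_frames=100):
    """Stack per-frame sonograph matrices (each of shape frames x 33)
    into a fixed-size n_frames x 33 x channels tensor, zero-padding
    (or truncating) along the time axis."""
    n_feat = sonographs[0].shape[1]
    tensor = np.zeros((n_frames, n_feat, len(sonographs)))
    for ch, s in enumerate(sonographs):
        t = min(n_frames, s.shape[0])
        tensor[:t, :, ch] = s[:t]
    return tensor
```

Passing the MFCC, MelSpec and LPCS matrices of one cough yields the 100 × 33 × 3 input described above; a single-element list of the MelSpec matrix alone would give the 100 × 33 × 1 DeepCough2D input.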
Each convolutional block is composed of the following layers: • Convolutional layer with rectified linear units (ReLU): the convolution window is set to 2 × 2 (height/width) and the initial padding is set to the length of the input tensor. The input dimensions are rows, columns and channels. • Max pooling layer: the pooling window is also set to 2 × 2 for height and width. • Dropout layer: a dropout rate of 20% in each block deters the model from over-fitting. This basic block is stacked four times, permitting a balance between architectural depth and complexity. The stack is followed by subsequent layers that transform the intermediate outputs for the final layer: • Global average pooling layer (GA): averages over the spatial dimensions of the input tensor until each spatial dimension is one. • Dense layer (D): a dense layer yielding an output equivalent to the number of classes (one unit per class). • Softmax layer: a softmax activation function that performs classification over the inputs. Adaptive Moment Estimation (Adam) is the optimiser used to train the network, with a categorical cross-entropy loss function. The evaluation metric during training is the sum of the area under the curve (AUC) and balanced accuracy. The entire model is implemented in Keras [33] with a TensorFlow backend. The remarkable classification prowess of DeepCough arises from representation learning, via convolutional neural networks, of the sonograph representations. It is not only an intuitive approach for the analysis of pattern singularities in cough sounds, but it also has the capacity to integrate information from different sonographs, thereby jointly performing pattern analysis on the information representing the perception (MFCCs, MelSpec) and emission (LPCS) characteristics of sounds (section IV-C). The methods described in this paper are deployed in a Web App proof of concept (POC) available at https://coughdetect.com. 
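The block layout just described (2 × 2 convolution with ReLU, 2 × 2 max pooling, 20% dropout, stacked four times, then global average pooling, a dense layer and softmax, trained with Adam and categorical cross-entropy) can be sketched in Keras as below. The filter count of 32 and the 'same' padding are illustrative assumptions, and the paper's AUC-plus-balanced-accuracy training metric is omitted for brevity.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_deepcough(input_shape=(100, 33, 3), n_classes=2, n_filters=32):
    """Sketch of the DeepCough architecture; filter count and padding
    mode are assumptions, the block layout follows the text."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(4):  # four convolutional blocks
        x = layers.Conv2D(n_filters, (2, 2), padding="same",
                          activation="relu")(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Dropout(0.2)(x)
    # Collapse the remaining spatial dimensions to a single vector.
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model
```

Instantiating with `input_shape=(100, 33, 1)` gives the DeepCough2D variant and `(100, 33, 3)` the DeepCough3D variant; both emit a two-way softmax over the positive/negative classes.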
The main objectives of the interface are as follows: • Enable a sleek, multi-platform Web App that can be accessed from any device with Internet connectivity, without installation, i.e. like accessing any other Web page. • Run without session cookies (page reloads) or third-party services, to ensure the patient's anonymity is upheld. • Keep user-server interaction to a one-off request and response; multiple interactions with the server are prevented by reducing the number of requests to the server. The use of the MERN (MongoDB, ExpressJS, ReactJS, NodeJS) stack enables a true separation of layers, allowing flexible control over each front-end and back-end component, as depicted in Fig. 2. React fundamentally uses the SPA (Single Page Application) approach to quickly load a single resource (index.js) that contains the entire application, rather than sending HTTP requests to the server every time a user wants to navigate elsewhere within the app. Not having to reload the page removes the need to store cookies on the user's machine that could be used to re-identify the user. The front-end logic is mainly JavaScript code that runs on the client machine. Functional interaction, such as recording the cough, passing it over to the server for evaluation and receiving feedback, is done using a custom-built, self-hosted API instance on a different port. Connections to the server are always encrypted using Hypertext Transfer Protocol Secure (HTTPS). 
Locally, on the server machine, the node.js endpoint interacts with a Python-based API that implements the algorithms and methods from sections IV-B to IV-D. Once the server receives an audio stream, the processing pipeline is activated, a prediction of the test is issued, and an asynchronous message is returned to the user (client) through the same established secure connection to update the Web App with the result of the test, as illustrated in Fig. 2. In this section we present A) the recognition results of DeepCough for the detection of Covid-19 vs non-Covid-19, and B) further categorisation of the Covid-19 positive samples into groups indicating the grade of Covid-19 disease, with respect to qRT-PCR and lymphocyte counts separately. A comparison of our proposed method DeepCough3D with approaches in related work (AI4COVID [19], Coswara [20], and Cough Against Covid [34]), as well as Auto-ML [35], is also presented. The classification results are reported for a stratified k = 10 cross-folding replication strategy for internal validity. A sample can only be an exclusive member of one fold (viz. folds are participant-independent). In each iteration, a disjoint fold is left out for testing, a different one for validation, and the remainder are used for training. The confusion matrix for DeepCough3D, shown in Table III, demonstrates the classification prowess of DeepCough3D, with true positives at 97.18% and true negatives at 96.64%. We also compare our proposed method with approaches in related work as well as with Auto-ML [35]. Auto-ML is a full-model meta-learning algorithm that applies Bayesian optimisation over a set of shallow machine learning algorithms, such as k-nearest neighbours, naïve Bayes, support vector machines, decision trees, random forests, and boosted classifiers. Auto-ML uses Bayesian optimisation of the AUC score to find the method or combination of methods (viz. 
pipelines), as well as the model hyper-parameters, that yield the highest classification performance, as delineated in Fig. 8. It further considers feature selection through information gain, relief, and χ2 statistics. The Auto-ML method is trained with flattened vectors of audio signal descriptors (Mel-Frequency Cepstral Coefficients, Zero-Crossing Rate, Roll-Off Frequency, and Spectral Centroid). Models were implemented in Python and trained on an Ubuntu Linux machine with an AMD Threadripper 3.40 GHz processor and 32 GB of RAM. Training time on this machine for 10 folds of the DeepCough approach was ∼35 minutes. We further deployed the trained models in an Oracle cloud virtual machine with eight cores (CPU only) as the back-end of the Web app (section IV-E). In this setting, the detection of a cough in a sound stream takes in the range of 6-12 seconds, and the results of the test are issued in 1-2 seconds. A performance comparison of DeepCough 2D and 3D vs other related approaches and Auto-ML, in terms of the statistical measures of AUC, precision, sensitivity, and specificity, is listed in Table II. A bar graph of the same results is shown in Fig. 9(a) for a visual comparison. In Fig. 9(b) the recognition performance of DeepCough3D is primarily assessed in terms of the AUC, since the AUC considers both sensitivity and specificity at different cut-points and gives a better view than standard accuracy of the benefit of the binary classifier with skewed samples, e.g. more negatives than positives.
Fig. 8: Flowchart outlining the model selection process with Auto-ML [35]. 
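The participant-independent folding used for these results can be illustrated with scikit-learn on toy data. `GroupKFold` (shown here) enforces the constraint that all coughs from one participant land in the same fold; the stratification by class that the paper also applies would additionally require something like `StratifiedGroupKFold`. The sample counts and labels below are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 20 cough samples from 10 participants (2 coughs each), with
# hypothetical Covid-19 negative (0) / positive (1) labels.
X = np.arange(20).reshape(-1, 1)
y = np.array([0, 1] * 10)
groups = np.repeat(np.arange(10), 2)

# k = 10 folds; every sample is an exclusive member of one fold.
folds = list(GroupKFold(n_splits=10).split(X, y, groups))
for train_idx, test_idx in folds:
    # Participant-independence: no participant appears on both sides.
    assert set(groups[train_idx]).isdisjoint(set(groups[test_idx]))
```

In each of the ten iterations one fold is held out for testing, and (as described above) a second disjoint fold would be reserved for validation with the remainder used for training.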
These results show that the DeepCough3D approach achieved the highest performance in most statistics for the classification of Covid-19 positive vs negative cough samples.

In this study, alongside the cough samples, we also collected the outcomes of quantitative real-time polymerase chain reaction (qRT-PCR) and lymphocyte count (blood ratio) tests. The qRT-PCR test is currently considered the gold standard for detecting a Covid-19 infection: it detects the virus's RNA within a patient's genetic material. In the qRT-PCR test, the RNA is reverse-transcribed to DNA using specific enzymes, and short fragments of DNA complementary to the transcribed viral DNA are then added. Some DNA strands are designed to release a fluorescent dye; the amount of fluorescence is monitored in each cycle until a threshold is surpassed. The fewer the cycles (Ct) needed to surpass this threshold, the higher the severity of the infection.

During the Covid-19 pandemic, a challenge is to identify patients with low or mild levels of infection, or who are asymptomatic, so-called 'carriers' [36]. Regardless of their asymptomatic condition, positive qRT-PCR detection can be achieved with adequate sample pooling to deal with potential borderline Ct values from these patients [37]. For this experiment, we labelled each cough sample according to whether it came from a patient whose Ct values were borderline positive (30 ≤ Ct < 35), standard positive (20 ≤ Ct < 30), or strong positive (Ct ≤ 20). The performance results for the recognition of the cough samples using DeepCough2D and DeepCough3D are displayed in Fig. 10a and listed in Table IV. Overall, the results show a recognition rate well above chance level and an average AUC of 81.08% ± 5.05% for DeepCough3D. This can potentially support the two-stage screening protocols discussed in the introduction.
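As a minimal illustration, the Ct bands above map onto severity labels as follows. The thresholds are those reported in the text; the handling of Ct ≥ 35 (samples not assigned a positive grade) and the function name are our assumptions.

```python
def ct_severity(ct: float) -> str:
    """Map a qRT-PCR cycle-threshold (Ct) value to the severity label
    used in this experiment: fewer cycles mean a stronger infection."""
    if ct <= 20:
        return "strong positive"
    elif ct < 30:
        return "standard positive"
    elif ct < 35:
        return "borderline positive"
    # Ct >= 35: not assigned a positive grade in the study (assumption)
    return "ungraded"
```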
The recognition of disease severity was better at discriminating samples from the most distinguishable groups, i.e. borderline and strong positive, but struggled with the intermediate group. This is, however, expected, as intermediate samples can exhibit cough acoustics that mix the patterns of the mildly and the severely affected. Nevertheless, the specificity for the intermediate group was considerably higher than for the other two groups.

Another marker of disease severity that we explored is lymphocyte count (viz. lymphopenia vs normal levels of lymphocytes). Lymphopenia is a condition in which a patient's blood lymphocyte percentage (LYM%) is lower than 20%; it is frequently associated with a severe infection or illness. The performance results for the recognition of lymphopenia vs normal levels of lymphocytes are graphically displayed in Fig. 10b. Although some works have suggested lymphocyte count as a way to grade Covid-19 severity [38], our results for predicting an infection grade using this marker are not as good as when labelling by the qRT-PCR test. The performance of DeepCough3D could, on this occasion, also be hampered by lymphocyte subset levels that are affected by biological and inter-subject variability [39], [40].

The Covid-19 pandemic has proven difficult to contain not only because of its high infection rate, but also because the symptoms of Covid-19 bear stark similarities to those of other illnesses such as the common flu and pneumonia. Hence, it has been particularly challenging for carriers of Covid-19 to know that they have been infected, thereby furthering the spread of Covid-19. To facilitate the early detection of Covid-19, we have developed a test from clinically validated Covid-19 positive and negative individuals who provided a cough sample and took a molecular-based test in certified laboratories.
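The lymphocyte-count marker reduces to a simple threshold on LYM%. A sketch of the labelling rule stated above; the function name is ours:

```python
def lymphocyte_label(lym_percent: float) -> str:
    """Label a blood sample by lymphocyte percentage (LYM%):
    below 20% is lymphopenia, otherwise normal."""
    return "lymphopenia" if lym_percent < 20.0 else "normal"
```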
This is a multi-centre study, with populations from Spain and Mexico, to ensure that the trained inference mechanism of DeepCough3D is unbiased towards particular demographic characteristics. In addition, the proposed DeepCough3D model, subsequently embedded in CoughDetect for the recognition of Covid-19 coughs, was compared against related work and Auto-ML [35]. Auto-ML is a method for algorithm selection and hyper-parameter tuning, optimised through a full-model selection strategy. In all the performance metrics for the recognition of Covid-19 positives, DeepCough3D performed better, as noted in Table II.

Fig. 9: (a) Comparison of the DeepCough 3D and 2D classifiers with an Auto-ML ensemble of shallow machine learning methods [35] combined. (b) Receiver Operating Characteristic curve for DeepCough and other methods in related work as a function of true and false positive rates [42].

In this work, a primary screening test for Covid-19 is proposed and assessed using clinically validated cough samples from participants who also took a molecular test (qRT-PCR) in our partner hospitals. The proposed test framework is powered by a generic cough identification algorithm based on EMD and a recognition method named DeepCough3D. This latter method generates a 3D audio tensor to leverage the strength of a convolutional neural network approach in identifying the latent characteristics of Covid-19 cough signals. The performance of DeepCough3D attains an AUC of 98.80% ± 0.83% and a sensitivity of 96.43% ± 1.85%, which is comparable to the reported sensitivity (91% ± 10%) of accelerated serology tests based on saliva [43]. The proposed generic method does not require specific transfer-learning models or data from other studies, paving the way for derivative works. The proposed approach outperforms related works and other state-of-the-art methods. Further, the quality of our clinically controlled and validated large dataset increases our confidence in the validity of these results.
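The AUC reported above has a rank interpretation: it is the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, which is why it summarises sensitivity and specificity across all cut-points. A minimal illustrative computation (the Mann-Whitney form; not the paper's evaluation code):

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative,
    counting ties as half a win (Mann-Whitney statistic)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

A perfect separator yields 1.0 and a random scorer about 0.5, regardless of how skewed the positive/negative balance is, which is the property that makes the AUC preferable to accuracy on imbalanced cohorts such as this one.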
In addition to the development of a recognition test for Covid-19 using coughs, this work further investigates the possibility of recognising the extent of the Covid-19 infection in Covid-19 positive participants. This is undertaken with the qRT-PCR test and the lymphocyte count separately, and the results greatly surpassed chance levels of performance, indicating the feasibility of assessing severity to some extent. Classification of the coughs into three severity levels, defined by the resulting Ct of the molecular test for Covid-19, yields an average AUC of 81.08% ± 5.05%. This could potentially serve as an additional functional feature to gauge the extent of the Covid-19 infection in a given Covid-19 carrier, and can help facilitate the effective management of healthcare resources during a pandemic, such as ventilators, which were in short supply during the first wave of the Covid-19 pandemic around the world. Furthermore, the entire framework has been embodied as a web-app service available at CoughDetect.com.

The motivation for developing this alternative test based on coughs is to provide a fast-turnaround, point-of-need primary test for Covid-19 that a) reduces the burden on specialised personnel for the clinical or secondary diagnosis of Covid-19, b) makes primary screening available to the masses at large from the comfort of their homes at negligible cost, and c) keeps the anonymity of the participants at its core, by using in-house custom code to power the analysis and recording only their cough sounds. It can also be used as an electronic health certificate at public places such as airports and schools. In the midst of a global pandemic, the significance of our proposed point-of-need primary test, developed and tested on clinically validated data, is paramount. Our proposed primary test can mitigate the logistics, long turnaround time, and cost of the clinical diagnostic test for Covid-19.
For future work, parameter tuning of the sonograph representations and complementary analysis of coughing behaviours could be explored to investigate further improvements in performance. It would also be of interest to investigate whether tracking of Covid-19 progression can be done using DeepCough3D.

References

- Last Accessed on 19/8/2020
- Modes of transmission of virus causing COVID-19: implications for IPC precaution recommendations
- COVID-19: Towards controlling of a pandemic
- Countries test tactics in 'war' against COVID-19
- Crimalddi: platform technologies and novel antimalarial drug targets
- Sudden and complete olfactory loss of function as a possible symptom of covid-19
- Covid-19 patients' clinical characteristics, discharge rate, and fatality rate of meta-analysis
- Leveraging data science to combat covid-19: A comprehensive review
- Automated algorithm for wet/dry cough sounds classification
- Feature extraction for the differentiation of dry and wet cough sounds
- Classification of voluntary cough sound and airflow patterns for detecting abnormal pulmonary function
- Classification of voluntary coughs applied to the screening of respiratory disease
- Enhanced forensic speaker verification using a combination of dwt and mfcc feature warping in the presence of noise and reverberation conditions
- Digital technology and covid-19
- Clinical characteristics of covid-19 patients with digestive symptoms in hubei, china: A descriptive, cross-sectional, multicenter study
- On the possibility of using speech to detect covid-19 symptoms: An overview and proof of concept
- Smartphone-based self-testing of covid-19 using breathing sounds
- Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with covid-19 in an accurate and unobtrusive manner
- Ai4covid-19: Ai enabled preliminary diagnosis for covid-19 from cough samples via an app
- Coswara - a database of breathing, cough, and voice sounds for covid-19 diagnosis
- Forensic voice identification
- The empirical mode decomposition and the hilbert spectrum for nonlinear and non-stationary time series analysis
- On empirical mode decomposition and its algorithms
- Body Sensor Networking, Design and Algorithms
- Hidden markov model based drone sound recognition using mfcc technique in practical noisy environments
- On the recognition of cochlear implant-like spectrally reduced speech with mfcc and hmm-based asr
- Pathological findings of covid-19 associated with acute respiratory distress syndrome
- An intelligent healthcare system for detection and classification to discriminate vocal fold disorders
- Automatic voice pathology detection with running speech by using estimation of auditory spectrum and cepstral coefficients based on the all-pole model
- Linear predictive coding
- Assessing the sound of cough towards vocality
- Neural network based algorithm for automatic identification of cough sounds
- Keras: The python deep learning library
- Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds
- Auto-sklearn 2.0
- Covid-19: identifying and isolating asymptomatic people helped eliminate virus in italian village
- Pooling of samples for testing for SARS-CoV-2 in asymptomatic people
- Lymphopenia in severe coronavirus disease-2019 (covid-19): systematic review and meta-analysis
- Biological variability of lymphocyte subsets of human adults' blood
- Last Accessed on 18/8/2020
- Diagnostic accuracy of serological tests for covid-19: systematic review and meta-analysis
- Saliva as a candidate for covid-19 diagnostic testing: A meta-analysis

Acknowledgements. We thank the reviewers for the helpful and constructive comments.
Thanks are extended to the ESRC Impact Acceleration Account (ES/T501815/1) at Essex University and the Talentia Senior Program of the Regional Ministry of Economy, Innovation, Science and Employment of Andalusia (reg. 201899905497765) for indirectly supporting this research. We thank Oracle for Research for providing Oracle Cloud credits and related resources to support this project.

Author contributions. JAP contributed to the conceptualisation and coordination of the work, methodology, implementation, analysis, figures, and writing of the manuscript. HPE contributed to the organisation of the study, methodology, implementation, analysis, figures, and writing of the manuscript. ET, MGP & ABT worked on the data collection and laboratory analysis. MK was involved in the writing of the manuscript, elaboration of figures, and analyses. ARP, ORG & ATG contributed to the implementation of the proposed approach and the comparison methods. DJ contributed to the signal processing and cough sound detection. ZA worked on the sonograph analysis. NG worked on the implementation of the web-app prototype. CGR contributed to the coordination, methodology, and appraisal of the work.