key: cord-240274-igoz2ei4
authors: Subirana, Brian; Hueto, Ferran; Rajasekaran, Prithvi; Laguarta, Jordi; Puig, Susana; Malvehy, Josep; Mitja, Oriol; Trilla, Antoni; Moreno, Carlos Iván; Valle, José Francisco Munoz; González, Ana Esther Mercado; Vizmanos, Barbara; Sarma, Sanjay
title: Hi Sigma, do I have the Coronavirus?: Call for a New Artificial Intelligence Approach to Support Health Care Professionals Dealing With The COVID-19 Pandemic
date: 2020-04-10
journal: nan
DOI: nan
sha: 
doc_id: 240274
cord_uid: igoz2ei4

Just as your phone can detect what song is playing in a crowded space, we show that Artificial Intelligence transfer learning algorithms trained on phone recordings of coughs result in diagnostic tests for COVID-19. To gain adoption by the health care community, we plan to validate our results in a clinical trial and three other venues in Mexico, Spain and the USA. However, if we had data from other ongoing clinical trials and volunteers, we could do much more. For example, for confirmed stay-at-home COVID-19 patients, a longitudinal audio test could be developed to determine contact-with-hospital recommendations, and for the most critical COVID-19 patients, a success ratio forecast test, including patient clinical data, could be developed to prioritize ICU allocation. As a challenge to the engineering community and in the context of our clinical trial, we suggest distributing cough recordings daily, hoping other trials and crowdsourcing users will contribute more data. Previous approaches to complex AI tasks have either used a static dataset or were private efforts led by large corporations. All published COVID-19 trials also follow this paradigm. Instead, we suggest a novel open collective approach to large-scale real-time health care AI. We will be posting updates at https://opensigma.mit.edu. Our personal view is that our approach is the right one for large-scale pandemics and is therefore here to stay. Will you join?

Neither Siri, Alexa, nor Google can tell us if we have the Coronavirus, despite the millions of expensive programming man-hours invested in them, nor can they support the operational efficiency of related health care processes. Early detection of COVID-19 patients could drastically lower the spread of the disease 1,2 , but it is not yet common practice because existing tests are not available at scale, require health care professionals' already limited time, are lengthy, and are often wasted due to the lack of a reliable pre-screening test.

The goal of our research is to determine whether such a test can be performed simply through phone-based cough recordings. In the past, speech recognition algorithms have been shown to perform various tasks related to cough detection. 3 In several cases involving different neurological conditions, researchers were able to develop machine learning algorithms that used free-flow speech to predict disease onset earlier than human experts, including psychosis with a sample size of fewer than 50, 4 and cognitive impairment with fewer than a thousand 5 . These studies demonstrate the potential for Artificial Intelligence (AI) to achieve superhuman diagnosis capabilities, which we now believe can also be leveraged for COVID-19 diagnosis. Previous research shows that audio recordings can be used to diagnose pneumonia 6 , even from cheap cell phone recordings, 7 similar to the COVID-19 example shown in Figure 1. Dysphonia can be caused by, and lead to the detection of, inflammatory conditions such as allergies 8 , infections 9 , reflux 10 , smoking 11

Details of our implementation will be published elsewhere. After trying a few models, we modified a biologically inspired SOP, trained on regular speech. To do so, we started with a convolutional neural network trained on a regular speech dataset and then applied transfer learning. Transfer learning is a field of machine learning that focuses on improving the predictive power of an algorithm on a specific task by learning from a similar but distinct task. In formal terms, we can define a domain $\mathcal{D}$ that is composed of a feature space $\mathcal{X}$ and a marginal distribution $P(X)$. A task $\mathcal{T}$ for a given domain is defined as a label space $\mathcal{Y}$ and a predictive function $f(\cdot)$, which is based on a conditional distribution $P(Y|X)$. We can therefore define a source domain $\mathcal{D}_S$ with task $\mathcal{T}_S$ and a target domain $\mathcal{D}_T$ with task $\mathcal{T}_T$.

Transfer learning refers to improving the target task $\mathcal{T}_T$'s predictive function $f_T$ using the transferable information learned from the source domain $\mathcal{D}_S$ and source task $\mathcal{T}_S$, where $\mathcal{D}_T \neq \mathcal{D}_S$ and $\mathcal{T}_T \neq \mathcal{T}_S$. Note that transfer learning does not care about the predictive power of $f_S$ once $f_T$ has been updated; preserving both is the concern of a separate field called multi-task learning 16 . Our implementation uses transfer learning from the domain of speech audio recordings $\mathcal{D}_{Speech}$ to the target domain of COVID-19 infected cough audio recordings $\mathcal{D}_{Cough}$. Assuming audio recording and preprocessing are similar in both domains, we can define the specific case of equal feature spaces $\mathcal{X}_{Speech} = \mathcal{X}_{Cough}$ but distinct label spaces $\mathcal{Y}_{Speech} \neq \mathcal{Y}_{Cough}$, which in the field of transfer learning is formally defined as heterogeneous transfer learning. This specific case of heterogeneous transfer learning requires domain adaptation, which is to minimize the distance between marginal distributions ($P(X_{Speech}) \neq P(X_{Cough})$), and marginal distributions [expression not recoverable from the source]. Feature-based approaches (e.g. 17 18 19 ) can be used [equation not recoverable from the source], where the transformation function is derived from feature-based transfer learning approaches. From this point on, this becomes a homogeneous transfer learning problem, which has been tackled in the past (and continues to be so), notably within instance-based (e.g. 20 21 ), symmetric and asymmetric feature-based (e.g. 22 23 ), parameter-based (e.g. 24 25 ) and relational-based (e.g. 26 27 ) transfer learning approaches 28 .
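To make the domain-adaptation step above more concrete, the following minimal sketch measures one possible distance between the marginal feature distributions of two domains and shows a toy transformation that shrinks it. The distance used here (the squared gap between per-domain feature means, i.e. a maximum mean discrepancy with a linear kernel), the mean-centering transformation, and the made-up feature values are all assumptions chosen for illustration; the source does not state which distance or transformation was used.

```python
from statistics import fmean


def feature_means(samples):
    """Per-dimension mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*samples)]


def linear_mmd(source, target):
    """Squared distance between domain means (MMD with a linear kernel)."""
    mu_s, mu_t = feature_means(source), feature_means(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def center(samples):
    """Toy transformation: subtract the domain's own mean from each vector."""
    mu = feature_means(samples)
    return [[x - m for x, m in zip(row, mu)] for row in samples]


# Made-up 2-D features standing in for speech and cough recordings.
speech = [[1.0, 2.0], [3.0, 4.0]]
cough = [[5.0, 1.0], [7.0, 3.0]]

gap_before = linear_mmd(speech, cough)                  # domains as recorded
gap_after = linear_mmd(center(speech), center(cough))   # after the toy transformation
```

Mean-centering only aligns first moments; it is used here because it is the simplest transformation for which the reduction in distance can be checked by hand.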

Specifically, within the field of deep learning, a popular approach for transfer learning is "off-the-shelf" feature generation 29 , where a pretrained model from a source domain is used to generate a set of features from the target domain. These features are then processed using a shallow classifier (e.g. logistic regression, SVM, k-Nearest-Neighbors). This approach is especially useful when the target domain $\mathcal{D}_T$ has very few labeled examples, which is the case in our application, but tends to be surpassed by "fine-tuning" approaches, which retrain specific layers on data from the target domain, when more data is available 30 . We also tested this approach by pre-training DenseNet201 31 and ResNet50 32 architectures with samples from the speech dataset. We then used the output of the last layer of each model to generate features from our small set of cough recordings from COVID-19 diagnosed and healthy patients (0.7 train, 0.3 test split).
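The "off-the-shelf" recipe described above can be sketched in a few lines: a frozen feature extractor, a shallow classifier on top, and a 0.7/0.3 train/test split. This is only an illustration under stated assumptions. `frozen_features` is a hypothetical stand-in for the last-layer output of a pretrained network (here just two hand-picked signal statistics), the synthetic sine-wave "recordings" are invented, and k-Nearest-Neighbors is one of the shallow classifiers named in the text, not necessarily the one that performed best.

```python
import math
import random


def frozen_features(signal):
    """Hypothetical stand-in for a pretrained model's last-layer output.

    Returns mean absolute amplitude and zero-crossing rate; a real pipeline
    would call the frozen source-domain network here instead.
    """
    mean_abs = sum(abs(v) for v in signal) / len(signal)
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    return [mean_abs, crossings / (len(signal) - 1)]


def knn_predict(train_x, train_y, query, k=3):
    """Shallow classifier: majority label among the k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


def make_signal(cycles, amplitude, rng, n=200):
    """Synthetic sine wave with a random phase (invented data, not real audio)."""
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [amplitude * math.sin(2.0 * math.pi * cycles * i / n + phase) for i in range(n)]


rng = random.Random(0)
recordings = [(make_signal(3, 1.0, rng), 0) for _ in range(10)]    # label 0
recordings += [(make_signal(12, 2.0, rng), 1) for _ in range(10)]  # label 1
rng.shuffle(recordings)

cut = len(recordings) * 7 // 10  # 0.7 train, 0.3 test
train, test = recordings[:cut], recordings[cut:]
train_x = [frozen_features(s) for s, _ in train]
train_y = [label for _, label in train]
hits = sum(knn_predict(train_x, train_y, frozen_features(s)) == label for s, label in test)
accuracy = hits / len(test)
```

The extractor is never updated, which is what distinguishes this recipe from fine-tuning: only the shallow classifier sees the target-domain labels.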

We believe that the first stage in the transfer learning, i.e. selecting the non-cough source domain, is what can have the most impact on the results of the model. We evaluated the performance of four shallow machine learning classification algorithms (SVM, k-Nearest Neighbors, Random Forest, Logistic Regression) over a set of 5 cross-validation test splits (see Figure 2 ). We then used Principal Component Analysis 33 to generate a visualization showing the clustering of healthy and COVID-19 coughs (see results in Figures 3 and 4). Our immediate next steps are to validate and grow our model as we get more data from our clinical trial.
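As a rough illustration of evaluating a shallow classifier over 5 cross-validation test splits, the sketch below deals shuffled sample indices into five disjoint folds and reports one accuracy per fold. The nearest-centroid classifier is a deliberately tiny stand-in (it is not one of the four algorithms listed above), and the two-dimensional features are invented for the example.

```python
import math
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle indices once, then deal them into n_folds disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier: pick the label whose mean feature vector is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: math.dist(centroids[label], query))


def cross_validate(features, labels, n_folds=5):
    """Return one held-out accuracy per fold."""
    scores = []
    for test_idx in k_fold_indices(len(features), n_folds):
        held_out = set(test_idx)
        train_x = [f for i, f in enumerate(features) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        hits = sum(
            nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]
            for i in test_idx
        )
        scores.append(hits / len(test_idx))
    return scores


# Invented, well-separated 2-D features for two classes of ten samples each.
rng = random.Random(1)
features = [[rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)] for _ in range(10)]
features += [[5 + rng.uniform(-0.5, 0.5), 5 + rng.uniform(-0.5, 0.5)] for _ in range(10)]
labels = [0] * 10 + [1] * 10

fold_scores = cross_validate(features, labels)
mean_accuracy = sum(fold_scores) / len(fold_scores)
```

With a dataset as small as the one described, reporting the spread of `fold_scores` alongside the mean is usually more informative than a single split.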

In the next two sections we describe how we may do so collectively.

To further validate and grow our model, we aim to collect data on 150 patients and 3000 contacts as part of the clinical trial mentioned in the abstract, which is focused on COVID-19 onset. A second concurrent effort, at Hospital Clínic de Barcelona, focuses on symptomatic patients. We have also recently initiated data collection in Mexico and the USA. These four concurrent efforts may offer increased chances for transfer learning for our original screening test goal. We believe that if we broaden our sampling goals, we can also incorporate two new tests:

- For confirmed stay-at-home COVID-19 patients, a longitudinal audio test to determine contact-with-hospital recommendations.
- For the most critical COVID-19 patients, a success ratio to prioritize intensive care unit (ICU) allocation.

Thus, we have started a process to collect short audio-recording segments of 12 seconds from COVID-19 positive and control individuals, accepting them via WhatsApp, Web, Email or an MIT-developed Mobile App. Our underlying hypothesis is that voice samples only a few seconds long can ultimately save millions of lives. We are focusing on four types of daily samples for each donating individual:

- Cough sounds
- The digits from 0 to 9
- The word "Ommmmmmm", with the "m" sound extending for the 12 seconds when possible

The rationale for these choices follows from what is standard practice in the speech recognition community. 34 We conjecture that numeral pronunciation can signal types of voice patterns, while the extended "Om" sound may be detectably related to lung conditions. Metadata useful to subject matter experts is often also helpful in developing AI algorithms (see supplementary table I for the metadata we will be collecting). It is difficult to say at the current time how important such metadata may be. In some cases, metadata not used by humans can yield superhuman results in AI algorithms. For example, unlike human specialists, AI can detect gender information from retina scans. 35 Until we have a sizable and longitudinal sample database, we cannot be sure how many useful tests a COVID-19 detection algorithm can extract from sound signals. Indeed, it would be irresponsible to set upper or lower limits at this point. The data may be heavily biased towards one particular language, or the model may only apply to the onset of the disease, which is where the clinical trial is focusing. As we discuss next, that is why we suggest an additional and different approach, one that depends on you the reader and many others.

Prevalent approaches to AI either enjoy a continuous pipeline of training data, such as the ones available to the FAANGs 2 , or have the benefit of accessing a growing base of code, as in open source projects. The need for a new approach perhaps explains why AI has been largely absent from the infectious disease management debate, including all the current COVID-19 clinical trials we know of. Bill Gates did not even mention AI in his prophetic TED Talk "The next outbreak? We are not ready" 3 . Others have advocated non-diagnostic uses of AI 36, 37, 38 .

In contrast, we suggest a rapid development approach we call Sigma, where both data and code are shared in real time. This approach means a novel collective effort by the Health and Engineering communities, where the former sets directions and provides patient samples in real time, while the latter creates algorithms to improve infection management practices. Both groups will share samples, models and insights on a daily basis, further iterating sampling requirements, health practices and engineering efforts.

Sigma's objective is to set up a research and engineering pipeline where daily samples are posted immediately upon receipt under an open source license for worldwide collaboration. The goal is to maximize the exposure of the samples to leverage talent worldwide as quickly as possible, with the three tests above as an initial challenge. AI algorithms have been shown to be able to process unstructured data from various sources while incorporating cognitive data from experts. We will report elsewhere some of the legal hurdles we are encountering.

Sigma may become a template for future collaborations between engineering and medical professionals, but the urgent priority now is that other hospitals join us in promoting the effort to feed the open sample database as quickly as possible. We are doing the groundwork so that MIT can accept de-identified samples donated by participating hospitals and crowdsourcing individuals. De-identified means that there is no reasonable basis to identify the recorded individual, and that identifiers (such as names, telephone, age above 89, geographic subdivision of a certain size, etc.) are not included as part of the metadata associated with the sample.
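A minimal sketch of what stripping identifiers from a sample's metadata might look like is given below. The field names (`name`, `telephone`, `postal_code`, `sample_type`, `covid_status`) are hypothetical and not taken from the supplementary metadata table. Two choices are also assumptions made for illustration: dropping the geographic field outright, since the text does not give the size threshold, and collapsing ages above 89 into a single "90+" bucket.

```python
# Hypothetical field names; the real metadata schema is not given in the text.
DIRECT_IDENTIFIERS = {"name", "telephone", "postal_code"}


def deidentify(record):
    """Return a copy of a sample's metadata with identifiers removed.

    Assumptions: every field in DIRECT_IDENTIFIERS is dropped, and ages
    above 89 are replaced by the bucket "90+".
    """
    clean = {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
    age = clean.get("age")
    if isinstance(age, int) and age > 89:
        clean["age"] = "90+"
    return clean


raw = {
    "name": "Jane Doe",
    "telephone": "555-0100",
    "postal_code": "02139",
    "age": 93,
    "sample_type": "cough",
    "covid_status": "positive",
}
shared = deidentify(raw)
```

The function returns a new dictionary rather than editing the input, so the contributing site can keep its original record while only `shared` leaves the hospital.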

The three tests above are not the only urgent priority we may address collectively. As a byproduct of the data capturing process, we feel that voice interfaces can vastly increase ICU operational efficiency while minimizing infections. We expect other byproducts of the Sigma approach. For example, physicians in places where the pandemic has not yet arrived could hear COVID-19 cough sounds or see chest X-rays for training. Likewise, the partial shutdown of the economy means that vast idle computational and human resources may temporarily be donated to this effort.

Our suggested approach is unique because data is offered in real time, as soon as it is curated, allowing engineers to immediately test and improve candidate AI processing pipelines. Will you join?

[Figure caption: as Figure 3, with samples a few days apart.]

Taking the right measures to control COVID-19

Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China

Automatic identification of wet and dry cough in pediatric patients with respiratory diseases

Automated analysis of free speech predicts psychosis onset in high-risk youths

Role-specific Language Models for Processing Neuropsychological Exams

Spoken Language Biomarkers for Detecting Cognitive Impairment

Wavelet augmented cough analysis for rapid childhood pneumonia diagnosis

Diagnosis of pneumonia from sounds collected using low cost cell phones

Vocal Symptoms and Allergy-A Pilot Study

Effect of chronic otitis media on language and speech development

Laryngopharyngeal Reflux: Position Statement of the Committee on Speech, Voice, and Swallowing Disorders of the American Academy of Otolaryngology-Head and Neck Surgery

Cigarette smoking and voice fundamental frequency

Spasmodic Dysphonia Subsequent to Head Trauma

Automatic identification of wet and dry cough in pediatric patients with respiratory diseases

Deep Learning System to Screen Coronavirus Disease 2019 Pneumonia

A survey on transfer learning

Regularized multi-task learning

Feature-based transfer learning for network security

Simultaneous deep transfer across domains and tasks

Deep transfer network with joint distribution adaptation: A new intelligent fault diagnosis framework for industry application

Double-bootstrapping source data selection for instance-based transfer learning

Multi-source domain adaptation and its application to early detection of fatigue

Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion

Asymmetric transfer learning with deep gaussian processes

The more you know, the less you learn: from knowledge transfer to one-shot learning of object categories

Safety in numbers: learning categories from few examples with multi model knowledge transfer

Relational knowledge transfer for zero-shot learning

Transfer learning from minimal target data by mapping across relational domains

A survey of transfer learning

CNN features off-the-shelf: an astounding baseline for recognition

How transferable are features in deep neural networks?

Densenet: Implementing efficient convnet descriptor pyramids

Identity mappings in deep residual networks

Sparse principal component analysis

See for example the 2020 ADReSS Challenge (Alzheimer's Dementia Recognition through Spontaneous Speech)

Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning

On the Coronavirus (COVID-19) Outbreak and the Smart City Network: Universal Data Sharing Standards Coupled with Artificial Intelligence (AI) to Benefit Urban Health Monitoring and Management

Identification of COVID-19 Can be Quicker through Artificial Intelligence framework using a Mobile Phone-Based Survey in the Populations when Cities/Towns Are Under Quarantine

A Viral Warning for Change. The Wuhan Coronavirus Versus the Red Cross: Better Solutions Via Blockchain and Artificial Intelligence

Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China

Speech commands: A dataset for limited-vocabulary speech recognition

Computing mel-frequency cepstral coefficients on the power spectrum