key: cord-0743039-5edtllrf authors: Somasekar, J.; Pavan Kumar Visulaization, P.; Sharma, Avinash; Ramesh, G. title: Machine Learning and Image Analysis Applications in the Fight against COVID-19 Pandemic: Datasets, Research Directions, Challenges and Opportunities date: 2020-09-22 journal: Mater Today Proc DOI: 10.1016/j.matpr.2020.09.352 sha: 86b6be1030b4499b737407c56a5764ed5829c275 doc_id: 743039 cord_uid: 5edtllrf
The COVID-19 pandemic has become the most devastating disease of the current century and has spread over 216 countries around the world. The disease is spreading through outbreaks despite the availability of modern sophisticated medical treatment. Machine learning and image analysis research has been making great progress in many directions in the healthcare field, providing support to subsequent medical diagnosis. In this paper, we propose three research directions with methodologies in the fight against the pandemic, namely: chest X-ray (CXR) image classification using deep convolutional neural networks with transfer learning to assist diagnosis; patient risk prediction based on risk factors such as patient characteristics, comorbidities, initial symptoms, and vital signs for prognosis of the disease; and forecasting of disease spread and case fatality rate using deep neural networks. Further, some of the challenges, open datasets, and opportunities are discussed for researchers.
The coronavirus disease 2019 (COVID-19) pandemic has reached 216 countries, areas, or territories throughout the world, which are now actively monitoring the outbreak. The WHO situation report published on July 17, 2020 gives 13,616,593 confirmed cases and 585,727 deaths [1]. COVID-19 is a global threat, and tackling the pandemic requires an urgent global effort not only from all countries and territories but also from scientific communities across all fields and disciplines.
Computational epidemiology has become an increasingly multidisciplinary field that uses techniques from theoretical computer science, machine learning, and image processing. It has led to novel computational methods that are useful for modeling, understanding, analyzing, and controlling the spread of disease, and for treating COVID-19 infected patients [2]. The disease is spreading through outbreaks despite the availability of modern sophisticated medical treatment. With the rapid development of machine learning, image processing has been widely applied in the medical field to support subsequent medical diagnosis. Accurate medical data analysis adds the advantages of early disease detection, risk prediction, patient care, and community services. The remainder of the article is organized as follows. Section 2 describes the public datasets to which machine learning and image analysis techniques can be applied. Section 3 presents the research directions with proposed methodologies. Section 4 discusses challenges and research opportunities. Finally, Section 5 concludes the article.
The coronavirus disease (COVID-19) pandemic is the defining global health crisis. Several open online datasets are available to researchers and health care professionals, covering chest X-ray images, chest CT (CCT) images, risk factors, and daily numbers of cases and fatalities. For experimentation, chest X-ray images of normal and COVID-19 infected patients can be collected from a GitHub repository and other research articles [3] [4] [7]. Sample CXR images from the dataset are shown in Fig. 1. CCT images of normal and infected patients are available in the datasets of [5] [6]. The dataset for time-series disease spread and CFR forecasting is available from Our World in Data by Hannah Ritchie [8].
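
As a small worked example of the quantity this dataset is meant to help forecast, the case fatality rate can be taken as cumulative deaths divided by cumulative confirmed cases. The sketch below assumes that common definition, since the text does not state a formula. The field names `total_cases` and `total_deaths` and the three sample rows are hypothetical placeholders, not real data; only the WHO totals quoted in the introduction come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DailyRecord:
    """One row of a cumulative time series (field names are assumptions)."""
    date: str
    total_cases: int
    total_deaths: int


def case_fatality_rate(total_cases: int, total_deaths: int) -> Optional[float]:
    """Return CFR in percent, or None when no cases have been confirmed."""
    if total_cases <= 0:
        return None
    return 100.0 * total_deaths / total_cases


def cfr_series(records: List[DailyRecord]) -> List[Tuple[str, Optional[float]]]:
    """Compute the CFR for every day of a cumulative series."""
    return [(r.date, case_fatality_rate(r.total_cases, r.total_deaths))
            for r in records]


# WHO totals quoted in the introduction (situation report of July 17, 2020).
who_cfr = case_fatality_rate(13_616_593, 585_727)

# Illustrative, made-up rows for a single hypothetical country.
sample = [
    DailyRecord("2020-03-01", 0, 0),
    DailyRecord("2020-03-02", 100, 2),
    DailyRecord("2020-03-03", 250, 10),
]
sample_cfr = cfr_series(sample)

if __name__ == "__main__":
    print(f"CFR from the WHO totals: {who_cfr:.2f}%")
    for date, cfr in sample_cfr:
        print(date, "n/a" if cfr is None else f"{cfr:.2f}%")
```

With the WHO totals, this gives a CFR of roughly 4.3%. A forecasting model would predict the two cumulative counts and derive the CFR from them in the same way.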
Data on the coronavirus pandemic, such as the number of new cases, number of deaths, total number of cases, total tests, and new tests, are updated daily for each country [9] [10].
There is a strong need for an automated diagnostic system for COVID-19 to improve diagnostic quality and reduce analysis time in chest X-ray imaging. Automated detection and classification are critical steps in computer-aided diagnosis (CAD) for early diagnosis and treatment of COVID-19. Once a chest X-ray image is loaded onto a computer, it can be used for automated detection and classification of diseases. Image preprocessing, such as contrast enhancement and noise removal, is required before further processing and improves classification performance. In this study, image augmentation operations such as random clipping and up-down flipping can be performed on the specimens to increase the number of training samples. Since the training dataset is small and a CNN tends to overfit, a deep CNN (DConvNet) with transfer learning is developed for classification of chest X-ray (CXR) images. Different pre-trained DCNN models (namely ResNet50, ResNet101, VGG16, VGG19, DenseNet-161, and Inception V3) are used via transfer learning for COVID-19 chest X-ray image classification. The softmax activation function is used in the last layer of the proposed model to classify the chest X-ray images into two classes. The proposed classification framework for diagnosis of COVID-19 using CXR is shown in Fig. 2. The performance measures sensitivity, specificity, F-score, Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) can be used to evaluate the proposed model.
During the outbreak, identifying people at risk of COVID-19 is the cornerstone of its [13]. The first layer of the CNN is the input of basic patient features (risk factors). The second layer is the convolution layer, followed by a max-pooling layer that introduces sparsity in the learned features.
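
The input, convolution, and max-pooling layers described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' network: the feature values, kernel weights, kernel width, and pool size are all hypothetical, and a real model would learn the kernel during training.

```python
from typing import List


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """'Valid' 1-D convolution as used in CNN layers (no kernel flip)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(values: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


# Layer 1: hypothetical patient risk factors, already scaled to [0, 1]
# (for example age, comorbidity flags, symptom flags, vital signs).
features = [0.6, 1.0, 0.0, 1.0, 0.8, 0.2, 0.0, 0.4]

# Layer 2: one convolution filter with arbitrary illustrative weights.
kernel = [0.5, -0.25, 0.5]
feature_map = conv1d(features, kernel)

# Layer 3: max pooling keeps only the strongest response in each window.
pooled = max_pool1d(feature_map, size=2)

if __name__ == "__main__":
    print("feature map:", [round(v, 2) for v in feature_map])
    print("pooled:     ", [round(v, 2) for v in pooled])
```

Pooling halves the length of the feature map here (six responses become three), which is the sense in which it discards weaker activations before the later layers.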
Thus, we implement normalized convolution as a layer in the CNN to handle sparse inputs, so that the model gains better performance and risk prediction remains highly accurate. The fourth layer is a fully connected layer linked to softmax classifiers for risk prediction, with output label $C = \{C_0, C_1\}$, where $C_0$ and $C_1$ indicate that the patient is at high risk and low risk of COVID-19, respectively. The pipeline of the proposed methodology is: pre-processing, building the training set, training the proposed deep learning model, and testing.
Reliable time-series forecasting of COVID-19 is important for strategic preparedness to control the case fatality rate (CFR) with proper preventive measures. In this work, we propose Deep Neural Networks based Forecasting for Pandemic Case Fatality Rate (DNFCFR) for forecasting the numbers of infected cases and fatalities, followed by the CFR, at a given time. The temporal features include fatalities, age, disease cases, etc., and the demographic features include density, population, hospitals, etc.; both are used as input variables for the network, as shown in Fig. 3. In DNFCFR, we develop a two-branch neural network model, trained on both temporal and demographic features with multi-agent simulation, and use it to generate training data. The LSTM (long short-term memory) network is used to capture the dynamic temporal behavior of the observations and needs sequences of the input features for training. The length of the sequence is chosen optimally. To avoid fitting the LSTM on unscaled data, and to speed up learning and convergence, we normalized the training data. A merge layer is added to combine the outputs of the two branches.
In this section, the challenges and opportunities of Machine Learning (ML) and Image Analysis (IA) in fighting the pandemic are discussed.
The main challenges in developing a framework that uses ML and IA to fight the pandemic are: the generalizability of machine learning models; collecting multiple types of data in a dataset; the lack of facilities and infrastructure to assess deep learning models; and imbalanced data in the datasets, such as missing or null values for some attributes in risk prediction and fatality prediction datasets, or a limited number of positive images against a larger number of negative images (or vice versa) in medical imaging datasets, which leads to false predictions when the model is tested. Some of the research opportunities and challenges are shown in Fig. 4 [14].
In this study, the open research problems and challenges in the fight against the pandemic have been discussed. Three open research directions are presented with methodologies: diagnosis of the disease through classification of CXR images using machine learning techniques, risk prediction for prognosis, and forecasting of disease spread for strategic preparedness to control the case fatality rate with proper preventive measures. The heterogeneous datasets of COVID-19, challenges, and opportunities are also discussed.
A Novel Coronavirus from Patients with Pneumonia in China; COVID-19 Image Data Collection; Automated detection of COVID-19 cases using deep neural networks with X-ray images; Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images, Ferhat Ucar, Deniz Korkmaz, Medical Hypotheses; Can AI help in screening viral and covid-19 pneumonia?
arXiv preprint; Classification of covid-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks; Data: COVID-19 Dataset; Clinical features of patients infected with 2019 novel coronavirus in Wuhan; Risk factors for severe corona virus disease 2019 (COVID-19); Risk Factors Associated With Acute Respiratory Distress Syndrome and Death in Patients With Coronavirus Disease; Predicting the growth and trend of COVID-19 pandemic using machine learning and cloud computing
Methodology and conceptualization; data collection and visualization; investigation and writing.
The authors declare that there is no conflict of interest.