Authors: Comito, Carmela; Pizzuti, Clara. Title: Artificial intelligence for forecasting and diagnosing COVID-19 pandemic: A focused review. Date: 2022-03-28. Journal: Artif Intell Med. DOI: 10.1016/j.artmed.2022.102286.

The outbreak of the novel coronavirus disease 2019 (COVID-19) has been treated by the World Health Organization (WHO) as a public health crisis of global concern. The COVID-19 pandemic hugely affected countries worldwide, raising the need to exploit novel, alternative and emerging technologies to respond to the emergency created by weak health-care systems. In this context, Artificial Intelligence (AI) techniques can give valid support to public health authorities, complementing traditional approaches with advanced tools. This study provides a comprehensive review of methods, algorithms, applications, and emerging AI technologies that can be used for forecasting and diagnosing COVID-19. The main objectives of this review are the following: (i) understanding the importance of AI approaches such as machine learning and deep learning for the COVID-19 pandemic; (ii) discussing the efficiency and impact of these methods for COVID-19 forecasting and diagnosis; (iii) providing an extensive background description of AI techniques to help non-experts grasp the underlying concepts; (iv) for each work surveyed, giving a detailed analysis of the rationale behind the approach, highlighting the method used, the type and size of data analyzed, the validation method, the target application and the results achieved; (v) focusing on some future challenges in COVID-19 forecasting and diagnosis.
In recent years AI technology has received a lot of interest in many application fields, including medicine, where it assists physicians and authorities in image inspection, surgery, medical data integration, hospital management, and assisted disease diagnosis, to name a few. In the following, we recall and describe the main AI learning techniques used by researchers to forecast the propagation of the coronavirus infection and its effects on new cases, recoveries and deaths, and to support diagnosis. A summary of the described approaches, along with the acronyms used to denote them, is reported in Table 1.

Regression analysis [18] is a supervised learning technique based on statistical concepts that estimates the relationship between a dependent variable and one or more independent variables, and uses it to model their future relationship. The idea behind regression analysis for forecasting a time series Y is that Y has a linear relationship with other time series X. Y is called the regressand, forecast or dependent variable, while X is called the regressor, predictor or independent variable. In the simplest case, the forecast variable has a linear relationship with a single variable:
$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t \qquad (1)$$
where $\varepsilon_t$ is the error term. With k predictors, the multivariable regression model is obtained as
$$Y_t = \beta_0 + \beta_1 X_{1,t} + \beta_2 X_{2,t} + \dots + \beta_k X_{k,t} + \varepsilon_t \qquad (2)$$
with the regression coefficient $\beta_i$ computed for each independent variable $X_i$. Each coefficient measures the effect of its predictor while taking into account the effects of all the other predictors. To build a model, the regression coefficients must be estimated. The least squares principle chooses the values of the coefficients that minimize the sum of squared errors over the T observations:
$$\sum_{t=1}^{T} \varepsilon_t^2 = \sum_{t=1}^{T} \left(Y_t - \beta_0 - \beta_1 X_{1,t} - \dots - \beta_k X_{k,t}\right)^2 \qquad (3)$$
Fitting (or training, or learning) the model thus means finding the estimates of the regression coefficients that minimize the sum of squared errors. The prediction of Y is then obtained by substituting the coefficients estimated through equation (3) into equation (2). Regularization is a technique that reduces overfitting when the data have high variability.
To achieve lower variance on the test data, a penalty term is added to the least squares objective fitted on the training set; the penalty shrinks the coefficients of the predictor variables to reduce their influence on the output variable. Thus, the number of variables stays the same but the magnitude of their coefficients is reduced. For example, ridge regression adds the penalty $\lambda \sum_{i=1}^{k} \beta_i^2$, with $\lambda \ge 0$, to the sum of squared errors.

Logistic regression is a predictive analysis technique used when the dependent variable is binary, such as present/absent or yes/no. Consider the simplest case with two predictors, $X_1$ and $X_2$, and a binary variable Y. Let p denote the probability that Y = 1, i.e. p = P(Y = 1). A linear relationship is assumed between the predictor variables and the log-odds (also called logit) of the event Y = 1. In statistics, the logit function is the logarithm of the odds $p/(1-p)$, where the odds are a measure of the likelihood of a particular outcome. This relationship can be written as:
$$\ln\frac{p}{1-p} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \qquad (8)$$
and, with simple algebraic manipulation, the probability that Y = 1 is given by
$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}} \qquad (9)$$
Thus, once the coefficients are fixed, it is possible to compute the probability of the outcome Y = 1.

A time series [19, 20] is defined as a collection of data observed sequentially over time. A time series is modeled as a sequence of random variables $Y = \{Y_t : t \in T\}$, with T an index set. Y is called a stochastic process and is assumed to be stationary, i.e. the probability laws of the process do not change over time. Time series analysis aims to model the stochastic mechanism that generates the observed series and to forecast future values of the series on the basis of its known history. A time series is often decomposed into three components: the trend, which describes the long-term movement of the variable without taking into account seasonality or irregularities; the seasonality, i.e. the periodic fluctuation of the variable; and the residual, which is the unexplained part of the time series. Moreover, time series can be univariate or multivariate.
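The least squares fit, the ridge-style penalty and the logistic link described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from any of the reviewed works: it assumes the coefficients are obtained from the normal equations, uses the L2 (ridge) penalty as one example of regularization, and the helper names (`fit_linear_regression`, `logistic_probability`) and toy data are hypothetical.

```python
import numpy as np


def fit_linear_regression(X, y, ridge_lambda=0.0):
    """Estimate [beta_0, beta_1, ..., beta_k] by (regularized) least squares.

    ridge_lambda = 0 gives ordinary least squares (minimize the sum of squared
    errors); ridge_lambda > 0 adds the L2 penalty lambda * sum(beta_i^2), which
    shrinks the slope coefficients. The intercept beta_0 is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])   # column of ones for beta_0
    penalty = ridge_lambda * np.eye(A.shape[1])
    penalty[0, 0] = 0.0                          # leave the intercept unshrunk
    # Normal equations: (A^T A + penalty) beta = A^T y
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)


def logistic_probability(beta, x):
    """P(Y = 1 | x) = 1 / (1 + exp(-(beta_0 + beta_1 x_1 + ... + beta_k x_k)))."""
    z = beta[0] + np.dot(beta[1:], x)
    return 1.0 / (1.0 + np.exp(-z))


# Synthetic, noise-free data generated from y = 1 + 2*x1 - 3*x2.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]

beta_ols = fit_linear_regression(X, y)
beta_ridge = fit_linear_regression(X, y, ridge_lambda=1.0)
print("OLS:  ", np.round(beta_ols, 3))     # recovers the generating coefficients
print("Ridge:", np.round(beta_ridge, 3))   # slope vector shrunk in norm
print(logistic_probability(np.array([0.0, 1.0, -1.0]), [2.0, 2.0]))  # z = 0, so p = 0.5
```

Solving the normal equations is adequate for small, well-conditioned problems like this one; for larger or ill-conditioned designs, a QR- or SVD-based solver such as `np.linalg.lstsq` is usually preferred.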
The former consists of a single variable observed sequentially over time, while the latter is used when several variables and their interactions are considered.

If the number of weights is finite, the process is called a moving average process, and it is written as
$$Y_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \dots - \theta_q e_{t-q} \qquad (11)$$
where $\{e_t\}$ is a white noise sequence and $\theta_1, \dots, \theta_q$ are the weights. This series is called a moving average of order q and is abbreviated to MA(q). An autoregressive process obtains the current value of the series $Y_t$ from its past values. More precisely, a p-order autoregressive process $Y_t$ is a linear combination of its p most recent past values plus a new term, and thus satisfies the equation
$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + e_t \qquad (12)$$
where $e_t$ is assumed to be independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \dots$ for every t. If the series is partly autoregressive and partly moving average, we obtain a mixed process
$$Y_t = \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + e_t - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q} \qquad (13)$$
$\{Y_t\}$ is called a mixed autoregressive moving average process of orders p and q and is denoted ARMA(p, q). The above models assume stationarity, i.e. the mean and the covariance structure of the process do not change over time. However, in many applications this assumption is not realistic: time series are often non-stationary and, for example, do not have a constant mean over time. A common remedy is differencing: applying the difference operator $\nabla Y_t = Y_t - Y_{t-1}$ d times yields the series
$$W_t = \nabla^d Y_t \qquad (14)$$
and $Y_t$ is an ARIMA(p, d, q) process if $W_t$ is a stationary ARMA(p, q) process. The ARIMA(p, d, q) model is thus an extension of the ARMA(p, q) model which combines the autoregressive AR(p) and moving average MA(q) time series models with a differencing parameter d, used to convert a non-stationary time series into a stationary one.

Exponential smoothing is a time series forecasting method which, unlike the moving average family, assigns exponentially decreasing weights to past observations. The simplest form of exponential smoothing forecasts the next value of the series as a weighted average of the latest observation and the previous forecast:
$$\hat{y}_{t+1|t} = \alpha y_t + (1 - \alpha)\, \hat{y}_{t|t-1}, \qquad 0 \le \alpha \le 1$$
so that the weight of each observation decays geometrically with its age.

A probabilistic model assumes that input and output data are random variables drawn from a probability distribution p(x, y), which is the ground truth.
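As a rough illustration of the AR, differencing and exponential smoothing ideas, the sketch below fits an ARIMA(p, 1, 0) model (no moving-average terms and no constant) by ordinary least squares and computes a simple exponential smoothing forecast. This is a simplification that assumes NumPy only; statistical packages typically estimate full ARIMA(p, d, q) models by maximum likelihood. The function names and the toy series are hypothetical.

```python
import numpy as np


def fit_ar(w, p):
    """Least-squares estimate of phi in W_t = phi_1 W_{t-1} + ... + phi_p W_{t-p} + e_t."""
    w = np.asarray(w, dtype=float)
    # Each row holds the p previous values [W_{t-1}, ..., W_{t-p}] for one target W_t.
    lagged = np.array([w[t - p:t][::-1] for t in range(p, len(w))])
    phi, *_ = np.linalg.lstsq(lagged, w[p:], rcond=None)
    return phi


def forecast_arima_d1(y, p):
    """One-step forecast from an ARIMA(p, 1, 0) model without constant.

    The series is differenced once (W_t = Y_t - Y_{t-1}), an AR(p) model is
    fitted to W, and the forecast difference is added back to the last value.
    """
    y = np.asarray(y, dtype=float)
    w = np.diff(y)
    phi = fit_ar(w, p)
    w_next = np.dot(phi, w[-p:][::-1])    # phi_1 W_T + ... + phi_p W_{T-p+1}
    return y[-1] + w_next


def simple_exponential_smoothing(y, alpha):
    """One-step forecast: level_t = alpha * y_t + (1 - alpha) * level_{t-1}, level_1 = y_1."""
    level = float(y[0])
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical cumulative case counts whose daily increments halve each day.
cumulative = [100, 108, 112, 114, 115, 115.5, 115.75]
print(forecast_arima_d1(cumulative, p=1))                  # about 115.875 (next increment 0.125)
print(simple_exponential_smoothing([10, 20], alpha=0.5))  # 0.5*20 + 0.5*10 = 15.0
```

Here the differenced series follows $W_t = 0.5\,W_{t-1}$ exactly, so the fitted coefficient is 0.5 and the forecast adds half of the last increment to the last observed value.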
A model distribution, which approximates the ground truth, is built from the data. It is then possible to compute the probability of a class label given an input, p(y | x). Obtaining it from the joint distribution requires marginalization, since p(y | x) = p(x, y)/p(x) with $p(x) = \sum_y p(x, y)$. A probabilistic model can be either a discriminative or a generative model distribution over the data: a generative model learns the joint distribution p(x, y) from the dataset, whereas a discriminative model directly learns the conditional distribution p(y | x).

Bayesian learning [25] is a very popular approach to learning based on the well-known Bayes rule:
$$p(a \mid b) = \frac{p(b \mid a)\, p(a)}{p(b)} \qquad (18)$$
where a and b are random variables and p(a | b) is the conditional probability of a given b, defined as
$$p(a \mid b) = \frac{p(a, b)}{p(b)} \qquad (19)$$
and p(a, b) is the probability that both a and b occur. The term p(b | a) is called the likelihood, p(a) the prior, and p(a | b) the posterior. In the machine learning context, given a training set D with m examples and the input and output random variables x and y, the aim is to find a probabilistic model p(x, y | D) that explains the data. The Bayes rule can be applied by replacing y with the unknown model parameters θ, obtaining
$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \qquad (20)$$
where p(D | θ) is the likelihood of the parameters θ, p(θ) is the prior probability of θ, and p(θ | D) is the posterior of θ given the data D.

Support Vector Machine (SVM) is a classification technique introduced by Boser et al. [26] that maximizes the margin between the training data and the decision boundary. SVM solves a binary classification problem by finding the separating hyperplane with the maximum margin that correctly classifies as much of the training data as possible. The optimal hyperplane is defined by the support vectors, i.e. the training points closest to it. One of the main characteristics of SVM is the use of the so-called kernel trick [27]. Since data are often not linearly separable in the original input space, they are mapped into a higher-dimensional feature space by a mapping φ, where a linear separator can better discriminate between the classes. Given m training examples $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, the soft-margin SVM solves
$$\min_{w, b, \xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i\left(w^\top \varphi(x_i) + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1, \dots, m \qquad (21)$$
where C > 0 trades off margin width against training errors, and φ is the feature mapping induced by a kernel function $K(x_i, x_j) = \varphi(x_i)^\top \varphi(x_j)$, which allows the problem to be solved without computing φ explicitly.
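Bayes rule for a model parameter can be made concrete with a grid approximation. The sketch below is a hypothetical example, not from the reviewed papers: it assumes a Bernoulli parameter θ (for instance, an unknown positivity rate), a uniform prior, and made-up counts. The evidence p(D) is computed by summing likelihood times prior over the grid, which is the marginalization step.

```python
import numpy as np


def grid_posterior(successes, trials, grid_size=101):
    """Posterior p(theta | D) for a Bernoulli parameter, evaluated on a grid.

    Bayes rule: p(theta | D) = p(D | theta) p(theta) / p(D), with a uniform
    prior p(theta) and the evidence p(D) obtained by summing (marginalizing)
    likelihood * prior over all grid values of theta.
    """
    theta = np.linspace(0.0, 1.0, grid_size)
    prior = np.full(grid_size, 1.0 / grid_size)
    likelihood = theta**successes * (1.0 - theta) ** (trials - successes)
    unnormalized = likelihood * prior
    evidence = unnormalized.sum()            # p(D)
    return theta, unnormalized / evidence    # posterior sums to 1


# Hypothetical data: 7 positive outcomes in 10 trials.
theta, posterior = grid_posterior(successes=7, trials=10)
print("MAP estimate:", theta[np.argmax(posterior)])          # close to 0.7
print("Posterior mean:", float(np.sum(theta * posterior)))   # close to 8/12
```

With a uniform prior the exact posterior is a Beta(8, 4) distribution, whose mean is 8/12 ≈ 0.667, so the grid estimate can be checked against it.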
Several kernel functions can be used for the mapping, such as linear, polynomial, Gaussian, exponential, and sigmoid kernels. Changing the kernel yields a different model. SVM is widely regarded as one of the most powerful classifiers in machine learning. The Least Squares Support Vector Machine (LS-SVM) is a variation of SVM introduced by Suykens and Vandewalle [28] which solves a set of linear equations instead of the inequalities (21). The main advantage of this formulation is its higher efficiency, since it transforms the task of solving a complex quadratic program into that of solving a set of linear equations. SVM can also be used for regression problems. As described in Section 3.1, in a regression problem the model returns a continuous-valued output instead of one of a set of discrete values; in this sense, regression generalizes classification. Support Vector Regression (SVR) is an extension of SVM which introduces a region, named tube, around the function to optimize, with the aim of finding the tube that best approximates the continuous-valued function while minimizing the prediction error, that is, the difference between the predicted and the true value. SVR uses an ε-insensitive loss function which penalizes only predictions farther than ε from the desired output. Different loss functions can be used, such as linear or quadratic. The value of ε determines the width of the tube.

Instance-based learning (IBL) is a group of algorithms that build a hypothesis directly from the training instances and perform generalization by comparing a new instance with the instances seen in training, which are stored in memory. These algorithms are referred to as lazy, since computation is postponed until a new instance is observed. An example of an IBL classifier is K-Nearest Neighbor (KNN), which computes the similarity (or distance) between the new instance and the training instances, and assigns the new instance to the class most common among its k nearest neighbors.
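The following sketch illustrates three ingredients discussed above: a Gaussian (RBF) kernel, the linear ε-insensitive loss used by SVR, and a KNN classifier based on majority voting. It assumes NumPy and the Python standard library only, uses Euclidean distance for KNN, and the names and training points are hypothetical. It does not train an SVM or SVR, which in practice would be delegated to a dedicated solver.

```python
import numpy as np
from collections import Counter


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Linear epsilon-insensitive loss used by SVR: zero inside the tube, |error| - epsilon outside."""
    error = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, error - epsilon)


def knn_predict(X_train, y_train, x_new, k=3):
    """Assign x_new to the majority class among its k nearest training instances (Euclidean distance)."""
    diffs = np.asarray(X_train, dtype=float) - np.asarray(x_new, dtype=float)
    distances = np.linalg.norm(diffs, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data with two classes.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["negative", "negative", "negative", "positive", "positive", "positive"]
print(knn_predict(X_train, y_train, [0.5, 0.5]))          # "negative"
print(knn_predict(X_train, y_train, [5.2, 5.1]))          # "positive"
print(epsilon_insensitive_loss([1.0, 2.0], [1.05, 2.5]))  # [0, 0.4] up to rounding
print(rbf_kernel([1, 2], [1, 2]))                          # identical points give 1.0
```

Note how the first prediction error (0.05) falls inside the ε = 0.1 tube and costs nothing, while the second (0.5) is penalized only by the amount exceeding ε.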
Decision Trees (DTs) [29] are among the best-known classification methods: they predict the class label of unknown instances after generating a tree from a set of training examples. The internal nodes of the tree test the attributes of the training set, and each branch leaving a node corresponds to one of the possible values of that attribute. A new instance is classified by starting from the root of the tree, testing the value of the corresponding attribute, and following the branch that matches the instance's value, down to a leaf that gives the class label. [32] Stacking ensemble is a variation of ensemble learning whose main characteristic is the combination of different types of base (weak) learners, whose predictions are combined by a meta-learner.

Artificial Neural Networks (ANNs) have been proposed since the 1940s as a simplified model of the human brain. However, it was only in 2006, after the paper of Hinton et al. [34] proposing deep neural networks (DNNs), that research in the field expanded rapidly. Letting w be a vector containing the parameters and x the input, an ANN can be mathematically considered as a nonlinear regression model y = f(x; w). Perceptrons are the basic units of ANNs. Their model function is computed as
$$y = \sigma\Big(\sum_{i} w_i x_i + b\Big)$$
where b is a bias term and the nonlinear function σ is called the activation function. The perceptron is trained by updating the weights as follows:
$$w_i \leftarrow w_i + \eta\,(t - y)\,x_i$$
where t is the target output and η is the learning rate. In a recurrent neural network (RNN), the hidden state $h_t$ is updated after each input $x_t$ as
$$h_t = \sigma(W h_{t-1} + U x_t)$$
where W is the G × G matrix containing the recurrent weights (G being the number of hidden units), U is the input weight matrix, and σ is a nonlinear activation function. RNNs are particularly suited to analyzing sequential data and thus to temporal forecasting applications. However, training can be challenging because of the exploding and vanishing gradient problems, and RNNs struggle to model long-term dependencies [36].
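The perceptron update rule and the recurrent hidden-state update can be sketched as follows. This sketch assumes a Heaviside step activation for the perceptron and tanh for the RNN, with the perceptron bias handled as an extra weight on a constant input; all names are hypothetical and the example is illustrative only.

```python
import numpy as np


def step(z):
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return 1.0 if z >= 0 else 0.0


def train_perceptron(X, targets, eta=1.0, epochs=20):
    """Perceptron rule w_i <- w_i + eta * (t - y) * x_i; the bias is w_0 on a constant input 1."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, t_i in zip(X, targets):
            y_i = step(np.dot(w, x_i))
            w += eta * (t_i - y_i) * x_i        # no change when the prediction is correct
    return w


def perceptron_predict(w, x):
    """Apply the trained perceptron to one input vector x."""
    return step(w[0] + np.dot(w[1:], x))


def rnn_step(h_prev, x_t, W, U):
    """One recurrent update h_t = tanh(W h_{t-1} + U x_t); W is G x G, bias omitted."""
    return np.tanh(W @ h_prev + U @ x_t)


# Logical AND is linearly separable, so the perceptron rule converges on it.
X_and = [[0, 0], [0, 1], [1, 0], [1, 1]]
t_and = [0, 0, 0, 1]
w = train_perceptron(X_and, t_and)
print([perceptron_predict(w, x) for x in X_and])   # [0.0, 0.0, 0.0, 1.0]

G = 2                                               # hypothetical number of hidden units
h = rnn_step(np.zeros(G), np.array([1.0]), W=0.5 * np.eye(G), U=np.array([[1.0], [0.0]]))
print(h)                                            # [tanh(1), 0]
```

Because logical AND is linearly separable, the perceptron convergence theorem guarantees that the rule stops changing the weights after a finite number of updates; on non-separable data the weights would keep oscillating.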
Long Short-Term Memory (LSTM) networks [37] try to address these problems by introducing memory cells, whose content is regulated by gates. The Convolutional Neural Network (CNN) [40] is a feed-forward network using three main types of layers: convolutional layers, which extract local features; pooling layers, which downsample the feature maps and thus reduce the complexity; and the fully connected layer, which takes the flattened features and is connected to the output. The term convolutional comes from the mathematical convolution operation, which, given two functions, produces a new function expressing how the shape of one is modified by the other. It can be considered as a specialized type of linear operator: in a CNN, the convolution operation is used in place of general matrix multiplication. A Generative Adversarial Network (GAN) is a learning model composed of two neural networks: the generative network, which generates candidate solutions, and the discriminative network, which evaluates them. An autoencoder is an artificial neural network which learns efficient codings of unlabeled data, generally by reducing their dimensionality. The encoding is evaluated and improved by trying to regenerate the input from the encoding: the model is trained with the objective of minimizing the error between the encoded-decoded data and the original data. A Variational Autoencoder (VAE) is an autoencoder whose training is regularized to avoid overfitting and to improve the generative process.

In this section we summarize the evaluation indexes adopted in the described papers for assessing the quality of the results obtained by the approaches. The main evaluation metrics are reported in Table 2. Figure 1 and Figure 2 show the specific ML and DL methods used, respectively. Figure 1 highlights that ARIMA is the most used technique, with a percentage of 13%, followed by SVR (8%), SVM (7%), and Naive Bayes and MLP (5%). Figure 2 shows that LSTM is the most used deep learning method, with a percentage of 29%, followed by ANN with 16%, RNN and NN with 8%, CNN with 6%, and Bi-LSTM with 5%.
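A minimal sketch of the convolution and pooling operations used in CNN layers, assuming NumPy. As in most deep learning libraries, the "convolution" here slides the kernel without flipping it (strictly speaking a cross-correlation). The function names and the input series are hypothetical and serve only to show the mechanics.

```python
import numpy as np


def conv1d_valid(x, kernel):
    """'Valid' 1-D convolution as computed in CNN layers.

    Like most deep learning libraries, this slides the kernel without flipping it
    (i.e. it is a cross-correlation); each output is a weighted sum of a local window.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])


def relu(z):
    """Rectified linear activation applied elementwise."""
    return np.maximum(z, 0.0)


def max_pool1d(x, size=2):
    """Non-overlapping max pooling; trailing values that do not fill a window are dropped."""
    x = np.asarray(x, dtype=float)
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)


# Hypothetical cumulative counts; the kernel [-1, 1] turns them into daily increments.
cumulative = [0, 1, 3, 6, 10, 15]
features = relu(conv1d_valid(cumulative, [-1, 1]))
print(features)               # [1. 2. 3. 4. 5.]
print(max_pool1d(features))   # [2. 4.]
```

In a trained CNN the kernel weights are learned rather than fixed by hand; the hand-picked difference kernel only makes the output easy to interpret.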
The selected papers are published by different publishers, such as IEEE, Elsevier, Springer and MDPI, while for preprints medRxiv and arXiv are considered; 26% of the papers are preprints and 74% are peer-reviewed articles and conference proceedings. Figure 3 shows that Elsevier largely surpasses all others with 33% of the publications, followed by Springer with 10% and IEEE with 8%. From the 146 selected papers, we removed the preprints and chose the most significant and representative papers for each of the main ML and DL techniques. In total, 38 papers were selected to undergo a more in-depth study. An analytical review of those 38 papers is reported in Section 5 and summarized in Table 3. Papers have been categorized on the basis of the AI method they used and are described in the corresponding section. However, some of the papers implemented more than one AI method, thus the classification includes the different methods used. In Section 5.5, the remaining papers among the 146, which were either preprints at the time of writing or reported experiments on the few data available at publication time, are overviewed.

In this section we review the works in the literature discussing models, methods and applications of machine learning or deep learning techniques for COVID-19 forecasting and tracking, selected following the procedure discussed in Section 4. Unlike previous reviews regarding COVID-19, the one proposed in this paper focuses on a very specific topic, that is, COVID-19 forecasting exploiting DL and ML methods. Accordingly, data extraction and classification of the selected studies were conducted to evaluate the efficacy of the approaches for COVID-19 detection, diagnosis, forecasting and spread through AI capabilities such as learning, regression and prediction. In particular, a detailed analysis of the 38 selected papers is provided throughout the section, grouping the different works according to the AI category employed.
For each study in the literature, we extracted the most important features, such as the method implemented, the data type and size used, the evaluation methods adopted, the accuracy of each method, and the results achieved. These features are summarized in Table 3 for each study, while in Section 5.5 the rest of the 146 papers, including preprints, are overviewed. the reported daily COVID-19 confirmed, recovered, death, and active cases for these 10 countries from March 1st until May 20th, downloading them from the COVID-19 data repository managed by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) [43]. The authors considered two time series forecasting models, ARIMA [21] and Prophet [22], to obtain predictions, and evaluated the quality of the results by using statistical measures. The results showed that ARIMA performs better than Prophet for most of the countries. For instance, the MAPE of ARIMA on the active cases in Iran is 2%, while that of Prophet is 82%. However, the main problem with the ARIMA results is that the authors used a different value for the order of the autoregression, i.e. the number of previous days needed to find the parameters.
$$\frac{dQ}{dt} = rQ\left(1 - \frac{Q}{K}\right)$$
where Q is the population size, r the intrinsic growth rate, and K the maximum population size that the environment can carry; dQ/dt represents the growth of the population. performance when compared to the other models. The models achieve errors in the ranges 0.87%-3.51%, 1.02%-5.63%, and 0.95%-6.90% for one-, three-, and six-days-ahead forecasts, respectively. The ranking of the models in all scenarios, with respect to the obtained accuracy in decreasing order, is SVR, stacking-ensemble learning, ARIMA, CUBIST, RIDGE, and RF.
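The logistic growth model dQ/dt = rQ(1 − Q/K), whose symbols are defined above, has the closed-form solution Q(t) = K / (1 + ((K − Q0)/Q0) e^(−rt)) with Q(0) = Q0. The sketch below, assuming NumPy, compares it with a simple forward-Euler integration. The parameter values are hypothetical and chosen only for illustration; they are not taken from the reviewed study.

```python
import numpy as np


def logistic_solution(t, Q0, r, K):
    """Closed-form solution of dQ/dt = r Q (1 - Q/K) with Q(0) = Q0."""
    return K / (1.0 + (K - Q0) / Q0 * np.exp(-r * t))


def logistic_euler(Q0, r, K, t_end, dt=0.01):
    """Forward-Euler integration of the same equation, for comparison."""
    Q = float(Q0)
    for _ in range(int(round(t_end / dt))):
        Q += dt * r * Q * (1.0 - Q / K)
    return Q


# Hypothetical parameters: 10 initial cases, growth rate 0.3/day, carrying capacity 1000.
Q0, r, K = 10.0, 0.3, 1000.0
print(logistic_solution(0.0, Q0, r, K))      # equals Q0 = 10.0
print(logistic_solution(30.0, Q0, r, K))     # approaching K
print(logistic_euler(Q0, r, K, t_end=30.0))  # close to the closed-form value
```

The growth is nearly exponential while Q is much smaller than K and slows down as Q approaches the carrying capacity, which is why the logistic curve is often used for cumulative epidemic counts.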
The approaches described in the previous sections mainly apply existing AI techniques to the available data and compare the different methods experimentally, without introducing significant novel ideas. In this section, we report some approaches in both the DL and ML domains that did not simply test existing techniques but introduced original ideas aimed at advancing methodologies in the field of AI for COVID-19 forecasting. The work of Casiraghi et al. [80] proposed an interesting methodological approach to identify abnormalities in chest radiographs (CXR) and thus improve patient risk prediction. To this end, they designed an explainable machine learning system which may provide simple decision criteria to be used by clinicians as support for the early assessment of COVID-19 risk, based on CXR abnormalities estimated both by expert radiologists and by specialized state-of-the-art deep neural networks. A novel feature selection algorithm is proposed that combines the Boruta algorithm with permutation-based feature selection methods to select the variables that are most relevant for COVID-19 risk prediction. The most important variables are then used to train an RF classifier, whose rules may be extracted, simplified, and pruned to finally build an associative tree. Results show that the radiological score automatically computed by a neural network is highly correlated with the score computed by radiologists, and that laboratory variables, together with the number of comorbidities, aid risk prediction. This study was performed on clinical, comorbidity,
Ramchandani et al. [83] presented DeepCOVIDNet, a deep learning approach to predict COVID-19 cases in the next seven days using several features, such as census data, intra-county mobility, inter-county mobility, social distancing data, and past infection growth.
The number of papers published in the last year is very large, so an exhaustive review and description of each of them is not possible. In this section we briefly describe approaches that were published as preprints at the time of writing, and thus not yet accepted after peer review, or that were published in early 2020 and tested their methods on the few data available in that period. We report them since we used them to compute the statistics reported in Section 4. In Kolozsvári et al. [95], a recurrent neural network is proposed to predict the epidemic curve. Two prediction models are created in this work: first the data are fed to a dense neural network, and then a subsequent regression output layer is used to predict the value. In Li et al. [96], a recurrent NN is proposed to build a model of the pandemic in Italy. Kapoor [97] proposed a novel forecasting approach for COVID-19 case prediction that uses Graph Neural Networks and mobility data. In contrast to existing time series forecasting models, the proposed approach learns from a single large-scale spatio-temporal graph, where nodes represent region-level human mobility, spatial edges represent the human-mobility-based inter-region connectivity, and temporal edges represent node features through time. A combination of the XGBoost, K-means and LSTM algorithms is used in Vadyala et al. [98] to build a model to predict the pandemic in Louisiana, USA. In Javod et al. [99], polynomial regression and neural network algorithms are used with the data made available by Johns Hopkins University to build a model of the pandemic. In [100], exponential smoothing and ARIMA are used to predict the pandemic in India. In Zandavi et al. [101], an LSTM with a dynamic behavioral model is adopted, which considers the effect of multiple factors to enhance the prediction accuracy across the top 10 most affected countries. To build a predictive model of the pandemic, a new DNN architecture is proposed in Direkoglu et al.
[102], which consists of an LSTM layer, a dropout layer and fully connected layers to produce regional and worldwide forecasts. To study the epidemic behavior in different zones of New York City, Khmaissia et al. [115] propose a clustering algorithm that models the outbreak in the city. In Suzuki et al. [116], XGBoost is used to predict the number of infections in South Korea. In Pereira et al. [117], a clustering algorithm is applied to the world regions for which epidemic data are available and the pandemic is at an advanced stage. Then, a set of features representing the countries' responses to the early spread of the pandemic is used to train an autoencoder network to predict the future of the pandemic in Brazil. Two machine learning algorithms, a neural network and Prophet, are used in Balde et al. [118] to study the impact of nation-wide measures on the pandemic. In Dandekar et al. [120], an epidemiological model augmented with a neural network is proposed to study the effect of the quarantine and isolation measures implemented in Wuhan on the reproduction number R0. This section outlines articles that are still preliminary works. In da Silva et al. In Hartono [133], neural networks and LSTM are used to build a model to forecast the pandemic all over the world. In [134], a multi-layer perceptron and a vector autoregression method are used to design a forecasting model for the epidemic in India. An unsupervised neural network algorithm called self-organizing map is proposed in Melin et al. [135], which spatially groups together countries that are similar with respect to the pandemic, so that they can benefit from similar strategies. A multilayer perceptron neural network is used in Mollalo et al. [136] to predict the incidence rate of the pandemic in the United States. In Tamang
Table 3 summarizes the main features of the 38 reviewed papers.
The features mainly concern AI-related characteristics, data-related information, the topic addressed by the method, the experimental methodology adopted and the results obtained. Specifically, one column reports the kind of ML or DL model used, or more generally the AI technique; two other columns report information about the dataset and about the type of data and the time interval in which the COVID-19 related data were collected. The remaining columns specify the output produced, the validation method and the results achieved. to be the use of ensemble neural network to predict the number of confirmed cases and deaths. The models were evaluated on the basis of their accuracy and efficacy for different prediction lead times, and employed different types of data from different countries. Experiments have been validated using metrics well known in the literature for the evaluation of prediction performance, such as prediction accuracy measured in terms of AUC, ROC curves, specificity, sensitivity, precision, and correlation, and prediction error measured in terms of MAPE, MAE, and RMSE. The analyzed papers address COVID-19 forecasting by looking at different factors and covering different scopes and topics. Figure 5 shows that the largest share of the papers (42%) focuses on COVID-19 daily cases forecasting, clearly prevailing over the other topics. Mortality risk prediction is another topic widely studied in the selected papers, found in 30% of the works, followed by the prediction of recovered cases, which reaches about 8%. COVID-19 risk prediction and diagnosis both reach 5%. Critical cases prediction and positive/negative cases prediction are around 3%. Since COVID-19 daily cases forecasting is the most widely studied topic, Figure 6 shows the best-performing ML and DL methods used by researchers for it.
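The error and accuracy measures mentioned above can be computed as in the sketch below, which assumes NumPy, binary labels coded as 0/1, and hypothetical data; the function names are illustrative. MAPE is undefined when an observed value is zero, a case that can occur with daily case counts.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no true value is zero."""
    y_true = np.asarray(y_true, float)
    return float(100.0 * np.mean(np.abs((y_true - np.asarray(y_pred, float)) / y_true)))


def classification_rates(y_true, y_pred):
    """Sensitivity (recall), specificity and precision for binary labels 0/1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
        "precision": float(tp / (tp + fp)),
    }


# Hypothetical daily case counts and forecasts.
observed, forecast = [100, 200, 400], [110, 190, 400]
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))  # about 6.67, 8.16, 5.0
print(classification_rates([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

MAE and RMSE are expressed in the units of the data (here, cases), whereas MAPE is scale-free, which is why it is often preferred when comparing forecasts across countries with very different case counts.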
ARIMA and LSTM, both with a percentage of 17%, turned out to be the most successful AI methods for COVID-19 daily cases forecasting, followed by ANN with 13% and MLP with 9%. As the figure shows, the rest of the approaches used other variants of LSTM, such as BiLSTM and DeepLSTM, which in total represent 37% of the approaches. This confirms that LSTM-based approaches turned out to be the most successful for COVID-19 case prediction. Figure 4 shows the data types used for COVID-19 forecasting. Different data types have been exploited, including demographics, comorbidities, clinical data, blood tests, number of daily cases, number of daily deaths, number of recovered cases, vaccination rate, physiological data and number of daily phone calls. Of these, the most widely used is the number of daily cases, with a percentage of 41%, followed by clinical data with 19%, the number of daily deaths with 14%, demographics with 7%, and then all other types with smaller percentages. While these studies show the range of methodological choices that can be made when building forecasting models, they also demonstrate the complexity of choosing between such models and the non-trivial interplay between methods, hyperparameters, and datasets. Moreover, since much of the data collected for COVID-19 modeling tasks is limited, the choice of models and datasets can have significant effects on overall performance. Artificial Intelligence algorithms play a key role in the rapid forecasting, detection, classification, screening, and diagnosis of COVID-19 infection cases. Currently, AI mainly focuses on medical image inspection, genomics, drug development, and transmission prediction; thus AI still has great unexplored potential, mainly in predicting the number of new cases and deaths.
In fact, even though many applications addressing COVID-19 forecasting and diagnosis have been proposed, only a few of them are currently mature enough to be effective in real-world scenarios. Until the end of 2020, AI was not fully exploited for tracking and predicting COVID-19 cases, due to the lack of a large amount of historical data to train the AI models. Accordingly, earlier papers, published a few months after the worldwide COVID-19 outbreak, reported results of limited relevance, due both to the lack of sufficient data to train the AI techniques appropriately and to the quality of the data themselves. In fact, due to the rapid diffusion of COVID-19, insufficient data were available and extensive labelled datasets did not yet exist. Training models on unrepresentative datasets leads to poor and even misleading outcomes, and the fast-moving nature of the problem can make informed model and parameter selection difficult. This severely affected the performance and accuracy of the forecasting models. Today, the availability of COVID-19 surveillance data in terms of the number of daily and cumulative cases, deaths and recoveries is no longer an issue. In fact, two years after the COVID-19 outbreak, several collections of detailed data are available from different sources, for example the one gathered by the Coronavirus Resource Center of Johns Hopkins University. Therefore, it would be very interesting if the authors of those early works re-executed their approaches on the large volumes of data now available and validated them on the new data. Another limitation is that many of the analyzed works do not exploit any exogenous variable in the forecasting process. Accounting for restrictive measures like lockdown, quarantine, and travel limitations could enhance the prediction accuracy.
Furthermore, the available vaccination data could be integrated into the forecasting models, possibly greatly improving the performance of the prediction. Accordingly, a future research line could be to extend the proposed forecasting models with exogenous variables like the ones just discussed. Concerns still remain about the use of clinical data for COVID-19 early diagnosis and early symptom prediction. There are several limitations to the feasible applications of AI methods for COVID-19 prediction on this kind of data. We outline some of them as follows:
1. Lack of available large-scale training data. Most AI methods rely on large-scale annotated training data. In addition, annotating training samples is very time-consuming and requires professional medical personnel.
2. The distributed and heterogeneous nature of many data sources contributes to data scarcity. In fact, the different data formats, together with the lack of data standardization and interoperability and the missing values, often make the application of AI methods to such data inaccurate and unreliable. As highlighted in Dagliati et al. [1], interoperability is a key concept: the COVID-19 pandemic made clear that unified frameworks for sharing and exchanging digital epidemiological data, together with data protection, are necessary. Data federation, data integration and data fusion could be applied to overcome data heterogeneity, as well as the use of common standards at the international level. Another suggestion could be the design of analytical models tailored to the specific issues of the currently available clinical data related to COVID-19.
6. Privacy, anonymity and ethical issues are key concerns, which need to be addressed to enable effective contact tracing between citizens while effectively preserving their privacy.
Privacy matters are also relevant when dealing with specific types of data, such as social media data, which are often exposed to privacy violations, as reported by Combi et al. [2]. As a final remark, we want to underline that there is still room to exploit advanced ML algorithms, for example ensemble methods such as bagging, boosting and stacking, for forecasting new COVID-19 infections, deaths and recoveries. Furthermore, the applicability of AI to early symptom detection and disease diagnosis has not been fully exploited yet. For example, supervised classification methods could be better adapted and explored for detecting and classifying the different symptoms associated with COVID-19. The paper presented a systematic and comprehensive survey of the application of AI technologies for forecasting, detecting, and diagnosing COVID-19. The study examined and reviewed an extensive collection of state-of-the-art COVID-19 prediction and diagnosis algorithms, providing a detailed background description of the AI techniques used for COVID-19. For each work surveyed, a detailed analysis of the rationale behind the approach is provided, highlighting the method used, the type and size of data analyzed, the validation method, the target application and the results achieved. Despite all the significant progress in applying AI to COVID-19 issues, there is still a need for further implementation of these technologies for detecting, monitoring, and diagnosing the disease. Future work should focus on strengthening current technologies, mainly for early differential diagnosis of COVID-19 from clinical data. Future work should also address privacy preservation and the security of citizens' sensitive health and personal data.
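The bagging strategy suggested above as a future direction can be illustrated with a minimal sketch: many simple autoregressive models y_t = a + b * y_{t-1}, each fitted by least squares on a bootstrap resample of (y_{t-1}, y_t) pairs, whose one-step forecasts are averaged. This is an illustration under stated assumptions rather than a method from the surveyed works; the AR(1) base learner, the plain (non-block) bootstrap, the random-walk-with-drift fallback for degenerate resamples and the names `fit_ar1` and `bagged_forecast` are all hypothetical choices.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def fit_ar1(pairs: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y_t = a + b * y_{t-1} from (y_{t-1}, y_t) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # Slope not identifiable (all lags equal): fall back to a random walk
        # with drift, i.e. b = 1 and a = average one-step change.
        return mean(y - x for x, y in pairs), 1.0
    b = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sxx
    return y_bar - b * x_bar, b


def bagged_forecast(series: Sequence[float], n_models: int = 50, seed: int = 0) -> float:
    """Bootstrap aggregating: average the one-step forecasts of AR(1) models fitted on resamples."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    rng = random.Random(seed)  # fixed seed makes the ensemble reproducible
    pairs = list(zip(series[:-1], series[1:]))
    forecasts: List[float] = []
    for _ in range(n_models):
        # Draw len(pairs) lag pairs with replacement (plain, non-block bootstrap).
        sample = rng.choices(pairs, k=len(pairs))
        a, b = fit_ar1(sample)
        forecasts.append(a + b * series[-1])
    return mean(forecasts)


# Invented toy series (not real data) that grows by exactly 2 per step.
toy = [10.0 + 2 * t for t in range(10)]
print(round(bagged_forecast(toy), 6))
```

For real epidemic series, a block bootstrap that preserves temporal dependence would usually be preferable to resampling individual lag pairs. Boosting and stacking would replace the simple average with, respectively, sequentially fitted learners that focus on earlier errors and a learned meta-model that combines the base forecasts.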
Health informatics and EHR to support clinical research in the COVID-19 pandemic: an overview
Health informatics: Clinical information systems and artificial intelligence to support medicine in the COVID-19 pandemic
A survey on applications of artificial intelligence in fighting against COVID-19
Artificial intelligence vs COVID-19: limitations, constraints and pitfalls
Artificial intelligence (AI) and big data for coronavirus (COVID-19) pandemic: A survey on the state-of-the-arts
Data-driven methods to monitor, model, forecast and control COVID-19 pandemic: Leveraging data science
How big data and artificial intelligence can help better manage the COVID-19 pandemic
Leveraging data science to combat COVID-19: A comprehensive review
Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal
Applications of machine learning and artificial intelligence for COVID-19 (SARS-CoV-2) pandemic: A review
A review of modern technologies for tackling COVID-19 pandemic, Diabetes and Metabolic Syndrome
Mapping the landscape of artificial intelligence applications against COVID-19
Artificial intelligence in the fight against COVID-19: Scoping review
Machine learning applications for COVID-19: A state-of-the-art review
Significance of deep learning for COVID-19: state-of-the-art review
Applications of artificial intelligence in battling against COVID-19: A literature review
AI techniques for COVID-19
Forecasting: Principles and practice
Time series analysis, forecasting and control
Forecasting at scale
Some studies in machine learning using the game of checkers
Machine learning: a probabilistic perspective
A training algorithm for optimal margin classifiers
A generalized representer theorem
Least squares support vector machine classifiers
McGraw-Hill
Using model trees for classification
Bagging predictors
Boosting a weak learning algorithm by majority
Proceedings of the 13th Int. Conference on Machine Learning
A fast learning algorithm for deep belief nets
Learning and generalization characteristics of the random vector functional-link net
Deep learning
Long short-term memory
Learning phrase representations using RNN encoder-decoder for statistical machine translation
An enhanced stacked LSTM method with no random initialization for malware threat hunting in safety and time-critical systems
Convolutional networks for images, speech, and time series
Procedures for performing systematic reviews
COVID-19 pandemic prediction using time series forecasting models
GitHub Inc. COVID-19 cases
Study of ARIMA and least square support vector machine (LS-SVM) models for the prediction of SARS-CoV-2 confirmed cases in the most affected countries
Prediction of epidemic trends in COVID-19 with logistic model and machine learning technics
Forecasting of COVID-19 per regions using ARIMA models and polynomial functions
Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM
Forecasting of COVID-19 cases using deep learning models: Is it reliable and practically significant?
Time series forecasting of COVID-19 using deep learning models: India-USA comparative case study, Chaos, Solitons and
Deep learning methods for forecasting COVID-19 time-series data: A comparative study
Comparison of deep learning approaches to predict COVID-19 infection
A novel adaptive deep learning model of COVID-19 with focus on mortality reduction strategies
Real-time measurement of the uncertain epidemiological appearances of COVID-19 infections
A deep learning prognosis model help alert for COVID-19 patients at high-risk of death: A multi-center study
Artificial intelligence forecasting of COVID-19 in China
COVID-19 forecasting based on an improved interior search algorithm and multi-layer feed forward neural network
A Bayesian-deep learning model for estimating COVID-19 evolution in Spain
COVID-19 learning models
A comparative analysis of different regression models on predicting the spread of COVID-19 in India
COVID-19 pandemic prediction for Hungary; a hybrid machine learning approach
A machine learning model to identify early stage symptoms of SARS-CoV-2 infected patients
Machine learning based early warning system enables accurate mortality risk prediction for COVID-19
Predicting the COVID-19 infection with fourteen clinical features using machine learning classification algorithms
Utilization of machine-learning models to accurately predict the risk for critical COVID-19
Detection of COVID-19 infection from routine blood exams with machine learning: a feasibility study
Application of machine learning time series analysis for prediction COVID-19 pandemic
Mohi Ud Din, Machine learning based approaches for detecting COVID-19 using clinical text data
Realizing an effective COVID-19 diagnosis system based on machine learning and IoT in smart hospital environment
Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil
Combining instance-based and model-based learning
Machine learning model for computational tracking and forecasting the COVID-19 dynamic propagation
Modelling and forecasting of COVID-19 spread using wavelet-coupled random vector functional link networks
ARIMA modelling and forecasting of COVID-19 affected countries
Using machine learning to predict ICU transfer in hospitalized COVID-19 patients
Machine-learning approaches in COVID-19 survival analysis and discharge-time likelihood prediction using clinical data
Prediction of respiratory decompensation in COVID-19 patients using machine learning: The READY trial
Ensemble learning model for diagnosing COVID-19 from routine blood tests
Predicting mortality risk in patients with COVID-19 using machine learning to help medical decision-making
A methodological approach for predicting COVID-19 epidemic using EEMD-ANN hybrid model
Explainable machine learning for early assessment of COVID-19 risk prediction in emergency departments
A novel intelligent computational approach to model epidemiological trends and assess the impact of non-pharmacological interventions for COVID-19
Forecasting COVID-19 daily cases using phone call data
DeepCOVIDNet: An interpretable deep learning model for predictive surveillance of COVID-19 using heterogeneous features and their interactions
DeepFM: A factorization-machine based neural network for CTR prediction
Predicting COVID-19 in China using hybrid AI model
Deep-LSTM ensemble framework to forecast COVID-19: an insight to the global pandemic
COVID-19 outbreak prediction with machine learning
Modelling and forecasting of COVID-19 spread using wavelet-coupled random vector functional link networks
Hi-COVIDNet: Deep learning approach to predict inbound COVID-19 patients and case study in South Korea
Multiple-input deep convolutional neural network model for COVID-19 forecasting in China
COVID-19 epidemic analysis using machine learning and deep learning algorithms
A machine learning model reveals older age and delayed hospitalization as predictors of mortality in patients with COVID-19
Prediction of criticality in patients with severe COVID-19 infection using three clinical features: a machine learning-based prognostic model with clinical data in Wuhan
A machine learning-based model for survival prediction in patients with severe COVID-19 infection
Predicting the epidemic curve of the coronavirus (SARS-CoV-2) disease (COVID-19) using artificial intelligence
A recurrent neural network and differential equation based spatiotemporal infectious disease model with application to COVID-19
Examining COVID-19 forecasting using spatio-temporal graph neural networks
Prediction of the number of COVID-19 confirmed cases based on K-means-LSTM
Predictive analysis of COVID-19 time-series data from Johns Hopkins University
Short-term forecasts of COVID-19 spread across Indian states until
Forecasting the spread of COVID-19 under control scenarios using LSTM and dynamic behavioral models
Worldwide and regional forecasting of coronavirus (COVID-19) spread using a deep learning model
Forecasting the COVID-19 pandemic with climate variables for top five burdening and three South Asian countries
COVID-19 growth prediction using multivariate long short term memory
Forecasting COVID-19 cases in India using machine learning models
Multiple ensemble neural network models with fuzzy response aggregation for predicting COVID-19 time series: The case of Mexico
Covid-Net: A deep learning based and interpretable predication model for the county-wise trajectories of COVID-19 in the United States
Predicting the epidemic curve of the coronavirus (SARS-CoV-2) disease (COVID-19) using artificial intelligence
Tracking and classifying global COVID-19 cases by using 1D deep convolution neural networks
How well can we forecast the COVID-19 pandemic with curve fitting and recurrent neural networks?
Preparedness and mitigation by projecting the risk against COVID-19 transmission using machine learning techniques
Explainable machine learning models to understand determinants of COVID-19 mortality in the United States
Forecasting COVID-19 cases using machine learning models
A machine learning methodology for real COVID-19 outbreak using internet searches, news alerts, and estimates from mechanistic models
An unsupervised machine learning approach to assess the zip code level impact of COVID-19 in NYC
Machine learning model estimating number of COVID-19 infection cases over coming 24 days in every province of South Korea (XGBoost and MultiOutputRegressor)
Forecasting COVID-19 dynamics in Brazil: a data driven approach
Impact studies of nationwide measures COVID-19 anti-pandemic: compartmental model and machine learning
Modeling projections for COVID-19 pandemic by combining epidemiological, statistical, and neural network approaches
Neural network aided quarantine control model estimation of COVID spread in Wuhan
Forecasting Brazilian and American COVID-19 cases based on artificial intelligence coupled with climatic exogenous variables
Use of machine learning and artificial intelligence to predict SARS-CoV-2 infection from full blood counts in a population
Modelling and predicting the spatio-temporal spread of coronavirus disease 2019 (COVID-19) in Italy
Artificial neural networks for short-term forecasting of cases, deaths, and hospital beds occupancy in the COVID-19 pandemic at the Brazilian Amazon
Investigating a serious challenge in the sustainable development process: Analysis of confirmed cases of regression analysis
Outbreak prediction of COVID-19 for dense and populated countries using machine learning
COVID-19 prediction using LSTM algorithm: GCC case study
ARIMA and NAR based prediction model for time series analysis of COVID-19 cases in India
Prediction of the COVID-19 pandemic for the top 15 affected countries: Advanced autoregressive integrated moving average (ARIMA) model
Modeling the spread of COVID-19 infection using a multilayer perceptron
Composite Monte Carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction
Finding an accurate early forecasting model from small dataset: A case of 2019-nCoV novel coronavirus outbreak
Similarity maps and pairwise predictions for transmission dynamics of COVID-19 with neural networks
Stoch Environ Res Risk Assess
Analysis of spatial spread relationships of coronavirus (COVID-19) pandemic in the world using self organizing maps
Artificial neural network modeling of novel coronavirus (COVID-19) incidence rates across the continental United States
Forecasting of COVID-19 cases based on prediction using artificial neural network curve fitting technique
Modeling and prediction of COVID-19 in Mexico applying mathematical and computational models
Forecasting COVID-19 outbreak progression in Italian regions: A model based on neural network training from Chinese data
Forecasting the prevalence of COVID-19 outbreak in Egypt using nonlinear autoregressive artificial neural networks
Statistical explorations and univariate
Time series forecasting of COVID-19 transmission in Canada using LSTM networks
Deep learning and Holt-trend algorithms for predicting COVID-19 pandemic
Prediction for the spread of COVID-19 in India and effectiveness of preventive measures
Predicting COVID-19 incidence through analysis of Google Trends data in Iran: Data mining and deep learning pilot study
Neural network based country wise risk prediction of COVID-19
Association between weather data and COVID-19 pandemic predicting mortality rate: Machine learning approaches
Prioritizing and analyzing the role of climate and urban parameters in the confirmed cases of COVID-19 based on artificial intelligence applications
Comparative analysis and forecasting of COVID-19 cases in various European countries with ARIMA
Analysis on novel coronavirus (COVID-19) using machine learning methods
An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data
Marine predators algorithm for forecasting confirmed cases of COVID-19 in Italy, USA, Iran and Korea
Optimization method for forecasting confirmed cases of COVID-19 in China
Prognostic modeling of COVID-19 using artificial intelligence in the United Kingdom: Model development and validation