key: cord-0896199-o67p8lfl
authors: Ayoub Khan, Mohammad; Khan, Rijwan; Algarni, Fahad; Kumar, Indrajeet; Choudhary, Akshika; Srivastava, Aditi
title: Performance Evaluation of Regression Models for COVID-19: A Statistical and Predictive Perspective
date: 2021-09-08
journal: Ain Shams Engineering Journal
DOI: 10.1016/j.asej.2021.08.016
sha: f074ef2283bb15bc6a1fa50666dae41d7799162e
doc_id: 896199
cord_uid: o67p8lfl

Research plays a vital role during the COVID-19 pandemic in delivering speedy solutions. COVID-19 has challenged governments, corporations and ordinary citizens around the world, with technology playing an essential role in tackling the crisis. Modest and flexible technological solutions that can accelerate progress towards critical healthcare capability are being tested continually. Data must be obtained, validated and analysed within a short time frame with the aid of innovation. In this context, machine learning models have a major role to play in predicting the number of upcoming positive COVID-19 cases, so that government departments can plan their future COVID-19 response effectively. The ongoing global COVID-19 pandemic has been non-linear and dynamic. Because of the highly complex nature of the COVID-19 outbreak and its variation from country to country, this study recommends machine learning as an effective means of modelling the outbreak. In this work, linear regression, polynomial regression, ridge regression, polynomial ridge regression and support vector regression models are evaluated on a COVID-19 data set gathered from multiple online sources. Comprehensive experiments were performed, and each test was evaluated using mean square error (MSE), mean absolute error (MAE), root mean square error (RMSE) and R2 score. This study also offers a path for future research using regression models based on machine learning. Precise validation and data analysis can contribute to strategies for treatment and disease prevention at an early stage. A systematic, comprehensive strategy of this kind allows statistical data to be forecast for government agencies and the community.

Coronavirus disease (COVID-19) originated in Wuhan, China, in December 2019. To date (12 July 2021), there is no fully established human immunization to combat it. COVID-19 spreads rapidly when people are in close proximity [1]. The pandemic is probably the most unnerving situation the world has faced since the end of the Second World War, all the more so because the enemy is invisible. COVID-19 has affected our lives to a great extent. Work, the economy, education and almost everything else have come to a standstill. People now focus only on what is essential to live.

Governments and other organizations have had to take unprecedented steps to stem the trail of destruction. Given the way the virus spreads and the threat it poses, practically all nations have declared either partial or complete lockdowns throughout the affected districts and regions.

Since there is no approved medicine so far for eliminating the infection, the governments of all countries are concentrating on precautionary measures that can stop or minimize the spread [2]. This virus causes respiratory tract infections that can range from mild to fatal. The infection can affect individuals of any age, yet older people and individuals with pre-existing conditions are more vulnerable to it.

The most challenging aspect of its spread is that an individual can carry the infection for a long time without showing symptoms [3, 4, 5, 6]. Wearing masks, using sanitizer, regular hand washing and good hygiene are the best ways to guard against this illness. People are urged to believe that staying home saves lives. COVID-19 will reshape our reality.

On February 11, 2020, the World Health Organization (WHO) named the disease caused by this virus COVID-19 [21], noting that the outbreak had begun in China before spreading to other countries. The USA, Brazil and India have been the most affected countries, with the number of COVID-19 cases multiplying rapidly on a daily basis.

Machine learning plays a major role in better understanding and examining the COVID-19 crisis, as it identifies patterns in data and uses them to make predictions or decisions automatically. In the healthcare sector, machine learning can be seen as a resource capable of processing enormous datasets beyond the capacity of the human mind, and the insights derived help doctors in planning and providing appropriate treatment. On the basis of this concept, this work analyses COVID-19 cases and predicts upcoming new positive cases [18]. The authors examine the growth of confirmed, death and recovered COVID-19 cases worldwide and compare five heavily infected countries with the rest of the world. Analysis and prediction are carried out using linear regression, ridge regression, polynomial ridge regression [5], polynomial regression and support vector regression (SVM). The proposed prediction model is intended to track the actual course of the pandemic, so that large economic losses, community spread and the degree of social distancing can be identified and accurate decisions can be taken accordingly [7]. This approach should help the administration to take preventive measures, building on our future work on predicting the presence of this infection.

The remainder of the paper is organized as follows. Section 2 presents the related work, Section 3 presents the materials and methods and the performance analysis parameters used for the investigation, Section 4 presents the evaluation with comparison tables, and Section 5 sums up the paper and presents the conclusion with references.

Since the emergence of COVID-19 and its subsequent spread across continents, overwhelming both developed and developing countries, a great many research papers have been published on different aspects of COVID-19. Various papers analyse vaccination, drug therapy and the prediction of future infected, recovered and death cases [1, 2]. L. Jia et al., 2020 [12] carried out prediction and analysis of COVID-19 using various forecasting models. Differential equation, classical differential equation and time series prediction models are used in that work. They also used an Internet-based infectious disease prediction model. Their mathematical models include the Logistic model, the Bertalanffy model [12] and the Gompertz model [12]. Model evaluation was done using the regression coefficient (R2). Fitting and analysis of SARS and COVID-19 was done for two categories: the number of confirmed cases and the number of deaths. Different prediction graphs were obtained for these categories. They concluded that the epidemic would presumably be over by the end of April, but their estimate was not very accurate [12]. F. Rustam et al. (2020) introduced future forecasting of COVID-19 using various machine learning models. They used four regression models, namely LR, LASSO, SVM and Exponential Smoothing (ES), and compared the results on the basis of the evaluation parameters used in their models. According to their results, the ES model is best for predicting infected and recovered cases, and the LR model best predicts the death rate [9].

S. Dutta et al., 2020 [20] introduced prediction models for confirmed COVID-19 cases. They used three models, namely an LSTM (Long Short-Term Memory) model, a GRU (Gated Recurrent Unit) model and a combined LSTM-GRU model [20], all based on deep neural networks. The combined LSTM-GRU based RNN model gives comparatively better results for the prediction of confirmed, released, negative and death cases on the data [20]. Similarly, Narinder et al., 2020 [19] observed that, because the COVID-19 outbreak has turned into a pandemic, the analysis of medical data is required to prepare society to combat this challenge. The four stages of COVID-19 are described in their paper. The regression is trained and cross-validated on real-time data using the number of confirmed, recovered and death cases. The results were displayed through graphs depicting worldwide COVID-19 epidemic analysis using machine learning and deep learning techniques [19].

Sohini and Sareeta (2020) [10] proposed a model in which they performed data visualization and then applied various machine learning forecasting techniques. In their paper, the growth of COVID-19 confirmed, death and recovered cases in Asian countries is compared with that of other major countries that have also been heavily infected. The various predictions are plotted on a bar chart. Gender distribution is shown through a chart which reveals that males are more likely to be diagnosed with COVID-19. Age-wise distribution is also shown through pie charts. They concluded that the Sigmoid model is the best among all the models they used [10].

Binti Hamzah et al. [8] in 2020 built CoronaTracker, an online platform that provides the latest and reliable news developments, as well as statistics and analysis on COVID-19. They performed real-time data queries and visualized the results on their website, and the queried data were then used for SEIR modelling. On the basis of daily observations, they used the SEIR model to forecast COVID-19 outbreaks inside and outside China. They also carried out an analysis of the queried news [8].

L. Wynants et al. (2020) reviewed the papers on prediction models related to the COVID-19 pandemic. They found that the proposed models are poorly reported and at high risk of bias, raising concern that their predictions could be unreliable when applied in daily practice [4]. R. Sujatha et al. (2020) used three models for their predictions and performed visualization for India. Using graph plotting, they try to work out the future trend of this epidemic. From the predicted values and the corresponding cases in the dataset, they concluded that the MLP method gives better prediction results than the LR and VAR methods [11].

L. Peng et al. (2020) used an SEIR model. Based on the public data of the National Health Commission of China from January 20th to February 16th, 2020, they reliably estimate key epidemic parameters and make predictions on the inflection point and possible ending time for 24 provinces in Mainland China and 16 counties in Hubei province [17]. Li Cuilian et al. (2020) proposed a model to evaluate the predictive value of Internet search data from online search engines and social media for the COVID-19 outbreak in China [16]. Vinay and Lei Zhang (2020) estimated the approximate ending point of this epidemic in Canada and in the world. They developed a forecasting model using deep learning techniques. The dataset required for this purpose was collected from a university and the Canadian health authority. A long short-term memory (LSTM) model was used to forecast COVID-19 and, based on the graphs produced by the LSTM, they predicted that the pandemic would end by June 2020. Their model achieved better performance than other forecasting models [15].

Dong Je et al. (2020) introduced a model that identifies the high-risk factors of coronavirus. They collected the clinical data of all the patients admitted to Fuyang Second People's Hospital in China between 20th January and 22nd February 2020. They divided around 208 patients into two categories: a stable group and a progressive group. Univariate and multivariate analysis identified the independent high-risk factors for COVID-19 progression. They concluded that the CALL score model can be an effective resource to combat this challenge [14]. Weston C. Roda et al. (2020), after comparing SEIR and SIR models, concluded that a complex model such as SEIR is not necessarily more effective than a simple model such as SIR. They used a Bayesian framework, Markov chain Monte Carlo algorithms and the Akaike Information Criterion (AIC). The parameters of the SIR model and of the SEIR model were presented in tables, and comparisons were made through graphs [13].

C. Sohrabi et al., 2020 [21] report that on 31st December 2019 a few cases of coronavirus were identified in China. The authors address the WHO global health emergency, the global reaction, reported UK cases and the British response, transmission, viral spread, prevention, diagnosis, treatment, forecasting and containment methods for this virus. They also compared the diagnostic criteria of the CDC with those of the WHO based on symptoms and travel, and described a retrospective on the response to COVID-19.

R. Khan et al., 2020 [43] investigate sentiment analysis techniques for the analysis of Twitter COVID-19 data. They find that pre-processing the data using regex and a trainer is an effective way of reducing the complexity of both the applied algorithm and the data, compared with applying the algorithm directly to the raw data. Using the trained model together with the classifier proved to be a better approach to classification, as it reduced the time taken and the size of the data frame, lowering the time complexity of the process. Regarding the analysis, the number of tweets shared by active users was consistently greater than for the other sentiments. This implies that, with respect to the pandemic, the largest number of people viewed the decisions made by the government or the local authorities positively. While the number of infected and deceased people kept increasing, it did not affect the mental strength of the population. For the 3-month analysis of the Indian subcontinent, the variation among positive, negative and neutral sentiments stayed constant as the number of cases increased day by day.

Gozes, Ophir, et al., 2020 [44], in an initial study that is currently being extended to a larger population, demonstrated that rapidly developed AI-based image analysis can achieve high accuracy in the detection of coronavirus as well as in the quantification and tracking of disease burden.

Shi, Feng, et al., 2020 [45] discussed the significant role of medical imaging techniques in the fight against the COVID-19 pandemic. The authors examine how AI provides safe, accurate and efficient imaging solutions in COVID-19 applications. X-ray and CT scan techniques are used to demonstrate the effectiveness of AI-empowered medical imaging. It is worth noting that imaging provides only partial information about patients. It is therefore important to combine imaging data with clinical manifestations and laboratory examination results to support the screening, detection and diagnosis of COVID-19. In this case, the authors believe AI will demonstrate its capability in fusing information from these multi-source data to perform accurate and efficient diagnosis, analysis and follow-up.

Yan, Carol H., et al., 2020 [46] found that, in ambulatory individuals with influenza-like symptoms, chemosensory dysfunction was strongly associated with COVID-19 infection and should be considered when screening symptoms. Most will recover chemosensory function within weeks, paralleling the resolution of other disease-related symptoms.

Sun, Yinxiaohe, et al, 2020, [47] discusses that clinical and laboratory data can be rapidly established for individuals who are exposed to COVID-19 and allow PCR testing and containment efforts to be made a priority. For prediction models, the primary laboratory test results were important.

Oliveiros, Barbara, et al., 2020 [48] discussed that the COVID-19 progression rate is projected to be slower in spring and summer. However, the two variables explain at most 18% of the variation, with the other 82% connected to other factors such as containment measures, general health policies, population density, transport and cultural aspects. Furthermore, the direct effect is small: if the temperature rises by 20 °C, for example, the average doubling time is predicted to increase by 1.8 days in the best-case scenario.

Chen, Xiaofeng, et al., 2020 [49] discussed that pneumonia patients with and without COVID-19 can be differentiated on the basis of CT imaging and clinical symptoms alone. A model consisting of semantic and clinical radiological features performs very well in diagnosing COVID-19.

Dryhurst, Sarah, et al. [50] reported that perceptions of COVID-19 risk around the world clearly show a consistent connection with various experiential and sociocultural factors across countries. At the same time, they note that cultural differences in risk perception must be addressed.

As we know, data science helps us to clarify and understand the data that have been accumulated so far. Data science helps one to simulate and visualize the patterns of how coronavirus spreads, including the number of patients reported daily with coronavirus or possibly infected cases, i.e. whether it is growing faster than before, and so on. Data analysis helps one to gain valuable insight into the data. Machine learning lets us create models that map the real world, take some data and predict the future. So, in our model, we first visualize the data, then forecast future data, and also try to find the best-fit regression model that could help us with future predictions.

In this paper we used worldwide coronavirus data from January 22, 2020, to July 12, 2021, assembled from online resources such as Kaggle, Weka 3.8.4 and Orange [11]. The datasets give the number of confirmed cases, recovered cases and death cases all over the world. The datasets are available in time-series format with date, month and year, so temporal components are not neglected. We divided our datasets into a training set (85%), on which our models were trained, and a testing set (15%), used to test the performance of our system.
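The 85%/15% split described above can be sketched as follows. This is only an illustration: the paper does not state whether the split was chronological or random, so the chronological cut used here is an assumption, and `chronological_split` and the toy counts are hypothetical.

```python
def chronological_split(records, train_fraction=0.85):
    """Split a time-ordered sequence into a leading training part
    and a trailing testing part (assumed split strategy)."""
    cut = int(round(len(records) * train_fraction))
    return records[:cut], records[cut:]


# Hypothetical cumulative confirmed counts, one value per day (not real data).
confirmed = [10 * (day + 1) for day in range(20)]

train, test = chronological_split(confirmed)
print(len(train), len(test))  # 17 3
```

Keeping the most recent days in the testing set means the models are scored on dates they have not seen, which matches the forecasting goal of the study.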

The experimental workflow of the study is shown in Fig. 1. As Fig. 1 shows, the complete work is divided into two sections. Section one is used for data visualization and section two for future data prediction. For data visualization, the progress bar, recovery rate and mortality rate are shown. In section two, linear, polynomial, ridge, ridge polynomial and support vector machine (SVM) models, the last with a polynomial kernel function, are used for future case prediction. Each model is evaluated using the MSE, MAE, RMSE and R2_SCORE parameters.

This work demonstrates the week-by-week progress of the different types of cases in the world, which include confirmed cases, recovered cases and death cases. From this it can be inferred that the rate of increase in confirmed cases is much higher than that of recovered and death cases. The death rate is low in comparison with the other two, whereas the recovery rate is moderate. We then show the daily increase in confirmed cases, recovered cases and death cases.

The curve in Fig. 2, a line graph of week number versus number of cases, shows the weekly increase of the different types of cases worldwide. According to our analysis, confirmed cases are increasing at a high rate and the recovery rate is average compared with the number of active cases. The death rate is also low compared with the number of active cases.

Mortality rate is a measure of the number of deaths in a specific population, scaled to the size of that population, per unit of time. Recovery rate is understood here as the number of recovered cases expressed as a percentage of the reported cases.
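A minimal sketch of how such rates could be computed from case counts is given below. It assumes both rates are expressed as a percentage of confirmed cases, which is a common convention for COVID-19 dashboards but is not confirmed by the paper; `rate_percent` and the counts are hypothetical.

```python
def rate_percent(part, confirmed):
    """Return `part` as a percentage of `confirmed` (0.0 if no cases yet)."""
    return 100.0 * part / confirmed if confirmed else 0.0


# Hypothetical counts for one country (not real data).
confirmed_cases = 1000
death_cases = 20
recovered_cases = 900

mortality_rate = rate_percent(death_cases, confirmed_cases)      # 2.0
recovery_rate = rate_percent(recovered_cases, confirmed_cases)   # 90.0
print(mortality_rate, recovery_rate)
```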

The monthly progress of different types of cases in world is shown in Fig. 3 . 

In this section, a comparative analysis is carried out according to the mortality and recovery rates of the top 15 countries. Fig. 4 shows the lists of the top 15 countries with the highest and the lowest mortality rates. According to this, Yemen has the highest mortality rate and Bhutan has the lowest. According to the statistical analysis done by the authors, the following key facts have been observed.

(a) Top 

Growth factor is a measure of how quickly the number of new cases is rising or falling, and the key thing to remember is that we need to keep it below one. Importantly, one thing that has changed since the last time the growth factor was above one is that the number of new cases each day is low. It is simply the factor by which a quantity multiplies itself over time. Fig. 6 shows the growth factor of different types of cases worldwide. Table 1 shows the doubling rate of coronavirus infected cases. Fig. 7 compares the confirmed, recovered and death cases of India, China, UK, Italy, US and the rest of the world. We can infer from the graph that the confirmed, recovered and death cases of the rest of the world are higher than those of the five countries.
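To make the growth factor and doubling time concrete, a small sketch follows. It assumes the growth factor is the ratio of one day's new cases to the previous day's new cases, and that the doubling time is derived from a constant daily ratio of the cumulative count; the paper does not give its exact formulas, and the function names and numbers are hypothetical.

```python
import math


def growth_factors(new_cases):
    """Ratio of each day's new cases to the previous day's new cases."""
    return [today / yesterday
            for yesterday, today in zip(new_cases, new_cases[1:])
            if yesterday > 0]


def doubling_time(daily_ratio):
    """Days needed for a cumulative count to double at a constant daily ratio > 1."""
    return math.log(2) / math.log(daily_ratio)


# Hypothetical daily new cases (not real data).
new_cases = [100, 120, 120, 90]
factors = growth_factors(new_cases)
print(factors)  # [1.2, 1.0, 0.75]
```

A factor above one means new cases are accelerating, exactly one means they are level, and below one means they are declining.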

(a) The blue line represents the confirmed, death and recovered cases of India. It is observed that confirmed cases and death cases in India increased exponentially after March 2021. Recovery rate was maximum in July 2021.

The graph shown in Fig. 8 presents the mortality rate and recovery rate comparison of five countries with the rest of the world. The mortality rate of the UK has increased over the period, whereas India has the lowest mortality rate. The recovery rate of mainland China is the highest and the UK has the lowest recovery rate.

(e) The purple line represents the mortality rate and recovery rate of the UK. It is observed that the mortality rate decreased after November 2020 and the recovery rate has been constant since mid-December 2020.

(f) The brown line represents the mortality rate and recovery rate of the rest of the world. It is observed that the mortality rate decreased after September 2020 and the recovery rate increased after September 2020.

In this work, supervised machine learning algorithms (regression models) are used for the prediction of future COVID-19 cases in terms of confirmed cases, recovered cases and death cases.

Linear regression is a linear approach to modelling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). It was the first type of regression analysis to be studied rigorously and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters, and because the statistical properties of the resulting estimators are easier to determine. Linear regression is one of the data mining techniques used for prediction tasks [22, 23, 24, 25]. In a problem with one predictor, this technique tries to find the best line to fit. The formula to calculate linear regression is given in Eq. 1.

$$\hat{y} = b_0 + b_1 x \qquad (1)$$

where $b_0$ is a constant, $b_1$ is the regression coefficient, $x$ is the independent variable and $\hat{y}$ is the predicted value of the dependent variable.
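As an illustration of Eq. 1, the sketch below estimates the constant b0 and the regression coefficient b1 with the ordinary least-squares closed form for a single predictor. The function name `fit_simple_linear` and the data are hypothetical; the paper itself relies on a library implementation.

```python
def fit_simple_linear(x, y):
    """Least-squares estimates (b0, b1) for y ≈ b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data lying exactly on y = 5 + 2x (not real case counts).
days = [0, 1, 2, 3, 4]
cases = [5, 7, 9, 11, 13]

b0, b1 = fit_simple_linear(days, cases)
print(b0, b1)  # 5.0 2.0
```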

In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable $x$ and the dependent variable $y$ is modelled as an $n$th degree polynomial in $x$. It fits a nonlinear relationship between the value of $x$ and the corresponding conditional mean of $y$, denoted $E(y \mid x)$. Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function $E(y \mid x)$ is linear in the unknown parameters that are estimated from the data [26, 27, 28]. For this reason, polynomial regression is considered a special case of multiple linear regression. It can be calculated according to Eq. 2.

$$\hat{y} = b_0 + b_1 x + b_2 x^2 + \dots + b_n x^n \qquad (2)$$
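The point that polynomial regression is linear in its unknown parameters can be sketched as follows: the powers of x are treated as separate input columns and the coefficients are found by ordinary linear least squares. The helper `fit_polynomial` and the data are hypothetical and use NumPy only for illustration.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Return coefficients [b_0, ..., b_degree] of the least-squares polynomial."""
    # Columns are x**0, x**1, ..., x**degree, so the problem is linear in b.
    design = np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coeffs


# Hypothetical data generated from y = 1 + 2x + 3x^2 (not real case counts).
x = np.arange(6)
y = 1 + 2 * x + 3 * x ** 2

coeffs = fit_polynomial(x, y, degree=2)
print(np.round(coeffs, 6))
```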

For many machine learning problems with a large number of features or a small number of observations, a linear model tends to overfit and variable selection is difficult. Models that use shrinkage, such as Lasso and Ridge, can improve prediction accuracy because they reduce the estimation variance while still providing an interpretable final model. Ridge and Lasso build on the linear model; their defining characteristic is regularization. The aim of these methods is to modify the loss function so that it depends not only on the sum of squared differences but also on the regression coefficients [29, 30, 31].

The main issue in developing such a model is the correct choice of the regularization parameter. Compared to linear regression, Ridge and Lasso models are more robust to outliers and to the spread of the data. Overall, their main purpose is to prevent overfitting. The main difference between Ridge regression and Lasso is the way they assign a penalty term to the coefficients.

Ridge regression is a method suited to multiple regression data that suffer from multicollinearity. It is one of the most fundamental regularization techniques, yet many people avoid it because of the intricate mathematics behind it [32, 33, 34]. With a general understanding of multiple regression, it is not difficult to follow the mathematics behind Ridge regression. The regression itself is the same; what makes regularization different is the way in which the model coefficients are determined. The equation for this technique is given in Eq. 3.

$$\min_{b} \; \sum_{i} (y_i - \hat{y}_i)^2 + \lambda \sum_{j} b_j^2 \qquad (3)$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value and $b_j$ are the regression coefficients. In other words, the coefficients are chosen to minimize (sum of squared residuals + λ · slope²), where λ · slope² is the penalty term.
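The effect of the penalty term can be sketched with the closed-form ridge solution, which adds λ to the diagonal of the normal equations. This assumes the usual squared (L2) ridge penalty and an unpenalized intercept handled by centring; `fit_ridge` and the data are hypothetical.

```python
import numpy as np


def fit_ridge(features, target, lam):
    """Closed-form ridge fit; returns (intercept, coefficients)."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(target, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring leaves the intercept unpenalized
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    return y_mean - x_mean @ coef, coef


# Hypothetical data on the line y = 5 + 2x (not real case counts).
X = np.arange(5).reshape(-1, 1)
y = 5 + 2 * X[:, 0]

_, coef_plain = fit_ridge(X, y, lam=0.0)    # no penalty: slope 2
_, coef_ridge = fit_ridge(X, y, lam=10.0)   # penalty shrinks the slope to 1
print(coef_plain, coef_ridge)
```

Applying the same fit to polynomial feature columns instead of the raw predictor gives a polynomial ridge regression.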

Support vector machines (SVMs) are powerful yet flexible supervised machine learning algorithms that are used for both classification and regression. SVMs have their own distinctive way of implementation compared with other machine learning algorithms. They have recently become very popular because of their ability to handle multiple continuous and categorical variables. An SVM model is basically a representation of different classes separated by a hyperplane in multidimensional space [35, 36]. The hyperplane is generated iteratively by the SVM so that the error is minimized. The goal of SVM is to divide the dataset into classes by finding a maximum marginal hyperplane (MMH).

The RBF kernel is also called the Gaussian kernel. Its feature space has an infinite number of dimensions, because the kernel can be expanded by a Taylor series. In the expression below, the γ parameter defines how much influence a single training example has. The larger it is, the closer other examples must be to be affected [37]. It is a general-purpose kernel, used when there is no prior knowledge about the data. Mathematically, it is represented as given in Eq. 4.

$$K(x, x') = \exp\left(-\gamma \, \|x - x'\|^2\right) \qquad (4)$$

where $\|x - x'\|^2$ denotes the squared Euclidean distance between the two feature vectors $x$ and $x'$, and $\gamma$ is a free parameter.
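A direct, minimal implementation of the RBF kernel of Eq. 4 is sketched below to show the role of γ; the function name `rbf_kernel` and the sample vectors are hypothetical.

```python
import math


def rbf_kernel(x, x_prime, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * squared_distance)


# Identical vectors give the maximum similarity of 1.0.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # 1.0

# A larger gamma makes the similarity decay faster with distance.
near = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
far = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=5.0)
print(near > far)  # True
```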

Polynomial kernel: The polynomial kernel looks not only at the given features of the input samples to determine their similarity [38], but also at combinations of these. With n original features and a polynomial of degree d, the polynomial kernel yields an expanded set of features. It is popular in image processing. Mathematically, it is represented as given in Eq. 5.

$$K(x, x') = (x \cdot x' + 1)^d \qquad (5)$$

Unlike other regression models, which try to minimize the error between the actual and the predicted value, support vector regression (SVR) tries to fit the best line within a threshold value a (the distance between the hyperplane and the boundary line). Consequently, the SVR model attempts to satisfy the following condition:

$$-a < y - (w x + b) < a \qquad (6)$$

where $w x + b$ is the fitted hyperplane, with weight $w$ and bias $b$. SVR uses the points within this boundary to predict the value.
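The two ideas above, the polynomial kernel and the SVR margin condition, can be sketched in a few lines. The kernel follows the form (x · x' + 1)^d, and the margin check tests whether a residual lies inside the threshold a; the function names and values are hypothetical, and this is not a full SVR solver.

```python
def polynomial_kernel(x, x_prime, degree):
    """Polynomial kernel (x . x' + 1) ** degree."""
    dot = sum(a * b for a, b in zip(x, x_prime))
    return (dot + 1) ** degree


def within_margin(actual, predicted, a):
    """True if the residual lies strictly inside the band of half-width a."""
    return -a < actual - predicted < a


print(polynomial_kernel([1, 2], [3, 4], degree=2))  # (3 + 8 + 1)^2 = 144
print(within_margin(actual=10.0, predicted=9.6, a=0.5))  # True
print(within_margin(actual=10.0, predicted=8.0, a=0.5))  # False
```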

RMSE: It is the standard deviation of the residuals. Residuals are a measure of how far the data points are from the regression line, and RMSE is a measure of how spread out these residuals are. In other words, it tells you how concentrated the data are around the line of best fit [40, 41]. Root mean square error is commonly used in climatology, forecasting and regression analysis to verify experimental results. The formula to calculate RMSE is given in Eq. 7.

$$RMSE = \sqrt{\frac{1}{k} \sum_{i=1}^{k} (y_i - \hat{y}_i)^2} \qquad (7)$$

where $k$ is the number of observations, $y_i$ is the observed value and $\hat{y}_i$ is the predicted value.

R2_SCORE: It is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. It usually lies between 0 and 100%: 0% indicates that the model explains none of the variability of the response data around its mean, and 100% indicates that the model explains all the variability of the response data around its mean. The mathematical expression of R2_SCORE is given in Eq. 8.

$$R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2} \qquad (8)$$

where $y_i$ is the actual cumulative confirmed cases, $\hat{y}_i$ is the predicted cumulative confirmed cases and $\bar{y}$ is the average of the actual cumulative confirmed cases.

Mean Absolute Error: It measures the average magnitude of the errors in a set of predictions, without considering their direction [39]. It is the average over the test sample of the absolute differences between prediction and actual observation, where every individual difference has equal weight. The value of MAE is calculated according to Eq. 9.

$$MAE = \frac{1}{p} \sum_{i=1}^{p} |r_i - r| \qquad (9)$$

where $p$ represents the number of errors and $|r_i - r|$ denotes the absolute errors.

Mean Squared Error: The MSE of an estimator is the average squared difference between the estimated values and the actual value. The MSE is a measure of the quality of an estimator. It is always non-negative, and values closer to zero are better. MSE is calculated according to Eq. 10.

$$MSE = \frac{1}{z} \sum_{i=1}^{z} (y_i - \hat{y}_i)^2 \qquad (10)$$

where $z$ is the number of data points, $y_i$ represents the observed values and $\hat{y}_i$ represents the predicted values.
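The four evaluation parameters described above can be written out directly from their definitions. The sketch below uses plain Python with hypothetical function names and toy values; in practice a library implementation would normally be used.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean of the absolute differences between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted values (not real case counts).
actual = [3, 5, 7, 9]
predicted = [2, 5, 8, 9]

print(mse(actual, predicted), mae(actual, predicted))  # 0.5 0.5
```

Note that R2 computed on a held-out testing set can fall below zero when a model predicts worse than the mean of the observed values.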

To make our data compatible with the sklearn format, we created a new column called "Days since", which tracks the number of days since the initial date. We used four evaluation parameters, namely Mean Square Error (MSE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and r2_score, to identify the regression model that best predicts future confirmed cases, recovered cases and death cases. A brief description of the experiments carried out for this work is given in Table 2. Experiment carried out for death cases using five different regression models. This study attempts to develop a system for the future forecasting of the number of cases affected by COVID-19 using machine learning methods. The dataset used for the study contains information about the daily reports of the number of newly infected cases, the number of recoveries and the number of deaths due to COVID-19 worldwide. We used five machine learning models, LR, polynomial, Ridge, polynomial ridge and SVM, to predict the number of newly infected cases, the number of deaths and the number of recoveries. We also attempt to identify the best model for forecasting infected cases, recovered cases and death cases.
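A "Days since" column of the kind described above could be derived as in the following sketch, which converts calendar dates into an integer predictor. The helper `days_since` and the sample dates are hypothetical; the paper does not show its own preprocessing code.

```python
from datetime import date


def days_since(dates):
    """Number of days elapsed between each date and the earliest date."""
    first = min(dates)
    return [(d - first).days for d in dates]


# Hypothetical report dates, starting at the first date covered by the dataset.
report_dates = [date(2020, 1, 22), date(2020, 1, 23), date(2020, 2, 1)]

feature = days_since(report_dates)
print(feature)  # [0, 1, 10]
```

A numeric day index of this kind can be passed to a regression model as its single input feature, whereas raw date values cannot.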

Experiment 1: The study forecasts infected cases. According to the results, polynomial ridge performs best among all the models; SVM also performs well, and polynomial regression achieves an average score. In comparison, LR and Ridge perform worst in this case and achieve nearly the same R2_score. However, comparing the current confirmed-case statistics with our models' forecasts, the polynomial ridge prediction follows trends that are close to the real situation. Results of the extensive experiments for active cases are reported in Table 3.

(c) Purple line represents the prediction of confirmed cases using polynomial regression.

(d) Green line represents the prediction of confirmed cases using polynomial ridge regression.

(e) Blue line represents the prediction of confirmed cases using SVM.

The study forecasts recovered cases. According to the results, polynomial ridge performs best among all the models; polynomial regression also performs well, and SVM achieves an average score. In comparison, LR and Ridge perform worst in this case and achieve nearly the same R2_score.

(e) Blue line represents the SVM prediction of recovered cases on the basis of the testing set.

(e) Blue line represents the SVM prediction of death cases.

Linear and Ridge Regression models' results are very similar to each other, and they yield very similar r2_score values in all three cases. A comparison with existing studies [9, 10, 11] is performed, and the major findings of the study are given as:

a. In the study, visualization of active cases and closed cases (recovered and death) all over the world has been reported.

b. The study also visualizes the weekly and daily progress of the different types of cases in the world.

c. The work also highlights the country-wise confirmed cases, death cases, mortality rate and recovery rate.

e. The study has also visualized the growth factor of the different types of cases worldwide.

f. An analysis of India and a comparison with other countries has been done, to find out how much time other countries took to reach a certain number of confirmed cases in comparison to India.

g. Prediction of the time taken for the number of confirmed cases to double has also been done.

h. Future prediction using five regression models, with analysis on the basis of Mean Square Error and Mean Absolute Error, has been done.

i. From the comparison it can be inferred that none of the compared studies covers as many aspects as the present work.
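The growth factor and doubling time mentioned in the findings above can be illustrated with a short sketch. The definitions used here are assumptions, since the text does not state the exact formulas: the growth factor is taken as the ratio of each day's cumulative count to the previous day's, and the doubling time is derived from the average exponential growth rate over the series. The function names and toy series are hypothetical.

```python
import math


def daily_growth_factors(cumulative):
    """Ratio of each day's cumulative count to the previous day's count."""
    return [today / yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


def doubling_time(cumulative):
    """Days needed for the count to double, assuming constant exponential growth.

    The average log-growth per day is estimated from the first and last values
    of the series; the doubling time is then ln(2) divided by that rate.
    """
    days = len(cumulative) - 1
    rate = math.log(cumulative[-1] / cumulative[0]) / days
    return math.log(2) / rate


# Toy series (hypothetical): 10% growth per day.
series = [100.0, 110.0, 121.0]

print("growth factors:", daily_growth_factors(series))
print("doubling time (days):", doubling_time(series))
```

A growth factor persistently above 1 indicates that the cumulative count is still rising, and a shorter doubling time corresponds to faster spread. This simple estimate assumes a strictly positive, increasing series.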

This study presented current trends of COVID-19 incidence from 22 February 2020 to 2 September 2020, as visualized in our work. The rapidly increasing number of new COVID-19 cases each day worldwide has placed an overwhelming burden on medical resources in countries with large outbreaks. Therefore, prediction of future confirmed cases became necessary. The amount of available data is huge, and gathering knowledge and extracting an interesting pattern from the accumulated data is a difficult task. The publicly available data on confirmed, recovered and death cases across India over this period helps in anticipating and planning for the near future. Our results imply that there is a certain universality in the time evolution of COVID-19.

This suggests that a country that becomes the site of an epidemic surge can be regarded, at least to a first approximation, as a well-mixed chemical reactor, in which different populations interact according to mass-action-like rules with little dependence on geographical variations.

Based on a detailed analysis of the worldwide data, we estimate several key parameters for COVID-19, such as the latent time, the quarantine time and the effective reproduction number, in a relatively reliable way, and predict the inflection point, the possible ending time and the final total number of infected cases worldwide. We performed prediction using five regression models so that we can conclude which model is best and, on the basis of that model, estimate the rate of increase in the number of infected, recovered and death cases in the future. The experiments have been completed with best MSE value as 737383324668728. 4 

Forecasting the novel coronavirus COVID-19

Optimization Method for Forecasting Confirmed Cases of COVID-19 in China

Analysis and forecast of COVID-19 spreading in China, Italy and France

Prediction models for diagnosis and prognosis in Covid-19

SEIR and Regression Model based COVID-19 outbreak predictions in India

Forecasting COVID-19

Real-time tracking of self-reported symptoms to predict potential COVID-19

CoronaTracker: World-wide COVID-19 Outbreak Data Analysis and Prediction

COVID-19 Future Forecasting Using Supervised Machine Learning Models

Covid-19 Pandemic Data Analysis and Forecasting using Machine Learning Algorithms

A machine learning forecasting model for COVID-19 pandemic in India

Prediction and analysis of Coronavirus Disease

Why is it difficult to accurately predict the COVID-19 epidemic?

Prediction for Progression Risk in Patients With COVID-19 Pneumonia: The CALL Score

Time series forecasting of COVID-19 transmission in Canada using LSTM networks

Retrospective analysis of the possibility of predicting the COVID-19 outbreak from Internet searches and social media data

Epidemic analysis of COVID-19 in China by dynamical modeling

Predicting the impacts of epidemic outbreaks on global supply chains: A simulation-based analysis on the coronavirus outbreak (COVID-19/SARS-CoV-2) case

COVID-19 Epidemic Analysis using Machine Learning and Deep Learning Algorithms

Machine Learning Approach for Confirmation of COVID-19 Cases: Positive, Negative, Death and Release

World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19)

A tutorial on testing, visualizing, and probing an interaction involving a multicategorical variable in linear regression analysis

Prediction by linear regression on a quantum computer

Learning linear regression models over factorized joins

Current status linear regression

Polynomial regression as an alternative to neural nets

Polynomial Regression and Measurement Error: Implications for Information Systems Research

Polynomial regression-based model-free predictive control for nonlinear systems

Ridge and lasso regression models for cross-version defect prediction

Comparing Ridge and LASSO estimators for data analysis

The logistic lasso and ridge regression in predicting corporate failure

An iterative, sketching-based framework for ridge regression

Some ridge regression estimators and their performances

The logistic lasso and ridge regression in predicting corporate failure

Wind speed forecasting for wind farms: A method based on support vector regression

Global sensitivity analysis using support vector regression

Classification of tweets data based on polarity using improved RBF kernel of SVM

Fast learning with polynomial kernels

Mean absolute percentage error for regression models

Root mean square error (RMSE) or mean absolute error (MAE)?-Arguments against avoiding RMSE in the literature

Root mean square error (RMSE) or mean absolute error (MAE)?

Fuzzy sigmoid kernel for support vector classifiers

Social media analysis with AI: sentiment analysis techniques for the analysis of Twitter COVID-19 data

Rapid ai development cycle for the coronavirus (covid-19) pandemic: Initial results for automated detection & patient monitoring using deep learning ct image analysis

Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19

Association of chemosensory dysfunction and Covid-19 in patients presenting with influenza-like symptoms

Epidemiological and clinical predictors of COVID-19

Role of temperature and humidity in the modulation of the doubling time of COVID-19 cases

A diagnostic model for coronavirus disease 2019 (COVID-19) based on radiological semantic and clinical features: a multi-center study

Risk perceptions of COVID-19 around the world

Dr. Ayoub Khan (Senior Member, IEEE) received a Ph. D (Electrical Engg.) from Jamia Millia Islamia, New Delhi, India and a Master of Technology (Computer Science and Engineering) from Guru Gobind Singh Indraprastha, New Delhi, India. He worked with many leading organizations like C-DAC (Ministry of IT and Communications), Noida, Sharda University. Presently, he is an Associate Professor at the University of Bisha, Saudi Arabia with interests in the Internet of Things, RFID, wireless sensor networks, ad hoc networks, smart cities, industrial IoT, signal processing, NFC, routing in network-on-chip, real time and embedded systems. He has more than 15 years of experience in his research areas. He has published many research papers and books in reputed journals and international conferences. He contributes to the research community by undertaking various volunteer activities in the capacity of editor for many journals and as a conference chair. 

Conflicts of Interest: "The authors declare that they have no conflicts of interest to report regarding the present study."