key: cord-0975511-6puhtums
authors: nan
title: A Novel Parametric Model for the Prediction and Analysis of the COVID-19 Casualties
date: 2020-10-22
journal: IEEE Access
DOI: 10.1109/access.2020.3033146
sha: de88b81ec39cc308269714c8e2c2f4e8586816cc
doc_id: 975511
cord_uid: 6puhtums

The coronavirus disease (COVID-19) outbreak has affected billions of people; millions of them have been infected and thousands of them have lost their lives. In addition, to constrain the spread of the virus, economies have been shut down, and curfews and restrictions have interrupted social life. Currently, the key question is the future impact of the virus on people. Parametric modelling and analysis of pandemic viruses can provide crucial information about the character and the future behaviour of the viruses. This paper initially reviews and analyses the Susceptible-Infected-Recovered (SIR) model, which is extensively considered for the estimation of the COVID-19 casualties. Then, this paper introduces a novel comprehensive higher-order, multi-dimensional, strongly coupled, and parametric Suspicious-Infected-Death (SpID) model. The mathematical analysis performed by using the casualties in Turkey shows that the COVID-19 dynamics are inside the slightly oscillatory, stable (bounded) region, although some of the dynamics are close to the instability (unbounded) region. However, analysis with the data just after lifting the restrictions reveals that the dynamics of the COVID-19 are moderately unstable, which would blow up if no actions are taken. The developed model estimates that the number of the infected and death individuals will converge to zero around 300 days, whereas the number of the suspicious individuals will require about a thousand days to be minimized under the current conditions.
Even though the developed model is used to estimate the casualties in Turkey, it can be easily trained with the data from other countries and used for the estimation of the corresponding COVID-19 casualties.

The associate editor coordinating the review of this manuscript and approving it for publication was Derek Abbott.

Coronavirus disease (COVID-19) is described as a contagious respiratory disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) [1]. It was first noticed in Wuhan, China in December 2019, and then spread rapidly all over the world [2]. The World Health Organization (WHO) declared the COVID-19 outbreak a pandemic on March 11, 2020 [3]. Coronaviruses (CoVs) are classified as alpha-, beta-, gamma- and delta-coronaviruses [4]. Bats are the source of alpha- and beta-type coronaviruses, while birds and pigs are the source of gamma- and delta-type coronaviruses. Though alpha-type coronaviruses have mild symptomatic effects, beta-type coronaviruses are more severe [5] since they result in serious problems, especially in the respiratory system [6], [7]. Seven human coronaviruses have been detected to date [8]. 229E (1966) and NL63 (2004) are alpha-type, while OC43 (1967), HKU1 (2005), SARS-CoV (Severe Acute Respiratory Syndrome, 2002) and MERS-CoV (Middle East Respiratory Syndrome, 2012) are beta-type [9]. The two zoonotic viruses, SARS and MERS, led to serious diseases which caused a large number of deaths, and they have had the most catastrophic impact among all the known coronaviruses in the world [5]. SARS was first seen in southern China and spread to 29 countries in less than a year. More than 8000 people were infected with the virus and 774 deaths were reported between November 2002 and July 2003 [10]. MERS was identified in 2012 in Saudi Arabia with the death of a 60-year-old patient and affected around 2500 people in 27 countries, of whom 848 lost their lives [11].
As in the case of SARS-CoV and MERS-CoV, the COVID-19 is also thought to be transmitted from bats to humans. The mortality rate of the COVID-19 outbreak is larger than that of the SARS virus, and its transmission rate is also much higher than those of SARS and MERS [12]. The COVID-19 is transmitted from human to human through droplets spread by the coughs or sneezes of people with the disease. The virus may have different effects on infected people; some people show mild symptoms and recover without hospitalization. The most common symptoms of the COVID-19 are fever, dry cough, and tiredness. Difficulty breathing, chest pain, and loss of speech are among the more severe symptoms. Since there are no drugs or vaccines that have been proven to protect people from the COVID-19, it is still uncontrollable [13].

During pandemic periods, international organizations such as the WHO and the public authorities require comprehensive and accurate short-term and long-term estimators to identify the most appropriate strategies and take the necessary measures. These estimators, known as models, provide short-term and long-term forecasts and are therefore of great importance. Modelling the pandemic thus plays a significant role in overcoming the detrimental effects of pandemic viruses in the presence of uncertainties. It is possible to use mathematical and statistical methods to model the pandemic, analyse its characteristics and evaluate its control mechanisms [14], [15]. Modelling the pandemic makes it possible to examine the dynamics of infectious diseases in detail and to estimate the infection parameters. Additionally, it provides insights into the effects of the interventions (closing schools, quarantining infected people, social distancing, etc.) used to control the outbreak. Modelling approaches can be classified as parametric and non-parametric.
In the non-parametric approaches, machine learning methods such as Neural Networks (NN) and Support Vector Machines (SVM) are considered without specifying the parameters and the data spaces [16]. Thus, it is not possible to know where the real data are mapped in the imaginary solution space. In addition, the estimated solution can correspond to a local region instead of the global one, so that the estimates are only valid in a small region. Since these are generally iterative approaches, they are likely to converge somewhere in the parameter space which might not be the optimum, especially in the stochastic cases. Moreover, even though statistical analysis approaches are available for non-parametric modelling, they are usually not rigorous as the exact model parameters are unavailable. Finally, the non-parametric models face a bias-variance trade-off and they require testing and validation data together with the training data [17].

The parametric modelling approaches, in contrast, necessitate accurate insights about the real systems (i.e. orders, zeros, coupling, and forcing terms) behind the available data. Therefore, they require initial observations and analysis of the source data. However, once the model structure is constructed, performing a parametric modelling approach is straightforward. Linear and non-linear models can be obtained easily, and extensive analyses of them can be achieved by using well-known mathematical tools such as the roots, eigenvalues, and imaginary components. Since batch type optimization approaches are available together with the iterative ones for parametric modelling, it is possible to reach the terminal parametric solution in one step. More importantly, the detrimental impacts of the stochasticity in the data can be eliminated as the batch type optimizations are able to ignore the random variables.
Lastly, the parametric models can reveal key knowledge about the strong and weak sides of the real systems, such as failing treatments, so that policies can be developed to control their behaviour as desired under uncertainties [18].

Recently, a number of parametric models were proposed for the estimation of the COVID-19 casualties. Threatened-Healed-Extinct (SIDARTHE) model to analyse the casualties in Italy [23]. Chang et al. focused on the known pandemic dynamics to analyse the casualties in Australia [24]. Even though these models provide some insights about the COVID-19, their parameters are not optimized, since they are mostly based on known parameters such as the infectious rate, cure rate, and mortality rate. Thus, it is not possible to know whether the individual parts of a multidimensional model and its other parameters are covered and constructed properly. In addition, a number of non-parametric modelling approaches are available. Chinazzi et al. considered a global metapopulation disease transmission model to reveal the effects of the travel limitations enforced in the city of Wuhan, China, on the spread of the virus [25]. Lauer et al. considered a non-parametric statistical approach to analyse the median incubation and symptoms development periods from 50 provinces outside Wuhan and Hubei province of China [26]. However, all these models have used simple statistical approaches or synchronized parameters which are highly likely to fail when the internal dynamics of the virus or the external uncertainties vary.

Based on these gaps in the literature, the key contributions of this paper can be summarized as follows:
1) This paper develops a Suspicious-Infected-Death (SpID) model, which has utterly unknown dynamics.
2) The developed SpID model is highly coupled since the suspicious, infected, and death casualties are strongly dependent on each other.
3) Each sub-model of the developed SpID model has 2nd-order internal dynamics to represent the peaks and fluctuations in the COVID-19 casualties.
4) To learn the unknown parameters of the SpID model, the exact bases corresponding to the parameter space of the model are constructed and the unknown parameters are learnt by performing a batch type Least Squares (LS) estimator.
5) The model with the determined parameters has been extensively analysed by utilizing mathematical tools.
6) Predicted future COVID-19 casualties for Turkey have been provided by using the developed model.

VOLUME 8, 2020

In the rest of the paper, Section II reviews the SIR model, Section III introduces the proposed SpID model, Section IV formulates the LS based parameter learning approach, Section V analyses the COVID-19 casualties in Turkey, Section VI provides the key insights of the SpID model, Section VII presents the predicted future casualties for Turkey and finally, Section VIII summarizes the work.

This section reviews the SIR model adopted for the estimation of the COVID-19 casualties. The insights gained in this section greatly contribute to the construction of the comprehensive new mathematical SpID model presented in Section III. The SIR model is expressed with three unforced (homogeneous), time-invariant, slightly coupled, first-order ordinary differential equations (ODEs) as in (1), where
• S(t) represents the Susceptible (S) individuals who may be infected and have a lack of immunity,
• I(t) represents the Infected (I) individuals who are exposed and become infected after contracting the disease,
• R(t) represents the Recovered (R) individuals who have gained immunity to the disease and are not infectious,
• β represents the transmission rate,
• γ represents the recovery (removal) rate.
Next section discusses the properties of the SIR model in terms of covering the dominant COVID-19 dynamics.
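The behaviour of the SIR model reviewed above can be illustrated numerically. The sketch below assumes the standard normalized form of the model (dS/dt = −βSI, dI/dt = βSI − γI, dR/dt = γI, with S, I and R as population fractions) and integrates it with a simple forward-Euler scheme. The function name `simulate_sir` and all parameter values are hypothetical and chosen only for illustration; they are not fitted to any data.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the normalized SIR equations with forward Euler.

    Assumed (standard) form, with S, I, R as population fractions:
        dS/dt = -beta * S * I
        dI/dt =  beta * S * I - gamma * I
        dR/dt =  gamma * I
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = s0, i0, r0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for n in range(1, steps + 1):
        new_infections = beta * s * i * dt   # flow from S to I
        new_recoveries = gamma * i * dt      # flow from I to R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((n * dt, s, i, r))
    return history


# Illustrative (hypothetical) parameter values, not fitted to any data.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=0.999, i0=0.001, r0=0.0, days=160)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
print(f"peak infected fraction {peak_infected:.3f} at day {peak_time:.1f}")
```

The S(t)I(t) product in the first two updates is the only non-linear term, and each compartment needs only its current value to be advanced, which reflects the first-order structure of the model.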
We can summarize the key properties of the SIR model as follows:
1) Its S(t) and I(t) sub-models are non-linear due to the S(t)I(t) multiplication.
2) Each ODE (sub-model) is first order.
3) Its I(t) sub-model has linear coupling with R(t) through the γ parameter.
4) It is a continuous model due to the time derivatives.
5) It is a deterministic model since it does not cover any uncertainties.
Next section constructs a new model called SpID.

The SpID model does not contain the number of the recovered people R as in the SIR model, because the optimization algorithms mainly focus on quantities to be minimized, such as the number of the suspicious, infected, and death people, rather than the number of the recovered people, which requires maximization. In addition, the proposed SpID model does not explicitly cover parameters such as β and γ; instead, it has unknown parameters which the optimization algorithms determine implicitly. To provide consistency between the constructed model and the real system, the casualties in Turkey are used as the reference. Even though the magnitudes of the casualties differ worldwide, their overall character, such as peaks, increments, and decays, is similar. Thus, the proposed model can be easily adapted to the cases in different countries. The proposed model considers the number of the suspicious casualties Sp(t) rather than the number of the susceptible casualties S(t) as in the SIR model. This is because
• The developed SpID model aims at modelling the number of the suspicious casualties Sp(t), which directly feeds the number of the infected and death casualties.
• The number of the suspicious casualties Sp(t) covers the people who have been tested and/or quarantined based on suspicion of being infected, for which the corresponding data are released daily by the state authorities.

To develop a model for the suspicious casualties Sp(t), three steps are followed.
Step 1: Consider the internal dynamics of the number of the suspicious people.
As can be seen from Fig. 2, the number of the suspicious people has two moderate peaks (overshoots), which imply that the model is almost overdamped (not exactly damped). Thus, the system can be represented as a 2nd-order linear system as in (2), where a_1 and a_0 are the unknown parameters which will be determined in Section IV.
Step 2: The number of the infected people I has an important role in the number of the suspicious people, since the infected people are infectious and continue spreading the virus until they are completely isolated. Therefore, the suspicious model should be coupled with the number of the infected people as in (3), where b_3 is the unknown parameter that scales the impact of the infected people on the number of the suspicious people. Fitting the suspicious data of Turkey shows that the suspicious model (3) reflects the general character of the real system. As can be seen from Fig. 1, except for the awareness (transient) period and the lockdowns, the constructed model (6) carries the general properties of the pandemic.
Step 3: It is important to note that the model estimate has larger frequencies than the real data. This is because the real data are discrete (collected as daily samples), whereas the constructed model is continuous. Hence, the continuous model (3) is converted into its discrete form (4). The parameters a_1, a_0 and b_3 keep the same names in the continuous and discrete models, since they are simply unknown parameters rather than specifically defined quantities.

To develop a model for the infected casualties, four steps are followed.
Step 1: Consider the internal dynamics of the number of the infected people shown in Fig. 3. It has a large peak (overshoot); hence, it is underdamped. Therefore, it is at least 2nd order and is represented as in (5), where b_1 and b_0 are the unknown parameters of the infected model.
Step 2: The number of the suspicious people affects the number of the infected people.
Hence, the model (5) becomes (6), where a_3 is the unknown parameter scaling the impact of the number of the suspicious people on the number of the infected people.
Step 3: The number of the deaths has a role in the number of the infected people (i.e. an increased number of deaths reduces the number of the infected people). Thus, the model (6) becomes (7), where d_3 is the unknown parameter scaling the impact of the number of the deaths on the number of the infected people.
Step 4: Similarly, the continuous-time model (7) is converted into its discrete form (8). The infected model (8) is 2nd order and highly coupled. Next section presents the model of the deaths.

To develop a model for the death casualties, three steps are followed.
Step 1: The number of deaths in Fig. 4 has a large peak (overshoot); hence, the system is slightly damped with at least 2nd-order dynamics, which can be represented as in (9), where d_1 and d_0 are the unknown parameters.
Step 2: Since the number of the infected people directly affects the number of the deaths, the model (9) can be improved as in (10), where b_4 is the scaling factor of the number of the infected people on the number of the deaths.
Step 3: The continuous model (10) is converted into its discrete form (11). Next section presents the LS based optimization approach to determine the unknown parameters of the proposed SpID model.

This section formulates the bases and the unknown parameter vectors of the SpID model together with the labelled output. This section also provides the batch type LS based estimation approach to learn the unknown parameters offline. To perform the LS based optimization, a basis should initially be constructed for each part of the SpID model. For the basis of the suspicious model φ_Sp, consider the right-hand side of the discrete model (4), which gives (12), where N is the length of the data.
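The three discrete sub-models and the suspicious basis described above can be collected in one place. This is a sketch only: it assumes that each sub-model regresses the value at step k+2 on its own two previous values and on the coupled casualties at step k, which is the pattern suggested by terms such as 1.55S_{k+1} − 0.55S_k and 0.10I_k quoted for model (22). Signs are absorbed into the unknown parameters, and the row ordering and index range of the basis are assumptions.

```latex
% Assumed discrete forms of the suspicious, infected and death sub-models
\begin{align*}
Sp_{k+2} &= a_1\,Sp_{k+1} + a_0\,Sp_k + b_3\,I_k, \\
I_{k+2}  &= b_1\,I_{k+1} + b_0\,I_k + a_3\,Sp_k + d_3\,D_k, \\
D_{k+2}  &= d_1\,D_{k+1} + d_0\,D_k + b_4\,I_k, \qquad k = 1,\dots,N-2 .
\end{align*}

% Assumed layout of the suspicious basis: one row per k, one column per parameter
\[
\varphi_{Sp} =
\begin{bmatrix}
Sp_{2}   & Sp_{1}   & I_{1} \\
Sp_{3}   & Sp_{2}   & I_{2} \\
\vdots   & \vdots   & \vdots \\
Sp_{N-1} & Sp_{N-2} & I_{N-2}
\end{bmatrix},
\qquad
w_{Sp} = \begin{bmatrix} a_1 \\ a_0 \\ b_3 \end{bmatrix}.
\]
```

Under this reading, every row of the basis holds the regressors on the right-hand side of the suspicious sub-model for one value of k, so the product of the basis and the parameter vector reproduces the modelled Sp_{k+2} for all k at once.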
Similarly, to construct the basis for the infected model φ_I, consider the right-hand side of the discrete model (8), which gives (13). Lastly, take into account the right-hand side of the discrete model (11) to construct the basis for the deaths φ_D as in (14). These bases contain information about the past casualties of the COVID-19 and will be used for the formulation of the estimated and parametrized casualties. The estimated model consists of the unknown parameter vectors defined in (15), where w_Sp, w_I, w_D are the unknown parameter vectors of the suspicious, infected and death models, respectively. The estimated individual models are given in (16), where ŷ_Sp, ŷ_I, ŷ_D are the estimated outputs or future casualties for the suspicious, infected and death sub-models. To perform the LS optimization, the next step is to label the real outputs, which are presented next.

To construct the real outputs, consider the left-hand sides of the discrete models (4), (8) and (11). The real (non-parametrized) outputs are given in (17), where y_Sp, y_I, y_D are the real outputs. Finally, next section formulates the LS.

Consider the real outputs (17) and the estimated outputs (16) by reducing the indices of the parameters and variables. The error between them provides a tool for the estimation of the unknown parameter vector (15). The error vector e is
e = y − ŷ (18)
where y = [y_Sp y_I y_D]^T and ŷ = [ŷ_Sp ŷ_I ŷ_D]^T. To ensure positive definiteness in the estimates, square the error e in (18) and expand it as in (19). To determine the unknown parameters w which minimize the squared error (19), take the gradient of (19) with respect to w as in (20). The unknown parameter vector w is obtained by setting the gradient (20) to zero, which gives (21). This formulation of the unknown parameter vector (21) can now be used to analyse the developed model in Section VI and to predict the future casualties of the COVID-19 in Section VII.

In this sub-section, we provide a simple pseudo-code to apply the SpID model to the casualties of other countries.
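The batch LS procedure formulated above can also be sketched as runnable Python. The sketch assumes that each discrete sub-model regresses the value at step k+2 on its own two previous values and on the coupled casualties at step k; this structure, the function names (`simulate`, `build_bases`, `fit_spid`) and the coefficient values are all hypothetical. Minimizing the squared error of a linear-in-parameters model has the familiar closed form w = (ΦᵀΦ)⁻¹Φᵀy; `numpy.linalg.lstsq` is used here because it solves the same minimization without forming the inverse explicitly. The demonstration uses noise-free synthetic data only.

```python
import numpy as np


def simulate(w_sp, w_i, w_d, sp0, i0, d0, steps):
    """Generate synthetic series from the assumed discrete SpID recurrences."""
    sp, inf, dead = list(sp0), list(i0), list(d0)
    for k in range(steps):
        sp.append(w_sp[0] * sp[k + 1] + w_sp[1] * sp[k] + w_sp[2] * inf[k])
        inf.append(w_i[0] * inf[k + 1] + w_i[1] * inf[k]
                   + w_i[2] * sp[k] + w_i[3] * dead[k])
        dead.append(w_d[0] * dead[k + 1] + w_d[1] * dead[k] + w_d[2] * inf[k])
    return np.array(sp), np.array(inf), np.array(dead)


def build_bases(sp, inf, dead):
    """Step 1 and 2: stack the regressors (bases) and the real outputs."""
    phi_sp = np.column_stack([sp[1:-1], sp[:-2], inf[:-2]])
    phi_i = np.column_stack([inf[1:-1], inf[:-2], sp[:-2], dead[:-2]])
    phi_d = np.column_stack([dead[1:-1], dead[:-2], inf[:-2]])
    targets = (sp[2:], inf[2:], dead[2:])
    return (phi_sp, phi_i, phi_d), targets


def fit_spid(sp, inf, dead):
    """Step 3 and 4: batch LS estimate and estimated output per sub-model."""
    bases, targets = build_bases(sp, inf, dead)
    weights = [np.linalg.lstsq(phi, y, rcond=None)[0]
               for phi, y in zip(bases, targets)]
    estimates = [phi @ w for phi, w in zip(bases, weights)]
    return weights, estimates, targets


# Hypothetical coefficients, used only to create noise-free synthetic data.
true_w = ([1.2, -0.4, 0.1], [1.0, -0.5, 0.01, -0.05], [0.5, 0.3, 0.02])
sp, inf, dead = simulate(*true_w, sp0=[5.0, 6.0], i0=[2.0, 3.0],
                         d0=[1.0, 1.5], steps=30)
weights, estimates, targets = fit_spid(sp, inf, dead)
for name, w in zip(("Sp", "I", "D"), weights):
    print(name, np.round(w, 4))
```

With real daily reports in place of the synthetic arrays, the residuals between `estimates` and `targets` would no longer vanish, and their mean magnitude would play the role of the estimation error.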
Input: Reported suspicious (Sp), infected (I) and death (D) casualties
Output: Estimated models ŷ_Sp, ŷ_I, and ŷ_D
1. Construct the bases φ_Sp, φ_I and φ_D given by Equations (12), (13) and (14).
2. Construct the output vectors y_Sp, y_I, and y_D given by Equation (17).
3. Determine the unknown parameters of each sub-model by using the LS optimizer in Equation (21).
4. Obtain the estimated outputs ŷ_Sp, ŷ_I, and ŷ_D by using Equation (16).
Next section presents the analysis of the COVID-19 data.

This part of the paper provides a brief presentation and analysis of the COVID-19 casualties in Turkey. This data is used for determining the unknown parameters of the model in Section IV and is also used for the analysis of the model and the predicted future casualties in Sections VI and VII. Fig. 2 shows the daily suspicious casualties (tested due to the appearance of symptoms) reported by the Health Ministry of Turkey. As can be seen, initially no suspicious casualties were reported even though deaths were reported. The number of the suspicious casualties increased quite sharply for about 40 days and then, mostly due to the curfews and lockdowns imposed for about 30 days, reduced moderately. However, it continued climbing after the restrictions were lifted. Fig. 3 shows the daily infected casualties reported by the Health Ministry of Turkey. It is clear that the number of the infected people sharply reaches its peak around 30 days after 12 March 2020. The number of the infected casualties reduces from 5000 to under 1000 after restrictions are imposed and social awareness of the virus is raised. Nevertheless, the number of the infected people slightly increases after the restrictions are released. However, it is noticeable that despite the large increase in the number of the suspicious people (Fig. 2), the increase in the number of the infected people is limited (Fig. 3).
This is likely because the latest tests are carried out for protection purposes rather than because of strong evidence of COVID-19 symptoms. Fig. 4 shows the number of the deaths caused by the COVID-19 virus in Turkey. It is clear that the character of the deaths (Fig. 4) is strongly correlated with the number of the infected people (Fig. 3), but not with the number of the suspicious people (Fig. 2). The number of the deaths has reduced from the 130s to the 20s, but it has not been minimized.

It was shown above that the infected and death casualties are largely reduced, but they fluctuate around their new equilibrium points after the curfews are removed. As can be seen from Fig. 5, all the elements of the SpID model converge to bounded regions, and these regions have a number of periods where small variations yield different periods. While the suspicious casualties have the largest region (Fig. 5a), the death casualties have the smallest region (Fig. 5c).

This section provides an insightful analysis of the coupled and higher-order parametric SpID model. The learned parameters of the SpID model with the LS estimator (21) are given in (22).
Insight 1: All the individual past casualties have a strong impact on the current casualties, as shown by the coefficients of the past values: for instance, S_{k+2} depends on 1.55S_{k+1} − 0.55S_k in (22), and likewise for the others.
Insight 2: The number of the infected people slightly affects the number of the suspicious people, as shown by the term 0.10I_k in (22).
Insight 3: However, the role of the number of the suspicious people in the number of the infected people is limited (0.00001S_k) due to the widely performed precautionary tests for people who start their tasks (e.g. soldiers, workers). Also, the role of the number of the infected people in the number of the deaths is limited, since the majority of the infected people have recovered after successful treatment.
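The strong dependence on past values noted in Insight 1 can be made concrete by looking at the characteristic roots of the quoted recurrence. The sketch below takes only the two coefficients quoted in the text (1.55 and −0.55) and, as a simplifying assumption, ignores the coupling term, so it describes the suspicious sub-model in isolation rather than the full coupled model (22). The coefficients are also rounded to two decimals, so the resulting roots are indicative only.

```python
import numpy as np

# Uncoupled recurrence S_{k+2} = 1.55 * S_{k+1} - 0.55 * S_k (coupling ignored),
# whose characteristic polynomial is z**2 - 1.55 * z + 0.55.
coefficients = [1.0, -1.55, 0.55]
roots = np.roots(coefficients)

# Sort by magnitude: a root with magnitude near 1 decays very slowly,
# a root with magnitude well below 1 dies out within a few steps.
magnitudes = np.sort(np.abs(roots))
print(magnitudes)
```

Because 1.55 − 0.55 = 1, the polynomial factors as (z − 1)(z − 0.55). With these rounded coefficients, one mode sits on the stability boundary and the other fades after a few days, which is one way to read the statement that past casualties are carried forward strongly into the current ones.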
The eigenvalues of the coupled, 6th-order discrete model (22) provide key information about the future behaviour (bounded decrease or unbounded increase, and the time to reach a certain level). Therefore, the eigenvalues for the whole data and for the data after the restrictions (late data) have been evaluated. Since the model is discrete, any eigenvalue with magnitude smaller than 1 yields a stable (convergent) response, whereas any eigenvalue with magnitude larger than 1 leads to instability (an unbounded response). Based on this fact, the following insights can be deduced.
Insight 1: When the whole data is considered, the real eigenvalues in rectangle 1 of Fig. 6 are close to 1. They are in the stable region, but they are also close to the unstable region. Therefore, any internal change or external effect can easily drive these eigenvalues outside the stability region, in which case all the casualties can explode.
Insight 2: When the whole data is considered, there are complex eigenvalues in rectangle 2 of Fig. 6. Their imaginary parts imply fluctuations in the casualties, but they are considerably small compared to the real parts of the eigenvalues, which are less than 1. Thus, the casualties will slightly fluctuate over a period of time.
Insight 3: The dominant eigenvalue of the whole data is 0.99, shown in rectangle 1 of Fig. 6. The contributions of the other eigenvalues of the whole data will disappear sooner, whereas the contribution of the dominant eigenvalue will become insignificant only around 900 days later if there are no disturbances or changes in the conditions.
Insight 4: When the late data (after the lockdowns) is considered, two of the real eigenvalues in rectangle 3 of Fig. 6 are just larger than 1. Hence, the response is unstable and the casualties explode unboundedly if no action is taken against them. However, since the unstable eigenvalues are only slightly larger than 1, the casualties will increase sluggishly.
Insight 5: When the late data is considered, there are imaginary parts in rectangles 4 and 5 of Fig. 6 which are quite large compared to the corresponding real parts. Therefore, the future casualties will be largely oscillatory.

Since the bases of the parametric optimization algorithm, and hence the corresponding parameter space, are small and exact, errors in the estimates are expected. As can be seen from Fig. 7, the mean error in the estimation of the suspicious casualties (Mean E_Sp) is around 100, the mean error in the estimation of the infected casualties (Mean E_I) is about 10, and the mean error in the estimation of the death casualties (Mean E_D) is significantly less than 1. These results confirm that the developed model can quite accurately estimate the infected and death casualties in the presence of unknown uncertainties. Even though the mean error for the suspicious casualties (Mean E_Sp) seems large, it is acceptable as well when compared to an average of 60000 daily suspicious casualties. Next section presents the future estimates of the COVID-19 casualties determined based on the model predictions.

This section provides the predicted future casualties estimated by the model (22) under the current conditions. Fig. 8a shows that the number of the suspicious casualties is minimized around 1000 days, whereas the numbers of the infected and death people reach their minimum around 300 days. There exists a peak in the results due to the small imaginary parts of the eigenvalues discussed in Section VI-B.

The proposed model is developed by taking into consideration the suspicious, infected, and death casualties, but it does not take into account the intensive care and intubation casualties, non-pharmacological policies, pharmacological policies, and unknown uncertainties. In the future, modified versions of the model that include these issues can easily be developed based on the proposed approach. Later, the developed model should be combined with artificial intelligence approaches to create policies for future pandemic casualties.
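The long horizons reported above (around 900 to 1000 days) can be related to the dominant eigenvalue by simple arithmetic: a mode with eigenvalue magnitude λ < 1 shrinks by a factor of λ at every step, so after n steps it is scaled by λⁿ. The sketch below assumes that one step corresponds to one day (the data are daily samples) and that "insignificant" means decayed to 1e-4 of the initial amplitude; this threshold is an assumption, since the text does not quantify it, and the helper name `steps_to_decay` is hypothetical.

```python
import math


def steps_to_decay(eigenvalue_magnitude, fraction):
    """Smallest number of steps n with eigenvalue_magnitude**n <= fraction."""
    if not 0.0 < eigenvalue_magnitude < 1.0:
        raise ValueError("only a stable mode (0 < |lambda| < 1) decays")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie in (0, 1)")
    return math.ceil(math.log(fraction) / math.log(eigenvalue_magnitude))


# Dominant eigenvalue quoted for the whole data; the threshold is assumed.
days = steps_to_decay(0.99, 1e-4)
print(days)
```

Under these assumptions the result is on the order of 900 days, in line with the horizon quoted for the dominant eigenvalue. A looser threshold shortens the horizon and a tighter one lengthens it, so the figure should be read as an order of magnitude rather than a precise date.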
The paper has initially reviewed the SIR model adopted for the estimation of the COVID-19 casualties. Then, the novel comprehensive SpID model has been introduced, analysed and justified. Later, the unknown parameters of the model have been determined by using the LS based parametric optimization approach for the COVID-19 casualties in Turkey. The results show that the developed model can closely estimate the casualties in Turkey. In addition, the model predicts that the number of the infected and death people will be minimized in 300 days, whereas the number of the suspicious casualties will reach its minimum around 1000 days. Even though the model is trained and analysed by using the COVID-19 casualties in Turkey, its unknown parameters can be adapted for the casualties in other countries. Thus, the COVID-19 authorities of the countries can plan new measures against the virus in the short, medium and long terms and, accordingly, update their regulations in the fields of economy, travel and health systems according to the predictions of the proposed model.

ONDER TUTSOY was born in Turkey. He graduated from the University of Fırat, Turkey. He received the M.Sc. degree in advanced control and system engineering and the Ph.D. degree in electrical and electronic engineering from The University of Manchester, U.K. He is currently an Associate Professor with Adana Alparslan Türkeş Science and Technology University, specializing in the design and analysis of robotics, control, artificial intelligence, and object recognition.
The recent challenges of highly contagious COVID-19, causing respiratory infections: Symptoms, diagnosis, transmission, possible vaccines, animal models, and immunotherapy
The WHO characterizes the COVID-19 as a Pandemic
The structure and functions of coronavirus genomic 3' and 5' ends
Relationship to duration of infection, Radiology
The SARS-CoV-2 outbreak: What we know
A novel coronavirus from patients with pneumonia in China
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: A study of a family cluster
Pneumonia associated with 2019 novel coronavirus: Can computed tomographic findings help predict the prognosis of the disease?
Clinical characteristics of severe acute respiratory syndrome coronavirus 2 reactivation
The middle east respiratory syndrome (MERS)
COVID-19, SARS and MERS: Are they closely related?
Coronavirus 2019-nCoV: A brief perspective from the front line
The hearth of mathematical and statistical modelling during the coronavirus pandemic
Modeling household and community transmission of Ebola virus disease: Epidemic growth, spatial dynamics and insights for epidemic control
Rapid AI development cycle for the coronavirus (COVID-19) pandemic: Initial results for automated detection & patient monitoring using deep learning CT image analysis
Non-parametric adaptation for neural machine translation
Upfront CAD-Parametric modeling techniques for shape optimization, in Advances in Evolutionary and Deterministic Methods for Design, Optimization and Control in Engineering and Sciences
Epidemic analysis of COVID-19 in China by dynamical modeling
Structural identifiability and observability of compartmental models of the COVID-19 pandemic
Modeling the epidemic dynamics and control of COVID-19 outbreak in China
Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China
Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
Modelling transmission and control of the COVID-19 pandemic in Australia
The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak, Science
The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application