key: cord-0818443-v6on5gv9 authors: Asteris, Panagiotis G.; Douvika, Maria; Karamani, Christina; Skentou, Athanasia; Daras, Tryfon; Cavaleri, Liborio; Armaghani, Danial Jahed; Chlichlia, Katerina; Zaoutis, Theoklis E. title: A Novel Heuristic Global Algorithm to Predict the COVID-19 Pandemic Trend date: 2020-04-22 journal: nan DOI: 10.1101/2020.04.16.20068445 sha: 93fed3054f2960f7f8ff0b846542fa34a6ed8f4a doc_id: 818443 cord_uid: v6on5gv9

Background Mathematical models are useful tools to predict the course of an epidemic. The present manuscript proposes a heuristic global algorithm for predicting the COVID-19 pandemic trend.

Methods The proposed method utilizes a Gaussian-function-based algorithm for estimating how the temporal evolution of the pandemic develops by predicting daily COVID-19 deaths, for up to 10 days, from the day the prediction is made. This dataset, the number of daily deaths in each country or region, encapsulates information about (a) the quality of the health system of each country or region, (b) the age profile of the country's or region's population, and (c) environmental and other conditions.

Findings The validity of the proposed heuristic global algorithm has been tested in the case of China (at different temporal stages of the pandemic), a country where the disease trend seems to have run its course. It has been applied to ten countries/states/cities, namely California, Germany, China, Greece, Iran, Lombardia/Italy, New York, Sweden, United Kingdom and USA, for each one of which predictions have been obtained. The method has also been applied to the United States as a whole, as well as to the states of New York and California, in order to investigate how the pandemic is developing in different parts of the same country.
Interpretation Based on the predicted findings, the proposed algorithm seems to offer a robust and reliable method for revealing the SARS-CoV-2 temporal dynamics and disease trend and, as such, can be a useful tool for the relevant authorities.

In January 2020, the novel aggressive coronavirus SARS-CoV-2 was identified as the causative agent of an outbreak of viral pneumonia in Wuhan, China, the coronavirus disease 2019 (COVID-19). The outbreak of COVID-19 has already spread to more than 200 countries and has been officially declared a global pandemic [1]. The number of confirmed cases, as well as the number of deaths, increases drastically every day. In response to COVID-19, governments have implemented several regulations that constrain personal freedoms (physical distancing) and restrict their economies, placing approximately 3 billion people under lockdown. To follow the transmission dynamics, there is great demand for early diagnosis, with a race to develop and approve tests for early and accurate molecular diagnosis of the infection. Likewise, there is an urgent need for early prediction in order to reduce the risk of transmission in all locations worldwide. The task governments and national authorities now face is to apply recommendations and decisions for rapidly strengthening outbreak surveillance and control efforts. It is very important to be able to estimate and predict the spread of the virus, and to decode the characteristics of the infection and its spread pattern in many countries worldwide, taking all relevant aspects into consideration, in order to take robust decisions on political interventions and control measures. Mathematical models are used to forecast the course of the epidemic.
In the light of the above, and with reference to the mortality data in each country, the aim of the current study was to develop a novel, robust and reliable global algorithm for estimating and predicting the COVID-19 pandemic outburst for up to 10 days after the prediction date in different locations worldwide. During the study of the development of the COVID-19 pandemic, the daily total number of confirmed deaths due to COVID-19 in each location has been recorded and utilized further. The selection of daily deaths was based on the authors' assumption that mortality data are more accurate and reliable than recordings of the number of daily infected individuals. The credibility of the latter is limited, because establishing the real number of infections requires a systematic and thorough study of large amounts of data based on statistical rules, which makes such a study time-consuming and costly. In addition, the daily death count contains information on many crucial parameters that influence the transmission trend and spread of the pandemic. Among the parameters reflected in the death recordings are the following:
• The climate conditions of each country, state or region
• The quality of the health-care system
• The experience/level of the medical staff / health-care workers
• The age distribution of the population (demographic structure)

In addition to the above, the main assumption in the design of this algorithm was the observation that the number of deaths in the respective populations, as a function of time, follows a normal distribution. Even though daily recordings may not yield an optimal normal distribution, it is important to note that recording deaths every 2 days, every 3 days, etc. leads, almost always, to an optimal normal distribution.
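The effect of recording deaths every 2 or 3 days rather than daily can be illustrated with a minimal sketch. The function name `aggregate_deaths` and the sample counts are hypothetical and are not taken from the study; dropping an incomplete trailing interval is likewise an assumption made here for simplicity.

```python
from typing import List


def aggregate_deaths(daily: List[int], period: int = 2) -> List[int]:
    """Sum daily death counts over consecutive, non-overlapping intervals."""
    if period < 1:
        raise ValueError("period must be a positive number of days")
    # Assumption: a trailing interval shorter than `period` days is dropped,
    # so that every aggregated value covers the same number of days.
    usable = len(daily) - len(daily) % period
    return [sum(daily[i:i + period]) for i in range(0, usable, period)]


# Hypothetical daily counts, for illustration only (not real data).
daily_counts = [1, 0, 3, 2, 6, 4, 9, 7, 5]
two_day = aggregate_deaths(daily_counts, period=2)
three_day = aggregate_deaths(daily_counts, period=3)
```

Aggregating in this way smooths day-to-day reporting noise, which is one plausible reason why the 2-day and 3-day series tend to look closer to a bell-shaped curve than the raw daily series.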
Following this assumption, the simulation of the pandemic spread was investigated for a variety of death-recording scenarios, and the setting giving the best results and predictions was selected. Analyzing the official data from China (daily COVID-19 incidents and deaths), the country where the pandemic began and which now appears to have largely overcome it, one can easily see that these data can be closely approximated by a suitable Gaussian curve (or, equivalently, a suitable normal distribution density function). In addition, by studying the evolution of the pandemic and the course of the events/restrictions in this country, and taking into account that almost all European and other countries have taken similarly strict restrictive measures, we assume that, unless something changes dramatically, and without taking into account possible population or climate/environmental differences, the development of the COVID-19 pandemic will be similar in (most) European and other countries, i.e. the data concerning the epidemic can be expressed using a suitable normal distribution.

A Gaussian function is a function of the form:

$$f(x) = A \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

The graph of the function is a symmetrical bell-shaped curve centered at the position $x = \mu$; $A$ is the height of the peak and the variance $\sigma^2$ controls its width. On both sides of the peak, the tails of the curve quickly fall off and approach the x-axis (asymptote). Our algorithm tries to determine in each case (country/state) the optimal normal curve (for daily deaths) by calculating the parameters $A$, $\mu$ and $\sigma^2$, i.e. by fitting the "best" possible normal curve to the given data. Optimality is assessed with respect to well-known statistical indices. More precisely, the main steps of the algorithm are (through a triple loop): [...] This interval is used to fit the actual data (using a proper transformation). 3.
(third inner loop) We start from a value of 20 and continue, with a step of 1 (day), up to a value of 60 (we observed, for example, that in the case of China the phenomenon lasted for about 60 days, with the mean (peak day of deaths) at about the 30th day). 4. As a result of applying the algorithm, a large number of candidate normal curves is generated, and the theoretical values of each curve are calculated. Finally, these values are compared with the empirical values (actual deaths data), and the "best" possible curve is selected using a number of indices (smallest possible differences between theoretical and empirical data).

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

The reliability and accuracy of the best-fit Gaussian curve developed for each prediction were evaluated using Pearson's correlation coefficient R and the root mean square error (RMSE). RMSE summarizes the short-term performance of the model as the typical size of the difference between predicted and observed values. The lower the RMSE, the more accurate the evaluation. Pearson's correlation coefficient R measures the strength of the linear association between predicted and observed values; its square, R², is the fraction of the variance explained by the model. For the fits considered here, R lies between 0 and 1: the model has strong predictive ability when R is close to 1 and explains essentially nothing when R is close to 0. Together, these performance metrics are a good measure of the overall predictive accuracy.

The present section outlines the methodology used to investigate the spread of COVID-19 in a country or in parts of it, such as a state, city or region.
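As a concrete reference point for the steps that follow, the exhaustive search over candidate Gaussian curves and the two fit metrics described above can be sketched as follows. This is a minimal illustration and not the authors' software: the function names, the grid ranges and the synthetic data are hypothetical, and selecting the best curve by RMSE alone is an assumption (the text only states that "a number of indices" is used).

```python
import math
from itertools import product
from typing import Dict, Iterable, List, Sequence


def gaussian(x: float, a: float, mu: float, sigma: float) -> float:
    """Gaussian curve with peak height a, center mu and width sigma."""
    return a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equally long series."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if a series is constant."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp) if so > 0 and sp > 0 else 0.0


def grid_search(days: Sequence[float], deaths: Sequence[float],
                a_values: Iterable[float], mu_values: Iterable[float],
                sigma_values: Iterable[float]) -> Dict[str, float]:
    """Triple loop over (A, mu, sigma); keeps the curve with the lowest RMSE."""
    best = None
    for a, mu, sigma in product(a_values, mu_values, sigma_values):
        fitted = [gaussian(t, a, mu, sigma) for t in days]
        score = rmse(deaths, fitted)
        if best is None or score < best[0]:
            best = (score, a, mu, sigma)
    if best is None:
        raise ValueError("the parameter grid is empty")
    score, a, mu, sigma = best
    fitted = [gaussian(t, a, mu, sigma) for t in days]
    return {"A": a, "mu": mu, "sigma": sigma,
            "rmse": score, "r": pearson_r(deaths, fitted)}


# Synthetic, noise-free "observations" covering only the rising part of a curve
# (hypothetical values, chosen so that the true parameters lie on the grid).
days: List[int] = list(range(25))
deaths = [gaussian(t, 100.0, 30.0, 8.0) for t in days]
result = grid_search(days, deaths,
                     a_values=range(50, 151, 10),
                     mu_values=range(20, 41),
                     sigma_values=range(4, 13))
```

On noise-free synthetic data the search recovers the generating parameters; with real death counts the selected curve depends on the grid resolution and on which index is used for the comparison.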
In particular, the methodology is presented here step by step, as it was conducted and applied in the investigation of the spread of the epidemic in China. In fact, the fact that the epidemic in China preceded the epidemic in other countries makes it possible to apply the proposed algorithm at the beginning of the phenomenon, in its next phase, which is usually characterized by strong dynamics, and finally at its peak, where the dynamics of the phenomenon begin to fade, as is usual in dynamic phenomena. The main characteristics/steps of the proposed methodology are:
• In each step of the study of the phenomenon, the optimal normal distribution is calculated using the proposed algorithm, based on the data available up to the moment of application.
• It was deemed necessary that the first assessment be made 14 days after the first death record. A period of two weeks is considered necessary to characterize, in a reliable way, the beginning of the phenomenon (the initial conditions, in the terminology of dynamic phenomena in engineering).
• In each time step following the 14-day period from the first death record, the optimal data simulation curve is calculated using the proposed algorithm. Figure 1 shows the optimal curve that best simulates the data (number of deaths) that precede the time of the prediction (twelve 2-day periods, i.e. 24 days).
• The same procedure is applied on a daily basis and, based on the values of the maximum number of deaths and the time at which this maximum is attained, we plot the curve of Figure 2. This figure is very useful, as it illustrates how the phenomenon evolves and provides an estimate of when the phenomenon is expected to peak, and even of the number of deaths at the peak. This information is especially useful to the authorities because it helps them prepare accordingly.
• By predicting the optimal curve of Figure 1, we also obtain information about the dynamics of the phenomenon. In particular, knowing the parameters of the distribution (sigma, mu and fitting probability), its area is calculated, which gives the total predicted number of deaths. Based on the percentage change in the number of deaths as a function of time, the change in the dynamics of the phenomenon is defined (Figure 3). This figure shows clearly that the COVID-19 phenomenon is predominantly dynamic, with clear dynamic characteristics: it oscillates strongly during its transition to the peak and then dissipates.
• In addition to the above useful estimates and the dynamic characteristics revealed by the proposed heuristic algorithm, it is possible to reliably predict the expected number of deaths for the next 10 days (Figure 4). Together with the estimated expected number of deaths, we obtain estimates of its upper and lower limits.
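The area calculation and the 10-day forecast described in the list above can be illustrated with a short sketch. It assumes a fitted curve of the form A·exp(−(t−μ)²/(2σ²)), whose area over the whole time axis is A·σ·√(2π); the function names and parameter values are hypothetical, and the upper and lower limits are not reproduced because the text does not state how they are computed.

```python
import math
from typing import List


def gaussian(t: float, a: float, mu: float, sigma: float) -> float:
    """Fitted curve: expected deaths per time unit at time t."""
    return a * math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2))


def total_deaths(a: float, sigma: float) -> float:
    """Area under the Gaussian curve, i.e. the predicted total number of deaths."""
    return a * sigma * math.sqrt(2.0 * math.pi)


def forecast(a: float, mu: float, sigma: float,
             last_day: int, horizon: int = 10) -> List[float]:
    """Expected deaths for each of the `horizon` time units after `last_day`."""
    return [gaussian(last_day + k, a, mu, sigma) for k in range(1, horizon + 1)]


# Hypothetical fitted parameters, for illustration only (not from the study).
a, mu, sigma = 100.0, 30.0, 8.0
total = total_deaths(a, sigma)
next_days = forecast(a, mu, sigma, last_day=24)
peak_offset = max(range(len(next_days)), key=next_days.__getitem__) + 1
```

Here `peak_offset` is the number of time units after the prediction date at which the forecast reaches its maximum. If the curve is fitted to 2-day totals rather than daily counts, the time unit of `mu`, `sigma` and the forecast horizon is a 2-day period, and the area is expressed accordingly.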
Based on a comprehensive study of all ten countries and the results obtained, which are presented below, these upper and lower limits were confirmed for all countries and cities, as was the difference between the predicted and actual deaths.

[Figure caption, truncated: ... in 2-day intervals, for the next ten days starting February 12, 2020, for China. Black dots represent actual data until the day on which the algorithm made the prediction. Blue dots represent actual data after the day on which the algorithm made the prediction.]

In the light of the above, new computer software has been developed in the Computational Mechanics Laboratory, School of Pedagogical and Technological Education, Athens, Greece, under the supervision of Prof. Asteris. Using this software, which implements the heuristic algorithm, the development of the epidemic was investigated in ten different geographical locations. In particular, the COVID-19 development trend was examined in California, Germany, Greece, Iran, New York, Sweden, United Kingdom and United States. The investigation was implemented in two stages. In the first stage, the algorithm was applied to past time periods for which the data, and hence the actual daily deaths, were already known. This stage was used to evaluate the phenomenon but, more importantly, to validate the proposed heuristic global algorithm and the respective methodology. In the second stage, predictions were made for which the results are not yet known.
In detail, for all of the above-mentioned locations, the number of daily deaths was predicted for the next 10 consecutive days, starting from the 13th of April 2020. The results are presented in detail for every case in tables and figures in the supplementary materials. The main results of our study are the following:
• The proposed algorithm was confirmed in all predictions made in the first stage, where the data and the actual daily deaths were known.
• The proposed methodology provides an upper and a lower estimation limit, which were confirmed in all cases of the first-stage predictions.
• For the predictions of the second stage, specifically from the 13th until the 22nd of April, confirmation or disproof is expected, with reference to the respective predictions provided by the Institute for Health Metrics and Evaluation (IHME). These were made on the 12th of April, the same day as the current study. The interval between the upper and lower estimation limits is much smaller in our predictions than in those provided by IHME, as is evident in Table 1.
• Notably, in confirmation of the above-mentioned statements, using the diagrams in which the whole development of the phenomenon is presented, additional information is [...]

The aim of the current study was to develop a novel algorithm for estimating and predicting the COVID-19 pandemic outburst for up to 10 days after the prediction date. Although reports on country-specific models, which take into consideration the specific features of each country or region, are already available, this novel global algorithm is of particular importance because it applies to all 10 locations examined, each with its own distinct characteristics.
Notably, since there are significant differences in many aspects among countries, as well as unique local transmission dynamics, it is difficult to design an algorithm that is able to predict with high confidence the outcome of the outbreak for the next period of time in all examined locations. Interestingly, this global prediction tool applies to all countries tested, although every country has different internal political characteristics in its response to the coronavirus crisis (there is large variation in how well governments are responding) and, as such, the impact of the SARS-CoV-2 infection pattern is not evenly distributed. By providing a means of estimating and predicting the development of the pandemic, the tool is of high relevance and paramount importance for governments and local authorities in taking key decisions, such as whether to extend quarantine or relax social distancing control measures. While the disease is growing exponentially, the health-care system faces several burdens. Based on the proposed predictions, governments can prepare, plan and act immediately to ensure adequate health care and a reduced mortality risk due to COVID-19. The proposed algorithm is also expected to make a substantial contribution to engineering, where the parameters of a multitude of problems frequently follow a normal distribution. The authors also believe that, since data/parameters referring to other "related families of viruses" of COVID-19 appear to follow a normal distribution, the proposed algorithm will be universally applicable.
The authors have begun to investigate in this direction, and results will be presented very soon in a companion paper.

Tables
Table 2 Predicted [...]

[1] [...] COVID-19): WHO characterizes COVID-19 as a pandemic