key: cord-1039437-hmem8se3
authors: Nesteruk, Igor
title: Long-term predictions for COVID-19 pandemic dynamics in Ukraine, Austria and Italy
date: 2020-04-11
journal: nan
DOI: 10.1101/2020.04.08.20058123
sha: ce3fa97e6a6f00bae587f6225273f189392d2cad
doc_id: 1039437
cord_uid: hmem8se3

The SIR (susceptible-infected-removed) model, a statistical approach to parameter identification, and the official WHO daily data on the cumulative number of confirmed cases were used to estimate the dynamics of the coronavirus pandemic in Ukraine, Italy and Austria. The volume of the data sets and the influence of information about the initial stages of the epidemics are discussed with a view to obtaining reliable long-term predictions. The final sizes and durations of the pandemic in these countries are estimated.

Here we consider the development of the epidemic outbreak in Italy, Ukraine and Austria caused by the coronavirus COVID-19 (2019-nCoV) (see, e.g., [1]). Some estimations of the epidemic dynamics in these countries can be found in [2-7]. In particular, the final size of the epidemic in these tables. The data sets presented in Table 1 were used only for comparison with the corresponding SIR curves.

The SIR model for an infectious disease [7-11] relates three quantities: the number of susceptible persons S (persons who are sensitive to the pathogen and not protected); the number of infected persons I (persons who are sick and spread the infection; this should not be confused with the number of persons who are still ill, the so-called active cases); and the number of removed persons R (persons who no longer spread the infection; this number is the sum of isolated, recovered, dead, and infected people who left the region). In the model equations (1-3), α and ρ are constants.
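For orientation, the classical Kermack-McKendrick form of the SIR model [8] can be written as below. The symbols α (infection rate) and ρ (removal rate) and the tags (1)-(3) are assumptions chosen to match the text's references to equations (1-3) and to the quantity 1/ρ; this is a sketch of the standard model, not a transcription of the author's formulas.

```latex
\begin{align}
\frac{dS}{dt} &= -\alpha S I, \tag{1}\\
\frac{dI}{dt} &= \alpha S I - \rho I, \tag{2}\\
\frac{dR}{dt} &= \rho I. \tag{3}
\end{align}
```

In this form the sum S + I + R stays constant in time, and dI/dt changes sign where S = ρ/α.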
The initial conditions for the set of equations (1-3) are specified at the moment of the epidemic outbreak t_0 [10, 11].

Cumulative numbers of confirmed cases V_j at the moments of time t_j:

Date (2020)   t_j (days)   Italy     Austria   Ukraine
March 5       12           3858      47        1
March 6       13           4636      66        1
March 7       14           5883      104       1
March 8       15           7375      112       1
March 9       16           9172      131       1
March 10      17           10149     182       1
March 11      18           12462     302       1
March 12      19           15113     361       3
March 13      20           17660     504       3
March 14      21           21157     800       3
March 15      22           24747     959       3
March 16      23           27980     1132      7
March 17      24           31506     1332      14
March 18      25           35713     1646      16
March 19      26           41035     1843      16
March 20      27           47021     2649      26
March 21      28           53578     3024      47
March 22      29           59138     3631      47
March 23      30           63927     4486      84
March 24      31           69176     5282      113
March 25      32           74386     5888      156
March 26      33           80539     7029      218
March 27      34           86498     7697      311
March 28      35           92472     8291      418
March 29      36           97689     8813      480
March 30      37           101739    9618      549
March 31      38           105792    10182     669
April 1       39           110574    10711     804
April 2       40           115242    11129     987
April 3       41           119827    11525     1096
April 4       42           124632    11766     1251
April 5       43           128948    11983     1319
April 6       44           132547    12297     1462

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. medRxiv preprint doi: https://doi.org/10.1101/2020.04.08.20058123

Thus, for every set of parameters N, ν, α, t_0 and a fixed value of V, the integral (6) can be calculated and the corresponding moment of time can be determined from (5). Then the functions I(t) and R(t) can be easily calculated with the use of the formulas given in [11, 12]. The function I has a maximum at S = ν and tends to zero at infinity, see [8, 9]. In comparison, the number of susceptible persons at infinity is positive, S_∞ > 0, and can be calculated from a non-linear equation [11, 12]. The final number of victims (the final accumulated number of cases) then follows from the value of S_∞. To estimate the duration of an epidemic outbreak, we can use a condition which means that by the moment t_final fewer than one person still spreads the infection.

In the case of a new epidemic, the values of these four independent parameters are unknown and must be identified with the use of limited data sets.
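As an illustration of how a final epidemic size follows from an SIR trajectory, the sketch below solves the non-linear equation for the number of susceptible persons at infinity. It rests on several assumptions that are not taken from the text above: the classical SIR equations, the ratio ν = ρ/α, the initial conditions I(t_0) = 1 and S(t_0) = N - 1, and the balance V = I + R = N - S. Under these assumptions dI/dS = -1 + ν/S, so I(S) = ν ln S - S + N - ν ln(N - 1), and S_∞ is the root of I(S) = 0 below the maximum at S = ν. The function names are hypothetical.

```python
import math


def infected_given_susceptible(S, N, nu):
    """I as a function of S along an SIR trajectory.

    Assumes I = 1 and S = N - 1 at the outbreak moment (an assumption,
    not a formula quoted from the paper).
    """
    return nu * math.log(S) - S + N - nu * math.log(N - 1.0)


def final_susceptible(N, nu, iterations=200):
    """Root S_inf of I(S) = 0 on (0, nu), found by bisection in x = ln S."""
    if N <= 1.0 or nu <= 0.0:
        raise ValueError("need N > 1 and nu > 0")

    def f(x):
        return infected_given_susceptible(math.exp(x), N, nu)

    # At x_lo the value is -nu - exp(x_lo) < 0; at ln(nu) I is maximal (>= 1).
    x_lo = math.log(N - 1.0) - N / nu - 1.0
    x_hi = math.log(nu)
    for _ in range(iterations):
        x_mid = 0.5 * (x_lo + x_hi)
        if f(x_mid) < 0.0:
            x_lo = x_mid
        else:
            x_hi = x_mid
    return math.exp(0.5 * (x_lo + x_hi))


def final_size(N, nu):
    """Final accumulated number of cases, assuming V = I + R = N - S."""
    return N - final_susceptible(N, nu)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    N_demo, nu_demo = 1000.0, 500.0
    print("S_inf =", final_susceptible(N_demo, nu_demo))
    print("V_inf =", final_size(N_demo, nu_demo))
```

Working in x = ln S keeps the bisection stable even when S_∞ is many orders of magnitude smaller than N, which happens when N/ν is large.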
A statistical approach was developed in [11] and used in [5, 11, 12, 15] to estimate the values of the unknown parameters. The registered points for the number of victims V_j corresponding to the moments of time t_j can be used to calculate the integral (6) for every fixed pair of values N and ν, and then to check how well the registered points fit the straight line (5). For this purpose linear regression can be used, e.g., [13], and the optimal straight line, minimizing the sum of squared distances between the registered and theoretical points, can be defined. Thus we can find the optimal values of α and t_0 and calculate the correlation coefficient r.

Then the F-test may be applied to test the null hypothesis that the proposed linear relationship (5) fits the data set. The experimental value of the Fisher function F can be calculated with the use of a formula in which n is the number of observations and m = 2 is the number of parameters in the regression equation [13]. The corresponding experimental value F has to be compared with the critical value F_C.

The first preliminary prediction for Italy was published in [5] on March 27, 2020. Its results are presented in the first column of Table 3. New cases could stop appearing at the final moments of time (see Table 3); these moments correspond to May 25-31, 2020. The average time of spreading infection 1/ρ could be estimated as 0.7-0.75 days (according to the last two predictions, see Table 3). By comparison, in South Korea this time was approximately 4.3 hours [16].

Numbers of infected I (green), removed R (black) and the number of victims V=I+R (blue line).
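The regression and F-test step of the statistical approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the sample points are hypothetical stand-ins for the values obtained from integral (6), the Fisher statistic uses the textbook form F = r²(n - m) / ((1 - r²)(m - 1)) for a regression with m parameters, and the recovery of t_0 assumes that the straight line (5) has the form y = α(t - t_0).

```python
import math


def fit_line(t, y):
    """Least-squares line y = slope * t + intercept and correlation coefficient r."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    s_tt = sum((ti - mean_t) ** 2 for ti in t)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_ty = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    r = s_ty / math.sqrt(s_tt * s_yy)
    return slope, intercept, r


def fisher_statistic(r, n, m=2):
    """Textbook F value for a linear regression with m parameters and n points."""
    return r * r * (n - m) / ((1.0 - r * r) * (m - 1))


# Hypothetical points; in the paper's setting y_j would come from integral (6).
t_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r = fit_line(t_demo, y_demo)
F = fisher_statistic(r, len(t_demo))

# If the fitted line is read as y = alpha * (t - t0), then alpha is the slope
# and t0 = -intercept / slope (an assumption about the form of line (5)).
alpha_est = slope
t0_est = -intercept / slope

if __name__ == "__main__":
    print("alpha =", alpha_est, "t0 =", t0_est, "r =", r, "F =", F)
```

The value F would then be compared with a tabulated critical value F_C(1, n - 2) at the chosen significance level; repeating the fit over a grid of N and ν and keeping the pair with the best ratio F/F_C is one plausible way to organise the search.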
Numbers of infected I (green), removed R (black) and the number of victims V=I+R (blue line); "circles" show the cases taken for calculations; "triangles" correspond to the cases during the initial stage of the epidemic; the "star" is the last data point, used only for verification of the prediction.

For Austria, the results of calculations are shown in Table 4. Fig. 2 represents the results for prediction No. 2, corresponding to the highest value of F/F_C(1, n-2). It can be seen that the first cases of COVID-19 infection were probably identified in a timely manner and the sick people were isolated. This may be a reason for the much lower saturation level of the epidemic in Austria (approximately 12,000-13,000 according to the last predictions, 2 and 3). New cases could stop appearing at the moments of time 62-65 (see the last two values in the last row of Table 4). These moments correspond to April 24-27, 2020. The average time of spreading infection 1/ρ could be estimated as 0.44-0.47 days, which is much lower than in Italy but higher than in South Korea [16].

For Ukraine, the results of calculations are shown in Table 5. New cases could stop appearing at the final moments of time (see Table 5); these moments correspond to April 19-23, 2020. The average time of spreading infection 1/ρ could be estimated as 0.36-0.4 days.

Numbers of infected I (green), removed R (black) and the number of victims V=I+R (blue line); "circles" show the cases taken for calculations; "triangles" correspond to the cases during the initial stage of the epidemic; the "star" is the last data point, used only for verification of the prediction.

The accuracy of any mathematical model is limited.
The SIR model used here is no exception. The real processes are much more complicated. In particular, all the parameters in the SIR model are supposed to be constant. If the quarantine measures and the speed of isolation change, or if newly infected persons arrive in the country, the accuracy of the prediction decreases. The accuracy of predictions increases with the number of observations. On the other hand, the need for forecasts is reduced once an epidemic has stabilized.

The SIR (susceptible-infected-removed) model and the statistical approach to parameter identification make it possible to obtain some reliable estimations for epidemic outbreaks. The accuracy of long-term predictions is limited by uncertain information, especially at the beginning of an epidemic. Sufficiently long observations may eliminate the influence of the initial-stage data and increase the accuracy of predictions. Even with a limited amount of data, the SIR model can be used to estimate the final size of an epidemic and its duration. The further course of the COVID-19 pandemic in Ukraine, Austria and Italy will show the real accuracy of the proposed method.

Coronavirus disease (COVID-2019) situation reports
Coronavirus epidemic outbreak in Europe. Comparison with the dynamics in mainland China
Comparison of the coronavirus epidemic dynamics in Italy and mainland China
Comparison of the coronavirus pandemic dynamics in Europe, USA and South Korea
Stabilization of the coronavirus pandemic in Italy and global prospects
Coronavirus pandemic dynamics in March
Comparison of the coronavirus pandemic dynamics in Ukraine and neighboring countries
A contribution to the mathematical theory of epidemics
Mathematical biology
Comparison of mathematical models for the dynamics of the Chernivtsi children disease
Statistics-based predictions of coronavirus epidemic spreading in mainland China
Applied Regression Analysis
Maximal speed of underwater locomotion
Estimations of the coronavirus epidemic dynamics in South Korea with the use of SIR model

I would like to express my sincere thanks to Gerhard Demelmair and Ihor Kudybyn for their help in collecting and processing data.