key: cord-0434144-rozt2qja authors: Yanev, Nikolay M.; Stoimenova, Vessela K.; Atanasov, Dimitar V. title: Stochastic modeling and estimation of COVID-19 population dynamics date: 2020-04-02 journal: nan DOI: nan sha: db4bd9a68431bd3e71294fc82bf3eb387305368d doc_id: 434144 cord_uid: rozt2qja The aim of the paper is to describe a model of the development of the Covid-19 contamination of the population of a country or a region. For this purpose a special branching process with two types of individuals is considered. This model is intended to use only the observed daily statistics to estimate the main parameter of the contamination and to give a prediction of the mean value of the non-observed population of the contaminated individuals. This is a serious advantage in comparison with other more complicated models where the observed official statistics are not sufficient. In this way the specific development of the Covid-19 epidemics is considered for different countries. Nowadays the Theory of Branching Processes is a powerful tool for modeling the development of populations whose members have an ability of reproduction following some stochastic laws. The objects may be of different types and physical nature. From nuclear reaction and cosmic rays to cell proliferation and digital information, branching stochastic models are used to explain very interesting real-world stochastic phenomena. Branching processes have serious applications in Physics, Chemistry, Biology and Medicine, Demography, Epidemiology, Economics an so on. The basic models and analytical results are presented in many books and a lot of papers. We would like to point out the monographs [1 − 5] among the others. Some applications of branching processes in Biology and Medicine are presented in [4] and [6] . Some basic estimation problems are considered in [7] . The aim of the present paper is to describe an adequate model of the development of the Covid-19 contamination in the population. For this purpose a special branching process with two types of contaminated individuals is constructed and considered day by day. In this way we are able to use the observed statistics of the Covid-19 daily registered contaminated individuals and to estimate the main parameter of contamination. In fact this parameter m represents the mean value of the contaminated individuals by one individual per day. Using the observed statistics some methods for estimation are proposed and the corresponding graphics are presented. In this way we are able to give a prediction of the possible development of the mean value of the contaminated individuals. The modeling and estimation is performed for Bulgaria and some other countries: Italy, France, Germany, Spain globally. Of course, the model can be applied to any other country or region. In the paper the results are presented in detail for Bulgaria, Italy and globally. Results for other countries, additional information, reports and plots, related to this research can be found on http://ir-statistics.net/covid-19. The theoretical model is described in detail in Section 2. The p.g.f.'s and the mathematical expectations are obtained. Regardless of its simplicity the model has a great advantage using only the observed official statistics of the lab-confirmed cases. The estimation problems are presented in Section 3. Some conclusive remarks are given in Section 4. 2 Two-type branching process as a model of Covid-19 population dynamics. What can be observed? -Only that part of the contaminated individuals who became ill or who are discovered as a result of medical tests. Every day only the statistics of this registered part of all contaminated individuals is available. To describe this situation we can consider a two type branching process {Z 1 (n), Z 2 (n)} where type T 1 are contaminated (but still healthy) individuals who don't know that they are Covid-19 infected and type T 2 of discovered with Covid-19 virus individuals (and this is our real statistics). Every individual of type T 1 (contaminated) produces per day a random number of new individuals of type T 1 (contaminated) or only one individual of type T 2 (more precisely, in this case the individual type T 1 is transformed into an individual type T 2 ). Note that T 2 is a final type, i.e. the individuals of this type don't take part in the further evolution of the process because they are isolated under the quarantine. Let 2 ) be the offspring vector of type T 1 . Then the offspring joint probability generating function (p.g.f.) of type T 1 can be defined as follows: where | s 1 | ≤ 1, |s 2 | ≤ 1. Obviously h 2 (s 1 , s 2 ) ≡ 1 because type T 2 has (0, 0) offspring. Note that p 0 is the probability that type T 1 goes out of the reproduction process (the individual becomes healthy or goes out of the country, i.e. emigrates), p j is the probability to produce new j contaminated individuals of type T 1 and q is the probability that the individual type T 1 is confirmed ill (or dead). In other words, q = P{T 1 → T 2 }, i.e. with probability q an individual of type T 1 is transformed into an individual of type T 2 . One can obtain also that the marginal p.g.f. are If we assume that Z 1 (0) > 0 and Z 2 (0) = 0 then for n = 1, 2, ... 2 (n; j)} are independent and identically distributed (iid) as (ξ 2 ). Interpretation: Z 1 (n) is the total number of individuals (type T 1 ) in the n-th day contaminated by the individuals of the (n − 1)-th day; Z 2 (n) is the total number of the registered with Covid-19 individuals (type T 2 ) in the n-th day. The process starts with Z 1 (0) contaminated individuals, where Z 1 (0) can be an integer-valued random variable with a p.g.f. h 0 (s) = Es Z0 = ∞ k=1 p 0k s k , |s| ≤ 1, or Z 0 = N for some integer value, N = 1, 2, ... . The random variable ξ (1) 1 (n; j) is the number of contaminated individuals (type T 1 ) in the n-th day infected by the j-th contaminated individual from the (n − 1)-th day, j = 1, 2, ..., Z 1 (n − 1). Similarly the random variable ξ (1) 2 (n; j) is the number of the confirmed contaminated individuals (type T 2 ) in the n-th day transformed by the j-th contaminated individual from the (n − 1)-th day, j = 1, 2, ..., Z 1 (n − 1). Note that P{ξ .. In other words the probability q can be interpreted as a proportion of the confirmed individuals in the day n among all contaminated individuals in the day n − 1. Let Then it is not difficult to check that for n = 0, 1, 2, ..., we are able to obtain the p.g.f. of the process: Estimating m we are able to predict the mean value of the contaminated (non observed) individuals in the population. In the case when we assume that Z 1 (0) = 1 then M 1 (n) = EZ 1 (n) can be approximated respectively by m n n , or m n n , or m n n,N . In fact it means that we can obtain three types of estimators M 1 (n) = m n n , M 1 (n) = m n n and M 1 (n) = m n n,N . In other words we could say that we have at least three prognostic lines. Therefore if we have observations (Z 2 (1), Z 2 (2), ..., Z 2 (n)) over the first n days, we are able to predict the mean value of the contaminated individuals for the next k days by the relations: M 1 (n + k) = m n+k n , M 1 (n + k) = m n+k n and M 1 (n + k) = m n+k n,N , k = 1, 2, ... We are able to estimate also the proportion α(n) of the registered contaminated individuals among the population in the n-th day. Then we can obtain the following three types of estimators: All obtained estimators will be presented by the observed registered labconfirmed cases in the next section. The quality of the estimation, however, depends on the representativeness of the sample due to the specifics of the data collection in each country. Comment. For more detailed investigation and simulation the following models can be applied in the considered situation: (i) h * (s) = q + p 0 + p 1 s + p 2 s 2 + ... + p k s k , where q = 1 − k j=0 p j and p j , j = 0, 1, ..., k, can be specially chosen for k = 2, 3, 4, 5, 6, 7, 8. ( It is possible to consider also the restricted geometrical distribution up to some k = 2, 3, 4, 5, 6, 7, 8. ( Similarly it is possible to consider also the restricted Poisson distribution up to some k = 2, 3, 4, 5, 6, 7, 8. Note that the parameters of these distributions can be set in the manner that d ds h * (s)| s=1 is equal to m n n , or m n n , or m n n,N . Then with this individual distributions it is possible to simulate the trajectories of the non-observed process of contamination for further studies. We would like to point out once again that the considered in Section 2 model is versitile but the application in each country is specific because it depends essentially on the official data from the country. The plots and tables below illustrate well some specific details for different countries as well as the common trend. The data used for the estimation of the parameters of the model come from the official World Health Organization (WHO) situation reports [10] . We apply the here defined model. Note that the observed data is the number of the newly (daily) registered individuals denoted by Z 2 (n). The data about the new number of infected individuals (denoted by Z 1 (n)) is unobservable. The initial number m 0 = EZ 1 (0) is also unknown. Here n is the corresponding day from the beginning of the contamination. The estimation can be summarized in the following steps. The dynamics of these parameters over time is studied and for the forecasting purposes the most recent value is used. 3. The proportion α(s + k) of the registered contaminated individuals among the population of all infected in the s + k-th day is estimated. We will demonstrate the approach described above with the data of the reported laboratory-confirmed COVID-19 daily cases globally, in Italy and in Bulgaria. On Figure 1 one can see the original data for the newly reported and for the total number of registered cases globally. The dynamics of the Harris, Lotka-Nagaev and Crump-Hove estimators can be followed on Figure 2 . The Harris estimator shows a relatively more stable behaviour during the period. The three estimates exhibit similar asymptotics. The next step is to calculate the mean values of the expected number of nonregistered contaminated individuals for the three types of estimators, starting with s = 20 (Figure 3) . The last five points on the graph after day 20 represent the forecast for the next 5 days. The corresponding values of the proportion of the registered individuals among the contaminated population is presented on Figure 4 For the registered cases in Italy the raw data can be seen on the Figures 5 -8 In fact these data form the sample (Z 2 (1), Z 2 (2), ..., Z 2 (20)), where now n = 20, i.e. 20 days from the first confirmed contaminated individuals. We use also the statistics U (k) = k j=1 Z 2 (j), k = 1, 2, ..., 17, whose values are given as The two samples are presented on Figure 9 . The estimated mean number of infected individuals for the three types of estimators is presented on Figure 10 , the estimated expected number of contaminated individuals (with s = 10, chosen on the basis of the shorter time period of contamination) can be seen on Figure 11 and the estimated α -on Figure 12 . The values of the Harris estimator at the end of the contamination period for the three datasets considered above can be found in Table 1 Up to day 20 the Harris estimator (for Bulgaria) is 1.1093 (i.e. slightly To approve the behaviour of the model one can compare the predicted number of confirmed individualsZ 2 (n − k) (using the Harris estimatorm n ) for the period of k days before the latest date in the data set with the actually observed values of Z 2 (n). The corresponsing values, as well as the 95% confidence interval of the forecast are presented on Table 2 . Additionaly, using the data, provided by WHO the comparison between the behaviour of the process under different local conditions with a approximatelly same contamination period. For example on the Figure 13 the Harris estimator globally, for Italy, Germany and France are compared. As mentioned earlier the mean number of new contaminated individuals by one c.i. tends to values close, but greater than 1 as the time of contamination increases. Calculating the proportions α of the confirmed contaminated individuals among the four populations with s = 20 (presented on Figure 14 ) one can see that α's vary largely but stay relatively high, especially at the end of the contamination period. This fact can be considered as a result of different types of actions, undertaken to restrict the spread of the infection. First of all the estimation of the mean value of reproduction m allows us to classify the contamination process as supercritical (m > 1), critical (m = 1) and subcritical (m < 1). In the supercritical case the mean population growth is exponential, in the critical case the mean value of the population is constant and in the subcritical case the decreasing of the mean population is exponential. Up to the moment of our investigation the estimated mean number of new contaminated individuals (for Bulgaria) is slightly greater than 1 which corresponds to the exponential growth of the contaminated population, globally and locally in specific countries and regions. Finally the estimating of the mean parameter of contamination can be considered as a first stage to construction of a more complicated epidemiological model. As a such model for example, one can use a branching process with random migration considered in [8] [9] or some other model of controlled branching processes (see [5] ). Additional information, reports and plots, related to this research can be found on http://ir-statistics.net/covid-19. The Theory of Branching Processes Branching Processes. Nauka, Moscow, 1971 Branching Processes Branching Processes with Biological Applications Controlled Branching Processes Transient Processes in Cell Proliferation Kinetics Statistical inference for branching processes Critical branching processes with random migration Branching Processes with two types of emigration and state-dependent immigration Data set day kZ 2 (n − k) Z 2 (n − k) 95% CI lower 95% CI upper