key: cord-221131-44n5pojb
authors: Zullo, Federico
title: Some numerical observations about the COVID-19 epidemic in Italy
date: 2020-03-25
doc_id: 221131
cord_uid: 44n5pojb

We give some numerical observations on the total number of people infected with SARS-CoV-2 in Italy. The analysis is based on a tanh formula involving two parameters. A polynomial correlation between the parameters gives an upper bound for the time of the peak of new infections. A numerical indicator of the temporal variability of the upper bound is introduced. The result and the possibility of extending the analysis to other countries are discussed in the conclusions.

Coronavirus disease 2019 (COVID-19) has been recognized in Italy since 31 January 2020, although there is some evidence that the first cases occurred earlier [3]. The spread of the disease started in the northern regions of Italy, Lombardy and Veneto, on 21 February 2020: after about twenty positive cases, two small areas were put in quarantine but, from then on, the number of infected grew exponentially in these two districts. From the southernmost localities of Lombardy the disease spread to the nearest regions, above all Emilia-Romagna (with the provinces of Piacenza, Parma, Modena and Rimini), and soon afterwards to other provinces of Lombardy (e.g. the province of Bergamo). On 8 March the central government of Italy imposed a lockdown on the entire region of Lombardy (about 10 million inhabitants) and on fourteen other provinces (about six million inhabitants); on 9 March the lockdown was extended to the entire country. As of 23 March, the total number of infected is 63927, of which 28761 are in Lombardy.
Since the start of the epidemic in China, a number of studies on this subject have appeared in the mathematical community: the spatial or temporal diffusion of the infected in given regions [4], [8]-[10], the transmission dynamics of the infection [6], the economic and financial consequences of the epidemic [1] and the effect of atmospheric conditions on the spread of the virus [5] are only a fraction of the topics currently under investigation. One of the simplest nonlinear, deterministic, continuous-time models in epidemiology is the SIR model, in which the overall population is divided into three disjoint classes: S is the number of susceptible individuals, I the number of infectious individuals and R the number of recovered individuals. Despite its nonlinearity, the dynamics of the model are fairly simple and analytically tractable, and display interesting properties such as the existence of an epidemic threshold (see e.g. [11]), which make it a very reasonable model. In section (2) the SIR model is introduced and briefly discussed; in section (3) we analyze the data on the cumulative number of infected in Italy on the basis of two simple hypotheses and obtain an upper bound for the time of the peak of new infections. In the conclusions we comment on the results and discuss possible extensions.

The SIR model describes the evolution of the numbers of individuals in the susceptible, infectious and recovered classes through the following differential equations, where r is the infection rate and γ the recovery rate:

$$\frac{dS}{dt} = -\frac{r S I}{N}, \qquad \frac{dI}{dt} = \frac{r S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I. \tag{1}$$

The total population N = S + I + R is conserved by the dynamics, so only two of the variables in (1) are independent. The properties of this model are well known, and interested readers can consult, for example, the discussions in the classical books by Braun [2] and Murray [11]. Here we make only a few observations relevant to the next sections.
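The SIR system (1) is easy to integrate numerically. The sketch below is a minimal illustration, not the author's code: it uses a hand-written classical Runge-Kutta (RK4) step in plain Python, calls the recovery rate `gamma` (a name chosen for this sketch), and uses hypothetical values for the rates, the population and the time step. It checks two properties mentioned above: N = S + I + R stays constant, and the number of infectious individuals initially grows only when $r S_0 / N$ exceeds the recovery rate (the epidemic threshold).

```python
# Minimal sketch of the SIR model (1), integrated with a classical RK4 step.
# Assumptions: `gamma` names the recovery rate; all numbers are illustrative.


def sir_rhs(state, r, gamma, n_total):
    """Return (dS/dt, dI/dt, dR/dt), keeping the 1/N factor of (1)."""
    s, i, _ = state
    new_infections = r * s * i / n_total
    return (-new_infections, new_infections - gamma * i, gamma * i)


def rk4_step(state, dt, r, gamma, n_total):
    """Advance the state by one fourth-order Runge-Kutta step of size dt."""
    def shift(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = sir_rhs(state, r, gamma, n_total)
    k2 = sir_rhs(shift(state, k1, dt / 2), r, gamma, n_total)
    k3 = sir_rhs(shift(state, k2, dt / 2), r, gamma, n_total)
    k4 = sir_rhs(shift(state, k3, dt), r, gamma, n_total)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(s0, i0, rec0, r, gamma, t_end, dt=0.01):
    """Return samples (t, S, I, R) from t = 0 to t_end."""
    n_total = s0 + i0 + rec0  # conserved by the dynamics
    state = (s0, i0, rec0)
    samples = [(0.0, *state)]
    for step in range(1, int(round(t_end / dt)) + 1):
        state = rk4_step(state, dt, r, gamma, n_total)
        samples.append((step * dt, *state))
    return samples


# Same initial data; only the infection rate r changes.
above = simulate_sir(9990.0, 10.0, 0.0, r=0.5, gamma=0.2, t_end=100.0)
below = simulate_sir(9990.0, 10.0, 0.0, r=0.1, gamma=0.2, t_end=100.0)

# r*S0/N > gamma: I grows before decaying; r*S0/N < gamma: I only decays.
print(max(p[2] for p in above) > 10.0)   # True
print(max(p[2] for p in below) <= 10.0)  # True
# N = S + I + R is preserved up to rounding.
print(abs(sum(above[-1][1:]) - 10000.0) < 1e-6)  # True
```

RK4 is used instead of a simple Euler step only to keep the trajectories accurate at a moderate step size; any standard ODE integrator would serve.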
Some authors omit the denominator N on the right-hand side of (1), since it is a constant and can be absorbed into a redefinition of the parameter r. We keep it, however, because it makes the scaling property of the model evident: if the initial conditions $(S_0, I_0, R_0)$ are multiplied by a common constant factor λ (so that the total population is also multiplied by λ), the solution is multiplied by the same factor. Some temporal properties of the model, such as the time at which I reaches its maximum (the time of the peak of infected), therefore do not depend on the scaling. This property is very useful, since the actual numbers of infected and susceptible (and hence of recovered) individuals are generally not known. Under the reasonable assumption that the same fraction (with respect to the respective totals) of infected, susceptible and recovered individuals is known, the measured data can be compared with the scale-independent properties of the model. The solution of system (1) can be written in terms of a single variable: if R(t) is known, I(t) follows from the third equation and S(t) from the first one or from the constraint N = S(t) + I(t) + R(t). If the epidemic is not severe (R(t) small compared with the overall population), an explicit formula for the number of recovered can be obtained in terms of the hyperbolic tangent function. It reads

$$R(t) = \alpha \tanh(\beta t - c) + \alpha \tanh(c), \tag{2}$$

where we used the initial value R(0) = 0, and the parameters (α, β, c) can be expressed explicitly in terms of the parameters appearing in (1). This is an example of the so-called S-shaped epidemiological curve (with a peaked derivative proportional to sech²) that commonly describes an infectious disease. The number of infected follows by differentiation: dR/dt = αβ sech²(βt − c), so by the third equation of (1), I(t) is proportional to sech²(βt − c).
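Formula (2) and its derivative can be checked numerically. The sketch below is plain Python with hypothetical values of α, β and c chosen only for illustration; it verifies that R(0) = 0, that a central finite difference of R(t) agrees with αβ sech²(βt − c), and that R(t) approaches α(1 + tanh c) for large t.

```python
import math

# Hypothetical parameters, for illustration only.
ALPHA, BETA, C = 1000.0, 0.2, 3.0


def recovered(t, alpha=ALPHA, beta=BETA, c=C):
    """Formula (2): R(t) = alpha*tanh(beta*t - c) + alpha*tanh(c)."""
    return alpha * math.tanh(beta * t - c) + alpha * math.tanh(c)


def recovered_rate(t, alpha=ALPHA, beta=BETA, c=C):
    """Exact derivative: dR/dt = alpha*beta*sech^2(beta*t - c)."""
    return alpha * beta / math.cosh(beta * t - c) ** 2


def central_difference(func, t, h=1e-4):
    """Second-order finite-difference approximation of func'(t)."""
    return (func(t + h) - func(t - h)) / (2 * h)


print(abs(recovered(0.0)) < 1e-9)  # True: R(0) = 0
for t in (0.0, 5.0, 15.0, 30.0):
    # Analytic and numerical derivatives agree closely.
    assert math.isclose(central_difference(recovered, t), recovered_rate(t), rel_tol=1e-6)
# Large-t limit: alpha * (1 + tanh(c)).
print(math.isclose(recovered(1e3), ALPHA * (1 + math.tanh(C))))  # True
```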
When considering the cumulative number of infected, R + I, the contribution of the sech² term is negligible in the tails; it is more pronounced near the maximum of sech², but it remains small if β is less than one. In this case R + I is well approximated by a tanh formula like (2), with a different value of c. This is what we assume in the next section. The description given in the previous section is clearly very basic. However, it has the advantage of being tractable while incorporating the main properties of the SIR model. It is no coincidence that the first application of the SIR model, by Kermack and McKendrick [7] to the Bombay plague of 1905, used precisely the tanh formula above. In what follows we base our analysis on two hypotheses:

(i) The cumulative number of infected is described by a tanh model. This assumption does not depend on the underlying dynamical model, but it may be justified on the basis of some of them (e.g. the SIR model, as shown in the previous section).

(ii) Whatever the underlying model describing the evolution of the number of infected, it is scale invariant, in the sense specified in the previous section.

The second hypothesis is fundamental because we are going to look at scale-independent quantities: even if the measured numbers of infected and recovered individuals differ from the actual values, these quantities can still be estimated. The cumulative numbers of infected considered below refer to the entire Italian territory.
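Hypothesis (ii) can be illustrated on the SIR model itself. The sketch below (plain Python with a hand-written RK4 step) integrates (1) twice, the second time with all initial conditions multiplied by λ = 1000, and checks that I(t) is multiplied by λ while the time of the peak is unchanged. The rates r = 0.5 and `gamma` = 0.2 (our label for the recovery rate), the initial data and the time step are hypothetical.

```python
# Sketch: numerical check of the scaling property behind hypothesis (ii),
# using the SIR model (1). Assumed, illustrative values: r = 0.5, recovery
# rate gamma = 0.2 (our label), time step dt = 0.01.


def sir_infected(s0, i0, rec0, r=0.5, gamma=0.2, t_end=80.0, dt=0.01):
    """Integrate (1) with RK4; return the sample times and the values of I."""
    n_total = s0 + i0 + rec0

    def rhs(s, i):
        # R does not feed back into S and I, so two variables suffice.
        flow = r * s * i / n_total
        return -flow, flow - gamma * i

    s, i = s0, i0
    times, infected = [0.0], [i0]
    for step in range(1, int(round(t_end / dt)) + 1):
        ds1, di1 = rhs(s, i)
        ds2, di2 = rhs(s + dt / 2 * ds1, i + dt / 2 * di1)
        ds3, di3 = rhs(s + dt / 2 * ds2, i + dt / 2 * di2)
        ds4, di4 = rhs(s + dt * ds3, i + dt * di3)
        s += dt / 6 * (ds1 + 2 * ds2 + 2 * ds3 + ds4)
        i += dt / 6 * (di1 + 2 * di2 + 2 * di3 + di4)
        times.append(step * dt)
        infected.append(i)
    return times, infected


def peak_time(times, values):
    """Sample time at which `values` is largest."""
    k = max(range(len(values)), key=values.__getitem__)
    return times[k]


lam = 1000.0  # scaling factor (hypothetical)
t_a, i_a = sir_infected(999.0, 1.0, 0.0)
t_b, i_b = sir_infected(999.0 * lam, 1.0 * lam, 0.0)

# I(t) is multiplied by lam along the whole trajectory ...
assert all(abs(y - lam * x) <= 1e-6 * max(1.0, abs(y)) for x, y in zip(i_a, i_b))
# ... so the time of the peak of infected does not change.
assert abs(peak_time(t_a, i_a) - peak_time(t_b, i_b)) <= 0.01
print(peak_time(t_a, i_a))
```

The comparison uses a relative tolerance because the two runs are not bit-identical in floating point, even though they are exactly proportional in exact arithmetic.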
There are at least two reasons for not using regional or local data. The first is that the epidemic started to spread across three different regions (Lombardy, Veneto and Emilia-Romagna), so there may be no correspondence between the locality where a given fraction of inhabitants reside and the region where they were infected. This is also true at the national level, but there the fraction is assumed to be smaller. The second reason is that, just before the lockdown, a non-negligible number of workers and students moved from the northern regions of Italy to their regions of origin in the center and south of the country; the possibility that a non-negligible flow of infected people passed from the north to other regions must therefore be taken into account. By using the national data set we circumvent these issues. The data can be obtained, for example, from the Italian Protezione Civile [12] or from the WHO [13]. The cumulative total number of infected is denoted by $F_n$, with $F_1 = 20$ the number of infected on 21 February 2020; the subscript n denotes the number of days since the start of the epidemic. These data are compared with the continuous formula

$$f(t) = \alpha \tanh(\beta t - c) + \alpha \tanh(c). \tag{3}$$

The value of β is taken to be constrained by an additional equation, so that the function f in (3) depends on two parameters, α and c; when necessary, to stress this dependence, we denote it by $f_{\alpha,c}(t)$. The cumulative final number of infected predicted by formula (3) is $f_\infty = \alpha(1 + \tanh(c))$. The parameters α and c can be estimated by minimizing a sum $S_n$ measuring the differences between the actual and predicted numbers of cases. To have a reasonable minimum amount of data, we start the analysis at n ≥ 15.
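The estimation of α and c can be sketched as follows. This is not the author's procedure: it assumes that $S_n$ is the sum of squared differences between $F_k$ and $f_{\alpha,c}(k)$ for k = 1, ..., n (a least-squares choice), holds β at a hypothetical fixed value in place of the constraint on β, and uses synthetic data generated from formula (3) rather than the Italian series. Because f is linear in α for fixed c, the best α for each c has a closed form, so only c needs to be scanned on a grid.

```python
import math

BETA = 0.2  # hypothetical fixed value, standing in for the constraint on beta


def f(t, alpha, c, beta=BETA):
    """Formula (3): f(t) = alpha*tanh(beta*t - c) + alpha*tanh(c)."""
    return alpha * (math.tanh(beta * t - c) + math.tanh(c))


def s_n(data, alpha, c, beta=BETA):
    """Assumed least-squares S_n: sum over days k = 1..n of (F_k - f(k))^2."""
    return sum((fk - f(k, alpha, c, beta)) ** 2 for k, fk in enumerate(data, start=1))


def best_alpha(data, c, beta=BETA):
    """For fixed c, f = alpha * g(t), so the minimizing alpha is closed-form."""
    g = [math.tanh(beta * k - c) + math.tanh(c) for k in range(1, len(data) + 1)]
    return sum(fk * gk for fk, gk in zip(data, g)) / sum(gk * gk for gk in g)


def fit(data, c_grid, beta=BETA):
    """Scan c, profile out alpha; return the (alpha, c, S_n) with smallest S_n."""
    candidates = ((best_alpha(data, c, beta), c) for c in c_grid)
    return min(((a, c, s_n(data, a, c, beta)) for a, c in candidates), key=lambda x: x[2])


def final_size(alpha, c):
    """f_inf = alpha*(1 + tanh(c)): the cumulative total as t -> infinity."""
    return alpha * (1.0 + math.tanh(c))


# Synthetic "observations" for days 1..20 (not real data).
data = [f(k, 5000.0, 3.0) for k in range(1, 21)]
alpha_hat, c_hat, residual = fit(data, [2.0 + 0.01 * j for j in range(201)])
print(round(alpha_hat, 3), round(c_hat, 3))  # 5000.0 3.0
print(round(final_size(alpha_hat, c_hat)))   # 9975
```

Profiling out α reduces a two-dimensional search to a one-dimensional scan; on real data the grid for c would have to be chosen around the values of $c_n$ in table (1).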
The values of the parameters minimizing the sum $S_n$ are reported in table (1). At the 31st day of the epidemic the final number of infected is estimated to be 76193, but this is only a lower bound, since the values of $\alpha_n$ and $c_n$ are still increasing. A plot of $f_{\alpha_{31},c_{31}}(t)$ and of the cumulative number of infected is reported in figure (1). A fundamental observation is that the function $S_n$ has a basin of low values, shown in detail in figure (2) for a given value of n. This basin of minima seems to indicate that there is a function α(c) giving a family of tanh curves with nearly minimal $S_n$, which we describe by the cubic polynomial

$$\alpha(c) = \sum_{k=0}^{3} a_k c^k. \tag{6}$$

Clearly, by using a number N of pairs $(c_n, \alpha_n)$ to fit the $a_k$, k = 0, ..., 3, we obtain a set of values $\{a_{k,N}\}$. By fitting all the available data (i.e. N = 17), we obtain the values of the coefficients $a_k$ given in equation (7). More terms could be added to the sum (6), but the cubic term is sufficient for a formula accurate enough for our purposes. The plot of the fit is given in figure (5), together with the residuals $\alpha_n - \sum_{k=0}^{3} a_k c_n^k$, where the $a_k$ are those given in equation (7). A comparison between the curve α(c) and the basin of minima of $S_n$ is plotted in figure (6): the red curve is the function (6), and the black dots are the actual values of $(c_n, \alpha_n)$ in table (1). The function α(c) reveals a trend in the data that may be useful: if the number of infections continues to rise in the coming days, it is reasonable to expect that α and c will stay close to the same curve. The model used here is admittedly rough, but it can at least give an idea of the future trend of the data. We are tacitly assuming that there will be no other clusters of infection in Italy in the coming days; this point is discussed later. We now consider the function f in (3) as a function of t and c alone, since the value of α is constrained by the curve (6).
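Fitting the coefficients $a_k$ of the cubic α(c) to the pairs $(c_n, \alpha_n)$ is an ordinary polynomial least-squares problem. A minimal sketch, assuming a plain least-squares criterion (the text does not specify the fitting method): it solves the normal equations in exact rational arithmetic with Python's `fractions` module, which sidesteps their poor floating-point conditioning, and returns the residuals $\alpha_n - \sum_k a_k c_n^k$. The pairs in the usage example are synthetic, not the values of table (1).

```python
from fractions import Fraction


def fit_polynomial(cs, alphas, degree=3):
    """Least-squares a_0..a_degree for alpha(c) = sum_k a_k c^k.

    The normal equations are solved exactly with Fraction, which avoids
    their poor conditioning in floating point.
    """
    m = degree + 1
    xs = [Fraction(c) for c in cs]
    ys = [Fraction(a) for a in alphas]
    # Augmented normal equations [V^T V | V^T alpha], V the Vandermonde matrix.
    aug = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    # V^T V is symmetric positive definite (at least m distinct c values),
    # so elimination without pivoting is safe in exact arithmetic.
    for col in range(m):
        for row in range(col + 1, m):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[row][k] -= factor * aug[col][k]
    coeffs = [Fraction(0)] * m
    for row in range(m - 1, -1, -1):
        tail = sum(aug[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = (aug[row][m] - tail) / aug[row][row]
    return [float(a) for a in coeffs]


def residuals(cs, alphas, coeffs):
    """alpha_n - sum_k a_k c_n^k for each data point."""
    return [a - sum(ak * c ** k for k, ak in enumerate(coeffs)) for c, a in zip(cs, alphas)]


# Synthetic pairs (c_n, alpha_n), N = 17, lying exactly on a known cubic.
cs = [3.0 + 0.1 * j for j in range(17)]
true_coeffs = [2.0, -1.5, 0.4, 0.03]
alphas = [sum(a * c ** k for k, a in enumerate(true_coeffs)) for c in cs]
coeffs = fit_polynomial(cs, alphas)
print([round(a, 6) for a in coeffs])  # [2.0, -1.5, 0.4, 0.03]
print(max(abs(x) for x in residuals(cs, alphas, coeffs)) < 1e-9)  # True
```

Refitting with the first N pairs only, for increasing N, would give the sequence $\{a_{k,N}\}$ mentioned above.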
The derivative of this function with respect to t gives the time of the peak of infections. It is plotted in figure (7): the position of the maximum of the derivative of the cumulative number of infected moves to later times as c (and hence n) increases, up to c ≈ 4.15, and then moves back as c increases further. This gives an upper bound for the time of the peak of new infections (the point where the second derivative of f(t) in (3) vanishes), namely 32 days after the start of the epidemic. In the coming days (at the time of writing, 24 March, we are exactly at day 32) this description can be tested.

The above analysis, despite using a rough function for the total number of infected, gives an upper bound for the time of the peak of new infections (around 23 March) thanks to the observation that the values of $\alpha_n$ are, in a certain sense, not independent of the values of $c_n$ and are well described by a polynomial fit. The hypothesis of scale invariance of the underlying model (which, we repeat, need not be the SIR model) is fundamental for the accuracy of the result. Another underlying assumption is that the restrictive measures will be maintained and observed in the coming days and that there will be no other clusters in the south of Italy (in the language of the SIR model, the values of $S_0$ are below the epidemic threshold, see e.g. [11]). In the unfortunate case of further clusters, the tanh curve could be replaced by a combination of such functions: for two clusters of comparable magnitude,

$$f(t) = \alpha_1 \tanh(\beta_1 t - c_1) + \alpha_1 \tanh(c_1) + \alpha_2 \tanh(\beta_2 t - c_2) + \alpha_2 \tanh(c_2).$$

Other data sets will be analyzed in a forthcoming paper, and the above analysis will be updated if necessary.
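Differentiating (3) gives f'(t) = αβ sech²(βt − c) and f''(t) = −2αβ² sech²(βt − c) tanh(βt − c), so the peak of new infections, where f'' vanishes, lies at t* = c/β. The sketch below checks this on a fine time grid and evaluates the two-cluster combination above. It is an illustration rather than the author's computation: the values of α and β are hypothetical, and c = 4.15 is borrowed from the text only as a sample value.

```python
import math


def f(t, alpha, beta, c):
    """Formula (3): alpha*tanh(beta*t - c) + alpha*tanh(c)."""
    return alpha * math.tanh(beta * t - c) + alpha * math.tanh(c)


def f_prime(t, alpha, beta, c):
    """New infections per unit time: alpha*beta*sech^2(beta*t - c)."""
    return alpha * beta / math.cosh(beta * t - c) ** 2


def peak_time(beta, c):
    """Zero of f''(t) = -2*alpha*beta^2*sech^2(beta*t - c)*tanh(beta*t - c)."""
    return c / beta


def two_clusters(t, cluster1, cluster2):
    """Sum of two tanh curves; each cluster is a tuple (alpha, beta, c)."""
    return f(t, *cluster1) + f(t, *cluster2)


params = (80000.0, 0.25, 4.15)  # hypothetical alpha and beta; c from the text
grid = [0.01 * k for k in range(6001)]  # t in [0, 60]
t_grid = max(grid, key=lambda t: f_prime(t, *params))
print(peak_time(params[1], params[2]))  # about 16.6
print(abs(t_grid - peak_time(params[1], params[2])) <= 0.01)  # True
# The two-cluster combination still satisfies f(0) = 0.
print(abs(two_clusters(0.0, params, (20000.0, 0.3, 6.0))) < 1e-6)  # True
```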
[1] Coronavirus and oil price crash.
[2] Braun, Differential Equations and Their Applications: An Introduction to Applied Mathematics.
[3] Early phylogenetic estimate of the effective reproduction number of SARS-CoV-2.
[4] Analysis and forecast of COVID-19 spreading in China.
[5] High Temperature and High Humidity Reduce the Transmission of COVID-19 (co-author: Weifeng Lv).
[6] Early dynamics of transmission and control of COVID-19: a mathematical modelling study.
[7] Kermack and McKendrick, A Contribution to the Mathematical Theory of Epidemics.
[8] Data analysis for the COVID-19 early dynamics in Northern Italy.
[9] Data analysis for the COVID-19 early dynamics in Northern Italy. The effect of first restrictive measures.
[10] Modelling and predicting the spatio-temporal spread of Coronavirus disease 2019 (COVID-19) in Italy.