key: cord-0845586-zl7j3ono
authors: Fenga, L.; Gaspari, M.
title: Predictive Capacity of COVID-19 Test Positivity Rate
date: 2021-03-08
journal: nan
DOI: 10.1101/2021.03.04.21252897
sha: 53d72bbbea65cc269c92cdb5dc7b37234339e20d
doc_id: 845586
cord_uid: zl7j3ono

Background: COVID-19 infections can spread silently, due to the simultaneous presence of significant numbers of both critical and asymptomatic to mild cases. While reliable data are available for the former (in the form of the number of hospitalizations and/or beds in intensive care units), this is not the case for the latter. Hence, analytical tools designed to generate reliable forecasts and future scenarios should be implemented to help decision makers plan ahead (e.g. medical structures and equipment). Method: The Test Positivity Rate (TPR) is an indicator designed to describe the evolution of an infectious disease by accounting for the proportion of persons who tested positive on a given day. Previous work by one of the authors showed how an alternative formulation of the TPR exhibits a strong correlation with the number of patients admitted to hospitals and intensive care units. In this paper, we investigate the lagged correlation structure between the newly defined TPR and the time series of hospitalized people, exploiting a rigorous statistical model, the Seasonal Autoregressive Integrated Moving Average (SARIMA). Results: The rigorous analytical framework chosen, i.e. the theory of stochastic processes, allowed for reliable forecasting of those quantities about 12 days ahead. The proposed approach would also allow decision makers to forecast the number of beds needed in hospitals and intensive care units 12 days ahead. Conclusion: The obtained results show that a standardized TPR index is a valuable metric to monitor the growth of the COVID-19 epidemic. The index can be computed on a daily basis and it is probably one of the best forecasting tools available today for predicting hospital and intensive care unit overload, being an optimal compromise between simplicity of calculation and accuracy.

One of the aspects that makes the COVID-19 pandemic difficult to control is the simultaneous presence of significant numbers of both critical and asymptomatic to mild cases. While reliable data are available for the former (in the form of the number of hospitalizations and/or beds in ICUs), this is not the case for the latter. In many instances, in fact, those who contracted the virus are unaware of their condition and thus become spreaders. As a result, the infections can grow uncontrolled, with a major impact on the health system. Action-wise, such a situation calls for at least two measures: on the one hand, policy and decision makers should plan ahead the needs in terms of medical structures and equipment whereas, on the other hand, analytical tools designed to generate reliable forecasts and future scenarios should be implemented. While a number of effective approaches have been studied and proposed for different epidemics over the years, this is not the case for the COVID-19 pandemic. In fact, the efforts made so far to model and predict such a disease hardly support the idea that a uniformly "better" model is available to describe and predict the evolution of such a catastrophic pandemic. Therefore, even though many valid contributions have been proposed so far 21, it is not unreasonable to look at those efforts as the building blocks of one or more best practices. In particular, the forecasting problem has been addressed for two of the most populated countries in the world, i.e. China 23 and India 32. A survey including other approaches is presented in 33. The complexity of such a task is discussed in 3, where the authors analyzed three different regional-scale models for forecasting and assessing the course of the pandemic. Along those lines, it is worth mentioning the excellent article 20, where the main reasons leading to the failure of forecasting models are presented.
Finally, two different predictive approaches have been proposed for Italy, i.e. one exploiting the bootstrapped predictions generated by a model of the ARMA type 12 and one based on the simulated annealing algorithm 13.

The Test Positivity Rate (TPR) is one of the indexes often used worldwide for monitoring the progression of the COVID-19 pandemic; see for example the coronavirus testing dataset 17, which contains an updated picture of the international situation concerning testing strategies and the associated data for many countries. Until now, the TPR was mainly studied considering its relationship with confirmed cases 11; for example, it was used to estimate COVID-19 prevalence in the different US states 27. However, a more intensive use of diagnostic tests, associated with a standardization of the TPR, which is crucial in light of the differences among the available tests, can overcome their limited investigative power (see, e.g., 26). In more detail, a recent work by one of the authors 16 shows that a standardized COVID-19 Test Positivity Rate (TPR) can be used to predict hospital overload. In particular, by observing its trend, it is possible to forecast the course of patients admitted to hospitals and intensive care units. For example, when the TPR reaches a peak, a growth in COVID-19 hospitalisations lasting 12-15 days can be inferred. The insight is that the TPR index models the trend of the COVID-19 infections. More precisely, if the TPR increases on a given day, an increasing number of active cases (including the unknown ones) can be inferred for the same day, and presumably the number of infections is increasing too. Thus, after a while the number of hospitalized people will also increase. In other words, the TPR is designed to embody the unknown cases. Clearly, for this measure to be valid, all the administered diagnostic tests should be considered in the TPR calculation, as pointed out in 16.

In this paper, we investigate the predictive capacity of the TPR index, exploiting a rigorous statistical model. The lagged correlation between the TPR and the time series of hospitalized people will be modeled using a SARIMA (short for Seasonal Autoregressive Integrated Moving Average) model. A generalization of the ARIMA (Autoregressive Integrated Moving Average) class ( 6 ), SARIMA models have been introduced to model complex stochastic seasonal dynamics in many fields of research, such as economics ( 14 and 10 ), engineering ( 25 ) or hydrology ( 26 ). In epidemiology, SARIMA models have been applied in a variety of studies: in 28 the authors applied this model to estimate the case occurrence of two diseases, malaria and hepatitis A, from January 1980 to June 1995 in the United States, whereas in 9 the epidemiological and aetiological characteristics of influenza were identified by establishing suitable SARIMA models. In particular, such an approach proved to be accurate in forecasting the percentage of visits for influenza-like illness in urban and rural areas of Shenyang (China). More recently, 24 used the SARIMA method, in conjunction with models belonging to the exponential smoothing class, to predict the trend of acute hemorrhagic conjunctivitis and used the obtained outcomes to provide evidence for the government to formulate policies regarding its prevention in mainland China. The proposed mathematical model allowed us to estimate a predictive lag of about 12 days of the TPR for the prediction of the hospitalized people time series in some Italian regions. Moreover, we defined a methodology to forecast the number of beds needed in hospitals and intensive care units 12 days ahead. The obtained results show that a standardized TPR index is a valuable metric to monitor the growth of the COVID-19 epidemic.
The index can be computed daily and it is probably one of the best forecasting tools available today for monitoring hospital and intensive care unit overload, being an optimal compromise between simplicity of calculation and accuracy.

As already pointed out, the TPR is one of the metrics commonly used to infer the level of transmission of a disease in a population 7 and, as such, has also been used in the case of COVID-19 for different purposes; see for example 17;27;29. However, when different types of tests are used, as happened during the second phase of the pandemic in Italy, where antigen tests have been extensively used, the definition of the TPR becomes more critical. In this study, we will use a standardized version of the TPR index defined by one of the authors 16, which allows antigen tests to be integrated in the index calculation.

medRxiv preprint doi: https://doi.org/10.1101/2021.03.04.21252897; this version posted March 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Following the style of 16, where the Greek letters Θ, Φ and µ have been replaced respectively with the letters τ, ρ and ω for consistency with the statistical notation employed later, the mean TPR index τ over ω days is defined as follows:

$$\tau = \frac{dayP_{\omega}}{dayT_{\omega} + dayA_{\omega} - dayR_{\omega} - P_r} \tag{1}$$

where $dayP_{\omega}$, $dayT_{\omega}$ and $dayA_{\omega}$ are respectively the average values of new positive cases, molecular (PCR) tests and antigen tests done in the last ω days.

To compute the TPR index, the average number of healed patients in the last ω days, $dayR_{\omega}$, and an estimate of the number of repeated tests, $P_r$, are subtracted from the total number of tests. We assume that at least one test is done for each healed patient. The number of repeated tests $P_r$ is computed using formula (2), following the approach presented in 16:

$$P_r = \frac{dayA \cdot dayP}{dayT + dayA} \tag{2}$$

This formula is obtained by assuming that the positivity rates for antigen tests and molecular tests are the same, and thus $dayA/P_r = dayT/(dayP - P_r)$. Using this approach, the computed $P_r$ can be considered an upper bound, because the positivity rate of molecular tests is generally greater than that of antigen tests, which are mainly used for screening purposes; see for example 34. Finally, following the style of 16, a factor ρ is added to τ in order to model the impact of the number of tests on the remaining susceptible individuals, who are computed by removing the total infected cases I from the population N of a given region. The number of tests is subtracted after removing the repeated ones and those used for healed patients, obtaining the following formula:

and the TPR index $\tau_{\omega}$ is defined as follows:

Of course, in general it would be possible to define more precise measures if the data were provided in a more structured form. In this regard see, for example, the extended version of the TPR index presented in 16. However, adding more information would make the collection of data more difficult, and the computation of the TPR on a daily basis would become unfeasible or impractical. Indeed, there is a trade-off between what can actually be collected and what can be effectively represented and used to compute the TPR daily. Using the above definition, the TPR can be computed on any given day for any region/state (see, e.g., Figure 1, where the TPR index for the Toscana region is reported from Sept. 2 2020 to Feb. 10 2021). This figure also plots the time series of patients admitted to hospitals and intensive care units. An interesting correlation between the curves can be observed: the TPR peak anticipates the peak of patients admitted to hospitals and intensive care units. This property holds only if the data on the diagnostic tests used are complete 16. There is an intuitive motivation behind this effect: considering COVID-19 epidemiological data, we know that symptoms on average occur 11 days after the contraction of the infection and that critical patients are admitted to hospital about 4 days later. If we assume that the TPR is a measurement of the infections occurring on a given day then, in an ideal situation, the infected people with a critical evolution will presumably be admitted to hospital 15 days later. However, there are known biases involving diagnostic test data that are difficult to deal with, e.g. those related to reporting delays 17. As a result, the ideal predictive capacity cannot be assumed in practice, especially if different kinds of tests are used, as in the case of the current Italian situation.
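For concreteness, the part of the index that the text specifies can be sketched in a few lines of Python. The repeated-test estimate follows directly from the identity dayA/P_r = dayT/(dayP − P_r) stated above; the denominator of the mean TPR is an assumption based only on the verbal description (total tests minus healed patients minus repeated tests). The function names and the input numbers are hypothetical, and the factor ρ is deliberately left out because its formula is not recoverable from the text.

```python
def repeated_tests(day_p, day_t, day_a):
    """Estimate P_r by solving dayA / P_r = dayT / (dayP - P_r) for P_r."""
    total_tests = day_t + day_a
    if total_tests <= 0:
        raise ValueError("at least one test is required")
    return day_a * day_p / total_tests


def mean_tpr(day_p, day_t, day_a, day_r):
    """Positives over tests, net of healed patients and repeated tests.

    ASSUMPTION: this denominator is a reading of the verbal description,
    not a formula quoted from the paper.
    """
    p_r = repeated_tests(day_p, day_t, day_a)
    net_tests = day_t + day_a - day_r - p_r
    if net_tests <= 0:
        raise ValueError("non-positive number of net tests")
    return day_p / net_tests


# Hypothetical omega-day averages for one region (made-up numbers).
p_r = repeated_tests(day_p=1000.0, day_t=8000.0, day_a=2000.0)
tpr = mean_tpr(day_p=1000.0, day_t=8000.0, day_a=2000.0, day_r=300.0)
print(p_r, tpr)
```

With these made-up inputs, both sides of the identity equal 10 (2000/200 and 8000/800), which is a quick way to check the algebra.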

Despite these limitations in the provided data, the TPR can be effectively used to deduce important information on the course of the disease, as illustrated in Figure 2, where the epidemic course in the Toscana region in autumn 2020 is depicted.

The aim of this research is to analyse this scenario in detail, to get to the heart of some hard-hitting questions, especially when the TPR is growing considerably. How many days will be needed to reach the peak of hospitalized people? How many hospital beds will need to be added? And, in general, what is the "theoretical" predictive capacity of the proposed TPR index?

Starting from this motivation, we analysed the TPR index time series, as well as the time series of hospitalized and ICU patients, to identify the time lags that can be effectively inferred from the available data. We first introduce the statistical methodology used and then present a detailed analysis for four Italian regions, for which data on antigen tests were available, as reported in 16.


Throughout the paper, the time series of interest, say $x_t$, is always intended to be a real-valued, uniformly sampled sequence of data points of length T, formally expressed as

$$\{x_t\}_{t=1}^{T} = \{x_1, x_2, \dots, x_T\}, \qquad x_t \in \mathbb{R}. \tag{5}$$

Furthermore, $x_t$ is supposed to be a realization of an underlying stochastic process of the SARIMA type (short for Seasonal Autoregressive Integrated Moving Average). A generalization of the ARIMA (Autoregressive Integrated Moving Average) class ( 6 ), SARIMA models have been introduced to model complex stochastic seasonal dynamics in many fields of research, such as economics ( 14 and 10 ), engineering ( 25 ) or hydrology ( 26 ). In epidemiology, SARIMA models have been applied in a variety of studies: in 28 the authors applied this model to estimate the case occurrence of two diseases, malaria and hepatitis A, from January 1980 to June 1995 in the United States, whereas in 9 the epidemiological and aetiological characteristics of influenza were identified by establishing suitable SARIMA models. In particular, such an approach proved to be accurate in forecasting the percentage of visits for influenza-like illness in urban and rural areas of Shenyang (China). More recently, 24 used the SARIMA method, in conjunction with models belonging to the exponential smoothing class, to predict the trend of acute hemorrhagic conjunctivitis and employed the obtained outcomes to provide evidence for the government to formulate policies regarding its prevention in mainland China.

Mathematically, SARIMA models take the form of a t-indexed difference equation, with t as defined in (5), i.e.:

$$\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,x_t = \theta_q(B)\,\Theta_Q(B^{s})\,\alpha_t, \tag{6}$$

where B denotes the backward shift operator ($B^{k}x_t = x_{t-k}$), d and D the non-seasonal and seasonal differencing orders respectively, s the seasonal period, and $\phi_p(B)$, $\theta_q(B)$, $\Phi_P(B^{s})$ and $\Theta_Q(B^{s})$ are polynomials in B of orders p, q, P and Q.

Here, φ, θ, Φ, Θ respectively denote the non-seasonal autoregressive and moving average parameters and the seasonal autoregressive and moving average parameters. Finally, $\alpha_t$ is a zero-mean white noise with finite variance $\sigma^2$. In the present paper, external information is exploited and embodied in (6) in the form of a matrix of regressors $D_{j,t-k}$, with $k \in \mathbb{Z}^{+}$, weighted by a vector of coefficients $\beta_j$, i.e.

$$x_t = \sum_{j}\beta_j D_{j,t-k} + u_t, \tag{7}$$

where $u_t$ is the regression error, whose dynamics are described by the SARIMA operators in (6).

This particular extension is usually referred to as REG-SARIMA, to stress the role played by the possibly lagged (by an amount equal to k temporal lags) regressors stored in the matrix $D_{j,t}$. This type of model is designed to capture the stochastic dynamics generated by the residuals obtained by regressing the time series of interest (the dependent variable) on the matrix D (the independent variable(s)). A better insight into the stochastic mechanism governing the REG-SARIMA equation can be gained by re-expressing equation (6) so as to emphasize the role played by the term $u_t$ in (7), i.e.

$$\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\Big(x_t - \sum_{j}\beta_j D_{j,t-k}\Big) = \theta_q(B)\,\Theta_Q(B^{s})\,\alpha_t.$$

This formulation makes clear the flexibility of this approach which allows the extraction of the significant lags at which the different regressors impact the time series of interest as well as their magnitudes.
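As a rough illustration of what "extracting the significant lags" means, the sketch below scans candidate lags k and reports the one at which a regressor is most correlated with a target series. This is a deliberately simpler stand-in (plain lagged Pearson correlation), not the REG-SARIMA estimation used in the paper; the function names and the synthetic data, in which the target follows the regressor with a built-in 12-step delay, are assumptions made for the example.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def lagged_correlations(regressor, target, max_lag):
    """Correlation between regressor[t - k] and target[t] for k = 0..max_lag."""
    n = len(regressor)
    return {k: pearson(regressor[: n - k], target[k:]) for k in range(max_lag + 1)}


# Synthetic example: the target reacts to the regressor 12 steps later.
random.seed(0)
tpr = [random.random() for _ in range(150)]
hosp = [2.0 * tpr[max(t - 12, 0)] + 1.0 for t in range(150)]

corr = lagged_correlations(tpr, hosp, max_lag=20)
best_lag = max(corr, key=corr.get)
print(best_lag)
```

A cross-correlation scan like this ignores the autocorrelation of both series, which is precisely what the regression-plus-SARIMA-errors formulation is meant to account for.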

While the integration constants d and D (introduced in Equation 8) are certainly useful to mitigate, if not solve altogether, many stationarity problems, they might not be effective against non-normality and/or heteroskedasticity issues. Unfortunately, the data considered in this paper are affected by both these phenomena and therefore, as a coping mechanism, the well-known one-parameter Box-Cox data transformation has been adopted. Presented in the mid-sixties in 5, this method has been discussed and applied in a wide range of problems (see, among others, 31, 19 and 22), given the widespread acceptance gained over the years. Its mathematical formulation is quite straightforward and takes the form of a power transformation, i.e.

$$x_t^{(\lambda)} = \begin{cases} \dfrac{x_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt] \log x_t, & \lambda = 0. \end{cases}$$
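The one-parameter Box-Cox transformation and its inverse are short enough to write out directly. The sketch below is a minimal pure-Python version, assuming strictly positive data; the function names and the sample values are illustrative only.

```python
import math


def box_cox(values, lam):
    """One-parameter Box-Cox transform; lam == 0 gives the natural log."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_box_cox(values, lam):
    """Map transformed values back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in values]


beds = [120.0, 150.0, 210.0, 340.0]  # made-up daily bed counts
logged = box_cox(beds, 0.0)          # lam = 0, i.e. a log transform
restored = inverse_box_cox(logged, 0.0)
```

Forecasts produced on the transformed scale need the inverse step before they can be read as numbers of beds.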

By embodying the λ parameter in Equation 7, the model employed in this paper is finally defined, i.e.

The inference procedures carried out for the estimation of Equation 9 are of two types: maximum likelihood for the SARIMA parameters {φ, θ, Φ, Θ, d, D} and ordinary least squares for the vector β. Finally, the hyper-parameters {(p, d, q, P, D, Q)} as well as the Box-Cox constant λ are estimated within the framework of information theory, as explained in the following section.

Akaike's Information Criterion (AIC) ( 1, 8, 18 ), one of the most popular model selectors, is employed to choose the SARIMA model order as well as the Box-Cox λ parameter. The selection of those constants is not a trivial task, as it entails the solution of a conditional multi-objective problem induced by the 6-dimensional vector of unknown constants Γ ≡ {(p, d, q, P, D, Q)}, conditional on the Box-Cox parameter λ. The estimation method employed to find the "best" conditioned vector of hyper-parameters, that is the one governing the selected order structure $M^{*} \equiv (\Gamma^{*} \mid \lambda^{*})$, relies on information theory and, in particular, on the AIC. At its core, AIC is based on an estimate of the expected relative entropy (the Kullback-Leibler divergence) of an estimated model, that is its degree of divergence from the "true" theoretical model. Assuming $X_t$ to be randomly drawn from an unknown distribution H(x) with density h(x), estimation of h is done by means of a parametric family of distributions with densities $\{f(x|\theta);\ \theta \in \Theta\}$, θ being the vector of unknown parameters. Denoting by $f(z|\hat\theta)$ the predictive density function, by h the true model and by f the approximating one, the Kullback-Leibler divergence takes the form

$$I(h; f) = E_H\left[\log\frac{h(Z)}{f(Z|\hat\theta)}\right],$$

which, after some algebra, can be written as follows:

$$I(h; f) = E_H\left[\log h(Z)\right] - E_H\left[\log f(Z|\hat\theta)\right].$$

The second term, the expected log-likelihood $L(X_T; H) = E_H[\log f(Z|\hat\theta)]$, can be estimated by replacing H with its empirical distribution $\hat H$, so that

$$L(X_T;\hat H) = \frac{1}{T}\sum_{\alpha=1}^{T}\log f(X_\alpha|\hat\theta).$$

This quantity overestimates the expected log-likelihood, given that $\hat H$ is closer to $\hat\theta$ than H. The related bias can be written as follows:

$$b(H) = E_H\left[L(X_T;\hat H) - L(X_T; H)\right].$$

Denoting by the Greek letter ξ the number of estimated parameters, Akaike proved that $b(H) \approx \xi/T$, so that the bias-corrected information-based criterion takes the form $L(X_T;\hat H) - \xi/T$. By multiplying this quantity by $-2T$, AIC is finally defined as

$$\mathrm{AIC} = -2\sum_{\alpha=1}^{T}\log f(X_\alpha|\hat\theta) + 2\xi.$$

Elaborating on 30 , the correct formulation of AIC for the model expressed in Equation 9 takes the form

By sequentially applying Equation 11 for different combinations of the hyper-parameters {(p, d, q, P, D, Q)} and conditioning the observed data on a given λ parameter (which in Equation 12 has been denoted by $\lambda_0$), a sequence of AIC values is obtained. This is the first step of the two-step selection strategy adopted in the present paper, which is usually referred to as the MAICE (short for Minimum AIC Estimate) procedure 2. In the second step, the order ($\Gamma^{*}$) satisfying

$$\Gamma^{*} = \arg\min_{\Gamma \le \Gamma_0} \mathrm{AIC}(\Gamma \mid \lambda_0), \tag{13}$$

i.e. the minimizer of the AICs generated by the candidate models, identifies the winning model structure. However, Equations 12 and 13 are not designed to estimate the Box-Cox λ parameter. To this end, a grid search approach over a set Λ of B competing parameters $\{\lambda_j;\ j = 1, 2, \dots, B\}$ has been applied. Each λ has been evaluated in terms of its contribution to both data normalization and the statistical significance of the external regressor. Finally, the MAICE procedure requires the definition of an upper bound for all the Γ parameters, i.e. the maximum order a given process can reach. This choice, unfortunately, is a priori and arbitrary.
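The logic of a minimum-AIC search can be sketched on a much smaller model class than the one used in the paper. The example below is an assumption-laden stand-in: it considers pure AR(p) candidates only (no seasonal, moving average or regression terms), fits them by Yule-Walker via the Levinson-Durbin recursion instead of maximum likelihood, and uses the Gaussian approximation AIC ≈ T log(σ²_p) + 2(p + 1), dropping constants common to all candidates. All function names and the simulated series are hypothetical.

```python
import math
import random


def autocovariances(x, max_lag):
    """Biased sample autocovariances gamma(0), ..., gamma(max_lag)."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[t - k] for t in range(k, n)) / n for k in range(max_lag + 1)]


def innovation_variances(x, max_order):
    """Levinson-Durbin recursion: innovation variance of AR(p), p = 0..max_order."""
    gamma = autocovariances(x, max_order)
    variances = [gamma[0]]
    phi = []  # AR coefficients of the current order
    for p in range(1, max_order + 1):
        acc = gamma[p] - sum(phi[j] * gamma[p - 1 - j] for j in range(p - 1))
        k = acc / variances[-1]  # reflection coefficient
        phi = [phi[j] - k * phi[p - 2 - j] for j in range(p - 1)] + [k]
        variances.append(variances[-1] * (1.0 - k * k))
    return variances


def aic_by_order(x, max_order):
    """Approximate Gaussian AIC for each AR order (p coefficients + 1 variance)."""
    n = len(x)
    return [n * math.log(v) + 2 * (p + 1)
            for p, v in enumerate(innovation_variances(x, max_order))]


def maice_order(x, max_order):
    """Order with the smallest AIC among the candidates 0..max_order."""
    aics = aic_by_order(x, max_order)
    return min(range(len(aics)), key=aics.__getitem__)


# Simulated AR(1) series with coefficient 0.8 (made-up data).
random.seed(1)
series = [0.0]
for _ in range(499):
    series.append(0.8 * series[-1] + random.gauss(0.0, 1.0))

aics = aic_by_order(series, max_order=5)
best = maice_order(series, max_order=5)
print(best, aics)
```

The `max_order` argument plays the role of the arbitrary upper bound mentioned above: the search can never return a structure richer than the one allowed a priori.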

persons tested positive for COVID-19; the number of tests done considering both molecular (PCR) tests and antigen tests, and the number of healed persons), and those related to the number of hospitalizations and beds in intensive care units occupied by patients who tested positive for COVID-19. The considered time frame ranges from Sept. 2 2020 to Feb. 10 2021, for a total of 353 data points. We have analysed 4 Italian regions for which the collection of the data on the antigen-based tests administered from Oct. 2020 to the 15th of Jan. 2021 has been possible, i.e. Toscana, Veneto, Piemonte and Alto Adige. The interested reader may refer to 16 for the details of the data collection procedure. Unfortunately, certain data concerning the use of diagnostic tests in the considered time frame are still not available for the other Italian regions. Figure 3 presents the TPR and hospitalised time series for Veneto, Piemonte and Alto Adige, while those related to the Toscana region have already been introduced in Figure 1. The presented empirical experiment considers two different scenarios, according to the way the available information is used. Their aim is to answer the hard-hitting questions that we have set in Figure 2. The first one, which can be defined as real-life, exploits the whole data set and is designed to analyse the predictive capacity of the TPR, to deliver a "theoretical" time lag between the two series, and to produce predictions which, by design, cannot be verified, being projected into the unknown future. On the contrary, the second experiment concerns forecasting the number of beds needed in hospitals and intensive care units after the determined time lag in specific past situations, which can be verified using the available data.

In essence, this part of the experiment, being based on the whole data set, can support only qualitative considerations on the proposed method. In accordance with the intuition that TPR values represent the evolution of infections, the TPR should impact the hospitalization time series 15 days in advance. The approximate period of 15 days between infection and hospitalization is obtained by adding to the average time of 11 days (between the infection and the onset of symptoms) the 4 days usually reported from symptoms to hospitalization. However, such an estimate might be affected by retrospective revisions of the testing time series, which are recorded on the day the revision is made rather than on the day the tests actually occurred 17.

Studying the lagged correlations between the TPR time series and those of patients admitted to hospitals and ICUs using the SARIMA model, we have identified a predictive time lag of about 12 days for all the studied regions, which is consistent with our intuitive hypothesis. Indeed, a 12-day predictive capacity for the TPR with respect to hospitalized patients, instead of the hypothesised 15 days, can be reasonably expected considering the above-mentioned effect of retrospective revisions. The outputs of the SARIMA models are summarized in Table 1. This table also reports for each region (last column) an indicative value for the variation in the number of hospital beds needed if the TPR varies by one unit. Although these values, being future estimates, necessarily embody different amounts of uncertainty, they can be used for empirical considerations. For example, in the Veneto region, if the TPR increases by one unit, about 82 additional beds may be needed in the near future (after 12 days). The same consideration holds for intensive care units, for which the additional beds are about 12. Vice versa, if the TPR decreases in Veneto, a similar number of beds should be subtracted. In the considered regions, the average variations of beds in hospitals and ICUs are 63 and 16 respectively.
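The rule of thumb above amounts to a one-line calculation, sketched here for the Veneto figures quoted in the text (about 82 hospital beds and 12 ICU beds per unit of TPR). Scaling these values linearly to TPR changes other than one unit is an assumption of this sketch, not a claim made by the paper, and the function name is hypothetical.

```python
def expected_bed_change(beds_per_tpr_unit, tpr_change):
    """Indicative change in occupied beds about 12 days after a TPR change.

    ASSUMPTION: the per-unit value is extrapolated linearly.
    """
    return beds_per_tpr_unit * tpr_change


VENETO_HOSPITAL_BEDS_PER_UNIT = 82  # value quoted in the text
VENETO_ICU_BEDS_PER_UNIT = 12       # value quoted in the text

hospital_up = expected_bed_change(VENETO_HOSPITAL_BEDS_PER_UNIT, 1.5)
icu_up = expected_bed_change(VENETO_ICU_BEDS_PER_UNIT, 1.5)
hospital_down = expected_bed_change(VENETO_HOSPITAL_BEDS_PER_UNIT, -2.0)
print(hospital_up, icu_up, hospital_down)
```

Because the models are fitted on log-transformed data, such figures are best read as orders of magnitude rather than exact requirements.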

The second scenario has been built to assess the forecasting performance delivered by the proposed method and has been carried out considering different situations in all the studied regions. Such a quantitative evaluation has been conducted using a test set, i.e. a portion of the data employed to provide an unbiased evaluation of the final model previously fitted on the training dataset. Table 2 shows how the time series have been broken down into training and test sets for the different regions. It is worth emphasizing that the test sets have been defined with the aim of evaluating the performance of our approach under specific conditions, i.e. we have considered the following situations: two in which the TPR was growing considerably, in Toscana and Alto Adige; one associated with the beginning of the "red zone" in Piemonte; one characterized by a slow growth of the TPR index in Veneto; and one associated with a fast lowering of the TPR indicator in Veneto. In the three-tiered system issued in Italy to combat the spread of COVID-19, the "red zone" indicates a high-contagion-risk area where non-essential shops and markets are closed and residents are only allowed to leave their homes for work, health reasons or emergencies.
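The mechanics of such a hold-out evaluation can be sketched as follows. The cut point, the 12-step horizon, the naive last-value forecast standing in for the fitted model, and the use of the mean absolute error are all assumptions made for illustration; the paper does not state which error measure was used.

```python
def split_series(series, cut):
    """Split a series into training (before cut) and test (from cut on)."""
    if not 0 < cut < len(series):
        raise ValueError("cut must fall strictly inside the series")
    return series[:cut], series[cut:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


hospitalized = [float(100 + 5 * t) for t in range(40)]  # made-up series
train, test = split_series(hospitalized, cut=28)

horizon = 12
naive_forecast = [train[-1]] * horizon  # placeholder for a model forecast
mae = mean_absolute_error(test[:horizon], naive_forecast)
print(mae)
```

Only observations before the cut are used to produce the forecast, so the error on the test portion is not contaminated by the fitting step.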

As for the REG-SARIMA model, as already mentioned in Section 2.2, the model order has been defined using the MAICE procedure and constraining the Box-Cox λ parameter to 0 (i.e. log-transforming the data). However, since an exhaustive search of the "best" REG-SARIMA model is either unfeasible or impractical for computational reasons, the competition set has been built following the Box-Jenkins procedure, as illustrated, e.g., in 6. Almost all the parameters of the final models are statistically significant and generate a sequence of residuals which can be deemed acceptable in terms of whiteness. Most of the time, the Maximum Likelihood algorithm converged quickly, with the only exception of the Piemonte region. In this case, a "sparse" data generating process in the autoregressive part required a lengthy trial-and-error estimation approach for the definition of the "best" (in the AIC sense) non-seasonal structure of the model.

As already pointed out, the adopted MAICE procedure (13) is constrained to a specific value of the Box-Cox constant, which has therefore been set to $\lambda_0 = 0$. As for the maximum order $\Gamma_0$, it has been arbitrarily chosen on a case-by-case basis.

The results of the forecasting experiments are summarized in Figure 4. The reader will certainly notice that the best forecasting results are obtained in the last experiment, concerning the fast TPR lowering scenario in the Veneto region, where more data are available. However, all the other examples provide reasonable results and, most importantly, when a fast growth of the TPR was present before the cut, significant increases in hospitalizations are estimated. The determined increments are in general comparable to the generic estimates presented in Table 1.

The proposed approach is general and can be exploited in any region/state under the condition that the set of requirements reported below is satisfied:

3. The TPR should reach a peak before the hospitalized and ICU patients reach theirs;

In general, if the first two requirements hold, the third one should also hold; conversely, if this is not the case, other anomalies or errors are probably present in the provided data. Should one or more of the above-mentioned requirements be unfulfilled, the predictive properties of the TPR might be affected. If this is the case, an integration effort should be made to collect the missing data and/or correct possible errors. For example, even though requirement 2 was not met for the Alto Adige region, we were able to analyse the TPR by manually adding the missing information to the time series of the new positives 16.

At this point, it is worth comparing the TPR index with other COVID-19 key indicators commonly used for monitoring purposes 15, in order to assess their predictive properties. In particular, we have chosen the following indicators, designed to measure the dynamical behavior of the infections, i.e.:

• growth rate: positives daily variation;

• incidence: number of COVID-19 positives per 100,000 individuals;

• reproduction number $R_t$: number of secondary infections generated from a case at time t.

Table 3 shows the pure predictive capacity with respect to hospitalizations of these COVID-19 indicators, for comparison with the TPR. While the TPR can be considered a measure of the number of infections that occur on a certain day, also accounting for unknown cases, indicators based on officially reported positive cases (e.g. incidence and growth rate) measure the variation of official cases in a given area. Assuming that critical cases are admitted to hospital within 4 days of testing positive, such a delay can be taken as an approximate "upper bound" for their pure predictive capacity. As for the

In this paper, we have presented a forecasting method for the short-term prediction of the impact of COVID-19 on the public health system. To this end, we have provided evidence of the validity of the TPR as a leading indicator for both the number of people hospitalized and, within this group, those who required a bed in intensive care units. The theoretical framework chosen, that is time series analysis, has been particularly useful for the dynamic comparison and the exploitation of the information contained in the TPR time series. In our simulations, the chosen model, of the REG-SARIMA type, was able to generate reliable predictions from a minimum of 8 to 12 lags. However, especially in light of new developments of the disease, which take the form of many variants, the prediction performance of the REG-SARIMA model might be affected, if not impaired altogether. Therefore, future directions include the study of a more appropriate model, e.g. of the regime-switching type. Furthermore, additional external information (e.g. the time-varying percentage of critical cases) could be fruitfully exploited in a Bayesian theoretical framework (e.g. Bayesian Hidden Markov Models 35 ) or using heuristic-based approaches (e.g. Dempster-Shafer techniques 4 ). Finally, we will consider the remaining Italian regions as soon as time series of "enough" length become available.


The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. This version posted March 8, 2021; https://doi.org/10.1101/2021.03.04.21252897 (medRxiv preprint).


A new look at the statistical model identification

Modern development of statistical methods

The challenges of modeling and forecasting the spread of COVID-19

The Dempster-Shafer theory of evidence: an alternative approach to multicriteria decision modelling

An analysis of transformations

Time series analysis: forecasting and control

Practical implications of the non-linear relationship between the test positivity rate and malaria incidence

Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions

Epidemiological features and time-series analysis of influenza incidence in urban and rural areas of Shenyang, China

Forecast of SARIMA models: An application to unemployment rates of Greece

Test positivity rates and actual incidence and growth of diseases

Forecasting the COVID-19 diffusion in Italy and the related occupancy of intensive care units

Covid19 meta heuristic optimization based forecast method on time dependent bootstrapped data

Time-series forecasting of the German unemployment rate

Monitoring the COVID-19 epidemic in the context of widespread local transmission

COVID-19 test positivity rate as a marker for hospital overload. medRxiv

A cross-country database of COVID-19 testing

Akaike information criterion

The estimation of economic depreciation using vintage asset prices: An application of the Box-Cox power transformation

Forecasting for COVID-19 has failed

Predictive mathematical models of the COVID-19 pandemic: underlying principles and value of projections

Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol-lowering drugs

Trend and forecasting of the COVID-19 outbreak in China

Forecast of the trend in incidence of acute hemorrhagic conjunctivitis in China from 2011-2019 using the seasonal autoregressive integrated moving average (SARIMA) and exponential smoothing (ETS) models

An algorithm for traffic flow prediction based on improved SARIMA and GA

Estimation of water demand in Iran based on SARIMA models

Using test positivity and reported case rates to estimate state-level COVID-19 prevalence in the United States

Dynamic linear model and SARIMA: a comparison of their forecasting performance in epidemiology

Considerations for implementing and adjusting public health and social measures in the context of COVID-19: interim guidance

On the order determination of ARIMA models

The Box-Cox transformation technique: a review

Modeling and forecasting the COVID-19 pandemic in India

Forecasting models for coronavirus disease (COVID-19): a survey of the state-of-the-art

Clinical application of a rapid antigen test for the detection of SARS-CoV-2 infection in symptomatic and asymptomatic patients evaluated in the emergency department: a preliminary report

Inference in hidden Markov models

The author would like to thank the Italian Civil Protection Department, and all the staff involved for providing the data of the outbreak used in this study.

The author declares that he has no conflict of interest.