key: cord-0291994-uqm8qkho authors: Drewes, H.; Flaeschner, G.; Moeller, P. title: Improving the reproduction number calculation by treating for daily variations of SARS-CoV-2 cases date: 2021-08-23 journal: nan DOI: 10.1101/2021.08.15.21262071 sha: 07e1f0815e36b63a5638f19dd315f718d813f0d4 doc_id: 291994 cord_uid: uqm8qkho

The Covid-19 pandemic has affected human life all over the globe, starting in the year of its emergence, 2019, and continuing in the following years. A key epidemiological indicator that gained particular recognition in politics and decision making is the time-dependent reproduction number $R_t$, which institutions responsible for disease control commonly calculate following a method presented by Cori et al. Here, we propose an improved as well as an alternative method, both of which make the calculation more stable against oscillations arising from daily variations in testing. Both methods can be used without great statistical knowledge or effort. They provide a smoother result without increasing the time lag and offer an advantage particularly on the timescale of weeks, which might serve as a better basis for forecasts and the raising of alarms.

The $R$-value describes the average number of people an infected individual is expected to infect. It therefore acts as a measure of transmissibility and can provide feedback on the effectiveness of, and the need for, interventions. Particularly during the Covid-19 pandemic, this measure gained recognition, although experts warned against focusing on it exclusively, as it does not account for all the complex dynamics of a pandemic [1]. Nonetheless, the $R$-value provides an important and easy-to-grasp concept, which aids political decision making. An $R > 1$ indicates an acceleration of the pandemic and, conversely, an $R < 1$ leads to a slowing-down. Political decision makers are therefore anxious to keep $R$ below 1, and tighten regulations otherwise.
medRxiv preprint, doi: https://doi.org/10.1101/2021.08.15.21262071; this version posted August 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Different methods to calculate the $R$-value have been developed, with varying degrees of complexity depending on the requirements (e.g., ease of use and the statistical information provided). Cori et al. presented a method of particular use for non-experts in pandemic model building, as it is easy to implement and robust in its application. The calculation is based on the incidence time series and the serial interval, which is the time offset between the onset of symptoms in a primary case and in a secondary case. This approach is prominently applied by the Robert Koch Institute [2] (RKI), the German federal government agency responsible for disease control, but also by the Swiss government [3]. In the case of the RKI, the calculation is based on nowcasting [4] numbers, which estimate the progression of the number of Covid-19 infections and suppress oscillations caused by reporting delays. However, in many databases and for many countries this kind of nowcasting data is not available. In all cases, these data are subject to noise, which is why the RKI distinguishes two different $R$-values, depending on the smoothing interval of $\tau$ days. The smoothed reproduction number $R_{t,\tau}$ is calculated as [2]:

$$R_{t,\tau} = \frac{\bar{N}_{t,\tau}}{\bar{N}_{t-t_{si},\tau}} \qquad (1)$$

where

$$\bar{N}_{t,\tau} = \frac{1}{\tau}\sum_{i=0}^{\tau-1} N_{t-i} \qquad (2)$$

is the number of new infections on day $t$ averaged over $\tau$ days, and $t_{si} = 4$ days in the case of the SARS-CoV-2 virus. Typically, an averaging interval of $\tau = 7$ is used. A great advantage of this method, to which we will refer as 'Cori's method', is that it can easily be implemented even in spreadsheets.
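The smoothed calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the ratio-of-means form, in which the mean of new cases over the last $\tau$ days is divided by the same mean taken $t_{si}$ days earlier. The function name `smoothed_r_arithmetic`, the trailing-window convention, and the synthetic input are assumptions made for this example, not the RKI's implementation.

```python
import numpy as np

def smoothed_r_arithmetic(cases, tau=7, t_si=4):
    """Ratio of two trailing tau-day arithmetic means that lie t_si days apart.

    Returns an array of the same length as `cases`; entries for which the
    earlier window is not fully available are NaN.
    """
    cases = np.asarray(cases, dtype=float)
    r = np.full(cases.shape, np.nan)
    for t in range(tau - 1 + t_si, len(cases)):
        recent = cases[t - tau + 1 : t + 1].mean()                  # days t-tau+1 .. t
        earlier = cases[t - t_si - tau + 1 : t - t_si + 1].mean()   # same window, t_si days earlier
        r[t] = recent / earlier
    return r

# Noise-free check (assumed input): counts that double every 4 days, i.e. R = 2.
days = np.arange(30)
cases = 100.0 * 2.0 ** (days / 4)
print(smoothed_r_arithmetic(cases)[-1])
```

With $\tau = 7$ and $t_{si} = 4$, each estimate draws on 11 consecutive daily counts, which is why the first ten entries of the result are left undefined.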
We will show (1) that our different approach provides better results in terms of robustness against periodic oscillations and (2) that this approach can be transferred to the standard method by changing the arithmetic mean to a geometric mean, as shown at the end of the results section.

We created an $R$-value test function (see Figure 1 and Supplementary Information) to illustrate the differences between the $R$-value calculation methods, and generated the curve of hypothetical new infections $\tilde{N}_t$ per day based on the formula

$$\tilde{N}_t = \tilde{N}_{t-1}\, R_{t-1}^{\Delta t / t_{si}} \qquad (3)$$

where $\Delta t = 1$ day and the serial interval is $t_{si} = 4$ days. Eq. 3 is obtained by rearranging the definition of $R$. Furthermore, we introduce a multiplicative random noise term $(1 + z_t)$ and a multiplicative term that introduces cyclical variations $(1 + w_t)$, to mimic variations in daily reporting and medical diagnostics [5], and write for the number of reported infections

$$N_t = \tilde{N}_t\,(1 + z_t)(1 + w_t).$$

Here, $z_t$ is a value drawn from a random distribution with mean 0 and standard deviation $\sigma_z = 0.1$, and $w_t = A_w \cos(2\pi t / T - \varphi)$, where $A_w$ is the amplitude, chosen to be 0.25, $\varphi = 2\pi d / 7$ is a phase offset such that the maximum of the cosine falls on the $d$-th day of the week, and $T$ is the period of the cyclic oscillations, which we found to be 7 days, as explained by the length of the week. All parameters were chosen such that $N_t$ mimics the data from the Humanitarian Data Exchange [6] for Germany from 02.03.2020 to 29.04.2020, see Fig. 1. To extract the $R$-value, we rewrite Eq. 3 by taking the logarithm,

$$\ln \tilde{N}_t = \frac{\ln R_{t-1}}{t_{si}}\,\Delta t + \ln \tilde{N}_{t-1} \qquad (4)$$

such that $\ln(R_{t-1})/t_{si}$ is the slope of a linear equation for a constant $R_{t-1}$.
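The test-function construction described above could be sketched as follows. Several details are assumptions made for illustration and are not specified in the text: the noise $z_t$ is drawn from a normal distribution, the series starts at an arbitrary `n0`, the value `r_t[t - 1]` drives the step from day $t-1$ to day $t$, and the function and parameter names (`simulate_reported_cases`, `peak_day`, `seed`) are hypothetical.

```python
import numpy as np

def simulate_reported_cases(r_t, n0=100.0, t_si=4, sigma_z=0.1,
                            amp_w=0.25, period=7, peak_day=0, seed=0):
    """Generate noise-free and 'reported' daily cases from a given R curve.

    Assumptions (not from the source): normally distributed z_t, start value
    n0, and r_t[t - 1] governing the growth from day t - 1 to day t.
    """
    rng = np.random.default_rng(seed)
    r_t = np.asarray(r_t, dtype=float)
    n_days = len(r_t)

    # Noise-free infections: one-day growth factor is R ** (1 / t_si).
    true_cases = np.empty(n_days)
    true_cases[0] = n0
    for t in range(1, n_days):
        true_cases[t] = true_cases[t - 1] * r_t[t - 1] ** (1.0 / t_si)

    days = np.arange(n_days)
    z = rng.normal(0.0, sigma_z, size=n_days)            # random reporting noise
    phi = 2.0 * np.pi * peak_day / period                # cosine peaks on peak_day
    w = amp_w * np.cos(2.0 * np.pi * days / period - phi)  # weekly cycle

    reported = true_cases * (1.0 + z) * (1.0 + w)
    return true_cases, reported

# Example: constant R = 3 over eight weeks with the parameter values quoted in the text.
true_cases, reported = simulate_reported_cases(np.full(56, 3.0))
```

Setting `sigma_z=0.0` or `amp_w=0.0` switches off one of the two disturbances, which makes it easy to study their contributions separately.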
To find the $\tau$-day averaged slope of the logarithmic case counts over time, denoted here by $\bar{m}_\tau$ (the windowed estimate of $\ln(R)/t_{si}$), we minimize the sum of squared deviations (SSD) of the data points with respect to a straight line. We used $\tau = 7$, as used by the RKI, resulting in 11 data points per calculation owing to the 4-day serial interval. From these 11 data points of infection events, we formed two subsets, data points 1 to 10 and data points 2 to 11, and determined the respective slopes by minimizing the SSD according to the standard expressions of linear regression. See Fig. 2a for an illustration. Subsequently, we averaged the two slopes,

$$\bar{m}_\tau = \tfrac{1}{2}\left(\bar{m}^{(1\text{-}10)} + \bar{m}^{(2\text{-}11)}\right).$$

This procedure serves as an additional means of noise reduction. From the resulting $\bar{m}_7$, we calculated $R_t = e^{t_{si}\,\bar{m}_7}$.

The starting point for our simulation comparing the models is shown in Fig. 1a. The infection starts out with a large $R$-value, which gradually declines as measures to prevent the spread of the disease are taken and finally drops below 1. These $R$-values were generated from the original data of the Humanitarian Data Exchange shown in Fig. 1b using Cori's method. The simplified $R_t$ curve that we used as a test function for comparing our method with Cori's is shown in Fig. 1c. It generates infection events (see Materials and Methods) as shown in Fig. 1d, approximating the original data very well. Here, and for the later considerations, it is irrelevant that the test function is not smooth. It is, however, convenient to keep the test function that way, as it reduces the number of parameters needed to generate it.

We took the infection events shown in Fig. 1d and used them to extract the corresponding $R_t$-values using Cori's method (see Introduction) and our approach (see Materials and Methods). The fits on which the latter relies are shown in Fig. 2a.
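The fitting procedure described in the Materials and Methods (two overlapping 10-point regressions on the logarithm of the case counts, averaged and then converted with the serial interval) could be sketched as follows. `loglinear_r` is a hypothetical helper written for this illustration; `numpy.polyfit` stands in for "the standard expressions of linear regression", and the counts are assumed to be strictly positive so that the logarithm is defined.

```python
import numpy as np

def loglinear_r(window, t_si=4):
    """Estimate R from tau + t_si consecutive daily counts (11 for tau = 7).

    Two straight lines are fitted to the log-counts, one through all points
    but the last and one through all points but the first; their slopes are
    averaged and converted to R via exp(t_si * slope).
    """
    log_counts = np.log(np.asarray(window, dtype=float))    # counts must be > 0
    n_fit = len(log_counts) - 1                             # points per fit (10)
    days = np.arange(n_fit)
    slope_first = np.polyfit(days, log_counts[:-1], 1)[0]   # data points 1..10
    slope_second = np.polyfit(days, log_counts[1:], 1)[0]   # data points 2..11
    mean_slope = 0.5 * (slope_first + slope_second)         # log-growth per day
    return float(np.exp(t_si * mean_slope))

# Noise-free check (assumed input): counts that triple every 4 days, i.e. R = 3.
window = 50.0 * 3.0 ** (np.arange(11) / 4)
print(loglinear_r(window))
```

Which calendar day the estimate is assigned to (for example the center day of the 11-point window) is left to the caller in this sketch.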
As can be seen in Fig. 2b, Cori's method shows strong deviations from the test function, particularly during times of high $R$-values. The daily variation remains strong when averaging over 7 days during which the infection events change substantially. As can be seen in Fig. 2b, the deviations from the 'real' $R$-value that result from Cori's method are considerable when either $R_t > 1$ or $R_t < 1$.

We calculated the relative error (RE); the results are shown in Fig. 2d. As can be seen, the random noise contributes an additive constant $\sigma_R^{z}$, or floor, to RE, such that $\sigma_R^{z} = 0.41\,\sigma_z$ for both models. For $A_w = 0.25$, which closely mimics the behavior seen in the real data, a contribution of the cyclic noise is found only for Cori's method and can be described as $\sigma_R^{w} = 0.38\,|\ln R_t|$. The total error $\sigma_R^{wz}$ for Cori's method is therefore $\sigma_R^{wz} = \sqrt{(\sigma_R^{z})^2 + (\sigma_R^{w})^2}$, whereas for our method $\sigma_R^{wz} = \sigma_R^{z}$. This result implies that our method does not suffer from inaccuracies stemming from the daily oscillations.

Figure 2: Daily variations in reported infections have a lower impact on the $R$-value calculated with our method than on that calculated with Cori's method. a, The data shown in Fig. 1c in the time frame from 10 to 31 days, plotted logarithmically. Linear fits over windows of 10 days, displaced by one day, are used to calculate the average slope at the center day (in this case day 15). b, Extracted $R$-values based on a (orange dots) compare favorably with Cori's method (blue dots) in the region of high $R_t$-values. c, Results of a simulation in which the $R_t$-value is kept constant at 3 (reference). It is clearly visible that the daily oscillations can be suppressed by our method (shown in orange).
d, Dependence of the error of the $R$-values for both extraction methods, described as the standard deviation from the true value. The baseline of the error ($\sigma_R^{z}$) is given by the random noise, whereas the error resulting from random noise and the daily variations ($\sigma_R^{wz}$) is large for small and large $R$-values in Cori's method (shown in blue). Our method, by contrast, is not affected (orange, red) by the amplitude of the daily variations.

Next, we explored whether a simple fix could be applied to Cori's method, starting from an understanding of how each model treats the noise contributions. Because our method effectively averages the infection data at the level of logarithms, we considered modifying Eq. 2 to use a geometric instead of an arithmetic mean. The reasoning is that the logarithm of the geometric mean equals the arithmetic mean of the logarithms. We thus modified Eq. 2 and inserted it into Eq. 1:

$$R_{t,\tau} = \frac{\left(\prod_{i=0}^{\tau-1} N_{t-i}\right)^{1/\tau}}{\left(\prod_{i=0}^{\tau-1} N_{t-t_{si}-i}\right)^{1/\tau}} \qquad (5)$$

where $N_t$ is the number of new infections on day $t$, $\tau$ is the averaging interval and $t_{si}$ is the serial interval. With Eq. 5 we achieved results very similar to those of our model, as can be seen in Fig. 3.

We have introduced a simple method to determine the $R_t$ value based on an approach that stems from the control of dynamic systems. In brief, it determines $R_t$ from the slope of a straight line fitted to a subset of the time series of new infections on a logarithmic scale. This approach accounts for the exponential nature of the viral spread and is easily implemented in spreadsheets. We found that this approach is not susceptible to the daily variations of the nowcasting data introduced by variations in daily reporting and medical diagnostics, in contrast to Cori's method, for which we found such a susceptibility for $R_t > 1$ and $R_t < 1$.
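The geometric-mean variant (Eq. 5) can be sketched in Python by averaging logarithms, since the logarithm of a geometric mean is the arithmetic mean of the logarithms. The function name `smoothed_r_geometric`, the trailing-window convention, and the synthetic input are assumptions made for this illustration; counts must be strictly positive.

```python
import numpy as np

def smoothed_r_geometric(cases, tau=7, t_si=4):
    """Ratio of two trailing tau-day geometric means that lie t_si days apart."""
    log_cases = np.log(np.asarray(cases, dtype=float))   # counts must be > 0
    r = np.full(log_cases.shape, np.nan)
    for t in range(tau - 1 + t_si, len(log_cases)):
        recent = log_cases[t - tau + 1 : t + 1].mean()                  # log of geometric mean
        earlier = log_cases[t - t_si - tau + 1 : t - t_si + 1].mean()   # same, t_si days earlier
        r[t] = np.exp(recent - earlier)                  # ratio of the geometric means
    return r

# Assumed test input: constant R = 3 with a multiplicative weekly cycle, no random noise.
days = np.arange(42)
weekly = 1.0 + 0.25 * np.cos(2.0 * np.pi * days / 7)
cases = 100.0 * 3.0 ** (days / 4) * weekly
r_geo = smoothed_r_geometric(cases)
print(np.nanmax(np.abs(r_geo - 3.0)))   # deviation from the true value R = 3
```

In this noise-free example, every 7-day window contains each weekday factor exactly once, so with $\tau = 7$ the weekly factor cancels in the ratio of the two geometric means and only rounding error remains. With an arithmetic mean the factors are weighted by the changing case numbers and do not cancel in the same way.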
We further found that Cori's method can be improved by using geometric averages instead of arithmetic averages. As in applications in finance [7] and social science [8], the geometric mean proves better suited to describing growth rates, as can easily be illustrated by the compound annual growth rate [9]: an initial growth of 80% and a subsequent growth of 25%, for instance, amount to an average growth of 50% per period, since $\sqrt{1.80 \times 1.25} = 1.5$, and not 52.5% as derived from the arithmetic mean.

To conclude, we believe that the simple change from arithmetic to geometric mean, or alternatively our method, might prove a valuable tool for determining $R_t$-values on time frames where cyclical variations are present (as over the days of the week), in particular when the occurrence of infections is changing rapidly. This, in turn, helps to avoid false alarms and to strengthen the population's trust in the extracted data, and therefore in the governmental organizations and scientists, as the data are visibly less noisy. It can also be of importance for making better forecasts.

A guide to R - the pandemic's misunderstood metric.
Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. medRxiv.
Bayesian nowcasting during the STEC O104:H4 outbreak in Germany.
Oscillations in U.S. COVID-19 Incidence and Mortality Data Reflect Diagnostic and Reporting Factors. mSystems.
The Financial System Today.
Why is the geometric mean used for the HDI rather than the arithmetic mean?
The Handbook of Traditional and Alternative Investment Vehicles: Investment Characteristics and Strategies.

The authors want to thank Claudius Noack for the fruitful discussions.
This work was supported by Hamburg University of Applied Sciences. The authors declare no competing interests.