key: cord-0103904-i5zhuuux authors: Perone, Christian S. title: Analysis of the SARS-CoV-2 outbreak in Rio Grande do Sul / Brazil date: 2020-07-20 journal: nan DOI: nan sha: bee115ff209403f071efd2bf4b20beb8cdd6d675 doc_id: 103904 cord_uid: i5zhuuux This article contains a series of analyses done for the SARS-CoV-2 outbreak in Rio Grande do Sul (RS) in the south of Brazil. These analyses are focused on the high-incidence cities such as the state capital Porto Alegre and at the state level. We provide methodological details and estimates for the effective reproduction number $R_t$, a joint analysis of the mobility data together with the estimated $R_t$ as well as ICU simulations and ICU LoS (length of stay) estimation for hospitalizations in Porto Alegre/RS. The effective reproduction number, often called R (t ), R t or R e is the expected number of new infections that are caused by a primary case, on a context where not the entire population are susceptible, as opposed to the basic reproduction number R 0 [3] , where the entire population is assumed to be susceptible. Since April 12th, we made available 1 the R t estimates for all states in Brazil, including Rio Grande do Sul. Choosing a method to estimate the R (t ) is a challenging task. Many methods with different characteristics were developed in the past [4, 5, 6] , and deciding which method to use heavily depends on the context and the properties that the context requires. We opted to use the method from [4] , which uses the serial interval (SI) distribution -the time between successive clinical onsets in a chain of transmission -, which is a quantity that is easier to observe than the generation time, since times of infection are rarely observed. The method from [4] is based on the relationship of the incidence I t at time t , the infectivity profile w s that is approximated by the distribution of the serial interval and the R t , as given by the following definition: The relationship shown in the Equation 1 tells us that we are weighting the incidence by symptom onset using the infectivity profile, and then multiplying it by the R t . To reduce the variance of estimating the R t from the equation above, the authors calculate estimates over longer time windows (assuming that the instantaneous reproduction number is constant within that window). In our case, we used a 1 week period, but longer periods can be used to reduce variance, but might also hide some dynamics of the R t or lead to lagged and inaccurate R t estimates. Another advantage of the method from [4] , is that the posterior can be easily calculated when assuming a Gamma prior distribution on the R t . The method also allows us to take into consideration the uncertainty around the serial interval distribution, a topic of constant debate [7, 8] , and therefore with high uncertainty associated. Another important property that was also one of the key reasons to select the method was that the method is reasonably robust to under-reporting as shown in the sensitivity analysis on various scenarios made by the authors. As this is a known issue in many states of Brazil, this method poses a strong advantage as long as the testing bias is constant, which is often not true at the beginning of the outbreak, but important as testing capacity tends to stabilize afterward. Besides the aforementioned reasons to select the method [4] , recently, a study found that the method [6] systematically underestimates R t when the true value is substantially higher than one, while also being biased as transmission rates shift and recommended the method from [4] . It is well-known that the SARS-CoV-2 outbreak in Brazil posed a lot of new challenges regarding the testing procedures that required strategical coordination among many different entities from different cities. This pressure caused major delays in many states of Brazil and by looking at the Figure 1 , where we show the delay distribution since symptom onset until cases appear in the main panel of the SES-RS, there is strong evidence of major delays taking place in the state procedures. Cases from Porto Alegre/RS, the state capital, also suffered major delays and on a single day, more than 1600 cases were added on a batch that had cases with more than one month of delay since symptom onset. Given this context, it is even more important to deal with that truncation properly. Many statistical methods exist to deal with right-censoring, including parametric distribution estimation or non-parametric methods. Most of the time, these methods rely on the estimation of an Exponential, Gamma ou Weibull distribution and then uses the (flipped) cumulative density function (CDF) to correct for the delays. Parametric methods are often sample efficient and work well on the small data regime. However, they fail to model the complexity of distributions under model misspecification, especially when there are multiple modes due to complex and dynamic processes. Since we have a reasonable amount of historical data for state level analysis, a better approach is to use the empirical CDF (ECDF) and then do bootstrapping to accommodate for the uncertainty around the observed delays, as shown in Figure 2 . The ECDF in our case is constructed using the delay distribution by aggregating the new symptom onset dates at each new day, and using only a few recent weeks of data to capture the most recent delay patterns. The ECDF is then used to parametrize a Negative Binomial distribution as in [9] : where F (t -j ; θ i )), differently than in [9] , is the ECDF for the bootstrapped delay distribution (θ i sample). This estimation will give the proportion of onset cases from t -j days ago that are expected to appear over the j days from that time until the present on the SES-RS panel. The t , as in [9] , is the last date on which cases were reported, so the number of onsets o t -j on day t − j is taken as a result of a Bernoulli trial from the total true number of cases with clinical onsets on day t -j . Like in [9] we also discard some days at the end (6 days in our case) and allow upscaling up to a limit of 5 times the reported cases for the day, as this approach also introduces an inevitable estimation bias. An example of this adjustment result can be seen in Figure 3 . In order to propagate the right-censoring adjusment uncertainty into the R t estimation procedure, we sample 1000 samples from the bootstraped delay distribution described in the previous section, and then fit 1000 models using EpiEstim [10] and a serial interval with µ = 4.7(3.7 − 6.0) and σ = 2.9(1.9 − 4.9) [11] . After estimating 1000 models, we sample from the assumed Gamma posterior of each model to build the final posterior and calculate the R t statistics such as mean, median, and highest density interval (HDI) for each day. Propagation of right-censoring adjustment uncertainty affects mostly in the most recent days of the distribution. In Figure 4 , we show the results of the R t estimation for the state of Rio Grande do Sul. These estimates used official data provided by SES-RS 2 until July 17th, 2020. As we can see in the Figure 4, we had a median R t below 1.0 for a longer period only at the beginning of the outbreak when the interventions were potentially able to achieve its goals (but with the uncertainty interval above 1.0), after that period, coincidentally when interventions started to be relaxed (with local media entities reporting many cities reopening), the R t kept oscillating above 1.0 for a long period. We can also see at the bottom panel that ICU occupancy also started to grow, showing a very worrying and sustained trend that can cause a collapse of the health system as seen in other places [12] . In Figure 5 we also show a heatmap of confirmed cases for 8 cities in RS with the most number of cases in the state. A pattern that was also seen in other places is shown, where the infections usually start at a young cohort before spreading to the elderly. Note that we did not adjust for right-censoring in this figure, therefore for all cities, it appears that there is a reduction of cases in the last days, but is usually an artifact of the delay from symptom onset to confirmation and reporting. Adjusted incidence for Rio Grande do Sul (symptom onset) Original incidence Right-censor adjustment F I G U R E 3 Result of the right-censoring adjustment with a HDI interval of 0.7 and 0.95 respectively shown as orange bands. The red region are the discarded samples. Many providers made available mobility data during the outbreak, one example is the COVID-19 Community Mobility Reports [13] that was made available early in the outbreak by Google. The dataset contains anonymized data (using techniques such as differential privacy) that prevents the identification of any individual person. These reports contain the movement trends by region, across different categories of places, and compared to a pre-outbreak baseline taken from a 5-week period between Jan 3rd and Feb 6th, 2020. The Community Mobility Reports show movement trends by region, across different categories of places: parks, workplaces, residential, transit stations, grocery and pharmacy, and retail and recreation. In Figure 6 we can see this data for Rio Grande do Sul, separated by each different place category together with a incidence panel of confirmed cases at the bottom of the figure. Adding a new dimension to the R t such as the mentioned mobility data can help visualize the dynamics of the R t and mobility at the same time. In Figure 7 we show these joint trajectories of mobility and the estimated R t for Rio Grande do Sul. The density distribution at the background was estimated using a Gaussian kernel. As we can see in the figure, most of the density mass is concentrated above the median R t of 1.0. If we look at the workplaces panel of Figure 7 we can see a very clear separated density cluster representing the early moment at the outbreak when mobility was very low and with a respective R t below 1.0. The same procedure described in the earlier sections was employed to estimate the R t of Porto Alegre/RS (state capital) during the outbreak. We can see the results of the estimation in Figure 8 . What we can see in the R t dynamics is a period of relative stability at the beginning of the outbreak and coincidentally after adopting interventions to reduce mobility. After some time during the restriction period, we were able to see a slow but sustained increase of the R t and then a bigger increase after relaxing interventions followed by another peak on the R t potentially due to a sudden change of the testing capacity or reporting procedures. In this analysis, we have not taken into consideration imported cases, which can pose a limitation as it assumes implicitly that all cases after the first, arise from local transmission [14] . Note also that interpreting the temporal trends is not always an easy task, changes in R t can be caused by changes in contact patterns of the affected population, interventions that have their effects overlapped with other interventions, reduction of susceptible population, or changes in test capacity. The state capital of Rio Grande do Sul, Porto Alegre, on July 17th, had an adult ICU occupancy of 263 patients. Among these patients, many were from other cities, however, the major part of them had origin from the city itself (as seen on the black bars at the bottom panel of Figure 8 ). To estimate the length of stay (LoS) at the ICUs in Porto Alegre/RS we used the data from SIVEP-Gripe 3 . While potentially under-reported, this dataset contains a great proportion of the admissions with rich information at the patient level. As shown before on May 18th 4 , the estimated mean LoS in Rio Grande do Sul was one of the largest in the country. To estimate the LoS of Porto Alegre, we did a Bayesian estimation of the parameters α and β of a Weibull distribution, using weakly informative priors from a Half Normal distribution with σ = 50.0. Samples from the prior predictive distribution and the estimated posterior distribution can be seen at the left panel and right panel of Figure 9 , respectively. In Figure 10 we can also see samples from the Weibull posterior distribution. The estimated mean for the α parameter of the Weibull was 1.10 (1.02 -1.18, HDI 97%) and 12.42 (11.28 -13.57, HDI 97%) for the β parameter. With the estimated LoS distribution at hand, we can now simulate and create future scenarios to understand how ICUs occupation and dynamics will behave. Unfortunately, due to lack of information such as admissions (that are only found at SIVEP-gripe dataset, released in average at every 7 days) and the difficulties to estimate the reporting delays and correct for the right-censoring of this data, we limited the analysis to estimate how long it would take to confidently have a normalization of the ICUs in regarding the COVID-19 patients after stopping the admissions to zero. An unrealistic scenario in the middle of the outbreak, but it shows how long ICU occupancy can take to have the outcome (death/recovery) for all hospitalized patients in these units. We also shown simulated scenarios with different daily admission rate. Since we are dealing with a discrete quantity (length of stay) in days, we have to discretize the distribution to avoid introducing biases due to rounding, to that purpose we employed distcrete 5 . We also discarded the last 7 days of data due to right-censoring. To simulate the ICU we used a similar approach described in [15] , where for each date and incidence of admissions on that day we sample an estimate from the LoS posterior and then beds are counted by summing all cases for each day. To take stochasticity of the LoS into account, this process is repeated N times, resulting in N predictions of bed needs over time, where the credibility intervals are computed. The result of this simulation can be seen in Figure 11 . As we can see, even with initial under-reporting of SIVEP-Gripe data and right-censoring, we were able to match the complex dynamics of the daily observed ICU occupancy in Porto Alegre/RS. Note also that the simulation was able to match the growing observed ICU occupancy very well even without using the release dates from SIVEP-Gripe, showing that the model is indeed capturing the observed growing patterns in Porto Alegre/RS. What is worrying about this plot is that if admissions due to COVID-19 are stopped to zero, it would take nearly one month and a half to decay the occupancy to zero. In Figure 12 , we show a simulation for three different scenarios using a period with constant daily admission rates of 12, 15, and 18 before dropping admissions to zero. As we can see in the figures, there is evidence to support that a daily admission rate below 15 patients would be required to start to see a downward trend in the occupation of the ICUs in Porto Alegre/RS. Note that in the figures, there is a period where we use reported admissions from SIVEP-Gripe data, then followed by a constant daily admission rate for a period and finally we stop admissions (to zero). These three different periods are to show how ICU occupancy will behave under different scenarios. The different scenarios in Figure 12 show how fast occupancy can grow, but how slow it is to decay, as seen on many countries suffering from high ICU occupancy. Regarding general hospitalizations, we show in Figure 13 a table with hospitalizations present in the SIVEP-Gripe dataset where the hospitalization city is different than the city of residence. The plot is constrained to cities where the sum of all desitination or sources is greater or equal than 5 patients. We can see the enormous pressure that neighbour cities can cause to health hubs such as Porto Alegre/RS and Passo Fundo/RS. We show the complexity of these import/export dynamics in the Figure 14 where we plot a directed graph for all cities in Rio Grande do Sul. Finally, in Figure 15 and Figure 16 we show the same directed graph but only when hospitalization city is Porto Alegre/RS and Passo Fundo/RS, respectively. For this analysis we assume that SIVEP-Gripe provides a representative report of the ICU hospitalizations in Porto Alegre, however, it might be biased and lack reports from some health institutions. Admission dates can also be lagged, therefore it can introduce bias in the analysis as well. However, by looking at the simulation results we can see that it approximated the observed daily aggregated reports from all hospitals of the city with data provided from the city ICU occupancy dashboard 6 . Mobility data from providers such as Google [13] and Facebook [16] that are collected from mobile devices provide us a unique view on mobility. Although mobility data from Google can help at the state-level analyses, by leveraging higher resolution data such as mobility data from Facebook [16] , we can have rich information about the effect of adopted interventions at the city level. In Figure 17 , we show mobility data for Porto Alegre/RS, together with the mobility of the top 6 cities that are exporting hospitalizations to Porto Alegre/RS, to mention: Alvorada, Viamão, Gravataí, Canoas, Cachoeirinha, Guaíba. In this figure, we can see that after the initial peak of interventions around the last week of March, there was a slow 6 https://bit.ly/3hkXXmd increase in the change of average number of visited tiles (represented by 0.6km by 0.6km) compared to the pre-outbreak baseline. After new interventions on July 22th, we can see reduction again of mobility, but not to the same levels seen before. We can also see in this figure, that neighbour cities (that are exporting hospitalizations to Porto Alegre/RS) doesn't seem to have the same level of agreement with the mobility reduction seen in Porto Alegre/RS, especially when we look at the weekends. Also, looking at the weekend of July 19th, when compared to the weekend of 12th July, we can see an increase of mobility not only in Porto Alegre/RS but also in the cities exporting hospitalizations to the city, a worrying scenario as ICUs in Porto Alegre/RS reached already nearly 90% of occupancy at these dates. Another interesting analysis that can be done with city-level mobility data, is to find what are the cities that share similar mobility patterns. We use sparse inverse covariance (precision matrix) estimation to find which cities are correlated conditionally on the others using mobility data. From the precision matrix, we employ affinity propagation [17] , a clustering technique that creates clusters by sending messages between pairs of samples until convergence is achieved. Please note that we are interested in patterns that vary together, therefore we standardize mobility patterns. Affinity propagation found 30 clusters of cities that are described in Table 1 . We also show the mobility patterns beloging to each cluster in Figure 19 and finally a geographical map of Rio Grande do Sul with colors representing the clusters that each city belongs to in Figure 18 . Interpreting mobility data is challenging. The information from providers on population mobility, needs to be contextualized with additional data sources on local demographics, infrastructure and socioeconomic indicators. As mentioned in [18] , areas with higher densities of population and infrastructure, tend to have shorter trips and easier access to essential services, while the opposite can happen on other regions. Also, data from Facebook mobility represent only a slice of users that have opted into the Location History setting, therefore it may be overrepresented in certain populations and underrepresented in others. Aware of all limitations and difficulties to estimate and interpret the R t through time, we provided an adjusted estimation of this important epidemiological quantity that is a fundamental variable to observe during the outbreak and within the context. We also provided an estimation for the length of stay (LoS) for the state capital Porto Alegre, where we showed how fast admissions can cause a surge of occupancy and how slow are these occupancies to decay to the same regime it was before the surge. The goal of this work was not only to try to characterize the dynamics of the outbreak in the state but also to raise awareness to the population to the worrying scenario that usually comes after a long period when even basic epidemiological quantities such as R t are above their safe regimes. While it is true that there are many limitations in these estimates, when we look at these quantities in context together with mobility data and ICU progression, we can see that urgent interventions are needed to break the chain of transmissions and the growing trend hospitalizations that we are witnessing for more than 2 months. F I G U R E 1 3 This graph shows the number of hospitalizations found on SIVEP-Gripe dataset where the hospitalization city is different then the city of residence. The plot is constrained to cities where the sum of all desitination or sources is greater or equal then 5 patients. The number at each square is the number of hospitalizations with the color map mapping the range of hospitalizations. This graph used data from SIVEP-Gripe dated as 21th July, 2020. F I G U R E 1 4 This is a directed graph plot showing the complexity of hospitalizations in Rio Grande do Sul/Brazil. Each node represents a city and the edges shows the direction of hospitalization from patients on the city of origin to the destination city. Colors and thickness of the edge represents the amount of hospitalizations from a city to another (the lighter the color or thicker the edge, the more hospitalizations had origin at the city). The node with yellowish arrows arriving at close to the center of the graph is Porto Alegre / RS. This graph used data from SIVEP-Gripe dated as 21th July, 2020. Each node represents a city and the edges shows the direction of hospitalization from patients on the city of origin to Porto Alegre/RS. Colors and thickness of the edge represents the amount of hospitalizations from a city to another (the lighter the color or thicker the edge, the more hospitalizations had origin at the city). This graph used data from SIVEP-Gripe dated as 21th July, 2020. Each node represents a city and the edges shows the direction of hospitalization from patients on the city of origin to Passo Fundo/RS. Colors and thickness of the edge represents the amount of hospitalizations from a city to another (the lighter the color or thicker the edge, the more hospitalizations had origin at the city). This graph used data from SIVEP-Gripe dated as 21st July, 2020. F I G U R E 1 8 This map shows the mobility clusters found using affinity propagation on the estimated sparse inverse covariance from mobility patterns. Cities with similar patterns are in the same cluster (same color). This map used mobility data from Facebook [16] . Regions with parallel red lines are regions with missing mobility data or partial mobility data that wasn't included in the analysis. Cluster 30 F I G U R E 1 9 Clusters of mobility after affinity propagation on the estimated sparse inverse covariance (precision matrix). Note that the the series are all standardized. The cluster in red is to show thich cluster Porto Alegre/RS belongs to. TA B L E 1 Table showing the 30 clusters found using affinity propagation on the estimated sparse inverse covariance of the Facebook's mobility data [16] . We are very thankful for the work done by Brasil.io and its contributors. We are also thankful to Fernando Azambuja for the help with the SES-RS historical files. We are also thankful to everyone from Infovid 7 and Rede Análise COVID-19 8 for all the help and fruitful discussions. Complexity of the basic reproduction number (R0) Practice of Epidemiology A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics ORIGINAL CONTRIBUTIONS Different Epidemic Curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures Real time bayesian estimation of the epidemic potential of emerging infectious diseases Serial Interval Distribution of SARS-CoV-2 Infection in Brazil COVID-19 infectivity profile correction Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts Serial interval of novel coronavirus (2019-nCoV) infections. Infectious Diseases (except HIV/AIDS) Critical Care Utilization for the COVID-19 Outbreak in Lombardy, Italy: Early Experience and Forecast during an Emergency Response COVID-19 Community Mobility Reports Improved inference of timevarying reproduction numbers during infectious disease outbreaks Forecasting critical care bed requirements for COVID-19 patients in England | CMMID Repository Our Work on COVID-19 -Facebook Data for Good Clustering by passing messages between data points Facebook for Good: Mobility Data (Latest Available) | NAPSG PrepResponse Portal