Bayesian Monitoring of COVID-19 in Sweden

Robin Marin, Håkan Runvik, Alexander Medvedev, Stefan Engblom

May 2, 2022

Abstract. In an effort to provide regional decision support for public healthcare, we design a data-driven, compartment-based model of COVID-19 in Sweden. From national hospital statistics we deduce parameter priors, and we develop linear filtering techniques to drive the simulations given data in the form of daily healthcare demands. We additionally put forward an optimization scheme which enables a refined resolution of the reproduction number estimate, and which also improves substantially on our confidence in the overall results thanks to a parametric bootstrap procedure. Taken together, we obtain a computationally efficient Bayesian approach of predictive value which provides important insight into the progression of the disease, including estimates of the effective reproduction number, the infection fatality rate, and the regional-level immunity. We successfully validate our posterior model against several different sources, including outputs from extensive screening programs. Since the data we require are, by comparison, easy and non-sensitive to collect, we argue that our approach is particularly promising as a tool to support monitoring and decisions within public health.

The results in this paper stem from the work carried out within the cross-disciplinary research project CRUSH Covid at Uppsala University†. Starting in the fall of 2020, the group published a weekly, widely circulated report covering the region's COVID-19 status by, e.g., collecting data from PCR tests, mobile apps, wastewater analysis, and health care. Our contribution consists of a Bayesian disease-spread model which provides decision support in the form of predictions of healthcare demands and additional epidemiological insight.

The pathogen SARS-CoV-2 and the subsequent pandemic resulted in an explosion of research related to COVID-19: research on data collection and interpretation, modeling and forecasting, as well as scenario generation. From the beginning of the outbreak, data have been instrumental in understanding the disease dynamics [1, 2, 3]. With increasing amounts of cases and recorded patient data, statistical models for diagnosis and prognosis were quickly developed [4]. Combining in-patient data with other covariates, the use of digital technologies for disease surveillance is now possible [5], notably with an impact also for COVID-19 [6, 7, 8, 9]. Challenges connected to the collection and distribution of available data were met with specialized tools for decentralized publishing [10, 11], and anonymized mobility data were also in active use [12, 13].

Since the start of the pandemic, there have been several attempts to model and forecast the spread of the virus. Data might appear plentiful at first, but the abundance of poorly informative data also hinders modeling efforts [14, 15]. Several aspects of the disease transmission are of interest and need to be modeled, from small-scale in vitro properties to global interventions. Some modeling efforts include multiple levels of resolution to capture, e.g., global travel patterns [16] or local within-country dynamics [17]. Understanding how to combine the various scales can substantially benefit the fidelity of scenario generators [18].
It is fair to say that models which have been in actual use are understudied due to time constraints, and they therefore often lack a thorough uncertainty analysis [19, 20]. In fact, with the frequent lack of high-quality data on the current state of the disease, nowcasting has become increasingly critical in decision making [3, 21, 22, 23]. When modeling under data limitations and with potentially large process uncertainties, the Bayesian framework stands out as perhaps the most natural choice for modeling and prediction; indeed, Bayesian inference in disease models is an active research area [24, 25, 26]. The main challenges are to balance model granularity and computational costs against data quality and actual information content. Our solution involves adapting a linear noise approximation [27], which enables the fusion of different data sources and supports a computationally cheap approximate likelihood function via optimal linear filters. We also investigate the posterior model not only through the posterior predictive distribution [28, 29], but also by estimating the bias introduced by the approximate likelihood using ideas from parametric bootstrap [24].

Our disease-spread model attempts to balance three key qualities: interpretability, quantifiable uncertainty, and forecasting accuracy. The posterior model was investigated through marginal estimators, by comparing our results to several other sources, and by bias estimates obtained via bootstrap arguments. As this paper demonstrates, the achieved accuracy and robustness are quite remarkable considering that no data from screening programs were used.

Bayesian COVID-19 model

In the standard SEIR model, susceptible individuals S become exposed E (without symptoms) and, after progressing to a symptomatic infectious state I, they become recovered R. Based on the available data, we extend the SEIR model with the states (H, W, D) (hospitalized, intensive care, and dead), and regard them as worsened states of the symptomatic infection such that only a certain fraction of the individuals will enter them. It is widely accepted that not all exposed become symptomatic [30], and hence we also extend the model with an asymptomatic state A with no or very mild symptoms. Transmission is usually driven by random or time-dependent contact intensities between the susceptible and the infected individuals. As in [31, 32, 24], we instead consider implicit spread via an infectious pressure compartment ϕ, an environmental state variable which models the current force of infection of the virus. The pressure is sourced by shedding from all infected carriers and decays over time. An advantage of this type of spread model is that it easily accommodates more than one infectious compartment, i.e., (E, A, I) in our case. Similarly, spread over a network defined by, e.g., commuting intensities between regions can easily be handled. After experimenting, however, we did not incorporate network spread in the model, since letting the regions function as independent nodes reduces the computational complexity while giving a very similar data fit. The resulting model is summarized in Fig. 1, with a detailed description of the model and all parameters in the Supporting Information (SI).

With increased model complexity follows explanatory power, at the expense of poorer model identifiability. With this in mind, we exclude all model refinements that are either missing or unreliably or incompletely reported in the data, e.g., age and gender, as well as certain refined states of minor symptoms.
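To make the compartment structure concrete, the following Matlab sketch integrates the mean-field version of the model: transmission is driven solely by the infectious pressure ϕ, which is sourced by E, A, and I. All parameter values are illustrative placeholders rather than our inferred posteriors, and the worsened states (H, W, D) are omitted for brevity.

    % Mean-field sketch of the extended SEIR model driven by an
    % infectious pressure phi (illustrative parameters only).
    N      = 1e6;             % region population
    beta   = 0.8;             % infection rate (time-varying in the full model)
    sigma  = 1/6.2;           % latent rate, mean incubation 6.2 days
    gammaI = 1/7;             % symptomatic recovery rate
    gammaA = 1/7;             % asymptomatic recovery rate
    rho    = log(2)/(3/24);   % viral decay, assuming a 3 h half-life (days)
    thI    = rho;             % non-dimensionalization theta_I = rho
    thE    = 0.5*thI;         % shedding from E (assumed)
    thA    = 0.5*thI;         % shedding from A (assumed)
    E2I    = 0.75;            % fraction of exposed developing symptoms
    dt = 0.1; K = round(120/dt);
    S = N - 10; E = 10; A = 0; I = 0; R = 0; phi = 0;
    X = zeros(K, 5);
    for k = 1:K
      newE = beta*phi*S/N;                    % S -> E at rate prop. to phi
      S   = S - dt*newE;
      E   = E + dt*(newE - sigma*E);
      A   = A + dt*(sigma*(1 - E2I)*E - gammaA*A);
      I   = I + dt*(sigma*E2I*E - gammaI*I);
      R   = R + dt*(gammaA*A + gammaI*I);
      phi = phi + dt*(thE*E + thA*A + thI*I - rho*phi);  % sourced, then decays
      X(k, :) = [S E A I R];
    end
    plot((1:K)*dt, X); legend('S', 'E', 'A', 'I', 'R'); xlabel('days');

Replacing the deterministic updates with the corresponding Poissonian transition counts recovers the stochastic model that underlies the linear filter of Material and Methods.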
We recognize the infection's varying effects on different age groups [33] and so must accept that our results remain in an age-averaged regime. One increase in model complexity that is required, however, is a mix of dynamic and static parameters, since this allows the model to respond to functional changes such as societal interventions [34], vaccinations, and virus mutations [35]. We thus let β_t, the infection rate related to the reproduction number R_t, and the infection fatality rate (IFR) be time-varying. All in all, the problem is then to determine the posterior distribution of 10 static and two dynamic parameters. The latter are assumed constant over four-week periods but are re-sampled independently for each such period. Clearly, well-chosen priors are required to make the problem definite. Considerable work went into constructing and continuously updating our priors using published research and public registries; the final priors are displayed in Fig. 2, with a complete list of priors and prior predictive estimates given in the SI.

We derive a Kalman filter model [36] from the topology of our compartment model, assuming exponentially distributed waiting times. The marginal likelihood from the filter then approximates the likelihood conditioned on a parameter proposal when subject to an observation triple (H, W, D), and we use the Adaptive Metropolis (AM) [37] method to generate samples from the induced posterior. The role of the marginal likelihood is reminiscent of a synthetic likelihood [38], and we refer to the specific combination of a Kalman marginal likelihood and Adaptive Metropolis as KLAM. Additional details are found in Material and Methods and in the SI.

Figure 2: Prior distributions for the model parameters and the associated posterior distributions (blue) for the Swedish aggregate, inferred from publicly available regional data from April 1, 2020 to May 30, 2021. The dashed lines indicate the prior means, and the full lines and numeric values the posterior means. The reproduction number R_t and the IFR are dynamic parameters; a temporal average is displayed here.

Our model was used for weekly reported predictions within CRUSH Covid. Prior to each report we carried out a model updating procedure: new data were pulled from public repositories and screened for contradictory or incorrect values. The posterior was sampled by KLAM using an initialization either from an estimated initial state, as described in Material and Methods and developed for this very purpose, or simply from previously stored state samples. The latter allows for faster sampling as it reduces the burn-in period: about 24 hours of compute for 21 regions and a year's worth of data on a 4-core laptop was thus reduced to a few hours. The final posterior model was queried for one-week-ahead forecasts with uncertainty bounds for the (H, W, D) triple. We also continuously evaluated the previous week's predictions against the up-to-date data in the same round.

In Fig. 2 we display the prior distributions together with the resulting Swedish aggregate posterior, i.e., the population-weighted average of the individual posteriors of each of Sweden's 21 regions. Several priors are clearly very similar to their respective posteriors, e.g., the latent period rate σ and the symptomatic period rate γ_I. This is expected and simply indicates little information in the observations for these parameters relative to the prior.
Our data also cannot improve on the prior for the share of spread from exposed (pre-symptomatic) and asymptomatic individuals (parameters θ_E and θ_A, respectively). This is simply because we have no continuously reported data for either E or A. On the same note, we had to rely on a directed study [30] to define the prior for the parameter E2I, i.e., the fraction of exposed individuals who eventually develop symptoms. Of more interest are the parameters that govern the fraction that transition to a worsened state of the disease: [18.7, 19.4]. We could have used the earlier of those point estimates to improve on the corresponding priors; however, as validation possibilities are scarce, we decided to rather use them for that purpose instead.

The regional models were used for posterior predictions on a weekly basis, e.g., one 7-day-ahead prediction per region and one for the entire nation. In Fig. 3 this is exemplified over a longer period together with the actual outcome.*

* We indicate credible intervals (CrI) by square brackets [·, ·] and confidence intervals (CI) by regular parentheses (·, ·), unless otherwise specified.

Figure 4: The fraction of recovered individuals in the Stockholm region. Our model (blue) is set to 2.35% on April 5, 2020 (matching [40]). Serology-based Bayesian model predictions [40] (red), which reported 70/95% CrIs. Estimated mean prevalence of antibodies found in spared blood samples from outpatient care, with 95% CI (solid, pink), and blood donors (dashed, purple) [41]. On December 27, 2020, the Swedish vaccination campaign started; both the number of delivered first doses (dotted) and second doses (dash-dot) are indicated.

The performance of the weekly predictions which were reported live (N = 25) is presented in Tab. 1. Each prediction included a mean with 68/95% CrIs. The week after publishing, we evaluated the prediction against the then-available data. The predictions performed better in the medium-sized region Uppsala; notably, the predictions for casualties were poor in the larger region Stockholm. This observation eventually led us to allow the IFR to become a time-varying parameter, as previously described (in Tab. 1, only 9 out of 25 reports supported a dynamic IFR). Another possible reason for the poorer prediction performance here could be that, since the Stockholm region contains three large hospitals, the greater heterogeneity in terms of reporting makes identification more challenging. One can rightly ask whether smaller sub-regions should rather be modeled here, but we did not have access to the data to drive such a model.

The posterior model can also be used as a kind of "Bayesian twin" to estimate quantities which are otherwise very difficult to approach. For example, we can readily estimate the number of individuals that have contracted the disease and survived; these are the individuals that could potentially have developed antibodies detectable in serology tests. In Fig. 4 we visualize several such reported results for Stockholm [40, 41] along with our estimates. Clearly, those estimates compare very well, and notably so given that no screening data were used by our method.

Recall that the IFR is defined as the proportion of deaths among infected individuals, i.e., including asymptomatic cases. By design, our model relies on an IFR which is constant over four weeks. Our estimated IFR for Sweden stayed relatively constant over the period May 2020-November 2020 at 0.69% [0.11, 1.5] (95% CrI).
For comparison, in April 2020 the PHA published an early estimate of 0.58% (0.37, 1.05) (95% CI) for Stockholm, Sweden [42]. However, this estimate relies on initial assumptions about the number of undetected cases that seem unjustified when compared to later findings [43, 44]. It also assumes the relatively younger Stockholm demographics rather than the national one, and it is therefore an underestimate of the national IFR. A later Sweden-wide IFR estimate of 0.76% [0.65, 0.87] was published in November 2020 [45] and aligns well with our estimate above. Although our 95% CrI is comparatively wide, this is partially due to modeling each period independently, without any regularization of the transition between periods. Tab. 2 summarizes a few bi-monthly estimates for the period after October 2020. The first two entries in the table overlap the estimates from [46]: Nov 2020 = [0.605, 1.461]% and Jan 2021 = [0.565, 1.437]%. Clearly, the IFR was trending downwards, and this is also known to be the case in Stockholm [45] as well as for the world in general [46].

Of interest is also the case fatality risk (CFR) [47], i.e., the risk of death conditioned on being diagnosed with the disease. We define CFR_X more generally as the proportion of deaths expected given a certain number of individuals in compartment X ∈ {I, H, W}. Note that CFR_I involves the number of symptomatic individuals, which is not the same thing as the number of confirmed cases. From our posterior we may directly estimate the national average CFRs to be {[0.72, 1.3], [16, 19], [34, 36]}% (95% CrIs) for CFR_I, CFR_H, and CFR_W. A comparison can be made with [48]. From NBHW data we may estimate those same numbers to be {8 335, 4 102, 935} [49], after a scaling of about 5% to arrive at the same total number of dead as in our dataset (D = 13 372). Apparently, our model overestimates the deaths under ICU, and to some extent also at hospital, while our estimate for deaths outside hospital is left-tilted compared to this data point. This could likely improve given more detailed data sources, including, e.g., improved records of COVID-related deaths and hospital care outcomes.

The reproduction number provides essential insight into the future development of the spread of a disease in a population. An accurate estimate of this number is vital to support rational public-health decision making and to inform the general public. Temporal variations in the reproduction number are caused by socio-behavioral, environmental, virological, and biological factors [50]. The dynamics of the reproduction number are therefore significantly faster than those of most other parameters, and hence it is desirable to increase the temporal resolution of this estimator. The procedure described so far yields static reproduction number estimates for each four-week period, with comparably large spread. We obtained daily estimates through a separate estimation procedure, but using the same data. This estimation is different in that it is based only on the means of the posterior distributions and the mean-field dynamics, without the correction step of the Kalman filter. By utilizing the innovation covariance from the Kalman filter, and a quadratic regularization term to ensure smoothness of the solution, a quasi-maximum likelihood formulation is obtained. From this, the estimates are calculated using dynamic optimization; see Material and Methods and the SI for further details.
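Schematically, and anticipating the notation of Material and Methods, the daily estimate solves a regularized prediction-error problem of the form (a sketch; the precise formulation and constraints are given in the SI)

\[
\hat{B} = \arg\min_{B} \; \sum_{k=1}^{K} \Big( \tilde{y}_k^{\top} S_k^{-1} \tilde{y}_k + \log\det S_k \Big) + c \sum_{k} (\Delta \beta_k)^2, \qquad B = (\beta_1, \ldots, \beta_K),
\]

where ỹ_k is the deviation between the data and the mean-field output, S_k is the innovation covariance from the Kalman filter, and the weight c penalizes square gradients and hence fast fluctuations in the daily β_k.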
Reproduction number estimates have been calculated in this way for the whole duration of the parameterized period and for all regions in Sweden. In Fig. 5 we present our results for Uppsala together with the testing-based estimate produced by the PHA for the same period. The cost-effectiveness of our estimate is apparent here: the results are similar, but the PHA estimate relies on costly incidence data.

Apart from providing more detailed information, this procedure enables a validation of the Bayesian workflow, since it allows synthetic data to be simulated in an off-line fashion. That is, for a given posterior and a daily time-dependent estimate of the reproduction number, our full COVID-19 model can be simulated to generate replicas of a process with a known truth. Our Bayesian inversion may then be employed a second time in order to estimate bias or sensitivities for various estimates of interest. The improved confidence in the results achieved by following this procedure cannot be overstated.

We have devised a detailed Bayesian model for the regional dynamics of the COVID-19 pandemic in Sweden. A data-centric viewpoint is that, using model-based data analysis, we have gained a thorough insight into the progression of the COVID-19 pandemic in Sweden in general, with extra emphasis on the Uppsala Region. The proposed compartment-based model, combined with the novel use of optimal linear filters, turned out to be a quite remarkable information-theoretic epidemiological tool. The output quality obtained in our work compares very well with official estimates gathered in test-based programs, cf. Figs. 4-5 (and also the incidence comparison in the SI). During the second and third waves in Sweden, at least 10 million RT-PCR tests were administered at a standard cost of 1 400 SEK (≈ $150) per test [51]. The expense of such a PCR-test-for-all-symptomatic strategy quickly grows, underlining the economic advantages of our approach. Clearly, at the individual level there are several benefits to testing, and the importance of screening as a means to collect initial statistics for the disease spread cannot be stressed enough. However, our approach remains very promising as a supporting tool with which to continue to monitor the situation when testing is limited due to risk-cost trade-offs.

The resulting model was further processed to output weekly predictions of healthcare demands, as well as marginal estimators of important characteristics of the disease, such as infection fatality rates and reproduction numbers. The latter output increased the confidence in the overall approach through the generation of synthetic data and parametric bootstrap techniques. Improved data that would have enabled a higher model precision include (1) a consistently managed incidence report from randomized testing (not necessarily high volume), and (2) a higher temporal resolution of hospitalization and intensive care risks, as well as times for treatment in these respective categories. These statistics could both be collected at a relatively small cost but would likely improve the precision considerably.

Without relying on public testing strategies, our model-based approach provided improved situational awareness of the progression of the pandemic. By virtue of the consistent Bayesian framework, uncertainties are transparently propagated under clear assumptions, also in the face of potentially hazardous situations.
We argue that this quality makes the techniques developed herein particularly promising from a communicative perspective. Our data streams are high in latency but, on the other hand, fairly low in noise. Low-latency signals, e.g., public screening, self-reporting mobile apps, or analysis of sewage water, are instead often more noisy or otherwise biased. Combining these different kinds of streams provides for excellent decision support and appears extremely promising for use in tracking regional epidemics at a near-daily resolution.

Material and Methods

We detail here the Swedish publicly available data and the most important parts of the computational model and its associated synthetic likelihood. Two 'bootstrap' procedures are also outlined: one for improving the temporal resolution of the reproduction number estimates, and one for bounding the inversion bias through the generation of synthetic data. Further details concerning the derivation of the linear filters, data pre-processing, and optimization algorithms are found in the SI.

In Sweden, the publicly available data for the COVID-19 pandemic fall into one of two categories: hospital load and results from PCR testing. In addition, cumulative national-level disease severity statistics have been updated approximately once a month throughout the pandemic. The 21 regional councils compile hospital data and report the number of patients undergoing inpatient or intensive care, and also the number of deceased individuals. These numbers are reported on a daily basis and have been judged to be of consistent quality over sufficiently long periods of time to be used in our modeling. We retrieve those data from the portal initiative c19.se, which in turn collects the data from the regional councils. For validation, we have compared with official public registries, including the Swedish Public Health Agency (PHA), the National Board of Health and Welfare (NBHW), and the Swedish Intensive Care Registry (SIR). There are occasional inconsistencies in the data which need to be filtered away; see the SI for our quite basic approach to this.

The Swedish daily incidence as reported from PCR testing has been of poor quality during several periods due to restrictions and changes in testing recommendations [52]. In June 2020, the Swedish government appointed a commission to evaluate the COVID-19 measures, including, among other things, the testing programs. They found that the time from booking a PCR test to receiving the test results exceeded six days across several regions during the period of time studied in this paper, with additional time delays in reporting COVID-19 notification rates at the regional and municipality levels [53]. For these reasons, we judged the incidence data to be unreliable and excluded it from our model. In the SI we directly compare the incidence data to the estimate implicit in our model.

Self-reporting via mobile apps has been proposed as a cheap and fast alternative to PCR testing [9]. However, the validity of the signal depends on symptoms that overlap with other respiratory infections. For example, the PHA noted a high occurrence of symptoms of acute respiratory infections at the start of the Swedish second wave in the fall of 2020 [9, 54]. Laboratory analyses of respiratory viruses later indicated a high incidence of common colds caused by rhinoviruses during the same period [55]. Another alternative data source is the surveillance of wastewater [56, 57].
Due to that signal's large relative noise, we did not consider it in this study but left it for future work.

We understand the compartment model as a continuous-time Markov chain (CTMC) over an integer lattice counting the number of individuals in the different compartments. Hence the waiting time for exiting a compartment is exponentially distributed with rate λ (mean 1/λ), and the number of individuals which exit in a small window of time [t, t + ∆t) is Poissonian, ∼ Po(n_t λ∆t), with n_t the number of individuals in the compartment at time t. If the number of individuals is large enough, this transition can be approximated by the normal distribution N(n_t λ∆t, n_t λ∆t), which can be directly translated into a contribution to the state update matrices (F_k, Q_k) of a discrete-time Kalman filter with the equations of state

\( x_{k+1} = F_k x_k + w_k, \quad w_k \sim N(0, Q_k), \)

with k the discrete time index.

To reduce the transients, we use an initialization of the filter developed specifically for this purpose. Given the state update matrix F_k, observation matrix H_k, and time series data y = (y_k), the first step of the algorithm removes all cumulative states from F_0, H_0, and y_0 to produce F_red, H_red, and y_red. The initial non-cumulative states are then obtained by projecting the dominating eigenvector of F_red onto the subspace defined by H_red x_red = y_red under a positivity constraint. For the measured cumulative states, the initial state itself is taken as the data, while the unmeasured cumulative states are set to zero. In particular, and under reasonable assumptions, if the system were initialized from the eigenvector and simulated without perturbations, the relative magnitudes of the non-cumulative states would remain constant.

The linear filter allows for an approximation of the intractable likelihood of a parameter proposal Θ̃, namely the marginal filter likelihood

\( L(y \,|\, \tilde{\Theta}) = \prod_k N(y_k; \, H_k \hat{x}_{k|k-1}, \, S_k), \)  (2)

with x̂_{k|k-1} the one-step state prediction and S_k the innovation covariance of the filter. In our application we rely on this likelihood to produce an approximate sample (Θ̃_i) from the posterior in a Metropolis algorithm with a Gaussian proposal function, where the covariance is adaptively tuned to the covariance of the posterior samples [37].

The achievable temporal resolution of the Bayesian parameter estimates provided by Metropolis sampling is limited by both computational complexity and the amount of data. To obtain more fine-grained estimates, a different approach is needed. Since the reproduction number is the dominating parameter of the dynamics, a more highly resolved estimate of the reproduction number is particularly useful. In terms of the parameters of the model, this corresponds to improving the estimation of β_t. Therefore, a daily estimate β(t_k) = β_k is calculated using dynamic optimization techniques. Dynamic optimization has been used for estimating the reproduction number of the COVID-19 outbreak also by others [58]. However, when combined with an existing posterior distribution, care must be taken to avoid overconfidence from using the same data twice. For this reason, we do not attempt to derive an improved posterior distribution of β_t; instead, a single marginal time-dependent maximum likelihood estimate is sought, where the rest of the posterior distribution is "frozen". The same logarithmic marginal likelihood that is used in the Kalman filter is utilized for this purpose, now understood as a quadratic cost function.
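As a minimal sketch of how the marginal filter likelihood (2) is evaluated, the Matlab function below accumulates the innovation terms of a standard Kalman recursion for given system matrices (F, Q, H, R); the function name and interface are ours and not from the released code.

    function ll = kalman_loglik(y, F, Q, H, R, x0, P0)
    % Marginal log-likelihood of the data y (d-by-K) under the linear
    % Gaussian state-space model x(k+1) = F x(k) + w, y(k) = H x(k) + v.
    x = x0; P = P0; ll = 0;
    for k = 1:size(y, 2)
        x = F*x; P = F*P*F' + Q;          % prediction step
        e = y(:, k) - H*x;                % innovation...
        S = H*P*H' + R;                   % ...and its covariance
        ll = ll - 0.5*(log(det(2*pi*S)) + e'*(S\e));
        K_ = (P*H')/S;                    % Kalman gain and correction
        x = x + K_*e; P = P - K_*H*P;
    end
    end

In our actual setup, F, Q, and R additionally depend on the parameter proposal and on the current state estimate, as detailed in the SI.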
In the optimization, however, the deviation between the measurements and the outputs of the mean-field dynamics is minimized, i.e., a shortened formulation compared to the filter is employed, since the Kalman correction step is neglected. To avoid fast variations in β_k, a regularizing term penalizing square gradients is also added. The resulting optimization problem can be solved using standard techniques. Further details on the formulation of the optimization problem and its solution are provided in the SI.

In real applications, a "true" or a "best" parameter posterior P* is usually unknown. Evaluating the stability and the quality of the approximate posterior P̃ is unfortunately often overlooked. We suggest employing a parametric bootstrap approach as in [24] to assess the error between samples from the true and the approximated posterior, Ẽ := Θ̃ − Θ*, where Θ* ∼ P* and Θ̃ ∼ P̃. Denote by θ* := E[Θ*] the minimum mean square error estimator (MMSE) of an assumed truth. Decomposing the mean square error around this value we find

\( E\|\tilde{\Theta} - \theta^*\|^2 = E\|\tilde{\Theta} - \tilde{\theta}\|^2 + \|\tilde{\theta} - \theta^*\|^2, \)

where θ̃ := E[Θ̃] is the MMSE of Θ̃, i.e., a variance term plus the squared bias b := θ̃ − θ*. Formally, this still requires samples from the true posterior when estimating the bias. We approximate this via a bootstrapped estimator using a sample of N_boot synthetic data sets generated from the MMSE of the approximate posterior. The generative simulator requires daily estimates of (β_k), or else the synthetic data quickly drift off compared to the observations; thus this technique ultimately hinges on the highly resolved marginal estimator described above. Posterior samples may then be generated for each synthetic set, yielding a set of samples Θ̃_i ∼ P̃_i, which allows for the use of the now tractable bias estimators b̂_i := E[Θ̃_i] − θ̃. Our final estimator is then the average over these synthetic sets,

\( \hat{b} = N_{boot}^{-1} \sum_{i=1}^{N_{boot}} \hat{b}_i. \)  (3)

While up to 196 000 samples from each posterior were used to compute point estimators and CrIs, bootstrap replicas are much more costly to process, so we used N_boot = 3 and mainly relied on the bias estimator to diagnose non-robustness in point estimators. That is, a point estimator with CrI A of order α = 68%, say, and with bias estimate b̂ is considered less robust whenever the bias-shifted estimate falls outside the interval,

\( \tilde{\theta} + \hat{b} \notin A. \)  (4)

The bootstrap posterior densities can be used in aggregate form for bias estimation and checks of estimator robustness as just described, or for a related check of credible-interval robustness. Consider two CrIs A and B of order α, e.g., α = 68%, where B is a bootstrap replicate of A. A basic measure of the robustness of A is the level of overlap between A and B, and one can reasonably require the same overlap as the indicated order α, i.e., require

\( |A \cap B| \, / \, |A \cup B| \geq \alpha. \)  (5)

If this criterion is satisfied, then a random variable drawn uniformly from A ∪ B has probability ≥ α of also being in A ∩ B. The robustness checks Eqs. (4)-(5) are explicitly reported in Tab. 2, but were also routinely employed when evaluating various results.
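In pseudocode-like Matlab, the bootstrapped bias estimate (3) reads as below; simulate_data and sample_posterior are hypothetical stand-ins for the generative simulator (driven by the daily (β_k) estimates) and the KLAM sampler, respectively.

    % Parametric bootstrap estimate of the inversion bias (sketch).
    theta_hat = mean(Theta, 1);      % MMSE: Theta is N-by-K posterior samples
    Nboot = 3;
    b = zeros(Nboot, numel(theta_hat));
    for i = 1:Nboot
        y_synth = simulate_data(theta_hat, beta_daily);  % known truth
        Theta_i = sample_posterior(y_synth);             % re-run inference
        b(i, :) = mean(Theta_i, 1) - theta_hat;          % per-replica bias
    end
    bias_hat = mean(b, 1);           % final estimator, cf. (3)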
Supporting Information

This supplement contains further explanations and details of the data, the model, and the computational methodology, as well as some additional results mentioned in the main article. The SI is organized in sections as follows:

• Incidence: our model's estimate of the incidence of symptomatic cases compared to the number of confirmed cases by the PHA. This also produces an estimate of the proportion of undetected symptomatic cases.
• R_t-estimators: for a selection of small and large regions.
• Data: a summary of the various sources and uses of data.
• Compartment model: our extended SEIR model in detail, including an explanation of all model parameters.
• Priors: the parameter priors for the model and how they were determined.
• Kalman filter: the linear filter approximation of the continuous-time Markov chain, which in turn is our probabilistic understanding of the extended SEIR model.
• Posterior sampling: details of the Metropolis sampling procedure.
• Bootstrap on the margin: the procedure to improve the temporal resolution of the reproduction number estimator.
• Posterior robustness: a comparison of the 21 regional posteriors, which form a natural bootstrap population.
• Bootstrap robustness: the estimated bias and some additional quality statistics for the 21 regional posteriors.
• Baseline predictor: a comparison of the predictions of our Kalman filter and a simpler regression estimator.

The code as well as the data required for reproducing the results in the paper are publicly available and can be downloaded at github.com/robineriksson/Bayesian-Monitoring-of-COVID-19-in-Sweden. Refer to the included file README.md for more information.

Incidence

In the same vein as in the validation against seroprevalence studies, the model can also give estimates of the symptomatic incidence. In Fig. 6 we illustrate our estimated symptomatic incidence along with the number of positive RT-PCR tests reported by the PHA for Stockholm and Uppsala. The reported tests are smoothed by a 7-day Savitzky-Golay filter, and the estimated uncertainty is the corresponding rolling standard deviation. After Testing phase 3, the ratio between our estimate and the positive tests oscillates around one, indicating that the testing at that time captured most of the symptomatic infected. After April 2021, our model estimates a somewhat smaller symptomatic incidence than what was captured in actual tests. A possible explanation for this is that at smaller incidence, an increasing proportion of asymptomatic cases are included in the test pool, e.g., from regular screening of hospital personnel and similar. Through this method of exploring the data we can also confirm our early suspicion that the incidence data signal was very poor during Testing phase 1 and in part during phase 2. The noise in the screening data explains the nervous behavior of the PHA's reproduction number estimates as seen in Figs. 5 and 7. At the start of the second wave, and during Testing phase 3, this signal was much more accurate, albeit at huge economic cost.

R_t-estimators

We illustrate the daily R_t-estimators for the three largest and the three smallest (by population) regions: Stockholm, Västra Götaland, and Skåne (largest), and, respectively, Gotland, Jämtland, and Blekinge (smallest), in Fig. 7. The relative uncertainty is visibly smaller in the larger regions than in the smaller ones, a typical small-population effect, as the data streams for the smaller regions consist of smaller counts with larger uncertainties in a relative sense.

Data

The daily observations we use in the inference are retrieved from c19.se using Python scripts. In turn, c19.se fetches the data from the official regional sources on a daily basis. Sweden has 21 regions (Fig. 8) that are in charge of providing for the healthcare of their populations. They also present data on the number of hospitalized patients, the number of patients in intensive care, and the daily number of deceased from COVID-19. We use a couple of other datasets for prior generation and for posterior validation; see Tab. 3. The sets C19 and NBHW_D had double use; however, the prior information obtained from these was on an aggregated level.
Data pre-processing

Our main data sources were updated each day, and incorrect or questionable data points were common. Two types of errors need to be addressed. First, there are impossible or very extreme updates, e.g., a negative incidence or very large jumps, most likely due to an accumulation of delayed reports, or in some cases possibly attempts to correct earlier reports. For example, a delay in the reporting of deceased has been described for the Swedish data set [12], where it was explained as "batch" reporting. In a similar spirit, there are also a few missing data points. However, these are all found very early in the data history and could safely be replaced with NaN-values, i.e., simply ignored. Second, and more problematic from a model-based perspective, parts of the reporting display a strong periodic component which is not supported by the model and is most likely an effect of weekday periodicity.

The pre-processing of data is done in two steps. First, we ensure that there is no negative incidence data. Negative values are detected and set to 0, with values backwards in time corrected in a 'reasonable' way using a simple linearly decaying memory kernel, and with the corresponding cumulative compartment adjusted accordingly. In this first step we also replace missing or manually found early outlier measurements with either linearly interpolated data from adjacent days or NaN-values. The second step involves smoothing the cumulative fields and the incidence fields to mitigate the effects of weekly periodic lags or batch reporting. The smoothing algorithm removes unlikely incidences by starting from the most extreme outliers, e.g., ≥ [10, 9, ..., 2] standard deviations under a Poissonian approximation, and spreads them out over the preceding period in such a way as to dampen weekday dependencies in reporting. Most weekly periodicity is removed by this procedure, see Fig. 9 for an example, and whatever remains does not seem to interfere with the Bayesian inference.

Figure 9: Reports of deceased per weekday over the period April 1, 2020-May 31, 2021 for the Stockholm region. The smoothing procedure attempts to correct for weekly reporting patterns in the original data and behaves similarly to a bootstrap replicate for the same time period and region.

The total effect the pre-processing has on the data can be measured by the mean maximum relative difference between the raw and the processed series, computed per compartment X ∈ {H, W, D} (indexed by i ∈ {1, 2, 3}) over some period of time of length N_t. We report this measure for the period April 1, 2020-May 31, 2021 in Tab. 6. The population-weighted national average difference is 6.6%.
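A minimal Matlab sketch of the first pre-processing step is given below; the kernel length and the exact redistribution rule are illustrative choices, not a verbatim excerpt from our released code.

    function x = remove_negative_incidence(x, L)
    % Zero out negative daily reports and compensate by reducing the
    % L preceding days using a linearly decaying memory kernel (sketch).
    x = x(:);
    w = (1:L)'/sum(1:L);                 % more weight on recent days
    for k = find(x' < 0)
        deficit = -x(k); x(k) = 0;
        j = (max(1, k-L):k-1)';          % preceding window
        wk = w(end-numel(j)+1:end);
        corr = min(x(j), deficit*wk);    % never push a day below zero
        x(j) = x(j) - corr;
    end
    end

The corresponding cumulative compartment is adjusted accordingly, and the second smoothing step spreads extreme outliers backwards in time in the same spirit.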
Compartment model

The extended SEIR model is depicted in Fig. 10. As a building block, consider first the simpler SIS_E model with states (S, I, ϕ),

\( S'(t) = -\beta S(t) \varphi(t)/\Sigma + \gamma I(t), \qquad I'(t) = \beta S(t) \varphi(t)/\Sigma - \gamma I(t), \)

with Σ = S + I, the total population size. Susceptible individuals turn infected at a rate proportional to the local infectious pressure ϕ, and infected individuals recover at a rate γ. The last compartment ϕ represents the environmental pathogen load and follows the dynamics [14]

\( \varphi'(t) = \theta_I I(t) - \rho \varphi(t). \)

Infected individuals shed the pathogen into the environment, thus sourcing the infectious pressure, which also decays at a fixed rate ρ. The R_0-number for the SIS_E model is given by R_0 = θ_I β/(γρ), such that θ_I = ρ is a convenient non-dimensionalization.

Our extended SEIR model involves two types of parameters: rates and fractions. The rates are σ, γ_I, γ_A, γ_H, γ_W, β_t, ρ, and the shedding rates θ_E, θ_A, and θ_I. We may further divide the rates into waiting times and those governing the infectious pressure. The waiting times are σ, γ_I, γ_A, γ_H, and γ_W, understood as the inverses of the mean times an individual stays in a certain compartment; e.g., the mean recovery time for a symptomatic infected is γ_I^{-1}. The transition from S to E depends on ϕ and β_t. The infectious pressure ϕ is sourced by the viral shedding from asymptomatic (θ_A), exposed (θ_E), and symptomatic (θ_I) individuals, and it decreases by the viral decay rate ρ. We use the non-dimensionalization θ_I = ρ and the scaled variables θ_E* = θ_E/ρ and θ_A* = θ_A/ρ. For this model the reproduction number can be determined as

\( R_0 = \beta \left( \theta_E^*/\sigma + \mathrm{E2I}/\gamma_I + (1 - \mathrm{E2I})\,\theta_A^*/\gamma_A \right), \)  (9)

and an identical relation holds for (R_t, β_t).

The fraction parameters determine the probabilistic fates of individuals in a compartment with more than one exit. They were determined from demographic averages, since the daily data did not contain the level of detail to support age dependency (cf. the discussion in Material and Methods):

A → I: F_1 = A2I (= 0 in our simulations),
H → D: F_3d = SIR_MORT × HOSP_MORT, (14)
W → D: F_4 = SIR_MORT, (15)

and, finally, I → D: F_2d, which is obtained from the IFR (16); this final relation is best explained from the discussion leading to (24) below. The remaining fractions of the model can generally be obtained by requiring that the total sum of all outgoing fractions is one. That is, the fraction which recovers from compartment X is given by 1 − F_X − F_Xd, where F_X is the fraction that enters the next state in the chain and F_Xd is the fraction that dies. Note that the model parameters covering fatalities, (F_2d, F_3d, F_4), are obtained from the more natural parameters (SIR_MORT, HOSP_MORT, IFR) via Eqs. (14)-(16).

We did not find that connecting the regions improved the fit to data, and in the end we therefore decided not to use this technique on a regular basis. However, since we did use it in Tab. 1, and for completeness, we describe below how it was implemented. The network connection is obtained by introducing a prior commuting factor λ ∈ [0.5, 1.5] that allows for connections between the regions. This affects only the calculations on a national level, and it does so mainly by adding some correlation between connected regions. The network is defined by a connection matrix D, which is a 21-by-21 square matrix with a zero diagonal. For the ith row, each column j contains the proportion of individuals commuting into region i from region j. In turn, the connection matrix itself is found by a linear programming formulation on the volumes of individuals commuting in and out of each region [10]. At each forward step in the simulation, the infectious pressure ϕ_k = ϕ(t_k) is then updated as

\( \varphi_{k+1} = \hat{\varphi}_{k+1} + \lambda \left( D \hat{\varphi}_{k+1} - d \odot \hat{\varphi}_{k+1} \right), \)

where φ̂_{k+1} is the update according to Eqs. (28)-(30), i.e., without any network effects, d is the column sum of D, and ⊙ is element-wise multiplication.

Priors

The prior knowledge relied upon to construct priors stems from several sources. Below we briefly comment on the techniques and assumptions used to derive our priors; effective summaries are found in Tabs. 4 and 5. The process of constructing priors was ongoing during the whole period from fall 2020 to late spring 2021. We revised the priors whenever we became aware of new research, when more statistics became available, or when we deemed model tweaks necessary. The final priors (arrived at towards the end of May 2021) are what we discuss below.

σ and γ_I/γ_A. The mean incubation time σ^{-1} is set to 6.2 days with support in [5.4, 7], derived from the mean and 95% CI given in [15].
The recovery time for the symptomatic infectious, γ_I^{-1}, is less well known, and we therefore set the mean to 7 days [16, 17] with a wider support of [4, 10] days. We assume similarity between asymptomatic and symptomatic cases by setting γ_A = γ_I, which covers the priors suggested in [16].

γ_H and γ_W. We determine the priors for the hospital recovery times, γ_H^{-1} and γ_W^{-1}, from a dataset published by the NBHW [5], including the distribution of exit times for hospital patients with and without ICU care. The dataset is right-tail censored for waiting times over 30 days, which affects the ICU patient records. For γ_H^{-1}, we first make a Bayesian fit of an exponential distribution to the hospital caring-time data (N = 40 507 patients) and place a somewhat broader Beta distribution across the credible interval for the hyperparameter thus found; this results in a Beta distribution of mean 8.9 and support on [8.7, 9.1] days. The recovery time under intensive care, γ_W^{-1}, is more complicated, since the available data also involve non-ICU caring times. Assuming the previously determined exponentially distributed waiting times under hospital care, we first subtracted two such waiting times (to model pre- and post-ICU care, respectively), and next made a Bayesian exponential fit to the remaining time (N = 4 039 patients). This fit was judged a bit worse than for the non-ICU care time, and so we used a larger enclosing support of [10.8, 13.9] with mean 12.2 days.

E2I. The prior for the fraction E2I (E → I) was taken from [18]: 0.75 (0.62, 0.84) (95% CI), for which we determine a scaled Beta distribution that fits the mean and the given CI interval. The resulting prior distribution has mean 0.75 and support on [0.014, 1].

IFR. The IFR, i.e., the eventual fraction of infected individuals that die (E → D in our model), is volatile in the sense that the IFR depends strongly on the age distribution of the region as well as on the quality of the health care and the currently dominating virus variant [19, 42]. We therefore settled on a scaled Beta distribution with a positive skew and rather wide support: mean 0.67%, support [0, 2]%.

HOSP_MORT and SIR_MORT. The priors for the mortality at hospital and in intensive care (HOSP_MORT, SIR_MORT) are recovered directly from mortality datasets linked to hospitalization deaths [4] (N = 46 236 patients and 5 729 deceased) and ICU deaths [6] (N = 5 744 patients and 1 357 deceased), and are kept as fixed constants.

IC_HOSP. To find our prior for the proportion of hospitalized patients needing intensive care, we use the c19.se data in aggregate form and extract the percentiles ([0.5, 50, 99.5]%) of the quotient [H : W] between the number of hospitalized and ICU patients for all data available. We next assume a relation of the form

\( \mathrm{IC\_HOSP} \times \gamma_H H = \gamma_W W, \)

for W approximately stationary and thus satisfying a balance condition: the inflow to W from H balances the outflow from W. It then follows that

\( \mathrm{IC\_HOSP} = (\gamma_W/\gamma_H) \times [H : W]^{-1}. \)

Transforming the percentiles of [H : W] accordingly, we then find a Beta distribution with support on these ([0.065, 0.95]) and a mean at the median (0.18); as a consistency check, with γ_H^{-1} ≈ 8.9 days and γ_W^{-1} ≈ 12.2 days, a median quotient [H : W] ≈ 4 indeed maps to IC_HOSP ≈ (8.9/12.2)/4 ≈ 0.18.

Denote by [H] and [W] the asymptotic fractions of individuals that at some point pass through hospital and intensive care, respectively, all in relation to the exposed population E. We find the asymptotic fractions [H] and [W] by considering the dynamics expressed in Fig. 10 and recognizing a geometric series (individuals may cycle between H and W),

\( [H] = \mathrm{E2I} \times \mathrm{HOSP} \times (1 + x + x^2 + \cdots) = \mathrm{E2I} \times \mathrm{HOSP}/(1 - x), \)

with x = IC_HOSP × (1 − SIR_MORT). Similarly,

\( [W] = \mathrm{IC\_HOSP} \times [H]. \)

We thus arrive at the relation (24) connecting these asymptotic fractions to the mortality parameters.

HOSP. We next find a prior for the fraction of symptomatic individuals that enter the hospital (HOSP) as follows. The model in itself restricts the death source compartments to I, H, and W.
By the previous calculations, the total risk of death from compartment I can be decomposed into a part realized within hospital (via H and W) and a part realized directly from I, for some unknown scaling [I : HW] between them. Connecting with aggregated mortality data for total deaths from c19 and hospital deaths from [4], we find [I : HW] ≈ 0.98, which we simply take to be a Beta distribution with support around this value. We fit a scaled Beta distribution to 100 000 samples from this distribution and finally obtain a Beta distribution with mean 0.033 and support [0, 0.17].

θ_E and θ_A. The viral shedding from compartments E and A (θ_E* and θ_A*, respectively) is assumed to be uncertain but bounded. We assign the same prior to both shedding rates: a scaled Beta distribution with the mode at 1 and support in [0, 2].

Infectious half-life and ρ. The decay rate ρ of ϕ is defined via the infectious half-life t_{1/2} as ρ = −log(0.5)/t_{1/2} = log(2)/t_{1/2}. The prior for the infectious half-life is taken to be uniform between 1 and 12 hours, realized as a uniform distribution between 1/24 and 12/24 days. This encloses the estimate from [21], who suggests 3 hours.

λ. As mentioned, we do not find the posterior for the network coupling λ, but we define a distribution which we keep fixed and sample from when performing some of our Sweden-level simulations. We assume a scaled Beta distribution with mean τ and support τ × [0.5, 1.5], where τ = 8/24 × 5/7. This scaling is meant to reflect the proportion of time spent at work under normal (non-pandemic) circumstances.

R_0. Lastly, we have β_t, which we find from sampling R_t and using the map (9). We therefore place the prior on R_t instead, for which the literature is plentiful. The R_t prior is taken from R_0 similarly to [22]: a truncated lognormal distribution with log-mean log(1.3) and log standard deviation 0.4, with support [0, 4].

We assess the quality of the prior distribution through samples from the prior predictive distribution. We generate 7-day-ahead predictions using the same set-up as in Fig. 3 and illustrate the results in Fig. 11.

Figure 11: 7-day-ahead prediction with 68% CrI (shaded) using 1 000 samples from the prior distribution, with data for reference (points). See Fig. 3 for the posterior equivalent.

Kalman filter

A linear noise approximation of a continuous-time Markov chain (CTMC) with exponentially distributed waiting times between the compartments is employed in the Bayesian modeling. Under the assumption that the susceptible population decreases slowly (in a relative sense) compared to the other states of the system, i.e., only a small portion of the total population becomes infected over short time horizons, the dependence of the susceptible population on the infection rate is neglected and instead captured implicitly via the time-dependence of (β_t, R_t). The model is discretized in time, which results in the linear state-space model

\( x_{k+1} = F x_k + w_k, \)

where x_k is the 8-dimensional state vector of compartment counts (including the infectious pressure ϕ), k is the (daily) time index, and F is given explicitly by the compartment topology in Fig. 10, with parameters as described in the previous sections. Note how the equation of state for the infectious pressure ϕ has been integrated explicitly.

An important property of this model is the distribution of the noise w_k. Since a Poisson-distributed transition has a variance proportional to the number of individuals in the compartment, the process noise is state-dependent. A term proportional to the squared compartment population, representing an uncertainty in the transition "flow", is also added to the variance, as well as a constant, i.e., regularizing, term. Hence for a transition from a state A to a state B with rate µ, the noise covariance contribution on the (A, B) block is of the form

\( Q_P = \mu n_A \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \qquad Q_v = \epsilon \, \mathrm{diag}(q_A n_A^2, \; q_B n_B^2), \)

where ε, q_A, and q_B have positive values. Note that a negative correlation is induced by the Poisson noise, while we do not introduce such terms for Q_v nor for Q_0. The process noise covariance of the full model is then calculated by addition of the individual contributions from all transitions which are encoded in F. Note that the state ϕ is the discretization of an ordinary rather than a stochastic differential equation, so there are no Poissonian contributions to the corresponding elements of the covariance matrix. In our setup, ε = 0.05² and diagonal elements of unity for Q_0 are chosen, i.e., corresponding to process noise on the order of single individuals.
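The covariance assembly can be illustrated as follows; the Matlab function below forms the contribution from a single transition, with variable names of our own choosing.

    function Q = transition_cov(n, a, b, mu, qa, qb, eps_)
    % Process-noise contribution of one transition a -> b with rate mu,
    % evaluated at the current state estimate n (sketch).
    d = numel(n); Q = zeros(d);
    p = mu*n(a);                          % Poisson part: variance = mean
    Q([a b], [a b]) = p*[1 -1; -1 1];     % note the negative correlation
    Q(a, a) = Q(a, a) + eps_*qa*n(a)^2;   % squared-population "flow" terms
    Q(b, b) = Q(b, b) + eps_*qb*n(b)^2;
    end

The full process noise is the sum of such contributions over all transitions encoded in F, plus the regularizing Q_0 = I, and with the rows and columns of ϕ excluded from the Poissonian part.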
The measured signals are given by

\( y_k = H x_k + v_k, \)

where H selects the observed compartments (H, W, D). The components of the measurement noise v_k are assumed to be uncorrelated and consist of both a constant and a state-dependent term, so that the corresponding covariance matrix becomes

\( R = \mathrm{diag}\left( r_{0,X} + r_{d,X} \, x_X^2 \right), \quad X \in \{H, W, D\}, \)

where the parameter values r_{0,H} = r_{0,W} = r_{0,D} = 1 and r_{d,H} = r_{d,W} = r_{d,D} = 0.001² are used in our model. In the Kalman filter, the covariance matrices are calculated based on the state estimates at every iteration. One should, however, note that the optimality results for the Kalman filter only hold when the noise is Gaussian and additive, i.e., independent of the states, so there is no theoretical justification for the optimality of the Kalman filter in the current setup. The calculated results can therefore be understood as an approximation in density, and the Kalman marginal likelihood is an approximation of the true likelihood. Nonetheless, the filter has worked rather convincingly in practice.

Posterior sampling

When performing Bayesian inference in models with intractable likelihoods, one common approach is Approximate Bayesian Computation (ABC), also referred to as likelihood-free inference [24, 25]. The cornerstone of ABC is to use a simulator y ∼ F(θ) to generate data. These generated data are then compared to the observed data, and the "distance" between them acts as a proxy for the likelihood of the parameter given the observation. A flavor of ABC called synthetic likelihoods (SL) [26] finds the proxy likelihood by generating multiple data samples per parameter proposal and, assuming asymptotic normality, computes the then tractable likelihood of the observations. In our case of using a Kalman filter estimator, the likelihood estimate is the Kalman marginal likelihood. This likelihood acts in a similar spirit as the SL, since the filter can be viewed as the limit of multiple simulations under a Gaussian assumption. However, the analogy is not perfect, since the Kalman filter implements correction steps for each new data point.

We use the Kalman likelihood in the Adaptive Metropolis algorithm ("KLAM"). This is similar to the classical Metropolis algorithm but with an adaptive proposal function [27]: a multivariate normal distribution with mean at the current parameter point and an adjustable covariance matrix, N(x_{t−1}, C_t), where

\( C_t = s \, \mathrm{Cov}(x_0, \ldots, x_{t-1}) + s \, \epsilon I_d, \quad t > t_0. \)

We assume a diagonal initial covariance C_0 = 0.001 × I_d, for the d-dimensional identity I_d, and we start adapting after t_0 = 10 accepted proposals. We also use the step-length tuning parameter s = 0.05 × 2.4²/d, and we run each regional parameter posterior as four parallel chains of 5 × 10⁴ samples, resulting in Gelman-Rubin scores below 1.1, e.g., 1.01 for Uppsala.
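A compact sketch of the resulting KLAM loop is given below, where loglik wraps the Kalman marginal likelihood (2) plus the log-prior; for clarity, the sample covariance is recomputed from scratch, whereas a running update is preferable in practice.

    function X = klam_sample(loglik, x0, N, t0, s, eps_)
    % Adaptive Metropolis with Gaussian proposal N(x_{t-1}, C_t) (sketch).
    d = numel(x0); X = zeros(N, d); X(1, :) = x0(:)';
    C = 0.001*eye(d);                     % initial proposal covariance C_0
    lp = loglik(X(1, :)); acc = 1;
    for t = 2:N
        prop = X(t-1, :) + (chol(C, 'lower')*randn(d, 1))';
        lq = loglik(prop);
        if log(rand) < lq - lp            % Metropolis accept/reject
            X(t, :) = prop; lp = lq; acc = acc + 1;
        else
            X(t, :) = X(t-1, :);
        end
        if acc > t0                       % adapt after t0 accepted proposals
            C = s*(cov(X(1:t, :)) + eps_*eye(d));
        end
    end
    end

With s = 0.05 × 2.4²/d and t_0 = 10, this mirrors the setup described above.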
By using the Kalman filter likelihood, samples are fast to generate: a full regional posterior of 2 × 10⁵ samples (minus 1 × 10⁴ as burn-in) is generated in a little over 30 minutes on an Intel (4×) Core i7-6820HQ CPU @ 2.70 GHz. The regional posteriors are also solved independently of each other, thus allowing us to sample them in parallel using Matlab's parfor parallel for-loop. A full national posterior can thus be generated in little over 12 hours. The fast sampling is made possible not only by the filter set-up, but also by the time-sequential character of the problem: we repeated the inference each week and could use the previous weeks' and regions' posteriors as initial guesses for the new posteriors.

Bootstrap on the margin

With posterior estimates of β_t available in four-week intervals, we now explain further the method to estimate the daily infection recruitment β_k = β(t_k), and as a result the daily reproduction number. The methodology relies on minimizing the negative logarithmic likelihood. For that purpose, let B denote a vector of consecutive β_t's,

\( B = [\beta_1, \ldots, \beta_K]^{\top}, \)

and let B̂ denote the corresponding estimate. Writing the dependence on B explicitly, the optimization problem can then be formulated as

\( \hat{B} = \arg\min_B \; \ell^{(K)}(B) + c \sum_k (\Delta \beta_k)^2, \)  (38)

where ℓ^{(K)}(B) denotes the negative logarithmic marginal likelihood computed over the time horizon [1, K], i.e.,

\( \ell^{(K)}(B) = \sum_{k=1}^{K} \tilde{y}_k^{\top} S_k^{-1} \tilde{y}_k + \log\det S_k, \)

c is a positive regularization parameter, and ∆ denotes the first difference. The second part of the cost function in (38) thus constitutes a regularization term which reduces high-frequency fluctuations in the estimate. Here ỹ_k is the deviation between the measurements z_k and the output from the mean-field dynamics,

\( \tilde{y}_k = z_k - H x_k. \)

There is no correction of the state from the measurements as in the Kalman filter; instead, the state x_k is given by the state-space model

\( x_{k+1} = F_k x_k, \)

i.e., the matrix F_k is now time-varying according to the current β_k, where the remaining parameters of F_k are given by the previously inferred posterior maximum likelihoods. Finally, the covariance matrices S_k in (38) are calculated from the linear filter, again with the maximum likelihood parameter values, and including β_t. As a result, the optimization can be viewed as a quasi-maximum likelihood estimation. Here the parameter β_t is estimated since, as expected, it shows the fastest temporal fluctuations, but problems with several free parameters, such as the IFR, could also be considered. The initial state x_0 could also be included in the optimization formulation.

The problem is solved using the interior point method with the function fmincon in Matlab. Including the whole optimization horizon in a single optimization, however, turns out to be computationally very time consuming, or even infeasible for the problem at hand, when K is of the order of several hundreds. For this reason, an approach inspired by techniques from automatic control is used to divide the optimization horizon, as described below.

Since the complexity of the optimization algorithm is superlinear in K, computational gains can be made by dividing the optimization horizon into shorter windows, which are solved independently. In our context, an additional motivation for this is that the problem is solved every week as new data become available, which means that optimization results for earlier time windows can potentially be reused. However, the result of concatenating optimization results calculated over non-overlapping windows does not coincide with the solution to the full optimization (38), due to end-of-horizon effects, i.e., the β_t-estimates toward the end of one time window do take data outside the window into account.

Inspired by the methodology of model predictive control (MPC), we therefore utilize overlapping optimization windows. More specifically, time steps ∆k = k_{i+1} − k_i are used to iterate over the optimization horizon, corresponding to the sampling times in MPC, and at each step an optimization of the form (38) is solved, but over a horizon [k_i, k_i + K_p], where ∆k < K_p < K; this horizon corresponds to the prediction horizon in MPC. The calculated values of β_k for k ∈ [k_i, k_{i+1}] are then used to build the vector B̂. Assuming that, for a sufficiently large K_0, measurements at times κ_0 + L, L ≥ K_0, have a negligible effect on the values of the optimal β_k, k ≤ κ_0, we are thus able to replace the optimization problem (38) with a sequence of optimization problems of the same form over these windows, where B̂ then is created by concatenating the first ∆k elements of each B̂_i (except for the last B̂_i, which is used in its entirety). Notice that constraints need to be added to the optimizations to ensure "continuity" of B̂, i.e., that the regularization is employed also across the limits of the time windows. In our case, a prediction horizon of 150 days and a step length of 20 days were used. For typical datasets, there is then no discernible difference between the optimal solution for the full horizon and the combination of the solutions to the smaller problems. In Fig. 12 this is illustrated for the estimation of β_t from one year of data from Uppsala. The solution time with the divided optimization is shorter: approximately seven minutes instead of ten on a standard modern laptop, a difference that increases with the length of the total time horizon.
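The windowing logic can be sketched as follows, where optimize_window is a hypothetical wrapper around the fmincon problem (38) restricted to one window, with the argument b_prev enforcing continuity of the regularization across the window border.

    % Receding-horizon estimation of the daily beta (sketch).
    K = size(z, 2);                  % z holds the (H, W, D) data columns
    Kp = 150; dk = 20;               % prediction horizon and step length
    B = []; b_prev = []; ki = 1;
    while ki <= K
        kend = min(ki + Kp - 1, K);
        Bi = optimize_window(z(:, ki:kend), b_prev);  % hypothetical solver
        if kend == K
            B = [B; Bi(:)];          % use the last window in its entirety
            break
        end
        B = [B; Bi(1:dk)];           % otherwise keep only the first dk days
        b_prev = B(end);
        ki = ki + dk;
    end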
Figure 12: Estimated β_t with one batch optimization versus dividing the horizon. For the latter method, the discarded "tails" of each optimization, which are caused by the end-of-horizon effect, are visible.

Posterior robustness

The computed posterior in Fig. 2 is the population-weighted posterior obtained when combining the samples from Sweden's 21 regions. In Fig. 13 we display the posterior mean ± 1 standard deviation for each region. The results agree well across this natural bootstrap population, albeit with a few outliers.

Figure 13: Posterior mean ± 1 standard deviation per parameter and across the 21 Swedish regions. For the dynamic parameters R_t and IFR, the temporal average is displayed.

Bootstrap robustness

By the bootstrap procedure presented in Material and Methods, we can investigate the posterior robustness, including estimating the bias due to the approximate likelihood. The inference procedure is repeated on synthetic data sets generated by the mean posterior estimate and with the temporal-resolution upscaling procedure for β_t. In Fig. 14 we display 25 such replicates for a few selected regions. Note that these were obtained in a completely off-line fashion and are never corrected against data.

Figure 14: Fully synthetic simulations for a few selected regions with parameters from the national mean posterior, but with upscaled regional β_t. The lines in lighter shades of blue and red are realizations, and the solid lines are the means of the samples. The points are the data points used in the inference.

For each region r = 1, ..., 21 and parameter dimension k = 1, ..., K in the posterior, we estimate the bias b̂_{r,k} as per the description in Material and Methods. The dynamic parameters are here treated as a single parameter, that is, with a single average bias, and together with the static parameters there are K = 12 in total.
We compute three statistics T = [T_1, T_2, T_3] to characterize the spread: the coefficient of variation (CoV), the coefficient of bias (CoB), and the normalized root-mean-square error (NRMSE),

T_{k,r,1} = \mathrm{CoV} = \frac{\sigma_{k,r}}{\mu_{k,r}}, \quad T_{k,r,2} = \mathrm{CoB} = \frac{|\hat{b}_{k,r}|}{\mu_{k,r}}, \quad T_{k,r,3} = \mathrm{NRMSE} = \frac{\sqrt{\sigma_{k,r}^2 + \hat{b}_{k,r}^2}}{\mu_{k,r}},

for the standard deviation σ_{k,r}, the mean μ_{k,r}, and the estimated bias b̂_{k,r}. In Tab. 6 we present the median value per region and statistic s ∈ {1, 2, 3} over the K parameters, T_{r,s} = Median_k T_{k,r,s}. We visualize both the data posterior and the aggregate of all bootstrap replicate posteriors (n = 3) in Fig. 15, and we compare the daily R_t estimate of the actual data posterior with the average from the bootstrap replicates for Uppsala in Fig. 16.

To evaluate the predictive performance of our Kalman filter model outside of a model vacuum, we compare it to an autoregressive (AR) model. AR models are common in time-series prediction, and for COVID-19, AR models with auxiliary indicators show significant predictive power [28]. We consider an expanding data window which grows by one day at a time; for each window we fit the AR model on all data in the window and predict 7 days ahead. This procedure is evaluated across the same data as were used in our live reports (N = 25). Like the Kalman filter, the AR model uses [H, W, D] as observations. We use Matlab's arx implementation of the AR model, formally a vector AR model with exogenous variables. The polynomial orders and delays are set by searching for good predictions in the mean-square error sense on the first 50 days of the data set: we tune n_b and n_c by hand, and propose values for n_a from a partial autocorrelation function plot per data dimension. We find the orders of the A_q polynomials, n_a = [1, 1, 1], of the B_q polynomials, n_b = [0, 1, 0], and the input-output delays, n_c = [1, 1, 1], to be close to optimal choices. In Fig. 17 we visualize the 7-day-ahead predictions made by the AR model for Uppsala. In Tab. 7 we present the respective prediction precision by NRMSE and the multivariate Energy score [29] of the two models, along with a repetition of the results from Tab. 1. A closer inspection reveals that the simpler model has a smaller NRMSE and Energy score than the posterior Kalman filter, but with overly pessimistic CrIs. The simple AR model generates good mean-square predictions in a fraction of the training time of our posterior Kalman filter, and it could likely be improved somewhat with respect to the width of its confidence interval estimate, reducing the Energy score further. The great advantage of our approach lies instead in the fact that the posterior model itself can be disassembled and contains valuable epidemiological information.

Figure 17: Result of the 7-day-ahead AR prediction for the Uppsala region with 68% CrI (shaded). The live reporting (N = 25) for Tabs. 1 and 7 was done in the reporting period indicated towards the second half of the plot.
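As an illustration of the AR baseline, the sketch below implements a simplified, time-series-only variant (no exogenous input and first-order A_q per channel, i.e., no cross terms) using the System Identification Toolbox. Here Y is an assumed T-by-3 matrix of daily [H, W, D] observations for one region; the window start and the forecast horizon follow the text.

    Ts = 1; horizon = 7; first = 50;     % daily data, 7-day-ahead forecasts
    na = diag([1 1 1]);                  % AR order 1 per output channel
    Yhat = nan(size(Y));                 % collected 7-day-ahead predictions
    for t = first:size(Y,1)-horizon
        z   = iddata(Y(1:t,:), [], Ts);  % expanding window: all data up to day t
        sys = arx(z, na);                % fit the vector AR model
        yf  = forecast(sys, z, horizon); % forecast beyond the window
        Yhat(t+horizon,:) = yf.OutputData(end,:);
    end

Comparing Yhat against the held-out rows of Y then yields the mean-square prediction errors summarized in Tab. 7.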
References

[1] Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
[2] Suppression of a SARS-CoV-2 outbreak in the Italian municipality of Vo'
[3] Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
[4] Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal
[5] Digital public health surveillance: a systematic scoping review
[6] Real-time tracking of self-reported symptoms to predict potential COVID-19
[7] Rapid implementation of mobile technology for real-time epidemiology of COVID-19
[8] A framework for identifying regional outbreak and spread of COVID-19 from one-minute population-wide surveys
[9] App-based COVID-19 syndromic surveillance and prediction of hospital admissions in COVID Symptom Study Sweden
[10] COVID-19 data hub
[11] An open repository of real-time COVID-19 indicators
[12] TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels
[13] Public mobility data enables COVID-19 forecasting and management at local and global scales
[14] Forecasting models for coronavirus disease (COVID-19): a survey of the state-of-the-art
[15] Can auxiliary indicators improve COVID-19 forecasting and hotspot prediction?
[16] Cryptic transmission of SARS-CoV-2 and the first COVID-19 wave
[17] Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures
[18] Optimization in the context of covid-19 prediction and control: A literature review
[19] The impact of uncertainty on predictions of the CovidSim epidemiological code
[20] The effect of interventions on COVID-19
[21] Nowcasting Covid-19 statistics reported with delay: a case study of Sweden
[22] Gaussian process nowcasting: Application to COVID-19 mortality reporting
[23] Nowcasting epidemics of novel pathogens: lessons from COVID-19
[24] Bayesian epidemiological modeling over high-resolution network data
[25] Adaptive Bayesian learning and forecasting of epidemic evolution - data analysis of the COVID-19 outbreak
[26] OutbreakFlow: Model-based Bayesian inference of disease outbreak dynamics with invertible neural networks and its application to the COVID-19 pandemics in Germany
[27] Inference for reaction networks using the linear noise approximation
[28] Bayesian data analysis
[29] Visualization in Bayesian workflow
[30] Inferred duration of infectious period of SARS-CoV-2: rapid scoping review and analysis of available evidence for asymptomatic and symptomatic COVID-19 cases
[31] The population dynamics of microparasites and their invertebrate hosts
[32] Spatio-temporal modelling of verotoxigenic Escherichia coli O157 in cattle in Sweden: exploring options for control
[33] Scenarier för fortsatt spridning [Scenarios for continued spread]
[34] Ranking the effectiveness of worldwide COVID-19 government interventions
[35] The reproductive number of the delta variant of SARS-CoV-2 is far higher compared to the ancestral SARS-CoV-2 virus
[36] A new approach to linear filtering and prediction problems
[37] An adaptive Metropolis algorithm
[38] Statistical inference for noisy nonlinear ecological dynamic systems
[39] Estimating the burden of SARS-CoV-2 in France
[40] Seropositivity in blood donors and pregnant women during the first year of SARS-CoV-2 transmission in
[41] Påvisning av antikroppar mot SARS-CoV-2 (1) i blodprov från öppenvården [Detection of antibodies against SARS-CoV-2 in blood samples from outpatient care]
[42] The infection fatality rate of COVID-19 in Stockholm - technical report
[43] Evaluation of undetected cases during the COVID-19 epidemic in Austria
[44] Estimating SARS-CoV-2 infections from deaths, confirmed cases, tests, and random surveys
[45] Temporal trends in hospitalizations and 30-day mortality in older patients during the COVID pandemic
[46] COVID-19 Forecasting Team. Variation in the COVID-19 infection-fatality ratio by age, time, and geography during the pre-vaccine era: a systematic analysis. The Lancet
[47] Case fatality: rate, ratio, or risk?
[48] Case fatality rate of COVID-19: a systematic review and meta-analysis

Supporting Information references

[1] Avlidna och covid-19 [Deceased and COVID-19]
[2] Complexity of the basic reproduction number (R0)
[3] Meddelande från styrelsen - Överenskommelse mellan Staten och Sveriges Kommuner och Regioner om ökad nationell testning och smittspårning för covid-19 2021 [Message from the board - Agreement between the State and the Swedish Association of Local Authorities and Regions on increased national testing and contact tracing for COVID-19, 2021]
[4] Bekräftade fall av covid-19 i Sverige [Confirmed cases of COVID-19 in Sweden]
[5] Avlidna och covid-19 [Deceased and COVID-19]
[6] Vård och covid-19 [Care and COVID-19]
[7] Mortalitet 30 dagar [30-day mortality]
[8] Aktuellt reproduktionstal (R-tal) [Current reproduction number (R number)]
[9] Påvisning av antikroppar mot SARS-CoV-2 (1) i blodprov från öppenvården [Detection of antibodies against SARS-CoV-2 in blood samples from outpatient care]
[10] Seropositivity in blood donors and pregnant women during the first year of SARS-CoV-2 transmission in
[11] Antal pendlare per län och kommun [Number of commuters per county and municipality]
[12] Nowcasting Covid-19 statistics reported with delay: a case study of Sweden
[13] The population dynamics of microparasites and their invertebrate hosts
[14] Data-driven computational disease spread modeling: from measurement to parametrization and control
[15] Nawel Zammit, Rim Ghammem, Sihem Ben Fredj, and Hassen Ghannem. The incubation period during the pandemic of COVID-19: a systematic review and meta-analysis
[16] Inferred duration of infectious period of SARS-CoV-2: rapid scoping review and analysis of available evidence for asymptomatic and symptomatic COVID-19 cases
[17] Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing
[18] Magnitude of asymptomatic COVID-19 cases throughout the course of infection: A systematic review and meta-analysis
[19] COVID-19 infection fatality ratio: estimates from seroprevalence
[20] The infection fatality rate of COVID-19 in Stockholm - technical report
[21] Aerosol and surface stability of SARS-CoV-2 as compared with SARS-CoV-1
[22] Estimate of the basic reproduction number for COVID-19: a systematic review and meta-analysis
[23] Mortality trends among hospitalised COVID-19 patients in Sweden: a nationwide observational cohort study. The Lancet Regional Health - Europe
[24] Approximate Bayesian computational methods
[25] Handbook of approximate Bayesian computation
[26] Statistical inference for noisy nonlinear ecological dynamic systems
[27] An adaptive Metropolis algorithm
[28] Can auxiliary indicators improve COVID-19 forecasting and hotspot prediction?
[29] Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds

This work was financially supported by the Swedish Innovation Agency Vinnova, by the Swedish Research Council Formas (S. Engblom), and by the Swedish Research Council (H. Runvik and A. Medvedev).

SE conceived the research, and RM and SE developed the initial forward model. Linear filters were designed by SE and HR, who also developed and operated the dynamic optimization approach with input from AM. SE developed the priors and the data pre-processing, and RM adapted the Bayesian sampling techniques, collected data, performed the computations, and prepared the manuscript jointly with SE and with input from HR. All authors took part in revising the manuscript.