key: cord-0477262-xx32xqh7
authors: Wood, Simon N.; Wit, Ernst C.; Fasiolo, Matteo; Green, Peter J.
title: COVID-19 and the difficulty of inferring epidemiological parameters from clinical data
date: 2020-04-28
journal: nan
DOI: nan
sha: aeca2a8b7dc6e608c9a5691738a62177b3fd0e79
doc_id: 477262
cord_uid: xx32xqh7

Knowing the infection fatality ratio (IFR) is of crucial importance for evidence-based epidemic management: for immediate planning; for balancing the life years saved against the life years lost due to the consequences of management; and for evaluating the ethical issues associated with the tacit willingness to pay substantially more for life years lost to the epidemic than for those lost to other diseases. Against this background Verity et al. (2020, Lancet Infectious Diseases) have rapidly assembled case data and used statistical modelling to infer the IFR for COVID-19. We have attempted an in-depth statistical review of their approach, to identify to what extent the data are sufficiently informative about the IFR to play a greater role than the modelling assumptions, and have tried to identify those assumptions that appear to play a key role. Given the difficulties with other data sources, we provide a crude alternative analysis based on the Diamond Princess cruise ship data and case data from China, and argue that, given the data problems, modelling of clinical data to obtain the IFR can only be a stop-gap measure. What is needed is near-direct measurement of epidemic size by PCR and/or antibody testing of random samples of the at-risk population.

Knowing the infection fatality ratio (IFR) is crucial for epidemic management: for immediate planning; for balancing the life years saved against those lost to the consequences of management; and for evaluating the ethics of paying substantially more to save a life year from the epidemic than from other diseases. Impressively, Verity et al. (2020) rapidly assembled case data and used statistical modelling to infer the IFR for COVID-19. We have attempted an in-depth statistical review of their paper, eschewing statistical nit-picking, but attempting to identify the extent to which the (necessarily compromised) data are more informative about the IFR than the modelling assumptions. First the data.

• Individual level data for outside China appear problematic, because different countries have differing levels of ascertainment and different disease-severity thresholds even for classification as a case. Their use in IFR estimation would require country-specific model ascertainment parameters, about which we have no information. Consequently these data provide no useful information on IFR.

• Repatriation flight data provide the sole information on Wuhan prevalence (excepting the lower bound of confirmed cases). 689 foreign nationals eligible for repatriation are doubtfully representative of the susceptible population of Wuhan. Hence it is hard to see how to usefully incorporate the 6 positive cases from this sample.

• Case-mortality data from China provide an upper bound for IFR and, with extra assumptions, information on the age dependence of IFR. Since prevalence is unknown, they contain no information for estimating the absolute IFR magnitude.

• Because of extensive testing, the Diamond Princess (used only for validation by Verity et al.) supplies data on both infections and symptomatic cases, with fewer ascertainment problems. These data appear directly informative about IFR. Against this, the co-morbidity load on the DP is unlikely to fully represent any population of serious interest (perhaps fewer very severe but more mild co-morbidities).
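The upper-bound claim for case-mortality data follows from a one-line argument. It is sketched here with $D$ deaths, $C$ ascertained cases and $I$ infections in a group; these symbols are introduced only for illustration, and the sketch ignores deaths that have not yet occurred or were not attributed to the disease.

```latex
\[
\mathrm{CFR} = \frac{D}{C}, \qquad
\mathrm{IFR} = \frac{D}{I} = \mathrm{CFR}\cdot\frac{C}{I} \;\le\; \mathrm{CFR}
\quad\text{since } C \le I .
\]
```

Without an estimate of the ascertainment fraction $C/I$, the case data fix only the bound, not the IFR itself.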

Secondly, the modelling assumptions: we see two primary problems.

1. Verity et al. correct the Chinese case data by assuming that ascertainment differences across age groups determine case rate differences. Outside Wuhan they replace observed case data by the cases that would have occurred if each age group had the same per-capita observed case rate as the 50-59 group. They assume complete ascertainment for the 50-59s. These are very strong modelling assumptions that will greatly affect the results: but the published uncertainty bounds reflect no uncertainty about them. In Wuhan, the complete ascertainment assumption is relaxed by introducing a parameter, but one for which the data appear uninformative, so the results will be driven by the assumed uncertainty.

2. Generically, Bayesian models describe uncertainty both in the data and in prior beliefs about the studied system. [...] .11,.30 ). The strong assumptions that this approach also requires emphasize the need for improved data. We should replace complex models of inadequate clinical data with simpler models of epidemiological prevalence data from appropriately designed random sampling using antibody or PCR tests.
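The age correction described in point 1 can be made concrete with a small sketch. It assumes, as the text describes, that every age group is assigned the per-capita observed case rate of a reference group taken to be completely ascertained. The function name and all numbers below are hypothetical illustrations, not data from Verity et al. or from this paper.

```python
def standardise_cases(cases, population, ref):
    """Cases each age group would show at the reference group's per-capita rate."""
    ref_rate = cases[ref] / population[ref]
    return [ref_rate * n for n in population]


# Hypothetical illustration values (four age groups), not real data.
cases = [10, 40, 100, 40]
population = [1000, 2000, 2000, 1000]

# Index 2 plays the role of the "50-59" reference group.
adjusted = standardise_cases(cases, population, ref=2)

# Ascertainment implied for each group by this assumption alone.
implied_ascertainment = [c / a for c, a in zip(cases, adjusted)]

print(adjusted)
print(implied_ascertainment)
```

The implied ascertainment fractions are a deterministic consequence of the choice of reference group, which is why uncertainty about that choice does not appear in intervals computed after the correction.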

We attempt to construct a model for the Diamond Princess (henceforth DP) data and aggregated data from China, with the intention that the DP data informs the absolute magnitude of the IFR while the China data contributes to the estimation of relative IFR by age class. For the Diamond Princess we lump the 80-89 and 90+ age groups into an 80+ group to match the China data, noting that there are no deaths in the 90+ group. We obtained the age of death of the 12 cases from the Diamond Princess Wikipedia page, checking the news reports on which the information was based. One case has no age reported except that he was an adult. Given that there was no mention of a young victim we have assumed that he was 50 or over. We adopt the assumptions of Verity et al. (2020) of a constant attack rate with age, and that there is perfect ascertainment in one age class, but assume that this is the 80+ age group for the DP. The assumption seems more tenable for the DP population than for China, given that 4003 PCR tests were administered to the 3711 people on board, with the symptomatic and elderly tending to be tested first. However given that the tests were not administered weekly to all people not yet tested positive, from the start of the outbreak, and that the tests are not 100% reliable, the assumption is still unlikely to be perfect, which may bias results upwards. Unlike Verity et al. (2020) we do not correct the case data, but adopt a simple model for under-ascertainment by age, allowing some, but by no means all, of the uncertainty associated with this assumption to be reflected in the intervals reported below. We then model a proportion of the potentially detectable cases as being symptomatic, making a second strong assumption that this rate is constant across age classes. 
This assumption is made because the data only tell us that there were 314 symptomatic cases among 706 positive tests but not their ages, so we have no information to further distinguish age specific under-ascertainment and age specific rates of being asymptomatic. We then adopt a simple model for the probability of death with age (quadratic on the logit scale).
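For orientation, the raw Diamond Princess counts quoted above give the following crude ratios. This is only arithmetic on the quoted totals: it ignores age structure, ascertainment and test reliability, so it is not the model-based IFR.

```python
# Counts quoted in the text for the Diamond Princess.
on_board = 3711
positive = 706
symptomatic = 314
deaths = 12

positive_share = positive / on_board
symptomatic_share = symptomatic / positive
crude_death_share = deaths / positive

print(f"positive / on board:    {positive_share:.3f}")
print(f"symptomatic / positive: {symptomatic_share:.3f}")
print(f"deaths / positive:      {crude_death_share:.4f}")
```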

For the China data we necessarily use a different attack rate to the DP, but the same model as the DP to go from infected to symptomatic cases (on the basis that this reflects an intrinsic characteristic of the infectious disease). However we assume that only a proportion of symptomatic cases are detected (at least relative to whatever threshold counted as symptomatic on the DP). Furthermore we are forced to adopt a modified ascertainment model for China, and correct for the difference between this and the DP ascertainment model, within the sub-model for China. We assume the same death rates for symptomatic cases in China, but apply the Verity et al. (2020) correction for not-yet-occurred deaths, based on their fitted Gamma model, treating this correction as fixed.

2 Technical details of the crude IFR model

In detail, starting with the Diamond Princess, let $\alpha$ be the infection probability, constant for all age classes, $p^c_i$ the probability that an infection is detectable in age class $i$, $p^s_i$ the probability that a detectable case develops symptoms, and $p^d_i$ the probability that a symptomatic case dies. Then $p^c_i p^s_i p^d_i$ is the IFR for age class $i$. Let $a_i$ denote the lower age boundary of class $i$. The models are (i) for the detectability probability

Note the assumption that all cases in the oldest age class are ascertained; (ii) a constant symptomatic probability model, $p^s_i = \phi$; and (iii) for the probability that a symptomatic case dies,
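The display equations for (i) and (iii) are not reproduced in this copy. The sketch below is therefore an assumption, not the authors' formulae. It takes a Gaussian-shaped detectability profile of the same kind as the China profile $p^{cc}_i$ defined later, with a floor $\gamma_1$ and value 1 in the 80+ class, and a quadratic in age on the logit scale for death. The centring age `ref_age` and the parameter values in the usage lines are hypothetical.

```python
import math


def detect_prob(age, gamma1, gamma2, top_age=80.0):
    """Assumed detectability: floor gamma1, rising to 1 at top_age."""
    bump = math.exp(-((age - top_age) ** 2) / math.exp(gamma2))
    return gamma1 + (1.0 - gamma1) * bump


def death_prob(age, beta1, beta2, beta3, ref_age=40.0):
    """Assumed death probability: quadratic in age on the logit scale."""
    x = age - ref_age  # ref_age is a hypothetical centring choice
    eta = beta1 + beta2 * x + beta3 * x * x
    return 1.0 / (1.0 + math.exp(-eta))


def ifr(age, gamma1, gamma2, phi, beta1, beta2, beta3):
    """IFR for an age class as the product p_c * p_s * p_d, with p_s = phi."""
    return detect_prob(age, gamma1, gamma2) * phi * death_prob(age, beta1, beta2, beta3)


# Hypothetical parameter values, for illustration only.
for a in range(0, 90, 10):
    print(a, round(ifr(a, 0.3, 7.2, 0.45, -3.5, 0.05, 0.0), 5))
```

Whatever the exact functional forms, the product structure means that the IFR scale depends jointly on detectability, the symptomatic fraction and the death probability, so data that constrain only one factor cannot fix the product.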

For a case to be recorded on the DP, the person needed to be infected by the virus, to become ill and to be detected at the right moment. In principle, this means that the number of cases in age class $i$ is distributed as $\text{binom}(p^c_i \alpha, n_i)$, where $p^c_i \alpha$ is the probability of becoming ill and being detected, and $n_i$ is the number of people in age class $i$ on the DP. However, as only 619 out of the 706 cases have their age recorded, we split the cases into

where $C_i$ are the observed cases of known age and $C^+_i$ are the additional cases, assumed to follow the same age distribution, but not actually recorded by age. Binomial parameters are rounded appropriately. Letting $S_i$ denote the symptomatics among the cases in age group $i$, we have

The deaths among the symptomatics of known age are distributed as

where $h_i$ is the probability of being of known age on death (this is treated as fixed at 1 for ages less than 50, and 11/12 for 50+, given the one victim on the DP for whom no age was recorded, except that he was an adult). For the deaths of unknown age, $D_{na}$ (there is one of these), among the symptomatics of unknown age (an artificial category)

where the probability of death is

Finally the total number of symptomatics is modelled as $S_t \sim N(\sum_i S_i, 5^2)$, allowing some limited uncertainty in the symptomatic/asymptomatic classification.
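Several display equations in this section did not survive in this copy. The block below is a reconstruction of those that can be read off the surviving JAGS fragments in the appendix (`Cl[i]`, `SNA`, `DNA` and the `1/25` precision). It is not the authors' typeset equations. The labels $S_{na}$ and $\bar p^d_{na}$ are introduced here. The binomial size for $C_i$ is inferred as the complement of that for $C^+_i$, the form for $S_i$ is inferred from a partial fragment, and the code variable `pa` is assumed to play the role of $h_i$. The equation for deaths of known age is omitted because the fragments do not determine it unambiguously.

```latex
\begin{align*}
C^+_i &\sim \mathrm{binom}\bigl(p^c_i\alpha,\; n_i - \mathrm{round}(619\,n_i/706)\bigr), \\
C_i &\sim \mathrm{binom}\bigl(p^c_i\alpha,\; \mathrm{round}(619\,n_i/706)\bigr), \\
S_i &\sim \mathrm{binom}\bigl(p^s_i,\; C_i + C^+_i\bigr), \\
S_{na} &= \sum_i (1-h_i)\,S_i, \qquad
\bar p^d_{na} = \frac{\sum_i (1-h_i)\,p^d_i\,S_i}{S_{na}}, \\
D_{na} &\sim \mathrm{binom}\bigl(\bar p^d_{na},\; \mathrm{round}(S_{na})\bigr), \\
S_t &\sim N\Bigl(\textstyle\sum_i S_i,\; 5^2\Bigr).
\end{align*}
```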

The actual available data on the DP are $S_t$, $D_{na}$ and $\{C_i, n_i, D_i\}_{i=0}^{80}$.
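To see how the Diamond Princess sub-model chains together, the following is a simplified forward simulation of the infected, detected, symptomatic and dead sequence. It is a sketch under stated simplifications: it ignores the split into cases of known and unknown age and the normal noise on the symptomatic total, and every input value (group sizes, probabilities, parameters) is hypothetical rather than taken from the data.

```python
import random


def rbinom(n, p, rng):
    """One binomial(n, p) draw using only the standard library."""
    return sum(rng.random() < p for _ in range(n))


def simulate_dp(n, p_c, p_d, alpha, phi, seed=1):
    """Simulate cases, symptomatics and deaths by age class."""
    rng = random.Random(seed)
    # Detected cases: attacked with probability alpha and detectable with p_c.
    cases = [rbinom(n_i, alpha * pc_i, rng) for n_i, pc_i in zip(n, p_c)]
    # Symptomatics among cases: constant probability phi across age classes.
    symptomatic = [rbinom(c_i, phi, rng) for c_i in cases]
    # Deaths among symptomatics: age-specific probability p_d.
    deaths = [rbinom(s_i, pd_i, rng) for s_i, pd_i in zip(symptomatic, p_d)]
    return cases, symptomatic, deaths


# Hypothetical inputs for three age classes.
n_demo = [500, 1500, 300]
pc_demo = [0.4, 0.7, 1.0]
pd_demo = [0.001, 0.01, 0.1]

cases, symptomatic, deaths = simulate_dp(n_demo, pc_demo, pd_demo, alpha=0.3, phi=0.45)
print(cases, symptomatic, deaths)
```

Repeating such a simulation over posterior draws is the idea behind the posterior predictive checks reported later.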

Moving on to the Chinese data, the assumption is that the patterns with age with respect to detection ($p^c_i$), to being symptomatic ($p^s_i$) and to death ($p^d_i$) are similar, but the attack rate $\tilde\alpha$ for China is different. Let $N_i$ be the population size in age class $i$ and $\tilde S_i$ the symptomatics. Then $\tilde S_i \sim \text{binom}(p^c_i \tilde\alpha\, p^s_i, N_i)$. Unlike on the DP, only a fraction $\delta_i$ of the symptomatics are tested to become cases,

and the (observed) deaths are then distributed as

where $p^y_i$ is the average probability of a case in class $i$ having died yet, given that they will die. This was treated as a fixed correction and is computed from the Verity et al. (2020) estimated Gamma model of time from onset to death, and the known onset times for the cases. The scaling by $\delta_i$ ensures that $p^d_i$ maintains the same meaning between DP and China. We model $\delta_i$ as $\delta_i = \delta\, p^{cc}_i / p^c_i$, where $p^{cc}_i$ is an attempt to capture the shape of the actual China detectability with age and is defined as $p^{cc}_i = \exp\{-(a_i - 65)^2 / e^{\gamma_c}\}$.
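The China detection correction can be written out directly from the two formulae just given. In this sketch the DP detectability values $p^c_i$ are passed in as plain numbers, because their functional form is not reproduced here. The function names and the example values are hypothetical.

```python
import math


def china_profile(age, gamma_c):
    """p_cc: China detectability shape, peaking at age 65."""
    return math.exp(-((age - 65.0) ** 2) / math.exp(gamma_c))


def delta_by_age(ages, p_c, delta, gamma_c):
    """delta_i = delta * p_cc_i / p_c_i for each age class."""
    return [delta * china_profile(a, gamma_c) / pc for a, pc in zip(ages, p_c)]


# Hypothetical values: lower age boundaries and illustrative p_c values.
ages = [0, 10, 20, 30, 40, 50, 60, 70, 80]
p_c = [0.3, 0.3, 0.35, 0.4, 0.5, 0.6, 0.8, 0.95, 1.0]
print([round(d, 3) for d in delta_by_age(ages, p_c, delta=0.5, gamma_c=7.4)])
```

Because $\delta_i$ is used as a binomial probability, parameter combinations that push the ratio above 1 are not admissible. The sketch does not enforce this.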

We define the following priors, using precision rather than variance when defining normal densities: $\alpha \sim U(.1, .9)$, $\gamma_1 \sim U(.01, .99)$, $\gamma_2 \sim N(7.2, .001)$, $\phi \sim U(.1, .9)$, $\beta_1 \sim N(-3.5, .001)$, $\beta_2 \sim N(0, .001)$, $\beta_3 \sim N(0, .001)$, $\tilde\alpha \sim U(10^{-4}, .5)$, $\delta \sim U(.1, .9)$, $\gamma_c \sim N(7.4, .01)$.
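Since the normal priors are stated in terms of precision, it may help to convert them to standard deviations, using sd = 1/sqrt(precision). The helper name below is hypothetical; the means and precisions are those listed above.

```python
import math


def precision_to_sd(tau):
    """Standard deviation of a normal density with precision tau."""
    return 1.0 / math.sqrt(tau)


normal_priors = [
    ("gamma_2", 7.2, 0.001),
    ("beta_1", -3.5, 0.001),
    ("beta_2", 0.0, 0.001),
    ("beta_3", 0.0, 0.001),
    ("gamma_c", 7.4, 0.01),
]

for name, mean, tau in normal_priors:
    print(f"{name}: mean {mean}, sd {precision_to_sd(tau):.2f}")
```

A precision of .001 corresponds to a standard deviation of about 31.6, and .01 to a standard deviation of 10, so these priors are wide on the scale of the parameters.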

This structure uses the information from the DP to assess the symptomatic rate, the hidden case rate and the scale of the death probabilities, while the China data refine the information on how death rates change with age. It is possible to formulate a model in which the China data appear to contribute to inference about absolute levels of mortality, but such a model is completely driven by the prior put on the proportion of cases observed (about which the China data are completely uninformative).

The model was implemented in JAGS 4.3.0. Mixing is slow, but $5\times10^7$ steps, retaining every 2500th sample, gives an effective sample size of about 660 for $\delta$, the slowest moving parameter. We discarded the first 2000 retained samples as burn-in, although diagnostic plots show no sign that this is necessary. Posterior predictive distribution plots are shown in Figure 2. We note the problems with young Chinese detected cases, although even the most extreme mismatch only corresponds to a factor of 2 IFR change, if reflecting incorrect numbers of actual cases. In older groups the model cases are a little high on average, but not by enough to suggest much change in IFR. These mismatches might be reduced by better models for the ascertainment proportion by age. Figure 1 shows the posterior predictive distribution for total Diamond Princess deaths, with the actual deaths as a thick red bar. The median and credible intervals for the IFR as percentages in various groups are in Table 1. They show different estimates of this crucial quantity compared to Verity et al. (2020), again emphasising the urgent need for statistically principled sampling data to directly measure prevalence, instead of having to rely on complex models of problematic data with strong built-in assumptions.
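The sampling arithmetic behind these figures is simple to check. The text does not say whether the effective sample size of about 660 was computed before or after discarding burn-in, so the final ratio below is only indicative.

```python
steps = 5 * 10**7   # total MCMC steps
thin = 2500         # every 2500th sample retained
burn_in = 2000      # retained samples discarded as burn-in
ess_delta = 660     # reported effective sample size for delta

retained = steps // thin      # 20000 retained samples
kept = retained - burn_in     # 18000 samples after burn-in
print(retained, kept, round(ess_delta / kept, 4))
```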

## Diamond Princess and China model - the two data sources that appear to
## contain information.
library(rjags)
load.module("bugs")

Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious Diseases

## This approach slightly underconstrains model as I know total should
## be 706 and have failed to include fact
Cl[i] ~ dbinom(pc[i] * alpha, n[i] - round(n[i] * 619/706)) ## extra cases not classified by age in data
C[i] ~ dbinom(pc[i] * alpha

ps[i] <- phi #ilogit(lphi)
## the symptomatic

Cpp[i]+Cl[i]) ## Posterior predictive version
## probability of death given symptoms and age known

)^2)
## deaths in Symptomatics of known age

])) ## Posterior predictive version
}
## total with symptoms

St <- sum(S) ## Monitor this as posterior predictive
## allow some slop in symptomatic/asymptomatic assessment

1/25) ## Observed node
## deal with Deaths without an age

-pa) * S) ## Symptomatic with unknown age
DNA ~ dbinom(sum((1-pa) * pd * S)/SNA, round(SNA)) ## Observed node for

## China detection profile correction
}
## Now do China

## China attack rate (assumed constant with age as in paper)
for (i in 1:9) { ## China data has 9 age classes
Sch[i] ~ dbinom(pc[i] * alpha.ch * ps[i], pop[i]) ## potentially detectable symptomatics
## but only some proportion of those are detected

## observed node
Cchpp[i] ~ dbinom(delta * dc[i], Sch[i]) ## Posterior predictive version
Dch

Sy)mptomatic total
## DNA is Deaths No Age, pa is probability of not knowing age.
-list(age = 0:8 * 10, n = c

) ## corrections for insufficient time to see all deaths

208)
###################
setwd("foo/bar") ## NOTE: set to jags file location
###################
jdp <- jags.model("dp-china.jags

system.time(um <- jags.samples(jdp, c

## burn-in
for (k in 1:4) {
  if (k==1) {
    pp <- um$Cpp; true <- dat$C; xlab <- "DP cases"
  } else if (k==2) {
    pp <- um$Dpp; true <- dat$D; xlab <- "DP deaths"
  } else if (k==3) {
    pp <- um$Cchpp; true <- dat$Cch; xlab <- "China cases"
  } else {
    pp <- um$Dchpp; true <- dat$Dch; xlab <- "China deaths"
  }
  for

<- hist(log10(ifr), main=(i-1) * 10, xlab="log10(risk)")

:2000),1]),xlab = "DP deaths

04) ## roughly China demography
## Wikipedia Indian demography

27) ## total pop statista
uk <- uk/sum(uk) ## 2018 UK demography
## Verity point estimate IFR by age

## DP deaths according to Verity and assuming all cases found
sum(uk * verity) ## UK IFR Verity point estimates
sum(uk * ci[2,]) ## UK IFR median point estimates
## overall IFR for various demographies

975)) * 100 ## China
pt <- uk %*% (um$ps

975)) * 100 ## UK
pt <- india %*% (um$ps

975)) * 100 ## India
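The surviving fragments above (`uk <- uk/sum(uk)`, `sum(uk * verity)`) indicate that population-level IFRs are obtained by weighting age-specific IFRs by a normalised age distribution. A minimal Python sketch of that weighting follows. The function name and all numbers are hypothetical, not the demographic or IFR values used in the paper.

```python
def overall_ifr(age_counts, ifr_by_age):
    """Demography-weighted IFR: sum of age share times age-specific IFR."""
    total = sum(age_counts)
    return sum(count / total * rate for count, rate in zip(age_counts, ifr_by_age))


# Hypothetical age-class counts (millions) and age-specific IFRs.
young_population = [30, 25, 20, 10, 5]
old_population = [15, 20, 20, 20, 15]
ifr_by_age = [0.0001, 0.0005, 0.002, 0.01, 0.05]

print(f"younger demography: {100 * overall_ifr(young_population, ifr_by_age):.2f}%")
print(f"older demography:   {100 * overall_ifr(old_population, ifr_by_age):.2f}%")
```

The same age-specific IFRs give different overall figures under different demographies, which is why the fragments report separate values for China, the UK and India.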

Acknowledgements: we thank Jonathan Rougier and Guy Nason for helpful discussion of onset-to-death interval estimation and the individual level data.