key: cord-0495389-f09lsotw
authors: Gasparini, Mauro
title: Improving Bayesian estimation of Vaccine Efficacy
date: 2021-03-07
sha: 299e27161a405b2a21766666cc24d36d64e2b9f6
doc_id: 495389
cord_uid: f09lsotw

A full Bayesian approach to the estimation of Vaccine Efficacy is presented, which is an improvement over the currently used exact method conditional on the total number of cases. As an example, we reconsider the statistical sections of the BioNTech/Pfizer protocol, which in 2020 led to the first approved anti-Covid-19 vaccine.

The so-called "exact method conditional on the total number of cases" is a Bayesian approach to the estimation of vaccine efficacy (VE) which was used in the recent pivotal clinical trial of the anti-Covid-19 vaccine sponsored by Pfizer/BioNTech ([2]) and is also mentioned in the analogous paper on the vaccine sponsored by Moderna ([1]), the other currently approved mRNA-based vaccine. The name "exact method conditional on the total number of cases" is taken from the latter work. Besides the enormous impact of these new therapies on the lives of billions of people, it should be stressed that these were among the first major clinical trials to adopt Bayesian methods for planning and analysis, something many statisticians have been advocating for quite some time.

Nonetheless, the exact method conditional on the total number of cases is only an approximately Bayesian approach, since it makes only partial use of the full likelihood and of the Bayesian updating mechanism. In particular, the total number of cases and the surveillance times of the vaccinated and of the placebo cohorts are treated as known parameters instead of observed statistics, hence the adjective "conditional": the method is a partially Bayesian method conditional on the total number of cases and on the surveillance times.
This work presents a full Bayesian approach which starts from the same assumptions as the conditional method but generalizes it by deriving the distributions of the total number of cases and of the surveillance times and by including them in the full model. The exact method conditional on the total number of cases is described in Section 2, while the full Bayesian model is derived in Section 3. Section 4 revisits the Pfizer/BioNTech results and compares them with the full Bayesian approach.

The "exact method conditional on the total number of cases" relies on the mathematical assumption that the infection processes can be modeled by two overlapping homogeneous Poisson processes: one for the vaccinated participants, with intensity $\lambda_v$, and an independent one for the control (not vaccinated) participants, with intensity $\lambda_c$. The time dimension of the Poisson processes is called "surveillance time" and is measured in person-years of follow-up. It is the sum of the durations that all participants spend in the clinical trial, from 7 days after the second dose (for the BNT162b2 mRNA vaccine) until the earliest of the following four events: onset of disease, death, loss to follow-up, or end of study. A common measure of comparison between two infection processes in epidemiology is the incidence rate ratio $\mathrm{IRR} = \lambda_v/\lambda_c$; based on it, the percentage version of VE is defined as

$$ \mathrm{VE} = 100\,(1 - \mathrm{IRR}) = 100\left(1 - \frac{\lambda_v}{\lambda_c}\right), \tag{2.1} $$

which can be interpreted as the average percentage of prevented infections (among the vaccinated participants who would have been infected if not vaccinated, the percentage who are not infected). In order to estimate VE, one can define a likelihood based on the following statistics:

• $s_v$ = surveillance time of the vaccine cohort,
• $s_c$ = surveillance time of the control cohort,
• $x_v + x_c$ = total number of infections,
• $x_v$ = number of infections in the vaccine cohort.
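As a minimal sketch of (2.1), assuming the crude incidence rates $x_v/s_v$ and $x_c/s_c$ are used as plug-in estimates of $\lambda_v$ and $\lambda_c$ (this is not the Bayesian estimator discussed in this paper), VE can be computed directly from the four statistics listed above. The function names and the numbers in the usage block are hypothetical.

```python
def incidence_rate(cases, person_years):
    """Crude incidence rate: number of cases per person-year of surveillance time."""
    return cases / person_years


def vaccine_efficacy(xv, sv, xc, sc):
    """Plug-in estimate of VE = 100 * (1 - IRR), as in (2.1), in percent."""
    irr = incidence_rate(xv, sv) / incidence_rate(xc, sc)  # estimate of lambda_v / lambda_c
    return 100.0 * (1.0 - irr)


if __name__ == "__main__":
    # hypothetical counts and surveillance times (person-years), for illustration only
    print(round(vaccine_efficacy(xv=20, sv=2498.0, xc=199, sc=2483.0), 1))
```

Equal incidence rates in the two cohorts give VE = 0, and a vaccine cohort with no cases gives VE = 100.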
Using standard probability notation, the joint density of the corresponding random variables (denoted by capital letters), which is the dual way of writing the likelihood, can be expressed as

$$ f(s_v, s_c, x_v + x_c, x_v) = f_{S_V,S_C}(s_v, s_c)\; f_{X_V+X_C \mid S_V,S_C}(x_v + x_c \mid s_v, s_c)\; f_{X_V \mid X_V+X_C,S_V,S_C}(x_v \mid x_v + x_c, s_v, s_c), \tag{2.2} $$

i.e. as the chain product of the marginal density of $(S_V, S_C)$, the conditional density of $X_V + X_C$, and the conditional density of $X_V$ given $x_v + x_c$, which can easily be shown to be binomial:

$$ X_V \mid x_v + x_c, s_v, s_c \sim \mathrm{Binomial}(x_v + x_c, \theta). \tag{2.3} $$

Notice that the success probability in this formula (the probability that a case occurred in the vaccine cohort) is

$$ \theta = \frac{s_v \lambda_v}{s_v \lambda_v + s_c \lambda_c} = \frac{s_v (1 - \mathrm{VE}/100)}{s_v (1 - \mathrm{VE}/100) + s_c}. \tag{2.4} $$

The exact method conditional on the total number of cases consists of assuming that the first two factors of the likelihood (2.2) do not depend on VE, de facto substituting the binomial expression (2.3) for the full likelihood (2.2). To complete the analysis, a conditional Bayesian approach is then followed and a conjugate prior Beta(a, b) is given to the parameter $\theta$, so that the posterior of $\theta$ is Beta($a + x_v$, $b + x_c$). Once the posterior is obtained, the posterior mean, Bayesian credible intervals and posterior probabilities can be computed for $\theta$ and, working equation (2.4) backward, for VE itself. An example from the Pfizer/BioNTech paper is discussed in Section 4.

However, it is not true that the first two factors of the likelihood (2.2) are independent of VE. Imagine studies that go on for a long time: for cohorts of equal size, the ratio between the control surveillance time $s_c$ and the vaccinated surveillance time $s_v$ then approximates the ratio of the two mean times to infection, by the law of large numbers applied once to the numerator and once to the denominator of the ratio. Now, since the times to infection are exponentially distributed with mean $1/\lambda_v$ for the vaccinated and $1/\lambda_c$ for the control group, the ratio of the two mean times is exactly $(1/\lambda_c)/(1/\lambda_v) = \lambda_v/\lambda_c = 1 - \mathrm{VE}$, with VE expressed as a proportion. Hence, the ratio of the surveillance times does contain some extra information about VE, in addition to the numbers of cases in the two groups. In practice, things are complicated by the fact that the study must have a finite duration $D$.
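To make the conditional method concrete, here is a minimal Monte Carlo sketch, assuming a hypothetical uniform Beta(1, 1) prior on $\theta$ (the prior actually used in [2] may differ). Draws from the conjugate posterior Beta($a + x_v$, $b + x_c$) are mapped back to VE by inverting (2.4) with $s_v$ and $s_c$ held fixed, which is exactly the simplification discussed above. Function names and data are illustrative assumptions.

```python
import random


def ve_from_theta(theta, sv, sc):
    """Invert (2.4): VE (in percent) implied by theta, with sv and sc held fixed."""
    ratio = theta * sc / ((1.0 - theta) * sv)  # lambda_v / lambda_c
    return 100.0 * (1.0 - ratio)


def conditional_posterior_ve(xv, xc, sv, sc, a=1.0, b=1.0, n_draws=100_000, seed=0):
    """Posterior mean and 95% equal-tailed interval of VE under the conditional method."""
    rng = random.Random(seed)
    # conjugate update: Beta(a, b) prior with a Binomial(xv + xc, theta) likelihood
    draws = sorted(
        ve_from_theta(rng.betavariate(a + xv, b + xc), sv, sc) for _ in range(n_draws)
    )
    mean = sum(draws) / n_draws
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return mean, (lower, upper)


if __name__ == "__main__":
    # hypothetical data, for illustration only
    print(conditional_posterior_ve(xv=20, xc=199, sv=2498.0, sc=2483.0))
```

Because $s_v$ and $s_c$ enter only through the back-transformation, their sampling variability plays no role in the posterior, which is the limitation the full model addresses.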
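Now, it is possible to derive an explicit expression for the full likelihood (2.2). First, notice that, by the properties of two independent overlapping Poisson processes,

$$ X_V + X_C \mid s_v, s_c \sim \mathrm{Poisson}(s_v \lambda_v + s_c \lambda_c). $$

Next, each total surveillance time is the sum of many i.i.d. random variables, each of them given by the minimum between the time to infection and a random censoring time. By the central limit theorem, the surveillance time of each of the two cohorts is therefore approximately normal, as in the following theorem, where a reasonable specific assumption is made about the patient recruitment process.

Theorem 3.1. Consider a cohort of $n$ participants and assume that (1) the times to infection $T$ are i.i.d. exponential with rate $\lambda$; (2) each time to infection can be censored by an independent censoring time $C$; (3) participants are recruited at times uniformly distributed over $(0, D)$ and followed up until the end of the study at time $D$. Then the surveillance time of the cohort is approximately normal with mean $n\,E(\min(T, C))$ and variance $n\,\mathrm{Var}(\min(T, C))$, where

$$ E(\min(T, C)) = \frac{1}{\lambda}\left(1 - \frac{1 - e^{-\lambda D}}{\lambda D}\right) $$

and variance

$$ \mathrm{Var}(\min(T, C)) = \frac{1}{\lambda^2}\left[2 + 2e^{-\lambda D} - \frac{4(1 - e^{-\lambda D})}{\lambda D} - \left(1 - \frac{1 - e^{-\lambda D}}{\lambda D}\right)^2\right]. $$

Proof. The potential infection time $T$ is exponential with rate $\lambda$ and can be censored by an independent censoring random variable $C$. By assumption 3, $C$ can be written as $D - U$, where $U$ is the recruitment time, uniform between 0 and $D$. Therefore $C$ itself has a uniform distribution between 0 and $D$. Next, conditioning on $C$,

$$ E(\min(T, C)) = E\left[\int_0^C e^{-\lambda t}\,dt\right] = E\left[\frac{1 - e^{-\lambda C}}{\lambda}\right] = \frac{1}{\lambda}\left(1 - \frac{1 - e^{-\lambda D}}{\lambda D}\right), $$

$$ E(\min(T, C)^2) = E\left[\int_0^C 2t\,e^{-\lambda t}\,dt\right] = \frac{2}{\lambda^2}\left(1 + e^{-\lambda D} - \frac{2(1 - e^{-\lambda D})}{\lambda D}\right), $$

and the variance follows as $E(\min(T, C)^2) - E(\min(T, C))^2$. Finally, the central limit theorem applies.

Having completed the construction of the likelihood, to obtain a full Bayesian model only the prior on $(\lambda_v, \lambda_c)$ remains to be chosen. A natural proposal is to use independent gamma priors with shape and rate hyperparameters $(a_v, b_v)$ and $(a_c, b_c)$. The rate parameters $b_v$ and $b_c$ should then be chosen to give $\lambda_v$ and $\lambda_c$ the right order of magnitude. For example, given a prior guess $\bar\lambda_c$ for the average number of infections per person-year in the control group (something we may estimate from the natural history of the disease), we could set $b_c = a_c/\bar\lambda_c$ and, for lack of better information, impose $b_v = b_c$. Next, the two hyperparameters $a_v$ and $a_c$ can be chosen by gauging $\mathrm{VE} = 1 - \lambda_v/\lambda_c$. For example, noticing that, when $b_v = b_c$ and $a_c > 1$,

$$ E(\mathrm{VE}) = 1 - E(\lambda_v)\,E(1/\lambda_c) = 1 - \frac{a_v}{a_c - 1}, \tag{3.3} $$

one can set $a_c = 1 + a_v/(1 - \overline{\mathrm{VE}})$, where $\overline{\mathrm{VE}}$ is a suitable prior guess for VE, with $0 \le \overline{\mathrm{VE}} < 1$ (there would be no point in experimenting with a vaccine whose expected VE is negative, since that would imply $\lambda_v > \lambda_c$).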
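The closed-form moments of Theorem 3.1 can be checked numerically. The sketch below, a hypothetical helper rather than part of the original analysis, evaluates them (using `math.expm1` for accuracy when $\lambda D$ is small) and compares them with a Monte Carlo simulation in which $C = D - U$ with $U$ uniform on $(0, D)$, as in assumption 3. The rate and duration in the usage block are illustrative values chosen so that censoring is heavy.

```python
import math
import random


def min_tc_moments(lam, D):
    """Closed-form E and Var of min(T, C), T ~ Exp(lam), C ~ Uniform(0, D) (Theorem 3.1)."""
    x = lam * D
    q = -math.expm1(-x) / x  # (1 - exp(-lam*D)) / (lam*D), computed stably
    mean = (1.0 - q) / lam
    var = (2.0 + 2.0 * math.exp(-x) - 4.0 * q - (1.0 - q) ** 2) / lam ** 2
    return mean, var


def simulate_min_tc(lam, D, n, seed=0):
    """Monte Carlo mean and variance of min(T, C) with C = D - U, U ~ Uniform(0, D)."""
    rng = random.Random(seed)
    vals = []
    for _ in range(n):
        t = rng.expovariate(lam)      # time to infection, rate lam
        c = D - rng.uniform(0.0, D)   # censoring: time from recruitment to end of study
        vals.append(min(t, c))
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return mean, var


if __name__ == "__main__":
    lam, D = 2.0, 1.0  # hypothetical values with heavy censoring
    print(min_tc_moments(lam, D))
    print(simulate_min_tc(lam, D, 200_000))
```

Multiplying both moments by the cohort size $n$ gives the mean and variance of the normal approximation for the surveillance time. As $D$ grows the moments tend to $1/\lambda$ and $1/\lambda^2$ (no censoring), and as $D$ shrinks the variance tends to $D^2/12$, the variance of $C$.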
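Finally, the choice of $a_v$ could be driven by noticing that $a_v = 1$ gives an inverted-J-shaped prior density for $\lambda_v$, which is then exponential. With this choice, $b_c = b_v = (\bar\lambda_c)^{-1}$, i.e. the average time to infection of a randomly selected participant in the control group. Notice also that in this case $a_c > 1$, which ensures that the prior expectation of VE in formula (3.3) exists and is positive.

To recap, here is the proposed full Bayesian model, which is also illustrated graphically by its associated directed acyclic graph (DAG) in Figure 1:

$$ \lambda_v \sim \mathrm{Gamma}(a_v, b_v), \qquad \lambda_c \sim \mathrm{Gamma}(a_c, b_c) \quad \text{independently}, $$

$$ S_V \mid \lambda_v \sim N\big(n_v E(\min(T_v, C_v)),\; n_v \mathrm{Var}(\min(T_v, C_v))\big), \qquad S_C \mid \lambda_c \sim N\big(n_c E(\min(T_c, C_c)),\; n_c \mathrm{Var}(\min(T_c, C_c))\big), $$

$$ X_V + X_C \mid s_v, s_c, \lambda_v, \lambda_c \sim \mathrm{Poisson}(s_v \lambda_v + s_c \lambda_c), $$

$$ X_V \mid x_v + x_c, s_v, s_c, \lambda_v, \lambda_c \sim \mathrm{Binomial}\left(x_v + x_c,\; \frac{s_v \lambda_v}{s_v \lambda_v + s_c \lambda_c}\right), $$

where $n_v$ and $n_c$ are the cohort sizes, with $E(\min(T_v, C_v))$, $\mathrm{Var}(\min(T_v, C_v))$, $E(\min(T_c, C_c))$, $\mathrm{Var}(\min(T_c, C_c))$ given in Theorem 3.1 for the vaccine and the control group respectively, and with the following default choices: $a_v = 1$, $a_c = 1 + 1/(1 - \overline{\mathrm{VE}})$ from (3.3), and $b_v = b_c = (\bar\lambda_c)^{-1}$.

The following theorem draws a connection to the exact method conditional on the total number of cases.

Theorem 3.2. Under the prior of the full Bayesian model, $b_v \lambda_v / (b_v \lambda_v + b_c \lambda_c) \sim \mathrm{Beta}(a_v, a_c)$.

Proof. It is easy to see that $b_v \lambda_v \sim \mathrm{Gamma}(a_v, 1)$ and, independently, $b_c \lambda_c \sim \mathrm{Gamma}(a_c, 1)$. Then, by the well-known Rényi representation of a Dirichlet distribution (as normalized independent gamma variables), which in one dimension is the same as a Beta distribution, the theorem follows.

If we used the data-dependent prior $b_v = s_v$ and $b_c = s_c$, the quantity in Theorem 3.2 would coincide with $\theta$ in (2.4) and would receive a Beta($a_v$, $a_c$) prior, so that the full model would approximate the exact method conditional on the total number of cases. That is not what is recommended here, though, since data-dependent priors are difficult to accept; the prior choice $b_v = b_c = (\bar\lambda_c)^{-1}$ discussed above looks more reasonable.

Several computational strategies are available for the full Bayesian model, the easiest being MCMC simulation of the exact posterior distribution of VE. Among the many possibilities available nowadays, the software OpenBUGS (http://www.openbugs.net/) has been used here. The model and data files necessary to run the MCMC simulation in OpenBUGS are listed in the Appendix. Using the data taken from Table 2, the full Bayesian analysis yields a posterior mean equal to 93.6 and a posterior interval (89.0, 97.0) for VE.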
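The analysis above relies on OpenBUGS. Purely as an illustration of what the MCMC computation involves, the sketch below swaps in a plain random-walk Metropolis sampler on $(\log\lambda_v, \log\lambda_c)$ for the same full model: gamma priors, normal surveillance times with the moments of Theorem 3.1, and the Poisson-binomial likelihood for the cases, which factorizes into two independent Poisson terms. All function names, the step size, the prior guesses and the data in the usage block are hypothetical; the sketch does not reproduce the reported results.

```python
import math
import random


def min_tc_moments(lam, D):
    """E and Var of min(T, C), T ~ Exp(lam), C ~ Uniform(0, D) (Theorem 3.1)."""
    x = lam * D
    q = -math.expm1(-x) / x
    mean = (1.0 - q) / lam
    var = (2.0 + 2.0 * math.exp(-x) - 4.0 * q - (1.0 - q) ** 2) / lam ** 2
    return mean, var


def log_posterior(log_lv, log_lc, data, prior):
    """Unnormalized log posterior of (log lambda_v, log lambda_c) under the full model."""
    lv, lc = math.exp(log_lv), math.exp(log_lc)
    lp = 0.0
    cohorts = (
        (lv, prior["av"], prior["bv"], data["nv"], data["sv"]),
        (lc, prior["ac"], prior["bc"], data["nc"], data["sc"]),
    )
    for lam, a, b, n, s in cohorts:
        # Gamma(shape a, rate b) prior plus the log-Jacobian of lam = exp(phi)
        lp += a * math.log(lam) - b * lam
        mean, var = min_tc_moments(lam, data["D"])
        if var <= 0.0:  # numerical guard for extreme proposals
            return -math.inf
        # normal approximation: surveillance time ~ N(n * mean, n * var)
        lp += -0.5 * math.log(n * var) - (s - n * mean) ** 2 / (2.0 * n * var)
    # Poisson(total) x Binomial(split) = independent Poisson(sv*lv) and Poisson(sc*lc)
    lp += data["xv"] * math.log(data["sv"] * lv) - data["sv"] * lv
    lp += data["xc"] * math.log(data["sc"] * lc) - data["sc"] * lc
    return lp


def sample_ve(data, prior, n_iter=20000, burn_in=2000, step=0.1, seed=1):
    """Random-walk Metropolis; returns posterior draws of VE = 1 - lambda_v / lambda_c."""
    rng = random.Random(seed)
    # start at the crude incidence rates (0.5 guards against zero counts)
    cur = [math.log(max(data["xv"], 0.5) / data["sv"]),
           math.log(max(data["xc"], 0.5) / data["sc"])]
    cur_lp = log_posterior(cur[0], cur[1], data, prior)
    draws = []
    for it in range(n_iter):
        prop = [c + rng.gauss(0.0, step) for c in cur]
        prop_lp = log_posterior(prop[0], prop[1], data, prior)
        # accept with probability min(1, exp(prop_lp - cur_lp))
        if rng.random() < math.exp(min(0.0, prop_lp - cur_lp)):
            cur, cur_lp = prop, prop_lp
        if it >= burn_in:
            draws.append(1.0 - math.exp(cur[0] - cur[1]))
    return draws


if __name__ == "__main__":
    # hypothetical data and prior guesses, for illustration only
    data = {"nv": 20000, "nc": 20000, "D": 0.25,
            "sv": 2498.0, "sc": 2483.0, "xv": 20, "xc": 199}
    ve_guess, lambda_c_guess = 0.5, 0.1
    av = 1.0
    ac = 1.0 + av / (1.0 - ve_guess)  # from (3.3)
    b = 1.0 / lambda_c_guess          # default b_v = b_c = 1 / lambda_c_guess
    prior = {"av": av, "ac": ac, "bv": b, "bc": b}
    draws = sorted(sample_ve(data, prior))
    k = len(draws)
    print(sum(draws) / k, draws[int(0.025 * k)], draws[int(0.975 * k) - 1])
```

Sampling on the log scale keeps both rates positive without constraints; the log-Jacobian term $\log\lambda$ is why the prior contributes $a\log\lambda$ rather than $(a-1)\log\lambda$.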
The Pfizer/BioNTech results in [2] do not differ substantially from the full Bayesian analysis, because the very large sample sizes make the influence of the prior vanish; both provide substantial evidence for the efficacy of the Pfizer/BioNTech vaccine.

From a theoretical point of view, a new, fully coherent Bayesian model for VE has been developed here. It does not produce results in strong contrast with the approximations in the original paper [2], which features large sample sizes and uncontroversially positive results. The new model may serve as a theoretical basis for analyzing the other vaccines currently under development, which may not exhibit as large a VE. For those, a carefully motivated prior may make a difference.

Appendix. The model of the OpenBUGS program is the following (the data nodes are the cohort sizes nc and nv, the study duration D, the surveillance times sc and sv, the total number of cases, the number of vaccinated cases xv, and the hyperparameters ac, bc, av, bv):

    model {
        # independent gamma priors (shape, rate) on the two infection rates
        lambdac ~ dgamma(ac, bc)
        lambdav ~ dgamma(av, bv)
        # surveillance times: normal approximation of Theorem 3.1 (dnorm takes a precision)
        sc ~ dnorm(expsc, tausc)
        sv ~ dnorm(expsv, tausv)
        tausc <- 1/(nc*varsc)
        tausv <- 1/(nv*varsv)
        expsc <- nc*(1-(1-exp(-lambdac*D))/(lambdac*D))/lambdac
        varsc <- (2 - 4/(lambdac*D) + 2*exp(-lambdac*D) + 4*exp(-lambdac*D)/(lambdac*D) - pow(1-(1-exp(-lambdac*D))/(lambdac*D), 2))/pow(lambdac, 2)
        expsv <- nv*(1-(1-exp(-lambdav*D))/(lambdav*D))/lambdav
        varsv <- (2 - 4/(lambdav*D) + 2*exp(-lambdav*D) + 4*exp(-lambdav*D)/(lambdav*D) - pow(1-(1-exp(-lambdav*D))/(lambdav*D), 2))/pow(lambdav, 2)
        # total number of cases and its split between the two cohorts
        cases ~ dpois(meancases)
        meancases <- sc*lambdac + sv*lambdav
        xv ~ dbin(theta, cases)
        theta <- sv*lambdav/(sc*lambdac + sv*lambdav)
    }

References

[1] Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine.
[2] Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine.