key: cord-0814240-saos249x
authors: Wibbens, P. D.
title: How effective are the COVID-19 vaccines? A Bayesian analysis
date: 2020-11-30
journal: nan
DOI: 10.1101/2020.11.30.20240671
sha: 50bbceb6007cf8b61801a6e29885d552257744e9
doc_id: 814240
cord_uid: saos249x

This short paper reports a Bayesian analysis of the publicly available COVID-19 vaccine trial results. The analysis casts some doubt on whether the half+full dose regime of the AstraZeneca COVID-19 vaccine is truly (much) more effective than the 2x full dose regime. The 95% posterior interval for the effectiveness of the half+full dose regime is 66-96%, while for the 2x full dose regime it is 39-74%. The estimated effectiveness is 89-97% for the Pfizer vaccine and 86-97% for the Moderna vaccine. These results should be interpreted with care, though, since this analysis does not account for differences in, for instance, trial population, COVID-19 testing, and storage requirements for the various vaccines.

Over the past weeks, data for three COVID-19 vaccines have been released. The Pfizer and Moderna vaccines exhibit around 95% effectiveness [5, 4]. The AstraZeneca vaccine had 62% effectiveness for two regular doses and 90% effectiveness when the first shot was administered as a half dose.

Though promising, these data also raise many questions. How big are the uncertainty margins for these effectiveness numbers, given that these initial results are based on a limited number of COVID-19 infections? And is it really likely that a first half dose of the AstraZeneca vaccine, which was apparently administered in error [2], is more effective than the full dose?

A Bayesian analysis is ideally suited to address such questions. In a Bayesian analysis, the probability distribution of unknown parameters is inferred from limited known data [3]. In this case, the probability distribution of the vaccine effectiveness for each trial is inferred from the publicly available data of COVID-19 infections in vaccinated and control groups.

Since the infection rate as a fraction of the total trial participants is low (on the order of one or a few percent), the number of infections in the control group n_c can be assumed to follow a Poisson distribution with rate λ > 0. A vaccine with effectiveness 0 ≤ α ≤ 1 reduces this rate to (1 − α)λ for the number of infections in the vaccinated group n_v. This assumes that trial participants have equal probability of being in the vaccine or the control group. If the vaccine has no effect (α = 0), the infection rate in the vaccinated group is the same as in the control group, and if the vaccine works perfectly (α = 1), the rate is reduced to zero. Summarizing:

$$n_c \sim \mathrm{Poisson}(\lambda), \qquad n_v \sim \mathrm{Poisson}\big((1-\alpha)\lambda\big).$$

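Assume flat priors on α and log(λ). This means α ∼ Uniform(0, 1) and p(λ) ∝ 1/λ (an improper prior). This leads to the following posterior distribution:

$$p(\alpha, \lambda \mid n_v, n_c) \propto \frac{1}{\lambda}\,\lambda^{n_c} e^{-\lambda}\,\big((1-\alpha)\lambda\big)^{n_v} e^{-(1-\alpha)\lambda} = (1-\alpha)^{n_v}\,\lambda^{n_v+n_c-1}\,e^{-(2-\alpha)\lambda}.$$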

Because we are primarily interested in the distribution of the vaccine effectiveness α, the parameter λ can be integrated out:

$$
\begin{aligned}
p(\alpha \mid n_v, n_c) &= \int_0^\infty p(\alpha, \lambda \mid n_v, n_c)\, d\lambda \\
&\propto (1-\alpha)^{n_v} \int_0^\infty \lambda^{n_v+n_c-1}\, e^{-(2-\alpha)\lambda}\, d\lambda \\
&= \frac{(1-\alpha)^{n_v}}{(2-\alpha)^{n_v+n_c}} \int_0^\infty \lambda^{n_v+n_c-1}\, e^{-\lambda}\, d\lambda \\
&\propto \frac{(1-\alpha)^{n_v}}{(2-\alpha)^{n_v+n_c}}. \qquad (1)
\end{aligned}
$$

The third step in this derivation uses a parameter transformation in the integral, λ → λ/(2 − α). Since the resulting integral is constant in α, it can be taken out of the posterior, which is commonly defined up to a normalization constant (hence the proportionality instead of equality signs). The resulting distribution is used for the posterior inference in this paper, after normalizing such that the total probability is equal to one.
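The normalization step can be illustrated numerically. The following is a minimal Python sketch, not the paper's own code (which is in R, see Appendix A): it evaluates the unnormalized posterior (1 − α)^n_v / (2 − α)^(n_v + n_c) on a grid and rescales it to sum to one. The function names `posterior_grid` and `quantile` and the grid resolution are assumptions made for this illustration; the case counts are the Pfizer-BioNTech values used in Appendix A.

```python
def posterior_grid(n_v, n_c, steps=1000):
    """Evaluate the posterior of alpha on an even grid over [0, 1].

    Returns (alphas, probs), where probs is normalized to sum to one.
    """
    alphas = [i / steps for i in range(steps + 1)]
    # Unnormalized posterior: (1 - alpha)^n_v / (2 - alpha)^(n_v + n_c)
    weights = [(1 - a) ** n_v / (2 - a) ** (n_v + n_c) for a in alphas]
    total = sum(weights)
    return alphas, [w / total for w in weights]


def quantile(alphas, probs, q):
    """Smallest grid value whose cumulative probability reaches q."""
    cum = 0.0
    for a, p in zip(alphas, probs):
        cum += p
        if cum >= q:
            return a
    return alphas[-1]


# Pfizer-BioNTech case counts as used in Appendix A (assumed inputs here).
alphas, probs = posterior_grid(n_v=8, n_c=162)
lower = quantile(alphas, probs, 0.025)
median = quantile(alphas, probs, 0.5)
upper = quantile(alphas, probs, 0.975)
print(f"median={median:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

A finer grid than the one in the R appendix is used here only to reduce discretization error in the quantiles; the posterior itself is the same.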

The vaccine developers have released only limited data so far, through press releases. Pfizer and Moderna have released the number of COVID-19 cases in treatment and control groups [5, 4]. AstraZeneca states that there were 131 COVID-19 cases observed so far, while the efficacy was 90% in the half+full dose group, 62% in the 2x full dose group, and 70% when combining both groups [1]. This is just enough data to infer the number of COVID-19 infections in the vaccine and control groups. Table 1 shows the summary data for the different trials, including the average implied effectiveness ᾱ = 1 − n_v/n_c.

Table 1

| Trial | n_v | n_c | ᾱ |
|---|---|---|---|
| AZ-Oxford (half+full dose) | 3 | 30 | 90% |
| AZ-Oxford (2x full dose) | 27 | 71 | 62% |
| AZ-Oxford (combined) | 30 | 101 | 70% |
| Pfizer-BioNTech | 8 | 162 | 95% |
| Moderna-NIH | 5 | 90 | 94% |

Note. Number of COVID patients in vaccine group (n_v) and control group (n_c) with implied average vaccine effectiveness (ᾱ = 1 − n_v/n_c), based on manufacturers' press releases of trial data [1, 5, 4].
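The paper does not spell out how the AstraZeneca case counts were inferred from the press release. One plausible route, sketched below in Python under the stated assumption of equal-sized vaccine and control groups, is to solve ᾱ = 1 − n_v/n_c together with n_v + n_c = 131 for the combined data and then split the totals over the two dosing arms. The function name `infer_counts` and the rounding order are assumptions of this sketch; because the reported efficacies are rounded, other integer splits may also be consistent with them.

```python
def infer_counts(total_cases, eff_combined, eff_a, eff_b):
    """Infer integer case counts from a case total and three reported efficacies.

    Assumes efficacy = 1 - n_v / n_c in each arm and in the combined data.
    """
    # Combined data: n_v + n_c = total and n_v = (1 - eff) * n_c,
    # hence n_c = total / (2 - eff).
    n_c = round(total_cases / (2 - eff_combined))
    n_v = total_cases - n_c
    # Split over arms a and b by solving the linear system
    #   n_c_a + n_c_b = n_c
    #   (1 - eff_a) * n_c_a + (1 - eff_b) * n_c_b = n_v
    n_c_b = round((n_v - (1 - eff_a) * n_c) / ((1 - eff_b) - (1 - eff_a)))
    n_c_a = n_c - n_c_b
    n_v_a = round((1 - eff_a) * n_c_a)
    n_v_b = n_v - n_v_a
    return {"a": (n_v_a, n_c_a), "b": (n_v_b, n_c_b), "combined": (n_v, n_c)}


# Arm a: half+full dose (90%); arm b: 2x full dose (62%); combined: 70%.
counts = infer_counts(total_cases=131, eff_combined=0.70, eff_a=0.90, eff_b=0.62)
print(counts)
```

With these inputs the sketch yields the same (n_v, n_c) pairs that the R code in Appendix A uses for the three AstraZeneca rows.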


The posterior distribution of α resulting from Equation (1) can be inferred for each trial using the data in Table 1. Figure 1 shows the resulting posterior probability density functions. Table 2 summarizes these distributions in terms of the posterior median as well as the 95% posterior probability intervals of vaccine effectiveness α.

First of all, note that the posterior median effectiveness (a central estimate) in Table 2 is somewhat lower than the average implied effectiveness in Table 1. The reason for this can be most easily understood in the extreme case that there are no observed patients in the vaccine group, n_v = 0. In such a case the implied effectiveness ᾱ would be 100%. Of course, there is still a chance that the vaccine does not work perfectly, so there will be probability mass for α < 100%, and hence the posterior median is below 100%. A similar asymmetry is apparent in the posterior distributions in Figure 1, leading to lower posterior medians for α than the reported average vaccine effectiveness.
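The extreme case can be checked numerically. The Python sketch below is an illustration only: the trial with n_v = 0 and n_c = 30 is hypothetical, and the function name `posterior_median` and the grid size are assumptions. It compares the implied effectiveness with the posterior median obtained from the posterior proportional to (1 − α)^n_v / (2 − α)^(n_v + n_c), and repeats the comparison for the AstraZeneca half+full dose counts used in Appendix A.

```python
def posterior_median(n_v, n_c, steps=10000):
    """Posterior median of alpha on an even grid over [0, 1]."""
    alphas = [i / steps for i in range(steps + 1)]
    # Unnormalized posterior; note 0.0 ** 0 evaluates to 1.0 in Python,
    # so the n_v = 0 case is handled at alpha = 1.
    weights = [(1 - a) ** n_v / (2 - a) ** (n_v + n_c) for a in alphas]
    half = 0.5 * sum(weights)
    cum = 0.0
    for a, w in zip(alphas, weights):
        cum += w
        if cum >= half:
            return a
    return alphas[-1]


# Hypothetical trial with no cases in the vaccine group.
implied_zero = 1 - 0 / 30
median_zero = posterior_median(n_v=0, n_c=30)

# AstraZeneca half+full dose counts as used in Appendix A.
implied_az = 1 - 3 / 30
median_az = posterior_median(n_v=3, n_c=30)

print(f"n_v=0: implied={implied_zero:.3f}, posterior median={median_zero:.3f}")
print(f"n_v=3: implied={implied_az:.3f}, posterior median={median_az:.3f}")
```

In both cases the posterior median falls below the implied effectiveness, which is the asymmetry described above.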

Furthermore, the AstraZeneca trials exhibit wide error margins for the vaccine effectiveness. Most notably, the margins are sufficiently wide that it appears conceivable that the effectiveness of the half+full dose regime actually does not differ from that of the 2x full dose regime. In other words, the observed differences in vaccine effectiveness could be due just to random noise. If the effectiveness of the different dosing regimes is in fact similar, the data from these two arms of the trial could be combined, which would yield an effectiveness between 54% and 79% for the AstraZeneca vaccine (95% posterior interval).

The Pfizer and Moderna data exhibit considerably smaller error margins. The current data imply a vaccine effectiveness between 89% and 97% for the Pfizer vaccine and between 86% and 97% for the Moderna vaccine. These 95% intervals lie entirely above the interval for the combined AstraZeneca data, though it could be that the half+full dose regime of that vaccine is as effective as the Pfizer and Moderna vaccines. Clearly, more data on the AstraZeneca vaccine in the different dosage regimes would be needed to assess this.

This study uses a simple Bayesian analysis on publicly available trial data. It comes with several limitations. Most notably, it does not take into account that vaccines and trials can differ in many ways, such as:

• Trial participant demographics (country, age, medical history, etc.)

• COVID-19 testing procedure (for instance, in the AZ-Oxford trials participants are checked proactively for asymptomatic infections, while the Pfizer and Moderna trials rely on self-reporting with follow-up tests [2])

• The storage and distribution of the vaccines (for instance, the Pfizer vaccine needs to be stored at −70 °C, while the AstraZeneca one comes with relatively mild storage conditions, facilitating distribution, especially for less economically developed areas)

The analysis could also be further extended. For instance, hierarchical priors could be used across different trials in order to get a sense of the distribution of vaccine effectiveness. Also, a more formal Bayesian analysis could be performed to assess the posterior probability of the difference in effectiveness of the two dosage regimes of the AstraZeneca vaccine. Such an analysis could use an expert assessment of the prior probability that a regime starting with a half dose would be more effective. Finally, inference can be improved as more data become available on these and other vaccines. Appendix A contains the R code used for the present analysis, which future researchers can use for these extended analyses.
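As a rough starting point for the dosage-regime comparison mentioned above, the Python sketch below estimates the posterior probability that one regime is more effective than the other. It is not part of the paper's analysis and rests on simplifying assumptions that should be stated loudly: the same flat priors as in the main analysis (not the expert prior suggested above), independence between the two trial arms, and a grid approximation. The function names `posterior_grid` and `prob_greater` are introduced here for illustration; the case counts are those used in Appendix A.

```python
def posterior_grid(n_v, n_c, steps=1000):
    """Normalized grid posterior of alpha, proportional to
    (1 - alpha)^n_v / (2 - alpha)^(n_v + n_c)."""
    alphas = [i / steps for i in range(steps + 1)]
    weights = [(1 - a) ** n_v / (2 - a) ** (n_v + n_c) for a in alphas]
    total = sum(weights)
    return [w / total for w in weights]


def prob_greater(probs_a, probs_b):
    """P(alpha_a > alpha_b) for independent posteriors on the same grid.

    Ties on a grid point are split evenly between the two outcomes.
    """
    result = 0.0
    cum_b = 0.0  # probability that alpha_b lies below the current grid point
    for p_a, p_b in zip(probs_a, probs_b):
        result += p_a * (cum_b + 0.5 * p_b)
        cum_b += p_b
    return result


# AstraZeneca arms with case counts as used in Appendix A.
half_full = posterior_grid(n_v=3, n_c=30)
full_full = posterior_grid(n_v=27, n_c=71)
p_half_better = prob_greater(half_full, full_full)
print(f"P(half+full more effective than 2x full) = {p_half_better:.3f}")
```

A prior that encodes skepticism about the half-dose regime would pull this probability down, which is why the choice of prior matters for the formal comparison.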

A Bayesian analysis of the publicly available trial data casts some doubt on whether the half+full dose regime of the AstraZeneca COVID-19 vaccine is truly (much) more effective than the 2x full dose regime. The data for the combined trials suggest that this vaccine might be less effective than the Pfizer and Moderna vaccines, which likely have an effectiveness of 89-97% and 86-97%, respectively.

[1] AZD1222 vaccine met primary efficacy endpoint in preventing COVID-19

[2] Another covid-19 vaccine joins the party

[3] Bayesian data analysis

[4] Promising interim results from clinical trial of NIH-Moderna COVID-19 vaccine

[5] Pfizer and BioNTech conclude phase 3 study of COVID-19 vaccine candidate, meeting all primary efficacy endpoints

A Appendix: R code

The R code below generates the tables and figure presented in this paper.

    library(tidyverse)

    # COVID-19 case counts per trial: nv = vaccine group, nc = control group
    dfData <- tribble(
      ~trial, ~nv, ~nc,
      "AZ-Oxford (half+full dose)", 3, 30,
      "AZ-Oxford (2x full dose)", 27, 71,
      "AZ-Oxford (combined)", 30, 101,
      "Pfizer-BioNTech", 8, 162,
      "Moderna-NIH", 5, 90
    ) %>%
      mutate(
        alphaM = 1 - nv / nc,  # implied average effectiveness (Table 1)
        trial = factor(trial, levels = trial))
    print(dfData)

    # Posterior of alpha from Equation (1), normalized over the grid
    post <- function(alpha, nv, nc) {
      p <- (1 - alpha)^nv / (2 - alpha)^(nv + nc)
      p / sum(p)
    }

    # Evaluate the posterior and its cumulative sum on a grid, per trial
    dfOut <- expand_grid(dfData, alpha = seq(0, 1, 0.01)) %>%
      group_by(trial) %>%
      mutate(p = post(alpha, nv, nc), cump = cumsum(p)) %>%
      ungroup()

    # Figure 1: posterior distributions per trial
    ggplot(dfOut, aes(y = p, x = alpha)) +
      geom_line() +
      facet_wrap(~trial, dir = "v", nrow = 2) +
      scale_x_continuous(labels = scales::percent_format(accuracy = 1),
                         limits = c(0.4, 1)) +
      ylab("Posterior probability density") +
      xlab(expression(paste("Vaccine effectiveness ", alpha))) +
      theme(panel.spacing = unit(1, "lines"))

    # Table 2: posterior median and 95% interval (in percent)
    dfSum <- dfOut %>%
      group_by(trial) %>%
      summarize(
        median = alpha[which.min(abs(cump - 0.5))] * 100,
        p.025 = alpha[which.min(abs(cump - 0.025))] * 100,
        p.975 = alpha[which.min(abs(cump - 0.975))] * 100)
    print(dfSum)