key: cord-0274684-jrsu7kwd authors: Hu, Ye-fan; Hu, Jing-chu; Gong, Hua-rui; Danchin, Antoine; Sun, Ren; Chu, Hin; Hung, Ivan Fan-Ngai; Yuen, Kwok Yung; To, Kelvin Kai-Wang; Zhang, Bao-zhong; Yau, Thomas; Huang, Jian-Dong title: Computation of Antigenicity Predicts SARS-CoV-2 Vaccine Breakthrough Variants date: 2022-01-20 journal: bioRxiv DOI: 10.1101/2022.01.19.477009 sha: 8f09005c08abfd8e134d55150dfe8a5ebdefca51 doc_id: 274684 cord_uid: jrsu7kwd It has been reported that multiple SARS-CoV-2 variants of concerns (VOCs) including B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), and B.1.617.2 (Delta) can reduce neutralisation by antibodies, resulting in vaccine breakthrough infections. Virus-antiserum neutralisation assays are typically performed to monitor potential vaccine breakthrough strains. However, such experimental-based methods are slow and cannot instantly validate whether newly emerging variants can break through current vaccines or therapeutic antibodies. To address this, we sought to establish a computational model to predict the antigenicity of SARS-CoV-2 variants by sequence alone and in real time. In this study, we firstly identified the relationship between the antigenic difference transformed from the amino acid sequence and the antigenic distance from the neutralisation titres. Based on this correlation, we obtained a computational model for the receptor binding domain (RBD) of the spike protein to predict the fold decrease in virus-antiserum neutralisation titres with high accuracy (~0.79). Our predicted results were comparable with experimental neutralisation titres of variants, including B.1.1.7 (Alpha), B.1.351 (Beta), B.1.617.2 (Delta), B.1.429 (Epsilon), P.1 (Gamma), B.1.526 (Iota), B.1.617.1 (Kappa), and C.37 (Lambda), as well as SARS-CoV. Here, we firstly predicted the fold of decrease of B.1.1.529 (Omicron) as 17.4-fold less susceptible to neutralisation. We visualised all 1521 SARS-CoV-2 lineages to indicate variants including B.1.621 (Mu), B.1.630, B.1.633, B.1.649, and C.1.2, which can induce vaccine breakthrough infections in addition to reported VOCs B.1.351 (Beta), P.1 (Gamma), B.1.617.2 (Delta), and B.1.1.529 (Omicron). Our study offers a quick approach to predict the antigenicity of SARS-CoV-2 variants as soon as they emerge. Furthermore, this approach can facilitate future vaccine updates to cover all major variants. An online version can be accessed at http://jdlab.online. To predict the antigenicity of SARS-CoV-2 variants, we firstly integrated the reported 1 0 1 conformational or linear epitopes (Fig. S1 &Table S1) on the SARS-CoV-2 Spike 1 0 2 protein (Fig. 1a) with the reported experimental virus-antiserum neutralisation titres 1 0 3 against SARS-CoV-2 variants including B.1.1.7 1-5 , B.1.351 2,3,6,7 , and P.1 1,2,8 ( Table 1 0 4 S2a). Considering the distinct assays used in the different studies, we standardised the 1 0 5 neutralisation titres of each variant to the titre of the ancestral strain of SARS-CoV-2 1 0 6 (lineage A) using the same assay in each study on a log 2 scale, and thus we got 1 0 7 observed antigenic distance (H ab ) from neutralisation titres (Fig. 1b) . For the antigenic 1 0 8 difference (D ab ), we used Poisson distance to represent the difference between two 1 0 9 amino acid sequences (Fig. 1b) . By comparing the observed antigenic distance with 1 1 0 the antigenic difference, we found a relationship between observed antigenic distance 1 1 1 and the antigenic difference: H ab =T max ·D ab /(D 50 +D ab ), where T max is the maximal fold 1 1 2 of decrease and D 50 is the antigenic difference which may lead to neutralisation 1 1 3 decrease at the 50% level of the maximal decrease (the fold change between SARS-1 1 4 CoV-2 and SARS-CoV). This relationship described that the decrease of 1 1 5 neutralisation titre increases with the accumulation of amino acid changes, and then 1 1 6 reaches at the maximal decrease (Figs. 1c-d) . Based on this correlation, we obtained a 1 1 7 computational model using the receptor binding domain (RBD) of the spike protein to 1 1 8 predict the fold decrease in virus-antiserum neutralisation titres with higher accuracy 1 1 9 (~0.79, the calculation of accuracy in Methods) compared with other fragments of 1 2 0 spike (entire spike, N terminal domain plus RBD, or S1, Fig. 1d ). With repeated 5-1 2 1 fold or 10-fold cross validation (Fig. 1d) , we found that prediction using RBD is 1 2 2 relatively robust in terms of root-mean-square error (RMSE), mean absolute error 1 2 3 (MAE), coefficient of determination (R 2 ) and accuracy. 1 2 4 To further validate our model, we predicted the fold decreases in neutralisation titres 1 2 5 (comparing to the ancestral of SARS-CoV-2) of multiple variants including B. reported that VOCs can elicit vaccine breakthrough infections, which correlated with 1 3 0 fold decreases in the neutralisation titres from experimental assays was disclosed 1 3 1 (Table S2) . Our predicted results were highly consistent with the neutralisation assay 1 3 2 results (Fig. 1e) . We also predicted the fold of decrease in neutralisation titre of the 1 3 3 most recent VOC, B. interval: 13.7, 22.2) decrease in neutralisation titre (shown as a blue point in Fig. 1c) . , 1 3 6 The predicted result is consistent with reported decrease folds ranging from 10 to 40 1 3 7 17,18 . This result alarmed the risk of vaccine breakthrough or re-infection of B.1.1.529 1 3 8 (Omicron) due to the dramatic decrease in neutralization. 1 3 9 The prediction of potential vaccine breakthrough strains 1 4 0 To predict the next potential SARS-CoV-2 vaccine breakthrough variants, we 1 4 1 visualised the antigenicity of all available SARS-CoV-2 variants as an indicator of 1 4 2 their vaccine breakthrough potential. We firstly selected all 1521 lineage variants 1 4 3 using PANGO 30 updated on December 6, 2021 (Table S3 ) to predict their 1 4 4 antigenicity. Then we calculated the pairwise distances of different variants. For 1 4 5 visualising these results, we captured two principal components from the high-1 4 6 dimensional data of antigenic distance 25 . We used all spike amino acid sequences to 1 4 7 plot the 'genetic map' of SARS-CoV-2 to represent the genetic difference among 1 4 8 different variants (Fig. 2a-b) . We then plotted the 'antigenic map' using the predicted 1 4 9 antigenic distances ( Fig. 2c -d, online versions available at http://jdlab.online). 1 5 0 Based on the relationship between neutralisation titre fold change and protective 1 5 1 efficacy 31 , it was convenient to set up some 'cut-offs' in the current vaccine coverage. 1 5 2 We included phase 3 and real-world results of vaccine efficacy or effectiveness, as 1 5 3 well as neutralisation titre data from phase 1 and 2 studies (Table S4-5) . Thus, we got 1 5 4 the relationship between neutralisation titre and protective efficacy against a 1 5 5 symptomatic COVID-19 (Fig. 2e) . A 3.93-fold decrease in neutralisation titres 1 5 6 induced by VOCs that can dampened the efficacy of some vaccines to lower than 50%. 1 5 7 In this way, one cut-off of 1.98 arbitrary units (A.U.) represented a 3.93-fold decrease 1 5 8 in the neutralisation titre (shown as a pink circle in Fig. 2c -d). All variants outside this 1 5 9 cut-off have the potential to be vaccine breakthrough variants. By comparing the 1 6 0 "genetic map" and antigenic map, we can set up the border of antigenic map. 1 6 1 Although there are >200 mutations in the SARS-CoV and WIV1-CoV spike (Fig 2a) , 1 6 2 the antigenic distance is around 4.9 A.U. which mean ~ 30-fold decrease in the 1 6 3 neutralisation titre (shown as a dark red circle in Fig. 2c -d). To reveal the distribution of variant, we plotted the density of variants on the 'genetic 1 6 5 map' and antigenic map due to overlapping dots. In the genetic map, hotspots are 1 6 6 located at lineage A (>10%) and B.1 (>40%) mainly, as well as AY.* and P.1 ( Fig. 1 6 7 2b). While in the antigenic map, hotspots are placed at lineage A (>40%) mainly, 1 6 8 together with AY.* (Fig. 2d) . Although most variants were shown to be close to the 1 6 9 ancestral strain (Figs. 2b&d), multiple variants were found to decrease neutralisation 1 7 0 titres significantly (Fig. 2c) . In addition to reported VOCs including B. variants with more than 3.93-fold decrease (Fig. 2c) . Besides the pandemic of 1 7 7 B.1.617.2 (Delta) 9 and the outbreak of B.1.1.529 (Omicron), multiple variants should 1 7 8 be investigated immediately as they have the potential to become tomorrow's VOCs. 1 7 9 Discussion 1 8 0 Predicting neutralisation responses against all SARS-CoV-2 variants based on 1 8 1 sequences alone is vital for selecting the next vaccine seeds for the development of 1 8 2 effective COVID-19 vaccines. We established a computational approach to predict 1 8 3 neutralisation titres and validated these predictions using experimental data. Our 1 8 4 computational approach could potentially provide the first hints of whether a newly 1 8 5 identified variant can break through vaccines just by its sequence information, which 1 8 6 would greatly shorten the time for the crucial early warning of emerging vaccine 1 8 7 breakthrough strains. 1 8 8 In the prediction of the antigenicity of SARS-CoV-2 variants, we proposed that the 1 8 9 limit of neutralisation titre decrease is set by SARS-CoV (Fig. 1) . In recent studies, 1 9 0 SARS-CoV is ~ 36-fold less susceptible to neutralisation comparing to the ancestral 1 9 1 strain of SARS-CoV-2. Based on this result, a non-linear curve was established to 1 9 2 describe the relationship between the observed antigenic distance and the antigenic 1 9 3 difference. We further performed calculation using different fragments of the Spike 1 9 4 protein (Fig. 1d) . Among the Spike protein and the RBD, NTD-RBD, and S1 1 9 5 fragments, we found the prediction using amino acid sequences of RBD was able to 1 9 6 estimate the neutralisation titre more accurately than the others (Figs. 1d) . Thus, we 1 9 7 used the RBD-based computations to determine the neutralisation titres. 1 9 8 A major concern of our computation of the neutralisation titre is that the data is based 1 9 9 on diverse neutralisation assays of serum samples from both patients and vaccinees 2 0 0 against both live virus and pseudovirus (Table S2) . Although the results were 2 0 1 consistent qualitatively, the variation of fold change is too large to be ignored ( Fig. 2 0 2 1e). Considering the variation in the real world, we set up values 2-fold or less than 2 0 3 the experimental values as the criteria based on previous studies 28 . It is better to 2 0 4 establish a convenient and standardised neutralisation pipeline in the future, like the 2 0 5 haemagglutination inhibition (HI) assay for influenza virus. Such a pipeline can allow 2 0 6 the precise estimation of neutralisation titres. Together with estimating the association 2 0 7 of neutralisation with protection, it will help to develop next generation vaccines. It is crucial to update vaccines to cover all vaccine breakthrough strains that have 2 0 9 significant amino acid and glycosylation changes to prevent further infectious 2 1 0 outbreaks. However, not all predicted SARS-CoV-2 vaccine breakthrough variants 2 1 1 will have the chance to cause an outbreak due to their changed viral fitness 32 or by 2 1 2 pure luck. Based on previous studies of influenza viruses, it is possible for variants to 2 1 3 have alterations that change the antigenicity, but fail to cause outbreaks in the wider 2 1 4 population 33 . Considering immune escape elicited by variants, updating current 2 1 5 vaccine seeds with new variants should extend the vaccine coverage. As SARS-CoV-2 1 6 2 showed different variant directions in the antigenic map (Fig. 2) , the use of multiple 2 1 7 virus seeds based on the different directions might be appropriate to cover all major 2 1 8 variants in the long term. Our method could help in the selection of SARS-CoV-2 2 1 9 variants for updating vaccines. We collected 149 confirmed conformational epitopes with protein structures released 2 2 3 in the Protein Data Bank (PDB) (https://www.rcsb.org/) or annotated epitope 2 2 4 footprints and 76 linear epitopes published in the literature (Table S1) . We plotted the 2 2 5 footprint of all Spike protein epitopes from the aforementioned 225 epitopes using R-2 2 6 3.6.6. 2 2 7 Antigenic distances from neutralisation data 2 2 8 We calculated antigenic distances from the neutralisation data based on previous 2 2 9 publications 26 . For virus variant a, reference virus b, and antiserum β (referencing 2 3 0 virus b), we defined the antigenic distance of variant a to reference virus b in terms of 2 3 1 the standardised log titre as H ab =log 2 T aβ -log 2 T bβ , where T bβ is the titre of antiserum β 2 3 2 against virus b, and T aβ is the titre of antiserum β against virus a 26 . Merged data with 2 3 3 reference virus lineage A (the ancestral strain of SARS-CoV-2) were collected from 2 3 4 several publications (Table S2) . Genetic and antigenic difference calculation 2 3 6 We selected 1521 SARS-CoV-2 lineages using PANGO (v.3.1.15) updated on 2 3 7 December 6, 2021 (https://cov-lineages.org/). Spike protein amino acid sequences of 2 3 8 these lineages were obtained from GISAID, using the earliest collected for each 2 3 9 lineage (Table S3 ). All sequences with neutralisation titres were also included ( we used an information theory-based approach p-all-epitope 27,28 to measure the 2 4 4 pairwise distances among amino acid sequences of the antigenic footprint ('antigenic 2 4 5 positions'). The distance is based on the number of different amino acids n d between 2 4 6 two n-mer viral sequences of variants a and b. Under the assumption that the number 2 4 7 of amino acid substitutions per site follows a Poisson distribution, we can then 2 4 8 calculate the distance between a and b as D ab =-ln(1-n d /n). Modelling and performance measurement 2 5 0 A model considering the maximal neutralisation tire decrease was applied to examine 2 5 1 the antigenic distance from the neutralisation data H ab and our computed results D ab as 2 5 2 H ab =T max ·D ab /(D 50 +D ab ), where T max is the maximal decrease and D 50 is the antigenic 2 5 3 difference which may lead to neutralisation decrease at the 50% level of the maximal 2 5 4 decrease. The predicted neutralisation titre is then given as 2 5 5 P ab ≈ Ĥ ab =T max ·D ab /(D 50 +D ab ). Root-mean-square error (RMSE), mean absolute error 2 5 6 (MAE), and coefficient of determination (R-squared R 2 ) were used to measure the 2 5 7 performance of the linear correlation. Reproducibility was determined by pairwise sequences and neutralisation titres. Neutralisation titre data were converted into variables by calculating the relative 2 6 0 difference in the neutralisation titres between reference virus and variant against the 2 6 1 antiserum. Accuracy was the percentage of correctly predicted neutralisation titres 2 6 2 using amino acid sequences. Based on previous studies 28 , computational values 2-2 6 3 fold or less than the experimental values were considered to be similar (correct) and 2 6 4 those more than 2-fold lower were considered dissimilar (error). Here, 10-time 2 6 5 repeated 5-fold and 10-fold cross validation were applied in terms of root-mean-2 6 6 square error (RMSE), mean absolute error (MAE), coefficient of determination (R-2 6 7 squared R 2 ), and accuracy. Gamma B.1.1.529: Omicron B.1.1.7: Alpha B.1.351: Beta B.1.617.2/AY.*: Delta P.1/P.1.*: Gamma B.1.1.529: Omicron B.1.1.7: Alpha B.1.351: Beta B.1.617.2/AY.*: Delta P.1/P.1.*: Gamma B epitopes are coloured in slate and linear epitopes in light blue. Some antigenic 3 9 0 positions in both conformational epitopes and linear epitopes are coloured in blue. All 3 9 1 glycosylation sites are in teal. (b) A flowchart of the process to establish the sequence-3 9 2 based computational model of SARS-CoV-2 antigenicity. The antigenic distance of 3 9 3 variant a to reference virus b from neutralisation titre was defined as H ab =log 2 T aβ -3 9 4 log 2 T bβ, where β , T aβ , and T bβ denote antiserum (referencing virus b), the titre of 3 9 5 antiserum β against virus b, and the titre of antiserum β against virus a 26 . The 3 9 6antigenic distance of variant a to reference virus b from amino acid sequences was 3 9 7defined as D ab =-ln(1-n d /n), where n d is the number of amino acid substitutions 3 9 8between variant a and reference virus b, n is the number of antigenic sites. Then, we 3 9 9proposed a relationship between observed antigenic distance and the antigenic Predicted versus observed antigenic distances of variants of concern. Here, The 4 0 8observed antigenic distances as fold decreases in the neutralisation titres of variants of 4 0 9concern versus the original strain on a log 2 scale. Each point shows the mean of 4 1 0 antigenic distances in each assay. Predicted antigenic distances are based on the 4 1 1 prediction in (c). Leave-one-out predicted antigenic distances are predicted based on 4 1 2 the datasets without the variant that we aim to compare. antigenic distance (mean of neutralisation titres in vaccinees divided by corresponding 4 2 7 mean of titres in convalescent patients in log 2) and protection from SARS-CoV-2 4 2 8infection. The reported mean neutralization level from phase 1 or 2 studies (Table S4) 4 2 9 and the protective efficacy or effectiveness from phase 3 trials or real-world studies 4 3 0 (Table S5)