key: cord-0784246-zdfslfmb authors: Majumdar, Parinita; Niyogi, Sougata title: ORF3a mutation associated with higher mortality rate in SARS-CoV-2 infection date: 2020-10-26 journal: Epidemiology and infection DOI: 10.1017/s0950268820002599 sha: b1de80dfa733d9d6c6fdf5d97c1f3022eb67085f doc_id: 784246 cord_uid: zdfslfmb [Figure: see text] The betacoronaviruses of Coronaviridae family are potent human pathogens of zoonotic origin [1, 2] . Till now, human coronaviruses (CoVs) have been associated with three major outbreaks of acute respiratory disorders namely severe acute respiratory syndrome (SARS) in 2002, middle-east respiratory syndrome (MERS) in 2012 and COVID-19 in 2019 (Coronavirus disease 2019) with an overall mortality rate of 9.6%, 40% and 6.9%, respectively [1, [3] [4] [5] , https://www.worldometers.info/coronavirus/]. COVID-19 originated at Wuhan city of Central China in mid-November 2019, gradually spread to more than 200 countries with a death toll of 2,44,664 as on 2 May 2020 [5, https://www.worldometers.info/coronavirus/]. The complete genome sequence of the causative organism was first reported from China and found to be severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) which bears 79% and 50% genetic similarities with SARS-CoV and MERS-CoV [2, 5] . The unusually larger, ∼30 kb genome of SARS-CoV-2 has similar genomic organisation with SARS-CoV and encodes 16 non-structural proteins (NSP1−NSP16), 4 structural proteins including Spike (S), Membrane (M), Envelope (E) and Nucleocapsid (N) proteins and 6 accessory proteins (ORF3a, ORF6, ORF7a, ORF7b, ORF8 and ORF10) [2, [4] [5] [6] . The non-structural proteins of CoVs are indispensable for viral replication and transcription whereas the structural proteins constitute the mature virion [2, 5] . The role of accessory proteins has just begun to be evident in viral pathogenesis [7] [8] [9] . Among the accessory proteins, ORF3a is the largest one containing 274 amino acids in SARS-CoV [9] . Cell surface localisation of ORF3a in SARS-CoV potentiates viral entry within the host and has immunogenic properties [3, 9] . Moreover, it is involved in pro-inflammatory cytokine and chemokine production by activating various signal transduction pathways including C-Jun N-terminal kinase (JNK) and nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) [10] . An uncontrolled release of pro-inflammatory cytokines and chemokines results in an exacerbated host immune response in SARS-CoV-2 infected patients leading to cytokine storm [11] . ORF3a is also implicated in ion channel formation and modulates release of virus from the host cell [9] . Soon after the isolation of Wuhan strain (EPI_ISL_402123), an increasing number of SARS-CoV-2 variants with genetic diversity has started to emerge across the globe [4] [5] [6] 12] . The large genome size of SARS-CoV-2 and low fidelity RNA-dependent RNA polymerase (RdRp) are the contributing factors for considerable higher rate of adaptive mutations leading to the evolution and coexistence of pathogenic viral strains worldwide [4, 6, 13] . Several independent recurrent mutations in NSP6, NSP7, NSP12, NSP13 encoded by ORF1ab and spike protein resulting in non-synonymous amino acid mutations are identified as mutational hotspot which are closely associated with inter-species transmission and virulence [4, 6, [14] [15] . Thus it is intriguing to understand the functional implications of different mutations in the diverse infection and mortality rate of SARS-CoV-2. All data were collected from coronavirus worldometer section (https://www.worldometers.info/coronavirus/) as on 2 May 2020. Initially, 23 countries were selected with a minimum positive 20,000 case reports. Percent infection was defined as (total cases/total tests) × 100 and % death was calculated as (total deaths/total cases) × 100 (online Supplementary table 1). Fifteen countries with either lower (group I) or higher (group II) values (both % infection and % death) were shortlisted for further analysis ( Fig. 1a and b) . Wuhan strain from China (EPI_ISL_402123) was included as a reference for most of the analysis. Amino acid changes in SARS-CoV-2 strains for each country were extracted from NextStrain open source project (https://nextstrain. org/ncov) [12] . Mutation frequency was calculated by dividing the number of mutations for a particular protein with total number of mutations corresponding to all the proteins for a given country, multiplied by 100. The mutation frequency was represented by stacked bar chart which was created using Microsoft Excel. Amino acid sequences of ORF3a protein (NP_828852.2 and YP_009724391.1 for SARS-CoV and SARS-CoV-2, respectively) in FASTA format were downloaded from NCBI protein database and pair-wise sequence alignment was performed in EMBOSS Needle web server with default parameters [16] . Amino acid sequences of concerned proteins were given to IEDB-AR (Immune Epitope Database Analysis Resource) for B cell epitope prediction using default parameters and 0.55 threshold value [17] . Biological significance of any mutation was determined using PROVEAN web tool (v1.1.3). FASTA sequence of a wild-type query protein and mutant variants were uploaded. Functional relevance was predicted with default settings [18] . Amino acid sequences of ORF3a protein (NP_828852.2 and YP_009724391.1 for SARS-CoV and SARS-CoV-2, respectively) in FASTA format were further scanned in ExPASy-prosite Parinita Majumdar and Sougata Niyogi (prosite.expasy.org) portal for motif analysis. Tertiary structures for ORF3a protein were obtained from Phyre2 web portal upon amino acid sequence submission (FASTA). Human interacting partners of SARS-CoV ORF3a protein were charted using STRING Viruses database (10.5) . Not more than 20 1 st shell and <10 2 nd shell interactions were taken with a threshold confidence score of 0.7 [19] . Twelve different human interacting partners of ORF3a protein were taken for functional annotation in DAVID (Database for Annotation, Visualization and Integrated Discovery, v6.8) [20] . After gene-enrichment analysis, functional annotation clustering and KEGG (Kyoto encyclopedia for gene and genomes) pathway mapping, a cluster with highest enrichment score (4.11) was obtained. Initially 23 countries with more than 20,000 COVID-19 positive case reports were selected for analysis. These countries were ranked based on percent infection (list1) and percent death (list 2) (Fig. 1a) . A and B represent bottom ten countries from list 1 and 2, respectively. While C and D correspond to top ten countries from those two lists. Now these countries were categorised into two groups; low infection as well as mortality rate (group I) and high infection and mortality rate (group II) (Fig. 1a) . Based on these parameters, 15 Fig. 1c and d) . Mutations with considerably higher frequency was observed in ORF1a and ORF1b which comprise two-third of the viral genome and was prevalent in all group I and group II countries (Fig. 1d) . Similarly, mutation in S protein was common in all the countries tested except Portugal. Some of the group II countries including UK, Brazil, Belgium and Netherlands exhibited mutations in M protein which was completely absent from group I countries (Fig. 1d) . Further, viral strains from all the countries except Germany accumulated mutations in N protein indicating a probable mutational hotspot. Interestingly, ORF3a mutation was present in all group II countries which were not found in group I (except Portugal). Based on the rapid emergence of multiple genetic variants, we speculate a potential involvement of SARS-CoV-2 genetic diversity in varied clinical outcomes of its infection. Thus along with diverse regulating factors such as age, sex, co-morbidity complications like diabetes, cardiovascular diseases and chronic lung diseases, ORF3a mutations are linked with differential infection and mortality rate of SARS-CoV-2 infection (Fig. 1b) . The association between ORF3a mutations and countries with higher infection as well as mortality rate instigated us to elucidate the nature of amino acid mutations in each of the viral isolates. Altogether, 18 different amino acid changes were observed of which 5 mutations (T175I, L94F, K16N, L94I and A72 T) were neutral and remaining were deleterious (Q57H, G251 V, P25L, W149L, R126 T, T176I, T217I, D142N, V90F, Y109C, D155Y, Y156N and K67E) (online Supplementary Table 3 ). Some of these mutations resulted in the loss of predicted motifs and B cell epitope as found in wild-type (WT) ORF3a protein (Fig. 2) . The WT ORF3a protein from SARS-CoV-2 was predicted to have N-myristoylation site, leucine-zipper and phosphorylation sites for protein kinase C as well as casein kinase II, (Fig. 2a) . Putative ORF3a of SARS-CoV-2 is predicted to encode 275 amino acids containing protein of 31 kDa molecular weight [8] . SARS-CoV-2 ORF3a bears 72.4% sequence identity and 85.1% sequence similarity with that of SARS-CoV (Fig. 3a) . Currently, crystal structures of SARS-CoV and SARS-CoV-2 ORF3a are not yet resolved. However, the predicted tertiary structures of ORF3a protein from SARS-CoV exhibited 3 alpha helices which were also observed in SARS-CoV suggesting functional conservation (Supplementary Fig. 1a and b) . ORF3a protein of SARS-CoV has three N terminal transmembrane domains and exhibits both plasma membrane and intracellular localisations [7, 9] . SARS-CoV-2 ORF3a is also predicted to have N terminal transmembrane domains similar to SARS-CoV ORF3a ( Supplementary Fig. 1c and d). In silico motif scan showed the presence of conserved N-myristoylation site and several phosphorylation sites for protein kinase C and casein kinase II in SARS-CoV and SARS-CoV-2 ORF3a (Fig. 3b) . However, a leucine zipper was predicted only in SARS-CoV-2 while N-glycosylation site was absent which was predicted in SARS-CoV ORF3a (Fig. 3b) . SARS-CoV ORF3a protein has been shown to have cysteine motif CWLCWKC (127−133 amino acid), di-acidic motif ExD (171−173 amino acid) and C terminal YxxΦ domain (160−163 amino acid), which are implicated in mediating protein−protein interactions with other SARS-CoV proteins, export from endoplasmic reticulum and host immune evasion [9] . Sequence alignment of SARS-CoV and SARS-CoV-2 ORF3a proteins revealed C127L mutation in cysteine motif of SARS-CoV-2 (Fig. 3a) and E171S mutation in di-acidic motif (Fig. 3a) . However, YxxΦ domain was conserved between the two proteins ( Fig. 3a marked in green) . Since none of the accessible SARS-CoV-2 genomes are streamlined in protein databases like STRING we could not directly get ORF3a (SARS-CoV-2) − human protein interactome. Our data have significantly established the structural resemblance of ORF3a protein between SARS-CoV and SARS-CoV-2 thus conceding further functional prediction. So we deduce the functional pertinence of SARS-CoV-2 putative ORF3a protein from interactome and pathway enrichment analysis of SARS-CoV. First, we sought to perform a viral and host protein−protein network analysis using STRING viruses database and found 12 human (JAK1/ 2/3, STAT3, SOCS3, NOS3, IL-6, MAPK1, CCND1, CAV1, SRC and EGFR) and one viral (S glycoprotein) interacting partners (direct or indirect) for ORF3a protein of SARS-CoV. Only two host proteins (STAT3 and CAV1) were acquired as direct interacting partners (Fig. 4a) . Next, we used this list of interacting protein encoding genes in DAVID for functional annotation and pathway prediction. We got disparate pathways like TLR signalling pathway, NOD like receptor pathway, TNF signalling pathway and T cell receptor pathway to be associated with this ORF3a interacting nexus (online Supplementary Table 4 ). These pathways were mostly connected to immune response and signal transduction suggesting potential involvement of SARS-CoV-2 ORF3a protein in host immune modulation. Surprisingly, insulin resistance was found to be linked with this ORF3a interactome. PI3 K-Akt signalling pathway, chemokine pathway and JAK-STAT pathway were present for almost all the interacting partners (online Supplementary Table 4) . A direct interacting host protein CAV1 was found to be federated with endocytic pathways indicating probable contribution of ORF3a towards viral entry (Fig. 4a , online Supplementary Table 4 ). Other intracellular signalling pathways such as VEGF, MAPK, AMPK pathways were detected to be linked to this interacting network (online Supplementary Table 4 ). A cluster of 8 interacting partners (STAT3, JAK1/2/3, NOS3, SOCS3, IL-6, CCND1) were given highest enrichment score of 4.11. In 'X' axis of the cluster 7 different associated pathways (JAK-STAT signalling pathway, cytokine-mediated signalling pathway, positive regulation of tyrosine phosphorylation of STAT3 protein, IL6 type cytokine signal transduction, Herpes simplex infection, negative regulation of cell proliferation and insulin resistance) are shown (Fig. 4b) . IL-6 was reported to be federated with all the charted pathways whereas JAK3 and CCND1 were found to be linked with only JAK-STAT pathway. While JAK2 and STAT3 were associated with all the noted pathways except insulin resistance and Herpes simplex infection, respectively, only two connections (negative regulation of cell proliferation and insulin resistance) were reported for NOS3 ( Fig. 4b) . Interestingly, all the proteins were somehow linked to JAK-STAT pathway. Since its discovery, seven human CoVs have been delineated and the recent one, SARS-CoV-2 engendered a global medical emergency [1] [2] . Even after contemplating several climatic, anthropometric and clinical factors [4] , we found an unforeseen variation in the mortality rate among different countries. These countries were categorised into two distinct groups depending on their infection and mortality rate ( Fig. 1a-b) . There are persuasive mathematical models suggesting evolutionary and causal relationships between virulence and mortality rate [21] . Virulence evolution is further affected by intra-host mutation and inter-strain competition-mediated selection pressure of an obligate parasite like virus [22] . Human SARS-CoV has a fairly large and flexible genome thus allowing considerable amount of adaptive mutations further leading to the emergence of novel strains with enhanced pathogenicity [23] . In agreement to this we could also detect various mutations (amino acid changes) in 13 different ORFs of SARS-CoV-2 and interestingly ORF3a mutations were found to be markedly associated with group II countries having higher mortality rate (Fig. 1d) . Where the changes in phosphorylation sites in ORF3a of SARS-CoV-2 variants might be responsible for functional involvement in signal transduction pathways, N-myristoylation site remained conserved indicating the intact viral immune-pathogenicity [24] . K67E variant showed gain of casein kinase II phosphorylation site. Computer-aided study revealed N protein of coronavirus as a probable substrate of casein kinase II [25] but no studies are available to suggest any possible aftermath of a casein kinase phosphorylation site gain. Leucine zipper motif loss in murine coronavirus S protein could impart Epidemiology and Infection defective cell to cell fusion [26] but potential implication of such mutation in ORF3a protein of human corona virus remained elusive. The N terminal peptide of SARS-CoV-2 ORF3a protein spanning from 15 to 28 amino acids had been shown to induce strong and protective humoral response in the infected patient [27] . We found a drastic drop in B cell epitope intensity of SARS-CoV-2 variants (P25L) which suggests a probable mechanism of immune evasion and its enhanced virulence. However, the significance of the loss of B cell epitope in G251 V variant needs to be addressed. Through sundry in silico approaches we could show appreciable structural similarities of SARS-CoV and SARS-CoV-2 ORF3a proteins, suggesting their functional accordance ( Fig. 3a-b , online Supplementary Fig. 1a-d) . Although these in silico predictions warrant further in vivo validations. ORF3a protein was extensively implicated in host immune evasion, immunomodulation and virulence of SARS-CoV via NF-κB, JNK and cytokine pathways [7] . In addition to virulence and pathogenecity, ORF3a protein of SARS-CoV-2 is predicted to be involved in ion channel formation and viral release, similar to SARS-CoV ORF3a [8] . Our study did not find any mutations in the functional motifs of SARS-CoV-2 ORF3a involved in such functions (Fig. 3a) . Thus it is unlikely that these mutations would affect viral life cycle. However, the precise role of ORF3a protein in SARS-CoV-2 life cycle needs further investigation. Our functional enrichment data also stipulate the reasonable association of ORF3a protein with host immune response through JAK-STAT, chemokine and cytokine-related pathways ( Fig. 4b and online Supplementary Table 4 ). Together we could sensibly propose the potential role of ORF3a mutations in elevated mortality rate for SARS-CoV-2 infection through host immune evasion and immoderate cytokine storm. Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S0950268820002599 Data availability. The data integral to this finding would be available from the corresponding author upon request. The mutation data on various SARS-CoV-2 proteins are freely accessible from NextStrain open source project (https://nextstrain.org/ncov). Conflicts of interest. The authors declare no conflicts of interest in this study. History and recent advances in coronavirus discovery Coronaviruses: an overview of their replication and pathogenesis Epidemiology and cause of severe acute respiratory syndrome (SARS) in Guangdong, People's Republic of China Clinical characteristics of coronavirus disease 2019 in China Genomic characterization and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant Severe acute respiratory syndrome coronavirus ORF3a protein activates the NLRP3 inflammasome by promoting TRAF3-dependent ubiquitination of ASC SARS-CoV-2 and ORF3a: nonsynonymous mutations, functional domains, and viral pathogenesis Accessory proteins of SARS-CoV and other coronaviruses Augmentation of chemokine production by severe acute respiratorysyndrome coronavirus 3a/X1 and 7a/X4 proteins through NF-κB activation COVID-19: consider cytokine storm syndromes and immunosuppression Nextstrain: real-time tracking of pathogen evolution Epidemiology, genetic recombination, and pathogenesis of coronaviruses Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infection Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein A general method applicable to the search for similarities in the amino acid sequence of two proteins Comparative genomic analysis of rapidly evolving SARS CoV-2 viruses reveal mosaic pattern of phylogeographical distribution PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels Viruses. STRING: A virus-host protein-protein interaction database Extracting biological meaning from large gene lists with DAVID On the evolution of virulence and the relationship between various measures of mortality Mutation and the evolution of virulence The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade Myristoylation: an important protein modification in the immune response Identification of two ATR-dependent phosphorylation sites on coronavirus nucleocapsid protein with nonessential functions in viral replication and infectivity in cultured cells Amino acid substitutions within the leucine zipper domain of the murine coronavirus spike protein cause defects in oligomerization and the ability to induce cell-to-cell fusion Amino terminus of the SARS coronavirus protein 3a elicits strong, potentially protective humoral responses in infected patients