key: cord-0702298-axvmzbak authors: Chakravarty, Suvobrata title: COVID-19: The Effect of Host Genetic Variations on Host–Virus Interactions date: 2020-12-10 journal: J Proteome Res DOI: 10.1021/acs.jproteome.0c00637 sha: 78d97df9b6d8d6b41520a324647becb7afb455a6 doc_id: 702298 cord_uid: axvmzbak Spurred into action by the COVID-19 pandemic, the global scientific community has, in a short period of time, made astonishing progress in understanding and combating COVID-19. Given the known human protein machinery for (a) SARS-CoV-2 entry, (b) the host innate immune response, and (c) virus–host interactions (protein–protein and RNA–protein), the potential effects of human genetic variation in this machinery, which may contribute to clinical differences in SARS-CoV-2 pathogenesis and help determine individual risk for COVID-19 infection, are explored. The Genome Aggregation Database (gnomAD) was used to show that several rare germline exome variants of proteins in these pathways occur in the human population, suggesting that carriers of these rare variants (especially for proteins of innate immunity pathways) are at risk for severe symptoms (like the severe symptoms in patients who are known to be rare variant carriers), whereas carriers of other variants could have a protective advantage against infection. The occurrence of genetic variation is thus expected to motivate the experimental probing of natural variants to understand the mechanistic differences in SARS-CoV-2 pathogenesis from one individual to another. The worldwide spread of the coronavirus disease 2019 (COVID-19), 1 caused by the novel virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), 2 has resulted in 1.33 million deaths globally, having advanced at a staggering rate of ∼22 000 new infections and ∼385 deaths every hour (WHO, November 18 situation report). 
During this health crisis, a knowledge of the risk factors of COVID-19 infection is valuable for determining the most appropriate measures, especially given our limited resources, to mitigate the threat. That is, in addition to efforts toward the discovery of a cure (e.g., vaccines or therapeutics), the identification of individual risk for separating low-risk from high-risk individuals is also important (e.g., in maintaining a minimal workforce to prevent the global economy from plunging into a depression). Determining the relationship between the genetic background of an individual and the susceptibility to and progression of the disease remains an important task in understanding and determining an individual's risk for COVID-19. Understanding the molecular basis of susceptibility is of vital interest and has been the subject of a number of important studies. 3−10 The emerging picture on susceptibility from these studies suggests that multiple genetic factors could contribute to the risk of SARS-CoV-2 infection and the severity of COVID-19 disease. 11, 12 In general, proteins engaged in pathways related to the viral life cycle and host defense are important components of the genetic factors. 11, 12 Polymorphic variants of human proteins engaged in these pathways are thus expected to make important contributions toward COVID-19 susceptibility. 11, 12 Therefore, a discussion is presented here on polymorphic variants (carried by an individual) of pathway-specific proteins that have the potential to link the susceptibility to the disease to the genetic background of an individual. This discussion has only been possible due to the availability of results from a number of molecular studies (Table 1) carried out in the last 8 months at a very rapid pace in several different disciplines, providing critical advances in understanding the molecular mechanisms of COVID-19 pathogenesis. 
Despite the health crisis, the efforts of scientists in unraveling the molecular pathology to combat the crisis through breakthroughs (Table 1) in a short time have been unparalleled. These results make possible an exploration of the possible relationship between the genetic background of an individual and the susceptibility to and the progression of the disease, which would be valuable to better understand COVID-19 pathogenesis. Genome-wide association study (GWAS) in the human population 13−16 has been the method of choice for identifying genetic risk factors associated with infectious diseases (e.g., mapping of HLA peptide binding cleft amino acid variants for disease susceptibility 14 ) . Although powerful, GWAS typically maps genetic loci (i.e., not genes necessarily), and probing the molecular mechanism may not be straightforward for a list of genetic loci. GWAS generally also fails to identify rare alleles. In general, GWAS of infectious disease susceptibility has been used for pathogens that have coexisted with human civilization since ancient times. 17 On an evolutionary time scale, the fixation of alleles providing selective advantage 18, 19 by natural selection has influenced host susceptibility. COVID-19, in comparison, is a very recent threat, and the possibility of rare alleles providing selective advantage demands exploration. Consistent with the well-known fact that most human genes have several natural variants, the genes identified as interacting with SARS-CoV-2 also have several variants. Thus the question arises of whether the genotype of an individual could have a bearing on COVID-19 infection and prognosis. This question stems from previous observations of natural variants found to afford the host protection against specific pathogens. 
For example: (a) HBB missense variant against Plasmodium falciparum, 20 (b) CCR5 deletion variant against HIV-1, 21,22 (c) FUT2 variants against the Norwalk virus, 23 (d) SLC4A1 variants against Plasmodium falciparum, 24 and so on. CCR5Δ32 (i.e., CCR5 coreceptor deletion variant) impairs the T-cell entry of HIV-1, just as FUT2 and SLC4A1 variants impair the host-cell entry by the respective pathogens. In other words, certain natural variants of a host protein have the potential to influence the clinical outcome of a specific host− pathogen interaction if the host protein is essential to the pathogen life cycle. A recent in silico study on ACE2 variants found in Iranian ethnic groups 25 suggests that ACE2 variants could possibly alter the receptor's interaction affinity, implying that subpopulations could have differences in intrinsic susceptibility to COVID-19. 25 Whereas the above example variants are known to provide protection against the pathogens, examples of variants responsible for severe infectious disease symptoms have also been reported 12 (e.g., herpes simplex virus 1 encephalitis, 26 viral respiratory infection 27 and influenza pneumonitis 28, 29 in children, etc.). Many of the variants responsible for severe infectious disease symptoms are rare, 12,26−29 and the catalogues of rare variants, now available from next-generation sequencing studies of large populations (e.g., gnomAD 30 ), therefore demand careful discussion in the context of disease susceptibility. In addition, recent pharmacogenomic studies 31,32 of receptor−drug interactions also emphasized the importance of rare variants, and like receptor−drug interactions, host-protein rare variants demand attention in the context of virus−host-protein−protein interactions as well. 
With several human proteins identified in the reports in Table 1 as being likely to play roles in the SARS-CoV-2 life cycle, the analysis of the variants of these proteins thus holds promise for relating the genetic backgrounds of patients to the clinical outcomes, complementing GWAS. 9 On the basis of the observations of some of the reports in Table 1 , the discussion on variants is presented for host-protein machineries that are engaged in (a) viral entry, (b) innate immunity, and (c) virus− host-protein−protein and RNA−protein interactions to com-

Table 1 (fragment): 47 and analysis of S-protein variants for infectivity and antigenicity 48−51. Human ACE2 function and structure: (a) identification of ACE2 as the cellular receptor for the S protein 36, 52, 53 for membrane fusion; (b) determination of the tissue-level expression of the receptor and fusion-associated human proteins 54−56 and population-level expression of proteins 10,11; (c) structure elucidation of the interactions of the S protein with ACE2 (also chaperone for B 0 AT1) in complex with 57

Just as several of the above examples concern pathogen entry, variants related to SARS-CoV-2 entry are of great interest in understanding differences in pathogenesis. Human angiotensin-converting enzyme 2 (ACE2) is the cellular receptor of the SARS-CoV-2 Spike (S) protein, 36, 52, 53 and the ACE2 protein serves as the critical site for viral attachment. ACE2, a membrane-bound, counter-regulatory carboxypeptidase (responsible for the proteolysis of angiotensin I/II, neurotensin, kinetensin, and apelins), 114, 115 is an essential component of the renin−angiotensin hormone system, playing a critical role in cardiovascular homeostasis. 116 Understandably, the probability of a loss-of-function-intolerant (pLI 117 ) score for ACE2 is 0.99 (Figure 2A ), indicating that ACE2 truncation variants are less likely to be tolerated and that the virus utilizes an essential human protein for entry. 
In contrast, the pLI scores for CCR5 and FUT2 are both 0.0, suggesting higher tolerance for deletion/ truncation, which is consistent with the presence of deletion/ truncation variants of CCR5 and FUT2, even as homozygotes, in the human population being known to provide protective advantages against the respective pathogens. For ACE2, we therefore predominantly focus on its missense variants, in particular, those distinct from the eight variants discussed in the Iranian population study. 25 ACE2 is located on the X chromosome, and thus men are hemizygous for ACE2 variants. The hemizygous nature of ACE2 variants demands attention in the context of SARS-CoV-2 interactions. ACE2 is also included among genes that escape X chromosome inactivation (XCI) 118 (i.e., expressed from both the active and inactive X chromosomes in females), and the contributions of ACE2 variants toward SARS-CoV-2 interactions in females are therefore less likely to depend on XCI. Figure 2A summarizes the gnomAD variants of ACE2 (along with those of other proteins likely engaged in viral entry). The variants of the proteins are discussed in this section. ACE2 Variants. Topologically, a single transmembrane helix of ACE2 separates the extracellular peptidase domain (PD) from the cytoplasmic domain. 57 The PD is generally referred to as the receptor for coronaviruses (CoVs), and the C-terminal domain of S protein, referred to as the receptor-binding domain (RBD), physically interacts with the PD. 57−60 Recent structures of the PD−RBD complex 57−60 highlight the similarities and differences in receptor recognition between SARS-CoV-2 and other CoVs. 58, 59 To appreciate the possible consequences of human PD sequence variants, it is important to look at how changes in mammalian ACE2 sequence influence SARS-CoV-2 RBD recognition. 119, 120 Genetic variants are discussed in the same context as the PD sequence. 119,120 SARS-CoV-2 infects humans, bats, pigs, civets, 52 and golden hamsters 98 but not mice. 
52 This observation was the key reason for creating a transgenic mouse (a cost-effective animal model 95 ) expressing human ACE2 95, 96 to study COVID-19 pathogenesis. A comparison of the mouse PD sequence with those of SARS-CoV-2 hosts ( Figure 1A ) sheds light on how the infection may be mediated by PD−RBD complexation. Although sequences of human, bat, pig, hamster, and civet PDs do differ from one another, they still interact with the SARS-CoV-2 RBD, whereas subtle changes in the mouse PD sequence likely compromise the interaction ( Figure 1A ). On the basis of the structure of the human ACE2 PD−RBD complex, 57−60 the mouse PD has conservative (with respect to the human) substitutions (D30N, Y83F, and K353H) at three hydrogen-bonding interfacial sites ( Figure 1 ), which together are likely sufficient to negatively perturb the interaction. This observation suggests that subtle changes in hydrogen bonding and charged residues at interfacial sites could impact the receptor−S-protein interaction. This is also consistent with adapting mouse ACE2 for the SARS-CoV-2 RBD interaction by a few point mutations. 97 The predicted loss of binding free energy (ΔΔG) for the three mouse PD substitutions is 0.34, 0.73, and 1.04 kcal/mol (using BeAtMuSiC 121, 122 ) . Given the behavior of PD interfacial residues, Genome Aggregation Database (gnomAD, 30 an unbiased large population database) missense variants that map to the PD−RBD interface are of great interest. Six human ACE2 missense variants, including p.Ser19Pro (PolyPhen score, 123 0.767), p.Ile20Val (0.000), p.Thr27Ala (0.000), p.Glu37Lys (0.712), p.Gly326Glu (0.090), and p.Glu329Gly (0.027), map to the interface ( Figure 2B ). The low PolyPhen scores (<0.9) indicate that these variants are unlikely to be deleterious (i.e., unlikely to perturb the essential peptidase function, being distal from the catalytic site). 
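The PolyPhen screen of the interface variants described above can be illustrated with a minimal sketch. The scores are the ones quoted in the text; the dictionary and function names are hypothetical, and the 0.9 cutoff is simply the threshold used in this paragraph rather than part of any PolyPhen interface.

```python
# PolyPhen scores of the six ACE2 interface missense variants quoted in the text.
ACE2_INTERFACE_POLYPHEN = {
    "p.Ser19Pro": 0.767,
    "p.Ile20Val": 0.000,
    "p.Thr27Ala": 0.000,
    "p.Glu37Lys": 0.712,
    "p.Gly326Glu": 0.090,
    "p.Glu329Gly": 0.027,
}

# Cutoff used in the text: scores below 0.9 are read as "unlikely to be deleterious".
DELETERIOUS_CUTOFF = 0.9


def likely_tolerated(scores, cutoff=DELETERIOUS_CUTOFF):
    """Return (variant, score) pairs below the cutoff, highest score first."""
    kept = [(name, score) for name, score in scores.items() if score < cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in likely_tolerated(ACE2_INTERFACE_POLYPHEN):
        print(f"{name}\t{score:.3f}")
```

Under this cutoff all six interface variants are retained, which matches the statement that none of them is predicted to be deleterious.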
Ser19 is engaged in hydrogen bonding with the SARS-CoV-2 RBD backbone atoms, whereas the hydrogen bonding of Glu329 is prominent in the SARS-CoV RBD ( Figure 2B ). In addition, p.Glu37Lys and p.Glu329Gly result in interfacial charge alteration ( Figure 2B ). Ser19, Thr27, and Glu37 also engage in intramolecular hydrogen bonds, and variants at these positions could perturb PD−RBD interactions. The predicted ΔΔG for these variants (0.04, 0.18, 0.92, 0.34, 0.62, and 0.33 kcal/mol) is also within the range of the mouse PD substitutions. Given the known role of hydrogen bonds and charged residues in the mouse PD, these variants could contribute to altering the PD−RBD affinity. Position-specific amino acid enrichment/depletion in the ACE2 sequence observed by Chan et al. 72 in the designed decoy ACE2 receptor by deep mutagenesis 72 is also an important reference for predicting the expected behavior of ACE2 missense variants. In the deep mutagenesis study, the substitutions corresponding to three gnomAD missense variants (p.Ser19Pro, p.Thr27Ala, and p.Thr92Ile) were observed to be highly enriched 72 (i.e., Pro for Ser19, Ile for Thr92), suggesting that these variants are anticipated to enhance PD−RBD affinity. Asn90 and Thr92 constitute the N-glycosylation motif of the ACE2 protein, and substitutions at both positions were highly enriched, suggesting that N-glycans at these positions likely interfere with RBD binding, 72 whereas the proline substitution at Ser19 (the first turn of the interfacial helix) likely entropically stabilizes the helix. 72 Substitutions associated with a number of missense variants showing medium-range depletion (p.Glu35Lys, p.Glu37Lys, p.Asn51Ser, p.Lys68Glu, p.Phe72Val, p.Gly326Glu, p.Glu329Gly, p.Gly352Val, and p.Gln388Leu) or enrichment (e.g., p.Lys26Arg, p.Asn64Lys, p.Gln102Pro and p.His378Arg) were also observed, 72 suggesting that these variants could contribute to weakening or enhancing the PD− RBD affinity, respectively. 
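The comparison between the predicted ΔΔG values of the human interface variants and those of the mouse PD substitutions can be made explicit with a small sketch. The numbers are the ones quoted in the text; they are kept as unlabeled lists because the source does not state explicitly which value belongs to which variant, and the function name is hypothetical.

```python
# Predicted loss of binding free energy (kcal/mol), as quoted in the text.
MOUSE_PD_DDG = [0.34, 0.73, 1.04]                       # three mouse PD substitutions
HUMAN_INTERFACE_DDG = [0.04, 0.18, 0.92, 0.34, 0.62, 0.33]  # six gnomAD interface variants


def compare_to_reference(values, reference):
    """Summarize how a set of ddG values sits relative to a reference set."""
    ref_min, ref_max = min(reference), max(reference)
    return {
        "value_min": min(values),
        "value_max": max(values),
        "n_not_exceeding_ref_max": sum(v <= ref_max for v in values),
        "n_at_or_above_ref_min": sum(v >= ref_min for v in values),
    }


if __name__ == "__main__":
    summary = compare_to_reference(HUMAN_INTERFACE_DDG, MOUSE_PD_DDG)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

None of the human values exceeds the largest mouse value, while only some reach the smallest mouse value, so "within the range" is best read as "not larger than" the individual mouse substitutions.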
The dose of a viral inoculum 124 is an important factor for infection or severity of infection; a weakened/enhanced affinity could influence the dose required for infection in individuals carrying variants, and these observations encourage the investigation of the consequences of host variants in SARS-CoV-2 pathogenesis. These variants, however, are rare (gnomAD 30 allele frequencies, 10 −3 to 10 −5 ) and are unlikely to occur in ACE2 at the same time. The occurrence frequency of these variants also varies between human subpopulations. For example, in gnomAD, 30 billion in Africa (and assuming 50% men), the p.Ser19Pro variant, with a frequency of 3.3 × 10 −3 , is likely to be present in a significant number of people. In women, these variants are likely to occur predominantly as heterozygotes (ACE2 escapes XCI). In heterozygotes, oligomeric protein complexation can be affected (e.g., the sickle-cell trait of heterozygotes 20 or oligomeric haptoglobin of heterozygotes and homozygotes 125 ). The recent structural observation of ACE2 chaperoning B 0 AT1 (also known as SLC6A19) suggests that ACE2 likely dimerizes within the membrane. The positions of the PD−RBD interface variants are distal from the homodimer interface, suggesting that the variants are also less likely to affect homodimerization. The ACE dimers in heterozygous women could also be affected by PD−RBD interface variants. The ACE2−B 0 AT1 interface is also important in this respect, especially in the kidney and the intestine, where B 0 AT1 is predominantly expressed. Proteolytic cleavage of ACE2 by TMPRSS2 for the endosomal release of SARS-CoV-2 is an important step, and B 0 AT1 could possibly compete with TMPRSS2 for ACE2 access ( Figure 2D ). ACE2− B 0 AT1 interfacial variants could therefore also indirectly influence pathogenesis. 
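The step from an allele frequency to an expected number of carriers for an X-linked gene such as ACE2 can be sketched as follows. This is only an illustration under stated assumptions: Hardy-Weinberg proportions, a 50% male fraction as in the text, and a purely hypothetical population size; the function name is not from the source.

```python
def expected_carriers(population, allele_freq, male_fraction=0.5):
    """Expected carrier counts for a variant of an X-linked gene.

    Assumes Hardy-Weinberg proportions. Men carry one X chromosome, so the
    hemizygous carrier frequency equals the allele frequency q; women carry
    two, giving 2q(1 - q) heterozygotes and q**2 homozygotes.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie between 0 and 1")
    q = allele_freq
    men = population * male_fraction
    women = population - men
    return {
        "hemizygous_men": men * q,
        "heterozygous_women": women * 2 * q * (1 - q),
        "homozygous_women": women * q * q,
    }


if __name__ == "__main__":
    # q = 3.3e-3 is the p.Ser19Pro frequency quoted in the text;
    # the population of one million is a hypothetical round number.
    for group, count in expected_carriers(1_000_000, 3.3e-3).items():
        print(f"{group}: {count:.1f}")
```

For rare alleles the homozygous term is negligible, which is why such variants are expected predominantly as hemizygotes in men and heterozygotes in women.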
The ACE2 transmembrane helix is the primary site of B 0 AT1 interaction 57 ( Figure 1B ), and four ACE2 missense variants (p.Pro734Leu, p.Ile741Val, p.Val745Ile, and p.Ile753Met) can be mapped onto the ACE2−B 0 AT1 interface ( Figure 1C ), with predicted ΔΔG values in a similar range as above. Among the four variants, only p.Pro734Leu has a PolyPhen score of 1.0 (i.e., likely deleterious), and this variant is seen in the South Asian population with an allele frequency of 1.1 × 10 −4 (also observed in 1 out of ∼5000 South Asian men in gnomAD). That is, despite being predicted to be deleterious, men possessing the variant are known to have survived. It is, however, unclear whether these variants could interfere with B 0 AT1 interactions to influence TMPRSS2 access. Missense variants (p.Asn638Ser and p.Arg710His) at the ACE2 homodimer interface ( Figure 1C ) could also influence viral entry (i.e., homodimer interface disruption could enhance TMPRSS2 accessibility). Asn638 and Arg710 are engaged in ACE2 intersubunit hydrogen bonding and salt bridge formation, respectively. In particular, p.Arg710His has a PolyPhen score of 1.0 (i.e., deleterious). p.Arg710His is observed in non-Finnish Europeans (also observed in 1 out of ∼40 000 Europeans) and the East Asian population, with allele frequencies of 6.1 × 10 −5 and 1.5 × 10 −4 , respectively. The predicted ΔΔG of these two naturally occurring mutants is 1.95 and 0.24 kcal/mol, respectively, and they would be useful in revealing mechanistic details of the endosomal release of the trimeric S protein docked onto the dimeric ACE2 protein, similar in flavor to a previous in vitro mechanistic study 126 of HIV-1 entry disruption by CCR5 nonsense variant C100X (a rare allele, 6.6 × 10 −4 ). Similarly, one of the ACE2 disulfide (Cys133−Cys141) disrupting variants (p.Cys141Tyr, 5.8 × 10 −6 ) would be useful. 
In addition to the ACE2 missense variants discussed here in the context of ACE2 structure and complex, a discussion of ACE2 nonmissense variants would also be very important. gnomAD reports ACE2 nonmissense variants (Figure 2A, B) , although only a few due to high pLI score. There, respectively, are two, one, and two ACE2 rare nonsense (stop-gained, 5.9 × 10 −6 and 6.4 × 10 −6 ), frameshift (5.4 × 10 −6 ), and inframe deletion (1.1 × 10 −5 and 5.6 × 10 −6 ) variants reported in gnomAD ( Figure 2B ). All nonmissense variants map to PD, suggesting that PD truncation and structural perturbations from the nonmissense variants are likely to negatively impact PD−RBD interactions and viral entry in carrier individuals. Auxiliary Protein Variants. In addition to ACE2 (which directly interacts with SARS-CoV-2 proteins), other human proteins could also influence ACE2−S-protein interactions (Figure 2A,D) . For example, interactions of B 0 AT1 with ACE2 could possibly compete with TMPRSS2 and ADAM17 to influence ACE2 proteolysis ( Figure 2D ). Proteins (e.g., B 0 AT1) with the potential to exert indirect influence on host−viral protein interactions (e.g., ACE2−S-protein) are referred to as auxiliary proteins here for convenience. The list of auxiliary proteins (e.g., B 0 AT1, TMPRSS2, ADAM17, etc.) that likely influence viral entry is provided in Figure 2A , highlighting different categories of variants associated with these auxiliary proteins. In this regard, B 0 AT1 missense variants at the ACE2− B 0 AT1 interface would be important in influencing the proteolytic outcome of ACE2. Six out of the 14 B 0 AT1 missense variants at the ACE2−B 0 AT1 interface have 30 Å 2 or more of their surface area in contact with ACE2, with the predicted ΔΔG of four of these being quite high (1.20 to 2.02 kcal/mol). 
Of these four, p.Arg214Gly is engaged in intermolecular hydrogen bonding and is carried by non-Finnish Europeans (allele frequency, 6.1 × 10 −5 ) and African (3.5 × 10 −5 ) populations. Unlike ACE2, pLI for B 0 AT1 is 0.0 (and pNull = 1.0, see later), and several B 0 AT1 stop-gained and other nonmissense variants are observed in gnomAD (Figure 2A, C) . Truncations/ structural perturbations of B 0 AT1 due to nonmissense variants are likely to favor ACE2 proteolysis. Analyzing expression quantitative trait loci (eQTL) polymorphic variants for examining ACE2 expression in the genotype-tissue expression (GTEx) database, Chen et al. 10 in a recent report showed that at the population level, ACE2 expression is negatively correlated with COVID-19 severity. 10 To substantiate the counterintuitive observation, follow-up in silico expression analyses by Brest et al. 11 of variants of not only ACE2 but also the transmembrane proteases TMPRSS2 and ADAM17 suggest that protease expressions are likely to also contribute to disease severity. 11 The logical support for the argument 11 is that ACE2 is shed from membranes upon proteolysis 127 ( Figure 2D ). Proteolysis by ADAM17 results in ACE2 shedding into extracellular space, as ADAM17 operates on ACE2 and not the S protein. 11, 127 TMPRSS2, on the contrary, functions on ACE2 as well as the S protein, leading to endosomal release (i.e., virus uptake) upon membrane fusion. 11 The authors suggest that for low viral loads, ACE2 shedding could serve as a barrier for infection, whereas for high viral loads, proteolysis would lead to infection 11 ( Figure 2D ). In short, these proteases and their variants are likely to play an important role in the clinical outcome. The variants probed by human population level GTEx expression analyses are genome variants (i.e., not exome variants necessarily) typically occurring within ±10 kb of the gene of interest with minor allele frequencies between 0.05 and 0.5. 
The focus here is on rare (allele frequencies, 10 −3 to 10 −5 ) exome variants, as these variants have the potential (see later) to influence clinical outcomes of an individual (with the prospect of understanding mechanisms from protein analysis), complementing the population level study. The list of gnomAD rare nonmissense variants for TMPRSS2 and ADAM17 shows several variants (Figure 2A,C) , especially for TMPRSS2 (pLI = 0.0), that could influence viral entry. Post-viral-entry protein machinery engaged in the host immune response is discussed next, as a number of recent studies 3−8 point to the contributions of innate immunity toward disease severity. For example, an important recent study from the Casanova laboratory 5 reports that ∼10% of patients with life-threatening COVID-19 pneumonia have neutralizing autoantibodies (auto-Abs) against interferons, 5 whereas the anti-interferon auto-Abs are absent in asymptomatic patients. 5 Interferons (cytokine subgroup) are key signaling proteins of innate immunity, especially in viral infections. It had previously been shown that phenocopies of auto-Abs against cytokines are similar to clinical phenotypes of germline variants of cytokines and cytokine-receptors. 5 In other words, the absence of functional innate immunity signaling proteins due to either germline mutations or auto-Abs against them could compromise the innate immune response, leading to severe infectious disease. Here variants of proteins associated with the innate immunity signaling pathways for the production of interferons and other cytokines (e.g., interleukins) are discussed in regard to COVID-19 disease severity. Toll-like Receptor 7 (TLR7) Variants. van der Made et al. 3 reported an important case study highlighting the role of monogenic rare genetic variants in severe COVID-19 in four young men (i.e., two pairs of brothers) from unrelated families with no prior history of major chronic diseases. 
3 Rapid whole genome sequencing of the four (including one deceased) paired brother patients showed putative loss-of-function mutations in TLR7. 3 TLR7 functions as a pattern recognition receptor (e.g., recognizing ssRNA) and plays an important role in the viral (e.g., RNA viruses) immune response. TLR7 is encoded from the X chromosome. The two TLR7 variants found in the brothers are frameshift p.Gln710Argfs*18 (in one family) and missense p.Val795Phe (in the other) variants that map to the leucine-rich repeat (LRR) structural region of TLR7 ( Figure 3A ,B). These variants are not observed in gnomAD, suggesting that the two variants are rare (i.e., occur with very low allele frequency to be observed in ∼140 000 individuals in gnomAD). The functional assay of primary immune cells of the patients (i.e., carrying the TLR7 variants) by probing with TLR7 agonist showed the defective regulation of type-I IFN-related genes in comparison with normal cells. 3 This suggested that the functional (and likely structural) consequences of the rare variants compromise TLR7 (and downstream innate immunity signaling protein) functions, further underscoring the importance of the study of rare variants. The p.Gln710Argfs*18 frameshift variant is expected to alter the spacing of nonpolar residue patterns at distinctive intervals within the 22−29 residue repeating LRR motif, and p.Gln710Argfs*18 is thus expected to perturb the LRR fold. Given the functional consequence of the LRR frameshift (p.Gln710Argfs*18) variant in the above patient brothers, the identification of TLR7 nonmissense variants is of great importance. pLI for TLR7 is 0.98; therefore, only a few nonmissense (two stop-gained, three frameshift, and three inframe deletion) TLR7 variants are observed in gnomAD ( Figure 3A) , and all eight TLR7 nonmissense variants map to LRRs ( Figure 3A) . 
These nonmissense variants are likely to similarly compromise TLR7 function, and men carrying these nonmissense variants could be at risk for severe COVID-19. The structure of human TLR7 is unavailable, but estimates of structural perturbations due to missense variants of human TLR7 can be made by utilizing the structure of Rhesus macaque TLR7 129 (98% sequence identity). On the basis of the Rhesus macaque TLR7 structure, gnomAD missense variants that are expected to be similar to p.Val795Phe (in the patient brothers) in their behavior are also important for discussion. The Figure 3C ), suggesting that the observed rare variants would be important for determining the risk of COVID-19 severity. The structure of Rhesus macaque TLR7 129 in complex with ssRNA in the functionally competent homodimer state ( Figure 3B ) also shows the presence of interfacial missense variants such as p.Arg473Lys (TLR7−ssRNA interface), p.Arg186Gln, and p.Arg553Trp (TLR7−TLR7 interface), along with the variant p.Asn523His at a glycosylation site. All of these variants, through structural or functional perturbation of TLR7, have the potential to contribute to the malfunctioning of TLR7-mediated signaling. For X-linked TLR7, men are expected to be affected more by TLR7 variants. The impact of TLR7 variants on females is less clear, as some studies report TLR7 to escape XCI, 131 whereas other studies have not included TLR7 among the list of genes escaping XCI. 118 However, the autosomal dominant behavior of TLR3 variants (see later) suggests that females could also be affected by TLR7 variants, likely due to the dimerization during RNA recognition. Variants of Other Innate Immunity Proteins. 
Another important recent study 4 from the Casanova laboratory shows that rare loss-of-function variants in proteins associated with TLR3 and the interferon regulatory factor 7 (IRF7)-dependent type-I interferon (IFN) signaling pathway are enriched in patients with life-threatening COVID-19 pneumonia relative to asymptomatic infections. 4 In the study, 13 genetic loci whose variants had been known to be associated with immunity to the influenza virus were probed experimentally to identify autosomal-dominant/-recessive loss-of-function variants in these loci in ∼4% of patients with life-threatening COVID-19 pneumonia. 4 Information on the loss-of-function variants of these 13 genes present in the human population would therefore be valuable for predicting susceptibility to the severe disease. In addition, because auto-Abs against interferons (i.e., resulting in (IFNA1−8, IFNA10, IFNA14, IFNA16, IFNA17, IFNA21, IFNB1 , and IFNW1) genes are listed in Figure 4A . It is important to note that the pLI score of many of these genes (e.g., TLR3, IRF7, IFNAR1, etc.) is ∼0.0, suggesting that loss-offunction variants of these genes are tolerated, especially TLR3, IRF7, and IFNAR1, which have 15, 10, and 17 truncation variants, respectively, in gnomAD along with having several other nonmissense variants ( Figure 4A ). It is interesting to note that for some of the loss-of-function variants (e.g., in TLR3, TBK1, IRF3, etc.) identified in the study 4 (or those known from previous studies), a single copy of the variant (i.e., heterozygous) could result in severe disease. 4 That is, with respect to disease severity, some of the variant alleles are dominant. In other words, a heterozygous individual can otherwise be healthy (i.e., in absence of the infection, unlikely to be affected by the loss-offunction) but could develop severe disease when challenged by the infection. 
For the autosomal dominant nature (i.e., with respect to disease severity) of several variants of the genes identified in the study, 4 other variants of these genes observed in gnomAD also have the potential to result in malfunction of the corresponding proteins. For example, like the TLR3 dominant p.Ser339 fs (frameshift) and p.Trp769* (stop-gained) loss-offunction variants observed in patients with severe pneumonia, 4 the list of other TLR3 stop-gained (e.g., p.Trp11*, p.Ser88*, p.Arg394*, etc.), frameshift (e.g., p.Leu243MetfsTer16, p.Leu255GlufsTer2, p.Lys345ArgfsTer10, etc.), and deletion/ insertion (e.g., p.Asn285del, p.Asp348del, p.Leu482del, etc.) variants, IRF7 stop-gained (e.g., p.Trp30*, p.Trp214*, p.Trp391*, etc.) and frameshift (e.g., p.Asp207AlafsTer206, p.Leu185SerfsTer26, p.Glu95ArgfsTer3, etc.) variants, and others observed in gnomAD are of great interest for understanding the potential for risks of severe symptoms. In gnomAD, rare variants are typically observed in heterozygotes, and the rare variants of innate immunity proteins have the potential to function as autosomal dominant with respect to disease severity. Whereas there are several variants of proteins ( Figure 4A ) with the potential to interfere with the type I interferon (IFN) pathway (and therefore demand attention), only a few variants (e.g., missense) of TLR3 and TBK1 ( Figure 4B −E) are discussed here in the context of protein structure for convenience. The autosomal dominant TLR3 missense variants p.Pro554Ser and p.Met870Val probed in the study 4 above are, respectively, within the LRR and TIR (Toll/IL-1 receptor) regions of the molecule. In the mouse TLR3 LRR structure (∼80% sequence identity with human), p.Pro554Ser is placed in the loop connecting two sets of LRRs engaged in the dsRNA and the dimer interface, respectively ( Figure 4B ). 
With an estimated energetic contribution of ∼2 kcal/mol (FoldX 130 ), Pro554 likely plays an important role in the assembly of the TLR3 signaling complex. The known p.Leu360Pro dominant loss-of-function TLR3 missense variant, impacting herpes simplex virus-1 encephalitis (HSE), 28 resides in an inward facing position within the LRR motif. Similar to p.Leu360Pro, several missense variants that substitute nonpolar residues (i.e., Leu/Ile) at inward facing positions within the LRRs (with estimated energetic contributions of 2−4 kcal/mol) are also observed in gnomAD ( Figure 4D, left) . In addition, a TLR3−dsRNA interface variant (p.Arg489Ser) ( Figure 4D , right) and TLR3 disulfide disrupting variants (p.Cys28Gly and p.Cys651Ser) are also observed ( Figure 4B , right). TLR3 TIR belongs to Group 3 TIR. 133 Group 3 TIR possesses a distinct cluster of bulky nonpolar residues ( Figure 4C ), and the modeled human TLR3 TIR structure suggests that p.Met870Val is likely to interfere with the nonpolar residue cluster ( Figure 4C ). The TBK1 homodimer structure indicates that gnomAD missense variants p.Arg444Gln and p.Arg547Lys in each protomer are likely to perturb several interfacial hydrogen-bonded interactions and that these variants have the potential to perturb TBK1 function ( Figure 4E ). In addition to variants of the innate immunity pathway proteins, the genetic variability across the human leukocyte antigen (HLA) A, B, and C genes could also contribute to SARS-CoV-2 susceptibility and disease severity. In this regard, in silico 134−136 assessment of the binding of SARS-CoV-2 peptides onto the major histocompatibility complex (MHC) class I binding pocket across HLA-A, -B, and -C genotypes is also useful for the analysis of susceptibility. 
134−136 Because the presentation of high-affinity peptide by MHC class I requires the peptide-loading complex 137,138 protein machinery, information about the variants of the genes of the peptide-loading complex (TAPBP, PDIA3, CALR, TAP1, and TAP2, referred to here as MHCI-assembly genes) will also be useful in this regard. The variants of MHCI-assembly genes are therefore listed in Figure 5A (bottom), and it is interesting to note that the pLI probabilities for TAP1, TAP2, and TAPBP are 0.0. The discussion of variants above focused on a small set of genes that are engaged in specific pathways (e.g., viral entry and innate immunity). Variants of proteins of other pathways could also contribute to susceptibility. In general, for a system-level view of variants in the context of host−virus interactions, proteins identified through proteomic-scale approaches provide a useful reference. In this regard, a discussion of variants associated with the human proteins recently identified by Gordon et al. 92 and Flynn et al. 94 that interact with SARS-CoV-2 proteins and RNA, respectively, would be valuable. Gordon et al. 92 and Flynn et al., 94 respectively, report 330 and 309 human proteins engaged in high-confidence SARS-CoV-2−human protein−protein interactions 92 (referred to as SARS-CoV-2 PPI proteins) and RNA−protein interactions 94 (referred to as SARS-CoV-2 RPI proteins). For convenience, the variants of SARS-CoV-2 PPI and RPI proteins are discussed in the context of tolerance to loss-of-function variants. Genes can be categorized as (a) tolerant to loss-of-function variants as homozygous (pNull probability high), (b) tolerant to loss-of-function variants as heterozygous (pRec probability high), and (c) intolerant to loss-of-function variants as heterozygous (pLI probability high). 117 As discussed above, TLR3, IRF7, and IFNAR1 all have low pLI (i.e., 0.0, Figure 4A).
IRF7 and IFNAR1 have high pNull (0.99 and 0.84, respectively), whereas TLR3 has a low pNull (i.e., 0.01, Figure 4A). That is, IRF7 and IFNAR1 are expected to be tolerant to loss-of-function variants as homozygous, whereas TLR3 is expected to be tolerant to loss-of-function variants as heterozygous. It is interesting to note that autosomal recessive variants (i.e., homozygous) for disease severity were observed for IRF7 (e.g., p.Pro364fs/p.Pro364fs) 4 and IFNAR1 (e.g., p.Trp73Cys/p.Trp73Cys), 4 which is consistent with their high pNull probability, that is, with an individual being able to carry both loss-of-function alleles. Autosomal dominant (i.e., heterozygous) variants were observed for TLR3 (e.g., p.Trp769*/wild type), 4 which is consistent with the low pNull probability for TLR3. In fact, the autosomal dominant variants identified in the above study 4 were predominantly in genes with low pNull probability (e.g., TLR3, UNC93B1, TBK1, IRF3, etc., Figure 4A). In short, the set of pLI, pNull, and pRec probabilities provides important information for anticipating whether a variant will be observed as homozygous or heterozygous. Genes with low pLI (equal to 0) and high pNull (>0.8) probability would have a higher chance of having homozygous variants in a population. For example, the FUT2 and CCR5 genes (pLI = 0 and 0; pNull = 1.0 and 0.98) have homozygous truncation variants present in the human population. Because these proteins directly participate in the entry of their respective viruses, homozygous truncation variants of FUT2 and CCR5 provide a protective advantage. Therefore, for SARS-CoV-2 PPI and RPI proteins that directly interact with viral machinery (and likely participate in the essential steps of the viral life cycle), a discussion of the behavior of the pLI and pNull probabilities of these genes is very useful (Figure 5).
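The categorization described above can be sketched in a few lines. This is an illustrative sketch only, using the pLI and pNull values quoted in the text; the class name `GeneConstraint`, its methods, and the category labels are hypothetical. It also assumes that the three probabilities sum to 1, so that pRec can be taken as 1 − pLI − pNull; the text does not state this, and the published constraint tables should be used for real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneConstraint:
    """Loss-of-function constraint probabilities for one gene (hypothetical type)."""
    gene: str
    pLI: float
    pNull: float

    @property
    def pRec(self) -> float:
        # ASSUMPTION (not stated in the text): pLI + pRec + pNull = 1.
        return max(0.0, 1.0 - self.pLI - self.pNull)

    def dominant_class(self) -> str:
        """Name the tolerance category with the highest probability."""
        scores = {
            "tolerant as homozygous (pNull)": self.pNull,
            "tolerant as heterozygous (pRec)": self.pRec,
            "intolerant as heterozygous (pLI)": self.pLI,
        }
        return max(scores, key=scores.get)

    def homozygous_lof_plausible(self, pnull_min: float = 0.8) -> bool:
        """Apply the text's rule of thumb: pLI equal to 0 and pNull > 0.8."""
        return round(self.pLI, 2) == 0.0 and self.pNull > pnull_min


# Values as quoted in the text (pLI rounded to 0.0 there).
GENES = [
    GeneConstraint("IRF7", 0.0, 0.99),
    GeneConstraint("IFNAR1", 0.0, 0.84),
    GeneConstraint("TLR3", 0.0, 0.01),
    GeneConstraint("FUT2", 0.0, 1.0),
    GeneConstraint("CCR5", 0.0, 0.98),
]

if __name__ == "__main__":
    for g in GENES:
        print(f"{g.gene}: {g.dominant_class()}; "
              f"homozygous LoF plausible = {g.homozygous_lof_plausible()}")
```

Under these assumptions, IRF7, IFNAR1, FUT2, and CCR5 fall in the homozygous-tolerant category, whereas TLR3 falls in the heterozygous-tolerant category, which matches the reading given in the text.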
There are 24 and 13 genes of SARS-CoV-2 PPI and RPI proteins, respectively, with pLI = 0 and pNull >0.8 (Figure 5A), and individuals carrying homozygous truncation variants of these genes are likely to be found in the human population. For example, homozygous individuals carrying nonsense or frameshift variants of these genes (e.g., CNTRL, NINL, ANO6, CHD1L, ANXA5, NLRX1, etc.) are observed in gnomAD. The absence of functional forms of these proteins in homozygous individuals could provide a protective advantage. In general, with respect to disease susceptibility, homozygous variants of genes with high pNull probability are expected to (a) provide protection if the protein is essential for the viral life cycle and (b) result in severe symptoms if the protein is a vital player in innate immunity. The distribution of the pLI score (Figure 5B) shows that ∼24 and ∼49% of human SARS-CoV-2 PPI and RPI proteins, respectively, have pLI > 0.9, whereas only ∼16% of all human proteins have pLI > 0.9. This suggests that the virus tends to interact with machinery (especially for viral RNA maintenance) that is less tolerant of truncation variants than expected from the reference background. This could be an adaptation to minimize host variability for viral survival. The distribution of the pNull probability suggests that ∼10% of human genes are tolerant to loss-of-function variants as homozygous (i.e., pNull ≥0.9), whereas the corresponding fractions for SARS-CoV-2 PPI and RPI protein genes are ∼5 and ∼3%, respectively, which are lower than the reference background (Figure 5B) and consistent with an adaptation to minimize host variability. Among the SARS-CoV-2 PPI and RPI protein genes, 19 are X-linked (e.g., POLA1, PDZD11, DDX3X, MID1IP1, etc.), with none having high pNull probability, but three X-linked genes (PDZD11, HS6ST2, and MID1IP1) have low pLI, and their nonsense variants may provide protection in men.
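The comparison with the reference background above reduces to simple ratios of the quoted percentages. The sketch below works through that arithmetic; the dictionary layout and the function name `fold_change` are illustrative assumptions, the inputs are the approximate percentages from the text, and no statistical test is implied.

```python
# Approximate percentages quoted in the text (Figure 5B); all values are "~".
BACKGROUND_PCT = {"pLI > 0.9": 16.0, "pNull >= 0.9": 10.0}
OBSERVED_PCT = {
    "PPI": {"pLI > 0.9": 24.0, "pNull >= 0.9": 5.0},
    "RPI": {"pLI > 0.9": 49.0, "pNull >= 0.9": 3.0},
}


def fold_change(observed_pct: float, background_pct: float) -> float:
    """Ratio of an observed percentage to the background percentage."""
    if background_pct <= 0:
        raise ValueError("background percentage must be positive")
    return observed_pct / background_pct


# Fold change of each protein set relative to all human genes.
FOLD = {
    group: {
        criterion: fold_change(pct, BACKGROUND_PCT[criterion])
        for criterion, pct in criteria.items()
    }
    for group, criteria in OBSERVED_PCT.items()
}

if __name__ == "__main__":
    for group, criteria in FOLD.items():
        for criterion, ratio in criteria.items():
            print(f"{group} proteins, {criterion}: {ratio:.2f}x background")
```

With these inputs, loss-of-function-intolerant genes (pLI > 0.9) are roughly 1.5-fold (PPI) and 3-fold (RPI) over-represented relative to the background, whereas homozygous-tolerant genes (pNull ≥0.9) are under-represented at roughly 0.5-fold and 0.3-fold, respectively.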
Finally, other variants (e.g., 5′ UTR, synonymous, splice-acceptor/donor, etc.) affecting tissue-level protein expression, or stability-compromising missense variants that indirectly affect the expression levels of the pathway-specific proteins discussed above, could also be valuable for assessing the role of genotype in pathogenesis and susceptibility. In recognition of the important role that variants could play in disease susceptibility, the human protein variants involved in SARS-CoV-2−host interactions have recently been reported, analyzed, and discussed. 33, 34 The emphasis here has been to provide a comprehensive view of the roles that human protein variants could play not only in these interactions but also in all major aspects of viral biology, such as viral entry, innate immunity, and host−virus interactions (protein−protein and protein−RNA). Because rare variants have been observed to contribute to disease severity in otherwise healthy individuals, 4 a careful look at those involving pathway-specific proteins in a large reference database is needed, given that they typically occur in heterozygotes. The findings extracted from the gnomAD database suggest that heterozygous carriers of several rare variants are present in the human population and that disease risks may be associated with these individuals. In addition, the observed influence of rare homozygous recessive variants on disease severity 4 encouraged an assessment of tolerance for loss-of-function (involving pLI, pNull, and pRec probabilities 117) of the genes discussed above. The degree of this tolerance helps estimate the likelihood of observing rare variant carriers (both homozygous and heterozygous) in a population. Although the discussion was centered around gnomAD, the rare variants not observed in this database (e.g., the variants of TLR7 and TLR3 discussed above) could play equally important roles.
The profiling of an individual's variants can now be performed with relative ease (e.g., using quantitative PCR 139 or MassARRAY 140−142). This is especially true for specific loci, for which targeted genotyping would avoid whole-genome sequencing and thereby facilitate the assessment of COVID-19 disease risks. Such profiling would be valuable for optimizing national responses to this health crisis while we await an effective therapeutic or vaccine. In addition, knowledge about the protein structural consequences of these variants (Figures 1, 3, and 4)

■ REFERENCES
What Is COVID-19?
Presence of Genetic Variants Among Young Men With Severe COVID-19
Inborn errors of type I IFN immunity in patients with life-threatening COVID-19
Autoantibodies against type I IFNs in patients with life-threatening COVID-19
Impaired type I interferon activity and inflammatory responses in severe COVID-19 patients
Longitudinal analyses reveal immunological misfiring in severe COVID-19
Systems biological assessment of immunity to mild versus severe COVID-19
Genomewide Association Study of Severe Covid-19 with Respiratory Failure
Individual variation of the SARS-CoV-2 receptor ACE2 gene expression and regulation
Host polymorphisms may impact SARS-CoV-2 infectivity
A Global Effort to Define the Human Genetics of Protective Immunity to SARS-CoV-2 Infection
Human genetic susceptibility to infectious disease
Genome-wide association and HLA region fine-mapping studies identify susceptibility loci for multiple common infections
Genome-wide association studies and susceptibility to infectious diseases
An Atlas of Genetic Variation Linking Pathogen-Induced Cellular Traits to Human Disease
Genetic Predisposition to Infectious Disease
Mechanisms of genetically-based resistance to malaria
Population genetics of malaria resistance in humans
Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene
The role of viral phenotype and CCR-5 gene defects in HIV-1 transmission and disease progression
Human susceptibility and resistance to Norwalk virus infection
Molecular basis of altered red blood cell membrane properties in Southeast Asian ovalocytosis: role of the mutant band 3 protein in band 3 oligomerization and retention by the membrane skeleton
Studying the Effects of ACE2 Mutations on Stability, Dynamics and Dissociation Process of SARS-CoV-2 S1/hACE2 Complexes
Severe viral respiratory infections in children with IFIH1 loss-of-function mutations
Severe influenza pneumonitis in children with inherited TLR3 deficiency
Life-threatening influenza pneumonitis in a child with inherited IRF9 deficiency
The mutational constraint spectrum quantified from variation in 141,456 humans
Building a Hybrid Physical-Statistical Classifier for Predicting the Effect of Variants Related to Protein-Drug Interactions
SARS-CoV-2 (COVID-19) structural and evolutionary dynamicome: Insights into functional evolution and human genomics
SARS-CoV-2 encoded proteome and human genetics: from interaction-based to ribosomal biology impact on disease and risk processes
Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells
SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor
Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
Cell entry mechanisms of SARS-CoV-2
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
In situ structural analysis of SARS-CoV-2 spike reveals flexibility mediated by three hinges
Molecular Architecture of the SARS-CoV-2 Virus
The Architecture of Inactivated SARS-CoV-2 with Postfusion Spikes Revealed by Cryo-EM and Cryo-ET. Structure
Structure-based design of prefusion-stabilized SARS-CoV-2 spikes
A thermostable, closed SARS-CoV-2 spike protein trimer
Controlling the SARS-CoV-2 spike glycoprotein conformation
Structure-Based Design with Tag-Based Purification and In-Process Biotinylation Enable
Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding
The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity
Mutations Strengthened SARS-CoV-2 Infectivity
Structural and Functional Analysis of the D614G SARS-CoV-2 Spike Protein Variant
Spike mutation D614G alters SARS-CoV-2 fitness
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses
SARS-CoV-2 receptor ACE2 and TMPRSS2 are primarily expressed in bronchial transient secretory cells
The protein expression profile of ACE2 in human tissues
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
Structural basis of receptor recognition by SARS-CoV-2
Structural and Functional Basis of SARS-CoV-2 Entry by Using Human ACE2
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Potent Neutralizing Antibodies against SARS-CoV-2 Identified by High-Throughput Single-Cell Sequencing of Convalescent Patients' B Cells
Potently neutralizing and protective human antibodies against SARS-CoV-2
Cross-neutralization of SARS-CoV-2 by a human monoclonal SARS-CoV antibody
Rapid isolation and profiling of a diverse panel of human monoclonal antibodies targeting the SARS-CoV-2 spike protein
A SARS-CoV-2 surrogate virus neutralization test based on antibody-mediated blockage of ACE2-spike protein-protein interaction
Quantification of SARS-CoV-2 neutralizing antibody by a pseudotyped virus-based assay
A highly conserved cryptic epitope in the receptor binding domains of SARS-CoV-2 and SARS-CoV
Structures of Human Antibodies Bound to SARS-CoV-2 Spike Reveal Common Epitopes and Recurrent Features of Antibodies
SARS-CoV-2 neutralizing antibody structures inform therapeutic strategies
Structural basis for the neutralization of SARS-CoV-2 by an antibody from a convalescent patient
An Alternative Binding Mode of IGHV3−53 Antibodies to the SARS-CoV-2 Receptor Binding Domain
De novo design of picomolar SARS-CoV-2 miniprotein inhibitors
Neutralizing nanobodies bind SARS-CoV-2 spike RBD and block interaction with ACE2
High Potency of a Bivalent Human VH Domain in SARS-CoV-2 Animal Models
Identification of an anti-SARS-CoV-2 receptor-binding domain-directed human monoclonal antibody from a naive semisynthetic library
A cross-reactive human IgA monoclonal antibody blocks SARS-CoV-2 spike-ACE2 interaction
Structure of the RNA-dependent RNA polymerase from COVID-19 virus
Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir
Structural Basis for RNA Replication by the SARS-CoV-2 Polymerase
High-resolution structures of the SARS-CoV-2 2′-O-methyltransferase reveal strategies for structure-based inhibitor design
SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation
Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2
Structure of M(pro) from SARS-CoV-2 and discovery of its inhibitors
Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors
Identify potent SARS-CoV-2 main protease inhibitors via accelerated free energy perturbation-based virtual screening of existing drugs
Activity profiling and crystal structures of inhibitor-bound SARS-CoV-2 papain-like protease: A framework for anti−COVID-19 drug design
Inhibition of SARS-CoV-2 viral entry upon blocking N- and O-glycan elaboration
Proteomics of SARS-CoV-2-infected host cells reveals therapy targets
A direct RNA-protein interaction atlas of the SARS-CoV-2 RNA in infected human cells
Systematic discovery and functional interrogation of SARS-CoV-2 viral RNA-host protein interactions during infection
Pathogenesis of SARS-CoV-2 in Transgenic Mice Expressing Human Angiotensin-Converting Enzyme 2
A mouse-adapted model of SARS-CoV-2 to test COVID-19 countermeasures
Pathogenesis and transmission of SARS-CoV-2 in golden hamsters
Clinical characteristics of coronavirus disease 2019 in China
Baseline characteristics and outcomes of 1591 patients infected with SARS-CoV-2 admitted to ICUs of the Lombardy Region
Hospitalization and mortality among black patients and white patients with Covid-19
Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area
Alter, G. Distinct Early Serological Signatures Track with SARS-CoV-2 Survival. Immunity
Targets of T Cell Responses to SARS-CoV-2 Coronavirus in Humans with COVID-19 Disease and Unexposed Individuals
SARS-CoV-2-specific T cell immunity in cases of COVID-19 and SARS, and uninfected controls
Viral and host factors related to the clinical outcome of COVID-19
Phase I/II study of COVID-19 RNA vaccine BNT162b1 in adults
A novel angiotensin-converting enzyme−related carboxypeptidase (ACE2) converts angiotensin I to angiotensin 1−9
Hydrolysis of biological peptides by human angiotensin-converting enzyme-related carboxypeptidase
Angiotensin-converting enzyme 2 metabolizes and partially inactivates pyr-apelin-13 and apelin-17: physiological effects in the cardiovascular system
Analysis of protein-coding genetic variation in 60,706 humans
Landscape of X chromosome inactivation across human tissues
Analysis of angiotensin-converting enzyme 2 (ACE2) from different species sheds some light on cross-species receptor usage of a novel coronavirus 2019-nCoV
Spike protein recognition of mammalian ACE2 predicts the host range and an optimized ACE2 for SARS-CoV-2 infection
BeAtMuSiC: Prediction of changes in protein-protein binding affinity on mutations
Quantification of biases in predictions of protein stability changes upon mutations
A method and server for predicting damaging missense mutations
A Simple Method of Estimating Fifty Per Cent Endpoints
Identification of the haemoglobin scavenger receptor
Multiple nonfunctional alleles of CCR5 are frequent in various human populations
TMPRSS2 and ADAM17 cleave ACE2 differentially and only proteolysis by TMPRSS2 augments entry driven by the severe acute respiratory syndrome coronavirus spike protein
MUSTANG: a multiple structural alignment algorithm
Structural analysis reveals that Toll-like receptor 7 is a dual receptor for guanosine and single-stranded RNA
Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations
TLR7 escapes X chromosome inactivation in immune cells
ModBase, a database of annotated comparative protein structure models and associated resources
A survey of TIR domain sequence and structure divergence
Human Leukocyte Antigen Susceptibility Map for Severe Acute Respiratory Syndrome Coronavirus 2
Estimating the Binding of Sars-CoV-2 Peptides to HLA Class I in Human Subpopulations Using Artificial Neural Networks
Quantification of Uncertainty in Peptide-MHC Binding Prediction Improves High-Affinity Peptide Selection for Therapeutic Design
The MHC I loading complex: a multitasking machinery in adaptive immunity
Structure of the human MHC-I peptide-loading complex
Molecular genetic testing and the future of clinical genomics
SNP genotyping using the Sequenom MassARRAY iPLEX platform. Current protocols in human genetics
A genome-wide association study identifies six novel risk loci for primary biliary cholangitis
Germinal Immunogenetics predict treatment outcome for PD-1/PD-L1 checkpoint inhibitors