key: cord-338723-3vm23fgy
authors: Lee, In-Hee; Lee, Ji-Won; Kong, Sek Won
title: A survey of genetic variants in SARS-CoV-2 interacting domains of ACE2, TMPRSS2 and TLR3/7/8 across populations
date: 2020-08-26
journal: Infect Genet Evol (Journal Pre-proof)
DOI: 10.1016/j.meegid.2020.104507
sha:
doc_id: 338723
cord_uid: 3vm23fgy

The COVID-19 pandemic has highlighted healthcare disparities in multiple countries. Morbidity and mortality vary significantly around the globe, between populations and between ethnic groups. Underlying medical conditions and environmental factors contribute to a higher incidence in some populations, and a genetic predisposition may play a role in severe cases with respiratory failure. Here we investigated whether genetic variation in the key genes for viral entry into host cells (ACE2 and TMPRSS2) and for sensing of viral genomic RNA (TLR3/7/8) could explain the variation in incidence across diverse ethnic groups. Overall, these genes are under strong selection pressure and have very few nonsynonymous variants in all populations. Genetic determinants of the binding affinity between SARS-CoV-2 and ACE2 do not differ significantly between populations. Non-genetic factors are likely to contribute to the differences in how populations are affected by COVID-19. Nonetheless, a systematic mutagenesis study of the RBD-interacting domain of ACE2 is required to understand differences in host-viral interaction across populations.

Coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, has been a pandemic since March 2020. Initial reports from China revealed diverse risk factors, clinical courses and outcomes in a relatively homogeneous population (Zhou et al., 2020a). Morbidity and mortality vary between populations (Yancy, 2020). African Americans and Latinos are disproportionately affected by COVID-19 and show significantly higher mortality than other racial and ethnic groups in the US (Wadhera et al., 2020) and in the UK (Kirby, 2020).
A "healthcare disparity" must be responsible for the high incidence among minorities, although socioeconomic factors, underlying medical conditions, and differences in genetic susceptibility to SARS-CoV-2 infection may also contribute (Chen et al., 2020). Of note, a gene cluster at 3p21.31 (SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6 and XCR1) is associated with genetic susceptibility to severe COVID-19 with respiratory failure (Ellinghaus et al., 2020). To find allelic variation across populations in the genes known to be involved in viral entry into host cells and in sensing of viral RNA by host immune cells, we surveyed publicly available databases of genomic variants. SARS-CoV-2 is an enveloped, positive-sense single-stranded RNA (ssRNA) virus that initiates human cell entry when the spike (S) protein on the viral envelope binds to the angiotensin-converting enzyme 2 (ACE2) receptor on host cells (Zhou et al., 2020b). The SARS-CoV S protein/ACE2 interface has been elucidated at the atomic level, and ACE2 was found to be a key factor in SARS-CoV transmission (Li et al., 2005b). The binding mode of the SARS-CoV-2 receptor-binding domain (RBD) to ACE2 is nearly identical to that of SARS-CoV (Lan et al., 2020). The S protein is cleaved into S1 and S2 by the type 2 transmembrane serine protease TMPRSS2 and by the endosomal cysteine proteases cathepsin B and L (CatB/L) (Du et al., 2009). TMPRSS2 is believed to be of utmost importance for SARS-CoV-2 entry into host cells. Recent studies demonstrated that camostat mesylate, an inhibitor of TMPRSS2 protease activity, attenuated SARS-CoV-2 entry into lung epithelial cells, making it a promising candidate for intervention against COVID-19 (Hoffmann et al., 2020). The C-terminal domain of the S1 subunit is responsible for binding of SARS-CoV-2 to ACE2, and the S2 subunit undergoes a conformational change that results in virus-membrane fusion and entry into the target cell (Du et al., 2009).
Viral genomic RNA is then released and translated into viral polymerase proteins for viral replication. The innate immune response is the first line of host defense against SARS-CoV-2 infection. Toll-like receptors recognize viral RNA (double-stranded RNA (dsRNA) by TLR3, and ssRNA by TLR7 and TLR8) and trigger innate immune responses such as the expression of inflammatory genes for type I interferons and proinflammatory cytokines (Iwasaki and Pillai, 2014; Iwasaki and Yang, 2020). Here we surveyed the genetic variants in functional residues of ACE2, TMPRSS2, CTSB/L (CatB/L), and TLR3/7/8 to investigate population differences in genetic predisposition to SARS-CoV-2 infection and to the initiation of the innate immune response. For ACE2, we investigated genetic variants at the residues on the interface with the SARS-CoV-2 RBD identified in recent structural analyses (Hussain et al., 2020; Lan et al., 2020; Shang et al., 2020; Wrapp et al., 2020; Yan et al., 2020). Given the high sequence similarity between the S proteins of SARS-CoV-2 and SARS-CoV, we also investigated the residues whose mutation inhibited the interaction in in vitro mutagenesis analysis (Li et al., 2005b). We checked two residues reported to cause loss of TMPRSS2 cleavage activity (Afar et al., 2001) and the enzymatically active sites of CatB/L. A total of 16 TLR7 residues necessary for ssRNA-induced activation (Zhang et al., 2016), together with the residues affecting the response to viral RNA in in vitro mutagenesis studies of TLR3 (dsRNA; Bell et al., 2006; de Bouteiller et al., 2005; Sarkar et al., 2007) and of TLR8 (ssRNA; Tanji et al., 2015), were checked for sequence variation. Additionally, we searched for nonsynonymous variants that would cause loss of gene function (i.e., frameshift, in-frame insertion/deletion, stop-gain, splice-disrupting, start-lost and stop-lost).
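The residue-level screen described above amounts to two filters over an annotated variant list: keep variants that fall on a set of functional residues, and keep variants whose consequence belongs to one of the loss-of-function classes. The following is a minimal Python sketch, assuming a hypothetical `Variant` record with a residue number and a consequence label. The label strings are illustrative and do not follow the vocabulary of any particular annotation tool, and `hypothetical_lof` is a made-up placeholder. The other rsIDs and residues are taken from the ACE2 results reported in the text.

```python
from dataclasses import dataclass

# Consequence classes treated as loss-of-function in the text.
# The label strings are illustrative assumptions, not a specific tool's vocabulary.
LOF_CLASSES = frozenset({
    "frameshift",
    "inframe_indel",
    "stop_gained",
    "splice_disrupting",
    "start_lost",
    "stop_lost",
})


@dataclass(frozen=True)
class Variant:
    rsid: str
    residue: int       # protein residue number affected by the variant
    consequence: str   # e.g. "missense", "synonymous", or a LOF_CLASSES label


def at_functional_residues(variants, residues):
    """Keep variants located on any of the given residue numbers."""
    wanted = set(residues)
    return [v for v in variants if v.residue in wanted]


def loss_of_function(variants):
    """Keep variants whose consequence is a loss-of-function class."""
    return [v for v in variants if v.consequence in LOF_CLASSES]


# ACE2 residues reported to change S-protein binding affinity (Li et al., 2005a).
ACE2_KEY_RESIDUES = (31, 35, 38, 82, 353)

variants = [
    Variant("rs4646116", 26, "missense"),             # K26R: interface, but not a key residue
    Variant("rs758278442", 31, "synonymous"),         # K31K
    Variant("rs1348114695", 35, "missense"),          # E35K
    Variant("rs766996587", 82, "missense"),           # M82I
    Variant("hypothetical_lof", 100, "stop_gained"),  # made-up placeholder record
]

print([v.rsid for v in at_functional_residues(variants, ACE2_KEY_RESIDUES)])
print([v.rsid for v in loss_of_function(variants)])
```

Keeping the two filters separate mirrors the text, which reports residue-level hits and loss-of-function counts as distinct results for each gene.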
The list of reported genetic variants in these genes and their allele frequencies (AFs) were […]

ACE2 is highly conserved, with few nonsynonymous variants in the domain interacting with the SARS-CoV-2 RBM (Lan et al., 2020). Of 370 coding variants in ACE2, 248 were nonsynonymous; the most frequent was rs41303171 (AF = 1.6%). Within the 33 residues interfacing the SARS-CoV-2 RBM, 19 variants (including 4 synonymous variants) were found, with an average AF of 0.03% (range 0 to 0.39%) (Table 1). Only one of the 19 variants (rs4646116; K26R) had a global AF greater than 0.1% (AF = 0.39%). Rs4646116 (NC_000023.10:g.15618958T>C) had the largest AF difference across populations, with the lowest AF (0.007%) in East Asians and the highest (0.59%) in Non-Finnish Europeans. The impact of this variant has not yet been investigated by structural analysis, but in silico prediction algorithms such as SIFT and PolyPhen-2 did not classify it as deleterious (i.e., as possibly affecting protein structure and function). The other variants were either rare (population AF < 0.1%) or unique to one or two populations. For the five residues (K31, E35, D38, M82 and K353) reported to significantly change binding affinity to the viral S protein (Li et al., 2005a), we found three variants: rs758278442 (K31K), rs1348114695 (E35K), and rs766996587 (M82I). However, all three were either synonymous or predicted to have little impact on the protein. Rs758278442 showed a significant AF difference across populations, especially among East Asian populations. In gnomAD, it is found only among East Asian individuals (1,909 Korean, 76 Japanese, and 7,212 other East Asian individuals), with an AF of 0.022%. The variant is also found in the Korean Reference Genome Database (N = 1,722) with an AF of 0.029%, similar to gnomAD. However, it was found at a higher AF of 0.23% in the Japanese genetic variation database (N = 3,552).
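Comparing a variant's AF across populations or databases, as done above for rs4646116 and rs758278442, comes down to finding the minimum and maximum of a small mapping. The sketch below is illustrative only. AFs are given in percent, as quoted in the text, and `af_spread` is a hypothetical helper. The ratio is just one convenient summary: it is undefined when the lowest AF is zero, and it ignores sampling error, which matters for alleles observed only a few times.

```python
def af_spread(afs_percent):
    """Summarize AF extremes from a {label: AF in %} mapping.

    Returns (lowest_label, highest_label, absolute difference in percentage
    points, ratio highest/lowest). The ratio is None if the lowest AF is 0.
    """
    if not afs_percent:
        raise ValueError("need at least one AF")
    low = min(afs_percent, key=afs_percent.get)
    high = max(afs_percent, key=afs_percent.get)
    diff = afs_percent[high] - afs_percent[low]
    ratio = afs_percent[high] / afs_percent[low] if afs_percent[low] > 0 else None
    return low, high, diff, ratio


# rs4646116 (K26R): population extremes quoted in the text.
rs4646116 = {"East Asian": 0.007, "Non-Finnish European": 0.59}

# rs758278442 (K31K): the same variant in three databases.
rs758278442 = {"gnomAD (East Asian)": 0.022, "KRGDB": 0.029, "TogoVar": 0.23}

for name, afs in (("rs4646116", rs4646116), ("rs758278442", rs758278442)):
    low, high, diff, ratio = af_spread(afs)
    ratio_txt = "n/a" if ratio is None else f"{ratio:.1f}x"
    print(f"{name}: lowest={low}, highest={high}, diff={diff:.3f} pp, ratio={ratio_txt}")
```

Reporting both the absolute difference and the ratio helps avoid overreading either one. A tenfold ratio between two very rare frequencies can still correspond to a difference of only a fraction of a percentage point.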
Rs1348114695 at residue 35 was found only in European and East Asian populations, at very low frequencies (0.001% and 0.014%, respectively). Lastly, rs766996587 at residue 82 was found only in the African population (AF = 0.026%). Moreover, protein modeling predicts little topological difference between the modeled ACE2 variants and wild-type ACE2 in their binding to the S protein (Hussain et al., 2020). Therefore, we expect little genetic variation across populations that critically affects the interaction between ACE2 and SARS-CoV-2. Figure 1A illustrates the 19 variants over the known functional protein domains of ACE2.

The proteolytic activity of TMPRSS2 is crucial for viral entry into host cells (Hoffmann et al., 2020). Two residues, V292 and M478, are reported to affect the catalytic activity of TMPRSS2 (Afar et al., 2001), but we found no variants at these residues (Supplementary Table 1). Reported variants in TMPRSS2 include 417 nonsynonymous variants, 40 of which are loss-of-function variants. All loss-of-function variants were very rare (AF < 0.01%), and most of the remaining nonsynonymous variants were also of low frequency (AF < 0.1%). Of the only five nonsynonymous variants with AF > 0.1%, rs12329760 (V192M, global AF = 24.88%) is predicted to be deleterious; its AF ranges from 15.33% (Latino) to 38.38% (East Asian). Further studies are required to test whether rs12329760 exerts a functional impact on TMPRSS2 activity. Apart from this open question, population differences in TMPRSS2 activity caused by variants at critical loci or by loss-of-function variants are unlikely.

SARS-CoV-2 uses both TMPRSS2 and the endosomal cysteine proteases cathepsin B and L (CTSB and CTSL) for priming of the S protein (Hoffmann et al., 2020). The UniProt entries for human CTSB and CTSL each report three active sites. We found three variants at the active sites of CTSB (two missense variants and one synonymous variant) and one missense variant at the active sites of CTSL (Table 1 and Figure 1B).
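The text uses two frequency cutoffs: very rare for AF < 0.01% and low frequency for AF < 0.1%. Both can be expressed in a small binning function. This sketch assumes AFs are given in percent. The function name `frequency_class` and the label for AF ≥ 0.1% are hypothetical; neither is a term defined by the source databases.

```python
def frequency_class(af_percent):
    """Bin an allele frequency (in %) using the cutoffs quoted in the text."""
    if not 0.0 <= af_percent <= 100.0:
        raise ValueError(f"AF must be between 0 and 100%, got {af_percent}")
    if af_percent < 0.01:
        return "very rare"
    if af_percent < 0.1:
        return "low frequency"
    return "AF >= 0.1%"  # hypothetical label for everything above both cutoffs


# AFs (in %) quoted in the ACE2 and TMPRSS2 results.
examples = {
    "rs12329760 global": 24.88,
    "rs12329760 Latino": 15.33,
    "rs4646116 global": 0.39,
    "rs766996587 African": 0.026,
    "rs1348114695 European": 0.001,
}

for label, af in examples.items():
    print(f"{label}: {af}% -> {frequency_class(af)}")
```

Both cutoffs are strict (`<`), so a variant at exactly 0.1% falls into the top bin. This matches the "AF > 0.1%" versus "AF < 0.1%" wording only up to that boundary case.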
Although all missense variants at the active sites of CTSB/L are predicted to be deleterious, they have very low allele frequencies (AF < 0.01%). CTSB has 429 nonsynonymous variants, including 51 loss-of-function variants (all with AF < 0.01%). CTSL has 211 nonsynonymous variants, including 17 loss-of-function variants. Of note, one of these 17 variants in CTSL (rs2378757, NC_000009.11:g.90343780A>C) is a common allele (global AF of 70.32%; population AFs range from 62.66% to 98.48%). The variant changes the stop codon to a serine codon in one CTSL transcript isoform (ENST00000342020.5) but falls in an intron of the other transcript isoforms.

Next, we checked genetic variants in the TLRs that sense viral RNA and initiate innate immune responses. There were seven variants (four synonymous and three nonsynonymous) in the 16 residues of the ssRNA-interacting domain of TLR7 (Table 1 and Figure 1C). Most variants were of extremely low frequency (AF < 0.01%), except for one synonymous variant, rs769401373 (D135D), found only in the East Asian population (AF = 0.46%). TLR7 harbors 232 nonsynonymous variants, including 8 loss-of-function variants. As in TMPRSS2, the AFs of the loss-of-function variants were very low (AF < 0.01%). The UniProt entries for TLR3 and TLR8 list 10 sites from in vitro mutagenesis studies that affect their response to viral infection (sensing of dsRNA and ssRNA, respectively): six for TLR3 (Bell et al., 2006; de Bouteiller et al., 2005; Sarkar et al., 2007) and four for TLR8 (Tanji et al., 2015). At these loci, we found two missense variants in TLR3, and one missense and one synonymous variant in TLR8 (Table 1 and Figure 1C). All of these TLR variants were very rare (AF < 0.01%) across all populations. To summarize, the critical loci for host-viral interaction and for sensing of viral genomic RNA are highly conserved in all populations, with only a few very rare variants.
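The text reports that an A>C substitution (rs2378757) converts a stop codon into a serine codon in one CTSL isoform. The sketch below illustrates how a single base change can do this. It enumerates every single A>C change in the three standard stop codons, using the standard genetic code (NCBI translation table 1). It is illustrative only: it assumes the change reads A>C on the coding strand, and the text does not state which stop codon or codon position rs2378757 actually affects.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

STOP_CODONS = [c for c, aa in CODON_TABLE.items() if aa == "*"]


def single_base_changes(codon, ref, alt):
    """Yield (position, mutated codon) for every ref->alt change in the codon."""
    for pos, base in enumerate(codon):
        if base == ref:
            yield pos, codon[:pos] + alt + codon[pos + 1:]


# Collect (stop codon, 1-based codon position, mutated codon, amino acid).
a_to_c = [
    (stop, pos + 1, mutated, CODON_TABLE[mutated])
    for stop in STOP_CODONS
    for pos, mutated in single_base_changes(stop, "A", "C")
]

for stop, pos, mutated, aa in a_to_c:
    print(f"{stop} -> {mutated} (codon position {pos}): {aa}")
```

Under these assumptions, only an A>C change at the second position of TAA or TAG gives serine (TCA or TCG). The other A>C changes from stop codons give tyrosine (TAC) or cysteine (TGC).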
In particular, ACE2 and TLR7 appear to be under strong selection pressure, as reflected in their lower-than-expected numbers of loss-of-function variants in large variant databases such as gnomAD (Karczewski et al., 2020): three observed versus 31 expected for ACE2, and two observed versus 20.7 expected for TLR7. Moreover, nonsynonymous variants in these genes were mostly of very low frequency. This suggests that altered gene function due to these variants is too uncommon to account for the incidence of COVID-19 around the globe. Other factors, such as existing medical conditions and environmental risk factors, could contribute to the regulation of expression of these key genes in susceptible individuals; however, further studies are required to elucidate potential associations.

The majority of infected individuals experience no symptoms or only mild symptoms of upper respiratory tract infection; for some individuals, however, SARS-CoV-2 infection can be fatal. One contributing factor may be the viral load. Viral load may depend on differences in the affinity of the viral spike protein for ACE2 and in the efficiency of its cleavage by TMPRSS2, both of which are essential for the virus to enter and replicate inside host cells. We did not find differences in these genes between populations, whereas incidence and mortality differ significantly between racial and ethnic groups in the U.S. Given the allelic spectrum of the key genes associated with viral entry, underlying medical conditions, age, environmental factors (e.g., air pollution, smoking, and humidity), and healthcare disparities are therefore the more likely influences on morbidity and mortality from COVID-19. Nonetheless, genetic susceptibility may play a role in severe cases with respiratory failure (Ellinghaus et al., 2020).
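The constraint argument compares observed with expected loss-of-function counts. Below is a minimal sketch of the point estimate (observed / expected) for the gnomAD counts quoted above. A ratio well below 1 indicates depletion of loss-of-function variants relative to gnomAD's mutational model. gnomAD also reports a confidence interval for this ratio (LOEUF is its upper bound), which this sketch does not compute. The helper name `oe_ratio` is hypothetical.

```python
def oe_ratio(observed, expected):
    """Point estimate of the observed/expected ratio of loss-of-function variants."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Observed and expected loss-of-function counts quoted in the text (gnomAD).
LOF_COUNTS = {
    "ACE2": (3, 31.0),
    "TLR7": (2, 20.7),
}

for gene, (obs, exp) in LOF_COUNTS.items():
    print(f"{gene}: {obs} observed / {exp} expected = {oe_ratio(obs, exp):.3f}")
```

Both genes come out near 0.1, meaning roughly a tenth of the expected loss-of-function variants are observed. With counts this small, the interval around each point estimate is wide, which is why a bound such as LOEUF is usually preferred for ranking genes.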
The population-scale genotype databases and datasets used in this study are limited by relatively small sample sizes and by imbalanced, incomplete representation of human populations. Thus, there could be unreported variants in ACE2, TMPRSS2, and TLR3/7/8 that are associated with altered susceptibility to COVID-19. With additional population-scale genomic databases for diverse populations, it will be possible to identify individuals with rare genetic variants, such as rs758278442 in the interacting domain of ACE2. It will also be possible to identify individuals with a genetic predisposition to the cytokine storm that causes acute progression of illness in young people. In parallel, a systematic mutagenesis analysis of the ACE2 residues interacting with the RBM is needed to understand differences in host-viral interaction across populations (Lan et al., 2020).

[Figure 1: panel A, ACE2; panel B, CTSB/L; panel C, TLR3/7/8]

Table 1 (recoverable fragments). Data sources: [1] 1KGP, [2] SGDP, [3] GTEx, [4] KRGDB, [5] TogoVar. AF columns: [6] Global, African, European, East Asian, South Asian. Partial row: ACE2, S19 [7, 10], X:15618978-15618980 (NC_000023.10).
Table 1 notes:
Allele frequencies for European are from the Non-Finnish European population.
GTEx: Genotype-Tissue Expression project, v8 whole genomes.
TogoVar: NBDC's integrated database of Japanese genomic variation.
ACE2: based on mutagenesis studies from UniProt protein information for Q9BYF1 (ACE2_HUMAN).
TLR7: ligand-binding sites for small ligands and ssRNA from Zhang et al. (2016).
CTSB: based on active sites from UniProt protein information for P07858 (CATB_HUMAN).
CTSL: based on active sites from UniProt protein information for P07711 (CATL1_HUMAN).
TLR3: based on mutagenesis studies from UniProt protein information for O15455 (TLR3_HUMAN).
TLR8: based on mutagenesis studies from UniProt protein information for Q9NR97 (TLR8_HUMAN).

References
Catalytic cleavage of the androgen-regulated TMPRSS2 protease results in its secretion by prostate and prostate cancer epithelia.
The dsRNA binding site of human Toll-like receptor 3.
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study.
The International Genome Sample Resource (IGSR): a worldwide collection of genome variation incorporating the 1000 Genomes Project data.
Biospecimen Collection Source Site, N., Biospecimen Collection Source Site.
Recognition of double-stranded RNA by human Toll-like receptor 3 and downstream receptor signaling requires multimerization and an acidic pH.
The spike protein of SARS-CoV - a target for vaccine and therapeutic development.
SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor.
Structural variations in human ACE2 may influence its binding with SARS-CoV-2 spike protein.
Innate immunity to influenza virus infection.
The potential danger of suboptimal antibody responses in COVID-19.
KRGDB: the large-scale variant database of 1722 Koreans based on whole genome sequencing.
The mutational constraint spectrum quantified from variation in 141,456 humans.
Evidence mounts on the disproportionate effect of COVID-19 on ethnic minorities.
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor.
Structure of SARS coronavirus spike receptor-binding domain complexed with receptor.
Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2.
The Simons Genome Diversity Project: 300 genomes from 142 diverse populations.
Two tyrosine residues of Toll-like receptor 3 trigger different steps of NF-kappa B activation.
Structural basis of receptor recognition by SARS-CoV-2.
Toll-like receptor 8 senses degradation products of single-stranded RNA.
Variation in COVID-19 hospitalizations and deaths across New York City boroughs.
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation.
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2.
COVID-19 and African Americans.
Structural analysis reveals that Toll-like receptor 7 is a dual receptor for guanosine and single-stranded RNA.
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study.

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.