key: cord-1028690-htb2hdls authors: Guruprasad, Lalitha title: Human coronavirus spike protein-host receptor recognition date: 2020-10-31 journal: Prog Biophys Mol Biol DOI: 10.1016/j.pbiomolbio.2020.10.006 sha: d2babe45c614505171da8e28939488eea8637a3c doc_id: 1028690 cord_uid: htb2hdls

A variety of coronaviruses (CoVs) have infected humans and caused mild to severe respiratory diseases that can result in mortality. The human CoVs (HCoVs) belong to the genera of α- and β-CoVs that originate in rodents and bats and are transmitted to humans via zoonotic contacts. The binding of viral spike proteins to the host cell receptors is essential for mediating fusion of viral and host cell membranes to cause infection. In this review, we discuss structural features of HCoV spike proteins and recognition of host proteins and carbohydrate receptors.

During the past two decades, life-threatening diseases caused by human coronaviruses (HCoVs), namely severe acute respiratory syndrome coronavirus (SARS-CoV) in China (2002) (Drosten et al., 2003, Ksiazek et al., 2003, Peiris et al., 2003) and Middle East respiratory syndrome coronavirus (MERS-CoV) in Saudi Arabia (2012) (Zaki et al., 2012), have resulted in major epidemics in several countries, leading to a number of deaths. In addition, HCoV-229E (named after the specimen code 229E), HCoV-OC43 (organ culture 43), HCoV-NL63 (Netherlands 63) and HCoV-HKU1 (Hong Kong University 1), which are responsible for common colds and mild to acute endemic respiratory illnesses, have been reported among human populations from time to time. HCoV-229E (McIntosh et al., 1967) and HCoV-OC43 (Kaye et al., 1971, Hendley et al., 1972), first identified during the 1960s as human respiratory pathogens, were isolated in Salisbury, United Kingdom. HCoV-NL63 was isolated during 2002 (van Der Hoek et al., 2004, Fouchier et al., 2004) and HCoV-HKU1 was isolated during early 2005 (Woo et al., 2005) from patients with pneumonia.
The world is currently experiencing a deadly pandemic of coronavirus disease 2019 (COVID-19), caused by the novel SARS coronavirus-2 (SARS-CoV-2). COVID-19 was first reported from Wuhan City, Hubei Province, China, during December 2019 and in less than 7 months spread throughout the world. SARS-CoV-2 was first reported in individuals who had been in contact with wild animals at the live animal and seafood market in Jianghan District, Wuhan. SARS-CoV-2 is observed to spread more rapidly than other HCoVs. Transmission of SARS-CoV-2 between humans takes place via contact with respiratory droplets released during coughing and sneezing, or via surface contacts. The World Health Organization (WHO) has acknowledged emerging evidence of airborne transmission of SARS-CoV-2. The prominent symptoms of this viral infection are fever, dry cough, tiredness, loss of smell and taste, and severe respiratory illness. When compounded with comorbid conditions, the infection may lead to blood clots in the lungs, respiratory failure, severe hypoxaemia, renal failure, septic shock, multiple organ failure or cardiogenic shock, resulting in death (Vincent and Taccone, 2020). Despite the re-emergence of HCoV infections in altered forms resulting in mortality, there have been no drugs or vaccines to treat them. Some drugs are being evaluated for repurposing against COVID-19 (Thorlund et al., 2020, Shaffer, 2020) and a list of these drugs is given in Table 1. Convalescent plasma therapy is another intervention that is being used to treat COVID-19 patients. Design and development of vaccines for combating COVID-19 are underway in several countries. Some representative reference genomes, the modes of viral transmission, and the carbohydrates and proteins that act as receptors of HCoVs required for human infection are shown in Table 2.

HCoVs are enveloped, spherical virions 100-120 nm in diameter.
These are Baltimore class IV (Baltimore, 1971) positive-sense single-stranded RNA viruses with genomes of 27-31 kb that are translated into structural and non-structural proteins (NSPs). From the 5' proximal direction, about two-thirds of the HCoV genome codes for two large open reading frames (ORFs), ORF 1a and ORF 1b, which utilize one overlapping base and code for the NSPs. The remaining one-third of the genome encodes the structural proteins (spike, envelope, membrane and nucleocapsid) and genome-dependent ORFs. In the HCoV-OC43 and HCoV-HKU1 genomes, an additional protein, hemagglutinin esterase (HE), is also present. The name coronavirus stems from the characteristic club-shaped, corona-like projections of the spike proteins seen in electron microscopy structures of the virus surface (Almeida et al., 1968, Lai and Cavanagh, 1997). The spike protein is required for attachment to the host cell-specific receptors and for subsequent catalysis of virus-host cell membrane fusion. The envelope protein is a small integral membrane protein comprising 75 amino acid residues that plays a role in the virus life cycle. The membrane protein is a type III glycoprotein consisting of a short amino-terminal ectodomain, a triple-spanning transmembrane domain, and a long carboxy-terminal endo-domain. The nucleocapsid protein is a highly basic phosphoprotein that modulates viral RNA synthesis. The short projections of HE are present on the surface of HCoV-OC43 and HCoV-HKU1 virions. The proteins encoded by the genomes of HCoVs are shown in Table 3.

CoVs enter human cells through the fusion of viral and host cellular membranes, mediated by the interaction between the viral spike glycoprotein and a host receptor protein and/or carbohydrate (Belouzard et al., 2012). The spike proteins, also called fusion proteins, are heavily glycosylated, single-pass integral membrane proteins that assemble as ∼500 kDa homo-trimers projecting ∼18-20 nm from the viral envelope.
The multiple sequence alignment of representative HCoV spike proteins is shown in Supplementary Fig. 1. As per the phylogenetic tree in Fig. 1A, the α-CoV spike proteins (HCoV-NL63 and HCoV-229E) are grouped into one clade, the β-CoVs (SARS-CoV and SARS-CoV-2) and (HCoV-OC43 and HCoV-HKU1) into distinct clades, and the MERS-CoV spike protein forms yet another clade. The spike protein comprises the S1 subunit towards the N-terminus, followed by the membrane-proximal S2 subunit that consists of the hydrophobic fusion peptide and heptad repeat regions. Towards the C-terminus, there is a single membrane-spanning α-helix and an intracellular segment rich in cysteines. The S1 subunit is divided into four domains: S1 A, S1 B, S1 C and S1 D. The larger S1 A and S1 B domains are termed the N-terminal domain (NTD) and C-terminal domain (CTD), respectively. The S1 A domain is characterized by a galectin-like fold. The S1 B domain is characterized either by a five-stranded anti-parallel β-sheet, as in β-HCoVs, or by a six-stranded β-sandwich, as in α-HCoVs. The S1 C and S1 D domains are associated with a five-stranded β-sheet structure. Among HCoVs, the S1 subunit varies in length and amino acid sequence, while the S2 subunits share relatively high sequence homology (Supplementary Fig. 1). The 'linker' region connecting the S1 A and S1 B domains comprises ~32 amino acid residues and contributes one β-strand each to the S1 C and S1 D domains. The S2 subunit comprises three long α-helices, multiple α-helical segments and extended twisted β-sheets that span up to the viral membrane-proximal end, folding into a β-sheet domain. The amino acid sequences in the region connecting the S1 and S2 subunits are variable. The sequence variation among HCoVs in this region gives the spike proteins the flexibility to accommodate cleavage sites recognised by different proteases, which is necessary for cellular entry of the virion into host cells.
For instance, it has been reported that SARS-CoV-2 has gained the 'PRRA' sequence motif, which provides a furin cleavage site (Ou et al., 2020). The loops connecting the domains within the S1 subunit and between the S1 and S2 subunits provide the flexibility for structural rearrangements of the protein in its pre-fusion and post-fusion states, for binding to carbohydrates and to host cell receptors, and for presentation of the cleavage sites at the S1/S2 and S2' sites during infection. The linear sequence and a cartoon representation of the three-dimensional structure of the human SARS-CoV-2 spike glycoprotein (PDB code: 6VSB, A-chain), obtained from the Protein Data Bank (PDB) (Berman et al., 2020), are shown in Fig. 1B and Fig. 1C, respectively.

The HCoV spike proteins undergo structural alterations during the various stages of host receptor recognition and viral-host cell membrane fusion. The different conformational states of the SARS-CoV spike glycoprotein during virus entry have been reported (Song et al., 2018). Initially, the spike glycoprotein adopts the pre-fusion trimer structure with the S1 B domain in the "Down" (or closed) conformation. Upon binding to the host cell receptor during infection, the spike protein undergoes a large conformational transition from the pre-fusion (closed) state to a post-fusion (open) state such that the S1 B domain is converted to the "Up" conformation. This exposes the hydrophobic fusion peptide, which facilitates fusion of the viral and cellular membranes by bringing them closer together. The conformational changes in the SARS-CoV spike protein reported by Song et al. (2018) are shown in Fig. 1D. After virion uptake by target host cells, host proteolytic enzymes cleave the spike protein between the S1 and S2 subunits, releasing the S1 subunit and triggering the pre-fusion to post-fusion conformational transition (Burkard et al., 2014, Millet et al., 2014, Millet et al., 2015).
A second cleavage site, S2′, located upstream of the fusion peptide in the S2 subunit, becomes available during the onset of membrane fusion. This second cleavage step occurs for all CoVs and is believed to activate the protein for membrane fusion, which takes place via irreversible conformational changes (Belouzard et al., 2009, Walls et al., 2017). The cleavage sites present near S1/S2 and S2' participate in viral entry and modulate host range and cell tropism. The structure of the SARS-CoV-2 spike protein was analysed in situ by combining cryo-electron tomography, sub-tomogram averaging and molecular dynamics simulations (Turoňová et al., 2020). Molecular dynamics simulations of the spike proteins reveal the conformational alterations in the protein required for its mechanistic function. The SARS-CoV-2 spike protein contains three hinge regions that give the S1 A and S1 B domains orientational freedom, allowing them to scan the host cell surface for potential receptors (Turoňová et al., 2020).

The attachment of HCoVs to host cells is mediated by interactions between the viral spike protein and host cell surface receptors. The host receptors are carbohydrates that function as "Attachment receptors" and membrane proteins that function as "Entry receptors", thus mediating protein-carbohydrate and protein-protein interactions, respectively. Either the S1 A domain, the S1 B domain, or both are referred to as the receptor binding domain (RBD), which binds a carbohydrate for host cellular attachment or a cell surface receptor protein for host cellular entry. For instance, the S1 A domains of the spike proteins in HCoV-OC43 and HCoV-HKU1 bind to the 9-O-acetyl sialic acid host receptor (Hulswit et al., 2019), and the MERS-CoV spike protein S1 A domain selectively binds to sialic acid (Li et al., 2017, Park et al., 2019). These studies demonstrate that the spike proteins recognise cell-surface sialic acid glycoconjugates, which can therefore serve as attachment factors on host cells.
Sialic acid is a negatively charged, ubiquitous monosaccharide that is terminally linked to oligosaccharides and forms glycoconjugates of proteins and lipids (Vlasak et al., 1988, Huang et al., 2015). The sialic acid derived glycoconjugates, glycoproteins and gangliosides decorate the host cell surface and are therefore recognised by the HCoV spike proteins to bring about the initial attachment to host cells. The host cell selectivity of the virus can be hampered since such glycoconjugates are differentially expressed in several cells and tissues. Some viruses, such as HCoV-OC43 and HCoV-HKU1, express the HE protein, which functions as a sialate-O-acetylesterase that can reverse sialic acid binding of spike proteins to non-targeted host cells (De Groot et al., 2006). Likewise, binding of the HCoV-NL63 spike protein to heparan sulfate is required for viral attachment and infection of target cells (Milewska et al., 2014). These studies indicate that the different HCoV spike protein and host-receptor carbohydrate-binding partners that mediate host cell recognition remain to be explored.

The spike protein S1 B domain is reported to bind various host receptor transmembrane proteins: SARS-CoV (Li et al., 2005), SARS-CoV-2 and HCoV-NL63 (Hoffman et al., 2005) bind the human angiotensin-converting enzyme 2 (ACE-2) receptor, MERS-CoV binds human dipeptidyl peptidase 4 (DPP4), and HCoV-229E binds human aminopeptidase N (APN) (Delmas et al., 1992, Yeager et al., 1992). SARS-CoV also binds to human CD209L, a C-type lectin expressed in human lung type II alveolar cells and endothelial cells (Jeffers et al., 2004). The purified SARS-CoV spike protein was shown to bind pulmonary surfactant protein D, a collectin found in the lung alveoli (Leth-Larsen et al., 2007). The human host cell receptor proteins for HCoV-OC43 and HCoV-HKU1 have not been identified so far.
The structure, function, and evolution of CoV spike proteins from all four genera and the structural basis for fusion of viral and host membranes have been reviewed (Li, 2016). The manifestation of three major HCoV epidemics (SARS-CoV, MERS-CoV and SARS-CoV-2) in less than two decades, along with instances of other less pathogenic HCoV infections, is good reason to study the molecular mechanisms underlying HCoV-host receptor interactions in order to understand viral entry into host cells. The three-dimensional structures of HCoV spike proteins and of their S1 A and S1 B domains in complex with host receptors provide the atomistic details of the virus-host inter-molecular interactions. The structures aid the development of therapeutics and vaccines against these infectious diseases. This manuscript reviews HCoV spike proteins, their structure and function, and their binding to receptors based on the three-dimensional structures.

The human SARS-CoV infection in Guangdong province of southern China in 2002 was transmitted from bats and civets (Marra et al., 2003, Rota et al., 2003, Ksiazek et al., 2003). The disease outbreak then spread over Asia, Europe and North America. A total of 8,096 cases and 774 deaths were recorded. The civet SARS-CoV (Li et al., 2006) and SARS-like CoVs from some bats and civets were predicted to result in human infections (Menachery et al., 2015). Based on comparative analyses of SARS-CoV genomes, it has been proposed that they have evolved from civet SARS-CoV and that their spike proteins are highly homologous (Shi and Hu, 2008). SARS-CoV uses its spike protein for the initial recognition of ACE-2 as the host receptor for entry into human epithelial cells to cause infection. ACE-2 is an angiotensin-converting enzyme related carboxypeptidase, a type I integral membrane protein that comprises extracellular (18-740 amino acids), transmembrane (741-761) and cytoplasmic (762-840) regions.
The extracellular region is composed of two domains, zinc metallopeptidase domain (19-615) and C-terminal collectrin-like domain (616-740). The three-dimensional structures of human SARS-CoV spike protein (PDB code; 6ACG: Song et al., 2018, 6CRW, 5WRG) and its S1 B domain complexed with human ACE-2 are known (2AJF, 3SCI, 3SCJ). The S1 B domain of SARS-CoV comprises a five-stranded anti-parallel β-sheet and the structure is stabilised by three disulfide bridges. Between strands β4 and β5, a ~65 amino acid residues region folds into an extended loop comprising short stretches of two α-helices and two anti-parallel β-strands in a β-sheet. The SARS-CoV spike protein binds the human ACE-2 receptor via this extended loop (3SCI) (Fig. 2A) . The interaction of virus with host cell receptor is mediated via the receptor binding motifs (RBMs) in the extended loop of spike protein S1 B domain and the virus binding motifs (VBMs) on the ACE-2 receptor comprising helices; H1, H2, H15, H17 and β-hairpin 'B' (labelled according to the PDBsum for PDB code: 3SCI). The amino acid residues that stabilise the virus-host recognition via protein-protein interactions are shown in Table 4 . The spike protein binding site and the carboxypeptidase catalytic site are two different sites in the ACE-2 receptor. During mid-December 2019, SARS-CoV-2 infection was reported in Wuhan, China. The nucleotide sequences of SARS-CoV and SARS-CoV-2 share 79.6% sequence identity at the genomic level . The SARS-CoV-2 uses ACE-2 as receptor for cellular entry . Recombinant overexpressed SARS-CoV-2 spike proteins form nonaggregated homotrimers that specifically bind only to human ACE-2 (Herrera et al., 2020) . The complete genome of bat SARS-CoV isolated in 2013 (RaTG13) was sequenced during 2020 and shown to be similar to the novel SARS-CoV-2. This bat SARS-CoV has been proposed to have recently crossed species and caused infection in humans . 
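Identity figures such as the 79.6% quoted above are derived from aligned sequences. The following is a minimal sketch of one common convention (matches divided by the number of ungapped aligned columns); the function name, the gap character and the toy sequences are assumptions made for illustration and are not taken from the reviewed studies, which may have used a different denominator or alignment tool.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real viral sequence data.
toy_a = "ATGCATGCAT"
toy_b = "ATGCTTGC-T"
print(round(percent_identity(toy_a, toy_b), 1))
```

Because the result depends on how gaps are counted, values computed this way are only comparable when the same convention is applied to every pair.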
The pangolin SARS-CoV genomes isolated during 2014-2018 and sequenced in 2020 share 85.5% to 92.4% sequence similarity to SARS-CoV-2 (Han, 2020, Lam et al., 2020). The SARS-CoV-2 spike protein shares ~97.5% and ~92% sequence identity with spike proteins of Herrera et al., 2020, 6X29: Henderson et al., 2020 and crystal structures of the S1 B domain complexed with ACE-2 (6M0J: Lan et al., 2020, 6LZG: Wang et al., 2020) are available in the PDB. The S1 B domain of SARS-CoV-2, comprising a five-stranded anti-parallel β-sheet (6LZG: Wang et al., 2020), interacts with the human ACE-2 receptor as shown in Fig. 2B. The three-dimensional structures of the human SARS-CoV (3SCI) and SARS-CoV-2 (6LZG) S1 B domains complexed with ACE-2 are highly superimposable. The protein-protein interaction sites between SARS-CoV-2 and the host receptor are similar to those of SARS-CoV, although there are differences in their amino acid sequences. The residues involved in the protein-protein interactions between the SARS-CoV-2 spike protein and human ACE-2 are shown in Table 4. The multiple sequence alignment of the bat, civet, pangolin and human SARS-CoV spike proteins and the dog, cat, mink, lion, tiger and human SARS-CoV-2 spike proteins is shown in Fig. 2D to depict the RBMs discussed above. A motif "P 681 RRA 684" (amino acid numbering according to human SARS-CoV-2, NCBI code: QHD43416) gained in the human SARS-CoV-2 spike protein is referred to as the furin cleavage site (Ou et al., 2020). This gain of the 'PRRA' sequence motif in human SARS-CoV-2 distinguishes it from its closest bat homologue, RaTG13 SARS-CoV. To facilitate viral entry into the host cell, SARS-CoV-2 is preactivated by furin and its spike protein is cleaved during viral packaging, thereby reducing its dependence on host cell proteases for viral entry.
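Checking whether a spike sequence carries the 'PRRA' motif amounts to a substring search with residue numbering. The sketch below is illustrative only: the helper name, the `first_residue_number` parameter and the short fragments are assumptions, with the numbering offset chosen so that the motif lands at positions 681-684 as quoted in the text. The fragments were not retrieved from QHD43416 or any other database record.

```python
from typing import List, Tuple


def find_motif(sequence: str, motif: str,
               first_residue_number: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) residue numbers of every occurrence of `motif`.

    `first_residue_number` is the residue number of sequence[0], so that
    fragments can be reported in full-length protein numbering.
    """
    hits = []
    start = sequence.find(motif)
    while start != -1:
        begin = first_residue_number + start
        hits.append((begin, begin + len(motif) - 1))
        start = sequence.find(motif, start + 1)  # allow overlapping hits
    return hits


# Hypothetical fragments for illustration; numbering offset is an assumption.
fragment_with_motif = "TNSPRRARSV"
fragment_without_motif = "TNSRSV"

print(find_motif(fragment_with_motif, "PRRA", first_residue_number=678))
print(find_motif(fragment_without_motif, "PRRA", first_residue_number=678))
```

An exact-match search of this kind only reports presence or absence of the four-residue motif; it says nothing about whether the site is actually cleaved.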
SARS-CoV-2 is also activated by transmembrane protease serine 2 and cathepsin L, and both enzymes show cumulative effects with furin on activating SARS-CoV-2 entry. The spike proteins from Canis lupus familiaris (NCBI code: QIT08292.1), Felis catus (QLG96797.1), Mustela lutreola (QJS39496.1), Neovison vison (QNJ45106.1), Panthera leo (QLC48407.1) and Panthera tigris (QLC48443.1) also comprise the 'PRRA' sequence motif, indicating that a similar SARS-CoV-2 entry mechanism operates in these mammals.

MERS-CoV was first identified in humans in the Middle East during 2012. This virus originated in bats and was transmitted from dromedary camels to humans in Saudi Arabia with a fatality rate of ~35% (Zaki et al., 2012, Bermingham et al., 2012, Azhar et al., 2014a, Chan et al., 2015, Sabir et al., 2016, Azhar et al., 2014b, Alagaili et al., 2014, Hemida et al., 2013). Later, it spread via human-to-human transmission to other Middle East countries (Jordan, United Arab Emirates and Qatar), to France, Germany, the United Kingdom and Italy in Europe, and to Tunisia in North Africa. A MERS-CoV outbreak was later also reported in South Korea during 2015 (Ki, 2015). MERS-CoV virions cause agglutination of human erythrocytes. Hemagglutination by MERS-CoV is mediated through simultaneous low-affinity binding of multiple spike proteins of the virus to multiple receptors on the surface of human erythrocytes. The MERS-CoV hemagglutinating activity is sialic acid dependent and the binding site of sialic acid is located in the S1 A domain (Li et al., 2017). The pathogenicity of MERS-CoV is caused by the specific binding of its S1 B domain to the human DPP4 receptor. DPP4 is a transmembrane serine protease that comprises cytoplasmic (1-6 amino acids), transmembrane O-sialic acid binds to S1 A domain and is exposed to solvent as shown in Fig. 3A, residues on the strands β1, β4 and β5, adjacent loop regions and helix H1 are involved in the recognition (6Q04).
The MERS-CoV spike protein S1 B domain binds to the β-propeller of DPP4 (4L72, Wang et al., 2013) and comprises a five-stranded anti-parallel β-sheet stabilised by three disulfide bridges. A long insertion region between the β4 and β5 strands, comprising ~80 amino acid residues, forms a four-stranded anti-parallel β-sheet. The insertion region and the loop connecting the β3 and β4 strands are involved in interacting with the human receptor DPP4 (4L72, Wang et al., 2013). Each blade of the β-propeller in DPP4 comprises four anti-parallel β-strands, and the MERS-CoV S1 B domain recognizes amino acid residues on the 4th and 5th blades of the β-propeller. The region of interaction between the MERS-CoV spike protein S1 B domain and DPP4 is shown in Fig. 3B. The residues involved in the protein-carbohydrate and protein-protein interactions are listed in Table 4.

HCoV-OC43 was isolated for the first time in 1967 from volunteers at the Common Cold Unit in Salisbury, United Kingdom. Modified sialic acid, such as 9-O-acetyl-N-acetylneuraminic acid, is a major receptor for HCoV-OC43 (Vlasak et al., 1988). Also, HCoV-OC43 uses 9-O-acetyl-sialic acid as a receptor. The HE protein reverses attachment of sialic acid by activating sialate-O-acetyl-esterase activity. Site-directed mutagenesis, binding experiments, and the three-dimensional structures showed that residues involved in sialic acid binding are essential for HCoV-OC43 spike protein mediated entry into host cells. Further, the HCoV-OC43 spike protein does not bind free sialic acid and/or acidic pH conditions do not induce conformational changes in the spike protein, suggesting that multivalent interactions with sialoglycans and/or attachment to a protein receptor are essential to promote membrane fusion.
The cryo-electron microscopy structures of the HCoV-OC43 spike protein trimer in apo form and in complex with 9-O-acetyl-sialoglycan, at 2.9 Å and 2.8 Å resolution, respectively, are available in the PDB (6OHW, 6NZK, Tortorici et al., 2019). HCoV-OC43 uses its S1 A domain to bind sialic acid, and the ligand interacts with a groove at the periphery of the S1 A domain. The 9-O-acetylated sialic acid binds to a large exposed surface in the S1 A domain involving the β1, β4 and β5 strands, adjacent loop regions and helix H1, as shown in Fig. 4A (6NZK), similar to MERS-CoV. These protein-carbohydrate interactions are mediated by several non-bonding interactions listed in Table 4. The S1 B domain of HCoV-OC43 shown in Fig. 4B (6NZK) forms a five-stranded anti-parallel β-sheet. The long insertion between the 4th and 5th β-strands, comprising ~130 amino acid residues, folds into five β-strands and five short α-helices connected by loops. Seven of the nine disulfide bridges in the S1 B domain are associated with this 130 amino acid insertion. The human receptor for the spike protein of HCoV-OC43 is not yet known.

HCoV-HKU1 was first reported in Hong Kong in 2005 (Woo et al., 2005) and is known to have subsequently spread all over the world. HCoV-HKU1 causes mild upper respiratory tract disease among young children but can sometimes lead to severe respiratory diseases in young children and in immunocompromised elderly patients, and it accounts for ~3% of acute respiratory infections. The attachment receptor for HCoV-HKU1 was identified as O-acetylated sialic acid, and the binding of this carbohydrate to the S1 A domain is required but not sufficient to cause the infection (Huang et al., 2015). The HCoV-HKU1 HE protein demonstrates sialate-O-acetylesterase receptor-destroying enzyme activity specific to the O-acetylated sialic acids recognized by the spike protein (Huang et al., 2015).
Although a protein receptor has not been identified for HKU1, antibodies against the S1 B domain but not those against the S1 A domain blocked HKU1 infection of cells (Qian, et al., 2015) . These data suggest that the S1 B domain is the primary HCoV-HKU1 receptor-binding site, whereas, the S1 A domain mediates initial attachment via glycan binding (Kirchdoerfer et al., 2016) . Most HCoV-HKU1 spike monoclonal antibodies recognized epitopes in the region between amino acids 535 and 673 (i.e., S1 C domain and some region from S1 B domain), indicating that this region is immunodominant (Qian et al., 2015) . The electron microscopy structure of HCoV-HKU1 spike protein in pre-fusion conformation (5I08, Kirchdoerfer et al., 2016) and the crystal structure of HCoV-HKU1 spike glycoprotein S1 B domain required for host receptor binding and S1 C domain at 1.9 Å resolution are available (5KWB, Ou et al., 2017) . The HCoV-HKU1 S1 B (310-612 amino acid residues) comprises a ~155 amino acids insertion located between the β4 and β5 strands stabilised by several β-strands, small helices and seven disulfide bridges as shown in Fig. 4C . This region with an extended surface is capable of interactions with an unknown human receptor (Ou et al., 2017) . HCoV-NL63 of the α-HCoV genera was first detected in 2002-2003 soon after SARS-CoV epidemic. HCoV-NL63 is a major cause of bronchiolitis and pneumonia in newborns worldwide and can cause severe lower respiratory tract infections among young children and immune-compromised adults (Chiu et al., 2005) . HCoV-NL63 infections have been reported in countries across Europe, Asia and North America. Gene duplication events contribute to an additional N-terminal domain within the α-CoVs of HCoV-NL63 spike protein as observed in the cryo-electron microscopy structure (5SZS, Walls et al., 2016) . 
This region, referred to as the "0" domain, was shown to adopt a galectin-like β-sandwich fold similar to the S1 A domain of HCoV-NL63, with an additional three-stranded β-sheet. The host cell heparan sulfate proteoglycans participate in HCoV-NL63 anchoring and infection (Milewska et al., 2014). The binding of heparan sulfate to the HCoV-NL63 spike protein, studied using surface plasmon resonance (Walls et al., 2016), was hypothesized to be mediated either via the "0" domain or the S1 A domain, both of which exhibit several positively charged patches on their surface. The crystal structure of the HCoV-NL63 S1 B domain in complex with human ACE-2 (3KBH, Wu et al., 2009) reveals a β-sandwich fold comprising two layers of three-stranded β-sheets stacked against each other through extensive hydrophobic interactions. Three loops connecting the strands (β1-β2, β3-β4 and β5-β6) form the RBMs and are responsible for recognising human ACE-2. Among these, residues in the loop connecting the β1 and β2 strands, which is stabilized by a disulfide bridge, make extensive interactions with the receptor as shown in Fig. 4D. The HCoV-NL63 spike protein RBMs recognize residues on the helices H1, H16 and H17, residues in the loops comprising turns connecting the helices H15-H16, H17-H18 and H18-H19, and the β-hairpin 'C' on the ACE-2 receptor, which are termed VBMs. The protein-protein interactions are mediated by several non-bonding interactions listed in Table 4.

HCoV-229E, an α-HCoV isolated in 1966, causes severe lower respiratory tract infections among young children. This virus is proposed to have originated in African hipposiderid bats and to have been transferred to camelids and alpacas as intermediate hosts (Corman et al., 2015). The phylogenetic tree of the spike protein sequences from bat, camel and human HCoV-229E. The bat CoV-229E comprises the N-terminal domain "0" as in HCoV-NL63, but this domain is absent in the camel and human CoV-229E, suggesting evolutionary changes across species in CoV-229E.
HCoV-229E uses hAPN as the receptor for cellular entry (Yeager et al., 1992). The hAPN comprises a cytoplasmic tail (1-8 amino acids), transmembrane region (9-32) and extracellular region ( Table 4 .

Seven HCoVs of the α- and β-genera identified during the last six decades are known to cause human infections that can lead to mortality. The viral spike protein uses a double receptor mechanism, i.e., carbohydrate binding for attachment and protein binding for entry into host cells, to cause the infection. The dynamic conformational states of the spike protein structure facilitate its binding to host cell receptors and viral entry. The spike protein S1 A domain of MERS-CoV, HCoV-HKU1 and HCoV-OC43 has evolved to bind carbohydrates such as sialic acid, and that of HCoV-NL63 to bind heparan sulfate, in order to mediate virus-host interactions. The spike protein S1 B domain displays significant sequence variability across the HCoVs and is therefore able to specifically recognise different host receptors. The S1 B domain comprises a six-stranded β-sandwich fold in the α-HCoVs and a five-stranded anti-parallel β-sheet in the β-HCoVs. According to the phylogenetic tree, the α-HCoVs HCoV-229E and HCoV-NL63 are closely related in sequence and have similar S1 B domain structures. However, these have evolved to recognize different receptors: HCoV-229E binds to the human APN receptor, whereas HCoV-NL63 binds to the human ACE-2 receptor. Among the β-HCoVs, SARS-CoV and SARS-CoV-2 bind the human ACE-2 receptor, whereas MERS-CoV binds the DPP4 receptor. The protein receptors for HCoV-OC43 and HCoV-HKU1 that facilitate entry into the host are currently not known. In β-HCoVs, the extended loop connecting the β4 and β5 strands in the spike protein S1 B domain varies in length and amino acid sequence and has evolved to recognize different receptors. The extended disordered loops are stabilised by disulfide bridges.
The extended loop connecting the β4 and β5 strands in the S1 B domain of the human SARS-CoV-2 spike protein comprises three loops and a tethered disulfide bridge that are important structural determinants for ACE-2 recognition. The sequence regions that specifically interact with human receptors serve as potential candidate epitopes for the design of antibodies. The sites of protein-protein interactions between the HCoV spike proteins and host receptors serve as potential sites for the design of small molecule inhibitors. The human SARS-CoV-2, which has evolved from bat RaTG13 SARS-CoV, has gained the 'PRRA' sequence motif that is absent in human and bat SARS-CoV. The gain of the 'PRRA' sequence motif in SARS-CoV-2, which is involved in furin cleavage, is known to enhance the efficiency of virus entry into target cells. In perspective, the availability of complete HCoV genome sequences from different host sources at different timelines of virus isolation will be helpful to trace their evolutionary mutations and transmission pathways. The identification of host entry receptors for HCoV-OC43 and HCoV-HKU1 and attachment receptors for many of the HCoVs discussed in this review will provide further insights into the mechanism of these virus-host interactions.

LGP carried out the work and wrote the manuscript.

The author declares that there is no potential conflict of interest.

1C . Human SARS-CoV-2 spike protein (PDB code: 6VSB A-chain).
S1 A domain (27-300, green), S1 A -S1 B linker (301-335, pink), S1 B domain (336-516, purple), 517-533 (linker, golden rod), S1 C domain (534-589, orange), 590-593 (linker, golden rod), S1 D domain (594-674, cyan), protease cleavage site (675-688, blue), S1-S2 subunits linker (689-710, orange), central β-strand (711-737, magenta), downward helix (738-782, red), S2' cleavage site (783-815, sea green), fusion peptide (816-833, navy blue), connecting region (834-910, yellow), heptad repeats (912-983, chartreuse), central helix (984-1034, dodger blue), β-hairpin (1035-1069, brown), connecting β-sheet domain (1070-1134, spring green).

6ACG, 6ACJ, 6ACK) . The pre-fusion to post-fusion conformations are indicated. 6ACC: brown (S1 B downward); 6ACG: conformation 1, spike (blue), ACE-2 (dodger blue); 6ACJ: conformation 2, spike (red), ACE-2 (orange); 6ACK: conformation 3, spike (green), ACE-2 (chartreuse).

2B . Cartoon representation of the SARS-CoV-2 spike protein S1 B domain (green) interacting with human ACE-2 (purple) (PDB code: 6LZG). Strands β4 (orange) and β5 (cyan), the extended loop (pink) and the side chains of residues within 4.5 Å of ACE-2 (K417, G446, Y449, Y453, L455, F456, Y473, A475, G476, E484, F486, N487, Y489, F490, Q493, G496, Q498, T500, N501, G502, Y505) are indicated. The disulfide bridge C480-C488 is shown in yellow. The residues that form hydrogen bonds with ACE-2 are shown in bold and italics.

Human SARS-CoV-2 (light green), dog SARS-CoV-2 (black), cat SARS-CoV-2 (aqua green), mink SARS-CoV-2 (magenta), tiger SARS-CoV-2 (orange), lion SARS-CoV-2 (yellow), pangolin CoV (cyan), human CoV (dark green), bat CoV (red), civet CoV (violet).
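The residue ranges given above for the SARS-CoV-2 spike protein (PDB code 6VSB, chain A) and the ACE-2 contact residues listed for PDB code 6LZG can be cross-checked with a simple range lookup. The following is a minimal sketch in Python, not taken from the original work: the names `SPIKE_REGIONS`, `ACE2_CONTACTS`, `region_of`, `residue_number` and `unassigned_gaps` are hypothetical, and the numbers are copied from the captions as extracted.

```python
import re

# (start, end, region name) copied from the 6VSB chain A caption above.
SPIKE_REGIONS = [
    (27, 300, "S1 A domain"),
    (301, 335, "S1 A-S1 B linker"),
    (336, 516, "S1 B domain"),
    (517, 533, "linker"),
    (534, 589, "S1 C domain"),
    (590, 593, "linker"),
    (594, 674, "S1 D domain"),
    (675, 688, "protease cleavage site"),
    (689, 710, "S1-S2 subunits linker"),
    (711, 737, "central beta-strand"),
    (738, 782, "downward helix"),
    (783, 815, "S2' cleavage site"),
    (816, 833, "fusion peptide"),
    (834, 910, "connecting region"),
    (912, 983, "heptad repeats"),
    (984, 1034, "central helix"),
    (1035, 1069, "beta-hairpin"),
    (1070, 1134, "connecting beta-sheet domain"),
]

# Spike residues within 4.5 Angstrom of ACE-2, copied from the 6LZG caption above.
ACE2_CONTACTS = (
    "K417 G446 Y449 Y453 L455 F456 Y473 A475 G476 E484 F486 "
    "N487 Y489 F490 Q493 G496 Q498 T500 N501 G502 Y505"
).split()


def region_of(position):
    """Return the region name containing a residue number, or None if unassigned."""
    for start, end, name in SPIKE_REGIONS:
        if start <= position <= end:
            return name
    return None


def residue_number(label):
    """Extract the residue number from a one-letter label such as 'K417'."""
    match = re.fullmatch(r"[A-Z](\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised residue label: {label!r}")
    return int(match.group(1))


def unassigned_gaps(regions):
    """List (first, last) residue numbers that fall between consecutive regions."""
    gaps = []
    for (_, prev_end, _), (next_start, _, _) in zip(regions, regions[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps


if __name__ == "__main__":
    for label in ACE2_CONTACTS:
        print(label, "->", region_of(residue_number(label)))
    print("unassigned:", unassigned_gaps(SPIKE_REGIONS))
```

With the ranges as listed, every ACE-2 contact residue and both cysteines of the C480-C488 disulfide bridge fall inside the S1 B domain (336-516). The gap check also reports that residue 911 is not assigned to any region in the caption as extracted.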
Figure S2) showing the insertion sequences that form receptor binding motifs within the RBD (S1 B domain) for human SARS-CoV-2 (1-3), Canis lupus familiaris SARS-CoV-2 (4), Felis catus SARS-CoV-2 (5), Mustela lutreola SARS-CoV-2 (6), Neovison vison SARS-CoV-2 (7), Panthera tigris SARS-CoV-2 (8), Panthera leo SARS-CoV-2 (9), bat SARS-CoV RaTG13 (10), pangolin SARS-CoV (11), bat SARS-CoV (12-17), civet , human SARS-CoV (20-21). The secondary structure elements, β-strands (olive green arrows) and α-helices (red bars), are indicated. The starting and ending amino acid numbers of the regions are indicated in brackets after the NCBI code.

The side chains of residues (Q36, F39, H91, A92, F101, I132, S133, P134, S135, Q304, R307) that lie within 4.5 Å of sialic acid are indicated. The residues that form hydrogen bonds with O-sialic acid are shown in bold and italics.

Cartoon representation of the MERS-CoV spike protein (PDB code: 4L72) S1 B domain (green) interacting with human DPP4 (purple). Strands β4 (orange) and β5 (cyan), the extended loop (pink) and the side chains of residues within 4.5 Å of DPP4 (S454, D455, P463, Y499, N501, K502, S504, L506, D510, R511, E513, P515, E536, D537, G538, D539, Y540, R542, W553, V555, A556, S557, S559) are indicated. The C503-C526 disulfide bridge is shown in yellow. The residues that form hydrogen bonds with DPP4 are shown in bold and italics.

4C . Cartoon representation of the HCoV-HKU1 spike protein (PDB code: 5KWB) S1 B domain (green). Strands β4 (orange) and β5 (cyan) and the extended loop (pink) are indicated. The Cys-Cys pairs 466-546, 474-495, 476-567, 520-533, 504-518, 485-516, 582-588 and 556-569 form disulfide bridges (yellow).

Cartoon representation of the HCoV-NL63 spike protein (PDB code: 3KBH) S1 B domain (green) interacting with human ACE-2.
The three loops (gold, blue, magenta) connecting strands β1 (orange) to β2 (green), β3 (cyan) to β4 (green) and β5 (red) to β6 (green), along with the side chains of residues that lie within 4.5 Å of hACE-2 (G494, G495, S496, C497, Y498, V499, C500, H503, G534, S535, P536, G537, S539, S540, W585, H586), are shown. The C497-C500 disulfide bridge is shown in yellow. The residues that form hydrogen bonds with ACE-2 are shown in bold and italics.

Cartoon representation of the HCoV-229E spike protein (PDB code: 6ATK) S1 B domain (green) interacting with human APN. The three loops (gold, blue, magenta) connecting strands β1 (orange) to β2 (green), β3 (cyan) to β4 and β5 (red) to β6, respectively, along with the side chains of residues that lie within 4.5 Å of hAPN (S312, G313, G314, G315, K316, C317, F318, N319, C320, R359, W404, S407, K408), are shown. The C317-C320 disulfide bridge is shown in yellow. The residues that form hydrogen bonds with APN are shown in bold and italics.

Middle East respiratory syndrome coronavirus infection in dromedary camels in Saudi Arabia
Evidence for camel-to-human transmission of MERS coronavirus
Detection of the Middle East respiratory syndrome coronavirus genome in an air sample originating from a camel barn owned by an infected patient
Expression of animal virus genomes
Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites
Mechanisms of coronavirus cell entry mediated by the viral spike protein
The protein data bank
Severe respiratory illness caused by a novel coronavirus
Coronavirus cell entry occurs through the endo-/lysosomal pathway in a proteolysis-dependent manner
Middle East respiratory syndrome coronavirus: another zoonotic betacoronavirus causing SARS-like disease
Human coronavirus NL63 infection and other coronavirus infections in children hospitalized with acute respiratory disease in Hong Kong
Evidence for an ancestral association of human coronavirus 229E with bats
Origin and evolution of pathogenic coronaviruses
Structure, function and evolution of the hemagglutinin-esterase proteins of corona- and toroviruses
Aminopeptidase N is a major receptor for the enteropathogenic coronavirus TGEV
Metagenomic analysis of the viromes of three North American bat species: viral diversity among different bat species that share a common habitat
Identification of a novel coronavirus in patients with severe acute respiratory syndrome
A conformation-dependent neutralizing monoclonal antibody specifically targeting receptor-binding domain in Middle East respiratory syndrome coronavirus spike protein
Molecular evolution of human coronavirus genomes
A previously undescribed coronavirus associated with respiratory disease in humans
Public health: broad reception for coronavirus
The species severe acute respiratory syndrome related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China
Evolutionary relationships and sequence-structure determinants in human SARS coronavirus-2 spike proteins for host receptor recognition
Pangolins Harbor SARS-CoV-2-Related Coronaviruses
Middle East Respiratory Syndrome (MERS) coronavirus seroprevalence in domestic livestock in Saudi Arabia
Controlling the SARS-CoV-2 Spike Glycoprotein Conformation
Coronavirus infections in working adults: eight-year study with 229 E and OC 43
Characterization of the SARS-CoV-2 S protein: biophysical, biochemical, structural, and antigenic analysis
Human coronavirus NL63 employs the severe acute respiratory syndrome coronavirus receptor for cellular entry
Bat origin of human coronaviruses
Human coronavirus HKU1 spike protein uses O-acetylated sialic acid as an attachment receptor determinant and employs hemagglutinin-esterase protein as a receptor-destroying enzyme
Human coronaviruses OC43 and HKU1 bind to 9-O-acetylated sialic acids via a conserved receptor-binding site in spike protein domain A
Evidence supporting a zoonotic origin of human coronavirus strain NL63
CD209L (L-SIGN) is a receptor for severe acute respiratory syndrome coronavirus
Seroepidemiologic survey of coronavirus (strain OC 43) related infections in a children's population
MERS outbreak in Korea: hospital-to-hospital transmission
Pre-fusion structure of a human coronavirus spike protein
A novel coronavirus associated with severe acute respiratory syndrome
The molecular biology of coronaviruses
Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
The SARS coronavirus spike glycoprotein is selectively recognized by lung surfactant protein D and activates macrophages
Structure of SARS coronavirus spike receptor-binding domain complexed with receptor
Receptor recognition mechanisms of coronaviruses: a decade of structural studies
Structure, function, and evolution of coronavirus spike proteins
Identification of sialic acid-binding function for the Middle East respiratory syndrome coronavirus spike glycoprotein
Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus
Animal origins of the severe acute respiratory syndrome coronavirus: insight from ACE2-S-protein interactions
The human coronavirus HCoV-229E S-protein structure and receptor binding
The genome sequence of the SARS-associated coronavirus
Recovery in tracheal organ cultures of novel viruses from patients with respiratory disease
A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence
SARS-like WIV1-CoV poised for human emergence
Human coronavirus NL63 utilizes heparan sulfate proteoglycans for attachment to target cells
Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein
Host cell proteases: Critical determinants of coronavirus tropism and pathogenesis
Crystal structure of the receptor binding domain of the spike glycoprotein of human betacoronavirus HKU1
Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV
Structures of MERS-CoV spike glycoprotein in complex with sialoside attachment receptors
Coronavirus as a possible cause of severe acute respiratory syndrome
Distant relatives of severe acute respiratory syndrome coronavirus and close relatives of human coronavirus 229E in bats
Characterization of a novel coronavirus associated with severe acute respiratory syndrome
Co-circulation of three camel coronavirus species and recombination of MERS-CoVs in Saudi Arabia
15 drugs being tested to treat COVID-19 and how they would work
Cell entry mechanisms of SARS-CoV-2
Structural basis of receptor recognition by SARS-CoV-2
A review of studies on animal reservoirs of the SARS coronavirus
Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human
Cryo-EM structure of the SARS coronavirus spike glycoprotein in complex with its host cell receptor ACE2
Epidemiology, genetic recombination, and pathogenesis of coronaviruses
A real-time dashboard of clinical trials for COVID-19. The Lancet Digital Health
Structural basis for human coronavirus attachment to sialic acid receptors
In situ structural analysis of SARS-CoV-2 spike reveals flexibility mediated by three hinges
Identification of a new human coronavirus
Understanding pathways to death in patients with COVID-19. The Lancet Respiratory Medicine
Human and bovine coronaviruses recognize sialic acid-containing receptors similar to those of influenza C viruses
Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy
Tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion
SARS-CoV infection in a restaurant from palm civet
Serological evidence of bat SARS-related coronavirus infection in humans
Structure of MERS-CoV spike receptor-binding domain complexed with human receptor DPP4
Structural and functional basis of SARS-CoV-2 entry by using human ACE2
Receptor-binding loops in alphacoronavirus adaptation and evolution
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia
Coronavirus diversity, phylogeny and interspecies jumping
A new coronavirus associated with human respiratory disease in China
Crystal structure of NL63 respiratory coronavirus receptor-binding domain complexed with its human receptor
Human aminopeptidase N is a receptor for human coronavirus 229E
Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor: molecular mechanisms and potential therapeutic target
A pneumonia outbreak associated with a new coronavirus of probable bat origin
A novel coronavirus from patients with pneumonia in China

UniProtKB IDs of human coronavirus proteins HCoV-229E ORF1ab P0C6X1; 1-111 (NSP1) Putative 2'-O-methyl transferase).
Q6Q1S2 (Spike glycoprotein, 1356 aa), Q6Q1S1 (NSP3, 225 aa), Q6Q1S0 (Envelope small membrane protein Q7TFA1 (Protein non-structural 7b, 44 aa), Q7TFA0 (Protein non-structural 8a, 39 aa), Q80H93 (Non-structural protein 8b, 84 aa), P59595 (Nucleoprotein, 9a, 422 aa) K9N796 (NSP ORF3, 103 aa) K9N5R3 (Envelope small membrane protein, 82 aa), K9N4V0 (NSP ORF4a, 109 aa), K9N643 (Non-structural protein ORF4b Host translation inhibitor NSP1) P59633 (Protein 3b, 154 aa), P0DTC4 (Envelope small membrane protein, 75 aa), P0DTC5 (Membrane protein 222 aa), P0DTC6 (NSP6, 61 aa), P0DTC7 (Protein 7a, 121 aa), P0DTD8 (Protein non-structural 7b, 43 aa), P0DTC8 (Protein non-structural 8, 121 aa P36334 (Spike glycoprotein, 1353 aa), Q04853 (NSP12.9 kDa, 109 aa) Host translation inhibitor NSP1) Q5MQC9 (NSP4, 109 aa), Q5MQC8 (Envelope small membrane protein, 82 aa), Q5MQC7 (Membrane protein, 223 aa), Q5MQC6 (Nucleoprotein 7a

LGP thanks School of Chemistry and CAS, UGC for providing research facilities.

CoV: K390, R426, Y436, Y440, F442, L443, P462, D463, F472, N473, Y475, N479, Y481, G482, Y484, T486, T487, G488, I489,