key: cord-0956158-zkp40fuz authors: Dash, Pujarini; Turuk, Jyotirmayee; Behera, Santosh Ku.; Palo, Subrata Ku.; Raghav, Sunil K.; Ghosh, Arup; Sabat, Jyotsnamayee; Rath, Sonalika; Subhadra, Subhra; Bhattacharya, Debadutta; Kanungo, Srikant; Kshatri, Jayasingh; Mishra, Bijaya kumar; Dash, Saroj; Mahapatra, Namita; Parida, Ajay; Pati, Sanghamitra title: Sequence analysis of Indian SARS-CoV-2 isolates shows a stronger interaction of mutated receptor binding domain with ACE2 receptor date: 2020-08-28 journal: bioRxiv DOI: 10.1101/2020.08.28.271601 sha: ca1aca719d04fc492f76076922645a801e4fbdc8 doc_id: 956158 cord_uid: zkp40fuz SARS-CoV-2 is a RNA Coronavirus responsible for the pandemic of the Severe Acute Respiratory Syndrome (COVID-19). It has affected the whole world including Odisha, a state in eastern India. Many people migrated in the state from different countries as well as states during this SARS-CoV-2 pandemic. As per the protocol laid by ICMR and Health & Family welfare of India, all the suspected cases were tested for SARS-CoV-2 infection. The aim of this study was to analyze the RNA binding domain (RBD) sequence of spike protein from the isolates collected from the throat swab samples of COVID-19 positive cases and further to assess the RBD affinity with ACE2 of different species including human. Whole genome sequencing for 35 clinical SARS-CoV-2 isolates from COVID-19 positive patients was performed using ARTIC amplicon based sequencing. Sequence analysis and phylogenetic analysis was carried out for the Spike and RBD region of all isolates. The interaction between the RBD and ACE2 receptor of five different species was also analysed. Except three isolates, spike region of 32 isolates showed one/multiple alterations in nucleotide bases in comparison to the Wuhan reference strain. One of the identified mutation at 1204 (Ref A, RMRC 22 C) in the RBD of spike protein was identified which depicted a stronger binding affinity with human ACE2 receptor compared to the wild type RBD. Furthermore, RBDs of all the Indian isolates are capable of binding to ACE2 of human, bat, hamster and pangolin. As mutated RBD showed stronger interaction with human ACE2, it could potentially result in higher infectivity. The study shows that RBDs of all the studied isolates have binding affinity for all the five species, which suggests that the virus can infect a wide variety of animals which could also act as natural reservoir for SARS-CoV-2. Corona viruses have been studied for more than fifty years and are known to infect multiple animal species including human. Though their pathogenesis and mechanism of replication has already been well described due to previous outbreaks ( is also an essential requirement for transmission across different species 2 . It is known that SARS-CoV interacts with human angiotensin converting enzyme 2 (ACE2) for their entry and after COVID-19 outbreak, researchers found that SARS-CoV-2 also interacts with ACE2 for the viral entry into host cells 3, 4 . SARS-CoV-2 spike (S) protein gets cleaved by host proteases into S1 and S2 domain which mediates receptor recognition and membrane fusion respectively 5 . Wang et al. (2020) reported that, the S1 domain of S protein of SARS-CoV-2 contains a region called receptor binding domain (RBD) which makes a complex with human ACE2 and facilitates viral entry 6 . However, emerging mutations in SARS-CoV-2 genome might alter the process of infection transmission, replication and potential of viral attachment with ACE2. According to epidemiological data, COVID-19 has an origin from bats in Wuhan, China and then spread to other parts after its zoonotic transmission via the Malayan Pangolins 7,8 . In a country like India having a diversified geographical distribution, it is important to understand the pathogenesis of different strains of SARS-CoV-2 isolated from different parts of the country. As interaction with ACE2 is the main pathway of entry of this virus into its host, knowledge on the RBD binding affinity of different Indian SARS-CoV-2 isolates with the ACE2 of their natural reservoirs including human is highly essential. However, this has remained a grey area with a little information available. In the current study, we have analysed the RBD sequences of spike protein from different isolates of SARS-CoV-2 from COVID-19 patients of Odisha. Further we analysed the binding affinity of RBDs with the ACE2 of different probable natural hosts of corona virus: bat, pangolin and hamster including human. The detected mutation in the RBD region of one isolate shows stronger binding affinity with human ACE2 than the wild type RBD, providing important information regarding its virulence as well as drug targeting. The current study was a part of whole genome sequencing study carried under the Odisha Study group constituting different Government organisations jointly by Regional Medical For sequence alignment, full length genome sequence of SARS-CoV-2 isolate of Wuhan-Hu-1 (Accession no.NC_045512) was downloaded from NCBI database and used as reference sequence for all further analysis. Alignment of all 35 RMRC spike nucleotide sequences with the reference genome was carried out using BLASTN (align two/more sequences). From the data available for reference strain, we retrieved the information of coding region of spike RBD domain sequence and all RMRC spike sequences were aligned with the reference sequence to identify the respective RBD mutations using BLASTN. Multiple sequence alignment of all 35 RBDs (amino acid sequences) was carried out using Clustal X tool. Mutations specific to RMRC isolates were identified by comparing the RBD coding regions with the reference strain. A phylogenetic tree was generated using the MEGA software version 6 with 1000 bootstrap replications as instructed in MEGA software. From the phylogenetic tree analysis, four RMRC RBD sequences were selected from four random clusters for further investigation of their interactions with ACE2 receptor of probable natural hosts of SARS-CoV-2. Hamsters were reported to be the best suitable animal model to carry out SARS-CoV-2 related experiments; therefore it was essential to understand the interaction of its ACE2 receptor with isolated Indian SARS-CoV-2 RBDs 10 Before inception of structure prediction, the ACE2 sequences from pangolin, hamster, Chinese and Indian bat were aligned with the human counterpart to identify the percentage of similarity and dissimilarity between these sequences. The experimental 3D structure of human ACE2 was retrieved from RCSB PDB (PDB ID-6M0J) with a resolution of 2.45 Å positioning from 19-615 amino acids. Protein Data Bank (PDB) did not provide any experimental structure of ACE2 receptors of pangolin, hamster, Chinese and Indian bat, which has prompted us to predict their three dimensional (3D) structure through homology modelling using Modeller v 9.19 tool followed by structure validation. Suitable templates were identified for 3D model building of ACE2 using BLASTp 11 search against PDB (Protein Data Bank) database. The templates with PDB ID: 1R42, 6CS2, 6LZG, 3SCI, and 2AJF, were found to be the better homologs for target-template alignment and modelled 3D structure prediction. Based on the optimized target-template alignment, Modeller v9.19 12 facilitated in model developments, the models with lowest Discrete Optimized Protein Energy (DOPE) score were retained for further structural refinement. Side chain optimization was performed using WHATIF 13 and GalaxyRefine 14 tool. The optimized models of ACE2 were finalized based on its overall quality and stereo-chemical geometry and energy. The geometry of the predicted model was evaluated using PROCHECK 15 and Ramachandran analysis 16 . ERRAT 17 programme was used to calculate the accuracy of the nonbonded atoms for the predicted model. Verify 3D 18 tool was used to evaluate the compatibility of the 3D model with its own amino acid sequence by assigning a structural class based on its location, environment and comparing the results to good quality structures. The structure was uploaded in Qualitative Model Energy Analysis (QMEAN) server (bencrept), for resolving the model quality. The energy potential of the predicted model was calculated using ProSA-web server 19 . Four nucleotide sequences of RMRC RBDs were selected and translated to protein sequence using EMBOSS Transeq tool of European Bioinformatics Institute (EBI). The 3-D structure of reference human RBD (Uniprot ID: PODTC2, 333-526 amino acids of the spike protein, PDB ID-6M0J) was considered as wild type. The experimental structure of the wild RBD was mutated at the position I402L using Discovery Studio visualizer (4.1) for attaining a mutant RMRC22 RBD as required for further computational analysis. Finally, study of protein-protein interaction (PPI) between wild type/mutant RBDs and ACE2 receptors form different organisms was carried out using online HawkDock server which is a powerful tool to predict the binding structures and identify the key residues of PPIs 20 . All the SARS Co-V-2 isolates included in the study had a travel history from either outside the country or from other states of India. According to the travel history, out of 35 isolates, one had a foreign travel history, 11 were Nizamuddin (cluster detected from New Delhi, India during April, 2020) returned and rest 23 migrated from Surat, Gujarat, India. The detail of the demographic and clinical status of the patients included in the study has been described in one of our unpublished reports (Turuk et al). We did not record any death case among the patients whose samples were included in the current study. . In the current study, the spike region was identified at the position from 21563-25384 of the whole genome and consists of 3822 nucleotides. The BLAST alignment analysis of RMRC spike nucleotide sequences alignment analysis showed that, three spike sequences (RMRC 104, 157, 158) were 100 % identical to the Wuhan reference spike whereas all other RMRC spikes shared 99 % identity with one or multiple altered bases at different positions (Table 1) . The alignment of the spike sequences of RMRC isolates with the reference strain RBD, grouped in the third cluster and all other RBDs were in cluster four, which describes their phylogenetic distribution (Fig.1B) . The sequence alignment analysis showed that ACE2 of pangolin, hamster, Chinese and Indian bat shared 85 %, 84 %, 80 % and 78 % identity with human ACE2 respectively (Suppl. Fig.1 ). The quality of modelled structures of ACE2 was validated using several computational methods (pangolin Fig.2A (1.9%) in generously allowed regions and 7 (1.0%) in disallowed regions (Fig. 2B) . In Hamster, out of 785 amino acid residues, the Ramachandran analysis illustrated 613 (87.2%) residues in most favoured regions, 70 (10.0%) in additional allowed regions, 12 (1.7%) generously allowed regions and 8 (1.1%) in disallowed regions (Fig.3B) . In Chinese bat, out of favoured regions, 66 (9.2%) in additional allowed regions, 12 (1.7% ) generously allowed regions and 2 (0.3%) in disallowed regions (Fig.4B) . In Indian bat, out of 807 amino acid residues, the Ramachandran analysis illustrated 630 (86.8%88.8%) residues in most favoured regions, 73 (10.1%) in additional allowed regions, 15 (2.1%) generously allowed regions and 8 (1.1%) in disallowed regions (Fig.5B ). Using the Qualitative Model Energy Analysis (QMEAN) server, the model quality was determined. The overall quality of model was good as indicated by its QMEAN Z-score and QMEAN4 global score. Low quality models are expected to have a negative QMEAN Z-score. The QMEAN4 ranges from 0 to 1 and a higher value indicates good qualitymodel 22 . Additionally, the overall quality of the model was evaluated using Protein Structure Analysis (ProSA) tool which provides a quality score (Zscore) as compared to all known protein structure from x-ray crystallography as well structural NMR. The obtained Z-score value was -6.33 (Pangolin), -4.76 (Hamster), -6.24 (Chinese bat) and -6.73 (Indian bat); which indicates the high quality of the models compared to known protein structures (Fig. 2B-5B ). As the 3D structure of human RBD was already available in the database (PDB ID-6M0J), we Table-2 . However, if we compare the interaction of mutated and wild type RBD with ACE2 receptor of all species, the interaction between mutated RBD and human ACE2 is the strongest one with highest binding energy and highest number of hydrogen bonds. The complex trajectory of the recent COVID-19 pandemic in India poses greater risk towards control and containment of the infection. It is high time to understand its mobility pattern in the country and the viral genetic properties favouring its virulence. Though during early phase of pandemic, Odisha, had comparatively very few positive cases as well as small number of deaths, gradually, the virus has become rapidly infectious making the clinical scenario worsen. Many people from Odisha were working outside the state and during this pandemic, they returned to their home state due to many reasons. The samples included in our study were mainly collected from suspected cases who had a travel history from different states with one foreign travel record. As spike protein of SARS-CoV-2 mediates viral entry in to host and houses RBD, which binds to ACE2 receptor of host cell, understanding the spike-RBD distribution in the genome of Indian isolates is crucial for therapeutic design. SARS-CoV-2 spike protein is reported to have a stronger binding affinity for ACE2 than SARS-CoV and higher affinity means low number of virus is required to infect the cell, which may explain the high transmission of SARS-CoV-2 23 . In the current study, spike region of 32 isolates showed altered nucleotide bases at multiple positions as compared to the Wuhan reference strain suggesting mutations in these Indian isolates during the spread. For SARS-CoV-2, specific RBD-ACE2 binding ensures infection as well as serves as a potential target for developing treatment strategies for this infection 23 . According to Premkumar et al. 2020, the RBD of SARS-CoV-2 is an immunodominant and a potential target of antibodies in COVID-19 patients 24 . The mutation found in RMRC 22 isolate might play a role in altering the antigenicity or binding affinity of the respective RBD. Due to the rapid spreading and evolution, SARS-CoV-2 RBD is known to acquire several mutations leading to increased binding affinity to human ACE2 receptor 25 . In France, multiple mutations were identified in RBD of SARS-CoV-2 contributing to higher receptor binding capacity, which might be responsible for increased virus spread and infectivity 26 . On the other hand, a mutation in S protein has been found to be associated with decrease in receptor binding affinity 27,28 . In the current study, mutation in the RBD region of Indian isolates did not seem to affect its interaction with ACE2 receptor of other species prominently except that of human. Surprisingly, the patient, from whom, the RMRC 22 isolate was obtained, had a travel history of returning from Nizamuddin (cluster detected from New Delhi, India during April, 2020) recently before he tested positive for SARS-CoV-1 0 2. However, no mutation was observed in the isolates obtained from his other family members (his father and two brothers) who were also COVID-19 positive. It appears that, emergence and role of a mutation in any region of SARS-CoV-2 genome depends upon multiple factors including the geographical distribution, rate of spreading, alteration in the virulence of the virus and immune response of the host. The interaction analysis of mutated and wild type RBDs with ACE2 receptor indicated that bats and pangolins could be suitable natural reservoirs for Indian isolates of this virus and this finding falls parallel with other earlier reports 29, 30 . As hamster has been reported as a suitable animal model to study SARS-CoV-2 pathogenesis, the interaction of RBD of the Indian isolates included in the current study makes the earlier report more relevant 10 Coronaviruses: an overview of their replication and pathogenesis Bat-to-human: spike features determining 'host jump' of coronaviruses SARS-CoV, MERS-CoV, and beyond Angiotensinconverting enzyme 2 is a functional receptor for the SARS coronavirus A pneumonia outbreak associated with a new coronavirus of probable bat origin Coronaviridae Structural and Functional Basis of SARS-CoV-2 Entry by Using Human ACE2 The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak SARS-CoV2 genome analysis of Indian isolates and molecular modelling of D614G mutated spike protein with TMPRSS2 depicted its enhanced interaction and virus infectivity Syrian hamsters as a small animal model for SARS-cov2 infection and countermeasure development Basic Local Alignment Search Tool Comparative Protein Structure Modeling Using MODELLER WIWS: A protein structure bioinformatics web service collection GalaxyRefine: Protein structure refinement driven by side-chain repacking PROCHECK: a program to check the stereochemical quality of protein structures Deviations from Standard Atomic Volumes as a Quality Measure for Protein Crystal Structures Verification of protein structures: patterns of nonbonded atomic interactions A method to identify protein sequences that fold into a known three-dimensional stucture ProSA-web: Interactive web service for the recognition of errors in three-dimensional structures of proteins HawkDock: a web server to predict and analyze the protein-protein complex based on computational docking and MM/GBSA Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor QMEAN: A comprehensive scoring function for model quality assessment A virus that has gone viral: Amino acid mutation in S protein of Indian isolate of Coronavirus COVID-19 might impact receptor binding, and thus, infectivity A pneumonia outbreak associated with a new coronavirus of probable bat origin Identification of 2019-nCoV related coronaviruses in Malayan pangolins in southern China