key: cord-319718-blqzi69t authors: Zhang, L.; Wang, S.; Ren, Q.; Yang, J.; Lu, Y.; Gai, Z. title: Genome-wide variations of SARS-CoV-2 infer evolution relationship and transmission route date: 2020-05-03 journal: nan DOI: 10.1101/2020.04.27.20081349 sha: doc_id: 319718 cord_uid: blqzi69t

In the epidemic evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), questions of mutation, origin, typing and the effect of mutation on molecular detection remain unresolved. To identify the evolutionary relationships of SARS-CoV-2 and evaluate the detection efficiency of primers currently used in different countries, we retrieved genomic sequences of 373 SARS-CoV-2 strains from multiple databases and performed genome-wide variation analysis. According to the nucleotide variation C28144T, SARS-CoV-2 can be divided into group A (117 strains) and group B (256 strains). A variation in the spike protein gene (S gene), A1841G in the coding region (genome position 23403), defined a B1 subgroup (40 strains) within group B, of which 30 strains were from European and American countries in March (especially Washington, USA). These mutations are likely to be influenced by the environment or by the immune selection pressure of different populations. Although the mutation is not in the receptor-binding domain (RBD) or the polybasic cleavage region, it may still affect transmissibility and pathogenicity; however, its significance is not yet clear. The ratio of group A to group B strains increased over the epidemic months (0.35:1 in January, 0.62:1 in February and 0.76:1 in March), which suggests that the transmissibility of group A strains became stronger over time. Based on the variation at 11 nucleotide sites during the epidemic, we speculate that the Washington strain is closer to the ancestral type and that the Wuhan strain descends from the group A virus strain.
A comparison of the detection capabilities of primers used in different countries indicates that SARS-CoV-2 nucleotide variation may affect molecular detection of only very few strains. Differences in the transmissibility, pathogenicity and clinical manifestations of the different strain types require further investigation.

Introduction:
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) started to emerge in Wuhan in December 2019. As of April 20, 2020, more than 2.4 million cases had been reported in more than 211 countries and regions. While the outbreak and pandemic of SARS-CoV-2 pose a great threat to public health, in-depth studies have been conducted on key issues such as transmission ability, epidemiology, clinical characteristics, pathogenic mechanism, clinical treatment, and laboratory testing. Regarding the origin of SARS-CoV-2, bat coronavirus RaTG13 was found to be highly similar to SARS-CoV-2, so it was speculated that bats might be the original host of the virus [1].
It was further found that a virus isolated from pangolins was similar to SARS-CoV-2, suggesting that the pangolin might be an intermediate host [2, 3]. By comparing the genomes of coronaviruses such as SARS-CoV-2, MERS-CoV and SARS-CoV, it was found that RNA editing in the human body is related to certain mutations acquired during virus evolution [4]. The origin of SARS-CoV-2 remains an urgent open problem. To reveal more clues, we retrieved the publicly available virus sequences from global database sources such as GISAID and NCBI to analyze the mutation, evolution and typing of SARS-CoV-2, and to evaluate the influence of its mutations on molecular detection in current clinical settings.

[...] and used to construct the phylogenetic tree by [...] The variation percentage and pairwise distance were calculated using MEGA-X32. BioEdit was used to calculate the number of mutation sites, and [...]

medRxiv preprint (which was not certified by peer review), this version posted May 3, 2020. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. https://doi.org/10.1101/2020.04.27.20081349

Based on the position in the phylogenetic tree and the nucleotide variation, we compared the evolutionary variation between representative SARS-CoV-2 strains. The WA1-F6 strain (sample collected on January 25, 2020) has one nucleotide variation compared with the closest bat strain (bat/Yunnan/RaTG13).
Compared with the WA1-F6 strain, the Chongqing/YC01, Fujian/8 and HKU-SZ-002a strains (in the same evolutionary branch as WA1-F6) showed 2, 1 and 2 nucleotide variations, respectively, while the Wuhan epidemic strain Wuhan-Hu-1 (sample collected in November 2019), which is far from the outgroup, has 3 nucleotide variations, and WA-UW82 (sample collected in March 2020), which is also far from the outgroup, has 8 nucleotide variations. Interestingly, the WA-UW82 strain appeared one and a half months after the WA1-F6 strain became prevalent in the Washington area of the United States. The WA-UW82 strain mutated and evolved from WA1-F6 in the 5' UTR, ORF1ab, S, ORF3a and ORF8 regions, whereas no positive selection was found (Table 3).

[...] Prevention (CDC) [5], Europe and Hong Kong [6] (Table 4).

The primer sequences reported by China CDC and Europe showed no nucleotide variation in the ORF1ab and RdRp amplification regions, respectively, while several nucleotides in the N amplification region were mutated in 10 strains from Mexico, the Netherlands, Sweden, Finland, Portugal and Peru. In all of these strains, the nucleotides corresponding to the 5' end of the forward primer for the N region were mutated from GGG to AAC (Table 4).
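The primer evaluation described above amounts to comparing each primer with the matching region of each genome and listing the mismatched bases. The following is a minimal Python sketch under stated assumptions, not the authors' method: the function names `primer_mismatches` and `find_best_site` are hypothetical, the sequences are toy strings (not the primers in Table 4), and the scan ignores indels and the reverse-complement strand.

```python
from typing import List, Optional, Tuple

Mismatch = Tuple[int, str, str]  # (0-based primer index, primer base, genome base)


def primer_mismatches(primer: str, target: str) -> List[Mismatch]:
    """List the positions where a primer differs from an equally long target region."""
    if len(primer) != len(target):
        raise ValueError("primer and target region must have the same length")
    return [
        (i, p, t)
        for i, (p, t) in enumerate(zip(primer.upper(), target.upper()))
        if p != t
    ]


def find_best_site(primer: str, genome: str) -> Tuple[int, List[Mismatch]]:
    """Slide the primer along the genome and return the first window with the fewest mismatches."""
    if len(genome) < len(primer):
        raise ValueError("genome is shorter than the primer")
    best: Optional[Tuple[int, List[Mismatch]]] = None
    for start in range(len(genome) - len(primer) + 1):
        mismatches = primer_mismatches(primer, genome[start:start + len(primer)])
        if best is None or len(mismatches) < len(best[1]):
            best = (start, mismatches)
    return best


# Toy data (hypothetical): a primer starting with GGG and a variant
# in which those three bases read AAC, as in the pattern described above.
primer = "GGGGAACTTC"
reference = "ATCA" + "GGGGAACTTC" + "TCCT"
variant = "ATCA" + "AACGAACTTC" + "TCCT"

ref_start, ref_mm = find_best_site(primer, reference)
var_start, var_mm = find_best_site(primer, variant)
print("reference:", ref_start, ref_mm)
print("variant:  ", var_start, var_mm)
```

In this toy case the variant still aligns at the same site but with three mismatches at primer positions 0 to 2, that is, at the 5' end of the primer. A real evaluation would work from a multiple sequence alignment and would also check probes and reverse primers.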
[...] and countries and regions were selected to construct the phylogenetic tree. Several distinct clusters were shown in the phylogenetic tree, and the evolutionary relationships of each cluster were analyzed. It was found that the [...]

[...] that were deposited in April into the GISAID database. After deleting the low-quality sequences, 45 were retained for analysis (data not shown).

[...] process, we found that the first patient strain ( [...]
[Table residue; only the strain labels and the note are recoverable. Strains compared: USA/WA1-F6, bat/Yunnan, Chongqing/YC01.]
Note: Ks, synonymous mutation rate; Ka, non-synonymous mutation rate.
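The distinction between synonymous (Ks) and non-synonymous (Ka) changes in the note above can be illustrated with a small Python sketch. It only classifies differing codons between two aligned, in-frame coding sequences by whether the encoded amino acid changes. It is a simplified count, assuming the standard genetic code, and not a Ka/Ks rate estimate (which also normalizes by the number of synonymous and non-synonymous sites); the function name `classify_codon_changes` and the input sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(seq_a: str, seq_b: str) -> dict:
    """Count differing codons as synonymous or non-synonymous.

    Both sequences must be aligned, in frame, and of equal length.
    Codons containing gaps or ambiguous bases are counted as skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    counts = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no change
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            counts["skipped"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Toy example (hypothetical sequences): TTA (Leu) -> TCA (Ser) changes the
# amino acid, while GGT (Gly) -> GGC (Gly) does not.
result = classify_codon_changes("ATGTTAGGT", "ATGTCAGGC")
print(result)
```

A Ka/Ks ratio well above 1 is conventionally read as a sign of positive selection, so a proper estimate of both rates is what underlies a statement such as "no positive selection was found".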