key: cord-0951988-mdefkgyh
authors: Nemr, Waleed A.; Radwan, Nashwa K.
title: Typing of alpha and beta coronaviruses by DNA barcoding of NSP12 gene
date: 2022-01-11
journal: J Med Virol
DOI: 10.1002/jmv.27550
sha: 4b00b1cb9b4f99a9b00c7915e8425ce86ddff5fc
doc_id: 951988
cord_uid: mdefkgyh

Since the COVID-19 pandemic began to spread, the world has paid attention to the evolution of coronaviruses (CoVs) and their diverged lineages, because many studies proposed that severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolved from a lineage of bat CoVs. This is due to the ability of some mutant CoVs to transmit from one host to different hosts. For this reason, there are many fears about the pathogenicity of upcoming CoV variants. Thus, it is important to have a rapid and economical technique for typing a wide range of human and animal CoV species in order to follow up the transmission of their mutants. Therefore, the present study aims to develop a simple design for DNA barcoding of a wide range of mammalian CoVs (including alpha and beta CoVs) by universal amplification of a species-specific sequence inside a conserved gene (NSP12), followed by amplicon sequencing. The in silico evaluation involved 96 nucleotide sequences of different CoVs (18 alpha CoVs and 78 beta CoVs) and was applied experimentally in the lab to 5 human CoV isolates; 3 of them belong to beta CoVs (OC43, MERS, and SARS-CoV-2) and 2 are alpha CoVs (229E and NL63). The results indicated that the designed universal primers are able to amplify 332 bp of a taxonomic region inside the NSP12 coding sequence, which facilitates the identification and classification of mammalian CoVs based on the resulting phylogenetic tree.

Coronaviruses (CoVs) are a group of RNA viruses that can infect a wide range of mammals and birds. The coronavirus particle consists of a single- [hCoVs]) causing symptoms ranging from mild to severe and may lead to death.
Two of them belong to Embecovirus, known as HKU1 and OC43. Another two strains belong to Sarbecovirus, known as SARS

Each strain of hCoVs arose through multiple complex mutation events during evolution. This is due to the high error rate of copying the RNA genome during viral multiplication in the host cell. Hence, RNA viruses have a greater ability to generate variants than DNA viruses. The most important mutations are those occurring in genes coding for the outer structural proteins, such as the S protein, which is responsible for binding to the specific host cell receptor. Such mutations increase the chance that newly generated variants are transmitted to different hosts. 3 Reports on the evolution of SARS-CoV-2, which caused the coronavirus disease 2019 (COVID-19) pandemic, proposed that bat CoV lineages acted as a genetic reservoir from which the virus was transmitted to humans after a series of mutations. 4 Furthermore, many single nucleotide variants have diverged from the first identified isolate of SARS-CoV-2 from Wuhan, China (NC_045512.2). Hence, it is expected that further CoV variants will diverge and may acquire pathogenicity and immunogenicity characteristics different from those of the original isolates. So, it is important to follow up the transmission of different geographic isolates of diverged CoVs, especially for species closely related to humans and bats.

Full genome sequencing is the traditional technique for finding CoV variants. However, it is expensive and difficult to apply in low-income labs for surveying a large number of samples. Therefore, it is important to find a simple and less expensive technique to diagnose and identify different types of CoVs as a preliminary step before forwarding new isolates for full genome sequencing. Nucleotide sequences of a variety of conserved genes among related organisms are targeted for phylogenetic and identification studies.
In detail, this nucleotide variation is used to distinguish different holotypes of an organism by a simple method termed DNA barcoding, which is based on PCR-amplicon sequencing using universal primers. The validity of a DNA barcoding protocol depends on obtaining distinguishable reference sequences for each holotype and confirming the taxonomy of the related organisms. 5

Generally, according to information published by the International Committee on Taxonomy of Viruses (ICTV), all Coronaviridae members share the genetic structure and organization of the replicase gene, which encodes essential proteins involved in viral replication in the host cell. This gene is a complex of 2 overlapping open reading frames (ORFs), 1a and 1b. Their translation products are 2 polyproteins (pp1a and pp1ab) that are subsequently cleaved by posttranslational processing to produce 16 mature products, called nonstructural proteins (NSPs). One of these proteins is the RNA-dependent RNA polymerase (RdRp), which corresponds to NSP12. Moreover, the ICTV states that the nucleotide sequence encoding NSP12 is one of the most important taxonomic elements for classifying CoVs according to phylogeny. This means that the NSP12 gene contains variable nucleotide sequences that can distinguish species related and unrelated to an unknown isolate, depending on the estimated genetic distance of this gene.

Thus, the present study aims to evaluate alpha and beta CoV genomes, especially in the region of NSP12, to extract a conserved gene containing a species-specific barcoding sequence for typing these CoVs with respect to their classification phylogeny, and thereby to facilitate the rapid identification of new CoV isolates.

Consequently, inside the selected regions, well-aligned sequences were evaluated to locate universal primers matching the homologous flanking regions that surround the species-specific sequences, using BioEdit sequence alignment software.
6 Candidate taxonomic amplicon sequences were evaluated for being distinguishable for each species by estimating the difference between the max score and the bit score of each aligned sequence, compared to the same region of other CoV lineages (only sequences with 100% coverage), using the bit-scoring system of the BLAST web-based tool (https://blast.ncbi.nlm.nih.gov/). Furthermore, sequence variability among different CoV amplicons (including hCoVs and the other related CoVs returned by BLAST) was evaluated by estimating the entropy value of each nucleotide site, using BioEdit software. 6 The selected PCR primers were validated by in silico amplification of the targeted CoV genomes, using the NCBI Primer-BLAST web-based tool. 7 The resulting amplicon sequences of 96 different holotypes of alpha and beta CoVs were submitted for phylogenetic analysis using the Maximum Composite Likelihood method 8 conducted in MEGA6 software. 9 Unintended matching of the selected primers was investigated using the Primer-BLAST tool against the nucleotide sequences of the human genome and other human viruses associated with the respiratory tract, such as Adenoviridae

TABLE 2 BLAST scoring of different lineages of alpha and beta coronaviral amplicons (332 bp), reflecting the resolution of sequence differentiation between inter- and intra-species

The whole-genome sequence alignment of different human CoVs showed that the RNA replicase gene (encoded by ORF1ab) has conserved sequences compared to other regions, particularly in the part of ORF1b, which indicates a lower mutation rate of this region during the evolution of different CoVs. However, inside ORF1b, at the NSP12 coding sequence, there are species-specific sequences between short stretches of partly homologous sequence; these stretches were selected for locating primers that are universal for all types of alpha and beta CoVs. Therefore, the selected primers contain some ambiguous nucleotide bases (Figure 1).
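The two in silico steps described above, amplification with primers that contain ambiguous bases and estimation of per-site entropy, can be illustrated with a minimal Python sketch. It is not the BioEdit or Primer-BLAST implementation used in the study. It assumes exact primer matching with no mismatch tolerance, and it assumes the natural logarithm for the entropy. The function names and the toy sequences in the usage note are hypothetical and are not the study's primers or data.

```python
import math
import re
from collections import Counter

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also maps each ambiguity code to its complement.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Turn a degenerate primer into a regular expression (hypothetical helper)."""
    return "".join(IUPAC[base] for base in primer.upper())


def reverse_complement(seq):
    """Reverse-complement a sequence, keeping IUPAC codes consistent."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template, forward, reverse):
    """Return the first region bounded by both primers (primers included), or None.

    Assumption: primers must match exactly (after expanding ambiguity codes);
    real tools such as Primer-BLAST also tolerate a few mismatches.
    """
    pattern = (
        primer_regex(forward)
        + "[ACGT]*?"  # shortest possible stretch between the two priming sites
        + primer_regex(reverse_complement(reverse))
    )
    match = re.search(pattern, template.upper())
    return match.group(0) if match else None


def site_entropy(aligned):
    """Shannon entropy (natural log, an assumption) of each alignment column."""
    entropies = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = sum(counts.values())
        entropies.append(
            sum(-(n / total) * math.log(n / total) for n in counts.values())
        )
    return entropies
```

As a usage illustration with toy data, `in_silico_amplicon("TTTTACGGTACCCCCCTTGACAGGGG", "ACRGTA", "TGYCAA")` returns the 18-base region from the forward priming site to the reverse priming site, and `site_entropy(["ACGT", "ACGA", "ACTA", "ACTA"])` gives zero for the two conserved columns and positive values for the two variable ones. Columns with high entropy are the ones that carry the species-specific signal, while low-entropy flanks are candidates for universal priming sites.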
In silico PCR amplification with the selected primers produced a single PCR product of 332 bp for hCoV genomes, without any unintended matching with the human genome or other human respiratory viruses (Table 1). The matching score of each amplicon sequence against other related CoV genomes was estimated to evaluate the suitability of the selected amplicon as a taxonomic sequence distinguishable for each type of targeted CoV. The resolution of discrimination between different CoV species was estimated from the difference between the max score of the complete aligned sequence and the bit score of the related species. The results indicate that each amplicon sequence is distinguishable for its species, with sub-species resolution in some lineages (Table 2).

The validity of the given PCR technique for identifying CoVs gen-

The best strategy to overcome such a pandemic is early detection of the pathogen together with the preparation of billions of doses of suitable vaccines. Thus, staying a step ahead of a pathogen outbreak requires easy and fast diagnostic tools. Coupling diagnosis and identification in a single tool is best for large survey studies. Therefore, the present study modifies this protocol to match all alpha and beta CoVs. Experience suggests that a diagnostic tool restricted to a single species, such as species-specific PCR primers, may fail to detect diverged variants of that species before their subsequent outbreak. This risk arises from mutations that may produce nucleotide substitutions inside the PCR priming sites. Thus, with the tool proposed in the current research, it is easy to target a specific CoV species by designing a specific probe inside the variable region of the amplicon sequence, to be included in the PCR.
This will make it possible to identify other CoV genomes (if present) in probe-unmatched samples that showed positive PCR amplification.

The present research developed a simple protocol for molecular typing of mammalian CoVs (including alpha and beta CoVs), using DNA barcoding of a taxonomic sequence inside a conserved gene (NSP12). With this protocol, it is easy to diagnose (a positive PCR indicates an infected sample) and identify a wide range of these viruses by short-length sequencing of the obtained amplicon (332 bp). This will be useful for surveying different geographic isolates of the targeted CoVs and is suitable for detecting diverged variants.

Many thanks to Professor Ehab Serag, Microbiology Reference Lab of the National Blood Transfusion Service (NBTS), Ministry of Health, Egypt, for providing the authors with purified RNA samples of identified hCoVs.

Waleed A. Nemr contributed the practical part of the PCR and wrote the paper. Nashwa K. Radwan contributed the interpretation of results and helped with the in silico design of the PCR.
1. Coronavirus spike protein and tropism changes
2. Betacoronavirus genomes: how genomic information has been used to deal with past outbreaks and the COVID-19 pandemic
3. Genetic characterization of Betacoronavirus lineage C viruses in bats reveals marked sequence divergence in the spike protein of pipistrellus bat coronavirus HKU5 in Japanese pipistrelle: implications for the origin of the novel Middle East respiratory syndrome coronavirus
4. A pneumonia outbreak associated with a new coronavirus of probable bat origin
5. DNA, and the bar code of life
6. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT
7. Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction
8. Prospects for inferring very large phylogenies by using the neighbor-joining method
9. MEGA6: molecular evolutionary genetics analysis version 6.0
10. Analysis of partial sequences of the RNA-dependent RNA polymerase gene as a tool for genus and subgenus classification of coronaviruses
11. Genetic diversity of SARS-CoV-2 over a one-year period of the COVID-19 pandemic: a global perspective
12. Analysis of SARS-CoV-2 mutations in the United States suggests presence of four substrains and novel variants
13. Diversity of coronavirus in bats from Eastern Thailand

How to cite this article: Nemr WA, Radwan NK. Typing of alpha and beta coronaviruses by DNA barcoding of NSP12

The authors declare that there are no conflicts of interest.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Waleed A. Nemr http://orcid.org/0000-0001-8465-0497