key: cord-313265-lff5cajm
authors: Conway, Michael J.
title: Identification of coronavirus sequences in carp cDNA from Wuhan, China
date: 2020-03-16
journal: J Med Virol
DOI: 10.1002/jmv.25751
sha:
doc_id: 313265
cord_uid: lff5cajm

Severe acute respiratory syndrome (SARS)-like coronavirus sequences were identified in two separate complementary DNA (cDNA) pools. The first pool was from a Carassius auratus (crucian carp) cell line and the second was from Ctenopharyngodon idella (grass carp) head kidney tissue. BLAST analysis suggests that these sequences belong to SARS-like coronaviruses and that they are not evolutionarily conserved in other species. Investigation of the submitting laboratories revealed that two laboratories from the Institute of Hydrobiology at the Chinese Academy of Sciences in Wuhan, China, performed the research and submitted the cDNA libraries to GenBank. This institution is close to the Wuhan South China Seafood Wholesale Market, where SARS-CoV-2 first amplified in the human population. It is possible that these sequences are an artifact of the bioinformatics pipeline that was used. It is also possible that SARS-like coronaviruses are a common environmental pathogen in the region that may be present in aquatic habitats.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in the Chinese city of Wuhan in December 2019 and causes a respiratory illness, COVID-19, that can spread from person to person. As of 2 March 2020, there had been 89 868 cases and 3069 deaths, and the virus had spread to six continents. There are no specific antivirals or vaccines for this disease. SARS-CoV-2 is a member of the Coronaviridae family, which includes a number of viruses that cause the common cold (eg, 229E, OC43, NL63, and HKU1). 1 SARS-CoV-2 is also related to SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), which cause severe disease associated with high mortality rates.
2, 3 Coronaviruses originate in bats but are zoonotic and can circulate in a number of mammals and birds. 1 SARS-CoV emerged into the human population through the exotic food trade in China, which allowed a transmission event between a civet cat and a human. 4 MERS-CoV emerged into the human population through the use of camels as livestock, which has allowed multiple transmission events between camels and humans. 5 The intermediate animal host of SARS-CoV-2 is still unknown. 2, 3 In this study, I mined the NCBI expressed sequence tag (est) database to discover sequences related to SARS-CoV-2. SARS-like virus sequences that were highly homologous to SARS-CoV-2 were identified in two separate cDNA pools. The first pool was from a Carassius auratus (crucian carp) cell line and the second was from Ctenopharyngodon idella (grass carp) head kidney tissue. 6, 7 The sequence from the C. auratus cDNA was 152 amino acids long and the sequence from the C. idella cDNA was 88 amino acids long. BLAST analysis suggests that these sequences belong to SARS-like coronaviruses and that they are not evolutionarily conserved in other species. Investigation of the submitting laboratories revealed that two laboratories from the Institute of Hydrobiology at the Chinese Academy of Sciences in Wuhan, China, performed the research and submitted the cDNA libraries to GenBank.

Translated nucleotide BLAST (tblastn) searches were performed using SARS-CoV-2 protein sequences (Figure S1) as queries against the NCBI expressed sequence tag (est) database. Two cDNA clones were identified (accession numbers GE213092 and JK851329.1). Standard protein BLAST (blastp) searches of the identified protein sequences were performed against the NCBI nonredundant database, and Clustal Omega was used to align protein and nucleic acid sequences. Phylogeny.fr in "One Click" mode was used to align and curate the sequences, reconstruct the phylogeny, and render the tree.

To identify novel coronavirus sequences, tblastn searches were performed using SARS-CoV-2 protein sequences as queries against the NCBI expressed sequence tag (est) database.
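tblastn compares a protein query with a nucleotide database by translating each database sequence in all six reading frames (three per strand). The Python sketch below illustrates only that translation step, using the standard genetic code (NCBI translation table 1), plus a toy exact-match peptide search across the six frames. It is not the author's pipeline and not how BLAST scores hits: tblastn builds local alignments with a substitution matrix and reports E-values, whereas this sketch only reports exact substring matches. All function names and the example EST are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (upper-cased first)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; codons containing ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict[str, str]:
    """Translations of the three forward (+) and three reverse-strand (-) frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_peptide(peptide: str, nucleotide_seq: str) -> list[tuple[str, int]]:
    """Exact-match search for a peptide in all six frames (toy stand-in for tblastn)."""
    hits = []
    for frame, protein in six_frame_translations(nucleotide_seq).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))  # (frame label, amino acid offset)
            start = protein.find(peptide, start + 1)
    return hits


if __name__ == "__main__":
    est = "CCATGAAAGTTTAA"  # made-up EST fragment
    print(six_frame_translations(est))
    print(find_peptide("MKV", est))
```

For the made-up EST `CCATGAAAGTTTAA`, the peptide `MKV` is found only in frame +3 and `LNFH` only in frame -1 (the reverse strand), which is why a translated search has to examine both strands. Frame labels here follow one common convention; tools differ in how they number reverse-strand frames.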
The SARS-CoV-2 protein sequences encoded by genome nucleotides 266-13 468 and 13 468-21 555 (the two segments of ORF1ab) were used to search the database (Figure S1). A tblastn analysis using these sequences identified two cDNA clones that were highly homologous to SARS-like coronaviruses. These clones came from two separate cDNA pools. The first cDNA pool was made from a Carassius auratus (crucian carp) blastulae embryonic cell line and contained a sequence of 152 amino acids that covered approximately 2% of the SARS-CoV-2 genome and was 93.42% identical to SARS-CoV-2. Standard protein BLAST analysis found that this sequence represents a portion of a coronavirus RNA-dependent RNA polymerase (RdRp) and was homologous to SARS-like coronaviruses (Table 1). The second cDNA pool was made from Ctenopharyngodon idella (grass carp) head kidney and contained a sequence of 88 amino acids that covered approximately 1% of the SARS-CoV-2 genome and was 93.18% identical to SARS-CoV-2. Standard protein BLAST found that this sequence represents a portion of a coronavirus helicase protein and was also homologous to SARS-like coronaviruses (Table 2). Each cDNA clone was aligned at the protein and nucleic acid levels with the most closely related coronavirus sequences (Figures 1 and 2). Phylogenetic analysis showed that the C. auratus SARS-like coronavirus sequence clusters with other SARS-like coronaviruses (Figure 3).
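The reported identities and coverages can be cross-checked with simple arithmetic. The sketch below assumes that each hit was an ungapped alignment spanning the whole fragment (152 and 88 residues), in which case 93.42% and 93.18% correspond to 142/152 and 82/88 identical residues; the text does not state these raw counts, so they are inferred. It also assumes that "coverage" means the fraction of the 29 903-nt SARS-CoV-2 reference genome (NC_045512.2) spanned by the fragment's codons, which rounds to the reported 2% and 1%; BLAST's own "query cover" is instead measured relative to the query. Function names are hypothetical.

```python
# Length of the SARS-CoV-2 reference genome (GenBank NC_045512.2), in nucleotides.
REFERENCE_GENOME_NT = 29_903


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns divided by the alignment length (gap columns included)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def implied_matches(length_aa: int, identity_pct: float) -> int:
    """Whole number of identical residues implied by a reported identity.

    Assumes (hypothetically) an ungapped alignment covering the full fragment.
    """
    return round(length_aa * identity_pct / 100)


def genome_coverage(length_aa: int, genome_nt: int = REFERENCE_GENOME_NT) -> float:
    """Percent of the genome spanned by the codons encoding a protein fragment."""
    return 100.0 * 3 * length_aa / genome_nt


if __name__ == "__main__":
    # Fragment lengths and identities as reported in the text.
    for name, length_aa, identity in [("C. auratus", 152, 93.42), ("C. idella", 88, 93.18)]:
        matches = implied_matches(length_aa, identity)
        print(
            f"{name}: {matches}/{length_aa} = {100 * matches / length_aa:.2f}% identity, "
            f"{genome_coverage(length_aa):.2f}% genome coverage"
        )
```

Run as a script, it prints `C. auratus: 142/152 = 93.42% identity, 1.52% genome coverage` and `C. idella: 82/88 = 93.18% identity, 0.88% genome coverage`. `percent_identity` counts gap columns in the denominator, in the way BLAST reports identities over the alignment length.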
1. Hosts and sources of endemic human coronaviruses.
2. Evolutionary history, potential intermediate animal host, and cross-species analyses of SARS-CoV-2.
3. Composition and divergence of coronavirus spike proteins and host ACE2 receptors predict potential intermediate hosts of SARS-CoV-2.
4. SARS virus: the beginning of the unraveling of a new coronavirus.
5. Emergence of the Middle East respiratory syndrome coronavirus.
6. Identification and characterization of hypoxia-induced genes in Carassius auratus blastulae embryonic cells using suppression subtractive hybridization.
7. Complementary DNA sequencing: expressed sequence tags and human genome project.

The author declares that there are no conflicts of interest.

http://orcid.org/0000-0002-8878-6065