key: cord-314415-yr0uxok2
authors: Guo, Zijing; He, Qifu; Tang, Cheng; Zhang, Bin; Yue, Hua
title: Identification and genomic characterization of a novel CRESS DNA virus from a calf with severe hemorrhagic enteritis in China
date: 2018-08-15
journal: Virus Res
DOI: 10.1016/j.virusres.2018.07.015
sha:
doc_id: 314415
cord_uid: yr0uxok2
In this study, a novel circular replication-associated protein (Rep)-encoding single-stranded (CRESS) DNA virus was discovered in a diarrheic sample from a calf with severe hemorrhagic enteritis. The virus, named Bo-Circo-like virus CH, has a circular genome of 3909 nucleotides (nt). Six putative open reading frames (ORFs) were identified, encoding Rep, capsid (Cap) and four proteins of unknown function. In genome size and in the number and organization of its ORFs, Bo-Circo-like virus CH is most closely related to Po-Circo-like virus 21, which was detected in pig faeces. A preliminary survey using specific primers for the Rep region showed that 5.3% (4/75) of diarrheic samples were positive for Bo-Circo-like virus, whereas all 42 healthy samples were negative. In conclusion, our results indicate that Bo-Circo-like virus CH may represent a new virus in bovines. Further investigation is needed to determine the relationship between infection with the virus and diarrhea.
Viruses with a circular ssDNA genome are found in archaea, bacteria, and eukaryotic organisms (Shulman and Davidson, 2017). To date, the CRESS DNA viruses have been classified into six families by the International Committee on Taxonomy of Viruses (ICTV), namely Geminiviridae, Nanoviridae, Circoviridae, Genomoviridae, Bacilladnaviridae and Smacoviridae. Viruses of the families Geminiviridae, Nanoviridae and Bacilladnaviridae are known to infect plants (Ramesh et al., 2017; Rybicki and Martin, 2014; Kazlauskas et al., 2017).
Viruses of the new families Smacoviridae and Genomoviridae have been identified in faecal samples of various vertebrates, including humans and bovines (Krupovic et al., 2016; Steel et al., 2016; Kim et al., 2012). The family Circoviridae includes the genera Circovirus and Cyclovirus, which widely infect humans and certain animals (Fu et al., 2018; Hattermann et al., 2003; Li et al., 2011; Lorincz et al., 2011; Phan et al., 2014). The genomes of circoviruses and cycloviruses range in size from 1.7 to 2.1 kb and contain two major ORFs, which encode the Rep and Cap proteins (Breitbart et al., 2017). The taxonomic classification of the family Circoviridae suggests that Rep identities and genome organization should be used to assign a genome to either the Circovirus or the Cyclovirus genus (Rosario et al., 2017). Furthermore, a new species demarcation threshold of 80% genome-wide pairwise identity has been set for members of the family Circoviridae (Breitbart et al., 2017; Rosario et al., 2017).
In addition to the six recognized CRESS DNA viral families, a large number of novel CRESS DNA viruses, still awaiting classification by ICTV, have been discovered in the faeces of humans and animals through viral metagenomics in recent years. The hosts include chimpanzees (Blinkova et al., 2010), pigs (Zhang et al., 2014), rodents (Phan et al., 2011), bats (Ge et al., 2011), humans (Castrignano et al., 2013), dromedaries (Woo et al., 2014), ducks, deer, llamas, chamois and bovines (Steel et al., 2016), flying foxes (Male et al., 2016), chickens (Lima et al., 2017) and macaques (Kapusinszky et al., 2017). Among these novel CRESS DNA viruses, a group of viruses with large genomes has been proposed as the family Kirkoviridae on the basis of genomic and Rep-phylogenetic characteristics (Li et al., 2015; Zhao et al., 2017).
To date, seven reported CRESS DNA virus genomes can be assigned to the proposed family Kirkoviridae (Li et al., 2015; Shan et al., 2011; Zhao et al., 2017): four genomes from the feces of healthy and diarrhoeic pigs, one genome from the liver and spleen of a horse with fatal idiopathic hepatopathy (Li et al., 2015) and two genomes from the feces of patients with type 1 diabetes (Zhao et al., 2017). A distinguishing characteristic of the viruses in the proposed family Kirkoviridae is their large genomes, which range in size from 2833 to 3923 nt and carry three to six ORFs. In addition, the Rep proteins of all these viruses show significant genetic distance from those of viruses within the family Circoviridae. Although one member of the proposed family, Kirkovirus Equ1, has been suggested to be potentially related to disease (Li et al., 2015), the pathogenesis of these viruses is still unknown.
In this study, a novel bovine-derived CRESS DNA virus was found in a diarrheic faecal sample through viral metagenomics. The fecal sample was collected from a 4-month-old Simmental cross calf with severe hemorrhagic enteritis accompanied by dyspnea and fever in October 2016 in Sichuan Province, China. The sample was suspended in phosphate-buffered saline (PBS) to create a 10% suspension and centrifuged at 12 000 × g for 10 min at 4°C. The supernatant was then filtered through a 0.22 μm filter (Millipore) to remove intact bacteria and other large cellular debris. The filtrate was treated with 10 U DNase and 1.5 mg RNase (TaKaRa) at 37°C for 90 min to remove unprotected nucleic acids. Total nucleic acid was extracted from the sample using a QIAamp Viral RNA Mini kit (Qiagen) following the manufacturer's instructions. Reverse transcription was performed using SuperScript III reverse transcriptase (Invitrogen) and random hexamers (Invitrogen).
The quantity of cDNA was determined with a Qubit Fluorometer (Life Technologies). A 230 ng sample of cDNA was used to construct a library according to the manufacturer's instructions (TruSeq RNA sample preparation kit) and sequenced on a HiSeq 4000 (Illumina), as described previously (Chen et al., 2015). Raw sequence reads were trimmed to remove adaptor sequences, duplicate reads and bovine genomic sequences, and a minimum length of 150 bp was required. Reads that passed this processing were considered useful sequences. In parallel, all useful reads were subjected to de novo contig assembly using IDBA_UD (Peng et al., 2012), and the assembly results were evaluated using SOAP2 software (http://soap.genomics.org.cn/). The assembled data were aligned with sequences in the NCBI non-redundant nucleotide database (NT) and the non-redundant protein database (NR) using BLASTN and BLASTX, respectively. The taxonomies of the aligned reads with the best BLAST values (E value < 10^-4) were selected and used for further grouping analysis. Viral abundance was assessed with SOAPaligner software to further analyse the diversity of the species identified in the calf diarrheic sample.
A total of 5 211 770 bases were generated from the Illumina sequencing run, and approximately 2.1% of the sequence data matched mammalian viruses. The raw sequence reads from the viral metagenomic libraries were deposited in the GenBank database under accession number SRP108885. Seven known mammalian viruses were found: bovine enterovirus (8785 reads), bovine kobuvirus (549 reads), bovine herpesvirus 1 (199 reads), nebovirus (107 reads), bovine coronavirus (50 reads), bovine adenovirus type 3 (5 reads) and bocaparvovirus (2 reads); each read was 150 bp long. Notably, 122 reads showed significant sequence similarity to Po-Circo-like viruses (GenBank accession nos. NC_025682.1 and JF713717.1), and these reads were assembled into two large contigs.
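
The best-hit selection described above (keep, for each query, the hit with the lowest E value below 10^-4) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes BLAST results saved in the standard 12-column tabular layout (`-outfmt 6`), where the E value is the 11th column, and the example rows are hypothetical values made up for demonstration.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-4  # threshold stated in the text


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} with the lowest-E-value hit per query.

    Assumes BLAST tabular output (outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue  # hit is not significant at the chosen cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical rows for illustration only (not data from the study).
example = "\n".join([
    "contig_1\tNC_025682.1\t90.0\t500\t50\t0\t1\t500\t1\t500\t1e-50\t600",
    "contig_1\tJF713717.1\t85.0\t400\t60\t0\t1\t400\t1\t400\t1e-20\t300",
    "contig_2\tsubject_x\t70.0\t50\t15\t0\t1\t50\t1\t50\t0.5\t20",
])
print(best_hits(example))
```

In this toy input, `contig_2` is dropped because its only hit has an E value above the cutoff, and `contig_1` keeps the more significant of its two hits.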
One contig consists of 2277 nt (position: 3904-3912-0-2291 nt) and the other consists of 1231 nt (position: 2728-3912-0-48 nt), with positions given relative to the genome (3912 nt) of strain Po-Circo-like virus 21 (GenBank accession no. NC_025682.1).
To obtain the full-length genome of the CRESS DNA virus and to verify the viral metagenomic sequences, four sets of primers were designed based on the viral metagenomic sequencing data (GenBank accession no. SRP108885). The primer sequences are shown in Table 1. PCR was performed with Premix Taq™ (TaKaRa, China) following the manufacturer's protocol. The PCR amplification products were gel-purified using a gel extraction kit (OMEGA, USA), cloned into the pMD19-T vector and transformed into Escherichia coli DH5α competent cells (Fig. 1). The recombinant plasmids were extracted using a plasmid extraction kit (OMEGA, USA) and sequenced by Sangon Biotech (Chengdu, China). The sequences were assembled using SeqMan software (version 7.0; DNASTAR, USA). Putative ORFs in the circular genome were identified using the ORF Finder tool (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). Hairpin and stem-loop structures were identified with the Mfold web server (Zuker, 2003). Nucleotide and deduced amino acid sequences were analyzed using the MegAlign program of DNASTAR 7.0 software (DNASTAR Inc., WI, USA) to determine sequence homology. MEGA 7.0 was used to perform the multiple sequence alignment and subsequently to build a maximum-likelihood phylogenetic tree using the LG model with 1000 bootstrap replicates (Kumar et al., 2016).
The novel circular DNA virus has a complete genome of 3909 nt with a G+C content of 38.76% (GenBank accession no. MH316857). Six putative ORFs (> 100 aa) were identified, encoding a Rep protein of 307 aa, a Cap protein of 180 aa and four proteins of unknown function of 185, 129, 106 and 104 aa.
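
As an illustration of the two genome statistics reported above (G+C content and ORFs longer than 100 aa on a circular genome), the following sketch scans both strands of a circular sequence. It is not the ORF Finder implementation: it assumes ATG start codons and the standard stop codons only, and handles ORFs that cross the arbitrary origin by reading the sequence doubled. The toy genome is hypothetical; the real sequence would come from GenBank accession MH316857.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def circular_orfs(genome, min_aa=100):
    """List (strand, start, length_aa) for ATG-initiated ORFs on a circular genome.

    For '-' strand ORFs, start is a coordinate on the reverse-complement strand.
    Only the longest ORF ending at each stop codon is kept.
    """
    n = len(genome)
    longest = {}  # (strand, stop position on the circle) -> (start, length_aa)
    for strand, seq in (("+", genome.upper()), ("-", reverse_complement(genome))):
        doubled = seq + seq  # lets an ORF run across the arbitrary origin
        for start in range(n):
            if doubled[start:start + 3] != "ATG":
                continue
            pos = start
            while pos + 3 <= start + n and doubled[pos:pos + 3] not in STOP_CODONS:
                pos += 3
            if pos + 3 > start + n:
                continue  # no stop codon within one turn of the circle
            length_aa = (pos - start) // 3  # codons from ATG up to, not including, the stop
            key = (strand, pos % n)
            if length_aa >= min_aa and length_aa > longest.get(key, (0, 0))[1]:
                longest[key] = (start, length_aa)
    return sorted((strand, start, aa) for (strand, _), (start, aa) in longest.items())


# Hypothetical toy genome: one 6-codon ORF (ATG + 5 x GCA) followed by TAA.
toy = "ATG" + "GCA" * 5 + "TAA" + "CCCCC"
rotated = toy[10:] + toy[:10]  # same circle, origin moved into the ORF
print(circular_orfs(toy, min_aa=5))
print(circular_orfs(rotated, min_aa=5))
```

Rotating the toy sequence moves the origin into the middle of the ORF, and the same ORF is still recovered, which is the property that matters for a circular genome. Calling `circular_orfs(genome)` with the default `min_aa=100` corresponds to the "> 100 aa" cutoff used in the text.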
A stem-loop structure was predicted close to the putative Rep gene stop codon in the intergenic region (Fig. 2). The genome size and the number and organization of the encoded ORFs were similar to those of the viruses of the recently proposed family Kirkoviridae (Li et al., 2015; Shan et al., 2011; Zhao et al., 2017). Because of the remarkable diversity in genome size, number of ORFs and genome sequence among these CRESS DNA viruses, a phylogenetic tree can only be constructed for them from the Rep gene, which is the most conserved gene in the genomes of these novel viruses (Castrignano et al., 2017; Varsani and Krupovic, 2017).
The genome size and the number and organization of the encoded ORFs indicated that Bo-Circo-like virus CH has its closest genetic relationship with Po-Circo-like virus 21. However, two ORFs (129 and 104 residues) of the virus had no significant similarity to any ORF in GenBank, and a recombination event appears to have occurred in one ORF (185 residues). Recombination and horizontal acquisition of non-homologous genes are significant features of ssDNA viruses in general (Krupovic and Koonin, 2014; Lefeuvre et al., 2009; Martin et al., 2011). This phenomenon has also been observed in CRESS DNA viruses (Kazlauskas et al., 2017, 2018; Lefeuvre and Moriones, 2015) and may contribute to their evolution.
The putative Rep protein (307 residues) of Bo-Circo-like virus CH shares 93.9% and 88.1% aa identity with those of Po-Circo-like virus 21 and Po-Circo-like virus 22, respectively, but only 35.7-51.5% aa identity with those of the other five reported members of the proposed family Kirkoviridae.
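
Pairwise identities such as those quoted above are computed from aligned sequences. The sketch below shows one common definition, identical residues divided by the number of aligned columns in which both sequences have a residue. The exact formula used by MegAlign is not stated in the text, so this definition is an assumption, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences ('-' marks a gap).

    Assumed definition: matches / columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
print(percent_identity("MKT-LV", "MRTALV"))
```

Other tools count gap columns in the denominator, which gives lower values for the same alignment, so identities from different programs are not always directly comparable.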
Rep sequences of representative members of the families Circoviridae, Nanoviridae, Geminiviridae, Genomoviridae, Bacilladnaviridae and Smacoviridae, together with still unassigned novel ssDNA viruses available in the GenBank database, were used to construct a maximum-likelihood phylogenetic tree. The result showed that Bo-Circo-like virus CH clusters into an independent branch with the seven reported strains of the proposed family Kirkoviridae and eight CRESS DNA virus strains recently submitted to the GenBank database; Bo-Circo-like virus CH is most closely related to Po-Circo-like virus and shows significant genetic differences from viruses in the families Circoviridae, Nanoviridae, Geminiviridae, Genomoviridae, Bacilladnaviridae and Smacoviridae (Fig. 3). Further analysis of the 16 strains in the new clade showed that they have large genomes, ranging in size from 2833 to 4905 nt, with three to six ORFs. The hosts include humans, macaques, cattle, pigs, horses, dromedaries and rodents. Although the classification criteria of the proposed family Kirkoviridae are not yet clear (Li et al., 2015; Zhao et al., 2017), it seems that these viruses may be classified into the proposed family Kirkoviridae.
The third-largest ORF (180 residues) of Bo-Circo-like virus CH was predicted to encode the Cap protein because a high proportion (8/19) of the basic amino acids arginine and lysine was observed near the N-terminus of the ORF, which is characteristic of circovirus Cap proteins (Stewart et al., 2006). Additionally, a BLASTp search using the ORF returned the putative Cap protein of Macaca mulatta cg10456 (GenBank accession no. KU043416.1; query cover (QC) = 81%, identity (I) = 48%, E value (Ev) = 2e-28). The putative Cap protein of Bo-Circo-like virus CH also matches hypothetical proteins of six strains of the proposed family Kirkoviridae, the exception being Kirkovirus Equ1.
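
The N-terminal basic-residue criterion mentioned above (8 of 19 residues being arginine or lysine) is simple to check programmatically. The sketch below assumes a window of the first 19 residues, which is how the "8/19" figure is read here; the example sequence is hypothetical and is not the Cap protein of Bo-Circo-like virus CH.

```python
def basic_residues_n_terminus(protein, window=19):
    """Return (number of R/K residues, window length) for the N-terminal window."""
    n_term = protein[:window].upper()
    basic = sum(n_term.count(residue) for residue in "RK")
    return basic, len(n_term)


# Hypothetical sequence chosen to reproduce an 8/19 ratio, for illustration only.
example_protein = "MRKRAAKRSSRKTTRAAAA" + "GGG"
basic, length = basic_residues_n_terminus(example_protein)
print(f"{basic}/{length} basic residues ({100 * basic / length:.1f}%)")
```

A ratio of 8/19 corresponds to about 42% arginine and lysine in the window.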
The putative Cap protein of Bo-Circo-like virus CH shares 80.6% and 80.1% aa identity with those of Po-Circo-like virus 21 and Po-Circo-like virus 22, respectively, and 38.9-46.4% aa identity with those of the other four members of the proposed family Kirkoviridae.
Four ORFs of unknown function were also identified in Bo-Circo-like virus CH. Two of them (129 and 104 residues) had no significant similarity to any ORF in the GenBank database, including the ORFs of viruses belonging to the proposed family Kirkoviridae. Of the other two ORFs, one (106 residues) matched only a putative protein of Po-Circo-like virus 21 (QC = 74%, I = 90%, Ev = 1e-18) in a BLASTp search. Interestingly, for the other ORF (185 residues), the N-terminal fragment (aa 1-70) showed 71% aa identity with Po-Circo-like virus 22, whereas this region is deleted in Po-Circo-like virus 21; the second region (aa 71-185) showed only 21.4% aa identity with the two-component system sensor histidine kinase BasS of Pectobacterium atrosepticum (GenBank accession no. WP_011095521.1), which suggests a recombination event in this ORF. Because the parental nt sequences are lacking in the GenBank database, recombination analysis could not be taken further. This is not surprising, since high rates of recombination in CRESS DNA genomes have been reported (Kazlauskas et al., 2018; Lefeuvre and Moriones, 2015).
To investigate the presence of this newly identified virus in other cattle, 75 diarrheic and 42 healthy faecal samples were collected from farms in the provinces of Sichuan (two farms), Henan (three farms), Liaoning (three farms) and Shandong (three farms), China. A primer set was designed based on conserved Rep sequences. The primer sequences were as follows: F: 5′-AAGAACACCTGGATGAAGGAACGC-3′; R: 5′-GCCAGTCATCAATCACAACCCTCT-3′. The amplified fragment was 540 bp. Viral DNA was extracted from individual calf fecal samples using phenol-chloroform.
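
Basic properties of the two screening primers given above can be checked with a few lines of code. The sketch below reports length and G+C fraction and gives a rough melting-temperature estimate using the Wallace rule (2 °C per A/T, 4 °C per G/C). The Wallace rule is a swapped-in rule of thumb that is only approximate for oligonucleotides of this length; the text does not state how the primers were designed or which annealing temperature was used.

```python
FORWARD = "AAGAACACCTGGATGAAGGAACGC"  # primer F from the text
REVERSE = "GCCAGTCATCAATCACAACCCTCT"  # primer R from the text

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq):
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, primer in (("F", FORWARD), ("R", REVERSE)):
    print(name, len(primer), f"GC={gc_fraction(primer):.0%}", f"Tm~{wallace_tm(primer)} C")
```

Both primers are 24 nt long with 50% G+C, so the rule of thumb gives the same estimate for each. `reverse_complement(REVERSE)` gives the sequence that would be searched for on the genome's plus strand when locating the 540-bp amplicon.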
The PCR amplification products were gel-purified using a gel extraction kit (OMEGA, USA), cloned into the pMD19-T vector and transformed into Escherichia coli DH5α competent cells. The recombinant plasmids were extracted using a plasmid extraction kit (OMEGA, USA) and sequenced by Sangon Biotech (Chengdu, China).
Among the bovine fecal samples, approximately 5.3% (4/75) of diarrheic samples were positive for Bo-Circo-like virus by PCR, and all 42 healthy samples were negative. The four positive samples came from two farms (detection rates: 2/14 and 2/23, respectively), and the two farms are more than 1000 km apart. The 540-bp sequences of the four Bo-Circo-like viruses shared 98.2%-99.3% nt identity with Bo-Circo-like virus CH and 90.2%-90.4% nt identity with Po-Circo-like virus 21. Phylogenetic analysis of these partial Rep nt sequences showed that all Bo-Circo-like viruses clustered into an independent branch and were most closely related to Po-Circo-like virus 21 (Fig. 4). The PCR screening sequence fragments were deposited in GenBank under accession numbers MH316858-MH973561.
Although the number of fecal samples was limited, all Bo-Circo-like viruses in this study were detected in diarrheic faecal samples. Recently, Kirkovirus Equ1, a member of the proposed family Kirkoviridae found in the liver and spleen of a horse with fatal idiopathic hepatopathy, was suggested to be potentially related to disease (Li et al., 2015). Thus, further investigation is needed to determine the pathogenesis of the Bo-Circo-like virus.
Fig. 2. Predicted genome organization of strain Bo-Circo-like virus CH. Six ORFs (> 100 aa) were predicted in strain Bo-Circo-like virus CH, encoding Cap, Rep and four proteins of unknown function. The potential stem-loop was predicted close to the putative Rep gene stop codon in the intergenic region; the detailed nucleotide sequence is also presented.
Fig. 3. Phylogenetic analysis based on the amino acid sequence of ORF1, which presumably encodes the Rep protein. The sequence alignment included strain Bo-Circo-like virus CH from this study, representative members of the Circoviridae, Geminiviridae, Nanoviridae, Genomoviridae, Bacilladnaviridae and Smacoviridae families, the proposed new genera of kirkoviruses, and still unassigned novel CRESS DNA viruses with the best BLASTp matches in the GenBank database. The tree was constructed by the maximum-likelihood method with the LG model and 1000 bootstrap replicates in MEGA 7.0 software. Only bootstrap values > 50% are shown. The strain from this study is marked with a red box, and the seven reported strains of the proposed family Kirkoviridae are marked with black boxes. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).
In conclusion, a novel bovine-derived CRESS DNA virus, named Bo-Circo-like virus CH, was identified in a calf with fatal hemorrhagic enteritis in China. It has a circular genome of 3909 nt with six putative ORFs, encoding the Rep protein, the Cap protein and four proteins of unknown function. Based on genome organization and phylogenetic analysis, Bo-Circo-like virus CH may represent a new virus in the proposed family Kirkoviridae. Further investigation is needed to determine the pathogenesis of the virus.
The raw sequence reads from the viral metagenomic libraries have been deposited at NCBI as BioProject, BioSample and SRA records under accession numbers PRJNA389792, SAMN07208491 and SRP108885, respectively. The complete genome sequence of Bo-Circo-like virus CH has been deposited in GenBank under accession no. MH316857; the four partial Rep sequences of Bo-Circo-like virus have been deposited in GenBank under accession nos. MH316858-MF973561.
This work was funded by the Sichuan Province Applied Foundation Project (grant number 2017JY0066), the Innovation team for animal epidemic diseases prevention and control on Qinghai-Tibet Plateau, State Ethnic Affairs Commission (grant number 13TD0057) and the Innovative Research Project of Graduate Students in Southwest University for Nationalities (grant number CX2017SZ057).
The authors declare that they have no conflicts of interest.
This article does not contain any studies with human participants or animals performed by any of the authors, and all experiments in this study comply with ethical standards for research.
Fig. 4. Phylogenetic analysis based on partial nucleotide sequences (540 nt) of ORF1, which presumably encodes Rep. The sequence alignment included the five Bo-Circo-like virus strains detected in this study and the seven reported strains of the proposed family Kirkoviridae. The tree was constructed by the maximum-likelihood method with the Hasegawa-Kishino-Yano model and 1000 bootstrap replicates in MEGA 7.0 software. Only bootstrap values > 50% are shown. The strains from this study are marked with red boxes. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).
Novel circular DNA viruses in stool samples of wild-living chimpanzees
ICTV Report Consortium, 2017.
ICTV virus taxonomy profile: Circoviridae
Two novel circo-like viruses detected in human feces: complete genome sequencing and electron microscopy analysis
Identification of circo-like virus-Brazil genomic sequences in raw sewage from the metropolitan area of São Paulo: evidence of circulation two and three years after the first detection
A novel astrovirus species in the gut of yaks with diarrhoea in the Qinghai-Tibetan Plateau
Insights into the epidemic characteristics and evolutionary history of the novel porcine circovirus type 3 in southern China
Genetic diversity of novel circular ssDNA viruses in bats in China
Cloning and sequencing of Duck circovirus (DuCV)
Case control comparison of enteric viromes in captive rhesus macaques with acute or idiopathic chronic diarrhea
Evolutionary history of ssDNA bacilladnaviruses features horizontal acquisition of the capsid gene from ssRNA nodaviruses
Pervasive chimerism in the replication-associated proteins of uncultured single-stranded DNA viruses
Identification of a novel single-stranded, circular DNA virus from bovine stool
Evolution of eukaryotic single-stranded DNA viruses of the Bidnaviridae family from genes of four other groups of widely different viruses
Genomoviridae: a new family of widespread single-stranded DNA viruses
MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets
Mycovirus-like DNA virus sequences from cattle serum and human brain and serum samples from multiple sclerosis patients
Recombination as a motor of host switches and virus emergence: geminiviruses as case studies
Widely conserved recombination patterns among single-stranded DNA viruses
Possible cross-species transmission of circoviruses and cycloviruses among farm animals
Exploring the virome of diseased horses
Fecal virome of healthy chickens reveals a large diversity of the eukaryote viral community, including novel circular single-stranded DNA viruses
First detection and analysis of a fish circovirus
Cycloviruses, gemycircularviruses and other novel replication-associated protein encoding circular viruses in Pacific flying fox (Pteropus tonganus) faeces
Recombination in eukaryotic single stranded DNA viruses
IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth
The fecal viral flora of wild rodents
Cyclovirus in nasopharyngeal aspirates of Chilean children with respiratory infections
Geminiviruses and plant hosts: a closer examination of the molecular arms race
Revisiting the taxonomy of the family Circoviridae: establishment of the genus Cyclovirus and removal of the genus Gyrovirus
Virus-derived ssDNA vectors for the expression of foreign proteins in plants
The fecal virome of pigs on a high-density farm
Viruses with circular single-stranded DNA genomes are everywhere!
Circular replication-associated protein encoding DNA viruses identified in the faecal matter of various animals in New Zealand
Identification of a novel circovirus in Australian ravens (Corvus coronoides) with feather disease
Sequence-based taxonomic framework for the classification of uncultured single-stranded DNA viruses of the family Genomoviridae
Smacoviridae: a new family of animal-associated single-stranded DNA viruses
Plasma virome of cattle from forest region revealed diverse small circular ssDNA viral genomes
Metagenomic analysis of viromes of dromedary camel fecal samples reveals large number and high diversity of circoviruses and picobirnaviruses
Viral metagenomics analysis demonstrates the diversity of viral flora in piglet diarrhoeic faeces in China
Intestinal virome changes precede autoimmunity in type I diabetes-susceptible children
Mfold web server for nucleic acid folding and hybridization prediction