key: cord-343131-tu6g977q authors: Cheung, Andrew K.; Ng, Terry F.; Lager, Kelly M.; Bayles, Darrell O.; Alt, David P.; Delwart, Eric L.; Pogranichniy, Roman M.; Kehrli, Marcus E. title: A divergent clade of circular single-stranded DNA viruses from pig feces date: 2013-04-24 journal: Arch Virol DOI: 10.1007/s00705-013-1701-z sha: doc_id: 343131 cord_uid: tu6g977q
Using metagenomics and molecular cloning methods, we characterized five novel small, circular viral genomes from pig feces that are distantly related to chimpanzee and porcine stool-associated circular viruses (ChiSCV and PoSCV1). Phylogenetic analysis placed these viruses into a highly divergent clade of this rapidly growing new viral family. This new clade of viruses, provisionally named porcine stool-associated circular virus 2 and 3 (PoSCV2 and PoSCV3), encodes a stem-loop structure (presumably the origin of DNA replication) in the small intergenic region and a replication initiator protein commonly found in other biological systems that replicate their genomes via the rolling-circle mechanism. Furthermore, these viruses also exhibit three additional overlapping open reading frames in the large intergenic region between the capsid and replication initiator protein genes.
coronaviruses and enteroviruses were detected in these samples that were subsequently submitted to the National Animal Disease Center for additional study. Fecal samples from six pigs were pooled and processed to prepare a viral nucleic acid library. Briefly, viral particles were first purified using size filtration and nucleases [8]. The extracted viral nucleic acids were amplified by random PCR with a specific nucleotide sequence tag for identification. Several libraries, each prepared with a different sequence tag, were combined, subjected to 454 pyrosequencing, and analyzed as described previously [4]. Sequencing was performed on a Roche FLX sequencer using Titanium chemistry (Roche, Branford, CT).
For comparative purposes, the best BLASTx results were used to categorize the sequences (contigs and singletons) into virus family and genus. Of the 1,296,370 total keypass reads generated, 125,282 reads contained sequence tags belonging to the six pooled fecal samples. Positive sequence reads for a taxonomic group were identified based on deduced protein sequence similarity using a stringent cutoff: a best BLASTx expectation value of ≤ 10^-10. Sequence reads at this cutoff level exhibited highly significant protein sequence similarities with known viruses in the database. Viral sequences (coronavirus, enterovirus, rotavirus) corresponding to the viruses identified by the Indiana Animal Disease Diagnostic Laboratory (West Lafayette, IN) were detected. Other viral sequences belonging to RNA virus families (astrovirus, picobirnavirus, teschovirus, torovirus and sapelovirus) and DNA virus families (anellovirus, circovirus, and parvovirus) were also observed. Several sequences encoding amino acid sequences related to Rep of ChiSCV and PoSCV1 were identified.
The ChiSCV- and PoSCV1-related nucleotide sequences detected by deep sequencing (designated Tp1 and Tp2) were used to design primers for PCR. DNA amplification employing converging primers (conventional PCR) was used to confirm the presence of contig sequences in the sample, and diverging primers (inverse PCR) were used to amplify and clone the complete circular viral genomes. Nucleic acids were extracted directly from fecal samples using a QIAamp MinElute Virus Vacuum Kit (QIAGEN, Valencia, CA) and subjected to rolling-circle amplification to amplify circular DNA molecules (Illustra GenomiPhi V2 DNA Amplification Kit, GE Healthcare Biosciences, Piscataway, NJ). The amplified DNA was used as a template for PCR using converging or diverging primers based on the 454 pyrosequencing results.
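The read-classification step described above (keeping the best BLASTx hit per read when its expectation value is at or below 10^-10) can be illustrated with a minimal sketch. It assumes the BLASTx results are available in the standard 12-column tab-separated format (query ID in column 1, subject ID in column 2, E-value in column 11); the function names and the subject-to-family lookup table are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-10  # cutoff stated in the text


def best_hits(tabular_lines, max_evalue=E_VALUE_CUTOFF):
    """Return {read_id: (subject_id, evalue)} with the lowest-E-value hit per read.

    Assumes 12-column tab-separated BLAST output; hits whose E-value
    exceeds max_evalue are discarded.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        read_id, subject_id = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue
        if read_id not in best or evalue < best[read_id][1]:
            best[read_id] = (subject_id, evalue)
    return best


def count_by_family(best, subject_to_family):
    """Tally classified reads per virus family.

    subject_to_family is a hypothetical lookup (subject ID -> family name);
    subjects missing from it are counted as "unclassified".
    """
    return Counter(
        subject_to_family.get(subject_id, "unclassified")
        for subject_id, _ in best.values()
    )
```

Given an open BLAST tabular file, `best_hits(handle)` would return one entry per read that passes the cutoff, and `count_by_family` would then summarize those reads by family for a table like the one implied in the text.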
The amplicons were resolved on and excised from agarose gels, cloned into plasmid TOPO-CLX104 and introduced into Escherichia coli TOP10 (Invitrogen, Carlsbad, CA) by transformation. Multiple clones were picked and sequenced using Sanger methods. From the Tp1 PCR product, three clones were analyzed, and they all yielded identical sequences. This viral genome was designated porcine stool-associated circular virus 2 (PoSCV2; GenBank accession number KC545226). From the Tp2 PCR product, four variant genomes were obtained, and the individual genomes were designated PoSCV3-4L5, -3L7, -LT2 and -4L13 with GenBank accession numbers KC545229, KC545227, KC545230 and KC545228, respectively.
Similar to the genomes of other SCVs, the Tp clones (PoSCV2 and all four PoSCV3 clones) were about 2.5 kb in length (Fig. 1a). The viral genomes can be divided into four regions: two large ORFs with deduced amino acid sequences exhibiting homology to the Rep and Cap of ChiSCV, a large intergenic region (LIR) that encodes multiple overlapping ORFs, and a small intergenic region (SIR) that contains a palindromic sequence capable of forming a stem-loop structure. The Rep ORF and Cap ORF are transcribed divergently from the LIR and converge at the SIR. Unlike the LIR of PoSCV1, which encodes two small ORFs (ORF3 and ORF4) in the same orientation as the Cap gene, the LIRs of PoSCV2 and PoSCV3 contain an additional ORF (ORF5) in the opposite orientation to the Cap gene. The four PoSCV3 genomes were aligned, and a schematic representation is shown in Fig. 2a. The LIR, Cap region and 5′ portion of the Rep region exhibited few to no nucleotide differences. Genetic differences were concentrated around the stem-loop structure in the SIR and the 3′ portion of the Rep ORF. The four genome regions (SIR, Rep ORF, Cap ORF and LIR) are described individually in greater detail below.
SIR: The SIR sequences of PoSCV2 and PoSCV3 are shown in Fig. 2b.
Whereas the Rep ORF of PoSCV3-4L5 overlaps the stem-loop structure, the other four PoSCV3 genomes do not. All five genomes contain a palindromic sequence in the SIR that is capable of forming a stem-loop structure whose nucleotide sequence is well conserved. This stem-loop structure may be part of the origin of DNA replication. Among the PoSCV3 genomes, the SIR sequences on the Cap-gene side are more conserved, while sequences on the Rep-gene side exhibit the greatest differences.
Rep ORF: Phylogenetic and pairwise identity analyses were conducted to determine the relationship of the Tp clones to other viruses. A phylogram was created based on the deduced amino acid sequences encoded by the Rep gene (Fig. 2c). The amino acid sequences were aligned using MAFFT 5.8 [2] with the E-INS-i alignment strategy and previously described parameters [5, 6]. A maximum-likelihood tree was created using RAxML based on the MAFFT alignment with previously described parameters [6, 10]. The resulting tree was midpoint rooted using MEGA4 [11]. Pairwise identity analysis of the PoSCV genes and ORFs was also performed using MEGA4 [11]. The results showed that the Tp clones were most closely related to ChiSCV and PoSCV1 but clustered into a distinct clade, within which PoSCV2 and PoSCV3 separated into two sub-groups. There is limited amino acid sequence identity (23-32 %) between the Tp clones and bovine SCV (BoSCV) [3] or PigSCV. The amino acid sequence identities for the Tp:PoSCV1 and Tp:ChiSCV comparisons were approximately 50 % and 40 %, respectively (Table 1a). The nucleotide or amino acid sequence identity between PoSCV2 and PoSCV3 was approximately 87 %, and the sequence identity among the PoSCV3 variants was 93-100 %. In addition, the rolling-circle replication (RCR) amino acid sequence motifs (RCR-I, RCR-II, RCR-III, Walker A and Walker B) commonly found in Rep proteins involved in RCR were detected [9] (Fig. 1b). These motifs were conserved among members of this new clade.
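As an illustration of what a pairwise identity value such as those quoted above represents, the following minimal sketch computes percent identity between two rows of an existing alignment. It is not the MEGA4 procedure: the rule used here (skip any column containing a gap, then divide matches by the number of compared columns) is an assumption chosen for simplicity, and the function names are hypothetical.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two equal-length aligned sequences.

    Columns where either sequence has a gap are ignored (an assumed
    convention; other tools may treat gaps differently).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_table(alignment):
    """Return {(name_a, name_b): percent identity} for every pair of sequences.

    alignment maps sequence names to aligned sequences of equal length.
    """
    return {
        (name_a, name_b): percent_identity(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(alignment, 2)
    }
```

Applied to aligned Rep, Cap, or ORF5 sequences, `identity_table` would produce the kind of pairwise values summarized in Table 1, although the exact numbers depend on the alignment and on how gaps are handled.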
Cap ORF: The deduced Cap protein sequences of selected SCVs were compared (Table 1b). There is limited amino acid sequence identity (17-26 %) between the Tp clones and BoSCV, ChiSCV, PigSCV or PoSCV1. Between PoSCV2 and PoSCV3, the nucleotide sequence identity of the Cap gene (46-48 %) was lower than the amino acid sequence identity (60-62 %). In general, the Rep gene is more conserved than the Cap gene across the ssDNA viruses. Therefore, it was unusual to find that the nucleotide and amino acid sequence identities among the PoSCV3 Cap genes (99-100 %) were higher than those of the Rep genes (94-100 %).
LIR: The LIR nucleotide sequence identity between PoSCV2 and PoSCV3 was 70.2 %, and the sequences of credence to the speculation that either ORF may code for an important functional domain or protein. The LIRs of PoSCV2 and PoSCV3 also code for an additional ORF5 that is transcribed in the opposite orientation to the Cap gene and overlaps ORF3 and ORF4. The ORF5 amino acid sequence identity between PoSCV2 and PoSCV3 was approximately 59 %, and the sequence identity among the PoSCV3 variants was 99-100 % (Table 1c). Thus, the amino acid sequence identity of ORF5 between PoSCV2 and PoSCV3 was almost as high as that of the capsid protein (61 %).
In this work, we report a clade of novel viruses that includes PoSCV2 and PoSCV3, which encode a Rep-like protein and a palindromic sequence capable of forming a stem-loop structure (in the SIR), suggesting that their genomes may replicate via a common RCR mechanism. Interestingly, this clade of viruses encodes three overlapping "conserved" ORFs (ORF3, ORF4 and ORF5) in the LIR. Whereas the amino acid sequence identities between PoSCV2 and PoSCV3 for these ORFs range from 58.9 % to 68.6 %, the amino acid sequence identities among the capsid proteins range from 60.7 % to 64.1 %. Whether these additional ORFs code for functionally important proteins is not known. Likewise, the role of these viruses in any disease is unknown.
The growing diversity of SCV-related genomes currently reported in the stool of chimpanzees, cows, and pigs likely portends further identification in other mammalian species. However, it remains to be seen whether these stool-associated viruses replicate in the host or are pass-through viruses present in the diet. Confirmation of their host and organ tropisms will require detection of SCV-specific antibodies or finding virions in animal tissues. A high level of co-infection involving numerous known viruses (coronavirus, enterovirus, rotavirus, astrovirus, picobirnavirus, teschovirus, torovirus, sapelovirus, anellovirus, circovirus and parvovirus) was detected in just six animals in this study. This report and the work of others demonstrate the growing complexity of the pig virome and the challenge of understanding the biology, interactions and significance of these newly discovered viruses.
References:
1. Novel circular DNA viruses in stool samples of wild-living chimpanzees
2. MAFFT version 5: improvement in accuracy of multiple sequence alignment
3. Identification of a novel single-stranded, circular DNA virus from bovine stool
4. Diversity of viruses detected by deep sequencing in pigs from a common background
5. Metagenomic identification of a novel anellovirus in Pacific harbor seal (Phoca vitulina richardsii) lung samples and its detection in samples from multiple years
6. High variety of known and new RNA and DNA viruses of diverse origins in untreated sewage
7. Simultaneous identification of DNA and RNA viruses present in pig faeces using process-controlled deep sequencing
8. The fecal virome of pigs on a high-density farm
9. Discovery of a novel circular single-stranded DNA virus from porcine faeces
10. A rapid bootstrap algorithm for the RAxML Web servers
11. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0
Acknowledgments: The authors thank N. Otis, L. Hobbs, D. Michael and M. Woodruff for technical assistance and S. Ohlendorf for manuscript preparation. T.F.N. and E.L.D. were supported by R01 HL105770.