key: cord-0726902-au5aqr02
authors: Calusinska, Magdalena; Marynowska, Martyna; Goux, Xavier; Lentzen, Esther; Delfosse, Philippe
title: Analysis of dsDNA and RNA viromes in methanogenic digesters reveals novel viral genetic diversity
date: 2016-01-18
journal: Environ Microbiol
DOI: 10.1111/1462-2920.13127
sha: 5167871e60f51a448cf695d781e630b2f24c47a6
doc_id: 726902
cord_uid: au5aqr02

Although viruses are not the key players of the anaerobic digestion process, they may affect the dynamics of bacterial and archaeal populations involved in biogas production. Until now viruses have received very little attention in this specific habitat; therefore, as a first step towards their characterization, we optimized a virus filtration protocol from anaerobic sludge. Afterwards, to assess dsDNA and RNA viral diversity in sludge samples from nine different reactors fed either with waste water, agricultural residues or solid municipal waste plus agro‐food residues, we performed metagenomic analyses. As a result we showed that, while the dsDNA viromes (21 assigned families in total) were dominated by dsDNA phages of the order C audovirales, RNA viruses (14 assigned families in total) were less diverse and were for the main part plant‐infecting viruses. Interestingly, less than 2% of annotated contigs were assigned as putative human and animal pathogens. Our study greatly extends the existing view of viral genetic diversity in methanogenic reactors and shows that these viral assemblages are distinct not only among the reactor types but also from nearly 30 other environments already studied, including the human gut, fermented food, deep sea sediments and other aquatic habitats.

Although bacteria and archaea are the key players of the anaerobic digestion (AD) process occurring in methanogenic reactors producing biogas, the dynamics of these communities may be affected by numerous environmental and biological factors (Chen et al., 2008) . Bacteriophage predation can regulate microbial abundance and diversity in full-scale biogas units (Shapiro et al., 2010) , according to the proposed 'phage kills the winner' strategy (Thingstad and Lingen, 1997) . Even though the predator-prey paradigm is still underdeveloped in microbial ecology, research dedicated to aquatic habitats estimated that viral activity (e.g. specific lytic infection and/or prophage induction and activation of the viral DNA replication) destroys daily around 20% of microbial cells worldwide, thus having important implications to microbial food chains, carbon turnover and the global carbon cycle (Suttle, 2007; Maurice et al., 2013) .

Viruses, being the most abundant biological units on the planet, reaching 10 31 particles globally (Hendrix, 2002) , have been detected at concentrations 10-1000 times higher in waste water treatment plants (WWTP), including AD reactors, than in any other aquatic habitat (Wommack and Colwell, 2000; Wu and Liu, 2009 ). Nevertheless, the AD viral population (virome) has received very little attention compared with the AD microbial population (microbiome). Even though bacteriophage lysis was shown to regulate microbial abundance and diversity in full-scale AD reactors treating industrial waste water (Shapiro et al., 2010) , and phages were shown to propagate in methanogenic reactors at a rate in terms of viruslike particles (VLPs) of at least 5.2 × 10 7 VLPs day −1 l −1 (Park et al., 2007) , to date there are only a few studies characterizing the viral genomic content (mainly dsDNA phages) in these environments (Wu and Liu, 2009; Tamaki et al., 2012; Bibby and Peccia, 2013a) . In the case of RNA viruses, no study has ever been conducted in a methanogenic digester.

Here, in addition to the morphological perspective, we present a detailed metagenomic characterization of dsDNA and RNA viruses present in AD reactors and reveal their huge genetic diversity in nine different sludge samples collected from WWTP, farm digesters and biogas units treating a mixture of municipal solid waste and agrofood residues. We believe that using an optimized VLP filtration protocol that excluded the multiple displacement amplification (MDA) commonly used in viral metagenomics (MDA is known to enhance chimera formation and to create random over-amplification; Marine et al., 2014) , together with a combination of both sequenceindependent and sequence-dependent data analyses, allowed us to establish a detailed and fair picture of viral assemblages in AD reactors.

Sludge samples were collected from nine anaerobic reactors treating, respectively, (i) waste water, (ii) agricultural residues, and (iii) agricultural/agro-food residues and municipal solid waste (Table 1) . To evaluate the impact of the pre-filtration and the tangential flow filtration (TFF) protocols on the efficiency of VLPs recovery, a representative sludge sample was autoclaved (to keep the sample texture close to the original but to eliminate viruses) and spiked in with a ΦX174 phage suspension. The phage ΦX174, spiked at a known amount, was used to evaluate the recovery rate of VLPs at the different steps of the virus filtration protocol (Table S1 ). According to the number of plaque-forming units (pfu ml −1 ) calculated for each fraction, the amount of the phage remained in the same order of magnitude after the pre-filtration steps. However, it decreased by onefold following the TFF, mounted with a Sartocon Slice 200 Hydrosart cassette of 0.22 μm, and ultracentrifugation steps. Longer ultracentrifugation time did not result in a higher phage recovery rate (results not shown). Although in general the VLPs filtration protocol performed well, the two last steps could introduce a quantitative bias, especially important if absolute quantification of viruses in a sample is considered (not the case in our study).

To further evaluate the differential detection of known viral families and the potential loss of some of the largest viruses that may exceed the size of 0.22 μm (Abrahao et al., 2014) , another TFF with a cassette of 0.45 μm size was tested using metagenomics on two selected samples (P6_2 and C7_2). In our case, no additional families were detected for dsDNA viromes with the cassette of 0.45 μm size in comparison to 0.22 μm, and the data for the two samples analysed with the two cassette types correlated well (Fig. S1) . Moreover, the estimated total species richness, based on contiguous sequences (contigs) spectra, was independent of the cassette pore size used (Table 2) . For RNA viromes no striking differences at the family level were observed for either of the two cassette types used. Surprisingly, the estimated viral richness was slightly higher for the 0.22 μm cassette.

Using TEM results, the majority of purified VLPs resembled well-known viral morphologies, with filamentous viruses, including flexuous filaments similar to Inoviruslike or Tymovirales VLPs and also rigid rod-shaped particles (e.g. Tobamovirus-like) being intermixed with tailed and tailless icosahedral VLPs of different size ranges. Ovoid or bacilliform morphology was also observed (Fig. 1) . Head-tailed and tailless icosahedral viruses infecting both bacteria and archaea have been previously described in methanogenic environments (Pietila et al., 2014) . Siphoviridae-like head-tailed viruses have already been reported for methanogenic digesters (Chien et al., 2013) , and their presence is consistent with the dsDNA viral genetic diversity discussed below and in a previous metagenomic study (Tamaki et al., 2012) .

The initial sequence-independent comparative analysis of metagenomic reads proposed in this study was similar to the concept developed in crAss, where crossmetagenome assembly of reads enables the comparison of entities (e.g. contigs) that are shared between samples (Dutilh et al., 2012) . However, the use of the CLC Genomic Workbench instead of crAss allowed us to perform the co-assembly (separately for dsDNA and RNA viromes) and to determine a normalized (per metagenome size and contig length) degree of relatedness (here referred to contigs spectra), within the same application (not the case for crAss that requires a separate assembly software to be used). In continuation, these relative contigs abundance spectra expressed as DNAreads per kilobase of transcript (gene) per million reads mapped (DNA-RPKM; here equalled to the number of reads mapped to the contig and normalized by the contig length and per million mappable reads), together with a contig affiliation (sequence-dependent approach), gave a view on a normalized composition profile of dsDNA and RNA viromes for each sludge sample.

In this study over 42 million (M) and 25 M raw reads, generated, respectively, for 11 and 10 metagenomic libraries prepared for dsDNA and RNA viromes (Table S2 ), were quality trimmed and co-assembled, taking advantage of the fact that virus-like sequences rare in one sample can be abundant in another, thus leading to improved contigs (Minot et al., 2013) . When the reads were mapped back to the assembled contigs, only 48.5% could be aligned for dsDNA viromes, and the collector's curve analysis showed that the detection of the potential viruses-like sequences was not completely saturated S2 ). In contrast, nearly 82% of reads could map back to the assembled RNA contigs, and the calculated rarefaction curves confirmed that most of the existing viral RNA genetic diversity in AD reactors of the three main types was characterized in this study (Fig. S2) . For dsDNA viromes a co-assembly of reads yielded over 92 K contigs, while only a third (27.9 K) was taxonomically assigned of viral origin (Tables S3 and S4) . To verify a possible microbial genomic DNA (gDNA) contamination versus potentially novel massive diversity of dsDNA viruses in AD reactors, the predicted protein features for virus-assigned and unassigned dsDNA contigs were further functionally annotated ( Fig. 2A and B) . As a result, for contigs assigned of viral origin, 56.3% of the predicted proteins could have been functionally annotated, and their functional profile was different from the non-virus assigned contigs (Pearson correlation coefficient, r of 0.375). Using the subsystems approach (Overbeek et al., 2005) , 30.95% of annotated proteins fell into the category 'phages, prophages, transposable elements, plasmids', and 14.75% were classified into 'clustering-based subsystems' (unknown function), followed by 'DNA metabolism' (10.46%). In the case of the remaining contigs (64.8 K) that could not be taxonomically assigned of viral origin, only 26.5% of their predicted proteins could have been functionally annotated ( Fig. 2A) . Their distribution into subsystems categories was similar to a typical microbial metagenome from an AD digester (r of 0.994; Magdalena Calusinska unpubl. data), with three dominant subcategories, including 'clustering-based subsystems' (16.22%), 'protein metabolism' (11.27%) and 'carbohydrates' (11.25%), pointing to bacterial gDNA contamination. Interestingly, the remaining predicted unknown protein features for virus-unassigned contigs that could not be annotated point towards a novel genetic diversity potentially of viral origin. Indeed, out of the estimated 3.9 M protein clusters for the global virome (Ignacio-Espinoza et al., 2013) , only around 5000 viral genomes were sequenced and annotated at the time the study was conducted. Moreover, as the majority of the determined open reading frames (ORFs) have no close relatives available in the public databases (Roux et al., 2014a) , the analysis of AD viromes provides strong evidence of novel viral genetic diversity in this environment.

Four DNA contigs were ≥ 40 kb in size, with the longest one being 48 kb. Two between the largest DNA contigs (contig_ 25076 encodes 61 genes and contig_23253 encodes 58 genes), representing unclassified Siphoviridae phages, were putatively complete linear phage genomes, and no partially sequenced genes were detected at both ends (Fig. 3) . Indeed, previous studies investigating viral assemblages in methanogenic reactors reported the size of some viral DNA genomes to be between 30 and 80 kb (Wu and Liu, 2009) , which is in the range of the longest contigs identified in this study. However, further sequence-independent (e.g. based on the calculated contigs spectra, thus not biased by existing databases) contig binning (identification of contigs derived from one genome) should help to re-construct additional new viral genomes, as it was reported for a novel crAssphage discovered in human faecal metagenomes (Dutilh et al., 2014) . Out of 63 contigs that formed closed circles, only six had significant nucleotide similarity (BLAST bitscore above 50) to any vial sequence in Refseq virus database. Co-assembly of the 10 RNA viromes (sample MK18 was discarded from the analysis due to insufficient concentration of the generated library) resulted in 1142 contigs, out of which only 191 were taxonomically assigned of viral origin, and 144 were classified as RNA viruses. The two largest RNA contigs were above 8 kb and putatively represent nearly complete genomes of novel ssRNA(+) Picornavirales (Fig. 3, Table S4 ). Contig_592 encodes for a putative polyprotein and shows only 21% of sequence similarity (query coverage of 39%) with Heterosigma akashiwo RNA virus (Marnaviridae/ Marnavirus). The three predicted ORFs for the second largest contig_416 encode three different polyproteins. The first two ORFs encode the nonstructural proteins involved in replication, while the last one is a putative capsid polyprotein. They share 81-90% of sequence identity with the Drosophila C virus (Dicistroviridae/Cripavirus), for the query coverage of 94-100%. In general, most of the predicted protein features for virus-assigned RNA virome contigs encode for either hypothetical proteins, large polyproteins or proteins annotated according to their size, e.g. putative 126 kDa protein. Therefore, they could not be assigned to functional categories using the subsystems approach (Fig. 2C) .

Following the report of Allen and colleagues (2013), showing that a commonly used tool PHACCS (Angly et al., 2005) tends to underestimate viral richness in environmental samples, in our study the estimation of viral richness was done using CATCHALL (Bunge et al., 2012) and normalized contigs spectra. Only contigs that were annotated of putatively viral origin were used to calculate the richness. On the one hand, this conservative approach most probably resulted in diversity indices that were minimum estimates, and any novel viruses that show little or no sequence similarity to the previously sequenced ones were not considered. On the other hand, restriction of the analyses to contigs containing ORFs with significant similarity to previously sequenced viral genes should help to reduce the overestimation of richness caused by free contaminant bacterial gDNA (Roux et al., 2013) . Spurious singletons (e.g. sequences that could not assemble in larger contigs) that can easily inflate species richness were further addressed by applying a discounted model (Allen et al., 2013) . As a result, the estimated richness of DNA viruses was in general higher in farm digesters and biogas units treating a mixture of solid waste and agrofood residues in comparison to WWTP (Table 2) . Moreover, thanks to the normalization approach chosen in our study, we could quantitatively compare the richness between the dsDNA and RNA viruses. Accordingly, the estimated richness of dsDNA viruses was outnumbered by at least two orders of magnitude that of RNA viruses present in the analysed AD reactors.

The taxonomic analysis of the 27 K contigs assigned of viral origin indicated the recovery of predominantly dsDNA phages (94.6%). In total for dsDNA viromes 21 different families were assigned; however, the number varied between 11 and 17 according to the sample origin. Around 5% of identified viruses could not be classified to the existing viral families/orders. Head-tailed phages of the order Caudovirales constituted 77% of all the viruses identified in the AD reactors studied with three families, Myoviridae, Siphoviridae and Podoviridae, that accounted here, respectively, for 13.2%, 51.5% and 6.9% of all the dsDNA viruses (Fig. 4) . Other dominant viral groups were the members of the family Tectiviridae (6% of all the DNA viruses identified), followed by Mimiviridae (1.6%) and Phycodnaviridae (1%), with the two former families being known to contain giant viruses infecting eukaryotes (Claverie et al., 2009 ).

On the one hand, the dominance of Caudoviralesrelated sequences is consistent with a previous study, where the Caudovirales order represented approximately 93.8% of identifiable viral sequences at different stages of typical waste water treatment systems in the US (Tamaki et al., 2012) . On the other hand, Caudovirales genomes account for nearly 89% of all sequenced viral genomes (NCBI Viral Genomes, August 2015), biasing current databases and what may influence their predominance in metagenomic studies. Nevertheless, according to the recent revision of the electron microscopic descriptions of prokaryotic viruses, Caudovirales represent the vast majority (over 96.3%) of all known dsDNA bacteriophages (Ackermann and Prangishvili, 2012) , supporting the reasonable assumption that Caudovirales might indeed dominate the virome of anaerobic reactors.

Prevalence of lytic ssDNA viruses (e.g. Microviridae), which play a significant ecological role by contributing to the bacterial cell lysis (Suttle, 2007) , was documented for some soil and aquatic habitats (Krupovic et al., 2011) , and was recently revised for ocean samples (Labonte and Suttle, 2013) . Nonetheless, due to the sequencing approach undertaken in our study, ssDNA viruses were not targeted; therefore, it is not known if and at what quantities they exist in AD reactors. For this reason, to draw more conclusions on how phages interact with their microbial hosts in AD environments, it is important to further analyse ssDNA viruses to have a complete view of viral assemblages. Moreover, the monitoring of temporal changes of microbial communities and the corresponding viral assemblages would enable investigation of the rate of phage-mediated bacterial and archaeal cell turnover, thus to assess the 'phage kills the winner' strategy in a methanogenic world.

DNA contigs similar to known Euryarchaeal (Siphoviridae, Salterprovirus) and Crenarchaeal (Fuselloviridae, Bicaudaviridae, Lipothrixviridae, Rudiviridae) viral families were detected in analysed sludge samples. Until now, only five methanogen-infecting viruses have been characterized, and they belong to either Siphoviridae or Myoviridae families (Pietila et al., 2014) . Here, roughly 16 contigs were of putative Siphoviridae origin, and they were most closely related to Archaeal BJ1 virus-like. Interestingly, 19 contigs of different size range (643-4461 bp) were putatively assigned to Salterprovirus, and each of the contigs encoded type-B DNA polymerase, which is a signature of this viral genus. Remarkably, at least two different genotypes of Salterprovirus-like viruses were present, one prevailing in WWTP AD reactors and the other in farm and municipal solid waste treating reactors. Until now, two genomes of Salterprovirus were sequenced, namely His1 and His2, and the viruses were shown to infect only the extremely halophilic genus Haloarcula (Bath et al., 2006) . Multiple other dsDNA contigs were assigned to unclassified archaeal viruses that were mainly similar to Haloviruses, Pyrococcus abyssi virus 1 and Hyperthermophilic Archaeal Virus 2. With regard to Crenarchaeal viruses, the highest number of contigs was affiliated with the Bicaudaviridae family.

The presence in AD reactors of numerous contigs associated with viruses previously discovered in hypersaline or hyperthermophilic environments suggests that despite the widespread presence of archaea on our planet, specific analysis of archaea-infecting viruses has only been done in a few, usually extreme environments (Prangishvili et al., 2006) . Indeed, for over 6000 bacterial viruses studied to date, less than 100 archaeal viruses have been described, and most of them infect extremophiles (Ackermann and Prangishvili, 2012) . Moreover, except for the head-tailed viruses affecting archaea, most of the characterized archaeal viruses share genome sequence similarity only with the taxonomically closely related viruses (Pietila et al., 2014) . Therefore, the identification of novel archaeal viruses that relies on a comparative analysis with sequences deposited in public databases (66 complete genomes were deposited in NCBI Viral Genomes, August 2015) does not correctly reflect their diversity in newly studied natural environments; hence, the genetic diversity of archaeal viruses still remains poorly characterized. 

Unlike the pool of dsDNA viruses in AD reactors, which were composed mainly of bacteriophages, RNA viromes, assigned to a total of 14 different families, contained mostly viruses that infect eukaryotes. In general, RNA viruses infecting bacteria are rare, while up to now there were no archaeal RNA viruses reported (Nasir et al., 2014) . In consequence, except for one contig assigned as unclassified Levivirus Acinetobacter phage AP205, the two known families of bacterial RNA viruses, namely dsRNA Cystoviridae and ssRNA Leviviridae, were not detected in AD reactors. Interestingly, 46% of virusassigned RNA contigs were potential plant pathogens and were associated with five ssRNA(+) viral families, namely Virgaviridae, Betaflexiviridae, Tombusviridae, Potyviridae and Secoviridae (Table S5 ). Based on the identity scores to viral genomes listed in Refseq, 11 known viruses were identified using the criteria of ≥ 80% sequence identity over ≥ 90% of sequence length. Several contigs represented Tobamovirus genus (tobacco mosaic virus group), including many members that are widespread and well-characterized plant pathogens, e.g. cucumber green mottle mosaic virus, pepper mild mottle virus, turnip vein-clearing virus, etc. To our surprise, WWTP and municipal solid waste treating AD reactors were the main reservoirs of these viruses with, respectively, nine and ten Tobamovirus members present, while only six were found in farm AD reactors. The prevalence of plant pathogenic viruses in WWTP AD reactors (between 59 and 97% of RNA contigs annotated as plant pathogens were present in the sludge samples from WWTP AD reactors analysed here) could be explained by their dominance in human faeces which are input substrates. The contaminated food was proposed to be the source of plant-infecting RNA viruses in human faeces (Zhang et al., 2005) .

Although the diverse plant pathogenic viruses were common in RNA viromes of different AD reactors, the values are not quantitative on an absolute basis; thus, reliable estimation of their absolute abundance cannot be performed. Nevertheless, assessment of their quantities and the viability would be a logical extension of this study, as the residue of the AD process, namely the digestate, is frequently used as a fertilizer in agriculture. Additionally, the full chain of reactors (fed reactors, postdigesters and storage tanks) should be studied to evaluate the potential of the AD cascade to reduce plant pathogenic viruses. Moreover, assessing plant pathogenic viruses in sludge from WWTPs through metagenomics could prove a useful tool to trace the human-mediated dissemination of plant viruses of potential quarantine significance to new areas of the world (Candresse et al., 2014) .

Around 0.68% of annotated DNA contigs were tentatively classified into three main viral families (Adenoviridae, Herpesviridae and Poxviridae) that include potential human pathogens (Table S4) . However, except for three contigs that were putatively annotated as human herpes viruses, none of the other DNA contigs were directly assigned as human pathogens. Whereas our study did not reveal the presence of human adenoviruses in the AD reactors of the Schifflange WWTP [results confirmed by an additional quantitative polymerase chain reaction (qPCR) assay, Table S6 ], an integrated cell culture PCR approach showed their presence in the main influent and effluent streams of the aerobic treatment tanks of the same plant (Ogorzaly et al., 2013) . Since adenoviruses represent risk to human health, their removal through AD treatment is an alternative that could be considered to re-design modern WWTPs. Nevertheless, as human adenoviruses were detected in influent and effluent sludge from other mesophilic AD reactors located in the US (Bibby and Peccia, 2013a,b) , a global monitoring of different WWTPs should be performed with a standardized procedure (e.g. excluding the MDA step that biases the final quantitative results) to assess the efficiency of the removal of human viral pathogens through the AD treatment. No annotated human gastrointestinal RNA viruses, e.g. rotaviruses, astroviruses, certain members of coronaviruses, etc., were detected in the AD reactors studied. Also, previous metagenomic analysis of human faeces from healthy individuals indicated that less than 2.9% of sequencing RNA reads resembled animal viruses (Zhang et al., 2005) , meaning that although the raw sewage represents the effluent from urban human activities, provided that a large part of the population is composed of healthy human individuals, human gastrointestinal viruses are uncommon in methanogenic reactors of WWTPs.

Putative animal-infecting viruses represented 1.12% of annotated contigs, with the majority being assigned to five viral families/ orders, including Poxviridae (mostly Entomopoxvirinae), Ascoviridae, Iridoviridae, Herpesvirales and Baculoviridae. Except for Herpesviralesassigned contigs, most of the other contigs were tentatively assigned to insect-infecting viruses, with Lepidoptera moths (many species are common pests on crops) being identified as main hosts.

Acanthamoeba polyphaga mimivirus was the first discovered giant virus, isolated from an amoebal co-culture, and classified as a new family Mimiviridae-'mimicking the dsDNA and RNA viruses in methanogenic reactors 9 microbe' (Scola et al., 2003) . Amoebas, the main hosts of these viruses, have been isolated from various environments, including sewage treatment systems; therefore, not surprisingly approximately 6% of annotated contigs, mainly from waste water AD reactors, were tentatively assigned to different giant viruses, e.g. Acanthamoeba polyphaga mimivirus, Megavirus chilensis, Acanthamoeba polyphaga moumouvirus and Pandoravirus dulcis. Giant viruses are dsDNA viruses with an extraordinary diameter of around 700 nm and a genome size of 1.2 Mb. Recent studies have brought even more attention to these viruses as they indicated that in addition to amoebas, also vertebrates, including humans, may be hosts of these viruses (Abrahao et al., 2014) .

As one of the goals of this study was to determine if the viral assemblages from different methanogenic reactors were similar, the different dsDNA and RNA viromes were compared by applying non-metric multidimensional scaling (NMDS) on annotated contigs spectra. Consistent with the NMDS results, dsDNA and RNA viromes from WWTPs were clearly distinct from the farm digesters and organic-waste treating reactors (P < 0.001), while farm and organic-waste AD reactors clustered together ( Fig. 5A and B) . The lowest stress value for A and B was 0.2. For C, the MDS coordinates were calculated with METAVIR2, and the stress value was given as 7.69% with the R 2 non-metric fit of 0.994. Moreover, further BLAST-based comparative genomic analysis (to avoid the bias from the contaminant bacterial gDNA, only contigs assigned to be of viral origin were considered) and conducted in METAVIR2 (Roux et al., 2014b) revealed that the taxonomic composition of DNA viral assemblages in AD reactors was different from 27 other selected environments, including human gut, saliva, lungs and faeces, fermented food, deep sea sediments, and other aquatic environments (Fig. 5C) . These observations suggest that viruses in AD reactors, mainly dsDNA head-tailed phages, form assemblages that are different from those found in other environments. Indeed, at the genomic level, Caudovirales are an extremely divergent group with mosaic genomes composed of conserved and variable regions. Most of their homologous nucleotide and protein sequences often display none or only 10-20% identity in pairwise comparisons (Krupovic et al., 2011) , essentially leading to a high number of hypothetical proteins discovered in viral metagenomes (Hurwitz and Sullivan, 2013) . Moreover, it has been assessed at a global scale that between 10 9 and 10 10 fully functional bacteriophage genomes are produced via nonhomologous recombination every second (Hendrix, 2009) , and once they are viable they can easily spread within the population of closely related viruses, thus keeping on generating diversity. As phages obligatorily require their bacterial hosts, and the concentration and diversity of bacteria are high in AD reactors, no wonder that extensive homologous or non-homologous re-combinations, genomic segment duplications, etc. may take place, leading to a rapid evolution and an immense and novel diversity of bacteriophages in these environments.

In spite of the fact that phages are recognized to play key roles in microbial ecology by controlling strain composition and abundance, e.g. some of the known lysogenic archaeal viruses like Pleolipoviridae were shown to cause persistent infections resulting in a continuous virus production and host growth retardation (Pietila et al., 2014) , until now viruses remain largely unstudied in methanogenic digesters. Moreover, the main interest to study viruses in this environment is not only the fact that by shaping microbial dynamics they can negatively influence the process of methane production, but also the development of a phage-mediated treatment of anaerobic sludge, by regulating the abundance of key functional microbial groups, has the potential to control some of the process problems, like foaming, sludge dewaterability and digestibility, or the competition between nuisance and functionally important bacteria (Withey et al., 2005) . Although current sequencing methodologies, e.g. metagenomics, enable the sequencing of wild viral assemblages with unprecedented depth and resolution, due to an incredibly high diversity of viral genomes, the majority of viral sequences are not yet found in databases, which prevents their complete characterization when relying on sequence homology-based identification. Therefore, further genetic analysis of gene contents of the identified phage contigs should provide more information about their lifestyle, and investigations with methods such as viral tagging (Deng et al., 2012) or digital PCR (Tadmor et al., 2011) will be required to expand our knowledge on the interactions between viruses and their hosts in AD environments.

After mixing the reactor to collect a representative sample of 10-100 l, depending on the biogas unit, a sludge volume of 50 ml was snap frozen on site and kept at −80°C until further analysis. All reactors sampled in this study were the fed reactors. Metabolic parameters, including pH, total inorganic carbon, total ammonium-nitrogen, and total solid and volatile solid contents, were determined for each AD reactor at the time of sampling. Total solids (TS, 24 h at 105°C) and volatile solids (VS, 6 h at 550°C) were determined according to the 4630 Voreinigen Deutsche Ingeniören (VDI) norm. Total inorganic carbon and ammonium-nitrogen concentration (NH4-N) were measured in conformity with the manufacturers' protocol, using the BiogasPro system (RIMU, Königsbrunn, Germany).

Sludge samples were diluted with an SM buffer to 0.5 l, implemented with 5-10 stainless milling beads (Φ5 mm) and shaken vigorously (250 r.p.m. min −1 ) for 30 min to disperse flocks and to detach the biomass and viruses from vegetal fibres, then centrifuged at 4000 g for 20 min to sediment large-size particles and bacteria. The resulting supernatant was subjected to a three-step pre-filtration with an 8 μm Sartopure nitrocellulose filter (Sartorius) placed in a Büchner funnel under vacuum, a 5 μm pore Minisart syringe filter (Sartorius) and finally with a 0.8 μm MF-Millipore membrane filter placed in a Büchner funnel. The filtrate was applied onto a TFF system (Sartorius) mounted with a Sartocon Slice 200 Hydrosart cassette of 0.2 or 0.45 μm. The obtained permeate was ultracentrifuged at 45 000 g for 3 h at 4°C, and the resulting pellet was re-suspended in 2 ml of SM buffer.

A representative sludge sample was autoclaved and spiked in with 1 ml of ΦX174 phage suspension corresponding to a concentration of 5 × 10 8 pfu ml −1 . The sample was thoroughly shaken and subjected to all of the pre-filtration and TFF steps as described above. At each step, an aliquot was retained and a plaque-forming assay was applied in accordance with ISO 10705-2 to evaluate the phage recovery. Briefly, the fractions collected in check-points 0-8 (Table S1) were serially diluted (dilutions from 10 −1 to 10 −7 ), and 1 ml of each dilution was mixed with 1 ml of warm semi-solid Modified Scholtens Agar (ssMSA) medium and 1 ml of Escherichia coli C culture (ATCC 13706, OD = 0.287) in Modified Scholtens Broth (MSB) medium. Afterwards, the mixture was poured over a plate containing Modified Scholtens Agar (MSA) agar to create a second layer. For the positive control, the same procedure was performed with 1 ml of ΦX174 phage stock suspension (10 2 pfu ml −1 ), and for the negative control only bacteria in MSB and ssMSA media were used to create a second layer. After an overnight incubation at 37°C, plaques formed on the agar were counted and the concentration of phages was calculated according to the following formula:

where X = concentration of phages in pfu ml −1 , N = total number of plaques, n1 = number of replicates per dilution, V1 = volume used, and F1 = dilution factor.

For a selected number of samples, but including each sludge origin, VLPs from 25-fold concentrated stocks were deposited on formvar-coated copper grids, stabilized with evaporated carbon film (200 mesh, S162 Plano GmbH) and negatively stained with either 2% (wt vol −1 ) uranyl acetate or 2% phosphotungstic acid. The air-dried grids were examined with a LEO 922 OMEGA (Carl Zeiss) TEM microscope operated at 120 kV. 

The software CLC Genomics Workbench 7.0 was used for reads filtering, assembly and mapping. Following trimming for adapters, quality scores and read length, 42 M (dsDNA virome) and 13 M (RNA virome) reads were used to assemble into contigs. The summary of these steps is given in Tables S2 and S3. The two de novo co-assemblies were prepared, which incorporated sequence reads that passed the quality control from all samples (dsDNA and RNA viromes separately). To determine the relative abundance and occurrence of virus-like sequences in annotated contigs, contigs spectra were prepared. Briefly, reads were de-replicated and mapped back to the de novo assembled contigs for dsDNA and RNA viromes separately, and the average abundance was calculated as DNA-RPKM. The obtained annotated contigs abundance spectra were used to calculate alpha diversity with CATCHALL version 3.0, as implemented into MOTHUR v.1.31.1 (Schloss et al., 2009 ) and rarefaction curves. CATCHALL, by comparing a suite of flexible parametric and non-parametric models, returns the best model according to statistical and heuristic criteria. Moreover, by statistically discounting low-frequency observations, it provides the discounted richness estimation as well (Bunge et al., 2012; Allen et al., 2013) . Similarity in community membership between the different DNA and RNA viromes was compared by applying NMDS on annotated contigs spectra and using traditional Jaccard similarity coefficient based on the observed richness. Assembled contigs were taxonomically assigned using a BLAST comparison with the Refseq complete viral genomes protein sequences database from NCBI (release from 2014-03-13) using BLASTP (threshold of 50 on the BLAST bitscore) as implemented in METAVIR2 (Roux et al., 2014b) . Contigs with multiple ORFs that showed highest similarity to viral genes encoded by distinct genomes were taxonomically assigned according to the best BLAST bitscore. The raw sequence data have been submitted to the NCBI Sequence Read Archive under Accession Number SRP051200. Assembled and annotated contigs are available via METAVIR2 project Accession Numbers 3506 (dsDNA virome) and 3518 (RNA virome). Contigs classified of viral origin and those that could not be classified as viruses were further functionally annotated using the MG-RAST server (Meyer et al., 2008) , and are accessible thought the MG-RAST ID Numbers 4559444.3 and 4571041.3 respectively.

Hexon-specific adenoviral primers AdF and AdR and a hydrolysis probe AdP1 (Hernroth et al., 2002) were used to amplify the major capsid protein coding gene to detect human adenovirus serotypes A, C, D, E and F. The PCR amplifications were performed in 20 μl volume reactions including 1x TaqMan Environmental Master Mix 2.0 (Applied Biosystems), each primer at the final concentration of 0.9 μM and probe at 0.225 μM. The PCR cycling conditions were as follows: 95°C for 10 min, followed by 50 cycles of 95°C for 15 s and 60°C for 1 min. A standard curve was used for absolute quantification.

Additional Supporting Information may be found in the online version of this article at the publisher's web-site: Fig. S1 . Correlation of the data obtained for dsDNA viromes for the two pore size of 0.22 μm (samples P6, C7) and 0.45 μm (samples P6_2, C7_2) during TFF. Fig. S2 . dsDNA and RNA viral communities richness. Table S1 . Evaluation of the VLPs isolation procedure -a plaque forming assay. Table S2 . Summary of the sequencing results for dsDNA and RNA viromes. Table S3 . Summary of reads assemblies for dsDNA and RNA viromes. Table S4 . Characteristics of viral dsDNA and RNA contigs. Only contigs annotated of viral origin were included in the table. Full dataset is accessible via METAVIR2 accession numbers 3506 (dsDNA virome) and 3518 (RNA virome). Table S5 . Putative plant pathogenic RNA viruses identified in sludge samples from AD reactors. Table S6 . qPCR-mediated detection of human adenoviruses in sludge samples from AD reactors.

Acanthamoeba polyphaga mimivirus and other giant viruses: an open field to outstanding discoveries

Prokaryote viruses studied by electron microscopy

Estimation of viral richness from shotgun metagenomes using a frequency count approach

PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information

His1 and His2 are distantly related, spindle-shaped haloviruses belonging to the novel virus group

Identification of viral pathogen diversity in sewage sludge by metagenome analysis

Prevalence of respiratory adenovirus species B and C in sewage sludge

Estimating population diversity with CatchAll

Appearances can be deceptive: revealing a hidden viral infection with deep sequencing in a plant quarantine context

Inhibition of anaerobic digestion process: a review

Characterization of persistent virus-like particles in two acetate-fed methanogenic reactors

Mimivirus and Mimiviridae: giant viruses with an increasing number of potential hosts, including corals and sponges

Contrasting life strategies of viruses that infect photo-and heterotrophic bacteria, as revealed by viral tagging

Referenceindependent comparative metagenomics using crossassembly: crAss

A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes

Bacteriophages: evolution of the majority

Jumbo bacteriophages

Environmental factors influencing human viral pathogens and their potential indicator organisms in the blue mussel

The Pacific Ocean virome (POV): a marine viral metagenomic dataset and associated protein clusters for quantitative viral ecology

The global virome: not as big as we thought?

Genomics of bacterial and archaeal viruses: dynamics within the prokaryotic virosphere

Previously unknown and highly divergent ssDNA viruses populate the oceans

Caught in the middle with multiple displacement amplification: the myth of pooling for avoiding multiple displacement bias in a metagenome

Linking the lytic and lysogenic bacteriophage cycles to environmental conditions, host physiology and their variability in coastal lagoons

The metagenomics RAST server -a public resource for the automatic phylogenetic and functional analysis of metagenomes

Rapid evolution of the human gut virome

The distribution and impact of viral lineages in domains of life

Two-day detection of infectious enteric and non-enteric adenoviruses by improved ICC-qPCR

The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes

Phage diversity in a methanogenic digester

Archaeal viruses and bacteriophages: comparisons and contrasts

Viruses of the archaea: a unifying view

Assessment of viral community functional potential from viral metagenomes may be hampered by contamination with cellular sequences

Ecology and evolution of viruses infecting uncultivated SUP05 bacteria as revealed by single-cell-and meta-genomics

Metavir 2: new tools for viral metagenome comparison and assembled virome analysis

Introducing mothur: open-source, platform-independent, communitysupported software for describing and comparing microbial communities

A giant virus in Amoebae

Bacteriophage predation regulates microbial abundance and diversity in a full-scale bioreactor treating industrial wastewater

Marine viruses-major players in the global ecosystem

Probing individual environmental bacteria for viruses by using microfluidic digital PCR

Metagenomic analysis of DNA viruses in a wastewater treatment plant in tropical climate

Theoretical models for the control of bacterial growth rate, abundance, diversity and carbon demand

Bacteriophages-potential for application in wastewater treatment processes

Virioplankton: viruses in aquatic ecosystems

Determination of virus abundance, diversity and distribution in a municipal wastewater treatment plant

RNA viral community in human feces: prevalence of plant pathogenic viruses

This work was supported by the Fonds National de la Recherche, Luxembourg (FNR CORE 2011 Project GASPOP, C11/SR/1280949: Influence of the Reactor Design and the Operational Parameters on the Dynamics of the Microbial Consortia Involved in the Biomethanation Process). We thank Cécile Walczak and Leslie Ogorzaly from the ERIN department of our institute for their help and suggestions during the VLPs filtration and TEM preparations. We thank Bénédicte De Vos, Anaïs Noo and Marie Fossépré for their technical support. The authors are most thankful to the members of the Biogas-Vereenegung ASBL, to François Delion (SOVACOM sarl), the members of the ASBL Au pays de l'Attert, and the partnership of the Ecobiogaz project of the INTERREG IV A programme for providing a year-round access to the biogas plants and for their help in sampling the reactors for sludge. Waste water plant managers and their respective technical staff are also warmly acknowledged for their contribution in sampling the reactors on their plants: André Detaille, Jean-Jacques Schmit and Régine Reinert (STEP, Bettembourg), Gerry Bissen and Gianni Di Pentima (SIVEC, Schifflange), Piero Daresta, Roland Troes, and Raymond Erpelding (SIACH, Pétange). We kindly thank Jillian M. Lenné for English correction. We would also like to thank the editor Dr Victoria Orphan and the anonymous reviewers for their critical revision that helped to improve this manuscript. The authors have declared that no competing interests exist.