key: cord-0069242-fc56dyrp
authors: Jurasz, Henryk; Pawłowski, Tomasz; Perlejewski, Karol
title: Contamination Issue in Viral Metagenomics: Problems, Solutions, and Clinical Perspectives
date: 2021-10-20
journal: Front Microbiol
DOI: 10.3389/fmicb.2021.745076
sha: 89e68135afb5e6143a370bf8a9561a95833470f7
doc_id: 69242
cord_uid: fc56dyrp

We describe the most common internal and external sources and types of contamination encountered in viral metagenomic studies and discuss their negative impact on sequencing results, particularly for low-biomass samples and clinical applications. We also propose basic recommendations for reducing background noise in viral shotgun metagenomic (SM) studies, which would limit the bias introduced by various classes of contaminants. Regardless of the specific viral SM protocol, contamination cannot be totally avoided; in particular, reagent contamination should always be addressed as a high priority. There is an urgent need to develop and validate standards for viral metagenomic studies, especially if viral SM protocols are to be more widely applied in diagnostics.

Next-generation sequencing (NGS) techniques, combined with the development of computational tools, have led to an explosion of metagenomic studies in the past decade (Chiu and Miller, 2019; Lewandowski et al., 2019). Metagenomics is defined as the direct analysis of whole microbial communities based on DNA/RNA extracted from clinical or environmental samples (Huson and Mitra, 2012). Such analysis allows the detection of known and unknown microorganisms. It also provides insights into pathogen-host interactions and into the epidemiology, ecology, and evolution of organisms found across various ecosystems (Forbes et al., 2017; Chiu and Miller, 2019). Although microbial research remains dominated by bacterial 16S rRNA gene sequencing studies, these new techniques have also been applied to viral analysis (Ladner et al., 2014; Moustafa et al., 2017; Kufner et al., 2019). Shotgun metagenomics (SM) is currently the most widely used technique for analyzing viral DNA and RNA in a given environment (Conceicao-Neto et al., 2015; Forbes et al., 2017). It has been successfully introduced into clinical practice to support the diagnosis of systemic infections and has occasionally identified novel viral species (Palacios et al., 2008; Foulongne et al., 2011; Lipowski et al., 2017).

While SM is being used to characterize the virome with various workflows, it still faces numerous challenges (Schlaberg et al., 2017; Boers et al., 2019). These include choosing the best extraction and sequencing methods, the need for host genomic background depletion, the need for computational resources and highly specialized bioinformaticians, and delivering relevant results fast enough to be of clinical value. Overall, the SM approach has allowed comprehensive surveys of never-before-seen viral communities (Moreno-Gallego et al., 2019; Waldvogel-Abramowski et al., 2019; Perlejewski et al., 2020b). However, SM also detects external contaminant nucleic acids and cross-contamination, which can affect the interpretation of microbiome data (Xu et al., 2013; Laurence et al., 2014). So far, contamination in microbial sequencing studies has mostly been discussed with regard to amplicon target sequencing (ATM; 16S rRNA gene sequencing) and has largely focused on bacterial bias (Karstens et al., 2019). Such contamination effects are common, as several studies have found contaminant microbial DNA in laboratory reagents and on laboratory surfaces (Salter et al., 2014; Eisenhofer et al., 2019; Stinson et al., 2019). Several groups have also reported genomic contaminants in viral SM data. Even so, there are no established criteria for examining and/or reporting contamination in virome-focused studies (Moustafa et al., 2017; Zolfo et al., 2019; Perlejewski et al., 2020a). This review emphasizes the impact of contaminants on viral studies, especially those using low-biomass samples, and proposes recommendations to minimize their effect.

Different sample types and SM protocols affect the composition of the genetic background found in viral metagenomics. Contaminants may therefore be represented by external host/human or bacterial DNA. They may also appear as sequencing reads aligned to genomes of non-sample viral, fungal, protozoal, or even plant species (Perlejewski et al., 2015; Moustafa et al., 2017; Asplund et al., 2019). Specific contaminants often go unreported in viral metagenomic studies. Most viral SM research focuses only on viral hits and rarely aligns NGS reads to genomes other than those of the host and viruses. There are two major types of contamination in viral SM studies: external and internal (Figure 1; Davis et al., 2018; Eisenhofer et al., 2019).

External contamination originates outside the sample during specimen collection and preparation. Its sources can include:

• the skin of patients or investigators (Kitchin et al., 1990; Meadow et al., 2015);
• clinical and laboratory equipment (Mukherjee et al., 2015; Llamas et al., 2017);
• collection tubes (Motley et al., 2014);
• contaminated laboratory surfaces or air (Bittinger et al., 2014);
• extraction kits and polymerase chain reaction (PCR) reagents (Grahn et al., 2003; Tilburg et al., 2010; Salter et al., 2014);
• even molecular biology-grade water (Nogami et al., 1998; Kulakov et al., 2002; Keki et al., 2013).

Manufacturers usually do not guarantee the absence of contaminating DNA in their products, and reagents/kits sold as sterile may contain low-abundance external DNA (van der Horst et al., 2013). Most external contaminants in microbiome studies have a unique profile specific to particular reagents and kits. They are therefore often referred to as the kitome and are largely indistinguishable from microbiome signals derived from the analyzed samples (van der Zee et al., 2002; Salter et al., 2014; Sabatier et al., 2020). A specific kitome can be detected and characterized. However, the types and quantities of reagent contaminants vary between different extraction/PCR kits and between batches of the same reagent (Salter et al., 2014). True DNA/RNA signals are reproducible and associated with individual samples. Reagent contamination signals, by contrast, are linked predominantly to specific batches or even reagent lots (Salter et al., 2014; de Goffau et al., 2018). For example, Glassing et al. (2016) analyzed the MoBio DNA Extraction kit (QIAGEN, Hilden, Germany). They showed that 69% of the dominant bacterial genera were the same in different lots of the kit, whereas the composition of minor genera was lot-dependent.
It has therefore been recommended that all samples in a project be processed with the same reagent batches/lots. Kit batch should also be included as a factor in the statistical analysis whenever multiple batches are used (Kim et al., 2017).
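True signals track individual samples, whereas kitome signals track reagent lots, so recorded lot numbers can be used to screen detections. The sketch below is a hypothetical illustration, not a method from the cited studies. It assumes a set of detected taxa per sample and a mapping from each sample to its extraction-kit lot. It then flags taxa seen in at least two samples that all share a single lot, provided more than one lot was used in the project.

```python
from collections import defaultdict


def lot_restricted_taxa(detections, lots, min_samples=2):
    """Flag taxa whose detections all come from samples processed with one reagent lot.

    detections: dict of sample ID -> set of detected taxa (hypothetical input format)
    lots: dict of sample ID -> extraction-kit lot number
    Returns a dict of flagged taxon -> the single lot it was detected in.
    """
    # With only one lot in the project, lot restriction carries no information.
    if len(set(lots.values())) < 2:
        return {}
    samples_per_taxon = defaultdict(list)
    for sample, taxa in detections.items():
        for taxon in taxa:
            samples_per_taxon[taxon].append(sample)
    flagged = {}
    for taxon, samples in samples_per_taxon.items():
        taxon_lots = {lots[s] for s in samples}
        # A taxon recurring in several samples but confined to one lot behaves like kitome.
        if len(samples) >= min_samples and len(taxon_lots) == 1:
            flagged[taxon] = taxon_lots.pop()
    return flagged


detections = {
    "S1": {"Ralstonia", "virus_X"},
    "S2": {"Ralstonia"},
    "S3": {"Bradyrhizobium"},
    "S4": {"Bradyrhizobium", "virus_X"},
}
lots = {"S1": "lot_A", "S2": "lot_A", "S3": "lot_B", "S4": "lot_B"}
print(lot_restricted_taxa(detections, lots))
```

In this toy example virus_X is detected with both lots and is not flagged. A taxon flagged this way is a candidate for closer inspection, not proof of contamination. With only a few samples per lot, genuine sample-specific signals can also appear lot-restricted.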

Neither laboratories nor sequencing facilities appear to be free from contamination, and this external DNA noise can change over time (Salter et al., 2014). For example, Weyrich et al. (2019) analyzed ultraclean ancient DNA laboratories for over 5 years and three modern molecular biology laboratories for 1 year. Each laboratory had its own unique microbial profile, which changed over time according to month and season. In another study, three different laboratories performed 16S rRNA sequencing of the same Salmonella bongori control using different batches of the same extraction kit (FastDNA Spin Kit for Soil; MP Biomedicals, Santa Ana, CA, United States). They obtained three different microbial profiles. This variation in contaminant content could result from differences between kit batches and other reagents. Alternatively, it may represent contaminants specific to each laboratory environment and its investigators (Salter et al., 2014; Kim et al., 2017).

Extraction kits appear to be the major source of external nucleic acid noise in microbiome studies (Evans et al., 2003; Salter et al., 2014; Smuts et al., 2014; Zhi et al., 2014; Sabatier et al., 2020). Glassing et al. (2016) identified 88 bacterial genera in commonly used DNA extraction kits. It has also been estimated that 10-50% of the bacterial profiles in lower-airway human samples are contaminants, mainly originating from extraction kits (Drengenes et al., 2019). Commercial extraction kits were found to contain higher microbial diversity and more human-associated bacterial taxa than in-house extraction protocols. A different genetic background, with a significantly higher prevalence of contaminants, was reported for manual compared with automated extraction systems. This is not unexpected: manual extractions require more manual transfer steps than the single-tube spin-column approach, which increases the risk of external contamination.

RNA sequencing is more susceptible to contamination than DNA sequencing because of the additional reverse transcription (RT) step (Strong et al., 2014). In addition, commercially available RT enzymes have been found to contain viral contaminants such as equine infectious anemia virus or murine leukemia virus (MuLV) (Zheng et al., 2011; Wally et al., 2019).

DNA and RNA SM protocols may include an amplification step to generate a sufficient amount of DNA/cDNA for sequencing libraries (Malboeuf et al., 2013). A number of studies have documented external DNA in various commercial polymerases (Bottger, 1990; Schmidt et al., 1991; Hughes et al., 1994). For example, microbial contaminants were reported in six commercially available Taq polymerases (Iulia et al., 2013). The amount of contaminants in recombinant Taq polymerase has been estimated at between 10 and 1,000 genome equivalents of microbial DNA per unit of enzyme (Spangler et al., 2009). Other potential sources of contaminants include PCR buffers, MgCl2 stocks, and primers prepared with water containing contaminant DNA (Stinson et al., 2019). Given the nature of SM and the need to analyze low-biomass samples, whole-genome amplification (WGA) is often used to generate templates suitable for sequencing (Thoendel et al., 2017). Three commercial WGA-DNA kits were tested (Illustra V2 Genomiphi, Illustra single cell Genomiphi, and Qiagen REPLI-g single cell kits), and each was found to contain a wide variety of microbial contaminant DNA (Thoendel et al., 2017). DNA background noise in WGA methods could come from amplification of contaminant DNA or from nonspecific extension of random primers (Blainey and Quake, 2011). However, the consistent and highly specific contamination profile found in most individual WGA-DNA kits suggests that the former plays the dominant role (Thoendel et al., 2017). DNA background was also reported in studies using WGA-RNA kits to analyze cerebrospinal fluid (CSF) and synovial fluid samples (Malboeuf et al., 2013; Perlejewski et al., 2015, 2016; Masters et al., 2018).
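The per-unit estimate from Spangler et al. (2009) translates directly into an expected contaminant load per reaction. The sketch below assumes 1.25 U of Taq polymerase per reaction. This commonly used amount is chosen here only for illustration; substitute the amount used in your own protocol.

```python
def contaminant_genome_equivalents(units_of_enzyme, ge_per_unit_low=10, ge_per_unit_high=1000):
    """Return the (low, high) range of contaminating microbial genome equivalents
    contributed by a given amount of recombinant Taq polymerase, using the
    10-1,000 genome equivalents per unit range cited in the text."""
    return units_of_enzyme * ge_per_unit_low, units_of_enzyme * ge_per_unit_high


units_per_reaction = 1.25  # assumption: illustrative enzyme amount per reaction
low, high = contaminant_genome_equivalents(units_per_reaction)
print(f"{low:g} to {high:g} contaminating genome equivalents per reaction")
```

Under this assumption, a single reaction may carry roughly 12 to 1,250 microbial genome equivalents. That can be comparable to, or exceed, the number of target viral genomes in a low-biomass extract.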
WGA-RNA sequencing performed on clinical samples (CSF, swabs, and serum) and surrogate CSF samples (spiked with three 1:100 dilutions of influenza A H3N2 virus) using WTA2 kit (Sigma-Aldrich, St. Louis, MO, United States) resulted in the detection of a wide range of bacterial and viral contaminants. However, it should be noted that this background noise could have also originated from extraction kits and reagents used for the depletion of host genetic material (Oechslin et al., 2018) .

The final step of wet-laboratory SM protocols is sequencing (Garmaeva et al., 2019). Currently, the most widely used method is sequencing by synthesis marketed by Illumina (San Diego, CA, United States), owing to its low cost, high yield, and wide availability (Kim et al., 2020). Despite their numerous advantages, Illumina sequencing platforms share a common challenge related to the phage PhiX174 control (approximately 5.4 kb) used for quality and calibration assessment (Manley et al., 2016). PhiX174 sequences should be removed from the final data. Nevertheless, Mukherjee et al. (2015) reported that approximately 5.5% of publicly available microbial genomes in the Integrated Microbial Genomes database are contaminated by PhiX174. Moreover, 10% of these contaminated genomes have been published in peer-reviewed scientific papers.
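PhiX174 reads are usually removed before analysis by alignment or k-mer matching against the PhiX174 reference, for example with BBDuk or Bowtie2. The sketch below shows the k-mer principle only, under three assumptions:

• TOY_REF is a short made-up sequence standing in for the full PhiX174 genome, which in practice would be loaded from a reference FASTA file.
• The k-mer size is an illustrative choice.
• The 50% match threshold is an illustrative choice.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

# Hypothetical stand-in for the PhiX174 reference; NOT the real PhiX174 sequence.
TOY_REF = "ACGTTGCATGTCGCATGATGCATGAGAGCTTACGGA"


def revcomp(seq):
    """Reverse complement of an uppercase nucleotide sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmer_set(seq, k):
    """All k-mers of a sequence and of its reverse complement."""
    kmers = set()
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def is_control_read(read, ref_kmers, k, min_fraction=0.5):
    """True if at least min_fraction of the read's k-mers occur in the reference."""
    n = len(read) - k + 1
    if n <= 0:
        return False
    hits = sum(read[i:i + k] in ref_kmers for i in range(n))
    return hits / n >= min_fraction


ref_kmers = kmer_set(TOY_REF, k=8)
reads = [TOY_REF[5:25], revcomp(TOY_REF[3:30]), "T" * 20]
kept = [r for r in reads if not is_control_read(r, ref_kmers, k=8)]
print(kept)
```

Both reference-derived reads are removed, including the reverse-complemented one, because the k-mer set covers both strands. Only the unrelated read is kept.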

Cross-contamination is the most challenging of the numerous sources of internal contamination encountered in microbial sequencing (Olomu et al., 2020). This form of contamination results from the transfer of genetic material, amplicons, or barcodes between reaction tubes or wells (Carlsen et al., 2012; Poore et al., 2020). It can occur at any step of the SM protocol (Tamariz et al., 2006; Joung et al., 2017). Causes include incorrect pipetting, accidental splashes of liquid, aerosol generation, and incorrect tube opening or plate cover removal. The risk increases when a large batch of samples undergoes extraction and/or library preparation, especially when tube strips without individual caps or reaction plates are used (Lejal et al., 2020; Olomu et al., 2020). Lee et al. (2016) analyzed influenza A/H3N2 virus from nasal/nasopharyngeal/throat swabs. They found specimen-to-specimen cross-contamination to be significantly more common with high-throughput whole-genome sequencing (HT-WGS) than with Sanger sequencing. Well-to-well contamination affects primarily neighboring samples, but occasionally even those 10 wells apart. Minich et al. (2019) studied no-template controls (NTCs) and found evidence of well-to-well contamination in 47.5% of blanks for tube-based and 95.7% of blanks for plate-based DNA extractions. This effect was more common in low-biomass samples, thus negatively affecting microbial alpha and beta diversity metrics. To limit well-to-well contamination, it has been proposed to keep a gap of at least four wells between high- and low-biomass samples (Olomu et al., 2020).
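The four-well recommendation of Olomu et al. (2020) can be checked automatically when a plate layout is designed. The sketch below makes three assumptions:

• The plate is a standard 96-well plate (rows A-H, columns 1-12).
• Separation is measured as the number of wells between two positions along the larger of the row or column offset, so diagonal neighbours count as adjacent.
• The "four-well gap" means at least four intervening wells.

The distance measure and the interpretation of the gap are both assumptions.

```python
import re
from itertools import product


def parse_well(well):
    """Convert a 96-well position such as 'B7' to zero-based (row, column)."""
    match = re.fullmatch(r"([A-H])(\d{1,2})", well)
    if match is None or not 1 <= int(match.group(2)) <= 12:
        raise ValueError(f"not a 96-well plate position: {well}")
    return ord(match.group(1)) - ord("A"), int(match.group(2)) - 1


def wells_between(a, b):
    """Number of intervening wells, using the larger of the row/column offsets."""
    (row_a, col_a), (row_b, col_b) = parse_well(a), parse_well(b)
    return max(abs(row_a - row_b), abs(col_a - col_b)) - 1


def layout_violations(high_biomass, low_biomass, min_wells_between=4):
    """List (high, low) well pairs that are closer than the required gap."""
    return [
        (hi, lo)
        for hi, lo in product(high_biomass, low_biomass)
        if wells_between(hi, lo) < min_wells_between
    ]


print(layout_violations(["A1"], ["A6", "B3"]))
```

Here A6 is exactly four wells from A1 and passes, whereas B3 is too close. Well-to-well contamination occasionally reaches wells 10 positions away, so passing this check reduces but does not remove the risk.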

Another type of cross-contamination is run-to-run contamination observed on MiSeq sequencers (Illumina, San Diego, CA, United States). It may persist for as many as seven sequential runs following the original run (Brumme and Poon, 2017; Eisenhofer et al., 2019). However, modifications to the post-run wash procedure, mainly the addition of a bleach wash, largely solved this problem (Brumme and Poon, 2017).

Another type of internal contamination is a phenomenon called "index hopping" or "index switching." It is the main cause of incorrect sample assignment of sequencing reads in multiplexed, pooled libraries (Griffiths et al., 2018). Index hopping refers to reads from one NGS library being assigned to the barcode of a different library sequenced in the same pool (Costello et al., 2018). This effect is largely due to an excess of free index primers (Carlsen et al., 2012; Sinha et al., 2017; Costello et al., 2018). Together with the cluster generation reagents, these primers are randomly incorporated into library molecules from other samples pooled in the same sequencing run. According to Sinha et al. (2017), up to 5-10% of all sequencing reads are misassigned from one sample to another in a multiplexed pool sequenced on the Illumina HiSeq 4000 platform. Index hopping is also a well-known phenomenon for the MinION sequencer (Oxford Nanopore Technologies, Oxford, Great Britain), where 0.056% of reads were found to have incorrectly assigned barcodes. Index switching reduces the value of negative controls in sequencing runs, because NTCs and analyzed samples may contain the same sequences. True signals then cannot be distinguished from background noise (Hornung et al., 2019). To reduce index switching, unique dual indexing and dual-matched indexed adapters with unique molecular indices are recommended (MacConaill et al., 2018; van der Valk et al., 2020).
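A simplified demultiplexing step shows the benefit of unique dual indexing. The sketch below assumes exact index matching; real demultiplexing software typically tolerates a small number of mismatches. The index sequences are hypothetical. Every sample has its own i7/i5 pair. A read whose two indexes come from different samples, the typical signature of index hopping, matches no expected pair. It is set aside as undetermined rather than assigned to a sample.

```python
def demultiplex(read_indexes, sample_sheet):
    """Assign reads to samples by exact match of their (i7, i5) index pair.

    read_indexes: dict of read ID -> (i7, i5) observed on the read
    sample_sheet: dict of sample ID -> expected (i7, i5) pair
    Returns (assigned, undetermined): reads per sample and unmatched read IDs.
    """
    pair_to_sample = {pair: sample for sample, pair in sample_sheet.items()}
    assigned, undetermined = {}, []
    for read_id, pair in read_indexes.items():
        sample = pair_to_sample.get(pair)
        if sample is None:
            undetermined.append(read_id)
        else:
            assigned.setdefault(sample, []).append(read_id)
    return assigned, undetermined


# Unique dual indexes: no i7 or i5 sequence is shared between samples.
udi_sheet = {"S1": ("AAGGTT", "CCTTAA"), "S2": ("GGAACC", "TTCCGG")}
reads = {
    "r1": ("AAGGTT", "CCTTAA"),  # genuine S1 read
    "r2": ("GGAACC", "TTCCGG"),  # genuine S2 read
    "r3": ("AAGGTT", "TTCCGG"),  # hopped: i7 from S1, i5 from S2
}
print(demultiplex(reads, udi_sheet))
```

Combinatorial indexing reuses i7 and i5 sequences across samples. A third sample could then legitimately carry the pair ("AAGGTT", "TTCCGG"), and read r3 would be silently assigned to it. This is why unique dual indexes are recommended.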

Internal contamination in microbiome sequencing can also be caused by DNA damage and polymerase errors (Brandariz-Fontes et al., 2015; Hornung et al., 2019). One study evaluating 13 commercial polymerases found that enzyme choice has a large impact on the proportion of correct reads recovered from a multiple-gene sequencing run (from 17 to 71%) (Brandariz-Fontes et al., 2015). Amplification bias can cause nucleotide misincorporation, chimera generation, and variation in the amplification efficiency of high- and low-GC fragments (Brodin et al., 2013; Shugay et al., 2014). Sequencing of GC-poor regions on Illumina platforms is typically less efficient. This limits uniform read coverage across the genome and thus affects viral genome assembly in SM analysis (Kozarewa et al., 2009; Chen Y. C. et al., 2013). High-fidelity polymerases offer a partial solution to amplification errors, with up to 100 times lower error rates and lower chimera generation rates (Sze and Schloss, 2019). PCR conditions also play a significant role in the generation of internal contaminants. Fewer PCR cycles have been shown to produce less background noise in microbial profiling studies (Quail et al., 2011; Sze and Schloss, 2019). Errors can also arise during various stages of the sequencing procedure, such as cluster amplification, sequencing cycles, and image analysis. These result in base-calling error rates of approximately 0.1-1%, depending on the sequencing platform (Fox et al., 2014).
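The quoted base-calling error rates can be translated into per-read terms. The sketch below assumes independent errors at a constant per-base rate p and an illustrative read length L of 150 bp. Under these assumptions:

• a read carries on average pL errors;
• a read is error-free with probability (1 - p)^L;
• the corresponding Phred quality is -10 log10(p).

```python
import math


def read_error_summary(per_base_error, read_length):
    """Expected errors per read, probability of an error-free read, and Phred score,
    assuming independent errors at a constant per-base rate."""
    expected_errors = per_base_error * read_length
    p_error_free = (1 - per_base_error) ** read_length
    phred = -10 * math.log10(per_base_error)
    return expected_errors, p_error_free, phred


for p in (0.001, 0.01):  # the 0.1% and 1% bounds cited in the text
    errors, error_free, q = read_error_summary(p, read_length=150)
    print(f"p={p}: Q{q:.0f}, {errors:.2f} expected errors/read, {error_free:.1%} error-free reads")
```

At 0.1% (Q30), a 150-bp read carries about 0.15 errors and roughly 86% of reads are error-free. At 1% (Q20), these figures become about 1.5 errors and roughly 22%.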

Finally, internal contamination may result from erroneous bioinformatic read classification (Hornung et al., 2019; Ye et al., 2019). The current literature describes dozens of virus-specific classification workflows based on different strategies (Wommack et al., 2012; Zhao et al., 2017; Nooij et al., 2018; Kieft et al., 2020). They range from simple one-step tools to virome analyses that combine five or more steps and a variety of algorithms. Nooij et al. (2018) evaluated 49 different workflows suitable for viral classification. The positive predictive value was generally high (>75%), although some classifiers had lower and more variable precision: IMSA+A (9%), Kraken (34%), NBC (49%), and vFam (3-73%). Taxonomic classifiers also apply different default thresholds for false-positive signal detection, from 0.01 to 0.5%. Depending on the workflow, this results in tens (Bracken, MetaPhlAn2) or even thousands (Centrifuge, CLARK, Kaiju, MMseqs2, PathSeq) of false-positive hits (Ye et al., 2019).
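One common way to apply such thresholds is to discard taxa whose share of classified reads falls below a minimum relative abundance. The sketch below assumes a simple dictionary of read counts per taxon, with hypothetical names and counts. It is not the filtering logic of any specific classifier named above.

```python
def filter_by_relative_abundance(counts, min_fraction):
    """Keep taxa whose fraction of all classified reads is at least min_fraction.

    counts: dict of taxon -> number of classified reads (hypothetical input)
    min_fraction: threshold as a fraction, e.g. 0.0001 for 0.01% or 0.005 for 0.5%
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n for taxon, n in counts.items() if n / total >= min_fraction}


counts = {"taxon_A": 9000, "taxon_B": 950, "taxon_C": 45, "taxon_D": 5}
print(filter_by_relative_abundance(counts, 0.0001))  # 0.01% threshold
print(filter_by_relative_abundance(counts, 0.005))   # 0.5% threshold
```

At 0.01% all four taxa are retained. At 0.5%, taxon_C (0.45%) and taxon_D (0.05%) are removed. A stricter threshold suppresses more false positives but can also discard a genuine low-abundance virus, which is a particular concern in low-biomass samples.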

False read assignments in microbiome studies may also arise from contamination of public databases. For instance, more than 330,000 bp in the reference genomes of Plasmodium gaboni and Plasmodium falciparum were found to be contaminated with human sequences (Kryukov and Imanishi, 2016). Merchant et al. (2014) discovered that the Neisseria gonorrhoeae TCDC-NG08107 genome submitted to GenBank contained fragments of cattle and sheep genomes. Similarly, The Cancer Genome Atlas database was found to be contaminated by human papillomavirus type 38 originating from RNA sequencing of human endometrial samples (Kazemian et al., 2015). The previously mentioned PhiX174 contamination also illustrates the scale and range of microbial database contamination (Mukherjee et al., 2015).

The impact of contamination is especially significant for low microbial biomass samples, where the proportion of background noise increases as input template decreases (Malboeuf et al., 2013; Karstens et al., 2019). Biomass can be evaluated by comparing the amount of DNA/RNA extracted from the studied sample with the amount of genetic material isolated from NTCs in the same SM protocol. Samples classified as low biomass typically contain amounts of DNA/RNA similar to NTCs, whereas rich samples contain significantly more genetic material than blank controls (Lauder et al., 2016). Contaminants can easily dominate low-biomass samples, generating background noise much higher than the true signal from the target virus (Malboeuf et al., 2013; Salter et al., 2014). So far, a wide variety of environmental and clinical samples containing low viral biomass have been studied with SM workflows. These include air (Prussin et al., 2019), glacier ice (Zhong et al., 2020), human skin (Tirosh et al., 2018), nasal swabs (Altan et al., 2019), and CSF (Perlejewski et al., 2020b; Perlejewski et al., 2020c). The most widely used library preparation kits require inputs as low as 1 ng of DNA (e.g., Illumina Nextera XT), but even this may be unattainable for some low-biomass samples. Our own experience and other published studies show that some biological samples, such as CSF, yield <1 ng of DNA/RNA from a typical 200 µL extraction. Viral loads in this material are often in the range of 100 copies/mL (Poissy et al., 2012; Bradshaw and Venkatesan, 2016; Perlejewski et al., 2020b). Garmaeva et al. (2019) estimated that 1 g of stool yields between 0.22 and 0.87 ng/µL of bacteriophage DNA (with 50-200 µL elution volumes). This signals the need for more robust extraction and amplification methods capable of handling picogram quantities of DNA (Garmaeva et al., 2019).
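Two quantities from this paragraph lend themselves to a quick calculation. The first is the number of viral genomes that actually enter an extraction. The second is whether a sample's nucleic acid yield is distinguishable from NTC yields. The sketch below uses the CSF figures quoted above together with hypothetical yields. The two-fold margin over the mean NTC yield used to call a sample low biomass is also a hypothetical cut-off, since the text only states that low-biomass samples yield amounts similar to NTCs.

```python
def copies_in_extraction(viral_load_copies_per_ml, extraction_volume_ul):
    """Viral genome copies present in the volume taken into extraction."""
    return viral_load_copies_per_ml * extraction_volume_ul / 1000


def is_low_biomass(sample_yield_ng, ntc_yields_ng, fold=2.0):
    """Call a sample low biomass if its yield is within `fold` times the mean NTC yield.

    The fold cut-off is an illustrative assumption, not an established criterion.
    """
    if not ntc_yields_ng:
        raise ValueError("at least one NTC yield is required")
    ntc_mean = sum(ntc_yields_ng) / len(ntc_yields_ng)
    return sample_yield_ng <= fold * ntc_mean


print(copies_in_extraction(100, 200))  # 100 copies/mL CSF, 200 uL extraction
print(is_low_biomass(0.3, [0.2, 0.25, 0.15]))  # hypothetical yields in ng
```

At 100 copies/mL, a 200 µL extraction contains on the order of 20 viral genomes before any losses during extraction and library preparation. This shows how easily reagent-derived DNA can outnumber the target.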
To generate a sufficient amount of material for library preparation and sequencing, new products were introduced based on several techniques:

• multiple displacement amplification (Spits et al., 2006);
• linker amplification shotgun libraries (Bittinger et al., 2014);
• sequence-independent single-primer amplification (Chrzastek et al., 2017);
• single-primer isothermal linear amplification (Ribo-SPIA) (Dafforn et al., 2004).

Commercial kits such as NuGEN's Ovation RNA-Seq System V2, which is based on Ribo-SPIA, can generate enough cDNA for library preparation from as little as 500 pg of RNA. Sufficient coverage and read count were obtained when sequencing as few as 100 copies of HIV RNA (Malboeuf et al., 2013). These methods solve the problem of insufficient material for sequencing low-biomass samples. However, they neither reduce contaminants nor distinguish them from true signals. Moreover, as mentioned above, there is evidence that these kits can themselves be a source of additional genetic background (Thoendel et al., 2017; Oechslin et al., 2018; Perlejewski et al., 2020a).

Another common problem specific to ultralow-biomass samples (input <50 pg) is a high level of read duplication, which can reach 70%. By comparison, only 0.5-2% duplication has been reported for high-input samples (>50 ng DNA) (Garmaeva et al., 2019). This may introduce significant bias into quantitative analyses when different communities are compared. In addition, more irreproducible background noise is amplified as sample biomass decreases (Salter et al., 2014; Garmaeva et al., 2019; Erb-Downward et al., 2020). Finally, low-biomass samples require extra steps during extraction and library preparation, which increase the likelihood of external and internal contamination (Salter et al., 2014; Rawlinson et al., 2019).
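The duplication rates quoted above are fractions of reads that repeat an earlier read. The sketch below estimates this from read sequences alone, counting exact sequence matches. This is a simplifying assumption: tools such as Picard MarkDuplicates identify duplicates from alignment coordinates (and read pairs) rather than from identical sequence strings. The reads are hypothetical.

```python
from collections import Counter


def duplication_rate(reads):
    """Fraction of reads that are exact sequence duplicates of another read.

    Each distinct sequence keeps one copy as the original; every extra copy
    counts as a duplicate.
    """
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)


reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "TTGACCAA"]
print(duplication_rate(reads))
```

Here two of the four reads are extra copies, giving a duplication rate of 50%. At the 70% level reported for ultralow-biomass samples, only about 30% of reads represent distinct molecules, which limits quantitative comparisons.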

Viral contaminants appear to be particularly relevant among all types of contaminants encountered in viral metagenomic research. They can occasionally have a significant impact on the interpretation of results. One example is the study by Xu et al. (2013), who identified National Institutes of Health-Chongqing virus (NIH-CQV) in patients with seronegative hepatitis. This putative pathogen was detected in 70% of hepatitis patients and in none of 45 healthy controls. Nevertheless, the novel hybrid parvovirus-like virus was later determined to be a contaminant from a silica column-based RNA extraction kit (QIAamp MinElute Virus Kit; Qiagen, Hilden, Germany) (Smuts et al., 2014). The absence of NIH-CQV in healthy control subjects was probably related to lot-to-lot differences in the degree of spin column contamination (Naccache et al., 2014b). A year later, Yolken et al. (2014) found Acanthocystis turfacea chlorella virus 1 (ATCV-1) in oropharyngeal samples. The samples came from adults without current or past psychiatric disorders, in a study that included measures of cognitive functioning, and the virus was proposed to be linked to cognitive decline in humans. ATCV-1 is of algal origin and was later found to be part of the kitome of commercial DNase and RNA extraction kits (Kjartansdottir et al., 2015). In general, most contaminants reported in viral metagenomic studies appear to come from DNA and RNA extraction kits.

In another study, a silica column-based kit (QIAamp Viral RNA Mini Kit; Qiagen, Germany) was found to generate background noise of Iridoviridae, Circoviridae, Baculoviridae, and Genomoviridae sequences (Ngoi et al., 2016). A recent comparison of three extraction kits for metagenomic analysis of respiratory viruses identified 19, 28, and 55 viral families in NTCs. These counts correspond to eMAG (bioMérieux, Marcy-l'Étoile, France), MagNA Pure 24 (Roche, Basel, Switzerland), and the QIAamp Viral RNA Mini Kit (Qiagen), respectively. Once again, the highest genetic background was found for the Qiagen kit, and it consisted of hits classified as Siphoviridae, Myoviridae, Microviridae, and Podoviridae. Various other reagents have also been identified as potential sources of contamination. For example, BVDV-3 (bovine viral diarrhea virus 3) is a common contaminant of fetal bovine serum (Bergner et al., 2019). MuLV is present in 17 human cell lines (Cao et al., 2015; Uphoff et al., 2015), as well as in reverse transcriptase-PCR reagents (Zheng et al., 2011; L'Huillier et al., 2019).

Separating true signals from background noise is both extremely important and challenging in clinical settings. Bacteriophages are particularly common among the contaminants encountered in clinical metagenomics (Naccache et al., 2014a; Ngoi et al., 2016; Asplund et al., 2019; Sangiovanni et al., 2019) and may distort the quantitative picture of the virome. Sequences of eukaryotic viruses, by contrast, may be falsely associated with disease (Xu et al., 2013). This was the case in the study linking febrile illness in Kenyan adults with Kadiprio virus (Ngoi et al., 2016). The virus was initially considered the causative agent but was eventually found to be part of the QIAamp Viral RNA Mini Kit (Qiagen) kitome. In a recent study, Mollerup et al. (2019) used NGS to search for viruses in human cancers and found Merkel cell polyomavirus (MCPyV) in Merkel cell carcinomas. However, the close similarity of all MCPyV sequences found across samples led the investigators to conclude that laboratory surfaces were the source of contamination (Foulongne et al., 2011; Mollerup et al., 2019). In our previous viral SM studies, we often found pandoravirus sequences in CSF from patients with encephalitis and in NTCs (Perlejewski et al., 2015; Bukowska-Osko et al., 2016; Moustafa et al., 2017). On closer analysis, these sequences were low-complexity reads with nucleotide tandem repeats. They did not represent true signals but sequencing artifacts and/or contaminants originating from laboratory reagents (Hjelmso et al., 2017; Waldvogel-Abramowski et al., 2019).
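Low-complexity reads, such as the tandem-repeat pandoravirus hits described above, can be screened before classification. Established filters include the DUST algorithm used with BLAST. The sketch below swaps in a simpler measure: the Shannon entropy of a read's trinucleotide composition, normalized to its maximum possible value, with a hypothetical cut-off of 0.5.

```python
import math
from collections import Counter


def trinucleotide_entropy(read):
    """Shannon entropy of overlapping trinucleotides, scaled to 0-1.

    The maximum is log2 of the number of distinct trinucleotides possible
    for this read length (at most 64).
    """
    kmers = [read[i:i + 3] for i in range(len(read) - 2)]
    n = len(kmers)
    if n < 2:
        return 0.0
    counts = Counter(kmers)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(min(64, n))


def is_low_complexity(read, threshold=0.5):
    """Flag reads whose normalized trinucleotide entropy is below threshold."""
    return trinucleotide_entropy(read) < threshold


print(is_low_complexity("AC" * 30))     # dinucleotide tandem repeat
print(is_low_complexity("GATTACAGCT"))  # no repeated trinucleotides
```

Flagging such reads does not by itself identify their origin. As in the pandoravirus example, deciding whether they are artifacts or reagent contaminants requires comparison with NTCs.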

So far, very few studies have addressed viral contamination in viral sequencing (Naccache et al., 2014b; Moustafa et al., 2017; Asplund et al., 2019). The most comprehensive, by Asplund et al. (2019), evaluated 712 sequencing libraries prepared using several different protocols and found almost 500 viral hits associated with laboratory components. Consistent with our observations, more contaminants were present in RNA sequencing protocols than in those using DNA as starting material. Most viruses reported by Asplund et al. (2019) were bacteriophages (60%). This is also consistent with our own studies of CSF from encephalitis patients (Perlejewski et al., 2020b). In those studies, phages constituted 96% of all viral sequences with RNA-based SM workflows and 77% with DNA-based workflows. Viruses of non-human vertebrate hosts constituted approximately 12% of all viral contaminants.
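Contaminants shared with NTCs, such as the phages and vertebrate viruses described above, are often screened by comparing how frequently each taxon appears in NTCs versus samples. The sketch below is a deliberately simplified heuristic in the spirit of prevalence-based tools such as the decontam R package; it is not that package's statistical test. It assumes presence/absence sets per library and flags any taxon that is at least as prevalent in NTCs as in samples. Taxon names are hypothetical.

```python
def prevalence(taxon, profiles):
    """Fraction of libraries (sets of detected taxa) in which a taxon is present."""
    return sum(taxon in profile for profile in profiles) / len(profiles)


def flag_contaminants(sample_profiles, ntc_profiles):
    """Flag taxa detected in NTCs at least as often as in samples."""
    taxa = set().union(*sample_profiles, *ntc_profiles)
    flagged = set()
    for taxon in taxa:
        p_ntc = prevalence(taxon, ntc_profiles)
        if p_ntc > 0 and p_ntc >= prevalence(taxon, sample_profiles):
            flagged.add(taxon)
    return flagged


samples = [{"virus_X", "phage_P"}, {"phage_P"}, {"virus_Y"}]
ntcs = [{"phage_P"}, {"phage_P", "algal_virus_A"}]
print(sorted(flag_contaminants(samples, ntcs)))
```

Flagged taxa call for review rather than automatic removal, for two reasons. First, a real pathogen that has cross-contaminated NTCs, for example from a high-titer sample in the same run, would also be flagged. Second, when NTCs are few, a contaminant present in only some of them may escape this rule.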

A frequent problem in viral SM studies is cross-contamination when high viral-titer samples are sequenced together with low-biomass samples in the same sequencing lane (Moustafa et al., 2017). This is especially relevant when viral SM is performed on clinical samples, because abundant viral hits from one sample affect the viromes of other specimens. High-titer samples commonly contaminate low-biomass samples in the same sequencing run. The cross-contamination rate on Illumina platforms has been reported to be approximately 0.05% (Deng et al., 2020). In nanopore sequencing, cross-contamination occurs when low- and high-titer samples are pooled. To remedy this, it has been proposed to batch samples according to viral load (Lewandowski et al., 2019).
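The reported rate of about 0.05% can help judge whether a weak detection in one library could be carry-over from a high-titer library in the same run. The sketch below makes two illustrative assumptions, neither of which comes from the cited study. It assumes that this rate applies to the source library's viral read count, and it uses a hypothetical 10-fold safety margin.

```python
def expected_spillover(source_reads, rate=0.0005):
    """Reads expected to leak from a high-titer library at the given rate (0.05%)."""
    return source_reads * rate


def possibly_carryover(observed_reads, source_reads, rate=0.0005, margin=10):
    """True if a detection is small enough to be explained by cross-contamination.

    margin is a hypothetical safety factor over the expected spillover.
    """
    return observed_reads <= margin * expected_spillover(source_reads, rate)


source = 2_000_000  # hypothetical viral read count in a high-titer sample
print(expected_spillover(source))
print(possibly_carryover(300, source), possibly_carryover(50_000, source))
```

With 2 million viral reads in the high-titer library, about 1,000 reads could appear elsewhere. A low-biomass sample showing 300 reads of the same virus therefore cannot be distinguished from carry-over, whereas 50,000 reads would exceed this allowance. Batching samples by viral load, as proposed above, keeps such sources out of the same run.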

Bacterial contaminants affect viral SM and ATM studies in a similar manner because the bacterial sequences share the same external origin, usually the kitome (Salter et al., 2014). In approximately 72% of virome samples, bacterial DNA is considered the most abundant contaminant. Surprisingly, bacterial background noise is significantly higher in virus-enriched than in non-enriched metagenomic samples (Zolfo et al., 2019). These findings indicate that many SM workflows targeting virus-like particles (VLPs) fail to enrich viruses efficiently and suffer from substantial contamination.

The predominant bacterial genera found in negative controls in ATM and SM studies are Propionibacterium, Flavobacterium, Streptococcus, Burkholderia, Methylobacterium, Curvibacter, Ralstonia, Escherichia, Acinetobacter, and Stenotrophomonas (Lauder et al., 2016; Weyrich et al., 2019). Salter et al. (2014) reported Proteobacteria, Actinobacteria, Firmicutes, Bacteroidetes, Deinococcus-Thermus, and Acidobacteria in blank controls in PCR-based 16S rRNA gene and SM studies. In an HT-WGS study conducted at six different sequencing centers, Bradyrhizobium was reported to be the most common bacterial contaminant genus (Laurence et al., 2014). Moreno-Gallego et al. (2019) found that more than 1% of bacterial reads identified in a fecal virome represented contamination, largely from the phylum Firmicutes. This is compatible with the findings of Zolfo et al. (2019), who assessed bacterial contamination by measuring bacterial small subunit ribosomal RNA gene (SSU rRNA) content. Across 37 virome studies of environmental and human samples, the median SSU rRNA content ranged from 0 to 14.3% (approximately 1.2% per data set) (Zolfo et al., 2019).

In HT-WGS studies of clinical samples such as stool or CSF, host genomic reads are an integral part of the whole metagenome (Nakamura et al., 2009; Perlejewski et al., 2020c). Some investigators classify all host reads as contaminants, because these sequences mask true signals and reduce assay sensitivity for pathogen detection (Malboeuf et al., 2013; Charre et al., 2020; Heravi et al., 2020). Moreover, overrepresentation of host sequences in large NGS data sets can prolong data analysis and require substantial, costly computational power (Hasan et al., 2016).

Most human/host reads in WGS studies derive from the actual sample and are part of the true genetic background. However, they reduce sensitivity and sequencing coverage in microbial sequencing studies, especially for low-biomass samples (Chiu and Miller, 2019; Pereira-Marques et al., 2019). Clinical SM studies revealed that in human-derived samples such as nasopharyngeal aspirate, serum, and brain tissue, up to 95-99% of raw NGS reads derive from human DNA (Yang et al., 2011; Lipowski et al., 2017). Consequently, without significant host genomic depletion, viral genome coverage is likely to be low even when viral loads are high (Luk et al., 2015). In clinical settings, the minimum viral-to-host read ratio needed for viral identification is highly variable and depends on the species, sample, and workflow. For instance, a viral/human mRNA ratio of 0.0005% led to the discovery of MCPyV (Feng et al., 2008). By comparison, the viral/human RNA ratio was 0.0135% when a new arenavirus causing febrile illness was first identified (Palacios et al., 2008). The patients had received solid organ transplants from a single donor. In low-biomass clinical samples, human DNA/RNA overwhelms viral signals. A variety of host depletion methods can partially remedy the problem, decreasing background noise up to 3,100-fold with negligible loss of the target virus (Oechslin et al., 2018). Unfortunately, reducing host genomic contamination commonly leads to an increase in non-host contaminants, especially when kitome-related signals are amplified (Salter et al., 2014; Oechslin et al., 2018). Finally, some VLP purification methods, such as CsCl density gradient ultracentrifugation, efficiently remove host-derived DNA. At the same time, they discriminate against particular viruses, thus affecting quantitative virome measurements (Kleiner et al., 2015).
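The ratios above translate into very small absolute read counts, and host depletion shifts the balance toward whatever non-host background remains. The sketch below assumes:

• a hypothetical run of 10 million reads, 99% of which are host-derived;
• a viral-to-host ratio of 0.0005%, as in the MCPyV example;
• optimistically, no loss of viral or non-host background reads during depletion.

The remaining reads are treated as non-host background.

```python
def expected_viral_reads(total_reads, viral_to_host_ratio, host_fraction):
    """Viral reads implied by a viral:host read ratio in a library of total_reads."""
    host_reads = total_reads * host_fraction
    return host_reads * viral_to_host_ratio


def viral_fraction(viral_reads, host_reads, other_reads, depletion_fold=1.0):
    """Viral share of reads after host reads are reduced depletion_fold-times.

    Assumes viral and non-host background reads are unaffected by depletion.
    """
    remaining_host = host_reads / depletion_fold
    return viral_reads / (viral_reads + remaining_host + other_reads)


total = 10_000_000                # hypothetical run size
host = total * 0.99               # assumed host-derived share
viral = expected_viral_reads(total, 0.000005, 0.99)  # 0.0005% = 5e-6
other = total - host - viral      # bacterial/kitome and other non-host reads
before = viral_fraction(viral, host, other)
after = viral_fraction(viral, host, other, depletion_fold=3100)
print(f"{viral:.1f} viral reads")
print(f"viral share: {before:.6%} before, {after:.4%} after 3,100-fold depletion")
```

In this example roughly 50 of 10 million reads are viral. After a 3,100-fold host reduction the viral share rises nearly 100-fold. Yet the roughly 100,000 non-host background reads then make up most of the library, mirroring the observation that host depletion tends to increase the relative weight of kitome-derived signals.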

Bacterial and host-derived sequences are rarely reported in viral SM studies because NGS reads are often not aligned to comprehensive databases that include non-viral genomes. In SM studies of human nasopharyngeal samples and CSF, reads mapped to plant, parasite, and fungal genomes and even to synthetic constructs (Nakamura et al., 2009; Perlejewski et al., 2015). These hits could have derived from various sources, including reagents, sequencing errors, and erroneous classification, especially when unfiltered and biased genome databases are used for alignment.

Table 1. Basic recommendations for reducing background noise in viral SM studies.

• Use sterile laboratory equipment: tubes, filter tips, decontaminated racks, and machines
• Wear disposable protective coats, gloves, and face masks
• Always decontaminate the working area
• Perform wet-laboratory work under a laminar flow hood
• Perform all steps in dedicated laboratory areas: create separate preamplification, amplification, and postamplification sites
• Minimize the number of investigators in a project and record which samples were handled by each technician

Sampling
• Avoid cross-contamination during sample preparation
• Be aware that caging multiple laboratory animals in the same space may influence their microbial composition
• Collect samples in sterile tubes
• Avoid contamination from the investigator's skin or breath
• Use biomass-rich samples
• Maximize the sample volume for extraction when using low-biomass material

Reagents and wet-laboratory procedures
• Use the same types of reagents throughout the whole project
• Record all batches and lot numbers of all reagents used in a project
• Minimize the number of steps in the wet-laboratory workflow
• Use dedicated extraction kits for low-biomass samples with low elution volumes
• Keep in mind that silica column-based nucleic acid extraction kits are associated with numerous contaminants

In virus-targeted SM studies, it is critical to distinguish accurately between true viral signals and contaminants (Asplund et al., 2019). This is especially difficult when low-biomass samples containing low viral loads are analyzed (Malboeuf et al., 2013; Perlejewski et al., 2016). So far, a variety of SM workflows combining numerous wet-laboratory procedures and bioinformatics analyses have been applied to various samples, but a universally efficient approach has yet to be established (Nakamura et al., 2009; Conceicao-Neto et al., 2015; Lewandowski et al., 2019). SM viral protocols require validation and standardization before they can be used in routine clinical practice (van Boheemen et al., 2020). The protocols used are highly dependent on sample type. For instance, stool and tissue samples are processed differently (homogenization, filtration, DNA/RNA extraction, or nuclease treatment) from low-biomass samples such as CSF, skin, or nasal swabs, which require, e.g., preamplification steps (Hall et al., 2014; Sabatier et al., 2020). Thus, any future standardized clinical SM viral protocol must take into account the sample type and the expected viral pathogen (a DNA- or RNA-based approach) (Schlaberg et al., 2017; Kufner et al., 2019). The same factors may also affect the choice of sequencing parameters such as sequencing depth, which specifies how many times, on average, each base in a genome should be covered by NGS reads (Deng et al., 2020). This parameter is linked to the abundance of the target virus, which affects the sensitivity of the applied workflow (Malboeuf et al., 2013; Pereira-Marques et al., 2019). Another factor to consider is sequencing breadth, which specifies what portion of a genome should be sequenced for reliable identification (Wylie et al., 2018). Ladner et al. (2014) proposed five categories defining different genome standards in virus-targeted sequencing, beginning with a "standard draft," which represents low coverage with at least 50% of a candidate genome recovered (frequent for low-biomass samples with low viral loads). At the opposite end, the "finished" category requires high coverage (400-1,000×) and represents cases in which a complete viral consensus genome sequence is obtained, combined with complete population-level characterization of genomic diversity (Ladner et al., 2014).
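The relationship between depth and breadth can be approximated with the classic Lander-Waterman (Poisson) model, in which the mean depth is c = NL/G for N reads of length L on a genome of length G, and the expected fraction of the genome covered at least once is 1 - e^(-c). The Python sketch below assumes that reads fall uniformly at random on the genome. Real metagenomic coverage is considerably less even (e.g., because of GC bias and amplification), so these estimates should be read as optimistic. The function names, the 10-kb genome, and the 150-bp reads are hypothetical.

```python
import math


def mean_depth(n_reads: int, read_length: int, genome_length: int) -> float:
    """Mean sequencing depth c = N * L / G."""
    return n_reads * read_length / genome_length


def expected_breadth(depth: float) -> float:
    """Expected fraction of the genome covered at least once, 1 - exp(-c).

    Assumes reads start uniformly at random along the genome (Poisson model).
    """
    return 1.0 - math.exp(-depth)


def reads_for_depth(target_depth: float, read_length: int, genome_length: int) -> int:
    """Minimum number of reads needed to reach a target mean depth."""
    return math.ceil(target_depth * genome_length / read_length)


if __name__ == "__main__":
    genome_len = 10_000  # hypothetical 10-kb viral genome
    read_len = 150       # hypothetical 150-bp reads
    c = mean_depth(14, read_len, genome_len)
    print(f"depth {c:.2f}x, expected breadth {expected_breadth(c):.1%}")
    # Reads needed for the 400x lower bound of the "finished" category
    print(reads_for_depth(400, read_len, genome_len))  # 26667
```

For this hypothetical genome, 14 reads give a mean depth of 0.21× and an expected breadth of about 19%, whereas roughly 26,700 reads would be needed to reach 400×. Under the same model, a mean depth of ln 2 ≈ 0.69× already gives an expected breadth of 50%, the minimum for a standard draft.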

So far, there are no universal criteria for positive identification of virus species in HTS-WGS analyses. Currently, the gold standard for confirming a microorganism identified by metagenomics appears to be PCR or Sanger sequencing (Yu et al., 2016; Fang et al., 2018; Wylie et al., 2018; Holmes, 2019). Theoretically, even a single virus-specific NGS read in SM could indicate a true signal. In the study mentioned above, a novel arenavirus was identified in an organ transplant setting after only 14 virus-specific sequences had been detected by SM (Palacios et al., 2008). Liu et al. (2020) proposed that a positively identified viral taxon should be represented by at least two unique sequencing reads detected by the same or a different technique, whereas Schlaberg et al. (2017) and Miller et al. (2019) required reads mapping to at least three non-overlapping genome regions to identify a virus in CSF. Reads dispersed across the whole genome with high coverage indicate true viral signals, whereas isolated and/or repeated viral sequences found across samples from the same run suggest sequencing artifacts. In a study evaluating a viral SM workflow in a tertiary diagnostic unit, positive viral identification required at least three viral reads distributed across the whole genome with a high coverage score; furthermore, the number of reads for the target virus had to be at least 100 times higher than in negative controls and other samples (Kufner et al., 2019). This approach is balanced because it accounts for the high likelihood of cross-contamination between samples and NTCs, whereas many microbiome studies discarded all sequences found in negative controls (Dunn et al., 2013; Karstens et al., 2019).
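To show how such rules can be combined, the Python sketch below encodes a read-evidence check loosely modeled on the criteria described above: a minimum number of unique reads, a minimum number of non-overlapping genome regions, and a minimum fold excess over the highest count observed for the same virus in NTCs and other samples of the run. It is an illustration, not the published implementation of any of the cited studies. The data structure and function names are hypothetical, "distribution across the genome" is approximated by a greedy count of non-overlapping read intervals, and a background count of zero is assumed to satisfy the fold criterion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ViralHit:
    """Read evidence for one virus in one sample (hypothetical structure)."""
    unique_reads: int       # deduplicated reads assigned to the virus
    read_starts: List[int]  # 0-based mapping start positions on the reference
    read_length: int        # read length in bases
    # Read counts for the same virus in NTCs and other samples of the run.
    background_reads: List[int] = field(default_factory=list)


def count_nonoverlapping_regions(starts: List[int], read_length: int) -> int:
    """Greedily count reads whose genomic intervals do not overlap each other."""
    regions = 0
    last_end = -1
    for start in sorted(starts):
        if start > last_end:  # read begins after the previous region ends
            regions += 1
            last_end = start + read_length - 1
    return regions


def is_positive(hit: ViralHit, min_reads: int = 3, min_regions: int = 3,
                min_fold_over_background: float = 100.0) -> bool:
    """Apply read-count, genome-distribution, and background-fold criteria."""
    if hit.unique_reads < min_reads:
        return False
    if count_nonoverlapping_regions(hit.read_starts, hit.read_length) < min_regions:
        return False
    background = max(hit.background_reads, default=0)
    # Assumption: if the virus is absent from all NTCs and other samples,
    # the fold criterion is treated as satisfied.
    if background > 0 and hit.unique_reads < min_fold_over_background * background:
        return False
    return True


if __name__ == "__main__":
    hit = ViralHit(unique_reads=14, read_starts=[0, 500, 1200, 1210],
                   read_length=150, background_reads=[0, 0, 0])
    print(is_positive(hit))  # True
    hit.background_reads = [1]
    print(is_positive(hit))  # False: 14 reads is less than 100 x 1
```

Lowering `min_reads` to 2 and `min_regions` to 1 approximates a more permissive two-read rule, whereas raising `min_fold_over_background` makes the check more conservative in runs with substantial index switching.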
A blacklist method compiles a catalog of specific contaminants found in NTCs in a given study and/or sequencing center and uses it to exclude matching sequences from WGS data sets (Ye et al., 2019). However, it is well documented that true signals can also appear in NTCs through index switching (Callahan et al., 2017; Sinha et al., 2017; Costello et al., 2018; Larsson et al., 2018). Index switching rates have been shown to be higher in NTCs than in template-containing samples, indicating that several NTCs should be included in each sequencing run. Multiple NTCs allow the detection of even sporadic contaminants, which is relevant if decontamination is based on removing sequences below a specified read or species abundance threshold (Lazarevic et al., 2016; Asplund et al., 2019).
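A blacklist filter of this kind can be sketched in a few lines of Python. The example assumes per-taxon read counts stored as `collections.Counter` objects and a user-chosen minimum count for blacklisting. The function names are hypothetical rather than those of Ye et al. (2019), and the taxon labels are illustrative (phiX174 and NIH-CQV appear in the reference list as known sequencing-control and reagent contaminants).

```python
from collections import Counter
from typing import Iterable, Set


def build_blacklist(ntc_counts: Iterable[Counter], min_reads: int = 1) -> Set[str]:
    """Collect every taxon seen in any no-template control with >= min_reads reads."""
    blacklist: Set[str] = set()
    for counts in ntc_counts:
        blacklist.update(taxon for taxon, n in counts.items() if n >= min_reads)
    return blacklist


def apply_blacklist(sample_counts: Counter, blacklist: Set[str]) -> Counter:
    """Return a copy of the sample's counts with all blacklisted taxa removed."""
    return Counter({taxon: n for taxon, n in sample_counts.items()
                    if taxon not in blacklist})


if __name__ == "__main__":
    ntcs = [Counter({"phiX174": 300, "NIH-CQV": 4}), Counter({"NIH-CQV": 9})]
    sample = Counter({"Human alphaherpesvirus 1": 40, "NIH-CQV": 3, "phiX174": 25})
    blacklist = build_blacklist(ntcs, min_reads=2)
    print(sorted(blacklist))                   # ['NIH-CQV', 'phiX174']
    print(apply_blacklist(sample, blacklist))  # Counter({'Human alphaherpesvirus 1': 40})
```

Because index switching can carry reads of a genuine pathogen into an NTC, such a filter can also remove true signals; comparing read counts against NTCs, as in Kufner et al. (2019), avoids discarding a taxon solely because it appears in a control.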

Different thresholds have been used in viral SM studies to distinguish between true and false-positive hits; for example, Guerin et al. (2020) proposed a threshold of >100 hits. In a study by Wylie et al. (2018) using pools of clinical samples (CSF, blood, plasma, urine, and swabs), a threshold of 0.1% of total reads was applied to each virus expected in the corresponding sequencing pool to limit the impact of index switching. In another study using VLP enrichment protocols, a relative read count threshold of 0.01% was set based on an empirically determined index contamination rate (O'Flaherty et al., 2018).
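Absolute and relative thresholds of this kind can be applied with a simple filter. The Python sketch below is a minimal illustration, assuming per-taxon read counts for a single library and the total number of reads in that library. It does not reproduce the pooled design of Wylie et al. (2018), and the function name, library size, and example counts are hypothetical.

```python
from typing import Dict, Optional


def filter_by_thresholds(counts: Dict[str, int],
                         min_hits: Optional[int] = None,
                         min_relative: Optional[float] = None,
                         total_reads: Optional[int] = None) -> Dict[str, int]:
    """Keep taxa that pass an absolute and/or relative read-count threshold.

    min_hits: keep only taxa with MORE than min_hits reads (e.g. 100).
    min_relative: keep only taxa with at least this fraction of total_reads
        (e.g. 0.001 for 0.1% or 0.0001 for 0.01%).
    total_reads: reads in the whole library; if omitted, the sum of counts
        is used, which assumes counts covers every read in the library.
    """
    total = total_reads if total_reads is not None else sum(counts.values())
    kept: Dict[str, int] = {}
    for taxon, n in counts.items():
        if min_hits is not None and n <= min_hits:
            continue
        if min_relative is not None and n < min_relative * total:
            continue
        kept[taxon] = n
    return kept


if __name__ == "__main__":
    counts = {"Enterovirus": 5_000, "HSV-1": 800, "Torque teno virus": 60}
    library_size = 2_000_000  # hypothetical total reads in the library
    # >100 hits: {'Enterovirus': 5000, 'HSV-1': 800}
    print(filter_by_thresholds(counts, min_hits=100))
    # 0.1% of 2,000,000 = 2,000 reads: {'Enterovirus': 5000}
    print(filter_by_thresholds(counts, min_relative=0.001, total_reads=library_size))
    # 0.01% of 2,000,000 = 200 reads: {'Enterovirus': 5000, 'HSV-1': 800}
    print(filter_by_thresholds(counts, min_relative=0.0001, total_reads=library_size))
```

The example shows how strongly the outcome depends on the chosen cut-off: moving from 0.01% to 0.1% of total reads drops a taxon with 800 reads that an absolute >100-hit rule would retain.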

Viral identification is currently supported by numerous computational algorithms and open-source programs, such as VirSorter (Roux et al., 2015), VirusFinder (Wang et al., 2013), VirusSeeker, VirusSeq, VirusDetect (Zheng et al., 2017), and ViromeScan (Rampelli and Turroni, 2018). Some of these algorithms/pipelines [ViralFusionSeq (Li et al., 2013), Virana (Schelhorn et al., 2013), and VERSE] even allow the detection of viruses integrated into host genomes. Another group of programs, such as MARVEL (Amgarten et al., 2018), PhagePhisher (Hatzopoulos et al., 2016), and Phage_Finder (Fouts, 2006), is designed to detect phages in metagenomic data sets. Special caution is required when interpreting the output of viral mining software applied to mixed metagenomes, because these contain more computationally derived internal contamination than virus-specific data sets. Zolfo et al. (2019) showed that assembly of poorly enriched metagenomes increases the number of contigs falsely classified as viral: in approximately 12% of poorly enriched metagenomic samples, more than 20% of assembled reads were assigned as viral. This indicates a substantial number of viral false positives in data sets with a high representation of bacterial genomes (Zolfo et al., 2019).

Contamination in metagenomic studies can also be reduced or even removed using open-source software such as the R package decontam, which relies on two observations: (i) contaminants are found at higher frequencies in low-titer samples, and (ii) they are more prevalent in negative controls than in true samples (Davis et al., 2018). DecontaMiner offers a similar application, using a subtraction approach to detect contamination by bacteria, fungi, and viruses from different sources (Sangiovanni et al., 2019). A more virome-focused tool is ViromeQC, which is designed to benchmark and quantify non-viral contamination in VLP-enriched projects. It uses three types of microbial markers: the small subunit (SSU) rRNA gene, the large subunit (LSU) rRNA gene, and 31 prokaryotic single-copy marker genes. In addition, ViromeQC calculates a viral enrichment score that measures the quality of the VLP enrichment protocol (Zolfo et al., 2019). Finally, tools such as microDecon (McKnight et al., 2019), an R package for removing contamination in metabarcoding studies, and CroCo (Simion et al., 2018), which detects cross-species contamination in NGS data, are designed to identify cases of cross-contamination.
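The two observations behind decontam can be illustrated with a simplified Python sketch. This is not decontam's R implementation or API: decontam assigns each sequence a score from statistical models and classifies contaminants against a threshold, whereas the sketch below merely compares (i) how well a taxon's relative frequency is explained by a model in which frequency is inversely proportional to total input DNA concentration versus a model in which frequency is constant, and (ii) the taxon's prevalence in negative controls versus true samples. The function names and example values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def frequency_model_rss(freqs: Sequence[float],
                        concs: Sequence[float]) -> Tuple[float, float]:
    """Residual sums of squares (log scale) of two models for one taxon.

    freqs: relative frequency of the taxon in each sample.
    concs: total DNA concentration of each sample (any consistent unit).
    Contaminant model:     log f = b - log c  (frequency proportional to 1/c)
    Non-contaminant model: log f = b          (frequency independent of c)
    Only samples in which the taxon was observed (f > 0) are used.
    """
    pairs = [(math.log(f), math.log(c)) for f, c in zip(freqs, concs) if f > 0]
    if len(pairs) < 2:
        raise ValueError("taxon must be observed in at least two samples")
    n = len(pairs)
    # Least-squares intercepts with the slope fixed at -1 and at 0.
    b_cont = sum(lf + lc for lf, lc in pairs) / n
    b_const = sum(lf for lf, _ in pairs) / n
    rss_cont = sum((lf - (b_cont - lc)) ** 2 for lf, lc in pairs)
    rss_const = sum((lf - b_const) ** 2 for lf, _ in pairs)
    return rss_cont, rss_const


def frequency_suggests_contaminant(freqs: Sequence[float],
                                   concs: Sequence[float]) -> bool:
    """True if the inverse-concentration model fits better than a constant one."""
    rss_cont, rss_const = frequency_model_rss(freqs, concs)
    return rss_cont < rss_const


def prevalence_suggests_contaminant(in_controls: Sequence[bool],
                                    in_samples: Sequence[bool]) -> bool:
    """True if the taxon is more prevalent in negative controls than in samples."""
    if not in_controls or not in_samples:
        raise ValueError("need at least one control and one sample")
    return (sum(in_controls) / len(in_controls)
            > sum(in_samples) / len(in_samples))


if __name__ == "__main__":
    dna_conc = [1.0, 2.0, 4.0, 8.0]           # hypothetical ng/uL
    reagent_taxon = [0.08, 0.04, 0.02, 0.01]  # falls as input DNA rises
    sample_taxon = [0.05, 0.05, 0.05, 0.05]   # stable across inputs
    print(frequency_suggests_contaminant(reagent_taxon, dna_conc))  # True
    print(frequency_suggests_contaminant(sample_taxon, dna_conc))   # False
    print(prevalence_suggests_contaminant([True, True, False],
                                          [True, False, False, False]))  # True
```

A real analysis would replace these simple comparisons with a statistical test and a tuned threshold, and would account for sampling noise at low read counts, which this sketch ignores.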

The evolution of NGS and WGA methods has enabled the development of numerous metagenomic workflows, which have been successfully applied in virus-focused studies across various environments (Conceicao-Neto et al., 2015; Kohl et al., 2015; Perlejewski et al., 2020b). Regardless of the specific viral SM protocol, contamination cannot be totally avoided; in particular, the issue of reagent contamination should always be addressed with high priority. So far, contamination has mostly been studied in 16S rRNA profiling, and only a few viral SM studies have used NTCs or reported kitome sequences characteristic of their protocols (Grahn et al., 2003; Karstens et al., 2019).

In the present article, we have described the most common sources and types of contamination found in viral metagenomic studies and proposed some basic recommendations for reducing background noise (Table 1). There is an urgent need for the development and validation of standards in viral metagenomics, which would limit contamination bias, increase the quality of research, and allow viral SM protocols to be more widely applied in diagnostics.

KP, HJ, and TP: writing-original draft preparation and visualization. KP and HJ: conceptualization, data curation, and writing-review and editing. All authors contributed to the article and approved the submitted version. 

Effect of geographic isolation on the nasal virome of indigenous children

MARVEL, a tool for prediction of bacteriophage sequences in metagenomic bins

Contaminating viral sequences in high-throughput sequencing viromics: a linkage study of 700 sequencing libraries

Using noninvasive metagenomics to characterize viral communities from wildlife

Improved characterization of medically relevant fungi in the human respiratory tract using next-generation sequencing

Digital MDA for enumeration of total nucleic acid contamination

Understanding and overcoming the pitfalls and biases of next-generation sequencing (NGS) methods for use in the routine clinical microbiological diagnostic laboratory

Frequent contamination of Taq polymerase with DNA

Herpes simplex virus-1 encephalitis in adults: pathophysiology, diagnosis, and management

Effect of the enzyme and PCR conditions on the quality of high-throughput DNA sequencing results

PCR-induced transitions are the major source of error in cleaned ultradeep pyrosequencing data

Promises and pitfalls of Illumina sequencing for HIV resistance genotyping

Sensitivity of next-generation sequencing metagenomic analysis for detection of RNA and DNA viruses in cerebrospinal fluid: the confounding effect of background contamination

Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women

High-throughput RNA sequencing-based virome analysis of 50 lymphoma cell lines from the cancer cell line encyclopedia project

Don't make a mista(g)ke: is tag switching an overlooked source of error in amplicon pyrosequencing studies?

Evaluation of NGS-based approaches for SARS-CoV-2 whole genome characterisation

VirusSeq: software to identify viruses and their integration sites using next-generation sequencing of human cancer tissue

Effects of GC bias in next-generation-sequencing data on de novo genome assembly

Clinical metagenomics

Use of Sequence-Independent, Single-Primer-Amplification (SISPA) for rapid detection, identification, and characterization of avian RNA viruses

Modular approach to customise sample preparation procedures for viral metagenomics: a reproducible protocol for virome analysis

Characterization and remediation of sample index swaps by nonredundant dual indexing on massively parallel sequencing platforms

Linear mRNA amplification from as little as 5 ng total RNA for global gene expression analysis

Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data

Recognizing the reagent microbiome

Author correction: metagenomic sequencing with spiked primer enrichment for viral diagnostics and genomic surveillance

Laboratory contamination in airway microbiome studies

Home life: factors structuring the bacterial diversity found within and between homes

Contamination in low microbial biomass microbiome studies: issues and recommendations

Critical relevance of stochastic effects on low-bacterial-biomass 16S rRNA gene analysis

Contamination of Qiagen DNA extraction kits with Legionella DNA

Realtime utilization of metagenomic sequencing in the diagnosis and treatment monitoring of an invasive adenovirus B55 infection and subsequent herpes simplex virus encephalitis in an immunocompetent young adult

Clonal integration of a polyomavirus in human Merkel cell carcinoma

Metagenomics: the next culture-independent game changer

Detection of Merkel cell polyomavirus on environmental surfaces

Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences

Accuracy of next generation sequencing platforms

Studying the gut virome in the metagenomic era: challenges and perspectives

Inherent bacterial DNA contamination of extraction and sequencing reagents may affect interpretation of microbiota in low bacterial biomass samples

Identification of mixed bacterial DNA contamination in broad-range PCR amplification of 16S rDNA V1 and V3 variable regions by pyrosequencing of cloned amplicons

Detection and removal of barcode swapping in single-cell RNA-seq data

A novel next-generation sequencing and analysis platform to assess the identity of recombinant adeno-associated viral preparations from viral DNA extracts

Evaluation of rapid and simple techniques for the enrichment of viruses prior to metagenomic virus discovery

Depletion of human DNA in spiked clinical specimens for improvement of sensitivity of pathogen detection by next-generation sequencing

PhagePhisher: a pipeline for the discovery of covert viral sequences in complex genomic datasets

Host DNA depletion efficiency of microbiome DNA enrichment methods in infected tissue samples

Evaluation of methods for the concentration and extraction of viruses from sewage in the context of metagenomic sequencing

Reagent contamination in viromics: all that glitters is not gold

Issues and current standards of controls in microbiome research

Identification and elimination of DNA sequences in Taq DNA polymerase

Introduction to the analysis of environmental sequences: metagenomics with MEGAN

The evidence of contaminant bacterial DNA in several commercial Taq polymerases

Bioaerosol generation by raindrops on soil

Controlling for contaminants in low-biomass 16S rRNA gene sequencing experiments

Possible human papillomavirus 38 contamination of endometrial cancer RNA sequencing samples in the cancer genome atlas database

Application of special oligotrophic media for cultivation of bacterial communities originated from ultrapure water

VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences

Optimizing methods and dodging pitfalls in microbiome research

The application of next-generation sequencing to define factors related to oral cancer and discover novel biomarkers

Avoidance of PCR false positives

Traces of ATCV-1 associated with laboratory component contamination

Evaluation of methods to purify virus-like particles for metagenomic sequencing of intestinal viromes

Protocol for metagenomic virus detection in clinical specimens

Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes

Human contamination in public genome assemblies

Two years of viral metagenomics in a tertiary diagnostics unit: evaluation of the first 105 cases

Analysis of bacteria contaminating ultrapure water in industrial systems

Standards for sequencing viral genomes in the era of high-throughput sequencing

Computational correction of index switching in multiplexed sequencing libraries

Comparison of placenta samples with contamination controls does not provide evidence for a distinct placenta microbiota

Common contaminants in next-generation sequencing that hinder discovery of low-abundance microbes

Decontamination of 16S rRNA gene amplicon sequence datasets based on bacterial load assessment by qPCR

Contamination-controlled high-throughput whole genome sequencing for influenza A viruses using the MiSeq sequencer

Taxon appearance from extraction and amplification steps demonstrates the value of multiple controls in tick microbiota analysis

Metagenomic nanopore sequencing of influenza virus direct from clinical respiratory samples

Identification of viral signatures using high-throughput sequencing on blood of patients with Kawasaki disease

ViralFusionSeq: accurately discover viral integration events and reconstruct fusion transcripts at single-base resolution

A cluster of fatal tick-borne encephalitis virus infection in organ transplant setting

An optimized metagenomic approach for virome detection of clinical pharyngeal samples with respiratory infection

From the field to the laboratory: controlling DNA contamination in human ancient DNA research in the high-throughput sequencing era

Utility of metagenomic next-generation sequencing for characterization of HIV and human pegivirus diversity

Unique, dual-indexed sequencing adapters with UMIs effectively eliminate index cross-talk and significantly improve sensitivity of massively parallel sequencing

Complete viral RNA genome sequencing of ultra-low copy samples by sequence-independent amplification

Monitoring error rates in illumina sequencing

Comparative evaluation of cDNA library construction approaches for RNA-Seq analysis from low RNA-content human specimens

microDecon: a highly accurate read-subtraction tool for the post-sequencing removal of contamination in metabarcoding studies

Humans differ in their personal microbial cloud

Unexpected cross-species contamination in genome sequencing projects

Laboratory validation of a clinical metagenomic sequencing assay for pathogen detection in cerebrospinal fluid

Quantifying and understanding well-to-well contamination in microbiome research

High-throughput sequencing-based investigation of viruses in human cancers by multienrichment approach

Virome diversity correlates with intestinal microbiome diversity in adult monozygotic twins

Improved multiple displacement amplification (iMDA) and ultraclean reagents

The blood DNA virome in 8,000 humans

Large-scale contamination of microbial isolate genomes by Illumina PhiX control

Concerns over the origin of NIH-CQV, a novel virus discovered in Chinese patients with seronegative hepatitis

A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples

Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach

The plasma virome of febrile adult Kenyans shows frequent parvovirus B19 infections and a novel arbovirus (Kadipiro virus)

Estimation of bacterial contamination in ultrapure water: application of the anti-DNA antibody

Overview of virus metagenomic classification methods and their biological applications

Limited correlation of shotgun metagenomics following host depletion and routine diagnostics for viruses and bacteria in low concentrated surrogate and clinical samples

Comprehensive viral enrichment enables sensitive respiratory virus genomic identification and analysis by next generation sequencing

Elimination of "kitome" and "splashome" contamination results in lack of detection of a unique placental microbiome

A new arenavirus in a cluster of fatal transplant-associated diseases

Impact of host DNA and sequencing depth on the taxonomic resolution of whole metagenome sequencing for microbiome analysis

Metagenomic analysis of cerebrospinal fluid from patients with multiple sclerosis

Next-generation sequencing in the diagnosis of viral encephalitis: sensitivity and clinical limitations

Search for viral agents in cerebrospinal fluid in patients with multiple sclerosis using real-time PCR and metagenomics

Search for viral infections in cerebrospinal fluid from patients with autoimmune encephalitis

Next-Generation Sequencing (NGS) in the identification of encephalitis-causing viruses: unexpected detection of human herpesvirus 1 while searching for RNA pathogens

Impact of Herpes simplex virus load and red blood cells in cerebrospinal fluid upon herpes simplex meningo-encephalitis outcome

Microbiome analyses of blood and tissues suggest cancer diagnostic approach

Seasonal dynamics of DNA and RNA viral bioaerosol communities in a daycare center

Optimal enzymes for amplifying sequencing libraries

From whole-genome shotgun sequencing to viral community profiling: the viromescan tool

How to carry out microbiological sampling of healthcare environment surfaces? A review of current evidence

VirSorter: mining viral signal from microbial genomic data

Comparison of nucleic acid extraction methods for a viral metagenomics analysis of respiratory viruses

Reagent and laboratory contamination can critically impact sequence-based microbiome analyses

From trash to treasure: detecting unexpected contamination in unmapped NGS data

Sensitive detection of viral transcripts in human tumor transcriptomes

Validation of metagenomic next-generation sequencing tests for universal pathogen detection

Detection of DNA contamination in Taq polymerase

Towards error-free profiling of immune repertoires

A software tool 'CroCo' detects pervasive cross-species contamination in next generation sequencing data

Index switching causes "spreading-of-signal" among multiplexed samples in Illumina HiSeq 4000 DNA sequencing. bioRxiv

Novel hybrid parvovirus-like virus, NIH-CQV/PHV, contaminants in silica column-based nucleic acid extraction kits

Optimizing Taq polymerase concentration for improved signal-to-noise in the broad range detection of low abundance bacteria

Whole-genome multiple displacement amplification from single cells

Identification and removal of contaminating microbial DNA from PCR reagents: impact on low-biomass microbiome analyses

Microbial contamination in next generation sequencing: implications for sequence-based analysis of clinical samples

The impact of DNA polymerase and number of rounds of amplification in PCR on 16S rRNA gene sequence data

The application of ultraviolet irradiation to exogenous sources of DNA in plasticware and water for the amplification of low copy number DNA

Impact of contaminating DNA in whole-genome amplification kits used for metagenomic shotgun sequencing for infection diagnosis

Contamination of commercial PCR master mix with DNA from Coxiella burnetii

Expanded skin virome in DOCK8-deficient patients

Prevalence and characterization of murine leukemia virus contamination in human cell lines

Retrospective validation of a metagenomic sequencing protocol for combined detection of RNA and DNA viruses using respiratory samples from pediatric patients

Sterile paper points as a bacterial DNA-contamination source in microbiome profiles of clinical samples

Index hopping on the Illumina HiseqX platform and its consequences for ancient DNA studies

Qiagen DNA extraction kits for sample preparation for legionella PCR are not suitable for diagnostic purposes

Viral metagenomics of blood donors and blood-derived products using next-generation sequencing

Plasmid DNA contaminant in molecular reagents

VirusFinder: software for efficient and accurate detection of viruses and their integration sites in host genomes through next generation sequencing data

VERSE: a novel approach to detect virus integration in host genomes through reference genome customization

Laboratory contamination over time during low-biomass sample analysis

VIROME: a standard operating procedure for analysis of viral metagenome sequences

Detection of viruses in clinical samples by use of metagenomic sequencing and targeted sequence capture

Hybrid DNA virus in Chinese patients with seronegative hepatitis discovered by deep sequencing

Detection of viral pathogens with multiplex nanopore MinION sequencing: be careful with cross-talk

Unbiased parallel detection of viral pathogens in clinical samples by use of a metagenomic approach

Benchmarking metagenomics tools for taxonomic classification

Chlorovirus ATCV-1 is part of the human oropharyngeal virome and is associated with changes in cognitive functions in humans and mice

Clinical metagenomic analysis of bacterial communities in breast abscesses of granulomatous mastitis

VirusSeeker, a computational pipeline for virus discovery and virome composition analysis

Detection of murine leukemia virus or mouse DNA in commercial RT-PCR reagents and human DNAs

VirusDetect: an automated pipeline for efficient virus discovery using deep sequencing of small RNAs

Reply to Naccache et al: viral sequences of NIH-CQV virus, a contamination of DNA extraction method

Glacier ice archives fifteen-thousand-year-old viruses. bioRxiv

Detecting contamination in viromes using ViromeQC

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.