key: cord-0850516-fx3h6ron authors: Cameron, Andrew; Bohrhunter, Jessica L.; Taffner, Samantha; Malek, Adel; Pecora, Nicole D. title: Clinical Pathogen Genomics date: 2020-10-01 journal: Clin Lab Med DOI: 10.1016/j.cll.2020.08.003 sha: d6d4e96643e4a0e4232202fae44a92276c008026 doc_id: 850516 cord_uid: fx3h6ron
Recent improvements in next-generation sequencing technologies have enabled clinical laboratories to increasingly pursue pathogen genomics for infectious disease diagnosis. Clinical laboratories can also benefit from whole-genome sequence characterization of cultured isolates, which helps resolve infection prevention questions about pathogen outbreaks and surveillance. Metagenomic sequencing from primary specimens can provide laboratories with an unbiased universal test for situations where traditional methods fail to identify an infectious etiology despite high clinical suspicion. Here, the most useful applications of whole-genome sequencing and metagenomic sequencing are summarized, as are the main advantages, limitations, and considerations for building an in-house clinical genomics program.
Over the past 10 years, clinical pathogen genomics has gone from a cutting-edge research method for investigating organism relatedness to being the gold standard for assessing potential outbreaks. 1 Although pathogen clonality is often the main question, next-generation sequencing (NGS) provides a rich trove of information to diagnose infections, assess antimicrobial resistance genes (ARGs) and virulence factors, monitor for high-risk clones, and understand the local genomic landscape. Additional benefits include the characterization of emerging and/or unusual pathogens. Whole-genome sequencing (WGS) of cultured isolates has become commonplace in research programs and mainstream in public health laboratories, particularly for bacterial pathogens. 2, 3 Genomic assessment of fungal pathogens is an active but still developing field. 4-6
The introduction and application of microbial genomics in clinical laboratories remain uneven: the necessary instrumentation is often not available, and the lack of assays cleared by the US Food and Drug Administration makes clinical-level validations onerous. 7 Although these challenges are real, there are clear advantages to local (in-house) expertise and WGS data. Several reviews have discussed the use of pathogen NGS for clinical care. 8-10 Commercial solutions for the sequencing of cultured isolates are now available through many reference laboratories, and increasingly for sequencing from primary specimens as well. 11, 12 In this article, we provide an update on the current benefits and challenges of bringing this technology in-house for clinical microbiology laboratories. The focus is not on sequencing technologies, which have been extensively covered elsewhere, 13 but rather on current practical uses and how they may develop in the future. We discuss the use of NGS pathogen genomics for both cultured isolates and primary specimens; for the latter, this diagnostic modality is truly pan-phylogenetic. Metagenomic-based NGS (mNGS) approaches are increasingly at the forefront because they also provide coverage of viral pathogens, which are rarely cultured in the modern clinical laboratory and are not detectable using the universal targets common among bacteria (eg, 16S) and fungi (eg, internal transcribed spacer). Logistical and infrastructural considerations for bringing this technology into a diagnostic microbiology laboratory are also discussed.
For a clinical laboratory, the essential goals of WGS performed on cultured isolates of bacteria and fungi (Fig. 1) are:
• Pathogen identification.
• Assessment of susceptibility to antibiotics.
• Infection prevention, specifically outbreak investigations and pathogen/antimicrobial resistance surveillance.
Standard clinical techniques (culture, microscopy, biochemical characterization, nucleic acid tests [NATs], and matrix-assisted laser desorption ionization-time of flight mass spectrometry) provide sufficient identification for the majority of frequently encountered pathogens. Occasionally, additional resolution is needed, either for accurate speciation or when subspecies information is required. This can be achieved through multilocus sequence typing (MLST), but typing schemes are tailored to a particular microbial species. And although sequencing multiple genetic loci by traditional Sanger methods is possible, scalability is a limitation of Sanger-based approaches, and costs can rapidly outstrip those of WGS when multiple targets are sequenced. In some circumstances a WGS-based approach is advantageous because it can provide clinically actionable information. This can include:
• Distinguishing between members of a clonal complex. For example, the NAP1/ST1 group of Clostridioides difficile is a clinically relevant, high-risk clone that is easily identified by WGS. 14
• Identifying high-risk clones and detecting virulence genes. For example, the rmpA and iucA virulence genes in Klebsiella pneumoniae, when detected by WGS, are often markers of a hypervirulent phenotype associated with severe invasive infections. 15
Although curated databases for the identification of fungi by internal transcribed spacer (ITS) sequencing are available, 16, 17 many fungal taxa cannot be speciated by ITS sequencing alone. 17 Databases available for the identification of fungal pathogens have been reviewed recently, 18 and their limitations highlight the need for WGS approaches. Other considerations for fungal WGS include the following:
• Many fungal species do not have reference genomes. 19
• Large genome size and ploidy variation necessitate specialized bioinformatics tools. 19, 20
• Clinical-grade WGS databases for fungal pathogens do not exist.
Specimens and cultured isolates
• Which specimens or cultured isolates will be prioritized for NGS?
• Suitable extraction method for the specimen type.
• Suitable extraction method for the cell envelope composition of the microbe.
• Biosafety: does the organism need to be inactivated before extraction?
• Platform: long-read and/or short-read sequencer.
• Sequencing depth required: what are the ploidy and genome size of the organism?
• Host nucleic acid depletion method.
• Target enrichment strategy.
• Is there a suitable clinical-grade genome database available?
• Does the organism have a reference genome?
• Is there a typing scheme? Can WGS data be correlated with it?
• Will the entire genome be analyzed or only certain genes (eg, MLST loci or virulence genes)?
• Will genomes be assessed for additional features (mobile genetic elements, ARGs, virulence genes, etc) in addition to clonality?
• Is there sufficient knowledge about which ARGs are important in the organism of interest?
• Which ARG database will be used?
• Will all ARGs be reported, or only those being investigated for discrepancies?
• Is the evolutionary rate of the organism known?
• How is clonality (eg, number of SNP differences) defined in the species of interest?
• What rules will be used to determine whether a genetic variation/mutation is clinically significant?
• How will the results of the investigation be reported?
• What level of identity with the reference sequence is required for clinical reporting?
Phenotypic antibiotic susceptibility testing is inexpensive, robust, and usually agnostic to the mechanism of antibiotic resistance. Antibiotic susceptibility testing methods are commonly augmented by rapid molecular assays or NATs, which detect high-risk and high-probability ARGs. For these reasons, phenotypic antibiotic susceptibility testing is unlikely to be replaced by WGS. However, there are instances where WGS can provide important information about susceptibility.
For example:
• Identification of rare ARGs, chromosomal mutations, and variant alleles not detected by NATs.
• Resolution of discrepancies between phenotypic and NAT results (eg, a phenotypically susceptible organism with an ARG detected by NAT).
• Risk assessment by identifying genes involved in ARG mobilization and horizontal transfer (ie, transposons, insertion sequences, integrons, plasmids).
Infection prevention is a major application of WGS of cultured isolates and facilitates both outbreak investigation and prospective surveillance (see Fig. 1). The most common use of WGS in clinical laboratories is outbreak investigation. WGS of isolates from a suspected outbreak allows for determination of clonality, correlation with regional and national data, and establishment of transmission timelines. 21-24 Although other methods exist for determining isolate clonality (eg, pulsed-field gel electrophoresis), only WGS also identifies ARGs and virulence genes and finely resolves degrees of relatedness. 23, 25, 26 WGS is also the preferred method for fungal genotyping in outbreak investigations, because many of these pathogens lack a preexisting typing scheme. 19, 27 Although WGS-based fungal outbreak investigations pose unique challenges, WGS has been used successfully in this context. 19, 28, 29
Surveillance
WGS is a powerful tool for prospective genomic surveillance, allowing for the detection of cryptic outbreaks 30 and for monitoring the introduction of high-risk clones and ARGs into the hospital environment. 22, 25, 31, 32 Prospective surveillance by WGS also allows each facility to put outbreaks into the context of the local genomic landscape of circulating clones. This can help indicate whether potential outbreak strains are newly introduced or are known clones already circulating in the hospital or local community. Consistent sampling over a long period of time provides better data than sporadic collections of archived strains.
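Clonality calls of the kind described above are commonly based on pairwise single nucleotide polymorphism (SNP) distances between isolates. The following is a minimal Python sketch, not a validated method: it assumes the isolates have already been aligned to a common reference so that all sequences have equal length, and the isolate names, the toy sequences, and the SNP threshold are hypothetical. In practice the threshold that defines clonality has to be established for each species.

```python
from itertools import combinations

# Gap and ambiguous-base characters that are skipped when counting SNPs
IGNORED = set("-N")


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ, skipping gaps and Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in IGNORED and y not in IGNORED
    )


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric pairwise SNP distance matrix keyed by (isolate, isolate)."""
    matrix = {(name, name): 0 for name in alignment}
    for i, j in combinations(sorted(alignment), 2):
        d = snp_distance(alignment[i], alignment[j])
        matrix[(i, j)] = matrix[(j, i)] = d
    return matrix


def snp_clusters(alignment: dict[str, str], threshold: int) -> list[list[str]]:
    """Single-linkage grouping: isolates are linked when their SNP distance
    is at or below `threshold` (a hypothetical, species-specific cutoff)."""
    names = sorted(alignment)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Union-find root lookup with path halving
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    matrix = distance_matrix(alignment)
    for i, j in combinations(names, 2):
        if matrix[(i, j)] <= threshold:
            parent[find(i)] = find(j)

    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical core-genome alignment of four isolates
    alignment = {
        "iso1": "ACGTACGTAC",
        "iso2": "ACGTACGTAA",
        "iso3": "TCGAACGTAC",
        "iso4": "TTTTACGGGG",
    }
    print(snp_clusters(alignment, threshold=2))
    # → [['iso1', 'iso2', 'iso3'], ['iso4']]
```

Single linkage mirrors the pairwise logic of transmission chains, but it can also chain isolates together through intermediates: in the toy data, iso2 and iso3 differ by 3 SNPs yet fall in one cluster because each is within 2 SNPs of iso1. This is one reason a phylogenetic tree should accompany any distance-based cluster call.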
Total nucleic acid sequencing (mNGS directly from primary specimens) as a universal diagnostic test has been reviewed thoroughly in recent years. 33-35 Traditional clinical laboratory methods can fail to establish a causative agent, even in patients with features strongly suggestive of infectious disease. Although the ongoing development of multiplex syndromic NAT panels is a partial solution to this problem, mNGS offers a more comprehensive approach. This is because multiplex syndromic NAT panels have limited inclusivity, generally targeting a select range of the most probable syndromic etiologies, and expanding them drives up costs. 36 Unlike mNGS, established syndromic NAT panels may be incapable of detecting new or emerging strains. This challenge was evident in the coronavirus disease 2019 (COVID-19) pandemic, when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was not detectable by respiratory panels otherwise capable of detecting human coronaviruses (HCoV-229E, HCoV-HKU1, HCoV-NL63, and HCoV-OC43). 37 The power of mNGS for infectious disease diagnostics lies in unbiased, pan-taxonomic identification by total nucleic acid sequencing of primary specimens. Other applications include:
• Identification of DNA and RNA viruses.
• Identification of unculturable or fastidious pathogens, including cases complicated by antibiotic exposure. 38
• Taxonomic profiling of polymicrobial infections. 39, 40
• ARG screening in complex, polymicrobial matrices.
• Assessment of antiviral resistance.
• Outbreak and infection prevention surveillance.
Unbiased metagenomic-based next-generation sequencing approach
An unbiased mNGS approach refers to en masse shotgun sequencing of all microbial and host nucleic acid present in a patient specimen (see Fig. 1).
Because it does not rely on prior culture, enrichment, or amplification, mNGS is considered unbiased and is capable of simultaneously detecting major pathogens from different taxonomic groups (eg, viruses, bacteria, fungi, and parasites). Pathogen detection relies on computational analysis of the resulting sequences to identify those that align with known pathogen sequences. Depending on the method of library preparation, detection may be limited to DNA or may include both RNA and DNA.
Several modifications can be made to mNGS workflows to selectively profile target organisms or improve recovery of pathogen-specific nucleic acids. These include amplicon sequencing and target enrichment (see Fig. 1). Targeted mNGS based on sequencing amplicons (polymerase chain reaction [PCR]-amplified conserved sequences) provides little beyond phylogenetic information and relative abundance (Table 1). Furthermore, amplicon sequencing does not allow for extended sequence-based characterization of detected pathogens, although it may be more sensitive owing to PCR amplification. Target enrichment through hybridization capture (panels of pathogen-specific RNA/DNA probes) can enrich for select pathogens while providing WGS-level information about each of them, but practical limitations mean that the extent of genome coverage is often inversely related to the number of organisms the panel can detect.
The implementation of a preanalytical clinical microbiology genomics workflow for infectious disease diagnosis requires consideration of specimen storage, nucleic acid extraction techniques, and sequence library preparation. 41 Assays based on mNGS require additional considerations, owing to an increased risk of contamination and bias. 42 Unbiased recovery of nucleic acids from a cultured isolate or primary specimen, and successful NGS without the introduction of contaminants, require consideration of the following:
• Specimen collection, storage, and handling (for both WGS and mNGS):
  ◦ Aseptic technique.
  ◦ Transport and storage conditions.
  ◦ Freeze-thaw cycles.
  ◦ Specimen and nucleic acid preservation methods.
• Nucleic acid extraction (for both WGS and mNGS):
  ◦ Commonly used methods include mechanical (bead beating, cryofracturing) and enzymatic lysis.
  ◦ Optimal lysis methods need to be determined for each specimen type.
  ◦ The extraction method can bias microbial community profiles.
• Host nucleic acid depletion (for mNGS):
  ◦ Host nucleic acid depletion aims to increase the ratio of microbial to host nucleic acid. 43
  ◦ Depletion methods, such as the removal of CpG-methylated host DNA or the selective lysis and degradation of host cells and DNA before extraction, are technically challenging, expensive, and time-consuming. 43
  ◦ Depletion can introduce additional bias into the workflow.
• Target enrichment by capture hybridization- and amplification-based technologies (for mNGS):
  ◦ Spiked primer enrichment during reverse transcription can amplify specific sequences while retaining sensitivity for other pathogens. 44
  ◦ Probe-and-capture strategies can selectively enrich relevant nucleic acid sequences, though they may require large probe sets. 45
  ◦ Target enrichment can preclude identification of novel pathogens.
Table 1
Advantages and disadvantages of unbiased and targeted mNGS approaches
Unbiased mNGS
  Advantages: unbiased testing of patient specimen; discovery of novel organisms or traits; characterization of polymicrobial infections; extended pathogen characterization
  Disadvantages: host background (human/microbial); more costly than targeted amplicon sequencing; sequencing depth must be sufficient; easily contaminated with environmental nucleic acid; more challenging computational analysis
Targeted/amplicon mNGS
  Advantages: more sensitive for organism detection; less costly than an unbiased approach
  Disadvantages: often requires amplification using primers that may be suboptimal for the pathogen; only a small fragment of the genome may be sequenced (eg, 16S amplicon profiling); easily contaminated with environmental nucleic acid
Although many of the applications described in this article can be performed by reference laboratories, there are distinct advantages to developing a pathogen NGS program within the clinical diagnostic laboratory. In-house testing provides a faster turnaround time to resolve questions of identification and the detection of resistance genes pertinent to patient care. 46 With respect to surveillance and outbreak questions, in-house testing additionally allows each institution to query databases of its local genomic landscape. Other benefits, particularly in academic institutions, include developing familiarity with the technology and analytical literacy in NGS among staff and trainees, which can nurture expansion of NGS into more diverse applications. Bringing pathogen NGS in-house presents challenges, not least of which are the costs and logistics associated with building the technical and analytical infrastructure.
Other considerations include the need for clinical-grade validation, quality control measures, a proficiency testing program, and a reporting structure for both patient care and hospital infection prevention programs. It is still uncommon for NGS instrumentation to be housed directly in a clinical microbiology laboratory. Pathogen NGS is often developed in collaboration with molecular diagnostic laboratories; however, this practice may change as the costs of NGS platforms continue to decrease. For example, nanopore long-read technology is approachable for most clinical laboratories, and there is growing interest in its application to clinical questions. 47 Commercial analysis platforms are becoming available that can answer many outbreak and typing questions without requiring the skills of a professional bioinformatician. Examples include SeqSphere (Ridom, Münster, Germany) and BioNumerics (Applied Maths, Austin, TX). Other commercial solutions can help with more specific questions, including pathogen identification and viral genotyping/resistance assessment (IDbyDNA, Salt Lake City, UT; SmartGene, Lausanne, Switzerland; etc). Several public agencies have also developed analytical resources to address questions of pathogen identification, antimicrobial resistance, and clonality. 48 Less standardized questions, such as plasmid and transposon analysis in bacteria or viral typing, may require in-house bioinformatics personnel and customized analysis pipelines (see Fig. 1). A high-performance computing cluster is required; the amount of computing power needed depends on the number of samples, the expected turnaround times, and the specific tools chosen within a pipeline. Furthermore, a large amount of data storage is necessary for archival purposes. The design of a clinical pathogen genomics bioinformatics pipeline should focus on scalability and robustness. Running independent tasks in parallel greatly increases computational efficiency.
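Parallel execution of independent per-sample tasks can be sketched with the Python standard library's `concurrent.futures` module. This is a minimal illustration, not a production pipeline: the read quality control task (`mean_phred`, `qc_sample`), the Phred+33 quality encoding, the hypothetical `run_batch` helper, and the cutoff `MIN_MEAN_Q = 20` are all assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical minimum mean read quality; not a validated clinical cutoff
MIN_MEAN_Q = 20


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def qc_sample(name: str, qualities: list[str]) -> tuple[int, int]:
    """Illustrative read-QC task: (reads passing MIN_MEAN_Q, total reads)."""
    passed = sum(1 for q in qualities if mean_phred(q) >= MIN_MEAN_Q)
    return passed, len(qualities)


def run_batch(
    samples: dict[str, list[str]],
    task: Callable[[str, list[str]], tuple[int, int]] = qc_sample,
    workers: int = 4,
) -> dict[str, tuple[int, int]]:
    """Submit one independent task per sample and collect results by name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(task, name, reads) for name, reads in samples.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    # Hypothetical quality strings for two samples
    batch = {"sample_A": ["IIII", "####"], "sample_B": ["5555"]}
    print(run_batch(batch))
    # → {'sample_A': (1, 2), 'sample_B': (1, 1)}
```

A thread pool is used here on the assumption that real pipeline stages mostly wrap external command-line tools, where threads can overlap the waiting; for CPU-bound pure-Python tasks, `ProcessPoolExecutor` offers the same `submit`/`result` interface. Passing the per-sample step in as a swappable function also keeps it testable in isolation.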
A modular design allows for easier testing, validation, and upgrades. Generally, a pipeline includes 3 phases:
1. Read quality control.
2. Either de novo assembly into contigs or read alignment to a reference genome.
3. Genotyping, identification of resistance and virulence factors, single nucleotide polymorphism (SNP) discovery, and phylogenetic tree building.
Clinical bioinformatics pipelines also need to be adaptable and relevant to any species. Pipelines should contain vetted references and regularly updated databases of strain types and ARGs, while also allowing for custom BLAST databases.
Validation and proficiency testing requirements for a pathogen NGS program differ considerably between WGS tests for cultured isolates and mNGS assays. Several authors have described the development of pathogen WGS validation, including the use of test panels, appropriate controls, sequence quality control, and proficiency testing. 49-51 Additional guidance is available from state and national organizations. 52, 53 Validation of mNGS assays is less established but has been described in several publications. 35, 54-56
Reporting
Distilling information from pathogen NGS into an approachable report is challenging. 57 For surveillance and outbreak reporting to infection prevention teams, the report should include an introduction, a list of isolates/patients, and a description of the question at hand. For questions of antibiotic resistance, a table of pertinent ARGs and their relationship to phenotypic susceptibility should be included. For questions of clonality (ie, a potential outbreak), a phylogenetic tree and an SNP distance matrix are appropriate. As with any clinical report, a description of the technical and analytical methods must be included. Reporting of clinical metagenomics is more straightforward and has been discussed elsewhere. 34, 35, 56
SUMMARY
NGS-based diagnostics for infectious disease are poised to enter routine clinical practice for a variety of applications. Here we have summarized their usefulness for assays based on cultured isolates and primary specimens. Improved turnaround times, the ability to compile a comprehensive database of local pathogens and clones, enhanced literacy among trainees and laboratory directors, and the ability to develop applications for the unique needs of individual medical centers all number among the benefits of in-house NGS testing. To realize these benefits, a well-thought-out reporting structure and ongoing dialogue with clinicians are critical for making pathogen NGS data actionable. The challenges of in-house pathogen NGS include financial, technical, and validation/quality control issues. The need to batch tests for efficiency of scale may necessitate a strategy for bringing pathogen NGS in-house on multiple fronts. For example, although the sequencing of cultured bacterial isolates is a good founder test, planning for other applications should follow closely behind. The future may bring microbiome-based diagnostics and other paradigm-shifting mNGS assays. For example, although host reads are an obstacle to pathogen detection, host transcriptional profiling (ie, RNA sequencing) could provide clinically relevant information about the immune response. An mNGS-based understanding of the host response to infection may also help distinguish colonization from infectious disease (analogous to the enumeration of host polymorphonuclear neutrophils in smears prepared from sterile site specimens). 58, 59 As the price of NGS comes down, and literacy and comfort levels go up, it will be exciting to see how these platforms and technologies enter routine practice.
No author has commercial or financial conflicts of interest. N.D. Pecora is supported by funds available from the Department of Pathology and Laboratory Medicine, University of Rochester Medical Center.
1. Whole-genome sequencing of bacterial pathogens: the future of nosocomial outbreak analysis
2. Genomics of foodborne pathogens for microbial food safety
3. Routine use of microbial whole genome sequencing in diagnostic and public health microbiology
4. The fungal tree of life: from molecular systematics to genome-scale phylogenies
5. Harnessing whole genome sequencing in medical mycology
6. A new age in molecular diagnostics for invasive fungal disease: are we ready?
7. Bacterial genome sequencing in the clinic: bioinformatic challenges and solutions
8. Making the leap from research laboratory to clinic: challenges and opportunities for next-generation sequencing in infectious disease diagnostics
9. Next-generation sequencing in clinical microbiology: are we there yet?
10. Understanding and overcoming the pitfalls and biases of next-generation sequencing (NGS) methods for use in the routine clinical microbiological diagnostic laboratory
11. Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease
12. Contact transmission of vaccinia to an infant diagnosed by viral culture and metagenomic sequencing
13. Advancements in next-generation sequencing
14. An epidemic, toxin gene-variant strain of Clostridium difficile
15. Hypervirulent Klebsiella pneumoniae
16. the Barcode of Life Data System
17. International Society of Human and Animal Mycology (ISHAM)-ITS reference DNA barcoding database: the quality controlled standard tool for routine identification of human and animal pathogenic fungi
18. Online databases for taxonomy and identification of pathogenic fungi and proposal for a cloud-based dynamic data network platform
19. Healthcare-associated fungal outbreaks: new and uncommon species, new molecular tools for investigation and prevention
20. Investigating fungal outbreaks in the 21st century
21. One-year molecular surveillance of carbapenem-susceptible A. baumannii on a German intensive care unit: diversity or clonality
22. Whole-genome sequencing as standard practice for the analysis of clonality in outbreaks of methicillin-resistant Staphylococcus aureus in a paediatric setting
23. Serratia marcescens outbreak in a neonatal intensive care unit: new insights from next-generation sequencing applications
24. Community-acquired in name only: a cluster of carbapenem-resistant Acinetobacter baumannii in a burn intensive care unit and beyond
25. Next-generation-sequencing-based hospital outbreak investigation yields insight into Klebsiella aerogenes population structure and determinants of carbapenem resistance and pathogenicity
26. Assessment of the local clonal spread of Streptococcus pneumoniae serotype 12F caused invasive pneumococcal diseases among children and adults
27. Investigating clinical issues by genotyping of medically important fungi: why and how?
28. Database establishment for the secondary fungal DNA barcode translational elongation factor 1α (TEF1α) (1)
29. Outbreak of invasive wound mucormycosis in a burn unit due to multiple strains of Mucor circinelloides resolved by whole-genome sequencing
30. Molecular epidemiology of Staphylococcus aureus bacteremia in a single large Minnesota medical center in 2015 as assessed using MLST, core genome MLST and spa typing
31. A year of infection in the intensive care unit: prospective whole genome sequencing of bacterial clinical isolates reveals cryptic transmissions and novel microbiota
32. Emergence of vancomycin-resistant Enterococcus faecium at an Australian hospital: a whole genome sequencing analysis
33. mNGS in clinical microbiology laboratories: on the road to maturity
34. Clinical metagenomics
35. Clinical metagenomic next-generation sequencing for pathogen detection
36. Syndromic and point-of-care molecular testing
37. Novel coronavirus: from discovery to clinical diagnostics
38. Microbiological diagnostic performance of metagenomic next-generation sequencing when applied to clinical practice
39. Rapid analysis of bacterial composition in prosthetic joint infection by 16S rRNA metagenomic sequencing
40. Clinical metagenomics of bone and joint infections: a proof of concept study
41. Advances in clinical sample preparation for identification and characterization of bacterial pathogens using metagenomics
42. Consistent and correctable bias in metagenomic sequencing experiments
43. Comparison of microbial DNA enrichment tools for metagenomic whole genome sequencing
44. Metagenomic sequencing with spiked primer enrichment for viral diagnostics and genomic surveillance
45. Capturing the Resistome: a targeted capture method to reveal antibiotic resistance determinants in metagenomes
46. Real time application of whole genome sequencing for outbreak investigation: what is an achievable turnaround time?
47. Third-generation sequencing in the clinical laboratory: exploring the advantages and challenges of nanopore sequencing
48. FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science
49. Setup, validation, and quality control of a centralized whole-genome-sequencing laboratory: lessons learned
50. Validation and implementation of Clinical Laboratory Improvements Act-compliant whole-genome sequencing in the public health microbiology laboratory
51. GenomeTrakr proficiency testing for foodborne pathogen surveillance: an exercise from 2015
52. Next-generation sequencing for infectious disease diagnosis and management: a report of the Association for Molecular Pathology
53. College of American Pathologists' laboratory standards for next-generation sequencing clinical tests
54. Laboratory validation of a clinical metagenomic sequencing assay for pathogen detection in cerebrospinal fluid
55. Proficiency testing of virus diagnostics based on bioinformatics analysis of simulated in silico high-throughput sequencing data sets
56. Validation of metagenomic next-generation sequencing tests for universal pathogen detection
57. Evidence-based design and evaluation of a whole genome sequencing clinical report for the reference microbiology laboratory
58. A cell-free DNA metagenomic sequencing assay that integrates the host injury response to infection
59. Metagenomic signatures of gut infections caused by different Escherichia coli pathotypes