title: Crystal ball 2020: viral discovery in the 'realm' of COVID-19
authors: Bistolas, Kalia; Vega Thurber, Rebecca
date: 2020-12-06
journal: Environ Microbiol Rep
DOI: 10.1111/1758-2229.12912

As we sit six feet apart in the San Francisco airport terminal, waiting for a flight to our field site, we hear an attendant's voice echoing, 'All passengers must provide proof of a negative RT-qPCR COVID-19 test prior to boarding the airplane'. A year ago, we would have been hard-pressed to hear such terminology on any loudspeaker in a major US airport. But a year ago, we were not mid-pandemic. When we reach the front of the boarding line, the attendant checks our documentation as another scans the crowd for anyone looking ill, sweating, coughing. In the corner, a teenager reads about viral replication in the New York Times (Corum and Zimmer, 2020). Another few rows over, a child is teaching his two stuffed dinosaurs, both wearing tiny masks, how to properly distance themselves. After we land in French Polynesia, we are briefed by an army of attendants and biosafety agents on what COVID-19 is, how SARS-CoV-2 is transmitted, and how to self-administer a diagnostic test and return it to a local processing facility. This is virology gone mainstream. For anyone who has witnessed and characterized epizootics and heard the many predictions of the next major emerging infectious disease (EID) in wildlife, humans, or both (Ogden et al., 2017), this has been a surreal experience. The surfacing and spread of SARS-CoV-2 has been an explicit (and sobering) reminder that increased human interaction with wildlife and habitat encroachment pose a threat not only to wildlife health but also to our own.
As human influence advances, these potential threats extend beyond the terrestrial and into aquatic ecosystems through the aquaculture we consume, the waterways we utilize, and the organisms we increasingly encounter (Cotruvo et al., 2013). The magnitude and frequency of mass mortality events (MMEs) within marine ecosystems are escalating incrementally, although it is often unclear whether this reflects greater detection effort or external factors such as pollution and thermal stress mediated by climate change (Fey et al., 2015; Sanderson and Alexander, 2020). Uniting trends in the emergence of marine epizootics have included changes in either (i) host distribution (e.g. the joined proximity of normally allopatric species through alterations in land use, trade, travel, or migration, and increases in host density) or (ii) microbial phenotype (e.g. change in transmissibility, pathogenicity, or host niche through genetic adaptation) (Daszak, 2000; Ogden et al., 2017). In marine mammals, a recent study concluded that 72% of MMEs were likely attributable to viral pathogens, indicating unique attributes for spillover and transmissibility as EIDs and reflecting their potential zoonotic threat (Sanderson and Alexander, 2020). These viruses pose risks to aquatic community stability, biodiversity, conservation efforts and the aquaculture economy, and do not appear to be isolated from terrestrial ecosystems. For example, evidence of multiple instances of morbillivirus (e.g. canine distemper) spillover from domesticated dogs to pinnipeds suggests that proximity of the two hosts may have been a factor; arbovirus identification (e.g. mosquito-borne togaviruses and flaviviruses in cetaceans) may be indicative of viral vectoring by terrestrial invertebrates; and the atypical spread of a herpesvirus-like MME among pilchards (Australia, 1995-98) suggests involvement of seabirds (Lafferty and Harvell, 2014; Bossart and Duignan, 2018).
This epizootic among pilchards also showed the ability of marine viruses to spread rapidly (5000 km in 7 months), further driving the hypothesis that MMEs may advance faster in aquatic ecosystems (due to water having a higher connectivity and lower granularity than air), with pathogens exploiting indirect mechanisms of infection (Harvell, 1999; McCallum et al., 2004). The biological and economic results of such fast-spreading MMEs can be dramatic. This MME among pilchards alone resulted in a loss of more than AUD 12 million to the Australian aquaculture industry over a 3-year period. Yet this figure is small compared with the billions of dollars lost to epizootics in penaeid shrimp, oysters, abalone, lobster, and other invertebrates, and to the countless other viral pathogens exerting pressure on fisheries and aquatic cultivation industries worldwide (Lafferty et al., 2015). The reality we face in investigating viral diversity in non-model hosts is that this exploration may become salient faster than we realize. This is supported by the ratio between studies on Coronaviridae in Chiroptera or wildlife published in 2020 and the total sum of those published in the previous decade or two decades (≥1:1 respectively, search: November 2020, PubMed). Thankfully, the tools to examine the viral community composition of non-model hosts and ecosystems are more accessible now than ever. Technology ranging from smartphone-enabled nanoscale microscopy (Diederich et al., 2020, preprint; Wei et al., 2013) to desk-side sequencing (e.g. MinION; Greninger et al., 2015) is becoming available and practical for everyday users, expanding not only our insight into the repertoire of viral biogeography and tropism in the animals we consume and sell, but also our understanding of the ecosystems in our own backyards.
Moreover, terabases of data from over a decade of metagenomic sequencing represent a trove of potential unmined viral sequences if paired with sufficient metadata; viruses are routinely caught on water filters or within tissues and sequenced in tandem with their hosts. In a 2017 issue, Sullivan, Weitz, and Wilhelm noted that by 2020, computational tools to analyse viral diversity and the mechanisms of infection and disease would become easily accessible to biologists, that is, democratized (Sullivan et al., 2017). As virology rests in the spotlight, many of these tools are being assimilated into everyday conventional language and advanced by the small flood of computationally minded individuals with a newly vested interest in solving bioinformatic hurdles in viral discovery. A few of these innovations have provided the fundamental steps for navigating surveillance, management, and perhaps prediction of marine EIDs in a generation defined by viral discovery.

Order from chaos - the value of discovery-based sequencing in non-model hosts

The global 'virome' has long served as a frontier for genetic discovery. In the past decade, we have witnessed exponential growth in cultivated and uncultivated viral genomes (currently >2 million putative virus-like sequences; IMG/VR; Roux et al., 2020) and cellular hosts, contributing to an ever-growing inventory of possible therapeutic bacteriophage, drug therapy vectors, oncoviruses, pathogens, and more. In particular, our knowledge of two groups of viruses has matured: large DNA viruses (often previously excluded due to size-specific purification efforts) and RNA viruses (previously excluded due to DNA-centric sequencing efforts). One study alone increased the total number of genomes in one viral group, the nucleocytoplasmic large DNA viruses (NCLDVs), by >1100%.
Similarly, giant/jumbo bacteriophage with genomes >500 kb, virions longer than 600 nm, tRNA synthetases and proteinaceous nucleus-like compartments pose exciting new questions about virion architectures, infection and defence strategies, and evolutionary trajectories (Malone et al., 2020). Highlighting the utility of database mining, another study identified 10 000 RdRp genes, hallmarks of RNA viruses, prompting re-evaluation of gene exchange between RNA viral families and complete reconstruction of the new 'realm' of the Riboviria (Wolf et al., 2020). Members of these newly discovered RNA viruses disproportionately infect invertebrates, plants, and metazoans, yet it remains unclear what impact these fast-evolving genomes have on their hosts. Without this knowledge, it is nearly impossible to anticipate zoonoses in aquatic ecosystems and conserve wildlife threatened by disease. However, many of these discoveries provide evolutionary context for highly pathogenic viruses in non-model hosts or may inform future sampling efforts if sequences sharing features with viruses that elicit MMEs are identified. Collectively, viral discovery in non-model hosts has reduced the viral world from an unknowable immensity of taxonomic singletons to a more limited, interconnected taxonomy linked by gene-sharing networks (Koonin and Dolja, 2014; Koonin et al., 2020) and allowed risk-based surveillance of potential MME-eliciting aquatic viruses. While the total scope of diversity remains uncharted in most ecosystems, discovery rates of viruses in some deeply sampled systems and hosts have started to approach saturation, placing constraints on viral community complexity, as well as on our understanding of virion morphotypes, gene repertoire and more (Koonin and Dolja, 2014; Gregory et al., 2019).
For example, some conservative estimates put the total number of dsDNA viral genes at 4 million, implying a finite number of possible virion structural designs and replication strategies. While the total gene repertoire may be nearly cosmic, only a few genes are shared by a wide variety of viruses, e.g. polymerases (RdRps, RTs), structural elements (SJR and DJR capsids), and helicases and endonucleases (S3Hs and RCREs) (Koonin and Dolja, 2014; Koonin et al., 2020). These genes have served as cornerstones to redefine viral taxonomy (or 'Megataxonomy'), contextualized by other mobile genetic elements and retroelements. This taxonomic organization may guide our understanding of shared ecological, epidemiological, or evolutionary traits, such as broad predictions of host (e.g. DJR; Nayfach et al., 2020), disease emergence, or infection dynamics in marine ecosystems. Though forecasting power is currently low, these genomic tools, when paired with sufficient data on disease-eliciting environmental conditions, may provide a starting point for pathogen diagnostics, particularly in non-multifactorial marine EIDs. Secondary to its value in disease prediction and spillover prevention, this organization begins to link sequence to function, demonstrating the innovative evolutionary strategies that viruses utilize to navigate their hosts.

Animal-virus interactions are contingent on factors ranging from host ontogenetic immunity to environmental temperature. There are numerous hurdles, from lack of continuous cell culture of appropriate hosts to unfamiliar histopathology, that can inhibit investigations of infection dynamics in aquatic systems. Although the approachability of multi-omic (metagenomic, transcriptomic, etc.) sequencing has revealed increasingly vast and diverse viromes of wildlife, our understanding of the ecology of many of these viruses (their hosts, genetic capabilities, pathogenicity, transmissibility, and so on) remains in its nascency.
However, snapshot and time-series sequencing have provided a framework for ecological inference based on both (i) the aggregate viral signal within ecosystems or hosts and (ii) the whole discrete genome sequence (and subsequent deduction of eco-evolutionary characteristics). Many tools developed for identifying viruses have become increasingly reference-independent, reliant on both genetic identities and features specific to viral genomes (such as CDS density, k-mer content, ORF orientation, and so on; e.g. Roux et al., 2015; Kieft et al., 2020, among others). Organization of viral signal into shared gene or protein networks has proved essential for describing ecosystem- and community-wide patterns such as niche differentiation, community structure and cohesion, and virome functional similarities (Gregory et al., 2019; Hurwitz et al., 2015). Determining 'who infects whom' is a deceptively simple, but fundamentally important, first step in determining how viruses are transmitted and alter wildlife populations, particularly among hosts positioned to expedite spillovers or with a fragile conservation status. Microscopy (e.g. FISH, RNAscope, fluorophore tagging, etc.) and microfluidics (e.g. mining SAGs) provide viable alternatives to cultivation in non-model hosts, but identifying which virus infects which host is now commonly achieved in silico. The field of 'paleovirology' can identify the remnants of past infections in the predicted host (e.g. Geering et al., 2014; Moniruzzaman et al., 2020) through integrations in the germline sequence of a eukaryote, somewhat analogous to components of sequence-dependent defence systems such as CRISPR, though not always in a functional capacity. As it stands, 85% of viral sequences are affiliated with a predicted host, and a database of more than 700 000 non-retroviral endogenized genes has provided insight into the history of infection in mammals (Nakagawa and Takahashi, 2016).
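As a rough illustration of the reference-independent features mentioned above (k-mer content, CDS density and ORF orientation), the following Python sketch computes them for a single contig. It is a minimal example under simplifying assumptions (a naive ATG-to-stop ORF scan on both strands, no gene caller) and is not the method implemented by VirSorter, VIBRANT or any other published tool; all function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq, k=4):
    """Relative frequency of each A/C/G/T k-mer; windows with other letters are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def find_orfs(seq, min_len=90):
    """Naive ORF scan (assumption: ATG to the first in-frame stop, both strands).

    Returns sorted (start, end, strand) tuples in forward-strand coordinates.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        if strand == "+":
                            orfs.append((start, end, strand))
                        else:
                            # Map reverse-strand coordinates back to the forward strand.
                            orfs.append((n - end, n - start, strand))
                    start = None
    return sorted(orfs)


def contig_features(seq, k=4, min_len=90):
    """Summarise coding density, ORF strand switching and k-mer content for one contig."""
    orfs = find_orfs(seq, min_len)
    covered = set()
    for start, end, _ in orfs:
        covered.update(range(start, end))
    strands = [strand for _, _, strand in orfs]
    switches = sum(a != b for a, b in zip(strands, strands[1:]))
    return {
        "coding_density": len(covered) / len(seq) if seq else 0.0,
        "orf_count": len(orfs),
        "strand_switch_fraction": switches / (len(strands) - 1) if len(strands) > 1 else 0.0,
        "kmer_freq": kmer_frequencies(seq, k),
    }
```

In practice, feature vectors like these would be computed across many contigs and passed to a classifier trained on known viral and cellular sequences; the thresholds and feature choices here are illustrative only.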
Full viral genome assemblies, while often the norm for viruses infecting humans and model systems, are now also becoming attainable in wildlife and environmental systems. Contingent on sequencing depth, viromes and metagenomes enable assembly of high-abundance viral genomes and provide a snapshot of the net diversity of low-abundance viruses in the form of fragmented contigs. Single-virus genomics (SVG), which combines flow cytometric sorting, whole genome amplification and sequencing, now enables more complete genome sequences of these low-abundance virions. The development of SVG, in particular, has delivered near-complete genomes of large viruses previously thought to be cellular (Martínez Martínez et al., 2020), and highly microdiverse genomes that previously could not be assembled (Martinez-Hernandez et al., 2017). These glimpses of full viral genomes also reveal viral microdiversity on a population level, lending insight into transmission and persistence (Gregory et al., 2019). For example, read recruitment and single nucleotide polymorphism detection at the level of complete viral sequences, even those at low abundance, may define vectored transmission between specific wildlife in much the same way that we may contact-trace those with high-titre infections of SARS-CoV-2 (Laha et al., 2020; Meredith et al., 2020). Ultimately, these genomic sequences are essential to extend questions beyond 'what is where/when' (viral discovery and evaluation of viral signal) to ask 'what are they doing?' (functional capacity, genomic conformation, transcriptomics, proteomics, etc.) and begin to anticipate their potential to elicit consequential epizootics. When higher resolution is required to determine the individual impact a virus has on the cell(s) it infects, cultivation remains the gold (and often unattainable) standard in virus-host pairing and infection dynamics.
Viral isolation and cultivation remain definitive to fulfil 'Rivers' postulates' and demonstrate causation between pathogen and disease (Rivers, 1937). However, even if host prediction is correct, culture represents an abnormal system, with the potential for contamination by latent or other opportunists, with conceivably atypical cell types, atypical abiotic characteristics and atypical potential for coinfection. These conditions often make viral challenge experiments (such as those performed when investigating putative pathogens OsHV-1 or SSaDV; Burge et al., 2016) preferable to viral isolation and culture, though they may be similarly confounded by many of these same factors. Although ambitious endeavours to examine the mechanisms of co-evolution and the genomic underpinnings of infection have undergone a renaissance in marine bacteria and archaea (Kauffman et al., 2018), those in marine metazoans continue to lag. In wildlife EIDs, serology, histopathology, microscopy and other gene-based detection methods (e.g. qPCR, host transcriptomics, etc.) fill the gap. In silico protein-protein interaction networks may accelerate our understanding of the viral 'interactome' beyond model hosts as protein functional prediction advances. For example, in silico prediction of the residues at which a viral protein binds its host receptor, followed by protein expression and experimental validation via affinity purification mass spectrometry, can provide key information about tropism, cell biology and expression patterns during infection, and spillover risk. Indeed, this in silico approach was utilized as proof-of-concept in a range of scenarios, from identifying binding sites in non-model hosts (Kamal et al., 2019) to evaluating differences in protein interaction networks between bat and human coronaviruses in this most recent pandemic (Ortega et al., 2020). In 2013, titans of the microbiology and symbiosis fields remarked that animals inhabit a bacterial world (McFall-Ngai et al., 2013).
It is not hard to argue that we, in fact, also live in a viral one. With advancing computational tools, the field has developed the ability to explore how viruses and their hosts have altered each other's origins and evolution, and continue to transmit, infect and affect each other's genomes. We have the ability to investigate the intersection between host development and infection, and between external environmental impacts and infection. We are also facing a potential epochal shift in the way that we apply these democratized computational tools. The cost of zoonoses and illiteracy in viral diversity in threatened wildlife is no longer hypothetical. These tools may be applied to investigate this larger viral epizootic pool to better anticipate spillover into new species or ourselves and preserve global biodiversity as we encroach on new habitats. Many fields continue to undergo rapid and profound change in response to the viral pandemic, and viral ecology is no exception. Programs, both established and new, have coalesced to strategically sequence non-model hosts and ecosystems (Kress et al., 2020; Watsa et al., 2020), with many calling for coordinated efforts to prevent future spillovers from wildlife. We predict that any high-throughput efforts will drive the development of initial rapid and 'low-investment' in silico analyses that provide a deeper understanding of genomic conformation/modification, viral protein expression/folding/maturation, protein-protein interactions and their relevance to zoonotic risk, in addition to basic taxonomic and evolutionary context. Though not without bias, we hope that the expansion of single-cell sequencing will also provide a higher resolution understanding of viral ecology. Coupled with (i) sufficient accessibility to democratized in silico tools, (ii) well-documented data provenance and (iii) well-reported metadata, open access data are an underutilized resource to explore viral diversity in non-model hosts.
Roughly one in five published metagenomes is not accessible, and even accessibility does not imply functionality (Eckert et al., 2020). From such open access data, Nayfach et al. (2020) were able to predict putative hosts for >81 000 viral sequences and link multiple viral clades. Further high-throughput discovery of RNA viruses and investigation of their ecology is just beginning. Though sequencing provides the groundwork for many questions, we believe that a deeper understanding of infection dynamics through culture, challenge, histopathology, serology and hypothesis-driven experimentation will endure. In our attempt to identify recent field-revolutionizing advances or predict transformative trends, we could not overlook those provided by the pandemic, at a substantial cost. We could not justify a single new advancement or tool, computational or otherwise, that we think will have more of an impact on the field of viral ecology than new human capital. We have seen openly available computational programs blossom over the last few years, along with a wealth of underexplored data that is both publicly available and relatively inexpensive to access. When you add curiosity, a bit of time, and a ruthless need to understand the viral world we find ourselves inhabiting into this mix (all the things that a disproportionately large number of this next generation may harbour), there is unequivocally no predicting what we will learn about viral ecology in the natural world. This is the mainstream, entering virology.

References

Emerging viruses in marine mammals
Complementary approaches to diagnosing marine diseases: a union of the modern and the classic
Bad News Wrapped in Protein: Inside the Coronavirus Genome. The New York Times
Waterborne Zoonoses: Identification, Causes and Control
Emerging infectious diseases of wildlife - threats to biodiversity and human health
Deep roots and splendid boughs of the global plant virome
Every fifth published metagenome is not available to science
Recent shifts in the occurrence, cause, and magnitude of animal mass mortality events
Endogenous florendoviruses are major components of plant genomes and hallmarks of virus evolution
Marine DNA viral macro- and microdiversity from pole to pole
Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis
Emerging marine diseases - climate links and anthropogenic factors
Depth-stratified functional and taxonomic niche specialization in the 'core' and 'flexible' Pacific Ocean Virome
In silico prediction and validations of domains involved in Gossypium hirsutum SnRK1 protein interaction with cotton leaf curl Multan betasatellite encoded βC1
VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences
Virus world as an evolutionary network of viruses and capsidless selfish elements
Global organization and proposed megataxonomy of the virus world
Opinion: intercepting pandemics through genomics
The Role of Infectious Disease in Marine Communities: Chapter 5
Infectious diseases affect marine fisheries and aquaculture economics
Characterizations of SARS-CoV-2 mutational profile, spike protein stability and viral transmission. Infection
A jumbo phage that forms a nucleus-like structure evades CRISPR-Cas DNA targeting but is vulnerable to type III RNA-based immunity
Single-virus genomics and beyond
Single-virus genomics reveals hidden cosmopolitan and abundant viruses
Does terrestrial epidemiology apply to marine systems?
Animals in a bacterial world, a new imperative for the life sciences
Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: a prospective genomic surveillance study
Widespread endogenization of giant viruses shapes genomes of green algae
gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes
Emerging infectious diseases: prediction and detection
Role of changes in SARS-CoV-2 spike protein in the interaction with the human ACE2 receptor: an in silico analysis
Viruses and Koch's postulates
VirSorter: mining viral signal from microbial genomic data
IMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses
Unchartered waters: climate change likely to intensify infectious disease outbreaks causing mass mortality events in marine mammals
Giant virus diversity and host interactions through global metagenomics
Viral ecology comes of age
Disease Surveillance Focus Group. (2020) Rigorous wildlife disease surveillance
Fluorescent imaging of single nanoparticles and viruses on a smart phone
Doubling of the known set of RNA viruses by metagenomic analysis of an aquatic virome

This work was funded by a National Science Foundation Biological Oceanography Grant (#1635913) to RVT and a National Science Foundation Postdoctoral Research Fellowship in Biology (#1907184) to KB.