key: cord-0628288-t3ofcsxd authors: Consortium, The Nucleic Acid Observatory title: A Global Nucleic Acid Observatory for Biodefense and Planetary Health date: 2021-08-05 sha: f64ddcced9c2b47d293e5de7fab12dc60b51bd70 doc_id: 628288 cord_uid: t3ofcsxd

The spread of pandemic viruses and invasive species can be catastrophic for human societies and natural ecosystems. SARS-CoV-2 demonstrated that the speed of our response is critical, as each day of delay permitted exponential growth and dispersion of the virus. Here we propose a global Nucleic Acid Observatory (NAO) to monitor the relative frequency of everything biological through comprehensive metagenomic sequencing of waterways and wastewater. By searching for divergences from historical baseline frequencies at sites throughout the world, the NAO could detect any virus or invasive organism undergoing exponential growth whose nucleic acids end up in the water, even those previously unknown to science. Continuously monitoring nucleic acid diversity would provide universal early warning, obviate subtle bioweapons, and generate a wealth of sequence data sufficient to transform ecology, microbiology, and conservation. We call for the immediate construction of a global NAO to defend and illuminate planetary health.

The SARS-CoV-2 pandemic has demonstrated our profound vulnerability to exponentially spreading viruses. Even though the first cases were estimated to have occurred in November or possibly October 1 , they were not detected until December 2 . Had governments acted just a few weeks earlier, the virus might have been kept out of many more countries or even eradicated before becoming a global pandemic, as was accomplished for SARS-CoV-1 in 2003 3 and for Ebola in 2014-2015 4, 5 . Notably, the places that responded swiftly and aggressively to SARS-CoV-2 fared best early on, including New Zealand, Vietnam, China, South Korea, and Australia 6 .
When an enemy can spread exponentially, early detection and response require exponentially fewer resources to achieve containment and eradication. The same logic holds for invasive species, which devastate natural ecosystems and cause immense damage to the agriculture and forestry sectors 7 . Invasions were estimated to cost an inflation-adjusted $54 billion per year in the United States 8 and a minimum of $47-$163 billion worldwide, with costs roughly doubling every six years from 1970 to 2017 9 . The U.S. and China have the most to lose from future invasions 10 . Analyses of containment and eradication programs targeting 136 invasive plants 11, 12 and 130 insects 13 consistently found that the larger the geographic area of the infestation, the lower the odds of success 14 . The ease of detecting the target pest was one of the most critical factors associated with eradication success. These dynamics underscore the importance of early warning systems for containing and eradicating biological invasives. Pandemic viruses and pests share one critical feature with all other living things: their genomes are made of nucleic acids. In every region of the world, fragments of this DNA and RNA are washed into local bodies of water, where new techniques have begun to permit their reliable detection 15, 16 . SARS-CoV-2 has been detected in wastewater days before the first clinical cases were reported in surrounding communities 17 ; wastewater sequencing has even detected and identified variants of the virus 18-21 as well as many other viruses 22-29 and antibiotic resistance genes 30 . Laboratories using metagenomic sequencing to analyze samples of river water have successfully detected signatures of species from throughout the associated watershed 15 , including terrestrial macrofauna 16 , and can distinguish between closely related species 31 (Fig. 1) . Even low-throughput metagenomic sequencing can detect spiked-in pathogen DNA in less than 8 hours 32,33 .
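The claim that an early response needs exponentially fewer resources follows from simple doubling arithmetic. A minimal sketch, assuming a constant doubling time (the 3-day value below is a hypothetical illustration, not a figure from this paper), shows how many times larger a population becomes during a detection delay; the same arithmetic converts the roughly six-year doubling of invasion costs quoted above into an implied annual growth rate.

```python
def growth_factor(delay: float, doubling_time: float) -> float:
    """Fold-increase of an exponentially growing population after `delay`
    time units, given a constant doubling time in the same units."""
    return 2.0 ** (delay / doubling_time)


def annual_growth_rate(doubling_time_years: float) -> float:
    """Constant yearly growth rate implied by a given doubling time."""
    return 2.0 ** (1.0 / doubling_time_years) - 1.0


if __name__ == "__main__":
    # Hypothetical 3-day doubling time, used purely for illustration.
    for delay_days in (7, 14, 21):
        print(f"{delay_days}-day delay -> {growth_factor(delay_days, 3.0):.0f}x larger")
    # Invasion costs doubling roughly every six years (figure quoted in the text).
    print(f"implied annual cost growth: {annual_growth_rate(6.0):.1%}")
```

Under these assumptions, every additional doubling time of delay doubles the size of the outbreak that containment must deal with, which is why a few weeks can separate eradication from a pandemic.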
While most of the metagenomic studies cited above suggest that shotgun metagenomics alone has a higher limit of detection than quantitative PCR, there are exceptions in both environmental 34, 35 and clinical samples 36,37 , and it is widely agreed that performing unbiased and target-specific amplification in parallel before sequencing can confer qPCR-level sensitivity to specific sequences of interest while maintaining the ability to detect previously unknown agents 38,39 or strains with mutations in commonly used RT-PCR primer binding sites 22 . Therefore, a system that combines unbiased amplification, to detect previously unknown or foreign biological agents, with more sensitive monitoring of particular species of concern in both wastewater and natural waterways appears to offer the best of both worlds (Fig. 2) . The case study that best dramatizes the importance of early detection used a much older technology: in 2013, Israel's poliovirus-specific environmental monitoring program detected a nascent outbreak in wastewater samples from the town of Rahat using plaque assays and swiftly initiated mass oral vaccination, eliminating the virus before a single child developed paralytic symptoms 40 . Today, we have the potential to detect any virus or invasive species, even novel pandemic-class agents unknown to science, through comprehensive metagenomic sequencing of waterways. Here we propose to build a Nucleic Acid Observatory to continuously monitor the global environment for past, present, and future pandemic viruses and invasive pests.
[Figure caption (partial): Sequencing at M monitors the health of the marine ecosystem, while wastewater sequencing at W can sensitively detect human-specific pathogens.]
[Figure caption: Overview of sample processing, sequencing, and analysis. Processed samples are sequenced with and without targeted amplification of known agents. Unbiased sequencing can detect any exponential threat by monitoring the abundance of different k-mer fragments over time. The k-mers comprising any exponentially growing biological agent will increase in frequency as a group, permitting identification of the agent. Once identified, the threat can be added to the list of targets for sequence-specific amplification or enrichment, which will maximize sensitivity and immediately determine whether the new threat is present at any other sites of the Observatory. Targeted sequencing of stored samples could determine the time of introduction to any given site, aiding subsequent investigations. Alternative sequence-specific detection methods such as RT-PCR or LAMP may be used as appropriate.]
The world's demonstrated vulnerability to SARS-CoV-2 appears reason enough to invest in a comprehensive early-warning system. The United States has lost more citizens to the pandemic than in all of its military conflicts of the past century, yet it devotes less than 1% of its defense budget to biodefense 41 . Indeed, not a single line of the nearly $700 billion 2020 defense appropriations bill mentioned anything biological 42 . Most of the small investment in biosecurity is focused on anthrax and other agents that behave like chemical weapons, whose theoretical worst-case tolls are orders of magnitude lower than those of pandemics and other autonomously spreading pathogens, and which are far less accessible because they require a sophisticated aerosol delivery system 43, 44 . Agriculture and natural ecosystems are equally, if not more, susceptible to autonomous pathogens. Historical blights of staple crops caused by fungi and fungus-like oomycetes have led to massive famines, most infamously in Ireland; many still cause tremendous losses today 45 . Concerted outbreaks afflicting multiple crops could be devastating for the world, especially given our increasing reliance on near-monocultures 46 . Livestock can also suffer: the ~1% case fatality rate of SARS-CoV-2 is dwarfed by the 80-100% lethality of African swine fever in pigs 47 .
An estimated 13-27% of the world's swine were lost to the 2018-2019 outbreak 48 . For the moment, we largely lack the capacity to engineer biological agents capable of spreading invasively in the wild. The sole current exception, CRISPR-based gene drive technology capable of editing populations of sexually reproducing organisms 49 , underscores the critical importance of a Nucleic Acid Observatory. Relative to viruses, gene drive systems spread more slowly and can be countered far more reliably 50 , as any given drive can be overwritten by building and releasing a corresponding "immunizing reversal" drive system 49, 51 . Even so, a harmful drive system could cause tremendous damage if permitted to grow exponentially; for some species, individuals with relevant technical skills could build and release such a construct single-handedly. Genome sequencing, whether of at-risk organisms or of metagenomes, appears to be the only reliable method of detecting engineered gene drive systems in the environment, and is therefore required for defense. Despite defense projects aimed at sensitively detecting engineered sequences in the environment 52 , no such system has been constructed at scale. The possibility that individuals could single-handedly edit wild species had never been imagined before the invention of CRISPR-based gene drive, raising the unsettling possibility that we may stumble upon similarly unanticipated ways of engineering exponential spread. Even if the unknown unknowns prove harmless, a number of emerging threats are clearly visible today. For example, several current avenues of research promise to publicly identify potentially pandemic viruses [53] [54] [55] and to reveal ways to enhance and weaponize them, either deliberately 56 or by better understanding the factors governing transmissibility 57 or immune evasion 58 . Such agents would be harder to counter than a gene drive construct 59, 60 .
Constructing an early warning system could simultaneously deter attack and maximize the time available for defenders to develop and apply suitable countermeasures. To be effective, monitoring systems should not rely on any particular phenotype, symptom profile, or sequence to be present in future natural outbreaks or bioweapons. Since very few viruses have been sequenced to date, detection systems based on known sequences may not be sensitive to future pandemic agents 61 . For example, multiplexed CRISPR-based diagnostics capable of detecting all current human pathogens have tremendous medical potential 62 , but they could miss future zoonotic agents. Older systems attempting to detect aerosolized agents 63 are similarly limited to known threats, although they could be upgraded by applying metagenomic sequencing. Most importantly, any remotely sophisticated adversary would deliberately engineer a biological weapon to lack the sequences detected by any pre-existing diagnostics and defenses. In contrast, the only way to evade detection by shotgun metagenomic sequencing would be to somehow engineer a pathogen that entirely lacks sequence-amenable nucleic acids in its genome. Since all known living organisms possess genomes that can be sequenced with current technologies and all nucleic acids present in the environment are thought to be detectable in the water 15,64 , analyses searching for sequences diverging from historical baselines could sensitively detect nucleic acids from any biological construct exhibiting exponential growth. As such, a Nucleic Acid Observatory could reliably detect any and all subtle pandemic viruses, bioweapons, or other autonomous biological agents targeting humans, agriculture, or natural ecosystems within days or weeks of introduction. Invasive species are a primary cause of extinction of native species around the world [65] [66] [67] . 
Any waterway-based metagenomic sequencing system capable of reliably detecting a pandemic virus or CRISPR-based gene drive system in a terrestrial organism living in the associated watershed should also be capable of detecting invasive species. Consider forests. The American chestnut was once among the most common trees in eastern North America; owing to the invasive chestnut blight, it is now nearly extinct. Other species have been devastated or are threatened by other agents such as Dutch elm disease, the emerald ash borer, the Asian longhorned beetle, the hemlock woolly adelgid, the gypsy moth, Rapid Ohi'a Death, and more 14, 68, 69 . Even farmed trees are vulnerable; citrus greening disease has devastated the orange industry in Florida, costing the state 34,000 jobs 70 . Assuming that future invasions would be equivalently costly 10 , the NAO could nearly pay for itself simply by enabling their detection and eradication before they become widespread. In addition to detecting invasive species, environmental DNA in rivers can provide information on the abundance of native species, including terrestrial mammals 71 . Such methods have helped scientists monitor numerous endangered species [72] [73] [74] [75] , including those that are challenging to track 76, 77 . In aquatic environments, metagenomic sequencing has been shown to outperform all conventional methods for species tracking 78 . A weekly census of species occupancy and abundance could direct conservation resources more efficiently, and may be particularly relevant to assisting species imperiled by climate change. A key question is whether shotgun metagenomics can monitor abundance as effectively as the metabarcoding approaches more commonly used to detect particular marker genes in eukaryotes. Recent studies suggest that the metagenomic approaches commonly used by microbial ecologists are nearly as sensitive 79 .
In addition, sufficiently deep metagenomic sequencing permits direct monitoring of genetic diversity within populations, which is widely considered a better metric of species robustness than abundance alone [80] [81] [82] [83] [84] . Environmental sequencing of waterways has revealed that anywhere from a large fraction to an overwhelming majority of the nucleic acid sequences in the environment are absent from current repositories 27, [85] [86] [87] . In other words, much of life remains entirely unknown to us. The scientific benefits of sequencing most of the remaining genetic diversity on Earth would extend far beyond conservation; indeed, they are difficult to describe without superlatives. Microbial and macrofaunal ecology, population genetics, geochemistry, evolutionary biology, and many more disciplines would be utterly transformed by such a treasure trove of data. Virtually all of the tangible benefits arising from biological research were enabled or inspired by natural systems, from vaccines to antibiotics, aspirin to insulin, and genetic engineering to DNA sequencing, PCR, and CRISPR. Sequence most living things, and we can reasonably expect to discover many more tools that may become pillars of biotechnology and medicine. While the anticipated scientific benefits of a one-time spatial survey of genomic abundance and diversity would be staggering, the Nucleic Acid Observatory would go beyond a single snapshot by monitoring genetic diversity throughout the world over an extended period 88 . Large spatial and temporal series are considered the gold standard for ecological and evolutionary studies, but are seldom collected because of their considerable expense relative to the typically small funding streams available for ecological research. For example, the entire U.S. National Ecological Observatory Network was funded for $434 million 89 , a rounding error compared with the cost of COVID-19 ($5.73 trillion in relief bills passed by the U.S. Congress alone) and the $3.2-$16 trillion estimated total cost of the pandemic 90, 91 .
If implemented in an even moderately standardized manner, the NAO would generate by far the most comprehensive and useful ecological dataset ever collected. At a high level, the Nucleic Acid Observatory would involve extracting nucleic acids from filters or concentrated water samples from rivers and sewage systems at many sites throughout the world, selectively amplifying any known sequences of concern that may be present, conducting metagenomic sequencing to generate snapshots of the nucleic acid diversity on that day 27 , performing bioinformatic analyses to screen for novel sequences that have become exponentially more common relative to past snapshots, and then assembling those sequences to identify the responsible organism or virus. Each of these steps can be economically and technically optimized. A variety of sampling methods should be evaluated to determine which combinations are most cost-effective at different depths and breadths of threat coverage. Sample collection routinely employs artificial filters, generally developed for medical use and adapted to environmental purposes, to concentrate nucleic acids 64, [92] [93] [94] . More recent developments include passive samplers to which eDNA binds 95 and aquatic organisms that naturally concentrate exogenous DNA and can be cultivated and harvested, such as filter-feeding sponges, shellfish, and other organisms 96 , with the caveats that dense bivalve populations may or may not substantially reduce the total concentration of eDNA 97, 98 and that the net dynamics governing eDNA movement in marine environments remain to be established 98 . Finally, settlement plates and centrifugal concentration of water samples may also have utility in some settings 64 . Each approach has strengths and limitations.
Artificial filters require periodic replacement, and storing samples once they are collected may be challenging. Some automation of both approaches has been achieved in field-deployable autonomous underwater vehicles, and there is significant capacity to improve this further 99 . Living filters such as mussels and other shellfish also concentrate exogenous nucleic acids and may be suitable for monitoring, including some in pre-existing populations, but this concept remains very much in early development 96, 100 . Once concentrated and extracted, nucleic acids are typically amplified, either selectively or indiscriminately, by one of several methods 101 . For the Nucleic Acid Observatory, selective and unbiased approaches would be used in parallel to sensitively detect known sequences of interest while retaining the ability to identify previously unknown sequences that are exponentially increasing in abundance. Different combinations of sampling, filtering, extraction, and amplification methods differ in which source organisms and sequences they are able to detect 15,18,19,102-104 . Therefore, several distinct methods should be used at each testing site to ensure coverage of diverse threats. For example, three protocols may be used to target DNA from cellular organisms, viral and cell-free DNA, and viral and cell-free RNA. Once amplified, samples may be shotgun sequenced by short-read or long-read sequencing, or both. Short-read sequencing based on Illumina technology is currently the most cost-effective on a per-read basis and consequently offers the greatest sensitivity for a given budget, meaning it may be optimal for wastewater 105 . Nanopore sequencing offers much longer reads, superior recognition of gene drive systems and other constructs that combine elements never normally found adjacent to one another, and the ability to sequence DNA containing non-standard bases such as aminoadenine, which was recently found to be widespread in phage genomes [106] [107] [108] .
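Why cheaper reads translate into sensitivity can be made concrete with a simple sampling model. The sketch below assumes, as a simplification not taken from this paper, that each read is drawn independently and that a target makes up a fixed fraction of the sequenced nucleic acids (ignoring amplification and extraction bias). It estimates the chance of seeing at least a given number of target reads, and the read depth needed to see at least one with a chosen probability; the function names and example abundance are hypothetical.

```python
import math


def p_detect_poisson(relative_abundance: float, total_reads: int, min_reads: int = 1) -> float:
    """Probability of observing at least `min_reads` reads from a target that
    makes up `relative_abundance` of the sequenced material, using a Poisson
    approximation to independent read sampling."""
    lam = relative_abundance * total_reads  # expected number of target reads
    # P(X < min_reads) = sum_{i=0}^{min_reads-1} e^-lam * lam^i / i!
    term = math.exp(-lam)
    cdf = 0.0
    for i in range(min_reads):
        cdf += term
        term *= lam / (i + 1)
    return 1.0 - cdf


def reads_needed(relative_abundance: float, target_probability: float) -> int:
    """Smallest read count giving at least `target_probability` chance of
    seeing one or more target reads (exact binomial, single-read threshold).
    Both arguments must lie strictly between 0 and 1."""
    n = math.log(1.0 - target_probability) / math.log1p(-relative_abundance)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical target present at one in ten million nucleic acid fragments.
    print(p_detect_poisson(1e-7, 10_000_000))
    print(reads_needed(1e-7, 0.95))
```

Under this model, a target at one part in ten million needs roughly 30 million reads for a 95% chance of a single hit, so for a fixed budget the platform with the lowest cost per read detects the rarest targets.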
Nanopore technology can also sequence RNA directly, offering a means of monitoring RNA viruses without reverse transcription 109, 110 . Recent advances enabling adaptive nanopore sequencing to better detect low-abundance species in metagenomic samples may allow it to approach the sensitivity of short-read sequencing 111 . For all methods, samples from different sites can be shipped to a central laboratory, barcoded, and subjected to pooled sequencing 112 . Nanopore sequencing, which offers portable sequencing in remote environments, may be required in areas where sample stability and logistics do not permit shipment. Sequencing costs for both short- and long-read sequencing have dropped considerably faster than Moore's Law (Fig. 3) , a trend that looks set to continue given a variety of early-stage alternatives to current practice 113 . Therefore, the efficacy of a NAO can reasonably be expected to grow with time. Once samples are collected and sequenced, the resulting data must be bioinformatically analyzed and interpreted. Distinct strategies are needed to monitor known versus unknown threats. Known threats (agents or genetic elements with known sequences) can be detected by mapping sequencing reads (or their protein translations) to databases of reference sequences [115] [116] [117] or by using classification, prediction, or screening methods derived from such databases 116, 118 . Such databases and methods have been used to detect federally designated select agents, human pathogens, genetically engineered elements, and bacterial genes associated with toxin production and antimicrobial resistance 30, 115, 116, [119] [120] [121] . Once a new threat is discovered and sequenced anywhere in the world, as SARS-CoV-2 was first sequenced in China in January 2020 122 , its presence can be monitored at all NAO testing sites.
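Screening for known threats amounts to asking which reads match a reference database. Production systems use read aligners or dedicated classifiers such as those cited above; as a minimal stand-in, the sketch below indexes the canonical (strand-independent) k-mers of a few reference sequences and assigns each read to the reference sharing the most k-mers with it. The reference names, `k`, and the `min_fraction` threshold are hypothetical choices for illustration, not parameters from this paper.

```python
from collections import Counter

# Translation table for DNA complementation; other characters (e.g. N) pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Strand-independent key: the smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int):
    """Yield the canonical form of every k-mer in `seq`."""
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each canonical k-mer to the names of the references containing it."""
    index: dict[str, set[str]] = {}
    for name, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def screen_reads(reads, index, k: int, min_fraction: float = 0.5) -> Counter:
    """Count reads assigned to each reference. A read is assigned to the
    reference sharing the most k-mers with it, provided those shared k-mers
    cover at least `min_fraction` of the read's k-mers."""
    hits: Counter = Counter()
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k
        votes: Counter = Counter()
        for km in read_kmers:
            for name in index.get(km, ()):
                votes[name] += 1
        if votes:
            name, count = votes.most_common(1)[0]
            if count / len(read_kmers) >= min_fraction:
                hits[name] += 1
    return hits
```

Because k-mers are canonicalized, a read from either strand of a threat is counted. The per-reference read counts are what would then be normalized to track abundance over time.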
The number of reads mapped to a threat can also be used to track its abundance quantitatively through time, with comparisons to stable, common organisms such as pepper mild mottle virus used to convert counts into calibrated abundances across samples 123 . Comprehensive taxonomic profiling of specific groups or even all known species 117, 118, 124 can also be used to monitor for unusual deviations from a testing site's typical profile. Finally, mutational variants in a specific organism of interest can be monitored by aligning reads to its reference sequence, as recently done to track SARS-CoV-2 variants in wastewater 18, 19 . However, such reference-based approaches are ill-suited to detecting a truly novel threat whose sequence is not known a priori. For these unknown threats, we suggest a reference-free strategy that looks for signatures of arbitrary sequences that have begun to increase exponentially in frequency at a given location. One signature currently used by reference-free methods for studying variation in human genomic data [125] [126] [127] and bacterial metagenomic data 128, 129 is the k-mer, a sequence k base pairs long. K-mers of roughly 30 to 40 base pairs are typically used because they are highly specific to their source sequence while remaining short enough to derive directly from short sequencing reads with little chance of containing a sequencing error. An adaptation of these methods for detecting sequences of increasing frequency at a given testing location might count the occurrences of each k-mer in each new sample, then perform a statistical test to determine whether that k-mer has recently begun to increase exponentially in abundance relative to housekeeping reference genes. Increasing k-mers that overlap can be assembled into longer sequences 130, 131 .
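The reference-free idea can be sketched in a few dozen lines. Assuming, for illustration, that each sample comes with a per-sample normalizer (for instance, the read count of a stable reference organism such as pepper mild mottle virus), exponential growth shows up as a positive slope of log normalized abundance against time. The sketch below counts k-mers per sample, fits that slope by ordinary least squares, and flags k-mers whose slope and t statistic exceed thresholds. The regression test, pseudocount, and thresholds are hypothetical stand-ins for whatever statistical test a real system would adopt, and k-mers are not strand-canonicalized here, for brevity.

```python
import math
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across one sample's reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def slope_and_t(xs, ys):
    """Ordinary least-squares slope of ys on xs, and the slope's t statistic."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)  # requires at least two distinct days
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if n <= 2:
        return slope, float("nan")  # no residual degrees of freedom
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t


def flag_growing_kmers(samples, k, min_slope=0.1, min_t=3.0, pseudocount=0.5):
    """Flag k-mers whose normalized abundance grows exponentially over time.

    samples: list of (day, reads, normalizer) tuples. Returns a dict mapping
    each flagged k-mer to its estimated growth rate (natural log units per day).
    """
    days = [day for day, _, _ in samples]
    per_sample = [count_kmers(reads, k) for _, reads, _ in samples]
    flagged = {}
    for km in set().union(*per_sample):
        # Pseudocount keeps log() defined for samples where the k-mer is absent.
        ys = [math.log((counts[km] + pseudocount) / norm)
              for counts, (_, _, norm) in zip(per_sample, samples)]
        slope, t = slope_and_t(days, ys)
        if slope >= min_slope and t >= min_t:
            flagged[km] = slope
    return flagged
```

All k-mers from an exponentially growing agent rise together, so the flagged set would then be passed to assembly and database matching as a group.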
K-mers or assembled sequences may be matched to their source organisms by mapping to reference databases, or used to design primers for targeted amplification of the surrounding sequence 132, 133 . Variations on this basic approach should be explored, as each may be more or less sensitive to distinct threats. Additional bioinformatic analyses that may prove useful for detecting emerging threats include metagenome assembly methods 117 to generate reference genomes for the "microbial dark matter" absent from existing databases, construction of a "pan-genome" of all sequences 134 found under normal circumstances at a given testing site, and cloud-based pipelines for pathogen detection and taxonomic identification 135 . The NAO will generate a wealth of environmental sequencing data that will vastly exceed what is currently available. As described above, these data hold great promise for transforming ecology, microbiology, and conservation biology, but their sheer volume poses technical and operational challenges for making them usable. Ongoing declines in the cost of hard drives and other storage technologies mean that keeping the data in "cold storage" adds only a marginal cost to the basal cost of sequencing ( Supplementary Tables 1-4) . The primary challenge will be making the data from thousands of testing sites available to the scientific community. For this purpose, we propose storing a small, strategically chosen subset of the data on a cloud computing platform, from which users can download the data or operate on it directly by purchasing computing resources in the cloud environment 136 . One possible subsetting strategy is to store all of the most recent week's data but only weekly or monthly snapshots for earlier periods.
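The subsetting strategy above is essentially a tiered retention policy. A minimal sketch, assuming one sequencing run per site per day and treating "the earliest sample in each ISO week" and "the earliest sample in each calendar month" as the snapshots, could look like the following. The function name and the one-year boundary between weekly and monthly snapshots are hypothetical choices; the text leaves that boundary open.

```python
from datetime import date


def select_hot_storage(sample_dates, today, recent_days=7, weekly_horizon_days=365):
    """Choose which sample dates stay on the cloud ("hot") tier.

    Keeps every sample younger than `recent_days`, the earliest sample of
    each ISO week up to `weekly_horizon_days` old, and the earliest sample
    of each calendar month beyond that. Everything else would remain only
    in cold storage. Returns the kept dates in ascending order.
    """
    keep = []
    seen_weeks, seen_months = set(), set()
    for d in sorted(set(sample_dates)):  # ascending, so the first hit per bucket is earliest
        age = (today - d).days
        if age < recent_days:
            keep.append(d)
        elif age < weekly_horizon_days:
            week = tuple(d.isocalendar())[:2]  # (ISO year, ISO week number)
            if week not in seen_weeks:
                seen_weeks.add(week)
                keep.append(d)
        else:
            month = (d.year, d.month)
            if month not in seen_months:
                seen_months.add(month)
                keep.append(d)
    return keep
```

For daily samples, this keeps on the order of 60 snapshots per site per year of history instead of 365, while the full record stays in cold storage for anyone who needs it.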
The methods used by various "collectors" (see below) to generate the data may vary in space and time, making it essential to keep detailed metadata so that researchers can account for this methodological variation in their analyses 136 . Nations could adopt a variety of approaches to implementing their own Nucleic Acid Observatory, from government-run to entirely outsourced. However, any successful operation will need to incentivize the reliable detection of rare nucleic acid sequences. In security, quality control is assured by consistently challenging the defenses. In the Observatory context, adequate sensitivity might be assured by employing "red teams" to simulate attacks by introducing foreign nucleic acids at various levels without informing those in charge of sample collection and sequencing. Specifically, each monitoring site will necessarily be frequented by collectors tasked with acquiring and sequencing samples. These are members of the "blue team", whose job is to defend the area by swiftly detecting any invasive sequences. To ensure that detection is sufficiently sensitive and reliable, "red team" inspectors frequently challenge the defenses by releasing known amounts of foreign DNA or RNA into the environment within each watershed or sewer system. Collectors' incentives should be tied to their success or failure in detecting these foreign sequences, while inspectors should be incentivized to accurately quantify sensitivity and identify flaws. Ideally, each site will be monitored by at least two competing organizations of sample collectors (whether private, public, or military) and be challenged by at least two competing organizations of inspectors, with incentives to discourage collusion. The resulting competition would encourage innovation seeking greater cost-effectiveness in sample acquisition, processing, and sequencing while enhancing sensitivity.
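Red-team challenges yield data that directly measure sensitivity. The sketch below assumes a hypothetical record format of `(spike_copies, detected)` pairs, one per challenge, and reports the detection rate at each spike level plus an empirical limit of detection: the lowest level from which every higher level was detected at least at a target rate (95% by default). This is a common style of summary but not a method specified in this paper, and with only a handful of trials per level the estimate is correspondingly uncertain.

```python
from collections import defaultdict


def detection_rates(trials):
    """trials: iterable of (spike_copies, detected) pairs from red-team challenges.
    Returns {spike level: fraction of challenges detected}, sorted by level."""
    tally = defaultdict(lambda: [0, 0])  # level -> [detected, total]
    for level, detected in trials:
        tally[level][0] += int(detected)
        tally[level][1] += 1
    return {level: hits / total for level, (hits, total) in sorted(tally.items())}


def empirical_lod(trials, target_rate=0.95):
    """Lowest spike level from which every higher level is also detected at
    >= target_rate; None if even the highest level falls short."""
    rates = detection_rates(trials)
    lod = None
    for level in sorted(rates, reverse=True):
        if rates[level] >= target_rate:
            lod = level
        else:
            break
    return lod
```

Tying collector incentives to this number, tracked per site over time, would reward exactly the property the Observatory needs: reliably catching rare foreign sequences.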
Such competition could begin during the establishment phase of a NAO, with different groups exploring various collection methods, sampling frequencies, and intensities to determine which approaches offer the most sensitive and comprehensive monitoring. The cost of a Nucleic Acid Observatory will depend on the desired sensitivity for different nucleic acids and the benefits of scaling, but our back-of-the-envelope calculations estimate a total annual cost of $700 million for a pilot system monitoring wastewater at all 328 U.S. Ports of Entry and waterways in all 378 major USGS-designated water basins, in addition to one-time system setup expenses. For a complete system that would additionally monitor all major U.S. towns and cities, most international airports, and either the 378 water basins or all 2278 designated watersheds, we estimate a total cost of $5-15 billion annually ( Supplementary Tables 1-4) . Given the annual damages inflicted by invasive pests and especially by COVID-19, this looks like a remarkable bargain, one that would additionally boost employment throughout the nation (Supplementary Information Executive Summary). Other nations could enjoy similar benefits. Sample acquisition, processing, and sequencing represent the bulk of the estimated costs, with data storage and analysis being comparatively inexpensive. These costs could potentially be reduced by adapting improvements from clinical sequencing, such as superior sample barcoding 112 , and by optimizing filter use, sample storage, and amplification techniques. Most importantly, overall costs are expected to keep declining, faster than Moore's Law, along with the price of sequencing (Fig. 3) . As the entire global DNA sequencing market was estimated at only $8-12 billion in 2020-21, investing these sums should achieve substantial bulk efficiencies and discounts. Accelerating the reduction of sequencing costs will also offer synergistic benefits for clinical sequencing in healthcare applications.
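The headline figures above support two quick sanity checks. The sketch below uses only numbers quoted in the text; the even split of the pilot budget across sites is a simplifying assumption for illustration (the supplementary tables break costs down differently), and the constant and function names are hypothetical.

```python
# Figures quoted in the text.
PILOT_ANNUAL_COST_USD = 700e6
PORTS_OF_ENTRY = 328
WATER_BASINS = 378
FULL_SYSTEM_COST_USD = (5e9, 15e9)   # annual, low and high estimates
SEQUENCING_MARKET_USD = (8e9, 12e9)  # global DNA sequencing market, 2020-21


def pilot_cost_per_site(total=PILOT_ANNUAL_COST_USD,
                        n_sites=PORTS_OF_ENTRY + WATER_BASINS):
    """Average annual cost per monitored site, assuming (hypothetically) an even split."""
    return total / n_sites


def full_system_vs_market():
    """Range of the full-system budget as a multiple of the 2020-21 sequencing market."""
    low = FULL_SYSTEM_COST_USD[0] / SEQUENCING_MARKET_USD[1]
    high = FULL_SYSTEM_COST_USD[1] / SEQUENCING_MARKET_USD[0]
    return low, high


if __name__ == "__main__":
    print(f"pilot: ~${pilot_cost_per_site() / 1e6:.2f}M per site per year")
    low, high = full_system_vs_market()
    print(f"full system: {low:.2f}x to {high:.2f}x the global sequencing market")
```

On these numbers the pilot averages about $1 million per site per year, and the full system would represent roughly 0.4 to 1.9 times the entire 2020-21 global sequencing market, which is why bulk efficiencies and discounts are plausible at that scale.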
Sufficiently widespread sequencing of patient samples could plausibly substitute for wastewater sequencing in detecting human pathogens. Wastewater sequencing is somewhat more cost-effective than sequencing many clinical samples, but its primary advantage for detecting human pathogens is that it preserves privacy. Whereas sequencing a clinical sample necessarily acquires data allowing genomic identification of the patient, the mixing of many samples in sewage precludes the identification of specific human genomes and the potential disclosure of any health, relationship, or location information. Laws and regulations requiring consent forms tightly constrain what can be done with information obtained from clinical samples. In contrast, the anonymity of environmentally collected data would allow a NAO focused on wastewater and waterways to begin monitoring for any and all harmful invasives, not just those directly attacking humans, sooner rather than later. The NAO would tell us when to act when confronted with an emerging pandemic or other exponentially spreading biothreat. It would create a genome repository for virtually all life on Earth, give us contemporary snapshots of species health, and, eventually, provide a historical record of ecological changes. The time to build it is now.
Timing the SARS-CoV-2 index case in Hubei province
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
SARS: how a global epidemic was stopped
Ebola in West Africa - CDC's Role in Epidemic Detection, Control, and Prevention
CDC's Response to the 2014-2016 Ebola Epidemic - Guinea
Comparisons between countries are essential for the control of COVID-19
Invasive Species: Risk Assessment and Management
Update on the environmental and economic costs associated with alien-invasive species in the United States
High and rising economic costs of biological invasions worldwide
Global threat to agriculture from invasive species
Which factors affect the success or failure of eradication campaigns against alien species?
When are eradication campaigns successful? A test of common assumptions
Determinants of successful arthropod eradication programs
Eradication of Invading Insect Populations: From Concepts to Applications
Epidemiology of the silent polio outbreak in Rahat, Israel, based on modeling of environmental surveillance data
Federal agency biodefense funding, FY2013-FY2014
Senate Appropriations Committee. United States Department of Defense Appropriations
Anthrax: a continuing concern in the era of bioterrorism
Emerging Pandemic Diseases: How We Got to COVID-19
Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans
Emerging infectious disease: An underappreciated area of strategic concern for food security
African Swine Fever: Fast and Furious or Slow and Steady?
Evaluating Losses Associated with African Swine Fever in the People's Republic of China and Neighboring Countries
Concerning RNA-guided gene drives for the alteration of wild populations
Safeguarding CRISPR-Cas9 gene drives in yeast
Characterization of the reconstructed 1918 Spanish influenza pandemic virus
The Global Virome Project
SpillOver: A new tool for ranking the risk of viral spillover to humans using big data
Airborne transmission of influenza A/H5N1 virus between ferrets
Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England
Innate immune evasion strategies of DNA and RNA viruses
Inoculating science against potential pandemics and information hazards
Information Hazards in Biotechnology
How accurately can we assess zoonotic risk?
Massively multiplexed nucleic acid detection with Cas13
National Research Council (US) Committee on Effectiveness of National Biosurveillance Systems: Biowatch & the Public Health System
Towards the Optimization of eDNA/eRNA Sampling Technologies for Marine Biosecurity Surveillance
Globally threatened vertebrates on islands with invasive species
Invasive mammal eradication on islands results in substantial conservation gains
The threat of invasive species to IUCN-listed critically endangered species: A systematic review
Eradication of invasive forest insects: concepts, methods, costs and benefits
Economic impacts of invasive species in forest past, present, and future
Impact of citrus greening on citrus operations in Florida
Fishing for mammals: Landscape-level monitoring of terrestrial and semi-aquatic communities using eDNA from riverine systems
The detection of great crested newts year round via environmental DNA analysis
Environmental DNA detection of endangered and invasive species in Kejimkujik National Park and Historic Site
The Promise and Pitfalls of Environmental DNA and RNA Approaches for the Monitoring of Human and Animal Pathogens from Aquatic Sources
Beyond Biodiversity: Can Environmental DNA (eDNA) Cut It as a Population Genetics Tool?
Environmental DNA reveals tropical shark diversity in contrasting levels of anthropogenic impact
Detecting Southern California's White Sharks With Environmental DNA
Calibrating Environmental DNA Metabarcoding to Conventional Surveys for Measuring Fish Species Richness
The utility of a metagenomics approach for marine biomonitoring. Cold Spring Harbor Laboratory
Inbreeding and extinction in a butterfly metapopulation
Uncovering cryptic genetic variation
Cryptic genetic variation promotes rapid evolutionary adaptation in an RNA enzyme
Conservation of genetic uniqueness of populations may increase extinction likelihood of endangered species: the case of Australian mammals
Call for a paradigm shift in the genetic management of fragmented populations
Metagenomic analysis of DNA viruses in a wastewater treatment plant in tropical climate
Metagenomics-based analysis of viral communities in dairy lagoon wastewater
High variety of known and new RNA and DNA viruses of diverse origins in untreated sewage
A call for an international network of genomic observatories (GOs)
A groundbreaking observatory to monitor the environment
The Impacts of the Coronavirus on the Economy of the United States
The COVID-19 Pandemic and the $16 Trillion Virus
An autonomous vehicle coupled with a robotic laboratory proves its worth
Water Quality Monitoring and Sampling
Microbiology Society. Ocean Robots Uncover Microbial Secrets
Passive eDNA collection enhances aquatic biodiversity analysis
Sponges as natural environmental DNA samplers
The effect of bivalve filtration on eDNA-based detection of aquatic organisms
Riverine distribution of mussel environmental DNA reflects a balance among density, transport, and removal processes
In situ Autonomous Acquisition and Preservation of Marine Environmental DNA Using an Autonomous Underwater Vehicle
Estuarine molecular bycatch as a landscape-wide biomonitoring tool
Environmental DNA: For Biodiversity Research and Monitoring
Choice of capture and extraction methods affect detection of freshwater biodiversity from environmental DNA
Consistent and correctable bias in metagenomic sequencing experiments
High Throughput Sequencing for the Detection and Characterization of RNA Viruses
Multi-Platform Assessment of DNA Sequencing Performance using Human and Bacterial Reference Genomes in the ABRF Next-Generation Sequencing Study
A widespread pathway for substitution of adenine by diaminopurine in phage genomes
Noncanonical DNA polymerization by aminoadenine-based siphoviruses
A third purine biosynthetic pathway encoded by aminoadenine-based viral DNA genomes
Whole-Genome Sequencing of Human Enteroviruses from Clinical Samples by Nanopore Direct RNA Sequencing
Rapid Sequencing of Multiple RNA Viruses in Their Native Form
Nanopore adaptive sampling: a tool for enrichment of low abundance species in metagenomic samples
One-Seq: A Highly Scalable Sequencing-Based Diagnostic for SARS-CoV-2 and Other Single-Stranded Viruses
The Cost of Sequencing a Human Genome
Clinical Metagenomic Next-Generation Sequencing for Pathogen Detection
SeqScreen: a biocuration platform for robust taxonomic and biological process characterization of nucleic acid sequences of interest
A review of methods and databases for metagenomic classification and assembly
Mash Screen: high-throughput sequence containment estimation for genome discovery
Assessing the Need for and Uses of Sequences of Interest Databases: A Report on the Proceedings of a Two-day Workshop FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science PlasmidHawk improves lab of origin prediction of engineered plasmids using sequence alignment Novel 2019 coronavirus genome SARS-CoV-2 Titers in Wastewater Are Higher than Expected from Clinically Confirmed Cases Centrifuge: rapid and sensitive classification of metagenomic sequences novoBreak: local assembly for breakpoint detection in cancer genomes Association mapping from sequencing reads using k-mers Kevlar: A Mapping-Free Framework for Accurate Discovery of De Novo Variants Identifying Group-Specific Sequences for Microbial Communities Using Long k-mer Sequence Signatures KmerGO: A Tool to Identify Group-Specific Sequences With k-mers CAP3: A DNA sequence assembly program ABySS: a parallel assembler for short read sequence data Targeted MinION sequencing of transgenes DNA sonication inverse PCR for genome scale analysis of uncharacterized flanking sequences Faster pan-genome construction for efficient differentiation of naturally occurring and engineered plasmids with plaster. in (Schloss Dagstuhl -Leibniz-Zentrum fuer Informatik GmbH IDseq-An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring Cloud computing for genomic data analysis and collaboration The Nucleic Acid Observatory Consortium includes many contributing authors who will be listed in the peer-reviewed publication. The following contributors were tasked with preprint preparation and coordination. They are listed here for administrative and procedural purposes only: Sequencing machine cost $600,000 Will shrink with future technologies Sequencing run cost $1,500Reagents are highly profitable for manufacturers. 
Current cost for NovaSeq at a sequencing core is ~$4k; this number assumes a large scaling discount and normal yearly cost reduction by the time of implementation.Reads per machine run 2.00E+09 Parameter of sequencer. Side benefits for agriculture, science, and the environment:• Invasive pests costing billions per year can be detected early enough for eradication • Deep sequencing will unearth new molecules for the biotech industry • Sequencing waterways can monitor the abundance of all species Sensitivity should more than double every two years:• DNA sequencing costs have fallen a billion-fold over twenty years • The rate of improvement recently slowed to 'only' the level of Moore's Law • At current rates, the sensitivity of the system will double each year • NAO would double the size of the sequencing market, catalyzing further acceleration Defends against pandemics and biological weapons engineered to be subtle:• All living things can be detected with sufficiently deep sequencing • Searching for sequences that swiftly become more common can detect all threats
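The sequencing cost parameters listed above can be combined into a per-read cost. The following is a back-of-the-envelope sketch, not the consortium's cost model. It assumes straight-line amortization of the sequencer over a hypothetical lifetime of 500 runs (`LIFETIME_RUNS` is our assumption, not a figure from the source), and the function name `cost_per_million_reads` is illustrative only.

```python
# Back-of-the-envelope sequencing cost per read for a single sequencer.
# Machine cost, run cost, and reads per run come from the parameter list above;
# LIFETIME_RUNS is a hypothetical assumption, not a figure from the source.
MACHINE_COST_USD = 600_000
RUN_COST_USD = 1_500
READS_PER_RUN = 2.0e9
LIFETIME_RUNS = 500  # hypothetical number of runs before the machine is retired


def cost_per_million_reads(machine_cost, run_cost, reads_per_run, lifetime_runs):
    """Per-run reagent cost plus straight-line amortized machine cost, per million reads."""
    cost_per_run = run_cost + machine_cost / lifetime_runs
    millions_of_reads_per_run = reads_per_run / 1e6
    return cost_per_run / millions_of_reads_per_run


if __name__ == "__main__":
    # Reagents alone, ignoring the purchase price of the machine.
    reagents_only = RUN_COST_USD / (READS_PER_RUN / 1e6)
    with_machine = cost_per_million_reads(
        MACHINE_COST_USD, RUN_COST_USD, READS_PER_RUN, LIFETIME_RUNS
    )
    print(f"reagents only: ${reagents_only:.2f} per million reads")
    print(f"with amortized machine: ${with_machine:.2f} per million reads")
```

With these inputs, reagents alone come to $0.75 per million reads. Amortizing the machine over 500 runs adds $1,200 per run and raises the figure to $1.35 per million reads. The machine's share falls as more runs are performed on it.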
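The idea of searching for sequences that swiftly become more common can be illustrated with a toy k-mer screen. This is one possible approach under stated assumptions, not the NAO's detection pipeline. The sketch counts k-mers in reads from successive samples at a single site and converts those counts to smoothed relative frequencies. The pseudocount is our assumption, added to avoid taking the log of zero. It then fits a least-squares slope to log frequency over time and flags k-mers whose growth rate exceeds a chosen threshold. All function names, the toy reads, and the threshold are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-mer in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def growth_rate(freqs, times):
    """Least-squares slope of log(frequency) against time, in log units per time unit."""
    logs = [math.log(f) for f in freqs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def flag_growing_kmers(samples, times, k, min_rate, pseudocount=1.0):
    """Return {k-mer: growth rate} for k-mers whose relative frequency grows at >= min_rate.

    samples: one list of reads per sampling time, in the same order as times.
    pseudocount: hypothetical smoothing so absent k-mers get a nonzero frequency.
    """
    counts = [kmer_counts(reads, k) for reads in samples]
    totals = [sum(c.values()) for c in counts]
    flagged = {}
    for kmer in set().union(*counts):
        # Counter returns 0 for k-mers missing from a sample.
        freqs = [(c[kmer] + pseudocount) / (total + pseudocount)
                 for c, total in zip(counts, totals)]
        rate = growth_rate(freqs, times)
        if rate >= min_rate:
            flagged[kmer] = rate
    return flagged


if __name__ == "__main__":
    # Toy data: a constant background plus a "threat" read that doubles every week.
    background = ["CGTACGTACG"] * 50
    samples = [background + ["AAAAAAAA"] * 2 ** week for week in range(4)]
    days = [0, 7, 14, 21]
    # Flag anything whose relative abundance doubles faster than every 14 days.
    print(flag_growing_kmers(samples, days, k=4, min_rate=math.log(2) / 14))
```

Expressing the threshold as ln(2) divided by a doubling time makes it easy to interpret. For example, ln(2)/7 ≈ 0.099 per day would flag any k-mer whose relative abundance doubles faster than once a week. In the toy data, only the k-mer from the doubling read is flagged. The background k-mers are diluted over time, so their fitted slopes are negative.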