key: cord-0426709-wsc93g8d
authors: Kwok, Kirsty T. T.; de Rooij, Myrna M. T.; Messink, Aniek B.; Wouters, Inge M.; Smit, Lidwien A. M.; Heederik, Dick J.J.; Koopmans, Marion P. G.; Phan, My V. T.
title: Establishing farm dust as a useful viral metagenomic surveillance matrix
date: 2022-04-21
journal: bioRxiv
DOI: 10.1101/2021.03.09.434704
sha: 4f3c68a4bdfad22db49cede550fee7fd0d2a1a76
doc_id: 426709
cord_uid: wsc93g8d

Farm animals may harbor many viral pathogens, some with zoonotic potential, which can cause severe clinical outcomes in animals and humans. Documenting the viral content of dust may provide information on potential sources and movement of viruses. Here, we describe a dust sequencing strategy that provides detailed viral sequence characterization from farm dust samples, and we use this method to document the virus communities in chicken farm dust samples and paired feces collected at multiple time points during the production cycle in broiler farms in the Netherlands. From the sequencing data, Parvoviridae and Picornaviridae were the most prevalent families, detected in 85-100% of all feces and dust samples. Surprisingly large genomic diversity was identified in Picornaviridae, and viruses from the Caliciviridae and Astroviridae were also obtained. This study provides a unique characterization of virus communities in farmed chickens and paired farm dust samples. Our sequencing methodology enabled the recovery of viral genome sequences from farm dust, providing important tracking details for virus movement between livestock animals and their farm environment. This study serves as a proof of concept supporting the idea that dust sampling could potentially be incorporated into viral metagenomic surveillance.

Many emerging infectious diseases are of zoonotic origin 1.
Approximately 70% of zoonoses are proposed to originate from wildlife and outbreaks of MERS-CoV 2, Nipah virus 3 and SARS-CoV-2 in minks

higher diversity within this Sicinivirus clade (Figure 3). We further reconstructed an ML tree for all identified Sicinivirus polyprotein nucleotide sequences (N=15; farm dust samples: 4; chicken feces: 11) in this study and compared them with global reference sequences (Figure 4). Two sequences from chicken feces (V_M_005_picorna_5 from farm F_01 and V_M_034_picorna_3 from farm F_03) were most distinct from the rest of the identified sequences (sharing only 69.4-72.3% nt identity) and were most closely related to strain JSY previously identified in China (78.9-79.0% nt identity). Other Sicinivirus sequences from the same farm often formed a monophyletic lineage, suggesting that within-farm Sicinivirus sequences are genetically highly similar. Viruses identified from chicken feces and farm dust samples consistently clustered with each other, forming their own sub-clusters within major clades in the ML trees. This indicates high similarity between sequences from feces and dust, which are more closely related to each other than to global reference sequences.

Sixteen astrovirus sequences with ≥ 80% genome coverage were identified in this study, among which only 1 astrovirus sequence was found in farm dust (VE_7_astro_14; farm F_02). There was only 1 astrovirus genomic sequence from farm F_01. RNA-dependent RNA polymerase (RdRp) and capsid regions of these sequences were extracted for phylogenetic analyses (Figure 5A and 5B, respectively). ML trees of both regions showed that these 16 astrovirus sequences belonged to two distinct lineages, A1 and A2 (Figure 5). Sequences from farms F_02 and F_03 were found in both lineages A1 and A2, suggesting co-circulation of different astrovirus strains in both farms.
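As a point of reference for the nucleotide identity percentages quoted above, the following minimal Python sketch computes pairwise identity between two already-aligned sequences. It is an illustration only: the text does not state which tool or convention produced these values, so the function name `percent_identity`, the choice to ignore columns in which either sequence has a gap, and the toy sequences are all assumptions.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns in which either sequence has a gap are ignored,
    which is only one of several conventions in use.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical, not taken from the study)
aln_1 = "ATGGCGTTA-CC"
aln_2 = "ATGACGTTAGCC"
print(f"{percent_identity(aln_1, aln_2):.1f}% nt identity")
```

Other conventions, such as counting gapped columns as mismatches, give lower values for the same alignment, so identities are only comparable when computed the same way.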
The phylogenetic comparison of large contig sequences presented above supports the findings about the similarities between sequences from dust samples and possible farm animal sources. However, additional information can be obtained with global analyses that compare all available sequence data from the dust material. A method that allows comparison of the large number of unclassified sequences commonly observed in NGS data would provide additional information. We therefore used the k-mer comparison tool MASH 43,44, which prepares a hash description of all k-mers of a given length and then allows rapid quantitative comparison of the similarity of the k-mer sets generated from different sequenced samples. We used the Mash distance function to calculate a Jaccard distance value for all pairs of dust sequence samples.

We compared all assembled viral contigs of the dust samples.

inter-farm samples. It is also apparent that farm F_01 and farm F_02 were slightly more related to each other than either was to farm F_03. Similar patterns were obtained using all quality-controlled short read data from each dust sample (Figure 7

In this study, we described a random-primed viral metagenomic deep sequencing strategy to recover viral sequences from dust samples. We were able to obtain long viral sequences providing ≥ 80% viral genome coverage from farm dust samples. Farm dust is known to be associated with adverse respiratory effects observed in farm workers with prolonged exposure 40; however, whether viruses play a role is thus far unknown due to a knowledge gap in virus detection and virus diversity in farm dust. Characterizing farm dust viromes could aid in investigating possible health effects of occupational and environmental exposure to virus-containing farm dust. The method we have developed could provide a platform for future surveillance in farm animals and environmental samples.
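The k-mer comparison underlying the Jaccard analysis can be illustrated with a minimal Python sketch of a bottom-s MinHash estimate of the Jaccard index, followed by the Mash distance derived from it. This is not the Mash implementation: hashing here uses SHA-1 from `hashlib` instead of the hash function Mash uses, the function names are hypothetical, and the toy sequences and small `k` and `s` values are assumptions chosen for readability.

```python
import hashlib
import math


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    rc = "".join(comp[b] for b in reversed(kmer))
    return min(kmer, rc)


def sketch(sequences, k: int, s: int) -> list:
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes."""
    hashes = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with ambiguous bases
                digest = hashlib.sha1(canonical(kmer).encode()).digest()
                hashes.add(int.from_bytes(digest[:8], "big"))
    return sorted(hashes)[:s]


def jaccard_estimate(sk_a, sk_b, s: int) -> float:
    """Estimate the Jaccard index from two bottom-s sketches."""
    merged = sorted(set(sk_a) | set(sk_b))[:s]  # sketch of the union
    if not merged:
        return 0.0
    a, b = set(sk_a), set(sk_b)
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); defined as 1 when j = 0."""
    if j == 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Toy example (hypothetical sequences; the study used -k 32 and -s 10000)
k, s = 8, 1000
dust_a = ["ATGGCGTTAGCCTAGGATCCGATTACGATCGATCGGCTA"]
dust_b = ["ATGGCGTTAGCCTAGGATCCGATTACGTTCGATCGGCTA"]
j = jaccard_estimate(sketch(dust_a, k, s), sketch(dust_b, k, s), s)
print(f"Jaccard ~ {j:.3f}, Mash distance ~ {mash_distance(j, k):.4f}")
```

Because only the `s` smallest hashes are kept per sample, pairwise comparison cost depends on the sketch size and not on the number of reads, which is what makes an all-against-all comparison of samples practical.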
Our phylogenetic analyses indicated that, for all three farms, viruses identified from farm dust samples were genetically closer to viruses identified from chicken feces collected from the same farm than to those from the other two farms in the study. Furthermore, the Jaccard similarity analysis showed that, for both known and unknown sequences, the dust sequences were closer to other samples from the same farm, indicating some source specificity in the sampling method. The combined data support the idea that farm dust could potentially be a good proxy for the animals in that farm. This observation is in agreement with a previous study that reported a correlation between the bacterial antimicrobial resistomes of animal feces and farm dust 46

Northern part: Friesland/Groningen region; Eastern part: Gelderland region). Detailed sampling strategy and sample metadata are described in Table 3. Each pooled poultry fecal sample contains fresh fecal material from 3-4 chicks. Farm dust samples were collected using a passive air sampling approach with electrostatic dustfall collectors (EDCs) 54,55. Electrostatic cloths were sterilized through incubation at 200°C for 4 hours. Sterilized electrostatic cloths were then fixed to a pre-cleaned plastic frame. EDCs were exposed for 7 days at 1 meter above the floor with the electrostatic cloths facing up in broiler farms to enable sampling of settling airborne dust instead of resuspended dust from the floor. EDCs were contained in a sterile plastic bag before and after sampling. All samples were transported under cold chain management and stored at -20°C/-80°C before processing.

Chicken fecal samples were processed as previously described 56.
Briefly, chicken fecal suspension was prepared in phosphate-buffered saline (PBS) and treated with TURBO DNase (Thermo Fisher, USA) at 37°C for 30 minutes, and then subjected to total nucleic acid extraction using the QIAamp viral RNA mini kit (Qiagen, Germany) according to the manufacturer's instructions without addition of carrier RNA. For dust samples, electrostatic cloths were incubated in 3% beef extract buffer for 1 hour with rolling as previously published 57. After incubation, the suspension was collected and centrifuged at 4,000 g at 4°C for 4 minutes to pellet any large particles or debris. Total viruses in the dust suspension were concentrated using polyethylene glycol (PEG), similar to virus concentration in sewage published previously 58,59. Briefly, PEG 6000 (Sigma-Aldrich, USA) was added to each dust suspension to a final concentration of 10%, followed by pH adjustment to pH 4 and overnight incubation at 4°C with shaking. After overnight incubation, the sample was centrifuged at 13,500 g at 4°C for 90 minutes. The supernatant was removed and the remaining pellet was resuspended in 500 µL of pre-warmed glycine buffer, and then subjected to 5-minute centrifugation at 13,000 g at 4°C. The supernatant was collected, and supernatants from EDC samples that were collected in the same farm at the same time point (N=1-4) were pooled together for further processing. Viral-enriched dust suspension samples were then treated with TURBO DNase as previously described 56 to remove non-encapsulated DNA, followed by total nucleic acid extraction using the MagMAX™ viral RNA isolation kit (Thermo Fisher, USA) according to the manufacturer's instructions but without the use of carrier RNA.
Reverse transcription and second strand cDNA synthesis of chicken fecal samples and dust samples were performed as previously described 56

Raw reads were subjected to adapter removal using Trim Galore/default Illumina software, followed by quality trimming using QUASR 61 with a threshold of minimum length of 125 nt and median Phred score ≥ 30. The resulting quality-controlled reads were de novo assembled using metaSPAdes v3.12.0 62. De novo assembled contigs were classified using UBLAST 63 against eukaryotic virus family protein databases as previously described 23,26. When interpreting our contig classification results, we set a detection threshold of minimum amino acid identity of 70%, minimum contig length of 300 nt and an e-value threshold of 1 × 10^-10. Contig classification results were analyzed and visualized using R packages including dplyr, reshape2 and ComplexHeatmap 64-66.

To compare Jaccard distances between dust samples, all quality-controlled short reads, or the resulting assembled viral contigs, were analyzed using the Triangle function from MASH v2.3 43,44 with a k-mer size of 32 (-k 32) and a sketch size of 10,000 (-s 10000). The resulting distances were visualized in a heatmap using R package ggplot2 71.

The raw reads are available in the SRA under the BioProject accession numbers PRJNA670873 (chicken feces) and PRJNA701384 (farm dust samples) (Table S1). All sequences in the phylogenetic analysis have been deposited in GenBank under the accession numbers MW684778 to MW684847 (Table S2).

Figure 5 (panels A and B; scale bar: 0.5 aa substitutions per site)

Global trends in emerging infectious diseases
Evidence for camel-to-human transmission of MERS coronavirus
Nipah virus encephalitis reemergence
Transmission of SARS-CoV-2 on mink farms between humans and mink and back to humans
Zoonotic risks from small ruminants
Human infections with the emerging avian influenza A H7N9 virus from wet market poultry: Clinical analysis and characterisation of viral genome
Transmission of H7N7 avian influenza A virus to human beings during a large outbreak in commercial poultry farms in the Netherlands
Associations between pneumonia and residential distance to livestock farms over a five-year period in a large population-based study
Increased risk of pneumonia in residents living near poultry farms: does the upper respiratory tract microbiota play a role?
Endotoxin and dust at respirable and nonrespirable particle sizes are not consistent between cage- and floor-housed poultry operations
Exposure to poultry dust and health effects in poultry workers: Impact of mould and mite allergens
Respiratory response to endotoxin and dust predicts evidence of inflammatory response in volunteers in a swine barn
Sustained intensive transmission of Q fever in the south of the Netherlands
Epidemic Q fever in humans in the Netherlands
Antibodies against MERS coronavirus in dromedary camels
Zoonosis emergence linked to agricultural intensification and environmental change
Applications of next-generation sequencing technologies to diagnostic virology
Metagenomics and the molecular identification of novel viruses
A review on viral metagenomics in extreme environments
New dimensions of the virus world discovered through metagenomics
Viral metagenomics of six bat species in close contact with humans in southern China
A preliminary study of viral metagenomics of French bat species in contact with humans: Identification of new mammalian viruses
Identification and characterization of Coronaviridae genomes from Vietnamese bats and rats based on conserved protein domains
Metagenomic analysis of the RNA fraction of the fecal virome indicates high diversity in pigs infected by porcine endemic diarrhea virus in the United States
The Fecal Virome of Pigs on a High-Density Farm
Unbiased whole-genome deep sequencing of human and porcine stool samples reveals circulation of multiple groups of rotaviruses and a putative zoonotic infection
Metagenomics detection and characterisation of viruses in faecal samples from Australian wild birds
Metagenomic characterisation of avian parvoviruses and picornaviruses from Australian wild ducks
Faecal virome of healthy chickens reveals a large diversity of the eukaryote viral community, including novel circular ssDNA viruses
Agriculture and food: the Netherlands and Finland
Internationalisation Monitor 2016-II Agribusiness
Agriculture; crops, livestock and land use by general farm type, region
Salmonella and broiler processing in the United States: Relationship to foodborne salmonellosis
Campylobacter spp. as a foodborne pathogen: A review
Origin and evolution of highly pathogenic H5N1 avian influenza in Asia
Detection of Coxiella burnetii in ambient air after a large Q fever outbreak
Airborne transmission may have played a role in the spread of 2015 highly pathogenic avian influenza outbreaks in the United States
Source analysis of fine and coarse particulate matter from livestock houses
Occupational exposure to poultry dust and effects on the respiratory system in workers
Respiratory symptoms in Swiss farmers: An epidemiological study of risk factors
Abundance and diversity of the faecal resistome in slaughter pigs and broilers in nine European countries
Mash: Fast genome and metagenome distance estimation using MinHash
Mash Screen: High-throughput sequence containment estimation for genome discovery
Towards a genomics-informed, real-time, global pathogen surveillance system
A diarrheic chicken simultaneously co-infected with multiple picornaviruses: Complete genome analysis of avian picornaviruses representing up to six genera
Virus taxonomy: The database of the International Committee on Taxonomy of Viruses (ICTV)
The broad host range and genetic diversity of mammalian and avian astroviruses
Discovery and genetic characterization of novel caliciviruses in German and Dutch poultry
Passive airborne dust sampling with the electrostatic dustfall collector: Optimization of storage and extraction procedures for endotoxin and glucan measurement
Assessment of airborne exposure to endotoxin and pyrogenic active dust using Electrostatic Dustfall Collectors (EDCs)
Genome Sequences of Seven Megrivirus Strains from Chickens in The Netherlands
Seasonal dynamics of DNA and RNA viral bioaerosol communities in a daycare center
Metavirome sequencing to evaluate norovirus diversity in sewage and related bioaccumulated oysters
Setting a baseline for global urban virome surveillance in sewage
Species-independent detection of RNA virus by representational difference analysis using non-ribosomal hexanucleotides for reverse transcription
Viral population analysis and minority-variant detection using short read next-generation sequencing
A new versatile metagenomic assembler
Search and clustering orders of magnitude faster than BLAST
dplyr: A Grammar of Data Manipulation
Reshaping data with the reshape package
Complex heatmaps reveal patterns and correlations in multidimensional genomic data
MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform
IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
RAxML-NG: A fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference
ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data