key: cord-0268637-5grkoark
authors: Shi, M.; Zhao, S.; Yu, B.; Wu, W.-C.; Hu, Y.; Tian, J.-H.; Yin, W.; Ni, F.; Hu, H.-L.; Geng, S.; Tan, L.; Peng, Y.; Song, Z.-G.; Wang, W.; Chen, Y.-M.; Holmes, E. C.; Zhang, Y.-Z.
title: Total Infectomes Characterization of Respiratory Infections in pre-COVID-19 Wuhan, China
date: 2021-08-31
journal: nan
DOI: 10.1101/2021.08.30.21262865
sha: dbe6dca343a691bdcedbe06dfbfa6d55f2bfbec2
doc_id: 268637
cord_uid: 5grkoark
At the end of 2019 Wuhan witnessed an outbreak of "atypical pneumonia" that later developed into a global pandemic. Metagenomic sequencing rapidly revealed the causative agent of this outbreak to be a novel coronavirus - SARS-CoV-2. Herein, to provide a snapshot of the pathogens in pneumonia-associated respiratory samples from Wuhan prior to the emergence of SARS-CoV-2, we collected bronchoalveolar lavage fluid samples from 408 patients presenting with pneumonia and acute respiratory infections at the Central Hospital of Wuhan between 2016 and 2017. Unbiased total RNA sequencing was performed to reveal their "total infectome", including viruses, bacteria and fungi. Consequently, we identified 37 pathogen species, comprising 15 RNA viruses, 3 DNA viruses, 16 bacteria and 3 fungi, often at high abundance and including multiple co-infections (12.8%). However, SARS-CoV-2 was not present. These data depict a stable core infectome comprising common respiratory pathogens such as rhinoviruses and influenza viruses, an atypical respiratory virus (EV-D68), and a single case of a sporadic zoonotic pathogen - Chlamydia psittaci. Samples from patients experiencing respiratory disease on average had higher pathogen abundance than healthy controls. Phylogenetic analyses of individual pathogens revealed multiple origins and global transmission histories, highlighting the connectedness of the Wuhan population.
This study provides a comprehensive overview of the pathogens associated with acute respiratory infections and pneumonia, which were more diverse and complex than those obtained using targeted PCR or qPCR approaches. These data also suggest that SARS-CoV-2 or closely related viruses were absent from Wuhan in 2016-2017.
The emergence of COVID-19 at the end of 2019 has had a profound impact on the world. The identification [...] and studies of disease association. This method has proven highly successful in revealing the entire virome and microbiome in a diverse range of species 5-8, including the initial identification of SARS-CoV-2 from patients with severe pneumonia 1.
Acute respiratory infections and pneumonia are a significant public health concern on a global scale. However, far less is known about the total "infectomes" associated with respiratory infections and pneumonia. Herein, we report total infectome surveillance of 408 patients presenting with pneumonia and acute respiratory infections at Wuhan Central Hospital prior to the SARS-CoV-2 epidemic. The purpose of this study was to use an unbiased meta-transcriptomics tool to characterize the total infectome within these patients. Nevertheless, since the sampling period occurred before the outbreak of COVID-19, this represents the first opportunity to characterize the entire range of pathogens simultaneously within a cohort and determine the microbial composition of the population in which SARS-CoV-2 was initially reported.
medRxiv preprint doi: https://doi.org/10.1101/2021.08.30.21262865; this version posted August 31, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Patient context
We considered 408 patients clinically diagnosed with pneumonia or acute respiratory infection at [...] with age ranging from 16 to 90 years (median, 62). Pre-existing medical conditions present in these patients included hypertension (n=108), diabetes (n=46), bronchiectasia (n=31), chronic obstructive pulmonary disease (COPD, n=23), cancer (n=12), and heart disease (n=10). Based on evaluations made by clinicians at the hospital, 27 patients were described as severely ill, with 381 presenting with non-severe syndromes (Fig. 1E). The mortality for the entire cohort was 0.74% (n=3) and the average duration of hospitalization was 8 days (range 2-322, median 9).
Total infectome
Meta-transcriptomic analysis of the BALF samples identified a wide range of RNA viruses, DNA viruses, bacteria and fungi. For the purposes of this study, we only characterized those likely associated with human disease (i.e., pathogens). This included (i) existing species that are known to be associated with human disease, and (ii) potentially novel pathogens that have not been previously characterized. For the latter, we only considered DNA and RNA viruses that are related to a virus genus or family previously shown to infect mammals and that are present at a relatively high abundance level (i.e., >0.1% of total RNA, or 1000 RPM). For pathogens other than novel ones, the abundance threshold for a positive call was set at 1 RPM. Furthermore, commensal bacterial populations were not considered here. Based on these criteria we did not identify any potential novel viral pathogens.
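The abundance criteria above can be illustrated with a short sketch. It assumes abundance has already been expressed in reads per million (RPM); the function names and result labels are hypothetical and not taken from the study's pipeline, and treating the 1 RPM threshold as inclusive is an assumption, since the text does not say.

```python
# Thresholds stated in the text; all names below are hypothetical.
KNOWN_PATHOGEN_MIN_RPM = 1.0      # positivity threshold for known pathogens
NOVEL_PATHOGEN_MIN_RPM = 1000.0   # 0.1% of total RNA, expressed as RPM


def percent_to_rpm(percent_of_total_rna):
    """Convert a percentage of total RNA to reads per million (1% = 10,000 RPM)."""
    return percent_of_total_rna * 10_000


def classify(rpm, known_pathogen, related_to_mammal_infecting_lineage=False):
    """Apply the two abundance criteria described in the text.

    Assumption: the 1 RPM threshold is inclusive (>=); the novel-pathogen
    threshold is strict (>), following the ">0.1%" wording.
    """
    if known_pathogen:
        return "positive" if rpm >= KNOWN_PATHOGEN_MIN_RPM else "negative"
    if related_to_mammal_infecting_lineage and rpm > NOVEL_PATHOGEN_MIN_RPM:
        return "candidate novel pathogen"
    return "not considered"


# Made-up values, for illustration only.
print(classify(0.4, known_pathogen=True))
print(classify(percent_to_rpm(52), known_pathogen=True))
print(classify(1500.0, known_pathogen=False,
               related_to_mammal_infecting_lineage=True))
```

Keeping the two thresholds as named constants makes the three-orders-of-magnitude gap between the known-pathogen and novel-pathogen criteria explicit.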
All the microbes identified belonged to those previously characterized as human pathogens, comprising 15 RNA viruses, 3 DNA viruses, 16 bacteria and 3 fungal pathogens (Fig. 2). The case positive rate for all pathogens was 71.25% (n=249, Fig. 2A), many of which were only associated with RNA viruses (27.3%, n=151) or bacteria (28.9%, n=160). Co-infection with two different pathogens was also commonplace, comprising a total of 71 (12.8%) cases (Fig. 2A). Among the pathogens identified, most were common respiratory pathogens such as influenza viruses, rhinoviruses, Pseudomonas aeruginosa and Haemophilus influenzae (Fig. 2B). In addition, we identified a number of unconventional respiratory pathogens that are often not included in respiratory pathogen screening panels but are known to cause severe infections in the respiratory tract or lungs, including enterovirus D68 and Chlamydia psittaci (see below).
Finally, none of the pathogens described here appeared in the blank controls. Since the blank control samples were generated using the same procedures for RNA extraction, library preparation and sequencing as the experimental groups, these results effectively exclude the possibility that the pathogens described above were of contaminant origin.
[...] were human rhinoviruses A-C (HRV, n=55), followed by influenza A virus (IAV, n=29), human [...] were also detected, although at a relatively low prevalence.
Of particular interest was the identification of a single case of Chlamydia psittaci - a potentially bird-associated zoonotic [...]
For each of the pathogens identified here, we performed a qRT-PCR assay to confirm their presence and validate their abundance level as measured using our meta-transcriptomic approach. Strikingly, strong correlations were observed between the abundance measured by qPCR (i.e., CT value) and that estimated by read count after log2 conversion (-1 < Pearson's R < -0.8, Fig. 3); the correlation is negative because a lower CT value indicates a higher template abundance. Hence, the quantification by the two methods is strongly comparable.
[...] with those identified from the United States (Fig. 4), although this may reflect limited sampling.
Our meta-transcriptomic analysis revealed that many RNA viruses and bacteria detected were present at extremely high abundance levels (>1%, and up to 52% of total RNA) and hence likely indicative of acute disease. This was particularly true of eight species of RNA viruses - EV-D68, the influenza viruses, HRV, HPIV3, 229E - as well as two species of bacteria (Haemophilus influenzae and Pseudomonas aeruginosa) (Fig. 5). Together, these comprise a total of 54 cases (13.2% of total disease cases). In marked contrast, high levels of abundance were never observed in the healthy control [...]
[...] pandemic, although a number of common cold coronaviruses (HKU1, OC43, 229E, and NL63) were commonly detected in our cohort.
Indeed, the earliest COVID-19 case, identified by qRT-PCR or next-generation sequencing-based assays performed at designated authoritative [...]
The data presented provide a comprehensive overview of the infectome associated with pneumonia or acute respiratory infections in Wuhan, which is clearly more diverse and complex [...]
[...] from clinicians and are sometimes ignored entirely, most likely due to the lack of association with severe disease in adults 13-15. Nevertheless, our results showed these "neglected" respiratory viruses had high diversity, abundance and prevalence in the cohort of pneumonia or acute respiratory patients studied here in comparison to healthy controls, such that their role as agents of disease should not be underestimated. One scenario is that they represent opportunistic pathogens that take advantage of weakened immunity, such as herpesviruses associated with acute respiratory distress syndrome (ARDS) 16. It is also possible that their pathogenic effects have yet to be identified and may extend to disease manifestations beyond respiratory infections. For example, deep sequencing of a brain biopsy sample suggested that the OC43 coronavirus may be associated with fatal encephalitis in humans 17.
We also identified a potential "emerging" infectome, in this case comprising a single virus - EV-D68 - that may represent a regional or national outbreak of an unconventional respiratory [...] sample.
This highlights the active replication and massive proliferation of viruses within the respiratory system of these patients.
Finally, our zoonotic infectome also comprised a single pathogen, Chlamydia psittaci, that is associated with avian species but causes occasional outbreaks in domestic animals (i.e., pigs, cattle, and sheep) and humans 25. In humans, C. psittaci infections often start with influenza- [...]
[...] HCoV-OC43 discovered here all comprised multiple lineages (Fig. 4), suggesting these viruses were introduced from diverse sources. Since some of the viruses were closely related to those circulating in other countries, it is possible that they represent overseas importations: this is not surprising given that Wuhan is a major domestic travel hub and well linked internationally.
The methodology used here served as an unbiased investigation of potential emerging pathogens. [...]
[...] infections based on the initial diagnosis made by clinicians. All patients were hospitalized and subject to bronchoalveolar lavage fluid (BALF) collection required by the initial diagnosis for pneumonia or acute respiratory distress syndrome and independent of this study. The BALF sample was divided into two parts for the clinical laboratory test and this study, respectively.
Of [...]
Meta-transcriptomic pathogen discovery pipeline
We followed a standard protocol for meta-transcriptomics analysis for each BALF sample. Total [...]
The remaining sequencing reads were subject to a pathogen discovery pipeline. For virus identification, the reads were directly compared against reference virus databases using the blastn program and against the non-redundant protein (nr) database using diamond blastx 31, with e-value thresholds set at 1E-10 and 1E-5 for the blastn and diamond blastx analyses, respectively. Viral abundance was summarized from both analyses and calculated as (total viral reads / total non-redundant reads) × 1 million (i.e., reads per million, RPM). To identify highly divergent virus genomes, reads were assembled using megahit 32 into contigs before comparison [...]
For pathogen-positive samples, the same RNA used for meta-transcriptomics analysis was also subject to a qRT-PCR assay with primer sets designed for a specific or related group of pathogens (Table S1). RNA was first reverse transcribed by SuperScript™ III First-Strand [...]
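The abundance calculation and the comparison with qRT-PCR described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the `Hit` record and its field names are hypothetical, "summarized from both analyses" is interpreted here as the union of reads accepted by either search, the e-value cutoffs are treated as inclusive, and all example values are made up.

```python
import math
from dataclasses import dataclass

# E-value cutoffs stated in the text, keyed by search tool.
MAX_EVALUE = {"blastn": 1e-10, "blastx": 1e-5}


@dataclass(frozen=True)
class Hit:
    """One alignment hit; the field names are hypothetical."""
    read_id: str
    tool: str       # "blastn" or "blastx"
    evalue: float


def viral_read_ids(hits):
    """Reads with at least one hit passing its tool's cutoff.

    Assumption: a read accepted by both searches is counted once.
    """
    return {h.read_id for h in hits if h.evalue <= MAX_EVALUE[h.tool]}


def reads_per_million(viral_reads, total_reads):
    """RPM = viral reads / total non-redundant reads * 1 million."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads * 1_000_000 / total_reads


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Made-up example values, for illustration only.
hits = [
    Hit("r1", "blastn", 1e-30),  # passes the blastn cutoff
    Hit("r1", "blastx", 1e-8),   # same read, also passes blastx
    Hit("r2", "blastx", 1e-3),   # fails the blastx cutoff
    Hit("r3", "blastn", 1e-9),   # fails the stricter blastn cutoff
]
rpm = reads_per_million(len(viral_read_ids(hits)), total_reads=2_000_000)

read_counts = [1, 4, 16, 64]          # per-sample read counts
ct_values = [35.0, 30.0, 25.0, 20.0]  # matching qPCR CT values
r = pearson_r([math.log2(c) for c in read_counts], ct_values)
print(rpm, r)
```

Because a lower CT value corresponds to more template, agreement between the two measurements shows up as a correlation close to -1 rather than +1.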
Phylogenetic trees were estimated using the maximum likelihood method implemented in PhyML. Sequences identified from this study are marked with solid red circles. For larger trees, we only show lineages or sub-lineages which contain sequences identified in this study.
1. A new coronavirus associated with human respiratory disease in China
2. A pneumonia outbreak associated with a new coronavirus of probable bat origin
3. Leung GM, Feng Z. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
4. Expanding the RNA virosphere by unbiased metagenomics
5. Francisella tularensis ssp. holarctica in Ringtail Possums
6. Holmes EC, Zhang YZ. The evolutionary history of vertebrate RNA viruses
7. Redefining the invertebrate RNA virosphere
8. High resolution metagenomic characterization of complex infectomes in paediatric acute respiratory [...]
9. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
10. SARS-CoV-2 detection in patients with influenza-like illness
11. Characteristics of SARS-CoV-2 and COVID-19
12. Detection of viruses and atypical bacteria associated with acute respiratory infection of children in Hubei
13. Human rhinoviruses
14. Parainfluenza viruses
15. van der Hoek L. Human coronaviruses: what do they cause?
16. Human herpesviruses respiratory infections in patients with acute respiratory distress
17. Human coronavirus OC43 associated with fatal encephalitis
18. A probable new human picornavirus associated with respiratory diseases
19. Global emergence of enterovirus D68: a systematic review
20. Severe respiratory illness associated with a nationwide outbreak of enterovirus D68 in the USA
21. A novel outbreak enterovirus D68 strain associated with acute flaccid myelitis cases in the USA (2012-14): a retrospective cohort study
22. Coxsackievirus A21, enterovirus 68, and acute respiratory tract infection
23. Prevalence and molecular characterizations of enterovirus D68 among children with acute respiratory infection in China between [...]
24. Seroepidemiology of enterovirus D68 infection in China
25. Chlamydia psittaci: update on an underestimated zoonotic agent
26. The COVID-19 Pandemic in Japan
27. Clinical and virological data of the first cases of COVID-19 in Europe: a case series
28. Presymptomatic transmission of [...]
29. Covid-19 in South Korea - Challenges of subclinical manifestations
30. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools
31. Fast and sensitive protein alignment using DIAMOND
32. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
33. MetaPhlAn2 for enhanced metagenomic taxonomic profiling
34. Recent developments in the MAFFT multiple sequence alignment program
35. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses
36. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0