key: cord-0260206-iec0cp74
authors: Patrono, Livia Victoria; Vrancken, Bram; Budt, Matthias; Düx, Ariane; Lequime, Sebastian; Boral, Sengül; Gilbert, M. Thomas P.; Gogarten, Jan F.; Hoffmann, Luisa; Horst, David; Merkel, Kevin; Morens, David; Prepoint, Baptiste; Schlotterbeck, Jasmin; Schuenemann, Verena; Suchard, Marc A.; Taubenberger, Jeffery K.; Tenkhoff, Luisa; Urban, Christian; Widulin, Navena; Winter, Eduard; Worobey, Michael; Leendertz, Fabian H.; Schnalke, Thomas; Wolff, Thorsten; Lemey, Philippe; Calvignac-Spencer, Sébastien
title: Archival influenza virus genomes from Europe reveal genomic and phenotypic variability during the 1918 pandemic
date: 2021-05-14
journal: bioRxiv
DOI: 10.1101/2021.05.14.444134
sha: ccd630d8f6fa06623f070292eafe675b5818c1c1
doc_id: 260206
cord_uid: iec0cp74

The 1918 influenza pandemic was the deadliest respiratory pandemic of the 20th century and determined the genomic make-up of subsequent human influenza A viruses (IAV). Here, we analyze the first 1918 IAV genomes from Europe and from the first, milder wave of the pandemic. 1918 IAV genomic diversity is consistent with local transmission and frequent long-distance dispersal events, and in vitro polymerase characterization suggests potential phenotypic variability. Comparison of first and second wave genomes shows variation at two sites in the nucleoprotein gene associated with resistance to host antiviral response, pointing to a possible adaptation of 1918 IAV to humans. Finally, phylogenetic estimates based on extended molecular clock modelling suggest a pure pandemic descent of seasonal H1N1 IAV as an alternative to the hypothesis of an intrasubtype reassortment origin.

One Sentence Summary: Much can be learned about past pandemics by uncovering their footprints in medical archives, which we here demonstrate for the 1918 flu pandemic.

As the COVID-19 pandemic unfolds (1), we are facing many questions regarding the emergence and spread of a novel pathogen in human populations.
Although every pandemic is unique and making meaningful comparisons is challenging, the current situation naturally rekindles interest in past pandemics (2). The 1918 influenza A (H1N1) pandemic (hereafter 1918 pandemic) was the largest global catastrophe of infectious origin to affect humankind in the last century. The disease proceeded in three waves: a mild one in the spring of 1918, followed by deadlier waves in the fall of 1918 and the winter of 1919 (3). Beyond mortality records from that period that allow for comparisons of the scale of this pandemic with subsequent ones, very little is known about even basic biological consequences of the fast and global spread of the 1918 virus. For example, it remains essentially unknown how much genomic diversity arose during the 1918 pandemic and how this diversity impacted viral phenotypes.

In the late 1990s, molecular analyses of formalin-fixed, paraffin-embedded tissue blocks and permafrost-preserved bodies from victims of the 1918 pandemic (4) ascertained that the causative agent was an influenza A virus (IAV) of the H1N1 subtype (5). The reconstruction of two complete IAV genomes from Brevig Mission, Alaska (hereafter BM) (6) and Camp Upton, New York (hereafter CU) (7), as well as the identification of a few other partial sequences (8, 9), informed evolutionary and functional investigations. These revealed that the 1918 virus harbored gene segments drawn from the diversity of IAV strains circulating in an avian reservoir (10), and that the hemagglutinin (HA) and polymerase complex genes were major determinants of its pathogenicity (11, 12). However, our knowledge about an agent that caused an estimated 50-100 million deaths (13) remains extremely limited, notably because it relies on a remarkably sparse corpus of genomic/genetic data, the majority stemming from North America.
We have started to fill this gap by applying recent advances in nucleic acid recovery from archival samples (7, 14) to as-yet unexplored European pathology collection material, including the medical archive started by Rudolf Virchow in mid-1800s Germany (15). This enabled us to sequence complete and partial 1918 influenza virus genomes from specimens sampled in two German cities, then use these data to provide new insights into the genomic and phenotypic diversity of the pandemic strains.

To explore the emergence, pandemic and post-pandemic periods, we selected 13 formalin-fixed lung specimens dated between 1900 and 1931, preserved within the Berlin Museum of Medical History at the Charité (Berlin, Germany, n=11) and the pathology collection (Narrenturm) of the Natural History Museum in Vienna, Austria (n=2). This set included six specimens collected during pandemic years in Europe (n=4 in 1918, n=2 in 1919). Details about all specimens, including initial diagnosis, are available in the supplementary information. For each specimen, we heat-treated 200 mg of formalin-fixed lung tissue to reverse macromolecule cross-links induced by formalin, then performed nucleic acid extraction. Following DNase treatment and ribosomal RNA depletion, we built high-throughput sequencing libraries and shotgun sequenced them on Illumina® platforms. We identified IAV reads in libraries from 3 of 13 specimens, all of which date to 1918 and were associated with histopathological findings of bronchopneumonia (Fig. 1A; supplementary text S1 and Fig. S1). Viral RNA preservation was generally good, as determined by median fragment lengths well above 100 nucleotides and the lack of any obvious damage pattern (Fig. S9-S11). Human RNA fragments were on average shorter, possibly indicating better protection of encapsidated viral RNA (supplementary text S2).
We did not detect any other viral agent likely to be involved in severe respiratory disease, but identified a number of bacteria that have previously been associated with respiratory diseases (e.g. Pseudomonas aeruginosa and Mycobacterium kansasii in specimens that did not produce influenza reads) and with 1918 influenza (Staphylococcus aureus, Streptococcus pneumoniae, Klebsiella pneumoniae, Pasteurella multocida); in 2 specimens bacterial colonization was also evident from histopathology (supplementary text S1 and Fig. S1).

[...] for BM and CU, we plotted SNPs unique to these genomes. Missing information represents areas where we did not get any coverage or where coverage was lower than our consensus calling criteria.

Together with the available BM and CU sequences, we used these influenza genomes from [...]

We could not fully disentangle these comparisons (the two Berlin genomes are not complete, and the two accurately dated genomes from the first wave are from European specimens, while the two accurately dated genomes from the second wave are from North America) and our sample size was very small. However, keeping these limitations in mind, it appears that within-outbreak variability in Berlin was negligible, while genomes sampled on the same continent (0.11-0.16%) and during the same wave (0.11%) exhibited lower overall divergence than genomes sampled on different continents (0.16-0.32%) and during different waves (0.16-0.21%). This pattern is compatible with spatio-temporal effects of local transmission. Altogether, genomic data support a scenario of predominantly local transmission with frequent long-distance dispersal events. Finally, we identified a 69 nt stretch of the PA of BM comprising 8 SNPs that shows a high degree of identity to IAV strains circulating in 1933 or later (Table S4).
Given the size of this fragment and that of the fragments initially targeted to characterize the BM PA sequence (17), this apparent recombination may reflect the contamination of one of the PCR reactions that allowed for the reconstruction of the BM genome with a PCR product derived from a later influenza A virus strain. However, we cannot formally exclude that the BM PA fragment is a genuine but rare case of natural recombination (18) or that it represents a sequencing artefact due to intrahost variation. Resequencing of the BM genome using PCR-independent methods should clarify this issue.

In light of the hypothesis of an avian origin of the pandemic virus, we investigated the presence of amino acid (aa) signatures known or suspected to be associated with avian-to-human adaptation, either because they have been characterized functionally or because they are distributed differentially across bird- and human-infecting influenza viruses. We first examined the high-quality (but imprecisely dated) genome derived from MU-162 and identified two such positions. In PB2, we found an M631L aa change within the PB2 627 host range domain. Although this mutation has recently been described as a main mediator of adaptation and lethality of an avian influenza virus in mice (19), all human H1N1 strains but one (including all other 1918 influenza viruses) present a methionine at this position, suggesting that this change did not have profound evolutionary implications. In contrast, we found that MU-162 has a leucine (L) at position 61 in NP, which is the residue most commonly found in human H1N1 strains prior to the 2009pdm (Fig. S12), whereas an overwhelming majority of avian influenza A viruses present an isoleucine (I) at this position (20). Interestingly, MU-162 was the only 1918 influenza virus presenting the I61L change, indicating that this mutation only reached fixation after the first two waves of the pandemic.
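The kind of signature check described here can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the lookup table holds only the NP residues reported in this text (positions 16, 61 and 283), the function names are hypothetical, and the sequences are toy placeholders rather than real 1918 NP sequences.

```python
# Signature residues for NP as reported in the text: position -> (avian-like, human-like).
NP_SIGNATURES = {16: ("G", "D"), 61: ("I", "L"), 283: ("L", "P")}


def classify_site(protein_seq, position, signatures=NP_SIGNATURES):
    """Label the residue at a 1-based position as avian-like, human-like or other."""
    avian, human = signatures[position]
    residue = protein_seq[position - 1]  # convert 1-based position to 0-based index
    if residue == human:
        label = "human-like"
    elif residue == avian:
        label = "avian-like"
    else:
        label = "other"
    return residue, label


def make_toy_np(residues_by_position, length=300, filler="A"):
    """Build a placeholder sequence (NOT real data) with chosen residues at given positions."""
    seq = [filler] * length
    for pos, res in residues_by_position.items():
        seq[pos - 1] = res
    return "".join(seq)


if __name__ == "__main__":
    # Toy sequences mimicking the residue patterns described in the text.
    toy = {
        "first_wave_like": make_toy_np({16: "G", 61: "I", 283: "L"}),
        "MU-162_like": make_toy_np({16: "D", 61: "L", 283: "P"}),
    }
    for name, seq in toy.items():
        for pos in sorted(NP_SIGNATURES):
            residue, label = classify_site(seq, pos)
            print(f"{name}\tNP{pos}\t{residue}\t{label}")
```

In practice the positions would have to be mapped onto an alignment against a reference NP sequence before indexing, since insertions or missing coverage shift raw string offsets.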
We then considered potential coding variation between first and second wave strains. The second wave of the 1918 pandemic was indeed much more severe than the first, and potential adaptations of the virus have been proposed as one possible explanation, among others (3). The presence of the aa residue G222 in the receptor binding domain of the H1 subtype HA protein has already been discussed in this context (8). This residue confers a binding affinity for both avian and human glycans, while the human-like D222 only efficiently binds human glycans (21, 22). In a recent study, three of four first wave strain sequences presented G222 while only two of nine second wave strains did (all other strains presented D222) (8). The three German HAs sequenced here carried the human-like D222, which reduces the likelihood that the frequency of the two variants varied significantly between the first two pandemic waves.

The availability of genome-wide information about two first wave strains also allowed us to identify two other aa changes that potentially mark a difference between first and second wave strains. We detected these variations at sites of the NP known to influence host range and susceptibility/resistance to the interferon-induced MxA antiviral protein (20, 23, 24): first wave strains BE-572 and BE-576 carried avian-like residues at positions 16 (G) and 283 (L), whereas the fall strains and MU-162 carried D16 and P283, which are found in most human H1N1 strains with the exception of the 2009 pandemic H1N1 virus (see references (23-26) and Fig. S12). Position 283 is located in the body of the NP and the presence of a proline at this site confers resistance to the human MxA protein, to which D16 also contributes, albeit to a lesser extent (24, 26).
In addition to supporting the hypothesis of an avian origin of this gene, these mutations may represent hallmarks of early adaptation to humans: during the first months of the pandemic, 1918 influenza viruses may have evolved a better capacity to evade the innate interferon response, which is an important aspect of influenza virus pathogenicity.

To start exploring in vitro the phenotypic impact of the observed genomic differences, we focused on the MU-162 strain, for which we obtained a high-quality complete genome sequence. Since the polymerase complex contained most of the coding mutations found in MU-162 (4 in PB2, 2 in PB1, 3 in PA and 1 in NP proteins), and is a major pathogenicity factor for BM, we resynthesized viral genes and performed a functional comparison of the activity of the BM and MU-162 polymerases after reconstitution in transfected cells. Using reporter assays, dose response curves revealed a two-fold higher activity of BM compared to MU-162 polymerase (Fig. 3A). To identify the subunit(s) responsible for this difference, we determined the effect of swapping single polymerase subunits between BM and MU-162 (Fig. 3B). The introduction of PA from MU-162 into the BM polymerase complex caused a 1.7-fold reduction of its activity (Fig. 3B, P<0.0001 [...]

To more precisely pinpoint the variations responsible for the observed differences, we generated BM point mutants, each carrying one of the aa changes identified in the MU-162 strain. All three PA mutants significantly reduced the activity of the polymerase compared to BMwt (1.4-fold (P=0.0026) for K158R, 1.3-fold (P=0.0092) for A337S and 1.9-fold (P<0.0001) for T570I, respectively) (Fig. 3C). Of these, K158R was located in the region identified as being of uncertain origin in BM. In PB1, the N694S mutation apparently reduced polymerase activity by 1.8-fold (P<0.0001), which might however be explained by a reduced protein expression level. The V539I aa change in PB2 caused a similar 1. [...]
It had long been assumed that seasonal H1N1 IAVs descended (all or in part) from 1918 influenza viruses. The sequencing of the first segments and then genomes of 1918 strains showed their very close phylogenetic relationships with seasonal H1N1 viruses over all segments: non-clock tree reconstructions suggested that the latter directly descended from the former (17). However, the development and application of molecular clock models allowing for host-specific rates of nucleotide evolution (hereafter host-specific local clocks; HSLC) supported an alternative scenario, whereby seasonal H1N1 viruses did not evolve directly from the pandemic virus in HA. These models rather suggested that the HA of seasonal IAVs was acquired from a co-circulating homosubtypic H1 IAV through intrasubtype reassortment (10). Intriguingly, this scenario is also compatible with other non-genetic information. First, serological studies provided strong evidence that an H3 IAV circulated prior to about 1900 but weak evidence for H3 circulation from ~1900-1918 (27-29). Second, the fact that individuals born after 1900 were spared severe outcomes compared to young adults 20-40 years of age was compatible with an initial exposure to H1 viruses better matched to the 1918 pandemic viruses than the putative H3 viruses to which young adults were first exposed (10). We revisited these conflicting hypotheses about the origin of human seasonal influenza using our extended sequence data set and first used the same inference approaches as previously applied. Non-clock maximum likelihood (ML) reconstructions indicated that human seasonal H1N1 and 1918 pandemic viruses cluster together with reasonably high bootstrap support, with the seasonal lineage nested within the 1918 pandemic variants for all segments but PB2 (Fig. S2).
Time-measured phylogenetic inference using a host-specific local clock (HSLC) model broadly recovers previously reconstructed phylogenetic patterns for all segments but NP. For all segments but HA and NA, the human seasonal lineage is nested within pandemic flu diversity, while for HA and NA pandemic viruses form a monophyletic cluster with swine influenza viruses, which is positioned as a sister lineage to the human seasonal lineage (10) (Fig. 4a vs b, Fig. S3); this pattern was previously also observed in NP, but not in our reconstructions.

Because of the persisting conflict between the different inference approaches for HA and NA (Fig. 4a, S2 and S3A), we explored an alternative scenario that allows for a higher evolutionary rate on the branch ancestral to the seasonal lineage. Such a scenario would result in considerably higher divergence between pandemic and seasonal viruses for H1 and N1 than would be expected under a strict clock, and could therefore induce a sister lineage pattern with a relatively deep most recent common ancestor (MRCA) under the HSLC model, which assumes a different but constant rate of evolution in each of the host-specific lineages. We ran simulations that showed that if the rate is elevated on the relevant branch, and this is not accounted for in the clock model, the HSLC inference generally fails to recover the seasonal lineage as a direct descendant of the pandemic virus (Fig. S5). Fitting an extended HSLC model that allows for a separate rate on the relevant branch inferred the seasonal lineage as a descendant of the pandemic virus in both HA and NA, consistent with the non-clock trees. Across segments, this model resulted in consistently higher posterior mean rates on the seasonal ancestral branch compared to the seasonal clade rates (Fig. 4), but with a variability that largely reflected the variation in selective constraints on the segments.
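The intuition behind the simulation result can be illustrated with simple arithmetic. The Python sketch below is not the authors' simulation and does not reproduce the HSLC model; it only shows how a divergence accumulated partly on an accelerated branch translates into an overestimated time when a single constant rate is assumed. All numerical values (background rate, elapsed time, branch duration, acceleration factor) are hypothetical and chosen purely for illustration.

```python
def expected_divergence(background_rate, total_time, branch_time=0.0, branch_factor=1.0):
    """Expected substitutions/site along a path of total_time years,
    of which branch_time years evolve at branch_factor x background_rate."""
    if not 0.0 <= branch_time <= total_time:
        raise ValueError("branch_time must lie between 0 and total_time")
    accelerated = background_rate * branch_factor * branch_time
    background = background_rate * (total_time - branch_time)
    return accelerated + background


def apparent_time(divergence, assumed_rate):
    """Time implied by a divergence when a single constant rate is assumed."""
    return divergence / assumed_rate


if __name__ == "__main__":
    rate = 2e-3      # hypothetical background rate (substitutions/site/year)
    true_time = 15   # hypothetical years separating two sampled viruses
    branch = 5       # hypothetical duration of the accelerated branch (years)
    factor = 3       # hypothetical fold acceleration on that branch

    d = expected_divergence(rate, true_time, branch, factor)
    t_hat = apparent_time(d, rate)
    print(f"divergence = {d:.4f} substitutions/site")
    print(f"apparent time under a constant rate = {t_hat:.1f} years (true: {true_time})")
```

Under these assumptions the apparent time exceeds the true time by the branch duration multiplied by (acceleration factor - 1), which is the kind of excess divergence that could push an inferred MRCA deeper when the branch-specific rate is not modelled.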
Less stringent purifying selection in HA and NA accompanied by a stronger rate acceleration potentially explains the observed patterns (Fig. 4c). To investigate whether variation in selection pressures could explain the acceleration on the branch leading to the seasonal lineage, we performed selection analyses using codon substitution models. These models did not identify diversifying episodic selection or relaxed selective constraints on this branch in any of the segments, implying that only mutation rate/generation time effects may explain the branch-specific elevated rate. Interestingly, a higher mutation rate for the seasonal predecessor (due to a higher replication rate) has indeed been suggested by some experimental evidence comparing the Brevig Mission strain to seasonal strains (30, 31). Altogether, these new analyses revive the scenario of a pure pandemic origin of seasonal H1N1 viruses. However, the essentially phenomenological nature of our modelling approach does not, for now, allow us to definitively favor it over the alternative scenario of a homosubtypic reassortment.

[...] using an extended HSLC model (and using either model for all other segments). The star denotes the branch that is allowed to have a separate evolutionary rate in the HSLCext model. c) Evolutionary rate estimates for the human lineage (first box) and the seasonal ancestral branch (second box) under the extended HSLC model for each segment, ordered according to the difference in these two rates. The boxes are colored according to the dN/dS estimate for the human lineage and the seasonal ancestral branch in each segment.

We derived considerable insight from the addition of only one complete and two partial 1918 flu genomes.
Our analyses show significant genomic variation whose spatial distribution suggests frequent long-distance dispersal events, identify potentially adaptive substitutions in NP between first and second wave viruses, hint at viral polymerase phenotypic variation and rekindle the scenario of a pandemic origin for all segments of subsequent seasonal H1N1 influenza viruses. We acknowledge that these findings remain preliminary. Our sample of genomic diversity is extremely small (four good quality genomes) and in vitro phenotypic characterization cannot accurately predict in vivo phenotypes. Additional genomes from archival samples from various time points starting in 1918, as well as their phenotypic characterization in vitro and in vivo, will undoubtedly provide the opportunity for more robust tests of our hypotheses. As 3 out of the 4 archival samples from 1918 analyzed yielded good quality viral RNA, influenza genomic research using medical collections can now probably be considered a low-risk, high-gain enterprise. Therefore, we anticipate that the main obstacle to a better understanding of the evolution of 1918 flu viruses will be the identification of surviving pathological specimens, which highlights the importance of long-neglected museum activities (32).
Figures S1-S12
Tables S1-S5

References
A new coronavirus associated with human respiratory disease in China
Pandemic COVID-19 Joins History's Pandemic Legion
Influenza: The mother of all pandemics
Initial genetic characterization of the 1918 "Spanish" influenza virus
Origin and evolution of the 1918 "Spanish" influenza virus hemagglutinin gene
The 1918 influenza pandemic: 100 years of questions answered and unanswered
High-throughput RNA sequencing of a formalin-fixed, paraffin-embedded autopsy lung tissue sample from the 1918 influenza pandemic
Autopsy series of 68 cases dying before and during the 1918 influenza pandemic peak
Influenza pandemic caused by highly conserved viruses with two receptor-binding variants
Genesis and pathogenesis of the 1918 pandemic H1N1 influenza A virus
Influenza Virus Hemagglutinin (HA) and the Viral RNA Polymerase Complex Enhance Viral Pathogenicity, but Only HA Induces Aberrant Host Responses in Mice
Pathogenesis of the 1918 pandemic influenza virus
Updating the accounts: global mortality of the 1918-1920 "Spanish" influenza pandemic
Second-pandemic strain of Vibrio cholerae from the Philadelphia cholera outbreak of 1849
Wissenschaft im Museum - Ausstellung im Labor
Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol
Characterization of the 1918 influenza virus polymerase genes
Homologous recombination in negative sense RNA viruses
Enhanced pathogenicity and neurotropism of mouse-adapted H10N7 influenza virus are mediated by novel PB2 and NA mutations
Genomic signatures of human versus avian influenza A viruses
Glycan Microarray Analysis of the Hemagglutinins from Modern and Pandemic Influenza
Proc. Natl. Acad. Sci., in press
Genomic signature and mutation trend analysis of pandemic (H1N1) 2009 influenza A virus
Pandemic Influenza A Viruses Escape from Restriction by Human MxA through Adaptive Mutations in the Nucleoprotein
Isolation and molecular characterization of equine H3N8 influenza viruses from pigs in China
Deep mutational scanning identifies sites in influenza nucleoprotein that affect viral inhibition by MxA
The virtues of antigenic sin: consequences of pandemic recycling on influenza-associated mortality
Interpretations of influenza antibody patterns of man
Studies on the content of antibodies for equine influenza viruses in human sera
Characterization of the reconstructed 1918 Spanish influenza pandemic virus
Influenza virus hemagglutinin (HA) and the viral RNA polymerase complex enhance viral pathogenicity, but only HA induces aberrant host responses in mice
Empfehlungen zu wissenschaftlichen Sammlungen als Forschungsinfrastruktur
The Isolation of Nucleic Acids from Fixed, Paraffin-Embedded Tissues-Which Methods Are Useful When?
Extraction and amplification of DNA from formalin-fixed, paraffin-embedded tissues
Trimmomatic: A flexible trimmer for Illumina sequence data
Improved metagenomic analysis with Kraken 2
Fast and accurate short read alignment with Burrows-Wheeler transform
metaSPAdes: A New Versatile Metagenomic Assembler
EAGER: efficient ancient genome reconstruction
MapDamage2.0: Fast approximate Bayesian estimates of ancient DNA damage parameters
A synchronized global sweep of the internal genes of modern avian influenza virus
FaBox: An online toolbox for FASTA sequences
RDP4: Detection and analysis of recombination patterns in virus genomes
New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.
MAFFT multiple sequence alignment software version 7: Improvements in performance and usability
AliView: A fast and lightweight alignment viewer and editor for large datasets
IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era
Some probabilistic and statistical problems in the analysis of DNA sequences
Ultrafast approximation for phylogenetic bootstrap
Improving Bayesian population dynamics inference: A coalescent-based model for multiple loci
A synchronized global sweep of the internal genes of modern avian influenza virus
Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics
Posterior summarization in Bayesian phylogenetics using Tracer 1.7
HyPhy 2.5-A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies
Less is more: An adaptive branch-site random effects model for efficient detection of episodic diversifying selection
RELAX: Detecting relaxed selection in a phylogenetic framework
A parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios
A simple and fast system for cloning influenza A virus gene segments into pHW2000- and pCAGGS-based vectors
The viral polymerase mediates adaptation of an avian influenza virus to a mammalian host
Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data
Capsid and Infectivity in Virus Detection
Relaxed phylogenetics and dating with confidence

This work was funded in part by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases of the NIH (DMM and JKT). MAS was supported by US National Institutes of Health grants HG006139 and AI135995. MW was supported by the Bill and Melinda Gates Foundation (INV-004212) and the David and Lucile Packard Foundation [...] Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 725422-ReservoirDOCS) and from the European Union's Horizon 2020 project MOOD (grant agreement no. 874850). The Artic Network receives funding from the Wellcome Trust through project 206298/Z/17/Z. PL acknowledges support by the Research Foundation - Flanders ('Fonds voor Wetenschappelijk Onderzoek - Vlaanderen [...] This project was also supported by a grant to SCS from the National Research Platform for Zoonoses (Federal Ministry of Education and Research, 01KI1714).

Author contributions: Conceptualization