key: cord-331701-izkz1hz4
authors: Eden, John-Sebastian; Rockett, Rebecca; Carter, Ian; Rahman, Hossinur; de Ligt, Joep; Hadfield, James; Storey, Matthew; Ren, Xiaoyun; Tulloch, Rachel; Basile, Kerri; Wells, Jessica; Byun, Roy; Gilroy, Nicky; O’Sullivan, Matthew V; Sintchenko, Vitali; Chen, Sharon C; Maddocks, Susan; Sorrell, Tania C; Holmes, Edward C; Dwyer, Dominic E; Kok, Jen
title: An emergent clade of SARS-CoV-2 linked to returned travellers from Iran
date: 2020-03-17
journal: bioRxiv
DOI: 10.1101/2020.03.15.992818
sha:
doc_id: 331701
cord_uid: izkz1hz4

The SARS-CoV-2 epidemic has rapidly spread outside China with major outbreaks occurring in Italy, South Korea and Iran. Phylogenetic analyses of whole genome sequencing data identified a distinct SARS-CoV-2 clade linked to travellers returning from Iran to Australia and New Zealand. This study highlights potential viral diversity driving the epidemic in Iran, and underscores the power of rapid genome sequencing and public data sharing to improve the detection and management of emerging infectious diseases.

From a public health perspective, the real-time whole genome sequencing (WGS) of emerging viruses enables the informed development and design of molecular diagnostic methods, and tracing patterns of spread across multiple epidemiological scales (i.e. genomic epidemiology). However, WGS capacities and data sharing policies vary in different countries and jurisdictions, leading to potential sampling bias due to delayed or underrepresented sequencing data from some areas with substantial SARS-CoV-2 activity. Herein, we show that the genomic analyses of SARS-CoV-2 strains from Australian returned travellers with COVID-19 disease may provide important insights into viral diversity present in regions currently lacking genomic data.

In late December 2019, a cluster of cases of pneumonia of unknown aetiology in Wuhan city, Hubei province, China was reported by health authorities [1].
A novel betacoronavirus, designated SARS-CoV-2, was identified as the causative agent [2] of the disease now known as COVID-19, with substantial human-to-human transmission [3]. To contain a growing epidemic, Chinese authorities implemented strict quarantine measures in Wuhan and surrounding areas in Hubei province. These measures significantly delayed the global spread of the virus, but cases were nonetheless exported to other countries. As of 9 March 2020, cases had been reported in more than 100 countries, on all continents except Antarctica; the total number of confirmed infections exceeded 110,000 and there were nearly 4,000 deaths [4]. Although the vast majority of cases have occurred in China, major outbreaks have also been reported in Italy, South Korea and Iran [5]. Importantly, there is widespread local transmission in multiple countries outside China following independent importations of infection from visitors and returned travellers.

In New South Wales (NSW), Australia, WGS for SARS-CoV-2 was developed based on an existing amplicon-based Illumina sequencing approach [6]. Viral extracts were prepared from respiratory tract samples in which SARS-CoV-2 was detected by RT-PCR using World Health Organization-recommended primers and probes targeting the E and RdRp genes, and then reverse transcribed using SSIV VILO cDNA master mix. The viral cDNA was used as input for multiple overlapping PCR reactions (~2.5 kb each) spanning the viral genome using Platinum SuperFi master mix (primers provided in Supplementary Table S1). Amplicons were pooled equally, purified and quantified. Nextera XT libraries were prepared and sequencing was performed with multiplexing on an Illumina iSeq (300 cycle flow cell).

In New Zealand, the ARTIC network protocol was used for WGS [7]. In short, 400 bp tiling amplicons designed with Primal Scheme [8] were used to amplify viral cDNA prepared with SuperScript III.
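As a rough illustration of the tiling idea described above, the following sketch lays out overlapping amplicon windows along a genome. It shows only the geometry of a tiling scheme: it is not Primal Scheme's algorithm, which also selects primers based on sequence properties. The function name `tile_amplicons` and the 75 bp overlap are hypothetical choices made for this example; the 29,903 nt length is that of the Wuhan-Hu-1 reference (MN908947).

```python
def tile_amplicons(genome_length, amplicon_size, overlap):
    """Return half-open (start, end) windows that tile a genome.

    Consecutive windows share `overlap` bases; the final window is
    truncated so that it ends exactly at the end of the genome.
    """
    if not 0 <= overlap < amplicon_size:
        raise ValueError("overlap must be non-negative and smaller than amplicon_size")
    step = amplicon_size - overlap
    windows = []
    start = 0
    while True:
        end = min(start + amplicon_size, genome_length)
        windows.append((start, end))
        if end == genome_length:
            break
        start += step
    return windows


if __name__ == "__main__":
    # Assumed values: 400 bp amplicons with a hypothetical 75 bp overlap.
    windows = tile_amplicons(genome_length=29903, amplicon_size=400, overlap=75)
    print(len(windows), "windows; first:", windows[0], "last:", windows[-1])
```

The same function can be called with an amplicon size of 2500 to approximate the layout of the longer (~2.5 kb) amplicons used in NSW, again under an assumed overlap.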
A sequence library was then constructed using the Oxford Nanopore ligation sequencing kit and sequenced on an R9.4.1 MinION flow cell. Near-complete viral genomes were then assembled de novo in Geneious Prime 2020.0.5 or through reference mapping with RAMPART V1.0.6 [9] using the ARTIC network nCoV-2019 novel coronavirus bioinformatics protocol [10].

In total, 13 SARS-CoV-2 genomes were sequenced from cases in NSW diagnosed between 24 January and 3 March 2020, as well as a single genome from the first patient in Auckland, New Zealand sampled on 27 February 2020 (Table 1). Australian and New Zealand sequences were aligned to global reference strains sourced from GISAID with MAFFT [11] and then compared phylogenetically using a maximum likelihood approach [12].

The Australian strains of SARS-CoV-2 were dispersed across the global SARS-CoV-2 phylogeny (Figure 1A). The first four cases of COVID-19 disease in NSW occurred between 24 and 26 January 2020, and these were closely related (differing by 1-2 SNPs) to the prototype strain MN908947/SARS-CoV-2/Wuhan-Hu-1, which is the dominant variant (Supplementary Figures S1 & S2).

Technological advancements and the widespread adoption of WGS in pathogen genomics have transformed public health and infectious disease outbreak responses [13]. Previously, disease investigations often relied on the targeted sequencing of a small locus to identify genotypes and infer patterns of spread along with epidemiological data. As seen with the recent West African Ebola [14] and Zika virus epidemics [15], rapid WGS significantly increases the resolution of diagnosis and surveillance, thereby strengthening links between clinical and epidemiological data [16]. This advance improves our understanding of pathogen origins and spread, ultimately leading to stronger and more timely intervention and control measures [17].
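To make concrete the kind of SNP comparison reported above for the first NSW cases relative to Wuhan-Hu-1, the following minimal sketch counts differences between two sequences taken from the same alignment. It is an illustration only and not the analysis pipeline used in this study; the function name `count_snps` and the toy sequences are invented for the example, and positions with gaps or ambiguous bases are skipped by assumption.

```python
def count_snps(ref, query):
    """Count aligned positions where two sequences carry different bases.

    Both sequences must come from the same alignment (equal length).
    Positions where either sequence has a gap or an ambiguous base
    (anything other than A, C, G or T) are ignored.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(ref.upper(), query.upper())
        if a in valid and b in valid and a != b
    )


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration (not real viral data).
    reference = "ACGTTGCAAC-GTN"
    sample = "ACGTTGCTACAGTA"
    print(count_snps(reference, sample))
```

Ignoring gaps and ambiguous bases is a conservative choice for amplicon data, where low-coverage regions are often masked rather than called.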
Following the first release of the SARS-CoV-2 genome [18], public health and research laboratories worldwide have rapidly shared sequences on public data repositories such as GISAID [19] (n = 236 genomes as of 9 March 2020) that have been used to provide near real-time snapshots of global diversity through public analytic and visualization tools [20].

While all known cases linked to Iran are contained in this clade, it is important to note the presence of two Chinese strains sampled during mid-January 2020 from Hubei and Shandong provinces. It is expected that further Chinese strains will be identified within this clade, and across the entire diversity of SARS-CoV-2, including that of the outbreak in Iran itself, as China is where the outbreak started. However, while we cannot completely discount that the cases in Australia and New Zealand came from other sources including China, our phylogenetic analyses, as well as epidemiological (recent travel to Iran) and clinical data (date of symptom onset), provide evidence that this clade of SARS-CoV-2 is linked to the Iranian epidemic, from where genomic data is currently lacking. Importantly, the seemingly multiple importations of very closely related viruses from Iran into Australia suggest that this diversity reflects the early stages of SARS-CoV-2 transmission within Iran.

None declared.

Wuhan Municipal Health and Health Commission's briefing on the current pneumonia epidemic situation in our city
A new coronavirus associated with human respiratory disease in China
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
An interactive web-based dashboard to track COVID-19 in real time
World Health Organisation Coronavirus Situation Report -8 th
Evolution of Human Respiratory Syncytial Virus (RSV) over Multiple Seasons in New South Wales, Australia. Viruses
nCoV-2019 sequencing protocol, Quick J
An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood
Whole Genome Sequencing-Implications for Infection Prevention and Outbreak Investigations. Curr Infect Dis Rep
Virus genomes reveal factors that spread and sustained the Ebola epidemic
Genomic Insights into Zika Virus Emergence and Spread. Cell
Unifying the epidemiological and evolutionary dynamics of pathogens
Tracking virus outbreaks in the twenty-first century
org - Novel 2019 coronavirus genome
Global initiative on sharing all influenza data - from vision to reality
Nextstrain: real-time tracking of pathogen evolution