key: cord-0862872-25uueajo authors: O'Toole, Áine; Hill, Verity; Pybus, Oliver G.; Watts, Alexander; Bogoch, Issac I.; Khan, Kamran; Messina, Jane P.; Tegally, Houriiyah; Lessells, Richard R.; Giandhari, Jennifer; Pillay, Sureshnee; Tumedi, Kefentse Arnold; Nyepetsi, Gape; Kebabonye, Malebogo; Matsheka, Maitshwarelo; Mine, Madisa; Tokajian, Sima; Hassan, Hamad; Salloum, Tamara; Merhi, Georgi; Koweyes, Jad; Geoghegan, Jemma L.; de Ligt, Joep; Ren, Xiaoyun; Storey, Matthew; Freed, Nikki E.; Pattabiraman, Chitra; Prasad, Pramada; Desai, Anita S.; Vasanthapuram, Ravi; Schulz, Thomas F.; Steinbrück, Lars; Stadler, Tanja; Parisi, Antonio; Bianco, Angelica; García de Viedma, Darío; Buenestado-Serrano, Sergio; Borges, Vítor; Isidro, Joana; Duarte, Sílvia; Gomes, João Paulo; Zuckerman, Neta S.; Mandelboim, Michal; Mor, Orna; Seemann, Torsten; Arnott, Alicia; Draper, Jenny; Gall, Mailie; Rawlinson, William; Deveson, Ira; Schlebusch, Sanmarié; McMahon, Jamie; Leong, Lex; Lim, Chuan Kok; Chironna, Maria; Loconsole, Daniela; Bal, Antonin; Josset, Laurence; Holmes, Edward; St. George, Kirsten; Lasek-Nesselquist, Erica; Sikkema, Reina S.; Oude Munnink, Bas; Koopmans, Marion; Brytting, Mia; Sudha rani, V.; Pavani, S.; Smura, Teemu; Heim, Albert; Kurkela, Satu; Umair, Massab; Salman, Muhammad; Bartolini, Barbara; Rueca, Martina; Drosten, Christian; Wolff, Thorsten; Silander, Olin; Eggink, Dirk; Reusken, Chantal; Vennema, Harry; Park, Aekyung; Carrington, Christine; Sahadeo, Nikita; Carr, Michael; Gonzalez, Gabo; de Oliveira, Tulio; Faria, Nuno; Rambaut, Andrew; Kraemer, Moritz U. G. 
title: Tracking the international spread of SARS-CoV-2 lineages B.1.1.7 and B.1.351/501Y-V2 date: 2021-05-19 journal: Wellcome Open Res DOI: 10.12688/wellcomeopenres.16661.1 sha: 3fc1c61e8ca42ff013a773643140651afc77d16f doc_id: 862872 cord_uid: 25uueajo

Late in 2020, two genetically distinct clusters of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with mutations of biological concern were reported, one in the United Kingdom and one in South Africa. Using a combination of data from routine surveillance, genomic sequencing and international travel, we track the international dispersal of lineages B.1.1.7 and B.1.351 (variant 501Y-V2). We account for potential biases in genomic surveillance efforts by including passenger volumes from the locations where each lineage was first reported, London and South Africa respectively. Using the software tool grinch (global report investigating novel coronavirus haplotypes), we track the international spread of lineages of concern with automated daily reports. Further, we have built a custom tracking website (cov-lineages.org/global_report.html) which hosts this daily report and will continue to include novel SARS-CoV-2 lineages of concern as they are detected.

In December 2020, routine genomic surveillance in the United Kingdom (UK) 1 reported a new and genetically distinct phylogenetic cluster of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (variant VOC202012/01, lineage B.1.1.7). Preliminary analysis suggests that this lineage carries an unusually large number of genetic changes 2 . The earliest known cases of B.1.1.7 were sampled in southern England in late September 2020, and by December the lineage had spread to most UK regions and was growing rapidly 3 .
In October 2020, a separate SARS-CoV-2 cluster (variant 501Y.V2, lineage B.1.351), which carried a different constellation of genetic changes, was detected by the Network for Genomic Surveillance in South Africa 4, 5 . Both lineages carry mutations, especially in the virus spike protein, that may affect virus function, and both appear to have grown rapidly in relative frequency since their discovery. Early analyses of the spatial spread of SARS-CoV-2 highlight the potential for rapid virus dissemination through national and international travel 6, 7 . Therefore, continued genomic monitoring of lineages of concern is required.

To better characterise the international distribution of lineages B.1.1.7 and B.1.351, we collated SARS-CoV-2 sequences from GISAID 8, 9 and assigned lineages using pangolin (v2.1.6, https://github.com/cov-lineages/pangolin), which implements the nomenclature scheme described in Rambaut et al. 10 . Genomes are assigned to lineage B.1.1.7 if they exhibit at least 5 of the 17 mutations inferred to have arisen on the phylogenetic branch immediately ancestral to the cluster (Table 1) 2 , or to B.1.351 if they exhibit at least 5 of 9 lineage-associated mutations (Table 1) 5 . Lineage count and frequency data have been calculated daily using grinch. Using International Air Transport Association (IATA) travel data from October 2020, available through bluedot.global, we aggregated and collated the passenger volumes from international airports in London and South Africa to international destinations on the same booking. Destinations with more than 5,000 passengers from London, or more than 300 passengers from South Africa, during the month of October are displayed on the cov-lineages.org website and in the underlying data for this publication 11 .
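The mutation-count rule stated above for flagging B.1.1.7 and B.1.351 genomes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not pangolin's actual implementation: the mutation identifiers are placeholders standing in for the entries of Table 1, and the function name `assign_lineage_of_concern` and the tie-breaking choice (prefer the lineage sharing more mutations) are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set

# Placeholder identifiers only: the real sets are the 17 (B.1.1.7) and
# 9 (B.1.351) lineage-associated mutations listed in Table 1.
DEFINING_MUTATIONS: Dict[str, Set[str]] = {
    "B.1.1.7": {f"b117_mut_{i}" for i in range(1, 18)},
    "B.1.351": {f"b1351_mut_{i}" for i in range(1, 10)},
}

# Threshold stated in the text for both lineages.
MIN_SHARED = 5


def assign_lineage_of_concern(observed: Iterable[str]) -> Optional[str]:
    """Return the lineage whose defining set shares >= MIN_SHARED mutations
    with the observed genome, or None if neither lineage qualifies."""
    observed_set = set(observed)
    best_lineage: Optional[str] = None
    best_count = 0
    for lineage, defining in DEFINING_MUTATIONS.items():
        shared = len(observed_set & defining)
        # Assumed tie-break: keep the lineage with the most shared mutations.
        if shared >= MIN_SHARED and shared > best_count:
            best_lineage, best_count = lineage, shared
    return best_lineage


if __name__ == "__main__":
    genome = [f"b117_mut_{i}" for i in range(1, 7)]  # carries 6 of 17
    print(assign_lineage_of_concern(genome))
```

A genome carrying only four of the placeholder mutations would return `None` under this rule, which is why partial or low-coverage genomes may go unflagged.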
grinch, with custom Python modules that make use of geopandas v0.9, matplotlib v3.2 and seaborn v0.10, combines this information and produces reports with descriptive tables and figures that can be found at https://cov-lineages.org/global_report.html. All of the code underlying this daily lineage tracking web-report can be found at GitHub and Zenodo 12 .

grinch is a Python-based tool, the analysis pipeline of which is built on a snakemake backbone 13 . Every 24 hours a scheduled cron 14 task runs on our local servers. We download the latest data from GISAID and deduplicate based on sequence names. The sequences are assigned their most likely lineage using pangolin's latest version and model files. All processed metadata is available and maintained on the cov-lineages.org GitHub repository. To run grinch, the user must have access to a GISAID direct download key and a password, and must provide these within a configuration file. The command used to run grinch is grinch -i grinch_config.yaml, using the config file provided at doi:10.5281/zenodo.4640379 15 . Most users will not run grinch themselves; instead, all information and useful descriptive figures are provided daily on the web report. Users can navigate to cov-lineages.org in a web browser of choice to view the latest daily report.

As of 7th Jan 2021, 45 countries had reported the presence of B. (Figure 1a, b, c) 11 . Although some countries report increases in the relative frequency of B.1.1.7, genome sequencing efforts vary considerably. Potential targeting of sequencing towards travellers from the UK could bias frequency estimates upwards (Figure 1b, c), and differing genome sharing policies and delays may also skew reporting estimates. The time between the initial collection date of a new variant sample in a country and the first availability of a corresponding virus genome on GISAID was, on average, 12 days (range 1-71).
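The reporting lag summarised above (mean and range of days between sample collection and first availability on GISAID) could be computed along the following lines. This is a hedged sketch rather than the grinch code: the function names `reporting_lags` and `summarise_lags` are hypothetical, and the dates in the example are invented purely for illustration and are not taken from the study data.

```python
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def reporting_lags(records: Iterable[Tuple[date, date]]) -> List[int]:
    """Days between collection and first public availability, one value per
    (collection_date, availability_date) pair."""
    lags = []
    for collected, available in records:
        lag = (available - collected).days
        if lag < 0:
            # A genome cannot be shared before the sample is collected.
            raise ValueError(f"availability {available} precedes collection {collected}")
        lags.append(lag)
    return lags


def summarise_lags(lags: List[int]) -> Dict[str, float]:
    """Mean, minimum and maximum lag in days."""
    return {"mean": mean(lags), "min": min(lags), "max": max(lags)}


if __name__ == "__main__":
    # Invented example dates, one pair per country's first variant genome.
    example = [
        (date(2020, 12, 20), date(2020, 12, 21)),
        (date(2020, 12, 10), date(2020, 12, 22)),
        (date(2020, 12, 1), date(2020, 12, 24)),
    ]
    print(summarise_lags(reporting_lags(example)))
```

Rejecting negative lags is a design choice made here to surface metadata errors early; a production pipeline might instead log and skip such records.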
The number of B.1.1.7 and B.1.351/501Y.V2 genome sequences reported in each country is a consequence of (i) the intensity of local genomic surveillance; (ii) the level of concern about new variant introductions; (iii) the volume of international travel among affected countries; and (iv) the amount of local transmission following the introduction of the lineage from elsewhere. To explore these factors, we analysed the most recent available International Air Transport Association (IATA) travel data (October 2020). We collated the total number of origin-to-destination air journeys between major London international airports and each country. The calculation was repeated for journeys originating in all international South African airports. We focussed on London and South Africa as they are the locations with the first reports and highest reported prevalence of lineages B.1.1.7 and B.1.351 respectively 2,5 . However, due to low SARS-CoV-2 genomic surveillance in many locations, we cannot reject the hypothesis that these lineages originated elsewhere. Figure 1d shows destinations receiving >5,000 travellers in October 2020 from the UK (Figure 2 shows destinations receiving >300 travellers from South Africa). Of the countries that receive >5,000 travellers from London, 16

Our study has several limitations. The passenger flight data do not include recent changes to holiday travel, and recent restrictions on travel from the UK and South Africa are not reflected in the mobility data. Further, flight data may not accurately reflect the final destination if multiple tickets are purchased. The discovery and rapid spread of B.

This project contains the following extended data: - Supplementary materials with group authorship affiliations and full acknowledgements.

In the manuscript, the authors mention that users with access to GISAID direct download could run this pipeline locally, and generate their own reports.
Can this tool be adapted to display genomic results for tracking national spread, or even state-level spread of lineages?
○ If this pipeline is intended to be constantly executed locally by users, it would be helpful to provide more information about how to install and run the pipeline, including reference to example input and output files. I have tried to run the pipeline using my GISAID data provision credentials, but that was not successful, as I ran into errors for which I could not find a solution online (GitHub and Zenodo).
○ About the online reports, increasing the font size in the plots being displayed (bars, curves, etc.) would make labels and legends more intelligible and improve the readability of their content.
○ About the flight data, why are only flight counts from October shown? Are these data only used for tracking the potential spread in early stages of viral emergence, or do you see other uses for such data?
The colour gradient in the legend of Figure 1 is incomplete and does not go from 1 to 76. I think it must be just a formatting issue.
○ How was the "reported" cases shown in Figures 1 and 2 as well as the effect size and p-value of a suitable model. Following from the previous point, the explanation for the lack of a correlation with absolute numbers seems reasonable. But it still seems to me that flight numbers could correlate with the frequency of B.1.1.7 at a fixed time interval from the first detected case in a given locality (thus somewhat factoring out sequencing effort in the locality). Is it possible to add this analysis? Please add installation instructions to the GitHub repo.
Minor comments on https://cov-lineages.org/global_report.html: Figure 3 for each lineage is a map of sequence counts by region. I find the legend here completely baffling. Figure 2 (grey for no data, shades of green nicely spaced and annotated for different values of a continuous variable).
For the widespread lineages like B.1.1.7, there's a lot of overplotting on Figures 4 and 5, which makes the counts and the country names very difficult to read. This could be addressed by just making the figures larger. The table of links to news reports is absolutely wonderful. Would it be possible to include a button here to allow users to suggest additional news links? (I assume there's an existing mechanism for doing this, but I couldn't find one, so if not maybe just a link to a GitHub issue with (potentially) a pre-filled title and required information would help?)
Really minor comments about the manuscript: The first use of IATA (first para of the methods) is missing "International", i.e. it says "Using Air Transport…". The second use of IATA (second para of results) does not need to be spelled out. Figure 1A seems like it is missing a second Y axis for the number of GISAID genomes reported. In the PDF version and the HTML version it seems that new lines were added wherever there was a '>', e.g. '>5,000 travellers in October' and '>300 travellers from South Africa' both (erroneously?) start on new lines.
Is the description of the software tool technically sound? Yes

COVID-19 Genomics UK (COG-UK) consortium: An integrated national scale SARS-CoV-2 genomic surveillance network
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data. bioRxiv. 2021.
A genomics network established to respond rapidly to public health threats in South Africa
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa
Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK
Genomic Epidemiology of SARS-CoV-2 in Guangdong Province
Data, disease and diplomacy: GISAID's innovative contribution to global health
A: Accession IDs included in publication Tracking the international spread of SARS-CoV-2 lineages B.1.1.7 and B.1.351/501Y-V2
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
A: cov-lineages.org website
Sustainable data analysis with Snakemake
Using cron and crontab
A: grinch_config.yaml [Data set
Two-step strategy for the identification of SARS-CoV-2 variants co-occurring with spike deletion H69-V70
A cross-country database of COVID-19 testing
An interactive web-based dashboard to track COVID-19 in real time
A: Supplementary materials with group affiliations for Tracking the international spread of SARS-CoV-2 lineages B.1.1.7 and B.1.351/501Y-V2

An earlier version of this article can be found on Virological (url: https://virological.org/t/tracking-the-international-spread-of-sars-cov-2-lineages-b-1-1-7-and-b-1-351-501y-v2/592). We thank Norelle Sherry, Benjamin Howden and Michelle Sait for their contribution to sequencing in Australia. We also include full acknowledgements and details of group authorships at https://doi.org/10.5281/zenodo.4704471 19 . We would also like to extend our gratitude to everyone involved in the global sequencing effort.

The legend in Figure 2 refers to "B.1.1.7" sequences, while the figure shows "B.1.351" sequences. It must be a typo.
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes

The paper clearly describes the software and demonstrates its utility. I'd like to commend the authors for putting this tool and the associated website together so quickly, for maintaining both to a very high standard, for making sure that all of the work is open and reproducible, and for the huge amount of work and enormous collaborative effort that has gone into this clear and concise report. I have no serious reservations about the software tool or the data, analyses, or conclusions presented in the manuscript. The software is clear, open-source, sufficiently documented, and almost all of the proposed utility is presented on a clear and regularly updated website. The manuscript is clearly written, well researched, concise, and the conclusions are well justified by the analyses. Of course, I do have a few comments, some of which I hope might be useful in improving the paper and/or the website.

Minor comments on the manuscript: I felt there was some tension in this article about whether it's a software note or a public health report. The title suggests the latter, but much of the article (and the article type of "Software Tool Article") suggests the former. Most of this tension for me as a reader came from looking at the title, which has no mention of software, so I think sets up expectations that differ from what is then provided (quite reasonably) in the paper. A very simple way to address this would be to start the title with "Using grinch to track…" or to end it with "… using grinch". Similar to point 1, the abstract doesn't actually mention 'grinch' or https://cov-lineages.org/global_report.html.
It would seem clearer to me to incorporate in the abstract the framing that this article presents a generally applicable software tool, demonstrated on two lineages of concern. I would like to see some mention of related efforts somewhere in the report. A full detailed comparison is neither warranted nor useful here because all such websites can and should change regularly, but a couple of sentences comparing cov-lineages.org to sites like outbreak.info and covariants.org would be very useful. At a minimum, it seems useful to list the similar sites the authors are aware of, if only because the fact one can see similar patterns presented on those sites serves as a useful validation of the software presented in this paper. Given the situation, this is a desirable, not a requirement, but I'd love to see some unit tests on the GitHub repo. It seems potentially important to have this when the intention is to produce daily updates for public health. (Though I note that getting the same end result from completely independent implementations on other sites is probably worth more than a lot of unit tests). I struggled with Figure 1D. It wasn't clear to me what 'reported' and 'not reported' mean. And the legend makes it really hard to figure out how colours map to counts. It's stated that there is no correlation between the numbers of sequences and flight numbers. It would be nice to see the scatter plot for this (maybe as an inset to Figure 1D?),

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: I am a paid consultant to GISAID, the database on which much of the data analysed in this article is hosted.
Reviewer Expertise: Phylogenetics, molecular evolution, bioinformatics.
I have a passing familiarity with SARS-CoV-2 data analysis.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.