key: cord-0266593-nmvcgak7
authors: Warr, Amanda; Newman, Caitlin; Craig, Nicky; Vendelė, Ingrida; Pilare, Rizalee; Cruz, Lilet Cariazo; Barangan, Twinkle Galase; Morales, Reildrin G; Opriessnig, Tanja; Venturina, Virginia Mauro; Mananggit, Milagros R; Lycett, Samantha; Domingo, Clarissa YJ; Tait-Burkard, Christine
title: No part gets left behind: Tiled nanopore sequencing of whole ASFV genomes stitched together using Lilo
date: 2021-12-03
journal: bioRxiv
DOI: 10.1101/2021.12.01.470769
sha: 94468ccfbff64451075e9ffa63ba4d03f3a8fdec
doc_id: 266593
cord_uid: nmvcgak7

African Swine Fever virus (ASFV) is the causative agent of a deadly, panzootic disease infecting wild and domesticated suid populations. Long confined to the African continent, the virus began to spread around the globe after an outbreak of a particularly infectious variant in Georgia in 2007, severely impacting pork production and local economies. The virus is highly contagious and has a mortality of up to 100% in domestic pigs. It is critical to track the spread of the virus, detect variants associated with pathology, and implement biosecurity measures in the most effective way to limit its spread. Due to its size and other limitations, this large DNA virus (170-190kbp) has not been well sequenced, with fewer than 200 genome sequences available in public repositories. Here we present an efficient, low-cost method of sequencing ASFV at scale. The method uses tiled PCR amplification of the virus to achieve greater coverage, multiplexability and accuracy on a portable sequencer than is achievable using shotgun sequencing. We also present Lilo, a pipeline for assembling tiled amplicon data from viral or microbial genomes without relying on polishing against a reference, allowing assembly of structural variation and hypervariable regions on which other methods fail.
The resulting ASFV genomes are near complete, lacking only parts of the highly repetitive 3'- and 5'-telomeric regions, and have a high level of accuracy. Our results will allow sequencing of ASFV at optimal efficiency and high throughput to monitor and act on the spread of the virus.

[...] course the COVID-19 pandemic. Furthermore, multiplexing and the washing and reuse of the most expensive component of sequencing, the flow cells, allow for cheaper sequencing than other methods. Finally, the sequencer can produce very long reads, which improves assembly potential, particularly for highly repetitive genomes.

Whilst it is possible to obtain whole genome sequences of ASFV directly from DNA extracted from blood and tissue, the high prevalence of pig DNA and the need for baits or other methods to enrich ASFV DNA render that method unsuitable for fast, high-throughput sequencing.
Here, we present a method to sequence near-complete ASFV genomes, excluding only the highly repetitive, variable-length telomeric 3' and 5' regions, on ONT's MinION sequencing device using a tiled amplicon approach. The genome is amplified in 32 large fragments, 7kb in length, amplified simultaneously in two PCR pools. We propose this method as an efficient, highly adaptable, more accurate, fast, and cost-effective option for sequencing of continuing ASFV outbreaks as well as historic samples. We present 10 complete ASFV genome assemblies from samples from the early stages of the ASFV outbreak in the Philippines in 2019, assembled either with the tiled sequencing approach or a whole genome shotgun sequencing approach. The portability of Nanopore sequencing makes it ideal for exploring the dynamics of ASFV infections as outbreaks emerge. As ASFV continues to spread around the world, efficient methods of sequencing the genome are essential to improve our understanding of the virus and its ongoing global spread. Our primer sets have been optimized for relatively even coverage and have been designed to bind outside of hypervariable regions. They anneal to only roughly 0.8% of the genome and are designed to be well suited to the current outbreak, able to at least partially sequence other genotypes, and easily modifiable should the virus mutate.

Finally, we present the Lilo pipeline. While pipelines exist to assemble genomes from tiled amplicons, they rely on aligning reads to a reference and using polishing tools to generate a consensus from the reads. This method works well for producing a genome sequence with SNPs representative of the sequenced genome; however, large indels, structural variants, and hypervariable regions that may be difficult to align to a reference are not accurately represented.
For ASFV, whole genes can be inserted or deleted and, due to homologous recombination, the genome can carry large structural variations, with indels likely being more important than SNPs in creating viral diversity 18. Therefore, we designed Lilo, which aligns reads to a reference in order to assign them to an amplicon, selects the read with the highest base quality and of the expected length for each amplicon, polishes that read with the remaining reads, removes primers, and stitches the polished amplicons together at overlaps, ordered and oriented by a reference. This approach makes the pipeline more adaptable to large structural variation and hypervariable regions in genomes than currently available methods.

Shotgun sequencing of ASFV directly from blood

In-field sequencing, particularly in developing countries, limits the availability of tools and reagents. During the first outbreaks in the Philippines, whole DNA was isolated from the highly hemolysed blood collected from ASFV-positive pigs. Samples were digested overnight with proteinase K at 55°C prior to phenol/chloroform/isoamyl alcohol extraction and precipitation with isopropanol before washing with 70% ethanol. Whole DNA samples were prepared for sequencing using the ligation sequencing kit (LSK) LSK109 before sequencing on an R9.4 flow cell using a MinION Mk1B. The data were basecalled and demultiplexed using Guppy (ONT), and the reads were assembled with Flye and polished with medaka (Figure 1A).

The time between the beginning of sequencing and detection of the first ASFV read from whole blood ranged from 19 seconds to 3 minutes. As seen in the example of PHL-1969 (Figure 1B), the percentage of reads that came from ASFV ranged from 0.006% to 0.24%, likely dependent on the viral titers of the animals culled.
ASFV reads show a similar size distribution to other DNA found in the samples; if anything, a second small peak of larger fragments can be observed (Figure 1C). All four sequenced blood samples assembled into a whole genome; however, due to variable coverage, the number of mismatches and indels found in some of the samples was high (Figure 3B).

Tiled amplicon sequencing of ASFV

Given the low yield of ASFV sequences from shotgun sequencing, as demonstrated by us and others 19-21, and the high expense per sample, this sequencing approach was not fit for purpose for high-throughput screening of an ongoing virus outbreak. Therefore, we developed a method to amplify, sequence, and assemble ASFV genomes from pigs.

In order to enrich ASFV from the sample easily, a PCR amplification approach was chosen, due to its ease of use and because the required tools are usually readily available in many countries and labs. Tiling primers were designed with primal scheme, targeting a 7kb amplicon length and 1kb amplicon overlap, from a set of 26 ASFV reference sequences (Figure 2A). The primers are well suited to genotype II, from the current outbreak, but also cover the majority of the genome for at least genotypes I and IV (Figure 2B). This relatively long amplicon size was chosen to reduce the number of primer pairs but also to span potential hypervariable regions. After initial individual performance tests, several primers were redesigned from the original set produced by primal scheme; however, the majority of them worked well from the beginning. Fragments were amplified using the PCRBio VeriFi Hot Start high fidelity polymerase according to the manufacturer's instructions. Following redesign, all primers amplified their targets; however, they did so at different efficiencies, leading to uneven coverage over the genome.
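As a rough illustration of the tiling design described above (amplicons of about 7kb overlapping by about 1kb, split into two alternating pools), the following Python sketch lays out idealized tile coordinates. It is not primal scheme and does not choose real primer sites; the function name `tile_genome`, its parameters, and the 190kb example genome length are assumptions made purely for illustration.

```python
def tile_genome(genome_len, amplicon_len=7000, overlap=1000):
    """Return idealized (start, end, pool) tiles covering a linear genome.

    Coordinates are 0-based and end-exclusive. Odd-numbered amplicons are
    assigned to pool 1 and even-numbered amplicons to pool 2, so that
    neighbouring (overlapping) amplicons never share a PCR reaction.
    This is a simplification: real schemes place primers where the
    reference alignment allows, so tile sizes and positions vary.
    """
    if overlap >= amplicon_len:
        raise ValueError("overlap must be smaller than the amplicon length")
    step = amplicon_len - overlap
    tiles = []
    start, number = 0, 1
    while True:
        end = min(start + amplicon_len, genome_len)
        pool = 1 if number % 2 == 1 else 2
        tiles.append((start, end, pool))
        if end == genome_len:
            break
        start += step
        number += 1
    return tiles


# Hypothetical 190 kb genome (assumed length, upper end of the ASFV range).
tiles = tile_genome(190_000)
print(len(tiles), "amplicons")
for start, end, pool in tiles[:3]:
    print(f"pool {pool}: {start}-{end}")
```

For the assumed 190kb genome this idealized layout gives 32 tiles, which is in line with the 32 amplicons of the scheme described here, although the real primer positions and amplicon lengths differ (for example, the first amplicon is about 4kb and the telomeric ends are excluded).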
To assess this, evenly concentrated pools of primers (pool 1 and pool 2, Figure 2A and Figure 2C) were used to amplify blood DNA extract samples from ASFV-infected pigs. Following initial amplifications, the primers were split into three reactions: primer pair 1, which produces a shorter 4kb fragment and continuously outperformed the others in a mixed reaction, was run on its own, and primer concentrations in pool 1 (excluding pair 1) and pool 2 were gradually adjusted according to their performance. PCR products were combined per sample, and libraries were prepared using the LSK109 kit and sequenced on an R9.4 flow cell. Figure 2D demonstrates the improvement that can be gained by tweaking primer concentrations from evenly represented primer pairs (purple) to optimized primer concentrations (green). These optimizations improve performance when multiplexing multiple samples on one flow cell.

Fresher samples amplify more cleanly, but older, degraded samples will still amplify sufficiently. To show this, we highlight two samples: PHL-126, which has been heavily used and is degraded, and PHL-261, which has been used less frequently (aliquot stored in freezer without frequent use) and is of better quality. As can be seen in the automated electrophoresis result from a TapeStation (Figure 2E), PHL-126 shows poor amplification and relatively many amplicons <7kb, although good amplification can still be seen for the shorter amplicon of pair 1. PHL-261, on the other hand, shows continued good amplification of the desired 7kb and 4kb products of pool 1 (odd), pool 2 (even) and pair 1, respectively. These samples were prepared with the LSK109 kit, multiplexed using native barcoding, and run on an R9.4 flow cell with 3 other ASFV genomes, pooled in representative quantities. The more poorly amplified PHL-126 had lower sequencing throughput than the better quality PHL-261, but was still assembled into a near-complete genome.
Figure 2F shows sequencing coverage of the same two samples and the proportion of total reads for each that was assigned to each of the 32 amplicons.

The post-amplification DNA Integrity Number (DIN) can be used to help predict multiplexability, as samples of different quality will affect the throughput needed. Figure 2G shows the relationship between the post-amplification DIN and throughput <3kb, and Figure 2H demonstrates the number of gaps for different throughputs of reads >3kb. Samples PHL-126 (orange) and PHL-261 (blue) have been highlighted in pale (super accuracy base calling (SAC)) and strong (high accuracy base calling (HAC)) colors, respectively. Figure 2H [...]

Comparing ASFV genomes, we found major variation of the genome often originating from indels. Available assembly pipelines struggled with such variation when it did not correspond to the reference sequence. Therefore, we developed the Lilo pipeline (Figure 3A) to assemble the tiled amplicons (https://github.com/amandawarr/Lilo). Whilst Lilo uses a reference alignment to sort the amplicons, it polishes against the highest quality reads rather than a reference sequence. Using this pipeline, highly accurate genomes were obtained, with mismatch accuracy approaching Q50 when using SAC (Figure 2I) and indel accuracy up to Q40 when compared to a closely related publicly available ASFV genome assembly (MN715134.1) 21, which may in truth still be quite divergent from these samples.

QUAST 22 (v5.0.2; quality assessment tool for genome assemblies) results demonstrate that the increased coverage of the tiled amplicons produced a more accurate assembly than shotgun sequencing of the virus using a whole flow cell to sequence directly from extracted DNA.
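To make the Q-scores quoted above concrete, the sketch below converts between an error count and a Phred-scaled accuracy using the standard definition Q = -10 log10(error rate). The function names and the 190kb genome length are illustrative assumptions; this is not how QUAST or Lilo report their results.

```python
import math


def phred_from_errors(errors, aligned_bases):
    """Phred-scaled accuracy for a number of errors over aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    if errors == 0:
        return math.inf  # no observed errors: accuracy is unbounded
    return -10 * math.log10(errors / aligned_bases)


def expected_errors(q, genome_len):
    """Expected number of errors in a genome of genome_len at Phred score q."""
    return genome_len * 10 ** (-q / 10)


# Assumed genome length of 190 kb, for illustration only.
for q in (30, 40, 50):
    print(f"Q{q}: about {expected_errors(q, 190_000):.1f} errors per 190 kb")
```

Under these assumptions, Q40 corresponds to roughly 19 errors across a 190kb genome and Q50 to roughly 2, which gives a sense of scale for the mismatch and indel accuracies reported here.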
Shotgun sequencing, however, was able to highlight some samples with longer telomeric regions, such as PHL-237, which is a clear advantage of long-read sequencing technology and something that should be explored for more detailed investigations into the role of the ASFV telomeric regions. Overall, SAC produced fewer mismatches and indels than HAC and should be the preferred method; however, the time for base calling is a trade-off. Samples with high percentages of unassigned bases (N's) clearly correspond to DIN values (Figure 3B).

The assembled genomes had excellent agreement on genome structure with the same samples assembled from shotgun sequencing (Figures 3B & 3C [...]

We assessed the quality of Lilo assemblies against those produced with the ARTIC pipeline (v1.2.1). A selection of the ASFV sequencing data were assembled using the ARTIC pipeline, as well as using Lilo, both using the assembled shotgun sequence PHL-1969 as a reference.

QUAST analysis shows lower numbers of mismatches against the closest reference (MN715134.1) but higher numbers of indels. The percentage of unassigned bases is much higher for ARTIC, at around 2.4%, whereas Lilo is at 0% or nearly 0% (Figure 3B).

Comparing Lilo-assembled genomes and ARTIC-assembled genomes to a reference (MN715134.1), a number of indels can be observed. Figure 4A shows a likely real indel in the PHL ASFV samples, which all assemblies agree on and which is well supported by the reads. In contrast, Figure 4B shows the only indel that is unique to almost all of the assemblies produced by the Lilo pipeline while being absent from all ARTIC assemblies; it occurs in a homopolymer. Most reads appear to support the deletion assembled by Lilo; whether this is a real sequence or a result of the poor accuracy of Nanopore sequencing in homopolymeric regions is a more difficult question.
Figure 4C shows an extreme example of a very long homopolymeric region. ASFV has several of these, and typically neither assembly method agrees on the length of the homopolymer, with the reads lending no strong support to either assembly. While errors from the Lilo pipeline tended to be randomly dispersed among homopolymers, ARTIC errors tended to be more systematic, appearing consistently across the assembled genomes. Frequently, homopolymers lead to the ARTIC pipeline replacing the base immediately before the homopolymer and the first base of the homopolymer with a pair of N's, as can be seen in Figure 4E [...]

As observed in Figure 5A, p72 genotypes do not correspond to the clustering. For example, the E75 strain Spain 1975 isolate, an early genotype II, groups with the genotype I isolates. Unfortunately, the phylogeny contains many gaps and lacks both temporal and geographic resolution, showing that much more sampling is required. PHL samples clearly cluster within the highly virulent, novel p72 genotype II cluster.

Resolving the tree further by selecting only those genomes clustering with the novel p72 genotype II, two distinct clusters of PHL sequences can be observed. Whilst, due to the similarity of the genomes, the orders of lower branches are of lower confidence than those of higher branches, reanalysis still suggests two different introductions into the Philippines (internal branch lengths may be found in supplementary documents S1 and S2).

As demonstrated by us and others, ASFV sequences can be obtained by direct sequencing from blood or other tissue samples of infected pigs 19-21. The resulting sequence includes interesting information on the lengths and repeats found in the telomeric regions, which may be helpful for more in-depth investigation into the virus pathology and spread.
However, without enrichment for ASFV 17 or depletion of host-methylated DNA 23, the proportion of ASFV DNA is low relative to host DNA in the samples, meaning that obtaining sufficient ASFV reads to assemble the genome from shotgun sequencing usually requires an entire MinION flow cell, or more, depending on viral titer and original sample type. Bone marrow or blood will likely yield the best virus:host ratio; spleen or muscle, whilst good sources of viral DNA 24, also contain a large number of nucleated host cells. Even if sufficient data are obtained to assemble the genome, the coverage is likely too poor to sufficiently polish the genome. In contrast, the tiled amplicon method can be used on samples with lower viral titers or degraded DNA, selectively sequences the virus, and can be multiplexed on a flow cell to simultaneously sequence multiple samples at high enough coverage for good polishing. Especially in countries where ASFV is circulating in wild boar or feral pigs, samples may be collected from infected animals that have been dead for a prolonged period of time. It is important that the method is capable of amplifying virus from both high- and low-quality samples. Figure 2E demonstrates the variability of DNA integrity post-amplification and that even poor, degraded samples amplify and produce near-complete genome assemblies.

Overall, the PCR amplification method increases coverage, is less prone to exhaust flow cells quickly, allows for multiplexing, and consequently reduces costs, improves genome accuracy, and removes the need for specialized enrichment or depletion methods.
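A back-of-the-envelope calculation illustrates why the shotgun approach discussed above can need an entire flow cell per sample. The sketch below estimates mean ASFV coverage from the total yield of a run and the fraction of sequenced bases that are viral. The 10 Gb yield and the 190kb genome length are hypothetical inputs, and the reported fraction of ASFV reads (0.006% to 0.24%) is used as a stand-in for the fraction of bases, which is only a reasonable approximation because ASFV and host reads showed similar length distributions.

```python
def mean_coverage(total_bases, viral_fraction, genome_len):
    """Estimate mean depth of coverage over the viral genome.

    total_bases    -- total bases sequenced in the run
    viral_fraction -- fraction of those bases that come from the virus (0-1)
    genome_len     -- length of the viral genome in bases
    """
    if not 0 <= viral_fraction <= 1:
        raise ValueError("viral_fraction must be between 0 and 1")
    if genome_len <= 0:
        raise ValueError("genome_len must be positive")
    return total_bases * viral_fraction / genome_len


run_yield = 10e9      # hypothetical yield of one flow cell, in bases
genome_len = 190_000  # assumed ASFV genome length

for percent in (0.006, 0.24):  # range of ASFV read percentages reported above
    depth = mean_coverage(run_yield, percent / 100, genome_len)
    print(f"{percent}% ASFV -> roughly {depth:.0f}x mean coverage")
```

With these assumed inputs, a whole flow cell devoted to one sample would give somewhere between roughly 3x and 126x mean coverage, whereas amplification concentrates nearly all of the yield on the virus and leaves room to multiplex several samples.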
Whilst ~7kb amplicons are very large compared to comparable methods for other viruses, the size of the ASFV genome, the stability of DNA, the relatively low number of primer pairs, and the advantage that long reads detect recombinants more easily make this the best approach. Especially with the small, medium, and large indels that can occur in ASFV 18, it is important to get good resolution across these regions, which can be achieved easily with large amplicons. It is important, though, to choose the right high accuracy polymerase, capable of amplifying such long amplicons. We found PCRBio VeriFi to be highly capable of this, with the hot start version producing very few non-specific products, whilst the non-hot start version can produce more non-specific product, which may be an advantage for variant testing. Figures 2E, 2F and 4B show that even low-quality samples can produce whole genome assemblies with few gaps. However, a limitation of the large tiled amplicon method is that, should a variant occur at the site of a primer, the amplification of a relatively large section of genome will fail. While this is an inconvenience, it will be simple to redesign a primer to replace the failed one or to act as an alternate primer. It is also possible to amplify across a larger region using the existing primers either side of the failed one, generating a 14kb product, to sequence a larger region and design a primer from the sequenced amplicon. This was found to be possible using the VeriFi HS polymerase and allows the method to adapt as the virus changes.

Whilst Nanopore sequencing methods provide a lot of advantages, such as sequencing on site, portability, and accessibility to less specialist communities, there are, as for any sequencing method, drawbacks.
As demonstrated in Figure [...] accurate reference can produce assemblies that are very accurate; however, these methods do not handle structural variants and hypervariable regions well. While the genomes sequenced here do not have any major indels compared to the reference used, diversity in ASFV is partially driven by small, medium and large indels 18, and increased sequencing of samples is likely to reveal more of them. While errors from the Lilo pipeline tended to be randomly dispersed among homopolymers, ARTIC errors tended to be more systematic, appearing consistently across the assembled genomes. Errors occurring in the same position between genomes may be more likely to impact phylogenomic analysis than relatively random errors. The only consistent indel error found across the majority of the Lilo-assembled genomes that was always absent in the ARTIC genomes is shown in Figure 4B. This region contains a homopolymer, which is typically difficult to correct from Nanopore sequencing data; however, while the ARTIC assembly agrees more closely with the reference, the reads support the deletion found in the Lilo assemblies well. When carrying out multi-sequence alignments between whole ASFV genome sequences, even those constructed from reads from a higher accuracy sequencing technology, it is not unusual to find large homopolymers of variable length, and it is unclear to what degree these are limitations of sequencing technologies as opposed to real variation.

The Lilo pipeline also has some limitations. It currently assumes that any structural variants will not change the length of any given amplicon by more than 5%, and it assumes that structural variants will not be dramatic enough to prevent alignment to the reference for the purposes of assigning reads to amplicons and ordering and orienting the polished amplicons.
Lilo also assumes the reads will be the full length of the amplicon, making it incompatible with ONT rapid kits that utilize transposases. However, not relying on polishing reads aligned to a reference is a strength for genomes where structural variation is expected to be important, and for species with hypervariable regions which may not align and polish well with a reference. The pipeline has been tested on tiled sequences from ASFV, Porcine [...] Africa sequenced, and these should be sequenced to understand the evolution of ASFV, particularly the loss of its dependence on the sylvatic cycle. Given the slow mutational rate of the virus, sequencing individual genes is unlikely to be informative, and so to have a chance of seeing variants in the virus the whole genome must be sequenced. The ability to amplify the genotypes with our current scheme decreases with distance from genotype II, and additional primers will need to be designed in the future to improve coverage over other genotypes; however, current coverage using this primer scheme is still likely to be of more use than the p72 gene alone. Coverage gaps can be resolved relatively easily, as larger amplicons can be generated with flanking primers. Should primers on older or emerging samples fail, the altered region can be amplified using primers from either side of the failed amplicon, spanning the region, and the sequenced amplicon can be used to design new primers for the region.

We have presented an efficient, low-cost method for sequencing and assembling ASFV which can be carried out in the lab or in the field during outbreaks. The Lilo pipeline is a lightweight pipeline that can be run on a standard laptop with 16GB RAM and no internet connection, making it ideal for in-field bioinformatic analysis of ASFV and other viruses.
Samples

Blood samples from outbreaks in central Luzon (Philippines) were collected following depopulation of pigs within a defined containment radius. Blood samples were tested for ASFV by PCR. Blood samples from ASFV-positive pigs were pooled in equal amounts by farm before further processing.

Blood samples were spun for 20min at 3,000rcf before decanting the supernatant. 5xTEN buffer (0.05M EDTA, 0.5M NaCl, 20mg/ml Proteinase K, 20% SDS, in 0.05M Tris-HCl, pH8.0) was added to a 1x final concentration before incubation overnight at 55°C in a shaking water bath. An equal volume of phenol was added and gently mixed. Following 20min centrifugation at 3,000rcf, the aqueous phase was transferred to a fresh tube. If the phase was very viscous, the phenol phase was re-extracted to improve yields. An equal volume of phenol/chloroform/isoamyl alcohol (25:24:1) was added to the aqueous phase before mixing and separation by centrifugation (10min, 3,000rcf). The aqueous phase was transferred to a fresh tube before addition of 1:10 3M sodium acetate and an equal amount of isopropanol. Following 1h incubation at -20°C, samples were spun for 10min at 16,000rcf before washing the pellet with 70% ethanol. The pellet was dried and resuspended in nuclease-free water. [...]

Once the complete set of primers had been successfully designed to cover the complete genome, the primers were pooled in equal amounts into two pools of non-overlapping primers. These pools were tested using the same conditions as the individual PCRs, but in a 50µl reaction volume and using 1µM of the primer pool. The resulting PCR products were cleaned using 0.4× volume AMPure XP beads (Beckman Coulter, Indianapolis, IN, USA) to remove products smaller than approximately 2kb in length, then pooled equally prior to sequencing.
The cleaned PCR products were quantified using a Qubit dsDNA BR assay (Invitrogen, Waltham, MA, USA) and combined in equimolar amounts to a total of 700ng for library preparation according to the Native barcoding genomic DNA (with EXP-NBD104, EXP-NBD114, and SQK-LSK109) Nanopore protocol.

Following bioinformatic analysis of sequencing data, primers which were found to be over- or under-performing were either redesigned or their contribution to the pool was adjusted accordingly, and the new primer pool was tested as above in an iterative fashion. Ultimately, 2 non-overlapping pools and a separate reaction for primer pair 1 were used to obtain the most even coverage; these were processed as above and pooled proportionally to the number of amplicons in each pool prior to sequencing. Additionally, the polymerase was swapped from Phusion to VeriFi (PCRBIO) in a 25µl reaction using 2µl DNA per reaction; VeriFi has markedly better performance on the amplicons, with far less off-target amplification. The PCR conditions for this polymerase were an initial denaturation of 1 minute at 98°C, followed by 40 cycles of 15 seconds at 98°C, 15 seconds annealing at 60°C, and 4 minutes and 40 seconds extension at 72°C, followed by a final extension for 5 minutes at 72°C. AMPure XP bead cleanup after PCR is optional, but recommended for samples with low DIN. Primer sequences, recommended primer concentrations and recommended pooling quantities are described in supplementary table S1, and any updates to these will be released on Lilo's GitHub page.

Samples were sequenced following Nanopore's SQK-LSK109 or SQK-LSK110 protocol on a MinION Mk1B or Mk1C. The protocol was started with 1µg of pooled amplicons as measured on a Qubit using broad range reagents. For samples using multiplexing, the native barcoding expansion kit from Nanopore was used following Nanopore's instructions when using SQK-LSK109.
To use the barcodes with SQK-LSK110, the instructions for SQK-LSK109 were followed until after the barcodes had been ligated on, at which stage the end prep was repeated and the standard protocol for library prep with SQK-LSK110 was followed from after the end prep step.

Bioinformatic processing of ASFV genomes sequenced with shotgun sequencing

The data were basecalled and demultiplexed using MinKNOW (v19.06.8; ONT) using "fast" basecalling. Following basecalling, the reads were aligned to an ASFV genome using minimap2 to identify ASFV reads; the fast5s for these reads were extracted using fast5_subset from the ont_fast5_api (https://github.com/nanoporetech/ont_fast5_api) and these were basecalled again using high accuracy basecalling. This was done to reduce basecalling time, as this work was done locally in the field on a laptop without a GPU. The reads were assembled with Flye (v2.6) 30 and polished 3 times with Medaka (v0.7.1; ONT). Comparisons of the quantity of data produced and the proportion of which were ASFV reads were done using NanoComp (v1.28.1) 31.

The data were basecalled and demultiplexed using Guppy (v5.0.14; ONT) using the high or super accuracy model on a GPU. The snakemake pipeline, Lilo (https://github.com/amandawarr/Lilo), was developed and as summarised in Figure 3A [...]

3. Select reads of the expected amplicon length (+/-5%) and subset to 300X
4. Select the read with highest average base quality within +/-1% of the median length of reads for the amplicon to be the "reference" (with bioawk v1; https://github.com/lh3/bioawk), remove any amplicons with fewer than 40 reads. Targeting the median length allows for flexibility for large insertions or deletions.
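The read selection in steps 3 and 4 above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Lilo's actual implementation (which is a snakemake pipeline using tools such as bioawk): reads are represented as hypothetical `(name, sequence, qualities)` tuples already assigned to one amplicon, "subset to 300X" is approximated by keeping at most the first 300 length-filtered reads, and "average base quality" is taken as the arithmetic mean of the per-base Phred scores.

```python
from statistics import median


def mean_quality(quals):
    """Arithmetic mean of per-base Phred scores (an assumed definition)."""
    return sum(quals) / len(quals)


def select_reference_read(reads, expected_len, length_tol=0.05,
                          median_tol=0.01, max_reads=300, min_reads=40):
    """Pick the read to polish for one amplicon, plus the reads to polish with.

    reads is a list of (name, sequence, qualities) tuples for one amplicon.
    Returns (best_read, other_reads), or (None, []) if the amplicon is dropped.
    """
    # Step 3: keep reads within +/-5% of the expected length, capped at 300.
    low = expected_len * (1 - length_tol)
    high = expected_len * (1 + length_tol)
    kept = [r for r in reads if low <= len(r[1]) <= high][:max_reads]

    # Amplicons with fewer than 40 usable reads are removed.
    if len(kept) < min_reads:
        return None, []

    # Step 4: among reads within +/-1% of the median length, take the one
    # with the highest mean base quality as the "reference" read.
    med = median(len(r[1]) for r in kept)
    candidates = [r for r in kept if abs(len(r[1]) - med) <= median_tol * med]
    if not candidates:
        return None, []
    best = max(candidates, key=lambda r: mean_quality(r[2]))
    others = [r for r in kept if r is not best]
    return best, others
```

A toy call with made-up reads might look like `best, others = select_reference_read(reads, expected_len=7000)`, after which `best` would be polished with `others`. Anchoring the choice on the median observed length rather than the expected length is what lets a consistent insertion or deletion within the 5% window carry through to the selected read.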
The required inputs to Lilo are demultiplexed reads in fastq format in a directory named "raw/", a reference fasta, a bed file of primer alignments (as output by primal scheme), a csv of primer sequences (if there are ambiguous bases it is advised to expand them first), and a config file, described on the GitHub page. It is adaptable to any species (with a single genome fragment/chromosome) with any tiled primer scheme. The pipeline outputs a fasta file containing the assembled genome.

A subset of genomes were also assembled using the ARTIC pipeline (https://artic.network/ncov-2019; v1.2.1) following the bioinformatics SOP using the medaka method.

Quality control of assembled genomes

QUAST (v5.0.2) was used to compare the assembled genomes to the most closely related publicly available ASFV assembly according to BLAST alignment (MN715134.1) 21. Samples where both WGS and tiled sequencing were used were compared for overall structure using nucmer (v4.0.0beta2) 36.

Phylogeny

The phylogeny analysis was limited to the tiled genomes, as these were the most accurate assemblies, and publicly available genomes. These were aligned using Mafft (v7.467) 37 [...]

A) DNA was extracted from blood and sequenced with Nanopore's LSK109 on a MinION Mk1B before analysis and assembly using Flye and polishing with medaka. B) Read length histograms for the dataset demonstrating total throughput (blue) and ASFV reads throughput (orange). C) Normalized counts by dataset of reads for total throughput (blue) and ASFV reads throughput (orange). D) Throughput over time for total read count (blue) and ASFV read count (orange).
1. African swine fever - A review of current knowledge
2. An Update on the Epidemiology and Pathology of African Swine Fever
3. Review of the sylvatic cycle of African swine fever in sub-Saharan Africa and the Indian ocean
4. Infectious Dose of African Swine Fever Virus When Consumed Naturally in Liquid or Feed
5. On A Form of Swine Fever Occurring in British East Africa (Kenya Colony)
6. Economic consequences of African swine fever
7. Current status of African swine fever
8. African Swine Fever spread across Asia
9. African Swine Fever Virus: An Emerging DNA Arbovirus
10. Joining the club: First detection of African swine fever in wild boar in Germany
11. Genotyping field strains of African swine fever virus by partial p72 gene characterisation
12. Genetic characterization of African swine fever virus isolates from soft ticks at the wildlife/domestic [...] Mozambique and identification of a novel genotype
13. Genetic characterisation of African swine fever viruses from outbreaks in southern Africa (1973-1999)
14. Identification of a New Genotype of African Swine Fever Virus in Domestic Pigs from Ethiopia
15. Genetic and antigenic diversity of African swine fever virus
16. The Genetics of Life and Death: Virus-Host Interactions Underpinning Resistance to African Swine Fever, a Viral Hemorrhagic Disease
17. African swine fever whole-genome sequencing-Quantity wanted but quality needed
18. Homologous recombination shapes the genetic diversity of African swine fever viruses
19. Nanopore sequencing of African swine fever virus
20. Rapid Sequence-Based Characterization of African Swine Fever Virus by Use of the Oxford Nanopore MinION Sequence Sensing Device and a Companion Analysis Software Tool
21. Short and Long-Read Sequencing Survey of the Dynamic Transcriptomes of African Swine Fever Virus and the Host Cells
22. QUAST: quality assessment tool for genome assemblies
23. Rapid Sequence-Based Characterization of African Swine Fever Virus by Use of the Oxford Nanopore MinION Sequence Sensing Device and a Companion Analysis Software Tool
24. [...] Swine Fever Virus in Carcasses of Domestic Pigs and Wild Boar Experimentally Infected with the ASFV "Estonia [...]
25. Attenuated and non-haemadsorbing (non-HAD) genotype II African swine fever virus (ASFV) isolated in Europe
26. Genetic variation among African swine fever genotype II viruses, eastern and central Europe
27. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples
28. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega
29. Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction
30. Assembly of long, error-prone reads using repeat graphs
31. NanoPack: visualizing and processing long-read sequencing data
32. Minimap2: pairwise alignment for nucleotide sequences
33. The Sequence Alignment/Map format and SAMtools
34. BEDTools: a flexible suite of utilities for comparing genomic features
35. Combining de novo and reference-guided assembly with scaffold_builder
36. MUMmer4: A fast and versatile genome alignment system
37. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research
38. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era

A) Design of the tiled primer scheme for ASFV with ~7kb amplicons and ~1kb overlaps. [...] Predicted primer binding in correct region for representative ASFV genotypes. C) Workflow with extraction from blood, PCR amplification of primer pools, pooling, sequencing and bioinformatic analysis. D) Coverage of one sample (PHL-3142) amplified with either evenly [...] Coverage of amplicons using optimized primer concentrations for the two samples from figure 1E.
G) Impact of post-amplification DIN on proportion of reads <3kb in 709 length, essentially wasted sequencing capacity, using R9.4 flow cells and LSK-109 710 (magenta) or R10.4 and LSK-110 (blue) PHL-261 in blue. I) Assembly accuracy based on proportion of mismatches against 716 reference (MN715134.1), with lines showing Q40 and Q50 PHRED scores. J) Assembly 717 accuracy based on proportion of indels against reference (MN715134.1), with lines showing 718 Q30 and Q40 PHRED scores A) Directed acyclic graph showing the steps the Lilo pipeline takes during assembly. The 721 graph has been simplified to show assembly of a genome containing 2 amplicons 722 (amplicon_01 and amplicon_n) for a single sample. B) Quast results for genomes 723 sequenced with tiled amplicons or from shotgun sequencing on R9 PHL-10 and PHL-30 were sequenced with an earlier version of the primer 725 scheme with one primer different and are expected to have a 25bp gap C) Nucmer 726 alignment of PHL-261 genomes assembled from WGS or Lilo tiled assembly. D) Nucmer 727 alignment of PHL-237 genomes assembled from WGS or B) IGV image showing alignment of genomes assembled with Lilo or Artic (top) and 729 assemblies and reads for a single sample This image shows a likely real indel present in all assemblies and supported by the reads. C) as in B, but showing an indel common in Lilo assemblies and missing in Artic assemblies 732 and the reference. D) as in B, but showing a long homopolymer with poor consensus from 733 the reads and inconsistent results in assemblies. E, F & G) Examples of indels specific to the 734 Artic pipeline assemblies which do not agree with the reference Maximum likelihood trees for our R9.4/SQK-LSK109 genomes and A) all available ASFV 737 genomes downloaded from NCBI (09/11/2021) or B) those specifically clustering with The authors declare no competing interests. 
We would like to thank the Bureau of Animal Industry and Milagros Mananggit for providing us access to the valuable ASFV blood samples. We would also like to thank Central Luzon