key: cord-0425672-amcsb5ff
authors: Bruijnesteijn, Jesse; van der Wiel, Marit; de Groot, Natasja G.; Bontrop, Ronald E.
title: Rapid characterization of complex genomic regions using Cas9 enrichment and Nanopore sequencing
date: 2021-03-11
journal: bioRxiv
DOI: 10.1101/2021.03.11.434935
sha: c71fa3f5679e7945dd21dd0ca7aea154d7d0ce6a
doc_id: 425672
cord_uid: amcsb5ff
Long-read sequencing approaches have considerably improved the quality and contiguity of genome assemblies. Such platforms bear the potential to resolve even extremely complex regions, such as multigenic families and repetitive stretches of DNA. Deep sequencing coverage, however, is required to overcome low nucleotide accuracy, especially in regions with high homopolymer density, copy number variation, and sequence similarity, such as the MHC and KIR gene clusters of the immune system. Therefore, we have adapted a targeted enrichment protocol in combination with long-read sequencing to efficiently annotate complex genomic regions. Using Cas9 endonuclease activity, segments of the complex KIR gene cluster were enriched and sequenced on an Oxford Nanopore Technologies platform. This provided sufficient coverage to accurately resolve and phase highly complex KIR haplotypes. Our strategy facilitates rapid characterization of large and complex multigenic regions, including their epigenetic footprints, in multiple species, even in the absence of a reference genome.
Repetitive regions are difficult to resolve using short-read sequencing approaches, and often remain registered for years as incomplete gaps in draft genomes (Chinwalla et al. 2002
to nanopore adapters. This allows the direct sequencing of genomic segments while avoiding error-prone DNA synthesis and maintaining epigenetic modifications. Efficient and specific enrichment using this approach has been demonstrated for single genes, including several cancer-related fusion genes (Stangl et al. 2020), but an application for multigenic regions is absent in the literature.
In this study, we adapted the Cas9-mediated enrichment method to resolve complex multigenic regions, and validated this approach by the targeted characterization of complex KIR gene clusters in two different primate species. We focused on the KIR region in humans, which has been thoroughly characterized at the genomic level, and is important, for instance, in AIDS susceptibility and transplantation biology (Farag et al. 2006; Bashirova et al. 2011
A 'tiling' approach to enrich complex multigenic clusters without amplification
The characterization of large and repetitive immune regions requires the generation and sequencing of genomic DNA (gDNA) fragments that share overlaps. Allelic variation in these overlaps allows the phasing of haplotypes. To achieve this goal, dephosphorylated high molecular weight (HMW) gDNA needs to be cleaved, using sets of CRISPR RNAs (crRNA) in complex with Cas9 endonuclease (Fig. S1).
These crRNAs are designed to target conserved stretches that are shared by members of a multigenic family. This approach will allow generic enrichment. Only at the terminus of the cleaved target sites is a phosphate group available, which is utilized for dA-tailing and subsequent ligation to Nanopore
generic crRNAs were designed to target the differential presence of human KIR genes that may be encountered on a haplotype, whereas 12 crRNAs were specific for one particular framework gene (Sup. Table 1). In addition, seven crRNAs were included to target the genes that flank the KIR gene cluster (LILR and FcAR), in order to define both ends of the KIR haplotype.
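The idea of targeting conserved stretches shared by members of a multigenic family can be illustrated with a minimal Python sketch. It is not the design procedure used in this study: the function names, the toy sequences, the 20-nt protospacer length, the NGG PAM requirement, and the restriction to the forward strand are all assumptions made for illustration. A real design would also scan the reverse strand and score on-target activity and off-target risk.

```python
import re

def protospacers(seq, length=20):
    """Return the set of `length`-nt protospacers directly followed by an NGG PAM.

    Assumption: only the given (forward) strand is scanned.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    return {m.group(1) for m in pattern.finditer(seq)}

def shared_protospacers(family):
    """Protospacers present in every member of a gene family (dict: name -> sequence)."""
    sets = [protospacers(s) for s in family.values()]
    return set.intersection(*sets) if sets else set()

# Hypothetical toy sequences: both carry the same 20-nt stretch followed by a PAM.
conserved = "ACGTTGCAAGCTTAGCCATG"
family = {
    "geneA": "TTTA" + conserved + "TGG" + "CATCAT",
    "geneB": "CCCAA" + conserved + "AGG" + "TTTTT",
}
generic_targets = shared_protospacers(family)
```

In this toy case `generic_targets` contains only the shared 20-nt stretch. A family member that lacks the PAM next to that stretch would remove it from the intersection, which is the behaviour wanted when a guide is meant to cut every member.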
The KIR haplotypes in rhesus macaques display even more content diversity, with 4 to 17 KIR transcripts encoded as defined by segregation studies (Bruijnesteijn et
The red and blue boxes depict activating or inhibitory receptors, respectively. Pseudogenes have been indicated as well (grey). The only homologous gene shared in humans and macaques is KIR2DL4 (yellow). Human KIR haplotypes are categorized based on a more inhibitory (group A) or activating (group B) gene profile, and reflect different segment configurations (e.g., cA-tA and cB-tB) [31]. Chromosomal recombination events might shuffle haplotype segments (e.g., cA-tB defined by the large X). In rhesus macaques, the framework is represented by KIR3DL20 at the centromeric region. The telomeric region is characterized by highly variable gene expansions and contractions. In contrast to the situation for humans, physical maps of rhesus macaque KIR haplotypes are virtually absent from the literature. For that reason, the lower panel represents a hypothetical configuration as the proposed situation at the start of this study.
reflected different KIR genotypes, were included in our panel (Roe et al. 2017
which is required to connect and distinguish neighboring KIR genes. The percentage of size-selected reads that mapped to the target region ranged from 1.6% to 2.5% (± 4,015 reads), which provided a median coverage ranging from 269 to 323X. The enrichment factor, which reflects the efficiency of the targeted enrichment, ranged from 215 to 394X. The length of the consensus sequences that were generated from the on-target reads ranged from 6.5 to 29.7 kb, and their assembly covered complete reference KIR haplotype configurations (Fig. 2A, 2B, and 2C). The consensus accuracy compared to the reference sequences ranged from 96.7% to 99.9%.
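How such run metrics relate to one another can be sketched in Python. The text above does not state how the enrichment factor was calculated, so the definition used here (observed on-target fraction of sequenced bases divided by the fraction expected if reads were spread evenly over the genome) is an assumption, and all input numbers are hypothetical rather than taken from this study.

```python
from statistics import median

def on_target_fraction(on_target_reads, total_reads):
    """Fraction of reads that map to the target region."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return on_target_reads / total_reads

def enrichment_factor(on_target_bases, total_bases, target_size_bp, genome_size_bp):
    """Observed on-target base fraction divided by the expected fraction.

    Assumed definition: without enrichment, the share of bases landing on the
    target is expected to equal target_size_bp / genome_size_bp.
    """
    observed = on_target_bases / total_bases
    expected = target_size_bp / genome_size_bp
    return observed / expected

def median_coverage(depths):
    """Median of per-position read depths across the target region."""
    return median(depths)

# Hypothetical inputs: 2 Mb of 100 Mb sequenced bases fall on a 200 kb target
# in a 3 Gb genome, giving an enrichment factor of roughly 300.
example_factor = enrichment_factor(2_000_000, 100_000_000, 200_000, 3_000_000_000)
example_median = median_coverage([250, 300, 310, 280, 320])
```

Other definitions, such as the ratio of on-target to off-target coverage, would give different values, so a factor computed this way is only comparable between runs that use the same definition.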
The centromeric and telomeric regions were completely assembled and grouped into four different segment configurations (cA01, cB02, tA01, tB01). Phasing of complete KIR haplotypes was achieved in two individuals, including a homozygous haplotype configuration using allele level resolution (Fig
An allele level resolution allowed the phasing of six rhesus macaque KIR haplotypes (Fig. 3).
(Table 2, #3; Fig. 3C). This ensured highly accurate and overlapping consensus sequences that resolved complete human and macaque KIR haplotypes at an allele level resolution (Fig. 2D & 3D).
The high modification frequency of the KIR promoter regions (Fig. 4)
Genome-wide characterization of centromeric satellites from multiple mammalian genomes
HLA/KIR restraint of HIV: surviving the fittest
Complete characterization of killer Ig-like receptor (KIR) haplotypes in Mauritian cynomolgus macaques: novel insights into nonhuman primate KIR gene content and organization
Unparalleled Rapid Evolution of KIR Genes in Rhesus and Cynomolgus Macaque Populations
Nomenclature report for killer-cell immunoglobulin-like receptors (KIR) in macaque species: new genes/alleles, renaming recombinant entities and IPD-NHKIR updates
Defined by Their Transcriptomes
De novo genome assembly of a Han Chinese male and genome-wide detection of structural variants using Oxford Nanopore sequencing
DNA methylation maintains allele-specific KIR gene expression in human natural killer cells
Initial sequencing and comparative analysis of the mouse genome
Methplotlib: analysis of modified nucleotides from nanopore sequencing
Co-evolution of the MHC class I and KIR gene families in rhesus macaques: ancestry and plasticity
Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9
Macaca mulatta isolate AG07107 chromosome 19, whole genome shotgun sequence
Real-Time DNA Sequencing from Single Polymerase Molecules
Nonhuman primate models of human viral infections.
The Effect of KIR Ligand Incompatibility on the Outcome of Unrelated Donor Transplantation: A Report from the Center for International Blood and Marrow Transplant Research
Marrow Transplant Registry, and the Dutch Registry
A new hybrid approach for MHC genotyping: high-throughput NGS and long read MinION nanopore sequencing, with application to the non-model vertebrate Alpine chamois (Rupicapra rupicapra)
Selective nanopore sequencing of human BRCA1 by Cas9-assisted targeting of chromosome segments (CATCH)
Evolutionary and biomedical insights from the rhesus macaque genome
Analysis of short tandem repeat expansions and their methylation state with nanopore sequencing
Targeted nanopore sequencing with Cas9-guided adapter ligation
Receptor (KIR) Genes: Definition of an Orangutan KIR Haplotype Reveals Expansion of Lineage III KIR Associated with the Emergence of MHC-C
A macaque model of HIV-1 infection
Combinations of Maternal KIR and Fetal HLA-C Genes Influence the Risk of Preeclampsia and Reproductive Success
DNA targeting specificity of RNA-guided Cas9 nucleases
Finishing the euchromatic sequence of the human genome
Improved data analysis for the MinION nanopore sequencer
Fiddes IT et al. 2018. Nanopore sequencing and assembly of a human genome with ultra-long reads
Genome sequence, comparative analysis and haplotype structure of the domestic dog
The IPD Project: a centralised resource for the study of polymorphism in genes of the immune system
Sequencing primate genomes: what have we learned? Annual review of genomics and human genetics
Killer-cell immunoglobulin-like receptor (KIR) nomenclature report
Concerted and birth-and-death evolution of multigene families
Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise
Recombinant structures expand and contract inter and intragenic diversification at the KIR locus
Revealing complete complex KIR haplotypes phased by long-read sequencing technology
Single haplotype analysis demonstrates rapid evolution of the killer immunoglobulin-like receptor (KIR) loci in primates
Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes
Partner independent fusion gene detection by multiplexed CRISPR-Cas9 enrichment and long read nanopore sequencing
Mind the gap; seven reasons to close fragmented genome assemblies
Human diversity in killer cell inhibitory receptor genes
Sequence diversity analyses of an improved rhesus macaque genome enhance its biomedical utility
Initial sequence of the chimpanzee genome and comparison with the human genome
Bonthron DT. 2020. Cas9-based enrichment and single-molecule sequencing for precise characterization of genomic duplications
Age-related rhesus macaque models of COVID-19