key: cord-102868-wnsgjdcp authors: Love, R. Rebecca; Pombi, Marco; Guelbeogo, Moussa W.; Campbell, Nathan R.; Stephens, Melissa T.; Dabire, Roch K.; Costantini, Carlo; della Torre, Alessandra; Besansky, Nora J. title: Inversion Genotyping in the Anopheles gambiae Complex Using High-Throughput Array and Sequencing Platforms date: 2020-05-28 journal: bioRxiv DOI: 10.1101/2020.05.25.114793 sha: doc_id: 102868 cord_uid: wnsgjdcp Chromosomal inversion polymorphisms have special importance in the Anopheles gambiae complex of malaria vector mosquitoes, due to their role in local adaptation and range expansion. The study of inversions in natural populations is reliant on polytene chromosome analysis by expert cytogeneticists, a process that is limited by the rarity of trained specialists, low throughput, and restrictive sampling requirements. To overcome this barrier, we ascertained tag single nucleotide polymorphisms (SNPs) that are highly correlated with inversion status (inverted or standard orientation). We compared the performance of the tag SNPs using two alternative high throughput molecular genotyping approaches versus traditional cytogenetic karyotyping of the same 960 individual An. gambiae and An. coluzzii mosquitoes sampled from Burkina Faso, West Africa. We show that both molecular approaches yield comparable results, and that either one performs as well or better than cytogenetics in terms of genotyping accuracy. Given the ability of molecular genotyping approaches to be conducted at scale and at relatively low cost without restriction on mosquito sex or developmental stage, molecular genotyping via tag SNPs has the potential to revitalize research into the role of chromosomal inversions in the behavior and ongoing adaptation of An. gambiae and An. coluzzii to environmental heterogeneities. We recently described a strategy that exploited the An. gambiae and An. coluzzii 136 database of natural variation (Ag1000G; www.malariagen.net/projects/ag1000g) (Miles 137 et al. 2017 ) to identify tag SNPs predictive of inversion orientation for all six common 138 inversion polymorphisms in these species. Using these tags, we developed an 139 algorithm capable of in silico inversion genotyping based on SNPs called from whole 140 genome resequencing data (Love et al. 2019 ). This is a rapid and powerful approach 141 assuming that whole genome sequence data are already available or will be produced 142 for other reasons. However, it does not satisfy experimental designs in which genomic 143 sequence data is not otherwise required, and where its procurement would be cost-144 prohibitive. For the requisite statistical power, studies aimed at finding significant 145 associations between inversions and behavioral or physiological phenotypes will likely 146 require thousands of specimens of known inversion genotype. To ensure that we retained at least six tag SNPs per inversion for genotyping, we were 235 compelled to reduce the genotypic concordance threshold below the 0.8 level imposed 236 in Love et al. (2019) . Even lowering the threshold to 0.7 for 2Rc tags in An. gambiae 237 failed to yield more than three candidates, and we declined to reduce that threshold 238 further. Minimum genotypic concordances for each inversion were 0.7 for 2Rb, 2Rc and 239 2Ru; 0.75 for 2Rd; 0.9 for 2Rj; and 0.9925 for 2La. 240 After filtering, we retained 54 tag SNPs in total, ranging from 6 to 11 per inversion 242 except 2Rc in An. gambiae, with only 3 tag SNPs. Based on these 54 tags, we selected 243 a custom 64-assay TaqMan OpenArray genotyping plate design whose 3,072 reaction 244 through-holes are divided into 48 sub-arrays, each with 64 through-holes (54 of which 245 were preloaded with a single custom assay). One such plate genotypes 48 mosquitoes 246 at 54 tags (2,592 genotypic assays). for genotyping assays are provided in Table S1 . Core Facility (GBCF) from a subset of specimens (n=192) to refine preparation 281 techniques and identify primers that produced PCR artefacts or were overrepresented. 282 Following optimization, primer pools were re-made to include only the optimized panel 283 of PCR 1 primers. Tag SNPs and PCR 1 primers for GT-seq genotyping are listed in 284 Table S2 . Biotech) according to manufacturer's instructions. Following normalization, 10 ul of 295 each sample per 96 well plate (up to 960 ul total) was then combined into a 1.5-mL 296 Eppendorf tube, for a total of 10 tubes. From each tube, 300ul was transferred to a 297 fresh 1.5-mL Eppendorf tube for two rounds of purification using AMPure XP 298 paramagnetic beads (Beckman Coulter, Inc.) with ratios of 0.5X and 1.3X respectively. 299 Purified libraries were eluted in 35 ul 1xTE and transferred to fresh 1.5-mL tubes before 300 adding 3.5 ul buffer EB containing a 1% Tween 20 solution. The procedures for filtering and calling molecular inversion genotypes were the same 320 for both OA and GT-seq platforms. Filtering steps were as follows. For each tag SNP, 321 we calculated the percentage of mosquito specimens in the sample with a genotype call 322 at that tag (the SNP call rate). If SNP call rates were <80%, the underperforming tag 323 SNPs were eliminated from further analysis. In addition, for each mosquito specimen 324 analyzed, we calculated the percentage of tag SNPs with a genotype call (the specimen 325 call rate). If the specimen call rate was <80%, that specimen was excluded from further 326 analysis. Note that the mosquito specimens in the sample varied according to the 327 inversion under consideration: 2La, 2Rb and 2Ru tags perform in both species, 2Rj and 328 2Rd tags are An. gambiae-specific, and defined subsets of 2Rc tags (referred to in this 329 work as 2Rc_col and 2Rc_gam) apply respectively to An. coluzzii or An. gambiae 330 To calculate the multilocus inversion genotype for each specimen, we converted the raw 333 genotype data for individual tag SNPs to the count of alternate alleles (if necessary), 334 where '0' is a homozygote for the reference allele, '1' is a heterozygote carrying one 335 reference allele, and '2' is a homozygote for the alternate allele. Next, we averaged the 336 number of alternate alleles present across all tag SNPs in a given inversion, and binned 337 this average to produce a predicted inversion genotype (0-0.67, 0; 0.68-1. 33 The inversion genotypes inferred for each specimen by the three methods 385 (cytogenetics, OA, and GT-seq) are provided in Table S3 . We compared these 386 genotypes to assess their concordance. Due to our filtering rules, not every specimen 387 had genotype calls by both molecular methods. We focused our assessment on the 388 subset of specimens that were successfully genotyped with all three methods (435 for 389 An. gambiae, and 513 for An. coluzzii). As summarized in Figure 2 and both OA and GT-seq that agreed, but were jointly discordant with cytogenetics. Except 397 for 2Rj with negligible discordance (and correspondingly low levels of polymorphism in 398 our sample), the cytogenetic versus multilocus molecular discordance affected from 14 399 to 73 mosquito specimens per inversion, representing 3% to 8% of the mosquito 400 samples (mean, 5%). Although cytogenetic karyotyping may be considered the gold 401 standard for inversion genotyping, two important considerations lend considerable 402 confidence to molecular genotypes, particularly when both molecular approaches 403 concur. First, while none of the tag SNPs are deterministic (i.e., none is perfectly and 404 invariably correlated with inversion orientation), OA and GT-seq infer genotypes based 405 on multiple predictive tags scored per inversion, thus providing weight of numbers. 406 Second, the final set of tags used for OA and GT-seq are almost completely non-407 overlapping, an outcome produced by distinct filters imposed on the initial list of 408 candidate tags during assay development (see Methods; Figure 3) . Accordingly, 409 agreement between both molecular methods is even stronger evidence in favor of the 410 inferred molecular genotype than that provided by one or the other molecular method by 411 itself. Table 2 shows that the two molecular methods agree at least 95% of the time (an 412 average of 98%), except in the case of 2Rc in An. gambiae. 413 We hypothesized that some genotypic discordances, specifically those in which the two 415 molecular methods agree but conflict with cytogenetics, are caused by cytogenetic 416 errors rather than systematic biases in the molecular approaches. This is difficult to 417 demonstrate conclusively, because the specimens used to compare the three 418 genotyping methods have not been subjected to whole genome sequencing. 419 Furthermore, it was not possible to double-check the cytogenetic karyotypes of the 420 specimens with discordances in the majority of cases, because neither slides nor 421 ovaries were available. However, there is strong evidence consistent with cytogenetic 422 error. During sampling in the field, specimens were assigned numerical identifiers that 423 incremented by "1" throughout the process; cytogenetic karyotyping was later split 424 between two institutions on the basis of even-or odd-numbered identifiers (see 425 Methods). This procedure virtually eliminates the possibility of biases owing to temporal 426 or spatial heterogeneities during the course of the mosquito sampling. Because even-427 and odd-numbered specimens should be random subsamples from the same 428 populations, we would expect no difference in discordance rates between them. This 429 was not what we found. Of the specimens assayed by both molecular methods in this 430 study, 280 were odd-numbered and 677 even-numbered. Focusing on the inversions 431 with the largest numbers of specimens whose cytogenetic and joint molecular 432 genotypes disagreed (2La, 73; 2Rb, 41; 2Ru, 56; Table 2 ), the combined 170 such 433 discordances occurred disproportionately in even-numbered specimens: 169 of the 170. 434 Analyses of 2x2 contingency tables demonstrated highly significant departures from the 435 null hypothesis (by Chi-square and Fisher exact probability tests), consistent with the 436 notion that cytogenetic error disproportionately affecting the even-numbered specimens 437 is responsible for these genotypic discrepancies (~5%). If this is the case, then based 438 on the fact that both molecular approaches agree >95% of the time, we suggest that the 439 true error rate for either molecular approach is <5%, probably closer to ~2%. 440 It is important to recognize that although the tag SNPs assayed by the two molecular 442 approaches are largely non-overlapping for technical reasons, the assumptions 443 underlying the ascertainment of the initial set of candidate tags were the same. The 444 implication is that if those assumptions are violated in natural populations, both 445 approaches may agree on the wrong genotype. The tags were ascertained in the 446 Ag1000G variation database, whose content was heavily biased toward An. gambiae at and/or (iii) violation of the assumption that the focal inversion arose uniquely (i.e., has a 455 monophyletic origin). We suspect that at least one of these scenarios applies to 2Rc 456 tags ascertained in An. gambiae, probably explaining their lower rate of apparent 457 success in genotyping (based on lower concordance values across the board; Table 2 ). 458 Previous work has shown that applying candidate tags to a taxon in which they are not (Table S3) . Based on this, we expect the systematic 464 underestimation of the number of alternate alleles to produce a distinctive pattern where 465 'true' standard homozygotes are correctly identified, but heterozygotes and inverted 466 homozygotes would be incorrectly genotyped molecularly as standard homozygotes. 467 Table 3 shows the distribution of discordant genotypes between cytogenetics and joint 468 molecular methods when broken down by genotypic class: standard homozygotes, 469 heterozygotes, and inverted homozygotes. While we have no objective measure of 470 which specimens are 'true' heterozgyotes and 'true' inverted homozygotes, it is 471 noteworthy that the discordances for all inversions other than 2Rc in An. gambiae either 472 skew toward molecular genotypes of '1' or '2', or they are roughly equally distributed 473 between '1' or '2' and '0'. The pattern for 2Rc in An. gambiae is distinctive, in that the 474 skew is strongly toward molecular genotypes of '0', which is consistent with tags that 475 may not be appropriately suited for the An. gambiae population in which they are 476 applied. Further study is both required and merited, to understand the cause(s) of 477 population structure between the populations used to develop the 2Rc tags for in An. Although we find good concordance between both molecular genotyping approaches, 487 GT-seq more often agreed with cytogenetics than did OA (Table 2) . This is not 488 surprising, given the hybridization-based nature of OA and the extremely high levels of 489 nucleotide diversity found in An. coluzzii and An. gambiae (Miles et al. 2017) . In 490 addition, highly concordant candidate tag SNPs that also had low polymorphism in the 491 ~50 bp immediately surrounding the tag, as required by OA, were sufficiently rare that 492 we were compelled to lower the concordance threshold to find enough candidates 493 suitable for assay design, and consequently the total number of OA tags per inversion is 494 smaller than for GT-seq (Table 1) reference genome assemblies and powerful functional genomics tools, the potential 510 exists to probe molecular mechanisms and deepen our understanding. Yet, a major 511 limitation to progress in this area has been the strict requirement for polytene 512 chromosome analysis, which not only limits samples but also demands rare cytogenetic 513 expertise whose throughput is low. Here we demonstrate that tag SNPs highly 514 correlated with inversion status can be used for joint molecular genotyping of common 515 inversions in An. gambiae and An. coluzzii across the genome (i.e., for karyotyping). and GT-seq. Each row is an individual mosquito, and each column compares inversion 547 genotypes derived from three genotyping approaches for a given inversion (a, 2La; j, 548 2Rj; b, 2Rb; c, 2Rc; d, 2Rd; u, 2Ru). Rows are grouped by species; 2Rj and 2Rd tags 549 are not applicable in An. coluzzii. Green represents 3-way genotypic concordance; 550 yellow, concordance between OA and GT-seq; purple, concordance between CYT and 551 GT-seq; black, concordance between CYT and OA; gray is missing data; red is 3-way 552 discordance. Revisiting the impact of inversions in evolution: from 605 population genetic markers to drivers of adaptive shifts and speciation? Annual Review of 606 How and why chromosome inversions evolve Chromosome inversions, local adaptation and speciation Breakpoint structure of the Anopheles gambiae 611 2Rb chromosomal inversion silico karyotyping of chromosomally 613 polymorphic malaria mosquitoes in the Anopheles gambiae complex. G3 (Bethesda) Chromosomal inversions and ecotypic 616 differentiation in Anopheles gambiae: the perspective from whole-genome sequencing Complex genome evolution in Anopheles coluzzii 619 associated with increased insecticide usage in Mali A test of the chromosomal theory of ecotypic 621 speciation in Anopheles gambiae Genetic diversity of the African malaria vector 624 Anopheles gambiae The Garki Project. Research on the Epidemiology and 626 Control of Malaria in the Sudan Savanna of West Africa World Health Organization Highly specific PCR-RFLP assays 629 for karyotyping the widespread 2Rb inversion in malaria vectors of the Anopheles 630 gambiae complex Intraspecific chromosomal polymorphism in the Anopheles gambiae 632 complex as a factor affecting malaria transmission in the Kisumu area of Kenya Chromosomal plasticity and evolutionary potential 635 in the malaria vector Anopheles gambiae sensu stricto: insights from three decades of 636 rare paracentric inversions Cuticular differences associated with aridity 638 acclimation in African malaria vectors carrying alternative arrangements of inversion 639 2La The Anopheles gambiae 2La chromosome 641 inversion is associated with susceptibility to Plasmodium falciparum in Africa Seasonal variations in indoor resting 643 Anopheles gambiae and Anopheles arabiensis in Kaduna ) 2La chromosomal inversion enhances 646 thermal tolerance of Anopheles gambiae larvae Selection in heterogeneous environments maintains the gene arrangement 648 polymorphism of Drosophila pseudoobscura Identification of single specimens of the Anopheles 650 gambiae complex by the polymerase chain reaction The distribution and inversion polymorphism of 653 chromosomally recognized taxa of the Anopheles gambiae complex in Mali Eco-Evolutionary Genomics of Chromosomal Inversions Molecular karyotyping of the 2La inversion 658 in Anopheles gambiae 559 560 Ayala D, Acevedo P, Pombi