key: cord-104073-vsa5y7ip authors: Warner, Emily F.; Bohálová, Natália; Brázda, Václav; Waller, Zoë A. E.; Bidula, Stefan title: Cross kingdom analysis of putative quadruplex-forming sequences in fungal genomes: novel antifungal targets to ameliorate fungal pathogenicity? date: 2020-09-23 journal: bioRxiv DOI: 10.1101/2020.09.23.310581 sha: doc_id: 104073 cord_uid: vsa5y7ip Fungi contribute to upwards of 1.5 million human deaths annually, are involved in the spoilage of up to a third of food crops, and have a devastating effect on plant and animal biodiversity. Moreover, this already significant issue is exacerbated by a rise in antifungal resistance and a critical requirement for novel drug targets. Quadruplexes are four-stranded secondary structures in nucleic acids which can regulate processes such as transcription, translation, replication, and recombination. They are also found in genes linked to virulence in microbes, and quadruplex-binding ligands have been demonstrated to eliminate drug resistant pathogens. Using a computational approach, we identified putative quadruplex-forming sequences (PQS) in 1362 genomes across the fungal kingdom and explored their potential involvement in virulence, drug resistance, and pathogenicity. Here we present the largest analysis of PQS in fungi and identified significant heterogeneity of these sequences throughout phyla, genera, and species. Moreover, PQS were genetically conserved. Notably, loss of PQS in cryptococci and aspergilli was associated with pathogenicity. PQS in the clinically important pathogens Aspergillus fumigatus, Cryptococcus neoformans, and Candida albicans were located within genes (particularly coding regions), mRNA, repeat regions, mobile elements, tRNA, ncRNA, rRNA, and the centromere. Genes containing PQS in these organisms were found to be primarily associated with metabolism, nucleic acid binding, transporter activity, and protein modification. 
Finally, PQS were found in over 100 genes associated with virulence, drug resistance, or key biological processes in these pathogenic fungi and were found in genes which were highly upregulated during germination, hypoxia, oxidative stress, iron limitation, and in biofilms. Taken together, quadruplexes in fungi could present interesting novel targets to ameliorate fungal virulence and overcome drug resistance.

Introduction

[...] sequence nucleotides and can form intramolecular or intermolecular associations [17, 18].
This structure is further stabilised by the presence of monovalent cations, especially potassium [19]. Moreover, the 5ʹ-to-3ʹ directionality of the strands, glycosidic bonding in the G-tetrads, the cation present, and the number of stacked G-tetrads contribute to the wide variation of observed G4 structures and topologies [13]. Conversely, iMs form within cytosine-rich regions [...]

[...] there has been increased interest in the therapeutic potential of targeting quadruplexes following the implication of these secondary structures in disease, especially cancer, due to their prevalence in oncogene promoters [26]. However, there is also now a growing number of pathogens in which G4s [...]

[...] respectively; Figure 2A and E). The Basidiomycota and Zoopagomycota had high PQS frequencies relative to genome size (0.445 and 0.373 PQS/kbp, respectively; Figure 2B and F). The Mucoromycota and Basidiomycota displayed high PQS frequencies relative to GC content (459 and 340 PQS/GC%, respectively; Figure 2C and G). Fungi within the Basidiomycota had the highest average GC content (53.3%; Figure 2D and H). The Microsporidia and Cryptomycota scored lowest for total number of PQS (300 and 372, respectively), PQS/kbp (0.091 and 0.029, respectively), and PQS/GC% (8 and 11, respectively; Figure 2). Moreover, they also had low GC content (39.6% and 35.0%, respectively).

Considering that G4s and iMs form in guanine- or cytosine-rich regions, respectively, one would expect fungi with a higher genome GC content to have a higher PQS frequency by chance. To investigate this further, the frequency of PQS/kbp was plotted against the GC content for all fungi and for their divisions. As expected, there was a positive correlation between GC content and PQS frequency amongst all the fungal species analysed (r=0.5290; p<0.0001; Figure 3A). [...] and Mucoromycota (r=0.5619, r=0.3891, and r=0.5239 and r=0.2883, respectively; all p<0.05; Figure 3B-D).
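The normalised metrics and the correlation reported above can be illustrated with a minimal Python sketch. It assumes that PQS/kbp is the PQS count divided by genome length in kilobases and that PQS/GC% is the PQS count divided by the genome GC percentage; the latter reading is at least consistent with the reported averages for the Microsporidia and Cryptomycota (300/39.6 ≈ 7.6 and 372/35.0 ≈ 10.6, which round to the stated 8 and 11). The function names and the example genomes are hypothetical and are not taken from the study.

```python
import math

def pqs_per_kbp(pqs_count, genome_length_bp):
    """PQS frequency relative to genome size (assumed definition)."""
    return pqs_count / (genome_length_bp / 1000)

def pqs_per_gc_percent(pqs_count, gc_percent):
    """PQS frequency relative to GC content (assumed definition)."""
    return pqs_count / gc_percent

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical genomes: (PQS count, genome length in bp, GC %).
genomes = [
    (3000, 10_000_000, 35.0),
    (9000, 20_000_000, 45.0),
    (16000, 30_000_000, 52.0),
    (12000, 40_000_000, 48.0),
]
densities = [pqs_per_kbp(count, length) for count, length, _ in genomes]
gc_values = [gc for _, _, gc in genomes]
print(f"r = {pearson_r(gc_values, densities):.4f}")
```

The sketch only computes r; the p-values quoted in the text would require a significance test, which is omitted here.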
However, there was not a significant correlation observed within the [...]

[...] Kickxellomycotina (n=7 species) were 8399.7, 6726.5, and 11315.1, respectively (Figure 4A). The average PQS/kbp for each subphylum was 0.746, 0.081, and 0.292, respectively (Figure 4B). The average PQS/GC% for each subphylum was 162.3, 162.7, and 305.4, respectively (Figure 4C). Finally, the average GC% for each subphylum was 54.4%, 35.3%, and 38.2%, respectively (Figure 4D). [...] Figure 4C). Finally, the average GC% values were 39.6% and 35.0%, respectively (Figure 4D).

Finally, we also highlighted the frequency of PQS in fungal genera which contained important human and plant pathogens. We found that there was also large heterogeneity in the frequency of PQS between species within genera containing human pathogens (e.g. Aspergillus spp., Candida spp., Cryptococcus spp., Blastomyces spp.) and plant pathogens (e.g. Verticillium spp. and Fusarium spp.; Figure 5). This variation was particularly wide within Aspergillus spp. and Cryptococcus spp.

Evolutionary conservation of genetic motifs within the genome is a hallmark of their fundamental importance to how that organism functions. Therefore, we endeavoured to explore whether there was evolutionary conservation of PQS within fungal genomes. We chose to explore this relationship in Aspergillus spp., due to the robustness and accuracy of the phylogenetic tree available [44].

Notably, we found that the frequency of PQS/kbp appeared to be intrinsically linked to how closely related species were, with species within the same section displaying similar PQS frequencies (Figure 6).
Aspergilli in this tree were divided into 13 sections (range of PQS/kbp [...]

The Ascomycota and Basidiomycota contain many of the most prevalent fungal pathogens of both plants and humans, including the genera Aspergillus, Candida, and Cryptococcus, which contain the fungal species that account for most fungal-related deaths in humans. However, not all species within these genera are potential pathogens, and we found high variation in their PQS frequency. Therefore, we compared the PQS frequency between pathogenic and non-pathogenic species to explore whether there was a link with pathogenicity. [...]

Similarly, comparing 16 species of Cryptococcus (9 pathogenic, 7 non-pathogenic), we also found that pathogenic species had a significantly lower frequency of PQS/kbp (0.480 vs. [...]

When only total PQS were considered, the largest number of PQS in all three fungal species could be found within the coding regions (CDS), genes, and mRNA, with few PQS found in other genomic features (Figure 8A, B, and C). However, this was not the case when considering the frequency of PQS/kbp within the genomic features. In A. fumigatus, the greatest frequency of PQS could be found in the repeat regions (Figure 8D). The lowest frequency could be found within the tRNA. In C. neoformans, the highest PQS frequencies were still in the CDS, genes, and mRNA, with a very low frequency found within the tRNA (Figure 8E). In C. albicans, the highest frequency of PQS could be found in the rRNA, followed by repeat regions and ncRNA (Figure 8F). There were no PQS found in the tRNA and low frequencies were again found in the mobile elements. The total number and frequency of PQS 100 bp before and after the annotated genomic features appeared to be evenly distributed (Figure 8).
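The contrast between total PQS counts and PQS/kbp per genomic feature can be made concrete with a short sketch. It assumes that the per-feature frequency is the number of PQS in a feature class divided by the combined length of that class in kilobases. The counts and lengths below are invented purely for illustration and are not values from the study.

```python
def feature_density(pqs_count, total_feature_bp):
    """PQS per kbp within one class of genomic feature (assumed definition)."""
    if total_feature_bp <= 0:
        raise ValueError("feature length must be positive")
    return pqs_count * 1000 / total_feature_bp

# Hypothetical (PQS count, combined feature length in bp) per feature class.
features = {
    "CDS": (4000, 14_000_000),
    "repeat_region": (60, 100_000),
    "tRNA": (1, 15_000),
}

# The ranking by raw count and the ranking by density need not agree.
by_total = max(features, key=lambda name: features[name][0])
by_density = max(features, key=lambda name: feature_density(*features[name]))

for name, (count, length) in features.items():
    print(f"{name}: {count} PQS, {feature_density(count, length):.3f} PQS/kbp")
print("largest total:", by_total)
print("highest density:", by_density)
```

In this made-up example a long feature class holds the most PQS in absolute terms, while a short class has the highest density, which is the kind of pattern described for the repeat regions of A. fumigatus.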
PQS are found in genes encoding proteins involved in metabolism, nucleic acid binding, cell transport, and protein modification

As we knew the genomic location of the PQS, we could then identify the number and identity of the genes which contained these sequences. This further enabled us to identify the classes of proteins associated with PQS-containing genes. In A. fumigatus, 35.1% of genes contained at least one PQS. In C. neoformans, this number was almost double, with 59.9% of genes containing PQS. Conversely, PQS were only found in 5.6% of genes in C. albicans.

Despite the discrepancies between the organisms in the number of genes where PQS can be found, in all cases, PQS were primarily located in genes which encoded proteins involved in metabolism, nucleic acid binding, cell transport, and protein modification (Figure 9). They were least likely to be found in genes encoding calcium-binding proteins, extracellular matrix proteins, cell adhesion molecules, and defense/immunity proteins (Figure 9).

In all organisms, PQS could be found in the highest frequency in genes associated with metabolite interconversion enzymes. In A. fumigatus, the number of genes associated with metabolite interconversion enzymes was 3.7-fold higher than the next represented protein class (434 genes vs. 117 genes for nucleic acid binding proteins and transporters; Figure 9A). In C. neoformans, the number of genes associated with these enzymes was 2.1-fold higher compared to nucleic acid binding proteins (491 genes vs. 231 genes, respectively; Figure 9B). In C. albicans, the difference in the number of PQS-containing genes associated with metabolite interconversion enzymes and nucleic acid binding proteins was much lower (26 genes vs. 21 genes, respectively; 1.2-fold; Figure 9C).
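The fold differences quoted above follow directly from the gene counts, as the short arithmetic check below shows. The helper name `fold_difference` is ours; the gene counts and stated fold values are those given in the text.

```python
def fold_difference(larger_count, smaller_count):
    """Ratio of two gene counts, expressed as a fold difference."""
    return larger_count / smaller_count

# (metabolite interconversion enzymes, next protein class, fold stated in text)
reported = {
    "A. fumigatus": (434, 117, 3.7),
    "C. neoformans": (491, 231, 2.1),
    "C. albicans": (26, 21, 1.2),
}

for species, (top, runner_up, stated) in reported.items():
    fold = fold_difference(top, runner_up)
    print(f"{species}: {top}/{runner_up} = {fold:.2f} (stated: {stated}-fold)")
```

Each ratio rounds to the value stated in the text at one decimal place.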
Surprisingly, when categorising genes based on gene ontology terms, there was an almost identical distribution of genes involved in [...]

[...] iM in the promoter of the HIV-1 pro-viral genome has also recently been described [31]. Thus, whether PQS could be found in genes associated with virulence/drug resistance in A. fumigatus, C. neoformans, and C. albicans was explored. Although the list is not exhaustive (there are many proteins still yet to be characterised), there were many interesting candidates that arose from the analysis. In total, PQS were found in over 100 genes associated with the virulence, drug resistance, or key biological processes of A. fumigatus (39 genes), C. neoformans (41 genes), and C. albicans (27 genes; Tables 1-3).

In A. fumigatus, PQS could be found in notable genes involved in drug resistance, including the 14-α sterol demethylases (cyp51A and cyp51B), the 1,3-β-glucan synthase catalytic subunit fks1, and the ABC drug exporter atrF. PQS were also found in genes involved in virulence, including the transcription factors stuA, hapX, and pacC, genes involved in pigment biosynthesis (pksP, arp2, abr1, abr2, and ayg1), the master regulator of secondary metabolism laeA, and gliN and gliP, which are involved in the synthesis of gliotoxin (Table 1).

As PQS could be found in almost two-thirds of C. neoformans genes, it was not surprising that PQS could be found in those associated with virulence. These included the ABC transporter afr1 (which is associated with fluconazole resistance), the protein kinases fsk and hog1, the calcineurin-associated genes crz1 and cna1, pacC/rim101 (as in A. fumigatus), and numerous capsule-associated genes (the capsule being the main virulence factor of Cryptococcus), including cap2, cap5, cap10, cap59, cap60, cap64, cas31, cas33, and cxt1 (Table 2).

There were very few genes in C.
albicans that contained sequences likely to form quadruplexes, and thus, quadruplexes might be less important in this organism. Notable genes included the iron permeases ftr1 and ftr2, and a gene associated with flucytosine resistance (rrp9; Table 3).

The highest scoring potential quadruplex-forming sequences for each of these genes were then re-analysed in an alternative PQS predictive algorithm called QGRS Mapper. In this instance, the scores of known quadruplex-forming sequences were compared to scores of the PQS in fungi. This was conducted to provide further insight into whether these sequences were likely to form quadruplex structures. [...]

[...] Figure 10B). In all cases, the average PQS frequencies in the upregulated genes were higher than the average PQS frequency observed throughout the entire genome (Figure 10B). The average PQS frequencies in upregulated PQS-containing genes were 2.97 PQS/kbp (germinating conidia), 3.72 PQS/kbp (oxidative stress), and 2.26 PQS/kbp (biofilms; Figure 10B). However, PQS frequencies varied widely between genes, ranging from 0.34 to 11.90 PQS/kbp. The genes containing the highest PQS frequencies for each condition were AFUA_8G01710 in germinating conidia and hyphae (11.90 PQS/kbp), AFUA_4G09580 in hypoxic fungi (5.59 PQS/kbp), AFUA_3G03650 during iron limitation (8.50 PQS/kbp), AFUA_5G10220 during oxidative stress (5.28 PQS/kbp), and AFUA_8G01980 in biofilms (5.90 PQS/kbp). Interestingly, each of these genes was upregulated in at least 3 out of the 6 conditions investigated.

In this study, the number of potential quadruplex-forming sequences within the genomes of fungi was computationally predicted and their potential involvement in pathogenicity was discussed. Several important observations were made. This was the first study to identify the heterogeneity of PQS amongst genetically distinct fungal species.
Moreover, we highlighted that pathogenic Aspergillus and Cryptococcus species contained fewer PQS compared to their non-pathogenic counterparts and that these could be found throughout known genomic features, including genes, mRNA, repeat regions, tRNA, ncRNA, and rRNA. Genes containing PQS were associated with metabolism, nucleic acid-binding proteins, protein modifying enzymes, and transporters. Notably, PQS likely to form quadruplexes were identified in genes linked with fungal virulence or drug resistance, such as cyp51A, and could be found in genes upregulated during fungal growth and in response to stress.

The frequency of PQS throughout genomes is highly variable; for example, human genomes were shown to contain around 0.228 PQS/kbp, whereas the genomes of Escherichia coli contain around 0.028 PQS/kbp [15]. In this study we also found significant differences in the [...]

Interestingly, loss of PQS has recently also been observed in pathogenic Coronaviridae [47]. It has also been reported that host nucleolin (an RNA-binding protein) can bind and stabilise quadruplexes in the LTR promoter of HIV-1, which can silence viral transcription [48]. Therefore, in this situation, loss of quadruplexes would be beneficial for immune evasion. [...]

[...] (pacC/rim101). The most notable virulence factor of C. neoformans is its polysaccharide capsule, and PQS could be found in numerous capsule-associated genes (cas31, cap60, cap59, cap64, cap2, cap5, cas33, cap10, and cxt1) [69]. In C. albicans, PQS could be found in genes such as the iron permeases ftr1 and ftr2 [70]. Notably, many of these genes contained PQS which have previously been shown to be capable of forming bona fide quadruplexes, such as the sequence GGAGGAGGAGG [71]. It is also interesting to highlight that these organisms contained many more G2+L1-12 than G3+L1-12 PQS sequences, which is a characteristic shared with S. cerevisiae [15].
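To make the G2+L1-12 and G3+L1-12 notation concrete, the sketch below reads it as four runs of at least two (or three) guanines separated by three loops of 1 to 12 nucleotides, and expresses that reading as a regular expression. This is an illustrative assumption about the notation only: it is not the prediction algorithm used in the study, it scans a single strand, and the function names are hypothetical.

```python
import re

def pqs_pattern(min_g_run, max_loop=12):
    """Regex for four G-runs of >= min_g_run separated by loops of 1..max_loop nt."""
    run = f"G{{{min_g_run},}}"
    loop = f"[ACGT]{{1,{max_loop}}}"
    return re.compile(f"{run}(?:{loop}{run}){{3}}")

def count_pqs(sequence, min_g_run):
    """Count non-overlapping pattern matches on the given strand only."""
    return sum(1 for _ in pqs_pattern(min_g_run).finditer(sequence.upper()))

example = "GGAGGAGGAGG"  # sequence quoted in the text
print("G2+ motifs:", count_pqs(example, 2))
print("G3+ motifs:", count_pqs(example, 3))
```

Under this reading, GGAGGAGGAGG satisfies the G2+ pattern (four GG runs with single-nucleotide loops) but cannot satisfy the G3+ pattern, since it contains no run of three guanines. A fuller treatment would also scan the reverse complement, where the corresponding cytosine-rich motifs appear.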
There are now an ever-increasing number of G4s identified within genes linked to microbial pathogenicity. G4-forming motifs located in the hsdS, recD, and pmrA genes of S. [...]

The Pearson correlation coefficient was used to determine the association between PQS and GC content. P<0.05 was considered statistically significant.

Stop neglecting fungi. Nat Microbiol
Strategies for Engineering Natural Product Biosynthesis in Fungi
The regulation and functions of DNA and RNA G-quadruplexes
i-Motif DNA: structural features and significance to cell biology
Whole genome experimental maps of DNA G-quadruplexes in multiple species
Quadruplex DNA: sequence, topology and structure
An intramolecular G-quadruplex structure with mixed parallel/antiparallel
MycoCosm portal: gearing up for 1000 fungal genomes
G4Hunter web application: a web server for G-quadruplex prediction
PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools
Applications for protein sequence-function evolution data: mRNA/protein expression analysis and coding SNP scoring tools
Comparative transcriptome analysis revealing dormant conidia and germination associated genes in Aspergillus species: an essential role for AtfA in conidial dormancy
Additional oxidative stress reroutes the global response of Aspergillus fumigatus to iron depletion
Global transcriptome changes underlying colony growth in the opportunistic human pathogen Aspergillus fumigatus
A Robust Phylogenomic Time Tree for Biotechnologically and Medically Important Fungi in the Genera Aspergillus and Penicillium. mBio
G-quadruplex-induced instability during leading-strand replication
RNA G-quadruplexes are globally unfolded in eukaryotic cells and depleted in bacteria
Nucleolin stabilizes G-quadruplex structures folded by the LTR promoter and silences HIV-1 viral transcription
Aspergillus fumigatus conidia survive and germinate in acidic organelles of A549 epithelial cells
Genomic distribution and functional analyses of potential G-quadruplex-forming sequences in Saccharomyces cerevisiae
Divergent distributions of inverted repeats and G-quadruplex forming sequences in Saccharomyces cerevisiae
Genome-wide prediction of G4 DNA as regulatory motifs: role in
Metabolism impacts upon Candida immunogenicity and pathogenicity at multiple levels
Metabolism in fungal pathogenesis. Cold Spring Harb Perspect Med
Antifungal Resistance, Metabolic Routes as Drug Targets, and New Antifungal Agents: An Overview about Endemic Dimorphic Fungi
Secondary metabolite arsenal of an opportunistic pathogenic fungus
Candidalysin is a fungal peptide toxin critical for mucosal infection
The Fungal CYP51s: Their Functions, Structures
Identification of Aspergillus fumigatus multidrug transporter genes and their potential involvement in antifungal resistance
LaeA, a regulator of morphogenetic fungal virulence factors
Recognition of DHN-melanin by a C-type lectin receptor is required for immunity to Aspergillus
Aspergillus fumigatus virulence through the lens of transcription factors
Role of AFR1, an ABC transporter-encoding gene, in the in vivo response to fluconazole and virulence of Cryptococcus neoformans
Distinct stress responses of two functional laccases in Cryptococcus neoformans are revealed in the absence of the thiol-specific antioxidant
The capsule of the fungal pathogen Cryptococcus neoformans
Functional characterization of the ferroxidase, permease high-affinity iron transport complex from Candida albicans
Characterization of highly conserved G-quadruplex motifs as potential drug targets in Streptococcus pneumoniae
G-Quadruplex DNA Motifs in the Malaria Parasite Plasmodium falciparum and Their Potential as Novel Antimalarial Drug Targets
Characterization of G-Quadruplex Motifs in espB Genes of Mycobacterium tuberculosis as Potential Drug Targets
Berberine Antifungal Activity in Fluconazole-Resistant Pathogenic Yeasts: Action Mechanism Evaluated by Flow Cytometry and Biofilm Growth Inhibition in Candida spp