key: cord-0333129-35o49jcl
authors: Karawita, Anjana C.; Cheng, Yuanyuan; Chew, Keng Yih; Challgula, Arjun; Kraus, Robert; Mueller, Ralf C.; Tong, Marcus Z. W.; Hulme, Katina D.; Beielefeldt-Ohmann, Helle; Steele, Lauren E.; Wu, Melanie; Sng, Julian; Noye, Ellesandra; Bruxner, Timothy J.; Au, Gough G.; Lowther, Suzanne; Blommaert, Julie; Suh, Alexander; McCauley, Alexander J.; Kaur, Parwinder; Dudchenko, Olga; Aiden, Erez; Fedrigo, Olivier; Formenti, Giulio; Mountcastle, Jacquelyn; Chow, William; Martin, Fergal J.; Ogeh, Denye N.; Thiaud-Nissen, Françoise; Howe, Kerstin; Collins, Joanna; Tracey, Alan; Smith, Jacqueline; Kuo, Richard I.; Renfree, Marilyn B.; Kimura, Takashi; Sakoda, Yoshihiro; McDougall, Mathew; Spencer, Hamish G.; Pyne, Michael; Tolf, Conny; Waldenström, Jonas; Jarvis, Erich D.; Baker, Michelle L.; Burt, David W.; Short, Kirsty R.
title: The swan genome and transcriptome: its not all black and white
date: 2022-05-02
journal: bioRxiv
DOI: 10.1101/2022.05.02.490350
sha: 36caf606ce05528cb57c42ed1f070c6009243af3
doc_id: 333129
cord_uid: 35o49jcl

The Australian black swan (Cygnus atratus) is an iconic species with contrasting plumage to that of the closely related Northern Hemisphere white swans. The relative geographic isolation of the black swan may have resulted in a limited immune repertoire and increased susceptibility to infectious disease, notably infectious diseases from which Australia has been largely shielded. Indeed, unlike Mallard ducks and the mute swan (Cygnus olor), the black swan is extremely sensitive to severe highly pathogenic avian influenza (HPAI). Understanding this susceptibility has been impaired by the absence of any available swan genome and transcriptome information. Here, we generate the first chromosome-length black and mute swan genomes annotated with transcriptome data, all using long-read-based pipelines developed for vertebrate species.
We used these genomes and transcriptomes to show that, unlike other wild waterfowl, black swans lack an expanded immune gene repertoire, lack a key viral pattern-recognition receptor in endothelial cells and mount a poorly controlled inflammatory response to HPAI. We also implicate genetic differences in SLC45A2 in the iconic plumage of the Australian black swan. Together, these data suggest that the immune system of the black swan is such that, should any avian viral infection become established in its native habitat, the survival of the black swan would be in significant peril.

The distinctive black plumage of the native Australian black swan (Cygnus atratus) is in stark contrast to the white swans that are native to Europe and North America. This unique feature has resulted in the black swan playing an important role in Australian heraldry and culture. The limited native geographic range (Australia) and relative isolation of the black swan have direct consequences for its immune repertoire and susceptibility to infectious diseases common to other parts of the world. Specifically, geographic isolation can result in founder effects and reduced immune diversity as a result of limited pathogen challenge [1]. The native Australian black swan has a remarkably distinct response to infection by highly pathogenic avian influenza (HPAI) virus compared to the closely related white swans (e.g. the mute swan; Cygnus olor) and other waterfowl [2, 3]. Unlike Mallard ducks and mute swans, the black swan is extremely sensitive to HPAI, succumbing to the disease within 2 to 3 days post-infection. This disease pathogenesis mirrors that of infected chickens, which are viewed as the species most susceptible to HPAI [3]. One of the striking features common to both black swans and chickens is that HPAI viruses preferentially infect endothelial cells, which may contribute to the disease severity in these two species [3].
These experimental studies are consistent with reports of natural infections, which suggest that captive black swans quickly succumb to HPAI whilst co-housed mute swans survive the infection [4]. Comparative genomics has played an important role in understanding species-dependent differences in HPAI pathogenesis [5] whilst also revealing the unique immune systems of many native Australian fauna [6]. However, comparative genomics is contingent upon the availability of high-quality species-specific genomes and transcriptomes. Here, we generate the first black and mute swan reference genomes and transcriptomes, including the transcriptional response of primary black swan endothelial cells to HPAI. These data show that the black swan has numerous unique characteristics, including (i) lack of an expanded immune gene repertoire, (ii) undetectable Toll-like receptor 7 (TLR7) gene expression in infected endothelial cells and (iii) a dysregulated pro-inflammatory response to viral infection that is likely to leave the species highly susceptible to viral infections such as HPAI. It is also likely that genetic differences in melanin production contribute to the distinctive black plumage of the black swan.

The chromosome-length reference genomes for the black and mute swan were constructed according to a PacBio continuous long-read (CLR) DNA Zoo pipeline and Vertebrate Genomes Pipeline 1.5 [7], respectively (Supplementary Figures 1 and 2). This included scaffolding contigs with Hi-C and Bionano maps. Curation of the assemblies identified scaffolds representing 34 autosomes plus the Z sex chromosome, which were named in descending order of physical size. We assigned final scaffolds to 34 chromosomes (including the male sex chromosome) based on the physical size. The W chromosome was not assigned a scaffold as the DNA was from a male. The expected diploid number of chromosomes in the mute swan is 80 [8].
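The size-ordered naming convention described above can be sketched in a few lines of Python. This is an illustration only: the scaffold identifiers and lengths are hypothetical placeholders rather than values from the swan assemblies, and `name_by_size` is a helper introduced here, not part of the DNA Zoo or Vertebrate Genomes pipelines.

```python
def name_by_size(scaffold_lengths, sex_map=None):
    """Return {scaffold_id: chromosome_name}, numbering autosomes by descending length."""
    sex_map = dict(sex_map or {})
    autosomes = [s for s in scaffold_lengths if s not in sex_map]
    # Largest scaffold first; ties are broken by scaffold id so the result is deterministic.
    autosomes.sort(key=lambda s: (-scaffold_lengths[s], s))
    names = {s: f"chr{rank}" for rank, s in enumerate(autosomes, start=1)}
    names.update(sex_map)  # sex chromosomes keep the names supplied by the caller
    return names


# Hypothetical scaffold lengths in base pairs (placeholders, not swan data).
example_lengths = {
    "scaffold_a": 120_000_000,
    "scaffold_b": 200_000_000,
    "scaffold_c": 35_000_000,
    "scaffold_z": 80_000_000,
}
chromosome_names = name_by_size(example_lengths, sex_map={"scaffold_z": "chrZ"})
print(chromosome_names)
```

Handling the sex chromosome separately mirrors the text, where the Z chromosome is named by identity rather than by its rank in the size ordering.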
The black swan genome was sequenced using 90x PacBio CLR coverage, generating a 1.12 Gb reference assembly. The mute swan genome was sequenced using 60x PacBio CLR coverage, generating a 1.13 Gb reference assembly. The details of both genomes, including a comparison to the latest VGP chicken (Gallus gallus) and Mallard duck (Anas platyrhynchos) genomes, are shown in Table 1. The total classified repeat content was 10.56% for the black swan genome, with 1.71% unclassified repeats, and 10.76% for the mute swan genome, with 1.55% unclassified. This is lower than the repeat content recorded in the chicken (15%) and the Mallard duck (17%) [9, 10].

The black and mute swan genomes were annotated with RNASeq and IsoSeq transcriptome data, homology-based alignments with other species and bioinformatically inferred gene predictions, according to the methods listed in Supplementary Figure 5. The completeness of the black and mute swan genomes was assessed using Core Eukaryotic Genes Mapping Approach (CEGMA) and Benchmarking Universal Single-Copy Orthologs (BUSCO) analyses and compared to the chicken and Mallard duck genomes (Supplementary Tables 1 and 2). Notably, the black swan genome had the highest number of complete BUSCOs (8093), followed by the chicken (8054) and the mute swan (8010) genomes. Whilst the chicken genome had the highest complete core-eukaryotic gene content (224), this was only marginally higher than that of the mute (221) and black swan (219) genomes.

One-to-one alignment between the black and mute swan genomes showed 98.35% average nucleotide identity across the 34 autosomes (Supplementary Figure 3). The Z chromosomes of the black and mute swan had several large (>10 kb) structural variants (Supplementary Figure 4) but were otherwise largely consistent.
These structural differences in the Z chromosome may be associated with the speciation of the black and mute swan from their last common ancestor (approximately 6.1 million years ago) [11]. Structural genomic differences have been associated with the differential susceptibility of chickens and ducks to avian influenza [12]. We found no substantive inversions between the black and mute swan. However, given that both ducks and swans are Anseriformes, we compared structural genomic differences between the swans (the susceptible black swan and the mute swan) and the relatively resistant duck, using the ostrich as an outgroup. We investigated genes in the inverted genomic regions on chromosome 1 that are present in the duck but absent in the swans. Strikingly, we found that 53 inverted genes (out of 1758 total genes) on chromosome 1 of the duck genome mapped to immune system processes (Supplementary Table 3). However, given the absence of substantive structural variants between the black and mute swan, it is likely that any immunological consequences of these structural variants would be present in multiple swan species.

The black and mute swan genomes were then annotated according to the methods listed in Supplementary Figure 5. A total of 16,204 final gene models were obtained through EVidenceModeler for the black swan and 15,789 for the mute swan. Protein alignment against the UniProt/Swiss-Prot database was used to infer 15,478 gene model names for the black swan and 14,791 gene model names for the mute swan.

One of the most remarkable features of the black swan is its distinct plumage pattern. To determine if the highly annotated genomes presented herein could offer insight into the iconic plumage, we examined four genes (SLC45A2, SLMO2, ATP5e and EDN3) known to be involved in avian plumage colour [13, 14]. Three of the four genes, SLMO2 (PRELID3B), ATP5e and EDN3, shared 100% amino acid identity between the black and mute swan (data not shown).
The exception was SLC45A2, which in the mute swan had a nucleotide deletion in the first open reading frame, causing a frame-shift mutation and a premature in-frame stop codon (Figure 2). Multiple non-homologous nucleotides were detected in the chicken and duck SLC45A2 relative to that of the black swan (Figure 2). SLC45A2 encodes a membrane-associated transport protein that regulates tyrosinase activity and the melanin content of melanosomes [15]. Knockdown of this gene causes low melanin content and reduced tyrosinase activity in human melanoma cell lines [16]. These results suggest that this deletion in SLC45A2 is a candidate genetic change that could be responsible for the white plumage of white swans in the genus Cygnus.

To understand whether the relative isolation of the black swan has resulted in an altered immune gene repertoire, evolutionary gene gain and loss were determined (Figure 3; p-value < 0.05). The black swan genome was estimated to be contractive, indicating that total gene gain was less than gene loss since the last common ancestor. The biological function of expanded genes in the black swan, chicken, mute swan and duck was then investigated. Strikingly, immune system processes (e.g. GO:0002376) were only expanded in the Mallard duck and the mute swan genomes (Figure 4). In contrast, in chickens, expanded gene families were associated with regulation of GTPase activity, extracellular matrix and structure organization, whilst the over-represented functional terms for expanded black swan gene families included cell-matrix adhesion, cell-substrate adhesion, extracellular matrix organization and extracellular structure organization. To specifically compare the immune gene repertoire of black and mute swans, we used human and mouse immune genes to identify immune gene families in Cygnus species.
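The gain/loss bookkeeping behind the "contractive" and "expanded" labels used in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the study: the family names and copy numbers are invented, and `tally_changes` and `classify` are hypothetical helpers introduced here.

```python
def tally_changes(ancestral, current):
    """Sum gene copies gained and lost across families relative to an ancestor.

    Only families present in `ancestral` are considered; a family missing
    from `current` is treated as having zero copies in that lineage.
    """
    gained = lost = 0
    for family, ancestral_copies in ancestral.items():
        delta = current.get(family, 0) - ancestral_copies
        if delta > 0:
            gained += delta
        elif delta < 0:
            lost += -delta
    return gained, lost


def classify(gained, lost):
    """Label a lineage by the sign of its net change in gene copies."""
    if gained < lost:
        return "contractive"
    if gained > lost:
        return "expansive"
    return "balanced"


# Invented copy numbers for three gene families (not data from Figure 3).
ancestral_counts = {"family_1": 3, "family_2": 5, "family_3": 2}
lineage_counts = {"family_1": 4, "family_2": 2, "family_3": 2}

gained, lost = tally_changes(ancestral_counts, lineage_counts)
label = classify(gained, lost)
print(gained, lost, label)
```

In this toy case one copy is gained and three are lost, so the lineage is labelled contractive. The same comparison restricted to immune-related families gives the per-family view reported next.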
Thirty-nine immune-related gene families of the black swan were contractive compared to the mute swan (Supplementary Table 4). The PANTHER pathways related to these genes included apoptosis signalling, cadherin signalling, general transcription by RNA polymerase, gonadotropin-releasing hormone receptor, inflammation mediated by chemokine and cytokine signalling, interleukin signalling, TGF-beta signalling and Wnt signalling pathways.

MHC diversity is altered in some avian species [17], which may affect susceptibility to disease [18]. We therefore compared MHC loci between black and mute swans. Two MHC class I and two MHC class II loci were identified in the black and mute swan (Supplementary Figure 6). These were located on chromosome 33 in the swan genomes. A similar number of MHC complex associated genes were identified in each species. None of these genes appeared to be pseudogenes. Compared to mammals, both the mute and the black swan have a compact, relatively simple MHC B locus (Supplementary Figure 7), with two class IIb (so-called BLB) genes followed by a pair of class I (so-called BF) genes that flank the TAP genes. The TAPBP gene in both birds, unlike in chickens, does not flank the two class IIb genes [19]. Overall, the MHC regions of the black and mute swan share a similar genomic landscape and represent a so-called minimal essential MHC similar to that of the chicken and Mallard duck [20]. It is therefore unlikely that differences in the MHC complex contribute to species-specific differences in the response to HPAI virus infection.

TLR7 functions as a pathogen recognition receptor (PRR) that recognizes single-stranded viral RNA [21]. TLR7 has been duplicated independently in several avian species [22] and differences in TLR7 tropism and function have been associated with the increased resistance of ducks to HPAI [23]. There was no notable structural difference in the TLR7 gene between the black and mute swan genomes.
Strikingly, however, TLR7 expression signals were detected in the IsoSeq analysis of mute swans but not in that of black swans (Supplementary Figure 8). To independently confirm these data, we investigated the expression of TLR7 using qRT-PCR in black swan tissues collected post-mortem. TLR7 mRNA could not be detected in any of the collected black swan tissues (Supplementary Table 5). As TLR7 expression can be induced by interferon, we reasoned that gene expression in the black swan may only be detected in the presence of virus infection. Accordingly, we sought to establish an in vitro model of HPAI infection in black swans.

In black swans experimentally infected with HPAI virus, endothelial cells are the primary target of infection [3] (Supplementary Figure 9). We therefore cultured primary black swan endothelial cells according to our previously described protocol for avian species [24], and endothelial cell identity was confirmed by tube formation, uptake of acetylated low-density lipoprotein, von Willebrand factor expression and the absence of CD45 expression (Supplementary Figure 10). Chicken, duck and black swan endothelial cells were infected with A/Chicken/Vietnam/008/2004/H5N1 (VN04) and six hours later gene expression was examined by RNASeq. PCA plots showed that mock and virus-infected samples clustered separately for all three species (Figure 5). Viral RNA was detected in the infected endothelial cells of all three species (data not shown). Importantly, whilst TLR7 transcription was upregulated (although not to a statistically significant degree) in infected duck and chicken endothelial cells, TLR7 transcription could not be detected in infected or naïve black swan endothelial cells (Table 2). Moreover, whilst MyD88, the downstream adaptor of TLR7, was upregulated in infected duck and chicken endothelial cells, it was downregulated in infected black swan endothelial cells (Table 2).
These data are consistent with the absence of TLR7 expression in black swan endothelial cells, despite an apparently intact TLR7 gene in the genome.

The control and the infected groups showed intergroup clustering, indicating differences in whole-transcriptome profiles between the control and the infected group in each species.

In general, black swan and chicken endothelial cells upregulated more cytokines and cytokine receptors than duck endothelial cells in response to HPAI VN04 infection. Indeed, the highest number of cytokines and cytokine receptors was upregulated by infected chicken endothelial cells (Supplementary Table 9).

In infected black swan endothelial cells, 113 GO terms were significantly enriched (Supplementary Table 10; Figure 6A). Many of these GO terms were associated with innate immunity, the cytokine signalling response and chemokine signalling. Several innate immunity pathways were increased in response to viral infection (z-score > 0), whilst GO terms such as negative regulation of MAP kinase activity and negative regulation of JNK cascade were decreased (z-score < 0). Similarly, the 123 GO terms enriched in infected chicken endothelial cells included positive regulation of viral response and regulation of leukocyte chemotaxis (Supplementary Table 11; Figure 6B). Terms such as leukocyte mediated cytotoxicity were increased after infection (z-score > 0), whilst negative regulation of apoptotic signalling and the positive regulation of innate immune responses were decreased. Strikingly, GO biological process terms enriched in infected duck endothelial cells were not primarily associated with the innate immune response (Supplementary Table 12; Figure 6C). Rather, most genes were linked to cellular biological signalling and activity. This finding is consistent with our previous study of HPAI viruses in duck endothelial cells [25].
Interestingly, in direct contrast to black swans, the inactivation of MAPK activity was significantly increased in ducks (z-score < 0). Due to the wide-ranging role of the MAPK cascade, including in pro-inflammatory responses, we further investigated the expression profiles of the genes and identified ten genes involved in the "inactivation of MAPK pathway", five of which (DUSP1, DUSP4, DUSP7, DUSP10 and RGS3) were significantly downregulated in black swans (Supplementary Figure 13). Dual-specificity phosphatases (DUSPs) are negative regulators of MAPKs and their associated pro-inflammatory effects [26]. Accordingly, we specifically examined the differential expression of DUSPs across the three avian species. In the black swan, all DUSPs were either not differentially expressed or downregulated in response to infection. Similarly, in the chicken, DUSP1, DUSP5, DUSP7, DUSP10, DUSP15 and DUSP16 were significantly downregulated in response to infection (Supplementary Table 13). In contrast, in the duck, all DUSPs (except for DUSP15) were upregulated. This transcriptional profile is consistent with poor regulation of a pro-inflammatory response to HPAI virus in black swans.

The bars represent the enrichment score for the corresponding GO biological term with a p-value < 0.05.

The black and mute swan reference genomes provided herein represent the first publicly available swan genomes. The analyses of these genomes, together with the first black swan transcriptome in response to HPAI virus infection, have provided a unique insight into the plumage and immune system of the black swan. The genomic insights provided by the present study were only possible due to the growing availability and accessibility of third-generation sequencing.
Specifically, older technologies that generate short-read sequences can result in incorrect assembly, annotation errors and a large amount of manual effort to correct individual genes [7]. In contrast, and consistent with the broader goals of the Vertebrate Genomes Project [7], the use of longer read sequences herein allowed us to generate black swan and mute swan genomes that were scaffolded to near-chromosomal length and that were of comparable quality to the well-annotated chicken reference genome.

Genomic analysis of four genes known to be associated with plumage colour in other birds [27] identified a potential frame-shift mutation in the first exon of SLC45A2 in the mute swan, which may have led to pseudogenization of this gene. SLC45A2 encodes a transporter protein involved in melanin synthesis and is considered one of the most important proteins affecting human pigmentation [28]. Mutations in the SLC45A2 gene have been reported in albinism in humans [29]. Furthermore, mutations in the gene have been associated with plumage colour variation in Japanese quails [30], indicating the importance of SLC45A2 in avian plumage. Interestingly, should a mutation of SLC45A2 have resulted in the differential plumage of the black and mute swan, it would suggest that the last common ancestor of these birds was, in fact, black. This is in direct contrast to the metaphor of 'black swan events', which are so defined because of their unprecedented and unexpected nature. Instead, it would appear that at one point in history black plumage for the swan was the norm rather than the exception.

Compared to the last common ancestor, mute swan and Mallard duck gene families involved in immune system processes were expansive. In contrast, no expansion in immune gene families was noted in the chicken or the black swan.
This differential immune gene expansion, and its implications for susceptibility to HPAIV, is likely compounded by the observed impaired expression of TLR7 in the endothelial cells of black swans. Interestingly, other genes that have been observed to be differentially expressed between chickens and ducks, and implicated in susceptibility to HPAIV, were not differentially expressed between infected black swan and duck endothelial cells (e.g. RIG-I and IFITM3) [31, 32, 33, 34]. It is interesting to speculate as to whether mute swan endothelial cells would express TLR7. However, the presence or absence of TLR7 in the endothelial cells of mute swans is perhaps irrelevant to the pathogenesis of HPAIV, as the virus is not heavily endothelial tropic in this species [3]. In the black swan, the observed differences in TLR7 expression in endothelial cells speak to the value of combining genomics with both primary cell culture and transcriptomics, as has recently been suggested as the new standard for comparative genomics by Stephan and colleagues [35].

Either as a result of, or in addition to, these observed immune differences, black swan endothelial cells also mounted a markedly pro-inflammatory response to HPAIV infection. We have previously reported a similar pro-inflammatory response in infected chicken endothelial cells (compared to those of ducks) and speculated that this inflammatory response leads to immunopathology in chickens in vivo [36]. Whether disease severity in black swans is driven by immunopathology remains to be determined, although it is consistent with the observed pathology in infected birds [37, 38]. In sum, it is likely that this combination of species-specific differences in the immune response contributes to the marked susceptibility of both the black swan and chicken to HPAIVs.
The observed species-dependent differences in the immune responses of swans raise the intriguing question as to why the black swan continues to thrive in its native Australia as well as in New Zealand (where it was introduced in the 19th century). This may be due to the fact that HPAI is not endemic in Australia and New Zealand. Indeed, captive populations of black swans located in parts of the world frequently exposed to HPAI are highly susceptible to severe disease [4]. The data presented in this study would therefore suggest that, should HPAI become more prevalent in the Oceania region, the ongoing survival of the black swan would be at significant risk. Moreover, many of the immune limitations described herein are not specific to avian influenza viruses. For example, TLR7 is essential in the immune recognition of a wide range of viral pathogens, including avian coronaviruses [39]. These data suggest that, should any avian endothelial-specific viral infection become established in the native habitat of the black swan, the survival of this iconic species would be in significant peril.

1. Interactive influence of infectious disease and genetic diversity in natural populations
2. Experimental infection of swans and geese with highly pathogenic avian influenza virus (H5N1) of Asian lineage
3. Influenza virus and endothelial cells: a species specific relationship
4. Israel: Avian Flu H5N8 in a Zoo and in Poultry - Media - OIE reports. 2020
5. The duck genome and transcriptome provide insight into an avian influenza virus reservoir species
6. Adaptation and conservation insights from the koala genome
7. Towards complete and error-free genome assemblies of all vertebrate species
8. A phylogenetic study of bird karyotypes
9. Three chromosome-level duck genome assemblies provide insights into genomic variation during domestication
10. The repetitive landscape of the chicken genome
11. The global diversity of birds in space and time
12. A new duck genome reveals conserved and convergently evolved chromosome architectures of birds and mammals
13. Mutations in SLC45A2 cause plumage color variation in chicken and Japanese quail
14. Breeding history and candidate genes responsible for black skin of Xichuan blackbone chicken
15. Mutations in the gene encoding B, a novel transporter protein, reduce melanin content in medaka
16. Membrane-associated transporter protein (MATP) regulates melanosomal pH and influences tyrosinase activity
17. The genomic architecture of the passerine MHC region: High repeat content and contrasting evolutionary histories of single copy and tandemly duplicated MHC genes
18. Passerine MHC: genetic variation and disease resistance in the wild
19. The chicken B locus is a minimal essential major histocompatibility complex
20. A "minimal essential Mhc" and an "unrecognized Mhc": two extremes in selection for polymorphism
21. Goose Toll-like receptor 7 (TLR7), myeloid differentiation factor 88 (MyD88) and antiviral molecules involved in anti-H5N1 highly pathogenic avian influenza virus response
22. Toll-like receptor evolution in birds: gene duplication, pseudogenization, and diversifying selection
23. Innate sensing of viruses by pattern recognition receptors in birds
24. The culture of primary duck endothelial cells for the study of avian influenza
25. Primary Chicken and Duck Endothelial Cells Display a Differential Response to Infection with Highly Pathogenic Avian Influenza Virus
26. Mitogen-activated protein kinases in innate immunity
27. Breeding history and candidate genes responsible for black skin of Xichuan blackbone chicken
28. Association of the SLC45A2 gene with physiological human hair colour variation
29. Identification of two novel mutations in the SLC45A2 gene in a Hungarian pedigree affected by unusual OCA type 4
30. Mutations in SLC45A2 cause plumage color variation in chicken and Japanese quail
31. Identification of avian RIG-I responsive genes during influenza infection
32. Association of RIG-I with innate immunity of ducks to influenza
33. A comparative analysis of host responses to avian influenza infection in ducks and chickens highlights a role for the interferon-induced transmembrane proteins in viral resistance
34. Duck Interferon-Inducible Transmembrane Protein 3 Mediates Restriction of Influenza Viruses
35. Darwinian genomics and diversity in the tree of life
36. Primary Chicken and Duck Endothelial Cells Display a Differential Response to Infection with Highly Pathogenic Avian Influenza Virus
37. Influenza virus and endothelial cells: a species specific relationship
38. Experimental infection of swans and geese with highly pathogenic avian influenza virus (H5N1) of Asian lineage
39. Wild birds as reservoirs for diverse and abundant gamma- and deltacoronaviruses

We would like to acknowledge colleagues from the Friedrich-Loeffler-Institut (Jens Peter Teifke, Robert Klopfleisch and Angele Breithaupt) for providing FFPE material. Special thanks to Christoph Schulze from the state diagnostic laboratory, who was involved in the autopsy of most of the samples used for FFPE material collection, and Ashling Charles from the DNA Zoo Australia team for routine data processing support. Hi-C data for the Black Swan was created by the DNA Zoo Consortium (www.dnazoo.org). DNA Zoo sequencing effort is supported by Illumina, Inc. E.L.

KRS is a consultant for Sanofi, Roche and NovoNordisk.
The opinions and data presented in this manuscript are of the authors and are independent of these relationships. Other authors declare no competing interests.