key: cord-0308963-10uwrr8o authors: Sabik, Olivia L; Calabrese, Gina M; Taleghani, Eric; Ackert-Bicknell, Cheryl L; Farber, Charles R title: Identification of a core module for bone mineral density through the integration of a co-expression network and GWAS data date: 2019-10-13 journal: bioRxiv DOI: 10.1101/803197 sha: 8e18bd5064b044cc308d8d2c18ce8c0c3eee9024 doc_id: 308963 cord_uid: 10uwrr8o Recently, the “omnigenic” model of the genetic architecture of complex traits proposed two general categories of causal genes, core and peripheral. Core genes are hypothesized to play a direct role in regulating disease; thus, their identification has the potential to reveal critical regulators and novel therapeutic targets. Here, we sought to identify genes with “core-like” characteristics for bone mineral density (BMD), one of the most significant predictors of osteoporotic fracture. This was accomplished by analyzing genome-wide association study (GWAS) data through the lens of a cell-type and timepoint-specific gene co-expression network for mineralizing osteoblasts. We identified a single co-expression network module that was enriched for genes implicated by GWAS and partitioned BMD heritability, correlated with in vitro osteoblast mineralization, and enriched for genes, which when mutated in humans or mice, led to a skeletal phenotype. Further characterization of this module identified four novel genes (B4GALNT3, CADM1, DOCK9, and GPR133) located within BMD GWAS loci with colocalizing expression quantitative trait loci (eQTL) and altered BMD in mouse knockouts, suggesting they are causal genetic drivers of BMD in humans. Our network-based approach identified a “core” module for BMD and provides a resource for expanding our understanding of the genetics of bone mass. Core genes have been broadly defined as those that directly influence a disease-207 relevant biological processes 11,12. Thus, severe perturbation of a core gene is more likely to 208 result in a significant impact on a phenotype, as in the case of a mouse knockout or human 209 monogenic disease. We identified all gene knockouts that produced a bone phenotype, defined 210 as either a change in BMD, bone mineral content (BMC), abnormal bone morphology, or 211 abnormal bone cell activity, by utilizing mouse knockout phenotype data from several databases 212 [26] [27] [28] [29] (Supplemental File 7) . Of the 13 modules enriched for BMD GWAS genes, two were 213 enriched for genes whose deficiency impacted bone in mice ( Figure 2D ). The purple module 214 was the most significantly enriched (OR=5.4, Padj = 1.6 x 10-34). We also compiled a list of 35 215 known drivers of monogenic bone diseases associated with osteoblast dysfunction, including 216 osteogenesis imperfecta, hyperostosis, and osteosclerosis (Supplemental File 8) 30-34. Again, 217 the purple module, containing 11 of 35 (31.4%) monogenic disease genes, was the most 218 9 significantly enriched (OR = 21.3, Padj = 6.9 x 10-9) ( Figure 2E ). Together, these independent 219 lines of evidence suggested the purple module was enriched for genes with core-like properties. Alpl 39, among many others (Supplemental File 12) . Thus, to further investigate the purple 280 module, we evaluated the expression of its genes with regards to osteoblast differentiation. To 281 do this, we utilized transcriptomic profiles collected from purified osteoblasts at multiple time 282 points across differentiation (GSE54461). Using k-means clustering, we found that the genes 283 within the purple module clearly partitioned into two distinct transcriptional profiles with regards 284 to differentiation (Figure 4A ,B). We have termed these groups the Early Differentiation 285 Submodule (EDS; high expression early and low expression late) (N=192 transcripts; 175 286 unique genes) and the Late Differentiation Submodule (LDS; low expression early and high 287 expression late) (N=423 transcripts; 323 unique genes). 288 We assessed whether there were differences between the EDS and the LDS with regard 289 to network parameters and their enrichment for functional annotations seen in the purple 290 module. We first looked at intramodular connectivity, measured by module membership 291 (correlation between the expression of each gene and the module eigengene). On average, 292 LDS genes had higher module membership scores than EDS genes (P = 3.0 x 10-4) ( Figure 293 4C), suggesting they may play more critical roles in the context of overall module behavior. 294 Additionally, the LDS was more significantly enriched for genes implicated by GWAS (OR = 3.0, 295 Padj = 5.2 x 10-10), osteoblast relevant GO terms (e.g. "ossification" (Padj = 3.24x10-14), skeletal 296 development" (Padj = 9.6 x 10-11), "osteoblast differentiation" (Padj = 1.4 x 10-4), and "biomineral 297 we analyzed histone modifications across the osteoblast genome and observed that they were 330 more likely to overlap regions marked by modifications associated with active regulatory 331 elements such as H3K4me1 (2.8x enrichment, P < 1 x 10-3), H3Kme2 (3.2x enrichment, P < 1 x 332 10-3), H3K4me3 (3.8x enrichment, P < 1 x 10-3), and H3K27ac (2.6x enrichment, P < 1 x 10-3) 333 relative to 1000 sets of random SNPs matched for allele frequency and distance from a 334 transcription start site ( Figure 5) . Additionally, we observed depletion of LDS SNPs in 335 heterochromatic regions, marked by H3K9me3 (0.14x depletion, P < 1 x 10-3). 336 To determine if the enrichments were specific to osteoblasts, we calculated the ratio 337 The overarching goal of this study was to identify causal genes from a module enriched 355 for genes with core-like properties underlying BMD GWAS associations. As described above, 48 356 Together, these data allowed us to directly link BMD associated variants to LDS genes and LDS 362 genes to pathways regulating BMD. We performed a colocalization analysis for each 363 eQTL/BMD association pair for all 48 genes in 48 GTEx tissues and identified 12 LDS genes 364 with colocalizing eQTL (PP4>0.7) (Supplemental File 14 and Figures 6A, B, C, and D) . We 365 also queried each of 12 LDS genes with a colocalizing eQTL and found that BMD had been 366 measured by the IMPC in 5 mutants. Of these, four genes (Cadm1, B4galnt3, Dock9, and 367 Osteoporosis is an increasingly common disease associated with reduced BMD and 393 negative health outcomes, namely fracture 1. Despite its significant genetic component, we do 394 not fully understand the genes and mechanisms that influence osteoporosis and its 395 determinants, such as BMD. Moreover, current therapeutics for osteoporosis have been 396 associated with rare side effects, leading to decreased compliance 47. Identification of the causal 397 genes with core-like properties that regulate BMD will help us to further understand the etiology 398 of osteoporosis and lead to the development of novel therapeutics. In this study, we identified 399 the LDS, a co-expression submodule enriched for genes with core-like properties influencing 400 BMD, by integrating a cell-and timepoint-specific co-expression network with the results of BMD 401 GWAS. We then used this information to identify four LDS genes that overlapped GWAS loci, 402 had colocalizing eQTL, and altered BMD in knockouts, suggesting they are causal for their 403 respective BMD GWAS association. 404 In this work, we hypothesized that the genes underlying BMD GWAS associations are 405 not created equal with respect to "biological importance" or membership in pathways with direct 406 impacts on bone mass. Substantial prior evidence supports this prediction 48,49and it is one of 407 the primary tenets of the omnigenic model 11,12. Identification of genes whose activity or function 408 is more proximal to BMD is important for a number of reasons. First, the identification of genes 409 with core-like properties has the potential to identify critical new players in pathways known to 410 directly impact bone and to uncover new processes essential to skeletal growth and 411 maintenance. Second, it provides a way to prioritize hundreds of BMD GWAS loci for further 412 investigation. Third, based on their central role in the regulation of BMD, it is logical to use the 413 concept of a core gene as a way to prioritize gene discovery in the context of selecting 414 promising new therapeutic targets for evaluation. 415 The omnigenic model uses a strict statistical definition to define core genes and many 416 have debated the utility of this designation11,15-17. Some have argued that focusing on core 417 genes underestimates the complexity of complex traits, attributing biologically nuanced diseases 418 to a small set of genes 13. Others have argued that the focus should not be on thoroughly 419 defining core genes, but instead on identifying the underlying biological pathways and 420 mechanisms 14. In practice, it is likely that the designation of core genes follows a spectrum 421 rather than a discrete classification. If so, then it should be possible to rank genes based on 422 their continuous "core" attributes, which would be analogous to ranking genes based on their 423 proximity to a disease or phenotype. In essence, that is what we have done in the current study 424 with the goal of identifying genes on the end of the "core" attribute distribution for mineralization. 425 Importantly, it is not likely that all genes in the LDS are causal genetic drivers or, if they are 426 causal, it is possible that several will have few core attributes. However, based on our analysis 427 and results, it is likely that many are causal genes that participate in "core" pathways and 428 processes that directly impact mineralization, bone formation, and BMD. Our study is not without limitations. First, we used gene expression data from the mouse 474 as a discovery platform, however this may limit the translational applications of the work due 475 biological differences and missing homologs between mouse and human. Secondly, this was 476 not a comprehensive study of the genetic effects driving osteoporosis, because we focused 477 exclusively on the contributions of just one cell type, bone-forming osteoblasts. In future work, 478 this approach could also be applied to other bone cell types. For example, one could use in vitro 479 measures of osteoclast activity as a filter to identify groups of genes influencing bone resorption, 480 and ultimately BMD. Finally, the eQTL comparisons made in this study were not derived from 481 expression data in bone tissue, as bone tissue expression was not measured in the GTEx 482 project. While we identified colocalizing eQTL in other tissues, these eQTL may be irrelevant to 483 BMD or the direction of eQTL effects in non-bone tissues may not reflect the direction of effect 484 in osteoblasts. 485 While we identified four novel regulators of bone mineral density, there is still much to be 486 gleaned from the late differentiation submodule. The LDS is a promising resource for two key 487 applications: (1) causal gene discovery and functional follow up and (2) studying the impact of 488 genetic variation on biological networks and complex phenotypes. There are still many genes 489 with no known connection to BMD in the LDS that are likely important to osteoblast biology and 490 mineralization. Additionally, the LDS is not just a list of candidate genes; it also provides insight 491 into the molecular hierarchy driving osteoblast differentiation and mineralization, which can 492 demonstrate how genetic variation impacts biological networks. The network topology of the 493 LDS can also be used to infer the causal relationships between genetic variants and the many 494 22 genes that influence osteoblast activity. Moving forward, the LDS can serve as a platform for the 495 identification of novel determinants of BMD and for furthering our understanding of the nuanced 496 relationship between genetic variation, molecular phenotypes, and complex traits. 497 In summary, we have used an integrative, network-based method to identify core genes 498 for the process of mineralization and BMD. While the definition of a core gene is still open to 499 debate, we found the expected properties of core genes are effective lenses through which to 500 contextualize GWAS associations. Integrating gene co-expression networks, GWAS data, in 501 vitro and in vivo phenotypic data reflecting "core" properties, and eQTL information has led us to 502 a more complete understanding of the biology and genetics of BMD. 503 Methods 505 RNA-seq: Neonatal collaborative cross heads were received from the University of North 506 Carolina. At UNC, neonatal (3-5 days) collaborative cross mice were euthanized by CO2, 507 decapitated onto paper towels soaked in 70% ethanol, and placed in cold PBS on ice for 508 overnight shipping. Once received, calvaria were dissected, paying special attention to brain 509 and interparietal bone removal. Isolated calvaria were placed in 24 well plates containing 0.5 mL 510 of digest solution (0.05% trypsin and 1.5 U/ml collagenase P) and incubated on a rocking 511 platform at 37 degrees during six, fifteen-minute digestions in 0.5 mL of digestion solution. 512 build mm9 using Bowtie, and quantified using EMASE 60. EMASE output transcript level 531 expression estimates calculated by assigning multi-mapping reads across the genome using 532 and expectation-maximization algorithm to allocate reads that differentiate between genes, then 533 isoforms of a gene, and then alleles. 534 535 WGCNA network construction: Estimated transcript count data was used as the basis for co-536 expression network construction. We removed transcripts with less than an average tpm <= 0.3 537 tpm across all samples, resulting in 29,000 transcripts used to construct the network. We used a 538 variance stabilizing transformation from the DESeq2 package that decouples the variance from 539 the mean 61. Next, we used PEER in order to remove latent confounding batch effects from our 540 data 62. As per PEER recommendations, we estimated PEER factors equal to one quarter of the 541 number of samples (N = 24) and included covariates in the calculation. We carried out the 542 downstream analysis with the residual values from PEER transformation. Finally, we used 543 quantile normalization to match the distribution of each of the samples in the analysis. 544 The resulting expression data was used to construct a signed, weighted gene co-545 expression network using the weighted gene co-expression network analysis (WGCNA) 546 24 package 63. There were no evident outliers from the hierarchical clustering analysis. The 547 pickSoftThreshold() function from the wgcna package was used to determine the power used to 548 calculate the network. The minimum power value that had an R2 >= 0.9 for the scale-free 549 topology model fit was used, and the network was calculated using a power of 9. We then used After 10 days of differentiation and mineral production, cells are washed with PBS and treated 591 with 10% NBF (1 mL per well) and incubated at room temperature for 15 minutes. The NBF is 592 removed and cells are washed with H2O (1mL x 2). Next, wells are stained with alizarin red ( 0.5 593 mLs, 40 mM @ pH 5.6) for 20 minutes on a shake plate at 120 rpm. Alizarin red stain is then 594 removed, and cells are washed 5 times with deionized H2O for 5 minutes on a shake plate at 595 180 rpm. Once rinsed, the mineralized wells are scanned, and .tiff images are retained to 596 Clinical Practice An estimate of the worldwide prevalence and disability 701 associated with osteoporotic fractures Genetics of osteoporosis Insights into the genetics of 705 osteoporosis from recent genome-wide association studies Genetic regulation of bone mass and 708 susceptibility to osteoporosis Using GWAS to identify novel therapeutic targets for 710 osteoporosis Genetic determinants of bone mass and 712 osteoporotic fracture. Principles of Bone Biology Genome-wide meta-analysis identifies 56 bone mineral density 715 loci and reveals 14 loci associated with risk of fracture Identification of 153 new loci associated with heel bone mineral 718 density and functional involvement of GPC6 in osteoporosis An Atlas of Human and Murine Genetic Influences on 721 An Expanded View of Complex Traits: From 723 Polygenic to Omnigenic Regulation of Osteoblast Differentiation by Runx2 An atlas of genetic influences on osteoporosis in humans and 730 mice The omnigenic model: Response from the 732 authors Disease Is More Complex Than Implied by the Core Gene Omnigenic Model Comments on Pritchard Paper Integrative genomics reveals novel molecular pathways and 739 gene networks for coronary artery disease Systems genetics approaches to understand complex 741 traits Adamts2 as Drivers of Isoproterenol-Induced Cardiac Hypertrophy and 744 Cardiomyopathy in Mice Identification of co-expression gene networks, regulatory 746 genes and pathways for obesity based on adipose tissue RNA Sequencing in a 747 porcine model Gene co-expression analysis identifies brain regions and cell types 749 involved in migraine pathophysiology: a GWAS-based study using the Allen 750 The Collaborative Cross, a community resource for the 752 genetic analysis of complex traits Gene 754 co-expression analysis for functional classification and gene-disease predictions Partitioning heritability by functional annotation using 757 genome-wide association summary statistics Mouse Genome Informatics (MGI, Mouse Genome Database, MGD) The International Mouse Phenotyping Consortium Web Portal, 762 a unified point of access for knockout mice and related phenotyping data High-Throughput, Multi-Image Cryohistology of Mineralized 765 Rapid phenotyping of knockout mice to identify genetic 767 determinants of bone strength How rare bone diseases have informed our knowledge of complex 769 diseases MECHANISMS IN ENDOCRINOLOGY: Genetics of 771 human bone formation Mendelian bone fragility disorders Genetic determinants of osteoporosis: common bases to 775 cardiovascular diseases? Genetics of osteoporosis: searching 777 for candidate genes for bone fragility SP7 inhibits osteoblast differentiation at a late stage in mice SOST is a ligand for LRP5/LRP6 and a Wnt 781 signaling inhibitor Sclerostin is a locally acting regulator of late-783 osteoblast/preosteocyte differentiation and regulates mineralization through a 784 MEPE-ASARM-dependent mechanism Osteocalcin secretion as an early marker of in vitro osteogenic 787 differentiation of rat mesenchymal stem cells The role of alkaline phosphatase in 790 mineralization Functional 792 involvement of PHOSPHO1 in matrix vesicle-mediated skeletal mineralization Skeletal and extra-skeletal 795 effects FAM20C regulates osteoblast behaviors and intracellular signaling 797 pathways in a cell-autonomous manner MEPE: A Mineralization Regulating Bone Matrix Protein MEPE regulation of 802 phosphate homeostasis and skeletal mineralization The NIH Roadmap Epigenomics Program data resource Genetic effects on gene expression across human tissues Fearing drugs' rare side effects, millions take their chances with 809 osteoporosis Finding the missing heritability of complex diseases International Schizophrenia Consortium et al. Common polygenic variation 813 contributes to risk of schizophrenia and bipolar disorder Identification of a gene module associated with BMD through the 816 integration of network analysis and genome-wide association data Integrating GWAS and Co-expression Network Data 819 Identifies Bone Mineral Density Genes SPTBN1 and MARK3 and an Osteoblast 820 Functional Module Role of the spermatogenic-Sertoli cell interaction 822 through cell adhesion molecule-1 (CADM1) in spermatogenesis CADM1 regulates the G1/S transition and represses 825 tumorigenicity through the Rb-E2F pathway in hepatocellular carcinoma miR-21 enhances cardiac fibrotic remodeling and 828 fibroblast proliferation via CADM1/STAT3 pathway Negative feedback loop of bone resorption by NFATc1-831 dependent induction of Cadm1 Cell adhesion molecule 1 is a new osteoblastic cell adhesion 833 molecule and a diagnostic marker for osteosarcoma Predicting the therapeutic efficacy of MSC in bone tissue 835 engineering using the molecular marker CADM1 Zizimin1, a novel Cdc42 activator, reveals a new GEF domain for Rho proteins FastQC: a quality control tool for high throughput sequence 841 data Hierarchical analysis of RNA-seq reads improves the 843 accuracy of allele-specific expression Moderated estimation of fold change and 845 dispersion for RNA-seq data with DESeq2 Using probabilistic 847 estimation of expression residuals (PEER) to obtain increased power and 848 interpretability of gene expression analyses WGCNA: an R package for weighted correlation 850 network analysis ToppGene Suite for gene list 852 enrichment analysis and candidate gene prioritization LDlink: a web-based application for exploring 855 population-specific haplotype structure and linking correlated alleles of possible 856 functional variants Software for computing and annotating genomic ranges The Ensembl genome database project Bayesian test for colocalisation between pairs of genetic 862 association studies using summary statistics RACER: A data visualization strategy for exploring 864 multiple genetic associations Tool Kit for Standardized Analysis of High Throughput Phenotypic Data extract geometric parameters of the mineral deposits. After imaging, the wells are de-stained by 597 incubation with 5% perchloric acid (1 mL) at room temperature for 5 minutes while shaking at 598 120 rpm. Eluent is collected and read at 405 nm. The levels of in vitro mineralization varied 599 significantly across the population, with a 63-fold change from the highest to lowest 600 mineralization samples (max_mmAR = 2.995993, min_mmAR = 0.04719). 601In this population, in vitro mineralization had a heritability of 47.8%(p=1.8x10-46), 602indicating that the between-strain variation is larger than the within strain variation and that there 603 is a genetic contribution to the process of mineralization. Using the WGCNA package, the 604 eigengene of each module was calculated, and the correlation between the eigengene and the 605 in vitro mineralization phenotype was calculated using the cor() function in R. The p-values 606 associated with the correlation between the module eigengenes and in vitro mineralization were 607 corrected for multiple testing using a Bonferroni correction and a p-value cutoff of 0.05 was 608 applied to the adjusted p-values. Specifically, we pulled BMD, altered bone morphology, altered bone cell activity, changes in 618 ossification or mineralization, or association with a known bone disease from the MGI database. 619The OBCD database contained genes with changes in bone mineral content (BMC), bone 620 volume fraction (BV/TV), and BMD of the femur and BMD of the vertebra. We mined the IMPC 621 database for any genes with altered BMD, and we pulled all Bonebase genes with altered 622 27 BV/TV in the femur or vertebra. This resulted in a list of 923 unique "bone" genes 623 (Supplemental File 7). 624We also curated a list of genes associated with monogenic bone disorders using a 625 literature review, specifically focusing on genes that disrupt osteoblast function, leading to 626 monogenic bone disorders 30-34) (Supplemental File 8) . We used a fisher's exact test to 627 measure the statistical significance of the representation of genes with associated mouse 628 knockout bone phenotypes and monogenic bone disease in each module. We then applied a