key: cord-0287613-qtu9xx9z authors: Keele, Gregory R.; Zhang, Tian; Pham, Duy T.; Vincent, Matthew; Bell, Timothy A.; Hock, Pablo; Shaw, Ginger D.; Munger, Steven C.; de Villena, Fernando Pardo-Manuel; Ferris, Martin T.; Gygi, Steven P.; Churchill, Gary A. title: Regulation of protein abundance in genetically diverse mouse populations date: 2020-09-23 journal: bioRxiv DOI: 10.1101/2020.09.18.296657 sha: 1e72dd06d651ce750b90bdcad8dc1e9618dbe0c4 doc_id: 287613 cord_uid: qtu9xx9z Proteins constitute much of the structure and functional machinery of cells, forming signaling networks, metabolic pathways, and large multi-component complexes. Protein abundance is regulated at multiple levels spanning transcription, translation, recycling, and degradation to maintain proper balance and optimal function. To better understand how protein abundances are maintained across varying genetic backgrounds, we analyzed liver proteomes of three genetically diverse mouse populations. We observe strong concordance of genetic and sex effects across populations. Differences between the populations arise from the contributions of additive, dominance, and epistatic components of heritable variation. We find that the influence of genetic variation on proteins that form complexes relates to their co-abundance. We identify effects on protein abundance from mutations that arose and became fixed during breeding and can lead to unique regulatory responses and disease states. Genetically diverse mouse populations provide powerful tools for understanding proteome regulation and its relationship to whole-organism phenotypes. Proteins constitute much of the structure and functional machinery of cells, forming signaling 23 networks, metabolic pathways, and large multi-component complexes. Protein abundance is 24 regulated at multiple levels spanning transcription, translation, recycling, and degradation to 25 maintain proper balance and optimal function. To better understand how protein abundances 26 are maintained across varying genetic backgrounds, we analyzed liver proteomes of three 27 genetically diverse mouse populations. We observe strong concordance of genetic and sex 28 effects across populations. Differences between the populations arise from the contributions of 29 additive, dominance, and epistatic components of heritable variation. We find that the 30 influence of genetic variation on proteins that form complexes relates to their co-abundance. 31 We identify effects on protein abundance from mutations that arose and became fixed during 32 breeding and can lead to unique regulatory responses and disease states. Genetically diverse 33 mouse populations provide powerful tools for understanding proteome regulation and its 34 relationship to whole-organism phenotypes. 35 36 37 38 Key words 39 proteomics, protein complex, systems genetics, pQTL, Collaborative Cross, Diversity Outbred, 40 inbred strains 41 Protein abundance is regulated at multiple levels spanning transcription, translation, recycling, 43 and degradation. It is responsive to genetic variation, as observed in yeast (Picotti et In this work, we obtained multiplexed mass spectrometry (mass-spec) quantification of proteins 79 in liver samples of 116 CC mice representing female/male pairs from 58 strains. We previously 80 collected proteomics data from the livers of 192 DO mice and 32 mice representing the eight 81 founder strains (two animals of each sex per founder strain) (Chick et al., 2016) (Figure 1a) . We 82 map protein quantitative trait loci (pQTL) in both the CC and DO and find consistency of the 83 genetic and sex effects on protein abundance across the populations. We characterize the 84 genetic architecture of protein-complexes in both the CC and DO, and find that complexes with 85 highly co-abundant members are heritable and exhibit non-additive and/or highly polygenic 86 genetic architecture, which is more apparent in the CC. Finally, we identify genetic variants that 87 originated and became fixed during the of the CC and contribute to CC strain-specific 88 phenotypes. Our work demonstrates the complementary strengths of these genetic resources 89 and offers new insights into the genetic regulation of protein abundance. 90 91 RESULTS 92 We quantified the abundance of 6,779 proteins in liver tissue from 58 inbred CC strains, one 94 female and male per strain. We previously reported quantification of proteins from liver tissue 95 of 192 outbred DO mice and 32 mice representing the eight founder strains (two per sex per 96 strain) (Chick et al., 2016) . The data for DO and founder strains were re-analyzed for this study 97 to ensure that all data were processed consistently (Methods), resulting in the quantification of 98 6,588 and 6,950 proteins, respectively. From the 9,226 total proteins detected, 4,434 were 99 observed in all three populations (Figure 1b ; Table S1 ). 100 We estimated protein abundance heritability (h 2 ), which reflects the combined effects of 101 genetic variants, their allele frequencies, and the large-scale genetic background (e.g., inbred or 102 outbred) of each population (Figure 1c) . Heritability estimates were highest in the CC and 103 founder strains, due to capturing variation from non-additive genetic effects (i.e., recessive and 104 epistatic) in inbred strains. Heritability estimates in the DO are limited to additive genetic 105 factors (narrow sense heritability; Lynch and Walsh 1998) that are captured in the genetic 106 relatedness of our sample. Heritability estimates were significantly correlated across 107 populations (r > 0.4, p <2.2e-16), suggesting that much of the controlling genetic variation is 108 conserved across populations. 109 In order to identify the genetic loci that drive variation in protein abundance, we carried out 110 protein quantitative trait locus (pQTL) mapping in the CC and DO (Table S2) . To determine 111 significant pQTL, we first applied a permutation analysis (Doerge and Churchill, 1996) to control 112 genome-wide error rate for each protein and then applied a false discovery rate adjustment 113 (FDR < 0.1) across proteins (Chesler et al., 2005) to establish a stringent detection threshold for 114 pQTL. Using this criterion, we identified 1,199 local and 289 distal pQTL in the CC and 1,652 115 3 116 (Figure 2a & b) , where local is defined as when the pQTL is 132 located within 10 Mbp of the midpoint of the protein-coding gene. We also identified a local 133 pQTL on the mitochondrial genome in the CC for mt-Nd1 ( Figure S1d) . In order to prevent the 134 exclusion of true pQTL due to overly stringent control of false positive and accurately compare 135 pQTL discovery across populations, we carried out a parallel analysis with more lenient FDR 136 control (FDR < 0.5; Figure S1a & b). 137 We next restricted the set of proteins to the 4,556 that were detected and analyzed in both 138 populations in order to compare genetic effects between the CC and DO. (Figure 2c) . Among 139 1,441 local pQTL stringently detected for the shared proteins, 639 were detected in both 140 populations (Figure 2d ). To determine if these local pQTL were driven by the same genetic 141 variants, we compared the estimated haplotype effects at each pQTL (Methods) and found that 142 622 (97.3%) were significantly positively correlated (Figure 2e ; Table S3 ). To assess whether 143 pQTL detected in only one population are population-specific or fell below the detection 144 threshold in the other population but are likely driven by the same genetic variation, we 145 compared the haplotype effects of detected pQTL to effects estimated at the corresponding 146 locus in the other population, finding that 1,239 (86.0%) were significantly positively correlated. 147 This trend for local pQTL holds true for lenient detection (Figure S1f & g; Table S3 ), indicating 148 that local genetic effects on proteins are highly conserved between the CC and DO populations, 149 even when failing to pass the threshold of detection in one of the populations. 150 The founder strains provide additional support for local pQTL in the CC and DO (Figure S1k-n) , 151 particularly for those that are challenging to detect due to rare alleles in the CC or DO (e.g., a 152 founder allele observed in three or fewer CC strains). We selected all genes with rare local 153 founder alleles that did not have a leniently detected pQTL in the CC, representing 2,410 genes. 154 We correlated the haplotype effects estimated at the locus closest to the gene transcription 155 start sites (TSS) with the protein abundance in the founders (Methods) and found significant 156 positive correlation for 196 genes, such as Cyp2j5. The conservation of local genetic effects 157 among the founder strains, CC, and DO can be striking (such as in Gosr2 and Cyp2d22; Figure 158 S2). The three populations together identify a more complete catalog of local pQTL than any 159 one alone. In total we find evidence to support local genetic effects on abundance for 3,004 160 proteins observed across the CC and DO. 161 Of the 183 distal pQTL stringently detected in the CC and 278 in the DO for the shared set of 162 proteins, we found 19 that were stringently detected in both populations. All 19 also had 163 significantly correlated haplotype effects (Figure 2h ; only stringently detected in only one population, we identified an additional 45 shared distal 167 pQTL between the CC and DO populations (Figure 2i ; Table S3 ). they unobserved, such as proteins that failed to be detected by mass-spec or functional non-198 coding RNA. We identified candidate mediators for each of the 19 shared distal pQTL (Figure 199 2j). The same mediator was identified in the CC and DO for 14 of these -we note that BCKDHB, 200 CCDC93, and IKBKAP are each candidate mediators of the distal pQTL for two proteins (CCDC93 201 for Ccdc22 and Fam45a, BCKDHB for Bckdha and Ppm1k for, and IKBKAP for Elp2 and Elp3). For 202 four of the 19 distal pQTL, the best candidate mediator was different for the CC and DO. For 203 example, TUBGCP3 is the strongest mediator of distal pQTL of Tubg1 and Tubgcp2 in the CC but 204 NAXD is the strongest mediator in the DO (Figure 2l) . Given that Tubg1, Tubgcp2, and Tubgcp3 205 are all members of the tubulin superfamily, TUBGCP3 is a stronger candidate based on 206 functional overlap. A true mediator should possess a local pQTL that co-localizes with the distal 207 pQTL. Notably, the local pQTL for Naxd is much stronger in comparison to the local pQTL of 208 Tubgcp3 in the DO (LODNaxd = 40.2 compared to LODTubgcp3 = 11.0) whereas in the CC, they are 209 more comparable (LODNaxd = 9.7 compared to LODTubgcp3 = 7.9). The local pQTL of Naxd in the 210 DO may be acting as a surrogate for the local genotype, and thus outperforming TUBGCP3. 211 Similarly, COPS8 is the strongest mediator for distal pQTL of Cops6 and Cops7a in the CC, 212 whereas in the DO, COPS8 was not detected as a mediator and notably did not have a local 213 pQTL (even leniently detected) in the DO, suggesting that COPS8 may be less accurately 214 measured in our DO sample population. The final case of discordance was for the distal pQTL of 215 Dnajb9, for which UGT3A2 was identified as a mediator in the DO whereas no candidate was 216 found in the CC. Ugt3a2 possesses local pQTL in both the CC and DO, which is notably strong in 217 the DO (LODUgt3a2 = 36.9), suggesting it may be a false positive mediator in the DO and the true 218 mediator was unobserved for both populations. We considered all distal pQTL that were 219 stringently detected (FDR < 0.1) in one of the populations, and evaluated the corresponding 220 pQTL status (stringent, lenient, or not detected) and mediation status (e.g., same or different 221 mediator) in the other population (Figure S1o) , which revealed similar levels of concordance in 222 mediation for pQTL detected in only one of the populations. 223 When we expand our pQTL comparisons to include all leniently detected pQTL, 1,462 proteins 224 have significantly correlated local pQTL (223 more than with stringent detection in one 225 population; Figure S1g ). In contrast, fewer distal pQTL with significantly correlated effects are 226 detected (22 compared to 64) ( Figure S1j ). Some pQTL that fail to meet the stringent threshold 227 in either population still replicate across the CC and DO, as is the case for 12 of the 41 proteins 228 with distal pQTL and correlated haplotype effects (Figure S1i). One interesting example is Ercc3 229 (Figure S3 ), which has distal pQTL near a region on chromosome 7 that contains Gtf2h1 and 230 Ercc2, which all strongly associate with each other based on protein-protein interactions 231 ( (22.3%) proteins in the founder strains. The differences between male and female were 240 overwhelmingly in the same direction for all populations (Figure 1g -i). Gene set enrichment 241 analysis revealed that proteins related to ribosomes, translation, and protein transport gene 242 ontology (GO) terms were more abundant in male livers whereas proteins related to catabolic 243 and metabolic processes, including fatty acid metabolism, were more abundant in female livers 244 in all populations. 245 246 Drivers of variation in the abundance of protein-complexes 247 Members of protein-complexes exhibit varying degrees of co-abundance, with some groups of 248 proteins being tightly co-abundant, i.e., correlated, and others less so (Romanov et al., 2019) . 249 Correlation between members of a complex suggests some degree of co-regulation. We found 250 that Table S6 ). However, protein-265 complexes can be influenced by genetic variation, as we previously reported for the Chaperonin 266 containing T (CCT)-complex that is stoichiometrically regulated in response to the low 267 abundance of a member protein, CCT6A (Chick et al., 2016) . We evaluated the extent to which 268 members of annotated protein-complexes were internally correlated, as well as how genetic 269 factors and sex contribute to variation in their joint abundance. We summarized co-abundance, 270 which we refer to as complex cohesiveness, as the median pairwise correlation between 271 complex members (Romanov et al., 2019) . To assess the contributions from genetic factors and 272 sex, we summarized each protein-complex using the PC1 from member proteins. We first 273 filtered proteins with local pQTL (FDR < 0.5) or strong distal pQTL (FDR < 0.1) to minimize the 274 influence of individual proteins with independent genetic effects in order to focus on the 275 shared genetic effects on a protein-complex. We estimated heritability and the proportion of 276 variation explained by sex for each complex-specific PC1 (Table S7) . is homozygous in seven CC strains, whereas in our DO cohort, there were no mice homozygous 319 for the PWK allele ( Figure S5b ). Heterozygous carriers of the PWK allele in the DO mice 320 exhibited no discernable effect, suggesting that the exosome complex is regulated by a 321 recessive PWK allele at Exosc7 (Figure S5c) , which is consistent with founder PWK mice having 322 low EXOSC7 abundance ( Figure S5d ). Mediation analysis indicates that genetic variation at the 323 Exosc7 locus distally regulates the complex as well as the functionally related genes, Dis3L and 324 Etf1. These two genes were not included in the original complex annotations, and their co-325 regulation suggests new biological interactions discovered via our approach. The complex-326 heritability (with Dis3l and Etf1 included as well as complex members previously filtered out 327 due to possessing pQTL) was 91.3%, and after removing the seven strains with the PWK allele at 328 Exosc7, was reduced but still high at 61.4%, indicating the presence of additional genetic factors 329 that affect the abundance of the exosome. 330 331 Secondary genetic effect on the chaperonin complex 332 Previously we found that the CCT complex was stoichiometrically regulated and driven by low 333 abundance of CCT6A when the NOD haplotype is present ( The DO sample is well-powered to detect the low NOD allele that drives 336 the complex-wide regulation, due to 19 (9.9%) of the mice being homozygous NOD. The CC 337 strains replicate the distal pQTL at the locus of Cct6a through a low NOD effect for complex 338 members (Cct4, Cct5, Cct8, and Tcp1) -but CCT6a itself was not quantified in the CC samples. 339 The effect of the pQTL at Cct6a drives less of the overall variation in the CC sample due to fewer 340 occurrences of NOD homozygotes (12 CC mice in comparison to 19 DO mice). The CC reveals a 341 secondary genetic effect mediated through CCT4, with high NZO and PWK alleles. After 342 including complex members that were previously filtered out for possessing pQTL, the complex-343 heritability was 56.3%, which, after excluding CC strains that are NOD at Cct6a and NZO or PWK 344 at Cct4, was still significant at 46.6% (Figure 5h ), indicating that, as with the exosome complex, 345 additional genetic effects contribute to CCT complex abundance. inducible forms suggests that although individual samples vary in the relative abundance of the 384 constitutive proteasome and immunoproteasome, they predominantly express one of the 385 forms. Across the founder strains, this relationship appears to be genetically regulated, with the 386 WSB, AJ, and NZO strains expressing more immunoproteasome, and the others expressing 387 more of the constitutive form (Figure S6b-g) . In the recombinant CC and DO, we identified a 388 genetic variant that controls the balance between PSMB6 (constitutive) and PSMB9 389 (immunoproteasome). In both the CC and DO, genetic variation near Psmb9 affected PSMB9 390 abundance, as well as PSMB6 abundance, confirmed through mediation analysis (Figure 6i & j) . 391 Consistent with the anti-correlation between inducible and constitutive subunits as well as the 392 balance in the founder strains, the relationship between PSMB9 and PSMB6 is negative, e.g., 393 mice that inherited the WSB haplotype at the Psmb9 locus have high PSMB9 abundance and 394 low PSMB6 abundance (Figure 6k Genetic factors also influence other components of the 26S proteasome. We identified a strong 403 local pQTL that is consistent in both the CC and DO for Psmd9 that does not affect other 404 members of 19S regulator (Figure 6g & h) , thus explaining the lack of cohesiveness of PSMD9 405 within the proteasome, which was previously observed in the DO (Romanov et al., 2019) . A 406 distal pQTL hotspot comprising members of the 20S catalytic core and 19S regulator mapped to 407 a short interval on chromosome 1 in the CC, most strongly observed in Psma1 (Figure 6m) . No 408 single, strong mediator was detected for the hotspot, though PEX19 was the best candidate for 409 a number of the proteins, suggesting there may be multiple drivers of the hotspot or that the 410 true driver was not observed at the protein-level. 411 412 Polygenic regulation of the mitochondrial ribosomal small subunit 413 The mitochondrial ribosomal small subunit was highly cohesive in both populations (Figures 3a, 414 b, 6n, & o) . The complex-heritability is also high, and after including all annotated members, 415 was 74.0% in the CC and 51.9% in the DO. Despite high complex-heritability, we detected few 416 pQTL for individual members of the complex. The one notable exception is Auh, for which we 417 detected a strong local pQTL in the CC and DO (Figure 6r & s) . population (Figures 6p & q) . This distribution contrasts with the bimodal pattern of abundance 451 for the exosome complex, which is driven by a single strong pQTL (Figure 4h) (Figure 1a) . As a proof of concept, we first confirmed the loss of expression of 462 proteins with known deletions, such as the 80 kbp deletion in CC026 that includes C3 and a 15 463 bp deletion for CC042 in Itgal (Figure 7a- protein's coding gene, and this level of enrichment was significant per permutation (p = 3.7e-4). 482 Interestingly, not all of the observed outliers associated with genetic variants were low 483 extremes, as would be expected with a mutation that results in loss of the protein; we also 484 observed increases in protein abundance in strains with private variants, such as Sash1, which 485 harbors a novel SNP allele in CC058 (Figure 7c ). CC004 has a unique SNP allele associated with 486 low abundance of Plek, a gene which also has a weak local pQTL based on genetic variation 487 from the founder strains (Figure S7e & f) . 488 The presence of biologically related proteins with extreme abundances specific to CC strains 489 further supports the impact of strain-specific genetic variants and regulatory patterns. We 490 identified strain-specific dynamics by testing the set of outlying proteins specific to each CC 491 strain for enrichment in GO and KEGG pathway terms (Tables S8 and S9) . In CC013, we 492 observed increased abundance in proteins with GO annotations for the innate immune system 493 (Figure 7d) , leukocytes, and other immune system-related GO terms. CC013 possesses a unique 494 SNP allele in Hcls1 that was associated with increased HCLS1 abundance (Figure 7e) , a gene 495 involved in myeloid leukocyte differentiation that may contribute to the high abundance of 496 immune-related proteins in CC013. CC013 also expressed a unique liver phenotype, 497 characterized by white granules across the tissue (Figure 7f) , which may relate to the excess of 498 immune-related proteins. CC030 has a 4 bp insertion in Brd4 that was associated with high 499 BRD4 abundance and may contribute to increased abundance in proteins related to nuclear 500 chromosome and chromatin (Figure 7g-h) . The strain-specific protein outlier sets were enriched 501 in a wide range of GO biological functions (Figure S7g-j) sex effects on proteins. In comparison to individual proteins, protein-complexes were less 511 consistent across these populations, highlighting the role of non-additive genetic effects in 512 controlling protein-complex interactions. We observed a wide range of genetic effects on 513 protein complexes, ranging from stoichiometric regulation in response to a large-effect locus 514 (exosome and CCT complexes) to multi-locus and highly polygenic regulation (e.g., the 26S 515 proteasome and mitochondrial ribosomal small subunit, respectively). Within the CC, we 516 observed strong effects on protein abundance from new mutations in specific CC strains. Lastly, 517 we highlight individual CC strains with both aberrant protein regulation based on strain-specific 518 mutations that broadly disrupt protein regulatory networks. 519 520 Consistent with expectations, the effects of local genetic variation are highly conserved 522 between the CC and DO (and often with the founder strains). In cases where discordance 523 occurred between populations, we can often explain based on differing founder allele 524 frequencies between the populations, such as for the local pQTL of Ercc2 that is not detected in 525 the DO because only one NOD homozygote was observed in the DO sample. Differences 526 between populations stemming from different allele frequencies has been observed in human 527 populations (Mogil et al., 2018) . These discordant cases highlight how dominance and large-528 scale homozygosity contribute to the genetic effects on proteins, exposed by comparing the CC 529 and DO. Distal (i.e., trans) pQTL are often harder to detect due to weaker effects and are thus 530 more likely to be discordant based on differences in allele frequencies and the presence of non-531 additive effects. Here we showed that the 19 distal pQTL that were detected in both 532 populations at FDR < 0.1 are highly consistent, both in terms of haplotype effects and their 533 mediation candidates. Discordance in effects between the CC and DO was much greater for 534 distal pQTL detected in a single population, which is in striking contrast to the strong 535 concordance of local genetic effects detected in only one population. The extent of discordance 536 in distal pQTL effects compared to local effects suggests that additional factors are contributing 537 beyond differing allele frequencies and may represent distal genetic effects that are not 538 conserved across population. 539 Mediation analysis results mirror the concordance of distal pQTL effects, with largely the same 540 mediators detected for distal pQTL detected with high confidence in both populations. There 541 are some key caveats with mediation analysis, such as its accuracy being dependent on the true 542 mediator being present in the data. If a candidate mediator is correlated with the true but 543 unobserved mediator, possibly due to LD, it will likely be identified as a false positive mediator. 544 This issue is problematic for proteomics studies if the mediator is not captured in the protein 545 19 data, such as non-coding RNAs or lowly abundant proteins that go undetected in the mass-spec 546 analysis. For example, PGD was detected as a candidate mediator of a distal pQTL of Akr1e1 in 547 both the CC and DO; however, previous studies identified zinc finger proteins (Rex2 and Zfp985) 548 as the most likely mediator of the pQTL (Hamilton-Williams et al., 2010; Keele et al., 2020). Zinc 549 finger proteins are lowly expressed and not prevalent in proteomics data, and in their absence, 550 the nearby protein PGD stands out as the best mediator in both populations. Variable 551 measurement error across candidate mediators can also cause preference for a false mediator 552 due to noise reducing the correlation between the distally controlled protein and its true 553 mediator. This is likely occurring in the DO where NAXD is the stronger mediator for the Tubg1 554 distal pQTL than TUBGCP3, the strong biological candidate. Furthermore, the presence of 555 variable measurement error in proteomics is likely, stemming from the overall magnitude of 556 each protein's expression, the size of the protein and the number of peptides used to 557 summarize it, and how specifically those peptides map to the protein. Comparisons of 558 mediation analysis of two independent genetic experiments can provide replication of findings, 559 as well as assess a more complete set of candidate mediators and correct misidentifications. 560 561 Protein-complexes can be viewed as emergent phenotypes that can be driven by independently 563 regulated members or sub-complexes, as well as higher order dynamics like protein-protein 564 stoichiometry that control complex assembly. We observed a spectrum of cohesiveness 565 (Romanov et al., 2019) across the protein-complexes and our three populations. These 566 differences reflected components and sub-complexes that are semi-independent of the greater 567 complex, as well as the underlying genetic architectures of the populations. Because high 568 cohesiveness does not necessarily imply that a protein-complex is genetically regulated (e.g., 569 stoichiometry could produce tight correlation among complex members due to non-genetic 570 factors), we employed genetic analyses (e.g., QTL and heritability) to characterize genetic 571 control of the protein-complexes. New models of aberrant protein functional networks and disease 601 The unique biology of individual mouse strains has led to a variety of discoveries of genetically 602 based disease phenotypes. We leveraged the CC population to identify strains exhibiting 603 aberrant protein dynamics that could not be tied to the founder strains, such as being 604 downstream of mutations unique to specific CC strains. They may also result from unique 605 combinations of alleles of upstream drivers of a shared functional network. Examples include 606 increased abundance of immune-related proteins and unique liver phenotype in CC013, which 607 we are following up, and altered mitochondrial respiratory complex I function in CC007. Due to 608 the replicability of the CC population, these strain-specific protein networks can be followed up, 609 confirmed, and the underlying mechanisms dissected to reveal new biology and develop new 610 models of disease. 611 612 Integrative genetic resource populations 613 Proteins are a more functionally relevant measure of physiology and disease than the more 614 commonly measured gene transcripts. In our examination of protein regulation across three 615 related populations, we found that local genetic and sex effects on protein abundance were 616 consistent, suggesting that molecular phenotype data and their findings can largely be 617 integrated across these populations. It is easy to conceive of investigators querying specific 618 genes of interest to assess whether their proteins possess sex effects, pQTL and their haplotype 619 effects, or mediators of distal pQTL. To enable these queries, we provide interactive QTL 620 analysis tools for both the CC (https://churchilllab.jax.org/qtlviewer/CC/Ferris) and DO 621 (https://churchilllab.jax.org/qtlviewer/DO/Svenson). In contrast, higher order molecular 622 phenotypes or characteristics, such as protein-complexes and regulatory networks, showed 623 greater discordance across these populations. Non-additive genetic effects, either at a single 624 locus or across loci, were apparent in the regulation of protein-complexes, driving the 625 discordance between populations based on the genetic architecture of each. Lastly, we 626 identified CC strain-specific aberrant protein abundances, their phenotypic or system relevance, 627 and their putative consistency with previously described mutations present across these 628 strains. 629 In this work, we used these diverse mouse populations to finely dissect genetic effects on 630 protein-complexes in liver tissue, revealing the presence of non-additive effects and a polygenic 631 spectrum of genetic regulatory patterns. In the future, we envision highly expandable joint 632 population resources, covering a range of molecular phenotypes (e.g., RNA-seq, ATAC-seq, 633 mass-spec proteomics) and tissues relevant to human disease. proteins were digested by Lys-C at a 1:100 protease-to-peptide ratio overnight at room 691 temperature with gentle shaking. Trypsin was used for further digestion for 6 hours at 37°C at 692 the same ratio with Lys-C. After digestion, 50 µL of each sample were combined in a separate 693 tube and used as the 11 th sample in all 12 tandem mass tag (TMT) 11plex. 100 µL of each 694 sample were aliquoted, and 30 µL acetonitrile (ACN) was added into each sample to 30% final 695 volume. 200 µg TMT reagent (126, 127N, 127C, 128N, 128C, 129N, 129C, 130N, 130C, 130N, 696 and 131C) in 10 µL ACN was added to each sample. After 1 hour of labeling, 2 µL of each sample 697 was combined, desalted, and analyzed using mass-spec. Total intensities were determined in 698 each channel to calculate normalization factors. After quenching using 0.3% hydroxylamine, 11 699 samples were combined in 1:1 ratio of peptides based on normalization factors. The mixture 700 was desalted by solid-phase extraction and fractionated with basic pH reversed phase (BPRP) 701 high performance liquid chromatography (HPLC), collected onto a 96 well plate and combined 702 for 24 fractions in total. Twelve fractions were desalted and analyzed by liquid 703 chromatography-tandem mass spectrometry (LC-MS/MS). 704 705 Off-line basic pH reversed-phase (BPRP) fractionation 706 We fractionated the pooled TMT-labeled peptide sample using BPRP HPLC (Wang et al., 2011) . 707 We used an Agilent 1200 pump equipped with a degasser and a photodiode array (PDA) 708 detector. Peptides were subjected to a 50-min linear gradient from 5% to 35% acetonitrile in 10 709 mM ammonium bicarbonate pH 8 at a flow rate of 0.6 mL/min over an Agilent 300Extend C18 710 column (3.5 μm particles, 4.6 mm ID, and 220 mm in length). The peptide mixture was 711 fractionated into a total of 96 fractions, which were consolidated into 24, from which 12 non-712 adjacent samples were analyzed (Paulo et al., 2016a) . Samples were subsequently acidified with 713 1% formic acid and vacuum centrifuged to near dryness. Each consolidated fraction was 714 desalted via StageTip, dried again via vacuum centrifugation, and reconstituted in 5% 715 acetonitrile, 5% formic acid for LC-MS/MS processing. 716 717 Liquid chromatography and tandem mass spectrometry 718 Mass spectrometric data were collected on an Orbitrap Fusion Lumos mass spectrometer 719 coupled to a Proxeon NanoLC-1200 UHPLC. The 100 µm capillary column was packed with 35 720 cm of Accucore 50 resin (2.6 μm, 150Å; ThermoFisher Scientific). The scan sequence began with 721 an MS1 spectrum (Orbitrap analysis, resolution 120,000, 350−1400 Th, automatic gain control Following acquisition of each MS2 spectrum, we collected an MS3 spectrum in which multiple 728 MS2 fragment ions are captured in the MS3 precursor population using isolation waveforms 729 with multiple frequency notches. MS3 precursors were fragmented by HCD and analyzed using 730 the Orbitrap (NCE 65, AGC 3E5, maximum injection time 150 ms, resolution was 50,000 at 400 731 Th). 732 733 Mass spectra data analysis 734 Mass spectra were processed using a Sequest-based pipeline (Huttlin et al., 2010) . Spectra were 735 converted to mzXML using a modified version of ReAdW.exe. Database search included all 736 entries from an indexed Ensembl database version 90 (downloaded:10/09/2017). This database 737 was concatenated with one composed of all protein sequences in the reversed order. Searches 738 were performed using a 50 ppm precursor ion tolerance for total protein level analysis. The 739 product ion tolerance was set to 0.9 Da. TMT tags on lysine residues, peptide N termini 740 (+229.163 Da), and carbamidomethylation of cysteine residues (+57.021 Da) were set as static 741 modifications, while oxidation of methionine residues (+15.995 Da) was set as a variable 742 modification. 743 Peptide-spectrum matches (PSMs) were adjusted to FDR < 0.01 Gygi, 2007, 2010) . 744 PSM filtering was performed using a linear discriminant analysis (LDA), as described previously 745 (Huttlin et al., 2010) , while considering the following parameters: XCorr, ΔCn, missed cleavages, 746 peptide length, charge state, and precursor mass accuracy. For TMT-based reporter ion 747 quantitation, we extracted the summed signal-to-noise (S:N) ratio for each TMT channel and 748 found the closest matching centroid to the expected mass of the TMT reporter ion. For protein-749 level comparisons, PSMs were identified, quantified, and collapsed to a peptide FDR < 0.01 and 750 then collapsed further to a final protein-level FDR < 0.01, which resulted in a final peptide level 751 FDR < 0.001. Moreover, protein assembly was guided by principles of parsimony to produce the 752 smallest set of proteins necessary to account for all observed peptides. PSMs with poor quality, 753 MS3 spectra with TMT reporter summed signal-to-noise of less than 100, or having no MS3 754 spectra were excluded from quantification ( where local is the effect of the local haplotype pair to peptide for sample , 2012). Here we used a leave-one-chromosome-out or "loco" , in which markers from the 818 chromosome the peptide is predicted to be located on are excluded from estimation in order 819 to avoid the kinship term absorbing some of local[ ] (Wei and Xu, 2016) . We then calculated 820 poly = cor( local , q), the Pearson correlation coefficient between  local , the BLUP of  local and 26 q, the incidence vector of the B6 allele among the founder strains (e.g., q = [0 1 0 0 0 0 0 0] 822 for a peptide with a B6 allele that is missing in the other founder strains Selecting which marker to fit the haplotype effects at is complicated by the fact that the CC and 934 DO have different sets of markers and the genomic coordinate of the peak LOD score is also 935 subject to variation. When comparing pQTL detected in both populations, we fit Equation 4 at 936 the markers with the highest LOD score specific to each population, meaning they may not be 937 the markers closest to each other. When comparing pQTL that were detected in only one 938 population, we selected the marker in the population that failed to map the pQTL that was 939 closest to the marker in the population that detected it. 940 29 941 If the genetic effects on a protein are primarily local, the relative abundances for a protein in 943 the founder strains should match the local pQTL effects observed in the CC and DO, which can 944 be used to confirm findings and better support suggestive pQTL in the CC or DO. We evaluated 945 the consistency of local pQTL in the CC with the founder strains, using an approach similar to 946 how we compared pQTL effects between the CC and DO. For the founder strains, rather than 947 fitting pQTL effects ( QTL ), we fit the founder effects as random terms (as described for the 948 local term in Equation 2 for the founder strains) summarized as BLUPs ( strain Founders ). We then 949 calculated the Pearson correlation between local pQTL effects in the CC and founder effects in 950 the founder strains: local = cor( QTL CC ,  strain Founders ). As when comparing QTL effects between the 951 CC and DO, we then tested local > 0, and corrected for multiple testing through the BH 952 procedure. 953 954 We performed mediation analysis on each distal pQTL with LOD > 6 in the CC or DO, which 956 involved a scan analogous to the QTL genome scans. The underlying model is 957 producing a mediation LOD score. The mediation model is fit for all proteins as mediators, 967 excluding protein , resulting in a mediation scan. 968 We assume that the vast majority of candidates are not true mediators of a specific pQTL and 969 thus the distribution of mediation LOD scores approximates a null distribution. Assuming that 970 the null distribution is approximately normal, we calculate the z-scores of the mediation LOD 971 scores, and define mediators of the QTL at marker for protein as proteins with med < -4, 972 where med is the z-score of the mediation LOD score for candidate mediator protein . The 973 rationale being that when testing the QTL term in Equation 5, if the mediator contains some of 974 the information of the QTL, its presence in both the alternative and null models will result in a 975 large drop in the LOD score of the detected pQTL. In order for a mediator to be a clear 976 candidate driver of the distal pQTL, we required that the mediator TSS be within 10 Mbp of the 977 pQTL marker. Strong mediators that were not near the pQTL often represent proteins that are 978 correlated with protein , such as co-regulated members of a protein-complex. 979 980 Proteins that exhibited differential abundance between the sexes, i.e., sex effects, were 982 identified using an LMM similar to the heritability model (Equation 3) for the CC, DO, and 983 founder strains, but instead testing the significance of the sex coefficient: 984 Genetics of trans-regulatory 1112 variation in gene expression Mutations in genes encoding melanosomal proteins cause pigmentary glaucoma in 1115 DBA/2J mice The expanded BXD family of mice: A cohort 1118 for experimental systems genetics and precision medicine The moderator-mediator variable distinction in social 1120 psychological research: Conceptual, strategic, and statistical considerations Fitting Linear Mixed-Effects Models 1123 Using lme4 Impact of regulatory variation from RNA to protein Rank-Based Inverse Normal Transformations 1127 are Increasingly Used, But are They Merited? Controlling the False Discovery Rate : A Practical and 1129 Powerful Approach to Multiple Testing R/qtl2: Software for Mapping Quantitative Trait Loci with High-1132 Dimensional Data and Multiparent Populations Complex trait analysis of gene expression uncovers polygenic and 1135 pleiotropic networks that modulate nervous system function Defining the consequences of genetic 1138 variation on a proteome-wide scale The Diversity outbred mouse 1140 34 population The Collaborative Cross, a community resource for 1143 the genetic analysis of complex traits The Genome Architecture of the Collaborative Cross 1145 Mouse Genetic Reference Population Permutation tests for multiple loci affecting a quantitative 1147 character Efficient Computation of Significance Levels for 1149 Multiple Associations in Large Studies of Correlated Data, Including Genomewide Association 1150 Studies Target-decoy search strategy for increased confidence in large-1152 scale protein identifications by mass spectrometry Target-Decoy Search Strategy for Mass Spectrometry-Based 1154 Methods in Molecular Biology Diversity Outbred Mice Identify Population-1158 Based Exposure Thresholds and Genetic Factors that Influence Benzene Quantitative Trait Locus Mapping Methods for Diversity 1162 Outbred Mice. G3 (Bethesda) CORUM: the comprehensive resource of mammalian protein 1165 complexes-2019 Genome Wide Identification of SARS-CoV Susceptibility Loci Using the Collaborative Cross Web-Based 1170 Search Tool for Visualizing Instrument Performance Using the Triple Knockout (TKO) Proteome 1171 Standard Idd9.2 and Idd9.3 Protective Alleles Function in CD4+ T-Cells and 1174 Nonlymphoid Cells to Prevent Expansion of Pathogenic Islet-Specific CD8+ T-Cells Genome-wide pQTL analysis of protein 1177 expression regulatory networks in the human liver A Tissue-Specific Atlas of Mouse Protein Phosphorylation 1180 and Expression Dual Proteome-scale Networks Reveal Cell-1183 specific Remodeling of the Human Interactome Identification, Inference and Sensitivity Analysis for 1185 Causal Mediation Effects Efficient control of population structure in model organism association mapping Variance component model to account for sample structure in genome-wide 1191 association studies Integrative QTL analysis of gene expression and 1194 chromatin accessibility identifies multi-tissue patterns of genetic regulation Genetic Drivers of Pancreatic Islet Function Gene loci associated with insulin secretion in 1201 islets from nondiabetic mice New Insights into the Function of 1203 the Immunoproteasome in Immune and Nonimmune Cells C57BL/6N mutation in cytoplasmic FMRP 1206 interacting protein 2 regulates cocaine response FaST 1208 linear mixed models for genome-wide association studies Quantitative variability of 342 plasma proteins in a human twin 1211 population Genetics and Analysis of Quantitative Traits Mediation Analysis Dynamic Regulation of the 26S Proteasome: From 1217 Synthesis to Degradation Increasing the Multiplexing Capacity of TMTs Using 1220 Reporter Ion Isotopologues with Isobaric Masses Long-term exercise in mice has sex-dependent benefits on body composition 1223 and metabolism during aging Genetic architecture of gene expression traits across 1226 diverse populations Informatics resources for the Collaborative Cross and 1228 related mouse populations Candidate Risk Factors and Mechanisms for Tolvaptan-1231 Induced Liver Injury Are Identified Using a Collaborative Cross Approach Identification of Candidate Risk Factor Genes for Human Idelalisib Toxicity Using a 1235 Collaborative Cross Approach Complex 1237 Control of GABA(A) Receptor Subunit mRNA Expression: Variation, Covariation, and Genetic 1238 Regulation Complex Genetic Architecture Underlies 1241 Regulation of Influenza-A-Virus-Specific Antibody Responses in the Collaborative Cross Compositional Proteomics: Effects of Spatial Constraints on Protein 1245 Quantification Utilizing Isobaric Tags Genetic diversity between mouse strains 1248 allows identification of the CC027/GeniUnc strain as an orally reactive model of peanut allergy Spatiotemporal variation of mammalian protein complex stoichiometries The Contribution of RNA Decay Quantitative Trait Loci to Inter-Individual Variation in Steady-State Gene Expression Levels Recovery of Inter-Block Information when Block Sizes 1258 are Unequal Quantitative mass spectrometry-based multiplexing compares the abundance of 5000 S. 1261 cerevisiae proteins across 10 carbon sources A Triple Knockout (TKO) Proteomics Standard 1263 for Diagnosing Ion Interference in Isobaric Labeling Experiments Collaborative Cross breeding population A complete mass-spectrometric map of the yeast 1270 proteome applied to quantitative trait analysis Host genetic diversity enables Ebola 1274 hemorrhagic fever pathogenesis and resistance The Collaborative Cross as a resource for modeling human 1277 disease: CC011/Unc, a new mouse model for spontaneous colitis Disentangling 1279 Genetic and Environmental Effects on the Proteotypes of Individuals Whole Genome Sequencing and Progress 1282 Toward Full Inbreeding of the Mouse Collaborative Cross Population. G3 (Bethesda) Content and performance of the 1286 MiniMUGA genotyping array, a new tool to improve rigor and reproducibility in mouse 1287 research Genetic variation influences pluripotent ground state 1290 stability in mouse embryonic stem cells through a hierarchy of molecular phenotypes Functionally Overlapping Variants Control 1294 Tuberculosis Susceptibility in Collaborative Cross Mice Genomes of the mouse 1297 collaborative cross qvalue: Q-value estimation for false 1299 discovery rate control Genetics meets proteomics: perspectives 1301 for large population-based studies High-resolution genetic mapping using the Mouse 1304 Diversity outbred population STRING v11: protein-protein association 1307 networks with increased coverage, supporting functional discovery in genome-wide 1308 experimental datasets Keeping the 1310 Proportions of Protein Complex Components in Check Mapping in structured populations by 1312 resample model averaging Protein complex-based analysis framework for high-throughput data sets Reversed-phase chromatography with multiple fraction 1317 concatenation strategy for proteome profiling of human MCF10A cells A random-model approach to QTL mapping in multiparent advanced 1320 generation intercross (MAGIC) populations Systems proteomics of liver mitochondria function Multilayered Genetic and Omics Dissection On the subspecific 1328 origin of the laboratory mouse Subspecific origin and haplotype diversity in the 1331 39 laboratory mouse clusterProfiler: an R Package for Comparing 1333 Biological Themes Among Gene Clusters A Loss-of-Function Mutation in the Integrin Alpha L ( Itgal ) Gene 1336 Contributes to Susceptibility to Salmonella enterica Serovar Typhimurium Infection Collaborative Cross Strain CC042 Genome-wide efficient mixed-model analysis for association 1339 studies Comparison of lenient mapping results, mediation candidates, and sex effects among genetically diverse mouse 1346 populations, related to Figures 1 and 2. Leniently detected (FDR < 0.5) pQTL in the (a) CC and (b) DO. The pQTL are plotted by 1347 the genomic position of protein against their coordinate. Dot size is proportional to association strength (LOD score). (c) Venn 1348 diagram of the overlap of analyzed proteins between the CC and DO. (d) A local pQTL was detected in the CC for mt-Nd1, a gene 1349 encoded on the mitochondrial genome Venn diagram 1351 of the overlap in local pQTL detected in the CC and DO. (f) The haplotype effects of local pQTL detected in both populations are 1352 highly consistent, as measured by the correlation coefficient comparing the effects in the CC and DO. (g) More local pQTL are 1353 consistent between the populations when also considering pQTL detected in only one of them. Red bars represent pQTL that 1354 that had significantly correlated effects (FDR < 0.1). (h) Venn diagram of overlap of distal pQTL In 1356 contrast to local genetic effects, considering distal pQTL that were leniently detected in only one of the populations resulted in 1357 many inconsistent effects comparisons, likely representing false positives or subtle distal effects specific to one population. The 1358 founder strains can reveal under-powered local pQTL in the CC (and DO), as revealed by the correlations between haplotype 1359 effects at putative local pQTL for genes with rare alleles in the CC (≤ 3 CC strains) and that did not have a local pQTL detected 1360 (FDR < 0.5) and strain effects in mice from the founder strains (k). The enrichment in positive correlations suggests the founders 1361 can provide additional information for detecting local pQTL in the CC. The gene Cyp2j5 had significant correlation (r = 0.96) 1362 between the strain effects in the (l) founder strains with the (m) local haplotype effects in the CC, characterized by low CAST 1363 and PWK effects. Mean and  2 standard deviation bars are shown for the founder strains, and boxplots for CC strains Stringent and lenient significance thresholds are included as horizontal black and gray dashed lines, respectively Comparison of mediation results for distal pQTL stringently detected (FDR < 0.1) in at least one of the populations. For distal 1367 pQTL detected stringently in both populations, mediation results were strongly consistent (14 out of 19). If the distal pQTL were 1368 detected leniently in the other population, mediation was still somewhat consistent (7 out of 19). When the distal pQTL was not 1369 detected in the other population, often no mediator was detected Observance of both sexes within the (p) CC and (q) founder strains improves power to detect sex effects. The -1371 log10(p-value), or logP, from a linear mixed effect model (LMM) that takes into account strain replicates compared to the logP 1372 from an unpaired t-test. The lower logP in the founder strains reflects the overall smaller sample size (32 mice compared to 116 1373 in the CC) Highly consistent local pQTL effects across the CC, DO, and founder strains Boxplots are shown for the 1377 CC strains and mean  2 standard deviation bars for the founder strains. The local pQTL effects are also highly consistent 1378 between the CC and DO, both in terms of modeled effects and the actual data -for (b, c) Gosr2 and (f, g) Cyp2d22. For 1379 comparisons of modeled effects (b, f), intervals represent  standard error bars the CC and DO data at the pQTL are represented as heatmaps with rows 1381 indicating founder allele dosage at the local pQTL (expected allele counts) and columns indicating individual mice, ordered by 1382 protein abundance. Clusters in rows towards the left or right sides represent founder haplotype effects, such as the high WSB 1383 effect in Cyp2d22. Haplotype-based association scans at the pQTL are overlayed with variant assocations for the peak variant association and haplotype-1385 based association are very close, consistent with a single variant largely driving the pQTL. Variants with an allele shared by only 1386 129 and CAST, matching the effects pattern, are shown. When the effects were more complex than bi-allelic, as with Cyp2d22 1387 (h), there are likely multiple causal variants, and haplotype-based association produces higher scores of assocation PWK-private variants with LOD > 6 were included for Cyp2d22, highlighting LD blocks that potentially carry founder-specific 1389 variants driving their extreme effects. The larger sample size and finer mapping resolution of the DO sample are evident in the 1390 high LOD scores and narrower association peaks. Genomic positions of peak associations from variant-and haplotype-based 1391 mapping are marked 2012). We defined gene sets based on having sex < 0.01 and split them further into subsets 1000 based on having higher abundance in males or higher abundance in females for both the CC 1001 and DO. We used the quantified proteins in each population as the background gene set to 1002 account for biases in the observed proteins. Hypergeometric set tests for enrichment of GO and 1003KEGG terms were performed with FDR multiple testing control (Storey et al., 2019) . Enriched 1004 GO and KEGG terms were selected based on having set < 0.1. Protein-complex analysis 1007We categorized individual proteins as members of protein-complexes as defined by previous 1008annotations (Ori et al., 2016) . For each protein-complex, we quantified how tightly co-1009 abundant, i.e., cohesive, the members are, by calculating the median pairwise Pearson 1010 correlation for each protein with the other members of the protein-complex. We summarized 1011 cohesiveness within a protein-complex by recording the median and interquartile range across 1012 the median correlations for the individual proteins. 1013To assess whether genetics or sex regulated protein-complexes as a whole, we estimated the 1014 complex-heritability and complex-sex effect size. We took the PC1 from the set of proteins from 1015 the complex after first removing the effects of key covariates from the individual proteins in 1016 order to keep the PC1 from reflecting other drivers of variation instead of genetic factors or sex. 1017For complex-heritability, we removed the effect of sex in the CC, and both sex and diet in the 1018 DO We would like to thank the members of the Churchill lab for feedback during the development 1064of this project and in the process of composing the manuscript. We would also like to thank 1065Lauren (Figures 4, 5, 6, S5, & S6) . abundance, but alternatively, they may represent a coding polymorphism in the quantified peptides, resulting in false low 1459 abundance due to the allele-specific nature of mass-spec quantification. The color or the dots indicates the founder allele at the 1460 gene. Strain-specific variants associated with high abundance at the local protein were also observed, such as high ARHGAP18 1461 abundance in CC017. (e) A novel SNP allele in CC004 at Plek was associated with low abundance, potentially representing a new 1462 allele at a gene that already possessed local genetic variation from the founder strains that drove a (f) suggestive local pQTL. Stringent and lenient significance thresholds are included as horizontal black and gray dashed lines, respectively. CC strain-1464 specific sets of proteins with extreme abundance were identified that were significantly enriched in GO and KEGG functional 1465 terms. (g) CC007 possessed distinctly low abundance of mitochondrial respiratory chain complex I proteins, as well as high 1466 abundance in two assembly factor proteins from the complex, NDUFAF3 and NDUFAF4, suggesting compensatory mechanisms 1467 may be occurring at the complex in the strain. Large dots represent the CC strain with extreme protein abundance. Other 1468 detected CC strain-specific protein dynamics include low abundance of (h) glycogen metabolic process proteins for CC0011, (i) 1469 cellular respiration proteins for CC030, and (j) high abundance of contractile fiber proteins for CC075.