key: cord-0903606-y0oz3n2d
authors: Seviiri, M.; Law, M. H.; Ong, J.-S.; Gharahkhani, P.; Fontanillas, P.; 23andMe Research Team; Olsen, C. M.; Whiteman, D. C.; MacGregor, S.
title: A multi-phenotype analysis reveals 19 novel susceptibility loci for basal cell carcinoma and 15 for squamous cell carcinoma.
date: 2022-03-08
journal: nan
DOI: 10.1101/2022.03.06.22271725
sha: 8b457d88671de5028c8df53550024b4df4054bd6
doc_id: 903606
cord_uid: y0oz3n2d

Basal cell carcinoma (BCC) and squamous cell carcinoma (SCC) are the most common forms of skin cancer. There is genetic overlap between skin cancers, pigmentation traits, and autoimmune diseases. We use linkage disequilibrium score regression to identify 20 traits (melanoma, pigmentation traits, autoimmune diseases, and blood biochemistry biomarkers) with a high genetic correlation (rg > 10%, P < 0.05) with BCC (20,791 cases and 286,893 controls in the UK Biobank) and SCC (7,402 cases and 286,892 controls in the UK Biobank), and use a multi-trait genetic analysis to identify 78 and 69 independent genome-wide significant (P < 5 × 10⁻⁸) susceptibility loci for BCC and SCC respectively; 19 BCC and 15 SCC loci are both novel and replicated (P < 0.05) in a large independent cohort, 23andMe, Inc (BCC: 251,963 cases and 2,271,667 controls; SCC: 134,700 cases and 2,394,699 controls). Novel loci are implicated in BCC/SCC development and progression (e.g. CDKL1), pigmentation (e.g. DSTYK), cardiometabolic pathways (e.g. FADS2), and immune-regulatory pathways including innate immunity against coronaviruses (e.g. IFIH1), and HIV-1 viral load modulation and disease progression (e.g. CCR5). We also report a powerful and optimised BCC polygenic risk score that enables effective risk stratification for keratinocyte cancer in a large prospective Canadian Longitudinal Study of Aging (794 cases and 18,139 controls); e.g.
percentage of participants reclassified; MTAGPRS = 36.57%, 95% CI = 35.89-37.26% versus UKBPRS = 33.23%, 95% CI = 32.56-33.91%.

Using linkage disequilibrium score (LDSC) regression [16], 20 phenotypes were significantly genetically correlated (P < 0.05, rg > 10%) with either BCC or SCC (Figure 1 and Supplementary Table 1). In the first instance, 35 phenotypes that we considered as possibly correlated with skin cancer (including body mass index) were excluded for not meeting the aforementioned criteria (Supplementary Table 2). Using the same selection criteria, no additional phenotypes were included following analysis of collated GWAS summary statistics (over 700 phenotypes) in the LD Hub database [17]. In total, subsequent analyses included 22 genetically correlated traits: cancers (BCC and SCC GWAS from the UK Biobank (UKB) [18, 19], a cutaneous melanoma GWAS meta-analysis [20], KC from the QSkin Sun and Health Study (QSkin) [21], KC from the Electronic Medical Records and Genomics Network (eMERGE) cohort [22, 23] and all-cancer from the Resource for Genetic Epidemiology Research on Aging (GERA) cohort [24]); skin and hair pigmentation related traits (skin burn type (QSkin), red hair (QSkin), hair colour excluding red hair (UKB), skin colour (UKB), and mole count excluding melanoma cases (QSkin)); autoimmune conditions (type 1 diabetes and hypothyroidism [25], and vitiligo [26]); lifestyle-related traits (educational attainment in years spent in school [27] and smoking (cigarettes per day) [28]); and blood biochemistry biomarkers from the UKB (aspartate aminotransferase, C-reactive protein, albumin, gamma-glutamyl transferase, glucose and vitamin D (adjusted for monthly variation)). The sample sizes and phenotype measurements for all the included and excluded traits are presented in Supplementary Table 3 and Supplementary Table 2 respectively.
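The selection rule described above (P < 0.05 and rg > 10%) can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `RgResult` record and `select_traits` helper are hypothetical names, and the threshold is applied to the magnitude of rg on the assumption that negatively correlated traits qualify (the manuscript reports rg = -0.19 for hypothyroidism, which was included). Apart from the hypothyroidism row, the example values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class RgResult:
    """One LDSC genetic-correlation estimate against BCC or SCC (hypothetical record)."""
    trait: str
    rg: float
    p_value: float


def select_traits(results, min_abs_rg=0.10, max_p=0.05):
    """Keep traits with |rg| above the threshold and a nominally significant P-value."""
    return [r.trait for r in results
            if abs(r.rg) > min_abs_rg and r.p_value < max_p]


example = [
    RgResult("hypothyroidism", -0.19, 1.05e-4),   # values reported in the manuscript
    RgResult("illustrative_trait_a", 0.32, 0.20),  # invented: fails on P-value
    RgResult("illustrative_trait_b", 0.04, 0.001), # invented: fails on |rg|
]
selected = select_traits(example)
print(selected)
```

Only traits that pass both conditions would be carried forward to the multi-trait model.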
medRxiv preprint, this version posted March 8, 2022; https://doi.org/10.1101/2022.03.06.22271725. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

Adding 20 traits genetically correlated with either BCC or SCC from the UKB (rg > 0.1, P < 0.05) increased the effective sample sizes for BCC and SCC by 2.6 and 8.3 times respectively. Using the MTAG approach we identified 78 and 69 independent genome-wide significant (P < 5 × 10⁻⁸) susceptibility loci for BCC (Figure 2 and Supplementary Table 4) and SCC (Figure 3 and Supplementary Table 5) respectively. Although the results for the peak single nucleotide polymorphisms (SNPs) were more significant following the MTAG analysis due to the greater statistical power, the log (odds ratio) effect sizes for the MTAG output and the respective UKB Table 5 ). For SCC there was also high concordance of the effect estimates between the MTAG and the replication set, with Pearson's correlation = 0.69 (95% CI = 0.55-0.80, P = 3.48 × 10⁻¹¹; Figure 4d).

The y-axis represents the level of significance as the negative log10 (P-value), whilst the x-axis represents chromosomes 1-22, alternating between light blue and light pink colours.
The horizontal blue line represents a suggestive level of significance at P = 10⁻⁶, while the red one represents the genome-wide level of significance, P = 5 × 10⁻⁸. The green dots represent the 78 genome-wide significant independent loci for basal cell carcinoma susceptibility.

Figure 4 shows the comparison of the effect estimates in log (

We identified seven novel loci with a potential role in the development and progression of keratinocyte cancer. rs10141120 lies in MARK3; MARK3 is a cell cycle regulator involved in the DNA damage response (e.g. following radiotherapy or treatment with alkylating agents) [29], and is implicated in carcinogenesis, e.g. for hepatocellular carcinoma [30]. In addition, rs10141120 is in LD with rs3825566 (r² = 0.94) and rs55859054 (r² = 0.96), which are lead SNPs for hair colour [31, 32]. We also identified two novel variants for BCC and SCC, rs35563099 and rs7098111 respectively, near RAB11FIP2 that were in high LD (r² = 0.98). RAB11FIP2 is overexpressed in colorectal and gastric cancer cells where it facilitates their migration, leading to cancer metastases [33, 34]. It is likely that RAB11FIP2 promotes keratinocyte cancer progression. rs35563099 is a lead SNP for low skin tanning response [35] and is in LD with rs11198112 (r² = 0.99) for sunburns [31]. rs7098111 is also in LD with SNPs for skin/hair colour (rs11198112, r² = 0.57), freckles (rs10444039, r² = 0.99) and sunburns (rs11198112, r² = 0.98) [31, 36, 37].
rs10899466 in GAB2 is in LD with rs10899501 for hair colour (r² = 0.80) [31, 38]. Following inflammatory stimuli (e.g. by cytokines), GAB2 is required in inflammatory signalling during tumorigenesis [39, 40]. It enhances cancer cell proliferation, e.g. in breast cancer [41, 42]. Another novel variant was rs10766301, an intronic variant in SOX6. Although the direct role of SOX6 in keratinocyte cancer biology is unknown, it facilitates apoptosis in colorectal cancer, esophageal squamous cell carcinoma, pancreatic and ovarian cancer [43-46]. Conversely, its downregulation facilitates cancer progression [43, 45, 47]. rs10766301 is in LD with rs2953060 (r² = 0.60) for sunburns [31]. We also identified rs142004400 in CDKL1; CDKL1 promotes tumor cell proliferation, migration and invasion in melanoma, colorectal cancer, and breast cancer [48-50], and its downregulation facilitates apoptosis [48]. We also identified rs472385 in FRMD5; FRMD5 modulates tumour progression by regulating cancer cell mobility and ROCK1-triggered kinase activity [51, 52]. Its expression is downregulated in renal, breast and colorectal cancers [52]. rs472385 is also in LD with rs35654783 (r² = 0.76), a lead SNP for diastolic blood pressure [53]. We also identified rs10876864 near SUOX (+1.78kb); expression of SUOX is associated with both proliferation and progression of oral squamous cell carcinoma, gastric cancer and hepatocellular carcinoma [54-56]. In addition, eQTL analysis showed that rs10876864 was strongly linked to expression of SUOX and RPS26 in skin tissues (Table 1 and Table 2). RPS26 regulates the tumour suppression activities of p53 in response to DNA damage [57].
Thus, it is possible that rs10876864 is involved in proliferation and progression of keratinocyte cancers. However, it is also likely that rs10876864 has pleiotropic effects on KC through immunomodulating pathways since it is also near IKZF4 (-13.6kb); IKZF4 is required to suppress/maintain FOXP3+ regulatory T cells [58, 59], important in auto-immunity and self-recognition. It is in LD with lead SNPs for auto-immune traits, e.g. T1D, allergic disease and rheumatoid arthritis (in LD with rs773125, r² = 0.83) [60-62], and immune-suppressive medication use, e.g. glucocorticoids (in LD with rs1689510, r² = 0.72), thyroid preparations (in LD with rs7302200, r² = 0.71) and anti-asthmatic adrenergics inhalant use (in LD with rs34415530, r² = 0.72) [63].

Previous studies have reported a relationship between immune response and skin cancer, and a number of our novel loci for BCC and SCC suggested links to immune regulatory processes. rs2373232 is an intergenic variant between CCRL2 (-4.337kb) and CCR5 (+26.69kb) in high linkage disequilibrium (LD; r² = 1) with rs1015164 (in CCR5), a known SNP for HIV-1 viral load variation [64, 65]. CCR5 is generally involved in coordination of the immune response [66], and specifically regulates HIV-1 viral load and progression [67, 68]. CCRL2 is involved in regulating immune responses induced by chemokines [69]. rs2111485 is an intergenic variant between FAP (+10.49kb) and IFIH1 (-13.05kb) and is in high LD (r² = 0.89) with a missense SNP rs1990760 in IFIH1; IFIH1 is involved in innate (anti-viral) immune response (e.g. against coronaviruses), autoimmunity and autoinflammatory response [70-72].
In addition, FAP plays a pro-tumourigenic role in several cancers including breast, colorectal, gastric and esophageal cancer [73], and also facilitates immunosuppression to enhance colorectal and gastric cancer progression [74, 75]. rs10774625, an intronic variant in ATXN2, is a lead SNP for hypothyroidism susceptibility [76], and is in high LD (r² = 0.9) with a missense SNP rs3184504 in SH2B3; SH2B3 is involved in mediating immune cell stimulation and inflammatory signalling [77, 78]. Another novel SNP rs4409785 near

At 10p15.1, rs706779 is an intronic variant in the interleukin 2 receptor subunit alpha (IL2RA) gene and lies in a strong transcriptionally active enhancer in immune cells [83]. IL2RA controls the immune response (tolerance) by modulating the function of regulatory T-cells [84]. rs706779 is a lead SNP for T1D, Crohn's disease and vitiligo, and is in LD with rs7090530 (r² = 0.73) for hypothyroidism and thyroid medication [26, 31, 60, 63, 85]. rs11059675 in LRRC43, and near IL31 (+9.58kb), is a lead SNP for psoriasis [86], and is in LD with rs7968808 (r² = 0.98), a lead SNP for eczema [31]. IL31 induces and modulates skin allergic diseases [87]. rs17391694 near GIPC2 (+20.51kb) is a lead SNP for Crohn's disease, an inflammatory autoimmune disorder [85], and lung cancer [88].

For both BCC and SCC, we also identified novel loci that have been previously reported as being associated with cardiometabolic biomarkers. The BCC novel loci for this pathway included rs174570 in FADS2, rs10774625 in ATXN2, and rs1136165 near CKB (associated with BMI), whilst for SCC they included rs3768321 in PABPC4, and rs1260326 near GCKR.
rs174570 is an intronic variant in FADS2, 12.68kb away from FADS1; it is a lead SNP for higher LDL cholesterol, total cholesterol and triglyceride levels, and is in LD with lead SNPs for polyunsaturated fatty acid (PUFA) levels in people of European descent, e.g. rs174547 (r² = 0.36), rs174577 (r² = 0.34), and rs174538 (r² = 0.42) [89, 90]. FADS1/2 genes are involved in the downstream metabolism of the plasma omega-6 and omega-3 PUFA, resulting in oncogenic inflammatory biomarkers (prostaglandins E, thromboxane A2, and leukotriene B) [91]. rs1260326 in GCKR is a lead SNP for triglycerides, total cholesterol and fasting plasma glucose [92, 93]. Mutations in GCKR are known to be diabetogenic [94]. rs3768321 in PABPC4 is a lead SNP for HDL cholesterol [92].

Some novel variants in the cardiometabolic pathway had pleiotropic effects with pigmentation and autoimmune traits. For example, rs1136165 in CKB, a lead SNP for BMI [31], is in LD with rs55859054 (r² = 0.89) and rs3825566 (r² = 0.88), which are associated with hair colour [31, 32]. rs10774625 in ATXN2 is linked to a spectrum of cardiometabolic markers (diastolic and systolic blood pressure, CVD, coronary artery disease, and LDL cholesterol; Supplementary Table 4) [31, 53, 95, 96], and is also a lead SNP for immune regulatory phenotypes, namely hypothyroidism and T1D [76, 97].

Pigmentation is a crucial pathway in the development of BCC, SCC and melanoma. Known pigmentation genes like MC1R and IRF4 play an important role in the genetic susceptibility to skin cancers, and many new loci were associated with pigmentation traits. rs2924552 is near TPCN2 (+31.3kb); rare mutations in TPCN2 result in blond rather than brown hair among Icelanders and the Dutch [98].
rs9878566 in LINC00886 is in perfect LD (r² = 1.00) with rs9818780 for sunburns [31]. rs6889986 in GPR98 is in LD with lead SNPs for hair colour, rs60325490 (r² = 0.69) and rs6860111 (r² = 0.69) [31, 32]. rs77758638 in AP3M2 is in LD with rs113060680 (r² = 0.93) for hair colour and skin tanning response [32]. However, there were also a number of loci with a potential role in KC initiation and progression that had pleiotropic effects with pigmentation traits (as explained above), e.g. rs35563099 and rs7098111 near RAB11FIP2.

Some novel BCC loci were previously known for cutaneous melanoma (CM). rs73008229 near ATM is in LD (r² ~ 1) with rs1801516, a missense variant (i.e. L76I) in ATM; ATM has a cell cycle function in the DNA damage response due to radiotoxicity after radiotherapy [99-101]. rs1801516 has previously been associated with cutaneous melanoma [102-104]. Several other novel BCC SNPs are in high LD with lead SNPs for cutaneous melanoma: rs2369633 near DSTYK is in LD with rs11240396 (r² = 0.73), rs6889986 near the GPR98 locus is in LD with rs10942621 (r² = 0.71), and rs10766301 in SOX6 is in LD with rs2054095 (r² = 0.65) [20].

Some novel loci have not been associated with any trait before (at a genome-wide significance level): rs12142181 (near RNA5SP45) and rs7301141 (near FBRSL1) for BCC, and rs12142181 (near RNA5SP45) for SCC. However, rs12142181 is in LD with rs111599055 (r² = 0.53) for early onset of prostate cancer [105].

After correction for multiple testing (P = 0.05/18,188 genes = 2.75 × 10⁻⁶), gene set analysis revealed

During the validation of the PRSs, S5 (i.e.
P < 10⁻⁴ with 273 SNPs for the MTAGPRS and 462 SNPs for the UKBPRS) was the optimal PRS model for both MTAGPRS and UKBPRS, with Nagelkerke R² of 10.65% and 9.55% respectively (Figure 5a). The SNPs for the optimal models are presented in (Figure 5b). In addition, the net reclassification index for KC risk was greater for MTAGPRS than the UKBPRS (Figure 5c) (Figure 5d).

indicates the MTAG-derived PRS. 5a: Validation of the BCC MTAGPRS and UKBPRS models to select the best performing index based on clumped SNPs at S1 (P < 5 × 10⁻⁸), S2 (P < 10⁻⁷), S3 (P < 10⁻⁶), S4 (P < 10⁻⁵), S5 (P < 10⁻⁴), S6 (P < 10⁻³), S7 (P < 10⁻²) and S8 (P < 10⁻¹) on the x-axis. The y-axis represents Nagelkerke's R² (%), a measure of model fit. PRS model S1 and S5 are the optimal PRS models for UKBPRS and MTAGPRS respectively. 5b: Shows and compares the association between the UKBPRS and MTAGPRS and KC risk in CLSA (N = 18,515), expressed in odds ratios (y-axis) per standard deviation increase in the PRS, adjusted for age, sex and the first 10 ancestral PCs. 5c: Illustrates that the MTAGPRS performs better than the UKBPRS based on both the categorical and continuous net reclassification improvement indices. 5d:
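The idea of building one PRS per P-value threshold (S1 to S8 in the Figure 5a legend) can be sketched as follows. This is a simplified illustration under stated assumptions rather than the software used in the study: `prs_by_threshold` is a hypothetical helper, the score is taken to be the sum of effect-allele counts weighted by log odds ratios over already clumped SNPs, and the toy dosages, weights and P-values are invented.

```python
# P-value thresholds S1-S8 as listed in the Figure 5a legend.
THRESHOLDS = {
    "S1": 5e-8, "S2": 1e-7, "S3": 1e-6, "S4": 1e-5,
    "S5": 1e-4, "S6": 1e-3, "S7": 1e-2, "S8": 1e-1,
}


def prs_by_threshold(dosages, weights, p_values, thresholds=THRESHOLDS):
    """Return {label: [score per person]} using SNPs with P below each cutoff.

    dosages  -- one list per person of effect-allele counts (0-2), one per SNP
    weights  -- per-SNP log odds ratios from the discovery GWAS
    p_values -- per-SNP discovery P-values (SNPs assumed already clumped)
    """
    scores = {}
    for label, cutoff in thresholds.items():
        kept = [j for j, p in enumerate(p_values) if p < cutoff]
        scores[label] = [sum(person[j] * weights[j] for j in kept)
                         for person in dosages]
    return scores


# Toy data (invented): two people, three SNPs.
toy_dosages = [[0, 1, 2], [2, 0, 1]]
toy_weights = [0.1, 0.2, -0.3]
toy_p_values = [1e-9, 5e-5, 0.05]
toy_scores = prs_by_threshold(toy_dosages, toy_weights, toy_p_values)
```

Each candidate score would then be regressed against case status in the validation sample, and the threshold with the best fit (here measured by Nagelkerke's R²) retained.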
In this large multi-trait GWAS analysis we show that cutaneous melanoma, "all-cancer", pigmentation traits, autoimmune diseases and other serum metabolic biomarkers are genetically correlated with BCC and SCC. We have leveraged this genetic correlation using the MTAG approach to identify 78 and 69 independent genome-wide significant loci for BCC and SCC risk respectively, the most common skin cancers among fair skinned people. 19 BCC and 15 SCC novel loci were replicated in the 23andMe cohort, indicating our study uncovers a number of novel findings relevant to keratinocyte cancer biology.

Firstly, we identify novel loci in the pigmentation pathways for both BCC and SCC susceptibility. Consistent with the importance of sun exposure in keratinocyte cancer biology [106], several new loci for BCC and SCC were linked to pigmentation traits including skin colour, red hair, skin tanning response and sunburns. The gene set analysis results also confirmed that we identified biological pathways involved in melanin biosynthesis and DNA damage response.

Second, our study affirms the role of immune-regulatory processes and pathways in BCC and SCC susceptibility. We show novel BCC and SCC loci known for immune regulatory processes including HIV viral load modulation [67, 68], innate immune response to coronaviruses (through IFIH1) [107-109], autoimmune disease susceptibility and medication use, especially for thyroid disease or hypothyroidism. This indicates possible shared biology between keratinocyte cancers and immune-related viruses.

Third, immunosuppressive medications including azathioprine and cyclosporin A have been implicated in BCC and SCC risk [110, 111]. While we uncovered novel KC loci linked to immune-related medication use including anti-asthmatic inhalants and thyroid preparations [63], it is likely
that the medication-related loci identified here are just a proxy indicator for the autoimmune disease. Thus, these medications are unlikely to cause BCC or SCC. In addition, even if these diseases were all treated with drugs that greatly increased the risk of KC, (a) they are too rare to lead to a cryptic genetic correlation as large as what we see here, e.g. for hypothyroidism (rg = -0.19, P = 1.05 × 10⁻⁴) (Supplementary Table 1), and (b) the genetic correlation, e.g. for hypothyroidism, was negative with BCC, whereas a drug-induced cryptic overlap would give a positive genetic correlation.

Fourth, our study also highlights the potential role of cardiometabolic biomarkers in BCC/SCC risk. Besides the PUFA levels, whose causal association with BCC risk has been established through a Mendelian randomisation study [112], our results highlight a potential causal relationship between cardiometabolic biomarkers (including diastolic and systolic blood pressure, lipids, serum glucose, cholesterol and adiposity) and the risk of BCC and SCC. As is the case for PUFA, downstream metabolism of these cardiometabolic biomarkers such as lipids and cholesterol results in oncogenic inflammatory biomarkers (e.g. prostaglandins E, thromboxane A2, and leukotriene B). However, some novel variants for the cardiometabolic pathway could be influencing BCC and SCC risk through already known pigmentation and immune regulatory biological pathways, e.g. rs1136165 in CKB, and rs10774625 in ATXN2 [31, 32, 76, 97].

Fifth, we also unveil important genes with a potential role in BCC and SCC initiation and progression, e.g. FAP, CDKL1, MARK3, RAB11FIP2, GAB2, SUOX and SOX6.
Although loci within these genes had pleiotropic effects with pigmentation traits, the aforementioned genes have established roles in tumor cell proliferation, migration and invasion, and downregulation of apoptosis in a number of cancers, e.g. melanoma, colorectal cancer, and breast cancer [39, 48-50, 74, 75, 113].

Besides the novel findings, our results further emphasise the shared biology between cutaneous melanoma and keratinocyte cancers. In total, four novel loci for BCC and SCC at ATM, DSTYK, GPR98, and SOX6 were previously known for CM [20, 103, 104].

One strength of the MTAG method is the increase in statistical power to identify several loci that a standard single trait GWAS would not have identified. For example, using MTAG we increased our effective sample size by 2.6 times and 8.3 times for BCC and SCC respectively. Owing to the great improvement in statistical power, our MTAG-derived BCC PRS outperformed (for KC risk stratification) the one derived from a single trait BCC GWAS. One caveat with the MTAG approach is that it assumes that the genetic variants have a homogeneous effect across all the included traits so that the results are not driven by a certain trait to result in false positives [15] Figure 2). Secondly, there was good replication of our results in an independent cohort, which counters concerns of false positives.

In conclusion, leveraging the genetic correlation between skin cancers, autoimmune diseases and pigmentation traits revealed novel susceptibility loci for SCC and BCC, and produced a powerful and optimised PRS for KC risk stratification.
Novel loci are implicated in keratinocyte cancer development and progression, pigmentation, cardiometabolic pathways, and immune-regulatory pathways including innate immunity against coronaviruses, and HIV-1 viral load modulation and disease progression.

Participants that contributed to the phenotype-specific genome-wide association studies were of homogenous European ancestry drawn from different cohorts from Australia, Europe and America. While there was sample overlap across the included GWAS, MTAG adjusts and corrects for biases due to sample overlap [15]. The major cohorts used included the UK Biobank (UKB) [18] [8, 114].

in Canada comprising about 50,000 participants (45-85 years) randomly recruited between 2010 and 2015 from ten provinces [115, 116]. More information about the cohort has been published elsewhere [115, 116] and is summarised here. It consists of two cohorts: the "Tracking cohort" of ~20,000 participants recruited through a telephone questionnaire in ten provinces, and the "Comprehensive cohort" with ~30,000 individuals who provided data through an in-person questionnaire, clinical/physical tests and biological samples (e.g. for genetic data) in seven provinces. In general, at baseline, information on relevant variables including age and sex was recorded, and participants were also asked whether they had been diagnosed with any cancer including KC.
Firstly, for purposes of validation and selection of the optimal PRS models (as described below in Stage 6 analysis) we randomly selected 1,523 cancer-free controls and 388 prevalent KC cases at the baseline. Thus, our validation sample included 1,911 participants with a mean age of 65.81 years (sd = 10.25) and 52.75% males. Secondly, we tested the BCC PRSs in a second sample (unrelated to the validation dataset) of 18,933 participants of European ancestry, with a mean age of 61.80 years (sd = 9.84) and 49.63% males, followed up for a mean duration of 2.9 years (sd = 0.3). Only participants with complete data on age, sex, cancer status and KC diagnosis were included. Thus, the sample comprised 18,139 controls with no history of any cancer (at follow up 1) and 794 participants who developed KC during follow up.

We conducted two case-control GWAS using UKB data for BCC, N = 307,684 (20,791 cases and 286,893 controls) and SCC, N = 294,294 (7,402 SCC cases and 286,892 controls) of European ancestry. We adjusted for age and sex as well as the first ten ancestral principal components (PCs) in order to control for biases from population stratification. We used the Scalable and Accurate Implementation of GEneralized mixed model (SAIGE) software for the analysis since it controls for sample relatedness and case-control imbalance [25]. Analysis was restricted to single nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) > 1% and an imputation quality score threshold of 0.3. BCC/SCC case ascertainment and definition are described in Supplementary Information. In addition, we conducted GWAS for pigmentation traits (e.g. skin colour, hair colour, tanning response, skin burn, sunburn, etc.), all-cancer, autoimmune conditions, and blood biochemistry biomarkers (e.g. C-reactive protein, vitamin D, glucose, albumin, aspartate aminotransferase,
gamma-glutamyl transferase, etc.) using data from international cohorts including UKB, QSkin, and GERA as described in Supplementary Information, Supplementary Table 2 and Supplementary Table 3. We also conducted GWAS on KC and all-cancer after accessing data from eMERGE (dbGaP, study accession: phs000360.v3.p1) and GERA (dbGaP, study accession: phs000674.v3.p3) cohorts respectively (Supplementary Information). We also accessed publicly available GWAS summary statistics e.g. for cutaneous melanoma [20], smoking [28]

We used LDSC version 1.0.1 [119] to compute the genetic correlation (rg) [16] between BCC and a range of other traits including other skin cancer types, pigmentation traits, autoimmune traits and biochemistry biomarkers (recently released in the UKB). We then repeated this process for SCC instead of BCC. We used data from publicly available GWAS, as well as GWAS data from international cohorts of participants of European ancestry (conducted in stage 1 above). Traits with a statistically significant (P < 0.05) rg greater than 10% with either BCC or SCC were selected and included in the MTAG model (Figure 1 & Supplementary Table 1). We further sought additional traits that were genetically correlated with BCC or SCC using data from the LD Hub catalog [17]. Out of about 700 phenotypes, no additional phenotypes were selected to be included in the final MTAG model. European ancestry met the inclusion criteria. The 22 genetically correlated traits included BCC, SCC, skin colour, hair colour excluding red hair, hypothyroidism, type 1 diabetes, gamma-glutamyl transferase, aspartate aminotransferase, serum vitamin D levels, albumin, C-reactive
protein, and glucose in the UK Biobank [19], KC, red hair and mole count in QSkin [21], KC in eMERGE (dbGaP, study accession: phs000360.v3.p1), all-cancer in the GERA cohort (dbGaP, study accession: phs000674.v3.p3), melanoma risk as measured by the latest and largest melanoma risk GWAS meta-analysis [20], vitiligo [26], education attainment [27] and smoking [28]. All the above studies excluded 23andMe, to enable us to utilise the 23andMe data as a replication set. Details on the phenotypic measurements and definitions are described in the Supplementary Information, Supplementary Table 2 and Supplementary Table 3.

Stage 3: Multi-trait analysis of GWAS summary statistics

Next, using a total of 22 genetically correlated traits we conducted a multi-phenotype analysis of GWAS summary statistics (generated at stage 1 analysis and selected in stage 2) using MTAG software version 1.0.8 [15]. MTAG default settings were used. MTAG combines GWAS summary statistics for genetically correlated traits into a meta-analysis while accounting for genetic correlation and sample overlap, maximising power to identify loci associated with the trait(s) of interest (here BCC and SCC) [15]. MTAG generates trait specific results for each phenotype included in the model. BCC and SCC GWAS summary data from UKB from stage 1 were included as trait 1 and 2 respectively in the model below:

MTAG model: BCC + SCC + melanoma + pigmentation traits + autoimmune traits + …….

After the quality control measures, the analysis was restricted to 5,301,239 SNPs common to all the 22 GWAS with a minor allele frequency of >1%, and no ambiguous alleles.
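The SNP restriction described above can be sketched as a small quality-control filter. This is an illustrative sketch only: it assumes that "ambiguous alleles" refers to strand-ambiguous (palindromic) A/T and C/G pairs, and the record layout (`a1`, `a2`, `eaf`) and helper names are hypothetical rather than taken from MTAG or the study's scripts.

```python
# Allele pairs that read the same on both strands (assumed meaning of "ambiguous").
STRAND_AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def is_strand_ambiguous(a1, a2):
    """True for A/T and C/G SNPs, whose strand cannot be resolved from alleles alone."""
    return frozenset((a1.upper(), a2.upper())) in STRAND_AMBIGUOUS


def passes_qc(snp, min_maf=0.01):
    """Keep SNPs with MAF > 1% and non-ambiguous alleles.

    snp -- dict with hypothetical keys 'a1', 'a2' and 'eaf' (effect allele frequency)
    """
    maf = min(snp["eaf"], 1.0 - snp["eaf"])
    return maf > min_maf and not is_strand_ambiguous(snp["a1"], snp["a2"])


def shared_snps(gwas_list):
    """IDs present in every GWAS (each GWAS is a dict of SNP id -> record) that pass QC in all."""
    common = set(gwas_list[0])
    for gwas in gwas_list[1:]:
        common &= set(gwas)
    return sorted(s for s in common if all(passes_qc(g[s]) for g in gwas_list))


# Toy input (invented): two GWAS, three SNP ids.
gwas_a = {"rs1": {"a1": "A", "a2": "G", "eaf": 0.30},
          "rs2": {"a1": "A", "a2": "T", "eaf": 0.40},
          "rs3": {"a1": "C", "a2": "T", "eaf": 0.005}}
gwas_b = {"rs1": {"a1": "A", "a2": "G", "eaf": 0.28},
          "rs2": {"a1": "A", "a2": "T", "eaf": 0.41}}
kept = shared_snps([gwas_a, gwas_b])
```

In this toy input only `rs1` survives: `rs2` is an A/T SNP and `rs3` is both rare and absent from the second GWAS.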
We assessed the effective increase in sample size after MTAG by comparing the average chi-squared before and after MTAG for BCC and for SCC using the following formula:

$$\frac{1 - \overline{\chi^2}_{\text{MTAG output}}}{1 - \overline{\chi^2}_{\text{MTAG input}}}$$

where $\overline{\chi^2}$ is the average chi-squared statistic and MTAG input corresponds to the input for either BCC or SCC in the UKB dataset.

We used FUMA v.1.3.6 120 to identify independent, genome-wide significant SNPs and the genomic risk loci, and performed annotation of candidate SNPs in the genomic loci and functional gene mapping. We also conducted gene-based and pathway analyses using MAGMA v.1.7, as implemented in FUMA v.1.3.6 121. For the gene pathway analysis, gene ontology (GO) and curated gene sets from MSigDB (v5.2) 122 were used and corrected for multiple testing. The GWAS Catalog 123 and the Open Targets platform 124 were used to annotate novel loci and their relationship with other traits.

Next, we sought to replicate the BCC and SCC susceptibility loci in a large independent cohort using data from the 23andMe research cohort. For BCC, the replication cohort included 251,963 self-reported cases and 2,271,667 controls, while the SCC replication cohort comprised 134,700 self-reported cases and 2,394,699 controls of European ancestry, filtered to remove close relatives. Previous studies have shown high accuracy of 23andMe BCC/SCC self-reported cases 8. The BCC results were adjusted for a genomic control inflation factor λ = 1.286. The equivalent inflation factor for 1,000 cases and 1,000 controls was λ1000 = 1.001, and for 10,000 cases and 10,000 controls, λ10000 = 1.006.
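The text does not state how the equivalent inflation factors were derived. As a sketch, the commonly used rescaling λ_n = 1 + (λ − 1) × (1/n_cases + 1/n_controls) / (1/n + 1/n) reproduces the reported BCC values when applied to the replication sample sizes given above. The function name `lambda_scaled` is ours, and the use of this particular formula is an assumption.

```python
def lambda_scaled(lam, n_cases, n_controls, n_ref=1000):
    """Rescale a genomic inflation factor to n_ref cases and n_ref controls.

    Assumes the standard sample-size rescaling of lambda; the source text
    reports the rescaled values but not the formula used.
    """
    study = 1.0 / n_cases + 1.0 / n_controls
    reference = 1.0 / n_ref + 1.0 / n_ref
    return 1.0 + (lam - 1.0) * study / reference


# BCC replication cohort: lambda = 1.286, 251,963 cases, 2,271,667 controls.
bcc_1000 = lambda_scaled(1.286, 251_963, 2_271_667, n_ref=1000)
bcc_10000 = lambda_scaled(1.286, 251_963, 2_271_667, n_ref=10_000)
print(round(bcc_1000, 3), round(bcc_10000, 3))  # 1.001 1.006
```

Because the rescaled value depends on sample size, a raw λ well above 1 in a cohort of this size is compatible with very little inflation per 1,000 cases and controls.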
In a similar way, the SCC results were adjusted for a genomic control inflation factor λ = 1.172. The equivalent inflation factor for 1,000 cases and 1,000 controls was λ1000 = 1.001, and for 10,000 cases and 10,000 controls, λ10000 = 1.007. We compared the concordance of the effect sizes (log OR) for the MTAG results versus the replication results (Figure 4c and Figure 4d). We further analysed the number of loci that replicated at the genome-wide significance level (P < 5.0 × 10⁻⁸), after Bonferroni correction for multiple testing (P < 6.49 × 10⁻⁴ for BCC and P < 7.24 × 10⁻⁴ for SCC), and at a nominal P < 0.05.

To construct two comparable polygenic risk scores (PRSs) for BCC, we separately used the BCC MTAG output (generated in stage 3) and the UKB BCC single-trait GWAS (generated in stage 1) summary statistics as the discovery data sets. MTAG 15 drops SNPs with extremely significant associations with any input trait, which resulted in a number of previously reported pigmentation-associated SNPs being dropped from the model. Hence, in both the MTAG and UKB discovery GWAS summary statistics we also included four functional SNPs (rs1805007 for MC1R, rs1126809 for TYR, rs6059655 for ASIP, and rs12203592 for IRF4) that would otherwise have been dropped from the PRS, using the weights from a previously published BCC PRS 125. Next, using autosomal, non-ambiguous, bi-allelic SNPs overlapping with the CLSA cohort (MTAG discovery = 5,300,872 SNPs and UKB discovery = 5,300,868 SNPs), we performed LD clumping (r² = 0.005, LD window = 5000 kb, P = 1) to yield 62,494 and 62,884 independent SNPs for the MTAGPRS and UKBPRS models respectively. PLINK 1.90b6.8 126 was used for clumping.

Using the clumped independent SNPs above, we generated PRS models at varying P-value thresholds, i.e. S1 (P < 5 × 10⁻⁸), S2 (P < 10⁻⁷), S3 (P < 10⁻⁶), S4 (P < 10⁻⁵), S5 (P < 10⁻⁴), S6 (P < 10⁻³), S7 (P < 10⁻²) and S8 (P < 10⁻¹), in a validation sample of 1,911 participants split from the CLSA cohort, using the log odds ratios (from the respective discovery GWAS, MTAG or UKB) as weights. PLINK2 (v2.00a3LM, 5 May 2021 release) 126 was used for generating the PRS scores. For both the MTAG and UKB PRS models, we used Nagelkerke's R² 127, a metric of model fit, to select the optimal model. We computed R² by comparing the fit of models with PRSs (BCC ~ MTAGPRS or UKBPRS + age + sex + 10 PCs) against a null model, using the PredictABEL package 128 in R version 4.0.2 129.

We generated individual scores for CLSA participants for both the BCC MTAGPRS and UKBPRS 126, weighted by their respective effect sizes (log odds ratios). The genetic scores were standardized to a variance of 1 so that the associations can be interpreted as odds ratios per standard deviation increase in the PRS. We compared the performance of the two BCC PRSs (MTAGPRS vs UKBPRS) based on the magnitude of the association (odds ratios) and the net reclassification improvement for KC risk using R version 4.0.2 129. For net reclassification improvement, we compared the net reclassification index and the percentage of participants who were reclassified to an appropriate risk group/tertile, i.e. low risk (bottom tertile), moderate risk (middle tertile), and high risk (top tertile), after adding the MTAGPRS vs UKBPRS to the base model containing age, sex and the 10 PCs.
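The thresholding and scoring steps above can be sketched in a few lines of Python. This is an illustration of the general technique (a weighted sum of allele dosages over clumped SNPs passing a P-value threshold, then rescaling to unit variance), not the PLINK2 commands used in the study. The SNP IDs, weights, dosages and function names are hypothetical, and the sample standard deviation is assumed for standardization.

```python
from statistics import mean, stdev

# P-value thresholds S1 to S8 as listed in the text.
THRESHOLDS = {
    "S1": 5e-8, "S2": 1e-7, "S3": 1e-6, "S4": 1e-5,
    "S5": 1e-4, "S6": 1e-3, "S7": 1e-2, "S8": 1e-1,
}


def prs(dosages, snps, p_max):
    """Sum of log(OR) x effect-allele dosage over SNPs with P < p_max."""
    return sum(s["beta"] * dosages[s["snp"]] for s in snps if s["p"] < p_max)


def standardize(values):
    """Centre scores and scale them to a (sample) variance of 1."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical clumped SNPs with log odds ratio weights and discovery P-values.
snps = [
    {"snp": "rsA", "beta": 0.20, "p": 1e-9},
    {"snp": "rsB", "beta": -0.10, "p": 5e-5},
    {"snp": "rsC", "beta": 0.05, "p": 0.03},
]
# Hypothetical effect-allele dosages (0, 1 or 2) for three participants.
people = [
    {"rsA": 0, "rsB": 1, "rsC": 2},
    {"rsA": 1, "rsB": 0, "rsC": 1},
    {"rsA": 2, "rsB": 2, "rsC": 0},
]

# One raw score per participant at each threshold.
scores = {
    name: [prs(person, snps, p_max) for person in people]
    for name, p_max in THRESHOLDS.items()
}
z_s1 = standardize(scores["S1"])
print(scores["S1"], z_s1)
```

With scores on this scale, a logistic regression coefficient for the PRS exponentiates to an odds ratio per standard deviation increase, which is how the associations in the text are expressed.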
The full GWAS summary statistics for this study will be made available through the NHGRI-EBI GWAS Catalogue (https://www.ebi.ac.uk/gwas/downloads/summary-statistics). The PRS developed and utilised in this study are provided in Supplementary Tables 7 and 8 and will also be made available via the PRS catalog (https://www.pgscatalog.org/). UK Biobank data are available through application via https://www.ukbiobank.ac.uk/. Information on UKB blood biochemistry biomarkers can be found in the UKB dataset (http://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=17518). Data are available from the Canadian Longitudinal Study on Aging (www.clsa-elcv.ca) for researchers who meet the criteria for access to de-identified CLSA data. The variant-level data for the 23andMe replication dataset are fully disclosed in the manuscript.

under Application Number 190225.
The CLSA is led by

The following members of the 23andMe Research Team contributed to this study

Prevalence and costs of skin cancer treatment in the U.S
Cancer in Australia: Actual incidence data from 1982 to 2013 and mortality data from 1982 to 2014 with projections to 2017
Non-melanoma skin cancer in Australia
Keratinocyte Carcinomas: Current Concepts and Future Research Priorities
The incidence and multiplicity rates of keratinocyte cancers in Australia
Psoriasis regression analysis of MHC loci identifies shared genetic variants with vitiligo
Common and different genetic background for rheumatoid arthritis and coeliac disease
Genome-wide association study identifies 14 novel risk alleles associated with basal cell carcinoma
Investigation of type 1 diabetes and coeliac disease susceptibility loci for association with juvenile idiopathic arthritis
Genome-wide association studies and polygenic risk scores for skin cancer: clinically useful yet?
A meta-analysis of 87,040 individuals identifies 23 new susceptibility loci for prostate cancer
The Tensin-3 protein, including its SH2 domain, is phosphorylated by Src and contributes to tumorigenesis and metastasis
Combined analysis of keratinocyte cancers identifies novel genome-wide loci
Multi-trait analysis of genome-wide association summary statistics using MTAG
Estimation of Genetic Correlation via Linkage Disequilibrium Score Regression and Genomic Restricted Maximum Likelihood
LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis
UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age
The UK Biobank resource with deep phenotyping and genomic data
Genome-wide association meta-analyses combining multiple risk phenotypes provide insights into the genetic architecture of cutaneous melanoma susceptibility
Cohort profile: the QSkin Sun and Health Study
The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies
Electronic medical records for genetic research: results of the eMERGE consortium
Characterizing Race/Ethnicity and Genetic Ancestry for 100,000 Subjects in the
Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies
Genome-wide association studies of autoimmune vitiligo identify 23 new risk loci and highlight key pathways and regulatory variants
Genome-wide association study identifies 74 loci associated with educational attainment
Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use
Mapping the Human Kinome in Response to DNA Damage
Isolation of a novel human gene, MARKL1, homologous to MARK3 and its involvement in hepatocellular carcinogenesis
Leveraging Polygenic Functional Enrichment to Improve GWAS Power
Genome-wide study of hair colour in UK Biobank explains most of the SNP heritability
Rab11-FIP2 promotes the metastasis of gastric cancer cells
Overexpression of Rab11-FIP2 in colorectal cancer cells promotes tumor migration and angiogenesis through increasing secretion of PAI-1
Genome-wide association study in 176,678 Europeans reveals genetic loci for tanning response to sun exposure
A GWAS in Latin Americans highlights the convergent evolution of lighter skin pigmentation in Eurasia
Genome-wide association study in Japanese females identifies fifteen novel skin-related trait associations
Genome-wide association meta-analysis of individuals of European ancestry identifies new loci explaining a substantial fraction of hair color variation and heritability
Grb2-Associated Binder2) Plays a Crucial Role in Inflammatory Signaling and Endothelial Dysfunction
(Review)
A role for the scaffolding adapter GAB2 in breast cancer
Increased proliferation and altered growth factor dependence of human mammary epithelial cells overexpressing the Gab2 docking protein
Characterization of tumor-suppressive function of SOX6 in human esophageal squamous cell carcinoma
Identification of Sox6 as a regulator of pancreatic cancer development
The role of Sox6 and Netrin-1 in ovarian cancer cell growth, invasiveness, and angiogenesis
MicroRNA-766 targeting regulation of SOX6 expression promoted cell proliferation of human colorectal cancer
Decreased expression of SOX6 confers a poor prognosis in hepatocellular carcinoma
RNAi-mediated downregulation of CDKL1 inhibits growth and colony-formation ability, promotes apoptosis of human melanoma cells
CDKL1 promotes tumor proliferation and invasion in colorectal cancer
Evaluation of cyclin-dependent kinase-like 1 expression in breast cancer tissues and its regulation in cancer cell growth
FERM domain-containing protein FRMD5 regulates cell motility via binding to integrin β5 subunit and ROCK1
FERM-containing protein FRMD5 is a p120-catenin interacting protein that regulates tumor progression
Genome-wide association analyses using electronic health records identify new
SUOX is negatively associated with multistep carcinogenesis and proliferation in oral squamous cell carcinoma
SUOX is a promising diagnostic and prognostic biomarker for hepatocellular carcinoma
Sulfite Oxidase Is a Novel Prognostic Biomarker of Advanced Gastric Cancer
The ribosomal protein S26 regulates p53 activity in response to DNA damage
An inherently bifunctional subset of Foxp3+ T helper cells is controlled by the transcription factor eos
Eos mediates Foxp3-dependent gene silencing in CD4+ regulatory T cells
Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers
Shared genetic origin of asthma, hay fever and eczema elucidates allergic disease biology
Genetic influences on susceptibility to rheumatoid arthritis in African-Americans
Genome-wide association study of medication-use and associated disease in the UK Biobank
Polymorphisms of large effect explain the majority of the host genetic contribution to variation of HIV-1 virus load
Association Between Single-Nucleotide Polymorphisms in HLA Alleles and Human Immunodeficiency Virus Type 1 Viral Load in Demographically Diverse, Antiretroviral Therapy-Naive Participants From the Strategic Timing of AntiRetroviral Treatment Trial
G protein-coupled receptor kinases promote phosphorylation and beta-arrestin-mediated internalization of CCR5 homo- and hetero-oligomers
Mutational analysis of the CCR5 and CXCR4 genes (HIV-1 co-receptors) in resistance to HIV-1 infection and AIDS development among intravenous drug users
Genetics of HIV-1 infection: chemokine receptor CCR5 polymorphism and its consequences
Human B cells express the orphan chemokine receptor CRAM-A/B in a maturation-stage-dependent and CCL5-modulated manner
Differential roles of MDA5 and RIG-I helicases in the recognition of RNA viruses
The interferon-induced helicase C domain-containing protein 1 gene variant (rs1990760) as an autoimmune-based pathology susceptibility factor
A Balancing Act: MDA5 in Antiviral Immunity and Autoinflammation
Pro-tumorigenic roles of fibroblast activation protein in cancer: back to the basics
FAP positive fibroblasts induce immune checkpoint blockade resistance in colorectal cancer via promoting immunosuppression
Fibroblast Activation Protein-α-Positive Fibroblasts Promote Gastric Cancer Progression and Resistance to Immune Checkpoint Blockade
Detection and interpretation of shared genetic influences on 42 human traits
The adaptor Lnk (SH2B3): an emerging regulator in vascular cells and a link between immune and inflammatory signaling
Lnk prevents inflammatory CD8+ T-cell proliferation and contributes to intestinal homeostasis
A sestrin-dependent Erk-Jnk-p38 MAPK activation complex inhibits immunity during aging
Genetics of rheumatoid arthritis contributes to biology and drug discovery
Genome-wide association analyses identify 13 new susceptibility loci for generalized vitiligo
HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants
Human IL2RA null mutation mediates immunodeficiency with lymphoproliferation and autoimmunity
Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations
Large scale meta-analysis characterizes genetic architecture for common psoriasis associated variants
Interleukin 31, a cytokine produced by activated T cells, induces dermatitis in mice
Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes
Genetic loci associated with plasma phospholipid n-3 fatty acids: a meta-analysis of genome-wide association studies from the CHARGE Consortium
Genome-wide association study of plasma N6 polyunsaturated fatty acids within the cohorts for heart and aging research in genomic epidemiology consortium
Current evidence linking polyunsaturated Fatty acids with cancer risk and progression. Front
Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program
A large electronic-health-record-based genome-wide study of serum lipids
Human glucokinase regulatory protein (GCKR): cDNA and genomic cloning, complete primary structure, and chromosomal localization
Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease
A Large-Scale Multi-ancestry Genome-wide Study Accounting for Smoking Behavior Identifies Multiple Significant Loci for Blood Pressure
Genome-wide association analysis of autoantibody positivity in type 1 diabetes cases
Two newly identified genetic determinants of pigmentation in Europeans
Radiogenomics Consortium Genome-Wide Association Study Meta-Analysis of Late Toxicity After Prostate Cancer Radiotherapy
ATM phosphorylation of Nijmegen breakage syndrome protein is required in a DNA damage response
DNA damage activates ATM through intermolecular autophosphorylation and dimer dissociation
Novel pleiotropic risk loci for melanoma and nevus density implicate multiple biological pathways
Genome-wide association study identifies three new melanoma susceptibility loci
Two-stage genome-wide association study identifies a novel susceptibility locus associated with melanoma
Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci
UV Radiation and the Skin
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) membrane (M) protein inhibits type I and III interferon production by targeting RIG-I/MDA-5 signaling
MDA5 Governs the Innate Immune Response to SARS-CoV-2 in Lung Epithelial Cells
The African-American population with a low allele frequency of SNP rs1990760 (T allele) in IFIH1 predicts less IFN-beta expression and potential vulnerability to COVID-19 infection
Azathioprine and Risk of Skin Cancer in Organ Transplant Recipients: Systematic Review and Meta-Analysis
Skin cancer in organ transplant recipients: effects of immunosuppressive medications on DNA repair
Polyunsaturated fatty acid levels and the risk of keratinocyte cancer: A Mendelian Randomisation analysis. Cancer Epidemiology Biomarkers & Prevention
Rab11-FIP2 suppressed tumor growth via regulation of PGK1 ubiquitination in non-small cell lung cancer
Genome-wide association study identifies novel susceptibility loci for cutaneous squamous cell carcinoma
Cohort Profile: The Canadian Longitudinal Study on Aging (CLSA)
The Canadian longitudinal study on aging (CLSA)
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program
Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry
Functional mapping and annotation of genetic associations with FUMA
MAGMA: generalized gene-set analysis of GWAS data
The Molecular Signatures Database Hallmark Gene Set Collection
The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog)
Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics
Polygenic Risk Scores Allow Risk Stratification for Keratinocyte Cancer in Organ-Transplant Recipients
Second-generation PLINK: rising to the challenge of larger and richer datasets
A note on a general definition of the coefficient of determination
PredictABEL: an R package for the assessment of risk prediction models
R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing)
The incidence and clinical analysis of non-melanoma skin cancer