key: cord-0910904-hat5e9uu authors: Khullar, Saniya; Wang, Daifeng title: Predicting gene regulatory networks from multi-omics to link genetic risk variants and neuroimmunology to Alzheimer’s disease phenotypes date: 2021-09-20 journal: bioRxiv DOI: 10.1101/2021.06.21.449165 sha: ea7ab79496a1157ab8e95ef637843fd213794b3e doc_id: 910904 cord_uid: hat5e9uu Background Genome-wide association studies have found many genetic risk variants associated with Alzheimer’s disease (AD). However, how these risk variants affect deeper phenotypes such as disease progression and immune response remains elusive. Also, our understanding of cellular and molecular mechanisms from disease risk variants to various phenotypes is still limited. To address these problems, we performed an integrative multi-omics analysis of genotype, transcriptomics, and epigenomics for revealing gene regulatory mechanisms from disease variants to AD phenotypes. Method First, given the population gene expression data of a cohort, we construct and cluster its gene co-expression network to identify gene co-expression modules for various AD phenotypes. Next, we predict transcription factors (TFs) regulating co-expressed genes and AD risk SNPs that interrupt TF binding sites on regulatory elements. Finally, we construct a gene regulatory network (GRN) linking SNPs, interrupted TFs, and regulatory elements to target genes and gene modules for each phenotype in the cohort. This network thus provides systematic insights into gene regulatory mechanisms from risk variants to AD phenotypes. Results Our analysis predicted GRNs in three major AD-relevant regions: Hippocampus, Dorsolateral Prefrontal Cortex (DLPFC), Lateral Temporal Lobe (LTL). Comparative analyses revealed cross-region-conserved and region-specific GRNs, in which many immunological genes are present. For instance, SNPs rs13404184 and rs61068452 disrupt SPI1 binding and regulation of AD gene INPP5D in the Hippocampus and LTL. However, SNP rs117863556 interrupts bindings of REST to regulate GAB2 in DLPFC only. Driven by emerging neuroinflammation in AD, we used Covid-19 as a proxy to identify possible regulatory mechanisms for neuroimmunology in AD. To this end, we looked at the GRN subnetworks relating to genes from shared AD-Covid pathways. From those subnetworks, our machine learning analysis prioritized the AD-Covid genes for predicting Covid-19 severity. Decision Curve Analysis also validated our AD-Covid genes outperform known Covid-19 genes for classifying severe Covid-19 patients. This suggests AD-Covid genes along with linked SNPs can be potential novel biomarkers for neuroimmunology in AD. Finally, our results are open-source available as a comprehensive functional genomic map for AD, providing a deeper mechanistic understanding of the interplay among multi-omics, brain regions, gene functions like neuroimmunology, and phenotypes. Alzheimer's Disease (AD), a neurodegenerative disease, affects over 50 million elders worldwide 1 . Late-onset AD (LOAD) comprises >97% of all cases, usually occurring after age 65 2 . AD patients experience phenotypic changes such as memory loss, cognitive decline, and weak executive function (e.g., poor Mini-Mental State Examination scores) 1 . Also, many underlying molecular changes happen, like amyloid-beta plaques, neurofibrillary tangles (NFTs), and chronic neuroinflammation (by dysregulated innate immunity). However, molecular mechanisms causing AD progression and phenotypes remain elusive. Gene expression and regulation is a key mechanism leading to human diseases. Studies have found differentially expressed genes (DEGs) in AD in different brain regions, e.g., Hippocampus CA1, Lateral Temporal Lobe (LTL), Dorsolateral Prefrontal Cortex (DLPFC), implying important roles of those regions to AD. Also, gene co-expression networks have been widely used to identify co-expressed gene modules and link gene expression patterns to AD phenotypes 3 . Genes in a module show similar expression dynamics across AD phenotypes, implying they share certain molecular mechanisms dysregulated in AD 4 . Nevertheless, understanding gene regulatory mechanisms controlling those DEGs and co-expressed genes and modules for various AD phenotypes is unclear. Recent Genome-Wide Association Studies (GWAS) identified many genetic variants associated with AD 5 . Linking those AD variants to genes and regulatory mechanisms provides a deeper understanding of molecular causes in AD. AD SNPs have further linked several AD genes, (e.g. BIN1, MS4A4A, SPI1, TP53INP1 6 ) plus 11 more by Proteome-Wide Association Studies 7 . TREM2 and CD33 mutations are associated with microglia (resident macrophage immune cells of the brain) activation, increased cytokines, and higher AD risk 8 . However, most AD SNPs locate on non-coding DNA, and it is still challenging to link them to potential AD genes. Biologically, gene expression is controlled by different regulatory factors such as regulatory variants, transcription factors (TFs), and regulatory elements (e.g., enhancers, promoters). Those factors work together as a gene regulatory network (GRN) to control the expression and functions of genes. For instance, a SNP can disrupt binding sites of TFs (TFBSs) on enhancers to cause abnormal gene expression, potentially leading to diseases, a.k.a, potential disease genes. Using such GRNs, recent studies have predicted many disease genes and functions driven by disease risk variants, such as 321 high-confident Schizophrenia genes 9 . However, how GRNs link AD variants to genes and functions for phenotypes is unclear, especially across regions. To address these issues, we performed an integrative analysis of multi-omics to reveal genes, functions, and GRNs from AD variants to AD phenotypes for three brain regions as above (Fig. 1, Methods) . Given a region, we first build a gene co-expression network using population gene expression data from an AD cohort and identify co-expressed genes and modules associated with AD phenotypes. We then integrate chromatin interaction data (e.g., Hi-C) and TF-gene expression relationships to predict TFs regulating co-expressed genes if they bind to the regulatory elements that control them. Finally, we find AD SNPs interrupting those TFBSs, and output a brain-region GRN linking them together. Moreover, since altered immunity is a major factor contributing to LOAD (e.g., neuroinflammation could start decades before), we used those GRNs to link potential biomarker genes and variants for neuroimmunology in AD. Particularly, we looked at the genes from the neuroimmunology pathways shared between AD and SARS-CoV-2 virus (Covid- 19) , two diseases with a recently reported strong correlation. AD brains have an increased level of circulating cytokines, associated with the cytokine storm (overactivated inflammatory immune system response) impacting Covid-19 morbidity and mortality 10 . Covid-19 has widely affected elders with neurodegenerative diseases and can mediate neuroinflammation 11 , thus providing potential novel insights on a dysregulated immune system in AD and progression. Covid-19 survivors can also have an increased risk of neurological and psychiatric problems, known as Neuro-COVID 12 . β-coronaviruses (like Covid-19) may attack the Central Nervous System, elevating AD dementia processes in the brain 13 . On the other hand, AD patients have a threefold higher risk of infection and doubly higher risk of death from Covid-19 14 . AD risk genes, like APOE4, increase severe Covid-19 susceptibility twofold 15 . Therefore, looking at gene regulatory mechanisms for shared AD-Covid pathways potentially enables a better mechanistic understanding of neuroimmunology in AD. The NFKB pathway, shared by AD and Covid-19, regulates normal brain homeostasis (maintaining synaptic plasticity, neurite outgrowth, learning, and memory; moderating neuronal survival and apoptosis; processing neuronal information) 10 , innate immunity and cellular processes including inflammation and cell growth 16 . Since Covid-19 serves as a strong marker for an exaggerated and overreactive immune system, elucidating pathways disrupted in both Covid-19 and AD may provide more insights on misguided immunity in AD onset and progression. A prominent hypothesis believes AD is primarily caused by impaired NFKB pathway signaling, where NFKB TFs are activated, resulting in neuroinflammation, oxidative stress complications, microglia activation, increased cytokine levels, and neuronal death observed in AD brain regions 10 . NFKB TFs are also involved in a positive feedback loop with activating cytokines during Covid-19, contributing to exaggerated inflammatory responses in severe Covid-19 17 . Along with the NFKB Pathway (which induces expression of proinflammatory cytokines and is a key driver in both diseases), there are several shared pathways in Covid- 19 and AD. However, underlying gene functions linking immunological functions from Covid-19 to AD are unknown. Finally, we found the subnetworks of GRNs relating to genes from AD-Covid shared pathways. We then trained a machine learning model to predict severe Covid-19 from those network genes using RNA-seq data of a recent Covid-19 cohort. Our network genes outperform known Covid-19 genes for prediction. Using our model, we prioritized a set of highly predictive genes as AD-COVID genes, which can be potential biomarkers for immune system dysregulation and inflammation in AD (since Covid-19 severity is correlated with excessive proinflammatory immune response 18 ). The pipeline of our integrative analysis for predicting gene regulatory mechanisms from AD risk variants to phenotypes Our analysis can be summarized as a pipeline to predict gene regulatory networks (GRNs) from disease risk variants to phenotypes (Fig. 1) . The network for specific phenotypes links disease risk variants (e.g., GWAS SNPs), non-coding regulatory elements, transcription factors (TFs) to genes and genome functions, providing comprehensive mechanistic insights on gene regulation in disease phenotypes. Specifically, the pipeline includes the following steps. Here, our analysis is open-source available at https://github.com/daifengwanglab/ADSNPheno. • Step 1: Input gene expression data at the population level. The input data includes gene expression data of individuals and clinical information on AD phenotypes such as Braak staging and progression. • Step 2: Construct and cluster gene co-expression network. The pipeline constructs a gene co-expression network linking all possible gene pairs from the input data. The network edge weights are the Pearson correlations of the gene expression profiles across input samples. The gene co-expression network is further clustered into gene co-expression network modules. The genes in the same co-expression module are likely involved in similar functions and co-regulated by specific regulatory mechanisms. • Step 3: Annotate modular functions by enrichment analyses. To annotate the functions of gene co-expression modules, we calculate enriched pathways and functions, including KEGG pathways, REACTOME pathways, and Gene Ontology (GO) terms of the genes in each gene co-expression module. • Step 4: Associate AD phenotypes with genes and modules. We associate genes and modules with the phenotypes of input samples, revealing potential driver genes and modules for the phenotypes. • Step 5: Predict GRNs for genes and modules. We apply multiple computational methods to predict the gene regulatory networks that link TFs, non-coding regulatory elements to genes and modules, providing regulatory mechanistic insights on AD genes and modules. • Step 6: Link disease risk variants to the gene regulatory network. Our analysis further finds disease risk variants that interrupt the binding sites of TFs (TFBSs) in the gene regulatory networks for identifying functional variants to genes and modules to AD phenotypes. Step 7: Output a gene regulatory network linking disease variants to AD phenotypes. Ultimately, this network is the output that links AD genetic risk variants, non-coding regulatory elements, TFs to genes, and genome functions (via modules) for various phenotypes in the input data. Population gene expression data and data processing in Alzheimer's disease We applied this pipelined analysis to post-mortem human AD population gene expression datasets for three major AD brain regions: Hippocampal CA1, Lateral Temporal Lobe (LTL), and Dorsolateral Prefrontal Cortex (DLPFC). Also, we processed the datasets as follows. Finally, there are 12,183 shared genes across these 3 brain regions (Venn Diagram in Fig. S1 ). Epigenomic data has identified a variety of regulatory elements such as enhancers and promoters. Chromatin interaction data (e.g., Hi-C) have further revealed interacting enhancers and gene promoters. Thus, we integrated recent published epigenomic and chromatin interaction data for three brain regions to link enhancers to genes (via promoters). For Hippocampal Ca1, we obtained its enhancers and promoters from Brain Open Chromatin Atlas 27 and promoter-based interactions from GSE86189 28 . To identify promoters in LTL and DLPFC, we used R package, TxDb.Hsapiens.UCSC.hg19.knownGene 29 , to retrieve promoter start and stop positions of genes, using a short ultra-conserved promoter length of 5,000 base pairs upstream of the protein-coding start site 30 . Besides, we used the H3K27ac data from GSE130746 25 to find LTL enhancers. This dataset contains information on the target gene, distance that the H3K27ac mark is from the target gene's Transcription Start Site (TSS), and enhancer start and end positions. The enhancers in LTL that we used were at least 1,000 bases away from the TSS. Moreover, for DLPFC, we used the enhancers and interacting enhancerpromoter pairs in DLPFC from PsychENCODE 31 . We applied WGCNA 32 to population gene expression data to construct and cluster gene coexpression networks into gene co-expression modules (minimum module size = 30 genes). Also, we further used a K-Means clustering step based on code 33 and methodology that has been previously applied to conventional WGCNA and proven to improve module assignments and functional enrichments, in applications such as for brain-specific cell-type marker enrichments 34 . This additional K-means step utilizes the modular eigengenes (MEs) from WGCNA modules as initial centroids, initial gene assignments, and computable distance between gene and MEs for K-Means to re-assign genes to optimal modules across the iterations. In total, we obtained 30 gene co-expression modules for Hippocampus (13,073 genes), 56 modules for the LTL (25,292 genes), 35 modules for DLPFC (26,014 genes). Co-expressed genes in the same module are highly likely involved in similar functions and pathways. The enrichment analysis has thus been widely used to identify such functions and pathways in a gene module. Enrichment p-values were adjusted using the Benjamini-Hochberg (B-H) correction. Given a group of genes (e.g., from a module) for each brain region, we used multiple tools and hundreds of data sources for enrichment analyses (Table S1) . We used the highest enrichment -log10(adjusted p-value) scores from any source for each gene module and respective enrichment in a brain region. Then, for each enrichment for a phenotype in a region, we averaged the non-zero enrichment values for the gene modules that are statistically significantly positively correlated for that phenotype. We further associated genes and modules with these key AD developmental phenotypes: AD Stages and Progression (Moderate Stage, Severe Stage, and AD Progression), Healthy/Resilient (Control Stage or other resilient individuals with better cognitive abilities despite AD pathology), APOE genotype (APOE E4/E4 is a huge AD risk factor 35 ), Braak Staging, neuritic plaque accumulation (measured by CERAD Score), and cognitive impairment level (based on the MMSE Score). We associated gene co-expression modules with all possible AD phenotypes from the input data, by computing the correlations of each modular eigengene (ME) to each phenotype. The eigengenes of modules by WGCNA are the first principal components of modular gene expression. A ME is a vector representing expression levels of input samples and is the likeliest gene expression pattern of module genes. Second, based on the MEs, we used moduleTraitCor() and moduleTraitPvalue() in WGCNA to correlate our MEs with our phenotypes and find the most significantly associated phenotypes to the modules. Statistically significant module-phenotype associations for analysis have a p-value less than 0.05 and a positive correlation. Also, we used gene co-expression networks to examine the relationship between genes and AD phenotypes and identify potential driver (hub) genes for the modules (based on degree of connectivity for each gene in each module). Gene Regulatory Networks (GRNs), a key molecular mechanism, fundamentally control gene expression. Also, co-expressed genes are likely co-regulated by similar GRNs. Thus, our analysis integrates multiple methods to predict GRNs from gene expression data and coexpression modules. This study predicted GRNs in 3 brain regions that link transcription factors (TFs), regulatory elements, and target genes/modules. First, we identified regulatory elements (e.g. enhancers and promoters) that potentially interact using recent chromatin interaction data (Hi-C) and the scGRNom pipeline 36 . Second, we inferred the TF binding sites (TFBSs) based on consensus binding site sequences on interacting enhancers and promoters by TFBSTools 37 and motifmatchr 38 . This step generates a reference network linking TFs to regulatory elements (by TFBSs) to genes (by interactions). Third, using gene expression data for a given brain region, we predicted all possible TF-target gene (TG) pairs (or TF-modules) that have strong expression relationships by 3 widely used tools and databases: RTN 39 , TReNA Ensemble Solver 40 , Genie3 41 , (and TF-gene-module pairs by RTN), as below. Finally, this step maps these TF-TG pairs to the reference network. It outputs a full GRN for the region that links TFs, noncoding regulatory elements to target genes and modules. We combined a recent list of TFs 42 with JASPAR's list 43 to generate a final list of candidate TFs for inferring TF-TG pairs with strong expression relationships. We used this final TF list to find candidate TFs for each brain region (based on genes in the respective gene expression data). Also, TReNA Ensemble Solver with the default parameters (geneCutoff of 0.1 and ensemble of these solvers: LassoSolver, RidgeSolver, RandomForestSolver, LassoPVSolver, PearsonSolver, and SpearmanSolver) was used to construct the transcriptional regulatory network that links TFs to target genes (TGs). Besides, we used GENIE3 to predict additional GRNs via Random Forest regression, predicting each gene's expression pattern from the expression patterns of all TFs (TF-TG pairs with weights greater than 0.0025 were retained). In addition, we used RTN to predict TFs to TGs by calculating the Mutual Information between the TFs and all genes; permutation analysis with 1,000 permutations, bootstrapping, and the ARACNe algorithm 44 were used to select most meaningful network edges. Finally, the TF-TG pairs found in at least 2 of the above 3 sources were combined to map to the reference network. For the DLPFC, we instead used the published PsychENCODE GRN (Elastic Net regression weight of 0.1 as a cutoff) filtered for genes found in the DLPFC gene expression data 9 . In addition to predicting TFs for individual genes, we inferred TFs significantly co-regulating genes in a module in the Hippocampus and LTL. In particular, we performed Master Regulatory Analysis (MRA) on the RTN-inferred network by the RTN package 39 . For each gene module, MRA performed enrichment analysis using the inferred GRN, the phenotype (Module Membership correlation of all genes to that module), and hits (genes assigned to that module). It applied the hypergeometric test for overlaps between TFs and the genes (using gene expression data) and found the statistically significant TFs for each module. Linking GWAS SNPs for AD to gene regulatory elements GWAS studies have identified a wide variety of genetic risk variants associated with diseases. However, most disease risk variants lie on non-coding regions, hindering finding disease genes and understanding downstream disease functions. To this end, we used GRNs as described above to link AD SNPs to regulatory elements in the networks via interrupted TFBSs. We looked at 97,058 GWAS SNPs significantly associated with AD (i.e., AD risk SNPs with p<0.005) 5 . We overlapped those AD risk SNPs with the regulatory elements such as enhancers and promoters in the GRNs from Step 5. Then, we identified the variants that interrupt the TFBSs on the regulatory elements by motifbreakR 45 (using ENCODE-motif, FactorBook, HOCOMOCO, and HOMER data sources, default methodology, threshold of 0.001), and further linked them to the genes from the regulatory elements with interrupted TFBSs. An extension of 10 kilobase pairs was added to start and end positions of enhancers and an extension of 2 kilobase pairs was added to start and end positions of promoters. We found that 83,842 SNPs out of 97,058 AD GWAS SNPs interrupt the binding sites of 787 TFs. To investigate potential mechanistic interplays between AD and Covid-19, we compared KEGG pathways 46 for AD (hsa05010) and Covid-19 (hsa05171). Shared AD-Covid mechanisms include: Nuclear Factor Kappa B (NFkB), Inhibitor of Nuclear Factor Kappa B Kinase (IKK), c-Jun N-terminal Kinase (JNK), Interleukin-6 (IL-6), Phosphoinositide 3-Kinase (PI3K), Tumor Necrosis Factor alpha (TNFa), TNF Receptor. Further, we found 22 genes involved in those AD-Covid common mechanisms and they correlate highly with AD phenotypes in different brain regions. Pathview 47 was used to visualize the correlations of those genes and AD phenotypes. Also, for each brain region, we found the subnetworks of its GRN in AD that have TFs regulating those AD-Covid common genes as the region's AD-Covid regulatory network. To gauge the clinical predictive performance of our AD-Covid genes and networks in terms of predicting Covid-19 severities (and thereby immune system dysregulation), we looked at recent population RNA-seq gene expression data in blood of Covid-19 samples (GSE157103) 48 to check whether any genes from our AD-Covid regulatory networks can predict Covid-19 severities (e.g. being in the Intensive Care Unit (ICU) or not). To this end, we median normalized this gene expression data (19, 472 genes) and then applied differentially expression analysis by DESeq2 49 between 50 ICU and 50 non-ICU Covid patients. Volcano plots highlighted differentially expressed AD-Covid genes (adjust p<0.05). A smoothing factor of 0.01 was added to the numerator and denominator when computing the empirical log2(fold change). Apart from differentially expression analysis, which aims to find individual associated genes, we performed machine learning analysis for genes from our AD-Covid regulatory networks to see if they together can predict severe Covid-19 or not. In addition to our AD-Covid genes (from each region and combined), we also compared the machine learning prediction performance from other published Covid-19 genes. Since we predict Covid-19 severity, we compared predictive models from our lists with the respective performance of the benchmark list. A recent study 50 Python's Scikit-Learn 54 package, was used for our machine learning analysis. We used a Support Vector Machine (SVM) classification model (with linear kernel and balanced class weights, outputting predicted probabilities based on Python's sklearn.svm.SVC package). The classification accuracy of each gene was calculated by stratified 4-Fold Cross Validation (CV); each fold had 25 unique human samples in the testing set (between 12 and 13 samples from each class) and the remaining 75 samples in the training set (between 37 and 38 samples from each class). For the 18 published Covid-19 susceptibility genes, we performed Recursive Feature Elimination (RFE) CV based on a SVM classification model; this calculated the classification accuracy of each gene and the optimal number of published genes to use (smallest number of genes with the maximum stratified 4-fold CV accuracy). We fixed all models to incorporate only this optimal number of genes. Then, to build a model for each of our input gene lists, we performed RFE based on our linear SVM to select this fixed optimal number of predictive genes from the list for classifying ICU vs. non-ICU Covid-19 patients with high accuracy (i.e., feature selection). We then trained an SVM classification model again with those select predictive genes and reported the average accuracy and AUROC values of the model using 4-Fold stratified CV. That is, we average the values for accuracy and AUROC across all 4 folds. Each of the 4 stratified CV folds trained a linear SVM kernel model, predicting the probability that a given Covid-19 positive sample would be in the ICU. Within each fold, we predicted these probabilities for our testing set of 25 samples, to get the predicted probabilities for our samples, which is input for our Decision Curve Analysis (DCA). We used DCA 55 to evaluate and compare the machine learning models of those brain-region AD-Covid genes and benchmark genes for predicting Covid severity. DCA is widely used for making medical decisions, improving upon traditional comparison metrics (e.g., AUROC) for predictive models and other approaches requiring additional information to address individuals' clinical consequences for individuals 55 . Given a model and a threshold probability pT, patients will be sent to ICU if their percentage risks for Covid severity (i.e., ICU) from the model are greater than or equal to pT. Based on this, the true positive (TP) count is the number of Covid-19 severe individuals sent to the ICU, and the False Positive (FP) is the number of Covid-19 non-ICU individuals sent to the ICU. Thus, pT inherently represents subjective clinician preferences for FPs versus False Negatives (FNs: number of severe Covid-19 patients who were wrongly not sent to the ICU). Based on TP, FP and pT, the DCA then calculates Net Benefit = TP/N -((FP/N)*pT/(1-pT)), where N is the total number of patients (N=100 here). Thus, the Net Benefit represents the benefit of true positive ratio (TP/N) from false positive ratio (FP/N) weighted by odds of pT (i.e., pT/(1-pT)). DCA provides a simple, personalized risktolerance based approach of using pT to weight the FN and FP mistakes: lower thresholds represent a fear of FNs over FPs, and vice-versa. For instance, for a clinician who sends a Covid-19 positive individual with predicted severity of at least pT = 20% to the ICU, the utility of treating a Covid-19 severe individual is 4 times greater than the harm of needlessly sending a non-severe Covid-19 patient to the ICU. We compared our predictive models with 2 extremes: Treat All (predict 1 for all Covid-19 positive patients and send all to ICU regardless of severity) and Treat None (predict 0 for all positive patients and send none to ICU). Practically, a clinician ought to opt for the predictive model or extreme intervention strategy with the highest Net Benefit based on that clinician's preferred pT; thus, two clinicians (who may have their own, different pT values) may obtain different optimal results. Thus, DCA can evaluate the clinical usability of a Covid-19 severity prediction model based on its Net Benefit across clinically reasonable pT values. Finally, we performed DCA using code from Memorial Sloan Kettering Cancer Center 56 . Besides Covid severity, we also calculated gene expression correlations with Covid-19 and non-Covid for the genes from the AD KEGG pathway for three brain regions. Gene co-expression network analysis reveals gene expression dynamics for AD phenotypes across multiple brain regions First, we applied our analysis to population gene expression datasets of three major brain regions relating to AD: Hippocampal CA1, Lateral Temporal Lobe (LTL), and Dorsolateral Prefrontal Cortex (DLPFC) (Materials and Methods). We identified several gene co-expression modules showing specific gene expression dynamic changes for various AD phenotypes as below. These expression dynamics also imply potential underlying gene regulatory mechanisms in the phenotypes. In particular, given a brain region, we constructed and clustered a gene coexpression network to a set of gene co-expression modules. In a gene co-expression network for a region, nodes are genes. Each edge represents that the 2 respective genes have correlated gene expression profiles during AD progression (i.e., co-expression). There are likely groups of co-expressed genes within the network form densely connected sub-networks (gene co-expression modules). Genes within a module share similar gene expression dynamics in the region for the observed AD phenotypes. We used modular eigengenes (MEs) to represent such expression dynamics for a gene module, using the first principal components of modular gene expression matrices. The information for all modules with associated phenotypes is available in Supp. files 1, 2 and 3 for Hippocampus, LTL, and DLPFC, respectively. Hippocampal CA1. We identified 30 gene co-expression modules in the Hippocampus. 21 modules were significantly positively associated with at least one phenotype: AD progression, Braak Stage progression, aging, accumulation of neurofibrillary tangles (NFTs), MMSE score, cognitive impairment, AD, and resilience. Also, their MEs show specific expression dynamics ( Fig. 2A for 7 select modules, Fig. S2 for all). For instance, pink and lightyellow modules have high expression values for Controls (no AD) and are clustered together. On the other hand, greenyellow, yellow, magenta, and tan modules have high expression levels for AD individuals and cluster together. In between those groups of modules is the midnightblue module, which has relatively high expression for some Control and some AD individuals. Next, using these expression dynamic patterns, we further linked these gene modules to key AD phenotypes (Fig. 2B ) using their significant correlations ( Fig. S3 and Supp. file 1 for all modules). For instance, the greenyellow module has significantly correlations with AD, AD progression, moderate and severe stages, Braak 6 stage and cognitive impairment. The tan module has the highest correlation (r = 0.68) with severe stage, along with other AD phenotypes. The midnightblue module is more significant (r = 0.41) for the Braak 4 stage, where affected individuals typically exhibit mild symptoms of dementia. The lightyellow module is significant for cognitive resilience (r = 0.5), which is the ability of individuals to exhibit stronger cognitive functioning despite AD pathology. Moreover, the lightyellow module strongly correlates with better MMSE performance (r = 0.58) than the pink module (r = 0.44). Instead, the pink module is more significantly correlated with the Braak 3 stage (r = 0.45). Lateral Temporal Lobe. We identified 56 gene co-expression modules and found that 28 modules are significantly positively correlated with AD phenotypes of interest. We highlighted 5 MEs in Fig 2C, showing specific expression dynamics for AD phenotypes in LTL (Fig. S4 for all modules). 12 gene modules are positively associated with AD Progression, 1 with Aging (mediumpurple3), 1 with Gender (lightgreen), 12 with Controls, and 2 with Initial Stage (associated with Braak 1 and 2 stages). As shown in Fig. 2C , the sienna3 module has higher expression values for both old and young individuals in Controls. Orange, magenta, and yellow modules are clustered together and have higher expression values for AD samples. The midnightblue module is clustered between both groups and tends to be associated with higher Braak stages. As shown in Fig. 2D , the sienna3 module also has a statistically significant positive correlation with Control Stage (r = 0.63) and asymptomatic based on the Braak stage (r = 0.55). The midnightblue module is associated with aging, average Braak stage, clinical Braak stage, and AD. The yellow, orange, and magenta modules are associated with aging, AD and Braak progression phenotypes, and neuritic plaque accumulation (based on CERAD score); the orange module has a very strong correlation with dementia Braak stages (r = 0.72) and AD (r = 0.72). More module-phenotype associations are available in Fig. S5 and Supp. File 2. Dorsolateral Prefrontal Cortex. We found 35 gene co-expression modules for DLPFC. Figs. 2E and 2F highlighted 6 of those gene modules: darkolivegreen, yellow, red, royalblue, tan, and green (Figs. S6-7 for all modules). Those modules are significantly associated with various AD phenotypes such as progression, MMSE, APOE genotype, neuritic plaques. Larger sample size for the DLPFC (over 20 times larger than that for Hippocampal or LTL), likely attributes to relatively lower correlation coefficients between modules and AD phenotypes in DLPFC. However, we still see significantly correlated modules with various AD phenotypes (Fig. 2F , p< 0.05). For example, the tan gene module is associated with the worst APOE genotype (E4/E4) (r = 0.082), AD diagnosis age (r = 0.12). The royalblue and green modules are statistically significantly positively correlated with severe stage based on last MMSE score, with correlations of r = 0.22 and r = 0.2, respectively. In terms of better outcomes, the darkolivegreen module is significant for Controls (r = 0.11), better performance on last MMSE (r = 0.17), and cognitive resilience (r = 0.12). Furthermore, the red module is significant for better performance on last MMSE (r = 0.15) and has a similar correlation as darkolivegreen module for cognitive resilience. Therefore, those AD-phenotype-associated gene co-expression modules uncover the specific gene expression dynamic patterns across phenotypes and suggest that those co-expressed genes in the same modules are likely involved in similar functions and pathways. To understand this, we performed enrichment analysis of the modules as follows. Eigengenes and enrichments of co-expression modules reveal hub genes, gene functions, and pathways in AD phenotypes Gene module enrichment analysis allows us to better understand biological functions, structures, diseases, and other observed biological phenomena associated with AD phenotypes. Through enrichment analysis (Methods), we found enriched functions and pathways of AD modules and linked them to various AD phenotypes associated with the modules (Fig. 3 and Supp. document). Overall, healthier phenotypes are Control Stage (no AD), cognitive resilience, and protective APOE E2/E2 genotype. Hippocampal CA1 (Fig. 3A , Supp. file 1): The Hippocampus CA1 (crucial for autobiographical memory, mental time travel, self-awareness) has a significant loss in neurogenesis, memory ability, volume, and neuronal density in AD 57 . Modules for non-AD phenotypes are enriched with synaptic plasticity and dendrite development, norepinephrine neurotransmitter release cycle, and calcium signaling pathway, which can all be dysregulated in AD. This also suggests that resilient individuals may be protected from microsatellite instability and amyloid accumulation that may even occur naturally. Further, our analysis supports recent hypotheses on dysregulated immune systems in AD. Aging, NFTs, and AD developmental phenotypes are associated with abnormal innate immunity and IL-6 secretion. Our cognitive impairment, Braak progression, and AD modules are enriched for: viral genes, Covid-19 spike glycoprotein, activated TAK1 mediating p38 MAPK activation (linked to tau phosphorylation, neurotoxicity, neuroinflammation, synaptic dysfunction, worse AD 58 ), NFKB pathway (impaired and overexpressed during AD, leading to neuroinflammation, microgliosis, suppression of Wnt Signaling 10 ). We found an association between Severe AD and immunologic memory, antigenantibody interactions, and regulation of Interferon-Alpha Signaling. Interferon (cytokines mediating host response to viral infection with high expression in Hippocampal tissues of AD mice 59 ) response to immunogenic amyloid may activate microglia, initiate neuroinflammation, and lead to synaptic loss 60 . Many AD phenotypes are associated with Death Receptor Signaling, positive regulation of gliogenesis, Constitutive Signaling by aberrant PI3K in Cancer, positive regulation of JNK cascade (activated in AD brains, involved in tau phosphorylation and neuronal death) 70 . Xenobiotic Metabolic Processes are specific to Severe AD modules only, and studies 61 found links between dementia progression and various metabolic pathways. LTL (Fig. 3B, Supp. file 2) : The LTL contains the cerebral cortex (responsible for hearing, understanding language, visual processing, facial recognition) and is impacted early in AD 25 . Control modules are enriched with several pathways typically present in healthy conditions like Wnt signaling, Actin organization, and transmission across chemical synapses. Dysregulation of those pathways can lead to neurodegeneration, e.g., Wnt signaling to inhibit amyloid-beta production and tau protein hyperphosphorylation in AD progression 62 . Second, in LTL modules for AD and neuritic plaques, loss of Nlp from Mitotic centrosomes is enriched (may lead to reduced microtubule stability, abnormal cellular morphology, and functions in AD 63 ). Also, we found cell-type associated enrichments in the AD progression-related phenotypes, e.g., astrocyte projection for clinical Braak stage and asymptomatic. Astrocytes are increasingly activated near amyloid plaques in LOAD, producing pro-inflammatory cytokines and reactive oxygen species 64 . Other AD-related functions and pathways include postsynaptic differentiation, stress-activated signaling, TRAF-mediated NF-kB activation, prion pathway, epigenetic modifications. The prion pathway feedback loop is likely disrupted during AD leading to Aβ accumulation 65 . Epigenetic modifications (e.g., Histone A4 acetylation) can be dysregulated to affect gene expression of long-term potentiation and memory formation in AD; a recent study observed dramatic H4K16ac losses near genes linked to aging and AD in LTL 25 . AD-related phenotype modules are more enriched for Frontotemporal Dementia than controls are. DLPFC (Fig. 3C, Supp. file 3) : The DLPFC is involved in executive functioning, supports cognitive responses to sensory information, works with the Hippocampus to mediate complex cognitive functions 66 , has plasticity deficits in AD 67 . Microglia exclusively express many AD risk genes like APOE 68 . Our APOE E2 modules are shielded from neurotoxins and associated with mitochondrial inheritance (p<1e-16), while E4 modules are strongly enriched for cellular response to Aβ and cognitive dysfunction. Having both high-risk E4 alleles (instead of 1) is not associated with mitochondrial depolarization but more strongly associated with Serum Amyloid A Protein (p<1e-28 vs. p<1e-16), neuroimmunomodulation, monocyte chemoattractant proteins. We found several strong promising associations (some with p, P. S. , M. R.