Genome-wide prediction and integrative functional characterization of Alzheimer’s disease-associated genes


 1 

Genome-wide prediction and integrative 1 

functional characterization of Alzheimer’s 2 

disease-associated genes 3 

Cui-Xiang Lin1, #, Hong-Dong Li1, #, Chao Deng1, Weisheng Liu1, Shannon Erhardt2, 4 

Fang-Xiang Wu3, Xing-Ming Zhao4,5, Jun Wang2, Daifeng Wang6,7, Bin Hu8,*, Jianxin 5 

Wang1,* 6 

  7 

1 Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and 8 

Engineering, Central South University, Changsha, Hunan 410083, P.R. China 9 
2 Department of Pediatrics, McGovern Medical School, The University of Texas Health 10 

Science Center at Houston, Houston, TX 77030, USA 11 
3 Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, 12 

SKS7N5A9, Canada. 13 
4 Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, 14 

Shanghai 200433, China 15 
5 Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, 16 

Ministry of Education, China 17 
6 Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 18 

Madison, WI 53705, USA 19 
7 Waisman Center, University of Wisconsin - Madison, Madison, WI, 53705 USA 20 
8 Institute of Engineering Medicine, Beijing Institute of Technology, Beijing, 100081, 21 

China. 22 

# Authors contributing equally 23 

*Correspondence: bh@bit.edu.cn; jxwang@mail.csu.edu.cn 24 

 25 

 26 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 2 

Abstract 27 

The mechanism of Alzheimer’s disease (AD) remains elusive, partly due to the incomplete 28 

identification of risk genes. We developed an approach to predict AD-associated genes 29 

by learning the functional pattern of curated AD-associated genes from brain gene 30 

networks. We created a pipeline to evaluate disease-gene association by interrogating 31 

heterogeneous biological networks at different molecular levels. Our analysis showed that 32 

top-ranked genes were functionally related to AD. We identified gene modules associated 33 

with AD pathways, and found that top-ranked genes were correlated with both 34 

neuropathological and clinical phenotypes of AD on independent datasets. We also 35 

identified potential causal variants for genes such as FYN and PRKAR1A by integrating 36 

brain eQTL and ATAC-seq data. Lastly, we created the ALZLINK web interface, enabling 37 

users to exploit the functional relevance of predicted genes to AD. The predictions and 38 

pipeline could become a valuable resource to advance the identification of therapeutic 39 

targets for AD. 40 

Keywords: Alzheimer’s disease; disease gene prediction; functional gene networks 41 

Introduction 42 

Alzheimer’s disease (AD) is a complex and progressive neurodegenerative disorder that 43 

accounts for the majority of all dementia cases1. Its clinical symptoms include progressive 44 

memory loss, personality change, and impairments in thinking, judgment, language, 45 

problem-solving, and movement2. The two neuropathological hallmarks of AD are 46 

extracellular amyloid-β (Aβ) plaques and intracellular neurofibrillary tangles (NFTs), which 47 

are known to contribute to the degradation and death of neurons in the brain3. The number 48 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 3 

of patients with AD worldwide is currently rising. Specifically, it is estimated that 49 

approximately 50 million people are currently living with AD or other forms of dementia, 50 

and this number is expected to increase to over 152 million by 20501. AD not only causes 51 

suffering in both patients and their families but also places a severe burden on society. 52 

However, the drug development for AD is slowly progressing4, partly due to the 53 

incomplete understanding of the neuropathological mechanisms. 54 

AD is partly caused by genetic mutations4. Its two subtypes, i.e., early-onset AD (EOAD, 55 

onset age before 65 years) and late-onset AD (LOAD, onset age later than 65 years), 56 

have different genetic risk factors. In EOAD, rare mutations in APP, PSEN1 and PSEN2 57 

have been identified4. LOAD is markedly more complex, with APOE being a well-known 58 

risk gene for this subtype. Most known or putative AD-associated genes were discovered 59 

through genome-wide association studies (GWAS). Previously, GWAS identified CLU, 60 

CR1, and PICALM, along with approximately 20 more genes4. In addition, network 61 

approaches are used to identify AD-associated molecular networks or pathways. For 62 

example, a module-trait network approach was proposed and applied to identify gene 63 

coexpression modules that were associated with cognitive decline5, while a large-scale 64 

proteomic analysis identified an energy metabolism-linked protein module, strongly 65 

associated with AD pathology6. However, a large proportion of the phenotypic variances 66 

in AD cannot be explained by known risk genes7, 8, 9, which suggests additional AD-67 

associated genes that remain to be discovered. Since experimental approaches are often 68 

time consuming and expensive, computational approaches provide a promising 69 

alternative to discovering AD-associated genes. 70 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 4 

     Previous studies have shown that functional gene networks (FGNs) are promising for 71 

predicting disease-associated genes10, 11. In a FGN, a node represents a gene and the 72 

edge between two genes represents the co-functional probability (CFP) that the two 73 

genes take participate in the same biological process or pathway12. For example, Guan 74 

et al. constructed a global (i.e., non-tissue specific) FGN for mice, and identified Timp2 75 

and Abcg8 as two novel genes associated with bone-mineral density13, 14. Using the same 76 

network, Recla et al. discovered Hydin as a new thermal pain gene13, 14. Because gene 77 

interactions might be rewired in different tissues, global networks cannot reveal the 78 

differences of gene networks among tissues. To address this limitation, tissue-specific 79 

networks have been proposed to more accurately capture gene interactions in tissues. 80 

Greene et al. established 144 human tissue-specific networks and investigated these 81 

networks for the interpretation of gene functions and diseases15. Using the brain-specific 82 

network15, Krishnan et al. predicted disease genes for autism spectrum disorder11. By 83 

leveraging the functional genomic data of model species with similar genetic backgrounds, 84 

including mice and rats, a human brain-specific network was constructed, and its 85 

application to the identification of brain disorder-associated genes was illustrated in our 86 

previous work16. 87 

Because AD is a brain disorder with genetic contributions, we hypothesized that brain-88 

specific FGNs are informative for predicting AD-associated genes. It should be pointed 89 

out that our predictions of AD-associated genes do not indicate any causality, that is, the 90 

predicted genes may be either directly or indirectly associated with AD. To build models 91 

for AD-associated gene prediction, we first compiled AD-associated genes from multiple 92 

resources. These genes were used as positives for training models. We proposed a 93 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 5 

functional enrichment-based approach to identify negative genes that are not likely 94 

associated with AD. Next, we obtained ten brain-specific FGNs from the GIANT15 and 95 

BaiHui16 databases. After assessing the predictivity of each network by cross-validation 96 

of state-of-the-art machine learning models, we built a final model for predicting AD-97 

associated genes through an optimal selection of networks and machine learning 98 

methods. We scored all the other human genes that were not used in model training for 99 

their association with AD. We created a pipeline to evaluate top-ranked novel candidate 100 

genes by interrogating multiple biological networks. We then identified gene modules from 101 

an AD-related network. We assessed the association of these modules and top-ranked 102 

genes with AD-related phenotypes, including Consortium to Establish a Registry for 103 

Alzheimer’s Disease (CERAD) score, Braak stage, and clinical dementia rating (CDR) on 104 

an independent dataset. We next identified a set of  genes by combining our predictions 105 

and seven types of genomic evidence. We further identified potential variants that may 106 

affect the expression of prioritized genes. Lastly, we developed the ALZLINK web 107 

interface to enable the expoitation of predicted AD-associated genes. The resulting 108 

predictions and pipeline could be valuable to advance the identification of risk genes for 109 

AD. 110 

 111 

Results 112 

Prediction of AD-associated genes 113 

Our approach leverages machine learning and a brain FGN to predict AD-associated 114 

genes. The approach consists of three main components: compilation of AD-associated 115 

(positive) and non-AD (negative) genes, construction of a feature matrix based on a brain 116 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 6 

FGN, and prediction of AD-associated genes using machine learning models (Fig. 1). We 117 

first compiled a set of AD-associated genes and non-AD genes to train models (see the 118 

Methods section; Supplementary Note 1). We showed that the negative genes were 119 

superior to those selected by the random sampling approach (Supplementary Fig. 1) and 120 

that the negative genes were poorly associated with AD (Supplementary Fig. 2). In 121 

addition, we tested their enrichment in three AD-related gene sets associated with 122 

cognitive decline (the m109 module with 390 genes)5, amyloid-beta (15 genes), and Tau 123 

pathology (28 genes)17 respectively, from two recent studies5, 17. The results showed that 124 

the negative genes were not enriched in any of the three modules or pathways (p-values 125 

= 0.99, 1, 1 respectively). Next, we extracted a feature matrix for the positive and negative 126 

genes based on FGNs. For each gene (positives, negatives, or the other genes), its CFPs 127 

with the positive genes in the network were collected into a 147-dimensional feature 128 

vector. We considered the 10 collected brain FGNs (nine from GIANT and one from 129 

BaiHui) and evaluated their ability to predict AD-associated genes using state-of-the-art 130 

machine learning methods, including LR, SVM, RF, and ExtraTrees, which were shown 131 

to be promising in a previous study18. We found that the network in the BaiHui database 132 

achieved the best performance based on the four methods tested and that ExtraTrees 133 

performed better than the other methods in terms of both the area under the receiver 134 

operating characteristic curve (AUROC) and the area under the precision-recall curve 135 

(AUPRC) (Fig. 2A; Supplementary Fig. 3-5). Finally, we selected this network in 136 

combination with ExtraTrees to construct the model for predicting AD-associated genes. 137 

We performed five-fold cross-validation with ExtraTrees. Each of the five models 138 

established during cross-validation was used to score all other human genes that were 139 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 7 

not included in the training dataset. To achieve robust predictions, we repeated the cross-140 

validation 100 times and calculated an average score for each gene. The average 141 

AUROC and AUPRC based on cross validation are 0.91 and 0.76, respectively, 142 

suggesting the model is accurate. A higher score indicates that a gene is more likely to 143 

be associated with AD. The scores for predicted genes are provided in our developed 144 

web interface (www.alzlink.com). Our literature search showed that 12 of the top-ranked 145 

20 genes were likely associated with AD with some evidence (Supplementary Table 1), 146 

suggesting that our model has captured molecular signature of AD and makes confident 147 

predictions. Note that our prediction for AD-associated genes was based on only the 148 

machine learning model; the subsequent analysis such as enrichment, coexpression, and 149 

PPI relatedness was used separately to evaluate the association of predicted genes with 150 

AD. 151 

The top-ranked genes are functionally related to AD based on multiple lines 152 

of genomic evidence 153 

The top-ranked genes are enriched in AD-associated functions and phenotypes 154 

We hypothesize that genes with higher scores are more likely to be enriched in AD 155 

phenotype-related gene sets. To test this hypothesis, we excluded all genes in the training 156 

dataset, ranked the remaining ones based on their scores, and tested their enrichment in 157 

AD-related gene sets. We collected four gene sets associated with AD pathology. The 158 

first gene set was collected from AlzGene, which contained 277 genes. The other three 159 

gene sets, namely, the learning or memory pathway (214 genes), the cognition pathway 160 

(247 genes), and the amyloid-beta related pathway (51 genes), were collected from the 161 

Gene Ontology (GO) database. Using the decile enrichment test (see the Methods 162 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 8 

section), we observed that the top-ranked genes were significantly enriched in the four 163 

gene sets: AlzGene (p-value = 7.3×10-13), learning or memory pathway (p-value=6.6×10-164 

12), cognition pathway (p-value = 1.4×10-11), and amyloid-beta pathway (p-value=1.1×10-165 

9) (Fig. 2B).  166 

    We next tested whether the top-ranked genes were functionally similar to AD-167 

associated genes. From the ranked genes, we selected the same number of top-ranked 168 

genes as the curated positive genes (n=147). We then performed GO enrichment analysis 169 

of both the curated positive genes and the top-ranked genes using PANTHER19. The 170 

known positive genes and our predicted AD-associated genes were enriched in 771 and 171 

2573 terms, respectively, with 518 of these terms being shared, which was significant 172 

compared with the baseline in that no more than 1 pathway was shared (p-value<0.01). 173 

The 10 most significant shared terms are listed in Supplementary Table 2. We found that 174 

many known AD-related functions, including learning or memory, cognition, regulation of 175 

endocytosis, regulation of immune system process, regulation of cell death, and 176 

regulation of amyloid-beta formation, were shared pathways, implying that our predicted 177 

genes might be involved in AD pathology. Specifically, we tested whether the top-scored 178 

genes (score > 0.7) were involved in neuron development. Based on GO enrichment 179 

analysis, we found that they were enriched in both neuron development (GO:0048666) 180 

(FDR = 3.86×10-76) and central nervous system neuron development (GO:0021954) 181 

(FDR = 5.63×10-14). 182 

    We further tested whether the top-ranked genes overlap with gene modules that were 183 

associated with AD in published studies. A recent study identified gene coexpression 184 

modules that were related to AD5. Module 109 (m109) containing 390 genes was most 185 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 9 

strongly associated with cognitive decline. 350 genes overlapped with the brain FGN used 186 

in our work and therefore had predicted scores. We found that 101 genes in m109 were 187 

among the top-scored genes (score > 0.7), which was significant compared to the random 188 

baseline (p < 0.0001). We also obtained two gene sets from another recently published 189 

network association study on AD17. For protein phosphorylation events in AD, the study 190 

derived 28 kinases which were possibly implicated in AD, with 22 kinases having 191 

scores >0.7. Among the 14 genes in the amyloid-beta correlated cascade reported by the 192 

authors (after removing CLU because it is in the training set), nine had scores > 0.7. 193 

These results provide additional evidence that our predicted genes are associated with 194 

AD. 195 

 196 

The top-ranked genes show higher sequence similarity with AD-associated genes 197 

We evaluated whether the sequences of the top-ranked genes were similar to those of 198 

AD-associated genes using the sequence similarity method (see the Methods section). 199 

Let k∊[100, 200, 500] denote the number of top-ranked genes for testing.  We found that 200 

the top-ranked genes had significantly higher sequence similarity with AD-associated 201 

genes than randomly selected genes (p-value < 0.0001, Supplementary Fig. 6). Taking 202 

the top-ranked 200 genes as an example (Fig. 2C), the standardized SEQSIM-score was 203 

6.09, which was significantly higher than that of the randomly selected genes (SEQSIM-204 

score=-0.0006). The sequence similarity implies the functional similarity between 205 

predicted and known AD-associated genes. 206 

 207 

The top-ranked genes are coexpressed with AD-associated genes 208 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 10 

For the top-ranked k∊[100, 200, 500] genes, we showed that they were coexpressed with 209 

more AD-associated genes than random baseline on the independent Mayo RNA-seq 210 

dataset20 (p-value<0.0001) (Supplementary Fig. 7; see Methods). For example, the 211 

number of coexpressed gene pairs between the top-ranked 200 genes and the AD-212 

associated genes was significantly higher than that of randomly selected genes (p-value 213 

< 0.0001, Fig. 2C), suggesting an association of our top predicted genes with AD. 214 

 215 

The top-ranked genes interact strongly with AD-associated genes in PPI networks 216 

We hypothesized that the top-ranked k genes were more likely to interact with AD-217 

associated genes if the prediction is accurate. We obtained PPI networks from two 218 

databases: HuRI and STRING (see Methods). To avoid circularity, we removed those 219 

interactions which were used to construct the brain FGN from the two databases. We 220 

found that the top-ranked k∊[100, 200, 500] genes showed significantly more interactions 221 

with AD-associated genes (p-value < 0.0001, Supplementary Fig. 8). Taking the top-222 

ranked 200 genes as an example, the total number of interactions with AD-associated 223 

genes was 48 in HuRI, whereas only 11 interactions were found for the randomly selected 224 

genes (p-value <0.0001, Fig. 2C).  225 

 226 

The top-ranked genes are associated with AD based on miRNA-target networks 227 

miRNAs are important post-transcriptional regulators and have been implicated in AD21. 228 

We investigated whether top-ranked genes were functionally related to AD-associated 229 

genes or miRNAs. First, we observed that they shared more miRNAs with AD-associated 230 

genes than randomly selected genes (Supplementary Fig. 9; Methods). For instance, the 231 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 11 

top-ranked 200 genes shared a significant number of miRNAs with AD-associated genes 232 

(Fig. 2C, p-value<0.0001). Second, we found that the top-ranked genes interacted with a 233 

significant number of AD-associated miRNAs (Fig. 2C; Supplementary Fig. 9). These 234 

results imply that top-ranked genes are likely to be involved in post-transcriptional 235 

regulatory pathways associated with AD. 236 

  237 

AD-related regulatory networks reveal hub genes and hub miRNAs 238 

associated with AD 239 

We constructed two regulatory networks. One is a transcriptional regulatory network (TRN) 240 

extracted from the TRRUST database22 (version 2.0) that included only known and top-241 

ranked AD-associated genes (Fig. 3A and the Methods section). From this network, we 242 

identified hub genes based on outdegrees and indegrees. The genes with outdegree and 243 

indegree represent transcription factors (TFs) and target genes, respectively. The other 244 

regulatory network is a miRNA-target interaction network (Fig. 3B) extracted from 245 

mirTarBase23 (version 7.0) by considering only AD-associated genes and miRNAs 246 

(Methods). 247 

     We found that the hub genes in the AD-related TRN were supported by the literature 248 

and interaction evidence (Table 1). For example, RELA regulates 13 AD-associated 249 

genes including APOE and BACE1, interacts with 8 AD-associated genes in PPI networks, 250 

and is coexpressed with 16 AD-associated genes. Furthermore, RELA was shown to be 251 

associated with neuroprotection, learning, and memory24, 25. Another hub gene is JUN. It 252 

regulates 11 known AD-associated genes such as APP, BCL3, RELB, and PLAU, and 253 

interacts with the proteins encoded by 10 AD-associated genes such as MS4A2 and 254 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 12 

GSK3B. Besides, JUN is also responsible for Aβ-induced neuroinflammation through a 255 

signaling pathway26.  256 

    We identified genes such as CCND1 and CDKN1A as hubs in the miRNA-based 257 

regulatory network (Fig. 3B). Although some studies have reported their associations with 258 

AD27, 28, the mechanisms underlying these associations are not well understood. These 259 

genes might contribute to AD by perturbing the post-transcriptional regulatory network 260 

mediated by miRNAs (Table 1 and Fig. 3B). For example, CCND1 was associated with 261 

16 miRNAs that also bind to known AD-associated genes, including six miRNAs (miR-16-262 

5p, miR-106b-5p, miR-106a-5p, miR-20a-5p, miR-17-5p and miR-101-3p) that bind to 263 

APP and four miRNAs (miR-29b-3p, miR-186-5p, miR-29c-3p and miR-124-3p) that bind 264 

to BACE1. In addition, knockout experiments of CCND1 showed its protective role in 265 

neurodegeneration in the hippocampus29. Comparing the two networks focusing on only 266 

predicted (Fig. 3B) and known (Fig. 3C) AD-associated genes, we observed hub miRNAs 267 

such as miR-17b-5p, miR-26b-5p, miR-155-5p, miR-124-3p, and miR-106b-5p that were 268 

shared between them, indicating that the shared miRNAs might play roles in the 269 

pathology of AD. 270 

Gene modules in the integrated gene interaction network are associated with 271 

AD-related functions, neuropathological and clinical phenotypes in 272 

independent data 273 

We constructed an integrated gene interaction network by aggregating multiple lines of 274 

genomic evidence and identified four gene modules with a community cluster algorithm 275 

(Methods). The modules (denoted by M1, M2, M3, and M4) are shown in Fig. 4 (the genes 276 

in each module are provided in Supplementary Table 3). For each module, we performed 277 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 13 

enrichment analysis using PANTHER19 and identified the significantly enriched biological 278 

process terms (FDR <0.05). As many of the enriched terms were redundant, we selected 279 

representative GO terms with REVIGO30. All four modules were enriched in AD-280 

associated biological processes (Fig. 4). For example, M1 was enriched in regulation of 281 

cell death and regulation of neurogenesis; M2 was enriched in functions including 282 

response to amyloid-beta; M3 was enriched in learning or memory, regulation of synaptic 283 

plasticity; M4 was enriched in functions such as regulation of lipid transport and 284 

cholesterol efflux. These enrichments imply that the gene modules are not only 285 

biologically meaningful but also related to AD. 286 

    Next, we tested whether the modules were correlated with AD-related traits using a 287 

well established method31. For each module, we extracted the gene expression matrix 288 

containing the genes only in that module. We then computed the eigengene (i.e. the first 289 

principal component) of the expression matrix followed by correlating the eigengene with 290 

the AD-related traits of interest. We performed this analysis on the independent MSBB 291 

RNA-seq dataset with data available for three traits: the CERAD, Braak and CDR score. 292 

We conducted a total of twelve correlation tests resulting from all combinations of the four 293 

modules and the three traits. We found that the results of all correlation tests were 294 

significant (FDR < 0.05), suggesting that our identified modules were associated with AD 295 

traits. Taking the eigengene of M1 as an example, it was significantly correlated with the 296 

CERAD (r=-0.37, FDR=2.2×10-7), Braak (r=-0.41, FDR=1.5×10-8), and CDR score (r=-297 

0.42, FDR=6.1×10-9) (Figure 4B). Another example was M2, whose eigengene was 298 

significantly correlated with the three traits (Figure 4B). The correlation of M3 and M4 with 299 

the AD-related traits are provided in Supplementary Fig. 10.  300 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 14 

Individual top-ranked genes are associated with neuropathological and 301 

clinical phenotypes on independent datasets 302 

We hypothesized that the top-ranked genes were more likely to be associated with AD-303 

related phenotypes if our prediction was accurate. We tested this hypothesis using the 304 

independent MSBB RNA-seq dataset described above. For each gene, we calculated its 305 

PCC with the CERAD, Braak and CDR score (see the Methods section). To better 306 

investigate the trends between our prediction and the gene’s absolute correlation with 307 

AD-related phenotypes, we ranked all the predicted genes, divided them into 50 groups, 308 

and calculated the mean PCC for each bin. We found that higher ranks (higher predicted 309 

scores) were associated with higher mean PCC values for all three phenotypes. The 310 

predicted ranks were well correlated with the CERAD (r = 0.68), Braak (r = 0.70) and CDR 311 

(r = 0.73) score. The eigengenes for the top-ranked 100, 200 and 500 genes were all 312 

significantly correlated with CERAD, Braak and CDR scores (Supplementary Fig. 11).  313 

We then examined the correlations of individual top-ranked genes (those not included 314 

in the training set) with AD-related phenotypes5. Among the top-ranked 200 genes, we 315 

identified 95, 98 and 108 genes that were significantly correlated with CERAD, Braak and 316 

CDR scores, respectively (FDR < 0.05). Of them, 84 were correlated with all three 317 

phenotypes (Supplementary Table 4). Looking at FYN, its correlations with CERAD, 318 

Braak and CDR scores were 0.37, 0.35 and 0.37, while PRKAR1A had Pearson 319 

correlation coefficients of -0.25, -0.31 and -0.29 for the three traits respectively. These 320 

results indicate that our top-ranked genes were likely candidate genes for AD. 321 

Multiple evidence-supported AD-associated genes and their regulatory variants 322 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 15 

In the above sections, we have shown that the top-ranked genes are associated with AD 323 

based on multiple lines of functional genomic evidence. Here we performed further 324 

screening for AD-associated genes by aggregating these evidence, which are divided into 325 

two categories: (1) molecular interaction evidence reflecting the interaction of predicted 326 

genes with compiled AD-associated genes, and (2) phenotypic correlation evidence 327 

supported by correlation of predicted genes with AD traits. The former includes three 328 

types of evidence, which are protein interaction, mRNA coexpression, and miRNA sharing 329 

with AD-associated genes. The latter includes four types of evidence, which were the 330 

correlation with CERAD, Braak and CDR scores based on the MSBB dataset, and 331 

differential expression based on the ROSMAP dataset32.  332 

      To narrow down the predicted candidates, we focused on the top-ranked 200 genes 333 

(after excluding the compiled AD-associated genes). The seven types of genomic 334 

evidence for these genes are visualized as a circus plot (Figure 5), from which the 335 

evidence for each gene can be easily identified. We also obtained their enriched GO 336 

biological process terms and showed the functional annotation of these genes (Figure 5). 337 

We then applied strict criteria on functional evidence to screen for potentially confident 338 

AD-associated genes. That is, only one molecular interaction evidence and one 339 

phenotypic correlation evidence is allowed to be missing for each gene. From this, 36 out 340 

of the top-ranked 200 genes were retained (Supplementary Table 5), providing a set of 341 

multiple evidence-based candidate genes to the community for further functional 342 

experiments. As the function of a gene is directly related to the cell type it is expressed 343 

in, we further investigated the cell type specificity of their expression. Zhang et al. provides 344 

a set of genes that show cell type-specific expression in five major brain cell types 345 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 16 

including astrocyte, microglia, endothelial, oligodendrocytes and neuron33. Using this 346 

dataset, we found that 14 of the 36 genes showed specific expression in cell types such 347 

as astrocytes and microglia (Supplementary Table 6), while the others are expressed in 348 

two or more cell types. 349 

     Taking FYN as an example, it encodes a membrane-associated tyrosine kinase that 350 

is implicated in the control of cell growth and shows specific expression in astrocytes 351 

(Supplementary Table 6). It interacts with proteins encoded by 13 AD-associated genes 352 

such as APP and MAPT in PPI, shows significant coexpression with 10 AD-associated 353 

genes like CLU and interacts with 5 AD-assocaited miRNAs like hsa-mir-106b. Its 354 

expression was up-regulated based on the ROSMAP dataset (posterior error probability 355 

(PEP) =0.04)32. Its up-regulation in AD patients was further supported by the positive 356 

correlation with CERAD (PCC = 0.37), Braak (PCC = 0.35) and CDR (PCC = 0.37) scores 357 

(FDR < 0.001) on the MSBB dataset. The expression of FYN for the sample groups 358 

partitioned based on CERAD, Braak and CDR scores is shown in Figure 5A. PRKAR1A 359 

encodes a regulatory subunit of the cAMP-dependent protein kinases involved in the 360 

cAMP signaling pathway. It is functionally related with AD-associated genes through PPI, 361 

coexpression and miRNA-target network, and its expression is negatively correlated with 362 

the above three neuropathological traits (Figure 5A). Altered expression of PRKAR1A in 363 

AD patients was also identified34, providing independent evidence supporting our 364 

prediction.  365 

      Having shown that the expression level of the above genes was correlated with AD 366 

traits, we next exploited which genetic variants (SNP) might causally regulate the 367 

expression of these genes by integrating genetic and regulatory data. A SNP is likely 368 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 17 

causal if it is not only an eQTL but also resides in the transcriptional factor binding site 369 

(TFBS) within the promoter of the target gene34. By integrating eQTL and ATAC-seq data, 370 

we identified seven genes (FYN, PRKAR1A, PPP3R1, BMPR1A, LMNA, EGFR and 371 

KRAS), for which their eQTLs are also located in the TFBS (Supplementary Table 7). For 372 

instance, the SNP rs61202914 is an QTL for the expression of a FYN isoform. Further, 373 

we found that this SNP also resided in the TFBS of multiple transcription factors within 374 

the promoter region of FYN, thus likely affecting the binding affinity of the transcription 375 

factor and therefore expression level. As an illustration, RFX1_HUMAN.H11MO.0.B, 376 

which is a motif representing the TFBS of the transcription factor RFX1, harbors the SNP 377 

rs61202914 (Figure 6B). This evidence suggests that rs61202914 is likely a variant 378 

causally affecting the expression of FYN. For PRKAR1A, one TFBS in its promoter region 379 

harbors its eQTL (rs8080306) (Figure 6B), indicating that rs8080306 is likely a causal 380 

variant that regulates the expression of PRKAR1A. To summarize, our integrated analysis 381 

of eQTL and TFBS in active promoters suggests potential genetic variants that may be 382 

associated with AD through regulating the expression of their corresponding target gene. 383 

These results may be valuable to prioritize genes for further experimental studies. 384 

 385 

ALZLINK: a web resource for interrogating AD-associated genes 386 

To facilitate the interrogation of AD-associated genes and the use of the statistical 387 

evaluation pipeline developed in this work, we created the interactive web resource 388 

ALZLINK (available at: www.alzlink.com). This site provides the predicted genes along 389 

with their predicted scores and functional genomic evidence, facilitating experts in the 390 

field of AD to select candidates for further experimental testing. Also, the statistical 391 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 18 

methods to evaluate the association of an individual gene or a gene set with AD are 392 

implemented and available as an online pipeline. For an individual gene, users can query 393 

its interactions with known AD-associated genes in heterogeneous interaction networks 394 

and its correlation with AD-related traits including CERAD, CDR and Braak scores. For a 395 

gene set, users can statistically test its association with AD using the sequence or 396 

network-based methods, outputting the distribution of the test metric along with a p-value 397 

measuring the significance. For each interaction network such as PPI, the local network 398 

involving the queried gene or gene set and the known AD-associated genes is visualized 399 

on the web. The data and pipelines on ALZLINK could serve as a valuable resource for 400 

experts to prioritize AD-associated genes for further testing.  401 

 402 

Discussion 403 

AD is a neurodegenerative disease with heterogeneous pathologies8, 35, 36, 37. However, 404 

predicting AD-associated genes is challenging because AD, as a complex disease, is 405 

caused mainly by common variants of multiple genes and the disruption of related 406 

pathways. FGNs are an important model for characterizing complex functional 407 

relationships between genes and have been successfully applied to predict candidate 408 

genes for complex diseases, including autism11 and Parkinson’s disease38. Since AD is 409 

caused by gene dysregulation in the brain, we considered brain FGNs as the basis for 410 

predicting AD-associated genes. The key idea of our approach was to discover the 411 

pattern of AD-associated genes from a brain FGN using machine learning methods. Using 412 

our model, we were able to predict novel candidate genes for AD.  413 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 19 

   We evaluated the association of top-ranked genes with AD by investigating their 414 

enrichment in AD-related functions and phenotypes along with examining their 415 

association with AD through multiple heterogeneous biological networks. We found that 416 

the top-ranked genes were associated with AD. Based on the analyses of the 417 

independent MSBB data, we observed that the top-ranked genes were correlated with 418 

AD-related neuropathological (CERAD and Braak scores) and clinical (CDR) phenotypes, 419 

suggesting that they were likely associated with AD. We also explored gene modules from 420 

the AD-related network. We found that these modules were enriched in many AD-related 421 

pathways and phenotypes and were also correlated with three AD-related phenotypes, 422 

implicating their biological relevance. Combining the genomic data and our predictions, 423 

we identified a set of 36 genes whose association with AD was supported by multiple 424 

lines of evidence, indicating these genes as potential promising candidates. We further 425 

identified potential causal variants for 7 of the 36 genes by integrating brain eQTL and 426 

ATAC-seq data.  427 

    Our contributions are mainly three-fold. First, we compiled a set of genes that were 428 

likely related to AD by performing an intensive, stringent hand curation of multiple 429 

resources, providing a potential resource for the community. For negative gene selection, 430 

we proposed a pathway-based approach that works by removing any gene that was likely 431 

to be associated with AD. Thus, it can be expected that negative genes have been 432 

identified. We illustrated that this approach helped improve the accuracies of models in 433 

terms of both AUROC and AUPRC. Our model for predicting AD-associated genes 434 

depends on the non-AD (negative) genes. Different ways of negative gene selection could 435 

lead to bias in the model and thus the prediction. As our method selects negative genes 436 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 20 

by removing any gene that has a potential association with AD, a possible bias is that the 437 

predicted genes are more likely to be functionally related to and share GO terms with the 438 

compiled AD-associated genes. Second, we predicted novel candidate genes and 439 

showed that the top-ranked genes exhibit significant associations with AD through 440 

functional enrichment analysis and the investigation of multiple biological networks. 441 

Moreover, the genes were found to be correlated with AD-related phenotypes on 442 

independent datasets. Taking advantage of the functional genomic data, we identified a 443 

set of 36 AD-associated genes supported by multiple lines of evidence, indicating 444 

promising candidates. Third, we developed ALZLINK, a web interface to facilitate the use 445 

of data and pipeline developed in this study. It should be pointed out that the pipeline to 446 

evaluate the relevance of the predicted genes to AD is generic and can be applied to any 447 

other diseases. 448 

     Although our predictions are promising, as supported by our systematic analysis, our 449 

model for predicting AD-associated genes could be improved in several ways. First, our 450 

predictions were made at the gene level without differentiating the splice isoforms 451 

generated from the same gene through alternative splicing39, 40. This factor is essential 452 

because isoforms of the same gene might have different or even opposite functions. 453 

Isoforms have been implicated in diseases such as ovarian cancers41. The prediction of 454 

AD-associated genes at the isoform level could have the potential to promote our 455 

understanding of AD. Second, the human brain consists of multiple heterogeneous 456 

structures, each of which contains many different cell types. The association of the 457 

predicted genes with AD in different cell types remains to be resolved. Integrating single-458 

cell genomic data42, 43, 44 with our predicted genes could be helpful for addressing this 459 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 21 

question. Lastly, our predictions do not implicate causality. The genes predicted using our 460 

method are statistically significantly associated with AD. 461 

     In summary, we predicted novel AD-associated genes and provided evidence for their 462 

association with AD. However, further studies are needed to test the validity of our 463 

predictions. This pipeline of prediction and validation is generic and can be readily used 464 

for other diseases, such as Parkinson’s disease, cancers and heart diseases. We expect 465 

that the predicted genes might become a useful resource for experimental testing by the 466 

community and that our proposed pipeline could be used in other diseases. 467 

 468 

Methods 469 

Compilation of AD-associated and non-AD genes 470 

AD-associated (positives) and non-AD (negatives) genes are needed to build a machine 471 

learning model. First, we performed intensive hand-curation to identify confident AD-472 

associated genes from various disease gene resources, including AlzGene45, AlzBase46, 473 

OMIM47, DisGenet48, DistiLD49, and UniProt50, Open Targets51, GWAS Catalog52, 474 

differentially expressed genes (DEGs) in ROSMAP32 and published literature. The 475 

curated genes from each resource as well as the corresponding criteria were provided in 476 

Supplementary Note 1. As the AD-associated genes and their reliability vary across these 477 

resources, we applied a voting strategy and selected only those that were present in at 478 

least two resources to ensure higher reliability (see details in Supplementary Note 1). In 479 

this way, we obtained 147 AD-associated genes. Second, we selected a set of non-AD 480 

genes, which had no or minimal association with AD. The main idea of our method for 481 

non-AD gene selection was to remove any genes that exhibit potential associations with 482 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 22 

AD. We removed genes that (i) were annotated to the same Gene Ontology (GO) term 483 

enriched for the AD-associated genes or (ii) showed any association with AD based on 484 

the above-described resources (see details in Supplementary Note 1). In this way, we 485 

identified 1651 non-AD genes. 486 

Model development for predicting AD-associated genes 487 

We first constructed the feature matrix for all human genes based on the brain-specific 488 

FGN. This FGN was built by integrating heterogeneous functional genomic data, including 489 

gene expression, protein-protein interaction (PPI), protein docking and gene-to-490 

phenotype annotation using the well-established Bayesian framework16. The Bayesian 491 

network model predicts a co-functional probability (CFP) for every pair of genes by using 492 

the following formula: 493 

𝑃(𝐹!,𝐹",…,𝐹#) =
!
$
𝑃(𝑦 = 1)𝛱%&!

# 𝑃(𝑦 = 1)																																																												[1]	494 

where P(y=1) is the prior probability for a sample (i.e. a gene pair in this study) to be 495 

positive, P(Fi|y = 1), i = 1, 2, …, n, is the probability of observing the value of the i-th 496 

feature under the condition that the gene pair is functionally related, and C is a constant 497 

normalization factor. In the resulting network, a node is a gene, and an edge represents 498 

CFP that two linked genes participate in the same biological process or pathway. 499 

 For each gene, we extracted its CFP with the compiled AD-associated genes (147 500 

genes) from the network as features based on a previously proposed method18. As a 501 

result, each gene is characterized by a 147-dimensional vector. The feature data for the 502 

training set (147 positives and 1651 negatives, resulting in a total of 1798 genes) are 503 

represented by a 1798x147 matrix X. The label (1 for positives and 0 for negatives) of 504 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 23 

each gene is stored in a vector y. The feature matrix of all other genes not in the training 505 

set was extracted. 506 

    To develop a model for predicting AD-associated genes, we compared the different 507 

combinations of FGNs and machine learning models. To identify optimal FGNs for feature 508 

matrix construction, we obtained ten networks for the whole brain or brain-regions, 509 

including the brain, forebrain, frontal lobe, temporal lobe, hippocampus, thalamus, 510 

amygdala, glia and astrocytes from the GIANT database15 and the BaiHui database16. 511 

We considered these ten regions because they have been implicated in AD53, 54. As AD-512 

associated genes are likely to operate in immune cells55, 56, we investigated how well 513 

immune cells were represented in these networks. As microglia is the dominant immune 514 

cell in the brain and cell type-specific genes are indicators of the cell type of interest, we 515 

analyzed how microglia-specific genes were represented in these networks. We obtained 516 

a set of microglia-specific genes from the work33. We found that more than 95% of them 517 

existed in each of these networks, suggesting that immune cells are well represented in 518 

these networks. For the machine learning models, we considered logistic regression (LR), 519 

support vector machine (SVM), random forest (RF) and extremely randomized trees 520 

(ExtraTrees) for their promising accuracy shown in our previous work18. 521 

Statistical assessment of the relevance of top-ranked genes to AD 522 

We evaluated the relevance of the top-ranked genes to AD using the following method 523 

(the genes in the training set were excluded). These methods are based on the sequence, 524 

pathway and various biological networks, as described below.  525 

Decile enrichment test for AD pathways and phenotypes 526 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 24 

If the prediction is accurate, it is expected that AD-associated genes are more likely to be 527 

enriched in the top-ranked genes. Using the decile enrichment test proposed in the 528 

previous study11, we statistically assessed whether a larger proportion of a given AD-529 

related gene set falls into the first decile of the ranked genes. To do so, we excluded the 530 

genes in the training set, ranked the remaining genes, and split genes into 10 evenly 531 

binned deciles. Let Pnet and Prandom denote the proportion of a given gene set that falls 532 

into the first decile based on our prediction and random chance, respectively. We tested 533 

whether Pnet was significantly larger than Prandom by using the binomial test (see details in 534 

the previous work11). 535 

 536 

Evaluation based on sequence similarity 537 

Genes with similar sequences are likely to carry out similar functions. For a set of k 538 

predicted genes denoted by Gk, we evaluate its functional relationship with AD-associated 539 

genes using a sequence similarity-based score (SEQSIM-score), which measures the 540 

average similarity between predicted and known AD-associated genes.  It is calculated 541 

as:  542 

SEQSIM-score(𝑔!) =
"
!
∑ max

		𝑔𝑗∈𝐺𝑃
(𝑠𝑐𝑜𝑟𝑒(𝑔#,𝑔$))!#%"                             [2] 543 

, where GP denotes the set of compiled positive genes, score(gi, gj) is the sequence 544 

identity between a predicted gene gi and the AD-associated gene gj calculated using 545 

BLAST57. The higher the SEQSIM-score is, the more similar to AD-associated genes the 546 

predicted gene is. SEQSIM-score was standardized to have zero mean and unit variance 547 

using z-transform. For the top-ranked k∊[100, 200, 500] genes, their scores are denoted 548 

by the SEQSIM-scoreobserved. In the same way, we also calculated the SEQSIM-549 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 25 

scorerandom for a set of k randomly selected genes. We calculated 10,000 such scores 550 

from 10,000 randomly sampled gene sets. Let Nsig denote the number of random scores 551 

that are higher than SEQSIM-scoreobserved. We computed the p-value as Nsig/10000. 552 

 553 

Evaluation based on coexpression with AD-associated genes 554 

Compared to randomly selected genes, reliably predicted genes are more likely to be 555 

coregulated with AD-associated genes. Based on this hypothesis, we calculated the 556 

number of coexpressed gene pairs between top-ranked k genes and known AD-557 

associated genes using independent gene expression data. That’s to say, in each pair, 558 

one is a predicted gene and the other is a known AD-associated gene. The coexpression 559 

was measured with Pearson correlation coefficient (PCC). A gene pair was considered to 560 

be coexpressed if the PCC ≥ 0.7. To test whether the coexpression is significant, we 561 

generated 10,000 gene lists, each containing k randomly sampled genes. We calculated 562 

the number of coexpressed gene pairs for the top-ranked genes and for the randomly 563 

selected genes, denoted by Eobserved and Erandom. We calculated the p-value to measure 564 

whether Eobserved is significantly higher than Erandom.  565 

     We used the Mayo RNA-seq dataset generated from the Accelerating Medicines 566 

Partnership-Alzheimer’s Disease (AMP-AD) project (publicly available at 567 

https://www.synapse.org/#!Synapse: syn2580853) for coexpression evaluation. Note that 568 

this dataset was not used for constructing the brain FGN that was used to build the model 569 

for predicting AD-associated genes, so circularity was avoided. This dataset contains 570 

gene expression data of the temporal cortex obtained from 82 cases and 80 controls. The 571 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 26 

log2-transformed Fragments Per Kilobase of transcript per Million mapped reads (FPKM) 572 

was used for this analysis. 573 

 574 

Evaluation based on PPI networks 575 

We tested whether the top-ranked k genes were more likely to interact with AD-associated 576 

genes in PPI networks. We used the PPI data from Human Reference Interactome 577 

(HuRI)58 and Search Tool for the Retrieval of Interacting Genes/Proteins (STRING)59. 578 

Because some PPI data were integrated to build the brain FGN, such PPIs have been 579 

first removed from the two databases to avoid circularity. The interaction data in HuRI 580 

were experimentally identified. In STRING, a score is used to measure the interaction 581 

strength between two proteins; a score > 700 indicates an interaction with high confidence. 582 

Only the confident interaction was considered. We tested k values in [100, 200, 500]. For 583 

a given k value, we computed the number of genes in the top-ranked k genes that 584 

interacted with at least one AD-associated gene, denoted by Nobserved. Similarly, we also 585 

calculated Nrandom, which represents the number of genes in k randomly sampled genes 586 

that interacted with at least one AD-associated gene. With the same method described in 587 

the previous section, a p-value was calculated to measure the significance.  588 

 589 

Evaluation based on miRNA-target interaction networks 590 

This analysis was motivated by the assumption that top-ranked genes were more likely 591 

related to AD-associated genes or miRNAs based on miRNA-target interaction networks. 592 

First, we tested whether top-ranked genes and AD-associated genes share more miRNAs. 593 

We downloaded miRNA-target interaction data from miRTarBase23, a high-quality 594 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 27 

database of validated interactions. We computed the number of shared miRNAs of the 595 

top-ranked k∊[100, 200, 500] genes with AD-associated genes. Based on randomly 596 

sampled genes, we calculated a p-value to test whether the number of shared miRNAs 597 

was significant. Second, we tested top-ranked genes for their binding to AD-associated 598 

miRNAs. We retrieved AD-associated miRNAs from the Human microRNA Disease 599 

Database (HMDD) (v3.2). Similarly, for the top-ranked k genes, we calculated a p-value 600 

to measure their significance of binding to AD-associated miRNAs. 601 

 602 

Construction of AD-related regulatory networks 603 

To analyze the regulatory relationship between the predicted candidates and AD-604 

associated genes and obtain hub genes60, 61, we constructed two AD-related regulatory 605 

networks: one was a transcriptional regulation network, the other was a miRNA-target 606 

interaction network. 607 

   The human transcriptional regulatory network was downloaded from the Transcriptional 608 

Regulatory Relationships Unraveled by Sentence-based Text mining (TRRUST) 609 

database22. The full network contains 795 transcription factors (TFs) and 2492 target 610 

genes. First, we extracted an AD-related transcriptional regulatory network by retaining 611 

only the TF-target gene pairs in which one node is known or predicted AD-associated 612 

gene (among the top-ranked 200). We identified hub genes according to the outdegree 613 

or indegree. 614 

For constructing the AD-related miRNA-target interaction network, we first collected 44 615 

AD-associated miRNAs from an up-to-date review23. Then from the above-described 616 

miRTarBase23 (version 7.0), we extracted two networks. One contains only the interaction 617 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 28 

between AD-associated miRNAs and AD-associated genes, and the other contains only 618 

the interaction between AD-associated miRNAs and predicted AD-associated genes. 619 

 620 

Identification of gene modules in the integrated network 621 

To better understand the functions of the predicted genes, we constructed an integrated 622 

network by aggregating evidence from the brain FGN, PPI, coexpression network, 623 

miRNA-target network and transcriptional regulatory network. This network included the 624 

top-ranked 200 genes and the compiled 147 AD-associated genes. Two genes were 625 

connected with an edge if they were direct neighbors in any of the networks above. In 626 

detail, all TF-target interactions, which satisfy the above condition, were extracted from 627 

the transcriptional regulatory network in the TRRUST database22. We also included the 628 

genes with a CFP ≥ 0.7, and then expanded the resulting network by including other 629 

genes that have a CFP ≥ 0.95 with at least one known AD-associated gene. From the 630 

gene coexpression network, we retained only edges with PCCs higher than 0.7. From the 631 

PPI network, we included gene pairs whose encoded proteins show interaction in HuRI 632 

or STRING. For the miRNA-target interaction data, we computed a network in which the 633 

weight of the edge between two genes was calculated as w=Nshare/Nmax, where Nshare 634 

represents the number of miRNAs shared by the two genes and Nmax =max(N1, N2) with 635 

N1 and N2 denoting the number of miRNAs binding to the two genes, respectively. The 636 

range of w is from 0 to 1. The interaction with w ≥0.3 was considered. By applying the 637 

GLay algorithm implemented in Cytoscape[44] to the integrated network, we identified 638 

gene modules within which genes were closely connected. 639 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 29 

The Independent Mountain Sinai Brain Bank (MSBB) dataset with AD-related 640 

neuropathological and clinical traits 641 

We obtained an independent dataset with AD-related neuropathological and clinical traits 642 

from the MSBB study62. We used the data from Brodmann area 36 (parahippocampal 643 

gyrus), which is one of the most vulnerable regions to AD63. This dataset contains gene 644 

expression data from 215 donors for which AD-related phenotypes are also available. 645 

These phenotypes include the neuritic plaque density assessed by CERAD score, 646 

neurofibrillary tangle severity by Braak score, and severity of dementia by CDR score. 647 

The dataset contains 23021 genes measured for the 215 individuals and is available at 648 

the AMP-AD portal (https://www.synapse.org/#!Synapse:syn3159438). For each gene, its 649 

PCC with the CERAD, Braak and CDR scores was calculated.  650 

       Based on the CERAD score, we extracted control and AD samples using the criteria 651 

provided on https://www.synapse.org/#!Synapse:syn6101474; based on the Braak score, 652 

we followed the practice in 63 and divided samples into three groups in the ranges of [0, 653 

2], [3, 4] and [5, 6], representing different levels of tau pathology; Based on CDR, the 654 

samples were partitioned into three groups in the range of [0], [0.5, 2] and [3, 5] in the 655 

same way as used in 63, representing different degrees of severity of clinical dementia.  656 

 657 

Brain eQTL and ATAC-seq data 658 

We identify potentially causal regulatory variants by testing whether eQTL for a target 659 

gene also resides in the transcriptional factor binding site (TFBS) in its promoters through 660 

the integration of eQTL and ATAC-seq data. Both gene- and isoform-expression eQTLs 661 

were considered. We obtained brain gene eQTLs from GTEx (version: v8), PsychEncode 662 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 30 

(http://resource.psychencode.org/) and the CommonMind Consortium 663 

(https://www.synapse.org/#!Synapse:syn4622659). The latter two resources contain 664 

isoform eQTLs, which were also used. We used active promoters from the human brain 665 

ATAC-seq peak data in the BOCA database64. We identified TFBSs in these promoters 666 

using the FIMO tool65, with the transcription factor binding motif in the HOCOMOCO 667 

database (version 11) as reference. 668 

Data Availability 669 

All accession codes, unique identifiers, or web links for publicly available datasets are 670 

described in the paper. All data supporting the findings of the current study are listed in 671 

Supplementary Tables 1-7, Supplementary Figures 1-11, and our web interface 672 

(www.alzlink.com).  673 

Code Availability 674 

The codes for model development are publicly available at 675 

https://github.com/genemine/alzlink. 676 

 677 

References 678 
1. Calsolaro V, Antognoli R, Okoye C, Monzani F. The use of antipsychotic drugs for 679 

treating behavioral symptoms in Alzheimer's Disease. Front Pharmacol 10, 1465 (2019). 680 
 681 
2. Fredericks CA, et al. Early affective changes and increased connectivity in preclinical 682 

Alzheimer's disease. Alzheimers Dement (Amst) 10, 471–479 (2018). 683 
 684 
3. Giri M, Shah A, Upreti B, Rai JC. Unraveling the genes implicated in Alzheimer's 685 

disease. Biomed Rep 7, 105–114 (2017). 686 
 687 
4. Sims R, Hill M, Williams J. The multiplex model of the genetics of Alzheimer’s disease. 688 

Nature Neuroscience 23, 311-322 (2020). 689 
 690 
5. Mostafavi S, et al. A molecular network of the aging human brain provides insights into 691 

the pathology and cognitive decline of Alzheimer’s disease. Nat Neurosci 21, 811-819 692 
(2018). 693 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 31 

 694 
6. Johnson ECB, et al. Large-scale proteomic analysis of Alzheimer’s disease brain and 695 

cerebrospinal fluid reveals early changes in energy metabolism associated with 696 
microglia and astrocyte activation. Nat Med 26, 769-780 (2020). 697 

 698 
7. Ridge PG, Mukherjee, S., Crane, P. K., Kauwe, J. S. & Alzheimer's Disease Genetics 699 

Consortium. Alzheimer's disease: analyzing the missing heritability. PLoS ONE 8, 700 
e79771 (2013). 701 

 702 
8. Cuyvers E, Sleegers K. Genetic variations underlying Alzheimer's disease: evidence 703 

from genome-wide association studies and beyond. Lancet Neurol 15, 857–868 (2016). 704 
 705 
9. Ridge PG, et al. Assessment of the genetic variance of late-onset Alzheimer's disease. 706 

Neurobiol Aging 41, 200.e213–200.e220 (2016). 707 
 708 
10. Guan Y, Myers CL, Lu R, Lemischka IR, Bult CJ, Troyanskaya OG. A genomewide 709 

functional network for the laboratory mouse. PLOS Comput Biol 4, e1000165 (2008). 710 
 711 
11. Krishnan A, et al. Genome-wide prediction and functional characterization of the genetic 712 

basis of autism spectrum disorder. Nat Neurosci 19, 1454–1462 (2016). 713 
 714 
12. Troyanskaya OG, Dolinski K, Owen AB, Altman RB, Botstein D. A Bayesian framework 715 

for combining heterogeneous data sources for gene function prediction (in 716 
Saccharomyces cerevisiae). Proc Natl Acad Sci USA 100, 8348–8353 (2003). 717 

 718 
13. Guan Y, Ackert-Bicknell CL, Kell B, Troyanskaya OG, Hibbs MA. Functional genomics 719 

complements quantitative genetics in identifying disease-gene associations. PLoS 720 
Comput Biol 6, e1000991 (2010). 721 

 722 
14. Recla JM, Robledo RF, Gatti DM, Bult CJ, Churchill GA, Chesler EJ. Precise genetic 723 

mapping and integrative bioinformatics in Diversity Outbred mice reveals Hydin as a 724 
novel pain gene. Mamm Genome 25, 211–222 (2014). 725 

 726 
15. Greene CS, et al. Understanding multicellular function and disease with human tissue-727 

specific networks. Nat Genet 47, 569–576 (2015). 728 
 729 
16. Li H-D, Bai T, Sandford E, Burmeister M, Guan Y. BaiHui: cross-species brain-specific 730 

network built with hundreds of hand-curated datasets. Bioinformatics 35, 2486–2488 731 
(2019). 732 

 733 
17. Bai B, et al. Deep Multilayer Brain Proteomics Identifies Molecular Networks in 734 

Alzheimer's Disease Progression. Neuron 105, 975-991.e977 (2020). 735 
 736 
18. Duda M, Zhang H, Li HD, Wall DP, Burmeister M, Guan Y. Brain-specific functional 737 

relationship networks inform autism spectrum disorder gene prediction. Transl 738 
Psychiatry 8, 56 (2018). 739 

 740 
19. Mi H, Muruganujan A, Ebert D, Huang X, Thomas PD. PANTHER version 14: more 741 

genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. 742 
Nucleic Acids Res 47, D419–D426 (2018). 743 

 744 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 32 

20. Allen M, et al. Human whole genome genotype and transcriptome data for Alzheimer’s 745 
and other neurodegenerative diseases. Sci Data 3, 160089 (2016). 746 

 747 
21. Wang M, Qin L, Tang B. MicroRNAs in Alzheimer's Disease. Front Genet 10, 153 748 

(2019). 749 
 750 
22. Han H, et al. TRRUST v2: an expanded reference database of human and mouse 751 

transcriptional regulatory interactions. Nucleic Acids Res 46, D380–D386 (2017). 752 
 753 
23. Chou CH, et al. miRTarBase update 2018: a resource for experimentally validated 754 

microRNA-target interactions. Nucleic Acids Res 46, D296–D302 (2018). 755 
 756 
24. Kaltschmidt B, Kaltschmidt C. NF-kappaB in the nervous system. Cold Spring Harbor 757 

perspectives in biology 1, a001271-a001271 (2009). 758 
 759 
25. Pizzi M, et al. NF-kappaB factor c-Rel mediates neuroprotection elicited by mGlu5 760 

receptor agonists against amyloid beta-peptide toxicity. Cell Death Differ 12, 761-772 761 
(2005). 762 

 763 
26. Vukic V, et al. Expression of inflammatory genes induced by beta-amyloid peptides in 764 

human brain endothelial cells and in Alzheimer's brain is mediated by the JNK-AP1 765 
signaling pathway. Neurobiol Dis 34, 95–106 (2009). 766 

 767 
27. Kim H, et al. Overexpression of cell cycle proteins of peripheral lymphocytes in patients 768 

with Alzheimer's disease. Psychiatry Investig 13, 127–134 (2016). 769 
 770 
28. Scacchi R, Gambina G, Moretto G, Corbo RM. P21 gene variation and late-onset 771 

Alzheimer's disease in the Italian population. Dementia and geriatric cognitive disorders 772 
35, 51–57 (2013). 773 

 774 
29. Marathe S, Liu S, Brai E, Kaczarowski M, Alberi L. Notch signaling in response to 775 

excitotoxicity induces neurodegeneration via erroneous cell cycle reentry. Cell Death 776 
Differ 22, 1775-1784 (2015). 777 

 778 
30. Supek F, Bosnjak M, Skunca N, Smuc T. REVIGO summarizes and visualizes long lists 779 

of gene ontology terms. PLoS ONE 6, e21800 (2011). 780 
 781 
31. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network 782 

analysis. BMC Bioinformatics 9, 559 (2008). 783 
 784 
32. Canchi S, et al. Integrating Gene and Protein Expression Reveals Perturbed Functional 785 

Networks in Alzheimer’s Disease. Cell Rep 28, 1103-1116.e1104 (2019). 786 
 787 
33. McKenzie AT, et al. Brain Cell Type Specific Gene Expression and Co-expression 788 

Network Architectures. Sci Rep 8, 8868 (2018). 789 
 790 
34. Liang WS, et al. Altered neuronal gene expression in brain regions differentially affected 791 

by Alzheimer's disease: a reference data set. Physiol Genomics 33, 240-256 (2008). 792 
 793 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 33 

35. Liu J, Li M, Lan W, Wu F, Pan Y, Wang J. Classification of Alzheimer's disease using 794 
whole brain hierarchical network. IEEE/ACM Trans Comput Biol Bioinform 15, 624–632 795 
(2018). 796 

 797 
36. Cummings J, Feldman HH, Scheltens P. The “rights” of precision drug development for 798 

Alzheimer’s disease. Alzheimer's Res Ther 11, 76 (2019). 799 
 800 
37. Lambert J-C, et al. Genome-wide association study identifies variants at CLU and CR1 801 

associated with Alzheimer’s disease. Nat Genet 41, 1094 (2009). 802 
 803 
38. Yao V, et al. An integrative tissue-network approach to identify and test human disease 804 

genes. Nat Biotechnol 36, 1091–1099 (2018). 805 
 806 
39. Li H-D, Menon R, Omenn GS, Guan Y. The emerging era of genomic data integration for 807 

analyzing splice isoform function. Trends Genet 30, 340–347 (2014). 808 
 809 
40. Baralle FE, Giudice J. Alternative splicing as a regulator of development and tissue 810 

identity. Nat Rev Mol Cell Biol 18, 437–451 (2017). 811 
 812 
41. Barrett CL, DeBoever C, Jepsen K, Saenz CC, Carson DA, Frazer KA. Systematic 813 

transcriptome analysis reveals tumor-specific isoforms for ovarian cancer diagnosis and 814 
therapy. Proc Natl Acad Sci USA 112, E3050–E3057 (2015). 815 

 816 
42. Tian T, Wan J, Song Q, Wei Z. Clustering single-cell RNA-seq data with a model-based 817 

deep learning approach. Nat Mach Intell 1, 191–198 (2019). 818 
 819 
43. Zheng R, Li M, Liang Z, Wu F-X, Pan Y, Wang J. SinNLRR: a robust subspace 820 

clustering method for cell type detection by non-negative and low-rank representation. 821 
Bioinformatics 35, 3642–3650 (2019). 822 

 823 
44. Cao J, et al. The single-cell transcriptional landscape of mammalian organogenesis. 824 

Nature 566, 496–502 (2019). 825 
 826 
45. Bertram L, McQueen MB, Mullin K, Blacker D, Tanzi RE. Systematic meta-analyses of 827 

Alzheimer disease genetic association studies: the AlzGene database. Nat Genet 39, 828 
17–23 (2007). 829 

 830 
46. Bai Z, et al. AlzBase: an integrative database for gene dysregulation in Alzheimer’s 831 

disease. Mol Neurobiol 53, 310–319 (2016). 832 
 833 
47. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian 834 

Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. 835 
Nucleic Acids Res 33, D514–D517 (2005). 836 

 837 
48. Pinero J, et al. DisGeNET: a discovery platform for the dynamical exploration of human 838 

diseases and their genes. Database (Oxford) 2015, bav028 (2015). 839 
 840 
49. Palleja A, Horn H, Eliasson S, Jensen LJ. DistiLD Database: diseases and traits in 841 

linkage disequilibrium blocks. Nucleic Acids Res 40, D1036–D1040 (2012). 842 
 843 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 34 

50. Wu CH, et al. The Universal Protein Resource (UniProt): an expanding universe of 844 
protein information. Nucleic Acids Res 34, D187–D191 (2006). 845 

 846 
51. Carvalho-Silva D, et al. Open Targets Platform: new developments and updates two 847 

years on. Nucleic Acids Res 47, D1056-D1065 (2018). 848 
 849 
52. Buniello A, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association 850 

studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 47, D1005-851 
D1012 (2018). 852 

 853 
53. Xie A, Gao J, Xu L, Meng D. Shared mechanisms of neurodegeneration in Alzheimer's 854 

disease and Parkinson's disease. Biomed Res Int 2014, 648740 (2014). 855 
 856 
54. Dubois B. The emergence of a new conceptual framework for Alzheimer's disease. J 857 

Alzheimers Dis 62, 1059–1066 (2018). 858 
 859 
55. Young AMH, et al. A map of transcriptional heterogeneity and regulatory variation in 860 

human microglia. bioRxiv doi: https://doi.org/10.1101/2019.12.20.874099,  (2019). 861 
 862 
56. Tansey KE, Cameron D, Hill MJ. Genetic risk for Alzheimer's disease is concentrated in 863 

specific macrophage and microglial transcriptional networks. Genome Med 10, 14-14 864 
(2018). 865 

 866 
57. McGinnis S, Madden TL. BLAST: at the core of a powerful and diverse set of sequence 867 

analysis tools. Nucleic Acids Res 32, W20–W25 (2004). 868 
 869 
58. Luck K, et al. A reference map of the human binary protein interactome. Nature 580, 870 

402-408 (2020). 871 
 872 
59. Szklarczyk D, et al. STRING v10: protein-protein interaction networks, integrated over 873 

the tree of life. Nucleic Acids Res 43, D447–D452 (2015). 874 
 875 
60. Wang M, et al. Molecular networks and key regulators of the dysregulated neuronal 876 

system in Alzheimer’s Disease. bioRxiv doi: https://doi.org/10.1101/788323,  (2019). 877 
 878 
61. Scelsi MA, Napolioni V, Greicius MD, Altmann A. Network propagation of rare mutations 879 

in Alzheimer’s disease reveals tissue-specific hub genes and communities. bioRxiv doi: 880 
https://doi.org/10.1101/781203,  (2019). 881 

 882 
62. Wang M, et al. The Mount Sinai cohort of large-scale genomic, transcriptomic and 883 

proteomic data in Alzheimer's disease. Sci Data 5, 180185 (2018). 884 
 885 
63. Wang M, et al. Integrative network analysis of nineteen brain regions identifies molecular 886 

signatures and networks underlying selective regional vulnerability to Alzheimer's 887 
disease. Genome Med 8, 104-104 (2016). 888 

 889 
64. Fullard JF, et al. An atlas of chromatin accessibility in the adult human brain. Genome 890 

research 28, 1243-1252 (2018). 891 
 892 
65. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. 893 

Bioinformatics (Oxford, England) 27, 1017-1018 (2011). 894 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 35 

 895 
66. Hooper C, Meimaridou E, Tavassoli M, Melino G, Lovestone S, Killick R. p53 is 896 

upregulated in Alzheimer's disease and induces tau phosphorylation in HEK293a cells. 897 
Neurosci Lett 418, 34–37 (2007). 898 

 899 
67. Qin W, et al. Neuronal SIRT1 activation as a novel mechanism underlying the prevention 900 

of Alzheimer disease amyloid neuropathology by calorie restriction. J Biol Chem 281, 901 
21745-21754 (2006). 902 

 903 
68. Feio dos Santos AC, et al. Decrease of PTEN expression levels among normal, 904 

symptomatic and asymptomatic Alzheimer's disease (Ad) subjects, measured in 905 
hippocampus, temporal and entorhinal cortices. Alzheimer's & dementia : the journal of 906 
the Alzheimer's Association 7, S701 (2011). 907 

 908 
69. Sonoda Y, et al. Accumulation of tumor-suppressor PTEN in Alzheimer neurofibrillary 909 

tangles. Neurosci Lett 471, 20–24 (2010). 910 
 911 
 912 

Acknowledgments 913 

This work is supported by the National Key R&D Program of China (No. 914 

2018YFC0910504), the National Natural Science Foundation of China (No. U1909208, 915 

61772552, 61772557), 111 Project (No. B18059), and Hunan Provincial Science and 916 

Technology Program (2018WK4001). 917 

     The results published here are in part based on data obtained from the AMP-AD 918 

Knowledge Portal (https://adknowledgeportal.synapse.org/). The Mayo RNA-seq data 919 

were provided by the following sources: The Mayo Clinic Alzheimer's Disease Genetic 920 

Studies, led by Dr. Nilufer Ertekin-Taner and Dr. Steven G. Younkin, Mayo Clinic, 921 

Jacksonville, FL using samples from the Mayo Clinic Study of Aging, the Mayo Clinic 922 

Alzheimer’s Disease Research Center, and the Mayo Clinic Brain Bank. Data collection 923 

was supported through funding by NIA grants P50 AG016574, R01 AG032990, U01 924 

AG046139, R01 AG018023, U01 AG006576, U01 AG006786, R01 AG025711, R01 925 

AG017216, R01 AG003949, NINDS grant R01 NS080820, CurePSP Foundation, and 926 

support from Mayo Foundation. Study data includes samples collected through the Sun 927 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 36 

Health Research Institute Brain and Body Donation Program of Sun City, Arizona. The 928 

Brain and Body Donation Program is supported by the National Institute of Neurological 929 

Disorders and Stroke (U24 NS072026 National Brain and Tissue Resource for 930 

Parkinson’s Disease and Related Disorders), the National Institute on Aging (P30 931 

AG19610 Arizona Alzheimer’s Disease Core Center), the Arizona Department of Health 932 

Services (contract 211002, Arizona Alzheimer’s Research Center), the Arizona 933 

Biomedical Research Commission (contracts 4001, 0011, 05-901 and 1001 to the Arizona 934 

Parkinson's Disease Consortium) and the Michael J. Fox Foundation for Parkinson’s 935 

Research. The MSBB data were generated from postmortem brain tissue collected 936 

through the Mount Sinai VA Medical Center Brain Bank and were provided by Dr. Eric 937 

Schadt from Mount Sinai School of Medicine. 938 

 939 

Author contributions 940 

C.X.L., H.D.L. and W.S.L. developed the statistical method, performed the analysis, and 941 

wrote the manuscript. D.C. and C.X.L developed the web interface. X.M.Z., J.W., F.X.W. 942 

and D.W. provided instructions on the analysis. J.X.W. conceived and supervised the 943 

research and contributed to the manuscript. 944 

 945 

Additional information  946 

Supplementary Information accompanies this paper at http://www.nature.com/ nature 947 

communications.  948 

Competing financial interests: The authors declare no competing financial interests. 949 

 950 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 37 

Supplementary information 951 

Supplementary Notes 952 

Supplementary Note 1. Description for compiling AD-associated genes. 953 

 954 

Supplementary Figures 955 

Supplementary Fig. 1. Comparison in model performance of two methods in negative 956 

non-AD gene selection. 957 

Supplementary Fig. 2. Comparison of the negative controls and randomly selected genes 958 

based on their association with AD. 959 

Supplementary Fig. 3. Performances of different brain-region networks based on Random 960 

Forest (RF). 961 

Supplementary Fig. 4. Performances of different brain-region networks based on support 962 

vector machines (SVM). 963 

Supplementary Fig.  5. Performance of different brain-region networks based on logistic 964 

regression (LR). 965 

Supplementary Fig.  6. Validation of the top-ranked genes based on sequence similarity 966 

with AD-associated  genes.  967 

Supplementary Fig.  7. Validation of the top-ranked genes based on their coexpression 968 

with known AD-associated genes.  969 

Supplementary Fig. 8. Validation of the top-ranked genes based on protein-protein 970 

interaction networks in the STRING and HuRI database.  971 

Supplementary Fig. 9. Validation of the top-ranked genes based on miRNA-target binding 972 

networks. 973 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 38 

Supplementary Fig. 10. The correlation with three AD traits of the eigengenes of modules 974 

3 and 4. 975 

Supplementary Fig. 11. The correlation with three AD traits of the eigengenes of the top-976 

ranked genes. 977 

 978 

Supplementary Tables: 979 

Supplementary Table 1. The top-ranked genes (excluding training set) that are likely 980 

associated with AD based on literature. 981 

Supplementary Table 2. The top ten shared GO terms of the 147 AD-associated genes 982 

with the top 147 predicted genes. 983 

Supplementary Table 3. Gene modules identified from the integrated gene interaction 984 

network. 985 

Supplementary Table 4. The correlation of 84 genes with CERAD, Braak Score and 986 

CDR on the MSBB data. 987 

Supplementary Table 5. The seven types of functional evidence for the selected 36 988 

genes.  989 

Supplementary Table 6. The 14 genes with cell type specific expression. 990 

Supplementary Table 7. The seven genes with eQTLs located in the transcription factor 991 

binding site in the promoter region. 992 

 993 

Figure captions 994 

Fig. 1 Overview of the method for genome-wide prediction of AD-associated genes and their functional 995 

characterization. A Selection of AD-associated genes. 147 AD-associated genes were compiled from 996 

various resources, including AD-associated genes from OMIM, DisGeNet, Uniprot, DistiLD, AlzBase, 997 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 39 

AlzBase, AlzGene, literature, Open Targets, ROSMAP-DEG and GWAS-catalog. The gene that was 998 

present in at least two resources was selected. The AD-associated genes as well as potential positive 999 

genes inferred with a functional enrichment method were then removed from the full set of all human genes. 1000 

The remaining genes were treated as non-AD genes (negatives). B Brain specific functional gene networks 1001 

(FGNs) were used for feature matrix construction. For each gene, its cofunction probabilities with the 147 1002 

positive genes in the network were extracted as features. Thus, each gene was characterized by a 147-1003 

dimensional vector. C Selection of brain FGNs. We compared the ten networks collected for their predictivity 1004 

of AD-associated genes with machine learning approaches. An optimal network was selected. D Validation. 1005 

Predicted AD-associated genes were validated by AD-related pathways and various gene networks, 1006 

including coexpression networks, protein-protein interaction networks, miRNA-target binding networks, 1007 

transcriptional regulatory networks. E Functional implication in AD. The associations of the top predicted 1008 

genes with AD-related phenotypes were evaluated. Gene modules from an AD-related network were 1009 

identified. 1010 

 1011 

Fig. 2 Model performance and statistical evaluation based on AD-related pathways and various gene 1012 

networks.  A Comparison of ExtraTrees models built from different functional gene networks in terms of 1013 

AUROC and AUPRC based on cross-validation.  B Enrichment of the genes ranked in the first decile in 1014 

the four AD-associated gene sets or pathways with the decile enrichment test (described in Methods). C 1015 

Validation of the top-ranked genes based on their sequence similarity, the number of shared miRNAs, the 1016 

number of AD-associated miRNAs they can bind to, the number of coexpressed gene pairs, the number 1017 

of interactions with AD-associated genes in HuRI and STRING. In all the subplots, the red vertical line 1018 

and the distribution in yellow indicate the results for our top-ranked genes and randomly selected genes, 1019 

respectively.  1020 

 1021 

 1022 

Fig. 3 AD-related regulatory networks. A Transcriptional regulatory network including our compiled AD-1023 

associated genes and the top-ranked genes. B The interaction network between predicted genes and 1024 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 40 

AD-relevant miRNAs. C The interaction network between the compiled AD-associated genes and AD-1025 

relevant miRNAs. 1026 

 1027 

Fig. 4 Gene modules and their association with AD traits. The network was built by aggregating the 1028 

evidence from the protein-protein interaction network, coexpression network, miRNA-gene binding 1029 

network, transcriptional regulatory network and the brain FGN. This network contains the top-ranked 200 1030 

genes and the 147 compiled AD-associated genes. A Four gene modules, denoted by M1, M2, M3 and 1031 

M4, were identified by applying the GLay algorithm to the integrated network in Cytoscape. B The 1032 

association of M1 and M2 with the three AD-related phenotypes (the CERAD, Braak and CDR score) was 1033 

assessed. The results for all the tests were significant (FDR < 0.05).  1034 

 1035 

Fig. 5.  Visualization of functional evidence supporting the association of the top-ranked 200 genes with 1036 

AD. The seven circles show the strength of the seven types of evidence, including the three molecular 1037 

interaction evidence (the number of interacting AD-associated genes in PPI, coexpression network and 1038 

miRNA-target binding network, respectively) and the four phenotypic correlation evidence (the Pearson 1039 

correlation with CERAD, Braak and CDR on the MSBB dataset, and the log2-transformed fold change of 1040 

expression obtained from the ROSMAP study). The darker the purple color is, the stronger the functional 1041 

association is. The section corresponding to the blue arc shows the enriched GO biological process 1042 

terms, where each curve points the gene annotated to the term.  1043 

 1044 
Fig. 6 Illustration of the association of the top-ranked individual genes with AD-related phenotypes and 1045 

the potential regulatory variant of the gene. A Comparison of the expression of individual genes in 1046 

different sample groups. The samples were divided into groups based on the CERAD, Braak or CDR 1047 

score. The comparison for FYN and PRKAR1A is shown. B Potential regulatory SNPs that may regulate 1048 

the expression. For FYN, the SNP rs61202914 not only resides in the TFBS within its promoter region but 1049 

also is an eQTL (upper); the SNP rs8080306 is located in the TFBS and also an eQTL for PRKAR1A. 1050 

1051 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 41 

 1052 

Tables and Figures 1053 

Table 1.  Hub genes (after excluding known AD-associated genes) measured with the outdegree and 1054 

indegree in AD-related transcriptional regulatory network (TRN) and with the degree in miRNA-based 1055 

regulatory networks (MRN). 1056 

Hub Gene Gene type Outdegree, 
indegree in 
AD-related 

TRN 

Degree in AD-
related MRN 

Association with AD 

RELA| NFKB3 oncogenic TF 45, 5 2 RELA is associated with 
learning and memory24, 25 

JUN|AP-1 oncogenic TF 38, 11 5 AP1 signaling pathway is 
responsible for Aβ-induced 

neuroinflammation26 

TP53|p53 TF, tumor 
suppressor 

gene 

24, 7 8 TP53 was overexpressed in 
AD and involved in tau 

phosphorylation66 

SIRT1 TF 14,3 8 SIRT1 is associated with the 
production of Aβ67 

CCND1 oncogene 1, 18 16 CCND1 knockout protects 
against neurodegeneration 

in hippocampus29. 

CDKN1A|P21 oncogene 0, 24 15 Increased expression28 

PTEN tumor 
suppressor 

gene 

0, 4 14 Recruitment of PTEN into 
synapses contributed to 

synaptic depression in AD 68, 
69 

 1057 

  1058 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 42 

 1059 

Fig. 1 Overview of the method for genome-wide prediction of AD-associated genes and their functional 1060 

characterization. A Selection of AD-associated genes. 147 AD-associated genes were compiled from 1061 

various resources, including AD-associated genes from OMIM, DisGeNet, Uniprot, DistiLD, AlzBase, 1062 

AlzBase, AlzGene, literature, Open Targets, ROSMAP-DEG and GWAS-catalog. The gene that was 1063 

present in at least two resources was selected. The AD-associated genes as well as potential positive 1064 

genes inferred with a functional enrichment method were then removed from the full set of all human genes. 1065 

The remaining genes were treated as non-AD genes (negatives). B Brain specific functional gene networks 1066 

(FGNs) were used for feature matrix construction. For each gene, its cofunction probabilities with the 147 1067 

positive genes in the network were extracted as features. Thus, each gene was characterized by a 147-1068 

dimensional vector. C Selection of brain FGNs. We compared the ten networks collected for their predictivity 1069 

of AD-associated genes with machine learning approaches. An optimal network was selected. D Validation. 1070 

Predicted AD-associated genes were validated by AD-related pathways and various gene networks, 1071 

including coexpression networks, protein-protein interaction networks, miRNA-target binding networks, 1072 

transcriptional regulatory networks. E Functional implication in AD. The associations of the top predicted 1073 

genes with AD-related phenotypes were evaluated. Gene modules from an AD-related network were 1074 

identified.  1075 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 43 

 1076 

Fig. 2 Model performance and statistical evaluation based on AD-related pathways and various gene 1077 

networks.  A Comparison of ExtraTrees models built from different functional gene networks in terms of 1078 

AUROC and AUPRC based on cross-validation.  B Enrichment of the genes ranked in the first decile in 1079 

the four AD-associated gene sets or pathways with the decile enrichment test (described in Methods). C 1080 

Validation of the top-ranked genes based on their sequence similarity, the number of shared miRNAs, the 1081 

number of AD-associated miRNAs they can bind to, the number of coexpressed gene pairs, the number 1082 

of interactions with AD-associated genes in HuRI and STRING. In all the subplots, the red vertical line 1083 

and the distribution in yellow indicate the results for our top-ranked genes and randomly selected genes, 1084 

respectively.  1085 

A B

Fig. 2 Model performance and statistical evaluation through AD pathways and various gene networks. A Comparison of 
ExtraTrees models built from different functional gene networks in terms of AUROC and AUPRC based on cross-
validation. B Enrichment of the genes ranked in the first decile in the four AD-associated gene sets or pathways with 
the decile enrichment test (described in Methods). C Validation of the top-ranked AD genes based on their sequence 
similarity, the number of shared miRNAs, coexpression, number of interacting with AD-associated genes in (Bioplex, 
HuRI and STRING). In all the sub-plots, the red vertical line and the distribution in yellow indicate the results for our 
top-ranked genes and randomly selected genes, respectively. 

C

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 44 

 1086 

Fig. 3 AD-related regulatory networks. A Transcriptional regulatory network including our compiled AD-1087 

associated genes and the top-ranked genes. B The interaction network between predicted genes and 1088 

AD-relevant miRNAs. C The interaction network between the compiled AD-associated genes and AD-1089 

relevant miRNAs. 1090 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 45 

 1091 

Fig. 4 Gene modules and their association with AD traits. The network was built by aggregating the 1092 

evidence from the protein-protein interaction network, coexpression network, miRNA-gene binding 1093 

network, transcriptional regulatory network and the brain FGN. This network contains the top-ranked 200 1094 

genes and the 147 compiled AD-associated genes. A Four gene modules, denoted by M1, M2, M3 and 1095 

M4, were identified by applying the GLay algorithm to the integrated network in Cytoscape. B The 1096 

association of M1 and M2 with the three AD-related phenotypes (the CERAD, Braak and CDR score) was 1097 

assessed. The results for all the tests were significant (FDR < 0.05).  1098 

Fig. 4 Gene modules and their association with AD traits. A Gene modules identified from the network 
integrated from the brain-specific functional gene network, protein-protein interaction network, coexpression
network, miRNA-gene binding network and transcriptional regulatory network. This network contains the top 
200 predicted genes and the 147 compiled AD genes. Then seven gene modules were identified by 
applying the GLay algorithm in Cytoscape. B The association with AD traits of modules.

learning or memory
regulation of ion transmembrane transport
cognition
regulation of synaptic plasticity
regulation of neuron death

regulation of lipid transport
cholesterol efflux
regulation of amyloid-beta formation

positive regulation of cytokine production
regulation of peptidyl-lysine acetylation
response to amyloid-beta

M2

M3

M4
regulation of phosphorylation
regulation of cell death
immune system process
regulation of neurogenesis
inflammatory response
gliogenesis

M1

A

B

●
●

●

●

●

●

●

●
●

●

●

●
●

●

●

● ●

●

●

●

●

●

●●
●

●●
●

●
●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

● ●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●
●

●
●

●

●

●●

●●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●●

●

●
●

●
●

●

●

●●

●

●

●
●

●

●

●●

●

●

●

●
●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

−0.2
−0.1
0.0
0.1

1 2 3 4
CERAD

E
ig
en
ge
ne

●
●

●

●

●

●

●

●
●

●

●

●
●

●

●

● ●

●

●

●

●

●

●●
●

●●
●

●
●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

● ●●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●
●
●
●

●

●

●●

●●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●●

●

●
●

●
●

●

●

● ●

●

●

●
●

●

●

●●

●

●

●

●
●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

−0.2
−0.1
0.0
0.1

0 2 4 6
Braak

E
ig
en
ge
ne

●
●

●

●

●

●

●

●
●

●

●

●
●

●

●

● ●

●

●

●

●

●

●●
●

●●
●

●
●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●
●

●
●

●

●

●●

●●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
● ●

●

●
●

●
●

●

●

●●

●

●

●
●

●

●

●●

●

●

●

●
●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

−0.2
−0.1
0.0
0.1

0 1 2 3 4 5
CDR

E
ig
en
ge
ne

●

●

●

●

● ●

●

●

●

●

●

●
●

● ●
●

●

● ●

●

●

● ●●
●

●

●

●

●

●

●
●

●

●

●

●
●

●
●

●

●
●
●

●

●
●
●●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●
●

●
●

●

●

●

●

●●

●

●

●
●
●

●

●

●

●

●

●

●

●

●

●

●
●

●●

●

●

● ●●

●

●

●
●

●

●

●
●

●●

●

●

●

●

●

● ●

●

●

●
●●

●

●●

●●

●

●

●

●

●

●

● ●
●

●
●

●

●

●

●

●

●

●

●

●● ●

●

●

●

●

●

●

●
●

●

●●

●

●
●●

●

●

●
●

●●
●

●
●●●

●

●

●

●

●

●
●

●

●

●
●

−0.1
0.0
0.1
0.2

1 2 3 4
CERAD

E
ig
en
ge
ne

●

●

●

●

●●

●

●

●

●

●

●
●

● ●
●

●

● ●

●

●

● ●●
●

●

●

●

●

●

●
●

●

●

●

●
●

●
●
●

●
●

●

●

●
●

●●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●
●

●
●

●

●

●

●

●●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●
●

●●

●

●

●●●

●

●

●
●

●

●

●
●

●●

●

●

●

●

●

● ●

●

●

●
●●

●

●●

● ●

●

●

●

●

●

●

● ●
●

●
●

●

●

●

●

●

●

●

●

●● ●

●

●

●

●

●

●

●
●

●

●●

●

●
●●

●

●

●
●

●●
●

●
●●●

●

●

●

●

●

●
●

●

●

●
●

−0.1
0.0
0.1
0.2

0 2 4 6
Braak

E
ig
en
ge
ne

●

●

●

●

● ●

●

●

●

●

●

●
●

● ●
●

●

●●

●

●

●●●
●

●

●

●

●

●

●
●

●

●

●

●
●

●
●

●

●
●

●

●

●
●

●●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●
●

●
●

●

●

●

●

● ●

●

●

●
●
●

●

●

●

●

●

●

●

●

●

●

●
●

●●

●

●

● ●●

●

●

●
●

●

●

●
●

●●

●

●

●

●

●

● ●

●

●

●
● ●

●

●●

● ●

●

●

●

●

●

●

●●
●

●
●

●

●

●

●

●

●

●

●

● ● ●

●

●

●

●

●

●

●
●

●

●●

●

●
●●

●

●

●
●

●●
●

●
●● ●

●

●

●

●

●

●
●

●

●

●
●

−0.1
0.0
0.1
0.2

0 1 2 3 4 5
CDR

E
ig
en
ge
ne

●●

●

●

●

●

●

●

●
●

●●

●

●
●

●

●

●

●

●

●

●

●●

●

●●●

●
●

●

●

●
●

●

● ●

●

●

●

●

●
●

●

●

●●
●

●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

●

●●

●●

●

●
●

●

●

●●
●

●

●
●

●

●●

● ●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

●●

●

●●
●
●

●

●
●

●

●

●

●

●
●

●

●●

●

●

●
●

●

●

●

●●
●

●

●

●

●
●

●

●
●

●
●

●●

●

●●

●

●

●●
●●

●

●
●

●
● ●

●

●

●

●

●

●
●

●
●●

−0.2
−0.1
0.0
0.1

1 2 3 4
CERAD

E
ig
en
ge
ne

●●

●

●

●

●

●

●

●
●

● ●

●

●
●

●

●

●

●

●

●

●

●●

●

●● ●

●
●

●

●

●
●

●

● ●

●

●

●

●

●
●

●

●

● ●
●

●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

●

● ●

● ●

●

●
●

●

●

●●
●

●

●
●

●

●●

●●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●●

●

●

●

●

●●

●

●●
●

●

●

●
●

●

●

●

●

●
●

●

●●

●

●

●
●

●

●

●

●●
●

●

●

●

●
●

●

●
●

●
●

●●

●

●●

●

●

●●
●●

●

●
●

●
●●

●

●

●

●

●

●
●

●
● ●

−0.2
−0.1
0.0
0.1

0 2 4 6
Braak

E
ig
en
ge
ne

●●

●

●

●

●

●

●

●
●

● ●

●

●
●

●

●

●

●

●

●

●

●●

●

●● ●

●
●

●

●

●
●

●

● ●

●

●

●

●

●
●

●

●

●●
●

●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

●

●●

● ●

●

●
●

●

●

●●
●

●

●
●

●

●●

● ●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

● ●

●

●●
●

●

●

●
●

●

●

●

●

●
●

●

●●

●

●

●
●

●

●

●

● ●
●

●

●

●

●
●

●

●
●

●
●

●●

●

●●

●

●

●●
●●

●

●
●

●
●●

●

●

●

●

●

●
●

●
● ●

−0.2
−0.1
0.0
0.1

0 1 2 3 4 5
CDR

E
ig
en
ge
ne

●
●

●

●

●

●

●
●

●
●

●
●

●
●

●

●
●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●
●

●

●
●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●
●

●

●

●

●
●

●

●

●

●

●
●

●
●

●

●

●
●

●

●

●
●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●
●

●
●

●

●●

●
●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●
●
●

●

●

●

● ●

●●

●

●

●

●

●

●

●

●

●● ●
●

●

●●

−0.1
0.0
0.1

1 2 3 4
CERAD

E
ig
en
ge
ne

●
●

●

●

●

●

●
●

●
●

●
●

●
●

●

●
●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●
●

●

●
●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●
●

●

●

●

●
●

●

●

●

●

●
●

●
●

●

●

●
●

●

●

●
●
●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●
●

●
●

●

●●

●
●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●
●
●

●

●

●

● ●

●●

●

●

●

●

●

●

●

●

● ● ●
●

●

● ●

−0.1
0.0
0.1

0 2 4 6
Braak

E
ig
en
ge
ne

●
●

●

●

●

●

●
●

●
●

●
●

●
●

●

●
●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●
●

●

●
●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●
●

●

●

●

●
●

●

●

●

●

●
●

●
●

●

●

●
●

●

●

●
●
●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●
●

●
●

●

●●

●
●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●
●
●

●

●

●

● ●

●●

●

●

●

●

●

●

●

●

● ●●
●

●

● ●

−0.1
0.0
0.1

0 1 2 3 4 5
CDR

E
ig
en
ge
ne

M1

●
●

●

●

●

●

●

●
●

●

●

●
●

●

●

● ●

●

●

●

●

●

●●
●

●●
●

●
●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

● ●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●
●

●
●

●

●

●●

●●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●●

●

●
●

●
●

●

●

●●

●

●

●
●

●

●

●●

●

●

●

●
●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

−0.2
−0.1
0.0
0.1

1 2 3 4
CERAD

E
ig
en
ge
ne

●
●

●

●

●

●

●

●
●

●

●

●
●

●

●

● ●

●

●

●

●

●

●●
●

●●
●

●
●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

● ●●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●
●
●
●

●

●

●●

●●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●●

●

●
●

●
●

●

●

● ●

●

●

●
●

●

●

●●

●

●

●

●
●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

−0.2
−0.1
0.0
0.1

0 2 4 6
Braak

E
ig
en
ge
ne

●
●

●

●

●

●

●

●
●

●

●

●
●

●

●

● ●

●

●

●

●

●

●●
●

●●
●

●
●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●
●

●
●

●

●

●●

●●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
● ●

●

●
●

●
●

●

●

●●

●

●

●
●

●

●

●●

●

●

●

●
●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

−0.2
−0.1
0.0
0.1

0 1 2 3 4 5
CDR

E
ig
en
ge
ne

●

●

●

●

● ●

●

●

●

●

●

●
●

● ●
●

●

● ●

●

●

● ●●
●

●

●

●

●

●

●
●

●

●

●

●
●

●
●

●

●
●
●

●

●
●
●●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●
●

●
●

●

●

●

●

●●

●

●

●
●
●

●

●

●

●

●

●

●

●

●

●

●
●

●●

●

●

● ●●

●

●

●
●

●

●

●
●

●●

●

●

●

●

●

● ●

●

●

●
●●

●

●●

●●

●

●

●

●

●

●

● ●
●

●
●

●

●

●

●

●

●

●

●

●● ●

●

●

●

●

●

●

●
●

●

●●

●

●
●●

●

●

●
●

●●
●

●
●●●

●

●

●

●

●

●
●

●

●

●
●

−0.1
0.0
0.1
0.2

1 2 3 4
CERAD

E
ig
en
ge
ne

●

●

●

●

●●

●

●

●

●

●

●
●

● ●
●

●

● ●

●

●

● ●●
●

●

●

●

●

●

●
●

●

●

●

●
●

●
●
●

●
●

●

●

●
●

●●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●
●

●
●

●

●

●

●

●●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●
●

●●

●

●

●●●

●

●

●
●

●

●

●
●

●●

●

●

●

●

●

● ●

●

●

●
●●

●

●●

● ●

●

●

●

●

●

●

● ●
●

●
●

●

●

●

●

●

●

●

●

●● ●

●

●

●

●

●

●

●
●

●

●●

●

●
●●

●

●

●
●

●●
●

●
●●●

●

●

●

●

●

●
●

●

●

●
●

−0.1
0.0
0.1
0.2

0 2 4 6
Braak

E
ig
en
ge
ne

●

●

●

●

● ●

●

●

●

●

●

●
●

● ●
●

●

●●

●

●

●●●
●

●

●

●

●

●

●
●

●

●

●

●
●

●
●

●

●
●

●

●

●
●

●●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●
●

●
●

●

●

●

●

● ●

●

●

●
●
●

●

●

●

●

●

●

●

●

●

●

●
●

●●

●

●

● ●●

●

●

●
●

●

●

●
●

●●

●

●

●

●

●

● ●

●

●

●
● ●

●

●●

● ●

●

●

●

●

●

●

●●
●

●
●

●

●

●

●

●

●

●

●

● ● ●

●

●

●

●

●

●

●
●

●

●●

●

●
●●

●

●

●
●

●●
●

●
●● ●

●

●

●

●

●

●
●

●

●

●
●

−0.1
0.0
0.1
0.2

0 1 2 3 4 5
CDR

E
ig
en
ge
ne

●●

●

●

●

●

●

●

●
●

●●

●

●
●

●

●

●

●

●

●

●

●●

●

●●●

●
●

●

●

●
●

●

● ●

●

●

●

●

●
●

●

●

●●
●

●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

●

●●

●●

●

●
●

●

●

●●
●

●

●
●

●

●●

● ●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

●●

●

●●
●
●

●

●
●

●

●

●

●

●
●

●

●●

●

●

●
●

●

●

●

●●
●

●

●

●

●
●

●

●
●

●
●

●●

●

●●

●

●

●●
●●

●

●
●

●
● ●

●

●

●

●

●

●
●

●
●●

−0.2
−0.1
0.0
0.1

1 2 3 4
CERAD

E
ig
en
ge
ne

●●

●

●

●

●

●

●

●
●

● ●

●

●
●

●

●

●

●

●

●

●

●●

●

●● ●

●
●

●

●

●
●

●

● ●

●

●

●

●

●
●

●

●

● ●
●

●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

●

● ●

● ●

●

●
●

●

●

●●
●

●

●
●

●

●●

●●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●●

●

●

●

●

●●

●

●●
●

●

●

●
●

●

●

●

●

●
●

●

●●

●

●

●
●

●

●

●

●●
●

●

●

●

●
●

●

●
●

●
●

●●

●

●●

●

●

●●
●●

●

●
●

●
●●

●

●

●

●

●

●
●

●
● ●

−0.2
−0.1
0.0
0.1

0 2 4 6
Braak

E
ig
en
ge
ne

●●

●

●

●

●

●

●

●
●

● ●

●

●
●

●

●

●

●

●

●

●

●●

●

●● ●

●
●

●

●

●
●

●

● ●

●

●

●

●

●
●

●

●

●●
●

●

●

●

●

●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●
●

●

●●

● ●

●

●
●

●

●

●●
●

●

●
●

●

●●

● ●
●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

● ●

●

●●
●

●

●

●
●

●

●

●

●

●
●

●

●●

●

●

●
●

●

●

●

● ●
●

●

●

●

●
●

●

●
●

●
●

●●

●

●●

●

●

●●
●●

●

●
●

●
●●

●

●

●

●

●

●
●

●
● ●

−0.2
−0.1
0.0
0.1

0 1 2 3 4 5
CDR

E
ig
en
ge
ne

●
●

●

●

●

●

●
●

●
●

●
●

●
●

●

●
●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●
●

●

●
●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●
●

●

●

●

●
●

●

●

●

●

●
●

●
●

●

●

●
●

●

●

●
●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●
●

●
●

●

●●

●
●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●
●
●

●

●

●

● ●

●●

●

●

●

●

●

●

●

●

●● ●
●

●

●●

−0.1
0.0
0.1

1 2 3 4
CERAD

E
ig
en
ge
ne

●
●

●

●

●

●

●
●

●
●

●
●

●
●

●

●
●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●
●

●

●
●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●
●

●

●

●

●
●

●

●

●

●

●
●

●
●

●

●

●
●

●

●

●
●
●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●
●

●
●

●

●●

●
●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●
●
●

●

●

●

● ●

●●

●

●

●

●

●

●

●

●

● ● ●
●

●

● ●

−0.1
0.0
0.1

0 2 4 6
Braak

E
ig
en
ge
ne

●
●

●

●

●

●

●
●

●
●

●
●

●
●

●

●
●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●
●

●

●
●

●

●

●

●

●

●
●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●
●

●

●

●

●
●

●

●

●

●

●
●

●
●

●

●

●
●

●

●

●
●
●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●
●

●
●

●

●●

●
●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●
●
●

●

●

●

● ●

●●

●

●

●

●

●

●

●

●

● ●●
●

●

● ●

−0.1
0.0
0.1

0 1 2 3 4 5
CDR

E
ig
en
ge
ne

M2

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 46 

 1099 

 1100 
Fig. 5.  Visualization of functional evidence supporting the association of the top-ranked 200 1101 
genes with AD. The seven circles show the strength of the seven types of evidence, including 1102 
the three molecular interaction evidence (the number of interacting AD-associated genes in PPI, 1103 
coexpression network and miRNA-target binding network, respectively) and the four phenotypic 1104 
correlation evidence (the Pearson correlation with CERAD, Braak and CDR on the MSBB 1105 
dataset, and the log2-transformed fold change of expression obtained from the ROSMAP study). 1106 
The darker the purple color is, the stronger the functional association is. The section 1107 
corresponding to the blue arc shows the enriched GO biological process terms, where each 1108 
curve points the gene annotated to the term.  1109 
 1110 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/


 47 

 1111 

 1112 

 1113 

Fig. 6 Illustration of the association of the top-ranked individual genes with AD-related phenotypes and 1114 

the potential regulatory variant of the gene. A Comparison of the expression of individual genes in 1115 

different sample groups. The samples were divided into groups based on the CERAD, Braak or CDR 1116 

score. The expression for FYN and PRKAR1A is shown. B Potential regulatory SNPs that may regulate 1117 

the expression. For FYN, the SNP rs61202914 not only resides in the TFBS within its promoter region but 1118 

also is an eQTL (upper); the SNP rs8080306 is located in the TFBS and also an eQTL for PRKAR1A. 1119 

 1120 

 1121 

 1122 

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 10, 2021. ; https://doi.org/10.1101/2021.02.09.430536doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.09.430536
http://creativecommons.org/licenses/by-nc/4.0/