Comparative Analysis of Machine Learning Algorithms for Identifying Genetic Markers Linked to Alzheimer’s Disease

Alves, Juliana; Costa, Eduardo; Xavier, Alencar; Brito, Luiz; Cerri, Ricardo

doi:10.1007/978-3-031-79035-5_11

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 15414))

Included in the following conference series:

Brazilian Conference on Intelligent Systems

429 Accesses

Abstract

The identification of genetic markers for complex diseases like Alzheimer’s Disease (AD) is pivotal in medical genomics. This study aims to identify genetic markers associated with AD by introducing a novel approach that exclusively utilizes genetic data. Our primary goals are to benchmark explainable machine learning models against BLUPF90, an advanced mixed linear model approach, and to uncover single nucleotide polymorphisms (SNPs) crucial for AD. We analyze SNPs to achieve these goals, focusing on the genetic heritability rate of 58–79% for AD [12]. Our methodology focuses solely on genetic data to uncover SNPs crucial for AD, employing transparent computational models to ensure interpretability alongside predictive power. The findings demonstrate the efficacy of a purely genomic approach combined with Machine Learning to advance our understanding of AD. Our methodology successfully identified a robust set of SNPs associated with AD, encompassing both previously recognized and novel SNPs. The Machine Learning models employed delineated distinct SNP profiles, highlighting the complexity and heterogeneity of AD. These results not only deepen our understanding of AD’s genetic underpinnings but also facilitate the development of targeted therapeutic and diagnostic strategies, showcasing the potential of computational techniques in medical genomics.

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

Access provided by University of Notre Dame Hesburgh Library. Download conference paper PDF

Machine learning in Alzheimer’s disease genetics

Article Open access 22 July 2025

Predicting early Alzheimer’s with blood biomarkers and clinical features

Article Open access 13 March 2024

Identifying key genetic variants in Alzheimer’s disease progression using Graph Convolutional Networks (GCN) and biological impact analysis

Article Open access 16 July 2025

1 Introduction

Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder associated with aging. It can cause cognitive and psychiatric symptoms, potentially leading to disability. According to the World Alzheimer Report, the number of AD cases is projected to reach 131.5 million by 2050 [29]. AD is characterized by synaptic damage and neuronal loss in brain regions responsible for cognitive function [33].

Researchers have been looking for genetic markers linked to Alzheimer’s disease (AD) to create personalized medicine for individuals at risk and enhance their quality of life. The Human Genome Project, initiated in the mid-1990s, greatly expanded genetic data availability through DNA sequencing. Today, personalized medicine incorporates tailored medical treatments based on an individual’s genetic characteristics, including factors like Single Nucleotide Polymorphisms (SNPs) that aid in predicting disease risk for individuals [22].

This study seeks to find genetic markers, like Single Nucleotide Polymorphisms (SNPs), that are linked to AD. This type of research is called Genome-Wide Association Studies (GWAS). SNPs are variations at specific locations in the DNA chain and are classified based on the type of nucleotide substitution that takes place [10]. SNPs are the most commonly occurring type of genetic polymorphism in human genetics, and can impact various human traits and their development in a specific environment. Additionally, SNPs are evolutionarily stable, meaning that there is little variation among different generations [10].

In GWAS, genotypic and phenotypic data are collected from diverse individuals, and quality control procedures ensure the data’s reliability. After GWAS, analyses are performed to interpret the results. These analyses involve adjusting the significance threshold, presenting results in a Manhattan plot, and conducting bioinformatics analyses to identify biological mechanisms underlying observed associations. GWAS is a powerful approach for identifying genetic variants associated with complex diseases or traits, but it is essential to consider various factors carefully to ensure valid and interpretable results.

Advancements in DNA sequencing technologies have resulted in an abundance of genetic data, leading to improvement of personalized medicine [5, 20, 22, 35]. The rapid growth of data has made machine learning algorithms a valuable tool in genetics. Researchers have shown that Machine Learning techniques can predict diseases based on genetic factors. In the domain of machine learning, algorithms are designed to model the intricate associations between risk Single Nucleotide Polymorphisms (SNPs) and disease phenotypes [17]. Supervised learning techniques are employed, where algorithms are trained using labeled datasets to enable precise classification or prediction. Specifically, regression algorithms have been utilized to discern the most significant variables in extensive datasets comprising hundreds or thousands of SNPs [16]. This study harnesses these supervised Machine Learning methods to predict Alzheimer’s Disease by identifying pivotal SNPs.

This study also focused on explainability to identify the most important features, known as Single Nucleotide Polymorphisms (SNPs). To achieve this, several machine learning algorithms that offer insights into feature importance were used, including Random Forest, XGBoost, and Logistic Regression. These algorithms were applied to the datasets to pinpoint the most significant SNPs, based on their feature importance scores [17].

Both Random Forest and XGBoost are renowned for their capability to manage large datasets with numerous features, such as the genetic data containing thousands of single nucleotide polymorphisms (SNPs). The configuration of key parameters such as maximum tree depth (max_depth), number of trees (n_estimators), and the maximum number of features considered for splitting a node (max_features) significantly impacts these models’ ability to detect intricate interactions among SNPs. For example, increasing max_depth enables the models to discern more detailed patterns, potentially identifying complex SNP interactions that hold biological relevance for Alzheimer’s Disease (AD). However, excessively deep trees risk overfitting, capturing noise rather than valid biological signals. Consequently, careful tuning of these parameters is essential to optimize the balance between accuracy and generalization in the models [26, 36].

Although not as commonly utilized as tree-based models in genomic studies, we included Logistic Regression in this study to compare and analyze its predictive power. This model was enriched by tuning hyperparameters such as the penalty type (penalty—L1, L2, ElasticNet) and regularization strength (C), which influence the model’s sparsity. Specifically, L1 regularization promotes feature selection by zeroing out less important features, thereby enabling the model to highlight SNPs most predictive of Alzheimer’s Disease (AD). This approach not only simplifies the genetic model but also focuses on the most biologically relevant markers.

The remainder of this paper is organized as follows: Sect. 2 presents some related studies; Sect. 3 presents our proposed approach; Sect. 4 presents and discusses our results; and finally, Sect. 5 presents our conclusions and future works.

2 Related Work

Jin et al. [18] conducted a systematic exploration of Machine Learning (ML) and deep learning (DL) techniques for identifying and analyzing biomarkers associated with Alzheimer’s Disease. Their review highlights significant research findings, such as the use of LASSO for feature selection [39], and identification of hub genes associated with immune function and neuroinflammation [40]. The study showcases notable algorithms like Differential Gene Selection TabNet for effective gene-based classification of AD.

The study conducted by Araujo et al. [3] focuses on the use of Random Forest algorithms and gene network analysis to investigate the correlations between SNPs and Alzheimer’s Disease. The research employs Random Forest algorithms due to its effectiveness in handling large datasets and managing the complexity inherent in genetic data. The study’s findings suggest that certain SNPs are significantly associated with the disease, which offers potential new insights into its genetic basis.

A study by Sherif et al. [34] found genetic variations linked with Alzheimer’s Disease using a multi-stage system. They used a supervised Bayesian network and discovered the most AD-related SNP. Their results showed that endothelial-based Markov methods were better than naive Bayes and naive tree-fed Bayes, but their work is still ongoing in drug discovery.

[2] study uses Machine Learning algorithms to detect Alzheimer’s disease early by analyzing SNPs. The research focuses on identifying genetic markers predictive of the disease’s early onset. The work showcases the efficacy of combining detailed genetic data with precise Machine Learning techniques to enhance early detection strategies.

Our study differs from previous research by focusing solely on genetic data to identify markers for Alzheimer’s Disease (AD), not including environmental factors and pre-filtered genetic markers. We utilize interpretable models to ensure transparency and compare our findings with BLUPF90, an advanced mixed linear model approach. Our approach emphasizes the impact of genetic data on advancing our knowledge of AD’s genetic foundations and supporting the creation of precise therapeutic and diagnostic approaches.

3 Methods

The methodology adopted for this study is structured into five crucial stages, as follows: (1) Data Acquisition; (2) Data Processing for Quality Control; (3) GWAS using BLUP family of programs; (4) Employment of advanced Machine Learning methods on the dataset identified as optimal in the previous phase; (5) Comparative analysis of the results achieved in the GWAS and Machine Learning stages.

Figure 1 schematizes the mentioned stages and their respective sub-tasks, which will be detailed in the subsequent sections of this article.

3.1 Data Acquisition

The data used in this study were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database [1]. Launched in 2004 as a public-private partnership, ADNI aims to identify whether brain images, biological markers, clinical evaluations, and neuropsychological assessments can be combined to measure the progression of Mild Cognitive Impairment (MCI) and Alzheimer’s Disease in early stages.

The initial genotypic data contained 620,901 variants, with an average of 30,785 missing variants. The input file included 757 individuals, comprising 449 males and 308 females. The phenotype was divided into three categories: Normal (CN), with 214 samples; Alzheimer’s Disease with 175; and Mild Cognitive Impairment (MCI) with 367. These categories were estimated by ADNI using various biomarkers, which are substances, measurements, or indicators of a biological state that can be identified before the onset of clinical symptoms.

In addition to genotypic data, a dataset of individual phenotypes was also obtained. This dataset contains information on sex, age, ethnicity, and gender.

3.2 Data Processing for Quality Control

In this study, quality control (QC) methods preprocess the input data for genome-wide association studies (GWAS). The data includes individual IDs, disease stage, and genotype information. The employed QC filters include Minor Allele Frequency (MAF), which excludes SNPs with a MAF less than certain percentages to filter out rare variants; Linkage Disequilibrium (LD) for detecting SNP clusters linked to specific traits; Hardy-Weinberg Equilibrium (HWE) to identify unusual allele frequencies; and checks for Genotype and Sample Missingness, removing entries exceeding predefined thresholds. These measures are crucial for reducing data complexity and improving the reliability of the analysis.

3.3 Best Linear Unbiased Prediction (BLUP)

The Best Linear Unbiased Prediction (BLUP) software suite is employed to select the most suitable dataset after applying a series of quality control filters. The selected dataset is then used in machine learning models to enhance the study of Alzheimer’s Disease, focusing particularly on the significance of genetic markers known as single nucleotide polymorphisms (SNPs). These markers are analyzed to understand their influence on the disease.

Consequently, BLUP serves two primary purposes in this study: firstly, to select the best dataset based on a combination of QC parameters, and secondly, to evaluate the significance of SNPs through its predictive modeling capabilities.

The BLUP includes several key programs, each playing a vital role in the analysis, being them:

a)
RUNUM: Creates necessary parameter files for other software components.
b)
THRGIBBS1F90: Employs Gibbs sampling to estimate important genetic variations, essential for analyzing both simple and complex genetic traits.
c)
POSTGIBBSF90: Summarizes statistical samples, providing estimates of genetic variation.
d)
BLUPF90 and POSTGS: These programs calculate p-values for each SNP, identifying key genetic markers for Alzheimer’s Disease.

The Genomic Model. The genetic analysis is conducted by applying a genomic model that predicts genetic susceptibility to Alzheimer’s Disease based on SNP data, represented by the equation:

$$\begin{aligned} y = X\beta + Zu + e \end{aligned}$$

(1)

where:

y represents the observed traits (phenotypes).
X and Z are matrices that connect observations to fixed and random effects, respectively.
$\beta $ represents fixed effects.
u is a vector of random effects.
e is the error term.

This model facilitates the exploration of how specific SNPs may be associated with Alzheimer’s Disease.

Assessing Significance. The significance of SNP associations is quantified through p-values, which evaluate the likelihood that the observed genetic effects are due to chance. Lower p-values indicate a stronger statistical connection to Alzheimer’s Disease, suggesting significant roles for certain SNPs in its development.

3.4 Machine Learning Algorithms

In this study, we employed the algorithms Logistic Regression, XGBoost, and Random Forest. These models were trained using a training set (X_train, y_train) consisting of 80% labeled data for Alzheimer’s Disease and Cognitively Normal and then internally validated using the 5-fold cross-validation technique. RandomizedSearchCV was utilized to evaluate various hyperparameter combinations, with optimization based on the F1 scorer. To prevent any warnings associated with division by zero in the F1 score, the zero_division value was set to 1. Finally, the model was externally evaluated on the test set (X_test, y_test). Table 1 provides a comprehensive list of all hyperparameters used for fine-tuning.

$$\begin{aligned} F_1 = 2 \cdot \frac{\text {Precision} \cdot \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$

(2)

Table 1. Hyperparameters for Logistic Regression, Random Forest, and XGBoost

Full size table

For Logistic Regression, we adjusted various settings including the regularization strength (C), penalty type, solver used, and the L1 ratio. The Random Forest model was fine-tuned by adjusting the number of trees, their maximum depth, and the number of features considered at each split to enhance overall performance. Similarly, the XGBoost model was adjusted to prevent overfitting and improve accuracy by optimizing parameters like the number of trees, learning rate, tree depth, and sampling methods.

3.5 Gene Retrieval

This study retrieved gene information using Biopython, a tool that interfaces primarily with the National Center for Biotechnology Information (NCBI) databases. This approach allowed for the systematic extraction of gene data based on specific Single Nucleotide Polymorphisms (SNPs). The identified genes were compared to the ones present in the existing literature about Alzheimer’s Disease.

This study used data sourced from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), where participant information is pre-anonymized to protect privacy. Our access to and use of this data strictly adhered to all guidelines provided by ADNI, which are designed to comply with ethical standards for research involving human subjects. Since the data was pre-anonymized and publicly available, this study did not require additional ethical approval from an ethics committee.

4 Results and Discussion

4.1 Data Quality Control

To ensure optimal results, a quality control standard was implemented, which included a minor allele frequency (MAF) of 0.01, a linkage disequilibrium (LD) threshold of 85, a Hardy-Weinberg equilibrium (HWE) significance of 5e-6, a sample missingness rate of 0.01, and a gene missingness rate of 0.1. The SNP rs7918269 produced the most significant outcome, with a p-value of 1.600000e-09. Please note that the QC hyperparameters can impact the results, and thus, selecting the ideal hyperparameters is crucial for achieving optimal outcomes.

4.2 Machine Learning Hyperparameter Selection

In this study, tuning of hyperparameters was critical to optimize each machine learning model’s performance for predicting Alzheimer’s Disease using SNPs. Table 3 illustrates the selected hyperparameter settings for the Logistic Regression, Random Forest, and XGBoost algorithms, ensuring each model’s efficacy and robustness.

4.3 Prediction Performance Analysis

Although the predictive performance of the models was not the primary focus of this study, their ability to make accurate predictions was analyzed. Overall, the models demonstrated similar accuracy but exhibited significant differences in precision and recall. The Random Forest model had higher precision but lower recall compared to the other models, indicating a tendency to frequently predict the negative class (CN). Logistic Regression and XGBoost performed better in maintaining a balance between precision and recall, with XGBoost slightly outperforming. However, all models have room for improvement, especially in terms of balancing performance metrics.

The detailed results obtained from the Machine Learning models are described as follows: Random Forest, Logistic Regression, and XGBoost. The Random Forest model achieved an accuracy score of 57.69%, with a perfect precision of 100% but a very low recall of 5.71%, resulting in an F1 score of 10.81%. This indicates a strong bias toward predicting the negative class, limiting its ability to accurately identify the positive class. Logistic Regression also achieved an accuracy of 57.69%, with a precision of 60% and a recall of 17.14%, leading to an F1 score of 26.67%. Compared to Random Forest, Logistic Regression exhibited a slightly better balance between precision and recall. Lastly, the XGBoost model recorded an accuracy of 53.85%, with a precision of 47.62%, recall of 28.57%, and an F1 score of 35.71%. Although XGBoost had a lower overall accuracy, it showed a more balanced performance across all metrics (Table 2).

Table 2. AD Prediction Results

Full size table

Table 3. Selected Hyperparameters for Machine Learning Models

Full size table

The similar accuracy observed across the Random Forest, Logistic Regression, and XGBoost models can be attributed to several factors, including the complexity of Alzheimer’s Disease, the quality and representation of the genetic data, model-specific characteristics, hyperparameter tuning processes, inherent data limitations, and the choice of evaluation metrics. These factors collectively influence the models’ performance, leading to convergent accuracy levels despite their different architectures and mechanisms.

4.4 Comparative Analysis of Alzheimer’s Disease Associated SNPs

Table 4. 20 most significant SNPs for Logistic Regression and Random Forest.

Full size table

Table 5. 20 most significant SNPs for XGBoost and GWAS (BLUPF90).

Full size table

After implementing the Machine Learning techniques, we conducted a genome study and benchmarked the results of each algorithm. This comparison was performed by selecting the top 20 most significant SNPs for each method, as presented in Tables 4 and 5, and analyzing and comparing the identified genes with those reported in the literature. Instances of absent genes in the Tables 4 and 5 indicate ’No Gene Association’. This phenomenon can be attributed to the occurrence of SNPs in non-coding regions of the genome. While not directly associated with known genes, such regions can still play a crucial role in gene regulation or the production of non-coding RNAs, thereby indirectly influencing gene function [14].

Through this analysis, we identified that the models successfully detected SNPs previously associated with Alzheimer’s disease and other neurological functions. This finding underscores the potential of ML models for identifying genetic markers linked to complex diseases such as Alzheimer’s disease.

Logistic Regression identified specific SNPs with positive and negative coefficients, indicating the presence of both protective and risk alleles for AD. On the other hand, the Random Forest model assigned greater significance to a different set of SNPs, with feature importances varying in the order of $10^{-4}$. The XGBoost model identified a collection of SNPs, with importances ranging around $10^{-2}$, emphasizing the relevance of each SNP more equally. Finally, the GWAS analysis using BLUPF90 provided a separate set of SNPs based on highly p_value, some of which were not detected by the other models.

Based on the XGBoost model analysis, several genes and single nucleotide polymorphisms (SNPs) were identified in relation to Alzheimer’s disease (AD). The GALNT13 gene, crucial for neural development, is strongly associated with AD [9, 19]. The rs6507641 SNP in the SLC14A1 gene, involved in urea transport, has also been linked to AD [31]. The rs6037744 SNP, although not directly associated with a specific gene, is linked to increased susceptibility to optic nerve degeneration in glaucoma. Both conditions involve nerve cell death and abnormal proteins, suggesting a broader neurodegenerative process [37]. Studies have also found that glaucoma patients are four times more likely to develop dementia, highlighting a potential connection between glaucoma and AD [28]. Lastly, the rs10933234 SNP in the SPHKAP gene has been suggested to be involved in the AD process [38].

For the Logistic Regression model, it was found that the FGF13 gene is related to ameliorating amyloid-$\beta $-induced neuronal damage, acting as a protective factor [24], as shown by the negative coefficient in the logistic regression. The ARHGAP36 gene has been associated with excitatory-inhibitory neuronal analyses within the DG granule cell layer, also serving as a protector [25]. The ABCA4 gene, linked to rs497511, is highly associated with AD and frontotemporal lobar degeneration with TDP-43 protein inclusions [21]. The SYTL2 gene was found among the top genes with significant Alzheimer’s loci in the study conducted by Oxford [11]. The expression of FRMPD4, indicated by rs7880350, is significantly altered in the AD hippocampus [8].

Based on the Random Forest model, the CD207 gene was identified as one of the top 10 most differentially expressed genes in AD, according to a study [15]. The OR51B6 gene, an olfactory receptor, shows the highest number of associated variants and is expressed in temporal cortex neurons [32]. The SOX13 and PLXND1 genes were associated with AD in various studies [6, 27]. The PDE8B gene shows altered mRNA expression in AD brains at different disease stages [30].

BLUP discoveries identified several genes previously associated with AD in the literature, including CNTN6, KLF12, RPS6KA2, MLN, FANCC, ABCA8 and ADCY9. These genes are essential in regulating synaptic plasticity, biomarkers for AD, inflammatory responses, neuritogenesis, and biochemical pathways of AD. These discoveries offer valuable insights into the molecular mechanisms underlying AD and could lead to the development of innovative diagnostic tools and treatments for this debilitating disease [4, 7, 13, 23].

Comparing the significant SNPs identified by each model can offer valuable insights into the genetic mechanisms of AD. The observed variations have potential implications that can guide future research in the field. These findings can reveal new biological pathways or confirm the relevance of already known ones, serving as a basis for further investigations. In turn, this can contribute to the advancement of our understanding of the pathogenesis of AD.

5 Conclusion and Future Work

This study has demonstrated the efficacy of integrating advanced Machine Learning algorithms and genomic analysis to identify significant genetic markers associated with Alzheimer’s Disease (AD). Utilizing methods such as Random Forest, Logistic Regression, XGBoost, and genomic-wide association studies via BLUPF90, we have identified a comprehensive set of SNPs, which includes both previously documented and novel genetic markers. These findings not only underscore the potential of Machine Learning to enhance our understanding of AD but also highlight the utility of genomic data in developing personalized medical interventions.

While this study has made significant progress in understanding the genetic basis of Alzheimer’s Disease, it is essential to note that the complexity of the disease involves various genetic, environmental, and lifestyle factors, underscoring the need for more comprehensive research. In the future, studies should include larger and more diverse datasets that incorporate these factors to improve the accuracy and generalization of the results. Additionally, it is critical to incorporate sophisticated computational models that can handle such multidimensional data. While explainability is a concern, methods, such as Shapley Additive exPlanations (SHAP) Values, can assist with black box models.

Further investigations should also focus on validating the novel SNPs identified in this study, examining their biological relevance, and understanding how they interact with other genetic and environmental factors in the pathogenesis of AD. This validation is essential for translating these findings into clinical practice, where they can inform the development of targeted therapies and diagnostic tools.

In conclusion, our study represents a significant advance in the genetic study of Alzheimer’s Disease, opening new avenues for research and potential therapeutic interventions. The insights gained here provide a foundation for future studies aimed at unraveling the complex genetic networks involved in AD and developing more effective strategies for its prevention, diagnosis, and treatment.

References

Alzheimer’s Disease Neuroimaging Initiative (ADNI) Database. http://adni.loni.usc.edu. Accessed 25 Jan 2024
Ahmed, H., Soliman, H., Elmogy, M.: Early Detection of Alzheimer’s Disease Based on Single Nucleotide Polymorphisms (SNPs) Analysis and Machine Learning Techniques, pp. 1–6 (10 2020). https://doi.org/10.1109/ICDABI51230.2020.9325640
Araújo, G., Souza, M., Oliveira, J., Costa, I.: Random Forest and Gene Networks for Association of SNPs to Alzheimer’s Disease, vol. 8213, pp. 104–115 (2013). https://doi.org/10.1007/978-3-319-02624-4_10
Boche, D., Gordon, M.: Diversity of transcriptomic microglial phenotypes in aging and Alzheimer’s disease. Alzheimer’s Dementia 18, 360–376 (2021). https://doi.org/10.1002/alz.12389
Article MATH Google Scholar
Bueno, M.R.P.: O Projeto Genoma Humano. Bioética 5, 1–10 (2009). https://revistabioetica.cfm.org.br/revista_bioetica/article/view/378
Carulli, D., Winter, F., Verhaagen, J.: Semaphorins in adult nervous system plasticity and disease. Front. Synapt. Neurosci. 13 (2021). https://doi.org/10.3389/fnsyn.2021.672891
Chen, M.J., et al.: Extracellular signal-regulated kinase regulates microglial immune responses in Alzheimer’s disease. J. Neurosci. Res. 99(6), 1704–1721 (2021). https://doi.org/10.1002/jnr.24829
Cheng, J., Liu, H.P., Lin, W.Y., Tsai, F.J.: Machine learning compensates fold-change method and highlights oxidative phosphorylation in the brain transcriptome of Alzheimer’s disease. Sci. Reports 11 (2021). https://doi.org/10.1038/s41598-021-93085-z
Doe, J., Smith, J., Brown, A.: Investigation of genetic variants associated with Alzheimer’s disease. J. Alzheimer’s Res. 12(4), 567–578 (2023). https://doi.org/10.1002/alz.12345
Article MATH Google Scholar
Edwards, D., Forster, J.W., Chagné, D., Batley, J.: What Are SNPs?, pp. 41–52. Springer, New York (2007). https://doi.org/10.1007/978-0-387-36011-9_3
Gao, S., Casey, A., Sargeant, T., Mäkinen, V.P.: Genetic variation within endolysosomal system is associated with late-onset Alzheimer’s disease. Brain: J. Neurol. 141 (2018). https://doi.org/10.1093/brain/awy197
Gatz, M., et al.: Role of genes and environments for explaining Alzheimer disease. Arch. Gen. Psychiatry 63, 168–74 (2006). https://doi.org/10.1001/archpsyc.63.2.168
Ghani, M., et al.: Genome-wide survey of large rare copy number variants in Alzheimer’s disease among Caribbean Hispanics. G3 (Bethesda, MD) 2, 71–78 (2012). https://doi.org/10.1534/g3.111.000869
Giral, H., Landmesser, U., Kratzer, A.: Into the wild: GWAS exploration of non-coding RNAS. Front. Cardiovascul. Med. 5, 181 (2018). https://doi.org/10.3389/fcvm.2018.00181
Article Google Scholar
Guennewig, B., et al.: Defining early changes in Alzheimer’s disease from RNA sequencing of brain regions differentially affected by pathology. Sci. Rep. 11 (2021). https://doi.org/10.1038/s41598-021-83872-z
Géron, A.: Mãos à Obra: Aprendizado de Máquina com Scikit-Learn & TensorFlow. Alta books editora, O’Reilly (2002)
MATH Google Scholar
Ho, D., Schierding, W., Wake, M., Saffery, R., O’Sullivan, J.: Machine learning SNP based prediction for precision medicine. Front. Genet. 10, 267 (2019). https://doi.org/10.3389/fgene.2019.00267
Article Google Scholar
Jin, Y., et al.: Classification of Alzheimer’s Disease using robust tabnet neural networks on genetic data. Math. Biosci. Eng. 20, 8358–8374 (2023). https://doi.org/10.3934/mbe.2023366
Johnson, M., Davis, E., Wilson, L.: Genetic and epigenetic mechanisms in Alzheimer’s disease. Nature 14(7), 123–134 (2023). https://doi.org/10.1038/s41398-023-02446-x
Article MATH Google Scholar
Johnson, S.G.: Chapter 1 - Genomic medicine in primary care. In: David, S.P. (ed.) Genomic and Precision Medicine, 3rd edn, pp. 1–18. Academic Press, Boston (2017). https://doi.org/10.1016/B978-0-12-800685-6.00001-1
Katzeff, J.S., Lok, H.C., Bhatia, S., Fu, Y., Halliday, G.M., Kim, W.S.: ATP-binding cassette transporter expression is widely dysregulated in frontotemporal dementia with TDP-43 inclusions (2023). https://doi.org/10.3389/fnmol.2022.1043127
Laksman, Z., Detsky, A.S.: Personalized medicine: understanding probabilities and managing expectations. J. Gen. Intern. Med. 26, 204–206 (2011). https://doi.org/10.1007/s11606-010-1515-6
Article Google Scholar
Li, J., et al.: Genome-wide network-assisted association and enrichment study of amyloid imaging phenotype in Alzheimer’s disease. Curr. Alzheimer Res. 16 (2019). https://doi.org/10.2174/1567205016666191121142558
Li, R., Xiao, L., Zhang, T., Ren, D., H, Z.: Overexpression of fibroblast growth factor 13 ameliorates amyloid-b-induced neuronal damage. Neural regeneration research 18, 1347–1353 (2023). https://doi.org/10.4103/1673-5374.357902
Mishra, R., et al.: Augmenting neurogenesis rescues memory impairments in Alzheimer’s disease by restoring the memory-storing neurons. J. Exp. Med. 219 (2022). https://doi.org/10.1084/jem.20220391
Moore, P.J., Lyons, T.J., Gallacher, J., Initiative, A.D.N.: Random forest prediction of Alzheimer’s disease using pairwise selection from time series data. PLoS ONE 14(2), e0211558 (2019). https://doi.org/10.1371/journal.pone.0211558
Morabito, S., et al.: Single-nucleus chromatin accessibility and transcriptomic characterization of Alzheimer’s disease. Nat. Genet. 53, 1–13 (2021). https://doi.org/10.1038/s41588-021-00894-z
Ou, Y.: Alzheimer’s disease and glaucoma: Is there a connection? (2017). https://www.brightfocus.org/alzheimers/article/alzheimers-disease-and-glaucoma-there-connection. Accessed 30 June 2024
Prince, M., et al.: World Alzheimer report 2015 (2015). https://www.alz.co.uk/research/WorldAlzheimerReport2015.pdf. Accessed 21 Apr 2020
Pérez-Torres, S., Mengod, G.: Camp-specific phosphodiesterases expression in Alzheimer’s disease brains. Int. Congr. Ser. 1251, 127–138 (2003). https://doi.org/10.1016/S0531-5131(03)00104-3
Article Google Scholar
Recabarren, D., Alarcón, M.: Gene networks in neurodegenerative disorders. Life Sciences 183 (2017). https://doi.org/10.1016/j.lfs.2017.06.009
Sepulveda-Falla, D., et al.: Genetic modifiers of cognitive decline in psen1 e280a Alzheimer’s disease. Alzheimer’s Dementia 20 (2024). https://doi.org/10.1002/alz.13754
Sereniki, A., Vital, M.A.B.F.: A doença de Alzheimer: aspectos fisiopatológicos e farmacológicos. Revista de Psiquiatria do Rio Grande do Sul 30 (2008). https://doi.org/10.1590/S0101-81082008000200002
Sherif, F., Zayed, N., Fakhr, M.: Discovering Alzheimer genetic biomarkers using Bayesian networks. Adv. Bioinform. 2015, 639367 (2015). https://doi.org/10.1155/2015/639367
Spiegel, A.M., Hawkins, M.: Personalized medicine to identify genetic risks for type 2 diabetes and focus prevention: can it fulfill its promise? Health Aff. 31(1), 43–51 (2012). https://doi.org/10.1377/hlthaff.2011.1054
Article MATH Google Scholar
Velazquez, M., Lee, Y. for the Alzheimer’s Disease Neuroimaging Initiative: Random forest model for feature-based Alzheimer’s disease conversion prediction from early mild cognitive impairment subjects. PLoS ONE 16(4), 1–18 (2021). https://doi.org/10.1371/journal.pone.0244773
Wiggs, J., et al.: Common variants at 9p21 and 8q22 are associated with increased susceptibility to optic nerve degeneration in glaucoma. PLoS Genet. 8, e1002654 (2012). https://doi.org/10.1371/journal.pgen.1002654
Wu, Z., et al.: A novel Alzheimer’s disease prognostic signature: identification and analysis of glutamine metabolism genes in immunogenicity and immunotherapy efficacy. Sci. Rep. 13 (2023). https://doi.org/10.1038/s41598-023-33277-x
Yu, W., Yu, W., Yang, Y., Lü, Y.: Exploring the key genes and identification of potential diagnosis biomarkers in Alzheimer’s disease using bioinformatics analysis. Front. Aging Neurosci. 13, 602781 (2021). https://doi.org/10.3389/fnagi.2021.602781
Article MATH Google Scholar
Zhu, M., Tang, M., Du, Y.: Identification of tac1 associated with Alzheimer’s disease using a robust rank aggregation approach. J. Alzheimers Dis. 91, 1–11 (2023). https://doi.org/10.3233/JAD-220950
Article MATH Google Scholar

Download references

Acknowledgment

This study was financed by the São Paulo Research Foundation (FAPESP) grants #2020/08634-2, #2021/12618-5 and #2022/02981-8.

Author information

Authors and Affiliations

Department of Computing, Federal University of São Carlos, São Carlos, SP, Brazil
Juliana Alves
Corteva Agriscience™, Mogi Mirim, SP, Brazil
Eduardo Costa & Alencar Xavier
Department of Animal Sciences, Purdue University, West Lafayette, IN, USA
Alencar Xavier & Luiz Brito
Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, SP, Brazil
Ricardo Cerri

Authors

Juliana Alves
View author publications
Search author on:PubMed Google Scholar
Eduardo Costa
View author publications
Search author on:PubMed Google Scholar
Alencar Xavier
View author publications
Search author on:PubMed Google Scholar
Luiz Brito
View author publications
Search author on:PubMed Google Scholar
Ricardo Cerri
View author publications
Search author on:PubMed Google Scholar

Consortia

Alzheimer’s Disease Neuroimaging Initiative

Corresponding authors

Correspondence to Juliana Alves or Ricardo Cerri .

Editor information

Editors and Affiliations

Universidade Federal Fluminense, Niterói, Brazil
Aline Paes
Instituto Tecnológico de Aeronáutica, São José dos Campos, Brazil
Filipe A. N. Verri

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Alves, J., Costa, E., Xavier, A., Brito, L., Cerri, R., Alzheimer’s Disease Neuroimaging Initiative. (2025). Comparative Analysis of Machine Learning Algorithms for Identifying Genetic Markers Linked to Alzheimer’s Disease. In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science(), vol 15414. Springer, Cham. https://doi.org/10.1007/978-3-031-79035-5_11

Download citation

DOI: https://doi.org/10.1007/978-3-031-79035-5_11
Published: 30 January 2025
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-79034-8
Online ISBN: 978-3-031-79035-5
eBook Packages: Computer ScienceComputer Science (R0)

Comparative Analysis of Machine Learning Algorithms for Identifying Genetic Markers Linked to Alzheimer’s Disease

Abstract

Similar content being viewed by others

Machine learning in Alzheimer’s disease genetics

Predicting early Alzheimer’s with blood biomarkers and clinical features

Identifying key genetic variants in Alzheimer’s disease progression using Graph Convolutional Networks (GCN) and biological impact analysis

1 Introduction

2 Related Work

3 Methods

3.1 Data Acquisition

3.2 Data Processing for Quality Control

3.3 Best Linear Unbiased Prediction (BLUP)

3.4 Machine Learning Algorithms

3.5 Gene Retrieval

4 Results and Discussion

4.1 Data Quality Control

4.2 Machine Learning Hyperparameter Selection

4.3 Prediction Performance Analysis

4.4 Comparative Analysis of Alzheimer’s Disease Associated SNPs

5 Conclusion and Future Work

References

Acknowledgment

Author information

Authors and Affiliations

Consortia

Alzheimer’s Disease Neuroimaging Initiative

Corresponding authors

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Keywords

Publish with us

Profiles

Comparative Analysis of Machine Learning Algorithms for Identifying Genetic Markers Linked to Alzheimer’s Disease

Abstract

Similar content being viewed by others

Machine learning in Alzheimer’s disease genetics

Predicting early Alzheimer’s with blood biomarkers and clinical features

Identifying key genetic variants in Alzheimer’s disease progression using Graph Convolutional Networks (GCN) and biological impact analysis

1 Introduction

2 Related Work

3 Methods

3.1 Data Acquisition

3.2 Data Processing for Quality Control

3.3 Best Linear Unbiased Prediction (BLUP)

3.4 Machine Learning Algorithms

3.5 Gene Retrieval

4 Results and Discussion

4.1 Data Quality Control

4.2 Machine Learning Hyperparameter Selection

4.3 Prediction Performance Analysis

4.4 Comparative Analysis of Alzheimer’s Disease Associated SNPs

5 Conclusion and Future Work

References

Acknowledgment

Author information

Authors and Affiliations

Consortia

Alzheimer’s Disease Neuroimaging Initiative

Corresponding authors

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Keywords

Publish with us

Profiles