key: cord-0296390-eqsueoz3
authors: Modarres, Hadi; Kalafatis, Chris; Apostolou, Panos; Marefat, Haniye; Khanbagi, Mahdiyeh; Karimi, Hamed; Vahabi, Zahra; Aarsland, Dag; Khaligh-Razavi, Seyed-Mahdi
title: Validity and cultural generalisability of a 5-minute AI-based, computerised cognitive assessment in Mild Cognitive Impairment and Alzheimer's Dementia
date: 2021-04-02
journal: bioRxiv
DOI: 10.1101/2021.04.01.437840
sha: 3f03a02993bd638414ea249e78e40109413781da
doc_id: 296390
cord_uid: eqsueoz3

INTRODUCTION: Early detection and monitoring of mild cognitive impairment (MCI) and Alzheimer's Disease (AD) patients are key to tackling dementia and providing benefits to patients, caregivers, healthcare providers and society. We developed the Integrated Cognitive Assessment (ICA): a 5-minute, language-independent, computerised cognitive test that employs an Artificial Intelligence (AI) model to improve its accuracy in detecting cognitive impairment. In this study, we aimed to evaluate the generalisability of the ICA in detecting cognitive impairment in MCI and mild AD patients.

METHODS: We studied the ICA in a total of 230 participants: 95 healthy volunteers, 80 participants with MCI, and 55 with mild AD completed the ICA, the Montreal Cognitive Assessment (MoCA) and Addenbrooke's Cognitive Examination (ACE).

RESULTS: The ICA demonstrated convergent validity with MoCA (Pearson r = 0.58, p<0.0001) and ACE (r = 0.62, p<0.0001). The ICA AI model was able to detect cognitive impairment with an AUC of 81% for MCI patients and 88% for mild AD patients. The AI model demonstrated improved performance with increased training data and showed generalisability in performance from one population to another. The ICA correlation of 0.17 (p=0.01) with years of education is considerably smaller than that of MoCA (r=0.34, p<0.0001) and ACE (r=0.41, p<0.0001), both of which displayed significant correlations.
In a separate study, the ICA demonstrated no significant practice effect over the duration of the study.

DISCUSSION: The ICA can support clinicians by aiding accurate diagnosis of MCI and AD and is appropriate for large-scale screening of cognitive impairment. The ICA is unbiased by differences in language, culture and education.

The ICA is a self-administered, computerised cognitive assessment tool based on a rapid categorisation task that employs AI to detect cognitive impairment, and is independent of language. 29, 30 We aimed to evaluate the generalisability of the ICA in detecting cognitive impairment in MCI and mild AD patients. We recruited participants from two cohorts on different continents. We hypothesise that the AI model employed for the ICA can be generalised across demographically different patient populations. To measure the convergent validity of the ICA against standard-of-care cognitive tests, we compared the ICA with the Montreal Cognitive Assessment (MoCA) and Addenbrooke's Cognitive Examination (ACE). We investigated the level of education bias between the cognitive assessments. We also report the effects of repeated exposure to the test in healthy participants (learning bias). The ICA test is a rapid visual categorisation task with backward masking, and has been described in detail in previous publications. 29, 30 The test takes advantage of the human brain's strong reaction to animal stimuli. 31-33 One hundred natural images (50 containing an animal and 50 not containing an animal) of various levels of difficulty are selected and presented to the participant in rapid succession, as shown in Figure S1 in the supporting information (SI). Images are presented at the centre of the screen at a 7° visual angle to the participant. In some images the head or body of the animal is clearly visible, which makes it easier to detect.
In other images the animals are further away or presented in cluttered environments, making them more difficult to detect. The strongest categorical division represented in the human higher-level visual cortex appears to be that between animals and inanimate objects. 34, 35 Studies also show that, on average, it takes about 100 ms to 120 ms for the human brain to differentiate animate from inanimate stimuli. 32, 36, 37 Following this rationale, each image is presented for 100 ms, followed by a 20 ms inter-stimulus interval (ISI), followed by a dynamic noise mask (for 250 ms), followed by the subject's categorisation of the image as animal or non-animal. Shorter ISIs can make the animal-detection task more difficult, while longer ISIs reduce the test's usefulness, as they may not allow for the detection of less severe cognitive impairments. The dynamic mask is used to remove (or at least reduce) the effect of recurrent processes in the brain. 38, 39 This makes the task more challenging by reducing the ongoing recurrent neural activity that could artificially boost the subject's performance; it further reduces the chances of learning the stimuli. For more information about rapid visual categorisation tasks refer to Mirzaei et al. (2013). 40 Greyscale images are used to remove the possibility of colour blindness affecting participants' results. Furthermore, colour images can facilitate animal detection based solely on colour, 41, 42 without fully processing the shape of the stimulus. This would make the task easier and less suitable for detecting mild cognitive deficits. The ICA test begins with a separate set of 10 trial images (5 animal, 5 non-animal) to familiarise participants with the task. If participants perform above chance (>50%) on these 10 images, they continue to the main task. If they perform at chance level (or below), the test instructions are presented again, and a new set of 10 introductory images follows.
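The per-trial presentation sequence described above (100 ms stimulus, 20 ms ISI, 250 ms dynamic mask, then an untimed response) can be sketched as a simple event schedule. This is an illustrative reconstruction, not the actual ICA implementation; the `TrialEvent` and `build_trial_schedule` names are ours.

```python
from dataclasses import dataclass

# Timing parameters taken from the text (all in milliseconds).
STIMULUS_MS = 100   # natural image shown for ~100 ms
ISI_MS = 20         # blank inter-stimulus interval
MASK_MS = 250       # dynamic noise mask

@dataclass
class TrialEvent:
    name: str
    onset_ms: int
    duration_ms: int

def build_trial_schedule(start_ms: int = 0):
    """Return the ordered event schedule for one rapid-categorisation trial."""
    events = []
    t = start_ms
    for name, dur in [("stimulus", STIMULUS_MS), ("isi", ISI_MS), ("mask", MASK_MS)]:
        events.append(TrialEvent(name, t, dur))
        t += dur
    # After the mask, the participant makes the animal/non-animal response.
    events.append(TrialEvent("response", t, 0))
    return events

schedule = build_trial_schedule()
```

With these parameters the response prompt begins 370 ms after stimulus onset (100 + 20 + 250), which is the window the backward mask is designed to occupy.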
If they perform above chance on this second attempt, they progress to the main task. If they perform below chance for the second time, the test is restarted. Backward masking: To construct the dynamic mask, following the procedure in, 43 a white-noise image was filtered at four different spatial scales, and the resulting images were thresholded to generate high-contrast binary patterns. For each spatial scale, four new images were generated by rotating and mirroring the original image, yielding a pool of 16 images. The noise mask used in the ICA test was a sequence of 8 images chosen randomly from the pool, with each spatial scale appearing twice in the dynamic mask. The ICA primarily tests information processing speed (IPS) and engages higher-level visual areas in the brain for semantic processing, i.e. distinguishing animal from non-animal images, 29 which is the strongest categorical division represented in the human higher-level visual cortex. 44 IPS underlies many areas of cognitive dysfunction 45,46 and its slowing is one of the key subtle, early changes in pre-symptomatic Alzheimer's disease. 47 This is because the speed with which an individual performs a cognitive task is not an isolated function of the processes required in that task, but also a reflection of their ability to rapidly carry out many different types of processing operations. In the case of the ICA, these operations include transferring visual information through the retina to higher-level visual areas (sensory speed), processing the image representation in the visual system to categorise it as animal or non-animal (cognitive speed), and then translating this into a motor response (motor speed). MoCA is a widely used screening tool for detecting cognitive impairment, typically in older adults. 48 The MoCA test is a one-page, 30-point test administered in approximately 10 minutes. The ACE was originally developed at the Cambridge Memory Clinic.
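The dynamic-mask construction described earlier (thresholded noise at four spatial scales, four rotated/mirrored variants per scale giving a pool of 16, then a sequence of eight frames with each scale appearing twice) can be sketched in NumPy. This is a minimal illustration, not the published procedure: block upsampling of low-resolution noise stands in for the original spatial filtering, and the 64-pixel mask size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
SIZE = 64  # illustrative mask resolution

def make_pool(size=SIZE):
    """Pool of 16 binary patterns: 4 spatial scales x 4 orientations."""
    pool = []  # list of (scale_index, image)
    for s, block in enumerate([2, 4, 8, 16]):  # larger blocks = coarser scale
        # Low-resolution white noise, block-upsampled to full size
        # (a stand-in for filtering white noise at different spatial scales).
        low = rng.standard_normal((size // block, size // block))
        img = np.kron(low, np.ones((block, block)))
        # Threshold at the median to get a high-contrast binary pattern.
        binary = (img > np.median(img)).astype(np.uint8)
        # Four variants per scale: original, rotated, mirrored, both.
        for variant in (binary, np.rot90(binary), np.fliplr(binary),
                        np.fliplr(np.rot90(binary))):
            pool.append((s, variant))
    return pool

def make_dynamic_mask(pool):
    """Sequence of 8 frames in which each spatial scale appears exactly twice."""
    frames = []
    for s in range(4):
        candidates = [img for scale, img in pool if scale == s]
        for i in rng.choice(len(candidates), size=2, replace=False):
            frames.append(candidates[i])
    order = rng.permutation(len(frames))  # randomise frame order
    return [frames[i] for i in order]

pool = make_pool()               # 16 images
mask = make_dynamic_mask(pool)   # 8 frames
```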
49,50 ACE assesses five cognitive domains: attention, memory, verbal fluency, language and visuospatial abilities. On average, the test takes about 20 minutes to administer and score. ACE-R is a revised version of ACE that includes the MMSE score as one of its sub-scores. 51 ACE-III replaces the elements shared with the MMSE and has similar levels of sensitivity and specificity to ACE-R. 52 We aimed to study the ICA across a broad spectrum of geographical locations with differences in language and culture to test its generalisability. For analytical purposes we combined participants from two cohorts in order to study the ICA in one demographically diverse population. See Table 1 for a summary of the demographic characteristics of recruited participants. Diagnoses in Cohort 1 followed the Alzheimer's Disease and Related Disorders Association (ADRDA) and the National Institute on Aging and Alzheimer's Association (NIA-AA) diagnostic guidelines. 53 All study participants had previously had an MRI head scan, blood tests and a physical examination as part of the diagnostic procedure. The study was conducted at the Royan Institute, according to the Declaration of Helsinki, and was approved by the local ethics committee at the Royan Institute. The inclusion and exclusion criteria are listed in the Supplementary Information (SI). In Cohort 2, all diagnoses were made by a memory clinic consultant psychiatrist according to the same diagnostic criteria as in Cohort 1. The diagnostic procedure included an MRI head scan, blood tests and a physical examination for all participants. The eligibility criteria are listed in the SI. One additional inclusion criterion for Cohort 2 required an ACE-III score of >=90 for healthy participants. Cognitive assessments were performed in a single visit, either in the clinic or via home visits: approximately 51% were conducted at home and 49% in the clinic.
Inclusion criteria were common to both cohorts and refer to individuals with normal or corrected-to-normal vision, without severe upper-limb arthropathy or motor problems that could prevent them from completing the tests independently (see SI). For each participant, information about age, education and gender was also collected. Informed written consent was obtained from all participants. Spectrum bias, whereby the subjects included in a study do not cover the complete spectrum of patient characteristics in the intended-use population, 54 has been avoided in Cohort 1 and Cohort 2 by recruiting participants according to a sampling matrix and at the mild stage of Alzheimer's Dementia. Therefore, the ICA performance metrics reported in this study are relative to detecting cognitive impairment in a population with less severe impairment. The raw data from the ICA comprise reaction time and categorisation accuracy on the images. These data were used to calculate summary features such as overall accuracy and speed, using the same methodology as described previously. 29, 30 Accuracy is defined as follows:

(1) Accuracy = (number of correct categorisations / total number of images) × 100

Speed is defined based on the participant's reaction times in trials they answered correctly:

(2) Speed = min(100, 100 × e^(−(mean correct RT)/1025 + 0.341))

A summary ICA Index is calculated as follows:

(3) ICA Index = (Speed/100) × (Accuracy/100) × 100

The ICA Index describes the raw test result, incorporating speed and accuracy, the two main elements of the ICA test. The AI model takes as inputs the accuracy and speed of responses to the ICA rapid categorisation task (with the ICA Index as an input feature), as well as age, and outputs an indication of the likelihood of impairment (AI probability) by comparing the test performance and age of a patient to those of healthy and cognitively impaired individuals who have previously taken the test.
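The three summary measures follow directly from per-image correctness and reaction times; a straightforward implementation of equations (1)-(3), with an illustrative participant rather than study data:

```python
import math

def accuracy(correct_flags):
    """Equation (1): percentage of correctly categorised images."""
    return 100.0 * sum(correct_flags) / len(correct_flags)

def speed(correct_rts_ms):
    """Equation (2): capped exponential transform of mean correct reaction time."""
    mean_rt = sum(correct_rts_ms) / len(correct_rts_ms)
    return min(100.0, 100.0 * math.exp(-mean_rt / 1025.0 + 0.341))

def ica_index(acc, spd):
    """Equation (3): combines speed and accuracy into a single 0-100 index."""
    return (spd / 100.0) * (acc / 100.0) * 100.0

# Illustrative participant: 80/100 correct, mean correct RT of 600 ms.
correct = [True] * 80 + [False] * 20
rts = [600.0] * 80
acc = accuracy(correct)   # 80.0
spd = speed(rts)
index = ica_index(acc, spd)
```

Note how the `min(100, …)` cap means very fast responders (mean correct RT below roughly 350 ms) all receive the maximum speed score.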
The AI model achieves improved classification accuracy relative to using any single feature from the ICA test. A probability threshold of 0.5 was used to convert the AI probability to the AI prediction of healthy or cognitively impaired (MCI/mild AD). The AI probability was also converted to a score between 0 and 100 using the following equation:

(4) ICA score = (1 − AI probability) × 100

The ICA AI model used in this study was a binary logistic regression model, a supervised linear classifier, implemented in Python with scikit-learn and trained with stochastic gradient descent. 55 The algorithm's task is to learn a set of weights that maps the participant's ICA test results and demographics to the classification label of healthy or cognitively impaired (Figure 1). An example results page from the ICA, showing the ICA Score obtained from the AI model as well as the informative features of the ICA test such as accuracy, speed and the ICA Index, is shown in Figure S2 in the SI. The ICA's prediction for each participant was obtained using leave-one-out cross-validation on the data from Cohort 1 and Cohort 2. In this method the dataset is repeatedly split into two non-overlapping training and testing sets: the training set is used to train the model, and the test set to assess the performance of the classifier. In the leave-one-out method only one instance is placed in the test set, with the remaining data points used for training. This is done iteratively for all data points, providing an AI probability score for each participant. The combined results were used to calculate overall metrics for the classifier (receiver operating characteristic area under the curve (ROC AUC), sensitivity, specificity) by comparing the ICA AI prediction to the clinical diagnosis.
Sensitivity, or true positive rate, is the proportion of actual positives (i.e., impaired) that are identified as such, whilst specificity, or true negative rate, is the proportion of actual negatives (i.e., healthy) that are identified as such. The ICA AI prediction was also compared to MoCA and ACE to obtain percentage agreement values between these cognitive tests. Single cut-off values were used to obtain predictions for MoCA (score of >=26 for healthy) and ACE (score of >=90 for healthy). To test the generalisability of the ICA, the AI model was trained on data from Cohort 1 and tested on data from Cohort 2, and vice versa. The number of data points used to train the AI model can significantly impact the performance of the model on the testing dataset. To investigate this, subsets of the data from one cohort were selected for training through random sampling. For each training size a model was trained and then tested on all the data from the other cohort. We varied the size of the training data from 3 data points up to all of the data from each cohort. In total 230 participants (Healthy: 95, MCI: 80, mild AD: 55) were recruited into Cohort 1 and Cohort 2. Participant demographics and cognitive test results are shown in Table 1. Participants were recruited based on a sampling matrix in order to minimise differences in age, gender and years of education across the three arms; a detailed breakdown is given in Table S1 in the SI. Due to the balanced recruitment, there was no significant difference across genders in any of the cognitive tests (see Figure S3 in the SI). The ICA also did not show a significant difference in score between those with 0-11 years of education and those with 12 or more years of education (Figure S3b in the SI). In contrast, there was a statistically significant increase in MoCA score for mild AD participants with higher education, while ACE scores were higher for Healthy and MCI participants with more years of education (Table 2).
This trend was also evident in the correlation analysis. The ICA displayed a Pearson correlation of 0.17 with years of education (p=0.01), considerably smaller than that of MoCA (r=0.34, p<0.0001) and ACE (r=0.41, p<0.0001), both of which displayed significant correlations. The statistically significant Pearson correlations of 0.62 with ACE and 0.58 with MoCA demonstrate the convergent validity of the ICA with these cognitive tests. The scatterplots of the ICA against MoCA and ACE are shown in Figure S4. The breakdown of speed and accuracy by age and diagnosis is shown in Table 3. Within Healthy participants, there is a strong negative correlation between age and accuracy (Pearson r = -0.4, p<0.0001); similarly for MCI participants (Pearson r = -0.4, p<0.001). However, for mild AD participants there is no correlation between accuracy and age (Pearson r = 0.07, p=0.58). Analysis did not reveal a significant difference in speed with age within any of the three groups. Prior to the commencement of the 100-image ICA test, participants are shown a set of trial images for training purposes. If participants perform adequately on the trial images, they proceed to the main test; if they perform below chance, the trial images are shown again. We observed that the number of attempts required before proceeding to the main test is itself a strong predictor of cognitive impairment. Among Healthy participants 88% completed the trial images on their first attempt, compared with 61% of MCI participants and 44% of mild AD participants (see Figure S5 in the SI). Furthermore, within each group, those who required more than one attempt to progress to the main test scored lower than those who did not, and they tended to be older (Table S3 in the SI), indicating within-group variation in cognitive performance. The raw data from the ICA test consist of categorisation accuracy and reaction time for each of the 100 images in the test.
In Figure 2, the average accuracy and reaction time per image are visualised as heatmaps for each group, showing how healthy and impaired participants differ at the level of individual images. Figure 3c shows the mean and 95% confidence interval of the ICA Score for Healthy, MCI and mild AD participants, with all data points overlaid on the graph, together with t-test p-values comparing Healthy vs MCI and Healthy vs mild AD; Figure 3d shows ROC AUC against training data size, where the shaded area represents the 95% CI, as each training subset was selected randomly 20 times from the whole study data. For the AI score, a higher value is indicative of being cognitively healthy, and a lower score is indicative of potential cognitive impairment. Healthy participants have significantly higher ICA scores than MCI and mild AD participants (Figure 3c). A direct comparison of this type is not provided for ACE, as this cognitive test was used as an inclusion criterion for healthy participants in Cohort 2, and hence by default it would have a specificity of 100% for healthy participants from that study. An example results page is shown in Figure S2 of the SI. In addition to the AI output (ICA Score), the overall accuracy, speed, ICA Index and performance during the test are displayed. As shown in the results presented here, these additional metrics are highly correlated with diagnosis, clinically informative, and help explain the AI output (ICA Score), providing supporting evidence to aid the clinician in diagnosis. To assess the practice effect, we recruited healthy participants, to control for the risk of fluctuating or progressively lower test scores in cognitively impaired individuals. The mean ICA Index of the 12 healthy participants, with the 95% confidence interval, is shown in Figure S6. The one-way ANOVA p-value obtained was 0.99, showing no significant practice effect for the participants, who completed the ICA test 78 times over a period of 96.8 days on average. Study limitations include a relatively lower recruitment of young participants with mild AD.
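A practice-effect check of the kind described (repeated ICA sessions in healthy participants, compared with a one-way ANOVA across sessions) can be sketched as follows. The per-session scores here are simulated under a no-practice-effect assumption; the participant count matches the text, but the session count and score distributions are ours.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)

# Simulated ICA Index for 12 healthy participants across repeated sessions,
# with a stable per-person level and session-to-session noise (no true trend).
n_participants, n_sessions = 12, 10
true_level = rng.normal(70, 5, n_participants)
scores = true_level[:, None] + rng.normal(0, 3, (n_participants, n_sessions))

# One-way ANOVA treating each session as a group: a high p-value is
# consistent with the absence of a practice effect across sessions.
stat, p = f_oneway(*(scores[:, s] for s in range(n_sessions)))
```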
However, this reflects the lower prevalence of young mild AD patients in the general population. Test-retest data were not captured in this study; we have previously reported high test-retest reliability (Pearson r > 0.91) for the ICA. 29, 30 Fluid or molecular biomarker sub-typing to determine amyloid positivity in MCI participants was not carried out in this study, due to lack of data availability. The MCI group, however, reflects the heterogeneity of MCI diagnoses in memory clinics. We plan to correlate fluid biomarker positivity with the ICA in future studies. Remote cognitive assessment is becoming increasingly important, particularly as health services cannot accommodate regular patient attendance at memory services for monitoring progression or response to treatments. The COVID-19 pandemic has accelerated this pressing need, and guidelines for the implementation of partly or fully remote memory clinics have recently been published. 16 Digital cognitive and functional biomarkers are essential to enable this. We report a proof-of-concept capability of the ICA for the remote measurement of cognitive performance. Further validation is required for remote administration in MCI and mild AD patients. In summary, the ICA can be used as a digital cognitive biomarker for the detection of MCI and AD. Furthermore, the ICA can be used as a high-frequency monitoring tool both in the clinic and, potentially, remotely. The employment of AI modelling has the potential not only to further enhance its performance but also to personalise its results at the individual patient level across geographic boundaries.
Alzheimer's disease facts and figures The prevalence of mild cognitive impairment in diverse geographical and ethnocultural regions: The COSMIC Collaboration Practice guideline update summary: Mild cognitive impairment Alzheimer's Disease International The Edinburgh Consensus: preparing for the advent of disease-modifying therapies for Alzheimer's disease A resurrection of aducanumab for Alzheimer's disease Mild cognitive impairment: the Manchester consensus Rationale for Early Diagnosis of Mild Cognitive Impairment (MCI) supported by Emerging Digital Technologies Early Detection of Mild Cognitive Impairment (MCI) in an At-Home Setting Timely diagnosis for alzheimer's disease: A literature review on benefits and challenges Potentially modifiable lifestyle factors, cognitive reserve, and cognitive function in later life: A cross-sectional study Screening for mild cognitive impairment (MCI) utilizing combined mini-mental-cognitive capacity examinations for identifying dementia prodromes Detection of MCI in the clinic: Evaluation of the sensitivity and specificity of a computerised test battery, the Hopkins Verbal Learning Test and the MMSE Predictors of dementia misclassification when using brief cognitive assessments Longitudinal Change in Performance on the Montreal Cognitive Assessment in Older Adults Implementing Remote Memory Clinics to Enhance Clinical Care During and After COVID-19. 
Front Psychiatry Deep Learning in Alzheimer's Disease: Diagnostic Classification and Prognostic Prediction Using Neuroimaging Data Machine learning prediction of incidence of Alzheimer's disease using large-scale administrative health data Machine learning techniques for the diagnosis of alzheimer's disease: A review Applications of Technology in Neuropsychological Assessment Mobile Technology for Cognitive Assessment of Older Adults: A Scoping Review The Fundamentals of Person-Centered Care for Individuals with Integrated Cognitive Assessment: Speed and Accuracy of Visual Processing as a Reliable Proxy to Cognitive Performance Resolving human object recognition in space and time Temporal dynamics of animacy categorization in the brain of patients with mild cognitive impairment. bioRxiv Matching Categorical Object Representations in Inferior Temporal Cortex of Man and Monkey Cortical representation of animate and inanimate objects in complex natural scenes Timing, Timing, Timing: Fast Decoding of Object Information from Intracranial Field Potentials in Human Visual Cortex Tracking the Spatiotemporal Neural Dynamics of Real-world Object Size and Animacy in the Human Brain Masking Disrupts Reentrant Processing in Human Visual Cortex Beyond core object recognition: Recurrent processes account for object recognition under occlusion. Isik L Predicting the human reaction time based on natural image statistics in a rapid categorization task Rapid serial processing of natural scenes: Color modulates detection but neither recognition nor the attentional blink Animal Detection in Natural Images: Effects of Color and Image Database The time course of visual processing: Backward masking and natural scene categorisation Deep Supervised, but Not Unsupervised We thank the site teams at the NHS trusts and the Royan Institute for their support throughout the study.