authors: Worhach, Jennifer; Boduch, Madeline; Zhang, Bo; Maski, Kiran
title: Remote Assessment of Pediatric Patients with Daytime Sleepiness and Healthy Controls: A Pilot Study of Feasibility and Reliability
date: 2021-10-11
journal: Child Neurol Open
DOI: 10.1177/2329048x211048064

We assessed the reliability of cognitive testing for children and adolescents ages 8 to 19 years with narcolepsy or subjective daytime sleepiness compared to healthy controls. Forty-six participants took part in the study (n = 18 narcolepsy type 1, n = 6 subjective daytime sleepiness, and n = 22 healthy controls). Participants completed verbal (vocabulary) and non-verbal intelligence quotient (IQ) tasks (block design, matrix reasoning) from the Wechsler Abbreviated Scale of Intelligence-Second Edition (WASI-II) in person or remotely through a HIPAA-compliant telehealth platform, with conditions counterbalanced. Vocabulary T-scores showed good reliability between remote and in-person testing conditions, with an intraclass correlation coefficient (ICC) of 0.76 (95% CI: 0.64, 0.85). Matrix reasoning T-scores showed moderate reliability (ICC 0.69, 95% CI: 0.68, 0.90), and block design T-score reliability was poor between testing conditions. Overall, the results of this pilot study support the feasibility and reliability of verbal and non-verbal IQ scores collected by telehealth.

Telehealth has expanded exponentially in many areas of healthcare over the past decade and continues to grow in the setting of the COVID-19 pandemic. Telehealth also holds promise for child neurology research, especially in the assessment of cognitive measures. Given the challenges of in-person cognitive assessment posed by transportation access, the time burden on patients and caregivers, facility limitations, and current concerns about viral transmission, validating remote cognitive assessments in pediatric patients with neurological conditions is of great value for future research. Furthermore, given that lockdown and quarantine measures during the COVID-19 pandemic affected the sleep and mood of many children and adolescents, it is important to assess the influence of these factors on cognitive function.1

Reliability of remote cognitive testing has been best studied in adult patients, including healthy individuals as well as those with mild cognitive impairment, Alzheimer's disease, a history of alcohol abuse, and intellectual disabilities.2-6 Results have shown that remote assessments yield results comparable to face-to-face assessments. In general, studies have also found that participants accept and are comfortable with the use of technology in cognitive assessment.7,8 Fewer pediatric studies have assessed the reliability of remote cognitive testing, but they include assessments of children with Batten disease, learning disabilities, and psychosis.9-11 However, none of the remote assessments in these pediatric studies was conducted in a home setting; rather, testing occurred in hotel rooms during disease-based conferences or at designated testing sites with the patient in one room and the assessor in another. Theoretically, remote cognitive testing in the home environment could pose more challenges because children and adolescents may be more distracted in home settings.
In this pilot study, we assessed the reliability of a subset of verbal and non-verbal IQ tests from the Wechsler Abbreviated Scale of Intelligence-Second Edition (WASI-II) administered via home-based telehealth against in-person testing among healthy controls and participants with subjective sleepiness (patients presenting with subjective excessive daytime sleepiness but with normal polysomnogram and multiple sleep latency test results) or narcolepsy type 1 (NT1), a chronic neurological disease with typical onset between 10 and 20 years of age. In addition, we examined the influence of self-reported sleepiness, sleep, affect, and circadian preference, as well as objective measures of sleep duration from actigraphy, on cognitive test results. We hypothesized that remote cognitive assessments would show good reliability with in-person testing across all groups and that task scores collected in the different testing conditions would not show associations with sleepiness, sleep, or affect measures.

We recruited participants from the Boston Children's Hospital sleep clinic through clinic flyers and web ads, and from the community through the Boston Children's Hospital Research Patient Registry, a recruitment database of potential subjects who have indicated interest in hearing about research studies. In total, 46 participants ages 8 to 19 years took part in the study (see Table 1 for demographic and medication history). Participants included 18 children with narcolepsy type 1 (NT1),12 6 with subjective sleepiness (patients who reported symptoms of excessive daytime sleepiness but did not meet clinical criteria for a CNS disorder of hypersomnolence based on polysomnogram and multiple sleep latency testing),12 and 22 healthy controls recruited from the community. During remote testing, NT1 participants were allowed to take their home medications as prescribed, but for in-person testing, participants were asked to stop stimulants, wake-promoting medications, and any sedating medications for 5 half-lives of each drug. NT1 participants were permitted to stay on SSRI medications for cataplexy. Written consent was obtained from all participants aged 18 years and older prior to beginning the study. For participants under age 18, consent was obtained from a parent/guardian and assent was obtained from the participant.

The IQ measures collected in this pilot study were part of a larger clinical research study conducted at Boston Children's Hospital. To assess baseline IQ, three trained research assistants independently administered the WASI-II subtests to participants once remotely and once in person, with conditions counterbalanced to negate practice effects (a hypothetical assignment sketch follows below). We used the WASI-II vocabulary task to assess verbal IQ and either the block design or the matrix reasoning task for non-verbal IQ. Remote testing occurred via the HIPAA-compliant platforms Zoom Video Communications, Inc.13 and Vidyo, Inc.,14 with the assessor in their office or home and the participant at home. Participants were mailed a sealed packet of WASI-II testing materials for the remote assessment and broke the seal on web-camera in front of the examiner during the remote testing session. Both Vidyo, Inc.14 and Zoom Video Communications, Inc.13 provide a secure communication environment with the Advanced Encryption Standard (AES) and are compliant with the data security requirements of the Health Insurance Portability and Accountability Act.
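As an aside, here is a minimal sketch of how such counterbalanced order assignment might look in code. It is purely illustrative: the function name and the simple half-and-half split are assumptions, since the study does not describe its actual randomization procedure.

```python
import random

def assign_condition_order(participant_ids, seed=42):
    """Counterbalance testing order: half remote-first, half in-person-first."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # randomize who lands in which half
    half = len(ids) // 2
    return {pid: ("remote", "in_person") if i < half
            else ("in_person", "remote")
            for i, pid in enumerate(ids)}

# Example with 8 hypothetical participant IDs
print(assign_condition_order(range(1, 9)))
```

With equal numbers of participants tested in each order, any practice effect from the first session inflates remote and in-person scores roughly equally, so it cancels out of between-condition comparisons.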
Remote testing occurred in the late afternoon (between 3:00 and 4:30 pm, after school) for all participants except for two whose testing occurred in the morning (9:30 to 10:00 am) due to scheduling conflicts. In-person testing occurred at Boston Children's Hospital in a designated testing room between 6 and 7 pm one week before or after the remote assessment (conditions counterbalanced; chi-square 2.54, p = 0.28). For each participant, the same research assistant conducted both testing conditions using a personal laptop computer. Research assistants followed standard administration procedures across conditions, which included affirming that both parties' web cameras were working with good picture clarity, that no distractions were present, and that participants were comfortable at the time of testing. Testing proceeded after participants verbally affirmed these conditions. The cognitive assessment took between 25 and 45 minutes to complete in either condition (remote or in-person). Prior to in-person cognitive testing, we sent participants four questionnaires on affect, chronotype, sleep problems, and daytime sleepiness via the REDCap survey distribution system,15 and participants wore an actigraph (Actiwatch-L, Mini-Mitter/Respironics) on the non-dominant wrist for up to 14 days to assess habitual total sleep time. Participants wore the actiwatch and completed the questionnaires prior to in-person testing only. Institutional Review Board approval was obtained from Boston Children's Hospital.

To assess participants' baseline verbal and non-verbal IQ T-scores, we used the WASI-II, a widely used measure of objective cognitive ability that consists of four subtests.16 To assess baseline verbal IQ, we administered the vocabulary task for all participants; to assess non-verbal IQ, we administered either the block design or the matrix reasoning subtest. We used the T-scores generated from this testing for our reliability analyses. Subtest T-scores have a normative mean of 50 and standard deviation of 10, so, for example, a performance one standard deviation above the age-norm mean corresponds to a T-score of 60.

Positive and Negative Affect Scale for Children (PANAS-C): a 30-item self-report questionnaire used to measure emotions experienced in the past week. The PANAS-C instructs children to indicate how often they have felt interested, sad, and so on during the "past few weeks" on a 5-point Likert scale. It is validated in children ages 10 to 18 years.17,18

Children's Sleep Habits Questionnaire (CSHQ): a 45-item questionnaire completed by the parent to screen for underlying sleep problems. Parents are asked to rate the frequency with which their child displayed various sleep behaviors over the past week. It is validated in children ages 4 to 10 years19 but has been used in research to assess the sleep habits of older children.20-22

Morningness and Eveningness Scale for Children (MESC): a 10-item self-report questionnaire used to evaluate morning/evening sleep preferences in children that may influence cognitive testing. It examines sleep schedule inclinations and subjective feelings of fatigue and alertness. It is validated in children as young as 9 years of age.23

Epworth Sleepiness Scale (ESS): an 8-item self-report questionnaire used to assess daytime sleepiness. It asks subjects to rate the probability of falling asleep during daytime situations and activities. It is validated in children 6 to 16 years.24,25
Actigraphy testing: Actigraphy uses a wrist-watch-style device worn on the non-dominant wrist with a validated computer-based algorithm that classifies wake and sleep periods in children and adults based on movement.26 It is currently the gold standard for home ambulatory sleep/wake measurement and is validated in children.27 We report one-week average total sleep time (TST) collected from the Actiwatch-L (Mini-Mitter/Respironics) device.

To compare demographic variables, survey data, and actigraphy results, we used ANOVA testing. For WASI-II task scores, we report mean scores in each condition (remote, in-person) for each group and conducted a univariate screen with ANOVA for group differences. If ANOVA testing yielded a p-value <0.01, we conducted linear regression for group differences adjusting for age, gender, and ethnicity. WASI-II task scores were normally distributed. We used paired t-tests to compare the mean difference between scores collected in the two testing conditions. To examine the relationship between WASI-II subtest scores and sleepiness, affect, and sleep variables, we conducted Pearson correlation testing between the generated T-scores and the following variables: CSHQ total score, morningness/eveningness score, ESS total score, average TST on actigraphy, and positive and negative mood scales. If a correlation showed a p-value <0.01, we further analyzed that association in a linear regression model adjusting for age, gender, and ethnicity.

We assessed reliability between the two methods of administration (telehealth vs in-person), both within and across groups, using intraclass correlation coefficients (ICC). The ICC is an index that assesses not only how well correlated two techniques are but also whether their values agree in absolute terms. It ranges from 0 (no agreement) to 1 (perfect agreement): an ICC <0.5 indicates poor agreement, 0.5 to 0.75 moderate agreement, 0.75 to 0.9 good agreement, and >0.90 excellent agreement.28 We used Bland-Altman plots to assess the mean differences between measures and to visually examine the degree of agreement between the two testing conditions. In this method, the differences between remote and in-person scores are plotted against their averages. The plot includes one value per participant, a reference line at zero (perfect agreement between conditions), the mean of the differences between the conditions (mean bias), and limits of agreement (mean bias plus or minus 1.96 standard deviations of the differences). Differences are expressed as remote assessment minus in-person assessment, so a negative value indicates a higher in-person IQ score and a positive value indicates the remote assessment yielded the higher score. Bland and Altman recommended that 95% of the data points lie within these limits to support reliability.29 In the figures below, the solid line is the mean of the differences and the dashed lines are the mean of the differences plus or minus 1.96 times the standard deviation of the differences. We conducted statistical analysis using SPSS for Windows (version 19; IBM Corp, Armonk, NY, USA).
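To make this analysis plan concrete, the following minimal Python sketch computes a two-way ICC and a Bland-Altman plot on synthetic stand-in data. The study itself used SPSS, not Python; the pingouin package, the random stand-in scores, and all variable names here are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pingouin as pg

# Synthetic stand-in data: one remote and one in-person T-score per participant
rng = np.random.default_rng(1)
n = 46
in_person = rng.normal(50, 10, n)
remote = in_person + rng.normal(1, 5, n)  # small positive bias plus noise

# --- ICC: reshape to long format, one row per (participant, condition) pair ---
long = pd.DataFrame({
    "subject": np.tile(np.arange(n), 2),
    "condition": np.repeat(["in_person", "remote"], n),
    "t_score": np.concatenate([in_person, remote]),
})
icc = pg.intraclass_corr(data=long, targets="subject",
                         raters="condition", ratings="t_score")
print(icc[["Type", "ICC", "CI95%"]])  # ICC3 row: two-way mixed, single rater

# --- Bland-Altman: differences (remote - in_person) against pair means ---
diff = remote - in_person
pair_mean = (remote + in_person) / 2
bias = diff.mean()
sd = diff.std(ddof=1)

plt.scatter(pair_mean, diff)
plt.axhline(0, color="k")                           # perfect-agreement reference
plt.axhline(bias, color="b")                        # mean bias (solid line)
plt.axhline(bias + 1.96 * sd, color="r", ls="--")   # upper limit of agreement
plt.axhline(bias - 1.96 * sd, color="r", ls="--")   # lower limit of agreement
plt.xlabel("Mean of remote and in-person T-scores")
plt.ylabel("Remote minus in-person T-score")
plt.show()
```

The solid and dashed reference lines here correspond to the mean-bias and limits-of-agreement description above.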
Of the 46 participants, all completed the WASI-II vocabulary subtest. Only the first 12 participants completed the block design subtest, because we found poor reliability on interim analysis and switched to the matrix reasoning subtest (n = 34 participants) for the remainder of the study.

Mean age of participants (14.21 years, SD 2.7) was similar between groups (F = 1.26, p = 0.29). Gender (chi-square 0.78, p = 0.68) and ethnicity (chi-square 5.69, p = 0.49) did not differ between groups either. Full demographic results are listed in Table 1. Task performance, survey, and actigraphy results are listed in Table 2. All groups' WASI-II task scores fell within the normative range (T-score mean 50, SD 10). On the ANOVA univariate screen, we detected group differences in in-person vocabulary task results, but these did not persist when adjusting for age, gender, and ethnicity (F = 2.12, p = 0.13).

We assessed the reliability of the vocabulary T-scores using ICC and Bland-Altman analysis with limits of agreement (LoA). In the two-way mixed effects model of all participants, we found an ICC of 0.76 (95% CI: 0.64, 0.85) between remote and in-person administered vocabulary T-scores, indicating good reliability. On subgroup analysis, reliability was excellent for the subjectively sleepy participants (ICC 0.93, 95% CI: 0.74, 0.98) and good for NT1 participants (ICC 0.82, 95% CI: 0.63, 0.91), but only moderate for healthy controls (ICC 0.63, 95% CI: 0.35, 0.80). The Bland-Altman plot of vocabulary T-scores is presented in Figure 1a for all participants and each group (subjectively sleepy, narcolepsy type 1, and healthy controls). We identified three outliers (2 healthy controls and 1 NT1 participant) whose remote scores were more than 2 SD higher than their in-person scores. As the healthy control group showed lower reliability, we reviewed data on the two healthy control outliers and did not find that they differed from the group in demographics, actigraphy results, or survey responses. Across all groups, in-person vocabulary T-scores were slightly lower than those obtained by remote assessment, but the difference was not significant on paired t-test (mean difference 1.1, p = 0.18).

Among the 34 participants who completed the in-person and remote matrix reasoning task from the WASI-II, our two-way mixed effects model showed moderate reliability of the T-scores between conditions (ICC 0.69, 95% CI: 0.68, 0.90). However, we found excellent reliability among the subjectively sleepy participants (ICC 0.93, 95% CI: 0.71, 0.98), whereas the other groups showed only moderate reliability (NT1: ICC 0.55, 95% CI: 0.32, 0.72; healthy controls: ICC 0.71, 95% CI: 0.53, 0.82). The Bland-Altman plots are presented in Figure 1b for all participants and show that two healthy controls were outliers; one performed better in the remote condition and the other in person. Based on paired t-tests, there was no significant difference between in-person and remote matrix reasoning T-scores (mean difference −0.56, SD 7.51, p = 0.67).

Only 12 participants completed the block design subtest because reliability was clearly poor between testing conditions during this pilot study. Across groups, the ICC was 0.14 (95% CI: 0, 0.56), and there were too few participants in each group for meaningful subgroup analysis. The Bland-Altman plots are presented in Figure 1c for all participants and show that all participants had results within two standard deviations of the mean. Paired t-tests did not show differences between conditions (mean difference −3.08 points, SD 7.80, p = 0.20).
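As a companion to the correlation results reported next, here is a minimal sketch of the two-stage screen described in the methods: a univariate Pearson correlation, followed by a covariate-adjusted regression only when the screen crosses the p < 0.01 threshold. The data frame and all column names are hypothetical stand-ins (the study used SPSS, and its dataset is not shown).

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Hypothetical stand-in data frame; one row per participant
rng = np.random.default_rng(2)
n = 46
df = pd.DataFrame({
    "vocab_t": rng.normal(55, 8, n),        # in-person vocabulary T-score
    "ess_total": rng.integers(0, 21, n),    # Epworth Sleepiness Scale total
    "cshq_total": rng.integers(33, 80, n),  # CSHQ total
    "tst_hours": rng.normal(7.5, 1.0, n),   # actigraphy average TST
    "age": rng.integers(8, 20, n),
    "gender": rng.choice(["F", "M"], n),
    "ethnicity": rng.choice(["A", "B", "C"], n),
})

# Stage 1: univariate Pearson screen; Stage 2: adjusted model only if p < 0.01
for var in ["ess_total", "cshq_total", "tst_hours"]:
    r, p = stats.pearsonr(df[var], df["vocab_t"])
    print(f"{var}: r = {r:.2f}, p = {p:.3f}")
    if p < 0.01:
        fit = smf.ols(f"vocab_t ~ {var} + age + C(gender) + C(ethnicity)",
                      data=df).fit()
        print(fit.summary().tables[1])  # adjusted coefficient for var
```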
The task scores on the vocabulary, matrix reasoning, and block design subtests did not show correlations with ESS, CSHQ, actigraphy TST, or PANAS-C scores in remote or in-person conditions, with one exception (Tables 3a and 3b). The CSHQ total score and in-person vocabulary task results showed a trend toward an association (r = −0.28, p = 0.06). However, linear regression of in-person vocabulary testing results did not show a main effect of CSHQ total score (F = 2.10, p = 0.16) when age, gender, and ethnicity were included in the model. There was little variability in morningness/eveningness scores, which may explain why there were no significant associations.

Our pilot study results show that the use of telehealth to collect verbal and non-verbal IQ scores may offer a feasible and reliable means of acquiring cognitive data for pediatric research through the COVID-19 pandemic and beyond. Between remote and in-person conditions, participants showed good reliability on vocabulary testing, moderate reliability on matrix reasoning testing, and poor reliability on block design testing. Notably, it was the subjectively sleepy and NT1 participants who showed good to excellent reliability on the vocabulary task, whereas healthy controls demonstrated moderate reliability. On matrix reasoning, the subjectively sleepy group again showed excellent reliability across testing conditions; the NT1 group and healthy controls showed moderate reliability. Test results did not significantly differ between conditions based on paired t-tests. Last, we did not find that sleepiness, affect, or sleep influenced WASI-II T-scores in either the home or the in-person condition, suggesting that IQ assessment is robust to these external factors.

Plausibly, agreement between remote and in-person testing was higher in the prior studies cited above than in ours because their remote testing took place at testing or clinic site hubs rather than at home, as in our study. The home environment has more variable computer quality, webcams, and testing conditions than hospital or research-site testing centers, and we did not collect data on participants' home network bandwidth or their perceived visual and sound quality. Our study adds to the literature in presenting data using intraclass correlation testing, which takes rater bias into account and is a better assessment of reliability than Pearson or Spearman correlation testing30 (see the sketch below). Another strength of our study is that we report data collected from healthy controls and not just from patient populations. This yielded some surprising findings, with the healthy controls showing less reliability across testing methods than the patient populations. Though demographic data did not differ between groups, sample bias or unknown confounding factors could contribute to these group discrepancies. While most participants' difference scores between remote and in-person assessments were within two standard deviations of the mean on Bland-Altman plots, there were outliers with a maximum difference of 17 points on the vocabulary task and 22 points on the matrix reasoning task (Figure 1). Such differences would change the interpretation of results (out of normal range), and we therefore suggest that investigators continue to refine remote testing protocols. WASI-II T-scores were not significantly different between testing methodologies, but vocabulary T-scores were higher in the remote condition than in person (mean difference 1.08).
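To make the point about rater bias concrete, the following toy demonstration (synthetic numbers, not study data) adds a constant offset to one testing method: the Pearson correlation remains perfect, while an absolute-agreement ICC, computed here by hand for a two-way model, drops well below 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_score = rng.normal(50, 10, 30)            # latent ability, T-score scale
in_person = true_score + rng.normal(0, 2, 30)  # in-person measurement
remote = in_person + 8                         # remote reads a constant 8 high

r, _ = stats.pearsonr(in_person, remote)       # r = 1.0: blind to the offset

# ICC(A,1): two-way model, absolute agreement, single measurement
x = np.column_stack([in_person, remote])       # n subjects x k methods
n, k = x.shape
grand = x.mean()
ms_r = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)  # between subjects
ms_c = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)  # between methods
ms_e = (np.sum((x - x.mean(axis=1, keepdims=True)
                  - x.mean(axis=0, keepdims=True) + grand) ** 2)
        / ((n - 1) * (k - 1)))                              # residual
icc_a1 = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

print(f"Pearson r = {r:.3f}, ICC(A,1) = {icc_a1:.3f}")  # r = 1, ICC clearly < 1
```

Note that a consistency-type ICC would, like Pearson, ignore a constant offset; it is the absolute-agreement form that penalizes systematic bias between conditions.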
Similar studies conducted with adult participants have also noted better cognitive test scores via telehealth than in person, and the authors suggested that participants may find it easier to focus and feel less pressure to perform when the examiner is not in the room directly observing them.2,6 However, the better remote performance in our study could also reflect the fact that remote testing tended to occur earlier in the afternoon/evening than the in-person assessments and that NT1 participants were allowed to stay on wake-promoting medications for the remote condition only. More standardized timing and conditions of assessment, as well as collection of additional measures such as momentary fatigue and sleepiness immediately before testing, are recommended for future protocols. In contrast, our participants had higher scores on non-verbal tasks when administered in person. Methods employed by Hodge et al.,10 which included a split-screen mechanism (one panel showing the evaluator, the other presenting the design to replicate) and touch-screen technology that made it easier to record responses, may yield more reliable results on non-verbal tasks.

There are additional limitations to our pilot study beyond the small sample size. First, we tested only a subset of the WASI-II; our results may not apply to other cognitive assessments. Second, although all our research assistants were trained to administer the tests in the same way, there may have been unstudied differences in administration over the course of the study. Another limitation is the decision to administer the same test within a one-week span: participants may have remembered some of the questions, although the counterbalanced design mitigates such learning effects.

Remote cognitive assessment in patients with sleep disorders and/or daytime sleepiness holds great promise for pediatric sleep and neurology research. Importantly, we did not find any associations between WASI-II T-scores and subjective or objective nocturnal sleep measures, affect, or baseline sleepiness, providing evidence that IQ assessment is robust in both testing conditions. Based on our experience, we suggest that future protocols using remote cognitive assessment employ improved technology (such as split screens and touch screens) to assess non-verbal IQ. While the chosen WASI-II tasks allowed us to obtain baseline IQ scores, they do not give a detailed profile of cognition and functional problems. The reliability of remote testing of cognitive domains such as executive functioning, memory, and attention still needs further study in children and adolescents with neurological conditions and sleep disorders. We hope others build on our experience, as telehealth neuropsychological testing is needed during the COVID-19 pandemic and offers great promise for collecting data in a less costly and burdensome fashion.

Written informed consent was obtained from the patient(s) for their anonymized information to be published in this article. Dr Maski has received prior consulting fees from Harmony Biosciences, Alkermes, and Jazz Pharmaceuticals. She received an investigator-initiated grant from Jazz Pharmaceuticals. The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: this work was supported by a Boston Children's Hospital Office of Faculty Development Grant and by NINDS of the National Institutes of Health under award number K23 NS104267-01A1 to KM. This study was approved by the IRB at Boston Children's Hospital (P00024211). Informed consent and assent were obtained from all participants per IRB regulation.

References

1. Psychological and behavioral impact of lockdown and quarantine measures for COVID-19 pandemic on children, adolescents and caregivers: a systematic review and meta-analysis.
2. The consistency of neuropsychological assessments performed via telecommunication and face to face.
3. A comparison of intellectual assessments over video conferencing and in-person for individuals with ID: preliminary data.
4. Feasibility of telecognitive assessment in dementia.
5. Teleneuropsychology: evidence for video teleconference-based neuropsychological assessment.
6. Neuropsychological assessment and telemedicine: a preliminary study examining the reliability of neuropsychology services performed via telecommunication.
7. Feasibility of neuropsychological testing of older adults via videoconference: implications for assessing the capacity for independent living.
8. Consumer acceptability of brief videoconference-based neuropsychological assessment in older individuals with and without cognitive impairment.
9. Remote assessment of cognitive function in juvenile neuronal ceroid lipofuscinosis (Batten disease): a pilot study of feasibility and reliability.
10. Agreement between telehealth and face-to-face assessment of intellectual ability in children with specific learning disorder.
11. The feasibility of videoconferencing for neuropsychological assessments of rural youth experiencing early psychosis.
12. International Classification of Sleep Disorders.
13. ZOOM Cloud Meetings. Version 5.4.7. Zoom Video Communications, Inc.
14. Vidyo. Vidyo, Inc.
15. Research electronic data capture (REDCap): a metadata-driven methodology and workflow process for providing translational research informatics support.
16. Wechsler Abbreviated Scale of Intelligence-Second Edition (WASI-II).
17. Development and validation of brief measures of positive and negative affect: the PANAS scales.
18. A measure of positive and negative affect for children: scale development and preliminary validation.
19. The Children's Sleep Habits Questionnaire (CSHQ): psychometric properties of a survey instrument for school-aged children.
20. Sleep, chronotype, and sleep hygiene in children with attention-deficit/hyperactivity disorder, autism spectrum disorder, and controls. Eur Child Adolesc Psychiatry.
21. Sleep problems in children and adolescents with epilepsy: associations with psychiatric comorbidity.
22. Atomoxetine, parent training, and their effects on sleep in youth with autism spectrum disorder and attention-deficit/hyperactivity disorder.
23. Association between puberty and delayed phase preference.
24. Validation of the Epworth sleepiness scale for children and adolescents using Rasch analysis.
25. A new method for measuring daytime sleepiness: the Epworth sleepiness scale.
26. Practice parameters for the use of actigraphy in the assessment of sleep and sleep disorders: an update for 2007. Sleep.
27. Validation of actigraphy in middle childhood.
28. A guideline of selecting and reporting intraclass correlation coefficients for reliability research.
29. Measuring agreement in method comparison studies.
30. Intraclass correlations: uses in assessing rater reliability.

Clinical trial registration: not applicable, because this article does not contain any clinical trials.

ORCID iD: Jennifer Worhach, https://orcid.org/0000-0002-5254-2894