key: cord-0950387-xlq3v7d7
authors: Guler, Mehmet Akif; Aydın, Esref Orkun
title: Development and validation of a tool for evaluating YouTube-based medical videos
date: 2021-11-25
journal: Ir J Med Sci
DOI: 10.1007/s11845-021-02864-0
sha: 797f805fffcd6d21d10cbff3781753f890c52af1
doc_id: 950387
cord_uid: xlq3v7d7

BACKGROUND/AIMS: Today, one of the main ways in which people access medical information is the internet. Our objective was to develop a measurement tool to assess the quality of online medical videos.

METHODS: Online videos covering a variety of subjects (COVID-19, low back pain, weight loss, hypertension, cancer, chest pain, vaccination, asthma, allergy, and cataracts) were evaluated with our Medical Quality Video Evaluation Tool (MQ-VET) by 25 medical and 25 non-medical professionals. Exploratory factor analysis, Cronbach's alpha, and correlation coefficients were used to assess the validity and reliability of the MQ-VET.

RESULTS: The final MQ-VET consisted of 15 items in four sections. The Cronbach's alpha reliability coefficient for the full MQ-VET was 0.72, and the internal consistency of all factors was good (0.73 to 0.81). The correlation between DISCERN questionnaire scores and MQ-VET scores was significant.

CONCLUSION: Collectively, our findings indicate that the MQ-VET is a valid and reliable tool that will help to standardize future evaluations of online medical videos.

Eight out of ten internet users access medical information online [1]. The YouTube platform, in particular, allows users to create medical content without any obligation to post verified information [2, 3]. In 2007, Keelan et al. first examined the quality of immunization-related online videos [4], and many subsequent studies have further assessed the reliability of medical videos on YouTube; at present, the search term "YouTube" returns more than 1,500 publications on PubMed and Scopus (accessed 17 Jan 2021) [1]. However, a standardized tool for evaluating medical health videos is lacking. Most previous studies used novel, topic-specific scoring systems based on the literature and the authors' own knowledge [5]. The generalizability of these scoring systems is poor, their results are difficult to reproduce, and their validity and reliability have not been adequately measured.

A variety of tools to evaluate the accuracy of medical information are available, including the DISCERN instrument, the Health on the Net (HON) code, the Journal of the American Medical Association (JAMA) evaluation system, the brief DISCERN instrument, the global quality score (GQS), and the video power index (VPI); medical videos can also be evaluated subjectively [1, 5, 6]. The HON Foundation devised eight principles, known as the HONcode, for websites to abide by [7]. HONcode certification is available for a fee, but it does not rate the quality of medical information, and the validity and reliability of this system for YouTube videos have not been confirmed. The JAMA scoring system was created to evaluate medical information on websites [8] but has not been validated for videos. The DISCERN instrument was created nearly 20 years ago for application to "written information about treatment choices" [9, 10]. Again, this instrument has not been validated for medical videos. In addition, the second part of the DISCERN questionnaire focuses on treatment information, so videos that exclude treatment information yield misleading results [10].

The VPI score, a measure of audience approval, is calculated as the number of likes a video has received divided by the total number of likes and dislikes [11]. Although frequently used, this scoring system is not suitable for evaluating the quality and reliability of medical videos.
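For concreteness, the like-based ratio described above can be written as

VPI = likes / (likes + dislikes)

so that a hypothetical video with 800 likes and 200 dislikes would score 800 / 1,000 = 0.80 (some studies express this ratio as a percentage). The numbers here are illustrative only.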
Given the lack of suitable instruments, we developed a reliable instrument, the Medical Quality Video Evaluation Tool (MQ-VET), for use by both patients and healthcare professionals.

Our original questionnaire included 42 novel items based on published evaluations of medical video quality. All questionnaires used in YouTube-related articles, as well as the questions used in subjective evaluations, were examined by both authors [1, 5, 12-14]. Candidate items were rated by the authors (0 points, not applicable; 10 points, highly applicable). Duplicate questions and those with a score below the average were excluded, leaving a total of 28 questions.

Videos were evaluated by 25 medical and 25 non-medical participants, all of whom had obtained a sufficient score on one of the nationally recognized English language tests and were fluent in English. Participants rated the questionnaire items in terms of quality and relevance (0-10 points for each item). The face and content validity of the questionnaire were also evaluated using the same 10-point rating system. After items with a score < 7 were excluded, 19 items remained.

Ten unique videos, differing in uploader and medical topic (the first video returned for each popular topic across different medical subjects, using YouTube's default search settings), were evaluated using a 5-point Likert scale (Table 1). The DISCERN instrument was also used to evaluate each video, to assess concurrent validity.

Statistical analyses were performed using SPSS ver. 20.0 (IBM, Armonk, NY, USA). Data distribution was examined using the Shapiro-Wilk test and histograms. Continuous variables are expressed as means ± standard deviation (SD) with ranges, and categorical variables as numbers and percentages. For item analysis, kurtoses, item-item correlations (IIC), and item-total correlations were calculated. Exploratory factor analysis (EFA) was conducted to verify construct validity. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity were used to check whether the data were suitable for EFA; in general, KMO values between 0.8 and 1 indicate that the sampling is adequate, whereas values below 0.6 indicate that it is not [15]. Factors in the EFA were extracted using principal components analysis with varimax (kappa 4) rotation. Reliability was assessed using Cronbach's alpha. Spearman's correlation coefficients between MQ-VET and DISCERN scores were used to assess concurrent validity. After the study was completed, a post hoc power analysis was performed using G*Power version 3.1.9.2 (Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany). For the bivariate normal correlation model from the exact test family, using the correlation between the MQ-VET and DISCERN scores, the post hoc power was calculated as 0.81 (Table 2).
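As an illustration of the analysis just described, the sketch below walks through the main steps (KMO, Bartlett's test, EFA with varimax rotation, Cronbach's alpha, Spearman correlation, and an approximate post hoc power calculation) in Python. This is a minimal sketch, not the authors' SPSS/G*Power workflow: the data and function names (ratings, mqvet_total, discern_total, validate) are hypothetical, the factor_analyzer and scipy packages stand in for SPSS, and the power function uses the Fisher-z approximation rather than G*Power's exact bivariate normal test.

```python
# Illustrative sketch of the validation steps described above (not the authors' code).
# Assumes `ratings` is a pandas DataFrame of item scores (rows = video x rater,
# columns = MQ-VET items), and that per-video MQ-VET and DISCERN totals are given.

import numpy as np
import pandas as pd
from scipy.stats import norm, spearmanr
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                              calculate_kmo)


def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the total)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)


def posthoc_power_correlation(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate post hoc power for testing rho = 0 (Fisher-z approximation;
    G*Power's exact bivariate normal test will give slightly different values)."""
    z = np.arctanh(abs(r)) * np.sqrt(n - 3)
    return float(norm.cdf(z - norm.ppf(1 - alpha / 2)))


def validate(ratings: pd.DataFrame, mqvet_total: pd.Series, discern_total: pd.Series) -> dict:
    # Sampling adequacy and sphericity (KMO >= 0.8 treated as adequate, as in the text).
    _, kmo_model = calculate_kmo(ratings)
    chi_square, bartlett_p = calculate_bartlett_sphericity(ratings)

    # Exploratory factor analysis: principal components extraction, varimax rotation.
    efa = FactorAnalyzer(n_factors=4, rotation="varimax", method="principal")
    efa.fit(ratings)
    loadings = pd.DataFrame(efa.loadings_, index=ratings.columns)

    # Internal consistency of the full scale (per-factor alphas use column subsets).
    alpha_total = cronbach_alpha(ratings)

    # Concurrent validity: Spearman correlation between MQ-VET and DISCERN totals.
    rho, rho_p = spearmanr(mqvet_total, discern_total)
    power = posthoc_power_correlation(rho, n=len(mqvet_total))

    return {"kmo": kmo_model, "bartlett_chi2": chi_square, "bartlett_p": bartlett_p,
            "loadings": loadings, "cronbach_alpha": alpha_total,
            "spearman_rho": rho, "spearman_p": rho_p, "posthoc_power": power}
```

In practice, the loadings table would be inspected and weakly or cross-loading items dropped; a step of this kind is what reduces a 19-item draft to a shorter final scale.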
The mean age of the participants was 30.98 ± 4.38 years (range: 25-42 years). Their professions were as follows: doctor (44%, n = 22), pharmacist (6%, n = 3), academic/teacher (20%, n = 10), and engineer (24%, n = 12); profession data were missing in three cases. There were 23 (46%) participants with a bachelor's degree, 6 (12%) with a master's degree, and 21 (42%) with a doctorate.

The Kaiser-Meyer-Olkin (KMO) value for the 19-item MQ-VET was 0.83 and the Bartlett's test statistic was χ² = 3920.72 (p < 0.001); thus, the data were suitable for further analysis. The first exploratory factor analysis (EFA) yielded five factors. The component correlation matrix was orthogonal, and varimax rotation with Kaiser normalization was applied. The factor loadings of the final EFA are displayed in Table 3. Ultimately, our questionnaire included 15 items across four factors (5, 4, 3, and 3 items for factors 1-4, respectively). The correlation between the final form of the MQ-VET and the DISCERN questionnaire was used to assess concurrent validity; the scores of the two questionnaires are shown in Table 2. Regarding internal consistency, the Cronbach's alpha values were 0.81, 0.78, 0.75, and 0.73 for factors 1-4, respectively, and the Cronbach's alpha reliability coefficient for the overall MQ-VET questionnaire was 0.72. Collectively, these results confirmed the validity and reliability of the MQ-VET questionnaire.

Although previous publications have discussed the quality of medical videos on YouTube [16-18], standardized assessment tools were not utilized; typically, de novo questionnaires were devised based on the literature and the authors' own knowledge [1, 5]. Several tools exist for the evaluation of written online information [5], but their applicability to videos is not known [7]. The DISCERN questionnaire is designed to evaluate treatment options [10] and is therefore inappropriate for analyzing videos lacking treatment information. The JAMA questionnaire, the GQS, and the HONcode were likewise created to evaluate medical websites and written information on the internet. The VPI was designed specifically for videos, but it assesses popularity rather than quality and content, and popularity-based scores change over time, which impairs repeatability [11]. The MQ-VET resolves the aforementioned issues, and its validity and reliability have been demonstrated for a variety of medical topics. In addition, the MQ-VET was designed for use by both medical professionals and the general population. Evaluation of additional medical topics by more reviewers will provide further support for the MQ-VET, while translation into other languages will increase its utility. This study was limited by the small number of participants and videos, and by the lack of test-retest reliability data for the MQ-VET. However, we believe that these issues will be addressed in future studies (Table 4).

In conclusion, we have developed a questionnaire to evaluate the quality of online medical videos posted by both medical professionals and members of the general public. We believe that this tool will help standardize evaluations of online videos.

MQ-VET items

Part 1
1. Dates of updates, if any, are clearly stated
2. The recording date of the video and the date on which the information was accessed are mentioned
3. The resources and references used are clearly stated
4. Concerns about advertising and potential conflicts of interest have been resolved
5. Sufficient information was provided about the identity of the presenter in the video

Part 2
6. The materials used in the video facilitated learning
7. The video covered the basic concepts of the subject
8. Visual resources were used sufficiently to explain the medical topic
9. The medical terms used were well explained

Part 3
10. The sound quality of the video was sufficient
11. The image quality of the video was sufficient
12. The information in the video is clear and understandable

References
1. Healthcare information on YouTube: a systematic review
2. Using the internet for health-related activities: findings from a national probability sample
3. YouTube: online video and participatory culture
4. YouTube as a source of information on immunization: a content analysis
5. Medical YouTube videos and methods of evaluation: literature review
6. A systematic review of patient inflammatory bowel disease information resources on the World Wide Web
7. Improving the transparency of health information found on the internet through the HONcode: a comparative study
8. Assessing, controlling, and assuring the quality of medical information on the internet: Caveant lector et viewor - let the reader and viewer beware
9. Brief DISCERN, six questions for the evaluation of evidence-based content of health-related websites
10. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices
11. Evaluating the accuracy and quality of the information in kyphosis videos shared on YouTube
12. English-language videos on YouTube as a source of information on self-administer subcutaneous anti-tumour necrosis factor agent injections
13. Content of widely viewed YouTube videos about celiac disease
14. An exploratory assessment of weight loss videos on YouTube™
15. Factor analysis as a tool for survey analysis
16. YouTube as a source of information on fibromyalgia
17. YouTube as a source of patient information for ankylosing spondylitis exercises
18. A quality analysis of disc herniation videos on YouTube

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.