key: cord-102918-bkyk7or9
authors: Burns, C. Sean; Nix, Tyler; Shapiro, Robert M.; Huber, Jeffrey T.
title: Methodological Issues with Search in MEDLINE: A Longitudinal Query Analysis
date: 2020-05-22
journal: bioRxiv
DOI: 10.1101/2020.05.22.110403
doc_id: 102918
cord_uid: bkyk7or9

This study compares the results of a longitudinal query analysis of the MEDLINE database as hosted on multiple platforms, including PubMed, EBSCOhost, Ovid, ProQuest, and Web of Science, to identify variations in search results across platforms after controlling for search query syntax. We devised twenty-nine sets of search queries, each comprising five queries, one for each of the five MEDLINE platforms. We ran the queries monthly for a year and collected search result counts to observe changes over time. We found that search results vary considerably depending on the MEDLINE platform, both within sets and across time. The variation is due to four factors: trends in scholarly publication, such as publishing online first versus publishing in journal issues, which lead to metadata differences in the bibliographic record; differences in the specificity of the search fields each platform provides; database integrity issues that produce large month-to-month fluctuations in results for the same query; and database currency issues that arise from when each platform updates its MEDLINE file.

Specific bibliographic databases, like PubMed and MEDLINE, are used to inform clinical decision-making, create systematic reviews, and construct knowledge bases for clinical decision support systems.
Because these databases serve as essential information retrieval and discovery tools for identifying and collecting research data, and because they are used across a broad range of fields and as the basis of multiple research designs, this study should help clinicians, researchers, librarians, informationists, and others understand how these platforms differ and inform future work on their standardization.

To answer these questions, we base our analytical framework on the concepts of methods reproducibility and results reproducibility [50]. Methods reproducibility is "the ability to implement, as exactly as possible, the experimental and computational procedures, with the same data and tools, to obtain the same results," and results reproducibility is "the production of corroborating results in a new study, having followed the same experimental methods" (A New Lexicon for Research Reproducibility section, para. 2). We do not apply the concept of inferential reproducibility in this paper, because it pertains to the conclusions a study draws from the reproduced methods. It would largely apply if we assessed the relevance of the results to an information need rather than, as we do, focusing solely on the reproducible sets of search queries and the records those queries return.

The search queries, tested in the pilot studies, were designed to be semantically and logically equivalent within each set. Queries within a set differed only where needed to follow each platform's required query syntax.

Note: Column meanings: The Keyword column indicates how many keywords were used in the query, not counting field-specific keywords such as document title, journal title, or author name. The latter are counted in the FieldSpecific column, which indicates the number of field-specific terms used in the query. The MeSH column indicates how many MeSH terms were used in the query.
The Branches column indicates how many trees a MeSH term belongs to. The PubDate column is binary, indicating whether a query does not include a publication date (0) or includes one (1). The Explode column indicates whether a MeSH term was not exploded (0) or exploded (1), or, in queries with multiple MeSH terms, whether at least one term was exploded and one was not (2). The AND, OR, and NOT columns count how many of each Boolean operator were used in the query. In legacy PubMed, queries require an 'and medline [sb]' tag to limit results to MEDLINE and exclude the rest of PubMed; these ANDs were not counted. We did count ANDs used to join terms or to add publication date ranges, including in Ovid/MEDLINE queries, even though Ovid/MEDLINE technically uses the limit operator rather than the AND operator.

References
Databases as scientific instruments and their role in the ordering of scientific work.
Scholarship and disciplinary practices.
Should meta-analysts search Embase in addition to Medline?
Examining the role of MEDLINE as a patient care information resource: an analysis of data from the Value of Libraries study.
Human(e) Factor in Clinical Decision Support Systems.
Sources of polysemy in indexing practice: The case of games, experimental in MeSH.
Emerging trends and new developments in information science: a document co-citation analysis.
Classical databases and knowledge organization: A case for boolean retrieval and human decision-making during searches.
Can we prioritise which databases to search? A case study using a systematic review of frozen shoulder management.
Comparing the coverage, recall, and precision of searches for 120 systematic reviews in Embase, MEDLINE, and Google Scholar: a prospective study.
A comparison of the performance of seven key bibliographic databases in identifying all relevant systematic reviews of interventions for hypertension.
Availability of renal literature in six bibliographic databases.
Integrating evidence-based practice and information literacy skills in teaching physical and occupational therapy students.
Searching PubMed for a broad subject area: how effective are palliative care clinicians in finding the evidence in their field?
A Content Analysis of Strategies and Tactics Observed Among MLIS Students in an Online Searching Course.
A learning-based approach for performing an in-depth literature search using MEDLINE.
Breaking records: The history of bibliographic records and their influence in conceptualizing bibliographic data.
Improving information retrieval using Medical Subject Headings Concepts: a test case on rare and chronic diseases.
The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research.
1,500 scientists lift the lid on reproducibility.
Estimating the reproducibility of psychological science.
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement.
Cochrane Handbook for Systematic Reviews of Interventions. Available: /handbook
Bibliographic database access using free-text and controlled vocabulary: an evaluation.
Full text database retrieval performance.
A checklist to assess database-hosting platforms for designing and running searches for systematic reviews.
Comparison of CINAHL® via EBSCOhost®, OVID®, and ProQuest®.
A comparison of searching the Cochrane library databases via CRD, Ovid and Wiley: implications for systematic searching and information services.
Updated Algorithm for the PubMed Best Match Sort Order. NLM Technical Bulletin.
Increasing number of databases searched in systematic reviews and meta-analyses between 1994 and 2014.
Compliance of systematic reviews in veterinary journals with Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) literature search reporting guidelines.
Database selection in systematic reviews: an insight through clinical neurology.
The comparative recall of Google Scholar versus PubMed in identical searches for biomedical systematic reviews: a review of searches used in systematic reviews.
Searching one or two databases was insufficient for meta-analysis of observational studies.
Analysis of the reporting of search strategies in Cochrane systematic reviews.
Information Retrieval in Telemedicine: a Comparative Study on Bibliographic Databases.
Bias on the web.
Search engine coverage bias: evidence and possible causes.
Evaluating the usability and usefulness of a digital library.
The Impact of Query Interface Design on Stress, Workload and Performance.
Development of a Search Strategy for an Evidence Based Retrieval Service.
What a difference an interface makes: just how reliable are your search results? Focus Altern Complement Ther.
When is a search not a search? A comparison of searching the AMED complementary health database via EBSCOhost, OVID and DIALOG.
Developing search strategies for clinical practice guidelines in SUMSearch and Google Scholar and assessing their retrieval performance.
Comparison of journal title coverage between CINAHL and Scopus.
Medical literature searches: a comparison of PubMed and Google Scholar.
What does research reproducibility mean?
The New PubMed is Here. U.S. National Library of Medicine.
Advanced PubMed Searching Resource Packet. U.S. National Library of Medicine.
Systematic bias in cancer patient-reported outcomes: symptom 'orphans' and 'champions'.
New MeSH Supplementary Concept Record for the 2019 Novel Coronavirus, Wuhan, China. NLM Technical Bulletin.
Google Scholar: the pros and the cons.
Is the coverage of Google Scholar enough to be used alone for systematic reviews.
Google Scholar is not enough to be used alone for systematic reviews.
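The counting rules in the table note (the MeSH and Explode columns, and the AND, OR, and NOT counts that leave out the AND attaching the MEDLINE subset limit) can be illustrated with a short Python sketch. This is a hypothetical helper, not the authors' coding procedure. It assumes legacy PubMed syntax with uppercase Boolean operators, treats [mh] and [mesh] tags as exploded (the PubMed default) and [mh:noexp] or [mesh:noexp] tags as not exploded, and covers only a few of the columns. The names code_query, explode_code, and QueryFeatures are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumptions (not from the paper): legacy PubMed syntax, uppercase Boolean
# operators, MeSH terms tagged [mh]/[mesh] (exploded by default) or
# [mh:noexp]/[mesh:noexp] (not exploded).
MEDLINE_LIMIT = re.compile(r"\s+AND\s+medline\s*\[sb\]", re.IGNORECASE)
QUOTED_PHRASE = re.compile(r'"[^"]*"')
OPERATOR = re.compile(r"\b(AND|OR|NOT)\b")
MESH_TAG = re.compile(r"\[(?:mh|mesh)(:noexp)?\]", re.IGNORECASE)


@dataclass
class QueryFeatures:
    """Hypothetical record holding a few of the table-note columns."""
    mesh: int
    explode: Optional[int]  # 0, 1, 2, or None when the query has no MeSH terms
    and_count: int
    or_count: int
    not_count: int


def explode_code(exploded: List[bool]) -> Optional[int]:
    """0 = no MeSH term exploded, 1 = all exploded, 2 = mixed."""
    if not exploded:
        return None  # assumption: the note does not say how MeSH-free queries are coded
    if all(exploded):
        return 1
    if not any(exploded):
        return 0
    return 2


def code_query(query: str) -> QueryFeatures:
    # Drop the 'and medline [sb]' limit; the note says its AND is not counted.
    body = MEDLINE_LIMIT.sub("", query)
    # Blank out quoted phrases so operator words inside them are not counted.
    unquoted = QUOTED_PHRASE.sub(" ", body)
    ops = OPERATOR.findall(unquoted)
    # A MeSH tag without ':noexp' is treated as exploded.
    exploded = [m.group(1) is None for m in MESH_TAG.finditer(body)]
    return QueryFeatures(
        mesh=len(exploded),
        explode=explode_code(exploded),
        and_count=ops.count("AND"),
        or_count=ops.count("OR"),
        not_count=ops.count("NOT"),
    )
```

For a query such as "games, experimental"[mh:noexp] AND neoplasms[mh] AND ("2018/01/01"[dp] : "2018/12/31"[dp]) and medline[sb], the sketch counts two ANDs (one joining terms, one adding the date range), two MeSH terms, and an Explode code of 2. Returning None for queries without MeSH terms is a further assumption, since the note does not say how such queries were coded.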