key: cord-0516563-36nemqxa
authors: Pallath, Akash; Zhang, Qiyang
title: Paperfetcher: A tool to automate handsearch for systematic reviews
date: 2021-10-24
journal: nan
DOI: nan
sha: 3a01d3e7c19864cb2e457db03bd5be01f8e712f6
doc_id: 516563
cord_uid: 36nemqxa

Handsearch is an important technique that contributes to thorough literature search in systematic reviews. Traditional handsearch requires reviewers to systematically browse through each issue of a curated list of field-specific journals and conference proceedings to find articles relevant to their review. This manual process is not only time-consuming, laborious, costly, and error-prone, but it also lacks replicability and cross-checking mechanisms. In an attempt to solve these problems, this paper presents a free and open-source Python package and an accompanying web-app, Paperfetcher, to automate handsearch for systematic reviews. With Paperfetcher's assistance, researchers can retrieve articles from designated journals within a specified time frame with just a few clicks. In addition to handsearch, this tool also incorporates snowballing in both directions. Paperfetcher allows researchers to download retrieved studies as a list of DOIs or as an RIS database to facilitate seamless import into citation management and systematic review screening software. To our knowledge, Paperfetcher is the first tool that automates handsearch with high usability and a multi-disciplinary focus.

Literature search is an early and crucial step for performing a comprehensive systematic review [1] [2] [3]. The process of literature search often begins with a field-related bibliographic database search, which retrieves studies from electronic records that index journals and non-journal sources [4]. Limiting literature search to bibliographic search risks missing high-quality papers [3, 5] that were not indexed with identifiable terms [6], formatted as abstracts or letters [2, 3], located in supplement editions of journals [2], or not included in electronic databases [7]. Handsearch is an additional step that reviewers often undertake to identify such studies, which involves systematically browsing through the tables of contents from a curated list of field-specific journals, abstracts, and conference proceedings in order to gather papers relevant to the synthesis topic [4]. To ensure comprehensive information retrieval and transparent reporting, it is important to conduct literature search in a systematic and exhaustive manner [8]. Combining handsearch with bibliographic database search decreases the likelihood of missing major relevant studies and therefore underpins a solid foundation for the systematic review [9].

Despite its importance, the prevailing practice of handsearch urgently needs development. Advancing handsearch strategy is also timely due to the rapid increase in studies available online and the subsequent increase in the amount of time required to retrieve papers. Traditionally, handsearch requires reviewers or hired volunteers to manually read through the tables of contents and abstracts of tens to hundreds of journal issues, supplementary materials, and conference proceedings [4, 8]. Past literature agrees on the time- and labor-intensive nature of handsearch, reporting time spent on this task ranging from an hour per journal volume [10] to 185 hours for 10 journals [11]. This procedure is not only time-consuming, laborious [4], and costly [10], but more importantly, it is error-prone due to human fatigue [12] and, because of its cumbersome manual nature, lacks replicability and an easy cross-checking mechanism.

* Authors are listed alphabetically. Both authors contributed equally to this work.

To address these problems, we developed a freely available Python package and an accompanying web-app, Paperfetcher, that automates handsearch to increase efficiency and ensure replicability. Paperfetcher automatically fetches works from a user-selected list of journals within a user-specified timeframe. Paperfetcher not only returns article metadata, but it also generates a report of the parameters used to perform the handsearch, which can assist with replicability. Automating handsearch makes systematic reviews less error-prone and, more importantly, enables researchers to focus their energy on the screening process rather than the search process. Paperfetcher addresses the urgent need for a more replicable, robust, and time-efficient method to conduct handsearch.

In addition to handsearch, Paperfetcher also includes a snowballing function for forward and backward citation chasing [13] [14] [15] [16]. Forward citation chasing searches for articles that cite a given article of interest, while backward citation chasing searches for articles cited by the article of interest. Snowballing, which is an umbrella term for citation chasing in both directions, is a very useful supplementary search strategy [5]. Researchers have found that it identifies 51% of studies in systematic reviews [16]. We implemented snowballing in Paperfetcher due to its important contribution to literature search.

arXiv:2110.12490v3 [cs.IR] 7 Jan 2022

While there are several existing tools to automate snowballing (further discussed in Section IV), to the best of our knowledge, there is no other tool in the current market that provides an easy-to-use interface to automate handsearch.

Paperfetcher consists of a) a Python package which implements the handsearch and snowballing algorithms to retrieve raw data and convert it to various output formats (such as text and Research Information Systems (RIS) format), and b) a web-app that provides an easy-to-use graphical interface for the package. In the following subsections, we describe how the Python package and the web-app work.

Paperfetcher's handsearch algorithm queries the database of academic content registered with Crossref. Crossref is an official digital object identifier (DOI) registration agency of the International DOI Foundation [17]. Research suggests that Crossref is the most robust and holistic implementation of the DOI model [17], and that it covers a wide range of disciplines. Among the 11 official DOI registration agencies, only Crossref, DataCite, and Multilingual European DOI Registration Agency (mEDRA) cover English materials related to scholarly and professional research content [18]. We excluded DataCite as it only indexes research data and not research articles or reports [18]. As mutual collaboration between mEDRA and Crossref allows DOIs registered with mEDRA to be deposited on the Crossref platform [19], we excluded mEDRA to avoid redundancy. Since Paperfetcher is primarily designed for English-reading researchers, we excluded DOI agencies that contain materials in languages other than English, such as China National Knowledge Infrastructure [20], or that index non-academic content, such as the Entertainment Identifier Registry [21]. Non-English DOI agencies can be added in future versions of Paperfetcher once demand for multilingual research content arises.

Statistics updated on December 31, 2021 show that Crossref covers more than 1.3 million published and unpublished content types [22], such as journals, books, conference proceedings, dissertations, working papers, technical reports, and data sets. It provides more than 90,000 records of journals and more than 80,000 records of conference proceedings to empower an effective handsearch [22]. A comparison of different databases, including Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic, concluded that Crossref is a bibliographic data source that is of significant interest for bibliometric analyses and is becoming increasingly valuable over the years [23]. Compared to the widely used Scopus database, Crossref covers a larger number of documents that have been published in journals [23].

Paperfetcher's snowballing algorithm queries both the Crossref database and COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations. COCI is a database derived from Crossref, which contains more than 1.2 billion DOI-to-DOI citation pairs of Crossref-deposited articles with open citations [24, 25]. The COCI database is updated periodically to add more citations. As of January 2, 2022, the last update to the database was on November 25, 2021 [24].

Each handsearch has three inputs: the International Standard Serial Number (ISSN) of the journal to fetch articles from, a date range within which to fetch articles, and an optional list of keywords to refine the search. We do not recommend using keywords when performing handsearch for systematic reviews as there is a possibility of missing relevant studies which were not indexed in Crossref with matching keywords. However, this input can still be useful for performing literature searches for specific purposes, or if the reviewers are time-constrained and have to filter through a large number of matching articles.

Paperfetcher queries the Crossref database through its Representational State Transfer Application Programming Interface (REST API) with these three inputs to retrieve metadata (containing fields such as article title, authors, journal, publisher, publishing date, abstract, and keywords) of all articles from the journal with the specified ISSN, within the selected date range, and matching the list of keywords (if specified). The Paperfetcher web-app can iterate through a list of journals, perform handsearch for each of them, and merge the retrieved metadata into a single dataset of search results.
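To make the query concrete, the following is a minimal sketch of how such a request could be assembled against the public Crossref REST API. It is not Paperfetcher's own code: the function names `build_handsearch_url` and `extract_dois` are hypothetical, and we assume the `journals/{ISSN}/works` route with the `from-pub-date` and `until-pub-date` filters, the free-text `query` parameter, and cursor-based paging.

```python
from urllib.parse import quote, urlencode

CROSSREF_API = "https://api.crossref.org"


def build_handsearch_url(issn, from_date, until_date, keywords=None,
                         rows=100, cursor="*"):
    """Build a Crossref URL listing works from one journal in a date range.

    Hypothetical helper for illustration; dates are ISO strings (YYYY-MM-DD).
    """
    # Crossref combines several filters in one comma-separated parameter.
    date_filter = f"from-pub-date:{from_date},until-pub-date:{until_date}"
    params = {"filter": date_filter, "rows": rows, "cursor": cursor}
    if keywords:
        # Optional free-text query; omitted entirely for a pure handsearch.
        params["query"] = " ".join(keywords)
    return f"{CROSSREF_API}/journals/{quote(issn)}/works?{urlencode(params)}"


def extract_dois(response):
    """Collect DOIs from the 'items' list of a decoded Crossref JSON response."""
    items = response.get("message", {}).get("items", [])
    return [item["DOI"] for item in items if "DOI" in item]


if __name__ == "__main__":
    # Example: all works from one journal (ISSN chosen for illustration only).
    print(build_handsearch_url("1520-5126", "2020-01-01", "2020-01-31"))
```

Iterating over a list of journals then amounts to calling the URL builder once per ISSN, following the cursor until no further items are returned, and concatenating the extracted records.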

Paperfetcher requires a list of DOIs as input for both forward and backward snowballing. For backward snowballing, it queries the Crossref REST API to retrieve DOIs of papers cited by the input DOIs. The Crossref REST API only returns information about articles cited by a given paper, and not articles citing it. The COCI REST API, however, can return this data, so Paperfetcher uses COCI for forward snowballing. In both cases, Paperfetcher compiles the union of the retrieved DOIs into a dataset of search results.
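The two directions can be sketched as follows. This is an illustrative sketch rather than Paperfetcher's implementation: the function names are hypothetical, and we assume that a Crossref `works/{DOI}` response lists cited works under `message.reference` (each with an optional `DOI` field) and that the COCI `citations/{DOI}` operation returns a list of records with a `citing` field. The sketch works on already-decoded JSON, so it runs without network access.

```python
# Endpoint templates (assumed, as publicly documented at the time of writing).
CROSSREF_WORK_URL = "https://api.crossref.org/works/{doi}"
COCI_CITATIONS_URL = "https://opencitations.net/index/coci/api/v1/citations/{doi}"


def backward_dois(crossref_work):
    """DOIs cited by a paper, from a decoded Crossref works/{DOI} response."""
    references = crossref_work.get("message", {}).get("reference", [])
    # References deposited without a DOI cannot be chased and are skipped.
    return {ref["DOI"].lower() for ref in references if "DOI" in ref}


def forward_dois(coci_citations):
    """DOIs citing a paper, from a decoded COCI citations/{DOI} response."""
    return {row["citing"].lower() for row in coci_citations if row.get("citing")}


def snowball(responses, extractor):
    """Union of the DOIs extracted from one response per input DOI."""
    result = set()
    for response in responses:
        result |= extractor(response)
    return sorted(result)


if __name__ == "__main__":
    # Toy responses standing in for two API calls.
    work_a = {"message": {"reference": [{"DOI": "10.1000/x"}, {"unstructured": "no DOI"}]}}
    work_b = {"message": {"reference": [{"DOI": "10.1000/X"}, {"DOI": "10.1000/y"}]}}
    print(snowball([work_a, work_b], backward_dois))
```

DOIs are lowercased before taking the union because DOIs are case-insensitive, which avoids counting the same work twice.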

The Paperfetcher Python package can convert the search results into several formats, such as a text file of DOIs, a CSV file or Excel spreadsheet with rows containing article title, authors, journal, DOI, URL, etc. for each article, RIS databases, and even pandas DataFrames [26, 27] for further data analysis and processing in Python or R. The Paperfetcher web-app implements two of these options: it can export results either to an RIS file or to a text file of DOIs. Paperfetcher uses Crossref's content negotiation service [28] to retrieve metadata in the RIS format.
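As an illustration of the two export paths offered by the web-app, the sketch below prepares a content negotiation request for RIS metadata and serializes a list of DOIs to plain text. The helper names are hypothetical and this is not the package's API; we assume that DOI content negotiation accepts the media type `application/x-research-info-systems` for RIS. The request object is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Media type assumed to select RIS output in DOI content negotiation.
RIS_MEDIA_TYPE = "application/x-research-info-systems"


def ris_request(doi):
    """Prepare (but do not send) a content negotiation request for one DOI."""
    return Request(f"https://doi.org/{quote(doi)}",
                   headers={"Accept": RIS_MEDIA_TYPE})


def dois_to_text(dois):
    """Serialize DOIs one per line, as in a plain-text export."""
    return "\n".join(dois) + "\n"


if __name__ == "__main__":
    request = ris_request("10.1000/xyz123")  # placeholder DOI
    print(request.full_url, request.get_header("Accept"))
    # Sending it would use urllib.request.urlopen(request).read().decode()
    print(dois_to_text(["10.1000/xyz123", "10.1000/abc456"]), end="")
```

Concatenating the RIS records returned for each DOI would yield a single file that citation managers can import.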

Paperfetcher's web-app, based on the Streamlit app framework, is available online at https://share.streamlit.io/paperfetcher/paperfetcher-web-app/main/paperfetcher_app.py. The web-app can also be run offline, as described in Section III D. The highlight of this web-app is that it has an easy-to-use graphical interface and requires no programming experience. To perform handsearch (Figure 1), users just need to fill out a few search parameters (journal names or journal ISSNs, a date range to fetch articles within, optional search keywords, and output format) and click on the 'Search' button. To perform snowballing (Figure 2), users only need to enter comma-separated DOIs of selected papers, select the type of snowballing (forward or backward), select output format, and click on the 'Search' button.

After performing a search, Paperfetcher gives users the option to download their data either as a text file of DOIs or as an RIS file, based on the output format selection they made (Figure 3).

RIS is a standardized format that allows data exchange among different citation management software [29]. RIS data exported from Paperfetcher can be seamlessly imported into citation management software such as EndNote, Mendeley, Paperpile and Zotero, and into systematic review screening tools such as ASReview [30], Covidence, DistillerSR, and EPPI-Reviewer [31] (for a comprehensive review of screening tools, see Zhang and Neitzel [32]). DOIs from exported text data can be bulk-imported into citation management tools such as Zotero in order to fetch missing abstracts or other important metadata.

Paperfetcher generates a report of the parameters used to perform the search and the number of papers fetched, which is displayed to users after performing the search. Figure 3 shows a sample report for snowballing. These reports provide a mechanism for reviewers to document their search for cross-checking and reproducibility.
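A report of this kind can be represented with a very small data structure. The sketch below is purely illustrative: the class `SearchReport`, its field names, and its text layout are hypothetical and do not reproduce the format shown in Figure 3.

```python
from dataclasses import dataclass


@dataclass
class SearchReport:
    """Hypothetical record of the inputs and outcome of one search."""
    search_type: str   # e.g. "handsearch" or "backward snowballing"
    parameters: dict   # the inputs exactly as the user entered them
    n_results: int     # number of papers fetched

    def to_text(self):
        """Render the report as plain text, one parameter per line."""
        lines = [f"Search type: {self.search_type}"]
        lines += [f"{name}: {value}" for name, value in self.parameters.items()]
        lines.append(f"Papers fetched: {self.n_results}")
        return "\n".join(lines)


if __name__ == "__main__":
    report = SearchReport(
        search_type="handsearch",
        parameters={"ISSN": "1520-5126", "From": "2020-01-01", "Until": "2020-01-31"},
        n_results=42,  # placeholder count, not a real search result
    )
    print(report.to_text())
```

Storing the parameters verbatim alongside the result count is what allows a second reviewer to rerun the same search and compare outcomes.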

Privacy-conscious users may opt to use the Paperfetcher web-app offline. To do so, they must first install and then run the app from source. The source code and installation instructions for the app are available online at https://github.com/paperfetcher/paperfetcher-web-app.

Python programmers might prefer to directly use Paperfetcher's Python package as it can export search results to more data formats than the app. Users who wish to set up data pipelines or perform additional data analysis in Python or R will find the package's ability to export data to pandas DataFrames particularly useful. Such users can install the package directly from the Python Package Index (https://pypi.org/project/paperfetcher). Developers who are interested in modifying Paperfetcher's source code can clone the source repository from GitHub at https://github.com/paperfetcher/paperfetcher. Documentation for the Python package can be found at https://paperfetcher.github.io/paperfetcher.

There are a few existing tools that automate snowballing. Some tools only enable uni-directional citation chasing. For example, SciFinder supports backward reference chasing, but it is not free of charge and only focuses on literature related to chemicals, drugs, and substances [33]. Some databases, including Web of Science, Scopus, and Google Scholar, enable forward citation chasing. We only found two tools that support citation chasing in both directions: SpiderCite and citationchaser. SpiderCite is part of the free online suite of tools SR-Accelerator, and allows users to import and export references in EndNote XML, RIS, and BibTeX formats [34]. Citationchaser is an R package that is free of charge, open source, and provides an easy-to-use Shiny app interface, where users can input DOIs of articles to search and export data in RIS format [35]. However, to date, all the above tools lack functions to automate handsearch. Paperfetcher serves to complement these tools to help reviewers identify as many relevant studies as possible.

Although Paperfetcher has the potential to make significant contributions to the field of systematic review, the current version is limited in a few ways. First, there can be a lag between when a paper is first published online and when its DOI is deposited in Crossref. In such a case, Paperfetcher will fail to find the paper. Second, certain publishers do not make their abstracts and references publicly available on Crossref. When abstracts are missing, reviewers can use citation management software to retrieve them. Missing references, however, are more problematic, as they will result in relevant studies being missed by Paperfetcher's snowballing function [23]. In addition, as updates to COCI lag behind updates to Crossref, Paperfetcher's forward citation chasing function might miss newly published citing articles. Citationchaser, which queries the Lens.org database (consisting of articles from PubMed, PubMed Central, Crossref, Microsoft Academic Graph and CORE), has access to more reference information. As the Lens.org API requires a subscription fee, and Paperfetcher was developed without any funding, we had no choice but to exclude paid APIs. In the future, should we have access to funding or resources, we can incorporate additional APIs (such as Lens.org or Web of Science) to improve Paperfetcher's snowballing function.

In a nutshell, Paperfetcher provides a free, easy-to-use, and efficient method to automate handsearch. We believe that it can significantly reduce the time reviewers spend on literature search and can improve transparency in synthesis reporting across disciplines.

Defining the process to literature searching in systematic reviews: a literature review of guidance and supporting studies

It's in your hands: the value of handsearching in conducting systematic reviews of public health interventions

Handsearching still a valuable element of the systematic review

Searching for studies: a guide to information retrieval for Campbell systematic reviews

Literature searching for social science systematic reviews: consideration of a range of search techniques

Systematic Reviews: Identifying relevant studies for systematic reviews

Handsearching for randomized controlled clinical trials in German medical journals

Cochrane handbook of systematic reviews of interventions

Handsearching versus electronic searching to identify reports of randomized trials

How good are volunteers at searching for published randomized controlled trials?

Handsearching did not yield additional unique FDG-PET diagnostic test accuracy studies compared with electronic searches: a preliminary investigation

An investigation of the adequacy of MEDLINE searches for randomized controlled trials (RCTs) of the effects of mental health care

A comparison of results of empirical studies of supplementary search techniques and recommendations in review methodology handbooks: a methodological review

Searching for and selecting studies

Systematic literature studies: Database searches vs. backward snowballing

Effectiveness and efficiency of search methods in systematic reviews of complex evidence: audit of primary sources

CrossRef: an overview

DOI registration agencies - areas of coverage

mEDRA, mEDRA - who we are

EIDR - a universal unique identifier for movie and television assets

Crossref stats

Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic

Software review: COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations

Data Structures for Statistical Computing in Python

The pandas development team, pandas-dev/pandas: Pandas

An open source machine learning framework for efficient and transparent systematic reviews

EPPI-Reviewer: advanced software for systematic reviews, maps and evidence synthesis

Methodological review: A systematic narrative review of screening tools for conducting systematic reviews in educational research

LibGuides: SciFinder: What Is SciFinder? (2021)

citationchaser: An R package and Shiny app for forward and backward citations chasing in academic searching