Evidence Summary

 

Publons Peer Evaluation Metrics Are Not Reliable Measures of Quality or Impact

 

A Review of:

Ortega, J. L. (2019). Exploratory analysis of Publons metrics and their relationship with bibliometric and altmetric impact. Aslib Journal of Information Management, 71(1), 124–136. https://doi.org/10.1108/AJIM-06-2018-0153

 

 

Reviewed by:

Scott Goldstein

Web Librarian

Appalachian State University Libraries

Boone, North Carolina, United States of America

Email: goldsteinsl@appstate.edu

 

Received: 1 May 2019    Accepted: 14 July 2019

 

 

© 2019 Goldstein. This is an Open Access article distributed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike License 4.0 International (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly attributed, not used for commercial purposes, and, if transformed, the resulting work is redistributed under the same or similar license to this one.

 

 

DOI: 10.18438/eblip29579

 

 

Abstract

 

Objective – To analyze the relationship between scholars’ qualitative opinions of publications, as captured by Publons metrics, and bibliometric and altmetric impact measures.

 

Design – Comparative, quantitative data set analysis.

 

Setting – The most exhaustive set of research articles retrievable from Publons.

 

Subjects – 45,819 articles retrieved from Publons in January 2018.

 

Methods – The author extracted article data from Publons and joined them (using the DOI) with data from three altmetric providers: Altmetric.com, PlumX, and Crossref Event Data. When providers gave discrepant values for the same metric, the maximum value was used. Publons data are described, and correlations are calculated between Publons metrics and altmetric and bibliometric indicators.
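To make the data-assembly step concrete, the following is a minimal sketch in Python (pandas and SciPy) of the general approach the study describes: merging provider tables on DOI, resolving discrepant values for a shared metric by taking the maximum, and correlating a Publons score with an altmetric indicator. The toy data, DOIs, and column names are illustrative assumptions, not the study's actual fields or code.

```python
import pandas as pd
from scipy.stats import pearsonr

# Toy stand-ins for the real datasets (hypothetical DOIs and values);
# column names are illustrative, not taken from the study.
publons = pd.DataFrame({
    "doi": ["10.1/a", "10.1/b", "10.1/c"],
    "quality": [7.0, 5.5, 8.0],       # Publons 1-10 rating
    "significance": [7.5, 5.0, 8.0],  # Publons 1-10 rating
})

# The same altmetric (here, tweet counts) as reported by three providers.
providers = {
    "altmetric_com": pd.DataFrame({"doi": ["10.1/a", "10.1/b"], "tweets": [4, 0]}),
    "plumx": pd.DataFrame({"doi": ["10.1/a", "10.1/c"], "tweets": [6, 2]}),
    "crossref": pd.DataFrame({"doi": ["10.1/b", "10.1/c"], "tweets": [1, 2]}),
}

# Left-join each provider on DOI, then resolve discrepancies by taking
# the maximum value reported for the metric, as the study describes.
merged = publons
for name, table in providers.items():
    renamed = table.rename(columns={"tweets": f"tweets_{name}"})
    merged = merged.merge(renamed, on="doi", how="left")
provider_cols = [c for c in merged.columns if c.startswith("tweets_")]
merged["tweets"] = merged[provider_cols].max(axis=1)  # NaNs are skipped

# Pearson correlation between a Publons metric and the altmetric indicator.
r, p = pearsonr(merged["quality"], merged["tweets"])
print(f"r = {r:.2f}, p = {p:.3f}")
```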

 

Main Results – In terms of coverage, Publons is biased in favour of life sciences and subject areas associated with health and medical sciences. Open access publishers are also over-represented. Articles reviewed in Publons overwhelmingly have one or two pre-publication reviews and only one post-publication review. Furthermore, the metrics of significance and quality (each rated on a 1 to 10 scale) are almost identically distributed, suggesting that users may not distinguish between them. Pearson correlations between Publons metrics and bibliometric and altmetric indicators are very weak and not significant.

 

Conclusion – The biases in Publons coverage with respect to discipline and publisher support earlier research and suggest that the willingness to publish one’s reviews differs by research area. Publons metrics are problematic as research quality indicators. Most publications have only a single post-publication review, and the absence of any significant disparity between the significance and quality scores suggests that the two constructs are being conflated when they should in fact measure different things. The correlation analysis indicates that peer evaluation in Publons is not a measure of a work’s quality or impact.

 

Commentary

 

The study is the first in-depth examination of articles in Publons, a web platform that allows users to make their peer review reports for journals public and to rate articles on quality and significance. Previous studies of Publons have focused on who uses the service rather than which articles are reviewed. This study sheds more light on two questions that have been asked in the literature. First, what is the relationship between traditional bibliometric indicators and subjective peer valuations (interpreted more broadly than refereed peer review, as they encompass other pre- and post-publication reviews)? In a comparable study focusing on Faculty of 1000 (F1000) recommendations, Waltman and Costas (2014) found only a weak correlation between peer evaluations and citations, suggesting that the two measures may capture different types of impact. Second, in what ways do the data providers used in scientometric analysis supply discrepant data? Zahedi and Costas (2018), for instance, conducted the most exhaustive comparison of altmetric providers to date and found that their coverage of publications varies widely.

 

This commentary relies on Perryman’s (2009) critical appraisal tool for bibliometric studies. The study fares very well when evaluated against it. For instance, the author clearly states the research questions after a literature review that motivates the need for a more in-depth look at Publons. The data are analyzed using statistical methods appropriate to the research objectives, and the results are displayed graphically in a manner suited to the types of analyses performed. The correlations between all the metrics are presented in a correlation matrix, which, although missing a colour legend, is an excellent way to visualize this type of data. Limitations of the data sources are also considered. Sufficient, though perhaps minimal, detail was provided on how the author scraped the data from Publons; even so, a novice researcher in this area would find it difficult to replicate the study. It is also not entirely clear whether scraping the website violates Publons’ terms of service, which explicitly state that “The correct way to access these data is via our API.”
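On the colour-legend point: adding one is typically a one-line change in common plotting libraries. As a minimal sketch, assuming a pandas correlation matrix rendered with matplotlib (random stand-in data and illustrative column names, not the study's), the figure could be drawn with an explicit legend as follows.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random stand-ins for the study's metrics; names are illustrative only.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=["quality", "significance", "citations", "tweets"])

corr = df.corr(method="pearson")  # the correlation matrix to visualize

fig, ax = plt.subplots()
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="RdBu_r")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax, label="Pearson r")  # the colour legend
fig.tight_layout()
plt.show()
```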

 

This study highlights a couple of implications for information professionals. Peer evaluation of scholarship outside traditional (double-)blind peer review, including pre- and post-publication review and the open peer review some journals have adopted during the refereeing process, is an emerging practice. Librarians should familiarize themselves with these new methods of review and be in a position to inform scholars of the opportunities and challenges they present (see Ford, 2016). Information professionals may also take away from this study a more measured view of the role peer review plays in the assessment of scholarly literature. Although the study suggests Publons metrics are unsuitable quality indicators for reasons specific to the Publons platform, it is wise to remember that peer review is not an infallible measure of quality, let alone impact. An evidence-based critical appraisal of a scholarly work does not consist simply of checking its peer review status.

 

References

 

Ford, E. (2016). Opening review in LIS journals: A status report. Journal of Librarianship and Scholarly Communication, 4(General Issue), eP2148. https://doi.org/10.7710/2162-3309.2148

 

Perryman, C. (2009). Evaluation tool for bibliometric studies. Retrieved from https://www.dropbox.com/l/scl/AAAL7LUZpLE90FxFnBv5HcnOZ0CtLh6RQrs

 

Waltman, L., & Costas, R. (2014). F1000 recommendations as a potential new data source for research evaluation: A comparison with citations. Journal of the Association for Information Science and Technology, 65(3), 433–445. https://doi.org/10.1002/asi.23040

 

Zahedi, Z., & Costas, R. (2018). General discussion of data quality challenges in social media metrics: Extensive comparison of four major altmetric data aggregators. PLoS ONE, 13(5), e0197326. https://doi.org/10.1371/journal.pone.0197326