key: cord-0737431-4h2ma00j
authors: Gkiouras, Konstantinos; Nigdelis, Meletios P.; Grammatikopoulou, Maria G.; Goulis, Dimitrios G.
title: Tracing open data in emergencies: the case of the COVID-19 pandemic
date: 2020-06-19
journal: Eur J Clin Invest
DOI: 10.1111/eci.13323
sha: 9bb775eebfcea03d8ab72676301bad0214439084
doc_id: 737431
cord_uid: 4h2ma00j

BACKGROUND: The COVID-19 pandemic constitutes an ongoing, burning Public Health Emergency of International Concern (PHEIC). In 2015, the World Health Organization (WHO) adopted an open-data policy recommendation for such situations.

OBJECTIVES: The present cross-sectional meta-research study aimed to assess the availability of open data and the metrics of articles pertaining to the COVID-19 outbreak in five high-impact journals.

METHODS: All articles regarding SARS-CoV-2 published in five high-impact journals (Ann Intern Med, BMJ, JAMA, NEJM and Lancet) until March 14, 2020 were retrieved. Meta-data (namely the type of article, number of authors, number of patients, citations, errata, news and social media mentions) were extracted systematically for each article in each journal. Google Scholar and Scopus were used for citations and author metrics respectively, and Altmetric and PlumX were used to retrieve news and social media mentions. The degree of adherence to the PHEIC open data call was also evaluated.

RESULTS: A total of 140 articles were published until March 14, 2020, mostly opinion papers. Sixteen errata followed these publications. The number of authors in each article ranged from 1 to 63, whereas the number of patients with a laboratory-confirmed SARS-CoV-2 infection reached 2,645. The impact of these publications reached a total of 4,210 cumulative crude citations and 342,790 news and social media mentions. Only one publication (0.7%) provided complete open data, while 32 (22.9%) included patient data.
CONCLUSIONS: Even though a large number of manuscripts has been produced since the onset of the pandemic, the availability of open data remains restricted.

Forty-six years ago, Monto, a coronavirus epidemiology expert, reported the complete lack of literature suggesting that human coronaviruses could be involved in lethal infections [1]. Today, according to the World Health Organization (WHO), a pandemic is an appropriate description of what COVID-19 has developed into [2], sparking the need for information concerning the characteristics of the virus, the underlying pathology and possible management. Lessons learned from the Ebola outbreak include the importance of prompt and facilitated availability of data in Public Health Emergencies of International Concern (PHEIC), as globally agreed in the 2015 WHO consultation [3], to enhance preparedness. This paradigm shift in the approach to information sharing in emergencies encompassed open data, elimination of embargo policies, fit-for-purpose platforms to present relevant research items, as well as expedited or post-publication peer review policies [3, 4]. The proposed framework was adopted immediately in the COVID-19 outbreak, making it the second public health emergency, after the Zika virus, in which this specific WHO consultation was implemented. In parallel, the Wellcome Trust issued a similar statement, highlighting the need for rapid dissemination of research data and findings relevant to the virus [5]. Subsequently, editors of scientific journals adhered to this strategy by immediately issuing calls for research items on COVID-19 to fill the existing information gap [6, 7], often expediting the review process to disseminate information as promptly as possible, in an open-access manner. These calls for COVID-19 information sparked a new era in scientific publishing, with individual journals receiving 20 [8] to 100 [9] articles concerning the virus every day.
On the other hand, scientists have called for taking everything with a pinch of salt, as the circulation of fake news has been driven by non-evidence-based data [10]. Given that we are living in unprecedented times in terms of both the health outbreak and the pace of publication, a close look at the metrics behind these rapidly produced publications in five high-impact journals (BMJ, NEJM, JAMA, Lancet and Ann Intern Med) is of utmost importance to better understand this unique situation.

This article is protected by copyright. All rights reserved

Three independent researchers accessed the web platforms that five high-impact journals developed specifically for articles pertaining to the COVID-19 pandemic. The search strategy was designed to locate any COVID-19-related publication of a type that normally involves peer review in each journal, published from inception until March 14th, 2020. The COVID-19 resource platforms of the Annals of Internal Medicine (Ann Intern Med) [11], The British Medical Journal (BMJ) [12], The Journal of the American Medical Association (JAMA) [13], The New England Journal of Medicine (NEJM) [14], and The Lancet (Lancet) [15] were searched. Articles were included in this study if their type was eligible for peer review in the respective journal, as ascertained from each journal's guidelines for authors. Meta-data characteristics were extracted from each publication by three researchers and recorded in a predefined Microsoft Excel® spreadsheet.
Extracted information involved (1) characteristics of the article including type, country of origin, number of authors, number of patients (with a laboratory-confirmed SARS-CoV-2 infection), (2) [18], and Reddit [19], according to the altmetrics provided by each journal, as well as (7) first and last author details including the total number of publications based on PubMed [20] and the Hirsch-index (h-index) [21] according to Scopus [22]. Citations of the published items were retrieved from Google Scholar [16], using the title of each manuscript as the search term. This provided results related to the specific publication only. Then, the total number of citations of each article was normalized by the immediacy index of the journal, provided by the Web of Science (WoS) [23]. The ratio of the number of authors to patients (confirmed cases of SARS-CoV-2 infection) was also calculated to identify any inflation in the number of authors claiming authorship. Articles were categorized as data-driven (if they included patients), reviews (when summarizing evidence), or opinion papers. Altmetric [24].

First and last author data were retrieved directly for authors with Scopus [22] profiles. For those lacking a Scopus [22] author profile, the affiliation used in the COVID-19 publication was applied in the search to identify them within the Scopus database. Data were considered available if articles provided all data in an accessible way (e.g. either as supplements to their online publications or as stored databases in publicly available repositories/websites). For case reports/series (describing one or more patients), data availability was determined by checking whether data were provided for all patient characteristics and measurements mentioned in the publication. The present study follows the reporting guidelines for health research [26].
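The two derived metrics described above can be illustrated with a minimal sketch. It assumes that "normalized by the immediacy index" means simple division and expresses the authorship ratio as patients per author (the direction reported in the Discussion); the `Article` record, its field names and the example values are hypothetical and are not taken from the study's spreadsheet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """Hypothetical record holding the fields needed for the derived metrics."""
    title: str
    citations: int          # crude Google Scholar citation count
    n_authors: int
    n_patients: int         # laboratory-confirmed cases; 0 for opinion papers
    immediacy_index: float  # journal-level value from WoS (placeholder value)


def normalized_citations(article: Article) -> float:
    # Assumption: normalization is a plain division by the immediacy index.
    return article.citations / article.immediacy_index


def patients_per_author(article: Article) -> Optional[float]:
    # The ratio is only meaningful for data-driven articles (those with patients).
    if article.n_patients == 0:
        return None
    return article.n_patients / article.n_authors


# Illustrative values only, not data from the study.
example = Article("Hypothetical case report", citations=100,
                  n_authors=4, n_patients=1, immediacy_index=5.0)
print(normalized_citations(example))   # 20.0
print(patients_per_author(example))    # 0.25
```

Returning `None` for articles without patients keeps opinion papers and reviews out of the ratio instead of recording them as zero.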
Data are presented as means ± standard deviations, followed by medians alongside the minimum and maximum values for each variable. Descriptive analysis of the data was conducted with the Statistical Package for Social Sciences version 25 (SPSS, Chicago, IL, USA).

A total of 140 research items were retrieved from the five journals (Supplementary Table 1). In Table 1

The present results indicate that publication norms have changed to facilitate information dissemination concerning the COVID-19 pandemic. In further detail, a surge of manuscripts emerged as soon as SARS-CoV-2 was identified, with distinct authorship etiquette, citation and dissemination effects, as well as peer review policies, all differing from what was considered "acceptable" until now. Instead of actual open data and dataset sharing, most of the published items were in fact opinion papers, case reports and reviews, indicating that the PHEIC call was not adhered to.

A careful reading of these five journals since the COVID-19 outbreak indicates that the majority of publications were produced by Chinese researchers, followed by UK and US scientists (Figure 1). Although China is known to exhibit a "publish or perish" culture [27], in the case of COVID-19, Chinese research appears highly justified given that the country was the initial epicenter of the virus. In parallel, many studies originating from China included a large number of patients (>100), presenting most of the COVID-19 data synthesis, with scientists from other countries presenting case studies, case series, or opinion and narrative review papers. Following the COVID-19 call for data, opinion papers and reviews lacking patient data constituted 75% of the published research in these five journals. Interestingly, many research items were characterized by extensive authorship lists (Table 1).
Hyperauthorship is common in multicenter studies, with an increased collaboration size being needed to orchestrate data collection. In China, the need to reform the quality of submitted manuscripts [28] and reduce the number of hyperprolific authors [27] stemming from a local cash-for-publication policy [29] [30] [31] [32] has been highlighted. On these premises, fairly recently, Chinese authorities opted to tackle these incentives by limiting publication rewards and promoting better research practices [33]. On the other hand, the cash-for-publication policy has been adopted by institutions in many other countries, including the USA [34]. However, in the call for COVID-19 open data, the authorship norm appears to have been stretched. Although some of the research was submitted by prominent figures in medicine or the political scene (e.g. Bill Gates), many items were authored by more "rookie" scientists. In further detail, the COVID-19 era was the perfect opportunity for many researchers to publish, with some identified as publishing their first manuscript following the COVID-19 call, and others having an h-index of 0. Often, when authors with Chinese/Korean names were encountered, it was not possible to calculate the exact number of publications or the h-index, as Scopus has been reported to distinguish inadequately between Chinese/Korean names [27]; the same problem has also been reported in other databases, including Google Scholar [35], resulting in confusing and erroneous metadata records [36]. Even among case studies, where data are reported for one patient only, there were cases in which the number of authors exceeded 60 [37]. The patient-to-author ratio in case reports ranged from 0.016 to 0.25, indicating that between four and 63 authors were credited for reporting a single case.
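The two extremes of the patient-to-author ratio quoted above follow directly from dividing one patient by the author count. The short sketch below reproduces that arithmetic; the function name is illustrative and is not part of the study's analysis, which was run in SPSS.

```python
def patient_to_author_ratio(n_patients: int, n_authors: int) -> float:
    """Patients reported per credited author."""
    return n_patients / n_authors


# One patient per case report; 4 and 63 are the author counts quoted in the text.
for n_authors in (4, 63):
    ratio = patient_to_author_ratio(1, n_authors)
    print(f"{n_authors} authors -> ratio {ratio:.3f}")
# 4 authors -> ratio 0.250
# 63 authors -> ratio 0.016
```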
Therefore, the number of authors might not always reflect the needs of the presented work, but could well be the epiphenomenon of a "publish or perish" trend [27, 38, 39], authorship abuse/manipulation, or even a "you scratch my back and I will scratch yours" strategy among colleagues [40, 41]. When case studies proliferate excessively, this might entail a higher risk of misconduct [42], although it has been suggested that a collectivistic perspective is sometimes exhibited by scientists from more individualistic countries [40, 43]. As an extension of this multi-author trend in the COVID-19 era, the opportunity to publish in flagship academic journals may exaggerate this effect even further, increasing the number of researchers with authorship credit, even among case reports. Shared first authorship was also very prominent among published research items, with ten papers having shared first authorship in the Lancet, one of which divided the first author position among six authors [44].

The majority of research concerning SARS-CoV-2 received an astonishing number of citations in less than a month post-publication. The cumulative citations of all manuscripts published in the five journals reached 4,798, corresponding to a mean of 34.3 citations per research item. Although Google Scholar has been recently portrayed as "the most comprehensive academic search engine" to date [45], it also entails several limitations [46]. For instance, it appears to be more efficient in detecting grey literature [46], and this might produce a greater number of citation results for each research item. Nevertheless, searching specific article titles, as performed in the present analysis, appears to be a valid method of using Google Scholar [47].
Considering that, in the five journals assessed herein, a high-impact manuscript with a "sexy" subject would ordinarily acquire between 0 and 2 citations during its first month after publication (with one being a possible editorial referring to the data and the other being either a super-fast response in the form of a letter, or an UpToDate [48] citation), it becomes clear that citation farming spirals in PHEIC. In several cases, altmetrics have been shown to correlate with citations [49] [50] [51] and can even predict highly cited articles [52]. In the case of COVID-19 information, the news and social media impact was extremely powerful, reflecting the need for immediate information dissemination in emergencies. The most eye-catching finding of the present study is that selected articles received more than 20,000 social media mentions in less than a month. One of the early manuscripts, published in late January 2020 [53], has been mentioned more than 41,000 times in social media, while yielding more than 740 citations in less than two months. This indicates that, in the COVID-19 era, the use of social media extended beyond the medical profession lobby and research promotion networks, blending with crisis informatics [54] to fill the societal information gap concerning the pandemic.

With the need for immediately available information on the course and management of COVID-19, some of the published research items described the first cases detected in selected countries. Interestingly, the SARS-CoV-2 patient 0, suggested to have originated from the Hubei province [55, 56], was never published as a case report. On the contrary, after the Wuhan lockdown, many first cases from other countries were promptly published in high-impact journals. For instance, the first US case was confirmed by the Centers for Disease Control and Prevention on January 21st [57], and published as a case study as soon as ten days later [37].
Other countries following this pattern of publishing the first national case included the UK [58] and Canada [59]. It appears that publication of a country's first case study is unrelated to the COVID-19 burden suffered in each region, as neither the Chinese nor the Italians ever reported the characteristics of their country's first patient.

When adhering to the "publish or perish" culture, the likelihood of a subsequent erratum to the manuscript is increased, in parallel to the chance of retraction [60, 61]. Among 140 manuscripts, 16 had subsequent corrections, revealing pitfalls in the fast-track publication regime. In a follow-up search of our data conducted in late May 2020, 2 additional errata were identified. A quick search in PubMed for retracted articles on COVID-19 showed that three original articles and one editorial (based on one of these manuscripts) have been retracted from journals to date, indicating that editors must be prudent with all submitted research in PHEIC until March 14th 2020. As already noted by Ioannidis [10], one specific retracted publication [62] has received an astonishing Altmetric score of 254,647 points. On the other hand, questions have been raised concerning the quality of published information [63], with many noting that in the times of COVID-19, the spread of misinformation, in parallel to valid data, is in fact inevitable [64, 65]. Examples include possible therapies examined via clinical trials published in flagship journals without a comparator, lacking randomization and with authors having clear conflicts of interest (i.e. funding sources) [9, 66]. Nevertheless, in the aftermath of this pandemic, the experience gained can also be used to reduce publication redundancy [67] and sloppiness, and to focus on the quality of the produced research.
Under the "open data" umbrella: mind the gap

Although the term "open data" appears to be self-explanatory, researchers and institutions tend to have a different understanding of the concept [68]. According to the WHO consultation [3], sharing of both data and results should be the default practice; however, the number of COVID-19 research items actually sharing their data was minimal, as only 11 out of 140 manuscripts in total shared actual open patient data. Of note, of these 11 with available patient data (Table 1), only one (0.7%) had a linked dataset file and the R code used, two presented all data within the manuscript tables, one presented newspaper data, and the remaining seven involved either case studies or case series, where all data were inevitably presented within the manuscript text. Thus, only one actually fulfilled the call for open datasets (Table 2). This indicates that some researchers make use of the crisis opportunity to submit their manuscripts in an open-access manner, without, however, adhering to the WHO recommendations for actual data sharing [3]. Fortunately, during a revision of the current paper, which underwent the normal review process, the BMJ issued a revised call for COVID-19 research (20th April 2020) asking for a parallel submission of data alongside each COVID-19-related article. Inevitably, the lack of actual open data dissemination does not allow for the synthesis of similar data from distinct research items and might allow for data slicing [69], as identified in one case herein. Similar concerns were raised by the JAMA editorial board [70], suggesting that duplicate patient records might compromise the accuracy of COVID-19 epidemiology. Nevertheless, it appears that readers must exercise due diligence concerning all published items [10].
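The open-data breakdown reported above can be tallied in a few lines. This is only a bookkeeping sketch: the category labels are paraphrased from the text, the counts are those stated in this section, and the helper `pct` is an illustrative name.

```python
TOTAL_ARTICLES = 140

# Counts as stated in the text for the manuscripts sharing patient data.
breakdown = {
    "linked dataset file and R code": 1,
    "all data in manuscript tables": 2,
    "newspaper data": 1,
    "case studies or case series": 7,
}


def pct(count: int, total: int = TOTAL_ARTICLES) -> float:
    """Share of all retrieved articles, in percent, rounded to one decimal."""
    return round(100 * count / total, 1)


shared_patient_data = sum(breakdown.values())
print(shared_patient_data)                            # 11
print(pct(shared_patient_data))                       # 7.9
print(pct(breakdown["linked dataset file and R code"]))  # 0.7
```

The four categories sum to the 11 manuscripts mentioned, and the single article with a linked dataset corresponds to the 0.7% figure.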
At the moment, a total of 101 clinical trials concerning the virus are registered in clinicaltrials.gov, three of which, all China-based, have reported a "completed" status (cumulative patient sample N=716), and many are expected to finish on time (late April 2020). Hopefully, by design, these are bound to produce evidence-based data concerning the management of COVID-19 infection and aid in tamping down the epidemic. The proportion of these that will eventually share their data, apart from presenting their results, is yet to be discovered, and will help to assess whether the call for open data in PHEIC needs to be re-evaluated within a novel and stricter framework.

Collectively, the calls for open data in COVID-19 produced a large number of manuscripts in a very short time [71], although the majority (77.1%) failed to include actual patient data and consisted mainly of opinion papers or reviews. In fact, the surplus of published research was as unanticipated as the spread of COVID-19. To the hypothetical question of whether scientists are getting what they expect from this literature burst, the answer is that they are certainly getting what is available, although the call for open data sharing does not appear to be fully adhered to. As a result, at the moment, the signal-to-noise ratio is low, with opinions and personal views creating buzz; what is needed are actual data that could be synthesized to aid evidence-based practice and mitigate the pandemic. Therefore, the most appropriate question to consider is whether the call for open data in PHEIC actually serves its purpose or if it compromises research integrity and quality [72]. Undoubtedly, publication norms differ in emergencies, and the rigging review process entails a series of bottlenecks, allowing a few scientists to abuse relevant calls for open data and use them as a loophole through which to publish.
Nevertheless, as indicated by the social media impact, in PHEIC, the importance of data dissemination is pivotal, and the flow of research should not cease until the pandemic has been tackled.

Medical reviews. Coronaviruses
Covid-19: WHO declares pandemic because of "alarming levels" of spread, severity, and inaction
World Health Organization. WHO | Developing global norms for sharing data and results during public health emergencies. WHO. World Health Organization
Data sharing in public health emergencies: a call to researchers. Bull World Health Organ
Sharing research data and findings relevant to the novel coronavirus (COVID-19) outbreak | Wellcome
The Lancet T. Emerging understandings of 2019-nCoV
Response to the emerging novel coronavirus outbreak
Journals Open Access to Coronavirus Resources
Science Has an Ugly, Complicated Dark Side. And the Coronavirus Is Bringing It Out. Mother Jones. 2020. Available from
Coronavirus disease 2019: the harms of exaggerated information and nonevidence-based measures
American College of Physicians
BMJ's Coronavirus (covid-19) Hub
The Lancet. COVID-19 Resource Centre
Available from: www.facebook
PubMed
An index to quantify an individual's scientific research output
Altmetric [Internet]. digital science
A catalogue of reporting guidelines for health research
The scientists who publish a paper every five days
Publish or perish
China's Publication Bazaar
The Numbers Game
The Possibility of Systematic Research Fraud Targeting Under-Studied Human Genes: Causes, Consequences, and Potential Solutions
High Monetary Rewards and High Academic Article Outputs: Are China's Research Publications Policy Driven? Ser Libr
China bans cash rewards for publishing papers
Cash bonuses for peer-reviewed papers go global. Science (80-)
Scientific author names: errors, corrections, and identity profiles
The continuing confusion in figuring out the surname of a Chinese author - A proposed solution
First Case of 2019 Novel Coronavirus in the United States
Inflated numbers of authors over time have not been just due to increasing research complexity
Data sharing in the era of COVID-19. Lancet Digit Heal
Games academics play and their consequences: how authorship, h-index and journal impact factors are shaping the future of academia
Authorship and citation manipulation in academic research
Gaming the Metrics: Misconduct and Manipulation in Academic Research. First edit
Culture and Unmerited Authorship Credit: Who Wants It and Why? Front Psychol
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases
The Role of Google Scholar in Evidence Reviews and Its Applicability to Grey Literature Searching
Google Scholar Is Not Enough to Be Used Alone for Systematic Reviews
Twitter Mentions and Academic Citations in the Urologic Literature
Correlation Between Altmetric Score and Citations in Pediatric Surgery Core Journals
Comparing alternative and traditional dissemination metrics in medical education
Can Tweets Predict Citations? Metrics of Social Impact Based on Twitter and Correlation with Traditional Metrics of Scientific Impact
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Fifteen years of social media in emergencies: A retrospective review and future directions for crisis Informatics
The novel Chinese coronavirus (2019-nCoV) infections: Challenges for fighting the storm
Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China
First Travel-related Case of 2019 Novel Coronavirus Detected in United States | CDC Online Newsroom | CDC
Lessons for managing high-consequence infections from first COVID-19 cases in the UK
First imported case of 2019 novel coronavirus in Canada, presenting as mild pneumonia
US studies may overestimate effect sizes in softer research
Misconduct accounts for the majority of retracted scientific publications
Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag. bioRxiv
"Pandemic fear" and COVID-19: mental health burden and strategies
Improving communication about COVID-19 and emerging infectious diseases
Medical misinformation in mass and social media: An urgent call for action, especially during epidemics
Compassionate Use of Remdesivir for Patients with Severe Covid-19
Redundancy in reporting on COVID-19
Open data and public health
Duplicate and salami publication: a prevalence study of journal policies
Editorial Concern-Possible Reporting of the Same Patients With COVID-19 in Different Reports
Scientists are sprinting to outpace the novel coronavirus
Publishing in 2020: A checklist to support a shift in behaviour to achieve best practice