key: cord-1049056-5aqkrdou
authors: Ioannidis, John P.A.; Salholz-Hillel, Maia; Boyack, Kevin W.; Baas, Jeroen
title: The rapid, massive growth of COVID-19 authors in the scientific literature
date: 2021-08-21
journal: bioRxiv
DOI: 10.1101/2020.12.15.422900
sha: d8f8e6beff7a5ac2612245bd83794c324af6d248
doc_id: 1049056
cord_uid: 5aqkrdou

We examined the extent to which the scientific workforce in different fields was engaged in publishing COVID-19-related papers. According to Scopus (data cut, August 1, 2021), 210,183 COVID-19-related publications included 720,801 unique authors, of whom 360,005 had published at least 5 full papers in their career and 23,520 were in the top 2% of their scientific subfield based on a career-long composite citation indicator. The growth of COVID-19 authors was far more rapid and massive compared with cohorts of authors historically publishing on H1N1, Zika, Ebola, HIV/AIDS and tuberculosis. All 174 scientific subfields had some specialists who had published on COVID-19. In 109 of the 174 subfields of science, at least one in ten active, influential (top-2% composite citation indicator) authors in the subfield had authored something on COVID-19. 52 hyper-prolific authors already had at least 60 (and up to 227) COVID-19 publications each. Among the 300 authors with the highest composite citation indicator for their COVID-19 publications, the most common countries were USA (n=67), China (n=52), UK (n=32), and Italy (n=18). The rapid and massive involvement of the scientific workforce in COVID-19-related work is unprecedented and creates opportunities and challenges. There is evidence for hyper-prolific productivity.

The acute crisis of COVID-19 has challenged the scientific community to generate timely evidence about the new coronavirus and its pandemic. Interest in COVID-19 has spread rapidly and widely across the scientific literature and among researchers. 
Such an "all hands on deck" response of the scientific workforce during a crisis may have been beneficial in generating ideas and evidence expeditiously. However, many authors publishing on COVID-19 may have lacked proper background expertise. The explosive focus on COVID-19 may have caused some inappropriate "covidization" of research 2,3 and the resulting research, conducted in such haste, may suffer from low validity. 4, 5 Here, we aim to understand which scientific areas and which types of scientists have been most mobilized by the pandemic. The growth of the COVID-19 author cohort is contrasted against what happened in the mobilization of the scientific workforce for 5 other major infectious diseases. We also probe whether there is evidence of hyper-prolific productivity with some scientists rapidly publishing large numbers of papers. Concurrently, we evaluate scientists who have had the highest citation impact for their COVID-19-related work. Finally, we discuss the implications of this rapid "covidization" of the research enterprise.

We used a copy of the Scopus database 6 of 2020 or greater. In order to evaluate publication dates by month, we have used the publication month and year where available. When publication month was either not available or exceeded the indexing date, we used the indexing date. This accounts, for example, for cases where an article is published today, but the official journal issue is due later. Our evaluation is targeted at the date at which publications became available to the public rather than official publication dates. We considered both publications in peer-reviewed venues and preprints. To avoid double counting of the same item (e.g., a work published both in a peer-reviewed journal and as a preprint, or in two preprint servers), we identified and filtered out duplicates by matching against author names and titles. 
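The month-assignment rule described above (use the publication month unless it is missing or later than the indexing date, in which case use the indexing date) can be illustrated with a minimal Python sketch. The function name and argument layout are hypothetical, and comparing at month granularity is an assumption; this is not the code used in the study.

```python
from datetime import date
from typing import Optional


def availability_date(pub_year: Optional[int],
                      pub_month: Optional[int],
                      indexed: date) -> date:
    """Return the date used to assign a record to a month (illustrative only)."""
    # Missing publication month (or year): fall back to the indexing date.
    if pub_year is None or pub_month is None:
        return indexed
    # Assumption: compare at month granularity, since only month and year are known.
    if (pub_year, pub_month) > (indexed.year, indexed.month):
        # The official issue date lies in the future relative to indexing,
        # so the item was already publicly available when it was indexed.
        return indexed
    return date(pub_year, pub_month, 1)
```

For example, a record indexed on 15 January 2021 with an official issue month of March 2021 would be counted in January 2021 under this sketch.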
Our search-based approach applies methods used for unstructured reference linking 7 and ranks documents based on similarity of fields. We used a combination of author names and titles, as many commonly used fields in reference linking, such as journal title, do not apply. The process first identifies multiple potential duplicate matches based on these fields, and then validates the best match based on the overlap between words in the title and author names. We excluded all preprints that link to either a non-preprint item, such as a journal article, or another preprint with an earlier date. This step excluded 10,703 preprints. We further focused on the 3,862,276 authors who have at least one Scopus-indexed publication since 2020 and who have also authored in their entire career at least 5 Scopus-indexed papers classified as articles, reviews or conference papers. This allows exclusion of authors with limited recent presence in the scientific literature as well as some author IDs that may represent split fragments of the publication record of some more prolific authors. All authors were assigned to the most common field and subfield discipline of their career. We used the Science Metrix classification of science, which is a standard mapping of all science into 21 main fields and 174 subfield disciplines. 8, 9

Influential scientists

We also examined how COVID-19 has affected the publication portfolio of researchers whose work has the largest citation impact in the literature. On the one hand, these scientists are already well established and thus may have less need or interest to venture into a new field. On the other hand, these scientists are also more productive and competitive, and therefore they may be faster in moving into a rapidly emerging, important new frontier. 
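The two-stage de-duplication described above (rank candidate matches on title and author similarity, then validate the best candidate by word overlap) might look like the following minimal Python sketch. The Jaccard overlap measure, the equal weighting of title and authors, the 0.8 threshold, and the record layout are all assumptions made for illustration; the source does not specify them.

```python
import re


def tokens(text):
    """Lower-cased alphanumeric word set of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a, b):
    """Jaccard overlap between the word sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_duplicate(record, candidates, threshold=0.8):
    """Return the best-matching candidate, or None if it fails validation.

    `record` and each candidate are dicts with hypothetical keys
    'title' and 'authors' (both plain strings).
    """
    best, best_score = None, 0.0
    for cand in candidates:
        # Assumed scoring: equal weight on title and author-name overlap.
        score = (0.5 * overlap(record["title"], cand["title"])
                 + 0.5 * overlap(record["authors"], cand["authors"]))
        if score > best_score:
            best, best_score = cand, score
    # Validation step: accept the top candidate only above the threshold.
    return best if best_score >= threshold else None
```

A preprint for which `best_duplicate` returns a journal article (or an earlier preprint) would then be excluded from the counts.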
We used the career-long statistics calculated with the Scopus database of August 1, 2021, using the code as provided with the supplemental data recently published for the most cited authors across science. [10] [11] [12] Each author has been assigned to a main field and main subfield based on the largest proportion of publications across fields and analysis is restricted to the top 2% authors per Science Metrix subfield. We have developed a composite citation indicator 10, 11 and accordingly 170,832 scientists can be classified as being in the top 2% of their main subfield discipline based on the citations that their work received through 2020. Of those, 125,869 were active and had published at least 1 paper also in 2020 or early 2021. In order to visualize the growth and spread of the COVID-19 scientific literature across scientific fields and over time, we used a graphical mapping of scientific fields that has been previously developed 13 and which places the 333 Scopus journal categories sequentially around the perimeter of a circle. There are 27 high-level categories that are placed first and ordered in a manner that emerges naturally from a meta-analysis of the layouts of other science maps created using multiple databases and methods. 14 Each of the 27 categories is assigned a unique color. The remaining 306 lower-level journal categories are then ordered within the corresponding high-level categories using factor analyses based on citation patterns. Each of the 333 journal categories thus has a fixed position on the perimeter of the circle. The full Scopus citation graph of well over 50 million articles and 1 billion citation links was used to cluster articles into over 90,000 topics using established methods. 15 Each topic is assigned a position within the circle based on triangulation of the positions of its constituent papers, each of which takes on the positional characteristics of its journal category. 
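As a rough illustration of this layout, the Python sketch below places journal categories evenly around a unit circle and positions a topic at the centroid of its papers' category positions. Even angular spacing and a simple centroid are assumptions standing in for the "triangulation" mentioned above; the actual SciVal layout may differ, and the function names are hypothetical.

```python
import math


def category_position(index, n_categories=333, radius=1.0):
    """Perimeter position of a journal category given its sequential index."""
    angle = 2 * math.pi * index / n_categories
    return (radius * math.cos(angle), radius * math.sin(angle))


def topic_position(paper_category_indices):
    """Centroid of the papers in a topic, each paper taking the position
    of its journal category (assumed interpretation of 'triangulation')."""
    points = [category_position(i) for i in paper_category_indices]
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    return (x, y)
```

Under this sketch, a topic whose papers all come from one category sits on the perimeter, while a topic drawing on several distant categories is pulled toward the interior of the circle.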
Topics are colored by their dominant journal category and area-sized proportionally based on the number of objects (e.g., papers, authors) being counted for the particular analysis. This circle of science and topic visualization are used in Elsevier's SciVal tool. For the display of authors per topic, we have assigned authors to one topic by taking the topic with the highest proportion of publications per author.

We performed Scopus searches for terms reflecting 4 other infectious diseases that have manifested as epidemics in modern history (H1N1, AIDS, Ebola, Zika) and tuberculosis, an epidemic that has been ongoing since ancient times and that has probably resulted in the largest cumulative number of deaths over history compared with any other infectious pathogen. One should cautiously interpret comparisons between different infectious diseases, considering also the explosive, pandemic nature of COVID-19 and the relative impact of these various disease entities. We used the search terms TITLE-ABS-KEY("swine flu" OR *h1n1*), TITLE-ABS-KEY(*ebola*), TITLE-ABS-KEY(*zika*), and TITLE-ABS(*tuberculosis*). AIDS requires a more thorough search strategy, as keyword and title searches will yield many false positives for the target disease. We collected papers based on the Fingerprint engine concepts 29 "Human Immunodeficiency Virus 1", "HIV Infection", "AIDS/HIV", "HIV-1", "HIV Prevention", "HIV Testing", "Human Immunodeficiency Virus 1 Reverse Transcriptase". These concepts are based on a unified controlled thesaurus which, among other things, addresses the disambiguation of homonyms such as "hearing aids."

We also mapped the most prolific authors of the published COVID-19 corpus and the authors whose COVID-19 publications to date had the highest citation impact. For prolific productivity, we ranked the authors according to decreasing number of COVID-19 published items. We show detailed data on extremely prolific authors with over 30 COVID-19 published items to date. 
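Two of the bookkeeping steps above, assigning each author to the topic holding the largest share of their publications and ranking authors by decreasing number of COVID-19 items, can be sketched in a few lines of Python. The function names and input shapes are hypothetical, and tie-breaking is not specified in the source.

```python
from collections import Counter


def main_topic(paper_topics):
    """Return (topic, share) for the topic with the most of an author's papers.

    `paper_topics` is a list with one topic label per publication.
    """
    counts = Counter(paper_topics)
    topic, n = counts.most_common(1)[0]
    return topic, n / len(paper_topics)


def rank_by_items(items_per_author, minimum=1):
    """Authors with at least `minimum` items, in decreasing order of item count."""
    selected = [(a, n) for a, n in items_per_author.items() if n >= minimum]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)
```

For instance, `rank_by_items(counts, minimum=31)` would list the authors with over 30 items under this sketch.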
Hyper-prolific publishing reflects a complex phenomenon and may be generated by true productivity and excellence, but also by misconduct (e.g., gift and honorary authorship), and publication of trivialities or "salami slicing" where one body of work is cut into multiple "least publishable units." We make no effort to probe the key drivers for each hyper-prolific author. This is not feasible for the broad scope and number of papers considered in our study, and misconduct is extremely difficult to prove. Nevertheless, we examined whether hyper-prolific authors also published very large numbers of full papers (articles, reviews, conference proceeding papers) or mostly editorializing and other items that are not full papers. Citation impact was assessed with the previously proposed citation indicator [10] [11] [12] that combines information on 6 indices: total citations, Hirsch h-index, Schreiber hm-index, citations to single-authored papers, citations to first- or single-authored papers, and citations to first-, single- or last-authored papers. This avoids focusing simply on a single traditional metric such as citations, where it is expected that the authors of the earliest highly-cited papers would monopolize the top of the list, even if they had published a single paper and they were co-authors among many other authors. Self-citations are excluded from all calculations. 11, 12 We present descriptive data on the institution, country and two most common scientific subfields (per Science Metrix classification) for the top-300 authors in that list. We avoid comparisons based on statistical tests, as the analyses presented here are descriptive and exploratory. 
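To make the idea of a composite over the 6 indices concrete, here is a minimal Python sketch. It assumes the construction we understand from the cited indicator papers: each index is log-transformed as ln(1+x), divided by the maximum of that transformed value across all authors, and the six scaled values are summed. That formula, the short key names, and the input layout are assumptions for illustration and should be checked against references 10 to 12.

```python
import math

# Hypothetical short keys for the six indices named in the text:
# total citations, h-index, hm-index, citations to single-authored papers,
# to first- or single-authored papers, and to first-, single- or
# last-authored papers.
INDICES = ("nc", "h", "hm", "ncs", "ncsf", "ncsfl")


def composite_scores(authors):
    """Map author -> composite score (0 to 6) under the assumed formula.

    `authors` maps an author name to a dict with one value per key in INDICES.
    """
    # Maximum log-transformed value of each index across all authors.
    maxima = {
        k: max(math.log1p(a[k]) for a in authors.values()) for k in INDICES
    }
    scores = {}
    for name, a in authors.items():
        scores[name] = sum(
            math.log1p(a[k]) / maxima[k] for k in INDICES if maxima[k] > 0
        )
    return scores
```

Because each index is scaled to at most 1, an author with a single very highly cited multi-author paper cannot dominate the ranking on total citations alone, which is the motivation given above.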
Among the 3,862,276 authors who have published anything that is Scopus-indexed in 2020 or early 2021 and who have also authored in their entire career at least 5 Scopus-indexed papers that are classified as articles, reviews or conference papers, by the end of July 2021, 360,005 of these authors (9.3%) had at least one published and indexed COVID-19 paper. Critical Care Medicine (37.00%). However, such rates were higher than 10% (i.e., at least one in ten authors in that field had published on COVID-19) in 75 subfield disciplines and higher than 5% (i.e., at least one in twenty authors) in 107 subfield disciplines. All 174 subfields had one or more authors publishing on COVID-19. Supplementary Table 1 gives detailed data for COVID-19 publication rates of authors across all subfield disciplines. 28% of the authors published their COVID-19 research primarily in a subfield discipline that was not among the top 3 subfield disciplines where they had published most commonly during their career. Sometimes the fields of expertise of authors seemed remote from COVID-19, e.g., an expert on solar cells publishing on the epidemiology of COVID-19 in healthcare personnel. Even experts whose past work was in remote disciplines such as fisheries, ornithology, entomology or architecture had published on COVID-19. Influential scientists were even more likely to have published COVID-19 research (Supplementary Table 2). Among the 125,869 influential scientists active in publishing in 2020 or early 2021, 23,520 (18.7%) had COVID-19 publications in 2020 or early 2021. The publication rate was the highest in the fields of Public Health (39.7%) and Clinical Medicine (34.4%). Among subfield disciplines, the highest publication rate of such active, influential authors was seen ( Applied Ethics (60.2%) and Allergy (59.0%). 
However, publication rates were higher than 10% (i.e., at least one in ten authors in that field had authored something on COVID-19) in 109 of 174 subfield disciplines across science and higher than 5% (i.e., at least one in twenty authors) in 134 subfield disciplines. Figure 1 shows the growth and spread of COVID-19 papers, authors of COVID-19 papers, and high-impact authors of COVID-19 papers (those who belong to the top 2% of impact, as discussed previously) across scientific topics. As shown, there is a strong response of the literature and of the scientific workforce in some specific thematic areas, but there is also increasing and substantial involvement of scientists and respective publications, even in remote topics. As shown in Figure 2, the massive growth of authors publishing on COVID-19 has been far more rapid and prominent than the growth of the publishing scientific workforce

A total of 9,809 author IDs in Scopus had 10 or more Scopus-indexed published COVID-19 items. Setting thresholds of at least 15, 20, 25, 30, 40, 50 and 60 items, the numbers amounted to 3,661, 1,674, 932, 539, 220, 113 and 54 separate author IDs, respectively. Figure 3 shows the distribution of COVID-19 publication frequency of authors and Table 2 the 53 authors with 60 or more COVID-19 published items indexed in Scopus (one author had two separate Scopus author ID files which we merged). Of these 53 extremely prolific authors, 5 were BMJ news journalists (including the author with the highest number of published items, Elisabeth Mahase, n=227 published items), two were editors of the New England Journal of Medicine, and one was a journalist at Option/Bio. Among the remaining 45 scientists, the most common countries were USA (n=7), UK (n=6), Italy (n=6) and India (n=5). When limited to full papers (articles, reviews, conference proceeding papers), there were 7 authors who had published 60 or more such full papers and 50 authors who had published 40 or more. 
Supplementary Table 3

More than 700,000 scientists (and counting) have published work related to COVID-19. The most influential scientists across science were even more commonly engaged with COVID-19 research. More than one in six active, influential scientists quickly added or adjusted their publishing portfolio to include COVID-19. More than half of the active, influential scientists in several scientific subfields were involved urgently in COVID-19 work, and every single scientific subfield had some scientists publishing on COVID-19. The rapid and extensive spread of COVID-19 interests across the map of science was unique compared with other major epidemic infectious diseases. A comparison against 5 other major epidemic infectious diseases showed that none of them came anywhere close to the explosive nature of the involvement of the scientific workforce in COVID-19-related work. This applied even to HIV/AIDS and tuberculosis, which have had a far greater cumulative mortality toll. HIV/AIDS has killed over 35 million people and tuberculosis has killed over 1 billion people to date. 16, 17 Our data even underestimate the explosive growth of COVID-19-related work, since some papers are published but not yet indexed. Some of this deficit is captured by preprints (a popular method of disseminating information in the COVID-19 era), 18

Many authors had published an astonishingly large number of COVID-19 items, and 53 had published 60 or more in such a short time. Given delays in indexing, these numbers may underestimate the hyper-prolific productivity. The concentration of hyper-prolific authors in countries like China, Hong Kong, and Italy may be related to the early outbreak of the pandemic in these locations, as well as prevalent co-authorship practices in these countries. 
Some of the unethical and questionable practices surrounding authorship may cluster in specific countries and specific research environments that overtly game and manipulate authorship, through practices such as gift or honorary authorship. Importantly, meritorious productivity versus sloppiness is difficult to disentangle without examining each case in depth. A large share of the hyper-prolific authors capitalized mostly on copious publishing of editorializing items rather than full papers (articles, reviews or conference proceeding papers). We also addressed the citation impact of authors for their COVID-19 work. The top ranks included many journalists and editors who published numerous news stories and editorials in their highly visible medical and science journals. This news/editorial function may be helpful. These published items may be readily used for citations, as they are often published well in advance of the scientific work to which they refer. However, the quality, standards and validity of rapidly deployed non-peer-reviewed items are unknown. Flashy news, media, and editorializing in both academic journals and the popular press may be prominent during the pandemic. 21-24 It is unknown whether non-peer-reviewed news stories and in-house editorials in major journals help safeguard against the "infodemic" or sometimes contribute to making things worse. Excluding journalists and editors of prestigious journals, the key countries of the authors with the highest composite citation indicator tended to be similar to the countries of the most prolific authors. A few subfields accounted for the lion's share of the authors with the highest composite citation indicator. The rapid response of the scientific community to crisis is largely a welcome phenomenon. Many scientists quickly focused their attention on an urgent situation and an entirely new pathogen and disease. 
This demonstrates that the scientific community has sufficient flexibility to shift attention rapidly to major issues. Much was swiftly learned about COVID-19. The quality of the published work was not assessed in our analysis, given the broad scope and huge diversity of the included papers. Nevertheless, many surveys of the quality of COVID-19 publications already exist. 25-38 Although existing surveys of the quality of COVID-19 research do not cover all subfields of investigation and quality is often difficult to measure precisely, the consistent finding of high prevalence of low quality studies across very different types of study designs suggests that a large portion (perhaps even the large majority) of the immense and rapidly growing COVID-19 literature may be of low quality. Moreover, massive productivity has been described in the pre-COVID era, as affecting researchers across many fields 39 and may be a particular feature for COVID-19 research. Extreme productivity would be worrisome if it sacrifices quality. The spread of COVID-19 publications in topics and authors traditionally working beyond key relevant disciplines further testifies to the great attractiveness of COVID-19 as a field of investigation. The favorable aspect of this expansion is the ability to bring in specialists with expertise in diverse fields, fostering interdisciplinarity. There are situations where experts in seemingly extremely remote fields (e.g., music or second-language acquisition) may indeed have relevant contributions to make to the COVID-19 literature. For example, experts in almost any field may be fully justified in publishing on how COVID-19 impacted their work. However, we have anecdotally noted that many published contributions represent situations of epistemic trespassing, where scientists try to address COVID-19 health and medical questions, although they come from unrelated fields and probably lack fundamental subject-matter expertise. 
In particular, scientists who work with data of any sort may feel entitled to handle, analyze, and interpret COVID-19-related data. We do not wish to single out specific scientists, since this may be a very common problem. Furthermore, the exact magnitude of this problem is difficult to fathom, because it is impossible to know whether specific scientists have additional training or expertise in disciplines beyond those they have published on in their careers. However, in the absence of relevant subject-matter expertise among the authors' teams, the generated research products may be fundamentally flawed. 40 Such fundamentally flawed research may then even pass peer review, since the same people also populate the ranks of peer reviewers. Flaws go beyond retractions, which account for <0.1% of published COVID-19 work. 41, 42 Furthermore, there has been a rapid mobilization of funding into COVID-19 research, with some areas, e.g., vaccine development, earmarked for urgent work. According to one analysis, until the end of June 2021, $21.7 trillion have been committed to various activities related to the COVID-19 response (https://www.devex.com/news/interactive-who-s-funding-the-covid-19-response-andwhat-are-the-priorities-96833). While the vast majority of these funds are not directly related to research, some of this funding may eventually also support research products and publications. Direct research activities amount to $14 billion, plus there are $173 billion committed to vaccines and treatments and $237 billion committed to health systems. This funding may have worked as an additional attractor of scientists to this rapidly expanding field. Certain limitations should be discussed. First, current Scopus data have high precision and recall (98.1% and 94.4%, respectively), 6 but some authors may be split in two or more records and some ID records may include papers from two or more authors. 
These errors may affect single authors but are unlikely to affect the overall picture obtained in these analyses. Second, field and subfield classification follows a well-known established method, though published items are not precisely categorizable in scientific fields. Third, data on citation impact of COVID-19 authors are too early to appraise with confidence, and the ranking of specific scientists is highly tenuous and can quickly change with relatively small changes in citation counts. The bigger picture of author characteristics rather than specific names should be the focus of these data. Fourth, since many COVID-19 accepted papers are not yet indexed in Scopus, fields with slower publication and indexing may be relatively under-represented in the analyses. Fifth, we used simple terms that are highly specific for the comparative evaluation of other infectious diseases and some relevant papers and authors working on them may have been missed. However, the difference between these other diseases and the explosive nature of COVID-19 authorships is so stark that it would still be very prominent even if some additional authors working on these diseases could be identified. Sixth, given our study design, we cannot tell whether scientists who shift their attention to COVID-19 are abandoning their prior work, or just working additionally on COVID-19. The pandemic has had direct effects on some types of research, e.g., some investigations were suspended during lockdowns. One would need to have a far longer perspective to examine the long-term impact of potential "covidization" of research upon other scientific disciplines. 
How a torrent of COVID science changed research publishing - in seven charts
Covidization of research: what are the risks
Scientists fear that 'covidization' is distorting research
Against pandemic research exceptionalism
General medical publications during COVID-19 show increased dissemination despite lower validation
Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies
When peer reviewers go rogue - Estimated prevalence of citation manipulation by reviewers based on the citation patterns of 69,000 reviewers
Towards a multilingual, comprehensive and open scientific journal ontology
Character-level convolutional networks for text classification
Multiple citation indicators and their composite across scientific disciplines
Updated science-wide author databases of standardized citation indicators
Updated science-wide author databases of standardized citation indicators
Toward an objective, reliable and accurate method for measuring research leadership
Toward a consensus map of science
Research portfolio analysis and topic prominence
Epidemiology: a mortal foe
World Health Organization
Preprints bring 'firehose' of outbreak data

Figure 3. Frequency of authors according to the number of COVID-19 publications among the authors in Scopus with 5 or more publications in total on any topic.