key: cord-0971643-ua3832ua authors: Buttrick, Peter title: A Modern Library of Babel date: 2020-08-24 journal: J Card Fail DOI: 10.1016/j.cardfail.2020.07.007 sha: 19d2d9cf5d4c91e420fb5cf1520393457203508e doc_id: 971643 cord_uid: ua3832ua
The recent explosion of literature linked to COVID-19 reminds one of Jorge Luis Borges' short story, "The Library of Babel". This work describes a universe of books, the order and content of which are random and quite possibly meaningless. However, while the vast majority of the books in this library are pure nonsense, it also contains, somewhere, every coherent book ever written, or that might ever be written, and all useful information, including predictions of the future. Despite, or perhaps because of, this glut of information, the books are useless to the reader, leaving the librarians in a state of suicidal despair. Our current state of affairs is not so apocalyptic and journal editors seem to be holding up; however, there is some resonance between the current proliferation of COVID-related publications and Borges' metaphoric library. By one estimate, the COVID-19 literature published between February and May this year included more than 30,000 papers and was doubling every 20 days, 1 among the biggest explosions of scientific literature ever. A large plurality of these papers were in non-peer-reviewed journals or were presented initially as non-peer-reviewed preprints. When the peer-review process was deployed, it was clearly accelerated: the 14 medical journals publishing the most COVID-19 content halved the average time from submission to publication, 2 and whether this affected the quality of the review process is conjectural. This explosion of literature has provoked the creation of a tranche of analytic tools, some based on natural language processing, designed to build searchable archives, the most prominent of which is probably the CORD-19 data set. 
3 It currently holds more than 60,000 published articles and preprints, including studies of coronaviruses dating back to the 1950s, which in turn allows some rational filtering and integration of this new, rapidly expanding body of work. Obviously, this outpouring of literature was fueled by the legitimate need for information, any information, that might help navigate the acute phase of the pandemic, and it seems likely (although it is probably still premature to make a definitive assessment) that the sheer volume of information that was shared so quickly altered treatments and improved outcomes for thousands of hospitalized patients. However, it is also clear that misleading information was propagated. The most striking example of this probably relates to the very early, small, non-randomized clinical reports claiming efficacy of hydroxychloroquine and azithromycin as treatment for COVID-19, 4 which have subsequently been largely disproven and may even have been shown to be harmful in patients with co-incident cardiac disease (despite deafening initial enthusiasm). There have also been several high-profile retractions of peer-reviewed papers 5 in which the clinical cohorts studied may not have been carefully validated. (An irony was that one of these retractions addressed the utility of hydroxychloroquine.) A further problem, identified by the editors of JAMA, 6 is that some of the patients described in various publications have been reported on more than once. Leaving aside the fact that this represents a lapse in ethical standards, describing the same patients in different articles, per the JAMA editorial board, "creates an inaccurate scientific record, may affect the accuracy of subsequent estimates of prevalence of the disease or outcomes, and may preclude valid meta-analyses". In sum, these publication trends have real potential for harm, especially in the case of COVID-19 pandemic research, where the therapeutic and epidemiologic landscape is so labile. 
So a valid question that all journal editors and journal readers need to wrestle with is how to balance the need for very rapid dissemination of new information with the need for careful scrutiny and validation of that information, a process which of necessity takes a great deal of time and attention. Obviously there is no clear answer, but as the process has unfolded, it appears that the very early case descriptions were critical to disseminate as quickly as possible so that clinicians around the world would be alert to a new emerging pathogen and associated clinical syndrome. This phase of information dissemination recalls the early days of the AIDS epidemic, which first appeared in the Morbidity and Mortality Weekly Report (MMWR) of the CDC in 1981 as a description of 5 previously healthy men in Los Angeles with unexplained Pneumocystis carinii pneumonia. 7 This single descriptive report prompted several others from around the country, and in very short order an acquired immune deficiency disorder was recognized by the healthcare community. Having gotten past this first descriptive phase with COVID-19, we have now entered a phase where the risk of disseminating misleading information is real, driven by the legitimate desire for any new information linked to therapeutic outcomes as well as the desire of journals and journal editors to publish high-impact studies. In this phase it is critically important that the peer review process (and time to publication) be both accelerated and rigorous. Journal editors can and should demand timely reviews of highly impactful manuscripts, but they should insist on exacting attention to statistical analysis, transparency around study registration and prespecified analytical plans, and access to primary data. 
This latter point is very important to emphasize, as the lack of this access is what led to the high-profile retractions cited above, as well as to the risk of the same patients being described in multiple publications. The cardiology research community has historically done an excellent job in building searchable data registries through the NCDR, the VA COIN network and others, so the burden and responsibility on us in this area is particularly high. Information dissemination also places a very high burden on the reader. It is very likely that new data will be highly nuanced, not adequately captured in a manuscript abstract, and will itself require careful interpretation. As we have seen, much information about COVID-19 ends up on social media platforms even before it is formally reviewed, and filtering this so that it is accurate and not politicized is the responsibility of all informed consumers. Indeed, in the case of COVID-related publications, the Altmetric score (a measure of media attention) and the citation index of published works correlate poorly, suggesting a real divergence in interest between biomedical researchers and those who primarily engage with the topic through the media. So far the clinical and research communities seem to be navigating this unprecedented path reasonably well, but if we fail, there is a very real risk that a contemporary version of Borges' Library of Babel will emerge.
Pandemic Publishing: Medical journals drastically speed up their publication process for Covid-19
CORD-19: The Covid-19 Open Research Dataset
Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical
Authors, elite journals under fire after major retractions
Possible Reporting of the Same Patients With COVID-19 in Different Reports. JAMA