key: cord-103310-qtrquuvv
authors: Wu, Tianzhi; Ge, Xijin; Yu, Guangchuang; Hu, Erqiang
title: Open-source analytics tools for studying the COVID-19 coronavirus outbreak
date: 2020-02-27
journal: nan
DOI: 10.1101/2020.02.25.20027433
sha:
doc_id: 103310
cord_uid: qtrquuvv

To provide convenient access to epidemiological data on the coronavirus outbreak, we developed an R package, nCov2019 (https://github.com/GuangchuangYu/nCov2019). Besides detailed real-time statistics, it offers access to three data sources with detailed daily statistics from December 1, 2019, for 43 countries and more than 500 Chinese cities. We also developed a web app (http://www.bcloud.org/e/) with interactive plots and simple time-series forecasts. These analytics tools could be useful in informing the public and studying how this and similar viruses spread in populous countries.

As demonstrated in Supp. Doc. 1, this new package also contains functionalities to facilitate data visualization. For example, with one command, users can easily plot the distribution of cases on maps of the world, China, and even individual provinces (Figure 1). With historical data, we can incorporate temporal and spatial information to create an animation to help us understand disease transmission and examine the spread of the COVID-19 outbreak.

To enable users to access these datasets without coding, we also developed interactive web apps in both English [9] and Chinese [10]. As demonstrated in Supp. Doc. 1, these apps can also be run locally from RStudio. Using these apps, users can gain insights by quickly generating all 23 plots in Supp. Doc. 2 based on daily updated data. Complementing the dashboard by Dong et al. [3], our web app enables users to select their regions of interest and check both the historical and real-time data. Generated by the app on February 25, 2020, Figure 2 shows that the total confirmed cases in the provinces outside Hubei are stabilizing, following a similar trend.
The extreme measures that the Chinese government has taken since January 23 seem to be working.

Built with the RStudio Shiny framework, these apps contain a simple forecast module. We first converted the log-transformed numbers of cases or deaths into time-series data, then used the exponential smoothing method (ets) in the R package forecast [11] with default settings to forecast the total cases. On February 7, 2020, this simple model predicted that the death toll would reach 2000 in ten days, a staggering number at the time that later materialized, unfortunately. We also converted the raw number of cases into percent daily changes and conducted a similar forecast. Interestingly, daily percent changes in both confirmed cases and deaths in China are decreasing linearly except for a few outliers (see Figures 16 and 18 in Supp. Doc. 2).

Even though not all data sources are official statistics, this kind of detailed data offers a unique opportunity to study this novel pathogen. The hundreds of cities could even be considered semi-independent outbreaks, as many of them are far from the epicenter and have been effectively on lockdown since the end of January 2020. As shown in Figures 5 and 6 in Supp. Doc. 2, the death rate in Wuhan, estimated by dividing current total deaths by total confirmed cases, is 4.47%. Probably due to an overwhelmed healthcare system, this death rate is higher than the average of 2.92% (95% confidence interval [2.35%, 3.38%]) observed in 22 Chinese cities with 200 or more confirmed cases. Cities in Hubei province have higher fatality rates than cities in other regions (Figure 6 in Supp. Doc. 2). Internationally, the death rate in Japan (2.50%) is close to that of Italy (2.60%), and both are lower than the 3.67% observed in China overall (Figure 17 in Supp. Doc. 2). The death rate in Iran is 9.63%, probably due to underreported cases.
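The forecast module described above (log-transform the counts, smooth, extrapolate, back-transform) can be illustrated with a minimal sketch. It is written in Python rather than R and is not the code used in the apps. The ets function selects its model form and parameters automatically, whereas this sketch substitutes Holt's linear (additive-trend) exponential smoothing with fixed, hand-picked smoothing parameters. The function names holt_forecast and forecast_counts, the values of alpha and beta, and the example counts are all assumptions made for illustration.

```python
import math


def holt_forecast(series, horizon, alpha=0.8, beta=0.2):
    """Forecast `horizon` steps ahead with Holt's linear exponential smoothing.

    `alpha` smooths the level and `beta` smooths the trend. Both defaults are
    illustrative assumptions, not values taken from the paper.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialize the level and trend from the first two observations.
    level = series[1]
    trend = series[1] - series[0]
    for y in series[2:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Extrapolate the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


def forecast_counts(counts, horizon, alpha=0.8, beta=0.2):
    """Log-transform cumulative counts, forecast, then back-transform."""
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive for the log transform")
    logged = [math.log(c) for c in counts]
    return [math.exp(v) for v in holt_forecast(logged, horizon, alpha, beta)]


# Hypothetical cumulative death counts (made-up numbers, not from the paper).
deaths = [100, 130, 170, 215, 260, 305, 360, 425]
print([round(x) for x in forecast_counts(deaths, horizon=3)])
```

Working on the log scale means that a constant trend in the smoothed series corresponds to a constant daily growth rate in the raw counts, which is why a simple trend model can track the early exponential phase of an outbreak.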
The rapid, exponential growth phase in China spans roughly from January 15 to February 15, 2020, when the number of confirmed cases skyrocketed 1670-fold from 41 to 68,500. Such rapid growth is now evident in South Korea, Italy, and Iran (Figure 3). Other countries with a smaller number of cases but showing a sharp upward trend include Germany, Spain, and France. If not managed well, tens of thousands of cases in each of these and other countries could be possible in weeks. Public health officials need to grasp the power of exponential growth.

Currently, city-level historical data is only available for China. These data sources occasionally change data formats, which requires us to monitor them. If the APIs stopped providing data, the real-time data would not be updated, but the historical data would remain accessible for researchers. We will maintain the web apps during this outbreak.

Our nCov2019 package reduces the barrier for researchers and public health officials in obtaining comprehensive, up-to-date data on this ongoing outbreak. With this package, epidemiologists and other scientists can directly access data from four sources, facilitating mathematical modeling and forecasting of the COVID-19 outbreak. The interactive web apps are accessible to the general public and could also be easily customized by researchers to produce other dashboards or track other countries. We hope these analytics tools could be useful in studying and managing this pathogen on a global scale.

Conflicts of Interest: None.

Supplementary Document 1: Detailed tutorial and example of how to use the R package.
Supplementary Document 2: Example of plots obtained from our web app.

Figure 3. Countries with rapidly growing COVID-19 cases. This plot is obtained using our interactive app.

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. medRxiv preprint doi: https://doi.org/10.1101/2020.02.25.20027433

References
[1] A novel coronavirus outbreak of global health concern
[2] A Novel Coronavirus from Patients with Pneumonia in China
[3] An interactive web-based dashboard to track COVID-19 in real time
[4] nCov2019: An R package for accessing coronavirus statistics
[5] Real-time tracking of the coronavirus infection
[6] Real-time data on the novel coronavirus
[7] Daily statistics of 2019-nCov
[8] Clinical features of patients infected with 2019 novel coronavirus in Wuhan
[9] Coronavirus COVID-19 outbreak statistics and forecast
[10] Coronavirus COVID-19 outbreak statistics and forecast
[11] Forecasting with exponential smoothing: the state space approach. Springer series in statistics