key: cord-0887686-fzs99kku authors: Locey, Kenneth J; Webb, Thomas A; Khan, Jawad; Antony, Anuja K; Hota, Bala title: An Interactive Tool to Forecast US Hospital Needs in the Coronavirus 2019 Pandemic date: 2020-04-23 journal: nan DOI: 10.1101/2020.04.20.20073031 sha: 2e3a012c752d8cb6b18a5cb79d93282ac592c9ab doc_id: 887686 cord_uid: fzs99kku

Hospital enterprises are currently faced with anticipating the spread of COVID-19 and the effects it will have on visits, admissions, bed needs, and crucial supplies. While many studies have focused on understanding the basic epidemiology of the disease, few open-source tools have been made available to aid hospitals in their planning. We developed a web-based application for US states and territories that allows users to choose from a suite of models already employed in characterizing the spread of COVID-19. Users can obtain forecasts for hospital visits and admissions as well as anticipated needs for ICU and non-ICU beds, ventilators, and personal protective equipment supplies. Users can also customize a large set of inputs, view the variability in forecasts over time, and download forecast data. We describe our web application and its models in detail and provide recommendations and caveats for its use. Our application is primarily designed for hospital leaders, healthcare workers, and government officials who may lack specialized knowledge in epidemiology and modeling. However, specialists can also use our open-source code as a platform for modification and deeper study. As the dynamics of COVID-19 within the US change, our application will also change to meet emerging needs and questions of the healthcare community.

Health Organization on March 11th, 2020. By then, confirmed COVID-19 cases had been reported in 109 countries and exceeded 128,000 worldwide (data source: Johns Hopkins University Center for Systems Science and Engineering). That number increased more than 10-fold in less than one month.
As COVID-19 spreads within and among nations, healthcare enterprises grapple with the challenges of preparing for the growing number of COVID-19 patients and with appropriating the resources needed to treat patients while protecting healthcare professionals. In the United States (US), policy makers and hospital leaders are trying to determine how best to allocate resources in the face of an anticipated surge in demand that may last for several months to come (Bukhari and Jameel, 2020). However, even as COVID-19 threatens to overwhelm healthcare systems, the predictive analytics tools that would otherwise allow hospitals to make informed decisions are lagging behind the increasing number of studies aimed at characterizing the basic epidemiology of COVID-19.

To meet the pressing needs of hospital enterprises, we developed an interactive, open-source, web-based application to provide state- and hospital-specific forecasts of COVID-19 patients and related supplies. Our application is available at http://covid19forecast.rush.edu/ and allows users to employ a suite of models already used in the prediction of COVID-19 cases. It then couples these models to granular, customizable inputs to produce hospital-level forecasts for COVID-19 visits and admits, ICU and non-ICU beds, ventilators, and various personal protective equipment (PPE) supply needs. Our efforts are aimed at addressing immediate and anticipated healthcare demands and at enabling informed decision-making by government officials and healthcare professionals who may lack specialized expertise in epidemiology, modeling, and data science. Because we make our aggregated data and source code freely available, and offer additional source code outside the application itself, epidemiologists, modelers, and data scientists may also find our application useful as is, or as a modifiable resource for deeper analysis.
In this paper, we describe our application in detail, focusing on the data and models it uses, the inputs it allows users to enter, and the graphs, tables, and downloadable data it provides. We also provide guidance on the use of this application, the interpretation of its outputs, and the caveats of our approach. Finally, we discuss the value of our application to meeting the needs of the healthcare community, its potential as a tool for generating novel insights, and modifications to come. In addition to our aim of empowering administrators, physicians, and governmental agencies to make informed decisions, we sought to enable other predictive healthcare analytics teams and researchers. Specifically, the modification and deployment of our application's source code requires a minimal set of widely popular open-source software (e.g., the Python language, Jupyter notebook) and little-to-no experience in web development.

medRxiv preprint doi: https://doi.org/10.1101/2020.04.20.20073031; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Overview - Our open-source application allows users 1) to aggregate data from a popular open-source repository of COVID-19 data, 2) to track and forecast the progression of COVID-19 cases across US states using a suite of well-known models, and 3) to customize a large set of input parameters to provide state- and hospital-specific forecasts for numbers of hospital visits, admitted patients, ICU needs, and personal protective equipment (PPE) supply needs. The application also allows users to adjust the length of forecasts, to adjust expected time lags in patient visits, to adjust the average length of stay (LOS) for ICU and non-ICU patients, to examine forecasts from previous days, and to download forecast data for deeper analysis.
Data - Our application accesses COVID-19 data from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) (2). Specifically, our application downloads, aggregates, and stores daily reports from the JHU CSSE public GitHub repository (https://github.com/CSSEGISandData/COVID-19). These daily reports contain numbers of confirmed cases and numbers of reported deaths and recoveries for counties, states, provinces, and nations reported since January 22nd, 2020. For select models, our application uses population sizes for US states and territories based on data from the US Census Bureau (2010-2019). Additionally, our application uses dates of COVID-19 arrival in US states and territories based on data available from state and territory governmental agencies (e.g., Departments of Health).

Modeling COVID-19 cases - Our application allows users to choose from five simplistic models of growth that have been used in characterizing the spread of COVID-19 as well as one popular epidemiological model that has been frequently used in the study of COVID-19 (Fig 1). Altogether, the following suite of models lets users compare and contrast forecasts resulting from different forms of growth and varying complexity of disease dynamics and social response (e.g., testing lags, social distancing).

Exponential growth - Initial stages of growth often appear limited only by the inherent growth rate (r) of the population or disease. In this way, exponential growth proceeds multiplicatively according to a simple functional form, $N_t = N_0 e^{rt}$, where $N_0$ is the initial infected population size, $t$ is the amount of time passed, and $N_t$ is the infected population size at $t$.
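As a small numerical illustration of the exponential form (a sketch only, not the application's code; the function names and the example values of N0 and r are hypothetical), the snippet below evaluates N_t and the doubling time ln(2)/r that a constant growth rate implies.

```python
from math import exp, log


def exponential_cases(n0, r, t):
    """Infected population size after t days: N_t = N0 * exp(r * t)."""
    return n0 * exp(r * t)


def doubling_time(r):
    """Days needed for cases to double under a constant growth rate r > 0."""
    return log(2) / r


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    n0, r = 100, 0.2
    print(round(doubling_time(r), 2))          # days per doubling
    print(round(exponential_cases(n0, r, 7)))  # projected cases after one week
```

Because r is held constant, the doubling time never lengthens, which is one way to see why this form can only describe the earliest phase of an outbreak.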
The exponential model has been widely used to characterize the spread of COVID-19 during the initial weeks of infection. Because it assumes that r is constant, the exponential model has a simple log-linear transformation, $\log(N_t) = \log(N_0) + r t$, that allows log-transformed numbers of cases to be regressed on t (Sit, Poulin-Costello, and Bergerud 1994). Our application uses this exponential regression to obtain predictions for the expected number of confirmed COVID-19 cases (N). This model has explained upwards of 99% of variation in the initial days or weeks of COVID-19 spread within states; however, it quickly begins to fail because it only allows for continued rapid growth (Fig 2).

Quadratic growth - Initial stages of growth may be more rapid than expected from the exponential model, while the later monotonic increase in N can proceed less rapidly than predicted by the exponential. In these cases, growth may be quadratic, i.e., characterized by a constant change in growth rate. Early COVID-19 studies have implicated quadratic growth in the spread of COVID-19 (e.g., Brandenburg 2020; Fang, Ne, and Penny 2020) and the quadratic model, to date, has continued to perform well (Fig 2). The quadratic function, $f(x) = x^2 + x + c$, is a 2nd-order polynomial that can be applied to population growth as $N_t = \beta_1 t^2 + \beta_2 t + N_0$. Our application uses numerical optimization of the fitted parameters, $\beta_1$ and $\beta_2$, to find the best-fit quadratic function for a given time series and, hence, to predict values for N. This model has, thus far, improved as COVID-19 spreads and has explained upwards of 99% of variation in COVID-19 cases among states (Fig 2). However, like the exponential model, the quadratic model only allows for continued growth, i.e., no saturation and decline. Consequently, the quadratic model must eventually fail as COVID-19 cases saturate.

Logistic growth - Exponential growth within a population cannot continue ad infinitum. Instead, growth must slow as an upper limit is approached or as natural limitations to disease spread (e.g., …). Like the quadratic model, the logistic model has also continued to perform well as states have progressed in COVID-19 infection (Fig 2). The logistic model takes a relatively simple functional form, $N_t = \frac{\alpha}{1 + e^{-rt}}$, where α is the upper limit of N and r is the intrinsic rate of increase. Our application uses numerical optimization of α and r to find the best-fit logistic function and, hence, predicted values for N. This model has, to date, improved as COVID-19 spreads and has explained upwards of 99% of variation in COVID-19 cases among states (Fig 2). While this model allows for saturation, it does not allow for decrease.

Cubic growth - Despite their usefulness during the early-to-mid stages of COVID-19 spread, both the exponential and quadratic models only allow for monotonic and rapid increase (Fig 1). In contrast, a cubic function allows growth to accelerate until an inflection is reached, after which the function begins to saturate (Fig 1). The cubic function is a 3rd-order polynomial that can be applied to population growth as $N_t = \beta_1 t^3 + \beta_2 t^2 + \beta_3 t + N_0$. Our application uses numerical optimization of the fitted coefficients, $\beta_1$, $\beta_2$, and $\beta_3$, to find the best-fit cubic function for a given time series and, hence, to predict values for N. Depending on the values of the estimated coefficients, the cubic function can saturate and then decrease (Fig 1).
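To make the dependence on the coefficients concrete, the short derivation below uses only standard calculus on the cubic form given above; it is an illustration and is not taken from the application's code.

```latex
\begin{align*}
N_t &= \beta_1 t^3 + \beta_2 t^2 + \beta_3 t + N_0 \\
\frac{dN_t}{dt} &= 3\beta_1 t^2 + 2\beta_2 t + \beta_3 \\
\frac{d^2 N_t}{dt^2} &= 6\beta_1 t + 2\beta_2 = 0
  \quad\Rightarrow\quad t_{\text{inflection}} = -\frac{\beta_2}{3\beta_1} \\
\frac{dN_t}{dt} &= 0
  \quad\Rightarrow\quad t_{\pm} = \frac{-\beta_2 \pm \sqrt{\beta_2^2 - 3\beta_1\beta_3}}{3\beta_1}
\end{align*}
```

Real turning points exist only when $\beta_2^2 > 3\beta_1\beta_3$; without them the cubic is monotonic. When $\beta_1 < 0$ and turning points exist, the local maximum is at $t_-$ (the larger of the two roots), after which the fitted curve declines.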
The cubic model has, thus far, explained upwards of 99% of variation in COVID-19 cases among states and continues to improve as COVID-19 spreads (Fig 2).

The Gaussian curve is symmetrical, has two parameters (mean = μ, standard deviation = σ), and belongs to the exponential family of distributions (Fig 1). When used to model the spread of disease, Gaussian curves are symmetrical around a climax day, with the change in the rate of growth determining the standard deviation about the curve. Gaussian models have previously been successful in approximating the spread of COVID-19 in Germany (Schlickeiser and Schlickeiser 2020). Our application uses numerical optimization of μ and σ to find the best-fit Gaussian function and, hence, predicted values for N. This model has, thus far, explained >99% of variation in COVID-19 cases among states and continues to improve as COVID-19 spreads (Fig 2).

In these equations, α is the inverse of the incubation period, γ is the inverse of the average infectious period, and β is the average number of contacts of infected persons with susceptible persons per unit time. Our application imputes the initial value of β from a well-known …

We allowed β to decrease in proportion to I. We assumed that people will, on average, reduce their contact with others when the populace is aware that an increasing percent of their population is infected. This approach allows an inherent degree of social distancing to emerge as a frequency-dependent phenomenon.
We also simulated an explicit effect of social distancing (λ) to capture the overall strength of response to public health policies. These effects were included as time-iterative modifications to β:

This function allows β to remain unchanged when either I or λ equals 0. When λ equals 1, the daily change in β is governed by the implicit frequency-dependent effect of I. Importantly, simple algebraic rearrangement shows that the product of social distancing (λ) and the fraction infected (I) determines the percent daily change in contact rate (β):

As a result, λ determines the daily proportional change in the contact rate per infected fraction of the total population:

We also modified the classic SEIR model to account for initial time lags in COVID-19 testing. Specifically, and particularly in the US, the initial lack of widespread testing for COVID-19 may have artificially dampened the apparent number of positive cases within the first month of the first reported infection. We accounted for this effect by modifying the apparent size of I while allowing the actual size of I to grow according to the SEIR-SD dynamic. This time-iterative modification took the following logistic functional form:

where τ and ε are fitted parameters. This equation models testing as low-to-nonexistent during the initial weeks of the outbreak and as accelerating afterwards. To date, the SEIR-SD model has generally performed as well as or better than the exponential, quadratic, logistic, cubic, and Gaussian models (Fig 2). As COVID-19 cases saturate and begin to decline, we expect the SEIR-SD model to produce more realistic downward trajectories than the other models.
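The general shape of such a model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the textbook discrete-time SEIR update with α, γ, and β as defined above, and it substitutes a hypothetical contact-rate update, β ← β / (1 + λ·I/N), chosen only because it leaves β unchanged when either I or λ is 0. The function name, the daily time step, and the example parameter values are likewise assumptions, and the testing-lag adjustment is omitted.

```python
def simulate_seir_sd(pop, beta, alpha, gamma, lam, days, i0=1.0):
    """Discrete-time SEIR with a hypothetical social-distancing update on beta.

    pop   : total population size
    beta  : initial contact rate per day
    alpha : inverse of the incubation period (per day)
    gamma : inverse of the average infectious period (per day)
    lam   : strength of social distancing (0 disables the effect)
    Returns one (S, E, I, R, beta) tuple per simulated day.
    """
    s, e, i, r = pop - i0, 0.0, i0, 0.0
    history = []
    for _ in range(days):
        new_exposed = beta * s * i / pop   # S -> E
        new_infectious = alpha * e         # E -> I
        new_recovered = gamma * i          # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        # Assumed form (not from the paper): beta shrinks as the infected
        # fraction grows and is unchanged when lam == 0 or i == 0.
        beta = beta / (1.0 + lam * (i / pop))
        history.append((s, e, i, r, beta))
    return history


if __name__ == "__main__":
    # Hypothetical parameter values for illustration only.
    run = simulate_seir_sd(pop=1_000_000, beta=0.5, alpha=0.2,
                           gamma=0.1, lam=10.0, days=200)
    peak_day = max(range(len(run)), key=lambda d: run[d][2])
    print("day of peak infections:", peak_day)
```

Because the contact rate can only fall as infections accumulate, a sketch like this produces a rise, a peak, and a decline in I, which is the qualitative behavior the simpler growth curves cannot all reproduce.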
Our application performs a pseudo-optimization on the SEIR-SD model parameters and a likely date of initial infection, as opposed to using the first reported occurrence. Our implementation of the SEIR-SD model is based on an unbiased search of multivariate parameter space within ranges of parameter values derived from population sizes for US states and territories and the increasing corpus of COVID-19 literature (Table 1). Our application performs 50,000 iterations and chooses the set of parameters that maximizes the explained variation in observed data. This implementation avoids the computational challenges of applying numerical optimizers to complex simulation models and avoids the problems that these optimizers can have in becoming trapped in local minima.

Forecasting COVID-19 cases - Our application allows users to select one of the abovementioned models and a location from a list of US states and territories (Fig 3). It then plots the reported number of COVID-19 cases along with model predictions (up to present day) and forecasts (up to 60 days ahead) (Fig 3). Users can also view how the predictions and forecasts have changed over the last 10 days (Fig 3).

… hospital. The Poisson distribution has ideal properties for modeling this scenario. Specifically, it is a discrete probability distribution (x-values correspond to days) with a simple formulation (e.g., the mean equals the variance), whose only parameter, the mean, corresponds directly to the expected average time lag. Users can view the change in the forecasted patient census by adjusting the expected time lag (Fig 4).
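One way to picture a Poisson-distributed time lag is sketched below. This is an illustration under assumptions, not the application's code: it assumes the lag runs from a newly confirmed case to the corresponding hospital visit, and the function names, the `max_lag` cutoff, and the example numbers are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(mean_lag, max_lag):
    """Poisson probabilities for lags of 0..max_lag days."""
    return [exp(-mean_lag) * mean_lag ** k / factorial(k)
            for k in range(max_lag + 1)]


def lagged_visits(new_cases, mean_lag, max_lag=30):
    """Spread each day's new cases over later days using Poisson lag weights."""
    weights = poisson_pmf(mean_lag, max_lag)
    visits = [0.0] * (len(new_cases) + max_lag)
    for day, cases in enumerate(new_cases):
        for lag, weight in enumerate(weights):
            visits[day + lag] += cases * weight
    return visits


if __name__ == "__main__":
    # Hypothetical daily new-case counts and a 4-day expected lag.
    daily_new_cases = [10, 20, 40, 80, 60, 30]
    for day, expected in enumerate(lagged_visits(daily_new_cases, mean_lag=4)[:10]):
        print(day, round(expected, 1))
```

Raising `mean_lag` shifts the expected visits later and spreads them over more days, since the Poisson variance grows with its mean.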
Because bed needs must also reflect the numbers of beds needed for new admissions, those opened from discharges, and those currently occupied, our application models the daily carryover of the patient census using expected LOS and the cumulative distribution function (cdf) of the binomial distribution. The binomial distribution is a discrete probability distribution that models binary outcomes (e.g., patients either leave the hospital or stay for an additional day) and requires only two parameters (p, n). We set the value of p to 0.5 and set the value of n to be twice the average LOS. Doing so produces a symmetrical probability mass function (pmf) with a mean equal to the average LOS. This pmf is then converted to a cdf, which produces a first approximation for the fraction of 1-day, 2-day, etc., patients that are expected to be discharged on the present day. The fraction of patients not expected to be discharged is then carried over to the following day, e.g., a 1st-day patient becomes a 2nd-day patient and thereby has a different probability of leaving the hospital on the current day. This process is then iterated from the date of the first COVID-19 admit to the latest day in the user-requested forecast window (Fig 5).

… procedural face masks, extra-large yellow isolation gowns, anti-fog (with film) surgical face masks, anti-fog full face shields, and particulate filter respirators. The values entered are then multiplied by the forecasted census of the respective patient type to produce graphical and tabulated results (Fig 6).

Provisioning of source code - Our application was built using the Python language (v3.7.4), Jupyter notebook, and the Python-based Voila software, which converts Jupyter notebooks to dashboard-like applications.
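Since the application is written in Python, the census carryover described earlier in this section can be illustrated with a short standalone sketch. It is not the repository's code. It takes one plausible reading of the method, treating the binomial cdf (p = 0.5, n = 2 × average LOS) as the cumulative fraction of an admission cohort discharged by a given day of stay; the function names, the integer `avg_los`, and the example admissions are assumptions, and the application's own iteration may differ in detail.

```python
from math import comb


def binomial_cdf(n, p=0.5):
    """Cumulative binomial probabilities P(X <= k) for k = 0..n."""
    cdf, running = [], 0.0
    for k in range(n + 1):
        running += comb(n, k) * p ** k * (1 - p) ** (n - k)
        cdf.append(running)
    return cdf


def forecast_census(admits, avg_los):
    """Expected occupied beds per day from daily admissions.

    admits  : list of new admissions per day
    avg_los : average length of stay in whole days (assumed integer)
    """
    n = 2 * avg_los                  # gives a symmetric pmf with mean avg_los
    cdf = binomial_cdf(n)
    census = []
    for day in range(len(admits)):
        occupied = 0.0
        for admit_day in range(day + 1):
            stay = day - admit_day   # days since this cohort was admitted
            discharged = cdf[stay] if stay <= n else 1.0
            occupied += admits[admit_day] * (1.0 - discharged)
        census.append(occupied)
    return census


if __name__ == "__main__":
    # Hypothetical admissions and a 5-day average length of stay.
    for day, beds in enumerate(forecast_census([4, 6, 9, 12, 10, 7, 5], avg_los=5)):
        print(day, round(beds, 1))
```

A longer average LOS widens the window over which each cohort remains in the census, so the bed count keeps rising for some days after admissions have peaked.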
All code and associated data are available from the public GitHub SupplyDemand repository, found on the Rush Quality Safety and Value analytics GitHub organization (https://github.com/Rush-Quality-Analytics/SupplyDemand). The repository provides an informative README.md file and the source code includes extensive commenting to assist users in their use and potential modification.

Meeting immediate needs - Hospitals and healthcare enterprises are endeavoring to make appropriate preparations and acquire adequate supplies to meet the challenges of the COVID-19 pandemic. While many studies have aimed to characterize the basic epidemiology of the disease, and many online tools have been developed to visualize its spread, few tools have been developed that empower hospitals to make informed decisions about expected visits and admits and about needs for ICU beds, ventilators, and PPE. To this end, our application is already in use by our home institution and several hospitals across the country and is available at http://covid19forecast.rush.edu/. Consequently, the present work is intended 1) to make our application broadly known to the healthcare and scientific communities, 2) to give healthcare providers an in-depth understanding of our application, and 3) to point specialists, non-specialists, and in-house predictive analytics teams to a freely available tool that can also serve as a modifiable platform of analytical source code and aggregated data.
While some regions in the United States have started to experience peaks in cases with plateaus and declines, we anticipate that, until a vaccine is available, repeated episodes of increased cases will be seen, for which forecast models will inform operational responses.

Novel insights for applied and basic research - While our intention was not to provide de novo models or to provide refined epidemiological parameter estimates, our tool does allow for novel insights. First, our SEIR-SD model incorporates two phenomena of global, national, and local concern, i.e., social distancing and lags in COVID-19 testing. To our knowledge, few extensions of the SEIR model have accounted for social distancing as driven both by an emergent social response to an increasing percent infected and by external forces (e.g., public policy). Likewise, few if any SEIR models have accounted for the influence of lags in testing on the apparent size of the infected population. Second, we envision that applied and basic research studies can be conducted using the downloadable data of our application and the freely available source code. Specifically, users can investigate any number of simple-to-complex relationships using the downloadable forecast data that result from our suite of models and that are offered alongside adjustable forecast windows, time lags, lengths of stay, and other customizable parameters and aggregated data (population size, date of first reported infection, numbers of confirmed COVID-19 cases).

Pending and potential modifications - We are continually improving the functionality and performance of our application to meet the predictive analytic needs of our home institution and the broader healthcare community. In the near future, we will include models to predict the eventual decline of the pandemic and potential resurgence as social distancing guidelines and other mitigating policies are relaxed.
We also plan to provide the functionality to 1) examine regions outside the US, 2) examine county-level regions within the US, 3) include a greater array of supply needs and forecasts for numbers of providers and staff needed, and 4) allow providers to begin planning how and when to increase the number of elective surgeries and ambulatory visits. Because we built our open-source application from a small set of freely available and highly popular software tools, we expect that other researchers and healthcare analytics teams could readily pursue these and other improvements.

Caveats and limitations - Our web application is versatile and easy to use. However, users should consider the following caveats and limitations of modeling. First, our application draws from a widely used COVID-19 dataset that may not reflect the true prevalence of COVID-19 within each US state and territory. While our application allows users to enter several parameters …

We thank the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) for continuing to provide daily updated data on the COVID-19 pandemic.

Our application allows users to choose from six models that have previously been employed in the study of COVID-19, the general forms of which are depicted below.
While the exponential and quadratic (2nd-degree polynomial) models only allow for continued growth, the logistic model allows growth to saturate. The cubic (3rd-degree polynomial), Gaussian, and SEIR-SD models allow growth to saturate and decrease.

Figure 5. The third pane of our application plots and tabulates the forecasted patient census. Left: A graph of forecasted numbers of critical care and non-critical care patients. Right: A table of forecasted bed needs for all COVID-19 patients, non-ICU patients, ICU patients, and ICU patients on ventilators. The application informs the user that forecasted data beyond 14 days are available via csv downloads.
Users can click the teal button at the bottom to display or refresh downloadable csv files.

How will country-based mitigation measures influence the course of the COVID-19 epidemic? The Lancet
Quadratic growth during the 2019 novel coronavirus epidemic
A short history of mathematical population dynamics
Risk assessment of novel coronavirus COVID-19 outbreaks outside
Gaussian process approximations for fast inference from infectious disease data
Will coronavirus pandemic diminish by summer? Available at
Estimating the effective reproduction number of the 2019-nCoV in China
Transmission dynamics of the COVID-19 outbreak and effectiveness of government interventions: A data-driven analysis
Some epidemiological models with nonlinear incidence
Critical care utilization for the COVID-19 outbreak in Lombardy, Italy: early experience and forecast during an emergency response
Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases
A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China with individual reaction and governmental action
The reproductive number of COVID-19 is higher compared to SARS coronavirus
How species richness and total abundance constrain the distribution of abundance
Quantifying undetected COVID-19 cases and effects of containment measures in Italy
Effective containment explains sub-exponential growth in confirmed cases of recent COVID-19 outbreak in Mainland China
Models in ecology
Epidemic analysis of COVID-19 in China by dynamical modeling
Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions
COVID-19 and Italy: what next? The Lancet
COVID-19 outbreak on the Diamond Princess cruise ship: estimating the epidemic potential and effectiveness of public health countermeasures
Real-time forecasts of the COVID-19 epidemic in China from
Catalogue of curves for curve fitting
Phase-adjusted estimation of the number of coronavirus disease 2019 cases in Wuhan
WHO Director-General's opening remarks at the media briefing on COVID19 - March 2020
Generalized logistic growth modeling of the COVID-19 outbreak in 29 provinces in China and in the rest of the world
Estimation of the time-varying reproduction number of COVID-19 outbreak in China