key: cord-0518071-0jpvqqdw authors: Bartz-Beielstein, Thomas; Rehbach, Frederik; Mersmann, Olaf; Bartz, Eva title: Hospital Capacity Planning Using Discrete Event Simulation Under Special Consideration of the COVID-19 Pandemic date: 2020-12-14 journal: nan DOI: nan sha: c3039be0f703a81011d4b15abb2f43497c3a2c48 doc_id: 518071 cord_uid: 0jpvqqdw We present a resource-planning tool for hospitals under special consideration of the COVID-19 pandemic, called babsim.hospital. It provides many advantages for crisis teams, e.g., comparison with their own local planning, simulation of local events, simulation of several scenarios (worst / best case). There are benefits for medical professionals, e.g, analysis of the pandemic at local, regional, state and federal level, the consideration of special risk groups, tools for validating the length of stays and transition probabilities. Finally, there are potential advantages for administration, management, e.g., assessment of the situation of individual hospitals taking local events into account, consideration of relevant resources such as beds, ventilators, rooms, protective clothing, and personnel planning, e.g., medical and nursing staff. babsim.hospital combines simulation, optimization, statistics, and artificial intelligence processes in a very efficient way. The core is a discrete, event-based simulation model. Resource and capacity planning with respect to the COVID-19 pandemic is a challenging task for hospitals. This paper describes BaBSimHospital, a resource planning tool that was developed in cooperation with ICU experts, crisis teams, and health administrations. babsim.hospital combines simulation, optimization, statistics, and artificial intelligence processes in a very efficient way. The core is a discrete, event-based simulation model. Sequential parameter optimization (SPOT) is used to optimize the parameters (status transition probabilities, length of stay, distribution properties) Bartz-Beielstein, Lasarczyk, and Preuss (2005) . For modeling purposes, distributions (including a truncated and translated Gamma distribution) were specially developed by us in order to realistically simulate the length of stay. babsim.hospital takes into account different risks for individual groups of people (age and gender-specific) and can be used for any resources beyond the planning of bed capacities. Using BaBSimHospital provides the following advantages for crisis teams: • comparison with your own local planning, • simulation of local events, arXiv:2012.07188v1 [stat.AP] 14 Dec 2020 • adaptation to your own situation, • simulation of any scenario (worst / best case), • simulation of any pandemic scenario, considering the local situation, • standardized approach. And, there are benefits for medical professionals, e.g, • analysis of the pandemic at local, regional, state and federal level, • consideration of special risk groups, • validation of the length of stay, • validation of the probabilities. Finally, there are potential advantages for administration, management, e.g., • assessment of the situation of individual hospitals taking local events into account, • consideration of relevant resources: beds, ventilators, rooms, protective clothing, • personnel planning: medical and nursing staff. The ideas used in the babsim.hospital implementation are based on the paper by Lawton and McCooe (2019) . babsim.hospital is implemented in the statistical programming language R, see R Core Team (2020). It uses a discrete-event simulation, which uses the simmer package (Ucar, Smeets, and Azcorra 2019) . This paper describes practical aspects ("how to use") of the babsim.hospital package. Although this paper focuses on the situation in Germany, it can be used for different settings. For example, we have used data from UK to model a local configuration. Even, if there are no real-world data available, the user can perform simulations based on synthetic data. How to use synthetic data and the theoretical background of babsim.hospital is described in Bartz-Beielstein et al. (2020) . This paper is structured as follows. Section2.2 describes the data, which are necessary for the simulation. How to run a simulation is described in Section 3. The visualization of simulation results is illustrated in Section 4. Section 4 also presents the error measure, which is used for the optimizations. The optimization procedure is introduced in Section 5. How to visualize the optimized parameter settings is discussed in Section 6. Section 7 describes how data for future simulations can be generated. babsim.hospital provides tools for a deeper understanding of the simulation parameters. The corresponding tools from statistics and sensitivity analysis are described in Section 8. A clean start and loading the required packages can be performed as follows: rm(list=ls()) suppressPackageStartupMessages({ library("SPOT") library("babsim.hospital") library("simmer") library("simmer.plot") library("plotly") library("rpart") library("rpart.plot") }) We need at least version 2.1.8 of SPOT. packageVersion("SPOT") %> [1] '2.1.10' The babsim.hospital simulator models resources usage in hospitals, e.g., number of ICU beds (y), as a function of the number of infected individuals (x). In addition to the number of infections, information about age and gender will be used as simulation input. In general, the simulator requires two types of data: simulation data that describes the spread of the pandemic over time and field data that contains daily resource usage data. Included in the package are tools to generate synthetic data, e.g., you can generate simulation and field data to run the simulations. This procedure is described in Bartz-Beielstein et al. (2020) . To demonstrate the usage of real-world data, we have included two sample datasets from Germany. • Simulation data, i.e., input data for the simulation. We have included a data sample from the German [Robert Koch-Institute](https://www.rki.de) (RKI). Please take the copyright notice, which is shown in Section 10 (Appendix) under advisement, if you plan to use the RKI data included in the package. • Field data. We have included a data sample from the German [DIVI Register](https://www.intensivregister.de/). The field data is used to validate the output of the simulation. Please take the copyright notice, which is shown in Section 10 (Appendix) under advisement, if you plan to use the RKI data included in the package. First, we will take a closer look at the simulation data. Here, we will use the data from the RKI Server. babsim.hospital provides a function to update the (daily) RKI data. updateRkidataFile("https://www.arcgis.com/sharing/rest/content/items/f10774f1c63e40168479a1feb6c7ca74/da Users are expected to adapt this function to their local situation. The downloaded data will be available in the package as rkidata. Due to data size limits on CRAN, the full dataset is not included in the babsim.hospitalpackage. Instead, we provide a subset of the Robert Koch-Institut dataset with 10,000 observations in the package. : chr "Schleswig-Holstein" "Schleswig-Holstein" "Schleswig-Holstein" "Schlesw %> $ Landkreis : chr "SK Kiel" "SK Kiel" "SK Kiel" "SK Kiel" ... %> $ Altersgruppe : chr "A15-A34" "A35-A59" "A35-A59" "A35-A59" ... %> $ Geschlecht : chr "M" "M" "M" "M" ... %> $ AnzahlFall : int 1 1 1 1 1 1 1 1 1 1 ... %> $ AnzahlTodesfall : int 0 0 0 0 0 0 0 0 0 0 ... %> $ Refdatum : chr "2020/09/01 00:00:00" "2020/09/02 00:00:00" "2020/09/03 00:00:00" "2020 %> $ IdLandkreis : int 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 ... %> $ Datenstand : chr "12.12.2020, 00:00 Uhr" "12.12.2020, 00:00 Uhr" "12.12.2020, 00:00 Uhr" %> $ NeuerFall : int 0 0 0 0 0 0 0 0 0 0 ... %> $ NeuerTodesfall : int -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 ... %> $ Meldedatum : chr "2020/08/27 00:00:00" "2020/08/30 00:00:00" "2020/09/03 00:00:00" "2020 %> $ NeuGenesen : int 0 0 0 0 0 0 0 0 0 0 ... %> $ AnzahlGenesen : int 1 1 1 1 1 1 1 1 1 1 ... %> $ IstErkrankungsbeginn: int 1 1 0 1 0 1 1 1 1 0 ... %> $ Altersgruppe2 : chr "Nicht übermittelt" "Nicht übermittelt" "Nicht übermittelt" "Nicht über The rkidata can be visualized as follows (here region = 0 is Germany, region = 5 is North Rhine-Westphalia, region = 5374 Oberbergischer Kreis, etc.): p <-ggVisualizeRki(data=babsim.hospital::rkidata, region = 5374) print(p) After downloading the rkidataset, these data will be preprocessed. Not all the information from the original rkidata data set is required by the babsim.hospital simulator. The function getRkiData() extracts the subset of the raw rkidata required by our simulation, optimization, and analysis: rki <-getRkiData(rki = rkidata) %> getRkiData(): Found days with negative number of cases. Ignoring them. str(rki) %> 'data.frame': 1055516 obs. of 7 variables: %> $ Altersgruppe: chr "A15-A34" "A35-A59" "A15-A34" "A35-A59" ... %> $ Geschlecht : chr "M" "W" "M" "W" ... %> $ Day : Date, format: "2020-09-01" "2020-09-01" ... %> $ IdBundesland: int 1 1 1 1 1 1 1 1 1 1 ... As illustrated by the output from above, we use the following simulation data: Next, we will describe the field data, i.e., the real ICU beds (or other resources). Similar to the rkidata, which is available online and can be downloaded from the RKI Server, the field data is also available online. It can be downloaded from the DIVI Server as follows, where YYYY-MM-DD should be replaced by the current date, e.g, 2020-10-26. updateIcudataFile("https://www.divi.de/joomlatools-files/docman-files/divi-intensivregister-tagesreports Note, the data structures on the DIVI server may change, so it might be necessary to modify the following procedure. Please check the hints on the DIVI web page. Contrary to the updateRkidataFile() function, which downloads the complete historical dataset, the updateIcudataFile() function only downloads data for a single date. The downloaded data will be available in babsim.hospital as icudata. As described in the Appendix (Section 10, the DIVI dataset is not open data. Therefore, only an example data set, that reflects the structure of the original data from the DIVI register, is included in the babsim.hospital package as icudata: str(babsim.hospital::icudata) %> 'data.frame': 40470 obs. of 9 variables: %> $ bundesland : int 1 1 1 1 1 1 1 1 1 1 ... The ìcudata can be visualized as follows (region = 0 is Germany, region = 5 is North Rhine-Westphalia, region = 5374 is the Oberbergischer Kreis, etc.). Based on the icudata, two plots can be generated. The icudata, i.e., the field data or real data, will be preprocessed as follows. The function getIcuBeds() converts the 9 dimensional DIVI ICU dataset icudata (bundesland,gemeindeschluessel,. . . , daten_stand) into a data.frame with two columns: To run a simulation, the setting must be configured (seed, number of repeats, sequential or parallel evaluation, variable names, dates, etc.). babsim.hospital uses a list to store information related to the configuration. We will describe the components of this list first. The configuration list contains information about the simulation and field data. region = 5374 seed = 123 simrepeats = 2 parallel = FALSE percCores = 0.8 resourceNames = c("intensiveBed", "intensiveBedVentilation") resourceEval = c("intensiveBed", "intensiveBedVentilation") We can specify the field data based on icudata (DIVI) for the simulation as follows: FieldStartDate = "2020-09-01" icudata <-getRegionIcu(data = icudata, region = region) fieldData <-getIcuBeds(icudata) fieldData <-fieldData[which(fieldData$Day >= as.Date(FieldStartDate)), ] rownames(fieldData) <-NULL icu = TRUE icuWeights = c(1,1) Next, simulation data (RKI data) can be selected. The simulation data in our example, depend on the field data: SimStartDate = "2020-08-01" rkidata <-getRegionRki(data = rkidata, region = region) simData <-getRkiData(rkidata) simData <-simData[which(simData$Day >= as.Date (SimStartDate)), ] simData <-simData[as.Date(simData$Day) <= max(as.Date(fieldData$Day)),] simData$time <-simData$time -min(simData$time) rownames(simData) <-NULL Finally, we combine all field and simulation data into a single list() called data: data <-list(simData = simData, fieldData = fieldData) Configuration information is stored in the conf list, i.e., conf refers to the simulation configuration, e.g., sequential or parallel evaluation, number of cores, resource names, log level, etc. conf <-babsimToolsConf() conf <-getConfFromData(conf = conf, simData = data$simData, fieldData = data$fieldData) conf$parallel = parallel conf$simRepeats = simrepeats conf$ICU = icu conf$ResourceNames = resourceNames conf$ResourceEval = resourceEval conf$percCores = percCores conf$logLevel = 1 conf$w2 = icuWeights set.seed(conf$seed) In addition to the configuration list, a second list, which stores information about the simulation model parameters, is used. The core of the babsim.hospital simulations is based on the simmer package. It uses simulation parameters, e.g., arrival times, durations, and transition probabilities. There are currently 29 parameters (shown below) that are stored in the list para. A data.frame with these two parameters is passed to the main simulation function babsimHospital. Output from the simulation is stored in the variable envs. The simulation run is started as follows: First, we illustrate how to generate plots using the simmer.plot package (Ucar, Smeets, and Azcorra 2019) . In the following graph, the individual lines are all separate replications. The smoothing performed is a cumulative average. Besides intensiveBed and intensiveBedVentilation, babsim.hospital also provides information about the number of non-ICU beds. The non-ICU beds are labeled as bed. Summarizing, babsim.hospital generates output for three bed categories: To plot resource usage for three resources side-by-side, we can proceed as follows: resources <-get_mon_resources(envs) resources$capacity <-resources$capacity/1e5 plot(resources, metric = "usage", c("bed", "intensiveBed", "intensiveBedVentilation"), items = "server") Note, each resource can be plotted separately. For example, the following command generates a plot of non icu beds. plot(resources, metric = "usage", "bed", items = "server", steps = TRUE) babsim.hospital provides functions for evaluating the quality of the simulation results. Simulation results depend on the transition probabilities and durations, i.e., a vector of more than 30 variables. These vectors represent parameter settings. babsim.hospital provides a default parameter set, that is based on knowledge from domain experts (doctors, members of COVID-19 crises teams, mathematicians, and many more). We can calculate the error (RMSE) of the default parameter setting, which was used in this simulation, as follows: fieldEvents <-getRealBeds(data = data$fieldData, resource = conf$ResourceNames) res <-getDailyMaxResults(envs = envs, fieldEvents = fieldEvents, conf=conf) resDefault <-getError(res, conf=conf) print(resDefault) %> [1] 9.412271 The error is 9.4122709. In addition to the original simmer plot, babsim.hospital provides functions for visualization. Here, we illustrate how babsim.hospital plots can be generated. Before we deescribe these plots, readers should be aware of the fact, that we do not use the full data, simulation results are completely wrong and do not represent any real-world situation! The following figures are included to demonstrate the working principles of the visualization procedures. Note, ggplot and plotlycan be used as follows to generate interactive plots. ggplotly(p) As discussed above, babsim.hospital provides a default parameter set, which can be used for simulations. The function babsimHospitalPara() provides a convenient way to access the default parameter set: para <-babsimHospitalPara() babsim.hospital provides an interface to optimize the parameter values of the simulation model. The following code is just a quick demo. To run the following code, the complete rkidata and icudata data sets must be available. The code from above was shown for didactical purposes. It starts a short optimization run to illustrate the underlying optimization procedure. Results from real the runoptDirect() runs are stored in the paras.rda file. babsim.hospital provides results from several regions (towns and counties in Germany), e.g.: • 'getParaSet (5374) Results, i.e., parameter settings, of the short runoptDirect() optimization from above can be used as follows: xy <-resDemo$best.df para <-getBestParameter(xy) res <-modelResultHospital(para=para, conf=conf, data = data) resOpt <-getError(res, conf=conf) print(resOpt) %> [1] 6.566368 These results show that even a very short optimization improves the error: The improved is 6.5663677, which is smaller than 9.4122709. This improvement can also be visualized. Besides using the optimized parameters for simulations, which allows improved simulations, the optimized parameters can be used for analysing the parameter settings. babsim.hospital includes several tools to analyze parameter settings. You might recall that parameter settings consist of • transition probabilities, e.g., the probability that an infected individual has to go to the hospital. • durations, e.g., the time span until an infected individual goes to the hospital (in days). The following plot illustrates the transition probabilities. It uses the following states: This matrix can be visualized as follows: visualizeGraph(para=para, option = "P") We will describe only a very quick parameter analysis. A detailed analysis will be presented in a forthcoming paper. Here, we demonstrate how machine laerning tools (regression trees) can be used to visualize the most important parameters of the simulation model. getParameterNameList(c(24, 25, 3, 10)) %> x24 x25 %> "AmntDaysAftercareToHealthy" "RiskFactorA" %> x3 x10 %> "AmntDaysNormalToIntensive" "AmntDaysVentilationToDeath" Sevral additional tools are available, e.g., to perform a sensitivity analysis, because babsim.hopital uses the R package SPOT (sequential parameter optimization toolbox) to improve parameter settings.SPOT implements a set of tools for model-based optimization and tuning of algorithms (surrogate models, optimizers, DOE). SPOT can be used for sensitivity analysis, which is in important under many aspects, especially: • understanding the most important factors (parameters) that influence model behavior. For example, it is of great importance for simulation practitioners and doctors to discover relevant durations and probabilities. • detecting interactions between parameters, e.g., do durations influence each other? The fitness landscape can be visualized using the function plotModel. Again, we would like to mention that these results are only shown for didactical purposes. They are not a valid statistical analysis, because the optimization via simulation runs are too short and do not produce sufficient results. Here, the interaction between the first two model parameters, i.e., AmntDaysInfectedToHospital, and AmntDaysNormalToHealthy, is shown. SPOT::plotModel(res$modelFit, which = c(1,2) ,xlab = c("x1", "x2")) ICU ventilated 7. 'intafter': intensive aftercare (from ICU with ventilation, on ICU) 8. 'aftercare': aftercare (from ICU, on normal station is shown below: 1. 'data': an already existing data set, i.e., the history 2. 'EndDate': last day of the simulated data (in the future) 3. 'R0': base reproduction values (R0) at the first day of the scenario and at the last day of the scenario Found days with negative number of cases. Ignoring them. n <-as.integer( max(data$Day)-min we developed a resource planning tool for hospitals that considers the specific situation of the COVID-19 pandemic. babsim.hospital is implemented in the statistical programming language R, see R Core Team (2020), and uses a discrete-event simulation model. The simmer package It combines simulation, optimization, statistics, and artificial intelligence processes in a very efficient way For modeling purposes, distributions (including a truncated and translated Gamma distribution) were specially developed by us in order to realistically simulate the length of stay. babsim.hospital takes into account different risks for individual groups of people (age and gender-specific) and can be used for comparison with your own local planning, simulation of local events, adaptation to your own situation, simulation of any scenario (worst / best case), simulation of any pandemic scenario, considering the local situation, and enables a standardized approach. And, there are benefits for medical professionals, e.g, analysis of the pandemic at local, regional, state and federal level, the consideration of special risk groups, tools for validating the length of stays and transition probabilities. Finally, there are potential advantages for administration, management, e.g., assessment of the situation of individual hospitals taking local events into account the following copyright notice under advisement, if you plan to use the RKI data included in the package: Die Daten sind die "Fallzahlen in Deutschland" des Robert Koch-Institut (RKI) und stehen unter der Open Data Datenlizenz Deutschland Version 2.0 zur Verfügung Please take the following copyright notice under advisement. The DIVI data are not open data. The following statement can be found on the DIVI web page: Eine weitere wissenschaftliche Nutzung der Daten ist nur mit Zustimmung der DIVI gestattet. Therefore, only an example data set Optimization of High-dimensional Simulation Models Using Synthetic Data Sequential Parameter Optimization Policy: A Novel Modelling Technique to Predict Resource -Requirements in Critical Care a Case Study R: A Language and Environment for Statistical Computing Simmer: Discrete-Event Simulation for r Please download the data from RKI and DIVI or provide your own simulation and field data! Note: results are stored in the directory results. %> [1] "2020/09/01 00:00:00" "2020/12/11 00:00:00" %> [1] "2020-09-01" "2020-12-11" %> [1] "trainDataSim: 2020-10-05" "trainDataSim: 2020-12-11" %> [1] "trainDataField: 2020-11-02" "trainDataField: 2020-12-11" %> [1] "Starting optimization loop:" %> [1] "#########################################" %> [1] "Repeat: 1 ###############################" %> [1] "trainConfSim: 2020-10-05" "trainConfSim: 2020-12-11" %> [1] "trainConfField: 2020-11-02" "trainConfField: 2020-12-11" %> [1] "Warning cutting some x0 parameters as there are too many" %> [1] "Starting Surrogate Optimization" %> [1] "*******************************" %> 60% completed. regression-based parameter screening can be performed to discover relevant (and irrelevant) model parameters: (1 observation deleted due to missingness) %> Multiple R-squared: 0.3807, Adjusted R-squared: 0.3423 %> F-statistic: 9.924 on 7 and 113 DF, p-value: 1.307e-09