key: cord-0908400-du5r623k authors: Powała, Alina; Woźnica, Adam title: Optimal Relocation of Medical Procedures Under the Covid-19 Regime: an Urology Case Study date: 2021-12-31 journal: Procedia Computer Science DOI: 10.1016/j.procs.2021.09.109 sha: 3f1e3acb70d093a13d1af3d3d1875800c549a89b doc_id: 908400 cord_uid: du5r623k We present a prototype system, OptiLoc, that aims at analysing consequences of medical capacity restrictions under the Covid-19 regime as well as at providing medical analysts and decision makers with medical procedure relocation plans for a given area in a given time period. This is achieved by first forecasting the demand of medical procedures of different types at the selected geographical granularity level and then finding relocations which are optimal according to a well defined objective function that takes into account procedure and relocation costs, under constraints imposed by Covid-19 restrictions and hospital capacities. Allocation plans are visualized in a user friendly web-based application. We demonstrate the effectiveness of the developed system on the data on urological procedures from Poland. The Covid-19 pandemic has put an enormous strain on the medical system. On one hand, to limit the spread of the virus various regional and federal authorities temporarily closed individual hospital wards, or even the whole hospitals and medical clinics, usually for all but emergency medical procedures. On the other hand, many medical facilities were re-purposed to free up resources and personnel to only care for patients with COVID-19. This has resulted in a situation where various non life-threatening medical procedures were postponed or even cancelled, effectively building up a waitlist of procedures waiting to be performed after restrictions are lifted. The Covid-19 pandemic has put an enormous strain on the medical system. On one hand, to limit the spread of the virus various regional and federal authorities temporarily closed individual hospital wards, or even the whole hospitals and medical clinics, usually for all but emergency medical procedures. On the other hand, many medical facilities were re-purposed to free up resources and personnel to only care for patients with COVID-19. This has resulted in a situation where various non life-threatening medical procedures were postponed or even cancelled, effectively building up a waitlist of procedures waiting to be performed after restrictions are lifted. One approach to reduce the ever-lengthening backlogs is to proactively reallocate patients to medical facilities functioning at non-reduced capacity, preferably to nearby regions with less strict restrictions. Ideally, the allocation strategy should be data driven and optimized for a well defined objective so that it can be easily audited and adjusted. In this paper we present a prototype system, OptiLoc, that aims at providing the analysts and decision makers with medical procedure relocation plans under capacity restrictions. The relocation plan takes into account various aspects related to healthcare system under consideration such as expected number of procedures of different types to relocate, hospitalization capacities, priorities of procedures both from the view of patients (as measured e.g. by survival times) and the whole system (e.g. immediate procedure costs), typical hospitalization times for non-ambulatory procedures and relocation costs between regions. The relocation problem is cast as a large scale mathematical optimization problem minimizing the overall allocation costs constrained by hospital capacity to which procedures are assigned. The number of procedures is determined by an application of a machine learning based forecasting algorithm that provides an estimated number of procedures of different types that are typically performed in a given region in any period in the future in normal, non-COVID times. The effectiveness of the system is demonstrated on the data on urological procedures from Poland with the granularity of county / district, year-month and procedure type according to the ICD-9 dictionary. 1 Finally, we demonstrate how the algorithmic part is coupled with a web-based graphical client which allows for a user-friendly and graphical presentation of the optimization plans. The remainder of this paper if organized as follows. In Sec. 2 we provide a high level overview of the developed system. In Sec. 3 we describe available data which were used in the forecasting and optimization models. In Sec. 4 we present the machine learning methodology employed to forecast the needed number of medical procedures to perform in a given region and period. In Sec. 5 we define the mathematical optimization problem which finds optimal procedure allocation. Finally, we conclude with Sec. 6 where we also provide pointers for future work. OptiLoc allows to analyze the consequences of capacity restrictions in given locations and the possibility of relocation of procedures on a given area in a given period of time. In what follows we focus on administrative locations in Poland where counties / districts constitute voivodeships / provinces. The system accepts optimization requests like: Calculate the optimal relocation of procedures "63" and "55" in the following districts: 0415, 0464, 0463, 0462, 0461, 0410, 0407, 0404 in the period from 2021-05-01 to 2021-05-31, assuming the capacity rate in the districts is limited to 30%. As a response, OptiLoc provides information about the consequences of limiting capacity in the specified districts together with procedures' relocation plan. In this prototype we dealt with urological procedures from Poland but configuring OptiLoc to work with any other types of procedures is straightforward. Moreover, the proposed approach can in principle work for data from other countries; we expect that the changes will be mainly needed at the datasets level, however, adjusting to the other specific conditions might be necessary. OptiLoc is based on solving large scale optimization problems and has been designed as a modular framework which allows to abstract the core mechanics of solving optimization problems from the specifics of the application at hand. The prototype system has been configured using data specific to the Polish healthcare, population, administrative division, etc. but the framework can easily be applied to other contexts. Because of the sheer number of procedures performed monthly in Poland (around 54,000 if limited to urological procedures) the computation is paralellized and works in two phases. First, optimization tasks are simultaneously launched for all 16 provinces (the relocation happens within the borders of the province). Then, a second level optimization task is launched for the whole country on the remaining vacancies (the relocation happens across provinces). OptiLoc uses the following data: L. Data on locations and their hierarchies (i.e. districts and provinces), Sec. 3.1. P. Data on procedures (including median hospitalization time and prioritization in terms of survival rates), Sec. 3.2. F. Procedure demand forecasts (Sec. 4). This is achieved by a machine learning forecasting model that based on on historical data predicts the number of medial procedures of different types to be performed in a given region in any period in the future in normal, non-COVID time. C1. Costs of medical procedures from P. according to National Health Fund price list, Sec. 3.2; C2. Costs of medical procedures from P. in terms of patient survival, Sec. 3.2; C3. Relocation costs (based on pairwise distances between districts), Sec. 3.1. MC. Maximum capacity of the sites from L. (limit on the number of beds), Sec. 3.1. The output of OptiLoc consists of: • Total cost of medical procedures that could not be relocated, e.g. 30 procedures of type "63" -cost PLN 200,000. This can be understood in terms of a "debt", as these procedures will have to be performed in the future. • Consequences of postponing medical procedures to save lives: in general postponing an important procedure will have a detrimental effect on survival rates. • Relocation plan for medical procedures that were successfully relocated given the input parameters. Sample visualization of the procedures' relocation plan in the web-based prototype application ( Fig. 1) shows the specified districts (Polish "POWIAT") on the map as well as the numerical data on: forecasted procedure demand (column 1), optimal number of procedures (column 2), approximate maximal capacity (column 3). The median of the hospitalization time for this procedure is 4 days; the period of interest is limited to 30 days. In the specified time frame, maximum capacity of district 0461 (1st row) was reduced to 2 beds (3rd column), thus 60 bed-days. Since procedure 58 median hospitalization time is 4 days, there can be at most 14 procedures performed in that district (2nd column). The remaining 22 procedures have been relocated to district 0410 (5th row), which has 9 beds (thus 270 bed-days). We provide here a high level description of the available data used in the modelling and optimization tasks. These datasets can be divided into two main categories: data on locations (hierarchical administrative areas, hospitalization capacity and pairwise relocation costs) and data on procedures (historical numbers of executed procedures to generate forecasts, hospitalization times, procedure prioritization, immediate procedure costs incurred on health system). Due to delays in accessing data resulting from protracted administrative procedures the implemented models were tested without access to some datasets. In these cases, the missing data were randomly simulated and the implementation was carried out in such a way that real data, once available, can be easily re-connected with the models. In this work we focus on relocating urological procedures in Poland. The smallest considered administrative location entity is that of county / district which constitutes voivodeships / provinces; there are 16 voivodeships in Poland and 380 counties (314 are pure counties / districts and 66 cities with county rights). Going forward it is possible to consider even smaller administrative entities (e.g. community or even hospitals). There is no data defining precisely the maximum possible number of individual procedures to be performed in a given area at a given time. To approximate this number (the so-called practical limit), we use data on the number of beds available in individual wards. The assumption that we are making is as follows: in a given district, no more than as many urological medical procedures can be performed per day as there are beds in urology departments in all hospitals in a given district. To achieve this, we use the median hospitalization time for the procedures. The cost of relocation from location i o location j is given by the distance in kilometers between i and j, multiplied by an uniform costs of transporting the patient by ambulance per kilometer (a configuration parameter). Data on procedures which were used to train forecasting models were gathered and delivered in the context of the ProME project [4] . Input data is limited to urological procedures and contain the number of performed procedures and hospitalizations in Poland aggregated per county / distinct, year-month and procedure type according to the ICD-9 dictionary. Procedure types are available at different generality levels: from the most general ones, chapters (e.g. 55: Operations on Kidney) through sub-chapters (e.g. 55.1: Pyelotomy and Pyelostomy), main categories (e.g. 55.11: Pyelotomy) until detailed categories. The number of urological procedures from each of the above hierarchy level is the following: there are 10 chapters, 102 sub-chapters, 524 main categories and 986 detailed categories. Data span 10-years period from 2009-01-01 to 2018-12-31. In Fig. 2 we provide timeseries of both the number of procedures and hospitalizations per year-month. In the remainder of this work we will focus on the number of procedures at the levels of chapter and main category. Procedure costs from patients' perspective were defined based on survival length estimated by the means of survival analysis [3] . More precisely, the cost is defined as the inverse of estimated survival. The other required procedure characteristics such as median hospitalization times and immediate procedure costs incurred on health system were not available during the project; these were randomly generated. The goal of forecasting models is to provide expected demand of executed procedures of different types (according to the ICD-9 taxonomy) in a given administrative area at any period in the future, under non-pandemic circumstances. We will use the following notation. Let X def = {X pl i : i ∈ T } be an univariate time series for procedure p ∈ {1, . . . , m}, location l ∈ {1, . . . , n} and indexing set T . We will consider procedures at different granularity levels (see Section 3). The default granularity is the main category according to ICD-9 (resulting in m = 524). Some experiments from Sec. 4.1 will be reported on the chapter level according to ICD-9 (m = 10). The location set corresponds to counties / districts in Poland and has cardinality of n = 380. The indexing set T is given by year-month. We considered two baseline models with the goal to obtain a reference point in the context of evaluation metrics for the more sophisticated models. In the first model, : k = max{i ∈ T : m(i) = M}}. The next model is essentially a curve fitting REGRESSION model [6] based on an additive decomposition of the time series into four main model components: trend (g) that corresponds to non-periodic changes in the value of the time series, seasonality (s) that represent periodic changes (e.g. yearly or weekly), events (h) that account for potentially irregular schedules and random noise (ϵ t ) that constitute the idiosyncratic signals which are not accounted for by the other components: X i = g i + s i + h i + ϵ i . The trend component can be set to either saturating growth which is modelled by the means of a logistic function or to piecewise linear which is composed of a number of straight-line segments with change points automatically inferred from the data. In all of our experiments we used the latter option as it gave rise to a more predictive model. In order to model seasonal effects, the selected model relies on the Fourier transform for the yearly seasonalities. The event component was not used in the analysis. The considered model by default is suited for univariate time series and hence the modelling takes place independently for every combination of procedures and counties. This is amenable for efficient parallel or distributed implementations (e.g. Spark [5]). Finally, we also considered deep learning models with the Long Short Term Memory network topologies [2] . With such models it is possible to go beyond univariate time series and incorporate a number of independent variables. The main characteristics of the proposed network topology are the following. • One model is trained per one procedure p. We only focus on procedures from the ICD-9 chapter level. • The model input consists of two parts: county id and multivaried time series for 3 * 12 = 36 months in the form of {X l i : i ∈ {1, . . . , 36}}. • Each month of the multivariate time series is associated with the number of performed procedures from the chapter levels for a given county and two values obtained from the Fourier transform applied on the month variable: , where m(i) returns month number for index i, and F 1 , F 2 return two first coefficients obtained from the Fourier transform. • County id l is embedded into a 5-dim space. The multivariate time series is mapped to 16-dim space by the means of the LSTM layer. • A dense layer is used to map the 5 + 16 = 21-dim space to an output vector with 12 elements corresponding to predicted months for procedure p and county l. • The above model for a procedure p is trained on a set containing all counties; for each county l we reprocess data to extract all 36-month multivariate time series connected with subsequent 12 values per procedure p. In order to validate predictive performance of the considered models we run a number of experiments. Historical data was dived into training data spanning 9 years (from 2009-01-01 to 2017-12-31) and 1-year of testing data used for model evaluation (from 2018-01-01 to 2018-12-31). We focused on two evaluation metric which are frequently used where t i denotes the true value, p i is the forecasted value and i iterates over months from 2018. Since the forecasts will be used for optimal procedure reallocation, the most relevant metric from the above is MAE. For the sake of completeness we also report MAPE. In Tab. 1 we first present the evaluation metrics for baseline and regression models on the chapter and main categories levels. The regression models improve MAE (MAPE) about 10% (12%) over the baseline models for chapters and about 14% (20%) for main categories. Results of BASELINE2 are unexpectedly weak; this is likely caused by the fact that this model does not discriminate between recent (i.e. last year) and older (i.e. 2 years back) values. In order to better understand the relative performance of the baseline and regression models we present in Fig. 3 MAE for these two models aggregated by month. The performance gap between the two models is most pronounced for short forecasting windows (months 1-4 2018). We note that optimization presented in Sec. 5 will be most often performed for such short forecasting windows; this reinforces the advantage of the regression model over the baselines. In the next set of experiments we compare performance of REGRESSION and LSTM models. We limit results on the level of 10 procedures from the ICD-9 chapter level. Results are presented in Tab. 2. It is clear that neither of the methods consistently outperforms the alternative. Indeed, REGRESSION is better (worse) than LSTM for 4 (6) procedures according the a paired T-test with significance level of 5%. We believe that LSTM can be further improved by carefully selecting hyper-parameters and network typologies; we plan to focus on this aspect in the future. The task realized by OptiLoc is to find solutions to a (set of) Assignment Problems (AP). In the generally formulated AP there are many agents and a number of tasks. Each agent can be matched to perform any task at a cost that in general may vary for each pair of an agent and a task. The goal is to complete as many tasks as possible by assigning at most one agent to each job and at most one job to each agent in such a way as to minimize the total cost of the job. In OptiLoc a task is equivalent to a single execution of a procedure (i.e. a single patient) and agent corresponds to a hospital bed. However, as the hospitalization time for different procedures may vary in a given time window (e.g., 30 days) several procedure executions may be assigned to one bed. The goal of OptiLoc is to optimally assign patients to beds in terms of allocation costs, hospitalization time and prioritization of medical procedures. In OptiLoc we consider two basic scenarios where L is the set of locations (e.g., set of districts) and P is the set of types of medical procedures. The so-called multiplier scales a location (e.g., hospital) capacity. The following considerations apply to the selected forecast horizon of o, which in the current implementation is set to thirty days from a given date, according to the granularity of the forecasting data. Scenario 1. We limit the execution of procedures from P in locations from L according to a certain scheme, given by specifying the multipliers d l for individual places l ∈ L. What are the consequences of deferring procedures (numerical statistics and costs of deferred procedures)? Scenario 2. We limit or increase the execution of procedures from P in areas from L according to a certain scheme, given by specifying the multipliers d l for individual places l ∈ L. What are the possibilities of relocating the execution of procedures from P to areas from L? What are the costs associated with a successful relocation of procedures? What are the numerical statistics and costs associated with deferring procedures? Individual procedures may have different hospitalization times. For each procedure p median hospitalization time med(p) expressed in days is given. We assume that in the period o in the location l, we have to allocate procedures so that their total execution takes less than the assumed period of o. Moreover, each individual procedure (patient) is assigned to exactly one bed at a time. Patients with insufficient beds are allocated to a virtual bed (m virt), i.e. marked for relocation. The optimal allocation of patients to beds should be found taking into account the following conditions: • the averaged and normalized cost of performing the p procedure for the health care system (according to the NHF price list) is given in the K1 cost table and amounts to c p ; • the averaged and normalized cost of performing the p procedure from the patient's perspective (significance for survival) is given in the K2 cost table and is c p surv ; • the normalized cost of relocation from l to l ′ is given in the K3 cost table and is c l,l ′ . The project goal is to minimize the cost of the deferred procedures. To this end, the goal of the optimal AP is to maximize the sum of costs in the form of (c p + c p surv − c l,l ′ ) so that we: • maximize the costs of executed procedures c p ; • maximize the probability of patient survival c p surv , and • minimize relocation costs (c l,l ′ ). Below, we introduce the symbols common to all considered issues: • P -the set of procedures under consideration, see section 3; • o -a fixed period of time during which the procedures from the P set are to be executed, e.g., 30 days; • L -a set of locations where procedures can be executed; • P lo -a set of procedures to be executed in the l location in the o period; • med(p) -median hospitalization time for procedure p; • x m,p def = 1 if bed m will be occupied by procedure p; 0 if bed m will not be occupied by procedure p. We define the objective function as follows: is the original location of the procedure p and l ′ = loc(i) is the location of the bed i. We define the constraints as follows: G. x m,p = 1 -each individual procedure (patient) is assigned to one bed (real or virtual); Fig. 4a presents an allocation matrix for a sample problem. As m virt collects all deferred procedures to be relocated on the country level in the next step, the cost of assigning any procedure to m virt (see Fig. 4b ) is minimal since the overall goal is to maximize the objective function. The above optimization tasks fall into the category of integer (or more specifically binary) optimization problems. These problems were solved using the CPLEX solver [1] . To understand how the size of the optimization problem (as measured by the number of variables) affects the computation time, a number of sequential and parallel experiments were performed. The experiments were divided into 11 groups of tasks with a similar number of variables. In each of these groups, 16 tasks of given sizes were prepared; the number 16 corresponds to the number of provinces. Details of the experiments are presented in Tab. 3. The task configuration is prepared is a greedy way based on the available data to achieve the desired number of procedures and beds. That causes variance in the number of variables, e.g., for larger groups of tasks (7 and up) and for some provinces there was an insufficient number of procedures. For example, for group 1, we have tasks with a variable number of procedures -from 51 to 128, with an avg. of 65.8 (std. dev. 19.8). For the second group -from 101 to 277, with an avg. of 128 (std. dev. 45.9), for the third group -from 202 to 300 with an avg. of 237.9 (std. dev. 31.7). The following constraints directly affecting the time to obtain the results were used in the performance tests: • CPLEX timeout was set to 20 min. (i.e. after exceeding this limit, the solver returns the best solution so far); • the relative difference in % between the values of the objective function for the best solution found and the value for the optimization problem without binary constraints (i.e. a bound of possible solution) is less than 5%; • parallel computations used max. 4 processes (ultimately it will be 16 processes, i.e., the number of provinces). For sequential test all 16 tasks from each group were run sequentially (giving rise to 176 tasks). Statistics for execution times aggregated into groups are presented in Tab. 5. They show that for groups of 500 and smaller, the time budget of 20 min. is sufficient to get below the relative difference of 5 %, while for groups of 1,000 and larger it is often not possible to achieve this. Especially for groups 2,000 and above, the median execution time is approximately 20 min., i.e. at least half of the runs have been stopped after exceeding the 20 min. limit. On the other hand, for these problems the relative difference obtained is in most cases still acceptable and lies between 5% and 10%. For parallel executions, for each of the groups containing 16 tasks, 4 tasks were performed in parallel. The execution times for each group are presented in Tab. 6. It follows that our parallel computing strategy scales linearly with the number of tasks. Importantly, our experiments show one important practical aspect of using OptiLoc: in majority of the cases the user can quit quickly obtain an approximate solution within 5-10% range of the optimal solution. In this way, the user may quickly experiment with various viable limitation scenarios. The analysis of the consequences of medical capacity restrictions allows to obtain numerical statistics and the cost associated with this limitation. In OptiLoc, this cost can be assigned with any appropriate external cost model but the deferred procedures will have to be performed in the future and will incur additional costs. It should be noted that currently in the cost analysis we do not take into account factors related to the change in the health condition of a patient who has not been subjected to the assumed procedure. More importantly, the individual components of the cost model should be expressed in the same units. Thus, the question in what universal units to express: the cost of the procedure according to the NHF pricelist, the cost of relocating the patient from site A to site B and the cost associated with the importance of the procedure for saving the patient's life remains open. OptiLoc was built for extensibility, therefore the designed solution may be enriched to include the patient's health condition, differentiating costs of performing the procedure at the district level and various aspects of the procedures. Deep Learning Survival Analysis, 2nd Edition Wstepna identyfikacja i analizaźródeł danych dla potrzeb projektu naukowo badawczego ProMe Forecasting at Scale The research has been supported by the Polish Ministry of Science and Higher Education under project ProME "Predictive Modeling of the Covid-19 Epidemics". We would also like to thank M. Niezgódka, A. Szałas, Linh Nguyen, P. Dunin-Kȩplicz, M. Iwański, P. Regulski and P. Wiśniewski for helpful comments and discussions. Alina Powała, Adam Woźnica / Procedia Computer Science 00 (2021) 000-000Fig. 5: Execution times (in seconds) for sequential optimization tasks, aggregated per task group. These times are given in total (data preparation + solver) and only for the solver. Due to the high computational complexity of the optimization problem, and due to the practical aspect of the problem, calculations for each province are run in parallel. Subsequently, procedures that could not be relocated within the province are relocated within the country to the remaining vacant beds. Thanks to this, an important practical aspect of using OptiLoc is quick experimentation with various viable scenarios. Once the approximate cost is within acceptable limit, the full computation to obtain the optimal solution can be executed.