key: cord-0004356-xzjzko0a authors: Perry, Alexander G.; Moore, Kieran M.; Levesque, Linda E.; Pickett, C. William L.; Korenberg, Michael J. title: A Comparison of Methods for Forecasting Emergency Department Visits for Respiratory Illness Using Telehealth Ontario Calls date: 2010-11-01 journal: Can J Public Health DOI: 10.1007/bf03403965 sha: b5980bd966e9b961c6d47772b49e6f405ceaed20 doc_id: 4356 cord_uid: xzjzko0a Objectives: Anticipating increases in hospital emergency department (ED) visits for respiratory illness could help time interventions such as opening flu clinics to reduce surges in ED visits. Five different methods for estimating ED visits for respiratory illness from Telehealth Ontario calls are compared, including two non-linear modeling methods. Daily visit estimates up to 14 days in advance were made at the health unit level for all 36 Ontario health units. Methods: Telehealth calls from June 1, 2004 to March 14, 2006 were included. Estimates generated by regression, Exponentially Weighted Moving Average (EWMA), Numerical Methods for Subspace State Space Identification (N4SID), Fast Orthogonal Search (FOS), and Parallel Cascade Identification (PCI) were compared to the actual number of ED visits for respiratory illness identified from the National Ambulatory Care Reporting System (NACRS) database. Model predictor variables included Telehealth Ontario calls and upcoming holidays/weekends. Models were fit using the first 304 days of data and prediction accuracy was measured over the remaining 348 days. Results: Forecast accuracy was significantly better (p<0.0001) for the 12 Ontario health units with a population over 400,000 (75% of the Ontario population) than for smaller health units. Compared to regression, FOS produced better estimates (p=0.03) while there was no significant improvement for PCI-based estimates. FOS, PCI, EWMA and N4SID performed worse than regression over the remaining smaller health units. 
Conclusion: Telehealth can be used to estimate ED visits for respiratory illness at the health unit level. Non-linear modeling methods produced better estimates than regression in larger health units.
… respiratory illness at the health unit level for each of the 36 health units in Ontario. Although the earlier study by van Dijk indicates that Telehealth is potentially useful for estimating ED visits at the provincial level, it used only simple cross-correlation to measure the strength of the association between calls and visits. It proposes no method for estimating visits from calls, does not examine the call-visit relationship at the health unit level, and does not examine the possibility of a non-linear relationship between calls and visits. To address this, the performance of five forecasting methods that use calls to forecast ED visits is compared, including two techniques capable of capturing non-linear relationships between calls and visits. The methods examined are Exponentially Weighted Moving Average (EWMA), regression, subspace state space identification (N4SID), Fast Orthogonal Search (FOS), and Parallel Cascade Identification (PCI). EWMA, regression, and Autoregressive Integrated Moving Average (ARIMA) models have been applied in forecasting emergency department visits. 8, 10, 11 ARIMA models can be represented in state-space form, 12, 13 and in this study the N4SID method of identifying the parameters of a state-space model is used. 14 To investigate whether non-linear relationships exist between calls and visits and are useful in estimating visits from calls, we employed two non-linear techniques for modeling time series: FOS and PCI. 15, 16
Hospital ED visits and Telehealth Ontario calls from June 1, 2004 to March 14, 2006 (652 days) were included in the study. The study period was limited by the availability of Telehealth data.
Hospital ED visits for respiratory complaints were obtained from the Canadian Institute for Health Information (CIHI) National Ambulatory Care Reporting System (NACRS) database for the fiscal years 2004-2005 and 2005-2006. Telehealth Ontario calls were obtained from the Ontario Ministry of Health and Long-Term Care (MOHLTC). Information in each data set consisted of the date of the call or visit, patient age, and the reason for the visit or call. To protect the identity of individuals, all personal health information used in the study was de-identified, and postal code information was limited to the forward sortation area (FSA), the first three characters of the postal code. Ethics approval for the study was obtained from the Queen's University Health Sciences and Affiliated Teaching Hospitals Research Ethics Board.
Emergency visits for respiratory complaints were identified from the NACRS database using a set of International Classification of Diseases, 10th Revision, Canadian Enhancement (ICD-10-CA) codes. These codes, presented in Table 1, were obtained by translating ICD-9 codes that previous research found to be highly correlated with positive laboratory results for respiratory pathogens. 17 A similar set of codes had been used in the van Dijk study. 9 There were 548,438 visits for respiratory illness during the study period. Records with missing, invalid or out-of-province postal codes were excluded, leaving 535,185 visits in the analysis.
Each call to Telehealth Ontario is assigned to one of 486 call guidelines based on the reason for the call. 9 Calls for respiratory illness were identified using the same subset of these call guidelines as used by van Dijk. 9 There were 187,426 calls for respiratory complaints according to this set of guidelines over the study period. Of these, calls with a missing, invalid or out-of-province FSA were excluded, leaving 177,601 calls in the analysis.
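The exclusion of records with a missing, invalid or out-of-province location can be sketched as a filter on the FSA field. This is a minimal sketch, not the study's code: the helpers `keep_record` and `filter_records` are hypothetical, records are assumed to be (date, FSA) pairs, and an FSA is treated as valid and in-province if it is a letter-digit-letter string whose first letter is K, L, M, N or P (the Canada Post letters for Ontario). The study does not state its actual validity rules.

```python
import re

# Assumed validity rule (not from the study): letter-digit-letter, with the
# first letter one of K, L, M, N, P (Canada Post's Ontario letters).
ONTARIO_FSA = re.compile(r"[KLMNP][0-9][A-Z]")

def keep_record(fsa):
    """True if the FSA is present, well formed and (by the rule above) in Ontario."""
    if not fsa:
        return False  # missing FSA
    return ONTARIO_FSA.fullmatch(fsa.strip().upper()) is not None

def filter_records(records):
    """Split hypothetical (date, fsa) records into kept and excluded lists."""
    kept, excluded = [], []
    for record in records:
        if keep_record(record[1]):
            kept.append(record)
        else:
            excluded.append(record)
    return kept, excluded
```

The same filter applies unchanged to call and visit records, since both data sets carry location only at the FSA level.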
To construct time series of the daily number of Telehealth Ontario calls and the daily number of emergency visits for respiratory illness for each health unit, a mapping between postal code FSA and health unit was constructed by linking dissemination areas using the Statistics Canada Health Region Boundary File 18 and Postal Code Conversion File. 19 Because an FSA may overlap several health units, the geographic regions used in the analysis approximate rather than exactly represent health unit regions. However, the same mapping was used for calls and visits, so calls and visits were aggregated over exactly the same mutually exclusive geographic regions.
Forecasts of the daily number of ED visits for respiratory illness at the health unit level were generated by each of the five modeling methods for each of the 36 health units in Ontario. Models were constructed to produce estimates of the daily number of visits, $\hat{y}(n+q)$, for a given health unit q days ahead, where q ranged from 0 to 14. A maximum forecasting horizon of 14 days was chosen because previous research indicated that calls were significantly correlated with visits up to two weeks in advance. 9 Each method had as inputs the daily aggregate number of Telehealth calls, $x_c$, for the health unit being considered, and an indicator variable for upcoming holidays and weekends, $x_h$ (except the EWMA model, which used only the former). Table 2 presents details of the models used. The variable n indicates the time index in days. All models were implemented in MATLAB. 21 Models were fit using the first 304 days of Telehealth data (training data set), and performance was assessed on the remaining 348 days (validation data set). A total of 2,700 models were created (5 methods × 15 lead times × 36 health units).
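One regression model per lead time q, with calls and the holiday/weekend indicator as inputs, call lags up to L = 10 and a least-squares fit on the first 304 days, can be sketched as follows. This is a hypothetical illustration, not the study's MATLAB code. The model form is an assumption consistent with Table 2's lag limit: an intercept, calls at lags 0 to L, and the indicator for the target day, which is known in advance. The function names and the rule "train on target days before day 304" are also assumptions.

```python
import numpy as np

def design_matrix(calls, holidays, q, L=10):
    """Build regressors for predicting visits on day n + q.

    Hypothetical model form: an intercept, the call counts x_c(n - i) for
    i = 0..L, and the holiday/weekend indicator x_h(n + q) of the target day
    (known in advance). Returns (X, target_days).
    """
    rows, targets = [], []
    for n in range(L, len(calls) - q):
        lagged_calls = [calls[n - i] for i in range(L + 1)]
        rows.append([1.0] + lagged_calls + [holidays[n + q]])
        targets.append(n + q)
    return np.array(rows, dtype=float), np.array(targets)

def fit_and_forecast(calls, holidays, visits, q, n_train=304, L=10):
    """Fit by least squares on training-window targets; forecast the rest.

    Returns (validation_days, forecasts) for target days >= n_train.
    """
    X, days = design_matrix(calls, holidays, q, L)
    y = np.asarray(visits, dtype=float)[days]
    train = days < n_train
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return days[~train], X[~train] @ beta
```

With 652 days of data, every lead time yields forecasts for the 348 validation days 304 to 651. Looping over q = 0, …, 14 and over the 36 health units gives 15 × 36 = 540 fits, the regression share of the 2,700 models.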
Forecasting accuracy was assessed using the mean square error, %MSE, defined as:
$$\%MSE = 100 \times \frac{\overline{\left(y(n) - \hat{y}(n)\right)^{2}}}{\overline{\left(y(n) - \overline{y(n)}\right)^{2}}}$$
where y(n) is the actual number of emergency visits at day n, $\hat{y}(n)$ is the estimated number of ED visits at day n, and the overbar denotes the average over the days considered.
Table 1. ICD-10-CA codes used to identify ED visits for respiratory illness
…      Viral infection, unspecified
H66.9  Otitis media, unspecified
J00    Acute nasopharyngitis (common cold)
J01.9  Acute sinusitis, unspecified
J06.8  Other acute upper respiratory infections of multiple sites
J39.9  Disease of upper respiratory tract, unspecified
J06.9  Acute upper respiratory infection, unspecified
J20.0  Acute bronchitis due to Mycoplasma pneumoniae
J20.1  Acute bronchitis due to Haemophilus influenzae
J20.2  Acute bronchitis due to streptococcus
J20.3  Acute bronchitis due to coxsackievirus
J20.4  Acute bronchitis due to parainfluenza virus
J20.5  Acute bronchitis due to respiratory syncytial virus
J20.6  Acute bronchitis due to rhinovirus
J20.7  Acute bronchitis due to echovirus
J20.8  Acute bronchitis due to other specified organisms
J20.9  Acute bronchitis, unspecified
J18.8  Other pneumonia, organism unspecified
J18.9  Pneumonia, unspecified
J10.   …
Table 3 presents a description of Ontario health units with a population greater than 400,000 in 2005. 22 Collectively, these health units represent about 75% of the Ontario population. Table 3 also presents the %MSE of each method for four-day-ahead predictions and the ratio of the median number of daily Telehealth calls to the median number of daily ED visits (the call-to-visits ratio). The call-to-visits ratio was found to be associated with model accuracy. To give the reader a sense of the meaning of the %MSE values reported in Table 3, Figure 1 compares the actual number of ED visits with the four-day-ahead estimates of the FOS, regression, and PCI models over part of the validation data for the City of Toronto Health Unit (EWMA and state-space estimates are omitted for clarity).
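A minimal sketch of the %MSE computation follows. It assumes that the squared forecast error is normalised by the variance of the actual visits over the same days, the form used in Korenberg's FOS and PCI work. Under that assumption, 0 means a perfect forecast and 100 means the forecast is no better than always predicting the mean. The function name `percent_mse` is hypothetical.

```python
import numpy as np

def percent_mse(y, y_hat):
    """%MSE = 100 * mean((y - y_hat)^2) / mean((y - mean(y))^2).

    Assumed relative (variance-normalised) definition; both averages are
    taken over the same days, e.g. the validation period.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 100.0 * np.mean((y - y_hat) ** 2) / np.mean((y - y.mean()) ** 2)
```

Because the error is divided by the variance of the visits, %MSE values can be compared across health units of very different sizes, which a raw mean square error would not allow.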
Figure 2 shows the predicted number of visits over the next one-week period (dashed lines) compared with the actual ED visits (solid line) that occurred during that period. The one-week-ahead weekly predictions were obtained by summing the one- through seven-day-ahead predictions. Aggregating over a seven-day period removes the well-known weekly cyclical pattern in visits 23 and shows more clearly the ability to predict beyond these known trends.
Table 2. Forecasting Models and Implementation Details
EWMA model: α and β are the model coefficients, identified by least-squares fit. The models did not use the actual number of visits on previous days, y(n+q-1), as a predictor of future visits, because this information is not currently available in Ontario: hospitals are only required to report visit information to CIHI at the end of the fiscal year. Instead, the previous model estimate was used.
β_i are the model coefficients. L is the maximum lag in the calls allowed in the model and was set to 10. Model coefficients identified by least-squares fit.
Model identified using Parallel Cascade Identification (PCI). 16 R_i ∈ {c, h} indicates whether calls or holidays are the input to a given cascade; in the case of R_i = h, k can take negative values down to -q, otherwise k ≥ 0. C is the number of cascades and was set to 3. L is the memory length of the dynamic linear element and was set to 10. h_j(k) is the impulse response of the dynamic linear element of the j-th cascade. P is the order of the polynomial describing the static non-linearity and was set to 2. Up to a third-order cross-correlation was allowed when determining the impulse response of the dynamic linear element in the PCI model.
Model identified using Fast Orthogonal Search (FOS). 15 M is the number of terms in the model and was set to 8. a_i is the coefficient of the i-th term in the model. C_i is the number of x_c factors in the i-th term; 0 ≤ C_i ≤ 3 for any given term. t_{c,i,j} is the lag of the j-th x_c factor in the i-th term; 0 ≤ t_{c,i,j} ≤ 10 for any given term. H_i is the number of x_h factors in the i-th term; 0 ≤ H_i ≤ 2 for any given term. t_{h,i,j} is the lag of the j-th x_h factor in the i-th term; 0 ≤ t_{h,i,j} ≤ 10 for any given term.
In health units with populations of more than 400,000, the multi-level regression model indicated that FOS offered better performance (p=0.03) than regression. PCI did not provide statistically significantly better performance than regression (p=0.44), while both the EWMA and state-space methods provided worse performance (p<0.0001). The %MSE was lower for health units with a population of more than 400,000 (p<0.0001). The interaction terms between forecasting method and population size for FOS and PCI were statistically significant (p<0.001), indicating that these methods performed worse than regression for health units with populations of less than 400,000, while the interactions for the EWMA and state-space/N4SID models were not significant (p=0.24 for each). Forecast accuracy decreased with lead time (p<0.0001).
The results of this study are consistent with those obtained by van Dijk et al., where Telehealth calls were shown to be correlated with ED visits at the provincial level up to two weeks in advance. 9 The results indicate that Telehealth can be used to estimate future visits at the health unit level. Estimates are better for health units with larger populations, and non-linear modeling methods produced more accurate estimates for these health units.
… predictor of ED visits and, for example, only the mean number of visits over the training data and information about upcoming holidays and weekends were useful in creating forecasts, then we would expect no significant change in forecasting error with lead time, because upcoming holidays and weekends are always known precisely regardless of the forecast horizon.
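The weekly aggregation used for Figure 2 forms each one-week-ahead prediction by summing the one- through seven-day-ahead predictions. A minimal sketch follows. The nested-mapping layout `forecasts_by_lead[q][n]` and both function names are hypothetical.

```python
def one_week_ahead(forecasts_by_lead, n):
    """Weekly forecast issued on day n: sum of the 1- to 7-day-ahead forecasts.

    forecasts_by_lead[q][n] is assumed to hold the q-day-ahead model's
    forecast of visits on day n + q, made on day n.
    """
    return sum(forecasts_by_lead[q][n] for q in range(1, 8))

def actual_week(visits, n):
    """Observed visits on days n + 1 through n + 7, for comparison."""
    return sum(visits[n + 1:n + 8])
```

Every day of the week appears exactly once in any seven-day window, so the day-of-week effect contributes a roughly constant amount to each weekly total. This is why the weekly sums expose changes beyond the weekly cycle.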
A major limitation of this study was that only two years of Telehealth data were available. Because of this, and because only one flu season was available to test prediction performance, it is difficult to draw firm conclusions about the performance of the various methods or the importance of considering non-linearity. The use of least squares to fit model coefficients may introduce some bias, as the distribution of visit counts may be closer to Poisson than Gaussian, and this may have degraded predictive performance. However, the error introduced by this assumption is likely negligible compared with the error due to unexplained variation. An important assumption made in developing the forecasting models was that the relationship between Telehealth calls and ED visits does not change over time: we built a predictive model using approximately the first year of data and then tested it over the second year. However, since calling behaviours could change over time, for example in response to season or to promotion of the Telehealth service, this assumption could have been violated. Since data confirming past visits are not available in real time, models cannot be updated on an ongoing basis. Collection of real-time data through syndromic surveillance systems may address this limitation in the future. It was not possible to account for individuals who made calls in one area but sought care in another. This might have influenced results where geographic boundaries separate a large population from the centre of care it uses. The better accuracy of ED visit prediction for larger health units might be explained by the much higher call-to-visits ratios in health units with larger populations (correlation coefficient 0.70, p<0.001). The call-to-visits ratio is simply a measure of the number of calls, adjusted for the number of visits.
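The call-to-visits ratio is defined with Table 3 as the median number of daily Telehealth calls divided by the median number of daily ED visits. The sketch below computes it and the correlation used to relate it to population across health units. It is a minimal sketch assuming NumPy and Pearson's correlation coefficient, since the text does not say which correlation coefficient was used. The function names are hypothetical.

```python
import numpy as np

def call_to_visit_ratio(daily_calls, daily_visits):
    """Median daily Telehealth calls divided by median daily ED visits."""
    return float(np.median(daily_calls) / np.median(daily_visits))

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(a, b)[0, 1])
```

In use, one ratio would be computed per health unit, and `pearson_r` would then be applied to the list of ratios and the list of health unit populations.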
It might be expected that more calls provide more information with which to predict the number of visits. The call-to-visits ratio might also explain why non-linear methods improved on regression only in health units with larger populations, as non-linear methods may provide benefit only when richer call data are available. Since the first terms selected by FOS included three to four linear terms (out of the eight terms in the model), it is not surprising that the non-linear methods provided only limited improvements in prediction accuracy.
The FOS method works by adding terms to the model one at a time from a pool of candidates, consisting of all possible predictors and their powers and cross-products, on the basis of how much each term reduces the model's error over the training data. An advantage of FOS over conventional regression is that it can search for model terms, including power and cross-product (interaction) terms, much more quickly than forward, backward, or stepwise selection with regression. The number of candidate predictor variables can be large in time series analysis when the influence of these predictors at multiple lags is considered. For example, the FOS algorithm we used was able to search 28,392 candidates to generate an eight-term model in approximately 17 seconds on a standard laptop computer. It should be noted that PCI and FOS performance was better than regression for health units with populations of more than 400,000 (p<0.0001) over both the training and the validation data. Model performance can degrade if over-fitting occurs on the training data; therefore, performance should be examined over both training and validation data when comparing methods.
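The term-by-term selection described above can be sketched with a plain orthogonal search. This is a simplified stand-in, not Korenberg's Fast Orthogonal Search. FOS reaches the same selections without explicitly constructing orthogonal columns, which is what makes a pool of 28,392 candidates fast to search. This sketch instead orthogonalises every remaining candidate at every step. The candidate pool is also reduced to a single input series, with lags up to 3 and products of at most two factors. The study allowed call lags up to 10, up to three call factors and two holiday factors per term, and M = 8 terms. All function names are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def lag_name(k):
    return "x(n)" if k == 0 else f"x(n-{k})"

def candidate_terms(x, max_lag=3, max_order=2):
    """Constant plus all products of 1..max_order lagged copies of x.

    Returns (names, P); row r of P corresponds to time n = max_lag + r.
    """
    x = np.asarray(x, dtype=float)
    rows = len(x) - max_lag
    lagged = {k: x[max_lag - k:len(x) - k] for k in range(max_lag + 1)}
    names, cols = ["1"], [np.ones(rows)]
    for order in range(1, max_order + 1):
        for lags in combinations_with_replacement(range(max_lag + 1), order):
            col = np.ones(rows)
            for k in lags:
                col = col * lagged[k]
            names.append("*".join(lag_name(k) for k in lags))
            cols.append(col)
    return names, np.column_stack(cols)

def orthogonal_search(P, y, n_terms):
    """Greedy term selection by Gram-Schmidt orthogonal search.

    At each step every unused candidate is orthogonalised against the terms
    already chosen; the one whose orthogonal component w gives the largest
    drop in squared error, (y.w)^2 / (w.w), is added.
    """
    y = np.asarray(y, dtype=float)
    chosen, basis = [], []
    for _ in range(n_terms):
        best, best_gain, best_w = None, 0.0, None
        for j in range(P.shape[1]):
            if j in chosen:
                continue
            w = np.array(P[:, j], dtype=float)
            for b in basis:
                w -= (w @ b) / (b @ b) * b
            d = w @ w
            if d <= 1e-10 * (P[:, j] @ P[:, j]):
                continue  # already (numerically) spanned by chosen terms
            gain = (y @ w) ** 2 / d
            if gain > best_gain:
                best, best_gain, best_w = j, gain, w
        if best is None:
            break
        chosen.append(best)
        basis.append(best_w)
    # Coefficients of the selected original (non-orthogonal) terms.
    coef, *_ = np.linalg.lstsq(P[:, chosen], y, rcond=None)
    return chosen, coef
```

In a quick check on noise-free synthetic data generated from a constant, x(n-1) and x(n)·x(n-2), a three-term search selects exactly those three candidates out of the 15 in the pool.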
It is not surprising that the EWMA and state-space methods perform worse than the regression, FOS and PCI methods, since these methods explicitly (EWMA) or implicitly (state-space, through the use of state variables) use the values of past estimates to produce future estimates, so inaccuracies in past predictions can accumulate and affect future predictions. The performance of these methods might improve if the actual number of past ED visits were available; in this case, FOS could also be augmented to include terms with past ED visits. Finally, the practical difference between the predictions made by different methods should be considered, in addition to the statistical significance of the difference in their accuracy, in the context of the intended application.
Results: Forecast accuracy was significantly higher (p<0.0001) in the 12 health units with more than 400,000 inhabitants (75% of the Ontario population) than in smaller health units. FOS produced the best estimates (p=0.03), while PCI provided no significant improvement. FOS, PCI, EWMA and N4SID produced poorer results than regression in the smaller health units. Telehealth Ontario can be used to estimate ED visits associated with respiratory illness in health units. Non-linear modeling methods produce better estimates than regression in the health units that encompass the majority of the population.
Keywords: forecasting; surveillance; respiratory tract infections; mathematical modeling; hospital planning
Exploring reasons for bed pressures in Winnipeg acute care hospitals.
The impact of influenza-associated respiratory illnesses on hospitalizations, physician visits, emergency room visits, and mortality.
Coping with winter bed crises.
Ontario Ministry of Health and Long-Term Care. Ontario Health Plan for an Influenza Pandemic.
Ontario Ministry of Health and Long-Term Care. Pandemic H1N1 (pH1N1) Alternate Influenza Assessment, Treatment, and Referral Services. Decision Document.
The Ontario Expert Panel on SARS and Infectious Disease Control. For the Public's Health: A Plan of Action. Final Report of the Ontario Expert Panel on SARS and Infectious Disease Control.
The SARS Commission Interim Report: SARS and Public Health in Ontario.
Ontario Ministry of Health and Long-Term Care.
Forecasting demand of emergency care.
Can Telehealth Ontario respiratory call volume be used as a proxy for emergency department respiratory visit surveillance by public health?
Modeling emergency department visit patterns for infectious disease complaints: Results and application to disease surveillance.
Time series modeling for syndromic surveillance.
State space and ARMA models: An overview of the equivalence. Bank of Canada Working Paper 93-4.
Methods and Applications, 3rd ed.
N4SID: Subspace algorithms for the identification of combined deterministic-stochastic systems.
A robust orthogonal algorithm for system identification and time-series analysis.
Parallel cascade identification and kernel estimation for nonlinear systems.
Code-based syndromic surveillance for influenza-like illness by international classification of diseases.
Health Regions: Boundaries and Correspondence with Census Geography. 82-402-XIE.
Postal Code Conversion File (PCCF), Reference Guide. 92F0153GIE.
MATLAB System Identification Toolbox User's Guide, Version 6 ed. The MathWorks Inc.
MATLAB Version 7 Release 14.
Table 109-5315: Estimates of population (Census and administrative data), by age group and sex, Canada, provinces, territories, health regions and peer groups, annual (number) (table). CANSIM (database), Using E-STAT (distributor).