key: cord-162105-u0w56xrp authors: Centeno, Raffy S.; Marquez, Judith P. title: How much did the Tourism Industry Lost? Estimating Earning Loss of Tourism in the Philippines date: 2020-04-21 journal: nan DOI: nan sha: doc_id: 162105 cord_uid: u0w56xrp The study aimed to forecast the total earnings lost of the tourism industry of the Philippines during the COVID-19 pandemic using seasonal autoregressive integrated moving average. Several models were considered based on the autocorrelation and partial autocorrelation graphs. Based on the Akaike's Information Criterion (AIC) and Root Mean Squared Error, ARIMA(1,1,1)$times$(1,0,1)$_{12}$ was identified to be the better model among the others with an AIC value of $-414.51$ and RMSE of $47884.85$. Moreover, it is expected that the industry will have an estimated earning loss of around 170.5 billion pesos if the COVID-19 crisis will continue up to July. Possible recommendations to mitigate the problem includes stopping foreign tourism but allowing regions for domestic travels if the regions are confirmed to have no cases of COVID-19, assuming that every regions will follow the stringent guidelines to eliminate or prevent transmissions; or extending this to countries with no COVID-19 cases. According to the Philippine Statistics Authority, tourism accounts to 12.7% of the countrys Gross Domestic Product in the year 2018 [1] . Moreover, National Economic Development Authority reported that 1.5% of the countrys GDP on 2018 is accounted to international tourism with Korea, China and USA having the largest numbers of tourists coming in []. In addition, Department of Tourism recorded that 7.4% of the total domestic tourists or an estimated figure of 3.97 million tourists, both foreign and domestics were in Davao Region on 2018 [2] . Also, employment in tourism in-dustry was roughly estimated to 5.4 million in 2018 which constitutes 13% of the employment in the country according to the Philippine Statistics Authority [3] . Hence, estimating the total earnings of the tourism industry in the Philippines will be very helpful in formulating necessary interventions and strategies to mitigate the effects of the COVID-19 pandemic. This paper will serve as a baseline research to describe and estimate the earnings lost of the said industry. The objective of this research is to forecast the monthly earnings loss of the tourism industry during the COVID-19 pandemic by forecasting the monthly foreign visitor arrivals using Seasonal Autoregressive Integrated Moving Average. Specifically, it aims to answer the following questions: 1. What is the order of the seasonal autoregressive intergrated moving average for the monthly foreign visitor arrivals in the Philippines? 2. How much earnings did the tourism industry lost during the COVID-19 pandemic? The study covers a period of approximately eight years from January 2012 to December 2019. Also, the modeling technique that was considered in this research is limited only to autoregressive integrated moving average (ARIMA) and seasonal autoregressive integrated moving average (SARIMA). Other modeling techniques were not tested and considered. The research utilized longitudinal research design wherein the monthly foreign visitor arrivals in the Philippines is recorded and analyzed. A longitudinal research design is an observational research method in which data is gathered for the same subject repeatedly over a period of time [4] . Forecasting method, specifically the Seasonal Autoregressive Integrated Moving Average (SARIMA), was used to forecast the future monthly foreign visitor arrivals. In selecting the appropriate model to forecast the monthly foreign visitor arrivals in the Philippines, the Box-Jenkins methodology was used. The data set was divided into two sets: the training set which is composed of 86 data points from January 2012 to December 2018; and testing set which is composed of 12 data points from January 2019 to December 2019. The training set was used to identify the appropriate SARIMA order whereas the testing set will measure the accuracy of the selected model using root mean squared error. The best model, in the context of this paper, was characterized to have a low Akaike's Information Criterion and low root mean squared error. The data were extracted from Department of Tourism website. The data were composed of monthly foreign visitor arrivals from January 2012 to December 2019 which is composed of 98 data points. Box-Jenkins methodology refers to a systematic method of identifying, fitting, checking, and using SARIMA time series models. The method is appropriate for time series of medium to long length which is at least 50 observations. The Box-Jenkins approach is divided into three stages: Model Identification, Model Estimation, and Diagnostic Checking. In this stage, the first step is to check whether the data is stationary or not. If it is not, then differencing was applied to the data until it becomes stationary. Stationary series means that the value of the series fluctuates around a constant mean and variance with no seasonality over time. Plotting the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) can be used to assess if the series is stationary or not. Also, Augmented Dickey−Fuller (ADF) test can be applied to check if the series is stationary or not. Next step is to check if the variance of the series is constant or not. If it is not, data transformation such as differencing and/or Box-Cox transformation (eg. logarithm and square root) may be applied. Once done, the parameters p and q are identified using the ACF and PACF. If there are 2 or more candidate models, the Akaike's Information Criterion (AIC) can be used to select which among the models is better. The model with the lowest AIC was selected. In this stage, parameters are estimated by finding the values of the model coefficients which provide the best fit to the data. In this research, the combination of Conditional Sum of Squares and Maximum Likelihood estimates was used by the researcher. Conditional sum of squares was utilized to find the starting values, then maximum likelihood was applied after. Diagnostic checking performs residual analysis. This stage involves testing the assumptions of the model to identify any areas where the model is inadequate and if the corresponding residuals are uncorrelated. Box-Pierce and Ljung-Box tests may be used to test the assumptions. Once the model is a good fit, it can be used for forecasting. Forecast evaluation involves generating forecasted values equal to the time frame of the model validation set then comparing these values to the latter. The root mean squared error was used to check the accuracy of the model. Moreover, the ACF and PACF plots were used to check if the residuals behave like white noise while the Shapiro-Wilk test was used to perform normality test. The following statistical tools were used in the data analysis of this study. Sample autocorrelation function measures how correlated past data points are to future values, based on how many time steps these points are separated by. Given a time series X t , we define the sample autocorrelation function, r k , at lag k as [5] whereX is the average of n observations . Sample partial autocorrelation function measures the correlation between two points that are separated by some number of periods but with the effect of the intervening correlations removed in the series. Given a time series X t , the partial autocorrelation of lag k is the autocorrelation between X t and X t+k with the linear dependence of X t on X t+1 through X t+k−1 removed. The sample partial autocorrelation function is defined as [5] and r k is the sample autocorrelation at lag k. RMSE is a frequently used measure of the difference between values predicted by a model and the values actually observed from the environment that is being modelled. These individual differences are also called residuals, and the RMSE serves to aggregate them into a single measure of predictive power. The RMSE of a model prediction with respect to the estimated variable X model is defined as the square root of the mean squared error [6] whereŷ i is the predicted values, y i is the actual value, and n is the number of observations. The AIC is a measure of how well a model fits a dataset, penalizing models that are so flexible that they would also fit unrelated datasets just as well. The general form for calculating the AIC is [5] AIC p,q = −2 ln(maximized likelihood) + 2r n where n is the sample size, r = p + q + 1 is the number of estimated parameters, and including a constant term. The Ljung−Box statistic, also called the modified Box-Pierce statistic, is a function of the accumulated sample autocorrelation, r j , up to any specified time lag m. This statistic is used to test whether the residuals of a series of observations over time are random and independent. The null hypothesis is that the model does not exhibit lack of fit and the alternative hypothesis is the model exhibits lack of fit. The test statistic is defined as [5] wherer 2 k is the estimated autocorrelation of the series at lag k, m is the number of lags being tested, n is the sample size, and the given statistic is approximately Chi Square distributed with h degrees of freedom, where h = m − p − q. Conditional sum of squares was utilized to find the starting values in estimating the parameters of the SARIMA process. The formula is given by [7] θ n = arg min θ∈ s n (θ) (5) where According to [7] , once the model order has been identified, maximum likelihood was used to estimate the parameters c, φ 1 , ..., φ p , θ 1 , ..., θ q . This method finds the values of the parameters which maximize the probability of getting the data that has been observed . For SARIMA models, the process is very similar to the least squares estimates that would be obtained by minimizing where ε t is the error term. Box−Cox Transformation is applied to stabilize the variance of a time series. It is a family of transformations that includes logarithms and power transformation which depend on the parameter λ and are defined as follows [8] where y i is the original time series values, w i is the transformed time series values using Box-Cox, and λ is the parameter for the transformation. R is a programming language and free software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing [9] . R includes linear and nonlinear modeling, classical statistical tests, time-series analysis, classification modeling, clustering, etc. The 'forecast' package [7] was utilized to generate time series plots, autocorrelation function/partial autocorrelation function plots, and forecasting. Also, the 'tseries' package [10] was used to perform Augmented Dickey-Fuller (ADF) to test stationarity. Moreover, the 'lmtest' package [11] was used to test the parameters of the SARIMA model. Finally, the 'ggplot2' [12] , 'tidyr' [13] , and 'dplyr' [14] were used to plot time series data considered during the conduct of the research. Line plot was used to describe the behavior of the monthly foreign visitor arrivals in the Philippines. Figure 1 shows that there is an increasing trend and a seasonality pattern in the time series. Specifically, there is a seasonal increase in monthly foreign visitor arrivals every December and a seasonal decrease every September. These patterns suggest a seasonal autoregressive integrated moving average (SARIMA) approach in modeling and forecasting the monthly foreign visitor arrivals in the Philippines. Akaike Information Criterion and Root Mean Squared Error were used to identify which model was used to model and forecast the monthly foreign visitor arrivals in the Philippines. Table 1 shows the top two SARIMA models based on AIC generated using R. ARIMA (1,1,1)×(1,0,1) 12 has the lowest AIC with a value of −414.56 which is followed by ARIMA (1,1,1)×(1,0,1) 12 with an AIC value of −414.51. Model estimation was performed on both models and generated significant parameters for both models (refer to Appendix A.2). Moreover, diagnostic checking was performed to assess the model. Both models passed the checks using residual versus time plot, residual versus fitted plot, normal Q-Q plot, ACF graph, PACF graphs, Ljung-Box test, and Shapiro-Wilk test (refer to Appendix A.3). Finally, forecast evaluation was performed to measure the accuracy of the model using an out-of-sample data set (refer to Appendix A.4). ARIMA (1,1,1)×(1,0,1) 12 produced the lowest RMSE relative to ARIMA (0,1,2)×(1,0,1) 12 . Hence, the former was used to forecast the monthly foreign visitor arrivals in the Philippines. ing the COVID-19 Pandemic Crisis Figure 2 shows the estimated earnings loss (in billion pesos) of the tourism industry of the Philippines every month from April 2020 to December 2020. According to the Department of Tourism, the Average Daily Expenditure (ADE) for the month in review is P 8,423.98 and the Average Length of Stay (ALoS) of tourists in the country is recorded at 7.11 nights. The figures were generated by multiplying the forecasted monthly foreign visitor arrivals, ADE, and ALoS (rounded to 7) [2] . Moreover, it is forecasted under community quarantine that the recovery time will take around four to five months (up to July) [15] . With this, the estimated earning loss of the country in terms of tourism will be around 170.5 billion pesos. Based on the results presented on the study, the following findings were drawn: 1. The order of SARIMA model used to forecast the monthly foreign visitor arrival is ARIMA (1,1,1)×(1,0,1) 12 since it produced a relatively low AIC of −414.51 and the lowest RMSE of 47884.85 using an out-of-sample data. This means that the model is relatively better among other SARIMA models considered in forecasting the monthly foreign visitor arrivals in the Philippines. 2. If the COVID-19 Pandemic lasts up to five months, the tourism industry of the Philippines will have an estimated earnings loss of about P 170.5 billion. Assumptions about average daily expenditure and average length of stay of tourists were based on the Department of Tourism reports. The projected P 170.5 billion loss on Philippines foreign tourism is really a huge money. Regaining such loss the soonest time, however, would only jeopardize the lives of the Filipino people. On the other hand, the government can, perhaps, reopen the Philippines domestic tourism. This would somehow help regain the countrys loss on revenue from tourism, although not fully. However, the following recommendations, shown in scenarios/options below, may be helpful in regaining it, both in foreign and domestic tourism, and ensuring safety among Filipinos, as well. 1. Option 1: Stop foreign tourism until the availability of the vaccine, but gradually open domestic tourism starting July of 2020. In this scenario/option, the following considerations may be adhered to, viz. (a) not all domestic tourism shall be reopened in the entire country; only those areas with zero covid-19 cases; (b) for areas where domestic tourism is allowed/reopened, appropriate guidelines should be strictly implemented by concerned departments/agencies to eliminate/prevent covid-19 transmission; and (c) digital code that would help in tracing the contacts and whereabouts of domestic tourists, as being used in China and Singapore, should be installed before the reopening of the domestic tourism. 2. Option 2: Gradual opening of foreign tourism starting July 2020 and full reopening of domestic tourism on the first semester of 2021 or when the covid-19 cases in the Philippines is already zero. However, the following considerations should be satisfied, viz. (a) only countries with covid-19 zero cases are allowed to enter the Philippines; (b) appropriate guidelines should be strictly implemented by concerned departments/ agencies both for foreign and domestic tourism to eliminate/ prevent the spread of the said virus; and (c) digital code that would help in tracing the contacts and whereabouts of foreign tourists, as being used in China and Singapore, should be installed before reopening the foreign tourism in the Philippines. A.1 Model Identification Fig. 3 : Line, ACF, and PACF Plot of Monthly Visitor Arrivals Line, ACF, and PACF graph were used to identify the model to be used to forecast the monthly visitor arrivals in the Philippines. The line graph in Figure 3 shows an increasing trend which suggests a non-stationary behavior. This is supported by the ACF and PACF plots which shows a slow decay in all the lags of the former and the the first lag is significant for the latter. Moreover, the line plot slightly display an increasing variance across time. Therefore, data transformation such as differencing and Box-Cox transform were applied to the time series data. Figure 4 show the line, ACF, and PACF graph of the transformed data. The line graph shows that the data is stationary which is supported by both ACF and PACF graphs. Moreover, ACF graph suggests a seasonal pattern in the data since 12 th , 24 th , 36 th , 48 th , and 60 th lags are significant. This is also true in the case of PACF since the 12 th lag is significant. Akaike's Information Criterion (AIC) was used to identify the best SARIMA model from among the models considered. Table 2 shows the top 2 models with the least AIC, namely: ARIMA (0,1,2)×(1,0,1) 12 (Model 1) and ARIMA (1,1,1)×(1,0,1) 12 (Model 2)with an AIC of −414.56 and −414.51, respectively. The combination of conditional sum of squares and maximum likelihood estimates were used to estimate the parameters of the three moving averages and test its significance. Table 3 shows the estimated coefficients, standard errors, z-values, and p-values of each parameters of ARIMA (0,1,2)×(1,0,1) 12 . Since the p-value of the parameter is less than 0.05, there is sufficient evidence to say that the estimate of the moving averages, seasonal autoregressive, and seasonal moving average are significantly different from zero. The combination of conditional sum of squares and maximum likelihood estimates were used to estimate the parameters of the three moving averages and test its significance. Table 4 shows the estimated coefficients, standard 12 . Since the p-value of the parameter is less than 0.05, there is sufficient evidence to say that the estimate of the moving averages, seasonal autoregressive, and seasonal moving average are significantly different from zero. Residual versus Time, Residual versus Fitted, and Normal Q-Q Plot were used to perform diagnostic checking for the model whereas ACF and PACF plots of the residuals were used to check if there are remaining patterns that should be accounted by the model 1 and 2. Graphs for Model 1 are displayed in the left whereas graphs for Model 2 are displayed in the right. Figure 5 shows that the residuals and time does not display correlation between the two variables. Therefore, this scatter plot suggests that the residuals has no serial correlation, that is, there is no interdependence between time and residuals. This is supported by Ljung-Box test which suggests that the error terms behave randomly for both Model 1 (Q(20) = 20.109, df=20, Model df=4, p= 0.4511) and Model 2 (Q(20) = 20.941, df=20, Model df=4, p= 0.4006). In addition, the residuals versus fitted values scatter plot displays no visible funneling pattern which indicates that the variances of the error term are relatively equal. Moreover, the normal Q-Q plot suggests that the residuals are normally distributed since most of the the values lie along a line. This is supported by Shapiro-Wilk test which suggest that the error is normally distributed for both Model 1 (W= 0.98062, p= 0.2372) and Model 2 (W= 0.9852, p= 0.4513). Finally, Fig. 5 : Line, ACF, and PACF Plot of the Two Models ACF and PACF graphs displays that all of the lags are within the acceptable limits. Therefore, all the lags are not significant which means that the residuals of the model may be considered as white noise. Hence, the residuals are assumed to be Guassian white noise. Figure 6 shows the ACF and PACF graphs of the forecast errors both models. All of the autocorrelation and partial autocorrelation are within the limits which means that these values are not significant. Therefore, the forecast errors are considered white noise. Shapiro-Wilk test was used to test if the forecast errors of both models were normally distributed. The results show that there is no sufficient evidence to say that the forecast error terms are not normally distributed for both Model 1 (W= 0.0.878, p= 0.082) and Model 2 (W= 0.875, p= 0.075). This means that it can be assumed that the fore- Based on the diagnostic presented, the models satisfied all the assumptions of a seasonal autoregressive integrated moving average model. Furthermore, Model 2 is relatively accurate compared to Model 1 based on the RMSE of each model. Hence, the ARIMA (1,1,1)×(1,0,1) 12 was used to forecast the monthly visitor arrivals in the Philippines. Philippine statistic authority: Contribution of tourism to the philippine economy is 12.7 percent Department of tourism-philippines: Tourism statistics Philippine statistic authority Encyclopedia of Research Design, 1 ed Time Series Analysis: Forecasting and Control Long Short-Term Memory Networks with Python: Develop Sequence Prediction Models with Deep Learning Automatic time series forecasting: the forecast package for R Box-Cox Transformation R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing tseries: Time Series Analysis and Computational Finance. R package version 0 Diagnostic checking in regression relationships ggplot2: Elegant Graphics for Data Analysis Package 'tidyr A grammar of data manipulation What quarantine measures can do? modelling the dynamics of covid-19 transmission in davao region