key: cord-240372-39yqeux4
authors: Costa, Kleyton Vieira Sales da; Silva, Felipe Leite Coelho da; Coelho, Josiane da Silva Cordeiro
title: Forecasting Quarterly Brazilian GDP: Univariate Models Approach
date: 2020-10-26
journal: nan
DOI: nan
sha:
doc_id: 240372
cord_uid: 39yqeux4

Gross domestic product (GDP) is an important economic indicator that aggregates useful information to assist economic agents and policymakers in their decision-making process. In this context, GDP forecasting becomes a powerful decision optimization tool in several areas. To contribute in this direction, we investigated the efficiency of classical time series models and of the class of state-space models, applied to the Brazilian gross domestic product. The models used were the Seasonal Autoregressive Integrated Moving Average (SARIMA) model and the Holt-Winters method, which are classical time series models, and the dynamic linear model, a state-space model. Based on statistical metrics of model comparison, the dynamic linear model presented the best fit and forecasting performance for the analyzed period, and it also captured the growth rate structure well.

The economic activity of a country can be influenced by several factors that lead economic agents to change their consumption and investment decisions, in addition to affecting other outcomes, such as inflation and unemployment. Such factors, or shocks, result from changes in economic policy, in the level of production technology, in weather conditions, etc. The gross domestic product (GDP) is one of the main indexes for measuring the level of economic activity, and the forecast of its trajectory provides useful information about the short-term economic trend, serving as a basis for expectations of economic behavior. Significant impacts on economic activity arise through crises. They are a dysfunction inherent in the free market system.
Through the development of information transmission technologies and the global integration of markets, the scope and frequency of these dysfunctions have expanded. The Brazilian economic crisis, which began in the second quarter of 2014, is still the subject of many analyses, with no consensus on its causes or its consequences. In the second quarter of 2016, the GDP growth rate accumulated over four quarters reached its lowest level of the last two decades (-4.6%). The data show that the recovery after this significant drop was not complete and was followed by a period of stagnation in the country's growth rate. Paula and Pires (2017) attributed the ineffectiveness of the counter-cyclical policies adopted between 2011 and 2014 to problems in the coordination of macroeconomic policy and to exogenous shocks, such as the deterioration of the terms of trade and the water crisis that occurred in the period. Filho (2017) argues that the Brazilian economic crisis originated in a series of supply and demand shocks, mostly caused by misguided public policies, that reduced the growth potential of the Brazilian economy and increased the fiscal cost.

According to Feijó and Ramos (2013), the most relevant aggregates derived from the System of National Accounts are the measures of product, income and expenditure. The macroeconomic aggregates are statistical constructions that synthesize the productive effort of a given country or region and its possible consequences for the generation of income and expenditure over a specific period of time. By definition, the GDP of a country or region represents the production [1] of all production units of the economy (government, self-employed workers, companies, etc.) in a given period, usually a quarter or a year, at market prices [2]. Blanchard and Johnson (2017) present two ways of interpreting GDP.
The so-called nominal GDP is defined as the sum of the quantities of final goods multiplied by their current prices, that is, it includes the inflationary effect during the calculation period. Real GDP is measured at constant prices, taking a given year as the base and thus excluding the effect of price increases.

[1] Socially organized economic activity that aims to create goods and services to be traded on the market and/or that are obtained by means of factors of production (land, capital and labor) traded on the market (IBGE, 2016).
[2] Economic transactions with observed or imputed market value.

Relying solely on GDP as an instrument for measuring the quality of life of the population has theoretical and practical limitations. Growth in production is not a sufficient condition for improving well-being (education, health, culture, security, etc.), because the quality of economic growth is outside the scope of the GDP calculation. An expansion may be sustained by war expenditures (production of supplies and weapons, construction of military installations, etc.) or by the reconstruction of a region affected by natural disasters (hurricanes, earthquakes, floods, etc.), and it is reasonable to regard these as activities that neither promote economic and social well-being nor are intended to do so. Angus Deaton, winner of the 2015 Nobel Prize in Economics, says: "If crime goes up, and we spend more on prisons, GDP will be higher. If we neglect climate change, and spend more and more on cleaning up and repairing after storms, GDP will go up, not down; we count the repairs but ignore the destruction." (Deaton, 2013)

Time series analysis has proven to be an effective tool for understanding the behavior patterns of data observed sequentially over time, offering a wide range of models for analyzing and predicting trends and seasonality.
The Seasonal Autoregressive Integrated Moving Average (SARIMA) model and the Holt-Winters method are considered classical time series models, while the class of dynamic linear models is part of the Bayesian approach. Among the contributions that use classical models, Abonazel and Abd-Elftah (2019) analyzed Egypt's annual GDP between 1965 and 2016 and produced a ten-year-ahead forecast (2017 to 2026) that pointed to growth in the country's GDP over the forecast period. Wabomba et al. (2016) forecast Kenya's GDP between 2013 and 2017 and obtained significant growth of the Kenyan economy in the period. Agrawal (2018) modeled the series of India's real GDP growth rate from 1996 to 2017. In the analyzed data, the ARIMA model did not perform significantly better than the other models; the author also used the Holt-Winters model and a linear trend, which showed results similar to each other. Finally, da Silva et al. (2020) found significant results using ARIMAX and SARIMAX models (which take exogenous variables into account) to forecast Brazilian annual and quarterly real GDP for the year 2019.

For the Bayesian approach and the class of state-space models, Piccoli (2015) analyzed four dynamic linear models to identify the one with the best forecasting capacity for nominal GDP in the United States. The best results were obtained with a multivariate SUTSE (Seemingly Unrelated Time Series Equations) model whose variables were nominal GDP, the industrial production index, the consumer price index (inflation), and the quarterly interest rate on US Treasury bills. Rees et al. (2015) built new measures of Australia's GDP growth using state-space methods. Their results are highly correlated with the officially published figures for GDP growth.
However, the measures are less volatile, are easier to predict, and achieved good results in nowcasting. Issler and Notini (2016) estimated Brazilian real monthly GDP with a state-space representation and also found good forecasting results when compared with the Central Bank Economic Activity Index (IBC-Br). Migon et al. (1993) studied the performance of Bayesian dynamic models applied to a set of Brazilian macroeconomic time series (industrial productivity index, balance of trade, components of GDP and others) for the period from 1970 to 1990. The dynamic models were compared with classical structural models, and the results indicate that the Bayesian approach performed similarly to the classical approach. Another applied study was developed by Baurle et al. (2020), with the aim of forecasting GDP in the euro area and Switzerland with a Bayesian vector autoregressive (BVAR) structure and a factor model structure. They found evidence that the factor model structure performs satisfactorily.

Another accepted approach to GDP forecasting is macroeconomic projection based on leading indicators. Garnitz et al. (2019) applied this strategy to forecast GDP growth in forty-four countries, including Brazil. One of their results indicates that the forecasts can be improved by adding the World Economic Survey (WES) indicators of each country's three main trading partners.

The aim of this work is to find a suitable time series model to describe and forecast Brazilian GDP, and to assess how well such models fit the dynamics between periods of economic growth and recession. For this purpose, different classes of time series models are compared. The chosen models were the Holt-Winters method, SARIMA and the dynamic linear model. The literature contains some applications of these models, but no comparative studies were found using the set of models adopted in this work.
This work is organized as follows: Section 2 describes the methodology, Section 3 presents the results and discussion, and the last section provides the main conclusions and some possibilities for future research.

In what follows we outline the data and the empirical approach used to fit and forecast the time series of Brazilian gross domestic product between the years 1996 and 2019, at 1995 prices. This section also defines the models that were investigated. Care was also taken to ensure that the references used to define the models and metrics are widely used and recognized by the academic community.

The quality of the data used in an empirical analysis is fundamental to the quality of the results. One factor that favors the empirical analysis of GDP is the vast documentation made available by government agencies. We obtained the time series from the IBGE Automatic Recovery System (IBGE, 2020). The data are quarterly and cover the first quarter of 1996 to the fourth quarter of 2019. The values used in this article are in Brazilian Real (BRL). Statistical analyses and graphical representations were produced with the open-source software R (R Core Team, 2020).

According to the United Nations (2010), GDP derives from the concept of value added. GDP is the sum of the gross value added of all resident producer units plus taxes on products, less subsidies on products. GDP is also equal to the sum of the final uses of goods and services measured at purchasers' prices, less the value of imports of goods and services. Finally, GDP is also equal to the sum of primary incomes distributed by resident producer units. According to Feijó and Ramos (2013), GDP can be calculated in three different ways, all of which are part of the accounting identity (Production = Income = Expenditure) that guides the National Accounts.
From the production perspective, GDP is calculated by summing the value added of economic activities plus taxes, net of subsidies, on products. That is,

$$\mathrm{GDP} = \sum \mathrm{GVA} + (T - \mathrm{Sub}),$$

where GVA is gross value added (the value of output less intermediate consumption, IC), T are taxes on products and Sub are subsidies on products. The income perspective is obtained by adding the remunerations of the factors of production. Labor is remunerated by wages, loan capital by interest, venture capital by profit, and ownership of production goods ("land") by rent. That is,

$$\mathrm{GDP} = W + \mathrm{GOS} + (T - \mathrm{Sub}),$$

where W are wages, GOS is the gross operating surplus (the sum of interest, profit and rent), T are taxes on products and Sub are subsidies on products. The time series used in this work was built from the expenditure perspective. It is calculated as the sum of household consumption, investment, government spending and net exports. That is,

$$\mathrm{GDP} = C + I + G + \mathrm{NE},$$

where C is household consumption, I is investment (gross fixed capital formation plus stock variation), G is government consumption, and NE is net exports (exports less imports).

As described in Cowpertwait and Metcalfe (2009), the Holt-Winters method was proposed by Holt (1957) and Winters (1960). It uses exponentially weighted moving averages to update the estimates of the seasonally adjusted mean (level), the trend and the seasonality. The method has two variations, additive and multiplicative, each with four equations: one forecast equation and three smoothing equations (Hyndman and Athanasopoulos, 2018). The equations of the additive method are as follows,

$$\hat{y}_{t+h|t} = \ell_t + h\,b_t + s_{t+h-m(k+1)},$$
$$\ell_t = \alpha\,(y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1}),$$
$$b_t = \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1},$$
$$s_t = \gamma\,(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)\,s_{t-m},$$

where $\hat{y}_{t+h|t}$ is the forecast equation. The equations for $\ell_t$, $b_t$ and $s_t$ are respectively the level, trend and seasonality equations, with corresponding smoothing parameters $\alpha$, $\beta$ and $\gamma$. The parameter $m$ denotes the frequency of the seasonality; for quarterly data $m = 4$. Finally, $k$ is the integer part of $(h-1)/m$, which ensures that the estimates of the seasonal indices used for forecasting come from the final year of the sample.
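As a rough illustration of the additive recursions just described, the sketch below implements the level, trend and seasonal updates and the h-step forecast in plain Python. The function name `holt_winters_additive` and the initialisation from the first two seasonal cycles are assumptions made for this example; the paper does not state how the components were initialised, and its analysis was carried out in R.

```python
from typing import List, Tuple


def holt_winters_additive(
    y: List[float], m: int, alpha: float, beta: float, gamma: float, horizon: int
) -> Tuple[List[float], List[float]]:
    """Additive Holt-Winters: one-step-ahead fits for t >= m and h-step forecasts."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasonal cycles to initialise")

    # Initialisation heuristic (an assumption of this sketch, not from the paper):
    # the trend is the average per-period change between the first two cycles,
    # the level is the first-cycle mean moved forward to time m-1, and the
    # seasonal indices are the first-cycle deviations from that straight line.
    mean1 = sum(y[:m]) / m
    mean2 = sum(y[m:2 * m]) / m
    trend = (mean2 - mean1) / m
    mid = (m - 1) / 2
    level = mean1 + trend * (m - 1 - mid)
    seasonal = [y[i] - (mean1 + trend * (i - mid)) for i in range(m)]

    fitted = []
    for t in range(m, len(y)):
        s_prev = seasonal[t % m]                 # s_{t-m}
        fitted.append(level + trend + s_prev)    # one-step-ahead forecast of y_t
        new_level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)
        new_trend = beta * (new_level - level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y[t] - level - trend) + (1 - gamma) * s_prev
        level, trend = new_level, new_trend

    n = len(y)
    forecasts = []
    for h in range(1, horizon + 1):
        # seasonal[(n - 1 + h) % m] is the most recent estimate for that season,
        # which is what the index t + h - m(k + 1) selects in the forecast equation.
        forecasts.append(level + h * trend + seasonal[(n - 1 + h) % m])
    return fitted, forecasts
```

For a noise-free series made of a straight line plus a fixed quarterly pattern, this sketch reproduces the series exactly for any smoothing parameters, which is a convenient sanity check before applying it to real data.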
For the multiplicative method the same components $\ell_t$, $b_t$ and $s_t$ are defined, but the structure changes: instead of adding the components in $\hat{y}_{t+h|t}$, the sum of the level and trend components is multiplied by the seasonal component.

Box & Jenkins models determine the proper stochastic process to represent a given time series by passing white noise through a linear filter (Morettin and Toloi, 2018). The model used was SARIMA, in order to incorporate the seasonal component present in the data under analysis. The SARIMA model of order $(p, d, q) \times (P, D, Q)_s$ is defined by

$$\phi(B)\,\Phi(B^s)\,\nabla^d\,\nabla_s^D\,y_t = \theta(B)\,\Theta(B^s)\,a_t,$$

where $y_t$ is the observed series, $\theta(B)$ is the moving average operator of order $q$, $\phi(B)$ is the autoregressive operator of order $p$, $\Phi(B^s)$ is the seasonal autoregressive operator of order $P$, $\Theta(B^s)$ is the seasonal moving average operator of order $Q$, $\nabla^d$ is the simple difference operator, $\nabla_s^D$ is the seasonal difference operator and $a_t$ is the white noise.

Dynamic linear models are an important class of state-space models. Widely used in recent decades, they are highly efficient for the analysis and forecasting of time series, providing flexibility and applicability through an elegant and robust probabilistic apparatus. The estimation and inference challenges are solved by recursive algorithms that follow the Bayesian approach, computing conditional distributions of the quantities of interest given the observed information. They treat a series as evolving over time through dynamic and random deformations, and they can incorporate seasonal or regression components. This work draws on West and Harrison (1997) and Laine (2019). The model is defined by an observation equation, a system equation and initial information given by

$$Y_t = F_t\,\theta_t + v_t, \qquad v_t \sim N(0, V_t),$$
$$\theta_t = G_t\,\theta_{t-1} + w_t, \qquad w_t \sim N(0, W_t),$$
$$(\theta_0 \mid D_0) \sim N(m_0, C_0),$$

where $F_t$ and $G_t$ are known matrices, and $v_t$ and $w_t$ are two sequences of independent noises with mean zero and known covariance matrices $V_t$ and $W_t$, respectively.
$D_t$ is the current information set, and $m_0$ and $C_0$ are the mean vector and covariance matrix of the initial state $\theta_0$. In the usual statistical sense, given $D_t$, the pair $(m_t, C_t)$ is sufficient for $(Y_{t+1}, \theta_{t+1}, \ldots, Y_{t+k}, \theta_{t+k})$, that is, it contains all the relevant information about the future. To take into account growth and seasonality, the state vector is defined as $\theta_t = (\mu_t, \beta_t, \gamma_t, \gamma_{t-1}, \gamma_{t-2})$, where $\mu_t$ is the current level, $\beta_t$ is the slope of the trend, and $\gamma_t$, $\gamma_{t-1}$ and $\gamma_{t-2}$ are the seasonal components.

The most suitable forecasting model was selected following Hyndman and Koehler (2006), Armstrong (2001), Morettin and Toloi (2018) and Ahlburg (1984), using the following metrics: i. root mean squared error (RMSE); ii. mean absolute error (MAE); iii. mean absolute percentage error (MAPE); and iv. Theil's inequality coefficient (U-Theil). The first two metrics are widely used scale-dependent measures, that is, their scale depends on the scale of the data. The third metric has the advantage of being scale-independent, and so it is frequently used to compare forecast performance across different data sets. The last metric can be used to improve the accuracy of a forecast through Theil's decomposition of the forecast error into bias, regression and disturbance proportions and the associated linear correction procedure.

This section presents the results obtained using the Holt-Winters additive method, SARIMA and the dynamic linear model to fit the data of interest. For each model, the observed and predicted values were plotted, together with the 95% confidence interval for the predicted values. Graphics are effective tools for understanding the behavior of the series and for checking whether the models generate a reasonable fit and reasonable predictions in relation to the observed data. The additive Holt-Winters formulation was considered more appropriate than the multiplicative one, based on the sum of squared errors.
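The four accuracy metrics listed earlier in this section can be sketched in a few lines of Python. The function names are illustrative, and the form of Theil's coefficient used here (the model's forecast error relative to the error of the naive forecast that repeats the previous observation) is an assumption of this sketch, chosen to match the naive-forecast interpretation given in the conclusions; the paper does not print its exact formula.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error (scale-dependent)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error (scale-dependent)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def theil_u(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Theil's U relative to the naive forecast y_{t-1} (assumed form, see lead-in).

    Values below 1 mean the model beats the naive forecast on t = 1, ..., n-1.
    """
    n = len(actual)
    model_sse = sum((actual[t] - predicted[t]) ** 2 for t in range(1, n))
    naive_sse = sum((actual[t] - actual[t - 1]) ** 2 for t in range(1, n))
    return math.sqrt(model_sse / naive_sse)
```

RMSE and MAE are expressed in the units of the series, so they are only comparable across models fitted to the same data, whereas MAPE and U-Theil are unit-free.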
Figure 1 shows the fit of the additive method to the Brazilian quarterly GDP data for the period 1996 to 2016 and the forecast for the years 2017 to 2019. The model fits the data reasonably well, including in periods of strong recession, such as the international financial crisis of 2008 and the recession in the Brazilian economy between the second quarter of 2014 and the fourth quarter of 2017.

To apply the SARIMA model, the behavior of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) was examined. In Figure 2(a), the autocorrelation function decays slowly to zero. This behavior indicates that the series is non-stationary and needs to be differenced in order to make it stationary. We used an algorithm to generate sixteen SARIMA models following the principle of parsimony. Among the generated models, the structure with the best results was the SARIMA $(0, 1, 1) \times (0, 1, 1)_4$, with the following metrics: Akaike information criterion (-438.58); sum of squared errors (0.01686); and Ljung-Box test (p-value = 0.96). Figure 4 shows the model fitted to the Brazilian quarterly GDP data for the period 1996 to 2016 and the forecast for the years 2017 to 2019. The model also fits the data reasonably well, including the periods in which economic shocks occurred, such as the crises mentioned above.

In this work, the dynamic regression matrix $F_t$ and the evolution matrix $G_t$ of the model are

$$F_t = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 \end{pmatrix}, \qquad G_t = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}.$$

For the study, the observational variance was assumed to be $V_t = \sigma^2$, and the covariance matrix of the system was assumed to be the diagonal matrix $W_t = \mathrm{diag}(\sigma^2_\mu, \sigma^2_\beta, \sigma^2_\gamma, 0, 0)$. These unknown variances were also estimated using Bayesian inference.
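To make the trend-plus-seasonal structure concrete, the following sketch builds the matrices for the state vector $(\mu_t, \beta_t, \gamma_t, \gamma_{t-1}, \gamma_{t-2})$ and applies one noise-free evolution step. It assumes the standard linear-growth trend combined with a dummy-variable seasonal block, which is consistent with the diagonal form of $W_t$ given above; the helper names (`build_trend_seasonal_dlm`, `diag_w`, `evolve`, `observe`) are hypothetical and are not part of the dlm package used in the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def build_trend_seasonal_dlm(period: int = 4) -> Tuple[List[float], Matrix]:
    """Return (F, G) for a linear-growth trend plus a dummy-form seasonal block."""
    n_seas = period - 1                 # number of seasonal states kept in theta
    dim = 2 + n_seas
    # Observation picks the level and the current seasonal effect.
    F = [1.0, 0.0, 1.0] + [0.0] * (n_seas - 1)
    G = [[0.0] * dim for _ in range(dim)]
    G[0][0] = G[0][1] = G[1][1] = 1.0   # mu_t = mu_{t-1} + beta_{t-1}; beta_t = beta_{t-1}
    for j in range(n_seas):
        G[2][2 + j] = -1.0              # new seasonal = -(sum of the previous period-1 effects)
    for i in range(1, n_seas):
        G[2 + i][2 + i - 1] = 1.0       # shift the older seasonal effects down one slot
    return F, G


def diag_w(var_level: float, var_slope: float, var_seasonal: float, period: int = 4) -> Matrix:
    """Diagonal system covariance: noise on level, slope and current seasonal only."""
    diag = [var_level, var_slope, var_seasonal] + [0.0] * (period - 2)
    return [[diag[i] if i == j else 0.0 for j in range(len(diag))] for i in range(len(diag))]


def evolve(G: Matrix, theta: List[float]) -> List[float]:
    """Noise-free state evolution theta_t = G theta_{t-1}."""
    return [sum(g * x for g, x in zip(row, theta)) for row in G]


def observe(F: List[float], theta: List[float]) -> float:
    """Noise-free observation Y_t = F theta_t."""
    return sum(f * x for f, x in zip(F, theta))
```

Applying `evolve` four times to a quarterly state returns the seasonal effects to their starting values while the level advances by four times the slope, which is the behavior expected from a seasonal pattern that sums to zero over a year.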
Thus, to complete the specification of the model, independent inverse gamma prior distributions were assumed for the variances, with means $a, a_{\theta_1}, a_{\theta_2}, a_{\theta_3}$ and variances $b, b_{\theta_1}, b_{\theta_2}, b_{\theta_3}$, respectively, fixed at known values. Therefore, by using the unobservable states as latent variables, a Gibbs sampler can be run on the basis of the full conditional densities of the variances and of the states. The full conditional density of the states is a normal distribution, and sampling from it is implemented in the dlm package (Petris, 2010), which was used here. The Gibbs sampler was run for 5000 iterations for each parameter (the model variances), of which the first 1000 iterations were treated as the burn-in period and discarded. The remaining iterations were used to compose the posterior samples of the estimated variances. Posterior estimates of the four unknown variances, from the Gibbs sampler output, can be seen in Figure 5.

The RMSE, the MAPE, the MAE, and the U-Theil were calculated for the fitted values of the models, and the results are shown in Table 1. The RMSE and MAPE were computed for the forecast values, and the results are shown in Table 2. The best results were obtained with the SARIMA $(0, 1, 1) \times (0, 1, 1)_4$ and dynamic linear models. The latter is the one that best fits the series of Brazilian GDP, at 1995 prices, having achieved the lowest values in all metrics for both fitted and forecast values. Since the dynamic linear model was the best model on both the fit and the forecast criteria, Table 3 compares the growth rate of Brazilian GDP computed from the observed values with the growth rate computed from the values predicted by the dynamic linear model. Figure 7 shows that the model proposed in this study obtained satisfactory results when the observed and predicted growth rates are compared. The projections also preserved the tendency of Brazilian economic growth to stagnate in the analyzed period.
Therefore, economic policies were not effective in producing a consistent recovery in the short and medium term.

Understanding the behavior of GDP is a topic of study and discussion for society and the academic community. In the present work, we applied the Holt-Winters additive method, SARIMA and the dynamic linear model in order to forecast the behavior of Brazilian quarterly GDP, at 1995 prices. The data cover the period between the first quarter of 1996 and the fourth quarter of 2019. Theil's inequality coefficient (U-Theil) shows that the models used in the study are better than the naive prediction, i.e., the prediction in which the forecast at time $t$ is the value observed at $t-1$. Both the analyzed series and the models' forecasts show the need for sustained growth in a market economy. By the RMSE, MAE, MAPE and U-Theil metrics, the dynamic linear model presented the best fit to the data and efficient forecast performance, with a MAPE of 0.839. We find evidence in this study that corroborates the observed stagnation of the Brazilian economy after the crisis period that started in the second quarter of 2014. Therefore, the dynamic linear model proved to be efficient for fitting and forecasting GDP data even in the presence of economic shocks. Given the Covid-19 pandemic in 2020 and its economic and human consequences, time series forecasting models must be adjusted so that they can adapt to a significant exogenous and structural economic shock.
This means that the model was probably able to capture the data structure and generate forecasts effectively.

Forecasting egyptian gdp using arima models
GDP modelling and forecasting using ARIMA: an empirical study from India
Forecast evaluation and improvement using theil's decomposition
Principles of forecasting: a handbook for researchers and practitioners
Forecasting the production side of gdp
Uso de ferramentas econométricas para modelar e estimar o pib do brasil
The great escape: health, wealth, and the origins of inequality
A crise econômica de
Forecasting gdp all over the world using leading indicators based on comprehensive survey data
Forecasting seasonals and trends by exponentially weighted moving averages
Forecasting: Principles and Practice
Another look at measures of forecast accuracy
Brasil : ano de referência 2010. IBGE
Sistema ibge de recuperação automática -sidra
Estimating brazilian monthly gdp: a state-space approach
Introduction to dynamic linear models for time series analysis
Modelos bayesianos univariados aplicados à previsão de séries econômicas
Análise de séries temporais: modelos lineares univariados
Crise e perspectivas para a economia brasileira
An r package for dynamic linear models
Dynamic Linear Models with R
Identification of a dynamic linear model for the american gdp
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
A state-space approach to australian gross domestic product measurement
System of National Accounts
Modeling and forecasting kenyan gdp using autoregressive integrated moving average (arima) models
Bayesian Forecasting and Dynamic Models
Forecasting sales by exponentially weighted moving averages