key: cord-0743797-ssl292e9 authors: Kınacı, Harun; Ünsal, Mehmet Güray; Kasap, Reşat title: A close look at 2019 novel coronavirus (COVID 19) infections in Turkey using time series analysis & efficiency analysis date: 2020-12-23 journal: Chaos Solitons Fractals DOI: 10.1016/j.chaos.2020.110583 sha: e909a5b2913cc844ec87205c3157b46d0def36e0 doc_id: 743797 cord_uid: ssl292e9

The 2019 novel coronavirus (COVID 19) infections, first officially recorded in Wuhan, China, have affected almost all countries worldwide, including Turkey. In terms of the number of infected cases, Turkey is one of the most affected countries in the world. Thus, an examination of the pandemic data of Turkey is critical for understanding the shape of the spread of the virus and its effects. In this study, we take a close look at the data of Turkey in terms of the variables commonly used during the pandemic, to set an example for possible future pandemics. Both time series modeling and popular efficiency measurement methods are used to evaluate the data and enrich the results. We believe that the results and discussions are useful and can contribute to the language of numbers for pandemic researchers working on the elimination of possible future pandemics.

medical masks, can help protect people from the 2019 novel coronavirus infections [1]. COVID 19 negatively affects the health systems of many developed and developing countries. Turkey, the United States of America, Spain, Italy, Russia, England, France, Iran, Germany and Brazil are some of the most negatively affected countries in the world. Although Turkey resisted for a long time at the beginning of the epidemic, a long pandemic process began with the detection of the first case on 10th March. Because Turkey is a geographical bridge between Asia and Europe, has many international airports, and has socioeconomic structures based on international relations, the pandemic spread quite fast in the country.
Turkey is the 9th most-affected country according to the data collected on 11th May. Thus, the examination of the pandemic data of Turkey is very important for understanding the shape of the spread of the virus and its effects. In this study, we take a close look at the data of Turkey in terms of variables commonly used during the pandemic, to set an example for possible future pandemics. Furthermore, predictions are obtained for the coming fourteen-day period by using exponential smoothing and Box-Jenkins models in time series analysis. For some of the variables, efficiency is measured by using Stochastic Frontier Analysis (SFA), Data Envelopment Analysis (DEA) and Cross efficiency evaluation (CE) methods. In this way, we attempt to calculate an efficiency value for each day of the pandemic. Efficiency results are obtained by using both the real data and the forecast data. The data set is collected from the website of the Republic of Turkey, Ministry of Health [2].

This study has two purposes. The first is to present a general view of COVID 19 in Turkey and to understand the shape of the spread of the virus and its effects within the framework of the statistical data recorded officially during the pandemic. The second is to show how the selected data for time series modeling and efficiency analysis can be used to serve the purpose of Sections 3 and 4. With these two goals, the study aims to provide inferences that allow different interpretations through different techniques. Furthermore, the aim is to explain the pandemic process to researchers from different perspectives. The rest of the paper is organized as follows: Section 2 includes some comments about the data used in this study; descriptive statistics and graphs are also given in that section. Section 3 deals with the modeling processes in the context of time series analysis.
Section 4 covers the efficiency analysis by using efficiency measurement methods. Finally, Section 5 concludes the study.

In this section, the data of 2019 novel coronavirus (COVID 19) infections are investigated to understand the shape of the spread of the virus and its effects in Turkey. The daily data are collected from the day of the first detected case (10th March) to 10th May. The data overview summarizes and visualizes the current situation as far as possible by using graphs and descriptive statistics of the variables used. The data used in this study can be found on the website of the Republic of Turkey, Ministry of Health [2]. The variables examined in this section are the daily number of tests, the daily number of cases, the daily number of recovered cases, the daily number of deaths, the daily rate of cases, the rate of deaths, the speed of spread, the speed of deaths, the total number of cases, the total number of deaths, the total number of cases in the intensive care unit, the total number of intubated cases, the total number of recovered cases and the total number of remaining cases. In the next sections, some of the important variables used in the paper are also examined in detail. Since many variables are discussed in the overview and analysis parts of the paper, abbreviations are used for them. The list of abbreviations can be found at the end of the article.

Firstly, descriptive statistics are given for each variable to summarize the features and structures of the variables used in the paper. Some of the essential descriptive statistics are given in Table 1. Furthermore, Figs. 1-5 visualize the data in terms of total and daily statistics, rates and speeds. Some details about the remarkable points of the values in Table 1 and Figs. 1-5 are needed. As seen in Table 1 and Fig.
1, the highest daily number of cases, 5138, was recorded on the 32nd day (10th April) of the pandemic in Turkey. It can be defined as a peak point of the pandemic. This period is the most severe phase of the pandemic in Turkey. While the case data behave in this way according to the detection process of the disease, different inferences can be drawn from the death statistics. Depending on the progress and aggravation of the disease in the body, the highest daily number of deaths, 127, appeared on the 40th day (18th April) of the pandemic, which is eight days after 10th April. In Fig. 2, all the indicators related to the total data, such as the total number of cases, the total number of recovered cases and the total number of deaths, naturally tend to increase. As seen in Fig. 3, the daily rate of cases varies most in the first days of the pandemic; it then settles in the interval between 0.7 and 1.3. The high values of the rate of cases occur between the 16th and 37th days. Interestingly, the rate of death reaches high values in two different periods during the pandemic: between the 11th and 15th days, and after the 25th day. In Fig. 4, the speed values of the number of cases and of deaths are given, respectively. The speed value is the ratio of the value on the relevant day to the value on the previous day. In other words, it is a kind of rate showing the speed or frequency of an event that occurs per unit of time, population, or other standard of comparison. Because the initial data are scarce in the first days, the highest speed values of detected cases and deaths are observed early in the pandemic, on the 5th and 10th days, respectively. This is an indicator about the disease, and it also shows that the pandemic tends to spread rapidly from the early days. In Fig.
5, the data about death statistics can be seen clearly. The total number of deaths appears to increase polynomially. As mentioned above, the highest daily number of deaths is 127, and it appeared on the 40th day (18th April) of the pandemic. A close look at the daily number of deaths shows that it increases until the 40th day of the pandemic and tends to decrease afterwards. In the first days, the variance of the series changes considerably. Then, the speed of deaths takes values between 1 and 1.5, especially after the 11th day of the pandemic. Depending on the course of the disease and the effect of the treatment process, the first recovered cases are detected on the 9th day of the pandemic. The maximum value of the daily number of recovered patients (43498 cases) occurs on the 50th day (28th April) of the pandemic. Especially between the 50th and 53rd days, the number of recovered cases is quite high. Considering the structure of the data in terms of the descriptive statistics in Table 1, it can be said that the daily rate of cases, the speed of cases and the speed of deaths have significantly positively (right) skewed and peaked (leptokurtic) distributions, judging by the skewness and kurtosis statistics and their standard errors (std. error), respectively.

Time series analysis is a branch of statistics that includes techniques for modeling values of a variable that depend on time and for making predictions for the future. Unlike regression analysis, time series analysis may contain lagged values of variables in the model. As with regression models, linear and non-linear forms of time series models are available. In this section, basic time series models and forecasting methods are explained. Additionally, modeling and forecasting applications are discussed by using the COVID 19 data of some variables mentioned in this study.
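Before turning to the models, the speed variable and the skewness check described in the data overview can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions: the series below is hypothetical (not the Ministry of Health data), the text does not state which series the speed ratio was applied to, and the skewness shown is the simple moment-based version, which may differ slightly from the bias-adjusted statistic reported by statistical packages.

```python
def speed(series):
    """Ratio of each day's value to the previous day's value.

    A day whose previous value is zero has no defined ratio and yields None.
    """
    out = []
    for prev, cur in zip(series, series[1:]):
        out.append(cur / prev if prev != 0 else None)
    return out


def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (positive means right-skewed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical cumulative counts, used only to show the calculation.
total_cases = [1, 5, 6, 18, 47, 98, 191]
daily_speed = speed(total_cases)          # one value per day after the first
speed_skew = skewness(daily_speed)        # sign indicates direction of skew
```

With scarce early data the first ratios tend to be large, which is the kind of pattern that produces a right-skewed speed distribution.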
Time series analysis is one of the important tools of applied statistics, used for modeling time series data and forecasting future values. The modeling process includes plotting the data and examining the autocorrelation and partial autocorrelation functions (ACF and PACF); building the model is a task in itself. In general, a time series with sample size n is denoted by $Z_t$, $t = 1, 2, \ldots, n$, where $Z_t$ is the t-th observation over time [3]. The Box-Jenkins ARIMA(p, d, q) modeling process includes model selection, parameter estimation and model diagnostic checking. The model is built on past observation values (p) and past error values (q): p and q are the orders of the autoregressive and the moving average parts, respectively, and d is the order of differencing. The model is denoted ARIMA(p, d, q) and is formulated as

$$\phi(B)(1 - B)^d Z_t = \theta(B) a_t,$$

where B is the backshift operator, $a_t$ is the error term, and $\phi(B)$ and $\theta(B)$ are polynomials of degrees p and q, respectively [3, 4]. The best model is determined by the value of the AIC (Akaike Information Criterion), which is given as below [3]:

$$\mathrm{AIC}(M) = -2 \ln L + 2M.$$

Here, L is the maximized likelihood and M is the number of estimated parameters. In the Box-Jenkins approach to forecasting, the minimum mean square error forecast of $Z_{n+1}$ is given as the conditional expectation [3, 5]:

$$\hat{Z}_n(1) = E(Z_{n+1} \mid Z_n, Z_{n-1}, \ldots).$$

In this section, classical moving averages and exponential smoothing methods for forecasting are considered as well. The exponential smoothing technique averages (smooths) the past values of a series with exponentially decreasing weights [5]. Three variants are considered: simple exponential smoothing has a single level parameter, whereas Brown's exponential smoothing and Holt's exponential smoothing each have level and trend parameters. Here, the best forecasting model is determined by using the Mean Absolute Error (MAE) value.
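The smoothing recursions and the MAE-based choice between models can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the textbook forms of simple and Holt's exponential smoothing, the parameter names `alpha` and `gamma`, the initialisation (first observation for the level, first difference for the trend) and the toy series are all assumptions, and in practice the parameters would be estimated rather than fixed.

```python
def simple_smoothing(z, alpha):
    """One-step-ahead forecasts of z[1:] from simple exponential smoothing."""
    level = z[0]                      # assumed initial level
    fitted = []
    for obs in z[1:]:
        fitted.append(level)          # forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level
    return fitted, level


def holt_smoothing(z, alpha, gamma):
    """One-step-ahead forecasts of z[1:] from Holt's linear-trend smoothing."""
    level, trend = z[0], z[1] - z[0]  # assumed initial level and trend
    fitted = []
    for obs in z[1:]:
        fitted.append(level + trend)
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
    return fitted, level, trend


def holt_forecast(level, trend, horizon):
    """Forecasts 1..horizon steps ahead from the last level and trend."""
    return [level + h * trend for h in range(1, horizon + 1)]


def mae(actual, predicted):
    """Mean Absolute Error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical series with a steady upward trend.
z = [10, 12, 14, 16, 18, 20]
ses_fit, _ = simple_smoothing(z, alpha=0.5)
holt_fit, level, trend = holt_smoothing(z, alpha=0.5, gamma=0.5)

scores = {"simple": mae(z[1:], ses_fit), "holt": mae(z[1:], holt_fit)}
best = min(scores, key=scores.get)            # smallest MAE wins
forecast_14 = holt_forecast(level, trend, 14)  # 14-day-ahead forecasts
```

On a trending series the simple model lags behind the data, so the trend-aware model obtains the smaller MAE and is selected.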
In other words, the model with the smallest MAE is considered the best one [5]. In this section, time series models are fitted to the DNOC, TNOC, DNOD, TNOD, DNOR, TNOR and TNORC series of the COVID 19 data for Turkey, and forecasts are obtained for the next 14 days (until 24th May) of the pandemic. Table 2 presents the models obtained for each series. The model statistics for the models obtained for each series are given in Table 3. Accordingly, it can be said that all models are statistically adequate and are the best models at the 0.01 significance level. Table 4 gives the parameter estimates of the exponential smoothing models obtained from the series. Accordingly, all parameter estimates except the Gamma (trend) parameter of Model 3 are statistically significant at the 0.01 significance level. Table 5 gives the parameter estimates of the ARIMA models obtained from the series. Accordingly, all parameter estimates are statistically significant at the 0.05 significance level. In Table 6, the 14-day prediction values obtained for each COVID 19 series of Turkey are given, based on the models in Tables 4 and 5. The day no column is the number of days since the virus was first detected in Turkey. According to the forecasting results, the daily numbers of deaths and cases are expected to decrease in the short term. Although the daily data decrease, the total numbers of deaths and cases are naturally still expected to increase. This indicates that the intensity of the pandemic decreases in the short term. The total number of recovered cases and intubated cases tend to decrease in the short term, which is a favorable development for the pandemic. Using time series analysis to forecast the pandemic values of Turkey is not a common subject in the literature.
Furthermore, these forecasting values (obtained from both linear and nonlinear forms) are used in the efficiency measurement process in Section 4, which is a novel application. By using these forecasting values and calculating the efficiency scores (with a non-linear model, SFA in Section 4.1), the days are treated as decision-making units in the study, and this approach has never been used in the literature for Turkish data.

The two most popular methods used in the solution of benchmarking and ranking problems are Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA). Both methods draw a production frontier (efficiency frontier) for the production possibility set. While DEA draws the frontier with mathematical modeling methods, SFA draws the frontier with mathematical functions. DEA is a nonparametric method and SFA is a parametric method; because the function used by SFA contains a random error term, the analysis performed with SFA is expected to be more appropriate. Decision-making units (DMUs) on the efficiency frontier obtained with DEA are called efficient. In SFA, no DMU lies on the efficiency frontier; each DMU's efficiency is ranked according to its proximity to this frontier. SFA models contain an error term, and there are some assumptions about the distribution of this error term. The most important of them is that the character of the error term is known before starting the analysis. In other words, the error term has a distribution, and this distribution is assumed to be known in advance. In addition, SFA needs some assumptions about the structure of the data generating process and the production possibility set before implementation. That is, it assumes that the stochastic relationship between the inputs and the outputs produced is known. Since this imposes a cost in application, it can be regarded as a disadvantage relative to DEA. The model proposed by Aigner et al.
[6] and by Meeusen and Van den Broeck [7] at almost the same time is as follows:

$$\ln y_i = x_i \beta + v_i - u_i \qquad (8)$$

where $y_i$ is the output of unit i, $x_i$ is its input vector, $\beta$ is the parameter vector, and v is the error term corresponding to statistical noise, which contains the margin of error caused by all kinds of uncontrollable situations. v and u in Eq. (8) are independent of each other. u is a non-negative random variable; that is, it is the factor that makes the situation stochastic, and it is called the inefficiency effect. In other words, it is a variable that reflects the inefficiency of the unit of interest. If u = 0, the firm is called efficient, and if u > 0, the firm is concluded to be inefficient to a certain extent. Meeusen and Van den Broeck [7] assumed an exponential distribution for u, while Battese and Corra [8] assumed a half-normal distribution. In addition to Eq. (8), the model with constants is given in Eq. (9):

$$y_i = \exp(x_i \beta + v_i - u_i) \qquad (9)$$

Eq. (9) is called the stochastic frontier production function. With this function, the output values are enveloped from the top by the stochastic variable $e^{x_i \beta + v_i}$. By using the above definitions, the technical efficiency score of a decision-making unit is obtained by Eq. (10):

$$TE_i = \frac{y_i}{\exp(x_i \beta + v_i)} = \exp(-u_i) \qquad (10)$$

Another name for this efficiency score is the Farrell measure. This is an output-oriented measure. After the parameters are estimated, the technical efficiency is estimated by Eq. (11):

$$\widehat{TE}_i = E\left[\exp(-u_i) \mid e_i\right] \qquad (11)$$

where e is defined as e = v − u.

Data Envelopment Analysis (DEA) is a nonparametric performance measurement technique commonly used in various branches of science to measure the efficiencies of DMUs and classify them as efficient or inefficient. In the efficiency measurement process, some variables are considered as inputs and others as outputs. One of the classical DEA models, called CCR, can be given by the following mathematical programming model, in which $\mu_r$ and $w_i$ denote the weights of output r and input i, and $y_{rj}$ and $x_{ij}$ the outputs and inputs of $DMU_j$:

$$\max \; \theta_p = \sum_{r} \mu_r y_{rp} \quad \text{s.t.} \quad \sum_{i} w_i x_{ip} = 1, \quad \sum_{r} \mu_r y_{rj} - \sum_{i} w_i x_{ij} \le 0 \;\; (j = 1, \ldots, n), \quad \mu_r, w_i \ge 0.$$

In the CCR model, if the optimum value of $\theta_p$ equals 1 ($\theta_p^* = 1$), $DMU_p$ is defined as an efficient unit; otherwise, it is referred to as an inefficient unit.
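For intuition, the CCR score can be computed without a linear programming solver in the special case used later in this study, one output and two inputs. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it relies on the ratio form of the CCR model, requires strictly positive inputs and outputs, uses hypothetical data, and normalises the two input weights to (t, 1 − t). It also averages the scores each DMU receives under every DMU's chosen weights, which is the cross-efficiency idea described in the next paragraph; since optimal weights need not be unique, those averages depend on which optimum is picked. A general multi-output problem would need an LP solver.

```python
def envelope(t, ratios):
    """Smallest weighted input per unit of output over all DMUs, weights (t, 1 - t)."""
    return min(t * a + (1 - t) * b for a, b in ratios)


def ccr_scores(inputs1, inputs2, outputs):
    """CCR efficiency of each DMU for one output and two inputs.

    Returns (scores, weights); weights[j] is the normalised weight t on the
    first input chosen by DMU j (the second input gets 1 - t).
    """
    ratios = [(x1 / y, x2 / y) for x1, x2, y in zip(inputs1, inputs2, outputs)]
    # The optimum lies at t = 0, t = 1, or where two DMU lines cross.
    candidates = {0.0, 1.0}
    for a_j, b_j in ratios:
        for a_k, b_k in ratios:
            denom = (a_j - b_j) - (a_k - b_k)
            if denom != 0:
                t = (b_k - b_j) / denom
                if 0.0 < t < 1.0:
                    candidates.add(t)
    ordered = sorted(candidates)
    scores, weights = [], []
    for a_p, b_p in ratios:
        best_t = max(
            ordered,
            key=lambda t: envelope(t, ratios) / (t * a_p + (1 - t) * b_p),
        )
        scores.append(envelope(best_t, ratios) / (best_t * a_p + (1 - best_t) * b_p))
        weights.append(best_t)
    return scores, weights


def cross_efficiency(inputs1, inputs2, outputs):
    """Average score of each DMU under the weights chosen by every DMU."""
    _, weights = ccr_scores(inputs1, inputs2, outputs)
    ratios = [(x1 / y, x2 / y) for x1, x2, y in zip(inputs1, inputs2, outputs)]
    result = []
    for a_p, b_p in ratios:
        row = [envelope(t, ratios) / (t * a_p + (1 - t) * b_p) for t in weights]
        result.append(sum(row) / len(row))
    return result


# Hypothetical data: four days, two inputs, one output.
deaths = [1, 2, 4, 4]
cases = [4, 2, 1, 4]
recovered = [1, 1, 1, 1]
ccr, chosen_weights = ccr_scores(deaths, cases, recovered)
ce = cross_efficiency(deaths, cases, recovered)
```

In this toy data the first three units lie on the frontier and the fourth uses twice the inputs of the second unit for the same output, so its CCR score is one half.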
To overcome some of the drawbacks of classical DEA models, such as weak discrimination power and the assignment of zero weights, the cross efficiency evaluation method (CE) was developed by Sexton et al. [10]. CE provides an alternative and effective efficiency measurement and ranking. Its idea can be described in two stages. Firstly, the optimal weight values are calculated for each DMU by using the above-mentioned CCR model. Secondly, the cross-efficiency score of $DMU_p$ ($\theta_{p,j}$) is calculated as below [10]:

$$\theta_{p,j} = \frac{\sum_{r} \mu^*_{rj} y_{rp}}{\sum_{i} w^*_{ij} x_{ip}},$$

where $\mu^*_{rj}$ and $w^*_{ij}$ are the optimal output and input weights obtained for $DMU_j$, and $y_{rp}$ and $x_{ip}$ are the outputs and inputs of $DMU_p$.

The first COVID 19 cases were seen in Turkey on 10th March, 2020. The first death from COVID 19 occurred on 16th March, 2020, six days after the first case. The first recoveries (42 cases) were reported on 26th March, 2020, sixteen days after the first case appeared. Since then, the number of new cases, the number of deaths, and the number of recovered cases have continued to increase. In this study, efficiency analysis is performed for two different situations (excluding the forecasting values and including the forecasting values) with three different methods (SFA, DEA, and CE). Firstly, real data between 26th March and 10th May are used. Secondly, data from 26th March to 24th May are used; the data between 11th May and 24th May are forecast data. Thus, by taking the values obtained by time series modeling into account, COVID 19 performance is also measured for the coming days. The variables used in the efficiency analysis are the daily number of recovered cases (output), the daily number of deaths (input 1) and the daily number of cases (input 2). The efficiency scores obtained for the first situation by the three methods are given in Table 7. The graph of these scores is shown in Fig. 6. According to the SFA results, the highest performance occurred between the 31st and 33rd days.
According to the results of DEA, the best performing days are the 55th and 56th days, while CE places this interval between the 54th and 58th days. Briefly, the general situation is troubled in the first days of the epidemic but improves in the following days. The values given in parentheses in Table 7 are the ranks of the efficiency scores. Rank (1) denotes the day of the best performance. According to the results, the efficiency scores are not good in the first days, but then the scores gradually increase. This is more evident in Fig. 6. As the days progress, an increase in efficiency scores is detected. This is an indicator of the measures taken by the scientific committee of the Ministry of Health and of the increasing awareness of people. The efficiency scores obtained for the second situation are given in Table 8, and the graph of these scores is shown in Fig. 7, where the days corresponding to the forecasting days are marked with *. Similar results are obtained in Fig. 7 as well. Based on the three methods, the efficiency scores gradually increase. The results of SFA show that the efficiency scores increase step by step and that the recovery process continues in the coming short term. In particular, after the 37th day, SFA scores tend to be high. Both DEA and CE show similar results. It is notable that the 56th, 73rd*, 74th* and 76th* days have the best performances according to the DEA results. Briefly, the general situation is troubled in the first days of the epidemic but improves in the following days. The reason can be considered to be that the practices and the strict measures taken so far have worked. Turkey is expected to gradually increase its performance in the coming days if it continues to follow this policy. Spearman's rank correlations are also obtained to evaluate the consistency of the models used and are given in Table 9.
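As an illustration of how such a rank correlation between two sets of efficiency scores can be computed, the following sketch ranks each set (tied values share their average rank) and then takes the Pearson correlation of the ranks. The score lists are hypothetical and the function names are chosen here for illustration; statistical packages provide the same calculation together with a significance test.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical efficiency scores of five days under two methods.
scores_a = [0.42, 0.55, 0.61, 0.80, 0.93]
scores_b = [0.30, 0.65, 0.60, 0.85, 1.00]
rho = spearman(scores_a, scores_b)
```

A value close to 1 means the two methods rank the days in nearly the same order, even if the scores themselves differ in level.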
The correlation coefficients obtained are statistically significant (α = 0.05). In the first situation, the correlation coefficients between SFA and DEA, and between SFA and CE, are 0.6010 and 0.5910, respectively. Consequently, in both situations, SFA has higher correlations compared to DEA and CE. This section thus compares the efficiency measurement methods: the scores and rankings of each method (the linear approaches DEA and CE and the non-linear approach SFA) and the correlations between them are given in Tables 7-9.

In terms of the number of infected cases, Turkey is one of the countries most affected by COVID 19 in the world. Turkey is the 9th most-affected country according to data collected on 11th May. Thus, the analysis of the data belonging to Turkey is important for understanding the pandemic process. The results and discussions can be a beneficial indicator and a useful guide for researchers who would like to work on the handling of possible future pandemics. One of the purposes of this study is to present a general view of COVID 19 in Turkey and to understand the shape of the spread of the virus and its effects within the framework of the statistical data recorded officially during the pandemic. The highest daily number of cases, 5138, was recorded on the 32nd day (10th April) of the pandemic in Turkey. It can be defined as a peak point of the pandemic. This period is the most severe phase of the pandemic in Turkey. The highest daily number of deaths, 127, appeared on the 40th day (18th April) of the pandemic, which is eight days after 10th April. The maximum value of the daily number of recovered patients (43498 cases) occurred on the 50th day (28th April) of the pandemic. Especially between the 50th and 53rd days, the number of recovered cases is quite high.
All these results indicate that the negative process is slowing down day by day and that, after the midpoint of the observed period, the situation has started to improve. Considering the structure of the data in terms of the descriptive statistics, it can be seen that the daily rate of cases, the speed of cases and the speed of deaths have significantly positively (right) skewed and peaked (leptokurtic) distributions, judging by the skewness and kurtosis statistics and their standard errors (std. error), respectively. The second purpose of the study is to show how the selected data for time series modeling and efficiency analysis can be used to provide results for policy makers. The forecasting results of the time series analysis indicate that the daily numbers of deaths and cases are expected to decrease in the short term. The total numbers of deaths and cases are expected to increase slowly. This indicates that the intensity of the pandemic decreases in the short term. The total number of recovered cases and intubated cases tend to decrease in the short term, which is a favorable development for the pandemic. According to the results of the efficiency analysis, the pandemic situation is not good in the first days of the pandemic, but then the scores gradually increase. As the days progress, an increase in efficiency scores is detected. According to the results of the second scenario of the efficiency measures (including the forecasting values), this normalization and recovery process is expected to continue in the coming short term as well.

There is no conflict of interest, and we confirm that the manuscript has been read and approved for submission.
Time series analysis: forecasting and control
Cases of residual types in diagnostic checking for ARMA model
Introduction to time series analysis and forecasting
Formulation and estimation of stochastic frontier production function models
Efficiency estimation from Cobb-Douglas production functions with composed error
Estimation of a production frontier model: with application to Pastoral Zone of Eastern Australia
Measuring the efficiency of decision making units
Measuring efficiency: an assessment of data envelopment analysis