key: cord-299363-y7o8ovf9
authors: Guliyev, Hasraddin
title: Determining the spatial effects of COVID-19 using the spatial panel data model
date: 2020-04-07
journal: Spat Stat
DOI: 10.1016/j.spasta.2020.100443
sha: 
doc_id: 299363
cord_uid: y7o8ovf9

This study investigates the propagation power and effects of the coronavirus disease 2019 (COVID-19) in light of published data. We examine the factors affecting COVID-19 together with the spatial effects, and use spatial panel data models to determine the relationship among the variables including their spatial effects. Using spatial panel models, we analyse the relationship between confirmed cases of COVID-19, deaths thereof, and recovered cases due to treatment. We accordingly determine and include the spatial effects in this examination after establishing the appropriate model for COVID-19. The most efficient and consistent model is interpreted with direct and indirect spatial effects.

Efforts to understand the pathophysiology of COVID-19 have led the EU to mobilise €10 million for research that would "contribute to more efficient clinical management of patients infected with the virus, as well as public health preparedness and response" ("Coronavirus: EU mobilises €10 million for research," 2020, January 31). Further, companies such as the US-based Co-Diagnostics and Primerdesign, the molecular diagnostics division of Novacyt, have been developing COVID-19 testing kits for research use ("Primerdesign launches molecular test for new coronavirus," 2020, January 31). The UK government has also committed £20 million to support the development of a COVID-19 vaccine ("Coronavirus: UK donates £20m to speed up vaccine," 2020, February 3). Given the nature of the pandemic, COVID-19 has been a subject of intense discussion since the beginning of 2020. As the pandemic spreads exponentially, healthcare enterprises and non-profit organizations have already begun work to counter it.

In this study, we investigate the propagation power and effects of COVID-19 in light of published data. Thus, the factors affecting COVID-19 are examined together with spatial effects, and spatial panel data models are used to determine the relationship among the variables (factors) with spatial effects. Using spatial panel models, we analyse the relationship between the rate of confirmed cases ($R_c$) of COVID-19, the rate of deaths ($R_d$), and the rate of recovered cases ($R_r$) due to treatment, with spatial and temporal effects.

We first estimate a standard linear panel data model without spatial effects. This model serves as a reference for the estimation results of the spatial panel data models and as a check on their robustness (Yang, Chen, Cao, Li, & Li, 2017). The standard linear regression model (SLM) is formulated as follows (Lan, 2012; Tatoglu, 2012):

$$y_{it} = x_{it}'\beta + \mu_i + \varepsilon_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T,$$

where $y_{it}$ is the explained variable, $i$ denotes the individuals, and $N$ is the number of regions ($N = 31$). $T$ is the length of the time series, that is, from 22 January 2020 to 10 March 2020. $x_{it}'$ is the $1 \times K$ vector of observations on the explanatory variables and $\beta$ is the $K \times 1$ vector of unknown coefficients. $\mu_i$ is an individual effect that cannot be directly observed or quantified, and $\varepsilon_{it}$ is a disturbance term that varies across individuals and time. If $\mu_i$ is correlated with $x_{it}$, the panel data model is a fixed effects model; otherwise, it is a random effects model (Fotheringham & Rogerson, 2008).
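As a minimal sketch of the fixed effects case, the within transformation below subtracts each region's time mean from $y_{it}$ and $x_{it}$, which removes $\mu_i$, and then applies OLS to the demeaned data. It is a generic Python illustration on simulated data, not the author's Stata 16 workflow; the function name `within_estimator` and all numbers are hypothetical. Here T = 49 assumes one observation per day from 22 January to 10 March 2020 inclusive.

```python
import numpy as np

def within_estimator(y, X, ids):
    """Fixed effects (within) estimator for y_it = x_it' beta + mu_i + eps_it.

    y   : (n_obs,) stacked outcomes
    X   : (n_obs, K) stacked regressors, without a constant column
    ids : (n_obs,) individual (region) identifiers
    """
    y_dm = np.asarray(y, dtype=float).copy()
    X_dm = np.asarray(X, dtype=float).copy()
    for i in np.unique(ids):
        mask = ids == i
        # Subtracting each region's time mean removes the time-invariant mu_i
        y_dm[mask] -= y_dm[mask].mean()
        X_dm[mask] -= X_dm[mask].mean(axis=0)
    # OLS on the demeaned data gives the within estimate of beta
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

# Simulated balanced panel (hypothetical data): N = 31 regions, T = 49 days
rng = np.random.default_rng(0)
N, T, K = 31, 49, 2
ids = np.repeat(np.arange(N), T)
mu = rng.normal(size=N)                                # individual effects mu_i
X = rng.normal(size=(N * T, K)) + mu[ids][:, None]     # regressors correlated with mu_i
true_beta = np.array([1.5, -0.5])
y = X @ true_beta + mu[ids] + 0.01 * rng.normal(size=N * T)

beta_hat = within_estimator(y, X, ids)
print(beta_hat)   # close to [1.5, -0.5]
```

Because the regressors are deliberately correlated with $\mu_i$, pooled OLS would be biased here, while the within estimator still recovers $\beta$.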

Spatial panel data models include the spatial autoregression model (SAR), spatial error model (SEM), spatial autocorrelation model (SAC), and spatial Durbin model (SDM). These models extend the SLM with spatial effects and are estimated by maximum likelihood. Among them, the SAR model captures the spatial spillover effect of the dependent variable; hence, it includes the spatial lag of the dependent variable and can be expressed as follows:

$$y_{it} = \rho \sum_{j=1}^{N} w_{ij} y_{jt} + x_{it}'\beta + \mu_i + \varepsilon_{it},$$

where $\sum_{j=1}^{N} w_{ij} y_{jt}$ (the $i$-th element of $W y_t$) is the spatial lag of the dependent variable and $w_{ij}$ is the $(i, j)$ element of the row-standardized rook contiguity matrix $W$.

$\rho$ is the spatial autoregression coefficient. If $\rho$ is statistically significant, there is significant spatial dependence in the dependent variable; that is, the rate of confirmed cases in a region depends on the rates in contiguous regions. The magnitude of $\rho$ reflects the degree of spatial dependence (Gelfand, Diggle, Guttorp, & Fuentes, 2010).
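To make the spatial lag and the role of $\rho$ concrete, the sketch below uses a hypothetical four-region chain (not the 31 Chinese provinces) with a row-standardized contiguity matrix. It computes the spatial lag $Wy$ and then generates one period of data from the SAR reduced form $y = (I - \rho W)^{-1}(X\beta + \varepsilon)$. The regions, $\rho$, $\beta$ and the random draws are illustrative assumptions.

```python
import numpy as np

# Hypothetical 4-region chain 0-1-2-3: each region borders its chain neighbours
C = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = C / C.sum(axis=1, keepdims=True)   # row-standardize: each row sums to 1

y_example = np.array([4.0, 2.0, 6.0, 8.0])
spatial_lag = W @ y_example            # average outcome of each region's neighbours
print(spatial_lag)                     # [2. 5. 5. 6.]

# SAR data-generating process for one period: y = rho*W*y + X*beta + eps
rng = np.random.default_rng(1)
rho, beta = 0.4, np.array([2.0])
X = rng.normal(size=(4, 1))
eps = rng.normal(scale=0.1, size=4)
I = np.eye(4)
y = np.linalg.solve(I - rho * W, X @ beta + eps)   # reduced form
```

Solving the linear system rather than inverting $I - \rho W$ explicitly is the usual numerically safer choice.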

The SEM captures the effect of omitted variables on the dependent variable of a province through a spatial error term; that is, it allows the residuals to be spatially autocorrelated. The SEM can be formulated as follows:

$$y_{it} = x_{it}'\beta + \mu_i + \varepsilon_{it}, \qquad \varepsilon_{it} = \lambda \sum_{j=1}^{N} w_{ij}\varepsilon_{jt} + v_{it},$$

where $\lambda W\varepsilon$ is the spatial error term, $\lambda$ is the spatial autoregressive coefficient of the errors, and $v_{it}$ is a random error term that is usually assumed to be independent and identically distributed (i.i.d.). If $\lambda$ is statistically significant, we can confirm the existence of omitted explanatory variables that are themselves spatially autocorrelated, which appears as noticeable spatial autocorrelation in the residuals. The SAC model combines the SAR model and the SEM; it contains both the spatial lag of the dependent variable and a spatial error term, and can be expressed as follows:

$$y_{it} = \rho \sum_{j=1}^{N} w^{(1)}_{ij} y_{jt} + x_{it}'\beta + \mu_i + \varepsilon_{it}, \qquad \varepsilon_{it} = \lambda \sum_{j=1}^{N} w^{(2)}_{ij}\varepsilon_{jt} + v_{it}. \qquad (5)$$

In Eq. (5), depending on the research purpose, the spatial weight matrices (SWMs) $W_1 = (w^{(1)}_{ij})$ and $W_2 = (w^{(2)}_{ij})$ can be the same or different (LeSage, 2008; Yang et al., 2017). In this study, we use the same SWM in both terms, that is, $W_1 = W_2 = W$, and the residual terms are as defined above. The SDM includes the spatial lags of both the dependent variable and the explanatory variables: building on the SAR model, it adds the marginal effects of the explanatory variables in neighbouring regions. The common specification for the SDM is as follows:

$$y_{it} = \rho \sum_{j=1}^{N} w_{ij} y_{jt} + x_{it}'\beta + \sum_{j=1}^{N} w_{ij} x_{jt}'\delta + \mu_i + \varepsilon_{it}, \qquad (6)$$

where $WX\delta$ is the spatial lag of the explanatory variables, $X$ is the $n \times (k - 1)$ matrix of independent variables (excluding the constant), and $\delta$ is the $(k - 1) \times 1$ vector of parameters that measure the marginal effects of the independent variables in nearby observations on the dependent variable $y$.

Elhorst (2010) illustrates the relationships among the spatial panel models described above. We first examine the SLM estimated by ordinary least squares. We start with this model because it is the simplest and most common; although it contains no spatial effects, it is frequently used as a diagnostic tool for model specification and as a benchmark for comparison with the spatial models.
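Writing Eq. (6) in stacked form for the $N$ regions in period $t$ makes both the nesting of the models and the meaning of direct and indirect effects explicit. The derivation below is the standard textbook treatment (in the spirit of LeSage, 2008, and Elhorst, 2010), offered as a sketch rather than a reproduction of the author's derivation; the symbols $S_k$ and $\iota_N$ (an $N \times 1$ vector of ones) are introduced here for illustration only.

```latex
% Stacked over the N regions for period t (fixed effects collected in mu)
y_t = (I_N - \rho W)^{-1}\left(X_t\beta + W X_t\delta + \mu + \varepsilon_t\right)

% Matrix of partial derivatives with respect to the k-th explanatory variable
S_k \equiv \frac{\partial y_t}{\partial x_{k,t}'} = (I_N - \rho W)^{-1}\left(\beta_k I_N + \delta_k W\right)

% Nested special cases
\text{SAR: } \delta = 0, \qquad
\text{SLX: } \rho = 0, \qquad
\text{SEM: } \delta = -\rho\beta \ \text{(common-factor restriction, with } \rho = \lambda\text{)}

% Summary measures, averaged over the N regions
\text{direct}_k = \tfrac{1}{N}\operatorname{tr}(S_k), \qquad
\text{total}_k = \tfrac{1}{N}\,\iota_N' S_k \iota_N, \qquad
\text{indirect}_k = \text{total}_k - \text{direct}_k

% SLX with row-standardized W and zero diagonal: S_k = \beta_k I_N + \delta_k W
\text{direct}_k = \beta_k, \qquad \text{indirect}_k = \delta_k
```

The last line shows why, in a model without a spatially lagged dependent variable, the direct and indirect effects can be read straight off the coefficients of $x$ and $Wx$.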

We also present the SEM, as the interpretation of its coefficients is similar to that of an SLM. The SAR is introduced in section 3. Because of the endogenous spatial dependence in this model, its coefficients are more challenging to interpret. This section also examines the SAC model (Kelejian & Prucha, 1998), which is characteristically close to the SAR model. In section 4, we present two regression models with spatial lags only in the independent variables, the spatially lagged X model (SLX) and the spatial Durbin error model (SDEM), both of which include exogenous spatial dependence. In section 5, we consider the interpretation of coefficients for an SDM, which includes both exogenous and endogenous spatial dependence, making its interpretation more complicated than for the preceding models (Golgher & Voss, 2016).

For this purpose, we use data from 22 January 2020 to 10 March 2020 for the 31 regions of Mainland China. The data are collected from the WHO coronavirus disease 2019 (COVID-19) situation reports. We analyse the relationship between the rate of confirmed cases ($R_c$) of COVID-19, the rate of deaths ($R_d$), and the rate of recovered cases ($R_r$), with spatial and temporal effects. Each rate is calculated by dividing the corresponding variable by the population of the province. Population statistics for each province are collected from the National Bureau of Statistics of China. The rates of confirmed cases, deaths, and recovered cases on 10 March 2020, together with their averages, are presented in Table A.1.

Table A.1 shows that Hubei had the highest rate of confirmed cases (10.714 cases per 100,000 people), followed by Guangdong (0.214 cases per 100,000 people) and Henan (0.201 cases per 100,000 people). The lowest rates of confirmed cases were in Ningxia, Qinghai, and Tibet. These data are up to 10 March 2020. In Hubei, on average, 6.498 out of 100,000 people tested positive for COVID-19, 2.126 out of 100,000 recovered, and 0.241 out of 100,000 died. For Guangdong, these statistics are 0.159, 2.126, and 0.241, respectively.
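A minimal sketch of the rate construction described above, assuming the per-100,000 scale used when reporting Table A.1 (the text does not state the scale used inside the regressions). The function name `rate_per_100k` and the province figures are hypothetical.

```python
def rate_per_100k(count, population):
    """Number of cases (or deaths, or recoveries) per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so that round integer inputs give exact results
    return count * 100_000 / population

# Hypothetical province: 500 confirmed cases among 10,000,000 residents
print(rate_per_100k(500, 10_000_000))   # 5.0
```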

Before fitting spatial panel models, we need a spatial weight matrix (SWM). An SWM characterizes the spatial relationships among the units in a dataset (Fotheringham & Rogerson, 2008; Zeren, 2010). The $W$ used in this research is $31 \times 31$, row-standardized with zero diagonal elements, and constructed from polygon rook contiguity in Stata 16. Formally,

$$w_{ij} = \frac{c_{ij}}{\sum_{j=1}^{N} c_{ij}}, \qquad c_{ij} = \begin{cases} 1 & \text{if } i \neq j \text{ and regions } i \text{ and } j \text{ share a common border segment}, \\ 0 & \text{otherwise}. \end{cases}$$

This matrix is then converted into a format suitable for Stata 16.0 and used in the spatial panel regressions.
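The construction of a row-standardized rook contiguity matrix can be sketched as follows, starting from a list of which regions share a border. This is an illustrative Python equivalent, not the Stata 16 procedure used in the paper; the function name `row_standardized_weights` and the four-region map are hypothetical. Regions with no land neighbour (Hainan, for example, has no land border with another province) need an explicit choice; this sketch assumes their row is left at zero.

```python
import numpy as np

def row_standardized_weights(neighbours, n):
    """Build an n x n row-standardized contiguity matrix with zero diagonal.

    neighbours: dict mapping a region index to the indices it shares a border with
    """
    C = np.zeros((n, n))
    for i, js in neighbours.items():
        for j in js:
            if i != j:
                C[i, j] = 1.0
                C[j, i] = 1.0          # contiguity is symmetric
    row_sums = C.sum(axis=1, keepdims=True)
    # Divide each row by its number of neighbours; rows with no neighbours stay zero
    return np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)

# Hypothetical map: 0 borders 1 and 2, 1 borders 2, region 3 has no neighbours
W = row_standardized_weights({0: [1, 2], 1: [2], 3: []}, n=4)
print(W)
```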

Spatial panel data models can be used to handle the spatial autocorrelation of the dependent variable and to correctly analyse the influencing factors and their spatial spillover effects. Compared with standard linear panel data models, spatial panel data models account for spatial effects such as spatial dependence and spillovers. Further, compared with spatial models built on cross-sectional data, spatial panel data models can capture the individual heterogeneity of spatial units (that is, individual effects) and can more effectively mitigate omitted-variable bias and estimation error (Elhorst, 2014).

Before estimating spatial panel data models, we need to test for cross-sectional dependence. The primary issue when dealing with spatially referenced data is to determine whether spatial dependence exists, that is, whether "nearby" cases are more correlated than distant ones. A flexible way of assessing whether dependence in the cross-section of a panel dataset is spatially related is a spatial specialization of the Pesaran (2004) test for general cross-sectional dependence (Croissant & Millo, 2019; Tatoğlu, 2013). Table 2 reports the cross-sectional dependence tests; we can reject the null hypothesis of cross-sectionally independent errors. This is not surprising given our visual appraisal of the heat map of confirmed COVID-19 cases (Figure 2). Consequently, we require spatial panel models.

The estimation results for the SLM and the six spatial panel data models are shown in Table 2. The parameters of the spatial panel models are estimated using the quasi-maximum likelihood estimator derived by Lee and Yu (2010), and the p-values are calculated using robust standard errors. All spatial panel data models include two-way effects: individual (cross-sectional) and temporal (time) effects. The temporal effects for each spatial panel model are shown in Table A.2.
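The general Pesaran (2004) test referred to above can be sketched as follows. For a balanced panel with pairwise residual correlations $\hat\rho_{ij}$, the statistic is $CD = \sqrt{2T/(N(N-1))}\sum_{i<j}\hat\rho_{ij}$, which is asymptotically standard normal under cross-sectional independence. This sketch implements only the general CD statistic; the spatially restricted variant mentioned in the text, which uses only neighbouring pairs, is not implemented here. The function name `pesaran_cd` and the simulated residuals are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def pesaran_cd(resid):
    """Pesaran (2004) CD test for cross-sectional dependence in a balanced panel.

    resid : (T, N) array of residuals, one column per cross-sectional unit.
    Returns (CD statistic, two-sided p-value); CD is asymptotically N(0, 1)
    under the null of cross-sectional independence.
    """
    T, N = resid.shape
    corr = np.corrcoef(resid, rowvar=False)     # N x N pairwise correlations
    rho_ij = corr[np.triu_indices(N, k=1)]      # pairs with i < j
    cd = sqrt(2.0 * T / (N * (N - 1))) * rho_ij.sum()
    p_value = erfc(abs(cd) / sqrt(2.0))         # equals 2 * (1 - Phi(|CD|))
    return cd, p_value

# Hypothetical residuals driven by a common daily shock (strong dependence)
rng = np.random.default_rng(42)
T, N = 49, 31
common = rng.normal(size=(T, 1))
resid = common + 0.1 * rng.normal(size=(T, N))
cd, p = pesaran_cd(resid)
print(round(cd, 2), p)
```

With a strong common shock, nearly all pairwise correlations are positive and large, so CD is far in the tail and the null of independence is rejected.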
First, we eliminated the SEM (2), SDM (5), and SDEM (6) models because their spatial effects were not statistically significant at the 5% level. We then had to choose among the SAR (1), SAC (3), and SLX (4) models. In the SLX (spatially lagged X) model, the estimated coefficients of the spatially lagged independent variables ($LM_r$ and $LM_d$) were statistically significant at the 5% level. That is, the rates of confirmed COVID-19 cases across Chinese provinces are spatially related. This further suggests that, if our objective is to explore the factors influencing the rate of confirmed cases and their spatial spillover effects, we need to construct spatial panel data models rather than SLMs, which do not consider spatial effects.

The pseudo-$R^2$ (99.16), likelihood ratio statistic (LR-stat) (85972), and Lagrange multiplier (LM) statistic for the common spatial terms (41.853) of the SLX are higher than those of the SAR (1) and SAC (3) models. Its corrected Akaike information criterion (AICc) (-3725.134), which is adjusted for small samples, and its Bayesian information criterion (BIC) (-3725.134) are also lower than those of the SAR and SAC models. The $LM_r$ and $LM_d$ test statistics for the SLX are significant at the 5% level; hence, the spatial effects of the explanatory variables ($LM_r$, $LM_d$) are different from zero. The Hausman test statistic for the SLX is 21.791 (prob<0.001), so the null hypothesis that the random effects estimator is consistent is rejected and the fixed effects SLX is preferred. Consequently, the SLX can be considered a better-fitting spatial panel regression model, and we mainly interpret the influencing factors based on its estimation results in the following analysis.
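The information criteria used in this comparison follow the standard formulas $AIC = 2k - 2\ln L$, $AICc = AIC + 2k(k+1)/(n-k-1)$ and $BIC = k\ln n - 2\ln L$, where lower values are preferred. The sketch below computes them from a log-likelihood; the function name `information_criteria`, the model labels and the log-likelihood values are hypothetical, and $n = 31 \times 49$ assumes a balanced panel of daily observations.

```python
from math import log

def information_criteria(loglik, k, n):
    """Return (AIC, AICc, BIC) for a model with k estimated parameters and n observations.

    Lower values indicate a better trade-off between fit and complexity.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * loglik
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction
    bic = k * log(n) - 2 * loglik
    return aic, aicc, bic

# Hypothetical log-likelihoods for two candidate models fitted to the same sample
n = 31 * 49   # balanced panel: N regions x T days
for name, loglik, k in [("model_a", 1860.0, 4), ("model_b", 1855.0, 3)]:
    aic, aicc, bic = information_criteria(loglik, k, n)
    print(name, round(aicc, 3), round(bic, 3))
```

These criteria are only comparable across models fitted to the same dependent variable and the same observations, and how the fixed effects are counted in $k$ depends on the software, so values from different packages should not be mixed.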

The average direct, indirect, and total effects of the explanatory variables are presented in Table 3. The direct effect is the marginal effect of a one-percent change in an independent variable on the dependent variable of the same unit. The indirect effect is the marginal effect of a one-percent change in an independent variable on the dependent variable of all neighbouring units. The total effect is the sum of both effects. The average direct effects of the rate of deaths and the rate of recovered cases are 32.485 (prob<0.001) and -0.734 (prob<0.001), respectively. This indicates that a one-percent increase in the rate of deaths (for example, in Hubei) leads to a 32% positive change in the rate of confirmed cases (in Hubei), and a one-percent increase in the rate of recovered cases leads to a 0.7% negative change in the rate of confirmed cases (in Hubei). Compared with the average direct effects and the estimated coefficients, the average indirect effects can reflect the actual effect of the influencing factors more comprehensively. The indirect effect of the rate of deaths is 1.663 (prob<0.001), indicating that a one-percent increase in the rate of deaths (in Hubei) leads to a 1.7% positive change in the rate of confirmed cases in the neighbouring regions of Hubei (namely, Henan, Anhui, etc.). However, the indirect effect of the rate of recovered cases is not significant at the 5% level (prob>0.05).

Table A.2 shows the temporal effects of the SLX model. In the first days, the rate of confirmed cases increased only slightly, and the increase was not statistically significant at the 5% level. However, in the SLX model, the increase in the rate of confirmed cases became statistically significant after 3 February 2020, and it has been dramatic since the beginning of March 2020. Accordingly, we estimate that the rate of confirmed cases on 10 March 2020 was 0.1254 cases per 100,000 people higher than on 22 January 2020.
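The direct, indirect and total effects discussed above are commonly computed from $S = (I - \rho W)^{-1}(\beta_k I + \delta_k W)$ as the mean diagonal element, the mean off-diagonal row sum, and the mean row sum. The sketch below assumes this standard summary (the Stata output behind Table 3 is not reproduced); the function name `average_effects`, the four-region chain and all coefficients are hypothetical. Setting $\rho = 0$ gives the SLX case, where the direct effect equals $\beta_k$ and the indirect effect equals $\delta_k$.

```python
import numpy as np

def average_effects(W, beta_k, delta_k=0.0, rho=0.0):
    """Average direct, indirect and total effects of explanatory variable k.

    Uses S = (I - rho*W)^{-1} (beta_k * I + delta_k * W):
    direct = mean of the diagonal of S, total = mean row sum of S,
    indirect = total - direct. rho = 0 gives the SLX; delta_k = 0 gives the SAR.
    """
    n = W.shape[0]
    I = np.eye(n)
    S = np.linalg.solve(I - rho * W, beta_k * I + delta_k * W)
    direct = np.trace(S) / n
    total = S.sum() / n
    return direct, total - direct, total

# Hypothetical row-standardized W for a 4-region chain 0-1-2-3
C = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
W = C / C.sum(axis=1, keepdims=True)

print(average_effects(W, beta_k=2.0, delta_k=0.5))   # SLX: direct = beta_k, indirect = delta_k
print(average_effects(W, beta_k=2.0, rho=0.4))       # SAR: total = beta_k / (1 - rho)
```

In the SAR case the direct effect exceeds $\beta_k$ because of feedback loops through neighbours, which is why raw SAR coefficients should not be read as marginal effects.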

Using spatial panel data for 31 regions in China from 22 January 2020 to 10 March 2020, we investigated the factors influencing COVID-19 (the rates of deaths and recovered cases) and their spatial spillover effects. Before building and comparing the spatial panel data models, we tested for cross-sectional dependence using the Pesaran test and found cross-sectional dependence between the units.

Among the panel data regression models estimated to capture spatial effects, the most efficient and consistent model was selected on the basis of the highest pseudo-$R^2$, LR and LM test statistics and the lowest AICc and BIC values. Based on this comparison, we selected the SLX from the estimated spatial panel data models for interpretation.

In the SLX model, the spatial effects of the dependent and independent variables were examined separately. Specifically, the effects of the independent variables were decomposed into direct, indirect (spatial spillover), and total effects to better identify the actual impacts and spatial interactions of the factors influencing COVID-19.

We thus draw the following conclusions:
• As per the total effect, the rate of deaths has significant positive effects on COVID-19, while the rate of recovered cases has significant negative effects.
• As per the direct effect, the rate of deaths has significant positive effects on COVID-19: a one-percent increase in the rate of deaths leads to a 32% positive change in the rate of confirmed cases. In addition, the rate of recovered cases has significant negative effects on COVID-19: a one-percent increase in the rate of recovered cases leads to a 0.7% negative change in the rate of confirmed cases.
• As per the indirect effect, the rate of deaths has significant positive effects on COVID-19 in neighbouring regions: a one-percent increase in the rate of deaths leads to a 1.7% positive change in the rate of confirmed cases in the neighbouring regions. However, the rate of recovered cases does not have a significant effect on COVID-19 in neighbouring regions.
• The temporal effect analysis shows that the rate of confirmed cases increased day by day. Comparing 22 January 2020 with 10 March 2020, the rate of confirmed cases increased by nearly 0.13 cases per 100,000 people, in other words, 13 cases per 10,000,000 people.

Some limitations of the present study should be noted. We could not model the rate of deaths because of its high proportion of zeros. In addition, the study period is short; future research could examine a larger dataset.

In general, this study provides researchers with information about the effects of the spread of COVID-19. The effects of the spread of the virus have been addressed both spatially and temporally, with the aim of producing information that would be useful to all humanity.

Excerpt of Table A.2 (temporal effects by date):
0.0418** 0.0373** 0.0417** 0.0366** 0.0393** 0.0412** 0.0393**
04/02/2020 0.0516*** 0.0463*** 0.0516*** 0.0455*** 0.0488*** 0.0512*** 0.0487***
05/02/2020 0.0594*** 0.0534*** 0.0594*** 0.0524*** 0.0562*** 0.0589*** 0.0561***
06/02/2020 0.0660*** 0.0593*** 0.0659*** 0.0582*** 0.0624*** 0.0654*** 0.0622***
07/02/2020 0.0716*** 0.0641*** 0.0715*** 0.0631*** 0.0675*** 0.0708*** 0.0674***
08/02/2020 0.0743*** 0.0663*** 0.0743*** 0.0651*** 0.0698*** 0.0731*** 0.0697***
09/02/2020 0.0756*** 0.0669*** 0.0756*** 0.0659*** 0.0705*** 0.0740*** 0.0705***
10/02/2020 0.0759*** 0.0667*** 0.0759*** 0.0655*** 0.0702*** 0.0737*** 0.0702***
11/02/2020 0.0747*** 0.0651*** 0.0747*** 0.0638*** 0.0685*** 0.0719*** 0.0684***
12/02/2020 0.0782*** 0.0684*** 0.0782*** 0.0672*** 0.0720*** 0.0756*** 0.0720***
13/02/2020 0.1051*** 0.0926*** 0.1051*** 0.0911*** 0.0975*** 0.1023*** 0.0975***
14/02/2020 0.1161*** 0.1025*** 0.1161*** 0.1008*** 0.1078*** 0.1131*** 0.1078***
15/02/2020 0.1127*** 0.0987*** 0.1127*** 0.0969*** 0.1037*** 0.1088*** 0.1036***

References

Coronavirus: EU mobilises €10 million for research

Coronavirus: UK donates £20m to speed up vaccine

Panel data econometrics with R

Applied spatial econometrics: raising the bar

Spatial Panel Data Models

The SAGE handbook of spatial analysis: Sage

Handbook of spatial statistics

How to interpret the coefficients of spatial models: Spillovers, direct and indirect effects

A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances

Analysis of cross-sectional data and panel data with application of STATA

Estimation of spatial autoregressive panel data models with fixed effects

An introduction to spatial econometrics

Outbreak of Pneumonia of Unknown Etiology in Wuhan China: the Mystery and the Miracle

General diagnostic tests for cross section dependence in panels

Primerdesign launches molecular test for new coronavirus

İleri panel veri analizi stata uygulamalı

Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan

The spatial characteristics and influencing factors of modal accessibility gaps: A case study for Guangzhou

Mekansal etkileşim analizi