key: cord-0810890-05mogh2l authors: Guan, Ming title: Panel Associations Between Newly Dead, Healed, Recovered, and Confirmed Cases During COVID-19 Pandemic date: 2021-12-11 journal: J Epidemiol Glob Health DOI: 10.1007/s44197-021-00019-z sha: 7bbe469ae6ea57dbc0cc9399dad0c3367f3c73e7 doc_id: 810890 cord_uid: 05mogh2l
BACKGROUND: Knowledge of the associations among newly recovered cases (NR), newly healed cases (NH), newly confirmed cases (NC), and newly dead cases (ND) can help to monitor, evaluate, predict, control, and curb the spread of coronavirus disease 2019 (COVID-19). This study aimed to explore the panel associations of ND, NH, and NR with NC.
METHODS: Data from China Data Lab in Harvard Dataverse for China (January 15, 2020 to January 14, 2021), the United States of America (the USA, January 21, 2020 to April 5, 2021), and the World (January 22, 2020 to March 20, 2021) were analyzed. The main variables included in the present analysis were ND, NH, NR, and NC. Pooled regression, stacked within-transformed linear regression, quantile regression for panel data, random-effects negative binomial regression, and random-effects Poisson regression were conducted to reflect the associations of ND, NH, and NR with NC. Event study analyses were performed to explore how the key events influenced NC.
RESULTS: Descriptive analyses showed that the mean ND/NC ratio for China was higher than those for the USA and the World. The tentative analysis found significant relationships among ND, NH, NR, and NC regarding China, the USA, and the World. Panel regressions confirmed the associations of ND, NH, and NR with NC regarding China, the USA, and the World. The panel event study showed that key events influenced NC more strongly regarding the USA and the World than regarding China.
CONCLUSION: The findings in this study confirmed the panel associations of ND, NH, and NR with NC in the three datasets.
The efficiencies of various COVID-19 control strategies across the globe were compared through the regression outcomes. Future research could explore the mechanisms that drive the panel associations.

Despite travel restrictions [1] and limitations [2], coronavirus disease 2019 (COVID-19) has spread rapidly across the globe, as documented in a large body of literature. For instance, a longitudinal analysis concluded that the impact of COVID-19 could migrate between vulnerable counties [3]. Another theoretical study demonstrated that large-scale spatial transmission of COVID-19 was caused by a relatively high per-capita rate of transmission [4]. To tackle the spread of COVID-19, a growing number of countries initiated practical strategies (in-house isolation, quarantine, and promoting general awareness about transmission routes) against further contagion [5]. Nevertheless, the situation deteriorated rapidly, with an increasing number of newly confirmed cases (NC) [6], especially in western countries. In particular, an empirical law of COVID-19 spread attracted academic attention [7]. Although NC across countries has been reported [8], national gaps in newly recovered cases (NR), newly healed cases (NH), and newly dead cases (ND) have seldom been documented in the academic literature. Statistical analyses with micro and macro data on the COVID-19 pandemic can help evaluate the relevant control interventions. To date, analytical methods for assessing the control efficiency of the COVID-19 pandemic, as reflected in the epidemic evolution of total COVID-19 infections, have been limited and biased. Notably, trend forecasting with publicly available micro epidemiological data has been the mainstream approach in the field of COVID-19 control. For example, multiple studies forecast the trend of COVID-19 spread in China [9] [10] [11].
Moreover, the temporal dynamics of the COVID-19 epidemic were reported for parts of the World including Huangshi city, China [12], South Korea [13], the UK and Sweden [14], Pakistan [15], and Wuhan, China [16]. Survival durations, including the average lag between NC and ND [17], lethal duration [18], and COVID-19 duration [19], were employed to reflect the evolution of the COVID-19 pandemic. However, forecast and trend methods often considered only change over time and neglected the relationships among ND, NH, NR, and NC. Additionally, purely mathematical approaches highlighted prediction errors caused by large uncertainties [20]. Studies that omitted regional, national, and global variables could not obtain accurate findings. To date, the analytical tools in published studies have been limited in their ability to reflect the associations of ND, NH, and NR with NC. For example, a substantial body of time series models and simulations employed temporal factors but not spatial or locational factors [21] [22] [23] [24] [25] [26]. Several simulations reported time trends of ND, NH, NR, and NC but were limited in studying locational differences [27] [28] [29]. Thus, time series studies and simulations led to partial and biased research outcomes. Even more importantly, the panel associations of ND, NH, and NR with NC were not analyzed.

Furthermore, policy interventions were not considered in the current studies. From December 12, 2019 onward, a series of daily policies and regulations were released by the Chinese government, global organizations, and western countries and documented in China Data Lab [30]. With publicly available data on the COVID-19 pandemic for both the USA and Italy, one study observed that future NC, ND, and NR could be reasonably predicted [31]. Thus, policy-driven trends in NC, which indirectly reflect national efforts against the COVID-19 pandemic, were often neglected.
The progress of the COVID-19 crisis was formally characterized by ND, NH, NR, and NC. Thus, this study used publicly available longitudinal datasets to explore the panel associations of ND, NH, and NR with NC. In line with the assumptions of the panel models, pooled regression, stacked within-transformed linear regression, quantile regression for panel data, random-effects negative binomial regression, and random-effects Poisson regression were conducted to reflect the associations of interest regarding China, the USA, and the World. Subsequently, a panel event study was performed to reflect the trends of NC. Finally, epidemic control performance was analyzed, assessed, and compared on the basis of the empirical outcomes.

Daily cases in China included the numbers of NH, NC, and ND at the province level, available from January 15, 2020 to January 14, 2021 [32]. Daily cases in the USA included the numbers of ND and NC at the state level, available from January 21, 2020 to April 5, 2021 [33]. Daily cases in the World (outside Antarctica, China, the USA, and MS Zaandam) included ND, NR, and NC at the country level, available from January 22, 2020 to March 20, 2021 [34]. The dataset of China contained information on 31 province-level units. The dataset of the USA contained information on 51 state-level units. The dataset of the World contained information on 192 countries and regions. The geographical divisions can be found in the Appendix. No data cleaning was performed on the raw data available at Harvard Dataverse.

Before designing statistical strategies, the relationships between NC and ND, between NH and NC, and between NH and ND regarding China, the relationship between ND and NC regarding the USA, and the relationships between NC and ND, between NC and NR, and between ND and NR regarding the World were depicted by front-and-back plots in Figs. 1, 2, 3, 4, 5, 6 and 7 [35]. Due to sparse distribution in Figs.
1, 2, 3 and asymptotic normality in Figs. 4, 5, 6 and 7, several linear and nonlinear panel regression models were considered as potential analytical methods when normality assumptions were violated.

Figure: Relationship between NR and NC regarding the World

Tentative analysis of the relationships among ND, NH, NR, and NC was performed with a one-stop solution for robust inference with multiway clustering (Stata package vcemway) [36]. In the sample, the identification code and the day were the cluster variables of interest. Thus, this study extended ordinary least squares regression to incorporate random effects at the individual level. The following analyses estimated the resulting random-effects model and adjusted its standard errors for two-way clustering on identification code and day. Two-way clustering can lead to more conservative inferences than ordinary least squares standard errors or one-way clustering approaches.

The main associations of interest in this study were the panel associations of ND and NH with NC regarding China, the panel association between ND and NC regarding the USA, and the panel associations of ND and NR with NC regarding the World. In the pooled regression analysis, the regions of China (Central China, Western China, Northeast China, and East China), the USA (New England, Mid-Atlantic Region, the South Region, Mid-West Region, the Southwest Region, and the West Region), and the World (Africa, Asia, Europe, North America, Oceania, and South America) were also considered as covariates. The count data of ND, NH, NR, and NC tended to follow Poisson or negative binomial distributions. In this large sample, their distributions were approximately normal. Accordingly, both linear and nonlinear panel models were feasible for the associations of interest.
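The two-way clustering adjustment described above can be illustrated with a minimal sketch. It assumes a single regressor with an intercept and applies no small-sample correction, so it is not a reproduction of vcemway or of the random-effects model estimated in the study. The function names and the toy numbers are hypothetical. The variance of the slope is assembled as the unit-clustered term plus the day-clustered term minus the term clustered on unit-by-day cells.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Simple OLS of y on x with an intercept; returns slope, centered x, Sxx, residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [xi - mx for xi in x]
    sxx = sum(v * v for v in xc)
    b = sum(v * (yi - my) for v, yi in zip(xc, y)) / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    return b, xc, sxx, resid

def clustered_meat(xc, resid, labels):
    """Sum over clusters of the squared within-cluster total of scores xc_i * e_i."""
    totals = defaultdict(float)
    for v, e, g in zip(xc, resid, labels):
        totals[g] += v * e
    return sum(s * s for s in totals.values())

def twoway_cluster_var(x, y, unit, day):
    """Slope and its two-way cluster-robust variance (assumption: no df correction)."""
    b, xc, sxx, resid = ols_slope(x, y)
    cell = list(zip(unit, day))  # intersection of the two clustering dimensions
    meat = (clustered_meat(xc, resid, unit)
            + clustered_meat(xc, resid, day)
            - clustered_meat(xc, resid, cell))
    return b, meat / sxx ** 2

# Hypothetical toy data: two units observed on three days each.
x = [0, 1, 2, 3, 4, 5]
y = [1.0, 2.9, 5.2, 6.8, 9.1, 11.0]
slope, var_slope = twoway_cluster_var(x, y, unit=[0, 0, 0, 1, 1, 1], day=[0, 1, 2, 0, 1, 2])
```

When every observation forms its own cluster in both dimensions, the three terms coincide and the result reduces to the usual heteroskedasticity-robust variance. In small samples the two-way variance can come out negative, which is one reason dedicated packages apply further adjustments.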
When NR, NH, NC, and ND were treated as count data, random-effects negative binomial regression and random-effects Poisson regression could be employed to reflect the associations of interest in nonlinear models. When ND, NH, NR, and NC were treated as continuous variables, pooled regression, stacked within-transformed linear regression, and quantile regression for panel data could be employed to explore the associations of interest in linear models. Regarding China, the panel associations of ND and NH with NC were described by regression model (1):

$$\mathrm{NC} \sim \beta_0 + \beta_1 \mathrm{NH} + \beta_2 \mathrm{ND} + \mu_1 \tag{1}$$

Regarding the USA, the panel association between ND and NC was described by regression model (2):

$$\mathrm{NC} \sim \beta_0 + \beta_1 \mathrm{ND} + \mu_2 \tag{2}$$

Regarding the World, the panel associations of ND and NR with NC were described by regression model (3):

$$\mathrm{NC} \sim \beta_0 + \beta_1 \mathrm{NR} + \beta_2 \mathrm{ND} + \mu_3 \tag{3}$$

Here, $\beta_0$ was the constant, $\beta_1$ and $\beta_2$ were coefficients, and $\mu_1$, $\mu_2$, and $\mu_3$ were random errors. If the optimization iterations were not concave, the estimates from the affected method were omitted. Because this study aimed to explore the associations of interest rather than to analyze a dynamic system, regressions with squared or cubic terms or interactions were unnecessary.

Pooled regressions are usually carried out to analyze available time series of cross-sections. The main advantage of pooled regression is the ability to measure different factors at the region level and aggregate results at the national level. The main disadvantage of pooled regression is that it can overestimate or underestimate the impact in individual regions.

Stacked within-transformed linear regression analysis was performed by Stata program xtstackreg [37]. Regarding suitability and applicability, stacked within-transformed linear regression accommodated fixed-effects estimation, applied a degrees-of-freedom adjustment, and allowed for factor-variables in dependent variables. When regressing regarding China, the USA, and the World, all region-level units entered into the regressions. After estimation, only some of the geographical covariates remained in the regression outcomes. The main advantage of stacked within-transformed linear regression is the ability to generate predictions from a "stacked" ensemble of models, including LASSO regression, k-nearest neighbors, random forest, and gradient boosting. This technique produces superior estimates with larger samples.

Quantile regression for panel data was performed by Stata program qregpd with Nelder-Mead optimization [38]. Quantile regression for panel data addresses a fundamental problem posed by alternative fixed-effect quantile estimators: inclusion of individual fixed effects alters the interpretation of the estimated coefficient on the treatment variable. Compared with standard mean regression models, quantile regression models are more robust and flexible, which can help to account for unobserved heterogeneity and heterogeneous covariate effects. According to Powell (2015), a quantile regression estimator can be used to evaluate the impacts of exogenous and endogenous treatment variables on an outcome distribution in samples with small T [39]. Simultaneously, random-effects negative binomial regression and random-effects Poisson regression were conducted.

Figure: Relationship between ND and NR regarding the World

This study included panel models for the associations of interest and prediction models for the effects of key events. A panel event study implemented by the program "eventdd" in Stata [40] was employed to analyze how the key events influenced NC. With a difference-in-difference style model, a series of lag and lead coefficients and confidence intervals (CIs) were estimated and plotted. In this context, three key events were adopted as treatments regarding China, the USA, and the World (outside Antarctica, China, the USA, and MS Zaandam), respectively.
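The lead and lag coefficients of the event study can be illustrated with a minimal sketch under strong simplifying assumptions: the panel is balanced, all units share one event day, and no event times are binned. In that special case, the coefficients of a saturated event-study regression with unit fixed effects coincide with the cross-unit mean at each event time minus the mean at the baseline period. The function name, the integer day index, and the toy panel are hypothetical. The sketch does not compute confidence intervals and is not a reimplementation of eventdd.

```python
from collections import defaultdict

def event_time_effects(rows, event_day, baseline=-1):
    """rows: iterable of (unit, day, value) from a balanced panel with one shared event day.

    Returns {event_time: mean(value at event_time) - mean(value at baseline)},
    with the baseline event time itself omitted (its effect is zero by construction).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _unit, day, value in rows:
        k = day - event_day          # event time: negative = lead, non-negative = lag
        sums[k] += value
        counts[k] += 1
    if baseline not in sums:
        raise ValueError("baseline event time is not present in the data")
    means = {k: sums[k] / counts[k] for k in sums}
    return {k: means[k] - means[baseline] for k in sorted(means) if k != baseline}

# Hypothetical toy panel: 3 units, days 0..7, event on day 4, level shift of +5 afterwards.
rows = [(u, d, 10.0 * u + (5.0 if d >= 4 else 0.0)) for u in range(3) for d in range(8)]
effects = event_time_effects(rows, event_day=4)
```

In the toy panel the lead effects are zero and the lag effects equal the level shift, which is the flat-then-jump pattern an event-study plot would show. With unbalanced panels or staggered event dates, the equivalence with simple mean differences no longer holds and a regression is needed.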
On February 5, 2020, China released tax exemption and loan policies to strengthen coronavirus containment (http://en.nhc.gov.cn/2020-02/06/c_76511.htm). The Coronavirus Guidelines for America were issued on March 16, 2020 in the USA (https://www.whitehouse.gov/briefings-statements/coronavirus-guidelines-america/). On March 11, 2020, WHO characterized COVID-19 as a pandemic (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/events-as-they-happen). All analyses were performed with Stata (Versions 14 and 16, Stata Corporation, College Station, TX, USA).

In Table 2, NC was significantly predicted by ND and NH regarding China. Similarly, NC was significantly predicted by ND regarding the USA, and by ND and NR regarding the World. Before conducting random-effects Poisson regression and random-effects negative binomial regression, the 66 negative values of NC (< 0) were treated as missing values. The estimation results presented in Table 4 indicated that ND and regions had significant effects on NC regarding China. The estimation results presented in Table 5 indicated that ND had significant effects on NC in stacked within-transformed linear regression, quantile regression for panel data, random-effects Poisson regression, and random-effects negative binomial regression regarding the USA. Moreover, regions had significant effects on NC in random-effects Poisson regression regarding the USA. The estimation results presented in Table 6 indicated that ND and NR had significant effects on NC in stacked within-transformed linear regression and random-effects Poisson regression regarding the World. Moreover, regions had significant effects on NC in random-effects Poisson regression regarding the World.
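The preprocessing step applied before the count models, in which negative NC values were treated as missing, can be sketched as follows. The variance-to-mean ratio is an added illustration of a common way to judge whether Poisson or negative binomial regression is the more plausible choice; the source does not report such a check. The function names and sample values are hypothetical.

```python
from statistics import mean, variance

def clean_counts(values):
    """Replace negative daily counts with None so they are treated as missing."""
    return [v if v is not None and v >= 0 else None for v in values]

def dispersion_ratio(values):
    """Sample variance divided by the mean of the non-missing values.

    A ratio near 1 is consistent with a Poisson model; a ratio well above 1
    indicates overdispersion, where negative binomial regression is often preferred.
    Assumes at least two non-missing values and a non-zero mean.
    """
    observed = [v for v in values if v is not None]
    return variance(observed) / mean(observed)

# Hypothetical daily NC series with two negative corrections.
nc = [3, -2, 0, 10, 7, -1, 4]
nc_clean = clean_counts(nc)
ratio = dispersion_ratio(nc_clean)
```

Keeping the missing entries in place, rather than dropping them, preserves the alignment between each observation and its unit and day in the panel.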
Figure 10a

This study employed publicly available daily datasets, including the samples of China, the USA, and the World (outside Antarctica, China, the USA, and MS Zaandam), and obtained the associations of ND, NR, and NH with NC regarding China, the USA, and the World, respectively. In the panel event study, the curved lines showed that key events significantly influenced NC regarding the USA and the World, while the straight line showed that key events had almost no significant influence on NC regarding China. Congruent with a prior study [41], this study confirmed the effects of control measures. The regression outcomes provided coarse estimates for comparing the control performance of the COVID-19 pandemic. This study was in line with early simulation outcomes, which found that NH rates were approximately linear increasing functions and ND rates were small constants [42]. This could be partially explained by an early study which indicated that socio-economic determinants and city sizes had high impacts on the change of COVID-19 transmission in China [43]. Because the mean NH/NC ratio for China exceeded the mean NR/NC ratio for the World, and the mean NH/ND ratio for China exceeded the mean NR/ND ratio for the World, the practical performance of COVID-19 control in China was seemingly better than that in other countries. Some Chinese scholars agreed with this judgment [44, 45].

With regard to methodologies, the findings of the panel event study were in line with prior studies. For example, an exploratory data analysis with visualizations was made to understand the numbers of NR, NC, and ND in China [46]. An 82-day (January 21 to April 12, 2020) forecast of COVID-19 infections and deaths placed the COVID-19 peak in the USA around July 14, 2020 [47]. This study was in line with another study which revealed that the effect of NC on ND was heterogeneous across provinces in China [48].
Furthermore, up to February 5, 2020, the number of NC showed a trend of "rapid increase before slowing down" [49]. Another forecast showed that the cumulative number of cases for Italy, the UK, and the USA corresponded to a diminishing average daily rate from April 22 to May 22, 2020 [50].

Changes in COVID-19 ND, NH, NR, and NC in various regions could be influenced by lifestyle, environmental factors, regulations, and stages of progression. Regarding lifestyle, changes in social distancing [51], increases in space-time clusters [52], and different sets of neighborhood characteristics [53] could be identified as risk factors for ND and NC during the COVID-19 pandemic. As to environmental factors, a study indicated that temperature and the columnar density of total atmospheric ozone had a strong association with the tendency of COVID-19 spread in almost all states in the USA [54]. As for regulations, which mainly included mobility restrictions and other non-pharmacological interventions, ill-prepared work [55], facemask shortage [56], poor traveller screening [57], forgone care [58], and population migration [59] could lead to ineffective prevention and control of COVID-19. Regarding stages of progression, changes in COVID-19 ND, NH, NR, and NC might arise because the epidemic progressed differently in various countries. Previous studies documented four phases of the COVID-19 epidemic in the 61 most affected countries [60], three or four phases in Wuhan City, Hubei Province and China [61], and five stages in China's non-Hubei provinces [62].

The point estimates showed small curves regarding China and a wide range of trajectories regarding the USA. This could be partially explained by several studies.
For example, a study showed rapid nucleation and diffusion in January 2020 followed by a rapid NC decrease in February in China, while the USA showed a wide range of trajectories, with an abrupt transition from slow NC increase in January and February to rapid geographic dispersion shortly before mobility reductions occurred in March [63]. Regarding the epidemic trends of national and state regional administrative units, a study covering July 27, 2020 to January 22, 2021 indicated that the turning point of the early epidemic in the USA was predicted to occur in September [64]. Another model inferred that the inflection point of the epidemic across China would be mid-February and that the end of the epidemic would be in late March [65].

Regarding data sources, this study employed three datasets. The large sample size increased the precision of the study. Additionally, the study period of more than 1 year could provide reliable results regarding epidemic control and daily changes in the prevalence of COVID-19 conditions. Regarding statistical methods, this study adopted several advanced panel regression methods. In particular, the event study with difference-in-difference was used to analyze the role of key events. Compared with other studies [66] [67] [68] [69], the results from this study were more accurate, realistic, appropriate, and suitable for long time series of outbreak data. Another advantage of this study was its consideration of key events.

There were several limitations. First, several variables, including demographics, financial support, and international aid, were not taken into account. For instance, a study in South Korea found that sex, region, and infection reasons affected both NR and ND [13]. Second, this study did not adopt newly designed methods conceived by the author to analyze the law of spread and transmission of COVID-19.
In China, changes in case definitions, which allowed detection of more cases as knowledge increased, affected inferences on the transmission dynamics of COVID-19 [70]. Finally, this study considered only one key event per dataset rather than varying treatment times and durations [71].

Using panel analysis and data collected from China's province-level units, the USA's state-level units, and the World's country-level units (outside Antarctica, China, the USA, and MS Zaandam), the regressions confirmed the positive panel associations of NH and ND with NC regarding China, of ND with NC regarding the USA, and of NR and ND with NC regarding the World. The panel event study showed that key events influenced NC more forcefully and unsteadily regarding the World and the USA than regarding China. Future work building on the current study should address the mechanisms that drive the panel associations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Southwest Region (Arizona, New Mexico, Oklahoma, and Texas), and The West Region (Alaska, Colorado

[1] Impact of international travel and border control measures on the global spread of the novel 2019 coronavirus outbreak
[2] The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak
[3] Spatial and temporal trends in social vulnerability and COVID-19 incidence and death rates in the United States
[4] Prediction of COVID-19 transmission dynamics using a mathematical model considering behavior changes
[5] Practical strategies against the novel coronavirus and COVID-19-the Imminent Global Threat
[6] Adaptation of the national plan for the prevention and fight against pandemic influenza to the 2020 COVID-19 epidemic in France
[7] Comparative infection modeling and control of COVID-19 transmission patterns in China
[8] Prediction of daily COVID-19 cases in European countries using automatic ARIMA model
[9] Data-based analysis, modelling and forecasting of the COVID-19 outbreak
[10] Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
[11] Predicting the epidemic trend of COVID-19 in China and across the world using the machine learning approach
[12] Lockdown contained the spread of 2019 novel coronavirus disease in Huangshi city, China: early epidemiological findings
[13] Data analysis of coronavirus COVID-19 epidemic in South Korea based on recovered and death cases
[14] Inferring UK COVID-19 fatal infection trajectories from daily mortality data: were infections already in decline before the UK lockdowns?
[15] Predictive modeling of COVID-19 death cases in Pakistan
[16] A numerical simulation of fractional order mathematical modeling of COVID-19 disease in case of Wuhan China
[17] The lag between daily reported Covid-19 cases and deaths and its relationship to age
[18] Time-to-death approach in revealing chronicity and severity of COVID-19 across the world
[19] Analyzing the effect of duration on the daily new cases of COVID-19 infections and deaths using bivariate Poisson regression: a marginal conditional approach
[20] On the uncertainty of real-time predictions of epidemic growths: a COVID-19 case study for China and Italy
[21] Short-term forecast in the early stage of the COVID-19 outbreak in Italy. Application of a weighted and cumulative average daily growth rate to an exponential decay model
[22] Machine learning techniques to detect and forecast the daily total COVID-19 infected and deaths cases under different lockdown types [published online ahead of print
[23] Spatio-temporal estimation of the daily cases of COVID-19 in worldwide using random forest machine learning algorithm
[24] COVID-19: short term prediction model using daily incidence data
[25] Modeling and forecasting number of confirmed and death caused COVID-19 in IRAN: a comparison of time series forecasting methods
[26] Time series forecasting of Covid-19 using deep learning models: India-USA comparative case study
[27] Count-valued time series models for COVID-19 daily death dynamics
[28] Fourier analysis using the number of COVID-19 daily deaths in the US
[29] A novel IDEA: the impact of serial interval on a modified-Incidence Decay and Exponential Adjustment (m-IDEA) model for projections of daily COVID-19 cases
[30] China Data Lab. 2020. Policies and regulations
[31] Forecasting COVID-19 confirmed cases, deaths and recoveries: revisiting established time series modeling through novel applications for the USA and Italy
[32] China COVID-19 daily cases with Basemap
[33] US COVID-19 daily cases with Basemap
[34] World COVID-19 daily cases with Basemap
[35] Speaking Stata: front-and-back plots to ease spaghetti and paella problems
[36] vcemway: a one-stop solution for robust inference with multi-way clustering
[37] STACKREG: stata module to perform stacked linear regression analysis to facilitate testing of multiple hypotheses
[38] QREGPD: stata module to perform quantile regression for panel data
[39] Quantile regression with nonadditive fixed effects. RAND Labor and Population Working Paper
[40] EVENTDD: stata module to panel event study models and generate event study plots
[41] Prediction of the confirmed cases and deaths of global COVID-19 using artificial intelligence
[42] Long-term predictions of current confirmed and dead cases of COVID-19 in China by the non-autonomous delayed epidemic models
[43] Impacts of socio-economic determinants, spatial distance and climate factors on the confirmed cases and deaths of COVID-19 in China
[44] Fine-scale space-time cluster detection of COVID-19 in Mainland China using retrospective analysis
[45] Transmission dynamics and control measures of COVID-19 outbreak in China: a modelling study
[46] Analyzing the epidemiological outbreak of COVID-19: a visual exploratory data analysis approach
[47] Forecast and evaluation of COVID-19 spreading in USA with reduced-space Gaussian process regression
[48] Investigating the cases of novel coronavirus disease (COVID-19) in China using dynamic statistical techniques
[49] Spatio-temporal evolution and influencing mechanism of the COVID-19 epidemic in Shandong province
[50] Forecasting the cumulative number of confirmed cases of COVID-19 in Italy, UK and USA using fractional nonlinear grey Bernoulli model
[51] Role of social distancing in tackling COVID-19 during the first wave of pandemic in Nordic region: evidence from daily deaths, infections and needed hospital resources
[52] Daily surveillance of COVID-19 using the prospective space-time scan statistic in the United States
[53] COVID-19 testing, case, and death rates and spatial socio-demographics in New York City: an ecological analysis as of
[54] Analysis of the spread of COVID-19 in the USA with a spatio-temporal multivariate time series model
[55] Preparedness and vulnerability of African countries against importations of COVID-19: a modelling study
[56] Facemask shortage and the novel coronavirus disease (COVID-19) outbreak: reflections on public health measures
[57] Estimated effectiveness of symptom and risk screening to prevent the spread of COVID-19
[58] Reports of forgone medical care among US adults during the initial phase of the COVID-19 pandemic
[59] Population migration, spread of COVID-19, and epidemic prevention and control: empirical evidence from China
[60] Global analysis of daily new COVID-19 cases reveals many static-phase countries including the United States potentially with unstoppable epidemic
[61] Dynamics of the coronavirus disease 2019 (COVID-19) epidemic in Wuhan City, Hubei Province and China: a second derivative analysis of the cumulative daily diagnosed cases during the first 85 days
[62] Spatio-temporal distribution characteristics and influencing factors of COVID-19 in China
[63] Spatiotemporal evolution of COVID-19 infection and detection within night light networks: comparative analysis of USA and China
[64] Exploring the influence of human mobility factors and spread prediction on early COVID-19 in the USA
[65] A simple transmission dynamics model for predicting the evolution of COVID-19 under control measures in China
[66] Forecasting the number of confirmed new cases of COVID-19 in Italy for the period from
[67] Forecasting the dynamics of cumulative COVID-19 cases (confirmed, recovered and deaths) for top-16 countries using statistical machine learning models: auto-regressive integrated moving average (ARIMA) and seasonal auto-regressive integrated moving average (SARIMA)
[68] Prediction of confirmed cases of and deaths caused by COVID-19 Chile through time series techniques: a comparative study
[69] Short-term forecasting of daily confirmed COVID-19 cases in Malaysia using RF-SSA model
[70] Effect of changing case definitions for COVID-19 on the epidemic curve and transmission parameters in mainland China: a modelling study
[71] flexpaneldid: a stata command for causal analysis with varying treatment time and duration. IWH Discussion Papers 5/2019