key: cord-0858964-0gq2x34a
authors: Chaubal, R.; Kannan, S.; Khattry, N.; Gupta, S.
title: WORLDWIDE CASE FATALITY RATIO OF COVID-19 OVER TIME
date: 2020-10-06
DOI: 10.1101/2020.10.04.20206599
sha: 016abd5ef58e02d5c8dc54bff354bf1c140e9aec
doc_id: 858964
cord_uid: 0gq2x34a

ABSTRACT

Background: The case fatality ratio (CFR) of coronavirus disease 2019 (COVID-19) has been reported to vary among countries and regions, but few analyses have tracked this ratio worldwide over time.

Methods: The primary objective was to assess the time-course evolution of the CFR of COVID-19 in all countries with available data; the secondary objective was to evaluate associations between country-wise CFR and country-level health, human development, demographic and economic parameters. Day-wise data on COVID-19 cases and deaths for each country were extracted from a public repository, and countries with at least 1000 cases on the cut-off date were clustered by unsupervised k-means on the basis of deaths per 100000 population (DP100K). Day-wise CFR (cumulative deaths divided by cumulative cases, multiplied by 100) for each country and cluster (country group) was plotted as a time series, and country-level parameters were tested for association with CFR using weighted multiple linear regression.

Results: On September 24, 2020 there were 32140504 cumulative COVID-19 cases and 981792 deaths reported from 184 countries, for a worldwide CFR of 3.06% (95% CI 3.05-3.07). Unsupervised k-means clustering of the 157 countries with at least 1000 reported cases produced Clusters (country groups) A, B, C, D and E, with centroid DP100K and CFR of 0.100 and 2.51 (95% CI 2.42-2.61), 0.503 and 2.28 (95% CI 2.23-2.33), 1.816 and 1.73 (95% CI 1.71-1.75), 7.395 and 1.76 (95% CI 1.75-1.76), and 36.303 and 3.82 (95% CI 3.82-3.83), respectively. In a log-log analysis, DP100K and CFR were significantly positively correlated (R=0.3570, p<0.001).
All country groups and the majority of included countries showed a pattern of gradually increasing CFR from the beginning of the pandemic, followed by a plateau and then a steady decline. Among 10 country-level parameters, GDP per capita (β=-0.483, p<0.001), hospital beds per population (β=-0.372, p<0.001), mortality from air pollution (β=-0.487, p=0.003) and population density (β=-0.570, p<0.001) were significantly negatively associated with CFR, while maternal mortality ratio (β=0.431, p<0.001) and age (β=0.635, p<0.001) were positively associated.

Conclusions: The CFR of COVID-19 has gradually increased over time in the majority of countries, at various stages of the pandemic, followed by a plateau and a steady decline. Population-level COVID-19 mortality burden and CFR are significantly positively associated with each other.

The novel coronavirus SARS-CoV-2, the causative agent of coronavirus disease 2019 (COVID-19), has caused a worldwide pandemic of massive proportions with a large number of fatalities. There is no proven vaccine to prevent this infection and, as yet, no proven treatment with high efficacy. It is also uncertain whether infected individuals will develop long-lasting immunity. SARS-CoV-2 causes severe disease, manifested primarily by pneumonia and respiratory failure, in a small proportion of cases, and this is the primary cause of mortality. Because of its high person-to-person transmission, SARS-CoV-2 has overwhelmed the healthcare systems of many countries, including several high-income countries. From a public health perspective, it is important to have accurate estimates of the case fatality ratio of an epidemic disease. Reported case fatality ratios of COVID-19 have ranged from 0.5% to almost 10% across countries, with variation reported within countries as well.
The propensity to suffer severe disease and death from COVID-19 has been reported to depend on many factors, including the patient's age and comorbidity status, with older individuals and those with comorbidities having higher mortality [1-7]. Because these and other, as yet unknown, factors affect mortality and are unevenly distributed between countries and regions, it is difficult to compare unadjusted CFR across countries. Within a single country taken as a unit, however, the variability in these factors is likely to be smaller, so it may be relevant and insightful to track the unadjusted CFR within countries over time. Such an analysis could reveal time trends that help inform public health planning, forecasting and resource allocation.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

We performed, and report here, an analysis of the day-wise case fatality ratio of COVID-19 in all countries, using a regularly updated open-source data repository, from the first case reported in that repository until the data cut-off date.

Ethics Declaration: This analysis was performed by academic researchers from a large tertiary cancer centre in India. Institutional ethics committee approval was not sought for this analysis, as per institutional guidelines, because it did not involve collection or handling of individual patient data. All the raw data used in this analysis are available as supplementary files.

The primary objective was to analyse the time-course evolution of COVID-19 CFR in individual countries, using a 24-hour day as the unit of time. The secondary objective was to correlate country-wise CFR with 10 country-level health, human development, demographic and economic parameters (listed below under 'Data Extraction') to identify any associations.
Countries whose time-sequential COVID-19 data (cumulative cases, cumulative deaths and cumulative recoveries) were reported in the COVID-19 GitHub Repository for the COVID-19 dashboard maintained by the Center for Systems Science and Engineering at Johns Hopkins University (CSSE-JHU) [8] were included in the analysis. We excluded data from two cruise ships (the Diamond Princess and the MS Zaandam), which reported localized outbreaks of COVID-19. Countries with at least 1000 cumulative confirmed COVID-19 cases on the cut-off date were included in the unsupervised clustering based on COVID-19 deaths per 100000 population (DP100K), in the plots of CFR time trends, and in the weighted multiple linear regression between country-level parameters and CFR. The sources of data in this Repository include the World Health Organization, various national governments and other publicly accessible resources [9]. The analysis included data from the date of first reporting in this Repository until the data cut-off date. Data on COVID-19 cumulative cases and deaths as on the cut-off date were downloaded from the online Repository (Supplementary Table 1).

We calculated COVID-19 deaths per 100000 population (DP100K) for each country using the total COVID-19 deaths on the cut-off date and the country population for the year 2019. DP100K was used to group countries by their level of population-level COVID-19 mortality burden for depicting CFR time trends.
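The DP100K definition above reduces to one line of arithmetic; the sketch below also applies the 1000-case eligibility filter and the base-2 log transform used for plotting and clustering. The record layout (a mapping of country name to `cases`, `deaths` and `population`) and the function names are hypothetical, and skipping countries with zero deaths is an assumption, since log2(0) is undefined and the text does not say how such countries were handled.

```python
import math


def dp100k(total_deaths, population):
    """COVID-19 deaths per 100000 population on the cut-off date."""
    return total_deaths / population * 100_000


def eligible_log2_dp100k(countries, min_cases=1000):
    """Return {country: log2(DP100K)} for countries with >= min_cases cases.

    `countries` is a hypothetical mapping of name -> dict with keys
    'cases', 'deaths' and 'population' (2019 population).
    Assumption: countries with zero deaths are skipped (log2(0) is undefined).
    """
    out = {}
    for name, rec in countries.items():
        if rec["cases"] < min_cases or rec["deaths"] == 0:
            continue
        out[name] = math.log2(dp100k(rec["deaths"], rec["population"]))
    return out
```

For example, a country with 800 deaths in a population of 10 million has a DP100K of 8, i.e. a log2 DP100K of 3.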
For countries with at least 1000 COVID-19 cases on the cut-off date, log (base 2) transformed DP100K (x-axis) was plotted against log (base 2) transformed CFR (y-axis), and a linear regression, with coefficient of correlation, was computed between these two variables. Unsupervised k-means clustering on log DP100K was performed in Tableau, which uses Lloyd's algorithm with squared Euclidean distances to estimate the distances of all points in a cluster from the centroid of that cluster and from the centroids of other clusters. The algorithm computes the sums of between-cluster and within-cluster distances, from which the ratio of the two variances is calculated. At least 25 iterations were performed, for k=1…25; for each value of k, this variance ratio was used to identify the smallest k at which the Calinski-Harabasz index (CH index) reaches its first local maximum. We additionally generated elbow plots to identify an appropriate k manually and to check its agreement with the CH-derived k. The Wilcoxon signed-rank test was used to compare the median CFR of the country groups identified by k-means clustering, and these data were visualised as violin plots. We calculated the CFR for each day for each country from the downloaded data and depicted it date-wise for each DP100K cluster. Outlying data points that skewed the visual representation were either plotted on a secondary y-axis (indicated in the graphs) or removed from the visualization (Supplementary Table 3).
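The two analyses in this paragraph can be sketched in NumPy: the log2-log2 fit between DP100K and CFR, and one-dimensional k-means (Lloyd's algorithm) on log2 DP100K with k chosen at the first local maximum of the Calinski-Harabasz index. This is not Tableau's implementation: the quantile-based initialisation, the iteration cap and all function names are assumptions, and the search starts at k=2 because the CH index is undefined for a single cluster.

```python
import numpy as np


def loglog_fit(dp100k, cfr):
    """Pearson r, slope and intercept of log2(CFR) regressed on log2(DP100K)."""
    x = np.log2(np.asarray(dp100k, dtype=float))
    y = np.log2(np.asarray(cfr, dtype=float))
    r = np.corrcoef(x, y)[0, 1]
    slope, intercept = np.polyfit(x, y, 1)
    return r, slope, intercept


def kmeans_1d(x, k, n_iter=100):
    """Lloyd's algorithm on 1-D data (e.g. log2 DP100K), squared Euclidean distance.

    Assumption: centroids start at evenly spaced quantiles of x; Tableau's
    own initialisation is not described in the text.
    """
    x = np.asarray(x, dtype=float)
    centroids = np.quantile(x, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = np.argmin((x[:, None] - centroids[None, :]) ** 2, axis=1)
        # Update step: centroid = mean of its points (kept if the cluster is empty).
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = np.argmin((x[:, None] - centroids[None, :]) ** 2, axis=1)
    return labels, centroids


def calinski_harabasz(x, labels, centroids):
    """CH index: (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    x = np.asarray(x, dtype=float)
    n, k = len(x), len(centroids)
    grand_mean = x.mean()
    between = sum(np.sum(labels == j) * (centroids[j] - grand_mean) ** 2 for j in range(k))
    within = sum(np.sum((x[labels == j] - centroids[j]) ** 2) for j in range(k))
    return (between / (k - 1)) / (within / (n - k))


def choose_k(x, k_max=25):
    """Smallest k >= 2 at which the CH index reaches its first local maximum."""
    k_hi = min(k_max, len(x) - 1)
    scores = {}
    for k in range(2, k_hi + 1):
        labels, cents = kmeans_1d(x, k)
        scores[k] = calinski_harabasz(x, labels, cents)
    ks = sorted(scores)
    for i, k in enumerate(ks):
        rises = i == 0 or scores[k] > scores[ks[i - 1]]
        falls = i == len(ks) - 1 or scores[k] >= scores[ks[i + 1]]
        if rises and falls:
            return k, scores
    return max(scores, key=scores.get), scores
```

On three well-separated groups of values (for instance 0.0-0.2, 5.0-5.2 and 10.0-10.2), `choose_k` returns k=3, because the CH index jumps sharply at three clusters and falls again at four.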
We estimated the mean change in CFR per day over the 10 most recent calendar days up to and including the cut-off date for the whole world, countries with >/= 1000 cases, country clusters and each country, by subtracting the CFR on day -10 (relative to the cut-off date) from the CFR on the cut-off date and dividing the difference by 10. A two-sided 95% confidence interval for the CFR on the cut-off date was calculated for the whole world, countries with >/= 1000 cases, each country cluster and each country. CFR for countries with >/= 1000 cases on the cut-off date was evaluated for independent association with 10 country-level indices of health, human development, population demographics, economic development and mobility using weighted multiple linear regression. Weighted regression was used because countries with different numbers of cases and deaths contributed unequal amounts of information; each country's weight was the inverse of the squared standard error of its CFR. The multicollinearity assumption was tested using the Variance Inflation Factor (VIF). Independence of residuals and homoscedasticity were checked using the Durbin-Watson test and a residuals plot (scatter plot of standardised residuals versus standardised predicted values), respectively. Variables with high collinearity (VIF > 10) were removed systematically, step by step, until the model included only factors with VIF < 10.
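The day-wise CFR, the 10-day mean change and the cut-off-date confidence interval described above can be sketched as follows. The text does not name its confidence-interval method, so a normal-approximation (Wald) binomial interval is assumed here; its standard error is also returned because its inverse square is the kind of regression weight the text describes. Function names are hypothetical.

```python
import math


def cfr_series(cum_cases, cum_deaths):
    """Day-wise CFR (%) = cumulative deaths / cumulative cases * 100 (None before the first case)."""
    return [100.0 * d / c if c > 0 else None for c, d in zip(cum_cases, cum_deaths)]


def ten_day_mean_change(cfr_by_day):
    """(CFR on cut-off date - CFR on day -10) / 10; the last element is the cut-off date."""
    if len(cfr_by_day) < 11:
        raise ValueError("need at least 11 daily CFR values")
    return (cfr_by_day[-1] - cfr_by_day[-11]) / 10.0


def cfr_with_ci(deaths, cases, z=1.96):
    """CFR (%) with a two-sided 95% Wald (normal-approximation) interval.

    Assumption: the Wald interval is used because the text does not name its
    CI method. Returns (cfr, lower, upper, standard error), all in percent.
    """
    p = deaths / cases
    se = math.sqrt(p * (1.0 - p) / cases)
    lo, hi = max(0.0, p - z * se), min(1.0, p + z * se)
    return 100.0 * p, 100.0 * lo, 100.0 * hi, 100.0 * se
```

A weight for the regression step would then be `1 / se ** 2`; note that a country with zero deaths has a standard error of zero under this formula and would need separate handling.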
The data cut-off date was September 24, 2020 and included all cases and deaths …

Table 1 shows the number of countries, cases, deaths, CFR with 95% CI, DP100K, centroid DP100K, centroid CFR and 10-day mean change for clusters A to E. Supplementary Table 4 shows the CFR with 95% CI and the 10-day mean change in CFR for individual countries in each country group. Figure 3 shows the day-wise time trend of CFR for the whole world, countries with case load >/= 1000 and country groups A to E: in all of these, the case fatality ratio of COVID-19 increased over time from the beginning of the pandemic, followed by a plateau and then a downward trend. The CFR on September 24, 2020 for the whole world, countries with >/= 1000 cases, Group A, Group B, Group C, Group D and Group E were 3.06% (95% CI 3.057-3.07), 1.73 (95% CI …

… The United States has the highest number of cases but a lower CFR than many countries with far fewer cases. This could be due to widespread testing and detection of many more patients with mild or asymptomatic disease, but could also be due to the good-quality healthcare infrastructure in the United States. Supplementary Table 9 shows the independent association of country-level health, human development, demographic, economic development and mobility indices with CFR in countries with case load >/= 1000, using weighted multiple linear regression. Multicollinearity testing using the Variance Inflation Factor (VIF) identified infant mortality rate/100,000 live births as collinear with other indicators, and it was removed from the analysis.
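The VIF screening (removing predictors with VIF > 10 step by step) and the inverse-variance weighted fit can be sketched in NumPy as below. The function names are hypothetical, and several details are assumptions because the text does not specify them: VIF is computed without weights, the predictor with the largest VIF is dropped first, and the coefficients are unstandardised (the standardised β values reported here would require scaling the variables first).

```python
import numpy as np


def vif(X):
    """VIF of each column: 1 / (1 - R^2_j), where R^2_j comes from an OLS
    regression of column j on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / ss_tot
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_collinear(X, names, threshold=10.0):
    """Repeatedly remove the predictor with the largest VIF while it exceeds the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


def weighted_least_squares(X, y, se):
    """Fit y = b0 + X @ b with weights 1 / se**2 (every se must be > 0).

    Returns unstandardised coefficients [b0, b1, ...].
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    sw = 1.0 / np.asarray(se, dtype=float)  # square root of the weight 1/se^2
    coef, *_ = np.linalg.lstsq(X * sw[:, None], np.asarray(y, dtype=float) * sw, rcond=None)
    return coef
```

Scaling each row of the design matrix and the outcome by the square root of its weight turns weighted least squares into an ordinary least-squares problem, which is why a plain `lstsq` call suffices.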
We also excluded smoking prevalence because of missing values for 30 countries. Our final model suggests that GDP per capita (β=-0.483, p<0.001), hospital beds per population (β=-0.372, p<0.001), mortality from air pollution (β=-0.487, p=0.003) and population density (β=-0.570, p<0.001) were significantly negatively associated with CFR, while maternal mortality ratio (β=0.431, p<0.001) and age (β=0.635, p<0.001) were positively associated. Of note, this model accounted for nearly half of the observed variation in CFR (R²=0.46).

Our time-series analysis of open-source data suggests that the case fatality ratio of COVID-19 has increased over the course of the pandemic in most countries and currently stands at 3.06% for the aggregate of world data. This is about twice the upper end of the CFR range (0.4-1.5%) reported by the World Health Organization for the H1N1 influenza pandemic [12]. The high CFR suggests that COVID-19 is a severe disease with important public health ramifications. The CFR has increased in countries with high case loads as well as in those with lower case loads. The gradual increase in CFR followed by a plateau could reflect the long interval (2-8 weeks) [13] before COVID-19-related deaths occur, in countries that have experienced a rapid increase in case load over the past few months. Testing for SARS-CoV-2 has become more widely available and, in many countries, is now being offered to patients with less severe disease.
This would be expected to increase the denominator and bring down the CFR, and may be the reason for the plateau that is becoming apparent in some countries and country groups. As expected, the CFR varies among countries and groups of countries. This is likely because of the uneven distribution of factors such as age and comorbidity status that affect COVID-19-related mortality, and because of variable availability of advanced intensive care infrastructure. It is also possible that unknown host- or pathogen-related factors contribute to the variability in CFR. Because patient-level data were not available to adjust for potential confounding, we have not formally compared the observed CFR statistically between countries or clusters of countries. Our analysis suggests a significant positive correlation between country-level COVID-19 mortality burden, as estimated by deaths per 100000 population (DP100K), and CFR (Supplementary Figure 1.B), but we are unable to determine whether this association is causal or, if so, its predominant direction. CFR could contribute to DP100K; DP100K could contribute to CFR because of overwhelmed health infrastructure; or unknown variable(s) could drive both in the same direction. Other analyses have reported CFRs of COVID-19 ranging from less than 1% to more than 10% [7, 14, 15, 16]. Some of these analyses have reported CFR after accounting for time lag and other adjustments [17, 18, 19, 20].
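The time-lag explanation for a rising naive CFR can be made concrete with a toy simulation. It assumes, hypothetically, that a fixed fraction of each day's new cases dies exactly `lag_days` later and models nothing else. Even with a constant true fatality fraction, the naive cumulative CFR starts at zero, stays low during rapid growth and approaches the true fraction only once case counts level off. The 3% fraction, 21-day lag and growth curve are illustrative choices, not estimates from this analysis.

```python
def naive_cfr_with_lag(new_cases, fatality_fraction, lag_days):
    """Naive cumulative CFR (%) when deaths follow cases by a fixed lag.

    Toy model (assumption): fatality_fraction of the cases reported on day t
    die on day t + lag_days; nothing else is modelled.
    """
    cum_cases = 0.0
    cum_deaths = 0.0
    cfr = []
    for t, cases_today in enumerate(new_cases):
        cum_cases += cases_today
        if t >= lag_days:
            cum_deaths += fatality_fraction * new_cases[t - lag_days]
        cfr.append(100.0 * cum_deaths / cum_cases if cum_cases > 0 else float("nan"))
    return cfr


# Illustrative epidemic: 60 days of 10% daily growth, then a 120-day plateau.
growth = [10 * 1.1 ** t for t in range(60)]
new_cases = growth + [growth[-1]] * 120
cfr = naive_cfr_with_lag(new_cases, fatality_fraction=0.03, lag_days=21)
```

In this setup the naive CFR is well under 1% on day 40, during the growth phase, and is still climbing toward, but below, the assumed 3% at the end of the plateau, because deaths always reflect the smaller case count of three weeks earlier.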
It could be argued that the number of asymptomatic infected individuals is much larger than the current number of known infections, and hence that the CFR of COVID-19 is much lower than the so-called 'naïve' ratio we have reported. While the number of infected individuals may well be larger, and the fatality ratio among the 'universal set' of infected individuals (the so-called infection fatality ratio) lower, we would argue that the estimate of greatest interest from a clinical and health infrastructure perspective is the fraction of deaths among symptomatic individuals. We would further argue that testing asymptomatic individuals in the community in a 'screening' mode, and using that number to calculate CFR, de-emphasizes the actual severity of the clinical disease.

… challenging. Our analysis of CFR as a time trend within countries is robust because demographic and other variables within a country remain stable over the period of this analysis. Our analysis of the countries with the heaviest COVID-19 involvement suggests a pattern of increasing CFR with increasing case load. This is likely due to high stress on acute care infrastructure as cases increase, and countries with a lesser case load at present should anticipate this pattern in the near future. This inference is further strengthened by the significant inverse association of hospital beds per unit population with CFR in the country-level analysis. Our country-level analysis suggests that greater resources (higher GDP per capita and more hospital beds per population) are associated with lower CFR, which underscores the imperative of supporting countries with fewer resources.
Higher age is also independently associated with higher CFR, which could explain the relatively high fatalities in some European countries. An enigmatic inverse association in our analysis is with population density. Many countries with high population density also have fewer resources, and the lower CFR and deaths in these countries, relative to their population sizes, are as yet unexplained.

There are some limitations to our analysis. We did not account for time- or severity-dependent reporting of cases, or for the time lag in outcomes. It is possible that COVID-19-related deaths have been underreported in some parts of the world [21] and that the actual CFR is higher than our estimates. We did not have access to patient-level data, and data of such granularity may never become available for such a large number of cases and deaths. Therefore, we have been unable to provide an 'adjusted' CFR that accounts for the distribution of these variables. The correlation between CFR and health and social parameters could be masked or confounded, because the latter were available only at the country level, with likely intra-country variation.

In summary, our analysis confirms that COVID-19 is a severe disease in a considerable minority of patients, that its CFR has gradually increased in many parts of the world, and that, given the large number of cases, it is likely to result in a high absolute number of deaths unless an effective vaccine or treatment is developed. Our analysis also indicates that the herd immunity argument is likely inappropriate and will possibly result in a high number of fatalities.
Governments and public health authorities should consider the CFR among clinically presenting cases and plan acute care capacity accordingly.
[Figure panels, x-axis from Jan 22. Group C: Australia, Bangladesh, Cameroon, Central African Republic, Congo (Brazzaville), Cuba, Cyprus, Ethiopia, Ghana, Greece, Haiti, Iceland, Syria, Japan, Tunisia, Nicaragua, Venezuela, Guinea-Bissau, Jamaica, Kenya, Latvia, Lesotho, Liberia, Lithuania, Malawi, Mauritania, Nepal, Senegal, Uruguay, Uzbekistan, Yemen, Zambia, Zimbabwe, Gabon, Sudan. Group D: Afghanistan, Albania, Algeria, Austria, Azerbaijan, Bahrain, Belarus, Belize, Bulgaria …]

[Figure, x-axis from Jan 22.] World CFR on 24th September was 3.06%; 43/157 countries report a CFR higher than the world CFR of 3.06%.

Supplementary Table 1: Raw data downloaded from the CSSE-JHU GitHub repository on 25th September 2020 and used for analysis.

… figure 4 to enable better visualization of CFR trends over time.
References:
Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72314 Cases From the Chinese Center for Disease Control and Prevention.
Clinical features of covid-19. BMJ.
Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area.
Clinical and epidemiological features of 36 children … China: an observational cohort study.
Clinical Characteristics of Covid-19 in New York City.
Covid-19 in Critically Ill.
Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship.
Estimates of the severity of coronavirus disease 2019: a model-based analysis.
Estimating case fatality rates of COVID-19.
Estimating case fatality rates of COVID-19.
Estimating case fatality rates of COVID-19.
Real estimates of mortality following COVID-19 infection.
Global coronavirus death toll could be 60% higher than reported. Financial Times.