key: cord-325931-9gqonmf5
authors: Nguimkeu, Pierre; Tadadjeu, Sosson
title: Why is the Number of COVID-19 Cases Lower Than Expected in Sub-Saharan Africa? A Cross-Sectional Analysis of the Role of Demographic and Geographic Factors
date: 2020-10-21
journal: World Dev
DOI: 10.1016/j.worlddev.2020.105251
sha:
doc_id: 325931
cord_uid: 9gqonmf5

Contrary to what the WHO initially predicted, the severity of the novel coronavirus pandemic has remained relatively low in sub-Saharan Africa, more than two months after the first confirmed cases were identified. In this paper, we analyze the extent to which demographic and geographic factors associated with the disease explain this phenomenon. We use publicly available data from a cross-section of 182 countries worldwide, and we employ a regression analysis that accounts for possible misreporting of COVID-19 cases, as well as a Ramsey-type specification that preserves degrees of freedom. We found that the proportion of the population aged 65+, population density, and urbanization are significantly positively associated with high numbers of active infected cases, while mean temperature around the first quarter (January-March) is negatively associated with this COVID-19 outcome. These factors are those for which Africa has a comparative advantage. In contrast, factors for which Africa has a relative disadvantage, such as income and quality of health care infrastructure, are found to be insignificant predictors of the spread of the pandemic. These results hold even when accounting for possible underreporting, as well as differences in the duration of the epidemic in each country, as measured by the time elapsed since the first confirmed case occurred. We conclude that differences in demographic and geographic characteristics help to explain the relatively low progression of the pandemic in sub-Saharan Africa as well as the gap in the number of active cases between this region and the rest of the world.
We also found, however, that this gap is insignificant beyond these factors, and is expected to narrow over time as the pandemic evolves. These results provide insights for relevant urban policies and kinds of development planning to consider in the fight against the spread of diseases of the coronavirus type.

The coronavirus disease 2019 (COVID-19), caused by the novel SARS-CoV-2 virus that emerged in China in late 2019 and has since spread to all regions of the world, is causing major ravages worldwide, as reported by the World Health Organization (WHO). Since the outbreak of this disease, experts have been announcing a drastic surge and dramatic consequences of the epidemic in sub-Saharan Africa (SSA), a region already plagued by poverty and a lack of health infrastructure. However, more than two months after the first confirmed case on the continent, the transmission and severity of the novel coronavirus pandemic have remained relatively low in sub-Saharan Africa, while other regions of the world (such as Europe and the USA) have been more seriously hit. Sub-Saharan Africa is indeed the least affected region, with 28,848 infected cases and 1,112 deaths recorded as of April 27, 2020 (Our World in Data 2020). Understanding why the severity of COVID-19 in sub-Saharan Africa remains comparatively low in spite of the weak health-care system, inadequate surveillance and laboratory capacity, scarcity of public health human resources, and limited financial means (Nkengasong and Mankoula 2020) is therefore a question that merits attention.

Although there is no clear consensus about this seemingly puzzling situation, a number of hypotheses have been posited to try to explain the low numbers observed in sub-Saharan Africa. One argument that emerged was that, since very few African countries have sufficient and appropriate diagnostic capacities, the number of cases is largely underreported.
However, as explained by Dr John Nkengasong, head of the African Center of Disease Control and Prevention, the fact that health facilities are still not overwhelmed by patients may rule out this hypothesis. Other hypotheses build on recent clinical studies (e.g. in China and Italy) that have identified pre-existing noncommunicable diseases such as cerebrovascular diseases (CVD), diabetes, hypertension, and cancer as the main comorbidities associated with infections and deaths (Driggin et al. 2020; Yang et al. 2020). In addition, while it has been documented that dense communities, urban congestion or colder weather may favor the transmission of respiratory diseases such as influenza, measles, tuberculosis and coronaviruses (Alirol et al. 2011; Van de Poel et al. 2012), recent clinical studies show that individuals aged 60 years or older are at higher risk of contamination and death (WHO, 2020).

Against this backdrop, the present study aims to analyze the role of demographic and geographic (DG) factors in explaining the low severity of the epidemic in sub-Saharan Africa (SSA) compared to other regions. We employ a regression analysis that estimates the number of active infected cases, with these DG factors used as explanatory variables. Based on the related literature, the demographic indicators considered are the median age and the proportion of the population aged 65+, whereas the geographic factors include population density, urbanization rate and mean first-quarter temperature. These demographic and geographic factors are found to be significantly associated with the number of active COVID-19 cases: positively for the proportion of the population aged 65+, population density and urbanization, and negatively for mean first-quarter temperature. Given that SSA countries exhibit relatively lower levels of the former factors and higher temperatures compared to the rest of the world, they have a comparative advantage from these perspectives.
In contrast, factors in which sub-Saharan Africa has a considerable disadvantage, such as income and quality of health infrastructure (measured by GDP per capita and health expenditure), turn out to be insignificant predictors of the spread of the epidemic. Measures of epidemiological factors, especially the prevalence of diabetes, are also found to have no significant association with this COVID-19 outcome, possibly for endogenous behavioral reasons that we further discuss in Section 3.

The only source of COVID-19 data available to us for this exercise is a publicly available one whose quality is, unfortunately, very uncertain. Our econometric specification attempts to address this uncertainty by explicitly accounting for possible underreporting in the official number of confirmed cases used, as well as for the lag in the introduction of the disease in each country, as measured by the time elapsed since the first confirmed case was detected. The latter also allows us to capture the learning effect, as countries that experience the epidemic relatively later are likely to learn from successful coping strategies adopted by those that experienced it relatively earlier. To test whether and by how much the estimated effects of the DG factors differ between sub-Saharan Africa and the rest of the world, we employ a Ramsey-type device that preserves degrees of freedom. This consists in assessing whether an interaction between an SSA dummy and these factors helps explain the outcome variable.

Subject to the above-mentioned data quality caveats, our results provide evidence that the relatively low progression of the epidemic in sub-Saharan Africa and the gap observed in the number of active cases compared to the rest of the world can be partly explained by differences in demographic and geographic factors. However, this gap narrows with the duration of the epidemic and is not significant beyond these factors.
These results call for strategies to implement mitigation efforts and containment measures that pertain to the SSA situation, and provide insights on policies and program interventions that could be considered to prevent the spread of diseases of the coronavirus type.

This paper is organized as follows. Section 2 discusses the background and descriptive statistics. Section 3 presents the estimation approach and the results of the regression analysis. Section 4 concludes.

The data consist of a cross-section of 182 countries affected by the coronavirus pandemic. We collated the most recent data available from various sources, including Worldometer Coronavirus (2020), World Development Indicators (WDI), Global Health Observatory (GHO), World Bank Climate Change Knowledge Portal, World Population Prospects (2019) and the Institute for Health Metrics and Evaluation (IHME). Data on COVID-19 spread include the total number of confirmed cases, the total number of deaths and the total number of active cases (which is the total number of confirmed cases net of recoveries and deaths). Figure 1 compares the trends in the average number of active cases in sub-Saharan Africa versus the average of the rest of the world for the first 60 days since the pandemic erupted in the given regions. Overall, this trend has been consistently lower in sub-Saharan Africa compared to the rest of the world, even when adjusting for the lag in the timing of disease occurrence in both regions.

Figure 1 source: Authors' construction from Our World in Data (2020).

Table 1 presents descriptive statistics of the variables of interest in our whole sample. We denote by Duration the variable that accounts for differences in the timing of disease eruption, i.e. the number of days elapsed from the first case of COVID-19 to the observed date.
The first panel summarizes the characteristics of COVID-19, including the total number of cases, the total number of deaths, the total number of active cases, and the duration of the epidemic from our data source. The second panel summarizes the epidemiological factors, including the prevalence of CVD and the prevalence of diabetes (i.e. the proportion of people aged 20-79 that have type 1 or type 2 diabetes). The choice of these variables is based on the WHO (2020) report emphasizing that those with such pre-existing medical conditions (i.e. cerebrovascular disease, diabetes, chronic respiratory disease, and cancer) are at higher risk. While COVID-19 infects people of all ages, Zhou et al. (2020) recently found that older people (e.g. 60+ years) are at higher risk of the disease. We therefore consider the proportion of the population aged 65+ as a fraction of the total population (denoted Pop 65+) and the median age of the population as our main demographic indicators, whose statistics are summarized in the third panel of the table. As for geographic factors, which are given in the fourth panel of the table, we consider three indicators: urbanization (i.e. the proportion of people living in urban areas), population density (i.e. the number of people per square kilometer, km²), and mean temperature (in degrees Celsius) of the first quarter of the year (January-March).

Table 2 presents summary statistics of our variables of interest by region. It shows that compared to all other regions, sub-Saharan Africa has the lowest prevalence of CVD and diabetes, the lowest proportion of population aged 65+, the youngest population (i.e. the lowest median age), one of the lowest urbanization rates (apart from South Asia), one of the lowest population densities (apart from North America), and the highest average temperature around the first quarter of the year. In particular, Pop 65+ and median age in sub-Saharan Africa are 3.3% and 20 years, respectively, compared to 16.5% and 40 years in North America.
This is a huge difference in terms of age-related vulnerability. There is also a huge difference in the average temperature across regions. While SSA experiences an average of 26 °C during the months of January through March, other regions experience much lower temperatures, especially Europe and Central Asia with an average of 1.89 °C, as well as North America with a negative average of -8.92 °C (see Table 2).

Following related literature, we use GDP per capita as a measure of income, and health expenditure per capita as a measure of healthcare infrastructure. Both can be understood as economic indicators. For these factors, sub-Saharan Africa is very disadvantaged, and has the lowest scores along with South Asia. For example, GDP per capita in sub-Saharan Africa is about 20 times lower than in North America. The observed differences in these summary statistics suggest, as will become clearer in the regression analysis below, that there should be important differential effects of the corresponding factors between SSA and the other regions.

This section presents the regression model and the estimation approach, and discusses the associated results. We consider the log total number of active infected cases of COVID-19 as our response variable, which is the total number of cases net of total deaths and recoveries. We use a linear regression model framework to analyze the relationship between the disease outcome and demographic and geographic factors while controlling for epidemiologic, economic and health system infrastructure indicators:

$$ y_i^* = \alpha_0^* + \beta_0^* SSA_i + \alpha_1 D_i + \beta_1 SSA_i D_i + \sum_{k=2}^{K} \alpha_k X_{ki} + \varepsilon_i^* \tag{1} $$

where $y_i^*$ denotes the true COVID-19 outcome variable in country $i$ (e.g., log Active Cases); $SSA_i$ is a binary indicator (dummy variable) for sub-Saharan Africa, which equals 1 if country $i$ is a sub-Saharan African country, and equals 0 otherwise; $D_i$ is the duration of the epidemic in country $i$ (i.e. the number of days elapsed since the first confirmed case was reported in country $i$); $X_i = [X_{2i}, \dots, X_{Ki}]$ is a vector of explanatory variables including epidemiological, demographic, environmental, economic and health infrastructure factors in country $i$; and $K$ is the total number of explanatory variables (excluding the dummy variable $SSA_i$). The error term, $\varepsilon_i^*$, is assumed to be mean zero and captures all other factors driving the outcome that are not accounted for by our model specification.

Unfortunately, the true COVID-19 outcome variable $y_i^*$ is unobserved due to uncertainties in data quality, and we can only observe a possibly misreported surrogate $y_i$ defined as

$$ y_i = y_i^* - u_i \tag{2} $$

where $u_i$ is the measurement error, or the amount of underreporting. If we assume that, for the reasons already evoked, data from sub-Saharan Africa are possibly more underreported than those of other regions, then we can write

$$ u_i = \mu_0 + \delta_0 SSA_i + \nu_i \tag{3} $$

where $\delta_0 \geq 0$ represents the average amount of excess underreported log number of active cases in SSA compared to other regions (whose average amount of underreporting is $\mu_0 \geq 0$), and $\nu_i$ is the residual measurement error that is assumed to be uncorrelated with the regressors $SSA_i$, $D_i$ and $X_i$. We therefore have $y_i = y_i^* - \mu_0 - \delta_0 SSA_i - \nu_i$, and the relationship between the reported cases and the factors of interest to be estimated can be summarized as:

$$ y_i = \alpha_0 + \beta_0 SSA_i + \alpha_1 D_i + \beta_1 SSA_i D_i + \sum_{k=2}^{K} \alpha_k X_{ki} + \varepsilon_i \tag{4} $$

where $y_i$ is the reported (log) number of active cases in country $i$ and $\varepsilon_i = \varepsilon_i^* - \nu_i$. The coefficient $\alpha_0 = \alpha_0^* - \mu_0$ is the intercept and $\beta_0 = \beta_0^* - \delta_0$ is the average difference in outcome between sub-Saharan Africa and the rest of the world, both exacerbated by the amounts of misreporting $\mu_0$ and $\delta_0$, respectively, conditional on the vector of factors $X_i$. The coefficient $\alpha_1$ is the average ceteris paribus effect of the length of the pandemic on the outcome, whereas $\beta_1$ captures how this duration effect varies between sub-Saharan Africa and the rest of the world. These coefficients also capture the "learning" effect, given that countries that experienced the epidemic relatively later may have learned and adopted the most successful strategies adopted by those that faced it earlier. The coefficient $\alpha_k$, $k \geq 2$, is the average ceteris paribus effect of the specific factor $X_k$ on the outcome $y$. Notice that the new error term, $\varepsilon_i$, has an increased variance as it includes both the measurement error component and the original error expected in the regression. Overall, if this model is correctly specified, the outcome of the regression will be an unbiased estimate of $[\alpha_1, \alpha_2, \dots, \alpha_K]$, but with reduced precision in these estimates, lower $t$-statistics and a reduced $R^2$.

Since a focus of this study is on the conditional comparison of the outcome of sub-Saharan Africa with the rest of the world, a useful modification of Model (4) is to interact the sub-Saharan Africa dummy variable, $SSA_i$, with all the explanatory variables of the model, $X_i = [X_{2i}, \dots, X_{Ki}]$, and add these interaction terms as additional regressors to the model to assess heterogeneity in the effects of the initial factors. However, this would significantly reduce the degrees of freedom of the model (which are already low), and thus lead to low power in the significance testing of the coefficients. To overcome this issue, we adopt a device similar in spirit to the Ramsey RESET test, which allows us to assess whether the interaction terms $SSA_i X_{2i}, \dots, SSA_i X_{Ki}$ significantly explain $y_i$, while conserving degrees of freedom. This consists in augmenting the model with the interaction between $SSA_i$ and the OLS fitted values as an additional regressor, to see whether it significantly explains the response variable. Specifically, recall that the fitted values from Equation (4) are defined by

$$ \hat{y}_i = \hat{\alpha}_0 + \hat{\beta}_0 SSA_i + \hat{\alpha}_1 D_i + \hat{\beta}_1 SSA_i D_i + \sum_{k=2}^{K} \hat{\alpha}_k X_{ki} $$

where the components with a "hat" represent the OLS estimates of their underlying quantities. These fitted values are therefore just linear functions of the independent variables. If we interact the fitted values with the dummy variable, $SSA_i$, we get a particular function of the desired interaction terms, $SSA_i D_i$ and $SSA_i X_{2i}, \dots, SSA_i X_{Ki}$. This suggests estimating a model of the form

$$ y_i = a_0 + b_0 SSA_i + a_1 D_i + b_1 SSA_i D_i + b_2 \, SSA_i \hat{y}_i + \sum_{k=2}^{K} a_k X_{ki} + e_i \tag{5} $$

where $\hat{y}_i$ stands for the fitted values obtained above. Because $SSA_i$ is binary, $SSA_i^2 = SSA_i$, so that $SSA_i \hat{y}_i = (\hat{\alpha}_0 + \hat{\beta}_0) SSA_i + (\hat{\alpha}_1 + \hat{\beta}_1) SSA_i D_i + \sum_{k=2}^{K} \hat{\alpha}_k SSA_i X_{ki}$. Model (5) is therefore equivalent to a "reduced-form" regression model of the form

$$ y_i = \theta_0 + \pi_0 SSA_i + \theta_1 D_i + \pi_1 SSA_i D_i + \sum_{k=2}^{K} \theta_k X_{ki} + \sum_{k=2}^{K} \pi_k SSA_i X_{ki} + e_i \tag{6} $$

where the relationship between the two last sets of coefficients is given by

$$ \theta_0 = a_0; \quad \pi_0 = b_0 + b_2 (\hat{\alpha}_0 + \hat{\beta}_0); \quad \theta_1 = a_1; \quad \pi_1 = b_1 + b_2 (\hat{\alpha}_1 + \hat{\beta}_1); \quad \theta_k = a_k; \quad \pi_k = b_2 \hat{\alpha}_k, \quad k \geq 2 \tag{7} $$

The sign and significance of the coefficient $b_2$ can therefore be used to assess how the effects of $X_i = [X_{2i}, \dots, X_{Ki}]$ on the outcome variable differ between sub-Saharan Africa and the rest of the world. Once Equations (4) and (5) have been estimated, the coefficients of the reduced-form model given by Equation (6) can be inferred using the formulas given by Equation (7), and their standard errors can be approximated using the delta method. Notice that the coefficients of the main baseline factors $D_i$ and $X_i$ do not change across specifications; only the coefficients of the interaction terms with $SSA_i$ are likely to change across specifications. Our ultimate targets are those from the reduced-form relationship given by Equation (6), although in this case we are really only interested in computing $\pi_0$ and $\pi_1$, which can be readily obtained from the estimation of Equation (5) using the formulas in Equation (7). In addition, given that $\hat{y}_i$ is an estimate of the expected value of $y_i$ given $SSA_i$, $D_i$ and $X_i$, using Equation (5) to estimate the outcome variable is also useful to correct for potential heteroscedasticity, in case the error variance in Equation (4) is thought to change with $SSA_i$, $D_i$ and $X_i$. In summary, the estimation method proceeds as follows.

Step 1: Estimate the model given by Equation (4) by OLS, and obtain the fitted values $\hat{y}_i$. Compute the interaction terms $SSA_i \hat{y}_i$.

Step 2: Run the regression model given by Equation (5), and use its estimates to assess the average difference in outcome and the differential effect of $X_i$ on the disease outcome between SSA and the rest of the world, conditional on all other factors.

Given the small size of the sample, classic standard errors may not be correctly estimated from the usual asymptotic variance-covariance matrix. We therefore use the bootstrap method as a complementary estimation approach for these standard errors. To account for possible heteroscedasticity in the error terms of Equations (4) and (5), we also compute robust standard errors.

An obvious limitation of this econometric model is its inability to account for endogenous behavioral responses to the disease outcome, which could be substantial. For instance, it is possible that people at higher risk are responding to the disease in a way that could mitigate the initially perceived impact. This issue is especially relevant for epidemiologic factors. Addressing it would require ancillary information about the underlying behavior and possibly a different methodological approach. The empirical results for these factors should therefore be taken with extra caution. In contrast, demographic and geographic factors are less responsive to outbreaks, so that the effects of these DG factors should be the most meaningful estimates in the empirical assessment.

The regression results of the baseline estimates with the log total number of active cases as the main dependent variable are presented in Table 3. The estimation results for this outcome are summarized in Columns (1) through (3). The first column presents the baseline estimates without the SSA regional dummy. The second column presents the estimates that include the SSA dummy variable (corresponding to Equation (4)), while the third column presents the results where this dummy variable is interacted with the fitted values of the regression in Column (2), which corresponds to Equation (5).
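The two-step procedure can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' code: the data are synthetic, the function names (`ols`, `two_step`, `bootstrap_se_b2`, `simulate`) are hypothetical, and the pairs bootstrap shown is one common way to obtain bootstrap standard errors, which may differ from the scheme actually used in the paper.

```python
import numpy as np


def ols(Z, y):
    """Return least-squares coefficients for y = Z @ coef + error."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef


def two_step(y, ssa, dur, X):
    """Estimate Equation (4), then Equation (5), then apply Equation (7).

    y   : (n,) reported log active cases
    ssa : (n,) 0/1 indicator for sub-Saharan Africa
    dur : (n,) days since the first confirmed case
    X   : (n, m) remaining explanatory factors
    """
    ones = np.ones(len(y))

    # Step 1: OLS on Equation (4), then fitted values.
    Z1 = np.column_stack([ones, ssa, dur, ssa * dur, X])
    c1 = ols(Z1, y)
    y_hat = Z1 @ c1

    # Step 2: add one extra regressor, SSA x fitted values (Equation (5)).
    Z2 = np.column_stack([ones, ssa, dur, ssa * dur, ssa * y_hat, X])
    c2 = ols(Z2, y)
    b0, b1, b2 = c2[1], c2[3], c2[4]

    # Equation (7): implied reduced-form coefficients on the SSA terms.
    pi0 = b0 + b2 * (c1[0] + c1[1])
    pi1 = b1 + b2 * (c1[2] + c1[3])
    pik = b2 * c1[4:]
    return {"step1": c1, "step2": c2, "b2": b2,
            "pi0": pi0, "pi1": pi1, "pik": pik, "fitted2": Z2 @ c2}


def bootstrap_se_b2(y, ssa, dur, X, n_boot=200, seed=0):
    """Pairs-bootstrap standard error of b2 (resampling whole countries)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws.append(two_step(y[idx], ssa[idx], dur[idx], X[idx])["b2"])
    return float(np.std(draws, ddof=1))


def simulate(n=182, seed=0):
    """Synthetic data only; no number here comes from the paper's dataset."""
    rng = np.random.default_rng(seed)
    ssa = (rng.random(n) < 0.25).astype(float)
    dur = rng.integers(20, 100, size=n).astype(float)
    X = rng.normal(size=(n, 3))
    effect = X @ np.array([0.5, 0.3, -0.2])
    y = 1.0 + 0.04 * dur + effect - 0.5 * ssa * effect + rng.normal(size=n)
    return y, ssa, dur, X


if __name__ == "__main__":
    y, ssa, dur, X = simulate()
    res = two_step(y, ssa, dur, X)
    print("b2:", res["b2"], "bootstrap SE:", bootstrap_se_b2(y, ssa, dur, X))
```

The design point is that Equation (5) spends a single degree of freedom on the regressor $SSA_i \hat{y}_i$, whereas interacting the dummy with every factor would spend one per factor; the price is that all factor-specific differentials are restricted to be proportional to the first-step coefficients.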
Preliminary analysis has shown strong pairwise correlations between CVD and diabetes prevalence, and between median age and Pop 65+. These strong correlations have also been noticed in earlier work such as Halter et al. (2014). Hence, to avoid multicollinearity, only one factor from each pair was included in the main regressions. In particular, Table 3 presents the results using the variables that have fewer missing values among highly correlated alternatives. This means that we focused on results with diabetes prevalence and Pop 65+ in our main discussion. Those with the alternative measures, i.e., CVD prevalence and median age, were less conclusive because of poor fit and low statistical power (due in part, e.g., to high rates of missing values), but are available from the authors. Several functional forms were also considered (e.g., which explanatory variables to include in log form) and the reported results are based on those that gave the best fit in terms of adjusted R-squared and/or pseudo-log likelihood value. All the values reported in Columns (1), (2) and (3) indicate global significance at 1%.

The epidemiological predictors we use to assess the spread of the pandemic are the duration of the pandemic and the prevalence of diabetes in the country. While recent clinical studies suggest that cerebrovascular diseases and diabetes are some of the most distinctive comorbidities among COVID-19 patients under intensive care (e.g. Fang et al., 2020; Yang et al. 2020), our current cross-country data do not provide enough evidence to support this hypothesis. Our regression estimation results show no statistical significance in the association between diabetes prevalence and the number of active COVID-19 cases worldwide. One possible explanation for this puzzling result, as already mentioned in the methodological discussion, is the presence of endogenous behavioral responses among patients with diabetes and cerebrovascular diseases.
Being more at risk, these individuals are likely to be more careful in adopting safety and social distancing measures than others, which could eventually cancel out their higher mortality or contamination risk and lead to no significance in an empirical assessment. Our econometric model does not deal with this important issue. On the other hand, as one would expect, the duration of the pandemic appears to be a strong predictor of the disease spread. We estimate that any additional day in the duration of the epidemic is associated with a 4% increase in the number of active cases, and this effect is significant at 1% (see Columns (1) - (3)). Although the first cases of coronavirus occurred in sub-Saharan Africa relatively later than in other parts of the world (e.g. China, Europe, USA), the interaction between the SSA dummy and the duration of the epidemic shows that the longer the epidemic lasts, the more severe the consequences would be for sub-Saharan African countries compared to the rest of the world, everything else equal. In particular, our results show that any additional day of the epidemic is associated with a 12.27% increase in the number of active cases in SSA, and this effect is significant at 1%. This duration effect is much higher in SSA compared to the rest of the world.

Our most meaningful findings are those related to demographic and geographic factors. The results show that the proportion of the population aged 65 and above, an important demographic indicator capturing population ageing, is positively associated with the number of active cases, and this correlation is significant across all specifications. Specifically, everything else equal, a 1 percent increase in the fraction of this population is associated with about a 0.09 percent increase in the number of active cases on average. This result is in line with recent evidence suggesting that relatively older adults are at a higher risk of COVID-19 (e.g. Zhou et al. 2020; WHO 2020).
Since sub-Saharan Africa has an extremely young population (e.g., half of the population is aged below 20, and only 3.3% are above 65), the region is likely at a relatively lower risk on this dimension.

All the geographic factors considered are also significantly associated with the severity of the disease. The coefficient for log population density is estimated to range between 0.23 and 0.27 across specifications, implying that a 1% increase in the density of the population is associated with a 0.23%-0.27% increase in the number of active cases. Indeed, higher population density would tend to increase the likelihood of inter-community contagion, even under social distancing measures. Alirol et al. (2011) explain that this is particularly true for diseases transmitted via respiratory and fecal-oral routes (such as influenza, measles, tuberculosis, severe acute respiratory syndrome, etc.), given the increase in the amount of shared airspace. These authors also showed that cities are becoming important hubs for the transmission of infectious diseases, not only because of international travel and migration, but also because urbanization is associated with negative health outcomes and utilization (e.g., Stillwaggon, 2002; Greif et al., 2011). Consistent with these findings, our results show that a 1 percentage point increase in the urbanization rate is associated with about a 3.1% to 3.7% increase in the number of active COVID-19 cases. Given that population density and urbanization rates remain relatively low in sub-Saharan African countries (see Figure 2), these countries have an important advantage in coping with the spread of the virus compared to other parts of the world from a spatial perspective.
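The percentage readings quoted above follow from the outcome being in logs. As a minimal sketch (the helper names are hypothetical, and the level coefficients 0.031 and 0.04 are inferred here from the quoted percentages rather than read from Table 3), a coefficient on a logged regressor is an elasticity, while a coefficient $b$ on a regressor in levels implies an exact change of $100\,(e^{b}-1)$ percent per unit, which is close to $100\,b$ when $b$ is small:

```python
import math


def pct_change_level(b, delta=1.0):
    """Exact % change in the outcome when a level regressor rises by `delta`
    units, in a model whose dependent variable is in logs."""
    return 100.0 * (math.exp(b * delta) - 1.0)


def pct_change_log(b, pct_increase=1.0):
    """Exact % change in the outcome when a logged regressor rises by
    `pct_increase` percent, in a model whose dependent variable is in logs."""
    return 100.0 * ((1.0 + pct_increase / 100.0) ** b - 1.0)


if __name__ == "__main__":
    # Log population density: coefficient range reported in the text.
    for b in (0.23, 0.27):
        print("density +1%:", round(pct_change_log(b), 3), "% more active cases")
    # Urbanization rate in percentage points: 0.031 is an assumed coefficient.
    print("urbanization +1 pp:", round(pct_change_level(0.031), 2), "%")
    # Duration in days: 0.04 is an assumed coefficient; effects compound.
    print("duration +10 days:", round(pct_change_level(0.04, 10), 1), "%")
```

The last line illustrates why small daily effects matter: under the log specification they compound multiplicatively over the duration of the epidemic rather than adding up linearly.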
Average temperature around the first quarter of the year (January-March) is another relevant geographic factor, not only because recent research has suggested that temperature and climatological factors could influence the spread of this novel coronavirus in general (de Ángel Solá et al. 2020; Liu et al., 2020), but also because this particular quarter of the year corresponds to when the novel coronavirus was initially spreading. We found this indicator to be negatively associated with the pandemic spread in our estimations, showing that a 1 °C decrease in average temperature around this quarter is associated with about a 3.8% increase in the number of active cases of COVID-19. This result means that countries with relatively higher temperatures at this time of the year, such as sub-Saharan African countries, would tend to have a relatively lower number of cases, everything else equal.

Figure 2 source: UN World Urbanization Prospects (2018).

An important aspect of this regression analysis is that it allows us to formally assess how the effects of the demographic and geographic factors discussed above differ between SSA and the rest of the world. The key parameter for this assessment is the coefficient of the interaction term between SSA and the predicted values of the regression in Column (2), presented in Column (3). This parameter is estimated at -1.39 and is significant at 5%. It implies that any factor that is positively associated with the spread of COVID-19 worldwide would have, on average, a 1.39% lower effect on the number of active cases in sub-Saharan Africa compared to the rest of the world, for any percent shift. This is the case for all demographic and geographic factors considered. However, when we control for all these factors, the number of active cases in sub-Saharan Africa is not significantly different from those of other parts of the world.
This is evidenced by the non-significance of the SSA dummy in both the reduced-form estimate and the specification given in Column (3) (see footnote 13). This implies that any assumption about SSA being relatively safer through unobserved heterogeneity such as some pre-existing immunity (e.g. Guerrini et al. 2020) should be taken with great caution. This may also mean that any number of cases by which SSA is exceeding the rest of the world is being offset on average by the amount of underreporting, conditional on the DG factors.

The last set of indicators whose role is examined in this analysis are those related to income and the quality of the health system. A large body of the health economics literature shows that these factors contribute to improving health outcomes (Cutler et al., 2006). However, our estimation results show that GDP per capita, our measure of aggregate income, is insignificant in all our specifications (see Columns (1) - (3)). This means that, although this factor provides the opportunity to improve material conditions, subsidize effective containment measures such as social distancing, and improve related public goods, as shown by many studies (e.g. Marmot 2002; Condliffe and Link, 2008), it is not associated with a significantly lower level of COVID-19 spread when demographic and geographic factors are controlled for. Thus, given these DG factors, the fact that SSA is relatively poor compared to other regions of the world does not seem to be a crucial issue in addressing the pandemic on the continent, unlike initially thought.
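As a worked check on the reduced-form SSA coefficient, the formula $b_0 + b_2(\hat{\alpha}_0 + \hat{\beta}_0)$ from Equation (7) can be evaluated with the rounded estimates quoted in the paper's footnote (-0.25, -1.39, and the two Column (2) terms -1.8 and 1.15). The sketch below also shows how a delta-method standard error is formed. It is a sketch under loud assumptions: the covariance matrix is purely hypothetical, since the paper does not report one, so the resulting standard error is not comparable to the reported 0.957; which of -1.8 and 1.15 is the intercept and which is the SSA dummy is also not stated and does not affect the sum.

```python
import numpy as np


def reduced_form_ssa(b0, b2, a0_hat, beta0_hat):
    """pi_0 = b_0 + b_2 * (alpha_0_hat + beta_0_hat), from Equation (7)."""
    return b0 + b2 * (a0_hat + beta0_hat)


def delta_method_se(b0, b2, a0_hat, beta0_hat, cov):
    """First-order (delta-method) standard error of pi_0.

    `cov` is the 4x4 covariance matrix of the estimates, ordered as
    (b0, b2, a0_hat, beta0_hat). The gradient of pi_0 with respect to these
    four parameters is [1, a0_hat + beta0_hat, b2, b2].
    """
    grad = np.array([1.0, a0_hat + beta0_hat, b2, b2])
    return float(np.sqrt(grad @ cov @ grad))


if __name__ == "__main__":
    # Rounded estimates as quoted in the text.
    point = reduced_form_ssa(-0.25, -1.39, -1.8, 1.15)
    # HYPOTHETICAL covariance matrix, for illustration only.
    cov_hypothetical = np.diag([0.25, 0.36, 0.16, 0.16])
    se = delta_method_se(-0.25, -1.39, -1.8, 1.15, cov_hypothetical)
    print("pi_0 from rounded inputs:", round(point, 4))
    print("illustrative delta-method SE:", round(se, 4))
```

With the rounded inputs the arithmetic gives roughly 0.65, slightly below the reported 0.661; the gap is presumably due to rounding of the displayed estimates. Either way, the point estimate is small relative to the reported standard error, which is what underlies the non-significance of the SSA dummy.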
Footnote 13: Using the formulas in Equation (7) and the delta method, the reduced-form coefficient for the SSA dummy, inferred from the estimates in Columns (2) and (3), is computed at -0.25 - 1.39 × (-1.8 + 1.15) ≈ 0.661, with a standard error of 0.957.

Another important issue that has spurred concerns about this pandemic for the African continent is the fragile health systems in most sub-Saharan African countries, and the fear that new or re-emerging disease outbreaks such as the current COVID-19 pandemic could potentially paralyze health systems at the expense of primary healthcare requirements (Velavan et al., 2020). However, using public health expenditure as a recognized indicator for the quality of healthcare infrastructure (see Ssozi and Amlani, 2015; Gallet and Doucouliagos, 2017; Obrizan and Wehby, 2018), we found no statistically significant association with the active spread of COVID-19, or with its containment. This suggests that the relative fragility of health infrastructure in SSA countries and their relatively weak capacity to diagnose and handle outbreaks compared to other regions do not constitute a significant catalyst of the COVID-19 spread.

The results discussed above are subject to the caveats of data quality already raised in Section 2, which may remain pending even after attempting to mitigate them with our econometric strategy. The results on epidemiologic indicators (i.e. prevalence of diabetes or CVD) cannot bear a causal interpretation given the possible underlying endogenous behavioral adjustment discussed in the estimation section and unaccounted for by our model. Our most credible estimates are the effects of demographic and geographic factors, which are largely exogenous and are found to be significant and robust across all our specifications.
They provide compelling evidence that may help to explain why the number of infected cases of COVID-19 has been growing more slowly in sub-Saharan Africa and has remained relatively low compared to other regions of the world. These findings are, however, credible only to the extent that the measurement errors in the dependent variable are uncorrelated with these factors, as is usually assumed in the econometric literature (see Bound et al. 2002; Hausman 2001). Otherwise, a more sophisticated model of misreporting is needed and may require other methodological investigations that are beyond the scope of this work.

The goal of this paper was to assess the role of demographic and geographic factors in explaining the spread of COVID-19, with the aim of understanding why the epidemic is progressing relatively more slowly in sub-Saharan Africa. We employ a Ramsey-type device that preserves degrees of freedom in a regression analysis framework that accounts for possible misreporting to estimate the number of active COVID-19 cases as a function of these factors. We found that the proportion of the population aged 65+, population density and the urban population rate are positively associated with the number of active cases, whereas average temperature around the first quarter of the year (January-March) is negatively associated with this epidemic outcome. Because sub-Saharan African countries exhibit both lower rates of the former factors and higher levels of the latter, they are less affected than other countries by these drivers. As a consequence, these factors are found to have lower marginal effects on the number of active cases in sub-Saharan Africa compared to the rest of the world. These results help to explain the relatively low progression of the pandemic in sub-Saharan Africa, compared to the rest of the world. However, this advantage that sub-Saharan Africa seems to have regarding the spread of COVID-19 disappears once demographic and geographic characteristics are controlled for.
This suggests that any assumption that sub-Saharan African countries could be benefiting from pre-existing immunity conditions beyond the above-mentioned factors should be taken with caution. While the number of active cases increases with the duration of the epidemic, our results show that the perverse effect of time is exacerbated in sub-Saharan African countries compared to the rest of the world, in spite of the former having a learning advantage. This means that the comparative advantage that SSA seems to have now could narrow, and possibly reverse, in the future as the pandemic evolves in the absence of medical solutions. This calls for awareness and for strategies to implement mitigation efforts and containment measures that are adapted to the SSA situation.

Our results provide insights for policies that could be implemented to curb the spread of coronavirus-type diseases. In particular, given that geographic factors such as urbanization and population density appear to have the largest and most significant impacts in our analysis, successful policies and programs to address the spread and severity of such diseases should leverage the geographical ecosystem. This includes sensible planning of the expansion of cities as well as the integration of health and social distancing concerns into urban policies.

Urbanisation and infectious diseases in a globalised world. The Lancet Infectious Diseases
Measurement Error in Survey Data
The relationship between economic status and child health: evidence from the United States
The determinants of mortality
Weathering the pandemic: How the Caribbean Basin can use viral and environmental patterns to predict, prepare and respond to COVID-19
Cardiovascular considerations for patients, health care workers, and health systems during the coronavirus disease 2019 pandemic
Are patients with hypertension and diabetes mellitus at increased risk for COVID-19 infection? The Lancet Respiratory Medicine
The impact of healthcare spending on health outcomes: a meta-regression analysis
Urbanisation, poverty and sexual behaviour: the tale of five African cities
Potential Link between Anti Malaria Prophylaxis and the Prevention of COVID-19 Infection
Diabetes and cardiovascular disease in older adults: current status and future directions
Mismeasured Variables in Econometric Analysis: Problems from the Right and Problems from the Left
Impact of meteorological factors on the COVID-19 transmission: A multi-city study in China
The influence of income on health: views of an epidemiologist
Looming threat of COVID-19 infection in Africa: act collectively, and fast
Health expenditures and global inequalities in longevity
Data retrieved from Our
The effectiveness of health expenditure on the proximate and ultimate goals of healthcare in Sub-Saharan Africa
2018 revision of world urbanization prospects
Is there a health penalty of China's rapid urbanization
The Covid-19 epidemic. Tropical Medicine and International Health
Coronavirus disease 2019 (COVID-19): situation report
United Nations, Department of Economic and Social Affairs
Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. The Lancet Respiratory Medicine
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study

The most evident limitation of our analysis is the quality of the publicly available data that we used and the associated misreporting in the outcome variable. Econometric approaches to deal with these issues, such as the one we employed, may not fully mitigate them or fully identify some relevant components of the relationship, especially if the measurement errors are correlated with explanatory factors.
Another important limitation is the inability of the model to measure the endogenous behavioral responses of some of the key explanatory variables. Addressing these concerns would require better quality data as well as ancillary information about these behavioral responses. These considerations are left as possible avenues for future research.
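
The role played by the correlation between misreporting and the explanatory factors can be illustrated with a small simulation. The sketch below is not the model estimated in this paper: it assumes a single hypothetical regressor `x`, a linear outcome with true slope `beta`, and additive reporting error in the outcome, and all parameter values are arbitrary illustrative choices. It compares the OLS slope when the reporting error is independent of `x` with the slope when the error depends on `x` through a hypothetical coefficient `gamma`.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def simulate(n=50_000, beta=1.0, gamma=0.5, seed=0):
    """Compare OLS slopes under two hypothetical reporting-error regimes.

    All values are illustrative assumptions, not estimates from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                         # hypothetical explanatory factor
    y_true = 2.0 + beta * x + rng.normal(size=n)   # true (unobserved) outcome

    # Regime 1: reporting error independent of x.
    u_indep = rng.normal(size=n)
    # Regime 2: reporting error that decreases with x (correlated misreporting).
    u_corr = -gamma * x + rng.normal(size=n)

    return {
        "uncorrelated": ols_slope(x, y_true + u_indep),
        "correlated": ols_slope(x, y_true + u_corr),
    }


slopes = simulate()
print(slopes)
```

Under these assumptions, the slope estimated with independent reporting error should stay close to `beta`, while the slope estimated with correlated error should be close to `beta - gamma`. In this stylized setting, misreporting that is unrelated to the regressors adds noise without biasing the slope, whereas misreporting that varies with the regressors shifts the estimate itself.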