key: cord-291724-bwiyp73e authors: Abdulmajeed, Kabir; Adeleke, Monsuru; Popoola, Labode title: ONLINE FORECASTING OF COVID-19 CASES IN NIGERIA USING LIMITED DATA date: 2020-05-08 journal: Data Brief DOI: 10.1016/j.dib.2020.105683 sha: doc_id: 291724 cord_uid: bwiyp73e

The novel Coronavirus disease (COVID-19) was first identified in Wuhan, China, in December 2019 and later spread to other parts of the world. At the time of writing, the World Health Organization (WHO) has declared the disease a pandemic. Mathematical models, artificial intelligence, big data, and similar methodologies are potential tools for predicting the extent of the spread and the effectiveness of containment strategies. In societies with constrained data infrastructures, modeling and forecasting COVID-19 is an extremely difficult endeavor. Nonetheless, we propose an online forecasting mechanism that streams data from the Nigeria Center for Disease Control (NCDC) to update the parameters of an ensemble model, which in turn provides updated COVID-19 forecasts every 24 hours. The ensemble combines an Auto-Regressive Integrated Moving Average (ARIMA) model, Prophet (an additive regression model developed by Facebook), and a Holt-Winters Exponential Smoothing model combined with Generalized Autoregressive Conditional Heteroscedasticity (GARCH). The outcomes of these efforts are expected to provide academic thrust in guiding policymakers in the deployment of containment strategies and/or the assessment of containment interventions for stemming the spread of the disease in Nigeria.

Decision Sciences: application of an online framework for forecasting the range of COVID-19 cases in Nigeria using limited data.

The dataset is in raw Excel file format, has been cleaned and analyzed in this article, and is available in the Mendeley Data repository.
The ensemble algorithm presents lower and upper bounds for the total number of possible COVID-19 cases for the next day. The ARIMA, Prophet, and Holt-Winters models were implemented and tuned using brute-force search. Data on the daily number of COVID-19 cases in Nigeria were automatically mined from the NCDC and Wikipedia websites every 24 hours. The dataset contains COVID-19 cases from February 27, 2020, to April 5, 2020. A descriptive analysis of the dataset is presented, and the forecasts made using limited data are available. Sample code to mine real-time COVID-19 cases in Nigeria as a Pandas data frame or Excel file is provided in the repository.

Osun State University, Osogbo, Osun State, Nigeria.

The data available at the time of writing, together with the code, can be found in the Mendeley repository.

Value of the Data
- These data are useful as they present facts that drive analytics on COVID-19 cases in Nigeria.
- Academic institutions, public health agencies, scientific communities, researchers, students, and self-explorers can use these data, code, and models to analyze COVID-19 cases in Nigeria and beyond.
- The data (with daily updates), models, code, and analysis presented here can be applied to drive analytics, policy development, and decision making in other countries where data are scarce. They also represent an early reference for future work.
- The ensemble of models leverages the strengths and compensates for the weaknesses of the individual forecasting algorithms, even with limited data.

The daily number of COVID-19 cases in Nigeria from February 27, 2020, to April 5, 2020, was automatically mined every 24 hours from the official NCDC website (http://covid19.ncdc.gov.ng/) and from Wikipedia (http://tiny.cc/nigeria_covid19) using a Python script. The case numbers up to April 5 can be found in the supplemental data (Appendix A). At the time of writing, the dataset contains 39 time-series data points.
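As a quick sanity check on the count of 39 data points: one observation per calendar day from February 27 to April 5, 2020, inclusive gives exactly 39 days, because 2020 is a leap year and February 29 is included. The helper `daily_dates` below is a hypothetical illustration, not part of the authors' mining script.

```python
from datetime import date, timedelta


def daily_dates(start, end):
    """Return every calendar day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


dates = daily_dates(date(2020, 2, 27), date(2020, 4, 5))
print(len(dates))                  # 39
print(date(2020, 2, 29) in dates)  # True: 2020 is a leap year
```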
For forecasting purposes, the earlier days with zero incidences of COVID-19 were filtered out before building the forecast models. The data are presented in tabular form in Tables 1 and 2 and in visual form in Fig. 1, while the number of new cases per day is shown in Fig. 2. Seven successive daily forecasts by the ensemble, from March 29, 2020, to April 5, 2020, are presented in Table 3. The autocorrelation and partial autocorrelation behavior of the dataset (Fig. 3) informs the ARIMA modeling. Similarly, Fig. 4 visualizes the forecast beams, indicating the direction and strength of increases or decreases in the forecasted number of cases. Table 4 highlights the relative strengths and weaknesses of the individual models, and finally, Fig. 5 presents a visual comparison of COVID-19 cases in Nigeria and South Africa for analyzing policy impact.

In this paper, we present the application of ensemble forecasting models in a data-constrained environment. The objective is to establish lower and upper bounds on the possible number of COVID-19 cases per day, using a framework that automatically streams web data in real time from reliable sources. These data are used to retrain and adapt the parameters of an ensemble of three models, which in turn updates its forecast for the following day. Each of the three models provides an estimated lower bound and upper bound for the number of cases, and the ensemble forecast takes the minimum of the lower bounds and the maximum of the upper bounds. Because the only available information is a single variable, the number of COVID-19 cases per day, it is difficult to implement specialized, advanced, and more generalizable methods that often require a variety of features and bigger datasets.
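The combination rule described above (minimum of the lower bounds, maximum of the upper bounds) fits in a few lines. This sketch assumes each model's next-day forecast is already available as a `(lower, upper)` pair; the function name `ensemble_bounds`, the model labels, and the numbers are hypothetical and only illustrate the rule.

```python
def ensemble_bounds(model_bounds):
    """Combine per-model (lower, upper) forecasts into one envelope.

    model_bounds maps a model name to its (lower, upper) bounds for the
    next day. The ensemble takes the smallest lower bound and the largest
    upper bound, so its envelope contains every individual model's range.
    """
    if not model_bounds:
        raise ValueError("need at least one model forecast")
    lowers = [lo for lo, _ in model_bounds.values()]
    uppers = [hi for _, hi in model_bounds.values()]
    return min(lowers), max(uppers)


forecasts = {  # hypothetical numbers, for illustration only
    "ARIMA": (100, 120),
    "Prophet": (95, 130),
    "HW-GARCH": (105, 125),
}
print(ensemble_bounds(forecasts))  # (95, 130)
```

Taking the outer envelope makes the ensemble deliberately conservative: it widens the range whenever the models disagree, which suits a setting where the goal is a plausible range rather than a single point.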
It is also important to highlight that there are only a few samples (22) ... The ARIMA model for the differenced series is:

$$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$

where $y'_t$ represents the series of differences, $c$, $\phi_i$, and $\theta_i$ are the coefficients, and the lagged values $y'_{t-i}$ and lagged errors $\varepsilon_{t-i}$ are the predictors for the model [2]. The ARIMA model is a generalization of many sub-models and is characterized by three parameters: the order of the autoregressive part $p$, the degree of differencing $d$, and the number of moving-average terms $q$. ARIMA rests on the fact that a nonstationary time series can be made stationary through differencing. Stationarity can be verified using autocorrelation plots and statistical tests such as the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, whose null hypothesis is stationarity and whose alternative is a unit root [3]. For this model, we are interested in non-seasonal phenomena. Using brute-force search and inspection of the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots (Figure 3), an ARIMA(2,1,0) model was chosen.

ii. Prophet is an additive regression time-series forecasting algorithm developed by Facebook [4], [5]. It handles strong seasonal effects, missing data, outliers, and shifts in trend well, and it is fully automatic. It is implemented in Stan, which fits the model quickly using L-BFGS (the Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm). Prophet uses a decomposable time-series model defined by:

$$y(t) = g(t) + s(t) + \epsilon_t$$

where $g(t)$ is the trend, $s(t)$ represents seasonal changes, and $\epsilon_t$ captures irregular effects. We regard the trend as the most important component in developing the forecast. The trend model in this work is a piecewise saturating growth model with a time-varying carrying capacity, defined as:

$$g(t) = \frac{C(t)}{1 + \exp\left(-\left(k + \mathbf{a}(t)^\top \boldsymbol{\delta}\right)\left(t - \left(m + \mathbf{a}(t)^\top \boldsymbol{\gamma}\right)\right)\right)}$$

where $C(t)$ is the time-varying carrying capacity, $k$ is the growth rate, and $m$ is an offset. The growth rate is not constant but piecewise, with the changepoint indicator vector $\mathbf{a}(t)$, the rate adjustments $\boldsymbol{\delta}$, and the offset adjustments $\boldsymbol{\gamma}$ defining its structure.

iii. Holt-Winters exponential smoothing is a well-known time-series modeling and forecasting method that emerged from work published around the late 1950s [6], [7], [8].
This algorithm forms a weighted average of past observations with exponentially decaying weights to capture the trend in a time series. In its simplest form, the one-step forecast is:

$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots, \qquad 0 \le \alpha \le 1$$

where the smoothing parameter $\alpha$ tunes the response of the model. Values of $\alpha$ close to 0 emphasize past input data, while values close to 1 emphasize recent input data. The Holt-Winters method extends this simple ES with trend and seasonal components. In addition to the ES model, a GARCH model [9] was used to forecast the variance; combined with the ES forecast, it gives the lower and upper bounds for this model.

iv. Forecasts: Table 3 lists the date of each forecast along with the corresponding actual (official) number of cases reported by the NCDC. The performance of the ensemble can be visualized in Figure 4: the forecast envelope accurately captures the actual number of cases detected by the NCDC. By combining these three algorithms, we compensate for their relative weaknesses while reinforcing their relative strengths (Table 4). Ultimately, we subscribe to George Box's mantra: "All models are wrong, but some are useful."

Table 4 (excerpt): "No readily known weaknesses"; "Strong and accurate forecasting (short-term), favors recent data samples, requires few data points, straightforward implementation"; "Lagged forecasts".

There is more information on the COVID-19 situation in North America, Europe, and Asia than in Nigeria. Figure 5 compares COVID-19 cases in South Africa (SA) with those in Nigeria (NG). The figure shows that the number of confirmed cases in SA is markedly higher than in NG. This observation could be explained from two perspectives: (i) SA has many more people infected with COVID-19 than NG, or (ii) SA tests a much larger number of people per day. The latter explanation is probably the stronger of the two, as there is corroborating evidence for it.
As of March 20, 2020, SA had conducted 6,438 tests [10], while Nigeria had performed only 69 [11]. One of the current strategies for containing COVID-19 is locking down affected regions to prevent further spread through human movement. We find evidence that this strategy has been effective in containing the infection in SA. SA declared a national lockdown on March 26, 2020 [12] to tame the spread of infection. As shown in Figure 5, there was a kink in the progression of confirmed cases on March 27, 2020; the curve flattened and remained relatively steady for a few days thereafter. The steep upward trend of infections was stemmed the same day the national lockdown policy was announced. People infected before the lockdown who subsequently present themselves for testing and treatment could account for a slight rise in cases within two weeks of the lockdown. The degree of compliance with the policy and the number of new cases after the first two weeks of lockdown will guide further action in SA. COVID-19 (in fact, any real-life scenario) is inherently difficult to model: the number of tests, randomness, interventions, stay-at-home compliance, curfews, epidemiological realities, and many other factors all complicate forecasting. Countries, especially those in Africa that are only now witnessing a progressive rise in COVID-19 cases, must be decisive in implementing containment interventions and ensure strict compliance by their citizens.
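The per-model bounds that feed the ensemble (items i and iii) can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' implementation: the ARIMA(2,1,0) coefficients are estimated by conditional least squares with no constant, a simpler stand-in for the maximum-likelihood fitting used by libraries such as statsmodels; simple exponential smoothing stands in for the full Holt-Winters model; the GARCH(1,1) parameters `omega`, `a`, and `b` are fixed illustrative values rather than estimated ones; and both intervals use a normal 95% multiplier (`z=1.96`). All function names are hypothetical.

```python
import math


# Item i (assumed simplification): ARIMA(2,1,0) fitted by conditional least squares.
def arima_210_bounds(y, z=1.96):
    """Return a one-step-ahead (lower, point, upper) triple; needs len(y) >= 6."""
    d = [b - a for a, b in zip(y, y[1:])]  # first differences y'_t
    # Regression rows: (y'_{t-1}, y'_{t-2}, y'_t)
    rows = [(d[t - 1], d[t - 2], d[t]) for t in range(2, len(d))]
    # Normal equations for y'_t = phi1 * y'_{t-1} + phi2 * y'_{t-2} + e_t
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    s22 = sum(x2 * x2 for _, x2, _ in rows)
    c1 = sum(x1 * yt for x1, _, yt in rows)
    c2 = sum(x2 * yt for _, x2, yt in rows)
    det = s11 * s22 - s12 * s12
    phi1 = (c1 * s22 - s12 * c2) / det  # Cramer's rule
    phi2 = (s11 * c2 - s12 * c1) / det
    resid = [yt - phi1 * x1 - phi2 * x2 for x1, x2, yt in rows]
    sigma = math.sqrt(sum(e * e for e in resid) / (len(resid) - 2))
    # Undo the differencing: next level = last level + predicted difference.
    point = y[-1] + phi1 * d[-1] + phi2 * d[-2]
    # For a one-step forecast the error variance is the innovation variance.
    return point - z * sigma, point, point + z * sigma


# Item iii (assumed simplification): simple ES point forecast + GARCH(1,1) variance.
def ses_level(y, alpha):
    """Return in-sample one-step forecasts and the forecast for the next step."""
    level = y[0]
    fitted = []
    for obs in y:
        fitted.append(level)  # forecast of obs made before observing it
        level = alpha * obs + (1 - alpha) * level
    return fitted, level


def garch11_next_variance(resid, omega, a, b):
    """sigma2_{t+1} = omega + a * e_t**2 + b * sigma2_t, started at the mean squared residual."""
    sigma2 = sum(e * e for e in resid) / len(resid)
    for e in resid:
        sigma2 = omega + a * e * e + b * sigma2
    return sigma2


def es_garch_bounds(y, alpha=0.8, omega=1.0, a=0.1, b=0.8, z=1.96):
    """Return (lower, point, upper): ES point forecast +/- z * GARCH std. dev."""
    fitted, point = ses_level(y, alpha)
    resid = [obs - f for obs, f in zip(y, fitted)]
    half_width = z * math.sqrt(garch11_next_variance(resid, omega, a, b))
    return point - half_width, point, point + half_width
```

Each function returns a `(lower, point, upper)` triple for the next day; the lower and upper values from the individual models are what the min/max ensemble rule combines. Because simple ES carries the last smoothed level forward, its point forecast trails a rising series, which is consistent with the "Lagged forecasts" entry in Table 4.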
References
[1] Nigeria Center for Disease Control.
[2] Forecasting: principles and practice.
[3] Testing the null hypothesis of stationarity against the alternative of a unit root.
[4] Forecasting at scale.
[5] Prophet.
[6] Statistical forecasting for inventory control.
[7] Forecasting seasonals and trends by exponentially weighted averages.
[8] Forecasting sales by exponentially weighted moving averages.
[9] Generalized Autoregressive Conditional Heteroskedasticity.
[10] Coronavirus Testing - Source Data.
[11] An update of COVID-19 outbreak in Nigeria.
[12] Ramaphosa announces 21-day coronavirus lockdown for South Africa.

The authors acknowledge the Abdulmajeeds for their encouragement throughout this work. Special appreciation goes to the Management of Osun State University for supporting this project.

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.

Supplementary data to this article, daily updated data, and data mining code can be found online in the Mendeley repository http://dx.doi.org/10.17632/3xsb9nyfhp.1. Updates can also be monitored at the GitHub repository https://github.com/kbmajeed/nigeria_covid19