key: cord-0786047-oyieqde2
authors: Torrealba-Rodriguez, O.; Conde-Gutiérrez, R.A.; Hernández-Javier, A.L.
title: Modeling and prediction of COVID-19 in Mexico applying mathematical and computational models
date: 2020-05-29
journal: Chaos Solitons Fractals
DOI: 10.1016/j.chaos.2020.109946
sha: 138ce31e9a529a61f5329ea0f7d10d2962630b11
doc_id: 786047
cord_uid: oyieqde2

This work models and predicts the number of COVID-19 infections in Mexico with mathematical and computational models, using only the confirmed cases published in the daily technical report COVID-19 MEXICO up to May 8th. Two mathematical models (Gompertz and Logistic) and one computational model (Artificial Neural Network) were applied to model the number of COVID-19 cases from February 27th to May 8th. The results show a good fit between the observed data and the Gompertz, Logistic and Artificial Neural Network models, with R² of 0.9998, 0.9996 and 0.9999, respectively. The same mathematical models and an inverse Artificial Neural Network were then applied to predict the number of COVID-19 cases from May 9th to 16th, in order to analyze tendencies and extrapolate the projection until the end of the epidemic. As of May 16th, the Gompertz model predicts a total of 47,576 cases, the Logistic model 42,131 cases, and the inverse Artificial Neural Network model 44,245 cases. Finally, for the total number of COVID-19 infections until the end of the epidemic, the Gompertz, Logistic and inverse Artificial Neural Network models predict 469,917, 59,470 and 70,714 cases, respectively.

In December 2019, a new coronavirus disease, characterized as a viral infection with a high level of transmission, emerged in Wuhan, China.
The disease (COVID-19) is caused by the virus known as Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), a name established by the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV) [1]. The origin of this virus is not yet confirmed, but a sequence-based analysis suggests bats as a possible key reservoir [2, 3]. Besides the well-known effects of the virus on people's health, it triggers indirect effects such as psychological distress, economic losses and negative impacts on daily activities [4]. To counteract these effects, the governments of different countries have made public policy decisions in both health and economic matters.

At the same time, in the academic field, several research papers have focused on modeling and estimating the number of people infected with COVID-19 in a specific period. Ivanov [5] presented a robust simulation of the impact of COVID-19 on global supply chains; the results indicate that the timing of the closing and opening of facilities at different levels could become an important factor determining the impact of the epidemic outbreak. Li et al. [6] applied a Gaussian distribution to analyze COVID-19 transmission in Hubei Province, China, and developed predictions of the epidemic trends in South Korea, Italy, and Iran; the results show the evolution of the epidemic and indicate that imposing controls would have a significant impact. Fanelli and Piazza [7] developed a susceptible-infected-recovered-deaths (SIRD) model to analyze the outbreak of the coronavirus disease in China, Italy and France; the results indicate that the recovery rate is the same in these countries, while the infection and mortality rates seem to differ.
Petropoulos and Makridakis [8] presented an objective approach to forecasting the continuation of COVID-19 globally using models from the exponential smoothing family; the results provide a forecast timeline for proper planning and decision making.

Applied mathematical models, such as Gompertz and Logistic, have been used successfully to predict the number of people infected with COVID-19 in China and elsewhere. Jia et al. [9] applied three mathematical models, including the Gompertz and Logistic models, to estimate the progress of COVID-19 in Wuhan, China; the models predict that the epidemic in Wuhan would probably be over in late April 2020. Castorina et al. [10] used the Gompertz and Logistic models to evaluate the effectiveness of containment of the epidemic spread of COVID-19 in China, South Korea, Italy, and Singapore; the models predict the maximum number of infected individuals for each country studied, to support maintaining a strong containment policy. Ahmadi et al. [11] analyzed the prediction of definitive cases of COVID-19 in Iran through mathematical models; according to the Gompertz model, compliance and public behavior interventions make it possible to control and reduce the COVID-19 epidemic in Iran from April 28th to July 2020.

On the other hand, among computational models used to simulate and predict nonlinear behaviors, the inverse artificial neural network (ANNi) has recently excelled. In the literature, the ANNi model has been used to simulate and predict values of interest in engineering fields. Márquez-Nolasco et al. [12] developed two inverse artificial neural network models to simulate and estimate the internal heat and outlet temperature of a thermal absorber.
The results show a good simulation of the process (R² > 0.99), and the estimation yielded higher values of both the internal heat and the outlet temperature than those obtained experimentally. Abdallah el Hadj [13] used the inverse artificial neural network to estimate the equilibrium temperature, equilibrium pressure and critical temperature for the solubility of solid drugs in supercritical carbon dioxide. The results show that the estimate had an average relative deviation of 1.1% when compared with properties reported in the literature.

Mexico has not escaped the global pandemic. On February 27, 2020, the first patient with COVID-19 was reported in the country, and by April 22 the number of cases had already exceeded 10,000. Mexico is one of the countries most prone to the spread of the virus due to the number of patients who visit hospitals daily. For example, patients on chronic hemodialysis are a group at high risk of serious complications from infections, due to their immunosuppressed state and the coexistence of significant comorbidities [14]. Likewise, patients with other diseases are a potential spreading factor for COVID-19, because they regularly attend check-ups and wait in hospitals, exposing themselves to infection.

Besides prompting the implementation of several public policies, the evolution of the epidemic in Mexico has also motivated scientific research. In this regard, research on estimating the number of COVID-19 cases is relevant because of the need to anticipate the infrastructure and specialized materials required to treat the disease, such as ventilation units, masks for the protection of health personnel, and isolation rooms, among others.
An example of the academic research carried out in Mexico is the work of Cruz-Pacheco [15], where the arrival of the infectious outbreak in

The predictions presented here are based on the total number of confirmed cases published in the "Daily Technical Report" [18] issued by the Mexican Ministry of Health. Mathematical and computational models are proposed in order to model the current tendency in the total number of confirmed cases, predict the total number of cases from May 9 to 16, and predict the total number of cases at the end of the epidemic.

The rest of this article is organized as follows. Section two discusses the sources and characteristics of the data. Section three specifies the mathematical and computational methodologies in two subsections, the first dedicated to the specifications of the mathematical models

This section describes where the data were obtained and some of their specific characteristics. The data come from the "Daily Technical Report" [18] issued by the Mexican Ministry of Health. This report includes (among other data) figures on the total number of confirmed COVID-19 cases nationwide. The data presented in the report come in turn from the information provided by the state health authorities. For each day, the figures correspond to those reported up to 1:00 p.m. It is important to note that the data on confirmed cases include only those cases that were successfully detected and reported; the real number of COVID-19 cases is probably much higher (for example, on April 8, 2020, the Ministry of Health declared that the estimated number of cases could be around 8.3 times higher than the number of cases actually registered [19]).

[Figure 1 about here]

The figure shows values ranging from 1 to 31,522 with a clear upward tendency.
Considering the relative growth of cases reveals an interesting fact: in the first 40 days the average growth rate of the number of cases is 25% per day, while in the last 20 days it is only around 7% (this may reflect, to some extent, the effects of the containment policy carried out by the Ministry of Health). The increasing levels of absolute growth suggest an exponential tendency, with an average growth rate of around 18% for the 72 observations; this is consistent with the empirical evidence from several countries in the earlier stages of the epidemic. However, the empirical evidence also shows that in later stages of the epidemic the growth rate of the total confirmed cases starts to slow down. Together, these two patterns give the complete tendency a sigmoid shape. For this reason, the Gompertz, Logistic and ANN models are proposed to model and predict the confirmed cases.

In this section, the specifications of the mathematical and computational models used to model and predict the cases of COVID-19 infection are described.

Gompertz [20] proposed in 1825 a model known as the "Gompertz theoretical law of mortality". Since then, the Gompertz model has been used to describe growth in plants, animals, bacteria, and cancer cells. Several re-parametrizations of this model have been made; one of them, described in [21] and very useful for analytic purposes, is the following:

$$\frac{dG(t)}{dt} = k_G \, G(t) \, \ln\!\left(\frac{A_G}{G(t)}\right) \tag{1}$$

On the other hand, the logistic growth model proposed by Verhulst in 1838 [22] can be expressed as the differential equation:

$$\frac{dL(t)}{dt} = k_L \, L(t) \left(1 - \frac{L(t)}{A_L}\right) \tag{2}$$

where:
$G(t)$; $L(t)$ are the dependent variables in the models (the population variable in both models)
$k_G$; $k_L$ are the parameters for the growth rate in the models
$A_G$; $A_L$ are the maximum values of each model, attained as $t$ goes to infinity

Eqs. (1) and (2) are differential equations that can be solved to obtain:

$$G(t) = A_G \exp\!\left(-\exp\!\left(-k_G (t - T_G)\right)\right) \tag{3}$$

$$L(t) = \frac{A_L}{1 + \exp\!\left(-k_L (t - T_L)\right)} \tag{4}$$

where $T_G$; $T_L$ are the values that correspond to the inflection point of the models (the time at which the number of new confirmed cases per day begins to decrease).

From the deterministic relationships proposed by the mathematical models, statistical relationships are formulated by including a stochastic error term, in order to estimate the model parameters empirically.

For the Gompertz model:

$$G_t = \hat{A}_G \exp\!\left(-\exp\!\left(-\hat{k}_G (t - \hat{T}_G)\right)\right) + e_{G,t} \tag{5}$$

For the Logistic model:

$$L_t = \frac{\hat{A}_L}{1 + \exp\!\left(-\hat{k}_L (t - \hat{T}_L)\right)} + e_{L,t} \tag{6}$$

where:
$\hat{A}_G$; $\hat{A}_L$ are the parameter estimates for $A_G$ and $A_L$
$\hat{k}_G$; $\hat{k}_L$ are the parameter estimates for $k_G$ and $k_L$
$\hat{T}_G$; $\hat{T}_L$ are the parameter estimates for $T_G$ and $T_L$
$e_{G,t}$; $e_{L,t}$ are the error terms for each model and represent the difference between the number of cases observed and the number of cases predicted by each model.

From Eqs. (5) and (6), $e_{G,t}$ and $e_{L,t}$ can be solved to obtain:

For the Gompertz model:

$$e_{G,t} = G_t - \hat{A}_G \exp\!\left(-\exp\!\left(-\hat{k}_G (t - \hat{T}_G)\right)\right) \tag{7}$$

For the Logistic model:

$$e_{L,t} = L_t - \frac{\hat{A}_L}{1 + \exp\!\left(-\hat{k}_L (t - \hat{T}_L)\right)} \tag{8}$$

The goal is to find the parameter values that minimize the following sums of squared errors over the $m$ observations, for each model:

$$S_G = \sum_{t=1}^{m} e_{G,t}^2, \qquad S_L = \sum_{t=1}^{m} e_{L,t}^2$$

Starting from an initial approximation and using the Gauss-Newton algorithm for the solution of non-linear least-squares problems, the values that minimize these sums can be found. All calculations were carried out with Stata.

The direct and inverse Artificial Neural Network models are a set of coupled computational models that allow modeling and extrapolating values from a series of representative data. To develop the inverse Artificial Neural Network (ANNi) model, the coefficients obtained during the learning of an Artificial Neural Network (ANN) model are used to propose an objective function [23]. The steps of the computational methodology are the following:

The ANN model is developed using a three-layer architecture. In the input layer, each input parameter (In) is multiplied by a weight factor (Wi), and a factor known as bias (b1) is added to the sum of these products.
In the hidden layer, a transfer function is applied to produce an internal output. In the output layer, the weighted sum of the signals provided by the hidden layer is calculated and the output coefficients (Wo and b2) are applied to generate a simulated value. Eq. (9) represents an ANN model using a hyperbolic tangent sigmoid transfer function (TANSIG) in the hidden layer and a linear transfer function (PURELIN) in the output layer:

$$Out = \sum_{s=1}^{S} \left[ Wo_{s} \left( \frac{2}{1 + \exp\!\left(-2\left(\sum_{k=1}^{K} Wi_{s,k} \, In_{k} + b1_{s}\right)\right)} - 1 \right) \right] + b2 \tag{9}$$

where $S$ is the number of neurons in the hidden layer, $K$ is the number of neurons in the input layer, and $Out$ is the modeled value. Learning is generally carried out with the Levenberg-Marquardt algorithm, owing to its precision and speed when adjusting the coefficients to obtain the desired output [24].

Once the ANN model has been developed, it is inverted to propose an objective function. This approach allows input values (In_x) to be extrapolated from a desired output (Out). Eq. (11) represents the objective function:

[Eq. (11) about here]

Once the ANNi model has been proposed, an optimization algorithm is applied to find the values that solve the objective function in the shortest possible time. In this work, a genetic algorithm (GA) was coupled to the model because of its search capacity and reasonable response time [25]. All calculations were carried out with Matlab and its Optimization Toolbox.

The coefficient of determination R² is used to quantify the fit between the observed data and the modeled ones:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_{obs,i} - y_{mod,i}\right)^2}{\sum_{i=1}^{n} \left(y_{obs,i} - \bar{y}\right)^2} \tag{12}$$

where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_{obs,i}$, $y_{mod,i}$ is the output value obtained by the model, and $y_{obs,i}$ is the observed output value.

Eq. (13) shows the estimated Gompertz model:

[Eq. (13) about here]

where the left-hand side is the total number of estimated COVID-19 cases for each day, and the parameters are the estimated maximum number of COVID-19 cases at the end of the epidemic and the estimated growth rate of total COVID-19 cases.

[Figure 2 about here]

Regarding the values predicted by the model for the period from May 9th to May 16th, Fig. 3 shows a predicted value of 47,576 on May 16th.
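As a minimal Python sketch (not the authors' Stata or Matlab code), the curve forms and the fit statistic described in the methodology can be written as follows. It assumes the inflection-time parametrization of the Gompertz and logistic curves and a single-input network with a tanh hidden layer and a linear output. The function names and the parameter values in the demo are hypothetical and are not the estimates reported in this work.

```python
import math


def gompertz(t, a, k, t_infl):
    """Gompertz curve: upper asymptote a, growth rate k, inflection time t_infl."""
    return a * math.exp(-math.exp(-k * (t - t_infl)))


def logistic(t, a, k, t_infl):
    """Logistic curve with the same parameter roles as gompertz()."""
    return a / (1.0 + math.exp(-k * (t - t_infl)))


def ann_forward(x, wi, b1, wo, b2):
    """Single-input, single-output network: tanh hidden layer, linear output.

    tanh(z) equals 2 / (1 + exp(-2 z)) - 1, which is the TANSIG form.
    """
    hidden = [math.tanh(w * x + b) for w, b in zip(wi, b1)]
    return sum(w * h for w, h in zip(wo, hidden)) + b2


def r_squared(observed, modeled):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration
    # (they are not the values estimated in the paper).
    a, k, t_infl = 60000.0, 0.08, 85.0
    days = list(range(1, 73))

    # Synthetic "observations" generated from the logistic curve itself.
    synthetic_obs = [logistic(t, a, k, t_infl) for t in days]
    gompertz_values = [gompertz(t, a, k, t_infl) for t in days]

    # At the inflection time the logistic curve equals a / 2
    # and the Gompertz curve equals a / e.
    print("logistic at inflection:", logistic(t_infl, a, k, t_infl))
    print("gompertz at inflection:", gompertz(t_infl, a, k, t_infl))
    print("R^2, Gompertz vs synthetic data:",
          r_squared(synthetic_obs, gompertz_values))
```

With fitted parameters in place of the hypothetical ones, `r_squared` would be applied to the observed cumulative cases and the values returned by each model. Parameter estimation itself (Gauss-Newton, Levenberg-Marquardt, or the genetic algorithm) is outside this sketch.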
[Figure 3 about here]

This model also predicts that the number of total cases will reach its maximum (469,917) at the end of the epidemic.

On the other hand, Eq. (14) shows the estimated Logistic model:

[Eq. (14) about here]

where the left-hand side is the total number of estimated COVID-19 cases for each day, and the parameters are the estimated maximum number of COVID-19 cases at the end of the epidemic, the estimated growth rate of total COVID-19 cases, and the time at which the maximum number of new daily cases is expected to occur.

An ANN model with seven neurons in the hidden layer and a TANSIG transfer function was able to model the number of COVID-19 infections registered until May 8th. Table 1 shows the weight and bias coefficients obtained during training.

[Table 1 about here]

The coefficients obtained are integrated into Eq. (15) to model the number of positive cases (Nº) of COVID-19 infection:

[Eq. (15) about here]

[Figure 6 about here]

To carry out the prediction, an ANNi model was developed following a procedure similar to that reported by Márquez-Nolasco et al. [12]. To predict the number of positive cases for the period from May 9th to 16th, the following objective function was proposed using the coefficients presented in Table 1:

[Equation about here]

where In_x represents the number of days to extrapolate and Nº represents the total number of positive cases of COVID-19. The genetic algorithm was applied to search for the total number of positive COVID-19 cases corresponding to the number of extrapolated days. Fig. 7 shows the ANNi prediction of the number of COVID-19 cases on May 16th.

[Figure 7 about here]

Regarding the values predicted by the model for the period from May 9th to May 16th, Fig. 7

The predicted results from the models could differ significantly from the observed ones: if conditions change in the coming days due to different testing strategies, social-distancing policies, reopening of the community, or stay-at-home policies, the predicted death tolls will definitely change. Further analysis and broader validation of this conclusion are needed, by updating the model with current COVID-19 data.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses
Origin and evolution of pathogenic coronaviruses
From SARS to COVID-19: A previously unknown SARS-related coronavirus (SARS-CoV-2) of pandemic potential infecting humans-Call for a One Health approach
Predicting the impacts of epidemic outbreaks on global supply chains: A simulation-based analysis on the coronavirus outbreak
Propagation analysis and prediction of the COVID-19
Analysis and forecast of COVID-19 spreading in China, Italy and France
Makridakis Forecasting the novel coronavirus COVID-19
Zhao Prediction and analysis of Coronavirus Disease
Data analysis on Coronavirus spreading by macroscopic growth laws
Modeling and Forecasting Trend of COVID-19 Epidemic in Iran
Optimization and estimation of the thermal energy of an absorber with graphite disks by using direct and inverse neural network
Novel approach for estimating solubility of solid drugs in supercritical carbon dioxide and critical properties using direct and inverse artificial neural network (ANN)
Prevención y control de la infección por coronavirus SARS-CoV-2 (Covid-19) en unidades de hemodiálisis
Dispersion of a new coronavirus SARS-CoV-2 by airlines in 2020: Temporal estimates of the outbreak in Mexico. medRxiv
Temas de la conferencia sobre el covid-19 en México del 16 de abril
COVID-19)-Comunicado Técnico Diario
Cifra real de infectados por Covid-19 en México sería de 26,519 personas, reconoce la Secretaría de Salud
On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies
The use of Gompertz models in growth analyses, and new Gompertz-model approach: An addition to the Unified-Richards family
Notice sur la loi que la population poursuit dans son accroissement
Optimum operating conditions for heat and mass transfer in foodstuffs drying by means of neural network inverse
Comparing the performance of neural networks developed by using Levenberg-Marquardt and Quasi-Newton with the gradient descent algorithm for modelling a multiple response grinding process
Adaptation in Natural and Artificial Systems. University of Michigan Press, (1975) 183.

The corresponding author thanks SNI CONACyT for the support provided.