key: cord-1045139-k7wft3u7
authors: Pathak, Anurag; Kumar, Manoj; Singh, Sanjay Kumar; Singh, Umesh
title: Statistical Inferences: Based on Exponentiated Exponential Model to Assess Novel Corona Virus (COVID-19) Kerala Patient Data
date: 2021-08-03
journal: Ann
DOI: 10.1007/s40745-021-00348-7
sha: 209b36a331f41490becd0767ce09615a3c79f34e
doc_id: 1045139
cord_uid: k7wft3u7

In this article, we use the exponentiated exponential distribution as a statistical lifetime model for novel corona virus (covid-19) patient data from Kerala. The suitability of the model is assessed with several statistical tools: the log-likelihood, the Kolmogorov-Smirnov distance, the Akaike information criterion and the Bayesian information criterion. A likelihood ratio test and an empirical posterior probability analysis are also performed to show its suitability. The maximum likelihood estimates and asymptotic confidence intervals of the parameters are derived from the Fisher information matrix. We use the Markov Chain Monte Carlo technique to generate samples from the posterior density function. Based on the generated samples, we compute the Bayes estimates of the unknown parameters and construct highest posterior density credible intervals. Further, we discuss Bayesian prediction of future observations based on the observed sample. The Gibbs sampling technique is used to estimate the posterior predictive density and to construct predictive intervals for the order statistics of a future sample.

In December 2019, a novel corona virus was identified in Wuhan city, Hubei province, China; see Arshad Ali et al. [4], Li et al. [20]. The covid-19 epidemic has since reached most countries. WHO has declared covid-19 a pandemic; it has affected more than 215 countries, areas or territories, see WHO [31].
In India, the first covid-19 case was reported on 30-01-2020, in a student; the virus then spread gradually to every state and the number of covid-19 cases increased. The virus spreads through respiratory droplets. Coughing, fever and sneezing are the main symptoms of covid-19; for more, see Chen et al. [9] and Paules et al. [23]. The transmission patterns of covid-19 based on age-specific social contact characterization, and the effects of different social structures and lifestyles across countries, have been studied by Liu et al. [21] and Shi et al. [25], respectively. Chen et al. [9] discussed the disease as a global health concern. Recently, Kumar [18] carried out a cluster analysis of the covid-19 data of the states and union territories (UTs) of India, while Khakharia et al. [17] developed an outbreak prediction system for covid-19 in the ten most densely populated countries worldwide. Machine learning and suitable statistical models have been used to predict the rise in new cases for most countries. Since the outbreak, many researchers have started investigating covid-19 data from various perspectives. For the purpose of statistical inference, we consider covid-19 data from Kerala state, India, for the period 09-03-2020 to 23-04-2020. The data are shown in a later section, where they are modeled with a distribution and analyzed. For details on modeling, prediction and big data analytics, see Shi et al. [25]; for longitudinal data analysis, see Temesgen et al. [29]; and for real-time decision analysis, see Tien [30].

The exponential model is the most widely used distribution for lifetime data analysis. Its suitability is restricted, however, by its constant hazard rate, an assumption that is very difficult to verify in many practical problems. This led to the development of other distributions.
The Weibull and gamma distributions have been used extensively for analysing lifetime data, particularly in situations where the hazard rate is monotonically increasing, decreasing or constant. The exponentiated exponential distribution was developed by Gupta et al. [13]; it has two parameters (shape and scale) and is an alternative to the Weibull and gamma distributions, see Gupta and Kundu [14]. The probability density function (pdf) of the exponentiated exponential distribution or model (EED) is given by

$$f(x\mid\alpha,\lambda)=\alpha\lambda\,e^{-\lambda x}\left(1-e^{-\lambda x}\right)^{\alpha-1};\qquad x>0,\ \alpha>0,\ \lambda>0, \tag{1}$$

where $\alpha$ and $\lambda$ are the shape and scale parameters; when $\alpha=1$, it reduces to the one-parameter exponential distribution. Figure 1 shows the EED for $\alpha=0.5, 2, 5, 10, 15$; $\lambda=1$. The cumulative distribution function (cdf) of the two-parameter EED is given by

$$F(x\mid\alpha,\lambda)=\left(1-e^{-\lambda x}\right)^{\alpha};\qquad x>0,\ \alpha>0,\ \lambda>0. \tag{2}$$

It may be noted here that, for any $\lambda>0$, the hazard function is monotone increasing if $\alpha>1$, monotone decreasing if $\alpha<1$, and constant if $\alpha=1$. The EED thus reduces to the exponential distribution (ED) when the shape parameter equals one. For more details about the EED, readers are referred to Gupta et al. [13] and Gupta and Kundu [14]. Most studies of the EED and ED have focused on estimation of the parameters, reliability and hazard functions; see Singh [26].

In the Bayesian paradigm, the prior distribution of the parameters is used to derive their posterior density. This approach is also used to obtain predictive inference for future observations. To learn more about Bayesian inference and model selection, readers are referred to Gelman et al. [11], Baklizi [5], Bernardo and Smith [7], Berger [6] and Geisser [10]. However, no one has yet attempted empirical posterior probability based inference for the EED.
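As a minimal sketch of the EED density, distribution function and hazard rate described above, the following Python functions may help to check the stated shapes numerically. The function names (`eed_pdf`, `eed_cdf`, `eed_hazard`) are illustrative choices and do not come from the paper; the hazard is computed as the density divided by the survival function.

```python
import math

def eed_pdf(x, alpha, lam):
    """EED density: alpha * lam * exp(-lam*x) * (1 - exp(-lam*x))**(alpha - 1), x > 0."""
    u = math.exp(-lam * x)
    return alpha * lam * u * (1.0 - u) ** (alpha - 1.0)

def eed_cdf(x, alpha, lam):
    """EED distribution function: (1 - exp(-lam*x))**alpha, x > 0."""
    return (1.0 - math.exp(-lam * x)) ** alpha

def eed_hazard(x, alpha, lam):
    """Hazard rate f(x) / (1 - F(x))."""
    return eed_pdf(x, alpha, lam) / (1.0 - eed_cdf(x, alpha, lam))

# Hazard on a small grid for three shape values (scale fixed at 1).
for shape in (0.5, 1.0, 2.0):
    print(shape, [round(eed_hazard(x, shape, 1.0), 4) for x in (0.5, 1.0, 2.0, 4.0)])
```

With shape equal to one the printed hazard stays at the scale value, which is the exponential special case.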
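Thus, in this manuscript our main goals are (i) to obtain the classical and Bayes estimates of the parameters; (ii) to use predictive models with the Kerala covid-19 data to obtain predictive inference for future observations; (iii) to justify that the given sample follows the EED by using AIC, BIC and the empirical posterior probability as model selection criteria, together with goodness-of-fit tests. Hence, the purpose of this paper is to present a systematic inference for the EED. Sect. 2 gives the maximum likelihood estimation of the parameters and the asymptotic confidence intervals for the EED. Bayes estimators of the EED parameters under a non-informative prior are derived in Sect. 2.4. Bayes prediction of future observations from the EED is explained in Sect. 3. Sect. 4 focuses on the model selection criteria. The analysis of the Kerala covid-19 data and its discussion are given in Sect. 5. Sect. 6 concludes the study.

Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ drawn from the EED, and let $\mathbf{x}=(x_1, x_2, \ldots, x_n)'$ denote the observed sample. Then, for the given data $\mathbf{x}$ and the pdf in Eq. (1), the likelihood function is given by

$$L(\alpha,\lambda\mid\mathbf{x})=\alpha^{n}\lambda^{n}\exp\!\Big(-\lambda\sum_{i=1}^{n}x_i\Big)\prod_{i=1}^{n}\left(1-e^{-\lambda x_i}\right)^{\alpha-1}. \tag{5}$$

The log-likelihood function corresponding to Eq. (5) is given by

$$\ln L(\alpha,\lambda\mid\mathbf{x})=n\ln\alpha+n\ln\lambda-\lambda\sum_{i=1}^{n}x_i+(\alpha-1)\sum_{i=1}^{n}\ln\left(1-e^{-\lambda x_i}\right). \tag{6}$$

The maximum likelihood estimates (MLEs) of $\alpha$ and $\lambda$ are the solution of the normal equations obtained by equating the partial derivatives of Eq. (6) with respect to $\alpha$ and $\lambda$ to zero:

$$\frac{\partial\ln L}{\partial\alpha}=\frac{n}{\alpha}+\sum_{i=1}^{n}\ln\left(1-e^{-\lambda x_i}\right)=0, \tag{7}$$

$$\frac{\partial\ln L}{\partial\lambda}=\frac{n}{\lambda}-\sum_{i=1}^{n}x_i+(\alpha-1)\sum_{i=1}^{n}\frac{x_i e^{-\lambda x_i}}{1-e^{-\lambda x_i}}=0. \tag{8}$$

These two nonlinear normal equations do not have explicit closed-form solutions. Therefore, we use the Newton-Raphson method to solve them numerically.

The confidence intervals (CIs) of $\alpha$ and $\lambda$ for the EED can be derived from the Fisher information matrix, which is given by

$$I(\alpha,\lambda)=-E\begin{pmatrix}\dfrac{\partial^{2}\ln L}{\partial\alpha^{2}} & \dfrac{\partial^{2}\ln L}{\partial\alpha\,\partial\lambda}\\[2ex] \dfrac{\partial^{2}\ln L}{\partial\lambda\,\partial\alpha} & \dfrac{\partial^{2}\ln L}{\partial\lambda^{2}}\end{pmatrix},$$

where

$$\frac{\partial^{2}\ln L}{\partial\alpha^{2}}=-\frac{n}{\alpha^{2}},\qquad \frac{\partial^{2}\ln L}{\partial\alpha\,\partial\lambda}=\sum_{i=1}^{n}\frac{x_i e^{-\lambda x_i}}{1-e^{-\lambda x_i}},\qquad \frac{\partial^{2}\ln L}{\partial\lambda^{2}}=-\frac{n}{\lambda^{2}}-(\alpha-1)\sum_{i=1}^{n}\frac{x_i^{2}e^{-\lambda x_i}}{\left(1-e^{-\lambda x_i}\right)^{2}}.$$

The observed Fisher information matrix $I(\hat\alpha,\hat\lambda)$ is the negative of the matrix of these second derivatives evaluated at the MLE $(\hat\alpha,\hat\lambda)$. Therefore, the observed variance-covariance matrix becomes $I^{-1}(\hat\alpha,\hat\lambda)$.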
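The maximum likelihood step can be sketched numerically as follows. This is only an illustration under stated assumptions: instead of the Newton-Raphson iteration used in the paper, it solves the normal equation for the shape parameter in closed form for a fixed scale and then finds the scale by bisection on the resulting profile score. The data are the 43 daily counts listed in the data analysis section; the function names and the bracketing interval `[1e-3, 1]` are hypothetical choices.

```python
import math

# Daily confirmed covid-19 cases, Kerala, 09-03-2020 to 23-04-2020 (from the data section).
DATA = [3, 6, 8, 2, 3, 2, 3, 1, 12, 12, 15, 28, 14, 9, 19, 39, 6, 20, 32, 7, 24, 21,
        9, 11, 8, 13, 8, 9, 12, 7, 10, 2, 3, 8, 1, 7, 1, 4, 2, 6, 19, 11, 10]

def log_likelihood(alpha, lam, xs):
    """EED log-likelihood: n ln(alpha) + n ln(lam) - lam*sum(x) + (alpha-1)*sum ln(1-exp(-lam x))."""
    n = len(xs)
    s = sum(math.log1p(-math.exp(-lam * x)) for x in xs)
    return n * math.log(alpha) + n * math.log(lam) - lam * sum(xs) + (alpha - 1.0) * s

def alpha_given_lambda(lam, xs):
    """Root of the shape normal equation for a fixed scale."""
    return -len(xs) / sum(math.log1p(-math.exp(-lam * x)) for x in xs)

def profile_score(lam, xs):
    """Scale normal equation evaluated at alpha = alpha_given_lambda(lam)."""
    a = alpha_given_lambda(lam, xs)
    t = sum(x * math.exp(-lam * x) / (1.0 - math.exp(-lam * x)) for x in xs)
    return len(xs) / lam - sum(xs) + (a - 1.0) * t

def fit_eed(xs, lo=1e-3, hi=1.0, tol=1e-12):
    """Bisection on the profile score; assumes a sign change on [lo, hi]."""
    f_lo = profile_score(lo, xs)
    if f_lo * profile_score(hi, xs) > 0:
        raise ValueError("profile score does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = profile_score(mid, xs)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    lam = 0.5 * (lo + hi)
    return alpha_given_lambda(lam, xs), lam

alpha_hat, lam_hat = fit_eed(DATA)
print(alpha_hat, lam_hat, log_likelihood(alpha_hat, lam_hat, DATA))
```

The printed estimates can be compared with the values reported in the data analysis (shape about 1.5722, scale about 0.1266, log-likelihood about -141.65). The profile approach is chosen here only because it needs no starting values; a Newton-Raphson iteration started from a contour plot, as in the paper, should lead to the same stationary point.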
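Here, $\mathrm{Var}(\hat\alpha)$ and $\mathrm{Var}(\hat\lambda)$ are the diagonal elements of $I^{-1}(\hat\alpha,\hat\lambda)$ and $\mathrm{Cov}(\hat\alpha,\hat\lambda)$ is its off-diagonal element. A two-sided normal-approximation $100(1-\gamma)\%$ CI for the parameters $\alpha$ and $\lambda$ is

$$\hat\alpha\pm z_{\gamma/2}\sqrt{\mathrm{Var}(\hat\alpha)}\qquad\text{and}\qquad\hat\lambda\pm z_{\gamma/2}\sqrt{\mathrm{Var}(\hat\lambda)},$$

respectively, where $z_{\gamma/2}$ is the upper $(\gamma/2)$th percentile of the standard normal distribution.

In order to obtain the Bayes estimates of the parameters $\alpha$ and $\lambda$ from the Kerala covid-19 data set, we assume that these parameters are independently and uniformly distributed, each over an interval with lower end point 0. Berger [6] discusses estimation theory under uncertain prior information; see also Jeffreys [15] and Jeffreys et al. [16]. The joint prior density of $\alpha$ and $\lambda$ is therefore constant on the product of these intervals,

$$\pi(\alpha,\lambda)\propto 1. \tag{9}$$

Using the joint prior density of Eq. (9) and the likelihood of Eq. (5), the posterior density of $\alpha$ and $\lambda$ is given by

$$\pi(\alpha,\lambda\mid\mathbf{x})=K(\mathbf{x})\,\alpha^{n}\lambda^{n}\exp\!\Big(-\lambda\sum_{i=1}^{n}x_i\Big)\prod_{i=1}^{n}\left(1-e^{-\lambda x_i}\right)^{\alpha-1}, \tag{10}$$

where $K(\mathbf{x})$ is a normalizing constant. Hence, the respective marginal posterior pdfs of $\alpha$ and $\lambda$ are given by

$$\pi(\alpha\mid\mathbf{x})=\int\pi(\alpha,\lambda\mid\mathbf{x})\,d\lambda \tag{11}$$

and

$$\pi(\lambda\mid\mathbf{x})=\int\pi(\alpha,\lambda\mid\mathbf{x})\,d\alpha. \tag{12}$$

The Bayes estimators $\hat\alpha_B$ and $\hat\lambda_B$ of $\alpha$ and $\lambda$ come out to be

$$\hat\alpha_B=E(\alpha\mid\mathbf{x})=\int\alpha\,\pi(\alpha\mid\mathbf{x})\,d\alpha \tag{13}$$

and

$$\hat\lambda_B=E(\lambda\mid\mathbf{x})=\int\lambda\,\pi(\lambda\mid\mathbf{x})\,d\lambda, \tag{14}$$

respectively, provided the posterior expectations exist. The integrals in Eqs. (13) and (14) are not obtainable in closed form. Therefore, the Markov Chain Monte Carlo (MCMC) method is used for their calculation. In this method, we first generate samples from the posterior distribution through the Gibbs sampler via the Metropolis-Hastings algorithm, and then use these samples to compute the Bayes estimates. The Gibbs algorithm simulates samples from the full conditional posterior distributions, while the Metropolis-Hastings algorithm generates samples using an arbitrary proposal distribution (i.e. a Markov transition kernel). For details about this method, see Singh et al. [28]. The conditional posterior distributions of the parameters $\alpha$ and $\lambda$ can be written as

$$\pi(\alpha\mid\lambda,\mathbf{x})\propto\alpha^{n}\exp\!\Big\{\alpha\sum_{i=1}^{n}\ln\left(1-e^{-\lambda x_i}\right)\Big\} \tag{15}$$

and

$$\pi(\lambda\mid\alpha,\mathbf{x})\propto\lambda^{n}\exp\!\Big(-\lambda\sum_{i=1}^{n}x_i\Big)\prod_{i=1}^{n}\left(1-e^{-\lambda x_i}\right)^{\alpha-1}, \tag{16}$$

respectively. To compute the Bayes estimates of $\alpha$ and $\lambda$, the required samples are generated from Eqs. (15) and (16) using the MCMC procedure. For the details of the algorithm used, see Singh et al. [27].

Let $y_1, y_2, \ldots, y_m$ be a future sample of size $m$ from the EED, and let $y_{(1)}\le y_{(2)}\le\cdots\le y_{(m)}$ be the corresponding sample order statistics.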
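A Gibbs sampler with a Metropolis-Hastings step, of the kind referred to above, might be sketched as follows. This is a hedged illustration, not the authors' algorithm: it assumes a flat prior whose upper bounds are large enough to be ignored, so that the full conditional of the shape parameter is a gamma distribution with shape `n + 1` and can be drawn directly, while the scale parameter is updated by a random-walk Metropolis-Hastings step. The function names, the step size `0.02`, the chain length and the seed are all hypothetical choices.

```python
import math
import random

# Daily confirmed covid-19 cases, Kerala, 09-03-2020 to 23-04-2020 (from the data section).
DATA = [3, 6, 8, 2, 3, 2, 3, 1, 12, 12, 15, 28, 14, 9, 19, 39, 6, 20, 32, 7, 24, 21,
        9, 11, 8, 13, 8, 9, 12, 7, 10, 2, 3, 8, 1, 7, 1, 4, 2, 6, 19, 11, 10]

def sum_log_terms(lam, xs):
    """sum_i ln(1 - exp(-lam * x_i)), written with expm1 for numerical stability."""
    return sum(math.log(-math.expm1(-lam * x)) for x in xs)

def log_cond_lambda(lam, alpha, xs):
    """Log full conditional of the scale (up to a constant) under the assumed flat prior."""
    if lam <= 0.0:
        return -math.inf
    return len(xs) * math.log(lam) - lam * sum(xs) + (alpha - 1.0) * sum_log_terms(lam, xs)

def sample_posterior(xs, n_iter=6000, burn_in=1000, step=0.02, seed=1):
    rng = random.Random(seed)
    n = len(xs)
    alpha, lam = 1.0, n / sum(xs)          # start at the exponential fit
    draws = []
    for it in range(n_iter):
        # Gibbs step: alpha | lam, x ~ Gamma(shape n + 1, rate -sum ln(1 - exp(-lam x))).
        rate = -sum_log_terms(lam, xs)
        alpha = rng.gammavariate(n + 1, 1.0 / rate)
        # Metropolis-Hastings step for lam with a symmetric normal random-walk proposal.
        proposal = lam + rng.gauss(0.0, step)
        log_ratio = log_cond_lambda(proposal, alpha, xs) - log_cond_lambda(lam, alpha, xs)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            lam = proposal
        if it >= burn_in:
            draws.append((alpha, lam))
    return draws

draws = sample_posterior(DATA)
alpha_bayes = sum(a for a, _ in draws) / len(draws)   # posterior mean of the shape
lam_bayes = sum(l for _, l in draws) / len(draws)     # posterior mean of the scale
print(alpha_bayes, lam_bayes)
```

The posterior means printed at the end play the role of the Bayes estimates under squared error loss; convergence diagnostics and HPD intervals are deliberately left out of this sketch.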
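Suppose that we are interested in the posterior predictive density of the future order statistic $y_{(r)}$ given the informative sample $x_1, x_2, \ldots, x_n$. If the pdf of the $r$th order statistic in a future sample of size $m$ is denoted by $h_{(r)}(\cdot\mid\alpha,\lambda)$, then

$$h_{(r)}(y\mid\alpha,\lambda)=r\binom{m}{r}\left[F(y\mid\alpha,\lambda)\right]^{r-1}\left[1-F(y\mid\alpha,\lambda)\right]^{m-r}f(y\mid\alpha,\lambda),$$

where $f(\cdot\mid\alpha,\lambda)$ and $F(\cdot\mid\alpha,\lambda)$ are as given in Eqs. (1) and (2), respectively. The predictive density of $y_{(r)}$ is denoted by $h^{*}_{(r)}(\cdot\mid\mathbf{x})$, and

$$h^{*}_{(r)}(y\mid\mathbf{x})=\int\!\!\int h_{(r)}(y\mid\alpha,\lambda)\,\pi(\alpha,\lambda\mid\mathbf{x})\,d\alpha\,d\lambda, \tag{18}$$

where $\pi(\alpha,\lambda\mid\mathbf{x})$ is the joint posterior density of $\alpha$ and $\lambda$ given in Eq. (10). Note that $h^{*}_{(r)}(y\mid\mathbf{x})$ cannot be expressed in closed form and hence cannot be evaluated analytically. When $r=1$ and $m=1$ in Eq. (18), it becomes the one-sample predictive density

$$h^{*}(y\mid\mathbf{x})=\int\!\!\int f(y\mid\alpha,\lambda)\,\pi(\alpha,\lambda\mid\mathbf{x})\,d\alpha\,d\lambda.$$

A consistent approximation of $h^{*}_{(r)}(y\mid\mathbf{x})$ can be obtained by using the Gibbs sampler via the Metropolis-Hastings algorithm. If $(\alpha_i,\lambda_i)$, $i=1,2,\ldots,M$, denotes the MCMC sample from the posterior, a consistent estimator of $h^{*}_{(r)}(y\mid\mathbf{x})$ is given by

$$\hat h^{*}_{(r)}(y\mid\mathbf{x})=\frac{1}{M}\sum_{i=1}^{M}h_{(r)}(y\mid\alpha_i,\lambda_i).$$

Along the same lines, if we want to estimate the predictive distribution function of $y_{(r)}$, say $H^{*}_{(r)}(\cdot\mid\mathbf{x})$, then a consistent estimator of $H^{*}_{(r)}(y\mid\mathbf{x})$ can be obtained as

$$\hat H^{*}_{(r)}(y\mid\mathbf{x})=\frac{1}{M}\sum_{i=1}^{M}H_{(r)}(y\mid\alpha_i,\lambda_i),$$

where $H_{(r)}(y\mid\alpha,\lambda)$ denotes the distribution function corresponding to $h_{(r)}(y\mid\alpha,\lambda)$, which is given by

$$H_{(r)}(y\mid\alpha,\lambda)=\sum_{j=r}^{m}\binom{m}{j}\left[F(y\mid\alpha,\lambda)\right]^{j}\left[1-F(y\mid\alpha,\lambda)\right]^{m-j}.$$

It may also be noted that the same MCMC sample $(\alpha_i,\lambda_i)$, $i=1,2,\ldots,M$, can be used to compute $\hat h^{*}_{(r)}(y\mid\mathbf{x})$ or $\hat H^{*}_{(r)}(y\mid\mathbf{x})$ for all $y$.

A further important problem is to establish a two-sided predictive interval for the $r$th order statistic $Y_{(r)}$ of a future sample $\{Y_1,\ldots,Y_m\}$ of size $m$ that is independent of the informative sample $\{x_1,\ldots,x_n\}$. We now construct a $100\tau\%$ predictive interval for $Y_{(r)}$. In the symmetric case, the $100\tau\%$ predictive interval $(L, U)$ for $Y_{(r)}$ can be obtained by solving

$$P\left(Y_{(r)}>L\mid\mathbf{x}\right)=\frac{1+\tau}{2}\qquad\text{and}\qquad P\left(Y_{(r)}>U\mid\mathbf{x}\right)=\frac{1-\tau}{2}$$

for the lower bound $L$ and upper bound $U$, respectively; see Al-Jarallah and Al-Hussaini [3]. In the asymmetric case, predictive intervals of the form $(L,\infty)$ and $(0,U)$ with coverage probability $\tau$ can be obtained by solving

$$P\left(Y_{(r)}>L\mid\mathbf{x}\right)=\tau\qquad\text{and}\qquad P\left(Y_{(r)}>U\mid\mathbf{x}\right)=1-\tau$$

for $L$ and $U$, respectively. It is very difficult to obtain analytical solutions of these equations.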
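One possible numerical route is sketched below, under stated assumptions: the predictive distribution function is approximated by averaging the order-statistic distribution function over posterior draws of the two parameters, and the interval end points are found by bisection. The draws used here are hypothetical stand-ins (three hand-picked pairs) so that the block runs on its own; in practice they would be the MCMC sample. All function names are illustrative.

```python
import math

def eed_cdf(y, alpha, lam):
    """EED distribution function (1 - exp(-lam*y))**alpha for y >= 0."""
    return (1.0 - math.exp(-lam * y)) ** alpha

def order_stat_cdf(y, r, m, alpha, lam):
    """CDF of the r-th order statistic in a sample of size m from the EED."""
    F = eed_cdf(y, alpha, lam)
    return sum(math.comb(m, j) * F ** j * (1.0 - F) ** (m - j) for j in range(r, m + 1))

def predictive_cdf(y, r, m, draws):
    """Monte Carlo estimate: average of the order-statistic CDF over posterior draws."""
    return sum(order_stat_cdf(y, r, m, a, l) for a, l in draws) / len(draws)

def predictive_quantile(p, r, m, draws, tol=1e-10):
    """Solve predictive_cdf(y) = p for y by bisection (the CDF is increasing in y)."""
    lo, hi = 0.0, 1.0
    while predictive_cdf(hi, r, m, draws) < p:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predictive_cdf(mid, r, m, draws) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def symmetric_interval(tau, r, m, draws):
    """Equal-tailed 100*tau% predictive interval (L, U)."""
    return (predictive_quantile((1.0 - tau) / 2.0, r, m, draws),
            predictive_quantile((1.0 + tau) / 2.0, r, m, draws))

# Hypothetical posterior draws (alpha_i, lambda_i), used only to make the sketch runnable.
example_draws = [(1.5, 0.12), (1.6, 0.13), (1.7, 0.125)]
print(symmetric_interval(0.95, r=1, m=1, draws=example_draws))
```

Bisection is used because the averaged distribution function is monotone in `y`, so no derivative information is needed; the one-sided intervals follow from `predictive_quantile(1 - tau, ...)` and `predictive_quantile(tau, ...)`.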
Therefore, suitable numerical techniques are applied to solve these nonlinear equations.

Entropy serves as an information criterion in statistics: it measures the disorder, or lack of information, about the sampling distribution that remains once the parameters have been estimated. The modeling process can be judged by Akaike's information criterion (AIC) and the Bayesian information criterion (BIC).

The AIC is a tool for model selection developed by Akaike [2]. It measures the goodness of fit of an estimated statistical model. For the considered data, all the competing models may be ranked according to their AIC, the one with the lowest AIC being the best. It is evaluated by

$$\mathrm{AIC}=-2\ln\hat L+2k,$$

where $\hat L$ is the maximized value of the likelihood function.

The BIC was developed by Schwarz et al. [24] for model selection among statistical models with different numbers of parameters. When the MLE method is used to estimate the model parameters, it is possible to increase the likelihood by adding parameters, which may result in overfitting. The BIC resolves this problem by introducing a penalty term for the number of parameters in the model. It is evaluated by

$$\mathrm{BIC}=-2\ln\hat L+k\ln n.$$

Among the estimated models, the one with the lower BIC is preferred. Here $k$ is the number of parameters in the model and $n$ is the sample size.

In the Bayesian setup, we want to choose the most appropriate model from a set of $r$ fully specified models. Let $f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_r(\mathbf{x})$ be densities or models for the observation vector $\mathbf{x}=(x_1, x_2, \ldots, x_n)'$, $\mathbf{x}\in S$. We want to identify the most informative model for $\mathbf{x}\in S$. For this, we assume that $\phi_1, \phi_2, \ldots, \phi_r$ are the subjective probabilities of the models, with $\sum_{j=1}^{r}\phi_j=1$, where $\phi_j$ denotes the probability of the $j$th model.
If there is no further information to distinguish the models, the unbiased choices $\phi_j=1/r$, for $j=1,2,\ldots,r$, are reasonable; $\phi_j$ is referred to as the prior probability of the $j$th model, see Leonard and Hsu [19]. Hence, the empirical posterior probability of the $j$th model is given by

$$P(j\mid\mathbf{x})=\frac{\phi_j f_j(\mathbf{x})}{\sum_{k=1}^{r}\phi_k f_k(\mathbf{x})},\qquad j=1,2,\ldots,r.$$

The data on confirmed covid-19 positive patients are taken from Kerala state for the period 09-03-2020 to 23-04-2020. The daily bulletin of the data is issued by the covid-19 outbreak control and prevention state cell, health and family welfare department, directorate of health services, government of Kerala, and is available at https://dhs.kerala.gov.in/. The daily numbers of confirmed covid-19 patients are reported below:

3, 6, 8, 2, 3, 2, 3, 1, 12, 12, 15, 28, 14, 9, 19, 39, 6, 20, 32, 7, 24, 21, 9, 11, 8, 13, 8, 9, 12, 7, 10, 2, 3, 8, 1, 7, 1, 4, 2, 6, 19, 11, 10.

Between these dates a total of 447 persons tested covid-19 positive, and 129 persons were under treatment. To get an idea of the associated failure rate, we use a graphical method based on the total time on test (TTT) plot as a crude indicator; see Aarset [1]. The empirical TTT transform is given by

$$T\!\left(\frac{r}{n}\right)=\frac{\sum_{i=1}^{r}x_{(i)}+(n-r)\,x_{(r)}}{\sum_{i=1}^{n}x_{(i)}},$$

where $r=1,2,\ldots,n$ and $x_{(r)}$ is the $r$th order statistic of the sample. Figure 3 shows the TTT plot, which is concave, indicating that the data have an increasing failure rate. Thus, the data can be properly accommodated by an EED. The fit of the EED was checked using the PP plot and CDF plot given in Fig. 4; the graphical comparison of the empirical and theoretical cdf shows whether the EED explains the data well enough. By testing the hypothesis $\alpha=1$, we can check whether the EED fits the Kerala covid-19 data better than the one-parameter ED. The likelihood ratio (LR) test is discussed here for theoretical justification; see a recent work on the LR test by Pathak et al. [22].
For the test on the shape parameter $\alpha$, the null hypothesis is $H_0:\alpha=1$, and the test statistic

$$-2\ln\Lambda=-2\left[\ln L(1,\hat\lambda_0\mid\mathbf{x})-\ln L(\hat\alpha,\hat\lambda\mid\mathbf{x})\right]$$

is asymptotically distributed as $\chi^{2}_{1}$. Here $L(\hat\alpha,\hat\lambda\mid\mathbf{x})$ is the likelihood function of the EED, $(\hat\alpha,\hat\lambda)$ is the MLE, and $(1,\hat\lambda_0)$ is the MLE under $H_0$. The MLEs of the parameters for the Kerala covid-19 data are $\hat\alpha=1.572179$, $\hat\lambda=0.126561$, with $\ln L(\hat\alpha,\hat\lambda\mid\mathbf{x})=-141.6486$. The observed Fisher information matrix is evaluated at these estimates and inverted, which gives the asymptotic 95% intervals for the parameters summarized in Table 1. Under $H_0:\alpha=1$, the MLE of $\lambda$ is 0.096196, and $\ln L(1,\hat\lambda_0\mid\mathbf{x})=-143.6784$. Thus, the LR test statistic is $-2\ln\Lambda=2(143.6784-141.6486)=4.0596$, and the corresponding p value is 0.04391913 (so that $1-p=0.9560809$). Hence, we have enough evidence (i.e., more than 95% level of confidence) to reject the null hypothesis and conclude that the sub-model ED ($\alpha=1$) cannot give a good enough fit to the Kerala covid-19 data.

It is well known that the Kolmogorov-Smirnov (K-S) test is applicable to very small as well as large samples. The test assumes continuity of the distribution function and checks the difference between $F_n(x)$ and $F(x)$; see Gibbons and Chakraborti [12]. To test $H_0$ that the data follow the specified distribution, the test statistic is

$$D_n=\sup_{x}\left|F_n(x)-F(x)\right|,$$

where $F_n(x)$ is the sample (empirical) distribution function and $F(x)$ is the specified EED for all $x$. If the test statistic is less than the tabulated K-S critical value $D_{n,\gamma}$, then $H_0$ is accepted; otherwise it is rejected. The statistic $D_n$ is also used to obtain confidence bands on $F(x)$ for all $x$, where $F_n(x)$ is a consistent estimator of the cdf $F(x)$. The number $D_{n,\gamma}$ is obtained from the K-S table (critical value) such that

$$P\left(F_n(x)-D_{n,\gamma}\le F(x)\le F_n(x)+D_{n,\gamma}\ \text{for all }x\right)=1-\gamma,$$

where $0\le F(x)\le 1$, $\forall x$. Thus we define

$$L_n(x)=\max\left\{F_n(x)-D_{n,\gamma},\,0\right\}\qquad\text{and}\qquad U_n(x)=\min\left\{F_n(x)+D_{n,\gamma},\,1\right\},$$

where $L_n(x)$ and $U_n(x)$ are the lower and upper confidence bands for the cdf $F(x)$, with $100(1-\gamma)\%$ confidence coefficient. The hypothesized $F(x)$ lies completely within these limits if and only if the hypothesis cannot be rejected at the $100\gamma\%$ level of significance.
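The K-S distance and the confidence band can be sketched in a few lines. The block below is an illustration under stated assumptions: it plugs the reported MLEs into the EED distribution function, uses the 43 daily counts from the data section, and takes the large-sample 5% critical value as 1.36 divided by the square root of the sample size. Because the data contain ties and software packages treat ties differently, the distance computed here need not coincide exactly with the value reported in the text. Function names are illustrative.

```python
import math

# Daily confirmed covid-19 cases, Kerala, 09-03-2020 to 23-04-2020 (from the data section).
DATA = [3, 6, 8, 2, 3, 2, 3, 1, 12, 12, 15, 28, 14, 9, 19, 39, 6, 20, 32, 7, 24, 21,
        9, 11, 8, 13, 8, 9, 12, 7, 10, 2, 3, 8, 1, 7, 1, 4, 2, 6, 19, 11, 10]
ALPHA_HAT, LAMBDA_HAT = 1.572179, 0.126561   # reported MLEs

def eed_cdf(x, alpha=ALPHA_HAT, lam=LAMBDA_HAT):
    return (1.0 - math.exp(-lam * x)) ** alpha

def ks_distance(xs, cdf):
    """sup |F_n - F|, checked just before and at every jump of the empirical CDF."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def empirical_cdf(xs, x):
    return sum(1 for v in xs if v <= x) / len(xs)

def confidence_band(xs, x, d_crit):
    """Lower and upper band values max(F_n - d, 0) and min(F_n + d, 1) at the point x."""
    fn = empirical_cdf(xs, x)
    return max(fn - d_crit, 0.0), min(fn + d_crit, 1.0)

d_n = ks_distance(DATA, eed_cdf)
d_crit = 1.36 / math.sqrt(len(DATA))   # asymptotic 5% critical value (assumption)
print(d_n, d_crit, d_n < d_crit)
print(confidence_band(DATA, 10, d_crit))
```

A distance below the critical value means that the fitted distribution function stays inside the band at every observed point, which is the graphical criterion described above.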
Hence, the K-S test has been applied to the Kerala covid-19 data: the value of the test statistic is 0.093023 < 0.207398 (critical value), and the K-S bound plot in Fig. 5 shows that the EED provides a satisfactory fit to the Kerala covid-19 data. On all the above goodness-of-fit and model suitability criteria for the Kerala covid-19 data, the fitted density and empirical posterior probability plots in Fig. 6 and the sample quantile plot in Fig. 7 show that the EED provides a better fit than the ED. Also, from Table 2, it can be seen that the quantiles of the EED and ED at various points of the Kerala covid-19 data differ noticeably. The quantile values of the EED were found to be larger than those of the ED, and hence the EED captures more information than the ED. Moreover, the main advantage of the EED over the ED is that it can accommodate different types of hazard rate. According to the results shown in Table 3, the log-likelihood, p value and empirical posterior probability of the EED are larger than those of the ED for the Kerala covid-19 data, while the K-S distance, AIC and BIC are smaller. Therefore, on the basis of Table 3, we conclude that the EED satisfactorily models the Kerala covid-19 data.

Now, we obtain the MLEs of $\alpha$ and $\lambda$ and the corresponding Bayes estimates with non-informative priors for the Kerala covid-19 data. The MLEs of $\alpha$ and $\lambda$ are obtained from Eqs. (7) and (8), respectively. To compute them, we used the Newton-Raphson method with initial guesses for $\alpha$ and $\lambda$ taken from the contour plot shown in Fig. 8; the MLEs are verified from Fig. 9. The Bayes estimates under the prior in Eq. (9) are obtained as the expected values of $\alpha$ and $\lambda$ with respect to their marginal posteriors in Eqs. (11) and (12), respectively.
The Bayes estimates of $\alpha$ and $\lambda$ are obtained as $\hat\alpha_B$ and $\hat\lambda_B$, respectively, and the corresponding 95% HPD intervals are summarized in Table 4. The highest predictive density interval (HPDI) is the interval that includes the most probable values of a given predictive density at a given significance level, subject to the condition that the density function has the same value at the end points. For more about HPD intervals, the reader is referred to Box and Tiao [8]. We have reported certain HPD intervals for the Kerala covid-19 data; see Table 5. Figure 10 shows the posterior predictive density and distribution function for the future observations.

The EED studied in this article is flexible for modeling data with an increasing failure rate function. Based on different types of goodness-of-fit tests, we saw that the EED provides a better fit for the Kerala covid-19 data. The MLEs, Bayes estimates and nonparametric empirical posterior probability are derived. Across the various model selection criteria, as well as the proposed empirical posterior probability, we observe that the EED performs better than the ED. The MLE, AIC, BIC, CIs and HPDs are reported in Tables 3 and 4, respectively. Since we have no prior information about the data, Bayesian estimation is done under a non-informative prior. Moreover, with the help of Table 5 we have verified the Bayes predictive inference for one-sample future observations of a real data set that is positively skewed. Also, Table 5 indicates that over the next 43 days approximately 546 persons will be detected covid-19 positive. Thus we recommend that the EED may be used in future research on Bayesian inference with positively skewed data.
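The model comparison summarised above can be re-checked from the two log-likelihoods reported in the data analysis. The sketch below assumes the standard definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n, and computes the likelihood ratio statistic with its chi-square tail probability on one degree of freedom through the complementary error function. The constant and function names are illustrative.

```python
import math

LOGLIK_EED = -141.6486   # reported maximized log-likelihood, EED (2 parameters)
LOGLIK_ED = -143.6784    # reported maximized log-likelihood, ED (1 parameter)
N_OBS = 43               # number of daily counts

def lr_test_1df(loglik_full, loglik_null):
    """LR statistic and its chi-square(1) tail probability, P(chi2_1 > s) = erfc(sqrt(s/2))."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, math.erfc(math.sqrt(stat / 2.0))

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)

stat, p_value = lr_test_1df(LOGLIK_EED, LOGLIK_ED)
print("LR statistic:", stat, "p value:", p_value)
print("AIC  EED:", aic(LOGLIK_EED, 2), " ED:", aic(LOGLIK_ED, 1))
print("BIC  EED:", bic(LOGLIK_EED, 2, N_OBS), " ED:", bic(LOGLIK_ED, 1, N_OBS))
```

The tail probability obtained this way is about 0.0439, in agreement with the figure 0.04391913 quoted in the data analysis, and both criteria come out lower for the EED than for the ED.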
References
How to identify a bathtub hazard rate
A Bayesian analysis of the minimum AIC procedure
Bayes inference under a finite mixture of two-compound Gompertz components model
The outbreak of coronavirus disease 2019 (covid-19)-an emerging global health threat
Bayesian prediction intervals for ranges and waiting times
Statistical decision theory and Bayesian analysis
Bayesian inference in statistical analysis
Emerging coronaviruses: genome structure, replication, and pathogenesis
Predictive inference
Bayesian data analysis
Nonparametric statistical inference: revised and expanded
Modeling failure time data by Lehman alternatives
Exponentiated exponential family: an alternative to gamma and Weibull distributions
An invariant form for the prior probability in estimation problems
Theory of probability
Outbreak prediction of covid-19 for dense and populated countries using machine learning
Monitoring novel corona virus (covid-19) infections in India by cluster analysis
Bayesian methods: an analysis for statisticians and interdisciplinary researchers
An outbreak of NCIP (2019-nCoV) infection in China-Wuhan
What are the underlying transmission patterns of covid-19 outbreak? An age-specific social contact characterization
Bayesian inference: Weibull Poisson model for censored data using the expectation-maximization algorithm and its application to bladder cancer data
Coronavirus infections-more than just the common cold
Estimating the dimension of a model
Optimization based data mining: theory and applications
Estimation of parameters and reliability function of exponentiated exponential distribution: Bayesian approach under general entropy loss function
Estimation of parameters of generalized inverted exponential distribution for progressive type-II censored sample with binomial removals
Bayesian estimation for Poisson-exponential model under progressive type-II censoring data with binomial removal and its application to ovarian cancer data
Joint modeling of longitudinal CD4 count and time-to-death of HIV/TB co-infected patients: a case of Jimma University specialized hospital
Internet of things, real-time decision making, and artificial intelligence
WHO (2020) Coronavirus disease 2019 (covid-19) situation report-71

Acknowledgements The authors are very thankful to the editor-in-chief, associate editors and referees for their valuable suggestions, which led to improvements in this paper.

Conflict of interest On behalf of all authors, the corresponding author states that there is no conflict of interest and that no funding was received for conducting this study.