key: cord-281122-dtgmn9e0
authors: Ribeiro, Matheus Henrique Dal Molin; Mariani, Viviana Cocco; Coelho, Leandro dos Santos
title: Multi-step ahead meningitis case forecasting based on decomposition and multi-objective optimization methods
date: 2020-09-22
journal: J Biomed Inform
DOI: 10.1016/j.jbi.2020.103575
sha:
doc_id: 281122
cord_uid: dtgmn9e0

Epidemiological time series forecasting plays an important role in public health systems because it allows managers to develop strategic planning to avoid possible epidemics. In this paper, a hybrid learning framework is developed to forecast multi-step-ahead (one, two and three-month-ahead) meningitis cases in four states of Brazil. First, the proposed approach applies ensemble empirical mode decomposition (EEMD) to decompose the data into intrinsic mode functions and a residual component. Then, each component is used as the input of five different forecasting models, from which forecasted results are obtained. Next, all combinations of models and components are developed and, for each case, the forecasted results are combined by weighted integration (WI) to formulate a heterogeneous ensemble forecaster for the monthly meningitis cases. In the final stage, a multi-objective optimization (MOO) using the Non-Dominated Sorting Genetic Algorithm, version II (NSGA-II) is employed to find a set of candidate weights, and then the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) is applied to choose the adequate set of weights. The most adequate model is the one with the best out-of-sample generalization capacity in terms of performance criteria including mean absolute error (MAE), relative root mean squared error (RRMSE) and symmetric mean absolute percentage error (sMAPE). By using MOO, the intention is to enhance the performance of the forecasting models by simultaneously improving their accuracy and stability measures.
To assess the model's performance, metric-based comparisons are conducted with: (i) EEMD with a heterogeneous ensemble integrated by the direct strategy, or simple sum; (ii) EEMD with a homogeneous ensemble of components combined by WI; (iii) models without signal decomposition. At this stage, the MAE, RRMSE and sMAPE criteria and the Diebold–Mariano statistical test are adopted. In all twelve scenarios, the proposed framework was able to perform more accurate and stable forecasts, and in 89.17% of the cases the errors of the proposed approach were statistically lower than those of the other approaches. These results show that combining EEMD, a heterogeneous ensemble and WI with weights obtained by optimization can produce precise and stable forecasts. The modelling developed in this paper is promising and can be used by managers to support decision making.

specific diseases. These approaches are adopted for different purposes such as comparing, implementing, evaluating, prevention, therapy, and the development of public policies [4]. Usually, epidemic models are based on parameters related to susceptible (S), infected (I), and removed (R) individuals; exposed (E) individuals can also be considered, which leads to the SIR or SEIR models. Each variation of these models has its particularity, and different factors can be considered in these approaches to provide knowledge about the disease spread. Nowadays, these approaches have been proposed to understand the spread of the new coronavirus [5, 6]. Also, different mathematical approaches are proposed to mitigate the effects of several diseases such as Ebola [7], influenza [8], and malaria [9]. In recent years, artificial intelligence (AI), a computer science field able to recognize patterns in historical data and support decision making, has received attention for solving problems in commerce [10] and industry [11].
Machine learning, an AI sub-field, has become the core of data analysis, dealing with classification [12], data clustering [13], and regression tasks [14]. Nonetheless, when it comes to diseases that plague the Brazilian public health system, such as dengue, malaria, and others, there are limited discussions regarding the effectiveness of machine learning for developing predictive models. Some of these studies aimed to define the incidence of diseases such as ventriculitis and meningitis [15], as well as

the original signal is split into five components (four IMF and one residue). In the sequence,

average (ARIMA), and seasonal autoregressive integrated moving average (SARIMA). Figure 1 associates the diseases and the adopted modeling.

Figure 1: Related diseases and their adopted modeling.

In addition to the above, as well as to the works presented in Appendix A, some gaps in the developed approaches can be found and are stated as follows:

• Considering the type of disease, around 93.75% of the papers focused on malaria, dengue or influenza. Hence, there is a lack of discussion concerning the predictive capacity of machine learning-based approaches for diseases such as measles, meningitis, and chikungunya in the forecasting task;

• In the modeling aspect, only four papers focused on ensemble approaches such as bagging and boosting or models combined by average.
One paper used an optimization approach from the swarm intelligence field, the firefly algorithm (FFA), for hyperparameter tuning, and no paper adopted signal decomposition or MOO with the purpose of building ensembles. It is well known that the combination of these strategies can help to improve the model's accuracy and, therefore, its out-of-sample generalization;

(test set, out-of-sample forecasting) in the same Brazilian state, but in different splitting setups of the datasets. In this way, we tried to accommodate the data variability for each state.

Figure 2 shows the study areas, the behavior of the number of notified cases by state

When the EMD is used, the main drawback is named "mode mixing", that is, each

in which $y_i$ is the $i$-th output value $(i = 1, \ldots, n)$, $s$ is the number of neurons, $x_{ij}$ is an input value, and $\theta = [w_1, \ldots, w_k, b_1, \ldots, b_k, \beta^{1}_{1}, \ldots, \beta^{1}_{p}, \beta^{s}_{1}, \ldots, \beta^{s}_{p}]$ is the vector of weights and biases,

1. Minimization of

in which $F(\cdot)$ is a function of $\theta$, and $y_i$ and $\hat{y}_i$ are the observed and estimated values, respectively.
Also, $\sigma_e^2$ and $\sigma_\theta^2$ are the variances of the errors and of the weights and biases, respectively;

Considering that QRF uses quantiles in the prediction process, the $\alpha$-quantile of the CDF is stated as the value $Q_\alpha$ such that the probability that the number of notifications is lower than $Q_\alpha$, given $p_t$, is equal to $\alpha$, where the estimate of $\alpha$ is stated as follows:

The PLS regression approach is a technique to analyze multivariate data, in which the aim is to relate one or two output variables (Y) with several inputs (X). For this purpose, given a linear model, a problem that often arises is that the matrix of inputs is singular. To deal with this problem, PLS decomposes X into orthogonal scores T

subject to inequality constraints in which $\theta = [\theta_1, \ldots, \theta_i]$ $(i = 1, \ldots, n)$ is a vector of decision variables, $L$ and $U$ are the lower and upper boundaries for each decision variable, and $J_k(\theta)$ is the $k$-th objective to be minimized or maximized.

In this respect, during the MOO step, an optimization algorithm is applied to find the

As regards the MOO approach, most of the algorithms proposed in the literature are

The NSGA-II parameters adopted in this paper are presented in Section 5, item 4b.

Lastly, in the MCDM step, it is possible to find a preferable set of decision variables (weights in this paper) that allows dealing with the trade-off between the objectives. It

Section 5, item 4c.
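The MCDM step can be illustrated with a minimal TOPSIS sketch in Python. It is not the authors' implementation: it assumes that every objective is minimized (as bias and variance would be), that the criteria receive equal weights unless stated otherwise, and that vector normalization is used. The function name `topsis_select` and the three-point Pareto front are hypothetical.

```python
import math


def topsis_select(objectives, criteria_weights=None):
    """Pick the candidate closest to the ideal point (all objectives minimized).

    objectives: list of rows, one row of objective values per Pareto candidate.
    Returns (index of the preferred candidate, list of closeness scores).
    """
    n_obj = len(objectives[0])
    if criteria_weights is None:
        # Assumption: equal importance for every objective.
        criteria_weights = [1.0 / n_obj] * n_obj

    # Vector-normalize each objective column, then apply the criteria weights.
    norms = [math.sqrt(sum(row[k] ** 2 for row in objectives)) or 1.0
             for k in range(n_obj)]
    scaled = [[criteria_weights[k] * row[k] / norms[k] for k in range(n_obj)]
              for row in objectives]

    # For minimized objectives the ideal point takes the column minima
    # and the anti-ideal point takes the column maxima.
    ideal = [min(row[k] for row in scaled) for k in range(n_obj)]
    anti_ideal = [max(row[k] for row in scaled) for k in range(n_obj)]

    closeness = []
    for row in scaled:
        d_plus = math.dist(row, ideal)        # distance to the ideal point
        d_minus = math.dist(row, anti_ideal)  # distance to the anti-ideal point
        total = d_plus + d_minus
        closeness.append(d_minus / total if total > 0 else 0.0)

    best = max(range(len(closeness)), key=closeness.__getitem__)
    return best, closeness


# Hypothetical Pareto front: each row is (objective 1, objective 2).
pareto_front = [[1.0, 9.0], [3.0, 3.0], [9.0, 1.0]]
best_index, scores = topsis_select(pareto_front)
print(best_index, scores)
```

In this toy front the balanced candidate is preferred over the two extremes, which is the kind of compromise between competing objectives that the MCDM step is meant to provide.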
and forecast the meningitis cases according to the recursive method, as given by Eq. (8), in which $f$ is a function related to the adopted model in the training process, $\hat{y}_{(t+h,k)}$

[65], the recursive strategy uses forecasted values as model inputs to forecast the next values. Its main disadvantage is that it accumulates the previous forecasting errors in the recursive process. However, the advantage of the recursive method lies in the use of one model for the whole process, i.e., one model is trained to forecast a one-step-ahead horizon and is then used for the multi-step-ahead forecasting task. On the other hand, the direct method uses only past values to predict the future, which is its advantage, as it does not accumulate prediction errors. However, its disadvantage lies in the necessity to

Machine learning models sometimes have high training times, either due to the use of different training strategies such as cross-validation (k-fold or LOOCV-TS), or due to the number of parameters to be tuned. In this context, because several models are evaluated in this paper to find an efficient ensemble learning forecasting model for meningitis cases, the recursive forecasting strategy is adopted. Moreover, the forecasting horizons are defined as one, two, and three months ahead, which are considered short-term. Therefore, even though the recursive method could lead to high forecasting errors, it is used in this paper due to its lower computational cost. However, the direct method could also be considered for this study.

In this paper, the control hyperparameters of the adopted models in step 2 are obtained

EEMD components. Therefore, the methodology used to select the models' order is the grid-search.
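The size of this grid follows from assigning one of the five models (BRNN, CUBIST, GBM, QRF, PLS) to each of the five EEMD components (four IMFs and one residue), which gives 5^5 = 3125 candidate ensembles. The short Python sketch below only enumerates these assignments. The component labels `IMF1` to `IMF4` and `Residue` and the function name `enumerate_ensembles` are illustrative, not taken from the paper.

```python
from itertools import product

# Five forecasting models named in the paper.
MODELS = ["BRNN", "CUBIST", "GBM", "QRF", "PLS"]
# Assumed labels for the four IMFs and the residue produced by EEMD.
COMPONENTS = ["IMF1", "IMF2", "IMF3", "IMF4", "Residue"]


def enumerate_ensembles(models=MODELS, components=COMPONENTS):
    """Return every assignment of one model to each component."""
    return [dict(zip(components, combo))
            for combo in product(models, repeat=len(components))]


ensembles = enumerate_ensembles()
print(len(ensembles))  # 5 ** 5 = 3125
print(ensembles[0])    # first assignment: BRNN for every component
```

Each dictionary corresponds to one candidate heterogeneous ensemble whose component forecasts would then be combined and evaluated.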
Table 1 shows a sample of 3 out of the 3125 ensemble learning models, randomly selected, where the order of models for each component is detailed.

(a) In the MOP, the cost function, for each combination of models and components, is stated as follows:

$$\hat{y}_{t+h} = \sum_{j=1}^{n=4} \theta_j \, \mathrm{IMF}_{\mathrm{Model}_j} + \theta_5 \, \mathrm{Residue}_{\mathrm{Model}_5}, \qquad (9)$$

Considering the bias-variance framework [57], the objectives are defined as follows:

where $\hat{\theta}_j$ is the estimated weight;

6. Computing the performance measures MAE, RRMSE and sMAPE given by,

[71] is applied. In this paper, the lower-tail a priori hypothesis H is given by Eq. (16), and the statistic of the DM test is given by Eq. (17),

In addition, Figure 3 presents the modeling process.

residue, to forecast the number of meningitis cases one, two and three months ahead. According to Zhang et al. [77], this effectiveness is associated with the diversity provided by the heterogeneous ensemble approach, which is an efficient and simple way to improve forecasting accuracy and stability (lower standard deviation of the errors). This makes the predictive model more robust.

Concerning the one-month-ahead forecast, for all states, the proposed framework achieves better accuracy than the compared models.
Regarding the MAE criterion, the forecasting errors of the compared models exceed those of the proposed methodology by between 22.73% and 36.36%, 36.87% and 138.22%, 0.72% and 54.35%, as well as 41.51% and 66.04%

Comparison II is designed to verify the forecasting performance of the proposed hybrid framework by comparing it with five models which do not consider EEMD for signal decomposition, namely BRNN, CUBIST, GBM, QRF, and PLS. The comparisons are shown in Table 3 and additional discussion is presented. From these results, it is observed that the use of signal decomposition, specifically the EEMD approach, can enhance the model's performance. This shows that the EEMD approach is suitable for decomposing

The results of the one-month-ahead up to three-month-ahead forecasts for meningitis cases were obtained. The comparisons between the proposed approach and the models without

The EEMD-HTE-MOO presents a better generalization capacity than the EEMD-HTE-

concerning the use of the DI strategy for the task of grouping the EEMD components.

Therefore, regarding the results presented in Subsections 6.1, 6.2, 6.3 and 6.4, the proposed approach showed better accuracy than the compared models.
In parallel, the proposed framework achieves excellent performance in 83.33% and 25% of the cases, and good results

Considering what is shown by Figures 4a, 5a, 6a and 7a, evidence of a trade-off among the objectives adopted in the MOO can be seen; in other words, depending on the weights used to create the ensemble obtained from the EEMD, the bias increases while the variance decreases. In this respect, the use of MOO is adequate because it yields an efficient model that reaches small forecast errors and a lower standard deviation of the errors. The same behavior is replicated for the other two forecast horizons.

During this round, Figures 4b, 5b, 6b and 7b show that the data behavior is learned by the models in most of the cases, which allows predictions compatible with the observed values. That is, the forecasted meningitis cases are close to the observed values. The good accuracy obtained in the training phase persists in the test stage, indicating that the hybrid framework is robust in producing the developed predictions.

Overfitting occurs when the model fits the training set well but does not generalize to the test set or to out-of-sample forecasting. To avoid it, two approaches were considered. First, each adopted model was trained using a cross-validation procedure, as described in the methodology section. Second, because bias and variance are the objectives adopted in the multi-objective optimization, the trade-off between these measures is taken into account, which helps to control overfitting.
Also, as illustrated through the predicted and observed values (Figures 4b, 5b, 6b and 7b), since a similar performance is observed in the training and test sets, there is no evidence of overfitting.

assembling of MCDM techniques. Also, it is desirable to compare the recursive and direct methods to perform multi-step-ahead forecasting for the proposed task.
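As a point of reference for such a comparison, the recursive strategy adopted in this paper can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: `one_step_model` stands for any trained one-step-ahead forecaster, and `mean_of_lags` and the case counts are hypothetical placeholders. A direct strategy would instead fit one separate model per horizon.

```python
def recursive_forecast(one_step_model, history, n_lags, horizon):
    """Forecast `horizon` steps ahead by feeding predictions back as inputs.

    one_step_model: callable mapping a list of the last `n_lags` values
    to a one-step-ahead forecast (stands in for any trained model).
    """
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step_model(window)
        forecasts.append(y_hat)
        # Slide the window: drop the oldest value and append the forecast,
        # so any forecasting error is carried into the next step.
        window = window[1:] + [y_hat]
    return forecasts


def mean_of_lags(window):
    """Toy placeholder model: the average of the lagged values."""
    return sum(window) / len(window)


# Hypothetical monthly case counts, forecast one to three months ahead.
monthly_cases = [12, 15, 11, 14, 13, 16]
print(recursive_forecast(mean_of_lags, monthly_cases, n_lags=3, horizon=3))
```

Only one model is trained and then reused for every horizon, which is the computational advantage noted in the methodology; the cost is that each forecast becomes an input of the next one.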
Appendix A. Summary of related works

Compared with the LASSO and LSTM approaches, the proposed framework reaches better improvement than these approaches for the adopted task.

Journal Pre-proof

References

Viral (aseptic) meningitis: A review
Current meningitis outbreak in Ghana: Historical perspectives and the importance of diagnostics
Departamento de Informática do Sistema Único de Saúde (DATASUS)
The mathematics of infectious diseases
COVID-ABS: An agent-based model of COVID-19 epidemic to simulate health and economic effects of social distancing interventions
Modeling and forecasting the COVID-19 pandemic in India
An Ebola model with hyper-susceptibility
A Bayesian system to detect and characterize overlapping outbreaks
Modeling pyrethroids repellency and its role on the bifurcation analysis for a bed net malaria model
An artificial intelligence system for predicting customer default in e-commerce
Opportunities and challenges of artificial intelligence for green manufacturing in the process industry
A novel bagging C4.5 algorithm based on wrapper feature selection for supporting wise clinical decision making
Predicting temporal propagation of seasonal influenza using improved Gaussian process model
Healthcare-associated ventriculitis and meningitis in a neuro-ICU: Incidence and risk factors selected by machine learning approach
Mapping the transmission risk of zika virus using machine learning models
Complementing the power of deep learning with statistical model fusion: Probabilistic forecasting of influenza in Dallas County
Multi-objective ensemble model for short-term price forecasting in corn price time series. International Joint Conference on Neural Networks (IJCNN)
Ensemble learning by means of a multi-objective optimization design approach for dealing with imbalanced data sets
Ensemble methods in machine learning
Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series
Bayesian interpolation
Combining instance-based and model-based learning. ICML'93
Instance-based learning algorithms
Greedy function approximation: A gradient boosting machine. The
Quantile regression forests
Random forests
Using quantile regression forest to estimate uncertainty of digital soil mapping products
The multivariate calibration problem in chemistry solved by the PLS method
Partial least-squares regression: a tutorial
Automatic clustering-based identification of autoregressive fuzzy inference models for time series
Identification of lags in nonlinear autoregressive time series using a flexible fuzzy model
Forecasting third-party mobile payments with implications for customer flow prediction
Multiple steps ahead solar photovoltaic power forecasting based on univariate machine learning models and data re-sampling
Multi-step ahead forecasting of heat load in district heating systems using machine learning algorithms
Forecasting: principles and practice
A novel combined model based on advanced optimization algorithm for short-term wind speed forecasting
Comparing predictive accuracy
R: A Language and Environment for Statistical Computing
Coyote optimization algorithm: A new metaheuristic for global optimization problems
Cultural coyote optimization algorithm applied to a heavy duty gas turbine operation. Energy Convers Manage
Metaheuristic inspired on owls behavior applied to heat exchangers design
Design of heat exchangers using Falcon optimization algorithm
A support vector machine-firefly algorithm based forecasting model to determine malaria transmission
Developing a dengue forecast model using machine learning: A case study in China
Modeling dengue vector population using remotely sensed data and machine learning
The utility of LASSO-based models for real time forecasts of endemic infectious diseases: A cross country comparison
Employing machine learning techniques for the malaria epidemic prediction in Ethiopia. United Kingdom
Real time influenza monitoring using hospital big data in combination with machine learning methods: Comparison study
Forecasting influenza epidemics by integrating internet search queries and traditional surveillance data with the support vector machine regression model in Liaoning
A GIS-based artificial neural network model for spatial distribution of tuberculosis across the continental United
A comparison of three data mining time series models in prediction of monthly brucellosis surveillance data
A novel data-driven model for real-time influenza forecasting
Artificial neural network based prediction of malaria abundances using big data: A knowledge capturing approach
Forecasting dengue epidemics using a hybrid methodology
influenza activity using self-adaptive AI model and multi-source data in Chongqing

Declaration of interests

(X) The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

( ) The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: