key: cord-0878930-dafll0s6
authors: Ak, Ç.; Ergönül, Ö.; Gönen, M.
title: A prospective prediction tool for understanding Crimean–Congo haemorrhagic fever dynamics in Turkey
date: 2019-05-24
journal: Clin Microbiol Infect
DOI: 10.1016/j.cmi.2019.05.006
sha: 5cb8afcb4fe6d5261f741ea877818fe3c2b69b84
doc_id: 878930
cord_uid: dafll0s6

OBJECTIVES: We aimed to develop a prospective prediction tool on Crimean–Congo haemorrhagic fever (CCHF) to identify geographic regions at risk. The tool could support public health decision-makers in implementation of an effective control strategy in a timely manner.

METHODS: We used monthly surveillance data between 2004 and 2015 to predict case counts between 2016 and 2017 prospectively. The Turkish nationwide surveillance data set collected by the Ministry of Health contained 10 411 confirmed CCHF cases. We collected potential explanatory covariates about climate, land use, and animal and human populations at risk to capture spatiotemporal transmission dynamics. We developed a structured Gaussian process algorithm and prospectively tested this tool predicting the future year's cases given past years' cases.

RESULTS: We predicted the annual cases in 2016 and 2017 as 438 and 341, whereas the observed cases were 432 and 343, respectively. Pearson's correlation coefficient and normalized root mean squared error values for 2016 and 2017 predictions were (0.83; 0.58) and (0.87; 0.52), respectively. The most important covariates were found to be the number of settlements with fewer than 25 000 inhabitants, latitude, longitude and potential evapotranspiration (evaporation and transpiration).

CONCLUSIONS: Main driving factors of CCHF dynamics were human population at risk in rural areas, geographical dependency and climate effect on ticks. Our model was able to prospectively predict the numbers of CCHF cases.
Our proof-of-concept study also provided insight for understanding possible mechanisms of infectious diseases and found important directions for practice and policy to combat emerging infectious diseases.

Crimean–Congo haemorrhagic fever (CCHF) is a tick-borne viral infection usually transmitted by tick bites, or through contact with tissues, blood or other bodily fluids from infected people and animals [1]. Turkey has the highest case count among the countries where the disease remains endemic. Hyalomma marginatum ticks are the primary vectors, and they feed on animals at each developmental stage. Both wild and domesticated animals are important in the disease transmission cycle, serving as reservoirs for the continuation of tick re-infection. People working or living close to livestock or to habitats of the vector ticks are particularly at risk. Human-to-human transmission is possible, typically among health-care workers or care-givers. When the possibility for enzootic transmission exposure increases, the risk of CCHF virus infection for humans increases as well [2].

Environmental changes can influence both the survival and reproduction of H. marginatum ticks and may then trigger community outbreaks. For example, neglect of agricultural lands and agricultural reforms causing landscape alterations may be an important factor for the emergence of CCHF. The investigation of those environmental factors that may influence the cycle of CCHF is relevant for outbreak preparedness and response. Some of the seasonal and climatic covariates were previously reported as important predictors of CCHF virus infections [3–5]. Areas with higher temperatures, precipitation and humidity were linked with high CCHF occurrence in Bulgaria and Iran [4, 5]. Suitable habitat for H. marginatum ticks was reported as fragmented agricultural lands, forested lands and grass cover in Turkey and Bulgaria, and non-irrigated agricultural land (e.g.
pasture) was found to be correlated with CCHF case counts in Turkey [5–7].

The use of spatiotemporal modelling tools might help us better understand the characteristics of established outbreaks to develop different types of interventions to prevent and treat diseases. Predicting the emergence is not realistic because there are so many variables; nevertheless, predicting the spatial and temporal trajectory is feasible and probably more effective [8]. Such studies were carried out for Ebola, Zika, H1N1 influenza, and severe acute respiratory syndrome viruses, and the results of these studies helped decision-makers to plan bed capacity [9], anticipate travel-related spread [10] and plan vaccine trials [11]. World-wide CCHF retrospective risk maps were reported using the published cases [12]; however, a prospective risk analysis based on a comprehensive set of data including climatic, environmental and husbandry parameters is still lacking.

Turkey has the highest number of laboratory-confirmed CCHF cases. Monthly data covering 14 years and comprising >10 000 cases could be valuable for understanding the spatiotemporal dynamics of disease spread. We have already presented the improved performance of a structured Gaussian process (GP) over machine-learning algorithms frequently used in ecological and epidemiological applications [13]. Here we describe the spatiotemporal dynamics of CCHF and extract the important covariates for CCHF virus infection using a structured GP method on the surveillance data set for Turkey. We tested the generalization capability of our approach by predicting where and how many CCHF cases will be observed in each month in 2016 and 2017 prospectively.

The surveillance data consist of monthly case counts (i.e. observations) for each province. Our regression model takes the past case counts and covariate information as inputs and outputs a numeric value as the future case count. The date (i.e. month and year) and location (i.e.
province) of the laboratory-confirmed CCHF cases in Turkey between January 2004 and December 2015 were obtained from the Ministry of Health to train our predictive model. We were provided with the surveillance data between January 2016 and December 2017 after we made our predictions for those years. In our study, the province centres were used as the case locations.

We collected over 50 potentially related spatial and temporal covariates for use as input in our model. These covariates are listed in the Supplementary material (Table S1). Detailed interpretations of the covariates are presented at http://midas.ku.edu.tr/ProspectiveCCHF. Latitudes, longitudes and altitudes of province centres were taken from the website of the General Directorate of Highways (http://www.kgm.gov.tr). The remaining spatial covariates were obtained from the Census of Agriculture Agricultural Holdings (Households) of Turkey, which can be found on the website of the Turkish Statistical Institute (http://www.turkstat.gov.tr).

Year and month information was extracted from the surveillance data given. CCHF cases had been observed frequently during hot months (e.g. May, June and July), moderately during warm months (e.g. April, August and September) and rarely during cold months (e.g. October, November, December, January, February and March). We encoded each time period by three temporal covariates: the year, month and seasonal group (i.e. hot, warm or cold) to which it belonged. Climate covariates were taken from the Climatic Research Unit database [14], and other temporal covariates were obtained from the website of the Turkish Statistical Institute. The number of households was divided by the total population of each province and land-related covariates were divided by the total area of each province to make these covariates comparable across different provinces.

Gaussian process regression is a machine-learning algorithm that finds a relation between an output y (e.g.
CCHF cases) and a set of inputs x (e.g. longitude, latitude, date, etc.). The main assumption of this model is that there is an unobserved or latent function f that depends on x, but we only have access to a noisy version of it, y. This unobserved variable is a GP with the mean vector m and covariance matrix S, which depends on the inputs [15].

In this study, we formulated a GP model with a Kronecker decomposition approach for spatiotemporal modelling, named structured GP, to learn covariance functions for both knowledge extraction and prediction. Our main hypothesis about the spatiotemporal processes is that response values depend on both time and location. We need a kernel function (i.e. covariance function) that makes nearby observations in time and/or space produce similar values. Each spatial and temporal covariate is fed into a kernel function for structured GP (see Supplementary material, Appendix S1, for a detailed description). We get a better understanding of the underlying dynamics of the process to be modelled when data can be explained with fewer covariates, which may be hidden or latent factors that in combination play greater roles in the observed dynamics. To find these fewer but important covariates, we optimized each covariate's relative importance.

For the 2016 prediction, we used the years 2004–2015 as training sets (81 provinces × 144 months). We then used the trained model to predict case counts of 81 provinces for 2016 (81 provinces × 12 months). For the 2017 prediction, we used the years 2004–2016 as training sets (81 provinces × 156 months). We then used the trained model to predict case counts of 81 provinces for 2017 (81 provinces × 12 months).

A study in Turkey found that areas with CCHF cases had lower mean temperatures in the late autumn and the winter [16]. We used the fact that vector-borne disease dynamics are affected by the previous year's weather conditions, animal population, etc.
because vector abundance is also affected by these. Hence, the covariates of a given year are used to predict the case counts of the following year. We trained our model using all spatial covariates, the temporal covariates between 2003 and 2014 and the case counts for years 2004–2015; then, given all the spatial covariates, the temporal covariates of the year 2015 and the learned parameters from our trained model, we predicted the cases for 2016. The same approach was applied for the 2017 predictions. We focused on prospective predictions of the years 2016 and 2017. Prediction for any given year can be done given the covariates of the previous year.

The Pearson's correlation coefficient and normalized root mean squared error were used to measure the prediction performance. Computational modelling was performed using the statistical software package R [17]. The input covariates, nationwide CCHF surveillance data set and our computational results reported in this study can be publicly explored and downloaded at http://midas.ku.edu.tr/ProspectiveCCHF/.

In Turkey, 10 411 confirmed CCHF cases were reported between years 2004 and 2017, mainly from April to October, and yearly epidemic curves peaked around June and July (Fig. 1a). Most of these confirmed CCHF cases were reported in north and northeast regions of Anatolia (Fig. 1b). Detailed interpretations of the case counts are presented at http://midas.ku.edu.tr/ProspectiveCCHF/.

We predicted the nationwide annual case count for 2016 as 438, whereas the observed case count was 432 (Fig. 2). Similarly, we predicted the nationwide annual case count for 2017 as 341, whereas the observed case count was 343 (Fig. 3). Pearson's correlation coefficient and normalized root mean squared error values for the 2016 prediction scenario are 0.83 and 0.58, respectively. For the 2017 prediction, Pearson's correlation coefficient is 0.87 and normalized root mean squared error is 0.52.
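The one-year lag between covariates and case counts, and the two performance measures reported above, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' R implementation: the function names (`pair_lagged`, `pearson_correlation`, `nrmse`) are hypothetical, and the text does not state how the root mean squared error was normalized, so the sketch assumes normalization by the spread of the observed values around their mean.

```python
import math


def pair_lagged(covariates_by_year, cases_by_year, lag=1):
    """Pair the covariates of year (t - lag) with the case counts of year t.

    Both arguments are dicts keyed by year; years without lagged covariates are skipped.
    """
    pairs = []
    for year in sorted(cases_by_year):
        source_year = year - lag
        if source_year in covariates_by_year:
            pairs.append((covariates_by_year[source_year], cases_by_year[year]))
    return pairs


def pearson_correlation(observed, predicted):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def nrmse(observed, predicted):
    """Root mean squared error divided by the standard deviation of the observations.

    ASSUMPTION: this normalization is chosen for illustration only; the source
    does not specify which one was used.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total_variation = sum((o - mean_o) ** 2 for o in observed)
    return math.sqrt(squared_error / total_variation)


if __name__ == "__main__":
    # Toy monthly counts for one province (made-up numbers, not surveillance data).
    observed = [0, 2, 4, 6]
    predicted = [1, 2, 4, 5]
    print(pearson_correlation(observed, predicted))
    print(nrmse(observed, predicted))
```

Under the assumed normalization, a model that always predicts the mean of the observed counts scores exactly 1, so values below 1 indicate an improvement over that baseline.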
Each month's prediction for all provinces on a map can be seen at http://midas.ku.edu.tr/ProspectiveCCHF/.

Latitude and number of settlements with <25 000 inhabitants covariates of provinces (i.e. spatial covariates) and monthly potential evapotranspiration (evaporation and transpiration) measurements (i.e. temporal covariate) were found to be the most explanatory covariates for the 2016 prediction (see Supplementary material, Fig. S1a,b). In the 2017 prediction, number of settlements with <25 000 inhabitants and longitude covariates of provinces (i.e. spatial covariates) and monthly potential evapotranspiration measurements (i.e. temporal covariate) were the most important covariates (see Supplementary material, Fig. S1c,d).

Turkey has the highest number of laboratory-confirmed CCHF cases, and we included all 10 411 CCHF cases in our computational analyses. We used a unified model including a rich collection of spatial and temporal data sources to determine the relative importance of each data source. We evaluated our approach by performing monthly predictions for each province in a prospective manner.

The latitude, longitude and number of settlements with <25 000 inhabitants were found to be the most important spatial covariates for predicting CCHF case counts prospectively. Potential evapotranspiration and season were found to be the most informative temporal covariates for both the 2016 and the 2017 predictions. The importance of the number of settlements with <25 000 inhabitants could be related to the human population at risk living close to the habitat of ticks and animals, as these settlements are usually situated in rural areas where people are engaged in agricultural activities. The number of settlements with <25 000 inhabitants is important for both years, but latitude and longitude switched their rankings in terms of importance.
This finding is in line with the increased number of CCHF cases in eastern parts in later years, which can be better captured by longitude rather than latitude.

Evapotranspiration is a climate variable and is defined as the total water vapour produced in a water basin as a result of the growth of plants there. Potential evapotranspiration is the evapotranspiration that occurs when sufficient water is available for a surface completely covered with plants; that is, when plants receive the ideal amount of water. It is also clear that the season covariate determines the temporal behaviour of CCHF or other seasonal infectious diseases in general. These two important temporal covariates confirm the role of the climate in the underlying mechanism of CCHF. Careful follow-up of these covariates may provide possible warnings in the short term instead of having to wait for yearly predictions from our model. Higher temperature was previously found to be a main driver of the abundance of H. marginatum [1, 12, 16, 18] because high temperatures may accelerate the life cycle of ticks and so increase host questing.

In our study, we found that yearly changes in the land involving olive trees, fallow land and forest land were more important than the animal population (see Supplementary material, Fig. S1b,d). Our findings were consistent with those of another report in which the land cover, rather than climate and animal population, was found to be the main driver for world-wide distribution of CCHF. Those authors commented that these factors might be more important in predicting finer-scale prevalence patterns [12]. We used the annual data of husbandry from the Turkish Statistical Institute for the first time, and our model was able to reveal the importance of different animal groups (see Supplementary material, Fig. S1b,d).
In our model for the 2017 prediction, goats, cattle and sheep, in that order, were found to be the most significant animals for CCHF dynamics and spread. These findings contradict the observations of veterinarians in the field, who claim that bovine/cattle livestock are more important than goat livestock in the transmission cycle of the virus. This contradiction implies that there are some other underlying reasons, such as farmers' behaviour: those caring for the goats might come into direct hand contact with them, with or without protection. We must take into account the possible reasons why a covariate is chosen and take precautions accordingly. The importance of covariates that may be related to human action indicates that awareness is lacking in some parts of the country about the presence of CCHF or precautions against CCHF. Our model identifies the directions to which we should pay close attention with high priority. For instance, in the areas with high goat, cattle or sheep density, agricultural workers and others working with animals should also be monitored and must be informed about CCHF. For further investigation, tick abundance studies in the field should be developed and improved.

Annual predictions for 2016 and 2017 are accurate, but the predictions for individual provinces are less accurate (Figs. 2 and 3). Predicting the total number of cases from overall seasonality is easier than capturing spatial dependencies because the temporal behaviour of the data is largely governed by seasonality.

One limitation of this study is that our model may not predict an outbreak if the reason for the outbreak is not related to the covariates that we used to train our model. However, when the first data of the outbreak arrive, the model will update itself accordingly, although there might be some delay for accurate predictions. Another limitation is that even if the surveillance data are ready, covariate data (e.g.
livestock statistics) might be published much later or might be incomplete at the time of prediction. Then, the model would not be able to benefit from all information sources to better capture the progress of disease dynamics. However, these problems are valid for all data-driven models.

Our proof-of-concept study provided insights for understanding possible mechanisms of infectious diseases and found directions with high priority for practice and policy to combat emerging infectious diseases. We tested our tool on a single disease, but the same framework can be extended towards other vector-borne infectious diseases, as well as other infectious diseases.

[1] Crimean–Congo haemorrhagic fever
[2] Drivers, dynamics, and control of emerging vector-borne zoonotic diseases
[3] Spatial analysis of Crimean–Congo hemorrhagic fever in Iran
[4] Crimean–Congo hemorrhagic fever and its relationship with climate factors in southeast Iran: a 13-year experience
[5] Environmental correlates of Crimean–Congo haemorrhagic fever incidence in Bulgaria
[6] An early warning system for Crimean–Congo haemorrhagic fever seasonality in Turkey based on remote sensing technology
[7] Modeling the spatial distribution of Crimean–Congo hemorrhagic fever outbreaks in Turkey
[8] Pandemics: spend on surveillance, not prediction
[9] Centers for Disease Control and Prevention. Effectiveness of Ebola treatment units and community care centers – Liberia
[10] Assessment of the potential for international dissemination of Ebola virus via commercial air travel during the 2014 West African outbreak
[11] Real-time dynamic modelling for the design of a cluster-randomized phase 3 Ebola vaccine trial in Sierra Leone
[12] The global distribution of Crimean–Congo hemorrhagic fever
[13] Spatiotemporal prediction of infectious diseases using structured Gaussian processes with application to Crimean–Congo hemorrhagic fever
[14] Updated high-resolution grids of monthly climatic observations – the CRU TS3.10 Dataset
[15] Gaussian processes for machine learning
[16] The trend towards habitat fragmentation is the key factor driving the spread of Crimean–Congo haemorrhagic fever
[17] A language and environment for statistical computing
[18] Factors driving the circulation and possible expansion of Crimean–Congo haemorrhagic fever virus in the western Palearctic

The authors declare no conflict of interests.

This work was funded by the Turkish Academy of Sciences (TÜBA-GEBiP; The Young Scientist Award Programme) and the Science Academy of Turkey (BAGEP; The Young Scientist Award Programme). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

We are grateful to the Public Health Directorate of the Ministry of Health of Turkey for providing us with the nationwide CCHF surveillance data set.

Supplementary data to this article can be found online at https://doi.org/10.1016/j.cmi.2019.05.006. The input covariates, nationwide CCHF surveillance data set and our computational results reported in this study are accessible at http://midas.ku.edu.tr/ProspectiveCCHF/.

ÖE and MG designed the study and interpreted the results. ÇA collected and cleaned the spatial and temporal covariates used, implemented the software and generated the figures. ÇA and MG designed the software and performed the data analysis.
ÇA and ÖE did the literature search and drafted the first version of the paper, which was revised by MG. All authors contributed to the final version of the article.