key: cord-0025239-bco2hy73 authors: Sylvestre, Emmanuelle; Joachim, Clarisse; Cécilia-Joseph, Elsa; Bouzillé, Guillaume; Campillo-Gimenez, Boris; Cuggia, Marc; Cabié, André title: Data-driven methods for dengue prediction and surveillance using real-world and Big Data: A systematic review date: 2022-01-07 journal: PLoS Negl Trop Dis DOI: 10.1371/journal.pntd.0010056 sha: 13645b5a66858e8e73ee0689963cd54fd6d08f14 doc_id: 25239 cord_uid: bco2hy73 BACKGROUND: Traditionally, dengue surveillance is based on case reporting to a central health agency. However, the delay between a case and its notification can limit the system responsiveness. Machine learning methods have been developed to reduce the reporting delays and to predict outbreaks, based on non-traditional and non-clinical data sources. The aim of this systematic review was to identify studies that used real-world data, Big Data and/or machine learning methods to monitor and predict dengue-related outcomes. METHODOLOGY/PRINCIPAL FINDINGS: We performed a search in PubMed, Scopus, Web of Science and grey literature between January 1, 2000 and August 31, 2020. The review (ID: CRD42020172472) focused on data-driven studies. Reviews, randomized control trials and descriptive studies were not included. Among the 119 studies included, 67% were published between 2016 and 2020, and 39% used at least one novel data stream. The aim of the included studies was to predict a dengue-related outcome (55%), assess the validity of data sources for dengue surveillance (23%), or both (22%). Most studies (60%) used a machine learning approach. Studies on dengue prediction compared different prediction models, or identified significant predictors among several covariates in a model. The most significant predictors were rainfall (43%), temperature (41%), and humidity (25%). The two models with the highest performances were Neural Networks and Decision Trees (52%), followed by Support Vector Machine (17%). 
We cannot rule out a selection bias in our study because of our two main limitations: we did not include preprints and could not obtain the opinion of other international experts. CONCLUSIONS/SIGNIFICANCE: Combining real-world data and Big Data with machine learning methods is a promising approach to improve dengue prediction and monitoring. Future studies should focus on how to better integrate all available data sources and methods to improve the response and dengue management by stakeholders.
Dengue virus (DENV) is an arbovirus transmitted to humans by female Aedes aegypti or Aedes albopictus mosquitoes [1]. The incidence of dengue, the disease caused by DENV, has rapidly increased around the world in recent decades [2] due to population growth, urbanization, increased travel, and insufficient vector control [3]. The World Health Organization (WHO) considers dengue a major global public health challenge in tropical and subtropical regions [4]. Today, dengue is one of the most important vector-borne diseases in the world, and recent studies on its prevalence estimate that 3.9 billion people are at risk of transmission, with 390 million infections and 96 million symptomatic cases per year [1, 5]. Although most infections are asymptomatic or are characterized by intense flu-like symptoms that last up to 10 days [6], severe forms of dengue (dengue hemorrhagic fever/dengue shock syndrome) can also occur [7] and might lead to death. Mortality due to dengue can be greatly reduced by early diagnosis and appropriate clinical management [3, 7].
Most dengue-endemic regions (mainly South-East Asia, the Americas, and the Pacific region) rely on traditional surveillance, based on hospital syndromic reporting and laboratory confirmation of a subset of cases to a central health agency [3, 8].
The method is very accurate, but is hampered by its lack of responsiveness, with substantial delays between a case and its notification [8], which can limit the health system's ability to rapidly put in place appropriate measures to avoid drastic consequences. Moreover, this traditional surveillance system is expensive, due to the time needed to aggregate and manually validate data [9]. These limitations have prompted researchers to investigate other solutions. Many studies have described alternative methods, such as mobile, digital and Internet-based systems, to efficiently crowdsource data from the community [3]. These approaches have not yet been translated into standard dengue management practice, although they are relevant for all dimensions of dengue management, such as monitoring, clinical management, and dengue outbreak forecasting [3, 8].
Over the years, scientists have developed statistical and machine learning models to reduce reporting delays and monitor new cases in almost real time, but also to accurately use non-traditional and non-clinical data sources (e.g. Internet search engines and social media platforms) to predict communicable disease outbreaks [10-13], including dengue. Many studies have proposed new strategies based on Big Data and machine learning models to improve dengue outbreak management. However, recent systematic reviews only examined the relevance and usefulness of Internet-based surveillance systems in emerging tropical disease management [8, 14], and they did not focus specifically on dengue management. Furthermore, recent systematic reviews on dengue analyzed monitoring [15], vaccine efficacy [16], epidemiological trends [17, 18], the overall disease burden [19-21] and clinical prognosis models [22], but they did not discuss these new methods to improve dengue management.
Therefore, the first aim of this systematic review was to identify and describe all real-world and Big Data-based methods used to monitor and predict/forecast dengue-related outcomes, regardless of the region and/or population. The second aim was to analyze several features of these studies, such as the data sources and their origin, the different outcome types (e.g. epidemiological and clinical outcomes), the chosen statistical methods, and their performance and variability based on the population and location.
This systematic review was performed following the "Preferred Reporting Items for Systematic Reviews and Meta-Analyses" (PRISMA) guidelines [23]. Four reviewers (ES, CJ, AC and MC) developed the systematic review protocol. The literature search was performed in September 2020. The study protocol was registered on the PROSPERO registry of systematic reviews (ID: CRD42020172472).
The review focused on studies that used real-world data, Big Data and/or machine learning methods to monitor, predict and/or forecast dengue outbreaks or dengue-related outcomes (clinical or epidemiological). Studies from any country (including countries outside endemic regions) were included, without any language filter. Analyses could be performed on past or future data.
• Dengue diagnosis based on the standard WHO definition [7]
Finally, the references of the retained studies and of major dengue epidemiological review articles were screened to identify studies overlooked by the previous search strategies.
Selection process. Two independent authors (ES and CJ) screened the titles and abstracts to select relevant studies for the review. They read the full text of all studies that seemed to meet the eligibility criteria, or whose abstract was not explicit enough to make a decision. In case of disagreement, a third reviewer (AC) helped to reach a consensus.
Quality assessment, data collection, extraction, and analysis.
Two reviewers (ES and CJ) extracted data from the selected articles, including first and last authors, year of publication, study period, objectives, study population, methodology, model performance and evaluation, and study site (S1 Text). As reporting guidelines for machine learning models and real-world data studies are not available, each reviewer independently performed a quality assessment using criteria described in previous review articles on these topics [27-29] (S1 Table). A narrative synthesis of all eligible studies was prepared using the following framework: i) data sources and outcomes, ii) statistical and machine learning methods, iii) evaluation metrics, and iv) study results. All descriptive analyses of the extracted articles were performed using R version 3.6.3 [30].
Among the 2064 studies identified, 119 articles were included in this systematic review (Fig 1) [31-148]. Although the search time window started on January 1, 2000, the first included studies were published in 2008, and 67% of the eligible articles were published between 2016 and 2020 (Fig 2). The study populations were predominantly from South-East Asia (37%) and South America (22%). Among the 119 papers included, 77 (65%) were articles, and 42 (35%) were conference papers. On the basis of the Web of Science "Research Area" and the Scopus "Subject Area" classifications, the topics of the selected articles were aggregated into eight categories and three main themes: i) Information Technology & Science (52% of all articles), ii) Medicine (24%), and iii) Health Informatics, Public Health & Biology (24%) (Table 1). Conference papers were mainly classified in the "Information Technology & Science" category (39/42; 93%), whereas articles were more evenly distributed among the "Medicine" (28/77; 36%), "Health Informatics, Public Health & Biology" (26/77; 34%) and "Information Technology & Science" (23/77; 30%) themes (S2 Table).
The complete list of all selected studies and their characteristics is provided in S3 Table.
All included studies, except one [68], used only retrospective data. Most articles had multiple and heterogeneous data sources. The most conventional data sources were government agencies (n = 72, 46%) and medical institutions (e.g. hospitals/laboratories) (n = 30, 19%). The data retrieved from these sources included epidemiological data, climate and environmental data from meteorological departments, and clinical and biological data. Some studies also used open access data from the WHO or from databases of published studies (S3 Table).
Among the included studies, 47/119 (39%) used at least one novel data stream, such as Internet search engines and social networks [14]. Most of these studies (n = 41, 87%) were published after 2015. Google was the most frequently used Internet search engine (n = 19 studies) and Twitter the most frequently used social network (n = 18). Many studies based on novel data streams were research articles (n = 33, 70%), but the main theme, regardless of the study type (Conference paper or Article), varied depending on the data. Specifically, studies based on Google data were classified homogeneously into the three main themes. Conversely, studies that exploited social networks as a data source were evenly distributed between Conference papers (n = 9) and Articles (n = 10), but only a few of them were classified into the Medicine theme (Table 2).
Most studies used structured data, but 41 (34%) studies had an unstructured data source, such as Internet search-based queries or Twitter (Table 2). Among the 41 studies that used unstructured data, 28 (68%) did not develop their own pre-processing methods for these data sources, but simply used keywords related to their research.
However, when studies used Natural-Language Processing (NLP)-based methods, they had a full pre-processing framework based on the NLP state-of-the-art recommendations. Overall, studies that used non-conventional data relied less frequently on clinical data. Conversely, studies that used human data relied mostly on traditional sources, such as weather and environmental data. Moreover, genomic and vector data were vastly underused in combination with other sources, because only five studies using at least one of these sources were included in this systematic review. Data sources are detailed in Table 2.
The main aim of the included studies was to predict a dengue-related outcome (n = 65, 55%), to assess the validity of data sources for dengue surveillance (n = 29, 24%), or both (n = 25, 21%). The most frequently chosen outcomes (for prediction and monitoring) were dengue incidence rate (n = 58, 49%), dengue diagnosis based on symptoms (n = 20, 17%), and dengue outbreaks (n = 18, 15%) (S4 Table).
Only one study [48] used NLP-based methods for dengue prediction or surveillance, but as a pre-treatment step to extract and format data for modelling. The model choice was related to the study objectives (prediction/forecasting or validity of a data source for dengue monitoring). Overall, most studies compared the performances of different models and statistical methods. The most frequently used models, regardless of the study aim(s), were regression-based models (25%), followed by decision-tree models (18%), and artificial neural networks (15%). Most studies on dengue monitoring used correlation analyses to identify relevant variables and/or data sources. Correlation methods (Pearson correlation or Spearman correlation) were especially useful to assess the validity of novel data streams, such as Twitter and Internet search engines.
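As an illustration of the correlation analyses described above, the following minimal Python sketch computes Pearson and Spearman coefficients between a novel data stream and reported case counts. It is not taken from any of the included studies: the function names and the weekly values in `search_volume` and `reported_cases` are hypothetical and serve only to show the calculation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical weekly values, for illustration only (not data from the review).
search_volume = [12, 15, 30, 45, 80, 60, 40, 22]
reported_cases = [3, 4, 9, 20, 41, 35, 18, 7]

if __name__ == "__main__":
    print("Pearson:", round(pearson(search_volume, reported_cases), 3))
    print("Spearman:", round(spearman(search_volume, reported_cases), 3))
```

Pearson measures linear association, whereas Spearman only requires a monotonic relationship, which may be why the choice between them depended on the data source.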
Most studies that included machine learning algorithms used supervised learning methods (69%). The models' characteristics are detailed in Table 3.
To evaluate and assess the performance of the chosen statistical methods and/or models, 71 studies (60%) used a machine learning approach and partitioned their data into a training set and a test set. As with the models, the choice of evaluation metrics was closely related to the study aim(s). All articles used at least one metric, and most of them used more than one. Overall, the most common metrics were based on a Confusion Matrix (53%), with Accuracy as the most used metric, followed by Recall or Sensitivity. Correlation-based metrics were used in 37% of studies, especially correlation coefficients (Pearson or Spearman, depending on the data source). The aim of most studies that used correlation metrics was to assess a data source for dengue monitoring (n = 37, 84% of the 44 studies with a correlation metric). Error-based metrics were also commonly used (n = 35, 29% of all studies). Few studies used other metrics (n = 22, 18% of all studies) and only 9 studies (8%) did not use at least one metric falling into the above categories (Table 4).
Among the 54 studies on surveillance, 37 (68%) assessed novel data streams, such as Internet search engines and social media, particularly Google (n = 16, 30%) and Twitter (n = 16, 30%). The most common traditional data source evaluated was climate, environmental and geographic data (n = 13/54; 24%) (S5 Table). All studies found a statistically significant association between the data source and the dengue-related outcome.
The aims of the studies on prediction (n = 90) could be categorized into two main groups: i) comparing different models to predict a dengue-related outcome, and ii) finding the significant predictors among several covariates in a model. Twenty-two of the included studies (24%) addressed both aims. The most significant predictors were rainfall (22 models, 43% of 51 studies), temperature (21 models, 41% of 51 studies), and humidity (13 models, 25% of 51 studies). These predictors were also the most frequent in studies to predict dengue incidence rates or dengue outbreaks. Conversely, in studies on dengue diagnosis prediction, the most frequent predictors were fever (4 models, 66% of 6 studies), arthralgia/myalgia (3 models, 50% of 6 studies), platelet count (2 models, 33% of 6 studies), and white blood cell count (2 models, 33% of 6 studies) (Table 5). Overall, in studies comparing different models, neural networks and decision trees gave the best performances and were the best models in 13 studies (52% of 54 studies), followed by support vector machine (9/54 studies, 17%). In studies to predict dengue incidence rates, regression-based models showed the highest performance (5/24 studies, 21%) (Table 6). The full list of models and predictors, depending on the outcome, is provided in S5 Table.
This systematic review showed that in the last 20 years, data-driven methods for dengue monitoring and prediction have become very popular, particularly in Asia where 72% of the included studies were performed. Very few studies were carried out outside Asia or the Americas, which is to be expected, because these are the two biggest dengue-endemic regions and 70% of the actual dengue burden is in Asia [149-151].
Studies in African countries were noticeably absent, although this continent is also a dengue-endemic region.
The most frequent data sources were conventional data traditionally used in dengue-related studies, such as case counts, climate, environmental, and clinical data. However, this review also highlighted the growing interest of the scientific community in novel Big Data streams for dengue surveillance and prediction [14,33,39-...].
Table 4. Evaluation metrics used in the selected articles depending on their aim(s).
Table 5. Most significant predictors for the three most frequently studied outcomes. Dengue outbreaks: n = 9; dengue diagnosis: n = 6.
Indeed, social media and Internet search engines have become widely accessible worldwide, and therefore they represented the most popular novel data streams in the included studies. The easy access to these sources facilitates the assessment of their influence on infectious disease surveillance and prediction [152-154]. This is particularly true for neglected tropical diseases, such as dengue, Zika virus disease and chikungunya, because of their recurrence and the massive increase of their incidence in recent years [155, 156]. Moreover, harnessing these novel data streams can improve traditional dengue surveillance systems, because they allow the early detection of an outbreak, and thus can decrease delays between the actual dengue outbreak onset and the official case notifications [157, 158]. In the case of dengue control, early response is especially important because it can influence the outbreak severity.
Our analysis also identified the underutilization of some data sources. Genomic data and vector-based data were exploited only in 6 of the 119 included studies [35,42,50,57,75,131], despite the importance of vector surveillance in dengue.
Moreover, studies using genomic data were based only on human genome data, although scientists could easily access viral genome sequencing data, for instance via the European Virus Archive-GLOBAL (EVAg) [159]. The aim of EVAg is to offer access to viruses and to virus sequencing data (including dengue) to scientists, government agencies and academic institutions. None of the included studies made use of data provided by this archive. The lack of vector data is surprising because this type of information is crucial in dengue monitoring studies [160, 161]. However, we could not evaluate publication bias, especially in the case of underused data sources. As all included studies on the pertinence of a data source found a significant association between the source and a dengue-related outcome, we cannot exclude that some data sources were not underused, but rather not relevant for dengue management. However, the nature of the underused data sources could suggest that there is a dichotomy between data sources and the objectives of dengue studies: the studies focus either on techniques for vector monitoring/prediction or on techniques for human surveillance/prediction, but rarely on both.
This dichotomy was also observed within human surveillance and prediction studies. Specifically, health scientists seemed to rely mainly on traditional data, whereas information technology researchers focused more on non-traditional data (especially social networks). Thus, studies using hospital data for dengue prediction rarely leveraged other data sources, such as climate data. Conversely, studies based on non-traditional data sources rarely used human data, besides official dengue case counts. This might be explained by the fact that clinical data are often hard to access for researchers, particularly outside the medical community, for legal and ethical reasons.
Furthermore, a substantial number of the selected papers were conference papers from Information Technology & Sciences Conferences rather than Medicine Conferences. This might reflect the lack of interactions between research teams focused on prediction and/or informatics and physicians and/or government agencies focused on infectious disease monitoring and management. Yet, this research field would greatly benefit from combining their complementary approaches/expertise. Nevertheless, the most commonly studied outcomes in these articles based on real-world data were dengue incidence rate, dengue outbreaks and dengue diagnosis, because these studies need to assess the reliability of novel data streams compared with traditional data sources. As most studies could demonstrate that these sources and methods can complement traditional surveillance and prediction methods, stakeholders should be more aware of these alternative methodologies and novel data streams, and reach out to these highly specialized teams to optimize outbreak dynamic tracking and to improve data completeness and prediction model accuracy.
Most of the included studies relied on machine learning methods, particularly supervised learning models, to assess traditional and also novel data streams. These models were also useful for the analysis of traditional data sources, and allowed scientists to harness non-structured data with NLP methods [40,43,48,49,51-53,56,60,65,66,69-71,73,76,77,79-81,84,85,92,98,100,102,105,110-112,114,115,126,127,130,134-139]. Unsupervised learning models were not the method of choice in most studies, possibly because these studies wanted to identify relevant data sources and/or indicators for dengue monitoring and prediction.
Indeed, unsupervised learning tends to be used to identify clusters with similar characteristics [162, 163]. Studies that used these methods wanted to predict dengue diagnosis based on patients' clinical profiles or to assess the validity of novel data sources, such as Twitter. Moreover, this approach for dengue research is fairly recent: with the exception of one conference paper from 2011, all studies using unsupervised learning models were published after 2016. Similarly, most studies relying on NLP methods were published rather recently, especially after 2017 (35 of the 42 studies with NLP methods). These two observations suggest that unsupervised learning and NLP might become more prominent in dengue research. It is important to note that despite the use of real-world data, these statistical methods were employed to analyze only retrospective data (except for one study), making their pertinence in real conditions difficult to assess.
Evaluation metrics are crucial in real-world data studies because they help to determine whether the collected data are fit for the purpose (here, dengue surveillance and prediction) and to assess data quality and bias [164]. Although most of the included prediction studies used at least one of the gold standard metrics for information retrieval, such as precision (or positive predictive value) and recall (or sensitivity) [165], several articles employed only error-based metrics, such as root mean square error and mean absolute error. The choice of evaluation metrics is obviously related to the study objective, but even studies where information retrieval metrics could be calculated did not necessarily use them. Again, these methodological choices might be explained by the discrepancy between health scientists who prefer "traditional" modeling evaluation metrics and information technology scientists who focus on information retrieval metrics.
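To make the two metric families discussed above concrete, the following minimal Python sketch computes confusion-matrix metrics (accuracy, precision, recall) for a binary outcome such as outbreak/no outbreak, and error-based metrics (root mean square error, mean absolute error) for a numeric outcome such as case counts. It is an illustrative sketch only: the function names and the label convention (1 = positive, 0 = negative) are assumptions, not code from the reviewed studies.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision (positive predictive value) and recall (sensitivity)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    nan = float("nan")  # returned when a ratio has a zero denominator
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "recall": tp / (tp + fn) if (tp + fn) else nan,
    }

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

if __name__ == "__main__":
    # Hypothetical labels and case counts, for illustration only.
    print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
    print(rmse([10, 20, 30], [12, 18, 33]), mae([10, 20, 30], [12, 18, 33]))
```

Confusion-matrix metrics require a categorical outcome, so a study that forecasts case counts can only report them after choosing a threshold (for example, an outbreak definition), whereas error-based metrics apply directly to the numeric forecast.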
This study also highlighted that despite the variety of approaches to predict dengue outcomes, some factors are consistently relevant, regardless of the study period or country, such as weather-based predictors, artificial neural networks, and decision tree models. However, a consensus on universal models and data sources has not been reached and will probably be difficult to attain due to the complex nature of dengue transmission.
This review has two main weaknesses despite the systematic approach. First, we only searched for published articles and did not look for preprints. Second, besides the experts involved in this review, we could not obtain the opinion of other international experts due to the infectious disease context of 2020 (COVID-19 and dengue outbreaks in many regions). Therefore, we may have missed relevant studies for the review. Finally, the definition of real-world data can vary according to the stakeholders' view. We had to choose one single definition for the reviewing process, but other definitions do exist. Therefore, we cannot rule out a selection bias in our study.
Overall, this review showed that combining novel real-world and Big Data sources with machine learning methods is a promising approach to improve dengue prediction and outbreak monitoring. These new approaches are especially relevant because they can help government agencies and experts to better prepare for each resurgence and better manage outbreaks. Their aim is not to replace existing systems, but to complement them, especially for reducing delays between outbreaks and reporting. Future studies should focus on better integrating all available data sources and methods to improve the stakeholders' response and to better understand dengue outbreaks.
The global distribution and burden of dengue Viremia and Clinical Presentation in Nicaraguan Patients Infected With Zika Virus, Chikungunya Virus, and Dengue Virus Dengue: knowledge gaps, unmet needs, and research priorities World Health Organization. Global strategy for dengue prevention and control Refining the global spatial limits of dengue virus transmission by evidence-based consensus The incubation periods of Dengue viruses Dengue Guidelines for Diagnosis, Treatment, Prevention and Control. Special Programme for Research and Training in Tropical Diseases, editor. Geneva: World Health Organization Internet-based surveillance systems for monitoring emerging infectious diseases A new approach to monitoring dengue activity Predicting epidemics using search engine data: a comparative study on measles in the largest countries of Europe Improved state-level influenza nowcasting in the United States leveraging Internet-based data and network approaches Google trends: a web-based tool for real-time surveillance of disease outbreaks Early detection of disease outbreaks using the Internet Harnessing Big Data for Communicable Tropical and Sub-Tropical Disorders: Implications From a Systematic Review of the Literature. Front Public Health Dengue disease surveillance: an updated systematic literature review Systematic review of dengue vaccine efficacy Dengue Infections in Colombia: Epidemiological Trends of a Hyperendemic Country Dengue in Latin America: Systematic Review of Molecular Epidemiological Trends Prevalence and burden of dengue infection in Europe: A systematic review and meta-analysis Dengue disease outbreak detection Predicting the operations alert levels for dengue surveillance and control. 