key: cord-0890776-1kyrptgx
authors: Samaras, Loukas; García-Barriocanal, Elena; Sicilia, Miguel-Angel
title: Chapter 2 Syndromic surveillance using web data: a systematic review
date: 2020-12-31
journal: Innovation in Health Informatics
DOI: 10.1016/b978-0-12-819043-2.00002-2
sha: bb91814bdf50b5c074284df791f7a19f0859d5b8
doc_id: 890776
cord_uid: 1kyrptgx

Abstract

In recent years, there has been considerable debate about the evolution of Smart Healthcare systems, and particularly about how these systems can help improve people's health by taking advantage of new Information and Communication Technologies (ICT) for early prediction and efficient treatment. The purpose of this study is to provide a systematic review of the available literature on information systems for syndromic surveillance using web data. The published items include articles, books, reviews, reports, conference announcements, and dissertations. We used a variation of the PRISMA statement methodology to conduct the review. The review identifies the relevant papers published from 2004 to 2018 and systematically examines them to extract similarities, gaps, and conclusions about the research done so far. The results presented concern the year, the examined disease, the web data source, the geographic location/country, and the data analysis method used. The results show that influenza is the most examined infectious disease. The internet tools most used are Twitter and Google. Regarding the geographical areas explored in the published papers, the most examined country is the United States, since many scientists come from this country. The number of articles has grown significantly since 2009. Various statistical methods are also used to correlate the data retrieved from the internet with the data from national authorities.
The conclusion of all the studies is that the Web can be a useful tool for detecting serious epidemics and for building a syndromic surveillance system, since epidemics can be predicted from web data before they are officially detected in the population. With the advance of ICT, Smart Healthcare can benefit from the monitoring and early prediction of epidemics that such a system offers, improving national or international health strategies and policy decisions. This can be achieved by providing new technology tools that enhance health monitoring systems in line with the innovations of Smart Health or eHealth, including the emerging technologies of the Internet of Things. The challenges and impacts of an electronic system based on internet data span the social, medical, and technological disciplines. These can be extended further to Smart Healthcare, as data streaming can provide real-time information, awareness of epidemics, and alerts for both patients and medical scientists. Finally, these new systems can help improve the standards of human life.

... and the peak of influenza in the United States. Since then, various studies have been conducted with the help of web data (social media, search engines, etc.) to establish the fundamentals of internet surveillance systems. This novel approach deals with the implementation of information systems based on data from the web to track epidemics and to create patterns and rules for early prediction. All over the world, at universities, organizations, and research centers, scientists study the potential of the web for tracking epidemics. Apart from the various information systems and methods created for analyzing web data, new terms were introduced to describe this new approach and trend, such as Infoveillance or Infodemiology (Spruit & Lytras, 2018). In this study, we examine the publications up to the year 2018.
This examination was applied to an initial collection of 337 published items, of which 225 were found to be relevant. These publications are of many different types. Most of them (86.22%) are published articles based on research from various organizations, universities, or research centers that study syndromic surveillance or monitor epidemics. Other publications are dissertations, conference announcements, reviews or systematic reviews of articles, books, and one Google patent. Fig. 2.1 shows these types and their percentage of the total number of published items. As Fig. 2.1 shows, most of the publications are published articles. Nevertheless, we located 19 reviews or systematic reviews.

With this work we try to give a detailed analysis of the published works by identifying the most relevant ones concerning syndromic surveillance using the Web, presenting what we believe to be the largest collection of related articles compiled on this subject. Following the methodology of systematic review, we give an in-depth description and analysis of the main areas covered by these publications: the time (year) they were published, the diseases that were examined, the web data sources used by the researchers, the countries in which these diseases spread, the data analysis approaches used by the scientists, and finally, the number of scientists who have worked in the field until now. The detailed analysis refers to 225 relevant publications that were extracted from the online database of Google Scholar (2018), which is a powerful tool for every researcher. To conduct this review, we used a variation of the PRISMA statement methodology (Moher, Liberati, Tetzlaff, & Altman, 2009) and we also examined the previous reviews and systematic reviews. We identified 19 previous reviews that analyzed various articles in earlier years.
Most of these reviews focus on influenza or on other specific diseases, while others examine epidemics using web data in a more general way. We consider the work of Rattanaumpawan, Boonyasiri, Vong, and Thamlikitkul (2018) to be the one with the biggest collection of articles, 110 in total, without of course ignoring the good work done by other scientists who conducted reviews. Considering the above, this study is both a quantitative and a qualitative analysis, and the aims of this review are:

1. To make a complete collection of articles related to syndromic surveillance using the web available to any researcher.
2. To investigate the academic interest in this field, based on the number of items published every year. Is it growing or declining?
3. To analyze other aspects, such as the epidemic characteristics (diseases), the geographical spread of the studies and researchers, the data used, and the way the data analysis is described in these studies.
4. To estimate and evaluate what areas have been explored so far and what the possible future research could be.
5. To elaborate a novel way of conducting a systematic review in this research field that can help in assessing its importance.
6. To showcase, evaluate, and align the results with the new Smart Healthcare technologies.

Our research took place from May 1 to June 30, 2018. The research model used was a variation of the PRISMA model, including a research protocol that was created for this purpose and consists of five research stages plus a writing phase. The five stages are as follows: preparation, data retrieval, data analysis, data synthesis, and results. Below, we briefly present each stage.

The preparation stage includes all the necessary decisions and tools to be used, such as the scope, the design of the method, and the research questions.
Defining the methods appropriate for carrying out this research was critical, so this had to be decided first. Three research questions were created to examine and understand the current literature regarding syndromic surveillance using the Web. The motivation of this review is to categorize and summarize the work that has been done until today and to find the explored and unexplored areas with a view to future work. These questions are as follows:

RQ1: Is the academic interest growing or declining? To answer this question, we must rely on the current literature and examine it on a yearly basis.

RQ2: What aspects have been explored until now in the available literature? In addition, it is useful to understand which areas are underexplored. Generally, the field of research is very wide, since there are many diseases and many countries that can be studied. With this question, we seek to establish what has been researched and to what extent. So, the next research question is as follows:

RQ3: What topics have been covered and which ones need further development and research in the future?

This stage was concluded with some sample data that were gathered to help in finalizing the methods and tools needed to complete this research.

The present review was carried out with the help of search queries entered in the Google Scholar search engine. The criterion for the relevance of a publication is whether it includes the following keywords, or variants of them, in the analysis: syndromic surveillance OR detecting epidemics AND using web. We conducted an extensive search through Google Scholar for research papers, articles, books, dissertations, reports, and conference presentations. The main keywords (which were expected to be prevalent in the relevant papers) were syndromic surveillance, predicting epidemics, and using web.
These keywords are generally broad terms, while some other, more specific ones were also used, such as Google, Twitter, Yahoo, or Web. There were two final search strings (one for each of the two searches made), which can be expressed in one Boolean statement as follows:

((A OR B) AND (C1 OR C2 OR C3 OR C4))

where A, syndromic surveillance; B, predicting epidemics; C1, using Web; C2, Google; C3, Yahoo; C4, Twitter.

The search strategy contained the following design decisions:

Searched databases: Google Scholar, which contains articles from Springer Link, Science Direct, IEEE Xplore, Web of Science, etc.
Searched items: Journal articles, conference papers, workshop papers, technical reports, books, reviews, and dissertations.
Search applied on: Full text, to avoid excluding papers that did not include the searched keywords in their abstracts or titles, or that used a different variant of the terms, but were relevant for this review.
Language: The search was limited to papers written in English. All other languages were excluded.
Publication period: All years (2004–18). Syndromic surveillance using the web started to receive attention after 2004, as already mentioned. Therefore the chosen publication period starts in 2004.

The search process resulted in a total of 225 relevant papers, of which 19 are relevant reviews or systematic reviews. The data were retrieved over a period of two weeks and initially included 337 published items that were potentially eligible for the review. All needed information was entered in a centralized Microsoft Excel spreadsheet for further analysis.

The objective of the study selection was to identify papers relevant to the objectives of the review, according to the agreed scope. The search strings were set to avoid excluding relevant studies in a relatively small category or when some data were missing, for example, when a study used web data but not for a specific disease.
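The Boolean statement given above can be read as a simple full-text filter. The following is a minimal sketch, assuming case-insensitive substring matching; the function name `matches_query` and the sample strings are hypothetical, and the way Google Scholar itself evaluates queries is not described in this chapter.

```python
# Term groups taken from the statement (A OR B) AND (C1 OR C2 OR C3 OR C4).
TOPIC_TERMS = ("syndromic surveillance", "predicting epidemics")  # A, B
SOURCE_TERMS = ("using web", "google", "yahoo", "twitter")        # C1..C4


def matches_query(text: str) -> bool:
    """Return True if the text satisfies (A OR B) AND (C1 OR C2 OR C3 OR C4).

    Assumption: plain lowercase substring matching stands in for the
    full-text search performed by the search engine.
    """
    lowered = text.lower()
    has_topic = any(term in lowered for term in TOPIC_TERMS)
    has_source = any(term in lowered for term in SOURCE_TERMS)
    return has_topic and has_source


# Hypothetical example texts, for illustration only.
samples = [
    "Syndromic surveillance with Twitter data",
    "Twitter sentiment analysis",
    "Syndromic surveillance in hospitals",
]
for sample in samples:
    print(sample, "->", matches_query(sample))
```

Only the first sample satisfies both groups; the other two each miss one side of the AND, which is the behavior the Boolean statement describes.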
In addition, some articles were entered twice, since they were present in the results of each of the queries entered in the Google Scholar search engine.

During this phase, extensive analysis (quantitative and qualitative) was carried out on the information gathered from the reviewed papers and articles. This was necessary to support data categorization and data synthesis. Our goal was to summarize, in a quantitative way, the main areas of research related to syndromic surveillance using web data. These quantitative summaries are included in the results section, together with graphs that are supplemented by references to the included papers. The statistical analysis was performed in all areas concerned with the synthesis of the studies that were identified.

We used multiple perspectives to indicate interesting relationships and trends within the reviewed articles, since the available literature is not limited. By doing so, we believe that a scientist can better understand the perspectives and trends of the research conducted so far and go into depth, finding interesting questions and views of the research field, potentially for future work. The classification of the results includes the characteristics required to summarize, to include or exclude features, to help the synthesis, and finally, to properly evaluate all the studies included in this review.

This stage includes the outcomes of this research. The scope is not only to compare the previously published studies with each other, but also to determine the importance of the research field of syndromic surveillance using web data.

The final stage is to write up the procedures and results of this extensive research. This stage covers all the procedures that took place during this research, and we present them in a way that helps in understanding each stage. The entire research model is briefly described in Fig. 2.2.
In this section, the results of this systematic review are presented, based on the 225 papers that are relevant to the examined subject and were finally selected.

2.3.1 RQ1: Is the academic interest growing or declining?

To answer this question, we must look at how the number of articles and other items published each year has developed since 2004. That means that we must find when these publications were made and determine the trend. The first relevant one was published in 2004 and concerns the study of Johnson et al. (2004), while the last one was published in 2018. Many studies appeared after 2009 as a result of the 2009 influenza pandemic. A lot of articles refer to influenza and particularly to the H1N1 virus, which was responsible for many cases with severe symptoms during that year. This virus type has an unusual characteristic: it affects people over 60 years old less often, and instead affects many young people, adults, and children. It can lead to pneumonia if not diagnosed and treated early. The term swine flu was used by the news media in America, and considerable funds were directed to the treatment of this disease. For instance, in Russia, the government allocated 4 billion rubles (US$140 million) to buy the initial 43 million doses of vaccines.

In Fig. 2.3, the thicker line represents the number of publications per year, while the thinner dotted line represents the trend, as calculated from the following polynomial formula:

$$y = 0.3042x^{2} - 1.3639x + 2.7626$$

where y is the expected number of publications and x is the current year minus 2003.

The most articles per year are observed in 2017 (50 articles), but from 2013 until 2017, over 20 articles regarding syndromic surveillance using the Web were published each year, except for 2015, in which we observe 12 publications.
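The trend formula can be evaluated directly for any year. The following is a minimal sketch, reading the formula as y = 0.3042x² − 1.3639x + 2.7626 with x = year − 2003; the function name `expected_publications` is hypothetical and introduced only for illustration.

```python
def expected_publications(year: int) -> float:
    """Evaluate the quadratic trend line for a calendar year.

    x is the year offset from 2003, as defined alongside the formula.
    """
    x = year - 2003
    return 0.3042 * x**2 - 1.3639 * x + 2.7626


# Print the trend value for a few years in the review period.
for year in (2004, 2009, 2017, 2018):
    print(year, round(expected_publications(year), 1))
```

For 2018 (x = 15) the expression evaluates to about 50.7, while for 2017 it gives about 43.3, somewhat below the 50 articles observed that year, as expected from a smoothed trend line.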
It seems that many scientists, research centers, and organizations appreciate the usefulness of the Web in providing data and tools for monitoring epidemics and outbreaks when these are used effectively along with statistics. For the year 2018, we located only 20 articles, but this is expected since this systematic review covers publications up to June 15, 2018. For the full year 2018, the abovementioned formula gives an expected number of about 50 publications. The results of this analysis answer the first research question: the academic interest grows over time.

Health issues are widely discussed in this research field, as they are critical for modeling and prediction purposes concerning public health. Of the 225 publications, 161 (71.55%) refer explicitly to one or more health subjects or diseases. Some researchers studied more than one health subject (6.22%), while others focused on one disease (65.33%). A relatively large percentage of studies did not examine specific diseases, but syndromic surveillance in general (28.45%). Table 2.1 includes all cases. The relative % column indicates the percentage among the publications containing at least one health subject/disease. As we can see in this table, 103 published articles are related exclusively to influenza (45.78%, relative percentage 63.98%), 10 articles are about dengue, 5 articles are about HIV/AIDS, 4 articles are about malaria, 4 articles are about cancer or breast cancer, and 3 articles are about Ebola exclusively. There are also two references to foodborne illnesses and two to hand, foot, and mouth epidemics. Other health subjects are discussed once, such as the African swine fever, malaria, Lyme disease, listeria, foodborne illness, Ebola, depression, dementia, communicable diseases, cholera, human papillomavirus, measles, respiratory syncytial virus, syphilis, and systemic lupus erythematosus.
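The two percentages quoted for influenza differ only in their denominator: all 225 publications versus the 161 that name at least one health subject. A short check of this arithmetic, using only counts stated in the text (the helper name `pct` is hypothetical):

```python
TOTAL_PUBLICATIONS = 225   # all relevant publications
WITH_HEALTH_SUBJECT = 161  # publications naming at least one disease
INFLUENZA_ONLY = 103       # publications exclusively about influenza


def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(pct(INFLUENZA_ONLY, TOTAL_PUBLICATIONS))       # absolute share
print(pct(INFLUENZA_ONLY, WITH_HEALTH_SUBJECT))      # relative share
print(pct(WITH_HEALTH_SUBJECT, TOTAL_PUBLICATIONS))  # share with a health subject
```

The first two values reproduce the reported 45.78% and 63.98%. The third rounds to 71.56%, so the reported 71.55% appears to be truncated rather than rounded.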
As we see, not all of the abovementioned are transmissible diseases, and depression is a special case of a psychological phenomenon (disorder) related to individuals. However, regarding public health, the research revealed that the Web can be a useful source of data for other nontransmissible diseases or disorders, such as cancer or depression (Yang, Huang, Peng, & Tsai, 2010). There are also articles that examine more than one disease, and we include them separately in the table. Such cases are influenza and listeria; influenza and pertussis; cholera and dengue; influenza, cholera, and rabies virus; influenza and dengue; and influenza and tuberculosis. Three studies include more than three diseases in their area of examination.

Another aspect is the keywords used to retrieve data from the internet by those researchers who built specific databases for this purpose. In 78 studies, the scientists used the name of the disease as a search term, either in the language of the country they examined or in English. Others used words for the symptoms or other keywords, for example, "cold" to detect influenza, based both on the weather conditions associated with this common disease and on social media (Shikha, Younghee, & Mihui, 2017). Weather is an important factor in the development of epidemics and we will come back to this matter, since some studies classified as relevant address it. We will further discuss the potential of weather forecasts to help syndromic surveillance using the Web.

A major aspect is where the relevant published studies were conducted, but also which regions or countries of the world they concern. We may assume that these two aspects are similar, since a study is usually made in the United States, for the United States. This is only a general rule that applies to most publications, and there are exceptions.
For instance, a scientist in Italy may conduct research about Africa. Such a case is the study of Alicino et al. (2015), which examines the spread of Ebola virus in West Africa based on the findings of researchers from the University of Genoa (Italy). In our case, we believe that it is more appropriate to analyze the geographical location that was studied with the use of the internet. This is important, taking into account that the internet is globally available, which makes it easy for people in almost all countries of the world to access data. Fig. 2.4 shows explicitly the geographic regions from which surveillance data were used. In this figure we can see that the most explored region is North America, with 105 publications. This is because most researchers come from the United States and Canada, but some other scientists have also conducted studies for these countries, especially for the United States. It is important to note that for Africa there are only four studies, conducted for the countries of West Africa (Liberia, Sierra Leone, and Guinea) and concerning Ebola virus, since this virus is generally very common in regions of Africa. For Europe, there are 41 publications and for Asia 49. For Asia, 17 of the articles concern China. For Oceania (Australia and New Zealand), there are eight publications, while for South America we found six publications. For Central America and the Caribbean, only one publication has been made, referring to Haiti. It is worth mentioning that eleven publications examine more than one geographical region; for example, one study used data for Australia, Canada, Ireland, New Zealand, South Africa, the United Kingdom (England and Wales, Scotland, Northern Ireland), and the United States (Paul, Dredze, Broniatowski, & Generous, 2015).
In the regional analysis, we separate the regions that were exclusively studied, while eleven publications concern more than one country. Another example is a study for Australia, Canada, and Ireland, regions quite different from each other that are nevertheless included in one article. Other publications concern the United States and Australia, or Brazil, Ukraine, Turkey, and Venezuela. In the last case, the four countries belong to two different continents, South America and Europe. The above observations mean that the current publications concern all the continents of the world, except Antarctica and the North Pole, since very few people live there.

It is also interesting to locate the countries where the research was conducted. All 225 studies were conducted in 32 countries. Most of the studies (102) were made in the United States. In second place, far behind, we find China with 19 publications, and third place is taken by the United Kingdom with 18. Australia and Italy have nine publications each. The complete list of countries is shown in Table 2.2. From Europe, there are in total 47 publications from 9 countries in this research field: Denmark (5), France (3), Italy (9), Netherlands (2), Norway (1), Portugal (2), Spain (4), Sweden (3), and United Kingdom (18). Surprisingly, Central Europe does not have many publications, as Germany has none and France only three. In contrast, it is quite interesting that countries that are not considered very developed, or that are very small, have at least one publication, such as Madagascar, Pakistan, Philippines, and Thailand.

Scientists refer to or use various sources of data from the web for their experiments and/or their analysis. We found 169 instances in which a web data source was used. The most popular is Twitter, as shown in Fig. 2.5. In Fig. 2.5, we can see all 169 instances. Fifty-nine publications (34.91%) concern Twitter. The next most popular data source is Google Trends (30, 17.75%), which includes both Google Trends and Google Flu Trends (GFT). The Naver Trends search engine is used on one occasion. Another 18 studies (10.65%) include a combination of two or more data sources, for example, social media and web search logs. The 15 publications (8.88%) that are listed as other include special systems developed in some countries. These are basically local systems used to gather epidemiological data. They have been developed in specific countries and can be accessed to provide useful data and information, for example, incidents recorded by physicians, cloud-based electronic health records, various other databases, or emergency medicine internet sites, such as the medical website Vårdguiden.se of Sweden. The Google search engine follows with 11 publications (6.51%), while the Chinese Baidu search engine and Microsoft Bing were used in 6 studies (3.55%). The same number is found for blog data. Wikipedia is a web data source used in five studies (2.96%), and data from Yahoo were used in two instances.

We can see that various web data have been used in correlation with syndromic surveillance data. It is important to mention that the data extraction was carried out either with traditional methods (use of internet sites) or with specific APIs (Application Programming Interfaces), accessed from programming languages and platforms such as Python, C++, C#, .NET, etc.

Various techniques are used in the data analysis approaches of the researchers who made estimations, analyses, or predictions. We identified 109 published items with an explicit data analysis method (48.44%).
The data analysis methods or statistical approaches cannot easily be categorized, as scientists sometimes use multiple methods but, as a rule, all tried to find correlations between the data on diseases from the official authorities and the data obtained from the web, in order to create prediction rules. Twenty-four correlation techniques are used to correlate health data with data from the internet. The correlation techniques used are reported as the Pearson correlation R, the Spearman correlation coefficient, the cross-correlation function, simple correlation, or the autocorrelation function. Others use the R² coefficient. It must be noted that many researchers use combinations of models, which is very common in statistics when the probability or the validity of the results must be examined and verified in depth before they are announced.

Thirty-two types of regression or autoregression models were found. These models can be summarized as follows: linear regression and the ARGO model (autoregression with Google search data), which uses publicly available online search data and not only incorporates the seasonality of influenza epidemics, but also captures changes in people's online search behavior over time. GARMA is the generalized autoregressive moving average model, ARIMA is the Autoregressive Integrated Moving Average model, ARMA is the Autoregressive Moving Average model, and SARIMA is the Seasonal Autoregressive Integrated Moving Average model. In all cases, the dependent variable is the data from the official health authorities, while the independent variables are the data from the sources on the internet. Six studies were conducted using linear, simple, and multivariate or multiple models. Seven use nonlinear models, such as logistic (logit) or nonlinear regression. Six studies use frequency analysis, for example, the frequency of queries submitted on the internet.
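The correlation approach described above, including the cross-correlation at different time lags, can be illustrated with a short sketch. This is not the method of any specific reviewed study: the data below are synthetic, the function names are hypothetical, and only the standard Pearson formula is assumed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(web, official, lag):
    """Correlate web[t] with official[t + lag]; lag > 0 means the web signal leads."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson_r(web, official)
    return pearson_r(web[:-lag], official[lag:])


# Synthetic weekly series (not real data): the web signal is the official
# case series shifted one week earlier, so it "leads" by one week.
official_cases = [2, 3, 5, 9, 14, 20, 17, 11, 6, 3, 2, 2]
web_signal = official_cases[1:] + [2]

# Pick the lag (in weeks) with the highest correlation.
best_lag = max(range(4), key=lambda lag: lagged_correlation(web_signal, official_cases, lag))
print("best lag (weeks):", best_lag)
```

In this toy setting the highest correlation occurs at a lag of one week, which is the kind of lead time that early-prediction studies look for; real series would of course be noisier and would need proper validation.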
It must be mentioned that some other models are used, such as the Support Vector Machine (SVM) in two studies and the Ailment Topic Aspect Model (ATAM) in another two. Of course, there are cases in which the statistical analysis requires more than one technique; for example, one study (Yang, Santillana, & Kou, 2015) was conducted using multiple techniques. Other scientists use LASSO (Least Absolute Shrinkage and Selection Operator) models, referring to them as LASSO regression or the LASSO algorithm, while others use other models, such as ANOVA, other machine learning algorithms, or a combination of them.

Regarding early prediction, some researchers conclude that prediction is possible using web data. In nine studies, the early prediction time (period) varies from one day up to 6–13 weeks.

In total, 786 authors contributed to the 225 publications. Some of them worked on more than one study. John S. Brownstein has the most publications, having participated in 17 published studies. Michael J. Paul has 10 publications as an author, while Mark Dredze and Mauricio Santillana have nine and eight, respectively. Vasileios Lampos has seven, Elad Yom-Tov has six, and Elaine O. Nsoesie has five. In Table 2.3 below, we see that 29 scientists have three or more publications and another 72 have two publications as authors.

Although many aspects of the research regarding syndromic surveillance using the Web have been covered and analyzed, we believe that the field of research is still very wide. Besides influenza, other diseases have been explored, but not to a very wide extent. Lately, since 2017, another aspect has come under examination: the role of weather. Four studies examine the role of weather in an integrated method that uses weather data to assist the prediction of epidemics.
According to Van Noort, Águas, Ballesteros, and Gomes (2011), a more direct approach to determining the seasonal variation of the ILI factor p, for example, could be the confirmation of the presence of ILI symptoms in influenza-infected persons. The above statement is important, since it indicates the use of weather data to track the presence of influenza. If this is true, then future information systems could be based on both web data and weather data. Such an effort is described in the research of Shikha et al. (2017). This is a very interesting aspect, since it is not widely researched and could be researched further. A surveillance system may be created, based on weather forecasts that can be accessed through the internet, to give warnings about the possible spread of a disease, for example, influenza, based on weather conditions. Regarding influenza, it is considered that the spread can be extensive in areas with cold and dry weather conditions (World Health Organization, Media Center, Influenza Overview, Fact Sheet no. 211, March 2003; World Health Organization, 2003). Furthermore, it might be interesting to analyze the spread and outbreak of diseases separately for urban and rural regions, taking into account that most infectious diseases develop to a considerable extent in crowded places, such as big and populous areas, for example, large metropolitan cities. This could be interesting because, in rural areas, internet data may be scarcer or more difficult to find, especially in countries with vast territories or in less developed ones.

The above results show an analysis of the time, the epidemics, the location, the web data, and the data analysis methods that are presented in the examined studies and included in this review. Regarding the time, more publications are made every year, based on research from various countries of the world. This is expected, as technology and the web continuously advance.
The data from the web are now abundant and many scientists can take advantage of them. Many fields of science are now included in this effort to harness the web and estimate epidemics and health issues. Computer science, statistics, sociology, and medicine work together to find appropriate methods for exploiting web data. Various methods are used to analyze the data in a complex framework, with the ultimate goal of social benefit. While broadband internet is nowadays widely used, with data speeds growing through VDSL and ADSL lines (very-high-speed digital subscriber line, asymmetric digital subscriber line) and fiber-optic cables, part of the world remains underdeveloped in this respect, such as Africa, Central America, or South America. As we saw, only four studies examine web data for Africa, one for Central America, and six for South America, as people in these countries may still lack the economic power, the expertise, or the infrastructure to use web services widely. That is why the data volume is limited in these geographical areas. On the other hand, more advanced countries look for ways of using web data for health purposes.

Regarding epidemics, influenza seems to provide plenty of data, and this explains the large percentage of studies that use data for this disease. It is also a disease with many health complications and a large number of deaths annually worldwide. Twitter is found to be used on a large scale, as it offers many useful advantages. Through its special APIs (REST and Streaming API), messages can be tracked in real time and geolocated from almost every part of the world. The messages from Twitter, called tweets, provide a sufficient amount of data that can be further analyzed to help in the estimation of epidemics, although this procedure requires more computer programming techniques and algorithms.
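As an illustration of the kind of programming involved, the sketch below counts keyword mentions per ISO week in a list of already collected messages. It makes no call to the Twitter API: the messages are hypothetical in-memory records, and the keyword list and the function name `weekly_mentions` are assumptions chosen only for this example.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical keyword list; real studies choose and validate terms carefully.
KEYWORDS = {"flu", "influenza", "fever"}


def weekly_mentions(messages, keywords=KEYWORDS):
    """Count messages per (ISO year, ISO week) that contain any keyword.

    `messages` is an iterable of (date, text) pairs. Whole-word matching is
    used so that, for example, "fluent" does not count as "flu".
    """
    counts = Counter()
    for day, text in messages:
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & set(keywords):
            iso_year, iso_week = day.isocalendar()[:2]
            counts[(iso_year, iso_week)] += 1
    return counts


# Hypothetical sample records standing in for collected tweets.
messages = [
    (date(2018, 1, 1), "Down with the flu again"),
    (date(2018, 1, 3), "High fever and cough"),
    (date(2018, 1, 8), "Influenza season is here"),
    (date(2018, 1, 9), "She is fluent in Spanish"),
]

for week, count in sorted(weekly_mentions(messages).items()):
    print(week, count)
```

Weekly counts of this kind are the sort of series that can then be compared with the official surveillance data for the same weeks.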
Google services, on the other hand, such as the search engine and Google Trends, are also extensively used and have been found accurate enough to track epidemics. Nevertheless, having conducted research with both of these popular data sources, we believe that the potential of social media, and of Twitter in particular, is greater, despite certain constraints and limitations. There is much discussion about the usability of internet data for tracking epidemics. The key question a researcher must consider is how an internet system can be managed and can function as a replacement for, a supplement to, a support for, or an extension of traditional monitoring systems. The research of Peek, Holmes, and Sun (2014) concerns the effective management of big data collected through both traditional and internet surveillance systems. The authors believe that health and biomedical data have become so large and complex that the traditional methods and tools for managing them are no longer efficient. They call this development a big data revolution. They examine the technical aspects of big data and the infrastructure needed to manage them in order to provide useful information. They also review analytical issues in health and biomedicine regarding the selection of cases and controls, bias and confounding in observational data, and techniques for mining multidimensional health data. These data come from medical records, administrative data, web search logs, and social media. It is widely recognized that internet and electronic health data are growing quickly as technology advances. Capturing health data from various sources requires changes in technical and management approaches, covering the infrastructure, the file systems, and multidimensional data processing. Data can be captured from the internet with either real-time or non-real-time techniques.
Nevertheless, if we need early prediction of epidemics, we must support real-time data gathering, and new technologies such as Apache Spark and Storm must be introduced to do so. One review examined various studies about the usability of data from social media. Many authors claim that social media programs should primarily be used to support existing surveillance programs. The researchers of this study concluded that the use of search queries and social media for disease surveillance is a relatively recent phenomenon and that the tools and methodology to exploit them are also evolving. Although these internet-based surveillance systems serve a support function to the traditional ones, they require a high level of familiarity with capturing social behavior through social media. The data from social media, such as Twitter, YouTube, and Facebook, are abundant, but the quality of health information shared among users in these media is highly variable, and this may raise concerns (Schein, Wilson, & Kealan, 2011) that social media users are exposed to unopposed viewpoints that counter core public health recommendations. We believe that this concern is reasonable but only partly justified: people do interact with each other through social media about health issues, but it is hard to imagine that an individual would ignore official treatment, medication, or precautionary measures for severe health issues based solely on what people say on social media. On the other hand, a person could be further alerted through social media. Infoveillance (Guy, Ratzki-Leewing, Bahati, & Gwadry-Sridhar, 2011) is a term that describes the capability of retrieving internet data in real time for syndromic surveillance. It is a strategy for capturing real-time online data, which can be systematically mined, aggregated, and analyzed to inform public health and policy. Social media are also considered a real-time source of epidemic intelligence.
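Real-time gathering of the kind described above usually comes down to aggregating events over a moving time window. The following is a minimal sketch of that idea in plain Python; it is not Spark or Storm code, and the class name, window length, and timestamps are assumptions chosen only for illustration.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        """Record one event; timestamps are assumed to arrive in order."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        """Return how many events fall inside the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now):
        # Drop events at or before the start of the window (now - window).
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()


# Five invented event times (in seconds) fed into a one-hour window.
counter = SlidingWindowCounter(window_seconds=3600)
for t in (0, 600, 1200, 4000, 4100):
    counter.add(t)
recent = counter.count(now=4200)
```

A distributed stream processor would apply the same windowing logic across many machines; the single-process version only shows what is being computed.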
Epidemics can be monitored by assessing the expected magnitude, peak time, intensity, and duration (Nsoesie, Brownstein, & Ramakrishnan, 2013). The data from the internet can inform healthcare practitioners on when to expect changes in demand for healthcare resources. A surveillance system based on real-time data could provide the means to do so. Nevertheless, a precise estimation is not always achieved. Google Flu Trends (GFT) is an online system that provides useful data based on Google searches made for influenza. This system was developed by Google and, during its first period of operation, it could also provide estimates and predictions. GFT no longer publishes current estimates, since it missed the emergence of the 2009 influenza pandemic and overestimated the 2012-13 influenza season epidemic (Olson, Konty, Paladini, Viboud, & Simonsen, 2013). Google searches have nonetheless been used by many scientists and researchers to track epidemics, using various statistical methods that were very successful. In general, we must accept that no single rule can be found for all countries and all diseases; each disease requires deep statistical analysis, and the pattern of a disease in one country may differ from that in another. It is possible, however, to build systems based on internet data, social media, search engines, etc., but each system requires its own methods, which may or may not resemble those of other systems. The global pattern of human behavior and globalization can help in this respect, since the internet allows people to interact with an electronic system (such as Google Search) or with each other (social media). It is true that internet-based approaches are logistically and economically appealing. However, they do not have the capacity to replace traditional surveillance systems; they should not be viewed as an alternative, but rather as an extension (Milinovich, Williams, & Hu, 2014). Of course, one could argue that this statement reflects only today's technology.
In the future, as research continues, technology will advance further and the credibility of internet-based systems will probably improve and spread across the world. The findings of research to date show that creating such internet surveillance systems becomes more feasible over time. The impact of syndromic surveillance using web data has been shown to be large, both on the academic community and on the world in general. This is revealed not only by the numerous publications and the large number of authors and researchers, but also by the variety of techniques and sources used. The available tools to estimate and predict epidemics or health issues are undeniably a major scientific achievement. Using web data, syndromic surveillance could be accurate at a lower cost than traditional systems. Nevertheless, traditional surveillance systems must not cease to exist. Web systems could be thought of as supplementary or supportive systems. Hardware and software continue to improve. Despite the evolution of electronic systems, the role of humans is still important. Moran et al. (2016) describe the role of human interaction, comparing epidemic forecast models with weather forecast models, and find that epidemic forecasting is messier than weather forecasting. Google data and Twitter data represent almost two-thirds (65.70%) of the total web data used to estimate epidemics. However, we must not forget that there is a significant difference between these popular tools. With Google logs or data from other search engines, it is possible to track the search volume but not the search text itself, because what we really need is the aggregated number of searches, which these search engines provide. Twitter, on the other hand, offers the same ability, but it first requires retrieving the whole text of each message, which is easy enough to obtain.
This means that we can read an individual's detailed view not only on epidemics but on almost any topic. This is not, of course, a privacy violation, but such data must not be used for malicious purposes. Human interaction in social media must be conceptualized as a snapshot, captured at a specific time and place through the internet. This is, moreover, what syndromic surveillance does: an estimation of time and place. Regarding modern healthcare systems, this work reveals the potential of internet syndromic surveillance systems to perform as adaptive analytic systems or as a novel research perspective in many ways that are critical for Smart Healthcare: by enhancing Smart Healthcare data preprocessing and modeling and by using big data analytic techniques, model evaluation and knowledge deployment become possible through new forms of information infrastructure, as described in the work of Spruit and Lytras (2018). The key concepts applied to Smart Healthcare systems may be the following:
• How applied data science for patient-oriented healthcare can empower medical scientists and patients to improve healthcare more effectively and efficiently,
• The discipline of applied data science,
• The knowledge discovery process (KDP), and
• The meta-algorithmic modeling.
Since gathering data from the internet has been shown to be easy nowadays across the world, Smart Healthcare systems may direct the derived information to various recipients: governments, scientists, medical doctors, national or international medical organizations or institutions, patients, or people who travel a lot and need direct and constant knowledge of what is happening in the world regarding epidemics. This spread of data and information, within the framework of Smart Healthcare, could be valuable.
Considering the advance of the Internet of Things (IoT), the recent and future introduction of 5G networks, and smart home perspectives, internet surveillance systems could lead to better warning about epidemics and better handling of them in health plans and health policy decisions. From this perspective, health policy decisions may need to pay closer and greater attention to emerging technologies, such as the IoT, cognitive computing, advanced analytics and business intelligence, 5G networks, anticipatory and context-aware computing, and advanced distributed data warehouse platforms (Visvizi, Lytras, Damiani, & Mathkour, 2018). We believe that the key perspective would not be just the change of the technology infrastructure and social behavior, but the proper handling of health data itself for health policies. Extending the above to the smart cities or villages of the future (Lytras & Visvizi, 2018; Visvizi & Lytras, 2019) within the framework of the IoT, another debate arises regarding data on epidemics: first, citizens' awareness of applications and solutions that are considered smart and, second, their ability to use these applications and solutions. In any case, enhancing traditional surveillance systems with internet systems for epidemics, given the proper use of health data, can lead to socially driven decisions and policies for better human living conditions. In the end, both syndromic surveillance using internet data and Smart Healthcare will have impacts on society, technology, and medical science.

The main conclusions of this review can be summarized as follows: There is an extensive literature related to the field of syndromic surveillance using the web. This review has gathered the largest collection of articles and publications in this field ever assembled, 225 in total.
Of course, we may assume that more publications can be found outside Google Scholar or with other search queries, but the bulk of the literature gathered in this review is sufficient to extract useful results and conclusions. Academic interest is still growing, especially after 2009. The relevant publications span the years 2004 to 2018. The number of publications has risen since 2009, following the influenza pandemic of that year. The total number of authors across the publications is 786 worldwide. These two facts are important, as they reveal the significance and the future prospects of this research field. Many health topics have been discussed, although most publications concern influenza. It seems that a lot of data on influenza are available through the internet, but the fact that 34 severe diseases in total have been included is also important, as it shows the potential of web data for syndromic surveillance. Most articles examine North America (the United States and Canada), Europe, and Asia, since most of the scientists come from these regions and data are broadly available there. Almost all regions of the earth have been examined for epidemics, but some areas, such as Africa, South America, and Central America, are underexamined. The contribution of scientists across the world is also extensive, since researchers from 32 countries have published works in this field. Twitter first and Google second are the most cited and used web data sources, although other social media, search engines, health records, and web logs have also been widely used. There is a rich collection of web data sources and electronic systems that have been built to serve as internet surveillance systems. Various data analysis models and techniques are used to correlate health data with data from the internet.
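One common way to relate a web signal to official health data is to compute their correlation at several time lags and see at which lag the two series agree best. The following is a minimal sketch of that idea under stated assumptions: it uses the plain Pearson coefficient, two equal-length weekly series, and invented toy numbers; the function names are hypothetical and no particular reviewed study is being reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_lead(web, official, max_lag):
    """Return (lag, r) that maximizes corr(web[t], official[t + lag])."""
    best = None
    for lag in range(max_lag + 1):
        x = web[:len(web) - lag]   # web signal, truncated at the end
        y = official[lag:]         # official data, shifted by `lag` weeks
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Toy weekly series: the official curve repeats the web curve 2 weeks later.
web = [1, 3, 7, 12, 7, 3, 1, 1, 1, 1]
official = [1, 1, 1, 3, 7, 12, 7, 3, 1, 1]
lag, r = best_lead(web, official, max_lag=4)
```

In this toy case the best agreement is found when the web series leads by two weeks. With real data the series are noisy, so a high correlation at some lag would be a starting point for further modeling rather than evidence of predictive power on its own.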
On some occasions, scientists show that a prediction of epidemics is possible before the official data are announced by the competent health authorities. This prediction was achieved 6-13 weeks prior to the official data. Another important aspect is that, using web data, we could estimate the development of a disease over shorter surveillance intervals; while most traditional surveillance systems monitor epidemics on a weekly or monthly basis, information systems based on web data could provide daily monitoring or monitoring over even shorter periods. Smart Healthcare innovations can use internet-based surveillance systems to assist in raising awareness and directing information to people, medical science, and organizations, as well as in policy decisions. The continuous use of the internet, Global System for Mobile Communications (GSM) networks, smart mobile devices, and health information could lead to a better understanding of epidemics and better techniques for informing people about them. Within the concept and realization of the IoT, early detection and prevention of epidemics, rather than treatment, could become the common policy for the future, reducing costs and procedures. Technology infrastructure will change to deal with the big data revolution, as not only humans but also smart devices will probably be constantly connected and interact with each other. The KDP is a constant effort to derive knowledge from data. It can be thought of as a procedure that adapts to technological progress. Health data have always been sensitive and not easy to use. The same applies to internet data, even though they are not epidemiological data as such but reveal an epidemic through other sources, for example, social media and web searches. The internet, and Smart technologies in particular, could become uncontrollable if not treated with the same care. Future research could also be dedicated to this matter.
Both internet surveillance systems and Smart Healthcare have the same aim: to improve the quality of life. The potential of research in this field remains broad, and diseases other than influenza could be further examined. In addition, research could be directed to other aspects of syndromic surveillance, such as weather conditions or the distinction between epidemics in urban and rural regions, with the use of data from the web. Furthermore, although the impact of internet data on monitoring epidemics may be obvious, it could be discussed further in the future. The web can be a useful tool for obtaining data and making predictions about epidemics, despite occasional restrictions in accuracy or in the size of the data. Internet-based surveillance systems can be established to track epidemics and could work as alternative or supportive systems, or as an extension of traditional systems. Finally, we believe that the contribution of this study is that it showcases the extensive research on and use of internet surveillance systems, the techniques, and the future perspectives within the concept and framework of Smart Health. Technological advance requires a change in the way we think of data and information and of how to deliver them to improve public health.

• Extensiveness of surveillance systems with internet data
• Data acquiring techniques and programming languages
• Social media interaction
• Early prediction and real-time monitoring of epidemics
• Smart Healthcare with the use of internet data for epidemics

Johnson, H. A., Wagner, M. M., Hogan, W. R., Chapman, W., Olszewski, R. T., Dowling, J., & Barnas, G. (2004). Analysis of web access logs for surveillance of influenza.
Monitoring disease outbreak events on the web using text-mining approach and domain expert knowledge. International conference on language resources and evaluation 10. Portoroz, Slovenia, 23 May 2016/28 May 2016. Published version, English.
Smart Health - A new form of healthcare Assessing Ebola-related web search behaviour: Insights and implications from an analytical study of Google Trends-based query volumes Scoping review on search queries and social media for disease surveillance: A chronology of innovation Infodemiology: Tracking flu-related searches on the web for syndromic surveillance "Googling" for cancer: An infodemiological assessment of online search interests in Australia, Canada, New Zealand, the United Kingdom, and the United States Social media: A systematic review to understand the evidence and application in infodemiology, Electronic Healthcare Overview of syndromic surveillance. What is syndromic surveillance? Centers for Disease Control and Prevention Web queries as a source for syndromic surveillance Developing a prototype system for syndromic surveillance and visualization using social media data (Thesis) The reliability of tweets as a supplementary method of seasonal influenza surveillance Use of internet search queries to enhance surveillance of foodborne illness Real-time predictive seasonal influenza model in Catalonia Scoping review on search queries and social media for disease surveillance: A chronology of innovation Google trends for formulating GIS mapping of disease outbreaks in India An investigation of the public health informatics research and practice in the past fifteen years from 2000 to 2014: A scoping review Validating models for disease detection using twitter Malaria surveillance system using social media How often people google for vaccination: Qualitative and quantitative insights from a systematic search of the web-based activities using Google Trends Mining web data for epidemiological surveillance, PAKDD.
Lecture Notes in Computer Science (7769) Semantic analysis of open source data for syndromic surveillance National and local influenza surveillance through Twitter: An analysis of the 2012-2013 influenza epidemic Digital disease detection-harnessing the Web for public health surveillance Mining Twitter data for influenza detection and surveillance Time-series adaptive estimation of vaccination uptake using web search queries Google trends: A web-based tool for real-time surveillance of disease outbreaks Extracting signals from news streams for disease outbreak prediction Syndromic surveillance of Flu on Twitter using weakly supervised temporal topic models News trends and web search query of HIV/AIDS in Hong Kong Web-based infectious disease surveillance systems and public health perspectives: A systematic review Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak OMG U got flu? Analysis of shared health messages for biosurveillance Social network simulation and mining social media to advance epidemiology Monitoring influenza trends through mining social media Text and structural data mining of influenza mentions in web and social media Social networks, web-based tools and diseases: Implications for biomedical research Predicting and containing epidemic risk using friendship networks Detecting influenza outbreaks by analyzing Twitter messages. CoRR, abs/1007.4748 Towards detecting influenza outbreaks by analyzing Twitter messages Lightweight methods to estimate Influenza rates and alcohol sales volume from Twitter messages.
Language Resources and Evaluation Impact of extreme weather events and climate change for health and social care systems Using Twitter sentiment and emotions analysis of Google Trends for decisions making Insights from flutracking: Thirteen tips to growing a web-based participatory surveillance system Facebook and Twitter vaccine sentiment in response to measles outbreaks Enhancing Twitter data analysis with simple semantic filtering: Example in tracking Influenza-like illnesses Carmen: A twitter geolocation system with applications to public health Google Flu Trends: Correlation with emergency department Influenza rates and crowding metrics Influenza forecasting with Google flu trends Network based model of social media big data predicts contagious disease diffusion. Information Discovery and Delivery Infodemiology: Tracking flu-related searches on the web for syndromic surveillance Infodemiology and infoveillance: Framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the internet Impact of predicting health care utilization via web search behavior: A data-driven analysis Improving disease surveillance: Sentinel surveillance network design and novel uses of Wikipedia Using social media as a method for early indications & warnings of biological threats. 
Capstone Project Syndromic surveillance using regional emergency medicine internet "Googling" for cancer: An infodemiological assessment of online search interests in Australia, Canada, New Zealand, the United Kingdom, and the United States Internet-based monitoring of Influenza-like illness in the general population: Experience of five Influenza seasons in The Netherlands Global disease monitoring and forecasting with Wikipedia Detecting influenza epidemics using search engine query data Analysing trends and forecasting malaria epidemics in Madagascar using a sentinel surveillance network: A web-based application Monitoring seasonal influenza epidemics by using internet search data with an ensemble penalized regression model Social media: A systematic review to understand the evidence and application in infodemiology, Electronic Healthcare Ensemble learned vaccination uptake prediction using web search queries Predicting antimicrobial drug consumption using web search data Twitter and Public Health (Part 2): Qualitative analysis of how individual health professionals outside organizations use microblogging to promote and disseminate health-related information The landscape of international event-based biosurveillance Twitter and the volume of influenza-like illness in a pediatric hospital Informatics research proposal predicting influenza trends from blogspsu Natural supplements for H1N1 Influenza: Retrospective observational infodemiology study of information and search activity on the Internet Prediction of infectious disease spread using Twitter: A case of Influenza Using public open data to predict dengue epidemic: Assessment of weather variability, population density, and land use as predictor variables for dengue outbreak prediction using support vector machine Monitoring hand, foot and mouth disease by combining search engine query data and meteorological factors Detecting flu transmission by social sensor in China GET WELL: An automated surveillance system
for gaining new epidemiological knowledge Web queries as a source for syndromic surveillance A web-based analysis for dengue tracking and prediction using artificial neural network. In Advanced science and technology letters Forecasting word model: Twitter-based influenza surveillance and prediction Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal Predicting new diagnoses of HIV infection using internet search engine data Epidemic outbreak and spread detection system based on twitter data Analysis of web access logs for surveillance of influenza Subregional nowcasts of seasonal influenza using search trends Using google trends for Influenza surveillance in South China Social network analysis and modeling of cellphone-based syndromic surveillance data for Ebola in Sierra Leone Systematic review of surveillance by social media platforms for illicit drug use Immediate and long-term effects of 2016 Zika outbreak: A twitter-based study Investigating Twitter as a source for studying behavioral responses to epidemics Separating fact from fear: Tracking flu infections on twitter Flu detector: Estimating influenza-like illness rates from online user-generated content Assessing public health interventions using Web content Flu detector-tracking epidemics on Twitter Assessing the impact of a health intervention via user-generated Internet content Does locally relevant, real-time infection epidemiological data improve clinician management and antimicrobial prescribing in primary care? A systematic review. Family Practice Seeking health information online: Does Wikipedia matter Detecting social signals of flu symptoms Forecasting influenza levels using real-time social media streams Early stage influenza detection from twitter. 
Computer Science - Social and Information Networks, Computer Science - Computation and Language Heat stroke internet searches can be a new heatwave health warning surveillance indicator The wisdom of crowds in action: Forecasting epidemic diseases with a web-based prediction market system Dengue Baidu Search Index data can improve the prediction of local dengue epidemic: A case study in Guangzhou, China Using Baidu search index to predict Dengue outbreak in China Accurate influenza monitoring and forecasting using novel Internet data streams: A case study in the Boston Metropolis Using multi-source web data for epidemic surveillance: A case study of the 2009 influenza A (H1N1) pandemic A new approach to monitoring dengue activity Web-based surveillance systems for human, animal, and plant diseases Google flu trends and emergency department triage data predicted the 2009 pandemic H1N1 waves in Manitoba Dengue prediction by the web: Tweets are a useful tool for estimating and forecasting dengue at country and city level Using social network analysis to inform disease control interventions. Preventive Veterinary Medicine, 126, 94-104 Predicting the spread of pandemic influenza based on air traffic data and social media Google Flu Trends in Canada: A comparison of digital disease surveillance data with physician consultations and respiratory virus surveillance data Towards exploiting social networks for detecting epidemic outbreaks Forecasting AIDS prevalence in the United States using online search traffic data Using social media to monitor mental health discussions-evidence from Twitter Forecasting Zika incidence in the 2016 Latin America outbreak combining traditional disease surveillance with search, social media, and news report data Wikipedia usage estimates prevalence of Influenza-like illness in the United States in near real-time Use of web-based symptom checker data to predict incidence of a disease or disorder. US Patent App.
14/180,683, US20140236613A1, US Application Integrating malaria surveillance with climate data for outbreak detection and forecasting: The EPIDEMIA system Internet-based surveillance systems for monitoring emerging infectious diseases. The Lancet Infectious Diseases Tracking dengue epidemics using twitter content classification and topic modelling Epidemic forecasting is messier than weather forecasting: The role of human behavior and internet data streams in epidemic forecast Comparing social media and search activity as social sensors for the detection of influenza Forecasting influenza outbreak dynamics in Melbourne from Internet search query surveillance data Climate change and public health surveillance: Toward a comprehensive strategy Twitter influenza surveillance: Quantifying seasonal misdiagnosis patterns and their impact on surveillance estimates A case study of the New York City 2012-2013 Influenza season with daily geocoded Twitter data from temporal and spatiotemporal perspectives The complex relationship of real space events and messages in cyberspace: Case study of Influenza and pertussis using tweets Analysis of public concerns about Influenza vaccinations by mining a massive online question dataset in Japan Role of online data from search engine and social media in healthcare informatics Monitoring Twitter content related to influenza-like-illness in Spanish-speaking populations Computational approaches to influenza surveillance: Beyond timeliness Using search queries for malaria surveillance Towards early discovery of salient health threats: A social media emotion classification technique Patterns of information-seeking for cancer on the internet: An analysis of real world data Reassessing Google Flu Trends data for detection of seasonal and pandemic Influenza: A comparative epidemiological study at three geographic scales Respiratory syncytial virus tracking using internet search engine data Digital disease detection: A systematic review of
event-based internet biosurveillance systems ASPREN surveillance system for Influenza-like illness: A comparison with flutracking and the national notifiable diseases surveillance system Comparison: Flu prescription sales data from a retail pharmacy in the US with Google Flu trends and US ILINet (CDC) data as flu activity indicator You are what you Tweet: Analyzing Twitter for public health A model for mining public health topics from Twitter Twitter improves Influenza forecasting Worldwide influenza surveillance through twitter Social media mining for public health monitoring and surveillance Technical challenges for big data in biomedicine and health: Data sources, infrastructure, and analytics Participatory online surveillance as a supplementary tool to sentinel doctors for influenza-like illness surveillance in Italy Using participatory Web-based surveillance data to improve seasonal influenza forecasting in Italy Early detection of perceived risk among users of a UK travel health website compared with internet search activity and media coverage during the 2015-2016 Zika virus outbreak: An observational study Using internet searches for Influenza surveillance Internet-based biosurveillance methods for vector-borne diseases: Are they novel public health tools or just novelties?
Evaluating Google Flu Trends in Latin America: Important lessons for the next phase of digital disease detection Prediction using propagation: From flu trends to cybersecurity Measuring global disease with Wikipedia: Success, failure, and a research agenda Estimating disease burden using google trends and wikipedia data advances in artificial intelligence: From theory to practice The measles vaccination narrative in twitter: A quantitative analysis Systematic review of electronic surveillance of infectious diseases with emphasis on antimicrobial resistance surveillance in resource-limited settings Forecasting rare disease outbreaks from open source indicators Using transactional big data for epidemiological surveillance: Google flu trends and ethical implications of 'infodemiology' Avian influenza risk surveillance in North America with online media IPSIM-Web, an online resource for promoting qualitative aggregative hierarchical network models to predict plant disease risk: Application to brown rust on wheat Disease surveillance based on Internet-based linear models: An Australian case study of previously unmodeled infection diseases Challenges in detecting epidemic outbreaks from social networks Deploying nEmesis: Preventing foodborne illness by data mining social media Modeling spread of disease from social interactions Predicting disease transmission from geo-tagged micro-blog data Syndromic surveillance models using Web data: The case of scarlet fever in the UK.
Informatics for Health and Social Care Syndromic surveillance models using web data: The case of influenza in Greece and Italy using google trends Smart monitoring and controlling of Pandemic Influenza A (H1N1) using Social Network Analysis and cloud computing Combining search, social media, and traditional data sources to improve Influenza surveillance Cloud-based electronic health records for real-time, region-specific influenza surveillance Using clinicians' search query data to monitor Influenza epidemics Predicting flu incidence from Portuguese Tweets Analysing Twitter and web queries for flu trend prediction Literature review on effectiveness of the use of social media: A report for Peel Public Health Web-based surveillance of illness in childcare centers Effective detection of the 2009 H1N1 Influenza pandemic in US Veterans Affairs medical centers using a national electronic biosurveillance system The potential use of social media and other internet-related data and communications for child maltreatment surveillance and epidemiological research: Scoping review and recommendations What can google and wikipedia can tell us about a disease? Big Data trends analysis in systemic lupus erythematosus The utility of "Google Trends" for epidemiological research: Lyme disease as an example. Geospatial Health, 4, 135À137 Real-time processing of social media with SENTINEL: A syndromic surveillance system incorporating deep learning for health classification. 
Information Processing & Management An infodemiology study on breast cancer in Iran: Health information supply versus health information demand in PubMed and Google Trends Evaluating Google, Twitter, and Wikipedia as tools for influenza surveillance using Bayesian change point analysis: A comparative analysis Predicting flu-rate using big data analytics based on social data and weather conditions Correlation between national influenza surveillance data and search queries from mobile devices and desktops in South Korea Use of social media to monitor and predict outbreaks and public opinion on health topics The use of Twitter to track levels of disease activity and public concern in the US during the Influenza A H1N1 pandemic Medical analysis and visualisation of diseases using Tweet data VazaDengue: An information system for preventing and combating mosquito-borne diseases with social networks Applied data science in patient-centric healthcare: Adaptive analytic systems for empowering physicians and patients StreamWeb: Real-time web monitoring with stream computing Dynamic forecasting of Zika epidemics using Google Trends Internet-based surveillance of Influenza-like-illness in the UK during the 2009 H1N1 Influenza pandemic Mining social media and web searches for disease detection Monitoring Influenza activity in Europe with Google Flu Trends: Comparison with the findings of sentinel physician networks-results for 2009-10 Social network clustering and the spread of hiv/ aids among persons who inject drugs in 2 cities in the Philippines Smart Cities: Issues and Challenges: Mapping Political, Social and Economic Risks and Threats New media methods for syndromic surveillance and disease modelling. 
CAB reviews perspectives in agriculture veterinary science nutrition and natural resources Regional level influenza study with geo-tagged Twitter data Forecasting the incidence of dementia and dementia-related outpatient visits with google trends: Evidence from Taiwan Using Twitter data to provide qualitative insights into pandemics and epidemics Early detection of disease outbreaks using the Internet Identification of keywords from Twitter and web blog posts to detect influenza epidemics in Korea. Disaster Medicine and Public Health Preparedness Estimating influenza outbreaks using both search engine query data and social media data in South Korea Tracking and predicting hand, foot, and mouth disease (HFMD) epidemics in China by Baidu queries Detecting and tracking disease outbreaks by mining social media data Forecasting influenza in Hong Kong with Google search queries and statistical model fusion A neural network based approach to detect Influenza epidemics using search engine query data Effectiveness of web-based social sensing in health information dissemination-A review Utility and potential of rapid epidemic intelligence from internet-based sources Do seasons have an influence on the incidence of depression? 
The use of an internet search engine query data as a proxy of human affect Uncovering social media data for public health surveillance, Association for Information Systems AIS Electronic Library (AISeL) Advances in using Internet searches to track dengue Using electronic health records and Internet search information for accurate influenza forecasting Early warning for infectious disease outbreak: Theory and practice Accurate estimation of Influenza epidemics using Google search data via ARGO Use of social media for the detection and analysis of infectious diseases in China Detecting disease outbreaks in mass gatherings using Internet data Learning about health and medicine from Internet data Seeking insights about cycling mood disorders via anonymized search logs Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes Using search engine data as a tool to predict syphilis Monitoring influenza epidemics in China with search query from Baidu Using Google Trends and ambient temperature to predict seasonal influenza outbreaks A spatial-temporal method to detect global influenza epidemics using heterogeneous data collected from the Internet Multi-task learning improves disease models from web search On infectious intestinal disease surveillance using social media content We acknowledge the University of Alcala de Henares which gave us the opportunity and support to conduct this research. Loukas Samaras conducted the research with the help of Dr. Miguel-Angel Sicilia and Dr.Elena García-Barriocanal. The conception and plan of the research conducted is part of Loukas Samaras' ongoing PhD work, and was done under the supervision of Miguel-Angel Sicilia and Elena García-Barriocanal.