key: cord-0764971-9ys14x35
authors: Gori, Davide; Durazzi, Francesco; Montalti, Marco; Di Valerio, Zeno; Reno, Chiara; Fantini, Maria Pia; Remondini, Daniel
title: Mis-tweeting communication: a Vaccine Hesitancy analysis among twitter users in Italy
date: 2021-10-01
journal: Acta Biomed
DOI: 10.23750/abm.v92is6.12251
sha: 33eda7b279dfda3f3e0ff5ffec240bad91299d4f
doc_id: 764971
cord_uid: 9ys14x35

BACKGROUND AND AIM: An unprecedented body of scientific knowledge of varying quality has been produced during the ongoing COVID-19 pandemic. It has proven extremely difficult to navigate for experts and laymen alike, giving rise to the so-called “Infodemic”, a breeding ground for misinformation. This has a potential impact on vaccine hesitancy that must be considered in a situation where efficient vaccination campaigns are of the greatest importance. We aimed to describe the polarization and volumes of Italian-language tweets in the months before and after the start of the vaccination campaign in Italy.
METHODS: Tweets were sampled in the October 2020-January 2021 period. The characteristics of the dataset were analyzed after manual annotation as Anti-Vax, Pro-Vax and Neutral, which allowed for the definition of a polarity score for each tweet.
RESULTS: Based on the annotated tweets, we could identify 29.6% of the 2,538 unique users as anti-Vax and 12.1% as pro-Vax, with a strong disagreement in annotation in 7.1% of the tweets. We observed a change in the proportion of retweets to anti-Vax and pro-Vax messages after the start of the vaccination campaign in Italy. Although the most shared tweets are those at the opposite extremes of orientation, the most retweeted users are moderately polarized.
CONCLUSIONS: The disagreement on the manual classification of tweets highlights a potential risk for misinterpretation of tweets among the general population.
Our study reinforces the need to focus Public Health attention on the new social media with the aim of increasing vaccine confidence, especially in the context of the current pandemic. (www.actabiomedica.it)

The still ongoing COVID-19 pandemic has been accompanied by a continuous and overwhelming stream of information and scientific evidence, which has proven invaluable in pooling scientific knowledge and expertise across the globe. This remarkable collective effort has since shown some less benign consequences, well encompassed by the term "Infodemic", defined by the World Health Organization (WHO) as an overabundance of information, both online and offline (1). This overwhelming amount of data of varying quality and from different sources could endanger the public health efforts to curb the spread and effects of the pandemic by fostering mistrust in health authorities. It could also prove harmful to individual health, as it could lead to uncertainty regarding which information to trust and rely on, and thus potentially to a higher risk of adopting harmful behaviours. This uncertainty and the difficulty in acquiring reliable and scientifically sound information have the potential to exacerbate the phenomenon of Vaccine Hesitancy (VH), which was identified as a major public health concern by WHO in 2019 (2). The WHO Scientific Advisory Group of Experts (SAGE) working group on VH defined it as a "delay in acceptance or refusal of vaccination despite availability of vaccination services": this definition encompasses not only outright opposition to vaccination but the whole spectrum of negative stances toward it, including reluctant acceptance (3). As the efforts to curb the harrowing effects of the pandemic on healthcare systems and societies rely heavily on efficient mass-vaccination campaigns, VH is now more than ever an issue that needs to be firmly addressed.
In a recent concise systematic review, only a narrow majority of survey studies conducted among the general population, stratified by country (29 out of 47 studies, 62%), showed an acceptance of COVID-19 vaccination ≥70% (4). Unsatisfactory rates of COVID-19 vaccine acceptance have been reported in the Middle East, Russia, Africa, and several European countries, including Italy (5-9). In many countries, values under 60% have been registered: a figure below the estimated range of immune individuals (60-75%) needed to halt the transmission and spread of the virus (10, 11).

In this context, social media represent a relatively new factor at play, with the potential to accelerate the spread of (good and bad) information and to offer a means of rapidly selecting what is relevant to public discourse. If properly managed, this instrument might prove invaluable in enabling the diffusion of scientifically validated and useful information, but if left unchecked it shows some detrimental aspects. In fact, it has proven to be a fertile environment for misinformation, with an apparent advantage of unscientific views over official and scientifically validated information (12). This can be inferred from evidence showing that anti-vaccine tweets have a 4.13-fold chance of being retweeted compared with neutral tweets (13); moreover, a majority (65%) of YouTube videos on the topic of vaccines have been shown to convey anti-vaccine messages (14). Information diffusion patterns on these platforms have also shown an echo chamber effect, which gives rise to well-segregated communities and increases polarization (15). Additionally, previous evidence from several studies highlighted not only that VH is fostered by misinformation (16-19), but also that it is actively weaponized for political purposes (20).
Similar phenomena have been recently observed during the COVID-19 pandemic (21), when the scientific community initially played the role of boundary spanner but then lost its relevance and became more isolated within a context of increasing politicisation of the debate. In addition to these social phenomena, social media are sensitive to targeted disinformation campaigns that actively spread malware and unsolicited content. The Twitter discourse around vaccines has been found to be influenced by Twitter bots and "trolls" (22), the former being artificial entities spreading antivaccine messages, and the latter users actively promoting discord and eroding public consensus. The overall picture regarding vaccines on Twitter during the ten years before COVID-19 shows an increasing antivaccine user base with a minimal amount of communication between communities (23). As a possible effect of the division about vaccines, previous literature observed that the struggle between pro- and anti-vaccinationists on various social media (e.g. Twitter, Facebook) leads to an increase in the number of undecided people, who are more likely to cluster around anti-vaccination movements than around official/institutional sources of information (24, 25). This might be one of the reasons behind the steep increase of VH in the years following the introduction of social media, which led the WHO to label it as a major Public Health concern (2).

The aim of our work is to characterize a dataset of Italian-language tweets referring to both vaccines and vaccination practice, in order to provide the scientific community with potentially useful information about Pro- and AntiVax users on Twitter.
Such knowledge on prevalence, diffusion patterns, polarization, and other characteristics of tweets regarding vaccination could be helpful in understanding the machinery behind the successful online communication of scientific messages and, on a more practical side, in planning more targeted and efficient campaigns on social media aimed at improving vaccine acceptance, especially considering their growing role in the public debate.

An Italian-language dataset of tweets related to vaccines and vaccination practice was collected from October 2020 to January 2021 through the continuous filter streaming endpoint of the Twitter API, as implemented in the Python package Tweepy. Tweets were downloaded if they contained at least one of the following words within their text: "vaccino", "vaccini", "vaccinazione" and "vaccinazioni", which are the Italian translations of "vaccine(s)" and "vaccination(s)". Starting from a collection of 764K original tweets, we randomly sampled 7,004 tweets to be manually annotated. A panel of MD Public Health Residents and Physics students assigned each tweet to one of several predefined classes:
-"AntiVax" (AV): expressing negative stances towards vaccines.
[...] topic is not dealt with in the right context (e.g., metaphors or analogies).
Tweets considered as "OffTopic" by at least 1 of the 3 annotators were removed. In particular, the annotators were asked to consider as explicitly AntiVax or ProVax only tweets clearly containing an opinion of the author toward the acceptance of a vaccine. For this reason, the annotators were asked to classify as Neutral any messages expressing aversion to or support for pharmaceutical companies and distribution strategies.

To characterize the polarization between anti-vax and pro-vax tweets, we can define a polarity score for each tweet as

$$P_T = \frac{1}{n} \sum_{c} s_c \, n_c$$

where the sum is over the classes $c \in \{\mathrm{AV}, \mathrm{N}, \mathrm{PV}\}$, with $s_c = -1$ for AV, $0$ for N and $+1$ for PV tweets, $n_c$ is the number of annotators assigning the tweet to class $c$, and $n$ is the total number of annotators.
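As an illustration of the polarity score defined above, the following minimal Python sketch computes it for a single tweet from the list of annotator labels. The label strings ("AV", "N", "PV", "OT"), the function name `polarity_score` and the use of exact fractions are assumptions made for this example; they are not taken from the original analysis code.

```python
from fractions import Fraction

# Numeric value of each stance class, as described in the text:
# -1 for AntiVax, 0 for Neutral, +1 for ProVax.
STANCE_VALUE = {"AV": -1, "N": 0, "PV": 1}


def polarity_score(annotations):
    """Return the mean stance value over the annotators of one tweet.

    Returns None if any annotator marked the tweet as off-topic ("OT"),
    mirroring the rule that such tweets are removed from the sample.
    """
    if "OT" in annotations:
        return None
    total = sum(STANCE_VALUE[label] for label in annotations)
    return Fraction(total, len(annotations))


print(polarity_score(["AV", "AV", "AV"]))  # -1
print(polarity_score(["AV", "AV", "N"]))   # -2/3
print(polarity_score(["AV", "AV", "PV"]))  # -1/3
print(polarity_score(["PV", "OT", "N"]))   # None
```

Exact fractions are used here only so that values such as -2/3 can be compared without floating-point rounding; ordinary floats would serve equally well for plotting.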
The polarity score spans from -1 (3/3 annotations as AV) to +1 (3/3 annotations as PV). Based on this score, we can assign a polarity class to each tweet: AV if $P_T \le -2/3$, PV if $P_T \ge +2/3$ and N otherwise (with three annotators, $P_T$ can only take values that are multiples of $1/3$). This method makes it possible to distinguish between annotations with "weak disagreement", such as AntiVax-AntiVax-Neutral annotations by the three evaluators (producing a value of $P_T = -2/3$), and annotations with "strong disagreement", such as AntiVax-AntiVax-ProVax annotations (producing a value of $P_T = -1/3$). Following the definition of polarity class given above, the tweets annotated with strong disagreement will be classified as Neutral, while the tweets annotated with weak disagreement will be classified in accordance with the majority annotation. We also calculate the polarity score ($P_U$) for the Twitter users in the database as the average score of the tweets they wrote (in the range $[-1, +1]$). The user is then assigned to a user polarity class, that is AV if [...], PV if [...] and N otherwise.

Finally, following a previously conducted study in which the volume of tweets and retweets was considered (26), a stratified analysis for AV, PV, and N over time was conducted considering the time series of tweet volumes. The visual representation of tweet volumes over time allows for considerations regarding volume peaks and potential events that might have caused them. The number of retweets of each annotated tweet was interpreted as a proxy for the attention received by the tweet and by the polarity class associated with it. In this sense, when we refer to AntiVax, Neutral or ProVax retweets, we refer to the labels of the manually annotated original tweets.

A corpus of 7,004 tweets was manually annotated, of which 495 tweets (7.0%) were considered as OT by at least one annotator and consequently removed. The remaining 6,509 represented the final sample analysed. In Figure
1 the percentage of agreement and disagreement on tweets annotated with at least 2/3 annotations (based on a majority criterion) in the same class is shown. As we show in Figure 1, N and AV tweets have a higher percentage of full (3/3) consensus between annotators (60.8% and 58.0% respectively), while the agreement on tweets to be considered as PV is lower (44.7%). We then classified each tweet using the polarity classes estimated through the polarity measure $P_T$, which considers all tweets containing annotations of opposite polarity as Neutral, as detailed in Materials and methods. The class frequencies according to this method are thus 61.2% for N, 24.1% for AV and 14.7% for PV. Polarity scores and annotation frequencies within the dataset are listed in Table 1 and visualized in Figure 2. Tweets considered as AV by 3/3 annotators were 1,007 (15.5% of the dataset), while 563 (8.7%) were considered as AV by 2/3. On the other hand, 473 (7.3%) and 479 (7.4%) were labelled as PV by 3/3 and 2/3 annotators respectively. Furthermore, 190 tweets received an annotation in each of the three categories (indicated as ANP in Table 1) and 272 tweets received partly opposite annotations (indicated as AAP or APP), leading to a strong disagreement in 7.1% of the total number of tweets. Even if these three groups of tweets, which we chose to classify as Neutral, are the least represented categories in Table 2, their abundance is not negligible and shows to some extent the risk of misinterpretation of the text content of the tweets. In Figure 2, AAP and APP annotations are represented by the two smallest spots in the AV-PV plane (light blue and light red respectively), while ANP annotations are represented by the smallest grey spot in the centre of the Cartesian plot.

All the tweets analyzed were written by 2,538 unique users, meaning that on average each user wrote 2.6 original tweets. We grouped the users according to the polarity class defined above (shown in Figure
3a) and in Figure 3b we show how many tweets they wrote in each of the AV, N and PV categories. PV and AV users are mainly concentrated around the respective axes in Figure 3b, meaning that these categories of users tend to write content within the same polarity class. Only 4 PV users wrote 1 or 2 AV tweets, while 16 AV users wrote 1-3 PV tweets, but this difference is likely to be associated with the different total numbers of AV and PV users and not necessarily with a different user behaviour. The dispersion of N users is higher, with users who wrote up to 8 AV or PV tweets and who are generally more represented in the ProVax-Neutral plane of Figure 3b.

In Figure 4a we show the average number of retweets received by the annotated tweets, stratified by tweet polarity score, while in Figure 4b we show the average number of retweets received per user, stratified by user polarity score. On the one hand, tweets with extreme polarity values received more attention on average (Figure 4a). On the other hand, the most retweeted users have an absolute value of polarization around intermediate values both for PV and AV: the peak is around +0.5 and -0.7 in Figure 4b.

[...] the week of December 23-30, 2020, but also the week of November 4-11 shows a strong relative volume increase. In relation to this, we note that on November 9th Pfizer and BioNTech announced a vaccine candidate against COVID-19 and on December 27th the vaccination campaign started in Italy; these changes in tweet volume can thus likely be associated with these events, which had a large echo in the media in relation to COVID and vaccines. The volume of original tweets (Fig. 4c) does not show any particular trend reversal between the categories, with Neutral tweets being those written the most during each week, followed by AntiVax and finally ProVax tweets. On the other hand, the volume of retweets (Fig.
4d) shows that during the weeks starting on 18/11/2020 and 09/12/2020, the volume of AV retweets exceeded the volume of the Neutral ones, even though there were fewer AV original tweets than N ones. Furthermore, the fraction of Neutral tweets over time (Supplementary Fig. 1) follows a negative trend after the two main volume increases of 4 November and 23 December. In particular, the first decrease in the volume of retweets of Neutral content was characterized by an increase of AV retweets, while the [...]

Our study analyzed the volumes of Pro- and Anti-Vax tweets in Italy in the period from October 2020 to January 2021. A panel of Public Health Residents and Physics MD students analyzed a total of 7,004 tweets, categorizing them into "AntiVax", "Neutral", "ProVax" and "OffTopic", and recording a higher rate of disagreement for the "ProVax" annotation. The greater disagreement in identifying a tweet as pro-vax is likely due to the inherent difficulty in recognizing a message actively supporting vaccine uptake as opposed to one simply sharing the news. Since the majority of official and news sources implicitly endorse the vaccine, it is sometimes difficult to distinguish forms of active support for the uptake. Our annotation identified users mainly interested in making neutral comments who nevertheless engage more in writing actively ProVax content than AntiVax content. This might be due to the fact that 1) Neutral users are more oriented towards ProVax opinions, or 2) potentially ProVax tweets could have been classified as Neutral to a greater extent than AntiVax tweets, which are more easily distinguishable. Our results are similar to those found by a sentiment analysis of tweets conducted between 2011 and 2019 (27), confirming the consistency of our sampling. In case of disagreement on a tweet with 2/3 annotations as PV or AV, the remaining annotation is more often N than at the opposite pole.
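The distinction between weak and strong disagreement discussed here can be made concrete with a short sketch. It assumes three annotators per tweet and uses hypothetical function names (`disagreement_type`, `polarity_class`) and label strings; the counts in the last lines are those reported in the Results (190 ANP tweets, 272 AAP or APP tweets, 6,509 annotated tweets).

```python
from collections import Counter


def disagreement_type(annotations):
    """Classify the agreement among annotators as 'none', 'weak' or 'strong'."""
    labels = set(annotations)
    if len(labels) == 1:
        return "none"
    # Annotations at opposite poles on the same tweet: strong disagreement.
    if "AV" in labels and "PV" in labels:
        return "strong"
    return "weak"


def polarity_class(annotations):
    """Neutral under strong disagreement, otherwise the majority label.

    Assumes three annotators, so a weak disagreement always has a 2-to-1 majority.
    """
    if disagreement_type(annotations) == "strong":
        return "N"
    return Counter(annotations).most_common(1)[0][0]


for triple in (["AV", "AV", "N"], ["AV", "AV", "PV"],
               ["AV", "N", "PV"], ["PV", "N", "N"]):
    print(triple, disagreement_type(triple), polarity_class(triple))

# Share of tweets with strong disagreement, from the counts reported in the text.
strong = 190 + 272
print(round(100 * strong / 6509, 1))  # 7.1
```

Under these assumptions the rule reproduces the behaviour described in Materials and methods: an AntiVax-AntiVax-Neutral tweet stays AntiVax, while an AntiVax-AntiVax-ProVax tweet is treated as Neutral.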
However, 2.9% of the total number of tweets received an annotation within each category (ANP in Table 1), underlining that even among annotators we found some degree of disagreement. Remarkably, we observed a strong disagreement on nearly 500 tweets (7.1%), which received an opposite classification from one of the annotators. This underscores a potential ambiguity associated with the messages conveyed through social media, which may lead to misinterpretation of the author's opinion or intention. From the perspective of devising more sensitive and specific algorithms for sentiment analysis, this figure shows how much progress is still needed, as even highly educated annotators often struggle to interpret the meaning of messages. On the other hand, the risk of misunderstanding is possibly enhanced by the lack of proper context in the task of manual annotation, where annotators were shown just the text of the tweets, without the information about the author, comments or replies that is usually available when scrolling down the Twitter feed.

We also found that tweets at the extremes of opinion polarization are much more actively shared, with a prevalence of attention (number of retweets) for tweets with AV content versus PV content, similarly to other cases of dissemination of misleading information (28-30). Attempts at explaining this known online behaviour have included the observation that messages are deemed more convincing if they align with the reader's opinion (31): this might mean that individuals tend to share posts that are more clearly polarized and thus more recognizably aligned (or not) with their own opinion. This could be amplified even more by the segregation of groups of users with similar opinions who are thus exposed to similar sources of information. Moreover, it has been shown that messages with more emotionally charged content, or evoking powerful imagery, are more likely to be shared online (32).
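The comparison of attention across polarity values referred to above amounts to averaging retweet counts within groups of tweets or users. A minimal sketch follows; the records, user identifiers and function names are hypothetical, and the user-level score is computed as the average of the scores of the tweets a user wrote, as described in Materials and methods.

```python
from collections import defaultdict
from fractions import Fraction
from statistics import mean

# Hypothetical records: (user id, tweet polarity score, retweets received).
tweets = [
    ("u1", Fraction(-1), 40),
    ("u1", Fraction(-1, 3), 4),
    ("u2", Fraction(1), 30),
    ("u2", Fraction(0), 2),
    ("u3", Fraction(0), 6),
]


def mean_retweets_by_score(records):
    """Average number of retweets for each distinct tweet polarity score."""
    groups = defaultdict(list)
    for _user, score, retweets in records:
        groups[score].append(retweets)
    return {score: mean(values) for score, values in sorted(groups.items())}


def user_polarity(records):
    """User polarity score: average score of the tweets written by each user."""
    groups = defaultdict(list)
    for user, score, _retweets in records:
        groups[user].append(score)
    return {user: mean(scores) for user, scores in groups.items()}


print(mean_retweets_by_score(tweets))
print(user_polarity(tweets))
```

The same grouping, applied to retweets summed per user and keyed by the user score, would give the user-level counterpart of the tweet-level averages.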
It could be that such characteristics both make the stance expressed in a tweet easier to detect, resulting in a higher polarization score, and increase its chance of being shared. Our results show that users are more willing to share extremely polarized content written by users who are not extremely polarized. One possible explanation is that an overly extreme user who always shares highly polarized content is seen as less trustworthy than a less biased user who writes moderate posts from time to time. Possibly, these moderately polarized users could assume the function of boundary spanners for a more balanced discussion, reducing the gap between the two extremes.

In a previous study we showed how the analysis of social media can be useful to characterize the perception of vaccines (26). The analysis of tweet volumes over time, stratified by their content, helped to identify two changes in the time series. An increase in the total volume of tweets associated with vaccines occurred shortly after Pfizer's announcement of the new vaccine on November 9, 2020. Another increase in volume was observed around December 27th, 2020, the so-called V-day corresponding to the beginning of the vaccination campaign in Italy. After both events, we observed a decrease in the fraction of Neutral retweets, suggesting that users tend to amplify more polarized content after important news that may affect real life, probably trying to convince people of their own opinion. Interestingly, while the volume of Anti-Vax retweets increased after the November announcement, the ratio turned in favour of the ProVax group after the beginning of the vaccination campaign. This last result might be related to the considerable increase of tweets posted by official channels (e.g. Ministry of Health, ISS) regarding the campaign kick-off, combined with a tendency of individual users to share pictures and reports of their vaccine administration.
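The fraction of Neutral retweets per week discussed above can be obtained by binning retweets into 7-day windows. The sketch below is a simplified illustration: the sample data are hypothetical, each retweet carries the label of the manually annotated original tweet, and the choice of 4 November 2020 as the start of the first weekly bin is an assumption suggested by the week boundaries quoted in the text.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day, origin):
    """Return the first day of the 7-day bin (counted from origin) containing day."""
    offset = (day - origin).days // 7
    return origin + timedelta(days=7 * offset)


def neutral_fraction_by_week(retweets, origin):
    """Fraction of retweets of Neutral tweets over all retweets, per weekly bin."""
    totals = defaultdict(int)
    neutral = defaultdict(int)
    for day, label in retweets:
        week = week_start(day, origin)
        totals[week] += 1
        if label == "N":
            neutral[week] += 1
    return {week: neutral[week] / totals[week] for week in sorted(totals)}


# Hypothetical retweets: (date of the retweet, label of the original tweet).
sample = [
    (date(2020, 11, 4), "N"),
    (date(2020, 11, 5), "AV"),
    (date(2020, 11, 10), "N"),
    (date(2020, 11, 11), "AV"),
    (date(2020, 11, 12), "PV"),
    (date(2020, 11, 13), "N"),
]
origin = date(2020, 11, 4)  # assumed start of the first weekly bin

for week, fraction in neutral_fraction_by_week(sample, origin).items():
    print(week.isoformat(), round(fraction, 2))
```

Replacing the Neutral counter with one counter per class would give the stratified weekly volumes of AV, N and PV retweets in the same pass.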
After 20/01/2021, the retweet volume becomes balanced between the two opposite polarizations. However, in the weeks following the end of the study, public attention globally and in Italy was drawn towards the reports of several cases of thrombotic events after the administration of the Vaxzevria vaccine. This culminated in a report by the Pharmacovigilance Risk Assessment Committee (PRAC) of the European Medicines Agency (EMA) linking this vaccine with some rare coagulation disorders (33). This might have shifted opinions about vaccination uptake in a significant way. From this perspective, it would be of great importance to continue monitoring through validated algorithms that analyse not only the volumes but also the orientation of tweets, to understand how sentiment shifts across different population strata.

The analysis of social media Big Data through the methods of Data Science, in particular by exploiting new tools for extracting meaning from a large volume of information, has the potential to drive real change in Public Health practice, or at least to allow insightful monitoring of topics of concern, such as vaccine hesitancy. For this reason, the creation of algorithms specifically designed for infoveillance is increasingly necessary in this field (34-36). This could help to design targeted campaigns towards the segment of the population that falls into the "undecided" category because of confusing messaging, given its susceptibility to anti-vax rhetoric.

Social media play an extremely relevant role in selecting what becomes a matter for public debate and in steering its direction, with observable real-life consequences. Because the fine machinery behind the failure or success of content in the social media arena is only partially understood, its potential for good remains partially untapped.
In order to favor the spread of scientifically validated and useful information with a positive impact on society, more knowledge is needed on what makes a message relevant among the cacophony of posts we are subjected to on a daily basis. In this study, by characterizing a dataset of almost 7,000 Italian-language tweets related to vaccines, we showed that it is not straightforward to understand the stance of the messages, even for educated readers. Furthermore, we observed that the most retweeted users were only moderately polarized in the anti-vax vs pro-vax debate, even if the most shared tweets were extremely polarized. These results offer some insight into the social mechanisms that govern the amplification of messages in online social networks, particularly with regard to sensitive and debated topics such as vaccine uptake.

Figure S1. Fraction of the volume of Neutral retweets with respect to the total volume of retweets. We have drawn two vertical lines corresponding to the date of the Pfizer and BioNTech announcement of a vaccine candidate against COVID-19 and to the beginning of the vaccination campaign in Italy.

Managing the COVID-19 Infodemic: Promoting Healthy Behaviours and Mitigating the Harm from Misinformation and Disinformation
Ten Health Issues WHO Will Tackle This Year. Available online
Group on Vaccine Hesitancy. Vaccine Hesitancy: Definition, Scope and Determinants
COVID-19 Vaccine Hesitancy Worldwide: A Concise Systematic Review of Vaccine Acceptance Rates
Once We Have It, Will We Use It? A European Survey on Willingness to Be Vaccinated against COVID-19
COVID-19 Vaccine Hesitancy Is Associated with Beliefs on the Origin of the Novel Coronavirus in the UK and Turkey
A Global Survey of Potential Acceptance of a COVID-19 Vaccine
High Rates of COVID-19 Vaccine Hesitancy and Its Association with Conspiracy Beliefs: A Study in Jordan and Kuwait among Other Arab Countries. Vaccines (Basel) 2021
Enhancing COVID-19 Vaccines Acceptance: Results from a Survey on Vaccine Hesitancy in Northern Italy. Vaccines
A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2
Reproductive number of coronavirus: A systematic review and meta-analysis based on global level evidence
Social Media and Vaccine Hesitancy: New Updates for the Era of COVID-19 and Globalized Infectious Diseases. Hum Vaccin Immunother 2020
Sentiment, contents, and retweets: a study of two vaccine-related twitter datasets
What do popular YouTube™ videos say about vaccines? Child Care Health Dev
Polarization of the Vaccination Debate on Facebook
The Influence of Vaccine-Critical Websites on Perceiving Vaccination Risks
HPV Vaccine Information in the Blogosphere: How Positive and Negative Blogs Influence Vaccine-Related Risk Perceptions, Attitudes, and Behavioral Intentions
Social Media Use and Influenza Vaccine Uptake among White and African American Adults
The Impact of Rare but Severe Vaccine Adverse Events on Behaviour-Disease Dynamics: A Network Model
Clusters of science and health related Twitter users become more isolated during the COVID-19 pandemic
Weaponized Health Communication: Twitter Bots and Russian Trolls Amplify the Vaccine Debate
Temporal trends in anti-vaccine discourse on Twitter
The Online Competition between Pro- and Anti-Vaccination Views
The Anti-Vaccination Infodemic on Social Media: A Behavioral Analysis
Are We Ready for the Arrival of the New COVID-19 Vaccinations? Great Promises and Unknown Challenges Still to Come. Vaccines
Trustworthy Health-Related Tweets on Social Media in Saudi Arabia: Tweet Metadata Analysis
Topology Comparison of Twitter Diffusion Networks Effectively Reveals Misleading Information
An Exploratory Study of COVID-19 Misinformation on Twitter
The echo in flu-vaccination echo chambers: selective attention trumps social influence. Vaccine
Effective vaccine communication during the Disneyland measles outbreak. Vaccine
Signal assessment report on embolic and thrombotic events (SMQ) Other viral vaccines)
From Big Data to Precision Medicine
Section Editors for the IMIA Yearbook Section on Public Health and Epidemiology Informatics. Public Health and Epidemiology Informatics: Recent Research Trends Moving toward Public Health Data Science
Infodemiology and Infoveillance: Tracking Online Health Information and Cyberbehavior for Public Health

Viale Berti Pichat 6/2, 40138 Bologna

Acknowledgements: D.R. and F.D. were funded through the EU H2020 project No. 874735 "Versatile emerging infectious disease observatory - forecasting, nowcasting and tracking in a changing world (VEO)".

Each author declares that he or she has no commercial associations (e.g. consultancies, stock ownership, equity interest, patent/licensing arrangement etc.) that might pose a conflict of interest in connection with the submitted article.

This appendix has been provided by the authors to give readers additional information about the list of investigators.