key: cord-0807983-4djnskm5
authors: Sv, Praveen; Ittamalla, Rajesh; Deepak, Gerard
title: Analyzing the attitude of Indian citizens towards COVID-19 vaccine – A text analytics study
date: 2021-02-27
journal: Diabetes Metab Syndr
DOI: 10.1016/j.dsx.2021.02.031
sha: f5f4ace06c22c97bee71f248a9c7eca376c6b14b
doc_id: 807983
cord_uid: 4djnskm5

BACKGROUND AND AIMS: The government of India recently planned to start a mass vaccination program to end the COVID-19 crisis. However, vaccination was not made mandatory, and many aspects of the COVID-19 vaccines raise skepticism in the minds of common people. This study uses machine learning techniques to analyze the major concerns Indian citizens voice about COVID-19 vaccines on social media.

METHODS: For this study, we used social media posts as data. Using Python, we scraped the social media posts of Indian citizens discussing the COVID-19 vaccine. In Study 1, we performed sentiment analysis to determine how the general perception of Indian citizens regarding the COVID-19 vaccine changed over different months of the COVID-19 crisis. In Study 2, we performed topic modeling to understand the major issues that concern the general public regarding the COVID-19 vaccine.

RESULTS: Our results indicate that 47% of social media posts discussing the COVID-19 vaccine were neutral in tone and nearly 17% were negative in tone. Fear for health and allergic reactions to the vaccine are the two most prominent issues that concern Indian citizens regarding the COVID-19 vaccine.

CONCLUSION: With positive sentiment towards the vaccine at just over 35%, the Indian government needs to focus especially on addressing the fear of vaccines before implementing mass vaccination.

In mid-March 2020, the WHO declared the COVID-19 crisis a global pandemic [1]. We are now at the beginning of 2021, yet normalcy has not returned, and many parts of the world are still in lockdown. The most alarming aspect of this pandemic is that, with the threat of a second wave imminent, some experts believe the worst is yet to come. With countries such as the United States of America and Germany recording more COVID-19 cases now than four months earlier, it can be inferred that the ongoing crisis may well extend beyond 2020. The year 2021 brings new hope to Indian citizens, as the daily count of new COVID-19 cases in India has stayed under 20,000 for most of the year. However, vaccinating the general public is one of the safest ways to halt any pandemic outbreak, including the current COVID-19 crisis. Growing human rights concerns, the anti-vaccine movement, and skepticism towards the vaccine and its effects may make vaccination a complicated task. Previous studies have identified several factors that shape an individual's perspective on vaccines and their effects [2, 3]. Previous research has also indicated that people are more prone to reject a vaccine for a newly detected disease [4]. With the Indian government planning to vaccinate citizens from the beginning of 2021, it is important to understand the general public's attitude towards the vaccine.
This study will help government officials and policymakers understand the issues that need to be addressed before initiating mass vaccination.

We used social media posts of Indian citizens to understand their attitudes towards the COVID-19 vaccine and the effects of vaccination during the pandemic. We chose Twitter as the data source because, after the outbreak of COVID-19, more and more individuals worldwide began to use Twitter to share their concerns about the pandemic and to receive updates about COVID-19 [5]. Previous studies suggest that social media is among the best sources for understanding the general public's behavior during unprecedented times [6, 7, 8, 9]. Using the Python library Twint, we scraped tweets that contain the words "Covid vaccine." Twint offers geographical filtering, and we used this option to collect tweets originating from India. We retained only tweets written in English. After eliminating tweets in other languages, we were left with 73,760 unique English tweets for the study. To avoid the bias that an unbalanced sample could introduce, we selected an equal number of tweets for each month in our corpus.

We then cleaned the data in our corpus by removing stop words, numbers, punctuation, and hyperlinks, none of which are needed for our analysis. Stop words are words, generally articles and prepositions, that carry no meaning of their own and therefore add nothing to the analysis.
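The cleaning steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function name `clean_tweet`, the tiny `STOP_WORDS` set, and the example tweet are all assumptions, since the study does not say which stop-word list or tooling was used for this step.

```python
import re
import string
from typing import List

# Illustrative stop-word list (an assumption; the study does not name the list it used).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on",
              "for", "and", "about", "at", "by", "with"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # hyperlinks
NUMBER_RE = re.compile(r"\d+")                  # runs of digits
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> List[str]:
    """Lowercase a tweet, strip hyperlinks, numbers, and punctuation, then drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # remove hyperlinks before touching punctuation
    text = NUMBER_RE.sub(" ", text)    # remove numbers
    text = text.translate(PUNCT_TABLE) # remove ASCII punctuation
    return [token for token in text.split() if token not in STOP_WORDS]


# Hypothetical example tweet, for illustration only.
print(clean_tweet("The Covid vaccine is safe! 2 doses at https://example.org/info"))
# → ['covid', 'vaccine', 'safe', 'doses']
```

Hyperlinks are removed first so that stripping punctuation does not leave URL fragments behind as spurious tokens.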
After removing the stop words from the corpus, we applied stemming and lemmatization to the data. Stemming strips the endings of terms to reduce them to their root form. Lemmatization, on the other hand, groups the different inflected forms of a word under a single base form, thus reducing the dimensionality of the data.

The objective of Study 1 is to understand the sentiments of Indian citizens towards the COVID-19 vaccine, and we performed sentiment analysis for this purpose. Sentiment analysis assigns a sentiment score to each piece of text. It is described as "the automatic method to extract and analyze the subjective judgments on different aspects of an item or entity" [10]. Sentiment analysis is a machine learning process that uses NLP (Natural Language Processing) to identify the emotions the author of a particular text expressed through their words [11]. Sentiment analysis was first performed at the document level [12], then at the sentence level [13], and later at the phrase level [14, 15]. It computationally identifies and classifies the opinion that the author expresses about a subject in a particular section of text. The main objective of sentiment analysis is to determine the polarity of a text, by which the tone expressed by the author can be classified as positive, negative, or neutral. For this study, we used the programming language Python to collect the tweets; specifically, we used the library Twint for data collection. Some of the early studies on sentiment analysis of Twitter data are [16, 17]. To process the textual data, we used the Python library TextBlob.
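As a rough sketch of how a polarity score can be turned into the three tone classes, the snippet below maps TextBlob's polarity, a float in [-1, 1], to a label. The zero threshold, the helper names `label_from_polarity`, `classify_tweet`, and `sentiment_shares`, and the example scores are assumptions for illustration; the study does not state the cut-offs it used.

```python
from collections import Counter
from typing import Dict, Iterable


def label_from_polarity(polarity: float, threshold: float = 0.0) -> str:
    """Map a polarity score in [-1, 1] to 'positive', 'negative', or 'neutral'."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def classify_tweet(text: str) -> str:
    """Label one tweet using TextBlob's polarity score (requires the textblob package)."""
    # Imported lazily so the pure-Python helpers work without TextBlob installed.
    from textblob import TextBlob
    return label_from_polarity(TextBlob(text).sentiment.polarity)


def sentiment_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of labels falling in each tone class."""
    counts = Counter(labels)
    total = sum(counts.values())
    classes = ("positive", "neutral", "negative")
    if total == 0:
        return {name: 0.0 for name in classes}
    return {name: counts[name] / total for name in classes}


# Made-up polarity scores standing in for four tweets.
example_scores = [0.4, 0.0, 0.0, -0.25]
print(sentiment_shares(label_from_polarity(score) for score in example_scores))
```

Computing the shares per month, rather than over the whole corpus, would give the kind of month-by-month view of sentiment that Study 1 describes.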
TextBlob, using NLP (Natural Language Processing) and advanced machine learning principles, analyzes every word in the documents in the corpus and classifies the overall sentiment projected as positive, negative, or neutral [18].

The main objective of the first study is to understand the attitude of Indian citizens towards COVID-19 vaccines. However, sentiment analysis alone does not reveal the major factors or issues that shape this attitude. To understand the major aspects that lead Indian citizens to hold it, we conducted Study 2, in which we performed LDA (Latent Dirichlet Allocation) topic modeling to identify the major concerns Indian citizens voice that shape their attitude towards the COVID-19 vaccine. Latent Dirichlet Allocation topic modeling was first introduced in 2003 by Blei, Ng, and Jordan. Topic modeling is a process that summarizes a vast archive of texts by using a group of algorithms to discover the topics and themes hidden within a corpus [19]. Before the introduction of LDA, Probabilistic Latent Semantic Indexing was used to derive topics. The basic idea behind Probabilistic Latent Semantic Indexing is to model each word in a document as a sample from a mixture model, where the mixture components are multinomial random variables that can be interpreted as topics. However, one of its main disadvantages is that it does not provide a probabilistic model at the document level [20]. LDA follows the "bag of words" assumption and represents a document as a mixture of latent topics, in which a topic is a multinomial distribution over words.
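The generative view behind this description can be made concrete with a small simulation. The sketch below is not the fitting procedure used in the study; it only illustrates, under made-up topics and parameters, how LDA imagines a document being produced: draw a topic mixture from a Dirichlet distribution, then for each word position draw a topic from that mixture and a word from that topic's multinomial distribution. The names `sample_dirichlet`, `generate_document`, and `TOPICS`, and all probabilities, are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one sample from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(
    topics: List[Dict[str, float]],
    alpha: List[float],
    length: int,
    rng: random.Random,
) -> Tuple[List[float], List[str]]:
    """Generate one bag-of-words document under the LDA generative assumptions."""
    theta = sample_dirichlet(alpha, rng)  # this document's mixture over topics
    words = []
    for _ in range(length):
        # Pick a topic for this word position according to the document mixture.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Pick a word from that topic's multinomial distribution over the vocabulary.
        vocab = list(topics[z])
        probs = [topics[z][word] for word in vocab]
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Two made-up topics (word -> probability); purely illustrative values.
TOPICS = [
    {"allergy": 0.5, "reaction": 0.3, "fear": 0.2},
    {"trial": 0.4, "data": 0.4, "pharma": 0.2},
]

rng = random.Random(0)
theta, doc = generate_document(TOPICS, alpha=[0.5, 0.5], length=8, rng=rng)
print("topic mixture:", theta)
print("document:", doc)
```

Fitting LDA runs this story in reverse: given only the observed words, it infers the topic-word distributions and each document's mixture. Word order plays no role in either direction, which is the bag-of-words assumption.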
The core assumption of Latent Dirichlet Allocation (LDA) topic modeling is that all the documents in a corpus share the same set of topics, while each document exhibits a different probabilistic mixture of those topics. It works under the assumption that specific groups of words tend to be associated with specific topics. Using LDA, one can discover latent topics in extensive unstructured data. To facilitate a better understanding of the identified issues, we used the visualization library LDAvis.

Though sentiment analysis gave an insight into the general public's attitude towards the COVID-19 vaccine and its effects, it did not help us understand the major issues that shape that attitude. To understand the general public's concerns regarding the vaccine and its effects, we further performed Latent Dirichlet Allocation topic modeling on the tweets about the vaccine that carried negative sentiment. The results of the topic modeling are given in Table 2.

Although the largest share of opinions on social media regarding vaccines and their effects was neutral, only about 35% of sentiments were positive, and this should be a concern for the government and policymakers. Unless the government can convince most of its population that the results and effects of the vaccine will be positive, the objective of vaccination cannot be achieved. With positive sentiment towards the vaccine at just over 35%, the Indian government needs to focus especially on addressing the fear of vaccines before implementing mass vaccination. Our findings reveal an interesting aspect: despite COVID-19 having affected nearly 11 million people in India, a considerable number of Indian citizens still feel that the whole pandemic is exaggerated. This mentality may lead citizens to reject the vaccine.
Apart from that, the other concerns shared by Indian citizens were skepticism over the nationality of the vaccine, skepticism over the vaccine trials, skepticism over health after taking the vaccine, fear that the vaccine may cause death, allergic reactions to the vaccine, distrust of pharma companies, doubts regarding the data provided by the vaccine companies, the prevalence of numerous vaccines and concerns over choosing the safest one, and the rush to provide a vaccine. Our analysis has shown that though the Indian general public is voicing some genuine

Preferences and Willingness to Pay for Human Papillomavirus Vaccination for Their Daughters: A Discrete Choice Experiment in Hong Kong
Considering Emotion in COVID-19 Vaccine Communication: Addressing Vaccine Hesitancy and Fostering Vaccine Confidence
Twitter's user growth soars amid coronavirus, but uncertainty remains. CNET
Crisis information distribution on Twitter: a content analysis of tweets during Hurricane Sandy
Twitter tsunami early warning network: a social network analysis of Twitter information flows
Twitter earthquake detection: earthquake monitoring in a social world
Evaluating public response to the Boston Marathon bombing and other acts of terrorism through Twitter
A survey of multimodal sentiment analysis
Reflections on sentiment/opinion analysis
A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts
Mining and summarizing customer reviews
Recognizing contextual polarity in phrase level sentiment analysis
Contextual phrase-level polarity analysis using lexical affect scoring and syntactic n-grams
Classifying sentiment in microblogs: is brevity an advantage?
Twitter as a corpus for sentiment analysis and opinion mining
Topic models
Latent Dirichlet allocation
Recovered COVID-19 patients last immunity for 8 months, raise hopes for vaccine: Study. https://www.indiatoday.in/coronavirusoutbreak/story/COVID-19-antibody-immunitylasts-8-months-study-1752290-2020-12-23