key: cord-0908252-jm7wkkat authors: Ahmed, Md Shoaib; Aurpa, Tanjim Taharat; Anwar, Md Musfique title: Detecting sentiment dynamics and clusters of Twitter users for trending topics in COVID-19 pandemic date: 2021-08-09 journal: PLoS One DOI: 10.1371/journal.pone.0253300 sha: 8c816047c89c447bc535276acd528234e9a56bb8 doc_id: 908252 cord_uid: jm7wkkat

COVID-19 caused a significant public health crisis worldwide and triggered other problems such as economic crisis, job cuts, and mental anxiety. The pandemic has spread across the world and affects many people not only through infection but also through agitation, stress, fret, fear, repugnance, and poignancy. During this time, involvement and interaction on social media have increased sharply, and people share their viewpoints on these health crises. From user-generated content on social media, we can analyze the public's thoughts and sentiments on health status, concerns, panic, and awareness related to COVID-19, which can ultimately assist in developing health intervention strategies and designing effective campaigns based on public perceptions. In this work, we examine users' sentiments at different time intervals for trending topics on Twitter using a COVID-19 tweets dataset. We also identify the sentiment clusters from the sentiment categories. With the help of comprehensive sentiment dynamics, we report experimental results that exhibit the diversity of social media engagement and communication during the pandemic period.

People's involvement in online social networks (OSNs) has increased during the COVID-19 pandemic, as regular activities move online. Numerous uses of OSNs (e.g., expressing opinions, communicating with family members, and holding online meetings) have emerged during this time. Like other OSNs, the popular microblogging service Twitter has also been affected.
It has become a popular medium for leaders to communicate with the general public and make them aware of public health during this crisis [1]. As a result, people spend more time on Twitter, and users are more active than at any other time. Their involvement increased during the lockdown period as they looked for the latest news on COVID-19. At the same time, they share their opinions and feelings with their friends through it. Consequently, the analysis of Twitter data has drawn wide attention from researchers during this pandemic.

Sentiment analysis is the technical study of people's emotions, opinions, and attitudes [2]. It is an effective way to measure people's thoughts on particular topics. Moreover, sentiment analysis can benefit society in several ways. Different types of mental anxiety arise in this pandemic situation, and these mental conditions can be summarized through sentiment analysis. From the results, we can quickly estimate how widespread depression and panic disorder are in a society or community. Different virtual depression optimizers can then be applied for those affected to bring positive outcomes to society. In addition, the success of many applications, such as recommendation systems, depends on the sentiments of social users. Sentiment analysis of active users is a more efficient way to track public opinion. During the coronavirus pandemic, this type of research can make significant contributions to help governments and policymakers. The authors in [3] analyze Indian people's sentiment during the corona lockdown. They used some popular hashtags to measure positivity and negativity in people. People concentrate on many different topics during this whole pandemic period. Some people posted tweets about the COVID-19 tests and deaths.
Others focused on job cuts, online education, or politics. Besides the arrival of new topics, people express many different thoughts regarding those topics in this pandemic situation. In [4], the authors determine top trending topics using hashtags to detect COVID-19 conspiracy theories. Another work [5] detected trending topics and clustered them using the k-means clustering algorithm. Determining the trending sub-topics at different time windows is therefore essential to properly understand the public's changing interests. Our work analyzes active users' positive, negative, and neutral sentiments at a particular time interval for the top-k trending sub-topics on Twitter related to COVID-19. We also track the changes that occur in the top trending topics on Twitter and in users' sentiments. The main contributions of our research are summarised below:
• We propose a model that lists the top-k trending topics on Twitter related to the COVID-19 pandemic at different time intervals.
• We model and evaluate users' sentiments towards different topics of a given query.
• We model the sentiment dynamics of different topics.
• We detect sentiment clusters and track their changes for the top-k trending topics over time.
This work is an extended version of our extended abstract that appeared in [6]. The key additional contributions of this journal version are listed below:
I. We cluster the Twitter users based on their sentiments on different topics related to COVID-19.
II. We model the degree of topical activeness of the users according to the rank of the topics of a given query.
III. We revise the existing algorithm to list the top-r users according to their overall activities related to the top-k trending topics.
IV.
We conduct our experiments on a new dataset that contains COVID-19 related tweets. We collect those tweets with the real-time Twitter lookup API and prepare them according to our requirements.
V. The COVID-19 outbreak results in an overwhelming amount of information on different topics, and users' sentiments also vary quickly. As a result, we consider a non-overlapping time window with shorter time intervals to monitor social users' sentiments.
VI. In most cases, tweets are very informal, extremely noisy, and contain grammatically incorrect phrases. To improve the quality of the data, we apply a set of pre-processing steps such as tokenization, lemmatization, stemming, and sentence segmentation.
VII. We consider a self-regulating topic modeling approach known as T-LDA (Twitter-Latent Dirichlet Allocation) [7] to detect the topic of a tweet.

Rajesh et al. [8] examined tweets related to the coronavirus to identify the relevant and accurate information amid a small amount of misinformation. They applied only LDA (Latent Dirichlet Allocation) and found that, as expected for a highly contagious virus, negative sentiment dominated the tweets; the sentiment analysis depended significantly on certain words. This work shows negative sentiment only for some particular topics, without modeling time intervals and without any dedicated sentiment model or analysis. Jim Samuel et al. [9] studied public sentiment and presented evidence of growth in fear sentiment and negative sentiment. This approach does not examine how sentiment changes over time. Another method [10] presents a country-wise sentiment analysis related to COVID-19. The authors derive sentiments from tweets using only a set of emerging coronavirus keywords instead of examining the top trending topics over time. They also discuss only positive and negative sentiments.
This approach does not consider any extensive topic model (e.g., T-LDA) or neutral sentiment. Yin et al. [11] introduced a framework to study the topic and sentiment dynamics due to COVID-19 from a large collection of Twitter posts. A recent proposal [12] analyzed data from Weibo, a micro-blogging platform similar to Twitter, in the early stage of COVID-19 in China and proposed a topic extraction and classification model. The results showed that the approach is stable and viable for understanding public opinions. Moreover, they reported the percentage of first-level topics of COVID-19. A machine learning-based sentiment analysis [13] introduced a hybrid approach to determine the sentiment of regular tweets with polarity calculations. The polarity score ranges from -1 to 1 based on the words used, and three sentiment analyzers are applied: W-WSD, TextBlob, and SentiWordNet. Those analyzers are then validated with the Waikato Environment for Knowledge Analysis (Weka) to identify the best result. Pandey et al. [14] proposed a metaheuristic method based on K-means and cuckoo search. This method is applied to different Twitter datasets to determine the optimum cluster heads in terms of sentiment. It is also compared with differential evolution, particle swarm optimization, cuckoo search, improved cuckoo search, two n-grams, and gauss-based cuckoo search. Gang [15] proposed a clustering-based approach to sentiment analysis that applies the term frequency-inverse document frequency (TF-IDF) weighting method to document-based content. They reported a competitive advantage over the two existing families of approaches, symbolic methods and supervised learning methods. They used the simple k-means clustering algorithm to find the positive and negative categories of clusters.
For tweet sentiment analysis, which assesses the impressions, feelings, and biases present in the source material, an SVM classifier combined with a cluster ensemble provided better classification accuracy than a stand-alone SVM [16]. The authors used an algorithm called C3E-SL, which is capable of combining classifier and cluster ensembles. This algorithm can improve tweet classification using additional information from the clusters, assuming that similar instances from the same cluster are more likely to share the same class label. Shreya et al. [17] presented a study that forms clusters by polarity and by subjectivity using sentiment ratings. The sentiment scores are assessed using Afinn and TextBlob. They worked with a large dataset, computing the Euclidean distance in less time and applying the K-means clustering algorithm. An extensive study [18] examined the performance of clustering techniques on document sentiment analysis. The authors first reported two observations: K-means-type clustering algorithms perform well on balanced datasets and poorly on unbalanced datasets. To address this problem, they designed a weighting model that works well on both balanced and unbalanced datasets and outperforms the conventional weighting model. Feng et al. [19] studied clustering methods on standard blog posts. A typical approach obtains emotions from web blogs by topics or keywords; they instead proposed a novel approach based on Probabilistic Latent Semantic Analysis (PLSA), with an emotion-oriented clustering technique that finds common emotions through the fine-grained sentiment connections between blogs and blog posts. Farhadloo et al. [20] proposed a score representation with aspect-level sentiment identification. This identification is based on positiveness, neutralness, and negativeness.
This process is designed with a 3-class SVM classifier to determine feature sets according to a 3-dimensional representation (positive, negative, and neutral). To improve clustering results, the authors used a bag of nouns (BON) rather than a bag of words (BOW).

We introduce some relevant concepts before defining the problem statement. Then we give an overview of our proposed framework.

Social Graph: We model the Twitter network as a social graph $G = (U, E, T)$, where $U$ is the set of nodes (users), $E$ is the set of connections or virtual social relationships among the Twitter users (such as the following relationships in Twitter), and $T = \{T_1, T_2, \ldots, T_m\}$ is the set of topics discussed by the social users $U$ [21].

Topic: A topic is a collection of the most representative words for that topic. For example, the politics topic has words such as election, government, democratic, and parliament [22, 23].

Social Stream: A social stream $S$ is a continuous and temporal sequence of the tweets posted by the social users $U$.

Query: An input query $Q = \{T_q\}$ consists of the top-k trending topics $T_q = \{T_i, T_{i+1}, \ldots, T_k\}$ at a particular time interval.

Overlapping Time Window: A window of a predefined length $len$ is moved over the social stream $S$ and specifies the intervals to analyze. Let $\Gamma$ be a sequence of points in time and $I_m$ an interval $[t_{i-len}, t_i]$ of length $len$, where $0 < len \le i$. We partition $\Gamma$ into a set of equal-length intervals denoted as $I = \{I_1, \ldots, I_m\}$. An overlapping window partially overlaps with the prior window. The degree of overlap is controlled by the parameter $\Delta t$ [24].

Topical Involvement Score: For each user $u_i \in U$, we compute the involvement score towards the query $Q$ in a time interval $I_m$ using Eqs 1 and 2, which measure $u_i$'s relative participation compared with the most active users at that time interval $I_m$. In these equations, $\kappa(u_i, Q, I_m)$ indicates the total number of tweets posted by $u_i$ related to the $l$-th topic in $Q$ at $I_m$.
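As a rough illustration of the topical involvement score, the following Python sketch counts each user's tweets on the query topics within one time interval and divides by the count of the most active user in that interval. Eqs 1 and 2 are not reproduced in this text, so the max-normalization, the (user, topic) input format, and the function names `involvement_scores` and `top_r_users` are assumptions made for illustration rather than the authors' exact formulation.

```python
from collections import Counter


def involvement_scores(tweets, query_topics):
    """Score users by relative participation in one time interval.

    tweets: iterable of (user, topic) pairs observed in the interval
            (hypothetical input format).
    query_topics: set of topic labels forming the query Q.
    Assumption: score = user's tweet count on Q divided by the count
    of the most active user, so the most active user scores 1.0.
    """
    counts = Counter(user for user, topic in tweets if topic in query_topics)
    if not counts:
        return {}
    most_active = max(counts.values())
    return {user: n / most_active for user, n in counts.items()}


def top_r_users(scores, r):
    """Return the r users with the highest scores (ties broken by name)."""
    return sorted(scores, key=lambda user: (-scores[user], user))[:r]


interval_tweets = [
    ("alice", "News"), ("alice", "Health"),
    ("bob", "News"), ("carol", "Sports"),
]
scores = involvement_scores(interval_tweets, {"News", "Health"})
ranking = top_r_users(scores, 1)
```

In this toy interval, the user with two on-topic tweets receives the maximum score and the user who only tweeted about an off-query topic is absent from the result.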
Our proposed approach has three stages, as presented in Fig 1. First, pre-processing is performed to remove irrelevant data from the social stream $S$. Second, we apply the topic modeling method to the cleaned data to infer the latent topics, and we apply our proposed algorithm to the processed social streams to find the top-k trending topics and users' involvement scores. Finally, we detect the involved users' sentiment dynamics and the clusters of top-involved users at different time intervals.

In general, tweets are informally written and often contain grammatically wrong sentences with misspellings and non-standard words. Tweets also contain numerous non-standard forms (e.g., comeee for come, goooood for good), informal abbreviations (e.g., tmrw for tomorrow, lemme for let me, wknd for weekend), and phonetic substitutions (e.g., gdn8 for good night, 4eva for forever, 2day for today). To remove these, we follow several steps that standardize the text for the next stages. In the first step, we remove noise entities such as HTML tags, stop words, punctuation, white space, and URLs. The next step is text normalization, such as tokenization, lemmatization, stemming, and sentence segmentation. Finally, word standardization gives us cleaned texts. To improve the quality of our tweet corpus and the performance of the subsequent steps, we normalize the tweets through substitution of lexical variants with their standard forms, as proposed by Han et al. [25].

The use of hashtags (for example #coronavirus, #StayHome) to point out a tweet's topic is common on Twitter. However, not every tweet contains hashtags, and hashtags are not written according to any rule. Thus, tracking hashtags rarely leads to the exact topic. T-LDA (Twitter-Latent Dirichlet Allocation) [7], a topic modeling approach, is a popular way of inferring topics on Twitter.
It is a textual analysis tool that deals with microblogs such as tweets. Tweets are short (the limit was 140 characters when T-LDA was proposed), and within this limitation a single tweet usually refers to a single topic. This restricted characteristic of tweets prevents traditional text mining tools from working well. T-LDA resolves this issue and works effectively with tweets. Twitter-LDA is based on the following assumptions.
• There are $T$ topics on Twitter, and each topic $t$ is generated from a background word distribution $\theta_B$ and a topic word distribution $\theta_t$. A latent variable $y$, governed by a Bernoulli distribution $\pi$, identifies a word $w$ as a background word ($y = 0$) or a topic word ($y = 1$).
• $F_u$ represents user $u$'s topics of interest. It also determines the assignment of topic $t$ for each word in tweets posted by $u$.
• $\alpha_d$, $\beta_d$, $\gamma_d$, and $\lambda_d$ are the parameters of the Dirichlet priors on $F_u$, $\theta_t$, $\pi$, and $\theta_B$, respectively.
• $z$ is the determined topic for a tweet [26].
Table 1 shows the word distribution for the top-k topics (k = 3) in different time intervals from 23 March 2020 to 31 March 2020.

In our proposed model, we set the value of the query $Q$ at each time interval $I_m$ as the top-k trending topics related to COVID-19 at that $I_m$. We define the trending score $\Lambda(T_i, I_m)$ for each topic $T_i$ according to Eq 3, where $N_{T_i}$ indicates the total number of tweets related to topic $T_i$ and $U_{T_i, I_m}$ represents the number of unique Twitter users who posted tweets on $T_i$ at time interval $I_m$. The parameter $\alpha \in [0, 1]$ balances these two factors. Table 2 shows how changing the value of $\alpha$ can affect the top trending topics at a particular time window. The table lists the top-k topics for three different values of $\alpha$ at two different time intervals.

We use Eq 3 ($\alpha = 0.5$) on different time intervals and detect the top-k (k = 3) trending topics on Twitter. We consider seven time intervals from 23 March 2020 to 31 March 2020.
Each of these time intervals is 3 days long, and we shift them by Δt = 1 day. We also measure how much these trending topics are discussed by users at those time intervals. Table 3 shows the percentages of the top-k = 3 trending topics, which indicate their popularity among users. We consider α = 0.5 to determine the trendiness of topics. We depict this as a heatmap using the seven time intervals.

We develop an algorithm that detects the top-k trending topics before determining the top involved users at a particular time.

Algorithm overview. The algorithm, called Query Algorithm, first identifies the top-k topics from the social stream S at each time interval I m through the procedure TOP_K_TOPICS (lines 9-17). It computes the trending score η (Tj , I m ) for each topic T j and adds that score to a priority queue of size k (lines 11-16). Then it returns the top-k topics based on their trending scores. Next, the algorithm finds the set of users U Q I m from U for a given Q at each time interval I m and computes the users' involvement scores σ (ui , Q, I m ) (lines 3-6). Finally, the users are sorted by their involvement scores, and the algorithm returns the top 20 users as output (lines 7-8).

Algorithm 1: Query Algorithm

For sentiment identification, we use VADER (Valence Aware Dictionary and Sentiment Reasoner) [27], a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. VADER is open source under the MIT License and available on GitHub. It is a rule-based sentiment analysis engine that applies grammatical and syntactical rules. In addition, it recognizes the intensity of sentiment in sentence-level text. Our processed social streams pass through this engine, which analyzes their sentiment and assigns a score.
The scoring formulation is given below:
• The compound score is calculated by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to lie between $\Upsilon_{min}$ and $\Upsilon_{max}$, where $\Upsilon_{min} = -1$ is the most extreme negative and $\Upsilon_{max} = 1$ is the most extreme positive. It is suitable as a single uni-dimensional measure of sentiment for a given sentence. We use graded thresholds to classify sentences as positive, neutral, or negative. Typical threshold values are:
  • positive sentiment: compound score ≥ 0.05
  • neutral sentiment: compound score > -0.05 and compound score < 0.05
  • negative sentiment: compound score ≤ -0.05
• The pos, neg, and neu scores are the proportions of text in each category and give a multidimensional measure of sentiment for a given sentence.
Table 4 has three columns. The first column is the social stream (tweet), and the second column is the polarity, where we observe the value of the sentiment between $\Upsilon_{min}$ and $\Upsilon_{max}$ after applying VADER. The third column classifies the social stream as positive, negative, or neutral.

In this section, we evaluate the performance of our algorithm on a real Twitter dataset. We perform all experiments on a Windows 10 PC with an AMD Ryzen 7 3700U with Radeon Vega 10 Gfx (8 CPUs) at 2.3 GHz, 32 GB RAM, and a 512 GB NVMe M.2 SSD. We collect COVID-19 related tweets through the Twitter lookup API's endpoint; the dataset contains 100 million tweets from 10,000 users from 23 March 2020 to 31 March 2020.

We consider two performance evaluation measures: entropy and semantic cohesion. Entropy, measured with Eqs 4 and 5, indicates the randomness of the topics discussed in clusters. Here, $|U(C_j)| / |U|$ is the weighted probability of a user in cluster $C_j$ discussing a trending topic, and $p_{ij}$ is the percentage of active users for that topic in the cluster. $entropy(\{C_j\}_{j=1}^{r})$ measures the weighted entropy considering all topics over all $r$ clusters.
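The entropy measure can be sketched in Python as follows. Eqs 4 and 5 are not reproduced in this text, so the sketch assumes the standard Shannon entropy with a base-2 logarithm over the topic shares $p_{ij}$ within a cluster, combined across clusters with the weights $|U(C_j)| / |U|$. The function names, the dict-based input (topic label mapped to the number of active users), and the choice to take a cluster's size as the sum of its topic counts are all hypothetical. Under this assumption, a cluster over three topics has entropy at most $\log_2 3 \approx 1.585$.

```python
import math


def cluster_entropy(topic_counts):
    """Shannon entropy (base 2) of the topic distribution in one cluster.

    topic_counts: dict mapping a topic label to the number of active
    users for that topic in the cluster (hypothetical input format).
    """
    total = sum(topic_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in topic_counts.values():
        if count > 0:
            p = count / total  # share of the cluster's users on this topic
            entropy -= p * math.log2(p)
    return entropy


def weighted_entropy(clusters):
    """Size-weighted average of the per-cluster entropies.

    Assumption: each cluster's size is the sum of its topic counts,
    i.e. every user is counted once, under a single topic.
    """
    sizes = [sum(c.values()) for c in clusters]
    all_users = sum(sizes)
    if all_users == 0:
        return 0.0
    return sum(size / all_users * cluster_entropy(c)
               for size, c in zip(sizes, clusters))


mixed = {"News": 2, "Health": 2}   # users split evenly over two topics
focused = {"News": 4}              # all users on one topic
overall = weighted_entropy([mixed, focused])
```

A cluster whose users all discuss one topic contributes zero entropy, while an even split over the topics gives the maximum, which is why lower values indicate topically tighter clusters.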
Usually, a good topical cluster should have a low entropy value. Semantic cohesion is measured with Eqs 6 to 8. For this purpose, we find the main topic of activity of each user $u_i$ according to Eq 6. Then, the most recurrent topic in a cluster $C_j$ at time interval $I_m$ is defined by Eq 7. Finally, we compute the semantic cohesion (expertness of a cluster), denoted as $r(C_j, I_m)$, for a particular topic $T_j$ at time interval $I_m$ with Eq 8.

In this section, we present the findings of our experiments. First, we detect the top-k = 3 trending topics and identify the involved users for those topics using our query algorithm. Then we determine their sentiments. Based on these experiments, we make different types of observations regarding users' sentiments. We use the topic sets of Table 3 in each time interval for further experiments. Our query is a set of topics, and we fix k = 3 and α = 0.5 to determine these topic sets. We consider all negative, positive, and neutral tweets for different time windows. We present the results not only for our query Q = {set of top-k = 3 trending topics} but also for the individual topics in the query set. The bar diagrams in Figs 4-6 represent the percentages of positive, negative, and neutral tweets posted by our users in a particular time interval for News, Health, and COVID-19 test. These three topics appear repeatedly as top trending topics. Sea green, coral red, and royal blue bars indicate the positive, negative, and neutral tweets, respectively. After that, we concentrate on the most involved users' sentiments towards COVID-19 related sub-topics. We determine users' involvement scores for our query topics (the top trending topics from Table 3) at different time intervals. We identify the top 20 involved users at each time interval. Our research uses recent tweets related to COVID-19.
To preserve the privacy of users, we replace some letters with '*' in usernames. Table 5 shows the sentiment dynamics for the top 20 involved users at each $I_m$. To measure overall sentiment dynamics, we sum up users' sentiment scores towards the query topics. In that table, we can see that the list of the top 20 involved users changes over time. The reason is that users' interests and their involvement in trending topics vary over time. Another notable fact in this table is the change of users' sentiments over time. Users who remain in the top 20 in the next $I_m$ have different sentiment scores. We analyze some users' sentiment dynamics together with their involvement below:
• PK**17 is highly involved in each $I_m$. He has positive sentiments in $I_1$, $I_2$, and $I_3$ and shifts to negative sentiments from $I_4$ to $I_7$. Other highly involved users like PK**17 have different sentiment dynamics at each $I_m$.
• Other users, like ma**te, remain in the top 20 in some time windows but drop from the involvement list in the next or previous time windows. ma**te is a top involved user in $I_1$, $I_2$, and $I_3$ but vanishes from the list after $I_4$. These users have various sentiment dynamics at a particular $I_m$.
• Some users suddenly appear in the top involved list without having been in the list previously (e.g., jg**00). User jg**00 is not one of the top involved users at $I_1$ and $I_2$, while he scores at the top in the next three $I_m$. This user has non-identical sentiment scores over time.
We analyze the top 20 users and select the ten users whose average involvement over the seven $I_m$ is greater than that of the other users. We track the changes in their sentiment dynamics. Fig 7 shows these 10 users' sentiments. We observe that users have different sentiment scores at different time windows. For some users, the sentiment even changes from positive to negative or neutral.
After another shift, it turns positive again. This heatmap provides a clear visualization of how a particular user's sentiments change over time.

We also determine sentiment clusters based on the users' sentiment scores. We find the clusters $C_{Pos}$, $C_{Neg}$, and $C_{Neu}$, and we identify them in two different ways. In Table 6, we identify clusters for each trending topic in the query set. To identify a user's cluster membership, we sum up the sentiment scores of all tweets posted by that user on a particular topic. If the user obtains a positive sentiment score, the user is a member of cluster $C_{Pos}$. For negative and neutral sentiment scores, the user is a member of $C_{Neg}$ and $C_{Neu}$, respectively. In all time intervals, the $C_{Neu}$ clusters have the highest number of members. Next, we sum up each user's sentiment scores for the top-k = 3 trending topics and use these scores for clustering. We determine the $C_{Pos}$, $C_{Neg}$, and $C_{Neu}$ clusters following the same procedure used for the topic-wise clusters. Table 7 presents the sizes of the overall clusters at different time windows. From the first to the fifth time interval, the sizes of the positive, negative, and neutral clusters change moderately. In the sixth and seventh time intervals, however, the neutral cluster grows more than usual. We sketch a 3D plot in Fig 8 to visualize the changes in the clusters' sizes graphically. The sizes of these clusters change with the shift of time windows. We also notice that the neutral clusters ($C_{Neu}$) always have the largest sizes among all.

As a first evaluation measure, we compute the entropy of the positive, negative, and neutral clusters. The values are shown in Table 8. A good sentiment cluster should have a low entropy value; here, $C_{Neg}$ has the lowest entropy value, 1.401, in the first time interval.
The highest entropy value is 1.584 for $C_{Pos}$ in the second time interval, which indicates a comparatively poor sentiment cluster. The entropy values are diverse: some indicate good sentiment clusters and some indicate poor ones. Semantic cohesion, our second evaluation measure, is reported in Table 9 and reflects the clusters' expertness. The highest semantic cohesion values of $C_{Pos}$ and $C_{Neg}$ are 0.60 (News) and 0.63 (News), respectively, in the first time interval. The lowest value of $C_{Pos}$ is 0.39 (Health) in the third time interval. From this table, we see heterogeneity among the clusters' values for the two topics considered, News and Health.

Tracking the top involved users' sentiments and sentiment clusters over time is the main objective of this work. Therefore, we conduct these experiments focusing on the topics that have the most trendiness on Twitter at a particular time. We identify the top-k trending sub-topics based on the number of unique users and their activities on Twitter about a specific sub-topic related to COVID-19. Table 1 holds information regarding trending sub-topics. Table 2 shows how the value of α balances the two parameters used to detect a sub-topic's trendiness. When we change the value of α, the list of top trending sub-topics changes, either in the topic titles or in the order of the topics in the list. In Table 3, the topic 'Lockdown' appears in the top sub-topics list at $I_1$ and vanishes from the list after that. Other topics may remain on the list for more than one time window, but the percentages of their popularity change over time. This observation becomes clearer in the heatmap in Fig 3. To find the sentiment of the social stream, we use VADER.
Its architecture is depicted in Fig 9. Table 4 shows some examples of social streams with sentiment results. Using the users' involvement detection algorithm, we obtain the top-r involved users' sentiments and analyze them over time. Table 5 holds the top 20 involved users' sentiment scores. Notably, very few users stay in the top involved list at all time intervals for the related trending topics. COVID-19 has a particular impact on users' sentiments, so we focus on the most involved users' sentiments. Over time, users' overall sentiments on the top COVID-19 topics change. In Table 5, we can observe that the change of time window brings changes in the list of the top 20 involved users and in their sentiments. The list in each $I_m$ is a mix of negative, positive, and neutral sentiments. Fig 7 presents ten specific users' overall sentiment scores at various time windows as a heat map, which shows these changes more specifically and visually. We also illustrate the sentiment clusters topic-wise and overall. Table 6 shows the topic-wise sentiment clusters, and Table 7 displays the overall sentiment clusters in each time window. The 3D visualization sketched in Fig 8 helps to follow the behavior of the overall sentiment clusters. Finally, Table 8 reports the entropy of the clusters at each time window, which reflects the randomness of a cluster. Table 9 reports the semantic cohesion at each time window, which reflects the clusters' expertness.

Users' sentiment, studied for diverse purposes, has drawn research attention in social networks. It is of great importance in the COVID-19 pandemic situation. This paper proposed a model to identify users' sentiment dynamics for the top-k trending sub-topics related to COVID-19. It has also detected the top active users based on their involvement score on those trending topics.
This work derives a function to calculate users' involvement scores towards the query topics and determines the top 20 involved users to analyze their sentiments at different periods. We carry out this research with the latest Twitter data and find that both users' involvement and their sentiments vary over time. In the future, besides determining active users, we want to develop a methodology to track the top negative and positive users by analyzing their sentiments.

Data curation: Md Shoaib Ahmed, Tanjim Taharat Aurpa.

[1] World leaders' usage of Twitter in response to the COVID-19 pandemic: a content analysis
[2] A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowledge-based systems
[3] Sentiment analysis of nationwide lockdown due to COVID 19 outbreak: Evidence from India. Asian journal of psychiatry
[4] COVID-19 and the 5G conspiracy theory: social network analysis of Twitter data
[5] Covid-transformer: Detecting trending topics on Twitter using universal sentence encoder
[6] Clustering Active Users in Twitter Based on Top-k Trending Topics
[7] Comparing Twitter and traditional media using topic models. In European conference on information retrieval
[8] Informational flow on Twitter-Corona virus outbreak-topic modelling approach
[9] COVID-19 public sentiment insights and machine learning for tweets classification. Information
[10] COVID-19 on social media: Analyzing misinformation in Twitter conversations
[11] Detecting topic and sentiment dynamics due to Covid-19 pandemic using social media. In International Conference on Advanced Data Mining and Applications
[12] Using social media to mine and analyze public opinion related to COVID-19 in China
[13] Machine learning-based sentiment analysis for Twitter accounts. Mathematical and Computational Applications
[14] Twitter sentiment analysis using hybrid cuckoo search method. Information Processing & Management
[15] A clustering-based approach on sentiment analysis
[16] Combining classification and clustering for tweet sentiment analysis. In 2014 Brazilian Conference on Intelligent Systems
[17] Clustering and sentiment analysis on Twitter data
[18] Exploring performance of clustering methods on document sentiment analysis
[19] Extracting common emotions from blogs based on fine-grained sentiment clustering. Knowledge and information systems
[20] Multi-class sentiment analysis with clustering and score representation
[21] Discovering and Tracking Active Online Social Groups
[22] Online Topical Clusters Detection for Top-k Trending Topics in Twitter
[23] Query Oriented Active Community Search
[24] Discovering and tracking query oriented active online social groups in dynamic information network
[25] Lexical normalization for social media text
[26] Online topic model for Twitter considering dynamics of user interests and topic trends
[27] VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text