FeelsGoodMan: Inferring Semantics of Twitch Neologisms
Pavel Dolin, Luc d'Hauthuille, and Andrea Vattani
2021-08-18

Abstract. Twitch chats pose a unique problem in natural language understanding due to a large presence of neologisms, specifically emotes. There are a total of 8.06 million emotes, over 400k of which were used in the week studied. There is virtually no information on the meaning or sentiment of emotes, and with a constant influx of new emotes and drift in their frequencies, it becomes impossible to maintain an updated manually-labeled dataset. Our paper makes a twofold contribution. First, we establish a new baseline for sentiment analysis on Twitch data, outperforming the previous supervised benchmark by 7.9 percentage points. Second, we introduce a simple but powerful unsupervised framework based on word embeddings and k-NN to enrich existing models with out-of-vocabulary knowledge. This framework allows us to auto-generate a pseudo-dictionary of emotes, and we show that we can nearly match the supervised benchmark above even when injecting such emote knowledge into sentiment classifiers trained on extraneous datasets such as movie reviews or Twitter.

Live streaming platforms such as Amazon Twitch or YouTube Live have become increasingly popular in the last decade and have seen even faster growth in the last couple of years due to the COVID-19 pandemic and the rise of esports. Users on these platforms watch videogame players livestream their gameplay, and comment live on the stream to share their opinions with the streamer and the rest of the audience. Given the instantaneous and idiosyncratic nature of chat room culture, the language used is very different from a formal conversation. It is riddled with grammatical errors, abbreviations, and game-specific lingo, as well as emoji and emoji-like icons.

In particular, Twitch users make heavy use of emotes, which are Twitch-specific icons or animations used to express a particular emotion, feeling, or inside joke. Emotes on Twitch have become a language of their own and have both changed and enriched how people communicate with each other on the platform. They can be interspersed within text to change the meaning of a message, or constitute an entire message on their own. They are rendered when users type a predefined string, e.g. "Kappa" or "LUL", each of which is replaced inline by its image. There are over 8 million emotes; over 400,000 were observed in the week surveyed, constituting one third of all unique tokens on Twitch.

Like memes, emotes are generated by the community, causing a constant change in their frequency and meaning. One emote whose meaning has changed over time is "FeelsGoodMan", based on a cartoon frog from a 2005 comic by the artist Matt Furie. Furie's cartoon frog was adopted by right-wing posters on various online forums like 4chan in the early 2010s. Since then, Furie has campaigned to reclaim the meaning of his character, and the emote has seen an upsurge in mainstream non-hate usage (Glitsos and Hall, 2019) and positive usage on Twitch. Our results on Twitch agree, showing that "FeelsGoodMan" and its counterpart "FeelsBadMan" are mainly being used literally. The continuous introduction of new emotes and their cryptic origins make it infeasible to maintain curated dictionaries documenting their meaning and semantics.
With the exception of the recent work by Kobs et al. (2020), which classified 100 top emotes and labeled 2,000 Twitch chat messages, there is a lack of analytical studies focusing on understanding Twitch data and the enigmatic language of emotes. In this paper, we aim to fill this gap by addressing two core tasks: (A) perform sentiment analysis on Twitch data better than the previous baselines set by Kobs et al. (2020), and introduce a framework that can handle emote drift without a major additional data labeling effort; (B) provide broad insight into emote semantics and their sentiment, addressing the lack of a broad understanding of thousands of emotes.

To address Task (A):
• We conducted a thorough set of experiments comparing traditional machine learning methods for supervised sentiment analysis on Twitch data. To the best of our knowledge, no such foundational analyses have been performed; only a lexicon-based approach and a deep learning approach with noisy labels have been tried by Kobs et al. (2020).
• We show that our best model outperforms the previous benchmark set by Kobs et al. (2020) by 7.36 percentage points on accuracy.
• We break down the performance of our base classifiers and demonstrate that features with emotes constitute more than 50% of feature importance, while comprising only 20% of features.
• We introduce a drift-resilient framework to Learn Out Of Vocabulary Emotions (LOOVE). Requiring little to no additional data labeling, LOOVE is able to incorporate new emote knowledge into existing models without relying on emotes as explicit features.

As for Task (B):
• We create an emote pseudo-dictionary based on word embedding neighborhoods.
• We automatically infer a corresponding sentiment for thousands of emotes from the emote pseudo-dictionary.

The remainder of the paper is organized as follows: Section 3 provides an overview of the related literature. Section 4 presents our experiments to establish a new set of supervised baselines on the Twitch dataset. In Section 5, we introduce our framework LOOVE to augment external classifiers with emote knowledge. Finally, Section 6 discusses the construction and properties of the emote pseudo-dictionary. Additionally, in Appendix A we present an emote case study; in Appendix B we study trends in the Twitch unlabeled dataset that we collected for the study; and in Appendix C we showcase additional applications of the Twitch w2v embeddings.

There are few studies on Twitch and emotes. We relied on all existing relevant Twitch studies, as well as the relevant literature on other neologisms such as emoticons, emoji, and slang. Labeled Twitch emote data is virtually non-existent. Kobs et al. (2020) conducted a study with the Twitch community and semantically labeled 100 frequently occurring emotes in 2018. Although the top 100 emotes account for 35.1% of the tokens and the top 1,000 account for 52.1%, there are over 8 million total emotes, with over 400,000 observed in the week studied, and the number grows every day. The primary contribution of that work was providing the sentiment analysis baseline for Twitch data, as well as a labeled dataset of 2,000 chat messages. For the baseline, they used an average-based lexicon approach, which represents a comment as a sequence of tokens, each assigned a sentiment from a look-up table; the sentiment score for the entire comment is the average of the token scores.
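To make the averaging scheme concrete, here is a minimal sketch in Python. The lexicon values and the neutral band threshold are illustrative assumptions, not the actual look-up table used by Kobs et al. (2020).

```python
# Minimal sketch of an average-based lexicon classifier.
# Lexicon entries and the neutral band are illustrative assumptions.
LEXICON = {"pog": 0.8, "lul": 0.5, "kappa": 0.0, "trash": -0.7}

def lexicon_sentiment(message: str, neutral_band: float = 0.33) -> str:
    scores = [LEXICON[t] for t in message.lower().split() if t in LEXICON]
    if not scores:              # no known tokens: default to neutral
        return "neutral"
    mean = sum(scores) / len(scores)
    if mean > neutral_band:
        return "positive"
    if mean < -neutral_band:
        return "negative"
    return "neutral"

print(lexicon_sentiment("Pog that play was insane"))  # -> "positive"
```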
This approach, along with a variation of it, achieved 61.8% and 62.8% accuracy, respectively. They also employed a Convolutional Neural Network (CNN), weakly trained on labels generated by the average-based lexicon approach, which resulted in 63.8% accuracy.

Another noteworthy study of emotes was done by Barbieri et al. (2017), who addressed emote prediction, i.e. which emote a user is most likely to use, and trolling detection, i.e. detecting "a specific set of emotes which are broadly used by Twitch users in troll messages." The authors only predicted the top 30 most frequently used emotes. The highest F1 score for emote prediction was 0.39; the highest F1 score for trolling detection was 0.81. Other emote- and Twitch-related studies include a classification of viewers by their consumption behavior and an analysis of subscribers' emote usage (Oh et al., 2020); prediction of a user's subscription status in a channel based on the user's comments (Loures et al., 2020); and a master's thesis on language variety on Twitch, which investigates language usage in computer-mediated communication on the streaming platform Twitch.tv (Hope, 2019).

Sentiment analysis involving emojis and emoticons has been addressed with both deep learning (DL) and traditional machine learning (ML) approaches on various datasets. A notable DL approach is an emotion-semantic-enhanced bidirectional long short-term memory network with a multi-head attention mechanism (EBILSTM-MH) (Wang et al., 2020), which achieved 71.70% accuracy on microblog text data involving emojis; the authors also showed that an SVM model achieved 66.81% accuracy on the same dataset. Another traditional ML approach for sentiment analysis, in this case on Twitter data, was carried out by Illendula and Yedulla (2018). It uses emoji embeddings learned from 147 million tweets to train Random Forest (RF) and Support Vector Machine (SVM) classifiers, reaching overall accuracies of 62.1% and 65.2% respectively and outperforming the then-current state of the art.

Studies have also been done to understand the semantics of emoji and emoticons. One notable work is EmojiNet (Wijeratne et al., 2016), the first machine-readable sense inventory for emoji. The researchers created a centralized table of emoji definitions incorporated from multiple resources and, through word-sense disambiguation techniques, assigned senses to emojis. Additionally, Illendula and Yedulla (2018) presented a thorough study of emoji semantics and their use in social media posts. Wilson et al. (2020) used Urban Dictionary as a corpus to create slang word embeddings. When initializing classifiers with UD embeddings, they achieved marginally better scores for sentiment prediction (64.4% accuracy) and sarcasm prediction (80.2% accuracy) than with standard pre-trained embeddings such as word2vec-GoogleNews. Other notable approaches include a framework that combines probabilistic inference with neural contrastive learning to model the speaker's word choice in a slang context (Sun et al., 2021), and a BiLSTM-based model for slang detection and identification at the sentence level with an F1 score of 0.80.

In this section we establish a new set of baselines for Twitch chat sentiment, outperforming the past state-of-the-art method (Kobs et al., 2020) by 7.36 percentage points on accuracy.
We also investigate the driving features behind the method, showing that emotes contribute significantly to the performance of the model, constituting on average over half of the Gini feature importance (using a Random Forest classifier) despite being only a fifth of the classifier's features.

For our training and testing corpus we used the Twitch sentiment dataset by Kobs et al. (2020), which we will refer to as the Emote Controlled (EC) dataset. We focused on Twitch chat sentiment analysis using traditional ML approaches because, to the best of our knowledge, it had not previously been investigated. We trained Naive Bayes (NB), Logistic Regression or Maximum Entropy (ME), Random Forest (RF), and Support Vector Machines with linear kernel (SVM) models, as these have been the most popular traditional ML algorithms for sentiment detection to date (Yadav and Vishwakarma, 2020; Zimbra et al., 2018). The input features were constructed using a well-established sentiment analysis approach based on a bag of features (Pang et al., 2002). We tested both unigrams and unigrams plus bigrams as input features. We additionally tried three text processing methods. We call minimal text processing Processing 1 (P1): punctuation removal, lowercasing, and collapsing characters that occur more than three consecutive times. Processing 2 (P2) is P1 plus stop word removal. Processing 3 (P3) is P2 plus lemmatization of tokens.

To our surprise, the "textbook" ML approaches with minimal processing (P1) outperform the previous Twitch sentiment baseline of 63.8% (Kobs et al., 2020) by 7.36 percentage points on accuracy in the best case, on the same EC dataset. The accuracy results are summarized in Table 1. The first column of Table 1 describes the classifier and the type of input features; the remaining columns correspond to the processing types. The integer following the classifier's name refers to the n-gram order of the input features (1 for unigrams, 2 for unigrams plus bigrams).

We break down the performance of P1.RF.2 to demonstrate the driving features behind the classifier. Table 2 shows the cumulative sum of the Gini RF feature importance. Because we trained a ternary one-versus-rest classifier, results in the table are displayed per class: positive, negative, and neutral. Additionally, because the features consist of unigrams and bigrams, emotes can occur as unigrams or as part of a bigram. We differentiate the two as follows: the column "emotes only" refers to unigram emote features, and the column "emotes+" refers to bigrams containing at least one emote. Across the three classes, "emotes only" contributes on average 0.4493 of the Gini importance and "emotes+" 0.0938, while constituting only 0.104 and 0.1036 of the total features, respectively. Combined, emote features account for 0.5431 of the Gini feature importance, despite being only 20.76% of the features; this is summarized in Table 3.

We further examine the top 100 features of the positive, negative, and neutral classifiers. First we rank the features of each classifier by Gini importance. Then we split the features into two categories: emote features and "other" features. For each feature set we generate a histogram of the feature rank positions. Results are presented in Figure 1. From the figure it is evident that the emote histogram is biased toward rank 0, implying higher importance, while the "other" histogram is biased toward rank 100. The mean/median rank of the emote features is 42.16/38, versus 59.35/61 for the "other" features.
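As an illustration of this setup, the sketch below implements P1 processing and a unigram-plus-bigram bag-of-features Random Forest, in the spirit of the P1.RF.2 configuration. The toy corpus is ours; it trains a single multiclass RF rather than the paper's one-versus-rest setup, and exact hyperparameters are assumptions.

```python
import re
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

def p1(text: str) -> str:
    """P1 processing: lowercase, remove punctuation, and collapse characters
    repeated more than three consecutive times (e.g. 'loooool' -> 'loool')."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"(\w)\1{3,}", r"\1\1\1", text)

# Toy corpus; in the paper the EC dataset plays this role.
messages = ["PogChamp what a play", "that was trash LUL", "hi chat", "loooool nice"]
labels = ["positive", "negative", "neutral", "positive"]

# Unigrams plus bigrams, as in the P1.*.2 configurations.
vec = CountVectorizer(preprocessor=p1, ngram_range=(1, 2))
X = vec.fit_transform(messages)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Gini importances, used to compare emote vs. non-emote features (cf. Table 2).
ranked = sorted(zip(vec.get_feature_names_out(), clf.feature_importances_),
                key=lambda pair: -pair[1])
print(ranked[:5])
```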
To conclude, it is important to note that the performance difference between P1.RF.1 and P1.RF.2 is marginal, implying that the introduction of bigrams does not contribute much to the overall performance of the classifier. In fact, the differences between many of the classifier combinations listed in Table 1 are marginal, implying that the choice of classifier with these features is not critical, with the possible exception of NB. The presence of emote features, however, is significant.

Now that we have established solid baselines for the fully supervised case, we consider the task of nearing these benchmarks with a solution that resists drift and requires minimal to no supervision. This is necessary because new emotes are constantly introduced and their usage distribution frequently changes. Our baseline model study from Section 4 demonstrated the critical importance of including emote information in the model. In that case, the information was explicitly encoded in per-emote features. We now want to abstract away from this requirement in order to further resist drift and ensure generalization to new emotes.

We introduce a simple but powerful framework that meets the requirements above. In particular, our framework, which we call LOOVE, is able to Learn Out Of Vocabulary Emotions and enrich existing models with this knowledge. The framework is depicted in Figure 2. We start with an existing sentiment classifier: this could be a Twitch sentiment classifier that needs to be enriched with knowledge of newly introduced emotes, or a sentiment classifier trained on a completely separate dataset such as Twitter. The output of this classifier is then concatenated with emote sentiment stats obtained without labeled data. Specifically, for each unseen emote in the text being evaluated, its sentiment is auto-generated by averaging the sentiment of known words in the word embedding neighborhood of the emote. Rather than introducing per-emote features, we perform "pooling" by keeping only a few statistics, such as the mean, max, and min of the sentiment of the emotes in the text.

The LOOVE framework has several desirable properties. First, word embeddings can be trained in an entirely unsupervised manner; periodic retraining or fine-tuning of this space removes the need to maintain a labeled dataset or a manual lexicon as new emotes are introduced. Second, our framework decouples the existing classifier from the new OOV knowledge. In practice this is very important, since companies are wary of completely changing or retraining their production classifiers, which may be used across different applications. Third, while we could have encoded the emote knowledge simply by concatenating the actual emote embedding vector (with some pooling across emotes), the decision to encode just a few stats (possibly even just the average inferred sentiment) is a much more robust choice: it results in just a handful of parameters, which makes tuning of the final classification extremely simple and customizable (done manually or learned from very few examples), and these stats are also resilient to rotations or shifts of the word embedding space upon retraining or fine-tuning.
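A minimal sketch of this pooling step follows, assuming an emote_sentiment dictionary (the pseudo-dictionary built in the next subsection) and an already-encoded first-stage prediction; the feature set mirrors the stats listed later in Table 5.

```python
import numpy as np

def loove_features(tokens: list[str], clf1_label: int,
                   emote_sentiment: dict[str, float]) -> np.ndarray:
    """Pool per-emote sentiments into a few stats and append the
    first-stage classifier's prediction (cf. Table 5)."""
    sents = [emote_sentiment[t] for t in tokens if t in emote_sentiment]
    n_emotes = len(sents)
    if not sents:
        sents = [0.0]  # assumption: neutral placeholder stats when no known emotes
    return np.array([np.mean(sents), np.min(sents), np.max(sents),
                     np.std(sents), n_emotes, clf1_label], dtype=float)

# Example: two emotes known to the pseudo-dictionary, CLF1 predicted class 1.
feats = loove_features(["FeelsGoodMan", "nice", "EZ"], 1,
                       {"FeelsGoodMan": 0.7, "EZ": 0.5})
```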
Finally, we point out that our framework is not limited to sentiment analysis or to emotes, and can be applied to other scenarios suffering from out-of-vocabulary issues.

We trained a w2v model (Mikolov et al., 2013) on the unlabeled Twitch dataset of over 313 million chat messages described in Appendix B. In addition to the w2v model, we compiled a reference sentiment table, using the VADER lexicon (Hutto and Gilbert, 2014) augmented with an emoji/emoticon lexicon (Novak et al., 2015). For each emote, we generated a sentiment value by finding the top 5 neighboring words in the embedded space with an existing sentiment value in the reference sentiment table and taking their mean. We observed 0.353 RMSE when testing against VADER's vocabulary and 0.275 RMSE when testing against the 100 manually labeled emotes provided by Kobs et al. (2020). We want to point out that this method is limited, as not every emote has neighbors in the reference sentiment table. Due to this limitation, we are able to generate sentiment for 22,507 emotes, even though we have embeddings for over 144,000 emotes. Despite this limitation, automatically labeling over 22,000 emotes is a tremendous leap forward, as only 100 emotes had been classified before in the literature.

We used the EC dataset described in Section 4 and three publicly available datasets for ternary sentiment classification: Rotten Tomatoes (RT) (Pang and Lee, 2005), a Twitter dataset (T) of manually labeled tweets (Eisner et al., 2016), and a sample of the Yelp dataset (Y). All datasets were split in a stratified fashion, with 80% designated for training and 20% for testing. RT has 8,528 examples with a 42%/20%/38% positive/neutral/negative split; Twitter has 64,596 examples with a 29%/46%/24% class split. For Yelp we used 150,000 examples with balanced classes.

For the second stage of the model, we incorporated abstracted emote information in the form of sentiment stats and combined it with the prediction of the classifier trained on external data. The resulting feature vector is shown in column 1 of Table 5. Using these features we trained a secondary classifier on the EC dataset. Despite using EC for training, we effectively use this data only for statistical information about emotes.

Accuracy results for LOOVE variants are presented in Table 4. As depicted in Figure 2, LOOVE is composed of two classifiers: CLF1 and CLF2. CLF1 is trained on an external dataset; CLF2 is trained on the EC dataset using the emote pseudo-dictionary stats and the output of CLF1. The idea is to maximize the use of external datasets while minimizing the reliance on EC. In our experiments we trained CLF1 on four datasets (EC, RT, T, and Y) using three algorithms (ME, SVM, and RF). Additionally, we tested two edge cases: in the first, we removed CLF1, so the model predicts using the emote pseudo-dictionary alone; in the second, we removed the emote pseudo-dictionary and CLF2, testing the performance of CLF1 alone. In Table 4 each numerical column represents an algorithm choice for CLF1 and each numerical row the dataset CLF1 is trained on; the first numerical column depicts the first edge case and the first numerical row the second. As expected, CLF1 trained on EC gives the best performance irrespective of CLF2. However, the best performance without training CLF1 on EC is obtained when using RF for CLF1 trained on Twitter data (RF.T). This combination achieves 69.31% accuracy, which is on par with the fully supervised state-of-the-art baselines obtained in Section 4.
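For concreteness, here is a sketch of the pseudo-dictionary construction described at the start of this section, assuming a gensim Word2Vec model trained on the Twitch corpus. The neighbor search depth (topn) and the fallback behavior when fewer scored neighbors exist are assumptions, since the paper does not state them.

```python
from gensim.models import Word2Vec  # model assumed trained on Twitch chat

def infer_emote_sentiment(model: Word2Vec, emote: str,
                          ref_sentiment: dict[str, float],
                          k: int = 5, topn: int = 500) -> float | None:
    """Average the sentiment of the k nearest neighbors that appear in the
    reference table (VADER plus the emoji/emoticon lexicon)."""
    neighbors = model.wv.most_similar(emote, topn=topn)  # (token, cosine sim)
    scored = [ref_sentiment[tok] for tok, _ in neighbors if tok in ref_sentiment]
    if not scored:        # not every emote has scored neighbors; this is why
        return None       # only ~22.5k of the 144k+ emotes receive a value
    return sum(scored[:k]) / len(scored[:k])

# pseudo_dictionary = {e: s for e in emotes
#                      if (s := infer_emote_sentiment(model, e, ref_table)) is not None}
```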
We want to note that CLF1 trained only on Twitter, without any emote information, performs worse than a coin flip when applied to Twitch data. However, when enriched by LOOVE, its performance shoots up and nearly matches our best supervised benchmark. In Table 5 we examine RF.T's features for CLF2. We again see that the driving features are emote stats: only 0.0708 of the Gini importance comes from the predicted label of CLF1, while the remaining 0.9292 is driven by the emote stats.

Table 5: CLF2 feature importances for RF.T.
Feature name               | Gini importance
mean emote sentiment       | 0.2853
min emote sentiment        | 0.2568
max emote sentiment        | 0.2546
number of emotes           | 0.0968
source ds predicted label  | 0.0708
std emote sentiment        | 0.0357

Given the vital importance of emotes and the success of LOOVE, we want to examine the w2v embedding space that LOOVE is based on and take a closer look at the structure of the emote pseudo-dictionary. We used t-SNE to visualize 2D embeddings of the top 1,000 emotes, 1,000 words, 1,000 emojis, and 240 emoticons, for a total of 3,240 tokens (Figure 3). Visually, one can see that words, emotes, and emoticons overlap, while the emoji cluster is more isolated. However, it is also visually evident that tokens cluster by their corresponding type.

[Figure 3: Top emotes, emojis, words, and emoticons (t-SNE with perplexity = 50, n_iters = 3000). The orange oval marks sadness, the purple oval annoyance/disappointment, the pink square laughing/trolling, and the blue square excitement.]

In Figure 4, we show the distributions of the 100 closest neighbors for each token type. We can see that emojis indeed tend to cluster around their own type with very few exceptions, just as words do. Emotes and emoticons also tend to cluster around their own type, but have neighbors from other types as well. A partial success of the emote pseudo-dictionary can be attributed to the fact that emotes cluster around words with frequency 0.45. Since the labeled tokens from VADER are predominantly words, these are the neighbors used in k-NN to learn emote sentiment. This is why it is possible to construct the emote pseudo-dictionary without relying on the sentiment of the nearest-neighbor emotes themselves.

The box overlaying Figure 3 zooms into a particular area of the space where we find four distinct clusters representing the emotions of sadness, annoyance/disappointment, laughing/trolling, and excitement ("PogChamp"-like emotes). We examine this trend of clustering by sentiment to see if it holds across the embedded space. Using tokens from the reference sentiment table from Section 5, we looked at the sentiment of the top 1,000 token neighbors and plotted sentiment histograms by label type (Figure 5, top row). Since the sentiment in the look-up table is a float between -1 (negative) and 1 (positive), we define the label types by evenly partitioning this range into three intervals of width 0.66. In addition to tokens in the sentiment look-up table, we also plotted the derived emote sentiment (Figure 5, bottom row). The top row shows that all distributions are bimodal, with only the negative distribution being significantly biased towards negative sentiment. The distributions in the bottom row are more pronounced: the positive distribution now has a clearly defined shift towards 1, and the neutral distribution is much less bimodal. The negative distribution is not as pronounced as before, though it is still heavily biased towards -1. Overall, the newly generated distributions have a more pronounced bias towards the expected sentiment class, with only a slight bias towards negative sentiment for the negative class. This indicates that, on average, local neighbors tend to share the same sentiment, further illustrating the strength of our emote pseudo-dictionary.
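For reference, a Figure 3-style projection can be produced along these lines. This is a sketch assuming the trained w2v model and token/type lists described above; the perplexity and iteration count follow the figure caption.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assumed available: `model` (trained gensim Word2Vec), plus parallel lists
# `tokens` (top emotes/words/emojis/emoticons) and `types` (their categories).
vectors = np.stack([model.wv[t] for t in tokens])
# note: n_iter is named max_iter in newer scikit-learn versions
xy = TSNE(perplexity=50, n_iter=3000, random_state=0).fit_transform(vectors)

for token_type in ("emote", "word", "emoji", "emoticon"):
    mask = np.array([ty == token_type for ty in types])
    plt.scatter(xy[mask, 0], xy[mask, 1], s=4, label=token_type)
plt.legend()
plt.show()
```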
In Appendix A we present a case study involving the "FeelsGoodMan" and "FeelsBadMan" emotes as an example of using the emote pseudo-dictionary for emote interpretation. In Appendix C we propose several applications of the Twitch w2v embeddings in other fields.

There are many problems that still need to be addressed. Despite establishing a new Twitch sentiment baseline that performs sentiment analysis on par with methods on other datasets in terms of accuracy (Zimbra et al., 2018), overall performance can still be improved. A potential improvement to our transfer learning model, as well as to the emote pseudo-dictionary, could come from constructing an actual synonym space rather than using the w2v space directly. A notable approach to creating a synonym-antonym space has been proposed by Samenko et al. (2020); using this kind of vector space to find both synonym and antonym emotes could be more successful.

We created multiple baselines for sentiment analysis that, in the best case, outperformed the previous benchmark by 7.36 percentage points. We established the importance of emotes in sentiment analysis of Twitch data by examining the features of the baseline models. We then introduced our unsupervised framework LOOVE, which abstracts away from the explicit use of emotes as features and uses emote stats along with a classifier trained on non-Twitch data to predict sentiment. This model performs nearly on par with the fully supervised baselines. LOOVE is based on w2v embeddings trained on over 313 million Twitch chat messages, in conjunction with k-NN. A driving feature behind the framework is an emote pseudo-dictionary which can be used to derive sentiment for unknown emotes. Using this pseudo-dictionary, we created a sentiment table for 22,507 emotes: the first case of emote understanding at this scale.

In Figure 3 we showed a zoomed-in t-SNE plot of four distinct clusters representing the emotions of sadness, annoyance/disappointment, laughing/trolling, and excitement ("PogChamp"-like emotes). Here we present a case study of two contrasting emote representatives, "FeelsGoodMan" and "FeelsBadMan". We aggregate the top 10 neighbors of each emote and display them in Figure 6. From the figure it is evident that the top neighbors of each are semantically similar. The top neighbors of "FeelsGoodMan" are "EZY", "EZ", "FeelsAmazingMan", "FeelsOkayMan", "widepeepoHappy", "pepeW", and "Okayge". Each of these can be considered a different emote "flavor" of "FeelsGoodMan", depicting "Pepe the Frog" with a positive connotation; the other neighbors are used with a positive sentiment as well. "FeelsBadMan" tells a similar story, but with the polar opposite sentiment: "Sadge", "PepeHands", "Smoge", "sadge", and "peepoSad" again depict "Pepe", just like "FeelsBadMan", but with a sad, negative connotation, while the remaining neighbors do not feature "Pepe" but represent sadness as well. It is quite remarkable that in both cases the neighbors not only carry the same semantics as the original emote, but several feature "Pepe" as well.
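Assuming the trained gensim model from Section 5, the neighbor lists above and the vector-arithmetic check described next can be reproduced along these lines:

```python
# Assumes `model` is the gensim Word2Vec model from Section 5.
for emote in ("FeelsGoodMan", "FeelsBadMan"):
    print(emote, model.wv.most_similar(emote, topn=10))

# Analogy check described below: FeelsGoodMan + ":(" - ":)" ~ FeelsBadMan
print(model.wv.most_similar(positive=["FeelsGoodMan", ":("],
                            negative=[":)"], topn=3))
```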
To further strengthen the case, we show that it is possible to find "FeelsBadMan" using vector arithmetic starting from "FeelsGoodMan" (and vice versa). As demonstrated by Mikolov et al. (2013) with the Woman + King − Man = Queen example, we observed that by adding the frown emoticon ":(" to "FeelsGoodMan" and subtracting the smile emoticon ":)", we obtain "FeelsBadMan" among the top 3 closest embeddings (Figure 6).

[Figure 6: Neighbors of "FeelsGoodMan" vs. "FeelsBadMan" (all strings besides the emoticons are emotes); adding ":(" to "FeelsGoodMan" and subtracting ":)" yields "FeelsBadMan".]

Our unlabeled dataset consists of 313M chat messages from 521 thousand streams collected over the course of one week in June (06/07/21–06/13/21). There was an average of roughly 45K unique streamers per day. The number of messages per stream varies wildly, depending on the popularity of the streamer and the game they are playing. Up to 30% of streams on any given day receive no messages. Looking at the data for a randomly selected day (06/08/21), the median number of messages per stream is 117 and the mean is 744, with a standard deviation of 13,585. The top 1,200 streams account for 50% of all messages, while representing 1.7% of all streams (71,917).

We fetched 8.06M emotes from three sources: the official Twitch API, FrankerFaceZ (FFZ), and BetterTTV (BTTV). FFZ contributes 253,335 emotes and BTTV 381,389, with the remainder being official Twitch emotes. Of these, 41k emotes appear in more than one group. While the number of official Twitch emotes dwarfs the other two sources, FFZ and BTTV emotes are incredibly popular, representing 44 of the top 100 most used emotes.

Considering that emotes account for 33% of unique tokens in the Twitch unlabeled dataset (as noted earlier), we wanted to understand their usage frequency. By generating a rank-frequency distribution, we show that emotes follow a power law (Figure 7); in fact, the emote rank-frequency distribution is quasi-Zipfian with a power of 0.97. Similarly, we observe that words follow a power law, as expected, whereas emojis and emoticons behave somewhat differently.
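The power-law exponent can be estimated with a least-squares fit in log-log space. Below is a minimal sketch with toy counts; the real rank-frequency data comes from the 313M-message corpus.

```python
import numpy as np
from collections import Counter

# Toy counts; in practice this is the emote frequency table from the corpus.
emote_counts = Counter({"LUL": 9000, "Kappa": 4600, "PogChamp": 3100,
                        "FeelsGoodMan": 2300, "Sadge": 1900})

freqs = np.array(sorted(emote_counts.values(), reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1)

# Zipf's law predicts freq proportional to rank^(-s); fit s from the log-log slope.
slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
print(f"estimated power: {-slope:.2f}")  # the paper reports ~0.97 on the real data
```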
References

Barbieri et al. (2017). Towards the understanding of gaming audiences by modeling Twitch emotes. Association for Computational Linguistics.
Eisner et al. (2016). emoji2vec: Learning emoji representations from their description.
Glitsos and Hall (2019). The Pepe the Frog meme: an examination of social, political, and cultural implications through the tradition of the Darwinian absurd.
Hope (2019). "hello [streamer] PogChamp": The language variety on Twitch. Master's thesis.
Hutto and Gilbert (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text.
Illendula and Yedulla (2018). Learning emoji embeddings using emoji co-occurrence network graph.
Kobs et al. (2020). Emote-Controlled: Obtaining implicit viewer feedback through emote-based sentiment analysis on comments of popular Twitch.tv channels.
Loures et al. (2020). Stinkycheese: Chat-based model for subscription classification.
Mikolov et al. (2013). Efficient estimation of word representations in vector space.
Novak, Smailović, Sluban, and Mozetič (2015). Sentiment of emojis.
Oh et al. (2020). Cross-cultural comparison of interactive streaming services: Evidence from Twitch.
Pang and Lee (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales.
Pang et al. (2002). Thumbs up? Sentiment classification using machine learning techniques.
Samenko et al. (2020). Synonyms and antonyms: Embedded conflict.
Sun et al. (2021). A computational framework for slang generation.
Wang et al. (2020). Emotion-semantic-enhanced bidirectional LSTM with multi-head attention mechanism for microblog sentiment analysis.
Wijeratne et al. (2016). EmojiNet: Building a machine readable sense inventory for emoji.
Wilson et al. (2020). Urban Dictionary embeddings for slang NLP applications.
Yadav and Vishwakarma (2020). Sentiment analysis using deep learning architectures: a review.
Zimbra et al. (2018). The state-of-the-art in Twitter sentiment analysis: A review and benchmark evaluation.

In addition to using the embeddings for sentiment analysis, there are other useful ways to apply them. They can be used to learn slang immediately as it proliferates in the wild, improve brand safety classifiers, quickly extend a knowledge graph or add dimension to existing nodes within it, and build improved classifiers for games, genres, and industries.

Today, slang evolves and proliferates extremely quickly due to the ubiquity of the Internet and peer-to-peer communication platforms. A platform like Twitch can be used to learn synonyms and replacements for common words and slang. The huge volume of messages and the culture of the Twitch user base make it an ideal place to learn about new memes and slang in an unsupervised way, since Twitch users are often among their first adopters.

Brand safety is a related application. Many moderation tools rely on keywords and regular expressions to detect profane, racist, toxic, and sexual content. While these are precise, they are likely to miss clever new misspellings, and will certainly miss entirely new strings being used to "safely" convey the same meaning as their known counterparts. These could be words, emojis, or even emotes. The w2v model provides a way to organically learn which alternate constructions are being used to circumvent existing moderation filters (see the sketch at the end of this appendix).

Another application is expanding a knowledge graph to incorporate related entities, or to add variations and nicknames of known entities. This works best in the gaming space, since that is the community's focus, but also applies to television shows, cryptocurrencies, sports, etc. Given the string "hikaru", the closest embeddings are the names of dozens of other chess players. Given the string "morde" (short for "Mordekaiser", a champion in League of Legends), the model returns other champions from the game and their nicknames. This also worked for other words, such as "vaxx" (short for vaccine) and "grau" (a popular gun in the game Call of Duty: Warzone).
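As a sketch of the brand-safety idea above, embedding neighbors of known-bad terms can be surfaced as candidate additions to a moderation blocklist. The similarity threshold and review workflow are our assumptions; this is a proposal, not a drop-in filter.

```python
def expand_blocklist(model, seed_terms, topn=20, min_sim=0.5):
    """Surface tokens embedded near known-bad terms: misspellings and
    substitute strings used to evade keyword filters tend to appear here.
    Candidates should be reviewed by a human moderator before use."""
    candidates = set()
    for term in seed_terms:
        if term in model.wv:  # skip out-of-vocabulary seeds
            for token, sim in model.wv.most_similar(term, topn=topn):
                if sim >= min_sim:
                    candidates.add(token)
    return candidates - set(seed_terms)
```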