key: cord-0231577-b3kjge3z authors: Schild, Leonard; Ling, Chen; Blackburn, Jeremy; Stringhini, Gianluca; Zhang, Yang; Zannettou, Savvas title: "Go eat a bat, Chang!": An Early Look on the Emergence of Sinophobic Behavior on Web Communities in the Face of COVID-19 date: 2020-04-08 journal: nan DOI: nan sha: c3c57a11b5ecb2c364f50bc938dc39cf593b5d02 doc_id: 231577 cord_uid: b3kjge3z

The outbreak of the COVID-19 pandemic has changed our lives in unprecedented ways. In the face of the projected catastrophic consequences, many countries have enacted social distancing measures in an attempt to limit the spread of the virus. Under these conditions, the Web has become an indispensable medium for information acquisition, communication, and entertainment. At the same time, unfortunately, the Web is being exploited for the dissemination of potentially harmful and disturbing content, such as conspiracy theories and hateful speech towards specific ethnic groups, in particular towards Chinese people, since COVID-19 is believed to have originated in China. In this paper, we make a first attempt to study the emergence of Sinophobic behavior on the Web during the outbreak of the COVID-19 pandemic. We collect two large-scale datasets from Twitter and 4chan's Politically Incorrect board (/pol/) over a period of approximately five months and analyze them to investigate whether there is a rise in, or important differences in, the dissemination of Sinophobic content. We find that COVID-19 indeed drives the rise of Sinophobia on the Web and that the dissemination of Sinophobic content is a cross-platform phenomenon: it exists on fringe Web communities like /pol/, and to a lesser extent on mainstream ones like Twitter. Also, using word embeddings over time, we characterize the evolution and emergence of new Sinophobic slurs on both Twitter and /pol/.
Finally, we find interesting differences in the context in which words related to Chinese people are used on the Web before and after the COVID-19 outbreak: on Twitter we observe a shift towards blaming China for the situation, while on /pol/ we find a shift towards using more (and new) Sinophobic slurs.

COVID-19, the disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is the largest pandemic event of the information age. SARS-CoV-2 is thought to have originated in China, with the presumed ground zero centered around a wet market in the city of Wuhan in Hubei province [50]. Within a few months, SARS-CoV-2, allegedly originating from a bat or pangolin, has spread to essentially every country in the world, resulting in over 1M cases of COVID-19 and 50K deaths as of April 2, 2020 [45]. Humanity has taken unprecedented steps to mitigate the spread of SARS-CoV-2, enacting social distancing measures that go against our very nature. While the repercussions of social distancing measures are yet to be fully understood, one thing is certain: the Web has not only proven essential to the approximately normal continuation of daily life, but has also served as a tool to ease the pain of isolation. Unfortunately, just as the spread of COVID-19 was accelerated in part by international travel enabled by modern technology, the connected nature of the Web has enabled the spread of misinformation [44], conspiracy theories [28], and racist rhetoric [30]. Considering society's recent struggles with online racism (often leading to violence), and the politically charged environment coinciding with SARS-CoV-2's emergence, there is every reason to believe that a wave of Sinophobia is not just coming, but already upon us. In this paper, we present an analysis of how online Sinophobia has emerged and evolved as the COVID-19 crisis has unfolded.
To do this, we collect and analyze two large-scale datasets obtained from Twitter and 4chan's Politically Incorrect board (/pol/). Using temporal analysis, word embeddings, and graph analysis, we shed light on the prevalence of Sinophobic behavior in these communities and on how this prevalence changes over time as the COVID-19 pandemic unfolds; more importantly, we investigate whether there are substantial differences in discussions related to Chinese people by comparing behavior before and after the onset of the COVID-19 crisis.

Main findings. Among others, we make the following findings:
1. We find a rise in discussions related to China and Chinese people on Twitter and 4chan's /pol/ after the outbreak of the COVID-19 pandemic. At the same time, we observe a rise in the use of specific Sinophobic slurs, primarily on /pol/ and to a lesser extent on Twitter. Also, by comparing our findings to real-world events, we find that the increase in these discussions and Sinophobic slurs coincides with real-world events related to the outbreak of the COVID-19 pandemic.
2. Using word embeddings, we look into the context of words used in discussions referencing Chinese people, finding that various racial slurs are used in these contexts on both Twitter and /pol/. This indicates that Sinophobic behavior is a cross-platform phenomenon existing in both fringe Web communities like /pol/ and mainstream ones like Twitter.
3. Using word embeddings over time, we discover new emerging slurs and terms related to Sinophobic behavior, as well as to the COVID-19 pandemic. For instance, on /pol/ we observe the emergence of the term "kungflu" after January 2020, while on Twitter we observe the emergence of the term "asshoe," which aims to make fun of the accent of Chinese people speaking English.
4. By comparing our dataset pre- and post-COVID-19 outbreak, we observe shifts in the content posted by users on Twitter and /pol/.
On Twitter, we observe a shift towards blaming China and Chinese people for the outbreak, while on /pol/ we observe a shift towards using more, and new, Sinophobic slurs.

Disclaimer. Note that content posted on the Web communities we study is likely to be considered highly offensive or racist. Throughout the rest of this paper, we do not censor any language; thus, we warn readers that the content presented is likely to be offensive and upsetting.

Due to its enormous impact on everyone's life in early 2020, the COVID-19 pandemic has already attracted the attention of researchers. In particular, a number of papers have studied how users on social media discussed this emergency. Chen et al. [10] released a dataset of 50M tweets related to the pandemic. Cinelli et al. [12], Singh et al. [41], and Kouzy et al. [31] studied misinformation narratives about COVID-19 on Twitter. Lopez et al. [32] analyzed a multi-language Twitter dataset to understand how people in different countries reacted to policies related to COVID-19.

A number of papers studied racist activity on social networks. Keum and Miller [29] argued that racism on the Internet is pervasive and that users are likely to encounter it. Zimmerman et al. [56] focused on how the anonymity afforded by the Internet affects the likelihood that people take part in online aggression. Relia et al. [39] found that racist online activity correlates with hate crimes: users located in areas with a higher occurrence of hate crimes are more likely to engage in racism on social media. Yang and Counts [53] studied how users who experienced racism on Reddit self-narrate their experience. They characterize the different types of racism experienced by users with different demographics, and show that commiseration is the most valued form of social support. Zannettou et al. [55] present a quantitative approach to understand racism targeting Jewish people online.
As part of their analysis, they present a method to quantify the evolution of racist language based on word embeddings, similar to the technique presented in this paper. Hasanuzzaman et al. [22] investigated how demographic traits of Twitter users can act as a predictor of racist activity. By modeling demographic traits as vector embeddings, they found that male and younger users (under 35) are more likely to engage in racism on Twitter. Other work performed quantitative studies to characterize hateful users on social media, analyzing their language and their sentiment [9, 40]. In particular, some of this work focused on discrimination and hate directed against women, for example as part of the Pizzagate conspiracy [8, 11].

Remarks. To the best of our knowledge, ours is the first data-driven study on the evolution of racist rhetoric against Chinese people and people of Asian descent in light of the COVID-19 pandemic.

To study the extent and evolution of Sinophobic behavior on the Web, we collect and analyze two large-scale datasets from Twitter and 4chan's Politically Incorrect board (/pol/).

Twitter. Twitter is a popular mainstream microblogging platform used by millions of users for disseminating information. To obtain data from Twitter, we leverage the Streaming API¹, which provides a 1% random sample of all tweets made available on the platform. We collect tweets posted between November 1, 2019 and March 22, 2020, and keep only those posted in English, ultimately collecting 222,212,841 tweets.

4chan's /pol/. 4chan is an imageboard that allows the anonymous posting of information. The imageboard is divided into several sub-communities called boards: each board has its own topic of interest and moderation policy. In this work, we focus on the Politically Incorrect board (/pol/), simply because it is the main board for the discussion of world events. To collect data, we use the data collection approach from Hine et al.
[23] to collect all posts made on /pol/ between November 1, 2019 and March 22, 2020. Overall, we collect 16,808,191 posts.

Remarks. We elect to focus on these two specific Web communities, as we believe that they are representative examples of mainstream and fringe Web communities, respectively. That is, Twitter is a popular mainstream community used by hundreds of millions of users around the globe, while 4chan's /pol/ is a notorious fringe Web community known for the dissemination of hateful or weaponized information [23].

We start our analysis by studying the temporal dynamics of the words "china" and "chinese" on 4chan's /pol/ and Twitter. We also investigate the prevalence of several racial slurs targeting Chinese and Asian people. Figure 1 shows the daily number of occurrences of "china" and "chinese," and the daily proportion of posts containing these two words, on 4chan's /pol/. We also annotate (with vertical lines) real-world events related to the COVID-19 pandemic (see Table 1 for more details). We first observe a sudden increase for both words around January 23, 2020, the day the Chinese government officially locked down the city of Wuhan, marking the first large-scale effort in China to combat COVID-19.² After the Wuhan lockdown, the popularity of "china" and "chinese" declines until the latter part of February, right around the time that COVID-19 cases started to appear en masse in Europe. On February 24, 2020 (annotation 4 in the figure), 11 municipalities in Lombardy, Italy were put on lockdown in an attempt to slow the explosion of community-spread cases, and we start to see the use of "china" and "chinese" rise slightly again. This rate increases dramatically around March 9, 2020 (annotation 5), when the Italian government extended the lockdown to the entirety of Italy. The second peak comes around March 16, 2020, when Donald Trump referred to COVID-19 as the "Chinese Virus" in a tweet.
Interestingly, after this event, we observe a peak in activity where around 10% of all posts made on /pol/ are related to China (see Figure 1). On Twitter (see Figure 2), we see the same high-level trend: discussion about "china" and "chinese" has a large uptick when Wuhan is locked down, and then declines until COVID-19 hits Europe. There is one important difference, however: on Twitter, the relative amount of discussion during the first peak is much lower than the level of discussion once Europe comes into play. This may be due to the fact that discussion on Twitter is more geographically distributed, or that 4chan's /pol/ is more easily inflamed by conspiracies and racism-related posts. Social distance may be one factor explaining the gap between the two peaks. Social distance refers to how close we perceive others to be [48, 6, 1, 14], and this perceived closeness can be heightened by familiarity in culture, nationality, ethnicity, education, occupation, etc. Geographic proximity, as well as a shared cultural background, may thus lead to more attention on the COVID-19 outbreak in Europe than on the lockdown in Wuhan.

Besides "china" and "chinese," we also analyze the temporal dynamics of Sinophobic racial slurs on 4chan's /pol/ and Twitter. We pick a set of 8 Sinophobic slurs: "chink," "bugland," "chankoro," "chinazi," "gook," "insectoid," "bugmen," and "chingchong."³ Some of them are well-known racial slurs towards Chinese and Asian people [51], such as "chink," "chingchong," and "gook." Others (e.g., "bugland") are based on preliminary results where we used word embeddings to discover other racial slurs (see Section 5 for more details). The results are depicted in Figures 3 and 4 for /pol/ and Twitter, respectively. For /pol/, we observe a general trend similar to the one observed for the mentions of "china" and "chinese" (see Figure 1). That is, we observe two main peaks of activity, around January 23 and March 16, 2020, for most of the slurs.
In particular, it is worrisome that the use of most slurs keeps increasing after Donald Trump referred to COVID-19 as the "Chinese Virus." Looking at some posts made on /pol/ after March 16, 2020, we find several worrisome examples of hateful rhetoric that call for violence: e.g., "I really hope the world gets together to exterminate every last Chink on this planet." and "Seconded, every chink needs to exterminated, the yellow Jew needs to be erased from this world." When looking at the popularity of these terms, we find that "chink" is the most popular Sinophobic slur on /pol/, with an order of magnitude more posts than other slurs like "gook" and "chingchong."

On Twitter (see Figure 4), we observe a rise of Sinophobic slurs during December 2019, especially for "chink." By manually examining the peak, we find that it is caused by a viral tweet that includes the slur "chink."⁴ Apart from this outlier, we observe an increase in the use of Sinophobic slurs after January 23; again, the increase is smaller than the one observed on /pol/ (cf. Figures 3 and 4). By the end of our dataset, we observe a substantial increase in the use of the slur "chingchong." Looking at some example tweets including this term, we find several hateful comments: e.g., "All because Ching Chong had to chow a fucking frog raw for lunch. #coronavirussafety" and "...Fuck you'd just do to me you little shit I'll break your fucking neck fucking ching chong corona viru...". Finally, to better visualize the rise of Sinophobic slurs, we show in Figure 5 the number of posts, made after January 1, 2020, that include any of the eight identified slurs. Overall, we observe a rise in Sinophobic slurs, mainly on /pol/ and to a lesser extent on Twitter. Nevertheless, we observe a substantial increase on Twitter on March 16, 2020.
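The daily series behind Figures 1 to 5 boil down to a per-day tally of posts that mention a set of terms. The sketch below is a minimal illustration, not the authors' pipeline: it assumes posts are available as hypothetical `(date, text)` pairs, matches terms as whole lower-cased tokens (so "chinatown" does not count as "china"), and counts posts rather than individual occurrences.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical tokenizer: lower-case the text and keep runs of letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Return the set of lower-cased word tokens in a post."""
    return set(TOKEN_RE.findall(text.lower()))


def daily_term_stats(posts, terms):
    """Count, per day, the posts that mention any of `terms`.

    posts: iterable of (date, text) pairs (an assumed input format).
    terms: set of lower-case words, matched as whole tokens.
    Returns {day: (matching_posts, total_posts, proportion)}, ordered by day.
    """
    total = Counter()
    matching = Counter()
    for day, text in posts:
        total[day] += 1
        if tokenize(text) & terms:  # any overlap between the post's tokens and the term set
            matching[day] += 1
    return {
        day: (matching[day], total[day], matching[day] / total[day])
        for day in sorted(total)
    }


if __name__ == "__main__":
    posts = [
        (date(2020, 1, 23), "China locks down Wuhan"),
        (date(2020, 1, 23), "Nothing to see here"),
        (date(2020, 1, 24), "Chinese New Year celebrations"),
        (date(2020, 1, 24), "Chinatown is busy today"),
    ]
    for day, stats in daily_term_stats(posts, {"china", "chinese"}).items():
        print(day, stats)
```

Passing the eight slurs listed above as `terms` would give a series of the kind shown in Figure 5; whole-token matching is a deliberate choice here, and it means hashtags such as "#chinavirus" are not counted as "china."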
In a nutshell, these findings indicate that Sinophobic behavior is on the rise and that it is a cross-platform phenomenon.

A common theme in racist ideology is that of an invading virus. History is rife with examples of diseases being attributed to specific races and nationalities, and there is no reason to believe that COVID-19 would buck this trend, especially since the first identified COVID-19 cases did originate in China. However, the world today is much more diverse and connected than it was in the 15th century, when Italians dubbed syphilis the "French disease." Figures 1 and 2 make it quite clear that 4chan and Twitter are heavily discussing China in relation to COVID-19, and that this discussion accelerated rapidly once the Western world became affected. The upswing is potentially related to the scapegoating phenomenon [47]. The first cases originated in China, and the Chinese government was the first to take active and serious measures to combat the spread, prompting a reasonable degree of discussion. When these measures failed to prevent the spread to the Western world, however, China's existing association with COVID-19, and in particular its "failure" to prevent the spread, made it a convenient scapegoat [2] in the face of a looming pandemic.

That said, we do see meaningful differences in the use of slurs on /pol/ and Twitter. /pol/'s use of slurs tracks the use of "china" and "chinese" to a worrying degree, but this is much less pronounced on Twitter. This is not entirely surprising, considering that /pol/ is well known to be a locus of racist ideology; however, it is worth discussing some of the theory around why it tracks so well. The clearly racist reaction fits the notion of defensive denial, which is a common strategy for coping with stress [3, 17, 24, 27, 42].
Essentially, in its early stages COVID-19 was seen as exclusively a Chinese problem; "superior" Western society had nothing to worry about, even though experts were warning of a pandemic even before Wuhan was locked down. This conforms with the scapegoating theory of clinical psychology, in which members of a group project unwanted aspects of themselves onto another person or group, then attack the scapegoat believing that "this is not me" [13, 18, 38]. Political scientists have argued that scapegoating is a major driver of racism in a number of settings [15, 37].

To analyze the content, and more specifically the context in which specific words are used, we train multiple word2vec models [33] for each Web community. In a nutshell, these models map words into a high-dimensional vector space so that words used in a similar way are close to each other. To do this, we leverage the skip-gram model, a shallow neural network that aims to predict the context of a given word. In this work, we train three groups of word2vec models for each of Twitter and /pol/:
1. One word2vec model (W_A) trained on all posts made between October 28, 2019 and March 22, 2020. We denote this period by T. This model allows us to study the use of words over the entire duration of our study.
2. One distinct word2vec model for each week between October 28, 2019 and March 22, 2020, denoted by W_{t=i}, i ∈ T (i is the i-th week in T). These models allow us to study changes in the use of words over time.
3. One word2vec model (W_C) trained on historical data, i.e., all posts shared between July 1, 2016 and November 1, 2019. This model acts as a baseline and allows us to investigate the emergence of new terms during the period of our study.

First, we look into the overall use of words on 4chan's /pol/ using the word2vec model trained on the period between October 28, 2019 and March 22, 2020 (W_A). In this model, words used in similar contexts have similar vectors.
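A minimal sketch of how posts could be routed to the three groups of training corpora listed above, assuming hypothetical `(date, tokens)` pairs. Week i of T is counted in 7-day steps from Monday, October 28, 2019, so T spans weeks 0 to 20; whether November 1, 2019 belongs to the historical window is not stated, so treating it as exclusive is an assumption. Each corpus would then be handed to a skip-gram word2vec trainer (for example, gensim's `Word2Vec` with `sg=1`); the training step itself is not shown.

```python
from datetime import date

PERIOD_START = date(2019, 10, 28)  # Monday that starts week i = 0 of T
PERIOD_END = date(2020, 3, 22)     # last day of T (a Sunday, closing week i = 20)
HISTORY_START = date(2016, 7, 1)   # first day of the historical window for W_C
HISTORY_END = date(2019, 11, 1)    # treated as exclusive here (an assumption)


def week_index(day):
    """Return i such that `day` falls in week i of T, or None if it lies outside T."""
    if not PERIOD_START <= day <= PERIOD_END:
        return None
    return (day - PERIOD_START).days // 7


def partition_corpora(posts):
    """Route (date, tokens) pairs to the corpora for W_A, each W_{t=i}, and W_C."""
    corpora = {"W_A": [], "weekly": {}, "W_C": []}
    for day, tokens in posts:
        i = week_index(day)
        if i is not None:
            corpora["W_A"].append(tokens)
            corpora["weekly"].setdefault(i, []).append(tokens)
        # The stated windows overlap (Oct 28 to 31, 2019), so a post may feed both W_A and W_C.
        if HISTORY_START <= day < HISTORY_END:
            corpora["W_C"].append(tokens)
    return corpora


if __name__ == "__main__":
    posts = [
        (date(2020, 1, 23), ["wuhan", "lockdown"]),
        (date(2018, 5, 1), ["trade", "war"]),
        (date(2019, 10, 30), ["china", "news"]),
    ]
    corpora = partition_corpora(posts)
    print(sorted(corpora["weekly"]))  # weeks that received at least one post
```

Keying the weekly corpora by an integer offset keeps W_{t=0} and the final week easy to address, which is what the first-versus-last comparison of the weekly models relies on.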
The left side of Table 2 reports the top 20 most similar words for the terms "china," "chinese," and "virus." We make several observations. First, there are many derogatory terms for Asian people, and Chinese people in particular, among the top 20 most similar terms. Some examples include "chink" (a derogatory term referring to Asian people), "chinkland" (referring to the land of chinks, i.e., China), and "chiniggers" (an offensive word created by combining "china" and "nigger"). For instance, a /pol/ user posted: "We should have never let these Chiniggers into the country or enforced a mandatory quarantine for anyone coming from contaminated areas. But it's too late now." Another /pol/ user posted: "You chinks deserve it, there's no shithole of a country that could be as disgusting as chinkland." This indicates that /pol/ users use a wide variety of derogatory terms, possibly to disseminate hateful ideology towards Chinese and Asian people. Second, among the words most similar to "virus," we find several terms related to the COVID-19 pandemic [50]. Indeed, the four words most similar to "virus" all refer to COVID-19: "coronovirus," "covid," "coronavirus," and "corona." This indicates that the overall use of words on /pol/ is highly affected by the COVID-19 pandemic, and that this event is likely to cause changes in the language used by users. The corresponding results for Twitter are shown on the right side of Table 2. On Twitter, we observe multiple politics-related terms that are similar to "china" and "chinese," such as "government" and "ccp" (Chinese Communist Party). Furthermore, we again observe some potentially offensive terms like "chinazi," which indicates that Sinophobic content is not limited to fringe Web communities like 4chan, but also exists in mainstream Web communities like Twitter.
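The "top 20 most similar words" in Table 2 are a nearest-neighbor query by cosine similarity over a model's vocabulary. The sketch below spells this out over a plain dictionary of vectors; the two-dimensional toy vectors are invented for illustration and do not come from the trained models. With gensim, the equivalent query on a trained model would be `model.wv.most_similar(word, topn=20)`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, word, k=20):
    """Return the k words closest to `word` as (word, similarity) pairs, best first."""
    target = vectors[word]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word  # a word is trivially most similar to itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Invented toy vectors, for illustration only.
    vectors = {
        "china": [1.0, 0.0],
        "chinese": [0.9, 0.1],
        "virus": [0.0, 1.0],
        "corona": [0.1, 0.9],
    }
    print(most_similar(vectors, "china", k=2))
```

A brute-force scan like this is linear in the vocabulary size per query, which is fine for a handful of seed words such as "china," "chinese," and "virus."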
Also, many terms similar to "virus" are related to COVID-19, such as "corona" and "coronavirus." This indicates that Twitter users' word usage is influenced by the COVID-19 pandemic as well.

To better visualize the use of language related to Chinese people, we create graphs of the words that are similar to the term "chinese," following the methodology of Zannettou et al. [55]. In a nutshell, we create a graph where nodes are words, and an edge between two words exists if their cosine similarity (obtained from the trained word2vec model) is above a pre-defined threshold.⁵ We limit the graph to nodes that are at most two hops away from a specific word of interest (in this case, "chinese"). Then, we perform several steps to visualize the graph. First, the graph is laid out with an algorithm that takes into account the edge weights [26], so that words with large cosine similarities are placed closer together. Second, the size of each node is scaled according to the number of occurrences of the word in our dataset. Third, we run the Louvain community detection method [5] on the graph and give nodes that belong to the same community the same color. The resulting graphs are depicted in Figures 6 and 7 for /pol/ and Twitter, respectively.

By inspecting the communities of words in Figure 6, we observe several interesting themes around the use of words related to "chinese." First, we observe a community that is highly related to the COVID-19 pandemic (blue community on the bottom right). Interestingly, within this community, we also observe terms like "biowepon" (sic) and "bioattack," likely indicating that /pol/ users are spreading false information about the pandemic, for instance claiming that the whole pandemic is a "bioattack" by the Chinese on the Western world.
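The graph construction described above can be sketched as follows: connect every pair of words whose cosine similarity exceeds a threshold, then keep only the nodes within two hops of the seed word using breadth-first search. The threshold used in the paper is given in a footnote not reproduced here, so the `0.9` below is purely illustrative, as are the toy vectors (unit vectors spaced 20 degrees apart). The weighted layout [26], node sizing, and Louvain step [5] are left to a graph library; networkx, for instance, provides `louvain_communities`.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_graph(vectors, threshold):
    """Weighted adjacency dict: an edge joins two words whose similarity is above `threshold`."""
    words = list(vectors)
    graph = {w: {} for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim > threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph


def k_hop_subgraph(graph, seed, hops=2):
    """Restrict `graph` to nodes at most `hops` edges away from `seed` (breadth-first search)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand beyond the hop limit
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    keep = set(dist)
    # Keep only edges whose endpoints both survive the cut.
    return {w: {n: s for n, s in graph[w].items() if n in keep} for w in keep}


if __name__ == "__main__":
    def unit(deg):
        rad = math.radians(deg)
        return [math.cos(rad), math.sin(rad)]

    # Toy vectors: consecutive words are 20 degrees apart (cosine of about 0.94).
    vectors = {"chinese": unit(0), "w1": unit(20), "w2": unit(40), "w3": unit(60), "far": unit(90)}
    sub = k_hop_subgraph(similarity_graph(vectors, threshold=0.9), "chinese")
    print(sorted(sub))
```

The pairwise loop is quadratic in the number of words, so in practice one would restrict it to the seed's neighborhood or use a nearest-neighbor index; the edge weights kept in the adjacency dict are what a weighted layout and Louvain would consume.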
For example, a /pol/ user posted: "Anyone that doesn't realize this is a Chinese bioweapon by now is either a brainlet or a chicom noodle nigger." Second, we observe two tightly-knit communities (red and yellow communities on the left side of the graph) that predominantly include derogatory terms towards Asian, and in particular Chinese, people. Some of the words in these communities are "ricenigger," "chinksect," "chankoro," "chinks," "yellowniggers," and "pindick." Looking at some example posts from /pol/ users, we observe the use of these terms for disseminating hate: e.g., "Chang you useless ricenigger fuck off. Just call the bitch and ask her youll see this is fucking ccp bs. ITS A FUCKING EXPERIMENTAL CHINK BIOWEP" and "I fucking hate chinks. Stop spreading viruses everywhere you pindick cunts." Interestingly, the most distant word in these communities is "batsoup," which lies closer to the community related to COVID-19 [44]. The remaining communities in this graph are seemingly related to China in general (purple community) and to other countries in Asia (green community). Overall, this graph highlights that /pol/ users use a wide variety of derogatory terms to characterize Chinese people.

When looking at the graph obtained for Twitter (see Figure 7), we observe an interesting community of terms (blue) that includes words related to the COVID-19 pandemic. We observe a large number of seemingly anti-China terms like "makechinapay," "blamechina," and "chinaisasshoe." At the same time, there are many terms referring to the virus itself, like "chinawuhanvirus," "chinaflu," and "coronacontrol," as well as a few terms that aim to support Chinese people through this crisis, like "staystrongchina." For example, a Twitter user posted: "How do you say Chi-com asshoe? #ChinesePropaganda #ChinaLiedPeopleDied #ChinaVirus #WuhanCoronavirus."
The other communities in the graph include various terms related to happenings in China and other Asian countries/regions. Taking a deeper look at the profanities that appear among the terms, we find insults addressing Asian people that can be roughly divided into two groups: racist variations of "china" and "chinese" (e.g., "chinkland," "chingchong," and "chinksect"), and culturally oriented racist terms, including those attacking dietary habits (e.g., "ricenigger"), skin tone (e.g., "yellownigger"), or sexual stereotypes (e.g., "pindick"). The frequent appearance of swear words among the terms may indicate an abreaction to the rising fear and stress in the face of the disease [21, 16]. At the same time, the racist and targeted focus of these slurs can be explained by the mechanism of defensive aggression, either focusing on cultural taboos, such as sexuality [16, 25], or perpetuating societal oppression [25].

Discussions on Web communities like 4chan's /pol/ and Twitter are highly dynamic and respond to real-world events as they unfold. Thus, we expect users on these Web communities to discuss various topics related to the COVID-19 pandemic. Moreover, events like the COVID-19 pandemic unfold over time, and this is reflected in the dynamics of discussion on Web communities. In previous sections, we explored both the usage of some key terms related to Sinophobia and a static view of the content. However, these analyses do not help us understand how Sinophobic language evolves over time. More specifically, we lack an understanding of how the context in which words are used changes, and of how new words are created. The former is important because it provides significant insights into the scope and breadth of the problem.
The latter is important because the language of online extremism has been shown to include memes and slang whose meanings completely contradict their "normal" usage, or that do not even exist outside the communities that use them.

We first study the evolution of Sinophobic language on 4chan's /pol/, and then, in Section 6.2, on Twitter. To study the evolution of discussions and use of language, we make use of the weekly word2vec models (W_{t=i}, i ∈ T). To illustrate how these models are helpful, we first compare the results from the model trained on the first week of our dataset (W_{t=0}) with the model trained on the last week of our dataset (W_{t=−1}). Table 3 reports the top 20 most similar words to "china," "chinese," and "virus" for the first and last weekly word2vec models (similar to how Table 2 shows results for the model trained on the entirety of our dataset). Interestingly, we observe major differences between the most similar words obtained from the first and last models (comparing the left side of the table with the right side), as well as between the whole-period model and these two weekly models (cf. Table 2 and Table 3). We make several key observations. First, when looking at the most similar words to the term "china" in the first-week model (left side of Table 3), we observe words referring to other countries, mostly in Asia (e.g., "japan," "singapore," etc.), but also the derogatory term "chinks" among the top 20. This result indicates that 4chan's /pol/ users commonly used racial slurs targeted at Chinese people even before the outbreak of the COVID-19 pandemic. Similar findings can be observed by looking at the most similar words to the term "chinese." We observe racial slurs like "chink"; however, most of the other words relate to people originating from other Asian countries, such as "koreans."
When looking at the most similar words to the term "virus" before the COVID-19 pandemic, we observe general terms related to diseases or other outbreaks, e.g., "ebola." Second, by comparing the most similar words from the first and last models, we observe several interesting differences. Among the words most similar to "china," derogatory terms like "chink" have a higher cosine similarity than in the first model, likely indicating a rise in the use of this term in discussions related to China. Furthermore, we observe terms like "chernobyl," which may indicate that /pol/ users are comparing this outbreak with the Chernobyl disaster. For example, a /pol/ user posted: "I can see China collapsing after all this, just as the Chernobyl incident was the beginning of the end for the USSR...." We also see the term "childkiller," which, upon manual investigation, is due to a particularly active user repeatedly posting that China created COVID-19 as a bioweapon. Specifically, we find the following sentence in multiple /pol/ posts: "CHINA CREATED THE CHINA BIOWEAPON MURDER DEATH CHILD-KILLER VIRUS IN CHINA!" Interestingly, we also find some terms that mock the way Chinese people speak English. For instance, the term "numba" refers to the word "number," and "asshoe" refers to the term "asshole." Some examples from /pol/ posts are: "Dont trust China, China is asshoe" and "TAIWAN NUMBA 1 CHINA NUMBA NONE!"

Table 3: Top 20 most similar words to the words "china," "chinese," and "virus" for the first and last trained word2vec models from 4chan's /pol/.

Third, among the words most similar to the term "chinese," we observe the term "bioterrorism," likely indicating that 4chan's /pol/ users are calling Chinese people bioterrorists, in line with conspiracy theories that COVID-19 was bioengineered. For example, a /pol/ user posted: "THIS IS BIOTERRORISM NUKE CHINA NOW."
Among the words most similar to the term "virus," the top one is "bioengineered," indicating that this conspiracy theory went viral on /pol/ during that specific week and was discussed extensively. For instance, a /pol/ user posted: "The bat soup is just a cover-up. One of (((Leiber)))'s chinks stole the bioengineered virus & tried to patent it in China, violating export-controlled laws & committing espionage. My guess is, he didn't handle the virus correctly, got himself sick, then infected others in the Wuhan wet market." Finally, among the other words similar to "virus," we clearly observe terms related to the COVID-19 pandemic, like "wuflu" (created by combining Wuhan and Flu), "covid," and "corona." For instance, a /pol/ user posted "Die to wuflu already, boomers." These differences are even more evident in the graph visualizations in Figure 8. To create these graphs, we use the same methodology as for Figure 6, applied to the first and last weekly word2vec models, visualizing the two-hop neighborhood of the term "chinese." In the graph obtained from the first model (see Figure 8(a)), we observe mostly innocuous terms related to Chinese people and other Asian people. In the graph obtained from the last model (see Figure 8(b)), however, we observe an entirely different, more hateful picture. Specifically, the two main tightly-knit communities (red and blue) are filled with slurs used against Chinese people like "ricenigger," "fuckface," "zipperhead," "bugpeople," "subhumans," etc. Examples of posts from /pol/ include: "I hope you fucking die in hell, you psychopathic zipperhead. You and your whole disgusting race" and "We should unironically nuke China. Kill some bugpeople and eradicate COVID-19 at the same time."
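Because each weekly model is trained separately, the two embedding spaces are not aligned, so raw vectors from W_{t=0} and W_{t=−1} cannot be compared directly; what can be compared are the neighbor lists and the within-model similarity scores, as Table 3 does. The helper below is a hypothetical sketch of such a comparison; the Jaccard overlap is an illustrative addition rather than a measure used in the paper, and the toy lists are invented.

```python
def compare_neighbors(first, last):
    """Compare two top-k neighbor lists of (word, cosine) pairs from separately trained models.

    Returns the words that entered the list, the words that dropped out,
    the change in within-model similarity for words present in both lists,
    and the Jaccard overlap of the two word sets.
    """
    first_scores = dict(first)
    last_scores = dict(last)
    entered = [w for w, _ in last if w not in first_scores]
    dropped = [w for w, _ in first if w not in last_scores]
    # Similarities are only meaningful within one model, so we compare the scores, not the vectors.
    shifted = {w: last_scores[w] - first_scores[w] for w in last_scores if w in first_scores}
    union = set(first_scores) | set(last_scores)
    jaccard = len(set(first_scores) & set(last_scores)) / len(union) if union else 1.0
    return entered, dropped, shifted, jaccard


if __name__ == "__main__":
    # Invented neighbor lists, for illustration only.
    first_week = [("japan", 0.80), ("korea", 0.70), ("singapore", 0.60)]
    last_week = [("korea", 0.85), ("chernobyl", 0.70), ("japan", 0.60)]
    print(compare_neighbors(first_week, last_week))
```

A rising score for a shared neighbor (like the rise in similarity between "china" and "chink" noted above) shows up in `shifted`, while terms such as "chernobyl" that appear only in the later list show up in `entered`.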
Overall, these findings indicate that we are experiencing an explosion in the use of derogatory terms for Chinese people on fringe Web communities like 4chan's /pol/, in particular after the outbreak of the COVID-19 pandemic. These findings are particularly worrisome: as the pandemic evolves, the dissemination of racist and hateful ideology towards Chinese people is likely to rise further, which might also have real-world consequences, such as physical violence against Chinese people.
Discovering new terms. Next, we study how new terms related to "chinese" emerge on 4chan's /pol/ and how their popularity changes over the course of our dataset. To achieve this, we use the vocabularies of the word2vec models trained on 4chan's /pol/. Specifically, we first extract the vocabulary of the model trained on historical data ($W_C$) and treat it as our base vocabulary. Then, for each weekly trained model ($W_{t=i}$, $i \in T$), we extract its vocabulary and compare it with the base vocabulary: each term that is not yet in the base vocabulary is treated as a new term and added to the base vocabulary. Since we want to find new terms that are related to "chinese," we filter out all new terms whose cosine similarity to "chinese" is below 0.5 in the weekly model in which the term was discovered. Using this methodology, we discover a total of 50 new terms. We then visualize the popularity of the 20 most popular new terms over the course of our dataset in Figure 9. We observe the emergence of several interesting words around the end of January 2020. First, we observe the emergence of terms like "batsoup," likely indicating that /pol/ users are discussing the claim that the COVID-19 outbreak started with Chinese people consuming bats. Second, around the same time, we observe the emergence of "biolab" and "biowarfare."
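The new-term discovery step described above can be sketched as follows, assuming each weekly model is given as a word-to-vector mapping in chronological order. As we read the description, a term filtered out in the week it first appears is not reconsidered later, because it has already joined the base vocabulary; the sketch follows that reading. `discover_new_terms` and its parameter names are illustrative, while the 0.5 threshold and the anchor "chinese" come from the text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def discover_new_terms(base_vocab, weekly_models, anchor="chinese", threshold=0.5):
    """Return {new_term: week_label} for new terms related to `anchor`.

    base_vocab: vocabulary of the historical model (W_C).
    weekly_models: chronologically ordered (week_label, {word: vector}) pairs,
        a hypothetical stand-in for the weekly models W_{t=i}.
    """
    known = set(base_vocab)
    discovered = {}
    for week, vectors in weekly_models:
        anchor_vec = vectors.get(anchor)
        for term, vec in vectors.items():
            if term in known:
                continue
            known.add(term)  # every unseen term joins the base vocabulary...
            if anchor_vec is not None and cosine(vec, anchor_vec) >= threshold:
                discovered[term] = week  # ...but only anchor-related ones are reported
    return discovered
```

Tracking popularity over time, as in Figure 9, would additionally require per-week occurrence counts for each discovered term, which this sketch does not model.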
The use of these words indicates that /pol/ users discuss various conspiracy theories about how the COVID-19 virus was created in a lab or how it can be used as a bioweapon. Interestingly, these terms persist from their emergence until the end of our dataset, indicating that these theories are generally appealing to 4chan's userbase. Other interesting new terms include "kungflu," which is an offensive term towards Chinese people related to the COVID-19 virus, and "heinsberg," which was the center of the outbreak in Germany, indicating that /pol/ users were discussing it, especially between the end of February and the beginning of March 2020. The echo chamber effect [54] is pronounced on 4chan: narratives about COVID-19 consistently blame China, are racist, or spread conspiracy theories, which raises concerns about the risk of information manipulation [7, 49]. Previous studies on social networks have shown that a small number of zealots can distort collective decisions, especially on ambiguous events [52, 43].
Figure 9: Visualization of the emergence of new words related to "chinese" over time on 4chan's /pol/.
Table 4: Top 20 most similar words to the words "china," "chinese," and "virus" for the first and last trained word2vec models on Twitter.
Now, we focus on the evolution of Sinophobic language on Twitter. We follow the same methodology used in Section 6.1. The corresponding results are depicted in Table 4 and Figure 10. From Table 4, we observe that in the first week covered by our Twitter dataset, many terms similar to "china" and "chinese" are related to politics, such as "tradewar." This again differs substantially from the results on /pol/ (see Table 3). Meanwhile, for "virus," the most similar terms are also related to diseases. However, in the model trained on the last week of our Twitter data, we observe that many Sinophobic terms, such as "chinazi," appear to be semantically similar to "china" and "chinese."
As on 4chan's /pol/ (see Table 3), newly created Sinophobic terms, including "chinavirus" and "kungflu," also appear close in context. For example, a Twitter user posted: "I agree. Too specific. It's obviously called the kungflu. It's kicking all of our asses regardless of denomination." Moreover, many terms with contexts similar to "china" and "chinese" in our last-week Twitter model are still about politics. In contrast to the first week, however, these political terms are related to COVID-19, e.g., "ccpvirus," and some of them even convey revenge and punishment towards China, such as "boycottchina." For instance, a Twitter user posted: "#ChineseVirus is chinesevirus. One name. #BoycottChina #ChinaLiesPeopleDie." The graphs obtained from the first and last weekly trained word2vec models (see Figure 10) again show substantial differences between the two models. The graph from the first model includes mainly words related to China and other Asian regions, as well as words used for discussing matters related to China, e.g., "tradewar." In the graph obtained from the last model, on the other hand, we observe several terms related to COVID-19 like "chinavirus," "chinesevirus," "chineseflu," and "chinaisasshoe." This indicates a shift towards the use of racist terms related to Chinese people on Twitter after the COVID-19 outbreak. We also observe some terms that appear to relate to the behavior of Donald Trump. For instance, the term "racistinchief" is likely related to the fact that Donald Trump referred to the COVID-19 virus as the "Chinese Virus," which was discussed on Twitter. For example, a Twitter user posted: "Trumps a real asshole, just in case yall forget #TrumpPandemic #TrumpVirus #RacistInChief."
Discovering new terms. To discover new terms on Twitter, we follow the same methodology as for /pol/, described in Section 6.1.
Overall, we discover a total of 713 new terms between October 28, 2019 and March 22, 2020. Figure 11 visualizes a sample of 40 of the new terms according to their popularity and cosine similarity with the term "chinese." We observe many new terms related to the Hong Kong protests emerging during November 2019, such as "freehongk" and "hongkongpoliceb." Also, after the outbreak of the COVID-19 pandemic, we observe the emergence of a wide variety of terms around the end of January 2020. Some notable examples include "chinavirus," "chinesevirus," "wuhanpneumonia," "wuhancorono," etc. These findings highlight that during important real-world events, such as the COVID-19 pandemic, language evolves and new terms emerge on Web communities like Twitter. At the same time, it is particularly worrisome that we observe the appearance of new terms that can be regarded as Sinophobic, like "chinesevirus," which could lead to hate attacks in the real world and will almost certainly harm international relations. As the last part of our analysis, we assess how the semantic distance between words changes over the course of our datasets. To do this, we leverage the weekly trained word2vec models ($W_{t=i}$, $i \in T$): for each model, we extract the cosine similarity between two terms, and we then plot these similarities over time. This allows us to understand whether two terms are mapped closer together in the vector space over time, i.e., whether they are increasingly used in similar contexts. We show some examples in Figure 12; the terms are selected based on our previous analysis. We observe several interesting changes in the similarities between terms over time. Specifically, for the terms "chinese" and "virus" (see Figure 12(a)), we observe a substantial increase in cosine similarity over time, especially after the week ending on January 19, 2020.
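A minimal sketch of this similarity-over-time measurement follows, assuming the weekly models are again given as chronologically ordered word-to-vector mappings. A term missing from a week's vocabulary yields `None` rather than a similarity, which corresponds to weeks in which a term such as "pangolin" does not appear in a model at all. The helper names `similarity_series`, `first_week_above`, and `net_change` are hypothetical, and the 0.5 default threshold merely echoes the values discussed in the text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_series(weekly_models, term_a, term_b):
    """[(week_label, similarity or None)] for two terms across the weekly models."""
    series = []
    for week, vectors in weekly_models:
        if term_a in vectors and term_b in vectors:
            series.append((week, cosine(vectors[term_a], vectors[term_b])))
        else:
            series.append((week, None))  # at least one term is absent from this week's vocabulary
    return series


def first_week_above(series, threshold=0.5):
    """Label of the first week whose similarity reaches `threshold`, or None."""
    for week, sim in series:
        if sim is not None and sim >= threshold:
            return week
    return None


def net_change(series):
    """Similarity in the last week with both terms present minus that in the first such week."""
    values = [sim for _, sim in series if sim is not None]
    return values[-1] - values[0] if len(values) >= 2 else None
```

Comparing `net_change` across two platforms is one simple way to express statements such as "the increase is larger on Twitter"; it looks only at the endpoints and ignores week-to-week fluctuation.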
The cosine similarity on both Twitter and /pol/ was below 0.5 in the early models, while after January 19, 2020, it is mostly above 0.5, with the last model having a similarity above 0.6. This indicates that the terms "chinese" and "virus" are increasingly used in similar ways on both Twitter and /pol/. Another example is the pair "chinese" and "chink" (see Figure 12(b)). For both Twitter and /pol/, the similarity between these terms increases over the course of our datasets. Interestingly, the increase is larger on Twitter, likely indicating that, with regard to sharing Sinophobic content, Twitter users were more affected by COVID-19, while the smaller increase on /pol/ indicates that /pol/ users were less affected. Finally, we also illustrate the cosine similarity between the term "chinese" and the terms "bat" and "pangolin" in Figure 12(c) and 12(d), respectively. For "bat," the cosine similarity was low in our first models and increased substantially after the week ending on January 26, 2020. This indicates that both Twitter and /pol/ users started discussing the claim that the virus originated from bats around that time and continued doing so until the end of our datasets. For "pangolin," we observe some differences between the two Web communities: /pol/ users were not discussing pangolins at all before January 26, 2020, and after that date they started discussing them, with a high cosine similarity to the term "chinese" (over 0.4). On Twitter, on the other hand, users were discussing pangolins even before the COVID-19 outbreak.
To combat the COVID-19 pandemic, many governments have implemented unprecedented measures like social distancing and even government-enforced, large-scale quarantines.
This has resulted in the Web becoming an even more essential source of information, communication, and socialization. Unfortunately, the Web is also exploited for disseminating disturbing and harmful information, including conspiracy theories and hate speech targeting Chinese people. Part of this can be attributed to scapegoating, a basic psychosocial mechanism to deal with stress. Building upon the well-known in-group favoritism/out-group hostility phenomenon, racist ideology has a long history of scapegoating. A common scapegoating theme has been to equate the targeted people with a disease, either figuratively or literally. When threatened by events outside our control, it is only "natural" to seek external blame. In the case of COVID-19, the entire world is threatened, and there is a "natural" external actor to blame. In this paper, we make a first attempt to understand Sinophobic language on the social Web related to COVID-19. To this end, we collect two large-scale datasets from 4chan's /pol/ and Twitter over a period of approximately five months. Our results show that COVID-19 has indeed come with a rise of Sinophobic content, mainly on fringe Web communities like /pol/ and to a lesser extent on mainstream ones like Twitter. Relying on word embeddings, we also observe the semantic evolution of Sinophobic slurs. Moreover, our study shows that many new Sinophobic slurs are created as the crisis progresses. Our study has several implications for both society and the research community focusing on understanding and mitigating emerging social phenomena on the Web. First, we showed that the dissemination of hateful content, and in particular Sinophobic content, is a cross-platform phenomenon that incubates both on fringe Web communities and on mainstream ones. This prompts the need for a multi-platform point of view when studying such emerging social phenomena on the Web.
Second, we showed that Sinophobic behavior evolves substantially, especially after life-changing events like the COVID-19 pandemic. This highlights the need to develop new techniques and tools to understand these changes in behavior and to design and deploy countermeasures that prevent or mitigate real-world violence stemming from these behaviors. While the COVID-19 crisis provides a unique opportunity to understand the evolution of hateful language, our study should also be taken as a call to action. The Web has enabled much of society to keep going, or at least to maintain social connections with other humans, but it has also allowed, and potentially encouraged, the proliferation of hateful language at a time when we can least afford it.
Social distance and social decisions
The nature of prejudice
Diagnosing discrimination: Stress from perceived racism and the mental and physical health effects. Sociological Inquiry
China coronavirus: Lockdown measures rise across hubei province
Fast unfolding of communities in large networks
A social distance scale
A 61-million-person experiment in social influence and political mobilization
Hate is not binary: Studying abusive behavior of #gamergate on Twitter
Measuring #GamerGate: A tale of hate, sexism, and bullying
Covid-19: The first public coronavirus twitter dataset
A conspiracy of fishes, or, how we learned to stop worrying about #GamerGate and embrace hegemonic masculinity
The covid-19 social media infodemic
Scapegoating: Dynamics and interventions in group counseling
Prejudice, social distance, and familiarity with mental illness
Scapegoat racism and the sacrificial politics of security
The emotional force of swearwords and taboo words in the speech of multilinguals
Positive tertiary appraisals and posttraumatic stress disorder in us male veterans of the war in vietnam: the roles of positive affirmation, positive reformulation, and defensive denial
The dynamics of scapegoating in small groups
Coronavirus: Italy extends emergency measures nationwide
Italy imposes draconian rules to stop spread of coronavirus /italy-draconian-measures-effort-halt-coronavirus-outbreakspread
Why are males inclined to use strong swear words more than females? an evolutionary explanation based on male intergroup aggressiveness
Demographic word embeddings for racism detection on twitter
Kek, Cucks, and God Emperor Trump: A Measurement Study of 4chan's Politically Incorrect Forum and Its Effects on the Web
Viability of coping strategies, denial, and response to stress
Swearing: A social history of foul language, oaths and profanity in English
ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software
Coping with traumatic life events
Coronavirus myths, scams and conspiracy theories that have gone viral
Racism on the internet: Conceptualization and recommendations for research. Psychology of violence
"They just see that you're Asian and you are horrible": How the pandemic is triggering racist attacks
Coronavirus goes viral: Quantifying the covid-19 misinformation epidemic on twitter
Understanding the perception of covid-19 policies by mining a multilanguage twitter dataset
Distributed Representations of Words and Phrases and their Compositionality
Trump tweets about coronavirus using term 'Chinese virus'
Naming the coronavirus disease (covid-19) and the virus that causes it
Statement on the second meeting of the international health regulations (2005) emergency committee regarding the outbreak of novel coronavirus
Any four black men will do: Rape, race, and the ultimate scapegoat
Reflections on racism
Ethnicity and National Origin-based Discrimination in Social Media and Hate Crimes Across 100 US Cities
Characterizing and detecting hateful users on Twitter
A first look at covid-19 information and misinformation sharing on twitter
Discrimination-related stress effects on the development of internalizing symptoms among latino adolescents
Information gerrymandering and undemocratic decisions
Bat soup, dodgy cures and 'diseasology': the spread of coronavirus misinformation
Confirmed cases pass 1 million as it happened
Trump signs off on trade deal with china to avert december tariffs
The scapegoat as an essential group phenomenon
Construal-level theory of psychological distance
The spread of true and false news online
Wikipedia. 2019-20 coronavirus pandemic
Automating power: Social bot interference in global politics
Understanding self-narration of personally experienced racism on reddit
What is gab: A bastion of free speech or an alt-right echo chamber
A Quantitative Approach to Understanding Online Antisemitism
Online aggression: The influences of anonymity and social modeling
Figure 11: Visualization of the emergence of new words related to "chinese" over time on Twitter.