key: cord-0063709-bsmghsie authors: Huang, Hong; Chen, Zhexue; Shi, Xuanhua; Wang, Chenxu; He, Zepeng; Jin, Hai; Zhang, Mingxin; Li, Zongya title: China in the eyes of news media: a case study under COVID-19 epidemic date: 2021-05-26 journal: Front Inform Technol Electron Eng DOI: 10.1631/fitee.2000689 sha: 6b8f38d824e48a7b74132aef39f7d2b804e3de08 doc_id: 63709 cord_uid: bsmghsie As one of the early COVID-19 epidemic outbreak areas, China attracted the global news media’s attention at the beginning of 2020. During the epidemic period, Chinese people united and actively fought against the epidemic. However, in the eyes of the international public, the situation reported about China is not optimistic. To better understand how the international public portrays China, especially during the epidemic, we present a case study with big data technology. We aim to answer three questions: (1) What has the international media focused on during the COVID-19 epidemic period? (2) What is the media’s tone when they report China? (3) What is the media’s attitude when talking about China? In detail, we crawled more than 280 000 pieces of news from 57 mainstream media agencies in 22 countries and made some interesting observations. For example, international media paid more attention to Chinese livelihood during the COVID-19 epidemic period. In March and April, “progress of Chinese vaccines,” “specific drugs and treatments,” and “virus outbreak in U.S.” became the media’s most common topics. In terms of news attitude, Cuba, Malaysia, and Venezuela had a positive attitude toward China, while France, Canada, and the United Kingdom had a negative attitude. Our study can help understand China’s image in the eyes of the international media and provide a sound basis for image analysis. The news is influenced by various social factors. It is difficult to truly reflect the whole picture of an event due to the journalists' and editors' participation at all levels. 
Every country, every media, and even every person would have its own mixed and conflicted (Zhang L and Wu, 2017), and portrayed as a socialist country, a significant power, an authoritarian state, and a militant obstructive force (Wang, 2003; Zhang L and Wu, 2017). Traditional methods lack scalability, and their granularity of analysis is relatively coarse. Chen et al. (2021) investigated China's image during the COVID-19 epidemic period from a sentiment analysis perspective, but analyzed only aspects of sentiment on Twitter data from the general public. A more intuitive question is, what is China's image at multiple levels as portrayed by overseas mainstream media in a specific period? How do they report China? How do other factors influence their reporting over time? To our knowledge, none of these issues has been thoroughly examined. In this study, we look at China during the COVID-19 epidemic period as an example to see how overseas media portray China. As one of the early COVID-19 epidemic outbreak areas, China attracted the global media's attention early in 2020. At the beginning of the epidemic, China and the Chinese people suffered from much criticism. What was China's image during this epidemic period? How do international news media portray China over time? Specifically, we aim to answer the following questions: Q1: What has the international media focused on during the COVID-19 epidemic period? Q2: What is the media's tone when they report China? Q3: What is the media's emotional tone when they are talking about China? To answer these questions, we first designed several crawlers to crawl news articles from overseas news media. We collected more than 280 000 pieces of news from 57 mainstream media agencies in 22 countries. After data cleaning and annotation, we generated a high-quality dataset for news analysis and natural language processing (NLP) related tasks.
Then we explored the multi-level media focus problem from three levels: entity level, coarse-grained topic level, and fine-grained topic level. An entity is a real-world object, such as a person or an organization. At the entity level, we concentrated mainly on the entities on which the media focused in their news reports. At the coarse-grained topic level, all news articles were classified into topic categories like society or politics to determine the types of news reported in different media. In contrast, at the fine-grained topic level, each news article was further examined to determine a more concrete topic, such as "Wuhan is under lockdown" and "the progress of the vaccine." As for the tone of the news against China, we studied the tone of the news in different countries, on different topics, and at different periods. For the third question, we designed two methods for determining news emotions toward China. One is to use sentiment intensity to quantitatively measure the media's influence toward China; the other is to use emotional labels to examine the emotional situation qualitatively. We also made some interesting observations. For example, international media paid more attention to Chinese livelihood during the COVID-19 epidemic period, and most media with a non-neutral stance, such as American and French media, presented a negative tone against China. Our contributions are as follows: 1. As far as we know, we are the first to study the image of China as a country in the eyes of overseas news media with a large-scale, multi-level study, especially during the COVID-19 epidemic period. 2. We built a high-quality dataset for country image study and NLP tasks, including crawling a large amount of news media data and annotating parts of the data with crowdsourcing techniques. 3. We made some interesting discoveries. For example, from the analysis of topic distribution over time, we saw that in February and March the COVID-19 epidemic was at its most serious in China.
In March and April, "progress of Chinese vaccines," "specific drugs and treatments," and "virus outbreak in U.S." became the media's most reported topics. In terms of news emotion toward China, Cuba, Malaysia, and Venezuela had a positive attitude, while France, Canada, and the United Kingdom had a negative attitude. The dataset was crawled from 57 media outlets in 22 countries between December 1, 2019 and June 30, 2020. For case study purposes, we focused mainly on news related to COVID-19. Information, including news titles, authors, and content, was collected. Below we describe the details of data collection, cleaning, translation, and annotation. 1. Data collection We studied the mainstream news media (official news media or media with the largest audience) in 22 countries, including some major powers and countries within the Belt and Road Initiative (BRI). We selected 57 official and influential news media sites from the Chinese Ministry of Foreign Affairs and other authoritative websites (www.fmprc.gov.cn/web/gjhdq_676201/gj_676203/yz_676205/, www.fec.mofcom.gov.cn/article/gbdqzn/index.shtml) as our data sources (see Table A1 in the appendix). We designed crawlers for each news medium and collected news that contained keywords related to COVID-19 (e.g., 2019-nCOV, COVID-19, coronavirus, pneumonia) in the content or in the title between December 1, 2019 and June 30, 2020. 2. Data cleaning For collected data, we deleted duplicated items that were crawled twice or more. In addition, we calculated each news text's similarity score and deleted news that was almost the same. Furthermore, we used regular expressions to fix and replace faulty fields in the dataset, e.g., replacing some of the header fields automatically generated by the website or some label fields in the body. 3. Translation To ensure the authenticity and integrity of the news data, we collected the original news from the site, which means that the data is available in multiple languages.
To make the data easier for the models to process, we used an online translation application programming interface (www.fanyi-api.baidu.com) to translate all the data into English. We did some processing in the translation process to preserve the integrity of the context information and the sentences and paragraphs after the translation of long text. 4. Annotation We used crowdsourcing technology to label some randomly selected data from our dataset for supervised learning of the analysis model. We developed a system for multi-person collaborative labeling (https://203.195.140.107:8088). Then we trained 100 labeling experts to annotate the data. Each news item was randomly assigned to at least five experts, who read it and marked it with emotion tags, tone tags, news object tags, topic tags, and news genre tags. The detailed annotated labels are shown in Table 1. After the first annotation, the data with disputed results were annotated a second time by more experts. We ended up annotating about 4115 pieces of high-quality news data. The processed statistics of our dataset are shown in Table A1 in the appendix. We have made the dataset public and available at http://203.195.140.107/dataset/download. We also did some preliminary studies on the dataset. The distribution of news media sources is shown in Fig. 1, with some of the smaller sources combined. We can see that media in different countries paid extra attention to China during the epidemic, and the United States ranks first in the list. For further research, we separately analyzed the news related to China and COVID-19. The number of news items over time is shown in Fig. 2. There are several peaks in the figure. The first peak appeared on January 23 when Wuhan was locked down, which attracted a large amount of attention worldwide. The second peak was around March 16, when the number of deaths caused by COVID-19 outside China surpassed that of China for the first time.
We define this time point as the second wave of the COVID-19 epidemic period. The third peak was around May 28, when the World Health Organization (WHO) announced the launch of the "COVID-19 Technology In this section, we explore the media focus on China during the COVID-19 epidemic period. Specifically, we conduct our analysis from three levels: entity level, coarse-grained topic level, and fine-grained topic level. At the entity level, we explore the named entities on which these stories are focused. An entity is a real-world object, such as a person or an organization. We extract and analyze the first few entities of most media interest in each category. At the coarse-grained topic level, we analyze the categories to which news items belong thematically, such as social and political ones. In this way, we determine the types of topics related to China to which the media pays more attention. At the fine-grained topic level, we further analyze specific topics of media interest, such as major events or topical trends. Entities in the news corpus represent essential elements, including people, organizations, places, and things. We identify entities from these news corpora using the named entity recognition (NER) method. To better understand these news corpora, we extract entities from the news using an NLP tool named spaCy (Honnibal and Montani, 2017) . We focus on 10 types of entities, which are listed in Table 2 . After obtaining entities using spaCy, we align them with Wikidata (www.en.wikipedia.org/wiki/Wikidata). Specifically, each entity has a unique identifier called the QID in Wikidata. For example, the QIDs of "U.S." and "the United States" are both Q30, which means that "U.S." and "the United States" are the same entity. In this way, we align entities without disambiguation. We further study the extracted entities and find that China, Wuhan, the United States, and WHO appeared with a high frequency. The result is highly associated with the epidemic. 
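The alignment step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the alias table `ALIAS_TO_QID` and the helper names are hypothetical, the mentions are assumed to have already been extracted by an NER tool, and a real system would look up QIDs in Wikidata rather than in a hand-written dictionary. Only the identifier Q30 for the United States is taken from the text; Q148 for China is added for illustration.

```python
from collections import Counter

# Hypothetical alias table mapping lowercased surface forms to Wikidata QIDs.
# Q30 (United States) is the example given in the text; Q148 (China) is
# added here purely for illustration.
ALIAS_TO_QID = {
    "u.s.": "Q30",
    "the united states": "Q30",
    "united states": "Q30",
    "china": "Q148",
}


def align_entities(mentions):
    """Map each mention to its QID; mentions without a known QID are skipped."""
    qids = []
    for mention in mentions:
        qid = ALIAS_TO_QID.get(mention.strip().lower())
        if qid is not None:
            qids.append(qid)
    return qids


def entity_frequencies(mentions):
    """Count how often each aligned entity (QID) is mentioned."""
    return Counter(align_entities(mentions))


# Mentions as an NER step might return them (made-up sample).
mentions = ["U.S.", "the United States", "China", "Wuhan Institute"]
freq = entity_frequencies(mentions)
```

Because different surface forms collapse onto one QID, the frequency of an entity is counted once per real-world object rather than once per spelling.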
We list the top five entities in each category as shown in Fig. 3 . We can see that during the second wave of the COVID-19 epidemic period, mainstream media care more about medical scientists, such as Anthony Fauci (an American physician and immunologist) and Zhong Nanshan (a Chinese medical scientist), and events that were closely influenced by this epidemic, for example, postponing the Tokyo Olympics because of the epidemic. After examining popular news sites like the BBC and CNN, we set seven topic categories for our study: society, politics, economy, technology, sports, humanity, and entertainment. As introduced in Section 2, we have manually annotated some news articles with these categories' labels. We consider these news articles as training datasets. Then we extract features using term frequency and inverse document frequency (TF-IDF) (Jurafsky and Martin, 2000) , build a supervised convolutional neural network (Kim, 2014) for training, and predict the topic labels for the remaining non-annotated news articles. The coarse-grained topic distribution is shown in Fig. 4 . We can see that the media pay special attention to people's livelihood and society issues, followed by politics, economy, and technology topics. The attention to livelihood and society accounts for more than 45%, and entertainment and humanity news together account for less than 1%. During the epidemic, the government's priority is to ensure people's livelihood, accompanied by the promulgation and implementation of a series of political regulations and economic impacts. in the economic field is only 3.24%; on the contrary, Singapore shows great interest in economic issues, accounting for 31.19%. We show the topic distribution over time in Fig. 6 . The horizontal axis represents the month and the vertical axis represents the percentage of all news on a given topic in a given month. We focus on topic distribution in different periods. 
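The quantity plotted on the vertical axis can be computed with a simple count. The sketch below assumes one possible reading of the axis description, namely the share of each topic among all news of a given month; the function name and the sample records are hypothetical and the data is made up.

```python
from collections import Counter, defaultdict


def topic_share_by_month(articles):
    """Return {month: {topic: percentage}} for an iterable of (month, topic) pairs.

    Percentages are normalized within each month (an assumption about how
    the figure is built), so the values for one month sum to 100.
    """
    counts = defaultdict(Counter)
    for month, topic in articles:
        counts[month][topic] += 1

    shares = {}
    for month, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[month] = {
            topic: 100.0 * n / total for topic, n in topic_counts.items()
        }
    return shares


# Made-up sample of (month, predicted coarse-grained topic) records.
sample = [
    ("2020-02", "society"),
    ("2020-02", "society"),
    ("2020-02", "politics"),
    ("2020-03", "sports"),
    ("2020-03", "society"),
    ("2020-03", "sports"),
    ("2020-03", "economy"),
]
shares = topic_share_by_month(sample)
```

If the figure instead normalizes within each topic across months, the same counts can be reused with the roles of month and topic swapped.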
In the social news about China during the epidemic, the number of news articles increases and then declines with time. The proportion reaches a peak in February and March, and then gradually decreases in the next few months. In February and March, the COVID-19 epidemic was at its most serious in China. Much news followed the social topics, focusing on the impact of the epidemic on people's lives and societies. After April, as the epidemic began to be effectively controlled in China, the proportion of news reports continued to decline. In terms of technology, the proportion of news reports about China has generally risen over time. The proportion in the first three months increased In terms of sports, the proportion of reports first rose and then fell over time. The number of articles in January and February increased gradually. The proportion of sports news reached its peak in March, and then declined month by month from April to June. During this period, on March 22, the International Olympic Committee officially announced the postponement of the 2020 Tokyo Olympics to 2021, and much news reported related events in March. The second layer of topic discovery is fine-grained topic detection, which automatically identifies topics from news streams. Different from coarse-grained topic classification, it does not have any preset topics. Its primary purpose is to learn from news articles and find the topics with which most news is concerned. We model this problem as an unsupervised learning problem rather than a supervised classification problem. In addition, because news emerges continuously, we have to deal with vast amounts of data at the same time. Thus, we design an efficient and effective unsupervised topic detection method for clustering news by topics and detecting topics.
Specifically, inspired by Keygraph (Sayyadi and Raschid, 2013) , we construct an entity graph that contains only entities and keywords (including proper nouns, adjectives, and ordinal words) to make the algorithm more efficient. In addition, we use the SIFRank (Sun et al., 2020) algorithm based on the ELMo (Embeddings from Language Models) pre-training model to extract keywords, which improved performance. The learned topics are as follows: 1. In the first 20 days of January, the topics reported by various media focused mainly on the "novel coronavirus found in China," "virus outbreak in China," and "Wuhan is under lockdown." In late January, with the virus spread to Japan, Italy, Iran, and other countries, the media shifted its focus from China to other countries. 2. In February, the media reported mainly on topics such as "Aggregated events were postponed or canceled," "stock market volatility," and "the progress of the vaccine." It seems that the media began to pay more attention to the impact of the epidemic on human activities and the economy. 3. In March and April, "progress of Chinese vaccines," "specific drugs and treatments," and "virus outbreak in U.S." became the media's most concerned topics. The focus shifted away from China because the epidemic in China was well controlled. In March, topics such as "Tokyo Olympics" and "events postponed" also occupied a lot of forums. This corresponds to the sports news percentage peak in March in Fig. 6. 4. In May, financial topics such as "the stock market," "crude oil," and "exchange rate" received continuous attention. In the first half of this month, topics on "compensation from China for COVID-19", "virus origin," "vaccine competition," "China's second wave of epidemics," and "China-Australia relations" were hot. "NPC and CPPCC China" became the focus at the end of May. 5. In early June, the topics of media concern were scattered. 
Topics such as "People's Bank of China buys bank loans," "Stocks surged," "China will strengthen global cooperation in vaccine trials," "Trump administration says it will block Chinese airlines from flying into the U.S.," "China urges citizens to avoid Australia," "Harvard research," and "New virus cases raise fears in Beijing" were reported.
Media usually bear three tones against China: support China (positive), oppose China (negative), and neutral. As stated in Section 2, we annotate our training dataset with tone labels (news media's tones against China), so we model this problem as a supervised learning problem. We first learn all word embeddings using word embedding techniques like BERT (Devlin et al., 2019), and then feed them into a supervised classifier. After training, we are able to predict the news tones toward China. We find that most of the news in our dataset of all media has a neutral tone against China, accounting for 62%, as shown in Fig. 7. This result is in line with principles of news reporting. We further analyze the tone of news at the country level. The tone of news against China in different countries is shown in Fig. 8. The horizontal axis represents the country and the vertical axis represents the percentage of news in a country that has a certain tone. We can see that France, the United Kingdom, and the United States hold a relatively negative tone toward China, while Russia, Singapore, Cuba, and Brunei bear a positive tone. We also calculate similarity scores between countries in terms of their tones toward China. We collect all the news in a country and create statistics of their tones toward China. Hence, the proportions of positive, neutral, and negative tones can be viewed as a vector. By calculating the similarity between these vectors, we can find countries that bear a similar tone against China. We find that France and the United Kingdom have a similarity of 0.913.
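The comparison of tone vectors can be illustrated with a short sketch. The text does not name the similarity measure, so cosine similarity is used here as an assumption; the function names and the two label lists are hypothetical and the data is made up.

```python
import math

TONES = ("positive", "neutral", "negative")


def tone_vector(tone_labels):
    """Turn a country's list of per-article tone labels into proportions."""
    total = len(tone_labels)
    if total == 0:
        raise ValueError("at least one tone label is required")
    return [tone_labels.count(tone) / total for tone in TONES]


def cosine_similarity(u, v):
    """Cosine similarity between two vectors of equal length (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up tone labels for two hypothetical countries.
country_a = ["positive"] * 1 + ["neutral"] * 4 + ["negative"] * 5
country_b = ["positive"] * 5 + ["neutral"] * 4 + ["negative"] * 1

vec_a = tone_vector(country_a)
vec_b = tone_vector(country_b)
similarity = cosine_similarity(vec_a, vec_b)
```

Two countries with the same mix of tones score 1.0 under this measure, and the score drops as the mixes diverge.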
The similarity score between France and the United States is 0.824, and the similarity between the United States and Germany is 0.817. This shows that Western powers' tones toward China are consistent, and their news shows more of their negative tones against China. On the contrary, Russia and Cuba have a similarity of 0.849, Russia and Brunei have a similarity of 0.797, and Brunei and Malaysia have a similarity of 0.735, showing a positive tone toward China. The tone of news against China on different topics is shown in Fig. 9 . The horizontal axis represents the topic and the vertical axis represents the percentage of a given topic that reports a particular tone. We find that international news media hold the most negative tone toward China when reporting political news and the second most negative tone when reporting economic news. We report the tone of news over time in Fig. 10 . The horizontal axis represents the month and the vertical axis represents the percentage of a given topic that reports a particular tone. We can see that the tone of the news media changed over time. In January, the negative proportion of most countries toward China was relatively low. With the continuous aggravation of the epidemic situation, the media of many countries (such as Singapore, Spain, and Germany) gradually increased their negative tone toward China in February. After March, the domestic epidemic situation gradually eased. The negative tone showed a significant downward trend, or it was We use two methods to determine news emotion tones toward China. One is to use sentiment intensity to measure the media's influence toward China quantitatively; the other is to use emotional labels to examine the emotional situation qualitatively. For news sentiment intensity, we use Vader (Hutto and Gilbert, 2014) to calculate the sentiment intensity of a news article. The original paper's sentiment intensity ranges from −1 to 1. 
−1 represents the most negative emotional value, while 1 represents the most positive emotional value. To distinguish the emotional intensity of news reports more clearly, we uniformly extend the range to [−5, 5]. As shown in Fig. 11, news all over the world clearly reflects negative sentiment, which means the intensity score is not equal to zero. Sentiment intensity has fluctuated for half a year. In January, when the epidemic began, the score was the lowest. In February, medical teams all over China rushed to Wuhan. In March, the epidemic was effectively controlled. Therefore, the score gradually returned to zero from January to March. From April to May, The sentiment intensity in different countries is shown in Fig. 12. We can see that Malaysia has the highest sentiment intensity, while Canada has the lowest intensity. A country with high positive sentiment intensity means that it has a positive attitude toward China, and vice versa. This result corresponds to our discussion in Section 4.
Fig. 12 Sentiment intensity for each country
A piece of news usually shows or implicitly expresses opinions on an event, a person, or other targets, reflecting the author's emotions. We divide emotions into six categories: agreeable, believable, good, hated, worried, and sad. If a piece of news contains none of these emotions, we consider it objective. For label-based emotions, considering that a piece of news may contain multiple emotions simultaneously, even opposite emotions, we need to design a useful multi-label emotion classification model to sufficiently capture the semantics of news context. We employ BERT (Devlin et al., 2019) as the feature extractor. The input to our model includes news headline and body content. After using the feature extractor, we obtain a sequence of the last hidden states and then retain the first token of the sequence (classification token).
This token is fed to a linear layer with a sigmoid activation function, which predicts six probabilities, one for each defined emotion. The threshold is set to 0.4. In other words, if the predicted probability of an emotion is greater than or equal to 0.4, we consider that the news contains this emotion. Note that if all six probabilities are less than 0.4, we consider the news to be objective. The overall illustration of our multi-label emotion classification model is shown in Fig. 13.
Fig. 13 Illustration of the multi-label emotion classification model
As shown in Fig. 14, considering non-subjective news articles, we find that international news toward China holds more negative emotions than positive emotions, up to 26.0% and 5.5%, respectively. China, one of the countries with an early outbreak of the virus, has suffered from public criticism. The rapidly increasing number of infections makes the emotional tone of overseas media present negative emotions such as "critical" and "anger." Perhaps this explains why positive feelings toward China are only 5.5%. As shown in Fig. 15, we find that France's news reports have an extremely high percentage of "hated" emotion, followed by Canada, the United Kingdom, South Korea, Spain, and the United States. We rank the typical emotions in descending order for the convenience of comparison. As we can see in Fig. 16, the proportion of "agreeable" toward China is generally low in every country, and the highest is Cuba, followed by Malaysia, Venezuela, Kazakhstan, and Belarus, while the lowest is the United States. As for the "hated" emotion, the highest is France, while the lowest is Kazakhstan. We run the k-means algorithm to cluster different countries, where the number of clusters is set to 3, and the input data is the proportion of emotions except "objective." The clustering results are shown in Table 3.
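The clustering step can be sketched with a plain implementation of Lloyd's k-means algorithm. Everything specific in this sketch is an assumption: the country names and emotion proportions are made up, the feature order (agreeable, believable, good, hated, worried, sad) is chosen for illustration, and the first k points are used as initial centroids only to keep the run reproducible. It is not the configuration used in the study.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means on lists of floats.

    The first k points serve as initial centroids (an assumption made for
    reproducibility). Returns (assignment, centroids).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Made-up emotion proportions per hypothetical country, in the order
# agreeable, believable, good, hated, worried, sad ("objective" excluded).
countries = ["A", "B", "C", "D", "E", "F"]
features = [
    [0.10, 0.08, 0.09, 0.02, 0.03, 0.02],
    [0.01, 0.02, 0.01, 0.20, 0.15, 0.10],
    [0.04, 0.04, 0.04, 0.06, 0.06, 0.05],
    [0.11, 0.07, 0.10, 0.03, 0.02, 0.02],
    [0.02, 0.01, 0.02, 0.22, 0.14, 0.11],
    [0.05, 0.04, 0.03, 0.07, 0.05, 0.05],
]
labels, centers = kmeans(features, k=3)
clusters = {name: label for name, label in zip(countries, labels)}
```

In practice the initialization matters for k-means, so a production run would typically use several random restarts or a seeding scheme such as k-means++ instead of the first k points.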
As far as we can see, cluster 2 shows more positive emotions than others, while cluster 3 shows more negative emotions than others. We developed and deployed a visual system (http: //203.195.140.107) to show the whole news analysis process in this study. Fig. 17 shows the system framework. It consists of five modules: 1. Data collection We crawled news from 57 news websites of mainstream media in 22 countries and updated the data automatically every day. Details are as given in Section 2. 2. Data preprocessing We cleaned up the crawled data by strict standards and translated multilingual news into English. For better model learning, we annotated 5000 pieces of news with crowdsourcing technology. Details are as given in Section 2. 3. Data analysis We aimed to answer three questions: (1) What has the international media focused on during the COVID-19 epidemic period? (2) What is the media's tone when they report China? (3) What is the media's attitude when talking about China? We deployed some modules that would be used to answer these questions, such as named entity recognition, topic classification, and topic clustering. Details are as given in Sections 3-5. 4. Data visualization We displayed our system in a hierarchical manner, as shown in Fig. 18 . 5. Storage services Storage and querying of the knowledge graph are the keys to the entire system. To persistently store and analyze the knowledge graph data, we used two different types of databases to store data at different stages in the data processing procedure. We used the document-based MongoDB to store the crawled data. In addition, for mining information and data at a deeper level, we leveraged Neo4j (www.neo4j.com) to store the knowledge graph data of entities, topics, and events. 7 Related work 7.1 Country image Nimmo and Savage (1976) defined an image as "a human construct imposed on an array of perceived attributes projected by an object, event or person." 
The traditional analysis of national image often uses surface analysis based on related corpora and news content. Manheim and Albritton (1984) proposed two dimensions to describe the national image, visibility and valence, which represent the media's influence range and the degree of preference in the media content on the country, respectively. In different media, China is portrayed with different national images. Wang (2003) compared China's national image as projected by Chinese media and American media based on a content analysis between 1958 and 2002. Peng (2004) studied the coverage of China in the New York Times and the Los Angeles Times. Zhang L (2010) explored the image of China in three international newspapers in Europe. Zhang L and Wu (2017) used critical discourse analysis to examine the representation of China by China Daily. To summarize all of these studies, it was found that China's national image was portrayed as a peace-loving country, a developing country, and an anti-hegemonic nation by Chinese media (Wang, 2003; Zhang L and Wu, 2017), whereas the image portrayed by foreign media was usually mixed and conflicted, such as a socialist country, a significant power, an authoritarian state, and a militant obstructive force (Zhang L and Wu, 2017).
Fig. 18 System visualization: The first three figures (a-c) are visualizations of a multi-level media focus. The first layer (a) shows the entity graph extracted from all news corpora. In the graph, each node is an entity. When clicking one entity node, we can get into the second layer (b) and see a detailed graph related to that entity. Move the cursor to display fine-grained topics. Click one node in this graph to see the third layer (c) and detailed events associated with this node. The last three figures (d-f) show coarse-grained topics and the tone of news and emotion analysis.
These methods prefer manual analysis and lack fine-grained analysis and scalability. Chen et al.
(2021) investigated China's image during the COVID-19 pandemic with aspect-based sentiment analysis. Compared to our system, it is limited to sentiment analysis. Moreover, our system provides a multi-level and multi-view country image analysis. Several systems have been built to analyze news from multiple perspectives. For news aggregation and analysis, Google News (www.news.goole.com) is the largest news aggregation system and monitored more than 5000 news sources worldwide as of 2013 (Filloux, 2013) . It performs topic detection, tracking, and clustering of news. It also uses algorithms to offer personalized news to users. Lloyd et al. (2005) built a news analysis system named Lydia to track temporal and spatial information of entities in the news. Hou et al. (2015) built NewsMiner (www.newsminer.net), which is a news-mining system framework. They proposed a three-level representation of news and formalized news-mining tasks as link predictions in the heterogeneous network. For fake news detection, Emergent (www.emergent.info) is a rumor tracking tool developed by Columbia University to study the spreading mechanism of rumors. For news and event detection, Liu et al. (2016) built a tool that can help journalists discover news on social media more quickly and assess news authenticity. There are also news detection and tracking systems and services provided by some companies or organizations, such as People's Daily Online (www.peopleyun.cn/yuqing.html), Baidu (www.yuqing.baidu.com/saas/intro/newindex), and Nielsen (www.nielsen.com). These systems usually monitor forums, blogs, news media, and social networks to find content relevant to a particular topic or keyword. Statistical analysis is performed based on the collected content to provide consulting services. The above works focused mainly on different aspects of news analysis, thus different from our system at different granularities. 
Media analysis is also known as media content analysis, which is a part of content analysis. Macnamara (2005) interpreted content analysis as a technique for describing what is said at a given place and a given time with objectivity and accuracy. However, slanted news coverage always exists in real life. In content analysis, researchers first define analysis questions or assumptions that need to be studied. Then they collect relevant news data, systematically read the news text, and annotate the text with examples of media bias associated with the ongoing analysis. After that, researchers use the annotated findings to accept or reject their hypothesis (McCarthy et al., 2008; Oelke et al., 2012) . There are two types of content analysis: quantitative and qualitative (Vaismoradi et al., 2013) . Qualitative analysis attempts to find "all" instances of media bias, including some that require human interpretation. Quantitative analysis measures news by determining the frequency of a particular word or phrase, the number of articles that include that word or phrase, or the size and location of an article in a printed newspaper (D'Alessio and Allen, 2000). They also use computer software to aid analysis, for example, by analyzing how often terms, topics, or words appear together (Lowe, 2002) . In addition to content analysis, media bias can be analyzed through public opinion polls or public votes, such as through the Gallup/Knight Foundation (John and James, 2018) and MBFC (www.mediabiasfactcheck.com). In analyzing media sentiment and content analysis, computational methods can be used for sentiment analysis. For example, Hutto and Gilbert (2014) presented VADER, a simple rule-based model for general sentiment analysis. Neri et al. (2012) used linguistic and semantic approaches to analyze sentiment about newscasts. 
In the analysis of media tone, opinion mining has focused mostly on polarity detection, classifying a given text as positive, negative, or neutral. Existing tone detection models include both neural models and classical classifier-based models (Ghosh et al., 2019). Zhang Q et al. (2018) framed the problem as a ranking task and proposed a ranking-based method that maximizes the differences among tones.
In this study, we focused on how international news media portrayed China during the COVID-19 epidemic period. We answered three questions using big data techniques: (1) What has the international media focused on during the COVID-19 epidemic period? (2) What is the media's tone when they report on China? (3) What is the media's attitude when talking about China? Specifically, we crawled more than 280 000 pieces of news from 57 mainstream news media entities in 22 countries and conducted a detailed analysis. We found that during the second wave of the COVID-19 epidemic period, mainstream media cared more about medical scientists. Also, during the COVID-19 epidemic period, Singapore and Malaysia were more concerned about China's economy, whereas Canada and France were more concerned about Chinese politics. In March and April, "progress of Chinese vaccines," "specific drugs and treatments," and "virus outbreak in U.S." became the topics that most concerned the media. In terms of news emotion toward China, Cuba, Malaysia, and Venezuela had a positive attitude, while France, Canada, and the United Kingdom had a negative one. Our study can help understand China's image in the eyes of the international media and provide a sound basis for image analysis.
Hong HUANG and Xuanhua SHI designed the research. Zhexue CHEN, Chenxu WANG, and Zepeng HE processed the data. Hong HUANG, Zhexue CHEN, and Chenxu WANG drafted the manuscript. Hai JIN, Mingxin ZHANG, and Zongya LI helped revise the manuscript. Hong HUANG finalized the paper.
Hong HUANG, first author of this invited paper, is an associate professor at Huazhong University of Science and Technology (HUST), Wuhan, China. She received her PhD degree in computer science from the University of Göttingen, Germany in 2016, and her ME degree in electronic engineering from Tsinghua University, Beijing, China in 2012. Her research interests include social network analysis, social influence, and data mining.
Xuanhua SHI, corresponding author of this invited paper, is currently a professor with the School of Computer Science and Technology, HUST, Wuhan, China. He is the deputy director of the National Engineering Research Center for Big Data Technology and System (NER-CBDTS). He has published more than 100 peer-reviewed papers in conferences and journals such as ASPLOS, VLDB, ACM Trans Comput Syst, and IEEE Trans Parall Distr Syst. He is a corresponding expert of Front Inform Technol Electron Eng. He has received research support from several governmental and industrial organizations, such as the National Natural Science Foundation of China, Ministry of Science and Technology, Ministry of Education, and the European Union. His current research interests include cloud computing, big data processing, and AI systems.
Hai JIN received his PhD degree in computer engineering from HUST, Wuhan, China, in 1994. He worked at the University of Hong Kong from 1998 to 2000, and was a visiting scholar at the University of Southern California, Los Angeles, CA, USA from 1999 to 2000. He is currently the Cheung Kung Scholars Chair Professor of Computer Science and Engineering with HUST. He has coauthored 15 books, and published over
Country image in COVID-19 pandemic: a case study of China
Media bias in presidential elections: a meta-analysis
BERT: pre-training of deep bidirectional transformers for language understanding
Google News: the Secret Sauce. Monday Note
Stance detection in web and social media: a comparative study
Natural Language Understanding with Bloom Embeddings, Convolutional Neural Networks and Incremental Parsing. GitHub
NewsMiner: multifaceted news analysis for event search
Vader: a parsimonious rule-based model for sentiment analysis of social media text
Perceived Accuracy and Bias in the News Media
Speech and Language Processing: an Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition
Convolutional neural networks for sentence classification. Proc Conf on Empirical Methods in Natural Language Processing
Reuters tracer: a large scale system of detecting & verifying real-time news events from Twitter
Lydia: a system for large-scale news analysis
Software for Content Analysis-a Review
Media content analysis: its uses, benefits and best practice methodology
Changing national images: international public relations and media agenda setting
Assessing stability in the patterns of selection bias in newspaper coverage of protest during the transition from communism in Belarus
Sentiment analysis on social media
Candidates and Their Images: Concepts, Methods, and Findings
Visual analysis of explicit opinion and news bias in German soccer articles
Representation of China: an across time analysis of coverage in the New York Times and Los Angeles Times
A graph analytical approach for topic detection
SIFRank: a new baseline for unsupervised keyphrase extraction based on pre-trained language model
Content analysis and thematic analysis: implications for conducting a qualitative descriptive study
National image building and Chinese foreign policy
The rise of China: media perception and implications for international politics
Media representations of China: a comparison of China Daily and Financial Times in reporting on the belt and road initiative
Ranking-based method for news stance detection. The Web Conf