key: cord-0432266-6a4wabrz authors: Vrancken, Joren title: Theme Analysis of Political Facebook Ads in the 2021 Dutch General Election date: 2022-01-12 journal: nan DOI: nan sha: 005c2b759de1b09c3c3523304daaa314c70490ba doc_id: 432266 cord_uid: 6a4wabrz

Social media platforms have been trying to be more transparent about the political ads they run, because the Cambridge Analytica scandal revealed that political campaigns are using social media on a large scale. One such transparency effort is the Facebook Ad Library, a public repository of all political ads run on Facebook and Instagram. This library provides journalists and researchers with data to get a better understanding of political advertising and microtargeting on Facebook's platforms. Unfortunately, the Facebook Ad Library only provides estimates and basic information. This paper analyses political ads in more depth, by examining the themes that ads are about. We provide a method to match themes to political Facebook ads and we apply this method to analyse Facebook ad campaigns run by Dutch political parties during the 2021 Dutch general election.

Social media has revolutionized the advertising business. Social media platforms strive to provide a unique user experience to each user. The content one sees is tailored exactly to them. To accomplish this, social media platforms gather data from their users. They analyse this data to show content that keeps their users on the platform as long as possible. Social media platforms make money by showing ads to their users [10]. Just as they try to tailor the content a user sees to their preferences, they also tailor the ads a user sees to their preferences. This way of delivering content and ads is called microtargeting. In traditional advertising, ads are seen by a large number of people (e.g. TV commercials). Many people that see these ads are not interested in their message and many others would have been interested anyway without seeing the ads.
Microtargeting solves this problem. Instead of advertising to everyone that watches a particular TV channel or everybody that waits at a specific bus stop, social media platforms target ads to the people with a high probability of being interested in them. Facebook provides advertisers with a large range of selectors to specify the right people [11]. Facebook also helps advertisers find their ideal target audience: it tracks the performance of ads to find the audiences that respond best to an advertiser's ads. For example, if an advertiser sells baby products, Facebook will let them target ads to women who are between the ages of 30 and 35, live in Amsterdam and are interested in products for young mothers.

However, microtargeting is controversial, as it has become a tool for political campaigns to reach voters unnoticed. Borgesius et al. define microtargeting for political purposes as "personalised communication that involves collecting information about people, and using that information to show them targeted political advertisements" [32]. Political microtargeting became part of mainstream public debate after multiple media outlets reported that Cambridge Analytica, a political consultancy company, used the data of tens of millions of Facebook users to microtarget political ads without the users' consent [17] [13]. Microtargeting went unnoticed because journalists and researchers cannot get a transparent view of whom political parties are targeting and what messages they are using, as the ads are only seen by the target audience.

Author's address: Joren Vrancken, Radboud University, Nijmegen, NL, jorenvrancken@gmail.com.

As a response to this controversy, or to limit the chance that lawmakers adopt strict new laws, social media companies have amended their microtargeting practices. Twitter has decided to ban political advertising altogether [25]. Facebook has created a public repository of all political ads that are run on their platforms (i.e.
Facebook and Instagram), called the Facebook Ad Library [5]. The library does not only provide the content of ads, but also metadata about the ads (e.g. how much was spent on the ad and when it was active). In theory, the goal of this library is to give better insights into the advertisements political campaigns are running. However, the library does have some transparency problems. For example, it does not show how much an ad cost, but instead only shows a range (e.g. between €2000 and €3000).

Political science scholars and political journalists are interested in the messaging of political parties, because this messaging shows which political themes a party is focusing on, what its stance is on those themes and what it wants voters to know about those themes. Ads are an important part of this messaging. The Facebook Ad Library is a useful tool that researchers and journalists can use to analyse the messaging in political ads. However, it does not provide any metadata on the content of an ad (e.g. what the ad is about). This makes it hard to analyse the content of the ads on a large scale.

In this paper we present a technique to help researchers analyse the content of political ads (in the Facebook Ad Library), by matching themes to ads. We showcase this technique by analysing the 2021 Dutch general election. First, we look at what data is available in the Facebook Ad Library (subsection 1.1). We then look at what Facebook ads look like and which elements are important for our analysis (subsection 1.2). In section 2 we discuss related research. In section 3 we describe the technique to match themes to ads and walk through each step. In section 4 we apply this technique to the 2021 Dutch general election and compare the results with existing voter research. We conclude in section 5 by discussing the methodology and the results. Finally, in section 6, we look at some interesting leads to continue this research.
The Facebook Ad Library is a public repository of ads published on Facebook and Instagram, released in 2019 [23]. It consists of two parts: a public website and an API [6]. The website can be used to manually search for ads. The API can be used by applications to automate searching and retrieving ads.

1.1.1 Information about ads. The Facebook Ad Library provides a lot of information about each ad. The most important data the library provides for an ad is:
• Start date: The date Facebook started showing the ad.
• End date: The date Facebook stopped showing the ad.
• Spending: A range estimating the amount that is spent on the ad. The currency that was used in the payment is also provided.
• Impressions [9]: Impressions measure how many times an ad was shown to a user. Like spending, impressions are provided as a range estimating the amount.
• Estimated audience size (previously called potential reach) [8]: A range estimating how many users the ad could potentially be shown to.
• Demographic distribution: Facebook provides some basic demographic information on the distribution of impressions. They provide three demographic characteristics:
  - Female/male ratio.
  - Age groups.
  - Regions: Large regions within a country. For example, in the Netherlands the regions are the twelve provinces.
• Content: The actual text that appears in the ad. An ad has multiple text elements, see subsection 1.2 for more information. It should be noted that the images or videos of an ad are not directly accessible through the Facebook Ad Library API, but the API does provide a link to the ad including any image or video.

1.1.2 Spending Tracker. Besides information on ads, the library also provides information on the organisations and political parties behind the Facebook pages that pay for the ads (Facebook ads are always linked to a Facebook page), in the so-called Spending Tracker [7].
This is a good resource to identify an exhaustive list of Facebook pages that are used by political parties to run ads.

However, the Facebook Ad Library has several limitations:
• No exact numbers are given. The numeric data is given in ranges. For example, between €2000 and €3000 was spent on an ad or the ad has between 1000 and 2000 impressions.
• It is not clear when an ad was most active. For example, if an ad gained between 1000 and 2000 impressions and was active for 30 days, we do not know whether the ad was mostly shown on one day or evenly over the 30 days.
• There are subtle differences between the information that is available through the website and through the API. For example, Facebook bundles similar ads on the library website to give a better estimation of the metrics. This information is not available through the API.
• It is not publicly known how Facebook decides who sees which ad. The Facebook Ad Library only provides basic information about whom an ad was shown to. This means that the Facebook Ad Library provides only half the picture. Because we do not know how this decision is made, we do not know whether it is an explicit decision by a political party to target certain people, an automated choice by the Facebook algorithm, or a combination of both.

A Facebook ad contains the following elements:
• The top of the ad shows a header that informs the user that they are being shown an ad and which Facebook page has published and paid for the ad.
• The creative body: The main text of the ad.
• An image or video.
• Below the creative body and image we find a call to action. This section consists of the following elements:
  - The creative link caption: A link to a website of the advertiser.
  - The creative link title: A title that is shown above the creative link caption.
  - The creative link description: A description of the creative link caption.
Not all of these elements need to be present in an ad (e.g. an ad can have an image but no creative body).
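As an illustration, the elements listed above could be held in a small record type such as the one below. This is only a sketch: the class and field names are hypothetical and are not the field names of the Facebook Ad Library API. Every text element is stored as a list, so an absent element is simply an empty list, and numeric metadata is stored as a (lower, upper) range because the library does not give exact numbers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AdRecord:
    """Hypothetical container for one ad (names are illustrative only)."""

    page_name: str
    # Text elements of the ad; an empty list means the element is absent.
    creative_bodies: List[str] = field(default_factory=list)
    link_captions: List[str] = field(default_factory=list)
    link_titles: List[str] = field(default_factory=list)
    link_descriptions: List[str] = field(default_factory=list)
    # Spending and impressions are only known as (lower, upper) ranges.
    spend_range: Optional[Tuple[int, int]] = None
    impressions_range: Optional[Tuple[int, int]] = None

    def has_text(self) -> bool:
        """Return True if at least one textual element is present."""
        return any(
            [
                self.creative_bodies,
                self.link_captions,
                self.link_titles,
                self.link_descriptions,
            ]
        )


# An ad with a creative body and a spending range of 2000 to 3000.
text_ad = AdRecord(
    page_name="Example Party",
    creative_bodies=["Vote on 17 March"],
    spend_range=(2000, 3000),
)
# An ad that consists only of an image or video has no text elements.
image_only_ad = AdRecord(page_name="Example Party")
print(text_ad.has_text(), image_only_ad.has_text())
```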
It is also possible for ads to have multiple variants of one element (e.g. an ad with multiple creative bodies or multiple images). In this case the Facebook algorithm will pick the best combination of elements to show to a user.

The Facebook Ad Library has been used by researchers to get a better understanding of political advertising on social media. In 2020 Fowler et al. [12] used the Facebook Ad Library to compare political Facebook ads to more traditional political advertising (e.g. television commercials). Edelson et al. [4] use the Facebook Ad Library to find suspicious and malicious advertising practices. Schmøkel and Bossetta [22] provide tools to analyse the images in Facebook ads.

The Facebook Ad Library is not the only dataset that has been used to analyse ads on social media. ProPublica, in collaboration with other media outlets and researchers, published a dataset of Facebook ads (both political and non-political) that was gathered by volunteers installing a browser extension that saves all ads [21]. Ortega [16] uses this dataset to analyse the use of negative campaigning with online political ads. Levi et al. [15] use it to classify political ads.

Researchers have used other political texts (e.g. speeches) to analyse the themes that political parties and specific politicians focus on. Many researchers use topic modeling to extract themes from these political texts. Topic modeling is a class of natural language processing models that cluster words into groups of similar words, where each group should represent a theme. Latent Dirichlet Allocation (LDA), presented by Blei et al. [3], is a widely used topic model. Topic modeling works best if the analysed documents are homogeneous. There are many models that implement topic modeling, each suitable for a specific type of text. For example, GSDMM (Yin et al. [31]) is suitable for short texts. Political ads, however, come in many forms that range from just a few words to multi-paragraph articles.
This makes topic modeling less suited for analysing political ads. Instead of relying on topic models, we provide an alternative approach to match themes to political ads.

The goal of our methodology is to match a theme (or multiple themes) to an ad. In other words, we want to categorize what an ad is about using a list of themes. To accomplish this we provide the following iterative process:
(1) Create a list of themes (subsection 3.1);
(2) Obtain and pre-process the ad content (subsection 3.2);
(3) Create a list of relevant words for each theme (subsection 3.3);
(4) Match themes to ads using the word lists (subsection 3.4);
(5) Update the word lists by using the matched ads (subsection 3.5);
(6) Go to step 3 (subsection 3.6).
In this section we will go into detail about each step.

We first need a list of themes to match to ads. These themes should cover the full public debate. We use existing research for this purpose. We base our themes on the codebooks from the Comparative Agendas Project (CAP) [2]. These codebooks contain categories and sub-categories that cover every topic in the public debate and a description for each category. As some themes are less prevalent in the current public debate and elections, it can be hard to distinguish between related themes. For example, it might be hard to distinguish between international affairs and foreign trade in ads. To solve this, we combine similar themes into one theme.

We use the Facebook Ad Library API to retrieve the ad data we need. Because we want to analyse the textual content of the ads, we are interested in three elements: the creative bodies, the creative link descriptions and the creative link titles. We will refer to these combined elements as the ad text. Before we can analyse the ad texts, we need to make sure they are all in the same format. We use common natural language processing techniques for this:
• If an ad has multiple variants of an element (e.g.
multiple possible creative bodies), combine them into a single text.
• Replace all non-ASCII characters (e.g. emojis and bold characters) with the nearest equivalent ASCII characters.
• Remove all words except for the nouns, proper nouns and adjectives.
• Normalize all words to their root form (e.g. "cars" becomes "car" and "better" becomes "good"). This is referred to as lemmatization.
To speed up the analysis, we also remove all duplicate texts, as political parties tend to reuse text.

For each theme we use a list of words that are relevant to that theme. We refer to these word lists as theme word lists. We use the descriptions in the CAP codebooks for this purpose. For each theme, we (manually) add the relevant words from the corresponding CAP codebook category description to a list. To make sure the theme word lists are all in the same format, we pre-process them in the same way as the ad texts in subsection 3.2.

Once we have the themes and their corresponding theme word lists, we use the following algorithm to match themes to an ad:
(1) Compute the sizes of the intersections between each theme word list and the words in the ad text.
(2) If the largest intersection contains at most one common word, no theme is matched to the ad. This means that ads do not have to correspond to a theme. This is necessary, because political parties regularly advertise about non-policy matters (e.g. events they are organizing and membership benefits).
(3) If the largest intersection contains more than one common word, match that theme to the ad.
(4) If an intersection contains more than five common words, match that theme to the ad (even if there is a larger intersection). This means that multiple themes can be matched to an ad. This is necessary as some advertisements are longer texts that cover multiple themes (e.g. a summary of the most important points of a party's policy plans).

At this point we have ads that correspond to themes.
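The matching rules of subsection 3.4 can be sketched in a few lines of Python. The function name `match_themes` and the example word lists are illustrative assumptions, not the code used for this paper. The sketch assumes that ad texts and theme word lists have already been pre-processed into sets of lemmas, and that themes tied for the largest intersection are all matched (the rules above do not specify how ties are handled).

```python
from typing import Dict, List, Set


def match_themes(ad_words: Set[str], theme_words: Dict[str, Set[str]]) -> List[str]:
    """Return the themes matched to one ad, sorted by name (sketch only)."""
    # Step 1: size of the intersection between each theme word list and the ad.
    overlap = {theme: len(words & ad_words) for theme, words in theme_words.items()}
    largest = max(overlap.values(), default=0)

    # Step 2: at most one common word means that no theme is matched.
    if largest <= 1:
        return []

    # Step 3: match the theme(s) with the largest intersection.
    # Step 4: also match every theme with more than five common words.
    # Assumption: themes tied for the largest intersection are all kept.
    return sorted(
        theme for theme, size in overlap.items() if size == largest or size > 5
    )


# Hypothetical, heavily shortened theme word lists.
THEMES = {
    "Climate": {"climate", "energy", "emission", "nature"},
    "Housing": {"house", "rent", "mortgage"},
}

print(match_themes({"climate", "energy", "vote"}, THEMES))  # ['Climate']
print(match_themes({"climate", "vote"}, THEMES))  # []
```

Because only the size of each intersection is used, the result does not depend on the length of the ad text or of the theme word lists.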
We use these ads to improve our theme word lists by updating the theme word list of a theme with words that are common in ads about that theme. For each theme, we compute the frequency of each word in the ads about that theme. We (manually) check the most common words for words that are relevant to the theme and (manually) add these to the theme word list. Whether a word is relevant to a theme is subjective. This is ultimately up to the researcher creating the theme word list. We provide the following guidelines to aid in this decision:
• When computing the most commonly used words in a set of ads, we will find many words that are commonly used in all ads (e.g. "vote" and "party" are common words in election ads) and words that are common in general (e.g. "world" and "year"). These need to be filtered out, as they do not correspond to a single theme.
• Ideally, a word should be in only one theme word list. If a word is relevant to multiple themes, it is probably better to not add it to any list as its meaning is too broad.
• Reading political ads helps to get a feeling for the way political ads are written and how words are used by political parties. This will help the researcher decide whether a word that has different meanings in different contexts should be added to a theme word list.

After an iteration of this process, we have improved theme word lists. We can use these improved lists to create an improved matching between themes and ads to find more relevant words for the theme word lists. Eventually, we come to a point where we do not find any more new, relevant words. At this point the process stops. Some themes are more important in an election than others. As political parties advertise more about the themes that voters find important, there will be fewer unique, relevant words for some themes. This means that some theme word lists will be finished after fewer iterations than others. This also means that the theme word lists will not be of the same length.
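The word-counting part of the update step (subsection 3.5) could look like the sketch below. The function name `candidate_words`, its parameters and the example data are hypothetical. The sketch only ranks words that are not yet in a theme word list and skips a caller-supplied set of words that are common in all ads. Deciding which candidates are actually relevant to a theme remains a manual step.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple


def candidate_words(
    matched_ads: Dict[str, List[List[str]]],
    theme_words: Dict[str, Set[str]],
    ignore: FrozenSet[str] = frozenset(),
    top_n: int = 20,
) -> Dict[str, List[Tuple[str, int]]]:
    """For each theme, rank frequent words that are not yet in its word list.

    matched_ads maps a theme to the pre-processed word lists of its matched ads.
    ignore holds words that are common in all ads (e.g. "vote" and "party").
    """
    candidates = {}
    for theme, ads in matched_ads.items():
        # Frequency of every word over all ads matched to this theme.
        counts = Counter(word for ad in ads for word in ad)
        known = theme_words.get(theme, set())
        ranked = [
            (word, count)
            for word, count in counts.most_common()
            if word not in known and word not in ignore
        ]
        candidates[theme] = ranked[:top_n]
    return candidates


# Hypothetical example: three ads that were matched to the Climate theme.
matched = {
    "Climate": [
        ["climate", "windmill", "vote"],
        ["windmill", "energy", "vote"],
        ["windmill", "co2"],
    ]
}
word_lists = {"Climate": {"climate", "energy"}}
print(candidate_words(matched, word_lists, ignore=frozenset({"vote"})))
```

In this example "windmill" is ranked first because it occurs in all three ads, while "vote" is skipped because it is common in election ads in general. After the researcher adds the accepted candidates to the theme word lists, the matching can be run again, and the process stops once no new relevant words turn up.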
In this section we will use the method described in section 3 to analyse the themes used in political Facebook ads during the 2021 Dutch general election. This election was held from 15 to 17 March. We look at ads that ran between 1 September 2020 and 1 September 2021 and we focus on three parties that are seen as the winners of the election [18] [29] [24] [30]:
• Democraten 66 (D66): The party that finished second in number of seats.
• Forum voor Democratie (FvD): The party that gained the most seats relative to their previous election results.
• Volkspartij voor Vrijheid en Democratie (VVD): The party that won the most seats.
These parties cover a broad political spectrum: D66 is a progressive, social liberal party, VVD a centre-right, conservative liberal party and FvD a conservative, right-wing populist party [19].

In subsection 4.1 we look at the themes we used during this analysis. In subsection 4.2 we show the distribution of themes per party. We analyse these distributions through three lenses: we compare our results with existing voter research (subsection 4.3), we look at which parties got the most impressions for each theme (subsection 4.4) and we look at the demographic make-up of the audience (subsection 4.5). We published more detailed results (e.g. the results of all parties) on our website. On this website we also published an analysis of the metadata of political Facebook ads.

As described in subsection 3.1, we create a list of themes using the CAP codebook of the Netherlands [1]. This resulted in the following list of themes. The categories from the CAP codebook on which the themes are based are listed for each theme.
• Agriculture: This theme covers all topics and debate relevant to agriculture, farming and livestock. This theme is based on category 4 (Agriculture and Fisheries).
• Civil Rights: This theme covers all topics and debate relevant to rights of citizens (e.g. discrimination and privacy).
This theme is based on category 2 (Civil Rights). Category 2 also includes migration, which we split off into a new theme.
• Climate: This theme covers all topics and debate relevant to climate and the environment. This theme is based on categories 7 (Environment), 8 (Energy) and 21 (Public Lands).
• Defense: This theme covers all topics and debate relevant to defense and the military. This theme is based on category 16 (Defense).
• Economy: This theme covers all topics and debate relevant to the economy (e.g. macroeconomic policies and commerce). This theme is based on categories 1 (Macroeconomics and taxes) and 15 (Domestic Commerce).
• Education & Culture: This theme covers all topics and debate relevant to education, culture and religion. This theme is based on categories 6 (Education) and 17 (Technology).
• Government: This theme covers all topics and debate relevant to government services, government operations and public service. This theme is based on category 20 (Government Operations).
• Healthcare: This theme covers all topics and debate relevant to healthcare. This theme is based on category 3 (Healthcare).
• Housing: This theme covers all topics and debate relevant to housing and city planning. This theme is based on category 14 (Housing).
• Law & Order: This theme covers all topics and debate relevant to general law, crime and jurisdiction. This theme is based on category 12 (Law and Crime).
• Migration: This theme covers all topics and debate relevant to immigration, emigration and refugees. This theme is split from category 2 (Civil Rights).
• Social Welfare: This theme covers all topics and debate relevant to social welfare (e.g. low-income assistance). This theme is based on categories 5 (Labor) and 13 (Social Welfare).
• Transportation: This theme covers all topics and debate relevant to traffic, transportation and infrastructure. This theme is based on category 10 (Transportation).
The theme word lists we created can be found at https://github.com/joren485/DutchPoliticalFacebookAdComparision/tree/main/data/wordlists.

Using the themes in subsection 4.1, we created theme word lists using our methodology and matched themes to the ads of D66, FvD and VVD. Table 1 shows the distribution of themes in number of ads and Table 2 shows the distribution of themes in impressions. Table 1 and Table 2 only include ads that were matched to a theme. Table 3 shows what percentage of ads did not match any theme. There are multiple explanations why the matched percentage is relatively low for some parties:
• Ads about organisational matters such as training sessions, conferences and becoming a member.
• Ads that focus on candidates or a broad message rather than on policy plans or themes. For example, in the election the VVD focused their campaign on the party leader and D66 focused on the message "it is time for new leadership". Other parties focus more on their policy plans. For example, 79.59% of ads by the Dutch labor party, Partij van de Arbeid (PvdA), were matched to a theme.
• Ads consisting of only a few words are harder to match to a specific theme.

Before the election, Nieuwsuur commissioned Ipsos to do research into which themes are important to the constituency of each party [26]. The research includes the top five themes for each party. Comparing these with the themes in the ads, we see almost a 1-to-1 overlap.
• D66: The top five themes according to voters correspond to the top four themes in ads. This shows us that D66 focuses its ads on themes that are important to their voters.
• FvD: Ads are highly focused on Healthcare, which their voters also care a lot about. However, the theme Climate is second in % of ads, but not one of the most important themes according to their voters. We can also see this in Table 2: Climate is in 10th place in % of impressions, showing us that these ads are (relatively) not performing well.
• VVD: The top five themes according to voters correspond to the top six themes in the ads. As with FvD, the theme Climate is not one of the most important themes to the voters, but is prevalent in VVD ads.

Issue ownership for a political party is defined as having the best solution for an issue according to voters. In this section we compare existing issue ownership research with which parties got the most impressions for each theme, to see whether the party with the best solutions is reaching the most voters with their messaging. Table 4 shows the three parties that got the most impressions (and what percentage of the total impressions they got) for each theme. It should be noted that this has an inherent bias towards right-leaning parties, as they tend to advertise more than left-leaning parties. If we compare these issues with the data in Table 4 and subsection 4.2, we make the following observations:
• D66 owns Education and gets the most impressions for Education & Culture.
• FvD advertises about issues that they do not own. For example, FvD advertises a lot about Healthcare and Climate, but does not own these issues.
• VVD owns multiple issues and focuses on most of them in their ads. However, if we look at Table 4, we see that the VVD is only in the top three of impressions for two themes.
• The European Union issue from the I&O report (which corresponds to the Foreign Affairs theme) does not get much attention in ads. For D66, FvD and VVD, Foreign Affairs falls in the bottom three in % of ads. However, the ads about Foreign Affairs do relatively well when we look at the % of impressions. The VVD in particular gets many impressions and is second in number of impressions.

The demographic data that the Facebook Ad Library provides allows us to compute the distribution of ads among demographic groups.
Using these tables we can make some interesting observations about the advertising strategy of the Dutch political parties:
• There are some big swings in the female/male distribution of the impressions. For example, nearly 60% of all ads about migration are shown to men and nearly 60% of all ads about social welfare are shown to women.
• As people between the ages of 13 and 17 are not allowed to vote, they are mostly excluded from the advertising.
• 65+ is the largest group by population, but this group is significantly under-represented in impressions for almost all themes. This may be because senior citizens are less active on social media.
• The distribution of impressions over provinces seems to be mostly in line with the population statistics. We only see a significant deviation from the population statistics in specific themes (e.g. Agriculture).

In subsection 4.3 we see that themes that are important to voters are the same themes that political parties advertise about. This shows that our technique produces accurate results. It also shows that political parties prioritize themes that their constituencies find important.

As the theme word lists are updated with common words in ads, the word lists consist of commonly used words. This is advantageous, because political parties tend to use specific terms to convey their opinions. For example, when talking about climate change, a party that prioritizes the dangers of climate change will use different language than a party that is skeptical of climate change. Only using pre-made lists of words or clustering similar words into groups (i.e. using topic modeling) might not include relevant words (and common misspellings), which will result in mismatching themes to ads.

Different political parties use different strategies when it comes to writing ads. Some parties use many short ads each covering a single topic, others use large texts covering everything that is important to the party.
This means that any technique to analyse the themes in ads should be agnostic to the length and format of the content of the ads. We accomplish this by only taking the intersection between the ad and the theme word lists into account and not looking at the length of the ad or of the theme word list.

Our method does, however, have two inherent biases. First, whether a word is relevant to a theme is subjective. This is an inherent bias towards the opinion of the researcher creating the theme word list. Second, whether a word in an ad is relevant to a theme also depends on the time the ad is active. The problems that are relevant to a certain theme differ per election, which means that there is a bias towards the current political issues in the theme word lists.

As discussed in subsection 3.5, someone needs to decide whether a word appearing in an ad is relevant to a theme or not. Unfortunately, this creates a lot of manual work for the researcher. However, as we only look at words that are common in ads, the method is still scalable. For example, we analysed more than thirty thousand ads for this research.

Although the Facebook Ad Library gives a lot of insights into the ads that are run on the platforms of Facebook, we still only see part of the picture. Facebook only provides estimates of the performance of an ad. These estimates make it possible to analyse the advertising performance of a theme or political party. However, they do not help us answer a perhaps more important question: How are political parties using opaque online tools (such as microtargeting) to reach voters?

• In this research we focused on the text contained in political ads. However, many ads also include an image or video. Some ads only consist of an image or video and do not have any text. These images and videos should also be analysed to improve the theme matching.
• In section 4 we focused on the distribution of themes in ads by political parties.
It would be interesting to look more deeply into the way political parties talk about these themes, for example through sentiment analysis.
• As Edelson et al. [4] show, we can see which seemingly separate advertisers publish closely related (or identical) ads. We might be able to find parties that are collaborating in their ad campaigns.

We would like to sincerely thank Frederik Zuiderveen Borgesius and Tom Dobber. They have provided invaluable guidance, feedback and advice during this research. We would also like to sincerely thank everybody that proofread this paper.

The Netherlands Agendas Project studies the policy outputs of the national government
Comparative Agendas Project Master Codebook. Comparative Agendas Project
A security analysis of the Facebook ad library
Facebook Ad Library
Facebook Ad Library API
Facebook Ad Library Report
About Estimated Audience Size
Facebook Reports Third Quarter
Help your ads find the people who will love your business. Facebook for Business
Political Advertising Online and Offline
Facebook Says Cambridge Analytica Harvested Data of Up to 87 Million Users
Platform ad archives: promises and pitfalls
Automatically Identifying Political Ads on Facebook: Towards Understanding of Manipulation via User Targeting
Are microtargeted campaign messages more negative and diverse? An analysis of Facebook ads in European election campaigns
How Trump Consultants Exploited the Facebook Data of Millions
NOS. 2021. Winnaars VVD en D66 aan zet bij formatie
Parties and Elections in Europe. 2021. Netherlands. Parties and Elections in Europe
Ipsos & Nieuwsuur Kiezersonderzoek 2021. I&O Research
Political Advertisements from Facebook
FBAdLibrarian and Pykognition: open science tools for the collection and emotion detection of images in Facebook political ads with computer vision
A Better Way to Learn About Ads on Facebook
Rutte op weg om langstzittende premier te worden
Political content
Ipsos & Nieuwsuur Kiezersonderzoek 2021
Dit zijn de belangrijkste verkiezingsconclusies
Nog niet eerder waren de linkse partijen zo klein
A dirichlet multinomial mixture model-based approach for short text clustering
Online Political Microtargeting: Promises and Threats for Democracy