title: Facebook Ad Engagement in the Russian Active Measures Campaign of 2016
authors: Silva, Mirela; Giovanini, Luiz; Fernandes, Juliana; Oliveira, Daniela; Silva, Catia S.
date: 2020-12-21

Abstract: This paper examines 3,517 Facebook ads created by Russia's Internet Research Agency (IRA) between June 2015 and August 2017 in its active measures disinformation campaign targeting the 2016 U.S. general election. We aimed to unearth the relationship between ad engagement (as measured by ad clicks) and 41 features related to ads' metadata, sociolinguistic structures, and sentiment. Our analysis was three-fold: (i) understand the relationship between engagement and features via correlation analysis; (ii) find the most relevant feature subsets to predict engagement via feature selection; and (iii) find the semantic topics that best characterize the dataset via topic modeling. We found that ad expenditure, text size, ad lifetime, and sentiment were the top features predicting users' engagement with the ads. Additionally, positive sentiment ads were more engaging than negative ads, and sociolinguistic features (e.g., use of religion-relevant words) were identified as highly important in the makeup of an engaging ad. Linear SVM and Logistic Regression classifiers achieved the highest mean F-scores (93.6% for both models), determining that the optimal feature subset contains 12 and 6 features, respectively. Finally, we corroborate the findings of related works that the IRA specifically targeted Americans on divisive ad topics (e.g., LGBT rights, African American reparations).

Disinformation was pervasively leveraged during the Cold War via the Soviet Active Measures [9, 49], one of the most well-documented uses of disinformation in political warfare against the U.S. and its allies. The accounts of Active Measures defectors [9, 10] shed light on practices that remain largely the same today. There were two goals: discredit the U.S. as imperialist and sow chaos in American society and that of Western allies. To spread disinformation, operators exploited the media's hunger for "scoops," which was fed via anonymous leaks and compromised journalists. A polarized media was highly conducive to the spread of disinformation because the target wants to believe in a message that affirms their preconceived opinions. Even balloons were used to spread disinformation [5], resulting in over 300M pamphlets littering Central Europe. Operators targeted grassroots movements to sow discord by exploiting societal vulnerabilities, such as distributing racist leaflets falsely attributed to the KKK while simultaneously infiltrating antiracist groups [49, 59]. The Cold War Active Measures campaigns bear a disturbing resemblance to what we are witnessing today. We are immersed in an environment of highly polarized, scoop-hungry media, with some outlets spreading demonstrably false information [55]. Our society has now evolved, making room for social media to become the 21st-century version of Cold War balloons spreading disinformation. The 2019 Mueller report [38] revealed that IRA (Internet Research Agency, associated with the Kremlin) employees travelled to the U.S. in 2014 on an intelligence-gathering mission to better understand American culture for use in social media posts. Arif et al. [4] documented the IRA's penetration of the #BlackLivesMatter movement, playing "both sides" in the discourse.
Science continues to be leveraged as an indirect target of disinformation campaigns, inflaming the debate about climate change [13] and the coronavirus pandemic [58]. Notably, during the 2016 U.S. presidential election, as many as 529 different rumors were spread on Twitter [27], and approximately 80,000 social media advertisements [60] were identified by the United States House of Representatives Permanent Select Committee on Intelligence (HPSCI) as disinformation advertisements released by Russian actors with the intent of interfering with the 2016 presidential campaign and sowing division in American society by exploiting issues such as race (Black Lives Matter advocacy), 2nd Amendment rights, and immigration. McFaul [36] gives evidence that Soviet Active Measures never stopped: the U.S. went from a Cold War with the Soviet Union to a Hot Peace with Russia. The key difference between disinformation now and in the last century is that the Internet and social media platforms have amplified disinformation's scope, speed, and detrimental effects. While in the past campaigns were expensive, long, and "manual" (e.g., flyers disseminated from the sky via balloons [9, 49]), spreading disinformation today is arguably cheaper, faster (the click of a button), and executed remotely, complicating attribution. Tackling disinformation is difficult because: (i) spreading it is not illegal in the United States, (ii) solutions cannot infringe freedom of speech, (iii) dissemination speed and scale can render fact-checkers quickly outdated, and (iv) the combination of truth with falsehoods exacerbates human confusion and challenges automatic detection. What makes the disinformation campaign surrounding the 2016 U.S. presidential election remarkable is that it was one of the best-documented Active Measures operations Russia has conducted against the U.S. since the Cold War. Analysis of this campaign is essential for defenses against future campaigns because Russia's Active Measures will not stop. In fact, the Senate Select Committee on Intelligence report on Active Measures on social media [52] highlights that IRA activity on social media did not cease, but rather increased after Election Day 2016, as if the results emboldened the Russian government [23, 31]. Moreover, reports have shown that foreign states such as Russia, China, and Iran targeted the Donald Trump and Joe Biden 2020 election campaigns in the U.S., using similar techniques as those employed by the IRA in 2016 [6].

In this paper, we analyze a dataset [60] made available by the U.S. House of Representatives Permanent Select Committee on Intelligence containing 3,517 Facebook ads created by the Russian Internet Research Agency (IRA) from June 2015 to August 2017. We hypothesize that the number of clicks reflects an ad's pertinence and users' engagement; therefore, we opted to use ad clicks as our engagement metric. In predicting ad engagement (measured by ad clicks), we identified four broad categories of features with potential predictive value: (1) the ad's metadata (e.g., lifetime, expenditure); (2) the size of the ad's text (character and word count); (3) the ad text's sociolinguistic features (e.g., authenticity and emotional tone); and (4) the ad text's subjectivity (e.g., objective vs. subjective) and sentiment (e.g., positive vs. negative). We therefore aimed to investigate:
• RQ1: Is there a relationship between the ads' features and engagement?
• RQ2: What feature set makes a disinformation ad successful?
• RQ3: Given a set of the most discriminative features, how accurately can one predict engagement?
• RQ4: Which semantic topics best characterize the Facebook IRA ad dataset?

To quantify our empirical investigation, we implemented several correlation analysis methods, as well as machine learning analysis for topic modeling and feature selection. This paper confirms and builds on major findings from similar works, such as [38, 45, 47]. We confirm that communities (e.g., African Americans, Republicans, LGBT) were specifically targeted by the IRA to sow dissent within American society, and several communities experienced an increase in engagement with the Russian ads in our dataset during key moments of the 2016 presidential election (e.g., during President Trump's office takeover). However, in contrast to DiResta et al. [47], who performed a qualitative analysis of the ads, we approach the ads through a quantitative methodology, combining statistics and multi-methods machine learning focused on engagement. Many aspects of our results corroborate prior works [3, 45, 47], but we go further to show that high engagement ads were more positive in terms of sentiment, more informal and personal, and shorter in text size than standard engagement ads. Finally, we find that ad expenditure was ranked as the most important feature for predicting high engagement by six machine learning models, and that sociolinguistic features of the ad (e.g., the presence of words associated with religion) made up the top 5 features for predicting high engagement for the majority of the learning models.

This paper is organized as follows. Section 2 reviews prior works analyzing the Russian Active Measures disinformation campaign related to the U.S. presidential election of 2016. Section 3 describes the methodology of our analysis. Section 4 presents our correlation analyses and machine learning results. Section 5 discusses our study's findings, limitations, and future work directions. Section 6 concludes the paper.

In this section, we focus on prior work intersecting the topic of the present paper, in particular prior analyses of Russia's great active measures campaign of 2016 and disinformation spreading. Investigations and reports on Russian efforts to influence the 2016 U.S. elections emerged as early as mid-2016 via the FBI Crossfire Hurricane investigation and after Congress members had access to classified intelligence [37]. After the election, in early 2017, the Office of the Director of National Intelligence released an assessment of the Russian influence and disinformation campaign [39], for the first time acknowledging its similarities to the Soviet Active Measures campaigns that targeted the U.S. during the Cold War [9, 10, 43, 49]. The report highlighted a perceived change in Russian intelligence efforts, which since the Cold War had been primarily focused on foreign intelligence collection. For decades, Russian and Soviet intelligence services have sought to collect insider information to provide the Kremlin with a better understanding of U.S. priorities and foreign policy. However, the Intelligence Community had uncovered that Vladimir Putin had ordered an influence campaign using social media to hurt Clinton's electoral chances and undermine public faith in the U.S. democratic process. Next, Congress sought the aid of experts and social media companies in facilitating its public hearings and investigations.
Following the firing of FBI Director James Comey, a Special Counsel was appointed, representing another line of investigation into the Russian active measures campaign. In September 2017, the media started reporting [56] that the Mueller probe was focused on the use of social media as the main tool for the active measures campaign. This prompted social media companies to conduct internal audits, which led to a dataset of tweets, Facebook ads and posts, and YouTube videos being released to the House Permanent Select Committee on Intelligence. The Senate Select Committee on Intelligence undertook a study of these events and sought the input of two main Technical Advisory Groups (TAG) to analyze the dataset provided to the Committee by the social media companies [25, 45, 47]. Both groups analyzed thousands of ads, pages, tweets, and posts that social media companies independently identified through audits pertaining to the Internet Research Agency's (IRA) active measures campaigns; the analyses focused on qualitative and quantitative aspects of the dataset. Both reports, released to the public in late 2018, reached similar conclusions, corroborated in early 2019 by the Mueller report [38]. The IRA, supported by the Kremlin, conducted a major active measures campaign in the years preceding the 2016 presidential election campaign, with its social media stimuli reaching millions of American citizens. It sought two main goals: (1) influence the 2016 U.S. presidential election by harming Hillary R. Clinton's chances of success while supporting then-candidate Donald J. Trump, and (2) sow discord in American politics and society, especially on race issues, by heavily targeting the African American population while playing both sides of the political discourse (also corroborated by independent work from Arif et al. [4]). The group led by John Kelly [45] also stressed the key role played by Twitter bots in amplifying propaganda, in agreement with prior research by Bessi and Ferrara [8, 17].

The intelligence reports and independent researchers also analyzed the IRA Facebook paid advertisements (ads) from qualitative and quantitative perspectives. In particular, the House Permanent Select Committee on Intelligence released 3,517 Facebook ads associated with the IRA in 2018. Although the ads were not the bulk of the IRA's activity in social media, the use of advertising was consistent with the IRA's modus operandi [52]: divisive subjects related to race, police brutality, Second Amendment rights, patriotism, LGBT rights, and immigration [29]. In a U.S. census-representative survey, Ribeiro et al. [48] found that people from different socially salient groups react differently to the content of the IRA's Facebook ads, further positing that Facebook's ad API facilitated this divisive targeting. Indeed, Facebook estimates that 11.4M Americans saw at least one of the ads ultimately determined to have been purchased by the IRA [52]. The work closest to ours is by Alvarez et al. [3], who performed sentiment analysis on this same Facebook ads dataset to correlate positive and negative emotions with engagement and discover how the valence of emotions changed over time. Their analyses found that negative sentiment was more prevalent before the election and positive sentiment after it. Through the use of the versatile Maximal Correlation analysis, we confirm that positive sentiment ads were correlated with high engagement.
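To make the notion of maximal correlation concrete: unlike Pearson's coefficient, it captures arbitrary (not only linear) dependence between two variables. The sketch below is an illustration only, using the classic alternating-conditional-expectations scheme on quantile-binned variables; it is not the ensemble estimator relied on in this paper, and the column names in the usage comment are placeholders.

```python
import numpy as np
import pandas as pd

def maximal_correlation(x, y, n_bins=10, n_iter=100):
    """Rough HGR maximal-correlation estimate: alternate conditional
    expectations between quantile-binned copies of x and y."""
    xb = pd.qcut(pd.Series(x), n_bins, labels=False, duplicates="drop")
    yb = pd.qcut(pd.Series(y), n_bins, labels=False, duplicates="drop")
    g = (yb - yb.mean()) / yb.std()                # initial transform of y
    for _ in range(n_iter):
        f = g.groupby(xb).transform("mean")        # f(x) = E[g(y) | x]
        f = (f - f.mean()) / f.std()
        g = f.groupby(yb).transform("mean")        # g(y) = E[f(x) | y]
        g = (g - g.mean()) / g.std()
    return float(np.corrcoef(f, g)[0, 1])

# e.g., maximal_correlation(ads["ad_spend"], ads["ad_clicks"]) on hypothetical columns
```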
While previous works have focused on the generation, measurement, and content of propaganda, the goal of this research was to assess in depth the effectiveness (i.e., engagement) of such tactics. Thus, this paper expands these prior works by focusing only on the Facebook ads to find correlations between user engagement and 41 features (e.g., sentiment, sociolinguistic features of the ad's text). We also compared six machine learning models for feature selection to further analyze which ad features were most important for engagement. We further leveraged Latent Dirichlet Allocation (LDA) to detect, in an unsupervised fashion, eight major topics/groups (e.g., justice and African American, LGBT rights) weaponized in the ads; this confirms Howard et al.'s [45] analysis, wherein 20 clusters of audiences/groups (e.g., African American politics and culture, black identity and nationalism, LGBT rights and social liberalism) were identified using modularity to find community structures in networks.

A deep understanding of engagement is imperative to effectively measure disinformation, yet some current researchers argue that measuring disinformation is likely impossible. For example, the TAG group led by DiResta et al. [47] argued that determining whether the IRA's disinformation campaigns indeed affected the 2016 presidential election is impossible. Rid [49] similarly argued that it is unlikely that the Russian trolls convinced a significant number of American voters to change their minds because the volume of IRA activity was lower than reported: only 8.4% of IRA activity was election-related [46], and the discourse happened in echo chambers where people already had their minds set. However, former Soviet disinformation defectors such as Ladislav Bittman beg to differ [9, 10]: disinformation can indeed be measured. In his account of Soviet disinformation tactics, Bittman discussed the two ways by which the KGB measured the success of disinformation campaigns. The first was the attention (i.e., engagement) that the message drew outside the Soviet bloc, e.g., the amount of public discussion generated by the message and the tone of the political discourse on the issue. In the 21st century, this metric is what online platforms call engagement: a function of the number of article/post views, likes, retweets, shares, mentions, etc. Bittman stressed the cult of the published word: the number of words used by the mass media of the enemy or victim is more important than a careful evaluation of the operation's results. Less attention is paid to whether the words had the desired effect. The second metric was determining whether the message forced the target country to make any political changes that could directly or indirectly benefit the Soviet Union. In the 21st century, the election of President Donald J. Trump could be a political change that benefited Russia's political interests, as the U.S. intelligence community confirmed [38]. According to Bittman, the Soviet Union knew that it was unlikely that a single disinformation campaign would tip the balance of power. However, disinformation operatives like himself believed that the mass production of propaganda and disinformation over several decades would have a significant effect.
The same rationale applies today: one tweet or Facebook post may not tip the balance; however, several months of posts on a disinformation narrative (e.g., questioning the integrity of a presidential election) might cause irreparable harm to a democracy. Our paper and analyses provide in-depth insights on engagement as a key metric of disinformation impact.

This section describes the dataset used in our analyses along with the steps taken for data cleaning and feature extraction. We leveraged a dataset of 3,517 Facebook ads created by the Russian Internet Research Agency (IRA) and made publicly available to the U.S. House of Representatives Permanent Select Committee on Intelligence [60] by Facebook after internal audits. Estimated to have been exposed to over 126M Americans between June 2015 and August 2017, these ads were a small representative sample of the over 80,000 pieces of organic content identified by the Committee. Of the 3,517 ads, 3,290 contained text; the remaining 227 ads were purged from the dataset, as we were interested in performing sentiment analysis and topic modeling based on the ads' text. Next, we discarded four ads that did not contain a numerical value for the number of ad clicks (our criterion for measuring engagement). Therefore, our final dataset contained 3,286 ads.

For each of the 3,286 ads, we extracted a total of 41 features (see Table 1) that can be summarized into four main categories: (i) ad metadata features, extracted from the metadata already contained in the dataset (e.g., # of ad clicks and impressions); (ii) text size features, related to the size of the text itself (e.g., word count); (iii) sentiment & subjectivity features, describing both valence (positive vs. negative) and salience (low to high arousal) of sentiment in the ad's text; and (iv) sociolinguistic features, related to emotions, mood, and cognition present in the ad's text based on word counts (e.g., the words "crying," "grief," and "sad" are counted as expressing sadness).

Each ad was composed of 2 pages, where the first page contained ad metadata (e.g., the textual content of the ad, the link to the ad) and the second page contained a screenshot of the ad as seen by Facebook users (see Fig. 4 for examples). We used the PyPDF2 Python library [44] to automatically extract the following metadata features from each ad:
• Ad Impressions: the number of users who viewed the ad.
• Ad Clicks: the number of users who clicked on the ad.
• Ad Spend: the amount of money (in RUB) spent on the ad.
• Ad Lifetime: the time between the ad's creation and end dates (in hours).

Engagement includes all actions that users take in reaction to an advertisement, such as viewing, clicking, liking, commenting, and sharing. Because the metadata made available for the dataset only captures two of the aforementioned actions (ad impressions and clicks), we opted to use the feature Ad Clicks as our metric for ad engagement. Ad Clicks is a good measure because it indicates how many users actually engaged with the advertisement, i.e., took action by clicking on the ad after exposure. We opted to disregard Ad Impressions in our analyses as it was highly correlated with Ad Clicks (see Sec. 4.1 and 5.2). We summarized the length of the ad's text using a total of two features: character count and word count. We leveraged three sentiment analysis packages:
• VADER [20]: a rule-based NLP library; outputs a uni-dimensional and normalized compound score that ranges from −1.0 (negative) to 1.0 (positive), where scores between −0.05 and 0.05 are considered neutral sentiment.
• TextBlob [34]: a rule-based NLP library; outputs a polarity (sentiment) score that ranges from −1.0 (negative) to 1.0 (positive), as well as a subjectivity score ranging from 0.0 (objective) to 1.0 (subjective).
• Flair [1]: an embedding-based framework built on PyTorch; Flair's pre-trained sentiment model outputs labels of either POSITIVE or NEGATIVE sentiment.

We validated these sentiment analysis packages using the average F-score as the performance metric on a dataset containing 50K highly polarized movie reviews from IMDB [35]. In this dataset, 25K reviews were labeled positive and 25K negative; Flair greatly outperformed both VADER and TextBlob (89.5% vs. 69.0% vs. 66.5%, respectively). Nonetheless, we opted to use all 14 sentiment and subjectivity features in our analyses as listed in Table 1. To extract sociolinguistic features, we leveraged LIWC2015 [42], a text analysis tool that reflects a text's emotions, thinking styles, social concerns, and grammar (e.g., parts of speech) based on word counts. A total of 21 LIWC features were extracted:
• Four summary variables: analytical thinking (formal, logical, and hierarchical thinking vs. informal, personal, here-and-now, and narrative thinking), clout (expertise and confidence vs. tentative, humble, or anxious), authenticity (honest, personal, and disclosing text vs. guarded or distanced), and emotional tone (positive, upbeat style vs. anxiety, sadness, or hostility; values around 50 suggesting neutrality or ambivalence), each measured on a 100-point scale.
• Seventeen other LIWC categories, most of which are related to psychological processes. Each of these features was measured as a percentage of words (e.g., an "affective process" score of 10 means that 10% of all words in the ad's text were related to emotions, such as "happy" and "cried"). See Table 2 for more detailed examples of these 17 features: affective processes (e.g., positive emotions, anxiety, anger), social processes (e.g., family, friends), cognitive processes (e.g., insight, certainty, discrepancy), perceptual processes (e.g., see, hear, feel), biological processes (e.g., body, health, sexual), drives (e.g., affiliation, power, reward), time orientations (past/present/future focus), relativity (e.g., motion, space, time), personal concerns (e.g., work, leisure activities), and punctuation (e.g., periods, commas).

This section describes the diverse set of analyses conducted to answer our research questions, along with our results identifying several features that prompt users' engagement with disinformation. Specifically, we performed the following analyses:
• A correlation analysis, described in Sec. 4.1, to answer RQ1 ("Is there a relationship between the ads' features and engagement?").
• A feature selection analysis, detailed in Sec. 4.2, to answer RQ2 ("What feature set makes a disinformation ad successful?") and RQ3 ("Given a set of the most discriminative features, how accurately can one predict engagement?").
• A topic modeling analysis, discussed in Sec. 4.3, to answer RQ4 ("Which semantic topics best characterize the Facebook IRA ad dataset?").

We opted to separate the dataset into a standard group and an outlier group to better understand how low vs. high engagement varies as a function of ad features. Using the 1.5×IQR rule (i.e., values above Q3 + 1.5 × IQR), we identified 432 upper outliers based on Ad Clicks.
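As a concrete illustration of this split, the following sketch applies the same upper-fence rule with pandas; the DataFrame and column names are illustrative placeholders, not the released dataset's actual schema.

```python
import pandas as pd

# ads is assumed to be a DataFrame with one row per ad and a numeric
# "ad_clicks" column (illustrative name, not the released dataset's schema).
q1, q3 = ads["ad_clicks"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)                 # 1.5 x IQR rule, upper fence only

high = ads[ads["ad_clicks"] >= upper_fence]        # High Engagement group
standard = ads[ads["ad_clicks"] < upper_fence]     # Standard Engagement group
print(len(standard), len(high))                    # 2,854 vs. 432 in the paper's split
```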
Therefore, ads with < 2,188 clicks (n = 2,854, 86.9%) were assigned to the Standard Engagement group (subscript stand) and those with ≥ 2,188 clicks (n = 432, 13.1%) were assigned to the High Engagement group (subscript high). Before we could perform statistical analyses on the extracted features, we used the Shapiro-Wilk test to check for normality and found that none of the continuous metadata features (Ad Clicks, Impressions, Spend, and Lifetime) was normally distributed (p < .001 for all variables) based on a 1% significance level. Prior to calculating Pearson's and Spearman's rank correlation coefficients, we normalized the continuous metadata features using the Yeo-Johnson power transformation (as it allows for zero and negative values) because these features exhibited a heavy positive skew (i.e., the Fisher-Pearson coefficient of skewness was >> 1). This removed the skewness from the ad metadata features. We used Pearson's (r) and Spearman's rank correlation (ρ) tests to find the linear and monotonic correlations, respectively, between Ad Clicks (our engagement metric) and all other extracted features (see Table 3). As displayed in Table 3, we found moderate to strong positive correlations between Ad Clicks and the ad metadata features. We also noticed that Ad Impressions was extremely similar in distribution to Ad Clicks, which is intuitive as views and clicks are both metrics of social media engagement [2]. We therefore opted to discard the Ad Impressions feature from our analysis. Strong or moderate correlations did not hold true for the remaining feature categories. Additionally, sentiment and subjectivity variables were further analyzed using the Chi-Squared test (see Table 4). Features were transformed into categorical variables using the thresholds described in Sec. 3.

(RQ1) Ad expenditure, followed by ad lifetime and text size, showed the highest Maximal Correlations for both the Standard and High Engagement groups. High Engagement ads were more positive in terms of sentiment, more informal and personal, and shorter in size than Standard Engagement ads.

Our correlation and statistical analyses used to address RQ1 relied on the individual relevance of each feature in characterizing engagement. However, individual features sometimes fail to predict the target variable accurately. Machine learning models can combine multiple features to predict the target, sometimes revealing promising features that do not have relevant pairwise correlation results. In this section, we present and discuss our investigative steps and results to address RQ2 and RQ3, where we aimed to determine the set of features (from Table 1) that best predict engagement via machine learning analysis. There are many feature selection techniques available in the literature [57]. In this work, we use Recursive Feature Elimination (RFE) to determine, from our set of collected features (Table 1), which subset should be retained to best predict ad engagement. RFE employs a multi-class classifier as an estimator to rank the relevance of the existing features by assigning each a weight coefficient, and selects the optimal feature subset (i.e., the subset with the best prediction results) in a supervised fashion [57].
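A minimal sketch of this selection step, assuming a feature matrix X (one row per ad, 41 columns) and binary engagement labels y, using scikit-learn's RFECV; the variable names and the max_iter setting are illustrative, not the authors' exact configuration.

```python
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

# X: (n_ads, 41) feature matrix; y: 0 = Standard, 1 = High Engagement (placeholders).
X_std = StandardScaler().fit_transform(X)          # zero mean, unit variance

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),   # one of the six estimators compared
    step=1,                                        # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1",                                  # F-score, suited to the class imbalance
)
selector.fit(X_std, y)
print("optimal number of features:", selector.n_features_)
print("selected features (boolean mask):", selector.support_)
```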
As RFE weighs the features based on their importance in predicting engagement, it can become sensitive to the type of model used. For this reason, we performed this analysis with six different estimators and then compared their results, checking for commonalities among the selected feature subsets. Six popular classifiers were used, in particular: Adaboost, Bernoulli Naive Bayes (NB), Gradient Boosting, Support Vector Machine with a linear kernel (Linear SVM), Logistic Regression, and Random Forest (see Tables 5 and 6). We used their implementations available in the scikit-learn library [41] (a popular machine learning library for Python) with default parameters in all cases. RFE was performed using stratified 5-fold cross-validation due to its relatively low bias and variance [22]. The evaluation metric used to rank the features and select the optimal feature subset was the F-score (the harmonic mean of precision, a measure of exactness, and recall, a measure of completeness [22]), which is well suited to handle imbalanced datasets as in our case (2,854 Standard vs. 432 High Engagement ads) [22]. Importantly, prior to this analysis, all features were first standardized by removing the mean and scaling to unit variance. Our results are summarized in Table 5; for each classifier, we present the feature subset associated with the highest average F-score.

Topic modeling was performed on the ads' text using Latent Dirichlet Allocation (LDA), an unsupervised probabilistic generative model. Simple textual preprocessing was done to make the text more amenable to analysis, including the removal of punctuation and stop words, and the lowercasing of all words. In order to transform the textual data into a format that serves as input for the LDA model, we converted the texts into a simple vector representation using bag of words (BoW). Then, we converted the list of ad texts into lists of vectors, all with length equal to the vocabulary size. Words were then lemmatized, keeping only nouns, adjectives, verbs, and adverbs. We validated the LDA's topic modeling performance using topic coherence, as described in [50] and made readily available in the gensim.models module for Python. Using the coherence measure computed as the average similarity between the top word context vectors and their centroid, we found the set of parameters with the maximum coherence value of 0.58 for the entire dataset (parameter values of 0.01 and 0.91), yielding a total of 8 topics. Using these parameters to train the LDA model, we then reduced the number of repeated keywords across different topics, i.e., each topic should describe a unique idea. The groups of people targeted by each advertisement were provided among the metadata fields in our original dataset. Using this information and the keywords associated with each topic, we then inspected the cleaned LDA topic results and proposed topic labels; for example, keywords such as conservatism, republican, tea party, confederate, Fox News, Trump, Pence, and conservative were assigned to the "conservative or Republican" category. Therefore, we proposed the following eight overarching topic categories: (1) American patriotism, (2) justice/African-American, (3) perseverance/liberal/democrat, (4) female rights/education, (5) peace/guns, (6) police/military, (7) community integration/LGBT, and (8) capitalism/conservative/republican. A summary of these results, along with example keywords, can be found in Table 7.
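A minimal sketch of this pipeline with gensim, assuming docs is the list of preprocessed, lemmatized token lists; the c_v coherence is used here as one common instantiation of the coherence measures surveyed in [50], and all variable names are illustrative.

```python
from gensim import corpora
from gensim.models import CoherenceModel, LdaModel

# docs: list of token lists, i.e., the ads' texts after lowercasing,
# punctuation/stop-word removal, and lemmatization (illustrative variable name).
dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in docs]   # bag-of-words vectors

lda = LdaModel(corpus=bow_corpus, id2word=dictionary, num_topics=8,
               random_state=0, passes=10)

coherence = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                           coherence="c_v").get_coherence()
print("topic coherence:", coherence)
print(lda.print_topics(num_words=8))                     # top keywords per topic
```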
Our results, as analyzed and validated using machine learning algorithms, are in agreement with the qualitative analyses presented in prior works [45, 47]. Figure 2 shows the occurrence of each summary topic derived by the LDA topic modeling from June 2015 to August 2017. We see that Topic 7 (community integration/LGBT) has the largest ad count in the period preceding the election (May 2016). Interestingly, Topic 3 (perseverance/liberal/democrat) closely mirrors Topic 8 (capitalism/conservative/republican). We also observe several interesting occurrences when considering the median number of ad clicks for each summary topic during this same time period (Fig. 3). Topic 3 (perseverance/liberal/democrat) stands out in engagement before the election (February-July 2016) and shows significant impact during the office takeover. Topic 5 (peace/guns) shows some significant engagement in the months preceding the election and some impact during the office takeover. Topic 1 (American patriotism) and Topic 7 (community integration/LGBT) appear to follow each other throughout this timeline. Topic 2 (justice/African-American) experiences relatively low median engagement numbers, with the exception of a spike during the office takeover period. In January 2017, there was a surprisingly large spike in engagement in both figures.

(RQ4) We confirm prior works by DiResta et al. [47] and Howard et al. [45], wherein the IRA purposefully targeted racial, ethnic, and political communities within the U.S. to further polarize political discourse. During key moments of the 2016 presidential election (e.g., during President Trump's office takeover, circa February-May 2017), several communities (e.g., African Americans, LGBT) experienced a surge in engagement with the IRA disinformation ads in our dataset.

We set out to investigate several research questions pertaining to engagement in a dataset of Facebook ads created by the IRA during Russia's latest active measures campaign, perpetrated before and after the 2016 U.S. presidential election with the goal of influencing the election results and sowing discord among American citizens over divisive societal issues. To do so, we leveraged descriptive statistical and machine learning analyses to explore a total of 41 features; other engagement metrics (e.g., likes, shares) were not available in the dataset curated by Facebook. This section analyzes our findings and the limitations of our work.

Ad Expenditure and Lifetime. We hypothesize an intuitive explanation: paying more for an ad might be associated with a better targeting service from social media platforms, potentially causing the ad to reach more people who will be more interested in the ad (and views are a direct precursor to Ad Clicks). We also hypothesize that engagement potentially occurs as soon as users view an ad, and extending the ad's lifetime will likely not alter how users perceive and interact with the ad.

Size. Another important feature was the length of the ad's text. Facebook truncates posts greater than 477 characters [19]. High Engagement ads had, on average, nearly 110 fewer characters (and nearly 20 fewer words) than the Standard Engagement group; therefore, we hypothesize that shorter ads are more engaging. Research on deception detection shows that deceivers embed influence cues in their content to cloud people's decision making [28]. In fact, accounts of Cold War disinformation point to the use of pictures, short texts, sexual appeal (if applicable), sensationalism, and high-arousal emotion in disinformation stimuli [49].
However, we only considered the textual content of each ad and disregarded the presence of images. It is possible that ads with shorter texts used emotionally visceral images (examples in Fig. 4) to communicate a message, likely increasing users' engagement with the advertisement.

We found that sentiment features were highly important for predicting engagement, with High Engagement ads more positive in sentiment than Standard Engagement ads. Corroborating this finding, there is indeed a wealth of cognitive and behavioral sciences research that points to the impact of affective states (i.e., emotions) on decision making [18, 26], where positive emotions have been shown to be more detrimental to rational decision-making than negative emotions. Positive affect states have been shown to cause an increase in trust and a decrease in social vigilance [28, 30]; therefore, a user's good mood signals a safe environment [28] and can thus increase one's susceptibility to deception. Several works have also shown that high emotional arousal is leveraged by con artists to persuade victims to comply with their requests [30, 33] by focusing the victims' attention onto reward cues [32].

The LIWC sociolinguistic features are separated into two broad categories: summary variables (Analytic, Authentic, Clout, and Tone) and other LIWC categories (e.g., cognitive processes). For the summary variables, we found that High Engagement ads were more personal and informal than Standard Engagement ads. Evans and Krueger [15] and Cialdini's principles of persuasion [11] offer plausible explanations for this: people who are perceived as familiar or similar (e.g., same culture) are more likely to be trusted by others (a phenomenon termed the in-group trust disposition) and are more likely to have their requests obeyed. Therefore, ads whose authors masqueraded as part of the targeted community may have achieved higher engagement levels. Overall, LIWC features dominated four of the six feature selection models. From this, we see that the content of the advertisement itself, along with the use (or lack thereof) of certain topics (e.g., religion), influences engagement. Furthermore, prior works (e.g., [45, 54]) have shown that the type of user account (e.g., bot vs. real human Twitter accounts) impacts user engagement with a message. In this paper, we found that LIWC features such as Authentic (which measures how authentic a writer appears to be) suggest that how authentic the ads' authors appeared may have impacted users' engagement (e.g., an apparently African American user posting about #BlackLivesMatter). The IRA has been shown to groom real users [51] into writing its disinformation articles; as such, future works should analyze the accounts of users responsible for spreading the disinformation in our dataset. Doing so may increase our awareness of the IRA's modus operandi.

We now discuss this paper's limitations with an eye towards potential future works. Our data collection extracted the ad texts but discarded any images associated with the ads, overlooking the presence of emotionally charged visual stimuli used in combination with the text or as a standalone malicious ad product (see Fig. 4 for examples). To mitigate this data loss, future works can leverage deep learning architectures such as neural networks for image captioning to characterize the content of an image [24] and pair it with the ad text and the dataset's features.
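As a hypothetical sketch of this direction (the pipeline task, model name, and file path are illustrative choices, not part of our study), an off-the-shelf captioning model could be applied to each ad screenshot and the resulting caption paired with the ad's text-based features:

```python
from transformers import pipeline

# Illustrative only: caption an ad image so the description can be paired
# with the ad's text-based features. Model choice and file path are placeholders.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
result = captioner("ira_ad_screenshot.png")
print(result[0]["generated_text"])   # e.g., a one-sentence description of the image
```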
Similarly, if an ad only contains a video, future works can make use of video summarization and image captioning with attention-based mechanisms [16] to leverage all the available information. In fact, this treatment of media files has its own standalone merit and is well suited to be integrated within social media platforms. In our work, we analyzed a collection of features readily available and extractable from the dataset (such as ad metadata and text size), but we also extracted several other sentiment and subjectivity features using the pre-trained NLTK VADER, TextBlob, and Flair models, and sociolinguistic features using the LIWC text analysis tool. Our analysis includes a full characterization of the relationship between these features and the dependent variable (ad clicks, i.e., engagement), covering both linear and non-linear relationships. However, due to the inherent sparsity and noisiness of natural language processing, the extracted features quickly become collinear, which can impact subsequent feature selection techniques. In the future, we plan to use techniques such as the Gram-Schmidt Transform [61] to guarantee orthogonality of the feature space.

We also corroborated prior works by [45, 47] showing that the IRA purposefully targeted communities to polarize political discourse in the U.S. Based on this, the unsupervised LDA performed surprisingly well considering the relatively small dataset. LDA is a powerful tool for topic modeling, though it suffers from major drawbacks similar to many unsupervised models, including: (1) stasis, that is, LDA finds the set of topics for the entire dataset without the ability to track them over time; (2) the number of topics needs to be defined a priori (in this work, we applied measures of consistency and reproducibility to determine the best number of topics); (3) LDA measures keyword contribution based on a Bag of Words (BoW) model, which assumes words are exchangeable, so sentence structure (semantics) is not modeled; (4) non-hierarchical modeling, where keywords are shared between topics; and, finally, (5) the LDA topic distribution does not capture correlations between topics, as implied by drawback (4).

Another limitation of this work refers to our sociolinguistic analysis of the ads' text. Out of 85 categories available in the LIWC tool, we restricted our analysis to the 17 main categories to reduce sparsity in our feature space, given our relatively small set of ads. Because these features exhibited promising results in predicting engagement, the examination of more LIWC categories using larger sets of ads is a potentially fruitful direction for future work. Moreover, in our topic modeling analysis, we used BoW to convert the ads' text into a vector representation to serve as an input for the LDA model. This may also be a limitation of our study since BoW disregards certain properties of the text such as grammar, semantic meaning, and word ordering. The use of other word vectorization techniques able to capture semantic meaning and other relevant properties, such as Word2Vec and GloVe, is therefore another research direction.

Understanding what makes for high engagement in propaganda and disinformation ads paves the way for countermeasures in several respects. First, future research can evolve social media labels and potentially expose deceptive cues in posts from suspicious or biased accounts to better inform users.
This is particularly important when we consider that Russian Active Measures did not stop after the campaign considered in this work and in fact intensified after the election [52]. For example, in 2018, the Washington Post reported that Russian trolls inflamed the U.S. debate over climate change [13]. In June 2020, the Associated Press reported that U.S. officials confirmed that Russia was behind the spreading of disinformation about the coronavirus pandemic [58]. Disinformation campaigns have also been generated by nation states' own domestic figures [21], as we witnessed in the aftermath of the 2020 U.S. presidential election. The success of such campaigns has even prompted the business of disinformation-as-a-service, to which key stakeholders, including disinformation researchers, should pay closer attention [53].

This paper focused on a statistical and multi-methods machine learning investigation of features that predict engagement in a dataset of 3,517 Facebook ads created by the Internet Research Agency (IRA) between June 2015 and August 2017, which serves as ground-truth disinformation. These ads, made publicly available by the United States House of Representatives Permanent Select Committee on Intelligence, were part of a Russian Active Measures disinformation campaign that sought to influence the U.S. Presidential Election of 2016 and sow division in American society, especially on racial issues. We extracted a total of 41 features from this dataset and, using correlation analysis, feature selection, and topic modeling, found that: (1) ad expenditure, text size, ad lifetime, and sentiment were recurring important features chosen by six different machine learning models in the makeup of a successful disinformation ad; (2) positive sentiment ads were more engaging than negative ads; (3) sociolinguistic features (e.g., use of religion-related words) were highly important in predicting engagement; and (4) confirming prior works, the IRA targeted several communities and sociopolitical topics (e.g., reproductive rights, Second Amendment rights) during the 2016 U.S. presidential election cycle. We offer suggestions for future works that may shed light on important aspects regarding the prediction of engagement with disinformation, which we hope can foster the next generation of countermeasures and in-depth analyses.

References
[1] Contextual String Embeddings for Sequence Labeling.
[2] View, Like, Comment, Post: Analyzing User Engagement by Topic at 4 Levels across 5 Social Media Platforms for 53 News Organizations.
[3] Good News, Bad News: A Sentiment Analysis of the 2016 Election Russian Facebook Ads.
[4] Acting the Part: Examining Information Operations Within #BlackLivesMatter Discourse.
[5] Addressing Russian Influence: What Can We Learn From U
[6] BBC News. 2020. Russia, China and Iran hackers target Trump and Biden, Microsoft says. BBC.
[7] Propaganda. New York: Horace Liveright. 150-155 pages.
[8] Social bots distort the 2016 U.S. Presidential election online discussion.
[9] The Deception Game: Czechoslovak Intelligence in Soviet Political Warfare.
[10] The KGB and Soviet disinformation: an insider's view. Washington: Pergamon-Brassey's.
[11] Influence: The Psychology of Persuasion, Revised Edition.
[12] Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach.
[13] These provocative images show Russian trolls sought to inflame debate over climate change, fracking and Dakota pipeline.
[14] Ensemble Correlation Coefficient.
[15] The Psychology (and Economics) of Trust.
[16] Summarizing Videos with Attention.
[17] The Rise of Social Bots.
[18] Affective influences on judgments and behavior in organizations: An information processing perspective.
[19] Stop mindlessly following character count recommendations on Facebook posts.
[20] VADER: A parsimonious rule-based model for sentiment analysis of social media text.
[21] From COVID-19 to voting: Trump is nation's single largest spreader of disinformation, studies say. USA Today.
[22] Data mining: concepts and techniques.
[23] Disinformation, "fake news" and influence campaigns on Twitter.
[24] A Comprehensive Survey of Deep Learning for Image Captioning.
[25] Social Media, News and Political Information during the US Election: Was Polarizing Content Concentrated in Swing States?
[26] Positive affect as a factor in organizational behavior.
[27] Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter.
[28] Thinking, Fast and Slow. Farrar, Straus and Giroux.
[29] Uncover: Strategies and Tactics of Russian Interference in US Elections.
[30] Emotional arousal may increase susceptibility to fraud in older and younger adults.
[31] Seven ways misinformation spread during the 2016 election.
[32] Consumer vulnerability to scams, swindles, and fraud: A new theory of visceral influences on persuasion.
[33] Out of Control: Visceral Influences on Behavior.
[34] TextBlob: simplified text processing.
[35] Learning Word Vectors for Sentiment Analysis.
[36] From Cold War to Hot Peace: An American Ambassador in Putin's Russia.
[37] Key lawmakers accuse Russia of campaign to disrupt U.S. election. The Washington Post.
[38] Report on the Investigation into Russian Interference in the 2016 Presidential Election.
[39] Background to "Assessing Russian Activities and Intentions in Recent US Elections".
[40] True or false: a CIA analyst's guide to spotting fake news.
[41] Scikit-learn: Machine learning in Python.
[42] The development and psychometric properties of LIWC2015.
[43] Soviet Active Measures Reborn For The 21st Century: What Is To Be Done.
[44] PyPDF2.
[45] Social Media and Political Polarization in the United States.
[46] Update on Twitter's review of the 2016 US election.
[47] The Tactics & Tropes of the Internet Research Agency.
[48] On Microtargeting Socially Divisive Ads: A Case Study of Russia-Linked Ad Campaigns on Facebook.
[49] Active Measures: The Secret History of Disinformation and Political Warfare.
[50] Exploring the Space of Topic Coherence Measures.
[51] Russia 'launders' disinformation by using fake personas. 2020.
[52] Russian Active Measures Campaigns and Interference in the 2016 U.S. Election.
[53] Outsourcing Disinformation.
[54] Predicting Misinformation and Engagement in COVID-19 Twitter Discourse in the First Months of the Outbreak.
[55] Lies, Damn Lies, and Viral Content.
[56] Mueller Probe Has 'Red-Hot' Focus on Social Media, Officials Say. Bloomberg News.
[57] Pattern recognition.
[58] US officials: Russia behind spread of virus disinformation.
[59] Active Measures: A Report on the Substance and Process of Anti-U.S. Disinformation and Propaganda Campaigns.
[60] House of Representatives Permanent Select Committee on Intelligence Social Media Advertisements.
[61] Unsupervised feature selection through Gram-Schmidt orthogonalization - A word co-occurrence perspective.