title: What is the Will of the People? Moderation Preferences for Misinformation
authors: Atreja, Shubham; Hemphill, Libby; Resnick, Paul
date: 2022-02-01

To reduce the spread of misinformation, social media platforms may take enforcement actions against offending content, such as adding informational warning labels, reducing distribution, or removing content entirely. However, both their actions and their inactions have been controversial and plagued by allegations of partisan bias. The controversy can in part be explained by a lack of clarity around what actions should be taken, as they may not neatly reduce to questions of factual accuracy. When decisions are contested, the legitimacy of decision-making processes becomes crucial to public acceptance. Platforms have tried to legitimize their decisions by following well-defined procedures through rules and codebooks. In this paper, we consider an alternate source of legitimacy -- the will of the people. Surprisingly little is known about what ordinary people want the platforms to do about specific content. We provide empirical evidence about lay raters' preferences for platform actions on 368 news articles. Our results confirm that on many items there is no clear consensus on which actions to take. There is no partisan difference in terms of how many items deserve platform actions, but liberals do prefer somewhat more action on content from conservative sources, and vice versa. We find a clear hierarchy of perceived severity, with inform being the least severe action, followed by reduce, and then remove. We also find that judgments about two holistic properties, misleadingness and harm, could serve as an effective proxy to determine what actions would be approved by a majority of raters. We conclude with the promise of the will of the people while acknowledging the practical details that would have to be worked out.

Zittrain describes three eras of Internet governance [8, 68]. The first focused on the rights of end-users and abstention from action by any intermediary, so as to allow the benefits of connectivity and communication to flourish. The second focuses on public health, where intermediaries can be expected to intervene to diminish harms. The spread of online misinformation is often discussed as one such threat to public health. Potential harms include impacting elections [2], undermining efforts to respond to a global pandemic [57], and stoking ethnic violence [23]. Thus, there have been increasing expectations that platforms like Google, Facebook, Twitter, and YouTube should take action to reduce the spread of misinformation. The platforms have responded. They take actions against publishers, such as suspending or banning their accounts. They also sometimes take three kinds of actions against particular pieces of content. The first is to alert users that a particular content item may contain misinformation, through a label, color coding, or text. Twitter refers to this as warning [58]. Facebook refers to it as an inform [44] action, a terminology that we will adopt. The second action is to downrank a content item so that it appears later in search results or news feeds and thus fewer people encounter it. Following Facebook's terminology, we will refer to this as a reduce action. The third available action is to filter or remove the content.
While the platforms are acting on misinformation, their actions have been controversial. The U.S. Congress has repeatedly called platform leaders to task for taking insufficient action (e.g., [46]). Conservative politicians and pundits have complained that enforcement actions fall disproportionately on publishers expressing conservative viewpoints [7]. On the related issue of offensive content, a 2020 Pew survey found a 34% gap between liberals' and conservatives' responses to the proposition, "Offensive content online is too often excused as not a big deal", up from a 13% gap just four years earlier [62].

However, the problem may be more fundamental than one of imperfect policies or ineffective enforcement. It may not always be clear what action should be taken. Some decisions may not neatly reduce to questions of factual accuracy. An article may contain several claims, only some of which are untrue. An article may be factually correct but still mislead many people to factually incorrect beliefs. Conversely, some factual claims may be false but not likely to be believed (e.g., satire) and some false claims may not lead to much harm, even if believed. The harmfulness of some misinformation may depend on the context of what larger narratives are circulating and what violent conflicts are stirring.

When the correct decision is ambiguous or contested, what can lend legitimacy to decisions about misinformation moderation? One possibility is for governments to set criteria. Another is to have them made by people who are widely accepted as trustworthy because of their benevolence and expertise. Facebook touts its reliance on external fact-checkers trained in journalistic practices [20]. The public, however, is suspicious of all these authorities, as well as the platforms [39, 53]. Zittrain offers a pithy summary: "We don't know what we want. We don't trust anyone to give it to us." [67] As a result, he suggests that we are now entering a third era of content governance, where legitimacy comes through process [8, 68]. One form of process-based legitimacy for contested decisions can come from following well-defined procedures. This has been the primary approach of platforms with respect to misinformation and other moderation decisions, as it assures that decisions are not made arbitrarily or based on the whims of platform employees. However, this does not assure complete legitimacy, as any reasonably sized codebook may not capture all the nuances associated with moderation decisions. Exceptions need to be made, which raises the question of who has the authority to make those changes. An alternate source of legitimacy for contested decisions in democratic societies is the will of the people: taking the action preferred by the majority of the people. As we explore in the background section, there are practical and conceptual difficulties with each approach, but the will of the people offers a useful starting point. However, surprisingly little is known about what ordinary people want the platforms to do about specific content.

This paper contributes empirical evidence about the preferences of U.S.-based raters for platform actions on 368 news articles. The articles were chosen to over-represent potentially problematic misinformation, by selecting popular URLs from problematic sites and URLs that had been flagged by Facebook for further investigation. Each article was rated on Amazon Mechanical Turk by 54 raters: 18 liberals, 18 conservatives, and 18 others.
Raters first searched for corroborating information, then judged how misleading and how harmful the content was, and finally expressed preferences for what actions, if any, platforms should take. We confirm empirically that there is no way to please everyone all the time; there were many items with no clear consensus on which actions to take. There was no partisan split in how many items were thought to be deserving of platform actions, but there were partisan differences about which items: more conservatives wanted action on items from pro-liberal sources and vice versa. There was a clear hierarchy of severity, with the inform action viewed as least severe, followed by reduce, and then remove. Finally, we find that collective judgments about misinformation and harm could serve as an effective proxy to determine what actions, if any, a majority of raters approved of.

We review two potential sources of legitimacy for moderation decisions against misinformation -- i) codebooks and policy rules, and ii) the will of the people. We describe the two alternatives while also emphasizing some of the practical and conceptual concerns associated with each of them. The first and most common approach, based on codebooks and policy rules, treats people as sensors of objective attributes of content items. The intention is to minimize the impact of their opinions on the final decisions that are made. In contrast, the second approach treats people as democratic citizens whose opinions are a source of legitimacy for decisions.

The first approach is based on codebooks and policy rules. A platform policy specifies sets of attributes that are sufficient to warrant platform actions. A codebook and rater training materials are designed to produce very high inter-rater agreement about these attributes. We will refer to such attributes as quasi-objective; objective because that is the aspiration, but quasi because they still depend on human judgments where subjectivity may creep in. For example, Google has published the codebook that it asks paid human raters to follow in evaluating the quality of websites as potential search results [28]. It runs more than 170 pages and defines attributes ranging from expertise and authoritativeness to having a descriptive and helpful title. Policy rules based on quasi-objective attributes help to legitimize outcomes in two ways. First, accountability to the general public can be maintained, in principle, by publicizing the set of attributes and the mapping rules, as Google does with its codebook for search results. In practice, however, even platforms that publicize their attributes and codebooks are not completely transparent about the mapping of those attributes to enforcement actions. Second, legitimacy is enhanced by making the results appear less subjective and more definitive. A common theme in platforms' content moderation policies, for misinformation as well as other types of problematic content, is to try to reduce subjectivity [26, 69]. Even if the attributes aren't completely objective, if almost any trained rater would evaluate them the same way, the process will yield definitive results. For enforcement decisions about misinformation, several quasi-objective attributes have been proposed. The first, of course, is the accuracy of factual claims made in an article: "false information that can be verified as such" [55].
Platforms have partnered with independent fact-checking organizations to objectively verify the veracity of factual claims present in content [4]. Researchers [34, 35] and legal scholars [36] have argued, however, that factual accuracy is an insufficient attribute, as it does not differentiate between satire, unintentional mistakes, and intentionally fabricated content [11]. Consequently, some guidelines also include the author's intent and the potential to cause harm as crucial to deciding the course of action against misinformation [58, 61, 70]. The Credibility Coalition has defined a much larger set of quasi-objective attributes, such as the representativeness of a headline, different types of logical fallacies, the reputation of sources, etc. [65]. Unfortunately, the approach of codebooks plus policy rules is not a panacea for establishing legitimacy of decisions. There are both practical and conceptual challenges.

2.1.1 Practical Difficulties. The first practical challenge is that it can be difficult in practice to achieve sufficient inter-rater agreement on attributes to treat them as objective, or even as subjective but definitive. For example, raters achieved high inter-rater reliability on very few of the Credibility Coalition's attributes [65]. Among content indicators, the Krippendorff alpha value for ratings of whether the title of an article was representative of the content was 0.367, and coding for five different types of logical fallacies all had alpha < 0.5. Among context features, only reputation of citations, at .852, had an alpha level above 0.67, a commonly used threshold among communications scholars for human coding [38]. Another practical challenge is that any direct mapping between attributes and enforcement invites strategic gaming of the system, where publishers may identify loopholes with content violating the spirit but not the letter of the policies. For example, some publishers try to avoid platform actions by adding disclaimers about "satire" or "parody" on their websites, even though their articles can still mislead many users [11].

2.1.2 Conceptual Challenges. One conceptual challenge is that any reasonably sized codebook may be insufficient to capture the nuanced policy judgments that people would like the platforms to make. Platform policies on satire provide a relevant example. Most platforms draw a blanket exception against taking any action on satire, despite the evidence that even satire can mislead certain users [11]. More subtle rules are possible, but it would be challenging to identify what additional attributes could be collected that allow for clear distinctions between harmful and not harmful satire. The Napalm girl controversy provides an example in another realm of content regulation, as described in [26]. An iconic photograph depicts underage nudity but also serves as documentation of the horrors of the Vietnam War. The photo was removed by Facebook, per their guidelines that prohibited any kind of child nudity. However, the decision resulted in major criticism at a global level. Another challenge to legitimacy is that codebooks and rules do not define, once and forever, what platforms will or should do. Public controversy over particular cases can lead to reconsideration of rules. For example, in the Napalm girl case, Facebook reinstated the photo and eventually updated their rules to provide scope for other exceptions [26]. More generally, platform policy makers regularly gather to consider cases not covered by current rules.
The fact that rules are regularly revised calls into question whether the final power to make changes to the rules should rest with policy makers employed by the platforms. Critics have argued that they tend to share a Silicon Valley ethos, lacking diversity in age, nationality, ethnicity, race, class, and ideology [42]. One way that platforms can boost the legitimacy of rule-based decision processes is to cede the power to make exceptions and revise rules to more trusted, external entities. For example, Facebook has convened an oversight board (aka Supreme Court) selected for expertise and diversity [21]. It has been criticized, however, both on grounds that it is insufficiently independent (board members are paid by a foundation that Facebook set up) and insufficiently powerful (Facebook has not pre-committed to accepting all of its recommendations).

Outside of platform content moderation, there is a long stream of research on citizen assemblies and deliberative polling, where a citizen panel is selected at random from some pool, deliberates, and then reports its opinions on a matter of public importance [18, 48]. The idea is to collect preferences of the informed general public, those people who have taken the time to carefully consider a decision [24]. Random selection ensures that no single entity or organization can manipulate the outcomes by controlling the panel composition. Scholars have proposed that similar approaches could be useful for making decisions on online content moderation [22, 66]. For example, Zittrain proposes that high school civics students serve as citizen jurors assessing the acceptability of online political advertisements [66]; curricular activities and teacher supervision would ensure a sufficient level of informedness of the raters. In the dataset we analyze in this paper, the raters searched for corroborating information before providing ratings, thus ensuring a minimal level of informedness. Furthermore, determining a moderation preference on a single item may require less effort than the kinds of issues for which deliberative polls have required hours or days in the past. For example, Fan and Zhang [22] have explored convening juries who deliberate about a content item for a minimum of just four minutes before reporting a collective judgment.

One way to elicit the will of the people is to ask a panel of raters directly what enforcement actions they think are warranted against individual content items. We will refer to this as eliciting action preferences. In practice, this approach has not been explored much. One recent example, however, is Twitter's Birdwatch, which invites a community of users to assess whether tweets should have warning labels and to compose the content of the warnings [16]. Alternatively, a panel of raters may be asked to rate a small set of high-level attributes that require some subjective judgment. We will refer to this approach as eliciting holistic judgments. For example, rather than asking people to rate the accuracy of the headline, the presence of sources, etc., they may be asked to simply rate an item's overall credibility [6]. Bhuiyan et al. asked raters to judge individual content items for their overall credibility [6], but their focus was limited to carrying out an evaluation of the content, and it is not immediately clear how such judgments can inform the course of action against the content.
In general, any approach that focuses on judgments of the content will still require policy rules that map judgments on these high-level attributes to specific enforcement actions.

Our approach: In the dataset we analyze in this paper, raters answered both kinds of questions. They reported whether they wanted platforms to take inform, reduce, or remove actions. They also judged two high-level attributes: how misleading the article was and how much harm it would cause if people were misinformed about the topic of the article.

Of course, appealing to the will of the people is also not a panacea for ensuring legitimacy of moderation decisions. We review both practical difficulties and conceptual challenges, and some of the possible remedies. The biggest practical difficulty is collecting enough judgments from user or citizen panels in a timely way. Platforms deal with tens or even hundreds of millions of posts each day. And evidence suggests that fake news can travel up to 6 times faster than true news [63]. For some kinds of late-breaking information, such as the aftermath of crisis events, even a highly skilled journalist may not be able to assess immediately what is true and what is not [5]. Rather than trying to delegate every individual content decision to a panel of lay raters, more modest goals may be appropriate. For example, panels might be convened only for a limited set of items, and not in real time. Their judgments might be used in more limited ways. First, they might be used to settle appeals on disputed items. Prior research has argued for improving the contestability of moderation decisions [59, 60], and considering the will of the people as part of the appeals process could prove useful [60]. Second, the judgments might be used to evaluate, after the fact, how well a platform's actions implemented the will of the people. Current approaches for evaluating platform decisions against misinformation have relied on the factual accuracy of the content [9], but as noted above, factual accuracy (or a lack thereof) may not be sufficient to determine the course of action against a content item. Third, the panel judgments could be used as ground truth labels, both for training sets for automated systems [13] and as constraints when platform policy makers update codebooks and policy rules [47].

Another practical difficulty for the approach of directly eliciting action preferences is providing lay raters with enough understanding of the space of moderation actions to have well-formed opinions about the appropriateness of various actions for particular content items. Many users of social media platforms have only a vague understanding of prioritized feeds [19] and may have even less understanding of specific enforcement actions. In our study, we asked raters about categories of actions (inform, reduce, remove) rather than specific actions. This provides only imprecise information about the will of the people for very specific actions expressed in terms of internal platform models, such as a 75% reduction in the expected number of views for a content item.

A more conceptual challenge comes from the need for raters to be sufficiently informed to make good judgments. Some topics require high levels of scientific, medical, or mathematical knowledge to make good judgments. This is a challenge that academics and practitioners of deliberative polling and other forms of citizen panels have confronted over the decades.
There is no perfect solution, but citizen panels can be provided with information resources prepared by experts, or an ability to consult experts [24]. This delegates questions of fact to experts while reserving matters of subjective judgment to the more representative citizen panel. The quality of lay raters' judgments of the holistic attribute of misinformation can be assessed in part by using the judgments of expert raters as a benchmark [3, 6, 27, 51, 52]. Previous studies indicate that panels of lay raters can perform reasonably well against this benchmark on published news articles, suggesting that this challenge may not be insurmountable for most articles [3, 51].

Another conceptual challenge comes from the need, even in a democracy, to protect minorities from the tyranny of the majority [43]. In the realm of online harassment, Schoenebeck et al. [54] show that users' preferences for moderation approaches vary considerably based on their identities. Other research shows that the identity of a user has the potential to bias how they evaluate misinformation [6, 65]. For misinformation judgments in the U.S., liberals and conservatives may not accept the preference of the majority as sufficient to determine a legitimate outcome, especially if they perceive that they are outnumbered. Similarly, racial, ethnic, religious, and other minority groups may not agree with the majority opinion. The risks to a minority may come both from too much moderation (restricting their ability to express their identity or communicate with each other) [31] and from too little moderation (allowing content to circulate that misrepresents them and may even incite violence against them) [64]. Again, theorists and practitioners have developed approaches to mitigating such risks. One is to assure representation of at-risk groups in decision-making panels, a property known as "descriptive representation" [25, 50]. This ensures that their voices will be heard, but they can still be subject to negative outcomes if they are outvoted by the majority. Another approach is to require a supermajority, such as two-thirds, to justify an action [10, 40]. This presumes an acceptable default, of either inaction or action, when there is no consensus. The current default of all major platforms is non-action, allowing content to go viral unless an enforcement action curtails it [26]. An alternative would be a default of "friction", with virality limited for all content unless an affirmative decision is taken to reduce the friction [17]. The strongest protection would come from setting thresholds of required votes separately for different groups. If there is sufficient polarization among groups, however, setting even a modest threshold, such as 40% of Democrats and 40% of Republicans, could lead to gridlock, with no actions approved. Also, when a minority is potentially in danger from moderation inaction as well as action, such higher vote thresholds can only protect against one or the other danger, not both. If the default is no action, then more stringent thresholds will reduce enforcement actions. If the default is friction, then more stringent thresholds will increase enforcement actions.

Given the societal controversy about platforms' moderation practices on misinformation, a natural first question about the will of the people is whether the controversy is due to problems of implementation or whether there are underlying differences of opinion about what should be done about particular items.
If there are many items where there is no consensus on the right actions to take, then no moderation process can yield outcomes that please almost everyone almost all the time.

• RQ1: How much agreement is there, among informed lay raters, about the preferred actions to be taken on potential misinformation?

Next, we explore the role of political ideology in raters' preferences for moderation actions on misinformation. In the U.S., which is the focus of this study, ideological differences are often cast along a single dimension, from liberal to conservative. A previous analysis of the same dataset we analyze showed that the high-level misinformation judgments of liberal and conservative raters are more correlated among raters who do independent research before rendering judgments than among those who are less informed, but that some partisan differences remain [51]. Thus, we might also expect to see some partisan differences in action preferences, for two reasons. First, there may be systematic differences in values between liberals and conservatives. Surveys based on moral foundations theory show that liberals tend to focus only on harm and fairness, while conservatives also focus on loyalty, authority, purity, and liberty [29, 30]. If more conservatives harbor libertarian convictions about the importance of free speech even when it is harmful, we might expect conservatives to prefer that platforms take fewer actions overall. And even on items where a majority of conservatives prefer that some action be taken, there may be a large group of dissenters, leading to less agreement among conservative than among liberal raters. Other surveys based on personality traits have shown differences between liberals and conservatives in traits such as openness and conscientiousness [12]. If liberals are more open, we might expect them to prefer that platforms take fewer actions overall and have less agreement among raters. Thus, we have:

• RQ2: Is there more or less agreement among conservative raters than among liberal raters about their preferred platform actions?
• RQ3: Do conservative raters prefer that platforms act on fewer items than liberal raters prefer?

Second, there is a strategic issue. Some content, whether accurate or not, may help to sway public opinion in favor of policies or candidates. Raters may report preferences for actions that increase the distribution of content that favors their side. Thus, even if there is no difference in overall preferences, we might expect some differences in which items raters prefer platforms to act on. Thus, we have:

• RQ4: Do conservative raters prefer action on items from different sources than liberal raters prefer?

It is not immediately obvious that raters can develop enough understanding of the space of moderation actions against misinformation to have well-formed opinions about the appropriateness of each action. For instance, platforms' public descriptions of their misinformation mitigation processes often imply a hierarchy of severity of actions: warnings or information labels are the least severe, reducing the distribution through downranking is more severe, and removing an item entirely is the most severe. As a sanity check, we consider whether the aggregate ratings across individual articles also indicate a consistent hierarchy in terms of the severity of different action types. This leads to the question:

• RQ5: Is there a hierarchy of perceived severity of actions?
Next, we consider the extent to which action preferences are correlated with judgments of the holistic attributes of misinformation and harm. This is motivated in part by the difficulty of explaining specific enforcement actions to lay raters. While collecting judgments on holistic attributes does not fully address the issue of inferring the will of the people for very specific enforcement actions, if preferences for the abstract actions of inform, reduce, and remove can be predicted from the holistic attributes, it would suggest that it is not necessary to directly elicit preferences for specific actions in order to understand the will of the people. If, on the other hand, high-level misinformation and harm judgments explain little of the variance in action preferences, it would indicate that action preferences are based on some other factors that are not captured by those two high-level judgments. Thus, we have:

• RQ6: How well can aggregate judgments of whether an item is misleading and/or harmful predict aggregate preferences for the inform, reduce, and remove actions?

Raters on Amazon Mechanical Turk rated a set of news articles. For each article, each rater provided a judgment of how misleading the article was and how harmful it would be if people were misinformed about the topic. For each article, each rater also reported three binary action preferences: whether they thought platforms should inform users that the item was misleading, reduce the item's distribution, and/or remove it entirely. Details of the dataset follow.

A total of 372 articles were selected, taken from two other studies. As described in [3], a set of 207 were selected from a larger set provided by Facebook that were flagged by their internal algorithms as potentially benefiting from fact-checking. The subset was selected based on the criterion that their headline or lede included a factual claim. The other 165 articles consisted of the most popular article each day from each of five categories: liberal mainstream news; conservative mainstream news; liberal low-quality news; conservative low-quality news; and low-quality news sites with no clear political orientation [27]. Five articles were selected on each of 33 days between November 13, 2019 and February 6, 2020. Our study was conducted a few months after that. Four articles were removed because their URLs became unreachable during the course of the study, leaving a total of 368 articles for the analysis. To provide a sense of the articles, Table 1 describes four of them where rater action preferences were not uniform.

Before rating their first article, all raters completed a qualification task in which they were asked to rate a sample article. At the completion of the qualification task, workers were assigned to one of three groups based on their ideology. We asked about both party affiliation and ideology, each on a five-point scale; raters who both leaned liberal and leaned toward the Democratic Party were classified as liberal; those who both leaned conservative and leaned toward the Republican Party were classified as conservative; others were classified as others, though this group also includes people with non-centrist ideologies but whose party affiliation did not match their ideology. Each rater could rate as many articles as they wanted, although each rater could rate an article only once. Eighteen people of each ideology rated each news article. Raters were paid approximately $15 per hour. Table 2 shows the recruitment funnel.
In order to get equal numbers of subjects who passed the qualification test, which consisted of answering questions about the instructions after completing one sample rating task, more subjects had to be randomized to the individual research condition. Of subjects who completed the qualification, a higher percentage of subjects in the second condition went on to complete rating tasks. Table 3 shows that more raters were liberal than conservative or others. Thus, in order to get eighteen ratings from each rater group, the conservative and other raters rated more items than the liberal raters, as shown in the second column.

Step 1: Evidence. Raters were first asked to read a news article by clicking on the URL. In order to solicit an informed judgment on the article, raters were asked to search for corroborating evidence (using a search engine) and provide a link to that evidence in the rating form (see Figure 1).

Step 2: Judgments. Then, raters were asked to evaluate "how misleading the article was" on a Likert scale from 1=not misleading at all to 7=false or extremely misleading. The question was designed to solicit a holistic judgment about the article rather than focusing on a fixed set of attributes (such as a factual claim, the accuracy of the headline, etc.). We also avoided using loaded terms such as "fake news" or "mis/disinformation", where users may already have preconceived notions about the term [11]. We also provided raters with an option to say that they did not have enough information to make a judgment, although the option was rarely used (<3% of judgments across all articles); these ratings were excluded from the final analysis. A second question asked raters to evaluate "how much harm there would be if people were misinformed about this topic?" on a Likert scale (1=no harm at all to 7=extremely harmful). We framed the question counterfactually to discourage any link between misleading judgments and harm judgments (see Figure 2 in the Appendix).

Step 3: Action Preferences. In the next step, we asked each rater to provide their personal preferences for action against each news article. First, a rater was asked whether, in their personal opinion, any action was warranted (Figure 3). If they answered yes to that question, they were asked three binary questions, one for each of three possible actions: inform, reduce, and remove (Figure 4).

We first provide descriptive statistics about the overall frequency with which raters reported that each of the actions was warranted (see Table 4) and then proceed to answer our research questions. For each research question we first describe the analysis method and then immediately report the results.

Table 5. Mean level of disagreement with the majority rating

RQ1: How much agreement is there, among informed lay raters, about the preferred actions to be taken? To address the first question, we summarize the aggregate action preferences for each article as the percentage of users who said each action should be taken. If the distribution of these aggregate preferences is bimodal, with almost every item having close to 0% or close to 100% wanting each action, then we would have very high agreement among the raters. Figure 5 shows histograms of aggregate action preferences, across all 368 articles. For example, on 91 articles, less than 10% of raters wanted an inform action, while on 146 articles the preference for inform was between 30% and 70%.
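To make this aggregation concrete, the following is a minimal sketch, not the authors' code, of how per-article aggregate preferences and the disagreement with the majority preference summarized in Table 5 could be computed; the column names (article_id, prefer_inform, prefer_reduce, prefer_remove) are hypothetical.

```python
# Minimal sketch, assuming a hypothetical ratings table with one row per
# (rater, article) pair and binary columns for each action preference.
import numpy as np
import pandas as pd

ACTIONS = ["prefer_inform", "prefer_reduce", "prefer_remove"]

def aggregate_preferences(ratings: pd.DataFrame) -> pd.DataFrame:
    """Fraction of raters preferring each action, per article."""
    return ratings.groupby("article_id")[ACTIONS].mean()

def disagreement_with_majority(frac: pd.Series) -> pd.Series:
    """Share of raters who disagree with the majority preference on each article.

    If 60% prefer an action, the majority decision is to act and 40% disagree;
    if 40% prefer it, the majority decision is not to act and 40% disagree.
    """
    return np.minimum(frac, 1 - frac)

# Usage (hypothetical):
# agg = aggregate_preferences(ratings)
# mean_disagreement = 100 * disagreement_with_majority(agg["prefer_inform"]).mean()
```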
When remove was preferred by a majority, it was almost always a slim majority, with only one article achieving a supermajority of more than 70%. More generally, for all three action types, among items where more than half the raters preferred the action, it was more common to have 50-70% of raters prefer the action than to have near universal agreement of 80-100%. Table 5 summarizes the level of disagreement with the majority preference. Averaging across items, 23.57% disagreed with the majority preference about whether to take an inform action. The mean level of disagreement was lower for the remove action because there were many items that almost everyone agreed should not be removed. Note that the mean disagreement would be only a little higher (11.85% instead of 11.05%) with decisions not to remove any items. In summary, we find considerable variance in the level of disagreement among rater preferences for action on individual articles, and on many articles no clear consensus exists.

RQ2: Is there more or less agreement among conservative raters than among liberal raters about their preferred platform actions? To answer the second question, consider the second and third rows of Figure 5 and Table 5. Conservative raters seem to have a higher fraction of articles where there is no clear consensus about the inform and reduce actions (i.e., 30-70% prefer the action), and this leads to somewhat higher percentages of disagreement with the majority preference (Table 5).

RQ3: Do conservative raters prefer that platforms act on fewer items than liberal raters? The second and third columns of Table 4 show that conservatives wanted actions a little less often than did liberals overall. This also translated into minor differences in the number of items that majorities of liberals and conservatives would like to see platforms act on, as shown in the second and third columns of Table 6. A majority of liberals preferred inform and reduce actions on a few more articles than did a majority of conservatives. However, a majority of conservatives preferred removal of 22 articles while a majority of liberals preferred removal of only 15.

RQ4: Do conservative raters prefer action on items from different sources than liberal raters?

Table 6. No. of articles recommended for each action type based on the aggregate preferences of different user groups (second and third columns: liberal and conservative raters, respectively)
Inform: 104, 118, 108
Reduce: 70, 80, 74
Remove: 12, 15, 22

To answer the fourth research question, we classify the political leaning of the publisher of each article, using labels from MBFC. Figure 6 presents a visual confusion matrix. Each dot represents one article, and the color denotes the bias of the source of that article. Blue represents pro-liberal sources; red represents pro-conservative sources; black represents sources classified as unbiased; grey represents sources whose ratings were not available from MBFC. The x-axis represents the percentage of liberal raters who wanted the action to be taken, and the y-axis the percentage of conservative raters who wanted it. For each action type (represented via different plots), we see that the preferences of the two groups follow the same general trend, and in most cases, result in the same decision. However, when differences do appear, for instance, when more conservatives want action than liberals (upper-left quadrant), the articles mostly come from pro-liberal sources (blue dots). Similarly, when more liberals want an action (lower-right quadrant), the articles mostly come from pro-conservative sources (red dots).
The pattern is also evident from the correlation statistics (Table 7). While the conservative and liberal action preferences are largely correlated on unbiased sources, the correlation drops quite a bit for sources that are ideologically biased toward either side.

RQ5: Is there a hierarchy of perceived severity of actions? To answer the fifth research question, we compared the aggregate preferences for the three actions on each item. All articles where more than half the raters preferred removal also had majorities wanting the reduce and inform actions; moreover, for all articles where a majority wanted the reduce action, a majority also wanted the inform action (see Figure 7). For all but one item, more people wanted an inform action than wanted a reduce action. On all items, more people wanted a reduce action than wanted a remove action. There was some effect of presentation order, as shown in Table 8. When reduce was presented before the inform action, more people preferred a reduce action compared to the condition when inform was presented before the reduce action. However, even considering only those raters who were presented with the reduce action first, the overall fraction of users who preferred the inform action was higher than the fraction who preferred the reduce action. Furthermore, there were no items where a majority wanted a reduce action but did not want an inform action. Thus, we conclude that raters consider inform the most lenient of the three actions, followed by reduce and then remove.

RQ6: How well can aggregate judgments of whether an item is misleading and/or harmful predict aggregate preferences for the inform, reduce, and remove actions? We use regression models to predict users' aggregate action preferences from their aggregate judgments. We first train a separate model for each action type. Our dependent variable (or predicted variable) for each model is the fraction of raters who want that action. Since the dependent variable is a count proportion, we use a Generalized Linear Model (GLM) with a binomial family and a logit link (as recommended by Zuur et al. [71]). The predictor variables are the means of all users' judgments. We estimate four models using different predictors: only misleading judgments, only harm judgments, both without an interaction term, and both plus an interaction term. We use two information criteria (AIC and BIC) to select the best-fit model for each action type. We find that models with both predictors (misleading and harm judgments) perform better than the models with only one of them. For inform and reduce, the best-performing model includes the interaction term as well, but not for remove. All the trained models and their AIC-BIC values can be found in the tables in Appendix A.2. We present the rest of the results using the best model for each action type. In order to interpret what each model has learned, we plot the decision boundary in terms of misleading and harm judgments (Figure 8). That is, in each regression equation, we set the value of the predicted preferences to 0.5 (i.e., a majority-based decision boundary) and plot the resulting curve between misleading and harm judgments; articles to the right of the curve (i.e., greater values of harm and misleading judgment) are recommended for that action and articles to the left are not. If we rely on judgments in that way to make the final decisions, articles with a misleading judgment greater than 3.95 are always recommended for inform.
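As an illustration of this modeling step, here is a minimal sketch, under assumptions and not the authors' actual code, of fitting such a binomial GLM with statsmodels and reading the 0.5 decision boundary off the fitted coefficients; the column names (n_yes, n_no, misleading_mean, harm_mean) are hypothetical.

```python
# Minimal sketch: binomial GLM with a logit link predicting the fraction of
# raters who prefer an action from mean misleading and harm judgments.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_action_model(df: pd.DataFrame, with_interaction: bool = True):
    # Two-column endog: counts of raters who did / did not prefer the action.
    endog = df[["n_yes", "n_no"]].to_numpy()
    X = pd.DataFrame({"misleading": df["misleading_mean"], "harm": df["harm_mean"]})
    if with_interaction:
        X["misleading_x_harm"] = X["misleading"] * X["harm"]
    exog = sm.add_constant(X)
    res = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
    k, n = len(res.params), len(df)
    bic = k * np.log(n) - 2 * res.llf  # BIC computed from the log-likelihood
    return res, res.aic, bic

def harm_at_boundary(res, misleading: float) -> float:
    """Harm judgment at which the predicted preference crosses 0.5 (logit = 0)."""
    b = res.params  # indexed by: const, misleading, harm, [misleading_x_harm]
    b_inter = b.get("misleading_x_harm", 0.0)
    return -(b["const"] + b["misleading"] * misleading) / (b["harm"] + b_inter * misleading)
```

Candidate models (misleading only, harm only, both, both with interaction) could then be compared on AIC and BIC, and evaluating the boundary function over a grid of misleading scores would trace out curves of the kind shown in Figure 8.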
In addition, when the misleading judgment is greater than 4.85, reduce is also recommended as one of the actions. Harm judgment can further mediate this effect, as higher harm scores can lead to the same (or a more stringent) action even when the misleading judgment is lower. For instance, an article with misleading and harm judgments of 4 and 3, respectively, is recommended for inform, while another article with scores of 4 and 5 is recommended for reduce as well. Articles with a misleading judgment below 5 are never recommended for remove. When the misleading judgment is higher than 5, the type of action may still depend on the harm judgment. Furthermore, none of the decision curves intersect with the harm axis, which suggests that harm judgments alone may not be sufficient to recommend an action against an article. This makes sense because most people would not want to take action against an article containing good information just because the topic was one where misinformation would be harmful.

Table 9. Comparison between decisions based on aggregate preferences and decisions based on the regression model's output given the aggregate judgments of misleading and harm (the Jaccard Index measures similarity between the two sets of articles)

Fig. 9. Comparison between actual preferences and predicted preferences for each action type; the horizontal dotted line represents the judgment-based decision boundary, the vertical dotted line the preference-based decision boundary

Also note that the decision boundary for the remove action is a straight line, reflecting the result that the best-fit model did not include an interaction term between the misleading and harm judgments. Table 9 shows that decisions generated from the output of the regression models closely follow the decisions that would be made from taking the majority vote on the preference questions. The Jaccard Index, a popular metric for computing set similarity, shows that the two sets of articles recommended for each action type (one based on prediction, one based on preference) are largely similar. While the preference-based outcomes produce the lowest level of disagreement given the aggregate preferences, the level of disagreement is only slightly higher if we use the predictions of the regression models, based on aggregate judgments of misleading and harm (Table 9). We also analyzed the set of items that have a prediction-preference mismatch but found no obvious pattern in the topics or other features of the articles (a listing of those articles is in Appendix A.3). Finally, we test whether the results we obtain are generalizable by evaluating our prediction performance on held-out test sets, using cross-validation. We train our models over 10 iterations, each time using a different partial dataset (80% -- 294 articles), and report the prediction performance on the corresponding held-out test set (20% -- 74 articles).

Table 10. Prediction performance in terms of the level of disagreement (% of users) on a held-out test set (74 articles) over 10 iterations

Table 10 shows the prediction performance in terms of the level of disagreement, in line with our emphasis on minimizing the overall disagreement associated with the action decisions.
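The two quantities reported in Tables 9 and 10 can be written down compactly; the sketch below is an illustration under assumptions, not the study's evaluation code, taking the actual fraction of raters preferring an action on each held-out article together with a set of binary decisions.

```python
# Minimal sketch of the evaluation quantities: level of disagreement and Jaccard index.
import numpy as np

def level_of_disagreement(actual_frac, decisions) -> float:
    """Mean % of raters who disagree with the decisions taken (act vs. not act)."""
    actual_frac = np.asarray(actual_frac, dtype=float)
    decisions = np.asarray(decisions, dtype=bool)
    disagree = np.where(decisions, 1 - actual_frac, actual_frac)
    return 100 * disagree.mean()

def jaccard_index(decisions_a, decisions_b) -> float:
    """Similarity between two sets of articles selected for an action."""
    a = np.asarray(decisions_a, dtype=bool)
    b = np.asarray(decisions_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

# Usage (hypothetical): compare model-based and preference-based decisions
# pred_decisions = predicted_frac > 0.5
# pref_decisions = actual_frac > 0.5
# print(level_of_disagreement(actual_frac, pred_decisions))
# print(jaccard_index(pred_decisions, pref_decisions))
```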
Comparing the preference-based outcomes with the judgment-based outcomes of the regression model, the level of disagreement is largely consistent, suggesting that our models are generalizable and can be used to predict action preferences on completely new articles as well. Additionally, we also report two standard metrics for evaluating predictions -- namely, the F1 score and the Jensen-Shannon (JS) distance -- as part of the appendix (see A.4). In summary, we find that relying on users' aggregate judgments (about how misleading and/or harmful content is) for predicting their action preferences does not lead to large increases in the level of disagreement with the decisions. Furthermore, both misleading and harm judgments play a role in the best-performing model, and the models we train are generic enough for predicting action preferences on other articles as well. Thus, aggregate action preferences are largely reducible to aggregate judgments of misleading and harm, at least when raters are asked to provide misleading and harm judgments prior to reporting their action preferences.

Our results empirically show that there may not always be a clear consensus in terms of people's moderation preferences for individual content items. For instance, on 146 articles the preference for inform as the action was between 30-70%. Similarly, there were 110 articles with reduce preference between 30-70%. When remove was preferred by a majority, it was almost always a slim majority, with only one article achieving a supermajority of more than 70%. On average, the level of disagreement with the majority preference ranges from 11.05% for remove to 21.12% for reduce and 23.57% for inform. Furthermore, the lack of consensus cannot be explained entirely by ideological differences. We find differences in the preferences among ideologically-aligned users as well. For instance, if we consider only the liberal raters, on average more than 20% would disagree with the majority preference (of other liberals) for both reduce and inform actions (see Table 5). Similarly, more than 20% of conservatives would disagree with the majority preference among conservatives for both inform and reduce decisions. Prior research indicates that other demographic factors, such as age, can also influence individuals' evaluation of misinformation [6]. However, since we did not collect any other demographic information from our raters, we cannot comment on these factors here. It suffices to say that platforms can expect considerable differences in the moderation preferences of individual users, and therefore, many of their moderation decisions may not be generally acceptable to many users.

One implication is that the public discourse around platform enforcement practices needs to shift. The prevailing narrative is that the primary challenge in moderation decisions about misinformation is one of separating fact from fiction. Indeed, platforms have partnered with independent fact-checking organizations to verify facts [20], while considerable emphasis has been put on scaling up the fact-checking process to handle the large volume of content circulating on social media platforms [3, 27, 49, 56]. The framing implies that platforms be judged by how "correct" their actions are. This is incomplete: facts are objective and most reasonable people should largely agree about them, but in practice, many of the platforms' actions against misinformation may still be contentious.
Instead, we need a framing that acknowledges there may be legitimate differences of opinion about what kinds of content are harmful to the public when they appear unsolicited in search results and social media feeds. These differences may be driven by different judgments about how harmful the content would be if widely distributed, as well as different judgments about the harms that may come from reducing its distribution. Like other consequential decisions where there are differences of opinion, they may need to be resolved through partially political processes. With this framing, platforms should strive for a process that produces "legitimate" actions, outcomes that are broadly accepted even by people who do not agree with all of them.

The debates in popular media [37, 41] and some survey results [1] suggest that conservatives tend to be strong proponents of free speech and generally against any kind of censorship by social media platforms. However, by collecting action preferences on individual articles, we find no strong difference between liberals' and conservatives' preferences with regard to how often each action needs to be taken. We do find some differences in which items they think should be acted on: more conservatives would like to see action taken against articles from pro-liberal sources, and more liberals would like to see action taken against pro-conservative sources. The partisan disagreements are strong enough to suggest that simply acting on the will of the majority might not be the best aspiration. If conservatives sense that the judgments and preferences of liberal raters are causing reduced distribution of articles that conservatives approve of, it could increase affective polarization. This effect may be exacerbated if those articles would primarily have been viewed by conservative readers. The partisan disagreements are not so large, however, that any countermeasure to accommodate the disagreement would lead to gridlock. Consider, for example, a decision rule requiring a majority of ratings from both liberals and conservatives. In Figure 6, that would correspond to the upper-right quadrant of each graph. For both inform and reduce actions, the upper-right quadrant is well populated and includes both red and blue dots, indicating articles from liberal and conservative publications. The upper-left and lower-right quadrants would be excluded, preventing action on many items from liberal sources that only conservatives wanted acted on and items from conservative sources that only liberals wanted acted on. Our results show that partisan disagreements to some extent could be handled by additional measures, such as setting multiple thresholds for different groups. It is reasonable to argue that such measures may be suitable for handling other kinds of differences as well. However, as noted in Section 2.2.2, decision rules based on supermajorities or thresholds for multiple groups reduce the number of positive decisions, leaving more of the "default" decision. Currently, the default for platforms is to take no action. Thus, more stringent requirements would mean fewer actions. If the primary risk to a minority group is over-moderation, the more stringent criteria will mitigate the risk.
If, however, the primary risk to minority groups is under-moderation (e.g., when misinformation circulating about a group is fomenting violence against them), then it might make sense to have lower thresholds, for example requiring a majority of the at-risk group to prefer the action as well as at least 25% of the overall population. If the will of the people is a legitimate way to make these decisions, future work may explore the various rules and provisions that could be put in place in line with different objectives.

One of the challenges for eliciting action preferences described in Section 2.2.2 is the need for raters to understand the space of potential enforcement actions. Our raters were required to pass a quiz that checked their understanding of the abstract action terms: inform, reduce, and remove. However, they could have answered those questions correctly by syntactic matching of words in the quiz questions and words in the definitions that we provided, without truly understanding them. One indication that our raters understood them reasonably well is that their preferences implied a hierarchy of severity that makes intuitive sense, with remove the most severe, followed by reduce, and then informational labels as the least severe action. Moreover, this ordering largely held even when the interface reversed the order of presentation of the inform and reduce actions. While no prior research has been done to specifically confirm this ordering of severity in users' minds, Twitter's guidelines seem to assume this ordering. Furthermore, this ordering is consistent with research on the actual impacts of these actions. When evaluating the effectiveness of inform as a strategy for fighting online misinformation, i.e., whether it affects a reader's perception of the news article, studies have concluded that the effect is probably small [15], and thus it makes sense to think of reduced distribution as a more severe enforcement action.

We find that misinformation judgments on their own were not sufficient to determine action preferences. Articles judged as extremely misleading or entirely false warranted action, but less severely misleading articles were also deemed actionable if the topic of the article was judged to be one where the public would be harmed by being misinformed (see the decision boundaries in Figure 8). Furthermore, making decisions using both the misinformation and harm judgments (and their corresponding thresholds) would yield action choices that would please almost as many raters as always choosing the majority-preferred actions (see Table 9). Thus, it appears that in practice it may not be necessary to directly elicit action preferences at all. We speculate further that it may be sufficient to elicit just a single judgment from each rater, what we will call actionability. Raters could be asked to report the extent to which a content item is potentially harmful enough that some enforcement action should be taken. Different thresholds could be set on the mean actionability rating: above the lowest threshold, an inform action would be taken; above a somewhat higher threshold, distribution would be reduced; at an even higher threshold, the item would be removed. Enforcement rules could even make a continuous mapping, where the extent to which the distribution of a content item is reduced goes from 0% to 100% in line with the increase in the actionability ratings.
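To illustrate this speculation, the sketch below shows one way a mean actionability rating could be mapped to actions; the 1-7 scale and the specific thresholds are arbitrary placeholders for illustration, not values the paper proposes or evaluates.

```python
# Minimal sketch: mapping a mean actionability rating (1-7 scale assumed) to actions.
INFORM_THRESHOLD = 4.0   # hypothetical threshold
REDUCE_THRESHOLD = 5.0   # hypothetical threshold
REMOVE_THRESHOLD = 6.0   # hypothetical threshold

def actions_for(mean_actionability: float) -> list[str]:
    """Discrete version: higher ratings trigger progressively more severe actions."""
    actions = []
    if mean_actionability >= INFORM_THRESHOLD:
        actions.append("inform")
    if mean_actionability >= REDUCE_THRESHOLD:
        actions.append("reduce")
    if mean_actionability >= REMOVE_THRESHOLD:
        actions.append("remove")
    return actions

def reduction_share(mean_actionability: float) -> float:
    """Continuous version: fraction by which distribution is reduced (0.0 to 1.0),
    ramping linearly between the inform and remove thresholds."""
    span = REMOVE_THRESHOLD - INFORM_THRESHOLD
    return min(1.0, max(0.0, (mean_actionability - INFORM_THRESHOLD) / span))
```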
One caveat to this speculation, however, is that we asked each rater to provide misinformation and harm judgments before stating their action preferences. This could have encouraged them to make their action preferences correlate highly with their misinformation and harm judgments. Future research could conduct a between-subjects test to see whether misinformation and harm judgments from one group of raters can predict the action preferences of a different group of raters.

No matter what the platforms do in their misinformation moderation decisions, it will not be possible to please everyone all the time. We find that there are many news articles and potential moderation actions where opinions are fairly evenly split. Thus, it may be helpful to reframe expectations for platforms. Instead of expecting them to produce correct decisions, the public should expect platforms to make decisions that will be accepted as legitimate, even by people who disagree with particular decisions. One source of legitimacy can come from following transparent procedures, based on pre-announced policies that map from quasi-objective attributes to enforcement actions. Platforms largely try to follow this approach today. The background section noted ways that they could enhance the legitimacy of their rule-based policies: by increasing inter-rater agreement on the quasi-objective attributes, by ensuring a nuanced enough set of attributes and rules to cover the subtleties of cases that occur in practice, and by externalizing decisions about the policy rules. This paper explores another potential way to produce legitimate decisions, following the will of the people. It would be impractical to convene a user or citizen panel to make every single enforcement decision, but such citizen panels could produce the final judgments for a limited set of cases, which would be used for arbitrating disputes, labeling training data, and evaluating platform outcomes for transparency report cards. The legitimacy of the will of the people can be further enhanced by bringing transparency to the formation and composition of citizen panels, ensuring that panels have adequate access to expertise, and incorporating policies that protect minority voices. As Winston Churchill stated: "No one pretends that democracy is perfect or all-wise. Indeed, it has been said that democracy is the worst form of Government except all those other forms that have been tried from time to time" [14].

Despite the high degree of political polarization in the U.S. currently, liberal and conservative raters mostly agreed about which articles needed enforcement. Thus, trying to implement the will of the people would not imply doing the will of whichever group happens to be slightly larger, at the expense of the slightly smaller group. Some ideological splits did occur for articles from ideologically-aligned sites. Even there, however, there were some articles from left-leaning sites that a majority of liberals rated as actionable, and similarly for conservative raters and sites. Thus, action rules could be established that require some level of approval from both liberals and conservatives. It does appear to be possible to elicit the will of the people. Action preferences were not entirely predictable from just misinformation judgments, but they were quite predictable from a combination of misinformation and harm judgments.
Rather than asking about preferences for particular actions, or about judgments of misinformation and harm, it may be simpler to elicit a single rating of overall actionability. Many procedural details would need to be worked out in order to develop a practical way of legitimating platform misinformation moderation decisions through user or citizen panels. Initial evidence suggests that the approach is promising.

Appendix A

We use the F1 score to measure classification accuracy, i.e., whether the predicted decision (action or no action) matches the preference-based decision. We also use Jensen-Shannon (JS) distance to measure the distance between the underlying distributions, i.e., the actual preferences and the predicted preferences, without considering the decision outcome. The average results from 10 iterations are reported in Table 17. We find the F1 score and the JS distance to be largely consistent over these iterations. The high F1 scores and low JS distances help establish that our models are generalizable and can be used to predict action preferences on completely new articles as well.

Table 17. Prediction performance using F1 score and JS distance on a held-out test set (74 articles) over 10 iterations.
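For readers who wish to reproduce these metrics, the snippet below shows one way to compute them with standard libraries; the toy decision vectors and preference distributions are invented for illustration and are not taken from our test set.

```python
# One way to compute the two evaluation metrics reported in Table 17.
# The arrays below are invented for illustration, not data from our study.
import numpy as np
from sklearn.metrics import f1_score
from scipy.spatial.distance import jensenshannon

# Binary decisions per article: 1 = take the action, 0 = do not.
preference_based = np.array([1, 0, 1, 1, 0, 1, 0, 0])  # raters' majority preference
model_predicted = np.array([1, 0, 1, 0, 0, 1, 0, 1])   # model's predicted decision

print("F1 score:", f1_score(preference_based, model_predicted))

# For one article, the share of raters approving vs. opposing an action,
# compared against the model's predicted preference distribution.
actual_prefs = np.array([0.61, 0.39])
predicted_prefs = np.array([0.55, 0.45])
print("JS distance:", jensenshannon(actual_prefs, predicted_prefs, base=2))
```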
References

Most Americans Think Social Media Sites Censor Political Viewpoints
Social Media and Fake News in the 2016 Election
Scaling up fact-checking using the wisdom of crowds
The Partnership Press: Lessons for Platform-Publisher Collaborations as Facebook and News Outlets Team to Fight Misinformation
The challenges of responding to misinformation during a pandemic: content moderation and the limitations of the concept of harm
Connie Moon Sehat, and Tanushree Mitra. 2020. Investigating Differences in Crowdsourced News Credibility Assessment: Raters, Tasks, and Expert Criteria
Fact-Checking Won't Save Us From Fake News
Answering impossible questions: Content governance in age of disinformation
Types, Sources, and Claims of COVID-19 Misinformation
Building bridges across political divides: Experiments on deliberative democracy in deeply divided Belgium
Dead reckoning: Navigating content moderation after "fake news"
The secret lives of liberals and conservatives: Personality profiles, interaction styles, and the things they leave behind
Efficient elicitation approaches to estimate collective crowd answers
Remarks on Democracy
Real Solutions for Fake News? Measuring the Effectiveness of General Warnings and Fact-Check Tags in Reducing Belief in False Stories on Social Media
Introducing Birdwatch, a community-based approach to misinformation
How to Stop Misinformation Before It Gets Shared
The crisis of democracy and the science of deliberation
I always assumed that I wasn't really that close to [her
Facebook's third-party fact-checking program
What is the Oversight Board?
Digital Juries: A Civics-Oriented Approach to Platform Governance
Experimenting with a democratic ideal: Deliberative polling and public opinion
Fair algorithms for selecting citizens' assemblies
Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media
Moderating with the Mob: Evaluating the Efficacy of Real-Time Crowdsourced Fact-Checking
General Guidelines
Liberals and conservatives rely on different sets of moral foundations
The righteous mind: Why good people are divided by politics and religion
Disproportionate Removals and Differing Content Moderation Experiences for Conservative, Transgender, and Black Social Media Users: Marginalization and Moderation Gray Areas
The Path of the Law (1897 Speech)
Quote Investigator
Lexicon of Lies: Terms for Problematic Information
Defining "Fake News"
Fake News: A Legal Perspective. SSRN Scholarly Paper ID 2958790
Why Big Tech and Conservatives Are Clashing on Free Speech
Content Analysis: An Introduction to its Methodology
Regulating online content moderation
Deliberative democracy in America: A proposal for a popular branch of government
Debate intensifies over free speech rights of conservatives on social media
The Silicon Valley ethos: Tech industry products, discourses, and practices
Direct democracy and minority rights: A critical assessment of the tyranny of the majority in the American states
Hard Questions: What's Facebook's Strategy for Stopping False News?
Evidence or the Event? On Judicial Proof and the Acceptability of Verdicts
Disinformation Nation: Social Media's Role in Promoting Extremism and Misinformation
The value of social media data: Integrating crowd capabilities in evidence-based policy
Innovative citizen participation and new democratic institutions: Catching the deliberative wave
Towards Fact-Checking through Crowdsourcing
The concept of representation
Informed Crowds Can Effectively Identify Misinformation
Survey Equivalence: A Procedure for Measuring Classifier Accuracy Against Human Labels
Why the government should not regulate content moderation of social media
Drawing from justice theories to support targets of online harassment
Helping Fact-Checkers Identify False Claims Faster
Impact of Rumors and Misinformation on COVID-19 in Social Media
Synthetic and manipulated media policy
"At the End of the Day Facebook Does What It Wants": How Users Experience Contesting Algorithmic Content Moderation
Contestability For Content Moderation
An update on our continuity strategy during COVID-19
Partisans in the U.S. increasingly divided on whether offensive content online is taken seriously enough
The spread of true and false news online
Hate Speech on Social Media: Content Moderation in Context. Conn
A Structured Response to Misinformation: Defining and Annotating Credibility Indicators in News Articles
A Jury of Random People Can Do Wonders for Facebook
Gaining Power and Losing Control
Three eras of digital governance
A Blueprint for Content Governance and Enforcement | Facebook
Preparing for Elections
GLM and GAM for Absence-Presence and Proportional Data