title: Quoting is not Citing: Disentangling Affiliation and Interaction on Twitter
authors: Roth, Camille; St-Onge, Jonathan; Herms, Katrin
date: 2021-12-01

Interaction networks are generally much less homophilic than affiliation networks, accommodating many more cross-cutting links. By statistically assigning a political valence to users from their network-level affiliation patterns, and by further contrasting interaction and affiliation (quotes and retweets) within specific discursive events, namely quote trees, we describe a variety of cross-cutting patterns which significantly nuance the traditional "echo chamber" narrative.

The socio-semantic assortativity of online networks is now a classical result: at the macro level, social clusters are often semantically homogeneous, exhibiting for instance similar political leanings [1, 16]; at the micro level of users, links form more frequently between semantically similar dyads [24, 7]. These observations nonetheless depend heavily on topics [4, 10] and on link types: in particular, affiliation links generally configure networks where homophily is much stronger than with interaction links [23]. On Twitter, this dichotomy separates subscriptions (followers) and (dry) retweets from mentions and replies, the latter being more cross-cutting than the former [8, 20]. By focusing on quote cascades on Twitter, i.e., rather short-lived discursive events featuring both link types (namely, quotes and retweets) in the same instance, we aim to examine the simultaneous manifestations of the affiliation/interaction dichotomy, which is normally studied in a separate or aggregate manner.

Tweet cascades, or retweet trees, have long been studied from a diffusion perspective.
Such trees are heterogeneous structurally [18] and generatively, for instance alternating broad and deep propagation dynamics [14]; their formation speed and their range depend on content type, such as true vs. false news [26]. Quote tweets, or tweets with comments, appeared more recently (2015), even though they are reminiscent of the original conversational use of retweets [5] before retweets became a proper tool on the platform. Research on quotes is still relatively sparse but confirms that they are instrumental in (possibly antagonistic) conversation rather than propagation [12, 15].

A related strand of research has questioned whether online public spaces stimulate the development of like-minded groups or foster exposure to diverse content [19, 3]. For one, beyond a commonly observed right-/left-wing biclustering, aggregate Twitter networks exhibit a mix of supportive and oppositional relationships [25] and a certain asymmetry whereby mainstream content receives much more attention from so-called "counter-publics" than the other way around [17], all of which hints at a diversity of attitudes towards cross-cutting content and interactions. As we shall see, quote cascades on Twitter gather ephemeral publics that are generally local, in terms of both time and participants. By examining the structure of cross-cutting participation in quote trees, we also aim to contribute to the study of how local online arenas of a certain political orientation attract participants affiliated with diverse political orientations.
In this regard, a series of recent results goes against the grain of the traditional "echo chamber" narrative: users appear to engage heavily with content affiliated with an opposite camp, such as commenting on YouTube videos of some opposing channel [27] or posting messages on a Reddit thread of some opposing "subreddit" [21]; more precisely, there exists a continuum of roles where users are diversely embedded in bipartisan networks, i.e., are or are not at the interface between users of opposing political affiliations [11].

In a nutshell, we aim to describe the local and largely ephemeral quote tree structure with regard to the political valence both of the original content and of the users who further participate in trees in various ways; the valence is itself computed from a network observed on a much wider temporal and topological scale, thus serving as a basemap. This enables us to distinguish a variety of cross-cutting interaction patterns and roles.

Perimeter and collection. Over the whole year of 2020, we collected all publications by French-speaking Twitter users belonging to a perimeter based on the 2019 European Parliament elections. To define this perimeter, we had previously collected all tweets containing at least one hashtag among {#EU2019, #ElectionsEuropeennes2019, #Europeennes2019, #EP2019, #Européennes2019, #electionsue19, #CetteFoisJeVote} between one month before and one month after the vote (April 26-June 28, 2019), focusing on users active in French (i.e., publishing at least 15% of their tweets in that language). We further required users to have published at least 5 tweets over this period (minimum activity) and to have more followers than the median of 195 (minimum visibility), which reduced the number of users from 39,938 to 15,919, of which 14,102 were still active in January 2020 and 13,074 in December 2020, reflecting a relatively low attrition rate given the initial focus on the 2019 elections.
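The selection criteria just described can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `UserStats` record, its field names and the `select_perimeter` helper are hypothetical, and the sketch assumes that the follower median is computed over the users who already pass the language and activity filters, which the text does not specify.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class UserStats:
    """Hypothetical per-user counts over the 2019 collection window."""
    user_id: str
    n_tweets: int      # tweets published during the window
    n_french: int      # of which tweets detected as French
    n_followers: int


def select_perimeter(users: List[UserStats],
                     min_tweets: int = 5,
                     min_french_share: float = 0.15) -> List[UserStats]:
    """Keep users that are active, French-speaking and visible enough."""
    # Minimum activity and minimum share of French tweets.
    eligible = [
        u for u in users
        if u.n_tweets >= min_tweets
        and u.n_tweets > 0
        and u.n_french / u.n_tweets >= min_french_share
    ]
    if not eligible:
        return []
    # Assumption: the median is taken over the eligible users only.
    follower_median = median(u.n_followers for u in eligible)
    # Minimum visibility: strictly above the median follower count.
    return [u for u in eligible if u.n_followers > follower_median]
```

With real data, the two thresholds would be the values stated above (5 tweets, 15% of tweets in French), and the computed median would play the role of the reported 195 followers.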
Casual manual examination of this perimeter indicates that there are very few bots and that most well-known news sources and political figures are included, which suggests that it represents a meaningful part of the politics-related French Twitter space.

Tree size and depth. We then build all non-trivial quote trees stemming from an initial tweet, or root tweet, published in 2020. More precisely, we consider recursive cascades of quotes comprising at least one quote, restricted by construction to quotes from perimeter users and excluding quotes where a user quotes themselves. The dataset features 1.13 million trees generated by 12,462 unique users, i.e., about 90 trees per active user, following a usual heterogeneous distribution (inset of Fig. 1-left). Top users are unsurprisingly accounts of media and political figures generating in excess of 10k trees over the whole year, i.e., dozens a day. Besides, trees of more prolific users are generally larger on average and among the largest ones (violin plots in Fig. 1). Tree size also follows a heterogeneous law whereby 75% of all trees are of size 2 or 3, 90% of size 6 or less, and only 1% are larger than 30 nodes, as shown in Fig. 1-right. By definition, larger trees gather more quotes and thus represent a larger portion of the dataset in relative terms. To keep the focus on quotes and avoid an over-representation of relatively trivial and very small trees in the subsequent computations, we rather consider the coverage of the dataset in terms of tree nodes. This leads us to define thresholds for small, medium and large trees by considering respectively a coverage of 75% of all nodes (trees containing up to 17 nodes), 90% or less (up to 71 nodes), and the last decile (remaining trees, up to a maximum of 1786 nodes).

The average depth of trees, denoted as d and computed as the average distance from the root tweet over all nodes, is generally small, with more than 90% of trees having a d of 1 (Fig. 1-right), indicating the absence of secondary quotes, or quotes of quotes. Fewer than 2% of trees feature d > 1.5 (majority of secondary quotes) and fewer than 3% of nodes belong to such trees. On the whole, depth is a relatively rare phenomenon, as shown by the exponentially decreasing number of chains reaching a certain depth over all trees (solid line in Fig. 2). Furthermore, deeper chains correspond to ping-pongs between two individuals (A-B-A-B...) rather than iterative quoting between distinct users (A-B-C-D...): to show this, we plot the number of distinct quoters in a chain, as a function of its maximal depth, focusing on terminal subchains of a given length w. In [...]

Political valence of users. We define the likely political position of users by estimating their so-called "Ideal Point" (IP), a technique first introduced to infer a unidimensional political valence of lawmakers from the set of bills they support [22] and more recently applied to Twitter users based on the set of accounts they follow [2]. This method relies on the manual attribution of a fixed valence to a small subset of bootstrap users, or "elites", from which positions are computed for the whole dataset along affiliation links. We use here the set constructed by [6], comprising 2,013 elites of the French political realm. We then collected the follower set for all users of our dataset (as of January 2021). We were eventually able to compute the IP value of 9,815 users who follow at least 10 elites, which provides enough information for the IP estimation. We observe in Fig. 3 that IPs may roughly be broken down into three ranges, each gathering a third of the density: markedly negative values, where IP < −1/3; somewhat central values around 0, where IP ∈ [−1/3, 1/3]; and markedly positive values, where IP > 1/3.
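As a small illustration of this three-way split, the following Python sketch assigns an IP value to one of the three ranges and computes the share of users in each. The function names and range labels are illustrative assumptions; the only elements taken from the text are the bounds at −1/3 and 1/3, with the central range treated as closed.

```python
from collections import Counter
from typing import Dict, Iterable

IP_BOUND = 1 / 3  # bound between central and marked values, as stated in the text


def ip_range(ip: float, bound: float = IP_BOUND) -> str:
    """Return 'negative', 'central' or 'positive' for a given IP value."""
    if ip < -bound:
        return "negative"
    if ip > bound:
        return "positive"
    return "central"  # closed interval [-bound, bound]


def range_shares(ips: Iterable[float]) -> Dict[str, float]:
    """Fraction of users falling in each of the three IP ranges."""
    counts = Counter(ip_range(ip) for ip in ips)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts.get(name, 0) / total
            for name in ("negative", "central", "positive")}
```

Applied to the estimated IPs of the dataset, `range_shares` would be expected to return values close to one third for each range if the description of Fig. 3 holds.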
For users whose political affiliation is explicitly known, these three ranges match what is usually considered as left-wing, center and right-wing, respectively; such users are also well represented in our dataset, which further confirms our good coverage of the political space. For instance, all users who are explicitly members of PS (Parti Socialiste, left-wing) have an IP below 0, with an average around −1, while all members of LR (Les Républicains, right-wing) have an IP above 0, with an average around +1. Without entering into a debate concerning the relevance of political labels based on unidimensional values, we deem IPs to be a sufficient proxy to characterize the relative political positions of users generating and participating in quote trees.

To describe quote trees in relation to the political valence of their root author, we now exclusively focus on the 699k trees whose root tweet user has a known IP, denoted ρ. This amounts to about two thirds of all trees. Relative to user IPs, the distribution of ρ over trees favors central and, to a lesser extent, right-tilted values (essentially close to +1). More precisely, half of tree roots stem from the third of users with a central IP, while about 20% stem from users with a markedly negative IP (left), and 30% from users with a markedly positive IP (right, with the same peak around +1). Upon casual examination, the 30 top accounts generating the most trees, which are thus also larger, belong mainly to mainstream media organizations and, to a lesser extent, centrist political figures.

General features. We first consider the relationship between size, average depth d, and root IP ρ. Results are summarized in Fig. 4. The left panel shows the distribution of ρ for the three tree size ranges. The largest trees are more often generated by central IP users.
The right panels show heat maps for each IP range, which reveal three areas of interest, in decreasing order of density: (1) shallow, small- and medium-sized trees, by far the most frequent ones over the whole spectrum; (2) medium- to large-sized and moderately deep trees, which seem to be more often generated by central IP users; (3) small yet deep trees, whose roots more often come from IP-positive users when focusing on the 1% deepest trees (indicative of a narrow reach with a strong tendency towards long chains).

First-order layer. Based on the above, we contend that focusing on the first two layers (i.e., primary and secondary quotes) captures most of the content framing behavior. We examine the average IP of the first layer of quotes, denoted as Q, with respect to the root user's IP value ρ. We also consider R, the average IP of so-called "dry" retweets of the root tweet, i.e., retweets without quoting, which we deem a proxy of its political position: retweets indeed correspond to the audience of users who plainly forward with no further framing. On the whole, we observe in Fig. 5 that R generally follows ρ. Average IP values of quoting users, by contrast, tend to diverge from both R and ρ when ρ is not central, all the more so for large trees, as indicated in the three small panels. In other words, tweets at the root of large trees generate quotes, or framing instances, from the whole political spectrum, irrespective of the position of their retweeting audience, while smaller trees exhibit a narrower spectrum of quoting reactions, closer to ρ and thus to both the root and R. Note that the standard deviations of R and Q, not shown here, are relatively constant across these spectrums (around 0.65), indicating some amount of variability around each average. Figure 6 further characterizes this divergence between quotes and retweets.
We compare Q − R with: the root IP ρ (i.e., the difference between the two curves of the previous figure), the average IP of retweeters R, and the offset between retweets and the root IP, R − ρ. This last quantity indicates how far the retweeting population of a root tweet is from the (constant) IP of the root user. We observe first that the divergence Q − R goes, on average again, in a direction opposed to the IP value considered on the x-axis, be it ρ, R or R − ρ. For instance, divergences are increasingly negative for positive root, retweet and offset IP, and the other way around. They also remain under the y = −x line: the magnitude of this "backlash" is thus smaller than the initial shift from the center (IP = 0). Put simply, if a root tweet is tweeted or retweeted by the left, on average, it is still going to be quoted, on average, by the left, but less so. Second, magnitudes of Q − R are larger when compared with R, and even more so with R − ρ, than with ρ: they are stronger when a root tweet attracts retweets from non-central users as well as from users off the "baseline" IP of the root. The small middle panel in Fig. 6 is illustrative in this regard: it focuses on trees produced by central users (−1/3 ≤ ρ ≤ 1/3), whose discrepancy Q − R is in aggregate close to 0 (as per Fig. 5). Yet, even for these root tweets from central users, Q − R grows as R or R − ρ diverges from 0. To summarize, first-layer quotes diverge more from retweets in larger trees and when root tweet users are non-central, and even more so when average retweeters are non-central or unusually off the root IP. Here again, standard deviations stay around 0.65 for all curves, which nonetheless indicates a varied constellation of situations.
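The per-tree quantities compared above (R, Q, Q − R and R − ρ) can be computed as in the following minimal Python sketch. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, users without an estimated IP are represented by `None` and skipped, and a tree lacking any retweeter or first-layer quoter with a known IP yields no result.

```python
from statistics import mean
from typing import Dict, Optional, Sequence


def mean_ip(ips: Sequence[Optional[float]]) -> Optional[float]:
    """Average IP over users with a known IP (None entries are ignored)."""
    known = [ip for ip in ips if ip is not None]
    return mean(known) if known else None


def first_layer_divergence(root_ip: float,
                           retweeter_ips: Sequence[Optional[float]],
                           quoter_ips: Sequence[Optional[float]]
                           ) -> Optional[Dict[str, float]]:
    """Compute R, Q, Q - R and R - rho for a single quote tree.

    root_ip       -- IP of the root tweet's author (rho)
    retweeter_ips -- IPs of users who dry-retweeted the root tweet
    quoter_ips    -- IPs of users who directly quoted the root tweet
    """
    r = mean_ip(retweeter_ips)
    q = mean_ip(quoter_ips)
    if r is None or q is None:
        return None  # assumption: such trees are left out of the comparison
    return {
        "R": r,
        "Q": q,
        "Q_minus_R": q - r,       # divergence of quotes from retweets
        "R_minus_rho": r - root_ip,  # offset of retweeters from the root user
    }
```

For a root with ρ = 1.0, retweeters at 1.0 and 0.5 and quoters at 0.5, −0.5 and 0.75, the sketch gives R = 0.75, Q = 0.25 and Q − R = −0.5: a negative divergence for a positive root, smaller in magnitude than ρ itself, which is the pattern described above.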
Assuming that retweets are, on average, rather concentrated around the same IP as the root tweet user, these observations configure an instantaneous, low-level dynamics at the level of individual trees, where quotes are all the more off the "baseline" of the retweeting population as this population is off the root user value. This makes it possible to hypothesize that quotes feature local counter-publics of users who come from a distinct set of IP positions and intervene to frame the original content. We thus turn to users.

User-centric patterns. Several of these tree-centric observations hold from a user-centric perspective. Figure 7 confirms that users retweet roots roughly along their own IP on average, albeit less so for extreme users (which may partly be explained by artefactual reasons: e.g., users of IP > 2 do not have much content to retweet on their right). Quotes are however more diverse and, as a result, the divergence Q − R is not flat and is higher in absolute value for non-central users. The bottom left heat map illustrates the former point, the bottom central heat map the latter one. Moreover, the second heat map underlines a higher spread of Q − R for non-central users, some of them exhibiting an average divergence close to 0 (quoting the same material they would retweet), others exhibiting a high average divergence. This hints not only at the existence of various roles, but also at the higher spread of these roles further from the center. Violin plots in Fig. 7-right support this interpretation: the top (respectively bottom) quartile for users with a negative (respectively positive) IP is above (respectively below) 0. Although beyond the scope of this paper, it would be most interesting to examine qualitatively who these users are, both from the content they publish and from interviews, to contrast their interest in participating in an online public space.

Quotes of quotes: toward the deeper layer.
While primary quotes tend to go against the polarity of the initial root tweet (all the more so for non-central roots), in relative terms and all other things being equal, it is unclear whether these dynamics persist deeper in the tree and, for one, whether secondary quotes are made by users whose IP is more aligned with that of primary quoters or not (toward, or away from, the root tweet user). To shed light on this issue, we simply compare the discrepancy between a primary quoter's IP and the root's IP, D1 − ρ, with the discrepancy between that primary quoter and their secondary quoters, D2 − D1. We observe in Fig. 8 that secondary quotes tend to turn the tide, i.e., they stem from users yet again closer to the root, all the more so when the primary quoter's IP diverges from the root. In other words, if a quoter is more to the left than the root, a secondary quoter is going to be more to the right than the quoter, in a sort of back-and-forth movement. The amplitude of this movement is however smaller at the second level, as if the shift of secondary quotes were damped: the IP value of the second quoter is, on average, less far from the first quoter than the first quoter is from the root. Interestingly, the direction of this movement is non-monotonic for the largest trees, where second quoters are further in the same direction as the first quoters for small discrepancies (D1 − ρ), while this trend is reversed for larger discrepancies (rightmost panel of Fig. 8). Put differently, for root tweets generating the largest numbers of quotes, there appear to be two types of quotes: those originating from quoters close to the root IP, which further attract second quoters roughly of the same polarity, and those originating from quoters that are farther away, which attract second quoters in the opposite direction.

Fig. 8. Comparison of the discrepancy between a primary quoter's IP D1 and the root IP ρ (x-axis) and the average discrepancy between IPs of secondary quoters and that of their immediate parent D1 (y-axis). Panels: breakdown by tree size ranges.

Qualitative homogeneity of some framing practices. We finally qualitatively illustrate one of our findings on the behavior of Q − R by studying in more detail a handful of trees related to the above-mentioned example in the small middle panel of Fig. 6, i.e., with a central ρ close to 0. We focus on large trees, to ensure a meaningful qualitative analysis, and on keywords related to the main political measures to curb the Covid-19 crisis in France (admittedly one of the most debated issues in 2020), to ensure comparability among trees. To exemplify each region of this graph, we arbitrarily select three trees whose R is respectively negative, close to zero, and positive. They respectively deal with (1) lockdown lifting (R = −0.63, Q = −0.48), (2) mask mandates (R = 0.11, Q = −0.22), and (3) vaccination (R = 0.80, Q = 0.30); see Fig. 9. Overall, as expected, we observe participation from the whole spectrum. We specifically compare quotes from cross-cutting users with those from non-cross-cutting users, i.e., users who intervene on roots that are primarily retweeted by users from the opposite vs. the same side.

As noted above, quotes are framing operations and we naturally use the notion of "frames" to qualitatively detail their nature. A frame is defined as a rhetorical device to recontextualize root tweet issues through the lens of a certain perspective, including normative judgments [9]. We build frame categories using an inductive hand-coding approach typical of "grounded theory" [13], looking for semantic similarities among quotes of a given tree.
We then grouped similar claims and detected 7 frame categories for each tree, which are also quite recurrent across trees, plus an eighth category, "other", which regroups rare and isolated frames. Most frames aim at criticizing officials and their abilities to curb the crisis. Categories differ essentially in the form of that criticism, ranging from concrete expectations (frame A), via expressed mistrust related to incompetent communication (frame B), to allegations of selfishness and malice (frame C), plain protest (frame E) or even insults and mockery (frame F). Table 1 shows the breakdown of frame categories for each of the three trees and each user political valence/color. We found that all colors are generally present in all frames. Some categories are used predominantly by a specific color: for instance, frame B (incompetence) is on average more often used by blue (43%) than red (24%) or black (19%) quotes, as is, to a lesser extent, frame A (call for responsibility). By contrast, frame F (insults/mockery) tends to appear more in red (22%) than blue (14%) and black quotes (8%). Interestingly, some frames are balanced, such as frames C (us/them) and D (argumentation). Keeping in mind the small size of this preliminary exploration, and even though there are remarkable variations in the use of some frames by some color, we nonetheless hypothesize that quote frames might obey a vertical dichotomy between "us, the people" and "them, the officials" as much as a moderate horizontal dichotomy between political camps: it is as if cross-cutting interventions fulfill a relatively similar rhetorical goal.

Table 1. Frame categories for all trees and number of quotes featuring a given frame category per tree, broken down by user valence ("<" for IP < −1/3, "∼" for central IP, ">" for IP > 1/3). Percentages indicate the proportions of quotes of a given color that mention a given frame. Note: quotes using multiple frames appear several times.
[Table body not recovered. Columns: Frame category (nuances in parentheses for subtopics specific to tree [1], [2] or [3]); then, for each of Tree 1, Tree 2 and Tree 3: total, <, ∼, >. Surviving row fragment: "[2] improve crisis management, [3] test the vaccine first)".]

We differentiated affiliation and interaction links on Twitter by focusing on a specific object featuring both link types: quote trees. We showed under which conditions these ephemeral discursive events may attract a diverse public eager to frame the initial information contained in the root tweet and coming from a more or less wide spectrum of estimated political valences. In particular, assuming that retweets reflect the "baseline" audience valence of a given root tweet, we observed that the public of quoters diverges all the more from this baseline when the root tweet has a non-central valence and attracts a larger audience. Moreover, this back-and-forth movement persists in secondary quotes, albeit in an attenuated and non-monotonic manner. At first sight, these phenomena go against the "echo chamber" narrative, at least for larger "chambers" and trees. Coming back to users, we nuanced this finding by exhibiting distinct user attitudes: while some users (especially non-central ones) quote root tweets of a valence distinct from that of the tweets they normally retweet, some users do not, which is reminiscent of a behavior more akin to echo chambers. A casual yet in-depth qualitative exploration of just three trees further showed that both cross- and non-cross-cutting users nevertheless appear to rely on a small set of mildly shared frames. Put simply, cross-cutting interventions do not necessarily use cross-cutting frames. While shedding light on the formation and composition of counter-publics in reaction to content published in online social networks, our results hint at further research that would focus on specific regions of the figures presented in this paper, and qualify in more detail the position, behavior and claims of the corresponding users.

References.
[1] The political blogosphere and the
[2] Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data
[3] Social Media and Democracy: The State of the Field, Prospects for
[4] Tweeting from left to right: Is online political communication more than an echo chamber?
[5] Tweet, tweet, retweet: Conversational aspects of retweeting on Twitter
[6] Recovering the French party space from Twitter data
[7] The echo chamber effect on social media
[8] Political polarization on Twitter
[9] Framing: Toward clarification of a fractured paradigm
[10] Quantifying controversy in social media
[11] Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship
[12] Quote RTs on Twitter: Usage of the new feature for political discourse
[13] The Discovery of Grounded Theory: Strategies for Qualitative Research
[14] The structural virality of online diffusion
[15] Antagonism also flows through retweets: The impact of out-of-context quotes in opinion polarization analysis
[16] Birds of a feather tweet together: Integrating network and content analyses to examine cross-ideology exposure on Twitter
[17] Alliance of antagonism: counterpublics and polarization in online climate change communication
[18] What is Twitter
[19] Happy accidents: Deliberation and online exposure to opposing views
[20] When politicians talk: Assessing online conversational practices of political parties on Twitter
[21] No echo in the chambers of political interactions on Reddit
[22] Spatial models of parliamentary voting
[23] Social and semantic coevolution in knowledge networks
[24] Folks in folksonomies: Social link prediction from shared metadata
[25] Of echo chambers and contrarian clubs
[26] The spread of true and false news online
[27] Cross-partisan discussions on YouTube

Acknowledgments. We are grateful to Telmo Menezes and Katharina Tittel for contributing to define the user perimeter and subsequently collect Twitter data. This work was supported by the "Socsemics" Consolidator grant from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 772743).