title: Narrow Bracketing in Work Choices
authors: Fallucchi, Francesco; Kaufmann, Marc
date: 2021-01-12

Many important economic outcomes result from cumulative effects of smaller choices, so the best outcomes require accounting for other choices at each decision point. We document narrow bracketing, the neglect of such accounting, in work choices in a pre-registered experiment on MTurk: bracketing changes average willingness to work by 13-28%. In our experiment, broad bracketing is so simple to implement that narrow bracketing cannot possibly be due to optimal conservation of cognitive resources, so it must be suboptimal. We jointly estimate disutility of work and bracketing, finding gender differences in convexity of disutility, but not in bracketing.

The estimate for bracketing in work is merely noisier than for work and money combined, but rejects broad bracketing. Unlike direct comparisons of reservation wages, direct estimates of the degree of bracketing separate the impact of bracketing on reservation wages from the impact of preferences by estimating the two jointly. Joint estimation finds no gender differences in the degree of bracketing, showing that the large gender differences in reservation wages by treatment can be explained by gender differences in work preferences.

In a follow-up experiment aimed at reducing the impact of narrow bracketing, we run two treatments that differ from NARROW by describing additional sequences as being done "before" or "after" the 15 required tasks. Assuming convex disutility of work, we expected participants in the BEFORE treatment to think of the earlier, hence easier, tasks and participants in the AFTER treatment to think of later, hence harder, tasks, although both treatments may lead to broad bracketing by drawing attention to the required tasks.
We therefore predicted that reservation wages in BEFORE would be more similar to narrow bracketing, while AFTER would be closer to broad bracketing. The reservation wages are directionally in line with this, but the differences are at best statistically significant at the 10% level. For example, in one setting they are $2.70 in NARROW, $2.72 in BEFORE, and $2.96 in AFTER.

In Section 5, we identify narrow bracketing as a suboptimal mistake, ruling out alternative mechanisms. We consider models of optimal allocation of cognitive resources; differences of information; strategic or motivated bracketing; and preferences for bracketing. Since we keep information and the overall outcome sets constant across several treatments, and since broad bracketing requires merely adding two workload and two payment numbers (noting that 15 + 15 = 30 and $2 + $4.50 = $6.50), we conclude that narrow bracketing in our experiment is suboptimal. We also discuss how framing never provides an explanation for bracketing, since it can be applied broadly or narrowly.

We conclude with a brief discussion of the challenge of jointly estimating bracketing and preferences outside the lab, and ways to move towards joint estimation in field data. Taking narrow bracketing into account can improve estimates and predictions when policies affect both overall incentives and how people bracket these incentives.

In this section, we formalize narrow and broad bracketing in choices, similar to Read et al. (1999), Rabin and Weizsäcker (2009), and Ellis and Freeman (2020). We show that narrow and broad bracketing are unidentified if a person maximizes context-independent additive preferences over all options, such as expected utility, constant-absolute-risk-aversion, or linear utility preferences, depending on how rich the choice set is.
Conditional on choices that are not compatible with such context-independent additive preferences, bracketing can be identified, and we describe our strategy for doing so.

Let us describe the setup. A person makes two choices, \(X\) from the choice set \(\mathcal{X}\) and \(Y\) from the choice set \(\mathcal{Y}\), with \(\mathcal{X}, \mathcal{Y} \subset \mathbb{X}\) for some choice space \(\mathbb{X}\). We assume that \(\mathbb{X}\) is an additive space, so that for any \(X\) and \(Y\) in \(\mathbb{X}\), we also have \(X + Y \in \mathbb{X}\). These two choices together determine the full outcome \(O = X + Y\). Then we can write the set of total outcomes from which the person actually chooses as

\[ \mathcal{O} = \mathcal{X} + \mathcal{Y} = \{X + Y : X \in \mathcal{X},\ Y \in \mathcal{Y}\}. \]

We assume that the utility \(U(\cdot)\) depends only on total outcomes, which in the experiment correspond to total effort and total earnings. Thus we can write \(U(X, Y) = U(X + Y)\). Hence a broad bracketer combines all decisions and maximizes the following:

\[ \max_{X \in \mathcal{X},\ Y \in \mathcal{Y}} U(X + Y) \tag{1} \]

A narrow bracketer instead is a person who makes each choice in isolation, by which we mean that they maximize some (possibly different) function \(V(\cdot)\) over the additional bundles in \(\mathcal{X}\), rather than over total outcomes. Formally, they maximize the following: [1]

\[ \max_{X \in \mathcal{X}} V(X) \tag{2} \]

Based on this, we can now define narrow and broad bracketing. [2]

Definition 1. Denote by \(O(\mathcal{X}, \mathcal{Y})\) the overall choice made by a person who faces two separate choices from choice sets \(\mathcal{X}\) and \(\mathcal{Y}\). Let \(C_1(\mathcal{X}, \mathcal{Y})\) be the choice made from \(\mathcal{X}\) when the other choice set is \(\mathcal{Y}\), and \(C_2(\mathcal{X}, \mathcal{Y})\) be the choice made from \(\mathcal{Y}\) when the other choice set is \(\mathcal{X}\), so that \(C_1(\mathcal{X}, \mathcal{Y}) = C_2(\mathcal{Y}, \mathcal{X})\). Then \(O(\mathcal{X}, \mathcal{Y}) = C_1(\mathcal{X}, \mathcal{Y}) + C_2(\mathcal{X}, \mathcal{Y})\). We say that a person brackets broadly if \(O(\mathcal{X}, \mathcal{Y}) = O(\mathcal{X}', \mathcal{Y}')\) for all choice sets \(\mathcal{X}\), \(\mathcal{X}'\), \(\mathcal{Y}\), and \(\mathcal{Y}'\) with \(\mathcal{X} + \mathcal{Y} = \mathcal{X}' + \mathcal{Y}'\). We say that a person brackets narrowly if \(C_1(\mathcal{X}, \mathcal{Y}) = C_1(\mathcal{X}, \mathcal{Y}')\) for all choice sets \(\mathcal{Y}\) and \(\mathcal{Y}'\).

In our experiment, the second 'choice' set \(\mathcal{Y} = \{Y\}\) is a singleton: it is an endowment of required work and money that gets added to their choice \(X \in \mathcal{X}\).
We describe our specific treatments in Section 3, but conceptually we test for broad and narrow bracketing as follows: [3]

Hypothesis (Broad Bracketing). Behavior is consistent with broad bracketing if \(O(\mathcal{X}, \mathcal{Y}) = O(\mathcal{X}', \mathcal{Y}')\) for all \(\mathcal{X}\), \(\mathcal{X}'\), \(\mathcal{Y}\), and \(\mathcal{Y}'\) s.t. \(\mathcal{X} + \mathcal{Y} = \mathcal{X}' + \mathcal{Y}'\).

Hypothesis (Narrow Bracketing). Behavior is consistent with narrow bracketing if \(C_1(\mathcal{X}, \mathcal{Y}) = C_1(\mathcal{X}, \mathcal{Y}')\) for all \(\mathcal{Y}\) and \(\mathcal{Y}'\).

[1] Recent work by Vorjohann (2020) suggests a specific form for \(V(\cdot)\) in risky choices over multi-dimensional goods, with one reference point per dimension. If the reference points can depend on other choice sets, then this is not covered by our formalization.
[2] This definition extends naturally to more choices.
[3] These hypotheses naturally extend to more general choice sets \(\mathcal{Y}\).

We now turn to the question of identifiability, since it is possible for behavior to be consistent with both broad and narrow bracketing. Our Proposition 1 characterizes those preferences that are consistent with both broad and narrow bracketing: preferences for which narrow bracketing and broad bracketing always lead to identical choices, no matter which choice sets we offer the person. The proposition highlights the central role played by additivity across preference domains, that is, \(M(X + Y) = M(X) + M(Y)\) for all \(X, Y\), where \(M(\cdot)\) is the money metric. Notice that this assumption is usually much stronger than, but implies, additive separability, which requires that additivity holds for bundles across different goods, periods, or dimensions, but not necessarily within each good. In fact, it is well known that any real-valued continuous function satisfying this additive functional equation (called Cauchy's functional equation) must be linear, showing how strong this condition is.
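The linearity claim can be checked in a few lines. The following is a sketch of the standard argument for the one-dimensional case, assuming \(M\) is a continuous function on the real line (a simplification of the paper's bundle setting, used here only for illustration):

```latex
\begin{align*}
M(0) &= M(0 + 0) = 2M(0) \;\Rightarrow\; M(0) = 0, \\
M(nx) &= nM(x) \quad \text{for } n \in \mathbb{N}, \text{ by induction on } M(x + y) = M(x) + M(y), \\
M(-x) &= -M(x) \quad \text{since } M(x) + M(-x) = M(0) = 0, \\
M\!\left(\tfrac{p}{q}x\right) &= \tfrac{p}{q}M(x) \quad \text{since } qM\!\left(\tfrac{p}{q}x\right) = M(px) = pM(x), \\
M(tx) &= tM(x) \quad \text{for all } t \in \mathbb{R}, \text{ by continuity, so } M(x) = M(1)\,x .
\end{align*}
```

The last step is where continuity matters: additivity alone pins down \(M\) only on rational multiples.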
Two special cases of this result are the result of Rabin and Weizsäcker (2009), who show that CARA preferences imply no costs to bracketing under expected utility, and the result (contemporaneous with ours) of Ellis and Freeman (2020) that bracketing is unidentified under linear utility over multiple goods. Our proposition unifies and extends these results as special instances of a general additivity property that holds for arbitrary choice settings (assuming a numeraire). Under different constraints on the choice sets that we observe, additivity of the money metric over these choice sets will lead to different restrictions on the underlying preferences. For example, if the setting is such that people make two choices, one over work and money, and one over food and money, then the first choice determines all the work and the second choice determines all the food. In that case, additivity in such a choice setting implies that the utility can be represented via a utility function that is separable, but not necessarily linear, in work and food, and linear in money.

In our experiment with two goods (work and money), our identifying assumption is that either we have context-dependent preferences, or we have non-linear preferences in money and work. To illustrate why linear utilities are a problem, suppose a person is always willing to do exactly one extra task for one extra dollar or more, but never for less. Then their choice from one choice set is not affected by how many tasks they choose in the other choice set, so broad and narrow bracketing lead to identical choices. Both convexity of effort costs and comparison effects (such as Tversky and Simonson (1993); Kőszegi and Szeidl (2012); Bordalo et al. (2013); Bushong et al. (2020)) will ensure that the identification assumption holds. We primarily target increasing tiredness by choosing a reasonably large difference in workload.
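The identification problem can be made concrete with a minimal sketch. It assumes quasi-linear utility (money minus an effort cost), which is an illustrative assumption rather than the paper's specification. The cost functions and their parameters are hypothetical. A narrow bracketer is modelled as pricing the 15 extra tasks as if starting from zero, a broad bracketer as pricing them on top of the 15 required tasks.

```python
def reservation_wage(cost, extra_tasks, baseline_tasks):
    """Smallest payment that compensates for `extra_tasks` on top of `baseline_tasks`,
    assuming utility = money - cost(total tasks) (illustrative assumption)."""
    return cost(baseline_tasks + extra_tasks) - cost(baseline_tasks)


def narrow_and_broad(cost, extra_tasks=15, required_tasks=15):
    """Reservation wages of a narrow bracketer (ignores the required tasks)
    and of a broad bracketer (adds them to the choice)."""
    narrow = reservation_wage(cost, extra_tasks, 0)
    broad = reservation_wage(cost, extra_tasks, required_tasks)
    return narrow, broad


def linear_cost(tasks):
    # Hypothetical: every task costs the same 10 cents of disutility.
    return 0.10 * tasks


def convex_cost(tasks):
    # Hypothetical: later tasks are more tiring than earlier ones.
    return 0.005 * tasks ** 2


if __name__ == "__main__":
    for name, cost in [("linear", linear_cost), ("convex", convex_cost)]:
        narrow, broad = narrow_and_broad(cost)
        print(f"{name}: narrow={narrow:.3f}, broad={broad:.3f}")
```

Under the linear cost the two reservation wages coincide, so observed choices cannot distinguish the two types of bracketing. Under the convex cost the broad bracketer demands more, which is the variation the design relies on.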
We conducted two sets of pre-registered experiments online, using the software Lioness (Giamattei et al., 2020) . In this section, we first describe our main experiment conducted from December 2019 to January 2020, and our hypotheses for bracketing in real effort choices with detailed instructions in Appendix B. We then highlight some implementation details that were not fixed in or mildly changed from our pre-registration. We finally describe our follow-up experiment conducted in March and August 2020, aimed at reducing the potential impact of narrow bracketing. The main experiment consists of four parts. Part 1: Tutorial Subjects familiarize themselves with the task by completing three tasks correctly which consist of decoding a sequence of twelve letters into numbers, as seen in Figure 1 . Every sequence is a new table showing a mapping of ten randomly chosen letters to the numbers between 0 and 9. After every attempt we generate a new table, independently of whether the answer was correct or not. This makes it harder for subjects to learn the task than more commonly used encryption or typing tasks (e.g. Erkal et al., 2011; De Quidt et al., 2017; De Quidt, 2018) . This should increase the convexity of costs, which is our primary channel for satisfying our identifiability assumptions. Part 2: Elicit Tediousness We elicit participants' perceived tediousness of the task on a scale from 1 ("not tedious at all") to 10 ("extremely tedious"), providing us a control common to all treatments. Part 3: Elicit Reservation Wages by Treatment We elicit participants' reservation wage for a high-work option that requires 15 more tasks than the low-work option, depending on treatments. The four treatments (described below) vary participants' endowment of tasks and money and how this endowment is presented. There are two scenarios with Scenario 1 always presented before Scenario 2. 
We elicit one reservation wage for each scenario through an incentivized price list task, where participants choose to accept or reject the extra work for a list of extra reservation wages between $0.25 and $4.00 in $0.25 increments. The extra wages are in addition to the $4.00 participants receive for the alternative workload. 4 We remind subjects that a single binding choice from one of the two scenarios will be chosen at random. We randomly determine the binding choice and inform subjects about the total payment and number of sequences to decode. Subjects can complete the tasks without time constraints. We then ask a short demographic questionnaire and display a summary of total earnings. Our main experiment consists of four treatments: BROAD, NARROW, LOW, and PARTIAL. Let us illustrate implementation and wording of treatments using Scenario 1. For the three treatments BROAD, NARROW, and PARTIAL, the total outcomes -choice list option plus endowment -are identical; but some of the work and money is shifted from the choice list options to the endowment. Treatment LOW instead has identical choice list wording to NARROW, but no endowment of work. The following list shows how the wording of choice list items changes between treatments for scenario 1. Payments M always start from $4.25 and go up to $8, and participants choose between OPTION A and OPTION B: Table 1 summarises the choice, endowment, and overall outcome for each treatment and each scenario. In the notation of Section 2, we denote by X the set of options as presented in the isolated choice; by Y the endowment of tasks and money that will be added to the choice made from X ; and by O = X + {Y } the actual set of outcomes. All of this information is visible to the participants on their choice page. In the BROAD treatment, participants are offered the payment M + 2, which includes their participation fee. In all other treatments, the payment offered is M and is on top of the $2.00 participation fee. 
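The decomposition of each treatment into a choice set and an endowment can be sketched as follows for Scenario 1. Bundles are written as (tasks, dollars). The BROAD, NARROW, and LOW entries follow the description above. The PARTIAL entry is an assumption based on the description that PARTIAL combines the work but not the money dimension. The function names are hypothetical.

```python
def add(bundle, endowment):
    """Add an endowment (tasks, dollars) to a chosen bundle (tasks, dollars)."""
    return (bundle[0] + endowment[0], bundle[1] + endowment[1])


def scenario1(treatment, M):
    """Return (choice_set, endowment) for Scenario 1.

    M is the payment shown in the price list ($4.25 to $8.00)."""
    table = {
        # Everything, including the $2 participation fee, is in the choice itself.
        "BROAD": ([(30, M + 2), (15, 6.0)], (0, 0.0)),
        # Assumption: work is combined in the choice, the $2 fee is kept separate.
        "PARTIAL": ([(30, M), (15, 4.0)], (0, 2.0)),
        # 15 required tasks and the $2 fee are shifted to the endowment.
        "NARROW": ([(15, M), (0, 4.0)], (15, 2.0)),
        # Same choice wording as NARROW, but no endowment of work.
        "LOW": ([(15, M), (0, 4.0)], (0, 2.0)),
    }
    return table[treatment]


def full_outcomes(treatment, M):
    """Total (tasks, dollars) for the high-work and low-work option."""
    choice_set, endowment = scenario1(treatment, M)
    return [add(bundle, endowment) for bundle in choice_set]


if __name__ == "__main__":
    for t in ["BROAD", "PARTIAL", "NARROW", "LOW"]:
        print(t, full_outcomes(t, 5.25))
```

Under these entries BROAD, PARTIAL, and NARROW yield the same full outcomes for every M, while LOW yields 15 fewer tasks for the same money, which is what lets LOW serve as the narrow-bracketing benchmark.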
Across all treatments and scenarios, M ranges from $4.25 to $8.00 in steps of $0.25. Participants in the NARROW, PARTIAL, and BROAD treatments must do at least 15 tasks no matter what they choose, which is why we set their lowest additional payment to $4.00, and why the higher workload receives at least $4.25. Since LOW has the same choice set \(\mathcal{X}\) as NARROW, participants in the LOW treatment can receive $4.00 for no work in Scenario 1.

Treatment   Choice Set S1 (X_1)           Endowment S1 (Y_1)   Full Outcome S1
BROAD       (30, $(M + 2)) vs (15, $6)    (0, 0)               (30, $(M + 2)) vs (15, $6)

Hypothesis 2 (Narrow Bracketing). Behavior is consistent with narrow bracketing if \(m_{NARROW} = m_{LOW}\).

As we discussed in Section 2, we need an identification assumption, which in our case is that \(m_{LOW} \neq m_{BROAD}\). If they are equal, we cannot rule out linear (population) preferences: that participants are as willing to do 15 additional sequences on top of 0 sequences as they are to do them on top of 15 sequences.

We recruited a total of 929 subjects on Amazon Mechanical Turk between the end of December 2019 and the beginning of January 2020. In Table 2 we report a summary of demographics. Of the subjects recruited, 162 did not complete the experiment. While attrition might itself indicate narrow bracketing, we see no evidence of differential attrition by treatment across different parts of the study (see Table 9 in Appendix C). Across all treatments, between half and two-thirds of attritors drop out before completing the practice tasks; the rest essentially drop out after finding out how many tasks they have to do in total. Treatments are similar in terms of gender composition (χ² p-value: 0.91), while participants are slightly older in the BROAD treatment compared to other treatments (37.8 years vs 35.0-36.0, χ² p-value: 0.01). Finally, individuals rate the task on average as 7.33-7.54 out of 10 in tediousness, which does not significantly vary across treatments (χ² p-value: 0.86).
Roughly 25% of the choices made in the choice list are inconsistent: in a few cases subjects make only one inconsistent choice, while in other cases choices are inconsistent throughout the list of wages offered. In our main analysis we drop a scenario if individuals make one or more inconsistent choices in it. To detect an effect size of 0.40, which we consider economically meaningful, at a 5% level of significance with 90% power, we would need 174 observations in NARROW and 116 in each of the PARTIAL and BROAD treatments. The numbers of observations collected with consistent choices are above these thresholds and therefore considered sufficient for our treatment comparisons, although, as the discussion on identification makes clear, the effect size decreases as preferences become more linear. [6] Participants earned $7.30 on average, for an average working time of 35 minutes.

The careful reader will have noticed that we have more participants in the NARROW treatment than in other treatments. This is because in our initial version of the NARROW treatment, we mistakenly informed participants of their endowment on the page before the first choice, which we fixed in a later version by informing participants of their endowment on the first choice page only. Early information provision might lead to some (fast) reference effects not present in other treatments. In order to have enough observations of each subtreatment, we collected more data. As we show in Appendix C, our results are robust to choosing the early, late, or pooled NARROW sample.

Moreover, in our pre-registration we had planned on using three treatments with equal total outcomes, one with no baseline tasks, one with 8 baseline tasks, and one with 16. We realized, however, that the middle treatment of 8 baseline tasks would add little value and cost us more, so we dropped it and went with 15 instead of 16 tasks.
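The consistency rule and the resulting reservation wage can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that "consistent" means accepting every wage above the first accepted one, and that the reservation wage is the smallest accepted extra wage. Participants who reject every row are coded at $4.25, matching the right-censoring point mentioned in the analysis. The function names are hypothetical.

```python
# Extra wages offered in the price list: $0.25, $0.50, ..., $4.00.
WAGES = [0.25 * k for k in range(1, 17)]


def is_consistent(accepts):
    """True if, once an extra wage is accepted, all higher wages are accepted too.

    `accepts` is a list of booleans ordered from the lowest to the highest wage.
    (Assumed definition of consistency: at most one switch, from reject to accept.)"""
    return all((not lower) or higher for lower, higher in zip(accepts, accepts[1:]))


def reservation_wage(accepts, wages=WAGES):
    """Smallest accepted extra wage, or None if the choices are inconsistent.

    Participants who reject every offer are coded one step above the list
    (here $4.25), i.e. as right-censored observations."""
    if not is_consistent(accepts):
        return None
    for wage, accepted in zip(wages, accepts):
        if accepted:
            return wage
    return wages[-1] + 0.25


if __name__ == "__main__":
    switch_at_two_dollars = [False] * 7 + [True] * 9
    print(reservation_wage(switch_at_two_dollars))
```

Scenarios for which the function returns None correspond to the ones dropped from the main analysis.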
7 When fixing the NARROW treatment, we realized however that instead of having an intermediate number of tasks, we could broadly frame the task but not the money dimension, which led to PARTIAL as a replacement for the intermediate treatment. This PARTIAL treatment allows us to test directly for narrow bracketing in the money or work dimension separately which seems a useful addition. While in line with our pre-registration, it is worth pointing out that this treatment was not pre-registered exactly as run. Finally, because we added PARTIAL and changed NARROW later, these observations are not balanced against observations in LOW, BROAD, and early NARROW treatments: most were collected in sessions focusing only on these treatments. We don't consider this a major issue, since restricting our analysis to the early balanced data does not meaningfully affect our results. In our follow-up study we explored if we can reduce narrow bracketing by making the increasing costs of the additional tasks more salient through a different presentation of choices. We design two new treatments, BEFORE and AFTER, that are identical to the NARROW treatment, except that we describe the effort choice to participants as extra sequences to decode before or after the mandatory sequences. We reasoned that thinking about doing additional tasks "after" the required ones would lead participants to think about higher marginal disutility. 8 Concretely, we presented choices as follows: OPTION A "0 additional sequences before (after) the 15 required for an extra $4" versus OPTION B "15 additional sequences before (after) the 15 required for an extra $X" with X starting from $4.25 and up to $8. In total, 302 participants were recruited and started the HIT. 9 We report in Table 10 in Appendix C the summary statistics, compared to the earlier NARROW sessions. We observe a similar attrition rate to the other treatments. 
Overall, the composition of the sample in terms of demographics and perceived tediousness of the task is similar to the main treatments. However, we find more inconsistent choices. [10] This leads to the following natural hypotheses:

Hypothesis 3 (Debiasing through highlighting required tasks alone). Drawing attention to extra tasks reduces narrow bracketing if \(m_{BEFORE} \neq m_{NARROW}\).

Intuitively, the BEFORE phrasing might make participants realize the relevance of the required tasks. This may change their behavior, if they understand that they should add the endowment to their choice and do so.

Hypothesis 4 (Debiasing through highlighting convex cost of additional tasks). Drawing attention to the (assumed) convexity of costs reduces narrow bracketing if \(m_{AFTER} \neq m_{NARROW}\).

Intuitively, the AFTER phrasing might make people realize that they have to do tasks 16 through 30, rather than 1 through 15, even if they do not realize that they should broadly bracket. So, if it draws attention in this way, then it should lead to a larger change away from NARROW. In both cases, full debiasing requires that the reservation wage equals that in BROAD, or that in PARTIAL if it only debiases in the work dimension. These hypotheses are, however, not as tightly linked to the theory as our main hypothesis, as we need to make additional assumptions about the shape of the preferences, as well as about our treatments drawing sufficient attention to these features of the preferences.

In the analysis below we analyse the reservation wages across treatments: the smallest extra wage for which sub-

First, we report our main results, which test and estimate broad and narrow bracketing directly. We then report a non-preregistered heterogeneity analysis, as well as our pre-registered follow-up study, which was collected after COVID-19-induced lockdowns, a fact that might affect comparisons with the main treatments.
[8] Note, however, that these follow-up treatments were collected primarily after COVID-19-induced lockdowns had been put in place.
[9] The new study was conducted in two sessions in March and August 2020.
[10] These features may be explained by recent evidence about the effects of the COVID-19 pandemic on the composition of the pool of MTurkers. Moss et al. (2020) report that the demographic composition of MTurkers did not change with the pandemic. However, Arechar and Rand (2020) find that on average MTurkers became less attentive. Our findings are in line with both studies, making us cautious about the comparability of treatments. For this reason we focus exclusively on individuals whose choices are consistent.

Table 4, we reject that BROAD and NARROW are equal with a p-value of less than 0.001 and that BROAD and PARTIAL are equal with a p-value of 0.012 in Scenario 1. [12] Similar results hold for Scenario 2. Notice that the reservation wage of Scenario 2 slightly decreases (statistically remains constant) in PARTIAL, which we attribute to either genuine non-convex disutility or, more likely, to some narrowly bracketed framing effect possibly due to the order of choices.

Result 2. We fail to reject Hypothesis 2 that individuals bracket work decisions narrowly.

The same comparisons of mean reservation wages via Wilcoxon rank-sum tests fail to reject narrow bracketing, where we compare means in LOW and NARROW, which have identical choice sets, but different endowments.

[11] See C.4 in Appendix C for bar plots and kernel density plots of the raw reservation wage data by treatment and scenario.
[12] Since there are ties in our data (multiple people have the same reservation wage), the test we compute uses a normal approximation to the test statistic (which does not assume that the data is normal, but that the test statistic itself is normal).
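The rank-sum comparison with a normal approximation and tie correction can be sketched as follows. This is a textbook version written with the standard library only, not the authors' code. It uses midranks for tied values, applies no continuity correction, and the function name is hypothetical.

```python
import math
from collections import Counter


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Tied values receive midranks and the variance is corrected for ties.
    Returns (z, p_value)."""
    pooled = sorted(list(x) + list(y))
    n1, n2 = len(x), len(y)
    n = n1 + n2

    # Assign each distinct value the average of the ranks it occupies.
    midrank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1, ..., j
        i = j

    w = sum(midrank[v] for v in x)            # rank sum of the first sample
    mean_w = n1 * (n + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 1e-12:                        # all observations identical
        return 0.0, 1.0

    z = (w - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    print(rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```

With reservation wages on a $0.25 grid, ties are frequent, so the tie term in the variance is not negligible.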
In Scenario 1, the Wilcoxon rank-sum test fails to reject that the mean of $2.30 in LOW equals the mean of $2.07 in NARROW (p-value of 0.117), and in Scenario 2 it fails to reject that $2.74 in LOW equals $2.70 in NARROW (p-value of 0.704); see Table 3 for the means and Table 4 for the test p-values. [13]

One problem with comparing means by scenario is that this allows for multiple tests. While we only have two scenarios that both lead to identical results for BROAD compared to the other treatments, we would like a more generally valid test. In our pre-registration, we specified running a linear regression with a single treatment effect averaged across scenarios. This essentially averages the reservation wages across scenarios by treatment and compares them. While this test rejects that BROAD is equal to NARROW or PARTIAL (see Table 16 in Appendix C), it is conceptually the wrong test, as is illustrated by its failure to reject that NARROW and PARTIAL are equal. The reservation wage in PARTIAL is once above and once below the reservation wage in NARROW, which leads to similar average reservation wages when averaged over both scenarios. The reason this is conceptually the wrong test is made clear by Hypotheses 1 and 2, which apply at the level of full choices, which means at the level of scenarios. At the scenario level, PARTIAL and NARROW indeed are statistically significantly different in Scenario 1, but not in Scenario 2.

The general test we propose is to estimate the average degree of narrow bracketing across scenarios. Intuitively speaking, instead of averaging the reservation wages first and then estimating narrow bracketing, we would like to estimate narrow bracketing per scenario and then average this. One way to do this is to realize that we can write the reservation wage in NARROW as a convex combination as follows:

\[ m_{NARROW} = \kappa\, m_{LOW} + (1 - \kappa)\, m_{BROAD} \]

where κ is the degree of narrow bracketing. If κ = 0 we have broad bracketing, if κ = 1 we have narrow bracketing.
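As a worked step, κ can be backed out scenario by scenario. This sketch assumes the convex-combination form implied by the two endpoints above, with weight κ on the LOW mean and weight 1 − κ on the BROAD mean. The scenario superscript \(s\) is notation introduced here for illustration.

```latex
\[
m^{s}_{NARROW} = \kappa_s\, m^{s}_{LOW} + (1 - \kappa_s)\, m^{s}_{BROAD}
\quad\Longrightarrow\quad
\kappa_s = \frac{m^{s}_{NARROW} - m^{s}_{BROAD}}{m^{s}_{LOW} - m^{s}_{BROAD}},
\qquad m^{s}_{LOW} \neq m^{s}_{BROAD}.
\]
```

The denominator makes the identification assumption visible: when the LOW and BROAD means coincide, any κ fits the data equally well.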
Based on this we run the following non-linear regression:

\[ Y_{i,s,t} = B_s\, \mathbb{1}[t = BROAD] + N_s\, \mathbb{1}[t = LOW] + \big(\kappa N_s + (1 - \kappa) B_s\big)\, \mathbb{1}[t = NARROW] + \varepsilon_{i,s,t} \]

where \(Y_{i,s,t}\) is the observed reservation wage for individual i in scenario s and treatment t. We include fixed effects for reservation wage by scenario and for treatments BROAD (\(B_s\)) and LOW (\(N_s\)). The average reservation wage for the NARROW treatment in a given scenario is then determined by the convex combination of \(B_s\) and \(N_s\), with κ determining the weight on each. If we thought that there is a fraction \(\kappa_B\) of the population that brackets exactly narrowly, and the rest brackets exactly broadly, then the above would identify this \(\kappa_B\). Similarly, \(\kappa_P\) captures the fraction of people who bracket work narrowly, under the assumption that everyone brackets money narrowly. This estimation allows us to combine all scenarios, and provides the average degree of narrow bracketing. It also allows κ to fall outside the interval [0, 1], which can happen under non-convex disutility or because people engage in other types of bracketing than full broad or narrow bracketing. [14]

We estimate the above regression via non-linear least squares (NLS, Gallant (1975)) and report the results in Table 5. We estimate κ for bracketing over money and work (BROAD), as well as just work (PARTIAL). The main parameter of interest is κ, which is estimated in all specifications to be not significantly different from 1, while being significantly different from 0. In other words, we cannot rule out that participants are bracketing narrowly, while we do reject that they bracket both money and work broadly (column 1, \(\kappa_B = 1.38\), standard error 0.29) or bracket work broadly and money narrowly (column 4, \(\kappa_P = 1.53\), standard error 0.61). In order for NLS to provide credible standard errors, however, we need the means in BROAD and PARTIAL to differ sufficiently from those in LOW across all scenarios jointly, which is our identification assumption again!
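The estimation of a common κ across scenarios can be sketched in a few lines. This is an illustrative least-squares sketch, not the authors' estimation code. It assumes the mean reservation wage is \(B_s\) in BROAD, \(N_s\) in LOW, and \(\kappa N_s + (1 - \kappa) B_s\) in NARROW. For a fixed κ the model is linear in \((B_s, N_s)\), so those are solved in closed form and κ is found by a simple grid search. Standard errors are not computed. The function names, the grid bounds, and the synthetic data are hypothetical.

```python
def fit_scenario(kappa, broad, low, narrow):
    """For a fixed kappa, least-squares estimates of (B_s, N_s) and the
    sum of squared residuals within one scenario."""
    n_b, n_l, n_r = len(broad), len(low), len(narrow)
    # Normal equations of the 2x2 linear problem in (B_s, N_s).
    a11 = n_b + n_r * (1 - kappa) ** 2
    a12 = n_r * kappa * (1 - kappa)
    a22 = n_l + n_r * kappa ** 2
    b1 = sum(broad) + (1 - kappa) * sum(narrow)
    b2 = sum(low) + kappa * sum(narrow)
    det = a11 * a22 - a12 ** 2  # positive when BROAD and LOW are non-empty
    B = (a22 * b1 - a12 * b2) / det
    N = (a11 * b2 - a12 * b1) / det
    predicted_narrow = kappa * N + (1 - kappa) * B
    ssr = (sum((y - B) ** 2 for y in broad)
           + sum((y - N) ** 2 for y in low)
           + sum((y - predicted_narrow) ** 2 for y in narrow))
    return B, N, ssr


def estimate_kappa(data, lo=-1.0, hi=3.0, steps=400):
    """Grid search for the kappa that minimises the pooled sum of squared residuals.

    `data` maps scenario -> {"BROAD": [...], "LOW": [...], "NARROW": [...]}."""
    best_kappa, best_ssr = None, None
    for i in range(steps + 1):
        kappa = lo + (hi - lo) * i / steps
        ssr = sum(fit_scenario(kappa, d["BROAD"], d["LOW"], d["NARROW"])[2]
                  for d in data.values())
        if best_ssr is None or ssr < best_ssr:
            best_kappa, best_ssr = kappa, ssr
    return best_kappa


if __name__ == "__main__":
    # Synthetic, noise-free data generated with kappa = 0.8 (not the paper's data).
    synthetic = {
        1: {"BROAD": [3.0], "LOW": [2.0], "NARROW": [0.8 * 2.0 + 0.2 * 3.0]},
        2: {"BROAD": [3.5], "LOW": [3.0], "NARROW": [0.8 * 3.0 + 0.2 * 3.5]},
    }
    print(estimate_kappa(synthetic))
```

The grid deliberately extends beyond [0, 1], since the text allows κ to fall outside that interval. A real application would replace the grid with a proper NLS routine that also reports standard errors.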
Our identification assumption holds for BROAD compared to LOW (since it holds in Scenario 1), but is unlikely to hold for PARTIAL compared to LOW, given how similar they are. This is reflected in the noisy (and untrustworthy) standard errors for the PARTIAL treatment in column 4. Despite the standard errors of the NLS estimate probably being untrustworthy, the estimate of κ strongly favors narrow over broad bracketing. [15] This approach seems the most conceptually sound to us, although there are probably better econometric estimation strategies for κ than NLS. In addition to avoiding multiple tests, it allows for joint identification of bracketing and preferences, the benefit of which we highlight in the next part.

We further explore the determinants of work decisions across treatments with a set of (non-pre-registered) estimates of κ by gender, which showcases the need for joint estimation rather than relying on direct comparisons of reservation wages. Furthermore, since a substantial fraction of choices come from individuals who are never willing to decode extra sequences, our data are right-censored at $4.25, so we also estimate Tobit regressions.

The gender-specific estimates of κ for BROAD versus LOW show that there is no major difference in the degree of narrow bracketing as measured by κ. This holds despite the fact that the difference in reservation wage for women is $0.91 in Scenario 1 and $0.21 in Scenario 2 (the difference between S1 broader and S1 narrow in Table 5), while these differences are only $0.42 and $0.23 for men. This highlights our next result:

Result 3. We see no difference in bracketing by gender, but narrow bracketing is costlier for women than men due to higher convexity of disutility.

This highlights the need for direct estimation of the degree of narrow bracketing via κ, which jointly estimates the preferences (here the change in reservation wage between scenarios) and bracketing. The reason is that the more linear the preferences are, the more similar behavior under narrow and broad bracketing is, which means that the reservation wage changes less even if the person brackets narrowly. Since in our dataset women have a larger change in their reservation wage as measured by the difference between LOW and BROAD, they pay a higher cost from narrow bracketing: women's stated reservation wage differs more between BROAD and NARROW, despite equal outcomes, so that they lose a larger amount of money in one of these two treatments than men do. If instead we focused on whether the difference in reservation wage was significant and large as a proxy for narrow bracketing, we would wrongly conclude that women bracket more narrowly than men, another illustration of why focusing on reservation wages rather than κ is conceptually unhelpful.

[14] As an example of non-convex disutility, suppose that tasks 1 to 10 are unpleasant, that tasks 11 to 20 are easy due to warm-up, while tasks 21 to 30 are really unpleasant. Then a person who has to do 15 tasks but is asked, as in the NARROW treatment, to do 15 vs 0 additional tasks may well think that these 15 additional tasks are easier than the first 15 tasks if they take their baseline into account only partially. Thus they would be more willing to work than participants in LOW, who in turn are more willing to work than participants in BROAD, hence κ would fall outside the [0, 1] interval.
[15] This remains true when we restrict observations of NARROW to late observations where participants find out their baseline on their first choice page only, but the standard errors are even larger, both due to smaller sample size and due to the identification assumption failing badly for males in this subsample.

In the Tobit regressions, we regress, separately for each treatment, the reservation wage choices on age, gender, and the self-report of perceived tediousness, controlling for the scenarios.
In particular we are interested to check if there are within treatment gender differences, as already found in different settings (e.g. Koch and Nafziger, 2019). We report the estimation results in Table 6 . The Tobit regressions in Table 6 tell a similar story. Age has no clear effect on the results, while we observe a positive correlation between tediousness and the reservation wage in all treatments, although this is not always significant. Let us conclude this part by highlighting a field setting where such joint identification is almost possible and is likely to matter. In a nice study of monopsony power, Dube et al. (2020) measure labor supply elasticities on MTurk. In their estimation of employers' market power, they assume myopic workers who essentially narrowly bracket. While this assumption does not affect the reduced-form measure of labor supply elasticity, if we treat this estimate as if it was all driven by preferences rather than bracketing, this may provide the wrong predictions to responses in labor market policies and designs. Their data does not allow for our simple strategy of joint identification, since they only have a single observation per worker. But in future data collections, it should be straightforward to collect multiple observations per worker or to make and test additional assumptions about population-level bracketing and preferences to allow joint estimation. The treatments BEFORE and AFTER are identical to the NARROW treatment, except for describing additional sequences as "additional sequences before" or "additional sequences after" the 15 required tasks. We report the means by Scenario and by treatment, and the Wilcoxon tests in Tables 7 and 8 similarly to our main treatments. We compare BEFORE and AFTER with the NARROW treatment with which they are identical except for the additional highlighting of the tasks as "before" or "after" the baseline tasks. 
In both BEFORE and AFTER the extra reservation wage is higher than in NARROW, but in both cases this difference is not statistically significant (p-values > 0.091). In Appendix C, however, we show that the AFTER treatment is statistically significantly different from NARROW when we limit ourselves to those observations in NARROW that received their information about the baseline on the first choice page only, which indicates a partial success of debiasing.

Result 4. Reminding participants of additional tasks alone has no discernible effect, while emphasizing the additional costs due to convexity (suggestively) reduces the impact of narrow bracketing.

In this section, we consider alternative mechanisms that can lead to bracketing and explain why they do not explain our results. This provides some of the most conclusive evidence that narrow bracketing can be a suboptimal mistake. At the end of the section, we discuss why context and comparison effects cannot explain bracketing.

would be optimal to combine the choices, yet they decide that it isn't worth doing so given the cognitive costs they expect this would entail. This includes models of rational (in-)attention such as Lian (2018) and Kőszegi and Matějka (2020). Our results are inconsistent with these models, which rely on lack of information or cognitive costs. Lack of information is ruled out by having all choice-relevant information on the choice page, while cognitive costs would have to be implausibly large for MTurkers to 'decide' that computing 15 + 15 is too costly. Note that we keep the information identical across all treatments (with the exception of early observations in our NARROW treatment; see the discussion in Section 3.4 for details), which rules out any broadly bracketed type of reference effects, since all the information is presented in one go on the first choice page.

Strategic concerns for bracketing have been studied primarily as a means for self-control (Koch and Nafziger (2011, 2016)).
In such models, a person sets narrow goals and bears a cost from missing these goals, which can help them overcome self-control problems. While narrow goal-bracketing can lead people to respond within a given bracket, it cannot lead to different goal-bracketing in our experiment: all the possible outcomes are identical, hence the possibilities for self-control and goal-setting are also identical.

This brings us to preferences as the source of bracketing, such as the model of news utility (Kőszegi and Rabin (2009)), where people get reference-dependent utility from news about investments or gambles. If the news about a choice in one bracket is resolved separately from news in other brackets, people cannot avoid feeling the resulting news utility separately, which leads to narrow bracketing. News utility cannot explain our results, since there is a single piece of news in all treatments.

A more likely candidate for preference-based bracketing is social preferences where brackets serve as a signal of social norms or sanctions. For example, a person who is asked to split $10 between two people may split this amount equally, even if they know that one person received more money that day than the other. They may not consider it their responsibility, their duty, or their right to affect the pre-existing income difference, and this might depend on the social context in which they are asked to make the choice. First, in our experiment, this would require people expecting to be treated fairly in each choice, rather than by overall outcome or by employer, which seems unlikely given that people bear the full consequences. Second and more importantly, there is almost no response to the baseline workload, which would surely affect how fair participants perceive their workload to be.
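To make concrete why the possible outcomes are identical across presentations, the following minimal Python sketch maps a narrowly framed offer (additional sequences for an extra bonus) to the overall outcome it implies. The names `Offer` and `to_overall_outcome` are hypothetical, and the defaults of 15 required sequences and a $2.00 completion fee are taken from the experimental instructions; this illustrates the bookkeeping only and is not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A narrowly framed option (hypothetical type): extra sequences for an extra bonus."""
    additional_sequences: int
    bonus: float


def to_overall_outcome(offer, required_sequences=15, completion_fee=2.00):
    """Return (total sequences, total payment) implied by a narrowly framed offer."""
    total_sequences = required_sequences + offer.additional_sequences
    total_payment = round(completion_fee + offer.bonus, 2)
    return total_sequences, total_payment


# Example: 15 additional sequences for an extra $4.50.
narrow_offer = Offer(additional_sequences=15, bonus=4.50)
print(to_overall_outcome(narrow_offer))
```

The only computation involved is two additions (15 + 15 sequences and $2.00 + $4.50), which is the sense in which combining the choices is cognitively cheap in this design.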
Finally, by mistakes we mean anything from people not understanding that they should combine outcomes, to 'forgetting' or not realizing that they should do so in a situation, what Handel and Schwartzstein (2018) call mental gaps. Our study provides some of the most conclusive evidence of narrow bracketing as a suboptimal mistake. While we interpret Tversky and Kahneman (1981) and related narrow bracketing over gambles as mistakes, the choices involved are more complicated to combine (Rabin and Weizsäcker (2009); Ellis and Freeman (2020)) or they don't rule out preference-based explanations as readily (such as Redelmeier and Tversky (1992), if participants expect to find out separate choices separately, or Exley and Kessler (2018), which involves social preferences and thus leaves room for norms).

Context effects such as focusing and range effects, like convex costs, can be applied narrowly or broadly and thus they cannot explain why a person brackets them narrowly or broadly. Since our experiment holds full outcomes constant across most treatments, context effects should be applied equally if they are broadly bracketed, and thus not lead to different choices. Context effects and bracketing are thus complementary dimensions of how choice presentations affect decisions. Barberis et al. (2006) explore this point in detail with respect to how it is the interaction of non-expected utility and bracketing that can make sense of observed behavior over moderate-sized gambles, not non-expected utility alone. 16

In this paper, we test for and estimate broad and narrow bracketing in work choices. We characterize the identification assumption needed and reject that workers on Amazon Mechanical Turk bracket broadly, since they seem to ignore required baseline tasks. Our experiment rules out optimal conservation of cognitive resources, information, and preferences as alternative mechanisms for our results, which leaves suboptimal mistakes as the candidate explanation.
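The point of Result 3, that the same degree of bracketing produces different reservation-wage gaps depending on how convex disutility is, can be illustrated with a small Python sketch. It assumes, following the description of κ as a convex combination with κ = 0 indicating equality with the broader option, that the NARROW mean satisfies NARROW = κ · LOW + (1 − κ) · BROAD. The function name `kappa_from_means` and all numbers are hypothetical; the paper estimates κ jointly with preferences rather than from three means, so this is only a back-of-the-envelope analogue.

```python
def kappa_from_means(narrow, low, broad):
    """Solve narrow = kappa * low + (1 - kappa) * broad for kappa.

    kappa = 0 means NARROW behaves like BROAD (broad bracketing);
    kappa = 1 means NARROW behaves like LOW (narrow bracketing).
    """
    gap = low - broad
    if gap == 0:
        # If LOW and BROAD coincide (e.g. linear disutility), kappa is not identified.
        raise ValueError("LOW and BROAD means coincide; kappa is not identified")
    return (narrow - broad) / gap


# Hypothetical mean reservation wages in dollars, NOT the paper's data.
group_a = {"narrow": 2.20, "low": 2.00, "broad": 3.00}  # more convex disutility
group_b = {"narrow": 2.10, "low": 2.00, "broad": 2.50}  # closer to linear

for name, means in [("A", group_a), ("B", group_b)]:
    k = kappa_from_means(**means)
    wage_gap = means["broad"] - means["narrow"]
    print(f"group {name}: kappa = {k:.2f}, BROAD - NARROW = ${wage_gap:.2f}")
```

In this made-up example both groups have the same κ of 0.8, yet the gap between BROAD and NARROW is twice as large for group A, which is why comparing reservation-wage gaps alone would misattribute a difference in preferences to a difference in bracketing.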
In order for narrow bracketing to be usefully applied outside the lab, we need to move from identifying narrow bracketing to estimating its impact and identifying situations where it matters. One major challenge will be to jointly identify bracketing and preferences in field data.

There are at least three ways to improve identification in the field. First, we can combine observational data with targeted field experiments that directly estimate preferences or bracketing (or both) to estimate relevant parameters. Second, we can model or include direct information on how the final outcomes are effectively chosen. For example, our data may contain monthly expenses on retailers even though we know that people buy multiple times per month. Finally, we may relax additivity assumptions which are central to narrow bracketing: choices made by maximizing additive preferences are not affected by bracketing, as our discussion in Section 2 makes clear. In choices over money, this issue is less severe, because money is fungible: short of large income shocks or strong liquidity constraints, a dollar is a dollar is a dollar. Thus instances of non-fungibility of money (Heath and Soll (1996); Hastings and Shapiro (2013); Abeler and Marklein (2017)) provide evidence of mental accounting, a form of narrow bracketing. Non-fungibility of work, on the other hand, does not immediately imply narrow bracketing as a mistake, because an hour of work is not identical to an hour of work on another day, at another hour, or on another task. This requires allowing for non-additive preferences such as habit formation or adaptation, strongly convex costs, complementarities, or non-expected utility preferences. The additional difficulty compared to monetary outcomes shows why direct evidence for narrow bracketing in work choices is important.

16 For related points in choices over social preferences, see also Read et al.
(1999)'s discussion of Rawlsian preferences, as well as Sobel (2005).

A Appendix: Proofs

Let us prove proposition 1, which we break up into multiple subresults. Let us set up more general notation than the simplified notation used in Section 2. The full set of possible outcomes is X, such as R or R^2_{≥0} or the space of lotteries. We denote our two choice sets by X, Y ⊂ X. The option chosen from X is denoted by X, and the option chosen from Y is denoted by Y. We call X the first choice and Y the second choice if we need to order them, although this does not represent temporal order, simply an index for referring to choices made. For example, in choices over work and money, we have X = R^2_{≥0}, and if the first choice is between doing 4 tasks for $2 and doing nothing, and the second choice is between doing 6 tasks for $3 and doing nothing, then X = {(0, 0), (4, 2)} and Y = {(0, 0), (6, 3)}.

There are two ways that choices can be presented: separately or aggregated (i.e. summed together). When presented separately, the person is asked to make two choices X and Y, with X ∈ X and Y ∈ Y. When presented in aggregate, the person is asked to make a single choice O, with O ∈ O := X + Y. We denote the choice from an aggregate decision as O A (X, Y) (O for 'overall outcome'), and we denote by we know that the choice from X was 1 and the choice from Y = 5. In general this can be multi-valued, a technical issue that we will ignore. 17 The total outcome is just the sum of the two choices, hence O A = F A + S A, where we drop the arguments when they are clear implicitly. Similarly denote by O S, F S, and S S the overall choice, as well as the first and second choices made from X and Y when choices are presented separately.

for all X, Y and Y (by symmetry this also directly holds for S S, since we can exchange X and Y). Thus narrow and broad bracketing are unidentifiable if both conditions hold at the same time.
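The setup above can be illustrated with a short Python sketch using the example choice sets {(0, 0), (4, 2)} and {(0, 0), (6, 3)} of (tasks, money) bundles. It treats separate presentation under narrow bracketing as choosing the best option in each set in isolation, and aggregate presentation as choosing from the summed set. The function names and both utility functions are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product


def aggregate(first_set, second_set):
    """Sum every pair of options: the aggregate choice set of (tasks, money) bundles."""
    return {(a[0] + b[0], a[1] + b[1]) for a, b in product(first_set, second_set)}


def choose_separately(first_set, second_set, utility):
    """Narrow bracketing: pick the best option in each set as if it were the only choice."""
    first = max(first_set, key=utility)
    second = max(second_set, key=utility)
    return (first[0] + second[0], first[1] + second[1])


def choose_in_aggregate(first_set, second_set, utility):
    """Broad bracketing: pick the best overall outcome from the summed set."""
    return max(aggregate(first_set, second_set), key=utility)


def linear_utility(bundle):
    tasks, money = bundle
    return money - 0.40 * tasks       # hypothetical constant cost per task


def convex_utility(bundle):
    tasks, money = bundle
    return money - 0.04 * tasks ** 2  # hypothetical convex disutility of work


first_choices = [(0, 0), (4, 2)]   # do nothing, or 4 tasks for $2
second_choices = [(0, 0), (6, 3)]  # do nothing, or 6 tasks for $3

for name, u in [("linear", linear_utility), ("convex", convex_utility)]:
    separate = choose_separately(first_choices, second_choices, u)
    together = choose_in_aggregate(first_choices, second_choices, u)
    print(name, separate, together)
```

With the linear (additive) utility, both presentations lead to the same overall outcome (10, 5), so behavior cannot distinguish narrow from broad bracketing. With the convex utility, choosing separately yields (10, 5) while choosing in aggregate yields (6, 3), so the two are distinguishable.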
Being unidentified means that even if a person brackets narrowly, their behavior will still look consistent with broad bracketing; and similarly if a person brackets broadly, their behavior will look consistent with narrow bracketing. Suppose that choices are unidentifiable, so that both conditions hold for all choice sets. Then we have: , by redoing all the steps in reverse with Y This implies that so that we can simply refer to the first choice by C(X ): it depends only on what the choice set is, not on the other choice set, nor on whether it is presented separately or in aggregate or whether it is the first or the second. The same holds for the second choice, which we can write as C(Y). 18 And therefore, the same applies to overall outcomes, which can be written as C(O) for O = X + Y. When neither choice depends on presentation nor on the other choice, it is clear that bracketing is unidentified, since both narrow and broad bracketing hold. This proves the following proposition: Proposition 2. Bracketing is unidentified if and only if the following hold for all X , Y, X , Y and O = X + Y: Let us define the aggregate money metric M (·). First we have to assume that we have a numeraire good, which we'll call money, which exists under sufficiently strong continuity conditions. This may be money itself, some other good or composition of goods, or a continuous measure for the probability over some lotteries. Then for every outcome X, let us define M (X) such that the person is indifferent between paying M (X) and receiving X or receiving 0 and no money in choices presented in aggregated form. (The bundle X may itself contain any quantity of the numeraire.) Thus M (X) is defined via the indifference relation (X, M (X)) ∼ (0, 0) over aggregate choices, which is independent of whether the person brackets narrowly or broadly. Thus (X, P ) can be read as "paying P to receive X", where payments are done in the numeraire. 
19 We are not assuming that the numeraire is separable from other goods, only that it spans the whole range of utility.

With this, we can state the following proposition:

Proposition 3. Suppose we have a money metric M(·) : X → R for choices presented in aggregate, and suppose that preferences satisfy appropriate continuity assumptions. X is closed under addition: if X, Y ∈ X, then X + Y ∈ X.

We will now use proposition 3 to prove the special subcases mentioned in proposition 1.

Proposition 4. Suppose X = R^N, so that the choice is between two goods, whether different goods, or goods on different dates, or in different states. If M is additive and continuous, then it is linear: M(x) = Σ_i λ_i x_i for some constants λ_i.

Proof. Additivity plus continuity for one-dimensional functions in R implies linearity. 21 Letting e_i be the unit vector in dimension i, i.e. it is the bundle that provides one unit of good i and nothing else, then for x ∈ R, we have that f_i(x) := M(x · e_i) is a one-dimensional function that is additive and continuous, hence M(x · e_i) = λ_i x for some λ_i. By additivity of M, it follows that M(x) = Σ_i M(x_i · e_i) = Σ_i λ_i x_i.

Proposition 5. Suppose X is a set of risky choices, possibly with a finite number of outcomes. If M is additive and continuous in probability, then M satisfies the independence axiom and has an expected utility representation.

Proof. Denote by X, Y, and Z three random variables, possibly multi-dimensional. Note that if A is an event independent of the three random variables that happens with probability p, and A^C denotes its complement, then X̃ = 1(A) · X + 1(A^C) · Z and Ỹ = 1(A) · Y + 1(A^C) · Z are the random variables yielding the value of X respectively Y with probability p and the value of Z with probability 1 − p. Let A be distributed uniformly on [0, 1], independently of X, Y, Z. Let A(p, q) be the event that A ∈ (p, q). where we used the fact that 1(A(0, q))X and 1(A(p, p + q))X have the same distribution, hence also the same utility. Writing f_X(p) = M(1(A(0, p))X), we have that f_X is additive, i.e. it satisfies f_X(p + q) = f_X(p) + f_X(q). We assumed that it is continuous in p, hence we know that f_X is linear, i.e. f_X(p) = λp for some λ. Since f_X(1) = M(X), we have λ = M(X), which shows that M(1(A(0, p))X) = pM(X). Let X̃ = 1(A(0, p))X + 1(A(p, 1))Z and similarly for Ỹ = 1(A(0, p))Y + 1(A(p, 1))Z.

From this, we get the following corollaries.

Corollary 1. Suppose M is additive and defined over risky wealth outcomes. Then M is an expected utility with CARA.

Proof. Proposition 5 shows that M satisfies the independence axiom, hence by continuity that it has expected utility form, hence we can apply proposition 6.

Corollary 2. Suppose M is additive and defined over at least two goods with risk and time. Then M is linear in all goods and with risk-neutral expected utility. If moreover M is stationary, then we get a single linear factor for each good and the others are determined by exponential discounting.

Proof. This follows from applying all prior propositions in and across all dimensions.

This concludes all results relating to additivity under context-independence from proposition 1. We now prove that if bracketing is unidentified, then WARP holds, hence we cannot have context-dependent preferences, since we have the independence of irrelevant alternatives (IIA) assumption. This rules out context-dependent preferences such as Tversky and Simonson (1993), Kőszegi and Szeidl (2012), Bordalo et al. (2013), or Bushong et al. (2020).

Proposition 7. Suppose that we have preferences for which bracketing is not identifiable. Then under sufficient continuity assumptions, and assuming that adding positive amounts of numeraire to a single option makes it more desirable, WARP holds.

Proof. Suppose that we have unidentifiability. Suppose by contradiction that WARP fails for choices in overall aggregate choices, so that there are X, Y ∈ X ∩ Y s.t. X ∈ C(X) and Y ∈ C(Y), yet X ∉ C(Y).
Then assuming that each C(·) is a singleton set and that we have continuity, this implies that for Y' = (Y \ {X}) ∪ {X + ε}, we also have C(Y') = C(Y) = Y for sufficiently small ε of the numeraire. (Here we rely strongly on continuity, either by considering only finite choice sets for which singleton choice sets and simple continuity will give these results straightforwardly, or by relying on stronger forms of continuity that rule out overly strong context-dependent effects, such as making one option better makes other options seem better by even more etc.) Let us now offer two separate choices X and Y'. By unidentifiability, we know that C(X) = X since the choice is identical to the choice made when it is the only choice. Similarly, we have C(Y') = Y. Hence the person chooses X + Y from O = X + Y'. But X + Y + ε ∈ O, which implies that the person could have gotten something strictly better (again, ruling out context-dependent preferences that violate various forms of dominance). This is a contradiction, hence WARP holds.

Welcome

Thank you for accepting our HIT. During the HIT, please do not close this window or leave the HIT's web pages in any other way. If you do close your browser or leave the HIT, you will not be able to re-enter and we will not be able to pay you! You will receive a baseline payment of $2.00 once you complete the HIT. Additionally, you can earn an extra bonus that will depend on your choices. You will receive a code to enter into MTurk to collect your payment once you have finished. Please read all instructions carefully.

Thank you for accepting to participate in this HIT. On top of the guaranteed payment of $2.00 you will have the chance to earn an extra bonus, as explained later.

The task

In this HIT you will decode several sequences of random letters into numbers with the given decoding table. For each letters sequence, the decoding table changes. The main part of the HIT will require you to decode several of these tasks.
To gain familiarity with the task you will now have to correctly decode 3 sequences. Note that each letter must be decoded correctly. After entering the decoded sequence, hit the submit button. Subsequently, irrespective of whether the text sequence was decoded correctly or not, a new sequence and decoding table will appear. Once you decode 3 sequences correctly, we describe the main part of the HIT.

In the example you see the text sequence tvqqnqvgfgug. The decoding table tells you that u=0, t=1,... This means that you have to decode tvqqnqvgfgug into 167757642404 and enter this numeric value into the answer field.

[NARROW] By completing the HIT you will receive $2.00. To do so you are required to decode some sequences correctly for a bonus. We will give you two pages of choices, with 16 choices on each page. Each choice is between a low number and a high number of additional sequences to decode before the required sequences for different bonuses.

Example Choice (DOES NOT COUNT):
• 10 additional sequences for an extra $4.00
• 20 additional sequences for an extra $5.00

[LOW] By completing the HIT you will receive $2.00. To do so you are required to decode some sequences correctly for a bonus. We will give you two pages of choices, with 16 choices on each page. Each choice is between a low number and a high number of sequences to decode for different bonuses.

Example Choice (DOES NOT COUNT):
• 10 sequences for an extra $4.00
• 20 sequences for an extra $5.00

[BROAD] To complete the HIT you will be asked to decode a certain number of sequences correctly. The number of sequences you will be required to decode will depend on your choices. We will give you two pages of choices, with 16 choices on each page. Each choice is between a low number and a high number of sequences to decode for different amounts (which includes the $2.00 completion fee of the HIT).
• 10 sequences for a total payment of $6.00
• 20 sequences for a total payment of $7.00

[PARTIAL] By completing the HIT you will receive $2.00. To do so you are required to decode some sequences correctly for a bonus. We will give you two pages of choices, with 16 choices on each page. Each choice is between a low number and a high number of sequences to decode for different bonuses.

Example Choice (DOES NOT COUNT):
• 10 sequences for an extra $4.00
• 20 sequences for an extra $5.00

After you made your choice the computer will select randomly one of the 16 choices from one of the 2 pages. That option will be implemented. Thus you should select your preferred option for each choice.

B.5 Main Task - Scenario 1

[BROAD] Choices to make now: for each choice in this Scenario, choose the preferred option. By completing the HIT you will receive a total payment (which includes the $2.00 completion fee) depending on your choices.

     OPTION A                                     OPTION B
 1)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $6.25
 2)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $6.50
 3)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $6.75
 4)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $7.00
 5)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $7.25
 6)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $7.50
 7)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $7.75
 8)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $8.00
 9)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $8.25
10)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $8.50
11)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $8.75
12)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $9.00
13)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $9.25
14)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $9.50
15)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $9.75
16)  15 sequences for a total payment of $6.00    30 sequences for a total payment of $10.00

[PARTIAL] Choices to make now: for each choice in this Scenario, choose the preferred option. By completing the HIT you will receive $2.00 plus a bonus depending on your choices.

[NARROW] Note: you are required to decode 15 sequences correctly, in addition to the sequences based on your choices. Choices to make now: for each choice in this Scenario, choose the preferred option. By completing the HIT you will receive $2.00 plus a bonus depending on your choices.

[LOW] Choices to make now: for each choice in this Scenario, choose the preferred option. By completing the HIT you will receive $2.00 plus a bonus depending on your choices.

Choices to make now: for each choice in this Scenario, choose the preferred option. By completing the HIT you will receive a total payment (which includes the $2.00 completion fee) depending on your choices.
     OPTION A                                     OPTION B
 1)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $6.25
 2)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $6.50
 3)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $6.75
 4)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $7.00
 5)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $7.25
 6)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $7.50
 7)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $7.75
 8)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $8.00
 9)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $8.25
10)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $8.50
11)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $8.75
12)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $9.00
13)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $9.25
14)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $9.50
15)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $9.75
16)  30 sequences for a total payment of $6.00    45 sequences for a total payment of $10.00

[PARTIAL] Choices to make now: for each choice in this Scenario, choose the preferred option. By completing the HIT you will receive $2.00 plus a bonus depending on your choices.

For this option you selected that you are (are not) willing to decode # additional sequences for $X.XX. In total you will decode # sequences to receive the HIT payment and the bonus.

The computer randomly selected the Choice # from Scenario #. For this option you selected that you are (are not) willing to decode # sequences in total for $X.XX.
In total you will decode # sequences to receive the HIT payment and the bonus.

In today's HIT you have earned a bonus of $. Your guaranteed participation fee is: $2.00. So, in total, you have earned $. To receive your earnings, please enter this code into MTurk
After you have done that, you can close this window. We thank you for participating in our study.

Here we report the results when we restrict the data from the NARROW treatment to those sessions where the baseline endowment is only revealed on the first choice page, rather than on the page right before, as was inadvertently the case for early sessions. Broad bracketing is rejected as before and narrow bracketing is still not rejected, but when estimating κ with PARTIAL as the broader treatment, the value of κ is highly unstable and very noisy (although, as mentioned in the main text, PARTIAL and LOW are so close that one should not take the standard errors seriously). Moreover, NARROW and AFTER are statistically significantly different in this case, as indicated by Table 15. However, the issues around the different sample population for BEFORE/AFTER remain, given that we collected most of the data after COVID-19-induced lockdowns.

Next we report results from the initial NARROW treatments where the information was displayed on the page right before. The results are essentially the same, although there is no longer a statistically significant difference in scenario 2, since NARROW lies between LOW and BROAD and is not significantly different from either, reflecting the lower power due to closer to 'linear' preferences in Scenario 2 (the difference between LOW and BROAD is lower). We compare the means of NARROW treatments with message displayed before the first choice page and on the first choice page by scenario directly in Table 19.
This shows that for scenario 2, these two versions are significantly (and sizeably) different, which is also reflected in the fact that broad bracketing in scenario 2 is rejected in one case but not in the other. No matter which is the accurate treatment, both reject broad bracketing, and neither rejects narrow bracketing. There are two possible reasons for the difference. Either it is due to the display of information, in which case the later data with information displayed on the page is the appropriate test, rejecting broad bracketing in both scenarios. In this case, the treatments NARROW and BROAD are not balanced within sessions, since we had completed collection of data on BROAD (mostly at least, we have a small overlap between the treatments). Or it is due to changes in the population due to sampling at different times. In this case the earlier data is the appropriate test, and balances observations against the BROAD treatment, i.e. the rejection of broad bracketing cannot be due (or more correctly, is statistically unlikely to be due) to different preferences.

Table 13: Estimates of κ when the broader option is BROAD or PARTIAL respectively. The narrow option is always LOW. κ estimates the convex combination between the broader and the narrow option, with κ = 0 indicating equality with the broader option.

Fungibility, labels, and consumption
Turking in the time of covid
Individual preferences, monetary gambles, and stock market participation: A case for narrow framing
Salience and consumer choice
A Model of Relative Thinking. The Review of Economic Studies
Your loss is my gain: a recruitment experiment with framed incentives
Bonus versus penalty: How robust are the effects of contract framing
Monopsony in online labor markets
Relative earnings and giving in a real-effort experiment
Equity concerns are narrowly framed
The online platform economy in 2018: Drivers, workers, sellers, and lessors
The evolution of the online platform economy: Evidence from five years of banking data
Nonlinear regression
Lioness lab: a free web-based platform for conducting interactive experiments online
Frictions or mental gaps: what's behind the information we (don't) use and when do we care
Fungibility and consumer choice: Evidence from commodity price shocks. The Quarterly Journal of Economics
Mental budgeting and consumer decisions
Reference-Dependent Consumption Plans
Self-regulation through goal setting
Goals and bracketing under mental accounting
Correlates of narrow bracketing
Choice simplification: A theory of mental budgeting and naive diversification
A model of focusing in economic choice
A theory of narrow thinking
Demographic stability on mechanical turk despite covid-19
Narrow bracketing and dominated choices
Choice bracketing
On the framing of multiple prospects
Interdependent preferences and reciprocity
The framing of decisions and the psychology of choice. Science
Pauline Vorjohann. Reference-dependent choice bracketing