key: cord-0256280-n11g0bjt
authors: Marshall, James A. R.; Reina, Andreagiovanni; Hay, Célia; Dussutour, Audrey; Pirrone, Angelo
title: Magnitude-sensitive reaction times reveal non-linear time costs in multi-alternative decision-making
date: 2021-09-15
journal: bioRxiv
DOI: 10.1101/2021.05.05.442775
sha: 4f6621df845d8ee4bab1df86a00e1b08da10c80a
doc_id: 256280
cord_uid: n11g0bjt

Optimality analysis of value-based decisions in binary and multi-alternative choice settings predicts that reaction times should be sensitive only to differences in stimulus magnitudes, but not to overall absolute stimulus magnitude. Yet experimental work in the binary case has shown magnitude sensitive reaction times, and theory shows that this can be explained by switching from linear to geometric time costs, but also by nonlinear subjective utility. Thus disentangling explanations for observed magnitude sensitive reaction times is difficult. Here for the first time we extend the theoretical analysis of geometric time-discounting to ternary choices, and present novel experimental evidence for magnitude-sensitivity in such decisions, in both humans and slime moulds. We consider the optimal policies for all possible combinations of linear and geometric time costs, and linear and nonlinear utility; interestingly, geometric discounting emerges as the predominant explanation for magnitude sensitivity.

However, an alternative and standard formulation of the Bellman equation, the central equation in constructing a dynamic program, accounts for the cost of time by discounting future rewards geometrically, so a reward one time step in the future is discounted by rate $\gamma < 1$, two time steps in the future by $\gamma^2$, and so on (see Materials and Methods). This is a standard assumption in ...

As previously done for binary decisions (Pirrone et al., 2018a,b), here we focused our analyses exclusively on equal alternatives. For the analyses, we did not censor any datapoints.

Figure 1. ... on a scale of brightness from 0 to 1 in PsychoPy. Y-axis presents mean reaction times, in seconds. Bars show 95% confidence intervals. Participants experienced equal alternative conditions, interleaved with unequal alternative trials in pseudo-randomised order. Participants who performed the whole experiment experienced each equal alternative presentation ten times.

As shown in Figure 1, the data show strong magnitude sensitivity, given that choices for equal alternatives of higher magnitude conditions (higher brightness on a scale from 0 to 1 in PsychoPy) were made faster.

To assess whether reaction times decreased as a function of the mean brightness of the equal alternatives, we used a linear mixed model in R. The model was fitted by specifying as fixed effect (explanatory variable) the brightness of equal alternatives as a continuous predictor. The participant ID was also added to the model as a factor for random effects. Reaction times significantly decreased as a function of the mean brightness of the alternatives (b = -1.95, p < .001, 95% CI [-2.14, -1.75]).

Further details for the mixed-effect regression are presented in the supplementary information.

Empirical results from the slime mould experiment. Decreasing latencies to reach a food source as a function of the magnitude of the equal alternatives. X-axis presents the concentration in egg yolk of equal food sources (20, 40, 60, 80 g L⁻¹). Y-axis presents mean latency to reach a food source, in minutes. Bars show 95% confidence intervals. 50 slime moulds were tested for each magnitude for a total of 200 slime moulds.

For our theoretical analysis we begin by re-deriving optimal policies for decisions when the change is made from linear costing of time, or Bayes Risk, to geometric discounting of future reward. Note that geometric discounting of future rewards is similar to, but not the same as, non-linear utility.

Under Bayes Risk-optimisation it is known that, for binary decisions, optimal policies are magnitude-insensitive when subjective utility is linear, whereas they are magnitude-sensitive when subjective utility is nonlinear (Tajima et al., 2016, 2019).

For ternary decisions, however, even with nonlinear subjective utility, policies exhibit very weak magnitude-sensitivity early in decisions, becoming magnitude-insensitive as decisions progress (Fig. 3, row 'linear'). Sensitivity analysis shows that magnitude-insensitivity is a general pattern. An informal understanding of this can be arrived at by appreciating that sigmoidal functions have two extremes of parameterisation; in one extreme they are almost linear, hence will be mostly magnitude-insensitive due to the known result (Tajima et al., 2016). At the other extreme, the function becomes step-like; in this case options are either good or bad, and the optimal policy rapidly becomes 'choose the best' (Fig. 4), since under such a scenario sampling is of minimal benefit as early information quickly indicates whether an option is good or bad, and choosing the first option that appears to be good is optimal.

... magnitude-sensitive simulated reaction times (Fig. 5). This agrees with the weak magnitude-sensitivity observed in the optimal policies derived above (Fig. 3). Note, however, that this contrasts with the binary decision case in which optimal policies, and hence reaction times, become magnitude ...

In contrast to linear time costing, across all nonlinear subjective utility functions considered, geometric time costing resulted in strongly magnitude-sensitive simulated reaction times (Fig. 6), with longer reaction times for lower-value equal-value option sets; this strategy was previously hypothesised to be optimal (Pais et al., 2013). The strong magnitude-sensitivity in the numerical simulations corresponds with the strong magnitude-sensitivity observed in the optimal policies derived above (Fig. 3).

Figure 6. Geometric discounting of reward leads to strongly magnitude-sensitive simulated reaction times across a range of nonlinear subjective utility functions, with decisions postponed for low equal-value option sets. Simulation parameters were: prior mean $x_{p,i} = 1.5$ and variance $\sigma^2_{p,i} = 5$, observation noise variance $\sigma^2_{a,i} = 2$, temporal cost $\gamma = 0.1$, and simulation timestep $dt = 5 \times 10^{-3}$. Lines are the mean reaction time for $10^4$ simulations; 95% confidence intervals are shown as red shading (mostly invisible because smaller than the linewidth).
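The evidence-accumulation step behind such simulations can be sketched as follows. This is a minimal illustration, not the authors' simulation code: it assumes (this excerpt does not spell it out) that each option's momentary evidence is Gaussian with mean equal to the true reward times dt and variance equal to the observation noise variance times dt, and that the prior over each reward is Gaussian, so the posterior mean follows the standard conjugate update. The function names are hypothetical, the default parameter values are taken from the Figure 6 caption, and no decision policy is included.

```python
import math
import random


def posterior_mean(prior_mean, prior_var, evidence_sum, elapsed, obs_var):
    """Posterior mean of a reward under a Gaussian prior and Gaussian evidence.

    Assumes evidence increments ~ N(z * dt, obs_var * dt), so the summed
    evidence over `elapsed` time contributes precision elapsed / obs_var.
    """
    precision = 1.0 / prior_var + elapsed / obs_var
    return (prior_mean / prior_var + evidence_sum / obs_var) / precision


def simulate_posterior_means(true_rewards, steps, dt=5e-3, prior_mean=1.5,
                             prior_var=5.0, obs_var=2.0, seed=0):
    """Accumulate noisy evidence for each option; return the posterior means."""
    rng = random.Random(seed)
    sums = [0.0] * len(true_rewards)
    noise_sd = math.sqrt(obs_var * dt)
    for _ in range(steps):
        for i, z in enumerate(true_rewards):
            # One momentary observation for option i (assumed model).
            sums[i] += rng.gauss(z * dt, noise_sd)
    elapsed = steps * dt
    return [posterior_mean(prior_mean, prior_var, s, elapsed, obs_var)
            for s in sums]


if __name__ == "__main__":
    # Three equal alternatives, as in the equal-value conditions.
    print(simulate_posterior_means([1.0, 1.0, 1.0], steps=2000))
```

A full simulation would feed these posterior means into the optimal policy obtained from the dynamic program and record the time at which the policy first selects an option.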

In understanding behaviour, which is a product of evolution, searching for optimal algorithms for ... (Tajima et al., 2019). The resulting algorithms correspond to earlier simple models for perceptual and value-based decision-making. These findings, however, rest on an assumption that time is a linear cost for subjects. Here we have shown that deciding human subjects and foraging unicellular organisms do, however, exhibit marked magnitude sensitivity in ternary decisions, as previously shown for binary decisions (Pirrone et al., 2018a; Dussutour et al., 2019). We have also shown that optimality theory that discounts future rewards multiplicatively based on time is the foremost explanation for such observations of magnitude-sensitivity; nonlinear subjective utility alone is not sufficient to give rise to strongly magnitude-sensitive decision times when time is treated as a linear cost.

The Bayes Risk optimal policy is approximated by a neural model that is consistent with observations of economic irrationality (Tajima et al., 2019), hence it will be important to see if a revised neural model based on the revised optimal policy still shows such agreement. For example, while in the binary case magnitude-sensitive reaction times can be explained both by nonlinear subjective utility functions and by multiplicative discounting rather than Bayes Risk, in the multi-alternative case our analysis suggests that the same phenomenon is explained primarily by multiplicative discounting of future rewards and not by nonlinear utility.

Slime moulds were presented with a choice between three equal food sources in an arena consisting of a 60 mm diameter Petri dish filled with plain 1% agar. We punched three holes (10 mm diameter) in the arena and filled them with a food source (10 mm diameter). We used four different food patches varying in quality: 2% w/v powdered oat mixed with either 2, 4, 6 or 8% w/v egg yolk. Once the food sources were set in each hole, we placed a slime mould (10 mm diameter) in the centre of the arena, 2 cm away from each food source. We replicated the experiment 50 times for each food quality. For each replicate, we measured the time taken by the slime mould to reach any one of the three food sources.

To assess the difference in the latency to reach the food as a function of the food quality, we used a linear mixed model (function lmer, package lme4) in R (RStudio Version 1.2.1335). The model was fitted by specifying as fixed effect (explanatory variable) the concentration in yolk (continuous predictor). The sclerotia identity was also added to the model as a random factor. We transformed the dependent variable using the "bestNormalize" function ("bestNormalize" package). The outcome of the model is presented in the supplementary information (Table S2).

where $V(t, x(t))$ is the value of the state estimates vector $x(t)$ at time $t$, $r_i(t, x(t))$ is similarly the expected reward from choosing the $i$-th reward, $\delta t$ is the time interval to the next decision point, $c$ is the linear cost per unit time, $\rho$ is the reward rate per unit time based on optimal decision-making over a sequence of trials, $t_w$ is the inter-trial waiting time, and $\langle \cdot \rangle$ is expectation over the next time interval ($\delta t$) (Tajima et al., 2019). For the results presented here we set $c = 0$, $t_w = 1$ and found the optimal $\rho > 0$ using the methods of Tajima et al. (2019); note, however, that since the prior was not varied this reward-rate optimisation could not induce magnitude-sensitive reaction times in itself.

For the geometric discounting case the Bellman equation becomes

$$V(t, x(t)) = \max\left\{ \max_i \{ r_i(t, x(t)) \},\; \langle V(t + \delta t, x(t + \delta t)) \rangle \, \gamma^{\delta t} \right\},$$

where $0 < \gamma < 1$ is a discount factor for rewards received in future timesteps; this discount factor is per-unit-time, hence to discount a reward $\delta t < 1$ timesteps in the future the appropriate factor is $\gamma^{\delta t / 1} = \gamma^{\delta t}$.
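The difference between the two time-cost structures can be made concrete with a toy calculation. The sketch below is illustrative only and is not the dynamic program used in the paper: it compares what is lost by postponing a reward by dt under a linear cost c per unit time with what is lost under per-unit-time geometric discounting, and then asks whether waiting is worthwhile given a hypothetical fixed expected gain from further sampling (`gain`, an assumption introduced purely for illustration). All function names are hypothetical.

```python
def waiting_loss_linear(reward, c, dt):
    """Loss from postponing `reward` by dt under a linear time cost c."""
    return c * dt  # does not depend on the reward magnitude


def waiting_loss_geometric(reward, gamma, dt):
    """Loss from postponing `reward` by dt under discount factor gamma."""
    return reward * (1.0 - gamma ** dt)  # scales with the reward magnitude


def worth_waiting_linear(reward, gain, c, dt):
    """Wait if the assumed sampling gain exceeds the linear time cost."""
    return gain - c * dt > 0.0


def worth_waiting_geometric(reward, gain, gamma, dt):
    """Wait if the discounted improved reward still beats stopping now."""
    return (reward + gain) * gamma ** dt > reward


if __name__ == "__main__":
    for reward in (1.0, 2.0, 4.0, 8.0):
        print(reward,
              worth_waiting_linear(reward, gain=0.05, c=0.1, dt=0.1),
              worth_waiting_geometric(reward, gain=0.05, gamma=0.9, dt=0.1))
```

Under the linear cost the decision to keep sampling is the same at every magnitude, whereas under geometric discounting the loss grows in proportion to the reward, so in this toy setting high-magnitude options favour stopping earlier.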

Stochastic simulations

Since noise processing is important in determining reaction times, we derived optimal decision policies as above, then tested them through numerical analysis of stochastic models. To test for ...

Testing optimal timing in value-linked decision making

The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks

Phenotypic variability predicts decision accuracy in unicellular organisms

The evolution of decision rules in complex environments

Speed, accuracy, and the optimal timing of choices

Models of adaptive behaviour: an approach based on state

Visual fixations and the computation and comparison of value in simple choice

Speed-accuracy trade-offs during foraging decisions in the acellular slime mould Physarum polycephalum

Dynamic modeling in behavioral ecology

Comment on 'Optimal policy for multi-alternative decisions'

MATLAB R2020b

Integrating function and mechanism

A mechanism for value-sensitive decision-making

Optimality theory in evolutionary biology

PsychoPy2: Experiments in behavior made easy. Behavior Research Methods

PsychoPy: psychophysics software in Python

Evidence for the speed-value trade-off: Human and monkey decision making is magnitude sensitive

Is attentional discounting in value-based decision making magnitude sensitive?

When natural selection should optimize speed-accuracy trade-offs

Single-trial dynamics explain magnitude sensitive decision making

Decision-making without a brain: how an amoeboid organism solves the two-armed bandit

Myopic discounting of future rewards after medial orbitofrontal damage in humans

Gaze amplifies value in decision making

Sensitivity of reaction time to the magnitude of rewards reveals the cost-structure of time

Optimal policy for multi-alternative decisions

Optimal policy for value-based decision-making

Absolutely relative or relatively absolute: violations of value invariance in human decision making

Perceptual change-of-mind decisions are sensitive to absolute evidence magnitude

Reward certainty and preference bias selectively shape voluntary decisions. bioRxiv

We thank Satohiro Tajima for sharing the code for the binary decision model. Dr Tajima was an exceptionally promising scientist who is sadly missed. We thank Jan Drugowitsch and Alex Pouget for discussions of their results and our own, and Thomas Bose, Nathan Lepora and Sophie Baker for comments on an earlier draft.

The authors declare that they have no conflicting interests.