key: cord-0258398-mhpgxg0g
authors: Adkins, Tyler J.; Lee, Taraz G.
title: Reward modulates cortical representations of action
date: 2020-11-25
journal: bioRxiv
DOI: 10.1101/2020.01.15.907006
sha: f339f0f29d31769117cfe2d76a37147e2e6cdfc5
doc_id: 258398
cord_uid: mhpgxg0g

People are capable of rapid on-line improvements in performance when they are offered a reward. The neural mechanism by which this performance enhancement occurs remains unclear. We investigated this phenomenon by offering monetary reward to human participants, contingent on successful performance in a sequence production task. We found that people performed actions more quickly and accurately when they were offered large rewards. Increasing reward magnitude was associated with elevated activity throughout the brain prior to movement. Multivariate patterns of activity in these reward-responsive regions encoded information about the upcoming action. Follow-up analyses provided evidence that action decoding in pre-SMA and other motor planning areas was improved for large reward trials and successful action decoding was associated with improved performance. These results suggest that reward may enhance performance by enhancing neural representations of action used in motor planning.

Highlights: Reward enhances behavioral performance. Reward enhances action decoding in motor planning areas prior to movement. Enhanced action decoding coincides with improved behavioral performance.


We administered a discrete sequence production (DSP) task during functional magnetic resonance imaging (fMRI) to 30 healthy human participants. Participants trained for 40 trials on one sequence and 200 trials on another and returned 48 hours later to perform these sequences for prospective rewards ($5, $10, or $30). We found that behavioral performance was enhanced for $30 trials compared to $5 trials. Our data revealed a widespread network of brain areas whose activity scaled linearly with reward magnitude. From this reward-responsive network, we decoded information about upcoming action and performance from patterns of activity preceding movement. This distributed representation included clusters in movement planning areas such as LPFC, pre-supplementary motor area (pre-SMA) and supplementary motor area (SMA). We then examined whether action decoding in specific ROIs was influenced by reward magnitude. We found that action decoding in pre-SMA was enhanced for trials with large reward cues. Furthermore, decoding in SMA was associated with improvements in behavioral performance. Although future work is needed to determine exactly what aspects of action (e.g., sequence, vigor, skill level) are encoded in these brain areas, our results suggest that reward may improve performance by enhancing action coding prior to movement.

Behavioral task

Participants learned to perform two sequences of eight keypresses (Fig. 1). Before each trial, participants were instructed by a color cue (1.5 s) to perform one of the two sequences. After a brief delay (2-6 s), participants could perform the instructed sequence using the A-S-D-F keys. During movement, an array of 4 grey rectangles (representing the keys) 'lit up' in sequence to remind the participant of which key to press next. If an incorrect key was pressed, the corresponding placeholder box would turn red for one second and the trial was aborted. If the participant did not successfully complete the entire eight-item sequence under the timed deadline (see below), a message saying "Too Slow" was displayed for one second and the trial was aborted. There was no feedback following a successful trial. We expected that our participants would rely less on these on-line visual cues, and more on advance planning, as learning progressed. In the reward session, sequence cues were presented concurrently with the incentive. Stimulus orderings for sequence identity and the durations of cue-to-execution intervals (2-6 s) and inter-trial intervals (2-6 s) were optimized to estimate effects of interest using AFNI's make_random_timing tool by running 5000 iterations of different random orderings/timings and selecting those that explained the most variance in simulated GLM analyses.

Experimental Protocol

Participants performed the DSP task described above during fMRI scanning in two separate sessions 48 hours apart. In session one, participants performed eight blocks of 35 trials each. During this training session, participants performed 200 trials of one sequence ("trained"), 40 trials of a second sequence ("novice"), and 40 trials of un-cued pseudorandom sequences ("random"). The identity of each of the two sequences was selected from a set of three different sequences for each participant (3-2-4-1-2-4-2-1, 2-1-4-2-3-4-1-3, and 4-3-1-4-2-3-1-2). These sequences were chosen to be free of trills (e.g., 1-2-1) and repeats (e.g., 1-1). The selection of these sequences was counterbalanced across participants such that the trained sequence for one third of the subjects was the novice sequence for another third of the subjects and omitted for the final third of the subjects. This procedure ensured that our results would not be driven by the idiosyncrasies of the specific sequences chosen. In session two, participants performed 8 blocks of 30 trials, 120 trials for each sequence trained in session one. We reset the time limit to be stricter each block if participants' accuracy was above 70% on the previous block.
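The counterbalancing described above can be illustrated with a short sketch. The rotation rule and the function name `assign_sequences` are assumptions made for illustration; the text specifies only that each sequence serves as trained, novice, and omitted for one third of participants each, not how participants were mapped to those thirds.

```python
from collections import Counter

# The three candidate sequences listed in the protocol.
SEQUENCES = [
    (3, 2, 4, 1, 2, 4, 2, 1),
    (2, 1, 4, 2, 3, 4, 1, 3),
    (4, 3, 1, 4, 2, 3, 1, 2),
]

def assign_sequences(participant_index):
    """Hypothetical rotation: each sequence takes each role for one third of participants."""
    g = participant_index % 3
    return {
        "trained": SEQUENCES[g],
        "novice": SEQUENCES[(g + 1) % 3],
        "omitted": SEQUENCES[(g + 2) % 3],
    }

# Count how often each (role, sequence) pairing occurs across 30 participants.
role_counts = Counter(
    (role, seq)
    for i in range(30)
    for role, seq in assign_sequences(i).items()
)
```

With 30 participants, every sequence appears in every role equally often, which is the property the counterbalancing is meant to guarantee.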

Each trial was associated with one of three incentive magnitudes: $5, $10, or $30. These reward values were presented simultaneously with the sequence color cues. The order of reward values was set to be an m-sequence to mitigate carryover effects (Buracas and Boynton 2002). Participants were informed that a trial would be selected at random at the end of the experiment, and that they would earn the associated reward provided they performed the target sequence without error under the time limit. Our metric of successful task performance therefore incorporated both speed and accuracy.
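As a minimal sketch of how such a success metric and payout rule could be expressed, consider the following. The function names, the trial dictionary layout, and the example time limit are hypothetical and are not taken from the authors' code.

```python
import random

def trial_success(pressed, target, movement_time, time_limit):
    """A trial succeeds only if every key is correct AND it finishes under the time limit."""
    return list(pressed) == list(target) and movement_time < time_limit

def payout(trials, rng):
    """Select one trial at random; pay its reward only if that trial was successful."""
    chosen = rng.choice(trials)
    return chosen["reward"] if chosen["success"] else 0

# Hypothetical example: one fast, correct $30 trial and one slow $5 trial.
target = (3, 2, 4, 1, 2, 4, 2, 1)
trials = [
    {"reward": 30, "success": trial_success(target, target, 2.1, time_limit=2.5)},
    {"reward": 5, "success": trial_success(target, target, 3.0, time_limit=2.5)},
]
earned = payout(trials, random.Random(0))
```

Because a single error or a late finish zeroes the payout for that trial, speed and accuracy are both required for success.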

Incentive-behavior analysis

We estimated the effect of reward on behavioral performance using a Bayesian hierarchical logistic regression model. The dependent variable was trial success. The model included fixed effects of $10-$5 and $30-$5 and allowed intercepts to vary by subject. We [...]

[...] weighted scans were acquired for anatomical localization. Functional data were realigned to the third volume acquired, slice-time corrected, and registered to the MNI-152 template using a non-linear warp. Functional data were smoothed with a 6 mm FWHM kernel for the whole-brain univariate analysis but left unsmoothed for all multivariate analyses (see below). Pre-processing was performed using AFNI (Cox, 1996).

Whole-brain Univariate Analysis

We modeled BOLD responses to the preparatory cue using generalized linear models (GLM) implemented in AFNI. As shown in Figure 1, the preparatory cue conveyed information about the reward that could be obtained (e.g., "$30") and the sequence that should be performed (e.g., a blue square for sequence A). One GLM was used to capture the spatially distributed patterns of brain responses to the cue on each trial. This GLM modeled each trial's response to the cue as a separate regressor, enabling us to obtain separate coefficient maps for each trial. The resulting coefficient maps were used as samples in a multivariate decoding analysis. This GLM also contained motor execution regressors for each sequence, which were created by convolving a gamma function with a square wave starting at movement onset with duration equal to the movement time. Another GLM was used to measure the extent to which BOLD responses were greater for larger monetary incentives. This GLM used the same motor execution regressors but contained 2 separate regressors for each sequence for the cue period. One regressor captured the mean response to the cue, while the other captured parametric modulation due to reward magnitude. This latter regressor was coded to predict linear changes in activity with reward magnitude while accounting for the mean response to the cue (with reward levels coded as 1, 2, 3). The resulting coefficient maps (one per participant) were fed into a second-level group analysis using AFNI's 3dttest++ command, which helped us identify the voxels whose activity reliably scaled with reward magnitude across participants.
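The logic of pairing a mean cue regressor with a parametric reward modulator can be sketched in a few lines. The trial order below is invented, and mean-centering the modulator is an assumption about how "accounting for the mean response" was implemented. Only the event amplitudes are shown; convolution with a hemodynamic response function is omitted.

```python
# Hypothetical reward cue on each of six trials (in dollars).
reward_dollars = [5, 10, 30, 30, 5, 10]

# Reward levels coded as 1, 2, 3, as described in the text.
LEVEL = {5: 1, 10: 2, 30: 3}
levels = [LEVEL[r] for r in reward_dollars]

# Regressor 1: constant amplitude, capturing the mean response to the cue.
mean_amp = [1.0] * len(levels)

# Regressor 2: mean-centered reward level (assumed), capturing linear modulation.
mean_level = sum(levels) / len(levels)
mod_amp = [x - mean_level for x in levels]

# After centering, the two amplitude vectors are orthogonal.
dot = sum(a * b for a, b in zip(mean_amp, mod_amp))
```

Because the centered modulator sums to zero, its coefficient reflects only the change in response per reward level, separate from the average cue response.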

To examine whether our multivariate analyses (see below) could be driven by mere 

Multivariate Analysis of reward-responsive regions

We used multivariate pattern analysis (MVPA) to localize the areas in the brain that contained information about the identity of an upcoming action and whether the action would be performed successfully ('performance decoding'). Multivariate techniques, such as machine learning classifiers, are powerful tools for measuring information in the brain, because they are capable of discovering complex, high-dimensional mappings between spatially distributed patterns of brain activity and stimuli (or behaviors, or many other things the brain might contain information about). We performed MVPA using SpaceNet classifiers from the nilearn python

The classifiers were run separately on each subject and were given the trial-wise cue-related activity beta maps as input. We performed initial feature selection by considering only those voxels whose activity increased with reward magnitude at cue (group-level p < 0.001 uncorrected). We used a leave-one-participant-out approach to construct the masks to ensure a [...] trained on data with true labels, but for the 1000 other permutations the labels for the training data were randomly shuffled. The classification target for the classifiers was action identity and the analysis yielded binary accuracy scores for each trial, representing whether the action cued on a trial was successfully decoded. With trial-wise decoding accuracies, we were able to test whether action decoding was enhanced for trials with larger incentives ($30) compared to trials with smaller incentives ($5, $10). We used permutation testing to assess whether decoding accuracy at each reward level was above chance and whether the differences in decoding accuracy between reward levels were greater than would be expected by chance. We computed p-values for these means using the formula (C + 1) / (N + 1), where N is the total number of permutations (1000) and C is the number of permutations whose means were greater than or equal to the 'true' mean. To address issues of multiple comparisons, we used 1000 bootstrap samples from the null distributions to determine the probability of obtaining statistically significant effects in zero, one, two, three, etc. ROIs under the null. This analysis showed that it was unlikely (p < .05) under the null to observe significant effects in multiple ROIs (Fig. S6).
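The permutation p-value formula (C + 1) / (N + 1) can be written directly as a small function. This is a minimal sketch; the function name and the simulated null distribution are illustrative assumptions, not the authors' code or data.

```python
import random

def permutation_p_value(true_mean, null_means):
    """Return (C + 1) / (N + 1), where C counts null means >= the true mean."""
    n = len(null_means)
    c = sum(1 for m in null_means if m >= true_mean)
    return (c + 1) / (n + 1)

# Simulated null: mean accuracy of 100 coin-flip "decodings" per permutation.
rng = random.Random(0)
null_means = [
    sum(rng.random() < 0.5 for _ in range(100)) / 100
    for _ in range(1000)
]
p = permutation_p_value(0.61, null_means)
```

Adding one to both numerator and denominator means the smallest attainable p-value with 1000 permutations is 1/1001, so the estimate can never be exactly zero.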

Brain-behavior Analysis

Lastly, we examined the link between action decoding and behavior. We used a hierarchical logistic regression model with Bayesian parameter estimation to model trial success. This model was designed to test whether higher decoding accuracy in our ROIs was associated with higher behavioral accuracy. The dependent variable was behavioral performance (success or failure). The models included fixed effects of decoding accuracy (correct or incorrect) and sequence identity (A or B) and allowed intercepts to vary by subject. The sequence identity predictor was included to control for the effect of sequence and thereby separate this effect from the effect of decoding accuracy.
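One way to write the brain-behavior model described above is given below. The notation is an assumption introduced for illustration: $y_{ij}$ is success on trial $i$ for subject $j$, $d_{ij}$ indicates whether the action was correctly decoded, $s_{ij}$ indicates sequence identity, and $u_j$ is the subject-specific intercept. Priors are omitted because they are not stated in this section.

```latex
\begin{aligned}
y_{ij} &\sim \mathrm{Bernoulli}(p_{ij}) \\
\operatorname{logit}(p_{ij}) &= \alpha + u_j + \beta_{\mathrm{dec}}\, d_{ij} + \beta_{\mathrm{seq}}\, s_{ij} \\
u_j &\sim \mathcal{N}(0, \sigma_u^2)
\end{aligned}
```

Under this formulation, a positive $\beta_{\mathrm{dec}}$ would indicate that trials with correctly decoded actions have higher odds of behavioral success after adjusting for sequence.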

Data-availability

Data and code used in this project are available at https://github.com/adkinsty/dsp_scanner.

Figure 1. Discrete sequence production task. To study the effects of motivation on skilled action, participants performed two 8-item motor sequences for monetary incentives. The diagram above depicts an example trial from the reward session (training was the same except there were no incentives). Reward and sequence (color) were cued at the start of each trial. Our fMRI analyses focused on hemodynamic responses to this cue. After a brief delay, the sequence was performed. If it was completed under a specific time limit, the trial was successful.

Prospective reward improves motor sequence performance

Our participants were more likely to successfully complete sequences on $30 trials compared to both $10 trials (β_{30−10} = 0.14, CI = [−0.04, 0.29], pd = 94.5%; Fig. 2A) and $5 trials

Prospective reward increases preparatory activity in brain areas involved in motor planning

We first asked which regions of the brain were linearly responsive to motivation (i.e., prospective reward value) just prior to movement. This analysis revealed clusters of reward-related activity in most of the brain regions that we expected to be engaged during the DSP [...] Table S1). Unsurprisingly, we also observed clusters of reward-related activity in reward regions such as the striatum and the pallidum. This analysis provided a reliable map of the brain regions whose activity was modulated by prospective rewards. We did not find any regions in which the extent of this reward modulation significantly differed between the sequences (whole-brain paired t-test, all P > 0.05, corrected). Of particular interest to this study is the finding that many regions

Reward-modulated brain areas encode motor skill information

We performed MVPA using SpaceNet classifiers to identify the subset of regions from the reward map (Figure 3) whose patterns of activity also contained information about the upcoming action. Our first set of SpaceNet classifiers was trained to predict the identity of intended actions from patterns of brain activity preceding movement. At the group level, these classifiers predicted the intended action (for a held-out 20% of trials) with a mean accuracy of 0.61 and a standard deviation of 0.12 across subjects (P < .01; Figure 4A, Figure S5). We found action information distributed across the cortex including LPFC, SMA, and M1 (Figure 4B, Table S2). Importantly, this analysis considered only voxels that were responsive to reward magnitude. The regions in Figure 4B therefore simultaneously respond to changes in motivation and contribute to a representation of the action being prepared (Fig. S7). These findings are consistent with

At the group level, the classifiers predicted future behavioral performance (on a held-out 20% of data) with a mean accuracy of 0.79 and standard deviation of 0.09 across subjects (p < .001, Figure 5A, Figure S5). This decoding analysis revealed a more restricted information map than the analysis of action decoding, but the map still included informative clusters in LPFC, SMA, and M1 (Figure 5B, Table S3).

We performed several control analyses to ensure our classifiers were not merely detecting univariate differences between our conditions of interest (see Methods for a list of contrasts). These analyses did not reveal any significant clusters of activity when directly contrasting the mean univariate response between two sequences that survived multiple comparisons correction (no significant clusters of activity at p < 0.05). This suggests that our multivariate analyses were more sensitive in distinguishing between the two actions. Additionally, the linear effect of reward did not differ between the two sequences (no significant clusters of activity at p < 0.05). In another univariate control analysis, we tested whether the mean difference in cue-related activity for subsequently correct vs incorrect trials differed between the two sequences. We again found no significant clusters of activity in this analysis, suggesting that the univariate response on both correct and incorrect trials was similar across the two sequences. Thus, although behavioral performance differs slightly between the sequences and at the different levels of reward, it is unlikely that our decoding results are driven by mere univariate differences.

Our fMRI results suggest that the reward-related regions identified previously are not only responsive to changes in motivation, but also relevant to future behavior. Furthermore, the information maps from our two MVPAs (Figures 4 and 5) were found to overlap in key regions of interest such as LPFC, SMA, and M1 (Table S4). This conjunction map, shown in Figure 6,

Successful action decoding in SMA coincides with more accurate behavioral performance

Our results above suggest that motivation by prospective reward enhances action as well as neural representations of those actions in the brain. While these motivational effects may be coincidental, we considered the possibility that the enhanced action coding may be a neural mechanism by which subsequent behavior is enhanced. It follows from this hypothesis that behavioral performance should be better on trials in which action codes had high fidelity (i.e., when action identity was correctly decoded from preparatory brain activity).

We found evidence that participants were more likely to succeed when action could be decoded 

In this study, we sought to examine the neural mechanisms that contribute to the motivational enhancement of action. We found that performance in a motor sequencing task improved as the size of prospective performance-contingent reward increased. When examining cue-related activity just before movement onset, we uncovered distributed patterns of activity across a large network of regions important for motor planning that simultaneously coded for reward value, action, and future behavioral success. We then interrogated a subset of these regions to examine how action coding was impacted by increasing reward values. We found that our ability to decode upcoming actions from pre-SMA improved as reward values increased. A follow-up analysis showed that people were more likely to succeed on trials in which we could correctly decode the upcoming action from preparatory activity in SMA. Our results suggest that incentive-motivated performance may depend on enhanced representations of action used in movement planning.

Our results show that motivation (i.e., prospective reward value) modulated a widespread task network prior to movement (Figure 3). In these regions, the amplitude of the hemodynamic response to the cue was larger for high-value trials compared to low-value trials. This

Enhanced action representations as a mechanism for enhanced performance

Our multivariate analyses showed that motivational signals and preparatory action representations converge in several brain areas. While such convergence is likely necessary for motivation to enhance behavior, it is unclear exactly what happens when these signals converge. One possibility is that motivation enhances cortical representations of upcoming actions. We addressed this possibility in follow-up ROI analyses and found evidence for such enhancement in pre-SMA (Figure 7A). Our results also show that anticipatory action decoding in SMA was linked to subsequent behavioral performance (Figure 7B). Together, these results provide support for the hypothesis that the prospect of reward may lead to enhanced performance by enhancing prospective representations of action used in planning.

Although our a priori focus was on LPFC as a key area involved in the enhancement of action by reward, our results predominantly implicate SMA and pre-SMA. It is known that cells in the supplementary motor area encode information about the sequential order of future movements (Tanji & Shima, 1994), while cells in pre-SMA are responsible for updating such sequential movement plans (Shima et al., 1996). This was corroborated by more recent work demonstrating a deficit in the inhibition of planned movements following lesion to the right pre-

It is possible that reward also enhances performance through enhanced processing during the execution period itself. However, we chose to focus on activity at cue (motor preparation) rather

We think it is plausible that the action decoding we observed was driven by action identity (e.g., sequence order) and that these representations became more distinct with increased

However, none of the skills in our task were expert, having been trained for only a few hundred trials at most. If decoding were driven by skill level, it is unclear why such brain differences in skill level would be enhanced by reward. One possibility is that novice and trained skills differentially engage higher cognitive processes (Poldrack et al. 2005). If reward modulates these processes, it may result in increased differentiation between patterns of activity associated with novice and trained skills. Regardless, it is clear that preparatory activity differs between the two sequences, that motivation enhances the distinctiveness of this preparatory activity, and that this increased distinctiveness coincides with better task performance.

Our analyses also focused heavily on decoding action information from patterns of cortical activity. We focused on this aspect of our dataset because prior work validated the approach of

Closing remarks

In sum, we provide evidence that behavioral performance is enhanced by motivation and that a widespread network of motor planning regions jointly contains information about reward, action, and performance. Additionally, our ability to decode skilled actions from patterns of BOLD activity from these regions in isolation was enhanced by the prospect of large rewards. We

Responses to reward in monkey dorsal and ventral striatum

Structured Sparsity Models for Brain Decoding from fMRI data

Prefrontal Cortex Drives Mesolimbic Dopaminergic Regions to Initiate Motivated Behavior

The Role of the Dorsal Striatum in Reward and

Motivation and cognitive control: from behavior to neural mechanism. Annual review of psychology

Efficient design of event-related fMRI experiments using M-sequences

brms: An R Package for Bayesian Multilevel Models Using Stan

Contributions of Orbitofrontal and Lateral Prefrontal Cortices to Economic Choice and the Good-to-Action Transformation

Supplementary motor area encodes reward expectancy in eye-movement tasks

Stan: A Probabilistic Programming Language

Confidence intervals in within-subject designs: A simpler solution to Loftus and Masson's method

AFNI: software for analysis and visualization of functional magnetic resonance neuroimages

Reward-related responses in the human striatum

Motor skill learning between selection and execution

Benchmarking solvers for TV-L1 least-squares and logistic regression in brain imaging

Distinct contribution of the cortico-striatal and cortico-cerebellar systems to motor skill learning

The basal ganglia: from motor commands to the control of vigor

Motivation sharpens exogenous spatial attention

Avoiding non-independence in fMRI data analysis: leave one subject out

Reward Motivation Enhances Task Coding in Frontoparietal Cortex

Action-Specific Value Signals in Reward-Related Regions of the Human Brain

Computational models of motivated action selection in corticostriatal circuits

Visualization in Bayesian workflow

Motor Cortex Excitability Reflects the Subjective Value of Reward and Mediates Its Effects on Incentive-Motivated Performance. The Journal of Neuroscience

A weakly informative default prior distribution for logistic and other regression models

A multi-modal parcellation of human cerebral cortex

Functional mapping of sequence learning in normal humans

Interpretable brain-wide prediction analysis with GraphNet

Neurons in Anterior Cingulate Cortex Multiplex Information about Reward and Action

Parallel basal ganglia circuits for decision making

Central mechanisms of motor skill learning

Capturing the temporal evolution of choice across prefrontal cortex

Prefrontal cortex mediation of cognitive enhancement in rewarding motivational contexts

Adaptation of Prefrontal Cortical Firing Patterns and Their Fidelity to Changes in Action-Reward Contingencies

Reward-dependent modulation of working memory in lateral prefrontal cortex. The Journal of neuroscience : the official journal of the Society for Neuroscience

A region of mesial prefrontal cortex tracks monetarily rewarding outcomes: characterization with rapid event-related fMRI

Influence of Reward Expectation on Visuospatial Processing in Macaque Lateral Prefrontal Cortex

Mental labour

Haith AM. 2019. Motor Learning. Comprehensive Physiology

Reward modulation of prefrontal and visual association cortex during an incentive working memory task

Bayesian Assessment of Null Values Via Parameter Estimation and Model

Out of control: Diminished prefrontal activity coincides with impaired motor performance due to choking under pressure

Effect of Expected Reward Magnitude on the

Indices of Effect Existence and Significance in the Bayesian Framework

Toward an Autonomous Brain Machine Interface: Integrating Sensorimotor Reward Modulation and Reinforcement Learning. The

Characterizing the Associative Content of Brain Structures Involved in Habitual and Goal-Directed Actions in Humans: A Multivariate fMRI Study

A Comparative Study of Algorithms for Intra- and Inter-subjects fMRI Decoding

Motivational state, reward value, and Pavlovian cues differentially affect skilled forelimb grasping in rats

The role of the pre-supplementary motor area in the control of action

Decoding sequential finger movements from preparatory activity in higher-order motor regions: a functional magnetic resonance imaging multi-voxel pattern analysis

Attentional requirements of learning: Evidence from performance measures

The neural basis of motivational influences on cognitive control

Incentives Boost Model-Based Control Across a Range of Severity on Several Psychiatric Constructs

Scikit-learn: Machine Learning in Python

Many hats: intratrial and reward level-dependent BOLD activity in the striatum and premotor cortex

The neural correlates of motor skill automaticity

Cortical neurons multiplex reward-related signals along with sensory and motor information

Premotor and Motor Cortices Encode Reward

Impact of Expected Reward on

Frontal and Supplementary Eye Fields and Premotor Cortex

Neuronal activity related to reward value and motivation in primate frontal cortex

Representation of Action-Specific Reward Values in the Striatum

An fMRI Study of the Role of the Medial

Neurocognitive Contributions to Motor Skill Learning: The Role of Working Memory

Value-based modulations in human visual cortex

Basal ganglia contributions to motor control: a vigorous tutor

Role for cells in the presupplementary motor area in updating motor plans

Differential impact of reward and punishment on functional connectivity after skill learning

Role for supplementary motor area cells in planning several movements ahead

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

Differential effect of reward and punishment on procedural learning. The Journal of neuroscience : the official journal of the Society for Neuroscience

Heterogeneous reward signals in prefrontal cortex. Current opinion in neurobiology

Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task

Coding and monitoring of motivational context in the primate prefrontal cortex. The Journal of neuroscience : the official journal of the Society for Neuroscience

Cognitive effort: A neuroeconomic approach. Cognitive, affective & behavioral neuroscience

Skill learning strengthens cortical representations of motor sequences

Direct Comparison of Neural Systems Mediating Conscious and Unconscious Skill Learning

Interactions of motivation and cognitive control. Current Opinion in

The Role of Human Primary Motor Cortex in the Production of Skilled Finger Sequences