authors: Weineck, Kristin; Wen, Olivia Xin; Henry, Molly J.
title: Neural entrainment is strongest to the spectral flux of slow music and depends on familiarity and beat salience
date: 2021-12-10
journal: bioRxiv
DOI: 10.1101/2021.11.29.470396

Neural activity in the auditory system synchronizes to sound rhythms, and brain–environment synchronization is thought to be fundamental to successful auditory perception. Sound rhythms are often operationalized in terms of the sound's amplitude envelope. We hypothesized that – especially for music – the envelope might not best capture the complex spectro-temporal fluctuations that give rise to beat perception and synchronize neural activity. This study investigated 1) neural entrainment to different musical features, 2) tempo-dependence of neural entrainment, and 3) dependence of entrainment on familiarity, enjoyment, and ease of beat perception. In this electroencephalography study, 37 human participants listened to tempo-modulated music (1–4 Hz). Independent of whether the analysis approach was based on temporal response functions (TRFs) or reliable components analysis (RCA), the spectral flux of music – as opposed to the amplitude envelope – evoked strongest neural entrainment. Moreover, music with slower beat rates, high familiarity, and easy-to-perceive beats elicited the strongest neural response. Based on the TRFs, we could decode music stimulation tempo, but also perceived beat rate, even when the two differed. Our results demonstrate the importance of accurately characterizing musical acoustics in the context of studying neural entrainment, and demonstrate entrainment's sensitivity to musical tempo, familiarity, and beat salience.

Neural activity synchronizes to different types of rhythmic sounds, such as speech and music (Doelling and Poeppel, 2015; Nicolaou et al., 2017; Ding et al., 2017; Kösem et al., 2018), over a wide range of rates.

Because of the complex nature of natural polyphonic music, we hypothesized that the amplitude envelope might not be the only or most dominant feature by which neural activity would be entrained (Müller, 2015). Thus, the current study investigated neural responses to different musical features that evolve over time and capture different aspects of the stimulus dynamics. Here, we use the term musical feature to refer to time-varying aspects of music that fluctuate on time scales corresponding roughly to the neural δ band, as opposed to elements of music such as key, harmony, or syncopation. We examined the amplitude envelope, the first derivative of the amplitude envelope (usually more sensitive to sound onsets than the amplitude envelope), beat times, and spectral flux, which describes spectral changes of the signal on a frame-to-frame basis by computing the difference between the spectral vectors of subsequent frames (Müller, 2015). One distinct advantage of spectral flux over the envelope or its derivative is that spectral flux is sensitive to rhythmic information that is communicated by changes in pitch, even when they are not accompanied by changes in amplitude.
The current study investigated neural entrainment to natural music by using two different analysis approaches: reliable components analysis (RCA) (Kaneshiro et al., 2020) and temporal response functions (TRFs) (Di Liberto et al., 2020). RCA typically relies on stimulus-response correlation or stimulus-response coherence (Kaneshiro et al., 2020). These approaches have been criticized because of their potential susceptibility to autocorrelation, which is argued to be minimized in the TRF approach (Zuk et al., 2021). Thus, we tested the agreement between these two analysis approaches.

We aimed to answer four questions. 1) Does neural entrainment to natural music depend on tempo? 2) Which musical feature shows the strongest neural entrainment during natural music listening? 3) How well do RCA- and TRF-based methods agree when quantifying neural entrainment to natural music? 4) How do enjoyment, familiarity, and ease of beat perception affect neural entrainment? To answer these research questions, we recorded electroencephalography (EEG) data while participants listened to instrumental music presented at different tempi (1-4 Hz). Strongest neural entrainment was observed in response to the spectral flux of music, for tempi between 1 and 2 Hz, for familiar songs, and for songs with an easy-to-perceive beat. Moreover, a classifier trained on the neural responses to each musical feature predicted the metrical level at which listeners tapped the beat. This indicates that the brain responded to perceived tempo, even when it was different from the stimulus tempo.

Scalp EEG activity of 37 human participants was measured while they listened to instrumental segments of natural music from different genres (Supplementary Table 1). We also investigated the effects of enjoyment, familiarity, and the ease with which a beat was perceived (Fig. 1A). To be able to use a large variety of musical stimuli on the group level, and to decrease any effects that may have arisen from individual stimuli occurring at certain tempi but not others, participants were divided into four subgroups that listened to different pools of stimuli (for more details please see Materials and Methods). The subgroups' stimulus pools overlapped, but the individual song stimuli were presented at different tempi for each subgroup.

We examined neural synchronization to the time courses of four different musical features (Fig. 1B). First, we quantified energy fluctuations over time as the gammatone-filtered amplitude envelope (we report analyses on the full-band envelope in Supplementary Figures 2 and 4). Second, we computed the half-wave-rectified first derivative of the amplitude envelope, which is typically considered to be sensitive to the presence of onsets in the stimulus (Bello et al., 2005). Third, a percussionist drummed along with the musical segments to define beat times, which were here treated in a binary manner. Fourth, a spectral novelty function, referred to as spectral flux (Müller, 2015), was computed to capture changes in frequency content (as opposed to amplitude fluctuations) over time; a schematic computation is sketched below. In contrast to the first derivative, the spectral flux is better able to identify note onsets that are characterized by changes in spectral content (pitch or timbre), even if the energy level remains the same.
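For illustration, the following is a minimal MATLAB sketch of a spectral-flux computation in the spirit of Müller (2015). It is not the authors' exact pipeline (which also uses a gammatone-filtered envelope elsewhere); the window and hop sizes and the file name are placeholder assumptions.

% Spectral flux as the half-wave-rectified, frame-to-frame difference of a
% log-compressed magnitude spectrogram, summed over frequency.
[x, fsAudio] = audioread('music_segment.wav');   % placeholder file name
x   = mean(x, 2);                                % collapse to mono
win = round(0.046 * fsAudio);                    % ~46-ms analysis window (assumed)
hop = round(0.010 * fsAudio);                    % ~10-ms hop size (assumed)
S   = spectrogram(x, hann(win), win - hop, win, fsAudio);
Y   = log(1 + 100 * abs(S));                     % logarithmic compression
dY  = diff(Y, 1, 2);                             % spectral change between subsequent frames
flux = sum(max(dY, 0), 1);                       % keep increases only (half-wave rectification)
flux = zscore(flux);                             % features were z-scored before stimulus-brain analyses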
To ensure that each musical feature possessed acoustic cues to the stimulation-tempo manipulation, we computed a fast Fourier transform (FFT) on the musical-feature time courses separately for each stimulation-tempo condition; the mean amplitude spectra are plotted in Figure 1C. Overall, amplitude peaks were observed at the intended stimulation tempo and at the harmonic rates for all stimulus features.

In order to assess the degree to which the different musical features might have been redundant, we calculated mutual information (MI) for all possible pairwise feature combinations and compared MI values to surrogate distributions calculated separately for each feature pair (Fig. 1D, E). MI quantifies the amount of information gained about one random variable by observing a second variable (Cover and Thomas, 2005).

Figure 1. (A) Each trial consisted of the presentation of one music segment, during which participants were instructed to listen attentively without moving. After a 1-s silence, the last 5.5 s of the music segment was repeated while participants tapped their finger along with the beat. At the end of each trial, participants rated their enjoyment and familiarity of the music segment, as well as the ease with which they were able to tap to the beat (translated English example in the figure: "How much did you like the song?", rated from "not at all" to "very much"). (B) Exemplary traces of the four musical features of one music segment. (C) Z-scored mean amplitude spectrum of all four musical features. (D) Mutual information (MI) for all possible feature combinations (green) compared to a surrogate distribution (yellow; three-way ANOVA, *pFDR<0.001, rest: pFDR<0.05). Boxplots indicate the median and the 25th and 75th percentiles. (E) MI scores between all possible feature combinations (*pFDR<0.001, rest: pFDR<0.05).

Neural entrainment to music was investigated using two converging analysis pipelines based on (1) RCA followed by time-domain (stimulus-response correlation, SRCorr) and frequency-domain (stimulus-response coherence, SRCoh) analyses and (2) TRFs. First, an RCA-based analysis approach was used to assess tempo effects on neural entrainment to music (Fig. 2, Supplementary Fig. 2). Second, using convolution and ridge regression to avoid overfitting, the TRF was computed based on mapping each musical feature to "training" EEG data. Using a leave-one-trial-out approach, the EEG response for the left-out trial was predicted based on the TRF and the stimulus feature of the same trial. The predicted EEG data were then correlated with the actual, unseen EEG data (we refer to this correlation value throughout as the TRF correlation). We analyzed the two outputs of the TRF analysis: the filter at different time lags, which typically resembles evoked potentials, and the TRF correlations (Fig. 3, Supplementary Fig. 4). Again, strongest neural entrainment (here quantified as the Pearson correlation coefficient between the predicted and actual EEG data) was observed for slower music (Fig. 3A).
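The leave-one-trial-out TRF correlation described above can be sketched from scratch as ridge-regularized lagged regression. This is an illustration only: the lag range, ridge parameter, and the hypothetical variables stimTrials and eegTrials (cell arrays holding one z-scored feature vector and one single-channel or single-component EEG vector per trial) are assumptions, not the paper's exact implementation.

% Leave-one-trial-out TRF: train on all-but-one trial, predict the held-out EEG,
% and correlate prediction with the actual data ("TRF correlation").
fs     = 128;                          % features and EEG downsampled to 128 Hz
lags   = 0:round(0.4 * fs);            % time lags 0-400 ms
lambda = 1e2;                          % ridge parameter (assumed)
nTrial = numel(stimTrials);
rTRF   = zeros(nTrial, 1);
for k = 1:nTrial
    trainIdx = setdiff(1:nTrial, k);
    XtX = 0; Xty = 0;
    for t = trainIdx                                   % accumulate normal equations over training trials
        X   = lagMat(stimTrials{t}, lags);
        XtX = XtX + X' * X;
        Xty = Xty + X' * eegTrials{t};
    end
    w       = (XtX + lambda * eye(numel(lags))) \ Xty; % ridge-regularized TRF weights
    pred    = lagMat(stimTrials{k}, lags) * w;         % predicted EEG for the left-out trial
    rTRF(k) = corr(pred, eegTrials{k});                % TRF correlation for this trial
end

function X = lagMat(s, lags)                           % stimulus delayed by each lag (zeros padded)
    n = numel(s); X = zeros(n, numel(lags));
    for i = 1:numel(lags)
        L = lags(i);
        X(L+1:n, i) = s(1:n-L);
    end
end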
Figure 2 (legend, partial). Highest correlations were found at slow tempi (repeated-measures ANOVA for evaluating tempo differences, with Greenhouse-Geisser correction where applicable). The slopes of regression models were used to compare the tempo-specificity between musical features (repeated-measures ANOVA). (C) Mean SRCorr across musical features. Highest correlations were found in response to spectral flux. There were significant differences between all possible feature combinations except between the envelope and beat onsets (repeated-measures ANOVA, Tukey's test, pFDR<0.001). Boxplots illustrate the median and the 25th and 75th percentiles. (D) Same as (C) for the frequency-based SRCoh. All possible feature combinations were significantly different from each other apart from the envelope and beat onsets (pFDR<0.001). Coherence values were averaged over the stimulation tempo and first harmonic. Normalized SRCoh in response to the (E) amplitude envelope, (F) first derivative, (G) beat onsets and (H) spectral flux. Each panel depicts the stimulus-response coherence as a color plot (left) and the pooled SRCoh values at the stimulation tempo and first harmonic (right). (I) Mean differences of SRCoh values at the stimulation tempo and the first harmonic (negative values: higher SRCoh at the harmonic; positive values: higher SRCoh at the stimulation tempo; paired-sample t-test, *pFDR<0.05, **pFDR<0.001). (J) Same as (I) based on the FFT amplitudes of each musical feature.

As natural music is a complex, multi-layered auditory stimulus, we sought to explore the neural response to different musical features and to identify the stimulus feature or features that would evoke strongest neural entrainment. Regardless of the dependent measure (RCA-SRCorr, RCA-SRCoh, TRF correlation), strongest neural entrainment was found in response to the spectral flux (Fig. 2C-D, 3B). In particular, significant differences (as quantified with a repeated-measures ANOVA followed by Tukey's test) were observed between the spectral flux and all other musical features using the SRCorr (FSRCorr(3,132)=43.99, pGG=1.85e-11, η²=0.58), SRCoh (FSRCoh(3,132)=30.75, pGG=2.33e-9, η²=0.49) and TRF correlations (FTRF(4,165)=30.25, pGG=5.36e-11, η²=0.49).

As the TRF approach offers the possibility of running a multivariate analysis, all musical features were combined and compared to the single-feature TRF correlations (Fig. 3B). Although there was a significant increase in TRF correlations in comparison to the amplitude envelope (repeated-measures ANOVA with follow-up Tukey's test, pFDR=1.66e-8), first derivative (pFDR=1.66e-8) and beat onsets (pFDR=1.66e-8), the spectral flux alone showed an advantage over the multi-feature TRF (pFDR=3.39e-4). Thus, taking all stimulus features together is not a better descriptor of the neural response than the spectral flux alone, indicating, together with the MI results from Figure 1, that spectral flux is a more complete representation of the rhythmic structure of the music than the other musical features.

To test how strongly TRF correlations were modulated by each musical feature, a regression line was fitted to single-participant TRF correlations as a function of tempo, and the slopes were compared across musical features (Fig. 3A). Linear slopes were significantly higher for the spectral flux and the multivariate model compared to the remaining three musical features (repeated-measures ANOVA with follow-up Tukey's test, envelope-spectral flux: pFDR=5.44e-6; envelope-all: pFDR=3.54e-5; derivative-spectral flux: pFDR=9.98e-7; derivative-all: pFDR=1.53e-5; beat-spectral flux: pFDR=4.54e-7; beat-all: pFDR=3.46e-6; spectral flux-all: pFDR=0.12). The results for SRCorr were qualitatively similar (envelope-spectral flux: pFDR=1.24e-4; derivative-spectral flux: pFDR=2.21e-5; beat-spectral flux: pFDR=9.31e-5; Fig. 2B).
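A compact MATLAB sketch of this per-participant slope analysis; trfCorr (participants x tempi matrix of TRF correlations for one musical feature) is a hypothetical variable, and the tempo vector matches the 1-4 Hz design in steps of 0.25 Hz.

% Fit a line to each participant's TRF correlations across tempo and keep the slope;
% the slopes for each musical feature then enter a repeated-measures ANOVA.
tempi  = (1:0.25:4)';                         % 13 stimulation tempi (Hz)
nSubj  = size(trfCorr, 1);
slopes = zeros(nSubj, 1);
for s = 1:nSubj
    mdl = fitlm(tempi, trfCorr(s, :).');      % simple linear regression (Statistics Toolbox)
    slopes(s) = mdl.Coefficients.Estimate(2); % slope coefficient
end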
We also examined the time courses of TRF weights (Fig. 3C-F) for time lags between 0 and 400 ms. Cluster-based permutation testing (1000 repetitions) was used to identify time windows in which TRF weights differed across tempi for each musical feature (see Materials and Methods for more details). Significant effects of tempo on TRF weights were observed for the spectral flux between 102-211 ms (p=0.01; Fig. 3F-G). The tempo specificity was observable in the amplitudes of the TRF weights, which were largest for slower music. For the full-band (Hilbert) envelope and its first derivative, significant clusters differed in latency rather than amplitude; piecewise regressions across tempi from 1-2.5 Hz and from 2.75-4 Hz indicated decreasing latencies for both stimulus features (derivative: T1-2.5Hz=-1.08, p=0.33, R²=0.03; T2.75-4Hz=-2.2, p=0.09, R²=0.43), but this was only significant for the envelope (T1-2.5Hz=-6.1, p=0.002, R²=0.86; T2.75-4Hz=-5.66, p=0.005, R²=0.86).

So far, we demonstrated that both RCA- and TRF-based measures of neural entrainment lead to similar results at the group level, and reveal strongest neural entrainment to spectral flux and at slow tempi. Next, we wanted to quantify the relationship between the SRCorr/SRCoh and TRF correlations across individuals (Fig. 4, Supplementary Fig. 3).

Familiar songs and songs with an easy-to-tap beat drive strongest neural entrainment

Next, we tested whether neural entrainment to music depended on 1) how much the song was enjoyed, 2) the familiarity of the song, and 3) how easy it was to tap the beat of the song; each of these characteristics was rated on a scale ranging between -100 and +100. We hypothesized that difficulty to perceive and tap to the beat in particular would be associated with weaker neural entrainment. Ratings on all three dimensions are shown in Figure 5A. To evaluate the effects of tempo on the individual ratings, separate repeated-measures ANOVAs were conducted for each behavioral rating. All behavioral ratings were unaffected by tempo (enjoyment: F(12,429)=0.58, p=0.85, η²=0.02; familiarity: F(12,429)=1.44, pGG=0.18, η²=0.04; ease of beat tapping: F(12,429)=1.62, p=0.08, η²=0.05).

To assess the effects of familiarity, enjoyment, and beat-tapping ease on neural entrainment, TRFs in response to spectral flux were calculated for the 15 trials with the highest and the 15 trials with the lowest ratings per participant per behavioral rating condition (Fig. 5B-F). TRF correlations were not significantly different for less enjoyed compared to more enjoyed music (paired-sample t-test, t(33)=1.91, pFDR=0.06, re=0.36; Fig. 5C). In contrast, significantly higher TRF correlations were observed for familiar vs. unfamiliar songs (t(33)=-2.57, pFDR=0.03, re=0.46), and for songs with an easier-to-perceive beat (t(33)=-2.43, pFDR=0.03, re=0.44). These results were reflected in the TRFs at time lags between 0-400 ms (Fig. 5D-F).

Next, we wanted to entertain the possibility that musical training could modulate neural entrainment to music. Therefore, participants with less than 2 years of regular, daily music training were assigned to a "non-musician" group (n=17) and participants with over 6 years of regular music training were labelled as "musicians" (n=12).
Although there is little agreement about the specific criterion that should be used to define musician and non-musician participants, this division had the advantages that it ignored participants with medium amounts of training and that it divided our sample roughly equally. Subsequently, TRF correlations were compared between groups (Supplementary Fig. 6). Regardless of the stimulus feature, no significant differences were detected between participants with different levels of musical expertise (paired-sample t-test, envelope: pFDR=0.998; derivative: pFDR=0.998; beats: pFDR=0.833; spectral flux: pFDR=0.998). Moreover, the Goldsmiths Musical Sophistication Index (Gold-MSI) was used to quantify musical "sophistication" (referring not only to the years of musical training, but also to, e.g., musical engagement or self-reported abilities; Supplementary Fig. 6).

Brain responses to musical features predict perceived beat rate

In natural music, the beat can be perceived at multiple metrical levels. For that reason, it was possible that listeners did not perceive the beat at the tempo we intended (the stimulation tempo), but may have instead perceived the beat at double or half that rate. Thus, we wanted to explore whether our TRF-based measures of neural entrainment simply reflected the stimulus tempo that we presented, or whether they might be sensitive to the perceived beat rate when that differed from the stimulation tempo, i.e., the intended beat rate. For this analysis, we made use of the tapping data that were collected in the final part of each trial, during which participants finger-tapped to the beat for 5.5 s. Trials with at least three consistent taps were assigned to a perceived-tempo condition (1-4 Hz in steps of 0.25 Hz; see Materials and Methods for more details). In this study, we use the term "stimulation tempo" to refer to the predominant beat frequency in each music segment, whereas we use the term "tapped beat rate" when referring to the tapped frequency. The preferred tapped beat rate at the group level was ~1.55 Hz (Supplementary Fig. 7C; mode of a skewed Gaussian fitted to mean histograms of the relative number of trials per tapped beat rate).

We wanted to test whether we could identify the stimulation tempo (chosen by us) or the tapped beat rate (the rate the participant tapped) based on the neural data, in particular when the stimulation tempo and the tapped beat rate were different. We used a support vector machine (SVM) classifier, first, to predict the stimulation tempo (Fig. 6A-B) and, second, to predict the perceived (tapped) rate based on the neural response to different musical features (Fig. 6C-D). For predicting the stimulation tempo, we identified two sets of 6 trials (per participant): one set where the participants tapped the intended stimulation tempo, and the other set where they tapped the same rate but the intended stimulation tempo was twice as fast as what the participants tapped, i.e., participants tapped the subharmonic of the stimulation tempo. We were able to do this for 18 of our 34 participants. Next, TRFs were computed in response to each musical feature for each set of trials (tapped rate = intended stimulation tempo vs. same tapped rate = 2*stimulation tempo). The SVMs were computed using bootstrapping (100 repetitions) and a leave-one-out approach.
The mean SVM prediction accuracies for each musical feature were compared to a surrogate distribution generated by randomly shuffling the tempo labels (tapped rate = intended stimulation tempo vs. same tapped rate = 2*stimulation tempo) when training the SVM classifier. We observed significantly higher prediction accuracies than in the surrogate data for all musical features (Fig. 6A). This shows that even if the perceived tempo of two musical pieces is the same, the intended (acoustic) stimulation tempo evokes varying levels of neural entrainment. For comparing the prediction accuracies across musical features, an accuracy index ((AccuracyData - AccuracySurr)/(AccuracyData + AccuracySurr)) was submitted to a repeated-measures ANOVA. No significant differences between musical features were observed (F(3,68)=0.93, p=0.43, η²=0.06; Fig. 6B).

Next, the neural responses to different musical features were used to predict the tapped beat rate for sets of trials with the same stimulation tempo (intended stimulation tempo = tapped rate vs. same stimulation tempo = 2*tapped rate). Analogous to the previously described analysis pipeline, 13 individual datasets from different tempo conditions (this time from nine participants contributing one dataset each and two participants contributing two datasets each, to increase the sample size) were identified that met the criterion. Prediction accuracies were again significantly higher than in the surrogate data (Fig. 6C), suggesting that entrained neural responses also possess unique signatures of the perceived beat rate, even when it is different from the stimulation tempo. No significant differences in predicting the tapped beat rate between musical features were observed (F(3,48)=1.04, p=0.39, η²=0.09; Fig. 6D).

Figure 6 (legend, partial). SVM classifier predicting the stimulation tempo (n=18; tapped rate = intended stimulation tempo vs. same tapped rate = 2*stimulation tempo). Based on the TRFs to all musical features, significant differences in prediction accuracies were computed in comparison to a surrogate (paired-sample t-test, *pFDR<0.001). (B) Comparison of SVM classifier accuracies ((AccuracyData - AccuracySurr)/(AccuracyData + AccuracySurr)) across musical features revealed no significant differences in predicting the stimulation tempo (repeated-measures ANOVA, p=0.43). (C)-(D) Same as (A)-(B), but here the SVM classifier predicted the tapped rate based on the TRFs (n=13; intended stimulation tempo = tapped rate vs. same stimulation tempo = 2*tapped rate) (paired-sample t-test, *pFDR<0.001). No differences were observed in SVM prediction accuracies across musical features (repeated-measures ANOVA, p=0.39).

We investigated neural entrainment to naturalistic, polyphonic music presented at different tempi. The music stimuli varied along a number of dimensions in idiosyncratic ways, including the familiarity and enjoyment of the music and the ease with which the beat was perceived. The current study demonstrates that neural entrainment is strongest to 1) music with beat rates between 1 and 2 Hz, 2) the spectral flux of music, and 3) familiar music and music with an easy-to-perceive beat. In addition, 4) brain responses to the music stimuli were informative regarding the listeners' perceived metrical level of the beat, and 5) analysis approaches based on TRFs and RCA revealed converging results.

Neural entrainment was strongest to music with beat rates in the 1-2 Hz range

Strongest neural entrainment was found in response to stimulation tempi between 1 and 2 Hz in terms of SRCorr (Fig. 2B), TRF correlations (Fig. 3A), and TRF weights (Fig. 3C-F).
Moreover, we observed a behavioral preference to tap to the beat in this frequency range, as the group preference for music tapping was at 1.55 Hz (Supplementary Fig. 7C). Previous studies have shown a preference to listen to music with beat rates around 2 Hz (Bauer et al., 2015), which is moreover the modal beat rate in Western pop music (Moelants, 2002). Thus, there is a tight link between preferred rates of human body movement and preferred rates for the music we make and listen to, and this link was moreover reflected in our neural data. This is perhaps not surprising, as musical rhythm perception activates motor areas of the brain, such as the basal ganglia and supplementary motor area (Grahn and Brett, 2007), and is further associated with increased auditory-motor functional connectivity (Chen et al., 2008). In turn, involving the motor system in rhythm perception tasks improves temporal acuity (Morillon et al., 2014), but only for beat rates in the 1-2 Hz range (Zalta et al., 2020).

In the frequency domain, SRCoh was strongest at the stimulation tempo and its harmonics (Fig. 2E-I). In fact, highest coherence was observed at the first harmonic and not at the stimulation tempo itself (Fig. 2I). This replicates previous work that also showed higher coherence (Kaneshiro et al., 2020) and spectral amplitude (Tierney and Kraus, 2015) at the first harmonic than at the musical beat rate. There are several potential reasons for this finding. One reason could be that the stimulation tempo that we defined for each musical stimulus was based on the beat rate, but natural music can be subdivided into smaller units (e.g., notes) that can occur at faster time scales. A recent MEG study demonstrated inter-trial phase coherence for note rates up to 8 Hz (Doelling and Poeppel, 2015). Hence, the neural responses to the music stimuli in the current experiment likely tracked not only the beat rate, but also faster elements such as notes. In line with this hypothesis, FFTs conducted on the stimulus features themselves showed higher amplitudes at the first harmonic than at the stimulation tempo for all musical features except the beat onsets (Fig. 2J). Moreover, there are other explanations for higher coherence at the first harmonic than at the beat rate. For example, the low-frequency beat-rate neural responses fall into a steeper part of the 1/f slope, and as such may simply suffer from a worse signal-to-noise ratio than their harmonics.

Regardless of the reason, since frequency-domain analyses separate the neural response into individual frequency-specific peaks, it is easy to interpret neural tracking (SRCoh) or stimulus spectral amplitude at the beat rate and the note rate (or at the beat rate and its harmonics) as independent (Keitel et al., 2021). However, music is characterized by a nested, hierarchical rhythmic structure, and it is unlikely that neural tracking at different metrical levels goes on independently and in parallel. One potential advantage of TRF-based analyses is that they operate on relatively wide-band data compared to Fourier-based approaches, and as such are more likely to preserve nested neural activity and perhaps less likely to lead to over- or misinterpretation of frequency-specific effects.

Although music and speech share a number of characteristics (Patel, 2003), there are differences in their spectro-temporal composition that make spectral information especially important for music perception.
For example, while successful speech recognition requires 4-8 spectral channels, successful recognition of musical melodies requires at least 16 spectral channels (Shannon, 2005); the flipside of this is that music is more difficult than speech to understand based only on amplitude-envelope information. Moreover, increasing the spectral complexity of a music stimulus enhances neural entrainment (Wollman et al., 2020). Critically, both temporal and spectral information influence the perceived accent structure in music (Pfordresher, 2003).

A recent study claimed that neuronal activity synchronizes less strongly to music than to speech (Zuk et al., 2021); notably, they focused specifically on the amplitude envelope to characterize the stimulus rhythms. We argue that the amplitude envelope, even when passed through a model of the peripheral auditory system, is a suboptimal measure to approximate the individual note onsets that convey rhythmic structure in music and to which neural activity can be entrained (Müller, 2015). Imagine listening to a melody played in a glissando fashion on a violin. There might never be a clear onset that would be represented by the amplitude envelope; all of the rhythmic structure is communicated by spectral changes. Thus, in this study we wanted to compare neural entrainment by the amplitude envelope to neural entrainment by spectral flux, which compares spectral content, i.e., power spectra, on a frame-to-frame basis, and which is arguably a more appropriate measure of rhythmic and metrical structure in music. Indeed, many automated tools for extracting the beat in music used in the music information retrieval (MIR) literature rely on spectral flux information (Oliveira et al., 2010). Also in the context of body movement, spectral flux has been associated with the type and temporal acuity of synchronization between the body and music at the beat rate (Burger et al., 2018) to a greater extent than other acoustic characterizations of musical rhythmic structure. As such, we found that spectral flux drove stronger entrainment than the amplitude envelope.

Using TRF analysis, we found that not only was neural entrainment to spectral flux stronger than to any other musical feature, it was also stronger than the response to a multivariate predictor that combined all musical features. For this reason, we calculated the shared information (MI) between each pair of musical features, and found that spectral flux shared significant information with all other musical features (Fig. 1). Hence, spectral flux seems to capture information also contained in, for example, the amplitude envelope, but contains unique information about rhythmic structure that cannot be gleaned from the other acoustic features (Fig. 3). This finding has potentially important implications for direct comparisons of neural tracking of music and speech, or music and natural sounds (Zuk et al., 2021). We would caution that conclusions about differences in how neural activity entrains to different categories of sounds should be sure to characterize stimuli as fairly as possible rather than relying on the amplitude envelope as a one-size-fits-all summary of rhythmic structure.

We found that the strength of neural entrainment depended on the familiarity of the music and the ease with which a beat could be perceived (Fig. 5).
This is in line with a previous study showing stronger neural entrainment to familiar music (Madsen et al., 2019). It is likely that songs a person knows, i.e., familiar songs, increase engagement. We note that we did not have a measure of engagement, though engagement has been shown to be a major driver of neural entrainment during film viewing (Dmochowski et al., 2014).

There was also higher neural entrainment to music with subjectively "easy-to-tap-to" beats. However, both neural entrainment and ease of beat tapping were highest for slow stimulation tempi; faster songs were associated with weaker entrainment and tended to be rated as more difficult to tap to. Thus, in the current study, it is not possible to separate the influences of stimulation tempo and beat salience on neural entrainment. Here, we chose music stimuli with salient, easy-to-perceive beats; a design including more "weakly metrical" or syncopated rhythms may have more success in doing so. Overall, we interpret our results as indicating that stronger neural entrainment is evoked in response to music that is more predictable: familiar music with an easy-to-track beat structure.

Musical training did not affect the degree of neural entrainment in response to tempo-modulated music (Supplementary Fig. 6). This contrasts with previous music research showing that musicians' neural activity was entrained more strongly by music than non-musicians'.

One interesting yet difficult aspect of music, when it comes to studying entrainment, is that music has metrical structure; that is, there are several levels at which nested periodicities can be perceived. Here, we asked participants to tap along with short sections of each musical stimulus so that we could confirm that their perceived (tapped) beat rate matched our intended stimulation tempo. Although participants mostly tapped at the rate we intended, they sometimes tapped at half or double the intended stimulation tempo, especially when the stimulation tempo was particularly fast or slow, respectively. Here, we applied a classification approach to demonstrate that entrained neural responses to music can predict a) whether participants tapped at double-time or half-time to stimuli with the same stimulation tempo, or b) whether stimuli to which participants tapped identically belonged to the double-time or half-time stimulation-tempo condition. Importantly, neural activity was measured in response to auditory stimulation (without movement), and the perceived metrical level was based on the beat-tapping rate established in a separate part of each trial, after the listening portion was over. To our knowledge, this study constitutes the first to successfully identify the specific metrical level at which individuals perceived a beat in the absence of overt movement. Nonetheless, there are a few caveats to mention. First, we chose musical stimuli that all had a relatively easy-to-perceive beat. As a result, only 11 participants had enough trials with metrically ambiguous tapping behaviour to stimuli belonging to the same intended stimulation-tempo condition for conducting the TRF analysis. Moreover, we initially only included the beat-tapping section of each trial as a verification of the validity of our tempo manipulation.
As such, we only collected tapping responses for 5.5 s per trial, and tapping behavior was quite difficult to analyze due to the short tapping epochs, which resulted in many tapping trials being discarded.

In the present study, we used the TRF and RCA analysis approaches to quantify neural entrainment. We have purposefully avoided the debate about whether these metrics measure entrainment "in the narrow sense" (Obleser and Kayser, 2019), meaning phase-locked and (mainly) unidirectional coupling between a rhythmic input and neural activity generated by a neural oscillator (Lakatos et al., 2019), or whether neural tracking reflects convolution with an evoked response (Zuk et al., 2021). Here, we prefer to remain agnostic and refer to "entrainment in the broad sense" (Obleser and Kayser, 2019), that is, neural tracking of music independent of the underlying physiological mechanism.

RCA and TRF approaches share the ability to characterize neural responses to single-trial, ongoing, naturalistic stimuli. As such, both techniques afford something that is challenging or impossible to accomplish with "classic" ERP analysis. However, we made use of the two techniques in parallel in order to leverage their unique advantages. RCA allows for frequency-domain analysis such as SRCoh, which can be useful for identifying neural tracking responses specifically at the beat rate, for example. Past music studies often used a "frequency-tagging" approach for this, which is based on averaging over trials in the time domain (and so requires repetition of stimuli) rather than relating the neural response to the stimulus time course, and which moreover operates in electrode as opposed to component space (Nozaradan et al., 2012; Nozaradan et al., 2011). TRFs, by contrast, take into account wider-band neural data, which may better capture the tracking of nested metrical structure as in music. Moreover, TRFs offer univariate and multivariate analysis approaches that allowed us to show that adding other musical features to the model did not improve the correspondence to the neural data over and above spectral flux alone. Despite their differences, we found strong correspondence between the dependent variables from the two approaches. Specifically, TRF correlations were strongly correlated with SRCoh at the stimulation tempo, and this correlation was higher than for SRCoh at the first harmonic of the stimulation tempo for the amplitude envelope, derivative and beat onsets (Supplementary Fig. 5). Thus, despite being computed on a relatively broad range of frequencies, the TRF seems to correlate with frequency-specific measures at the stimulation tempo.

This study presented new insights into neural entrainment to natural music. We compared neural entrainment to different musical features and showed strongest neural responses to the spectral flux. This has important implications for research on neural entrainment to music, which has so far often quantified stimulus rhythm with what we would argue is a subpar acoustic feature: the amplitude envelope. Moreover, our findings demonstrate that neural entrainment is strongest for slower beat rates and for predictable stimuli, namely familiar music with an easy-to-perceive beat.
Participants

Thirty-seven participants completed the study (26 female, 11 male; mean age = 25.7 years, SD = 4.33 years, age range = 19-36 years). Target sample size for this study was estimated using G*Power3, assuming 80% power for a significant medium-sized effect. We estimated a target sample size of 24 (+4) for within-participant condition comparisons and 32 (+4) for correlations, and defaulted to the larger value since this experiment was designed to investigate both types of effects. The values in parentheses were padding to allow for discarding ~15% of the recorded data. The datasets of three participants were discarded because of large artefacts in the EEG signal (see EEG data preprocessing), technical problems, or failure to follow the experimental instructions. The behavioral and neural data of the remaining 34 participants were included in the analysis.

Prior to the EEG experiment, all participants filled out an online survey about their demographic and musical background using LimeSurvey (LimeSurvey GmbH, Hamburg, Germany, http://www.limesurvey.org). All participants self-identified as German speakers. Most participants self-reported normal hearing (7 participants reported occasional ringing in one or both ears). Thirty-four participants were right-handed and three were left-handed. Seventeen participants reported having no musical background (0-2 years of daily music training, here termed "non-musicians") and 12 reported at least 6 years of musical training ("musicians").

The stimulus set started from 39 instrumental versions of musical pieces from different genres, including techno, rock, blues, and hip-hop. The musical pieces were available in *.wav format from the Qobuz download store (https://www.qobuz.com/de-de/shop). Each musical piece was segmented manually using Audacity (version 2.3.3, Audacity Team, https://www.audacityteam.org) at musical phrase boundaries (e.g., between chorus and verse), leading to a pool of 93 musical segments with varying lengths between 14.4 and 38 s. We did not use the beat count from any publicly available beat-tracking software, because they did not track beats reliably across genres. Due to the first Covid-19 lockdown, we assessed the original tempo of each musical segment using an online method. Eight tappers, including the authors, listened to and tapped to each segment on their computer keyboards for a minimum of 17 taps; the tempo was recorded using an online BPM estimation tool (https://www.all8.com/tools/bpm.htm). In order to select stimuli with unambiguous, strong beats that are easy to tap to, we excluded 21 segments due to high variability in tapped metrical levels (if more than 2 tappers tapped differently from the others) or bad sound quality. The remaining 72 segments were then tempo-manipulated with a custom-written routine (Table 1).

Each participant was assigned to one of four pseudo-randomly generated stimulus lists. Each list comprised 4-4.6 min of musical stimulation per tempo condition (Kaneshiro et al., 2020), resulting in 7-17 different musical segments per tempo and a total of 159-162 segments (trials) per participant. Each segment was repeated only once per tempo but was allowed to occur up to three times at different tempi within one experimental session (the tempo difference between two presentations of the same segment was at least 0.5 Hz).
The presentation order of the musical segments was randomly generated for each participant prior to the experiment. The music stimuli were played at 50 dB sensation level (SL), based on individual hearing thresholds that were determined using the method of limits (Leek, 2001).

After attaching the EEG electrodes and seating the participant in an acoustically and electrically shielded booth, the participant was asked to follow the instructions on the computer screen (BenQ Monitor XL2420Z, 144 Hz, 24", 1920x1080, Windows 7 Pro, 64-bit). The auditory and visual stimulus presentation was achieved using custom-written Matlab scripts with Psychtoolbox (PTB-3; Brainard, 1997) in Matlab (R2017a; The MathWorks, Natick, MA, USA).

The overall experimental flow for each participant can be found in Figure 1A. First, each participant completed a self-paced spontaneous motor tempo task (SMT; Fraisse, 1982), which is a commonly used technique to assess an individual's preferred tapping rate (Rimoldi, 1951; McAuley, 2010). To obtain the SMT, each participant tapped for thirty seconds (3 repetitions) at a comfortable rate with a finger on the table close to a contact microphone (Oyster S/P 1605, Schaller GmbH, Postbauer-Heng, Germany). Second, we estimated each individual's hearing threshold using the method of limits. All sounds in this study were delivered by a Fireface soundcard (RME Fireface UCX Audiointerface, Audio AG, Haimhausen, Germany) via on-ear headphones (Beyerdynamics DT-770 Pro, Beyerdynamic GmbH & Co. KG, Heilbronn, Germany). After a short three-trial training, the main task was performed. The music stimuli in the main task were grouped into eight blocks, with approximately 20 trials per block and the possibility to take a break in between.

Each trial comprised two parts: attentive listening (music stimulation without movement) and tapping (music stimulation + finger tapping; Fig. 1A). During attentive listening, one music stimulus was presented (8.3-56.6 s) while the participant looked at a fixation cross on the screen; the participant was instructed to mentally locate the beat without moving. Tapping began after a 1-s interval; the last 5.5 s of the previously presented musical segment were repeated, and participants were instructed to tap a finger to the beat of the musical segment (as indicated by the replacement of the fixation cross by a hand on the computer screen). Note that 5.5 s of tapping data is not sufficient to conduct standard analyses of sensorimotor synchronization; rather, our goal was to confirm that the participants tapped at the intended beat rate based on our tempo manipulation. After each trial, participants were asked to rate the segment based on enjoyment/pleasure, familiarity and ease of tapping to the beat, using the computer mouse on a visual analogue scale ranging from -100 to +100. At the end of the experiment, the participant performed the SMT task again for three repetitions.

EEG data acquisition

EEG data were acquired using BrainVision Recorder (v.1.21.0303, Brain Products GmbH, Gilching, Germany) and a Brain Products actiCap system with 32 active electrodes attached to an elastic cap based on the international 10-20 location system (actiCAP 64Ch Standard-2 Layout Ch1-32, Brain Products GmbH, Gilching, Germany). The signal was referenced to the FCz electrode and grounded at the AFz position.
Electrode impedances were kept below 10 kOhm. Brain activity was acquired at a sampling rate of 1000 Hz via a BrainAmp DC amplifier (BrainAmp ExG, Brain Products GmbH, Gilching, Germany). To ensure correct timing between the recorded EEG data and the auditory stimulation, a TTL trigger pulse was sent over a parallel port at the onset and offset of each musical segment, and the stimulus envelope was recorded to an additional channel using a StimTrak (StimTrak, Brain Products GmbH, Gilching, Germany).

Behavioral data. Tapping data were processed offline with a custom-written Matlab script. To extract the taps, the *.wav files were imported and downsampled (from 44.1 kHz to 2205 Hz). The threshold for extracting the taps was adjusted manually for each trial (SMT and music tapping), and trials with irregular tap intervals were rejected. The SMT results were not analyzed as part of this study and will not be discussed further. For the music tapping, only trials with at least three taps (two intervals) were included for further analysis. Five participants were excluded from the music tapping analysis due to irregular and inconsistent taps within a trial (if > 40% of the trials were excluded).

One of our goals was to test whether we could identify trials based on the neural data where the perceived tempo differed from the intended stimulation rate (see Brain responses to musical features predict perceived beat rate). For this analysis, we identified two subsets of participants: those that tapped the same tempo to two sets of stimuli with different intended stimulation tempi, and those that tapped the intended stimulation tempo on some trials and a different tempo than what was intended (the harmonic or subharmonic) on other trials. We identified 18 participants that tapped for at least 6 trials at the intended stimulation tempo and for at least 6 trials at the same tempo when the stimulation tempo was different (double the tapped tempo; i.e., participants tapped at half the intended stimulation tempo). In contrast, we identified 11 participants that tapped for at least 6 trials at the intended stimulation tempo and for at least 6 trials at a different metrical level (half or double) of the same stimulation tempo.

All stimulus features were z-scored and downsampled to 128 Hz for computing the stimulus-brain synchrony. To account for slightly different numbers of samples between stimulus features, they were cut to have matching sample sizes.

To validate that each musical feature contained acoustic cues to our tempo manipulation, we conducted a discrete Fourier transform using a Hamming window on each musical segment (resulting frequency resolution of 0.0025 Hz), and averaged and z-scored the amplitude spectra per tempo and per musical feature (Fig. 1C).

To assess how much information the different musical features share, a mutual information (MI) score was computed between each pair of musical features (Fig. 1D). MI (in bits) is a time-sensitive measure that quantifies the reduction of uncertainty for one variable after observing a second variable (Cover and Thomas, 2005). MI was computed using quickMI from the Neuroscience Information Theory Toolbox with 4 bins, no delay, and a p-value cut-off of 0.001 (Timme and Lapish, 2018). For each stimulus feature, all trials were concatenated in the same order for each tempo condition and stimulation subgroup (Time x 13 Tempi x 4 Subgroups); a toy example of such an MI estimate is sketched below.
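The following MATLAB sketch estimates MI between two feature time courses with a simple 4-bin histogram estimator. It mirrors the binning used above but is not the toolbox's quickMI implementation, and fluxFeature/envFeature are placeholder variables.

% Toy MI estimate (in bits) between two feature time courses of equal length.
x = fluxFeature(:); y = envFeature(:);           % placeholder feature vectors
nBins  = 4;
edgesX = linspace(min(x), max(x), nBins + 1); edgesX(end) = Inf;
edgesY = linspace(min(y), max(y), nBins + 1); edgesY(end) = Inf;
bx  = discretize(x, edgesX); by = discretize(y, edgesY);
pxy = accumarray([bx by], 1, [nBins nBins]);     % joint histogram
pxy = pxy / sum(pxy(:));                         % joint probability
px  = sum(pxy, 2); py = sum(pxy, 1);             % marginals
pIndep = px * py;                                % product of marginals
valid  = pxy > 0;
miBits = sum(pxy(valid) .* log2(pxy(valid) ./ pIndep(valid)));
% The paper compares such values against surrogates in which one feature is
% time-reversed, e.g. recomputing the estimate with y = flipud(y).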
MI values for pairs of musical features were compared to surrogate datasets in which one musical feature was time-reversed (Fig. 1D). To statistically assess the shared information between musical features, a three-way ANOVA was performed (first factor: data vs. surrogate; second factor: tempo; third factor: stimulation subgroup).

EEG data preprocessing. EEG data were downsampled to 500 Hz and epoched from 1 s after stimulus onset (to remove onset responses to the start of the music stimulus) until the end of the initial musical-segment presentation (the attentive-listening part of the trial). Single trials and channels containing large artefacts were removed based on an initial visual inspection. Missing channels were interpolated based on neighbouring channels with a maximum distance of 3 (ft_prepare_neighbours). Subsequently, independent component analysis (ICA) was applied to remove artefacts and eye movements semi-automatically. After transforming the data back from component to electrode space, electrodes that exceeded 4 standard deviations of the mean squared data for at least 10% of the recording time were excluded. If bad electrodes were identified, preprocessing for that recording was repeated after removing the identified electrode (Kaneshiro et al., 2020). For the RCA analysis, if an electrode was identified for which 10% of the trial data exceeded a threshold of the mean + 2 standard deviations of the single-trial, single-electrode mean squared amplitude, the electrode data of the entire trial were replaced by NaNs. Next, noisy transients of the single-trial, single-electrode recordings were rejected: data points were replaced by NaNs when they exceeded a threshold of two standard deviations of the single-trial, single-electrode mean squared amplitude. This procedure was repeated four times to ensure that all artefacts were removed (Kaneshiro et al., 2020). For the TRF analysis, which does not operate on NaNs, noisy transients were replaced by estimates using shape-preserving piecewise cubic spline interpolation, or by the interpolation of neighbouring channels for single-trial bad electrodes. Next, the data were restructured to match the requirements of the RCA or TRF analyses (see below).

To examine the correlation between the neural signal and the stimulus over time, the stimulus-response correlation (SRCorr) was calculated for every musical feature. This analysis procedure was adopted from Kaneshiro et al. (2020). In brief, every stimulus feature was concatenated in time with trials of the same tempo condition and subgroup to match the neural component-by-time matrix. The stimulus features were temporally filtered to account for the stimulus-brain time lag, and the stimulus features and neural time courses were correlated. To create a temporal filter, every stimulus feature was transformed into a Toeplitz matrix, where every column repeats the stimulus-feature time course, shifted by one sample up to a maximum shift of 1 s, plus an additional intercept column. The Moore-Penrose pseudoinverse of the Toeplitz matrix and the temporal filter were used to calculate the SRCorr.
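A minimal MATLAB sketch of this temporal-filter construction, following the logic described above (cf. Kaneshiro et al., 2020); the variable names and the 128-Hz rate are assumptions rather than the authors' code.

% Lagged (Toeplitz-style) stimulus matrix with shifts of 0..1 s plus an intercept,
% temporal filter via the Moore-Penrose pseudoinverse, and SRCorr.
fs     = 128;                          % sampling rate of features and RCA component
maxLag = fs;                           % maximum shift of 1 s
s = stimFeature(:);                    % concatenated stimulus feature (placeholder)
r = rcaComponent(:);                   % concatenated neural component time course (placeholder)
n = numel(s);
S = zeros(n, maxLag + 1);
for L = 0:maxLag
    S(L+1:n, L+1) = s(1:n-L);          % column L+1: stimulus delayed by L samples
end
S = [S ones(n, 1)];                    % intercept column
g = pinv(S) * r;                       % temporal filter (least squares via pseudoinverse)
sFilt  = S * g;                        % temporally filtered stimulus
SRCorr = corr(sFilt, r);               % stimulus-response correlation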
TRFs were evaluated based on participant ratings of enjoyment, familiarity, and ease of tapping to the beat. Two TRFs were calculated per participant, based on the 15 highest and 15 lowest ratings on each measure (ignoring tempo condition and subgroup), and the TRF correlations and time lags were compared between the two groups of trials (Fig. 5). Significant differences between the groups were evaluated with paired-sample t-tests.

The effect of musical sophistication was analyzed by computing the Pearson correlation coefficient between the maximum TRF correlation across tempi per participant and the general musical sophistication (Gold-MSI) per participant (Supplementary Fig. 6).

A support vector machine (SVM) classifier tested whether TRFs captured information about the intended stimulation tempo, the perceived beat rate, or both (Fig. 6). As described previously (see Behavioral data), individual tempo conditions were identified in which participants tapped the same rate for two sets of trials that had different intended stimulation tempi, and conditions were also identified in which participants tapped two different rates in response to the same intended stimulation tempo. TRF analysis was performed separately for those two groups of trials, and the z-scored TRF weights were fed into the SVM classifier. First, the SVM classifier was trained to predict the stimulation tempo based on the TRF weights, contrasting trials on which the stimulation tempo corresponded to the tapped rate with trials on which the same tapped rate occurred at twice the stimulation tempo (tapped rate = intended stimulation tempo vs. same tapped rate = 2*stimulation tempo; n=18). In comparison, we next identified participants that tapped for 6 trials at the intended tempo and for 6 trials at the harmonic of that intended tempo (intended stimulation tempo = tapped rate vs. same stimulation tempo = 2*tapped rate; n=13); the resulting TRFs were used to predict the tapped rate of the participants. Overall, the classifier was trained to find the optimal hyperplane that separates the data (fitcsvm) and was validated with a leave-one-out cross-validation method (crossval). Classification error (quantified with kfoldLoss) was compared to a surrogate condition in which the labels of the classifier were randomly shuffled during the training step. The SVM was computed for 100 iterations of the surrogate data. An SVM-accuracy metric was quantified as (AccuracyData - AccuracySurr)/(AccuracyData + AccuracySurr).
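A condensed MATLAB sketch of this classification scheme (fitcsvm / crossval / kfoldLoss with label-shuffled surrogates); trfW (trials x features matrix of z-scored TRF weights) and labels (one condition label per trial) are placeholders, so this is an outline rather than the exact pipeline.

% Observed accuracy vs. label-shuffled surrogate, combined into the accuracy index.
nSurr   = 100;                                    % surrogate iterations, as in the paper
mdl     = fitcsvm(trfW, labels);                  % linear SVM on the TRF weights
cvMdl   = crossval(mdl, 'Leaveout', 'on');        % leave-one-out cross-validation
accData = 1 - kfoldLoss(cvMdl);                   % observed classification accuracy

accSurr = zeros(nSurr, 1);
for it = 1:nSurr
    shuffled    = labels(randperm(numel(labels)));                % shuffle condition labels
    cvSurr      = crossval(fitcsvm(trfW, shuffled), 'Leaveout', 'on');
    accSurr(it) = 1 - kfoldLoss(cvSurr);
end
accIndex = (accData - mean(accSurr)) / (accData + mean(accSurr)); % accuracy index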
The best predictors of the random effects and the fixed-effects coefficients (beta) were computed for every musical feature and illustrated as violin plots (Fig. 4).

Statistical Analysis. For each analysis, we assessed the overall difference between multiple subgroups using a one-way ANOVA. To test for significant differences across tempo conditions and musical features (TRF correlation, SRCorr and SRCoh), repeated-measures ANOVAs were conducted, coupled with Tukey's test, and Greenhouse-Geisser correction was applied when the assumption of sphericity was violated (as assessed with Mauchly's test). As effect-size measures, we report partial η² for repeated-measures ANOVAs and requivalent for paired-sample t-tests. Where applicable, the p-values were corrected using the False Discovery Rate (FDR).

References (titles as recoverable)
Individual musical tempo preference correlates with EEG beta rhythm
A tutorial on onset detection in music signals
The Psychophysics Toolbox. Spatial Vision
Semantic Context Enhances the Early Auditory Encoding of Natural Speech
Synchronization to metrical levels in music depends on low-frequency spectral components and tempo
Neural entrainment is associated with subjective groove and complexity for performed but not mechanical musical rhythms
Listening to musical rhythms recruits motor regions of the brain
Entropy, Relative Entropy, and Mutual Information
Multivariate Temporal Response Function (mTRF) Toolbox: A MATLAB Toolbox for Relating Neural Signals to Continuous Stimuli
Simple Acoustic Features Can Explain Phoneme-Based Predictions of Cortical Responses to Speech
Evidence for enhanced neural tracking of the speech envelope underlying age-related speech-in-noise difficulties
Cortical encoding of melodic expectations in human temporal cortex
Temporal modulations in speech and music
Neural coding of continuous speech in auditory cortex during monaural and dichotic listening
Audience preferences are predicted by temporal reliability of neural processing
Correlated components of ongoing EEG point to emotionally laden attention - a possible marker of engagement? Frontiers in Human Neuroscience
Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing
Cortical entrainment to music and its modulation by expertise
Rhythm and Tempo
Cortical oscillations and speech processing: emerging computational principles and operations
Rhythm and Beat Perception in Motor Areas of the Brain
Phasic Modulation of Human Somatosensory Perception by Transcranially Applied Oscillating Currents
Low-Frequency Neural Oscillations Support Dynamic Attending in Temporal Context. Timing & Time Perception
Frequency modulation entrains slow neural oscillations and optimizes human listening behavior
Natural music evokes correlated EEG responses reflecting temporal structure and beat
Frequency-Specific Effects in Infant Electroencephalograms Do Not Require Entrained Neural Oscillations: A Commentary on Neural Entrainment Determines the Words We Hear
Music Familiarity Affects EEG Entrainment When Little Attention Is Paid
A New Unifying Account of the Roles of Neuronal Entrainment
Pulse and meter as neural resonance
Adaptive procedures in psychophysical research. Perception & Psychophysics
Marching to the beat of the same drummer: the spontaneous tempo of human locomotion
Music synchronizes brainwaves across listeners with strong effects of repetition, familiarity and training
Tempo and rhythm. Music Perception
The time of our lives: life span development of timing and event tracking
Fundamentals of Music Processing: Audio, Analysis
Preferred tempo reconsidered
Motor contributions to the temporal precision of auditory attention
The Musicality of Non-Musicians: An Index for Assessing Musical Sophistication in the General Population
Is Modulated by Music Tempo. Frontiers in Human Neuroscience
Selective neuronal entrainment to the beat and meter embedded in a musical rhythm
Neural Entrainment and Attentional Selection in the Listening Brain
IBT: A Real-time Tempo and Beat Tracking System
FieldTrip: Open Source Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data
Correlated Components Analysis: Extracting Reliable Dimensions in Multivariate Data
Rhythm in language and music: parallels and differences
Neural Oscillations Carry Speech Rhythm through to Comprehension
Phase-Locked Responses to Speech in Human Auditory Cortex are Enhanced During Comprehension
The Role of Melodic and Rhythmic Accents in Musical Structure
Neural tracking of the speech envelope is differentially modulated by attention and language experience
Personal tempo
Comparison of Spontaneous Motor Tempo during Finger Tapping, Toe Tapping and Stepping on the Spot in People with and without Parkinson's Disease
requivalent: A simple effect size indicator
Speech and Music Have Different Requirements for Spectral Resolution
Local entrainment of α oscillations by visual stimuli causes cyclic modulation of perception
EEG and MEG coherence: measures of functional connectivity at distinct spatial scales of neocortical dynamics
Neural entrainment to the rhythmic structure of music
A Tutorial for Information Theory in Neuroscience. eNeuro
Sustained neural rhythms reveal endogenous oscillations supporting speech perception
Music as a scaffold for listening to speech: Better neural phase-locking to song than speech
Neural entrainment to music is sensitive to melodic spectral complexity
Natural rhythms of periodic temporal attention
Structure and function of auditory cortex: music and speech
Envelope reconstruction of speech and music highlights stronger tracking of speech at low frequencies

To report the SRCorr, the mean (± SEM) correlation coefficient across tempo conditions for every stimulus feature was calculated. For comparing tempo-specificity between musical features, a linear regression was fit to SRCorr values (and TRF correlations) as a function of tempo for every participant and for every musical feature (using fitlm). We compared the resulting slopes across musical features with a one-way ANOVA.

Stimulus-response coherence (SRCoh) is a measure that quantifies the consistency of phase and amplitude of two signals in a specific frequency band and ranges from 0 (no coherence) to 1 (perfect coherence) (Srinivasan et al., 2007). Here, the magnitude-squared coherence between the different stimulus features and the neural data was computed.

The assessment of TRF weights across time lags was accomplished by using a clustering approach for each musical feature and comparing significant data clusters to clusters from a random distribution (Fig. 3C-F). To extract significant time windows in which the TRF weights were able to differentiate the different tempo conditions, a one-way ANOVA was performed at each time point. Clusters (consecutive time windows) were identified if the p-value was below a significance level of 0.01, and the size and F-statistic of those clusters were retained.
Next, the clusters were compared to a surrogate dataset, which followed the same procedure but had the labels of the tempo conditions randomly shuffled before entering the ANOVA. This step was repeated 1000 times (permutation testing). At the end, the significance of clusters was evaluated by subtracting from 1 the proportion of times the summed F-values of each cluster exceeded the summed F-values of the surrogate clusters; a p-value below 0.05 was considered significant (Fig. 3G). This approach yielded significant regions for the full-band (Hilbert) envelope and derivative (Supplementary Fig. 4). As these clusters did not show differences across amplitudes but rather in time, a latency analysis was conducted. Therefore, local minima around the grand-average minimum or maximum within the significant time-lag window were identified for every participant and tempo condition, and the latencies were retained. As there was no significant correlation between latencies and tempo conditions, the stimulation tempi were split upon visual inspection into two groups (1-2.5 Hz and 2.75-4 Hz). Subsequently, a piecewise linear regression was fitted to the data and the R² and p-values were calculated (Supplementary Fig. 4G, K).
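A schematic MATLAB outline of this cluster-based permutation test, with placeholder data (trfW: participants x tempi x time lags array of TRF weights for one musical feature). The thresholds follow the values stated above; the remaining structure is an assumed skeleton rather than the authors' code.

% Pointwise one-way ANOVAs across tempi, clusters of consecutive significant lags,
% and comparison of summed cluster F-values against label-shuffled surrogates.
nPerm = 1000; alphaPoint = 0.01;
[nSubj, nTempi, nLags] = size(trfW);
[Fobs, pObs] = lagANOVA(trfW);
obsClusters  = clusterSums(Fobs, pObs < alphaPoint);       % summed F per observed cluster
maxSurr = zeros(nPerm, 1);
for it = 1:nPerm
    trfPerm = trfW;
    for s = 1:nSubj
        trfPerm(s, :, :) = trfW(s, randperm(nTempi), :);   % shuffle tempo labels within participant
    end
    [Fp, pp] = lagANOVA(trfPerm);
    cs = clusterSums(Fp, pp < alphaPoint);
    if ~isempty(cs), maxSurr(it) = max(cs); end
end
pCluster = arrayfun(@(c) mean(maxSurr >= c), obsClusters); % cluster-level p (significant if < 0.05)

function [F, p] = lagANOVA(W)                              % one-way ANOVA at every time lag
    nLags = size(W, 3); F = zeros(nLags, 1); p = zeros(nLags, 1);
    for L = 1:nLags
        [p(L), tbl] = anova1(squeeze(W(:, :, L)), [], 'off');   % columns = tempo conditions
        F(L) = tbl{2, 5};
    end
end

function s = clusterSums(F, sig)                           % sum F over runs of significant lags
    s = []; run = 0;
    for L = 1:numel(sig)
        if sig(L), run = run + F(L);
        elseif run > 0, s(end+1) = run; run = 0; %#ok<AGROW>
        end
    end
    if run > 0, s(end+1) = run; end
end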