key: cord-0316304-3f3tmuzk
authors: Gnanateja, G. Nike; Rupp, Kyle; Llanos, Fernando; Remick, Madison; Pernia, Marianny; Sadagopan, Srivatsun; Teichert, Tobias; Abel, Taylor J.; Chandrasekaran, Bharath
title: Deconstructing the Cortical Sources of Frequency Following Responses to Speech: A Cross-species Approach
date: 2021-10-04
journal: bioRxiv
DOI: 10.1101/2021.05.17.444462
sha: db1ef795f58fdf39dbe39fe47f56a3734fbab67c
doc_id: 316304
cord_uid: 3f3tmuzk

Time-varying pitch is a vital cue for human speech perception. Neural processing of time-varying pitch has been extensively assayed using scalp-recorded frequency-following responses (FFRs), an electrophysiological signal thought to reflect integrated phase-locked neural ensemble activity from subcortical auditory areas. Emerging evidence increasingly points to a putative contribution of auditory cortical ensembles to the scalp-recorded FFRs. However, the properties of cortical FFRs and the precise characterization of their laminar sources are still unclear. Here we used direct human intracortical recordings as well as extra- and intracranial recordings from macaques and guinea pigs to characterize the properties of cortical sources of FFRs to time-varying pitch patterns. We found robust FFRs in the auditory cortex across all species. We leveraged representational similarity analysis as a translational bridge to characterize similarities between the human and animal models. Laminar recordings in animal models showed FFRs emerging primarily from the thalamorecipient layers of the auditory cortex. FFRs arising from these cortical sources significantly contributed to the scalp-recorded FFRs via volume conduction. Our research paves the way for a wide array of studies to investigate the role of cortical FFRs in auditory perception and plasticity.

Significance Statement: Frequency following responses (FFRs) to speech are scalp-recorded neural signals that index the fidelity of sound encoding in the auditory system. FFRs, long believed to arise from the brainstem and midbrain, have shaped our understanding of subcortical auditory processing and plasticity. Non-invasive studies have shown cortical contributions to the FFRs; however, this is still actively debated. Here we employed direct cortical recordings to trace the cortical contribution to the FFRs and to characterize the properties of these cortical FFRs. With extra-cranial and intra-cranial recordings within the same subjects, we show that cortical FFRs indeed contribute to the scalp-recorded FFRs and that their response properties differ from those of the subcortical FFRs. The findings provide strong evidence to revisit and reframe FFR-driven theories and models of subcortical auditory processing and plasticity, with careful characterization of the cortical and subcortical components of the scalp-recorded FFRs.

Details of the cranial EEG recordings have been reported previously (Teichert, 2016; Teichert et al., 2016). Briefly, EEG electrodes manufactured in-house from medical-grade stainless steel were implanted in 1 mm deep, non-penetrating holes in the cranium of Maq1. All electrodes were connected to a 36-channel Omnetics connector embedded in dental acrylic at the back of the skull. The 33 electrodes formed a regularly spaced grid covering roughly the same anatomy as the international 10-20 system (Li and Teichert, 2020). All electrodes were referenced to an electrode placed at Oz.

For the single-tipped sharp electrode recordings in Maq1, neural activity was recorded with a chronically implanted 96-channel electrode array with individually movable electrodes (SC96 from Graymatter). For the laminar recording in Maq2, neural activity was recorded with a 24-channel laminar electrode (S-Probe from Plexon) positioned approximately perpendicular to the left superior temporal plane. The depth of the probe was adjusted iteratively until the prominent sound-evoked supra-granular source was located slightly above the center of the probe. At the time of the experiments, 12 of the electrodes were positioned in, or close enough to, the superficial layers of the auditory cortex to pick up frequency-tuned local field potentials. Six of these electrodes also picked up frequency-tuned multi-unit activity, suggesting that they were located in layer III or below. The devices in both animals were implanted over the right hemisphere in a way that allowed the electrodes to approach the superior temporal plane approximately perpendicularly.

All experiments were performed in small (4' wide by 4' deep by 8' high) sound-attenuating and electrically insulated recording booths (Eckel Noise Control Technology).

Animals were positioned and head-fixed in custom-made primate chairs (Scientific Design).

Neural signals were recorded with a 256-channel digital amplifier system (RHD2000, Intan) at a sampling rate of 30 kHz. Experimental control was handled by a Windows PC running an in-house modified version of the Matlab software package MonkeyLogic. Sound files were generated prior to the experiments and presented by a subroutine of the Matlab package Psychtoolbox. The sound files were presented using the right audio channel of a high-definition stereo PCI sound card (M-192 from M-Audiophile) operating at a sampling rate of 96 kHz and 24-bit resolution. The analog audio signal was then amplified by a 300 W amplifier (QSC GX3). The amplified electric signals were converted to sound waves using a single-element 4-inch full-range driver speaker (Tang Band W4-1879) located 8 inches in front of the animal, and presented at an intensity of 78 dB SPL. To determine sound onset with high accuracy, a trigger signal was routed through the unused left audio channel of the sound card directly to one of the analog inputs of the recording system. The trigger pulse was stored in the same stereo sound file and was presented using the same function call. Hence, any delay in the presentation of the tone also leads to an identical delay in the presentation of the trigger. Thus, sound onset could be determined at a level of accuracy limited only by the sampling frequency of the recording device (30 kHz, corresponding to 33 µs).
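The trigger-based onset logic above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the trigger pulse is recorded as an analog channel at 30 kHz and that an upward threshold crossing marks sound onset. The function names and the `threshold` level are hypothetical.

```python
import numpy as np

FS_REC = 30_000  # recording sampling rate stated in the text (Hz)

def trigger_onset_samples(trigger, threshold):
    """Sample indices where the trigger channel crosses `threshold` upward.

    `threshold` is a hypothetical detection level (e.g. half the pulse height).
    """
    above = np.asarray(trigger) >= threshold
    # rising edge: a sample at/above threshold whose predecessor was below it
    rising = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    if above.size and above[0]:
        rising = np.concatenate(([0], rising))  # pulse already high at the first sample
    return rising

def samples_to_ms(idx, fs=FS_REC):
    """Convert sample indices to milliseconds; one sample at 30 kHz is ~0.033 ms."""
    return 1000.0 * np.asarray(idx) / fs
```

Because the trigger shares the sound file and function call with the stimulus, any playback delay shifts both equally, so the only remaining uncertainty is one recording sample (1000/30000 ms, i.e. about 33 µs).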

The cranial EEG and intracranial recordings were performed on two wild-type (~8 months old), pigmented guinea pigs (GPs; GP1 and GP2; Cavia porcellus; Elm Hill Labs, Chelmsford, MA), weighing ~600-800 g. All experimental procedures were conducted according to NIH Guidelines for the care and use of laboratory animals and were approved by the Institutional Animal Care and Use Committee (IACUC) of the University of Pittsburgh.

Prior to commencing recordings, a custom headpost for head fixation, skull screws that served as EEG recording electrodes or as reference electrodes for intracranial recordings, and recording chambers for intracranial recordings were surgically implanted onto the skull using dental acrylic (C & B Metabond, Parkell Inc.) following aseptic techniques under isoflurane anesthesia. Analgesics were provided for three days after surgery, and animals were allowed to recover for ~10 days. Following recovery, animals were gradually adapted to the recording setup and head fixation for increasing durations of time.

All recordings were performed in a sound-attenuated booth (IAC) whose walls were covered with anechoic foam (Pinta Acoustics). Animals were head-fixed in a custom acrylic enclosure affixed to a vibration isolation tabletop. Stimuli were presented using Matlab (Mathworks, Inc.). Digital stimulus files sampled at 100 kHz were converted to an analog audio signal (National Instruments, USA), attenuated (Tucker-Davis Technologies, USA), power-amplified (Tucker-Davis Technologies, USA), and delivered through a calibrated speaker (4" full-range driver, TangBand, Taiwan) located ~0.9 m in front of the animal. Stimuli were presented at ~75 dB SPL.

FFRs were acquired from unanesthetized, head-fixed, passively listening GPs using a vertical electrode montage. Scalp-recorded activity was collected via a stainless-steel skull screw (Fine Science Tools, USA). Reference and ground conductive adhesive hydrogel electrodes (Foam electrodes, Kendall™ or Covidien Medi-trace®, USA) were placed on the high forehead and mastoid, respectively. Signals were acquired using a multichannel neural processor (Ripple Inc., USA).

Intracortical recordings were performed in unanesthetized, head-fixed, passively listening GPs. Small craniotomies (1-2 mm diameter) were performed within the implanted recording chambers, over the expected anatomical location of the primary auditory cortex. Neural activity was recorded using a high-density 64-channel multi-site electrode (Cambridge Neurotech) inserted roughly perpendicular to the cortical surface. The tip of the probe was slowly inserted to a depth of ~2 mm. After a brief waiting period to allow the tissue to settle, signals were acquired using a multichannel neural processor (Ripple Scout).

A summary of the participant information and of the stimulus and acquisition parameters across the three species is provided in Table 1.

A linear regression-based method was used to remove the harmonics of the power-line interference from the data, using the CleanLine plugin in EEGLAB.
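For intuition, regression-based line-noise removal can be sketched as a least-squares fit of sine and cosine regressors at the mains frequency and its harmonics, followed by subtraction of the fitted sinusoids. This is a simplified stand-in written in NumPy, not CleanLine's actual algorithm (which uses sliding-window multitaper regression). The 60 Hz mains frequency and the number of harmonics are assumptions, since the text does not state them.

```python
import numpy as np

def remove_line_harmonics(x, fs, f0=60.0, n_harmonics=3):
    """Least-squares removal of sinusoids at f0 and its harmonics from a 1-D signal.

    f0 and n_harmonics are ASSUMED values; the text does not specify them.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size) / fs
    cols = []
    for k in range(1, n_harmonics + 1):
        f = k * f0
        if f >= fs / 2:          # skip harmonics at or above the Nyquist frequency
            break
        cols += [np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)]
    X = np.column_stack(cols)    # one sine/cosine pair per harmonic (amplitude + phase)
    beta, *_ = np.linalg.lstsq(X, x, rcond=None)
    return x - X @ beta          # subtract only the fitted line-noise components
```

Fitting a sine and a cosine per harmonic lets the regression recover both amplitude and phase of the interference without a notch filter, so neighbouring FFR frequencies are left largely untouched.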

The raw sEEG was high-pass filtered using a third-order zero-phase-shift Butterworth filter. No low-pass filter was applied, because the 1000 Hz sampling rate already results in an effective low-pass limit (the Nyquist frequency) of 500 Hz. Time-locked sEEG epochs were extracted for all vowels in both polarities.

Epochs with amplitudes exceeding 75 µV were rejected. The FFRs in both polarities were averaged for each vowel, yielding four FFR waveforms (one per vowel) for each electrode.
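The filtering, epoching, rejection, and polarity-averaging steps can be sketched as follows. This is a hedged illustration using SciPy, not the authors' pipeline. The high-pass cutoff (`hp_cutoff`) is a hypothetical placeholder because the text does not give it, the default epoch window borrows the -25 to 250 ms used for the human scalp EEG (whether the sEEG used the same window is not stated), and averaging the two per-polarity means is one common way to combine polarities.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess_ffr(raw, fs, onsets, polarity, hp_cutoff=60.0,
                   win_s=(-0.025, 0.25), reject_uv=75.0):
    """High-pass filter, epoch, reject, and polarity-average one channel of FFR data.

    raw       : 1-D signal in microvolts
    onsets    : stimulus-onset sample indices
    polarity  : +1 / -1 for each onset (stimulus polarity)
    hp_cutoff : HYPOTHETICAL high-pass cutoff in Hz (not stated in the text)
    win_s     : HYPOTHETICAL epoch window in seconds relative to onset
    """
    sos = butter(3, hp_cutoff, btype="highpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, raw)  # forward-backward pass: zero phase shift
    start, stop = int(round(win_s[0] * fs)), int(round(win_s[1] * fs))
    epochs, pols = [], []
    for onset, pol in zip(onsets, polarity):
        a, b = onset + start, onset + stop
        if a < 0 or b > x.size:
            continue                      # epoch would run past the recording
        ep = x[a:b]
        if np.max(np.abs(ep)) > reject_uv:
            continue                      # amplitude-based artifact rejection
        epochs.append(ep)
        pols.append(pol)
    epochs, pols = np.asarray(epochs), np.asarray(pols)
    # average within each polarity, then average the two polarity means
    return 0.5 * (epochs[pols > 0].mean(axis=0) + epochs[pols < 0].mean(axis=0))
```

Forward-backward filtering with `sosfiltfilt` yields zero phase shift but doubles the effective filter order; the text does not say whether the stated order refers to the single or the combined pass.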

The inter-trial phase coherence (ITPC) was estimated to assess the frequencies at which the cortical units phase-lock. The single-trial FFRs were decomposed into a spectrogram
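Because the ITPC description is cut off in this excerpt, the following is only a generic sketch of how ITPC is commonly computed, not a reconstruction of the authors' settings. Each trial is projected onto a Hann-tapered complex sinusoid in short sliding windows, the resulting coefficients are normalized to unit length, and ITPC is the magnitude of their mean across trials (1 = perfect phase alignment, near 0 = random phases). The function name, window length, and step are hypothetical.

```python
import numpy as np

def itpc_spectrogram(trials, fs, freqs, win_len, step):
    """ITPC (0..1) per frequency and sliding time window.

    trials  : array (n_trials, n_samples) of single-trial FFRs
    win_len : HYPOTHETICAL window length in samples
    step    : HYPOTHETICAL hop between windows in samples
    """
    n_trials, n_samples = trials.shape
    taper = np.hanning(win_len)
    t_win = np.arange(win_len) / fs
    starts = np.arange(0, n_samples - win_len + 1, step)
    out = np.empty((len(freqs), starts.size))
    for i, f in enumerate(freqs):
        kernel = taper * np.exp(-2j * np.pi * f * t_win)  # tapered complex sinusoid
        for j, s in enumerate(starts):
            coefs = trials[:, s:s + win_len] @ kernel     # one complex value per trial
            unit = coefs / np.abs(coefs)                  # keep phase, discard amplitude
            out[i, j] = np.abs(unit.mean())               # phase consistency across trials
    return out, starts
```

Normalizing each coefficient before averaging makes ITPC insensitive to trial-to-trial amplitude changes, which is why it isolates phase-locking to the stimulus periodicity.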

The EEG data from the human participants were band-pass filtered from 80 to 1000 Hz and epoched from -25 to 250 ms (re: stimulus onset). Baseline correction was applied to each epoch, and epochs exceeding an amplitude of ±35 µV were excluded from further analysis.

The raw data were high-pass filtered using a second-order zero-phase-shift FIR filter.

Time-locked epochs were extracted for all vowels in both polarities. Epochs with amplitudes exceeding 250 µV were rejected. The FFRs in both polarities were averaged for each tone, yielding four FFR waveforms (one per tone).

Custom Matlab and R routines were used to filter and average the signals to obtain local field potentials (LFPs) and multi-unit activity (MUA) from the macaque and GP recordings. In the macaque, the current source density (CSD) was computed from the LFP signals recorded at electrodes with 150 µm spacing by spatially smoothing the LFP with a Gaussian filter (SD = 250 µm) and computing the second spatial derivative using the finite difference approximation. In the GP, the CSD was computed from the LFP signals recorded at alternate electrode contacts (60 µm spacing) by spatially smoothing the LFP with a Gaussian filter (SD = 125 µm) and computing the second spatial derivative using the finite difference approximation. The earliest-latency post-stimulus sink in the CSD was used to identify the thalamorecipient layers.
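The CSD steps described above (Gaussian smoothing across depth, then a second spatial derivative by finite differences) can be sketched as below. This is an illustrative NumPy version, not the authors' Matlab/R routine. It assumes the standard sign convention CSD = -σ·∂²φ/∂z², a hypothetical unit conductivity σ (so values are in arbitrary units), and row-normalized Gaussian weights to handle the ends of the electrode array; the function names are hypothetical.

```python
import numpy as np

def gaussian_smooth_depth(lfp, spacing_um, sd_um):
    """Smooth LFP (n_channels, n_samples) along depth with a Gaussian kernel."""
    n_ch = lfp.shape[0]
    depth = np.arange(n_ch) * spacing_um
    w = np.exp(-0.5 * ((depth[:, None] - depth[None, :]) / sd_um) ** 2)
    w /= w.sum(axis=1, keepdims=True)  # each row sums to 1, including at the array edges
    return w @ lfp

def csd(lfp, spacing_um, sd_um, conductivity=1.0):
    """CSD estimate: -sigma * second spatial derivative via central finite differences.

    conductivity is an ASSUMED placeholder (arbitrary units).
    Returns (n_channels - 2, n_samples): edge channels have no central difference.
    """
    phi = gaussian_smooth_depth(np.asarray(lfp, dtype=float), spacing_um, sd_um)
    h = spacing_um * 1e-6                            # contact spacing in metres
    d2 = (phi[:-2] - 2 * phi[1:-1] + phi[2:]) / h**2
    return -conductivity * d2
```

With the settings given in the text, the macaque call would be `csd(lfp, 150.0, 250.0)` and the GP call `csd(lfp_alternate_contacts, 60.0, 125.0)`; under this sign convention current sinks appear as negative values.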

Representational similarity analysis (RSA) was used to establish homologies between the species and to assess similarities across scalp and cortical FFRs. RSA was performed on the accuracies of a machine learning model trained to decode the four Mandarin tones (pitch patterns) from the FFRs. A hidden Markov model (HMM) classifier was used as the machine learning model and was trained to decode the Mandarin tones from the FFR pitch tracks (Llanos et al., 2017). A detailed description of the HMM-based decoding approach can be found in a previously published methods paper (Llanos et al., 2017). The averaging size (number of trials averaged) of the HMM was adjusted to obtain equivalent classification accuracies across levels (scalp, cortex) and species. The averaging sizes used were 150 trials for the human scalp FFR, 24 and 2 trials for the sEEG FFR in Hum1 and Hum2, respectively, 150 trials for the macaque scalp FFR, 4 trials for the macaque PAC FFR, 6 trials for the GP scalp FFRs, and 16 trials for the GP PAC FFRs. The confusion matrices of decoding patterns (proportion correct) were extracted and used for further representational similarity analysis. Multidimensional scaling (MDS) analysis was performed on the confusion matrices (diagonals removed) to assess whether the decoding patterns across levels and species are similar.
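The MDS step can be sketched with classical (Torgerson) MDS in NumPy. The text does not say how confusion matrices were turned into distances, so the mapping below (symmetrize the confusions, use 1 minus the confusion proportion as dissimilarity, and zero the diagonal) is a hypothetical choice for illustration only, as are the function names.

```python
import numpy as np

def confusion_to_dissimilarity(conf):
    """HYPOTHETICAL mapping from a tone confusion matrix to a tone-by-tone dissimilarity.

    Tones confused more often (in either direction) are treated as closer.
    The principal diagonal (correct responses) is discarded, as in the text.
    """
    c = np.asarray(conf, dtype=float)
    s = 0.5 * (c + c.T)          # symmetrize confusions i->j and j->i
    d = 1.0 - s                  # proportions lie in [0, 1], so d does too
    np.fill_diagonal(d, 0.0)     # a tone has zero distance to itself
    return d

def classical_mds(D, n_dims=2):
    """Classical (Torgerson) MDS of a symmetric dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centred squared dissimilarities
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_dims]       # keep the largest ones
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))
```

For truly Euclidean distances, classical MDS reproduces the input distances exactly; for confusion-derived dissimilarities it gives a low-dimensional map in which frequently confused tones cluster together.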

Procrustes analysis was performed to rotate and scale all the MDS representations to a common space to facilitate comparison across species and levels. A similarity matrix was derived from the confusion matrices (diagonals removed) across levels and species using Pearson's product-moment correlations, and the significance of these correlations was also assessed.
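The Procrustes alignment and the correlation-based similarity matrix can be sketched as follows. This is an illustration, not the authors' code: it uses SciPy's `scipy.spatial.procrustes` (which centres, scales, and optimally rotates or reflects one configuration onto another) and NumPy's `corrcoef`. The permutation test is a hypothetical choice, since the text does not say how significance was assessed; the helper names are also hypothetical.

```python
import numpy as np
from scipy.spatial import procrustes

def offdiag(conf):
    """Flatten a square confusion matrix, dropping the principal diagonal."""
    conf = np.asarray(conf, dtype=float)
    return conf[~np.eye(conf.shape[0], dtype=bool)]

def rsa_similarity(confusions):
    """Pearson correlation between every pair of confusion matrices (off-diagonal cells)."""
    vecs = np.array([offdiag(c) for c in confusions])
    return np.corrcoef(vecs)

def perm_pvalue(conf_a, conf_b, n_perm=5000, seed=0):
    """HYPOTHETICAL one-sided permutation test for the correlation of two matrices."""
    rng = np.random.default_rng(seed)
    x, y = offdiag(conf_a), offdiag(conf_b)
    r_obs = np.corrcoef(x, y)[0, 1]
    null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)])
    return r_obs, (np.sum(null >= r_obs) + 1) / (n_perm + 1)

def align_mds(reference, other):
    """Procrustes-align one MDS configuration to a reference configuration.

    Returns the standardized reference, the aligned configuration, and the disparity
    (sum of squared differences after alignment; 0 means identical shapes).
    """
    return procrustes(reference, other)
```

Dropping the diagonal before correlating keeps overall decoding accuracy, which differs by species and level, from dominating the similarity; only the structure of the errors is compared.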

In the animal models, we had the opportunity to record FFRs from the scalp and the cortex in the same animal (Maq1, GP1, and GP2). We compared the FFR power spectra at the scalp and cortex to analyze similarities and differences in the spectral composition between the FFRs in latency. One scalp and one intracortical electrode with the best signal-to-noise ratio were chosen for this analysis. In Maq1, scalp and intracortical FFRs were obtained simultaneously. In GP1, scalp and intracortical FFRs were obtained in two separate sessions, while in GP2 both scalp and intracortical FFRs were obtained simultaneously. The FFRs from the cortex were derived from the electrode with the maximum signal-to-noise ratio in the laminar FFR-LFP recordings.

A cross-spectral power density estimate was also obtained to assess the similarity in power between the scalp and cortical FFRs irrespective of the latency difference. This estimate provides an objective metric of the similarity in the spectral properties of the FFRs at the cortex and the scalp, irrespective of differences in their temporal properties. The cross-spectral densities were obtained using Welch's periodogram method with 50% overlapping 1024-point Hamming windows. The absolute power of the cross-spectral densities was averaged across the FFRs to the four Mandarin tones and overlaid on the plot of the frequency-wise latency comparison to obtain a unified inference of the spectro-temporal similarities in the FFRs at the scalp and cortex.
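The Welch cross-spectral estimate described above can be sketched with SciPy's `scipy.signal.csd`, used here as a Python counterpart to Matlab's `cpsd`. The window settings follow the text (1024-point Hamming windows, 50% overlap); the function name and the pairing of one scalp and one cortical FFR per tone are assumptions for illustration.

```python
import numpy as np
from scipy.signal import csd

def mean_abs_cross_spectrum(scalp_ffrs, cortical_ffrs, fs):
    """Average |cross-spectral density| between paired scalp and cortical FFRs.

    Welch's method with 1024-point Hamming windows and 50% (512-sample) overlap.
    Each (scalp, cortical) pair is assumed to be the FFR to one Mandarin tone.
    Taking the magnitude discards the cross-spectral phase, i.e. the latency difference.
    """
    spectra = []
    for x, y in zip(scalp_ffrs, cortical_ffrs):
        f, pxy = csd(x, y, fs=fs, window="hamming", nperseg=1024, noverlap=512)
        spectra.append(np.abs(pxy))
    return f, np.mean(spectra, axis=0)
```

Because only the magnitude is kept, a cortical response that lags the scalp response by several milliseconds still produces a strong cross-spectral peak at the shared frequency, which is what makes this measure latency-independent.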

The above measures show differences and similarities in the scalp and cortically recorded FFRs. These differences and similarities are driven by multiple volume-conducted fields in the cortex and subcortex that interfere constructively and destructively, and the above measures may not be sensitive enough to differentiate these fields. Thus, we further used a blind source separation approach using independent component analysis that can separate spectro-temporally of the ICAs were assessed by cross-correlation with the stimulus waveform. These latencies were also used to infer the potential generators of the ICAs, with earlier latencies corresponding to more subcortical sources. The power coherence between each of these ICAs and the stimulus waveform was estimated using the cross-power spectral density (cpsd.m). This analysis helped infer the differential pattern of decline in power coherence across the cortical and subcortical ICAs. The power coherence metric is not a measure of phase-locking; it reflects only the power coherence between the stimulus and the ICAs.
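Latency estimation by cross-correlation with the stimulus waveform can be sketched as follows: the response is compared with the stimulus at each candidate delay, and the delay with the largest Pearson correlation is taken as the latency. This is a minimal NumPy sketch, not the authors' implementation; the 30 ms search window, the use of the raw (rather than band-limited) stimulus waveform, and the function name are assumptions.

```python
import numpy as np

def xcorr_latency(stimulus, response, fs, max_lag_s=0.03):
    """Return (latency in s, Pearson r) of the best match between response and delayed stimulus.

    Only non-negative lags (response following the stimulus) up to max_lag_s are searched;
    max_lag_s is an ASSUMED search window.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    response = np.asarray(response, dtype=float)
    n = min(stimulus.size, response.size)
    max_lag = min(int(round(max_lag_s * fs)), n - 2)
    best_r, best_lag = -np.inf, 0
    for lag in range(max_lag + 1):
        # response sample k + lag is compared with stimulus sample k
        r = np.corrcoef(stimulus[: n - lag], response[lag:n])[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return best_lag / fs, best_r
```

For strongly periodic stimuli the correlation repeats roughly every F0 period, so the search window (and the choice between the stimulus waveform and its envelope) matters in practice; the time-varying F0 of the Mandarin tones reduces, but does not remove, this ambiguity.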

Code availability: The analysis code will be provided to readers on request.

FFRs in the human auditory cortex to vowels with time-varying pitch contours

FFRs were recorded from stereotactically implanted electrodes (Fig 1A and 1E) in two participants while they listened to the pitch-varying Mandarin vowels. In both participants, the electrode locations were based on clinical necessity. In Hum1, the electrodes were implanted in both hemispheres, spanning the superior temporal plane, superior temporal gyri/sulci, middle temporal gyri/sulci, and insula. In Hum2, the electrodes were implanted only in the right hemisphere, spanning the frontal, parietal, and temporal lobes, the superior temporal plane, and the insula.

We analyzed time- and phase-locked neural activity to the periodicities in the stimulus.

Robust FFRs (Fig 1C & 1F) with amplitudes above the pre-stimulus baseline (p<0.05 on permutation-based t-tests between pre-stimulus baseline and FFRs on bootstrapped FFR trials) were observed in the electrode contacts in Heschl's gyrus (HG) and the planum temporale in both subjects (6/129 electrode contacts in Hum1, and 10/226 electrode contacts in Hum2) (Fig 1A & 1E).

Electrode contacts farther from HG did not show FFR-like responses significantly above the pre-stimulus baseline level (p<0.05 on permutation-based t-tests between pre-stimulus baseline and FFRs on bootstrapped FFR trials). Thus, further FFR analyses were restricted to the electrodes along HG.

We used four pitch variants of the vowel /yi/ (Fig 1B), referred to as Mandarin tones, to elicit the FFRs (Fig 1C). These Mandarin tones have been extensively used to record FFRs to study the neurophysiology of pitch processing and associated plasticity in humans (Krishnan et tones showed robust onset responses followed by FFRs that lasted throughout the stimulus duration. From here on, we refer to the FFRs recorded from electrode contacts in close proximity to or directly within the auditory cortex as 'cortical FFRs (cFFRs)'. As is the case for scalp-recorded FFRs, the cFFRs closely followed the fundamental frequency (F0) of the Mandarin tones (Fig 1B).

All four Mandarin tones elicited robust cFFRs (Fig 1C & 1F) in the electrode trajectories inserted along HG, the planum temporale (PT), and the superior temporal gyrus (STG). The cFFRs with the highest amplitudes and signal-to-noise ratios were found in the electrode contacts closest to HG (Fig 1A & 1E) (p<0.05, permutation-based ANOVA followed by post-hoc paired t-tests on bootstrapped trials).

The magnitudes of the cFFRs were highest for tones with lower F0, i.e., most robust for T3 (89-111 Hz) and T2 (109-133 Hz), followed by T1 (~129 Hz) and T4 (140 to 92 Hz) (Fig 1D). This pattern is clearly visible in the cFFRs to T2 and T4, where strong inter-trial phase coherence (ITPC), or phase-locking, is observed only when the F0 of the vowel is low, and phase-locking declines as the F0 rises (Fig 1D & 1G).

Hemispheric asymmetry was analyzed in Hum1, who had bilateral temporal lobe coverage.

The high-quality, high signal-to-noise ratio FFR data allowed us to statistically assess hemispheric asymmetry within the subject. cFFRs to the Mandarin tones showed a distinct hemispheric asymmetry, consistent with a prior MEG study (Coffey et al., 2016). The electrodes in the right hemisphere showed higher-amplitude cFFRs to the Mandarin tones (p<0.01, permutation-based t-tests on signal-to-noise ratios of bootstrapped cFFR samples). The rightward asymmetry was also seen in the ITPC spectrograms and in pitch-tracking accuracy to the Mandarin tones (Fig 1D & 2B), which together indicate better phase-locking to the stimulus F0 in the right hemisphere. Better phase-locking in the right hemisphere was also seen for an additional set of non-speech stimuli, i.e., click trains with repetition rates in the human pitch range (Supplementary Fig 1). We used a hidden Markov model (HMM) to decode the Mandarin tones from the cFFRs. The cFFRs from the right hemisphere tracked the stimulus pitch better than those from the left hemisphere (Fig 2B). Consequently, decoding accuracies were higher in the right hemisphere than in the left hemisphere (Fig 2C). The pattern of tone decoding errors ('confusions') correlated significantly (p<0.05) between the cFFRs from the right hemisphere and the scalp FFRs from a set of 20 subjects, but not between the cFFRs from the left hemisphere and the scalp FFRs (Fig 2D). Despite this difference, multidimensional scaling analysis revealed similar clustering of tone FFRs across the scalp, right HG, and left HG (Fig 2E).

As in the humans, we recorded EEG in both animal model systems to establish scalp-derived FFRs as a translational bridge between the three species. Recordings in the animal models used the same Mandarin tone stimuli previously used for the human subjects. The scalp-recorded FFRs in both model species showed FFR activity (Fig 1I, 1L, & 1N) above the pre-stimulus baseline (SNR: Maq = 3.2, GP1 = 7, GP2 = 3.4) and showed the expected phase-locking to the F0 of the stimuli (Fig 1J, 1M, & 1N). In both species, the FFRs correlated with the stimulus (maximum cross-correlation coefficient r: Maq = 0.45, GP = 0.48, GP1 = 0.5, GP2 = 0.5) at latencies expected of early brainstem responses (Maq = 3 ms, GP = 3.5 ms). Furthermore, we recorded local field potentials (LFPs) from electrodes in PAC to compare against the intracranial LFP recordings in the human epilepsy patients. Similar to the scalp-recorded FFRs, the intracranial LFPs in both model species also yielded strong cFFRs above the pre-stimulus baseline (SNR: Maq = 18.3, GP1 = 3.7381, GP2 = 7.7) and readily showed the expected phase-locking to the F0 of the stimuli (Fig 1J, 1M, & 1N). The latencies of the cFFRs (Maq = 11.6 ms, GP1 = 9.7 ms, GP2 = 10.1 ms), however, were longer than those of the scalp-recorded FFRs in both species (ps<0.001 in both GPs and the macaque; sign-rank comparison of stimulus-to-response cross-correlation latencies on bootstrapped samples).

We used RSA to quantify similarities between humans and animal models across different recording levels (intracortical vs. scalp) (Fig 3). RSA was performed on confusion matrices constructed from FFRs recorded using harmonized stimuli (the four Mandarin tones) across species and levels. Human scalp data were derived from FFRs recorded in 20 participants in a previously published study (Reetzke et al., 2018). In the macaque and GP subjects, scalp FFRs were recorded from cranial EEG electrodes surgically implanted in the skull. In the macaque, intracranial data were recorded from an electrode positioned immediately above layer 1 of the primary auditory cortex (PAC) through a chamber implanted over the frontal cortex. In the GP, intracranial data were recorded from an electrode contact estimated to be positioned in putative layer 4 of PAC. Visual inspection shows that the pattern of phase-locking of the FFR and cFFR in the animal models was similar to that seen in the humans, with phase-locking declining rapidly with increasing stimulus F0 (Fig 1D, 1G, 1J, 1M, & 1N).

We decoded the Mandarin tone categories from scalp and intracranially recorded FFRs for all species using an HMM classifier (Llanos et al., 2017; Reetzke et al., 2018). The HMM classifier performed above chance level (>0.25) across species and levels in identifying the correct pitch patterns from the FFRs (principal diagonal, Fig 3A). Human and animal FFR confusion matrices were strikingly similar at both the scalp (Fig 3B) and intracortical levels (Fig 3C) (p<0.05 on Pearson's correlation of confusion matrices without the principal diagonal), with stronger similarity for the intracortical cFFRs (Fig 3C). However, the scalp FFRs also yielded subtle species-specific differences (with greater similarity between humans and GPs than between humans and the macaque model).

Although intracortical recordings from human subjects provide high spatial and temporal resolution, they are still prone to contamination by volume-conducted fields from the brainstem and subcortical nuclei, and they do not provide cortical layer-specific information. To overcome this limitation, we turned to laminar recordings from multi-contact electrodes traversing all layers of PAC approximately perpendicular to the cortical sheet (Fig 4) in the two animal models. These recordings allowed us to compute current source densities (CSD), which reflect post-synaptic currents and the corresponding passive return currents in the local cortical populations. Current sinks and sources are independent of volume-conducted potentials from the brainstem and the midbrain. The CSDs can therefore be used to determine whether the post-synaptic currents in cortical populations are phase-locked to the stimulus and, if so, at which cortical depth and latency they arise. In addition, these laminar recordings allowed us to assess the prevalence, latency, and cortical depth of multi-units that phase-lock to the F0 of the stimulus.

Fig 4D, 4H, and 4L show CSDs of the low-pass filtered local field potentials (1-70 Hz) aligned to the stimulus onset. This analysis identified the expected patterns of sources and sinks in both animals, which were used to identify the putative location of the thalamorecipient cortical layers (layer IV and deep layer III), as well as the supra- and infra-granular layers. We then computed the CSDs using the same filter settings used for the FFRs. Fig 4B & 4F show a 30 ms snippet of the CSD FFRs from the sustained portion of tone 3 for both species (Maq2, GP1, and GP2). Note the presence of several currents that were entrained to the F0 of the stimulus (stimulus-to-CSD correlation >0.5, Fig 4C, 4G, & 4K). We will refer to these currents as cortical frequency-following currents (cFFCs).
The most prominent cFFC was located in putative thalamo-recipient layers, and a second, somewhat weaker cFFC with opposite polarity was identified in infra-granular layers. There was also an indication of a third, even weaker, opposite-polarity cFFC in supra-granular layers.

The cFFCs in both species correlated strongly with the stimulus at latencies of 12-25 ms in the macaque and ~10-25 ms in the GPs (Fig 4C, 4G, 4K). These latencies are consistent with a cortical origin. Only one infragranular cFFC in the macaque had a latency (6.3 ms) that seemed inconsistent with a cortical origin. This particular cFFC likely does not exclusively reflect post-synaptic activity, but rather a very large-amplitude spike that was isolated at this electrode contact and bled into the frequency range of the FFR. Given the short latency, the spike in question likely corresponded to a passing thalamocortical fiber rather than an infragranular cortical neuron.

In both species, the strongest and most prominent cFFCs were recorded in granular layers and most likely reflect active postsynaptic currents in response to F0-locked thalamic input at basal dendrites (Fig 4B, 4F, 4J). It is less clear whether the cFFCs in infra- and supra-granular layers reflect active postsynaptic currents, which might indicate propagation of phase-locked responses to these layers, or exclusively passive return currents. For the phase-locked activity to spread beyond the thalamo-recipient layers, not only the post-synaptic input currents but also the output, i.e., the firing rates, would have to be entrained to F0. We suggest that the thalamo-recipient layers not only receive phase-locked input but also fire in a phase-locked manner to the stimulus F0 in both animal models. These results indicate that the thalamo-recipient layers not only receive strong phase-locked input from the thalamocortical fibers but may also propagate the FFRs to downstream cortical layers, albeit with reduced phase-locking strength.

Because both intracranial and scalp FFR recordings were obtained in the same macaque and GPs, we used the opportunity to compare the power and latency across frequencies of the scalp and cortical FFRs. The cortical FFRs were higher in amplitude than the scalp FFRs, presumably due to the proximity of the electrodes to the cortical sources. The spectral characteristics of the scalp and cortical FFRs were therefore compared after normalizing the spectral estimates. Compared to the scalp FFRs, the cFFRs from PAC were predominantly composed of low-frequency F0 energy relative to the higher harmonics (Figs 1 and 5B). Fig 5A shows the FFRs to tone 3 (low-frequency dipping contour) recorded from the scalp and cortex in the macaque and the GPs. The cFFRs in both species showed longer latencies than the scalp-recorded FFRs (ps<0.01, permutation-based Wilcoxon signed-rank tests on bootstrapped FFR trials) (Fig 5A). While the phase-locking of cFFRs to Mandarin tones was higher in PAC than at the scalp (Fig 1), the decline in phase-locking with increasing frequency was similar at both PAC and the scalp. This can also be seen in the difference in normalized power spectral density between the scalp and cortical FFRs at high frequencies when normalized to the maximum spectral amplitude (Fig 5B).

Cross-spectral power analysis revealed that scalp and cortical FFRs shared strong power coherence with the stimulus near the F0 (70-110 Hz), which declined more rapidly at higher frequencies in the cortical FFRs than in the scalp FFRs (Fig 5C). This trend was similar in all animals (Maq1, GP1, and GP2). This pattern indicated that the FFRs recorded at the scalp and the cortex had similar power spectral densities in the low-frequency region. Such a pattern can be caused either by a single common source or by more than one source with similar spectral properties but different temporal properties. We thus estimated the cross-correlation strength and latency between the scalp and cortical FFRs across frequencies in the time-frequency domain.

The maximum correlation between the scalp and cortical FFRs was seen at frequencies <120 Hz. However, at these frequencies, the cortical FFRs showed delays of ~7 ms in the macaque and ~10 ms in the GP (Fig 5C) relative to the scalp FFRs. This indicates the presence of temporally disparate but spectrally overlapping neural sources of the scalp-recorded FFRs. These delays are also evident in the latencies of the cFFRs, which are longer than those of conventionally obtained scalp-recorded FFRs (Fig 5C).

We further applied a blind source separation approach to disentangle the spectro-temporally overlapping components that contribute to the scalp-recorded FFR. We used independent component analysis (ICA) as the source separation approach (Maq: Fig 6; GP: Fig 7). In the macaque, we submitted all the scalp electrodes (33 electrodes) and intracortical electrodes (96 electrodes spanning the superior temporal plane, prefrontal, and premotor cortex) to ICA decomposition. We extracted 12 ICAs that explained 96% of the variance. Among these, we focused on four ICAs, each of which explained >10% of the variance individually and which as a group explained 75% of the variance (Fig 6). ICA2 was consistent with a volume-conducted generator in regions distant from all intracranial electrodes (Fig 6A), as is apparent from its widely distributed weights across electrodes. Furthermore, ICA2 had a latency of 3.4 ms and showed prominent power coherence with the stimulus at the F0 as well as at the higher harmonics (Fig 6C). In contrast, ICA1 had a longer latency (18.1 ms), responded most strongly to the F0, and exhibited a steeper gradient between electrodes in the superior temporal plane and motor/premotor cortex. Its power coherence with the stimulus also declined with frequency faster than that of ICA2. These three findings are consistent with a generator in the primary auditory cortex.
This putative cortical ICA1 also propagated to the scalp and contributed to the scalp-recorded FFRs. Similarly, the topography and latency of ICA4 (lat = 0.5 ms) suggested a cortical origin, and ICA4 also propagated to the scalp electrodes. Unlike ICA1, ICA3 and ICA4 show spatial weights of opposite polarity. They may therefore have emerged from different cortical sources with different orientations.
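The ">10% of the variance" criterion can be illustrated with a percent-variance-accounted-for (pvaf) calculation. The sketch below assumes that the mixing matrix (spatial weights) and component time courses are already available from an ICA decomposition. The function name and toy data are hypothetical, and this section does not state the exact variance definition the authors used. Here, the pvaf of a component is the drop in channel-averaged variance when that component's back-projection is removed from the data. ICA components are not orthogonal in general, so pvaf values need not sum to 100%.

```python
import numpy as np

def percent_variance_accounted_for(mixing, activations):
    """Percent of data variance accounted for by each component.

    mixing: (n_channels, n_components); columns are component topographies.
    activations: (n_components, n_samples); component time courses.
    pvaf_k = 100 * (1 - var(data - backprojection_k) / var(data)), where the
    variance is taken over time and then averaged over channels.
    """
    mixing = np.asarray(mixing, dtype=float)
    activations = np.asarray(activations, dtype=float)
    data = mixing @ activations
    total = np.var(data, axis=1).mean()
    pvaf = np.empty(mixing.shape[1])
    for k in range(mixing.shape[1]):
        back_projection = np.outer(mixing[:, k], activations[k])
        residual = data - back_projection
        pvaf[k] = 100.0 * (1.0 - np.var(residual, axis=1).mean() / total)
    return pvaf

# Hypothetical toy example: 4 channels, 3 components on separate channels.
t = np.arange(1000) / 1000
activations = np.vstack([
    3.0 * np.sin(2 * np.pi * 5 * t),
    2.0 * np.cos(2 * np.pi * 11 * t),
    0.5 * np.sin(2 * np.pi * 23 * t),
])
mixing = np.eye(4)[:, :3]
pvaf = percent_variance_accounted_for(mixing, activations)
retained = np.flatnonzero(pvaf > 10.0)  # components passing a 10% criterion
print(np.round(pvaf, 2), retained)
```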

In the GP, we submitted to ICA the averaged FFRs for each of the Mandarin tones, recorded at the 24 electrodes along the layers of the PAC and at 2 scalp electrodes. The scalp electrodes were placed at the vertex (Cz: midpoint of the head along both the sagittal and coronal axes) and on the temporal surface of the scalp close to the auditory cortex (T4). Six ICAs were extracted, which together explained ~99% of the variance in the FFRs. Among these, the first four components each explained >10% of the variance in the FFR, for a total of 93.5% (GP1) and 99% (GP2) (Fig 6). On visual inspection, the weights of ICA2 in GP1 were not modulated appreciably along the laminar electrode layout. The same was true of ICA3 and ICA4 in GP2. This is consistent with volume-conducted activity from distant brain regions, in this case most likely the brainstem (Fig 7A & C). Note that the spatial loadings of the volume-conducted ICs in GP2 largely follow the same trend as in GP1 but are not identical. This difference may partly reflect the large cortical onset and offset responses in GP2 that propagate to the scalp, which are less apparent in GP1.

In GP1, the putative subcortical ICA2 had a latency of 4.1 ms and showed strong power coherence with the stimulus F0 as well as the harmonics (Fig 7F). This putative subcortical ICA2 accounted for almost the entire variance at the scalp electrode placed at Cz, suggesting a negligible contribution from other sources such as the cortex. ICA1 (lat = 9.7 ms) and ICA3 (lat = 12.3 ms) showed maximum spatial weights around putative cortical layers 4 and 2/3. They contributed maximally to the variance at electrode T4, which was placed very close to the surface of the primary auditory cortex, but did not contribute to the variance at the scalp electrode Cz. While ICA4 (lat = 15.1 ms) was also consistent with a cortical source, it contributed very little variance at the scalp electrodes. Taken together, these results show that in the GP, scalp electrodes placed at the midline are dominated by subcortical sources, while those placed at temporal scalp locations are dominated by cortical sources.

Similar results were obtained in GP2, where the putative subcortical ICA3 and ICA4 had latencies of 2.7 and 2.4 ms, respectively, and showed strong power coherence with the stimulus F0 as well as the harmonics (Fig 7D).
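The contrast between the midline (Cz) and temporal (T4) scalp electrodes amounts to asking how much of the variance at a single channel each component explains. The hypothetical sketch below back-projects each component to one channel and reports its percent variance accounted for there. The three-channel layout, mixing weights, and sinusoidal time courses are invented for illustration. They model a spatially uniform "subcortical" component and a spatially graded "cortical" one, not the recorded data.

```python
import numpy as np

def channel_pvaf(mixing, activations, channel):
    """Percent of the variance at one channel accounted for by each component."""
    mixing = np.asarray(mixing, dtype=float)
    activations = np.asarray(activations, dtype=float)
    signal = mixing[channel] @ activations      # data reconstructed at that channel
    total = np.var(signal)
    out = np.empty(mixing.shape[1])
    for k in range(mixing.shape[1]):
        residual = signal - mixing[channel, k] * activations[k]
        out[k] = 100.0 * (1.0 - np.var(residual) / total)
    return out

# Hypothetical layout: rows are [Cz, T4, laminar contact]; columns are
# [uniform "subcortical" component, spatially graded "cortical" component].
mixing = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [1.0, 3.0]])
t = np.arange(1000) / 1000
activations = np.vstack([np.sin(2 * np.pi * 7 * t), np.sin(2 * np.pi * 13 * t)])
for name, ch in [("Cz", 0), ("T4", 1), ("laminar", 2)]:
    print(name, np.round(channel_pvaf(mixing, activations, ch), 1))
```

In this toy layout, the uniform component explains all of the variance at "Cz", the two components share "T4" equally, and the graded component dominates the laminar contact.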

Frameless robot-assisted stereoelectroencephalography in children: technical aspects and comparison with Talairach frame technique

Biomap: A neurodiagnostic tool for auditory processing disorders

Aging Affects Neural Precision of Speech Encoding

Monkeys share the neurophysiological basis for encoding sound periodicities captured by the frequency-following response with humans

Descending projections from the auditory cortex to the inferior colliculus in the gerbil, Meriones unguiculatus

The descending corticocollicular pathway mediates learning-induced auditory plasticity

Neural Correlates of Vocal Production and Motor Control in Human Heschl's Gyrus

Multichannel recordings of the human brainstem frequency-following response: Scalp topography, source generators, and distinctions from the transient ABR

Subcortical sources dominate the neuroelectric auditory frequency-following response to speech

Age-related changes in the subcortical-cortical encoding and categorical perception of speech

Commentary: Understanding Stereoelectroencephalography: What's Next

Context-Dependent Encoding in the Human Auditory Brainstem Relates to Hearing Speech in Noise: Implications for Developmental Dyslexia

The scalp-recorded brainstem response to speech: Neural origins and plasticity

Recording Frequency-following Responses to Voice Pitch in Guinea Pigs: Preliminary Results

Oscillatory entrainment of the Frequency Following Response in auditory cortical and subcortical structures

Early Sound Encoding and their Relationship to Speech-in-Noise Perception

Cortical contributions to the auditory frequency-following response revealed by MEG

Cortical Correlates of the Auditory Frequency-Following and Onset Responses: EEG and fMRI Evidence

Evolving perspectives on the sources of the frequency-following response

EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis

Two crossed axonal projections contribute to binaural unmasking of frequency-following responses in rat inferior colliculus

Contributions of Robotics to the Safety and Efficacy of

Brainstem frequency-following response recorded from one vertical and three horizontal electrode derivations

Putative measure of peripheral and brainstem frequency-following in humans

Origins of the Scalp-Recorded Frequency-Following Response in the Cat

Human frequency-following responses to monaural and binaural stimuli

Acoustic basis of context dependent brainstem encoding of speech

Physiological bases of the encoding of speech evoked frequency following responses

Neural generators of the frequency-following response elicited to stimuli of low and high frequency: A magnetoencephalographic (MEG) study

Noninvasive localization of brain-stem lesions in the cat with multimodality evoked potentials: Correlation with human head-injury data

Processing of Communication Calls in Guinea Pig Auditory Cortex

Speech frequency-following response in human auditory cortex is more than a simple tracking

Auditory cortical generators of the Frequency Following Response are modulated by intermodal attention

Electrically-Evoked Frequency-Following Response (EFFR) in the Auditory Brainstem of Guinea Pigs

Hearing Ranges of Laboratory Animals

Subdivisions of auditory cortex and processing streams in primates

Differential Group Delay of the Frequency Following Response Measured Vertically and Horizontally

Auditory biological marker of concussion in children

Neural representation of pitch salience in the human brainstem revealed by psychophysical and electrophysiological indices

The effects of tone language experience on pitch processing in the brainstem

Experience-dependent plasticity in pitch encoding: from brainstem to auditory cortex

Analyzing the FFR: A tutorial for decoding the richness of auditory function

Context-dependent plasticity in the subcortical encoding of linguistic pitch patterns

Interactive effects of linguistic abstraction and stimulus statistics in the online modulation of neural speech encoding

A surface metric and software toolbox for EEG electrode grids in the macaque

Evoked potentials recorded from the auditory cortex in man: evaluation and topography of the middle latency components

Hidden Markov modeling of frequency-following responses to Mandarin lexical tones

Mining event-related brain dynamics

Differential brainstem pathways for the conduction of auditory frequency-following responses

Far-field recorded frequency-following responses: correlates of low pitch auditory perception in humans

Functional Interplay Between the Putative Measures of Rostral and Caudal Efferent Regulation of Speech Perception in Noise

Use of the guinea pig in studies on the development and prevention of acquired sensorineural hearing loss, with an emphasis on noise

Context-dependent encoding in the auditory brainstem subserves enhanced speech-in-noise perception in musicians

Temporal Coding of Voice Pitch Contours in Mandarin Tones

Functional Imaging Reveals Numerous Fields in the Monkey Auditory Cortex

Pitch coding and pitch processing in the human brain

Human frequency-following responses: representation of second formant transitions in normal-hearing and hearing-impaired listeners

Evidence of degraded representation of speech in noise, in the aging midbrain and cortex

Mechanisms and streams for processing of "what" and "where" in auditory cortex

Tracing the Trajectory of Sensory Plasticity across Different Stages of Speech Learning in Adulthood

Speech sound discrimination by monkeys and humans

Differential sensitivity to vowel continua in Old World monkeys (Macaca) and humans

Auditory brain stem response to complex sounds: a tutorial

Hearing It Again and Again: On-Line Subcortical Plasticity in Humans

Novelty Detection in the Human Auditory Brainstem

Far-field recorded frequency-following responses: evidence for the locus of brainstem sources

Human auditory frequency-following responses to a missing fundamental

Representation of the voice onset time (VOT) speech parameter in population responses within primary auditory cortex of the awake monkey

Temporal Encoding of the Voice Onset Time Phonetic Parameter by Field Potentials Recorded Directly From Human Auditory Cortex

Brainstorm: A User-Friendly Application for MEG/EEG Analysis

Tonal frequency affects amplitude but not topography of rhesus monkey cranial EEG components

Linear superposition of responses evoked by individual glottal pulses explain over 80% of the frequency following response to human speech in the macaque monkey

Contextual processing in unpredictable auditory environments: the limited resource model of auditory refractoriness in the rhesus

Brainstem Evoked Potential Indices of Subcortical Auditory Processing Following Mild Traumatic Brain Injury

Identification and localisation of auditory areas in guinea pig cortex

Differences between auditory frequency-following responses and onset responses: Intracranial evidence from rat inferior colliculus

Auditory Processing in Noise: A Preschool Biomarker for Literacy

Frequency-following (microphonic-like) neural responses evoked by sound

Generators of the Frequency-following Response in the Guinea Pig

Spectral and Temporal Processing in Human Auditory Cortex. Cereb

Previous studies applied distributed source modeling to EEG or MEG data to study the cortical sources of FFRs (Coffey et al., 2016; Bidelman, 2018; Hartmann and Weisz, 2019). Here we circumvented the challenges of inverse source localization by using direct intracranial recordings in two human participants. These recordings confirmed that robust cortical FFRs to pitch patterns could be evoked in Heschl's gyri. These cortical FFRs phase-locked only to the stimulus fundamental frequency, whereas subcortical FFRs can track speech harmonics as high as 950 Hz (Galbraith et al., 2000; Plyler and Ananthanarayan, 2001). Further, the latencies of cortical FFRs (17-23 ms) were significantly longer than those expected of subcortical FFRs (Du et al., 2009; Wang and Li, 2018). Compared with earlier studies, we examined cortical FFRs to higher F0s, which intracranial recordings made possible, and showed cortical FFRs to F0s as high as 150 Hz.

Bilaterally, electrodes in the HG showed substantially stronger FFRs than those in the PT, and no other cortical regions near the HG showed FFRs. The PT did not phase-lock to the periodicity of the stimulus. This might indicate a transformation of the temporal pitch code into a place or rate-place code in the auditory association cortex. The macaque data (Maq1) showed the same pattern, with strong FFRs only at electrodes closest to the primary auditory cortex. The weaker FFRs on electrodes in motor, premotor, and prefrontal cortex were likely volume-conducted fields rather than activity originating in the motor regions.

Consistent with a previously reported rightward bias in cortical FFR activity (Coffey et al., 2016, 2017a, 2017b; Gorina-Careta et al., 2021), we found a distinct rightward asymmetry of FFR magnitudes. Phase-locking also declined more steeply with increasing F0 in the left HG than in the right HG.
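Phase-locking to a given frequency is often quantified as the inter-trial phase-locking value (PLV). The PLV is the length of the mean unit phase vector across trials at that frequency. It is 1 when the phase is identical on every trial and roughly 1/sqrt(n_trials) when the phase is random. This section does not state which phase-locking metric was used, so the Python sketch below is an assumed, generic version. It runs on synthetic trials in which a 100 Hz component keeps a fixed phase and nothing is time-locked at 300 Hz. The function name is hypothetical.

```python
import numpy as np

def phase_locking_value(trials, fs, freq):
    """Inter-trial phase-locking value at the FFT bin nearest `freq`.

    trials: (n_trials, n_samples). Each trial is Hann-windowed and Fourier
    transformed, and the unit phase vectors at that bin are averaged.
    """
    trials = np.asarray(trials, dtype=float)
    n_samples = trials.shape[1]
    spectra = np.fft.rfft(trials * np.hanning(n_samples), axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    bin_values = spectra[:, np.argmin(np.abs(freqs - freq))]
    return np.abs(np.mean(bin_values / np.abs(bin_values)))

# Synthetic trials (not real data): a phase-locked 100 Hz component in noise.
fs = 10_000
t = np.arange(1000) / fs
rng = np.random.default_rng(1)
trials = 0.5 * np.sin(2 * np.pi * 100 * t) + rng.standard_normal((200, t.size))
plv_f0 = phase_locking_value(trials, fs, 100)
plv_300 = phase_locking_value(trials, fs, 300)
print(f"PLV at 100 Hz: {plv_f0:.2f}, at 300 Hz: {plv_300:.2f}")
```

The asymmetry described above could be examined by computing this value per hemisphere at each stimulus F0 and comparing how quickly it falls as F0 increases.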
The right-hemisphere asymmetry observed in our study may underlie processing differences for melodic and prosodic features in (non-native) speech and music ( due to the coarse spatial resolution offered by human sEEG it cannot confirm the presence of cortical frequency-following currents as opposed to thalamocortical input currents, which is essential to firmly establish the cortex as a putative generator of scalp-recorded FFRs. We first established similarities in FFR representation between the human and animal models and then leveraged high-density laminar recordings in the animal models. This allowed us to explore the laminar sources of the FFRs and to break down the cortical contributions to the scalp FFRs in fine spatial and temporal detail.

We used representational similarity analysis (RSA) as a translational bridge across levels and species, which allowed us to examine FFR sources in animal models at fine anatomical resolution. Critically, despite differences in recording procedures, anatomy, and arousal states, we found strong similarities in representational structure between the cortical and scalp FFRs in both human and animal models. The similarity across species suggests that the F0 feature is represented similarly in the three species. Moreover, across species, the falling tone (T4) was represented less robustly (more confusion) than the rising tone (T2; less confusion). This suggests a cross-species preference in processing stimuli with rising, relative to falling, pitch (Peng et al., 2018). Given these similarities, macaques and GPs may be well suited to help answer important questions about cortical FFRs. The extent of representational similarity across species was lower for scalp-recorded FFRs than for intracortically recorded FFRs, which likely reflects variability in the dipole orientations of cortical FFR sources.
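A minimal RSA can be sketched in two steps. First, build a representational dissimilarity matrix (RDM) over the stimulus conditions (here, the four tones) for each species or recording level. Then correlate the RDMs. The sketch below assumes correlation distance between response waveforms and a Spearman comparison of the RDM upper triangles. These are common choices, but not necessarily the authors': the discussion refers to tone confusions, which suggests their RDMs may instead have come from classifier confusion matrices. The data are random placeholders, and the function names are hypothetical.

```python
import numpy as np

def rdm(responses):
    """RDM: 1 - Pearson r between the response waveforms (rows) of each condition."""
    return 1.0 - np.corrcoef(np.asarray(responses, dtype=float))

def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs.

    Ranks are obtained with a double argsort, which assumes no tied values.
    """
    iu = np.triu_indices(rdm_a.shape[0], k=1)
    ranks_a = np.argsort(np.argsort(rdm_a[iu]))
    ranks_b = np.argsort(np.argsort(rdm_b[iu]))
    return np.corrcoef(ranks_a, ranks_b)[0, 1]

# Placeholder data: 4 tone conditions x 500 samples per "species".
rng = np.random.default_rng(2)
human = rng.standard_normal((4, 500))
animal = 2.0 * human + 1.0            # affine copy, so the RDM is unchanged
other = rng.standard_normal((4, 500))  # unrelated responses
print(rdm_similarity(rdm(human), rdm(animal)))
print(rdm_similarity(rdm(human), rdm(other)))
```

Because RDMs abstract away from amplitude and offset, recordings that differ in scale (for example, scalp versus intracortical) can still be compared. With only four conditions, the upper triangle has six entries, so any such correlation is coarse.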
These results complement and extend an earlier study (Ayala et al., 2017), which compared human and monkey scalp FFRs on morphological characteristics. That study used a single 40-ms /da/ syllable with a relatively steady F0. Going beyond a morphological comparison, we used a range of complex speech sounds with time-varying pitch and assessed cross-species similarity using RSA. We also establish homologies across three species along the evolutionary hierarchy, each of which can be leveraged to study FFRs with advanced species-specific approaches. For example, optogenetic approaches can be used efficiently in guinea pigs to study the effects of corticocollicular projections on FFRs. Macaques can be trained efficiently to categorize novel stimuli, revealing the effects of learning on FFRs. We have set a crucial template for future studies to examine the FFRs across synaptic currents that effectively constitute the cortical FFRs (Fig 8). Nevertheless, we emphasize that the finding of a cortical source of the FFR is not at odds with the well-established existence of subcortical sources in scalp-recorded FFRs (Smith et al., 1978). Rather, it shows that cortical sources also contribute to scalp FFRs.

The use of animal models enabled us to compare scalp and cortical FFRs recorded in the same subjects. The power coherence between the scalp and cortical FFRs further shows that cortical sources do not strongly contribute to scalp FFRs at higher harmonics. At the F0, however, there was a strong correlation between the scalp and cortical FFRs. Yet the latencies of these correlations indicated that the cortical FFRs lagged the scalp FFRs.

We used independent component analysis (ICA) to disentangle the spatiotemporally overlapping cortical and subcortical FFR components and to estimate their contributions to the scalp-recorded FFRs. We found short-latency, volume-conducted ICs that presumably emerged from subcortical regions and propagated uniformly to all scalp and cortical electrodes. We also found strong ICs that were localized to the thalamorecipient layers and projected to the scalp. These ICs likely reflect the bulk cortical signal that propagates to the scalp. However, not all putative cortical ICs contributed to the scalp-recorded FFRs, because their dipole orientations did not favor volume conduction to the scalp. Their contributions also differed by electrode location. There was a very specific pattern of differential cortical contribution based on FFRs. At the same time, it also sheds light on the species-specific differences and similarities that contribute to a better understanding of FFR properties across animal models.
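The latency of a correlation between scalp and cortical FFRs can be estimated from the lag of the cross-correlation peak. The Python sketch below is a broadband version run on synthetic white-noise signals. The authors computed this across frequencies in the time-frequency domain, which could be approximated by applying the same function to band-pass-filtered signals. The function name and the `max_lag_ms` search window are assumptions. Restricting the window matters for near-periodic FFRs, where cross-correlation peaks recur every F0 period.

```python
import numpy as np

def xcorr_lag_ms(x, y, fs, max_lag_ms=None):
    """Lag (ms) and normalized height of the cross-correlation peak.

    A positive lag means y is delayed relative to x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean()
    y = y - y.mean()
    c = np.correlate(y, x, mode="full")          # c[k] = sum_n y[n + k] * x[n]
    lags = np.arange(-(len(x) - 1), len(y))      # lag in samples for each entry of c
    c = c / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    if max_lag_ms is not None:                   # restrict the search window
        keep = np.abs(lags) <= max_lag_ms * fs / 1000.0
        c, lags = c[keep], lags[keep]
    i = np.argmax(c)
    return 1000.0 * lags[i] / fs, c[i]

# Synthetic demo (not real data): "cortical" = "scalp" delayed by 7 ms plus noise.
fs = 10_000
rng = np.random.default_rng(3)
scalp = rng.standard_normal(2000)
delay = 70                                       # samples, i.e. 7 ms at 10 kHz
cortical = np.concatenate([np.zeros(delay), scalp[:-delay]]) + 0.1 * rng.standard_normal(2000)
lag_ms, peak_r = xcorr_lag_ms(scalp, cortical, fs, max_lag_ms=20)
print(f"lag = {lag_ms:.1f} ms, peak r = {peak_r:.2f}")
```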

The small sample size can be considered a limitation of the study, as it may limit the generalizability of our findings. However, recordings of FFRs from bilateral Heschl's gyri in the same human participant are very rare, and we statistically compared the two hemispheres at the single-subject level. Similarly, we show high-quality, replicable recordings in the two macaques and the two guinea pigs. Further, the localization of speech FFRs to the HG was similar in both human participants, and the results of the two GPs were also largely similar. We report results from individual humans, macaques, and guinea pigs for transparency and to avoid over-generalizing from the small samples. We also offer several complementary analyses within and across species, which show converging evidence for