Cory Gerritsen
York University
Elzbieta Slawinski
University of Calgary
David Eagle
University of Calgary
Abstract
Auditory perception may be modified by attentional mechanisms. Forward informational masking, an attentional phenomenon, was studied as a function of time and task demands. A rapid auditory presentation (RAP) task involving timbre-based streaming of to-be-attended sound signals from distractors was used to assess the report of a to-be-attended signal (Probe; “P”). P was presented within a sequence of distractors either alone or following another to-be-attended signal (Target; “T”), in various conditions. Participants (29 undergraduate psychology students) were asked either to simply detect or to identify P at various Stimulus Onset Asynchronies (SOAs) after T. Learning effects were also examined. Response task was found to be irrelevant to decrements, while decrements were generally ameliorated as SOA increased, and as experience with the tasks increased. These performance decrements represent an auditory attentional blink.
Keywords: attention, informational masking, auditory attentional blink
Introduction
Navigating our perceptual environment would be impossible were it not for our ability to determine which items in the environment were task-relevant to us and which items were not. Our cognitive systems cannot process all pieces of information that are presented to us each moment (Tsotsos, 1997). Selective attention allocation allows for some, but not all stimuli to be processed in a task-appropriate way. The mechanisms by which stimuli are selected for this differential processing are largely unknown, as is the nature of the processing itself.
The Attentional Blink
Numerous researchers have studied auditory attention by exploring situations in which the auditory system fails to process relevant information due to interference from another signal. When such a failure cannot be attributed to limitations of the physiological hearing apparatus, the phenomenon is referred to as informational masking (e.g., Massaro and Kahn, 1973; Leek, Brown, & Dorman, 1991). The Attentional Blink (AB), in audition, is an example of such a situation. In visual research, AB is a much-studied phenomenon wherein the correct report or identification of a to-be-attended stimulus (e.g., a letter or digit) is made less likely due to the allocation of attention to a previously presented stimulus (Raymond, Shapiro, and Arnell, 1992). AB tasks typically employ the rapid presentation of to-be-attended signals set within a sequence of distractors. For example, brief serial presentations of letters may be interspersed with digits and participants may be asked to report the presence or identity of these digits. A blink is defined by the failure to report the presence of one to-be-attended signal (Probe: “P”), significantly more often in trials wherein it is preceded by another to-be-attended signal (Target: “T”), than in trials in which it is presented alone.
Theorizing about the attentional blink
The formulation of new models of how the blink occurs in audition may be useful. Currently, models for how the AB works are based primarily on evidence from the visual domain, due to a much stronger representation of visual blink research in the literature. Examples of such models are given in Shapiro, Raymond & Arnell (1994), Chun and Potter (1995) and Jolicoeur (1998). One exception to this rule is the hypothesis of Potter, Chun, Banks & Muckenhoupt (1998). They have proposed that the AB in audition is due to a task-switching deficit, since a blink was observed only in trials wherein T and P required different tasks. However, more recent work using higher presentation rates (Goddard and Slawinski, 1999) has demonstrated a blink in an auditory presentation involving only one task. The possibility of multiple loci or task-dependent loci for attentional selection (Lavie, 1995) has also not been considered in theories of AB in audition. It has not yet been demonstrated clearly that the AB in audition or vision reflects only one type of processing failure at only one level of the system (Jolicoeur, 1998).
Furthermore, visual models of the AB may be ill suited to explain the AB in audition. The mechanisms by which auditory and visual input are processed are fundamentally different in several ways. Given these differences, attentional systems in the two modalities are likely to function in different ways. For example, there is no analogue to visual fixation in audition (Banks, Roberts, and Cirani, 1995). Although the auditory system can focus attention on one band of frequencies, roughly paralleling the visual fixation process (Mondor and Bregman, 1994), this process would only be useful in “fixating” on auditory objects that are defined by the frequency region they occur in. Most sound objects are not clearly defined in this way, making attentional selection a much different task in audition.
Empirical evidence exists that supports the idea that visual and auditory ABs are separate phenomena. Goddard, Isaak and Slawinski (1998) found that visual and auditory ABs did not correlate within individuals in a study that subjected participants to both. If a single, amodal mechanism were responsible for both decrements, a positive correlation would be expected. Furthermore, it has been observed that when no spatial shift is required between T and P viewing in visual blink tasks, stimuli immediately following T in the sequence are often spared from the effects of the blink (for a summary, see Visser, Zuvic, Bischof & DiLollo, 1999). This phenomenon, called “lag one sparing” (Potter et al., 1998), has not been observed in auditory tasks. In fact, P is least likely to be reported when it immediately follows T in such tasks. All of these considerations provide a strong impetus to explore the attentional blink in audition separately from its visual counterpart and to generate unique models for its functioning.
The AB in audition has been labeled the Auditory Attentional Blink (AAB; Duncan, Martens, and Ward, 1997; Slawinski and Goddard, 2001), although, for reasons described above, it is uncertain whether it is best to describe the visual and auditory versions of AB in similar terms. Given the unclear status of AAB theories in the existing literature, the current study serves to address questions about the AB in audition specifically, with a view towards developing theories about the processes that contribute to it.
AAB with Various Stimuli and the Current Study
This study aims to disentangle the contributions of identification processes and simpler detection processes to the AAB. This goal requires the development of new stimuli that will allow for such an analysis while avoiding possible confounds.
While AAB studies using speech stimuli have yielded blink phenomena (e.g., Duncan et al., 1997), it is uncertain whether this is a purely auditory phenomenon, since visual representations may be made of speech stimuli (Goddard, Isaak, and Slawinski, 1998), and since the blink may occur between modalities (Arnell and Jolicoeur, 1999). Linguistic stimuli are also highly familiar and may be categorized nearly automatically when presented. This makes categorical processing a plausible candidate for the cause of the decrement along with simpler auditory processes. However, the AAB has also been demonstrated to occur in tasks that employ pure tones as stimuli, with T and P defined by intensity (i.e., these tones were louder than distractors; Goddard, Isaak and Slawinski, 1998; Slawinski and Goddard, 2001). This procedure theoretically eliminates the possible effects of visualization and linguistic processing.
The current study examines the AAB in a task that employs timbral differences between novel, non-speech targets, probes, and distractors as the basis for their discrimination. The use of timbre as a basis for discrimination makes it possible for participants to identify individual to-be-attended signals: Qualitative differences between a T and P that vary in timbre may be described by a participant, allowing for identification tasks to be conducted as well as simple detection tasks. Furthermore, participants have no experience discriminating these sounds, and as such the sounds are unlikely to be categorized or identified automatically. Therefore, such sounds will provide an opportunity to study separately the roles of categorization and simple detection or discrimination processes in the AAB. An AAB observed in such a task would also have implications for research in music perception, in which rapid presentation of sounds with various timbres is common.
The proneness of the AAB to learning effects will also be easier to observe when these novel stimuli are employed. Observing learning effects is a secondary aim of the current study. The presence of learning effects would indicate that an AAB observed with these stimuli is not due to absolute limits of the auditory system (as in, for example, energy masking), but rather that a process that is prone to learning is responsible for the effect.
Summary and hypotheses
To summarize, this study aims to determine whether an AB effect will be seen in a timbral-streaming Rapid Auditory Presentation task not involving speech sounds while differentiating between tasks involving the detection and identification of T and P. The study will also examine learning effects in the AAB. It is hypothesized that an attentional blink will be observed in this study, that the observed blink’s characteristics will vary according to task demands, and that learning will ameliorate the blink decrement.
Methodology
Participants
Twenty-nine participants volunteered for the study (for individual participant characteristics, see Table 1). Criteria for exclusion from statistical analyses included: 1. a false-positive rate of greater than 15% (this rate reflected the incidence of the participant indicating they had heard two to-be-attended signals when only one was present); 2. the use of responses by the participant other than those given as options (indicating possible misunderstanding of the instructions); and 3. failure to pass the initial hearing screen. All participants were screened for normal hearing with audiograms generated by a Bruel and Kjaer type 1800 audiometer prior to participating. Normal hearing was defined as not more than 40dB HL loss of hearing at any frequency tested, in either ear. Data from seven participants were not included in analyses as they did not meet these criteria: Two participants did not meet criterion 2, while five participants did not meet criterion 1. All participants passed the hearing screen.
Table 1. Participant Characteristics By Task
Task | Identify | Detect |
n | 12 | 10 |
Male:Female Ratio | 7:5 | 2:8 |
Mean Age (Years) | 24 | 22 |
Procedures and Equipment
Stimuli consisted of sound sequences presented at a rate of 10 signals/ second, with 20ms pauses between signals, yielding 80ms individual signals. A 5ms ramp at each signal’s onset and offset was added to eliminate artifactual clicks. Target and Probe sounds consisted of two different synthesized complex sounds approximating the spectral characteristics of a bell and a pipe organ. Which sound constituted T and which one constituted P was randomized across sequences. T and P were recorded from synthesizer notes in their steady state to obtain a temporal envelope that remained more or less unchanged over time (i.e., to eliminate sharp attacks or varying rates of decay). Distractors were simple tones, generated by Matlab, of frequencies varying between 200Hz and 2.5kHz in 100Hz increments, and were amplified so that their intensities were equal to those of the to-be-attended signals. These auditory sequences were generated by a Hewlett-Packard Vectra computer. The program was designed using Matlab. Stimuli were played through Sennheiser HD 265 linear headphones, and amplified through the terminal’s BW-692 speaker system.
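The timing and distractor parameters described above can be sketched in code. The following is a minimal illustration in Python, not the authors' Matlab program; the 44.1kHz sample rate, the linear ramp shape, and all function names are assumptions, since the text specifies only the durations and the frequency range.

```python
import math

SAMPLE_RATE = 44100  # assumed; the sample rate is not stated in the text
SIGNAL_MS = 80       # duration of each signal
GAP_MS = 20          # silent pause between signals
RAMP_MS = 5          # onset/offset ramp used to avoid clicks


def distractor_frequencies():
    """Candidate distractor frequencies: 200 Hz to 2.5 kHz in 100 Hz steps."""
    return list(range(200, 2501, 100))


def ramped_tone(freq_hz, amplitude=1.0):
    """Return one 80 ms pure tone with 5 ms linear (assumed) on/off ramps."""
    n = SAMPLE_RATE * SIGNAL_MS // 1000
    n_ramp = SAMPLE_RATE * RAMP_MS // 1000
    samples = []
    for i in range(n):
        # Gain rises from 0 to 1 over the ramp and mirrors at the offset.
        gain = min(1.0, i / n_ramp, (n - 1 - i) / n_ramp)
        phase = 2 * math.pi * freq_hz * i / SAMPLE_RATE
        samples.append(amplitude * gain * math.sin(phase))
    return samples


def signal_with_gap(freq_hz):
    """One 100 ms slot: an 80 ms tone followed by 20 ms of silence."""
    silence = [0.0] * (SAMPLE_RATE * GAP_MS // 1000)
    return ramped_tone(freq_hz) + silence
```

Each slot lasts 100ms, which reproduces the stated rate of 10 signals per second; the 200Hz to 2.5kHz range in 100Hz steps gives 24 possible distractor frequencies.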
Stimuli were chosen based on the qualities that made them subjectively easy to distinguish from each other, as well as from distractors. In order to obtain sounds that were noticeably dissimilar from one another, and from distractors, several target stimuli were played for five volunteer participants. The two most noticeably different sounds as agreed upon by participants (the synthesized bell and organ sounds) were selected for programming into the experimental presentation sequences. These sequences were also reviewed by the five pilot-study participants to confirm that they were still discernible at the rate at which they would be presented during the experiment.
Half of the sequences in each trial contained a target and probe, while the other half contained only a probe. The probe-only response rates were used as baseline measures to determine the extent of decrements due to the presence of a target. In sequences with no target, the space the target would have occupied was filled with another distractor sound. In all, 16 sounds were presented per sequence. Targets (or the distractors that replaced them in control trials) were presented in the 3rd, 6th, or 9th position of the sequence. These positions corresponded to signal onsets 200ms, 500ms and 800ms after the onset of the sequence. Stimulus Onset Asynchrony (SOA) between Target and Probe ranged from 100ms (P presented immediately following T) to 500ms. See Figure 1 for a graphic representation of one such sequence.
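The sequence structure described above can be illustrated with a short sketch. This is an assumed reconstruction in Python, not the authors' program: positions are treated as 1-indexed slots of 100ms, SOAs are taken in 100ms steps (matching the five SOA levels reported in the Results), and the function names are hypothetical.

```python
SLOT_MS = 100                          # 10 signals per second
SEQUENCE_LENGTH = 16                   # sounds per sequence
TARGET_POSITIONS = (3, 6, 9)           # 1-indexed positions of T
SOAS_MS = (100, 200, 300, 400, 500)    # assumed 100 ms steps


def onset_ms(position):
    """Onset of a 1-indexed position relative to the sequence onset."""
    return (position - 1) * SLOT_MS


def build_layout(target_position, soa_ms, target_present):
    """Return 16 labels: 'T' (target), 'P' (probe), or 'D' (distractor)."""
    if target_position not in TARGET_POSITIONS or soa_ms not in SOAS_MS:
        raise ValueError("position or SOA outside the design")
    layout = ["D"] * SEQUENCE_LENGTH
    probe_position = target_position + soa_ms // SLOT_MS
    if target_present:
        layout[target_position - 1] = "T"
    # In control sequences the target slot simply remains a distractor.
    layout[probe_position - 1] = "P"
    return layout
```

For example, `build_layout(3, 100, True)` places T in position 3 and P in position 4, so that the probe immediately follows the target; the latest possible probe (T in position 9, SOA of 500ms) falls in position 14 and still fits within the 16-sound sequence.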
Figure 1. Spectrogram of a sample experimental sound sequence. To-be-attended signals can be seen in positions 5 and 10. Stimuli were analyzed using Micromat Computer Systems’ Soundmaker version 1.0.1 software. Recordings for signal analysis were made via a Revox M 3 500 microphone connected to the Hewlett Packard terminal via a Shure model M 268 Mixer.
Each participant’s trial consisted of 80 sound sequences, half of which were experimental sequences (including target and probe) and half of which were control sequences (including only the probe). This variable is henceforth referred to as “target presence”. The instructions given also varied between participants, with half of the participants being asked to indicate whether they had heard the “bell”, the “organ”, or both within each sequence (“identify” condition), and the other half merely being asked to indicate how many of the to-be-attended sounds they had heard (“detect” condition). This variable is henceforth referred to as “task”. Also, each participant’s performance in the first 40 sequences was compared with performance in the next 40 sequences to gauge any effects of repeated exposure to the sequences (i.e., learning effects). This variable is henceforth referred to as “exposure”. The final variable was SOA: the delay between the onset of the target (or distractor in its place in control trials) and that of the probe. SOA and target presence were balanced between the two levels of exposure to ensure that the effects of exposure were not artifacts of the experimental design. SOA, target presence and exposure were manipulated within participants while task was manipulated between participants.
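The within-participant design can be sketched as a trial list. The following Python sketch rests on assumptions not stated in the text: that each combination of target presence and SOA occurred equally often within each half (40 sequences divided by 2 x 5 cells gives 4 repetitions per cell), and that order was randomized within each half. The function name is hypothetical.

```python
import itertools
import random

SOAS_MS = (100, 200, 300, 400, 500)
REPS_PER_CELL = 4  # assumed: 40 sequences per half / (2 x 5 cells)


def build_trial_list(seed=None):
    """Return 80 trial descriptions, balanced within each exposure half."""
    rng = random.Random(seed)
    trials = []
    for exposure in (1, 2):
        block = [
            {"exposure": exposure, "target_present": present, "soa_ms": soa}
            for present, soa in itertools.product((True, False), SOAS_MS)
            for _ in range(REPS_PER_CELL)
        ]
        rng.shuffle(block)  # assumed randomization within each half
        trials.extend(block)
    return trials
```

Task (identify or detect) does not appear in the list because it was manipulated between participants.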
After the peripheral hearing test screening, practice trials were administered. Participants performing the identification task were given practice identifying the targets on their own, then embedded in distractor sequences, until they demonstrated that they could correctly identify single targets embedded in sequences. Participants performing the detection task were allowed to practice listening to single-target streams until they could discriminate target sounds from distractor stimuli. When participants reported feeling comfortable performing these tasks, they began the trials. Participants performing the detection task were asked to indicate how many of the sounds they heard in a given sequence – zero, one, or two – and were told that there would never be more than two. Before performing the identification task, participants were given the same instructions, except that they were told to identify the sounds they heard: none, bell, organ, or both. Trials began with a key press. Participants were given as much time as they wanted to respond and made their responses by keyboard. Participants were instructed to guess in case of uncertainty.
Results
Data were analyzed using a 2x2x5x2 (task x target presence x SOA x exposure) mixed-model ANOVA. Significance was assessed using alpha levels determined using the Bonferroni correction for family-wise error rates based on an alpha level of 0.05. Significant main effects were found for target presence, F(1,20)=45.70, p<.001, for SOA, F(4,17)=25.20, p<.001, and for exposure, F(1,20)=5.41, p=.03. No significant main effect was found for Task, F(1,20)=.001, p=.98. Significant two-way interactions were found between target presence and SOA, F(4,17)=19.10, p<.001, and between target presence and exposure, F(1,20)=11.20, p<.001. No other interactions approached significance.
The exposure x target presence interaction is summarized in Figure 2. Significant differences were found between target-absent and target-present trial means at both levels of exposure. At the first level of exposure (first 40 trials for each participant), a two-tailed simple t-test yielded significance, t(21)=-7.16, p<.001, between the target-present mean (.811) and the target-absent mean (.993); and at level two (latter 40 trials for each participant), a two-tailed simple t-test also yielded significance, t(21)=-5.08, p<.001, between the target-present mean (.875) and the target-absent mean (.984).
Figure 2. Significant learning effect. X axis shows 1st vs. 2nd half of trials performed by each participant, Y axis shows the likelihood out of one of reporting P when presented with (black bar) and without (grey bar) T.
The target presence x SOA interaction is summarized in Figure 3. Two-tailed simple t-tests yielded significant differences between target-present and target-absent trial means at 100ms (.580 and .977, respectively), t(21)=-7.01, p<.001, at 200ms (.881 and .994), t(21)=-4.18, p<.001, and at 500ms (.926 and .989), t(21)=-3.49, p=.002. Differences were not significant (though they approached significance) at 300ms (.932 and .989), t(21)=-2.02, p=.057, and 400ms (.898 and .994), t(21)=-2.57, p=.018.
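The pattern of significant and non-significant SOAs is consistent with a Bonferroni-adjusted threshold. The sketch below assumes that the family consists of the five SOA comparisons, giving a per-test alpha of .05/5 = .01; the text does not state the family size, so this is an illustration rather than a reconstruction of the authors' procedure. Values reported as p<.001 are entered as their upper bound.

```python
ALPHA = 0.05


def bonferroni_significant(p_values, alpha=ALPHA):
    """Flag each p-value as significant at alpha divided by the family size."""
    threshold = alpha / len(p_values)
    return {label: p < threshold for label, p in p_values.items()}


# Reported p-values keyed by SOA in ms; "p<.001" is entered as 0.001.
reported = {100: 0.001, 200: 0.001, 300: 0.057, 400: 0.018, 500: 0.002}
flags = bonferroni_significant(reported)
```

Under this assumption the comparisons at 100ms, 200ms and 500ms fall below the adjusted threshold, while those at 300ms and 400ms do not, matching the reported outcome.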
Figure 3. Significant blink effects were seen at SOAs 1, 2 and 5 (100ms, 200ms and 500ms). Y axis shows the likelihood out of one of reporting P when presented with (black bar) and without (grey bar) T.
Discussion
Hypotheses
The hypothesis that an attentional blink would be observed in this study was supported; a clear decrement was seen at three of the SOAs tested: 100ms, 200ms, and 500ms. While these were the only decrements that achieved statistical significance, decrements at 300ms and 400ms approached significance. A timbral streaming task (using stimuli other than speech sounds) may then produce an AAB comparable to tasks involving other stimuli. The hypothesis regarding learning was also supported: Across tasks and SOAs, performance in the report of P was better during the second half of trials presented.
The hypothesis that the observed decrement would vary according to response task was not supported. No difference was found between blink functions produced by the two types of tasks.
Implications
The observation of AAB in the tasks described above highlights the pervasiveness of this phenomenon and its robustness across different stimuli and tasks.
Differentiating Identification and Detection Tasks. The lack of an effect of task demands in this study is particularly interesting. Visual blink tasks in which targets are discriminable from distractors by category membership (e.g., letter or number) may require rapid identification of visual forms in order to be discriminated from the distractor stream. This makes such processes plausible explanations for the presence of a blink. Auditory tasks that employ speech sounds may also be incapable of creating a distinction between identification and detection since these signals may be categorized automatically. Auditory tasks typically employ simpler stimuli whose discrimination does not require identification, only the detection of intensity level differences. In such tasks, identification of targets and probes is not assessed. The task used in this study was unique in that it allowed for a distinction to be drawn between identification and detection task demands. The lack of differences observed may suggest several possibilities. It is possible that the AAB involves a failure to discriminate T and P from distractors due to basic deficits in determining these sounds’ distinguishing spectral characteristics, resulting in an inability to detect and, therefore, to identify the sounds. The fact that the AAB effect, in this scenario, was prone to learning effects also has implications.
If the AAB represents an early failure to distinguish between T, P, and distractors, and if this failure may be ameliorated through learning (suggesting that it is not a physiological limit of the auditory system), auditory scene analysis (ASA) may be responsible for the AAB. ASA comprises the processes by which the auditory system perceptually groups sonic signals together to form sound objects. These mechanisms often rely on learned information in order to function, a phenomenon referred to as schema-based streaming (Bregman, 1990). The operation of such mechanisms may have implications for selective attention to different auditory signals. Failures to discriminate sounds from the background stream may result in a failure to allocate attention to them. This idea is not incompatible with other ideas about the blink; central processing decrements may still occur, and given the evidence from cross-modal blink tasks (Arnell and Jolicoeur, 1999), they likely do. The suggestion is simply that more basic perceptual processes may also be prone to failures in the processing of a signal related to the processing of another signal when presented in close succession. More support for this hypothesis may be seen in informational masking research performed by Leek, Brown & Dorman (1991). These authors found that target detection was adversely affected as the degree of similarity between targets and distractors increased. If informational masking is released when a target and distractor are dissimilar enough, this may suggest that a resource-consuming discrimination or grouping task is at the root of informational masking and, possibly, the AAB.
The lack of a difference between the results yielded by the two tasks should be interpreted with caution, though. It is possible that the null finding simply reflects a lack of statistical power. It may also reflect the possibility that some form of identification of T and/or P occurred regardless of the instructions given to participants. Even in the detection task, for instance, participants tended to come up with their own labels for the targets as they learned the task.
Implications for Music and Speech Perception. The fact that the AAB may occur in a timbre-based discrimination task demonstrates that it may have implications for research in music perception. Listening to music is a task that often requires distinctions to be made between sounds generated by different instruments with characteristic timbral qualities, when these sounds are presented in rapid succession as they were in this study. Speech stimuli also share these qualities: Phonemes in ordinary speech are rapidly presented (several separate phonemes may be presented within 500ms) and discernible in part by timbre. If the AAB were as prevalent in everyday conversation as it was in this study, speech comprehension would seem almost impossible. The amelioration of blink effects through learning may explain, in part, a listener’s ability to make such rapid distinctions and identifications.
Future Directions
Although the current study revealed a pattern of results suggesting the presence of AAB, two of the SOAs examined yielded only nearly-significant decrements. These results should be interpreted with caution in light of the possibility that the decrements observed may operate over a different time course than decrements observed in other RAP tasks. The null finding for task demand effects in this study should also be interpreted with caution due to the possibility of multiple explanations.
Future studies may benefit from the use of a timbral discrimination-based AAB task by assessing which of T and P is actually reported in the event of a blink. Informational masking tasks (e.g., Massaro and Kahn, 1973) have demonstrated that informational masking may operate both forwards and backwards in time. It is unknown which of the two signals is typically “missed” in the event of an AAB, if there is a characteristic pattern at all. The possibility of reporting target identity makes this a possible topic of study.
Conclusions
Models of how the AB functions are often based largely on evidence from the visual domain, and may not be transferable to the auditory modality. The timbrally based discrimination task has proven to be a useful way of exploring the AAB, making various new paradigms available for research into this phenomenon. It has yielded a pattern of attentional decrements that are robust across task demands, but that may be ameliorated by learning. These findings may have implications for theories about the processes that contribute to AAB.
References
Arnell, K. M. & Jolicoeur, P. (1999). The Attentional Blink Across Stimulus Modalities: Evidence for Central Processing Limitations. Journal of Experimental Psychology: Human Perception and Performance, 25, 630-648.
Banks, W. P., Roberts, D. & Cirani, M. (1995). Negative Priming in Auditory Attention. Journal of Experimental Psychology: Human Perception and Performance, 21, 1354-1361.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.
Chun, M. M. & Potter, M. C. (1995). A Two-Stage Model for Multiple Target Detection in Rapid Serial Visual Presentation. Journal of Experimental Psychology: Human Perception and Performance, 21, 109-127.
Duncan, J., Martens, S. & Ward, R. (1997). Restricted Attentional Capacity Within But Not Between Sensory Modalities. Nature, 387, 808-810.
Goddard, K. M., Isaak, M. I. & Slawinski, E. B. (1998). Modality Specific Attentional Mechanisms Can Govern the Attentional Blink. Canadian Acoustics, 27, 98-99.
Jolicoeur, P. (1998). Modulation of the Attentional Blink by On-Line Response Selection: Evidence from Speeded and Unspeeded Task One Decisions. Memory and Cognition, 26, 1014-1032.
Lavie, N. (1995). Perceptual Load as a Necessary Condition for Selective Attention. Journal of Experimental Psychology: Human Perception and Performance, 21, 451-468.
Leek, M. R., Brown, M. E. & Dorman, M. F. (1991). Informational Masking and Auditory Attention. Perception and Psychophysics, 50, 205-214.
Massaro, D. W. & Kahn, B. J. (1973). Effects of Central Processing on Auditory Recognition. Journal of Experimental Psychology, 97, 51-58.
Mondor, T. A. & Bregman, A. S. (1994). Allocating Attention to Frequency Regions. Perception and Psychophysics, 56, 268-276.
Raymond, J. E., Shapiro, K. L. & Arnell, K. M. (1992). Temporary Suppression of Visual Processing in an RSVP Task: An Attentional Blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849-860.
Shapiro, K. L., Raymond, J. E. & Arnell, K. M. (1994). Attention to Visual Pattern Information Produces the Attentional Blink in Rapid Serial Presentation. Journal of Experimental Psychology: Human Perception and Performance, 20, 357-371.
Slawinski, E. B. & Goddard, K. M. (2001). Age-Related Changes in Perception of Tones Within a Stream of Auditory Stimuli: Auditory Attentional Blink. Canadian Acoustics, 29, 3-12.
Tsotsos, J. K. (1997). Limited Capacity of Any Realizable Perceptual System is a Sufficient Reason for Attentive Behavior. [Abstract]. Consciousness and Cognition, 6, 429-436.
Visser, T. A. W., Zuvic, S. M., Bischof, W. F. & DiLollo, V. (1999). The Attentional Blink With Targets in Different Spatial Locations. Psychonomic Bulletin and Review, 6, 432-436.
Biographical Notes
Cory Gerritsen received a B.A. in psychology from the University of Calgary in 2002 and is currently studying clinical psychology at York University.
Elzbieta Slawinski received her Ph.D. at the University of Warsaw in 1978. She recently retired from her position as Associate Professor of Psychology at the University of Calgary.
David Eagle received his Ph.D. at the University of California, Berkeley in 1992. He is currently Associate professor of Composition and Electroacoustic Music at the University of Calgary.