Human sensory systems are organized into processing hierarchies within cortex, such that incoming sensory information is analyzed and compiled into our vivid sensory experiences. Computations that are common to these sensory systems include the abilities to maintain enhanced focus on particular aspects of incoming sensory information (i.e., attention) and to retain sensory information in a short-term memory store after such sensory information is no longer available (i.e., working memory). In at least the auditory and visual systems, the necessary computational steps to create these experiences take place in cloverleaf clusters of cortical field maps (CFMs). The human auditory CFMs represent the spectral (i.e., tones) and temporal (i.e., period) aspects of sound, which are represented along the cortical surface as two orderly gradients that are physically orthogonal to one another: tonotopy and periodotopy, respectively. Knowledge of the properties of such CFMs is the foundation for understanding the specific sensory computations carried out in particular cortical regions. This chapter reviews current research into auditory nonverbal attention, auditory working memory, and auditory CFMs, and introduces the next steps to measure the effects of attention and working memory across the known auditory CFMs in human cortex using functional MRI.
- human auditory cortex
- cloverleaf cluster
- cortical field maps
- working memory
In cortex, mammalian sensory systems are composed of many functionally specialized areas organized into hierarchical networks [1, 2, 3, 4, 5, 6]. The most fundamental sensory information is embodied by the organization of the sensory receptors, which is maintained throughout most of the cortical hierarchy of sensory regions with repeating representations of this topography in cortical field maps (CFMs) [5, 7, 8, 9, 10, 11, 12, 13]. Accordingly, neurons with receptive fields situated next to one another in sensory feature space are positioned next to one another in cortex within a CFM.
In auditory cortex, auditory field maps (AFMs) are identified by two orthogonal sensory representations: tonotopic gradients from the spectral aspects of sound (i.e., tones), and periodotopic gradients from the temporal aspects of sound (i.e., period or temporal envelope) [5, 10, 14]. On a larger scale across cortex, AFMs are grouped into cloverleaf clusters, another fundamental organizational structure also common to visual cortex [8, 10, 15, 16, 17, 18, 19, 20]. CFMs within clusters tend to share properties such as receptive field distribution, cortical magnification, and processing specialization (e.g., [18, 19, 21]).
Across the cortical hierarchy, there is generally a progressive increase in the complexity of sensory computations from simple sensory stimulus features (e.g., frequency content) to higher levels of cognition (e.g., attention and working memory) [6, 13, 22]. CFM organization likely serves as a framework for integrating bottom-up inputs from sensory receptors with top-down attentional processing [12, 17]. With the recent ability to measure AFMs in the core and belt regions of human auditory cortex along Heschl’s gyrus (HG) using high-resolution functional magnetic resonance imaging (fMRI), the stage is now set for investigation into this integration of basic auditory processing with higher-order auditory attention and working memory within human AFMs (Figure 1) [5, 12, 15, 23].
This chapter first provides a brief history of research into models of auditory nonverbal attention and working memory, with comparisons to their visual counterparts. Next, we discuss the current state of research into AFMs within human auditory cortex. Finally, we propose directions of future research investigating auditory attention and working memory within these AFMs to illuminate how these higher-order cognitive processes interact with low-level auditory processing.
2. Attention and working memory in human audition
2.1 Models of attention and working memory
Attention, the ability to select and attend to aspects of the sensory environment while simultaneously ignoring or inhibiting others, is a fundamental aspect of human sensory systems (for reviews, see [24, 25, 26, 27]). Given the limited resources of the human brain, attention allows for greater resources to be allocated to processing of important incoming sensory stimuli by diverting precious resources from currently unimportant stimuli. Such allocation can be controlled cognitively, in what is generally referred to as ‘top-down’ attentional control in models of attention, in reference to the higher-order cognitive processes controlling attention from the ‘top’ of the sensory-processing hierarchy and acting ‘down’ on the lower levels (Figure 2) [24, 28, 29, 30, 31]. Despite lower priority being assigned to the currently unimportant stimulus locations, change is constant, so the resource diversion to attended stimuli is not absolute, allowing for the sensory environment to continue to be monitored. If, instead, processing resources were evenly distributed throughout the sensory field, without regard to salience, more resources would be wasted on unimportant aspects of the field. If something in the unattended sensory field should become important, the system requires a mechanism to reorient attention to that aspect of the field. Such stimulus-driven attentional control is referred to as ‘bottom-up’, referring to the ability of incoming sensory input at the bottom of the hierarchy to orient the higher-order attention system. This broad framework of attentional models is common at least to the senses most commonly studied, vision and audition [25, 27, 31, 32].
In the effort to elucidate the parameters of auditory attention, researchers have taken a myriad of approaches in numerous contexts. Researchers have attempted to decipher at what level of the sensory-processing hierarchy stimulus-driven attention occurs (i.e., after which sensory-processing steps attention acts) [24, 30, 31, 33, 34, 35], how attention can be deployed (to locations in space or to particular sensory features) [36, 37, 38, 39, 40], and how attention can be distributed (i.e., to how many ‘objects’ or ‘streams’ attention can be simultaneously deployed) [41, 42, 43, 44]. Many studies have narrowed the range of possibilities without precisely answering these questions, which therefore remain active areas of research. Modern models of attention generally agree that stimuli are processed to some degree before attention acts, accounting for the stimulus-driven ‘bottom-up’ attentional shifts, though it is unclear to precisely which degree [24, 30, 33]. Neuroscientific evidence suggests that attention acts throughout sensory-processing hierarchies, so the idea of attention being located at a particular ‘height’ in the hierarchy may not be a particularly useful insight for identifying the cortical locus of attentional control [45, 46]. Modern attentional models also generally agree that attention can be deployed to locations in or features of sensory space, both of which are fundamental aspects of the sensory-processing hierarchy [24, 35]. Finally, modern models of attention agree that attention is very limited, but not about precisely how it is limited. Some models are still fundamentally ‘spotlight’ models [25, 44], in which attention is limited to a single location or feature set, while others posit that attention can be divided between a small number of locations or features [41, 47]. Based on related working-memory research, the latter view is gaining prominence as the more likely account.
Working memory (i.e., a more accurate term for ‘short-term memory’) is the ability to maintain and manipulate information within the focus of attention over a short period of time after the stimulus is no longer perceptible (for reviews, see [48, 49, 50, 51]). Without explicit maintenance, this retention period is approximately 1–2 s, but it is theoretically indefinite with explicit maintenance. Working memory should not be confused with ‘sensory memory’, also known as ‘iconic memory’ in vision and ‘echoic memory’ in audition. Sensory memory is a fundamental aspect of sensory systems in which a sensory trace available to attention and working-memory systems persists for less than ~100 ms after stimuli are no longer perceptible. Models of working memory are nearly indistinguishable from models of attention; the key difference is that working memory is a ‘memory’ of previously perceptible stimuli, whereas attention is thought to act on perceptible stimuli or sensory traces thereof. Working-memory models posit, by definition, that working memory acts after perceptual processing has occurred (Figure 2; for review, see ). However, it has been difficult to isolate exactly where working-memory control resides along the cortical hierarchy of sensory processing, likely because low-level perceptual cortex is recruited at least for visual working memory and attention [40, 46, 54, 55].
Like models of attention, working-memory models posit that working memory is a highly limited resource, in which a small set of locations or objects (e.g., 3–4 items on average) can be simultaneously maintained [42, 49]. In fact, some modern measures of attention and working memory are nearly identical. The change-detection task is a ubiquitous one in which subjects are asked to view a sensory array, then compare that sensory array to a second one in which some aspect of the array may have changed, and indicate whether a change has occurred (Figure 3) [56, 57, 58, 59, 60]. A short delay period (i.e., retention interval) is included between the arrays, which may include a neutral presentation or, if desired, a mask of the sensory stimuli to prevent the use of ‘sensory memory’. The length of the delay period can then be altered to measure either attention or working memory. If the delay period is on the order of ~0–200 ms, it is considered an attentional task; if it is longer, on the order of 1–2 s, it is considered a working-memory task. Therefore, attention and working-memory systems are at a minimum heavily intertwined and very likely the same system studied in slightly different contexts, with attention being a component of a larger working-memory framework.
With the relatively recent invention of fMRI, researchers have been able to begin to localize these models of attention and working memory to their cortical underpinnings (e.g., [6, 37, 40, 50, 55, 61, 62]). FMRI, through its exquisite ability to localize blood oxygenation-level dependent (BOLD) signals (and thus the underlying neural activity) to just a couple of millimeters, is the best technology available for such research [63, 64]. Two broad approaches have been employed for studying these high-order cognitive processes: model-based and perception-based. Model-based investigations tend to use tasks based on behavioral investigations into attention and working memory, adapt them to the strict parameters required of fMRI, and compare activity in conditions in which attention or working memory is differentially deployed [61, 62]. Perception-based investigations tend to start from low-level perceptual cortex that has already been mapped in detail and measure the effects of attention or working memory within those regions [50, 55, 65]. Both approaches are important and should be fully integrated to garner a more complete and accurate localization of these attentional and working-memory systems.
2.2 Overview of auditory and visual attention research
Research into attention began in earnest in the auditory system after World War II with a very practical motivation. It had been noted that fighter pilots sometimes failed to perceive auditory messages presented to them over headphones despite the fact that the messages were completely audible. To solve this problem, Donald Broadbent began studying subjects in an auditory environment similar to that of the pilots, with multiple speech messages presented over headphones. Based on his findings, he proposed a selective theory of attention, which was popular and persuasive, but ultimately required modification. Environments such as the one Broadbent studied are more commonly encountered at cocktail parties, in which multiple audible conversations are taking place, and people are able to attend to one or a small set of speech streams while attenuating the others. To study the ‘cocktail party phenomenon,’ the dichotic listening task was developed in the 1950s by Colin Cherry [66, 67]. Subjects were asked to shadow the speech stream presented to one ear of a set of headphones while another stream was presented to the other ear, and they demonstrated little knowledge of the nonshadowed (unattended) stream (Figure 4).
A host of studies followed up on the basic finding, revealing several attentional parameters within the context of that type of task (e.g., [30, 35, 40, 68, 69, 70, 71]). Importantly, preferential processing of the attended stream relative to the unattended streams is not absolute; for example, particularly salient information, such as the name of the subject, could sometimes be recalled from an unattended stream, presumably by reorienting attention [39, 66, 67, 69]. The streams were typically differentiated spatially (e.g., to each ear through a headset), indicating a spatial aspect to attentional selection and therefore the attentional system. Similarly, the streams were also typically differentiated by the voice of the person speaking, indicating attentional selection based on the spectrotemporal characteristics of the speaker’s voice such as the average and variance of pitch and speech rate (often reflecting additional information about the speaker, such as gender) [66, 67, 68, 72].
These findings are very similar to findings in the visual domain, indicating that attentional systems across senses are similarly organized. Visual attention can similarly be deployed to a small set of locations or to visual features with very little recall of nonattended visual stimuli. Roughly analogous to speech shadowing are multiple-object-tracking tasks, which require subjects to visually track a small set of moving objects out of a group [47, 73]. Visual change-detection tasks are also very common, and they demonstrate results very similar to those of their auditory counterparts [50, 74, 75]. In sum, the evidence suggests that attentional systems are organized very similarly, perhaps identically, between at least vision and audition.
Despite these broad contributions, these types of tasks are of limited utility when tying behavior to cortical activity because the types of stimuli used are rather high-order (e.g., speech) with relatively uncontrolled low-level parameters. For example, the spectrotemporal profile of a stream of speech is complex, likely activating broad swaths of low-level sensory cortex in addition to higher-order regions dedicated to speech comprehension, including working and long-term memory [68, 72, 76, 77]. If one were to compare fMRI activity across auditory cortex in traditional dichotic listening tasks, the differences would involve far too many variables to account for before meaningful conclusions could be drawn about attentional systems. It may seem intuitive to compare cortical activity between conditions where identical speech stimuli have been presented and the subject either attended to the stimuli or did not. However, areas that have increased activity when the stimuli were attended could simply reflect higher-order processing that only occurs when attention is directed to the stimuli rather than directly revealing areas involved in attentional control. For example, recognition of particular words requires comparison of the speech stimulus to an internal representation, which requires activation of long-term memories of words. Long-term memory retrieval does not happen if the subject never perceived the word due to attention being maintained on a separate speech stream, so such memory-retrieval activity would be confounded with attentional activity in the analysis.
Thus, simpler stimuli that are closer in nature to the initial spectrotemporal analyses performed by primary auditory cortex (PAC) are better suited for experiments intended to demonstrate attentional effects in cortex. Reducing the speech comprehension element is a good first step, and researchers approached this by using a change-detection task with arrays of recognizable animal sounds (cow, owl, frog, etc.; Figure 5). These tests revealed what the researchers termed ‘change deafness,’ in which subjects often failed to identify changes in the sound arrays. Such inability to detect changes is entirely consistent with very limited attentional resources, and very similar to results of working-memory change-detection tasks [30, 53, 60, 78].
However, even these types of stimuli are not best suited to fMRI investigation at this stage of understanding due to their relative complexity compared to the basic spectrotemporal features of sounds initially processed in auditory cortex [12, 50]. As discussed in detail below, the auditory system represents sounds in spectral and temporal dimensions, and stimuli similar to those used to define those perceptual areas would be best suited now to evaluating the effects of attention in the auditory system (Figure 6) [5, 10].
2.3 Overview of auditory and visual working-memory research
Visual and auditory working memory were discovered in quick succession and discussed together in a very popular and influential model by Baddeley and Hitch linking sensory perception, working memory, and executive control [79, 80, 81]. The generally accepted modern model of working memory has changed somewhat from the original depiction, but the vast majority of research has worked within this framework (for reviews, see [30, 51, 53, 79, 81]). Each sense is equipped with its own perceptual system and three memory systems: sensory memory, working memory, and long-term memory. Direct sensory input, gated by attentional selection, is one of the two primary inputs into working memory. Sensory memory is a vivid trace of sensory information that persists for a short time after the information has vanished and is essentially equivalent to direct sensory input into working memory, again gated by attentional selection; one can reorient attention to aspects of the sensory trace as if it were direct sensation. Long-term memory is the second primary input into working memory, which is gated by an attention-like selection, generally referred to as selective memory retrieval. Working memory itself is a short-term memory workspace lasting a couple of seconds without rehearsal, in which sensory information is maintained and manipulated by a central executive. The central executive is a deliberately vague term with nebulous properties; as a colleague often quips, “All we know of the central executive is that it’s an oval,” after its oval-shaped depiction in the Baddeley and Hitch model. There is ongoing debate as to the level of the hierarchy at which each system is integrated into that of the other senses, with no definitive solutions.
Visual working memory and visual sensory memory (i.e., ‘iconic memory’) were fundamentally measured by George Sperling in 1960. He presented arrays of simple visual stimuli for short periods of time and asked subjects to report what they had seen after a number of short delays. He discovered that subjects could only recall a small subset of stimuli in a large array, representing the limited capacity of visual working memory. Furthermore, they could recall a particular subset of the stimuli when cued after the presentation but before the sensory trace had faded (≤100 ms), indicating that visual sensory memory exists and that visual attention can be deployed to stimuli either during sensation or sensory memory. Over the next decade, George Sperling went on to perform similar measurements in the auditory system, delineating very similar properties for auditory perception, sensory memory, and working memory.
Without directly measuring brain activity, researchers concluded that sensory systems must be operating independently with dual-task paradigms in which subjects were asked to maintain visual, auditory, or both types of information in working memory. It was shown that subjects could recall ~3–4 ‘chunks’ of information (which may not precisely reflect individual sensory locations or features) of each type, regardless of whether they were asked to maintain visual, auditory, or both types of information [49, 78]. If the systems were integrated, one would be able to allocate multisensory working-memory ‘slots’ to either sense, with a maximum number (e.g., 6–8) that could be divided between the senses as desired. Instead, subjects can maintain on average ~3–4 visual chunks and ~3–4 auditory chunks, without any ability to reallocate any ‘slots’ from one sense to the other.
While electroencephalogram (EEG) and positron emission tomography (PET) recordings could broadly confirm the contralateral organization of the visual system and coarsely implicate the parietal and frontal lobes in attention and working memory, it was not until the advent of high-resolution fMRI that researchers could begin localizing attention and working memory in human cortex with any detail [6, 17, 37, 50, 84, 85, 86, 87, 88, 89, 90]. Model-based fMRI investigations have attempted to localize visual working memory by comparing BOLD activity in conditions where subjects are required to hold different numbers of objects in working memory [50, 62, 91, 92]. The logic goes that, because visual-working-memory models posit that a maximum of ~3–4 objects can be held in visual working memory on average, areas that increase their activity with arrays of 1, 2, or 3 objects and remain constant with arrays of 4 or more objects should be areas controlling visual working memory. Such areas were found bilaterally in parietal cortex by multiple laboratories [57, 62, 91, 93], but activity related to visual working memory has also been measured in early visual cortex (e.g., V1 and hV4) [55, 65, 94], prefrontal cortex, and possibly in object-processing regions in lateral occipital cortex, indicating that working-memory tasks recruit areas throughout the visual-processing hierarchy. (We note that the report of object-processing regions is controversial, as the cortical coordinates reported in that study are more closely consistent with the human motion-processing complex, hMT+, than the lateral occipital complex [15, 17, 96, 97].) However, little has been done to measure visual-working-memory activity in visual field maps, and so these studies should be considered preliminary rather than definitive. Measurements within CFMs would, in fact, help to clear up such controversies.
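The set-size logic of the model-based approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the capacity of 4 items, the function names `predicted_load` and `pearson`, and the two response profiles are all hypothetical and are not taken from any of the cited studies.

```python
import math


def predicted_load(set_size, capacity=4):
    """Capacity-limited predictor: rises with set size, then stays flat.

    The default capacity of 4 is an assumed value within the ~3-4 item range.
    """
    return min(set_size, capacity)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


set_sizes = [1, 2, 3, 4, 5, 6]
predictor = [predicted_load(n) for n in set_sizes]

# Hypothetical mean responses (arbitrary units) for two toy regions.
plateau_region = [0.2, 0.4, 0.6, 0.8, 0.8, 0.8]  # levels off at capacity
linear_region = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]   # keeps rising with set size

print(pearson(predictor, plateau_region))
print(pearson(predictor, linear_region))
```

A region whose response follows the plateau profile matches the predictor more closely than one whose response keeps growing with set size, which is the signature these studies look for. A real analysis would of course fit such a predictor to measured BOLD responses within individually defined regions.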
Auditory-working-memory localization with fMRI has been quite limited compared to its visual counterpart, and largely concentrated on speech stimuli rather than fundamental auditory stimuli [30, 68]. As noted above with attention localization with fMRI, too many variables exist with highly complex stimuli, and as such, a different approach is necessary. Furthermore, even low-level auditory sensory areas have only very recently been properly identified [5, 10].
3. Auditory processing in human cortex
3.1 Inputs to auditory cortex
Auditory processing is essential for a wide range of our sensory experiences, including the identification of and attention to environmental sounds, verbal communication, and the enjoyment of music. The intricate sounds in our daily environments are encoded by our auditory system as the intensity of their individual component frequencies, comparable to a Fourier analysis. This spectral sound information is thus one fundamental aspect of the auditory feature space (Figure 7A,C). The basilar membrane of the inner ear responds topographically to incoming sound waves with higher frequencies transduced to neural signals near the entrance to the cochlea and progressively lower frequencies transduced further along the membrane. This organized gradient of frequencies (i.e., tones) is referred to as tonotopy (i.e., a map of tones); this topography may also be termed cochleotopy, referring to a map of the cochlea. Tonotopic organization is maintained as auditory information is processed and passed on from the inner ear through the brainstem, to the thalamus, and into PAC along Heschl’s gyrus (HG; Figure 1; for additional discussion, see [2, 5, 6, 12, 99, 100]). The preservation of such topographical organization from the basilar membrane of the inner ear to auditory cortex allows for a common reference frame across this hierarchically organized sensory system [6, 7, 12, 13, 22, 23].
A second fundamental aspect of the auditory feature space is temporal sound information, termed periodicity (Figure 7B,D) [10, 101, 102]. Human psychoacoustic studies indicate that there are separable filter banks (i.e., neurons with distinct receptive fields) not only for frequency spectra, as expected given tonotopy, but also for temporal information [103, 104, 105]. The auditory nerve likely encodes such temporal information through activity time-locked to the periodicity of the amplitude modulation (i.e., the length of time from peak to peak of the temporal envelope) [101, 106]. Temporally varying aspects of sound are thought to preferentially activate neurons selective for the onset and offset of sounds and for sounds of certain durations. Organized representations of periodicity in primates have been measured to date in the thalamus and PAC of macaque and human, respectively, and are termed periodotopy, a map of neurons that respond differentially to sounds of different temporal envelope modulation rates [5, 10, 107]. Repeating periodotopic gradients exist in the same cortical locations as, but are orthogonal to, tonotopic gradients, which allows researchers to use measurements of these two acoustic dimensions to identify complete AFMs.
3.2 fMRI measurements of auditory field maps
Measurements of the structure and function of human PAC and lower-level auditory cortex have been relatively few to date, with many studies hampered by methodological issues (for reviews, see [5, 23]). Precise measurements of AFMs across primary and lower-level auditory cortex are vital, however, for studying the neural underpinnings of such prominent auditory behaviors as attention and working memory. Recent research has now successfully applied fMRI methods commonly used to measure visual field maps to the study of AFMs in human auditory cortex.
3.2.1 Phase-encoded fMRI
The phase-encoded fMRI paradigm provides highly detailed in vivo measurements of CFMs in individual subjects [9, 10, 15, 108, 109, 110, 111]. This technique measures topographical representations using stimuli that periodically repeat a set of values in an orderly sequence (Figure 7). The phase-encoded methods are specialized for AFM measurements by combining this periodic stimulus with a sparse-sampling paradigm (Figure 8) [10, 112, 113, 114, 115]. Sparse-sampling separates the auditory stimulus presentation from the noise of the MR scanner during data acquisition to avoid contamination of the data by nonstimulus sounds [116, 117, 118].
The periodic stimulus allows for the use of a Fourier analysis to determine the value of the stimulus (e.g., 800 Hz frequency for tonotopy) that most effectively drives each cortical location. The cortical response at a specific location is said to be ‘in phase’ throughout the scan with the stimulus value that most effectively activates it, hence the term ‘phase-encoded’ mapping. The alternate term ‘traveling-wave’ mapping arises from the consecutive activation of one neighboring cortical location after the other to create a wave-like pattern of activity across the CFM during the stimulus presentation. The phase-encoded paradigm only captures cortical activity that is at the stimulus frequency, thus excluding unrelated cortical activity and other sources of noise. Similarly, cortical regions that are not organized topographically will not be significantly activated by phase-encoded stimuli, as there would be no differential activation across the cortical representation [8, 15, 16]. The statistical threshold for phase-encoded cortical activity is commonly determined by coherence, which is a measure of the amplitude of the BOLD signal modulation at the frequency of the stimulus presentation (e.g., six stimulus cycles per scan), divided by the square root of the power over all other frequencies except the first and second harmonics (e.g., 12 and 18 cycles per scan) [15, 17, 110].
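The coherence measure and the phase readout described above can be sketched for a single time series as follows. This is a minimal illustration, not the implementation used in the cited studies: the function names are hypothetical, a plain discrete Fourier transform from the Python standard library stands in for whatever analysis package a laboratory actually uses, and hemodynamic delay is ignored. One further assumption is that the denominator includes the stimulus frequency itself so that coherence stays between 0 and 1; the wording of the definition could also be read as excluding it.

```python
import cmath
import math


def fourier_coefficients(signal):
    """Complex DFT coefficients of the mean-removed signal, keyed by cycles per scan."""
    n = len(signal)
    mean = sum(signal) / n
    return {
        k: sum((x - mean) * cmath.exp(-2j * math.pi * k * t / n)
               for t, x in enumerate(signal))
        for k in range(1, n // 2 + 1)
    }


def coherence(signal, stim_cycles, excluded_multiples=(2, 3)):
    """Amplitude at the stimulus frequency over the root of the summed power.

    excluded_multiples=(2, 3) drops the first and second harmonics, e.g.
    12 and 18 cycles per scan for a stimulus at 6 cycles per scan.
    Assumption: the stimulus frequency itself stays in the denominator.
    """
    coeffs = fourier_coefficients(signal)
    excluded = {stim_cycles * m for m in excluded_multiples}
    power = sum(abs(c) ** 2 for k, c in coeffs.items() if k not in excluded)
    if power == 0:
        return 0.0
    return abs(coeffs[stim_cycles]) / math.sqrt(power)


def preferred_cycle_fraction(signal, stim_cycles):
    """Fraction of the stimulus cycle (0 to 1) at which the response peaks."""
    coeff = fourier_coefficients(signal)[stim_cycles]
    return ((-cmath.phase(coeff)) % (2 * math.pi)) / (2 * math.pi)


# Toy time series: 96 samples, 6 stimulus cycles, peaking a quarter of the
# way through each cycle (hypothetical values, no noise).
n_samples, cycles = 96, 6
toy_series = [math.cos(2 * math.pi * cycles * t / n_samples - math.pi / 2)
              for t in range(n_samples)]
print(coherence(toy_series, cycles))
print(preferred_cycle_fraction(toy_series, cycles))
```

The cycle fraction would then be mapped onto the ordered stimulus set (for example, onto the sequence of tone frequencies in a tonotopy scan) to obtain the preferred stimulus value for that cortical location, and locations whose coherence falls below a chosen threshold would be excluded.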
Measurement and analysis of phase-encoded CFM data must be performed within individual subjects rather than across group averages to avoid problematically blurring together discrete CFMs and their associated computations (for extended discussions, see [5, 15, 17]). CFMs may differ radically in size and anatomical position among individual subjects independent of brain size; this variation is reflected in associated shifts in cytoarchitectural and topographic boundaries [119, 120, 121, 122, 123, 124]. In the visual system, for example, V1 can differ in size by at least a factor of three despite its location on the relatively stable calcarine sulcus. Accordingly, when such data are group-averaged across subjects, especially through such approaches as aligning data from individual brains to an average brain with atlases such as Talairach space or Montreal Neurological Institute (MNI) coordinates, the measurements will be blurred to such a degree that the measured topography of the CFMs is inaccurate or even lost. Blurring from such whole-brain anatomical co-alignment will thus cause different CFMs to be incorrectly averaged together into a single measurement, mixing data together from adjacent CFMs within each subject and preventing the analysis of the distinct computations of each CFM.
3.2.2 Criteria for auditory field map identification
In order to avoid the imprecise application of the term ‘map’ to topographical gradients or other similar patterns of cortical organization, the designation of an AFM, and of CFMs in general, should be established according to several key criteria (Figure 9) (for reviews, see [5, 8, 15]). First, by definition, each AFM must contain at least the two orthogonal, nonrepeating topographical representations of fundamental acoustic feature space described above: tonotopy and periodotopy (Figure 9A) [10, 17, 21, 108, 110, 111]. When this criterion is ignored and the measurement of only one topographical representation is acquired (e.g., tonotopy), it is impossible to correctly identify boundaries among cortical regions. Measurements of the organization and function of specific regions of early auditory cortex in humans have long relied mostly on tonotopic measurements alone, which has resulted in variable, conflicting, and ultimately unusable interpretations of the organization of human PAC and surrounding regions (for detailed reviews, see [5, 23]).
The representation of one dimension of sensory space—one topographical gradient along cortex like tonotopy—is not adequate to delineate an AFM, or CFMs in any sensory system. The measurement of a singular topographical dimension merely demonstrates that this particular aspect of sensory feature space is represented along that cortical region. The CFMs within that cortical region cannot be identified without measuring an orthogonal second dimension: a region of cortex with a large, confluent gradient for one dimension could denote a single CFM (Figure 9Ai,ii) or many CFMs (Figure 9Ai,iii), depending upon the organization of the overlapping second topography. Similarly, the two overlapping gradients must be approximately orthogonal, as they will otherwise not represent all the points in sensory space uniquely (Figure 9B) [15, 16, 127, 128]. As the complexity of adjacent gradients increases, the determination of the emergent CFM organization grows increasingly complicated.
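The requirement that the two gradients be approximately orthogonal can be illustrated with a toy calculation. The sketch below assumes a flat, regularly sampled patch of cortex represented as a small grid of preferred stimulus values, which is a strong simplification of a folded cortical surface; the function names and the 4 × 4 example maps are hypothetical.

```python
import math


def mean_gradient(grid):
    """Average finite-difference gradient (across columns, across rows) of a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    gx = sum(grid[r][c + 1] - grid[r][c]
             for r in range(rows) for c in range(cols - 1)) / (rows * (cols - 1))
    gy = sum(grid[r + 1][c] - grid[r][c]
             for r in range(rows - 1) for c in range(cols)) / ((rows - 1) * cols)
    return gx, gy


def angle_between_gradients(map_a, map_b):
    """Angle in degrees between the mean gradient directions of two maps."""
    ax, ay = mean_gradient(map_a)
    bx, by = mean_gradient(map_b)
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0:
        raise ValueError("at least one map has no overall gradient")
    cos_angle = max(-1.0, min(1.0, (ax * bx + ay * by) / norm))
    return math.degrees(math.acos(cos_angle))


def count_unique_pairs(map_a, map_b):
    """Number of distinct (value_a, value_b) combinations represented on the patch."""
    return len({(a, b)
                for row_a, row_b in zip(map_a, map_b)
                for a, b in zip(row_a, row_b)})


# Toy 4 x 4 patch: one map varies along columns, the other along rows.
tonotopy = [[c for c in range(4)] for _ in range(4)]
periodotopy = [[r for _ in range(4)] for r in range(4)]

print(angle_between_gradients(tonotopy, periodotopy))  # orthogonal case
print(count_unique_pairs(tonotopy, periodotopy))       # every position is distinct
print(count_unique_pairs(tonotopy, tonotopy))          # parallel case: pairs repeat
```

When the two gradients run at right angles, each position on the patch carries a different combination of the two stimulus dimensions. When they run in parallel, the same few combinations repeat along the iso-value lines and most of the two-dimensional feature space goes unrepresented.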
Due to the relatively recent measurements of periodotopic representations in human auditory cortex and monkey midbrain, AFMs in core and belt regions can now be identified [10, 102]. The identification of periodotopy as the second key dimension of auditory feature space is strengthened by psychoacoustic studies, which show that separable filter banks occur not only for frequency spectra, but also temporal information, indicating the presence of neurons with receptive fields tuned to ranges of frequencies and periods [14, 103, 104, 105]. Additionally, representations of temporal acoustic information (i.e., periodicity) have been measured in the auditory system of other model organisms, including PAC in domestic cat and inferior colliculus in chinchilla [129, 130].
A second AFM criterion is that each of its topographical representations must be organized as a generally contiguous and orderly gradient [16, 128]. For such a gradient to develop, the representation must be organized such that it covers a full range of sensory space, in order from one boundary to the other (e.g., from lower to upper frequencies for tonotopy; Figure 9C). A topographical gradient is thus one of the most highly structured features of the cortical surface that can be measured using fMRI. The odds of two orderly, orthogonal gradients arising as a spurious pattern from noise in an overlapping section of cortex are extraordinarily low (for a calculation of the probability of spurious gradients arising from noise, see ).
Third, each CFM should contain representations of a considerable amount of sensory space. Differences in cortical magnification are likely among CFMs with different computational needs, but a large portion of sensory space is still expected to be represented (e.g., [15, 16, 19, 21, 97, 127, 131]). A high-quality fMRI measurement of the topography is necessary to adequately capture the sensory range and magnification. The quality of the measurement is dependent upon choosing an appropriate set of phase-encoded stimuli. The sampling density and range of values in the stimulus set both affect the accuracy and precision of the measurement. For example, the intensity (i.e., loudness) of the tonotopic stimulus alone can alter the width of the receptive fields of neurons in PAC and consequently increase the lateral spread of the BOLD signal measured in neuroimaging . In addition, some degree of blurring in the measurements of the topography is expected due to such factors as the overlapping broad receptive fields, the inherent spatial spread of the fMRI signal, and measurement noise [64, 109, 133, 134]. The stimulus parameters and how they may affect the cortical responses should therefore be given careful consideration.
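As a rough illustration of how a phase-encoded stimulus yields a topographic estimate, the sketch below takes the Fourier component of a single voxel's time series at the stimulus cycle frequency. The phase of that component indicates where in the stimulus sweep the voxel responds most strongly, and a simple amplitude ratio serves as a coherence-like quality measure. The function name `phase_encoded_estimate`, the particular coherence definition, and the simulated time series are assumptions made for this example; the sketch also ignores the hemodynamic delay, which a real analysis would need to correct.

```python
import numpy as np

def phase_encoded_estimate(timeseries, n_cycles):
    """Return (phase in radians, coherence) at the stimulus cycle frequency."""
    ts = np.asarray(timeseries, dtype=float)
    ts = ts - ts.mean()                 # remove the mean so the DC term is zero
    spectrum = np.fft.rfft(ts)
    amplitude = np.abs(spectrum)
    # Assumed coherence measure: amplitude at the stimulus frequency relative
    # to the root sum of squares of all non-DC amplitudes (ranges 0 to 1).
    coherence = amplitude[n_cycles] / np.sqrt(np.sum(amplitude[1:] ** 2))
    # A response cos(w*t - phi) has FFT angle -phi, so negate to get the delay.
    phase = (-np.angle(spectrum[n_cycles])) % (2.0 * np.pi)
    return float(phase), float(coherence)

# Simulated voxel (assumed): 120 time points, 6 stimulus cycles,
# peak response one quarter of the way through each cycle.
n_points, n_cycles = 120, 6
t = np.arange(n_points)
voxel = np.cos(2.0 * np.pi * n_cycles * t / n_points - 2.0 * np.pi * 0.25)

phase, coherence = phase_encoded_estimate(voxel, n_cycles)
```

If the stimulus sweeps linearly through its range once per cycle, the fraction `phase / (2 * pi)` gives the position within that range; the sampling density and range of the stimulus set therefore bound what such an estimate can resolve.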
Fourth, the general features of the topographies composing the CFMs and the pattern of CFMs across cortex should both be consistent among individuals. It is essential to remember, nevertheless, that cytoarchitectural and topographic boundaries in PAC vary dramatically in size and anatomical location independent of overall brain size [119, 121, 122, 123, 124, 135], as do CFMs across visual cortex [16, 17, 120, 136]. Regardless of these variations, the overall organization among specific CFMs and cloverleaf clusters will be maintained across individuals.
3.2.3 Definition of auditory field map boundaries
The measurement of AFMs is one of the few reliable in vivo methods to localize the distinct borders of the auditory core and belt regions in individual subjects [5, 10, 12, 23]. The boundaries of an AFM, and of CFMs in general, are determined by carefully defining the edges of overlapping sections of tonotopic and periodotopic gradients within a specific cortical region in an individual hemisphere (Figure 9). If a set of overlapping representations of the two dimensions is present in isolation, the boundary of the AFM can be estimated to be where the gradient responses end, although there will likely be some spatial blurring or spreading of the representation along these edges (Figure 9Ai,ii) [16, 17, 110, 137]. Multiple, adjacent representations that each span the full range of one dimension (e.g., low-to-high frequencies of tonotopy) can be divided into sections at the points at which the gradients reverse (Figure 9Ai,iii). At a gradient reversal, the representations of stimulus values increase from low to high (or vice versa) across the cortical surface in one section up to the boundary, where the representations then reverse back from high to low (or vice versa) along the cortical surface in the next AFM (Figure 9C). Such phase-encoded fMRI measurements of the boundaries of the AFMs in human auditory cortex have been shown to be closely related to those determined by invasive human cytoarchitectural studies and nonhuman primate cytoarchitectural, connectivity, and tonotopic measurements [2, 5, 10, 121, 138, 139, 140, 141, 142, 143, 144].
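The idea of dividing adjacent representations at gradient reversals can be sketched in one dimension. The example assumes a hypothetical profile of preferred stimulus values sampled at evenly spaced points along a line across the cortical surface; the function name `find_reversals` and the toy profile are illustrative only. A reversal is marked wherever the direction of change flips from increasing to decreasing or vice versa.

```python
import numpy as np

def find_reversals(preferred):
    """Return indices where a 1-D profile of preferred values changes direction."""
    values = np.asarray(preferred, dtype=float)
    signs = np.sign(np.diff(values))
    reversals = []
    last_sign = 0.0
    for i, s in enumerate(signs):
        if s == 0:
            continue  # flat step: carries no direction information
        if last_sign != 0 and s != last_sign:
            reversals.append(i)  # values[i] is the local peak or trough
        last_sign = s
    return reversals

# Toy profile (assumed): low-to-high, then high-to-low, then low-to-high again,
# as would be expected across three adjacent representations.
profile = [1, 2, 3, 4, 3, 2, 1, 2, 3]
boundaries = find_reversals(profile)
```

Measured profiles are noisy, so a practical version would smooth the data or require a minimum run length before accepting a reversal, and boundaries would be traced on the 2-D surface rather than along a single line.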
At a scale of several centimeters, groups of adjacent CFMs are organized within both auditory and visual cortex into a macrostructural pattern called the cloverleaf cluster, named for the similarity of the organization of the individual CFMs composing a cluster to the leaves of a clover plant [8, 10, 15, 16, 17, 18, 19, 20]. Within a cluster, one dimension of sensory topography is represented in concentric, circular bands from center to periphery of the cluster, and the second, orthogonal dimension separates this confluent representation into multiple CFMs with radial bands spanning the cluster center to periphery. In AFM clusters, a confluent, concentric tonotopic representation is divided into specific AFMs by reversal in the orthogonal periodotopic gradients. Neighboring cloverleaf clusters are then divided along the tonotopic reversals at the cluster boundaries.
While CFM clusters have consistent positions relative to one another across the cortical surface, CFMs within each cluster may be oriented differently among individuals as if rotating about a cluster’s central representation. This inter-subject variability is consistent with the variability in molecular gradient expression that gives rise to the development of cortical topographical gradients [145, 146, 147, 148, 149]. This unpredictability of cluster anatomical location and rotation emphasizes the need for careful data analysis to be performed in individual subjects, in which common CFMs can be identified by analyzing the pattern of CFMs and cloverleaf clusters within that sensory system.
3.3 Organization of human auditory field maps
3.3.1 Auditory cortex organization in macaque monkey vs. human
Auditory processing in human cortex and in nonhuman primates occurs bilaterally along the temporal lobes near the lateral sulcus (Figure 1; e.g., [5, 10, 115, 121, 139, 140, 141, 142, 144, 150, 151, 152, 153]). In the macaque monkey model system upon which much of our understanding of human audition is based, converging evidence from cytoarchitectural, connectivity, electrophysiological, and neuroimaging studies has generally identified 13 auditory cortical areas grouped into core, medial and lateral belt, and parabelt regions that are associated with primary, secondary, and tertiary levels of processing, respectively (for extended discussions, see [2, 5, 154]). Auditory processing in macaque cortex begins along the superior temporal gyrus (STG) within three primary auditory areas: A1, R, and RT . In contrast to early visual processing in which primary visual cortex is composed of V1 alone, primary auditory cortex is considered to be a core region composed of these three AFMs; all three areas contain the expanded layer IV arising from dense thalamic inputs and the high expression of cytochrome oxidase, acetylcholinesterase, and parvalbumin distinctive to primary sensory cortices [2, 142, 143, 150, 152, 154, 155, 156, 157]. The eight belt regions are divided into four areas along both the lateral (CL, ML, AL, RTL) and medial (CM, RM, MM, RTM) sides of the core [158, 159, 160]. Along the lateral belt, two additional areas create the parabelt, which allocates auditory information to neighboring auditory cortex as well as to multimodal cortical regions [2, 161].
Based on cytoarchitectural, connectivity, and neuroimaging measurements, early auditory processing in human cortex has been shown to resemble the organization of lower-level macaque auditory processing [10, 23, 121, 144, 151, 152, 153, 162]. Over the ~25 million years of evolutionary separation between the species, the core, belt, and parabelt areas have rotated from the STG to Heschl’s gyrus (HG), an anatomical feature unique to humans [11, 163]. The specific structure of HG differs across individuals, variably existing as a single or double gyrus. PAC is then either mostly centered on the single HG or overlapping both gyri in the case of two (Figure 1B,C) [122, 135, 136]. Core, belt, and parabelt areas have thus shifted in orientation from a strictly rostral-caudal axis for A1 to R to RT along macaque STG to a medial-lateral axis along human HG for hA1, hR, and hRT. The naming of the AFMs in human is based on the likely homology to macaque, but adds an ‘h’ to signify human .
3.3.2 Eleven human AFMs compose three cloverleaf clusters overlapping Heschl’s gyrus
With our new understanding of periodotopic representations overlapping the previously identified tonotopic gradients, in vivo fMRI measurements can now identify the 11 AFMs that compose the core and belt regions of human auditory cortex (Figure 10) [5, 10, 12, 23]. Running from STG to the circular sulcus (CiS) along HG are three distinct, concentrically organized, tonotopic representations. The primary circular tonotopic gradient is one dimension of the HG cloverleaf cluster, with a confluent low-tone representation located centrally and expanding smoothly to high-tone representations at the outer edge (Figure 10B,C) . The HG cluster is divided along the orthogonal periodotopic reversals into two AFMs each of core, medial belt, and lateral belt: hA1, hR, hMM, hRM, hML, and hAL (Figure 10D,E). Positioned at the tip of HG, hA1 is the largest of these core and belt AFMs, with the posterior/lateral region representing low tones and the anterior/medial region representing high ones. The map hA1 is involved in the most basic of cortical auditory computations, which is reflected in its representations of broad ranges of tonotopy and periodotopy .
A reversal in the tonotopic gradient along the anteromedial edge of the HG cluster divides it from the CM/CL cluster just past the tip of HG (Figure 10B,C). A high-periodicity gradient reversal splits this tonotopic gradient into hCM and hCL, two regions associated with early language and speech processing as well as audiovisual integration (Figure 10D,E) . Finally, the reversal in the tonotopic gradient along the posteriolateral edge of the HG cluster separates it from the RT cluster positioned where HG meets STG (Figure 10B,C). Two reversals in the periodotopic representations here divide the RT cluster into hRT, hRTM, and hRTL (Figure 10D,E). In macaque, these AFMs along STG are thought to subserve lower-level processing of auditory stimuli like temporally modulated environmental sounds [158, 159]. More research is needed to determine what other AFMs form the CM/CL and RT clusters. Based on emerging data, it is likely that AFMs will also be a fundamental organization of auditory cortex adjacent to these cloverleaf clusters, such as planum temporale (PT), planum polare (PP), and STG.
3.4 Measuring attention and working memory in human AFMs
The characterization of AFMs and cloverleaf clusters will be crucial for the study of the structure and function of human auditory cortex, as these in vivo measurements allow for the systematic exploration of computations across a sensory system (for reviews, see [5, 17]). Such AFM organization provides a basic framework for the complex processing and analysis of input from the sensory receptors of the inner ear [5, 12, 17, 23]. The cloverleaf cluster organization of AFMs may also play a role in coordinating neural computations, with neurons within each cluster sharing computational resources such as common mechanisms to coordinate neural timing or short-term information storage [8, 12]. Similarly, vision studies suggest that functional specializations for perception are organized by cloverleaf clusters, as a particular cloverleaf cluster can be functionally differentiated from its neighbors by its pattern of BOLD responses, surface area, cortical magnification, processing specialization, and receptive field sizes [12, 16, 18, 19, 21, 165]. These distinctions indicate that CFMs within individual cloverleaf clusters are not only anatomically but also functionally related [15, 18, 20, 166].
The cluster organization is not necessarily thought to be driving common sensory functions, but rather reflects how multiple stages in a sensory processing pathway might arise during development across individuals and during evolution across species. It is likely that this cluster organization, like the topographic organization of CFMs, allows for efficient connectivity among neurons that represent neighboring aspects in sensory feature space [166, 167, 168, 169]. Since the axons contained within one cubic millimeter of cortex can extend 3-4 km in length, efficient connectivity is vital for sustainable energetics in cortex .
The definitions of AFMs and the cloverleaf clusters they compose using phase-encoded fMRI will thus serve as reliable, independent localizers for investigations of attention and working memory in early auditory cortex across individuals. Measurements of individual AFMs along the cortical hierarchy will help reveal the distinct stages of top-down and bottom-up auditory processing. In addition, changes in AFMs can be tracked to study how auditory cortex changes under various attentional and working memory tasks and disorders (e.g., [145, 171, 172, 173, 174, 175, 176, 177]).
The human brain has sophisticated systems for perception, trace memory, attention, and working memory for audition and vision, and likely the other senses as well. These systems appear to be organized in a very similar manner for each sense, despite the inputs to each system and information content being quite different. Behavioral measures of the last several decades have led to the development of well-defined models of each system. These models form the basis for the investigation of their underlying architecture in the cortical structures of the human brain. EEG and PET have allowed for spatially coarse investigation of cortical activity, but with the advent of fMRI, it has become possible to make exceptionally detailed spatial measurements. The methods of investigation must be carefully crafted to best elicit activity reflecting the desired aspects of each system; not only must the tasks be appropriate for fMRI, the stimuli and task must be closely matched not just to the system being studied, but to the inputs into that system as well.
For both audition and vision, the sensory processing in cortex happens in cloverleaf clusters of CFMs. This organizational pattern has clearly been demonstrated in the lower tiers of the processing hierarchy and very likely continues throughout. Because the CFMs across the entire hierarchy (or at least most of it) of one sense can be measured in just one session in the fMRI scanner, they make highly efficient localizers. CFMs are measured in individual subjects, and serve as functional localizers that can be used to average more accurately across subjects than anatomical localizers. As such, due to the pervasive and fundamental role CFMs play in sensory systems, they are also excellent candidates for measuring the effects of attention and working memory in cortex. To best accomplish this, it is proposed that stimuli similar to those used to measure CFMs be used in the traditional tasks that define attentional and working-memory models.
This material is based upon work supported by the National Science Foundation under Grant Number 1329255 and by startup funds from the Department of Cognitive Sciences at the University of California, Irvine.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.