Open access peer-reviewed chapter

The Contribution of fMRI in the Study of Visual Categorization and Expertise

By Natasha Sigala

Reviewed: January 23rd 2014. Published: May 31st 2014.

DOI: 10.5772/58276

1. Introduction

We recognize and categorize objects around us within a fraction of a second and in a number of different ways, depending on context, our experience with them, and the purpose of the categorization. For example the same animal can be a dog, a bow wow or a bulldog, a mammal or a Canis lupus familiaris. We are also able to recognize it in a variety of lighting conditions, orientations and positions, despite the large number of two dimensional images that every three dimensional object generates. It is therefore not surprising that our extraordinary ability to recognize objects has fascinated philosophers for a very long time. In Categories Aristotle made an attempt to categorize everything, mainly by analysing patterns of language and speech, answering questions like “τί ἐστί”, “what is it?”, “what like is it?” (Cross, 1959), and by describing the defining qualities that all instances of a particular category share, e.g. all soft things share the quality of softness.

Real progress in describing and understanding such qualities (features) was made by Fred Attneave in experiments of visual perception during his Ph.D. research at Stanford University (Attneave, 1957). Specifically, his results showed that subjective ratings of the perceived dissimilarities between stimuli (letter strings and shapes) and the frequency of errors while learning those stimuli could both be explained in terms of distances between stimuli represented as points in a “psychological space”. The idea of describing subjective psychological experiences with geometry influenced Roger Shepard, who applied and extended multidimensional scaling (MDS), the representation of objects (e.g. shapes, words, faces) as points in space, so that the distances between the points represent the perceived similarities between the objects (Richardson, 1938), (Torgerson, 1952), (Shepard, 1958), (Shepard, 1962a), (Shepard, 1962b), (Shepard, 1987).
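To make the geometric idea concrete, the following is a minimal sketch of classical (Torgerson-style) metric MDS in plain Python. It is an illustration only, not the procedure used in any of the studies cited here: the function name `classical_mds`, the power-iteration eigen-solver and the rectangle example are assumptions introduced for this sketch, and the dissimilarity matrix is assumed to be symmetric and approximately Euclidean. Non-metric variants of the kind Shepard developed work from the rank order of the dissimilarities instead.

```python
import math

def classical_mds(dist, n_dims=2, n_iter=500):
    """Embed items as points whose Euclidean distances approximate `dist`.

    `dist` is a symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns one coordinate list of length `n_dims` per item.
    """
    n = len(dist)
    # Double-centre the squared dissimilarities: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * n_dims for _ in range(n)]
    for d in range(n_dims):
        # Power iteration for the leading eigenvector (deterministic start).
        v = [math.sin(i + 1.0 + d) for i in range(n)]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
        # Rayleigh quotient gives the eigenvalue; negative values are clipped.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(sum(v[i] * bv[i] for i in range(n)), 0.0)
        for i in range(n):
            coords[i][d] = math.sqrt(lam) * v[i]
        # Deflate so that the next pass finds the next dimension.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Hypothetical example: four "stimuli" whose dissimilarities are the
# distances between the corners of a 3 x 4 rectangle.
corners = [(0, 0), (3, 0), (0, 4), (3, 4)]
dissimilarity = [[math.dist(p, q) for q in corners] for p in corners]
embedding = classical_mds(dissimilarity)
```

The recovered configuration is defined only up to rotation, reflection and translation; what MDS preserves is the set of inter-point distances, which is exactly the sense in which distance stands in for perceived dissimilarity.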

The exact ways in which similarity of a perceived object and its mental representation is measured have been the focus of research for over half a century. According to the theory of prototypes, we categorize new objects by comparing them to abstracted representations that are the central tendency of all the examples of the categories that we have experienced (Posner and Keele, 1968), e.g. a prototypical triangle or square. According to exemplar theories, the perceived objects are compared with stored representations of exemplars, grouped by category (Medin and Schaffer, 1978), e.g. memories of actual triangles and squares we have experienced. Although the early exemplar models postulated equal weights for all stimulus features, later contributions included an attention-optimisation hypothesis, allowing the perceived distance (similarity) between objects to vary with context and task demands (Generalised Context model, (Nosofsky, 1986)). For example this implies that for a layperson the representations of a trout and bass may be close together (similar), but for an ichthyologist or fisherman those representations are further apart (less similar) because they can identify a number of features that make the two types of fish different.
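The difference between the two families of theories can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a fitted model from the literature: the exponential similarity function, the sensitivity parameter `c`, the function names and the stimulus coordinates are all hypothetical, and every feature is weighted equally, as in the early exemplar models.

```python
import math

def similarity(x, y, c=1.0):
    """Similarity that decays exponentially with Euclidean distance (assumed form)."""
    return math.exp(-c * math.dist(x, y))

def prototype_classify(item, categories, c=1.0):
    """Choose the category whose prototype (mean of its exemplars) is most similar."""
    def score(label):
        exemplars = categories[label]
        prototype = [sum(dim) / len(exemplars) for dim in zip(*exemplars)]
        return similarity(item, prototype, c)
    return max(categories, key=score)

def exemplar_classify(item, categories, c=1.0):
    """Choose the category with the largest summed similarity to its stored exemplars."""
    def score(label):
        return sum(similarity(item, ex, c) for ex in categories[label])
    return max(categories, key=score)

# Hypothetical two-feature stimuli: category A is spread out, category B is compact.
categories = {
    "A": [(0.0, 0.0), (4.0, 0.0)],
    "B": [(1.0, 1.0), (1.0, 1.2)],
}
probe = (0.2, 0.0)  # lies almost on top of one stored A exemplar

by_prototype = prototype_classify(probe, categories)
by_exemplar = exemplar_classify(probe, categories)
```

With these made-up values the two rules disagree: the probe is closer to the central tendency of B, but far more similar to one remembered member of A. Cases of this kind are what allow the two accounts to be distinguished experimentally.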

Eleanor Rosch analyzed language patterns and visual perception across cultures and did the experiments that I suspect Aristotle would have liked to perform himself (Heider, 1972, Rosch et al., 1976a). She showed that the categorizations we make rely on features of high validity (e.g. feathers and wings for birds) which form prototypes that aid cognitive economy and reflect perceived world structure (Boden, 2006), pp. 520-521. In other words, “robin” is a more typical example of a bird than “ostrich”, and this helps us recognise and make decisions about objects faster, and also reflects the likelihood of encountering such objects in our environment. Rosch named the hierarchical level where category prototypes are found the basic level of categorization (e.g. bird, fish, tree), which is the one that provides the most useful information, the first to be named by children, and the most necessary in language (Rosch, 1976, Rosch et al., 1976b). The level below (subordinate) contains more specific information (e.g. robin, sparrow), while the level above (superordinate) (e.g. animal, plant) contains less specific information. These results explained how we structure information about the world around us, to a large extent in a universal, rather than arbitrary or culture-dependent way. The opposite conclusion would render the following “ancient Chinese” animal taxonomy, presented by Jorge Luis Borges (1966) (in (Boden, 2006), p. 519) entirely possible:

“[Animals] are divided into (a) those that belong to the Emperor, (b) embalmed ones, (c) those that are trained, (d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs, (h) those that are included in this classification, (i) those that tremble as if they were mad, (j) innumerable ones, (k) those drawn with a very fine camel’s hair brush, (l) others, (m) those that have just broken a flower vase, (n) those that resemble flies from a distance”.

Crucially for the question of expertise, Rosch’s basic level of categorization can be, and is, modified by experience. Based on their observation that an airplane mechanic answered questions about airplanes differently to other (non-expert) participants, Rosch and colleagues suggested that one way to study what part a person’s previous knowledge plays in categorization would be a systematic variation of the participants’ level of expertise and the object categories (Rosch et al., 1976a). They also speculated that an ichthyologist would use as a basic level what a novice would consider a more specific, subordinate one (e.g. trout or salmon, instead of fish). Tanaka and Taylor performed such an experiment, and compared the categorization performance of dog and bird experts in listing distinctive features, as well as speed and level of categorizing animals in and out of their area of expertise. They found that the most informative category for the experts was the more specific, subordinate level, rather than the basic one (Tanaka and Taylor, 1991), e.g. crow or robin, rather than bird for the bird experts, and beagle or collie, rather than dog for the dog experts.

Despite long-running debates about the usefulness of the concept of similarity in explaining categorization (e.g. Goodman, 1972), and the merits of prototype, exemplar or other theories, most researchers agree that categorization is based on an expanded notion of similarity, an overall similarity, which encompasses physical, functional and overall features, reflecting a person’s theoretical understanding of the world, e.g. (Murphy and Medin, 1985) in (Ahn and Dennis, 2001). The concept of similarity has furthermore inspired computational theories of vision, where “the representation of similarity is taken to be the goal of the visual system” (Edelman, 1998), modern extensions of which are particularly appropriate for relating neural data, behaviour and models across species and neuroscientific methods (Kriegeskorte, 2009).

In this chapter I will first review some background work that sets the context for the subsequent discussion of a subset of fMRI studies that have shed light on the questions of perceptual and cognitive categorization and expertise over the last 15 years.

2. Computational and experimental work

Towards the end of the 20th and the beginning of the 21st century, the idea that neuronal populations code representations which reflect both physical stimulus similarity and perceived stimulus similarity, as shaped by task demands, received significant experimental support. In order to bridge the species gap between human and non-human primates, and study meaningfully the neuronal representations in the macaque brain, it was first necessary to test if the categorizations that monkeys make can also be explained with MDS, and prototype and/or exemplar theories. This involved the creation of novel, parametrically designed stimuli with a known number of varying dimensions (features). The premise here was inspired by computational theories of vision that called for representations that were two-dimensional viewpoint-dependent snapshots of three-dimensional objects (Poggio and Edelman, 1990) that preserved the geometry of similarity amongst the objects (Edelman, 1998) (as opposed to stored three-dimensional representations of the objects themselves (Marr and Nishihara, 1978) (Biederman, 1987)). The first study to show that MDS was a useful tool for understanding object categorization in the monkey was described in (Sands et al., 1982). Crucially, they did not provide categorization training to the animals, and they explored how macaques perceived and represented pictures of various natural categories (faces, fruit, colours). They reached the conclusion that macaques treat pictorial stimuli categorically, cf. (Sigala, 2009). Several years later, (Sugihara et al., 1998) employed novel stimuli (computer-generated 3D animals) to test the usefulness of MDS in the study of object recognition, following a previous demonstration with the human visual system (Cutzu and Edelman, 1996). After systematic training of the monkeys to report their perception of the stimuli, Sugihara et al. showed that the psychophysical representation of the novel stimuli, as revealed by MDS, captured the similarities built in the stimulus space. This means that the monkey visual system could successfully recover a two-dimensional configuration of the stimuli that were originally built in a high-dimensional space (set of 56 variables), and reliably capture their relative similarities. This was the first piece of evidence for the non-human primate visual system that representations of the stimuli might be representations of the similarities of the stimuli.

Following this psychophysical experiment, Sigala and colleagues also created two parametrically designed stimulus sets (schematic faces and fish) with four varying dimensions (Sigala et al., 2002). The participants (both humans and non-human primates) first had to report how similar the stimuli were to them, then were trained to categorize them at the subordinate level based on the combination of two features, and then reported how similar the stimuli were to them a second time. By collecting similarity ratings before and after the categorization training, it was possible to show that the perceptual similarity of the stimuli changed, particularly for the macaques, after they had learnt to categorize the stimuli based on two features (ignoring the other two equally varying features) (Fig. 1). The initial MDS solution of the similarity ratings showed that the stimulus features were not consistently used when comparing the stimuli (Fig. 1a).

Figure 1.

Psychophysical representation of schematic faces in a non-human (a, b) and a human (c, d) participant, as revealed by Multi Dimensional Scaling (MDS) on similarity ratings of the stimuli before and after categorization training. (Triangles: psychophysical representation based on similarity ratings; circles: physical stimulus values). 20 schematic faces are represented by: red and yellow circles, for category 1 and 2 exemplars used in training respectively; blue circles for the test exemplars used in the transfer phase of categorization; purple circles: prototypes for categories 1 and 2. The two categories were linearly separable along the Eye Height, Eye Separation dimensions, but not along the Nose Length, Mouth Height. Lines connect matching physical stimulus representations. When several patterns share the same combination of physical dimension values, multiple triangles are connected to the same circle. Longer lines correspond to less faithful psychophysical representation of corresponding physical stimulus values. Reproduced from (Sigala et al., 2002) with permission.

Figure 2.

a) Mean distance difference between psychophysical and physical stimuli along the dimensions that were diagnostic for the categorization task (Δ12) and the dimensions that also varied, but were not diagnostic for the categorization task (Δ34), before and after training, averaged over 20 schematic faces (error bars are standard error of mean). b) Average distance difference along the non-diagnostic and diagnostic dimensions before and after categorization for schematic fish stimuli. Data before and after categorization taken from two different monkeys and a single human subject. Significance levels (t-test): (∗∗) corresponds to P<0.005 and (∗) to P<0.01. Reproduced from (Sigala et al., 2002) with permission.

However, training to categorize the stimuli based on a subset of their features changed the way the participants perceived and represented them, even in the context of a different task (similarity ratings vs. categorization) (Fig. 1b, Fig. 2, Fig. 3). Looking for the neuronal underpinnings that supported this perceptual change, (Sigala and Logothetis, 2002) found that cells in the anterior inferior temporal cortex selectively represented the values of the features that were important (diagnostic) for the categorization task, over the values of the features that were not (Fig. 4). This was clear evidence that perceptual expertise correlated with selective tuning of cells in the temporal cortex, which presumably developed over the course of training, since the stimuli were unlike anything the animals encountered in their normal environment. It was also evidence in favour of the Generalised Context Model (Nosofsky, 1986), according to which selective attention processes cause the perceptual multidimensional stimulus space to shrink or expand, reflecting the importance of the most relevant stimulus dimensions (Fig. 5) (Gauthier and Palmeri, 2002). It is clear that if one were to input the firing rates of the recorded cells into an MDS analysis, the result would resemble the solution recovered for the behavioural data after training (Fig. 1b), where the diagnostic feature values (that separate the categories) are perceived as dissimilar and end up further away from each other, while the non-diagnostic feature values (that don’t separate the categories) are perceived as similar and end up close together in space. A similar finding was reported in a single-unit study of inferior temporal neurons, where the stimulus space of parametrically designed shapes was recovered both by psychophysical and neurophysiological measurements (Op de Beeck et al., 2001).

Figure 3.

Stimuli and category structure used by (Sigala et al., 2002) and (Sigala and Logothetis, 2002). a. The first stimulus set consisted of line drawings of faces with four varying features: eye height, eye separation, nose length and mouth height. b, The second stimulus set consisted of fish outlines with four varying features: the shape of the dorsal fin, tail, ventral fins and mouth. In both stimulus sets, each feature could take one of three discrete values. The categories were separable along two of the four stimulus features, but information about only one of the diagnostic features was insufficient for optimal performance. The monkeys were presented with one stimulus at a time. c, Two-dimensional representation of the stimulus space. Black stars represent the stimuli of the first category and red ovals represent the stimuli of the second category. Each number indicates the position of one corresponding stimulus from a. As the stimuli differ along four dimensions, the two-dimensional representations in this figure result in overlap of stimuli that are distinct. The purple circles (P1 and P2) represent the prototypes. Cyan circles represent test exemplars that did not belong to a fixed category. The two categories were linearly separable along the eye height, eye separation (EH, ES) dimensions (solid line) but not along the nose length, mouth height (NL, MH) dimensions. Reproduced from (Sigala and Logothetis, 2002) with permission.

Figure 4.

Population average of neuronal tuning to the features that were diagnostic for the categorization task in (Sigala and Logothetis, 2002) (reproduced with permission). The average activity is sorted by stimulus feature (eye height (a), eye separation (b), nose length (c) and mouth height (d)). Black traces indicate average responses to the best feature value; grey traces indicate average responses to the worst feature value. The bars indicate standard error of the mean. For each feature (a–d) a t-test was performed for the time window 100–475 ms after stimulus presentation, testing the hypothesis that the mean firing rate to the best feature was higher than the mean firing rate to the worst feature. The hypothesis could not be rejected in the case of the diagnostic features (eye height (a) and eye separation (b)).

Figure 5.

Stretching and compression of the stimulus space by selective attention according to the task demands. As the subjects learn to categorize fish into two groups the multidimensional space becomes relatively stretched along a diagnostic dimension (here shape of the tail) relative to the non-diagnostic dimensions. Before categorization (A), object 1 is equally similar to objects 2 and 3. During categorization (B), object 1 becomes more similar to object 2 than to object 3 through selective attention to the shape of the tail. Reproduced from (Gauthier and Palmeri, 2002) with permission.
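The stretching and compression described in this caption can be illustrated with a small worked example in the spirit of the Generalised Context Model, in which each stimulus dimension carries an attention weight. The sketch below is an assumption-laden toy, not a reproduction of the figure: the binary feature coding, the weight values (0.5/0.5 before and 0.9/0.1 during categorization), the sensitivity parameter `c` and the function names are all hypothetical.

```python
import math

def weighted_distance(x, y, weights):
    """City-block distance with one attention weight per stimulus dimension."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))

def similarity(x, y, weights, c=1.0):
    """Similarity decays exponentially with attention-weighted distance."""
    return math.exp(-c * weighted_distance(x, y, weights))

# Hypothetical coding of three fish as (tail shape, dorsal fin shape).
obj1, obj2, obj3 = (0, 0), (0, 1), (1, 0)  # 1 and 2 share a tail; 1 and 3 share a fin

before = (0.5, 0.5)  # attention spread equally over both dimensions
during = (0.9, 0.1)  # attention shifted to the diagnostic dimension (tail)

d12_before = weighted_distance(obj1, obj2, before)
d13_before = weighted_distance(obj1, obj3, before)
s12_during = similarity(obj1, obj2, during)
s13_during = similarity(obj1, obj3, during)
```

With equal weights object 1 is equidistant from objects 2 and 3. Once most of the attention weight moves to the tail, the difference in fin shape between objects 1 and 2 contributes little, whereas the difference in tail shape between objects 1 and 3 is magnified, so object 1 ends up more similar to object 2.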

3. The brave new era of fMRI

Until the nineties the study of the neural substrates of object recognition in the human brain relied mainly on patients with brain damage, e.g. (Farah, 1992). However, behavioural and event-related potential studies of normal subjects, as well as behavioural, lesion and single-unit recording studies in macaque monkeys, had provided a wealth of evidence for the functional organisation of the primate visual system, and of object recognition in particular. With the advent of functional imaging (initially PET, but mainly fMRI) it became possible to see the normal human brain in action for the first time. Two of the very first questions people wanted to ask concerned the localisation and functional organisation of object recognition: where does recognition happen in the brain, and are there specialized cortical modules for different object categories? These were really important questions to ask because a) single-unit recordings (Desimone et al., 1984) and cortical field recordings in human patients (Allison et al., 1994) did not have the global coverage necessary to show the extent of clusters of inferior temporal neurons tuned to faces or other trained object classes, e.g. paperclips (Logothetis and Pauls, 1995), and to assess hemispheric laterality; b) structural models of vision and object recognition (Marr and Nishihara, 1978), (Biederman, 1987) did not suggest different representations for different types of objects, or multiple levels of recognition; but c) neuropsychological studies had provided evidence for at least three different brain modules, specific for faces, common objects and words, see (Farah, 1992) for a review.

In this chapter, I present some experimental and theoretical progress that has followed from the literature of perceptual similarity, and will review contributions of the fMRI literature.

(Sergent et al., 1992) provided the first neuroimaging evidence for category-selective responses for face stimuli, with PET imaging of normal subjects. They showed a dissociation between face processing in the right ventromedial hemisphere, and object processing in the left occipito-temporal cortex. The first study to employ fMRI to address the localisation question was by (Puce et al., 1995). They compared faces vs. scrambled faces and reported a number of bilateral activations, with the strongest in the right fusiform gyrus. But it was the study by (Kanwisher et al., 1997) (see (Kanwisher et al., 1996) for preliminary results) that thoroughly tested the response specificity of a proposed face module for the first time, and coined the acronym, the Fusiform Face Area (FFA) that has been with us for almost 15 years now (Fig. 6).

Nancy Kanwisher and her colleagues first localised the area that was significantly more active for faces than objects, then tested if the activity of the area could be explained by a number of variables: low level features; the fact that all faces belonged in the same level category, while the objects were from different categories, by comparing with houses; the fact that faces were compared with inanimate objects, by comparing with hands; attentional factors, by comparing passive fixation with a one-back task. That study was important for several reasons: a) it showed a clear selectivity for faces over other object categories; b) it pointed to the need for different theories of computation for object and face recognition; c) it brought into the neuroimaging field the discussion about whether faces were special, or whether this specificity could be expected to develop for every class of objects that one was expert in (Diamond and Carey, 1986).

Figure 6.

The Fusiform Face Area first reported in (Kanwisher et al., 1997) (reproduced with permission). Results from one subject. The right hemisphere appears on the left. The brain images at the left show in colour the voxels that produced a significantly higher MR signal intensity during the epochs containing faces than during those containing objects (1a) and vice versa (1b) for 1 of the 12 slices scanned. These significance images are overlaid on a T1-weighted anatomical image of the same slice. In each image, an ROI (Region Of Interest) is shown outlined in green, and the time course of raw percentage signal change over the 5 min 20 sec scan (based on unsmoothed data and averaged across the voxels in this ROI) is shown at the right. Epochs in which faces were presented are indicated by the vertical gray bars marked with an “F”; gray bars with an “O” indicate epochs during which assorted objects were presented; white bars indicate fixation epochs.

4. Car, bird and greeble experts

Gauthier and her colleagues had also been working on the question of functional specialization using fMRI (Gauthier et al., 1996a) (Gauthier et al., 1996b), and had preliminary evidence that FFA may also be recruited in subordinate categorizations (e.g. for the basic level “bird”, the subordinate level, which requires additional perceptual processing, could be “skylark” or “blackbird”) (Gauthier et al., 1997). Their fMRI work with greebles, novel objects created to provide a control set equivalent for faces (Fig. 7), indicated that the FFA may be an area involved in visual expertise in general and not just for faces (Gauthier et al., 1999). This proposal has become known as the expertise hypothesis.

Figure 7.

Meet the Greebles. Two greebles from different 'families', as defined by the shape of the large central part, as well as two individual greebles from the same family, differing only in the shape of the smaller parts. Adapted from (Gauthier et al., 1999) with permission.

The greebles, however, have been criticised as being too similar to faces (in terms of symmetry, features, configural processing), which would make them less than ideal controls for the face specificity hypothesis of FFA (see below for neuro-imaging evidence). In a different experiment, Gauthier and her colleagues recruited car and bird experts in order to test whether FFA was recruited during categorization of objects in their domain of expertise (Gauthier et al., 2000). The results showed that while the FFA response was still greater for faces than for other objects, it was nonetheless involved in processing the objects in the participants’ area of expertise as well (e.g. birds for bird experts). This finding has been subsequently replicated by other laboratories (Xu, 2005), as well as in different domains of expertise (e.g. with moths and butterflies) (Rhodes et al., 2004). The BOLD signal intensity of the FFA in these studies is greater for faces than other stimuli, so there is specificity, but not exclusivity, for face stimuli, a domain of expertise that most humans share.

Another way to test the expertise hypothesis, instead of recruiting real world experts, is to induce expertise in the lab through intensive training until a certain discrimination criterion has been reached, and/or a certain number of training hours (e.g. 10) have been completed. These paradigms make it possible not only to test the responses of the FFA or any other brain area, but also to study other changes that result from training. For example, (Op de Beeck et al., 2006) tested changes in cortical activations before and after 10 hours of training to discriminate novel 3D shapes at a subordinate level in a match-to-sample task. The authors used three novel shape categories, which they named “smoothies”, “spikies” and “cubies” based on the type of protrusions they had (Fig. 8).

Figure 8.

Stimuli and task used in (Op de Beeck et al., 2006), reproduced with permission. For each of the three classes (smoothies, spikies, and cubies), exemplars were constructed from a four-dimensional object space. Each exemplar had a value from 0 to 5 on each of four shape dimensions. The top three rows show exemplars from each class: value 0 on each dimension (far left), value 5 on one dimension and value 0 on the other dimensions (middle four exemplars), and value 5 on each dimension (far right). The bottom half of the figure shows the task used to train subjects in shape discrimination.

Figure 9.

fMRI evidence that categorization training changes object representations in human extrastriate cortex, reproduced from (Op de Beeck et al., 2006) with permission. Activations (significance maps thresholded at p < 0.0001, uncorrected) are shown for the contrast [trained > untrained], with red/yellow indicating positive contrast and blue indicating negative contrast. a, Functional activation overlaid on a coronal anatomical slice for three subjects. The left, middle, and right subjects were trained with the smoothies, spikies, and cubies, respectively. These subjects were representative in the size of training effects seen across the population. Slices are shown with right hemisphere at the left. b, Functional activation overlaid on a ventrolateral view of the inflated brain of a fourth subject (trained with the smoothies).

The approach of using artificial stimuli avoids confounds of prior, uncontrolled experience of the participants with the stimuli, and should lead to new cortical representations that are formed during the course of training. The participants were scanned before and after the training sessions, and while performing an orthogonal task to shape discrimination (colour change detection task), so any observed changes in the cortical activations could not be attributed to changes in task performance or differences in other cognitive factors, e.g. attention, task difficulty, etc. This study showed that the behavioural improvement in shape discrimination (one trained class for each participant) was accompanied by the following changes in cortical activations: 1) more voxels, therefore more groups of neurons, responded to the trained rather than the untrained stimuli in the post-training scan; 2) these voxels did not form a large cluster, but multiple small clusters in the extrastriate visual cortex (Fig. 9); 3) the Lateral Occipital Cortex (LOC) showed more activity to the trained stimuli compared to the FFA and early retinotopic visual areas, but also showed more activity to the stimuli before the training had taken place, especially in the right hemisphere; 4) the right FFA did not show increased responsiveness to the trained stimuli, except in one participant trained with “smoothies”, who during debriefing reported interpreting the stimuli as “women wearing hats”, in other words face-like (Fig. 9).

This last observation shed light on what could be driving the FFA responses in the studies with greebles, which are also face-like stimuli. This was further tested in a recent study by (Brants et al., 2011), which replicated the design of the original greeble fMRI study (Gauthier et al., 1999) and extended it by scanning the participants both before and after training with the greebles. The main finding of this experiment is that even before training the participants showed an inversion effect[1] with the greebles (which did not increase after training), indicating that these objects are interpreted as living entities with faces, and the cortical activations elicited by them may not be due to expertise.

Two more studies that contrasted the effect of mere exposure vs. categorization training (van der Linden et al., 2008) and a categorization vs. individuation task (Wong et al., 2009) with computer-generated, parametrically designed stimuli also showed increased cortical activations after training in areas near and somewhat overlapping with the right FFA.

5. The extent of the brain areas involved in categorization revealed with fMRI

Lack of increased right FFA response to trained stimuli (morphed cars) within a subordinate category was also reported by (Jiang et al., 2007) in a study of categorization training that revealed a large number of areas, including prefrontal, insular, parietal and inferior temporal cortices, as well as the thalamus, that responded to the trained stimuli vs. a fixation baseline (Table 1). A similar network of cortical and subcortical areas, including the striatum, was also revealed by a study of categorization of dynamic moving patterns (Li et al., 2007), contrasted with a rotation detection task of the same stimuli. This study indicated a different role for temporal and parietal areas, and for prefrontal and striatal areas. For the former areas, the selectivity to diagnostic visual features was tuned according to task demands. Prefrontal and striatal areas are thought to provide top-down control by shaping the neural representations in the cortical areas involved in the relevant stimulus feature processing (e.g. inferior temporal cortex for shape). This study replicated and extended a number of key findings from separate single unit studies in the macaque, partly because of the global brain coverage that fMRI affords, but also thanks to careful parametric design of the stimulus space, and the sensitivity of Multi Voxel Pattern Analysis (MVPA) methods that capitalize on differences across distributed patterns of activity in the brain, e.g. (Cox and Savoy, 2003), (Norman et al., 2006) (Fig. 10).

Table 1.

Table of brain areas that are activated by the trained stimuli (cars) in (Jiang et al., 2007) (Supplementary Table 1, reproduced with permission).

Figure 10.

Basic classification scheme in MVPA (adapted from (Cox and Savoy, 2003) with permission). Each participant viewed blocks of images belonging to 1 of 10 categories. The pattern of activity over a previously selected subset of voxels, based on a feature selection procedure, is treated as a high-dimensional vector, shown here as a profile plot (the mapping between the voxels in the brain and points in the profile plot is symbolized by red lines). These vectors, along with labels corresponding to the category, are given to a classifier, which learns statistical regularities in the patterns, and maps between brain patterns and experimental conditions. In a subsequent session (separated in time by as much as a month), the same subject is shown the same category of objects. fMRI data are collected with the same spatial sampling, and the pattern over the same voxel subset is extracted. This pattern is given to the trained classifier that infers the category the subject was viewing, based on the decision boundary it extracted. The same steps can be followed with data collected in a single session, where half the data is used to train the classifier, and half is used in the testing phase.
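The train-then-test logic of this scheme can be sketched with simulated data. The example below is a minimal illustration and makes several assumptions that are not taken from the cited study: the "voxel patterns" are synthetic (a random template per category plus Gaussian noise), the classifier is a simple correlation-based nearest-centroid rule chosen for brevity, and all function names and parameter values are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length activity patterns."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def train(patterns, labels):
    """'Training' here is just averaging the patterns of each category."""
    groups = {}
    for pattern, label in zip(patterns, labels):
        groups.setdefault(label, []).append(pattern)
    return {label: [sum(v) / len(ps) for v in zip(*ps)]
            for label, ps in groups.items()}

def predict(centroids, pattern):
    """Assign the category whose mean pattern correlates best with the input."""
    return max(centroids, key=lambda label: pearson(centroids[label], pattern))

def simulate(n_categories=3, n_voxels=20, n_trials=20, noise=0.5, seed=0):
    """Synthetic data: one random voxel template per category plus trial noise."""
    rng = random.Random(seed)
    patterns, labels = [], []
    for category in range(n_categories):
        template = [rng.gauss(0, 1) for _ in range(n_voxels)]
        for _ in range(n_trials):
            patterns.append([t + rng.gauss(0, noise) for t in template])
            labels.append(category)
    return patterns, labels

patterns, labels = simulate()
# Split-half design: even-numbered trials train the classifier, odd ones test it.
train_idx = range(0, len(patterns), 2)
test_idx = range(1, len(patterns), 2)
centroids = train([patterns[i] for i in train_idx], [labels[i] for i in train_idx])
hits = sum(predict(centroids, patterns[i]) == labels[i] for i in test_idx)
accuracy = hits / len(test_idx)
```

Accuracy above the chance level (here one in three) on held-out trials is the evidence that the distributed pattern carries category information. With real data, the feature selection step must use the training half only, otherwise the test accuracy is inflated.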

MVPA, an approach that uses a variety of powerful pattern classification algorithms to decode the information in multi-voxel patterns of activity, is particularly well suited for the study of distributed cortical representations. Although single unit studies had already pointed to the distributed nature of information coding, e.g. (Young and Yamane, 1992), it is the excellent spatial coverage and non-invasive nature of fMRI that can provide information about where categories are represented in the human brain. The advantages of MVPA include: a) sensitivity to signals that can be ignored in univariate approaches, where each voxel is treated independently and its activity modulation must reach a statistical significance threshold; b) brain activity and behaviour can be related on a trial-by-trial basis (as in single unit studies) rather than over hundreds of averaged trials; and c) most importantly, it allows the structure of neural representations to be characterized in individual subjects, with or without the use of pattern classifiers (Norman et al., 2006). The earliest attempt to examine and relate the entire distributed pattern of voxel activations in object-related cortical areas to object categories was made by Edelman and colleagues (Edelman et al., 1998), and showed promising results in recovering stimulus spaces with MDS. Pictures of natural stimuli, human and animal faces (Figure 11), and computer-generated stimuli resembling fighter planes, human bodies, sharks, four-legged animals, cars and vans (Figure 12), separated into meaningful categories based on the fMRI activity patterns. This study pioneered the multivariate analysis approach with fMRI data, and further suggested that both perceptual and cortical representations may be representations of similarity, which can vary according to the importance of the stimulus features and level of experience with the stimuli, e.g. (Schyns and Rodet, 1997), (Op de Beeck H, 2000), (Sigala, 2004).

Figure 11.

The left panel illustrates the stimulus space configuration, as it was derived with MDS from seven participants’ most significant voxels that responded preferentially to whole objects, but not to their scrambled versions. Those voxels were bilaterally in the Lateral Occipital Complex (area LO). The number labels indicate the serial number of the picture in the epoch. The main result is the separation of faces and animals into two linearly separable clusters. The panel on the right shows the result of a bootstrap procedure, where MDS was applied to randomly permuted time courses. This result shows no clustering that corresponds to specific stimulus categories. (Figure from Edelman et al. 1998, reproduced with the permission of the Psychonomic Society.)

Figure 12.

Comparison of the stimulus space derived with MDS applied on perceptual judgments of similarities (left panel) and fMRI activation patterns (right panel). (Figure from Edelman et al. 1998, reproduced with the permission of the Psychonomic Society.)

The finding of distributed, rather than category-specific, clusters of activation in the cortex was later confirmed by a now seminal study by (Haxby et al., 2001), who showed that despite a small number of areas specialized for specific categories (see section: Other selective modules: Bodies, places, words), the representations of objects in the ventral temporal cortex are widely distributed and overlapping. This means that the representation of a particular object, or object category, is encoded in the activity pattern created by a large number of neurons, and each of these neurons takes part in instantiating multiple object representations.

Figure 13.

Similarity of representations in human and monkey Inferior Temporal Cortex (IT). Reproduced from (Kriegeskorte et al., 2008b) with permission. (A) Stimulus space reflecting the similarity of responses in IT. The stimuli have been arranged such that their pairwise distances approximately reflect response-pattern similarity (MDS, dissimilarity: 1 – Pearson r, criterion: metric stress). In each arrangement, images placed close together elicited similar response patterns. Images placed far apart elicited dissimilar response patterns. The arrangement is unsupervised: it does not presuppose any categorical structure. (B) Fiber-flow visualization emphasizing the interspecies differences. This visualization combines all the information from (A) and links each pair of dots representing a stimulus in monkey and human IT by a “fiber.” The thickness of each fiber reflects to what extent the corresponding stimulus is inconsistently represented in monkey and human IT. The interspecies consistency r_i of stimulus i is defined as the Pearson correlation between vectors of its 91 dissimilarities to the other stimuli in monkey and human IT. The thickness of the fiber for stimulus i is proportional to (1 – r_i)^2, emphasizing the most inconsistently represented stimuli.
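The two quantities defined in this caption, the response-pattern dissimilarity (1 – Pearson r) and the per-stimulus interspecies consistency, can be computed as in the following sketch. It is an illustration under stated assumptions, not the authors' pipeline: the "monkey" and "human" patterns are simulated from a shared set of hypothetical latent features, the stimulus and channel counts are toy values, and the function names are our own.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rdm(patterns):
    """Dissimilarity matrix: entry [i][j] is 1 - Pearson r between patterns."""
    n = len(patterns)
    return [[0.0 if i == j else 1.0 - pearson(patterns[i], patterns[j])
             for j in range(n)] for i in range(n)]

def consistency(rdm_a, rdm_b, i):
    """Correlate stimulus i's dissimilarities to all other stimuli across RDMs."""
    others = [j for j in range(len(rdm_a)) if j != i]
    return pearson([rdm_a[i][j] for j in others],
                   [rdm_b[i][j] for j in others])

def simulate(latent, n_channels, noise, rng):
    """Map latent stimulus features onto noisy measurement channels."""
    weights = [[rng.gauss(0.0, 1.0) for _ in latent[0]]
               for _ in range(n_channels)]
    return [[sum(f * w for f, w in zip(row, ch)) + rng.gauss(0.0, noise)
             for ch in weights] for row in latent]

rng = random.Random(1)
n_stimuli = 12  # toy value; 91 "other" stimuli in the caption implies 92
# Hypothetical latent features shared by both simulated species.
latent = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(n_stimuli)]
monkey = simulate(latent, n_channels=40, noise=0.5, rng=rng)  # e.g. neurons
human = simulate(latent, n_channels=60, noise=0.5, rng=rng)   # e.g. voxels

rdm_monkey, rdm_human = rdm(monkey), rdm(human)
for i in range(n_stimuli):
    r_i = consistency(rdm_monkey, rdm_human, i)
    thickness = (1.0 - r_i) ** 2  # fiber thickness is proportional to this
    print(f"stimulus {i:2d}: r_i = {r_i:+.2f}, thickness ~ {thickness:.2f}")
```

Note that the two simulated data sets have different numbers of measurement channels. Because the comparison is made between dissimilarity matrices rather than between raw patterns, no correspondence between individual neurons and voxels is needed.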

Figure 14.

Hierarchical clustering of Inferior Temporal Cortex (IT) response patterns. Reproduced from (Kriegeskorte et al., 2008b) with permission. This analysis proceeds from single-image clusters (bottom of each panel) and successively combines the two clusters closest to each other in terms of the average response-pattern dissimilarity, so as to form a hierarchy of clusters (tree structure in each panel). The vertical height of each horizontal link indicates the average response-pattern dissimilarity (the clustering criterion) between the stimuli of the two linked subclusters (dissimilarity: 1 – r). The cluster trees for monkey and human are the result of completely independent experiments and analysis pipelines. This data-driven technique reveals natural-category clusters that are consistent between monkey and human. For easier comparison, subcluster trees were colored (faces, red; bodies, magenta; inanimate objects, light blue).
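The clustering procedure described in this caption corresponds to agglomerative clustering with average linkage. A minimal sketch follows, assuming a small hand-made dissimilarity matrix; the four stimulus labels and all dissimilarity values are invented for illustration and are not data from (Kriegeskorte et al., 2008b).

```python
def average_linkage(dissim):
    """Agglomerative clustering with average linkage.

    dissim is a symmetric matrix of pairwise dissimilarities (e.g. 1 - r).
    Returns the merges in order, each as (members_a, members_b, height),
    where height is the average dissimilarity between the two subclusters.
    """
    clusters = [(i,) for i in range(len(dissim))]  # start from single images
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dissim[i][j]
                         for i in clusters[a] for j in clusters[b]]
                height = sum(pairs) / len(pairs)
                if best is None or height < best[0]:
                    best = (height, a, b)
        height, a, b = best
        merges.append((clusters[a], clusters[b], height))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters)
                    if k not in (a, b)] + [merged]
    return merges

# Hypothetical stimuli and dissimilarities (invented values).
labels = ["face A", "face B", "object A", "object B"]
dissim = [
    [0.0, 0.2, 0.9, 1.0],
    [0.2, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.3],
    [1.0, 0.9, 0.3, 0.0],
]

merges = average_linkage(dissim)
for left, right, height in merges:
    names_left = ", ".join(labels[i] for i in left)
    names_right = ", ".join(labels[i] for i in right)
    print(f"merge [{names_left}] + [{names_right}] at height {height:.2f}")
```

In this toy example the two faces merge first, then the two objects, and the two category clusters are joined last at the greatest height, which is the kind of tree structure the figure displays. The exhaustive pairwise search is chosen for readability; it would be slow for large stimulus sets.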

In one of the most recent and visually compelling demonstrations of the power of combining the notion of similarity with multivariate analyses, (Kriegeskorte et al., 2008b) demonstrated the close correspondence of the activity patterns in human and non-human IT cortex, and revealed a mostly orderly representation of object super-ordinate categories (e.g. animate, inanimate), as well as continuous representations of categories and exemplars (Figs. 13, 14). This approach promises to help bridge the gap between the different approaches (computational, imaging, electrophysiological and psychophysical) and species (mainly human and non-human primates) that have figured in the field of object recognition and categorization over the decades (Kriegeskorte et al., 2008a).

6. Other selective modules: Bodies, places, words

The answer to the question “are faces special?” is probably yes – the question refers to visual processing mechanisms that may be unique to faces. The case study of a boy who sustained brain damage on the first day of his life, resulting in an inability to recognise individual faces (while his object recognition is largely intact) even after acquired experience and plasticity, as he was tested at the age of 16 (Farah et al., 2000), indicates that the brain is equipped to cope with face recognition at birth and that this core ability may not be restored with experience. The FFA therefore may be the result of natural selection for coping with a common and important stimulus for survival, which occupies a dense stimulus/similarity space. This is because all faces look largely similar in terms of number and configuration of features, and multiple categorizations need to be applied to them in a very short time (e.g. identity, sex, emotion). This does not mean however that experience is not important, since the fine tuning of face recognition ability also depends on experience, as seen e.g. in the other-race effect[1] - (Lindsay et al., 1991), (Feng et al.).

Other modules selective for processing certain categories of stimuli that have been revealed with fMRI are the Parahippocampal Place Area (PPA), which is selective for the geometry of the local environment including pictures of houses (Epstein R, 1998), the Extrastriate Body Area (EBA) selective to human body parts (Downing et al., 2001), as well as the letter string area, which has been elegantly shown to form as a result of experience (Baker et al., 2007).

Imaging work in the non-human primate has also revealed the full extent of the representation of faces and other categories in the brain. The equivalent of the human FFA seems to be still elusive (Ku et al., 2011), but a network of areas that respond more to faces than to other objects has been repeatedly shown with fMRI in the macaque brain, including the temporal and prefrontal cortex, and the amygdala (Logothetis et al., 1999), (Tsao et al., 2003), (Hoffman et al., 2007), (Tsao et al., 2008b), (Tsao et al., 2008a), (Ku et al., 2011). The exact role of each face patch is not clear, but what we do know is that it only takes a small amount of micro-stimulation of a small group of face selective neurons to affect the perception of a macaque categorizing face vs. non-face stimuli and bias the response towards the face category (Afraz et al., 2006). A direct comparison of the reported locations of these face patches from different groups reveals a dense network of selective cell groups, which points to a highly distributed representation of faces in the non-human primate brain (Fig. 15). However, it should be noted that the majority of these face patches are also active in the anaesthetised preparation (Logothetis et al., 1999), (Ku et al., 2011), complicating the interpretation of the findings and the role of these activations in conscious perception.

Figure 15.

Comparison of Face-Selective Activation Found in the Current Study with Face-Selective Activation Described in the Literature Superimposed on a Side and Ventral View of the Brain. (A) Side view of the brain. The locations of face-selective patches found in the literature ((Bell et al., 2009), (Logothetis et al., 1999), (Pinsk et al., 2005), (Tsao et al., 2008a) and (Tsao et al., 2008b)) are marked by closed symbols and locations found in the current study are indicated by the open circles. For the activated areas described by (Tsao et al., 2008a) and (Tsao et al., 2008b), naming conventions used by the authors were retained with the locations in parentheses. Locations were estimated based on AP positions when given; otherwise positions were estimated by comparing the coronal slices to the atlas by (Saleem and Logothetis, 2006). In cases where activation extended over multiple slices the average position was taken. The locations shown are after normalization to the macaque template (McLaren et al., 2009). Note that not all studies (including the present one) make a distinction between STS patches located on the lip and the fundus. (B) Ventral view of face-selective activation in this study (open circles) and the literature (closed symbols). Abbreviations: LOS: lateral orbital sulcus; MOS: medial orbital sulcus; OTS: occipitotemporal sulcus; PMTS: posterior middle temporal sulcus; RS: rhinal sulcus. Figure reproduced from (Ku et al., 2011) with permission.

7. Conclusion

It would be unfortunate if the reader was left with the impression that fMRI has only revealed locations and representations of stimuli that are compatible with certain computational and theoretical accounts of visual perception and long term memory. Other important contributions of fMRI in the literature of expertise relate to evidence for the improvement of working memory abilities in experts (Moore et al., 2006), as well as the demands of encoding strategies (e.g. chunking) that can make certain tasks easier (Bor et al., 2003), and the possibility that a frontoparietal cortical network may be a general purpose expertise-based network, e.g. (Bor and Owen, 2007).

Despite its relatively young age, the fMRI community has engaged with and made important contributions to most of the questions that had been keeping single-unit electrophysiologists busy for decades, regarding for example functional specialisation, local vs. distributed processing, hierarchical representations and the malleability of those representations. The emphasis in the field has slowly but surely shifted from where functions are taking place to how the underlying computations are achieved.

The unprecedented ability to see the whole human brain in action has reassured us about its similarities with non-human primate brains, e.g. (Orban et al., 2004), but also pinpointed differences, e.g. (Petit et al., 1997), and has revealed the extent to which the brain operates as a functional network, e.g. (Vogel et al., 2010, Van Dijk KR et al., 2010).

In the field of visual categorization and expertise, fMRI has revealed a number of specialised areas for processing biologically and culturally important categories of stimuli, like faces and letter strings. At the same time fMRI has revealed how distributed the representations of most object categories are, and how these may be organised in a hierarchical way that breaks up the complexity of the world into manageable chunks, governed by perceptually and cognitively defined similarity rules that take into account task demands.


  • In fMRI experiments the inversion effect is associated with higher activity in FFA for upright than for inverted stimuli. The inversion effect is a hallmark of holistic processing (the integration of the stimulus parts into a whole representation), which may be specific to faces, as opposed to part-based processing for other types of stimuli, like animals or artifacts.
  • The other race effect (or own-race bias): the phenomenon describes the fact that, despite our impressive ability to effortlessly recognize and remember faces, we make more errors when we remember and recognize faces from another, less familiar race (O’Toole et al., 1994).

How to cite and reference

Natasha Sigala (May 31st 2014). The Contribution of fMRI in the Study of Visual Categorization and Expertise, Advanced Brain Neuroimaging Topics in Health and Disease, T. Dorina Papageorgiou, George I. Christopoulos and Stelios M. Smirnakis, IntechOpen, DOI: 10.5772/58276. Available from:
