We recognize and categorize objects around us within a fraction of a second and in a number of different ways, depending on context, our experience with them, and the purpose of the categorization. For example, the same animal can be a dog, a bow-wow or a bulldog, a mammal or Canis lupus familiaris. We are also able to recognize it under a variety of lighting conditions, orientations and positions, despite the large number of two-dimensional images that every three-dimensional object generates. It is therefore not surprising that our extraordinary ability to recognize objects has fascinated philosophers for a very long time. In the Categories, Aristotle attempted to categorize everything, mainly by analysing patterns of language and speech, answering questions like “τί ἐστί”, “what is it?”, “what like is it?” (Cross, 1959), and by describing the defining qualities that all instances of a particular category share, e.g. all soft things share the quality of softness.
Real progress in describing and understanding such qualities (features) was made by Fred Attneave in experiments on visual perception during his Ph.D. research at Stanford University (Attneave, 1957). Specifically, his results showed that subjective ratings of the perceived dissimilarities between stimuli (letter strings and shapes) and the frequency of errors while learning those stimuli could both be explained in terms of distances between stimuli represented as points in a “psychological space”. The idea of describing subjective psychological experiences with geometry influenced Roger Shepard, who applied and extended multidimensional scaling (MDS): the representation of objects (e.g. shapes, words, faces) as points in space, such that the distances between the points reflect the perceived dissimilarities between the objects (the more similar two objects, the closer their points) (Richardson, 1938), (Torgerson, 1952), (Shepard, 1958), (Shepard, 1962a), (Shepard, 1962b), (Shepard, 1987).
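To make the geometric idea concrete, the sketch below implements classical (Torgerson-style) metric MDS in plain Python: it double-centres the matrix of squared distances and takes its leading eigenvectors, found here by power iteration, as coordinates. It assumes the dissimilarities behave like Euclidean distances; Shepard's non-metric scaling, which uses only the rank order of the dissimilarities, is a different procedure and is not implemented here. The function names and the four stimuli are hypothetical, chosen for illustration.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def classical_mds(dist, k=2, iters=500, seed=0):
    """Embed n items in k dimensions so that Euclidean distances between the
    returned coordinates approximate the input dissimilarities `dist`."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J, an inner-product (Gram) matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        # Power iteration for the largest remaining eigenvector of B.
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the corresponding eigenvalue;
        # negative values (non-Euclidean input) are clamped to zero.
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][dim] = scale * v[i]
        # Deflate B so that the next pass finds the next dimension.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Four hypothetical stimuli at the corners of a 4 x 3 rectangle.
stimuli = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
d = pairwise_distances(stimuli)
embedding = classical_mds(d, k=2)
recovered = pairwise_distances(embedding)
```

Because distances are unchanged by rotation and reflection, the recovered coordinates match the originals only up to such a transformation; what MDS preserves is the pattern of inter-point distances, which is exactly the property the similarity-based accounts rely on.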
The exact way in which the similarity between a perceived object and its mental representation is assessed has been the focus of research for over half a century. According to prototype theory, we categorize new objects by comparing them to abstracted representations that capture the central tendency of all the examples of a category that we have experienced (Posner and Keele, 1968), e.g. a prototypical triangle or square. According to exemplar theories, perceived objects are compared with stored representations of individual exemplars, grouped by category (Medin and Schaffer, 1978), e.g. memories of actual triangles and squares we have experienced. Although the early exemplar models postulated equal weights for all stimulus features, later contributions included an attention-optimisation hypothesis, which allows the perceived distance (similarity) between objects to vary with context and task demands (Generalised Context Model, Nosofsky, 1986). For example, this implies that for a layperson the representations of a trout and a bass may be close together (similar), whereas for an ichthyologist or a fisherman those representations are further apart (less similar), because they can identify a number of features that distinguish the two types of fish.
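The contrast between the two accounts can be illustrated with a toy calculation. The sketch below assumes hypothetical two-dimensional stimuli, a weighted city-block distance, and a simplified form of the Generalised Context Model (summed exponential similarity to every stored exemplar, followed by a ratio choice rule, with no response-bias terms); the prototype rule simply picks the category whose mean exemplar is nearest. The attention weights `w` are the parameters that the attention-optimisation hypothesis allows to change with task demands.

```python
import math


def weighted_city_block(x, y, w):
    """Attention-weighted city-block distance between two feature vectors."""
    return sum(wi * abs(a - b) for wi, a, b in zip(w, x, y))


def prototype_choice(item, categories, w):
    """Prototype rule: choose the category whose average exemplar is nearest."""
    best_label, best_dist = None, math.inf
    for label, exemplars in categories.items():
        prototype = [sum(values) / len(exemplars) for values in zip(*exemplars)]
        dist = weighted_city_block(item, prototype, w)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def gcm_probabilities(item, categories, w, c=1.0):
    """Simplified GCM: summed similarity exp(-c * d) to every stored exemplar,
    converted into choice probabilities with a ratio rule."""
    summed = {
        label: sum(math.exp(-c * weighted_city_block(item, e, w)) for e in exemplars)
        for label, exemplars in categories.items()
    }
    total = sum(summed.values())
    return {label: s / total for label, s in summed.items()}


# Hypothetical stored exemplars on two feature dimensions.
categories = {
    "A": [(0.0, 0.0), (10.0, 10.0)],  # prototype (5, 5)
    "B": [(3.0, 3.0), (5.0, 5.0)],    # prototype (4, 4)
}
attention = (0.5, 0.5)
probe = (1.0, 1.0)
```

For the probe at (1, 1), the prototype rule picks B, because B's prototype is closer, whereas the exemplar model favours A (probability of roughly 0.71) because one of A's stored exemplars lies very close to the probe. Which account better fits human or monkey data is an empirical question; this toy example makes no claim either way.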
Eleanor Rosch analyzed language patterns and visual perception across cultures and performed the experiments that I suspect Aristotle would have liked to perform himself (Heider, 1972, Rosch et al., 1976a). She showed that the categorizations we make rely on features of high validity (e.g. feathers and wings for birds), which form prototypes that aid cognitive economy and reflect perceived world structure (Boden, 2006, pp. 520-521). In other words, “robin” is a more typical example of a bird than “ostrich”, and this helps us recognise and make decisions about objects faster, and also reflects how likely we are to encounter such objects in our environment. Rosch called the hierarchical level at which category prototypes are found the basic level of categorization (e.g. bird, fish, tree), which is the one that provides the most useful information, the first to be named by children, and the most necessary in language (Rosch, 1976, Rosch et al., 1976b). The level below (subordinate) contains more specific information (e.g. robin, sparrow), while the level above (superordinate) contains less specific information (e.g. animal, plant). These results suggested that we structure information about the world around us in a largely universal, rather than arbitrary or culture-dependent, way. The opposite conclusion would render the following “ancient Chinese” animal taxonomy, presented by Jorge Luis Borges (1966) (in Boden, 2006, p. 519), entirely possible:
“[Animals] are divided into (a) those that belong to the Emperor, (b) embalmed ones, (c) those that are trained, (d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs, (h) those that are included in this classification, (i) those that tremble as if they were mad, (j) innumerable ones, (k) those drawn with a very fine camel’s hair brush, (l) others, (m) those that have just broken a flower vase, (n) those that resemble flies from a distance”.
Crucially for the question of expertise, Rosch’s basic level of categorization can be, and is, modified by experience. Based on their observation that an airplane mechanic answered questions about airplanes differently from other (non-expert) participants, Rosch and colleagues suggested that one way to study the role a person’s previous knowledge plays in categorization would be to vary systematically the participants’ level of expertise and the object categories (Rosch et al., 1976a). They also speculated that an ichthyologist would use as a basic level what a novice would consider a more specific, subordinate one (e.g. trout or salmon, instead of fish). Tanaka and Taylor performed such an experiment, comparing dog and bird experts on listing distinctive features, as well as on the speed and level at which they categorized animals within and outside their area of expertise. They found that the most informative category for the experts was the more specific, subordinate level, rather than the basic one (Tanaka and Taylor, 1991), e.g. crow or robin, rather than bird for the bird experts, and beagle or collie, rather than dog for the dog experts.
Despite long-running debates about the usefulness of the concept of similarity in explaining categorization (e.g. Goodman, 1972), and about the merits of prototype, exemplar or other theories, most researchers agree that categorization is based on an expanded notion of similarity, an overall similarity that encompasses physical, functional and overall features and reflects a person’s theoretical understanding of the world, e.g. (Murphy and Medin, 1985) in (Ahn and Dennis, 2001). The concept of similarity has furthermore inspired computational theories of vision, in which “the representation of similarity is taken to be the goal of the visual system” (Edelman, 1998); modern extensions of these theories are particularly well suited to relating neural data, behaviour and models across species and neuroscientific methods (Kriegeskorte, 2009).
In this chapter I will first review some background work that sets the context for the subsequent discussion of a subset of fMRI studies that have shed light on the questions of perceptual and cognitive categorization and expertise over the last 15 years.
2. Computational and experimental work
Towards the end of the 20th and the beginning of the 21st century, the idea that neuronal populations code representations which reflect both physical stimulus similarity and perceived stimulus similarity, as shaped by task demands, received significant experimental support. In order to bridge the species gap between human and non-human primates, and to study the neuronal representations in the macaque brain meaningfully, it was first necessary to test whether the categorizations that monkeys make can also be explained with MDS and with prototype and/or exemplar theories. This involved the creation of novel, parametrically designed stimuli with a known number of varying dimensions (features). The premise here was inspired by computational theories of vision that called for representations that were two-dimensional, viewpoint-dependent snapshots of three-dimensional objects (Poggio and Edelman, 1990) and that preserved the geometry of similarity amongst the objects (Edelman, 1998), as opposed to stored three-dimensional representations of the objects themselves (Marr and Nishihara, 1978), (Biederman, 1987). The first study to show that MDS was a useful tool for understanding object categorization in the monkey was described in (Sands et al., 1982). Crucially, the authors did not give the animals any categorization training; instead, they explored how macaques perceived and represented pictures of various natural categories (faces, fruit, colours). They concluded that macaques treat pictorial stimuli categorically, cf. (Sigala, 2009). Several years later, Sugihara et al. (1998) employed novel stimuli (computer-generated 3D animals) to test the usefulness of MDS in the study of object recognition, following a previous demonstration with the human visual system (Cutzu and Edelman, 1996). After systematically training the monkeys to report their perception of the stimuli, Sugihara et al. showed that the psychophysical representation of the novel stimuli, as revealed by MDS, captured the similarities built into the stimulus space. This means that the monkey visual system could successfully recover a two-dimensional configuration of stimuli that were originally built in a high-dimensional space (a set of 56 variables), and reliably capture their relative similarities. This was the first evidence from the non-human primate visual system that representations of stimuli might be representations of the similarities between stimuli.
Following this psychophysical experiment, Sigala and colleagues created two parametrically designed stimulus sets (schematic faces and fish) with four varying dimensions (Sigala et al., 2002). The participants (both humans and non-human primates) first reported how similar they found the stimuli, were then trained to categorize them at the subordinate level based on the combination of two features, and finally reported how similar they found the stimuli a second time. Collecting similarity ratings before and after the categorization training made it possible to show that the perceived similarity of the stimuli changed, particularly for the macaques, after they had learnt to categorize the stimuli based on two features (ignoring the other two, equally varying, features) (Fig. 1). The initial MDS solution of the similarity ratings showed that the stimulus features were not used consistently when comparing the stimuli (Fig. 1a).
However, training to categorize the stimuli based on a subset of their features changed the way the participants perceived and represented them, even in the context of a different task (similarity ratings vs. categorization) (Fig. 1b, Fig. 2, Fig. 3). Looking for the neuronal underpinnings of this perceptual change, Sigala and Logothetis (2002) found that cells in the anterior inferior temporal cortex selectively represented the values of the features that were important (diagnostic) for the categorization task, over the values of the features that were not (Fig. 4). This was clear evidence that perceptual expertise correlated with selective tuning of cells in the temporal cortex, which presumably developed over the course of training, since the stimuli were unlike anything the animals encountered in their normal environment. It was also evidence in favour of the Generalised Context Model (Nosofsky, 1986), according to which selective attention processes make the multidimensional perceptual stimulus space shrink or expand, reflecting the importance of the most relevant stimulus dimensions (Fig. 5) (Gauthier and Palmeri, 2002). One would therefore expect that, if the firing rates of the recorded cells were entered into an MDS analysis, the result would resemble the solution recovered from the behavioural data after training (Fig. 1b): the diagnostic feature values (which separate the categories) are perceived as dissimilar and end up further away from each other, while the non-diagnostic feature values (which do not separate the categories) are perceived as similar and end up close together in space. A similar finding was reported in a single-unit study of inferior temporal neurons, in which the stimulus space of parametrically designed shapes was recovered by both psychophysical and neurophysiological measurements (Op de Beeck et al., 2001).
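The stretching and shrinking of the stimulus space can be illustrated numerically. The sketch below assumes eight hypothetical stimuli with four binary features, of which features 0 and 1 jointly define the category (a simplification, not the actual stimulus set of Sigala et al.), and measures how far apart the categories are, relative to their internal spread, under two sets of attention weights in an attention-weighted city-block distance of the kind used in the Generalised Context Model. The weights standing in for “after training” are illustrative, not fitted values.

```python
from itertools import combinations, product


def weighted_city_block(x, y, w):
    """Attention-weighted city-block distance between two feature vectors."""
    return sum(wi * abs(a - b) for wi, a, b in zip(w, x, y))


def make_stimuli():
    """Hypothetical stimuli: features 0 and 1 are diagnostic (fixed within a
    category), features 2 and 3 vary freely and carry no category information."""
    stimuli = []
    for diagnostic, label in (((0, 0), "A"), ((1, 1), "B")):
        for non_diagnostic in product((0, 1), repeat=2):
            stimuli.append((diagnostic + non_diagnostic, label))
    return stimuli


def separation(stimuli, w):
    """Mean between-category distance divided by mean within-category distance."""
    within, between = [], []
    for (x, label_x), (y, label_y) in combinations(stimuli, 2):
        d = weighted_city_block(x, y, w)
        (within if label_x == label_y else between).append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within))


stimuli = make_stimuli()
before = separation(stimuli, (0.25, 0.25, 0.25, 0.25))  # equal attention
after = separation(stimuli, (0.45, 0.45, 0.05, 0.05))   # attention on diagnostic features
```

With equal weights, the mean between-category distance is 2.25 times the mean within-category distance; shifting most of the weight onto the diagnostic features raises that ratio to 14.25. This mirrors, in miniature, the expanded diagnostic and compressed non-diagnostic dimensions of the post-training MDS solution.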
3. The brave new era of fMRI
Until the 1990s, the study of the neural substrates of object recognition in the human brain relied mainly on patients with brain damage, e.g. (Farah, 1992). However, behavioural and event-related potential studies of normal subjects, as well as behavioural, lesion and single-unit recording studies in macaque monkeys, had provided a wealth of evidence about the functional organisation of the primate visual system, and of object recognition in particular. With the advent of functional imaging (initially PET, but mainly fMRI) it became possible to see the normal human brain in action for the first time. Two of the very first questions researchers wanted to ask concerned the localisation and functional organisation of object recognition: where does recognition happen in the brain, and are there specialized cortical modules for different object categories? These were important questions because a) single-unit recordings (Desimone et al., 1984) and cortical field recordings in human patients (Allison et al., 1994) did not have the global coverage necessary to show the extent of clusters of inferior temporal neurons tuned to faces or other trained object classes, e.g. paperclips (Logothetis and Pauls, 1995), or to assess hemispheric laterality; b) structural models of vision and object recognition (Marr and Nishihara, 1978), (Biederman, 1987) did not suggest different representations for different types of objects, or multiple levels of recognition; but c) neuropsychological studies had provided evidence for at least three different brain modules, specific for faces, common objects and words, see (Farah, 1992) for a review.
In this chapter, I present some of the experimental and theoretical progress that has followed from the literature on perceptual similarity, and review contributions from the fMRI literature.
Sergent et al. (1992) provided the first neuroimaging evidence for category-selective responses to face stimuli, using PET imaging of normal subjects. They showed a dissociation between face processing in ventromedial regions of the right hemisphere and object processing in the left occipito-temporal cortex. The first study to employ fMRI to address the localisation question was by Puce et al. (1995). They compared faces with scrambled faces and reported a number of bilateral activations, the strongest in the right fusiform gyrus. But it was the study by Kanwisher et al. (1997) (see Kanwisher et al., 1996, for preliminary results) that thoroughly tested the response specificity of a proposed face module for the first time, and coined the term Fusiform Face Area (FFA), which has been with us for almost 15 years now (Fig. 6).
Nancy Kanwisher and her colleagues first localised the area that was significantly more active for faces than for objects, and then tested whether its activity could be explained by a number of other variables: low-level features; the fact that all faces belonged to the same basic-level category, while the objects came from different categories (by comparing faces with houses); the fact that faces were being compared with inanimate objects (by comparing faces with hands); and attentional factors (by comparing passive fixation with a one-back task). That study was important for several reasons: a) it showed a clear selectivity for faces over other object categories; b) it pointed to the need for different computational theories of object and face recognition; and c) it brought into the neuroimaging field the debate about whether faces were special, or whether such specificity could be expected to develop for any class of objects in which one is an expert (Diamond and Carey, 1986).
4. Car, bird and greeble experts
Gauthier and her colleagues had also been working on the question of functional specialization using fMRI (Gauthier et al., 1996a) (Gauthier et al., 1996b), and had preliminary evidence that the FFA may also be recruited in subordinate-level categorizations (e.g. for the basic level “bird”, the subordinate level, which requires additional perceptual processing, could be “skylark” or “blackbird”) (Gauthier et al., 1997). Their fMRI work with greebles, novel objects created to provide a control set comparable to faces (Fig. 7), indicated that the FFA may be an area involved in visual expertise in general, and not just for faces (Gauthier et al., 1999). This proposal has become known as the expertise hypothesis.
The greebles, however, have been criticised as being too similar to faces (in terms of symmetry, features and configural processing), which would make them less than ideal controls for testing the face-specificity hypothesis of the FFA (see below for neuroimaging evidence). In a different experiment, Gauthier and her colleagues recruited car and bird experts in order to test whether the FFA was recruited during categorization of objects in their domain of expertise (Gauthier et al., 2000). The results showed that while the FFA response was still greater for faces than for other objects, the area was nonetheless involved in processing objects in the participants’ area of expertise as well (e.g. birds for bird experts). This finding has subsequently been replicated by other laboratories (Xu, 2005), as well as in different domains of expertise (e.g. moths and butterflies) (Rhodes et al., 2004). The BOLD signal in the FFA in these studies is greater for faces than for other stimuli, so there is specificity, but not exclusivity, for face stimuli, a domain of expertise that most humans share.
Another way to test the expertise hypothesis, instead of recruiting real-world experts, is to induce expertise in the lab through intensive training until a certain discrimination criterion has been reached and/or a certain number of training hours (e.g. 10) have been completed. These paradigms not only allow one to test the responses of the FFA or any other brain area, but also to study other changes that result from training. For example, Op de Beeck et al. (2006) tested changes in cortical activations before and after 10 hours of training to discriminate novel 3D shapes at a subordinate level in a match-to-sample task. The authors used three novel shape categories, which they named “smoothies”, “spikies” and “cubies” based on the type of protrusions they had (Fig. 8).
The approach of using artificial stimuli avoids confounds arising from participants’ prior, uncontrolled experience with the stimuli, and should lead to new cortical representations that form during the course of training. The participants were scanned before and after the training sessions while performing a task orthogonal to shape discrimination (a colour change detection task), so any observed changes in cortical activations could not be attributed to changes in task performance or to differences in other cognitive factors, e.g. attention or task difficulty. This study showed that the behavioural improvement in shape discrimination (one trained class for each participant) was accompanied by the following changes in cortical activations: 1) more voxels, and therefore more groups of neurons, responded to the trained rather than the untrained stimuli in the post-training scan; 2) these voxels did not form a single large cluster, but multiple small clusters in extrastriate visual cortex (Fig. 9); 3) the Lateral Occipital Cortex (LOC) showed more activity for the trained stimuli than the FFA and early retinotopic visual areas did, but it also showed more activity for these stimuli before training had taken place, especially in the right hemisphere; 4) the right FFA did not show increased responsiveness to the trained stimuli, except in one participant trained with “smoothies”, who during debriefing reported interpreting the stimuli as “women wearing hats”, in other words as face-like (Fig. 9).
This last observation sheds light on what could be driving the FFA responses in the studies with greebles, which are also face-like stimuli. This was further tested in a recent study by Brants et al. (2011), which replicated the design of the original greeble fMRI study (Gauthier et al., 1999) and extended it by scanning the participants both before and after training with the greebles. The main finding of this experiment was that even before training, participants showed an inversion effect with the greebles (which did not increase after training), indicating that these objects are interpreted as living entities with faces, and that the cortical activations they elicit may not be due to expertise.
Two more studies that contrasted the effect of mere exposure vs. categorization training (van der Linden et al., 2008) and a categorization vs. individuation task (Wong et al., 2009) with computer-generated, parametrically designed stimuli also showed increased cortical activations after training in areas near and somewhat overlapping with the right FFA.
5. The extent of the brain areas involved in categorization revealed with fMRI
A lack of increased right FFA response to trained stimuli (morphed cars) within a subordinate category was also reported by Jiang et al. (2007) in a study of categorization training that revealed a large number of areas, including prefrontal, insular, parietal and inferior temporal cortices, as well as the thalamus, that responded to the trained stimuli vs. a fixation baseline (Table 1). A similar network of cortical and subcortical areas, including the striatum, was also revealed by a study of categorization of dynamic moving patterns (Li et al., 2007), contrasted with a rotation detection task on the same stimuli. This study indicated distinct roles for temporal and parietal areas on the one hand, and for prefrontal and striatal areas on the other. In the former, selectivity to diagnostic visual features was tuned according to task demands. Prefrontal and striatal areas are thought to provide top-down control by shaping the neural representations in the cortical areas involved in processing the relevant stimulus features (e.g. inferior temporal cortex for shape). This study replicated and extended a number of key findings from separate single-unit studies in the macaque, partly because of the global brain coverage that fMRI affords, but also thanks to careful parametric design of the stimulus space and the sensitivity of Multi-Voxel Pattern Analysis (MVPA) methods, which capitalize on differences across distributed patterns of activity in the brain, e.g. (Cox and Savoy, 2003), (Norman et al., 2006) (Fig. 10).
MVPA, an approach that uses a variety of powerful pattern classification algorithms to decode the information in multi-voxel patterns of activity, is particularly well suited to the study of distributed cortical representations. Although single-unit studies had already pointed to the distributed nature of information coding, e.g. (Young and Yamane, 1992), it is the excellent spatial coverage and non-invasive nature of fMRI that can show where categories are represented in the human brain. The advantages of MVPA include: a) sensitivity to signals that can be missed by univariate approaches, in which each voxel is treated independently and its activity modulation must reach a statistical significance threshold; b) brain activity and behaviour can be related on a trial-by-trial basis (as in single-unit studies), rather than over hundreds of averaged trials; and c) most importantly, it allows one to characterize the structure of neural representations in individual subjects, with or without the use of pattern classifiers (Norman et al., 2006). The earliest attempt to examine the entire distributed pattern of voxel activations in object-related cortical areas and relate it to object categories was made by Edelman and colleagues (Edelman et al., 1998), and showed promising results in recovering stimulus spaces with MDS. Pictures of natural stimuli, human and animal faces (Fig. 11), and computer-generated stimuli resembling fighter planes, human bodies, sharks, four-legged animals, cars and vans (Fig. 12), separated into meaningful categories based on the fMRI activity patterns. This study pioneered the multivariate analysis of fMRI data, and further suggested that both perceptual and cortical representations may be representations of similarity, which can vary according to the importance of the stimulus features and the level of experience with the stimuli, e.g. (Schyns and Rodet, 1997), (Op de Beeck H, 2000), (Sigala, 2004).
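As an illustration of the classifier-based flavour of MVPA, the sketch below uses a simple correlation-based nearest-centroid classifier with leave-one-trial-out cross-validation, one of many possible choices and not the specific method of any study cited here. The data are synthetic and the category labels hypothetical: each of two categories evokes a fixed pattern across 200 simulated voxels whose per-voxel amplitude (0.3) is small relative to the trial-by-trial noise (standard deviation 1.0), yet the distributed pattern as a whole is informative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_voxels=200, n_trials=20, amplitude=0.3, noise=1.0, seed=1):
    """Synthetic trials: each category has a fixed, weak pattern plus noise."""
    rng = random.Random(seed)
    patterns = {
        c: [rng.choice((-amplitude, amplitude)) for _ in range(n_voxels)]
        for c in ("faces", "houses")
    }
    data = []
    for category, mean_pattern in patterns.items():
        for _ in range(n_trials):
            trial = [m + rng.gauss(0.0, noise) for m in mean_pattern]
            data.append((trial, category))
    return data


def loo_correlation_classifier(data):
    """Leave-one-trial-out: assign each held-out trial to the category whose
    mean training pattern it correlates with most strongly."""
    correct = 0
    for i, (trial, label) in enumerate(data):
        train = [d for j, d in enumerate(data) if j != i]
        scores = {}
        for category in {lab for _, lab in train}:
            rows = [p for p, lab in train if lab == category]
            centroid = [sum(col) / len(rows) for col in zip(*rows)]
            scores[category] = pearson(trial, centroid)
        correct += max(scores, key=scores.get) == label
    return correct / len(data)


accuracy = loo_correlation_classifier(simulate())
```

Decoding accuracy well above the 0.5 chance level indicates that the category information is present in the distributed pattern. In real analyses, cross-validation is normally done across scanning runs rather than single trials, to avoid dependencies between neighbouring trials.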
The finding of distributed, rather than category-specific, clusters of activation in the cortex was later confirmed by a now seminal study by Haxby et al. (2001), who showed that, despite a small number of areas specialized for specific categories (see section 6, Other selective modules: Bodies, places, words), the representations of objects in the ventral temporal cortex are highly distributed and overlapping. This means that the representation of a particular object, or object category, is encoded in the activity pattern of a large number of neurons, and each of these neurons takes part in instantiating multiple object representations.
In one of the most recent and visually compelling demonstrations of the power of combining the notion of similarity with multivariate analyses, Kriegeskorte et al. (2008b) showed a close correspondence between the activity patterns in human and monkey inferior temporal (IT) cortex, and revealed a mostly orderly representation of superordinate object categories (e.g. animate, inanimate), as well as a continuous representation of categories and exemplars (Fig. 13, 14). This representational similarity approach promises to help bridge the different methods (computational, imaging, electrophysiological and psychophysical) and species (mainly human and non-human primates) that have figured in the field of object recognition and categorization over the decades (Kriegeskorte et al., 2008a).
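The core computation of representational similarity analysis can be sketched in a few lines. The sketch below assumes the common choice of 1 minus the Pearson correlation as the dissimilarity between response patterns, and Spearman rank correlation to compare two representational dissimilarity matrices (RDMs); both are standard options rather than the only ones. The response patterns and the categorical model are hypothetical. Because only the RDMs are compared, the two systems need not have the same number of measurement channels, which is what makes the approach useful across species and methods.

```python
import math


def pearson(x, y):
    """Pearson correlation (patterns must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rdm(patterns):
    """Representational dissimilarity matrix: 1 - r for every pair of conditions."""
    return [[1.0 - pearson(p, q) for q in patterns] for p in patterns]


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def compare_rdms(a, b):
    """Spearman correlation between the upper triangles of two RDMs."""
    n = len(a)
    upper_a = [a[i][j] for i in range(n) for j in range(i + 1, n)]
    upper_b = [b[i][j] for i in range(n) for j in range(i + 1, n)]
    return pearson(ranks(upper_a), ranks(upper_b))


# Hypothetical responses to four conditions (face1, face2, house1, house2).
system_1 = [  # five measurement channels, e.g. voxels
    [1.0, 0.9, 0.1, 0.0, 0.2],
    [0.9, 1.0, 0.2, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.8, 0.3],
    [0.1, 0.0, 0.9, 1.0, 0.4],
]
# Ten channels carrying a rescaled copy of the same information.
system_2 = [[2.0 * v + 1.0 for v in p + p] for p in system_1]
# Categorical model: 0 within category, 1 between categories.
category_model = [[0.0 if (i < 2) == (j < 2) else 1.0 for j in range(4)] for i in range(4)]
```

Here the two hypothetical systems yield the same rank order of dissimilarities (Spearman correlation of 1), and both correlate at about 0.83 with the purely categorical model, because the two within-category pairs are the two most similar ones.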
6. Other selective modules: Bodies, places, words
The answer to the question “are faces special?”, which refers to visual processing mechanisms that may be unique to faces, is probably yes. The case study of a boy who sustained brain damage on the first day of his life, resulting in an inability to recognise individual faces that persisted when he was tested at the age of 16, despite years of experience and the potential for plasticity, while his object recognition remained largely intact (Farah et al., 2000), indicates that the brain is equipped for face recognition at birth and that this core ability may not be restored with experience. The FFA may therefore be the result of natural selection for coping with a common stimulus that is important for survival and occupies a dense stimulus (similarity) space. This is because all faces look largely similar in terms of the number and configuration of their features, and multiple categorizations need to be applied to them in a very short time (e.g. identity, sex, emotion). This does not mean, however, that experience is unimportant, since the fine tuning of face recognition ability also depends on experience, as seen, for example, in the other-race effect (Lindsay et al., 1991), (Feng et al.).
Other modules selective for particular categories of stimuli that have been revealed with fMRI are the Parahippocampal Place Area (PPA), which is selective for the geometry of the local environment, including pictures of houses (Epstein R, 1998); the Extrastriate Body Area (EBA), which is selective for human body parts (Downing et al., 2001); and the letter string area, which has been elegantly shown to form as a result of experience (Baker et al., 2007).
Imaging work in the non-human primate has also revealed the full extent of the representation of faces and other categories in the brain. A macaque equivalent of the human FFA still seems elusive (Ku et al., 2011), but a network of areas that respond more to faces than to other objects, including regions of the temporal and prefrontal cortex and the amygdala, has been repeatedly shown with fMRI in the macaque brain (Logothetis et al., 1999), (Tsao et al., 2003), (Hoffman et al., 2007), (Tsao et al., 2008b), (Tsao et al., 2008a), (Ku et al., 2011). The exact role of each face patch is not clear, but we do know that microstimulation of a small group of face-selective neurons is enough to affect the perception of a macaque categorizing face vs. non-face stimuli, biasing its responses towards the face category (Afraz et al., 2006). A direct comparison of the reported locations of these face patches across groups reveals a dense network of selective cell groups, which points to a highly distributed representation of faces in the non-human primate brain (Fig. 15). However, it should be noted that most of these face patches are also active in the anaesthetised preparation (Logothetis et al., 1999), (Ku et al., 2011), which complicates the interpretation of the findings and of the role of these activations in conscious perception.
It would be unfortunate if the reader were left with the impression that fMRI has only revealed locations and representations of stimuli that are compatible with certain computational and theoretical accounts of visual perception and long-term memory. Other important contributions of fMRI to the literature on expertise include evidence for improved working memory abilities in experts (Moore et al., 2006), for the role of encoding strategies (e.g. chunking) that can make certain tasks easier (Bor et al., 2003), and for the possibility that a frontoparietal cortical network may be a general-purpose, expertise-based network, e.g. (Bor and Owen, 2007).
Despite its relatively young age, the fMRI community has engaged with, and made important contributions to, most of the questions that had kept single-unit electrophysiologists busy for decades, regarding, for example, functional specialisation, local vs. distributed processing, hierarchical representations and the malleability of those representations. The emphasis in the field has slowly but surely shifted from where functions take place to how the underlying computations are achieved.
The unprecedented ability to see the whole human brain in action has highlighted its similarities with non-human primate brains, e.g. (Orban et al., 2004), but has also pinpointed differences, e.g. (Petit et al., 1997), and has revealed the extent to which the brain operates as a functional network, e.g. (Vogel et al., 2010, Van Dijk KR et al., 2010).
In the field of visual categorization and expertise, fMRI has revealed a number of specialised areas for processing biologically and culturally important categories of stimuli, such as faces and letter strings. At the same time, fMRI has revealed how distributed the representations of most object categories are, and how these may be organised hierarchically, breaking up the complexity of the world into manageable chunks governed by perceptually and cognitively defined similarity rules that take task demands into account.
- In fMRI experiments, the inversion effect is associated with higher FFA activity for upright than for inverted stimuli. The inversion effect is a hallmark of holistic processing (the integration of the stimulus parts into a whole representation), which may be specific to faces, as opposed to part-based processing for other types of stimuli, such as animals or artifacts.
- The other-race effect (or own-race bias) refers to the fact that, despite our impressive ability to recognize and remember faces effortlessly, we make more errors when remembering and recognizing faces from another, less familiar race (O’Toole et al., 1994).