There has been a recent resurgence of interest in the use of haptic displays to augment human performance, and to provide an additional means of information transfer to interface operators whose visual and/or auditory modalities may be otherwise informationally-overloaded (e.g., Gallace et al., 2007; Kaczmarek & Bach-y-Rita, 1995; Spence & Ho, 2008a; Yannier et al., 2008; Zlotnik, 1988). Over the last few years, researchers have investigated the use of tactile interfaces to provide assistance in a wide variety of settings including everything from vibrating belts to provide navigation support (Nagel et al., 2005) through to wrist watches that allow the user to tell the time by the pattern of vibration that they feel on their wrist (Töyssy et al., 2008). However, the more extravagant predictions made by early researchers regarding the potential uses of vibrotactile interfaces – that people would soon be monitoring the latest stock market figures via vibrating waist displays (see Geldard, 1974; Hennessy, 1966), and/or watching television using nothing more than a 20 by 20 array of vibrators on the back of their chairs (the so-called “tactile television”; Collins, 1970) – have, as yet, proved to be too far-fetched (even allowing for extensive practice to familiarize themselves with the devices concerned).
The problem with the implementation of these predictions was that early researchers typically failed to account for the fundamental human limits on the processing of tactile information through artificial displays (e.g., see Gallace et al., 2007; Spence & Driver, 1997b, for reviews). Here, it is critical to note that humans are severely limited in their capacity to process information, and, if anything, the limits on the processing of tactile information seem to be far more restrictive than for the visual or auditory modalities (see Spence & Gallace, 2007; Spence & Ho, 2008a). What is more, many vibrotactile interfaces were originally tested in the laboratory under conditions of unimodal sensory stimulation. In real-life environments, however, multiple senses are likely to be stimulated at the same time, and visual stimuli seem to have priority access to our attentional resources (Posner et al., 1976; Spence et al., 2001). Nevertheless, one area where there has been a lot of interest (and promise shown) in the last few years relates to the use of non-visual cues to facilitate people’s visual search performance. It is on this aspect of tactile and multisensory displays that this chapter will focus.
It is our belief, given the known limitations on the processing of tactile information, that the primary role of tactile information displays in the coming years will be in terms of providing relatively simple information to interface operators in order not to overload their limited capacity for tactile information processing under conditions of concurrent multisensory stimulation (Spence & Ho, 2008a; see also Cao et al., 2007). However, it is important to note that we do not wish to imply by this that the haptic sense is necessarily fundamentally inferior to vision or hearing in terms of its ability to transmit information to an interface operator. In fact, it is often taken for granted (and hence under-appreciated) that the haptic sense is actually capable of processing vast amounts of information in our daily lives. This may be partly due to the fact that few of us encounter people who are haptically-challenged or are aware of the devastating effects caused by the loss of tactile/kinesthetic sensation. The story of Ian Waterman, an Englishman who lost his haptic sense from the neck down, provides a rare glimpse into the crucial role tactile/kinesthetic information plays in our daily tasks, such as helping us to maintain our posture, walk, and even button-up our shirt in the morning (see Cole, 1995).
Before we proceed, it is also worth pointing out that most tactile displays stimulate only a small part of the haptic sense. The term haptics is used here to refer to both tactile and kinesthetic sensing, as well as manual manipulation (Loomis & Lederman, 1986). The majority of tactile displays that have been developed for user interfaces only provide passive vibrotactile stimulation, and their bandwidth and spatial density (when an array of tactors is used) do not yet fully match the sensory capabilities of humans (e.g., Verrillo & Gescheider, 1992). Force-feedback devices constitute a type of kinesthetic display, but they are typically not portable and hence their usage is limited in applications such as collision avoidance systems and facilitating visual search in dynamic environments. It is therefore not too surprising that the success of tactile displays has, to date, been so limited, since we have yet to tap into the full potential of the haptic sense. It is important to note, however, that there are many ‘small’ force-feedback devices, such as mouse-like devices (Akamatsu & MacKenzie, 1995, 1996) and stylus-type devices (Forlines & Balakrishnan, 2008), that have now been shown to be effective in daily computing situations (Viau et al., 2005). Therefore, size may not turn out to be as big a problem as previously thought when considering the use of kinesthetic feedback.
The deaf and deaf-and-blind communities have long used methods such as fingerspelling and Tadoma (see Tan & Pentland, 2001, for a review) in order to communicate. With the Tadoma method (see Reed et al., 1985), deaf and blind individuals place their hand on a speaker’s face with their thumb resting vertically on the center of the speaker’s lips, and the fingers spread across the speaker’s cheek and neck. Tadoma users are able to pick up the naturalistic mouth opening, airflow, muscle tension and laryngeal vibration information through the hand. Tadoma users can achieve rates of information transfer of up to 12 bits/s (see Reed & Durlach, 1998), which is about half of the rate exhibited by able-bodied individuals when monitoring audiovisual speech.
The success of ‘natural’ tactile communication methods, such as Tadoma, provides living proof that haptics, when properly engaged, has the potential to provide an effective communication channel with a surprisingly high rate of information transmission. That said, it is also important to note that there are tremendous individual differences with regard to the limits of tactile information transfer (see Craig, 1977). For instance, two of the many thousands of sighted participants tested by Craig over the years were found to be able to read at a phenomenal rate of 70-100 words per minute (approximately 9-13 bits/s) through their fingertips using the vibrotactile patterns generated by the Optacon (Bliss et al., 1970); that is, at rates two to three times those seen in blind participants with an equivalent amount of practice. More impressive still was the fact that Craig’s ‘extraordinary observers’, as he called them, were able to read at a higher rate through their fingertip than through an equivalent visual display! Thus, we would argue that while it is still important for tactile interface designers to consider the limits of human tactile processing, the opportunities for innovative tactile interfaces to provide useful information to interface operators in the coming years ought to be stressed. Some possibilities here for the increased use of tactile interfaces include the provision of alert and interrupt signals (Calhoun et al., 2003; Hameed et al., 2009), directional or waypoint navigation signals (e.g., Bosman et al., 2003; Ho & Spence, 2007; Jones et al., 2006; Nagel et al., 2005; Van Erp, 2005; Van Erp et al., 2004, 2005; Van Erp & Van Veen, 2004; Van Veen et al., 2004), orientation signals (e.g., for astronauts working in microgravity or deep-sea divers; Van Erp & Van Veen, 2006), signals to improve situational awareness (e.g., Raj et al., 2000) and/or spatial warning signals (e.g., Ho et al., 2006; Ho & Spence, 2008; Van Erp et al., 2007).
Compared to ‘natural’ tactile communication methods, most artificial tactile displays developed for tactile aids and human-computer interactions have yet to demonstrate information rates beyond 6-7 bits/s (see Reed & Durlach, 1998). In the future, this may be remedied by expanding haptic displays so that they can stimulate both the tactile and kinesthetic senses (e.g., Reed et al., 2003; Tan et al., 1999, submitted). It could also be argued that we have yet to learn how to communicate through the skin as effectively as we might using display technology and coding schemes that go beyond simply mimicking vision (the retina; see the next section) or hearing (the cochlea). Learning more about the perceptual grouping of tactile information, such as through the study of tactile Gestalts, will likely help here (see Gallace & Spence, submitted). However, when thinking about the presentation of tactile patterns to the skin of an interface operator, it is important to highlight an often under-appreciated problem, namely the question of the perspective from which we view stimuli/patterns that are ‘drawn’/presented on the skin.
2. From what perspective do we view tactile stimuli presented on the skin?
It is interesting to note here that the issue of where to present vibrotactile information on an interface operator’s body is becoming more and more important now that researchers are increasingly looking at the possibility of presenting letters and other meaningful, spatially-distributed patterns of vibrotactile stimulation using vibrotactile chairs, corsets etc. (Auvray & Spence, 2009; Jones et al., 2006; Jones & Sarter, 2008; Loomis, 1974; Tan et al., 2003; Yanagida et al., 2004). For example, Yanagida et al. reported up to 87% successful letter recognition in some cases using a 3 x 3 array of vibrators on the back of a chair. Note that the vibrators were activated sequentially, and in the same sequence (as if someone were tracing the letter on the chair’s, or person’s, back).
Given that nearly 50% of our skin surface is found on the torso, the back clearly offers great opportunities for the tactile presentation of information. One well-known psychological illusion that is relevant to the discussion here occurs when an ambiguous letter (such as a ‘b’, ‘d’, ‘p’, ‘q’) is drawn on a person’s forehead (e.g., Krech & Crutchfield, 1958, p. 205; Natsoulas, 1966; Natsoulas & Dubanoski, 1964). If the person on whom the letter is drawn is asked to identify the letter, they will often describe the mirror image of the letter that was actually drawn (e.g., frequently saying ‘b’ if a ‘d’ was drawn; see Kikuchi et al., 1979). Krech and Crutchfield (1958) found that about 75% of people take an internal perspective (i.e., as if looking out from an imagined perspective in the middle of the body; the so-called ‘egocentre’; note that it is this perspective that leads to the mirror-reversals), while the remaining 25% took the external perspective (as if standing outside themselves), when a character was drawn on their forehead. A similar confusion has also been shown to occur for letters drawn (or presented) on the stomach. By contrast, the majority of people tend to report letters (or other symbols) that are drawn on the back of their head (or on their back) correctly. Such results have been taken to show that when trying to interpret the pattern of stimulation on their backs, people are likely to take an ‘external’ perspective (see Figure 1). In fact, it has been argued that we normally take this external perspective (as if standing behind ourselves) when trying to interpret patterns drawn on the body. This may perhaps help to explain why it is so easy to achieve ‘out-of-body’ experiences in precisely this situation (i.e., when it appears that we are standing outside and behind ourselves; see Aspell et al., 2009; Ehrsson, 2007; Lenggenhager et al., 2007).
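The mirror-reversal described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a drawn letter is coded as a small grid of on/off cells as seen by the person doing the drawing (the external perspective); an observer adopting the internal perspective would then, in effect, read the left-right mirror image. The grid coding and the names `mirror_left_right`, `LETTER_B`, and `LETTER_D` are hypothetical and are not taken from any of the studies cited.

```python
# Illustrative sketch only: the grid coding and all names are assumptions,
# not a procedure taken from the studies cited in the text.

def mirror_left_right(pattern):
    """Return the left-right mirror image of a pattern given as rows of text."""
    return [row[::-1] for row in pattern]


# A crude 'b' as seen from outside the body (external perspective).
LETTER_B = [
    "#..",
    "#..",
    "##.",
    "#.#",
    "##.",
]

# The same strokes read from the internal perspective resemble a 'd'.
LETTER_D = mirror_left_right(LETTER_B)

if __name__ == "__main__":
    # Print the two readings side by side.
    for external_row, internal_row in zip(LETTER_B, LETTER_D):
        print(external_row, "  ", internal_row)
```

Applying the mirror twice returns the original pattern, which is one way of seeing why the four letters ‘b’, ‘d’, ‘p’, and ‘q’ are so easily confused when the observer’s perspective is unknown.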
Taken as a whole, the experimental literature that has investigated the viewpoint from which people interpret letters/symbols drawn on the skin suggests that presenting meaningful stimulus patterns to an interface operator’s back may be easier than presenting the same stimuli to their stomach. It is certainly likely to result in a more consistent pattern of responding from interface operators. Back displays also have the advantage of keeping an interface operator’s hands free. Pattern recognition also appears to be better on the back than on the forearm (Jones et al., 2006). Furthermore, presenting tactile stimuli to stationary parts of the body (such as the back) also avoids the change numbness/blindness that can be experienced when tactile stimuli are presented to moving limbs (see Gallace et al., 2009).
3. The crossmodal correspondence problem in multisensory interface design
In recent years, there has been a rapid growth of research investigating the effectiveness of tactile cues in directing an interface operator’s visual attention in a particular direction. Often the effectiveness of these tactile cues has been measured against the effectiveness of auditory cues (since both are non-visual). In this chapter, the focus will be on the vibrotactile (auditory and audiotactile) cuing of visual search in cluttered visual displays. Given that tactile cues will nearly always be presented in different spatial locations from the visual displays that they are designed to inform an interface operator about, this raises the correspondence problem (e.g., Fujisaki & Nishida, 2007; Marr, 1982).
In its traditional form, the correspondence problem referred to the difficult situation faced by the brain when it has to ‘decide’ which stimulus in one eye should be matched with which stimulus in the other eye (especially with stimulus displays such as random dot stereograms; e.g., Julesz, 1971; Marr, 1982). However, while it was originally framed as a purely unimodal visual problem, researchers have recently come to realize that (in complex real-world scenes) the brain also faces a crossmodal version of the correspondence problem (see Fujisaki & Nishida, 2007): How, for example, in a cluttered everyday, multisensory scene, does the brain know which visual, auditory, and tactile stimuli to bind into unified multisensory perceptual events and which to keep separate? A large body of basic psychological research has shown that spatiotemporal synchrony, semantic and synaesthetic congruency, and the ‘unity effect’ all play a role here in helping the brain decide which sensory stimuli should be bound, and which should be kept separate (Parise & Spence, 2009; see Spence, 2007, for a review).
Taking things one stage further, it can certainly be argued that the typical interface operator has a very similar (if not even more challenging) problem to solve. How does s/he know which location in the visual field s/he is being directed to look at on perceiving a completely-unrelated tactile stimulus that is presented on some part of their anatomy (often their back)? Clearly, while temporal synchrony can sometimes help here (but note that cues will sometimes need to be presented in advance of, or after, the relevant visual event; see below), precise spatial coincidence cannot. How then does an interface operator know which location in a distal visual display is being referred to by tactile stimuli on their body (e.g., back)? Is there a natural, dare we say ‘intuitive’ (Ho et al., 2007b; Van Erp, 2005), correspondence that interface designers can capitalize upon? If, as the literature briefly reviewed in the preceding section suggests, people take the perspective of standing behind themselves, looking forward as if ‘seeing’ their back from behind, then one might imagine that a tactile stimulus presented to the left side, say, of the participant’s back, if projected forward, would lead the participant to attend to the left side of the visual display. We will move now to a review of the evidence on the tactile cuing of visual search.
4. Facilitating visual search using non-visual and multisensory cues
Van der Burg et al. (2009) recently investigated whether vibrotactile cues could be used to facilitate participants’ visual search performance in cluttered displays. The visual search displays in their study consisted of 24, 36, or 48 line segments oriented at ±22.5º that regularly, but unpredictably, changed colour during the course of each trial (see Figure 2). The participants had to discriminate the orientation (horizontal vs. vertical) of the visual target that was presented somewhere in the display on each and every trial. The vibrotactile cue was presented from a mobile phone vibrator attached to the back of the participant’s left hand. It should be pointed out that this non-visual cue was entirely spatially non-predictive with regard to the likely location of the visual target in the display, but that its onset was temporally synchronized with the colour change of the visual target.
Van der Burg et al.’s (2009) results showed that the vibrotactile cue had a dramatic effect on the efficiency of participants’ visual search performance: Search slopes dropped from 91 ms/item in the baseline no-cue condition to just 26 ms/item when the vibrotactile cue was presented. For the largest set size, the benefit resulting from vibrotactile cuing equated to a mean reduction in search latencies of more than 1,300 ms (or 30%). While error rates increased as the set size increased, there were no differences as a function of whether the cue was present or absent (thus arguing against a speed-accuracy trade-off account of this RT benefit; see Spence & Driver, 1997a). Interestingly, the benefits of vibrotactile cuing on participants’ visual search performance were of an equivalent magnitude to those that had been reported in an earlier study in which a spatially non-predictive auditory cue had been presented over headphones instead. In that study, the search slope was 31 ms/item when an auditory cue was present, as compared to 147 ms/item in the no-cue condition (see Van der Burg et al., 2008, Experiment 1).
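For readers unfamiliar with the search slope measure used above, the following minimal sketch shows how a slope in ms/item might be estimated, namely as the least-squares slope of mean reaction time against set size. The function name `search_slope` is hypothetical, and the reaction times in the example are invented placeholders; they are not data from Van der Burg et al.’s studies, and only the set sizes (24, 36, and 48 items) are taken from the text.

```python
# Illustrative sketch only: the RT values below are invented placeholders,
# not data from any of the studies cited in the text.

def search_slope(set_sizes, mean_rts_ms):
    """Least-squares slope (ms/item) and intercept (ms) of RT against set size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts_ms) / n
    # Sum of squared deviations of set size, and cross-product with RT.
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, mean_rts_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


SET_SIZES = [24, 36, 48]            # set sizes reported in the text
EXAMPLE_RTS_MS = [2400, 3000, 3600]  # hypothetical mean RTs, for illustration

if __name__ == "__main__":
    slope, intercept = search_slope(SET_SIZES, EXAMPLE_RTS_MS)
    print(f"slope = {slope:.1f} ms/item, intercept = {intercept:.1f} ms")
```

A shallower slope indicates that each additional item in the display adds less to the search time, which is the sense in which a cue is said to make search more efficient.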
Ngo and Spence (in press, submitted) have recently extended Van der Burg et al.’s (2008, 2009) research findings: In their first experiment, they demonstrated that vibrotactile cues presented to both sides of the participant’s waist (rather than to the participant’s left hand as in Van der Burg et al.’s, 2009, study) led to an equivalent visual search benefit as compared to when an auditory cue was presented over a pair of loudspeakers, one placed to either side of the computer monitor on which the visual search displays were presented (rather than over headphones as in Van der Burg et al.’s, 2008, study). In a second experiment, Ngo and Spence (submitted) went on to show that bimodal audiotactile cues resulted in visual search performance that was no better than that seen when the unimodal (either tactile or auditory) cues were presented (see Figure 3).
In a subsequent experiment, Ngo and Spence (submitted) went on to investigate whether making the cue (either tactile or auditory) spatially informative with respect to the likely side of the target would lead to any additional performance advantage. In this study, the cue correctly predicted the side of the target on 80% of the trials and was invalid on the remaining 20% of trials. Under such conditions, participants’ visual search performance was improved still further as compared to the spatially-uninformative central cuing condition (see Figure 4). It is, though, unclear whether this performance benefit should be attributed to the overt or covert orienting of participants’ spatial attention to the side of the cue (see Spence & Driver, 1994, 2004). However, given the relatively long mean visual search latencies (> 3,000 ms), it would seem likely that the participants in Ngo and Spence’s experiment would have moved their eyes around the visual display during the interval between its onset and the moment when they actually initiated their manual discrimination response (see Henderson, 2003; Henderson & Hollingworth, 1998; Tan et al., 2009; Van der Burg et al., 2008).
Here, for the first time in the task popularized by Van der Burg et al. (2008, 2009), auditory cues were found to result in significantly faster overall visual search latencies than vibrotactile cues (there had been no difference in any of the previous studies using this paradigm). The visual search slopes were also shallower following auditory than following vibrotactile cuing. Why should this be so? Well, it may be that when a non-visual cue provides spatial information to a participant (or interface operator), it is more advantageous if the cue is presented from the same functional region of space as the target stimulus that the cue is informing the interface operator about (see Ho & Spence, 2008; Previc, 2000; Spence & Ho, 2008b, on this point).
5. Interim Summary
To summarize, Van der Burg et al.’s (2008, 2009) recent research has shown that spatially uninformative auditory and vibrotactile cues can be used to facilitate participants’ visual search performance in cluttered visual displays. Ngo and Spence (in press, submitted) have extended these findings by showing that the performance benefits occur even when the auditory and vibrotactile cues are presented from different locations (in space and/or on a participant’s body), and that bimodal audiotactile cues are no more effective than unimodal cues in facilitating participants’ visual search performance. Ngo and Spence have also demonstrated that performance can be facilitated even further simply by making the cue spatially informative with regard to the likely side on which the target is presented. One obvious follow-up question to emerge from this line of research concerns whether operator performance could be facilitated still further simply by making the non-visual (i.e., tactile, or for that matter auditory, or audiotactile) cue even more informative with regard to the likely location of the visual target. While, as yet, no one has addressed this question using Van der Burg et al.’s specific ‘pip and pop’ or ‘poke and pop’ visual search tasks, other researchers have shown that visual search and change detection performance can benefit from the cuing of as many as three or four locations on a person’s back.
6. From left/right cuing to quadrant cuing and beyond
Lindeman et al. (2003) highlighted a facilitatory effect of vibrotactile spatial cuing on participants’ visual search performance using three possible cue locations on the left, middle, and right of a participant’s back (presented using a chair-back mounted vibrotactile display). The participants in their study had to search a display of 24 random letters in order to find a target letter (that was specified at the bottom of the screen; see Figure 5). Participants responded by using the mouse to click on one of the letters in the display. The vibrotactile cues in this study were 100% valid with regard to the panel (left, middle, or right) in which the visual target would be found. Under such conditions, vibrotactile cuing led to a 12% reduction in search latencies as compared to a no-cue baseline condition. Interestingly, however, Lindeman et al. also reported that visually cuing the relevant section of the visual display (see the right panel of Figure 5) led to a much larger (30%) reduction in target detection latencies. Once again, bimodal visuotactile cuing was shown to result in performance that was no better than that seen following the most effective of the unimodal cues (cf. Ngo & Spence, submitted).
It is, however, important to note here that it is unclear whether the reduced efficacy of vibrotactile (relative to visual) cuing reported by Lindeman et al. (2003) simply reflected uncertainty on the part of their participants with regard to the location of the vibrotactile cues on their back (since no measure of localization accuracy was provided in this study). Alternatively, however, this difference may also reflect the fact that, in this particular experimental setting, vibrotactile cues were simply not as effective as visual cues in facilitating participants’ visual search performance. It is interesting to note at this point that simultaneous visual cuing (the presentation of a visual halo around the display coinciding with the visual target colour change) was found to be singularly ineffective in facilitating participants’ visual search performance in a visual search study conducted by Van der Burg et al. (2008; Experiment 2b). This difference in results suggests that different mechanisms may have been facilitating participants’ performance in these two (at least superficially similar) experiments (see below for further discussion of this point).
Moving one stage further, Hong Tan and her colleagues at Purdue have conducted a number of studies over the last decade investigating whether the vibrotactile cuing of one quadrant of a person’s back can facilitate their change detection performance in a version of the flicker paradigm (see Jones et al., 2008; Mohd Rosli et al., submitted; Tan et al., 2001, 2003, 2009; Young et al., 2003). In the flicker paradigm, two similar visual scenes/displays are presented in rapid alternation (e.g., Rensink, 2000). In Tan et al.’s studies, the visual displays typically consisted of a random array of horizontal and vertical line segments (see Figure 6). The two displays presented in each trial differed only in terms of the orientation of one of the elements (alternating between horizontal and vertical in successive screen displays). A 120-ms blank scene was inserted between the presentation of each of the two displays in order to mask any transient local motion cues associated with the changing orientation of the target. Previous research has shown that people need focal attention in order to detect the change in such situations. On each trial, a 250-300 Hz vibrotactile cue was presented 200 ms before the onset of the visual displays (the vibrotactile cue was presented for 60 ms, and was followed by a 140 ms empty interval), from one of the 4 corners of a 2-by-2 square array of tactors mounted on the back of the participant’s chair (with a centre-to-centre spacing of approximately 16 cm). Importantly, Tan et al. confirmed that all of their participants could identify the quadrant from which each vibrotactile stimulus had been presented without error (on 60 trials) at the start of their experimental session. Upon detecting the changing item in the visual display, the participants had to click on a mouse button; they then had to move the cursor across the screen using the mouse and click again in order to identify the target item.
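The timing of a single cued flicker trial, as described above, can be laid out in a short sketch. The 60 ms cue, the 140 ms empty interval, and the 120 ms blank are taken from the text; the duration of each visual display is not stated here, so `display_ms` is left as a parameter, and the value used in the example is a placeholder assumption. The function name `trial_timeline` and the event labels are likewise hypothetical.

```python
# Illustrative sketch only: display_ms is NOT given in the text and must be
# supplied by the caller; the function name and event labels are assumptions.

CUE_MS = 60     # vibrotactile cue duration (from the text)
GAP_MS = 140    # empty interval after the cue (from the text)
BLANK_MS = 120  # blank scene between successive displays (from the text)


def trial_timeline(display_ms, n_cycles):
    """Return (onset_ms, duration_ms, event) tuples for one cued flicker trial."""
    events = [
        (0, CUE_MS, "vibrotactile cue"),
        (CUE_MS, GAP_MS, "empty interval"),
    ]
    t = CUE_MS + GAP_MS  # first display starts 200 ms after cue onset
    for _ in range(n_cycles):
        for name in ("display A", "display B"):
            events.append((t, display_ms, name))
            t += display_ms
            events.append((t, BLANK_MS, "blank"))
            t += BLANK_MS
    return events


if __name__ == "__main__":
    # 500 ms per display is a placeholder value chosen for illustration.
    for onset, duration, event in trial_timeline(display_ms=500, n_cycles=1):
        print(f"{onset:5d} ms  {duration:4d} ms  {event}")
```

Whatever display duration is assumed, the first visual display begins 200 ms after the onset of the cue, which is the cue lead time reported in the text.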
Tan et al. (2009) varied the validity of the vibrotactile cue in different experiments. Often, the visual target would be presented in the screen quadrant indicated by the vibrotactile cue on 50% of the trials, while it was presented from one of the three other, uncued, quadrants on the remaining 50% of the trials (constituting valid and invalid trials, respectively; see Tan et al., 2003). The results of experiments using such spatially-informative vibrotactile pre-cues revealed that participants were able to respond significantly more rapidly, and no less accurately, to visual targets presented in the cued quadrant than to targets presented in one of the uncued quadrants. So, for example, the participants in one study responded 41% more rapidly on the validly-cued trials than in the no-cue baseline trials, and 19% more slowly than in the no-cue baseline trials when the cue was spatially invalid (i.e., when the cue indicated that the target would be presented in one quadrant, whereas, in reality, it was actually presented from one of the other three quadrants; cf. Ngo & Spence, submitted, Experiment 3). Another interesting result to emerge from Tan et al.’s (2009; Mohd Rosli et al., submitted) research was that RTs increased as the location of the target moved further away from the centre of the cued quadrant (toward the periphery). This latter result would appear to suggest that participants’ attention was initially focused on the centre of the cued screen quadrant before moving outward (or becoming more diffuse).
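The notions of cue validity and of cuing benefits and costs used above can be made concrete with a brief sketch. It assumes a four-quadrant design in which a fixed proportion of trials is valid and, on invalid trials, the target appears in one of the three uncued quadrants. The quadrant labels, the function names `make_cued_trials` and `percent_change`, and the reaction times in the example are hypothetical. The percentage measure shown (the change in RT expressed relative to the no-cue baseline) is only one common convention and is not necessarily the one used in the studies cited.

```python
# Illustrative sketch only: labels, names, and RT values are assumptions,
# not details taken from the studies cited in the text.
import random

QUADRANTS = ("top-left", "top-right", "bottom-left", "bottom-right")


def make_cued_trials(n_trials, validity, seed=0):
    """Build a shuffled trial list in which a fixed proportion of cues is valid."""
    rng = random.Random(seed)
    n_valid = round(n_trials * validity)
    trials = []
    for i in range(n_trials):
        cue = rng.choice(QUADRANTS)
        valid = i < n_valid
        if valid:
            target = cue
        else:
            # Invalid trial: target falls in one of the three uncued quadrants.
            target = rng.choice([q for q in QUADRANTS if q != cue])
        trials.append({"cue": cue, "target": target, "valid": valid})
    rng.shuffle(trials)
    return trials


def percent_change(baseline_rt_ms, condition_rt_ms):
    """RT change relative to baseline, in percent (positive = faster)."""
    return 100.0 * (baseline_rt_ms - condition_rt_ms) / baseline_rt_ms


if __name__ == "__main__":
    trials = make_cued_trials(n_trials=40, validity=0.5)
    print(sum(t["valid"] for t in trials), "valid trials out of", len(trials))
    # Hypothetical mean RTs (ms) for a baseline and a validly-cued condition.
    print(percent_change(1000.0, 590.0), "% faster than baseline")
```

Under this convention, a benefit appears as a positive percentage on valid trials and a cost as a negative percentage on invalid trials.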
Recently, Tan et al. (2009; Jones et al., 2008) have started to monitor their participants’ eye movements (using an eye tracker) in order to assess how the presentation of vibrotactile cues on a participant’s back influences the overt orienting of their spatial attention around the visual search display situated in front of them. Under conditions where the vibrotactile cue validity was high (75% valid), Jones et al. reported that their participants predominantly directed their saccades to the cued quadrant initially. (As in their previous research, RTs to detect the target were significantly faster as compared to those seen in a no-cue baseline condition.) Interestingly, however, when the vibrotactile cue was made completely non-predictive with regard to the quadrant in which the visual target was likely to occur (i.e., when the target was just as likely to appear in each of the four screen quadrants, regardless of the quadrant in which the vibrotactile cue had been presented), and when the participants were instructed to ignore the vibrotactile cues, then no significant differences were observed in the pattern of overt orienting from that seen in the no-cue condition. Under such conditions, the participants tended to direct their eyes to the top-left quadrant of the display first. Tan et al.’s results therefore suggest that non-predictive vibrotactile cues presented to a person’s back can (under the appropriate conditions) be completely ignored. This result contrasts markedly with the results of other laboratory research highlighting the fact that people are unable to ignore vibrotactile cues presented to their fingertips (at least when the visual targets are presented from close by; i.e., from the same functional region of space; see Gray et al., 2009; Kennett et al., 2001, 2002; Spence et al., 1998).
One obvious question to emerge from this transition from 2, to 3, to 4 vibrotactile cue locations concerns just how many different spatial locations could potentially be cued on a person’s back in the tactile interfaces of the future. Lindeman and Yanagida (2003) have already shown, for example, that participants can identify the source of a 1 s, 91 Hz, vibration using a 3-by-3 array of 9 tactors mounted on the back of a chair (with a minimum 6 cm spacing between adjacent tactors; and, importantly, no practice) at a level exceeding 80% correct. Unfortunately, however, no one has yet (at least as far as we are aware) investigated whether using a 3-by-3 matrix of vibrotactile cues would give rise to a performance benefit in a visual search or change detection task that was any larger than that already demonstrated by Tan et al. (2009) in their quadrant cuing studies. This certainly represents an important area for future study given that, at some point, increasing the specificity of spatial vibrotactile cuing will no longer lead to any further enhancement of visual search performance. Why? Well, because the well-known limits on the discrimination of vibrotactile stimulation for touch displays on the back will have been reached (e.g., Weinstein, 1968; Wilska, 1954). Note also that there are systematic biases in tactile localization that need to be taken into account when presenting a large number of vibrotactile stimuli to a person’s back/torso (e.g., see Cholewiak et al., 2004; Cholewiak & Collins, 2000; Van Erp, 2005). The influence of these biases on perceived vibrotactile localization is likely to become all the more pronounced as the density of cue locations (e.g., on the back) increases.
7. The importance of spatially co-localizing cue and target events
Ngo and Spence (submitted) have also investigated whether the reduced benefits of vibrotactile as opposed to auditory spatial cuing reported in one of their studies (Experiment 3), that was described earlier, may have resulted from the fact that vibrotactile cues have, of necessity, to be presented to an operator’s body surface (Gregory, 1967). By contrast, the auditory cues used in their study were presented from close to the visual target display instead (i.e., from the same functional region of space as the target event; see Previc, 2000). In order to assess the potential importance of relative cue position on the facilitation of participants’ visual search performance by non-visual cues, Ngo and Spence went on, in a final experiment, to compare their participants’ performance under conditions in which the auditory cues were either presented from close to the visual display (i.e., from external loudspeakers situated to either side of the visual display) or via headphones (i.e., from close to the participant but further from the visual display, as in Van der Burg et al.’s, 2008, study). In separate blocks of experimental trials, the cue was either spatially nonpredictive (i.e., 50% valid) or 80% predictive with regard to the likely side of the visual display in which the target was presented. Note that exactly the same spatial information was provided in both cases (i.e., no matter whether the cue sound was presented over headphones or from the external loudspeakers). Ngo and Spence nevertheless still found that their participants were able to discriminate the orientation of the visual targets significantly (34%) more rapidly when the auditory cues were presented from close to the visual display than when they were presented from close to the participant (i.e., over headphones; see Figure 7). 
These results therefore suggest that, wherever possible, spatially co-localizing the non-visual cue (or warning) signal with the target visual event/display may be advantageous, especially when the cue provides spatial information to an interface operator.
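The cue-validity manipulation described above (spatially nonpredictive, 50% valid, versus 80% predictive cues) can be made concrete with a short sketch. The following Python fragment is purely illustrative and is not taken from Ngo and Spence’s materials: the function name `make_cue_trials`, the number of trials per block, and the left/right coding are all assumptions introduced here.

```python
import random

def make_cue_trials(n_trials, validity, seed=0):
    """Build a shuffled block of left/right cue-target trials.

    validity is the proportion of trials on which the cue appears on the
    same side as the target (0.5 = spatially nonpredictive, 0.8 = predictive).
    Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    n_valid = round(n_trials * validity)
    valid_flags = [True] * n_valid + [False] * (n_trials - n_valid)
    rng.shuffle(valid_flags)  # randomize the order of valid/invalid trials
    trials = []
    for is_valid in valid_flags:
        target = rng.choice(["left", "right"])
        if is_valid:
            cue = target  # cue on the same side as the target
        else:
            cue = "right" if target == "left" else "left"  # opposite side
        trials.append({"cue": cue, "target": target, "valid": is_valid})
    return trials

# Block sizes here are assumed; the source does not report them.
nonpredictive = make_cue_trials(100, 0.5)
predictive = make_cue_trials(100, 0.8)
```

In a block built this way the cue side carries exactly the same information whether the sound is delivered over headphones or from external loudspeakers; only the location from which the cue is delivered differs between the two conditions.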
Speaking more generally, we believe that Ngo and Spence’s (submitted) results support the view that vibrotactile cues may, if anything, be inherently somewhat less effective in facilitating an interface operator’s performance than auditory cues given that they have to be presented from the participants’ body, which in many situations may be far away from the relevant visual event or display (see Spence & Ho, 2008b, for a review). By contrast, it is typically much easier to present auditory cues from close to the location of the relevant visual display (see Perrott et al., 1990, 1991, 1996). In fact, recent work by Ho and her colleagues (Ho et al., 2006) has come to a similar conclusion on the basis of their research investigating the effectiveness of vibrotactile versus auditory warning signals in alerting car drivers to the likely location of a potential danger on the road either in front or behind them. The possibility that the region of space in which non-visual cues are presented should play such an important role in determining their effectiveness, and the fact that the cue should, whenever possible, be presented from close to the relevant visual display (though see Ho & Spence, in press, for an exception) raise a number of interesting, but as yet unanswered, questions for future research. This research will likely have important implications for the future design of non-visual interfaces/warning signals.
Given that the wrists/hands are increasingly being targeted as a potential site for vibrotactile stimulation in tactile/haptic interface design (e.g., Bosman et al., 2003; Chen et al., 2008; Hameed et al., 2009; Sklar & Sarter, 1999; Van der Burg et al., 2009), one interesting question for future research will be to determine whether the tactile enhancement of visual search performance would be modulated by the position of a person’s hands relative to the visual display about which the vibrotactile cue was meant to provide information. If, for example, a vibrotactile stimulator were to be attached to either hand, and the side on which a vibration was presented were to indicate the likely side on which the visual target would appear (e.g., as in Ngo & Spence’s, submitted, studies), then one might ask whether the benefit of vibrotactile cuing on participants’ visual search performance would be any larger were the hands to be placed by the side of the visual display, say, rather than down (away from the display) in the participant’s lap (see Abrams et al., 2008; Hari & Jousmäki, 1996; Reed et al., 2005). At present, we simply do not know the answer to this question. Should such a result be obtained, however, it would certainly have important implications for anyone thinking of presenting tactile cues to a car driver, say (since the cues could, in principle, either be presented by means of the vibration of the steering wheel when driving or by vibrating the sides of the driver’s seat instead; Ho & Spence, 2008; Spence & Ho, 2008a).
Presenting non-visual spatial cues from the same location as the visual display that they are designed to refer to provides one obvious solution to the crossmodal correspondence problem in interface design (that was outlined earlier; see Section 3). However, it is important to note that in many real-world display settings, this may only be possible for auditory, but not necessarily for vibrotactile warning signals. What is more, in certain environments, it may simply not be possible to co-localize auditory cues with the relevant visual events either (see Fitch et al., 2007; Ho et al., 2009; Perrott et al., 1996). So, for example, Fitch and his colleagues recently reported that participants found it easier to localize vibrotactile than auditory cues in a vehicular setting. The participants in their study had to indicate which direction was indicated by the activation of one of an array of eight chair vibrators or eight loudspeaker cones. The participants in Fitch et al.’s study were significantly better at localizing the direction indicated by the vibrotactile cue (86% correct) than indicating the direction indicated by the auditory cue (32%). When presented with an audiotactile cue, the participants correctly localized the direction indicated by the cue on 81% of the trials (i.e., once again, multisensory cuing was no better than the best of the unimodal cues; Lindeman et al., 2003; Ngo & Spence, submitted).
Of course, the crossmodal correspondence problem could be solved when presenting vibrotactile cues if some way could be found to have people attribute a distal event to the source of stimulation on their body surface. However, to date, all attempts to achieve distal attribution using vibrotactile stimulation have failed (see Epstein et al., 1986). It should also be noted here that the crossmodal correspondence problem can be solved for auditory cues that are presented more-or-less simultaneously with a salient visual event by utilizing the so-called ‘ventriloquism effect’ (see Spence & Driver, 2000). The ventriloquism effect refers to the automatic visual capture of perceived auditory localization that occurs when a salient visual stimulus is presented at more-or-less the same time as a sound (see Slutzky & Recanzone, 2001). The harder it is to localize a sound, the larger the visual biasing of the perceived auditory localization is likely to be. The ventriloquism effect is larger for synaesthetically congruent pairs of auditory and visual events than for synaesthetically incongruent pairs. So, for example, Parise and Spence (2009) recently reported significantly larger spatial (and temporal) ventriloquism effects when large visual stimuli were paired with low frequency tones, and small visual stimuli with high frequency tones, than when large visual stimuli were paired with high tones or when small stimuli were paired with low tones. While tactile stimuli can also be ventriloquized toward stimuli presented in a different location in another sensory modality (see Caclin et al., 2002), it seems unlikely that tactile stimuli could ever be ventriloquized away from the body itself (i.e., and to the visual display/event to which they refer). Hence, the ventriloquism of relatively unlocalizable warning signals may only be of benefit for auditory cue (or accessory) stimuli.
8. What mechanism(s) underlie facilitation of visual performance by non-visual cues?
While the various studies reported in this chapter clearly demonstrate that various non-visual cues, be they tactile, auditory, or audiotactile, can be used to facilitate a person’s ability to detect/identify visual targets in complex visual displays, the mechanism(s) underlying these effects have not, as yet, been fully worked out. Whenever a spatial cue provides information regarding the likely location of the target then any facilitation of participants’ performance may be attributable, at least in part, to the endogenous (i.e., voluntary) orienting of their spatial attention to the location (side or quadrant) indicated by the cue (see Driver & Spence, 2004, for a review). Additionally, however, when the cue provides information about the likely identity of the target (or when the cue provides location information and the participant is required to make some sort of target localization response) then facilitatory effects may also reflect the direct priming of the participant’s response by the cue (see Ho et al., 2006; Spence & Driver, 1994, 1997a).
The presentation of a non-visual cue (or accessory stimulus) may also bias participants’ responses in detection tasks, simply by making them somewhat more likely to say that a target was present, regardless of whether or not it was actually in the display (see Odgaard et al., 2003; Stein et al., 1996). When a cue is presented at the same time as, or slightly ahead of, the relevant visual event/display then it may also facilitate participants’ performance by means of a non-spatial alerting effect (e.g., Posner, 1978; Spence & Driver, 1997a). Alerting effects have been characterized as a general speeding-up of a participant’s responses, often together with a concomitant reduction in the accuracy of those responses (i.e., alerting can be thought of as equating to a lowering of the participant’s criterion for initiating a response). Researchers in this area have managed to rule out alerting as the primary cause of the visual search benefits that they have observed by showing that certain cue-related effects only occur when the non-visual cue is synchronized with the visual target, and not when it is presented shortly before the visual target (e.g., see Van der Burg et al., 2008, Experiment 3; Vroomen & de Gelder, 2000, Experiment 2). This latter pattern of results is more consistent with some form of multisensory integration effect (as these tend to be maximal when events are presented simultaneously in different modalities; see Spence et al., 2004; Stein & Meredith, 1993; Stein & Stanford, 2008; Vroomen & de Gelder, 2000).
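One conventional way to separate a shift in response bias from a genuine change in sensitivity in a detection task is equal-variance signal detection theory. The sketch below is an illustration under stated assumptions rather than the analysis used in the studies cited above: the function name `dprime_and_criterion` is hypothetical, and the hit and false-alarm rates are invented purely to show how the two measures behave.

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Return (d', c) under the equal-variance Gaussian model.

    d' = z(H) - z(F) indexes sensitivity; c = -(z(H) + z(F)) / 2 indexes
    bias, with negative c meaning a more liberal tendency to say "present".
    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    return z_hit - z_fa, -(z_hit + z_fa) / 2

# Hypothetical rates, chosen only to illustrate the two measures.
d_no_cue, c_no_cue = dprime_and_criterion(0.80, 0.20)
d_cue, c_cue = dprime_and_criterion(0.90, 0.40)
```

With these illustrative numbers, the cue condition yields more hits but also more false alarms, so c becomes negative while d' does not increase. That pattern is what a pure bias shift, as opposed to improved detection, would look like.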
Finally, it is also possible that when spatial cues are presented (as in the studies of Lindeman et al., 2003; Ngo & Spence, submitted; Perrott et al., 1990, 1991, 1996; Tan et al., 2009) they may facilitate participants’ performance by exogenously drawing their spatial attention toward the location of that cue (e.g., Dufour, 1999; Gray et al., 2009; Kennett et al., 2001, 2002; Spence et al., 1998). Researchers have shown previously that auditory or tactile cues briefly facilitate a participant’s ability to detect and/or discriminate visual (and, for that matter, auditory and tactile) targets presented from more or less the same spatial location, even when they are non-predictive with regards to the likely location of the target. These benefits last for about 200-300 ms from the onset of the cue, and appear to be maximal at cue-leading asynchronies of 100-200 ms (see Spence et al., 2004). Neuroimaging studies have now revealed that the presentation of a vibrotactile cue on the same (rather than opposite) side as a visual target can lead to enhanced activation in early visual cortical areas, such as the lingual gyrus (e.g., see Macaluso et al., 2000), presumably via back-projections from multisensory parietal areas (Driver & Noesselt, 2008).
While it is easy to see that vibrotactile cues presented to the wrists/hands might lead to an exogenous shift of a participant’s visual attention to the region of space around their hand/arm (Spence et al., 2004), it is less clear that vibrotactile cues presented to an interface operator’s back would necessarily also lead to an exogenous shift of their spatial attention to a particular location in frontal visual space (i.e., where the visual display/event is often likely to be located). However, the large spatial separation between the vibrotactile cue presented to a participant’s back and the visual event in frontal space (that it is designed to inform the interface operator about) also makes explanations for the facilitatory effects of spatial cues in terms of multisensory integration (see Stein & Meredith, 1993; Stein & Stanford, 2008) seem unlikely. One caveat that should, however, be noted at this point is that the rules of crossmodal attention and multisensory integration operating in the unseen part of space behind our heads (and, presumably also our backs) may be fundamentally different from the better-studied interactions that have been observed and documented in frontal (visual) space (see Ho & Spence, in press; Spence & Ho, 2008b). More research is needed on this topic.
What would be helpful here would be to conduct spatially-informative counter-cuing experiments (e.g., Chica et al., 2007), since that would really help researchers get a handle on the automatic nature of such exogenous crossmodal spatial cuing effects (see Tan et al., 2009). It has been reported previously that counter-cuing (i.e., when a cue on one side informs the participant about the likely localization of the target on the opposite side) can lead to very short-lasting exogenous cuing effects at the cued location (typically lasting for no more than 50 ms), followed by a later, longer-lasting endogenous cuing benefit at the likely target location (i.e., on the opposite side of the cue; see Chica et al., 2007; Driver & Spence, 2004). Results such as these have been taken to suggest that the cue automatically captures participants’ attention spatially, prior to their being able to endogenously re-direct their attention on the basis of the informational content carried by the cue. Such a result, should it be found with vibrotactile cuing on a participant’s back prior to the discrimination of a visual target in frontal space, would therefore suggest that under the appropriate conditions back cues can indeed exogenously direct a person’s visual spatial attention in frontal space. Such an effect, should it be observed, might reflect some sort of mental set effect (i.e., showing that people can remap, or align, ‘back’ space to ‘frontal’ space under the appropriate conditions). However, if this crossmodal cuing effect were not to be observed (see Tan et al., 2009), it might then lead one to suggest that the mapping between an interface operator’s back and the visual display in front of them is actually fairly arbitrary in nature. As such, it would imply that there might not be any special correspondence between locations on an interface operator’s back and locations in frontal space.
Given the importance of such an observation for our understanding of the facilitation of visual search using non-visual cues, this clearly reflects another important topic for future research.
It is at this point that one starts wondering whether the benefits from non-visual (especially vibrotactile) spatial cuing may result solely from the informational content provided by the cue. If this were to be the case, then the possibility emerges that perhaps the same information could be transmitted to an interface operator using a particular rhythm of tactile pulses delivered via a single vibrator attached to their wrist/back etc. (e.g., Brown et al., 2006; Frings & Spence, submitted; Peddamatham et al., 2008; Töyssy et al., 2008), rather than using a spatially-distributed, and potentially ambiguous (see Yana et al., 2008), vibrotactile display. At the very least, it is certainly worth pausing to consider whether the only benefit of spatial cuing relative to, say rhythmical, tactile cuing is the speed with which different cues can be differentiated in the former case (cf. Frings & Spence, submitted). However, even if such exogenous crossmodal cuing effects were not to be observed in a counter-cuing experiment, it could nevertheless still be argued that the spatial content of a vibrotactile cue on an interface operator’s back might be capable of priming the appropriate orienting response (e.g., Gregory, 1967; Proctor et al., 2005). That is, there might still be some kind of ‘natural’ or intuitive mapping between back and frontal space which makes it easier to interpret the directional spatial cue, even if it does not lead to exogenous spatial attentional orienting: After all, just think how natural it feels to turn one’s head in the appropriate direction when someone unexpectedly taps the back of one’s shoulder.
9. Conclusions, caveats, and directions for future research
The research that has been reviewed in this chapter demonstrates that the presentation of non-visual cues (be they tactile, auditory, or audiotactile) can have a profoundly beneficial effect on participants’ performance on a variety of different visual tasks, as evidenced by the findings from a number of visual search and change detection tasks (e.g., see Jones et al., 2008; Lindeman et al., 2003; Ngo & Spence, in press, submitted; Perrott et al., 1990, 1991, 1996; Tan et al., 2003, 2009; Van der Burg et al., 2008, 2009). It is interesting to note that non-visual warning signals, at least in certain circumstances, seem to provide benefits that visual cues simply cannot offer (Santangelo & Spence, 2008; Van der Burg et al., 2008, Experiment 2b; though see also Lindeman et al., 2003).
There is an important open question here as to whether, and under exactly what conditions, bimodal (i.e., multisensory) cues will facilitate performance more than unimodal cues. Bimodal cues appear to outperform unimodal cues under certain conditions (Ho et al., 2007a; Spence & Santangelo, 2009), but not others (e.g., Fitch et al., 2007; Lee & Spence, 2009; Ngo & Spence, submitted; Lindeman et al., 2003). One intriguing recent result that has now been demonstrated in a number of different experimental settings is that multisensory cues appear to capture people’s spatial attention more effectively than unimodal cues when they are otherwise distracted (e.g., when performing another task), that is, under conditions of high perceptual (or cognitive) load (see Spence & Santangelo, 2009, for a recent review). Taken together, the most parsimonious conclusion to draw at the present time regarding the benefits of bimodal (or multisensory) over unimodal spatial cuing (i.e., attentional capture) is that it depends on the particular task conditions in which the cue is presented. Following on from this conclusion, researchers will, in the future, certainly need to demonstrate whether unimodal (as compared to bimodal) non-visual warning signals still retain their effectiveness (e.g., in visual search or change detection tasks) under conditions where the operator is overloaded, say answering a mobile phone while driving when the tactile warning signal comes in (e.g., Lee et al., 2009; Santangelo & Spence, 2008; Scott & Gray, 2008; Spence & Santangelo, 2009).
There are, however, also a number of potential caveats for anyone thinking of applying these findings regarding the facilitation of visual search using non-visual cues to real-world settings. Perhaps the most important of these relates to the fact that in the majority (possibly all) of the studies reviewed here, the participants were instructed to fixate on a central fixation point at the start of each and every trial (i.e., prior to the presentation of the non-visual cue). This point is absolutely crucial because in any real-world setting it is unlikely that an interface operator would necessarily have their eyes and head nicely aligned in this way when the tactile, auditory, or audiotactile warning signal is presented. In fact, in many settings, the cue will be presented precisely because an interface operator’s attention has been distracted off to the side (e.g., see Ho & Spence, in press). This means that there is an unresolved research question to be addressed here about the efficiency of non-visual cuing under conditions of unconstrained head/eye movements. The problem relates to the fact that the location from which an auditory or tactile event is perceived to have been presented has been shown to change as a function of any change in the perceiver’s eye and/or head position (see Harrar & Harris, in press; Ho & Spence, 2007; Lewald & Ehrenstein, 1996a, b; Macaluso et al., 2002; Weerts & Thurlow, 1971). Now, it may be that these overt-orienting induced shifts are small enough not to deleteriously influence an interface operator’s performance when using 2, 3, 4, and possibly even 9 vibrotactile cue locations (see Natsoulas & Dubanoski, 1964). However, at some point, the benefits of increased cue resolution will be offset by the mislocalization errors that are induced by any changes in head/eye position (see also Spence et al., 2004, on this point).
A second caveat that has to be noted here is that the actual tasks, paradigms, and visual displays used in the research that has been reviewed here have all been lifted straight from the psychologists’ laboratory. That is, they are, in some important regards, very artificial (e.g., when in everyday life does one need to search for a horizontal or vertical line from amongst a large number of tilted distractors?). What we need to do, now that we have demonstrated the efficacy of auditory, vibrotactile, and audiotactile cuing in facilitating people’s ability to search amongst letters and line-segments in a laboratory setting, is to test the benefits using more realistic and dynamic displays, such as those found in air-traffic control settings (see Figure 8). We also need to bear in mind the fact that there is already evidence that certain of the cuing (accessory stimulus) benefits that have been reported to date may be specific to the particular tasks under investigation (e.g., compare Lindeman et al., 2003, and Van der Burg et al., 2008, as discussed above). On the other hand, though, a lot of exciting progress has been made recently in applying the constraints on crossmodal attention that have been discovered in the laboratory to real-world interface settings (e.g., Ferris et al., 2006; Ferris & Sarter, 2008; Sarter, 2000, 2007; Spence & Ho, 2008b).
Another important question for future research in this area concerns the determination of what constitutes the optimal asynchrony (if any) between non-visual cues, and the displays/events that they are designed to inform the interface operator about. To date, researchers have either looked at the synchronous presentation of the cue and target event (e.g., Ngo & Spence, submitted; Van der Burg et al., 2008, 2009), or else at conditions in which the cue has been presented prior to the onset of the target event (Tan et al., 2009; Van der Burg et al., 2008). While many researchers have been concerned about the effects of any perceived asynchrony on their participants’ performance (Lindeman et al., 2003; Van der Burg et al., 2009), the only study that we are aware of that has conducted a full time-course analysis in order to determine the optimal cue-target stimulus onset asynchrony (SOA) was reported by Van der Burg et al. (2008, Experiment 3). They tested auditory cue-visual target (i.e., colour change) asynchronies from cue-leading asynchronies of 150 ms through to target-leading asynchronies of 100 ms. Their results showed that cue-target asynchronies from +100 ms gave rise to significant cuing benefits, but intriguingly the benefits were maximal when the target actually preceded the cue by 25-50 ms (see Figure 9).
It is important to note that even when a non-visual cue stimulus is programmed to be delivered at the same time as a visual target stimulus, asynchronies can be induced either because of physical differences in equipment lags or because of biophysical differences in sensory transduction latencies (e.g., see Harrar & Harris, 2005; Shi et al., submitted; Spence et al., 2003; Spence & Squire, 2003). Note also that in certain situations non-visual warning signals will, of necessity, have to be presented with some slight delay with respect to the external events that they are designed to inform the operator about (think, for example, about vehicle collision avoidance warning signals; see Ho & Spence, 2008; see also Chan & Chan, 2006). Researchers will therefore need to start focusing more of their research efforts on assessing the effectiveness of warning signals that are presented after the onset/occurrence of the event of interest. Furthermore, once we know more about the mechanism(s) underlying these crossmodal facilitatory effects on visual task performance in cluttered scenes (see above) we may want to try and utilize specific asynchronies in order to maximize attentional cuing and/or multisensory integration effects (see Spence et al., 2004; Shore et al., 2006). That said, the evidence that we have reviewed in this chapter has hopefully highlighted the potential use that non-visual (in particular, vibrotactile, auditory, and audiotactile) cues (and accessory stimuli) may have in facilitating overloaded interface operators’ visual search through complex and dynamic information displays in the coming years.