Virtual reality (VR) is a technology that allows users to interact with computer-simulated environments. Virtual reality applications are normally displayed either on a computer screen or in a stereoscopic display system; the latter is also referred to as an immersive display system. In this chapter we discuss immersive interactive real-time VR systems, related devices, interaction techniques, and applications in the context of numerous industrial and research projects of the Virtual Environments department of the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS).
Applications running in immersive virtual reality display systems are nowadays widely used in industry and in research laboratories. Design, engineering, and simulation are the most prominent application areas. However, the maturity of the corresponding 3D user interfaces (UI) falls far short of the functionality desired by users. This leads to restricted usability, comfort, and efficiency, particularly for highly interactive applications, counterbalancing the benefits that the use of virtual environments (VE) can offer. There are many reasons for this situation: missing standards for immersive interaction, limited resources for development, and a focus on engineering functionality that neglects usability issues. Simply transferring desktop user interaction concepts to immersive VR for the sake of simplicity and reduced development time fails.
In recent years, many novel concepts for immersive interaction have been developed and published. Most emerged in research laboratories, and some have been developed in close cooperation with industrial partners who are the potential users. This approach ensures that industrial end user requirements are considered. Our department has successfully applied this concept in the VRGeo Consortium, which studies the usability of VR for the geosciences. However, not all of the immersive interaction concepts developed so far have been successfully transformed into productive tools for a broader audience, let alone a mass market.
In the next sections, we want to contribute to the advancement of immersive interaction, concentrating on large-scale VR installations. We also offer an overview of existing technology and its trends. The text can serve as a guide for application developers who need to consider 3D input. It aims at encouraging them to put enough effort into the design of 3D interaction techniques, including customer requirements, and not to neglect these aspects in favour of pure application functionality. Nowadays, it is far from sufficient to impress potential customers or users with pure 3D. They need to be convinced that the investment in this technology leads to a substantial benefit – which can only be achieved if the user interface is elaborated enough that a user recognizes the whole system as an effective tool.
So what are the key benefits of VR that can be of interest for industry? First of all, VR systems can help decrease design costs by allowing designers to work with virtual models. Second, VR systems can facilitate decision-making by providing integrated views demonstrating different alternatives with additional information. Third, VR systems can be used as demonstration and training environments because of their high-impact visualization possibilities and their ability to reproduce situations that would be dangerous or difficult to observe in the real world. Developing successful VR applications requires a deep understanding of the key features of VR, namely: 1) immersion, the ability of a system to immerse a user in the world of a virtual model and facilitate its spatial perception; 2) interaction, the possibility to control virtual objects and get feedback; 3) collaboration, support for the simultaneous work of two or more users.
Immersive or 3D interaction is human-computer interaction in which a user's tasks are performed in a 3D spatial context. Before considering the 3D interaction techniques and their implementation, we briefly survey modern 3D input and output devices. Another important topic that will be considered in this chapter is the evaluation of 3D applications. Several application examples will be presented in the last part of the chapter. For further reading please refer to (Bowman et al., 2005).
2. 3D Display Systems
Various application fields that already use immersive interaction have different display system requirements. Different screen sizes, resolutions, and 3D viewing technologies exist. For example, a presentation environment for groups of people needs to satisfy different demands than an interactive training environment designed for a specific scenario. The result is a relatively large number of available display system configurations that are all based on the same principle: projection of a stereo image pair onto a large screen. The following components are common to most 3D projection-based display systems: 1) a rendering system - a powerful computer with respective graphics hardware and software, 2) a projection system and corresponding glasses, 3) additional output devices, such as auditory and olfactory displays, 4) a tracking system, 5) interaction devices. The most important characteristic of a display system is the technology used to generate a 3D impression. In most cases, stereo image pairs are rendered, so that a user perceives a 3D virtual model using stereo glasses. The most common systems are:
Passive stereo systems. Images rendered by the computer are projected for the left and right eye with perpendicular polarization, and corresponding polarization glasses are required to get the 3D effect. A separate projector is needed for each eye.
Active stereo systems. The rendered images are projected in sequence for the left and right eye, and special shutter glasses are used to get the 3D effect. The glasses are see-through, so that a user can perceive the virtual environment as being integrated in his or her normal environment; the glasses are synchronized with the projectors via an infrared channel. Only one active-stereo-capable projector is needed per screen.
The Infitec (color separation) system separates the stereo image pair by using different, complementary spectra of light for the two eyes. The crosstalk is very small. However, because of the different wavelength triples used for the two eyes, slight color deviations are noticeable.
Autostereoscopic displays. Such systems do not require stereo glasses, since they separate the images for both eyes on the display surface using layers of masks and lenticular optics, see (Börner et al., 2000). This causes different images to be seen from different viewing angles, and consequently, the user can perceive a 3D effect when viewing from the corresponding sweet spot. Some displays of this kind support several sweet spots, which allow multiple viewers to see 3D, or to see the effect from different angles. Furthermore, some displays support a correct 3D effect even for a moving head, using eye tracking methods. Noticeably, most autostereoscopic displays still have normal desktop monitor sizes and therefore they are not suited for most of the immersive interaction techniques discussed here. However, some manufacturers have recently started to offer large autostereoscopic displays with up to 57” diagonal.
Quasi-holographic displays. Consisting of optical modules, like projectors and mirrors, and a holographic screen, this innovative kind of display system (Balogh, 2006) is capable of generating a true 3D effect without using auxiliary means, such as glasses. The holographic screen emits light beams received from the optical modules under different angles as if they were emitted from real artifacts. Note that, in contrast to all other display systems mentioned so far, there is no static flat image (or image pair). Instead, at a certain position on the screen, subtle image changes occur when observed from different positions in front of it. An unlimited number of viewers can walk in front of the screen and see 3D objects with the naked eye. The inventors of this technology describe the underlying principle as a digital window through which virtual objects can be seen as real objects are seen through a normal window. The display has a suitable size for the support of immersive interaction and could also be used for co-located multi-user interaction.
Light-field displays. A true 3D display that does not require stereo glasses has been proposed by (Jones et al., 2007). It is constructed using a high-speed projector and a spinning reflective device that is synchronized with the projected image sequence. In this way it is possible to project multiple center-of-projection images. The reflecting device behaves like a mirror horizontally, allowing a 360° field of view, and scatters light vertically. Head tracking can be used to display the correct images for different observer heights. This display type is not suited for direct interaction with virtual objects, simply because their location coincides with the rotating mechanical parts of the display.
For immersive interaction, large projection-based systems for the display of stereo image pairs, requiring the use of stereo glasses, are currently predominant. Many types of such systems have been developed and presented over more than a decade. Most of them originated in research labs and evolved from prototypes into commercially available products. They are available from projector manufacturers or from specialized VR hardware suppliers. However, tiled, large-scale autostereoscopic displays have already been proposed, see e.g. (Sandin et al., 2005). A rough system classification is presented below.
Projection screen. The simplest VR display consists of just one vertical projection screen, with e.g. one projector for active stereo, and one graphics computer. A reasonable investment in a robust tracking system that supports both hand and head tracking benefits spatial interaction, so highly interactive applications are perfectly possible even in this relatively simple setting.
Desk and workbench systems. Turning the screen by 90 degrees results in a table-like projection. One of the first systems of this kind was the Responsive Workbench™ (RWB) (Krüger et al., 1995), which was developed by our department (being a part of the German National Research Center for Information Technology (GMD) at that time). The RWB consists of a horizontal projection plane, onto which active stereo images are projected via a mirror. The utilized interaction metaphor resembles a working desk: users directly grasp 3D objects located on the table top, and also use interaction widgets placed within the work space. The fact that these objects seem to be placed on the surface particularly enhances the spatial impression and makes the RWB suited for applications like landscape planning, architecture, or surgery simulation and training. Later, a vertical projection plane was added so that objects that have a large extent can be displayed completely.
Projection walls. Vertical projection walls are often composed of an array of projection areas, projectors, and graphics computers in order to increase the size and resolution of the image. Such systems are not easy to calibrate and maintain and need a lot of space. The projection needs edge blending in order to achieve seamless image borders. In the case of active stereo, the shuttering of all image channels needs to be synchronized. This is achieved using a dedicated sync signal, produced by one of the graphics cards and fed into all other graphics cards. Note that multiple graphics cards can be driven either by a PC cluster or by dedicated hardware capable of combining several graphics cards connected to one computer, e.g. the NVIDIA QuadroPlex. These systems are often used for marketing and presentation purposes, and they have limited interaction capabilities.
Surround-screen projection systems. These systems resemble a room and consist of up to six planar, back-projected screens. Back projection is needed because the interior space must be kept free for the users, and in order to avoid shadow casting. This increases space requirements because of the needed exterior positioning of projectors and mirrors. Normally, only four or five screens are used in order to reduce needed space and costs, and to keep an open entrance area. The degree of immersion is very high in these systems and consequently they are very well suited for walkthrough scenarios in which users are immersed within virtual spaces and rooms, e.g. architecture and interior design. The first such system was the CAVE™ (Cruz-Neira et al., 1993), which has been a success worldwide. The CyberStage (Tramberend et al., 1997) was the first European CAVE-like display system and was developed by our department (being with GMD at that time). The CyberStage is a 3m x 2.4m room with stereoscopic projections on three walls and onto the floor as well. It allows a user to move around freely and feel immersed in an unbounded space. It is an immersive room with 8-channel sound and a vibrating floor. Surround-screen projections are widely used in industry, especially in the automotive sector, and they are still being further developed. Recently, (DeFanti et al., 2009) proposed a six-sided system, each side consisting of a tiled screen. The floor plan forms a pentagon; each of the five sides consists of three screens, with the top and bottom screens tilted inwards. In order to get inside, one of the sides can be moved like a door. There is no top screen. Front projection is used for the floor, whereas the sides are back-projected. Since stereo viewing is based on circular polarization, the screen material had to be chosen carefully.
This setup has several advantages over standard four-sided CAVE-like displays: increased resolution, reduced visibility of interreflections due to the increased angle of 108° between screens, and smaller off-axis viewing angles, since the screens are facing the viewer position.
Cylindrical, curved screen systems. Surrounding screens can enhance the degree of immersion, compared to flat screens. They are curved to create a seamless transition of the projection field (no visible edges), and to reduce the visibility of perspective errors resulting from observer positions away from the centre of projection (the main view point). These errors are more easily noticeable at transitions between flat screens (as in the CAVE™), particularly for straight lines. The projectors must be calibrated to pre-distort the image so that a rectilinear image can be observed from the main view point, which is checked by projecting a suitable test grid.
The i-Cone™ (Simon & Göbel, 2002) display developed by our department has a slightly slanted projection area, forming a cone-shaped display, in order to minimize disturbing echoes perceivable at the centre of the system. Another positive effect of the slanting is that front projection can be used in a way that shadow casting by viewers is reduced (see Figure 1.). Notice that, compared to the CAVE™, space requirements are significantly reduced, taking into account the higher number of viewers that can reside in the i-Cone™. The Varrier™ display developed by (Sandin et al., 2005) makes innovative use of 35 autostereoscopic display panels arranged in a tiled array, forming a cylindrical immersive VR display. The work demonstrates how standard autostereoscopic panels should be modified for use in a VR display system. For example, the image slices forming the interleaved left and right eye perspectives are generated according to world coordinates, taking into account the viewer position in the display area, contrary to standard autostereoscopic displays, which use integer device coordinates and compress the depth of the scene.
The TwoView display system. Normally, in projection-based VEs, only one user is head-tracked and sees a perspectively correct image. All other users wearing stereo glasses see a stereo image computed for a different centre of projection, which appears distorted. The TwoView display developed by our department removes this limitation and supports two head-tracked users. Because of the correct perspective, the virtual and physical spaces are correctly aligned for both users, allowing true co-located collaborative work with 3D objects. The image pairs of the two users are separated using polarization, whereas stereo is achieved by shuttering (active stereo). This means that polarization filters need to be mounted on the shutter glasses, and two active-stereo-capable projectors are needed, see Figure 2. Other display systems (Froehlich et al., 2004) support even more viewers, based on shuttered LCD projectors.
Note that the ground projection is inactive in TwoView mode.
Augmented Reality (AR) displays. These displays combine virtual 3D objects with the physical world seen by the user. This can be achieved using cameras or semi-transparent mirrors. There are mobile displays, such as head-mounted displays, and also projection-based systems. As an example, consider the Spinnstube display system (Wind et al., 2007), also developed in our department. It uses a small, lightweight active stereo projector to project stereo images onto a screen located above the user (see Figure 3.). A user is seated and looks onto a semi-transparent mirror, which reflects the screen. In addition, a user can observe physical artefacts underneath the mirror, which are placed on a working desk. This technique is used in school environments and supports all kinds of training applications, since it can overlay physical objects with additional, helpful information. Furthermore, a unified collaborative manipulation space can be formed for up to four users by aligning several displays as shown in Figure 3. A new version of the Spinnstube® is expected in 2009.
3. Tracking Systems
Tracking systems (or trackers) provide information about the position and orientation of real physical objects for use in VR applications. Trackers can measure the motion of interaction devices, as well as a user's head, hands, and sometimes the whole body or just the eyes. This information is then used to calculate the correct perspective projection and auditory output according to the position of a user and to react to his/her actions. Hence, together with the signalling functions of interaction devices, trackers are the primary means of input into VR applications. Trackers can be described by a set of key characteristics, which can be used for system evaluation and comparison (Meyer et al., 1992):
Resolution. Measures the exactness with which a system can locate a reported position. Resolution shows the smallest change a system can detect.
Accuracy. The range within which a reported position is correct. This is a function of the error involved in making measurements and is often expressed statistically.
System responsiveness comprising sample rate (the rate at which sensors are checked for data, usually expressed as frequency), data rate (the number of computed positions per second, usually expressed as frequency), update rate (the rate at which the system reports new position coordinates to the host computer, also usually given as frequency), and latency (the delay between the movement of the remotely sensed object and the report of the new position).
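To make these characteristics concrete, the following Python sketch estimates update rate and mean latency from a stream of timestamped tracker samples. The two-timestamp sample format and the 60 Hz test stream are illustrative assumptions, not the interface of any particular tracking system.

```python
from statistics import mean

def characterize(samples):
    """Estimate update rate (Hz) and mean latency (s) from tracker samples.

    Each sample is a pair (t_capture, t_report): the moment the pose was
    sensed and the moment it reached the host application (same clock).
    """
    reports = [t_rep for _, t_rep in samples]
    intervals = [b - a for a, b in zip(reports, reports[1:])]
    update_rate = 1.0 / mean(intervals)
    latency = mean(t_rep - t_cap for t_cap, t_rep in samples)
    return update_rate, latency

# Hypothetical stream: 60 Hz reports with a constant 15 ms transport delay.
samples = [(i / 60.0, i / 60.0 + 0.015) for i in range(120)]
rate, lat = characterize(samples)
```

Note that such host-side measurements can only bound the true end-to-end latency; the capture timestamps must come from the tracker itself and share a clock with the host.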
Different technologies have been used in the development of tracking devices: there exist optical, magnetic, mechanical, acoustic, inertial and hybrid trackers. Many tracking systems consist of a signal emitter and a signal receiver. For example, in the case of magnetic sensors, an emitter can be connected to the stereoscopic glasses, so that when a user moves his/her head, so does the position of the emitter. A receiver senses the signals from the emitter and an algorithm determines the position and orientation of the receiver in relation to the emitter. Each tracking technology has its own advantages and disadvantages (Welch & Foxlin, 2002). The most widely used systems in VR today are based on optical technology, although systems based on other technologies are also quite popular (such as the products of Polhemus, Intersense, Ascension and XSens). Companies such as A.R.T., Vicon and NaturalPoint provide a wide range of hardware and software solutions for video-based infrared tracking of configurations of passive markers, which can be attached to different devices. Normally, the hardware in their systems employs infrared emitters integrated with synchronised high-resolution cameras, which have built-in image processing capabilities, and their software is able to track multiple targets (reflective spheres) in the tracking volume defined by the number and configuration of cameras.
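The core geometric step in such marker-based optical tracking is reconstructing a marker's 3D position from its projections in two or more calibrated cameras. The sketch below, which assumes idealized cameras given as centre points and back-projected viewing rays, uses the classic midpoint method for two rays; production systems use more cameras and least-squares estimation, and the geometry here is invented for illustration.

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Return the midpoint of the shortest segment between two viewing
    rays (o = camera centre, d = direction through the detected marker),
    plus the segment length as a reconstruction residual."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    b, c = o2 - o1, d1 @ d2
    denom = 1.0 - c * c                      # zero only for parallel rays
    t = ((b @ d1) - (b @ d2) * c) / denom    # parameter on ray 1
    s = ((b @ d1) * c - (b @ d2)) / denom    # parameter on ray 2
    p1, p2 = o1 + t * d1, o2 + s * d2
    return (p1 + p2) / 2.0, np.linalg.norm(p1 - p2)

# Made-up geometry: two cameras on the x-axis viewing a marker at (0, 0, 2).
marker = np.array([0.0, 0.0, 2.0])
o1, o2 = np.array([-1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
p, residual = triangulate_midpoint(o1, marker - o1, o2, marker - o2)
```

A large residual indicates poor marker detection or calibration, so it can serve as a simple local-error measure for the reconstruction.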
In our projects we have not only used numerous commercial systems, but also developed our own prototypes and evaluated innovative systems developed by other research institutions. For example, the system developed by (Foursa, 2004) employed three infrared cameras and was able to track active markers (light-emitting diodes - LEDs), which were mounted on the glasses and on a stylus. The system has been used to study reconstruction from multiple cameras (a metric that takes into account local reconstruction errors has been developed), to develop methods of system performance analysis (including reliability analysis), to design new interaction devices (a stylus with 2 LEDs has been used to control 6 degrees-of-freedom) and to study movement-based interaction with markers in different VR applications (Foursa & Wesche, 2007). In the HUMODAN project (Perales et al., 2004) we integrated a markerless tracking system with our VR installations and evaluated its usability. The system employed two color cameras and required a uniform background; the segmentation, matching and reconstruction algorithms of the system allowed tracking several reference points of the human body (such as the hands, head and shoulders) and provided biomechanically corrected positions of up to 20 points of a human model. However, since the tracked persons had to be visible to the cameras, additional light sources had to be placed in front of the users, distracting their attention from the VR display. Furthermore, since the segmentation accuracy of the system was worse than that of marker-based systems, it led to position reconstruction errors of the feature points. Nevertheless, we could use the system for application control and simple interaction tasks (Foursa & Wesche, 2007). Although some companies offer commercial markerless tracking solutions for motion capture today, they are normally not used with virtual environments and their accuracy evaluation results are not easily available.
Current trends in tracking systems include the development of affordable, but high-performance hardware and software systems, which are wireless, scalable, support multiple targets and employ flexible calibration techniques, such as the prototype developed by (Pintaric & Kaufmann, 2007). New opportunities are opened by so-called depth-sensing cameras (for example, the ZCam), which provide depth information for each pixel and do not require any special background. Commercial applications using this technology for 3D control have already been developed.
4. Interaction Devices
Interaction devices are physical tools that are used for different interaction purposes in VR, such as view control, navigation, object selection and manipulation, and system control. Sometimes the devices are integrated with and supported by tracking systems. Normally they are equipped with buttons and controls; in addition, they may support force and audio feedback. Any input device may be used for various interaction techniques. The goal of a developer is to select the most appropriate technique for a device, or vice versa, aiming at natural and efficient interaction. An interaction device may be discrete (generating events on a user's request, e.g. a button), continuous (continuously generating events, e.g. a tracker) or hybrid (combining both controls). Most interaction devices belong to one of the following categories: 1) gloves/hand masters 2) mice and joysticks 3) remote controls/wands 4) force/space balls. Input devices may be described by the following characteristics:
Degrees-of-freedom (DOF) of an input device. Many devices supported by tracking systems have 6 DOFs (3 for position and 3 for orientation).
Number and type of buttons and controls.
Compatibility with interaction tasks.
Presence of feedback channels (sound, vibration etc).
Other specific characteristics: it may be one- or two-handed device, wireless or wired device and so on.
Below we describe several interaction devices developed in our department (see Figure 4.):
YoYo (Simon & Fröhlich, 2003) is an input device for controlling 3D graphics applications. The device consists of three elastically connected rings in a row, which can be moved relative to each other. The centre ring holds a tracking sensor and a few application programmable buttons. The left and the right ring are elastic 6DOF controllers. The device is designed for two-handed interaction and combines elastic force input and isotonic input in a single device. Compact size, symmetric shape, and the elastic properties result in a "soft" and responsive feel of this device. YoYo can be used for navigation and manipulation in three-dimensional graphics applications in the area of data exploration and scientific visualization. An informal user study and user observations have shown that novice users are quite confident with the device after a short introduction, and that most users alternate between using elastic rate control and isotonic control for navigation.
NoYo (Simon & Doulis, 2004) is a joystick-like handheld input device for travel and rate-controlled object manipulation. NoYo combines a 6DOF elastic force sensor with a 3DOF source-less isotonic orientation tracker. This combination allows consistent mapping of input forces from local device coordinates to absolute world coordinates, effectively making NoYo a "SpaceMouse to go". The device is designed to allow one-handed as well as two-handed operation, depending on the task and the skill level of a user. A quantitative usability study shows the handheld NoYo to be up to 20% faster, easier to learn, and significantly more efficient than the SpaceMouse desktop device.
CubicMouse™ (Fröhlich & Plate, 2000) is an input device that allows users to intuitively specify 3D coordinates in graphics applications. The CubicMouse™ consists of a box with three perpendicular rods passing through the centre and buttons for additional input. The rods represent the X-, Y-, and Z-axes of a given coordinate system. Pushing and pulling the rods specifies constrained motion along the corresponding axes. Embedded within the device is a 6DOF tracking sensor, which allows the rods to be continually aligned with a coordinate system located in a virtual world.
PioDA is a combined 3D interaction device for virtual environments. It was developed particularly for the A.R.T. tracking system, but it can also work with any optical tracking system able to track passive markers. PioDA is based on a PDA and is used as a 6DOF interaction device with a portable graphical user interface for application/system control in virtual environments. It integrates wireless optical tracking (room-size) and is useful for most projection-based virtual environments. Having only one handy device for both interaction and application/system control is its major advantage. The GUI of the device can display interactive content, which can be synchronized with the context of a VR application.
Current trends in interaction devices include the development of wireless lightweight devices with multiple sensors and feedback channels. For example, Nintendo's Wii Remote, the primary controller for Nintendo's Wii console, has motion sensing capability through the use of accelerometer and optical sensor technology, provides basic audio and rumble functionality, and supports Bluetooth. Such functionality allows the development of simple but effective interactive applications. Another interesting direction is the integration of haptic devices with virtual environments (for example, the Inca 6D device). One should also be aware that there exist other input channels, such as speech input, brain input and gestures, and sometimes the combination of different input channels can significantly increase the usability of applications.
5. Interaction Techniques
3D interaction techniques are methods for performing certain interaction tasks within a 3D environment. Typically, a user is partly or fully immersed in the 3D environment and can access its objects to control the application via 3D input devices. 3D interaction techniques are not device-centric in the first place; instead, a number of design decisions are needed to achieve usability, performance and comfort, particularly for highly interactive 3D applications. Designing a 3D interaction technique needs to take into account the following aspects:
Spatial relationship of users and virtual objects in the environment
Number and type of available input devices (special purpose vs. general purpose, many controls vs. few basic controls, spatially tracked vs. non-tracked)
Single-user vs. collaborative applications
Consequently, the following basic interaction techniques can be distinguished:
Direct interaction (the control of nearby objects in reach for being grabbed by hand) vs. indirect interaction (the control of distant objects, or transition between nearby and distant objects)
Two-handed vs. traditional, one-handed input methods
Application control using many control channels (buttons, sliders, touch-screen, etc), or utilizing pure hand movement in space
Traditional single user interaction vs. co-located multi-user interaction that supports multiple users working together at the same location and sharing the same virtual manipulation space
Interaction tasks of complex applications in 3D environments can be categorized into selection, manipulation, navigation, creation, system control, and collaboration.
Selection refers to the process of selecting a 3D object for further interaction. This can be further subdivided into indicating the desired object, confirming the selection, and receiving feedback (Bowman et al., 2005). It can be quite complex to select the right object due to inappropriate size, large distance to the user, occlusion, or even movement. The basic selection techniques are direct grabbing and ray casting. These basic techniques have been further developed to solve the mentioned selection problems. For example, the flexible pointer technique (Olwal & Feiner, 2003) allows selecting partially obscured objects. The aperture selection technique (Forsberg et al., 1996) replaces the pick ray by a conic volume. It is even possible to control the opening angle of the cone dynamically, depending on requirements.
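As a concrete illustration, the following Python sketch implements a simplified aperture selection: an object is selectable when its centre lies within a cone around the pick ray, and the nearest such object wins. Objects are reduced to centre points here, and the scene and opening angle are invented for the example.

```python
import numpy as np

def aperture_select(origin, direction, half_angle_deg, objects):
    """Pick the nearest object whose centre lies inside the selection
    cone spanned by the pick ray (origin, direction)."""
    d = direction / np.linalg.norm(direction)
    cos_limit = np.cos(np.radians(half_angle_deg))
    best, best_dist = None, np.inf
    for name, centre in objects.items():
        to_obj = centre - origin
        dist = np.linalg.norm(to_obj)
        # inside the cone if the angle between to_obj and the ray is small
        if dist > 0.0 and (to_obj @ d) / dist >= cos_limit and dist < best_dist:
            best, best_dist = name, dist
    return best

objects = {"lamp": np.array([0.1, 0.0, -2.0]),
           "chair": np.array([2.0, 0.0, -2.0])}
picked = aperture_select(np.zeros(3), np.array([0.0, 0.0, -1.0]), 10.0, objects)
```

Widening the cone makes small or distant objects easier to hit at the cost of ambiguity, which is why a dynamically controllable opening angle is attractive.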
Manipulation means changing the state of a previously selected 3D object, including its geometry. It normally immediately follows the selection of an object; consequently, selection and manipulation are sometimes considered together in the literature. The multitude of object properties requiring special treatment cannot be covered exhaustively here. However, the most basic task, moving objects in space, is so common that several techniques for it have been presented.
Direct manipulation. The direct, isomorphic method is used to attach the 3D object to the hand. Situations in which this can be applied in a comfortable way are nonetheless very rare. As soon as the object is not in reach for direct grasping with the hand, a technique for distant interaction is needed. Consider e.g. pointing at a distant object using a pick ray and rotating it. In this situation, rotation can in fact only be applied in a reasonable way around the axis of the pick ray (Bowman et al., 2005). Imagine also how tedious it would be to change the distance of the object to the user: a user would have to grab and push or pull the object several times, clutching it repeatedly.
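The grab-and-follow core of direct manipulation can be sketched in a few lines: on grabbing, the object's pose relative to the hand is stored, and every frame the object is re-posed so that this offset stays fixed. Poses are 4x4 homogeneous matrices in world coordinates; the translation-only test poses are illustrative.

```python
import numpy as np

class Grabber:
    """Isomorphic manipulation: the object follows the tracked hand rigidly."""

    def grab(self, hand_pose, object_pose):
        # offset = hand^-1 * object, fixed for the duration of the grab
        self.offset = np.linalg.inv(hand_pose) @ object_pose

    def update(self, hand_pose):
        # re-apply the stored offset to the current hand pose
        return hand_pose @ self.offset

def translation(x, y, z):
    m = np.eye(4)
    m[:3, 3] = [x, y, z]
    return m

g = Grabber()
g.grab(hand_pose=translation(0, 1, 0), object_pose=translation(0, 1, -0.3))
# moving the hand 0.5 m to the right carries the object along
new_pose = g.update(translation(0.5, 1, 0))
```

Because the stored offset also covers rotation, the object turns with the wrist, which is precisely why pick-ray rotation of distant objects feels so restricted by comparison.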
Distant interaction. Several techniques have been proposed to improve the comfort and performance of direct distant manipulation. The virtual hand can be placed at the object’s location after object selection, using a scaled selection technique based on ray casting (Bowman & Hodges, 1997), which effectively increases the reaching distance of the hand. A similar effect can be accomplished by applying a non-linear growth of the (virtual) arm length of the user (Poupyrev et al., 1996). Alternatively, the whole virtual environment can be scaled down so that the selected object is brought within reach of the user, allowing direct, isomorphic manipulation (Mine et al., 1997).
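The non-linear arm growth of (Poupyrev et al., 1996), known as the Go-Go technique, can be sketched as follows: within a threshold distance the virtual hand follows the real hand isomorphically, while beyond it the virtual distance grows quadratically. The threshold and gain values below are illustrative assumptions, not those of the original paper:

```python
def go_go_distance(real_dist, threshold=0.4, k=10.0):
    """Map the real hand distance (metres from the body) to the
    virtual hand distance: isomorphic within `threshold`,
    quadratic growth beyond it (Go-Go-style mapping)."""
    if real_dist < threshold:
        return real_dist
    return real_dist + k * (real_dist - threshold) ** 2
```

Close to the body the mapping is one-to-one, preserving precise manipulation, while stretching the arm lets the virtual hand reach far into the scene without clutching.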
3D widgets. Instead of selecting and dragging a 3D object of the scene, manipulation can also be accomplished indirectly, by using 3D widgets. They are commonly used in 2D environments as a means for mouse-based object movement, in particular for rotation, as with the well-known trackball. This technique is also applicable in 3D environments to support rotation control for distant objects. An advantage of using 3D widgets is that object manipulation can be constrained by assigning different handles to the coordinate axes. Regarding object geometry, the applicability of these standard manipulation techniques is restricted to rigid-body transformations and scalings. Manipulations of parametric objects can be much more involved; imagine, e.g., the task of smoothing or dragging a free-form curve.
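Constraining manipulation with an axis handle amounts to projecting the hand's displacement onto the handle's axis, so the object moves only along that axis. A minimal sketch (names are ours):

```python
def constrained_drag(delta, axis):
    """Project a 3D hand displacement `delta` onto a widget
    handle's axis (assumed to be a unit vector), discarding
    all motion components perpendicular to it."""
    d = sum(a * b for a, b in zip(delta, axis))
    return tuple(d * a for a in axis)
```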
Manipulation of complex objects. Manipulation of complex, parametric objects is still not very common in 3D virtual environments, although the design of suitable methods is possible. However, inventing deformation tools for certain parametric objects is a challenge in its own right, e.g. for free-form curves and surfaces as used in Computer-Aided Geometric Design (CAGD). We have developed deformation tools for this class of objects to be used in environments that support direct interaction. A user can smooth, sharpen, and warp curves and surfaces in an intuitive way, without needing to interact with the low-level mathematical parameters. Curves can be smoothed and sharpened locally by pointing at the corresponding locations on the curve, which causes the shape to change continuously close to the input device, whereas the farther curve segments remain almost unchanged. Similarly, curve segments can be warped. The details of these tools are described in (Wesche, 2004) and are based on the idea of variational modeling (Wesselink & Veltcamp, 1995). The mode of operation of the smoother and sharpener is shown in Figure 5. Notice the rather indirect but still intuitive way in which hand movement is mapped to the shape deformation: the hand just points to the involved locations instead of performing a direct push or pull; therefore we can argue that a force feedback device is not necessary, since the interaction mode does not imply it.
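As a rough illustration of such localized deformation (a simple Laplacian smoother, not the variational method used by the actual tools), a polyline smoother can weight the update of each vertex by its distance to the picked location, so the shape changes strongly near the pointer and remains almost unchanged farther away:

```python
def smooth_polyline(points, pick_index, radius, strength=0.5):
    """Locally smooth a polyline: each interior vertex moves toward
    the midpoint of its neighbours, weighted by a linear falloff of
    its index distance to the picked vertex."""
    result = list(points)
    for i in range(1, len(points) - 1):
        falloff = max(0.0, 1.0 - abs(i - pick_index) / radius)
        if falloff == 0.0:
            continue  # outside the region of influence
        mid = tuple((a + b) / 2.0
                    for a, b in zip(points[i - 1], points[i + 1]))
        result[i] = tuple(p + strength * falloff * (m - p)
                          for p, m in zip(points[i], mid))
    return result
```

Applying the update with a negative strength would sharpen instead of smooth, mirroring the smoother/sharpener pair described above.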
Need for navigation. Navigation is the process of setting the user’s position and orientation relative to a 3D environment. In many cases the whole VE can be surveyed at a glance and is within reach, and there is no direct need for navigation, since the relative alignment of user and scene is accomplished by manipulation, such as orienting the scene with the hand. Therefore, navigation often implies that the VE, or its scale, is very large compared to the user’s size or arm reach. In these cases, particular techniques are needed for traveling around in the VE (strictly, navigation should be divided into wayfinding, the cognitive process needed to acquire knowledge of the 3D surroundings, and travel, the process of changing position (Bowman et al., 2005)).
Walking. Most naturally, a user can travel by walking. Since the available space of display systems is limited, walking simulators that support walking in place are needed. The disadvantage is that the technological effort is high. 3D input devices are a more convenient way to control a user’s locomotion. They can also be combined with head tracking. The NOYO input device (see section 5) is such a device, operated by both hands. The disadvantage is that the hands are not free to perform selection and manipulation tasks simultaneously. It is also possible to evaluate tracked body parts to derive the travel direction, e.g. the gaze vector. Notice that the direction of view should normally be independent of the walking direction.
Movement platforms represent an elegant method for controlling locomotion. The “Virtual Balance” (Fleischmann et al., 1997) is a disk placed on three pressure sensors on which a user stands; by shifting his/her weight on the platform, the user can travel in the virtual space, see Figure 6.
Notice that this is based on only two degrees of freedom supported by the device. The chairIO device, proposed by (Beckhaus et al., 2007), is used in a similar way. The person sits on a rotatable seat that can be tilted in any direction. Shifting the body weight is mapped to the direction of movement in the virtual environment, and the travel speed is calculated from the degree of displacement. Compared to the Virtual Balance, the movement direction can be controlled more easily because of the additional rotational degree of freedom. Other interesting movement platforms are the sphere-based systems (such as the system developed by (Fernandes et al., 2003)), which are already available commercially, and a tile-based system developed by (Iwata et al., 2005). However, such systems are normally used with head-mounted displays.
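A plausible mapping from such a 2-DOF weight shift to travel direction and speed can be sketched as follows; the dead zone and maximum speed are illustrative assumptions, not values from the cited devices:

```python
import math

def tilt_to_velocity(tilt_x, tilt_y, dead_zone=0.1, max_speed=2.0):
    """Map a 2-DOF displacement (e.g. a weight shift on a platform,
    each axis in [-1, 1]) to a travel velocity vector. A small dead
    zone around the rest position prevents drift while standing still."""
    magnitude = math.hypot(tilt_x, tilt_y)
    if magnitude < dead_zone:
        return (0.0, 0.0)  # resting: no travel
    # Speed grows linearly with displacement beyond the dead zone.
    speed = max_speed * (magnitude - dead_zone) / (1.0 - dead_zone)
    return (speed * tilt_x / magnitude, speed * tilt_y / magnitude)
```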
We see creation as a separate, new interaction task in order to distinguish it from manipulation. Creation is the task of instantiating or constructing a new 3D virtual object from scratch and adding it to the environment. Instantiation can be simple in case the object class only has a few parameters, e.g. consider the creation of primitive geometric objects like spheres, cylinders, or boxes. On the other hand, the construction of objects can involve complex interaction needed to specify a high number of object parameters and properties. To cope with this, creation techniques can be designed that support the successive input of the various parameters and of the needed topological connection information, which can result in techniques that are not intuitive. Alternatively, an intuitive shape creation technique that extracts all properties from hand input may produce unpleasing or undesired results. In addition, complex algorithms for the connection of objects created in that way may be required. As an example, consider the task of free-form shape creation. A number of different techniques have been proposed for shape input in 3D virtual environments. The surface drawing technique (Schkolne et al., 2001) extracts polygonal surfaces out of intuitive free-hand drawing strokes at the Responsive Workbench™. The surfaces of (Kuriyama, 1994) are based on the input of parametric boundary curves and result in pleasant shapes with well-defined geometric continuity properties across the boundaries. However, they are not supported by standard computer-aided geometric design (CAGD) systems. The free-form sketching system of (Wesche, 2004), which runs at the Responsive Workbench™, supports these surfaces, as well as spline-based surfaces; see also section 9. In addition, this application allows a user to specify the topology (i.e. the connectivity) of curves forming surface boundaries. This is achieved by pointing at the surface or curve locations directly in 3D space.
In summary, the creation of complex objects may involve multiple successive interaction tasks. Furthermore, the pure geometry is not sufficient to fully specify a 3D object; in most cases the topology needs to be defined explicitly by suitable 3D input methods.
5.5. System control
Up to now, all 3D interaction techniques were related to spatial control of objects or of the whole scene. However, no application can exist without means to issue commands that control mode or state changes. These activities are referred to as system control, and they exist in a Virtual Environment as well. However, system control is often simply neglected in a 3D application. Normally, poor UI performance and high user frustration are the result. Although there is no standard for system control in a 3D environment, research in recent years has produced a variety of techniques. The 2D counterpart of 3D system control is the well-established WIMP (windows, icons, menus, pointer) standard. Compared to that, a much richer set of possibilities and techniques is available for 3D. These include:
Adaptation of menu techniques
Posture and gesture based input
GUI of integrated standard input devices, like personal digital assistants (PDA)
From this it can be seen that there are two principal approaches (possibly used in combination): the 3D UI elements are an integral part of the 3D scene, or the GUI of standard devices, which can be instrumented with 6DOF tracking targets, is used for system control. We provide two examples in which the 3D widgets are rendered as part of the scene graph: a general-purpose and a special-purpose 3D widget.
General-purpose 3D widgets. 3D menus are widely used in VR applications since the menu concept is well known from the WIMP interface. In 3D, a variety of different menu approaches exist. They make use of the third dimension in different ways: in the worst case a user needs to hit a 2D region in space using a pick ray, thus solving a task that is conceptually 1D (selecting an item from a list) in three dimensions.
This demonstrates why the straightforward transformation of concepts initially developed for the desktop to a 3D environment fails. 3D menu solutions not only need to utilize the third dimension in a useful way, they should also take into account the spatial relationships between the menu widget, the user’s hands and arms, the view direction, the manipulation space, and the size of virtual objects. In some cases the 3D manipulation space is relatively large, so that fixed menu locations are not optimal. We have developed a hand-oriented toolbar (Wesche, 2004) that follows the hand movement but at the same time evaluates how a user turns his/her wrist to select an item (see Figure 7.). However, notice that the toolbar size should be limited to allow comfortable use.
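The wrist-based item selection can be sketched as a mapping from roll angle to item index; the usable angular range below is a hypothetical value, not the one used in the actual toolbar:

```python
def toolbar_item(wrist_roll_deg, n_items, range_deg=90.0):
    """Map the wrist roll angle (0 = neutral) to a toolbar item
    index, dividing the usable roll range evenly among the items."""
    # Clamp to the usable range centred on the neutral position.
    clamped = max(-range_deg / 2, min(range_deg / 2, wrist_roll_deg))
    fraction = (clamped + range_deg / 2) / range_deg   # 0 .. 1
    return min(n_items - 1, int(fraction * n_items))
```

Keeping the range and the number of items small keeps each item's angular slot wide enough for comfortable, low-error selection, in line with the size limit noted above.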
Special-purpose 3D widgets. The design of system control for special applications can benefit from approaches that intentionally deviate from physical-world situations. The ToolFinger (Wesche, 2003) is such a technique, since it supports holding several virtual tools in one hand at the same time; an idea that would sound completely impracticable in the real world. In contrast, in a virtual environment a user can quickly switch between several tools that need to be applied to an object in order to change its properties. Applications like 3D shape design can benefit from this technique (Figure 8.). A designer can iteratively elaborate the shape of an object in 3D using several tools in quick succession. Thus the system control subtask optimized by this approach is tool selection. An advantage of integrated UIs like 3D widgets is that they can be placed and configured within the manipulation space, i.e. within the scene the user looks at.
Multi-frame rate rendering. The problem with virtual UI elements is that they are rendered and updated at the same frame rate as the normal 3D scene. This means that if the scene is complex and the frame rate drops, no reasonable user interaction is possible any more. The decoupling of the rendering performance of the UI from that of the actual scene was the topic of the research of (Springer et al., 2007), who introduced multi-frame rate rendering. By optically or digitally compositing interactive and less interactive parts of a scene, the interactive performance of users can be significantly increased. The use of standard input devices like PDAs for system control avoids the frame rate problem, but requires a user to shift his/her point of focus repeatedly from the 3D scene to the screen of the input device and back, which breaks any immersive experience.
Current trends in UI development include research in post-WIMP, non-command-driven interfaces, which could also be used in 3D VEs. The translation of a user’s intent into a sequence of commands is replaced by much more fluent user input, based on so-called freeform UIs. This is expected to lead to new ways of system control for VEs as well. For example, (Igarashi, 2003) utilizes free-form drawing strokes to overcome the limitations of the WIMP interface.
In projection-based environments, several users can work together interacting with the same application. In most display systems, this kind of interaction is not supported, since only one user can be head-tracked and sees the perspectively correct space. In the TwoView display system, developed at Fraunhofer IAIS, this restriction is removed and each of two users receives his/her own perspectively correct stereo image pair. Therefore they can manipulate the same virtual scene collaboratively. A basic collaborative selection and manipulation technique is the bent pick ray (Riege et al., 2006). When two users interacting in one place select the same object, their normally straight pick rays are bent to provide visual feedback of the fact that the part has been grabbed by both users (see Figure 9.).
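One simple way to render such a bent ray (our own sketch, not necessarily the published implementation) is a quadratic Bézier curve from the user's hand to the shared grab point, using the tip of the straight ray as the control point:

```python
def bent_ray_points(hand, straight_tip, grab_point, n=8):
    """Sample a quadratic Bezier curve from the user's hand to the
    shared grab point, with the tip of the straight pick ray as the
    control point, so the ray visibly bends toward the jointly
    grabbed part. Returns n + 1 sample points for rendering."""
    pts = []
    for i in range(n + 1):
        t = i / n
        pts.append(tuple(
            (1 - t) ** 2 * h + 2 * (1 - t) * t * s + t ** 2 * g
            for h, s, g in zip(hand, straight_tip, grab_point)))
    return pts
```

The curve starts exactly at the hand and ends exactly at the grab point, so each user still perceives the ray as anchored to his/her own input device.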
6. Design and Evaluation of 3D Applications
6.1. Developing 3D user interfaces
How does one select the appropriate 3D user interface for a particular application? Since standards are rare or do not exist at all (especially for system control), design considerations should be based on similar application settings in 3D environments (not on the same application on the desktop) that have proven successful. Many such applications have been published; however, not all pay the needed attention to usability in 3D. In general, a selection depends on the answers to questions like:
How easy is it to extend the given VE system by new input devices?
What kinds of input devices are available? How many control channels (i.e. buttons, wheels) do they offer?
How can the users of the application be characterized? Are they domain experts? Are they (additionally) VR experts? Is quick learning of UI techniques a must? Are users already accustomed to other 3D user interfaces? Is it a public installation?
What are the properties of the 3D scene and the VE? What are the relationships between a user’s height, the size of the 3D scene, and the distance to the scene? Is it an immersive setting or a semi- or non-immersive environment? Are the objects within reach, i.e. can direct interaction be applied? Are resting positions available for input devices or for a user’s arm? Should co-located collaboration in groups be supported?
The following basic approaches can be followed when designing 3D UIs:
Adapting from 2D environments: in many cases this leads to crude solutions in which the same GUI as in the 2D counterpart application needs to be operated in a 3D environment using a pick ray. However, there are notable exceptions to this approach, leading to usable 3D graphical menu systems especially tailored to 3D. As an example, consider the work of (Dressler, 2007), who developed a 3D menu system for immersive environments. The optimized layout of menus improves performance and error rate for pick-ray-based interaction, compared to the use of PDAs and tablet PCs. Moreover, positioning and scaling of the menu are rule-based, enhancing reachability and readability of text and reducing occlusion.
Imitating reality: this is appropriate for simple direct interaction tasks with objects in reach; however, it fails for more complicated tasks. One reason for this is the missing tactile and force feedback channels. Moreover, this approach does not exploit possibilities only available in a virtual world.
Inventing 3D behaviour that deviates from the physical world: this approach can lead to interesting new techniques and seems to be the most promising one, precisely because certain constraints and certain kinds of feedback cannot easily be established or are not available in VEs.
The design guidelines for 3D user interfaces are manifold; the most important ones are:
Support fluent flow of actions. This is particularly important if the application is characterized by many successive changes of tools, or modes, as e.g. in a 3D computer aided styling application.
Spatially organize the 3D scene and interaction objects. This can be done in such a way that the focus of attention does not need to change often. Consider e.g. the benefit of a hand-mounted virtual 3D menu, compared to a menu positioned at a fixed location on the screen.
A more detailed discussion of these and further guidelines can be found in (Bowman et al., 2005). Another guideline is based on human capabilities, i.e. on the interplay of both hands. The corresponding analysis of (Guiard, 1987) is often cited in work about interaction and has proven to be a good principle. Guiard distinguishes among the following ways of using both hands: asymmetric bimanual (e.g. playing a violin, writing on a piece of paper), more or less symmetric bimanual (e.g. indicating the width of some object), and one-handed. In asymmetric bimanual tasks, the non-dominant hand usually forms a reference frame for the finer and more accurate movement of the dominant hand. These observations can be used for the design of powerful manipulation techniques in 3D. This has been achieved e.g. in the work of (Cutler et al., 1997). For example, consider their two-handed rotation tool, where one hand specifies the axis of rotation and the other hand rotates the selected object around it.
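The geometric core of such a two-handed rotation tool is rotation about an arbitrary axis. A minimal sketch using Rodrigues' rotation formula (the axis direction is assumed to be a unit vector; names are ours):

```python
import math

def rotate_about_axis(point, axis_origin, axis_dir, angle_rad):
    """Rotate `point` about the axis through `axis_origin` along
    the unit vector `axis_dir` by `angle_rad`, using Rodrigues'
    rotation formula."""
    p = [c - o for c, o in zip(point, axis_origin)]
    k = axis_dir
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    # Cross product k x p and dot product k . p.
    cross = (k[1] * p[2] - k[2] * p[1],
             k[2] * p[0] - k[0] * p[2],
             k[0] * p[1] - k[1] * p[0])
    dot = sum(ki * pi for ki, pi in zip(k, p))
    rotated = [p[i] * cos_a + cross[i] * sin_a + k[i] * dot * (1 - cos_a)
               for i in range(3)]
    return tuple(r + o for r, o in zip(rotated, axis_origin))
```

In the bimanual setting, `axis_origin` and `axis_dir` would come from the non-dominant hand, forming the reference frame, while the dominant hand's angular motion supplies `angle_rad`.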
6.2. Evaluating applications
As with evaluating any other artefact, an evaluation of 3D user interfaces involves assessing their strengths and weaknesses in order to improve their effectiveness. The main purpose of 3D user interface evaluation is the analysis of usability and the generation of recommendations on how the interface could be improved. Evaluation should always follow significant design changes, and it may be necessary to perform evaluation several times during interface development. Evaluation performed at an early stage may be dedicated more to studying the importance of different features and functionalities, while evaluation performed at a later stage may be dedicated to their effectiveness. The most widely used evaluation techniques are the following (Bowman et al., 2005):
Formative evaluation is an observational user study in which users try out the proposed interface. They may be asked to simply explore the interface or to perform some specific tasks.
Summative evaluation compares various techniques in one experiment. Users may be given a specific task that they perform both in the proposed system and in another one or in a different configuration of the proposed system.
During an evaluation one can assess the functionality, performance, and ergonomics of an interface. Evaluation may be done in a formal or informal manner. The results of the evaluation may be obtained in the form of a questionnaire, an expert interview, or automatic registration of events. They may be qualitative or quantitative. The latter requires a metric, such as time, number of tasks performed, accuracy of actions, etc.
Assessment techniques may include the evaluation of the following parameters:
System performance, which may be evaluated in case an analysis of the computer or graphics system is necessary, for example to measure frame rate or latency.
Task performance. This is the main focus of many evaluations. The quality of task performance may be analyzed in terms of the time needed to perform the task, the number of errors a user makes, the accuracy of the performance, the amount of information learned, and other measures.
User performance. This refers to the subjective perception of the interface by a user. Here one should analyze whether the interface presents any barriers to task completion, whether it is comfortable, and whether it is intuitive and can be used without specific knowledge.
A thorough evaluation approach should contain application-specific experiments, use a wide range of metrics, and engage a sufficient number of users. Statistical methods should be applied to calculate average values and their confidence intervals, to determine whether the results are statistically significant, and, finally, to determine whether the number of test participants was large enough. Some web resources on usability and statistical analysis have online calculators and brief explanations of procedures, and can be recommended as a starting point for researchers who do not have expertise in usability analysis.
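As a starting point, the mean and confidence interval of, e.g., task-completion times can be computed as follows. This sketch uses a normal approximation, which is adequate for larger samples; for small samples the Student t quantile should be used instead:

```python
from statistics import mean, stdev, NormalDist

def confidence_interval(samples, confidence=0.95):
    """Return the sample mean and the half-width of the confidence
    interval for the mean, using a normal approximation to the
    sampling distribution."""
    m = mean(samples)
    sem = stdev(samples) / len(samples) ** 0.5  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return m, z * sem
```

If the intervals of two conditions do not overlap, the difference is likely significant; a formal test (e.g. a t-test) should still be applied before drawing conclusions.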
6.3. An example of application evaluation
In order to identify the influence of immersion and collaboration on the performance in assembly and manipulation tasks in a virtual environment, we performed a quantitative assessment of user performance in an assembly modeling application on the basis of our framework described in section 7.3 (D’Angelo et al., 2008). We asked each of the twenty participants to perform a specific task ten times in four modes: in single-user and collaborative two-user modes, with stereoscopic and monoscopic vision for each mode. The participants had to assemble a table out of a table plate and four table legs and place it at a specific position on a floor plate. In each assembly task, the modeling parts were randomly positioned in space, while the sum of the inter-object distances was kept constant for all initial configurations. An automatic timer measured and logged the task completion times, starting with the first and stopping with the last assembly operation. The results showed average speed-up factors of 1.6 and 1.4 for collaborative interaction and stereoscopic vision, respectively. With both collaboration and stereo vision, the performance of users could be increased by a factor of 2.2.
7. Application Examples
7.1. Interactive Visualization of geo-seismic data for the oil and gas industry
The huge amount of geo-seismic data acquired in the search for oil reservoirs is one of the greatest challenges for information analysis. The approach is to use interactive 3D visualization in a Virtual Environment, from which geophysicists and geologists expect the greatest benefit. Many visualization algorithms and user interface technologies have been developed in recent years in order to offer these users a benefit.
Important features of the framework developed at Fraunhofer IAIS are the visualization of multiple data types, support for well planning (i.e. planning paths for drilling for oil), combined visualization and sonification of well log data, and multi-resolution techniques for volume visualization. For navigation and interaction, new input devices tailored to geo-seismic data interpretation were engineered. This allows users to focus on their exploration tasks rather than on operating a computer.
The most important aspect is the combination of algorithms that enable browsing through gigabytes of 3D seismic data volumes at interactive frame rates directly in 3D (Plate et al., 2002, 2007) with particularly suitable interaction techniques and input devices. Figure 10. presents the VRGeo demonstrator at the Responsive Workbench™.
7.2. Sketching free-form shapes
Shape design is one of the most important activities in design and product development. It is well known that buyers base their decision of which product to choose on the appearance of its shape and, if applicable, on the ergonomics of the product. The main decisions about the shape of a new product take place in the conceptual design phase, which benefits from rapid prototyping and 3D printing technologies in order to cope with the high number of variants to be built and with the iterative nature of that process. For this to work, a digital model is a must even at the beginning of shape conceptualization. However, most design processes are still dominated by hand-crafted models made of deformable or workable material, like clay or wood, which are difficult or impossible to change. On the other hand, designers like to use their hands and tools in a natural way, and not to operate a computer system, which hinders their creative thinking.
Tools for sketching free-form shapes have been developed for Virtual Environments, because these environments are very well suited to offer designers the tools adapted to their specific needs. At the same time, a digital model can be obtained and can be used for rapid prototyping.
The work shown in Figure 11. is an example of such an approach; see also (Wesche, 2004). The user draws curves and constructs a curve network that forms the skeleton of the surface he/she has in mind. Automatic surfacing methods generate shapes that correspond to the outlined boundary, thus freeing a designer from specifying all surface parameters by hand. The designer draws and alters shapes directly in space, using the hands. Since the size and the location of the virtual model correspond to the region easily within reach of a user's hands, direct hand-based interaction at the location of focus is very natural and intuitive. On the other hand, controlling all available degrees of freedom when hand-drawing a shape requires elevated concentration from the designer.
Results and user tests (Deisinger, 2000) have shown that designers were indeed able to use these tools, but have also shown that the underlying technology is still too immature and not robust enough to replace traditional procedures. This is especially true for tools that require drawing in 3D. More specialized approaches computerize a certain subtask of shape design, and these have proven more successful, e.g. digital tape drawing (Balakrishnan, 2002).
The styling application has even been tested using markerless tracking (see section 3), and it was usable for interactive surface deformations with the naked hand. Drawing curves, however, which represents a complex constructive task, could not easily be accomplished with this technique.
7.3. Virtual Environment for Product Customization
The TwoView display system is used to support flexible and quick customization of products from a great number of parts. The application developed by (Foursa et al., 2007) is an effective instrument that can be used simultaneously by two users for rapid assembly tasks, allowing engineers and designers to work collaboratively (see Figure 12.). Furthermore, it is directly connected to a manufacturing environment, which is able to produce the product right after customization. An XML-based language is used for the specification of all possible configurations based on a set of predefined parts. Using the VE modeler, a user effectively adds connection information by constructing a virtual product out of these parts. This information is stored and, together with the part descriptions, forms a complete product configuration for use in successive steps of the process chain. Two users can work together collaboratively, which is more effective than a single-user environment (see section 6.3).
7.4. Augmented Reality in School Environments
Technological advances in the field of information and communication technologies enable innovative ways of mediating knowledge. Among them, Augmented Reality is increasingly becoming a field of interest. Integrated into e-learning systems, AR provides innovative ways to transfer knowledge in education. This was the purpose of the Augmented Reality in School Environments project carried out in our department (Wind et al., 2007). The aim was to create an innovative teaching aid, enabling teachers to develop, with moderate effort, new practices for teaching scientific and cultural content to school classes in a comprehensive way. 3D presentations and user-friendly interaction techniques lead to a better understanding of scientific and cultural content, coupled with high student motivation. The students had the possibility to interact together with the virtual objects in a shared virtual space provided by the Spinnstube® display system, and thereby performed learning by doing instead of learning by reading or listening. Furthermore, the new technology promoted team work and collaboration between classes in the same school or even remotely between schools in different countries. A co-located learning environment with four pupils sharing a common work space is shown in Figure 3. The pupils can hand real objects over to each other on a table. The objects are augmented with 3D information using the Spinnstube® display system.
The development of VR technology is driven by two main factors: technological advancements and end-user requirements, the latter being the more important. The appearance of new products on the one hand, and new industrial requirements emerging as a result of continuously growing information flows on the other, require the development of new interaction techniques. Although there are no universal techniques that can be applied in every case, the experience collected by the academic community in recent years allows developers to find appropriate solutions quickly and should always be taken into consideration before designing new 3D applications. However, more effort is required in order to increase the maturity of 3D interfaces and to continuously update this area of knowledge, taking into account the advancements of adjacent areas. This includes achievements both in technical areas, such as tracking technologies and input device engineering, and in human factors science. Combining these areas, suitable immersive interaction methods could be developed on the basis of thorough usability studies. Furthermore, it could be useful to evaluate the usability of interaction techniques initially designed for immersive applications in semi- or non-immersive environments, bringing them closer to our everyday life. This approach could particularly benefit desktop-based environments using large displays that do not necessarily offer stereoscopic viewing. In summary, increasing technical maturity and the many already existing immersive interaction concepts now allow the productive use of spatial interfaces, including the extension of standard desktop systems. However, the existing concepts are still mostly isolated from each other, and the current state of the art has not yet reached the level of providing a common standard for immersive interaction.