Framework for multimodal AR GUI.
Lasers are powerful light sources. With their thin shafts of bright, coloured light, laser beams can provide a dazzling display matching that of outdoor fireworks. With computer assistance, animated laser graphics can generate eye-catching images against a dark sky. Due to technology constraints, laser images are outlines without any interior fill or detail. On a more functional note, lasers assist in the alignment of components during installation.
- laser graphics
- human-robotic interaction (HRI)
- wearable technology
- mobile robot control
- augmented reality
In the past, robots were deployed primarily in industrial scenarios where tasks were repetitive and in a fixed sequence under a structured and well-constrained condition. Robots were programmed and debugged ‘off-line’, before their programs were ported to the shop floor. The tasks may be expected to be repeated many hundreds of thousands, or even millions of times, 24 hours a day, and continuously for a number of years. A typical example of such a scenario would be in the manufacturing of automobiles. In this ‘high volume and low-mix’ manufacturing application, the robot actions are explicitly defined and programmed for the robot to execute. The cost and time spent to program and commission each robot would be negligible considering the number of automobiles produced and the relatively long product cycles.
In ‘low volume and high-mix’ applications, the use of robots may be unattractive due to the relative complexity of robot programming and the related setup costs. As the scope of robotic applications expands, a different mode of interaction is evolving. Recently, more robots have been deployed in semi-structured manufacturing environments or in domestic environments such as homes. Industrial robots are becoming more collaborative in their interactions with humans and are designed to work with humans in the same environment, without the provision of safety enclosures. In a home setting, it is best that the robot takes on the role of a compliant friend who is able and willing to do our bidding. Tasks required of the robot in such environments are expected to be different and non-repetitive, at least within a time scale of hours or minutes.
Hence, it is important that the task of programming and the interaction between human and robot become more natural and intuitive, evolving towards the kind of interaction typically associated with that between humans. The different operating scenarios require new paradigms in the implementation of a Human-Robot Interface (HRI) such that data are clearly presented and easily accessible to all relevant parties. Industrial robots typically present their data on a computer display and often use a keyboard, mouse, or touch pendant as their input device. This setup is not favourable because it divides the human operator's attention between visually following the task procedure and simultaneously monitoring the important parameters on the computer display. Furthermore, in an industrial setting, the need for protective wear would make the use of a mouse, keyboard or touch pendant untenable.
In this chapter, we propose a framework that uses laser graphics to develop an augmented reality application for HRI. This envisages the evolution of HRI to be more natural with the robot providing a greater contribution in formulating the desired outcome. This invariably moves the robot and human towards a relationship exhibiting greater interaction in terms of frequency and quantity of information exchanged during their interactions. In an environment shared between humans and robots, the human needs to be able to recognise the intention of robots and vice versa. This is to avoid the possibility of conflicts and accidents.
For human and robot to interact meaningfully, through mutual understanding, a mechanism for dialog between robots and humans must exist [4, 5]. The human should be able to define and refine his needs. In addition, the robot is required to deliberate, formulate solutions and present appropriate options, in a form suitable for human understanding. On the other hand, a robot should be able to understand its human partner through common conversational gestures frequently used by humans [6, 7], such as by pointing and gazing. There must also be a common frame of spatial referencing [8, 9] to avoid ambiguities.
Augmented reality (AR) technologies are used in our proposed framework to help the human and robot to define their communication and intentions. Two types of AR technologies, ‘see-through’ AR [12, 13] and spatial AR [14, 15], are applied to enable the human to manipulate the robot in a way that the robot can understand. Similarly, these AR technologies help the robot to convey information so that the human has a better understanding of what the robot is doing and of its intentions. Among research efforts involving the programming and control of mobile robots, some make use of ‘see-through’ AR technologies [11, 16], and others utilize spatial AR technologies [8, 17].
In the Mercedes-Benz autonomous concept car programme, laser-generated graphics were proposed to indicate when it was safe for a human to cross the car's path by projecting a moving ‘pedestrian crossing’. In this application, the ability of lasers to project images in a natural environment was exploited effectively. In addition, the need for a mobile robot, or vehicle, to communicate its intention to humans was also elaborated.
The desire for a more natural and intuitive interface for human-robot interaction has been studied by a number of researchers. These studies incorporated laser pointing to assist in defining targets and projected imagery to enhance interaction. In addition, some explored human-mimicking interactions, while others focused on projector devices.
The design of HRI had previously focused only on the interaction between the robot and the human who is controlling the robot. This is a natural omission, as humans and robots had previously been kept separate as a feature of design. As humans and robots intrude into each other’s space, the needs of both humans and robots outside the intended interaction need to be considered. In the sharing of resources, it is important that all parties are aware of each other’s intentions. This chapter identifies this need and describes the use of laser-based line graphics in the provision of such a function.
2. Proposed human-robot interaction framework
Figure 1 presents our proposed framework for Human-Robot Interaction. The components of the framework and their inter-linkages are presented in the schematic. The central element is the interaction kernel, which directs the control flow of the application and updates the model data such as the motion commands, motion trajectories and robot status. In each update loop, the human user could use the user interface, which consists of a multi-modal handheld device, to provide control commands or task information for the robot to execute.
The user interface uses two different types of AR technologies to display the robot information for the human user to visualize. The human user wears see-through AR glasses so that he or she can view more information regarding the robot, as well as the task, in the real world. In addition, the robot has an on-board laser projector to provide spatial AR. The robot projects its status and intentions as words or symbols onto the physical floor or wall, depending on the nature of the desired notification.
In the task support module within the framework, the human interacts with the robot through a dialogue Graphical-User-Interface (GUI), moving towards a defined task, which is acceptable to the human and executed by the robot. The role of the robot is enhanced from the traditional role of a dumb servant to that of a competent ‘partner’. To highlight the need to maintain the higher status of the human in the decision-making hierarchy, we refer to this partnership as a master-partner relationship. It is inferred that the human makes the decisions whilst the robot assists the human user by considering the information on the intended task, as well as the task constraints, to provide appropriate task support to the human. The robot should be capable of providing suggestions to its human master and be able to learn and recognise the human’s intentions. The robot must also be imbued with a knowledge base to allow it to better define the problem. Only with these capabilities will the robot be able to elevate its role towards that of a collaborating partner.
Under the assisted mode, the human’s cognitive load may be expected to be lower [22, 23] than if he were responsible for all aspects of the task. In the performance of a task where the robot is unable to assist the human in an appropriate manner, the human may elect to proceed without robot assistance. This would be the direct mode, where the human operator determines the path, trajectory and operation parameters of the task. The model data, such as the generated robot motion commands, planned trajectories and robot status, are updated accordingly and visualized consistently through the laser projection, the augmented reality display or a 2D graphical user interface.
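The update loop of the interaction kernel, with its assisted and direct modes, can be illustrated with a minimal Python sketch. It is not the implementation used on the robot: the class names, the `planner` callback and the status strings are hypothetical and serve only to show how one set of model data could be updated once per loop and then pushed to every display (laser projection, AR glasses, 2D GUI).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Sequence, Tuple

Waypoint = Tuple[float, float]


class Mode(Enum):
    ASSISTED = "assisted"  # robot plans the path towards a goal given by the human
    DIRECT = "direct"      # human supplies the path without robot assistance


@dataclass
class ModelData:
    trajectory: List[Waypoint] = field(default_factory=list)
    status: str = "idle"


class InteractionKernel:
    """Hypothetical kernel: updates the model data and notifies all views."""

    def __init__(self, planner: Callable[[Waypoint], List[Waypoint]]):
        self.planner = planner  # assumed path planner, goal -> list of waypoints
        self.model = ModelData()
        self.views: List[Callable[[ModelData], None]] = []

    def add_view(self, view: Callable[[ModelData], None]) -> None:
        self.views.append(view)

    def update(self, mode: Mode,
               goal: Optional[Waypoint] = None,
               user_path: Optional[Sequence[Waypoint]] = None) -> ModelData:
        if mode is Mode.ASSISTED and goal is not None:
            self.model.trajectory = self.planner(goal)
        elif mode is Mode.DIRECT and user_path is not None:
            self.model.trajectory = list(user_path)
        self.model.status = "moving" if self.model.trajectory else "idle"
        # Every display receives the same model data, so the views stay consistent.
        for view in self.views:
            view(self.model)
        return self.model
```

Registering the laser projector, the wearable display and a 2D GUI as three views of the same `ModelData` is one simple way to keep their content consistent.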
This chapter presents, as concept verification, the development and evaluation of a user interface module, which uses laser graphics to implement the spatial AR technology to display the robot provided information for the human user to visualize.
3. Hardware configuration of the user interface module
The implementation of the user interface module is illustrated in Figure 2. It has been used to enable the robot to perform a waypoint navigation task, in a known environment, where local features may change and obstacles moved. Intrinsic to the implementation are the hardware devices for the human to interact with the robotic system.
3.1. Wearable transparent display
The human is provided with an Epson Moverio BT-200. It is a wearable device with a binocular ultra-high resolution full colour display. It incorporates a front facing camera and motion sensors. The motion sensors capture the user’s head motion and the camera supports target tracking in the observer’s field of view. This device is depicted on the human operator in Figure 2. Wearing the device allows the user to view his environment as well as any augmented data that is generated, overlaid on the real-world scene. Through the display, the robot system can provide status information and selection menus for the operator to select. The advantage of a wearable transparent LCD display is that it provides the human with an unimpeded view of his environment.
3.2. Multi-modal handheld device
A novel multimodal, wireless, single-handed device is depicted in Figure 3. It comprises five spring-loaded finger paddles, a nine-axis inertial measurement unit (IMU), a laser pointer and a near-infrared LIDAR sensor. Each paddle drives a spring-loaded linear potentiometer, and a position-to-motion mapping is programmed to map individual finger displacements to motion commands for the robot. The hand motions of a user are sensed by the IMU, and gesture recognition is applied to interpret the human gestures. This allows the user to interact with the robot through gestures.
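A position-to-motion mapping of this kind might look like the following Python sketch. The assignment of fingers to commands, the dead band and the use of the thumb paddle as an enable switch are assumptions made for illustration; the chapter does not specify the actual mapping. Displacements are taken to be normalised to the range 0 to 1.

```python
from typing import Sequence, Tuple

DEADBAND = 0.1    # hypothetical: ignore small, unintended paddle movements
MAX_LINEAR = 0.6  # m/s, hypothetical full-scale speed for this sketch


def _scaled(displacement: float) -> float:
    """Clamp to [0, 1], remove the dead band and rescale to [0, 1]."""
    d = min(max(displacement, 0.0), 1.0)
    if d < DEADBAND:
        return 0.0
    return (d - DEADBAND) / (1.0 - DEADBAND)


def paddles_to_velocity(displacements: Sequence[float]) -> Tuple[float, float]:
    """Map five paddle displacements to a planar velocity (vx, vy) in m/s.

    Assumed layout: thumb = enable, index/middle = forward/backward,
    ring/little = move left/right.
    """
    if len(displacements) != 5:
        raise ValueError("expected five paddle displacements")
    thumb, index, middle, ring, little = (_scaled(d) for d in displacements)
    if thumb == 0.0:  # enable paddle released: command no motion
        return 0.0, 0.0
    vx = MAX_LINEAR * (index - middle)
    vy = MAX_LINEAR * (ring - little)
    return vx, vy
```

Treating one paddle as an enable switch is a design choice in this sketch, so that releasing the grip stops the robot.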
A laser pointer is included on the device so that a user can point and define a particular waypoint location or a final destination. The LIDAR sensor is included to assist in user localization by the robot.
This device supports a one-handed gloved operation and was designed for use in an industrial scenario. In an industrial setting, the hands of a human operator are, frequently, gloved, and the use of double-handed devices is viewed as being undesirable from safety considerations.
3.3. Robotic platform
The human-robot partnership framework is implemented on MAVEN [24, 25]. The robot, shown in Figure 4, is a holonomic robot with four mecanum wheels. It hosts an on-board computer for controlling the drive motors and its other robotic services. The embedded computer runs Linux as its operating system, with the Robot Operating System (ROS) installed on top of it.
The maximum forward and lateral speeds of the robot have been limited to 0.6 m/s while the on-the-spot rotational speed has been limited to 0.9 rad/s. The various sensor and behaviour modules that have been installed on the robot include a Hokuyo laser rangefinder module, a USB camera module, an MJPEG server, a localization system, map server, as well as a path planning and a navigation module.
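The speed limits quoted above, together with the mecanum drive, can be illustrated with a short Python sketch. It clamps a commanded body velocity to 0.6 m/s and 0.9 rad/s and then applies the commonly used inverse kinematics for a four-wheel mecanum platform. The wheel radius and chassis dimensions are hypothetical placeholders rather than MAVEN's actual geometry, and the axes are assumed to be x forward, y left and positive rotation counter-clockwise.

```python
from typing import Tuple

MAX_LINEAR = 0.6   # m/s, forward and lateral limit stated in the text
MAX_ANGULAR = 0.9  # rad/s, on-the-spot rotation limit stated in the text


def clamp(value: float, limit: float) -> float:
    return max(-limit, min(limit, value))


def limit_twist(vx: float, vy: float, wz: float) -> Tuple[float, float, float]:
    """Clamp each velocity component independently to its limit."""
    return clamp(vx, MAX_LINEAR), clamp(vy, MAX_LINEAR), clamp(wz, MAX_ANGULAR)


def mecanum_wheel_speeds(vx: float, vy: float, wz: float,
                         wheel_radius: float = 0.05,  # m, hypothetical
                         half_length: float = 0.20,   # m, hypothetical
                         half_width: float = 0.15     # m, hypothetical
                         ) -> Tuple[float, float, float, float]:
    """Wheel angular speeds in rad/s: (front-left, front-right, rear-left, rear-right)."""
    vx, vy, wz = limit_twist(vx, vy, wz)
    k = half_length + half_width
    front_left = (vx - vy - k * wz) / wheel_radius
    front_right = (vx + vy + k * wz) / wheel_radius
    rear_left = (vx + vy - k * wz) / wheel_radius
    rear_right = (vx - vy + k * wz) / wheel_radius
    return front_left, front_right, rear_left, rear_right
```

A pure forward command turns all four wheels at the same speed, while a pure lateral command turns diagonal wheel pairs in opposite directions, which is what gives the platform its holonomic motion.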
The robot is provided with a laser projection-based spatial AR system, which enables the projection of line graphics and text onto a suitable surface. The projected images can be used to augment reality in the traditional manner or to provide indications of the robot's intention or status. In the context of a moving robotic platform, it can project the robot’s intention to move, turn, or stop. The intended path that is planned by the robot can also be projected on to the floor or road surface. During interactions, the laser graphics are used to project markers to confirm destinations or to place virtual objects for the human to confirm its desired position and orientation.
The ability to recognise the robotic platform’s intentions allows humans (and other robots) to adjust their motion to avoid conflicts. This would enhance the safety of humans in the vicinity of the robot.
4. Laser projection-based AR system
A laser projection system has been installed on the robot to provide for a spatial augmented reality. It facilitates the presentation of projected digital AR information such as the robot motion behaviours and its motion trajectories onto the floor surface of the work environment.
Figure 5 depicts a schematic of the implemented laser projection system. To create laser graphics, two tiny computer-controlled mirrors are used to direct the laser beam onto a suitable surface. The first mirror rotates about the horizontal axis while the second mirror rotates about the vertical axis. A pair of galvanometers is used to produce the rotating motions, which aim the laser beam at any point on a square or rectangular raster. Each galvanometer is positioned by changing the electric current through its coil, which sits in a magnetic field. The shaft, to which the mirror is attached, rotates to an angle proportional to the coil current. In this manner, by combining the motions of the two galvanometers in orthogonal planes, the x-y position of the projected laser spot can be changed.
The computer ‘connects the dots’ by rotating the mirrors at a very high speed. This causes the laser spot to move sufficiently fast from one position to another that a viewer sees a single outline drawing. This process is called ‘scanning’, and the computer-controlled mirrors are called galvanometer ‘scanners’. The scanners move from point to point at a rate of approximately 30–40 kpps (thousand points per second). To add more detail to a scene, additional sets of scanners can be used to overcome the limitations in scanning speed.
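The relationship between mirror rotation and spot position can be sketched in Python under a deliberately simplified geometry. The sketch assumes that both mirrors sit at a single point, that the surface is a plane perpendicular to the undeflected beam at a known distance, and that the galvanometer gain (radians per ampere) is a hypothetical constant. It ignores the separation between the two mirrors and the distortion that this introduces in a real scanner. The factor of two arises because a mirror rotated by an angle deflects the reflected beam by twice that angle.

```python
import math
from typing import Tuple


def mirror_angles_for_point(x: float, y: float, distance: float) -> Tuple[float, float]:
    """Mechanical mirror angles (rad) that place the spot at (x, y) on the plane."""
    return 0.5 * math.atan2(x, distance), 0.5 * math.atan2(y, distance)


def spot_position(theta_x: float, theta_y: float, distance: float) -> Tuple[float, float]:
    """Spot position on the plane for the given mechanical mirror angles (rad)."""
    return distance * math.tan(2.0 * theta_x), distance * math.tan(2.0 * theta_y)


def coil_current(theta: float, gain_rad_per_amp: float) -> float:
    """Coil current for a mirror angle, assuming angle = gain * current."""
    return theta / gain_rad_per_amp
```

For example, `mirror_angles_for_point(0.3, -0.2, 1.5)` gives the two angles for a spot 0.3 m to one side and 0.2 m below the optical axis on a surface 1.5 m away, and `spot_position` recovers the same point from those angles.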
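The scanning rate limits how much detail one set of scanners can draw. The short Python sketch below works this out, assuming a hypothetical refresh rate at which the whole drawing must be repeated for the outline to appear steady. The refresh rate is an illustrative figure, not one taken from the system described here.

```python
def points_per_frame(scan_rate_pps: int, refresh_hz: int) -> int:
    """Number of points that can be visited in one repetition of the drawing."""
    return scan_rate_pps // refresh_hz


def max_refresh_hz(scan_rate_pps: int, n_points: int) -> float:
    """Highest repetition rate achievable for a drawing with n_points points."""
    return scan_rate_pps / n_points


if __name__ == "__main__":
    for rate in (30_000, 40_000):
        print(rate, "pps at 25 Hz:", points_per_frame(rate, 25), "points per frame")
```

At an assumed 25 repetitions per second, a 30 kpps scanner can visit 1200 points per drawing and a 40 kpps scanner 1600. A drawing that needs more points must either be repeated more slowly or be split across additional scanner sets.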
A multicolour laser projection system would consist of red, green and blue lasers, each with its individual driver and optics. The drivers also control the intensity of each laser source independently. Red, green and blue laser beams are mixed in the transparent mirror system and the combined beam is subsequently projected onto the mirrors of the galvanometers. Together, these three laser diodes combine their output to produce a white or an ‘infinitely’ varied coloured beam.
4.1. Image generation
An International Laser Display Association (ILDA) interface can be used to import custom graphics, text and effects into laser animation format. Files containing scalable vector pictures or videos are loaded to the graphic controller in a special format. The control software converts these files into a list of sequential points, each of which is characterised by the angular deflections of the galvanometers in the vertical and horizontal planes. The intensity of laser radiation is also controlled via the interface.
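The conversion of a vector outline into a list of sequential points can be sketched as follows. This Python example is an illustration rather than the control software used here. It assumes outline coordinates normalised to the range -1 to 1, signed 16-bit output values for the two deflection axes, a single on/off flag for the laser, and a hypothetical sampling step.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
ScanPoint = Tuple[int, int, bool]  # (x value, y value, laser on)


def to_dac(value: float) -> int:
    """Scale a normalised coordinate in [-1, 1] to a signed 16-bit value."""
    value = max(-1.0, min(1.0, value))
    return int(round(value * 32767))


def polyline_to_points(vertices: Sequence[Point], step: float = 0.05) -> List[ScanPoint]:
    """Sample a polyline into scan points spaced at most `step` apart."""
    if not vertices:
        return []
    x0, y0 = vertices[0]
    # Move to the start of the outline with the laser switched off (blanked).
    points: List[ScanPoint] = [(to_dac(x0), to_dac(y0), False)]
    for (xa, ya), (xb, yb) in zip(vertices, vertices[1:]):
        length = math.hypot(xb - xa, yb - ya)
        n = max(1, math.ceil(length / step))
        for i in range(1, n + 1):
            t = i / n
            points.append((to_dac(xa + t * (xb - xa)),
                           to_dac(ya + t * (yb - ya)), True))
    return points
```

Passing the five vertices of a closed square, for instance, yields one blanked positioning point followed by evenly spaced lit points along the four sides.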
The ILDA laser control standard produces a sequence of digital-to-analog converter (DAC) outputs on differential wire pairs, with an average amplitude of ±24 V for the galvanometer control and ±5 V for the laser diode drivers. When the galvanometers receive a new value for the mirror deflections, they drive the mirrors to the next desired angular position.
4.2. Transformations for inclined surface projection
As the projector frame is not necessarily perpendicular to the projection surface, there will be visible distortions in the source image (Projector Frame) projected onto the surface (Projected Image Frame) as shown in Figure 6.
This distortion needs to be corrected through a pre-warping process that is applied to the projection image, before being projected to the particular surface.
The pre-warping process can be accomplished using the concept of homography in computer vision, as shown in Figure 7. The idea is to find the transformation matrix between the source image frame and the projected image frame, as illustrated in Figure 7 step (1). Typically, a square outline is projected onto the desired surface and the positions of its corners are estimated. The corner positions of this square in the source projector frame are also calculated. The appropriate transformation matrix is then determined from the corresponding points. The details of this method are described in the paper by Rahul, Robert and Matthew. The next step, as seen in Figure 7 step (2), is to apply this transformation to the points of the original image to obtain the pre-warped image frame. Finally, as depicted in Figure 7 step (3), this image is projected onto the inclined surface to produce the undistorted square image (Projected Image Frame).
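The three steps can be illustrated with a small, dependency-free Python sketch. It is a generic four-point homography estimate, not necessarily the exact procedure of the cited paper, and the corner coordinates in the usage example are made-up values. The homography is estimated directly in the surface-to-projector direction, so applying it to the desired outline gives the pre-warped outline without a separate matrix inversion.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 homography mapping four src points to four dst points (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: Matrix, p: Point) -> Point:
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Step (1): square sent by the projector and where its corners were
# measured on the inclined surface (hypothetical values).
projector_square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
measured_on_surface = [(-1.0, -1.0), (1.0, -1.0), (1.4, 1.6), (-1.4, 1.6)]
prewarp = estimate_homography(measured_on_surface, projector_square)

# Step (2): pre-warp the outline that should appear undistorted on the surface.
desired = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
prewarped = [apply_homography(prewarp, q) for q in desired]
# Step (3): `prewarped` is what would be sent to the projector.
```

Because a laser projector draws outlines from a point list, only the outline points need to be transformed, rather than every pixel of a raster image.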
4.3. Camera-projector calibration and software
Our multimodal handheld device is equipped with a laser pointer, which is used to project a marker point for indicating a reference or indicating a chosen item. The marker and its projected position are identifiable by the camera, located close to the laser writer. To determine the position of the marker, a calibration process is performed to obtain the necessary transformation parameters.
The calibration process is performed by using the laser writer to draw a square with known parameters onto the floor within a region, in the camera field of view. The camera captures the image of the square that was projected on the floor. The corners of the square in the camera frame are subsequently extracted. These values, together with the known projector frame, are used to obtain the projector-camera homography. Figure 8 illustrates the procedures.
Figure 9 shows the camera-projector system and demonstrates the use of the laser marker to indicate a position. The system confirms the position of the marker by responding with the projection of an arrow head that points to the marker location. In this scenario, the robot will project an arrow that follows the laser marker indicated by the user.
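The arrow response can be sketched as a short polyline for the laser writer. The Python example below assumes that the marker position has already been converted from camera pixels into the projector frame using the calibration homography. The arrow dimensions and the starting point of the shaft are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def arrow_to_marker(start: Point, marker: Point,
                    head_length: float = 0.1,
                    head_angle_deg: float = 25.0) -> List[Point]:
    """Polyline for an arrow from `start` whose tip lies on `marker`.

    The path is shaft, one barb, back to the tip, other barb, so it can be
    drawn as a single continuous stroke.
    """
    sx, sy = start
    mx, my = marker
    heading = math.atan2(my - sy, mx - sx)
    a = math.radians(head_angle_deg)
    left = (mx - head_length * math.cos(heading - a),
            my - head_length * math.sin(heading - a))
    right = (mx - head_length * math.cos(heading + a),
             my - head_length * math.sin(heading + a))
    return [start, marker, left, marker, right]
```

Recomputing this polyline each time a new marker position is detected would make the projected arrow follow the laser marker as the user moves it.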
5. An implementation
Within the proposed human-robot framework, we identify three different groups of people that may interact with the robot. These people, who are known as interactants, are grouped according to their roles, relating to the robot actions, and on the nature of the information they may require.
Group 1 Interactants – Operator who is responsible for the control and supervision of the robot. The operator is equipped with an LCD display and the Multi-Modal Human Input Device (HID). A wearable see-through transparent LCD display is provided to allow the operator an unobstructed visual awareness of the environment. The LCD display functionality allows for the projecting of high-resolution dialog actions proposed by the partner-robot. In addition, relevant information required by the operator to allow for timely intervention would also be displayed. This information includes communication strength and battery health. With the augmented view, the operator is able to provide the necessary support and commands to the robot. Navigation through the menu options and selection of specific items are executed using the Multi-Modal Human Input Device.
Group 2 Interactants – Observers who are interested in monitoring the tasks being executed. This group of humans may be equipped with the transparent LCD displays where they can share the actions of the human in controlling the tasks. Without the wearable displays, this group of observers would only be able to share in the notifications by the robot through the ‘Laser Writer’. They would, however, not be permitted or able to control the actions of the robot, as control is only permitted through the Multi-Modal Human Input Device. This restriction provides a clear differentiation between the two groups.
Group 3 Interactants – Passersby who are in the vicinity of the interaction, but who are not directly related to, or interested in, the task being executed by the robot. The interest of this group arises from the sharing of common space and the need to accommodate the robot's motion. Predominantly, their interest may be restricted to avoiding the robot and its workspace. They are concerned with the near-term actions of the robot, that is, its current action or its next action. They need to be provided with the ability to identify the robot's actions or intentions. The robot can support this group's need for situation awareness through the visual prompts generated by the laser writer (Table 1).
| |Operator (Group 1 interactant)|Observer (Group 2 interactant)|Passerby (Group 3 interactant)|
|---|---|---|---|
|Wearable transparent LCD display|Provided|May be provided|Not provided|
|Multimodal HID|Provided|Not provided|Not provided|
|Remark|Fully involved in controlling the robot. Views laser indications for status feedback|Involved in collaborating with either the operator or robot. Views laser indications for status feedback|Not involved in the operation. Views laser indications to avoid collisions|
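The distinction between the three groups can also be expressed as a small data structure. This Python sketch is illustrative and its type and attribute names are hypothetical. It encodes the equipment listed for each group and the rule that control of the robot is only possible through the Multi-Modal Human Input Device.

```python
from dataclasses import dataclass
from enum import Enum


class Provision(Enum):
    PROVIDED = "provided"
    OPTIONAL = "may be provided"
    NOT_PROVIDED = "not provided"


@dataclass(frozen=True)
class InteractantGroup:
    name: str
    wearable_display: Provision
    multimodal_hid: Provision

    def can_control_robot(self) -> bool:
        # Control is only permitted through the multimodal HID.
        return self.multimodal_hid is Provision.PROVIDED

    def may_see_ar_dialog(self) -> bool:
        return self.wearable_display is not Provision.NOT_PROVIDED


OPERATOR = InteractantGroup("Operator", Provision.PROVIDED, Provision.PROVIDED)
OBSERVER = InteractantGroup("Observer", Provision.OPTIONAL, Provision.NOT_PROVIDED)
PASSERBY = InteractantGroup("Passerby", Provision.NOT_PROVIDED, Provision.NOT_PROVIDED)
```

All three groups can view the laser indications, since these require no equipment, so that capability is not modelled as an attribute here.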
Figure 10 shows the dialog generated by the wearable display. Each robot is equipped with a unique augmented reality marker for facilitating the process of overlaying the computer-generated information over the real scene. The operator, with the multimodal hand controller, will have the capability of visually selecting the options suggested by the robot to complete the task. This offers a more intuitive and convenient way to interact with the robot.
While the GUI on the wearable display enables the operator to intuitively control and monitor the robot, other humans who share the same working environment (passersby) might encounter difficulties in accommodating the robot’s motion and inadvertently cross into its intended path. The laser writer is provided to overcome this problem by giving passersby visual indications of the robot’s intentions. Figure 11 shows one possible implementation of this system. In this scenario, the laser system allows the robot to project the direction of its trajectory before moving from one point to another. In particular, the robot is able to indicate ‘stop’, ‘forward’, ‘backward’, ‘turn left’ and ‘turn right’ as text or line-graphic symbols.
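One way the robot could select which of the five indications to project is to classify its next velocity command. The Python sketch below is an assumption for illustration: the thresholds, the priority given to turning over translation, and the convention that a positive rotation rate means a left (counter-clockwise) turn are not taken from the implementation described here.

```python
def intention_label(vx: float, wz: float,
                    linear_eps: float = 0.02,
                    angular_eps: float = 0.05) -> str:
    """Pick the notification for a forward speed vx (m/s) and turn rate wz (rad/s)."""
    if abs(wz) > angular_eps:
        return "turn left" if wz > 0 else "turn right"
    if vx > linear_eps:
        return "forward"
    if vx < -linear_eps:
        return "backward"
    return "stop"
```

The returned label could then be looked up in a table of pre-built text or symbol outlines and passed to the laser writer before the motion starts.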
The proposed framework for the dual-modality AR system was implemented for deployment in a laboratory setting where humans and mobile robots are expected to coexist and operate in close proximity. The mobile platform was ‘let loose’ within the laboratory without any prior briefing to the other laboratory users. Third party human-robot interaction was observed, and a ‘qualitative’ feedback was solicited from the ‘unsuspecting’ human. Two scenarios were evaluated. The first scenario was without the laser writer augmentation, and the second with the laser writer function activated. In this second scenario, the robot projected its action and indicated its motion trajectory.
Invariably the response was positive, especially when the robot was stationary and the human was unsure of the response required of them. In instances when the robot’s route was identified, humans would find an alternative route in an attempt to avoid the robot. The human response of avoiding the robot’s workspace gave the robot the option of increasing its speed, resulting in better operational performance.
As a natural extension of the unmanned deployment of mobile robots, the framework of providing notification of the robot’s actions is recognised to be supportive of autonomous transportation in an urban scenario. In such a scenario, vehicles are larger and environments are shared with humans who are less familiar with robot interaction. The passersby’s awareness of the robot’s actions could be improved using the laser writer information system described in this chapter. When the vehicle is avoiding or stopping for a human, appropriate indications may be provided to the human through the laser notification system.
In this chapter, we highlight the application of laser-generated outline graphics as a viable addition to ‘augmented reality’. The images are bright and of high contrast. This lends itself to applications in natural environments, both indoors and outdoors, where the ambient lighting is expected to be relatively bright. Whilst unfilled graphics may be considered a deficiency when attempting to generate a visually appealing GUI menu with filled graphical images, they can be used effectively to project images and outline text onto the surrounding surfaces. These images can be viewed without the aid of any wearable devices in a natural environment.
In addition, we proposed a design framework for GUI implementation in human-robot shared environments. In this framework, we identify the specific requirements of the first party (the human in direct control of the robot) and of the third party (humans in the operating vicinity of the robot). The needs of each group are different and can be optimally addressed using different AR modalities. Laser-generated line graphics were deployed as a means of projecting messages and notifications to humans in the vicinity. This is perceived as supporting a safer working environment that humans and robots can share. In addition, humans gain a better awareness of the robot’s actions, which reduces the possibility of accidents. The behaviour of humans avoiding the robot’s workspace, where such options exist, produces opportunities for faster platform speeds and improved task efficiencies.
A mobile robot should indicate its intentions to humans in its vicinity as it executes its task. This is a necessary requirement when the general public and mobile robots share and intrude into each other's workspace. In recent times, we have witnessed the deployment of automated transportation of food items in restaurants and of humans in autonomous vehicles. These are deployments in shared environments. The issues highlighted are relevant and worthy of consideration in their design and implementation. The laser writer provides a simple and effective way to improve passersby's situation awareness in a naturally bright environment.
This research was carried out at the School of Mechanical & Aerospace Engineering and BeingThere Centre, Nanyang Technological University. The BeingThere Centre is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office. In addition, the A*STAR Industrial Robotics Program is gratefully acknowledged.