Recently, eye-gaze input systems have been developed as novel human–machine interfaces [1-10]. Their operation requires only eye movements by the user. Based upon such systems, many communication aids have been developed for people with severe physical disabilities, such as amyotrophic lateral sclerosis (ALS). Eye-gaze input systems commonly employ non-contact eye-gaze detection for which an incandescent, fluorescent, or LED lamp can be used as the source of infrared or natural light. Detection based on infrared light can detect eye gaze with a high degree of accuracy [1-3] but requires an expensive device. Detection based on natural light uses ordinary devices and is therefore cost-effective [4,5]. However, an eye-gaze input system for natural light has a low degree of accuracy.
We have previously developed an eye-gaze input system for people with severe physical disabilities [8-10]. This system uses a personal computer (PC) and a home video camera to detect eye gaze under natural light. The camera (e.g., a DV camera) can easily be connected to a PC through an IEEE 1394 interface. The frames taken by the camera can be analyzed in real time using the DirectShow library by Microsoft. We developed image analysis software to detect eye gaze. Our eye-gaze input system runs the software on Windows. This system does not require any special devices and is easily customizable. Therefore, this system is not only cost-effective but also versatile. Moreover, it can be operated under natural light and thus is suitable for personal use.
2. Current eye-gaze input systems
Many systems or devices have been developed as communication aids for ALS patients. For example, the E-tran (eye transfer) frame is used for communication between ALS patients and others. The E-tran frame is a conventional device and its structure is very simple. It is a transparent plastic board with characters, such as the alphabet, printed on it. When using the E-tran frame, a communication partner (helper) holds it over the user’s face. The user gazes at the place where the desired character is positioned, and the helper moves the E-tran frame until the user’s eye gaze meets that of the helper. The helper can thus determine the character from the user’s eye gaze. Any user who can gaze at the characters on the E-tran frame can therefore communicate with others. In addition, the E-tran frame does not require a power supply and is therefore highly portable. However, considerable skill is required to use the E-tran frame.
The row–column scanning system is also used to aid the communication of ALS patients. This system can be operated with one switch. In other words, the user can input characters or operate a PC by using their residual physical function. The row–column scanning system requires only simple hardware. For example, if the user employs the on-screen keyboard installed on Windows, the user can operate many Windows software applications. However, input with the row–column scanning system takes considerable time, because the system scans the rows and columns of the keyboard using only one switch. To improve upon this situation, a new method for row–column scanning has been reported. This method optimizes the speed of row–column scanning by using a Bayesian network for machine learning. However, a patient with severe ALS cannot use the row–column scanning system, even though it needs only a single switch.
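The time cost of single-switch scanning can be illustrated with a minimal sketch. It assumes a simple scheme in which rows are highlighted one at a time until the switch is pressed, after which the columns of the chosen row are highlighted one at a time. The function names, the keyboard size, and the step duration are hypothetical and are not taken from any particular scanning system.

```python
def scan_steps(row: int, col: int) -> int:
    """Number of highlight steps before the key at (row, col) is reached.

    Rows are highlighted one by one until the switch is pressed, then the
    columns of that row are highlighted one by one (0-based indices).
    """
    if row < 0 or col < 0:
        raise ValueError("row and col must be non-negative")
    return (row + 1) + (col + 1)


def selection_time(row: int, col: int, step_seconds: float = 1.0) -> float:
    """Time to reach one key, assuming a fixed (hypothetical) step duration."""
    return scan_steps(row, col) * step_seconds


# Hypothetical 4 x 10 on-screen keyboard: the key in the last row and last
# column needs 4 + 10 = 14 highlight steps.
worst_case = scan_steps(3, 9)
```

Under these assumptions the number of steps grows with the sum of the row and column positions, which is why a key far from the scanning origin is slow to reach.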
Our eye-gaze input system mitigates these weaknesses. In a general eye-gaze input system, the icons displayed on the PC monitor are selected by the user gazing at them, as shown in Figure 1. These icons are called indicators and are assigned to characters or functions of the application program. The eye-gaze input has to detect the user’s gaze in order to ascertain the selected indicator. Many eye-gaze detection methods have been developed in the past. Several systems use the EOG (electro-oculogram) method for eye-gaze detection, which detects eye gaze by the difference in the electrical potential between the cornea and the retina. It is a contact method that uses electrodes placed around the eye. Although cost-effective, some users find that long-term use of the electrodes is uncomfortable. Therefore, many systems detect eye gaze using non-contact methods [1-10]. Specifically, the user’s gaze is detected by analyzing images of the eye (and its surrounding skin) captured by a video camera. To classify the many indicators, most conventional systems use special devices such as infrared light [1-3] or multiple cameras. Nevertheless, in order to be suitable for personal use, the system should be inexpensive and user-friendly. Therefore, a simple system using a single camera in natural light is desirable [4-6]. However, natural-light systems often have low accuracy and are capable of classifying only a few indicators. This makes it difficult for users to perform a task that requires many functions, such as text input. To solve these problems, a simple eye-gaze input system that can classify many indicators is needed.
3. Eye-gaze detection by image analysis
Eye gaze is defined as a unit vector in a three-dimensional coordinate space. The origin of this unit vector is the center of the eyeball. Generally, the user’s gaze is detected on a two-dimensional plane, where it has horizontal and vertical components. The method of tracking the iris (the colored part of the eye) is the most popular method for eye-gaze detection using image analysis in natural light [4-6]. For example, if the edge between the iris and the sclera (the white part of the eye) is estimated by image processing, the appropriately approximated ellipse of the edge shows the location of the iris. However, it is difficult to distinguish the iris from the sclera by image analysis, because the edge between the iris and the sclera is not sharp. In addition, if a large part of the iris is hidden by the upper and lower eyelids, the measurement errors increase, because the obscuring of the iris by the eyelids causes estimation errors in the delineation of the iris. To resolve these issues, we propose a new image analysis method for detecting eye gaze in both the horizontal and vertical directions. This detection method is based on the limbus tracking method. Our eye-gaze detection method can obtain the coordinates of the user's gaze point.
In our eye-gaze detection method, the video camera records images of the user’s eye from a distant location (the distance between the user and camera is approximately 70 cm), and then this image is enlarged. The head movements of the user induce a large error in the detected gaze. We compensated for the head movements by tracing the location of a corner within the eye.
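One simple way to picture this compensation is sketched below. It assumes that the light intensity is measured inside a rectangular region of the eye image and that this region is translated by the displacement of the tracked eye corner. The text does not state how the compensation is actually carried out, so the translation model and all names here are assumptions for illustration only.

```python
from typing import Tuple

Point = Tuple[int, int]            # (row, column) in image coordinates
Rect = Tuple[int, int, int, int]   # (top, left, bottom, right)


def shift_region(region: Rect, reference_corner: Point, current_corner: Point) -> Rect:
    """Translate a measurement region by the displacement of the eye corner.

    reference_corner is the corner location stored at calibration time and
    current_corner is its location in the current frame.
    """
    d_row = current_corner[0] - reference_corner[0]
    d_col = current_corner[1] - reference_corner[1]
    top, left, bottom, right = region
    return (top + d_row, left + d_col, bottom + d_row, right + d_col)
```

For example, if the corner has moved 3 pixels down and 4 pixels to the left since calibration, the region is moved by the same amount before the intensities are measured.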
3.1. Horizontal gaze detection
The limbus tracking method is an eye-gaze detection method using the difference in reflectance between the iris and the sclera. By this method, eye gaze can be estimated with relative ease, and therefore it has been used since the 1960s. A typical eye-gaze detection system using the limbus tracking method irradiates the eyeball of a subject with infrared light. The eye gaze of the subject is detected by measuring the reflected light using optical sensors such as photodiodes.
We have developed a new eye-gaze detection method that is used under natural light [8,9]. An overview of the proposed horizontal gaze detection method is shown in Figure 2. The difference in reflectance between the iris and the sclera is used as follows: the gaze is estimated from the difference between the integral values of the light intensities in Areas A and B, as shown in Figure 2. We designate this differential value as the horizontal eye-gaze value, which gives a value for the horizontal gaze component. The relation between the horizontal eye-gaze value and the angle of sight is nearly proportional. Therefore, the system can be calibrated using this relation.
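The computation described above can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes a grayscale eye image stored as rows of pixel intensities, rectangular Areas A and B, a sign convention of A minus B, and an ordinary least-squares line as the calibration model for the nearly proportional relation. All function names are hypothetical.

```python
from typing import Sequence, Tuple

Image = Sequence[Sequence[float]]   # grayscale image as rows of pixel intensities
Rect = Tuple[int, int, int, int]    # (top, left, bottom, right), bottom/right exclusive


def integrate(image: Image, rect: Rect) -> float:
    """Sum of the light intensities inside a rectangular area."""
    top, left, bottom, right = rect
    return float(sum(sum(row[left:right]) for row in image[top:bottom]))


def horizontal_gaze_value(image: Image, area_a: Rect, area_b: Rect) -> float:
    """Difference between the integral intensities of Areas A and B."""
    return integrate(image, area_a) - integrate(image, area_b)


def fit_line(values: Sequence[float], angles: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of angle = slope * value + intercept from calibration data."""
    n = len(values)
    if n < 2 or n != len(angles):
        raise ValueError("need at least two matching calibration samples")
    mean_v = sum(values) / n
    mean_a = sum(angles) / n
    variance = sum((v - mean_v) ** 2 for v in values)
    if variance == 0:
        raise ValueError("calibration values must not all be equal")
    slope = sum((v - mean_v) * (a - mean_a) for v, a in zip(values, angles)) / variance
    return slope, mean_a - slope * mean_v
```

In such a sketch, the gaze values recorded while the user looks at the calibration indicators would be passed to `fit_line` together with the known indicator angles, and the fitted line would then convert later gaze values into angles of sight.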
3.2. Vertical gaze detection
An overview of the proposed vertical gaze detection method is shown in Figure 3. The vertical eye-gaze is also detected by the limbus tracking method. In other words, the light intensity of the eye image is used to detect the vertical eye gaze. Specifically, the vertical eye gaze is estimated from the integral value of the light intensity in Area C that is not hidden by the eyelids. We designate this integral value as the vertical eye-gaze value, which gives a value for the vertical gaze component. The relation between the vertical eye-gaze value and the angle of sight is a characteristic function. Therefore, the system can be calibrated using this relation. Many application programs for eye-gaze input need only low-accuracy measurements that involve three directions of vertical eye gaze (top, center, and bottom). Therefore, our practical eye-gaze input system detects only three general directions of vertical eye gaze.
In reality, the light-intensity distribution of the eye image changes with iris movement, and vertical gaze can be detected using this change. The system stores vertically aligned images of the eye gazing at the indicators. The light-intensity distributions (the results of a one-dimensional projection) are calculated from these eye images as reference data. The user’s vertical gaze can be detected by pattern matching based on these reference data. An overview of the method is shown in Figure 4, which illustrates the detection of each of the three gaze directions: top, center, and bottom. The wave patterns at the right of the eye illustrations show the light-intensity distributions. We confirmed that with increasing reference data the method can distinguish five to seven vertical gaze directions.
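The pattern-matching step can be illustrated with a short sketch. It assumes that the one-dimensional projection is the sum of the intensities along each image row and that matching selects the reference with the smallest sum of squared differences. Neither choice is specified in the text, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence

Image = Sequence[Sequence[float]]   # grayscale image as rows of pixel intensities


def vertical_projection(image: Image) -> List[float]:
    """Sum the intensities of each row to obtain a 1-D intensity distribution."""
    return [float(sum(row)) for row in image]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of squared differences between two distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum((a - b) ** 2 for a, b in zip(p, q))


def classify_vertical(image: Image, references: Dict[str, List[float]]) -> str:
    """Return the label of the stored reference distribution closest to the image."""
    profile = vertical_projection(image)
    return min(references, key=lambda label: squared_distance(profile, references[label]))
```

Here `references` would hold one stored distribution per gaze direction (for example "top", "center", and "bottom"), recorded while the user gazes at the corresponding indicators. Adding more labeled references is how such a scheme would be extended to more directions.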
4. Input interfaces based on eye gaze
We developed a new eye-gaze input system using the methods discussed in Section 3. This system detects the eye gaze of a user under natural light and operates the application programs for communication aids such as text input. Two interfaces to operate the application programs have been developed. One of the interfaces has indicators displayed on the PC monitor. The functions of application programs are executed by gazing at these indicators. The other interface allows eye gaze to control the mouse cursor. By means of this interface, a user can operate the general Windows software. We describe our eye-gaze input system and its input interface below.
4.1. Eye-gaze input system
Our eye-gaze input system comprises a PC, a home video camera, and an IEEE 1394 interface for image capture from the camera. For eye-gaze detection, the computer runs image analysis software on Windows (XP, Vista, or 7). This system (illustrated in Figure 5) does not require a device exclusively for image processing. The characteristics of eye gaze vary from one individual to another. Therefore, the eye-gaze input system requires calibration. The indicators for calibration are shown in Figure 6. Users must calibrate the system before using it for tasks. After the camera location is adjusted, the calibration begins. While the calibration is being performed, users gaze at each indicator when its color switches to red. Our eye-gaze input system has two types of indicators, which are specific to each application. In particular, the five calibration indicators shown in Figure 6(a) are used for the input interface with a workspace displayed at the center of the PC monitor. The workspace is used for displaying an application software window. In addition, the nine calibration indicators shown in Figure 6(b) are used for the interface to operate the mouse cursor, because this interface requires a higher accuracy of measurement.
4.2. Interface using indicators displayed on PC monitor
An interface suitable for eye-gaze input can be designed when developing application programs for an eye-gaze input system. For example, an interface using indicators is most commonly used. The indicators are displayed on the windows of the application program and are selected by the gaze of the user. The arrangement of indicators depends on the measurement accuracy of the eye-gaze input system. Our system treats each indicator as one of 27 indicators (3 rows and 9 columns), which permits high accuracy. However, in the interest of usefulness, our practical eye-gaze input system utilizes an interface with 5 to 12 indicators. The arrangement patterns of the indicators are shown in Figure 7.
The arrangement pattern in Figure 7(a) is used when the eye-gaze input system needs five or fewer indicators. This arrangement pattern suits application programs that require only a small number of indicators but a wide display area. Therefore, this arrangement is best used by application programs such as a television program viewer. The arrangements in Figures 7(b) and (c) are used when the eye-gaze input system needs 6 to 12 indicators. In particular, some kinds of application programs require 6 to 10 indicators. These application programs utilize the arrangement in Figure 7(b). For example, fixed-phrase mailers or Web browsers use this arrangement pattern.
In addition, text input is a popular application for eye-gaze input. Around 60 indicators are required to input Japanese text. In fact, English text input systems require a similar number of indicators, because the English language contains uppercase letters, lowercase letters, and punctuation marks. Moreover, control keys are required for text input. If around 60 indicators are displayed on a screen, the window for text input cannot be arranged on the same screen. In other words, its operability is greatly decreased. Therefore, we developed a text input system for Japanese and English using the indicators shown in Figure 7(c). Its interface requires two selections: one for character group selection and another for character input (the details are given in Section 5). For an eye-gaze input system with the indicators shown in Figure 7(a), (b), or (c), there is no necessity to detect eye gaze when the user gazes at the center of the PC monitor. Therefore, an eye-gaze input system using any of these arrangements is calibrated with the simple indicators shown in Figure 6(a).
Generally, the eye-gaze input decision with such an interface requires the detection of not only the user's gaze point but also the user's command for an indicator (assigned character) selection. An input decision can be made by using eye fixation, measuring the time for which the eye fixates on a target such as one of the indicators. The abovementioned interface using indicator selection requires special application programs. However, the operability of the system can be increased by using suitably designed indicators. The users need to sufficiently practice operating this system to operate the application programs at a faster pace.
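An input decision based on eye fixation can be sketched as a dwell-time rule. The example assumes that the detection software reports, at each time stamp, which indicator (if any) is being gazed at, and that an indicator is selected once the gaze has stayed on it for a fixed dwell time. The 1.0 s default and the function name are assumptions, not values from the system described here.

```python
from typing import Iterable, List, Optional, Tuple


def dwell_selections(
    samples: Iterable[Tuple[float, Optional[str]]],
    dwell_seconds: float = 1.0,
) -> List[str]:
    """Return the indicators selected by fixation.

    samples are (timestamp in seconds, indicator id or None) pairs in time
    order. An indicator is selected once per fixation, when the gaze has
    stayed on it for at least dwell_seconds.
    """
    selected: List[str] = []
    current: Optional[str] = None
    start = 0.0
    fired = False
    for timestamp, indicator in samples:
        if indicator != current:
            # Gaze moved to another indicator (or away): restart the timer.
            current, start, fired = indicator, timestamp, False
        elif current is not None and not fired and timestamp - start >= dwell_seconds:
            selected.append(current)
            fired = True  # do not repeat the selection during the same fixation
    return selected
```

A shorter dwell time speeds up input but makes unintended selections more likely, so the threshold is typically a per-user setting.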
4.3. Interface for mouse operation
When users operate a PC with the mouse, they gaze at the mouse cursor routinely. In other words, if the mouse cursor can be moved to the user's gaze point, the eye gaze of the user can be utilized for an input interface. Our eye-gaze input system can obtain the coordinates of the user's gaze point. In other words, when a user gazes at a point on the PC screen, the mouse cursor moves to that point.
If the mouse cursor is controlled by eye gaze, the user gazes over the entire area of screen. Hence, the eye gaze of the user must be detected with a high degree of accuracy. Therefore, a system with this interface is calibrated using the indicators shown in Figure 6(b). By using an interface for mouse operation, the general Windows software can be operated without any special application programs. In addition, Windows operations such as copying a file can be performed by eye gaze. The method for operating this interface is clear and simple; therefore, this interface is user-friendly.
We developed special application programs for eye-gaze input. However, users may want to run other Windows software. By selecting the abovementioned interface, users can fulfill this desire. However, the icons and menu items of the general Windows software are small for eye-gaze input. In other words, it is difficult to select the icons and menu items with mouse operations by eye gaze. When users gaze at one point on the object viewed, their eye fixation has micromotions (called involuntary eye movements). Therefore, it is difficult to keep the mouse cursor on the viewed object for the time required for eye-gaze input. In addition, the general Windows software and the eye-gaze detection software are executed on Windows separately. Hence, the general Windows software cannot recognize the icon or menu item that is gazed at by the user.
To resolve these issues, the interface for mouse operation requires a different method for input decisions. We think that an eye blink should provide the information used in this input decision method. The details of this method are given in Section 6.
5. Application programs for eye-gaze input
Our research group has developed application programs for eye-gaze input to assist ALS patients. The interface of the application programs employs indicators displayed on the PC monitor, as shown in Section 4.2. We present our application programs below.
5.1. Text input system
Text input is the most important function to aid communication by ALS patients. Inputting text by eye gaze increases the convenience of their communication. We designed indicators for text input by eye gaze, considering the success rate of gaze selection with our proposed system. There are 12 indicators (2 rows and 6 columns). With this system, users can input Japanese or English text at a faster pace. However, around 60 indicators are required to input Japanese text. Similarly, 12 indicators are insufficient for English text input, because the English language contains uppercase letters, lowercase letters, and punctuation marks. To resolve this problem, we designed a new interface through which users can select any character (English or Japanese) by first choosing the indicator group. An overview of the interface is shown in Figure 8.
This interface requires two selections: one for character group selection (e.g., Group “A–E”) and another for character input. Letters require two selections, and punctuation marks (“etc.” in Figure 8) also require two. However, commonly used characters such as the space (“SPC” in Figure 8) require just one selection. To input the character “C,” the user first selects the indicator for Group “A–E” and then the indicator for the uppercase letter “C.” Japanese characters can be input in the same way.
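The two-step selection can be sketched as a small decoder. Only the Group "A–E" and the "SPC" indicator are taken from the description above; the remaining group and the exact layout of Figure 8 are not reproduced, so the tables below are hypothetical and written with plain ASCII hyphens.

```python
from typing import Dict, List, Optional

# Hypothetical character groups; only "A-E" and "SPC" appear in the text.
GROUPS: Dict[str, List[str]] = {
    "A-E": ["A", "B", "C", "D", "E"],
    "F-J": ["F", "G", "H", "I", "J"],
}
# Frequently used characters that need a single selection.
DIRECT: Dict[str, str] = {"SPC": " "}


def decode(selections: List[str]) -> str:
    """Convert a sequence of selected indicators into the text they input."""
    text: List[str] = []
    pending_group: Optional[str] = None
    for indicator in selections:
        if pending_group is not None:
            # Second selection: a character inside the chosen group.
            if indicator not in GROUPS[pending_group]:
                raise ValueError(f"{indicator!r} is not in group {pending_group!r}")
            text.append(indicator)
            pending_group = None
        elif indicator in DIRECT:
            text.append(DIRECT[indicator])
        elif indicator in GROUPS:
            # First selection: remember the group and wait for the character.
            pending_group = indicator
        else:
            raise ValueError(f"unknown indicator {indicator!r}")
    if pending_group is not None:
        raise ValueError("selection sequence ends inside a group")
    return "".join(text)
```

For instance, `decode(["A-E", "C", "SPC", "A-E", "A"])` yields the text `C A` from five selections: two for each letter and one for the space.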
5.2. Support system for personal computer operation
If users can operate general Windows functions by eye gaze, they can operate commonly used application programs such as mailers and Web browsers. Users can input text to these applications using the abovementioned interface for text input. Such applications are normally operated by keyboard or mouse, especially the latter. When an eye-gaze input system is used, the functions of the application programs must be assigned to indicators. We have extended our system to general Windows functions. Many guidelines have been proposed for the development of application programs for the disabled. To satisfy these guidelines, we assign the following Windows functions to indicators: cursor control; execution of application programs; use of shortcut keys to copy, cut, and paste; and selection of items from a menu bar. Hence, commercial applications can be used with our system. The Windows functions are organized as shown in Figure 9, and the user can switch between indicator groups. The “main operation screen” has indicators for cursor operation, object selection, decision input (enter), etc. The “extended operation screen” has indicators for operating the mouse, activating the desktop, switching or closing the window, etc. Using these indicators, all general Windows functions can be performed.
As described in Section 4.3, general Windows functions can also be operated by controlling the mouse cursor by eye gaze. However, indicators that include the commonly used functions enable comfortable and high-speed operation of Windows.
5.3. Mailer software for sending fixed phrases
By using the text input system described in Section 5.1, users can input English text by eye gaze at approximately 16 characters per minute. This input rate is not adequate to send an emergency message. To resolve this concern, we developed mailer software for sending fixed phrases by eye gaze. This software requires only a few steps for sending a message. In addition, combinations of the fixed phrases can be sent, and each phrase is customizable. Users can send a message to a pager or a smartphone outside the room. Therefore, users can communicate their requests (such as “I would like a drink of water”) to their helpers. A screenshot of this mailer software is shown in Figure 10.
5.4. Web browser using eye gaze
With the popularization of the Internet, people now frequently browse Web pages to collect information. We paid great attention to this point; hence, we developed a Web browser for the eye-gaze input system. Generally, a Web page is related to others via hyperlinks. In addition, users often input text to a Web page when using a social networking service (SNS), online shopping, etc. When browsing these Web pages, the users make selections via hyperlinks, radio buttons, and text boxes that must be detected on the Web pages. Our system determines the locations of these selectable objects on a Web page. The system then stores the locations of these objects. Consequently, the mouse cursor jumps to the object of the candidate input. Therefore, our system enables Web browsing at a faster pace. An overview of the object selection method that uses information on the arrangement is shown in Figure 11.
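The idea of storing object locations and moving the cursor between them can be sketched as follows. The sketch assumes that the selectable objects are ordered by their arrangement on the page (top to bottom, then left to right) and that the cursor steps to the next or previous object. The actual ordering and selection rule of Figure 11 are not given in the text, so the class names and this rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PageObject:
    kind: str   # e.g. "link", "radio", or "text"
    x: int      # horizontal position on the page, in pixels
    y: int      # vertical position on the page, in pixels


class ObjectNavigator:
    """Keeps the stored object locations and the current cursor target."""

    def __init__(self, objects: Iterable[PageObject]) -> None:
        # Order by arrangement: top to bottom, then left to right.
        self._objects: List[PageObject] = sorted(objects, key=lambda o: (o.y, o.x))
        if not self._objects:
            raise ValueError("the page has no selectable objects")
        self._index = 0

    @property
    def current(self) -> PageObject:
        """The object the mouse cursor should currently sit on."""
        return self._objects[self._index]

    def step(self, delta: int = 1) -> PageObject:
        """Jump to the next (delta=1) or previous (delta=-1) object, wrapping around."""
        self._index = (self._index + delta) % len(self._objects)
        return self.current
```

Because the cursor jumps directly between stored locations, the user never has to hold the gaze on a small hyperlink, which is the motivation for this design.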
5.5. Television program viewing system
Studies have reported that the three principal functions of an environmental control system are for a television, reclining bed, and air conditioner. In other words, physically disabled people such as ALS patients would like to operate the functions of these devices. We focused our attention on a PC with a television tuner, and developed a television program viewing system for eye-gaze input. This system displays television programs on the PC screen along with the indicators for function control. The five indicators for the television program viewing system are displayed in the upper part of the screen as shown in Figure 12. The functions of a channel selector, volume control, and power switch are assigned to the five indicators. When users view television programs, the indicators are not required. Therefore, we set up two modes designated as viewer mode and control mode. In control mode, the five indicators are displayed on the screen. In viewer mode, the five indicators are not displayed, but an indicator for mode change is displayed. If the user gazes at the indicator for mode change, the other five indicators appear instead.
6. Next-generation eye-gaze input system
As described above, we have developed not only an eye-gaze input system for natural light but also an application system. When the application programs are used in combination, the quality of life (QoL) of ALS patients is improved. However, in order to provide additional improvements in QoL, a more versatile environment for eye-gaze input is required. For example, some users would like to explore the newer Web services, such as Facebook and Twitter. It is difficult to develop new software for these users individually. To resolve this problem, we need to improve our interface for mouse operation by eye gaze (presented in Section 4.3).
As shown in Section 4.2, if a user gazes at the indicator for a desired input, that input is easily decided upon, because the application program can recognize the indicator viewed. The interface for mouse operation can move the cursor to the gaze point of the user; however, it is difficult for this type of interface to recognize the icon viewed. To resolve this problem fundamentally, we are developing an interface that utilizes information on eye gaze and eye blinks. Many such interfaces have been proposed, but no truly practical system has been developed. When this type of interface is used, unconscious eye blinks also occur, and input errors are often attributable to them. This phenomenon is known as the “Midas touch problem.”
We think that if voluntary blinks can be distinguished from involuntary (unconscious) ones, the input errors can be significantly decreased. In fact, we are presently developing an eye-gaze input system that can recognize voluntary blinks. Most conventional methods for measuring eye blinks analyze images of the eye (and its surrounding skin) captured by a video camera. Commonly used NTSC video cameras are capable of detecting eye blinks. However, it is difficult for these cameras to measure the detailed temporal changes that occur during the process of eye blinking, because an eye blink occurs relatively fast (within a few hundred milliseconds). Our eye-gaze input system also uses an NTSC camera, and therefore this problem must be taken into account.
NTSC video cameras capture moving images at 60 fields/s, and these field images are mixed to produce field-interlaced images at a rate of 30 frames/s (fps). We have proposed a new method for using NTSC video cameras to measure eye blinks. This method utilizes the non-interlaced eye images captured by an NTSC video camera. These images are odd- and even-field images in the NTSC format and are generated by splitting NTSC frames (interlaced images). The proposed method has a time resolution that is twice that of the NTSC format. Therefore, the detailed temporal changes that occur during the process of eye blinking can be measured. By using this new method for eye blink detection, we can develop a next-generation eye-gaze input system that is more user-friendly.
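The field-splitting step can be sketched in a few lines. The example assumes that each interlaced frame is available as a list of scan lines, that alternate lines belong to the two fields, and that the nominal rate is 30 frames/s. Which field is captured first depends on the video source, so the `first_field_first` flag and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]   # interlaced frame as a list of scan lines


def split_fields(frame: Frame) -> Tuple[Sequence[Sequence[int]], Sequence[Sequence[int]]]:
    """Split one interlaced frame into its two fields (alternate scan lines)."""
    return frame[0::2], frame[1::2]


def field_sequence(
    frames: Sequence[Frame],
    frame_rate: float = 30.0,
    first_field_first: bool = True,
) -> List[Tuple[float, Sequence[Sequence[int]]]]:
    """Return (timestamp, field) pairs at twice the frame rate."""
    field_period = 1.0 / (2.0 * frame_rate)
    fields: List[Tuple[float, Sequence[Sequence[int]]]] = []
    for index, frame in enumerate(frames):
        first, second = split_fields(frame)
        if not first_field_first:
            first, second = second, first
        start = index / frame_rate
        fields.append((start, first))
        fields.append((start + field_period, second))
    return fields
```

With 30 frames/s as input, the returned fields are spaced 1/60 s apart, so a blink lasting a few hundred milliseconds is sampled twice as often as with whole frames, at the cost of half the vertical resolution per image.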
7. Conclusions
We have developed a new eye-gaze input system for people with severe physical disabilities. This system detects the horizontal and vertical eye-gaze components of users under natural light such as that from an incandescent, fluorescent, or LED lamp. By using this system, users can input text or commands to a PC. We have also developed application programs for the eye-gaze input system, including a text input system, PC operation support system, fixed-phrase mailer, Web browser, and television program viewing system. When these programs are used in combination, the QoL of ALS patients is improved.
Our eye-gaze input system can obtain the coordinates of the user's gaze point. Accordingly, when a user gazes at a point on the PC screen, the mouse cursor moves to that point. By using this input interface, users can operate the general application software of Windows. In addition, our system is expected to contribute to the development of a next-generation eye-gaze input system. This new eye-gaze input system will be developed using our new method for eye-gaze and eye-blink detection. We believe that our new eye-gaze input system can ameliorate the QoL of ALS patients.