Since the development of the first computers in the middle of the twentieth century, their number and penetration into daily life have been increasing rapidly. However, not only have the number and computational power of computers been growing, but user-machine interaction paradigms have also developed and changed. From punch cards this evolution has slowly progressed to the third wave of computerization, ubiquitous computing.
The term ubiquitous computing was first used by Mark Weiser (1991) to describe the concept in which small cheap computers are integrated into most of the objects surrounding the user. The purpose of these devices is to assist users in their daily routines and perform automation of their environments. As examples, Weiser cites doors that open only to approved badge wearers, coffee machines that make coffee automatically when the user wakes up, and diaries that make notes on their own.
If users had to control all these devices using the traditional desktop human-machine interaction model, they would most probably find them annoying rather than helpful. This gives rise to the necessity for a completely new approach to the interaction. Weiser states that ubiquitous devices should provide their services semi-automatically, by sensing the environment, analyzing the situation and making a decision as to what functionalities should be offered to the user.
The surroundings of the device, relevant from the point of view of running an application, are often referred to by the term context. Devices able to adapt their functionality to changing surroundings are called context-aware devices. While few commercial applications exist to support the concept of context awareness, several prototypes have been introduced. However, more research is needed as the technology for recognizing context is not yet widely available, and many problems remain unsolved.
The purpose of this chapter is to provide an overview of the state of the art in current context-awareness research and to argue for the increasing role of the concept in future human-computer interaction. Both applications and implementation issues are covered. In addition, the authors’ findings and future research on the technology for context recognition are described. The authors of this chapter are researchers in the field, mainly studying activity recognition using wearable sensors. They address certain key problems associated with the use of the technology in mobile devices, such as power consumption and robustness.
The “Applications” section provides a wide range of examples of applications that employ context awareness. In the “Context recognition” section, common approaches to the formalization of the context are described, and techniques and devices used for recognition of the context are presented. Finally, the “Main concerns” section analyzes the concerns and major problems raised by new context-aware applications. The section also contains a description of the authors’ work with information on future research.
2.1. Characteristics of context-aware applications
Early on, Schmidt et al. (1999) proposed three application domains that can benefit from context awareness: adaptive user interfaces that change according to the surrounding conditions, context-aware communication systems, and proactive application scheduling. Nowadays even more uses for the technology have been devised. This section discusses the use of the knowledge of context and the possible applications for the technologies that are used in context recognition. First, the characteristics of context-aware applications are described. Second, the major application fields are presented: home automation, work support systems for professional use, diary applications, mobile guides for free-time use, health care, abnormality detection, ubiquitous learning and advertising. Examples and the technologies used in each group of applications are given. Finally, the use of context recognition technology as an enhancer of HCI and as a new input modality is discussed.
The need to determine the user’s current context has emerged from the vision of ambient intelligence. Thus, the applications that use context information are mobile, ubiquitous services that offer the main benefits of context awareness: adaptation, personalization, and proactivity. Adaptation means that a system adjusts its behaviour according to the current context. A mobile phone can, for example, switch to silent mode when the user is at a meeting. Personalization means that a system adjusts its behaviour according to user preferences, habits, skills or tasks. For example, a shopping system can offer the user the products that are the most likely to be of interest. Proactivity means that the system can automatically act without user interaction and even anticipate future actions or contexts and act accordingly. (Pichler et al., 2004)
Dey and Abowd have another way of describing context aware applications: “A system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user’s task.” They also state that the applications have three different main categories for the context-aware features. They are presentation of information and services to a user, automatic execution of a service, and tagging of context to information for later retrieval. This division is widely used in the literature for describing the functionalities of context-aware applications. The authors have also made an exhaustive survey of the different definitions related to context awareness. (Dey & Abowd, 2000b)
We have discussed user centric application design in our earlier work (Makkonen & Visa, 2007a). The idea was to make HCI more natural by considering the support of different human characteristics. Context awareness as such can enable this by increasing the amount of automation, memory support, and multimodality in the interaction.
Context-aware applications do not always contain all these characteristics. In fact, the utilization of all the possibilities that the concept enables is usually very limited. Many technical issues are yet to be solved before there is widespread adoption of context awareness, but eventually it will become more prevalent as cheaper and more energy-efficient sensors come to market.
2.2. Technology enhancing HCI
Enabling devices to act smartly and independently is possibly the most obvious use for context awareness. When a device or any smart object knows what is happening in its surroundings, it can make life easier for its user. Schmidt describes how context awareness can enhance HCI. Basic ways to help system output include finding a suitable time for user interruption, reducing the need for interruptions, and adjusting system output to match context. Conversely, the input can be helped by reducing the need for user input and making it easier by limiting input space, and adapting the input system to the context. (Schmidt, 2000)
An example of this kind of concept is a portable consumer device. A working example of a context aware mobile phone is SenSay, by Siewiorek et al. This uses light, motion and microphone sensors for determining the user context. Context in this case is restricted to determining if the user is engaged in physical activity, the level of the background noise and if the user is talking. Calendar entries of meetings etc. are also used. The phone has four states, according to which the ring sound volume and the vibration alarm are modified. For example, if the user is in a “do not disturb”-state, the phone sends an SMS to inform the caller. (Siewiorek et al., 2005)
Another growing field is home automation (also known as the “smart home”) that increases automation of a normal home by using sensors and artificial intelligence. This attempts to make life easier for the residents, i.e. improve the usability of the home in various ways. This kind of system can control the heating or air conditioning, lights, curtains, audio and video devices, kitchen appliances, and even a sauna stove. Additionally, security related systems, like fire alarms and surveillance, can be linked to the system. The system detects the context of the home and the residents by using sensors, and then adapts itself accordingly. The system can also anticipate the future actions of the residents. The technology will become increasingly popular in the future. Its major drawback, however, is that the connected devices must utilize a common standard.
Moreover, context awareness can be thought of as an input modality when the system reacts to user actions on the basis of the context information. Salber et al. (2000) have addressed the topic. They state that multimodality refers to making explicit commands to a system and receiving output in multiple ways, whereas in context awareness the sensor input is done implicitly without explicit user commands. Feki et al. (2004) have also studied the issue and present a vision of how context awareness can be used along with multimodality to update the functionality of user interfaces more efficiently for people with disabilities.
An important point about context awareness is that it relates to many other technologies. The technology that enables us to recognize human physical activities and body movements, and thus enhances context recognition, can be used on its own to enhance HCI. Gesture recognition techniques often use data from accelerometer sensors. Human hand gestures, for example, work as an effective input modality. Mäntylä et al. (2000) have tested an accelerometer-based system for hand gesture recognition for use in a mobile device. The system can recognize some of the most common gestures that are made when using a mobile phone. For example, this enables automatic call reception when the phone is moved to the ear. Bui and Nguyen (2005) and Hernandez-Rebollar et al. (2002) have studied sign language recognition using accelerometer sensors placed on a special glove. Earlier, Lee and Xu (1996) used hidden Markov models (HMMs) for recognition of gestures of a sign language alphabet. Pylvänäinen (2005) has studied hand gesture recognition from accelerometer signals using an HMM. Mäntyjärvi et al. (2004) have designed a system for controlling a DVD player using hand gestures. The system also uses accelerometers and an HMM. Darby et al. (2008) propose an HMM-based real-time activity recognition system for gaming using Vicon, an optical motion capture system, as the input sensor.
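To make the HMM-based approach more concrete, the following minimal sketch classifies a sequence of quantized accelerometer symbols by evaluating it against one discrete HMM per gesture with the forward algorithm and choosing the most likely model. It is not the method of any of the cited works: the gesture names, the two-symbol alphabet and all probabilities are made up for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    n_states = len(start)
    # alpha[i]: probability of the observations so far, ending in state i
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[j] * trans[j][i] for j in range(n_states)) * emit[i][symbol]
            for i in range(n_states)
        ]
    return sum(alpha)


def classify(obs, models):
    """Pick the gesture whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Hypothetical toy models over quantized acceleration symbols:
# 0 = "low", 1 = "high". Each entry is (start, transitions, emissions).
MODELS = {
    "lift_to_ear": (
        [1.0, 0.0],
        [[0.7, 0.3], [0.0, 1.0]],   # left-to-right transitions
        [[0.9, 0.1], [0.2, 0.8]],   # mostly low first, then mostly high
    ),
    "put_down": (
        [1.0, 0.0],
        [[0.7, 0.3], [0.0, 1.0]],
        [[0.1, 0.9], [0.8, 0.2]],   # mostly high first, then mostly low
    ),
}

gesture = classify([0, 0, 1, 1], MODELS)
```

In a real system the model parameters would be estimated from recorded training gestures rather than written by hand, and long sequences would need scaled or log-domain arithmetic to avoid numerical underflow.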
Healthcare systems that use context-aware computing are often used for ambulatory monitoring and remote-assisted rehabilitation. The health condition and physical activities of the patients can be monitored by using context recognition technology. By doing this, any alarming situations can be spotted. Preventive healthcare can also profit from the technology since tracking normal people’s physical activities provides information on the amount of calories used and can act as a motivator for the user.
Tentori and Favela (2008) propose an activity-aware system for healthcare use. The system tracks the activities of hospital staff and assists their work by giving information on the status of the patient. The system also reports on other situations such as whether the patient has taken medication. Jovanov et al. (2005) introduce a wireless body area network (WBAN) based telemedicine system for ambulatory monitoring and remote-assisted rehabilitation. Many different sensors are used to monitor the patient’s condition. Sherrill et al. (2005) have used activity recognition for monitoring motor activities of patients undergoing pulmonary rehabilitation. The authors state that the application would help clinicians and optimize the exercise capacity of patients.
Another type of application that relates to activity monitoring and healthcare is abnormality detection. By tracking any activity over a prolonged period, it is possible to detect behavioral patterns and gain a good idea of what is normal. Thus, abnormal or undesirable behavior can be detected or even anticipated and thus prevented. Industrial applications have used such monitoring for a long time, for example in monitoring processes in a chemical plant. Yin et al. (2008) have studied abnormality detection using data from body-worn sensors. They state that the approach can be applied in security monitoring for identifying terrorist activities, and also in healthcare to detect signs of serious illness.
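The idea of learning what is normal and flagging deviations can be illustrated with a very simple statistical baseline. The sketch below is not the method of Yin et al.; it assumes a hypothetical daily step count as the tracked quantity and flags a value that lies more than a chosen number of standard deviations from the mean of a baseline period.

```python
from statistics import mean, stdev


def is_abnormal(value, baseline, threshold=3.0):
    """Flag value if it lies more than `threshold` standard deviations
    from the mean of the baseline observations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly constant baseline: any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical step counts from a week considered normal for this user.
baseline_steps = [8000, 8200, 7900, 8100, 8050, 7950, 8150]

quiet_day_flagged = is_abnormal(1200, baseline_steps)
ordinary_day_flagged = is_abnormal(8100, baseline_steps)
```

A practical system would track many signals at once and would have to cope with gradual changes in what counts as normal, which this sketch ignores.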
2.4. Diaries and memory support by tagging
Diary applications exploit tagging of context information to different time-related events. When a picture is taken with a camera, the device can get the context information automatically and store it as a tag for the picture. The information in this case can include who took the picture and where it was taken. Mani and Sundaram (2006) use context awareness for media retrieval. They use context information for tagging and locating the most relevant photographs.
Such an approach can be useful in a wide variety of diary applications. Schweer and Hinze (2007) use context awareness this way for augmenting the user’s memory. The idea is that human memory links the context of the situation to the memory, and thus the memories can be restored. In a similar manner, Dey and Abowd (2000a) propose a system that attaches context information to a reminder note in order to remind the user in an appropriate situation. It is useful if a reminder is activated just as it is needed, such as when you are leaving the house and have to remember to take something with you. However, this kind of reminder and calendar would have to be created by hand. It would be useful if a system could learn the user’s normal routines and act accordingly. TRAcME is an activity recognition system for a mobile phone by Choujaa and Dulay (2008). The system can recognize generic human activities, and learns the patterns of the user’s habits.
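A context-triggered reminder of the kind described above can be sketched as a note with an attached context condition. The example below is only an illustration, not the design of Dey and Abowd's system: the `Reminder` class, the symbolic location names and the "enter"/"leave" events are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Reminder:
    text: str
    trigger_location: str  # symbolic location, e.g. "home"
    trigger_event: str     # "enter" or "leave"


def due_reminders(reminders, previous_location, current_location):
    """Return the reminders whose location condition has just become true."""
    due = []
    for r in reminders:
        left = (r.trigger_event == "leave"
                and previous_location == r.trigger_location
                and current_location != r.trigger_location)
        entered = (r.trigger_event == "enter"
                   and previous_location != r.trigger_location
                   and current_location == r.trigger_location)
        if left or entered:
            due.append(r)
    return due


reminders = [
    Reminder("Take the umbrella", "home", "leave"),
    Reminder("Buy milk", "shop", "enter"),
]
leaving_home = [r.text for r in due_reminders(reminders, "home", "street")]
```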
2.5. Mobile guides
Mobile guidance systems and manuals are applications that offer the user information via a mobile device, normally a PDA or a mobile phone. Examples of such systems are mobile tourist guides. The guides can propose sights and services like restaurants based on the user context. They enable the user to get information on interesting sights and services easily, without the effort of searching.
It is important to consider user preferences and the history of previously visited sights and services to avoid repeatedly recommending the same ones. Moreover, the applications need to get the information somehow. It is obvious that this cannot be supplied by the application provider, but must come from the sights and services themselves. Another potential source of information is the experiences and suggestions of other travelers. This can be seen on several travel websites, like TripAdvisor.com, which are used by a great number of people. This kind of information is called consumer generated content (CGC). Overall, this kind of application seems to have a lot of potential for the use of context awareness.
Schwinger et al. have made a review of mobile tourist guides that utilize context recognition. Almost all applications presented in their paper use only user location and profile. The authors state that the main drawbacks of the systems are the lack of consideration of tourism as a social activity and time as context information. Moreover, only some applications use automatic context delivery. (Schwinger et al., 2008)
A good advertisement is shown at the right time, to the right person, and in the right way. “Shown” in fact is not a good way to express this, because the advertisement can exploit several human senses by using multiple modalities. Using different smells is an example of this. Context awareness can address all these requirements, and it would be surprising if advertising companies did not show a keen interest in it. According to Neumann (2008), “Context awareness technology, based on the clues from a consumer's surroundings, can help you deliver your message to your target audience in a much more personalized and valuable way”.
What would it be like if your mobile phone knew that you are hungry, or going to be hungry soon? The device could then suggest, for example, a nearby restaurant suited to your personal taste and preferences. It could even show you the menu and possibly recommend some of your favorite dishes. This would be of great interest to advertising companies. Today, of course, such information is not available as the mobile phone cannot read our minds. However, the device could learn the user’s habits by registering how often and at what time the person eats. By using artificial intelligence methods, it is possible to gain clues about what is likely to happen next, and act accordingly.
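As a rough illustration of learning eating habits from logged data, the sketch below uses a plain frequency count as a stand-in for the artificial intelligence methods mentioned above. The meal log, the `min_share` cut-off and the one-hour lead time are all hypothetical.

```python
from collections import Counter


def usual_meal_hours(meal_log, min_share=0.25):
    """Return the hours of day at which at least `min_share` of the
    logged meals took place."""
    counts = Counter(meal_log)
    total = len(meal_log)
    return sorted(h for h, c in counts.items() if c / total >= min_share)


def should_suggest_restaurant(current_hour, meal_log, lead_hours=1):
    """True if a usual meal hour lies `lead_hours` ahead of the current hour."""
    return (current_hour + lead_hours) % 24 in usual_meal_hours(meal_log)


# Hypothetical log: the hour of day at which each past meal was registered.
meal_log = [12, 12, 13, 12, 18, 19, 18, 18, 12, 18]

habitual_hours = usual_meal_hours(meal_log)
suggest_now = should_suggest_restaurant(11, meal_log)
```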
Personalization and user profiles are commonly used means for doing this. Uhlmann and Lugmayr state that although numerous user profiling methods exist, they are mostly scattered among many different internet shops and services. Thus, each service has its own profile for the same user. The authors stress the need for a way to create a single mobile profile that does not depend on the internet, nor is limited to web-based services. Instead, the profile could be used in traditional shopping, too. As an example, they mention that several internet music stores gather users’ musical tastes to profiles. This profile could well be used in traditional record stores, too, because the taste is the same and only the shop is different. (Uhlmann & Lugmayr, 2008)
Aalto et al. propose a location-aware system that delivers advertisements to a mobile device when the user location is near a shop. This is called proximity-triggered mobile advertising. The authors state that location information as such is not sufficient, but that personalization is necessary to be able to deliver the advertisements without causing annoyance (Aalto et al., 2004). Bluetooth is used for getting the location information. Rashid et al. (2008) propose a system with a similar approach. Mahmoud (2006) uses time, location, and user preferences as the context information for addressing advertisements to mobile devices. de Castro and Shimakawa (2006) use user location and interests as context information in a similar system.
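The combination of proximity and personalization can be sketched as a simple filter. This is an illustrative assumption rather than the architecture of any cited system: Aalto et al. obtain proximity through Bluetooth, whereas the example below assumes planar coordinates in meters, a hypothetical shop list and a fixed radius.

```python
import math


def distance_m(a, b):
    """Euclidean distance between two (x, y) positions given in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def select_ads(user_pos, user_interests, shops, radius_m=50.0):
    """Return the names of shops that are both within radius_m of the user
    and in a category the user is interested in."""
    return [
        shop["name"]
        for shop in shops
        if distance_m(user_pos, shop["pos"]) <= radius_m
        and shop["category"] in user_interests
    ]


# Hypothetical shops and user profile.
shops = [
    {"name": "Record store", "pos": (10.0, 0.0), "category": "music"},
    {"name": "Shoe shop", "pos": (20.0, 0.0), "category": "shoes"},
    {"name": "Far record store", "pos": (500.0, 0.0), "category": "music"},
]
ads = select_ads((0.0, 0.0), {"music"}, shops)
```

Filtering on interests as well as distance reflects the authors' point that location alone would deliver advertisements the user finds annoying.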
2.7. Work assistance
Work assistance applications aim to support a worker to do a job better. They help by showing how the work is performed and by giving a reminder of the correct order of the working phases. They can also support learning. The applications can be, for example, workshop or maintenance assistance systems. In our previous work (Makkonen & Visa, 2007b) we have developed and tested a mobile assembly support application. The purpose of the application was to guide the worker through an assembly of a grid-anode for a corrosion prevention system. The application used a conventional hypertext-based user interface without any automatic functionality. The tests showed that due to awkward working conditions, the user interface was inconvenient. On the other hand, if context-awareness was used, it would be possible to determine which work phase is underway and to monitor the work quality. Quality-critical tasks in particular should profit from the approach.
Current applications in the field are specific to the task in hand and employ special techniques for determining the work context. These usually involve sound, since different tools make distinct noises, and RFID tags mounted on parts in assembly tasks to recognize them. Therefore the solutions are usually not generalizable. Lukowicz et al. (2004) propose a system that attempts to segment and recognize user gestures in a workshop environment by using microphones and accelerometers. This could be used for reducing the cognitive load on the user and providing assistance. Stiefmeier et al. (2008) detect the steps in assembling a car part by using a finite state machine.
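Tracking assembly steps with a finite state machine can be sketched as follows. This is a minimal illustration and not the system of Stiefmeier et al.: the step names are hypothetical, and the recognized events are assumed to arrive from some upstream sensor-based recognizer.

```python
class AssemblyTracker:
    """A linear finite state machine: state k means the first k steps
    of the assembly have been observed in the expected order."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.position = 0

    def observe(self, event):
        """Advance if the event is the next expected step.
        Return True if the event was accepted, False otherwise."""
        if self.position < len(self.steps) and event == self.steps[self.position]:
            self.position += 1
            return True
        return False

    @property
    def done(self):
        return self.position == len(self.steps)


# Hypothetical work phases of a small assembly task.
tracker = AssemblyTracker(["pick_part", "insert_screw", "tighten_screw"])
out_of_order = tracker.observe("insert_screw")  # rejected: part not picked yet
tracker.observe("pick_part")
tracker.observe("insert_screw")
tracker.observe("tighten_screw")
```

A rejected event is exactly the point at which an assistance system could warn the worker that a phase was skipped.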
Tähti et al. (2004) have presented a context-aware mobile application for supporting work in an office-type environment. The application combines automation (by adjusting the device settings according to the context at hand) with tagging (reminders, documents related to meetings etc.) and service browsing (finds services that are available in the current context).
2.8. Ubiquitous context-aware learning
Ubiquitous learning tries to bring the learning environment to wherever the user may be and enables the learning to take place at any time. Moreover, context awareness enables the learning to relate to the current situation or to nearby objects. Therefore, the learning can be more efficient than in a permanent classroom environment. The idea uses tagging; unlike in diary applications, however, the main issue for the context is not time, but rather the objects and the topics to be learnt.
According to Schmidt, context awareness can be utilized for deciding what, when, and how the learning resources are delivered to the learning user. The simplest way to do this is to select the most appropriate resources for the learner and the situation at hand. The challenges of such systems are that context is hard to identify, and that the time at which the resources are delivered has to be well-defined to avoid causing annoyance to the user. (Schmidt, 2005)
Ogata and Yano (2004) have tested a system for learning polite expressions in Japanese using a context aware system. They use the learner’s profile (age, gender, occupation etc.), other people’s profiles and the current environment as the context information, which determines the appropriate expressions to be used in each situation. According to the authors, the system proved to be very useful in tests. Similar approaches are proposed by Hsieh et al. (2007) and Chu and Liu (2007). Both use context-awareness for enhancing the learning of English. Jeong and Lee (2007) propose methodologies for implementing learning systems that use context awareness.
Context awareness is also used in less obvious fields. Doyle et al. (2002) propose a method for using context awareness for interactive story telling. The user is “in the story” and can interact with the story world by performing actions such as walking to affect the plot of the story.
3. Context recognition
Context recognition is a challenging task. The most problematic issue is the impossibility of defining a fixed set of factors that should be determined to categorize the context. In the first work to introduce the term context-awareness, Schilit and Theimer (1994) use the term context to refer to the location of the entity. Later, however, Dey and Abowd (2000b) extended the meaning to include any relevant information to characterize the situation of the entity. They proposed that a context can be seen as a combination of four context types: Identity, Time, Location and Activity. This context model was later enlarged by Zimmermann et al. (2007) to include an additional context type called Relations.
Identity specifies personal user information like contact list, age, phone numbers etc. Time, in addition to its intuitive meaning, can utilize overlay models to depict events like working hours, holidays, days of the week and so on. Location refers either to geographical location, like longitude and latitude or to symbolic location such as at home, in the shop or at work. Activity relates to what is occurring in the situation. It concerns both the activity of the entity itself as well as activities in the surroundings of the entity. Finally, Relations are used to define dependencies between different entities. Relations describe whether the entity is a part of a greater whole and how it can be used in the functionality of some other entities.
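The five context types can be pictured as fields of a single record. The sketch below is one possible representation, not a data model prescribed by Dey and Abowd or Zimmermann et al.; the class names, field types and the working-hours overlay (weekdays from 9 to 17) are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Location:
    symbolic: Optional[str] = None      # e.g. "home", "shop", "work"
    latitude: Optional[float] = None    # geographical location, if known
    longitude: Optional[float] = None


@dataclass
class Context:
    identity: dict                      # user profile: contact list, age, ...
    time: datetime
    location: Location
    activity: str                       # what is occurring in the situation
    relations: dict = field(default_factory=dict)  # links to other entities

    def is_working_hours(self, start=9, end=17):
        """Overlay model example: map clock time onto a 'working hours' event."""
        return self.time.weekday() < 5 and start <= self.time.hour < end


ctx = Context(
    identity={"name": "Alice"},
    time=datetime(2024, 1, 8, 10, 0),   # a Monday morning
    location=Location(symbolic="work"),
    activity="typing",
)
at_work_time = ctx.is_working_hours()
```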
Acquisition of Time and Identity does not require any sophisticated techniques, but can be achieved using regular user profiles and a hardware clock. Automatic methods for recognition of Relations have not so far been investigated. Therefore, this chapter concentrates on methods that can be used to determine the Activity and Location context types. A range of established techniques based on machine learning and positioning is available for the recognition task.
3.1. Location recognition and positioning
3.1.1. GNSS-based methods
The most widely used methods for outdoor positioning are based on GNSS (Global Navigation Satellite Systems). Currently GPS (Global Positioning System) and DGPS (Differential Global Positioning System) are the predominant GNSS solutions. GPS provides accuracy to within a few meters depending on the number of visible satellites and their geometric orientation in relation to the receiver.
GNSS positioning is based on the so-called trilateration method. This method can determine the exact location of a single unknown point when the locations of three known points and the distances between the known points and the unknown point are given. The distances between the satellites and the receiver are determined using the time difference between signal transmission and reception. One additional satellite is required to overcome problems caused by unsynchronized satellite and receiver clocks. This means that GPS requires at least four visible satellites to perform positioning. The positioning precision of GPS is about 20 meters.
GPS accuracy can be increased to a few centimeters using DGPS methods, but this requires additional infrastructure. A drawback of GPS-based positioning is a setting-up time of about half a minute when the receiver is turned on. However, this problem can be overcome using the A-GPS (Assisted GPS) method. In A-GPS, a cellular network can be used to send initialization parameters to the GPS receiver to decrease the setting-up time and improve accuracy.
The major disadvantage of GNSS-based positioning systems is the requirement to have satellites in the line of sight of the receiver. This constraint means that utilization of GNSS receivers is impossible indoors or in narrow canyons.
3.1.2. Cellular network-based methods
Another approach is to obtain position from existing cellular networks. The fact that most of the currently used mobile computing devices are cellular mobile phones makes this concept particularly attractive. Consequently, positioning methods for the third generation of cellular networks (3G) are a research area receiving a lot of attention.
Four different positioning methods are included in the 3GPP standard. The simplest method is to estimate the position of the mobile device from its serving cell (the so-called cell ID-based method). The wide variation in cell size (100 m to 35 km) results in poor positioning accuracy with this method. The accuracy can be improved using other relevant information, like the distance from the base station. The distance, in turn, can be assessed using the signal travelling time from the mobile to the base station and back to the mobile (Round-Trip Time).
Another method called OTDOA-IPDL (Observed Time Difference of Arrival - Idle Period Downlink) uses the geographical locations of several base stations serving adjacent cells. The relative distances to these base stations are determined using the times of arrival of special communication signals sent from the base stations. Optionally, special communication signals can be sent from the mobile device to the base stations. In this case, position is calculated using the hardware installed in the network, which can determine the position of any mobile phone irrespective of its type. This U-TDOA (Uplink - Time Difference of Arrival) method is highly attractive for positioning mobiles during emergency calls. Methods based on time difference of arrival provide accuracy of the order of a few hundred meters.
Finally, the A-GPS method already mentioned among the GNSS-based methods is standardized for use with UMTS. This method assumes that a GPS receiver is integrated into the user's mobile device and initialization parameters are sent through the UMTS network.
Methods based on signal strength or the travel times of the signal suffer from multipath effects. In urban areas signals hardly ever propagate from the base station to the mobile device along a line-of-sight route. This leads to poor positioning accuracy. To overcome this problem, methods based on so-called fingerprinting have been introduced (Laitinen et al., 2001). First, the area in which positioning is to occur is divided into small sectors. In each sector a special information sample is taken and stored in a database. The type of sample depends on the cellular system; for example, in UMTS the sample is a power delay profile of the sector (Ahonen & Laitinen, 2003). Later, when positioning takes place, the device performs similar measurements and searches the database for the most similar sample. The location from which the most similar sample was taken is assumed to be the current position of the mobile device. The accuracy of this method is about 100 meters.
The benefit of cellular network-based positioning methods is the possibility to use them in mobile phones without the need for additional hardware. However, their accuracy is significantly lower than that of GPS. Positioning methods based on cellular networks can be used indoors, but problems of multipath propagation decrease the accuracy there even further.
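The geometry of trilateration can be illustrated in a simplified planar setting. The sketch below is an assumption-laden toy, not a GNSS solver: it works in two dimensions with three known points and exact distances, and it ignores the receiver clock error that makes a fourth satellite necessary in practice. Subtracting the first circle equation from the other two turns the problem into a pair of linear equations in x and y.

```python
import math


def trilaterate_2d(anchors, distances):
    """Locate a point in the plane from three known points and the
    distances to them, by solving the linearized circle equations."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    # Circle i minus circle 1 gives: a_i1 * x + a_i2 * y = b_i
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 + y2**2 - x1**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 + y3**2 - x1**2 - y1**2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the three known points are collinear")
    # Cramer's rule for the 2x2 system
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical known points and a receiver at (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
distances = [math.hypot(true_position[0] - ax, true_position[1] - ay)
             for ax, ay in anchors]
estimate = trilaterate_2d(anchors, distances)
```

With noisy distance measurements the three circles no longer meet in one point, and a least-squares solution over more than three known points is typically used instead.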
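The conversion from Round-Trip Time to distance is a one-line calculation: the signal covers the distance twice, so the one-way distance is half the travel time multiplied by the propagation speed. The sketch below assumes free-space propagation at the speed of light and an optional, hypothetical processing delay in the mobile that has to be subtracted first.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_rtt(rtt_seconds, processing_delay_seconds=0.0):
    """Estimate the one-way distance in meters from a round-trip time."""
    travel_time = rtt_seconds - processing_delay_seconds
    return SPEED_OF_LIGHT_M_PER_S * travel_time / 2.0


# A round trip of 10 microseconds corresponds to roughly 1.5 km.
distance_m = distance_from_rtt(10e-6)
```

The example also shows why timing resolution matters: an error of one microsecond in the measured time shifts the estimate by about 150 meters.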
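The database lookup at the heart of fingerprinting can be sketched as a nearest-neighbor search. The example is a simplification under stated assumptions: each stored sample is a short vector of received signal levels in dBm from three hypothetical base stations, not the power delay profile used in UMTS, and similarity is measured by Euclidean distance.

```python
def locate_by_fingerprint(measurement, database):
    """Return the sector whose stored sample is most similar to the
    current measurement (smallest squared Euclidean distance)."""
    def squared_distance(sample):
        return sum((m - s) ** 2 for m, s in zip(measurement, sample))

    return min(database, key=lambda sector: squared_distance(database[sector]))


# Hypothetical fingerprint database: sector name -> stored sample.
database = {
    "sector_A": [-70.0, -85.0, -90.0],
    "sector_B": [-88.0, -65.0, -80.0],
    "sector_C": [-92.0, -90.0, -60.0],
}

current_sector = locate_by_fingerprint([-86.0, -67.0, -82.0], database)
```

Because the stored samples already contain the multipath behaviour of each sector, no propagation model is needed at positioning time; the cost is the effort of collecting and maintaining the database.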
3.1.3. WLAN-based methods
The accuracy provided by cellular networks, 100 meters at best, is usually not sufficient for indoor positioning. Therefore, the use of Wireless Local Area Networks (WLANs), which usually have much smaller coverage, could be desirable. There are no positioning standards for WLAN technology, but some commercial realizations are available (e.g. Ekahau RTLS). Methods for WLAN positioning are very similar to those used in cellular networks. Positioning can be done either by using trilateration and received signal strength as a measure of relative distance between receiver and base station (Kotanen et al., 2003) or by using database correlation methods (Saha et al., 2003).
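One common way to turn received signal strength into a distance estimate is the log-distance path loss model. The text above does not state which model the cited works use, so the sketch below is an assumption: the reference level at 1 m and the path loss exponent are hypothetical values that would have to be calibrated for a real building.

```python
def distance_from_rss(rss_dbm, rss_at_1m_dbm=-40.0, path_loss_exponent=3.0):
    """Estimate the distance in meters from received signal strength using
    rss(d) = rss(1 m) - 10 * n * log10(d), solved for d."""
    return 10 ** ((rss_at_1m_dbm - rss_dbm) / (10.0 * path_loss_exponent))


# With these assumed parameters, -70 dBm corresponds to 10 m.
estimated_distance_m = distance_from_rss(-70.0)
```

Distances estimated this way from three or more access points can then be fed into a trilateration step. Indoors, walls and people make the signal strength fluctuate considerably, which is one reason database correlation methods are an attractive alternative.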
3.1.4. Proximity detection
Often information on nearby resources is more important than exact geographical location. In such cases a range of proximity detection solutions is available. A Radio Frequency Identification (RFID) system consists of readers and tags. The reader device sends electromagnetic waves that are captured by the tag antenna. The transmitted electromagnetic radiation induces power in the tag microchip circuit, and this power is used to send the digital data stored in the tag back to the reader. In the case of passive RFID tags, only the reader energy is used to operate the tag. In active RFID, the tag has its own energy source and the electromagnetic radiation of the reader is used only to switch the tag into active mode.
Another approach to proximity detection is the use of IR beacons (Butz et al., 2000). The disadvantage of this technology is the line-of-sight requirement, which means that the mobile device cannot be positioned if it is in a pocket or if the IR (infrared) receiver is not directed towards the light source. Other possibilities include Wireless PAN (Personal Area Network) communication equipment such as Bluetooth and ZigBee. These technologies are specified in the IEEE 802.15 standard.
3.2. Activity Recognition Devices
According to the context definition of Dey and Abowd (2000b) activity context type is seen as referring to everything that answers the question of what is occurring in the situation. With regard to personal mobile computing devices such as mobile phones, this context type can be divided into two separate parts to simplify analysis: user activity and environmental activity. The present section demonstrates the measurement devices that can be used for monitoring these two context elements.
3.2.1. Environmental Activity Monitoring
Currently available mobile devices can already contain several measurement devices capable of environmental monitoring, such as a temperature sensor and a photodiode. This section describes these and some other sensors that current research has introduced for environmental sensing related to context-awareness.
One device indispensable in any mobile phone is the microphone. It has therefore received much attention from researchers in the field. Korpipää et al. (2003) used the microphone to recognize events such as speech, music, car and elevator. In another work, Siewiorek et al. (2005) use the microphone as one of the context sources that can be used to update the profiles of a mobile phone automatically.
In the two works mentioned above, the microphone was used in combination with other sensors to determine various environmental activities. In work by Clarkson et al. (1998) a microphone was used on its own in developing a Nomadic Radio application. In this work they introduce an Auditory Scene Analysis (ASA) system, which divides an auditory scene into two layers. In the first layer, single sound objects (speech, car, telephone ring) are detected; in the second layer, the sequences and simultaneous occurrences of these objects are used to detect sound scenes (office, street, supermarket). Layered activity recognition architectures are discussed in greater detail later in this chapter in Section 3.3, “Architecture of Context Recognition Methods”.
Schmidt et al. (1999) introduced the integration of a photodiode into a PDA (Personal Digital Assistant) for automatic control of the display backlight. This technology is now widely used in smart phones. In addition to the light intensity, photodiodes and other optical sensors (such as IR and UV sensors) can be used to distinguish the quality of the light. Spectral properties of the illumination can be used to distinguish artificial light sources from daylight and possibly to make a distinction among different indoor spaces (Mäntyjärvi, 2003).
Multiple optical sensors can be combined to enrich the information acquired from the environmental lighting. Optical sensors on different sides of the device can be used to determine if the device is placed on a surface or held by the user. In addition, the difference between direct and indirect lighting can be detected (Schmidt, 2002).
The video camera is a standard device in mobile phones and thus should be considered within the scope of context recognition. The camera has all of the capabilities of an optical sensor described above. Additionally, it can be used to detect commonly used routes and places visited, as Clarkson (2003) and Aoki et al. (1999) have shown. These research groups investigated sequences of pictures taken by a camera. Repetitive patterns were used for indoor navigation or detection of places visited before. The drawback of the camera is its significant demand for computational power and therefore higher energy consumption, which is an important issue in the area of mobile computing.
Finally, sensors capable of monitoring humidity, temperature and air pressure can be integrated into a mobile phone as well. Use of these sensors was discussed by Schmidt (2002). Temperature can be used to determine if the mobile device is indoors. However, while low temperatures of -10 °C show that the device is most probably outdoors, high temperatures (e.g. +20 °C) do not show conclusively that it is indoors. In the same work Schmidt suggests that a humidity sensor can be used for weather monitoring and tracking of transitions from one space to another. Similarly, an air pressure sensor can be used to detect the altitude of the device.
3.2.2. User Activity Monitoring
Accurate user activity recognition is as important as recognition of all the other context elements. In current research, a wide range of different measurement devices has been introduced for user activity monitoring. However, it should be noted that in addition to the measurements provided by sensors, regular operations of the mobile device are a rich source of information for user activity analysis. For example, calendar entries can be used to determine the availability of the user (Siewiorek et al., 2003).
An accelerometer is standard equipment in several mobile phone models (e.g. Nokia N95 and iPhone) and currently it is used for automatic picture rotation and as a control mechanism in certain games and interactive applications. Extensive research has been conducted into the capabilities of the accelerometer in the area of the user activity recognition.
Researchers have succeeded in determining a high number of different physical activities using accelerometers spread over the body of a user. Previous studies (Bao & Intille, 2004) have shown that two accelerometers mounted on the wrist and on the thigh are enough for accurate recognition of twenty different everyday activities like walking, sitting, working on computer, eating, watching TV, riding an elevator, etc. In addition to dynamic motion measurements, three orthogonally mounted accelerometers are capable of determining the direction of the gravitational force and in this way estimating the orientation of the device (Bouten et al., 1997).
Detection of whether the device is held by the user is important information. It can be used to optimize the power consumption by turning the device to sleep mode when the user is not interacting with it. This information can be acquired using a touch sensor. A practical touch sensor was realized by Hinckley and Sinclair (1999). In this study the touch sensor was installed on a mouse and was used to improve the usability of an office application. Toolbars of the application were hidden when the user was not touching the mouse. This way, the part of the window reserved for text was increased and usability improved.
Physiological sensors are measurement devices capable of monitoring different mechanical, physical, and biochemical functions of living organisms. Several studies have examined the utilization of these sensors in the area of context recognition (Schmidt 2002, Krause et al. 2003, Pärkkä et al., 2006). The most popular physiological sensors are probably those capable of heart rate and pulse monitoring. These devices can be used to determine the physical and psychological load of the user. Another way of monitoring variations in the emotional state of the user is the measurement of the galvanic skin response. For instance, this technology is commonly used in lie detectors. Other sensor types used for monitoring physiological events include skin temperature and respiratory effort sensors.
There are numerous other sensors used in medicine for physiological monitoring tasks. However, they often require the insertion of the instruments into the human body. Currently, it is difficult to contemplate such devices as having context-awareness capabilities.
3.3. Architecture of Context Recognition Methods
Measurements obtained from the sensor devices described above are usually inadequate for recognizing an entity as complex as a context. Such complexity usually requires advanced processing techniques and the fusion of measurements from several sensing devices. This operation is performed using a combination of several machine learning techniques.
Context recognition methods proposed in previous research are very similar to the techniques used in the area of pattern recognition (Duda et al., 2001). The aim of the pattern recognition is to classify data on the basis of their properties and available a priori knowledge. In our case, the classification procedure involves associating different measurements with appropriate contexts.
Measurements are not usually processed by pattern recognition algorithms as such, but special analysis is required to extract more meaningful information from each sensor. For example, in sensors with rapidly varying output, like accelerometers, a single measurement value provides little information. Instead, measurements of such sensors should be analyzed in short bursts in order to determine properties related to variations and reduce noise. As a result, several descriptive measures, called features, are extracted from a windowed signal of each sensor. These features are then combined into numerical vectors and processed by a classification algorithm.
In several studies, the result of the classification algorithm is considered as a context (Bao & Intille, 2004; Bouten et al., 1997). The architecture described is commonly used for physical activity recognition (e.g. walking, running, sitting, etc.) (Ermes et al., 2008). In general this method is capable of recognizing events that can be characterized by the features extracted from a single signal window.
In some cases, a more accurate description of the context can be acquired when a sequence of the classification results is analyzed. For example, a sequence of several sounds in the environment can be used to characterize more complex situations like supermarket, office and busy street (Clarkson et al., 1998). Detection of event sequences requires an additional layer in the context recognition architecture.
The overall structure of the introduced architecture can be seen in Figure 1. The following sections describe in great detail the layers of this structure.
3.3.1. Feature extraction
While feature is a standard term in the area of pattern recognition, studies in the area of context-awareness often refer to it as cue (Schmidt, 2002) or context-atom (Mäntyjärvi, 2003). The purpose of the feature extraction procedure is to convert raw sensor data into a more descriptive format for the given pattern recognition task.
Usually, features are extracted from longer measurement sequences, called windows. In the case of periodic events, the duration of the window should be long enough to capture at least one period of the measured phenomenon. In the case of monitoring non-periodic events, the window length can be chosen according to the amount of noise in the signal or other specific issues.
In general, there are two common types of features to be extracted: time domain features and frequency domain features. In the time domain, the windowed signal is analyzed as it is. In the frequency domain, the signal window is converted to its frequency spectrum using the Fast Fourier Transform (FFT) algorithm.
Common features extracted from the time domain representation of the signal include the mean value, standard deviation, median and quartiles. Another commonly used time domain feature is the count of zero-crossings in the signal window or in the differential of the signal, which can be used to estimate the frequency of the signal.
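The time domain features listed above can be illustrated with a minimal sketch. The function names, the window size and the step are assumptions made for this example and do not come from the cited studies; zero-crossings are counted around the window mean, which is one of several possible conventions.

```python
import statistics


def sliding_windows(signal, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step` samples."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def zero_crossings(window):
    """Count how often the signal crosses its own mean within the window."""
    mean = statistics.fmean(window)
    signs = [x >= mean for x in window]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def time_domain_features(window):
    """Return the time domain features named in the text for one window."""
    q1, _, q3 = statistics.quantiles(window, n=4)
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "median": statistics.median(window),
        "q1": q1,
        "q3": q3,
        "zero_crossings": zero_crossings(window),
    }
```

Each window produced by `sliding_windows` would be passed to `time_domain_features`, and the resulting values would form part of the feature vector for that window.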
More information on the spectral properties of the signal can be extracted from its frequency domain representation. The frequency of the spectral peak can be used to estimate the period length of a periodic signal. The power of the spectral peak is proportional to the intensity of the periodic motion. Spectral entropy can be used to distinguish periodic signals from non-periodic ones.
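As a rough illustration of these frequency domain features, the sketch below computes the peak frequency, peak power and a normalized spectral entropy for one window. To stay self-contained it uses a naive discrete Fourier transform rather than an optimized FFT; the function names and the normalization of the entropy to the range 0 to 1 are assumptions of this example.

```python
import cmath
import math


def power_spectrum(window):
    """Naive DFT power for bins 1..N/2; the mean (DC component) is removed first."""
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_features(window, sampling_rate_hz):
    """Peak frequency, peak power and normalized spectral entropy of one window."""
    n = len(window)
    power = power_spectrum(window)
    total = sum(power)
    peak_bin = max(range(len(power)), key=power.__getitem__)
    if total == 0 or len(power) < 2:
        entropy = 0.0  # constant or very short window: no spectral spread to measure
    else:
        probs = [p / total for p in power if p > 0]
        entropy = -sum(p * math.log2(p) for p in probs) / math.log2(len(power))
    return {
        "peak_frequency_hz": (peak_bin + 1) * sampling_rate_hz / n,
        "peak_power": power[peak_bin],
        "spectral_entropy": entropy,
    }
```

A purely periodic window concentrates its power in one bin and yields an entropy close to 0, whereas a window whose power is spread evenly over all bins yields an entropy close to 1.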
3.3.2. Context Layer Algorithms
The purpose of the context layer algorithms is to match the derived feature vector to the best matching contexts. In other words, the goal is to find previously defined events that are most typical to the obtained measurements. This means that before the context can be recognized from measurements, recognition algorithms require some prerequisite knowledge about possible contexts and their characteristics. As a result, the context recognition process proceeds in two steps. The first step is learning. The prerequisite knowledge is obtained during this step. The second is classification, which matches unidentified features to one of the trained contexts.
In general, there are two learning strategies: supervised and unsupervised learning. The difference is that in supervised learning, the set of possible contexts is known during the training phase. An algorithm is trained using labeled feature vectors. A label is the name of the context that occurred at the time the feature vector was recorded. The result is a trained algorithm capable of deciding on the most probable label for any given unlabeled feature vector.
In the unsupervised learning method, feature vectors used to train the algorithm do not have labels. Therefore, the purpose of the training is not to learn what the typical feature vectors are for each particular context, but to divide given feature vectors into groups based on their similarity. During the classification phase the trained algorithm is capable of deciding which group any given feature vector belongs to, without giving any meaningful labels to the groups.
Pattern classification based on feature vectors is a widely studied subject in the field of pattern recognition. More detailed theoretical background on this topic can be found in the literature (Duda et al., 2001). Both of the training strategies are used as methods for context recognition and are briefly described below.
3.3.3. Supervised learning
The supervised learning strategy has the advantage of straight-forward applicability when the set of contexts to be recognized is known in advance. The disadvantage is the requirement for labeling feature vectors used for training. Usually this requirement involves a testee wearing measurement devices and performing different activities according to some predefined scenario. This scenario is then used to label the obtained measurements (Bao & Intille, 2004).
The supervised learning method is supported by algorithms such as classification trees, the multilayer perceptron (MLP) and the K-nearest neighbors (KNN) classifier. The classification tree represents the classification task as a sequence of questions. The answer to each question determines which question will be asked next. At the end, each possible sequence of answers leads to some particular classification result. In the case of numerical feature vectors, questions usually attempt to clarify whether the value of a particular feature is greater than the threshold value obtained during training. Classification trees have been used in several studies (Bao & Intille, 2004; Ermes et al., 2008) and showed promising results.
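The question sequence of a classification tree can be sketched as follows. The tree here is written by hand purely for illustration: the feature names, thresholds and activity labels are hypothetical, whereas in practice the thresholds would be obtained during training.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # classification result reached at the end of a question sequence


@dataclass
class Question:
    feature: str          # name of the feature the question inspects
    threshold: float      # threshold value (hypothetical, normally learned)
    if_greater: "Node"    # next node when the feature value exceeds the threshold
    otherwise: "Node"     # next node when it does not


Node = Union[Leaf, Question]


def classify(node, features):
    """Follow the answers from the root until a leaf is reached."""
    while isinstance(node, Question):
        if features[node.feature] > node.threshold:
            node = node.if_greater
        else:
            node = node.otherwise
    return node.label


# Hand-written example tree; all names and numbers are assumptions.
TREE = Question(
    "accel_std", 0.2,
    if_greater=Question("peak_frequency_hz", 2.5, Leaf("running"), Leaf("walking")),
    otherwise=Leaf("sitting"),
)
```

For instance, `classify(TREE, {"accel_std": 0.5, "peak_frequency_hz": 1.8})` follows the first question to its "greater" branch and the second to its "otherwise" branch.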
Another widely used algorithm for user activity classification is the MLP, which belongs to a group of classification algorithms called artificial neural networks. Algorithms belonging to this group try to simulate the behavior of biological neural networks in the decision making process. A multilayer perceptron consists of several interconnected layers of simple processing units called neurons. The MLP is capable of obtaining accurate results in complex activity recognition tasks, though analyzing why a particular classification result was suggested is problematic. The multilayer perceptron has been employed for user physical activity recognition in several works (Ermes et al., 2008; Aminian et al., 1999).
The third example of supervised learning methods is the KNN classifier. In this algorithm, during the classification phase the measured feature vector is compared to all the feature vectors collected during the training phase. The K most similar feature vectors are selected from the training set, where K is a predefined parameter. The classification result is chosen on the basis of the majority of labels in the selected training set vectors. Examples of the use of the KNN classifier for activity recognition can be found in several studies (Bao & Intille, 2004; Ravi et al., 1999).
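A minimal sketch of the KNN procedure described above is given below. Euclidean distance is assumed as the similarity measure, and the small training set is invented for illustration only.

```python
import math
from collections import Counter


def knn_classify(training_set, query, k=3):
    """Return the majority label among the k training vectors closest to `query`.

    `training_set` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training_set, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled feature vectors: (signal deviation, peak frequency).
TRAINING = [
    ((0.0, 0.0), "sitting"), ((0.1, 0.1), "sitting"), ((0.2, 0.0), "sitting"),
    ((1.0, 1.0), "walking"), ((1.1, 0.9), "walking"), ((0.9, 1.2), "walking"),
]
```

Note that every classification compares the query with the whole training set, which is why the memory and processing cost of this classifier grows with the amount of training data.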
3.3.4. Unsupervised learning
In the case of unsupervised learning, no scenarios are needed. The test subject wearing the sensor equipment does not even need to be aware of the measurements being made, and this produces more realistic data. When it is important to maintain realistic sequences of the contexts, unsupervised learning is a particularly suitable option. The disadvantage of unsupervised learning is the greater difficulty of interpreting the obtained data. It is possible to ascertain the time instances when the same context has occurred, but it is impossible to know without additional knowledge which context it really was.
One commonly used algorithm based on unsupervised learning is the self-organizing map (SOM) by Kohonen (1998). This algorithm is a powerful tool for projecting high dimensional feature vectors to a lower dimensional (typically two dimensional) space. In the resulting projection, feature vectors that were similar in the high dimensional space remain close to each other in the new lower dimensional space. The smaller number of dimensions makes the visualization of the measurements easier and the classification algorithms more powerful.
In the area of context recognition SOM is often used as a preprocessing method for other algorithms. For example, Van Laerhoven and Cakmakci (2000) processed unlabeled measurements with the SOM algorithm to reduce the data to two dimensions, and several regions in the resulting map were then labeled manually. During the classification phase the regular KNN algorithm was used for context recognition. This helped reduce the time and computational effort expended on measurement annotation and computation.
In another study (Krause et al., 2003) measurements were preprocessed by the SOM algorithm and then divided into several groups using K-means clustering. K-means clustering is an algorithm based on unsupervised learning that divides given feature vectors into K distinct groups. No labels were assigned to the K extracted groups. The motivation of the authors was that context does not necessarily require any descriptive label to provide context aware functionality.
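The K-means step can be sketched as below. This is a plain version of the standard iterative procedure and omits the SOM preprocessing used in the cited study; the deterministic initialization from the first K vectors and the fixed iteration count are simplifying assumptions.

```python
import math


def k_means(vectors, k, iterations=20):
    """Divide `vectors` into k groups; returns (centers, group index per vector)."""
    # Assumption: the first k vectors serve as initial centers.
    centers = [list(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign every vector to its closest center.
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors
        ]
        # Move every center to the mean of the vectors assigned to it.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment
```

The returned group indices carry no meaning beyond identifying which vectors belong together, which matches the observation that the groups remain unlabeled.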
3.3.5. Context sequence analysis layer
Recognized context can be considered as a state of the system. In many cases this recognized state is overly fine-grained knowledge for context aware applications. Because of this, analysis of context sequences can be used to obtain more meaningful information. For example, Aoki et al. (1999) implemented an indoor positioning method based on a sequence of video observations. In this work the authors used a linear programming algorithm to recover the route taken by a person from individual pictures obtained from a video camera mounted on a rucksack strap.
Another benefit of context sequence analysis is the increased accuracy of the context recognition. For example, if we consider the case where the user is taking a bus and the output of the context recognition algorithm varies between riding a bus and driving a car, the sequence is clearly invalid. By performing sequence analysis, it is possible to check which context dominates the sequence and thereby resolve the problem. Of course, it is possible only in cases where the recognized context is correct most of the time.
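One simple way to check which context dominates a sequence is a sliding majority vote, sketched below. This is an illustrative technique chosen for this example rather than the method of any cited study, and the window length is an arbitrary assumption.

```python
from collections import Counter


def smooth_contexts(labels, window=5):
    """Replace each recognized context by the most common one in its neighborhood."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighborhood = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed
```

Applied to a sequence that is mostly "bus" with occasional isolated "car" outputs, the isolated outputs are replaced by "bus". As the text notes, this only helps when the recognized context is correct most of the time.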
Sequence analysis of the contexts has been used in several studies (Krause et al., 2003; Van Laerhoven & Cakmakci, 2000). In these research projects a Markov model was used to represent transition probabilities between various activities and thereby avoid invalid sequences of contexts. A Markov model is a common tool for representing the behavior of processes with discrete states and is commonly used in the area of speech recognition.
Sequence analysis is a desirable tool when the events recognized on the context layer are expected to change rapidly over time. This is often the case for video and audio observation. When variations in the context are not particularly rapid, as in the case of physical activity (walking, running, sitting, etc.), modeling of transition probabilities becomes more challenging. The measurements used for training should be recorded without any scenario that forces the user to change activities in a particular order and at particular time instances. In general, this means that unsupervised learning should be favored for training the transition probabilities in the case of slowly changing contexts.
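The transition probabilities of a first-order Markov model can be estimated from a recorded context sequence by counting, as in the sketch below. The function names and the example sequence are assumptions; the cited studies may have estimated and used their models differently.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequence):
    """Estimate P(next context | current context) from an observed sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def transition_probability(model, current, following):
    """Look up a transition; transitions never observed get probability 0."""
    return model.get(current, {}).get(following, 0.0)
```

A transition whose estimated probability is zero or very low, such as a direct change from riding a bus to driving a car, can then be treated as a sign of an invalid context sequence.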
4. Main concerns
Context awareness is not a free lunch. There are several problems to be solved before wide adoption of the technology is possible. First of all, the applications are supposed to run on portable devices that contain fragile sensors. Mobile phones are durable and do not usually break as a result of a minor fall, but a context-aware sensor system might suffer partial or total damage. (Schmidt, 2002)
Limited battery life, broken sensors, and faulty connections give rise to missing or defective sensor values. When sensor values are faulty the system might not be able to compensate for the information loss. Sensor loss could be compensated either by design with overlapping sensors or by signal processing algorithms. For example, in activity classification in case of missing or defective sensor values the classifier could be taught to compensate for information loss by using information from other sensors.
One way to overcome the problem of missing measurements is to use a classification tree supplemented with so-called surrogate splitters. We have proposed a novel classifier based on them (Avdouevski et al., 2008). A regular classification tree is an algorithm producing a sequence of questions. Each question analyzes whether a particular feature value exceeds a predefined threshold. The following question is chosen based on the answer to the previous one. The resulting sequence of answers to these questions defines the final classification outcome. In a classification tree with surrogate splitters, instead of using just one set of features and thresholds for each question, multiple optional feature and threshold sets from several sensors are used. These are defined using statistical analysis of feature distributions. The supplemented tree is able to cope with missing sensor values, because a compensatory option is often available. In addition, the classifier uses the latest complete classification result as extra knowledge. We have shown that the method is able to compensate for the loss of sensors for half a minute at a time without a significant reduction in classification accuracy compared to conventional methods.
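The idea of surrogate splitters can be illustrated with the simplified sketch below. It is not the classifier of Avdouevski et al. (2008): the feature names and thresholds are hypothetical, every surrogate is assumed to split in the same direction as the primary question, and falling back to the last label when no question can be answered is only one possible way of using the latest complete classification result.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    # (feature name, threshold) pairs: the primary question first, surrogates after.
    questions: List[Tuple[str, float]]
    if_greater: object
    otherwise: object


def classify(node, features, last_label=None):
    """Answer each split with the first question whose feature value is available."""
    while isinstance(node, Split):
        for name, threshold in node.questions:
            value = features.get(name)
            if value is not None:
                node = node.if_greater if value > threshold else node.otherwise
                break
        else:
            # No sensor could answer this split: reuse the previous result.
            return last_label
    return node.label


# Hypothetical tree: the thigh sensor acts as surrogate for the wrist sensor.
TREE = Split(
    [("wrist_accel_std", 0.3), ("thigh_accel_std", 0.25)],
    if_greater=Leaf("walking"),
    otherwise=Leaf("sitting"),
)
```

When the wrist measurement is missing (absent or `None`), the split is answered with the thigh measurement instead, so a classification result is still produced.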
Context aware devices are part of pervasive systems that strive for invisibility. Hence, the actual devices have to be small in size in order to be unobtrusive. This is especially true for portable devices such as mobile phones, which also have to be light in weight. These factors limit the possibilities of embedding new functionality into any device. For example, a pocket-sized mobile phone cannot contain many sensors for context awareness. Similarly, battery and memory unit sizes are limited. Therefore, saving energy and efficient memory usage become critical.
4.2. Energy consumption
A context-aware system typically consists of a personal device, an external system and a connection between them. Portable devices, like mobile phones, are not connected to line current and so long battery life is an important part of usability. Energy consumption can be reduced with system design solutions, hardware selection and algorithms.
There is a tradeoff between the “always on” demand of mobile applications and their energy consumption. If a device measured, for example, body movements of the user at a constant high sampling rate, the batteries would soon run out. For ideal use of resources, the device has to know when and what to measure. This is difficult because relevant information can be lost if the samples are taken too infrequently.
There are several algorithms to minimize energy consumption. One way is to limit the sampling frequency or the instances at which the samples are taken. This also frees up processor time and reduces memory requirements and network usage. Moreover, in order to save memory, one might save preprocessed information, like features or statistics, instead of the raw signal. Heuristics related to the application field can also be used for the task, although generic solutions cannot be created this way. For example, in activity monitoring it is probable that the individual will continue the activity for a prolonged period of time. Thus, in a week-long activity monitoring period, samples could be saved only once a minute (Marschollec et al., 2007).
This idea can be extended so that the sampling frequency is adapted to the current context. In our work (Makkonen et al., 2008) we considered the activity recognition problem by treating each activity as a state of a state machine. Each state defines how samples are measured and which features are calculated for the classification. Measurement parameters include the time interval between measurement blocks and the sampling rate of each block. Furthermore, after each measurement the stability of the state is determined based on feature values. In the case of stability, the algorithm skips samples as it is assumed that the state is not likely to change rapidly. Conversely, in the case of instability, the system measures more samples and performs a more thorough activity classification.
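The following sketch illustrates the general idea of context-dependent sampling. It is a simplified illustration and not the algorithm of Makkonen et al. (2008): the per-state parameters, the stability test based on feature differences, and the `classify` callback are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateConfig:
    block_interval_s: float   # pause between measurement blocks while the state is stable
    sampling_rate_hz: float   # sampling rate used inside each measurement block


# Hypothetical per-activity measurement parameters.
CONFIGS = {
    "sitting": StateConfig(block_interval_s=30.0, sampling_rate_hz=10.0),
    "walking": StateConfig(block_interval_s=10.0, sampling_rate_hz=25.0),
}
UNSTABLE_INTERVAL_S = 1.0  # measure again quickly when the state seems to change


def is_stable(features, previous_features, tolerance=0.1):
    """Assumed stability test: no feature moved by more than `tolerance`."""
    return all(abs(features[k] - previous_features[k]) <= tolerance for k in features)


def step(state, features, previous_features, classify):
    """Return (next state, delay in seconds before the next measurement block)."""
    if is_stable(features, previous_features):
        # Stable: keep the state and skip samples until the next block.
        return state, CONFIGS[state].block_interval_s
    # Unstable: run the full classification and measure again soon.
    return classify(features), UNSTABLE_INTERVAL_S
```

In this sketch the expensive classification runs only when the features indicate a possible change of activity, which is where the savings in measurement and processing would come from.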
The approach reduces energy consumption by measuring and transmitting fewer samples, calculating fewer features, and performing the classification less often. We have shown that the recognition performance does not drop dramatically compared to the cases when all available samples are used. This enables the approach to be used in mobile applications. (Makkonen et al., 2008)
Processing capacity on small portable devices is limited. For the user, this usually means delays in functionality. Savings in processing time also reduce power consumption. Again, processing complexity can be minimized by sensor selection, system design, and sampling schemes. Adaptive sampling or a lowered sampling frequency minimizes the number of samples needed, and therefore processor time. Feature selection controls the amount of calculation needed. Usually, simple features like mean and variance are faster to calculate than, for example, a Fourier transform. Even classification algorithms can be selected to save processing time. For example, nearest neighbor classification can be done by calculating the distance to all previous samples, which is a heavy operation, or to the cluster centers, which is a significantly lighter task.
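The lighter variant mentioned above can be sketched as follows. For simplicity one center per context is assumed, computed as the mean of that context's training vectors; the training data is invented for illustration.

```python
import math
from collections import defaultdict


def class_centroids(training_set):
    """Compute one center per label as the mean of its training vectors."""
    groups = defaultdict(list)
    for vector, label in training_set:
        groups[label].append(vector)
    return {
        label: tuple(sum(col) / len(vs) for col in zip(*vs))
        for label, vs in groups.items()
    }


def nearest_centroid(centroids, query):
    """Classify by distance to the centers instead of to every training sample."""
    return min(centroids, key=lambda label: math.dist(centroids[label], query))


# Hypothetical labeled feature vectors.
TRAINING = [
    ((0.0, 0.0), "sitting"), ((0.1, 0.1), "sitting"), ((0.2, 0.0), "sitting"),
    ((1.0, 1.0), "walking"), ((1.1, 0.9), "walking"), ((0.9, 1.2), "walking"),
]
CENTROIDS = class_centroids(TRAINING)
```

Here each classification needs two distance calculations instead of six, and the gap widens as the training set grows, at the cost of a coarser description of each context.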
Although the network connection is a major energy drain, in some context aware applications the mobile device needs to be in continuous communication with an external system. One way to limit network power consumption is timed functionality. If the system uptime is limited, energy drain is reduced. So instead of always on, there could be the following options: only on when selected on, on with limited functionality, or on with timed functionality.
Context awareness can enhance situation awareness and provide more services, but it can also involve so much interaction that using the system becomes burdensome. Conversely, if there is not enough interaction, the user can be denied important functionality. When users are subjected to a barrage of enquiries, they are likely to give random responses in order to get rid of the query (Ackerman et al., 2001). In addition, some of the automatic actions taken by the applications can even be dangerous. These error situations can arise as a result of false recognition of the context.
For example, the autopilots of aircraft and crash prevention systems of cars are critical applications. They use a sensor system for detecting abnormal situations. If they make false assumptions on the status of the environment, they might cause dangerous situations. Although these erroneous actions are unlikely, every now and then they happen and sometimes lives are lost.
Humans define context differently than computers and usually want devices to react in the same way in the same circumstances. However, when awaiting an important phone call, the user will probably willingly accept the call, even if it means interrupting a meeting or other such activity. Depending on the situation, context information might not be clear. The human aspect is hard to define and therefore a context aware system might not be able to differentiate between states. It is normal for a human to forget things, such as switching a mobile phone to off or to quiet and then back on. A context aware device might be able to compensate for much of this, but there are also situations when the user wants the system to act in the opposite way to previous situations.
4.4. Legislation and ethics
Apart from unintended fully automatic actions, a major concern is maintaining privacy. When a context aware system is in use, sensors continuously collect data that could be considered private. Since privacy is considered a human right, almost everything concerning personal information is regulated by laws: collecting, sharing, transferring, manipulating, storing, combining, using and so on. Some of the regulations can be by-passed with the user's consent, but not all of them and not everywhere. Laws differ between information sources and regions, and what may be legal in one country may be illegal in another.
Fair information practices are a group of commonly accepted principles in privacy protection. There are five principles involved: notice and awareness of collected information; choice and consent to give information; access to given information and participation; integrity and security of the information given; and enforcement and redress. The last item is included because without proper enforcement, the principles would be ineffective. (Federal Trade Commission, 2009)
Laws governing information privacy attempt to protect personal information from misuse and disclosure. Context awareness, and information technology overall, is still a new technology, so the laws dealing with it are still changing in many regions. In the European Union the main guidelines are given by Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data. However, practices still vary between countries in the area. Briefly, the Directive states that a reason must exist for the collection of such data, that the data is only used for that purpose, and that when the reason ceases to be valid the data is deleted. Collected data should not be shared without legal authorization or the consent of the individuals concerned. Records must be kept up to date and be available for inspection. Data should be protected and should not be transferred to destinations where it might be vulnerable. Finally, data such as sexual orientation, religion and political beliefs must be considered too sensitive to be collected in all but extreme circumstances. (Wikipedia, 2009a; Wikipedia, 2009b; European Commission, 2009)
Although privacy is mainly a legal or ethical issue, the solution to privacy protection lies in technical details and design. There are several ways to maintain privacy in context aware systems. The system could either be designed to collect only information that is not considered private, or the information could be processed so that personal information is blurred or removed. Privacy protecting algorithms work at different levels. For example, sensors can remove some of the information and after that the processing unit that gathers data from several sensors can process it anonymously.
If the collected data is considered private, it is important that it is stored safely. Access to the information should be limited to authorized users. Interactive systems keep the user aware of the shared information, but more invisible technology, which is designed not to need interaction, can collect information that the user does not want to be collected. Information sharing is limited by laws that protect privacy, but ethics do not necessarily correspond to laws, and organizations may attempt to profit from such information.
Context awareness is a useful enhancement to many applications, but the ethics of the technology raise certain questions. From an ecological perspective, the “always available” technology has implications for energy consumption and recyclability. Most context aware devices are designed to consume small amounts of energy. However, if every person were to have a device constantly switched on, the energy drain would be significant. Recyclability in this context means the reusability and sharing of devices. If several devices can make use of the same sensors, environmental and economic costs will be lower.
In a home monitoring system that records information on people inhabiting or visiting a house, ethical issues arise when individuals are unaware of being monitored or when some of the information is made available to unknown people. For example, a doctor or other health care professional might check on the condition of the person using the monitoring system. The patient is aware of this and is aware of the sensors in use. There might be a camera in use, but the patient knows where it is and when it is on. However, the situation would be unethical if the person were unaware of any monitoring activity or if there were no privacy.
Kosta et al. (2008) have considered ethical and legal issues concerning user-centric ambient intelligence services. These scenarios are comparable to other context-aware applications. For example, a home monitoring system enables the elderly to stay in their homes longer without the need for institutional care. Though this is beneficial, the system incurs costs, which accrue from the system parts and the monitoring personnel. Furthermore, the person being monitored needs to know how to use the technology. All this indicates that the system is available only to a privileged group, and thus could promote social inequality.
We have presented a wide variety of applications that use context awareness and the technologies behind the concept. We have also discussed implementation issues and problems that are yet to be solved. Our own work in the field continues as we try to overcome these obstacles. Context awareness needs cheaper and lighter sensors, better ways to save energy, and novel ways of gathering knowledge about the surrounding world. In addition, legislation has to be carefully considered when designing new applications that may pose a threat to privacy.
Based on the current state of the art and our own work, it can be said that the concept of context awareness is gaining an ever stronger foothold in information technology. Thus, it is reasonable to assume that the concept will become widely used in future HCI, and that many new application areas that use it will appear.