Results in terms of accuracy on the bedroom scenario of the CAD-60 dataset (‘new person’) using single classifiers, a simple averaged ensemble (AV) and the DBMM.
This chapter discusses the use of dynamic Bayesian networks (DBNs) for time-dependent classification problems in mobile robotics, where Bayesian inference is used to infer the class, or category of interest, given the observed data and prior knowledge. Formulating the DBN as a time-dependent classification problem, and by making some assumptions, a general expression for a DBN is given in terms of classifier priors and likelihoods through the time steps. Since multi-class problems are addressed, and because of the number of time slices in the model, additive smoothing is used to prevent the values of priors from being close to zero. To demonstrate the effectiveness of DBN in time-dependent classification problems, some experimental results are reported regarding semantic place recognition and daily-activity classification.
- dynamic Bayesian network
- Bayesian inference
- probabilistic classification
- mobile robotics
- social robotics
Bayesian inference finds applications in many areas of engineering, and mobile robotics is not an exception. When time is a variable to be considered, the dynamic Bayesian network (DBN) [1–5] is a powerful approach to be considered. Due to its graphical representation and modelling versatility, DBN facilitates the problem-solving process in probabilistic time-dependent applications. Therefore, DBNs provide an effective way to model time-based (dynamic) probabilistic problems and also enable a very suitable and intuitive representation by means of a graph-based tree.
Depending on the structure of the DBN, the joint probabilistic distribution that governs a given system can be decomposed by a tractable product of probabilities, where the conditional terms only depend on their directly linked nodes. This chapter concentrates on inference problems using DBN where the variable to be inferred from a feature vector (data) represents a set of semantic classes or categories, in the context of intelligent perception systems for mobile robotics applications. Namely, we will address problems where denotes semantic places in a given indoor environment [6, 7] e.g. and also when the classes of interest are daily-live activities [8, 9].
The principle of Bayesian inference basically depends on two elements: the prior and the likelihood; in practical problems, the evidence probability acts ‘only’ as a normalization to guarantees that the posterior sums to one. In this chapter, we will deal with the problems of the classical Bayesian form , but the incorporation of (past) time will be explicitly modelled in a discrete-time basis, and the
The observed data enters the DBN in the form of a vector of features
The use of Bayesian inference in mobile robotics for purpose of localization, simultaneous localization and mapping (SLAM), object detection, path planning and navigation, has been addressed in many scientific works; see Ref.  for a review. The majority of those applications involve stochastic filtering, such as Kalman filter (KF), particle filter (PF), Monte Carlo techniques and hidden Markov model (HMM) [11, 12]. However, when the parameter of interest has to be inferred from multidimensional feature vectors ( e.g. feature vectors with hundreds of elements) and also when the distribution that the observed data were drawn is not known (in unseen/knew or testing scenarios) then, a DBN can be used to handle such complex problems. In robotics, semantic place classification [6, 7] and activity recognition [8, 9] are examples of such problems and belong to the research area of pattern recognition. For these application cases, the class-conditional probabilities (or likelihoods) can be modelled using machine learning techniques, for example, naive Bayes classifier (NBC), support vector machines (SVMs) and artificial neural networks (ANNs) [13, 14].
The remainder of this chapter is organized as follows: a brief review of the DBN is given in Section 2. Section 3 addresses inference in DBN, formulated for purposes of pattern recognition in robotics, followed by the use of additive smoothing on the prior distributions. In Section 4, experimental results on semantic place classifications and activity recognition are presented. Finally, Section 5 presents our conclusions.
2. Preliminaries on DBN
Basically, a DBN is used to express the joint probability of events that characterizes a time-based (dynamic) system, where the relationships between events are expressed by conditional probabilities. Given evidence (observations) about events of the DBN, and prior probabilities, statistical inference is accomplished using the Bayes theorem. Inference in pattern recognition applications is the process of estimating the probability of the classes/categories given the observations, the class-conditional probabilities, and the priors [15, 16]. When time is involved, usually the system is assumed to evolve according to the first-order Markov assumption and, as consequence, a single time slice is considered.
In this chapter, we address DBN structures with more than one time slice. Moreover, the conditional probabilities of the DBN will be modelled by supervised machine learning techniques (also known as classifier or classification method). Two case studies will be particularly discussed: activity recognition for human-robot interaction and semantic place classification for mobile robotics navigation.
The observed data variable, denoted by , enters into the DBN in the form of conditional probabilities , where the values of
In summary, DBN is a direct acyclic graph (DAG) that consists of a finite set of events (the nodes or vertices) connected through edges (or arcs) that model the dependencies among the events and also the time variable. Here, the nodes are given by the variables , and the dynamic (time-based) behaviour of the BDN is considered to be governed by the current time
3. Inference with DBN
The problem is formulated by considering i.e., the joint distribution of the nodes over the time up to
The simplest case is for a single time slice where the posterior reduces to . For two time slices, we have
As the number of time slices increases, the problem of inferring the class becomes more complex; therefore, some assumptions can be made in order to find a tractable solution. As a first assumption, let the nodes be independent of later (subsequent in time) nodes. As a consequence, and taking as the example for
This strategy for ‘updating’ the values of the prior by taking the values of previous posteriors is a very common and effective technique used in Bayesian sequential systems. The steps involved in the calculation of the posterior probability, as expressed in Eq. (2), are illustrated in Figure 3.
Selection of the class-conditional model to express is an important part of the approach and can be achieved by well-known probabilistic machine learning methods. Although generative methods (e.g. Naïve Bayes, GMM and HMM) provide direct probabilistic interpretation and, therefore, constitute appropriate choices, discriminative methods (e.g. SVM, random forest and ANN) tend to have better classification performance. However, to be a suitable model, a given discriminative method has to be of a probabilistic form; this implies, at least, that the outcomes from the classifier sum to one. A more advanced method can be used to model in a DBN, as the dynamic Bayesian mixture model (DBMM) , where a mixture of
The product of likelihoods and priors, in the expression of the a-posteriori Eq. (2), has the consequence of penalizing the classes that are less likely to occur. In other words, the classes with low probability, i.e. close to zero, will have an even more low values of posterior; this effect is intensified as the number of time slices increases. Because the priors are recursively assigned by assuming the values of the previous posteriors, we suggest to use additive smoothing to avoid values of priors to be very close to zero.
Additive smoothing, also called Lidstone smoothing, adds a term (
Figure 4 provides an example of the impact of
4. Experiments on classification: mobile robotics case studies
In order to demonstrate the use of the DBN as formulated above, we will consider two classification problems that find applications in mobile robotics: semantic place recognition  and activity classification .
4.1. Semantic place recognition
Figure 5 illustrates a probabilistic system for semantic place recognition where data comes from a laser scanner sensor. In a practical application, the sensor is mounted on-board a mobile robot [6, 7]. Based on Figure 5, we can make a direct correspondence with the DBN discussed above by verifying that the feature vector is
As an example of the DBN application in semantic place classification, let us report some results from Ref. , where a DBN was applied on the image database for robot localization (IDOL) dataset: available at http://www.cas.kth.se/IDOL/. In this context, the problem of semantic place classification can be stated as follows: ‘given a set of features, calculated on data from laser scanner sensors (installed on-board a mobile robot), determine the semantic robot location (‘corridor’, ‘room’, ‘office’, etc) by using a classification method’. The experiments in Ref.  use a mixture of classifiers to model the class-conditional probability in the DBN; such approach is called DBMM .
Figure 6 shows recognition results in a sequence of nine frames from the IDOL dataset, where the first row depicts images of indoor places as captured by a camera mounted on-board a mobile robot. The second row provides classification results without time slices (i.e. time-base prior probabilities are not incorporated into the DBN), and the subsequent rows show classification probabilities for a DBN with time-slices up to three. In the figure, the vertical line (in red) indicates the transition between classes: from the class ‘kitchen’ (KT) to the class ‘corridor’ (CR).
4.2. Activity classification
In the case of the activity classification problem described here, the objective is to classify the human’s daily activity based on spatiotemporal skeleton-based features. In such a case, mobile robots mounted with appropriated cameras can make use of such classification models to improve the quality of life of, for example, old-age people, by assisting them in their daily life or detecting anomalous situations. Similar to semantic place recognition problem, the activity classification problem can also be seen as a time-dependent probabilistic system, where the feature vector
Figure 7 exhibits an activity classification framework, based on Ref. , which uses a DBN with mixture models (the DBMM approach as previously described in the semantic place classification problem), where the data is acquired by using an RGB-D sensor, followed by the skeleton detection step and the feature extraction process, where the latter is based on geometrical features. From the training stage, global weights are computed using an uncertainty measure (e.g. entropy) as a confidence level for each base classifier based on their performance on the training set. During the test, given the input data (i.e. skeleton features for the current activity), base classifiers are used and merged as mixture models with time slices (using previous time instant classification) to reinforce the current classification.
The well-known dataset for activity recognition Cornell Activity Dataset (CAD60) [9, 17] was used to evaluate the proposed framework in Refs. [8, 18]. The CAD-60 dataset comprises video sequences and skeleton data of human daily activities acquired from a RGB-D sensor. There are 12 human’ daily activities performed by four different subjects (two male and two female, one of them being left-handed) grouped in five different environments: office, kitchen, bedroom, bathroom and living room. Additionally, the CAD-60 dataset has two more activities (random movements and still), which are used for classification assessment on test sets, in order to evaluate precision and generalization capacity of the approaches since these activities encompass similar movements to some other activities. We have adopted the same strategy described in Ref. , so that we present the classification results in terms of precision (Prec) and recall (Rec) for each scenario. The evaluation criterion was carried out using leave-one-out cross-validation. The idea is to verify the generalization capacity of the classifier by using the strategy of ‘new person’, i.e. learning from different persons and testing with an unseen person. The classification is made frame-by-frame to account for the accuracy of the frames correctly classified.
Results show the DBMM approach obtained better classification performance compared to other state-of-the-art methods presented in the ranked table in Ref. . The overall results were precision: 94.83%; recall: 94.74% and accuracy: 94.74%. Figure 8 presents the classification performance (i.e. precision and recall) for the ‘new person’ tested in each scenario. For comparison purposes, Table 1 summarizes the results in terms of accuracy of state-of-the-art single classifiers and a simple averaged ensemble compared with the proposed DBMM for the bedroom (scenario with more misclassification), showing that our approach outperforms other classifiers. The classification performance in terms of overall accuracy, precision and recall has shown that our proposed framework outperforms state-of-the-art methods that use the same datasets .
In this section, we have shown the DBMM [8, 18] performance using an offline dataset. Additionally, further tests using a mobile platform with an RGB-D sensor on-board running on-the-fly in an assisted living context was also successfully validated with accuracy above 90%, as reported in Ref. . More details about the DBMM using a mobile robot for activity recognition and a video showing the classification performance can be found in Ref. .
In this chapter, the authors have presented a DBN formulation for classification of time-dependent problems together with experimental results on applications of two mobile robots. The first one regarding the semantic place classification and the second one based on activity classification. In both formulations, the DBN was used as basis to compose the DBMM [6, 8, 18], a more complex structure used to handle more complex scenarios. In both applications, the DBMM has shown to be a powerful choice in modelling of time-dependent scenarios.
When it comes to semantic place classification, the model could detect classes’ transitions during the robot navigation, thanks to the different time slices (i.e. higher than 2) and the additive smoothing used in the model. In the case of activity recognition, since the activities in the dataset do not have classes’ transitions, i.e. only one activity is performed during a task, in this case, a simple version of the DBMM using only one time slice is enough to correct classify all activities. For real-time applications using a mobile robot and in accordance with experimental results reported in Ref. , it is suggested to use more than two time slices in the mode.