Blind Implicit Source Separation – A New Concept in BSS Theory



Introduction
The Blind Source Separation (BSS) problem was first introduced (Herault et al., 1985; Jutten & Herault, 1988) in the context of biological problems (Ans et al., 1983; Herault & Ans, 1984), with the aim of separating a set of signals generated by the central nervous system. A few years later, several methods based on BSS were applied to other fields of industry and research (Deville, 1999). The BSS problem arises from the need to recover the original sources from a mixture about which nothing is known. The extraction is characterised as a blind process because of the lack of information about three aspects: the characterisation of the sources, the number of sources present at the time of the mixture, and the way in which the mixture is performed. Although this information is unknown, the problem can be solved if the input signals to the mixing process are statistically independent. The related literature provides several methods, most of which have been classified according to the context in which the mixture is performed: linear mixture model, convolutive mixture model, and non-linear mixture model. The first part of this chapter describes the most relevant existing works that apply these methods to the audio field. Many real problems, however, do not support this simplification, so this part stresses the need for a full characterisation of the problem, mainly regarding the mixing process and the nature of the sources involved.
Typically, the goal of BSS theory is to extract a set of variables matching the sources involved in the mixture. We have detected, however, other research fields where the goal is to extract from the mixture another set of variables, which appear as implicit functions of the hidden sources. Extracting these variables poses a new challenge for BSS theory, and it becomes particularly complex when the sources have a noisy nature. In the second part of this chapter, a complete definition of this new problem is introduced, for which the BSS problem in its classical form must be reformulated. Used for the first time in (Mato-Méndez & Sobreira-Seoane, 2011), within a pattern recognition context, the Blind Implicit Source Separation (BISS) concept opens an interesting research field. The BSS-PCA algorithm proposed in the research work referenced above successfully solves the problem of classifying traffic noise. Within this algorithm, the BISS problem is handled in an embedded way. Motivated by the promising results achieved, a new compact expression for the BISS solution is now proposed. The new BISS-PCA method introduced here robustly solves the feature extraction process for the problem described. The conclusions of this

Blind audio source separation
The aim of BSS theory is to extract p unknown sources from m mixtures acquired through a sensor network. To solve this problem, the literature provides a wide set of methods, most of them collected in (Comon & Jutten, 2010; Hyvärinen et al., 2001). Many of these algorithms have been applied to audio signals, and they can be classified according to the solution of three different problems. First, the denoising of an undesired mixture produced by both the channel noise and the sensor network noise. Second, the separation of musical sources from an audio mixture. Finally, the problem created by the "cocktail party" effect (Cherry, 1953), generated when several speakers talk at the same time in reverberant field conditions. Other problems appearing in the state of the art can be analysed as a combination of the above.

Mixture models
The study of the solution becomes very complex when the variety of problems and application contexts is taken into account. For many years, however, these problems have been addressed according to how the mixing process is performed. A generic mixture model for the BSS problem can be written as
$$\mathbf{x}(n) = H(\mathbf{s}(n)) + \boldsymbol{\eta}(n), \qquad (1)$$
where $H$ is a function of both the channel and the sensor network, and $\boldsymbol{\eta}$ is an additive Gaussian noise signal, independent of the p sources of $\mathbf{s}$. Existing methods can thus be classified according to this criterion (see (Comon & Jutten, 2010; Mansour & Kawamoto, 2003; Pedersen et al., 2007; Puntonet G., 2003) for more detail) into the categories described below.

Instantaneous mixtures
Source separation from instantaneous mixtures was one of the first applications of BSS in the audio field. For signals acquired in a recording studio, the mixing process can be considered instantaneous for three reasons. First, the signals associated with each of the sources can be considered independent because they are acquired at different times and at different spatial locations. Second, the multipath contributions associated with both the sources and the sensors can be neglected thanks to the acquisition process of the mixture. Third, studios can be considered "noise free" controlled environments. The signals recorded under these conditions therefore contain neither relevant undesired reflections nor significant noise contributions, and many authors approach this problem by means of an instantaneous mixture model. In this situation the channel has no memory, so the mixture acquired by the j-th sensor can be modelled as
$$x_j(n) = \sum_{i=1}^{p} h_{ji}\, s_i(n). \qquad (2)$$
In this context, the function $H$ in (1) can be identified with a real matrix verifying that
$$\mathbf{x}(n) = H\, \mathbf{s}(n), \qquad (3)$$
where the vector $\mathbf{x}$ contains the contributions of the m sensors in the array. The separation problem is thus reduced to solving the system of Eq. (3). In this case, the solution can be achieved by applying ICA to this equation. Before proceeding, it is necessary to have at least as many mixtures as sources. Besides, at most one source can show a Gaussian distribution. Under these conditions, the separation is performed by calculating an estimate of the mixing matrix that minimises the statistical dependence between the components of the original signals.
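As a minimal sketch of the instantaneous model, the following NumPy fragment builds a memoryless mixture of p = 2 sources on m = 3 sensors. The source distributions, the random mixing matrix `H` and all variable names are illustrative assumptions. The last step inverts the mixture using the known `H`, which is exactly the information a blind method does not have; it is included only to show why at least as many mixtures as sources are needed.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n_samples = 2, 3, 1000           # p sources, m sensors (m >= p)

# Two non-Gaussian sources (illustrative choices): uniform and Laplacian.
s = np.vstack([rng.uniform(-1, 1, n_samples),
               rng.laplace(0, 1, n_samples)])

H = rng.standard_normal((m, p))        # hypothetical memoryless mixing matrix
x = H @ s                              # x(n) = H s(n), one column per sample

# With H known and of full column rank, least squares recovers the sources.
# A BSS method has to estimate this inverse without access to H.
s_rec = np.linalg.pinv(H) @ x
print(np.allclose(s_rec, s))
```

With fewer sensors than sources (m < p) the pseudo-inverse no longer recovers `s`, which corresponds to the underdetermined case.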
In most applications, the contribution of the sensor array and the channel makes it impossible to neglect the effect of noise. Therefore, the signal acquired by the j-th sensor can be expressed as where $s_j(n)$ is the noise signal acquired by the j-th sensor, and $c_j(n)$ is the noise signal provided by the channel. The latter is typically characterised as wide-band noise, with an $N(\mu_{c_j}, \sigma_{c_j})$ distribution for that sensor. It is usual to express the sum of these two noise signals as Taking into account this undesired effect, Eq.

Convolutive mixtures
When the mixture is not instantaneous, the channel has memory, so the signal acquired by the j-th sensor can be expressed as
$$x_j(n) = \sum_{i=1}^{p} \sum_{l=0}^{r} h_{ji}(l)\, s_i(n-l),$$
where r is the order of the FIR filter that models the mixture. Thus, this mixture can be modelled by means of the expression
$$\mathbf{x}(n) = \sum_{l=0}^{r} H(l)\, \mathbf{s}(n-l).$$
This is the convolutive model, where $H(l)$ is the matrix that models the channel and $H(z)$ the matrix that models the effects of the sources on the observations. This last matrix can therefore be written by means of the Z transform as
$$H(z) = \sum_{l=0}^{r} H(l)\, z^{-l}.$$
Several ICA-based algorithms can be applied in this case to carry out the separation process. In the context of audio, the convolutive problem is classically analysed by means of second order statistics (Ehlers & Schuster, 1997; Ikram & Morgan, 2001; Kawamoto et al., 1999; Rahbar & Reilly, 2001; Sahlin & Broman, 1998; Weinstein et al., 1993), higher order statistics (Charkani & Deville, 1999; Jutten et al., 1991b; Nguyen et al., 1992; Nguyen & Jutten, 1995; Van Gerven et al., 1994) and the probability density function (Bell & Sejnowski, 1995; Koutras et al., 1999; Lee et al., 1997a; Torkkola, 1996).
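The convolutive model can be illustrated with a short NumPy sketch, assuming the channel is given as a stack of FIR tap matrices H(l). The function name `convolutive_mix`, the array layout and the toy dimensions are assumptions made for this example only.

```python
import numpy as np

def convolutive_mix(H, s):
    """Mix p sources through a matrix FIR channel.

    H : array of shape (n_taps, m, p), H[l] is the tap matrix H(l)
    s : array of shape (p, N), one row per source
    Returns x of shape (m, N) with x(n) = sum_l H(l) s(n - l).
    """
    n_taps, m, _ = H.shape
    N = s.shape[1]
    x = np.zeros((m, N))
    for l in range(n_taps):
        delayed = np.zeros_like(s)
        delayed[:, l:] = s[:, :N - l]      # s(n - l), zero before the start
        x += H[l] @ delayed
    return x

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 500))             # two toy sources
H = rng.standard_normal((4, 3, 2))         # 4 taps, 3 sensors, 2 sources
x = convolutive_mix(H, s)
```

Setting `n_taps = 1` reduces the function to the instantaneous model, since only H(0) remains.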

Nonlinear mixtures
In a more general approach, the function H in Eq. (1) does not admit a linear form. This is the case for the separation of traffic noise sources in a general context. In this problem, the original sources cannot be observed and it is unknown how their signals have been mixed. So, if it is possible at all, the extraction of the signals that make up the resulting mixture can be characterised a priori as a blind separation process.
For nonlinear mixtures it is usual to simplify the problem by using a post-nonlinear mixture model of the form $\mathbf{x}(n) = H_1(H_2\, \mathbf{s}(n))$, where $H_2$ is a real matrix and $H_1$ a nonlinear function. Approaches to solving it based on second order statistics (Molgedey & Schuster, 1994) and on the probability density function (Solazzi et al., 2001; Valpola et al., 2001) can be consulted.
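A post-nonlinear mixture can be sketched as a linear stage followed by a component-wise nonlinearity. In the fragment below the matrix `H2` and the choice of `tanh` for the nonlinear stage are assumptions chosen so that the model is easy to invert; real distortions are generally unknown and need not be invertible in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=(2, 500))      # two toy sources

H2 = np.array([[0.6, 0.3],
               [0.2, 0.7]])                # hypothetical linear stage
H1 = np.tanh                               # hypothetical sensor nonlinearity

x = H1(H2 @ s)                             # post-nonlinear mixture

# If both stages were known, the mixture could be undone in two steps:
# first invert the nonlinearity, then solve the remaining linear system.
s_rec = np.linalg.solve(H2, np.arctanh(x))
```

A blind method has to estimate both stages from `x` alone, which is why this model is considerably harder than the linear one.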

Full problem approach
The usual procedure in BSS is to analyse the problem by identifying its mixing model. A proper application of the methods described, however, requires additional knowledge about both the mixing process and the nature of the sources involved. Thus, to set an accurate separation strategy it is necessary to add further information.
The BSS problem is well studied for those situations in which the number of observations is higher than the number of sources (over-determined problem) or equal to it (determined problem). For the remaining situations (underdetermined problem), much remains to be done. This has led to research works focused on solving underdetermined problems (Nion et al., 2010; Rickard et al., 2005; Sawada et al., 2011; Zhang et al., 1999a), and on optimising the solution for over-determined problems (Joho et al., 2000; Yonggang & Chambers, 2011; Zhang et al., 1999a).
In addition, prior knowledge about both the statistical and the spectral characterisation of the sources leads to more efficient separation methods. This information can be extracted by means of BSS algorithms that exploit second order statistics for non-stationary sources (Kawamoto et al., 1999; Mansour & Ohnishi, 2000; Matsuoka et al., 1995; Pham & Cardoso, 2001; Weinstein et al., 1993) and cyclo-stationary sources (Knaak et al., 2002; 2003). These are also suitable for the separation of white sources (Mansour et al., 1996). Some of the information contained in wide-band sources, however, cannot be extracted using second order statistics alone. In this case, algorithms based on higher order statistics must be applied. Finally, many of the algorithms show excellent performance on synthetic mixtures, but a significant degradation of the results is observed when they are applied to real mixtures. In addition, a distinction must be made between master-recorded and live-recorded mixtures. Research works addressing the separation of audio signals in real conditions can be found in (Kawamoto et al., 1999; Koutras et al., 2000; Lee et al., 1997a; Nguyen et al., 1992; Sahlin & Broman, 1998).
The BSS problem applied to the extraction of signals from a noisy mixture is well studied. The residual signal in this case is typically characterised as white noise. A particularly complex problem occurs, however, when the signals to be extracted are themselves noise signals. Besides, these are in general characterised as coloured noise, as is the case for traffic noise sources. In this sense, our research on the application of BSS to real traffic noise mixtures may be considered pioneering work. The closest research can be found in the study of mechanical fault diagnosis in combustion engines. This is a less complex problem because the signal acquisition is performed with the engine isolated. That research is focused on applying BSS to the study of its vibrational behaviour. Existing papers (Antoni, 2005; Gelle et al., 2000; Knaak & Filbert, 2001; Knaak et al., 2002; Wang et al., 2009; Wu et al., 2002; Ypma et al., 2002) show the difficulty of finding satisfactory solutions. The complexity of applying BSS theory becomes higher when other sources involved in the generation of the traffic noise signal are incorporated. The next section is devoted to the study of this problem in the context of pattern recognition, for which the BSS problem needs to be reformulated.

Blind implicit source separation
This new concept is related to the classical definition of sources in a BSS problem, and we have detected it in classification problems involving noise signals. In a generic classification problem, the main goal is to assign an unknown pattern $\phi$ to a given class $C_i$. This class belongs to the set $C$ of c previously determined classes. The starting condition is that each pattern is represented by a single vector of features and cannot belong to more than one class. Under these hypotheses, the pattern may be uniquely represented by $\phi = [\phi_1, \phi_2, \ldots, \phi_d]^T$, where d is the number of extracted features and the dimensionality of the classification problem. For a better understanding of the new BSS concept, the following two examples of application may be considered:

• Mechanical fault diagnosis in combustion engines
For the context described, fault diagnosis can be seen as the combination of two problems to be solved: a classification problem and a source separation problem. The application of BSS thus has two purposes. The first task is to separate the relevant information from the wide-band noise associated with the vibration of the structure. This relevant information is contained within the spectral lines associated with the combustion noise, so the first task may be characterised as a denoising process. The second task is to extract the information contained within the set of spectral lines and assign it to one of the engine phases. The strategy followed thus seeks to improve the identification of possible faults associated with one of the engine phases. This identification task can be viewed as a classification problem. The prior application of BSS therefore results in a better definition of the boundaries that separate the two previously established classes (faulty, non-faulty).

• Classification of traffic noise
Although, in a colloquial sense, being able to separate two sources of traffic noise might seem synonymous with being able to classify them, the two concepts differ in practice because of the processing methods applied. There is, however, a clear correlation between the difficulty of applying blind separation algorithms to specific classes of sources and the difficulty of applying classification algorithms to them. To compare the problem with the previous one, it must be simplified by considering only the combustion noise. In this case, the classification problem consists in assigning an unknown pattern to a predetermined class of vehicles according to its emitted noise level. Here, a single engine can belong to two categories of vehicles. Unlike the previous case, the feature vector does not provide discriminative information, so information must be extracted from additional sources. The trouble, as the reader may guess, is the lack of uniqueness of the solution. This issue also occurs for the other sources considered, so the problem is not successfully solved by adding them to the feature extraction process.
As will be shown, the problem of classifying traffic noise is much more complex than the one described in the example. The signal acquired by means of a sensor network is a combination of a large number of noise sources. Thus, the associated BSS problem becomes extremely complex to solve:
• For an isolated pass-by, the vibration behaviour of the engine becomes more complex due to the change in the mechanical model handled. This model is now in motion, and it is affected by its interaction with the other parts of the structure. The information associated with the spectral lines, located at low frequencies, is now altered by energy from other systems such as the suspension or the brakes. The resulting signal is in turn combined with the noise induced by the exhaust system.
• The turbulence created by the vehicle in motion (aerodynamic noise) spreads energy at high frequencies over the acquired signal. Both the distribution and the intensity of this energy depend on the geometry and speed of the vehicle. For a given geometry, the higher the speed of the vehicle, the higher the emission at high frequencies.
• Above 50 km/h for motorcycles and cars, and 70 km/h for trucks, most of the energy in the acquired signal is associated with rolling noise. This noise is generated by the contact of the wheel with the pavement surface. As a result, the information associated with the three sources of noise described above is masked.
• Other factors also modify the resulting signal: directivity pattern of the vehicle, vehicle maintenance and age, road conservation, ground effect, Doppler effect, type of pavement, distance from the source to the sensor network, atmospheric conditions and reflections of the signal on different surfaces close to the road (buildings, noise barriers, ...).
• The traffic noise signal results from a combined pass-by of vehicles. This combination adds both an interfering pattern and a masking effect to the mixing process of the signals associated with each of the sources.
Several calculation methods have been developed to predict the noise levels emitted by road traffic. These are based on mathematical models that try to find the best approximation to the real model described above. This real model is too complex to be implemented, so an approximation is made by reducing the number of sources considered. Part of the information needed to carry out this prediction is thus obtained by means of indirect methods. For the European prediction model (CNOSSOS-EU, 2010), information about the average speed of the road, the pavement type and the road traffic intensity is needed. This information must be collected according to the vehicle type categorisation established. We therefore decided to address the design of a portable device capable of providing such information in real time. For this purpose, the most complex difficulty lies in the design of the classifier. Within this design, the incorporation of BSS techniques was proposed in the hope of improving the feature extraction process. To address this task in an intercity context, the mixing process can be modelled according to the scheme of Fig. 1, where $s_i(n)$ is the signal associated with the vehicle to be classified. The goal is to extract the feature vector of an event whose signal is hidden in a mixture where overlapping events are present. The extraction of the signal $s_i$ itself does not help, because this signal carries information associated with other events. It is therefore necessary to find another way to extract this feature vector by means of the discriminative information associated with the event to be classified. So, it is proposed to express this information through the acquired mixture as $s_i^{\Gamma}(n) = \Gamma_i(\mathbf{x}(n))$. Thus, the problem to be solved consists in finding $\Gamma_i$, in terms of which the BSS problem is re-expressed. As the reader can see, the BSS sources in their classical form remain hidden. For this reason, we have named this new BSS problem Blind Implicit Source Separation (BISS). To solve it, the definition of sources used in Fig. 1 is thus no longer valid.


Dimensionality reduction
One of the typical problems in pattern recognition is the need to reduce the dimensionality of the feature space. For this task, Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are the techniques most commonly employed. The performance obtained, however, may differ according to the problem to be solved. As will be seen throughout this section, the explanation lies in the way in which both techniques are applied.
An overview of the original feature space generally shows the existence of values that do not contribute efficiently to the extraction of discriminative information for classification purposes. Under this assumption, a large number of techniques have been developed over the years (Fodor, 2002). The goal is to reduce the dimensionality of the original problem while minimising the loss of information involved in this process. Most are based on the search for subspaces with better discriminative directions onto which to project the data (Friedman & Tukey, 1974). This projection process involves a loss of information, so a compromise solution is achieved by means of a cost function. There are research works (Huber, 1985), however, which prove that the new subspaces show a higher noise immunity. Furthermore, a better capability to filter out features with low discriminative power is achieved. This results in a better estimation of the density functions (Friedman et al., 1984).
There are, however, two issues that must be taken into account and that are closely related to the transformations used at this stage. First, outliers will appear due to the high variability of the patterns to be classified, so an increase in between-class overlap will inevitably occur. This issue leads to a degradation of the classifier performance. Second, the choice of a suitable rotation of the original data allows a better view of the discriminative information, as shown in Fig. 2. It is therefore very important to find those transformations that contribute both to a better definition of the between-class boundaries and to a better clustering of the within-class information. Most of the techniques developed for dimensionality reduction are based on the assumption of normality of the original data. For these, it has also been shown that most projections in high-dimensional problems yield transformed data with a statistical distribution that can be considered approximately normal. Among them, a technique of proven effectiveness is PCA (Fukunaga, 1990).
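The projection onto directions of maximum variance can be sketched with a few lines of NumPy. This is a generic PCA by eigendecomposition of the sample covariance, not the specific procedure of any cited work; the function name `pca_project` and the synthetic feature matrix are assumptions for illustration.

```python
import numpy as np

def pca_project(features, k):
    """Project patterns (rows of `features`, shape (n, d)) onto k principal axes.

    Returns the projected data (n, k), the projection directions (d, k) and
    the fraction of the total variance carried by each retained direction.
    """
    centred = features - features.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    components = eigvecs[:, order]
    return centred @ components, components, eigvals[order] / eigvals.sum()

# Synthetic 3-feature patterns: two strongly correlated features plus one
# independent feature (illustrative data only).
rng = np.random.default_rng(0)
base = rng.standard_normal((300, 1))
features = np.hstack([base + 0.1 * rng.standard_normal((300, 1)),
                      2 * base + 0.1 * rng.standard_normal((300, 1)),
                      rng.standard_normal((300, 1))])

Z, components, ratio = pca_project(features, k=2)
```

The projected features in `Z` are uncorrelated by construction, which is the second-order property that PCA guarantees; it says nothing about their statistical independence.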

Independent Component Analysis for Audio and Biosignal Applications
In certain cases, however, PCA may not provide the best directions for projecting the data, as shown in Fig. 3 (b). Moreover, this technique limits the analysis to second order statistics, so for features with a certain degree of statistical dependence between them, ICA (Hyvärinen et al., 2001) will be more suitable. In this technique, the search for independence between components is the basis for the pursuit of projection directions, so it can be considered a dimensionality reduction technique and therefore an alternative to PCA. The application of ICA, however, is subject to two major restrictions:
1. The assumption of independence of the data is a stronger condition than the assumption of uncorrelatedness, so the conditions for applying ICA are more restrictive than those for PCA.
2. The data must show a non-Gaussian distribution, so ICA is not applicable to normal populations, as is the case for the feature space studied here.
The traffic noise signal verifies the two previous hypotheses. The samples may be considered independent, because they are acquired at different times and their sources have different spatial locations. Furthermore, these samples follow a non-Gaussian distribution, as shown in (Mato-Méndez & Sobreira-Seoane, 2008b). Although the extraction of features can be made by using only one sensor, the assumptions handled are the following:
1. For isolated pass-bys, the acquired signal is the result of combining the signal associated with the vehicle and the background noise.
2. For combined pass-bys, the problem becomes more complex because energy associated with other vehicles is added to the signal of the vehicle to be classified.
3. The removal of this residual information by source separation techniques would improve the extraction process.
So why not apply this technique to the acquired signal?

ICA approach
Bearing in mind the ideas described in the previous section, the application of ICA was proposed in a first research work (Mato-Méndez & Sobreira-Seoane, 2008a). This application is carried out by transforming the convolutive problem, which follows the model in Fig. 1, into a linear problem. The transformation is achieved by building a set of synthetic mixtures from the acquired signal. At this point, the reader must remember that the goal is to obtain a higher separability degree of the extracted features, not to extract the signals associated with the vehicles involved in the mixing process. From this point of view, the transformation carried out is accurate. Thus, the problem to solve now is to find $\hat{\varphi}_i = [\hat{\varphi}_{i1}, \ldots, \hat{\varphi}_{id}]^T$, an estimate of $\phi_i$, by applying ICA to the new mixture $\tilde{\mathbf{x}}$.
In ICA, the separation is conducted by estimating the mixing matrix that minimises the statistical dependence between the components of the original signals. To apply it, at most one source can show a Gaussian distribution. Besides, once the number of sources is known, it is necessary to have at least an equal number of mixtures. For the linear case, the process of extracting the independent components coincides with solving the blind source separation problem. Under these hypotheses, the convolutive problem can be expressed by means of a linear system of m mixture equations with p unknowns, $\tilde{X} \approx A \cdot S$, where $A$ represents the mixing matrix, and $S$ and $\tilde{X}$ are the vectors of sources and observations respectively. The linear problem is then solved by finding the separation matrix $B$, which is an estimate of the inverse of the mixing matrix $A$. Although the solution is not unique from a strict mathematical point of view, uniqueness can be achieved with regard to the independence of the extracted signals (Cao & Liu, 1996). In this sense, to ensure the separability of the sources it is sufficient to apply a set of conditions before proceeding:
1. The separation process is feasible if the linear function associated with the mixture is bijective, i.e., the mixing matrix must be regular so that $B$ can be estimated.
2. Regarding the independence of the sources, if p − 1 sources show a non-Gaussian distribution, the pairwise independence of the extracted components is ensured. As a result, the possibility of separating the original sources is also ensured.
3. The combined presence of Gaussian and non-Gaussian sources at the time of the mixture allows the separation of the latter. This separation is impossible, however, for the former.
Under the above assumptions, an estimate of both unknowns, the coefficients of the matrix $A$ and the values of the vector $\mathbf{s}$, can therefore be achieved. Although the independence between the recovered sources is ensured in this way, two problems remain unsolved when calculating the solution: the uncertainty associated with the energy of the signals obtained, and the uncertainty in the order in which they appear. Apart from these two uncertainties, ICA guarantees a unique solution to the BSS problem. Furthermore, these two uncertainties are not an inconvenience for classification purposes.
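The two uncertainties can be made concrete with a small numerical check: rescaling and reordering the sources, and compensating in the mixing matrix, produces exactly the same observations. The matrices `P` (permutation) and `D` (scaling, including a sign flip) and the toy data are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))          # hypothetical mixing matrix
S = rng.laplace(size=(3, 200))           # toy non-Gaussian sources

P = np.eye(3)[[2, 0, 1]]                 # reorders the sources
D = np.diag([2.0, -0.5, 3.0])            # changes their energy and sign

S_alt = D @ P @ S                        # alternative set of "sources"
A_alt = A @ np.linalg.inv(D @ P)         # compensating mixing matrix

# Both factorisations explain the observations equally well.
same_mixture = np.allclose(A @ S, A_alt @ S_alt)
```

Since the rows of `S_alt` are still mutually independent, no criterion based on independence alone can prefer one factorisation over the other.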
The process is conducted in two steps. In a first stage, the orthogonal projection of the input mixtures is performed by means of a decorrelation process. This stage therefore reduces the solution to a rotation of the data. Thus, the separation matrix can be factorised as $B = R \cdot W$, where $W$ is a whitening matrix and $R$ a rotation matrix. The whitening process starts by subtracting the mean from the samples. It then concludes by applying an orthonormalisation process to the centred samples by means of the Singular Value Decomposition (SVD). Proceeding in this way, the covariance matrix $\Sigma = E[\mathbf{s}(n) \cdot \mathbf{s}^T(n)]$ coincides with the identity matrix. It is true that the study of second order statistics, and more specifically the analysis provided by decorrelation, allows the samples to be whitened. This is, however, a necessary but not sufficient condition to ensure the independence of the samples. The difficulty lies in the uncertainty introduced by their possible rotation. This is the reason why, at most, only one of the original sources may show a Gaussian distribution. If this condition is not met, the separation of two Gaussian sources is not possible, because the joint distribution of these sources shows circular symmetry.
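The factorisation B = R · W can be sketched as follows. The whitening stage centres the mixtures and orthonormalises them through the SVD, as described above. For the rotation stage, this sketch assumes a symmetric fixed-point iteration with a tanh nonlinearity, which is one common FastICA-style choice and not necessarily the configuration used in the cited works. The function names, the toy sources and the mixing matrix `A` are all illustrative assumptions.

```python
import numpy as np

def whiten(X):
    """Centre the mixtures X (m, N) and whiten them through the SVD.

    Returns Z (whitened data with identity sample covariance) and W.
    """
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)            # subtract the mean
    U, sv, _ = np.linalg.svd(Xc, full_matrices=False)
    W = np.diag(np.sqrt(N) / sv) @ U.T                # orthonormalisation
    return W @ Xc, W

def estimate_rotation(Z, n_iter=200, seed=0):
    """Symmetric fixed-point iteration (tanh nonlinearity) on whitened data."""
    m, N = Z.shape
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((m, m)))  # random orthogonal start
    for _ in range(n_iter):
        Y = R @ Z
        G = np.tanh(Y)
        R_new = (G @ Z.T) / N - np.diag((1.0 - G ** 2).mean(axis=1)) @ R
        U, _, Vt = np.linalg.svd(R_new)               # symmetric decorrelation
        R = U @ Vt                                    # keeps R orthogonal
    return R

# Toy example: two non-Gaussian sources and a hypothetical 2x2 mixing matrix.
rng = np.random.default_rng(0)
N = 5000
S = np.vstack([rng.uniform(-1, 1, N), rng.laplace(size=N)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

Z, W = whiten(X)
R = estimate_rotation(Z)
B = R @ W                  # separation matrix B = R · W
S_hat = R @ Z              # estimated sources, up to order and scale
```

After whitening, only the orthogonal matrix `R` remains to be found. For two Gaussian sources every rotation of `Z` would look the same, which is why the rotation stage relies on non-Gaussianity.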
Among the wide set of ICA-based algorithms, the one developed by Aapo Hyvärinen (Hyvärinen, 1999) is used in (Mato-Méndez & Sobreira-Seoane, 2008a;b) because of its excellent trade-off between quality and computational cost. Also known as FastICA, this fixed-point algorithm uses two statistics, kurtosis and negentropy, as non-Gaussianity criteria. The decorrelation process is performed by applying to $\tilde{X}$ the SVD decomposition, widely used in data mining. The idea of this decomposition method was first raised by Carl Eckart and Gale Young in 1936 (Eckart & Young, 1936), who approximated a rectangular matrix by another of lower rank. It was not until 1980, however, that a computational version was proposed by Virginia C. Klema and Alan J. Laub (Klema & Laub, 1980). This new version revealed its performance in solving complex problems. The SVD decomposition makes it possible to detect and sort the projection directions that contain the values of highest variance, by means of two square matrices containing the singular vectors. Thus, dimensionality reduction can be achieved by means of the SVD, which finds the subspaces that best approximate the original data. By applying the SVD to $\tilde{X}$, this matrix can be expressed as $\tilde{X} \approx U \Lambda^{1/2} V^T$. Fig. 4 graphically shows the changes that take place for a two-dimensional case. The left-multiplication by $V^T$ transforms the two vectors $v_1$ and $v_2$ shown in Fig. 4 (a) into the unit vectors of Fig. 4 (b). After this step, these vectors are scaled by the product of the covariance matrix $\Sigma$, transforming the unit circle into an ellipse of axes $\sigma_1 \Gamma_1$ and $\sigma_2 \Gamma_2$, as shown in Fig. 4 (c). Finally, the right-multiplication by the matrix $U$ leads to a new rotation of the axes and the consequent rotation of the resulting ellipse of Fig. 4 (c) to its final position, shown in Fig. 4 (d).
Thus, the whitening matrix can be expressed as

Finally, after obtaining the matrix $R$ by finding a non-normal orthogonal projection, an estimate of the sources can be achieved by means of $\hat{S} = RW$. Taking into account that both $U$ and $V$ are unitary matrices, and that the remaining m − r eigenvalues are null, the singular value decomposition of the matrix $\tilde{X}$ allows Eq. (14) to be expressed as where $\{\lambda_1, \ldots, \lambda_r\}$ is the set of singular eigenvalues of $\tilde{X}$. A suitable approximation for this matrix can therefore be achieved by means of

after removal of the r − b values, whose contribution can be neglected. This approximation is optimal for the Frobenius norm (Srebro, 2004), being equivalent to the Euclidean norm for this case. The error is thus limited to
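The optimality of the truncated decomposition can be checked numerically. The sketch below does not reproduce the expression of the original text; it uses the standard Eckart-Young result stated in terms of the singular values returned by NumPy, under which the Frobenius error of the rank-b approximation equals the root of the sum of the squared discarded singular values. The random matrix and the value of `b` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 200))               # toy observation matrix

U, sv, Vt = np.linalg.svd(X, full_matrices=False)

b = 3                                           # number of retained values
X_b = (U[:, :b] * sv[:b]) @ Vt[:b]              # rank-b approximation

err = np.linalg.norm(X - X_b, 'fro')            # actual approximation error
bound = np.sqrt(np.sum(sv[b:] ** 2))            # energy of discarded values
```

The two quantities `err` and `bound` coincide up to rounding, so the discarded singular values directly quantify the information lost by the reduction.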

Discussion
The method applied improves the classification results. This improvement is due to the prior removal of energy that is not related to the event being processed. The degree of separability of the extracted features, however, is suboptimal for several reasons that we have analysed, which are summarised as follows:
• Under ICA assumptions, its application to the acquired signal will always result in a set of independent components. But are these components related to the event to be classified? For isolated pass-bys, the generated signal follows a source model much more complex than the one used in Fig. 1. In this case, the traffic signal is generated from a set $\{o_1, \ldots, o_q\}$ of $q$ noise sources, by combining the signals associated with each of them. Discriminative information associated with each of these sources is therefore masked within this process. The situation is worse when considering combined pass-bys generated from a set $\{s_1, \ldots, s_p\}$ of $p$ isolated pass-bys. Regarding discriminative information, the goal is to obtain a feature vector that maximises the between-class separation while minimising the within-class dispersion. In this sense, the feature vector obtained by applying ICA to the acquired signal is not optimal. The trouble is that the extracted features contain a mix of information generated by several sources within the set $\{o_1, \ldots, o_q\}$. The reader should notice how the extraction of this information from the resulting coloured noise signal becomes a much more complex task for BSS theory. The situation becomes more complicated if a feature selection process is incorporated. The added complexity lies in how the extracted components are selected to form part of the newly calculated subspaces.
• On the one hand, ICA is highly dependent on the values of skewness and kurtosis shown by the distributions associated with the signals to be separated. In this sense, PCA is more suitable for addressing the problem of dimensionality reduction of the feature space. On the other hand, although ICA and PCA provide similar benefits for this purpose, PCA used alone cannot be considered a source separation technique. Therefore, PCA must be combined with BSS for both purposes.
• From a classification point of view, both the distances and the angles of the input values are altered because of the whitening process carried out by ICA. This contributes to increasing the within-class dispersion, resulting in greater uncertainty in the separation boundaries. This dispersion becomes even greater in the presence of outliers, to which ICA is fully vulnerable.
• The acquired signal can be considered approximately stationary for short time intervals, lower than 180 ms (Cevher et al., 2009). To process this type of signal, it is usual to use an HMM, as in speech processing. Thus, an HMM provides a suitable model to extract hidden temporal information. This model is not supported by ICA, because the time dependence is removed by considering the matrix $X$ as a set of iid random variables. Moreover, some discriminant information remains hidden in frequency.

Therefore, for these two reasons, a time-frequency (T-F) domain is more suitable for applying the BSS process. Finally, the linear model used to solve this BISS problem is suboptimal. The application of BSS to a convolutive mixture model can better exploit the information acquired by the sensor network.
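The remark in the list above about whitening can be illustrated with a small sketch. It assumes a PCA-type whitening built from the sample covariance, which is a common ICA pre-processing step but not necessarily the one used in the experiments described here. The 2-D data and the mixing matrix `A` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-D feature cloud with correlated, unequal-variance axes.
A = np.array([[2.0, 1.5],
              [0.0, 0.5]])
X = A @ rng.standard_normal((2, 1000))
X -= X.mean(axis=1, keepdims=True)          # centre the data

# Whitening matrix from the eigendecomposition of the sample covariance.
C = X @ X.T / X.shape[1]
eigval, E = np.linalg.eigh(C)
W = np.diag(eigval ** -0.5) @ E.T           # PCA whitening (assumed variant)
Z = W @ X                                   # whitened data, covariance = I

def angle(u, v):
    """Angle in radians between two vectors."""
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

u, v = X[:, 0], X[:, 1]                     # two samples before whitening
zu, zv = Z[:, 0], Z[:, 1]                   # the same samples after whitening
print("distance before/after:", np.linalg.norm(u - v), np.linalg.norm(zu - zv))
print("angle before/after:   ", angle(u, v), angle(zu, zv))
```

Because `W` rescales each principal direction differently, it is not an orthogonal transformation, so pairwise distances and angles between samples are generally not preserved.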
The search for a successful solution that supports these ideas leads to the BISS-PCA method described below.

BISS-PCA method
To better address the solution, the first step is to express the mixture model as a function of the noise sources $\{o_1, \ldots, o_q\}$. This new expression can be obtained by reformulating Eq. (12) by means of the mixture model of Fig. 5. For this more suitable model, the signal provided by the $j$-th sensor can be expressed in terms of the source set $\{s_1, \ldots, s_p\}$ as

where $r$ is the order of the FIR filter that models the mixture. The signal $s_i$ is in turn generated by the noise source set $\{o_1, \ldots, o_q\}$. This latter mixture can be characterised as instantaneous after applying a far-field approximation. The approximation is valid because the distances between the sources $\{o_1, \ldots, o_q\}$ are much smaller than the distance from this set to the sensor network. So, Eq. (20) can be expressed as

where $h^{w}_{ib}$ indicates the contribution of the noise source $o_b$ to the signal $s_i$. Thus, the above expression can be reordered as

This last equation makes it possible to express the BISS problem as a function of $\{o_1, \ldots, o_q\}$. Since the goal is to extract a feature vector as close as possible to the noise sources related to the event to be classified, this vector will differ from that of Eq. (11). With this consideration, the BISS problem consists in finding $\zeta_i = [\zeta_{i1}, \cdots, \zeta_{id}]^{T}$ by solving

To achieve a better solution, it is proposed to carry out the feature projection on subspaces closer to the sources $\{o_1, \ldots, o_q\}$, by means of a three-stage strategy (see (Mato-Méndez & Sobreira-Seoane, 2011) for more detail). The first stage deals with the segmentation of the acquired signal, by selecting a fragment of signal centred on the event to classify. Next, to find discriminative information nearest to these sources, an abstracted feature vector $\psi_i = [\psi_{i1}, \psi_{i2}, \ldots, \psi_{if}]^{T}$ is extracted after removing energy unrelated to the event in a T-F domain, by adapting the technique proposed in (Rickard et al., 2001).
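The two-level mixture model described at the start of this section can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: the gains `G`, the FIR taps `H`, the dimensions, and both function names are hypothetical, and `r` is taken here as the number of filter taps. The sketch checks that mixing the noise sources instantaneously and then convolving gives the same sensor signals as the reordered form written directly in terms of the noise sources.

```python
import numpy as np

def two_level_mixture(o, G, H):
    """Mix noise sources into pass-by signals, then into sensor signals.

    o: (q, n) noise sources
    G: (p, q) instantaneous gains (far-field approximation)
    H: (m, p, r) FIR taps from pass-by signal i to sensor j
    """
    s = G @ o                                   # instantaneous mixture
    m, p, _ = H.shape
    n = o.shape[1]
    x = np.zeros((m, n))
    for j in range(m):
        for i in range(p):
            # Causal FIR filtering, truncated to the signal length.
            x[j] += np.convolve(s[i], H[j, i])[:n]
    return s, x

def reordered_mixture(o, G, H):
    """Same sensor signals, written directly in terms of the noise sources."""
    Hc = np.einsum("jik,ib->jbk", H, G)         # combined filters, (m, q, r)
    m, q, _ = Hc.shape
    n = o.shape[1]
    x = np.zeros((m, n))
    for j in range(m):
        for b in range(q):
            x[j] += np.convolve(o[b], Hc[j, b])[:n]
    return x

rng = np.random.default_rng(2)
q, p, m, r, n = 4, 2, 3, 5, 256                 # hypothetical sizes
o = rng.standard_normal((q, n))
G = rng.standard_normal((p, q))
H = rng.standard_normal((m, p, r))

s, x = two_level_mixture(o, G, H)
x_direct = reordered_mixture(o, G, H)
print("max difference:", np.max(np.abs(x - x_direct)))
```

The equivalence holds because convolution is linear, which is what allows the sensor signals to be rewritten as a convolutive mixture of the noise sources themselves.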
The last step deals with the suppression of possible correlation between the components of $\psi_i$ by projecting them onto the directions of maximum variance. This goal can be efficiently achieved by means of the Karhunen-Loève Transform (KLT). It was originally proposed by Kari Karhunen and Michel Loève (Karhunen, 1946; Loeve, 1945) as a series expansion method for continuous random processes. Widely used in signal processing, it is commonly applied in pattern recognition by means of the linear transformation $\zeta_i = A_i^{T} \psi_i$. The goal is to obtain the values of the matrix $A_i$ such that $R_{\zeta_i}$ is diagonal. Thus, since $R_{\zeta_i} = E\{\zeta_i \zeta_i^{T}\} = A_i^{T} R_{\psi_i} A_i$, it is sufficient to assign the eigenvectors of the matrix $R_{\psi_i}$ to the columns of the matrix $A_i$. These eigenvectors form an orthogonal basis because $R_{\psi_i}$ is a symmetric matrix. It follows that $R_{\zeta_i} = \Lambda_i$, the diagonal matrix formed by the eigenvalues of $R_{\psi_i}$.
Although PCA (Fukunaga, 1990; Jackson, 1991) is usually identified with the same technique, it differs from the KLT in how the components of the matrix $A_i$ are calculated. In this case, the columns of the matrix $A_i$ are matched with the eigenvectors of the covariance matrix of $\psi_i$. The calculation is performed by obtaining each component so as to maximise the variance of the dataset under the restriction $\sum_{k=1}^{f} \left(a^{k}_{il}\right)^{2} = 1$, $\forall\, l = 1, \ldots, f$. Before proceeding, the data must have zero mean, so they must first be centred by means of a mean estimator. After this adjustment, the estimate of the covariance matrix matches the autocorrelation matrix, so that $\Sigma_{\psi_i} = R_{\psi_i} = E\{\psi_i \psi_i^{T}\}$. Thus both the set of eigenvalues $\{\lambda_{i1}, \ldots, \lambda_{if}\}$ and the set of associated eigenvectors $\{a_{i1}, \ldots, a_{if}\}$ can be easily calculated. In this way the original data are projected onto the new subspace by means of $\zeta_{il} = a_{il}^{T} \psi_i$, $\forall\, l = 1, \ldots, f$. The variance of each projection is then given by the corresponding eigenvalue, $\sigma^{2}_{\zeta_{il}} = \lambda_{il}$. Once the eigenvalues are sorted in descending order, the $d$ eigenvectors corresponding to the $d$ largest eigenvalues are chosen. These eigenvectors define the set of "Principal Components".
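The projection step described above can be sketched as follows. This is a minimal illustration assuming that the feature vectors are stored as the columns of a matrix. The function name `pca_project`, the synthetic data, and the dimensions are hypothetical and are not taken from the cited work.

```python
import numpy as np

def pca_project(Psi, d):
    """Project feature vectors onto the d directions of maximum variance.

    Psi: (f, n) matrix whose columns are n observations of an
         f-dimensional feature vector.
    Returns (Zeta, A_d, eigvals), with Zeta of shape (d, n).
    """
    Psi_c = Psi - Psi.mean(axis=1, keepdims=True)   # zero-mean data
    Sigma = Psi_c @ Psi_c.T / Psi_c.shape[1]        # covariance estimate
    eigvals, A = np.linalg.eigh(Sigma)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # sort in descending order
    eigvals, A = eigvals[order], A[:, order]
    A_d = A[:, :d]                                  # d principal directions
    Zeta = A_d.T @ Psi_c                            # projected features
    return Zeta, A_d, eigvals

rng = np.random.default_rng(3)
f, n, d = 5, 2000, 2                                # hypothetical sizes
Psi = rng.standard_normal((f, f)) @ rng.standard_normal((f, n))

Zeta, A_d, eigvals = pca_project(Psi, d)
R_zeta = Zeta @ Zeta.T / n                          # covariance after projection
print(np.round(R_zeta, 6))
print(np.round(eigvals[:d], 6))
```

The covariance of the projected features is diagonal, with the retained eigenvalues on the diagonal, so the projected components are uncorrelated and each has variance equal to its eigenvalue.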
This strategy reduces the dimensionality of the feature space by projecting the original data onto the directions of maximum variance, as shown in Fig. 3(a). This is done while minimising the loss of information associated with the process: taking into account that $A_i$ is an orthogonal matrix, $\psi_i$ can be expressed as

The error is limited to

Substituting the values $\zeta_{il} = a_{il}^{T} \psi_i$, $\forall\, l = d + 1, \ldots, f$, it is easily obtained that

It then follows from the above expression that the loss of residual information is minimised, in an optimal way, according to the least squares criterion.
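The standard textbook form of this argument, written with the symbols used above, is sketched below. The names $\hat{\psi}_i$ (the reconstruction from the $d$ retained components) and $\varepsilon$ (the mean squared error) are introduced here for illustration only, and the exact form of the chapter's own equations may differ.

```latex
\begin{align}
  \psi_i &= \sum_{l=1}^{f} \zeta_{il}\, a_{il},
  \qquad
  \hat{\psi}_i = \sum_{l=1}^{d} \zeta_{il}\, a_{il}, \\
  \varepsilon
  &= E\left\{ \left\| \psi_i - \hat{\psi}_i \right\|^{2} \right\}
   = \sum_{l=d+1}^{f} E\left\{ \zeta_{il}^{2} \right\}
   = \sum_{l=d+1}^{f} a_{il}^{T} R_{\psi_i}\, a_{il}
   = \sum_{l=d+1}^{f} \lambda_{il}.
\end{align}
```

The second equality uses the orthonormality of the eigenvectors $a_{il}$, and the last one uses $R_{\psi_i} a_{il} = \lambda_{il} a_{il}$. Since the discarded eigenvalues are the $f - d$ smallest ones, no other choice of $d$ orthonormal directions gives a smaller mean squared error.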

Advances
The BSS-PCA algorithm summarises the concepts addressed throughout this chapter. This algorithm shows an accuracy of 94.83 % in traffic noise classification, drastically improving on the results achieved before. In addition, BSS-PCA yields a substantial reduction in the uncertainty assigned by CNOSSOS-EU to this task for the prediction of the noise level emitted by road traffic. This uncertainty is calculated by considering the most usual vehicle counting methods. A full analysis of the benefits of this classifier can be found in (Mato-Méndez & Sobreira-Seoane, 2011).
The BISS-PCA method has recently been extended in a new research work. A new technique has been developed, achieving greater discriminative capability for a set of features different from the one used by BISS-PCA. Fig. 6 shows an example of the discriminative capability

analysed before (figures (a), (c) and (e)) and after (figures (b), (d) and (f)) applying this new technique. By means of this example, we want to show the effect of this technique on one feature (SR) working alone and combined with another feature ($\mathrm{MFCC}_3$ or $\mathrm{SBER}_4$). These three features are fully described in the work cited above. It can be observed (figure (a)) that SR shows no between-class discriminative capability for the motorcycle class. After applying the new technique, however, a decision boundary appears. This makes it possible to discriminate between two classes (figure (b)). On the other hand, the discriminative capability of an isolated feature is generally lower than that shown by a subset of the feature vector. Figures (c) and (d) correspond to the car class, for which SR is applied in combination with $\mathrm{MFCC}_3$. It can be observed how the new technique improves the degree of separability for this combination of features. Finally, a suitable selection (SR combined with $\mathrm{SBER}_4$) leads to a better discrimination of all the classes considered (motorcycles, cars and trucks). An example of this is shown in figures (e) and (f) for the car class. The separability between this class and both the motorcycle class and the truck class is clearly improved after applying this new technique (figure (f)).

Conclusions
The application of the existing BSS techniques requires a thorough study of the problem to be solved. In many cases, however, the BSS problem is simplified by identifying its mixture model. The first part of this chapter has been devoted to reviewing this issue, which has made it possible to better understand the need for additional information about the problem to be solved. After that, a new BSS problem has been introduced and discussed. This problem appears in situations in which the variables to be extracted are presented as implicit functions of the original sources. For this reason, we have named this new problem Blind Implicit Source Separation (BISS). Achieving a solution becomes an especially complex task when the original sources are identified with noise sources. In these cases, the source models used in BSS are no longer valid and the separation problem needs to be reformulated. Throughout this chapter, a full characterisation of the BISS problem has been presented.
An example of a BISS problem arises in the classification of traffic noise. Throughout the chapter, a detailed description of this problem within an intercity context has been given. To solve it, a first approximation has been proposed, by applying ICA to synthetic mixtures obtained from the signal acquired by a sensor network. An analysis of the results, however, has shown that ICA does not solve this problem optimally.
After this, a thorough study of how to better solve the BISS problem has been conducted. As a result, a novel feature extraction technique has been introduced. This technique is used in embedded form by the BSS-PCA classifier developed in (Mato-Méndez & Sobreira-Seoane, 2011). Its excellent performance lies in its conception, which robustly solves the BISS problem. Unlike other methods described in the state of the art in pattern recognition, this algorithm combines the use of an abstracted feature vector with the application of BSS to the acquired signal. The compact design of this technique gives rise to the BISS-PCA method that has been introduced in this chapter. It has been explained how this method allows the extraction of discriminative information from the set of original noise sources. Unlike ICA, for which this information remains masked, this new technique allows it to emerge. The feature space therefore gains in resolution while a dimensionality reduction is performed.
Detected by us in pattern recognition problems, the new BISS concept opens an interesting multidisciplinary research field. This new approach optimises the extraction of discriminative information that otherwise remains hidden. For classification purposes, the BISS-PCA method introduced in this chapter can be extended to other application contexts. This extension has been addressed in recent research. As a result, a new technique solving the BISS problem has been obtained, allowing a higher resolution of the between-class boundaries for a set of features different from the one used by BISS-PCA. An example of the improvements has been shown at the end of this chapter. The results of this new research work are expected to be published soon, and the reader is invited to consult them.

Acknowledgments
This work has been partially financed by the Spanish MCYT, ref.