In this chapter, the fundamentals of the distributed inference problem in wireless sensor networks (WSNs) are addressed and the statistical theoretical foundations for several applications are provided. The chapter adopts a statistical signal processing perspective and focuses on the distributed version of the binary hypothesis test for detecting an event as reliably as possible. The fusion center is assumed to be equipped with multiple antennas for collecting and processing the information. The inference problem primarily concerns the robust detection of a phenomenon of interest (for example, an environmental hazard, an oil or gas leak, or a forest fire). The presence of multiple antennas at both the transmit and receive sides resembles a multiple-input-multiple-output (MIMO) system and allows array processing techniques to be used, providing spectral efficiency, fading mitigation and the adoption of low-energy sensors. The problem is referred to as MIMO decision fusion. Subsequently, both the design and the evaluation (simulated and experimental) of these fusion approaches are presented for this futuristic WSN set-up.
- decision fusion
- distributed MAC
- statistical CSI
- instantaneous CSI
- large-scale WSN
- environmental characterization
- experimental validation
This chapter addresses the fundamentals of distributed inference problems in wireless sensor networks (WSNs) and provides statistical theoretical foundations for several applications. It adopts a statistical signal processing perspective and focuses on the distributed version of the binary hypothesis test for detecting an event as reliably as possible. The reference WSN scenario, described in Section 2, consists of multiple transmit sensors and an information center equipped with multiple antennas for collecting and processing the information to arrive at a robust decision on an observed phenomenon of interest. The presence of multiple antennas at both the transmit and receive sides resembles a multiple-input-multiple-output (MIMO) system, and the inference problem based on information fusion is referred to as MIMO decision fusion. Several channel-aware fusion rules for MIMO decision fusion (DF) are studied in Section 3. The practical implications of employing MIMO decision fusion for distributed inference in WSNs are evaluated through an indoor-to-outdoor measurement campaign detailed in Section 4. The performance of the fusion rules over the measured environment is also compared with that over the simulated set-up in Section 4.
2. WSN scenario and system model
In a WSN, decision fusion (DF) refers to the process of arriving at a final decision on an observed phenomenon by fusing, at a decision fusion center (DFC), the local decisions transmitted by individual sensors about that phenomenon. In traditional WSNs, each sensor is allocated a dedicated orthogonal channel for transmitting its local observations, a communication scenario commonly referred to as the parallel access channel (PAC). Sensor signals are transmitted over the PAC using time, frequency or code division multiple access. In recent years, large-scale or massive WSNs involving the coexistence of a multitude of sensors have been deployed, and with the PAC the bandwidth requirement increases linearly with the number of sensors. In such scenarios, all sensors instead transmit their decisions simultaneously over a multiple access channel (MAC), at the cost of intrinsic interference resulting from the superposition of multiple sensor signals in time. To improve fusion performance in the presence of deep fading, shadowing and interference, a DFC equipped with multiple antennas has been proposed. This choice adds complexity only on the DFC side and does not affect the simplicity of the sensor implementation. The result is communication over a “virtual” MIMO channel between the sensors and the DFC, as shown in Figure 1.
2.1 System model for MIMO configuration
2.1.1 Sensing and local decision model
A WSN consisting of sensors and a DFC equipped with receive antennas is considered in this section. The th sensor communicates its local decision, , about the presence or absence of a target, after mapping it onto an on-off keying (OOK) modulated symbol, . Irrespective of the scenario and target, maps into , where is the set of binary hypotheses with representing the absence or presence of a specific target. The sensor decisions are assumed to be transmitted over a flat fading multiple access distributed (or virtual) MIMO channel with perfect synchronization at the receiver end.
The performance of the WSN can be evaluated in terms of the conditional probability mass function (pmf) . Assuming conditionally independent and identically distributed (i.i.d.) decisions, we denote the probability of detection and false alarm at the th sensor. We also assume that , i.e. the decision taken by an individual sensor always results in a receiver operating characteristic (ROC) higher than the decision threshold . The system probabilities of false alarm and correct detection are given by,
where is the fusion statistic, is the decision threshold against which is compared, and is the probability of event A conditioned on event B.
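In symbols, the two system probabilities described here are commonly written as below. The symbol names are assumptions made for illustration, since the original notation is not reproduced in this text: $\Lambda$ stands for the fusion statistic, $\gamma$ for the decision threshold, and $\mathcal{H}_0$, $\mathcal{H}_1$ for the target-absent and target-present hypotheses.

$$
P_{F_0} = \Pr\left(\Lambda > \gamma \mid \mathcal{H}_0\right), \qquad
P_{D_0} = \Pr\left(\Lambda > \gamma \mid \mathcal{H}_1\right)
$$

Sweeping $\gamma$ over the real line traces the ROC of the fusion rule that produces $\Lambda$.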
2.1.2 Signal model
If the composite channel coefficient between the th sensor and the th receive antenna at the DFC is denoted by , the received signal can be expressed as,
after sampling and matched filtering at the DFC, where is the received signal, is the transmitted signal, is the additive white Gaussian noise (AWGN) vector, is the independent small scale fading matrix, and is the large scale attenuation and shadowing matrix, with the th diagonal element accounting for the pathloss and shadowing experienced by the th sensor. Here, denotes the circularly symmetric complex normal distribution with mean vector and covariance matrix .
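The signal model described above can be sketched numerically. The following is a minimal illustration that assumes the common form y = H D^(1/2) x + w, with H the small scale fading matrix, D the diagonal large scale matrix, x the OOK symbol vector and w the AWGN vector. The function name `received_signal` and its argument names are hypothetical and are not taken from the chapter.

```python
import math
import random

def received_signal(H, beta, x, noise_var, rng=random):
    """Return y = H * sqrt(D) * x + w for one channel use (assumed model).

    H         : N x K nested list of complex small scale fading coefficients
    beta      : length-K list of large scale gains (the diagonal of D)
    x         : length-K list of OOK symbols (0 or 1)
    noise_var : variance of each circularly symmetric complex noise sample
    """
    n_sensors = len(x)
    s = math.sqrt(noise_var / 2.0)  # standard deviation per real dimension
    y = []
    for row in H:
        # Superposition of all sensor signals at this receive antenna.
        signal = sum(row[k] * math.sqrt(beta[k]) * x[k] for k in range(n_sensors))
        w = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        y.append(signal + w)
    return y
```

Setting `noise_var` to zero gives the noiseless superposition, which is a convenient sanity check before adding noise.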
2.1.3 Channel model
If the propagation channel is assumed to be Rician distributed, the fading vector at the th sensor can be modeled as,
where is the steering vector, is the non-line-of-sight (NLOS) scattered component and is the Rician -factor between th sensor and DFC. A Two Wave with Diffused Power (TWDP) [5, 6] distributed channel fading vector can be modeled by,
where , is the shape factor for the fading distribution. For double-Rayleigh (DR)  distributed fading vector, the channel coefficients can be expressed as,
with and no line-of-sight (LOS) components existing between the sensors and the DFC.
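The three fading models above can be illustrated with simple random generators. This is a sketch under stated assumptions rather than the chapter's exact formulation: the Rician vector is taken as a LOS term scaled by sqrt(K/(1+K)) plus a scattered term scaled by sqrt(1/(1+K)); the TWDP vector uses two specular waves with independent uniform phases and unit diffuse power; the DR coefficient is taken as the product of two independent complex Gaussian samples. All function names are hypothetical.

```python
import cmath
import math
import random

def complex_normal(rng):
    """One sample of a zero-mean, unit-variance circular complex Gaussian."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def rician_vector(n_rx, k_factor, steering, rng):
    """LOS (steering vector) plus NLOS scattering, weighted by the K-factor."""
    los = math.sqrt(k_factor / (1.0 + k_factor))
    nlos = math.sqrt(1.0 / (1.0 + k_factor))
    return [los * steering[n] + nlos * complex_normal(rng) for n in range(n_rx)]

def twdp_vector(n_rx, k_factor, delta, rng):
    """Two specular waves plus diffuse power (assumed parameterization).

    With unit diffuse power: v1^2 + v2^2 = K and 2*v1*v2 = delta*K.
    """
    a = math.sqrt(k_factor * (1.0 + delta))  # v1 + v2
    b = math.sqrt(k_factor * (1.0 - delta))  # v1 - v2
    v1, v2 = (a + b) / 2.0, (a - b) / 2.0
    scale = math.sqrt(1.0 / (1.0 + k_factor))  # normalize mean power to one
    out = []
    for _ in range(n_rx):
        p1 = rng.uniform(0.0, 2.0 * math.pi)
        p2 = rng.uniform(0.0, 2.0 * math.pi)
        specular = v1 * cmath.exp(1j * p1) + v2 * cmath.exp(1j * p2)
        out.append(scale * (specular + complex_normal(rng)))
    return out

def double_rayleigh_vector(n_rx, rng):
    """Product of two independent complex Gaussians, no LOS component."""
    return [complex_normal(rng) * complex_normal(rng) for _ in range(n_rx)]
```

A seeded `random.Random` instance can be passed as `rng` to make the draws reproducible.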
The sensors are uniformly deployed with distances from the DFC varying between and , and large scale attenuation of where is the pathloss exponent and is a log-normal variable such that with representing the normal distribution with mean and variance respectively, is the distance of the th sensor from the DFC, and and are the mean and standard deviation in dB respectively.
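The large scale model can be sketched as follows, assuming that the gain of each sensor is the product of a distance-dependent pathloss term d^(-eta) and a log-normal shadowing term whose value in dB is normally distributed. The exact normalization used in the chapter is not recoverable from the text, so this form and the name `large_scale_gains` are assumptions for illustration only.

```python
import math
import random

def large_scale_gains(distances, eta, mu_db, sigma_db, rng):
    """Return one large scale gain per sensor (assumed model).

    distances : sensor-to-DFC distances
    eta       : pathloss exponent
    mu_db     : mean of the shadowing term in dB
    sigma_db  : standard deviation of the shadowing term in dB
    """
    gains = []
    for d in distances:
        shadow_db = rng.gauss(mu_db, sigma_db)      # normal in the dB domain
        shadow_lin = 10.0 ** (shadow_db / 10.0)     # log-normal in linear scale
        gains.append(shadow_lin * d ** (-eta))
    return gains
```

With `sigma_db = 0` and `mu_db = 0` the function reduces to pure pathloss, which corresponds to a scenario without shadowing.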
2.1.4 Modified system model for non-coherent decision fusion
Non-coherent decision fusion over MAC using the received-energy test has been investigated in [2, 8]. In such a scenario, if the probability of false alarm for any sensor decision is lower than the probability of detection, the received energy can prove to be optimal for arriving at the right decision about an observed phenomenon at the DFC for mutually independent and identically distributed (i.i.d.) sensor decisions.
In this case, let us consider a group of sensors that transmit their local decisions to the DFC, equipped with antennas, over a Rayleigh faded MAC with channel coefficients of equal mean power, thereby exploiting diversity combining in the time, frequency, code or polarization domain. Statistical channel state information (CSI) is assumed at the DFC, i.e. only the pdf of each fading coefficient is available.
Let us denote: the received signal at the th diversity branch of the DFC after matched filtering and sampling; , the fading coefficient between the th sensor and the th diversity branch of the DFC; the additive white Gaussian noise at the nth diversity branch of the DFC. The vector model at the DFC is the following:
where , and are the received signal, transmitted signal and AWGN vectors, respectively. Finally, we define the random variable , representing the number of active sensors, and the set of possible realizations of . It is worth mentioning that an asymmetric model for the statistics of the channel coefficients would be more practical, but it would make the analysis scenario-dependent. A symmetric channel model is therefore considered here, so that performance can be analyzed with the possibility of power control depending on the application scenario.
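The received-energy test mentioned at the start of this subsection can be sketched in a few lines. The statistic is assumed here to be the squared Euclidean norm of the received vector, compared against a threshold; the function names are hypothetical.

```python
def received_energy(y):
    """Squared Euclidean norm of the received vector (sum over diversity branches)."""
    return sum(abs(v) ** 2 for v in y)

def energy_test(y, threshold):
    """Return 1 (target present) if the received energy exceeds the threshold, else 0."""
    return 1 if received_energy(y) > threshold else 0
```

Because only the energy is used, the test needs no instantaneous channel estimate, which matches the statistical CSI assumption made above.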
3. MIMO decision fusion
3.1 Instantaneous CSI
Two types of fusion rules are considered and compared in this chapter in order to arrive at a reliable choice depending on the communication scenario. One set of rules (Decode-and-Fuse) uses the received signal directly to decide whether a target is present or absent, without first estimating the transmitted signal. The optimum (opt) test statistic for this set is given by,
assuming conditional independence of from , given , and . Because the computational complexity of the optimum rule grows with the number of sensors, the test statistics of three sub-optimum fusion rules belonging to this group are considered in this section. These rules are Maximal Ratio Combining (MRC), Equal Gain Combining (EGC) and Max-Log, defined by the following test statistics,
respectively, all assuming identical sensor performances.
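As an illustration of the first two sub-optimum rules, the sketch below uses one form that is common in the decision fusion literature: the combining vector is the sum of the channel columns (H times the all-ones vector), MRC projects the received vector onto it, and EGC uses only its phases. These expressions are assumptions, since the chapter's own equations are not reproduced in this text, and the function names are hypothetical.

```python
import cmath

def mrc_statistic(H, y):
    """Re{(H 1)^H y}: weight each antenna by its summed channel gain."""
    a = [sum(row) for row in H]  # n-th entry is the sum of H[n][k] over sensors
    return sum((a_n.conjugate() * y_n).real for a_n, y_n in zip(a, y))

def egc_statistic(H, y):
    """Re{sum_n exp(-j*angle((H 1)_n)) * y_n}: co-phase the antennas, ignore magnitudes."""
    a = [sum(row) for row in H]
    return sum((cmath.exp(-1j * cmath.phase(a_n)) * y_n).real
               for a_n, y_n in zip(a, y))
```

Both statistics are linear in the received vector, so their cost grows with the number of antennas rather than exponentially with the number of sensors.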
The other set of fusion rules (Decode-then-Fuse) reaches a global decision after estimating the transmitted signal from the received signal vector. The estimate is then used in the Chair-Varshney (CV) rule, whose test statistic over a noiseless channel is given by,
where . Two different detectors are considered under this umbrella, namely the Maximum Likelihood (ML) detector, which yields,
and the Minimum Mean-Squared Error (MMSE) detector, which yields,
where and are the mean and covariance matrix of the transmit signal vector respectively. The estimated from the above two detectors can be incorporated directly in the CV-rule of (12) to obtain the test statistics for CV-ML and CV-MMSE rules.
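The Decode-then-Fuse idea can be sketched for the ML case as follows. The detector is written as an exhaustive search over all OOK vectors, and the CV statistic is given in its standard log-likelihood-ratio form for sensors with identical detection and false-alarm probabilities. The function names and argument names are hypothetical, and the exhaustive search is only practical for a small number of sensors.

```python
import itertools
import math

def ml_detect(H, y):
    """Exhaustive ML detection of OOK symbols: argmin over x of ||y - H x||^2."""
    n_sensors = len(H[0])
    best, best_cost = None, float("inf")
    for x in itertools.product((0, 1), repeat=n_sensors):
        cost = sum(abs(y_n - sum(h * s for h, s in zip(row, x))) ** 2
                   for row, y_n in zip(H, y))
        if cost < best_cost:
            best, best_cost = x, cost
    return list(best)

def chair_varshney(x_hat, p_d, p_f):
    """CV statistic for hard decisions x_hat with identical sensor performance."""
    on = math.log(p_d / p_f)                    # weight of a '1' decision
    off = math.log((1.0 - p_d) / (1.0 - p_f))   # weight of a '0' decision
    return sum(on if s == 1 else off for s in x_hat)
```

A CV-ML statistic is then obtained by calling `chair_varshney(ml_detect(H, y), p_d, p_f)` and comparing the result with a threshold.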
3.2 Statistical CSI
If statistical CSI  is used for DF at the receiver instead of the instantaneous CSI extracted from the sensor signals received at the DFC, the optimal test statistics can be formulated as,
where is the estimated hypothesis, is the Log-Likelihood-Ratio (LLR) of the optimal fusion rule and is the decision threshold against which is compared. The threshold can be determined using either the Bayesian approach (i.e. the threshold is chosen to minimize the probability of error) or the Neyman-Pearson approach (i.e. the threshold is chosen to guarantee a fixed system false-alarm rate). An explicit expression of the LLR from Eq. (15) is given by,
where we have exploited the conditional independence of from (given ).
In the case of conditionally (given ) i.i.d. sensor decisions () we have that and . Differently, when local sensor decisions are conditionally i.n.i.d. the pmfs are represented by the more general Poisson-Binomial distribution with expressions given by,
Note that evaluating the sums in Eq. (17) directly becomes computationally impractical as the number of sensors increases. Several alternatives have been proposed in the literature to avoid such exhaustive computations, including fast convolution of the individual Bernoulli probability mass functions (pmfs), Discrete Fourier Transform (DFT) based computation and recursion-based iterative approaches.
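The convolution approach mentioned above can be sketched as follows: the pmf of the number of active sensors is built by convolving the Bernoulli pmfs of the individual sensors one at a time, which costs on the order of the square of the number of sensors instead of enumerating every subset. The function name `poisson_binomial_pmf` is hypothetical.

```python
def poisson_binomial_pmf(probs):
    """pmf of the number of ones among independent Bernoulli(p_k) decisions.

    probs : list of per-sensor probabilities of transmitting a '1'
    Returns a list whose entry at index i is the probability of i ones.
    """
    pmf = [1.0]  # with zero sensors, the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for count, mass in enumerate(pmf):
            nxt[count] += mass * (1.0 - p)   # this sensor sends '0'
            nxt[count + 1] += mass * p       # this sensor sends '1'
        pmf = nxt
    return pmf
```

When all entries of `probs` are equal, the result reduces to the binomial pmf of the i.i.d. case.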
4. Performance evaluation of MIMO decision fusion
4.1 Measurement campaigns
An indoor-to-outdoor measurement campaign has been conducted to investigate the propagation characteristics of an (number of sensors = number of receive antennas at the DFC ) virtual MIMO system at 2.53 GHz with 20 MHz bandwidth and a subcarrier spacing of around 0.15 MHz. The campaign is conducted with different spatial combinations of half-omnidirectional single transmit antennas representing the sensors, deployed in two different rooms of the Facility of Over-the-Air Research and Testing (FORTE) at Fraunhofer IIS in Ilmenau, Germany (Conference Room, , located on the 1st floor and Instrumentation Room, , located on the ground floor), and receive antennas mounted and co-located on an outside tower representing the DFC. The antennas emulating the sensors are deployed at different heights, namely near the ground, near the ceiling and at heights of 1 m, 1.5 m and 2 m from the ground, sometimes on all four walls, sometimes on three walls and sometimes on only one wall of each room at a time. The channel measurements are collected over the measurement set-up detailed in Figure 2 and are recorded using the MEDAV RUSK - HyE MIMO channel sounder.
The dimensions of the two rooms are 8.45 m by 4.52 m by 2.75 m for the room and 5.7 m by 3.5 m by 3 m for the room. The rooms are chosen to represent a variety of indoor environments, including a room with keyhole effect (no windows) and no direct LOS communication, a room (smart office) and a room cluttered with noisy electrical and metering equipment (a potential scenario for Industry 4.0). In both rooms, the measurement set-up is repeated for stationary scenarios and scenarios with people moving around. Owing to channel reciprocity, it is assumed that the channel estimates can be used for both uplink and downlink.
4.2 Environment characterization
Large and small scale channel statistics are extracted from the channel impulse responses (CIRs) and channel frequency responses (CFRs) recorded in the above-mentioned campaign. To separate out the large scale statistics, the average received power and attenuation at each measurement location are calculated by averaging the recorded CIRs at that location. The pathloss exponent is determined from the slope of the best-fit line to the plot of the logarithm of average attenuation versus the logarithm of distance. The probability density function (pdf) of the deviation of each calculated attenuation value from the best-fit line of the log–log plot yields the shadowing distribution. Figures 3–5 show the log–log attenuation plots for three different measurement scenarios, namely static environment - Conference (), dynamic environment - Conference () and static environment - Instrumentation () rooms, respectively, while Table 1 summarizes the average values of the pathloss exponents () and the mean and standard deviation (, ) of the shadowing distributions in all three measurement scenarios.
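The large scale extraction described above amounts to a straight-line fit in the log–log domain. The sketch below assumes that attenuation is expressed in dB and distance in metres, so that the slope against 10·log10(distance) is the pathloss exponent and the residuals are the shadowing samples. The function name `fit_pathloss` and the sample values in the usage note are hypothetical, not measured data.

```python
import math

def fit_pathloss(distances_m, attenuation_db):
    """Least-squares fit of attenuation_db = intercept + eta * 10*log10(d).

    Returns (eta, intercept, residuals); the residuals are the deviations
    from the best-fit line, i.e. the shadowing samples in dB.
    """
    xs = [10.0 * math.log10(d) for d in distances_m]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(attenuation_db) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, attenuation_db))
    eta = sxy / sxx
    intercept = mean_y - eta * mean_x
    residuals = [y - (intercept + eta * x) for x, y in zip(xs, attenuation_db)]
    return eta, intercept, residuals
```

For example, `fit_pathloss([1.0, 10.0, 100.0], [40.0, 65.0, 90.0])` recovers a slope of 2.5 with zero residuals, because the synthetic points lie exactly on a line. The mean and standard deviation of the residuals then characterize the shadowing distribution.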
The gamma distribution with mean of dB and standard deviation of dB offers a good approximation to the shadowing experienced in the campaign. The pathloss exponent varies between 2 and 4. Higher is experienced over a shorter distance direct link between the sensors and the DFC. Lower shadowing is observed in an environment (room) cluttered with metallic surfaces, which contribute constructively to the reflected signal power, than in an open indoor environment (room) stocked with wooden tables and chairs.
To analyze the small scale channel statistics, the power delay profile (PDP) of the channel is obtained by averaging the power in each delay bin for each sensor (Table 2). The average delay spread is the first moment of each sensor channel PDP, and the root mean square (rms) delay spread is the square root of its second central moment. The fading vector is obtained by concatenating the CFRs at all the frequency points experienced over each sensor-DFC channel. The number of frequency points is calculated by dividing the discrete bandwidth of the measured signal by the discrete coherence bandwidth of the sensor-DFC channel. The measurements are also used to deduce additional details such as antenna correlation, determined from the correlation coefficients between each pair of fading vectors. The phase information from the complex CIRs is used to compute the steering vector for each transmit antenna.
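The two delay statistics defined above follow directly from the PDP. The sketch below treats the PDP as a discrete distribution over delay bins, normalized by the total power; the function name `delay_spreads` is hypothetical.

```python
import math

def delay_spreads(delays, powers):
    """Return (average delay, rms delay spread) of a power delay profile.

    delays : delay of each bin (any consistent time unit)
    powers : average linear power in each bin
    """
    total = sum(powers)
    # First moment of the normalized PDP.
    mean_delay = sum(t * p for t, p in zip(delays, powers)) / total
    # Second central moment of the normalized PDP.
    second_central = sum((t - mean_delay) ** 2 * p
                         for t, p in zip(delays, powers)) / total
    return mean_delay, math.sqrt(second_central)
```

The powers must be given in linear scale, not in dB, for the moments to be meaningful.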
The distribution of the derived fading vector fits the Rician distribution in most cases, with the DR and TWDP distributions fitting the remaining few. Figure 6 shows the range of -factor values for the Rician and TWDP distribution fits, and Figure 7 plots the range of values suitable for the TWDP distribution fit, where the -factor arises from the presence of two strong interfering components and is given by , with and being the instantaneous amplitudes of the specular components. Figures 8 and 9 depict the range of antenna correlation coefficients and amount-of-fading (AF) values experienced over all the measurement scenarios, respectively. The average values of each of the channel parameters, , , correlation coefficients, and AF (), were obtained from analysis of the measurement data for each of the measurement scenarios , and .
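The small scale parameters discussed here can be computed from the specular amplitudes and the fading vectors as sketched below. The definitions are the standard ones and are assumed to match the chapter's usage: the TWDP shape factor is 2·V1·V2/(V1² + V2²), the amount of fading is the variance of the instantaneous power divided by its squared mean, and the correlation coefficient is the normalized magnitude of the sample cross-covariance. The function names are hypothetical.

```python
import math

def twdp_delta(v1, v2):
    """Shape factor of two specular waves with amplitudes v1 and v2."""
    return 2.0 * v1 * v2 / (v1 ** 2 + v2 ** 2)

def amount_of_fading(h):
    """Variance of |h|^2 divided by the squared mean of |h|^2."""
    power = [abs(v) ** 2 for v in h]
    mean = sum(power) / len(power)
    var = sum((p - mean) ** 2 for p in power) / len(power)
    return var / mean ** 2

def correlation_coefficient(h1, h2):
    """Magnitude of the sample correlation between two fading vectors."""
    m1 = sum(h1) / len(h1)
    m2 = sum(h2) / len(h2)
    num = sum((a - m1) * (b - m2).conjugate() for a, b in zip(h1, h2))
    den = math.sqrt(sum(abs(a - m1) ** 2 for a in h1)
                    * sum(abs(b - m2) ** 2 for b in h2))
    return abs(num) / den
```

The shape factor lies between 0 (a single specular wave) and 1 (two specular waves of equal amplitude).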
A special scenario is observed in case of the room where the propagation environment can be approximated by DR fading () distribution. As the room is devoid of any windows, a rich scattering environment with ‘keyhole’ effect  is experienced with the existence of a waveguide propagation channel.
In the dynamic scenario, two sets of specular multipath components arrive at the receiver, one over the direct LOS link and the other due to reflection from the moving human body, yielding an environment that can be accurately approximated by the TWDP fading distribution with values ranging between 6 and 20 and values varying between 0.1 and 0.9. The large distances between the transmit sensors and the DFC and the proximity of most of the scattering surfaces to the sensors result in similar AF values over all the measurement scenarios (refer to Figure 9).
Rich scattering and diffraction around the sensors in the windowless room result in low correlation between the transmitted signals, while high correlation is observed among the sensor signals in the open environment of the room. In summary, both fading and shadowing become more detrimental as the inter-sensor distances increase. Separation between the sensors leads to very low coordination between them, making the transmit signals vulnerable to noise, interference and fading, while the large number of different shadowing values encountered results in higher shadowing variance and increased shadowing severity. For an indoor-to-outdoor virtual MIMO based communication scenario in WSNs, the propagation environment can experience a -factor varying between 0 and 20 and a -value varying between 0.1 and 0.9 in an open-concept smart office or home and in industry-like environments, as compiled in Table 3.
| Fading distribution | K-factor | Δ |
|---|---|---|
| Rician | 0.5 to 4 | n/a |
| TWDP | 6 to 20 | 0.1 to 0.9 |
4.3 Performance analysis
The fusion performance of the formulated sub-optimum fusion rules is evaluated in this subsection, where the propagation environment is modeled using the accumulated measurements. Based on the observation in , and experimental scenarios, Rician, TWDP and DR distributions are used to characterize the propagation channel.
4.3.1 Receiver operating characteristics (ROC)
For the different fusion rules of Section 3, the probability of detection () is plotted against the probability of false alarm () (commonly referred to as the Receiver Operating Characteristic (ROC)) for and in the presence of a fixed channel SNR of 20 dB. The value of 20 dB is chosen for the plots because the average attenuation recorded at any measurement location is approximately 20 dB across all measurement environments. The measured SNR over the direct LOS link is 40 dB, yielding an equivalent channel SNR of dB.
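An ROC of this kind can be estimated from Monte Carlo samples of any fusion statistic by sweeping the decision threshold. The sketch below assumes that two lists of statistic values are available, one generated under the target-absent hypothesis and one under the target-present hypothesis; the function name `empirical_roc` is hypothetical.

```python
def empirical_roc(stats_h0, stats_h1):
    """Return a list of (false-alarm, detection) probability pairs.

    stats_h0 : fusion statistic samples with the target absent
    stats_h1 : fusion statistic samples with the target present
    A decision for 'present' is taken when the statistic is >= the threshold.
    """
    thresholds = sorted(set(stats_h0) | set(stats_h1), reverse=True)
    roc = [(0.0, 0.0)]  # threshold above every sample
    for gamma in thresholds:
        p_f = sum(s >= gamma for s in stats_h0) / len(stats_h0)
        p_d = sum(s >= gamma for s in stats_h1) / len(stats_h1)
        roc.append((p_f, p_d))
    return roc
```

The returned points are ordered by decreasing threshold, so both probabilities are non-decreasing along the list and the curve ends at (1, 1).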
Impact of large scale channel parameters: In order to analyze the impact of large scale channel effects, the small scale fading vectors are modeled to be Rayleigh distributed (). From the Decode-and-Fuse group of sub-optimum fusion rules, MRC and Max-Log are used for DF over four different communication scenarios; No Shadowing (‘Th’; , , = 1, 0 dB, 0 dB; ), (, , = 2.72, 1.22 dB, 2.4 dB), (, , = 2.56, 1.77 dB, 3.6 dB), (, , = 1.96, 1.48 dB, 1.89 dB), and the ROC performances are plotted in Figure 10.
As shadowing and pathloss increase, MRC outperforms Max-Log. The dependence of the Max-Log rule on the noise spectral density is the principal reason behind its poor performance in the presence of severe shadowing. From the second group, the CV-ML and CV-MMSE rules are used for DF over the above-mentioned four communication scenarios and the ROC performances are plotted in Figure 11. As the large scale channel effects increase, CV-MMSE outperforms CV-ML. The propagation environment has no impact on the performance of CV-ML, owing to its dependence only on the SNR, which is kept constant for the plots in Figure 11. In general, sub-optimum fusion rules perform better over scenario than over and over than over . With a strong LOS link existing in the case of the scenario, it experiences the lowest pathloss. The scenario experiences higher pathloss due to penetration losses contributed by the moving human bodies.
Impact of small scale channel parameters: In order to analyze the effect of the small scale channel effects, sub-optimum fusion rules are used for DF over four different communication scenarios:
- ‘Th’ case with Rayleigh distributed fading vectors (, , = 1, 0 dB, 0 dB; ),
- scenario with Rician distributed fading vectors (, , = 2.72, 1.22 dB, 2.4 dB; with ),
- scenario with TWDP fading vectors (, , = 2.56, 1.77 dB, 3.6 dB; with , ),
- scenario with DR fading vectors (, , = 1.96, 1.48 dB, 1.89 dB; ).
If the small scale fading vectors are Rayleigh distributed, EGC performs best and CV-ML performs worst. CV-ML is worst under all the propagation scenarios considered. Max-Log performs slightly better than CV-ML over Rician, TWDP and DR fading channels. If the fading vectors are Rician or TWDP distributed, MRC, EGC and CV-MMSE perform almost equivalently. CV-MMSE, however, outperforms MRC and EGC if the small scale channel effects follow the DR distribution. Some analogies between the performances under the measured environment and the simulated one (as in ) can also be drawn from the results presented in Figures 10–14. In both cases, the ROC performance demonstrates that CV-MMSE performs better than the CV-ML rule, CV-MMSE performs close to the MRC/EGC rules, and CV-ML exhibits the worst performance.
Impact of measurement environment: If both large and small scale channel parameters are varied, the probability of detection with the MRC and CV-MMSE rules saturates with the increase in over the scenario, but increases with for the and scenarios at a rate that is slower for higher , as is evident in Figures 15 and 16. with the CV-ML and Max-Log rules increases proportionately with for all scenarios. It is worth mentioning that this set of results is limited to the chosen channel SNR of 20 dB and cannot be generalized to other values of channel SNR.
This chapter summarizes the design of sub-optimal fusion rules proposed for decision fusion at a DFC equipped with multiple antennas. Such rules are more efficient than the exact LLR-based optimal fusion rule for practical implementation. The sub-optimal fusion rules offer a range of choices for fusing sensor decisions at the DFC in an energy-efficient way, with lower requirements on system knowledge and computational complexity, thereby easing the problems associated with fixed-point implementation. All these rules still benefit significantly from the addition of multiple antennas at the DFC, with a saturation in performance that depends on the specific rule and the channel SNR.
We also study the practical implications of employing a distributed MIMO based WSN, especially in the light of the recently proposed decision fusion algorithms for a DFC equipped with multiple integrated antennas. A detailed measurement campaign is conducted for an indoor-to-outdoor distributed MIMO scenario with transmit antennas, representing sensors, deployed in a wide variety of indoor environments and receive antennas mounted on top of an outdoor tower, thereby replicating a DFC. Measurements are accumulated in both static and dynamic (people moving around) environments.
For each measurement scenario, large and small scale statistics are derived from the accumulated data, and average values of pathloss and shadowing variations are calculated. Fading distributions derived from the recorded channel impulse responses (CIRs) are found to closely match the double Rayleigh distribution in 21.4% of cases, the TWDP distribution in 28.6% of cases and the Rician distribution in 50% of cases.
Large and small scale channel parameters calculated from the accumulated measurements are used to model the MAC scenario over which the performance of the formulated fusion rules is analyzed for a virtual MIMO-based WSN. All the sub-optimal fusion rules, on average, exploit the diversity offered by multiple antennas at the DFC to achieve a considerable gain in performance. Among all the rules, CV-ML performs worst and CV-MMSE performs best in all scenarios. MRC, EGC and Max-Log perform in between the two extremes of CV-ML and CV-MMSE. In this case, EGC performs better than MRC and MRC performs better than Max-Log.