## 1. Introduction

Wireless Sensor Networks (WSNs) consist of spatially distributed autonomous devices, called sensors, that communicate in a wireless manner. Sensors cooperate together to monitor physical or environmental conditions such as temperature, sound, vibration or pressure [1]. This technology has been originally developed for military applications such as battlefield surveillance [2]. Recently, WSNs have been deployed in many other civilian application areas. It includes environment applications such as fire detection in forest monitoring and health-care applications like monitoring the patient status.

Sensor nodes are expected to be of tiny size. Therefore, the size of every sensor's components, such as the power source, processing and data storing memory, are expected to be also very limited. In addition of the physical characteristics, a large number of sensors are often deployed in hostile environments, where the human intervention is difficult if not impossible, for example inside a volcano. Hence, in these networks it is not practical to perform maintenance operations, such as changing batteries on deployed sensor nodes. This requires sensors to be able to self-organize, self-configure and should optimize the energy consumption to maximize the network's lifetime. The network lifetime in WSNs refers to the period of time from the deployment of the sensor nodes to the instant when the network is considered unusable [3].

In terms of energy consumption, sensors consume energy for three main reasons: data sensing, data processing and wireless data communicating. Wireless communication refers to data transmission and reception. Among these three operations, the most power-consuming task is data transmission. Approximatively 80% of power consumed in each sensor node is used to transmit data [4]. One field of research, aimed at extending the lifetime for WSNs, is data compression. In general, by applying a suitable compression scheme, power consumption can be reduced during the transmission and processing stages thus extending network lifetime. Also, by reducing the data packet size, less bandwidth is required to send and receive data [4]. In addition, to further enhance energy efficiency, cluster-based communication schemes are widely used in WSNs [5]. In view of this, we propose to take advantage of both data compression and clustering in order to further reduce energy consumption in the network. Indeed, the compression and clustering strategies are not supposed to work independently but rather the network design has to consider both simultaneously. Therefore, by enabling both compression and clustering schemes in WSNs, energy consumption would be greatly enhanced.

In this chapter, we propose a complete analysis of a new Compression Cluster-based scheme in a Spatially-Correlated Region ($CC\_SCR$) for event-driven applications in WSNs. As WSNs are typically densely-deployed over a sensor field [6], sensor nodes are typically very close to each other. Contrary to continuous monitoring applications, in Event Detection-Driven (EDD) applications, active nodes are concentrated in a relatively small area. Therefore, readings from these nodes are expected to be quite similar. Building on this, we propose a clustering scheme that exploits the spatial correlation of sensed data among the nodes to reduce the size of the data packets that will be sent. Specifically, in the proposed scheme, Cluster Members (CMs) send only the differences between their readings and a reference data which corresponds to the value sensed by the selected Cluster Head (CH). As such, one of the proposal's main issue is to select as CH the node that reduce the average packet size in the cluster. Note that, in this chapter, we complement our previous work published in [7] with several simulation results and analytical analysis.

The main contributions to this chapter are:

The $CC\_SCR$ compression cluster-based protocol for WSNs is proposed. It exploits the spatial correlation of the sensed data in order to reduce energy consumption.

An analytical energy consumption model for comparison to both classical and single hop schemes is developed.

The $CC\_SCR$ is implemented in TinyOS [8] for simulation analysis, in order to prove the potential benefits of $CC\_SCR$ in future applications of real WSNs deployments.

The remainder of this chapter is organized as follows. Section ▭ presents a background of research related to this work, while Section ▭ exposes the network model. Section ▭ shows the analytical results regarding the energy consumption and network lifetime, while Section ▭ shows the simulation results. Finally, the chapter concludes in Section ▭ with a summary of the main advantages of the proposed scheme.

## 2. Related work

We review some of the related works regarding the compression and the clustering schemes in Subsections ▭ and ▭, respectively.

### 2.1. Compression schemes

In the literature, there has been an increased interest in studying compression algorithms for WSNs. On the other hand, many of these compression algorithms have been proposed for classical networks, which are not suitable to be deployed in the WSNs context [9], [10]. The main reason is the limited memory size of sensor nodes, for example, the size of bzip2 is $219KB$ and the size of LZO is $220KB$. Another reason is the limited processor speed of sensor nodes which is around $4-8MHz$. Thus, embedding classical data compression schemes in these tiny nodes is very difficult and it is necessary to design a low-complexity and small-size data-compression algorithm for sensors. We review in the following some of these compression techniques. A more detailed description of compression methods can be found in [11].

One compression algorithm for WSNs is the coding-by-ordering data-compression technique [12]. In this algorithm, when data is combined at an aggregation node, some of these data are implicitly transmitted. The main idea behind this technique is to replace the data transmission of certain nodes by the order in which the aggregated packets are placed by the aggregator. For example, consider five nodes (${n}_{1}$, ${n}_{2}$, ${n}_{3}$, ${n}_{4}$, ${n}_{5}$) that send data to their aggregator node, ${n}_{a}$ and suppose that data value of each node can be any integer from a range of 0 to 23. When the aggregator node receives a value from each node ${n}_{1}$, ${n}_{2}$, ${n}_{3}$, ${n}_{4}$, ${n}_{5}$, the order of transmission of first four nodes ${n}_{1}$, ${n}_{2}$, ${n}_{3}$, ${n}_{4}$ determines the value of ${n}_{5}$ implicitly. Indeed, there are $4!=24$ possible ways of ordering these data packets.

The pipelined in-network compression algorithm is discussed in [13]. The main idea behind this technique is to combine data in order to make them smaller than the original size. After the aggregator collects data from differents nodes, it is stored for a certain amount of time. Data packets are then combined into one packet to minimize data transmission. For example, consider that each data packet has the following form: $<$ *data value*, *node ID*, *timestamp* $>$. Then, the compressed data packet has the following form: $<$ *shared prefix*, *suffix list*, *node ID list*, *timestamp list*$>$. The *shared prefix* field, i.e. the most significant bits, is the same for all the measured values. The suffix list field expresses the list of measured values excluding the *shared prefix* part. The *node ID list* is the list of node identifiers and the *timestamp list* is the list of timestamps. One advantage of this simple compression scheme is that the shared prefix system can be used for both *node ID* and *timestamp* fields. By doing so, more data compression can be achieved.

The distributed compression scheme proposed in [14] uses a side information to encode a source information. For example, if there are two data sources: $X$ and $Y$, which are correlated and chosen from a discrete alphabet, then $X$ can be compressed at the theoretical rate of its conditional entropy $H\left(X\right|Y)$. The receiving node maintains the cosets and can decode $X$ knowing $Y$'s codevector, and with partial information from the source $X$.

The main advantage of the $CC\_SCR$ algorithm, compared to those previously mentioned is that $CC\_SCR$ takes into account the physical characteristics of the sensed data in order to compress data and also considers clustering communication in order to reduce energy consumption.

### 2.2. Clustering schemes

In addition to the compression techniques, there has been an increased interest in studying energy efficient clustering algorithms and extensive clustering algorithms have been proposed for WSNs. Hereafter, we briefly review the most relevant energy efficient clustering algorithms. For more details, the reader can review [15], [16], [5], and [17].

Hybrid Energy-Efficient Distributed clustering (HEED) [18] protocol operates in two main phases: the set-up phase where clusters are formed and the steady phase where the sensor nodes transmit their data using the Time Division Medium Access (TDMA) frames. HEED set-up phase operates in three sub-phases. The first sub-phase is the initializing. Nodes exchange *hello* messages to discover their neighborhoods. The second sub-phase consists of a competition process. The third sub-phase is the finalizing and it allows nodes to join their corresponding CH based on the connectivity degree.

Power-Efficient Gathering in Sensor Information Systems (PEGASIS) [19] steady phase consists of a formation of chains instead of clusters. In the chain formation, the Base Station (BS) and sensor nodes are connected via a chain using a greedy algorithm. One of the nodes, in the chain, is selected by turns to represent the head. In data gathering phase, each node delivers the sensing data to the nearest neighbor node until the data reach the head node which aggregates and delivers the sensing data to the BS.

In [20], the authors proposed an Energy Efficient Clustering Scheme (EECS) protocol. In this protocol, CH candidates compete for the ability to elevate to a CH with a certain probability. This competition involves candidates broadcasting their residual energy to neighboring candidates. If a given node does not find a node with more residual energy, it becomes a CH.

The main difference between the aforementioned clustering algorithms and the $CC\_SCR$ algorithm is that nodes use the physical characteristic of the sensed data to elects CHs. The benefits of our proposed scheme is that nodes use the compression scheme to reduce the energy consumption.

In this chapter, we combine the benefits of using both the clustering and the compression techniques to reduce the energy consumption in the network. Indeed our proposal scheme takes into account the characteristics of the physical surveilled event and also takes advantage of the energy unconstrained of the Base Station (BS) which participates into the CH selection. Specifically, our proposal scheme takes advantage of the fact that the nodes that sense a certain event are usually very close to each other (which entails a high correlation between sensed data) in order to reduce the size of data packets communicated through the network. The BS then selects an efficient CH that minimize the data transmission over the network. To the best of knowledge this is the first clustering protocol, that takes into consideration the physical characteristics of the environment to elect energy efficient clusters and therefore implement the compression scheme to reduce the energy consumption.

## 3. Network model

We consider an event-driven WSN consisting of $N$ sensors deployed over a vast field as in Figure ▭. We denote the *i-th* sensor node as ${n}_{i}$ and the corresponding sensor node
set as $\{{n}_{1}$, ${n}_{2}$,...,${n}_{N}$ $\}$. Some assumptions concerning the sensor nodes and the underlying network model are now presented:

Nodes are uniformly distributed in an $A\times A$ area with $(x,y)$ coordinates. Nodes are homogenous and all have the same capabilities. A unique identifier ID is assigned to each node.

Sensor nodes and the sink node are all stationary after deployment.

Nodes have two power controls to vary transmission power, which depends on the distance to the receiver [21]. Each node, ${n}_{i}$, can reach any other node with a transmission range, ${R}_{c}$. The sink can be reached with a transmission range, ${R}_{t}>{R}_{c}$.

CHs use the average operation as the aggregation technique in order to eliminate data redundancy.

We consider event detection driven wireless sensor applications. The center of the event is located in a random uniformly-distributed point with coordinates $({x}_{event},{y}_{event})$ within the network's area. The range of the event, i.e., the area range where sensors can detect the event, is $R\_event$ meters, where $R\_event\in [1,A]$. We also suppose that an event has a duration of $T\_event$ seconds. In addition, only sensor nodes within $R\_event$ range are considered as active nodes in the network and they are the only nodes performing as the source of the detected event. The rest of the nodes in the system are not considered in our analysis as they do not participate in data reporting. Additionally, in our model, only one event can be active inside the system area and the data value $C$ at the center of the event is constant, i.e., the stationary model in which the measured data do not change during the $T\_event$ seconds that the event is active.

A cluster-based WSN is considered where only one CH is elected for each event. The clustering process is triggered whenever an event is sensed by the nodes inside the event area.

The spatial correlation of the data from the different active nodes has been modeled on previous works in the area according to the following models:

In this chapter, we use diffusion property to model spatially-correlated data [26]. The model considered in this chapter is the same as in [22] in which the data reading at a distance $d$ from the center of the event is $D=C/{(d+1)}^{\alpha}$, where $C$ is a constant representing the value at the center of the event, and $\alpha $ is the diffusion parameter, which depends on the particular environment and phenomenon surveyed, e.g., for light $\alpha =2$, heat $=\alpha \simeq 1$.

Figure ▭ shows the data reading using the aforementioned model, with different values for $\alpha $, and $C=250$. On one hand, when $\alpha \ge 1$, we observe a relatively big difference between the value sensed at the center of the event and the values observed at distance $d$ far away from the center of the event. On the other hand, when $\alpha <1$ ($\alpha =0.1,0.01$ and $0.001$), the data readings away from the center of the event are very similar. In our study, we are interested, specifically, in the types of event where data values are highly correlated.

We use Henizelman's energy consumption model [27]. Specifically, the energy consumed to transmit a message at a distance $d$ is given by

$${E}_{tx}(sz,d)=\left\{\begin{array}{c}\hfill sz\times {E}_{elec}+sz\times {E}_{fs}\times {d}^{2},\phantom{\rule{4.pt}{0ex}}\text{if}\phantom{\rule{4.pt}{0ex}}d\le {d}_{0}.\\ \hfill sz\times {E}_{elec}+sz\times {E}_{mp}\times {d}^{4},\phantom{\rule{4.pt}{0ex}}\text{otherwise}.\end{array}\right.$$ | () |

where $sz$ is the data packet size in bits, ${E}_{elec}$ is the energy consumed due to the transmitter/receiver circuitry, ${E}_{mp}$ is the energy consumed by the transmitter amplifier and ${d}_{0}=\sqrt{{E}_{fs}/{E}_{mp}}$ is the distance threshold between the transmitter and the receiver over which the multi-path fading channel model is used. The energy consumed to receive a message is ${E}_{rx}\left(sz\right)=sz\times {E}_{elec}$.

### 3.1. Classical clustering protocol

A classical clustering process is composed of two phases: the set-up phase and the steady-state phase, as depicted in Figure ▭. When an event occurs in a random (uniformly-distributed) point of the network, nodes inside the event area wake up and start the clustering process. At the beginning of this phase, active nodes compete among each other to become a CH. Specifically, active nodes transmit their control packet to the sink node according to the specified random medium access protocol. In this chapter, CSMA control protocol is used, which is the default MAC protocol for the Mica platform. The control packet only comprises the node's ID and no data is transmitted at this point. The first node that successfully transmits this packet becomes the CH. All nodes involved in event-reporting immediately send their signaling message to the sink node. Therefore, the sink node selects the first node that successfully transmits a signaling message and then it broadcasts a signaling message over the network for a CH notification. Thus, the rest of the nodes become CMs. In the steady phase, CMs send their data in a scheduled fashion using a Time Division Multiple Access (TDMA) protocol. Note that the CH assigns slots to its CMs to accomplish the successful data transmission. Then, the CH aggregates the data values received from its CMs with its own, and sends the resulting data to the sink node.

### 3.2. Proposed compression clustering protocol

The proposed clustering Compression Cluster-based scheme in Spatially-Correlated Regions ($CC\_SCR$), process is also composed of the same two phases, namely: the set-up phase and the steady-state phase. As in classical protocol, the set-up phase happens whenever an event occurs in a region of the network. However, in $CC\_SCR$, the active nodes send their first measured data value to the sink node, i.e., they no longer send just their control packet. Instead, active nodes send a data packet. The reason for this is that this sensed data is used in CH selection procedure. Indeed, this entails an extra energy consumption at the set-up phase, compared to classical protocol. However, this first data transmission allows important energy savings in the steady-state phase.

It is important to note that $CC\_SCR$ is best suited for environments where event conditions are fairly stable during of the duration of the event. This is due to the fact that the CH is chosen according to the first sensed data. Hence, if event conditions suffer a high variation, the originally-selected CH may no longer render acceptable energy savings. An example of such an application is a fire surveillance forest, in which, when a fire occurs in a region, the temperature remains stationary for the duration of the fire in the region. Another example includes target-tracking. In this kind of application, the target is the source of the measured data at sensor nodes, such as light or temperature. Here, the measured data remains the same whenever the target stays in the same place and hence the sensor nodes sense the same measured data during the presence of the target. Next, we describe the set-up and the steady-state phases of the proposed algorithm.

In the set-up phase, after receiving the first data packets from all the active nodes, the sink node calculates the difference between the data from node ${n}_{i}$ and those from node ${n}_{j}$, $i\ne j$. Next, these differences are summed over. We call this sum of the difference between data values ${S}_{i}$. Then, the sink selects as CH the node which minimizes the total difference, calculated value ${S}_{i}$, between each node ${n}_{i}$ and node ${n}_{j}$, $i\ne j$. Finally, the sink broadcasts a control message to the active nodes in order to notify the node selected as CH. Therefore, the rest of the nodes consider themselves as CMs. Note that there is no need for the CMs to send any extra packets since the sink already knows which nodes are active.

In the steady-state phase, the CMs send the difference between their sensed data and the CH's data value, which corresponds to a compressed value, called ${\Delta}_{i}$, rather than the complete data packet, $value\_C{M}_{i}$. Therefore, ${\Delta}_{i}=|value\_C{M}_{i}-value\_CH|$ represents the difference between the

*i-th*CM's data value $value\_C{M}_{i}$, and the corresponding CH data value $value\_CH$. In order to perform this compression, the CH sends its complete sample data value to the CMs at the beginning of each event occurrence. Therefore, CMs send only the ${\Delta}_{i}$ to the CH. The main advantage of the proposed scheme is that the ${S}_{i}$ calculation is made at the sink node, which is not energy or memory-constrained.

#### 3.2.1. Example

To illustrate the protocol's operation, let us consider the following example as presented in Figure ▭ (Figure ▭ shows the case of a classical scheme). We consider five active nodes ${n}_{1}$, ${n}_{2}$, ${n}_{3}$, ${n}_{4}$ and ${n}_{5}$ in the event region $e$ which covers a region of range $R\_event$ as in Figure ▭. In this example, we consider temperature as the sensed measurement value. Nodes ${n}_{1}$, ${n}_{2}$, ${n}_{3}$, ${n}_{4}$ and ${n}_{5}$ sense the value of ${20}^{\circ}$, ${22}^{\circ}$, ${19}^{\circ}$, ${20}^{\circ}$ and ${15}^{\circ}$, respectively, and they send the values to the sink. When the sink receives the data values, it calculates the ${S}_{i}$ value for each node ${n}_{i}$. The node which has the smallest ${S}_{i}$ is considered as the CH.

The following calculation is done at the sink. For node ${n}_{1}$:

$|20-22|=2$

$|20-19|=1$

$|20-20|=0$

$|20-15|=5$

The sink node calculates ${S}_{1}=|20-22|+|20-19|+|20-20|+|20-15|=8$

For node ${n}_{2}$:

$|22-20|=2$

$|22-19|=3$

$|22-20|=2$

$|22-15|=7$

The sink node calculates ${S}_{2}=|22-20|+|22-19|+|22-20|+|22-15|=14$

For node ${n}_{3}$:

$|19-20|=1$

$|19-22|=3$

$|19-20|=1$

$|19-15|=4$

The sink node calculates ${S}_{3}=|19-20|+|19-22|+|19-20|+|19-15|=9$

The calculation for node ${n}_{4}$ is the same as node ${n}_{1}$.

For node ${n}_{5}$:

$|15-20|=5$

$|15-22|=7$

$|15-19|=4$

$|15-20|=5$

The sink calculates ${S}_{5}=|15-20|+|15-22|+|15-19|+|15-20|=21$.

Finally, it selects either node ${n}_{4}$ or node ${n}_{1}$ as a CH. Both nodes minimize the total difference value measured. The other nodes become CMs. During the steady phase, CM nodes send their ${\Delta}_{i}$ value to the CH rather than their complete value. In this example, ${n}_{3}$ sends the sample value of 2 rather than the complete sample value of 22. Note that the compression data sent in our scheme involves sending less coded bits compared to a complete data that is sent in the classical scheme. Therefore, considerable energy saving is achieved in our scheme, as can be seen in Sections ▭ and ▭. Note that this election algorithm can be achieved at nodes in a distributed manner. Distributed scheme here means that nodes first exchange data and then the calculations specified at the sink node will run at the level of individual nodes. However, in this scheme nodes will receive a considerable amount of data, which may complicate the election process as nodes have a limited local memory.

## 4. Analytical results

In this section, the mathematical model of classical, single hop and $CC\_SCR$ protocols are described. The total energy consumed in the network, ${E}_{total}$, for the duration of an event, can be calculated as follows:

where ${E}_{competing}$ is the energy consumed during the cluster formation phase and ${E}_{reporting}$ is the energy consumed during the steady-state phase. We calculate in the following $\mathbb{E}\left[{E}_{total}\right]$ as the expected energy consumed through the network for a single hop protocol and for both the classical and $CC\_SCR$ protocols.

$$\begin{array}{c}\hfill \mathbb{E}\left[{E}_{total}\right]=\mathbb{E}\left[{E}_{competing}\right]+\mathbb{E}\left[{E}_{reporting}\right]\end{array}$$ | () |

### 4.1. Classical protocol

For classical protocol, where no data compression is enabled, we first calculate $\mathbb{E}\left[{E}_{competing}\right]$. The energy consumed in the cluster formation phase is due to the signaling packet transmission of the active nodes in the event area directly to the sink plus the reception of the signaling packet from the sink to the active nodes, then:

$$\begin{array}{c}\hfill \mathbb{E}\left[{E}_{competing}\right]=m\times [{E}_{tx}(S,{R}_{t})+{E}_{rx}\left(S\right)]\end{array}$$ | () |

where $m=N\pi R\_even{t}^{2}/{A}^{2}$ is the expected number of active nodes in the range $R\_event$ when the disk is totally included in the area $A\times A$ and $N$ is the total number of nodes in the network. $S=24$ bits is the size of signaling message, and

$m\times {E}_{tx}(S,{R}_{t})$ is the energy consumed to send $m$ competing messages to the sink.

$m\times {E}_{rx}\left(S\right)$ is the energy consumed by the resulting competing messages sent from the sink through the network.

$$\begin{array}{cc}\hfill \mathbb{E}\left[{E}_{reporting}\right]=& N{b}_{r}\times [{E}_{tx}(S,{R}_{c})+(m-1)\times {E}_{rx}\left(S\right)+(m-1)\times {E}_{tx}(fixe,{R}_{c})\hfill \\ & +(m-1)\times {E}_{rx}\left(fixe\right)+{E}_{DA}\times fixe+{E}_{tx}(fixe,{R}_{t})]\hfill \end{array}$$ | () |

where $fixe$ bits is the size of the full data packet, $N{b}_{r}$ is the number of packets sent during the steady phase, and

${E}_{tx}(S,{R}_{c})$ is the energy consumed from a signaling message sent by the CH to its CMs in order to send their data.

$(m-1)\times {E}_{rx}\left(S\right)$ is the energy consumed by CMs to receive the signaling message.

$(m-1)\times {E}_{tx}(fixe,{R}_{c})$ is the energy consumed by CMs to send data to the CH.

$(m-1)\times {E}_{rx}\left(fixe\right)$ is the energy consumed by the CH to receive data sent by the CMs.

${E}_{DA}\times fixe$ is the energy consumed by the CH due to data aggregation.

${E}_{tx}(fixe,{R}_{t})$ is the energy consumed by the CH to send the aggregated data to the sink.

In the simulation, we take $fixe=32$ and $N{b}_{r}=29$.

### 4.2. Single hop protocol

In the single hop protocol, there is no energy consumed during the set-up phase. Nodes start sending data packets directly to the sink, then:

$$\begin{array}{c}\hfill \mathbb{E}\left[{E}_{total}\right]=N{b}_{r}\times m\times {E}_{tx}(fixe,{R}_{t})\end{array}$$ | () |

where

$N{b}_{r}$ is the number of reports sent to the sink during $T\_event$ sec.

$m$ is the number of nodes involved in data reporting.

${E}_{tx}(fixe,{R}_{t})$ is the energy required to send a full data packet to the sink.

The interest of analyzing this case is to have an insight into the benefits of clustering schemes in event-driven WSNs.

### 4.3. CC_SCR

We now consider the case where the $CC\_SCR$ strategy is enabled. It is to be noted that the proposed scheme, $CC\_SCR$, behaves in the same manner in the cluster formation phase as classical protocol, with the important difference being that nodes transmit the data packet instead of the signaling packet, then:

$$\begin{array}{c}\hfill \mathbb{E}\left[{E}_{competing}\right]=m\times [{E}_{tx}(fixe,{R}_{t})+{E}_{rx}\left(S\right)]\end{array}$$ | () |

where

$m\times {E}_{tx}(fixe,{R}_{t})$ is the energy consumed to send $m$ data packets to the sink.

$m\times {E}_{rx}\left(S\right)$ is the energy consumed by the resulting compete message sent from the sink through the network.

In the steady-state phase, energy consumption is found as follows:

$$\begin{array}{cc}\hfill \mathbb{E}\left[{E}_{reporting}\right]=& {E}_{tx}(fixe,{R}_{c})+(m-1)\times {E}_{rx}\left(fixe\right)+N{b}_{r}\times [{E}_{tx}(S,{R}_{c})\hfill \\ & +(m-1)\times {E}_{rx}\left(S\right)+(m-1)\times {E}_{tx}(S+lo{g}_{2}\left(\mathbb{E}\left[{\Delta}_{i}\right]\right),{R}_{c})\hfill \\ & +(m-1)\times {E}_{rx}(S+lo{g}_{2}\left(\mathbb{E}\left[{\Delta}_{i}\right]\right))+fixe\times {E}_{DA}+{E}_{tx}(fixe,{R}_{t})]\hfill \end{array}$$ | () |

where

${E}_{tx}(fixe,{R}_{c})$ is the energy consumed to send CH data packets to the CMs.

$(m-1)\times {E}_{rx}\left(fixe\right)$ is the energy consumed by CMs in order to receive data sent by the CH.

${E}_{tx}(S,{R}_{c})$ is the energy consumed from a signaling message sent by the CH to its CMs in order to send their data.

$(m-1)\times {E}_{rx}\left(S\right)$ is the energy consumed by CMs in order to receive the signaling message.

$(m-1)\times {E}_{tx}(S+lo{g}_{2}\left(E\left[{\Delta}_{i}\right]\right),{R}_{c})$ is the energy consumed by the CMs to send the compressed data to the CH.

$(m-1)\times {E}_{rx}(S+lo{g}_{2}\left(E\left[{\Delta}_{i}\right]\right))$ is the energy consumed by the CH to receive the compressed data from the CMs.

${E}_{DA}\times fixe$ is the energy consumed by the CH due to data aggregation.

${E}_{tx}(fixe,{R}_{t})$ is the energy consumed by the CH to send the aggregated data to the sink.

where $\mathbb{E}\left[{\Delta}_{i}\right]$ is the average data packet size which corresponds to the difference between the CMs' data and the CH's data. It is worth noting that when considering an uniform node distribution with a large $N$, the node that minimizes the distance in the $R\_event$ region will be located at the center of $R\_event$. Therefore, to calculate $\mathbb{E}\left[{\Delta}_{i}\right]$, let us first calculate the average distance between active nodes and the CH, $\mathbb{E}\left[{d}_{toCH}\right]$. We denote by $\mathcal{D}$ the disk of radius $R\_event$, i.e. $\mathcal{D}=\{(x,y)\mid {x}^{2}+{y}^{2}\le R\_even{t}^{2}\}$. Since active nodes are uniformly distributed in $\mathcal{D}$, we have

$$\mathbb{E}\left[{d}_{toCH}\right]=\int {\int}_{\mathcal{D}}\sqrt{{x}^{2}+{y}^{2}}dxdy=\frac{1}{\pi R\_even{t}^{2}}{\int}_{r=0}^{r=R\_event}{\int}_{0}^{2\pi}{r}^{2}drd\theta =\frac{2R\_event}{3}.$$ | () |

We then calculate $\mathbb{E}\left[{\Delta}_{i}\right]$, the average data difference between the data at the CM and the maximum value at the CH $C$, considering that the CH is located at the center of the cluster. Indeed, in a highly dense WSN, such the one considered in this work, it is reasonable to consider that the CH is located at the center of the cluster, i.e., very close to the event origin.

$$\mathbb{E}\left[{\Delta}_{i}\right]=|C-\frac{C}{{(\mathbb{E}\left[{d}_{toCH}\right]+1)}^{\alpha}}|=C\times |(1-\frac{1}{{(\frac{2R\_event}{3}+1)}^{\alpha}})|$$ | () |

Note that the previous model is an approximation of reality, in which an ideal channel is considered, i.e., there is no consideration of packet loss. According to the previous models, Figure ▭ and ▭ show the average energy consumed in the network of $N=1000$ for different values of ${R}_{t}$ and ${R}_{c}$, respectively.

In Figure ▭, we set ${R}_{c}=100$ $m$ and $N{b}_{r}=29$, and in Figure ▭, we set ${R}_{t}=150$ $m$, $N{b}_{r}=29$. These results show that the $CC\_SCR$ strategy is suitable when ${R}_{t}$ is less than around 180 meters and ${R}_{c}$ is greater than around 50 meters.

Exceeding these thresholds makes the competing process of the proposed protocol very costly in energy due to the full data packet sent to the sink during the set-up phase. Remember that classical protocol only transmits a control packet in this phase. Therefore, $CC\_SCR$ has the highest energy consumption when the distance from the cluster to the sink becomes considerable.

In addition, Figure ▭ demonstrates that the single hop scheme achieves the greatest energy consumption as 1) its transmissions depend directly on ${R}_{t}$, and 2) it sends the full data packets. Figure ▭ shows a steady energy consumption for the single hop scheme as it does not use a CH to shorten its transmission range, therefore all nodes use a costly direct-transmission ${R}_{t}$, which dramatically decreases network lifetime.

Figure ▭ shows the average energy consumed in the network when varying the $N{b}_{r}$ parameter. Here, ${R}_{t}$ and ${R}_{c}$ are set to $300m$ and $100m$, respectively. The results show that significative energy savings can be achieved when increasing the number of reports sent from the CMs to the CH. Building on from these observations, in Figure ▭ we observe that the single hop scheme achieves the greatest energy consumption as its transmission depends directly on $N{b}_{r}$.

The point of intersections, ${R}_{t}\_inter$, ${R}_{c}\_inter$, and $N{b}_{r}\_inter$ of Figure ▭, ▭ and ▭, respectively, are calculated as follows.

$${R}_{t}\_inter=\sqrt[4]{\frac{fixe\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}(m\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{elec}\phantom{\rule{-0.166667em}{0ex}}+\phantom{\rule{-0.166667em}{0ex}}{E}_{mp}\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{R}_{c}^{4})\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}N{b}_{r}\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}(m\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}1)\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}\Delta \phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}[2\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{elec}+{E}_{mp}\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{R}_{c}^{4}]\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}(S\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}fixe)\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{elec}}{(S-fixe)\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{mp}}}$$ | () |

$${R}_{c}\_inter\phantom{\rule{-0.166667em}{0ex}}=\phantom{\rule{-0.166667em}{0ex}}\sqrt{\frac{m\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}fixe\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{elec}\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}m\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}(S\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}fixe)\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{elec}\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}m\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}(S\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}fixe)\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{mp}\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{R}_{c}^{4}\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}2\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}N{b}_{r}\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}(m\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}1)\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}\Delta \phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{elec}}{\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}fixe\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{fs}\phantom{\rule{-0.166667em}{0ex}}+\phantom{\rule{-0.166667em}{0ex}}N{b}_{r}\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}(m\phantom{\rule{-0.166667em}{0ex}}-\phantom{\rule{-0.166667em}{0ex}}1)\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}\Delta \phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{fs}}}$$ | () |

$$N{b}_{r}\_inter=\frac{-m\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}(S-fixe)\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{elec}-m\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}(S-fixe)\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{mp}\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{R}_{t}^{4}+m\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}fixe\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{elec}+fixe\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{mp}\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{R}_{c}^{4}}{2\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}(m-1)\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}\Delta \phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{elec}+\Delta \phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{E}_{mp}\phantom{\rule{-0.166667em}{0ex}}\times \phantom{\rule{-0.166667em}{0ex}}{R}_{c}^{4}}$$ | () |

where $\Delta =fixe-S-{log}_{2}\mathbb{E}\left[{\Delta}_{i}\right]$.

We conclude from Figure ▭, ▭ and ▭ that 1) using a clustering scheme saves more energy than a single hop scheme, and 2) the application depends on ${R}_{t}$ (referring to how far the sink node is from the area-sensed field), ${R}_{c}$ (referring to the size of the area field), and $N{b}_{r}$ (referring to the number of data updating to the CH). More specifically, we conclude that for a fixed ${R}_{c}$, the higher ${R}_{t}$ results a poor performance of $CC\_SCR$ concerning energy consumption. However, increasing ${R}_{c}$ or $N{b}_{r}$ gives $CC\_SCR$ a better performance compared to the classical scheme. In addition, we conclude that both $CC\_SCR$ and the classical scheme outperforms the single hop scheme.

In the following, we analyze the network lifetime. Based on [28], the general definition of the average network lifetime, $lifetime$, can be expressed as follows:

where $\mathbb{E}[{E}_{0}\_total]=m\times {E}_{0}$ is the average total residual energy in the area of $R\_event$ range, ${E}_{0}$ is the initial energy of a node and $\mathbb{E}\left[{E}_{total}\right]$ is the average energy consumed per unit of time (i.e., during $T\_event$ seconds).

Using the same parameter settings as in the energy consumption analysis described above, Figure ▭, Figure ▭ and Figure ▭ show network lifetime for different values of ${R}_{t}$, ${R}_{c}$ and $N{b}_{r}$, respectively. From Figure ▭, we observe that $CC\_SCR$ outperforms the classical scheme when ${R}_{t}$ is lower than around $180m$. The result of the single hop scheme, in Figure ▭, shows that network lifetime decreases faster than in $CC\_SCR$ and in the classical schemes (Figure ▭ shows the ratio gain of network lifetime).

From Figure ▭, we observe that the more increase of ${R}_{t}$, the less network lifetime gain in $CC\_SCR$, compared to the classical scheme. We also notice the steady network lifetime for the single hop scheme, but with a shorter value than for both the proposed and the classical schemes (Figure ▭ shows the ratio gain of network lifetime).

From Figure ▭, we observe that the higher the $N{b}_{r}$, the longer the network lifetime, concerning the $CC\_SCR$ strategy compared to the classical scheme. We also observe that the network lifetime decreases faster in the single hop scheme, than compared to both $CC\_SCR$ and the classical schemes (Figure ▭ shows the ratio gain of network lifetime).

## 5. Simulation results

We use TinyOS [8] as a simulation tool. The parameters used for this set of results are presented in Table ▭.

Figure ▭ shows the average energy consumed in the network per unit of time for different concentrations of nodes. In this case, there are twenty simulated events. The results clearly demonstrate that our proposal outperforms the classical scheme. It can be seen that, as the number of nodes in the system increases, also the energy consumption increases. Indeed, when there is a high number of nodes in the network, there is also a high number of nodes that sense the event. Hence, the number of packet transmissions (both control and data packets) is much higher than in the case where just a few nodes are active per event. The main reason for the better performance of the proposed protocol is that only the difference, ${\Delta}_{i}$, is transmitted rather than the complete data packet, during the steady-state. Note that this difference between classical and the proposed protocol increases concerning higher densities networks. The rationale behind this is that, in high density networks, nodes are closer to each other, which in turn entails a higher correlation degree among the sensed data. This in turn renders a smaller packet size. Conversely, in the classical scheme, since the packet size is fixed, a higher density network only increases the number of packets transmitted, consuming a lot of energy.

Figure ▭ shows the average energy consumed over time for 60 nodes in the classical, $CC\_SCR$ protocols, and also for one single hop to reach the sink. In order to explore the benefits of clustering architecture, a scenario where all nodes transmit directly to the sink is presented. For the proposed scheme, all active nodes transmit their initial packet to the sink in order to choose the reference node (note that in this case there is no CH). Then, the sink selects the node that minimizes the data difference, as explained in the previous section, and then transmits a control packet indicating the ID of the reference node. Then, for data transmission, active nodes only transmit their difference, ${\Delta}_{i}$, directly to the sink. The results demonstrate clearly that $CC\_SCR$ conserves more energy compared to the classical scheme. Also, it is clear that the choice of clustering scheme offers more energy savings than the single hop scheme. The ratio gain presented in Figure ▭ may reach up to 11 times more energy conservation than the classical scheme, and up to 119 times more energy conservation than the single hop scheme, which are considerable results.

Figure ▭ shows the average energy consumed for different values of $R\_event$ region. When $R\_event$ is varied, also the number of active nodes per event is modified accordingly. Figure ▭ shows the number of active nodes per event. It can be seen that the average number of active nodes for both the classical and proposed scheme is approximately the same. Indeed, the proposed mechanism has no impact on the number of active nodes. Note that, by increasing the number of active nodes, energy consumption also increases. Observe, for instance, that energy consumption, when $R\_event=30$ meters, is less than the consumption when $R\_event=$ 60 meters and 90 meters. In each scenario, we observe that, by enabling our compression scheme, energy consumption over the network is reduced, therefore extending network lifetime.

Figure ▭ shows the average energy consumed for different values of $T\_event$ period. Increasing $T\_event$ also increases the period of the steady-state phase and the number of data reported, thereby it can be seen as an increase in the energy consumption. That explains why the energy consumed by $T\_event=200$ seconds is less than that of $T\_event=300$ and 400 seconds. In each scenario, we observe that enabling our compression scheme reduces energy consumption over the network and thereby extends network lifetime. It is important to note that the proposed mechanism is particularly energy-efficient for high-event duration times. This is due to the fact that, as event duration increases, CMs in the classical scheme transmit many full-length packets while, in the proposed scheme, CMs also transmit many packets but with a much smaller length. This results in a slight increase in energy consumption for the proposed scheme while for the classical scheme there is a significant increase in energy consumption when the event duration increases.

Figure ▭ shows the average energy consumption in $CC\_SCR$ when the aggregation technique is enabled at the CH, compared to the case where no aggregation is performed. The results clearly show that the aggregation technique conserves more energy (Figure ▭ shows the ratio of the gain). The result is expected because when the CH aggregates the data of its CMs, the CH only transmits one single packet to the sink, unlike the case when no aggregation is used, where the CH transmits each data value received from the CMs.

## 6. Conclusion and future work

In this chapter, we have proposed a novel clustering scheme, namely, $CC\_SCR$ protocol, which uses a compression technique for event-driven applications in WSNs. The proposed clustering scheme is based on selecting the node that reduces the packet size among all the active nodes in the system. The sink selects the node which minimizes the total amount of data as a CH, therefore increasing the efficiency of the compression technique by sending only the difference, rather than the complete data value, to the CH. To analyze the performance of the proposed scheme compared to both single hop (i.e., direct transmission to the sink) and classical schemes, an approximate mathematical model for energy consumption was developed. In addition, we implemented the $CC\_SCR$ in TinyOS, and for different system parameters, simulation results conclude that, considering the spatial correlation in the communication of WSNs, achieves significant energy conservation compared to a classical clustering scheme. The ratio benefit may be up to 11 times that of the classical scheme. As such, the proposed scheme greatly extends network lifetime. In future work, we aim to investigate a generalization of the clustering scheme in order to consider a higher number of events that can occur simultaneously in within the network, as it is the case in some environment monitoring applications. These results can be verified and deployed in our testbed. We also aim to include a mobility aspect to a certain number of nodes in the network, and consider this propriety in the clustering process in order to achieve considerable energy savings.