Effect of Decentralized Clustering Algorithm and Hamming Coding on WSN Lifetime and Throughput



Introduction
Wireless Sensor Networks (WSNs) have become an active field of research because of their wide range of applications, such as environmental monitoring, electromagnetic pollution monitoring, medical applications and industrial applications (Teo et al., 2007; Margi et al., 2009; Castelluccia et al., 2005; AbouElSeoud et al., 2010; Tavares et al., 2008). A WSN consists of multi-functional sensor nodes with limited power capacity, so prolonging the network lifetime is essential and is one of the main concerns (Castelluccia et al., 2005; Schmidt et al., 2009; Karlsson et al., 2005). For this reason, different routing protocols have been proposed to increase network lifetime. The clustering routing protocol is one of the most commonly used routing protocols because it is energy efficient (Heinzelman et al., 2000, 2002). In any clustering protocol, the network is divided into clusters in which some nodes are responsible for the others. These nodes are called cluster heads (CHs) or network masters (NMs). There are different algorithms and methods for choosing the CHs. For example, LEACH (Heinzelman et al., 2000) uses randomized rotation to choose the CH nodes. With this randomized rotation, some nodes can act as CHs while others cannot. LEACH was therefore improved into LEACH-C (Heinzelman et al., 2002), which uses a centralized algorithm to choose the CHs and allows only the nodes in the center of each cluster to act as CHs. Botros et al. (2009) also considered two different algorithms for choosing the NMs. There, the network is treated as one cluster, so the CH node that collects data from the other nodes is called the NM. In the first algorithm, a sensor can become NM more than once, each time for a fixed number of cycles. This algorithm was shown to provide a longer lifetime than the LEACH and LEACH-C algorithms (Heinzelman et al., 2000, 2002). However, it leaves some residual energy after network failure, and this energy can no longer be used. The second algorithm therefore improves on the first by allowing each sensor to become NM only once, with a different number of cycles, and otherwise to act as an active node or ordinary node (that senses the

Decentralized algorithm
In many WSN applications that cover large areas, some nodes may be out of the sink's range: reaching the sink would require more energy than their initial energy, i.e., a more powerful battery would be needed. Technology or cost constraints may prevent increasing the power of the node battery. In such cases, some nodes cannot reach the sink, and the sink has no information about them. Consequently, these nodes are considered dead and out of range throughout the network lifetime, and the area in question is not fully covered. Therefore, a new algorithm, called the decentralized algorithm, is developed. With this algorithm, the nodes that are out of the sink's range are brought into range and operate as active nodes. In the set-up phase, the nodes that are in range send their information to the sink, which divides the network into clusters and computes the number of cycles of the CHs in the first rotation. After this first rotation, the role of the sink is over, and the responsibility for choosing the next CHs is transferred to the current CHs. That is, the CHs in any rotation choose the CHs of the next rotation and compute their number of cycles. To take on these new responsibilities, a CH must know some information about the nodes in its cluster (such as the ID and the remaining energy of each node). It therefore sends a broadcast message to all nodes in its cluster. Since the distances between nodes and CHs within a cluster are shorter than the distances between nodes and the sink, the nodes that are out of the sink's range will be within the CH's range. All nodes in each cluster send their information, such as IDs and energy levels, to the CH. In addition, at the end of the first rotation, the sink sends the IDs of all nodes that can work as CHs. With this algorithm, the nodes that were out of the sink's range become active nodes, sense the surroundings and communicate with the CHs, so the network is fully covered. On the other hand, the CHs consume some overhead energy because of their additional responsibilities. This overhead energy is explained next in terms of the network parameters and variables shown in Table 1. It consists of the following components:

1. Broadcast energy (E_bc): the energy dissipated by the CH to inform all the nodes in its cluster about the next CH and to activate the nodes that are out of the sink's range. It is calculated as follows, assuming that the network is divided into K clusters:

where l is the number of bits transmitted by the CH to declare the next CH, taken to be 8 bits, and d_CH_N is the distance from the CH node to a sensor node.

2. Processing energy for ID comparison (E_ID): the processing energy dissipated by the CH to compare the node IDs it receives with the IDs stored in its database in order to choose the next CH. It is calculated according to the following equation:

$$E_{ID} = N_{op}\,E_{oper} \qquad (2)$$

where N_op is the number of binary operations and E_oper is the energy per binary operation, equal to 10^-14 J for newer fabrication technology (Ali et al., 2011).
3. Processing energy for computations (E_Cy): the processing energy dissipated by the CH to calculate the number of cycles that will be allocated to the next CH. It is calculated in the same manner as the processing energy for ID comparison (E_ID).
4. Announcement energy (E_an): the energy dissipated by the CH to inform the next CH of its number of cycles. It is calculated as follows:

$$E_{an} = l\,\varepsilon_{fs}\,d_{CH\_N}^{2} \qquad (3)$$
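To see why the processing terms can later be neglected, the four overhead components can be compared numerically. This is a minimal sketch under stated assumptions: both radio terms (E_bc and E_an) are taken to have the free-space form l·ε_fs·d_CH_N² with electronics energy ignored, ε_fs = 10 pJ/bit/m² is a typical LEACH value rather than one given here, and the operation counts and the 30 m distance are hypothetical.

```python
# Hypothetical parameter values for illustration only.
EPS_FS = 10e-12   # J/bit/m^2: free-space amplifier constant (assumed, typical LEACH value)
E_OPER = 1e-14    # J per binary operation (value quoted in the text)
L_BITS = 8        # bits sent by the CH to declare the next CH (value quoted in the text)


def radio_energy(bits, d):
    """Assumed free-space form bits * eps_fs * d^2 (electronics energy ignored)."""
    return bits * EPS_FS * d ** 2


def processing_energy(n_op):
    """N_op * E_oper, the form used for E_ID and E_Cy."""
    return n_op * E_OPER


def ch_overhead(d_ch_n, n_op_id, n_op_cy):
    """Per-rotation overhead of a CH under the decentralized algorithm (sketch)."""
    return {
        "E_bc": radio_energy(L_BITS, d_ch_n),   # broadcast to the cluster (assumed form)
        "E_ID": processing_energy(n_op_id),     # ID comparison
        "E_Cy": processing_energy(n_op_cy),     # cycle-count computation
        "E_an": radio_energy(L_BITS, d_ch_n),   # announcement to the next CH (assumed form)
    }


overhead = ch_overhead(d_ch_n=30.0, n_op_id=200, n_op_cy=50)
for name, value in overhead.items():
    print(f"{name}: {value:.3e} J")
```

With these illustrative numbers, each radio term is roughly 7e-8 J, while the processing terms are in the 1e-12 J range, several orders of magnitude smaller.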

Optimum number of clusters for the decentralized algorithm
The optimum number of clusters is obtained by minimizing the total energy consumed per cycle. This is because the total energy consumed by a sensor node equals the energy consumed per cycle multiplied by the total number of cycles, which is taken as the network lifetime. Note that the total energy consumed by a sensor node is almost equal to its initial energy, because the remaining energy is very small (close to zero) with the algorithm of Botros et al. (2009). Therefore, the lifetime is maximized by minimizing the total energy consumed per cycle. This energy depends on the energy consumed by the node when it acts as a CH and when it acts as an active node. Since the nodes cover the entire area under study, the energy consumed by the node when it acts as an active node and as a CH is as follows:
where E_Rx is the energy dissipated by the active node to receive an announcement from the CH. The network is assumed to consist of N nodes divided into K clusters, with approximately N/K sensors per cluster; therefore, the energy dissipated inside a single cluster during one complete cycle is as follows:
Consequently, the total energy consumed during a single cycle (i.e., during the transmission of a single frame) is as follows:
By substituting equations (4), (5) and (6) into (7), the energy consumed in the entire network during a single cycle is as follows:
The optimum number of clusters, K_opt, is obtained by setting the derivative of E_Cycle with respect to K to zero, which gives:
Ignoring the processing energies E_ID and E_Cy (because they are of the order of 10^-14 J (Ali et al., 2011)), the optimum number of clusters simplifies to:
The above equation shows that the optimum number of clusters depends on the network parameters and the network area. It is therefore more reasonable to investigate the effect of the decentralized algorithm on networks that cover large areas, to see its effect on lifetime and network coverage.
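The derivative step can be illustrated with a generic model. This minimal sketch assumes, for illustration only, that the per-cycle energy can be written as E(K) = a·K + b/K + c, where a collects terms that grow with the number of clusters and b terms that shrink with it; a, b and c are hypothetical placeholders, not the chapter's expressions. Setting dE/dK = a − b/K² to zero gives K* = √(b/a), and the best integer K is one of the two integers around K*. When a and b are not both positive there is no interior minimum, and the sketch falls back to a direct scan.

```python
import math


def cycle_energy(k, a, b, c=0.0):
    """Generic per-cycle energy model E(K) = a*K + b/K + c (assumed form)."""
    return a * k + b / k + c


def optimal_clusters(a, b, k_max=100):
    """Integer K in [1, k_max] minimising a*K + b/K."""
    if a > 0 and b > 0:
        # dE/dK = a - b/K^2 = 0  ->  K* = sqrt(b/a); check the two nearest integers.
        k_star = math.sqrt(b / a)
        lo = min(k_max, max(1, math.floor(k_star)))
        hi = min(k_max, max(1, math.ceil(k_star)))
        return min((lo, hi), key=lambda k: cycle_energy(k, a, b))
    # Otherwise a*K + b/K has no interior minimum for K > 0: scan the range directly.
    return min(range(1, k_max + 1), key=lambda k: cycle_energy(k, a, b))


print(optimal_clusters(a=2.0, b=50.0))   # prints 5 (stationary point K* = 5)
print(optimal_clusters(a=2.0, b=-3.0))   # prints 1 (E(K) increases with K)
```

When E(K) only grows with K, the optimum collapses to a single cluster, which is the form of the conclusion reached for the no-aggregation case.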

Effect of decentralized algorithm on networks covering large areas
Lifetime is expected to decrease as the network area increases. This is because of the large distances between nodes and the sink, which prevent many nodes from reaching it (doing so would require more energy than their initial energy). Consequently, these nodes are considered dead with respect to the sink. It is therefore more reasonable to apply the decentralized algorithm to applications that cover large areas. A MATLAB simulation model is built to study the effect of the decentralized algorithm on networks covering large areas. The lifetime for different network areas is shown in Table 2.
The number of sensor nodes and the sink location are assumed to vary with the network area. For example, for a 100m×100m network the sink is located at (0, -125) and there are 100 sensor nodes; for a 200m×200m network the sink is at (0, -250) and there are 400 sensor nodes; and so on. The table shows that the clustering technique improves the lifetime for the different network areas, especially for larger areas. This is because, as the network area increases, the number of sensors that cannot reach the sink increases, which makes the decentralized algorithm more effective. On the other hand, a very large increase in the network area yields only a slight increase in lifetime (for example, for a 500m×500m area the lifetime increases by approximately 17.5%), because some clusters contain many nodes that cannot act as CHs, and the nodes that do act as CHs operate for only a small number of cycles owing to the large communication distance from these nodes to the sink. The simulation is also run for a small network area (100m×100m), where the lifetime increases by only 5%. This means that clustering with the decentralized algorithm is not efficient for small network areas, because all nodes can reach the sink.
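The scaling rule in the text (100 nodes and a sink at (0, -125) for 100m×100m; 400 nodes and a sink at (0, -250) for 200m×200m) can be sketched to show why more nodes fall out of the sink's range as the area grows. Several assumptions here are not from the text: the rule is extrapolated as N = side²/100 and sink = (0, -1.25·side), nodes sit on a regular grid centred at the origin instead of being randomly deployed, and the hypothetical parameter d_max stands in for the farthest distance a node can reach with its initial energy.

```python
import math


def network_config(side):
    """Node count and sink position for a side x side (m) area.

    Extrapolated from the two examples in the text (assumption):
    100 m -> 100 nodes, sink (0, -125); 200 m -> 400 nodes, sink (0, -250).
    """
    n_nodes = round((side / 100.0) ** 2 * 100)
    sink = (0.0, -1.25 * side)
    return n_nodes, sink


def grid_nodes(side, n_nodes):
    """Place n_nodes on a regular k x k grid centred at the origin (assumption)."""
    k = math.isqrt(n_nodes)
    step = side / k
    return [(-side / 2 + step * (i + 0.5), -side / 2 + step * (j + 0.5))
            for i in range(k) for j in range(k)]


def fraction_out_of_range(side, d_max):
    """Fraction of nodes farther than d_max (hypothetical reach limit) from the sink."""
    n_nodes, (sx, sy) = network_config(side)
    nodes = grid_nodes(side, n_nodes)
    out = sum(1 for x, y in nodes if math.hypot(x - sx, y - sy) > d_max)
    return out / len(nodes)


for side in (100, 200, 300):
    print(side, network_config(side), fraction_out_of_range(side, d_max=150.0))
```

With a fixed d_max, only part of the 100 m network is out of range, while every node of the 200 m and 300 m networks is; without clustering those nodes would be dead from the sink's point of view.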

Effect of decentralized algorithm in case of no data aggregation
In some applications, such as environmental monitoring and industrial applications, the data of every sensor is important. This means that data cannot be aggregated: the CH collects data from the nodes and sends it as is to the sink. The decentralized clustering algorithm is therefore investigated for the case of no data aggregation, in which the energy dissipated by the CH and the total energy dissipated inside the network during a single cycle are as follows:
The energy of a node acting as an active node is kept the same as in the case of data aggregation (because in both cases it senses the surroundings and sends the data to the CH node), so the energy dissipated in one cluster is as follows:
The total energy dissipated inside the network during one complete cycle is then:
Setting the derivative of E_Cycle with respect to K to zero to obtain the optimum number of clusters gives K^2 as follows:
This equation has no real solution: the denominator is always negative (because the processing energies are very small compared to the energy consumed in …), so K^2 would be negative. Therefore, the clustering technique is not efficient, and a single cluster is preferred in the case of no data aggregation with the decentralized algorithm.

Coding in sensor networks
In recent years, a significant amount of research has focused on lifetime prolongation, which is the main concern in many WSN applications. However, in important and critical applications such as industrial and medical applications, throughput may be more important (Margi et al., 2009). Therefore, in this part it is assumed that the CH collects data from all sensors and sends it to the sink, and only one cluster is considered, because the previous section showed that clustering is not efficient in the case of no data aggregation. The CH is therefore called the network master (NM). In noisy environments, however, data received at the sink may be corrupted by noise, leading to wrong decisions. To guarantee data integrity at the sink, Error Correcting Codes (ECC) are used. In this work, the Hamming code is considered with different rates. The Hamming code is also compared to the Cyclic Redundancy Check (CRC), the most commonly used error-detecting technique (Nguyen, 2005). With CRC, correction is attempted by retransmitting the data once. In addition, a metric that captures the trade-off between throughput and lifetime is introduced.
This metric is called Information per Joule (IPJ). It is defined as the overall throughput during the network lifetime (Tp) divided by the total energy consumption (E_all):

$$IPJ = \frac{T_p}{E_{all}}$$

The throughput represents the amount of information that correctly reaches the sink during the lifetime. It is calculated according to the following equation:

where r is the code rate, C_j is the number of cycles allocated to NM j, and BER is the bit error rate, defined as the number of erroneous bits in the received frames divided by the total frame length.
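The IPJ ratio itself is simple to compute once throughput and energy are known. The throughput model in this sketch is an assumption, not the chapter's equation: each of the C_j cycles of NM j is taken to deliver r·L information bits, scaled by (1 − BER), where the frame length L, the cycle counts and the total energy E_all = 50 J are hypothetical values.

```python
def throughput_bits(cycles_per_nm, rate, frame_bits, ber):
    """Hypothetical throughput model: sum over NMs of C_j * r * L * (1 - BER)."""
    return sum(c_j * rate * frame_bits * (1.0 - ber) for c_j in cycles_per_nm)


def ipj(throughput, total_energy):
    """Information per Joule: IPJ = Tp / E_all."""
    return throughput / total_energy


# Illustrative values only: three NMs, Hamming (63, 57) rate, 2048-bit frames.
tp = throughput_bits(cycles_per_nm=[30, 28, 31], rate=57 / 63, frame_bits=2048, ber=1e-4)
print(f"Tp = {tp:.0f} bits, IPJ = {ipj(tp, total_energy=50.0):.1f} bits/J")
```

Because IPJ divides delivered information by energy, a code can raise IPJ either by delivering more correct bits or by extending the lifetime at a given energy budget.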

Processing energy for coding
In this work, the coding and decoding processes are assumed to be part of the sensor hardware architecture. The processing energy is taken as the number of binary operations multiplied by the energy per binary operation (Karlsson et al., 2005). A binary operation is defined as the exclusive-OR of two bits, and the energy per binary operation is the energy consumed by a two-input exclusive-OR (XOR) gate. The XOR gate is assumed to be designed in static CMOS, which is commonly used in sensor networks (Teo et al., 2007; Enz et al., 2004). The energy per binary operation (E_oper) depends on the fabrication technology (Hempstead et al., 2006; Ragini et al., 2009). The values 10^-10, 10^-12 and 10^-14 J are used, from older to newer technology.

Hamming code processing energy
The number of binary operations in the encoder circuit equals the number of two-input XOR gates in the parity tree. The decoder contains two circuits, one for error detection and the other for error correction (Fu & Ampadu, 2010). The processing energies for encoding (E_enc) and decoding (E_dec) are as follows:
where k is the information block length, n is the codeword length, N_c is the number of codewords in the transmitted frame and E_oh is the overhead energy due to the execution of instructions by the sensor's microprocessor.
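To make the parity-tree count concrete, the sketch below implements a standard Hamming encoder (parity bits at positions 1, 2, 4, …) and a syndrome decoder, counting the two-input XORs the encoder uses. It is an illustrative software model, not the chapter's hardware design: the counts are for this straightforward parity tree and need not match the chapter's expressions for E_enc and E_dec, and the decoder uses an integer shortcut for the syndrome rather than counting gates. Multiplying the count by each E_oper value from the text gives an encoding energy per codeword.

```python
def hamming_encode(data, m):
    """Encode k = 2**m - 1 - m bits into an n = 2**m - 1 bit Hamming codeword.

    Parity bits sit at positions 1, 2, 4, ... (1-indexed). Returns the codeword
    and the number of two-input XORs used to build the parity tree.
    """
    n = 2 ** m - 1
    if len(data) != n - m:
        raise ValueError(f"expected {n - m} data bits")
    code = [0] * (n + 1)                     # index 0 unused
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                  # not a power of two -> data position
            code[pos] = next(bits)
    xors = 0
    for i in range(m):
        p = 1 << i
        covered = [code[pos] for pos in range(1, n + 1) if pos & p and pos != p]
        parity = covered[0]
        for bit in covered[1:]:
            parity ^= bit                    # one two-input XOR gate
            xors += 1
        code[p] = parity
    return code[1:], xors


def hamming_decode(word, m):
    """Correct at most one bit error; return (data bits, syndrome)."""
    n = 2 ** m - 1
    code = [0] + list(word)
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos                  # XOR of the positions holding a 1
    if syndrome:
        code[syndrome] ^= 1                  # flip the erroneous bit
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, syndrome


for m in (3, 6):                             # Hamming (7, 4) and (63, 57)
    k = 2 ** m - 1 - m
    _, xors = hamming_encode([1] * k, m)
    energies = ", ".join(f"{xors * e:.1e} J" for e in (1e-10, 1e-12, 1e-14))
    print(f"({2 ** m - 1}, {k}): {xors} XORs per codeword -> {energies}")
```

In this model (7, 4) needs 6 XORs for 4 data bits, while (63, 57) needs 180 for 57 data bits, so the per-data-bit encoding cost is higher for the longer code even though its parity overhead is much lower.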

CRC processing energy
A 12-bit CRC is used, since it is more suitable for the assumed frame length (for simplicity, 2048 bits instead of 4000 bits) (Nguyen, 2005). The CRC is implemented using XOR operations. The processing energy for CRC encoding and decoding is the same and follows these steps:
1. Zero-pad the polynomial vector to the same length as the data.
2. XOR the output vector with the data vector.
3. Ignore the most significant zeros in the output vector, because they represent the quotient.
4. XOR the remainder with the polynomial vector.
These steps are repeated until a remainder whose length equals the length of the polynomial used (12 bits) is reached. This final remainder represents the CRC check bits that are included in the transmitted frame. Consequently, the processing energy for CRC encoding or decoding according to these steps is as follows:
where h is the generator polynomial length.
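The division steps listed above amount to binary polynomial long division. The sketch below follows them, skipping leading zeros and counting one XOR per bit each time the generator is aligned under a leading 1. The 12 check bits in the example assume the common CRC-12 generator x^12 + x^11 + x^3 + x^2 + x + 1 (13 bits long); the text does not name the polynomial, so this choice and the test data pattern are illustrative only.

```python
def crc_remainder(data_bits, poly_bits):
    """Long division of data * x^(h-1) by the generator; returns (check bits, XOR count)."""
    h = len(poly_bits)
    buf = list(data_bits) + [0] * (h - 1)    # append h-1 zeros for the check bits
    xors = 0
    for i in range(len(data_bits)):
        if buf[i]:                           # leading zeros (quotient bits) are skipped
            for j in range(h):
                buf[i + j] ^= poly_bits[j]   # XOR the generator under the current window
                xors += 1
    return buf[-(h - 1):], xors


def bits(s):
    return [int(c) for c in s]


CRC12 = bits("1100000001111")                # x^12 + x^11 + x^3 + x^2 + x + 1 (assumed)
frame = [(i * 7 + 3) % 5 % 2 for i in range(2048)]  # arbitrary 2048-bit test pattern
check, xors = crc_remainder(frame, CRC12)
print(len(check), "check bits,", xors, "XORs")

# Receiver side in this sketch: running the same routine on data + check bits
# leaves an all-zero remainder when no error occurred.
syndrome, _ = crc_remainder(frame + check, CRC12)
print("error detected:", any(syndrome))
```

The XOR count grows with both the frame length and h, which is consistent with the text's observation that CRC processing is expensive at high E_oper.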

www.intechopen.com

Network assumptions
At any time instant, the network consists of sensors that sense the surroundings and an NM that collects the sensed data and forwards it to the sink. This means there are two noisy communication channels: one from the sensors to the NM and the other from the NM to the sink. The following assumptions are made about applying error correcting or detecting techniques in the sensor network. For the Hamming code, sensors encode the data and send it to the NM, which decodes it and then re-encodes it before sending it to the sink. For the CRC, sensors compute the CRC check bits and send the coded data to the NM, which decodes it; if an error is detected, the NM requests retransmission. The NM decodes the retransmitted data and, if no errors are detected, sends it to the sink. If a frame error is still detected, the NM discards the frame, because retransmission is assumed to occur only once.
According to these assumptions, the energy consumed by a sensor node for the Hamming code and for CRC is as follows. For the Hamming code:
where p is the path loss factor (for example, p = 2 for the free space model), S_1 is the number of corrected frames received by the NM, d_{j,sink} is the distance from NM j to the sink, d_{tx_i,NM_j} is the distance between a sensor and the NM, and C_j is the number of cycles allocated to NM j, calculated according to the following relation:
where E_initial is the initial energy of a sensor node.
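The CRC link rule above (one retransmission, then discard) can be summarised by a few expectations. This sketch assumes independent bit errors with probability p on the sensor-to-NM link and a CRC that detects every erroneous frame; both are simplifications not stated in the text, and the frame length of 2048 data bits plus 12 check bits is used only as an example.

```python
def crc_single_retx(frame_bits, p):
    """Return (P(frame error-free), expected transmissions, P(frame delivered))."""
    p_ok = (1.0 - p) ** frame_bits           # frame passes the CRC check
    expected_tx = 1.0 + (1.0 - p_ok)         # second attempt only after a detected error
    delivered = 1.0 - (1.0 - p_ok) ** 2      # lost only if both attempts fail
    return p_ok, expected_tx, delivered


for ber in (1e-5, 1e-4, 1e-3):
    p_ok, n_tx, p_del = crc_single_retx(2048 + 12, ber)
    print(f"BER={ber:g}: P(ok)={p_ok:.3f}, E[tx]={n_tx:.3f}, delivered={p_del:.3f}")
```

In this simple model the retransmission term grows with the BER, adding transmit and receive energy on top of the CRC processing energy.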

Simulation results and analysis
Simulations are run in MATLAB to study the effect of error correction/detection techniques on the lifetime and the IPJ over an additive white Gaussian noise (AWGN) channel. Assuming an area of 100m×100m for simplicity, Table 3 shows the lifetime in cycles for CRC and the Hamming code at different values of E_oper and SNR. The values show that the Hamming code has very little effect on network lifetime: it yields almost the same lifetime as a system without coding (the uncoded system), which is 2950 cycles (the lifetime is reduced in the case of no data aggregation). In contrast, CRC decreases the lifetime by 37.3% compared to the uncoded system at E_oper = 10^-10 J because of its high processing energy.
The other important metric is the IPJ. Fig. 1 shows the IPJ of the Hamming codes used and of CRC at different values of energy per binary operation. The IPJ of Hamming (63, 57) outperforms that of both Hamming (7, 4) and CRC. Moreover, the IPJ of Hamming (63, 57) at the lowest SNR and highest E_oper is higher than the IPJ of CRC and Hamming (7, 4) at the highest SNR and lowest E_oper. Therefore, it is better to use the longer Hamming code.
To improve network performance, another method is investigated in which the NM acts as a repeater, or relay, that collects data from the sensors and forwards it to the sink without decoding or encoding, reducing the processing energy at the NM. MATLAB simulations indicate that the lifetime for both Hamming lengths increases and becomes almost equal to the lifetime of the uncoded system. The IPJ is also slightly improved, as shown in Fig. 2.
The figure shows that using the NM as a repeater is more suitable for high-rate Hamming codes, especially at low SNR.
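A rough way to see the reliability side of the repeater option: when the NM only relays, the sink decodes a codeword that has crossed both noisy hops. Modelling each hop as a binary symmetric channel with bit error probability p (an assumption), the end-to-end bit error probability is p1 + p2 − 2·p1·p2, and a single-error-correcting Hamming codeword of length n decodes correctly with probability (1 − p)^n + n·p·(1 − p)^(n−1). The sketch compares this with decoding at the NM (independent hops, miscorrections ignored) at a hypothetical per-hop error rate; it does not model the processing energy that the relay saves, which is what the simulations credit for the IPJ gain.

```python
def cascade_ber(p1, p2):
    """A bit is wrong end-to-end iff it is flipped on exactly one of the two hops."""
    return p1 + p2 - 2 * p1 * p2


def codeword_ok(n, p):
    """P(at most one error in n bits), i.e. a Hamming codeword decodes correctly."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)


p_hop = 0.01  # hypothetical per-hop bit error probability
for n in (7, 63):
    relay = codeword_ok(n, cascade_ber(p_hop, p_hop))   # decoded once, at the sink
    decode_fwd = codeword_ok(n, p_hop) ** 2             # decoded at the NM and at the sink
    print(f"n={n}: relay {relay:.4f}, decode-and-forward {decode_fwd:.4f}")
```

The relay loses some codeword reliability, so its net benefit depends on how much NM processing energy it saves relative to that loss.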

On the other hand, a Rayleigh fading channel is a more realistic model of wireless communication channels (Karvonen & Pomalaza-Ráez, 2004). Slow fading is considered in this work because of the small size of the area under study (100m×100m). The same two Hamming lengths are examined over an AWGN channel with Rayleigh fading (the Rayleigh fading channel). The fading channel adds a large number of errors to the data, which decreases the probability that a long codeword, such as one of length 63, contains no more than a single, correctable error. However, the overall IPJ of Hamming (63, 57) is higher than that of Hamming (7, 4), as shown in Fig. 3, because of its higher code rate. The figure shows the IPJ of the low-rate and high-rate Hamming codes over the AWGN channel and over the Rayleigh fading channel.
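The codeword-length effect described above can be illustrated with textbook expressions. This sketch assumes BPSK with coherent detection (the text does not specify the modulation), for which the bit error probability is ½·erfc(√γ) over AWGN and ½·(1 − √(γ/(1+γ))) averaged over slow Rayleigh fading, with γ the per-bit SNR. It then computes the probability that a Hamming codeword has at most one error and weights it by the code rate k/n as a crude proxy for useful information per transmitted bit; this proxy ignores energy and is not the chapter's IPJ.

```python
import math


def ber_bpsk_awgn(snr_db):
    g = 10 ** (snr_db / 10)
    return 0.5 * math.erfc(math.sqrt(g))


def ber_bpsk_rayleigh(snr_db):
    g = 10 ** (snr_db / 10)
    return 0.5 * (1 - math.sqrt(g / (1 + g)))


def codeword_ok(n, p):
    """P(at most one bit error in an n-bit codeword)."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)


for snr_db in (0, 5, 10):
    for name, ber in (("AWGN", ber_bpsk_awgn(snr_db)), ("Rayleigh", ber_bpsk_rayleigh(snr_db))):
        row = ", ".join(
            f"({n},{k}) ok={codeword_ok(n, ber):.3f} r*ok={k / n * codeword_ok(n, ber):.3f}"
            for n, k in ((7, 4), (63, 57))
        )
        print(f"{snr_db:2d} dB {name:8s} BER={ber:.2e}: {row}")
```

At moderate error rates the higher rate of (63, 57) outweighs its lower codeword success probability, while at very high error rates, as under deep fading, the shorter code keeps far more codewords correctable.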
The IPJ is obtained at E_oper = 10^-10 J, because at this value the coding processing has a noticeable effect on the lifetime and the IPJ. The figure shows some degradation of the IPJ over the Rayleigh fading channel for both Hamming lengths, resulting from the large number of errors introduced by fading, which has an adverse effect on network performance.

Fixed data length scheme
All the previous results assumed that all sensors transmit a frame of fixed length, carrying an amount of data that varies with the code rate. Consequently, all sensor nodes consume the same transmission energy and have approximately the same lifetime. On the other hand, in some applications, such as environmental monitoring, each sensor collects a fixed amount of data. Such a sensor then sends a frame whose length varies with the coding technique used, depending on the amount of parity the technique adds. Therefore, in this section, all sensors are assumed to have a fixed amount of data of length 2048 bits and to transmit a frame whose length varies with the amount of added parity. Consequently, the energy consumed by a sensor node and by an NM changes because of variations in the transmitting and receiving energy, as follows:
where K_1 is the data length, equal to 2048 bits, and K_tx is the frame length, calculated according to the following equation:
Simulations are run to study the effect of this scheme on overall network performance. The IPJ for Hamming (7, 4) and (63, 57) over AWGN at E_oper = 10^-10 J is shown in Fig. 4.
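The frame-length effect can be made concrete. Since the chapter's equation for K_tx is not reproduced here, this sketch assumes that the K_1 = 2048 data bits are split into ⌈K_1/k⌉ codewords of n bits, zero-padding the last one, and uses a free-space transmit energy l·ε_fs·d² with a hypothetical ε_fs value and a hypothetical 30 m distance.

```python
import math

K1 = 2048          # data bits per sensor (value from the text)
EPS_FS = 10e-12    # J/bit/m^2, assumed free-space amplifier constant


def frame_length(k1, n, k):
    """Assumed K_tx: ceil(k1 / k) codewords of n bits (last one zero-padded)."""
    return math.ceil(k1 / k) * n


def tx_energy(bits, d):
    """Assumed free-space transmit energy bits * eps_fs * d^2."""
    return bits * EPS_FS * d ** 2


for label, n, k in (("uncoded", 1, 1), ("Hamming (7, 4)", 7, 4), ("Hamming (63, 57)", 63, 57)):
    k_tx = frame_length(K1, n, k)
    print(f"{label:17s} K_tx = {k_tx:5d} bits, E_tx(d=30 m) = {tx_energy(k_tx, 30.0):.2e} J")
```

Under these assumptions the (7, 4) frame is about 58% longer than the (63, 57) frame (3584 vs. 2268 bits), so its transmit energy per cycle is correspondingly higher, consistent with the lifetime difference the text attributes to transmission energy.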
The figure shows that the IPJ of Hamming (63, 57) outperforms that of Hamming (7, 4). Investigating the reason for this result shows that a low-rate code such as Hamming (7, 4) adds a large amount of parity, which increases the transmission energy and shortens the lifetime. In contrast, a high-rate code such as Hamming (63, 57) adds a small amount of parity, which does not strongly affect the transmission energy and consequently has little effect on the lifetime. This difference in lifetime causes the IPJ of Hamming (63, 57) over the lifetime to exceed that of Hamming (7, 4). Therefore, the high-rate Hamming code is more suitable for sensor networks than the low-rate Hamming code, irrespective of the application and of the transmission scheme used by the sensor node (fixed frame or fixed data). Simulations show that this scheme does not change the result for the Rayleigh channel.

Conclusion
Different clustering algorithms and routing protocols, such as LEACH and LEACH-C, have been examined for prolonging the lifetime. This chapter focuses on increasing network lifetime by dividing the network into clusters and making each node inside a cluster act as a Cluster Head (CH) only once. The decentralized algorithm is developed and studied in this chapter, and the optimum number of clusters is found to depend on the algorithm used to choose the CHs. The effect of the decentralized algorithm on networks covering large areas is investigated, and clustering is found to be more efficient for large networks. The proposed clustering algorithm is also examined for applications that do not need data aggregation. The results show that clustering is inefficient in the case of no data aggregation, and only one cluster is preferred. Network throughput is also an important factor in the case of no data aggregation; therefore, error detecting and correcting codes are used to improve data integrity, and the whole network is considered as one cluster. The Hamming code with different rates is used to improve network throughput and is compared to CRC. The Hamming code of either length provides a longer lifetime than CRC because of its lower processing energy, and a higher IPJ because of its higher throughput. The Hamming code is also observed to have a negligible effect on lifetime compared to the uncoded system. These results are taken a step further by examining different lengths of Hamming codes: a Hamming code of length 63 is more suitable for sensor networks than one of length 7. This means that the high-rate Hamming code can provide a higher IPJ at low SNR than the low-rate Hamming code. The system is also investigated when the NM acts as a repeater or relay that collects data from the sensors and forwards it to the sink without decoding or encoding. This technique increases the IPJ for the high-rate code at low SNR, and it also increases the lifetime because of the reduced processing at the NM. The effect of the Rayleigh fading channel was also investigated. The results showed that the IPJ of the high-rate Hamming code is still higher than that of the low-rate Hamming code, even though the low-rate Hamming code improves the BER and brings the IPJ over the fading channel close to the IPJ over the AWGN channel. Finally, a fixed data length scheme is examined to generalize the results to different applications that do not require data aggregation. The results of this scheme show that the lifetime and the IPJ of the high-rate Hamming codes are higher than those of the low-rate Hamming codes. Therefore, the proposed hardware implementation of the high-rate Hamming code is one of the preferred solutions in sensor networks with different transmission schemes and applications.
Fig. 1. IPJ of Hamming code and CRC at different E_oper

Table 1. Network parameters and variables

Table 2. Lifetime of different network areas using the decentralized algorithm

Table 3. Lifetime at different E_oper

Table 4 shows the lifetime values (in cycles) at different energies per operation. The lifetime of Hamming (7, 4) is lower than that of Hamming (63, 57) by about 33% at E_oper = 10^-10 J. This is because of the difference in the energy consumed for transmission by the two Hamming lengths.

Table 4. Lifetime at different E_oper for the fixed data length scheme