Error Resilient H.264 Video Encoder with Lagrange Multiplier Optimization Based on Channel Situation

To achieve an optimum transmission over a noisy wireless channel, both the source coding and network should be jointly adapted. An acceptable video quality in wireless environment can be obtained by the adjustment of parameters in video codec and wireless network. For the former, people have proposed many error resilient video encoding algorithms to enhance the robust performance of the compressed video stream in wireless networks. These algorithms can be divided into three categories: 1) error detection and error concealment algorithms used at video decoder of wireless receiver; 2) error resilient video encoding algorithms located at video encoder of wireless transmitter; 3) robust error control between video encoder and decoder based on 1) and 2). Fig. 1 summarizes different techniques at different parts of a wireless video transmission system.


Introduction
Robust delivery of compressed video in wireless packet-switched networks is still a challenging problem. Video packets transmitted in wireless environments are often corrupted by random and burst channel errors caused by multi-path fading, shadowing, noise, and congestion in the physical wireless channel.
To achieve optimum transmission over a noisy wireless channel, the source coding and the network should be jointly adapted. Acceptable video quality in a wireless environment can be obtained by adjusting parameters in both the video codec and the wireless network. For the former, many error resilient video encoding algorithms have been proposed to make the compressed video stream more robust in wireless networks. These algorithms fall into three categories: 1) error detection and error concealment algorithms used at the video decoder of the wireless receiver; 2) error resilient video encoding algorithms located at the video encoder of the wireless transmitter; 3) robust error control between video encoder and decoder based on 1) and 2). Fig. 1 summarizes the techniques used at different parts of a wireless video transmission system.
Since error concealment algorithms are used only at the video decoder in the wireless receiver, they require no modification of the video encoder or the channel codec. Hence, they increase neither the coding complexity nor the transmission rate, and they can be easily deployed in existing wireless video transmission systems. However, error concealment relies on the spatial and temporal correlation in the video stream to estimate the corrupted region of a frame. When the correlation between the corrupted region and the correctly received frames is weak, concealment performs poorly and the repaired frames show visible distortion. In addition, although error concealment can reduce the intensity of temporal error propagation, it cannot reduce its length. The human visual system (HVS) is not very sensitive to short-term error propagation, even when it is obvious, whereas long-term error propagation is annoying even when it is slight. Therefore, a desirable error repair scheme should minimize both the intensity and the length of error propagation.
In a practical wireless video transmission system, one entire frame is normally encapsulated into one video packet in order to make full use of the limited wireless bandwidth. In this situation, the loss of a single video packet noticeably degrades the quality of successive frames at the decoder, since existing video standards use inter-frame prediction to achieve high compression efficiency. Hence, many error resilient methods have been developed in recent years to reduce the impact of errors and improve video quality in wireless video transmission [1][2][3][4]. However, most of these algorithms sacrifice coding efficiency by adding redundancy to the video stream to improve error resilience.

Error resilient video coding
As noted in [5], real-time wireless video applications are very sensitive to any increase in coding overhead, which may not only introduce additional delay that renders correctly received video packets useless, but also degrade the quality of service in wireless environments, especially in ad hoc networks [6]. Therefore, the compressed video stream should be made more resilient to errors at minimum cost in coding overhead.
To overcome the error propagation caused by video packet losses, long-term memory motion-compensated prediction [7] is a reasonable way to suppress error propagation in the temporal domain, at the cost of reduced coding efficiency. In [8], reference frame selection in long-term motion-compensated prediction was proposed for H.263 video using the rate-distortion optimization (RDO) criterion. Extending [8], and starting from the original RDO model for the error-free condition, an error robust RDO (ER-RDO) method was proposed in [9] for H.264 video in packet loss environments by redefining the Lagrange parameter and the error-prone RD model. However, the ER-RDO method requires very high computational complexity to determine the expected decoder distortion accurately. To reduce the computational burden, Zhang et al. [10] developed a simplified version of the ER-RDO method that uses a block-based distortion map to estimate the end-to-end distortion. Since the Lagrange parameters selected in these two methods are not precise enough for the corresponding rate-distortion optimization, their coding overhead is not desirable for real-time wireless video communication systems.
In the periodic frame method [11], a periodic frame is predicted only from the $l$-th previous frame, which is the previous periodic frame, where $l$ is the frame interval between neighboring periodic frames. When frames between two periodic frames are lost, the second periodic frame can still be decoded correctly, so error propagation is suppressed efficiently. However, the coding overhead of a periodic frame increases noticeably when the correlation between neighboring periodic frames is not high. To alleviate the heavy burden that periodic frames place on the wireless channel, Zheng et al. also proposed the periodic macroblock (PMB) method [11], which limits the increase in coding overhead by selecting only a certain number of important macroblocks (MBs) to be predicted from the $l$-th previous frame. PMB can effectively control the coding overhead at the expense of error reconstruction quality.
Another effective way to constrain error propagation is to insert intra-coded MBs. Compared with long-term reference frame prediction, intra coding requires more redundancy. To obtain a better trade-off between coding efficiency and error resilience, methods based on accurate block-based distortion estimation models [12][13] were developed for MPEG-4 and H.261/H.263. The end-to-end approach in [12] generalized RD-optimized mode selection for point-to-point video communication by taking into account both the packet loss and the receiver's concealment method. In [13], the encoder computes an optimal estimate of the total distortion at the decoder for a given rate, packet loss condition, and concealment method. The distortion estimate is then incorporated within an RD framework to select the coding mode for each macroblock optimally. Both methods achieved better error resilience. However, their computational complexity and implementation cost are too high.
In this chapter, we develop a new channel based rate-distortion (RD) model for an error resilient H.264 video codec, which aims to minimize the increase in coding overhead while maintaining good error resilience. In the new RD model, practical channel conditions such as the packet loss rate (PLR) and the packet loss burst length (PLBL), together with the error propagation and error concealment effects in different reference frames, are taken into account when analyzing the expected MB-based distortion at the encoder. Moreover, for each reference frame, the corresponding Lagrange parameter is adjusted according to the variation of the channel based RD model, which describes the relationship between coding rate and expected decoder distortion in a packet loss environment more accurately than existing methods. The proposed RD model also considers a proper intra-coded mode for error resilience. Therefore, a more appropriate reference frame and encoding mode can be selected for each MB with the proposed method.
In the rest of this chapter, a brief review of the error-robust rate-distortion optimization (ER-RDO) method is given in Section 2, where the derivation of the proposed error resilient rate-distortion (RD) optimization is also described. In Section 3, the error resilience of the proposed method and of some existing methods is evaluated using computer simulations on an H.264 video codec. Finally, concluding remarks are given in Section 4.

The proposed error resilience optimization method
As the latest video coding standard, H.264 achieves high coding performance by adopting many advanced techniques [14]. With the rate-distortion optimization (RDO) operation, H.264 achieves very good coding efficiency and a high PSNR simultaneously in the error-free condition. For encoding the $m$-th MB in the $n$-th frame, the RDO operation finds the most suitable coding mode and reference frame by minimizing the cost

$$J(n,m,r,o) = D_s(n,m,r,o) + \lambda\, R(n,m,r,o) \qquad (1)$$

where $D_s(n,m,r,o)$ and $R(n,m,r,o)$ are the source distortion and the coding rate when the MB is predicted from the $r$-th reference frame and encoded with mode $o$. In an error-free environment, the Lagrange parameter can be determined from the quantization parameter $Q$ as follows [15]:

$$\lambda = \begin{cases} 0.85 \times Q^{2} & (\text{H.263}) \\ 0.85 \times 2^{Q/3} & (\text{H.264}) \end{cases} \qquad (2)$$

However, the cost in (1) does not consider the distortion caused by error propagation and error concealment. Therefore, it cannot be used directly to find the best reference frame and encoding mode in an error-prone wireless packet-switched network when the channel condition is taken into consideration.
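The error-free selection described above can be illustrated with a minimal sketch. It assumes the Lagrangian cost has the usual form of distortion plus λ times rate; the candidate distortion and rate values, the mode names used as dictionary keys, and the function names are hypothetical and only serve to show how the minimum-cost pair of reference frame and mode is picked.

```python
from typing import Dict, Tuple

# Candidate key: (reference frame index r, mode name o).
# Candidate value: (source distortion D_s, rate R in bits).
# All numbers below are hypothetical and chosen only for illustration.
Candidates = Dict[Tuple[int, str], Tuple[float, float]]


def lagrangian_cost(distortion: float, rate: float, lam: float) -> float:
    """Return J = D + lambda * R for one candidate."""
    return distortion + lam * rate


def select_reference_and_mode(candidates: Candidates, lam: float) -> Tuple[int, str]:
    """Return the (r, o) pair with the smallest Lagrangian cost."""
    return min(candidates, key=lambda key: lagrangian_cost(*candidates[key], lam))


candidates: Candidates = {
    (1, "inter_16x16"): (120.0, 40.0),
    (2, "inter_8x8"): (90.0, 70.0),
    (0, "intra_4x4"): (60.0, 150.0),
}

# A larger lambda penalizes rate more strongly and shifts the choice
# toward cheaper modes; a smaller lambda favors low distortion.
best = select_reference_and_mode(candidates, lam=0.5)
```

The Lagrange parameter is passed in as a plain argument here so that the sketch does not depend on any particular mapping from the quantization parameter to λ.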

ER-RDO model
To take the effect of packet loss into account, an error robust RDO (ER-RDO) method was developed in [11] by redefining the Lagrange parameter and the error-prone RD model based on the practical wireless channel situation and the potential distortion of corrupted decoded MBs. In the ER-RDO model, the expected overall distortion of the $m$-th MB in the $n$-th frame is determined as in (3), where $D_{ec}$ is the error concealment distortion if this MB is lost, and $D_{ep}$ is the expected error propagation distortion when this MB is received correctly but the reference frames are erroneous. $p$ is the current packet loss rate (PLR) of the wireless channel, and $p_c$ is the probability that all reference frames are correct, computed as in (4), where $k$ is the number of reference frames in the encoder buffer.
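One form consistent with the definitions in this paragraph is sketched here. It is an assumption for illustration, not the expression from [11]: each distortion term is weighted by the probability of the event that causes it, and packet losses are assumed to be independent with rate $p$.

```latex
% Assumed form: independent packet losses with probability p.
% Received with clean references -> D_s; received with corrupted references -> extra D_ep;
% lost -> D_ec.
D(n,m,r,o) = (1-p)\,D_s(n,m,r,o) + (1-p)(1-p_c)\,D_{ep}(n,m,r,o) + p\,D_{ec}(n,m),
\qquad
p_c = (1-p)^{k}
```

Under this assumption, $p = 0.1$ and $k = 5$ give $p_c = 0.9^5 \approx 0.59$, so even a moderate loss rate leaves a substantial chance that at least one frame in the reference buffer is corrupted.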
If we assume high-resolution quantization, the source distortion $D_s$ depends on the rate $R$ as in (5) [9], where $\alpha$ and $\beta$ parameterize the functional relationship between rate and distortion [13]. If uniform quantization is used, the distortion is given by (6), where $\Delta$ is the quantization step size.
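As a worked sketch of how a Lagrange parameter follows from such a model, assume the logarithmic high-resolution form $R(D_s) = \alpha \ln(\beta / D_s)$ and the standard uniform-quantizer distortion $D_s = \Delta^2/12$. Both are common textbook choices and are assumptions here; they are not necessarily the exact forms of (5) and (6).

```latex
% Assumed model: R(D_s) = \alpha \ln(\beta / D_s), \quad D_s = \Delta^2 / 12.
% Minimizing J = D_s + \lambda R requires dJ/dR = 0:
\frac{dR}{dD_s} = -\frac{\alpha}{D_s}
\;\Longrightarrow\;
\lambda = -\frac{dD_s}{dR} = \frac{D_s}{\alpha} = \frac{\Delta^{2}}{12\,\alpha}
```

Under these assumptions $\lambda$ grows with the square of the quantization step size, which matches the quadratic and exponential dependence on $Q$ in the error-free case, since the H.264 quantization step size doubles for every increase of 6 in the quantization parameter.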
Referring to (5) and (6), the Lagrange parameter selected in the ER-RDO model, $\lambda_{ER-RDO}$, is computed as in (7). With $\lambda_{ER-RDO}$, (3) and (4), the best reference frame $r^*$ and encoding mode $o^*$ for the $m$-th MB in the $n$-th frame are determined as in (8), following [11].
From (4) and (7), we can see that the selected $\lambda_{ER-RDO}$ is identical for every reference frame once the number of reference frames and the PLR are known. That is, the relationship between the coding rate and the expected overall distortion is treated as the same for all reference frames. However, as the distance between the selected reference frame and the present encoding frame grows, the probability that this frame is reconstructed correctly at the receiver increases, while the coding efficiency degrades. Therefore, the term $(1 - p_c)D_{ep}$ in (8) is not accurate enough (a comprehensive interpretation is given in the next subsection). In the sense of error resilience, the relationship between the coding rate and the expected overall distortion at the decoder should differ for each reference frame, and should vary not only with the PLR and the range of reference frames, but also with the distance between the selected reference frame and the present encoding frame.

The proposed channel based RDO model
The estimated cost for the $n$-th frame predicted from reference frame $n-r$ is given by (9), where $R(n,n-r)$ is the coding overhead of the $n$-th frame predicted from reference frame $n-r$ and, referring to (3), $D_p(n,n-r)$ is the expected overall distortion of the $n$-th frame at the decoder in the proposed channel based RDO model with reference frame $n-r$. It is given by (10), where $D_s(n,n-r)$ is the source distortion when predicting from reference frame $n-r$ in the error-free situation, and $D_{lep}$ is the distortion caused by long-term error propagation when frames before the reference frames are lost. $D_{ep}^{r}$ is the potential distortion caused by frame loss within the range of reference frames when frame $n-r$ is the reference frame; it is computed as follows.
The computation of $D_{ep}^{r}$ in (11) includes two parts. The first is the error propagation distortion caused by reference frames $n-k, n-k+1, \ldots, n-r$. The term $D_r^{j+1}$ in (11) is the error concealment reconstruction distortion when frame $n-j$ is lost ($r \le j \le k$), and its occurrence probability is given by (12). When frames after the present reference frame $n-r$ are lost, the present encoding frame $n$ can still be decoded correctly; the probability of this event, $q_s(r)$, is computed as in (13). The second part is therefore the product of $q_s(r)$ and $D_s(n,n-r)$, as in (11).
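The two-part structure described in this paragraph could be written as in the following sketch. The symbol $q_j$ for the occurrence probability of the loss of frame $n-j$ is a hypothetical placeholder, since the original notation for that probability is not reproduced here; the sketch only mirrors the verbal description and should not be read as the exact form of (11).

```latex
% Assumed structure; q_j is a hypothetical symbol for the probability
% that frame n-j is lost (r <= j <= k).
D_{ep}^{r}(n,n-r) \;=\; \sum_{j=r}^{k} q_j\, D_{r}^{\,j+1} \;+\; q_s(r)\, D_s(n,n-r)
```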
With (9), (10) and (11), the final estimated cost for the $n$-th frame predicted from reference frame $n-r$ is given by (14). Finally, $J_p(n,n-r)$ is computed as in (15). Taking the derivative of $J_p(n,n-r)$ with respect to $\Delta$, as was done for (7), the optimized Lagrange parameter for the present encoding frame $n$ predicted from reference frame $n-r$ is obtained as in (16), where we assume that the reference frame buffer length $k$ is larger than the real-time PLBL obtained from feedback on the wireless channel situation.

Implementation of reference frame and mode selection algorithm
With the results obtained above, we apply the proposed channel based RDO model to select the best reference frame and encoding mode in an H.264 encoder as follows. An MB in a P frame has two categories of encoding modes: intra-coded and inter-coded. Intra-coded modes include direct coding, intra_4×4 and intra_16×16; inter-coded modes include inter_16×16, inter_16×8, inter_8×16 and inter_P8×8 (this mode is composed of the inter_8×8, inter_8×4, inter_4×8 and inter_4×4 sub-8×8 block modes). For each inter-coded mode $o$, the best reference frame $r^*$ for the $m$-th MB in the $n$-th frame is selected by finding the minimum inter-coded cost $J_p(n,m,o,r)$.
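A minimal sketch of this per-mode reference search, followed by the final comparison against the best intra-coded mode, is given here. The cost tables stand in for $J_p(n,m,o,r)$ and $J_i(n,m,o,0)$ and contain hypothetical numbers; the function names are also illustrative assumptions, and the reference index 0 is used to mark an intra-coded result.

```python
from typing import Dict, Tuple

InterCosts = Dict[str, Dict[int, float]]  # mode -> {reference index r: cost J_p}
IntraCosts = Dict[str, float]             # mode -> cost J_i (no reference frame)


def best_reference_per_mode(inter_costs: InterCosts) -> Dict[str, Tuple[int, float]]:
    """For each inter-coded mode, return the reference index with minimum cost."""
    return {
        mode: min(costs.items(), key=lambda item: item[1])
        for mode, costs in inter_costs.items()
    }


def select_mb_coding(inter_costs: InterCosts, intra_costs: IntraCosts) -> Tuple[str, int, float]:
    """Return (mode, reference index, cost) with the overall minimum cost.

    Intra-coded modes are reported with reference index 0.
    """
    candidates = [
        (cost, mode, ref)
        for mode, (ref, cost) in best_reference_per_mode(inter_costs).items()
    ]
    candidates += [(cost, mode, 0) for mode, cost in intra_costs.items()]
    cost, mode, ref = min(candidates, key=lambda c: c[0])
    return mode, ref, cost


# Hypothetical cost values for one macroblock.
inter_costs: InterCosts = {
    "inter_16x16": {1: 210.0, 2: 195.0, 3: 220.0},
    "inter_8x8": {1: 205.0, 2: 230.0},
}
intra_costs: IntraCosts = {"intra_4x4": 260.0, "intra_16x16": 240.0}

choice = select_mb_coding(inter_costs, intra_costs)
```

Splitting the search into a per-mode step and a final comparison keeps the structure close to the description: the reference frame is chosen within each inter-coded mode first, and only the per-mode winners compete with the intra-coded modes.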

Experimental results
In this section, we evaluate the performance of the proposed channel based RDO model in terms of video quality and coding efficiency in a wireless packet loss environment. In our experiments, we use the H.264 JM 8.2 codec as the test platform, with the video stream structure set to IPPP…. Three standard QCIF video sequences, namely Salesman, Susie and Foreman, are used in the simulations. The range of tested intracoded frames in these sequences is from the 10th to the 100th frame. The QP is set to 28, the frame rate in H.264 JM 8.2 is 30 fps, and the reference frame buffer holds the previous five frames. In order to make full use of the wireless channel bandwidth, each compressed video frame is transmitted in a single packet. A simple error concealment method is used to analyze the potential error propagation and error concealment effects at the video encoder: when an MB is assumed to be lost, it is replaced by the MB at the same position in the previous error-free frame. For comparison, we use the original H.264 JM 8.2 codec, the periodic frame method, the PMB method [11] and the ER-RDO method [8] as reference algorithms. For the PMB method, PMB (11%), PMB (22%) and PMB (33%) denote the performance when the proportion of periodic MBs in a video frame is 11%, 22% and 33%, respectively.
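The simple concealment rule used in the experiments, copying the co-located MB from the previous error-free frame, can be sketched as follows. Frames are represented as nested lists of luma samples purely for illustration; the function name and this frame representation are assumptions and do not reflect the JM 8.2 data structures.

```python
from typing import List

Frame = List[List[int]]  # rows of luma samples


def conceal_lost_mb(current: Frame, previous_good: Frame,
                    mb_row: int, mb_col: int, mb_size: int = 16) -> Frame:
    """Return a copy of `current` in which the lost macroblock at
    (mb_row, mb_col) is replaced by the co-located block of `previous_good`."""
    repaired = [row[:] for row in current]  # leave the input frame untouched
    top, left = mb_row * mb_size, mb_col * mb_size
    for y in range(top, min(top + mb_size, len(repaired))):
        for x in range(left, min(left + mb_size, len(repaired[y]))):
            repaired[y][x] = previous_good[y][x]
    return repaired


# Tiny 4x4 example with 2x2 "macroblocks" to keep the data readable.
previous_good = [[7] * 4 for _ in range(4)]
current = [[0] * 4 for _ in range(4)]
repaired = conceal_lost_mb(current, previous_good, mb_row=0, mb_col=1, mb_size=2)
```

In the small example only the top-right 2×2 block is taken from the previous frame; all other samples of the current frame are kept.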
We first look at the error resilience of the proposed method by considering the PSNR of the reconstructed video in a packet loss environment. Fig. 3 shows the error reconstruction results for the three test sequences using the different methods when PLR = 0.1 and PLBL < 5. Each point in Fig. 3 is the average PSNR obtained when any one of the reference frames of the present encoding frame is lost.
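For reference, the PSNR figure of merit used in this comparison can be computed as in the following sketch, which assumes 8-bit samples (peak value 255) and the same nested-list frame representation as an illustrative convention. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Frame = List[List[int]]


def psnr(reference: Frame, reconstructed: Frame, peak: float = 255.0) -> float:
    """PSNR in dB between two equally sized frames (infinite if identical)."""
    total, count = 0.0, 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for a, b in zip(ref_row, rec_row):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one sample")
    mse = total / count
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


def average_psnr(values: Sequence[float]) -> float:
    """Arithmetic mean of per-case PSNR values, e.g. one value per lost reference frame."""
    return sum(values) / len(values)
```

Averaging the per-case values, one for each possible lost reference frame, corresponds to the way each point in the comparison is described.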
To evaluate the coding efficiency of the different methods, we consider their impact on the overall coding rate and on the PSNR of the reconstructed video in an error-free environment. The simulation results for the three test sequences are listed in Tables 1, 2 and 3, respectively. All of the error resilient methods have little effect on the original video quality. For a fair comparison, the PSNR of the reconstructed video is kept approximately constant across methods, and we then compare the coding rate required by each method.
Table 1 shows the coding rate of the different methods for the Salesman sequence, in which there is high correlation between the reference frames and the encoding frame. The coding redundancy introduced by all methods is the smallest among the three test sequences. The coding rate increase of the ER-RDO method is not desirable, as it needs more bits than the periodic frame method, while the PMB method obtains a smaller rate increase at the different proportions of long-term predicted MBs. The coding overhead increase of the proposed method is small: it is only slightly larger than that of PMB (11%) and PMB (22%) and clearly smaller than that of PMB (33%).
For the Susie sequence, where the correlation between the reference frames and the encoding frame is moderate, the coding overhead is in general higher than for the Salesman sequence, as shown in Table 2. The coding rate of the periodic frame method increases by about 14%, which is a heavy burden for a wireless channel. The coding rate increase of ER-RDO is smaller than that of PMB (33%), but still larger than that of PMB (11%) and PMB (22%). The coding rate of the proposed method is just 0.2% higher than that of H.264 JM 8.2 but smaller than that of all other methods.
For the Foreman sequence, as there is low correlation between the reference frames and the encoding frame, the coding rate required by all methods is the largest among the three test sequences, as shown in Table 3. Again, the proposed method achieves the best coding efficiency. The coding rate increase of the proposed method is only 1.93%, while that of PMB (11%), PMB (22%), PMB (33%), ER-RDO and the periodic frame method is 3.01%, 4.23%, 6.63%, 6.82% and 18.23%, respectively.

Taking the error resilience and coding efficiency results together, the proposed method obtains not only better video reconstruction quality but also a smaller coding rate increase than the reference methods. Fig. 4 shows the coding efficiency of the proposed method for the Foreman sequence at different PLRs from 0.01% to 0.1%. In Fig. 4, the increase in coding rate with the proposed method is small compared with the H.264 JM 8.2 codec. In some instances of low PLR in Fig. 4, the proposed method even achieves a slightly smaller coding rate than the original H.264 JM 8.2 codec.
As a further analysis of the error resilience of the proposed method with respect to the PMB and ER-RDO methods, Tables 4, 5 and 6 give a more detailed comparison of the reconstruction PSNR (dB) for the Salesman, Susie and Foreman sequences when each of the reference frames in the encoder buffer is lost. The tables show that the PMB method, especially PMB (33%), achieves better results when the lost reference frame is far from the present encoding frame. In contrast, ER-RDO obtains better reconstruction quality when the lost reference frame is near the present encoding frame. The proposed method achieves a compromise between the two and obtains better average error reconstruction performance. In addition, it is always better than H.264 JM 8.2 when any reference frame in the encoder buffer is lost.

Conclusions
In this chapter, an error resilient method based on feedback of the wireless channel condition is proposed for robust H.264 video streaming in wireless packet loss environments. The proposed method adjusts the Lagrange parameter for each reference frame in the encoder buffer by adopting the proposed channel based RDO model. The modified Lagrange parameter better reflects the relationship between the expected distortion and the coding efficiency of video streaming, in the sense of error resilience, in packet loss environments. Comprehensive experimental results show that the proposed method combines the advantages of existing methods and achieves better error resilience with a minimal increase in coding overhead.

Figure 1. Error resilient methods used in packet-switched wireless networks

Figure 2. Inter-coded prediction reference frame range

Here $r^*$ is the best reference frame for prediction in coding mode $o$. Since $(1-p)^{k+1} D_{lep}$ is the same for any reference frame used to predict the $n$-th frame, and $D_{ec}$ is independent of the encoding mode and the reference frame [8], (17) can be simplified accordingly. The best intra-coded mode $o^{**}$ for this MB is determined in the same way from the intra-coded cost $J_i(n,m,o,0)$. From these results, the best encoding mode $\hat{o}$ and its best reference frame $\hat{r}$, in the sense of optimized error resilience, for the $m$-th MB in the $n$-th frame are found as

$$(\hat{o}, \hat{r}) = \arg\min\big(J_p(n,m,o^{*},r^{*}),\; J_i(n,m,o^{**},0)\big)$$

Figure 4. The coding rate (kb/s) of the proposed method with respect to the original H.264 JM 8.2 codec at different PLRs from 0.01% to 0.1%

Table 1. Coding rate comparison of different methods in Salesman sequence
Advanced Video Coding for Next-Generation Multimedia Services

Table 2. Coding rate comparison of different methods in Susie sequence

Table 3. Coding rate comparison of different methods in Foreman sequence

Table 4. Reconstruction PSNR (dB) comparison of different methods in Salesman sequence

Table 5. Reconstruction PSNR (dB) comparison of different methods in Susie sequence

Table 6. Reconstruction PSNR (dB) comparison of different methods in Foreman sequence