Different values of and for different values of N=7.
Recent advances in high-performance portable processing equipment, such as mobile processors, have enabled users to experience new-generation devices, including networked gaming consuls, smart televisions and smart phones. Video coding, video compression and video communication are essential parts of the aforementioned applications. However, networking infrastructures do not offer unlimited bandwidth, and storage devices do not offer unlimited capacities. Therefore, there is significant demand for reliable high-performance video communication/compression protocols. Video compression refers to the process of reducing the amount of video data used to represent digital videos; it is a combination of spatial image compression and temporal motion compensation (Hanzo et al., 2007).
Multiple description (MD) coding has appeared to be an attractive technique to decrease the impact of network failures and increase the robustness of multimedia communications (Goyal, 2001). The MD coding is especially useful for those applications in which retransmission is not possible or is too expensive. Lattice vector quantization (LVQ) is a well-known lossy compression technique for data compression. LVQ is used for spatial compression and is less computationally complex due to the regular structure of the lattice ( Conway & Sloane, 1988). MD image coding has been presented in several studied (Bai & Zhoa, 2007) (Akhtarkavan & Salleh, 2010) (Akhtarkavan & Salleh, 2012).
In (Reibman et al., 2002), an MD video coder is presented that uses motion-compensated predictions. This MD video coder utilizes MD transform coding and three separate prediction paths at the encoder. Another MD video coding technique is introduced in (Biswas et al., 2008). In this scheme, the 3D Set Partitioning in a hierarchical tree (3D-SPIHT) algorithm is used to modify the traditional tree structure. Multiple description video coding based on LVQ was presented in (Bai & Zhao, 2006). In that study, MDLVQ is combined with the wavelet transform to produce a robust video coder. An error-resilient video coding scheme using the MD technique is proposed in (Chen, 2008) that employs MDLVQ with channel optimization (MDLVQ-CO). In that study, the central codebook is locally trained according to known channel erasure statistics, and two translated lattices are used for side codebooks to reduce distortion in the case of erasure.
In (Yongdong & Deng, 2003), MD video coding has been used to form an authentication scheme for Motion JPEG2000 (Fukuhara & Singer, 2002) streaming in a lossy network. In this study, the video frames are first transcoded into media fragments that are protected with integrity tokens and a digital signature. Then, the integrity tokens and signature are encoded into codewords using the forward error correction (FEC) for data loss resilience. Because of the use of MD video coding and sequence numbers, the scheme provides content integrity and defeats collage attacks.
In (Franchi et al., 2005), two MD video coding schemes are proposed, i.e., drift-compensation multiple description video coder (DC-MDVC) and an independent flow multiple description video coder (IF-MDVC). An interesting feature of DC-MDVC is the ability to use the actual reconstructed frame as the new reference frame for the side prediction loops instead of the original frame.
In (Zandoná et al., 2005), a second-order predictor is inserted in the encoder prediction loop. In (Campana, et al., 2008), a new MD video coding scheme for the H.264/AVC standard (Wiegand et al., 2003) is proposed based on multiple description scalar quantization. In that study, a splitting block is inserted in the standard H.264/AVC scheme after quantization. This block generates two descriptions by duplicating the control structures, picture parameter set, slice header, and information about the motion vectors. The MD coding video scheme presented in (Radulovic et al., 2010) splits the video information into several encoding threads, and redundant pictures are inserted to reduce the error drift packet loss that occurs. In (Zandoná et al., 2005), a novel RD model for H.264 video encoding in a packet loss environment is proposed. In that study, the end-to-end distortion is estimated by inserting a block-based distortion map to store the potential errors of the current frame that may propagate to the future frames. This scheme is intended for low-delay applications over lossy networks.
The IF-MDVC is designed for transmission in environments in which the packet loss rate is directly proportional to the packet size and thus having more descriptions with a smaller packet size at the same bit-rate is advantageous (Franchi et al., 2005). The MD coding scheme in (Radulovic et al., 2010) requires the channel conditions to allocate the coding rate to primary and redundant pictures, to minimize the total distortion experienced at the receiver. The study described in (Zhang et al., 2004) requires that the potential channel distortions of its reference frames be known a priori. The descriptions generated by the MD scheme presented in (Tillo et al., 2008) based on redundant H.264 pictures are not independent; thus the decoder cannot take advantage of all of the available information.
In this chapter a generic MD video coding based on the coinciding similar sublattices of the hexagonal lattice is presented. The coinciding similar sublattices are special sublattices because they have the same index; even though they are generated by different generator matrices (Akhtarkavan & Salleh, 2012). The proposed multiple descriptions coinciding lattice vector quantization (MDCLVQ) video coding scheme forms a diversity system for MD video coding that can be exploited for any video coding standard such as H.264/AVC or Motion JPEG2000. Extensive simulations and the experimental results of applying the proposed MD coding scheme to several reference QCIF video sequences demonstrate that our MD coding algorithm outperforms state-of-the-art single- and two-description video coding schemes in terms of the compression ratio and transmission robustness. In addition, small differences between the peak signal to noise ratio (PSNR) of the side video sequences compared to the central decoder show significant capability to resist channel failures. Finally, the proposed MD video coding scheme provides acceptable fidelity criteria in terms of PSNR, which means that there is a negligible drop in the quality of the video and a significant increase in error resiliency and compression efficiency.
2.1. Multiple-description coding
Multiple-description (MD) coding is a method to address network impairments when the re-transmission is expensive or impossible. According to Goyal (2001), MD coding can effectively address packet loss without the need for retransmission, thus it can meet the network requirements. In this scheme, a stream of input data is transformed into several different independent descriptions and sent over different channels of a diversity system. At the receiver if all the descriptions are received correctly, the original data will be reconstructed accurately. But, in case some of the descriptions fail to reach the destination, due to channel failure, the rest of the descriptions, which are fed via side decoders, are used to find an estimate of the original data. The performance of the MD system to reconstruct the original data can be of several levels of accuracy. A typical 2-channel MD coding scheme is shown in Fig. 1. Consider the simplest scheme in which two descriptions are the same. If either description is lost then the other would be useful. However, if both descriptions are available then one will be useless and hence the bandwidth has been wasted. In other word, receiving more descriptions must result in better reconstruction quality which can be offered by the MD coding (Goyal, 2001). According to information theoretic approach, the MD coding scheme may not require more bit rate (bandwidth) than the single description system. In the MD coding system, there is always a trade-off between the required bit rate and the distortion. Thus, in MD coding scheme, compression efficiency is sacrificed in order to gain error resiliency. Therefore, MD coding should be applied only if it does not require too extra bit rate or a wider bandwidth (Goyal, 2001).
2.2. Elementary of lattices
In mathematics (algebra), a lattice is defined as a partially ordered set (poset) in which any two elements have a unique supremum (the element’s least upper bound or join) and an infimum (greatest lower bound or meet). In other words, a lattice is considered as a subset of points in the Euclidean space that share a common property. For example the lattice is a subset of points with n+1 coordinates, such that the sum of these coordinates is zero. Therefore, the lattice can be defined as:
An n-dimensional lattice in is denoted by . It means that consists of all integer linear combinations of a basis vectors in (Heuer, 2008). The fundamental parallelotope of a lattice is defined as (Conway & Sloane, 1988)
The fundamental parallelotope is the building block of the lattice because if it is repeated many times, the whole space is filled in a way that there is only one lattice point in each parallelotope. The lattice points are generated using a generator matrix. The generator matrix of the lattice with the basis vectors is given as (Conway & Sloane, 1988):
The Gramm matrix of a lattice is defined as , where is the transposed matrix of . The Gramm matrix determines the linear independence of the basis vectors, that is, they are linearly independent if and only if the determinant of the Gram matrix is non-zero. Two lattices are called equivalent if they have the same Gramm matrix or if the Gramm matrices are proportionate. There are many ways of choosing a basis and a fundamental parallelotope for a lattice . But the volume of the fundamental region is uniquely determined by , and the square of this volume is called the determinant of the lattice (Conway & Sloane, 1988). The determinant of a lattice is also equal to the determinant of the Gramm matrix (Conway & Sloane, 1988):
Thus, the volume of the fundamental parallelotope of a lattice is calculated as (Conway & Sloane, 1988)
The hexagonal lattice is a subset of the complex space , and at unit scale it is generated by the basis vectors , where (Vaishampayan et al., 2001). The hexagonal lattice is generated by
and the Gramm matrix of is calculated as (Conway & Sloane, 1988)
and . Thus, the volume (or area) of fundamental parallelotope of will be calculated as .The hexagonal lattice is also generated by
and the Gramm matrix of can be calculated as
According to , the lattices generated by and are equivalent lattices. However, the determinant of is 3 and the volume of fundamental parallelotope of is . This is because and both describe the hexagonal lattice but in different coordinates and on different scales (Conway & Sloane, 1988). In an n-dimensional lattice , the Voronoi region of a lattice point is defined as the union of all non-lattice points within that are closer to this particular lattice point than any other lattice point. Thus, the Voronoi region of is defined as (Vaishampayan et al., 2001)
As a consequence, all the points within must be quantized to . The Voronoi regions of the points in the A2 are hexagons; therefore, it is called the hexagonal lattice. The Voronoi region of a sublattice point is the set of all lattice points that are closer to than any other sublattice points. Thus, the Voronoi region of is defined as
Lattice vector quantization (LVQ) is a vector quantization technique that reduces the amount of computation for codebook generation since the lattices have regular structures. A finite set of points in an n-dimensional Euclidean space, , is called an Euclidean code (Conway & Sloane, 1982a). An n-dimensional quantizer is a mapping function that sends each point into provided that is the nearest code point. The code points may be selected according to any type of relationship. If the code points are selected from a lattice, then the quantizer would be called a lattice vector quantizer. Fast quantizing algorithms are a family of lattice vector quantization algorithms presented in (Conway & Sloane, 1982b) for different root lattices. The quantization using lattice points is a projection from n-dimensional space onto hyper plane. The fast quantizing algorithm first projects the n-dimensional input vector onto n+1 dimensional vectors on hyper plane using a matrix multiplication (Conway & Sloane, 1982b). Then, using a manipulation the projected point is mapped onto a lattice point. In case of lattice, the input stream of data is vectorized into 2-dimensional vectors. Then, each input vector is projected onto the 3-dimensional space, using the transformation matrix T given as (Conway & Sloane, 1982b)
If the expression does not hold, all the coordinates need to be rounded to the nearest integer points, while keeping the original values in another variable. The projected 3-dimensional vector is easily quantized (mapped) to the nearest lattice point by a simple manipulation. The sum of the differences between each coordinate of the original projected point to the nearest integer is calculated. If the sum of the differences is positive, then 1 is subtracted from the coordinate farthest from the integer. On the other hand, if the sum is negative, then 1 is added to the coordinate with the most difference. Thus, performing the computation-intensive nearest neighboring search algorithm is avoided. The two-dimensional version of the result point is calculated by right multiplying the quantized point by (Conway & Sloane, 1982b).
2.3. Coinciding similar sublattices of A2
Assume that is an n-dimensional lattice with the generator matrix . A sublattice with generator matrix is said to be geometrically similar to if and only if , for nonzero scalar , an integer matrix with , and a real orthogonal matrix B (with ) (Conway & Sloane, 1988). The index N is defined as the ratio of the fundamental volume of the sublattice to the fundamental volume of the lattice . Therefore, the value of N controls the coarse degree of the sublattice as well as the amount of redundancy in the MD coder (Vaishampayan et al., 2001).Thus, N is calculated by
Sublattice is considered as a clean sublattice if all the points of the reside only inside the Voronoi region of the sublattice points rather than on the boundary of the Voronoi region (Conway et al., 1999). It has been shown in (Bernstein et al., 1997) and (Vaishampayan et al., 2001) that, for the hexagonal lattice, is similar to if N is of the form
In addition, N must be in the form of , where, denotes the number of points at squared distance from the origin. The sublattices of A2 are clean, if and only if, and are relatively primes. It follows that A2 has a clean similar sublattice of index N if and only if N is a product of primes congruent to 1 (mod 6) (Conway et al., 1999). The sequence of integers that generate clean sublattices of the hexagonal lattice are named A038590 by Sloane (Sloane, 2000). If these conditions are met then the basis vectors of the sublattice will be and . In other words, and are selected such that the value of N satisfies these conditions and hence a clean similar sublattice of the hexagonal is generated. Thus, the basis vectors are calculated as and. The corresponding generator matrix will be
For example, with and , a clean similar sublattice of the hexagonal lattice with index is generated. The basis vectors are calculated as and . Thus, the corresponding generator matrix will be calculated as
It is also possible to calculate the index of the sublattice generated by using Eq. (13). The determinant of the generator of the hexagonal lattice at unit scale is . The determinant of is calculated as . Thus, the index will be . The sublattice generated by is shown in Fig. 2 with blue squares and the hexagonal lattice points are shown with light blue triangles. The fundamental parallelotope of the hexagonal lattice and the similar sublattice generated by are shown with small and big parallelogram, respectively. The basis vectors, u and v, are also shown. The Voronoi region of the sublattice point is shown with a dashed hexagon.
It is seen in Fig. 2 that has generated a clean similar sublattice because there are no lattice points on the boundary of the Voronoi region of the sublattice points. The coinciding similar sublattices are defined as geometrically similar sublattices of a root lattice with the same index N but generated by different generator matrices. The commonly used values of N are 7, 13, 19, and 37. Therefore, α and β must be selected such that clean sublattices of the hexagonal lattice with a desired index is generated. In order to find the suitable values of α and β, all twofold combinations of integers [-10, -10] … [+10, +10] have been examined and only the combinations that generate clean similar sublattices of the hexagonal lattice with index N = 7 are provided in Table 1.
The similar sublattices of the hexagonal with index N=7 corresponding to the generator matrices to provided in Table 1, are plotted in Fig. 3. The sublattice points corresponding to are shown with blue squares and the sublattice points corresponding to are shown with red circles. It can be observed that in the area that is shared between all the sublattices, only two distinct sublattices exist. In other words, the similar sublattices are coinciding with each other in a regular manner. Thus, these sublattices are called coinciding similar sublattices of the hexagonal lattice. In other words, the sublattices form two groups, the first group of sublattices coincide with and the second group coincide with . Each group consists of 6 sublattices which are coinciding with each other. Another apparent property of the coinciding sublattices, and , is that they overlap with each other in a regular pattern, that is, they overlap with each other every N lattice points in every direction. These points are shown by big red circles with black border. The SuperVoronoi set of an overlapping point is the set of all lattice points that are closer to this point than any other overlapping point. The SuperVoronoi set of the overlapping points are shown with a blue hexagon. This symmetry is used in construction of the partitions. Partitions are used to define the new equivalence relation and the new shift-property that can be used to simplify the labeling function.
|i||α||β||c||i||α||β||G i '|
As shown in Table 1 there are 12 generator matrices corresponding to every indices N = 7. The choice of the generator matrix changes the quantization process because the transformation matrix is calculated based on the generator matrix, that is, the choice of the generator matrix determines the transformation matrix used. The root lattice has two different definitions. One is in n-dimensional space, that is, with dimensions. Another group of generator matrices are with dimensions. For example, lattice has a 3-dimensional generator , in addition to the generators in 2-dimesional space. The transformation matrix is calculated using the relation
By substituting , , and in Eq. (3-7), the corresponding transformation matrix will be
For example, the transformation matrix corresponding to α = -3 and β = -2 is calculated as
3. MD video coding based on coinciding similar sublattices of A2
MD coding has appeared to be an attractive scheme to be used for representing fault tolerant communication schemes and lattice vector quantization has been known for representing high compression performance with low computational needs. A multiple-description lattice vector quantizer encodes the vectorized source for transmission over a two-channel communication system.
In this section, multiple-description lattice vector quantization based on the coinciding similar sublattice of A2 (hexagonal lattice) are presented. These schemes are called MDCLVQ-H.264/AVC and MDCLVQ-Motion JPEG2000. The experimental results will be presented in section 4.
3.1. System overview
A video is a sequence of two dimensional images coming with certain timing details. Therefore, it is possible to consider a given video as a three-dimensional array in which the first two dimensions serve as spatial directions of the moving pictures, and the third dimension represents the time domain. In this way, a frame is defined as a set of all pixels that correspond to a single moment.
In the proposed MD video coding schemes the input video is converted into two descriptions before being encoded by the standard video encoder. Then, the encoded descriptions are sent on the channels. If both descriptions reach the receiver, then a high quality video can be reconstructed by the central decoder. However, if only one description arrives, then a degraded video is reconstructed by the appropriate side decoder. The MDCLVQ is a generic scheme and it can be adopted for any video coding standard.
However, in this research it has been applied to H.264/AVC and Motion JPEG2000 only. Thus, two MD video coding schemes MDCLVQ-H.264/AVC and MDCLVQ-Motion JPEG2000 are defined, respectively. In other words, MDCLVQ-H.264/AVC uses the H.264/AVC video encoder while MDCLVQ-Motion JPEG2000 uses the Motion JPEG2000 video encoder. Therefore, both schemes are shown in a single schematic diagram in Fig. 4. The MDCLVQ video coding scheme is described in following subsections. In the proposed MDCLVQ scheme lattice points are used as the codebooks of the quantizer and the coinciding similar sublattice points are used as the labels. The schematic diagram of the MDCLVQ is illustrated in Fig. 6. The MDCLVQ scheme includes: the wavelet transformation module, the vectorization module, the LVQ module, the labeling function module, the arithmetic coder/decoder module, and the MD decoders. These modules are described in following subsections.
3.2. Wavelet decomposition module
It is possible to consider a video as a three dimensional array in which the first two dimensions serve as spatial directions of the moving pictures, and the third dimension represents the time domain. In this way, a frame is defined as a set of all pixels that correspond to a single moment. Every frame is decomposed into several wavelet coefficients (sub-bands), where the biorthogonal Cohen-Daubechies-Feauveau (CDF) 5/3 wavelet transforms (with lifting implementation) with 1 level of decomposition is used. Finally, the wavelet coefficients are streamed to the LVQ module.
3.3. Lattice vector quantizer module
In the LVQ module, the 2-D vectors, constructed in vectorization module are mapped to the nearest neighbouring lattice point using the fast quantizing algorithm. Lattice is used in the LVQ as the codebook. The details of the fast quantizing algorithm have been presented in section 2.2.
3.4. The coinciding labeling function
The proposed coinciding labeling function is composed of the hexagonal lattice , coinciding sublattice number 1, and coinciding sublattice number 2, . The first coinciding sublattice is generated by ; while the second coinciding sublattice is generated by . The labeling function maps each lattice point into two coinciding sublattice points that form a label. The first coinciding sublattice point belongs to ; while the second coinciding sublattice point belongs to . The two produced descriptions are encoded using a basic zero-order arithmetic codec prior being transmitted over the channels.
There are a lot of similarities between the terminologies of MDCLVQ-A2 schemes and traditional MDLVQ schemes. The coinciding similar sublattices overlap with each other in a regular pattern, that is, they have overlapping points every N lattice point in every direction. These points comprise the set of overlapping point. It is defined as
where, is the lattice, is the first coinciding similar sublattice, and is the second coinciding similar sublattice. In other words, the overlapping points are the lattice points on where the two coinciding similar sublattices coincide with each other. The Voronoi region/set of a coinciding sublattice is defined in Eq. (13). The SuperVoronoi region/set of an overlapping sublattice point is the set of all lattice and sublattice points that are closer to than any other overlapping point. In other words, the SuperVoronoi region of is defined as
where is the lattice and is the set of overlapping points. The coinciding similar sublattices partition the space into SuperVoronoi regions. The coinciding similar sublattices of the hexagonal lattice with index N = 7 and the SuperVoronoi regions of the overlapping sublattice points are plotted in Fig. 3. The coinciding similar sublattice points are within the partitions. Thus, the labels all reside within the same SuperVoronoi region as the labelled lattice points. Therefore, both vertices of the label will be as close as possible to the original lattice point and the side distortions will be low.
The labeling function maps every hexagonal lattice point to a label. In the MDCLVQ scheme, the label is considered as a directed edge which consists of two coinciding sublattice points and selected from the coinciding similar sublattices and respectively. The set of labels corresponding to every overlapping point is defined as:
In other words, is the set of all possible labels within the SuperVoronoi set of the overlapping point . The coinciding similar sublattices have inherent symmetries that can be used to define a new equivalence relation. In order to define the new equivalence relation, a modified version of the congruency relation is used. The traditional congruency relation accepts natural modulus; however, the modified version of the congruency accepts real modulus.
Let’s assume that and denote the x-coordinate and y-coordinate of the lattice point , respectively. Although and may take on values from an unlimited range of real numbers, using , the remainders only take on values from a finite set of real numbers. It has been observed that according to the congruency relation the remainders are repeated in a regular pattern. For example, in case of the coinciding similar sublattice with index N the remainders of take on values and the remainders of take on values . Thus, it is possible to define a new equivalence relation for the coinciding similar sublattice with index N. The equivalence relation enables the labeling function to find the SuperVoronoi sets of the sublattice points without using the computation intensive neighbourhood search. The new equivalence relation, denoted as , is defined as:
Eq. (23) shows that the two points of the lattice, and , are equivalent, if and only if their x-coordinates are congruent modulo and their y-coordinates are congruent modulo . Thus, the lattice points are divided into partitions. Each equivalence class is denoted by a twofold as the following:
The proposed labeling procedure has three properties. First, it utilizes different sublattices rather than a single sublattice. Second, it utilizes a new rule of equivalence as well as a new partitioning scheme. Third, the coinciding labeling scheme utilizes a new shift property in order to simplify the labeling procedure. The new shift property is defined as:
This finding shows that in order to label an arbitrary lattice point, it is just enough to obtain its equivalent point within the SuperVoronoi set of the origin, and translate the label of the equivalent point by . Thus, the problem of labeling all lattice points has been simplified to the problem of labeling only the lattice points within the SuperVoronoi set of the origin. These labels are obtained from the set of labels corresponding to the origin .
The rule for assigning labels within to points within is the parallelogram law. According to the parallelogram law if a lattice point λ is labelled using . Then, sum of the side distortions, is equal to , where is the midpoint of the label. Thus, in order to minimize the distortions, the algorithm chooses the shortest label that its mid point is as close as possible to the lattice point. The distance between the label and the lattice point is considered as the length of the line connecting the midpoint of the label to the lattice point. The labeling function is performed in the following nine steps;
Generate the coinciding similar sublattices and
Obtain the set of overlapping points
Obtain the SuperVoronoi set of the origin
Construct the partitions using the defined relation
Obtain , the set of labels corresponding to the origin
Assign every lattice point within , to the shortest edges within that its midpoint is as close as possible to
For any point outside the SuperVoronoi set of the origin, , obtain its equivalent point within the SuperVoronoi set of the origin,
Obtain the label of the equivalent point
Translate the label of the equivalent point by
As the two sublattices can be used interchangeably in the labeling algorithm, there is no preference between the two descriptions. Thus, there is no need to use the alternative transmission technique. As for example, consider the hexagonal lattice with unit fundamental area and the coinciding similar sublattices with index shown in Fig. 5. The first sublattice points are shown with circles and the second sublattice points are shown with squares. The sublattice points within are labeled with numbers. In Fig. 5, the lattice points outside the SuperVoronoi set of the origin are shown with triangles and the overlapping points are shown with circled-squares.
In general, for the sublattices of the hexagonal lattice with index N, there are points within the SuperVoronoi set of the origin. There are possible labels within that are obtained by the Cartesian product of and . According to the new equivalence relation defined in Eq. (23), the partitions are constructed by dividing the x-coordinate of the lattice points modulo and the y-coordinate of the lattice points modulo . Thus, the lattice points are divided into partitions. The partitions are divided into single-point partitions and double-point partitions. In case of index N=13, there are 181 lattice points inside the SuperVoronoi set of the origin. These points are also partitioned into single-point partitions and 12 double-point partitions. In case of the coinciding sublattice with index N=7 there are 55 lattice points within the SuperVoronoi set of the origin that are shown with crosses in Fig. 5. The points within the SuperVoronoi sets of the origin are divided into 49 partitions, including 43 single-point partitions and 6 double-point partitions. The double-point partitions are shown with capital letters in Fig. 5. The same partitioning scheme is repeated for every overlapping point due to the inherent symmetry of the coinciding similar sublattices. The constructed partitions are listed in Table 2.
|λ||( λ x , λ y )||m x , m y||λ||( λ x , λ y )||m x , m y||λ||( λ x , λ y )||m x , m y|
The points inside the double-point partitions are located on the boundaries of the SuperVoronoi regions of the overlapping sublattice points. The points residing on the borders are always equidistant from the two adjacent overlapping points. Thus, it is possible to label them according to two overlapping points. However the labeling function must be injective. Therefore, a simple rule is invented to the labeling function in order to avoid confusion of having double-point partitions. The invented rule must be to label the equidistant points using either the first overlapping point or the second overlapping point. In the current labeling function these points are labelled using the overlapping point with smaller x-coordinate.
Consider the points AG and AK in Fig. 7 that form a double-point partition; since and . According to the new rule, AK is labelled due and the label is . In contrast, AG is labelled due and its label is . This rule is repeated for other double-point partitions. However, every single-point partition within the SuperVoronoi sets of the origin is labelled using the shortest edge within which its midpoint is as close as possible to the lattice point. The complete list of the labels is given in Table 3.
3.5. Inverse wavelet transforms
In this step the side descriptions are inverse wavelet transformed and the two different video streams are produced. The inverse wavelet module uses the biorthogonal inverse lifting (CDF) 5/3 wavelet transform. The experimental results related to the application of MDCLVQ-A2 to MD video coding are presented in section 4.
3.6. Video encoders/decoders
In the encoder modules, the input side video streams are encoded by H.264/AVC or Motion JPEG2000 encoders. In case the H.264/AVC is selected as the video encoding standard, then the JM reference encoder is used. On the other hand, the Motion JPEG 2000 is another choice of the video encoding module that is done using the software provided by OpenJPEG library. The bit streams are sent over the channels. In the receiver side, the bit streams are decoded using the corresponding decoders. The decoding in this step is almost lossless.
3.7. The decoders in the receiver side
There are two different decoding processes in this scheme. One is the lossy decoding that is done in the side decoders and central decoder. Another is the lossless decoding that is done in H.264/AVC (Motion JPEG2000) decoder. The latter is the reverse of the encoding process that is done in H.264/AVC (Motion JPEG2000) encoder and the former is the reverse of the labeling function.
The received video data streams are decoded by H.264/AVC (Motion JPEG2000) then sent to the central decoder and the side decoders. There are an of decoders in an m-channel MD coding schemes, that is, a central decoder and side decoders (Ostergaard, 2007). In the MDCLVQ, a two channel MD coding scheme is used. Thus, there are three decoders: the central decoder, side decoder 1, and side decoder 2. At the receiver, if both descriptions are received correctly, then the central decoder is used to map the labels back to the original lattice points. The side decoders are used to find an approximation of the original lattice point if only one single description is received.
The central decoder is the reverse of the labeling process described above. When the two descriptions are received correctly, they are sent to the central decoder to be mapped back to the corresponding lattice point. Assume that the received label is . In order to decode the following three steps are performed:
Obtain the equivalent label of within . Denote it as . Two labels are said to be equivalent if their vertices are equivalent points; i.e. and .
Obtain the lattice point corresponding to the label .
Translate by to find the original lattice point.
In this scheme, if only the description from channel is received while the other channel is corrupted, then is used in the side decoder to find the approximate of the original lattice point using the following steps:
Obtain the equivalent sublattice point of within and denote it as .
Obtain the subset of labels that use as one of their coordinates by:
Obtain the midpoint of the lattice points within that their labels are members of . It is denoted as .
Obtain the approximate point by translating by .
In other words, the equivalent label for the corrupted data is one of the labels within . There is no way to determine the original label as well as the original lattice point. Thus, it is required to obtain the approximate point by finding the weighted midpoint of the lattice points corresponding to the labels residing within . In order to find the approximate point, the weighted midpoint is translated b , using the shift property.
As for example, if the received descriptions are given as and . Since 9.5 is congruent to 2.5 and 0.866 is congruent to . Therefore, belongs to the partition . Similarly, it is possible to show that belongs to the partition . Thus, the label is equivalent to the label according to Table 4. Both labels are shown with black arrows in Fig. 6. The necessary translation to shift to would be , according to the shift property. It is shown with a red arrow.
In the next step, the lattice point corresponding to is obtained from Table 3, that is . The necessary shift to translate to the original lattice point is also . Therefore, the original lattice point is (10, 1.734). The decoding procedure is depicted in Fig. 6. However, assume that one of the descriptions is lost and therefore only one description is received correctly. As the labeling function is balanced, it does not matter which label is received. In case is received, it is processed by the side decoder 1 to find the approximate point. As showed before belongs to the partition and its equivalent sublattice point that is residing within is given by .
According to Table 4-2, the lost description may be within . Thus, the approximate equivalent point will be the midpoint of the lattice points within . In order to find the approximate of the original point is translated by which obtains as result the point (9.8235, 0.7121). On the other hand, if is received then the side decoder 2 is used to process it. The coinciding sublattice points are used either as the first or the second vertex of the label. Therefore, at the receiver it is possible to determine the channel to which the received description belongs.
4. Experimental results for MDCLVQ-A2 applied to video coding
In this section, the experimental results of the MDCLVQ-A2 video coding schemes are presented. The proposed MDCLVQ-A2 is applied to H.264/AVC and Motion JPEG2000 schemes. Section 4.1 presents the experimental results for MDCLVQ-H.264/AVC and the results for MDCLVQ-Motion JPEG2000 are presented in section 4.2.
The standard videos of “Akiyo”, “Coastguard”, “Carphone”, and “Foreman” with the QCIF format (144×176 pixels) and 7.5 fps are chosen to evaluate the benefits of the proposed MDCLVQ-H.264/AVC scheme over the original single description scheme in terms of bit rate-efficiency and error-resiliency. The default settings of the original JM9.4 reference encoder are used for encoding the original videos and the side videos.
The MDCLVQ scheme is a multivariable system because it is controlled by three factors: the wavelet threshold, the fundamental area of the hexagonal lattice, and the index of the coinciding sub-lattices. The performance of the MD coding scheme is evaluated based on the average required bit rate and reconstruction fidelity in the decoders. The fundamental area of the lattice is changed by multiplication of a factor called sigma to transformation matrix. Increasing the fundamental area increases the required bandwidth (bit rate) for encoding the video and the reconstruction fidelity. On the other hand, increasing the index decreases the required bit rate. In these experiments the labeling function with index N=7 has been used constantly and the wavelet threshold is constantly zero. Then, the value for the bit rate is calculated at the end of every encoding runs. Therefore, the analyses provided below are based on values of sigma.
The fundamental area of the hexagonal lattice is changed by sigma = 0.1, 0.2… 1, while keeping the shape of the hexagonal lattice fixed. Thus, the quantization error is adapted and central distortion, side distortions and the associated bit rates are affected. The required bit rate for encoding both descriptions must be compared with the required bit rate for encoding the single description source. Suppose that a given video source requires R b/s to be encoded by a single encoder. On the other hand, if for the same video source, the first description of the MD coding scheme requires R1 b/s and the second description requires R2 b/s. The bit rate efficiency is defined as
The bit rate efficiency is defined so that the performance of the proposed MD coding scheme is compared with the original video encoder in terms of the required bit rate. Thus, if the MD video coding scheme can encode both side videos with total bit rate (R1+R2) ≤ R then the bit rate efficiency will be positive and the MD coding is outperforming the original encoder. However, if (R1+R2) > R the the bit rate efficiency will be negative and it indicates that the compression efficiency has been sacrificed for gaining error resiliency ( (Goyal, 2001). Therefore, the bit rates required by the proposed MDCLVQ-H.264/AVC scheme for encoding the two descriptions are provided in Table 4. The bit rates required to encode the standard videos by the JM9.4 reference encoder and the bit rates efficiencies calculated using the Eq. (27) are also provided in Table 4.
According to Table 4, the bit rate efficiencies of the proposed MDCLVQ-H.264/AVC scheme corresponding to sigma= 0.1 to 0.7 for all the videos are positive; therefore, error resiliency and compression efficiency are both obtained and with the sigma lower than 0.7. However, for sigma 0.8 to 1 the compression efficiency is not achieved. As an example, the MDCLVQ-H.264/AVC with sigma = 0.1, in average shows 0.89 bit rate efficiencies for all the videos, which is an improvement as compared with the original encoder. However, for the values of sigma from 0.8 to 1, the proposed scheme requires more bit rates than the original H.264/AVC encoder, therefore, the compression efficiency has been traded off for error resiliency. The error resiliency of the proposed MCLVQ-H.264/AVC scheme is due to its MD coding nature, that is, it can provide a degraded version of the source if only one of the channels works, while a better reconstruction quality is achieved when both channels work and both descriptions arrive at the destination.
Although the bit rate efficiency is important for MD video coding, the quality of the reconstructed videos in the central and side decoders are also important. In other words, compression efficiency is not acceptable if it comes with significant drop in the quality of the video. A common measure to evaluate the quality of the video encoding is to compare the reconstructed video with the original video through the average peak signal to noise ratio (PSNR) of the luminance (Y) components. In the following results, the PSNR is measured for the luminance component.
The performance of the MDCLVQ-H.264/AVC scheme in terms of the required bit rate and the reconstruction fidelity is controlled by the fundamental area of the hexagonal lattice. The fundamental area of the hexagonal lattice is changed by sigma. Therefore, in Table 5 the average PSNR values of the reconstructed videos in the central decoder as well as the side decoders for the “Akiyo”, “Coastguard”, “Carphone”, and “Foreman” video sequences are provided in Table 5 for different values of the sigma.
|Miss America||Side 1||16.53||23.58||28.07||31.57||34.39||36.75||38.72||39.98||40.72||41.15|
As shown in Table 5, the qualities of the reconstructed videos are directly proportional to the value of sigma. In other words, the PSNR of the central decoder and the PSNR of the side decoders improve as the sigma is increased. On the other hand, according to Table 4, the bit rate efficiencies decrease as the sigma is increased. Then, the quality of the reconstructed video is reversely related to the bit rate efficiency. Thus, as the value of sigma increases the compression efficiency is traded off for better reconstruction quality. As for example, the MDCLVQ-H.264/AVC shows the best reconstruction quality for the “Foreman” video when the sigma is 1; PSNR = 36.96 in the side 1 decoder, PSNR = 36.82 in the side 2 decoder, and PSNR = 38.05 in the central decoder. However the MDCLVQ-H.264/AVC shows the best bit rate efficiency for the “Foreman” video when the sigma is 0.1.
In addition, according to Table 5 the PSNR values for the central decoder and the side decoders are not significantly improved as sigma is increased from 0.7 to 1. This implies that the best performance of the MDCLVQ-H.264/AVC in terms of the bit rate efficiency and reconstruction quality for the “Foreman” video is seen with sigma = 0.7. As an example, for the “Foreman” video with sigma = 0.7 and with almost the same bit rate as the original encoder the MDCLVQ-H.264/AVC provides PSNR = 34.07 dB in the central decoder, PSNR = 33.08 dB in the side 1 decoder, and PSNR = 33.08 dB in the side 2 decoder.
According to Table 5, the reconstruction performance of the proposed MDCLVQ-H.264/AVC scheme in terms of the central PSNR and side PSNR do not significantly increase while the sigma is increased from 0.7 to 1. In order to demonstrate the visual quality of the reconstructed videos, the first frames of the central videos of “Foreman” sequence that have been encoded by the MDCLVQ-H.264/AVC with sigma = 0.1 to 1 are provided in Fig. 7.
According to Fig. 7, the quality of the reconstructed videos of the “Foreman” increase as the value of sigma is increased. The reconstruction qualities for the “Foreman” videos corresponding to sigma values lower than 0.3 are very low (PSNR = 14.58, PSNR = 19.67dB, and PSNR = 23.82dB) while the reconstruction quality of the videos corresponding to sigma higher than 0.4 are acceptable. For example the reconstructed video corresponding to sigma = 0.7 has central PSNR = 34.07dB.
However the difference between the PSNR corresponding to sigma = 0.7 and sigma = 1 is negligible. In other words, the extra bit rate that is required for higher values of sigma is not worth of the gained reconstruction fidelity, thus the best performance of the MDCLVQ-H.264/AVC for the “Foreman” video is achieved with sigma = 0.7. In other words, using sigma = 0.7 the proposed scheme can offer acceptable side and central reconstruction qualities as well as error resiliency without requiring extra bit rate.
4.2. MDCLVQ-motion JPEG2000
The proposed MDCLVQ-Motion JPEG2000 scheme generates two descriptions of the input video that are encoded by the OpenJPEG software to motion jpeg files. In this subsection, the volume of the motion jpeg files generated by encoding the MD sequences are compared with size of the motion jpeg file of the original single description scheme and the encoding efficiencies are calculated similar to bit rate efficiency in Eq. (27). Suppose that volume of the motion jpeg file of a given video source is V (kB). On the other hand, if for the same video source the first description of the MD coding scheme generates a motion jpeg file with V1 (kB) and the second description generates a motion jpeg V2 (kB). Then, the compresssion efficiency is defined as
The compresssion efficiency is defined so that the performance of the proposed MD coding scheme is compared with the original video encoder in terms of the encoding efficiency. Thus, if the MD video coding scheme can encode both side videos with total volume of (V1+V2) ≤ V then the compression efficiency will be positive and the MD coding is outperforming the original encoder. However, if (V1+V2) > V, then the compresssion efficiency will be negative and it indicates that the compression efficiency has been traded off for gaining error resiliency
In these experiments, standard video sequences of “Akiyo”, “Coastguard”, “Carphone”, and “Foreman” with the QCIF format (144×176 pixels) are used to evaluate the performance of the proposed MDCLVQ-Motion JPEG2000 encoding scheme. As stated in section 4.1, the performance of the MDCLVQ-A2 and consequently the performance of the MDCLVQ-Motion JPEG2000 scheme in terms of the compression efficiency and reconstruction fidelity are controlled by the fundamental area of the hexagonal lattice. The fundamental area of the lattice is changed by multiplying the sigma to the generator matrix (transformation matrix). The compression efficiencies for the MDCLVQ-Motion JPEG2000 are provided in Table 6. According to Table 6, the volume of the motion jpeg files generated by the MDCLVQ-Motion JPEG2000 scheme increase as the sigma is increased. Thus, as the sigma increases the encoding efficiency is decreased.
This is similar to the results for the MDCLVQ-H.264/AVC because the encoding (compression) efficiency is due to the coarse degree of the quantization which is controlled by the sigma. In general, the compression efficiencies in Table 4 are lower than corresponding values in Table 5 because H.264/AVC standard has higher encoding (compression) efficiency than Motion JPEG2000.
Although the compression efficiency is important for MD video coding, the quality of the reconstructed videos in the central and side decoders are also important. In other words, compression efficiency is not acceptable if it comes with significant drop in the quality of the video. The performance of the MDCLVQ-Motion JPEG2000 scheme in terms of the compression efficiency and the reconstruction quality is affected by the value of the sigma. The average PSNR values of the reconstructed videos in the central decoder as well as the side decoders for the “Akiyo”, “Coastguard”, “Carphone”, and “Foreman” video sequences are provided in Table 7.
|V e f f||0.44||-0.30||-0.81||-1.19||-1.63||-1.94||-2.18||-2.48||-2.73||-2.95|
The PSNR values of the central decoder and the side decoders of the MDCLVQ-Motion JPEG2000 provided in Table 7 are higher than the corresponding values of the MDCLVQ-H.264/AVC provided in Table 5. This is because of better reconstruction quality of the Motion JPEG2000 as compared with the H.264/AVC. However, the general pattern is the same and all the PSNR values increase as the sigma is increased because reconstruction fidelity is affected by the performance of the MDCLVQ-A2. On the other hand, the compression efficiencies of MDCLVQ-H.264/AVC are in general better than MDCLVQ-Motion JPEG2000 because of higher compression efficiency of the H.264/AVC.
The proposed MD coding schemes require low computational capabilities, due to the inherent symmetries of the coinciding similar sublattices of the hexagonal lattice. The labeling function and the partitioning scheme presented in section 3.4 are not computation intensive processes. The partitions are generated using the simple proposed two-fold real congruency relation. The proposed shift property simplifies the problem of labeling the entire lattice space to the problem of labeling the SuperVoronoi region of the origin. The SuperVoronoi region of the origin with index N includes N2+N-1 lattice points and N2 sublattice points. Thus, the labeling function is simplified to a table look up with N2+N-1 entries. In addition, the proposed labeling function, the partitioning scheme, and the shift property are scalable and as a consequence using the sigma the performance of the entire scheme is adjustable, as shown in section 4.
|Miss America||Side 1||16.34||26.07||31.36||33.82||35.52||35.95||37.60||38.08||38.59||38.95|
The main objective of this research is to design and implement a generic MD video coding scheme that can be adopted by different kinds of video coding standards. In other words, the proposed scheme does not rely on specific characteristics of the video coding standard being used and does not make any assumption about the state of the channel or probability distribution of errors. In this way, the proposed scheme is different from the MD video coding schemes presented in (Yongdong & Deng, 2003), (Franchi et al., 2005) (Zandoná et al., 2005). Thus, it can effectively increase the performance of the encoding system in terms of compression efficiency and reliability, regardless of the encoding scheme being used. In addition, the proposed scheme does not require significant hardware or software changes; therefore, it can be adopted for available hardware and software designs. Although running parallel H.264/AVC or Motion JPEG2000 encoders requires more computational resources than the original encoder, the total computation time of the MD encoders is less than that of the original because the entropies of the side videos are significantly decreased by the MDCLVQ, and thus, the time required by of the arithmetic coding is decreased.
6. Future research
As for future direction, it would be interesting to develop the MD client-server video encoding and decoding that are able to stream the real time videos on networks. The developed encoder module can be run on the server side and the decoder module is on the client side. The future MD client-server video codec must be adaptive or able to make the best use of the available bandwidth to meet its requirements. In other words, the user must be able to increase or decrease the quality of reconstruction with regard to the available bandwidth. Finally, it is suggested that the MDCLVQ schemes to be integrated with the H.264/AVC or Motion JPEG 2000 video encoding standards in a way that the built-in quantization and encoding procedures are replaced by the MD scheme in order to increase the performance of the MD coding. Thus the performances of the MD video coding schemes are optimized.