Open access peer-reviewed chapter

Fast Motion Estimation’s Configuration Using Diamond Pattern and ECU, CFM, and ESD Modes for Reducing HEVC Computational Complexity

By Randa Khemiri, Nejmeddine Bahri, Fatma Belghith, Soulef Bouaafia, Fatma Elzahra Sayadi, Mohamed Atri and Nouri Masmoudi

Submitted: March 14th 2019Reviewed: May 12th 2019Published: October 23rd 2019

DOI: 10.5772/intechopen.86792

Downloaded: 111


The high performance of the high efficiency video coding (HEVC) video standard makes it more suitable for high-definition resolutions. Nevertheless, this encoding performance is coupled with a tremendous encoding complexity compared to the earlier H264 video codec. The HEVC complexity is mainly a return to the motion estimation (ME) module that represents the important part of encoding time which makes several researches turn around the optimization of this module. Some works are interested in hardware solutions exploiting the parallel processing of FPGA, GPU, or other multicore architectures, and other works are focused on software optimizations by inducing fast mode decision algorithms. In this context, this article proposes a fast HEVC encoder configuration to speed up the encoding process. The fast configuration uses different options such as the early skip detection (ESD), the early CU termination (ECU), and the coded block flag (CBF) fast method (CFM) modes. Regarding the algorithm of ME, the diamond search (DS) is used in the encoding process through several video resolutions. A time saving around 46.75% is obtained with an acceptable distortion in terms of video quality and bitrate compared to the reference test model HM.16.2. Our contribution is compared to other works for better evaluation.


  • HEVC
  • motion estimation
  • early skip detection (ESD)
  • early CU termination (ECU)
  • coded block flag (CBF) fast method (CFM)

1. Introduction

The fast multimedia technology development and network communications makes ultrahigh-definition (HD) and HD video contents widely used in our daily life. This fast jump to use high video resolutions in which many provide some problems in terms of memory storage cost and transmission bandwidth gives birth to the new high efficiency video coding (HEVC) [1, 2]. HEVC is developed in 2013 by the joint collaborative team on video coding (ISO/IEC) Moving Picture Experts Group (MPEG) and the International ITU-T Video Coding Experts Group (VCEG). It is urbanized to overcome the enormous amount of UHD video contents. Compared to the earlier H.264/AVC [3] standard and at the identical visual quality, HEVC guarantees a high encoding performance, reaching 50% of bitrate [4]. Facing to this immense huge encoding performance, a huge computational complexity is obtained. Motion estimation (ME) represents the large part of encoding process that occupies around 70% of the total time of inter prediction, as Jungho [5] indicates in Figure 1.

Figure 1.

Encoding time distribution.

This large consumption is principally due to the new hierarchy of the block coding based on coding tree units (CTU). This new concept is analog to macroblocks in the earlier standard of compression. Each picture frame is divided into square forms, called coding units (CUs) [6], where 64 × 64 represents the maximum size, and recursively subdivided into 8 × 8 blocks. Prediction and transform blocks (PUs and TUs) are in each CU, where PU represents the principal unit in the ME process.

Figure 2 shows the CTU tree structure in the HEVC standard where LCU represents the large coding unit and SCU represents the small coding unit.

Figure 2.

CTU tree structure in the HEVC standard.

When reducing the time essential for the search algorithm, the ME computational complexity will be automatically reduced. Furthermore, when using different fast mode decision algorithms based on early termination, the ME computational complexity will be reduced, which primes to the entire HEVC execution time reduction.

It is within this context that this article presents a fast encoding algorithm principally based on the early skip detection (ESD), the coded block flag (CBF) fast method (CFM), and the early CU termination (ECU) modes [7, 8, 9] to decrease the HEVC encoding complexity.

The remainder of this paper is structured as follows: the next section details some works on the HEVC fast motion estimation algorithms. Section 3 provided an overview of the motion estimation algorithm. Section 4 highlights the proposed fast configuration for the HEVC encoder. Experimental results for the fast HEVC configuration compared to the results obtained with the original HM16.2 reference software [10] are discussed in Section 5. Finally, in Section 6, conclusions and some prospects are given.

2. Related works

Aiming to optimize the HEVC encoder complexity, several works have been proposed to reduce the test zonal search (TZS) motion estimation algorithm. Some works are interested in hardware solutions, and others are focused on software optimizations.

In [11], using sequential and parallel techniques, two hardware diamond architectures for HEVC video coding are proposed. These architectures achieve an encoding in full HD at 30 frames per second using a Virtex-7 field programmable gate array (FPGA) design.

Authors in [12] have proposed a hardware parallel sum of absolute difference (SAD) design for gray-scale images to reduce motion estimation time for block size of 4 × 4 pixels. A multiplier is exploited for addition as a partial product reduction (PPR). Results obtained on Virtex-2 Xilinx FPGA show that the maximum frequency obtained is 133.2 MHz for 4 × 4 block size. Nalluri et al., in [13, 14], have proposed two other SAD architectures on FPGA Xilinx Virtex without and with parallelism. The proposed parallel architecture has accelerated the SAD calculation by 3.9× compared to the serial SAD architecture. In [15], authors have proposed two implementations of the SAD and SSD algorithms using NVIDIA GeForce GTX480 with CUDA language in order to reduce the ME run-time. The proposed architecture saved about 32% of encoding time for class E video sequences with nonsignificant degradation in the PSNR and the bitrate.

Regarding software solutions, the 8-point square and the 8-point diamond have been replaced by Nalluri et al. [16] with a 6-point hexagonal in the TZS ME algorithm, and 50% in encoding time is saved without degradation in bitrate and PSNR. To replace the TZS algorithm, in [17, 18], authors proposed small diamond pattern search (SDPS), large diamond pattern search (LDPS), and horizontal diamond search (HDS). Experiments using HM8.0 showed that these algorithms allow a reduction of 49% of motion estimation calculation time with nonsignificant increase in bitrate and slight degradation in video quality.

In [19], Liquan et al. have proposed a fast mode decision algorithm by skipping some depths. The proposed work allows saving about 21.5% of encoding time with a slight bitrate increase and a negligible efficiency loss coding. The algorithm proposed by Qin [20] uses the ECU algorithm according to an adaptive MSE threshold value. This work ensures time saving without degradation in the quality. Podder [21] has also proposed an interesting software method to reduce the ME time. Based on human visual features (HVF), an efficient decision of the appropriate block partitioning mode has been obtained. This work allows saving 41.44% of the execution time for SCVS video sequences. In the work published in [22], a fast HEVC ME based on DS and three fast mode decisions, ECU, ESD, and CBF modes, have been presented. Simulation results show a reduction of 56.75% in the complexity of HEVC in terms of execution time, accompanied with slight degradation in video quality and bitrate, when comparing the HM.16.2 executed on an Intel® Core TM i7–3770 @3.4 GHz processor. Authors in [22] have tested just one sequence from each class with just two quantification parameters (QPs), QP = 22 and 37, to evaluate the use of the fast modes.

By analyzing all these previous works, we can note that using fast mode decision algorithms represents an interesting technique in order to reduce the HEVC computational complexity.

3. Overview of the motion estimation in the new HEVC

TZ Search algorithm, used in HEVC ME process (Figure 3), includes four distinct main stages in order to determine the best motion vector.

Figure 3.

Motion estimation process.

These stages, which are the motion vector prediction (MVP), the first search performed with a pattern of square or diamond forms, the refinement, and the raster search, are described in the next subsections.

3.1 Motion vector prediction (MVP)

To compute the corresponding block’s median predictor, the TZS algorithm uses the up predictor, the upper right predictor, and the left predictor (Figure 4).

Figure 4.

MV adjacent of a current PU.

The median computation is done via the following equation.


3.2 Initial grid search

The first search is performed by the determination of the search pattern and the “searchrange.” As it is detailed in Figure 5(a) and (b), the main goal of this stage is to localize the search window via a pattern of square or a diamond forms.

Figure 5.

Diamond/square search pattern. (a) Diamond search pattern stride length equal to 4. (b) Square search pattern stride length equal to 4.

Thus, these two search patterns are referred to the eight points for each round. The distance corresponding to the minimum distortion point is saved in the “BestDistance” variable. Currently, diamond search pattern is used as default, but the square pattern search can also be used by modifying the HEVC configuration file through the “Diamondsearch” variable.

3.3 Raster search

This step consists of choosing the distance which corresponds to the greatest matched point from the previous search. Three cases according to this distance denoted as “BestDistance” are summarized as follows:

  • The process is stopped when “BestDistance” = 0.

  • A refinement is needed when 1 < “BestDistance” < iRaster.

In the configuration file, “iRaster” represents a changeable variable not to be overdone.

  • BestDistance > iRaster is agreed correctly; a raster scan is achieved using the iRaster value as the length step. If difference obtained from the starting station to the MV from the first level is besides large, this step is preceded. This step is computed on the entire search window.

Figure 6 shows an example of a full search algorithm with iRaster which is equal to 4.

Figure 6.

Raster search pattern when iRaster = 4.

3.4 Raster and star refinement

The refinement is performed when the distance of the motion vector previously obtained is different to 0. There are mainly two refinement types:

  • Raster refinement

The best point obtained from the previous steps corresponds to the start point of the star refinement. It can be performed using a diamond or a square pattern with distances ranging from “search range” to one. In each iteration, the distance is divided by 2, and when the distance will be equal to one, two adjacent point searches are performed, and then the process is stopped.

  • Star refinement

In this step, the selected point obtained from the previous steps corresponds to the start point of the star refinement. In each iteration, the distance is divided by two, and when the distance will be equal to one, two adjacent point searches are applied to determine the best estimated MV which gives the minimum of SAD (Figure 7).

Figure 7.

The used TZSearch algorithm.

4. The proposed fast configuration

Several fast decision mode algorithms are in this effort aiming to speed up the ME process. Firstly, diamond search pattern is utilized to decrease the encoder computational complexity. Some configurations are also set, such as the early CU termination (ECU), the early skip detection (ESD), and the coded block flag (CBF) in which fast decision mode algorithms are adopted in HEVC video coding. These proposed fast algorithms were given bellow.

4.1 Early CU termination (ECU)

This algorithm is used when switching from a depth p to the next p + 1. As Figure 8 showed, if skip is the best current CU prediction mode, the sub-tree calculations can be skipped [23]. Thus, good mode is determined with rate distortion (RD) calculation cost [24]. The minimal RD cost relates to the skip mode that caused the stop of the partitioning [25].

Figure 8.

Algorithm of early CU termination.

Several works show that the most chosen mode was the skip [25]. This clarifies the detail that an excessive enhancement is obtained when the skip mode recognition is anticipated. This mode induces a better encoder performance since it denotes a block code deprived of residual information.

4.2 Early skip detection (ESD)

The early skip detection signifies a modest verification of the two-variance motion skip conditions (CBF and differential motion vector (DMV)). As shown in Figure 9, this verification is performed after determining the best inter 2 N × 2 N. Before checking the skip mode, the current CU performs two inter 2 N × 2 N modes (advanced motion vector prediction called AMVP and merge mode). The DMV and CBF are checked when the minimum RD cost is induced by the mode selection. When CBF is equivalent to zero and the best mode inter 2 N × 2 N DMV is equal to (0, 0), the skip mode is the best mode of current CU. Consequently, the residual modes of PU are not examined anymore [8].

Figure 9.

Early skip detection algorithm.

4.3 Coded fast method (CFM)

The coded fast method (CFM) detects the best mode of a prediction unit [7]. As shown in Figure 10, for each PU mode belonging to a CU, the RD cost is calculated.

Figure 10.

Algorithm of coded block flag fast method.

An evaluation of the different coefficients, CBF for the luminance and the two chrominances, is performed. When all transform coefficients (CBF_Y, CBF_U, and CBF_V) are equal to zero [9], all remaining modes will not be tested.

5. Experimental results

5.1 Experimental conditions

The performance evaluation of this work is effectuated with a random access (RA) configuration through the HM 16.2 reference test model, exploiting the fast mode decision algorithms ECU, ESD, and CBF, previously detailed. To appraise the fast implementation, a comparison of HEVC encoding time, bitrate, and PSNR with the original is effectuated, where a search range is 64. Sixty-four also is the CU maximal size and CU partition depth maximal equals four. An Intel® Core TM i7–3770 @ 3.4 GHz is used in this work with Windows 8 OS platform.

The four resolutions tested are to the four classes (class D (416 × 240), class C (832 × 480), class B (1920 × 1080), and class A (2560 × 1600)) [26]. For each video sequence, 50 is the encoded frame number used. To evaluate results, eight sequences recommended by the JCT-VC [26], each one with four quantification parameters (QP) 22, 27, 32, and 37, are used.

5.2 Evaluation criteria

To evaluate this work, we used the formula detailed in Table 1.

ΔT (%)Encoding time speedupΔT%=TProposedTOriginalTOriginal×100%
ΔBR (%)Bitrate increaseΔBR%=BitRateProposedBitRateOriginalBitRateOriginal×100%

Table 1.

Evaluation criteria.

Where: Bitrateoriginal, PSNRoriginal and Toriginal represent bitrate, video quality, and encoding time of the original algorithm, respectively and Bitrateproposed, PSNRproposed and Tproposed, BitRateProposed represent bitrate, video quality, and encoding time of the proposed algorithm, respectively.

5.3 Results

Table 2 specifies the results obtained when using the proposed fast HEVC configuration compared to the original one.

ClassSequencesQPΔT(%)ΔPSNR (dB)ΔBR(%)
Class A 2560 × 1600PeopleOnStreet22−25.04−0.040−0.470
Average class A−50.385−0.123−0.478
Class B 1920 × 1080BQTerrace22−29.91−0.060−1.180
Average class B−55.926−0.059−0.504
Class C 832 × 480RaceHorses22−11.45−0.029−1.430
Average class C−34.400−0.126−0.454
Class D 416 × 240BQSquare22−34.64−0.075−0.870
Average class D−46.326−0.116−0.218

Table 2.

Performance evaluation of the proposed algorithm compared to the original one.

The proposed algorithm shows a time saving of up to 46.759% on average compared to the original algorithm. The speedup attains 84.51% of encoding time for BQTerrace video for QP 37. In fact, the time saving is more important for some videos such as Traffic, BQSquare, and BQTerrace sequences ranging from 57.91 to 65.135%. This is due to the motion slowness in these sequences. Indeed, for videos containing low motion activities [18], the improvement is more significant. With the highest resolution, traffic video is characterized by intensive movement of objects against a stationary background. Concerning BQSquare, this video having fast motion is often coded by the bi-predictive mode, as it is the best prediction mode.

Defiantly for sequences with high activity, such as BlowingBubbles, RaceHorses, and PeopleOnStreet, the time saving is only around 34.73 and 28.38%. The worst case is for the motion-filled and dynamic RaceHorses video, which records horse racing. Many great frequency details are in this video, since horsetail is regularly expensive to encode.

The time saving is visible with 49% for BasketballDrive sequence. This video contains a high contrast and high motion activities. The background has a rather similar texture.

Not only the encoding time was saved but also the bitrate which is justified by the negative values in the table, ranging from 0.002 to −2.182% for PartyScene and PeopleOnStreet with QP equal to 22 and 37, respectively. Regarding the quality of video, the PSNR deprivation is from −0.015 to −0.23 dB for BasketballDrive and RaceHorses with QP equal to 22 and 37, respectively.

In average, the fast HEVC configuration induces a nonsignificant poverty in terms of video quality, around 0.106 dB, with a decrease of 0.416% in the bitrate that is a very interesting point in terms of increasing the compression performance.

Figure 11 shows the curves of rate distortion (RD) of HEVC original algorithm and the fast one, for two sequences for each class: PeopleOnStreet and Traffic from class A (2560 × 1600), BQTerrace and BasketballDrive from class B (1920 × 1080), PartyScene and BasketballDrive from class C (832 × 480), and BlowingBubbles and BQSquare from class D (416 × 240). This can also be checked in Table 2. The sequences are taken at QPs 22, 27, 32, and 37.

Figure 11.

RD curve comparison of our algorithm versus the original one.

Four QP parameters are presented in all curves; horizontal axes on (kbps) represent the bitrate where the vertical one on (dB) represents the PSNR.

Figure 11 shows that all RD curves are overlaid [27]. In fact, the proposed changes have insignificant impairments on bitrate and PSNR. For lower QP values, the degradation is more significant. Experimental results prove that the fast configuration gives better performances than the original one, given that it offers a significant time saving, without any influence on the quality and the bitrate.

Further, for all tested sequences, an important speedup is obtained for bigger QPs. Figure 12 evaluates the time saving in average by varying from 22 to 37. We note that the time saving increases in proportion to QP. In average, for higher QP, equal to 37, the run-time decreases by 63.5%. This decline is justified by the choice of the skip mode for bigger QP values [25].

Figure 12.

Curves of time saving for all videos coded through random access configuration with QP from 22 to 37.

Table 3 summarizes the performances of the proposed work compared to different previous algorithms.

Kibeya [17]Liquan [19]Qin [20]
ΔPSNR (dB)ΔBR (%)ΔT (%)ΔPSNR (dB)ΔBR (%)ΔT (%)ΔPSNR (dB)ΔBR (%)ΔT (%)
Class A−0.0521.0830.670.1%−22.4
Class B−0.0130.2945.37−0.0200.834−34.000.3%−28.4
Class C−0.0110.5320.86−0.0451.225−16.500.2%−23.0
Class D−0.0080.266.9−0.0401.060−13. 500.2%−17.0
Podder [21]Proposed fast algorithm
ΔPSNR (dB)ΔBR (%)ΔT (%)ΔPSNR (dB)ΔBR (%)ΔT (%)
Class A−41.9−0.123−0.478−50.384
Class B−34.37−0.059−0.504−55.926
Class C−42.92−0.126−0.454−34.400
Class D−46.57−0.116−0.218−46.326

Table 3.

Proposed algorithm compared to previous works.

Compared to [17], the proposed work was more competent in terms of bitrate and saving time. In fact, [17] allows saving about 25.95% of encoding time with a slight bitrate. This algorithm was based on large diamond search pattern as an algorithm for motion estimation implemented on HM8.0. Concerning Liquan [19], its algorithm consists of skipping some detailed depths used in the preceding frames. This work allows saving about 21.5% of encoding time with a slight bitrate. Qin [20] implemented an algorithm established on the ECU according to a MSE adaptive threshold value. A time saving without degradation in the quality is obtained in this work. Another interesting method was presented by Podder et al. [21], where human visual features (HVF) are used for the selection of appropriate block partitioning modes. This work offered 41.44% reduction in terms of time for the standard class video sequences (SCVS).

6. Conclusion

HEVC induces an important progress in terms of video quality, in particular for high video resolutions. Nevertheless, this recital is combined with a bigger computational complexity which tremendously increases the encoding time. Motion estimation module using the quadtree structure represents the mainly strong process that is a conduit to the augmentation of the HEVC computational complication. In this paper to decrease this computational complexity, one fast configuration was presented to optimize the ME process by using CU partitioning fast mode decision algorithm and a diamond search. A reduction of 46.75% in the encoding time is obtained without inducing a significant degradation in encoding performance in terms of video quality or bitrate.

As perspectives, additional optimizations will be also implemented to reduce the encoder complexity via digital platform for video processing.

We will also exploit the fast configuration detailed in this paper for the new compression standard Joint Video Exploration Team (JVET) [28, 29].

How to cite and reference

Link to this chapter Copy to clipboard

Cite this chapter Copy to clipboard

Randa Khemiri, Nejmeddine Bahri, Fatma Belghith, Soulef Bouaafia, Fatma Elzahra Sayadi, Mohamed Atri and Nouri Masmoudi (October 23rd 2019). Fast Motion Estimation’s Configuration Using Diamond Pattern and ECU, CFM, and ESD Modes for Reducing HEVC Computational Complexity, Digital Imaging, Muhammad Sarfraz, IntechOpen, DOI: 10.5772/intechopen.86792. Available from:

chapter statistics

111total chapter downloads

More statistics for editors and authors

Login to your personal dashboard for more detailed statistics on your publications.

Access personal reporting

Related Content

This Book

Next chapter

Digital Image Processing Applied to Optical Measurements

By Alonso Saldaña-Heredia, Pedro Antonio Márquez-Aguilar, Álvaro Zamudio Lara and Arturo Molina-Ocampo

Related Book

First chapter

Knowledge Discovery and Information Extraction on the Open Internet Using MATLAB and Amazon Web Services (AWS)

By Kelly Bennett and James Robertson

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

More About Us