Watermarking Technique for Multimedia Documents in the Frequency Domain

Digital watermarking is used to secure multimedia documents and to preserve their authenticity and integrity. It can be applied to images, audio, and video. So that the approach does not depend on the nature of the signals in the document to be watermarked, this chapter proposes two watermarking techniques, one for audio and one for images. Together they watermark a video that contains both components. The image watermarking technique combines the MDCT with the Watson model and a motion detection algorithm. The audio watermarking technique combines the MDCT with a psychoacoustic model. In both techniques, the bits of the mark are duplicated to exploit the available insertion capacity and improve robustness, and are then inserted into the least significant bit (LSB). An error correction code (Hamming) is applied to the mark to make detection more reliable. To assess robustness and imperceptibility, we compare the proposed techniques with other existing techniques.


Introduction
The spread of multimedia documents, driven by the development of computer-related technologies, is moving the world toward an era in which digital content plays a central role. The Internet and, more generally, new means of communication have also enabled large-scale dissemination of digital data. Despite these advantages, serious problems remain: multimedia documents are left unprotected, digital data are distributed illegally, and copyrights are not protected.
Digital watermarking addresses these problems as a security mechanism complementary to encryption. Its basic idea is to insert information into multimedia documents in a robust and imperceptible way [1]. According to the literature, digital watermarking began to receive substantial interest as a research topic in the 1990s [2,3]. Over the past 28 years, work on digital watermarking has continued to grow. The goal is watermarking techniques for multimedia documents that meet three criteria: robustness against as many attacks and manipulations as possible, high insertion capacity, and imperceptibility of the mark. An appropriate watermarking system must provide the best compromise between these three main features (Figure 1).
A watermarking system consists mainly of two processes: insertion and detection. The insertion process embeds a mark W into a multimedia document M to obtain the watermarked document M'. Some watermarking systems use a secret key C to perform the insertion. The watermarked document M' may then undergo transformations that yield a document M''. After that, the mark is detected. There are several detection schemes:
- In the private scheme, the original document is given to the detector, and the mark is detected by comparing the original with the watermarked document.
- In the semi-private scheme, the detector answers whether the mark is present or absent (true or false) without using the original document.
- In the blind scheme, only the secret key is needed to extract the mark.

When designing a watermarking system, the choice of the insertion domain is a very important step [4,5]. There are three major insertion domains: the domain without transformation, the frequency domain, and the multiresolution domain. The domain without transformation is the spatial domain for images and video and the time domain for audio. Methods operating in this domain are very fast, since no preliminary transformation is necessary. However, this domain offers little resistance to existing attacks. The frequency domain is obtained by applying a transformation such as the fast Fourier transform (FFT) or the discrete cosine transform (DCT) [6]. The main benefit of the transformed domain is that communication standards already use it to represent multimedia information: JPEG for still images [7], MPEG2 for video sequences [8], and MPEG1 for audio [9].
Techniques operating in the frequency domain are robust against compression, since they use the same space as the one used for coding. New compression standards such as JPEG2000 [7] and MPEG4 [8] have led researchers to use other insertion domains, such as the multiresolution domain [10]. Information represented in this domain is well localized in both frequency and time. Sub-band decomposition isolates the low-frequency components. The middle- and high-frequency components form a less sensitive insertion space.
Figure 1. Compromise between robustness, ratio, and imperceptibility.
Below, we present some video watermarking techniques from the literature.
• Shaveta and Daljit [11]: the authors apply the SWT to the images of the video. They then apply the SVD to each sub-band of the red layer and replace the singular values of the HH band with the singular values of the HH band of the mark. For the other two layers, they select the block with the highest singular values and apply the DCT to the selected band. Finally, they insert the mark into each of the selected bands. The detection scheme is the inverse of the insertion scheme.
• Shital et al. [12]: the authors use watermarking to detect tampering in a video. The technique operates in the frequency domain and uses the DCT as the transformation. First, they generate the mark from the hash value of the frame, the micro-block numbers, and the frame number. They then insert the mark into the frames in the frequency domain by replacing the LSB of the highest non-zero DCT coefficient with the corresponding mark bit.
• Supriya and Navin [13]: the authors propose a hybrid video technique based on the discrete wavelet transform and singular value decomposition. The original video images are first converted into the YCbCr color space, and the mark is inserted there. The luminance (Y) component is decomposed into four sub-bands with a discrete wavelet transform. Finally, the singular values of the LL sub-band are perceptually shaped by the singular values of the watermark image. The detection scheme is the inverse of the insertion scheme.
In this chapter, we propose a watermarking system for multimedia documents based on the following ideas:
• The frequency domain offers good robustness and imperceptibility. We therefore use the modified discrete cosine transform (MDCT) to move to the frequency domain.
• Temporal methods based on the least significant bit (LSB) give good results in terms of imperceptibility, insertion capacity, and robustness. We therefore apply LSB insertion in the frequency domain rather than the time domain, to benefit from the advantages of the frequency domain.
• To obtain blind detection and to reduce the error rate, we use a substitution method combined with an error correction code.
• To select the insertion positions, we use psychoacoustic model 2 of MPEG1 for the audio component. For the image component, we use the properties of the human visual system through the Watson model, together with a motion detection algorithm.
• Finally, to improve robustness against attacks, we duplicate the bits of the mark several times.
This chapter is organized as follows. Section 2 details related work and the insertion and detection processes of the proposed techniques. Section 3 presents the experimental results and compares the proposed watermarking system with other systems from the literature. The last section concludes this work.
… [14]. The MDCT coefficients are split into two bands: a high-frequency band and a low-frequency band. In our work, we use a modified version of the MDCT.
The direct MDCT (Eq. (1)) and the inverse MDCT (Eq. (2)) for the audio signal use the following notation. For the direct transform:
• x(n) is sample number n,
• k is the index of the frequency line (k ∈ [0, N − 1]).
For the inverse transform:
• n is the index of the temporal sample (n ∈ [0, N]),
• k is the index of the frequency line (k ∈ [0, N]).
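The following Python sketch shows how an MDCT behaves. It implements the standard textbook MDCT, which maps 2N windowed samples to N coefficients, with 50 % overlap-add reconstruction. It is not the modified MDCT used in this chapter, which maps a 1024-sample block to 1024 coefficients. The sine window, the 2/N scaling of the inverse, and the function names are assumptions made for illustration.

```python
import math

def mdct(frame):
    """Standard MDCT: 2N time samples -> N frequency coefficients."""
    two_n = len(frame)
    n = two_n // 2
    n0 = 0.5 + n / 2
    return [sum(frame[t] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                for t in range(two_n))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (2/N scaling for a sine window)."""
    n = len(coeffs)
    n0 = 0.5 + n / 2
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]

def sine_window(two_n):
    # Satisfies the Princen-Bradley condition w[t]^2 + w[t + N]^2 = 1.
    return [math.sin(math.pi * (t + 0.5) / two_n) for t in range(two_n)]

def mdct_roundtrip(signal, n):
    """Analyse frames of 2N samples with hop N, then overlap-add the inverse transforms."""
    w = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * (2 * n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        frame = [padded[start + t] * w[t] for t in range(2 * n)]
        y = imdct(mdct(frame))
        for t in range(2 * n):
            out[start + t] += y[t] * w[t]   # synthesis window, then overlap-add
    return out[n:n + len(signal)]           # drop the padding
```

Each inverse transform contains time-domain aliasing. Adding the overlapping halves of consecutive frames cancels it (TDAC), so `mdct_roundtrip` returns the input signal up to rounding error.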
The image is processed in two-dimensional blocks, so we use the two-dimensional MDCT.
The direct MDCT (Eq. (3)) and the inverse MDCT (Eq. (4)) for the image signal use the following notation:
• N1 × N1 is the size of the image I,
• I(i, j) is the value of the pixel at position (i, j) of the image I.
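A two-dimensional MDCT can be built separably by applying the one-dimensional transform first along the rows of a block and then along its columns. The sketch below assumes this separable construction and the textbook MDCT, under which a 2N × 2N block yields N × N coefficients. The chapter's modified 2-D MDCT may differ, and `mdct2` is an illustrative name.

```python
import math

def mdct(frame):
    """Standard 1-D MDCT: 2N real samples -> N coefficients."""
    two_n = len(frame)
    n = two_n // 2
    n0 = 0.5 + n / 2
    return [sum(frame[t] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                for t in range(two_n))
            for k in range(n)]

def mdct2(block):
    """Separable 2-D MDCT (assumed construction): rows first, then columns."""
    rows = [mdct(row) for row in block]             # 2N x 2N -> 2N x N
    cols = [mdct(list(col)) for col in zip(*rows)]  # transform each of the N columns
    return [list(r) for r in zip(*cols)]            # transpose back -> N x N
```

For an 8 × 8 pixel block, `mdct2` returns 4 × 4 coefficients. Because the transform is separable, the MDCT of an outer-product block u·vᵀ is the outer product of the 1-D MDCTs of u and v.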

Motion detection
To improve the robustness of the video watermarking technique, it is preferable to insert the mark into moving objects [15,16]. We therefore use the motion detection algorithm proposed by Peddireddi [17] to identify the moving objects in the video into which the mark bits are inserted. The algorithm consists of four main blocks, presented in Figures 2 and 3.

JND
JND (just noticeable difference), also known as the just perceptible difference or differential threshold, is the minimum change in the intensity of a stimulus that produces a noticeable variation in sensory experience [18]. The Watson model uses this measure and consists of the following steps:
• Change the domain of study by computing the DCT.
• Define the quantization matrix. The model uses the quantization matrix Qm of the JPEG standard [19].
• Compute the frequency sensitivity coefficients.
• Compute the luminance sensitivity.
• Compute the contrast masking threshold M.
• Finally, divide the quantization error E by M to obtain the JND threshold.
In our work, we modify this model. To change the domain of study, we use the MDCT instead of the DCT in order to exploit its advantages. Two further reasons motivate this choice: the MDCT has better coding performance than the DCT, and its computational complexity has been reduced in recent years.
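The perceptual thresholds of the Watson model can be sketched in Python as commonly implemented: frequency sensitivity, then luminance masking, then contrast masking. Following the chapter, the JPEG luminance table Qm stands in for the frequency-sensitivity table. The exponents 0.649 and 0.7 are the values usually quoted for Watson's model. Several choices here are assumptions rather than the chapter's method:
- DC magnitudes are used, so that MDCT blocks with a negative DC term work.
- The contrast-masked slack is used directly as the JND, instead of dividing a quantization error E by M.
- The name `watson_slacks` is illustrative.

```python
# JPEG luminance quantization table Qm (ITU-T T.81, Annex K), used here as the
# frequency-sensitivity table t[i][j] (assumption based on the chapter).
JPEG_QM = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]
A_T = 0.649  # luminance-masking exponent usually quoted for Watson's model
W = 0.7      # contrast-masking exponent usually quoted for Watson's model

def watson_slacks(blocks):
    """blocks: list of 8x8 coefficient blocks (DCT in Watson's model, MDCT here).

    Returns one 8x8 matrix of perceptual slacks per block, used as JND thresholds.
    """
    # Mean DC magnitude over the image; abs() avoids complex powers for negative DC terms.
    mean_dc = sum(abs(b[0][0]) for b in blocks) / len(blocks)
    slacks = []
    for b in blocks:
        # Luminance masking: brighter-than-average blocks tolerate larger changes.
        lum = (abs(b[0][0]) / mean_dc) ** A_T if mean_dc else 1.0
        s = [[0.0] * 8 for _ in range(8)]
        for i in range(8):
            for j in range(8):
                t_l = JPEG_QM[i][j] * lum
                # Contrast masking: large coefficients hide larger changes.
                s[i][j] = max(t_l, abs(b[i][j]) ** W * t_l ** (1 - W))
        slacks.append(s)
    return slacks
```

A coefficient would then be eligible for substitution when its change Var_coef stays below the corresponding entry of the returned matrix.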

Psychoacoustic model
In our work, we use psychoacoustic model 2 of the MPEG1 standard. We incorporate this model into the proposed watermarking technique for the audio component of the video, when there is one, to find insertion positions. This model does not explicitly separate tonal and non-tonal components. Instead, it computes tonality indices that indicate whether components behave as tonal or non-tonal (noise-like) [9]. The model is applied to time frames and computes a masking curve, denoted thrω. Figure 4 shows the masking curve thrω for a selected test signal.

Insertion scheme
The adopted scheme is summarized in Figure 5.
This section gives the general principle of mark insertion for the video watermarking technique. The technique combines a proposed watermarking technique for still images with a proposed technique for audio. The mark is inserted into moving objects and into non-successive images, for two reasons:
• Successive images are strongly correlated, so an attacker can easily detect and remove a mark inserted in them.
• Moving objects play a very important role, for example in MPEG4 compression.
Therefore, to guarantee good robustness, especially against compression, we insert the mark bits into the moving objects of the video. This also improves invisibility, since the mark moves with the objects.
1. The input signal is an uncompressed video file, which may or may not include an audio component.

2. After reading the original video, we separate its audio and image components. To do so, we first check whether the video has an audio component. If it does not, we only extract the images that make up the video.

3. In this technique, we insert the mark "Mark1" into the audio component and the mark "Mark2" into the image component. Before inserting them, we binarize both marks. The insertion process accepts any type of mark (text, image, or beep sound). The length of each mark is chosen as a multiple of 8, so binarization gives two binary vectors whose lengths are multiples of 8. We can then apply a Hamming (12,8) code [20] to each byte of the binary vectors. Attacks can flip inserted bits (from 0 to 1 or from 1 to 0), and the Hamming code corrects such errors when necessary, which improves the detection rate of the two marks. Hamming (12,8) is a linear code that adds 4 control bits to each 8-bit word and can correct one flipped bit per 12-bit codeword. The result is two coded bit vectors, representing the two coded marks, whose lengths are multiples of 12.

4. To make the technique robust against various manipulations, we insert the bits of "Mark2" into non-successive images. A selection module therefore selects E images among the D images of the video. A motion detection algorithm then detects the moving objects in these images and outputs the images of the moving objects.

5. Insertion scheme proposed for the image: we watermark the images of the detected moving objects.

2. Replicate the edges of the image so that its dimensions are multiples of 8.

3. Decompose the image into blocks of 8 × 8 pixels in the spatial domain.

4. Move to the frequency domain by applying the MDCT (Eq. (3)) to each 8 × 8 block to obtain its frequency coefficients.

5. Separate the frequencies and extract the low-frequency band. We insert the mark bits into the low-frequency band because it is much less sensitive to attacks than the high-frequency band. This step gives the low-frequency coefficients of each block.
6. The human eye is more sensitive to noise introduced into the low-frequency band. We therefore use the Watson model to find the least perceptible insertion positions in this band. The model computes the just noticeable difference (JND) for each frequency coefficient of each block.

7. Substitution-based insertion of the mark bits: we look for insertion positions that belong to the low-frequency band and keep the mark imperceptible (Figure 6).
• Select a coefficient of the low-frequency band.
• Select the least significant bit (LSB) of the binary representation of the coefficient.
• Replace the LSB with the current bit of the watermark bit stream.
• Compute the decimal value of the watermarked coefficient.
• Compute the difference Var_coef between the coefficient before and after insertion of the mark bit.
• Compare Var_coef with the corresponding entry of the JND matrix generated by the Watson model.
• If Var_coef < JND, the watermark bit can be inserted at this position, because the change to the coefficient is not noticeable.
• Otherwise, insertion at this position would be visible, and the position is not used.
The insertion is performed on all blocks of the image to improve robustness. To this end, each bit of the mark is duplicated F times. F is computed from NBCom_INV, the number of components where insertion is invisible to the eye, and from the mark length Lmark:
$$F = \left\lfloor \frac{NBCom\_INV}{L_{mark}} \right\rfloor$$
This step gives a watermarked block in the frequency domain.
8. Return to the spatial domain by applying the IMDCT (Eq. (4)) to reconstruct the watermarked image.
All the previous steps are applied to every block of the image and to every selected image of the video.
6. Insertion scheme proposed for the audio: we integrate psychoacoustic model 2 into the insertion process to use its properties when searching for insertion positions. As for the image, this technique operates in the frequency domain using the MDCT (Eq. (1)). The insertion process consists of the following steps:
1. Decompose the original audio signal into blocks of 1024 samples each (23 ms duration).

2. Apply psychoacoustic model 2 to each 1024-sample time frame obtained in the previous step. The model generates a masking curve thrω.

3. In parallel with the previous step, apply the MDCT (Eq. (1)) to each block of 1024 samples to move to the frequency domain. This gives blocks of 1024 frequency coefficients.

4. Extraction of low frequencies: the coefficients are split into low and high frequencies. For each block of frequency components, the low-frequency band is the first half of the block, up to index N/2 (N = 1024).

5. Substitution-based insertion: we inject the watermark bits into the frequency components of the low-frequency band that lie under the masking curve thrω (Figure 7).
We look for the insertion positions Po that belong to the low-frequency band and lie under the curve. Binarization and Hamming coding of Mark1 give a binary sequence bi ∈ {0, 1} of length Lmark1. To improve the robustness of the proposed technique, each bit of bi is duplicated F1 times. F1 is the integer part of the ratio between NB_TH, the number of components at positions Po, and the mark length Lmark1.
This gives a binary sequence b'i ∈ {0, 1} of length L'mark1 (that is, F1 × Lmark1).
After finding the frequency components at the Po positions, we binarize their values. We then replace the least significant bit (LSB) of each component with the current bit of the watermark message. The result is a watermarked block in the frequency domain.
6. Return to the time domain by applying the IMDCT (Eq. (2)) to reconstruct the watermarked audio. All the previous steps are applied to every block of the audio.

7. After obtaining the watermarked audio signal and the watermarked images, we combine these two components (audio and image) to form the final watermarked video signal.
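The substitution steps shared by both insertion schemes can be sketched in Python: LSB replacement under a threshold, and duplication of each bit F times with F = ⌊NB / Lmark⌋. The threshold is the JND for the image or the masking level for the audio. This is a minimal illustration, not the authors' implementation. The chapter does not specify the following details, so they are assumptions:
- Coefficients are rounded to integers before their LSB is replaced.
- A position is usable only when both possible bit values keep the change below its threshold, so the number of usable positions can be counted before embedding.
- Duplication is removed at detection by majority vote.
- The names `embed` and `extract` are hypothetical.

```python
def _with_lsb(value, bit):
    """Round a coefficient to an integer and force its least significant bit."""
    q = round(value)
    return (q & ~1) | bit

def eligible_positions(coeffs, thresholds):
    """Positions where either bit value changes the coefficient by less than its threshold."""
    return [i for i, (c, t) in enumerate(zip(coeffs, thresholds))
            if max(abs(_with_lsb(c, b) - c) for b in (0, 1)) < t]

def embed(coeffs, thresholds, bits):
    positions = eligible_positions(coeffs, thresholds)
    f = len(positions) // len(bits)              # duplication factor F = floor(NB / Lmark)
    if f == 0:
        raise ValueError("not enough imperceptible positions for this mark")
    repeated = [b for b in bits for _ in range(f)]  # each bit duplicated F times
    used = positions[:len(repeated)]
    marked = list(coeffs)
    for pos, bit in zip(used, repeated):
        marked[pos] = _with_lsb(marked[pos], bit)
    return marked, used, f                       # (used, f) play the role of the key

def extract(marked, used, f):
    raw = [round(marked[p]) & 1 for p in used]   # read the LSBs at the key positions
    # Majority vote over each group of F copies (ties give 0).
    return [1 if 2 * sum(raw[k:k + f]) > f else 0 for k in range(0, len(raw), f)]
```

Majority voting is one reasonable way to "eliminate the duplication". It lets a bit survive even when some of its copies were flipped by an attack.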

Detection scheme
Figure 7 shows the masking curve thrω (in blue) and the curve of the low-frequency samples (in red) for a chosen signal.
Detection is blind: the original document is not needed, and only the secret key is required to extract the mark. Detection is the reverse of insertion. To detect the two inserted marks, Mark1 and Mark2, we need the keys "Key1" and "Key2", which contain:
• The duplication factors F and F1, that is, the number of times each bit was inserted.
• The list of positions of the components under the masking curve, found by psychoacoustic model 2 in the insertion phase.
• The positions of the components found by the Watson model in the insertion phase.
The input of the detection process is the watermarked video produced by the insertion process. We first separate the audio component (if any) from the image component. We then use the two keys (Key1 and Key2) to extract the mark inserted into each component.
1. From the positions stored in the key, we detect the message bits inserted in the corresponding components. The result is a binary vector that contains the watermark bits of the coded signature, with each bit duplicated F times. We then use the parameter F to remove the duplication and recover the mark bits. The result is the extracted coded binary mark, whose length is a multiple of 12.

2. Hamming decoding finally yields the useful, corrected binary mark, whose length is a multiple of 8.
1. Detection scheme proposed for the audio: we decompose the watermarked audio signal into blocks of 1024 samples and apply the MDCT to each block to move to the frequency domain. We then detect the bits of the mark:
• The positions of the watermarked components under the masking curve were found by psychoacoustic model 2 in the insertion phase. From these positions, we determine the values of the components. As in the insertion process, we binarize these values and read the least significant bit of each one to recover the inserted message. This gives a binary sequence with duplication, of length L'mark1. We then use the parameter F1 to remove the duplication and recover the mark bits. The result is the extracted coded binary mark, whose length is a multiple of 12.
• Hamming decoding finally yields the useful, corrected binary mark, whose length is a multiple of 8.
• Reconstruction of the final mark.
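The Hamming (12,8) coding and decoding used by both schemes can be sketched in Python with the classic layout: control bits at positions 1, 2, 4, and 8 and data bits at the remaining positions. The chapter does not state which bit layout it uses, so this layout and the helper names are assumptions.

```python
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]  # 1-based positions of the 8 data bits
PARITY_POS = [1, 2, 4, 8]               # 1-based positions of the 4 control bits

def encode_byte(data):
    """Hamming (12,8): 8 data bits -> 12-bit codeword with even parity."""
    code = [0] * 13                      # index 0 unused
    for pos, bit in zip(DATA_POS, data):
        code[pos] = bit
    for p in PARITY_POS:
        # Control bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 13) if i & p and i != p) % 2
    return code[1:]

def decode_block(block):
    """Correct at most one flipped bit, then return the 8 data bits."""
    code = [0] + list(block)
    syndrome = 0
    for i in range(1, 13):
        if code[i]:
            syndrome ^= i                # XOR of the positions that hold a 1
    if 1 <= syndrome <= 12:
        code[syndrome] ^= 1              # a non-zero syndrome is the error position
    return [code[pos] for pos in DATA_POS]

def encode_bits(bits):
    return [c for k in range(0, len(bits), 8) for c in encode_byte(bits[k:k + 8])]

def decode_bits(bits):
    return [d for k in range(0, len(bits), 12) for d in decode_block(bits[k:k + 12])]

def text_to_bits(text):
    return [int(b) for ch in text.encode("ascii") for b in format(ch, "08b")]
```

For example, the 17-character text mark "audiowatermarking" gives 136 bits, and `encode_bits` turns them into 204 coded bits. A 32 × 32 binary logo (1024 bits) becomes 1536 coded bits.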

Experimental results and comparative analysis
This section presents all the experimental results in detail. The algorithm was implemented in MATLAB R2013a and tested on a computer with an Intel(R) Core(TM) i7-6500U CPU at 2.59 GHz and 8 GB of memory. The experimental corpus consists of six videos in .avi format (Table 1).

PSNR
Peak signal-to-noise ratio (PSNR) is an objective quality measure expressed in decibels (dB). It measures the quality of the altered (watermarked) image relative to the original image. We use it to evaluate the invisibility of our watermarking system. The PSNR definition uses the following notation:
• I(r, i, j) and I'(r, i, j): values of pixel (i, j) in the r-th image of the original and watermarked video.
• R : total number of video frames.
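A video PSNR can be sketched in Python as follows. The sketch assumes 8-bit pixels (peak value 255) and a mean squared error pooled over all pixels of all R frames. This is one common choice; the chapter's exact normalization (for example, averaging per-frame PSNR values instead) may differ. The name `video_psnr` is illustrative.

```python
import math

def video_psnr(original, watermarked, peak=255.0):
    """original, watermarked: lists of R frames, each a 2-D list of pixel values."""
    sq_err, count = 0.0, 0
    for frame, frame_w in zip(original, watermarked):
        for row, row_w in zip(frame, frame_w):
            for a, b in zip(row, row_w):
                sq_err += (a - b) ** 2   # squared error of pixel (r, i, j)
                count += 1
    if sq_err == 0:
        return math.inf                  # identical videos
    mse = sq_err / count
    return 10 * math.log10(peak ** 2 / mse)
```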

SNR
Signal-to-noise ratio (SNR) measures the similarity between the watermarked audio and the original audio. It is usually expressed in decibels (dB) and is defined as:
$$SNR = 10 \log_{10} \frac{\sum_{n} x^2(n)}{\sum_{n} \left(x(n) - x'(n)\right)^2}$$
where:
• x(n): sample number n of the original signal.
• x'(n): sample number n of the watermarked signal.
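The SNR definition translates directly into a short Python function. The name `snr_db` is illustrative, and returning infinity for a zero-noise signal is a convention chosen for the sketch.

```python
import math

def snr_db(x, x_w):
    """SNR in dB between an original signal x and its watermarked version x_w."""
    signal = sum(v * v for v in x)                     # energy of x(n)
    noise = sum((a - b) ** 2 for a, b in zip(x, x_w))  # energy of x(n) - x'(n)
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)
```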

Objective difference grade
Objective difference grade (ODG) is a score computed by the PEAQ algorithm [21]. The algorithm compares the original signal with the watermarked signal and assigns a score between 0 and −4:
- ODG = 0: no degradation.
- ODG between −0.1 and −1: the degradation is perceptible but not annoying.
- ODG between −1.1 and −2: the degradation is slightly annoying.
- ODG between −2.1 and −3: the degradation is annoying.
- ODG between −3.1 and −4: the degradation is very annoying.

Universal quality index
The universal quality index (UQI) was proposed in [22]. It is an objective measure of the visual quality of images. Its values lie in [−1, 1], and it reaches 1 only when the two images are identical. Higher UQI values indicate better imperceptibility. The UQI is defined by:
$$UQI = \frac{4\,\sigma_{II'}\,\bar{I}\,\bar{I'}}{\left(\sigma_I^2 + \sigma_{I'}^2\right)\left(\bar{I}^2 + \bar{I'}^2\right)}$$
where:
• Ī and Ī' are, respectively, the mean values of the original image I and the processed image I'.
• σ²_I and σ²_I' are, respectively, the variances of I and I'.
• σ_II' is the covariance of I and I'.
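The UQI formula can be computed in Python as a single global value over the whole image. Wang and Bovik compute it in sliding windows and average the local values, so this global version is a simplification. The name `uqi` is illustrative.

```python
def uqi(img, img_w):
    """Global universal quality index of two equal-size grayscale images (2-D lists)."""
    x = [float(v) for row in img for v in row]
    y = [float(v) for row in img_w for v in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                                 # means
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)                    # variance of I
    vy = sum((b - my) ** 2 for b in y) / (n - 1)                    # variance of I'
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)  # covariance
    # Undefined (division by zero) when both images are constant.
    return 4 * cxy * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))
```

The index combines three factors: correlation, closeness of the means, and closeness of the contrasts. This is why any shift in brightness or contrast lowers it below 1.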

NC
To test the robustness of the technique against attacks, we compute the normalized correlation NC between the original inserted mark and the mark detected after the watermarked files have been subjected to different attacks. For the image, the normalized cross-correlation uses the following notation:
• bin is the binary vector of the inserted mark.
• bin' is the binary vector of the mark detected after the attacks.
• Lmark2 is the length of the inserted mark.
For audio, the normalized cross-correlation uses the following notation:
• bi is the binary vector of the inserted mark.
• bin is the binary vector of the mark detected after the attacks.
• Lmark1 is the length of the inserted mark.
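The Python sketch below uses one common definition of normalized correlation for binary marks: the inner product of the inserted and detected vectors divided by the product of their norms. This is an assumption, since the chapter's formulas (normalized with Lmark2 and Lmark1) may differ. The name `nc` is illustrative.

```python
import math

def nc(b, b_detected):
    """Normalized cross-correlation between the inserted and the detected binary marks."""
    num = sum(x * y for x, y in zip(b, b_detected))
    den = math.sqrt(sum(x * x for x in b)) * math.sqrt(sum(y * y for y in b_detected))
    return num / den if den else 0.0
```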

Marks
• Mark1: in the audio component of the video, we insert the text mark "audiowatermarking", of length 136 bits. After Hamming coding, its length reaches 204 bits, and each bit is then duplicated F1 times.
• Mark2: in the image component of the video, we insert the image "logo.bmp", of size 32 × 32 pixels. After Hamming coding, its length reaches 1536 bits, and each bit is then duplicated F times (Figure 9).
Table 2 gives the PSNR, UQI, SNR, and ODG values obtained in the imperceptibility tests.

Imperceptibility
Figure 10 compares the original image (a) of the video horses.avi with its watermarked version (b). We notice no visible difference, and the two images appear identical. The proposed watermarking technique therefore does not affect image quality, and the inserted mark remains invisible to the human eye. The spectrogram in (c) also closely matches the one in (d), which confirms the imperceptibility of the technique.

Robustness
To evaluate robustness, we apply the following attacks to the audio and image components of the video:
- MP3 compression/decompression with the MPEG1 encoder "lame.exe" at 128, 96, and 64 kbit/s.
- Impulsive noise and Gaussian noise.
- Cropping.
- Frame swapping, frame dropping, and frame averaging.
- Change of the coding rate.
We compute the NC values between the mark before and after the attacks for both components (Table 3).
The NC values of the watermarking system range from 0.85 to 1, which is very satisfactory. NC = 1 means that the mark detected after the attack is identical to the original mark. The watermarking system is also robust against MPEG1 and MPEG2 compression.

Comparative analysis
According to our review of existing work, video watermarking techniques watermark only the image component. Watermarking both components is therefore one of the contributions of our watermarking system.
In Table 3, the notation "-" indicates that data are not available. The PSNR values in Table 4 show that the proposed watermarking system gives the best imperceptibility (PSNR = 60 dB). The proposed technique also performs well against attacks, with NC values between 0.89 and 0.97.
Compared with the technique of Dolley and Manisha [24], the proposed technique is more robust against Gaussian noise, and the results for the other attacks are close. However, according to its description, their detection scheme requires the original video, whereas our method requires only the keys, which makes detection faster. The proposed method also gives better results than the technique of Chitrasen and Tanuja [26], which shows the contribution of our watermarking system.

Conclusion
In this chapter, we proposed a watermarking system for multimedia documents that operates in the frequency domain using the MDCT. The results lead to two conclusions:
• The frequency domain, in particular the MDCT, has proven effective in terms of imperceptibility and robustness, and it remains a very promising area.
• The performance of the proposed watermarking system is improved by psychoacoustic model 2 of the MPEG1 standard, the Watson model, the motion detection algorithm, LSB insertion, and Hamming coding.