AWGN Watermark in Images and E-Books – Optimal Embedding Strength

One important class of watermarking algorithms is based on the spread spectrum technique (Cox et al., 1997, 2008; Feng et al., 2005, 2006; Mora-Jimenez & Navia-Vazquez, 2000; Perez-Freire & Perez-Gonzalez, 2009; Ruanaidh & Pun, 1998; Ruanaidh & Csurka, 1999): the information about each message bit is spread across several image pixels, or across the whole image. Such embedding is usually performed by adding a pseudorandom data vector to the image.


Embedding
The inputs to the embedding program are the original image, the message bit and the watermark key. The watermark key is a secret number, the same for the embedder and the detector. The embedder output is the watermarked image.
We represent the grayscale image $c_0$, of $m \times n$ pixels, by an $m \times n$ matrix or, equivalently, by a vector of length $mn$. In this text we use the terms image vector, image matrix and image as synonyms (where this causes no confusion).
We embed one message bit (binary one or zero) into the image according to the formulas (see footnote 1):

$$c_{w1} = c_0 + \alpha r_w, \qquad c_{w0} = c_0 - \alpha r_w \qquad (1)$$

The image vectors $c_{w1}$ and $c_{w0}$ arise from embedding a binary one or a binary zero into the image, $c_0$ is the original image vector before embedding, $\alpha$ is the embedding strength coefficient ($\alpha > 0$) and $r_w$ is the reference pattern.
The reference pattern is a pseudorandom vector drawn from $N(0,1)$ (the standard normal distribution), of the same dimension as $c_0$. The watermark key is used as the seed of the pseudorandom generator that produces it.
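The embedding step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `reference_pattern` and `embed_bit`, the use of Python's `random.Random` as the keyed generator, and the toy gray ramp are all hypothetical, and the 8-bit clipping of pixel values is ignored.

```python
import random

def reference_pattern(key, n):
    """Pseudorandom N(0,1) vector of length n, seeded by the watermark key."""
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def embed_bit(c0, bit, key, alpha):
    """Return c0 + alpha*r_w for a binary one, c0 - alpha*r_w for a binary zero."""
    r_w = reference_pattern(key, len(c0))
    sign = 1.0 if bit == 1 else -1.0
    return [p + sign * alpha * r for p, r in zip(c0, r_w)]

# Toy "image": a flattened 8x8 gray ramp (hypothetical test data).
c0 = [float(4 * i) for i in range(64)]
c_w1 = embed_bit(c0, 1, key=1234, alpha=2.0)
c_w0 = embed_bit(c0, 0, key=1234, alpha=2.0)
```

Because the pattern depends only on the key and the image dimension, the detector can regenerate exactly the same vector from the shared key.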

Detection
Inputs to the detector are the image and the watermark key (the same as in the embedder).
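Its output is the detected watermark message. As a detection measure, we use the linear correlation between the image and the reference pattern:

$$lc(c, r_w) = \frac{c \cdot r_w}{\|r_w\|^2} \qquad (2)$$

($\cdot$ denotes the dot product, and $\|\ \|$ the vector norm).
The detector compares the computed linear correlation value with the threshold value $\tau$, set in advance, and reports what is embedded:

$$\text{binary one if } lc(c, r_w) \ge \tau; \qquad \text{binary zero if } lc(c, r_w) \le -\tau; \qquad \text{no watermark if } -\tau < lc(c, r_w) < \tau \qquad (3)$$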
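A possible sketch of the detector follows. It assumes the linear correlation is normalized as $(c \cdot r_w)/\|r_w\|^2$, so that embedding with strength $\alpha$ shifts it by exactly $\alpha$; the function names, the key, and the toy image are hypothetical choices made for illustration.

```python
import random

def reference_pattern(key, n):
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def linear_correlation(c, r):
    """lc(c, r) = (c . r) / ||r||^2  (assumed normalization)."""
    dot = sum(ci * ri for ci, ri in zip(c, r))
    return dot / sum(ri * ri for ri in r)

def detect(c, key, tau):
    """Return 1, 0, or None (no watermark) by thresholding lc at +tau and -tau."""
    lc = linear_correlation(c, reference_pattern(key, len(c)))
    if lc >= tau:
        return 1
    if lc <= -tau:
        return 0
    return None

# Toy example: a dark, low-energy image, so lc(c0, r) is small compared with tau.
n, key, alpha, tau = 4096, 7, 5.0, 2.0
c0 = [10.0 + (i % 8) for i in range(n)]
r_w = reference_pattern(key, n)
c_w1 = [p + alpha * r for p, r in zip(c0, r_w)]
c_w0 = [p - alpha * r for p, r in zip(c0, r_w)]
results = (detect(c_w1, key, tau), detect(c_w0, key, tau), detect(c0, key, tau))
print(results)
```

With these toy values the original image correlates only weakly with the pattern, so the three calls are expected to report a binary one, a binary zero, and no watermark respectively.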

Embedding of longer message
We embed a $k$-bit message into the image by repeatedly ($k$ times) embedding one bit. We call this embedding message bits one over another.
An alternative solution is embedding into subimages: we divide the image into disjoint subimages, and then embed a one-bit message into each subimage. By subimage we mean any subset of the set of image pixels. Subimages may, but need not, be composed of adjacent pixels.
Different variants of the stated methods are also possible. For example, we may embed several message bits into every subimage.

Optimal embedding strength
It is not easy to produce a good imperceptible digital watermark, because it must satisfy two opposing requirements:
- Detectability: in any case, it needs to be detectable.
- Fidelity: it needs to be imperceptible to a casual observer.

Footnote 1: More precisely, the resulting values in the matrices must belong to the permissible set of image pixel values. Therefore, it is more precise to say $c_{w1} = [c_0 + \alpha r_w]_8$ and $c_{w0} = [c_0 - \alpha r_w]_8$. The sign $[\ ]_8$ designates "squeezing" the coordinates of the result vector into 8-bit values, i.e. into values from the set {0, 1, 2, ..., 255}. Here, however, for simplicity, we ignore this (small) difference.

www.intechopen.com
Watermarking - Volume 1
These two demands conflict. If the watermark is embedded more strongly, it is more likely to be detectable. On the other hand, a strongly embedded watermark is more noticeable. Detectability is surely the more important condition, the "essential requirement". Thus, we need to determine the optimal embedding strength: the minimal one that guarantees detectability.
In some situations it may be expected that the watermarked image will be subjected to some modification between embedding and detection. Under these circumstances it is also useful to know the optimal strength: the minimal one that ensures watermark detectability after this (expected) modification.
Two concepts are closely related to watermark detectability: effective and robust embedding.
A watermark is embedded effectively if it is detectable immediately after embedding.
If the message is detectable in a digital image that has been subjected to some modification after embedding, we say the watermark is robust against that modification.

Embedding effectiveness
The embedding strength coefficient $\alpha$ and the detection threshold $\tau$ directly influence the embedding effectiveness.
The coefficient $\alpha$ needs to be big enough (effective embedding), but not too big (quality requirement).
If we reduce the detection threshold $\tau$ (set it close to, or even equal to, zero), embedding will be more effective (the detector will report the image as watermarked in a higher percentage of embedding cases). But if the threshold $\tau$ is too small, the detector may reply that the image is watermarked even when this is not true.
It is very important to find the right balance: to set the threshold $\tau$ and the coefficient $\alpha$ so that the probabilities of false negatives (non-effective embedding) and false positives (the detector reports that the image is watermarked when this is not true) are acceptably low. Also, the watermark must not be embedded so strongly that fidelity is affected.
In subsection 3.1, we determine the optimal value of the coefficient $\alpha$ for effective embedding of a one-bit message. In subsection 3.2, we determine it for a longer message.

Deviation of $lc(c_0, r)$ from zero: parameter $\sigma_{lc}$
In searching for the optimal embedding strength, we begin with well-known facts about the normal distribution:
- If $X_1$ and $X_2$ are independent normally distributed random variables, $X_1 \sim N(\mu_1, \sigma_1^2)$, $X_2 \sim N(\mu_2, \sigma_2^2)$, then their linear combination is also normally distributed:

$$aX_1 + bX_2 \sim N(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2) \qquad (4)$$

- For the distribution $N(\mu, \sigma^2)$, the 68-95-99.7 rule (empirical rule, 3-sigma rule) states that nearly all values drawn from it lie within 3 standard deviations of the mean:

$$P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.9973 \qquad (5)$$

- If $X_1, X_2, \ldots, X_k$ are independent standard normal random variables (i.e. $X_i \sim N(0,1)$, $i = 1, \ldots, k$), then the sum of their squares is a $\chi^2$ (chi-square) variable with $k$ degrees of freedom. Its mathematical expectation is $k$.
The parameter $\sigma_{lc}$, the standard deviation of a sample of linear correlations between the original image and reference patterns, is the basis for correctly setting the parameters $\alpha$ and $\tau$ for effective embedding.
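As a sketch of where this parameter comes from (a derivation under the assumptions that $lc(c, r) = (c \cdot r)/\|r\|^2$ and that the coordinates of $r$ are independent $N(0,1)$ values; $E(c_0) = \|c_0\|^2$ denotes the image energy), the facts listed above give:

$$c_0 \cdot r = \sum_{i=1}^{mn} c_0(i)\, r(i) \sim N\Big(0,\ \sum_{i=1}^{mn} c_0(i)^2\Big) = N\big(0,\ E(c_0)\big), \qquad \|r\|^2 \approx mn$$

$$\sigma_{lc} \approx \frac{\sqrt{E(c_0)}}{mn} = \frac{\|c_0\|}{mn}$$

So, under these assumptions, $\sigma_{lc}$ grows with the image energy and shrinks with the image dimension.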
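By the 3-sigma rule, almost all values $lc(c_0, r_w)$ lie in the interval $(-3\sigma_{lc}, 3\sigma_{lc})$. The greatest embedding strength is needed when $lc(c_0, r_w) = -3\sigma_{lc}$ and we embed a binary one, and when $lc(c_0, r_w) = 3\sigma_{lc}$ and we embed a binary zero. Thus, if we choose

$$\alpha = \tau + 3\sigma_{lc} \qquad (8)$$

then almost every reference pattern will be embedded effectively with strength $\alpha$.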
Such embedding, in which $\alpha$ is set in advance and the reference pattern is embedded with this strength (regardless of the linear correlation between the image and the reference pattern that is to be embedded), we call fixed strength embedding.
The fixed embedding strength is considerably bigger than necessary. We get a better result if we take into account the linear correlation between the image and the pattern that is to be embedded. In this case, we speak of an algorithm with embedding strength adjustment.
Let $l_0 = lc(c_0, r_w)$, and suppose we embed a binary one. If $l_0 < \tau$, we set $\alpha = \tau - l_0$. If $l_0 \ge \tau$, we set $\alpha = 0$ (nothing needs to be embedded). Thus:

$$\alpha = \max(\tau - l_0,\ 0) \ \text{ if we embed a binary one}, \qquad \alpha = \max(\tau + l_0,\ 0) \ \text{ if we embed a binary zero} \qquad (9)$$

Embedding with strength $\alpha$ produces a mean square error in the image of

$$MSE(c_w, c_0) = \frac{1}{mn} \sum_{i=1}^{mn} \big(c_w(i) - c_0(i)\big)^2 = \frac{\alpha^2 \|r_w\|^2}{mn} \approx \alpha^2$$

since the expected value of $\|r_w\|^2$ is $mn$.
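The two strength rules can be compared with a short sketch. The function names and the numeric values of $\tau$ and $\sigma_{lc}$ below are illustrative assumptions, not values from the experiments in this text.

```python
def adjusted_strength(l0, tau, bit):
    """Smallest alpha that moves lc from l0 into the detection region of `bit`."""
    if bit == 1:
        return max(tau - l0, 0.0)
    return max(tau + l0, 0.0)

def fixed_strength(tau, sigma_lc):
    """Strength set in advance, covering almost all patterns (3-sigma rule)."""
    return tau + 3.0 * sigma_lc

tau, sigma_lc = 1.5, 0.5  # illustrative values only
for l0 in (-1.5, 0.0, 1.0, 2.0):
    a1 = adjusted_strength(l0, tau, 1)
    a0 = adjusted_strength(l0, tau, 0)
    print(l0, a1, a0, fixed_strength(tau, sigma_lc))
```

For any initial correlation inside $(-3\sigma_{lc}, 3\sigma_{lc})$ the adjusted strength never exceeds the fixed one, which is why the adjusted variant distorts the image less.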

Detection threshold $\tau$ setting
To avoid a false positive error, $\tau$ needs to be bigger than the biggest correlation between the image and reference patterns that are not embedded.
If we set $\tau = 3\sigma_{lc}$, the linear correlation between the original image ($c_0$) and a reference pattern lies in the interval $(-\tau, \tau)$ with probability almost 1. In other words, we can be almost 100% sure that the detector will not respond with a false positive error.
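A small Monte Carlo sketch can illustrate this choice of threshold. It assumes the normalization $lc(c, r) = (c \cdot r)/\|r\|^2$ and the approximation $\sigma_{lc} \approx \|c_0\|/mn$; the random "image", the seeds and the number of trials are hypothetical.

```python
import math
import random

def linear_correlation(c, r):
    return sum(ci * ri for ci, ri in zip(c, r)) / sum(ri * ri for ri in r)

n, trials = 512, 1000
img_rng = random.Random(0)
c0 = [float(img_rng.randrange(256)) for _ in range(n)]   # hypothetical image

# Approximation used in this sketch: sigma_lc ~ ||c0|| / n
sigma_lc = math.sqrt(sum(p * p for p in c0)) / n
tau = 3.0 * sigma_lc

# Correlate c0 with patterns that were never embedded in it.
pat_rng = random.Random(1)
values = []
for _ in range(trials):
    r = [pat_rng.gauss(0.0, 1.0) for _ in range(n)]
    values.append(linear_correlation(c0, r))

empirical_sigma = math.sqrt(sum(v * v for v in values) / trials)
false_positive_rate = sum(abs(v) >= tau for v in values) / trials
print(sigma_lc, empirical_sigma, false_positive_rate)
```

The empirical spread should be close to the approximated $\sigma_{lc}$, and only a small fraction of unembedded patterns should reach the threshold.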
The threshold should not be larger than $3\sigma_{lc}$: this is not only needless, but also affects the fidelity (a stronger embedding would be needed for the detector to recognize the message).
We have to consider the possibility of a false positive error when the message is quite short (as here, where the message is one bit long). If it is longer, the false positive error is not a big problem (we explain more about this topic in subsubsection 3.2.1), and $\tau$ could be set considerably smaller.

Dependence of parameter $\sigma_{lc}$ on the image dimension
The image $c_k$ formed from $k$ equal images $c$, each with energy $E(c)$, has energy $E(c_k) = kE(c)$ and dimension $\dim(c_k) = k\dim(c)$. Its $\sigma_{lc}$ parameter is

$$\sigma_{lc}(c_k) = \frac{\sqrt{E(c_k)}}{\dim(c_k)} = \frac{\sqrt{kE(c)}}{k\dim(c)} = \frac{\sigma_{lc}(c)}{\sqrt{k}}$$

So, the parameter $\sigma_{lc}$ is lower for larger images. Therefore, in an image of larger dimension, it suffices to embed our message with a lower strength. In the previous case, it is $\sqrt{k}$ times lower. Taking the fidelity of the image into account, it is obvious that in a larger image we can embed a longer message.

Threshold $\tau$ setting for longer message
The threshold $\tau$ for a longer message may be smaller than $3\sigma_{lc}$ (the value recommended for a one-bit message). For example, for $\tau = \sigma_{lc}$, for roughly 68% of possible reference patterns the linear correlation between the image and the reference pattern lies inside the interval $(-\tau, \tau)$. So, a false positive may appear in roughly every third case. If we insist that every message bit must be detected to confirm the presence of the message, it will be almost impossible to detect a message that is not embedded (to detect a non-existent message, all reference patterns must give rise to a false positive, and for a longer message this is almost impossible).
So, for a longer message a considerably smaller $\tau$ value is permitted.
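The argument can be made quantitative with a short calculation. The sketch below assumes that the correlations with the $k$ unembedded patterns are independent and normally distributed with standard deviation $\sigma_{lc}$; the function names are hypothetical.

```python
import math

def p_outside(t):
    """P(|Z| >= t) for a standard normal Z, via the complementary error function."""
    return math.erfc(t / math.sqrt(2.0))

def p_false_message(k, t):
    """Probability that all k unembedded patterns exceed tau = t * sigma_lc in magnitude."""
    return p_outside(t) ** k

for k in (1, 8, 32, 64):
    print(k, p_false_message(k, 1.0), p_false_message(k, 3.0))
```

Even with $\tau = \sigma_{lc}$, where a single bit gives a false positive in about 32% of cases, the chance that every bit of a 64-bit message does so is vanishingly small.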

Message bits embedding one over another
Because reference patterns are mutually uncorrelated, embedding a new pattern does not substantially change the linear correlation between the image and previously embedded patterns (linear correlation is resistant to white Gaussian noise):

$$lc(c_w \pm \alpha_2 r_2,\ r_1) = lc(c_w, r_1) \pm \alpha_2\, lc(r_2, r_1) \approx lc(c_w, r_1)$$

Hence, if we embed message bits one over another, we can use the same per-bit embedding strength as in the case of only one information bit.
We embed $k$ bits into the image $c_0 = (c_0(1), c_0(2), \ldots, c_0(mn))$ in such a way that for each bit we add (or subtract) the corresponding reference pattern $r_j = (r_j(1), r_j(2), \ldots, r_j(mn))$ ($j = 1, 2, \ldots, k$), multiplied by the embedding strength $\alpha_j$. The resulting image is $c_w = (c_w(1), c_w(2), \ldots, c_w(mn))$. We obtain each image pixel by (the sign '+' is for embedding a binary one, '-' for a binary zero):

$$c_w(i) = c_0(i) \pm \alpha_1 r_1(i) \pm \alpha_2 r_2(i) \pm \ldots \pm \alpha_k r_k(i), \qquad i = 1, 2, \ldots, mn$$

Such embedding is localized in space. Only the corresponding coordinates of the original image and the reference patterns influence the resulting pixel value.
The coordinates of the reference patterns take values from the distribution $N(0,1)$. Therefore, their linear combination $\pm \alpha_1 r_1(i) \pm \ldots \pm \alpha_k r_k(i)$ is also normally distributed, as $N(0,\ \alpha_1^2 + \ldots + \alpha_k^2)$, and embedding the $k$ patterns is equivalent to embedding one pattern with the overall strength $\alpha = \sqrt{\alpha_1^2 + \ldots + \alpha_k^2}$.
Example 2: If $\alpha_j = 2$ for every bit and we embed a message of $k = 25$ bits (one over another, with fixed strength embedding), then $\alpha = \alpha_j \sqrt{k} = 10$, i.e. embedding 25 patterns with strength 2 is equivalent to embedding one pattern with strength 10.
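The arithmetic of Example 2 can be checked with a small sketch. The helper name `overall_strength`, the seed, and the number of simulated pixels are assumptions made for illustration.

```python
import math
import random

def overall_strength(alphas):
    """Strength of the single pattern equivalent to the whole stack of patterns."""
    return math.sqrt(sum(a * a for a in alphas))

alphas = [2.0] * 25
print(overall_strength(alphas))  # 10.0

# Empirical check: the per-pixel sum of 25 scaled N(0,1) patterns
# should have a standard deviation close to the overall strength.
rng = random.Random(5)
n = 2000
total = [0.0] * n
for a in alphas:
    sign = rng.choice((1.0, -1.0))          # message bit: one or zero
    for i in range(n):
        total[i] += sign * a * rng.gauss(0.0, 1.0)
empirical = math.sqrt(sum(t * t for t in total) / n)
print(empirical)
```

The signs of the individual bits do not matter here, because a negated $N(0,1)$ pattern has the same distribution.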

Embedding into subimages
We divide the image $c_0$ of $mn$ pixels into $k$ disjoint subimages $c_0^1, c_0^2, \ldots, c_0^k$, of $mn_1, mn_2, \ldots, mn_k$ pixels respectively ($\sum_{j=1}^{k} mn_j = mn$). For each subimage, the coefficient $\sigma_{lc}^j$ ($j = 1, 2, \ldots, k$) is

$$\sigma_{lc}^j = \frac{\sqrt{E(c_0^j)}}{mn_j}$$

If it is possible to divide the image into $k$ parts of equal dimensions and energies, then all subimages have the same $\sigma_{lc}^j$ value:

$$\sigma_{lc}^j = \frac{\sqrt{E(c_0)/k}}{mn/k} = \sqrt{k}\,\sigma_{lc}$$

In this case, the overall strength for a $k$-bit message, when embedding into $k$ subimages, is equal to the strength for a one-bit message multiplied by $\sqrt{k}$. The embedding strength in this case is the same as if the message bits were embedded one over another.
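The $\sqrt{k}$ factor can be illustrated numerically. The sketch below assumes $\sigma_{lc} \approx \sqrt{E}/\dim$ and uses interleaved pixels as subimages (one hypothetical way of getting parts with nearly equal energy); the synthetic image is made up.

```python
import math

def sigma_lc(c):
    """Approximate sigma_lc as sqrt(energy) / dimension."""
    return math.sqrt(sum(p * p for p in c)) / len(c)

def split_interleaved(c, k):
    """k disjoint subimages; subimage j takes pixels j, j+k, j+2k, ..."""
    return [c[j::k] for j in range(k)]

c0 = [100.0 + 50.0 * math.sin(i / 40.0) for i in range(4096)]  # synthetic image
k = 4
parts = split_interleaved(c0, k)
ratios = [sigma_lc(p) / sigma_lc(c0) for p in parts]
print(ratios)  # each ratio should be close to sqrt(k) = 2
```

Interleaving is used only because subimages need not consist of adjacent pixels; a block-wise split of an image with uneven energy would give different values per subimage.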

Robustness against lossy compression
Next we give an estimate of the strength coefficient $\alpha$ that the watermark needs in order to survive lossy compression.
In subsection 4.1 we consider the case of one bit, and in subsection 4.2 the case of a longer message.
For simplicity, we discuss only the case of embedding a binary one. The contents of this section may be used, with small changes, for the case of binary zeros. With the following experiment we try to find which part of the watermark is destroyed by lossy compression.
1. We embed $k$ reference patterns $r(1), r(2), \ldots, r(k)$ (one by one) into the image $c_0$ with strength $\alpha$ (the value of $\alpha$ is the same for all of them). In this way, we get $k$ watermarked images (with a binary one embedded) $c_w(1), c_w(2), \ldots, c_w(k)$.
2. Then, we subject each of these $k$ images to the expected lossy compression (the same for all $k$ images), and we get $k$ compressed watermarked images $c_{wn}(1), c_{wn}(2), \ldots, c_{wn}(k)$.
3. For each of them we calculate the linear correlation with the corresponding reference pattern (the one previously embedded in it).

Fig. 1 shows the resulting linear correlation values. We can see from this graph that, for each reference pattern $r(i)$ ($i = 1, 2, \ldots, k$), the following holds:
- Embedding a binary one with strength $\alpha$ raises the linear correlation between the image and the reference pattern by $\alpha$.
- Lossy compression reduces the linear correlation by a constant, i.e. it erases a constant part of the watermark. Thus, after lossy compression, a constant part of the watermark remains.

(Clearly, embedding a binary zero reduces the linear correlation by $\alpha$, and lossy compression raises it by a constant: it erases a constant part of the watermark.)
We introduce a new concept: the embedding strength coefficient after compression, $\alpha'$. It is the part of $\alpha$ that survives lossy compression.
For all twenty reference patterns the effect is the same: for our image and $\alpha = 3$, after DjVu Photo compression, $\alpha' = 0.75$ survived.
After repeating our experiment for other compression techniques (for various intensities, for JPEG compression and for the compression used in making PDF and DjVu files), we reach the same conclusion: for a given image and compression intensity, $\alpha'$ is constant for a fixed coefficient $\alpha$ (it depends neither on the embedded reference pattern nor on the linear correlation of the original image with it).
Naturally, for a given coefficient $\alpha$, the value $\alpha'$ is smaller under more intensive compression. For example, $\alpha'$ is bigger for JPEG 70% compression than for JPEG 40%.
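The structure of this experiment can be sketched as follows. Note the loud assumption: no real codec is used here. A 3-tap circular moving average stands in for the lossy compression step, purely to show how $\alpha'$ could be measured as the difference between the correlation after the modification and the correlation of the original image. The synthetic image, the function names and the seed are hypothetical, and the numbers say nothing about DjVu or JPEG.

```python
import math
import random

def linear_correlation(c, r):
    return sum(ci * ri for ci, ri in zip(c, r)) / sum(ri * ri for ri in r)

def smooth(c):
    """Stand-in modification: 3-tap circular moving average (NOT a real codec)."""
    n = len(c)
    return [(c[i - 1] + c[i] + c[(i + 1) % n]) / 3.0 for i in range(n)]

def surviving_strength(c0, r, alpha, modify):
    """alpha' = lc(modified watermarked image, r) - lc(original image, r)."""
    c_w = [p + alpha * x for p, x in zip(c0, r)]
    return linear_correlation(modify(c_w), r) - linear_correlation(c0, r)

n, alpha = 4096, 3.0
c0 = [128.0 + 100.0 * math.sin(2.0 * math.pi * i / 512.0) for i in range(n)]
rng = random.Random(42)
estimates = []
for _ in range(20):                      # k = 20 patterns, as in the experiment
    r = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates.append(surviving_strength(c0, r, alpha, smooth))
print(min(estimates), max(estimates))
```

For this particular stand-in, roughly one third of the strength survives for every pattern, which mirrors the qualitative observation that the surviving part does not depend on the pattern. To repeat the real experiment, `smooth` would be replaced by a compress-and-decompress round trip through the codec of interest.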

Coefficient $\alpha$ setting for a robust watermark
If we wish the watermark to be robust against lossy compression, we need to modify statements (8) and (9) (subsubsection 3.1.2: Parameter $\alpha$ setting) and replace each $\alpha$ with $\alpha'$:
- in the case of fixed embedding strength: $\alpha' = 3\sigma_{lc} + \tau$
- in the case of embedding strength adjustment: $\alpha' = \max(\tau - l_0, 0)$ (for embedding a binary one), $\alpha' = \max(\tau + l_0, 0)$ (for embedding a binary zero)

The procedure of deriving $\alpha'$ from $\alpha$ is "natural", because it follows the chronology of events: $\alpha$ and $\alpha'$ are "cause and effect".

Robustness of a longer message
If we embed $k$ patterns into the image, each with its own embedding strength coefficient $\alpha_i$ ($i = 1, 2, \ldots, k$), we have:

$$\alpha = \sqrt{\alpha_1^2 + \alpha_2^2 + \ldots + \alpha_k^2}$$

The overall strength coefficient remaining after compression is $\alpha'$ (the coefficient that corresponds to $\alpha$ for our image and compression technique).
Example 5: The same test was done for a 64-bit message. The watermark also survives DjVu Photo compression. The overall embedding strength is $\alpha = 6.94$. Fig. 3 shows the compressed original (without watermark) and the compressed watermarked image.

AWGN watermark in transform domain
Many authors suggest embedding the watermark in the transform domain instead of the spatial domain. In addition, they propose using the same transform domain that will be used during lossy compression.
The goal is to achieve greater watermark robustness against the expected compression.

Optimal embedding strength in transform domain
In this subsection we apply the described watermarking algorithm in the transform domain (embedding and detection occur in the transform domain instead of the spatial domain). We show that, from the viewpoint of effective embedding of the AWGN watermark, there is no difference between watermarking in the spatial and in the transform (DCT, block DCT, Fourier or arbitrary wavelet) domain.
First of all, most of the transforms used nowadays in image processing and compression are orthogonal (or at least unitary) linear transforms.
Such a transform preserves dot products, and hence both the energy and the dimension of a vector. Therefore the standard deviations from zero of the linear correlations between the vector $c_0$ and reference patterns are equal in the spatial and in the transform domain:

$$\sigma_{lc}(c_0) = \sigma_{lc}(T(c_0))$$

where $T$ denotes the transform. It makes no difference whether we embed the AWGN watermark into an image with strength $\alpha$ in the spatial or in the transform domain. The correlation values, and the MSE too, will be the same in both domains.
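The invariance can be checked numerically on a small example. The sketch below builds an orthonormal DCT-II matrix by hand and applies it to a one-dimensional toy "image"; the helper names, the vector length and the random data are hypothetical, and a one-dimensional transform is used only to keep the code short.

```python
import math
import random

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are the basis vectors)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def apply(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

n = 64
rng = random.Random(3)
c0 = [float(rng.randrange(256)) for _ in range(n)]   # hypothetical 1-D "image"
r = [rng.gauss(0.0, 1.0) for _ in range(n)]
T = dct_matrix(n)
C0, R = apply(T, c0), apply(T, r)

lc_spatial = dot(c0, r) / dot(r, r)
lc_transform = dot(C0, R) / dot(R, R)
print(lc_spatial, lc_transform)
```

The two correlation values agree up to floating-point rounding, and the energy of the image vector is the same before and after the transform.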
Also, there is no difference between spatial and transform domains in AWGN watermark robustness against lossy compression.

AWGN watermark embedding into subimage in transform domain
As in the spatial domain, it is possible to embed the watermark into only part of the coefficients in the transform domain (here too, we call this embedding into a subimage).
Technically, there is no difference in setting the optimal strength for a subimage and for the whole image. The effective embedding strength is determined by the dimension and energy of the image (or image part) into which the AWGN watermark is to be embedded. The robustness against lossy compression is determined by the properties of these coefficients with regard to the expected compression.
However, for images in the spatial and in the transform domain there is a great difference in energy distribution. While in the spatial domain the image energy is distributed mostly evenly over the entire matrix, in the transform domain it is concentrated in some of its elements. An illustration of this fact may be seen in Fig. 4, on the example of the block DCT (the transform used in JPEG compression). Here, to present matrices in the DCT domain, we use the solution proposed in (Vučković, 2008):
- matrix elements with value 0 are presented in black
- positive elements are presented in shades from black to white
- negative elements are colored in shades from black to yellow
- because of the big difference between image elements in the block DCT domain (second and fourth objects in Fig. 4), they are presented with logarithmed magnitudes

The block DCT domain, which is the basis of JPEG compression, is used frequently in watermark embedding. In describing such embedding, we use the term image subchannel (introduced by Eggers and Girod in (Eggers & Girod, 2001)). A subchannel is the vector whose coordinates are the block DCT elements with the same index in every block. Subchannels are ordered according to the zigzag order (Fig. 5). Thus, subchannel 1 consists of the DC elements of all blocks; subchannel 10 consists of all elements at position 10 in the block (zigzag order).
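The notion of a subchannel can be made concrete with a short sketch. It assumes the conventional JPEG zigzag scan (the ordering in Fig. 5 is taken to be the same) and uses made-up blocks whose entries encode their own position; the function names are hypothetical.

```python
def zigzag_order(size=8):
    """(row, col) positions of a size x size block in JPEG-style zigzag order."""
    def key(pos):
        i, j = pos
        s = i + j
        # Odd anti-diagonals run top-right to bottom-left, even ones the other way.
        return (s, i if s % 2 else -i)
    return sorted(((i, j) for i in range(size) for j in range(size)), key=key)

def subchannel(blocks, index, size=8):
    """Collect the coefficient at zigzag position `index` (1-based) from every block."""
    i, j = zigzag_order(size)[index - 1]
    return [block[i][j] for block in blocks]

order = zigzag_order()
# Two toy "DCT blocks" whose entries encode their own position (hypothetical data).
blocks = [[[100 * b + 8 * i + j for j in range(8)] for i in range(8)] for b in range(2)]
print(order[:6])
print(subchannel(blocks, 1), subchannel(blocks, 10))
```

Subchannel 1 picks the DC element (row 0, column 0) of each block, and subchannel 10 picks the element at row 3, column 0, the tenth position of the scan.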
In the block DCT domain, image energy is concentrated in the upper left corner of the block. For example, in the image 'Cameraman', as much as 99.76% of the whole image energy is concentrated in its first 32 subchannels. Therefore, with the described algorithm, effective embedding into the first 32 subchannels needs twice the strength needed for the whole image. In this way we embed into 50% of the image coefficients, so the mean square error is two times higher than when embedding into all coefficients.
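The factor of two can be traced with a short calculation, assuming $\sigma_{lc} \approx \sqrt{E}/\dim$ and that the required strength scales with $\sigma_{lc}$ (the superscript $(32)$ is notation introduced here for the first 32 subchannels):

$$\sigma_{lc}^{(32)} \approx \frac{\sqrt{0.9976\,E(c_0)}}{mn/2} = 2\sqrt{0.9976}\;\sigma_{lc} \approx 1.998\,\sigma_{lc}$$

$$MSE^{(32)} \approx (2\alpha)^2 \cdot \frac{1}{2} = 2\alpha^2 \qquad \text{compared with} \qquad MSE \approx \alpha^2$$

The strength doubles because half of the coefficients carry almost all of the energy, and the error doubles because the quadrupled per-coefficient error is spread over only half of the coefficients.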
Embedding in the lower right corner of the block is effective even with a very small strength. However, even very light compression destroys these coefficients. Therefore, embedding into those coefficients is not recommended.
Embedding in the vicinity of subchannel 1 is not a good solution either, because of the extremely high energy, which requires a big embedding strength (and this badly affects image quality).
Our explorations (Vučković, 2010a, 2010b) confirmed the earlier finding (Eggers & Girod, 2001) that, for a watermark that is effective, robust against JPEG compression and imperceptible, it is best to embed it into several coefficients in the upper left part of the block (somewhere in the region of subchannels 10-22, see Fig. 5). Such embedding also showed good robustness against compression techniques other than JPEG (for example, against DjVu compression).

AWGN watermark and other image modifications
Image modifications are organized (Cox et al., 2008) into two classes: valumetric and geometric distortions.

Valumetric distortions
Valumetric distortions are simpler than geometric ones. They change the values of individual pixels. These modifications include additive noise, amplitude changes, linear filtering and lossy compression.

Additive noise
This modification has the effect of adding a random signal. For a watermarked image, the addition of noise is defined by:

$$c_{wn} = c_w + s$$

where $s$ is a random vector chosen from some distribution, independently of $c_0$ and $r$.
Since $s$ is independent of the reference pattern, $lc(c_{wn}, r) = lc(c_w, r) + lc(s, r) \approx lc(c_w, r)$. Thus, additive noise does not affect the AWGN watermark (the AWGN algorithm is robust against this image modification).
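A quick numerical sketch of this robustness follows. The random image, the noise level and the seed are hypothetical, and the normalization $lc(c, r) = (c \cdot r)/\|r\|^2$ is assumed.

```python
import random

def linear_correlation(c, r):
    return sum(ci * ri for ci, ri in zip(c, r)) / sum(ri * ri for ri in r)

n, alpha = 4096, 5.0
rng = random.Random(11)
c0 = [float(rng.randrange(256)) for _ in range(n)]        # hypothetical image
r = [rng.gauss(0.0, 1.0) for _ in range(n)]               # embedded pattern
c_w = [p + alpha * x for p, x in zip(c0, r)]

s = [rng.gauss(0.0, 4.0) for _ in range(n)]               # noise independent of c0 and r
c_wn = [p + x for p, x in zip(c_w, s)]

lc_before = linear_correlation(c_w, r)
lc_after = linear_correlation(c_wn, r)
print(lc_before, lc_after)
```

The change in correlation equals $lc(s, r)$, whose spread shrinks as the image dimension grows, so for images of realistic size it is small compared with the embedding strength.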
The AWGN watermark is also robust against a brightness change ($c_{w1} = c_w + nJ$, where $J$ is the matrix of ones and $n$ is an integer).

Amplitude changing
It may be represented by the formula

$$c_{wn} = \nu c_w$$

where $\nu > 0$ is the scaling factor. Such an operation changes brightness and contrast.
After this modification, the linear correlation value equals the initial one multiplied by the factor $\nu$. Depending on the factor $\nu$, watermark detectability increases (if $\nu > 1$) or diminishes (if $\nu < 1$).

Linear filtering
It is given by

$$c_{wn} = c_w * f$$

where $f$ is a filter, and $*$ designates convolution. Many common image operations are performed using linear filters. Examples are blurring and sharpening effects.
Linear filtering and lossy compression are more complex operations than the previous two, because the changes they cause are not strictly localized (the change of a pixel also depends on a certain number of surrounding pixels).

Geometric distortions
This class of image modifications includes many image distortions (rotation, spatial scaling, translation, skew or shear, cropping, perspective transformation, and changes in aspect ratio).
These modifications are more complicated than valumetric ones, because they displace information about pixels within the image matrix. They usually change the matrix dimensions as well. Therefore, it is not possible to detect the watermark directly. Yet, with these modifications, the information about the embedded message is not lost, but only "masked". For each geometric distortion, we need to perform a correction procedure before detection.
As an illustration, we describe one geometric modification: image rotation by an angle $\theta$. Many conclusions of this analysis can be applied to other modifications.
With modifications that erase a part of the image data (for example the crop operation, or another operation that erases several columns or rows), after the correction procedure we have the original image matrix with locally erased data (some elements are erased, but the others are untouched). Then, it is possible to calculate the parameter $\sigma_{lc}$ accurately for the untouched image part. For example, if prior to watermark embedding we know that the image will be cropped to a quarter of its complete area, the watermark needs to be embedded twice as strongly to be detectable after cropping.

Robustness against rotation
In the case of image rotation, one of several different correction procedures may be used prior to detection. If the watermarked image is rotated by an angle $\theta$, to make watermark detection possible we may do one of the following:
- Compare (using linear correlation) the rotated image with the reference pattern that is also rotated by $\theta$.
- Before detection, rotate the rotated image by the angle $-\theta$ and crop it to its original dimension; then compare it with the reference pattern that is also rotated by the angles $\theta$ and $-\theta$ and cropped to the original dimension.
- Compare the image rotated by $\theta$ and then by $-\theta$ (and then cropped to the original image dimensions) with the original reference pattern.

Linear correlation values for the image and the reference pattern (unrotated and rotated) are presented in Fig. 6, for the image "Cameraman". The reference pattern $r_w$ is embedded with strength $\alpha = 5$. Then, the image is subjected to rotation by the angle $\theta = 10°$.
In each row we present the image and the pattern for which the correlation is calculated. So, for each example it is specified which image and pattern (and their dimensions) we deal with, and also their linear correlation value. The first row of the figure presents the watermarked image and the embedded reference pattern. The second, third and fourth rows contain information about the three previously stated solutions for correction and detection.
The rotation practically does not reduce the linear correlation value if it is calculated for an image and a reference pattern that are rotated in the same manner. However, if we compare the rotated image (after restoring it to its original position by rotation in the opposite direction and cropping to the original dimension) with the original reference pattern, a certain amount of the watermark is lost (the value $\alpha'$ is considerably less than the embedding strength $\alpha$).
It should be noted that, when not all details of the distortion are known, the last solution is usually the only possible one.
Experiments have confirmed (Vučković, 2010a) that the conclusion stated for lossy compression (subsubsection 4.1.1: Embedding strength parameter after compression ($\alpha'$)) also holds for rotation (and for other distortions too). For our image and embedding strength $\alpha$, after a given modification, the remaining strength $\alpha'$ depends neither on the reference pattern nor on its correlation with the original image. Hence, to set the embedding strength $\alpha$ needed for $\alpha'$ to remain after the expected modification, it is sufficient to experiment with only one reference pattern.
The procedure of determining the necessary embedding strength $\alpha$ may be performed in the following steps:
1. We determine $\alpha'$ using the procedure given in subsubsection 4.1.2 (Coefficient $\alpha$ setting for a robust watermark).
2. By experimenting with only one reference pattern, we determine the necessary embedding strength $\alpha$ such that, after the expected modification, the remaining strength is at least $\alpha'$.
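The second step of this procedure could be automated along the following lines. This is only a sketch: `surviving` is a hypothetical callable that returns the measured $\alpha'$ for a given $\alpha$ (one embedding, one modification, one correlation), it is assumed to be non-decreasing in $\alpha$, and the toy response curve and numeric values are made up to keep the example self-contained.

```python
def required_strength(target, surviving, lo=0.0, hi=64.0, tol=1e-3):
    """Bisection for the smallest alpha with surviving(alpha) >= target.

    `surviving` is a hypothetical callable alpha -> alpha' measured by
    experiment; it is assumed to be non-decreasing in alpha.
    """
    if surviving(hi) < target:
        raise ValueError("target not reachable within [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if surviving(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

# Made-up response curve standing in for a real measurement.
def toy_surviving(alpha):
    return 0.25 * alpha

tau, sigma_lc = 1.0, 0.4            # illustrative values only
target = tau + 3.0 * sigma_lc       # alpha' required for fixed strength embedding
alpha_needed = required_strength(target, toy_surviving)
print(alpha_needed)
```

Bisection is one possible search strategy; since each evaluation costs a full embed-modify-correlate cycle, it keeps the number of experiments small.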
With black-and-white text pages there is another problem, which makes them especially unsuitable for watermarking: the watermark can be removed simply by retyping the text.

Conclusion
This text contains integrated results of several earlier papers.
For a given grayscale image and AWGN watermark, we described a procedure for setting the optimal embedding strength. We analyzed the optimal strength for effective embedding, and also for a watermark robust against an expected image modification.
We analyzed the cases of AWGN watermark embedding into the whole image and into subimages, in the spatial and in the transform domain, and the robustness of such embedding against the expected compression.
For other image modifications, the procedures for setting the optimal strength are similar to the case of compression. For each geometric modification, however, we need to perform a correction procedure prior to detection.
We applied the obtained results to color images and e-books. The AWGN algorithm is not applicable to black-and-white e-books that originate from DOC or TeX files, because the watermark may be removed from them simply by retyping the text.

Acknowledgment
The work presented here was supported by the Serbian Ministry of Education and Science (project III44006).

In Fig. 1, the linear correlation values of the images from the lossy compression experiment with the reference patterns are shown ($c_0$ is the scan of the first page of Ruđer Bošković's book 'Elementa geometriae', $\alpha = 3$, $k = 20$; the compression is DjVu Photo, made with the program DjVu Solo 3.1, LizardTech, Inc). The abscissa of the graph presents the array indexes, and the ordinate their values. The dotted line presents the linear correlation values between the original image and the reference patterns. The dashed line presents the linear correlation values between the watermarked images and the corresponding reference patterns. The solid line presents the linear correlation values between the compressed watermarked images and the corresponding reference patterns.

Fig. 1. Linear correlation between the image (original, watermarked and compressed watermarked) and corresponding reference patterns (left), for the first page of the book "Elementa geometriae" (right)

Fig. 4. Image 'Cameraman' in spatial and in block DCT domain; one 8×8 block in spatial and in block DCT domain
As we can see, in the block DCT domain the energy of each block is concentrated in its upper left corner. On the other hand, JPEG compression damages the elements in the lower right corner much more intensively.