The chapter begins with a short description of the concept of entropy, its formula, and MATLAB code. The main body presents three different approaches to using information entropy in dataset analysis: (i) segmentation of the data into two groups; (ii) filtration of noise in the dataset; (iii) enhancement of the entropy contribution via point information gain. Finally, the conclusion briefly covers extended analysis using more generalized entropies, and the usability of the described algorithms: their advantages and disadvantages.
Keywords: point information gain
The MATLAB environment enables advanced data processing and analysis, especially using its toolboxes for signal processing, image processing, and statistics. MATLAB and its toolboxes are trademarks or registered trademarks of The MathWorks, Inc.
Preprocessing is the necessary step before the analysis. It transforms the raw data into a more transparent format for the analysis, and includes tasks such as calibration, filtering, feature detection, alignment, normalization, modeling, and so on. Analysis is the interpretation of the processed data. It consists of comparison, classification, clustering, decomposition, pattern recognition, identification, and so on.
One of the most useful plots in signal or image analysis is the signal histogram, an expression of signal abundance, first introduced by Pearson [2]. The estimation of a proper histogram, as a representation of the probability distribution function, suffers from the question of proper binning. However, in the digital era, we live with datasets that are discrete representations of discrete events of the real signal. Thus, the number of bins is usually given by the number of quantization levels used during the sampling process (Figures 3 and 4).
In this chapter, the question of image processing is discussed. The lecture opens with the intensity histogram function, and the induction continues through statistical parameters, like the central moments, to the information entropy. Three different methods for using entropy in image processing are introduced: entropy filtration, entropy segmentation, and point information gain. The description is completed by mathematical equations as well as by commented MATLAB commands. The results of the commands are the plots and figures presented within the text. This chapter aims to serve as a guiding overview for considering entropy as a processing method. The simple examples show the methods' steps and additional features (Figure 5).
2. Histogram function
In digital image representation, the intensity histogram H(d) of a grayscale image is an intensity function that shows the count of pixels with intensity equal to d, independently of the position (x, y):

H(dk) = #{(x, y): Im(x, y) = dk}.
In MATLAB, load the grayscale image circuit.tif, a MathWorks built-in demo image pre-packaged with MATLAB.
Im = imread ('circuit.tif'); %load the image;
Im = double (Im) / 255; %convert to double, range 0-1;
figure, imshow (Im); %show the image;
title ('Image of circuit'); %add caption to the image;
Do not forget the ';' symbol at the end of each command; otherwise, MATLAB will print all the pixel values! The image is loaded into the variable Im. Its intensity levels are integers between zero and 255 (8-bit coding). The second command converts the image into double precision and rescales the intensity values to the range from zero to one. The size of the image is 280 × 272, which is also the number of pixels in the image.
To compute the histogram:
[H,d] = imhist (Im); %compute histogram of the image;
figure, plot (d, H); %show the histogram function;
xlabel ('d'); %add caption to the x-axis;
ylabel ('H'); %add caption to the y-axis;
title ('Histogram of circuit'); %add caption to the plot;
where the variable H is the histogram function, and the variable d holds the intensity values (levels). H is a function of d.
The histogram nearly represents the distribution of the values in the image. To obtain an estimate of the probability distribution function, it is necessary to normalize the histogram function so that it sums to one. This is the 0th moment: μ0 = ∑k p(dk) = 1; the fact that the probability distribution of D is normalized means that the 0th moment is always 1.
MN = sum (H); %count histogram area;
H = H./MN; %normalize histogram;
figure, plot (d,H); %show normalized histogram;
title ('Probability of circuit');
This could be used for many modifications (contrast enhancement, frequency evaluation, segmentation, etc.).
2.1. Statistical parameters
The distribution function allows us to compute statistical parameters relevant for the further processing. The distribution is well characterized by two parameters: the location parameter and the scaling parameter. The location parameter describes the value around which all the other values are concentrated.
%compute mean value from the histogram;
mu = sum (H.*d) / sum(H)
This time, do not write the ';' symbol at the end of the command, to see the result of the calculation. This value expresses the position on the x-axis (the d-value) around which the distribution is centered. The value mu is the weighted arithmetic average of all intensity levels, with the levels weighted according to the probability estimated from the histogram function H:

μ = ∑k dk·H(dk) / ∑k H(dk).

The value of the weighted arithmetic average is called the mean value. (There are three Pythagorean means: arithmetic, geometric, and harmonic.)
figure, plot (d,H);
title ('Probability of circuit and mean value');
%add mean value to the probability plot;
hold on, plot ([mu, mu], [0, max(H)], 'r');
There is another way to obtain the mean value: directly from the intensity levels of all pixels in the image Im.
%compute mean value from the image;
mu = mean ( reshape ( Im,size (Im,1)*size (Im,2),1 ) )
There are two other parameters for the distribution location: the median and the mode. The median of the distribution is the value separating the higher half of the distribution from the lower half, that is, the 50th percentile. The mode is the value that appears most often in a set of data, the one with the highest probability (the d where H(d) is highest). MATLAB has these functions implemented:
%compute median from the image;
Me = median ( reshape ( Im,size (Im,1)*size(Im,2),1 ) )
%compute mode from the image;
Mo = mode ( reshape ( Im,size (Im,1)*size (Im,2),1 ) )
figure, plot (d, H);
title ('Probability of circuit with median and mode values');
%add median value to the probability plot;
hold on, plot ([Me, Me], [0, max(H)], '-.*r');
%add mode value to the probability plot;
hold on, plot ([Mo, Mo], [0, max (H)], ':*g');
When plotted, the median is dash-dotted red and the mode is dotted green. The median often serves instead of the mean for distributions that are not Gaussian. The mode expresses the most frequent value in distributions that have only one such peak, and are thus unimodal (Figures 8 and 9).
The second parameter of the distribution is the scaling parameter. It describes how far the other values are from the location parameter μ. The second central moment estimates the variance σ², which measures how far the values are spread out (dispersed). An equivalent measure is the square root of the variance, called the standard deviation σ. The standard deviation thus measures the dispersion of the values.
%compute variance from the normalized histogram;
sigma2 = sum (H.*(d-mu).^2)
%compute standard deviation;
sigma = sqrt (sigma2)
%compute variance from the image;
sigma2 = var ( reshape ( Im,size (Im,1)*size (Im,2),1 ) )
%compute standard deviation;
sigma = std ( reshape ( Im,size (Im,1)*size (Im,2),1 ) )
In case the dispersion is skewed, that is, it has different dispersions on the left and right sides of the plot (from the point of view of the location parameter), it is recommended to use the interquartile range IQR as a robust measure of scale. Using the IQR also removes the effect of outliers on the distribution dispersion.
%compute inter quartile range;
Q = iqr (reshape (Im,size (Im,1) * size (Im,2),1 ) )
The value of the interquartile range IQR is usually bigger than the standard deviation σ, approximately 1.349 times for a Gaussian distribution.
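That ratio can be checked numerically with a quick sketch on a large random sample (randn from core MATLAB, iqr from the Statistics Toolbox; the sample size is an arbitrary choice):

```matlab
x = randn (1e6, 1);    %sample from a standard Gaussian, sigma = 1;
r = iqr (x) / std (x)  %ratio of IQR to standard deviation, approximately 1.349;
```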
However, the basic statistical parameters do not cover distributions that have more than one mode (multimodal), and they also cannot describe negative exponential distributions without a location parameter. In such cases, we use a different measure of the distribution: the entropy S. Entropy is a measure of the unpredictability of information content:
%compute entropy of the image;
S = entropy (Im)
%find where normalized histogram equals zero;
f = find (H==0);
%exclude zero values from computation;
Hx = H;
Hx(f) = [];
%compute entropy from the normalized histogram without zero values;
S = -sum (Hx.*log2 (Hx))
Images with different distributions/histograms will have completely different values of entropy.
%show histogram and entropy of circuit image;
figure, imshow (Im);
title (['Circuit, entropy S = ', num2str(S) ] );
figure, imhist (Im);
title (['Histogram of circuit, entropy S = ', num2str(S) ] );
%show histogram and entropy of Gaussian noise;
J = imnoise (zeros (size (Im)), 'Gaussian', .5, .1);
SJ = entropy (J)
figure, imshow (J);
title (['Gaussian noise, entropy S = ', num2str (SJ) ] );
figure, imhist (J);
title (['Histogram of Gaussian noise, entropy S = ', num2str(SJ) ] );
%show histogram and entropy of cell image;
C = imread ('cell.tif');
C = double (C) / 255;
SC = entropy (C)
figure, imshow (C);
title (['Cells, entropy S = ', num2str (SC) ] );
figure, imhist (C);
title (['Histogram of cells, entropy S = ', num2str (SC) ] );
%show histogram and entropy of unique value;
U = ones (size (Im))*.5;
SU = entropy (U)
figure, imshow (U);
title (['Unique intensity value, entropy S = ', num2str(SU) ] );
figure, imhist (U);
title (['Histogram of unique intensity value, entropy S = ', num2str(SU) ] );
Entropy is a single number that characterizes the distribution:
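For instance, the two extreme cases can be checked directly from the definition: a perfectly uniform histogram over 256 intensity levels gives the maximal entropy of 8 bits, while a histogram concentrated in a single bin gives zero (a short sketch):

```matlab
p = ones (256, 1) / 256;       %uniform distribution over 256 levels;
Smax = -sum (p.*log2 (p))      %maximal entropy, 8 bits;
q = 1;                         %all pixels share one intensity value;
Smin = -sum (q.*log2 (q))      %minimal entropy, 0 bits;
```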
3. Information entropy

Let us start with a reminder of the form of the Shannon entropy from information theory, with respect to image analysis terminology. Any given normalized discrete probability distribution p fulfills the condition:

∑k p(dk) = 1.
Usually, in an intensity image, an approximation of the probability distribution is given by the normalized histogram function H(d). H(d) is an intensity function that shows the count of the pixels with intensity equal to d, independent of the image position [3,4]. The histogram is normalized by the number of pixels to fulfill this condition (the 0th central moment again).
More conditions are assumed when measuring information. Information must be additive for two independent events a, b:

I(ab) = I(a) + I(b).
The information itself should depend only on the probability distribution p, or on the normalized histogram function H in our case. The additivity condition above is the well-known modified Cauchy's functional equation, with the unique solution I = −κ·log2(p). In statistical thermodynamic theory, the constant κ refers to the Boltzmann constant [5]. In the Hartley measure of information, κ equals one [6,7]. Let us focus on the Hartley measure. If different amounts of information occur with different probabilities, the total amount of information is the average of the individual information values, weighted by the probabilities of their individual occurrences [7,8]. Therefore, the total amount of information is:

∑k p(dk)·I(dk),
which leads us to the definition of the Shannon entropy as a measure of information:

S = −∑k p(dk)·log2 p(dk).

Thus, entropy is the sum of the individual information values weighted by the probabilities of their occurrences.
In image analysis, the unknown probability distribution function of the intensity values is approximated via the histogram function H(d); the histogram has to be normalized by the total number of pixels [9,10]. The Shannon entropy allows the information content of the whole image, or just of a selected part of the image, to be measured (Figures 10 and 11).
The entropy implemented in the MATLAB function

S = entropy (Im)

is the Shannon entropy.
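This can be verified by comparing the built-in function with the manual computation from the normalized histogram, as a short sketch (any tiny residual difference is due only to floating-point rounding and the uint8 binning used internally by entropy):

```matlab
Im = imread ('circuit.tif');
Im = double (Im) / 255;
[H,d] = imhist (Im);      %256-bin histogram;
H = H./sum (H);           %normalize to probabilities;
Hx = H(H>0);              %exclude empty bins;
S1 = -sum (Hx.*log2 (Hx)) %manual Shannon entropy;
S2 = entropy (Im)         %built-in MATLAB entropy, should match S1;
```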
4. Entropy filtration
Entropy allows the information content of the entire image to be measured. However, when we change the set of pixels entering the histogram computation, we obtain a partial information content that is strictly dependent on the area entering the computation (Figures 12 and 13).
Entropy filtering is based on the replacement of the pixel values in the image by values of entropy. The entropy is computed over a specified area, usually from the pixel's n-by-n symmetric neighborhood in the input image [4,11]. The shape of the neighborhood can also be defined by the user. The computed entropy is

S(x, y) = −∑k pΩ(dk)·log2 pΩ(dk),

where Ω is the pixel's neighborhood and pΩ is the normalized histogram of the intensities within Ω.
It is clear that the output image (as computed by entropy filtration) is strongly dependent on the selected area. For a small n, local disturbances are given too much weight, and the output image will be noisy. On the other hand, too large an n value will not preserve details, and the output image will be blurred. Therefore, the key question in the filtration method is how to select a suitable neighborhood: the n selection is always a compromise between a noisy and a blurry image. Of course, filtration can be very useful for decreasing the area of interest and thus allowing further analysis (Figures 14 and 15).
%compute entropy filtering with small structure element;
F = entropyfilt (Im);
figure, imshow (F, []);
title (‘Entropy filtering of circuit, se = true(9)’);
%compute entropy filtering with middle structure element;
F = entropyfilt (Im,true (41));
figure, imshow (F, []);
title (‘Entropy filtering of circuit, se = true(41)’);
%compute entropy filtering with large structure element;
F = entropyfilt (Im,true(91));
figure, imshow (F, []);
title (‘Entropy filtering of circuit, se = true(91)’);
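To see what entropyfilt does internally, the local entropy can also be sketched with nlfilter, which applies a function to each sliding n-by-n neighborhood (this is much slower than the built-in function and is shown here only for illustration):

```matlab
%local Shannon entropy of one neighborhood, empty bins excluded;
locent = @(B) -sum ( nonzeros (imhist (B)./numel (B)) .* ...
                     log2 ( nonzeros (imhist (B)./numel (B)) ) );
F2 = nlfilter (Im, [9 9], locent); %apply to each 9-by-9 neighborhood;
figure, imshow (F2, []);
title ('Manual entropy filtering, 9-by-9 neighborhood');
```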
5. Entropy thresholding and segmentation
Thresholding is a computationally cheap method that searches for a point in the intensity histogram that separates the image into objects related to the real objects. It takes from the image the parts corresponding to the threshold parameter(s). Automatic threshold selection using entropy is based on the maximization of the segmentation entropy. The histogram function H is separated into two parts, A and B, iteratively at each intensity level T. For both parts, the Shannon entropies are computed:

SA(T) = −∑k=1..T (H(dk)/C(T))·log2 (H(dk)/C(T)),
SB(T) = −∑k=T+1..n (H(dk)/(1−C(T)))·log2 (H(dk)/(1−C(T))),

where C(T) = ∑k=1..T H(dk) is the cumulative distribution function. Then, the entropy of parts A and B taken together is computed as

S(T) = SA(T) + SB(T).
The threshold value is set to the T where S(T) is maximized [12,13]. This method uses the global histogram function; therefore, it is not sensitive to the random noise contribution and successfully removes the noise. However, the use of thresholds also ignores local changes in the background, illumination, and non-uniformity. For images with varying conditions within the scene, thresholding generally produces losses and artifacts. The use of thresholds without any previous preprocessing, for example, light normalization, is applicable only to objects that are well separable from the background. Automatic segmentation techniques [3,4,12,14,15] are very powerful tools under easily separable conditions (Figures 16 and 17).
HA = zeros(size(H)); %empty lower histogram;
HB = zeros(size(H)); %empty upper histogram;
%empty cumulative distribution function;
C = zeros(size(H));
%cumulative distribution function;
C(1) = H(1);
for k = 2:length(H),
    C(k) = C(k-1) + H(k);
end
C = double(C);
%cycle through intensity levels;
for k = 1:length(H),
    if C(k) > 0, %only for positive cumulation;
        for w = 1:k, %from beginning till now;
            if H(w) > 0, %only for positive histogram;
                %compute the lower histogram value;
                HA(k) = HA(k) - ( H(w)/C(k) ) * log2( H(w)/C(k) );
            end
        end
    end
    if ( 1-C(k) ) > 0, %only for positive cumulation residuals;
        for w = k+1:length(H), %from now till end;
            if H(w) > 0, %only for positive histogram;
                %compute the upper histogram value;
                HB(k) = HB(k) - ( H(w)/(1-C(k)) ) * log2( H(w)/(1-C(k)) );
            end
        end
    end
end
%locate the maxima for joined histograms;
[co, kde] = max(HA+HB);
Th = d( kde-1 )
II = im2bw(Im, Th);
figure, imshow(II);
title(['Entropy segmentation of circuit, Th = ', num2str(Th)]);
The value T where the entropy S(T) is maximized represents the threshold for the segmentation of the image (Figure 18).
5.1. Grayscale thresholding
The entropy segmentation gives results similar to Otsu thresholding. Otsu gray-level thresholding is a nonparametric method of automatic threshold selection for image segmentation, also computed from the normalized intensity histogram H. To separate the histogram into two classes at level T, the between-class variance is maximized:

σB²(T) = w1(T)·(μ1(T) − μ)² + w2(T)·(μ2(T) − μ)²,

where w1, w2 are the class probabilities, μ1, μ2 the class means, and μ the total mean.
%cycle through the histogram;
for T = 1:length(H),
    w(1) = sum( H(1:T) ); %probability of first class;
    u(1) = sum( H(1:T) .* d(1:T) ); %class mean;
    %protection against zero;
    if w(1) == 0,
        u(1) = 0;
    else
        %class mean recomputation;
        u(1) = u(1)/w(1);
    end
    w(2) = sum( H( (T+1):end) ); %probability of second class;
    u(2) = sum( H( (T+1):end) .* d( (T+1):end) ); %class mean;
    %protection against zero;
    if w(2) == 0,
        u(2) = 0;
    else
        %class mean recomputation;
        u(2) = u(2)/w(2);
    end
    %between class variance;
    ut = w*u';
    sigmaB(T) = w(1)*(u(1)-ut)^2 + w(2)*(u(2)-ut)^2;
end
%find maximal between class variance;
[e,r] = max(sigmaB);
TTh = d(r) %set threshold from the maximal between class variance;
%compare with the built-in Otsu implementation;
TTh = graythresh(Im); %compute threshold;
IO = im2bw(Im, TTh); %segment image;
figure, imshow(IO);
title (['Otsu segmentation of circuit, Th = ', num2str(TTh)]);
6. Point Information Gain
The most interesting method is the point information gain (PIG), which asks the question: How important is one pixel for the whole image or for a selected part of it? In other words, is the occurrence of the value of one single pixel a surprise? It is predictable that the values of background pixels will not carry a lot of information: if we discard one of them, little changes. On the other hand, the objects, especially if they are complicated in structure, will increase the entropy at their positions. The Shannon equation evaluates the total amount of information entropy from the whole histogram. Let us evaluate the normalized image histogram H and compute the Shannon information entropy S:

S = −∑k H(dk)·log2 H(dk).
To investigate the contribution of one single pixel with intensity value d to the total entropy, we need to evaluate a second histogram H*, which is created without this investigated pixel. This time, we discard the value of the investigated pixel from the computation, but only once.
One single pixel of intensity value d will only decrease the histogram value at its intensity position d. Then, the histogram is again normalized. The probability H*(d) of the intensity value d is slightly lower than the corresponding probability of the primary normalized histogram H (with all pixels). The other probabilities H*(dk), where dk is not the value of the investigated pixel, are slightly higher than the probabilities of the primary normalized histogram. Then, in the second computation of entropy S*, computed from the modified normalized histogram H*:

S* = −∑k H*(dk)·log2 H*(dk),

the individual information values as well as their weights differ from those in the computation of the whole entropy S. Therefore, we obtain two different entropy values, S and S*. Entropy S represents the whole measure of information in the original image. Entropy S* represents the measure of information in the image without the investigated pixel. The difference PIG:

PIG(d) = S − S*
refers to the difference between the entropies of the two histograms, and therefore also to the difference between the entropies of the two images (the first one, which contains the investigated pixel, and the second one without this investigated pixel). Recall that both histograms H and H* were normalized; therefore, any difference in the number of pixels in the images is immaterial. The difference PIG represents either the entropy contribution of the pixel, or the contribution of the value of the pixel to the information content of the whole image. The transformation of each image pixel value to its contribution to the whole image via this difference represents the measure of the information carried by that pixel, the point information gain (PIG). Repeating the computation for every single pixel of the image transforms the original image into an entropy map: an image that shows the contribution of every pixel to the whole information content of the image (Figures 19 and 20).
It is predictable that the values of background pixels will not carry a lot of information, even if we discard one of them. On the other hand, the objects, especially if they are complicated in structure, will increase the entropy in their immediate area. According to information theory, an object occurrence produces a bigger surprise than a background occurrence, and the PIG quantifies this effect. For this reason, the details in the image are preserved: they are the surprise. For the same reason, random noise is removed: we always know it is present, and no surprise occurs (Figures 21 and 22).
C = imread('cell.tif');
%C = imread('cameraman.tif');
%C = imread('circuit.tif');
C = double(C)/255;
S = entropy(C); %compute entropy;
%compute average probability of one pixel;
pomo = 1/numel(C);
[H,d] = imhist(C); %compute histogram;
H = H./sum(H); %normalize histogram;
IE = zeros(size(C)); %empty result image;
%cycle through intensity levels;
for k = 1:length(H),
    %precompute second histogram;
    G = H;
    %remove pixel contribution;
    G(k) = G(k) - pomo;
    %protection against zero;
    f = find(G <= 0);
    G(f) = [];
    G = G./sum(G); %renormalization;
    %entropy without pixel;
    E(k) = -sum(G.*log2(G));
    %point information gain;
    PIG(k) = S - E(k);
    %assign pig to pixels;
    f = find(C == d(k));
    IE(f) = PIG(k);
end
figure, imshow(IE, []);
title('Point Information Gain of the cells.');
%title('Point Information Gain of the cameraman.');
%title('Point Information Gain of the circuit.');
7. Conclusion and discussion
For those who are interested in entropy processing, things are a little bit more complicated.
The PIG approach depends only on the pixel value d; there is no information about the pixel's position. Therefore, the area of the histogram function computation does not have to include the whole image, but only some selected area around the investigated pixel. It could be, for example, the whole row and the whole column in which the pixel is located. The difference PIG in this case refers to the difference between the information contents of the two crosses. It represents either the entropy contribution of the pixel, or the contribution of the value of the pixel to the cross. Even more derivations of the original algorithm have been developed recently [17–19].
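A minimal sketch of this cross variant for one pixel could look as follows. It assumes the image C loaded as a double in [0, 1] (as in the previous listing); the pixel position is an illustrative example, and note that in this simple sketch the investigated pixel enters the cross twice (once via its row, once via its column) but is discarded only once, as the text describes:

```matlab
x = 100; y = 120;                      %investigated pixel (example position);
cross = [ C(x,:)'; C(:,y) ];           %whole row and whole column as one vector;
Hc = imhist(cross); Hc = Hc./sum(Hc);  %normalized cross histogram;
%entropy of the cross with the pixel;
Sc = -sum( nonzeros(Hc).*log2(nonzeros(Hc)) );
G = imhist(cross);
k = round( C(x,y)*255 ) + 1;           %histogram bin of the pixel value;
G(k) = G(k) - 1;                       %discard the investigated pixel once;
G = G./sum(G);                         %renormalization;
%entropy of the cross without the pixel;
Sx = -sum( nonzeros(G).*log2(nonzeros(G)) );
PIGxy = Sc - Sx                        %cross point information gain;
```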
There also exist different entropies, not only Shannon's: namely, at least the Tsallis-Havrda-Charvát and Rényi definitions. The Rényi entropy:

Sα = (1/(1−α))·log2 ∑k p(dk)^α

is a generalization of the Shannon entropy. For α → 1, the Rényi entropy equals the Shannon entropy (S1 = S).
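A sketch of the Rényi entropy computed from the normalized histogram for a few values of α follows; the formula is numerically unstable for α exactly 1, so the α → 1 limit is evaluated via the Shannon formula instead (the α values are arbitrary examples):

```matlab
H = imhist(Im); H = H./sum(H);   %normalized histogram;
p = nonzeros(H);                 %omit empty bins;
for alpha = [0.5, 2, 5],
    %Renyi entropy for the current alpha;
    Sa = (1/(1-alpha)) * log2( sum( p.^alpha ) )
end
S1 = -sum( p.*log2(p) )          %Shannon entropy, the alpha -> 1 limit;
```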
The evaluation of entropy carries a heavy computational burden; therefore, it is recommended to use parallelization on a GPU. For the processing of color images, it is usual to treat each color channel independently, like a grayscale image.
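The per-channel treatment of a color image could be sketched as follows, using MATLAB's built-in color demo image peppers.png as an example input:

```matlab
RGB = imread('peppers.png');        %built-in color demo image;
RGB = double(RGB)/255;              %convert to double, range 0-1;
for c = 1:3,
    Sc(c) = entropy( RGB(:,:,c) );  %entropy of each channel independently;
end
Sc                                  %three entropy values: R, G, B;
```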
Overall, entropy is a representative parameter of the image, and there is still a lot of potential in its usage for processing and analysis.
The code presented in this chapter can be downloaded at: https://www.mathworks.com/matlabcentral/fileexchange/55493-information-entropy.
This work was supported by the Ministry of Education, Youth, and Sports of the Czech Republic - projects “CENAKVA” (No. CZ.1.05/2.1.00/01.0024) and “CENAKVA II” (No. LO1205 under the NPU I program).
Katajamaa M., Orešič M. Data processing for mass spectrometry-based metabolomics. Journal of Chromatography A. 1158:318–328, 2007.
Pearson K. Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 186:343–414, 1895.
Sonka M., Hlavac V., Boyle R. Image processing, analysis and machine vision. Brooks/Cole Publishing Company, 1999.
Gonzales R. C., Woods R. E. Digital image processing. Addison-Wesley Publishing Company, 1992.
Boublik T. Statistical thermodynamics. Academia, 1996.
Hartley R. V. L. Transmission of information. Bell System Technical Journal. 7:535–563, 1928.
Jizba P., Arimitsu T. The world according to Renyi: thermodynamics of multifractal systems. Annals of Physics. 312:17–59, 2004.
Shannon C. E. A mathematical theory of communication. Bell System Technical Journal. 27:379–423 and 623–656, 1948.
Demirkaya O., Asyali M. H., Sahoo P. K. Image processing with MATLAB: Applications in medicine and biology. CRC Press, 2009.
Nixon M., Aguado A. Feature extraction & image processing. Academic Press, 2002.
Moddemeijer R. On estimation of entropy and mutual information of continuous distributions. Signal Processing. 16(3):233–246, 1989.
Pun T. A new method for grey level thresholding using the entropy of the histogram. Signal Processing. 2:223–237, 1980.
Tzvetkov P., Petrov G., Iliev P. Multidimensional dynamic scene analysis for video security applications. IEEE Computer Science’ 2006, Istanbul.
Beucher S. Applications of mathematical morphology in material sciences: a review of recent developments. International Metallography Conference, pp. 41–46, 1995.
Otsu N. A Threshold Selection Method from Gray-Level Histogram. IEEE Transactions on Systems, Man, and Cybernetics 9:62–66, 1979.
Urban J., Vanek J., Stys D. Preprocessing of microscopy images via Shannon’s entropy. In Proceedings of Pattern Recognition and Information Processing: pp.183–187, Minsk, Belarus, ISBN 978-985-476-704-8, 2009.
Rychtarikova R., Nahlik T., Smaha R., Urban J., Stys D. Jr., Cisar P., Stys D. Multifractality in imaging: application of information entropy for observation of inner dynamics inside of an unlabeled living cell in bright-field microscopy. In ISCS14, Sanayei et al. (eds.), Springer, pp. 261–267, 2015.
Štys D., Urban J., Vaněk J., Císař P. Analysis of biological time-lapse microscopic experiment from the point of view of the information theory. Micron, S0968-4328(10)00026-0, 2010.
Štys D., Vaněk J., Náhlík T., Urban J., Císař P. The cell monolayer trajectory from the system state point of view. Molecular Biosystems, 7:2824–2833, 2011.