Table of results for the application of the box counting method.
Recent advances in both texture analysis algorithms and computer technology have boosted the development of new algorithms to quantify the textural properties of an image, notably in medical imaging. Promising results have shown the ability of texture analysis methods to extract diagnostically meaningful information from medical images obtained with various imaging modalities, such as positron emission tomography (PET) and magnetic resonance imaging (MRI). Among the texture analysis techniques, fractal geometry has become a tool in medical image analysis. In fact, the concept of fractal dimension can be used in a large number of applications, such as shape analysis and image segmentation. Interestingly, even though self-similarity can hardly be verified in biological objects imaged with a finite resolution, certain similarities at different spatial scales are quite evident. The fractal dimension offers the ability to describe and characterize the complexity of images or, more precisely, of their texture composition.
2.1. Fractal geometry
A fractal is a geometrical object characterized by two fundamental properties: self-similarity and the Hausdorff-Besicovitch dimension. A self-similar object is exactly or approximately similar to a part of itself and can be continuously subdivided into parts, each of which is (at least approximately) a reduced-scale copy of the whole. Furthermore, a fractal generally shows irregular shapes that cannot be simply described by the Euclidean dimension, so the fractal dimension (fd) has to be introduced to extend the concept of dimension to these objects. Unlike topological dimensions, the fd can take non-integer values, meaning that the way a fractal set fills its space is qualitatively and quantitatively different from how an ordinary geometrical set does.
Nature presents a large variety of fractal forms, including trees, rocks, mountains, clouds, biological structures, water courses, coastlines and galaxies. Moreover, it is possible to construct mathematical objects which satisfy the condition of self-similarity and that possess an fd (Figure 1).
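The objects in Figure 1 are self-similar, since a part of the object is similar to the whole, and the fractal dimension can be calculated by the equation:
$$fd = \frac{\log N}{\log(1/s)} \qquad (1)$$
where $N$ is the number of self-similar parts into which an object can be subdivided and $s$ is the scaling, that is, the reduction factor needed to observe the $N$ self-similar parts. According to Eq. 1, the following values are obtained for the Koch fractal ($N = 4$, $s = 1/3$) and the Sierpinski triangle ($N = 3$, $s = 1/2$):
$$fd_{Koch} = \frac{\log 4}{\log 3} \approx 1.262, \qquad fd_{Sierpinski} = \frac{\log 3}{\log 2} \approx 1.585 \qquad (2)$$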
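As a quick numerical check of the similarity-dimension formula, the following Python sketch evaluates it for the two fractals mentioned above. The function name `similarity_dimension` and the argument names `n_parts` and `scale` are illustrative assumptions, not names taken from the chapter's Matlab implementation.

```python
import math


def similarity_dimension(n_parts: int, scale: float) -> float:
    """Return log(N) / log(1/s) for N self-similar parts reduced by a factor s."""
    if n_parts < 1 or not 0.0 < scale < 1.0:
        raise ValueError("need n_parts >= 1 and 0 < scale < 1")
    return math.log(n_parts) / math.log(1.0 / scale)


# Koch curve: 4 copies, each reduced to 1/3 of the whole.
fd_koch = similarity_dimension(4, 1 / 3)
# Sierpinski triangle: 3 copies, each reduced to 1/2 of the whole.
fd_sierpinski = similarity_dimension(3, 1 / 2)
print(round(fd_koch, 3), round(fd_sierpinski, 3))
```

Both values are non-integer, which is the property that distinguishes these sets from ordinary curves and surfaces.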
In mathematics, no universal definition of fd exists, and the several definitions of fd may lead to different results for the same object. Among the wide variety of fd definitions that have been introduced, the Hausdorff dimension is surely the most important and the most widely used. This definition can theoretically be applied to every fractal set, but it has the disadvantage that it cannot always be easily determined by computational methods.
2.2. Hausdorff dimension
The Hausdorff dimension was introduced in 1918 by the mathematician Felix Hausdorff. Since many of the technical developments used to compute the Hausdorff dimension for highly irregular sets were obtained by Abram Samoilovitch Besicovitch, it is sometimes called the Hausdorff-Besicovitch dimension.
Hausdorff’s formulation is based on the construction of a particular measure, $H^d$, representing the uniform density of the fractal object.
Intuitively, we can sum up the construction as follows: let $F$ be a fractal and $A = \{A_i\}$ a complete coverage of $F$ consisting of spheres of radius $r_i$ and diameter smaller than a given $r$ that approximate $F$, so that $F \subset \bigcup_i A_i$.
We define the Hausdorff measure as the function that takes the smallest value over all such coverings of $F$:
$$H^d_r(F) = \inf_{A} \sum_i \omega_d \, r_i^d \qquad (3)$$
with $\omega_d$ the volume of the unit sphere in $\mathbb{R}^d$ for integer $d$.
We obtain an approximate measurement of $F$, the so-called coarse-grained volume.
In the one-dimensional case ($d = 1$), $H^1_r(F)$ supplies the length of the set $F$ measured with a ruler of length $r$. The shorter the ruler, the longer the length measured, a paradox known as the coastline paradox.
Hence, when $r \to 0$ the effective length of $F$ is well approximated. The limit for small $r$ calculated for other values of $d$, however, leads to a degenerate $H^d$:
$$H^d(F) = \lim_{r \to 0} H^d_r(F) = \begin{cases} \infty & d < fd \\ 0 & d > fd \end{cases} \qquad (4)$$
Therefore, the fd can be defined as the transition point for the function $H^d(F)$, which decreases monotonically with $d$:
$$fd = \inf \{\, d : H^d(F) = 0 \,\} \qquad (5)$$
with $H^d$ the $d$-dimensional Hausdorff measure given by Eq. 3.
The coarse-grained volume defined by Eq. 3 normally presents a scaling like:
$$H^d_r(F) \propto r^{\,d - fd} \qquad (6)$$
which provides a method to estimate the dimension.
In the uni-dimensional case we can easily obtain:
$$L(r) = H^1_r(F) \propto r^{\,1 - fd} \qquad (7)$$
from which we derive the fd.
Although the definition of the Hausdorff dimension is particularly useful to operatively define the fd, it presents difficulties in implementation. In fact, determining the lower bound over all coverings that underlies Eq. 5 can be quite complex. For example, let us consider the uni-dimensional case, in which we want to compute the fd of a coastline (Koch curve). According to Eq. 3, in the case $d = 1$ the coastline length is measured by a ruler of length $r$. The accuracy of the measure increases with decreasing $r$. For $r \to 0$ the coastline will have infinite length. Similar arguments can be applied to $d = 2$: for $r \to 0$ the measure of the area tends to zero.
This discussion implies that our coastline (e.g. the Koch curve) will have an fd value greater than one and less than two. For this reason, the fd is considered as the transition point (the lower bound value in Eq. 5) between $H^d = \infty$ and $H^d = 0$.
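The coastline argument can be checked numerically on the Koch curve, for which the measured length is known in closed form: at construction level $k$ the curve consists of $4^k$ segments of length $3^{-k}$ (for a unit base segment). The Python sketch below assumes the power-law scaling of the measured length with the ruler length, so that the fd equals one minus the slope of log(length) versus log(ruler); the names `koch_length`, `rulers` and `lengths` are illustrative.

```python
import math


def koch_length(level: int) -> float:
    """Length of the Koch curve (unit base segment) measured with a ruler of 3**-level."""
    return (4.0 / 3.0) ** level


levels = range(1, 8)
rulers = [3.0 ** -k for k in levels]
lengths = [koch_length(k) for k in levels]

# The points lie exactly on a straight line in the log-log plane,
# so the first and the last point are enough to get the slope.
slope = (math.log(lengths[-1]) - math.log(lengths[0])) / (
    math.log(rulers[-1]) - math.log(rulers[0])
)
fd_koch = 1.0 - slope
print(fd_koch)
```

The measured length grows without bound as the ruler shrinks, while the slope, and therefore the fd estimate, stays constant.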
Several computational approaches have been developed to avoid the need to determine the lower bound at issue. Many strategies compute the fd by retrieving it from the scaling of the object’s bulk with its size. In fact, an object’s bulk and its size have a linear relationship on a logarithmic scale, so that the slope of the best-fitting line provides an estimate of the fd. By using this log-log graph, called Richardson’s plot, the requirement of knowing the infimum over all coverings is relaxed.
Several approaches have been developed to estimate the fractal dimension of images. In particular, this section introduces two fractal analysis strategies: the Box Counting Method and the Hand and Dividers Method.
These methods overcome the problem by choosing a simple fixed rectangular grid as the covering, which yields an upper bound on the Hausdorff measure.
Five algorithms for a practical fd calculation based on these methods will also be presented.
3.1. Box counting method
The most popular method using the best-fitting procedure is the so-called Box Counting Method. Given a fractal structure embedded in a $d$-dimensional volume, the box-counting method basically consists of partitioning the structure space with a $d$-dimensional fixed grid of square boxes of equal size $r$.
The number $N(r)$ of nonempty boxes of size $r$ needed to cover the fractal structure depends on $r$:
$$N(r) \propto r^{-fd} \qquad (8)$$
The box counting algorithm hence counts the number $N(r)$ for different values of $r$ and plots the log of $N(r)$ versus the log of the box size $r$. The value of the box-counting dimension is estimated from the slope of the best-fitting line of the Richardson’s plot; with the scaling of Eq. 8, the fd is the negative of that slope.
Figure 2 shows the Box counting method for the Koch Curve.
Several algorithms based on the box counting method have been developed and widely used for fd estimation, since the method can be applied to sets with or without self-similarity. However, in computing the fd with this method, a box is either counted or not counted according to whether it contains some points or no points. No provision is made for weighting a box according to the number of points of the fractal that fall inside it.
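The counting and fitting steps of the box counting method can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's Matlab code: the structure is given as a list of integer pixel coordinates, the grid is anchored at the origin, and the test object is a discrete Sierpinski triangle generated with the bitwise rule `x & y == 0` (a convenient construction chosen here for the example). The helper names `box_count`, `fit_slope` and `box_counting_dimension` are hypothetical.

```python
import math


def box_count(points, size):
    """Number of nonempty boxes of side `size` in a fixed grid anchored at the origin."""
    return len({(x // size, y // size) for x, y in points})


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def box_counting_dimension(points, sizes):
    """Negative slope of log N(r) versus log r (Richardson's plot)."""
    log_r = [math.log(s) for s in sizes]
    log_n = [math.log(box_count(points, s)) for s in sizes]
    return -fit_slope(log_r, log_n)


# Discrete Sierpinski triangle on a 64 x 64 grid.
side = 64
points = [(x, y) for x in range(side) for y in range(side) if x & y == 0]
sizes = [1, 2, 4, 8, 16, 32]
fd = box_counting_dimension(points, sizes)
print(round(fd, 3))
```

Each box contributes one count regardless of how many points it contains, which is exactly the unweighted counting described above.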
3.2. Hand and dividers method
Useful features and information can be deduced from the contours of structures belonging to an image, and a number of techniques can be used to estimate the boundary fractal dimension.
The Richardson method employs the so-called walking technique, which consists of "walking" around the boundary of the structure with a given step length $\lambda$.
The actual structure boundary is thus approximated by a polygon whose length is equal to:
$$P(\lambda) = N(\lambda)\,\lambda \qquad (9)$$
In a nutshell, it corresponds to the length $\lambda$ of the single step multiplied by the number of steps $N(\lambda)$ needed to complete the walk.
The process is then reiterated for different step lengths:
$$P(\lambda) \propto \lambda^{\,1 - fd} \qquad (10)$$
with $P(\lambda)$ the perimeter calculated with steps of length $\lambda$.
The object’s boundary fd is finally estimated from:
$$fd = 1 - m \qquad (11)$$
where $m$ is the slope of the Richardson’s plot.
The perimeter length of the boundary depends on the step length used: a large step provides a rough estimate of the perimeter, whereas a smaller step can take into account finer details of the contour.
Consequently, if the step length decreases, the perimeter increases.
In practice, the perimeter length is obtained by constructing a generally irregular polygon which approximates the border. Let $B = \{(x_i, y_i)\}$ be the set of coordinates of the object boundary and let $\lambda$ be a fixed step length. Given a starting point $(x_s, y_s)$, an arbitrary contour point, the next point $(x_j, y_j)$ on the boundary in a fixed direction (e.g. clockwise) is the point that has a distance
$$d_j = \sqrt{(x_j - x_s)^2 + (y_j - y_s)^2} \qquad (12)$$
as close as possible to $\lambda$.
The reached point then becomes the new starting point and is used to locate the next point on the boundary that satisfies the previous condition. This process is repeated until the initial starting point is reached.
The sum of all distances $d_j$ corresponds to the perimeter of the irregular polygon (Figure 3).
The perimeters of the polygons obtained at each fixed step length are used to build the Richardson’s plot, and the slope of its best linear fit is used to estimate the fd.
All Hand and Dividers techniques rely on the same principle, which attempts to approximate the border perimeter with different polygons. However, since the coordinates of the points belonging to the border set are discrete, the implemented methods differ in the choice of which point in the set has a distance that best approximates the step length.
The following two methods implement two different choices for overcoming this particular issue.
4.1. HYBRID algorithm
The HYBRID algorithm is a computer implementation of the Hand and Dividers method developed by Clark. Let the boundary of the object whose fd we wish to compute be given as an ordered set of points. The main part of the method focuses on the perimeter estimation, and the corresponding Richardson’s plot is then obtained by repeating this core part at different step sizes. Figure 4 shows the flow chart of the method.
Given an arbitrary starting point on the boundary, the algorithm searches for the next pivot point. In particular, the starting point is copied into a current point, which marks the points having a mutual distance of about one step length. The point that runs through the entire border is called the running point.
The program then searches for the running point whose distance from the current point is as close as possible to the step. In the HYBRID method the real step may be longer or shorter than the fixed step, depending on which gives the minimum deviation from it: once the running point hits a contour point whose distance from the current point is greater than the step size, the choice is made between that point and the preceding one.
Afterward, the computed distance between the current point and the chosen running point is stored, and the running point becomes the new current point.
The procedure continues until the initial starting point is reached. After a complete walk, the starting point is likely to be reached before a full step has been completed. In other words, the perimeter may not be a multiple of the step size, so the final incomplete step length is added to the other stored distances, whose sum represents the boundary’s perimeter. Since the fixed step length is adapted every time during the perimeter computation, its average value is computed and used in the Richardson’s plot.
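The step selection described for the HYBRID method can be sketched as follows in Python. This is an illustration of the prose above under stated assumptions, not Clark's original code nor the chapter's Matlab implementation: the boundary is an ordered list of `(x, y)` points of a closed contour, the walk starts at index 0, and the names `hybrid_perimeter`, `current` and `running` are hypothetical.

```python
import math


def hybrid_perimeter(boundary, step):
    """Walk an ordered closed contour with a nominal step length.

    Returns (perimeter, mean_step), where mean_step is the average of the
    actual lengths of the complete steps.
    """
    n = len(boundary)
    current = 0   # index of the current (pivot) point
    steps = []    # actual length of every complete step
    while True:
        # Move the running point forward until it is at least one step away.
        running = current + 1
        while running < n and math.dist(boundary[current], boundary[running]) < step:
            running += 1
        if running >= n:
            # The starting point is reached before a full step is completed.
            closing = math.dist(boundary[current], boundary[0])
            break
        far = math.dist(boundary[current], boundary[running])
        # Choose between this point and the preceding one,
        # whichever deviates least from the nominal step.
        if running - 1 > current:
            near = math.dist(boundary[current], boundary[running - 1])
            if step - near < far - step:
                running, far = running - 1, near
        steps.append(far)
        current = running
    perimeter = sum(steps) + closing
    mean_step = sum(steps) / len(steps) if steps else step
    return perimeter, mean_step


# Example: a circle of radius 100 sampled at 3600 boundary points.
circle = [
    (100 * math.cos(2 * math.pi * i / 3600), 100 * math.sin(2 * math.pi * i / 3600))
    for i in range(3600)
]
perimeter, mean_step = hybrid_perimeter(circle, 10.0)
print(perimeter, mean_step)
```

For a smooth contour such as the circle, the polygon perimeter stays just below the true circumference and the mean step stays close to the nominal one. Repeating the call for several step lengths gives the points of the Richardson's plot.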
4.2. EXACT algorithm
The EXACT algorithm was first proposed by Clark in 1986. As will be shown, this method requires a longer computational time but provides a simpler solution to the choice of the best current points.
As for the previous method, the entire perimeter estimation is displayed in a flow chart (Figure 5).
The procedure is very similar to the one used for the previous method. As before (see Figure 5), the end of the step may not coincide with the digitized coordinates of the boundary.
The EXACT method attempts to overcome this problem by assuming piecewise linearity, meaning that all the points on the contour can be joined by a series of straight lines [13, 14] (see Figure 6 (a)).
The location of the next current point on the boundary from the one previously determined is schematically illustrated in Figure 6 (b).
The procedure starts from an arbitrary starting point, and the algorithm searches for the next pivot point. As in the HYBRID method, the starting point is copied into a current point, which marks the points having a mutual distance of one step length, while the point that runs through the entire border is called the running point.
The distance from the current point to each point on the contour line is then calculated until the step length $\lambda$ falls between the distances $d_j$ and $d_{j+1}$ of two consecutive boundary points, for which:
$$d_j \le \lambda \le d_{j+1} \qquad (13)$$
The exact position of the point lying at distance $\lambda$ from the current point is deduced by geometric interpolation between the two consecutive running points. This point then becomes the new current point and is used to calculate the next boundary point, and so on.
The process stops when the initial starting point is reached again, so as to obtain a polygon as shown in Figure 8.
The perimeter length of the polygon is found by adding the final incomplete step length to the sum of the other step lengths needed to entirely cover the boundary.
The procedure is then repeated for different step lengths.
The results, i.e., perimeter lengths versus step lengths, are plotted on a log-log Richardson’s plot. From the slope of the fitting line on the Richardson’s plot we obtain the fd of the examined boundary [17, 12, 16, 18, 19, 1, 20, 21, 4].
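The interpolation step of the EXACT method amounts to intersecting a circle of radius equal to the step length, centred on the current point, with the boundary segment that crosses it. The Python sketch below illustrates this under the piecewise-linearity assumption stated above; it is not Clark's original code, and the names `point_at_distance` and `exact_perimeter` are hypothetical. The boundary is assumed to be an ordered list of `(x, y)` vertices of a closed contour.

```python
import math


def point_at_distance(center, p, q, step):
    """Point on segment p->q lying at distance `step` from `center`.

    Assumes p is closer than `step` to `center` and q is not.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    fx, fy = p[0] - center[0], p[1] - center[1]
    a = dx * dx + dy * dy
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - step * step
    disc = max(b * b - 4.0 * a * c, 0.0)  # guard against rounding errors
    t = (-b + math.sqrt(disc)) / (2.0 * a)
    return (p[0] + t * dx, p[1] + t * dy)


def exact_perimeter(boundary, step):
    """Perimeter of the polygon obtained by walking with steps of exact length."""
    pts = list(boundary) + [boundary[0]]  # close the contour
    current = pts[0]
    nxt = 1        # index of the next boundary vertex not yet passed
    n_steps = 0
    while True:
        p = current
        # Advance the running point while it stays within one step.
        while nxt < len(pts) and math.dist(current, pts[nxt]) < step:
            p = pts[nxt]
            nxt += 1
        if nxt == len(pts):
            # Starting point reached: add the final incomplete step.
            return n_steps * step + math.dist(current, pts[0])
        current = point_at_distance(current, p, pts[nxt], step)
        n_steps += 1


# Example: a square of side 10 described by its four corners.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
p_fine = exact_perimeter(square, 2.5)
p_coarse = exact_perimeter(square, 3.0)
print(p_fine, p_coarse)
```

With a step that divides the side length the polygon follows the square, whereas a step of 3 cuts the corners and gives a shorter perimeter, which is the step-length dependence exploited by the Richardson's plot.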
4.3. Box-counting algorithm
The Box-counting algorithm, an implementation of the box-counting method, relies on the basic idea of covering a given digital binary image with a set of measuring boxes of a given size and then counting the number of boxes which actually contain part of the image.
Figure 8 shows the flow chart for box-counting fd estimation for different box sizes. Moreover, since the size-scaling procedure (which depends on the number of iterations) may not be applicable to every image matrix size, the image is padded with background pixels.
The final image therefore has a dimension that is a power of 2. This can be easily implemented by using the Matlab function padarray.
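The padding step can be sketched in Python as follows (the chapter itself uses Matlab's padarray). The sketch assumes the image is a list of equally long rows and pads it to a square whose side is the next power of 2; the names `next_power_of_two` and `pad_to_power_of_two` and the choice of padding on the right and bottom are assumptions made for this example.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of 2 that is greater than or equal to n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def pad_to_power_of_two(image, background=0):
    """Pad a list-of-rows image with background pixels to a 2**k x 2**k square."""
    rows, cols = len(image), len(image[0])
    side = next_power_of_two(max(rows, cols))
    # Extend every row to the right, then append full background rows.
    padded = [list(row) + [background] * (side - cols) for row in image]
    padded += [[background] * side for _ in range(side - rows)]
    return padded


image = [[1, 0, 1, 1, 0],
         [0, 1, 1, 0, 0],
         [1, 1, 0, 0, 1]]
padded = pad_to_power_of_two(image)
print(len(padded), len(padded[0]))
```

Because only background pixels are added, the number of nonempty boxes at each scale is unchanged by the padding.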
4.4. Differential Box-counting algorithm (DBC)
The box counting method is an extremely powerful tool for fd computation; in fact, it is easy to implement as well as flexible and robust.
However, a major limitation lies in the fact that counting nonempty boxes restricts its use to binary images rather than gray scale ones. An extension of the standard approach to gray scale images, called Differential Box Counting (DBC), was proposed in 1994 by N. Sarkar and B. B. Chaudhuri.
In the DBC method, a gray level image is considered as a 3-D spatial surface, with $(x, y)$ denoting the pixels’ spatial coordinates and the third axis $z$ the pixels’ gray level.
As for the standard box counting, the image matrix $I$ of size $M \times M$ is partitioned into non-overlapping $s \times s$-sized blocks, where $s$ is an integer falling in the interval $1 < s \le M/2$.
Then, the scale of each block is $r = s/M$. On each block there is a column of boxes of size $s \times s \times s'$, with $s'$ denoting the height of a single box. If $G$ is the total number of gray levels in $I$, $s'$ is defined by the relationship $\lfloor G/s' \rfloor = \lfloor M/s \rfloor$.
Let the numbers 1, 2, ..., $n$ be assigned to the boxes so as to group the gray levels. Let the minimum and the maximum gray level of the image in the $(i, j)$-th block fall in box number $k$ and $l$, respectively.
The number of boxes covering this block is calculated as:
$$n_r(i, j) = l - k + 1 \qquad (15)$$
Figure 9 shows an example of this count.
Extending to the contribution from all blocks:
$$N_r = \sum_{i,j} n_r(i, j) \qquad (16)$$
Eq. 16 is computed for different box sizes $s$ (and so for different $r$), and the values of $N_r$ are plotted versus the values of $1/r$ in a log-log plot.
A Matlab implementation of DBC can make use of built-in block-processing functions in order to perform the box partitioning and apply Eq. 15.
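As a language-neutral illustration of the block partitioning and box count, here is a minimal Python sketch of the DBC count. It rests on stated assumptions rather than on the chapter's Matlab code: the image is a square list of rows of integer gray levels, the box height is taken as `s * gray_levels / M` (one simple way to satisfy the relationship between box height and block size), partial blocks at the image edge are ignored, and the names `dbc_box_count` and `box_height` are hypothetical.

```python
import math


def dbc_box_count(image, s, gray_levels=256):
    """Total number of boxes N_r for block size s on a square gray-level image."""
    m = len(image)
    box_height = s * gray_levels / m  # assumed choice: G / s' equals M / s
    total = 0
    for top in range(0, m - s + 1, s):
        for left in range(0, m - s + 1, s):
            block = [
                image[y][x]
                for y in range(top, top + s)
                for x in range(left, left + s)
            ]
            k = int(min(block) // box_height)  # box holding the minimum level
            l = int(max(block) // box_height)  # box holding the maximum level
            total += l - k + 1
    return total


# A flat 16 x 16 surface: every block needs a single box, so fd should be 2.
flat = [[100] * 16 for _ in range(16)]
sizes = [2, 4, 8]
log_inv_r = [math.log(16 / s) for s in sizes]
log_n = [math.log(dbc_box_count(flat, s)) for s in sizes]
# The three points are collinear here, so the extreme points give the slope.
fd_flat = (log_n[0] - log_n[-1]) / (log_inv_r[0] - log_inv_r[-1])
print(fd_flat)
```

For a real image the slope would be obtained from a least-squares fit over all scales rather than from two points.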
The DBC procedure has some weak points. First, the method used to select an appropriate box height is constrained: the block size is limited by the image size, and the box height is limited by the number of blocks into which the image is divided.
Secondly, the box number calculation may overestimate the number of boxes needed to cover the surface. Consider the two pixels associated with the minimum and the maximum gray level of the block, as illustrated in Figure 10.
According to the DBC procedure, the two pixels are assigned to box 2 and box 3, although the distance between them is smaller than 3, which is the size of the box.
Hence, when calculating Eq. 15, two boxes are counted even though the block could be covered by a single box, because the pixels with minimum and maximum gray levels fall into two different boxes.
To solve the aforementioned problems, some modifications were proposed by J. Li, Q. Du and C. Sun. Given a digital image of size, a new scale is defined instead of, i.e. where is a positive real number.
In particular, let $\mu$ and $\sigma$ be the mean and the standard deviation of the image gray levels, respectively. If the greater part of the image pixels fall within the gray-level interval $[\mu - a\sigma, \mu + a\sigma]$, where $a$ is a positive integer, the height of the boxes is given by:
$$r' = \frac{r}{1 + 2a\sigma} \qquad (17)$$
where $r$ is the scale. As a result, the errors introduced using $r'$ are smaller than in the original DBC method. A box with smaller height is chosen when a higher intensity variation is present on the image surface, so the improved method uses, in general, finer scales to count.
Moreover, the use of $r'$ instead of the original box height $s'$ to count the number of boxes leads to the following modification of Eq. 15:
$$n_r(i, j) = \begin{cases} \mathrm{ceil}\!\left(\dfrac{l - k}{r'}\right) & l \neq k \\ 1 & l = k \end{cases} \qquad (18)$$
where $l$ and $k$ are here the maximum and the minimum gray levels of the block, and ceil(.) denotes the function that rounds its argument up to the nearest integer greater than or equal to it.
As an example, suppose that the block is covered by a column of boxes, and that the two pixels representing the maximum and the minimum gray levels of the block are assigned as in Figure 10.
According to Eq. 18, the number of counted boxes is one, which is exactly the number of boxes covering the block.
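The difference between the two counting rules can be made concrete with a small Python comparison. The sketch assumes a box height of 3 and two gray levels, 5 and 6, chosen here so that they fall into boxes 2 and 3 while being less than one box height apart; these numbers and the function names `boxes_dbc` and `boxes_improved` are illustrative assumptions, not values read from Figure 10.

```python
import math


def boxes_dbc(g_min, g_max, height):
    """Original DBC count: difference of the box numbers holding min and max, plus one."""
    k = int(g_min // height) + 1  # boxes are numbered from 1
    l = int(g_max // height) + 1
    return l - k + 1


def boxes_improved(g_min, g_max, height):
    """Modified count: gray-level range divided by the box height, rounded up."""
    if g_max == g_min:
        return 1
    return math.ceil((g_max - g_min) / height)


print(boxes_dbc(5, 6, 3), boxes_improved(5, 6, 3))
```

The original rule counts two boxes because the two levels straddle a box boundary, whereas the modified rule counts the single box that is actually sufficient to cover the range.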
As in the standard box counting method, after having determined the number $n_r(i, j)$ for each block, the total number of boxes $N_r$ covering the full image surface is computed for different scales. The fd is finally estimated from the linear fit of $\log N_r$ versus $\log(1/r)$ (Richardson’s plot).
5. Applications and discussion
Each described method has been implemented in Matlab 2010a and applied to either well-known fractals or biomedical images.
The results of the hand and dividers methods are shown in Table 1. The computed values are also compared to the theoretical fd values. The computational time on a 2.50 GHz i5 CPU is also shown.
The value ranges for the step size are not displayed, but they were chosen automatically from the structure’s maximum caliber diameter, which is defined as the major axis of an ellipse in which the structure can be embedded. The range ran from 40% of the maximum caliber diameter down to the minimum step, defined as the maximum distance between any two contiguous border points.
In practice, both the EXACT and HYBRID methods computed the different step sizes by scaling the maximum step at each iteration by a factor that depends on the iteration number. The chosen value of the scaling parameter is a compromise between a sufficient number of fitting points and the need to avoid variations of the step size so small that the perimeter estimate is duplicated. The latter usually occurs in the HYBRID method, which hits the same current points if the step does not vary enough between two consecutive iterations.
The uncertainty of the estimated parameter is also shown in Table 1; it is calculated from the fitting accuracy based upon standard linear regression.
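One common way to obtain such an uncertainty is the standard error of the slope in an ordinary least-squares fit; whether this is exactly the quantity reported in the table is an assumption. The Python sketch below computes the slope, the intercept and the slope's standard error for a set of Richardson's plot points; the function name `fit_line` is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares line through (xs, ys): returns (slope, intercept, slope_std_error)."""
    n = len(xs)
    if n < 3:
        raise ValueError("at least three points are needed for an error estimate")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_std_error = math.sqrt(residual / (n - 2) / sxx)
    return slope, intercept, slope_std_error


# Three points that do not lie on a line, so the error is nonzero.
slope, intercept, error = fit_line([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
print(slope, intercept, error)
```

For the hand and dividers methods the fd is one minus the slope, so the standard error of the slope carries over unchanged to the fd.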
The number of data points used in the Richardson’s plot was about 60, and two examples of that computation using EXACT and HYBRID are shown in Figure 12.
Table 2 shows the computation results for the box counting method. The types of displayed values are similar to the previous ones, with the exception of the box counting uncertainty. In fact, the way an image is partitioned into boxes may affect the final count of nonempty boxes.
To investigate the variability of the fd for different box partitioning layouts, random box subdivisions were applied. The results in Table 2 therefore show the mean value and the standard deviation of the computed fds for each fractal at issue. In general, the variability is more pronounced in images having coarser resolution.
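The effect of the grid layout can be explored with a small Python experiment. The sketch below is an assumption-laden illustration, not the procedure used for the table: the random subdivisions are modelled as random integer offsets of the grid origin drawn independently for each box size, the test object is a discrete Sierpinski triangle generated with the rule `x & y == 0`, and the names `box_count`, `slope` and `fd_with_random_grids` are hypothetical.

```python
import math
import random
import statistics


def box_count(points, size, offset=(0, 0)):
    """Nonempty boxes of side `size` for a grid shifted by `offset`."""
    ox, oy = offset
    return len({((x + ox) // size, (y + oy) // size) for x, y in points})


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def fd_with_random_grids(points, sizes, trials=20, seed=0):
    """Mean and standard deviation of the fd over randomly shifted grids."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        log_r, log_n = [], []
        for size in sizes:
            offset = (rng.randrange(size), rng.randrange(size))
            log_r.append(math.log(size))
            log_n.append(math.log(box_count(points, size, offset)))
        estimates.append(-slope(log_r, log_n))
    return statistics.mean(estimates), statistics.stdev(estimates)


points = [(x, y) for x in range(64) for y in range(64) if x & y == 0]
mean_fd, std_fd = fd_with_random_grids(points, [2, 4, 8, 16])
print(mean_fd, std_fd)
```

The spread of the estimates gives a direct measure of how much the fd depends on where the grid happens to fall on the image.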
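| Fractal and method | Theoretical fd | Computed fd | Computational time | Uncertainty | |
|---|---|---|---|---|---|
| Twin Dragon Hybrid | 1.5236 | 1.466 | 8.6 | 0.006 | 117005 |
| Twin Dragon Exact | 1.5236 | 1.465 | 11.5 | 0.006 | 117005 |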
In general, the EXACT and the HYBRID methods appeared to be more precise than the box counting method, but on the other hand they have a narrower range of applicability. This wider applicability is also the reason for the popularity of the box counting methods compared to the others. Also, the HYBRID technique is computationally less expensive than EXACT, especially when the number of border points is quite large. However, the use of a variable step length, which can be shorter or longer than the fixed step size, leads to a larger variability and so to a Richardson’s plot with a less accurate fit. This affects the uncertainty of the estimated parameter. Because of that, a more careful choice of the step size range is needed in the case of the HYBRID method.
Importantly, the choice of the starting point may also affect the perimeter value, since the subsequent current points depend upon it. A test on 80 random starting points for the Gosper Island fractal revealed that the fd computation performed with the HYBRID method appeared to be more stable than the one performed with EXACT.
As for the walking method, in box counting the process of scaling down from the maximum box size is limited by the pixel size, so in principle a coarse resolution might cause a bad estimate of the fd. It is noteworthy that the tests performed do not show any correlation between resolution and fd accuracy; this may also be due to the fact that the images of some fractals, such as the dragon, do not reproduce the real fractal at small scales.
An application of the DBC method to an x-ray image is also shown in Figure 13, where a breast cancer mammography image has been processed. The method uses a sliding-window technique, as implemented in Matlab’s neighborhood-processing functions, so as to produce an fd image rather than a single fd value as previously described.
The second DBC method shows higher contrast in the area of the cancer and consequently lower fd values. Due to the enormous number of linear fits performed for an image size of 3450 × 3100, the computational time reached 15 minutes.
In this chapter, some of the most widely used and robust methods for fractal dimension estimation, as well as their performance, have been described. For a few of them a detailed description of the algorithm has also been reported, to make it easier for a beginner to start implementing their own Matlab code. The computational time is not long enough to necessitate compiled functions such as C-mex files, although these can be an advantage when using very high resolution images. The use of the described algorithms is not restricted to the field of image processing; with some changes, they can be applied to other kinds of data analysis.