Open access peer-reviewed chapter

Application of Genetic Algorithm in Numerous Scientific Fields

Written By

Gautam Garai

Reviewed: 07 June 2022 Published: 13 July 2022

DOI: 10.5772/intechopen.105740

From the Edited Volume

Genetic Algorithms

Edited by Sebastián Ventura, José María Luna and José María Moyano


Abstract

The genetic algorithm (GA) and its variants have been used efficiently by scientists in a wide variety of fields for solving problems. From the pool of evolutionary algorithms, researchers have adopted the GA as a useful and effective optimizer. It has several advantages and disadvantages; nevertheless, it provides solutions for many kinds of problems in areas such as space research, economics, market study, geography, remote sensing, agriculture, data mining, cancer detection, and many more. This chapter discusses the use of the GA in some of these fields, with experimental results for data clustering, pattern identification and matching, and shape detection. The results are illustrated and explained, with reasons, for a better understanding of GA applications in scientific fields. In addition, the use of the GA in bioinformatics for biological sequence alignment is discussed with examples.

Keywords

  • genetic algorithm
  • optimization
  • application
  • scientific field
  • stochastic search

1. Introduction

In designing solutions to scientific problems, the primary goal of researchers is to develop a system that provides an efficient, low-cost solution. In this endeavor, they need a good optimizer to reach their target. Several optimizers are available that work efficiently for various kinds of problems. Among these, the genetic algorithm (GA) has been adopted as a functional and useful optimization tool. The algorithm has proven to be an effective optimization technique for many applications in engineering, economics, manufacturing, artificial intelligence, operations research, space science, agriculture, physics, chemistry, bioinformatics, medical science, and many more. However, like other optimization tools, genetic algorithms have both advantages and disadvantages. The amount of a priori knowledge required to use a genetic algorithm is minimal: one can apply an "off-the-shelf" genetic algorithm to an optimization problem by mapping all of the function inputs to a pre-specified representation, such as binary strings. However, one often specializes a genetic algorithm to a specific problem domain by using additional heuristics, specialized representations, and/or operators, which constitute the a priori information for obtaining superior performance. If one opts to use a genetic algorithm without expending effort to customize it to the problem at hand, the trade-off is typically that the non-specialized genetic algorithm may produce poor results, including computationally inefficient optimization that converges to a suboptimal solution. Scientists have therefore customized the conventional GA according to the needs and specifications of their research problems.

This chapter discusses the use of the GA in various fields with examples and illustrations. It is shown how the GA recognizes a specific pattern from a group of patterns. A genetic-algorithm-based clustering technique is then described for identifying similar groups of items among several homogeneous or heterogeneous groups. Biological sequence alignment with the GA is also discussed, with attention to achieving accurate alignment.

The rest of the chapter is organized as follows. Section 2 briefly describes the history of evolutionary algorithms, including the GA. A survey of genetic algorithms is given in Section 3. The next three sections elaborate several uses of the GA in various scientific fields: Section 4 shows how a pattern is identified among a group of patterns, Section 5 elaborates a genetically guided clustering technique, and Section 6 describes the application of the genetic algorithm to detecting the polygonal shape of a dot pattern. Biological sequence alignment is discussed in Section 7. Section 8 concludes the chapter.


2. A brief history of evolutionary computation

In the literature, three primary search methods are identified, namely calculus-based, enumerative, and random search [1]. Calculus-based methods are further subdivided into two classes, indirect and direct. An indirect search method seeks the local extrema by solving the set of equations obtained by setting the derivative of the objective function to zero. A direct search method, on the other hand, seeks the local optima by hopping on the function and moving in a direction related to the local gradient; this is basically the notion of hill-climbing. Enumerative schemes come in many shapes and sizes: the search algorithm evaluates the objective function value at every point within a finite, or discretized infinite, space. Random search is the most popular among researchers since the shortcomings of the other two techniques are rectified here. The genetic algorithm is an example of a search procedure of this category [1].

For several years starting in the 1950s, many computer researchers independently studied evolutionary systems with the idea that evolution could one day serve as an important optimization tool for solving various engineering problems. The idea was to develop a system maintaining a population of candidate solutions to a given problem, with operators inspired by natural selection and natural genetic variation. Rechenberg [2] introduced "evolution strategies" (Evolutionsstrategie in the original German), a method used to optimize real-valued parameters for devices such as airfoils. This idea was further developed by Schwefel [3]. Fogel et al. [4] developed a technique called "evolutionary programming," in which candidate solutions to given tasks were represented as finite-state machines, evolved by randomly mutating their state-transition diagrams and selecting the fittest among them. Evolution strategies, evolutionary programming, and genetic algorithms together form the backbone of evolutionary computation. Several other researchers developed evolution-inspired algorithms for optimization and machine learning: Box [5], Friedman [6], Bremermann [7], and Reed et al. [8] worked in this area, though their studies had little impact on the field. Evolutionary biologists also used computers to simulate evolution for controlled experiments [9, 10, 11].

Finally, in the 1960s and 1970s at the University of Michigan, Holland invented the genetic algorithm, which he later improved together with his colleagues [1]. Unlike the earlier evolution strategies, Holland's intention was to formally study the phenomenon of adaptation as it occurs in nature, rather than to design algorithms for solving specific problems; he also developed ways of importing natural adaptation mechanisms into computer systems. Holland presented the genetic algorithm as an abstraction of biological evolution and gave a theoretical framework for adaptation. His GA was a method for moving from one population of "chromosomes" (strings of 1's and 0's) to a new population using a kind of "natural selection" together with the genetic operators of crossover, mutation, and inversion. Each chromosome was a combination of "genes" (bits), and each gene was a particular "allele" (either 0 or 1). The selection operator chose the chromosomes in the population useful for reproduction, so the fitter chromosomes produced more offspring than the less fit ones. Crossover exchanged the subparts of two chromosomes, mimicking the biological recombination between two single-chromosome (haploid) organisms. The allele values of some locations of the chromosome were randomly changed and reversed by the mutation and inversion processes, respectively.
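Holland's bit-string scheme is easy to sketch in code. The following is a minimal, illustrative GA, not an implementation from this chapter: the fitness function, population size, and operator rates are arbitrary choices for the toy OneMax problem (maximizing the number of 1-bits), and the inversion operator is omitted for brevity.

```python
import random

random.seed(42)  # for reproducibility of this illustrative run

def genetic_algorithm(fitness, n_bits=20, pop_size=30,
                      crossover_rate=0.9, mutation_rate=0.01, generations=100):
    """Minimal Holland-style GA over fixed-length bit strings (inversion omitted)."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Selection: fitness-proportionate (roulette-wheel) choice of parents.
        weights = [fitness(c) + 1e-9 for c in pop]
        parents = random.choices(pop, weights=weights, k=pop_size)
        # Crossover: exchange subparts of two parent chromosomes.
        children = []
        for p1, p2 in zip(parents[::2], parents[1::2]):
            if random.random() < crossover_rate:
                cut = random.randint(1, n_bits - 1)
                children += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
            else:
                children += [p1[:], p2[:]]
        # Mutation: flip individual alleles with a small probability.
        for c in children:
            for i in range(n_bits):
                if random.random() < mutation_rate:
                    c[i] = 1 - c[i]
        pop = children
        best = max(pop + [best], key=fitness)
    return best

# OneMax toy problem: the fitness of a chromosome is its number of 1-bits.
solution = genetic_algorithm(sum)
```

Keeping the best-so-far chromosome outside the evolving population (elitism) is one common design choice; the original Holland formulation did not require it.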

For the last several years, researchers have studied and interacted widely across the different evolutionary computation methods, which has broken down to some extent the boundaries between GAs, evolutionary programming, evolution strategies, and other evolutionary approaches. Today the term "genetic algorithm" is used by researchers to describe something very far from Holland's original conception. There are at least three overlapping meanings of search in genetic algorithms and other search procedures:

  • Search for stored data – The problem is to efficiently retrieve information stored in computer memory; for example, in a large database of names and addresses stored in some ordered way, to find the record corresponding to a given last name using binary search.
  • Search for paths to goal – The problem is to efficiently find a set of actions that will reach a given goal starting from a given initial state. This form of search is central to many techniques in artificial intelligence. Figure 1 illustrates the simple example of the 8-puzzle: a set of tiles numbered 1–8 is placed in a square, leaving one space empty, and sliding an adjacent tile into the blank space is termed a move. Typical algorithms are branch-and-bound search, depth-first search, etc.
  • Search for solutions – This is a more general class of search than search for paths to goal. The concept is to efficiently find a final solution to a problem in a huge space of candidate solutions. Among these candidates, one may or may not be the goal, and a solution may or may not improve as the search progresses. These are the kinds of search problems for which genetic algorithms are used.

Figure 1.

Eight-puzzle problem with the initial state and the final state.

Although the genetic algorithm is directed to search for paths to a goal or to search for solutions, it cannot be guaranteed always to reach the goal, owing to its stochastic or random search nature. On the other hand, this very characteristic enables the rapid determination of a solution. As the search progresses, GAs follow certain specific steps that differ from those of traditional search procedures.


3. A survey on genetic algorithms

GAs have been employed in a wide variety of problems. Some studies involving medical image registration, image segmentation, and contour recognition are available in [12, 13, 14]. In addition, the classification of endothelial cells in tissue sections, normalization of Chinese handwriting, and evaluation of earthquake risk for geological structures are examples of practical applications [15, 16, 17]. GAs have also been used in the optimization of feature extraction chains, error-correcting graph isomorphism, and dot pattern matching [18, 19, 20]. Moreover, techniques to generate fuzzy rules for target tracking, constrained placement problems, process planning for job shop machining, and blind channel identification with higher-order cumulant fitting have employed GAs [21, 22, 23, 24]. In the field of information retrieval, the GA is used to retrieve dynamic web-based content [25]. Raidl et al. worked with biased mutation in evolutionary algorithms for solving subset-selection problems on complete graphs [26].

Although the simple genetic algorithm has been used for solving various problems, researchers have always tried to enhance its performance by modifying the algorithm. Among the several approaches to improving performance, Cavicchio in 1970 developed a technique to preserve the best individuals by substituting an inferior parent whenever the offspring's fitness exceeded that of the inferior parent [27]. The crowding scheme of De Jong [28] aimed to retain diversity and the best individuals in the population by exchanging maximally similar strings. Fogel [29] as well as Back and Schwefel [30] tested some optimization functions for their algorithms and showed that an evolutionary method with self-adaptive mutation performs better than one without it. Beyond simple genetic approaches, Davis suggested that hybridizing genetic algorithms with the most successful optimization methods for particular problems provides the best performance [31]. The hybrid genetic scheme aims at hill-climbing from several points scattered in the search space. In the case of a multimodal objective function, some chromosomes/strings (offspring) will lie in the basin of attraction of the global solution, where hill-climbing is an effective and fast form of search. A hybrid approach, however, spoils hyperplane sampling to some extent, though it does not destroy it entirely.

Some more variants of the GA are also found in the literature [1, 32, 33, 34, 35]. However, the implementation of a modified genetic algorithm often increases the computation time for solving complex problems, and researchers have tried to increase the speed of the algorithm using parallel/distributed GAs when the computation time for a particular problem is large. Various parallel implementations of GAs are discussed in [36, 37]. Multiobjective optimization problems are also solved by parallel genetic algorithms; the major parallel multiobjective genetic algorithms are discussed, with some observations, in [38]. Furthermore, two approaches to parallel GAs, namely the island model and the diffusion model, are discussed in [39, 40], respectively. In the island model, the population is divided into a number of smaller subpopulations, among which individuals occasionally migrate as the search progresses; however, the selection of individuals for migration and the frequency of migration remain significant open questions [36]. In the diffusion model, by contrast, each individual is associated with a spatial location on a low-dimensional grid, and the entire population is regarded as a group of active individuals that interact only with their neighbors. Another important use of distributed GAs is in the performance-driven VLSI routing problem, where both electrical and topological constraints can be handled [41].

In addition, the well-known NP-hard Traveling Salesman Problem has been solved on a cluster of nodes by a parallel genetic algorithm [42]. A master-slave technique is implemented in the parallel/distributed genetic approach for obtaining the optimal and/or suboptimal traveling path(s). Moreover, the distributed genetic algorithm has been employed in labor scheduling problems to ascertain the number of employees and their work schedules, the objective being to minimize labor expenses and expected opportunity costs [43].

Researchers have also successfully employed GAs in solving various kinds of problems in other scientific fields. Notredame and Higgins [44] developed popular software for aligning sequences with two objective functions, and several GA-based approaches were developed for multisequence alignment [45, 46, 47, 48, 49]. In chemistry, GAs were used in studies of water oxidation [50], magnetic storage [51], catalysis [52], and the stability of boron wheels [53]. Sathya et al. developed a genetic-based algorithm to classify genes for the identification of biomarker genes to suggest individualized treatment [54]. As a robust heuristic search method, the GA has helped in mining information from large datasets; it was applied to discover interesting and useful relations between data elements in the domain of association rules [55]. Another GA-based classification technique was used to improve feature selection with a support vector machine (SVM) classifier; feature selection was used to classify a breast cancer dataset of 699 instances, each with 11 attributes, into two classes, benign and malignant [56]. A genetic approach was also applied in machine learning for selecting the proper and best move in chess playing; the technique helped in deciding the effective move by classifying the set of rules for a particular solution [57]. In addition, several GAs and hybrid GAs have been proposed by medical practitioners and scientists to detect cancer through different approaches such as feature selection, categorization and classification, registering temperature differences, and many more [58]. One hybrid genetic approach was developed for the early diagnosis of breast cancer using thermography; this technique measured the temperature difference between cancerous and healthy tissues with a high success rate [59].

In the following sections, some applications of the genetic algorithm are discussed with several illustrations. The parameters of the experimental setups are provided to help with the implementation of GAs for solving similar or different kinds of problems. Some notable results are also elaborated with illustrations to show the strength of the GA in finding solutions in numerous scientific fields.


4. Dot pattern matching identification

In a 2-D or 3-D feature space, a dot pattern represents a physical object or class of objects with a set of dots or points. Such dot patterns are used in geographic and cartographic data, spatial information systems, astronomy and astrophysics, remote sensing, image texture analysis, biomedical imaging, and other areas [60, 61, 62, 63]. Studies involving dot patterns include set estimation and shape identification, identification of point process parameters, and clustering and classification.

The dot pattern matching problem is defined as follows. T is a given test dot pattern, which is to be identified in an unknown scene S (a set of dot patterns or objects). To find the position of the best match in S, T has to be translated and rotated, so a reference point is required: the centroid O of the coordinates of the dots of T. Translating the dot pattern to a point P in the space S means translating its centroid to P; similarly, rotating it by θ means rotating the pattern with respect to O as origin.

The matching score for two dot patterns S and T is now evaluated as follows. Once T is translated and rotated with respect to the origin O, the distance d(t_i, s_j) between a point t_i of T and each of the points s_j of S is calculated, and the minimum of these distances is taken. If the number of points of T is α, the sum of the minimum distances D_min over all α points is

D_min = Σ_{i=1}^{α} min_{j=1,2,…,β} d(t_i, s_j)    (1)

where s_1, s_2, …, s_β and t_1, t_2, …, t_α are the points of S and T, respectively, and α ≤ β. D_min indicates the best matching score of the two dot patterns or objects.
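Computing the matching score of Eq. (1) is straightforward once T has been translated and rotated. The sketch below assumes the patterns are given as lists of 2-D coordinate tuples; the toy patterns are illustrative, not data from the experiments.

```python
import math

def min_distance_score(T, S):
    """Sum over each point t_i of T of its distance to the nearest point of S (Eq. 1)."""
    return sum(min(math.dist(t, s) for s in S) for t in T)

# Toy example: T coincides with a subset of S, so the best matching score is 0.
S = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
T = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(min_distance_score(T, S))  # → 0.0
```

A GA searching over translations and rotations of T would use this score (to be minimized) as its fitness; this brute-force nearest-point loop is O(αβ) and is only meant to show the scoring itself.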

In the pattern recognition problem, the performance of the conventional genetic algorithm (CGA) [1] and its distributed version (DCGA) has been compared with two modified GA-based methods, namely the cascaded genetic algorithm (CAGA) [64] and the distributed CAGA (DCAGA) [65].

In the dot pattern or object matching problem, S is a set of dot patterns of different shapes, as depicted in Figures 2 and 3; one of them is taken as a test pattern T and matched with S. The scores for matching between T and S for the genetic methods are shown in Tables 1-3. The score for successful matching is reported as the number of times the solution was reached as a percentage of the total number of trials. Two types of matching are considered, namely perfect matching and imperfect but visual matching. For visually matched patterns, T is superimposed over S without any visible error, but the matching error is not negligible; such a pattern or object is termed visually matched. In a perfectly matched case, by contrast, the calculated matching error between T and S is close to zero.

Figure 2.

A scene of multiple dot patterns in 2-D space [64].

Figure 3.

A scene of multiple objects with edge map in 2-D space [64].

Dot pattern | CAGA: Perfectly matched | CAGA: Visually matched | CAGA: Not matched | CGA: Perfectly matched | CGA: Visually matched | CGA: Not matched
DP1 | 22% | 53% | 25% | 0% | 10% | 90%
DP2 | 30% | 40% | 30% | 0% | 7% | 93%
DP3 | 73% | 23% | 4% | 3% | 53% | 44%
DP4 | 17% | 80% | 3% | 0% | 50% | 50%

(Successful matching = perfectly matched + visually matched; unsuccessful matching = not matched.)

Table 1.

Matching results of CAGA and CGA on the dot patterns in Figure 2. The following data have been summarized over 50 runs for each DP.

Dot pattern | DCAGA: Perfectly matched | DCAGA: Visually matched | DCAGA: Not matched | DCGA: Perfectly matched | DCGA: Visually matched | DCGA: Not matched
DP1 | 57% | 43% | 0% | 14% | 73% | 13%
DP2 | 40% | 60% | 0% | 10% | 70% | 20%
DP3 | 83% | 17% | 0% | 10% | 67% | 23%
DP4 | 40% | 60% | 0% | 10% | 90% | 0%

(Successful matching = perfectly matched + visually matched; unsuccessful matching = not matched.)

Table 2.

Matching results of DCAGA and DCGA on the dot patterns in Figure 2. The following data have been summarized over 50 runs for each DP.

Object | DCAGA: Perfectly matched | DCAGA: Visually matched | DCAGA: Not matched | DCGA: Perfectly matched | DCGA: Visually matched | DCGA: Not matched
P1 | 33% | 67% | 0% | 0% | 30% | 70%
P2 | 76% | 24% | 0% | 10% | 63% | 27%
P3 | 27% | 67% | 6% | 3% | 10% | 87%
P4 | 17% | 80% | 3% | 0% | 27% | 73%

(Successful matching = perfectly matched + visually matched; unsuccessful matching = not matched.)

Table 3.

Matching results of DCAGA and DCGA on the objects with edge map in Figure 3. The following data have been summarized over 50 runs for each object.

The dot/point patterns depicted in Figure 3 are basically gray-tone digital images, and the pattern recognition experiment performed on them is the matching of edge maps. The gray-tone digital image in Figure 3 is a combination of four different objects of 512×512 pixels with 256 possible gray levels. The Sobel operator is used to convert the digital image (Figure 3) into a two-tone edge map [66]. The edge map is considered a discrete dot pattern, with a dot represented digitally by 1 and a blank (white space) by 0. In Figure 3, the object P2 is separated from the other three objects (P1, P3, and P4), which are very close to and touching each other.

We consider first the results of the GAs for Figure 2. From Table 1, the success rate of CAGA is best for DP3 at 96%, where the pattern is perfectly matched in 73% of the trials and visually matched in 23%. The success rate of CAGA for DP4 is 97%, but there the pattern is perfectly matched in only 17% of the trials. The performance of CAGA is worst for DP2, with a failure rate of 30%. On the other hand, the best performance of CGA is achieved for DP3, where the pattern is perfectly matched in 3% and visually matched in 53% of the trials. For CGA, the successful matching score of DP2 is only 7% (visually matched), the worst among all four patterns.

From Table 2, it is noted that for DCAGA the success rate is 100% for all four patterns when perfect and visual matching are taken together, with the best performance achieved for DP3. For DCGA, the failure rate in the worst case is 23%; however, for pattern DP4 the DCGA also reaches a success rate of 100%.

Table 3 tabulates the experimental results of object matching for the GA-based distributed approaches. The DCAGA does not always achieve the 100% success rate observed in the earlier dot pattern matching experiment of Table 2: its success rate is 100% for P1 and P2, and its failure rate is at most 6% (for P3 and P4). Both techniques perform best for the isolated object P2, and the DCGA does not perform well for the objects that are close to each other.


5. Genetically guided clustering algorithm

In general, the clustering problem can be presented as follows:

For the set of n data points X = {x_1, x_2, …, x_n} to be clustered, each x_i ∈ R^p is an attribute vector consisting of p real measurements describing the i-th object. The objects corresponding to the data are to be clustered into non-overlapping groups C = {C_1, C_2, …, C_k}, where k is the number of clusters, C_1 ∪ C_2 ∪ … ∪ C_k = X, C_i ≠ ∅, and C_i ∩ C_j = ∅ for i ≠ j. The objects within each group should be more similar to each other than to the objects in any other group, and the value of k may or may not be known.

Clustering approaches are semi-optimal ways of attacking the grouping problem. The number of ways of sorting n objects into k groups is enormous and is given by

N(n, k) = (1/k!) Σ_{i=0}^{k-1} (-1)^i C(k, i) (k - i)^n    (2)

For 25 objects and five clusters, this number is approximately 2.4 × 10^15. Clearly, it is impractical to search the solution space exhaustively to obtain the optimal solution.
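Eq. (2), which counts the partitions of n objects into k non-empty groups (the Stirling number of the second kind), can be evaluated directly. The sketch below illustrates how quickly the count grows; the function name is illustrative.

```python
from math import comb, factorial

def num_partitions(n, k):
    """Number of ways of sorting n objects into k non-empty groups (Eq. 2),
    i.e. the Stirling number of the second kind."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k)) // factorial(k)

print(num_partitions(3, 2))   # → 3  ({ab|c}, {ac|b}, {bc|a})
print(num_partitions(25, 5))  # ≈ 2.4 × 10^15 possible groupings
```

The alternating sum is exactly divisible by k!, so integer division is safe here; for even moderately sized n, the count makes exhaustive search hopeless and motivates the heuristic GA search.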

5.1 Measures of similarity

Since similarity is fundamental to the definition of a cluster, a measure of similarity between two objects/patterns/entities drawn from the same feature space is essential to most clustering procedures. The proximity of individuals (i.e., objects) is usually expressed as a distance during the clustering of data units. The clustering of variables generally involves a correlation or other such measure of association. The smaller the distance between the objects, the higher is the similarity. Because of the variety of feature types and scales, the distance measure (or measures) must be chosen carefully. Several distance measures are employed for clustering [67, 68]. The most popular and commonly used one is the Euclidean distance,

d(X_i, X_j) = [(X_i - X_j)^T (X_i - X_j)]^{1/2} = [Σ_{l=1}^{p} (x_il - x_jl)^2]^{1/2}    (3)

which is the line of sight distance between two points representing the objects.
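Eq. (3) translates directly into code. The sketch below assumes the attribute vectors are given as equal-length tuples of real measurements; the function name is illustrative.

```python
def euclidean(x_i, x_j):
    """Line-of-sight distance between two p-dimensional attribute vectors (Eq. 3)."""
    return sum((a - b) ** 2 for a, b in zip(x_i, x_j)) ** 0.5

print(euclidean((1.0, 2.0, 2.0), (0.0, 0.0, 0.0)))  # → 3.0
```

Because feature scales can differ widely, in practice the attributes are often normalized before this distance is applied, as the surrounding text cautions.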

5.2 Genetic-algorithm-based clustering

Let us consider a set of n vectors X = {x_1, x_2, …, x_n}, which are to be clustered into k groups of homogeneous data. Each x_i ∈ R^d is a vector in a feature space of d real-valued measurements defining the features of the object denoted by x_i; the features can be color, breadth, length, etc.

We now discuss data clustering with a GA-based algorithm that relies on splitting and merging techniques [69]. Initially the data are split into several sub-clusters; in the next step those sub-clusters are merged to find the required clusters. We have considered various types of datasets to show how the data are grouped so as to isolate the required number of clusters properly.

5.2.1 Cluster partitioning in R2 feature space

We have studied seven datasets in the R^2 feature space, depicted in Figures 4-10. Among them, Figures 4 and 6 each consist of three clusters, although the nature of the clusters differs. In both figures, two clusters are located close to each other, whereas the third is placed far from those two. In Figure 4, however, the density of the closely placed clusters is nonuniform, and the third cluster is three times the size of the other two; its data points are denser at the cluster center, with the density gradually reducing toward the boundary. Figure 4(a) shows the original dataset, and Figure 4(b) illustrates how the clusters are isolated by the genetically guided method.

Figure 4.

(a) The original dataset with three clusters before the application of GA-based algorithm. (b) Isolated three clusters after completion of the algorithm [69].

The dataset in Figure 5(a) comprises five clusters, all of which are denser near the center and sparser toward the boundary. Three clusters are equal in size, and two of them are close to each other. The five clusters have been correctly isolated by the algorithm, as shown in Figure 5(b).

Figure 5.

(a) The original dataset with five clusters before the application of algorithm. (b) Isolated five clusters after the completion of algorithm [69].

Figure 6, by contrast, consists of two closely located clusters of different densities: one has uniformly dense data, while the other's density tapers away from the center. The data density of the third cluster is also uniform, and all three clusters are equal in size. Figure 6(a) and (b) illustrate the scenario before and after the application of the algorithm, respectively. All three clusters are properly isolated.

Figure 6.

(a) The original dataset with three clusters before the application of algorithm. (b) Isolated three clusters after the completion of algorithm [69].

Figure 7 depicts two almost overlapping clusters of equal size and density. The data point distribution of both clusters is interesting since they follow a Gaussian distribution in R^2 space. The clusters are placed so closely that they visually appear to be a single cluster; the genetic algorithm nevertheless identifies them correctly, as illustrated in Figure 7(b).

Figure 7.

(a) The original dataset with two clusters before the application of algorithm. (b) Isolated two clusters after the completion of algorithm [69].

The dataset of Figure 8 differs from the previous four datasets in that one cluster is completely enclosed by the other, the data point density of both being almost uniform. For this dataset, a special feature of the algorithm is used: the adjacency condition of the sub-clusters is checked before they are merged, so that the clusters are ultimately isolated as depicted in Figure 8(b).

Figure 8.

(a) The original dataset with two clusters before the application of algorithm. (b) Isolated two clusters after the completion of algorithm [69].

Figures 9 and 10 contain special types of clusters. Figure 9(a) is a combination of six clusters of different densities and sizes. Two of them are identical in elliptical shape and are connected by a string of highly dense data points; although disjoint, they appear visually as a single cluster. Shape-wise these form three groups and have been identified as three clusters, the string of points being defined as the third cluster connecting the two elliptically shaped ones. Of the remaining three clusters in Figure 9(a), two are almost equal in shape and density, and the largest has a lower density than the other two. All six clusters, including the string of points, are correctly isolated after the application of the genetic algorithm, as shown in Figure 9(b).

Figure 9.

(a) The original dataset with six clusters before the application of algorithm. (b) Isolated six clusters after the completion of algorithm [69].

Figure 10.

(a) The original dataset with two clusters and noise before the application of algorithm. (b) Isolated two genuine clusters and separated noise after the completion of algorithm [69].

The data shown in Figure 10(a) are quite different from all the other datasets. This dataset is special in that it consists of two identically shaped clusters of different densities in the presence of random noise scattered over the entire 2-D feature space. The two clusters are perfectly identified, and the noise is identified as a third cluster, as shown in Figure 10(b). Owing to the inherent characteristics of random noise, the noise is first clustered locally into smaller clusters of arbitrary shape; the third cluster is then formed by merging all these smaller clusters of noisy data, as depicted in Figure 10(b).

5.2.2 Cluster separation in Rn feature space

We now discuss cluster identification for a dataset with more than two features. The Iris data, with four features [70], is one of the most popular databases in the pattern recognition literature. The dataset contains three classes, Iris Setosa (Class A), Iris Versicolor (Class B), and Iris Virginica (Class C); each class has 50 instances and refers to a kind of Iris plant. The four feature attributes are petal length, petal width, sepal length, and sepal width. Two of the three classes are not linearly separable from each other, whereas the third is separable from the other two.

One cluster is clearly separated from the other two after invoking the algorithm; however, the other two clusters are not perfectly separable (see Table 4).

|  | Classified as class A | Classified as class B | Classified as class C |
| --- | --- | --- | --- |
| Actual class A | 50 | 0 | 0 |
| Actual class B | 0 | 40 | 10 |
| Actual class C | 0 | 0 | 50 |

Table 4.

Confusion matrix for Iris flower classification.
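The overall classification quality implied by Table 4 can be checked with a short calculation; the sketch below simply sums the diagonal of the confusion matrix (the variable names are illustrative, not from the original study):

```python
# Confusion matrix from Table 4 (rows: actual class, columns: predicted class).
confusion = [
    [50, 0, 0],   # actual class A
    [0, 40, 10],  # actual class B
    [0, 0, 50],   # actual class C
]

correct = sum(confusion[i][i] for i in range(3))   # diagonal: 50 + 40 + 50
total = sum(sum(row) for row in confusion)         # 150 Iris instances
accuracy = correct / total
print(f"overall accuracy: {accuracy:.1%}")         # overall accuracy: 93.3%
```

The ten misclassified instances all come from the Class B/Class C overlap, consistent with the known non-separability of Iris Versicolor and Iris Virginica.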


6. Shape recognition with genetic algorithm

The processing of dot patterns (DPs) or point sets in a plane is useful and important in pattern recognition problems. Dot patterns are encountered in various problems including points in feature space [71], pixels in digital image [72], physical objects such as stars in the galaxy [73], or spatial data [74, 75, 76].

An attribute of a dot pattern is the visual shape it generates. One simple way of defining the external shape is the convex hull [77]. In most cases, however, the underlying shape from which the points emerge is not convex. To detect the nonconvex shape, the GA-based split-and-merge procedure [78] starts with the initial convex hull polygon. The splitting process first divides the sides of the polygon. Consider a side AB and evaluate the average distance of the r neighboring points of A and B. If this average distance is smaller than the length of AB, the side AB is considered a candidate for splitting. Such splitting accounts for the concavity of the DP by generating a zigzag polygonal border, as depicted in Figure 11(a). The merging algorithm is then applied to obtain a smoother border. Splitting introduces additional edges, while merging does just the opposite, making the border look less zigzag, as in Figure 11(b).
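The density-length test that drives the splitting step can be sketched as follows; `avg_neighbor_distance` and `should_split` are illustrative names reflecting one reading of the criterion (not the authors' implementation), and `r` is the number of nearest dots examined at each endpoint:

```python
import math

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def avg_neighbor_distance(vertex, points, r):
    """Mean distance from `vertex` to its r nearest dots in the pattern."""
    ds = sorted(dist(vertex, p) for p in points if p != vertex)
    return sum(ds[:r]) / r

def should_split(a, b, points, r):
    """Edge AB is a split candidate when the average distance of the r dots
    nearest to A and to B is smaller than the edge length |AB|."""
    avg = (avg_neighbor_distance(a, points, r) +
           avg_neighbor_distance(b, points, r)) / 2
    return avg < dist(a, b)
```

A long hull edge spanning a concavity, with dots packed tightly near its endpoints, satisfies the test and gets split, whereas a short edge between close neighbors does not.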

Figure 11.

(a) Edge (represented by dashed line) marked for the splitting process (edge AB before splitting; edges AC and BC created by splitting). (b) Edges marked for the merging process over regions XYZ and PQRS. (c) Edges (represented by dashed lines) marked for the isolation process [78].

A DP may also be the union of two or more disjoint smaller DP regions, as shown in Figure 11(c). The isolation process separates such regions by simultaneously removing the two lines of the resulting polygon that act as a bridge between two disjoint regions. The removal is possible only if the region between the lines does not contain any dot.

Now, consider a set S of n points in the plane and its convex hull. If the dot pattern contains no concavity, the underlying shape of the pattern is the convex hull itself; otherwise, splitting of the edges is required. Figure 11 shows the splitting and merging of edges/regions and the region marked for isolation.
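For the starting polygon, any standard convex hull construction can serve; the sketch below uses Andrew's monotone chain algorithm (chosen here for illustration, since [78] does not mandate a particular hull method):

```python
def convex_hull(points):
    """Andrew's monotone chain: returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Cross product of vectors OA and OB; > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                      # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # drop duplicated endpoints
```

For a square with an interior dot, the hull keeps the four corners and discards the inner point, which is exactly the starting polygon the splitting process then refines.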

6.1 Polygonal shape detection of dot pattern

The algorithm is studied on different dot pattern sets in R2 space. The datasets are broadly classified into three groups. One group consists of dot patterns with almost uniform density, as in Figures 12 and 13. Figure 14 represents a dot pattern with data points of variable density, and data drawn from a Gaussian distribution are shown in Figure 15. A special dot pattern with multiple distinct components is illustrated in Figure 16.

Figure 12.

(a) A spiral shaped DP and its convex hull. (b) Resulting perceptual border of the DP [78].

Figure 13.

(a) A star-shaped DP and its convex hull. (b) Resulting perceptual border of the DP with the GA-based merging process [78].

Figure 14.

(a) A DP with nonuniform density and its convex hull. (b) Resulting perceptual border of the DP [78].

Figure 15.

(a) A DP with two Gaussian distributions and its convex hull. (b) Resulting perceptual border of the DP applying GA-based merging process [78].

Figure 16.

(a) A multicomponent chromosome-like dot pattern and its convex hull. (b) Resulting perceptual border of the components [78].

In the group of datasets with nearly uniform density, Figures 12 and 13 show a spiral-shaped DP with a concavity and a star-shaped DP, respectively. In Figure 12, the splitting process starts with a convex hull of 35 edges and ends with a zigzag border polygon of 187 sides. After splitting, the merging algorithm is applied to the polygon, since the DP does not contain any isolated component. Finally, a polygon of 68 edges with a smooth border is generated, as in Figure 12(b). Figure 13(a) shows a DP with five concavities; the splitting process is executed on each side of the five-sided convex hull, producing a polygon of 181 edges with a zigzag border. The GA-based procedure is then applied to this polygon to produce a 26-sided polygon with a star-like perceptual border, as in Figure 13(b).

The dot patterns of Figures 14 and 15 are chosen to show the performance of the algorithm on data points of variable density. Each DP consists of two concavities. In Figure 14, one concavity lies in the upper half of the DP (the denser region) and the other in the lower half (the less dense region); the density of dots gradually decreases toward the lower half. The DP of Figure 15 also contains two concavities, located in the middle region opposite each other, and its density decreases gradually from the center of each half toward the boundary. We employed the GA-based merging technique on this dataset at the end of the splitting process. In Figure 14, the splitting procedure starts with a convex hull polygon of 24 sides (see Figure 14(a)) and generates a polygon of 102 sides. The merging process is then executed over it, yielding a 52-edge polygon with a smoother border, as in Figure 14(b). In Figure 15(a), the splitting process starts on a 22-sided convex hull polygon to create a polygon of 40 edges. The merging process finally generates a 32-sided smoother-border polygon, as shown in Figure 15(b).

The final polygons with smoother borders in Figure 17(b) and (c) are generated from the starting polygon of Figure 17(a) when the user specifies the number of sides. Figure 17(a) shows a C-shaped DP with its convex hull. In both cases, the splitting process continues as long as the density-length condition is not violated. The final polygonal shape in Figure 17(b) is obtained when the user requests, say, a 36-sided polygon. Similarly, the polygon of Figure 17(c) is produced from Figure 17(a) when the user specifies, say, 23 sides.

Figure 17.

(a) A C-shaped DP and its convex hull. (b) Perceptual border of the DP with 36 edges (user specified). (c) Perceptual border of the DP with 23 edges (user-specified) [78].

Figure 16 is a special dot pattern of three components, each containing nearly uniformly dense data points. The convex hull of the DP is a 17-sided polygon. The splitting process followed by the isolation process separates the DP into three distinct regions with zigzag polygonal boundaries. Finally, the merging process, applied to each isolated polygon, generates three separated polygons with smoother borders, as in Figure 16(b).


7. Biological sequence alignment with genetic algorithm

An alignment at the sequence level compares the primary structure of biological macromolecules, represented as a linear sequential chain of residues. Residues are the building blocks of macromolecules: DNA and RNA are made of nucleotide residues, and proteins of amino acid residues. These building blocks encode the history of molecular evolution. During evolution, residues accumulate many random changes and diverge over time; some of these changes are accepted by natural selection and some are not. Residues involved in important functional and/or structural roles tend to be preserved by natural selection and do not accumulate random changes [79]. These parts of the sequences help to infer the evolutionary history and identify the common ancestor of two or more related sequences. Sequences that share a common evolutionary origin are therefore called homologous sequences. Homology is the conclusion drawn from the degree of similarity between two or more sequences of different species.

7.1 Scoring schemes in sequence alignment

The similarity between two or more sequences is quantified by measuring the alignment score. To choose the best alignment among a set of sequences, such quantification is required; it reflects the evolutionary distance between the sequences. The best alignment is always associated with the maximum shared similarity or identity and the lowest number of mutational events between the sequences. Therefore, scoring systems always assign positive scores to identical or similar residues in the alignment.

On the other hand, mutational events are always associated with zero or negative scores. Different scoring schemes are used in sequence alignment to determine the level of identity or similarity. Score evaluation for a pairwise sequence alignment is simpler and more straightforward than scoring an alignment of multiple sequences together. In two aligned sequences, the total number of identical residues yields a percentage identity between them. This identity count is used mostly for nucleotide sequence alignment, as the nucleotides A, T/U, G, and C play equivalent roles in the structure and function of the DNA or RNA molecule; the nucleotides are therefore either identical (a match) or nonidentical (a mismatch) in an alignment. In contrast, for protein sequences a similarity score is calculated along with the identity score, denoting amino acids with similar physicochemical properties. Substitution matrices are normally consulted for protein sequence alignment.
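For two aligned nucleotide sequences of equal length, the percentage identity described above reduces to a position-by-position count; a minimal sketch (the function name is illustrative):

```python
def percent_identity(a, b):
    """Percent of positions with identical residues in two aligned,
    equal-length nucleotide sequences ('-' gap characters count as mismatches)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y and x != '-' for x, y in zip(a, b))
    return 100.0 * matches / len(a)

print(percent_identity("ATGCTTAGTC", "ACGCATAGAC"))  # 70.0
```

Protein alignments would instead accumulate substitution-matrix scores (e.g. BLOSUM62) so that conservative replacements score positively even when residues are not identical.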

7.2 Sequence alignment scheme without gaps

The GA-based sequence alignment [80] is a simple method to align a pair of sequences (a query sequence and a known database sequence) without gaps to measure similarity. Given a biological sequence (DNA/RNA/protein), called the query sequence, one usually performs a similarity search within databases consisting of all available genomic sequences and known protein sequences. The search eventually yields many sequences with varying degrees of similarity; it is then up to the user to identify those with higher scores that are homologous. In the alignment technique, we consider one sequence as a database sequence, denoted by D, and the other as a query sequence, denoted by Q.

In the initialization step, an initial population of size N is randomly generated as the set of probable alignment solutions for the input Q. Each individual, or chromosome, Pi, i ∈ {1, 2, …, N}, is a string of bits Pi = b1b2…bL, where L, the length of each chromosome, is equal to the length of the given Q, and each bj ∈ {0, 1}, j ∈ {1, 2, …, L}. A 1 in the chromosome represents the presence of the residue at the corresponding position of Q, and a 0 indicates its absence.

In this approach, all three operators of the conventional GA, namely tournament selection, one-point crossover, and bit-flip mutation, are used [1]. In each iteration, the algorithm advances the search toward a better solution by producing a better population. After two chromosomes are selected from the population pool, the crossover and mutation operations are performed on them.
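These three operators can be sketched for bit-string chromosomes as below; the tournament size `k` and mutation `rate` defaults are illustrative, not values prescribed in [80]:

```python
import random

def tournament_select(pop, fitness, k=2):
    """Pick k individuals at random and return the fittest of them."""
    return max(random.sample(pop, k), key=fitness)

def one_point_crossover(p1, p2):
    """Swap the tails of two equal-length parents at a random cut point."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def bit_flip_mutation(chrom, rate=0.01):
    """Flip each bit independently with probability `rate`."""
    return ''.join(b if random.random() >= rate else '10'[int(b)]
                   for b in chrom)
```

One generation then consists of repeatedly selecting two parents by tournament, crossing them over, mutating the children, and inserting them into the next population.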

The fitness score of a chromosome is evaluated by the following fitness function F.

F = Σ_{i=1}^{n} w_i        (4)

where n is the total number of residues to be aligned, and w_i is the weight (alignment score) at the i-th aligned position.

For protein sequence alignment, the BLOSUM62 matrix is consulted; for DNA sequences, an identical match scores +1 and a mismatch scores 0. The fitness score determines the similarity or identity level between the two sequences D and Q. It is evaluated in three ways: for the straight alignment and for the alignments with a left or right shift.

For a straight alignment, let us consider the following two nucleotide sequences, D and Q.

D = ATGCTTAGTC
Q = ACGCATAGAC

Let us now consider the following binary-coded chromosome from the population of an intermediate generation as a probable alignment solution Qp for the query sequence.

Qp=0110101101

The equivalent decoded value of Qp, obtained by replacing each 1 with the corresponding residue of Q, is as follows.

Qp=_CG_A_AG_C

where “_” denotes a 0 in Qp, that is, an omitted residue.

Therefore, the alignment score, or the fitness of the chromosome, would be 4, and the alignment looks like:

D=ATGCTTAGTC
Qp=_CG_A_AG_C
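The decoding and straight-alignment scoring just shown can be reproduced in a few lines (`decode` and `straight_score` are illustrative names, not from [80]):

```python
def decode(query, bits):
    """Replace each 1 with the query residue at that position, each 0 with '_'."""
    return ''.join(r if b == '1' else '_' for r, b in zip(query, bits))

def straight_score(db, decoded):
    """DNA scoring: +1 per identical residue; mismatches and '_' positions score 0."""
    return sum(d == q for d, q in zip(db, decoded) if q != '_')

D = "ATGCTTAGTC"
Q = "ACGCATAGAC"
Qp = decode(Q, "0110101101")
print(Qp)                      # _CG_A_AG_C
print(straight_score(D, Qp))   # 4
```

Only the retained residues (the 1 positions) contribute to the score; the matches here fall at the G, A, G, and final C positions, giving the fitness of 4 stated above.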

However, alignment by shifting may be necessary for a different query Q to achieve the best alignment score. For example, if Q is ACTTAGTAAC, then a shift to the right is required to obtain the optimum alignment with score 6, as illustrated below:

D = ATGCTTAGTC
Q =   ACTTAGTAAC

Similarly, for Q: GCGTTGCTTAGA, a left-shift alignment is necessary to obtain the best alignment with score 7, as shown below:

D =    ATGCTTAGTC
Q = GCGTTGCTTAGA

For the evaluation of the final fitness score of an individual, the scores of all three alignment techniques (straight, shift right, and shift left) are considered. The position at which an alignment starts is chosen randomly. Since a minimum 30% match between two sequences is necessary to establish a homologous relationship [81, 82], random numbers between 1 and 70% of the sequence length are generated. For example, if a sequence is 100 residues long, a position is chosen randomly between the 1st and 70th residues of D or Q, depending on the alignment type. The length of D is considered for the straight as well as the right-shift alignments; it is also used for the left-shift alignment if D is no longer than Q, whereas the length of Q is taken when D is longer than Q. The number of times starting positions are chosen for these three alignment types depends on the average length of D and Q; here it is 30% of the average length. For example, if the average length of D and Q is 200, the alignment techniques are repeated 60 times with different random starting positions, and the best matching score is extracted from these 60 results.
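The start-position bookkeeping described above amounts to two small formulas; a sketch with illustrative function names:

```python
import random

def alignment_trials(len_d, len_q):
    """Number of random-start alignment trials: 30% of the average length."""
    return int(0.3 * (len_d + len_q) / 2)

def random_start(ref_len):
    """Random start position within the first 70% of the reference sequence,
    leaving room for the minimum 30% overlap needed to assess homology."""
    return random.randint(1, int(0.7 * ref_len))

print(alignment_trials(200, 200))  # 60
```

Each of the 60 trials here would run the straight, right-shift, and left-shift scoring from its own random start, and the best score over all trials is retained.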


8. Conclusions

This chapter describes the application of the genetic algorithm as a reliable optimizer in various scientific fields. It narrates how the simple/conventional genetic algorithm has advanced with time. Some real applications of the modified genetic algorithm on synthetic data are also illustrated in brief (references are provided for detailed descriptions). It is shown how the GA identifies a pattern (in a single or multiple dot or edge-map scene) in 2-D space. Reasons are given for the differences in matching scores between the sequential and distributed genetic schemes (CGA and CAGA, or DCGA and DCAGA). A genetic-based clustering approach, with results, demonstrates the efficient use of the GA for grouping synthetic as well as real data. The application of the GA to complex shape detection and biological sequence alignment on synthetic data further shows its value as a useful and essential optimizer.


Abbreviations

GA: Genetic Algorithm
SVM: Support Vector Machine
CGA: Conventional Genetic Algorithm
CAGA: CAscaded Genetic Algorithm
DCAGA: Distributed CAscaded Genetic Algorithm
2-D: Two-Dimensional
3-D: Three-Dimensional
R2: Two-Dimensional space
DP: Dot Pattern
DNA: Deoxyribonucleic Acid
RNA: Ribonucleic Acid

References

  1. Goldberg DE. Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley Publishing; 1989
  2. Rechenberg I. Cybernetic solution path of an experimental problem. Royal Aircraft Establishment (U.K.): Ministry of Aviation; 1965
  3. Schwefel HP. Evolutionsstrategie und Numerische Optimierung. Ph.D. Thesis. Berlin: Technische Universität Berlin; 1975
  4. Fogel LJ, Owens AJ, Walsh MJ. Artificial Intelligence Through Simulated Evolution. New York: John Wiley; 1966
  5. Box GEP. Evolutionary operation: A method for increasing industrial productivity. Journal of the Royal Statistical Society, Series C. 1957;6(2):81-101
  6. Friedman GJ. Digital simulation of an evolutionary process. General Systems Yearbook. 1959;4:171-184
  7. Bremermann HJ. Optimization through evolution and recombination. In: Yovits MC, Jacobi GT, Goldstein GD, editors. Self-Organizing Systems. Washington, D.C.: Spartan Books; 1962
  8. Reed J, Toombs R, Barricelli NA. Simulation of biological evolution and machine learning. Journal of Theoretical Biology. 1967;17:319-342
  9. Barricelli NA. Numerical testing of evolution theories. Acta Biotheoretica. 1962;16:69-126
  10. Fraser AS. Simulation of genetic systems by automatic digital computers: I. Introduction. Australian Journal of Biological Sciences. 1957;10:484-491
  11. Martin GG, Cockerham CC. High speed selection studies. In: Kempthorne O, editor. Biometrical Genetics. USA: Pergamon; 1960
  12. Hill A, Taylor CJ. Model-based image interpretation using genetic algorithm. Image and Vision Computing. 1992;10:295-300
  13. Roth G, Levine MD. Geometric primitive extraction using a genetic algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1994;16(9):901-905
  14. Toet A, Hajema WP. Genetic contour matching. Pattern Recognition Letters. 1995;16:849-856
  15. Lin DS, Leou JJ. A genetic algorithm approach to Chinese handwriting normalization. IEEE Transactions on Systems, Man and Cybernetics-Part B. 1997;27(6):999-1007
  16. Yamany SM, Khiani KJ, Farag AA. Application of neural networks and genetic algorithms in the classification of endothelial cells. Pattern Recognition Letters. 1997;18:1205-1210
  17. Giacinto G, Paolucci P, Roli F. Application of neural networks and statistical pattern recognition algorithms to earthquake risk evaluation. Pattern Recognition Letters. 1997;18:1353-1362
  18. Wang YK, Fan KC, Horng JT. Genetic-based search for error-correcting graph isomorphism. IEEE Transactions on Systems, Man and Cybernetics-Part B. 1997;27(4):588-596
  19. Ansari N, Chen MH, Hou ESH. Point pattern matching by a genetic algorithm. In: Proc. of the 16th Annual Conf. of the IEEE Industrial Electronics Society (IECON'90), Vol. II. Pacific Grove; 1990. pp. 1233-1238
  20. Mirmehdi M, Palmer PL, Kittler J. Genetic optimization of the image feature extraction process. Pattern Recognition Letters. 1997;18:355-365
  21. Chan KCC, Lee V, Leung H. Generating fuzzy rules for target tracking using a steady-state genetic algorithm. IEEE Transactions on Evolutionary Computation. 1997;1(3):189-200
  22. Schnecke V, Vornberger O. Hybrid genetic algorithms for constrained placement problems. IEEE Transactions on Evolutionary Computation. 1997;1(4):266-277
  23. Zhang F, Zhang YF, Nee AYC. Using genetic algorithms in process planning for job shop machining. IEEE Transactions on Evolutionary Computation. 1997;1(4):278-289
  24. Chen S, Wu Y, McLaughlin S. Genetic algorithm optimization for blind channel identification with higher order cumulant fitting. IEEE Transactions on Evolutionary Computation. 1997;1(4):259-265
  25. Kushchu I. Web-based evolutionary and adaptive information retrieval. IEEE Transactions on Evolutionary Computation. 2005;9(2):117-125
  26. Raidl GR, Koller G, Julstrom BA. Biased mutation operators for subgraph-selection problems. IEEE Transactions on Evolutionary Computation. 2006;10(2):145-156
  27. Cavicchio DJ. Adaptive Search Using Simulated Evolution. Ph.D. dissertation. Ann Arbor, MI: University of Michigan; 1970
  28. De Jong KA. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Doctoral dissertation. Ann Arbor, MI: University of Michigan; 1975
  29. Fogel DB. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. Piscataway, NJ: IEEE Press; 1995
  30. Bäck T, Schwefel HP. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation. 1993;1:1-23
  31. Davis LD. Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold; 1991
  32. Harp SA, Samad T. Genetic synthesis of neural network architecture. In: Davis L, editor. Handbook of Genetic Algorithms. New York: University of Chicago Press; 1992. pp. 202-221
  33. Rizzi S. Genetic operators for hierarchical graph clustering. Pattern Recognition Letters. 1998;19:1293-1300
  34. Kim D, Ahn S. A MS-GS VQ codebook design for wireless image communication using genetic algorithms. IEEE Transactions on Evolutionary Computation. 1999;3(1):35-52
  35. Maniezzo V. Genetic evolution of the topology and weight distribution of neural networks. IEEE Transactions on Neural Networks. 1994;5:39-53
  36. Cantú-Paz E. A Summary of Research on Parallel Genetic Algorithms. Tech. Rep. 950076. Urbana, IL: Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign; 1995
  37. Tomassini M. Parallel and distributed evolutionary algorithms: A review. In: Miettinen K, Mäkelä M, Neittaanmäki P, Periaux J, editors. Evolutionary Algorithms in Engineering and Computer Science. New York: Wiley; 1999. pp. 113-133
  38. Van Veldhuizen DA, Zydallis JB, Lamont GB. Considerations in engineering parallel multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation. 2003;7:144-173
  39. Hao JK, Dorne R. A new population-based method for satisfiability problems. In: Proc. of the 11th European Conference on Artificial Intelligence. New York: Wiley; 1994. pp. 135-139
  40. Holland JH. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press; 1975
  41. Lienig J. A parallel genetic algorithm for performance-driven VLSI routing. IEEE Transactions on Evolutionary Computation. 1997;1:29-39
  42. Sena GA, Megherbi D, Isern G. Implementation of a parallel genetic algorithm on a cluster of workstations: Traveling salesman problem, a case study. Future Generation Computer Systems. 2001;17:477-488
  43. Easton FF, Mansour N. A distributed genetic algorithm for deterministic and stochastic labor scheduling problems. European Journal of Operational Research. 1999;118:505-523
  44. Notredame C, Higgins DG. SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Research. 1996;24(8):1515-1524
  45. Arenas DE, Ochoterena H, Rodriguez VK. Multiple sequence alignment using a genetic algorithm and GLOCSA. Journal of Artificial Evolution and Applications. 2009;2009:963150
  46. Naznin F, Sarker R, Essam D. Vertical decomposition with genetic algorithm for multiple sequence alignment. BMC Bioinformatics. 2011;12:353
  47. Shyu C, Foster J. Evolving consensus sequence for multiple sequence alignment with a genetic algorithm. In: Cantú-Paz E et al., editors. Proc. Genetic and Evolutionary Computation Conf. (GECCO'03), LNCS Vol. 2724. Berlin, Heidelberg: Springer-Verlag; 2003. pp. 2313-2324
  48. Michalewicz Z, Fogel DB. How to Solve It: Modern Heuristics. 2nd rev. and extended ed. Berlin, London: Springer; 2004
  49. Narimani Z, Beigy H, Abolhassani H. A new genetic algorithm for multiple sequence alignment. International Journal of Computational Intelligence and Applications. 2012;11(04):1250023
  50. Zhao S, Jin R, Abroshan H, Zeng C, Zhang H, House SD. Gold nanoclusters promote electrocatalytic water oxidation at the nanocluster/CoSe2 interface. Journal of the American Chemical Society. 2017;139:1077-1080. DOI: 10.1021/jacs.6b12529
  51. Bader SD. Colloquium: Opportunities in nanomagnetism. Reviews of Modern Physics. 2006;78:1-15. DOI: 10.1103/RevModPhys.78.1
  52. Pelegrini M, Parreira RLT, Ferrão LFA, Caramori GF, Ortolan AO, Silva EH. Hydrazine decomposition on a small platinum cluster: The role of the N2H5 intermediate. Theoretical Chemistry Accounts. 2016;135:58. DOI: 10.1007/s00214-016-1816-x
  53. Islas R, Heine T, Ito K, Schleyer PvR, Merino G. Boron rings enclosing planar hypercoordinate group 14 elements. Journal of the American Chemical Society. 2007;129:14767-14774. DOI: 10.1021/ja074956m
  54. Sathya M, Jayaselvi M, Joshi S, Pandy E, Pareek PK, Jamal SS, et al. Cancer categorization using genetic algorithm to identify biomarker genes. Journal of Healthcare Engineering. 2022;2022:5821938
  55. Xu Y, Zeng M, Liu Q, Wang X. A genetic algorithm based multilevel association rules mining for big datasets. Mathematical Problems in Engineering. 2014;2014:867149
  56. Devaraj N. Feature Selection Using Genetic Algorithm to Improve SVM Classifier. USA: LAP LAMBERT Academic Publishing; 2019
  57. Bhasin H, Bhatia S. Application of genetic algorithms in machine learning. International Journal of Computer Science and Information Technologies. 2011;2(5):2412-2415. ISSN 0975-9646
  58. Manszoori TK, Suman A, Mishra AK. Application of genetic algorithm for cancer diagnosis by feature selection. International Journal of Engineering Research and Technology. 2014;3(8):1295-1301. ISSN 2278-0181
  59. Resmini R, Silva L, Araujo AS, Medeiros P, Muchaluat-Saade D, Conci A. Combining genetic algorithms and SVM for breast cancer diagnosis using infrared thermography. Sensors. 2021;21:4802
  60. Griffin PM, Alexopoulos C. Point pattern matching using centroid bounding. IEEE Transactions on Systems, Man, and Cybernetics. 1989;19(5):1274-1276
  61. Lavine D, Lambird BA, Kanal LN. Recognition of spatial point patterns. Pattern Recognition. 1983;16(3):289-295
  62. Sprinzak J, Werman M. Affine point matching. Pattern Recognition Letters. 1994;15(4):337-339
  63. Zhang L, Xu W. Point-pattern matching using irreducible matrix and relative invariant. Tsinghua Science and Technology. 1999;4(4):1602-1605
  64. Garai G, Chaudhuri BB. A cascaded genetic algorithm for efficient optimization and pattern matching. Image and Vision Computing. 2002;20:265-277
  65. Garai G, Chaudhuri BB. A distributed hierarchical genetic algorithm for efficient optimization and pattern matching. Pattern Recognition. 2007;40:212-228
  66. Jain AK. Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall; 1989
  67. Everitt BS. Cluster Analysis. London: Edward Arnold; 1993
  68. Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. New Jersey: Prentice Hall; 1992
  69. Garai G, Chaudhuri BB. A novel genetic algorithm for automatic clustering. Pattern Recognition Letters. 2003;25:173-187
  70. Newman DJ, Hettich S, Blake CL, Merz CJ. UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Dept. of Information and Computer Sciences; 1998. Available from: http://www.ics.uci.edu/∼mlearn/MLRepository.html
  71. Duda RO, Hart PE. Pattern Classification and Scene Analysis. New York: John Wiley & Sons; 1973
  72. Faugeras O. Three-Dimensional Computer Vision: A Geometric Viewpoint. Cambridge: The MIT Press; 1993
  73. Ogawa H. Labeled pattern matching by Delaunay triangulation and maximal cliques. Pattern Recognition. 1986;19:35-40
  74. Taylor PJ. Quantitative Methods in Geography: An Introduction to Spatial Analysis. Boston: Houghton Mifflin Company; 1977
  75. Laurini R, Thompson D. Fundamentals of Spatial Information Systems. The A.P.I.C. Series, No. 37. London: Academic Press; 1992
  76. Okabe A, Boots B, Sugihara K. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. New Jersey: John Wiley and Sons; 1992
  77. Ronse C. A bibliography on digital and computational convexity (1961-1988). IEEE Transactions on Pattern Analysis and Machine Intelligence. 1989;11:181-190
  78. Garai G, Chaudhuri BB. A split and merge procedure for polygonal border detection of dot pattern. Image and Vision Computing. 1999;17:75-82
  79. Xiong J. Essential Bioinformatics. New York: Cambridge University Press; 2006
  80. Garai G, Chowdhury B. A novel genetic approach for optimized biological sequence alignment. Journal of Biophysical Chemistry. 2012;3:201-205
  81. Sander C, Schneider R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins. 1991;9(1):56-68
  82. Rost B. Twilight zone of protein sequence alignments. Protein Engineering. 1999;12(2):85-94
