Bio-Inspired Hybrid Algorithm for Web Services Clustering

Web services clustering is the task of extracting and selecting the features from a collection of Web services and forming groups of closely related services. The implementation of novel and efficient algorithms for Web services clustering is relevant for the organization of service repositories on the Web. Counting with well-organized collections of Web services promotes the efficiency of Web service discovery, search, selection, substitution, and invocation. In recent years, methods inspired by nature using biological analogies have been adapted for clustering problems, among which genetic algorithms, evolutionary strategies, and algorithms that imitate the behavior of some animal species have been implemented. Computation inspired by nature aims at imitating the steps that nature has developed and adapting them to find a solution of a given problem. In this chapter, we investigate how biologically inspired clustering methods can be applied to clustering Web services and present a hybrid approach for Web services clustering using the Artificial Bee Colony (ABC) algorithm, K-means, and Consensus. This hybrid algorithm was implemented, and a series of experiments were conducted using three collections of Web services. Results of the experiments show that the solution approach is adequate and efficient to carry out the clustering of very large collections of Web services.


Introduction
Web services clustering is the task of selecting and extracting the features of a collection of Web services, discovering the similarities between them to form groups or classes considering those features.The implementation of novel and efficient algorithms for the automatic clustering of Web services is relevant for the organization of large collections of services in private or public network such as the Internet.Having a directory of Web services organized in groups according to one or more characteristics represents an advantage during the search, selection, invocation, substitution, and composition of Web services.
Methods inspired by nature using biological analogies have been adapted for clustering problems, among which genetic algorithms, evolutionary strategies, and algorithms that imitate the behavior of some animal species have been implemented.Living beings such as animals and plants and even the climate exhibit extraordinary, complex, and fascinating natural phenomena.Of particular interest is the intelligent behavior of some animal species to find a solution to solve a problem and maintain the perfect balance of the environment surrounding.This is the main idea of computation inspired by nature, that is, to imitate the steps that nature has developed and adapt them to find a solution of a problem, thus converting it into a bio-inspired algorithm.This is the main reason for the implementation of an algorithm that exploits the collective intelligence of a Bee Colony as an alternative for clustering.This chapter presents an innovative approach for Web services clustering using a hybrid algorithm based on the Artificial Bee Colony, Kmeans, and Consensus.
The work described in this chapter is part of a research project whose objective is to design and implement a semantic directory of Web services using efficient, fully automated methods to allow the organization, composition, and classification of Web services with a semantic approach.
Figure 1 shows the general architecture of the semantic directory and a methodology to construct and manage the directory.The methodology consists of the following phases: 1. Public Web service retrieval aims at searching over the Internet to find and copy Web service descriptions (files formatted in WSDL description language) in a local file directory.Web service retrieval is executed through crawlers designed specifically to parse and identify links to files in WSDL.
2. Extraction and analysis of Web services consists of parsing every Web service description file and the extraction of specific data, such as: method names, input and output parameters, and port types to facilitate the automatic invocation.Extracted data is stored in an ontology model.For this phase, we use a tool which transforms WSDL files into an ontological representation.
3. Web service similarity calculation is an important phase of the methodology, because classification and clustering of Web services requires the calculation of distances between services.In order to calculate similarities, different measures can be implemented and combined to obtain better results.4.Web service classification and clustering consists of selecting and extracting the characteristics of a collection of Web services and discovering the similarities between them to form groups or classes considering those characteristics.Therefore, this phase depends on the results of the previous phases.In particular, in this chapter a hybrid clustering algorithm based on the Artificial Bee Colony algorithm, the K-means, and Consensus is described.

5.
Inference and reasoning represents the supporting mechanisms to exploit and utilize the enriched Web service ontologies.

Related work
Web service classification and clustering is a topic addressed form different perspectives such as statistical, stochastic, and novel approaches based on bio-inspired algorithms.Additionally, semantic approaches to describe, discover, and invoke Web services have been studied to propose novel clustering algorithms.In this section a revision of related work is presented considering two trends: reported work that address clustering and classification of Web services and clustering approaches based on bio-inspired algorithms.Table 1 presents the main characteristics of related work.
In 2009, Liang et al. [1] proposed a method for Web service categorization considering keywords and semantic relations between elements of the description.Their proposed methodology involves preprocessing WSDL documents, rough clustering by labeling Web services with class tag, and fine clustering.
In 2009, Platzer et al. [2] described a scalable approach for clustering very large service repositories.They use a statistical clustering algorithm enhancing a vector space to support the search of services related to a given query.
In 2012, Pop et al. [3] presented two approaches for service clustering, one inspired by the behavior of the birds and other inspired by the behavior of ants.They implemented methods to evaluate the semantic similarity between services.
In 2013, Du et al. [4] presented an approach for clustering Web services based on functional similarity and refinement of clusters using a concept position vector.
In 2014, Wu et al. [5] presented an approach which consists of three modules: data preprocessing, Web service tag recommendation, and Web services clustering.The first module consists of building a content vector formed with nouns, verbs, or adjectives.Authors use different features and different approaches for similarity computation.For content use the normalized Google distance, for data types and messages they use a basic match similarity, and for tag similarity they apply the Jaccard coefficient.
In 2014, Prakash and Singh [6] compared the performance of evolutionary algorithms: Genetic Algorithm, Differential Evolution, Particle Swarm Optimization, and Artificial Bee Colony for clustering three real and one synthetic data sets.
In 2017, Sahoo [7] presented a two-step ABC algorithm for data clustering problems.Authors addressed the three problems of the ABC algorithm such as initial positions of food sources, solution search equation, and abandoned food location.
In 2018, Kotekar and Kamath [8] described a Web services clustering approach based on Cat Swarm Optimization (CSO), which emulates the social behavior of cats in nature.
Automated Web services clustering is useful to facilitate service search, service discovery, service composition, and service substitution.Of particular interest is the representation of Web services through ontologies because the purpose of this work is the automatic organization of any collection (public or private) of Web service in ontologies and their semantic enrichment by classification and clustering.

Clustering process
Clustering of Web services consist of partitioning the set of Web services in the collection into an appropriate number of clusters based on a similarity measure.Therefore, services in the same cluster are more similar than the services in the different clusters [6].
In this section the clustering approach implemented is described.This process has as input a collection of Web services formatted according to Web Service Description Language (WSDL).This collection of Web services is processed utilizing specific parsers to extract the most important data of the service description, which are the method names and input and output parameters.The detailed process is described in the following subsections (Figure 2).

Extracting and parsing
Every Web service description includes the definition of the programming interfaces to be invoked remotely.Figure 3 shows the abstract service interface  Advanced Analytics and Artificial Intelligence Applications definition; from this, the elements extracted for similarity calculation are the name of the operations and their associated input and output parameters.

Semantic similarity measures
Measuring the similarity between two concepts is not a new topic.Throughout the last decades, many measures of similarity have been reported using different perspectives: syntactic, semantic, contextual, etc.In this work, we use a set of semantic similarity measurements based on WordNet. 1  between all Web services in the collection is a process executed in pairs.Let W be the tuple that represents all Web services in the collection as follows: where P, represents all operation names; I, is the set of input parameters; O, is the set of output parameters.
In particular, in this work the similarity measures were applied only on the operation names.Therefore, the similarity calculation takes as input a matrix of all operation names in the collection of Web services, that is, as follows: Eight measures that exploit WordNet database were used to calculate the semantic similarity between Web service operations.WordNet is a lexical database available online; it is organized into five categories: nouns, verbs, adjectives, adverbs, and function words [9].The utilization of WordNet for semantic similarity measurements is a good approach in contrast with the traditional syntactic similarity approaches, specifically in the case of service operations, as they normally include a verb indicating the main functionality of the operation method.
Additionally, an application programming interface (API) that implements a large collection of semantic similarity measures (140 methods) is available WNetSSAPI2 .A deeper analysis and comparison of similarity measures is out of the scope of this work.Table 2 shows a summary of the semantic similarity measures used.
With these measures, all service operations are compared, and a set of eight matrixes are created with the distances between them.Figure 4 shows an example of the calculation of the eight similarities with operation names.Advanced Analytics and Artificial Intelligence Applications

Artificial Bee Colony algorithm
The Artificial Bee Colony (ABC) algorithm is an optimization technique that simulates the foraging behavior of honey bees and has been successfully applied to various practical problems and is a nature-inspired and swarm intelligence method that has been applied in different scenarios with good results.The ABC algorithm was proposed in 2005 by Karaboga [18,19]; accordingly, the collective intelligence model of the Bee Colony consists of: a. Employed foragers which are bees assigned (employed) to a particular food source and are exploiting it.They carry information about the food source, distance and direction to the nest and the profitability of the source, and are capable of sharing this information.
b.Unemployed foragers are bees that are continuously searching for food sources.These unemployed bees are subdivided into scouts, bees that search on the surrounding environment for new food sources, and onlookers, bees that wait in the nest.
The ABC algorithm has different modes of behavior: a. Exploration is the task executed by unemployed bees to find new food sources.
b.Exploitation is the task executed by employed bees on a food source.
c. Recruitment is the action that an unemployed bee executes with forager bees to exploit a food source.
d.Abandonment of a nectar source occurs when a better food source is found.Bio-Inspired Hybrid Algorithm for Web Services Clustering DOI: http://dx.doi.org/10.5772/intechopen.85200 An important behavior of employed and onlooker bees is their capacity of sharing information (memory) to choose and adjust the food source value.This value depends on the proximity to the nest, the richness or concentration of honey energy [18].The exchange of information occurs during the waggle dance at the hive.Onlooker foragers watch numerous dances at the dancing area and decide to employ themselves at the most profitable food source.When an onlooker forager recruit starts searching and locates the food source, then it utilizes its own capability to memorize the location and starts exploiting it.The onlooker forager becomes an employed forager.In the ABC algorithm the set of possible solutions represent the food sources, and the food source value represents the quality of the solution.A general representation of the ABC workflow algorithm is presented in Figure 5.

Hybrid algorithm description
A hybrid algorithm was proposed to make the ABC auto-adjustable during each iteration to decide the number of clusters by incorporating K-means and a Consensus method.In particular, K-means is used to select the elements inside each generated cluster to decide centroids for similarity calculations.The solution of the algorithm is represented as a vector of size n (number of Web services to cluster) where each position of the element in the vector is the group to which it belongs to.

Objective function
The objective function of this hybrid algorithm is shown in Eq. ( 4): where d, distance between centroid of cluster y i and a service x i ; y i , centroid of cluster i; x i : one of the services included in cluster i.No group of services should be empty, and there should be no intersection between groups.

Filtering similarities
The first stage of the hybrid algorithm consists of filtering of the eight matrices that contain the information of similarities between Web services.The filtering consists of discarding values that exceed the limits allowed and established by the similarity measures, as a result of this filtering, new matrices are generated with a degree of 95% certainty in the measurements.Eq. ( 5) shows the filtering calculation: where X, average matrix; 1.96, table value; σ, standard deviation; N, element of the similarity matrix; μ, average similarity.

Food source representation
After the filtering process, all obtained data is stored in an average matrix (food sources) discarding the positions that contain null or zero information; that is, the average matrix was calculated considering only those values that were filtered and accepted as feasible values that contribute with information to the hybrid algorithm.

Bee's representation
Using the average matrix a set of arrays is generated representing the bees and other important information as the number of groups and centroids.Table 3 shows the structure of the solution generated.a. Max group.This hybrid algorithm does not require the user to indicate how many groups it should generate; the algorithm as it iterates determines how many groups to generate based on the results obtained on the previous iteration and applying the Consensus method.Initially, the i-th bee will generate a random number of groups, based on a discrete uniform distribution with limits 2 to N/2; in the subsequent iterations of the algorithm, the i-th bee will produce a random number γ (based on a normal distribution of the weighted variance of the Max group determined by the colony in the previous iteration), then a simple rounding will be applied to γ.
b.Centroids.Next, the i-th bee must determine the centroids of the γ groups involved in the classification.The centroid sub-vector is formed by N integer elements, where x ijk is zero if it is not considered as a centroid for any group, in case x ijk = a implies that the j-th Web service is centroid of the k-th group.Initially such values are assigned randomly; later by applying Eq. ( 6), the values of the sub-vector are obtained: where X 0 new , new vector; X 0 i , first vector generated by the algorithm; ϕ, aleatory number between 0 and 1 a. β represents the sum of the similarity between centroids of the groups, while α is the summation of the similarity between members of each group to the corresponding centroid.The objective is to minimize β and maximize α simultaneously.
b.For each bee the assessment function is calculated using the objective is to obtain the highest f(x) value.During each iteration, the f(x) value is stored in the solution vector.c.Normalization is used to determine the quality of the food source found.
Normalization is calculated with n . The bee will abandon the food source if there is a better food source in the near surrounding.d.Limit is a counter that indicates the number of iterations that the employed bee has being exploiting the current food source.The employed bee is obligated to abandon the food source when a g number of iterations is achieved (Figures 6-8).Table 5 shows the results obtained with configurations: the best, worst, average, variance, and standard deviation of f(x) over the 20 executions of the algorithm (Table 6).

Conclusions
This chapter describes a hybrid algorithm for Web services clustering; the approach is based on the ABC optimization algorithm combined with the K-means and Consensus.The clustering process starts from the extraction and processing of WSDL documents, then the calculation of semantic similarities between pairs of Web services operations.The semantic similarity measurements are based on WordNet, combining corpus-based and taxonomic approaches.As a result of the calculations, eight matrices are generated which are the input data set to the hybrid bio-inspired algorithm.
Biologically inspired algorithms offer advantages over conventional clustering methods, such as the ability to find a near optimal solution by updating the candidate solutions iteratively and have self-organizing behavior.
The clustering algorithm designed is based on the behavior of the bees but was improved by incorporating K-means and Consensus so that the algorithm adjusts itself in each iteration.A series of experiments were conducted with the hybrid ABC algorithm, using optimal values for each adjustable parameter.The experiments were carried out with three collections with 50, 647, and 1027 Web services, and the algorithm was executed with variants of 100, 200, and 500 iterations.The hybrid ABC algorithm has shown good results for Web services clustering.
As future work, more combinations of semantic similarities, as well as incorporating more information of the description of the services and the data types incorporated in the XML service definitions.

Figure 1 .
Figure 1.Semantic directory of Web services.

Figure 2 .
Figure 2. Clustering process of Web services.

Figure 4 .
Figure 4. Example of the calculation of semantic similarities.

Figure 7 .
Figure 7. Pseudocode of the "Generation of N initial food sources."

Figure 9 .
Figure 9.Comparison of similarity between the best solutions found with 1027 services.

Figures 9
Figures 9 and 10 show a comparison of results obtained from the algorithm with 100, 200, and 500 iterations of the instance of 1027 services.It can be seen that the best values of f(x) are produced with 500 iterations; however the result of the number of groups involved is more stable and better with 200 iterations.

Figure 10 .
Figure 10.Comparison of the number of groups in the best case for 1027 services.

Table 1 .
Comparison of related work.

Table 2 .
Computing similarity Summary of similarity measurements.

Table 3 .
Composition of the vector with information of generated groups.

Table 5 .
Summary with the best results of similarity.

Table 6 .
Summary with the best results of clusters.