This chapter presents a study on improving the quality of the self-organizing map (SOM). We synthesize recent research on assessing and improving SOM quality, and then propose a solution that improves the quality of the feature map by adjusting the parameters of the Gaussian neighborhood function. Quantization error and topographical error are used to evaluate the quality of the resulting feature map. Experiments were conducted on 12 published datasets, and the results were compared with several other methods that modify the neighborhood function. The proposed method produced feature maps of better quality than the other solutions.
- quantization error
- topographical error
- self-organizing map
- feature map
- projection quality
- learning quality
SOM is a very useful neural network for visualization and data analysis. Among SOM's application areas, urban design is a promising one. Many SOM applications arise in urban design, such as analyzing growth factors in urban design proposals, examining urban spatial structure, analyzing city systems, mining city data, and predicting accessibility demand for healthcare infrastructure. However, for SOM's results to be more accurate, the quality of the feature map must be improved.
SOM maps input data from a multi-dimensional space to a lower-dimensional space, usually two-dimensional, called the feature map of the data. To evaluate the quality of a feature map, two indicators are mainly used: learning quality and projection quality [6, 7, 8, 9]. Learning quality is measured by the quantization error (QE) [10, 11], and projection quality by the topographical error (TE) [12, 13, 14]. If both QE and TE are small, the feature map is assessed as having good quality.
Many studies have shown that the quality of the feature map is greatly affected by the initial parameters of the network, including the map size, the number of training iterations and the neighborhood radius [11, 15, 16, 17, 18]. Besides, a feature map obtained with one fit set of parameters is not necessarily the best-quality map. Therefore, improving the feature map quality of SOM has attracted many researchers.
The traditional method of achieving a good-quality map for each dataset is trial and error with different network parameters: the parameters that produce the map with the smallest error measurements are considered suitable for the dataset. According to Chattopadhyay et al., for a specific dataset the map size is selected by trial and error until the values of QE and TE are small enough. Polzlbauer describes the technical correlation between QE and TE: TE often rises when QE falls. When the size of the Kohonen layer increases, QE may decrease but TE increases (i.e., a large Kohonen layer can distort the shape of the map); conversely, when the Kohonen layer is too small, TE is not trustworthy. Using a small neighborhood radius reduces QE; if the neighborhood radius takes its smallest value, QE reaches a minimum.
Besides trial and error to determine a suitable network configuration, researchers have also studied improving the SOM algorithm itself to enhance feature map quality. Germen [22, 23] optimized QE by integrating a "hit" parameter when updating the weight vectors of the neurons, where "hit" means the number of excitations of a neuron (its BMU counter). The "hit" parameter determines how much a neuron's weight vector is adjusted: neurons representing many samples are adjusted less (to avoid losing information) than neurons representing fewer samples.
Neme [24, 25] proposed the SOMSR model (SOM with selective refractoriness), which reduces TE. In this model, the neighborhood radius of the BMU does not shrink gradually during learning; at each training step, every neuron within the neighborhood radius of the BMU decides for itself whether it will be affected by the BMU in the next training step.
Kamimura integrated a "firing" rate into the distance function to maximize input information. The "firing" rate identifies the importance of each feature relative to the remaining features. This method can reduce both QE and TE; however, for each dataset it requires several rounds of trial and error to determine the appropriate "firing" value.
Lopez-Rubio describes the topographical error of the map as a state of "self-intersection." If a "self-intersection" between neurons is detected after a learning step, that learning step is redone. This solution can reduce TE, but it increases QE.
Another approach is to adjust the scope and the learning rate of the neighborhood neurons. Kohonen homogenized the learning rate of all neurons in the neighborhood radius to that of the BMU by using the "bubble" neighborhood function. He concluded that the bubble function is less effective than the Gaussian function.
Aoki and Aoyagi and Ota et al. published an asymmetric neighborhood function. Its essence is to extend the neighborhood radius in one direction and shrink it in the opposite one. Theoretically, this could "slide" the topographical error off the map. However, their experiments were limited to certain situations and not fully convincing.
Lee and Verleysen replaced the neighborhood function with a "fisherman" rule, which updates the neurons in the neighborhood radius recursively: the BMU is adjusted toward the input sample, the neurons adjacent to the BMU (level-1 neighbors) are governed by the BMU (not by the input sample), each level-2 neighbor is adjusted by its preceding level-1 neighbor, and the remaining neurons in the neighborhood radius follow the same rule. However, their article does not show how to determine the order of the adjacent neurons when they are organized in a rectangular or hexagonal grid. In addition, they concluded that the Gaussian function gives better results than the "fisherman" rule.
It is thus recognized that achieving a feature map of good quality according to multiple criteria is a difficult problem. So far, no solution that simultaneously reduces both QE and TE works well for every dataset.
In this chapter, we improve the Gaussian neighborhood function by adding an adjustable parameter in order to reduce the QE and TE of the map simultaneously. The rest of the chapter is organized as follows: Section 2 presents an overview of SOM and the measures used to assess feature map quality; Section 3 presents our study on adjusting the parameter of the Gaussian neighborhood function; Section 4 reports the empirical results and the conclusions of the proposed method.
2. Self-organizing map neural network
2.1 Structure and the algorithm
SOM consists of an input layer and an output (Kohonen) layer. The Kohonen layer is usually organized as a two-dimensional matrix of neurons. Each unit i (neuron) in the Kohonen layer has a weight vector wi = [wi,1, wi,2, …, wi,n], where n is the size of the input vector and wi,j is the weight of neuron i associated with input j (Figure 1). SOM is trained by an unsupervised algorithm. The process is repeated many times; at time t it performs three steps:
Step 1. Finding the BMU: randomly select a sample x(t) from the dataset (where t is the training time) and search for the neuron c of the Kohonen matrix with the minimum distance dist to x(t) (Euclidean distance, Manhattan distance or the vector dot product are frequently used). Neuron c is called the best matching unit (BMU).
Step 2. Calculating the neighborhood radius of the BMU, using an interpolation function that decreases gradually with the number of iterations:

N(t) = N0 · exp(−t/λ) (1)

λ = K / log(N0) (2)

where N(t) is the neighborhood radius at training time t; N0 is the initial neighborhood radius; λ is a time constant, with K the total number of iterations.
Step 3. Updating the weight vectors of the neurons in the neighborhood radius of the BMU toward the sample x(t):

wi(t + 1) = wi(t) + L(t) · hci(t) · (x(t) − wi(t)) (3)

where L(t) is the learning rate at iteration t (the learning rate decreases gradually over time, similarly to the neighborhood radius); L(t) can be a linear function, an exponential function, etc.; and hci(t) is the neighborhood function, expressing the impact of distance on the learning process, calculated by formula (4):

hci(t) = exp(−||rc − ri||^2 / (2 · N(t)^2)) (4)

where rc and ri are the positions of neuron c and neuron i in the Kohonen matrix.
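The three training steps above can be sketched as a minimal NumPy loop. This is an illustrative sketch, not the chapter's implementation; all names (`train_som`, `n0`, `l0`) are ours:

```python
import numpy as np

def train_som(data, rows=10, cols=10, n_iters=20000, n0=10.0, l0=1.0, seed=0):
    """Minimal SOM training loop following Steps 1-3 above."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # One weight vector w_i per neuron of the rows x cols Kohonen grid
    weights = rng.random((rows * cols, n_features))
    # Grid position r_i of each neuron
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    lam = n_iters / np.log(n0)                        # time constant, formula (2)
    for t in range(n_iters):
        x = data[rng.integers(len(data))]             # random sample x(t)
        # Step 1: BMU = neuron with the minimum Euclidean distance to x(t)
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Step 2: neighborhood radius (and learning rate) shrink over time
        radius = n0 * np.exp(-t / lam)
        lr = l0 * np.exp(-t / lam)
        # Step 3: Gaussian neighborhood update, formulas (3) and (4)
        d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)  # ||r_c - r_i||^2
        h = np.exp(-d2 / (2.0 * radius ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights, grid
```

For simplicity the sketch updates every neuron rather than only those inside the radius; the Gaussian factor makes distant updates negligible.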
2.2 The quality of feature map
Quantization error and topographical error are the main measurements used to assess the quality of SOM. The quantization error is the average distance between the input samples and their corresponding winning neurons (BMUs). It assesses the accuracy of the represented data, so smaller values are better:

QE = (1/T) · Σ ||x(t) − wc(t)|| (5)

where x(t) is the input sample at training time t; wc(t) is the weight vector of the BMU of sample x(t); and T is the total number of training times.
The topographical error assesses topology preservation [13, 14]. It counts the proportion of data samples whose first best matching unit (BMU1) and second best matching unit (BMU2) are not adjacent; hence, smaller values are better:

TE = (1/T) · Σ d(x(t)) (6)

where x(t) is the input sample at training time t; d(x(t)) = 1 if BMU1 and BMU2 of x(t) are not adjacent and d(x(t)) = 0 otherwise; and T is the total number of training times.
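Both measures can be computed in one pass over the data. A sketch, assuming neurons count as adjacent when their grid distance is at most 1 (the chapter does not state whether diagonal neighbors count, so that threshold is an assumption):

```python
import numpy as np

def qe_te(data, weights, grid):
    """Quantization error and topographical error of a trained map.
    weights: (n_neurons, n_features) weight vectors;
    grid: (n_neurons, 2) neuron positions in the Kohonen matrix."""
    qe, te = 0.0, 0
    for x in data:
        d = np.linalg.norm(weights - x, axis=1)
        bmu1, bmu2 = np.argsort(d)[:2]   # first and second best matching units
        qe += d[bmu1]                    # ||x(t) - w_c(t)||
        # d(x(t)) = 1 if BMU1 and BMU2 are not adjacent on the grid
        if np.linalg.norm(grid[bmu1] - grid[bmu2]) > 1.0:
            te += 1
    t_total = len(data)
    return qe / t_total, te / t_total
```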
3. Adding an adjustable parameter to the Gaussian neighborhood function
Formula (3) shows that the learning ability of SOM depends on two components: the learning rate L(t) and the neighborhood function hci(t).
Because the learning rate simply decreases over time, it defines the general learning rate of SOM over the training time. The quality of the feature map is therefore influenced mainly by the neighborhood function hci(t): adjusting it directly affects the learning process and the quality of the feature map of SOM.
The neighborhood function defines the level of influence of the input sample on the neurons in the neighborhood radius of the BMU (Figure 2).
Formula (4) can be rewritten in the following general form:

hci(t) = exp(−q · ||rc − ri||^p / N(t)^p) (7)

where q and p are two adjustable parameters, with q ≥ 0 and p ≥ 0. With q = 0.5 and p = 2, formula (7) reduces to the original Gaussian function (4).
The value of hci(t) depends on the distance from the position ri of the neuron being assessed (neuron i) to the position rc of the BMU, and on the parameters q and p. Specifically:
If ri = rc (the BMU itself is being assessed), hci(t) = 1.
If ||rc − ri|| = N(t) (the neuron being assessed lies at the farthest position of the neighborhood radius), the value of the neighborhood function depends on the parameter q:

hci(t) = exp(−q) (8)

Formula (8) shows that the minimum value of hci(t) depends on the parameter q.
Figure 3 illustrates the neighborhood function for a fixed neighborhood radius, with p = 2 and q = 0.5, 1, 2, 4, 8, 12.
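The behavior illustrated in Figure 3 can be checked numerically. The sketch below assumes the general form h = exp(−q · d^p / N^p), a reconstruction consistent with the surrounding text (q = 0.5, p = 2 recovers the standard Gaussian):

```python
import numpy as np

def h_general(d, radius, q, p):
    """Generalized neighborhood function: exp(-q * d**p / radius**p).
    q = 0.5, p = 2 gives the standard Gaussian exp(-d**2 / (2 * radius**2))."""
    return np.exp(-q * d ** p / radius ** p)

radius = 10.0
d = np.linspace(0.0, radius, 6)
for q in (0.5, 1, 2, 4, 8, 12):
    print(q, np.round(h_general(d, radius, q, 2), 4))
# At d = radius the value is exp(-q): q = 8 or 12 drives h close to 0 at the
# edge of the radius, while q = 0.5 leaves the original Gaussian edge value
# exp(-0.5) ~ 0.61.
```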
3.1 Parameter q
In principle, the larger the adjustment of the neurons' weight vectors in the current learning step, the larger their difference from the input patterns of other learning steps becomes. This increases the quantization error. Therefore, to reduce QE, we must reduce the level and the scope of the influence of the input sample, i.e., increasing the value of q reduces QE.
However, if q is too large, the learning ability of the map is restricted, i.e., the topography of the map changes little and partly depends on the initialization of the neurons' weight vectors. In effect, the neighborhood radius shrinks, since neurons at remote positions within the neighborhood radius are adjusted negligibly or not at all by the input sample. Therefore, to ensure that all neurons in the neighborhood radius are adjusted by the input sample, the parameter q must not be too large. For example, with q = 8 or 12, hci(t) approaches 0 as ||rc − ri|| approaches N(t).
In the case q ≈ 0, the Gaussian function gives the same result as the bubble function, i.e., hci(t) ≈ 1 for all neurons in the neighborhood radius. As a result, the larger the neighborhood radius, the more the feature map tends to change locally following the input sample x(t). This reduces the network's ability to remember previous learning steps.
Therefore, TE may depend on the initialization of the neurons' weight vectors if q is too large, or on the order of the input samples if q is too small. Note that the initial weight vectors of the neurons and the order of the input samples are selected randomly. Hence, the topographic learning ability of the network is best when the parameter q is neither too small nor too large.
3.2 Parameter p
When the parameter q is fixed and the parameter p increases, the value of hci(t) for the neurons near the BMU increases gradually toward 1, i.e., the number of neighbors around the BMU that are adjusted similarly to the BMU grows. This causes QE to increase. If the parameter p is too large, the feature map tends to change locally according to the input samples of the most recent training steps (similar to the case where the parameter q is too small). However, TE may vary only slightly, because TE is determined by BMU1 and BMU2.
Figure 4 illustrates the original neighborhood function (with q = 0.5 and p = 2) and the adjusted neighborhood function (with q = 4 and p = 1, 2, 3, 4, 5, 6) for a fixed neighborhood radius.
In the case p = 1, the graph is similar to the cases q = 8 and 12 in Figure 3, i.e., QE is smallest compared with p > 1, but TE is unreliable because it depends on the initialization of the neurons' weight vectors.
Therefore, adjusting the parameter p has no significant impact on improving the quality of the feature map of SOM, while the parameter q does. The larger the parameter q is, the smaller QE is; however, q reaches its most appropriate value when TE is smallest. We therefore recommend the neighborhood function with an adjustable parameter as follows:

h′ci(t) = exp(−q · ||rc − ri||^2 / N(t)^2) (9)

where the parameter q can be adjusted for each dataset to achieve a better-quality feature map.
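Since q must be tuned per dataset, its selection reduces to a small grid search: train a map for each candidate q and keep the one with the lowest TE, breaking ties by the lower QE. A minimal sketch, assuming the caller supplies a `train_and_score(q)` routine returning `(qe, te)` for a map trained with that q (a hypothetical helper, not from the chapter):

```python
def select_q(train_and_score, candidates=(0.5, 1, 2, 4, 8, 12)):
    """Pick the q whose trained map has the smallest TE; ties on TE
    are broken by the smaller QE, matching the chapter's criterion."""
    best_q, best_key = None, None
    for q in candidates:
        qe, te = train_and_score(q)
        key = (te, qe)                 # compare TE first, then QE
        if best_key is None or key < best_key:
            best_q, best_key = q, key
    return best_q

# Demo with a stand-in scorer (illustrative numbers only, not measured):
scores = {0.5: (0.9, 0.20), 1: (0.8, 0.10), 2: (0.7, 0.05),
          4: (0.6, 0.05), 8: (0.5, 0.12), 12: (0.45, 0.30)}
print(select_q(lambda q: scores[q]))  # -> 4 (lowest TE, then lowest QE)
```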
We conducted experiments on 12 published datasets: XOR (data samples distributed according to the XOR operation), Aggregation, Flame, Pathbased, Spiral, Jain, Compound, R15, D31, Iris, Vowel and Zoo. The parameters used in the experiments were: network size 10 × 10; initial neighborhood radius 10; initial learning rate 1; number of training times 20,000.
The experiments were conducted in two cases: case 1, parameter p fixed and parameter q changed; case 2, parameter q fixed and parameter p changed.
|Dataset|p = 1|p = 2|p = 3|p = 4|p = 5|p = 6|
|---|---|---|---|---|---|---|
|XOR (q = 1)|0.1754|0.1587|0.1546|0.1518|0.1525|0.1513|
|Aggregation (q = 4)|2.7895|3.0003|3.2722|3.6436|3.6100|3.8718|
|Flame (q = 4)|1.1858|1.2105|1.2306|1.3158|1.4010|1.4209|
|Pathbased (q = 4)|2.5458|2.4759|2.7586|2.8462|2.9400|2.9928|
|Spiral (q = 2)|3.5976|3.4319|3.4334|3.4603|3.4926|3.5797|
|Jain (q = 4)|2.3664|2.3519|2.7136|2.9018|3.1494|3.3035|
|Compound (q = 1)|4.2063|3.7575|3.6224|3.4969|3.5082|3.4913|
|R15 (q = 4)|1.3161|1.4406|1.5544|1.6498|1.6972|1.7376|
|D31 (q = 4)|2.3832|2.4769|2.8137|2.9886|3.0686|3.1960|
|Iris (q = 1)|0.7140|0.6382|0.6166|0.6002|0.5880|0.5849|
|Vowel (q = 2)|2.3938|2.3715|2.4186|2.4310|2.4529|2.4627|
|Zoo (q = 4)|1.1817|1.0912|1.1780|1.1954|1.2015|1.2131|
Case 1: parameter p is fixed, parameter q is changed.
Table 1 shows the experimental results with parameter p = 2 and the parameter q taking the values 0.5, 1, 2, 4, 8, 12.
We can see that QE is inversely related to q: the larger q is, the smaller QE is, while TE reaches its minimum at q = 1, 2 or 4, depending on the dataset. This agrees with the analysis in Section 3.
The values in bold are the best results, in which TE is the smallest and QE is also smaller than with the original neighborhood function (q = 0.5) (column 2, Table 1).
Case 2: Parameter q is fixed, parameter p changes.
When p = 1, both QE and TE increase sharply.
When p ≥ 2, TE tends to be stable or to increase slightly as p rises. This shows that the parameter p has negligible significance for improving topographical quality once a suitable parameter q has been identified. QE tends to increase with p for the majority of datasets (except for XOR, Compound and Iris, where QE tends to decrease but TE tends to increase). This suggests that p = 2 is the best value.
Figures 5 to 16 are charts comparing the values of QE and TE when changing the parameters q and p: the figures on the left (a) show the results when fixing p = 2 and changing q; the figures on the right (b) show the results when fixing q and changing p. The parameter q is set to the value achieving the smallest TE in figure (a).
When setting p = 2 and changing q, the charts are similar (figure (a), on the left): QE decreases gradually, while TE decreases at first and then increases inversely with QE as q grows. TE reaches its lowest value when q ∈ [10, 28].
When fixing q and changing p, the charts are also similar (figure (b), on the right): TE is highest when p = 1, and both QE and TE tend to stabilize or increase gradually for p ≥ 2.
Conclusion: with p = 2 (the default value), adjusting the parameter q significantly impacts the quality of the feature map: the larger q is, the smaller QE is, while TE is lowest when q is neither too small nor too large. Therefore, with p = 2, the most suitable q is the value just large enough to achieve the lowest TE. Conversely, once the most appropriate value of q has been identified, the parameter p has little impact on improving the quality of the feature map.
Table 3 shows the QE and TE obtained when using the neighborhood function h′ci(t) (with parameter p = 2 and q determined for each dataset as shown in Table 2) and some other neighborhood functions. The results show that the neighborhood function h′ci(t) achieved smaller QE and TE than the original Gaussian function, the bubble function and the asymmetric neighborhood function.
|Dataset|hci(t)|h′ci(t)|Bubble function|Asymmetric neighborhood function|
|---|---|---|---|---|
Note: the results in Table 3 are the average values of 10 experiment runs. The result for each dataset is presented in two rows: the first row shows QE and the second row shows TE.
This chapter proposed a parameter for adjusting the Gaussian symmetric neighborhood function. Our parameter adjustment method can reduce both QE and TE of the feature map; however, the value of the parameter must be determined individually for each specific dataset. The improved Gaussian function performs better than the original Gaussian function and some other neighborhood functions such as the bubble function and the asymmetric neighborhood function.