Open access peer-reviewed chapter - ONLINE FIRST

Improving Feature Map Quality of SOM Based on Adjusting the Neighborhood Function

By Le Anh Tu

Submitted: May 1st 2019 Reviewed: August 18th 2019 Published: October 14th 2019

DOI: 10.5772/intechopen.89233

Abstract

This chapter presents a study on improving the quality of the self-organizing map (SOM). We review recent research on assessing and improving SOM quality, and then propose a way to improve the quality of the feature map by adjusting the parameters of the Gaussian neighborhood function. Quantization error and topographical error are used to evaluate the quality of the resulting feature map. Experiments were conducted on 12 published datasets, and the results were compared with those of several other modified neighborhood functions. The proposed method produced feature maps of better quality than the other solutions.

Keywords

  • quantization error
  • topographical error
  • self-organizing map
  • feature map
  • projection quality
  • learning quality

1. Introduction

SOM is a very useful neural network for visualization and data analysis. Urban design is one promising application area: SOM has been used for the analysis of growth factors in urban design proposals [1], the study of urban spatial structure [2], the analysis of city systems [3], city data mining [4], and the prediction of accessibility demand for healthcare infrastructure [5], among others. However, for SOM’s results to be more accurate, the quality of the feature map needs to be improved.

SOM maps input data from a multi-dimensional space onto a lower-dimensional space, usually two-dimensional, called the feature map of the data. The quality of a feature map is mainly evaluated with two indicators: learning quality and projection quality [6, 7, 8, 9]. Learning quality is measured by the quantization error (QE) [10, 11], and projection quality by the topographical error (TE) [12, 13, 14]. The smaller the QE and TE, the better the feature map.

Many studies have shown that the quality of the feature map is greatly affected by the initial parameters of the network, including the map size, the number of training iterations and the neighborhood radius [11, 15, 16, 17, 18]. Moreover, a feature map obtained with one suitable set of parameters is not necessarily the best-quality map. Improving the feature map quality of SOM has therefore attracted many researchers.

The traditional way to obtain a good-quality map for a given dataset is trial and error with different network parameters: the parameters that produce the map with the smallest error measures are taken as suitable for the dataset [11]. According to Chattopadhyay et al. [19], for a specific dataset the map size is selected by trial and error until QE and TE are small enough. Polzlbauer [20] points out the correlation between QE and TE: TE often rises when QE falls. Increasing the size of the Kohonen layer may reduce QE but increase TE (i.e., a large Kohonen layer can distort the shape of the map); conversely, when the Kohonen layer is too small, TE is not reliable. A small neighborhood radius leads to a reduced QE, and QE reaches its minimum when the neighborhood radius takes its smallest value [21].

Besides trial and error for finding a suitable network configuration, researchers have also studied improvements to the SOM algorithm itself. Germen [22, 23] optimized QE by integrating a “hit” parameter into the weight update of the neurons, where “hit” is the number of times a neuron has been excited (its BMU counter). The “hit” parameter controls how strongly a neuron’s weight vector is adjusted: neurons representing many samples are adjusted less (so that information is not lost) than neurons representing few samples.

Neme [24, 25] proposed the SOMSR model (SOM with selective refractoriness), which reduces TE. In this model, the neighborhood radius of the BMU is not reduced gradually during learning. Instead, at every training iteration, each neuron within the neighborhood radius of the BMU decides for itself whether it will be affected by the BMU in the next iteration.

Kamimura [26] integrated a “firing” rate into the distance function to maximize input information. The “firing” rate expresses the importance of each feature relative to the remaining features. This method can reduce both QE and TE; however, for each dataset, several rounds of trial and error are needed to determine an appropriate “firing” value.

Lopez-Rubio [27] describes the topographical error of the map as a state of “self-intersection.” If a self-intersection between neurons is detected after a learning iteration, that iteration is redone. This solution can reduce TE, but it increases QE.

Another approach is to adjust the scope and the learning rate of the neighboring neurons. Kohonen [11] used the “bubble” neighborhood function, which gives all neurons within the neighborhood radius the same learning rate as the BMU. He concluded that the bubble function is less effective than the Gaussian function.

Aoki and Aoyagi [28] and Ota et al. [29] published an asymmetric neighborhood function. In essence, it extends the neighborhood radius in one direction and shrinks it in the opposite direction. Theoretically, this could “slide” topographical errors out of the map. However, the experiments were limited to certain situations and are not fully convincing.

Lee and Verleysen [30] replaced the neighborhood function with the “fisherman” rule, which updates the neurons within the neighborhood radius recursively: the BMU is adjusted toward the input sample; the neurons adjacent to the BMU (level 1) are adjusted toward the BMU rather than toward the input sample; each level-2 neuron is adjusted toward the preceding level-1 neuron; and the remaining neurons within the neighborhood radius follow the same rule. However, the article does not explain how the order of adjacent neurons is determined when they are arranged in a rectangular or hexagonal grid. In addition, the authors concluded that the Gaussian function gives better results than the “fisherman” rule.

Achieving a feature map that is good according to several criteria is thus a difficult problem. So far, no solution that simultaneously reduces both QE and TE works well for every dataset.

In this chapter, we improve the Gaussian neighborhood function by adding an adjustable parameter in order to reduce the QE and TE of the map simultaneously. The rest of the chapter is organized as follows: Section 2 gives an overview of SOM and of the measures used to assess feature map quality; Section 3 presents our study on adjusting the parameters of the Gaussian neighborhood function; Section 4 reports the experimental results; and Section 5 concludes.

2. Self-organizing map neural network

2.1 Structure and the algorithm

SOM consists of an input layer and an output (Kohonen) layer. The Kohonen layer is usually organized as a two-dimensional matrix of neurons. Each unit i (neuron) in the Kohonen layer has a weight vector $w_i = [w_{i,1}, w_{i,2}, \dots, w_{i,n}]$, where n is the size of the input vector and $w_{i,j}$ is the weight of neuron i for input j (Figure 1). SOM is trained by an unsupervised algorithm. The process is repeated many times; at iteration t it performs three steps:

  • Step 1. Finding the BMU: randomly select a sample x(t) from the dataset (t is the training iteration) and search the Kohonen matrix for the neuron c with the minimum distance dist to x(t) (the Euclidean distance, the Manhattan distance or the vector dot product is commonly used). Neuron c is called the best matching unit (BMU).

$$\mathrm{dist} = \lVert x(t) - w_c \rVert = \min_i \lVert x(t) - w_i \rVert \tag{1}$$

Figure 1.

Illustrates the structure of SOM.

  • Step 2. Calculating the neighborhood radius of the BMU with an interpolation function that decreases gradually with the number of iterations:

$$N_c(t) = N_0 \exp\left(-\frac{t}{\lambda}\right) \tag{2}$$

where $N_c(t)$ is the neighborhood radius at training iteration t; $N_0$ is the initial neighborhood radius; and $\lambda = K / \log N_0$ is the time constant, with K the total number of iterations.

  • Step 3. Updating the weight vectors of the neurons within the neighborhood radius of the BMU so that they move closer to the sample x(t):

$$w_i(t+1) = w_i(t) + L(t)\, h_{ci}(t)\, \left[x(t) - w_i(t)\right] \tag{3}$$

where $L(t)$ is the learning rate at iteration t (it decreases monotonically over time, like the neighborhood radius, with $0 < L(t) < 1$) and may be, for example, a linear or an exponential function; $h_{ci}(t)$ is the neighborhood function, which expresses the effect of distance on learning and is calculated by formula (4):

$$h_{ci}(t) = \exp\left(-\frac{\lVert r_c - r_i \rVert^{2}}{2 N_c^{2}(t)}\right) \tag{4}$$

where $r_c$ and $r_i$ are the positions of neuron c and neuron i in the Kohonen matrix.
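The three steps can be put together in a short Python sketch. It is only an illustration under stated assumptions, not the implementation used in this chapter: the function name `train_som`, the random initialization inside the bounding box of the data, and the reuse of the radius decay schedule for the learning rate are all choices made here for the example.

```python
import numpy as np

def train_som(data, rows=10, cols=10, n_iter=20000, n0=10.0, l0=1.0, seed=0):
    """Sketch of SOM training; returns (weights, grid positions)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Grid position r_i of every neuron in the Kohonen matrix.
    positions = np.array([(r, c) for r in range(rows) for c in range(cols)],
                         dtype=float)
    # Assumption: weights start uniformly at random inside the data range.
    weights = rng.uniform(data.min(axis=0), data.max(axis=0),
                          size=(rows * cols, data.shape[1]))
    lam = n_iter / np.log(n0)  # time constant K / log(N0); requires n0 > 1
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        # Step 1: the BMU is the neuron closest to x(t) (Euclidean distance).
        c = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Step 2: neighborhood radius N_c(t) = N0 * exp(-t / lambda).
        radius = n0 * np.exp(-t / lam)
        # Assumption: the learning rate follows the same exponential decay.
        lr = l0 * np.exp(-t / lam)
        # Step 3: Gaussian neighborhood, applied only inside the radius.
        grid_dist = np.linalg.norm(positions - positions[c], axis=1)
        h = np.exp(-grid_dist ** 2 / (2.0 * radius ** 2))
        h[grid_dist > radius] = 0.0
        weights += lr * h[:, None] * (x - weights)
    return weights, positions
```

Each update moves a weight vector part of the way toward the sample, so the weights stay inside the range spanned by the data and the initial weights.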

2.2 The quality of feature map

Quantization error and topographical error are the main measures of SOM quality. Quantization error is the average distance between the input samples and their corresponding winning neurons (BMUs). It assesses how accurately the data are represented, so smaller values are better [11].

$$QE = \frac{1}{T} \sum_{t=1}^{T} \lVert x(t) - w_c(t) \rVert \tag{5}$$

where x(t) is the input sample at training iteration t; $w_c(t)$ is the weight vector of the BMU of sample x(t); and T is the total number of training iterations.
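As a minimal sketch, the quantization error can be computed with NumPy as follows. The function name `quantization_error` is hypothetical, and averaging over every sample of the dataset (rather than over the samples drawn during training) is an assumption made for the example.

```python
import numpy as np

def quantization_error(data, weights):
    """Mean Euclidean distance between each sample and its BMU."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Pairwise distances, shape (number of samples, number of neurons).
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    # The BMU of a sample is its nearest neuron, so take the row minimum.
    return float(d.min(axis=1).mean())
```

For two samples (0, 0) and (1, 0) and two neurons with weights (0, 0) and (1, 1), the distances to the BMUs are 0 and 1, so the function returns 0.5.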

Topographical error assesses topology preservation [13, 14]. It is the proportion of data samples whose first best matching unit (BMU1) and second best matching unit (BMU2) are not adjacent. Smaller values are therefore better.

$$TE = \frac{1}{T} \sum_{t=1}^{T} d(x(t)) \tag{6}$$

where x(t) is the input sample at training iteration t; d(x(t)) = 1 if BMU1 and BMU2 of x(t) are not adjacent, and d(x(t)) = 0 otherwise; and T is the total number of training iterations.
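The topographical error can be sketched in the same way. The function name `topographic_error` is hypothetical. The chapter does not state which adjacency rule is used, so the example assumes a rectangular grid on which two neurons are adjacent when their grid positions differ by one step horizontally or vertically; a hexagonal grid or an 8-neighbor rule would need a different test.

```python
import numpy as np

def topographic_error(data, weights, positions):
    """Fraction of samples whose BMU1 and BMU2 are not grid neighbors."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    positions = np.asarray(positions, dtype=float)
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    bmu1, bmu2 = order[:, 0], order[:, 1]
    # Assumption: rectangular grid, adjacency = Manhattan grid distance of 1.
    gap = np.abs(positions[bmu1] - positions[bmu2]).sum(axis=1)
    return float(np.mean(gap > 1))
```

On a 1 × 3 map with weights 0, 5 and 1, the sample 0.2 has BMU1 and BMU2 at the two ends of the map (not adjacent), while the sample 4.0 has them side by side, so the function returns 0.5.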

3. Adding an adjustable parameter to the Gaussian neighborhood function

Formula (3) shows that the learning ability of SOM depends on two components: the learning rate $L(t)$ and the neighborhood function $h_{ci}(t)$.

Because the learning rate decreases monotonically over time, it sets the overall learning pace of SOM throughout training. The quality of the feature map is therefore influenced mainly by the neighborhood function $h_{ci}(t)$: adjusting it directly affects the learning process and the quality of the resulting feature map.

The neighborhood function $h_{ci}(t)$ defines how strongly an input sample influences the neurons within the neighborhood radius $N_c(t)$ of the BMU (Figure 2).

Figure 2.

Illustrates the influence of the input sample on the neurons within the neighborhood radius at training iteration t.

Formula (4) is rewritten in the following general form:

$$h_{ci}(t) = \exp\left(-q\, \frac{\lVert r_c - r_i \rVert^{p}}{N_c^{p}(t)}\right) \tag{7}$$

where q and p are two adjustable parameters, with q ≥ 0 and p ≥ 0.

The value of $h_{ci}(t)$ thus depends on the distance from the position $r_i$ of the neuron being assessed (neuron i) to the position $r_c$ of the BMU, and on the parameters q and p. Specifically:

  • If $\lVert r_c - r_i \rVert = 0$ (the neuron being assessed is the BMU), $h_{ci}(t) = 1$.

  • If $\lVert r_c - r_i \rVert = N_c(t)$ (the neuron being assessed lies at the farthest position within the neighborhood radius $N_c(t)$), the value of the neighborhood function depends only on the parameter q:

$$h_{ci}(t) = \exp(-q) \tag{8}$$

Formula (8) shows that the minimum value of $h_{ci}(t)$ within the neighborhood radius depends on the parameter q.
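A minimal Python sketch of the general form in formula (7) makes the two special cases easy to check. The function name `neighborhood` and its argument names are hypothetical; the defaults q = 0.5 and p = 2 reproduce the original Gaussian function of formula (4).

```python
import numpy as np

def neighborhood(grid_dist, radius, q=0.5, p=2.0):
    """Generalized Gaussian neighborhood: exp(-q * dist**p / radius**p)."""
    grid_dist = np.asarray(grid_dist, dtype=float)
    return np.exp(-q * grid_dist ** p / radius ** p)

# Value at the BMU (distance 0) and at the edge of a radius of 10.
for q in (0.5, 1, 2, 4, 8, 12):
    at_bmu = neighborhood(0.0, 10.0, q=q)
    at_edge = neighborhood(10.0, 10.0, q=q)
    print(f"q={q}: h at BMU = {at_bmu:.4f}, h at edge = {at_edge:.6f}")
```

The value at the BMU is 1 for every q, while the value at the edge equals exp(-q) and does not depend on p.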

Figure 3 illustrates the neighborhood function $h_{ci}(t)$ for a neighborhood radius $N_c(t) = 10$, with p = 2 and q = 0.5, 1, 2, 4, 8, 12.

Figure 3.

Illustrates the function $h_{ci}(t)$ for different values of q.

3.1 Parameter q

In principle, the more strongly the weight vectors of the neurons are adjusted in the current learning iteration, the more they differ from the input patterns of other iterations. This is what increases the quantization error. Therefore, to reduce QE, we must reduce the strength and scope of the influence of the input sample; that is, increasing q reduces QE.

However, if q is too large, the learning ability of the map is restricted: the topography of the map changes little and partly depends on the initialization of the neurons’ weight vectors. In addition, the neighborhood radius $N_c(t)$ is effectively shrunk, because $h_{ci}(t) \approx 0$ for neurons far from the BMU (i.e., such neurons are not adjusted, or only negligibly adjusted, by the input sample). Therefore, to ensure that all neurons within the neighborhood radius $N_c(t)$ are adjusted by the input sample, q must not be too large. For example, with q = 8 or q = 12, $h_{ci}(t) \approx 0$ as $\lVert r_c - r_i \rVert$ approaches $N_c(t)$.
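As a quick numerical check of this example, formula (8) gives the following approximate values of the neighborhood function at the edge of the neighborhood radius for the values of q plotted in Figure 3:

$$
\begin{aligned}
q = 0.5 &: \quad \exp(-0.5) \approx 0.607 \\
q = 1 &: \quad \exp(-1) \approx 0.368 \\
q = 2 &: \quad \exp(-2) \approx 0.135 \\
q = 4 &: \quad \exp(-4) \approx 0.0183 \\
q = 8 &: \quad \exp(-8) \approx 3.4 \times 10^{-4} \\
q = 12 &: \quad \exp(-12) \approx 6.1 \times 10^{-6}
\end{aligned}
$$

For q = 8 and q = 12, a neuron at the edge of the radius therefore receives less than 0.04% of the adjustment applied to the BMU, which is the effective shrinking of the radius described above.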

When q ≈ 0, the Gaussian function behaves like the bubble function, i.e., $h_{ci}(t) \approx 1$ for all neurons within the neighborhood radius $N_c(t)$. As a result, the larger the neighborhood radius $N_c(t)$, the more the feature map tends to change locally to follow the input sample x(t). This weakens the network’s memory of previous learning iterations.

Therefore, TE may depend on the initialization of the neurons’ weight vectors if q is too large, or on the order of the input samples if q is too small. Note that both the initial weight vectors and the order of the input samples are chosen randomly. The network therefore learns topography best when q is neither too small nor too large.

3.2 Parameter p

With q fixed, as p increases, the value of $h_{ci}(t)$ for neurons near the BMU rises gradually toward 1; that is, the set of neighbors around the BMU that are adjusted almost as strongly as the BMU grows. This causes QE to increase. If p is too large, the feature map tends to change locally according to the most recent input samples (as when q is too small). However, TE may vary only slightly because TE is determined by BMU1 and BMU2.
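A worked example illustrates the effect, under the assumption q = 4 and for a neuron at half the neighborhood radius, $\lVert r_c - r_i \rVert = N_c(t)/2$. The general form $h_{ci}(t) = \exp\left(-q \lVert r_c - r_i \rVert^{p} / N_c^{p}(t)\right)$ then reduces to $\exp(-4 \cdot 0.5^{p})$:

$$
\begin{aligned}
p = 1 &: \quad \exp(-2) \approx 0.135 \\
p = 2 &: \quad \exp(-1) \approx 0.368 \\
p = 3 &: \quad \exp(-0.5) \approx 0.607 \\
p = 4 &: \quad \exp(-0.25) \approx 0.779 \\
p = 5 &: \quad \exp(-0.125) \approx 0.882 \\
p = 6 &: \quad \exp(-0.0625) \approx 0.939
\end{aligned}
$$

At p = 6 this neuron is adjusted almost as strongly as the BMU itself, whereas the value at the edge of the radius remains $\exp(-4) \approx 0.0183$ for every p.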

Figure 4 illustrates the original neighborhood function $h_{ci}(t)$ (with q = 0.5 and p = 2) and the adjusted neighborhood function $h_{ci}(t)$ (with q = 4 and p = 1, 2, 3, 4, 5, 6) for $N_c(t) = 10$.

Figure 4.

Illustrates the function $h_{ci}(t)$ for different values of p.

For p = 1, the graph of $h_{ci}(t)$ is similar to the cases q = 8 and q = 12 in Figure 3; that is, QE is smaller than for p > 1, but TE is unreliable because it depends on the initialization of the neurons’ weight vectors.

Therefore, adjusting p has no significant effect on the quality of the feature map of SOM, whereas adjusting q does. The larger q is, the smaller QE is; however, the most appropriate value of q is the one at which TE is smallest. We therefore recommend the neighborhood function $h'_{ci}(t)$ with a single adjustable parameter:

$$h'_{ci}(t) = \exp\left(-q\, \frac{\lVert r_c - r_i \rVert^{2}}{N_c^{2}(t)}\right) \tag{9}$$

where the parameter q can be tuned for each dataset to achieve a better-quality feature map.

4. Experiments

We conducted experiments on 12 published datasets: XOR (data samples distributed according to the XOR operation), Aggregation, Flame, Pathbased, Spiral, Jain, Compound, R15, D31, Iris, Vowel and Zoo. The following parameters were used: network size 10 × 10; initial neighborhood radius 10; initial learning rate 1; number of training iterations 20,000.

The experiments were conducted in two cases: in case 1, p is fixed and q is varied; in case 2, q is fixed and p is varied.

Note: The results in Tables 1 and 2 are averages over 10 runs. The result for each dataset is presented in two rows: the first row shows QE and the second row shows TE.

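| Dataset | Measure | q = 0.5 | q = 1 | q = 2 | q = 4 | q = 8 | q = 12 |
|---|---|---|---|---|---|---|---|
| XOR | QE | 0.1890 | 0.1585 | 0.1299 | 0.1129 | 0.0902 | 0.0810 |
| | TE | 0.0318 | 0.0223 | 0.0273 | 0.0427 | 0.0705 | 0.0925 |
| Aggregation | QE | 5.9702 | 5.0643 | 4.0276 | 2.9340 | 2.2819 | 1.8472 |
| | TE | 0.0549 | 0.0362 | 0.0294 | 0.0245 | 0.0424 | 0.0678 |
| Flame | QE | 2.1839 | 1.9512 | 1.5194 | 1.1822 | 0.9129 | 0.8206 |
| | TE | 0.0700 | 0.0567 | 0.0407 | 0.0393 | 0.0479 | 0.0833 |
| Pathbased | QE | 4.5859 | 4.0427 | 3.2618 | 2.4779 | 1.9392 | 1.7401 |
| | TE | 0.0561 | 0.0433 | 0.0373 | 0.0315 | 0.0434 | 0.0794 |
| Spiral | QE | 4.7595 | 4.1719 | 3.4675 | 2.9239 | 2.2975 | 2.0085 |
| | TE | 0.0543 | 0.0404 | 0.0284 | 0.0364 | 0.0413 | 0.0564 |
| Jain | QE | 5.2745 | 4.4829 | 3.5726 | 2.3559 | 1.6236 | 1.5234 |
| | TE | 0.0513 | 0.0395 | 0.0313 | 0.0269 | 0.0443 | 0.0637 |
| Compound | QE | 4.4205 | 3.7595 | 3.1508 | 2.5672 | 1.8323 | 1.7744 |
| | TE | 0.0624 | 0.0299 | 0.0349 | 0.0400 | 0.0630 | 0.0690 |
| R15 | QE | 2.2226 | 2.0212 | 1.8005 | 1.4606 | 1.0730 | 0.9562 |
| | TE | 0.0722 | 0.0631 | 0.0368 | 0.0274 | 0.0613 | 0.1162 |
| D31 | QE | 4.7676 | 4.1204 | 3.3943 | 2.4569 | 2.0055 | 1.6793 |
| | TE | 0.0479 | 0.0352 | 0.0284 | 0.0207 | 0.0332 | 0.0394 |
| Iris | QE | 0.7709 | 0.6430 | 0.5353 | 0.4403 | 0.3773 | 0.3494 |
| | TE | 0.0739 | 0.0548 | 0.0689 | 0.0940 | 0.1196 | 0.1566 |
| Vowel | QE | 2.7459 | 2.5736 | 2.3755 | 2.2005 | 1.9150 | 1.7468 |
| | TE | 0.0537 | 0.0436 | 0.0412 | 0.0448 | 0.0494 | 0.0497 |
| Zoo | QE | 1.5841 | 1.4421 | 1.2468 | 1.0912 | 0.9790 | 0.9156 |
| | TE | 0.0343 | 0.0254 | 0.0169 | 0.0104 | 0.0162 | 0.0208 |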

Table 1.

Experimental results with parameter p = 2 fixed and parameter q varied.

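| Dataset | Measure | p = 1 | p = 2 | p = 3 | p = 4 | p = 5 | p = 6 |
|---|---|---|---|---|---|---|---|
| XOR (q = 1) | QE | 0.1754 | 0.1587 | 0.1546 | 0.1518 | 0.1525 | 0.1513 |
| | TE | 0.0534 | 0.0203 | 0.0225 | 0.0244 | 0.0238 | 0.0255 |
| Aggregation (q = 4) | QE | 2.7895 | 3.0003 | 3.2722 | 3.6436 | 3.6100 | 3.8718 |
| | TE | 0.0850 | 0.0300 | 0.0277 | 0.0273 | 0.0316 | 0.0282 |
| Flame (q = 4) | QE | 1.1858 | 1.2105 | 1.2306 | 1.3158 | 1.4010 | 1.4209 |
| | TE | 0.1438 | 0.0405 | 0.0284 | 0.0304 | 0.0331 | 0.0330 |
| Pathbased (q = 4) | QE | 2.5458 | 2.4759 | 2.7586 | 2.8462 | 2.9400 | 2.9928 |
| | TE | 0.1300 | 0.0313 | 0.0363 | 0.0351 | 0.0349 | 0.0304 |
| Spiral (q = 2) | QE | 3.5976 | 3.4319 | 3.4334 | 3.4603 | 3.4926 | 3.5797 |
| | TE | 0.0690 | 0.0290 | 0.0265 | 0.0290 | 0.0261 | 0.0264 |
| Jain (q = 4) | QE | 2.3664 | 2.3519 | 2.7136 | 2.9018 | 3.1494 | 3.3035 |
| | TE | 0.0896 | 0.0263 | 0.0270 | 0.0306 | 0.0402 | 0.0403 |
| Compound (q = 1) | QE | 4.2063 | 3.7575 | 3.6224 | 3.4969 | 3.5082 | 3.4913 |
| | TE | 0.0666 | 0.0291 | 0.0337 | 0.0340 | 0.0373 | 0.0398 |
| R15 (q = 4) | QE | 1.3161 | 1.4406 | 1.5544 | 1.6498 | 1.6972 | 1.7376 |
| | TE | 0.1055 | 0.0294 | 0.0367 | 0.0390 | 0.0454 | 0.0548 |
| D31 (q = 4) | QE | 2.3832 | 2.4769 | 2.8137 | 2.9886 | 3.0686 | 3.1960 |
| | TE | 0.0803 | 0.0199 | 0.0227 | 0.0238 | 0.0259 | 0.0284 |
| Iris (q = 1) | QE | 0.7140 | 0.6382 | 0.6166 | 0.6002 | 0.5880 | 0.5849 |
| | TE | 0.0665 | 0.0518 | 0.0555 | 0.0560 | 0.0572 | 0.0598 |
| Vowel (q = 2) | QE | 2.3938 | 2.3715 | 2.4186 | 2.4310 | 2.4529 | 2.4627 |
| | TE | 0.0635 | 0.0410 | 0.0416 | 0.0414 | 0.0429 | 0.0455 |
| Zoo (q = 4) | QE | 1.1817 | 1.0912 | 1.1780 | 1.1954 | 1.2015 | 1.2131 |
| | TE | 0.0366 | 0.0104 | 0.0182 | 0.0188 | 0.0176 | 0.0180 |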

Table 2.

Experimental results with parameter q fixed and parameter p varied.

Case 1: Parameter p is fixed, parameter q changes.

Table 1 shows the experimental results with p = 2 and q = 0.5, 1, 2, 4, 8, 12.

QE decreases steadily as q increases, while TE reaches its minimum at q = 1, 2 or 4, depending on the dataset. This is consistent with the analysis in Section 3.

The best result for each dataset is the one with the smallest TE; its QE is also smaller than with the original neighborhood function (q = 0.5; column 2 of Table 1).
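The selection rule used here (take the q with the smallest TE, then confirm that its QE is below the q = 0.5 baseline) can be sketched in a few lines of Python. The helper name `best_q` and the dictionary layout are hypothetical; the numbers are the XOR, Aggregation and Spiral rows copied from Table 1.

```python
Q_VALUES = [0.5, 1, 2, 4, 8, 12]

# (QE row, TE row) for three datasets, copied from Table 1.
TABLE1 = {
    "XOR": ([0.1890, 0.1585, 0.1299, 0.1129, 0.0902, 0.0810],
            [0.0318, 0.0223, 0.0273, 0.0427, 0.0705, 0.0925]),
    "Aggregation": ([5.9702, 5.0643, 4.0276, 2.9340, 2.2819, 1.8472],
                    [0.0549, 0.0362, 0.0294, 0.0245, 0.0424, 0.0678]),
    "Spiral": ([4.7595, 4.1719, 3.4675, 2.9239, 2.2975, 2.0085],
               [0.0543, 0.0404, 0.0284, 0.0364, 0.0413, 0.0564]),
}

def best_q(qe_row, te_row, q_values=Q_VALUES):
    """Return (q, QE, TE) for the q that gives the smallest TE."""
    i = min(range(len(q_values)), key=lambda k: te_row[k])
    return q_values[i], qe_row[i], te_row[i]

for name, (qe_row, te_row) in TABLE1.items():
    q, qe, te = best_q(qe_row, te_row)
    # qe_row[0] is the QE of the original function (q = 0.5).
    print(f"{name}: q={q}, QE={qe}, TE={te}, QE below baseline: {qe < qe_row[0]}")
```

For these three rows the rule selects q = 1, q = 4 and q = 2 respectively, which matches the values of q listed for the same datasets in Table 2.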

Case 2: Parameter q is fixed, parameter p changes.

Table 2 shows the experimental results when q is fixed, for each dataset, at the value that gives the best TE in Table 1, and p takes the values 1, 2, 3, 4, 5, 6.

When p = 1: TE is markedly higher than for p ≥ 2 on every dataset, and QE is higher than at p = 2 for most datasets (the exceptions are Aggregation, Flame, R15 and D31).

When p ≥ 2: TE tends to remain stable or increase slightly as p rises, which shows that p contributes little to topographical quality once a suitable q has been identified. QE tends to increase with p for the majority of datasets (the exceptions are XOR, Compound and Iris, where QE tends to decrease while TE tends to increase). This suggests that p = 2 is the best value.

Figures 5 to 16 compare the values of QE and TE as the parameters q and p change: the panels on the left (a) show the results with p = 2 fixed and q varied; the panels on the right (b) show the results with q fixed and p varied. The value of q in panel (b) is the one that gives the smallest TE in panel (a).

Figure 5.

XOR dataset. (a) p = 2 and q changes and (b) q = 1 and p changes.

Figure 6.

Aggregation dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 7.

Flame dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 8.

Pathbased dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 9.

Spiral dataset. (a) p = 2 and q changes and (b) q = 2 and p changes.

Figure 10.

Jain dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 11.

Compound dataset. (a) p = 2 and q changes and (b) q = 1 and p changes.

Figure 12.

R15 dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 13.

D31 dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 14.

Iris dataset. (a) p = 2 and q changes and (b) q = 1 and p changes.

Figure 15.

Vowel dataset. (a) p = 2 and q changes and (b) q = 2 and p changes.

Figure 16.

Zoo dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

With p = 2 and q varied, the charts are similar across datasets (panel (a), on the left): QE decreases gradually as q increases, while TE first decreases and then increases. TE reaches its lowest value at q = 1, 2 or 4, depending on the dataset (Table 1).

With q fixed and p varied, the charts are also similar (panel (b), on the right): TE is highest when p = 1, and both QE and TE tend to stabilize or increase gradually for p ≥ 2.

Conclusion: With p = 2 (the default value), adjusting q has a significant impact on the quality of the feature map. The larger q is, the smaller QE is; TE, however, is lowest when q is neither too small nor too large. With p = 2, the most suitable q is therefore the value at which TE is lowest. Conversely, once the most appropriate q has been identified, p has little impact on the quality of the feature map.

Table 3 shows the QE and TE obtained with the neighborhood function $h'_{ci}(t)$ (with p = 2 and q set for each dataset as shown in Table 2) and with some other neighborhood functions. $h'_{ci}(t)$ achieved smaller QE and TE than the original Gaussian function, the bubble function and the asymmetric neighborhood function, with one exception: on Compound, the asymmetric function gives a smaller QE (3.5529 versus 3.7595).

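| Dataset | Measure | $h_{ci}(t)$ | $h'_{ci}(t)$ | Bubble function | Asymmetric neighborhood function |
|---|---|---|---|---|---|
| XOR | QE | 0.1890 | 0.1585 | 0.2572 | 0.1808 |
| | TE | 0.0318 | 0.0223 | 0.2708 | 0.4635 |
| Aggregation | QE | 5.9702 | 2.9340 | 7.3092 | 4.9466 |
| | TE | 0.0549 | 0.0245 | 0.1794 | 0.4476 |
| Flame | QE | 2.1839 | 1.1822 | 2.6352 | 2.1916 |
| | TE | 0.0700 | 0.0393 | 0.1642 | 0.6828 |
| Pathbased | QE | 4.5859 | 2.4779 | 5.524 | 5.3888 |
| | TE | 0.0561 | 0.0315 | 0.1981 | 0.2715 |
| Spiral | QE | 4.7595 | 3.4675 | 5.6515 | 4.3775 |
| | TE | 0.0543 | 0.0284 | 0.1502 | 0.6306 |
| Jain | QE | 5.2745 | 2.3559 | 6.3026 | 5.4962 |
| | TE | 0.0513 | 0.0269 | 0.2024 | 0.3172 |
| Compound | QE | 4.4205 | 3.7595 | 5.5663 | 3.5529 |
| | TE | 0.0624 | 0.0299 | 0.2199 | 0.4349 |
| R15 | QE | 2.2226 | 1.4606 | 2.5017 | 1.8911 |
| | TE | 0.0722 | 0.0274 | 0.1384 | 0.6337 |
| D31 | QE | 4.7676 | 2.4569 | 5.6095 | 5.958 |
| | TE | 0.0479 | 0.0207 | 0.2054 | 0.3506 |
| Iris | QE | 0.7709 | 0.6430 | 1.001 | 0.9284 |
| | TE | 0.0739 | 0.0548 | 0.2312 | 0.2610 |
| Vowel | QE | 2.7459 | 2.3755 | 3.1022 | 2.8808 |
| | TE | 0.0537 | 0.0412 | 0.1872 | 0.3965 |
| Zoo | QE | 1.5841 | 1.0912 | 1.7182 | 1.7179 |
| | TE | 0.0343 | 0.0104 | 0.2182 | 0.2210 |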

Table 3.

Comparison of the QE and TE measures for several neighborhood functions.

Note: The results in Table 3 are averages over 10 runs. The result for each dataset is presented in two rows: the first row shows QE and the second row shows TE.

5. Conclusion

This chapter proposes an adjustable parameter for the symmetric Gaussian neighborhood function. Adjusting this parameter can reduce both the QE and TE of the feature map; however, its value must be determined individually for each dataset. The improved Gaussian function performs better than the original Gaussian function and than other neighborhood functions such as the bubble function and the asymmetric neighborhood function.

How to cite and reference

Le Anh Tu (October 14th 2019). Improving Feature Map Quality of SOM Based on Adjusting the Neighborhood Function [Online First], IntechOpen, DOI: 10.5772/intechopen.89233. Available from:
