Open access peer-reviewed chapter

Improving Feature Map Quality of SOM Based on Adjusting the Neighborhood Function

Written By

Le Anh Tu

Submitted: 01 May 2019 Reviewed: 18 August 2019 Published: 14 October 2019

DOI: 10.5772/intechopen.89233

From the Edited Volume

Sustainability in Urban Planning and Design

Edited by Amjad Almusaed, Asaad Almssad and Linh Truong-Hong



This chapter presents a study on improving the quality of the self-organizing map (SOM). We synthesize recent research on assessing and improving SOM quality, and then propose a solution that improves the quality of the feature map by adjusting the parameters of the Gaussian neighborhood function. Quantization error and topographical error are used to evaluate the quality of the resulting feature map. Experiments were conducted on 12 published datasets, and the results were compared with those of other methods that modify the neighborhood function. The proposed method produced feature maps of better quality than the other solutions.


  • quantization error
  • topographical error
  • self-organizing map
  • feature map
  • projection quality
  • learning quality

1. Introduction

SOM is a very useful neural network for visualization and data analysis. Urban design is one promising application area. SOM applications relevant to urban design include the analysis of growth factors in urban design proposals [1], the study of urban spatial structure [2], the analysis of city systems [3], city data mining [4], and the prediction of accessibility demand for healthcare infrastructure [5]. However, for SOM results to be more accurate, the quality of the feature map must be improved.

SOM maps input data from a multi-dimensional space to a lower-dimensional space, usually two-dimensional, called the feature map of the data. The quality of a feature map is mainly evaluated with two indicators: learning quality and projection quality [6, 7, 8, 9]. Learning quality is measured by the quantization error (QE) [10, 11]. Projection quality is measured by the topographical error (TE) [12, 13, 14]. The smaller the QE and TE values, the better the quality of the feature map.

Many studies have shown that the quality of the feature map is greatly affected by the initial parameters of the network, including the map size, the number of training iterations and the neighborhood radius [11, 15, 16, 17, 18]. Moreover, a feature map obtained with a suitable set of parameters is not necessarily the best-quality map. Improving the feature map quality of SOM has therefore attracted the attention of many researchers.

The traditional way to obtain a good-quality map for a given dataset is trial and error with different network parameters. The parameters that produce the map with the smallest error measures are considered suitable for the dataset [11]. According to Chattopadhyay et al. [19], for a specific dataset the map size is selected by trial and error until QE and TE are small enough. Polzlbauer [20] points out the relationship between QE and TE: TE often rises when QE falls. Increasing the size of the Kohonen layer may reduce QE but increase TE (i.e., a large Kohonen layer can distort the shape of the map); conversely, when the Kohonen layer is too small, TE is not reliable. A small neighborhood radius leads to a lower QE; if the neighborhood radius takes its smallest value, QE reaches its minimum [21].

Besides trial and error to find a suitable network configuration, researchers have also studied improvements to the SOM algorithm itself to enhance the quality of the feature map. Germen [22, 23] optimized QE by integrating a “hit” parameter into the weight update of the neurons, where “hit” is the number of times a neuron has been excited (a BMU counter). The “hit” parameter controls how much the weight vector of a neuron is adjusted: neurons that represent many samples are adjusted less (so that information is not lost) than neurons that represent few samples.

Neme [24, 25] proposed the SOMSR model (SOM with selective refractoriness), which reduces TE. In this model, the neighborhood radius of the BMU does not shrink gradually during learning. Instead, at every training iteration, each neuron within the neighborhood radius of the BMU decides for itself whether it will be affected by the BMU in the next iteration.

Kamimura [26] integrated a “firing” rate into the distance function to maximize input information. The “firing” rate indicates the importance of each feature relative to the remaining features. This method can reduce both QE and TE; however, for each dataset it requires several rounds of trial and error to determine an appropriate “firing” value.

Lopez-Rubio [27] describes the topographical error of the map as a state of “self-intersection.” If a self-intersection between neurons is detected after a learning step, that step is redone. This solution can reduce TE but increases QE.

Another approach is to adjust the scope and the learning rate of the neighboring neurons. Kohonen [11] set the learning rate of all neurons within the neighborhood radius equal to that of the BMU by using the “bubble” neighborhood function. He concluded that the bubble function is less effective than the Gaussian function.

Aoki and Aoyagi [28] and Ota et al. [29] proposed an asymmetric neighborhood function. In essence, this function extends the neighborhood radius in one direction and shrinks it in the opposite direction. Theoretically, this could “slide” the topographical error out of the map. However, their experiments were limited to certain situations and are not fully convincing.

Lee and Verleysen [30] replaced the neighborhood function with a “fisherman” rule. The “fisherman” rule updates the neurons within the neighborhood radius recursively: the BMU is adjusted toward the input sample; the neurons adjacent to the BMU (adjacency level 1) are adjusted toward the BMU rather than toward the input sample; each neuron at adjacency level 2 is adjusted toward the preceding level-1 neuron; and the remaining neurons within the neighborhood radius are adjusted by the same rule. However, the article does not show how to determine the order of the adjacent neurons when they are organized in a rectangular or hexagonal grid. In addition, the authors concluded that the Gaussian function gives better results than the “fisherman” rule.

Achieving a feature map of good quality according to several criteria at once is therefore a difficult problem. So far, no solution reduces both QE and TE simultaneously and works well for every dataset.

In this chapter, we improve the Gaussian neighborhood function by adding an adjustment parameter in order to reduce the QE and TE of the map simultaneously. The rest of the chapter is organized as follows: Section 2 presents an overview of SOM and the measures used to assess feature map quality; Section 3 presents our study on adjusting the parameters of the Gaussian neighborhood function; Section 4 reports the experimental results; and Section 5 concludes the chapter.


2. Self-organizing map neural network

2.1 Structure and the algorithm

SOM consists of an input layer and an output (Kohonen) layer. The Kohonen layer is usually organized as a two-dimensional matrix of neurons. Each unit i (neuron) in the Kohonen layer has a weight vector $w_i = [w_{i,1}, w_{i,2}, \ldots, w_{i,n}]$, where n is the size of the input vector and $w_{i,j}$ is the weight of neuron i associated with input j (Figure 1). SOM is trained by an unsupervised algorithm. The process is repeated many times; at time t, three steps are performed:

  • Step 1. Finding the BMU: randomly select a sample x(t) from the dataset (where t is the training time) and search the Kohonen matrix for the neuron c whose weight vector has the minimum distance dist to x(t), i.e., $dist = \lVert x(t) - w_c \rVert = \min_i \lVert x(t) - w_i \rVert$ (1). The Euclidean distance, the Manhattan distance or the vector dot product is commonly used. Neuron c is called the best matching unit (BMU).

Figure 1.

Illustrates the structure of SOM.


  • Step 2. Calculating the neighborhood radius of the BMU: use an interpolation function that decreases gradually with the number of iterations:

$$N_c(t) = N_0 \exp\left(-\frac{t}{\lambda}\right) \tag{2}$$

where $N_c(t)$ is the neighborhood radius at training time t; $N_0$ is the initial neighborhood radius; $\lambda = K / \log N_0$ is the time constant; and K is the total number of iterations.

  • Step 3. Updating the weight vectors of the neurons within the neighborhood radius of the BMU so that they move closer to sample x(t):

$$w_i(t+1) = w_i(t) + L(t)\, h_{ci}(t)\, \left[x(t) - w_i(t)\right] \tag{3}$$

where $L(t)$ is the learning rate at iteration t (like the neighborhood radius, it decreases monotonically over time, with $0 < L(t) < 1$; it can be a linear function, an exponential function, etc.), and $h_{ci}(t)$ is the neighborhood function, which expresses the effect of distance on the learning process and is calculated by formula (4):

$$h_{ci}(t) = \exp\left(-\frac{\lVert r_c - r_i \rVert^2}{2 N_c^2(t)}\right) \tag{4}$$

where $r_c$ and $r_i$ are the positions of neuron c and neuron i in the Kohonen matrix.
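The three steps above can be sketched in Python with NumPy. This is a minimal illustration rather than the chapter's implementation: the function name `train_som`, the random weight initialization, the exponential learning-rate schedule and the hard cut-off outside the radius are all assumptions made for the example.

```python
import numpy as np

def train_som(data, rows=10, cols=10, n_iter=2000, n0=10.0, l0=1.0, seed=0):
    """Illustrative SOM training loop following Steps 1-3 (not the chapter's code)."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # Weight vectors w_i, one row per neuron; random initialization is an assumption.
    weights = rng.random((rows * cols, n_features))
    # Grid positions r_i of the neurons in the Kohonen layer.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    lam = n_iter / np.log(n0)  # time constant, lambda = K / log(N0); requires n0 > 1

    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        # Step 1: the BMU is the neuron with the smallest Euclidean distance to x(t).
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Step 2: the neighborhood radius shrinks exponentially with t.
        radius = n0 * np.exp(-t / lam)
        # Learning rate decays with the same schedule (one possible choice).
        lr = l0 * np.exp(-t / lam)
        # Step 3: Gaussian neighborhood, applied only to neurons inside the radius.
        d = np.linalg.norm(grid - grid[bmu], axis=1)
        h = np.exp(-d**2 / (2 * radius**2))
        h[d > radius] = 0.0
        weights += (lr * h)[:, None] * (x - weights)
    return weights, grid
```

With `n0 = 10` the radius decays from 10 to about 1 over the run, so early iterations reorganize most of the map and late iterations only fine-tune the BMU and its immediate neighbors.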

2.2 The quality of feature map

Quantization error and topographical error are the main measures used to assess the quality of SOM. Quantization error is the average difference between the input samples and their corresponding winning neurons (BMUs). It assesses how accurately the data are represented, so a smaller value is better [11]:

$$QE = \frac{1}{T}\sum_{t=1}^{T} \lVert x(t) - w_c(t) \rVert \tag{5}$$

where x(t) is the input sample at training time t; $w_c(t)$ is the weight vector of the BMU of sample x(t); and T is the total number of training times.

Topographical error assesses topology preservation [13, 14]. It is the proportion of data samples whose first best matching unit (BMU1) and second best matching unit (BMU2) are not adjacent, so a smaller value is better:

$$TE = \frac{1}{T}\sum_{t=1}^{T} d(x(t)) \tag{6}$$

where x(t) is the input sample at training time t; d(x(t)) = 1 if BMU1 and BMU2 of x(t) are not adjacent, and d(x(t)) = 0 otherwise; and T is the total number of training times.
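Both measures are straightforward to compute once a map has been trained. The sketch below is one possible reading, with two assumptions that the text does not fix: the errors are averaged over the samples of a dataset evaluated against the final weights, and two neurons on a rectangular grid count as adjacent when their positions differ by at most one step in each direction. The function names are illustrative.

```python
import numpy as np

def quantization_error(data, weights):
    """Mean distance between each sample and the weight vector of its BMU."""
    # Pairwise distances, shape (n_samples, n_neurons).
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def topographic_error(data, weights, grid):
    """Fraction of samples whose BMU1 and BMU2 are not adjacent on the grid."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    bmu1, bmu2 = order[:, 0], order[:, 1]
    # Assumed adjacency rule: grid positions differ by at most 1 along each axis.
    gap = np.abs(grid[bmu1] - grid[bmu2]).max(axis=1)
    return float(np.mean(gap > 1))
```

A hexagonal grid would need a different adjacency test; only the `gap` computation would change.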


3. Adding an adjustment parameter to the Gaussian neighborhood function

Formula (3) shows that the learning ability of SOM depends on two components: the learning rate $L(t)$ and the neighborhood function $h_{ci}(t)$.

Because the learning rate decreases monotonically over time, it determines the overall learning rate of SOM over the training period. The quality of the feature map is therefore influenced mainly by the neighborhood function $h_{ci}(t)$. Adjusting the neighborhood function directly affects the learning process and the quality of the feature map.

The neighborhood function $h_{ci}(t)$ defines how strongly the input sample influences the neurons within the neighborhood radius $N_c(t)$ of the BMU (Figure 2).

Figure 2.

Influence of the input sample on the neurons within the neighborhood radius at training time t.

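Formula (4) can be rewritten in the following general form:

$$h_{ci}(t) = \exp\left(-q \left(\frac{\lVert r_c - r_i \rVert}{N_c(t)}\right)^{p}\right) \tag{7}$$

where q and p are two adjustable parameters, with q ≥ 0 and p ≥ 0.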

The value of $h_{ci}(t)$ thus depends on the distance from the position $r_i$ of the neuron under consideration (neuron i) to the position $r_c$ of the BMU, and on the parameters q and p. Specifically:

  • If $\lVert r_c - r_i \rVert = 0$ (the neuron under consideration is the BMU), then $h_{ci}(t) = 1$.

  • If $\lVert r_c - r_i \rVert = N_c(t)$ (the neuron under consideration is at the farthest position within the neighborhood radius $N_c(t)$), the value of the neighborhood function depends only on the parameter q:

$$h_{ci}(t) = \exp(-q) \tag{8}$$

Formula (8) shows that the minimum value of $h_{ci}(t)$ within the neighborhood radius depends on the parameter q.

Figure 3 illustrates the neighborhood function $h_{ci}(t)$ for a neighborhood radius $N_c(t) = 10$, with p = 2 and q = 0.5, 1, 2, 4, 8, 12.

Figure 3.

Function $h_{ci}(t)$ for different values of q.
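The role of q can be explored numerically with a short Python sketch. It assumes the generalized function has the form exp(-q (d / N)^p), where d is the grid distance to the BMU and N is the neighborhood radius; this form is consistent with the original Gaussian being the case q = 0.5, p = 2. The function name `neighborhood` and its argument names are illustrative, not from the chapter.

```python
import numpy as np

def neighborhood(d, radius, q=0.5, p=2.0):
    """Assumed general form: h = exp(-q * (d / radius)**p), for grid distance d >= 0."""
    d = np.asarray(d, dtype=float)
    return np.exp(-q * (d / radius) ** p)

radius = 10.0
for q in (0.5, 1, 2, 4, 8, 12):
    # Value of h for a neuron at the edge of the neighborhood (d equal to the radius).
    edge = neighborhood(radius, radius, q=q)
    print(f"q={q:>4}: h at d=N is {edge:.6f}")
```

At the edge of the neighborhood the value reduces to exp(-q), so the loop shows directly how quickly the outermost neurons stop being updated as q grows.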

3.1 Parameter q

In principle, the more the weight vectors of the neurons are adjusted at the current learning step, the more they differ from the input patterns presented at other learning steps. This increases the quantization error. Therefore, to reduce QE we must reduce the level and scope of the influence of the input sample; that is, increasing q reduces QE.

However, if q is too large, the learning ability of the map is restricted: the topography of the map changes little and depends partly on the initialization of the neurons' weight vectors. In addition, the neighborhood radius $N_c(t)$ is effectively shrunk, because $h_{ci}(t) \approx 0$ for the neurons farthest from the BMU within the radius (i.e., these neurons are not adjusted, or are adjusted only negligibly, by the input sample). Therefore, to ensure that all neurons within the neighborhood radius $N_c(t)$ are adjusted by the input sample, q must not be too large. For example, for q = 8 and q = 12, $h_{ci}(t) \approx 0$ as $\lVert r_c - r_i \rVert$ approaches $N_c(t)$.

When q ≈ 0, the Gaussian function gives the same result as the bubble function, i.e., $h_{ci}(t) \approx 1$ for all neurons within the neighborhood radius $N_c(t)$. As a result, the larger the neighborhood radius $N_c(t)$, the more the feature map tends to change locally to follow the input sample x(t). This reduces the network's ability to retain what it learned in previous learning steps.

Therefore, TE may depend on the initialization of the neurons' weight vectors if q is too large, or on the order of the input samples if q is too small. Note that both the initial weight vectors and the order of the input samples are chosen at random. The topographic learning ability of the network is therefore best when q is neither too small nor too large.

3.2 Parameter p

When q is fixed and p increases, the value of $h_{ci}(t)$ for the neurons near the BMU gradually increases toward 1; that is, the number of neighbors around the BMU that are adjusted almost as much as the BMU grows. This causes QE to increase. If p is too large, the feature map tends to change locally according to the most recent input samples (similar to the case where q is too small). However, TE may vary only slightly, because TE is determined by BMU1 and BMU2.

Figure 4 illustrates the original neighborhood function $h_{ci}(t)$ (with q = 0.5 and p = 2) and the adjusted neighborhood function $h_{ci}(t)$ (with q = 4 and p = 1, 2, 3, 4, 5, 6) for $N_c(t) = 10$.

Figure 4.

Function $h_{ci}(t)$ for different values of p.

For p = 1, the graph of $h_{ci}(t)$ is similar to the cases q = 8 and q = 12 in Figure 3: QE is the smallest compared with p > 1, but TE is unreliable because it depends on the initialization of the neurons' weight vectors.
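As a worked illustration of the effect of p, assume the general form $h_{ci}(t) = \exp(-q\,(\lVert r_c - r_i \rVert / N_c(t))^p)$, which is consistent with the parameter values quoted for Figure 4, and take q = 4 with a neuron halfway between the BMU and the edge of the neighborhood. The values are rounded and are meant only as a rough check of the trend:

```latex
\begin{aligned}
\text{With } q = 4 \text{ and } \lVert r_c - r_i \rVert = N_c(t)/2: \quad
h_{ci}(t) &= \exp\!\left(-4 \cdot 2^{-p}\right) \\
p = 1 &: \ \exp(-2) \approx 0.135 \\
p = 2 &: \ \exp(-1) \approx 0.368 \\
p = 4 &: \ \exp(-0.25) \approx 0.779 \\
p = 6 &: \ \exp(-0.0625) \approx 0.939
\end{aligned}
```

At the edge of the neighborhood the value is exp(-4) ≈ 0.018 for every p, so under this assumed form p only changes how flat the function is near the BMU, while q alone sets the value at the edge.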

Therefore, adjusting p has no significant effect on the quality of the feature map, whereas q has a positive effect. The larger q is, the smaller QE is; however, the most appropriate value of q is the one for which TE is smallest. We therefore recommend the neighborhood function $h'_{ci}(t)$ with a single adjustable parameter:

$$h'_{ci}(t) = \exp\left(-q\,\frac{\lVert r_c - r_i \rVert^2}{N_c^2(t)}\right) \tag{9}$$

where the parameter q can be adjusted for each dataset to achieve a better-quality feature map.
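As a quick consistency check, assume the recommended function has the form $h'_{ci}(t) = \exp(-q\,\lVert r_c - r_i \rVert^2 / N_c^2(t))$ and that formula (4) is the usual Gaussian. Setting q = 0.5 then recovers the original neighborhood function, and a larger q visibly narrows the influence of the input sample (values rounded):

```latex
\begin{aligned}
h'_{ci}(t)\big|_{q = 0.5}
  &= \exp\!\left(-\frac{\lVert r_c - r_i \rVert^2}{2\,N_c^2(t)}\right) = h_{ci}(t), \\
\text{at } \lVert r_c - r_i \rVert = N_c(t)/2: \quad
  q = 0.5 &: \ \exp(-0.125) \approx 0.882, \qquad
  q = 4 : \ \exp(-1) \approx 0.368 .
\end{aligned}
```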


4. Experiments

We conducted experiments on 12 published datasets: XOR (data samples distributed according to the XOR operation), Aggregation, Flame, Pathbased, Spiral, Jain, Compound, R15, D31, Iris, Vowel and Zoo. The following parameters were used: network size 10 × 10; initial neighborhood radius 10; initial learning rate 1; number of training iterations 20,000.

The experiments were conducted in two cases: case 1, with parameter p fixed and parameter q varied; case 2, with parameter q fixed and parameter p varied.

Note: The results in Tables 1 and 2 are averages over 10 runs. The result for each dataset is presented in two rows: the first row shows QE and the second row shows TE.


Table 1.

Experiment results with parameter p = 2 fixed and parameter q varied.

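Dataset (fixed q) | Measure | p = 1 | p = 2 | p = 3 | p = 4 | p = 5 | p = 6
XOR (q = 1) | QE | 0.1754 | 0.1587 | 0.1546 | 0.1518 | 0.1525 | 0.1513
Aggregation (q = 4) | QE | 2.7895 | 3.0003 | 3.2722 | 3.6436 | 3.6100 | 3.8718
Flame (q = 4) | QE | 1.1858 | 1.2105 | 1.2306 | 1.3158 | 1.4010 | 1.4209
Pathbased (q = 4) | QE | 2.5458 | 2.4759 | 2.7586 | 2.8462 | 2.9400 | 2.9928
Spiral (q = 2) | QE | 3.5976 | 3.4319 | 3.4334 | 3.4603 | 3.4926 | 3.5797
Jain (q = 4) | QE | 2.3664 | 2.3519 | 2.7136 | 2.9018 | 3.1494 | 3.3035
Compound (q = 1) | QE | 4.2063 | 3.7575 | 3.6224 | 3.4969 | 3.5082 | 3.4913
R15 (q = 4) | QE | 1.3161 | 1.4406 | 1.5544 | 1.6498 | 1.6972 | 1.7376
D31 (q = 4) | QE | 2.3832 | 2.4769 | 2.8137 | 2.9886 | 3.0686 | 3.1960
Iris (q = 1) | QE | 0.7140 | 0.6382 | 0.6166 | 0.6002 | 0.5880 | 0.5849
Vowel (q = 2) | QE | 2.3938 | 2.3715 | 2.4186 | 2.4310 | 2.4529 | 2.4627
Zoo (q = 4) | QE | 1.1817 | 1.0912 | 1.1780 | 1.1954 | 1.2015 | 1.2131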

Table 2.

Experiment results with parameter q fixed and parameter p varied.

Case 1: Parameter p is fixed and parameter q is varied.

Table 1 shows the experimental results with p = 2 and q = 0.5, 1, 2, 4, 8, 12.

QE decreases as q increases, while TE reaches its minimum at q = 1, 2 or 4, depending on the dataset. This is consistent with the analysis in Section 3.

The values in bold are the best results, i.e., those for which TE is the smallest and QE is also smaller than with the original neighborhood function (q = 0.5) (column 2, Table 1).
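The selection criterion just described (smallest TE, with QE below that of the original function at q = 0.5) can be written as a small helper. This is a sketch under stated assumptions: the function name `select_q`, the tie-breaking on QE and the fallback to the baseline are illustrative choices, and the numbers in `example` are made up for demonstration and are not taken from Table 1.

```python
def select_q(results, baseline_q=0.5):
    """Pick q from a mapping q -> (QE, TE).

    Returns the q with the smallest TE among candidates whose QE is below the
    baseline's QE; ties on TE are broken by the smaller QE. Falls back to the
    baseline when no candidate improves QE.
    """
    base_qe, _ = results[baseline_q]
    candidates = {q: (qe, te) for q, (qe, te) in results.items()
                  if q != baseline_q and qe < base_qe}
    if not candidates:
        return baseline_q
    return min(candidates, key=lambda q: (candidates[q][1], candidates[q][0]))

# Hypothetical measurements for one dataset: q -> (QE, TE).
example = {0.5: (0.30, 0.040), 1: (0.25, 0.020), 2: (0.22, 0.025),
           4: (0.20, 0.030), 8: (0.18, 0.060), 12: (0.17, 0.090)}
print(select_q(example))
```

For these made-up values the helper returns 1, the candidate with the lowest TE, even though larger q values give a lower QE.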

Case 2: Parameter q is fixed and parameter p is varied.

Table 2 shows the experimental results when q is fixed for each dataset at the value that gave the best TE in Table 1 and p takes the values 1, 2, 3, 4, 5, 6.

When p = 1: both QE and TE are high.

When p ≥ 2: TE tends to be stable or to increase slightly as p rises. This shows that p has a negligible effect on topographical quality once a suitable q has been identified. QE tends to increase with p for the majority of datasets (except for XOR, Compound and Iris, where QE tends to decrease but TE tends to increase). This suggests that p = 2 is the best value.

Figures 5 to 16 compare the values of QE and TE as the parameters q and p change: the panels on the left (a) show the results for fixed p = 2 and varying q; the panels on the right (b) show the results for fixed q and varying p. In (b), q is set to the value that gives the smallest TE in panel (a).

Figure 5.

XOR dataset. (a) p = 2 and q changes and (b) q = 1 and p changes.

Figure 6.

Aggregation dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 7.

Flame dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 8.

Pathbased dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 9.

Spiral dataset. (a) p = 2 and q changes and (b) q = 2 and p changes.

Figure 10.

Jain dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 11.

Compound dataset. (a) p = 2 and q changes and (b) q = 1 and p changes.

Figure 12.

R15 dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 13.

D31 dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

Figure 14.

Iris dataset. (a) p = 2 and q changes and (b) q = 1 and p changes.

Figure 15.

Vowel dataset. (a) p = 2 and q changes and (b) q = 2 and p changes.

Figure 16.

Zoo dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.

With p = 2 fixed and q varied, the charts are similar across datasets (panel (a), on the left): QE decreases gradually, while TE first decreases and then increases, moving opposite to QE, as q grows. TE reaches its lowest value at q = 1, 2 or 4, depending on the dataset.

With q fixed and p varied, the charts are also similar (panel (b), on the right): TE is highest when p = 1, and both QE and TE tend to stabilize or increase gradually for p ≥ 2.

Conclusion: With p = 2 (the default value), adjusting q has a significant impact on the quality of the feature map. The larger q is, the smaller QE is. However, TE is lowest when q is neither too small nor too large. Therefore, with p = 2, the most suitable q is the value large enough to achieve the lowest TE. Conversely, once the most appropriate q has been identified, p has little impact on the quality of the feature map.

Table 3 shows the QE and TE results obtained with the neighborhood function $h'_{ci}(t)$ (with p = 2 and q determined for each dataset as shown in Table 2) and with several other neighborhood functions. The results show that $h'_{ci}(t)$ achieved smaller QE and TE than the original Gaussian function, the bubble function and the asymmetric neighborhood function.

Dataset | $h_{ci}(t)$ | $h'_{ci}(t)$ | Bubble function | Asymmetric neighborhood function

Table 3.

Comparison of QE and TE for several neighborhood functions.

Note: The results in Table 3 are averages over 10 runs. The result for each dataset is presented in two rows: the first row shows QE and the second row shows TE.


5. Conclusion

This chapter proposes an adjustment parameter for the symmetric Gaussian neighborhood function. The proposed parameter adjustment can reduce both the QE and the TE of the feature map. However, the value of the parameter must be determined individually for each dataset. The improved Gaussian function performs better than the original Gaussian function and other neighborhood functions such as the bubble function and the asymmetric neighborhood function.


  1. Lonsway B, Mulky AR. A self-organizing neural system for urban design. In: Proceedings of the Twenty First Annual Conference of the Association for Computer-Aided Design in Architecture; 11-14 October 2001; Buffalo (New York). pp. 386-391
  2. Arribas-Bel D, Schmidt CR. Self-organizing maps and the US urban spatial structure. Environment and Planning B: Urban Analytics and City Science. 2013;40(2):362-371
  3. Kropp J. A neural network approach to the analysis of city systems. Applied Geography. 1998;18(1):83-96
  4. Neme O, Pulido JRG, Neme A. Mining the city data: Making sense of cities with self-organizing maps. In: WSOM 2011: Advances in Self-Organizing Maps. Springer; 2011. pp. 168-177
  5. Mayaud JR, Anderson S, Tran M, Radić V. Insights from self-organizing maps for predicting accessibility demand for healthcare infrastructure. Urban Science. 2019;3(1):33. DOI: 10.3390/urbansci3010033
  6. Bauer H, Pawelzik K. Quantifying the neighborhood preservation of self organizing feature maps. IEEE Transactions on Neural Networks. 1992;3(4):570-579
  7. Kahraman C. In: Kahraman CE, editor. Computational Intelligence Systems in Industrial Engineering. 1st ed. Vol. 6. Atlantis Press; 2012. pp. 295-315
  8. Polani D. Measures for the organization of self-organizing maps. In: Studies in Fuzziness and Soft Computing. Vol. 78. Springer; 2002. pp. 13-44
  9. Uriarte E, Martín DF. Topology preservation in SOM. International Journal of Applied Mathematics and Computer Science. 2005;1(1):19-22
  10. Berglund E, Sitte J. The parameterless self-organizing map algorithm. IEEE Transactions on Neural Networks. 2006;17(2):305-316
  11. Kohonen T. Self-Organizing Maps. 3rd ed. Springer-Verlag; 2001
  12. Bauer H, Herrmann M, Villmann T. Neural maps and topographic vector quantization. Neural Networks. 1999;12(4-5):659-676
  13. Kiviluoto K. Topology preservation in self-organizing maps. In: Neural Networks, IEEE International Conference on (ICNN96), Volume 1; June 3-6 1996; Washington, DC: IEEE; 1996. pp. 294-299
  14. Mwasiagi JI, XiuBao H, XinHou W, Qing-dong C. The use of K-means and Kohonen self organizing maps to classify cotton bales. In: Beltwide Cotton Conferences (BWCC’07); 9-12 January 2007; New Orleans, Louisiana; 2007
  15. Flanagan JA. Self-organization in Kohonen’s SOM. Neural Networks. 1996;9(7):1185-1197
  16. Germen E. A novel approach for learning rate in self organizing map (SOM). Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering. 2018;19(1):144-152
  17. Mulier F, Cherkassky V. Statistical analyses of self-organization. Neural Networks. 1995;8(5):717-727
  18. Wang S, Wang H. Knowledge discovery through self-organizing maps: Data visualization and query processing. Knowledge and Information Systems. 2002;4(1):31-45
  19. Chattopadhyay M, Dan PK, Mazumdar S. Application of visual clustering properties of self organizing map in machine-part cell formation. Applied Soft Computing. 2012;12(2):600-610
  20. Polzlbauer G. Survey and comparison of quality measures for self-organizing maps. In: Paralic J, Polzlbauer G, Rauber A, editors. Proceedings of the Fifth Workshop on Data Analysis (WDA-04); 24-27 June 2004; Sliezsky dom, Vysoke Tatry, Slovakia: Elfa Academic Press; 2004. pp. 67-82
  21. Sun Y. On quantization error of self-organizing map network. Neurocomputing. 2000;34(1-4):169-193
  22. Germen E. Increasing the topological quality of Kohonen’s self organizing map by using a hit term. In: Neural Information Processing, Proceedings of the 9th International Conference on (ICONIP’02), Volume 2; 2002. pp. 930-934
  23. Germen E. Improving the resultant quality of Kohonens self organizing map using stiffness factor. In: Advances in Natural Computation, Lecture Notes in Computer Science (First International Conference, ICNC 2005), Volume 3610; August 27-29 2005; Changsha, China: Springer, Berlin, Heidelberg; 2005. pp. 353-357
  24. Neme A, Chavez E, Cervera A, Mireles V. Decreasing neighborhood revisited in self-organizing map. In: Artificial Neural Networks-ICANN 2008, Volume 5163; 3-6 September 2008; Prague, Czech Republic: Springer, Berlin, Heidelberg; 2008. pp. 671-679
  25. Neme A, Miramontes P. Self-organizing map formation with a selectively refractory neighborhood. Neural Processing Letters. 2014;39(1):1-24
  26. Kamimura R. Input information maximization for improving self-organizing maps. Applied Intelligence. 2014;41(2):421-438
  27. Lopez-Rubio E. Improving the quality of self-organizing maps by self-intersection avoidance. IEEE Transactions on Neural Networks and Learning Systems. 2013;24(8):1253-1265
  28. Aoki T, Aoyagi T. Self-organizing maps with asymmetric neighborhood function. Neural Computation. 2007;19:2515-2535
  29. Ota K, Aoki T, Kurata K, Aoyagi T. Asymmetric neighborhood functions accelerate ordering process of self-organizing maps. Physical Review. 2011;83(2-1):1-9
  30. Lee JA, Verleysen M. Self-organizing maps with recursive neighborhood adaptation. Neural Networks. 2002;15:993-1003
