Open access peer-reviewed chapter

Multivariate Analysis in the Characterization and Classification of Soils

Written By

Oswaldo Eduardo Ramos Ramos and Leonardo Guzmán Alegría

Submitted: 27 August 2023 Reviewed: 04 September 2023 Published: 27 November 2023

DOI: 10.5772/intechopen.1002983

From the Edited Volume

New Insights on Principal Component Analysis

Fausto Pedro García Márquez, Mayorkinos Papaelias and René-Vinicio Sánchez Loja


Abstract

Soil is a fundamental natural resource for the balance of ecosystems as well as for agriculture, food, and housing. Soil is very susceptible to changes in its structure due to contamination or degradation of anthropogenic origin. Therefore, its evaluation, whether for environmental purposes or as an agricultural or housing resource, must be carried out in depth. This evaluation comprises the analysis of multiple physical, physicochemical and chemical-biological parameters, and because there are so many parameters, multivariate statistical methods become necessary. In this chapter, soil data were analyzed by Principal Component Analysis to reduce the number of dimensions and to allow a better interpretation of the results. The method was applied to characterize and classify soil samples, using data obtained from samples from the Bolivian Altiplano. The results show the potential of principal component analysis for processing such data.

Keywords

  • principal component analysis
  • multivariate analysis
  • reduction of dimensions
  • Bolivian Altiplano
  • contamination

1. Introduction

Soil and water are fundamental natural resources for the balance of ecosystems as well as for agriculture, food, and housing, and are thus fundamental for life.

Soils are composed of two environments, biotic and abiotic. The biotic environment consists of microorganisms, while the abiotic environment is composed of solid, liquid and gaseous phases. Together, the two environments characterize a particular soil and give it its uniqueness.

Soil and natural waters are highly susceptible to major changes in their structure and composition due to anthropogenic degradation or contamination. Therefore, their evaluation, whether for environmental purposes or as agricultural or housing resources, must be carried out in depth. This evaluation comprises the analysis of multiple physical, physicochemical and chemical-biological parameters. For example, to assess soil fertility it is important to analyze parameters such as pH, electrical conductivity, organic matter and exchangeable cations, among others. The evaluations are carried out by comparing these parameters with the values established in agricultural or environmental regulations. However, because of the complexity, variety and large number of parameters, their analysis can become difficult, which makes multivariate statistical methods essential. Thus, multivariate analysis applied to the characterization and classification of soils and natural waters, according to the intended field of study, is becoming increasingly important.

Principal Component Analysis offers an alternative for the characterization and classification of soils. The soil samples or sampling points constitute the elements of a system, and the physicochemical parameters measured in these samples constitute the variables. Since both the sampling points and the variables are generally numerous, we have a system with multiple elements and variables. The data analysis can be quite complex because each sampling point must be represented in a multidimensional space, with one dimension per variable. Although the variables could be represented in one-dimensional (one variable at a time) or two-dimensional (two variables at a time) spaces, this is neither practical nor objectively informative.

Principal Component Analysis with dimensionality reduction offers precisely such an alternative. Since the set of many variables can be reduced to a few composite variables (the principal components), the analysis becomes feasible and the conclusions are more objective.
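The reduction described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, assuming the data are arranged as a matrix with one row per sampling point and one column per variable; the helper name `pca_correlation` is hypothetical, and the sketch is not the SPSS procedure used in this chapter, although, like it, it extracts the components from the correlation matrix (that is, from standardized variables).

```python
import numpy as np


def pca_correlation(X):
    """PCA of the correlation matrix of X (rows = sampling points, columns = variables).

    Returns the eigenvalues (descending), the fraction of total variance each
    component explains, the loadings (variable-component correlations) and
    the unscaled component scores of each sampling point.
    """
    X = np.asarray(X, dtype=float)
    # Standardize every variable so that parameters measured in different
    # units (mg/kg, uS/cm, pH units...) carry the same weight.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)            # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]           # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()         # eigenvalues sum to p
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    scores = Z @ eigvecs                        # projections on the components
    return eigvals, explained, loadings, scores
```

Keeping only the first few columns of `scores` is the dimensionality reduction: each sampling point is then described by a handful of composite variables instead of all the original ones.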

However, not all data sets are suitable for Principal Component Analysis (PCA). They must meet certain requirements: for example, the variables must be numerical, and the correlations between them must be above an acceptable level. If these requirements are not met, a PCA can still be computed, but its results would not be valid. Compliance with these requirements is assessed from the correlation matrix, where the correlations must be examined. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is a statistic that indicates the proportion of the variance in the variables that can be explained by the PCA. It is somewhat similar to the coefficient of determination R² in linear regression analysis. Kaiser proposed the following criteria for the KMO [1]:

  1. 0.9 ≤ KMO ≤ 1.0 = Excellent sampling adequacy.

  2. 0.8 ≤ KMO < 0.9 = Good sampling adequacy.

  3. 0.7 ≤ KMO < 0.8 = Acceptable sampling adequacy.

  4. 0.6 ≤ KMO < 0.7 = Mediocre sampling adequacy.

  5. 0.5 ≤ KMO < 0.6 = Poor sampling adequacy.

  6. 0.0 ≤ KMO < 0.5 = Unacceptable sampling adequacy.

Therefore, the KMO should be at least 0.7 for a PCA to be acceptable. Bartlett's test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix, that is, that the variables are uncorrelated. If the null hypothesis cannot be rejected (Sig. > 0.05), there are no significant correlations. The alternative hypothesis is that the correlation matrix is not an identity matrix; rejecting the null hypothesis (Sig. < 0.05) indicates that significant correlations exist and, thus, that a PCA can be performed. The statistical analyses in this chapter were performed with SPSS software.
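The adequacy checks described above can be sketched in NumPy. This is a minimal illustration, assuming the correlation matrix `R` and the sample size `n` are already available; it is not SPSS's implementation, and the function names are hypothetical. The KMO uses the standard ratio of the squared correlations to the squared correlations plus the squared partial correlations (obtained from the inverse of R), and Bartlett's statistic is χ² = −[(n − 1) − (2p + 5)/6] ln|R| with p(p − 1)/2 degrees of freedom. The p-value (SPSS's "Sig.") would come from the upper tail of a χ² distribution, for example `scipy.stats.chi2.sf(stat, df)`; it is omitted here to keep the sketch NumPy-only.

```python
import numpy as np


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure computed from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    S = np.linalg.inv(R)
    # Partial (anti-image) correlations: a_ij = -s_ij / sqrt(s_ii * s_jj)
    d = np.sqrt(np.diag(S))
    A = -S / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)      # use off-diagonal elements only
    r2 = (R[off] ** 2).sum()
    a2 = (A[off] ** 2).sum()
    return r2 / (r2 + a2)


def kaiser_label(value):
    """Adequacy label following Kaiser's thresholds listed above."""
    for cutoff, label in [(0.9, "Excellent"), (0.8, "Good"), (0.7, "Acceptable"),
                          (0.6, "Mediocre"), (0.5, "Poor")]:
        if value >= cutoff:
            return label
    return "Unacceptable"


def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and degrees of freedom for H0: R = I."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)           # ln|R|, numerically stable
    stat = -((n - 1) - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return stat, df
```

With p = 11 and p = 16 variables, `bartlett_sphericity` gives 55 and 120 degrees of freedom, matching the df reported in Tables 3, 6 and 14.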


2. PCA in the analysis of soil samples

In a study carried out by Ramos Ramos et al. [2], water samples from the Bolivian Altiplano were analyzed. The values of the parameters and major ions determined in these waters are presented in Table 1.

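| Location | Sample ID | Water type | EC (μS/cm) | pH | Eh (mV) | HCO₃⁻ (mg/L) | F⁻ (mg/L) | Cl⁻ (mg/L) | SO₄²⁻ (mg/L) | Ca²⁺ (mg/L) | Mg²⁺ (mg/L) | Na⁺ (mg/L) | K⁺ (mg/L) | Ionic balance (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cayhuasi | CAP1 | Mg–Na–Ca–HCO3–SO4 | 1,120 | 7.45 | 167 | 561.4 | 0.01 | 13.5 | 170.8 | 69.8 | 85.5 | 80.5 | 14.3 | 4.1 |
| Soracachi 1 | SOP1 | Mg–Ca–Na–HCO3–SO4 | 912 | 7.33 | 165 | 427.1 | 0.24 | 19.7 | 145.4 | 55.1 | 62.3 | 56.8 | 6.1 | −1.0 |
| Soracachi 2 | SOP2 | Mg–Na–Ca–HCO3–SO4 | 936 | 7.15 | 184 | 417.4 | 0.01 | 21.0 | 158.8 | 56.3 | 64.8 | 67.6 | 5.0 | 1.4 |
| Paria | PAP1 | Na–Mg–HCO3–Cl–SO4 | 2,120 | 7.41 | 173 | 717.6 | 1.04 | 198.2 | 242.8 | 72.5 | 58.6 | 291.5 | 42.1 | −0.7 |
| Chusaqueri | CHP1 | Ca–Na–Cl | 1,480 | 7.52 | 160 | 153.8 | 0.03 | 294.9 | 85.6 | 156.3 | 20.4 | 88.3 | 8.2 | 1.9 |
| Toledo 1 | TOP1 | Na–Ca–Cl–HCO3 | 1,400 | 7.72 | 154 | 270.9 | 0.01 | 233.0 | 90.4 | 68.8 | 15.4 | 183.5 | 30.4 | 0.5 |
| Toledo 2 | TOP2 | Na–Ca–Cl | 3,860 | 7.38 | 170 | 263.6 | 0.01 | 926.7 | 183.3 | 206.9 | 55.4 | 421.0 | 28.0 | −0.9 |
| Kulliri | KUP1 | Na–HCO3–B–SO4 | 850 | 7.75 | 160 | 297.8 | 0.11 | 61.9 | 97.0 | 27.2 | 3.4 | 135.0 | 19.1 | −4.3 |
| Copacabanita | COP1 | Ca–Na–SO4 | 1,390 | 7.08 | 190 | 190.4 | 0.47 | 25.5 | 562.5 | 234.5 | 16.4 | 125.5 | 6.9 | 8.6 |
| Tolaloma | TOLP1 | Ca–HCO3–SO4 | 660 | 7.61 | 158 | 266.0 | 0.01 | 26.8 | 105.6 | 98.7 | 11.1 | 33.0 | 6.9 | 0.7 |
| Andamarca | ANP1 | Na–Ca–HCO3–SO4 | 1,190 | 7.69 | 169 | 483.3 | 0.01 | 74.1 | 142.8 | 114.7 | 13.1 | 179.0 | 18.5 | 7.3 |
| Avaroa | AVP1 | Ca–Na–HCO3 | 590 | 7.29 | 169 | 290.4 | 0.19 | 14.1 | 48.0 | 72.3 | 7.4 | 49.2 | 5.2 | 2.3 |
| Pampa Aullagas | PAMP1 | Na–Ca–Cl–HCO3–SO4 | 1,360 | 6.41 | 218 | 219.7 | 0.35 | 164.5 | 144.2 | 69.9 | 22.4 | 137.0 | 26.7 | 1.0 |
| Quillacas 1 | QUP1 | Na–B–Cl–HCO3 | 1,200 | 7.90 | 141 | 234.3 | 0.26 | 189.7 | 79.1 | 36.2 | 6.7 | 175.0 | 16.6 | −2.3 |
| Quillacas 2 | QUP2 | Na–Cl | 810 | 6.55 | 210 | 61.0 | 0.20 | 93.7 | 66.8 | 11.5 | 4.9 | 123.8 | 21.4 | −2.0 |
| Quillacas 3 | QUP3 | Na–Cl–B–SO4 | 410 | 6.74 | 205 | 36.6 | 0.13 | 35.9 | 36.8 | 10.0 | 3.6 | 44.6 | 17.5 | 7.0 |
| Condo K2 | CONP2 | Na–Ca–HCO3 | 660 | 7.55 | 162 | 234.3 | 0.32 | 42.0 | 55.6 | 52.0 | 8.2 | 109.4 | 9.9 | 13.0 |
| Condo K4 | CONP4 | Na–Ca–HCO3–B | 570 | 7.55 | 165 | 227.0 | 0.29 | 30.4 | 48.7 | 43.1 | 7.4 | 51.0 | 8.9 | −3.9 |
| Caraynacha | CARP1 | Ca–Na–HCO3 | 400 | 7.48 | 168 | 172.1 | 0.13 | 20.9 | 30.5 | 42.9 | 6.2 | 44.4 | 13.9 | 9.6 |
| Llapallapani | LLAP1 | Na–Ca–SO4–HCO3 | 210 | 6.97 | 195 | 43.9 | 0.05 | 7.1 | 40.1 | 11.5 | 4.3 | 14.6 | 6.2 | −3.0 |
| Challapata | CHAP1 | Ca–Na–Mg–HCO3–Cl | 650 | 6.84 | 197 | 135.5 | 0.15 | 47.0 | 35.7 | 55.3 | 15.3 | 44.3 | 6.5 | 10.0 |
| Huancane | HUAP1 | Na–Ca–HCO3–Cl | 870 | 7.13 | 185 | 292.9 | 0.14 | 78.8 | 46.7 | 70.1 | 13.8 | 130.5 | 7.2 | 13.0 |
| Irukasa | IRP1 | Na–Cl | 4,630 | 7.35 | 173 | 593.1 | 0.01 | 1,219.7 | 9.1 | 39.6 | 7.2 | 1,333 | 26.9 | 16.1 |
| Realenga | REP1 | Na–Ca–Mg–SO4–HCO3 | 770 | 6.81 | 206 | 178.2 | 0.30 | 37.1 | 176.6 | 56.0 | 23.1 | 101.0 | 5.8 | 8.9 |
| Pazña | PAZP2 | Na–Ca–SO4–HCO3 | 1,270 | 7.19 | 180 | 300.2 | 0.67 | 72.1 | 282.6 | 114.3 | 20.5 | 211.5 | 23.6 | 14.3 |
| Totoral | TOTP1 | Na–Cl–NO3–SO4 | 1,450 | 6.87 | 198 | 98.8 | 0.55 | 168.8 | 131.1 | 51.3 | 10.4 | 208.0 | 22.8 | 0.5 |
| Cayumalliri | CAYP1 | Ca–Na–HCO3–SO4 | 640 | 6.70 | 212 | 205.0 | 0.07 | 33.9 | 72.1 | 64.5 | 15.1 | 35.0 | 3.0 | −1.1 |
| Sora Sora | SORP1 | Al–Ca–Mg–SO4 | 4,500 | 3.79 | 378 | 0.0 | 4.06 | 28.1 | 1,020.2 | 130.7 | 63.6 | 45.5 | 6.7 | 1.6 |
| Chapanar | CHAO1 | Ca–Mg–Na–HCO3–SO4 | 280 | 7.60 | 157 | 109.8 | 0.01 | 9.5 | 44.2 | 20.8 | 8.8 | 16.4 | 3.9 | −7.7 |
| Totoralr | TOR1 | Ca–SO4 | 2,580 | 3.88 | 369 | 0.0 | 0.51 | 104.0 | 1,546.9 | 333.5 | 40.4 | 28.2 | 9.2 | −2.1 |
| Avicayar | AVR1 | Ca–SO4 | 2,530 | 3.10 | 417 | 0.0 | 1.09 | 106.9 | 1,261.1 | 255.3 | 36.8 | 56.7 | 6.6 | −2.1 |
| Paznar | PAZR1 | Ca–Na–SO4 | 1,960 | 4.71 | 221 | 14.6 | 0.06 | 151.7 | 874.9 | 191.3 | 36.2 | 128.5 | 12.0 | 3.3 |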

Table 1.

Major ion composition in ground and surface water samples in the Poopó basin from the Bolivian Altiplano.

Below detection limit.

Sample IDs ending in P correspond to wells.

Sample IDs ending in R correspond to rivers.

Reproduced with permission from Springer Nature; Excerpt from [2].

Table 2 shows the correlation matrix of the parameters. There are correlations between several variables, which is indicative of underlying structure. In principle, therefore, a PCA would be feasible.

Correlations

| Variable | Statistic | EC | pH | Eh | HCO3 | F | Cl | SO4 | Ca | Mg | Na | K |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EC | Pearson correlation | 1 | −.287 | .429* | .091 | .531** | .709** | .505** | .493** | .419* | .625** | .342 |
| | Sig. (2-tailed) | | .112 | .014 | .620 | .002 | .000 | .003 | .004 | .017 | .000 | .055 |
| pH | Pearson correlation | −.287 | 1 | −.569** | .378* | −.393* | .120 | −.624** | −.361* | −.109 | .133 | .102 |
| | Sig. (2-tailed) | .112 | | .001 | .033 | .026 | .514 | .000 | .043 | .554 | .468 | .580 |
| Eh | Pearson correlation | .429* | −.569** | 1 | −.462** | .582** | −.107 | .815** | .569** | .276 | −.145 | −.325 |
| | Sig. (2-tailed) | .014 | .001 | | .008 | .000 | .559 | .000 | .001 | .126 | .428 | .070 |
| HCO3 | Pearson correlation | .091 | .378* | −.462** | 1 | −.228 | .307 | −.415* | −.262 | .313 | .475** | .456** |
| | Sig. (2-tailed) | .620 | .033 | .008 | | .210 | .087 | .018 | .148 | .081 | .006 | .009 |
| F | Pearson correlation | .531** | −.393* | .582** | −.228 | 1 | −.128 | .523** | .225 | .358* | −.098 | −.033 |
| | Sig. (2-tailed) | .002 | .026 | .000 | .210 | | .486 | .002 | .215 | .044 | .594 | .860 |
| Cl | Pearson correlation | .709** | .120 | −.107 | .307 | −.128 | 1 | −.085 | .123 | .033 | .897** | .497** |
| | Sig. (2-tailed) | .000 | .514 | .559 | .087 | .486 | | .643 | .501 | .857 | .000 | .004 |
| SO4 | Pearson correlation | .505** | −.624** | .815** | −.415* | .523** | −.085 | 1 | .832** | .400* | −.151 | −.171 |
| | Sig. (2-tailed) | .003 | .000 | .000 | .018 | .002 | .643 | | .000 | .023 | .408 | .350 |
| Ca | Pearson correlation | .493** | −.361* | .569** | −.262 | .225 | .123 | .832** | 1 | .353* | −.057 | −.107 |
| | Sig. (2-tailed) | .004 | .043 | .001 | .148 | .215 | .501 | .000 | | .048 | .756 | .559 |
| Mg | Pearson correlation | .419* | −.109 | .276 | .313 | .358* | .033 | .400* | .353* | 1 | −.060 | .063 |
| | Sig. (2-tailed) | .017 | .554 | .126 | .081 | .044 | .857 | .023 | .048 | | .746 | .733 |
| Na | Pearson correlation | .625** | .133 | −.145 | .475** | −.098 | .897** | −.151 | −.057 | −.060 | 1 | .512** |
| | Sig. (2-tailed) | .000 | .468 | .428 | .006 | .594 | .000 | .408 | .756 | .746 | | .003 |
| K | Pearson correlation | .342 | .102 | −.325 | .456** | −.033 | .497** | −.171 | −.107 | .063 | .512** | 1 |
| | Sig. (2-tailed) | .055 | .580 | .070 | .009 | .860 | .004 | .350 | .559 | .733 | .003 | |

N = 32 for every pair of variables.

Table 2.

Correlations Matrix of water samples in Poopó Basin, Bolivian Altiplano.

* Correlation is significant at the 0.05 level (2-tailed).

** Correlation is significant at the 0.01 level (2-tailed).


The results of the KMO and Bartlett's sphericity tests are shown in Table 3.

KMO and Bartlett’s Test

| Test | Statistic | Value |
|---|---|---|
| Kaiser-Meyer-Olkin Measure of Sampling Adequacy | | .478 |
| Bartlett’s Test of Sphericity | Approx. Chi-Square | 349.409 |
| | df | 55 |
| | Sig. | .000 |

Table 3.

KMO and Bartlett's sphericity results for water samples.

The KMO statistic is 0.478, which lies in the unacceptable range (below 0.5). Although Bartlett's test of sphericity is significant (Sig. = 0.000), indicating that the correlation matrix is not an identity matrix, the low KMO value means that these water data are not suitable for a PCA.

Table 4 corresponds to an analysis of metals and trace elements in soil samples from the Poopó Basin in the Bolivian Altiplano, Bolivia [2]. Sixteen parameters were determined at each of 36 sampling points, giving a system with 36 elements and 16 variables.

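| No | Code | Al | As | B | Cd | Cr | Cu | Fe | Mn | Mo | Ni | P | Pb | S | Si | Sr | Zn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | COR 1H | 13181.1 | 13.8 | 11.3 | 0.5 | 12.9 | 20.6 | 16708.9 | 233.7 | 0.2 | 8.0 | 532.5 | 35.6 | 556.9 | 429.9 | 42.5 | 86.8 |
| 2 | COR 1C | 11823.0 | 17.8 | 4.5 | 0.5 | 15.2 | 21.1 | 23063.1 | 474.0 | 0.3 | 10.1 | 337.6 | 40.9 | 300.5 | 519.2 | 32.5 | 84.2 |
| 3 | COR 1P | 12705.8 | 15.0 | 6.3 | 0.3 | 14.8 | 18.1 | 20914.0 | 370.3 | 0.3 | 9.5 | 345.5 | 35.1 | 161.0 | 459.4 | 26.3 | 85.2 |
| 4 | COR 1AA | 30996.3 | 38.5 | 15.0 | 0.9 | 40.2 | 48.6 | 57156.4 | 1029.9 | 0.8 | 27.4 | 976.4 | 88.3 | 581.1 | 1183.3 | 55.5 | 232.5 |
| 5 | COR 2C | 7945.6 | 28.0 | 8.9 | 0.7 | 10.2 | 20.1 | 15650.0 | 262.1 | 0.3 | 4.4 | 298.4 | 61.3 | 241.5 | 509.1 | 30.1 | 77.8 |
| 6 | COR 2P | 11231.6 | 31.2 | 6.9 | 0.7 | 10.9 | 21.3 | 16403.3 | 299.2 | 0.5 | 7.5 | 285.8 | 104.4 | 424.9 | 98.4 | 31.4 | 76.3 |
| 7 | COR 2AA | 8736.8 | 26.1 | 12.1 | 0.5 | 10.0 | 20.9 | 15818.3 | 332.4 | 0.3 | 5.8 | 407.7 | 48.7 | 373.7 | 449.3 | 37.8 | 99.5 |
| 8 | COR 3H | 10296.9 | 27.8 | 13.1 | 0.6 | 10.5 | 17.6 | 16405.1 | 394.2 | 0.5 | 6.8 | 457.7 | 40.3 | 348.8 | 535.2 | 33.3 | 81.4 |
| 9 | COR 3C | 4372.9 | 16.2 | 4.7 | 0.2 | 5.9 | 8.7 | 10264.0 | 192.5 | 0.3 | 3.4 | 218.9 | 20.4 | 168.4 | 375.9 | 15.6 | 54.5 |
| 10 | COR 3P | 6899.8 | 20.8 | 5.4 | 0.4 | 8.9 | 12.8 | 14938.4 | 281.6 | 0.3 | 5.1 | 279.2 | 30.6 | 153.6 | 455.2 | 21.8 | 67.0 |
| 11 | COR 3AA | 6910.1 | 21.5 | 6.7 | 0.5 | 9.1 | 16.0 | 15030.0 | 303.4 | 0.2 | 5.1 | 274.4 | 37.2 | 181.8 | 504.6 | 22.7 | 77.8 |
| 12 | VM 1H | 42276.0 | 30.1 | 40.0 | 1.2 | 27.1 | 27.8 | 22425.4 | 546.1 | 2.8 | 7.7 | 910.0 | 85.9 | 506.3 | 983.3 | 51.2 | 159.4 |
| 13 | VM 1C | 25081.6 | 23.0 | 27.5 | 0.3 | 17.4 | 11.5 | 16043.0 | 365.0 | 0.4 | 4.7 | 381.1 | 26.0 | 264.1 | 985.4 | 44.1 | 58.0 |
| 14 | VM 2H | 43636.7 | 17.3 | 29.5 | 0.3 | 29.2 | 18.8 | 22690.3 | 448.1 | 0.3 | 10.3 | 469.2 | 28.0 | 227.9 | 1138.1 | 39.1 | 78.0 |
| 15 | VM 2P | 40756.0 | 19.3 | 27.5 | 0.3 | 28.2 | 18.7 | 22097.1 | 428.5 | 0.4 | 8.8 | 449.3 | 25.6 | 220.4 | 1042.0 | 39.3 | 72.3 |
| 16 | VM 3H | 5509.4 | 9.3 | 8.3 | 0.4 | 5.2 | 9.4 | 8763.7 | 144.1 | 0.5 | 3.7 | 198.2 | 15.8 | 170.1 | 558.5 | 18.5 | 95.6 |
| 17 | VM 3C | 11299.4 | 13.3 | 3.8 | 0.6 | 15.3 | 22.9 | 23362.0 | 454.6 | 0.5 | 10.8 | 352.7 | 35.4 | 162.2 | 460.5 | 27.1 | 81.4 |
| 18 | VM 3P | 12348.8 | 15.9 | 4.8 | 0.4 | 16.3 | 25.5 | 25623.8 | 497.5 | 0.3 | 11.5 | 393.5 | 41.5 | 180.9 | 458.4 | 28.9 | 91.7 |
| 19 | VM 4H | 12362.6 | 19.7 | 20.5 | 0.8 | 11.1 | 21.8 | 17023.4 | 292.1 | 0.6 | 8.0 | 533.1 | 39.5 | 424.4 | 1395.9 | 51.3 | 203.0 |
| 20 | VM 4C | 25008.0 | 40.2 | 14.5 | 0.7 | 30.6 | 41.1 | 46040.2 | 928.5 | 0.9 | 18.9 | 813.9 | 70.8 | 398.4 | 1109.0 | 63.3 | 146.4 |
| 21 | VM 4P | 8614.9 | 18.5 | 23.3 | 0.8 | 9.1 | 16.6 | 18128.2 | 627.7 | 0.4 | 7.5 | 457.9 | 32.4 | 584.8 | 617.7 | 35.2 | 179.4 |
| 22 | VM 5H | 6887.9 | 28.9 | 11.8 | 0.9 | 10.0 | 22.1 | 238.7 | 503.4 | 0.3 | 7.8 | 362.5 | 70.2 | 844.3 | 487.6 | 25.5 | 130.2 |
| 23 | VM 5C | 7839.3 | 28.4 | 13.3 | 0.9 | 10.8 | 22.5 | 21060.7 | 505.6 | 0.3 | 7.3 | 371.0 | 73.0 | 818.4 | 488.2 | 26.0 | 137.5 |
| 24 | VM 6C | 7941.5 | 19.0 | 9.3 | 0.4 | 8.9 | 14.8 | 16193.1 | 510.6 | 0.3 | 6.5 | 328.9 | 43.9 | 360.1 | 575.9 | 24.6 | 88.5 |
| 25 | VM 6P | 6511.7 | 27.2 | 8.6 | 1.1 | 9.3 | 21.1 | 21475.7 | 544.7 | 0.2 | 6.7 | 364.7 | 53.6 | 414.0 | 506.9 | 19.5 | 161.2 |
| 26 | POO 1C | 10751.3 | 19.8 | 5.7 | 0.9 | 12.3 | 20.3 | 19361.9 | 424.3 | 0.5 | 8.6 | 348.3 | 43.4 | 191.3 | 633.8 | 27.3 | 90.8 |
| 27 | POO 1P | 13660.9 | 20.6 | 7.1 | 0.5 | 14.4 | 21.0 | 22414.2 | 485.6 | 0.6 | 9.4 | 348.3 | 33.7 | 181.2 | 606.7 | 33.8 | 76.1 |
| 28 | POO 2C | 10290.1 | 15.7 | 13.5 | 1.0 | 10.3 | 14.0 | 14995.4 | 340.5 | 0.4 | 5.2 | 337.6 | 27.9 | 179.9 | 516.9 | 36.1 | 117.9 |
| 29 | POO 2P | 10725.5 | 19.3 | 6.7 | 0.4 | 12.2 | 15.7 | 17443.0 | 386.6 | 0.2 | 6.8 | 295.3 | 30.9 | 214.1 | 552.8 | 29.0 | 68.3 |
| 30 | POO 3C | 7681.3 | 27.0 | 8.4 | 2.1 | 9.9 | 19.1 | 17784.7 | 297.0 | 0.3 | 6.4 | 688.2 | 138.7 | 300.1 | 496.8 | 30.3 | 246.6 |
| 31 | POO 3P | 9274.6 | 21.2 | 12.1 | 1.5 | 11.0 | 20.8 | 17636.1 | 337.4 | 0.4 | 6.0 | 803.9 | 95.2 | 289.3 | 517.9 | 36.9 | 193.8 |
| 32 | POO 4C | 7099.8 | 14.3 | 9.9 | 0.3 | 10.9 | 12.2 | 14469.9 | 286.8 | 0.7 | 7.5 | 250.7 | 24.8 | 171.4 | 440.3 | 26.0 | 51.7 |
| 33 | POO 4C (Alta) | 9838.7 | 17.5 | 11.7 | 0.4 | 10.3 | 13.0 | 14625.4 | 277.9 | 0.3 | 5.1 | 331.2 | 24.4 | 182.9 | 503.2 | 46.3 | 47.7 |
| 34 | POO 4P | 10407.3 | 16.3 | 11.7 | 0.6 | 11.3 | 15.1 | 16412.7 | 359.6 | 0.3 | 6.5 | 348.0 | 38.3 | 292.0 | 664.4 | 33.9 | 69.1 |
| 35 | POO 4 AA | 25849.4 | 14.6 | 28.9 | 0.3 | 18.6 | 12.7 | 14794.3 | 311.3 | 0.3 | 5.6 | 315.7 | 27.1 | 583.4 | 1439.7 | 54.3 | 57.0 |
| 36 | PUÑ 4P | 22049.2 | 16.5 | 21.2 | 0.2 | 15.2 | 9.3 | 12327.1 | 348.1 | 0.5 | 4.8 | 317.3 | 16.6 | 232.0 | 1308.1 | 43.7 | 45.5 |
| 37 | Reference | 33277.5 | 18.4 | 23.7 | 3.9 | 74.5 | 356.4 | 26390.7 | 491.4 | 0.4 | 300.2 | 969.5 | 101.5 | 1841.5 | 501.3 | 81.0 | 744.7 |

All concentrations are in mg/kg.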

Table 4.

Metals and trace elements analysis in soils from Poopó Basin, in Bolivian Altiplano [3].

Table 5 shows the correlation matrix, in which correlations between the different variables can be observed. The KMO and Bartlett tests give the following results (Table 6): the KMO statistic is 0.704, which indicates that the data are acceptable for performing a PCA, and the Sig. of Bartlett's test is 0.000, which leads to rejecting the null hypothesis in favor of the alternative, that is, the correlation matrix is not an identity matrix. Together, these values indicate that the data can be subjected to a PCA, so we proceed with it.

Correlations

| Variable | Statistic | Al | As | B | Cd | Cr | Cu | Fe | Mn | Mo | Ni | P | Pb | S | Si | Sr | Zn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Al | Pearson correlation | 1 | .207 | .796** | −.115 | .876** | .353* | .452** | .400* | .512** | .412* | .505** | .012 | .092 | .716** | .651** | .039 |
| | Sig. (2-tailed) | | .227 | .000 | .504 | .000 | .035 | .006 | .016 | .001 | .012 | .002 | .946 | .595 | .000 | .000 | .819 |
| As | Pearson correlation | .207 | 1 | .124 | .449** | .421* | .726** | .528** | .635** | .334* | .508** | .595** | .700** | .481** | .106 | .354* | .503** |
| | Sig. (2-tailed) | .227 | | .472 | .006 | .011 | .000 | .001 | .000 | .047 | .002 | .000 | .000 | .003 | .539 | .034 | .002 |
| B | Pearson correlation | .796** | .124 | 1 | .003 | .528** | .056 | .083 | .184 | .530** | .028 | .414* | −.034 | .327 | .732** | .642** | .136 |
| | Sig. (2-tailed) | .000 | .472 | | .988 | .001 | .745 | .632 | .282 | .001 | .871 | .012 | .845 | .052 | .000 | .000 | .429 |
| Cd | Pearson correlation | −.115 | .449** | .003 | 1 | −.030 | .355* | .133 | .189 | .238 | .099 | .581** | .823** | .310 | −.118 | .057 | .826** |
| | Sig. (2-tailed) | .504 | .006 | .988 | | .864 | .033 | .438 | .271 | .163 | .567 | .000 | .000 | .066 | .491 | .743 | .000 |
| Cr | Pearson correlation | .876** | .421* | .528** | −.030 | 1 | .700** | .788** | .705** | .443** | .786** | .660** | .167 | .154 | .638** | .685** | .218 |
| | Sig. (2-tailed) | .000 | .011 | .001 | .864 | | .000 | .000 | .000 | .007 | .000 | .000 | .331 | .368 | .000 | .000 | .202 |
| Cu | Pearson correlation | .353* | .726** | .056 | .355* | .700** | 1 | .856** | .834** | .366* | .898** | .760** | .553** | .396* | .225 | .492** | .587** |
| | Sig. (2-tailed) | .035 | .000 | .745 | .033 | .000 | | .000 | .000 | .028 | .000 | .000 | .000 | .017 | .187 | .002 | .000 |
| Fe | Pearson correlation | .452** | .528** | .083 | .133 | .788** | .856** | 1 | .825** | .272 | .915** | .665** | .298 | .091 | .348* | .521** | .414* |
| | Sig. (2-tailed) | .006 | .001 | .632 | .438 | .000 | .000 | | .000 | .109 | .000 | .000 | .077 | .596 | .037 | .001 | .012 |
| Mn | Pearson correlation | .400* | .635** | .184 | .189 | .705** | .834** | .825** | 1 | .310 | .857** | .622** | .309 | .401* | .336* | .442** | .470** |
| | Sig. (2-tailed) | .016 | .000 | .282 | .271 | .000 | .000 | .000 | | .065 | .000 | .000 | .067 | .015 | .045 | .007 | .004 |
| Mo | Pearson correlation | .512** | .334* | .530** | .238 | .443** | .366* | .272 | .310 | 1 | .224 | .564** | .271 | .152 | .296 | .413* | .257 |
| | Sig. (2-tailed) | .001 | .047 | .001 | .163 | .007 | .028 | .109 | .065 | | .190 | .000 | .110 | .376 | .079 | .012 | .130 |
| Ni | Pearson correlation | .412* | .508** | .028 | .099 | .786** | .898** | .915** | .857** | .224 | 1 | .629** | .299 | .238 | .321 | .476** | .421* |
| | Sig. (2-tailed) | .012 | .002 | .871 | .567 | .000 | .000 | .000 | .000 | .190 | | .000 | .077 | .163 | .056 | .003 | .010 |
| P | Pearson correlation | .505** | .595** | .414* | .581** | .660** | .760** | .665** | .622** | .564** | .629** | 1 | .622** | .352* | .395* | .648** | .744** |
| | Sig. (2-tailed) | .002 | .000 | .012 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | | .000 | .035 | .017 | .000 | .000 |
| Pb | Pearson correlation | .012 | .700** | −.034 | .823** | .167 | .553** | .298 | .309 | .271 | .299 | .622** | 1 | .414* | −.147 | .137 | .701** |
| | Sig. (2-tailed) | .946 | .000 | .845 | .000 | .331 | .000 | .077 | .067 | .110 | .077 | .000 | | .012 | .392 | .425 | .000 |
| S | Pearson correlation | .092 | .481** | .327 | .310 | .154 | .396* | .091 | .401* | .152 | .238 | .352* | .414* | 1 | .168 | .284 | .453** |
| | Sig. (2-tailed) | .595 | .003 | .052 | .066 | .368 | .017 | .596 | .015 | .376 | .163 | .035 | .012 | | .328 | .093 | .006 |
| Si | Pearson correlation | .716** | .106 | .732** | −.118 | .638** | .225 | .348* | .336* | .296 | .321 | .395* | −.147 | .168 | 1 | .737** | .163 |
| | Sig. (2-tailed) | .000 | .539 | .000 | .491 | .000 | .187 | .037 | .045 | .079 | .056 | .017 | .392 | .328 | | .000 | .342 |
| Sr | Pearson correlation | .651** | .354* | .642** | .057 | .685** | .492** | .521** | .442** | .413* | .476** | .648** | .137 | .284 | .737** | 1 | .253 |
| | Sig. (2-tailed) | .000 | .034 | .000 | .743 | .000 | .002 | .001 | .007 | .012 | .003 | .000 | .425 | .093 | .000 | | .136 |
| Zn | Pearson correlation | .039 | .503** | .136 | .826** | .218 | .587** | .414* | .470** | .257 | .421* | .744** | .701** | .453** | .163 | .253 | 1 |
| | Sig. (2-tailed) | .819 | .002 | .429 | .000 | .202 | .000 | .012 | .004 | .130 | .010 | .000 | .000 | .006 | .342 | .136 | |

N = 36 for every pair of variables.

Table 5.

Sample correlation matrix of metal and trace elements in soil samples.

** Correlation is significant at the 0.01 level (2-tailed).

* Correlation is significant at the 0.05 level (2-tailed).


KMO and Bartlett’s Test

| Test | Statistic | Value |
|---|---|---|
| Kaiser-Meyer-Olkin Measure of Sampling Adequacy | | .704 |
| Bartlett’s Test of Sphericity | Approx. Chi-Square | 725.706 |
| | df | 120 |
| | Sig. | .000 |

Table 6.

KMO and Bartlett’s sphericity results for soil samples.

Four principal components were extracted; they are presented in Table 7. With these four components, 85.13% of the total variance is explained.
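The percentages in Table 7 follow directly from the eigenvalues: for a correlation matrix the eigenvalues add up to the number of variables, here 16. The short NumPy sketch below recomputes them from the eigenvalues as printed in Table 7; it is an arithmetic check, not part of the SPSS output, and it also counts how many components the common eigenvalue-greater-than-one rule would keep.

```python
import numpy as np

# Initial eigenvalues of the 16 x 16 correlation matrix, as listed in Table 7.
eigenvalues = np.array([7.584, 3.192, 1.905, 0.941, 0.735, 0.466, 0.381, 0.289,
                        0.164, 0.110, 0.074, 0.065, 0.049, 0.029, 0.011, 0.005])

# The eigenvalues of a correlation matrix add up to the number of variables,
# so each component explains eigenvalue / 16 of the total variance.
pct = 100 * eigenvalues / len(eigenvalues)
cumulative = np.cumsum(pct)

# Number of components that the eigenvalue > 1 rule alone would retain.
n_kaiser = int((eigenvalues > 1).sum())

print("% of variance (PC1-PC4):", np.round(pct[:4], 2))
print("cumulative % with four components:", round(float(cumulative[3]), 2))
print("eigenvalues > 1:", n_kaiser)
```

The small differences from Table 7 (for example 47.40 versus 47.401) come from the eigenvalues being rounded to three decimals. Under the eigenvalue-greater-than-one rule only three components would be kept; including the fourth (eigenvalue 0.941) raises the cumulative variance from 79.25% to 85.13%.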

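Total variance explained

| Component | Initial eigenvalues: total | Initial: % of variance | Initial: cumulative % | Extraction sums of squared loadings: total | Extraction: % of variance | Extraction: cumulative % | Rotation sums of squared loadings: total | Rotation: % of variance | Rotation: cumulative % |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 7.584 | 47.401 | 47.401 | 7.584 | 47.401 | 47.401 | 4.732 | 29.578 | 29.578 |
| 2 | 3.192 | 19.948 | 67.350 | 3.192 | 19.948 | 67.350 | 4.070 | 25.435 | 55.013 |
| 3 | 1.905 | 11.904 | 79.254 | 1.905 | 11.904 | 79.254 | 3.639 | 22.743 | 77.756 |
| 4 | .941 | 5.880 | 85.134 | .941 | 5.880 | 85.134 | 1.181 | 7.379 | 85.134 |
| 5 | .735 | 4.596 | 89.731 | | | | | | |
| 6 | .466 | 2.913 | 92.643 | | | | | | |
| 7 | .381 | 2.381 | 95.024 | | | | | | |
| 8 | .289 | 1.805 | 96.829 | | | | | | |
| 9 | .164 | 1.025 | 97.854 | | | | | | |
| 10 | .110 | .690 | 98.544 | | | | | | |
| 11 | .074 | .462 | 99.006 | | | | | | |
| 12 | .065 | .405 | 99.411 | | | | | | |
| 13 | .049 | .304 | 99.715 | | | | | | |
| 14 | .029 | .181 | 99.896 | | | | | | |
| 15 | .011 | .072 | 99.968 | | | | | | |
| 16 | .005 | .032 | 100.000 | | | | | | |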

Table 7.

Extraction of principal components (PC) with a total explained variance of 85.13%.

Extraction Method: Principal Component Analysis

Table 8 shows the rotated component matrix obtained with the Varimax rotation method. This matrix indicates which variables each principal component represents: each variable was assigned to the component with which it has its highest correlation (loading).
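Varimax searches for an orthogonal rotation of the retained loadings that pushes each loading toward either 0 or ±1, which is what makes the groupings in a rotated matrix easy to read. The following is a minimal NumPy sketch of the widely used SVD-based varimax iteration with Kaiser normalization; `varimax` is a hypothetical helper, and SPSS's own convergence criterion (which reports convergence in 5 iterations for Table 8) may differ in detail.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation with Kaiser normalization (a sketch, not SPSS's code).

    loadings: p x k unrotated component loadings.
    Returns the rotated loadings and the k x k orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    # Kaiser normalization: scale each row (variable) to unit length
    h = np.sqrt((L ** 2).sum(axis=1))
    A = L / h[:, None]
    T = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        B = A @ T
        # Gradient of the varimax criterion; its SVD gives the next rotation
        G = A.T @ (B ** 3 - B * (B ** 2).mean(axis=0))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    # Undo the normalization so that communalities are preserved
    return (A @ T) * h[:, None], T
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged, and the total variance explained by the retained components stays the same; only its distribution among them changes, as the rotation columns of Table 7 show.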

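Rotated component matrixᵃ

| Variable | Component 1 | Component 2 | Component 3 | Component 4 |
|---|---|---|---|---|
| Al | .320 | .874 | −.056 | −.101 |
| As | .523 | .083 | .547 | .314 |
| B | −.116 | .945 | .057 | .197 |
| Cd | −.011 | −.068 | .946 | .056 |
| Cr | .719 | .647 | .016 | −.065 |
| Cu | .861 | .132 | .412 | .125 |
| Fe | .928 | .203 | .155 | −.104 |
| Mn | .850 | .199 | .191 | .233 |
| Mo | .099 | .609 | .432 | −.250 |
| Ni | .956 | .142 | .115 | .046 |
| P | .500 | .470 | .658 | .001 |
| Pb | .220 | −.078 | .885 | .129 |
| S | .119 | .161 | .328 | .874 |
| Si | .210 | .825 | −.145 | .166 |
| Sr | .376 | .746 | .093 | .163 |
| Zn | .280 | .073 | .809 | .214 |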

Table 8.

Rotated component matrix.

Extraction Method: Principal Component Analysis.

Rotation Method: Varimax with Kaiser Normalization.

ᵃ Rotation converged in 5 iterations.

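Accordingly, each component is identified with the group of variables that load most strongly on it, where "+" lists the variables grouped in each component (Eq. (1)):

$$
\begin{aligned}
\mathrm{PC}_1 &= \mathrm{Ni} + \mathrm{Fe} + \mathrm{Cu} + \mathrm{Mn} + \mathrm{Cr} \\
\mathrm{PC}_2 &= \mathrm{B} + \mathrm{Al} + \mathrm{Si} + \mathrm{Sr} + \mathrm{Mo} \\
\mathrm{PC}_3 &= \mathrm{Cd} + \mathrm{Pb} + \mathrm{Zn} + \mathrm{P} + \mathrm{As} \\
\mathrm{PC}_4 &= \mathrm{S}
\end{aligned}
\tag{1}
$$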
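The highest-loading criterion used with Table 8 can be checked mechanically. The sketch below is a minimal NumPy illustration with the loadings copied from Table 8; taking the absolute value of each loading is an assumption (none of the highest loadings in Table 8 is negative, so it makes no difference here). It recovers the four groups, with sulfur alone in PC4.

```python
import numpy as np

variables = ["Al", "As", "B", "Cd", "Cr", "Cu", "Fe", "Mn",
             "Mo", "Ni", "P", "Pb", "S", "Si", "Sr", "Zn"]
# Rotated loadings copied from Table 8 (columns = components 1 to 4).
rotated = np.array([
    [.320, .874, -.056, -.101], [.523, .083, .547, .314],
    [-.116, .945, .057, .197], [-.011, -.068, .946, .056],
    [.719, .647, .016, -.065], [.861, .132, .412, .125],
    [.928, .203, .155, -.104], [.850, .199, .191, .233],
    [.099, .609, .432, -.250], [.956, .142, .115, .046],
    [.500, .470, .658, .001], [.220, -.078, .885, .129],
    [.119, .161, .328, .874], [.210, .825, -.145, .166],
    [.376, .746, .093, .163], [.280, .073, .809, .214],
])

# Each variable is assigned to the component with its largest absolute loading.
best = np.abs(rotated).argmax(axis=1)
groups = {f"PC{c + 1}": [v for v, b in zip(variables, best) if b == c]
          for c in range(rotated.shape[1])}
for name, members in groups.items():
    print(name, members)
```

Note that As is a borderline case: its loadings on components 3 (.547) and 1 (.523) are close.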

Table 9 shows the component score coefficient matrix, that is, the coefficients of the variables in each principal component. For example, Eq. (2) applies to principal component 1 (PC1):

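Component score coefficient matrix

| Variable | Component 1 | Component 2 | Component 3 | Component 4 |
|---|---|---|---|---|
| Al | −.007 | .236 | −.027 | −.140 |
| As | .078 | −.050 | .079 | .192 |
| B | −.184 | .308 | .020 | .145 |
| Cd | −.125 | −.003 | .351 | −.124 |
| Cr | .145 | .108 | −.066 | −.117 |
| Cu | .206 | −.076 | .018 | .017 |
| Fe | .259 | −.058 | −.043 | −.168 |
| Mn | .220 | −.063 | −.084 | .166 |
| Mo | −.106 | .200 | .213 | −.365 |
| Ni | .278 | −.089 | −.093 | −.008 |
| P | .009 | .093 | .193 | −.161 |
| Pb | −.041 | −.043 | .282 | −.044 |
| S | −.051 | .000 | −.057 | .803 |
| Si | −.031 | .222 | −.099 | .150 |
| Sr | .002 | .178 | −.032 | .101 |
| Zn | −.034 | −.009 | .231 | .048 |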

Table 9.

Component score coefficients matrix.

Extraction Method: Principal Component Analysis.

Rotation Method: Varimax with Kaiser Normalization.

Component scores.

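$$
\begin{aligned}
\mathrm{PC}_1 = {}& -0.007\,\mathrm{Al} + 0.078\,\mathrm{As} - 0.184\,\mathrm{B} - 0.125\,\mathrm{Cd} + 0.145\,\mathrm{Cr} + 0.206\,\mathrm{Cu} + 0.259\,\mathrm{Fe} + 0.220\,\mathrm{Mn} \\
& - 0.106\,\mathrm{Mo} + 0.278\,\mathrm{Ni} + 0.009\,\mathrm{P} - 0.041\,\mathrm{Pb} - 0.051\,\mathrm{S} - 0.031\,\mathrm{Si} + 0.002\,\mathrm{Sr} - 0.034\,\mathrm{Zn}
\end{aligned}
\tag{2}
$$

where each element symbol denotes the standardized (z-score) concentration of that element.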

Similarly, the equations for the components PC2, PC3 and PC4 can be expressed.

Using this matrix of score coefficients and the corresponding equations, the principal component scores are obtained for the 36 sampling points. These are shown in Table 10.
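How the coefficients in Table 9 and the scores in Table 10 relate to the loadings can be sketched as follows. This is a minimal NumPy illustration, assuming the regression method for component scores (as far as we know, SPSS's default); `component_scores` is a hypothetical name. For principal components the regression coefficients reduce to B = L(LᵀL)⁻¹, where L holds the retained loadings, and the same formula applies when L holds Varimax-rotated loadings such as those in Table 8; the sketch uses unrotated loadings for brevity. Each score is then a weighted sum of z-scores, exactly like Eq. (2).

```python
import numpy as np


def component_scores(X, k):
    """Regression-method scores for the first k principal components (sketch).

    X: n x p data matrix (rows = sampling points, columns = variables).
    Returns the p x k score coefficient matrix B and the n x k scores.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-scores
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    keep = np.argsort(eigvals)[::-1][:k]                # k largest eigenvalues
    L = eigvecs[:, keep] * np.sqrt(eigvals[keep])       # p x k loadings
    # Score coefficients: B = L (L^T L)^-1. For principal components this equals
    # R^-1 L, and the same formula holds for orthogonally rotated loadings.
    B = L @ np.linalg.inv(L.T @ L)
    return B, Z @ B                                     # score = sum of b_j * z_j
```

Scores computed this way have mean 0 and unit variance, and they are uncorrelated across components, which is why values beyond about ±2 in Table 10 stand out.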

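| Sample | PC1 | PC2 | PC3 | PC4 |
|---|---|---|---|---|
| 1 | −0.21118 | −0.0777 | −0.33467 | 0.80961 |
| 2 | 0.6651 | −0.67391 | −0.53521 | −0.20777 |
| 3 | 0.39744 | −0.60917 | −0.64339 | −0.954 |
| 4 | 4.07162 | 0.56719 | 0.76641 | 0.55046 |
| 5 | −0.34113 | −0.59646 | 0.24423 | −0.13876 |
| 6 | −0.07872 | −0.94299 | 0.7625 | 0.27764 |
| 7 | −0.18808 | −0.3646 | 0.02199 | 0.54976 |
| 8 | −0.16801 | −0.19284 | 0.04181 | 0.26754 |
| 9 | −0.79917 | −0.90177 | −0.74427 | −0.65343 |
| 10 | −0.33788 | −0.78329 | −0.45122 | −0.68191 |
| 11 | −0.27713 | −0.79168 | −0.32166 | −0.4073 |
| 12 | −0.67221 | 3.28014 | 2.51067 | −1.29964 |
| 13 | −0.61199 | 1.3241 | −0.81815 | 0.22432 |
| 14 | 0.33766 | 1.77831 | −1.07201 | −0.46932 |
| 15 | 0.23938 | 1.63463 | −0.98365 | −0.54454 |
| 16 | −1.18136 | −0.4418 | −0.4608 | −0.8548 |
| 17 | 0.68138 | −0.72139 | −0.37866 | −1.24838 |
| 18 | 1.03018 | −0.79119 | −0.53675 | −0.89173 |
| 19 | −0.59117 | 1.09146 | 0.47815 | 0.82755 |
| 20 | 2.92182 | 0.70194 | 0.35112 | 0.21752 |
| 21 | −0.32097 | 0.11616 | 0.21485 | 1.62766 |
| 22 | −0.52482 | −0.6866 | 0.55143 | 2.96642 |
| 23 | 0.00703 | −0.71324 | 0.53004 | 2.46939 |
| 24 | −0.07964 | −0.61843 | −0.43321 | 0.45935 |
| 25 | 0.13395 | −1.0284 | 0.67803 | 0.67771 |
| 26 | 0.16106 | −0.54494 | 0.12387 | −0.82888 |
| 27 | 0.54161 | −0.31881 | −0.42751 | −0.80632 |
| 28 | −0.6818 | −0.06185 | 0.23561 | −0.69627 |
| 29 | 0.06159 | −0.56759 | −0.68723 | −0.29851 |
| 30 | −0.80934 | −0.64774 | 3.33121 | −0.66866 |
| 31 | −0.59454 | −0.14172 | 2.187 | −0.70855 |
| 32 | −0.38801 | −0.34591 | −0.64539 | −0.98714 |
| 33 | −0.47597 | 0.01196 | −0.67739 | −0.40583 |
| 34 | −0.32046 | −0.14616 | −0.38936 | −0.05153 |
| 35 | −0.83523 | 1.86013 | −1.28317 | 1.80646 |
| 36 | −0.76102 | 1.34416 | −1.20524 | 0.07187 |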

Table 10.

Sampling point scores for principal components.

These scores represent the new reduced composite variables, and from them different graphical representations and interpretations can be made, for example, by plotting each principal component against the sampling points (Figure 1).

Figure 1.

Representation of the principal components PC1 and PC2 with respect to the sampling points.

Figure 1a shows that PC1 is fairly homogeneously distributed across the sampling points, except for points 4 (sample code COR 1AA) and 20 (sample code VM 4C). These two points are high outliers in PC1 (Ni, Fe, Cu, Mn and Cr), so, in environmental terms, they require special attention. Their concentrations are given in Table 11:

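| No | Soil code | Cr (mg/kg) | Cu (mg/kg) | Fe (mg/kg) | Mn (mg/kg) | Ni (mg/kg) |
|---|---|---|---|---|---|---|
| 4 | COR 1AA | 40.2 | 48.6 | 57156.4 | 1029.9 | 27.4 |
| 20 | VM 4C | 30.6 | 41.1 | 46040.2 | 928.5 | 18.9 |
| Max | | 40.2 | 48.6 | 57156.4 | 1029.9 | 27.4 |
| Min | | 5.2 | 8.7 | 238.7 | 144.1 | 3.4 |
| Mean | | 14.3 | 19.3 | 19049.5 | 411.8 | 7.9 |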

Table 11.

Inhomogeneous distribution in soil samples.

Both samples have high concentrations of these metals: sample 4 has the maximum value of each of them, and sample 20 is well above the mean values. A similar analysis can be done for the other principal components.
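The visual screening of Figures 1 and 2 can be approximated numerically. Because the component scores are standardized (mean 0, variance 1), a simple rule is to flag the sampling points whose score exceeds a cutoff. The cutoff of 2 used below is an assumption, not a criterion stated in this chapter, and `high_outliers` is a hypothetical helper; the scores are copied from Table 10.

```python
import numpy as np

# PC1 and PC3 scores of the 36 sampling points, copied from Table 10.
pc1 = np.array([-0.21118, 0.6651, 0.39744, 4.07162, -0.34113, -0.07872,
                -0.18808, -0.16801, -0.79917, -0.33788, -0.27713, -0.67221,
                -0.61199, 0.33766, 0.23938, -1.18136, 0.68138, 1.03018,
                -0.59117, 2.92182, -0.32097, -0.52482, 0.00703, -0.07964,
                0.13395, 0.16106, 0.54161, -0.6818, 0.06159, -0.80934,
                -0.59454, -0.38801, -0.47597, -0.32046, -0.83523, -0.76102])
pc3 = np.array([-0.33467, -0.53521, -0.64339, 0.76641, 0.24423, 0.7625,
                0.02199, 0.04181, -0.74427, -0.45122, -0.32166, 2.51067,
                -0.81815, -1.07201, -0.98365, -0.4608, -0.37866, -0.53675,
                0.47815, 0.35112, 0.21485, 0.55143, 0.53004, -0.43321,
                0.67803, 0.12387, -0.42751, 0.23561, -0.68723, 3.33121,
                2.187, -0.64539, -0.67739, -0.38936, -1.28317, -1.20524])


def high_outliers(scores, cutoff=2.0):
    """Return the 1-based sampling point numbers whose score exceeds cutoff."""
    return [i + 1 for i, s in enumerate(scores) if s > cutoff]


print("PC1:", high_outliers(pc1))
print("PC3:", high_outliers(pc3))
```

With this cutoff the rule returns points 4 and 20 for PC1 and points 12, 30 and 31 for PC3, the same groups identified in the figures. The groups described for PC2 and PC4 include points with scores between about 1.3 and 2, so for those components a lower cutoff would be needed; the threshold is a judgment call.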

Regarding structure, samples 4 and 20 form a group with relatively high values of PC1, and the rest of the points form another group. Figure 1b represents the distribution of PC2 (B, Al, Si, Sr and Mo). Here too there are atypical points, 12, 14, 15, 35 and 36, which form a group with relatively high values of this component, while the rest of the points form a group with a homogeneous distribution in PC2.

Figure 2a shows that PC3 is also homogeneously distributed, with the exception of sampling points 12, 30 and 31, which are high outliers in PC3 (Cd, Pb, Zn, P and As). Regarding structure, sampling points 12, 30 and 31 form a group with high values of PC3, and the rest of the sampling points form the other group. Figure 2b represents the distribution of PC4 (sulfur) over the sampling points: one group is made up of sampling points 21, 22, 23 and 35, with relatively high sulfur values, and the other of the rest of the points.

Figure 2.

Representation of the principal components PC3 and PC4 with respect to the sampling points.

Figure 3 represents PC1 against PC2. Three groups are observed: the first, formed by sampling points 12, 13, 14, 15, 19, 35 and 36, has high values of PC2 (B, Al, Si, Sr and Mo) relative to its values of PC1 (Ni, Fe, Cu, Mn and Cr); the second, made up of sampling points 4 and 20, has high values of PC1 relative to PC2; and the third, made up of the remaining sampling points, has a homogeneous distribution in both PC1 and PC2. Again, from an environmental point of view, the sampling points of the first and second groups should be analyzed more carefully.

Figure 3.

Representation of the principal components PC2 against PC1.

Figure 4 shows the distribution of the samples with respect to PC1 and PC3. PC3 shows little variability across most sampling points, except for points 12, 30 and 31, whereas PC1 is widely spread and shows the greatest variability across the sampling points.

Figure 4.

Representation of the principal components PC3 against PC1.

In the same way, the remaining combinations of principal components can be represented graphically (Figures 5 and 6).

Figure 5.

Representation of the principal components PC4 against PC1.

Figure 6.

Representation of the principal components PC3 against PC2.

In the following example, two types of soil with different characteristics are considered, which allows us to assess the ability of the principal components to characterize and classify soils [4].

The following chemical parameters were considered: pH in H2O, pH in KCl solution, electrical conductivity (EC), exchangeable acidity (H-Al), total nitrogen (N), organic matter (OM), available phosphorus (P) and exchangeable cations (Ca2+, Mg2+, Na+, K+).

The first group of samples comes from Yamora, in the inter-Andean valley of the Municipality of Inquisivi, located between 66°43′29″ and 67°17′58″ West longitude and 15°47′34″ and 17°18′20″ South latitude, at an average altitude of 2840 m a.s.l. The second group comes from the Municipality of Viacha in the Northern Altiplano, located between 68°16′56″ and 68°22′72″ West longitude and 16°32′39″ and 16°54′44″ South latitude, at an average altitude of 4070 m a.s.l. Both are in La Paz, Bolivia [4].

Ten soil samples were taken in the Yamora community and another ten in the Viacha community, and the 11 parameters listed above were analyzed in each. The evaluation does not take into account the environmental conditions of Yamora or Viacha; it is based only on the chemical parameters, in order to evaluate soil fertility from a chemical point of view (Table 12).

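| Location | pH (H2O) | pH (KCl) | EC | H-Al | OM (%) | N (%) | Na | K | Ca | Mg | P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Yamora | 6.75 | 5.8 | 0.075 | 0.0329 | 3.4 | 0.28 | 0.128 | 0.688 | 17.761 | 2.548 | 273.916 |
| Yamora | 6.76 | 4.98 | 0.075 | 0.0609 | 3.2 | 0.30 | 0.128 | 0.688 | 17.760 | 2.577 | 255.876 |
| Yamora | 6.72 | 5.73 | 0.068 | 0.0339 | 3.3 | 0.31 | 0.134 | 0.688 | 18.331 | 2.636 | 250.994 |
| Yamora | 6.76 | 5.89 | 0.074 | 0.0082 | 3.4 | 0.32 | 0.134 | 0.655 | 18.674 | 2.684 | 257.253 |
| Yamora | 6.73 | 5.89 | 0.072 | 0.0329 | 3.2 | 0.30 | 0.134 | 0.655 | 17.455 | 2.518 | 246.810 |
| Yamora | 6.79 | 5.92 | 0.068 | 0.0329 | 3.4 | 0.32 | 0.146 | 0.655 | 17.799 | 2.548 | 266.998 |
| Yamora | 6.79 | 5.37 | 0.069 | 0.0391 | 3.4 | 0.3 | 0.128 | 0.688 | 17.874 | 2.587 | 253.086 |
| Yamora | 6.8 | 5.84 | 0.073 | 0.0329 | 3.4 | 0.3 | 0.134 | 0.655 | 17.074 | 2.450 | 259.345 |
| Yamora | 6.83 | 6.01 | 0.072 | 0.0349 | 3.4 | 0.3 | 0.134 | 0.655 | 18.027 | 2.606 | 261.420 |
| Yamora | 6.82 | 5.95 | 0.072 | 0.0329 | 3.1 | 0.3 | 0.134 | 0.622 | 17.341 | 2.479 | 275.347 |
| Viacha | 8.54 | 7.13 | 0.727 | 0.0934 | 0.7 | 0.086 | 4.663 | 0.459 | 5.385 | 4.008 | 16.010 |
| Viacha | 8.72 | 7.16 | 0.732 | 0.0934 | 0.7 | 0.096 | 5.012 | 0.491 | 5.155 | 4.066 | 15.870 |
| Viacha | 8.78 | 7.12 | 0.736 | 0.1054 | 0.7 | 0.101 | 4.605 | 0.426 | 5.042 | 3.988 | 18.241 |
| Viacha | 8.74 | 7.11 | 0.735 | 0.1054 | 0.5 | 0.093 | 4.605 | 0.426 | 5.080 | 3.988 | 19.287 |
| Viacha | 8.81 | 7.16 | 0.737 | 0.0934 | 0.6 | 0.092 | 4.663 | 0.426 | 5.118 | 4.027 | 19.845 |
| Viacha | 8.78 | 7.12 | 0.738 | 0.1054 | 0.7 | 0.091 | 4.663 | 0.459 | 5.118 | 4.027 | 20.612 |
| Viacha | 8.94 | 7.01 | 0.731 | 0.1091 | 0.7 | 0.093 | 4.605 | 0.426 | 5.080 | 3.959 | 17.683 |
| Viacha | 8.69 | 6.78 | 0.733 | 0.0813 | 0.6 | 0.093 | 4.663 | 0.426 | 5.309 | 3.802 | 18.241 |
| Viacha | 8.49 | 7.14 | 0.732 | 0.0818 | 0.7 | 0.093 | 5.129 | 0.491 | 5.233 | 3.978 | 18.311 |
| Viacha | 8.81 | 7.14 | 0.780 | 0.0813 | 0.5 | 0.093 | 5.246 | 0.491 | 4.889 | 3.939 | 16.010 |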

Table 12.

Results of the analysis of parameters in samples from Yamora and Viacha.

Reproduced with permission from Revista Boliviana de Química; Excerpt from [4].

A PCA was also performed on these data; the correlation matrix is shown in Table 13.

Correlations

| Variable | pH (H2O) | pH (KCl) | EC | H-Al | OM | N | Na | K | Ca | Mg | P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| pH (H2O) | 1 | .944** | .996** | .945** | −.994** | −.992** | .991** | −.978** | −.996** | .991** | −.993** |
| pH (KCl) | .944** | 1 | .947** | .836** | −.941** | −.941** | .948** | −.940** | −.947** | .947** | −.941** |
| EC | .996** | .947** | 1 | .936** | −.998** | −.997** | .998** | −.971** | −.999** | .995** | −.998** |
| H-Al | .945** | .836** | .936** | 1 | −.939** | −.942** | .925** | −.921** | −.943** | .936** | −.938** |
| OM | −.994** | −.941** | −.998** | −.939** | 1 | .995** | −.996** | .975** | .997** | −.991** | .996** |
| N | −.992** | −.941** | −.997** | −.942** | .995** | 1 | −.994** | .967** | .997** | −.990** | .994** |
| Na | .991** | .948** | .998** | .925** | −.996** | −.994** | 1 | −.960** | −.996** | .993** | −.996** |
| K | −.978** | −.940** | −.971** | −.921** | .975** | .967** | −.960** | 1 | .974** | −.962** | .969** |
| Ca | −.996** | −.947** | −.999** | −.943** | .997** | .997** | −.996** | .974** | 1 | −.991** | .997** |
| Mg | .991** | .947** | .995** | .936** | −.991** | −.990** | .993** | −.962** | −.991** | 1 | −.995** |
| P | −.993** | −.941** | −.998** | −.938** | .996** | .994** | −.996** | .969** | .997** | −.995** | 1 |

Values are Pearson correlations; Sig. (2-tailed) = .000 and N = 20 for every pair of variables.

Table 13.

Matrix of correlations of samples from Yamora and Viacha.

** Correlation is significant at the 0.01 level (2-tailed).


The correlation matrix shows high correlations among the variables. The Kaiser-Meyer-Olkin (KMO) measure of 0.865 and a Bartlett's test significance of 0.000 (p < 0.001) indicate that dimension reduction by principal components is feasible and appropriate (Table 14). Two principal components were therefore extracted (Table 15), and the rotated component matrix and the component score coefficients (Table 16) were obtained by Varimax rotation with Kaiser normalization.

KMO and Bartlett's Test

| Test | Statistic | Value |
|---|---|---|
| Kaiser-Meyer-Olkin Measure of Sampling Adequacy | | .865 |
| Bartlett's Test of Sphericity | Approx. Chi-Square | 715.671 |
| | df | 55 |
| | Sig. | .000 |

Table 14.

KMO and Bartlett's sphericity test results for the soil samples.
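For reference, the two statistics in Table 14 follow standard textbook definitions. These are general formulas, not taken from the original analysis, and they assume the usual form of Bartlett's statistic. Here n is the number of samples, p the number of variables, R the correlation matrix with entries r_ij, and a_ij the partial correlations obtained from R⁻¹:

$$\chi^2 = -\left[(n-1) - \frac{2p+5}{6}\right]\ln\lvert\mathbf{R}\rvert, \qquad \mathrm{df} = \frac{p(p-1)}{2}$$

$$\mathrm{KMO} = \frac{\sum_{i\neq j} r_{ij}^2}{\sum_{i\neq j} r_{ij}^2 + \sum_{i\neq j} a_{ij}^2}, \qquad a_{ij} = -\frac{(\mathbf{R}^{-1})_{ij}}{\sqrt{(\mathbf{R}^{-1})_{ii}\,(\mathbf{R}^{-1})_{jj}}}$$

$$n = 20,\ p = 11:\qquad \mathrm{df} = \frac{11 \cdot 10}{2} = 55, \qquad \ln\lvert\mathbf{R}\rvert = -\frac{715.671}{19 - 27/6} = -\frac{715.671}{14.5} \approx -49.36$$

The degrees of freedom agree with Table 14. Under this form of the statistic, the reported chi-square implies a determinant of about 3.7 × 10⁻²². In other words, the correlation matrix is nearly singular, which is consistent with the very high correlations in Table 13.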

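Total variance explained

| Component | Initial eigenvalues: Total | % of variance | Cumulative % | Extraction sums of squared loadings: Total | % of variance | Cumulative % | Rotation sums of squared loadings: Total | % of variance | Cumulative % |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 10.707 | 97.338 | 97.338 | 10.707 | 97.338 | 97.338 | 5.673 | 51.572 | 51.572 |
| 2 | .166 | 1.511 | 98.848 | .166 | 1.511 | 98.848 | 5.200 | 47.276 | 98.848 |
| 3 | .063 | .569 | 99.417 | | | | | | |
| 4 | .038 | .342 | 99.759 | | | | | | |
| 5 | .011 | .103 | 99.862 | | | | | | |
| 6 | .006 | .056 | 99.918 | | | | | | |
| 7 | .004 | .041 | 99.959 | | | | | | |
| 8 | .003 | .024 | 99.983 | | | | | | |
| 9 | .001 | .012 | 99.995 | | | | | | |
| 10 | .000 | .003 | 99.998 | | | | | | |
| 11 | .000 | .002 | 100.000 | | | | | | |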

Table 15.

Extraction of principal components with a total explained variance of 98.84%.

Extraction Method: Principal Component Analysis.
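The extraction step in Table 15 amounts to an eigendecomposition of the correlation matrix. The trace of an 11 × 11 correlation matrix is 11, so the share of variance for a component is its eigenvalue divided by 11. The sketch below is a minimal illustration in plain Python with no numerical library assumed. It runs power iteration on the rounded coefficients of Table 13 to recover the first eigenvalue, its share of the variance, and the unrotated loadings (√λ₁ times the eigenvector). This is not how SPSS computes eigenvalues. The inputs are rounded to three decimals, so the result only approximates the reported 10.707, and the ASCII variable labels are my own.

```python
import math

# Variable order of Table 13 (ASCII labels chosen for this sketch).
VARIABLES = ["pH_H2O", "pH_KCl", "EC", "H_Al", "OM", "N", "Na", "K", "Ca", "Mg", "P"]

# Upper triangle of the Pearson correlation matrix in Table 13, row by row.
UPPER = [
    [0.944, 0.996, 0.945, -0.994, -0.992, 0.991, -0.978, -0.996, 0.991, -0.993],
    [0.947, 0.836, -0.941, -0.941, 0.948, -0.940, -0.947, 0.947, -0.941],
    [0.936, -0.998, -0.997, 0.998, -0.971, -0.999, 0.995, -0.998],
    [-0.939, -0.942, 0.925, -0.921, -0.943, 0.936, -0.938],
    [0.995, -0.996, 0.975, 0.997, -0.991, 0.996],
    [-0.994, 0.967, 0.997, -0.990, 0.994],
    [-0.960, -0.996, 0.993, -0.996],
    [0.974, -0.962, 0.969],
    [-0.991, 0.997],
    [-0.995],
]


def full_matrix(upper):
    """Rebuild the symmetric correlation matrix (unit diagonal) from its upper triangle."""
    p = len(upper) + 1
    r = [[1.0] * p for _ in range(p)]
    for i, row in enumerate(upper):
        for k, value in enumerate(row):
            j = i + 1 + k
            r[i][j] = r[j][i] = value
    return r


def mat_vec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def power_iteration(m, iterations=100):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix."""
    v = [row[0] for row in m]  # start from the first column
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if v[0] < 0:  # eigenvector sign is arbitrary; make pH_H2O load positively
        v = [-x for x in v]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return eigenvalue, v


R = full_matrix(UPPER)
lam1, v1 = power_iteration(R)
explained = 100 * lam1 / len(R)  # trace of a correlation matrix = number of variables
loadings = {name: math.sqrt(lam1) * x for name, x in zip(VARIABLES, v1)}
print(f"first eigenvalue = {lam1:.3f}, variance explained = {explained:.2f} %")
for name, value in loadings.items():
    print(f"{name:7s} {value: .3f}")
```

The second eigenvalue is tiny (0.166 in Table 15), so power iteration converges within a few iterations. The unrotated loadings come out close to ±1, with the same sign pattern as the two groups of variables discussed below.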

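Rotated component matrix (a)

| Variable | Component 1 | Component 2 |
|---|---|---|
| pH (H2O) | .711 | .699 |
| pH (KCl) | .876 | .461 |
| EC | .728 | .683 |
| H-Al | .476 | .870 |
| OM | −.717 | −.694 |
| N | −.710 | −.699 |
| Na | .739 | .667 |
| K | −.725 | −.657 |
| Ca | −.718 | −.694 |
| Mg | .723 | .682 |
| P | −.717 | −.693 |

Component score coefficient matrix

| Variable | Component 1 | Component 2 |
|---|---|---|
| pH (H2O) | .011 | .123 |
| pH (KCl) | 1.195 | −1.122 |
| EC | .106 | .024 |
| H-Al | −1.184 | 1.367 |
| OM | −.041 | −.092 |
| N | −.007 | −.128 |
| Na | .185 | −.059 |
| K | −.174 | .049 |
| Ca | −.047 | −.086 |
| Mg | .095 | .035 |
| P | −.045 | −.087 |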

Table 16.

Rotated component matrix and component score coefficient matrix for the Yamora and Viacha samples.

Extraction Method: Principal Component Analysis.

Rotation Method: Varimax with Kaiser Normalization.

Component Scores.

a. Rotation converged in 3 iterations.
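Varimax chooses the orthogonal rotation that maximizes the variance of the squared loadings within each component. Kaiser normalization scales every variable's loading row to unit length before rotating and rescales it afterwards. For two components, the rotation is a single angle. The minimal sketch below therefore uses a plain grid search over that angle instead of the iterative algorithm used by statistical packages. The function names are hypothetical, and the demo uses a made-up simple-structure loading matrix rather than the study's unrotated loadings, which are not reported.

```python
import math


def rotate(rows, t):
    """Post-multiply each (a, b) loading row by the 2 x 2 rotation matrix for angle t."""
    c, s = math.cos(t), math.sin(t)
    return [(a * c + b * s, -a * s + b * c) for a, b in rows]


def varimax_criterion(rows):
    """Raw varimax criterion: sum over both columns of the variance of squared loadings."""
    p = len(rows)
    total = 0.0
    for k in range(2):
        sq = [row[k] ** 2 for row in rows]
        mean_sq = sum(sq) / p
        total += sum(x * x for x in sq) / p - mean_sq ** 2
    return total


def varimax_2d(loadings, kaiser=True, steps=3600):
    """Varimax for two components by grid search over angles in [0, pi/2).

    The criterion repeats every 90 degrees (columns only swap or change sign),
    so a quarter turn is enough. With kaiser=True each row is scaled to unit
    length before rotating and rescaled afterwards. Rows must be non-zero.
    """
    norms = [math.hypot(a, b) for a, b in loadings]  # square roots of communalities
    rows = [(a / h, b / h) for (a, b), h in zip(loadings, norms)] if kaiser else list(loadings)
    angles = (0.5 * math.pi * i / steps for i in range(steps))
    best = max(angles, key=lambda t: varimax_criterion(rotate(rows, t)))
    rotated = rotate(rows, best)
    if kaiser:
        rotated = [(a * h, b * h) for (a, b), h in zip(rotated, norms)]
    return rotated, best


# Hypothetical demo: a simple structure (two variables per component) mixed by 30 degrees.
simple = [(0.9, 0.0), (0.8, 0.0), (0.0, 0.9), (0.0, 0.8)]
mixed = rotate(simple, math.radians(30))
recovered, angle = varimax_2d(mixed)
print(f"rotation angle = {math.degrees(angle):.2f} degrees")
for row in recovered:
    print(f"{row[0]: .3f} {row[1]: .3f}")
```

The rotation recovers the simple structure up to column order and sign. A rotation never changes a variable's communality (its row sum of squared loadings), so the total variance explained by the two components stays at 98.848 % while its split between them changes (Table 15).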

The rotated component matrix reveals a clear structure in which the parameters split into two groups. Group 1 (pH in KCl, Na+, EC, Mg2+, pH in H2O and H-Al) has positive loadings on both components. Group 2 (N, OM, P, Ca2+ and K+) has negative loadings on both. The two groups of parameters therefore act in opposition in the soil. If Group 1 predominates over Group 2, the soil has high pH and EC values, high Na+ and Mg2+ contents, low contents of OM, P and N, and positive component scores; such a soil is unfavorable for agricultural purposes. Conversely, if Group 2 predominates over Group 1, the soil is rich in OM, P and N, has negative component scores, and is more suitable for agricultural purposes.

The component score coefficient matrix (Table 16) yields the functions for PC1 and PC2. In these functions, each variable enters as its standardized (z-score) value:

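$$\mathrm{PC1} = 0.011\,\mathrm{pH_{H_2O}} + 1.195\,\mathrm{pH_{KCl}} + 0.106\,\mathrm{EC} - 1.184\,\mathrm{H\text{-}Al} - 0.041\,\mathrm{OM} - 0.007\,\mathrm{N} + 0.185\,\mathrm{Na} - 0.174\,\mathrm{K} - 0.047\,\mathrm{Ca} + 0.095\,\mathrm{Mg} - 0.045\,\mathrm{P} \tag{3}$$

$$\mathrm{PC2} = 0.123\,\mathrm{pH_{H_2O}} - 1.122\,\mathrm{pH_{KCl}} + 0.024\,\mathrm{EC} + 1.367\,\mathrm{H\text{-}Al} - 0.092\,\mathrm{OM} - 0.128\,\mathrm{N} - 0.059\,\mathrm{Na} + 0.049\,\mathrm{K} - 0.086\,\mathrm{Ca} + 0.035\,\mathrm{Mg} - 0.087\,\mathrm{P} \tag{4}$$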
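Eqs. (3) and (4) are weighted sums. The minimal sketch below assumes, as SPSS does for component scores, that each variable is first converted to a z-score with the sample mean and sample standard deviation. The coefficient dictionary transcribes Table 16. The variable keys, the helper names and the example sample are hypothetical.

```python
import statistics

# Component score coefficients from Table 16: variable -> (PC1, PC2).
COEFFICIENTS = {
    "pH_H2O": (0.011, 0.123),
    "pH_KCl": (1.195, -1.122),
    "EC": (0.106, 0.024),
    "H_Al": (-1.184, 1.367),
    "OM": (-0.041, -0.092),
    "N": (-0.007, -0.128),
    "Na": (0.185, -0.059),
    "K": (-0.174, 0.049),
    "Ca": (-0.047, -0.086),
    "Mg": (0.095, 0.035),
    "P": (-0.045, -0.087),
}


def standardize(values):
    """z-scores of one variable over all samples (sample standard deviation, n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


def component_scores(z):
    """Eqs. (3) and (4): weighted sums of one sample's z-scores, given as {variable: z}."""
    pc1 = sum(c1 * z[name] for name, (c1, _) in COEFFICIENTS.items())
    pc2 = sum(c2 * z[name] for name, (_, c2) in COEFFICIENTS.items())
    return pc1, pc2


# Hypothetical sample at the mean of every variable except pH in KCl,
# which lies one standard deviation above its mean.
z = {name: 0.0 for name in COEFFICIENTS}
z["pH_KCl"] = 1.0
print(component_scores(z))  # -> (1.195, -1.122)
```

A sample at the overall mean scores (0, 0). Each one-standard-deviation shift in a single variable moves the scores by that variable's row of coefficients.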

The score values are the following (Table 17):

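| Location | PC1 | PC2 |
|---|---|---|
| Yamora | −0.6039 | −0.7968 |
| Yamora | −2.9276 | 1.6177 |
| Yamora | −0.7392 | −0.6686 |
| Yamora | 0.4893 | −2.0047 |
| Yamora | −0.3933 | −0.9410 |
| Yamora | −0.3570 | −1.0346 |
| Yamora | −1.5191 | 0.1140 |
| Yamora | −0.4907 | −0.8760 |
| Yamora | −0.2710 | −1.0577 |
| Yamora | −0.2547 | −1.0504 |
| Viacha | 0.8095 | 0.5245 |
| Viacha | 0.8482 | 0.5004 |
| Viacha | 0.4159 | 1.0341 |
| Viacha | 0.4046 | 1.0660 |
| Viacha | 0.9214 | 0.4989 |
| Viacha | 0.3742 | 1.0584 |
| Viacha | 0.1010 | 1.3822 |
| Viacha | 0.6979 | 0.5567 |
| Viacha | 1.2224 | 0.0200 |
| Viacha | 1.2719 | 0.0567 |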

Table 17.

Principal component scores for the Yamora and Viacha samples.

The components for the Yamora and Viacha samples are shown in Figure 7. For both PC1 (Figure 7a) and PC2 (Figure 7b), positive values indicate that pH in KCl, Na+, EC, Mg2+, pH in H2O and H-Al predominate over N, OM, P, Ca2+ and K+. Soils with positive PC1 and PC2 values therefore have high pH, high Na+ concentration and high EC. Soils with negative values, on the other hand, are rich in OM, P and N and are much more suitable for agriculture.

Figure 7.

Principal Component Analysis of the Yamora and Viacha samples a) Principal Component PC1, b) Principal Component PC2.

For the Yamora samples, PC1 and PC2 are predominantly negative (both components are negative in 7 of the 10 samples), so this soil is rich in OM, P and N and much more suitable for agriculture. For the Viacha samples, PC1 and PC2 are positive in all 10 samples, so this soil appears less suitable for agriculture (Figure 8).

Figure 8.

Principal components PC1 and PC2 from Yamora and Viacha samples.

The principal components thus classify the two types of soils accurately. In addition, within each soil type PC1 and PC2 are linearly correlated (Figure 9).

Figure 9.

Principal components PC1 and PC2 are correlated within each soil type, Yamora and Viacha.

The two lines have approximately the same slope, and the soils are distinguished by their intercepts (Figure 9). Soils more suitable for cultivation, in which N, OM, P, Ca2+ and K+ predominate over pH in KCl, Na+, EC, Mg2+, pH in H2O and H-Al, tend toward smaller or even negative intercepts. In this case, the principal components characterize and classify the soils with high precision, which makes multivariate analysis an important tool for soil classification.
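The slopes and intercepts can be recomputed from Table 17, assuming Figure 9 plots PC2 against PC1 with one least-squares line per site. The figure itself is not reproduced here. The plain-Python sketch below fits both lines. `fit_line` and `site_from_scores` are illustrative helpers. The PC1 + PC2 sign rule is only a heuristic suggested by these 20 samples, not a method from the original study: when the slope is close to −1, PC1 + PC2 approximately equals the intercept. With these data both fitted slopes come out negative and of similar size, and the intercepts have opposite signs (negative for Yamora, positive for Viacha).

```python
from statistics import mean

# (PC1, PC2) component scores transcribed from Table 17.
YAMORA = [(-0.6039, -0.7968), (-2.9276, 1.6177), (-0.7392, -0.6686), (0.4893, -2.0047),
          (-0.3933, -0.9410), (-0.3570, -1.0346), (-1.5191, 0.1140), (-0.4907, -0.8760),
          (-0.2710, -1.0577), (-0.2547, -1.0504)]
VIACHA = [(0.8095, 0.5245), (0.8482, 0.5004), (0.4159, 1.0341), (0.4046, 1.0660),
          (0.9214, 0.4989), (0.3742, 1.0584), (0.1010, 1.3822), (0.6979, 0.5567),
          (1.2224, 0.0200), (1.2719, 0.0567)]


def fit_line(points):
    """Ordinary least-squares line PC2 = slope * PC1 + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def site_from_scores(pc1, pc2):
    """Hypothetical rule: with slopes near -1, the sign of PC1 + PC2 tracks the intercept."""
    return "Viacha" if pc1 + pc2 > 0 else "Yamora"


for name, points in (("Yamora", YAMORA), ("Viacha", VIACHA)):
    slope, intercept = fit_line(points)
    matched = sum(site_from_scores(*p) == name for p in points)
    print(f"{name}: slope {slope:.2f}, intercept {intercept:.2f}, {matched}/{len(points)} matched")
```

Because the two lines are nearly parallel, a single linear combination of the scores is enough to separate the sites. This is the geometric content of the statement that the soils are characterized by their intercepts.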

Principal components provide a starting point for the data analysis and should be complemented with other methods of multivariate analysis. In this case, for example, discriminant analysis can be applied [5].

The coefficients of the standardized canonical discriminant function indicate that the most relevant parameters in the discriminant function are N, Na+, K+, Mg2+ and P. The parameters that are important for defining soil fertility are pH and OM. Other factors involved in soil formation are the presence of minerals containing exchangeable cations (Na+, K+, Mg2+ and Ca2+), a decrease in soil acidification, and the decomposition of minerals.

The general discriminant function obtained for the two types of soils is [6]:

$$D = -18.418 + 118.391\,\mathrm{N} - 8.267\,\mathrm{Na} + 67.852\,\mathrm{K} - 11.752\,\mathrm{Mg} + 0.114\,\mathrm{P} \tag{5}$$

The discriminant (classification) functions for each group are:

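$$D_{\mathrm{Yamora}} = -4971.556 + 10936.736\,\mathrm{N} - 553.970\,\mathrm{Na} + 5449.136\,\mathrm{K} - 206.719\,\mathrm{Mg} + 13.873\,\mathrm{P} \tag{6}$$

$$D_{\mathrm{Viacha}} = -2725.256 - 3502.174\,\mathrm{N} + 454.280\,\mathrm{Na} + 2826.109\,\mathrm{K} + 1226.527\,\mathrm{Mg} - 0.023\,\mathrm{P} \tag{7}$$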

Applying the discriminant functions to the samples from both locations classifies all 20 samples correctly (100%). This suggests that the functions should also classify new soil samples from these areas correctly with high probability. Soils can thus be classified, and their chemical fertility determined, from five parameters (N, Na+, K+, Mg2+ and P) and the discriminant function. This information complements that provided by the principal components.
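Eqs. (6) and (7) have the form of Fisher's linear classification functions: a sample is assigned to the group whose function gives the larger value. The sketch below uses two hypothetical variables and made-up data instead of the study's five parameters, and assumes equal prior probabilities and a pooled within-group covariance matrix S. It shows how such functions are built and applied. For group k with mean vector m_k, the coefficients are b_k = S⁻¹m_k and the constant is −½ b_k·m_k + ln π_k. S⁻¹ is positive definite and ln π_k < 0, so these constants are always negative, as in Eqs. (6) and (7). This is a textbook construction, not a reproduction of the computation in [5].

```python
import math

# Hypothetical two-variable training data for two soil groups (NOT the study's data).
GROUP_A = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1), (0.9, 1.9)]
GROUP_B = [(3.0, 0.5), (3.2, 0.7), (2.8, 0.4), (3.1, 0.6), (2.9, 0.3)]


def mean_vector(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def pooled_covariance(groups):
    """Pooled within-group 2 x 2 covariance matrix (divisor: n minus number of groups)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        dof += len(rows) - 1
    return [[x / dof for x in row] for row in s]


def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def classification_functions(groups, priors):
    """Fisher's linear classification functions: one (constant, coefficients) per group."""
    s_inv = inverse_2x2(pooled_covariance(groups))
    functions = []
    for rows, prior in zip(groups, priors):
        m = mean_vector(rows)
        b = [s_inv[i][0] * m[0] + s_inv[i][1] * m[1] for i in range(2)]
        constant = -0.5 * (b[0] * m[0] + b[1] * m[1]) + math.log(prior)
        functions.append((constant, b))
    return functions


def classify(x, functions):
    """Index of the group whose classification function is largest at x."""
    values = [c + b[0] * x[0] + b[1] * x[1] for c, b in functions]
    return values.index(max(values))


funcs = classification_functions([GROUP_A, GROUP_B], priors=[0.5, 0.5])
for constant, coefficients in funcs:
    print(round(constant, 3), [round(c, 3) for c in coefficients])
```

With two groups and equal priors, the difference of the two classification functions is proportional to the single canonical discriminant function. This explains why Eq. (5) carries the same information as Eqs. (6) and (7).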


3. Conclusions

Principal Component Analysis was applied to soil samples to reduce the dimensionality of the data. The results show that this tool is fundamental and fully applicable to soil studies, since it allows soil samples to be characterized and classified with precision, leading to a better interpretation of the results.


Acknowledgments

Due acknowledgement to Springer Nature for giving permission to reproduce Table 1 from “Sources and behavior of arsenic and trace elements in groundwater and surface water in the Poopó Lake basin, Bolivian Altiplano” by Oswaldo Eduardo Ramos Ramos, Luis Fernando Cáceres, Mauricio Rodolfo Ormachea Muñoz, Prosun Bhattacharya, Israel Quino, Jorge Quintanilla, Ondra Sracek, Roger Thunvik, Jochen Bundschuh, Maria Eugenia Garcia, Environmental Earth Sciences, 66: 793 – 807, 2012.

A due acknowledgement to Revista Boliviana de Química for giving permission to reproduce the Table and the equation from “Chemometric evaluation of internal reference material (IRM) of agricultural soils in the two provincial municipalities of La Paz” by Rolando Mamani Quispe, Leonardo Guzmán Alegria, Jorge Chungara Castro, Oswaldo Eduardo Ramos Ramos, Revista Boliviana de Química, Vol. 39, No 4, pp 181 – 189, 2019; and “Análisis multivariable en la clasificación de suelos para la agricultura en el valle y Altiplano Boliviano” by Rolando Mamani Quispe, Oswaldo Eduardo Ramos Ramos, Jorge Chungara Castro, Leonardo Guzmán Alegría, Revista Boliviana de Química, Vol. 38, No 3, pp 126 – 132, 2021.

A due acknowledgment to José Antonio Bravo, Ph.D., Chief Editor of Revista Boliviana de Química, for grammatical revision of the document.


Conflict of interest

The authors declare no conflict of interest.

References

  1. Martín Q, Cabero MT, de Paz Y. Tratamiento estadístico de datos con SPSS. España: Thomson, Universidad de Salamanca; 2007. p. 328
  2. Ramos Ramos OE, Cáceres LF, Ormachea Muñoz MR, Bhattacharya P, Quino I, Quintanilla J, et al. Sources and behavior of arsenic and trace elements in groundwater and surface water in the Poopó Lake basin, Bolivian Altiplano. Environmental Earth Sciences. 2012;66:793-807
  3. Ramos Ramos OE. Geochemistry of trace elements in the Bolivian Altiplano – Effects of natural processes and anthropogenic activities. PhD Thesis, TRITA LWR PHD-2014:04; 2014
  4. Mamani Quispe R, Guzmán Alegría L, Chungara Castro J, Ramos Ramos OE. Chemometric evaluation of internal reference material (IRM) of agricultural soils in the two provincial municipalities of La Paz. Revista Boliviana de Química. 2019;39(4):181-189
  5. Mamani Quispe R, Ramos Ramos OE, Chungara Castro J, Guzmán Alegría L. Análisis multivariable en la clasificación de suelos para la agricultura en el valle y Altiplano Boliviano. Revista Boliviana de Química. 2021;38(3):126-132
  6. Mongay FC. Quimiometría. España: Universitat de Valencia, PUV; 2005. p. 245
