Sequential binary partition defining macronutrient balances.
Soil fertility studies aim to integrate the basic principles of biology, chemistry, and physics, but generally lead to separate interpretations of soil and plant data. Paradoxically, J.B. Boussingault warned as far back as the 1830s that the balance between nutrients in soil-plant systems was more important than nutrient concentrations taken in isolation. Indeed, the biogeochemical cycles of elements that regulate the dynamics of agroecosystems do not operate independently. However, raw concentrations of individual elements or their log transformations are commonly used to conduct statistical analyses on plant nutrients [5, 6, 7], soil fertility indices and C mineralization data [9, 10]. Researchers have thus proposed several ratios and stoichiometric rules to relate system components to each other when monitoring mineralization and immobilization of organic C, N, P and S in soils [4, 11], cations interacting on soil cation exchange capacity, nutrient interactions in plants [13, 14, 15, 16] and carbon uptake by plants [17, 18].
Different approaches have been elaborated to describe nutrient balances in soils. The nutrient intensity and balance concept (NIBC) computes ionic balances in soil water extracts [19, 20]. The basic cation saturation ratio (BCSR) concept hypothesizes that cations and acidity exchanging on soil cation exchange capacity (CEC) can be optimized for crop growth . However, the BCSR has been criticized for its elusive definition of ‘ideal’ cationic ratios [21, 22]. In plant nutrition,  were the first to represent geometrically interactions between nutrients by a ternary diagram where one nutrient can be computed by difference between 100% and the sum of the other two. As a result, there are two degrees of freedom in a ternary diagram. One may also derive three dual ratios from K, Ca, and Mg but only two ratios are linearly independent because, for example, K/Mg can be computed from K/Ca × Ca/Mg and is thus redundant. Therefore, a ratio approach conveys
In contrast, there are
To solve problems related to nutrient diagnosis in soil and plant sciences, one must first recognize that soil and plant analytical data are most often compositional, i.e. strictly positive data (concentrations, proportions) related to each other and bounded to some whole . Compositional data have special numerical properties that may lead to wrong inferences if not transformed properly. Log-ratio transformations have been developed to avoid numerical biases [26, 29, 30, 31]. The balance concept presented in this chapter is based on log ratios or contrasts. Balances are computed rather simply from compositions using the isometric log-ratio (
The aim of this chapter is to introduce the reader to the balance concept as applied to soil fertility studies. The first section presents the theory common to the three subsequent subjects: cationic balances in soils, plant nutrient signatures, and mineralization of organic residues. Readers are advised to become familiar with the theory before turning to the subject of interest.
2. Theory of CoDa
Because a change in any proportion of a whole reverberates on at least one other proportion, proportions of components of a closed sum (100%) are interdependent. Therefore, a compositional vector is intrinsically multivariate: its components cannot be analyzed and interpreted without relating them to each other [32,33]. Compositional data (CoDa) induce numerical biases, such as self-redundancy (one component is computable by difference between the constrained sum of the whole and the sum of other components), non-normal distribution (the Gaussian curve may range below 0 or beyond 100% which is conceptually meaningless) and scale dependency (correlations depend on measurement scale). Redundancy can be controlled by carefully removing the extra degree of freedom in the
One of the log ratio transformations is the centered log ratio (
The additive log ratio or
2.1. From CoDa to sound balances
The sample space of a compositional vector defined by
Where closes the sum of components to some whole such as 1, 100%, 1000 g kg-1, which allows computing a filling value to the unit of measurement. In other cases where the data do not add up to the measurement unit such as mg dm-3 or mg L-1, the measurement unit just cancels out when components are ratioed.
In general, raw or log-transformed concentration data are analyzed statistically without any
Fortunately, recent progress in compositional data analysis provides means to elaborate structured pathways and interpret results coherently . Indeed, the
Balances can be illustrated by a CoDa dendrogram  where components or groups of components are balanced by analogy to a mobile and its fulcrums (Figure 3). Each part has its own weight and the balance between parts or groups of parts are the fulcrums (boxplots) equilibrating the system and computed as
A CoDa dendrogram (e.g. Figure 3) is interpreted as follows:
Each fulcrum represents a balance. There are 4 balances for 5 components in Figure 3.
If the fulcrum lies in the center of the horizontal bar, the balance is null. If it lies on the left side of the center, the mean balance is negative and left-side components occupy a larger proportion in the simplex. A fulcrum on the right side indicates a positive balance.
Rectangles located on fulcrums are boxplots.
The length of vertical bars represents the proportion of total variance
Nested balances are encoded in an
|Balance||N||P||K||Ca||Mg||Number of + parts||Number of - parts|
|[N,P,K | Ca,Mg]||+1||+1||+1||-1||-1||3||2|
|[N,P | K]||+1||+1||-1||0||0||2||1|
|[N | P]||+1||-1||0||0||0||1||1|
|[Ca | Mg]||0||0||0||+1||-1||1||1|
In Table 1, the sequential binary partition of nutrients encodes the balances between two geometric means across the + components at numerator and the – components at denominator. The orthogonal coefficient of a log contrast is computed from the number of + and – components in each binary partition. The balances between two subcompositions are orthogonal log ratio contrasts between geometric means of the “+1” and “-1” groups. The
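The encoding in Table 1 can be illustrated with a minimal Python sketch. It assumes the usual orthonormal balance form, sqrt(r·s/(r+s)) · ln(g(+)/g(−)), where r and s are the numbers of + and − parts and g() is a geometric mean; the function names and the leaf composition are hypothetical and serve only as an illustration.

```python
import math

# Sign matrix of the sequential binary partition in Table 1
# (columns: N, P, K, Ca, Mg; +1 = numerator, -1 = denominator, 0 = not involved).
SBP = [
    [+1, +1, +1, -1, -1],  # [N,P,K | Ca,Mg]
    [+1, +1, -1, 0, 0],    # [N,P | K]
    [+1, -1, 0, 0, 0],     # [N | P]
    [0, 0, 0, +1, -1],     # [Ca | Mg]
]


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(composition, signs):
    """Assumed orthonormal balance: sqrt(r*s/(r+s)) * ln(g(+) / g(-))."""
    plus = [x for x, sign in zip(composition, signs) if sign > 0]
    minus = [x for x, sign in zip(composition, signs) if sign < 0]
    r, s = len(plus), len(minus)
    coefficient = math.sqrt(r * s / (r + s))
    return coefficient * math.log(geometric_mean(plus) / geometric_mean(minus))


def balances(composition, sbp=SBP):
    """One balance per row of the sequential binary partition."""
    return [balance(composition, signs) for signs in sbp]


# Hypothetical leaf composition in % (N, P, K, Ca, Mg), for illustration only.
leaf = [4.5, 0.3, 5.0, 1.0, 0.5]
print(balances(leaf))
```

Because each balance is a log ratio, multiplying every part by the same constant (for example, switching from % to g kg-1) leaves the result unchanged.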
2.2. Dissimilarity between compositions
As a result of orthogonality, the Aitchison distance () between any two compositions is computed as a Euclidean distance across the selected
On the other hand, the Euclidean distance () based on log transformations is biased by the difference between the geometric means times the number of parts as follows :
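The relationship between the two distances can be checked numerically. The sketch below is one reading of the statement above, not the chapter's own equation: it computes the Aitchison distance through centered log ratios (which gives the same value as a Euclidean distance across orthonormal balances) and verifies that the squared log-based Euclidean distance exceeds it by D times the squared difference between log geometric means, D being the number of parts. The two compositions are hypothetical.

```python
import math


def mean_log(x):
    """Log of the geometric mean of a composition."""
    return sum(math.log(v) for v in x) / len(x)


def clr(x):
    """Centered log ratio: log of each part minus the log geometric mean."""
    m = mean_log(x)
    return [math.log(v) - m for v in x]


def aitchison_distance(x, y):
    """Euclidean distance between clr-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def log_euclidean_distance(x, y):
    """Euclidean distance between simply log-transformed compositions."""
    return math.sqrt(sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(x, y)))


# Hypothetical compositions (N, P, K, Ca, Mg in %), for illustration only.
x = [4.5, 0.3, 5.0, 1.0, 0.5]
y = [3.0, 0.25, 2.5, 2.0, 0.8]

a2 = aitchison_distance(x, y) ** 2
e2 = log_euclidean_distance(x, y) ** 2
bias = len(x) * (mean_log(x) - mean_log(y)) ** 2

# The log-based squared distance equals the Aitchison term plus the bias term.
print(e2, a2 + bias)
```

The bias term vanishes only when the two compositions share the same geometric mean, so the log-based distance is never smaller than the Aitchison distance.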
In plant nutrition studies , the Mahalanobis distance () may be preferred to the Euclidean distance because the former takes into account the covariance structure of the data  (as illustrated in Figure 5) and has distribution [50,51]. The
Where is the mean and COV is the covariance matrix. Both and computed across log-transformed data are higher than their counterparts computed across balances, indicating systematic upper bias using natural log compared to
2.3. Cate-Nelson analysis
The Cate-Nelson procedure was developed as a graphical technique to partition percentage yield (yield in control divided by maximum yield with added nutrient) versus soil test. The scatter diagram is subdivided into four quadrants to determine a critical test level and a critical percentage yield by maximizing the number of points in the + quadrants. This technique is analogous to the binary classification tests widely used in medical sciences, where data in each quadrant are interpreted as true positive (correctly diagnosed as sick), false positive (incorrectly diagnosed as sick), true negative (correctly diagnosed as healthy) and false negative (incorrectly diagnosed as healthy). Applied to soil fertility studies, we can define four classes as follows:
True positive (TP: nutrient imbalance): imbalanced crop (low yield) correctly diagnosed as imbalanced (above critical index).
False positive (FP: type I error): balanced crop (high yield) incorrectly identified as imbalanced (above critical index). FP points indicate luxury consumption of nutrients.
True negative (TN: nutrient balance): balanced crop (high yield) correctly diagnosed as balanced (below critical index).
False negative (FN: type II error): imbalanced crop (low yield) incorrectly identified as balanced (below critical index). FN points show impacts of other limiting factors.
The performance of the test is measured by four indices:
Sensitivity: probability that a low-yielding crop is diagnosed as imbalanced, computed as TP/(TP+FN)
Specificity: probability that a high-yielding crop is diagnosed as balanced, computed as TN/(TN+FP)
Positive predictive value (PPV): probability that an imbalance diagnosis corresponds to low yield, computed as TP/(TP+FP)
Negative predictive value (NPV): probability that a balance diagnosis corresponds to high yield, computed as TN/(TN+FN)
The performance of the binary classification test increases as the four indices approach unity. However, maximizing all four indices simultaneously may not be the most appropriate procedure. Indeed, agronomists are more interested in high PPV than in high specificity.
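The four classes and four indices defined above can be sketched in a few lines of Python. The function names, the critical values passed to `quadrant` and the counts are hypothetical; only the class definitions and the formulas come from the text.

```python
def quadrant(index, yield_value, critical_index, critical_yield):
    """Assign one observation to TP, FP, TN or FN.

    An index above the critical index is read as an imbalance diagnosis and
    a yield below the critical yield as a low yield, as in the definitions above.
    """
    imbalanced = index > critical_index
    low_yield = yield_value < critical_yield
    if imbalanced and low_yield:
        return "TP"
    if imbalanced:
        return "FP"
    if low_yield:
        return "FN"
    return "TN"


def classification_indices(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from the four quadrant counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical quadrant counts, for illustration only.
indices = classification_indices(tp=12, fp=3, tn=4, fn=1)
print(indices)
```

With these made-up counts, PPV and NPV both equal 0.8 while specificity is only 4/7, which shows how a test can meet one criterion and miss another.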
Using the Cate-Nelson graphical procedure, the TN specimens are selected as reference population after removing outliers. If the number of points is too large, yields are arranged in an ascending order and a two-group partition is computed. The sums of squares between two consecutive groups of observations are iterated as follows:
Where is class 1 yields starting with the two lowest soil indices; the remaining yields are in class 2 or ; and
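A minimal sketch of such a two-group partition is given below. It assumes the commonly used between-class sum of squares, (ΣY1)²/n1 + (ΣY2)²/n2 − (ΣY)²/n, evaluated at every split of the observations sorted by soil index, with at least two observations per class; the exact form of the chapter's equation is not reproduced here, and the function name and data are hypothetical.

```python
def cate_nelson_split(soil_index, yields, min_class_size=2):
    """Return (critical index, between-class sum of squares) of the best split.

    Observations are sorted by soil index; class 1 holds the lowest indices
    and class 2 the remaining ones. The split maximizing the between-class
    sum of squares is retained (assumed criterion).
    """
    pairs = sorted(zip(soil_index, yields))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(ys)
    if n < 2 * min_class_size:
        raise ValueError("not enough observations for a two-group partition")
    total = sum(ys)
    correction = total ** 2 / n
    best_ss, best_k = None, None
    for k in range(min_class_size, n - min_class_size + 1):
        sum1 = sum(ys[:k])          # class 1: k lowest soil indices
        sum2 = total - sum1         # class 2: remaining observations
        between_ss = sum1 ** 2 / k + sum2 ** 2 / (n - k) - correction
        if best_ss is None or between_ss > best_ss:
            best_ss, best_k = between_ss, k
    # Critical index taken midway between the two classes (assumption).
    critical = (xs[best_k - 1] + xs[best_k]) / 2
    return critical, best_ss


# Hypothetical soil indices and yields, for illustration only.
soil = [1, 2, 3, 4, 5, 6, 7, 8]
fruit_yield = [40, 45, 42, 48, 90, 92, 95, 91]
critical, between_ss = cate_nelson_split(soil, fruit_yield)
print(critical, between_ss)
```

In this made-up data set the yields jump between the fourth and fifth soil index, so the retained critical index falls midway between them.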
In this chapter, statistics computed across compositional data were performed in the R statistical environment . Compositional data analysis was conducted using the R “compositions” package . Data distribution was tested using the Anderson-Darling normality test  in the “nortest” package . Multivariate outliers were removed using computed in the R “mvoutlier” package . Linear discriminant analysis (LDA) was used as a statistical ordination technique that allows computing linear combinations of variables that best discriminate groups. Multiple regression analysis was conducted using
3. Cationic balances in tropical soils
3.1. Sequential binary partition
The percentage base saturation is the proportion of soil cation exchange capacity (CEC) occupied by a given cation. The soil compositional vector is defined as follows :
As illustrated in Figure 7, the first contrast, [K | Ca, Mg, H+Al], balances K against divalent cations and acidity so that K fertilization can be adjusted to soil acid-base conditions as modified by liming.
The second contrast, [Ca, Mg | H+Al], is the acid-base contrast for determining lime requirements, while the [Ca | Mg] balance reflects the soil Ca:Mg ratio, which can be adjusted through the choice of liming materials. Alternative SBPs could also be elaborated, such as the [K, Ca, Mg | (H+Al)], [K | Ca, Mg] and [Ca | Mg] balances that reflect the BCSR model of . The selected sequential binary partition for cationic balances is presented in Table 2.
For example, consider a soil containing 2.9 mmolc K dm-3, 20 mmolc Ca dm-3, 5 mmolc Mg dm-3, and 23 mmolc (H+Al) dm-3. Cationic balances are computed as follows:
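As a hedged numerical sketch of this example, the Python snippet below applies the usual orthonormal balance form, sqrt(r·s/(r+s)) · ln(g(+)/g(−)), to the three contrasts described above. The formula is assumed from the balance theory rather than copied from the chapter's equations, and the function and variable names are hypothetical.

```python
import math

# Soil test values from the example above (mmolc dm-3).
soil = {"K": 2.9, "Ca": 20.0, "Mg": 5.0, "H+Al": 23.0}


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(plus, minus):
    """Assumed orthonormal balance between the + and - groups of parts."""
    r, s = len(plus), len(minus)
    return math.sqrt(r * s / (r + s)) * math.log(
        geometric_mean(plus) / geometric_mean(minus)
    )


# [K | Ca, Mg, H+Al]
k_balance = balance([soil["K"]], [soil["Ca"], soil["Mg"], soil["H+Al"]])
# [Ca, Mg | H+Al]
acid_base_balance = balance([soil["Ca"], soil["Mg"]], [soil["H+Al"]])
# [Ca | Mg]
ca_mg_balance = balance([soil["Ca"]], [soil["Mg"]])

print(round(k_balance, 3), round(acid_base_balance, 3), round(ca_mg_balance, 3))
```

Under these assumptions the three balances are approximately -1.312, -0.680 and 0.980. The negative K balance indicates that K is small relative to the geometric mean of Ca, Mg and (H+Al), and the units cancel out because only ratios are used.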
Note that the K fertilization would depend on soil acidity as well as levels of exchangeable Ca and Mg in the soil. We thus expect the K index and the K balance to be similarly related to fruit yield if the
Changes in soil cationic balances were monitored in N and K fertilizer trials established on an epieutrophic and endodystrophic soil (Red-Yellow Oxisol)  at São Carlos (São Paulo, Brazil). One year old plants of ‘Paluma’ guava (
Soils were sampled annually after harvest at four locations per tree in the 0-20 cm and 20-40 cm layers where most of the root system is located, then composited per plot. Soil samples were air dried and analyzed for K, Ca, Mg and (H + Al) . The K, Ca and Mg were extracted using exchange resins, quantified by atomic absorption spectrophotometry and reported as mmolc dm-3. Exchangeable acidity (H+Al) was quantified by the SMP pH buffer method  and computed using the equation of  to convert buffer pH into mmolc (H+Al) dm-3 as follows:
Cation exchange capacity (CEC) was computed as the sum of cationic species. Assuming a soil bulk density of 1 kg dm-3, CEC averaged 5.4 cmolc kg-1.
3.3.1. Influence of the K fertilization on cationic balances in soil
As shown by scatter and ternary diagrams (Figure 8), the large ellipses, which represent the distribution of cationic balances in the 0-20 and 20-40 cm layers, overlapped. However, the small ellipses (Figure 8) representing the confidence regions about the means differed significantly. The [K | Ca, Mg, H+Al] balance was higher in the 0-20 cm layer, indicating that more K accumulated in the surface layer as a result of surface K fertilizer applications.
Soil test K and cationic balances were averaged between the beginning and the end of the growing season to represent average soil conditions. The soil indices were related to fresh fruit yield (Figures 9a and 9b). In Figure 9, data are means of 4 replicates and bars are least significant differences.
3.3.2. Critical soil K concentration and balance in the N and K trials
The Cate-Nelson partitioning of the relationship between guava fresh fruit yield and either soil K level or the [K | Ca, Mg, (H+Al)] balance across the combined N and K fertilizer experiments indicates that the K level index classified two specimens as TN compared to four for the K balance index (Figure 10). The graphical representation of this soil-plant relationship indicates a diagnostic advantage of using the K nutrient balance rather than the K concentration.
The sensitivity, specificity, PPV and NPV criteria are presented in Table 3. We expect performance criteria to be at least 80%. Low specificity indicates that some interactions with K leading to high yield, possibly involving Ca and Mg, have been ignored. Apparently, the
4. Multi-element balances in plant nutrition
4.1. Sequential binary partition
Plant nutrients are classified as essential macronutrients measured in % (N, S, P, Mg, Ca, K, Cl), essential micronutrients measured in mg kg-1 (Mn, Cu, Zn, Mo, B) and beneficial nutrients generally measured in mg or μg kg-1 but occasionally in % (Si, Na, Co, Ni, Se, Al, I, V) [64, 65, 15]. The plant ionome is defined as elemental tissue composition as related to the genome . A subcomposition of plant ionome could be defined by the following simplex for conducting statistical analysis:
Macronutrients have a stoichiometric relationship with carbon uptake;
N with S, P, K, Ca, Mg, Fe, Mn, Zn, and Cu;
NH4 with K, Ca, and Mg;
S with N, P, Fe, Mn, Mo;
P with N, K, Ca, Mg, B, Mo, Cu, Fe, Mn, Al, and Zn;
Cl with N and S;
K with N, P, Ca, Mg, Na, B, Mn, Mo, and Zn;
Ca with N, K, Mg, Na, Cu, Fe, Mn, Ni, and Zn;
Mg with N, P, B, Fe, Mn, Mo, Na, and Si;
B with N, P, K, and Ca;
Cu with N, P, K, Ca, Fe, Mn, and Zn;
Fe with N, P, Ca, Mg, Cu, Mn, Co, and Zn;
Zn with N, P, K, Ca, Mg, S, Na, Zn, Fe, and Mn;
Mn with N, P, K, Ca, Mg, B, Mo, Ni, and Zn;
Mo with N, P, K, S, Fe, and Mn.
The tissue composition can be altered by environmental and seasonal factors. A dataset of 1909 potato (
A critical hyper-ellipsoid can be viewed as a particular zone of the nutrient balance space where the probability of obtaining high yield is high enough to satisfy the practitioner. The points lying inside the hyper-ellipsoid would be qualified as “balanced”, and those lying outside the multi-dimensional construct as “imbalanced”. The practitioner might delineate intermediate zones if needed. Fertilizer trials were conducted to monitor balance change toward optimum nutrient conditions defined by the critical ellipses. In a P trial, P treatments applied to a P deficient soil were 0, 33, 66, 98 and 131 kg P ha-1. In a K trial, K treatments of 0, 50, 100 and 150 kg K ha-1 were applied to a K deficient soil. The diagnostic leaf of potato was sampled at the beginning of flowering.
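The balanced versus imbalanced rule can be sketched as a threshold on the squared Mahalanobis distance from the center of the reference population. The Python example below is a deliberate simplification: it uses only two balances instead of the full balance space, assumes that balances of the reference population are multivariate normal so that the squared distance follows a chi-square distribution, and uses hypothetical means and covariances.

```python
import math


def mahalanobis_squared_2d(point, mean, cov):
    """Squared Mahalanobis distance for two balances (2 x 2 covariance)."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2 x 2 matrix: (1/det) * [[d, -b], [-c, a]]
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def chi2_quantile_2df(p):
    """Exact chi-square quantile for 2 degrees of freedom: -2 ln(1 - p)."""
    return -2.0 * math.log(1.0 - p)


def is_balanced(point, mean, cov, p=0.95):
    """True if the point lies inside the critical ellipse at probability p."""
    return mahalanobis_squared_2d(point, mean, cov) <= chi2_quantile_2df(p)


# Hypothetical reference population (two balances), for illustration only.
reference_mean = (1.8, 0.4)
reference_cov = ((0.04, 0.01), (0.01, 0.09))

print(is_balanced((1.9, 0.5), reference_mean, reference_cov))
print(is_balanced((2.6, -0.6), reference_mean, reference_cov))
```

With more than two balances the same rule applies with the full covariance matrix and a chi-square quantile whose degrees of freedom equal the number of balances; intermediate zones could be delineated by choosing several probability levels.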
4.3. Seasonal change in nutrient compositions
The boxplots and the CoDa dendrogram illustrate the center and dispersion of nutrient balances per development stage (Figure 11). The [N, P, K | Ca, Mg] balance tended to decrease markedly during the season while the [N | P] and [Ca | Mg] balances tended to increase, and the [N,P | K] balance tended to decrease. The fast decrease in [N,P,K | Ca, Mg] balance is attributable to more N, P and K than Ca and Mg being transferred toward growing leaves during exponential growth and toward tubers during maturation. The K was more affected than N and P.
The discriminant scores (dots) and eigenvectors, as well as confidence regions at the 95% level, delineated the distributions of populations (large grey ellipses) and means (small colored ellipses) across stages of plant development (Figure 12). The first axis, dominated by the Redfield [N | P] balance followed by the [N, P, K | Ca, Mg] balance, captured 92% of total inertia. It is noteworthy that the nutrient balance changed in an orderly manner from one developmental stage to the next. The
4.4. Defining reference balances for diagnostic purposes
The confidence region of optimum nutrition was defined by a 4-dimensional hyper-ellipsoid (Figure 13).
The green and red points in Figure 13 represent specimens showing balanced and imbalanced nutrition, respectively. The fertilization of the potato should move nutrient signature toward the hyper-ellipsoid center. Added P perturbed the internal nutrient balance of cv. ‘Superior’ growing on a P deficient soil (Figure 14). The P trial showed that an addition of 98 kg P ha-1 allowed the balance to penetrate into the critical ellipse.
In Figure 15, it can be observed that added K also perturbed the nutrient balance: the potato ionome moved toward the critical ellipse. The 2nd K rate moved the K deficient plant ionome closer to the critical ellipse, but Ca shortage maintained the crop outside the critical ellipse. From the second application rate up, the perturbation was small. In this case, the Ca was likely to be the most limiting nutrient as shown on the ternary diagram.
The perturbation on 5 nutrients can be illustrated by a matrix of ternary diagrams (Figure 16). These diagrams show 2 nutrients and an asterisk (*) representing the sum of the 3 other components. The central dot is the mean of high yielders surrounded by its 95% confidence region represented by a black line.
5. Compositional modeling of C mineralization of organic residues in soils
The carbon, nitrogen, phosphorus and sulfur cycles are interconnected in agroecosystems and often expressed using stoichiometric rules . The ratio between total C and total N is the most simplified rule used in C mineralization studies but the Corganic/Norganic and lignin/ Norganic ratios are also common. However, several biochemical components of organic matter are omitted in most studies, resulting in loss of information on the system. There are few studies on the relationship between labile or recalcitrant C and the biochemical composition of organic residues added to soil.  analyzed ash and N contents as well as four C fractions in organic residues representing pools of increasing resistance to decomposition. In this section, we related labile C in organic residues to this 6-part compositional vector of organic residues. The components were expressed as fractions on dry weight basis to compute a biological stability index using multiple linear regression models. The compositional vector was defined as follows:
Where SOL = soluble matter, HEM = hemicellulose, CEL = cellulose, LIG = lignin, and N = total nitrogen.
Because scale dependency induces spurious correlations [71, 72, 73] and linear regression models are solved based on correlations between variables, the interpretation of regression coefficients is scale-dependent. To illustrate the problem of spurious correlations, chemical fractions were scaled on organic mass basis and analyzed using multiple linear regression.
The balance scheme reflected the C/N ratio and the order of decomposability of biochemical components (Figure 17). The SOL fraction was isolated from other biochemically labile fractions because its composition is complex, possibly including sugars, amino-sugars, amino-acids, and polypeptides as well as more recalcitrant or bacteriostatic easily solubilized polyphenols such as fulvic acids, tannic substances, resins, intermediate products, etc. The balance scheme was formalized by SBP as shown in Table 4.
|Balance||SOL||HEM||CEL||LIG||N||Ash||Number of + parts||Number of - parts|
|[SOL,HEM,CEL,LIG,N | Ash]||1||1||1||1||1||-1||5||1|
|[SOL,HEM,CEL,LIG | N]||1||1||1||1||-1||0||4||1|
|[SOL,HEM,CEL | LIG]||1||1||1||-1||0||0||3||1|
|[SOL | HEM,CEL]||1||-1||-1||0||0||0||1||2|
|[HEM | CEL]||0||1||-1||0||0||0||1||1|
The linear regression models relating labile C to biochemical fractions or balances showed R2 values between 0.86 and 0.92 (Figure 18). For the 6-part (dry mass basis) and 5-part (organic matter basis) models, variation in labile C measured as evolved CO2 was explained in part by total N and SOL as follows:
However, Equations 11 and 12 were subcompositionally incoherent. The intercept and the β coefficient for HEM showed opposite signs in Equations 11 and 12 while CEL and LIG were absent from Equation 12. This incoherence is attributable to spurious correlations (Table 5). Pearson correlation coefficients among raw proportions were not consistent in value, significance or sign depending on whether the proportions were expressed on a dry mass basis (including ash) or on an organic matter (LOI) basis.
Table 5. Pearson correlation coefficients among raw proportions expressed on a dry matter basis (including ash) and on an organic matter basis (loss on ignition).
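The scale dependency behind this incoherence can be illustrated with a small Python sketch. It rescales a hypothetical residue from a dry mass basis to an ash-free basis and compares raw proportions with log ratios; the composition and function names are assumptions made for the illustration and are not taken from the data set.

```python
import math


def closure(parts, total=1.0):
    """Rescale parts so that they sum to the chosen total."""
    s = sum(parts.values())
    return {name: total * value / s for name, value in parts.items()}


def log_ratio(composition, numerator, denominator):
    """Natural log of the ratio between two parts."""
    return math.log(composition[numerator] / composition[denominator])


# Hypothetical organic residue, fractions on a dry mass basis (sum = 1).
dry_basis = {"SOL": 0.30, "HEM": 0.15, "CEL": 0.25, "LIG": 0.10, "N": 0.02, "Ash": 0.18}

# Same residue expressed on an ash-free (organic matter) basis.
organic_basis = closure({k: v for k, v in dry_basis.items() if k != "Ash"})

# Raw proportions change with the basis of expression...
print(dry_basis["SOL"], organic_basis["SOL"])
# ...whereas log ratios between the retained parts do not.
print(log_ratio(dry_basis, "SOL", "LIG"), log_ratio(organic_basis, "SOL", "LIG"))
```

Because every proportion shifts by a sample-specific factor when ash is removed, correlations among raw proportions can change in value or sign, while any statistic built on log ratios of the retained parts stays the same.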
On the other hand, the labile C pool was largely explained by the
Equation 13 shows that labile C increases with total N and higher proportions of more labile over more recalcitrant C forms. These findings indicate that the
This paper shows that the specific numerical properties of compositional data require log ratio transformations before conducting statistical analyses of soil and plant compositional data. Compared to raw concentration data, the orthonormal balances can be interpreted consistently and without numerical bias as isometric log ratio coordinates. The
The balance paradigm was elaborated within the plant nutrition and soil carbon modules of the research project entitled ‘Implementing means to increase potato ecosystem services’ (CRDPJ 385199 – 09). We acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC), the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), the Coordinação de Aperfeiçoamento de Pessoal de Nivel Superior (CAPES), as well as farm partners as follows: Cultures Dolbec Inc., St-Ubalde, Québec, Canada; Groupe Gosselin FG Inc., Pont Rouge, Québec, Canada; Agriparmentier Inc. and Prochamps Inc., Notre-Dame-du-Bon-Conseil, Québec, Canada; Ferme Daniel Bolduc et Fils Inc., Péribonka, Québec, Canada.