Machine Learning, Compositional and Fractal Models to Diagnose Soil Quality and Plant Nutrition

Soils, nutrients and other factors support human food production. The loss of high-quality soils and of readily minable nutrient sources poses a great challenge to present-day agriculture. A comprehensive scheme is required to make wise decisions on the system's sustainability and to minimize the risk of crop failure. Soil quality provides useful indicators of the soil's chemical, physical and biological status. Tools of precision agriculture and high-throughput technologies make it possible to acquire numerous soil and plant data at affordable costs, with a view to customizing recommendations. Large and diversified datasets must be acquired uniformly among stakeholders to diagnose soil quality and plant nutrition at local scale, compare defective and successful cases side by side, implement reliable practices and reach high resource-use efficiency. Machine learning methods can combine numerous edaphic, managerial and climatic yield-impacting factors to conduct nutrient diagnosis and manage nutrients at local scale, where factors interact. Compositional data analysis provides tools to run numerical analyses on interacting components. Fractal models can describe aggregate stability tied to soil conservation practices and return site-specific indicators of organic matter decomposition rates in relation to soil tillage and management. This chapter reports on machine learning, compositional and fractal models to support wise decisions on crop fertilization and soil conservation practices.


Introduction
With the world population expected to exceed 9 × 10⁹ people by 2050, food production must increase by 70% in a situation where the average yield of several staple crops is expected to decline [1]. More than 95% of our food is produced on soil [2]. Despite the general perception that soil is an abundant resource, the soil resource is degrading at a fast rate as a result of salinization, erosion, compaction, contamination, structural collapse, acidification, and loss of organic matter and biological activity, as well as land allocation to urban and industrial development. Technological gains alone will not suffice to offset harmful agricultural practices that were heroically assumed to maintain soil productivity and farm viability in the long run. More worrying is the lack of a comprehensive understanding of how agroecosystems are built and how they function. Two centuries ago, the German scientist Alexander von Humboldt warned that the management of living systems must be based on the rigorous collection of contextual facts and local knowledge [3]. His thoughts translate today into data acquisition from diverse sources and into data mining and processing methods that help make wise decisions on how to manage soils properly at local scale.
The land is the basic resource for food production. There is a need to develop soil quality criteria and implement them where it matters most. [4] attributed large disparities in decisions naively thought to manage soils properly to unequal, insufficient or inadequate collection of information, widespread ignorance of how agroecosystems function, lack of understanding of how factors interact, and the wrong perception that business-oriented economic and social values outweigh environmental damage or beneficial ecosystem services. Indeed, high crop productivity relies on positive interactions between climatic, managerial and edaphic factors [5]. Data must be integrated into comprehensive decision-making models to manage complex systems sustainably. High-quality and diversified information reduces the risk of making wrong decisions based on regional averages rather than at the right interaction level at field scale [6,7]. Judicious decisions on locally acceptable actions should rely on well-documented facts and sound knowledge of environmental conditions. Besides traditional means of diagnosing soil-plant systems, progress in data acquisition tools includes proximal and remote sensing, high-throughput laboratory technologies and on-the-go data acquisition kits of precision agriculture.
Several diagnostic models support decisions on soil and nutrient management. While soil properties and plant compositions have been addressed as separate variables in reductionist models [8], empirical-mechanistic models were developed to synthesize more data, balancing untestable and testable concepts [9][10][11]. This required not only sufficient data input, but also calibrating empirical coefficients and validating the results in a wide variety of environments. More recently, modern tools of artificial intelligence have made it possible to process large and diversified datasets in relation to ecosystem performance, following Alexander von Humboldt's principles of biogeography [3].
Moreover, soil and plant analytical data are inherently multivariate compositional data constrained to a whole, such as 100% or the unit of measurement, which poses a serious numerical problem of "resonance" within the constrained space of compositions [12]. Ternary diagrams were the first representations of the closed space of three interrelated variables [13]. [14] related tissue N, P and K concentrations in a ternary NPK diagram to delineate the space of successful tissue compositions. It was not until [12] that ternary diagrams formed the basis of an emerging and appealing field of mathematics called "Compositional Data Analysis" (CoDa). CoDa relies on log ratio transformations. [15] developed means to project compositions as coordinates in Euclidean space. The CoDa concepts corrected computational errors and fallacies in earlier plant and soil diagnostic models [16,17].
In addition, fractal theory has been useful to address the geometry of soil aggregation [18] and the kinetics of carbon decomposition in soils [19]. Fractal kinetics assigns to time a coefficient between 0 and 1 to explain the reduction in decomposition rate due to the reduced contact between organic matter particles and their immediate environment as aggregates build up with time [19]. Fractal coefficients also provided a description of aggregate fragmentation patterns upon mechanical stress and avoided computational errors reported in classical synthetic measures of aggregation [20].
Machine learning, compositional and fractal modeling tools can process large and diversified soil-plant datasets that support side-by-side comparisons between failures and successes. We hypothesized that well-informed models can assist in making wise decisions on soil and nutrient management at local scale. In this chapter, we address carbon sequestration and factor-specific fertilization to sustain soil productivity and support resource conservation actions.

Growth-limiting factors
Field trials to document practices are conducted under the assumption that all factors but the ones being varied are equal or at optimum levels. Liebscher's law of the optimum stated that "a production factor which is in minimum supply contributes more to production, the closer other production factors are to their optimum" [8]. The law of the maximum aimed to optimize controllable factors, given that factors that are not controllable in the present state of knowledge and technology cannot be modified [21]. A provisional list of growth-impacting factors is provided in Table 1 [21,22]. Nutrient interactions impact crop yield through synergism, antagonism, dilution, excess, toxicity or crosstalk. Nutrient interactions have been addressed as pairwise ratios [23]. Nutrient crosstalk occurs where changes in sulfur availability alter the micronutrient composition of tissues [24]. An extreme case of nutrient excess is toxicity, where vital processes are affected. In field experiments, synergism is also viewed as a positive interaction occurring where the plant response to two nutrients combined is greater than the sum of their individual effects [25]. A list of nutrient interactions is presented in Table 2.
Faced with the formidable task of optimizing tens of growth-limiting factors and myriads of factor interactions, most of them unknown, each case under study could rather be viewed as a unique combination of factors. In successful cases in the neighborhood, most factors are equal except those impacting the performance of defective specimens, which facilitates side-by-side comparisons.

Soil quality indicators
In Canada and Brazil, as well as in other countries, soil mismanagement has led to soil degradation [30,31]. Addressing soil problems and optimizing resource-use efficiency to sustain soil productivity is a great challenge [32]. Soil quality impacts nutrient supply and resistance to erosion [33,34]. [4] provided a list of biological, chemical and physical indicators of soil quality measurable at various scales of agroecosystems (Table 3). Biological indicators are presently the least documented, but metagenomic technologies are expected to fill this gap in the years to come [35]. Point-scale indicators can be integrated into maps to guide precision agriculture at field or subfield level. It is still difficult to evaluate soil quality uniformly among stakeholders with respect to soil threats, soil multifunctionality and ecosystem services [36].

Table 3. Biological, chemical and physical indicators of soil quality, starting with point-scale indicators. Entries recoverable from the extracted table: desertification; loss of vegetative cover; wind and water erosion; siltation of rivers and lakes.

Soil test diagnosis
The sufficiency level of available nutrients (SLAN), the basic cation saturation ratio (BCSR), and soil test buildup and maintenance (STBM) are the main soil test interpretation philosophies [34]. The SLAN and BCSR addressed the relatively immobile nutrients (P, K), while the STBM was used to manage N, P and K. Critical and maintenance soil test levels were delineated from field trials. Bray (1963) [22] assumed that (1) for nutrients relatively immobile in soils such as P and K, soils and fertilizers have nutrient-supply coefficients specific to plant species, planting patterns and rates, provided that soil and climatic conditions are similar, and (2) response patterns can be described by the Mitscherlich equation. The SLAN related soil test P and K to percentage yield using the Mitscherlich-Bray equation. Alternatively, the relationship was partitioned into soil fertility classes, each given a probability of response to fertilization [34,37]. Percentage yield showed a higher correlation with soil test level than actual yield did. Percentage yield was first expressed as the yield at zero nutrient level, other factors assumed to be at adequate levels, divided by the yield where all factors were assumed to be at adequate levels. Percentage yields were also expressed as response ratios, $\ln(Y_{\text{treatment}}/Y_{\text{control}})$, i.e., the log of the yield of the treatment relative to that of the control, to run meta-analyses at regional scale [38]. Using percentage yield and probability of response, the SLAN concept assumed random effects across the factors not being varied, and thus hid the effects of local factors that impact crop yield.
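The Mitscherlich-Bray relationship and the response ratio can be sketched in Python as follows. The sketch assumes the commonly cited Bray form, log10(A − y) = log10 A − c1·b − c·x, with A the maximum yield, b the soil test value, x the fertilizer rate, and c1 and c efficiency coefficients. The function names and the coefficient value c1 = 0.02 are hypothetical and illustrative, not calibrated values from the cited trials.

```python
import math

def mitscherlich_bray_yield(a_max, soil_test, fert_rate, c1, c):
    """Yield from log10(A - y) = log10(A) - c1*b - c*x (Bray's form, assumed here)."""
    return a_max * (1.0 - 10.0 ** (-(c1 * soil_test + c * fert_rate)))

def percentage_yield(soil_test, c1):
    """Unfertilized yield as a percentage of maximum yield: 100 * (1 - 10**(-c1*b))."""
    return 100.0 * (1.0 - 10.0 ** (-c1 * soil_test))

def response_ratio(y_treatment, y_control):
    """Log response ratio ln(Y_treatment / Y_control), as used in meta-analysis."""
    return math.log(y_treatment / y_control)

# Hypothetical calibration: c1 = 0.02 per unit of soil test P (illustrative only)
for b in (10, 30, 60):
    print(b, round(percentage_yield(b, c1=0.02), 1))
print(round(response_ratio(6.0, 4.0), 3))
```

With c1 = 0.02, percentage yield rises from about 37% at b = 10 to about 94% at b = 60, and the response ratio of a 6 t ha−1 treatment over a 4 t ha−1 control is ln 1.5 ≈ 0.405. A fitted c1 would come from regional field trials, which is precisely why such curves average out local factors.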
The BCSR postulated, without proper calibration, that "ideal" cationic ratios and saturation levels should be maintained on the soil cation exchange capacity to maximize yield [28]. The application of this concept to fertilization decisions failed under field conditions, most often leading to overfertilization [39]. Nevertheless, BCSR may assist in making decisions on liming and lime sources to neutralize soil acidity, provide proper cementing agents bridging soil particles and improve soil aggregation [24]. In comparison, compositional data analysis methods proved to be a more appropriate approach to run statistical analyses on soil test results for cations and other cementing agents [29,40].
The STBM concept was elaborated from nutrient budgets, nutrient-use efficiency and soil P-fixing capacity in an attempt to adjust fertilization to local conditions. Expected yield and plant- and soil-specific coefficients were assessed from field observations and pot trials [41]. Soil P-fixing capacity has been assessed as a priority in Brazil, but coefficients estimated from the literature often proved unrealistic, leading to overfertilization at local scale, especially for P [42].
Transferring SLAN, BCSR and STBM regional models to the local scale cannot be a straightforward operation. Growers' heuristic is traditionally to look for successful practices developed under comparable environmental and managerial conditions, as reported in their neighborhood. Alternatively, large and diversified datasets can be documented and synthesized into a diagnostic kit of features that stakeholders can acquire easily, at reasonable cost and effort, among those presented in Tables 1 and 3. The minimum package of facts, factors and local knowledge supporting fertilization decisions can be handled by machine learning models to diagnose growth-limiting factors and predict crop yields after correction. Thereafter, compositional data analysis can rank diagnosed components in the order of their limitations to yield to support nutrient management [43][44][45][46]. Yield can be predicted in regression mode. Besides, the classification mode can provide a list of high-yielding and balanced specimens as benchmarks for use at local scale, as well as the probability of yielding more than some yield target.

Soil quality diagnosis
The interpretation of soil quality indicators requires well defined values, otherwise, the indicators cannot be used in practice to support management decisions [35]. Benchmarks could be native soil, reference sites, or successful combinations of comparable factors for agronomically or environmentally performing soils. Scores could have thresholds for (1) more than is better, (2) optimum range, (3) less than is better, or (4) undesirable range [47]. Principal component analysis (PCA), redundancy analysis (RDA), discriminant analysis and multiple regression have been used to process data.
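The four scoring shapes listed above can be illustrated with piecewise-linear functions. This is a minimal sketch assuming linear transitions between thresholds; the function names and threshold values (e.g., a pH optimum of 6.0 to 7.0 falling to zero at 5.0 and 8.0) are hypothetical and would have to come from benchmark soils.

```python
def score_more_is_better(value, lower, upper):
    """0 at or below `lower`, 1 at or above `upper`, linear in between."""
    if value <= lower:
        return 0.0
    if value >= upper:
        return 1.0
    return (value - lower) / (upper - lower)

def score_less_is_better(value, lower, upper):
    """1 at or below `lower`, 0 at or above `upper`, linear in between."""
    return 1.0 - score_more_is_better(value, lower, upper)

def score_optimum_range(value, low_zero, low_one, high_one, high_zero):
    """1 inside [low_one, high_one], falling linearly to 0 at low_zero and high_zero."""
    if low_one <= value <= high_one:
        return 1.0
    if value < low_one:
        return score_more_is_better(value, low_zero, low_one)
    return score_less_is_better(value, high_one, high_zero)

def score_undesirable_range(value, low_one, low_zero, high_zero, high_one):
    """0 inside [low_zero, high_zero], rising linearly to 1 at low_one and high_one."""
    return 1.0 - score_optimum_range(value, low_one, low_zero, high_zero, high_one)

# Hypothetical soil pH thresholds: optimum 6.0-7.0, zero score at 5.0 and 8.0
for ph in (5.5, 6.5, 7.8):
    print(ph, round(score_optimum_range(ph, 5.0, 6.0, 7.0, 8.0), 2))
```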
Soil aggregation is a key indicator of soil quality. Mean weight diameter (MWD) is a common indicator of soil aggregation, computed as follows:
$$\mathrm{MWD} = \sum_{i=1}^{n} x_i\, w_i$$
where $x_i$ is the mean diameter of the i-th aggregate fraction and $w_i$ is the mass of that fraction as a proportion of the total sample mass. The mean diameter is taken as the average opening between successive sieves rather than measured as the average particle size. The contribution of the largest fractions is artificially inflated by multiplying each fraction by its diameter.
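The MWD formula can be computed directly from sieve data, as in the sketch below. The sieve stack (8, 4, 2, 1, 0.25 and 0.053 mm) and the retained masses are hypothetical; masses are converted to proportions before weighting.

```python
def mean_weight_diameter(mean_diameters_mm, masses_g):
    """MWD = sum(x_i * w_i), with w_i the mass proportion of fraction i."""
    total = sum(masses_g)
    return sum(x * m / total for x, m in zip(mean_diameters_mm, masses_g))

# Hypothetical sieve stack; x_i is the mean opening between successive sieves
openings = [8.0, 4.0, 2.0, 1.0, 0.25, 0.053]
mean_diameters = [(a + b) / 2 for a, b in zip(openings[:-1], openings[1:])]
masses = [10.0, 15.0, 20.0, 30.0, 25.0]  # g retained per fraction (illustrative)
print(round(mean_weight_diameter(mean_diameters, masses), 3))
```

Because each proportion is multiplied by its diameter, the 6 mm fraction in this example contributes 0.6 mm to the MWD while the 0.15 mm fraction contributes under 0.04 mm, which illustrates the weighting bias discussed next.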
The MWD is numerically biased, unevenly weighted, and computed from aggregate-size fractions that vary widely among studies [40]. Alternatively, patterns of aggregate fragmentation can be synthesized into fractal dimensions. It is assumed that aggregates collapse under mechanical stress into smaller fragments of similar shape. Aggregates left on each sieve are counted after subtracting the sand fraction (> 53 μm) on each sieve [40], as follows:
$$N(d_i) = \frac{M(d_i)}{\rho_i\, c_i\, d_i^{3}}$$
where $N(d_i)$ is the number of particles, $M(d_i)$ is the mass of aggregates of the i-th aggregate-size fraction, $d_i$ is the mean diameter and $\rho_i$ is the bulk density. Note that $\rho_i$ must differ between the stronger and denser microaggregates and the more friable macroaggregates. The shape coefficient $c_i$ refers to a cube ($c_i = 1$). Particle volume can thus be computed as $x^3$, x being the average opening between two successive sieves.
The fractal dimension $D_f$ is estimated as follows:
$$S(d_k) = \sum_{d_i \ge d_k} N(d_i) = \alpha\, d_k^{-D_f}$$
where $S(d_k)$ is the cumulative number of particles with diameter ≥ $d_k$, $N(d_i)$ is the number of particles in the i-th size fraction, α is a proportionality parameter, and $D_f$, the fragmentation fractal dimension, is a scaling factor derived from the log-log relationship between $S(d_k)$ and $d_k$, i.e., $\log S(d_k) = \log\alpha - D_f \log d_k$. The fractal model for soil aggregation is presented in Table 4 and Figure 1. The fractal dimension was found to be 2.51 (absolute value of the log-log slope), indicating a well-aggregated soil. The fractal dimension is generally between 2 and 3 for 3-D soil aggregates, but may even exceed 3, a result difficult to interpret physically. Aggregate-size fragments have contrasting friability and often show several fractal patterns. However, fractal dimensions have the disadvantage of being assessed from a limited number of sieves.
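A sketch of the particle-count and fractal-dimension computations follows. It assumes cubic particles ($c_i = 1$), diameters and densities in consistent units (cm and g cm−3), sand-free masses, and counts particles with diameter ≥ $d_k$, as in the usual number-size fragmentation model, so that $D_f$ is minus the slope of the log-log regression. Function names and example data are hypothetical.

```python
import numpy as np

def particle_counts(masses_g, mean_diameters_cm, bulk_densities_g_cm3, shape_coef=1.0):
    """N(d_i) = M(d_i) / (rho_i * c_i * d_i**3); shape_coef = 1 for cubic particles."""
    m = np.asarray(masses_g, dtype=float)
    d = np.asarray(mean_diameters_cm, dtype=float)
    rho = np.asarray(bulk_densities_g_cm3, dtype=float)
    return m / (rho * shape_coef * d ** 3)

def fragmentation_dimension(mean_diameters, counts):
    """D_f = -slope of ln S(d_k) on ln d_k, with S(d_k) = number of particles with d >= d_k."""
    d = np.asarray(mean_diameters, dtype=float)
    n = np.asarray(counts, dtype=float)
    order = np.argsort(d)[::-1]      # sort fractions from largest to smallest diameter
    d, n = d[order], n[order]
    s = np.cumsum(n)                 # cumulative count of particles at least as large as d_k
    slope, _intercept = np.polyfit(np.log(d), np.log(s), 1)
    return -slope

# Hypothetical sand-free aggregate masses (g) on five sieves, mean diameters in cm;
# microaggregates (last fraction) are given a higher density than macroaggregates.
diam = [0.6, 0.3, 0.15, 0.0625, 0.0151]
mass = [12.0, 18.0, 22.0, 26.0, 22.0]
rho = [1.4, 1.4, 1.5, 1.6, 1.8]
counts = particle_counts(mass, diam, rho)
print(round(fragmentation_dimension(diam, counts), 2))
```

With only five sieves, the regression rests on five points, which is the limitation noted above; fitting separate segments would be one way to explore multiple fractal patterns.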
Carbon sequestration plays a key role in enhancing soil quality and abating greenhouse gases. Because aggregates reduce the contact between the organic substrate and its immediate environment as they build up in soils, the decomposition rates of organic particles decrease with time, allowing organic matter to accumulate [19]. The first-order decomposition rate of organic matter in soils, k(t), is controlled by the fractal coefficient h as follows:
$$k(t) = k_1\, t^{-h}$$
where $k_1$ is the decomposition rate at time t = 1 and h is the fractal coefficient. If h → 0, k is non-fractal and the reaction proceeds at maximum rate; if h → 1, the decomposition rate is fractal, indicating that protection mechanisms control the reaction rate during soil aggradation or degradation. [19] found a fractal coefficient of 0.71 for well-aggregated soils under pasture, compared with 0.45 for annual cropping and 0.25 for a degraded soil under fallow. Hence, the fractal coefficient measures the carbon protection mechanisms that develop as soil quality increases, or their loss leading to soil degradation.
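The fractal rate law can be explored numerically as below. The remaining-carbon function is our own integration of dC/dt = −k1 t^(−h) C for a single first-order pool, giving C(t)/C0 = exp[−k1 t^(1−h)/(1−h)] for h < 1; it is not stated in the source. The fractal coefficients are those reported above [19], while k1 = 0.1 per year is hypothetical.

```python
import math

def fractal_rate(t, k1, h):
    """Fractal first-order rate coefficient k(t) = k1 * t**(-h), for t > 0 and 0 <= h < 1."""
    return k1 * t ** (-h)

def fraction_remaining(t, k1, h):
    """C(t)/C0 from dC/dt = -k1 * t**(-h) * C integrated from 0 to t (single pool, h < 1)."""
    return math.exp(-k1 * t ** (1.0 - h) / (1.0 - h))

# Fractal coefficients from the text [19]; k1 = 0.1 per year is a hypothetical value
for label, h in (("pasture", 0.71), ("annual cropping", 0.45), ("fallow", 0.25)):
    print(label, round(fractal_rate(10.0, 0.1, h), 4), round(fraction_remaining(10.0, 0.1, h), 3))
```

At t = 10 years, the rate coefficient under pasture (h = 0.71) is about a third of that under fallow (h = 0.25), reflecting stronger protection of organic matter in aggregated soil. With h = 0, the model reduces to ordinary first-order decay, exp(−k1 t).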
Soil aggregation has also been expressed in terms of isometric log ratios (ilr), or coordinates [40]. The ilr is computed as a balance between two groups of aggregate-size fractions, as follows:
$$\mathrm{ilr} = \sqrt{\frac{r\,s}{r+s}}\,\ln\frac{G_1}{G_2}$$
where r and s are the numbers of aggregate-size fractions at numerator and denominator, respectively, and $G_1$ and $G_2$ are the geometric means of the aggregate-size fractions at numerator and denominator, respectively. The balance dendrogram in Figure 2 is a system of balances among five aggregate-size fractions, starting with a general balance between macro- (> 0.25 mm) and micro- (< 0.25 mm) aggregates, where r = 4 (the number of macroaggregate fractions) and s = 1 (the microaggregate fraction). The balance between micro- and macroaggregates in Table 4 is computed as follows:
$$\mathrm{ilr}_1 = \sqrt{\frac{4 \times 1}{4 + 1}}\,\ln\frac{(w_1 w_2 w_3 w_4)^{1/4}}{w_5}$$
where $w_1$ to $w_4$ are the masses of the macroaggregate fractions and $w_5$ is that of the microaggregate fraction. Because the ilr transformation projects compositions into Euclidean space, the Euclidean distance ε can be computed between two soil aggregation states across ilr dimensions to indicate whether the soil is degrading or aggrading, as follows [40]:
$$\varepsilon = \sqrt{\sum_{j} \left(\mathrm{ilr}_{1j} - \mathrm{ilr}_{2j}\right)^2}$$
where j is a compositional dimension and subscripts 1 and 2 refer to the two aggregation states. Because computations are made on a mass basis rather than on particle counts as for fractal dimensions, there is no need to make assumptions about $\rho_i$ and $c_i$. The benchmark aggregation state could be defined as the ultimate aggregation state where all aggregates pass through the smallest sieve size.
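The balance and distance computations can be sketched as follows. The sequential binary partition is hypothetical: only its first balance (four macroaggregate fractions against one microaggregate fraction) follows the text, and the remaining balances are one valid choice among many. Masses need not be closed to 100% because balances are scale-invariant.

```python
import numpy as np

# Hypothetical sequential binary partition (SBP) of five aggregate-size fractions,
# ordered from the coarsest macroaggregates (parts 1-4) to microaggregates (part 5).
# +1 = numerator group, -1 = denominator group, 0 = not involved in that balance.
SBP_HYPOTHETICAL = np.array([
    [1, 1, 1, 1, -1],   # macro- (> 0.25 mm) vs micro- (< 0.25 mm) aggregates, r = 4, s = 1
    [1, 1, -1, -1, 0],
    [1, -1, 0, 0, 0],
    [0, 0, 1, -1, 0],
])

def ilr_from_sbp(composition, sbp=SBP_HYPOTHETICAL):
    """Balances sqrt(r*s/(r+s)) * ln(G1/G2), one per row of the partition."""
    logx = np.log(np.asarray(composition, dtype=float))
    balances = []
    for row in np.asarray(sbp):
        num, den = row == 1, row == -1
        r, s = num.sum(), den.sum()
        # ln(G1/G2) equals the mean log of the numerator parts minus that of the denominator
        balances.append(np.sqrt(r * s / (r + s)) * (logx[num].mean() - logx[den].mean()))
    return np.array(balances)

def euclidean_distance(a, b):
    """Euclidean distance between two vectors of ilr coordinates."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

# Hypothetical aggregate-size masses (g) for a well-aggregated and a degraded state
aggregated = [30.0, 25.0, 20.0, 15.0, 10.0]
degraded = [5.0, 10.0, 15.0, 30.0, 40.0]
print(ilr_from_sbp(aggregated)[0], euclidean_distance(ilr_from_sbp(aggregated), ilr_from_sbp(degraded)))
```

Because the balances form an orthonormal basis, the distance does not depend on which valid partition is chosen; only the interpretation of individual balances does.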

Tissue nutrient diagnosis
Early workers proposed classifying the results of tissue tests, which are continuous variables, using concentration ranges and critical values such as poverty adjustment (deficiency), critical percentage, nutrient sufficiency, luxury consumption or excess (including antagonism and toxicity) [48][49][50]. The critical percentage was the tipping point on the response curve, located at 90-95% of maximum yield. Nutrients were diagnosed separately rather than as unique combinations of interacting nutrients. Although the reject/accept dichotomania led to considerable interpretation uncertainties [17], the one-nutrient-at-a-time approach is still commonly used today. [51] suggested using methods of multivariate analysis to handle tissue compositions as a whole rather than as separate components, but ignored the numerical pathologies of using inherently interrelated raw concentration values.
Dual ratios were thought to account for nutrient interactions [52]. The Diagnosis and Recommendation Integrated System (DRIS) was elaborated to handle nutrient ratios [53,54]. DRIS required computing the mean and variance of dual ratios but did not fit into any method of multivariate analysis. Much earlier, [14] had already developed a concept of optimum combinations of interacting nutrients within a ternary diagram (Figure 3). Because plants show various degrees of plasticity in response to growing conditions [55][56][57], they can adjust nutrient acquisition to nutrient stress [58][59][60][61]. This fits perfectly into the realm of Compositional Data Analysis.
Because compositional vectors convey relative information, one should first 'think ratios' but, realizing that quotients are more difficult to handle than sums or differences, 'think logratios' [62]. Log ratios are log contrasts between components at numerator and denominator. While compositional data are constrained to the compositional space (e.g., 100%), log ratios can scan the real space, allowing statistical analyses to be conducted and confidence intervals to be returned without constraints. It was not until [12] developed the theory of Compositional Data Analysis (CoDa) that the ternary diagram could be expanded to more than three nutrients.
The Compositional Nutrient Diagnosis (CND) avoided several computational pathologies of DRIS, such as using different measurement units for macro- and micronutrients, pairwise rather than multivariate ratios, non-normal distributions, the use of dry matter as a separating component, assumed additivity of nutrient functions, non-symmetrical functions between dual ratios and their inverses, and non-symmetrical nutrient ratio and product functions. CoDa also allowed diagnosing multinutrient ratios in Euclidean space [16] and conducting multivariate analyses in plant ionomics [58].
In CoDa, the simplex is closed to the measurement unit using a filling value computed as follows:
$$F_v = 1000 - \sum_{i=1}^{D-1} c_i$$
where $F_v$ is the filling value for the unit g kg−1, D is the number of parts in the D-part composition (the D − 1 quantified components plus $F_v$), and $c_i$ is the concentration of each quantified part. The filling value is required to back-transform log ratio means into original concentration values. The centered log ratio, $\mathrm{clr}_i = \ln(x_i/G)$, integrates all pairwise ratios into a single multinutrient expression, as follows for N:
$$\mathrm{clr}_N = \ln\frac{N}{G}, \qquad G = \Big(\prod_{i=1}^{D} x_i\Big)^{1/D}$$
where $x_i$ is a component of the compositional simplex and G is the geometric mean across components including the filling value, all expressed in exactly the same measurement unit. For a plant tissue analysis showing 4% N, 0.25% P and 5% K, the filling value is 100% − (4% + 0.25% + 5%) = 90.75%. The clr value for N in that 4-part composition is computed as follows:
$$\mathrm{clr}_N = \ln\frac{4}{(4 \times 0.25 \times 5 \times 90.75)^{1/4}} = \ln\frac{4}{4.615} \approx -0.143$$
The Euclidean distance ε can be computed between two tissue states, one being diagnosed and the other used as benchmark composition (asterisk), using clr or ilr as follows:
$$\varepsilon = \sqrt{\sum_{j=1}^{D} \left(\mathrm{clr}_j - \mathrm{clr}_j^{*}\right)^2} = \sqrt{\sum_{j=1}^{D-1} \left(\mathrm{ilr}_j - \mathrm{ilr}_j^{*}\right)^2}$$
The ilr has the advantage over clr that Euclidean distances can be computed across selected Euclidean dimensions (Figure 4). Micronutrients can be balanced separately to avoid large variations due to tissue contamination. Moreover, macronutrients whose concentrations move in the same direction with time (N, P, K vs. Ca, Mg) [63,64] can be set apart to address timelessness, i.e., to reduce the dependence of the diagnosis on sampling time (Figure 5).
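The filling value and clr computations for the tissue example can be reproduced as below. This is a minimal sketch; the function names are hypothetical, and `total` is 100 for concentrations in % or 1000 for g kg−1.

```python
import numpy as np

def filling_value(concentrations, total=100.0):
    """Fv = total - sum of quantified components (total = 100 for %, 1000 for g/kg)."""
    fv = total - float(np.sum(concentrations))
    if fv <= 0:
        raise ValueError("quantified components exceed the measurement unit")
    return fv

def clr(parts):
    """Centered log ratios ln(x_i / G), G being the geometric mean of all parts."""
    logx = np.log(np.asarray(parts, dtype=float))
    return logx - logx.mean()

tissue = {"N": 4.0, "P": 0.25, "K": 5.0}       # % of dry matter, example from the text
fv = filling_value(list(tissue.values()))      # 90.75 %
parts = list(tissue.values()) + [fv]           # 4-part composition N, P, K, Fv
clr_values = dict(zip(list(tissue) + ["Fv"], clr(parts)))
print(round(fv, 2), round(clr_values["N"], 3))
```

The script prints the filling value of 90.75% and a clr value for N of about −0.143, matching the hand computation; clr values always sum to zero across the D parts.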
The CND based on clr aimed initially to replace DRIS for regional diagnosis [16,42,65-80]. Thereafter, a website service was made available to Brazilian growers (https://www.registro.unesp.br/#!/sites/cnd/). The clr differences between the diagnosed specimen ($\mathrm{clr}_j$) and the reference subpopulation ($\mathrm{clr}_j^{*}$) of true negative (high-yielding and nutritionally balanced) specimens, standardized by the standard deviation ($SD_j^{*}$), ranked nutrients in the order of their limitation to yield, as follows [80]:
$$I_j = \frac{\mathrm{clr}_j - \mathrm{clr}_j^{*}}{SD_j^{*}}$$
At that time, the reference subpopulation was selected at regional scale using the Cate-Nelson partitioning procedure, iterating the Mahalanobis distance M to maximize classification accuracy. M was computed as follows:
$$M = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}}\, \mathbf{S}^{-1}\, (\mathbf{x} - \boldsymbol{\mu})}$$
where $\mathbf{x}$ is the vector of log ratios of the diagnosed specimen, and $\boldsymbol{\mu}$ and $\mathbf{S}$ are the mean vector and covariance matrix of the reference subpopulation. Under multivariate normality, $M^2$ is distributed like a χ² variable. The variance matrix is used where clr values are relatively independent from each other [80]. The use of all D clr variables leads to a singular covariance matrix, because clr values sum to zero. This required removing one clr value, generally that of the filling value. [81] recommended using the ilr transformation rather than clr or the ordinary log transformation to conduct multivariate analysis, owing to the orthonormal basis of ilr variables.
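The standardized indices and the Mahalanobis distance can be sketched as follows. The sketch assumes the usual CND reading that the most negative index flags the most limiting nutrient; the function names and the reference statistics are hypothetical, and in practice they would come from the true negative subpopulation.

```python
import numpy as np

def cnd_indices(clr_diagnosed, clr_ref_mean, clr_ref_sd):
    """Standardized clr differences (clr_j - clr*_j) / SD*_j against true negative specimens."""
    diff = np.asarray(clr_diagnosed, dtype=float) - np.asarray(clr_ref_mean, dtype=float)
    return diff / np.asarray(clr_ref_sd, dtype=float)

def rank_limitations(names, indices):
    """Order components from the most negative index (assumed most limiting) upward."""
    return [names[i] for i in np.argsort(indices)]

def mahalanobis_sq(x, ref_mean, ref_cov):
    """Squared Mahalanobis distance (x - mu)^T S^-1 (x - mu).

    Use D - 1 log ratios (e.g. ilr, or clr without the filling value):
    the covariance matrix of all D clr values is singular because they sum to zero.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(ref_mean, dtype=float)
    return float(diff @ np.linalg.solve(np.asarray(ref_cov, dtype=float), diff))

# Hypothetical reference statistics and one diagnosed specimen
names = ["N", "P", "K"]
idx = cnd_indices([0.0, -1.0, 0.5], [0.1, -0.5, 0.4], [0.1, 0.25, 0.2])
print(idx, rank_limitations(names, idx))
```

`np.linalg.solve` avoids forming the explicit inverse; an M² value can then be compared with a χ² quantile when multivariate normality is a reasonable assumption.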
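The Cate-Nelson procedure partitioned specimens into four quadrants by point counting, and thus allowed setting apart the subpopulation of true negative specimens, avoiding the inclusion of false positive specimens (high-yielding but nutritionally imbalanced) in the reference subpopulation, as was the case for DRIS and other nutrient diagnostic approaches. Quadrants are interpreted as follows: true negative (TN), high-yielding and nutritionally balanced; false positive (FP), high-yielding but nutritionally imbalanced; true positive (TP), low-yielding and nutritionally imbalanced; and false negative (FN), low-yielding but nutritionally balanced. Model accuracy is determined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$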
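The point-counting logic can be sketched as below, iterating candidate imbalance cutoffs (for example Mahalanobis distances) at a fixed yield cutoff and keeping the most accurate partition. The function names and the toy data are hypothetical; ties are resolved by keeping the smallest cutoff.

```python
import numpy as np

def quadrant_counts(yields, imbalance, yield_cutoff, imbalance_cutoff):
    """(TN, FP, FN, TP) by point counting; 'negative' means high-yielding."""
    y = np.asarray(yields, dtype=float)
    m = np.asarray(imbalance, dtype=float)
    high = y >= yield_cutoff
    balanced = m <= imbalance_cutoff
    tn = int(np.sum(high & balanced))     # high-yielding and balanced
    fp = int(np.sum(high & ~balanced))    # high-yielding but imbalanced
    fn = int(np.sum(~high & balanced))    # low-yielding but balanced
    tp = int(np.sum(~high & ~balanced))   # low-yielding and imbalanced
    return tn, fp, fn, tp

def accuracy(tn, fp, fn, tp):
    return (tn + tp) / (tn + fp + fn + tp)

def best_imbalance_cutoff(yields, imbalance, yield_cutoff):
    """Try each observed imbalance value as cutoff and keep the most accurate one."""
    best_cut, best_acc = None, -1.0
    for cut in np.unique(imbalance):
        acc = accuracy(*quadrant_counts(yields, imbalance, yield_cutoff, cut))
        if acc > best_acc:
            best_cut, best_acc = float(cut), acc
    return best_cut, best_acc

# Toy data: yields (t/ha) and imbalance indices for six specimens
print(best_imbalance_cutoff([10, 9, 8, 3, 2, 1], [1, 2, 3, 4, 5, 6], yield_cutoff=5))
```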

Machine learning methods to process large datasets
An introduction to machine learning methods is provided in [82]. "When dealing with complexity, mechanistic models become less obvious. System thinking, implying stocks and flows, becomes difficult to tune where species interact through varying functions over space and time … most ecological patterns are nonlinear … Another approach could rely purely on phenomenology with machine learning. Using this approach, we identify key features to predict outcomes using pattern detection".
Machine learning is a family of artificial intelligence methods that includes object similarity algorithms (k-nearest neighbors), decision trees (e.g., Random Forest), boosted decision trees (e.g., Gradient Boosting), multiple regression, Gaussian methods, neural networks and several others, often tunable with hyperparameters. Machine learning methods can integrate numerous growth-impacting factors, including soil quality indicators such as those documented by precision agriculture technologies or supported by classical state- or industry-based agronomic models. Documenting as many growth-limiting factors as possible can decrease the number of assumptions required to diagnose nutrient problems at local scale, facilitating side-by-side comparisons. The confusion matrix generated by a machine learning (ML) model in classification mode classifies specimens into four quadrants by point counting, and thus allows setting apart true negative specimens.
Compositional Data Analysis can be combined with machine learning methods to customize plant nutrient requirements for application at local scale, where factor interactions shape fertilization decisions [17,46,83-86]. After running ML methods, it was suggested to use the ilr transformation to compute the Euclidean distance between the diagnosed (X) and successful (x) compositions, then compute the corresponding perturbation vector to rank nutrients in the order of their limitations to yield [44]. The perturbation vector is computed as follows [87]:
$$\mathbf{p} = \mathbf{x} \ominus \mathbf{X} = \mathcal{C}\left[\frac{x_1}{X_1}, \frac{x_2}{X_2}, \ldots, \frac{x_D}{X_D}\right]$$
where $\mathcal{C}$ denotes closure, i.e., rescaling the parts to a constant sum. The perturbation vector resembles the Deviation from Optimum Percentage [88]. Several log ratio transformation techniques other than clr and ilr are available but have not been tested yet [89].
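To illustrate the classification mode, the sketch below uses a plain k-nearest neighbors classifier written with NumPy rather than any particular ML library. It returns the share of neighbors above the yield cutoff as a rough probability of exceeding it, then counts the confusion-matrix quadrants with "negative" meaning high-yielding. Function names, features and data are hypothetical.

```python
import numpy as np

def knn_high_yield_probability(X_train, high_train, X_new, k=3):
    """Share of the k nearest training specimens (Euclidean distance) that are high-yielding."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    high_train = np.asarray(high_train, dtype=bool)
    # Pairwise distances between each new specimen and every training specimen
    dist = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return high_train[nearest].mean(axis=1)

def confusion_counts(observed_high, predicted_high):
    """(TN, FP, FN, TP), where 'negative' means high-yielding (no yield limitation)."""
    obs = np.asarray(observed_high, dtype=bool)
    pred = np.asarray(predicted_high, dtype=bool)
    tn = int(np.sum(obs & pred))      # observed and predicted high-yielding
    fp = int(np.sum(obs & ~pred))     # predicted low-yielding but observed high-yielding
    fn = int(np.sum(~obs & pred))     # predicted high-yielding but observed low-yielding
    tp = int(np.sum(~obs & ~pred))    # observed and predicted low-yielding
    return tn, fp, fn, tp

# Hypothetical features (e.g. one ilr balance and soil pH) and yield classes about a cutoff
X = [[0.0, 6.5], [0.1, 6.4], [0.2, 6.6], [1.5, 5.2], [1.6, 5.0], [1.4, 5.1]]
high = [1, 1, 1, 0, 0, 0]
proba = knn_high_yield_probability(X, high, X, k=3)
print(proba, confusion_counts(high, proba >= 0.5))
```

Scoring the model on its own training specimens, as here, overstates accuracy; cross-validation or a held-out set would be needed before selecting true negative specimens from a real dataset.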
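The nearest-neighbor search and the perturbation vector can be sketched as follows. For brevity the distance is computed on clr values, which gives the same Euclidean distance as any orthonormal ilr basis. Components with the largest perturbation need the largest relative increase; the function names and compositions are hypothetical.

```python
import numpy as np

def closure(x):
    """Rescale parts to a unit sum."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def clr(x):
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def nearest_successful(diagnosed, successful):
    """Index and distance of the closest successful (true negative) composition.

    The clr Euclidean distance equals the ilr distance for any orthonormal ilr basis.
    """
    dists = [np.linalg.norm(clr(diagnosed) - clr(s)) for s in successful]
    i = int(np.argmin(dists))
    return i, float(dists[i])

def perturbation(diagnosed, reference):
    """p = C(x_1/X_1, ..., x_D/X_D): the perturbation moving X onto x."""
    return closure(np.asarray(reference, dtype=float) / np.asarray(diagnosed, dtype=float))

def rank_by_perturbation(names, p):
    """Components needing the largest relative increase come first."""
    return [names[i] for i in np.argsort(p)[::-1]]

# Hypothetical 4-part tissue compositions (%): N, P, K, filling value
names = ["N", "P", "K", "Fv"]
diagnosed = [4.0, 0.25, 5.0, 90.75]
true_negatives = [[4.4, 0.35, 5.0, 90.25], [3.0, 0.20, 2.0, 94.80]]
i, eps = nearest_successful(diagnosed, true_negatives)
p = perturbation(diagnosed, true_negatives[i])
print(i, round(eps, 3), rank_by_perturbation(names, p))
```

Perturbing the diagnosed composition by p and closing the result returns the closed reference composition, which is the compositional counterpart of moving toward the successful neighbor.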

Information flow
A flow of information from data acquisition to dataset organization and fertilizer recommendations at subfield level was described for lowbush blueberry (Vaccinium angustifolium) in Quebec [46], cranberry (Vaccinium macrocarpon) in Quebec and Wisconsin [85], and several crops in Brazil [17,83,84]. Nutrient diagnosis at local scale requires a well-documented dataset, an accurate machine learning model, a reliable model prediction algorithm, and a large set of ecologically diversified true negative specimens (Figure 6).
The bottleneck of machine learning models is knowledge gain on the learning curve. As anticipated 200 years ago by Alexander von Humboldt [3], a comprehensive understanding of living systems requires collecting facts and local knowledge reliably. Data can be observational, as provided by growers, or experimental, as retrieved from the published and gray literature. Data sharing among stakeholders does not suffice to run machine learning: data must be collected in a uniform way and cleaned of errors. Missing data could be imputed carefully or documented from other databases, such as meteorological databases. Thereafter, data distributions must be checked to detect outliers.
A minimum dataset of meaningful features could be selected by adding or removing features (Occam's razor) without losing model accuracy during model training. Minimum datasets facilitate data acquisition by stakeholders at minimum cost and effort and make sense to them. The best-performing machine learning model is then selected. In general, the classification mode (yield classes delimited by a yield cutoff) is more accurate than the regression mode. The classification mode returns the probability of exceeding the yield cutoff targeted by the grower.

Local diagnosis
Features such as cultivar, rootstock, soil type or climatic conditions have been averaged to generate regional standards as "Frankenstein-built constructs" that may lead to inaccurate diagnosis at local scale, where factors interact [17]. The local diagnosis often differs from the regional diagnosis because the heroic assumption that "all controllable and uncontrollable factors but the ones being addressed are at equal or optimum levels" may fail at local scale. Indeed, the regional diagnosis is counterintuitive to growers' heuristics, which compare normal to abnormal situations under similar conditions in their neighborhood [86]. Fertilizer recommendations can be customized using the fertilization regime of the closest compositional neighbors as reference, by modifying regional recommendations, from response curves, or using an optimization algorithm (Figure 7).
At local scale, the closest compositional neighbors are the true negative specimens showing similar growing conditions and the smallest compositional Euclidean distance from the diagnosed specimen. The nearest neighbors were said to be located in "Humboldtian loci" or "enchanting islands" ("Ilhas Encantadas" in Portuguese) for a given set of uncontrollable factors. [43] pictured the grower as a compositional parachutist manipulating nutrients as paracords to land on the closest "enchanting islands". There, the resources to tackle controllable factors can be used parsimoniously and efficiently to reach reliable yield targets. Because the number of successful factor combinations is limited by the size and diversity of datasets, close collaboration between stakeholders is required to collect facts and document local knowledge reliably [6,7,90-94]. The decision to fix a yield target in classification mode depends not only on the grower's yield objective, but also on model precision and on the number of true negative specimens available as close neighbors. The number of true negative specimens must be large because they provide benchmark compositions and reliable yield targets under otherwise comparable growing conditions. As shown in Figure 8 for the Brazilian peach tree dataset [83], classification accuracy increased slightly while the number of true negative specimens decreased exponentially as the yield target increased. A smaller number of true negative specimens as benchmark compositions limits the model's capacity to select local conditions close to those of the diagnosed specimen. In this case, 16 t ha−1 was selected as the cutoff yield, a reasonable yield objective.
Fertilization recommendation using a Markov chain random walk algorithm to combine N, P and K dosages optimally to increase yield from 2300 to 5900 kg berries ha−1 for lowbush blueberry, considering a set of corrected site-specific controllable factors (reproduced from [46]).

Concluding remarks
In this chapter, we showed that fractal, compositional and machine learning models are promising alternatives to former empirical and mechanistic models to diagnose soil quality and plant nutrition at local scale and conduct side-by-side comparisons. Fractal kinetics confirmed that organic matter decomposition rates are controlled by protection mechanisms developing during organic matter transformation in soils. Site-specific coefficients can be assigned to decomposition rates under soil management practices. Compositional Data Analysis accounted for the special geometry of D-part compositions using log ratio transformations to tackle numerical bias before running numerical analyses. Machine learning methods can handle large and diversified datasets acquired through close collaboration between stakeholders. The CoDa methods can be combined with machine learning methods to diagnose nutrient imbalance and rank nutrients in the order of their limitation to yield by side-by-side comparison with successful neighbors.
This chapter emphasized the need to shift the paradigm from the regional to the local scale to diagnose soil quality and plant nutrition and customize recommendations. Local features can be assembled in large and diversified numbers to address reliable feature combinations, then reduced to a minimum dataset impacting the system's productivity and sustainability. Large and diversified datasets can be processed by methods of machine learning and compositional data analysis to reach the field or subfield scale. This requires collecting data uniformly and close collaboration between stakeholders.