The eigenvalue principal component results.
Natural resource scientists, concerned citizens, and government officials are interested in reconstructing disturbed environments for reforestation and agricultural productivity. We examined Clearfield County in Pennsylvania, USA, to develop a predictive model to reconstruct the landscape for seven agronomic crops (corn, corn silage, oats, alfalfa hay, red clover, bluegrass, and soybeans) and thirteen woody plants (white cedar, lilac, highbush cranberry, Amur maple, gray dogwood, peashrub, white spruce, white pine, red maple, red pine, jack pine, nannyberry, and white ash). A significant predictive model (p ≤ 0.001) was generated explaining 96.94% of the variance, with percent clay, bulk density, hydraulic conductivity, available water capacity, pH, percent organic matter, percent rock fragments, slope, topographic position, and electrical conductivity explored as main effect terms, plus squared terms, and first order interaction terms. The model is not over-specified and each predictor is significant (p ≤ 0.05). The modeling effort suggests that there are at least several clusters of vegetation preference dimensions based upon the terrain of the landscape. The model provides insight into how to reconstruct the disturbed environment for vegetation in the study area.
- environmental design
- landscape architecture
- soil science
1. Introduction and literature
Reclamation scientists and practitioners are interested in restructuring disturbed soils (neo-sols) to maintain vegetation productivity in a sustainable manner. Along with this interest, they are concerned with constructing predictive models (equations) to quantitatively assess the inherent productivity of a soil column. The literature addressing this interest originated in the 1980s with studies of reclaiming large surface coal mines. However, the quest was perplexing, with many unanswered questions: Did a different equation need to be developed for each and every plant material? How much of the soil column required measuring, and did the soil column have a weighted contribution? And which variables should be measured? This article is primarily about one researcher’s quest, and that of his affiliated colleagues, to address this issue.
By the late 1980s and early 1990s a methodology was developed that answered these questions [2, 3]. The framework for this methodology is illustrated in Figure 1. The approach attempts to predict the productivity of the soil column itself and is not a real time productivity model that assesses the current plant production based upon immediate weather conditions or greatly added nutrients beyond modest levels. Therefore, weather and soil additives are beyond the modeling effort.
Soil and vegetation productivity can be rather vague and variable in definition. This variation may surprise those who believed they had a very firm idea of what constitutes soil, especially in the biological and agronomic sciences. The broadest view comes from soil engineers, who divide the terrestrial surface into two categories: bedrock and soil. Bedrock consists of expansive stone-like structures that cannot be dislodged or moved, and soil consists of particles that can be moved. Thus, almost any inclusion can become a soil particle, such as plastics, organisms, large boulders, and many other objects. This viewpoint can be quite different from the classically trained agronomic soil scientist’s sensibilities concerning what soil is. The divergence in thinking occurs between one group who utilizes soil for vegetation (softscapes) and another group who primarily utilizes soil for the construction of buildings, walls, roads, and paving (hardscapes). However, in soil productivity studies, it is soil properties that are measurable, acting as a construct representing the soil profile.
A similar issue exists when defining soil productivity. It is a general idea with no firm definition. However, vegetation measures such as plant height per year or weight per area are constructs representing vegetation productivity.
A constructed model would be applicable to the study area where the soils and vegetation are sampled. The ideal study area would have all vegetation of interest grown across all soils in the study area and across normal, dry, and wet years. To initiate such a comprehensive study would take numerous field plots measuring plant growth for at least ten years. This is an extensive modeling project, costing up to 1 million US dollars to accomplish. Most research projects last only a few years and are funded at much lower levels. However, the United States Department of Agriculture directed the Natural Resources Conservation Service to conduct such work county by county. Not all counties in the United States have been evaluated; yet the American federal government has maintained a long-term vision to collect these data in an effort that is nearing 100 years old. The federal government has been excellent at collecting and publishing the data, which are accessible to all for free. The data are available to investigators who have the statistical ability to analyze the gathered information. This American database led to the development of the methodology (Figure 2) [2, 3].
Recently published research articles by Wen and Burley and by Corr et al. review many of the authorities and related modeling efforts to produce similar and related equations [5, 6]. The literature review in this chapter concentrates upon those studies that followed the methodology in Figure 2. The first reported equation was for Clay County, Minnesota, in a study by a team from the University of Manitoba published in 1989. This study suggested that many agronomic crops covary in their preferences for soil, meaning that an equation could be generated for a set of crops at one time, as opposed to having an equation for each individual crop. The team also produced equations for woody plants and for a combination of woody plants and crops [8, 9]. The team also discovered that sugarbeets (
At the time, it was somewhat unusual for a master’s thesis to generate numerous scholarly articles (five journal articles and three conference articles), as many such landscape theses result in few if any publications. However, the University of Manitoba encouraged such publications and activities. In addition, it was even more unusual that a landscape architect would generate that many articles. Zhi Yue, a co-author of this book chapter, wondered how Dr. Burley found a way to develop these equations. “When I was quite young (age 6), I lived in Edmonton, Alberta, and my parents’ friends were American academics who worked at the University of Alberta in disciplines such as anthropology, wildlife biology, and music. It was there that I met my first landscape architect (when I was 17), R. H. Knowles, who gave me a copy of his book. So, it seemed natural to me and expected that scholarly efforts would result in publication. When I was 22, I wrote my first article as an undergraduate and had it published when I was 23. I later learned that this modest output was greater than all the landscape architecture output in 1978 from my eventual home academic institution. In other words, academic landscape architects did not publish much back then. But it seemed natural to me that a curious landscape architect dedicated to academic scholarship might be the one who eventually developed these neo-sol productivity equations. The equations could have been developed by agronomists, horticulturists, soil scientists, foresters, or environmental engineers. Yet I learned that many disciplines are deep but not broad in education like a landscape architect and did not ask the same practical and applied questions a landscape architect might ask.
Plus, I had the fortune of working on a research-oriented Plan A Thesis (most landscape architecture master’s students do a project as a Plan B Thesis), meaning that my committee members at the University desired that I take courses in statistics, as many as I could take. So, for my coursework, I took introductory statistics, non-parametric statistics, regression analysis, analysis of variance, philosophy of science, and linear algebra. Later, I took multivariate analysis and statistical autocorrelation. My University of Manitoba professors prepared me so well that by the time I went to the University of Michigan for my Ph.D., the professors there in the School of Natural Resources waived any requirements for me to take a statistics course. Still, I took a course in epidemiological statistics, auto-correlation, and a course in risk analysis at my leisure. It was this statistical background that assisted me in the modeling efforts with skills and abilities often not present with others who were searching for a way to develop neo-sol equations,” observed Dr. Burley.
The next equation to be published was a study of Polk County, Florida. The study was initiated through Anthony Bauer, FASLA, a well-known landscape architect from Michigan State University specializing in surface mine reclamation, who gave a comprehensive exam question to Jon Bryan Burley in his quest for a Ph.D. The study revealed two different sets of vegetation preferences: a mesic-preferring group of plants and another group preferring wet conditions. The previous studies in Minnesota and North Dakota had revealed primarily equations for mesic settings. It can be quite unusual for an exam question to generate a paper, but this did not seem unusual to Jon Bryan Burley, as several papers were generated from assignments in graduate level courses in risk assessment, field studies, remote sensing, and anthropology [16, 17, 18, 19].
After the Florida study, the next mesic preference neo-sol equations were reported for three counties in the North Dakota coal fields: Oliver, Mercer, and Dunn counties [20, 21]. A paper was published that illustrated the spatial application of the equation in a surface mine setting to maximize productivity and minimize costs. Another paper examined the relationships between softscape soils and hardscape soils, identifying some soils that are suitable for both applications. In addition, the effort addressed various American state laws concerning the deployment and use of neo-sol equations.
These efforts led to the publication of a surface mine reclamation book and two national American Society of Landscape Architects (ASLA) research awards. By 2005, Dr. Burley became the American Society of Reclamation Sciences (ASRS) researcher of the year, and this work contributed towards his induction as a 2010 Fellow in ASLA for his research contributions. Dr. Burley’s goal was to see if he could develop a set of North American reclamation equations, even a global equation. But the research sputtered, as there seemed little new interest in funding or constructing such equation studies. “I was urged by my department chair to abandon my neo-sol productivity equations research and go where the money was, such as in healthy cities or climate change,” noted Dr. Burley. “But I am rather stubborn. So much of the earth was being disturbed by human activities that the equations could be helpful in a wide variety of applications where the original soil profile is disturbed. I did not want to have the reputation as an academic money ambulance chaser. And then something interesting happened. As an aging professor being successful in conducting research and publishing, I found international students and professors wanted to work with me and often they were interested in developing new equations as a means of learning how to do research.”
The renewed interest began with a French team who worked with Dr. Burley and developed a mesic preference equation for Grand Traverse County in Michigan. Then, Chinese scholars began working with Dr. Burley. There was a movement in the P.R. of China for Chinese academics to learn research methods and publish. This resulted in the study and publication of a silica mining region in Chippewa County, Wisconsin and a kaolinite mine area in Georgia. Coal mining in Montana, Wyoming, Colorado, and Texas was also explored [5, 28]. Corr et al. (Dustin Corr was an American graduate student of Dr. Burley) studied developing equations in the iron mining region of the Upper Peninsula of Michigan, deriving mesic equations and the first xeric set of equations, completing the current set of equations that have been developed with this methodology.
Zhi Yue, a professor in landscape architecture from Nanjing Forestry University, Nanjing, P.R. of China, was interested in applying this methodology to a study area in Clearfield County, Pennsylvania. This book chapter reports the results of applying the methodology in this study area. The study represents a continuing effort to construct a set of equations and data sets to potentially derive a set of universal equations for the eastern two-thirds of the United States.
2. Study area and methodology
2.1 Study area
Clearfield County is the study area, located in Pennsylvania (Figure 3). The county is composed of angular hills, farms, small towns, and forests (Figure 4). Coal mining and clay mining occur in the county. The county’s soil survey is one of the oldest in the United States, having been published in 1916 and updated in 1988. From southeast to northwest the terrain and soils change along ridges and river bottoms, resulting in a county that is physically quite diverse. Rose et al. published a paper concerning some of the environmental issues associated with acid mine drainage from coal mining in the county. Brown and Parizek examined hydrological groundwater flow for two mines in the county. Skousen and Zipper provide a recent overview of coal mining in the Appalachian region.
The methodology has been reported in detail by several publications [2, 3, 5, 6]. For this study, crop and plant harvest and growth information are sorted by soil type. The variables employ different measurements, such as weight per acre or height per year. Each variable is standardized to a mean of 0 and a standard deviation of 1. This standardization prevents measurement scales with large numbers from dominating the results of scales expressed in smaller units. For example, the weight measurement kg per hectare is a different type of measurement scale than the volume measurement of hectoliters per hectare. Standardizing allows apples and oranges (in this case corn and alfalfa hay) to be compared, as first proposed by Kendall. Then the standardized variables are assessed with principal component analysis (PCA). The analysis examines the covariance of crops and woody plants across soil types, developing latent dimensions with vector coefficients. Each dimension is orthogonal (independent) to the other dimensions. The maximum number of dimensions is equal to the total number of variables. Each dimension has an associated eigenvalue. Because the variables are standardized, the sum of the eigenvalues equals the number of variables. The larger the eigenvalue, the greater the proportion of variance the dimension explains. Typically, eigenvalues greater than one are considered potential candidates for further analysis. Often the first few eigenvalues explain 70% or more of the variance in the data set. The eigenvector coefficients facilitate the creation of a linear combination of values that, when summed, represents the expected vegetation response to the soil: a single dependent variable per soil profile [2, 3, 5, 6].
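The standardization and PCA steps described above can be sketched as follows. This is a minimal illustration using NumPy, not the software used in the study; the `yields` array, the function names, and the three-variable layout are hypothetical and stand in for the real vegetation data.

```python
import numpy as np

def standardize(X):
    """Scale each column to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def principal_components(X):
    """Return eigenvalues, eigenvectors, and scores, largest eigenvalue first."""
    Z = standardize(X)
    # Correlation matrix of the vegetation variables across soil types.
    R = np.corrcoef(Z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(R)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Each column of scores is one linear combination (one value per soil type).
    scores = Z @ eigenvectors
    return eigenvalues, eigenvectors, scores

# Hypothetical data: rows are soil types, columns are vegetation variables
# measured on different scales (assumed values, for illustration only).
yields = np.array([
    [120.0, 4.5, 0.30],
    [95.0, 3.8, 0.42],
    [140.0, 5.1, 0.25],
    [80.0, 3.0, 0.50],
    [110.0, 4.0, 0.35],
    [130.0, 4.9, 0.28],
])

eigenvalues, eigenvectors, scores = principal_components(yields)
retained = eigenvalues > 1.0  # dimensions that are candidates for modeling
```

A column of `scores` for a retained dimension would then serve as the single dependent variable per soil profile in the regression step.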
The independent variables are composed by gathering the soil variables of interest for a depth of 1.22 meters. Each variable is weighted by depth at 30.5 cm increments: the layer nearest the top contributes approximately 40% to plant growth, the second layer 30%, the third layer 20%, and the fourth layer 10%. By employing a weighting equation, one value per variable, per soil type can be computed. The effort by Doll and others settled the issue of where to measure soil variables and how to derive a single value for variables such as soil reaction and percent organic matter to describe the soil profile [36, 37].
The data source for the study has been published by Hallowich et al. The vegetation employed in the investigation includes: corn (
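The depth weighting can be illustrated with a short sketch. It assumes the four 30.5 cm layers are combined using the approximate contributions stated above (40%, 30%, 20%, 10%) as fixed weights; the published weighting equation may differ in detail, and the pH values are hypothetical.

```python
# Approximate layer contributions, top layer first (taken from the text).
LAYER_WEIGHTS = (0.40, 0.30, 0.20, 0.10)

def profile_value(layer_values, weights=LAYER_WEIGHTS):
    """Collapse per-layer measurements into one depth-weighted value."""
    if len(layer_values) != len(weights):
        raise ValueError("expected one measurement per 30.5 cm layer")
    return sum(w * v for w, v in zip(weights, layer_values))

# Hypothetical soil reaction (pH) for the four layers of one soil type.
ph_by_layer = [5.0, 5.5, 6.0, 6.0]
profile_ph = profile_value(ph_by_layer)
```

Because the weights sum to one, a profile with the same value in every layer keeps that value, while surface layers dominate when the layers differ.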
Regression analysis was performed employing main effects, squared terms, and first order interaction terms as independent variables from the soil profiles. Doll et al. proposed a hypothetical multi-order interaction model, but supplied no evidence that such a model actually represented any true predictive power. In addition, no investigator has demonstrated any theories or statistical models to suggest that independent variables beyond first order interactions are necessary or represent biological responses in soil profiles. Linear combinations derived from the vegetation dimensions formed the dependent variables [2, 3, 5, 6]. In this study, one of the linear combinations will be selected for equation development.
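To show how quickly the pool of candidate regressors grows, the sketch below expands one soil profile into main effects, squared terms, and first order (pairwise) interaction terms. The abbreviations follow the chapter's variable list, with `EC` added here as an assumed label for electrical conductivity; all numeric values are hypothetical.

```python
from itertools import combinations

def expand_terms(profile):
    """Build main effect, squared, and pairwise interaction terms."""
    terms = dict(profile)                      # main effects
    for name, value in profile.items():        # squared terms
        terms[f"{name}^2"] = value * value
    for a, b in combinations(profile, 2):      # first order interactions
        terms[f"{a}*{b}"] = profile[a] * profile[b]
    return terms

# Hypothetical depth-weighted values for one soil profile.
profile = {
    "CL": 22.0, "BD": 1.45, "HC": 1.3, "AW": 0.14, "PH": 5.2,
    "OM": 2.0, "FR": 15.0, "SL": 8.0, "TP": 3.0, "EC": 0.1,
}
candidate_terms = expand_terms(profile)
```

With ten soil variables this yields 10 main effects, 10 squared terms, and 45 interactions, which is why a selection procedure is needed to arrive at a compact equation.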
Past published equations have been somewhat complex, containing many main effects, squared terms, and first order interaction terms as variables, often more than 10 predictor variables. When selecting the best equation in step-wise regression, several criteria are employed. The first is that all of the regressors must be significant (p ≤ 0.05) under Type III sums of squares, meaning that in SAS each regressor’s p-value is assessed as though it were the last predictor added to the regression model. Second, an equation presenting the largest possible R-square is preferred, as it explains a larger proportion of the variance. Finally, an equation which does not violate Mallows’ Cp statistic is preferred, as then the model is not over-specified, meaning the Cp value must be larger than the number of regressors, thereby avoiding multi-collinearity issues. Once the best equation is selected, it is ready for interpretation and examination.
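The three selection criteria can be expressed as a simple filter. The sketch below is illustrative only: it uses the common textbook form of Mallows' Cp (with p counting the intercept), applies the chapter's stated rule that Cp must exceed the number of regressors, and runs on made-up candidate models rather than SAS output. The dictionary keys and model names are hypothetical.

```python
def mallows_cp(sse_subset, mse_full, n_obs, n_params):
    """Textbook Mallows' Cp: SSE_p / s^2 - n + 2p (p includes the intercept)."""
    return sse_subset / mse_full - n_obs + 2 * n_params

def acceptable(model, alpha=0.05):
    """Apply the significance and over-specification criteria from the text."""
    all_significant = all(p <= alpha for p in model["p_values"])
    not_over_specified = model["cp"] > len(model["p_values"])
    return all_significant and not_over_specified

def select_best(models):
    """Among acceptable models, prefer the largest R-square."""
    candidates = [m for m in models if acceptable(m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m["r_squared"])

# Hypothetical candidate models from a step-wise run (one p-value per regressor).
models = [
    {"name": "A", "r_squared": 0.91, "cp": 6.2, "p_values": [0.001, 0.02, 0.04]},
    {"name": "B", "r_squared": 0.95, "cp": 9.8, "p_values": [0.001, 0.01, 0.03, 0.20]},
    {"name": "C", "r_squared": 0.93, "cp": 3.1, "p_values": [0.001, 0.002, 0.01, 0.04]},
    {"name": "D", "r_squared": 0.92, "cp": 7.5, "p_values": [0.001, 0.003, 0.02, 0.04]},
]
best = select_best(models)
```

Here model B is rejected for a non-significant regressor and model C for its Cp value, so the choice falls to the acceptable model with the higher R-square.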
When interpreting the selected equation, the model may present significant variables that pose soil–plant relationships that have been poorly studied, especially when examining interaction terms. Main effect terms are often more widely studied and known, as illustrated by Buta et al. Given the number of possible variables to include in a model concerning soil properties, in many respects soil science has examined many of the main effects but has yet to study many of the interaction properties. Squared terms often indicate the limitations of any main effect or interaction term, counter-balancing the contribution of main effect variables and suggesting a curvilinear relationship.
The first five eigenvalues produced dimensions that have potential for equation development, as they are all greater than 1.0 (Table 1). The eigenvector coefficients are presented in Table 2. The results in Table 2 suggest that there is no linear combination that is suitable for all of the plants in the study (no set of coefficients that are all positive) and that the plants are divided into various preferences (each vector has positive and negative values).
Table 3 presents the best model in the regression analysis with the second principal component. The analysis results suggest the dimension is suitable for predicting plant growth for corn (
The regression results contain terms that are main effects (one), squared terms (three), and first order interaction terms (seven), for eleven regressors in total. The model is not over-specified, as the Cp score is 22.1294, suggesting that the results for this regression iteration would not be over-specified until there were nearly 22 regressors in the model. The R-square for the results in Table 3 is 0.9694. In other words, the proposed equation explains 96.94 percent of the variance in the second dimension.
Unlike most previous studies, which produced a universal mesic equation for all agronomic crops and woody plants, the results in Table 2 suggest that this was not possible here. Often the first set of eigenvector coefficients would be all positive, indicating a universal covariance and soil preference amongst the vegetation types studied. This was not true for Clearfield County. In addition, past results in Northern Michigan and in Florida suggested the vegetation studied was divided into two soil zones: in Michigan a mesic and xeric zone; in Florida a mesic and hydric zone. However, in Clearfield County, Pennsylvania, the vegetation may be responding to latent dimensions not as clearly identified and understood. In other words, the landscape of Clearfield County may be more complex and diversified. In comparison, a large three-county study area in North Dakota presented a more uniform landscape than Clearfield County, a smaller area.
If a reclamation team were interested in reclaiming a surface mine in Clearfield County, the results of the first dimension indicate that an ordination of the seven crop variables might lead to a universal crop equation, as all the eigenvector coefficients are positive for crops in the first dimension. But in such a landscape, the choice of woody plants for adjacent reclaimed areas may be limited. For example, the reclaimed soil may not be suitable for lilac, highbush cranberry, Amur maple, and red maple.
The equation derived from Table 3 is presented in Eq. 1. This equation can be employed in Clearfield County for a soil profile to predict plant growth. According to the statistical results, it will be wrong only one time in ten thousand applications. The value of such equations is that they provide an opportunity for the reclamationist to consider how to reconstruct the soil profile. The equations provide feedback.
Equation 1 suggests that topographic positions on the top of ridges should be avoided and that maximizing water holding capacity should be addressed. High topographic positions can have denser, clay soils. Well drained clay soils will be more productive. Increasing slopes eventually will reduce soil productivity; however, steeper slopes should contain more organic matter. High clay content and high soil reaction reduce productivity. Abundant rock fragments can be beneficial as long as the available water holding capacity is high. For the most part, the equation is suggesting the management of water and aeration. These general principles are derived by interpreting the equation.
Y = Vegetation Productivity.
TP = Topographic Position.
SL = % Slope.
CL = % Clay.
HC = Hydraulic Conductivity.
BD = Bulk Density.
OM = % Organic Matter.
FR = % Rock Fragments.
AW = Available Water Holding Capacity.
PH = Soil Reaction.
When Eq. 1 is applied to predict soil productivity, highly productive soils have scores between 2 and 3. For example, a soil similar in structure to the native Clymer channery loam found in the county, which is a deep loamy soil residing upon 8 to 15% slopes, is a fairly productive soil, with a computed score of 2.24. The equation corroborates the expected high productivity of this soil. In contrast, moderately productive soils have scores near zero. The Ernest silt loam is only a modestly producing soil. For soils with a similar soil profile, Eq. 1 predicts a score of −0.34. Finally, poorly producing soils have scores of −2 to −3. The Berks shaly silt loam on 15 to 25% slopes is a low productivity soil. When Eq. 1 is applied to soil profiles similar to the Berks shaly silt loam, the calculated value is −2.75.
The value of the equation is to predict vegetation productivity prior to reclamation. Reclamationists can propose various reconstructed profiles with the soil resources available, as illustrated by Burley in 1999. On a site being reclaimed, there may be a variety of soil profiles on various topographies and slopes. The sum of the total productivity per mine site area or disturbed environment can be computed and compared.
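One way to compare reconstruction proposals is an area-weighted sum of predicted scores. The sketch below assumes that each mapped unit contributes its Eq. 1 score multiplied by its area; the plan layouts and areas are hypothetical, while the scores reuse the three example values reported for Clearfield County soils.

```python
def total_productivity(units):
    """Sum of (area in hectares x predicted score) over all mapped units."""
    return sum(area_ha * score for area_ha, score in units)

# Hypothetical reclamation plans: (area in hectares, predicted Eq. 1 score).
plan_a = [(10.0, 2.24), (5.0, -0.34)]    # mostly Clymer-like profiles
plan_b = [(10.0, -0.34), (5.0, -2.75)]   # Ernest-like and Berks-like profiles

totals = {"plan_a": total_productivity(plan_a),
          "plan_b": total_productivity(plan_b)}
preferred = max(totals, key=totals.get)
```

Because the scores are standardized, the totals are useful for ranking alternative plans on the same site rather than as absolute yield estimates.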
Equations like the one presented in this study often raise intriguing questions concerning the properties of soils. While some might believe that soils have already been overly studied, the truth is that many of the interacting properties of a soil have been only modestly investigated, with primarily the main effects being explored. The various equations produced over the years provide insight into which interactions merit further study. In addition, soil scientists have been exploring and assessing new and different soil properties [5, 6]. These properties could be folded into future modeling efforts.
A full exploration of the potential models suggested by Table 2 would require a longer discourse and narrative than is possible in a book chapter. For example, Corr et al. took 43 pages to describe the various combinations of equations they discovered in their study. However, this book chapter does sufficiently cover the fundamentals concerning the literature, methodology, and the results from a new study area. This effort has been ongoing for over 30 years and has only explored the fringes of possibilities.
There are still many unanswered questions in the reconstruction of soil profiles, especially the long-term stability and productivity of any reconstructed profile. In addition, very few equations have been validated with studies growing crops and woody plants and comparing the results to past predictions.
5. Conclusion and future prospect
This investigation illustrates that it is possible to develop neo-sol productivity equations that are highly specific and rigorous. The science for this effort has been operational for at least 30 years. But the databases to conduct such work have often been collected and available for over 100 years, awaiting analysis. The databases are expensive and time consuming to build; yet when constructed and analyzed, they may offer insight into reclaiming disturbed landscapes. The study of Clearfield County revealed that in a landscape complex of large hills/small mountains and large broad valleys, vegetation preferences may be diverse, divided into dimensions of preference. While the first equations were developed for reclaiming environments disturbed by surface mining, landscapes are disturbed by many more types of human activities. In the future, these equations may render service in guiding the reconstruction and management of the soil for vegetation across many forms of disturbance.
Conflict of interest
The authors declare no conflict of interest.
The authors of this book chapter wish to acknowledge the scholarly contributions of the late Dr. Kimery Vories (1946–2019). Dr. Vories was a member of the American Society for Surface Mining and Reclamation, now the American Society of Reclamation Sciences. He earned a master’s degree at Western State University in Colorado and conducted Ph.D. work at the University of Amherst, Massachusetts and at Colorado State University. Back in the mid 1980s, Kimery was one of the scholars encouraging the initiation of the development of soil productivity equations for disturbed landscapes. His encouragement led to this line of research. Dr. Vories is known for his bat conservation activities with abandoned mines.