8 PCA – A Powerful Method for Analyze Ecological Niches

Principal Component Analysis (PCA) is a multivariate statistical technique that uses an orthogonal transformation to convert a set of correlated variables into a set of orthogonal, uncorrelated axes called principal components (James & McCulloch 1990; Robertson et al., 2001; Legendre & Legendre 1998; Gotelli & Ellison 2004). Ecologists most often deal with multivariate datasets. This is especially true in field ecology, which is why PCA is an attractive and frequently used method of data ordination in ecology. PCA condenses data on a multivariate phenomenon into its main, representative features by projecting the data into a two-dimensional presentation. The two resulting resource axes are independent, and although they reduce the number of dimensions (i.e., the original data complexity), they retain much of the original relationship between the variables, i.e., the information or explained variance (Litvak & Hansell 1990). This helps to focus attention on the main characteristics of the phenomenon under study. Conveniently, if the first few principal components (PCs) explain a high percentage of the variance, environmental variables that are not correlated with these PCs can be disregarded in the analysis (Toepfer et al., 1998). In addition, applying PCA has become relatively user-friendly because numerous programs carry out the computational procedure with ease (Dolédec et al., 2000; Guisan & Zimmerman 2000; Robertson et al., 2001; Rissler & Apodaca 2007; Marmion et al., 2009).


Introduction
PCA has been widely used in various fields of investigation and for different tasks. Many authors have used PCA for its main purpose, i.e., to reduce strongly correlated data groups or layers. These studies concern either environmental variation (e.g., Kelt et al., 1999; Johnson et al., 2006; Rissler & Apodaca 2007; Glor & Warren 2010; Novak et al., 2010a; Faucon et al., 2011; Grenouillet et al., 2011), the characteristics of the investigated species or communities (e.g., Kingston et al., 2000; Pearman 2002; Youlatos 2004; Kitahara & Fujii 2005), or both, sometimes in combination with detrended correspondence analysis (DCA), canonical correspondence analysis (CCA) and other ordination methods (e.g., Warner et al., 2007; González-Cabello & Bellwood 2009; Marmion et al., 2009; Mezger & Pfeiffer 2011). The application of PCA has helped in various fields of ecological research, e.g., in determining enterotypes of the human gut microbiome on the basis of the specialization of their trophic niches (Arumugam et al. 2011). In aquatic habitat studies, it has been applied to evaluate habitat suitability and its regionalization, and to analyze fish abundance and its seasonal and spatial variation, changes in lake ecosystem organization, etc. (Ahmadi-Nedushan et al., 2006; Blanck et al., 2007; Catalan et al., 2009). It has also been applied in analyzing changes in farming systems (Amanor & Pabi 2007).
In many cases, PCA has been used as a source or supporting analysis within more complex analyses, such as the study of adaptive fish radiation, strongly influenced by trophic niches and water depth (Clabaut et al., 2007), the prediction of the potential spatial extent of species invasion (Broennimann et al., 2007) and multi-trait analysis of intra- and interspecific variability of plant traits (Albert et al., 2010). Chamaillé et al. (2010) performed PCA and Hierarchical Ascendant Classification to evaluate environmental data, on the one hand, and human and dog population density data, on the other, in order to detect a possible ranking of regions differently threatened by leishmaniasis.
Niche differentiation and partitioning is an ecological issue in which PCA is frequently used. It enables efficient differentiation among related parapatric species (Dennis & Hellberg 2010).
To approach the problem, authors use various available input data, which may be other than direct measurements of the niche. Since body shape and composition can readily be related to adaptation to the environment, morphometry serves as an adequate surrogate approach for studying the niche. Morphometric characteristics represent a data set suitable for evaluating the organism-environment relationship; besides PCA, Lecomte & Dodson (2005) additionally used discriminant analysis for this purpose. Inward et al. (2011) applied PCA to determine the morphological space of dung beetles representing regional faunas. Claude et al. (2003) demonstrated, using the case of turtles, that geometric morphometry, evaluated with PCA, can help to analyze the evolution of convergence. Morphometric differences between related species can readily point to niche partitioning, reflecting differences at the spatial or trophic level (Catalan et al. 2009; Niet-Castañda & Jiménez-Jiménez 2009; Novak et al., 2010b).
Hypogean habitats such as caves and artificial tunnels are relatively simple habitats owing to their low diversity, low production, and the constancy of their environmental factors (Culver 2005); they are thus suitable for investigating an environmental niche in situ. In this contribution we demonstrate the use of PCA in exploring an ecological niche in two case studies from caves in Slovenia. In the study of the three most abundant hymenopteran species that settled in the caves for rest, we could evaluate only their spatial niches, on the basis of the commonly measured environmental parameters. PCA was applied at two levels: (1) in the exploratory data analysis, it was used as an efficacious tool to reduce the parameters to two principal components (PCs); (2) in hypothesis testing, the PCs of all three species were subjected to analysis of variance to detect differences between the spatial niches of the three species.

Ecological niche concept
The ecological or environmental niche is one of the most useful concepts for exploring how and where organisms live and how they are related to their environment. After its introduction (Grinnell 1917), this concept changed considerably as new knowledge was acquired about habitats and the functioning of the organisms within them. There are three main views in the evolution of the concept. The first view is that niche equals habitat, which is a multidimensional presentation of conditions in the local physical place where an organism lives. The second and most frequently applied understanding is the functioning of an organism within its concrete environment, which concerns acting on and responding to the organism's physical environment as well as to other organisms, originally within a community. Since the community concept is in the course of radical change (Ricklefs 2008), it is convenient to replace the term community with a more general one, assemblage. In practice, habitat and the function of an organism are often discussed as spatial, temporal and trophic niches. The third view is that the niche refers to variables within the whole range of the distribution area of an organism, which provides much information about organism-environment relationships on the global level (Soberón 2007).
The century-old niche concept has had many ups and downs in its history. Since the initial idea was generated, it has evolved considerably, being forgotten, neglected and/or misinterpreted, and recovering in different ways (Collwell & Rangel 2009). In the last decade, an intensive debate has taken place about redefining the meaning, importance and suitability of different aspects of the niche, including the methods of measuring it. This development accurately reflects the importance of its resurrection for the progress of ecological, evolutionary and related investigations. Today there are numerous niche concepts (Chase & Leibold 2003).
Grinnell (1917) conceptualized the idea in a case study of a bird, the California Thrasher. He wrote that the niche comprehends the various circumstances to which a species is adapted by its constitution and way of living. He also wrote that no two species in a single fauna have precisely the same niche relationship, a fact which indicates the different roles of species within a community. What Grinnell called a niche was later understood for a long time as habitat, i.e., the sort of place where an organism lives. By a niche, Elton (1927) meant the place of an (animal) organism in its community, its relation to food and enemies and, to some extent, to other factors. He stressed the importance of the food (trophic) dimension of the ecological niche. Later (Elton 1933) he described a niche as a species' mode of life, comparable, for instance, to professions in a human community. Hutchinson (1957) attributed niches to species instead of environments (Collwell & Rangel 2009), and explained the niche as part of an abstract multidimensional space, the ecospace, representing the whole range of intracommunity variables and interactions. Within this space, the way of life of a species is balanced by its tolerances and requirements. He called the overall potential niche the fundamental niche, in contrast to the realized niche, which is narrower because of the negative impact of competitors and predators.
The niche concept evolved further through discussion of several points of the competitive exclusion principle, i.e., the assumption that two species with identical environmental requirements cannot coexist indefinitely in the same location. According to Hardin (1960), the niche dimensions are represented by different abiotic and biotic variables concerning a species, such as its life history, habitat, position in the food chain, and geographic range. Whittaker & Levin (1975) understood the niche as a species' requirements and its position in relation to other species in a given community. Recently, the importance of studying the niche to improve our understanding of the functioning of species and whole ecosystems has again become a widely discussed topic in global ecology. Pullian (2000) showed that, besides competition, other factors, such as niche width, habitat availability and dispersal, influence the observed relationship between species distribution and the availability of suitable habitat, and should thus be incorporated into Hutchinson's niche concept.
Soberón (2007) justified the separation of niches into two classes, the Grinnellian and the Eltonian, on the basis of their focuses. Hutchinson's (1957) consideration can be applied to both groups. The Grinnellian class of niches is based on noninteractive variables, such as average temperature, precipitation and solar radiation, and on environmental conditions on a broad scale. These variables are relevant to understanding coarse-scale ecological and geographic properties of species. The Eltonian class of niches, in contrast, focuses on bionomic variables, such as biotic interactions and resource-consumer dynamics, which can be measured principally on a local scale. Whereas datasets of variables of the Grinnellian niche group have been rapidly compiled worldwide, very little theory has been developed explicitly about this. On the other hand, variables for considering the much more dynamic and complex Eltonian niches have never been available (Soberón 2007). Both classes of niches are relevant to understanding the distribution of individuals of a species, but the Eltonian class is easier to measure at the high spatial resolutions characteristic of most ecological studies, whereas the Grinnellian class is suited to the low spatial resolution at which distributions are typically defined (Soberón 2007). Applying the modelling of species distribution to the distribution constraints is strongly encouraged, to provide better insight into species distributions (Kearney & Porter 2009; Bellier et al., 2010).
It is important to understand that a niche is not a conservative concept, but a consequence of the complexity of the subject, which may refer to very different features of the fundamental niche, with different ecological and evolutionary properties (Soberón & Nakamura 2009). It has been demonstrated that, on the one hand, inconsistent adaptive pressures may give rise to a whole palette of niche diversification (e.g., Romero 2011), while, on the other hand, convergent evolution in various combinations takes place within the multidimensional niche space (e.g., Hormon et al., 2005).

Ordination and the PCA concept
Ordination is a method in multivariate analysis used in exploratory data analysis. Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics in an easy-to-understand form, often in graphs. In this procedure no statistical modelling is used. The order of objects in ordinations is characterized by values of multiple variables. Similar objects are ordinated near each other and vice versa. Many ordination techniques exist, including principal components analysis (PCA), non-metric multidimensional scaling (NMDS), correspondence analysis (CA) and its derivatives, like detrended CA (DCA), canonical CA (CCA), Bray-Curtis ordination, and redundancy analysis (RDA), among others (Legendre & Legendre 1998; Gotelli & Ellison 2004).
PCA is widely useful in considering species; it is appropriate for the analysis of community composition data or as gradient analysis. Gradient analysis is an analytical method used in plant community ecology to relate the abundance of various species within a plant community to various environmental gradients by ordination or by weighted averaging. These gradients are usually important in plant species distribution, and include temperature, water availability, light, and soil nutrients, or their closely correlated surrogates (Lepš & Šmilauer 2003).
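The orthogonal transformation behind PCA can be illustrated with a minimal sketch. It assumes NumPy, uses synthetic data rather than any data from this study, and computes the components from the correlation matrix; the function name `pca_correlation` and the three synthetic variables are hypothetical and serve only as illustration.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = samples, columns = variables)."""
    X = np.asarray(X, dtype=float)
    # z-standardize each variable: mean 0, sample standard deviation 1
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # correlation matrix of the original variables
    R = (Z.T @ Z) / (Z.shape[0] - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                       # principal component scores
    explained = eigvals / eigvals.sum()        # proportion of variance per PC
    return eigvals, eigvecs, scores, explained

# Synthetic example: two strongly correlated variables and one unrelated variable
rng = np.random.default_rng(0)
temp = rng.normal(10, 2, 100)
X = np.column_stack([temp, temp + rng.normal(0, 0.5, 100), rng.normal(80, 5, 100)])

eigvals, eigvecs, scores, explained = pca_correlation(X)
print("eigenvalues:", np.round(eigvals, 3))
print("explained variance:", np.round(explained, 3))
```

The scores on different components are uncorrelated, and the eigenvalues sum to the number of variables, which is why each eigenvalue can be read as a share of the total variance.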

Environmental niche of three hymenopteran and two spider species
Between 1977 and 2004, 63 caves and artificial tunnels were ecologically investigated in Slovenia; the three most abundant Hymenoptera species found in these studies have been ecologically evaluated (details in Novak et al. 2010a). Many environmental data were collected in the caves. The following abbreviations of the environmental variables are used: Dist-E = distance from entrance; Dist-S = distance from surface; Illum = illumination; PCS = passage cross-section; Tair = air temperature; RH = relative air humidity; Tgr = ground temperature; HY = substrate moisture. The hymenopteran spatial niche breadth was originally represented by nine variables. These variables refer to the environmental conditions at the individual placements within the caves. The variation was subjected to PCA, and differences in niche overlap were tested using one-way ANOVA. In the following, we demonstrate the analysis of occupied physical space in the three species: Amblyteles armatorius (n = 16), Diphyus quadripunctorius (n = 42) and Exallonyx longicornis (n = 44).
PCA assumes approximately normally distributed data. This is often not the case with the environmental data provided by field investigations, as in our case. In variables presented as proportions or ratios, e.g., humidity, this problem can be overcome with the arcsine transformation. In variables stretched over a large scale of values, e.g., illumination and passage cross-section, this can be achieved by transformation to the logarithmic scale. In our study, we used the Kolmogorov-Smirnov (K-S) test to check the data for normality. To normalize the distributions, we transformed the air humidity and substrate moisture data (arcsine) (Fig. 1), and the passage cross-section and illumination data (log) (Fig. 2). PCA is sensitive to the relative scaling of the original variables. We therefore z-standardized the data. Here we demonstrate the relations among the nine environmental variables with Pearson correlation coefficients (Table 1).
To obtain detailed information on the pattern of variation, the set of nine environmental variables was subjected to PCA. In this way, we obtained nine PCs. The new values are called principal component scores. The eigenvalues and proportions of explained variance are presented in Table 2, where PC variance is in progressive decline. The last four components represent such a small proportion of the total variance that it is reasonable to ask whether they describe any biotic response or not. A common rule is to interpret only those components that contribute more than 5% of the total variance. In this case study on Hymenoptera, PC1 to PC5 meet this criterion and together account for 92.5% of the variance, while 7.5% of the variance remains unexplained. The contribution of each component to the total variance is shown in a scree plot (Fig. 3). The large differences between the variances of the first three PCs and the much smaller ones of the others are clearly evident. Projection of the variables on the factor plane revealed that the 1st and 2nd PC axes explained 37.6% and 29.5% of the total variance, respectively.
The Pearson correlation coefficients and elementary graphics associated with the relations between the PCs and the environmental variables are presented in Table 3 and Fig. 4, respectively. In this graphic presentation, the variables are placed within a circle, called the correlation circle, with the pair of factor axes as its axes. The stronger the correlation between a variable and a factor, the longer the projection of that variable's arrow on the corresponding factor axis. The variables that are correlated with a particular factor can thus be identified, thereby providing information as to which variables can explain the given factor. This is demonstrated in Fig. 5. PC1 best explains the variability of air and ground temperature, and illumination: these values increase with decreasing PC1, while the values of airflow and distance from the entrance increase with increasing PC1. PC2 best explains the variability of air humidity, substrate moisture and both distance from the entrance and from the surface: these values increase with increasing PC2. In the case presented, the explanatory power of PCA with respect to variable importance is evident. The PCs thus serve as adequate surrogates to explain the spatial component of the niche.
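The preprocessing and component-retention steps described above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, not the cave measurements; the arcsine transformation is taken in its common arcsine square-root form and the logarithm in base 10, neither of which is specified in the text; the normality check is omitted; and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 102  # hypothetical sample size; the values below are synthetic

# Three synthetic variables standing in for field measurements
rh = rng.uniform(0.6, 1.0, n)        # relative humidity as a proportion (0-1)
illum = rng.lognormal(0.0, 1.5, n)   # strongly right-skewed variable
t_air = rng.normal(9.0, 2.0, n)      # roughly symmetric variable

# Assumed transformations: arcsine square root for proportions,
# log10 for variables spanning several orders of magnitude
rh_t = np.arcsin(np.sqrt(rh))
illum_t = np.log10(illum + 1.0)      # +1 guards against zero readings

# z-standardization, so that no variable dominates through its units
X = np.column_stack([rh_t, illum_t, t_air])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigenvalues of the correlation matrix, largest first
R = np.corrcoef(Z, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
explained = eigvals / eigvals.sum()

# Retention rule from the text: interpret PCs contributing more than 5%
keep = explained > 0.05
print("explained variance:", np.round(explained, 3))
print("components retained:", int(keep.sum()))
print("cumulative variance retained:", round(float(explained[keep].sum()), 3))
```

Plotting `explained` against the component number gives the scree plot; the cumulative sum of the retained components corresponds to the total explained variance reported for the interpreted PCs.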
For the interpretation of these outputs, one needs good biological and ecological knowledge about the organisms under study. The projections of the environmental dimensions of the three species are represented by polygons in Fig. 5. A more elaborate figure has been published elsewhere (Novak et al. 2010a).
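The coordinates used in a correlation circle can be sketched in a few lines. The example assumes NumPy and synthetic data; the variable names `var_a` to `var_d` are hypothetical. For a PCA on standardized variables, the correlation of a variable with a component equals the corresponding eigenvector entry multiplied by the square root of the eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
base = rng.normal(size=n)
# Synthetic variables: two tied to a common gradient (opposite signs), two unrelated
X = np.column_stack([
    base + rng.normal(scale=0.3, size=n),
    -base + rng.normal(scale=0.3, size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])
names = ["var_a", "var_b", "var_c", "var_d"]  # hypothetical variable names

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Z @ eigvecs

# Correlation of each variable with each PC (loadings)
loadings = eigvecs * np.sqrt(eigvals)

# Arrow tips in the correlation circle for the PC1-PC2 plane
circle_xy = loadings[:, :2]
radius = np.sqrt((circle_xy ** 2).sum(axis=1))
for name, (x, y), r in zip(names, circle_xy, radius):
    print(f"{name}: r(PC1) = {x:+.2f}, r(PC2) = {y:+.2f}, arrow length = {r:.2f}")
```

An arrow that nearly reaches the unit circle belongs to a variable that is well represented in the PC1-PC2 plane, whereas a short arrow indicates that the variable is mainly associated with later components.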
Moreover, PCs enable the testing of differences between environmental niches. For this purpose, in hypothesis testing, the PCs defining the niches were subjected to analysis of variance for differences between the three species. One-way ANOVA was used to test differences between species in the 1st and 2nd principal components (F and p values in Table 3). In this way, PCA allows testing of differences between niches. The same analyses of the spatial niches were carried out on two co-existing spider species, Meta menardi and Metellina merianae (Novak et al. 2010b). In this case, the variations in temperature, humidity, airflow and illumination were subjected to PCA. The 1st and 2nd PCs together explained 70.4% of the variation (Figs. 6 and 7). In this way, we presented the course of temporal changes in the spatial niches of the two spiders.
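The testing step can be sketched as a one-way ANOVA on the scores of a single component. The sketch below assumes NumPy and computes only the F statistic and its degrees of freedom; the scores are synthetic, only the group sizes (16, 42 and 44) are taken from the text, and the function name `one_way_anova_f` is hypothetical. The p value would follow from the F distribution with the two degrees of freedom, for instance as returned by `scipy.stats.f_oneway`.

```python
import numpy as np

def one_way_anova_f(groups):
    """Return the F statistic and degrees of freedom of a one-way ANOVA."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n_total = len(groups), all_values.size
    # variation of the group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # variation of the observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Synthetic PC1 scores for three groups; only the group sizes follow the text
rng = np.random.default_rng(3)
pc1_by_species = [
    rng.normal(-0.8, 1.0, 16),
    rng.normal(0.0, 1.0, 42),
    rng.normal(0.5, 1.0, 44),
]
f_stat, df_b, df_w = one_way_anova_f(pc1_by_species)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Repeating the call with the PC2 scores gives the second test; a large F relative to its degrees of freedom indicates that the species occupy different positions along that component.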

Discussion
Since computer techniques and technologies have enabled efficacious computation of PCA, it has become one of the most useful tools in ecology in various fields of use. Still, one can readily notice that many problems appear when its applicability for different purposes is to be estimated. On the one hand, reservations occur because of the credibility or interpretability of the data. Austin (1985), for example, stated that animal ecologists often use PCA without discussing the ecological implications of its linear model, although the PCA axes are not necessarily ecologically independent, and there is no necessary ecological interpretation of the components. Besides, it is particularly notable that two- and three-dimensional data using Gaussian species response curves can produce complex flask-like distortions in which the underlying gradient structure is impossible to recognize without prior knowledge. In this sense, some authors (e.g., Hendrickx et al., 2007), in a specific context, decided not to rely on the obtained PCA axes, since they obscured additive and interactive effects among variables that were partially correlated. Others highlighted that the use of PCA on spatially autocorrelated data is not appropriate in a spatial context (Novembre & Stephens 2008).
Indirectly, this is the same aspect of the problem that Nakagawa & Cuthill (2007) explain on a general level; they direct attention to the mostly neglected fact that we are still dealing with and discussing statistical rather than biotic importance. Null hypothesis significance testing, for example, provides us neither with the magnitude of an effect of interest, nor with the precision of the estimate of that magnitude. They thus advocate (ibid.) the presentation of biotic relevance, i.e., measurement of the magnitude of an effect on an organism and its confidence interval. These are the real goals in biology irrespective of the statistical values measured. In this sense, biologists and ecologists are often "trapped in statistics", in "statitraps", rather than dealing with biotic phenomena themselves.
In addition, in comparison with other methods, PCA sometimes proves not to be as efficacious. In their study on the ecological niches of two Ceratitis flies, De Meyer et al. (2008) aver that the PCA model is apparently not as good as the GARP (genetic algorithm for rule-set prediction) model at capturing the species-environment relationship. This is probably because the PCA model cannot account for nonlinear species-environment relationships in the way that GARP can.
Additionally, Ricklefs (2008) established that a local ecological community, which consists of those species whose distributions include a particular point in space and time, is an epiphenomenon with relatively little explanatory power in ecology and evolutionary biology. To understand the coexistence of species locally, one must understand what shapes species distribution within regions, but the factors that constrain distribution within regions are poorly understood. This evidence of the disintegrating concept of "community" requires reconsideration of our prior explanations of coexisting species in the subterranean environment. Although many authors use the term community when dealing with its biota, their methods of discussion reveal that they mostly use the term in its wide, much looser meaning rather than referring to a specific species composition and its inter- and intraspecific relations. This is especially evident in their continual references to the fact that the biology and/or ecology of many species remain unknown.
In this contribution we have presented common uses of PCA and its efficacy in assessing multivariate data, such as those provided by the most usual and convenient field investigations in caves. With respect to the dynamic niche conception (Soberón 2007; Soberón & Nakamura 2009), our case studies deserve additional comment. On the one hand, the essence of the "disintegration community concept" sensu Ricklefs (2008) can well be perceived in the widely distributed species that we encounter in caves. These include the animal species appertaining to the parietal association, i.e., those species found especially on the walls and ceiling near cave entrances (Jennings 1997). The three hymenopteran and two spider species discussed in this study belong among them. The parietal association is a heterogeneous assemblage of species with respect to their geographical distribution and activity in the subterranean environment, thus directly raising the need for more complete knowledge of their "biogeography of the species" sensu Ricklefs (2008). On the other hand, in highly endemic troglomorphic species, i.e., those well adapted to the hypogean habitat (e.g., Gibert and Deharveng 2002; Christman et al., 2005; Culver and Pipan 2009), the disintegration of the community concept might have little or no significant impact on the findings under discussion.

Conclusions
PCA is a useful tool enabling ordination of environmental variables in ecology. An environmental niche comprises a multidimensional data set, and PCA is an appropriate statistical tool to handle such data sets. In this way, PCA readily provides the means to explain the variance magnitudes related to the environmental variables that represent the environmental niche. Nevertheless, one must be aware that PCA output depends on the input data, which can never cover all dimensions of an environmental niche. Besides, PCA may obscure many effects among partially correlated variables. As with other statistical approaches, it is necessary to consider the results carefully, applying a broad knowledge of the biology and ecology of the organisms under study in order to avoid statistical artifacts.

Acknowledgement
We are indebted to Michelle Gadpaille for valuable improvement of the language. This study was partly supported by the Slovene Ministry of Higher Education, Science and Technology within the research program Biodiversity (grant P1-0078).

Fig. 1. Distribution of raw relative air humidity data (a) and of the data after arcsine transformation (b), with the normal distribution curve.

Fig. 2. Distribution of raw passage cross-section data (a) and of the data after logarithmic transformation (b), with the normal distribution curve.

Fig. 3. Scree plot of the eigenvalues and the percentage of variance explained by each component, shown in decreasing order.

Fig. 4. Projection of the nine ecological variables on the plane of the 1st and 2nd factors. The variables are shown as arrows within the correlation circle.

Fig. 5. Ordination of the nine environmental variables on the 1st and 2nd PC axes. Ellipses (95% confidence) represent the spatial niches of the three hymenopteran species.

Table 1. Pearson correlation coefficients among the nine environmental variables. Significant correlations are in bold (upper row: r; lower row: p).

Table 2. Eigenvalues and percentages of explained variability.

Table 3. Pearson correlation coefficients between the environmental variables and the first two principal components (PCs), and F and p values of the one-way ANOVA testing the first two PCs among the three hymenopteran species.