Table 1. Age composition of the interviewed local villagers.
With the rapid advancement and popularity of geospatial technologies such as location-aware smartphones, mobile maps, etc., average citizens nowadays can easily contribute georeferenced wildlife data (e.g., wildlife sightings). Due to the wide spread of human settlements and lengthy living histories of citizens in their local areas, citizen-contributed wildlife data could cover large geographic areas over long time spans. Citizen science thus provides great opportunities for collecting wildlife data of extensive spatiotemporal coverage for wildlife habitat assessment. However, citizen-contributed wildlife data may be subject to data quality issues, for example, imprecise spatial position and biased spatial coverage. These issues need to be accounted for when using citizen-contributed data for wildlife habitat assessment. Geovisualization and geospatial analysis capabilities provisioned by geographic information systems (GISs) can be adopted to tackle such data quality issues. This chapter offers an overview of citizen science as a means of collecting wildlife data, the roles of GIS to tackle the data quality issues, and the integration of citizen science and GIS for wildlife habitat assessment. A case study of habitat assessment for the black-and-white snub-nosed monkey (Rhinopithecus bieti) using R. bieti sightings elicited from local villagers in Yunnan, China, is presented as a demonstration.
- citizen observers
- local villagers
- wildlife sightings
- geovisualization interview
- habitat suitability mapping
- data quality
- black-and-white snub-nosed monkey (Rhinopithecus bieti)
Habitats provide resources such as food, shelter, potential nesting sites, and mates that wildlife need to survive and reproduce. Understanding the habitat requirements or preferences of wildlife and assessing the quality of wildlife habitat are of great importance to conservation biologists and conservation managers. For example, wildlife habitat assessment supports conservation practices such as ex situ conservation or reintroduction and restoration, predicting the risk of invasive species, systematic conservation planning, assessing threats, and setting conservation priorities [3, 4, 5, 6].
One approach to assessing wildlife habitat quality is to produce habitat suitability maps that show the spatial variation of habitat suitability. Habitat suitability mapping is often carried out in a geographic information system (GIS). Conceptually, spatial prediction of wildlife habitat suitability requires GIS data layers characterizing the environmental conditions (environmental data) and knowledge of the relationship between wildlife habitat suitability and environmental conditions. Based on this relationship and the environmental conditions at a location (e.g., a pixel), the habitat suitability at that location can be inferred. Inferring habitat suitability at every location in the study area results in a suitability map. Such a habitat suitability map can then be used to assess the spatial variation of wildlife habitat quality and to support conservation. With the rapid development of geospatial technologies, environmental data for characterizing environmental conditions are becoming abundant and increasingly available [9, 10]. The key to wildlife habitat assessment through habitat suitability mapping therefore lies in obtaining knowledge of the relationship between wildlife habitat suitability and environmental conditions (the environmental niche).
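The per-pixel inference described above can be illustrated with a minimal sketch. The triangular elevation response and its breakpoints (`low`, `optimum`, `high`) are purely hypothetical stand-ins for a suitability-environment relationship; they are not taken from the chapter or from any species.

```python
def elevation_suitability(elev, low=2800.0, optimum=3300.0, high=3800.0):
    """Hypothetical triangular response: 0 outside [low, high], 1 at optimum."""
    if elev <= low or elev >= high:
        return 0.0
    if elev <= optimum:
        return (elev - low) / (optimum - low)
    return (high - elev) / (high - optimum)


def suitability_map(layer, relationship):
    """Apply the relationship to every pixel of an environmental layer."""
    return [[relationship(value) for value in row] for row in layer]


# Toy elevation layer (metres); each inner list is one row of pixels.
elevation = [
    [2800.0, 3050.0, 3300.0],
    [3550.0, 3800.0, 4000.0],
]
print(suitability_map(elevation, elevation_suitability))
# [[0.0, 0.5, 1.0], [0.5, 0.0, 0.0]]
```

In practice the relationship would involve several environmental layers and would be derived from wildlife data rather than assumed.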
Data-driven approaches are most commonly adopted in deriving the relationship between wildlife habitat suitability and environmental conditions (environmental niche modeling). Data-driven approaches for environmental niche modeling require wildlife data indicating habitat use, for example, abundance data, presence and/or absence data. Wildlife data are overlaid with environmental data layers to extract the environmental conditions at locations where habitat use occurs. The relationship between wildlife habitat suitability and environmental conditions can then be derived through statistical analysis, machine learning, data mining, or other modeling techniques. Thus, wildlife data become the key to deriving the relationship between habitat suitability and environmental conditions for mapping wildlife habitat suitability for wildlife habitat assessment.
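As a rough illustration of the overlay step, the sketch below looks up the grid cell under each presence location and reads the environmental value there. The grid layout (north-up, origin at the upper-left corner), the function names, and the toy values are assumptions made for the example; a real workflow would typically rely on a GIS or raster library.

```python
def cell_index(x, y, x_min, y_max, cell_size):
    """Return (row, col) of the grid cell containing point (x, y).

    Assumes a north-up grid whose upper-left corner is (x_min, y_max).
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col


def extract_values(points, layer, x_min, y_max, cell_size):
    """Read the layer value under each point; None if the point is off-grid."""
    values = []
    for x, y in points:
        row, col = cell_index(x, y, x_min, y_max, cell_size)
        if 0 <= row < len(layer) and 0 <= col < len(layer[0]):
            values.append(layer[row][col])
        else:
            values.append(None)
    return values


# Toy 3 x 3 elevation layer with 30 m cells covering x in [0, 90], y in [0, 90].
elevation = [
    [3200, 3300, 3400],
    [3000, 3100, 3250],
    [2800, 2900, 3050],
]
sightings = [(15.0, 85.0), (45.0, 45.0), (200.0, 10.0)]
print(extract_values(sightings, elevation, x_min=0.0, y_max=90.0, cell_size=30.0))
# [3200, 3100, None]
```

The extracted values are the raw material from which a statistical or machine learning model then derives the suitability-environment relationship.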
Traditionally, wildlife data are collected using various techniques such as field observation, radio telemetry, infrared trapping cameras, and global positioning system (GPS) collars [11, 12]. Accurate wildlife data can be collected through these techniques, but they are also relatively expensive to deploy. The high cost may prevent these techniques from being used in wildlife data collection, particularly for areas and projects with limited budget support. Besides, some of these techniques (e.g., field observation and GPS collars) are logistically difficult in areas with rugged terrain and limited accessibility. Low-cost techniques such as trailing wildlife markings and interviewing local people about wildlife sightings through questionnaires are also used for wildlife data collection, but wildlife data collected with these techniques can be of low quality (e.g., inaccurate spatial location and/or time) [14, 15]. Cost-effective methods for collecting wildlife data of satisfactory quality are ideal for wildlife habitat assessment and sustainable conservation, given that much of the world’s biodiversity occurs in the world’s poorest and most remote countries.
Local residents have proven to be a cost-effective source of wildlife data [17, 18]. Many local residents, such as those living in remote rural areas and particularly those whose livelihoods are closely linked to ecosystem services (e.g., subsistence farmers, shepherds, and hunters), spend a great deal of time in the field. They encounter wildlife in its natural environment and, as a result, accumulate rich knowledge about wildlife habitat use. Wildlife data elicited from local residents at relatively low cost, although possibly subject to data quality issues (e.g., data credibility, positional accuracy, spatial bias, etc.), could be used to support and sustain conservation programs with limited budgets.
From a broader perspective, the increasing availability of citizen-contributed data, accompanied by advancements in GIS, has created the opportunity to make full use of citizen science to address many real-world problems. On the one hand, citizen-contributed data have become increasingly available with the resurgent popularity of citizen science and the emerging phenomenon of volunteered geographic information (VGI). One prominent example is the eBird citizen science project, which is driven by bird watchers and documents bird species across the globe. On the other hand, advancements in GIS capabilities (e.g., geovisualization and spatial analysis) have made it possible to accommodate the data quality issues associated with citizen-contributed data and to make use of such data for scientific inquiries [22, 23].
This chapter offers an overview of citizen science for wildlife data collection and its integration with GIS for wildlife habitat assessment. A case study of habitat assessment for the black-and-white snub-nosed monkey (
2. Citizen science for wildlife data collection
2.1. Citizen science
The rapid advancement of geospatial information technologies in the last decade has greatly promoted the flourishing of citizen science. Location-aware portable devices constantly connected to the Internet (e.g., GPS-enabled smartphones) are now commonplace. Average citizens can thus conveniently contribute georeferenced wildlife observations using such devices via social media, mobile maps, citizen science project mobile apps, etc. [26, 32, 33]. From a geographic and GIS perspective, citizen science involving geospatial data generation (e.g., wildlife sightings with location information) is called “geographic citizen science,” and the georeferenced wildlife observations are a form of VGI [20, 34]. Due to the increasing availability of enabling technologies, millions of citizens across the world are participating in citizen science projects and many of them are contributing large volumes of wildlife observations on a daily basis. Interested readers can check out a wide range of ongoing citizen science projects (not limited to wildlife-related projects) at scistarter.com and search for specific projects by topic and/or location. As of the time of writing, searching projects at scistarter.com by the topics “Animals,” “Birds,” and “Insects & Pollinators” returned 382, 162, and 190 projects, respectively. As a prominent example, the eBird project [21, 35], launched in 2002 by the Ornithology Lab at Cornell University and the National Audubon Society, had, as of November 2016, engaged over 330,000 bird watchers from more than 250 countries who had reported observations of over 10,300 bird species. As of June 2018, eBird had accumulated over 500 million bird observations in its global database; in recent years, more than 100 million bird observations have been added to the database each year.
Wildlife data contributed by participants in such citizen science projects are a form of geospatial big data [36, 37]. Complex patterns can be discovered from such intensive data through visualizations, simulations, data mining, and various modeling techniques to provide valuable insight for forming concrete hypotheses about the underlying ecological, biological, and geographical processes that generated the observed data. Thus, the abundance of citizen-contributed wildlife data has the potential to shift the research paradigm in biological, ecological, and geographical studies from the traditional hypothesis-driven approach to the emerging data-driven approach; for instance, scholars are promoting the idea of “data-intensive science” for biodiversity studies and “data-driven geography” [36, 37, 38].
2.2. The (dis)advantages of citizen science for collecting wildlife data
Citizen science has several advantages as an alternative mechanism for collecting wildlife data. Citizen-contributed data contain rich local information that spans a wide temporal spectrum because citizens, as local experts and sensors, have long been sensing and accumulating knowledge of their respective areas. Citizen science also has the potential to provide wildlife data over large areas, given that billions of networked human sensors are distributed across the globe. In addition, citizen science can provide timely updated wildlife data that are difficult to obtain and maintain through other techniques but can be easily elicited from citizens living in the local areas. Moreover, citizen-contributed data are much less expensive than data collected through traditional scientific protocols (e.g., biological survey). In many cases, citizens contribute data purely voluntarily. This low cost is of great practical significance for the many real-world programs that fall short of funding support.
Due to the above advantages of citizen science, it is possible to obtain timely updated wildlife data using citizen science over large areas. Citizen science thus has a great potential to support and sustain long-time wildlife population monitoring at large spatial scale (e.g., eBird) and provide wildlife data for wildlife habitat assessment.
In spite of these strengths, one should be aware of the shortcomings of the “citizen science” approach to wildlife data collection. For example, this approach cannot be used in sparsely populated areas where sufficient local citizen observers/informants are lacking. It is also poorly suited to collecting data on elusive animals that have little contact with humans. Most importantly, there can be data quality issues associated with wildlife data contributed by volunteer citizens (i.e., non-professionals), which make the data challenging to standardize and analyze [17, 18]. The following sections detail some of the data quality issues, their implications for wildlife habitat assessment, and how GIS techniques (geovisualization, geospatial analysis, geocomputation, etc.) can be adopted to reduce the impact of such issues on wildlife habitat assessment.
2.3. The data quality issues of citizen-contributed wildlife data
The quality of citizen-contributed wildlife data is the major concern when using such data for wildlife habitat assessment. The average citizens engaged in citizen science projects are not well-trained professionals; their voluntary data collection actions are mostly constrained by internal commitment. Thus, citizen-contributed wildlife data may or may not be accurate [20, 39]. Three aspects of data quality are particularly relevant to the use of citizen-contributed wildlife data for wildlife habitat assessment: data credibility, positional accuracy, and spatial bias.
2.3.1. Data credibility
To be useful for wildlife habitat assessment, wildlife data (e.g., sightings) reported by citizen participants need to be credible, that is, to represent ground-truth wildlife observations. Data credibility is affected by the characteristics of both the wildlife and the citizen observers (e.g., local residents). On the one hand, local residents often only observe wildlife that is active in the daytime. The target wildlife should be easily recognizable to reduce misidentification, given that local residents usually have no training in species identification [17, 40]. On the other hand, local residents' knowledge of the target wildlife, age, length of residence, and formal education also influence data credibility. For instance, performance in georeferencing tasks differs between novice and expert citizen participants; there exist both between-observer differences and within-observer differences (over time) in the bird-counting skills of Breeding Bird Survey (BBS) participants.
Various methods have been developed for increasing the credibility of citizen-contributed wildlife data. A previous study identified a total of 12 strategies that have been adopted by citizen science programs to increase their data credibility across different program stages, including training and planning, data collection, and data analysis and program evaluation. As an example, eBird uses a two-part approach to assure data credibility during data entry: automated data quality filters flag records for review based on observation date and geographic location; a flagged entry, once confirmed as legitimate by the observer, is then reviewed again by a regional expert reviewer.
2.3.2. Positional accuracy
The positions of the wildlife data used for habitat suitability mapping need to be accurate so that the corresponding environmental conditions can be correctly extracted at these locations from environmental data layers. Insufficient positional accuracy of wildlife data leads to a mismatch between the locations of wildlife habitat use and the corresponding environmental conditions, and thus degrades the accuracy of environmental niche modeling and habitat suitability mapping.
Nonetheless, it is also important to note that the impact of positional accuracy of wildlife data on habitat suitability mapping depends on the spatial resolution at which suitability mapping is conducted. Mapping at high spatial resolution (e.g., using environmental data of 30 m × 30 m grids) definitely requires wildlife data of high positional accuracy that is comparable to the spatial resolution of the environmental data so that values of the environmental conditions at these locations can be accurately extracted from environmental data layers. In contrast, for mapping at coarse spatial resolution (e.g., 1000 m × 1000 m grids), the absolute positional accuracy of wildlife data does not have to be very high as long as it is accurate enough relative to the spatial resolution of environmental data in use.
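The dependence on spatial resolution can be made concrete with a small simulation. This is only a sketch under stated assumptions: positional error is modeled as isotropic Gaussian noise with a hypothetical standard deviation of 100 m, and true locations are assumed to be uniformly distributed within a grid cell. It estimates how often a reported location still falls in the same cell as the true location.

```python
import random


def same_cell_rate(cell_size, error_sd, trials=5000, seed=42):
    """Estimate the share of reported points that stay in the true grid cell."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # True location, placed uniformly at random inside one grid cell.
        x = rng.uniform(0.0, cell_size)
        y = rng.uniform(0.0, cell_size)
        # Reported location = true location + Gaussian positional error.
        rx = x + rng.gauss(0.0, error_sd)
        ry = y + rng.gauss(0.0, error_sd)
        if 0.0 <= rx < cell_size and 0.0 <= ry < cell_size:
            hits += 1
    return hits / trials


fine = same_cell_rate(cell_size=30.0, error_sd=100.0)
coarse = same_cell_rate(cell_size=1000.0, error_sd=100.0)
print(f"30 m cells:   {fine:.2f} of points stay in the correct cell")
print(f"1000 m cells: {coarse:.2f} of points stay in the correct cell")
```

Under these assumptions, the same 100 m error displaces most points out of their 30 m cell but leaves the large majority of points inside their 1000 m cell, which is the sense in which accuracy only needs to be adequate relative to the mapping resolution.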
2.3.3. Spatial bias
Wildlife observations contributed by citizens are often concentrated more in some geographic areas than others (i.e., spatial bias) because observations made by citizens are opportunistic in nature. Unlike well-designed sampling or survey schemes, which allocate observation sites so that the geographic space and/or the environmental space are well covered, the spatial distribution of the observation efforts of citizen volunteers would be considered neither random nor regular in the sense of sampling or survey design. One example is wildlife sightings elicited from local residents. Local residents are not intentionally tracking wildlife of interest. Instead, they typically spot the wildlife en route to doing something else. The routes on which local citizens spot wildlife would be considered neither random nor regular but “ad hoc.” As a result, wildlife sightings elicited from local residents are usually concentrated in areas with higher route accessibility.
Such spatial bias in wildlife data has a significant impact on environmental niche modeling and habitat suitability mapping for wildlife habitat assessment. Due to the spatial bias, citizen-contributed wildlife data might not be representative of the actual wildlife habitat use. The relationship derived based on the wildlife data thus might not well represent the underlying environmental niche. Spatial bias in citizen-contributed wildlife data, if not appropriately accounted for, would adversely affect the accuracy of wildlife habitat suitability mapping [47, 48, 49].
3. The roles of GIS
GIS is the ideal tool for conducting wildlife habitat assessment, as the assessment involves geospatial data. Besides providing an integrated environment for managing and manipulating environmental data layers and georeferenced wildlife data, GIS also offers capabilities to remedy or address some of the data quality issues associated with citizen-contributed wildlife data. Firstly, geovisualization can be used to facilitate wildlife data elicitation from citizen participants and improve positional accuracy. Secondly, based on the cause of spatial bias, spatial analysis can be used to compensate for the biased coverage in observation efforts. Lastly, geospatial computation techniques can be employed to address the computational challenges arising from analyzing very large volumes of citizen-contributed wildlife data.
3.1. Geovisualization to improve positional accuracy
In general, the positional accuracy of wildlife data largely depends on the availability of positioning technology. Wildlife sightings can be accurately georeferenced with the aid of high-accuracy positioning techniques. For example, smartphones equipped with high-accuracy GPS units ensure that each generated data record is associated with accurate geographic coordinates. Nevertheless, this holds only for citizen observers who report or record data at the time of sighting wildlife in the field. In many cases, local residents (e.g., farmers) do not keep records of daily wildlife sightings, or they simply do not have access to GPS units or smartphones. Most often, wildlife data are elicited from their memories long after the time of sighting [17, 18, 23, 40].
Wildlife data (e.g., sightings) collected from local citizens through interviews or questionnaire surveys often have position information of unsatisfactory accuracy [14, 15]. Descriptions of the locations of wildlife sightings are often imprecise or vague, particularly if a long time has elapsed since the actual sightings. This imprecision partly results from the absence of an effective interviewing medium (e.g., an intuitive and interactive representation of the natural environment where local citizens are active) that helps local citizens recall and locate their sightings of wildlife. One study collected distribution and abundance data of terrestrial tortoises from local shepherds over 1 km × 1 km grid cells with the aid of topographic maps. However, it is difficult for local residents who have had no training in map reading to accurately locate wildlife sightings on topographic maps.
Geospatially enabled and user-friendly geovisualization interfaces could help improve the positional accuracy of the wildlife data elicited from local residents [50, 51]. Geovisualization, particularly 3D geovisualization techniques, can be adopted to help local residents recall and locate their sightings of wildlife and thus obtain wildlife data with more accurate positional information. Given flat 2D topographic maps, local residents need relief interpretation skills to reconstruct the 3D topography of the landscape; only then can they orient themselves and locate places on the landscape. But they often do not have much training in basic map reading skills, not to mention relief interpretation. 3D geovisualization can facilitate relief interpretation by producing a realistic and intuitive terrain representation, and it improves visual search efficiency and navigation performance.
Geovisualization techniques as discussed above help improve the positional accuracy of wildlife data at the very beginning of data generation. In cases where positional uncertainty does exist in wildlife data and is indeed of concern for wildlife habitat assessment, GIS-based methods have been developed to reduce its impact on the accuracy of wildlife habitat assessment. As an example, one study proposed a spatial sampling method for deriving probable wildlife occurrence locations from patrol records, using heuristics based on data recording context and species ecology, to increase the accuracy of habitat suitability mapping.
3.2. Geospatial analysis to tackle spatial bias
Many geospatial analytical methods have been proposed to account for the spatial bias in wildlife data. An
Filtering samples in the geographic or environmental space (i.e., removing observations that are within a certain distance of one another) is also applied to reduce spatial bias [56, 57]. This method is based on the heuristic that removing localities (i.e., field samples) that are within a certain distance of one another helps balance unequal sampling or observation effort. The key to this method is choosing the distance threshold properly.
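A minimal sketch of distance-based filtering in geographic space is given below. It uses a simple greedy rule (keep a point only if it is at least `min_dist` away from every point already kept), which is one of several possible thinning strategies and is not necessarily the procedure used in the cited studies; the result of a greedy pass also depends on the order of the input points.

```python
import math


def thin_by_distance(points, min_dist):
    """Greedily keep points that are at least min_dist from all kept points."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept


# Toy sightings in projected coordinates (metres); two tight clusters and one
# isolated record.
sightings = [(0, 0), (10, 0), (100, 0), (105, 5), (300, 300)]
print(thin_by_distance(sightings, min_dist=50))
# [(0, 0), (100, 0), (300, 300)]
```

The same idea carries over to environmental space by replacing the coordinates with (suitably scaled) environmental values.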
If detailed information on observation effort is available, such information can be incorporated to correct for spatial bias. In one study, spatial bias in wildlife observations was compensated for by weighting the observations with weights inversely proportional to the cumulative visibility at the observation sites, given that cumulative visibility is a good proxy of the underlying observation effort. Here, cumulative visibility is the frequency at which a given location can be seen by observers from the routes the observers take. It can be computed from a digital elevation model (DEM) representing the terrain and from the routes using viewshed analysis, a common GIS function.
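The weighting step alone can be sketched as follows. The cumulative visibility values are assumed to have been computed beforehand (e.g., by viewshed analysis in a GIS); the function name and the normalization of weights to sum to one are choices made for this example rather than details from the cited work.

```python
def inverse_visibility_weights(visibility):
    """Return weights proportional to 1 / cumulative visibility, summing to 1."""
    if any(v <= 0 for v in visibility):
        # A sighting location must be visible from at least one route point.
        raise ValueError("cumulative visibility must be positive")
    raw = [1.0 / v for v in visibility]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical cumulative visibility at three sighting locations: the first is
# rarely seen from the routes, the third is seen often.
visibility = [1, 2, 4]
weights = inverse_visibility_weights(visibility)
print([round(w, 3) for w in weights])
# [0.571, 0.286, 0.143]
```

A sighting at a rarely visible location receives a larger weight because it was obtained with less observation effort and is therefore stronger evidence of habitat use.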
A FactorBiasOut method was developed to correct for spatial bias in species presence-only data for species distribution modeling with MAXENT. This method first estimates an empirical distribution to approximate the underlying but usually unknown sampling distribution that generated the presence-only data. This approximate sampling distribution is then used to factor out the spatial bias in presence-only data. This is achieved by feeding MAXENT with background data that have the same spatial bias as the presence data. For example, occurrence data of a target group of species that are observed by similar methods can be taken as the estimate of the effort information and thus are used as the background data.
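The idea of background data that share the bias of the presence data can be sketched as below. This is not MAXENT and not the FactorBiasOut implementation; it only illustrates, under the assumption that target-group record counts per grid cell approximate observation effort, how background cells might be drawn in proportion to that effort. The function name and cell identifiers are hypothetical.

```python
import random
from collections import Counter


def biased_background(target_group_cells, n_background, seed=0):
    """Draw background cells with probability proportional to target-group records."""
    effort = Counter(target_group_cells)   # records per cell as an effort proxy
    cells = sorted(effort)
    weights = [effort[c] for c in cells]
    rng = random.Random(seed)
    return rng.choices(cells, weights=weights, k=n_background)


# Toy target-group records: cell (0, 0) is near a road and heavily surveyed,
# cell (1, 1) is remote and rarely surveyed.
records = [(0, 0)] * 8 + [(1, 1)] * 2
background = biased_background(records, n_background=1000)
print(Counter(background))
```

Cells with no target-group records are never drawn, so the background reflects where observers actually went rather than the whole study area.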
Recently, a general representativeness-directed approach to spatial bias mitigation in citizen-contributed wildlife observations (i.e., samples) was proposed for habitat suitability mapping. The key idea is to define and quantify the representativeness of samples and then properly weight the samples to improve representativeness. Sample representativeness is defined as the “goodness-of-coverage” of the samples in the environmental covariate space, which in turn is measured by the similarity between the probability distribution of the samples in the covariate space and the probability distribution of all mapping units (e.g., pixels) within the study area. Spatial bias is then mitigated by weighting the samples toward increasing sample representativeness. The optimal weights that maximize sample representativeness are determined through an optimization procedure using a genetic algorithm.
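A simplified sketch of this idea is shown below for a single covariate. Several choices here are assumptions made for illustration and differ from the method described above: the distributions are estimated with histograms, similarity is measured by histogram overlap, and the weights come from a simple per-bin ratio rather than a genetic algorithm.

```python
def bin_index(value, edges):
    """Return the index of the bin containing value, or None if outside."""
    n_bins = len(edges) - 1
    for i in range(n_bins):
        upper_ok = value < edges[i + 1] or (i == n_bins - 1 and value == edges[-1])
        if edges[i] <= value and upper_ok:
            return i
    return None


def distribution(values, edges, weights=None):
    """Weighted histogram of values over the bins, normalized to sum to 1."""
    if weights is None:
        weights = [1.0] * len(values)
    mass = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        i = bin_index(v, edges)
        if i is not None:
            mass[i] += w
    total = sum(mass)
    return [m / total for m in mass] if total > 0 else mass


def representativeness(sample_values, pixel_values, edges, weights=None):
    """Histogram overlap (0 to 1) between sample and pixel distributions."""
    p = distribution(sample_values, edges, weights)
    q = distribution(pixel_values, edges)
    return sum(min(a, b) for a, b in zip(p, q))


def ratio_weights(sample_values, pixel_values, edges):
    """Weight each sample by (pixel share) / (sample share) of its bin."""
    p = distribution(sample_values, edges)
    q = distribution(pixel_values, edges)
    weights = []
    for v in sample_values:
        i = bin_index(v, edges)
        weights.append(q[i] / p[i] if i is not None else 0.0)
    return weights


# Toy elevation covariate: pixels are spread evenly over three elevation bands,
# while sightings are concentrated in the lowest (most accessible) band.
edges = [2500, 3000, 3500, 4000]
pixels = [2700] * 10 + [3200] * 10 + [3700] * 10
samples = [2600] * 6 + [3100] * 3 + [3900] * 1

before = representativeness(samples, pixels, edges)
after = representativeness(samples, pixels, edges, ratio_weights(samples, pixels, edges))
print(round(before, 3), round(after, 3))
```

In this toy case the weighted samples match the pixel distribution exactly. With real data, bins that contain pixels but no samples cannot be recovered by reweighting, which is one reason a more flexible similarity measure and optimizer may be preferable.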
3.3. Geocomputation to enable big data analysis
Citizen-contributed wildlife data are an important source of geospatial big data. Spatial analysis or modeling of such large volumes of data (e.g., point pattern analysis, wildlife habitat assessment, and species distribution modeling) poses computational challenges that need to be addressed. Geocomputation technologies can be used to address these challenges.
For example, over 100 million bird observations were added to the eBird database each year. Point pattern analysis is commonly used to discover patterns from such data. Existing point pattern analysis software tools are not able to handle geospatial big data efficiently. Cutting-edge geocomputation technologies such as cloud computing and GPU (graphics processing units) computing can be leveraged to accelerate point pattern analysis algorithms. The massively parallel computing powers of cloud computing and GPU computing effectively sped up point pattern analysis tasks on big data by a factor of hundreds [60, 61]. Given the significant acceleration brought by the geocomputation technologies, geospatial big data analysis tasks that once were computationally prohibitive can now be conducted in a timely manner.
4. Integrating citizen science and GIS for wildlife habitat assessment: a Rhinopithecus bieti case study
A case study of mapping black-and-white snub-nosed monkeys’ (
4.1. Species and study site
The study site is Mt. Lasha area located in northwest Yunnan Province, southwest China (Figure 1). Mt. Lasha is near the southern-most part of its geographic range [63, 65]. The 20.31 km2 study area is an important habitat for a group of about 100
4.2. Data collection
4.2.1. Wildlife data elicited from local villagers
Geovisualization interview sessions were carried out by one biologist and one field assistant who were very familiar with the study area during July and August 2010. Sixty-eight local residents including herdsmen, hunters, and farmers who had extensive experience in the mountains from all five nearby villages were interviewed. The majority of them are aged between 30 and 60 (Table 1). The elicited
4.2.2. Environmental data
Environmental factors impacting
4.3. Accounting for positional uncertainty and spatial bias
Data elicited from local villagers impose two challenges, namely positional uncertainty and spatial bias. First, local villagers often recall
Geospatial analysis methods provisioned by GIS were adopted to address the two challenges. First, a frequency sampling strategy [23, 70] was applied to reduce the position uncertainty in sighting polygons provided by local villagers and to identify the representative locations for
Second, the spatial bias was compensated for by inversely weighting each representative presence location with cumulative visibility of the location from the routes taken by local villagers. In this particular case study, spatial bias in the elicited
4.4. Habitat assessment
A kernel density estimation-based method [23, 71] was applied to derive the relationship between
Across the three historical periods, high suitability habitats were in forests (Figure 4d) at mid-to-high elevation range (Figure 4a) on the northeast hill slopes (Figure 4c). Overall, high suitability habitats shrank in the 1987–2005 period compared to the previous period. As an example, the area outlined on Figure 5b in the 1987–2005 period is of much lower suitability compared to the 1973–1981 period. This might be a result of the introduction of new village settlements and roads in that area in the 1987–2005 period which induced significant human disturbance.
The derived relationships between
Wildlife data required for wildlife habitat assessment can be difficult and expensive to obtain with traditional data collection methods (e.g., biological survey, geographic sampling), especially for conservation programs with limited budget support in remote and poor areas. Citizen science offers a cost-effective way of collecting wildlife data to sustain such programs. Nevertheless, average citizens are non-professionals and their wildlife observation efforts are un-coordinated. Thus, wildlife data contributed by citizens may be subject to data quality issues such as positional uncertainty and spatial bias. This chapter provides an overview of citizen science as a means of collecting wildlife data, GIS-provisioned geovisualization, and geospatial analysis techniques for tackling the data quality issues of citizen-contributed wildlife data, and the integration of citizen science and GIS for wildlife habitat assessment. A case study of mapping
Support to the author from the University of Denver through the Faculty Startup Funds is greatly appreciated. The author would like to thank Dr. A-Xing Zhu in the Department of Geography at the University of Wisconsin-Madison for his invaluable inputs on designing methods, and Dr. Wen Xiao and Dr. Zhi-Pang Huang in the Institute of Eastern-Himalaya Biodiversity Research at Dali University and officials in the Yunling Nature Reserve administration for their generous support for field work of the reported
Conflict of interest
The author declares no conflict of interest.