Typical setting of various types of archive and new data types, and the necessary actions to put them into one data system.
This paper gives an overview of the theoretical and technical aspects of a geographic information system (GIS) that can provide a framework for the scientific (hydrological, morphological, geological, etc.) analysis of cave survey data. It is emphasized that a GIS containing archive cave data is important because the information is often irreplaceable (the cave environment has changed) or because resurveying may harm the cave. Thus, it is proposed that a GIS of cave data be multidisciplinary to avoid unnecessary resurveying of caves. To produce such a system, one has to bear in mind many aspects that are not always evident to practicing scientists. Cave surveys produce spatial data, whether they are measured with a measuring tape and compass or with a LiDAR station. A major issue in cave data processing is that the spatial data produced in various surveys do not fit together, either because of the different methods and coordinate systems or because of the various data types, which makes it hard to identify the similarities between different sets of data. The paper focuses on how to work with archive and new survey data, and how to handle maps, scans, and sampling data in one information system. From the aspect of data transfer, three main functionalities of a GIS are distinguished: processing, storing, and representation of the information. Discussing the theoretical and practical backgrounds of these functionalities, the paper presents the best practices of building a GIS from archive and newly measured data, emphasizing the importance of procedures like data management, quality control, and automation. The paper sheds light on the various data types that are usually related to cave surveys, to help cave scientists control the data management and understand (and apply) the automated processes. The probable technical parameters of future cave surveillance systems are also discussed.
- cave surveying
- archive data
- cave data management
Cave survey data is collected for several reasons. Cave maps, 3D models, and parametric data for many disciplines of science can be created and extracted from the raw measurements. The amount of data collected during a cave survey is much larger today than it was decades ago, thanks to the evolution of surveying tools and methods. Such an amount of data is created and handled in several ways depending on the skills and intents of the surveyors, the processing staff, or the scientists involved, and sometimes, due to a lack of geographic information system (GIS) experience, the aims cannot be fully achieved. Moreover, the resolution of the collected information does not always match the aims of the project, which may lead to improper conclusions in a scientific project or to inaccurate cave maps [3, 4]. In the classic era of speleological surveys, which relied on measuring tape and compass, the collected data was often insufficient for scientific purposes; now, with the use of terrestrial laser scanning (TLS) stations, it is often too voluminous for many users to work with efficiently.
Methodological papers about the analysis of cave data usually concentrate on the theoretical rather than the practical aspects of data processing. For example, in the case of a new cave survey, the comparison of the newly collected data and the archive data is reported in several studies [3, 4], but the details of how the different data packages were incorporated into one system remain in the background. For many spelunkers, however, these details are possible sources of error because of a lack of experience with coordinate transformations or with the management of the data files and programs involved in this process. Such comparisons are often used to illustrate the higher accuracy of the new surveying methods, and while this is reasonable, the archive data often preserves notes and observations of scientific importance (e.g., locations of bat colonies, archeological specimens, water sampling, etc.). Moreover, archive data often preserves cave conditions that have changed since the time of the survey. In an optimally built GIS, both the new and the archive data are positioned correctly and the databases of the surveys can be queried simultaneously.
The aim of this paper is to present cave data processing from the aspect of a geographic information system (GIS), and to demonstrate that such an information system can help scientific projects combine newly measured and archived data. Using a GIS means using database tables, data transformation tools, statistical and spatial analysis, filtering and extracting certain parameters from the raw data, and finally (but not necessarily) visualizing the results in 2D or 3D. To do this efficiently, one must possess knowledge about such processes, and the know-how is so complex that it has become an advancing discipline: GIScience.
2. The nature of cave survey data
Every cave survey project starts with a plan to do something for a certain purpose. At this point, the surveyors’ knowledge and the available instruments will determine the quality of their future data. Although contemporary surveyors have already upgraded their instruments, there is still a lot of data from predigital times. Most of the caves in the world were surveyed (at least partially) with measuring tape and compass, by speleologists progressing through the cave passages from station to target points. The data consists of records, each containing a distance, a dip, and an azimuth measurement, so that each record defines a 2–10 m long spatial vector. The sequentially joined vectors form a 3D network (Figure 1), and each node of this network can be represented with x, y, z coordinates. Although the survey produces its own coordinate system, whose origin is the survey base point (entrance station), the whole set can be placed in its real geographic position, defined in a Euclidean geodetic reference system (X, Y, Z), if the cave entrance was measured with geodetic instruments (i.e., total station or GPS). Additionally, the width and height of the cave passage can be measured at each station to provide data for 3D cave models [7, 8]. The precision of the measurements can be improved if the network is also measured backwards (back to the original station), or if loops are present in the cave where the closing point of the loop is identical to one of the previously measured stations. The measuring sequence of such a loop may also be measured in reverse, and the average error can be distributed among the stations of the loop [3, 6]. This method (loop closure), however, was not always followed by surveyors in the past, for several reasons. Most often, the backward measurements are missing, and the whole passage system was probably measured at different times by different people, which means that the condition of the data varies.
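The conversion from distance, dip, and azimuth records to x, y, z station coordinates can be illustrated with a minimal sketch. The axis conventions (x east, y north, z up, dip positive upwards), the function names, and the two sample shots are assumptions made for illustration; they are not taken from any particular cave surveying program.

```python
import math

def shot_to_vector(distance, azimuth_deg, dip_deg):
    """Convert one survey shot to a local (dx, dy, dz) vector.

    Assumed conventions: azimuth is measured clockwise from north,
    dip is positive upwards, x points east, y north, z up.
    """
    az = math.radians(azimuth_deg)
    dip = math.radians(dip_deg)
    horizontal = distance * math.cos(dip)
    return (
        horizontal * math.sin(az),
        horizontal * math.cos(az),
        distance * math.sin(dip),
    )

def station_coordinates(shots, base="0", origin=(0.0, 0.0, 0.0)):
    """Accumulate shots (from_id, to_id, distance, azimuth, dip) into coordinates.

    Shots must be ordered so that every 'from' station is already known.
    """
    coords = {base: origin}
    for from_id, to_id, distance, azimuth, dip in shots:
        x, y, z = coords[from_id]
        dx, dy, dz = shot_to_vector(distance, azimuth, dip)
        coords[to_id] = (x + dx, y + dy, z + dz)
    return coords

# Invented sample records, for illustration only.
shots = [
    ("0", "1", 10.0, 90.0, 0.0),   # 10 m due east, horizontal
    ("1", "2", 5.0, 0.0, -30.0),   # 5 m due north, sloping down 30 degrees
]
coords = station_coordinates(shots)
```

The resulting coordinates are local: the base station sits at the origin until the entrance is tied to a geodetic reference system.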
Usually, the most problematic part of creating a consistent GIS is the harmonization of several unit systems, data structures, content types, and coordinate systems.
3. Inputs and outputs
The condition of the created survey data determines how much time is necessary to build a working GIS. The GIS is always built to serve as a “tool” to achieve the aims of a certain project (even if further use is also envisaged), so the planning should consider the possible sources of the cave survey data. Three scenarios are the most frequent: (1) the data is to be salvaged from archives, (2) a new survey is carried out, and (3) partly new and partly archive data are processed. In all cases, the created data will be processed, stored, and visually represented in some way.
The data processing tool is responsible for the digital recording of text (notes, reports), alphanumeric and logical (true/false) information, images (photos, scanned documents), and vector geometry (line plot maps, sections).
Data storage devices provide secure storage and availability of alphanumeric and logical data, images, and texts in digital or digitized documents. Digital data is located on mass storage devices in its appropriate format and accessed via database and file management tools.
The tool that represents the data as a 2D map or a 3D model is a complex application, which not only visualizes the data but most often also serves as the GIS environment. It makes it possible to access not only the visual representation of the data but all the collected information from the database.
These main functionalities are accessed through programs developed directly for cave survey data management (Compass, Visual Topo, TopoRobot, and Therion), or through well-known applications developed for general data management (Excel, ArcGIS, AutoCAD, and many others). Either way, the application itself becomes an integral part of the information system of the survey project, with all of its data formats and processing algorithms. In some cases, the applications themselves only call external programs or scripts, increasing the complexity of the whole system. Selecting the tools for these functionalities is one of the most crucial parts of planning the GIS. The selection should first consider the aim of the project, and then the usability and the spatial dimensions. The more we rely on archive data, the more we should involve non-cave mapping applications to bring the data into an acceptable form.
3.1. Data types of an archive survey
Using only archive data is appropriate when the project cannot afford new surveys because of time or financial limits, or when the cave has an endangered environment where even the survey may cause serious damage. The spatial resolution of the data is usually suitable for morphometric analysis [10, 11], but realistic 3D models cannot be created. Archive survey data can be collected from paper-based documentation (reports, notebooks, maps, and published papers). In this case, the GIS will be composed of the digitized forms of these documents, and will always incorporate the following components: database, digital map, original documents (scanned), and the tools (programs). In a general-purpose GIS, the main characteristics of these components are as follows:
The database is to be built from the notebook records. Although the temptation is usually great to skip seemingly irrelevant information (e.g., the measuring conditions) during digitization, it is always useful to fill the database with attributes like “CONDITION”, or at least to put such remarks in a “NOTE”. The database usually contains the following attributes: date of survey, station-id, target-id, distance between station and target, dip, azimuth (angle measured clockwise from north), and the width and height of the passage at the station. If the entrance points are measured with GPS, the database can be completed with the latitude, longitude, and elevation of each point using Euclidean geometry and the vector data.
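As an illustration of the attribute list above, the notebook records could be stored in a single table. The following is a minimal sketch using SQLite; the table name, column names, value ranges in the CHECK constraints, and the sample record are assumptions chosen for the example, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shots (
        survey_date         TEXT,
        station_id          TEXT NOT NULL,
        target_id           TEXT NOT NULL,
        distance_m          REAL NOT NULL CHECK (distance_m > 0),
        dip_deg             REAL NOT NULL CHECK (dip_deg BETWEEN -90 AND 90),
        azimuth_deg         REAL NOT NULL CHECK (azimuth_deg >= 0 AND azimuth_deg < 360),
        width_m             REAL,
        height_m            REAL,
        measuring_condition TEXT,   -- the "CONDITION" attribute
        note                TEXT    -- free-text remarks from the notebook
    )
""")

INSERT = "INSERT INTO shots VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"

# Invented sample record, for illustration only.
conn.execute(
    INSERT,
    ("1975-08-12", "A1", "A2", 7.4, -12.0, 135.0, 2.1, 1.6,
     "muddy, poor visibility", None),
)

# Query all shots for which the surveyors recorded the measuring conditions.
rows = conn.execute(
    "SELECT station_id, target_id, measuring_condition "
    "FROM shots WHERE measuring_condition IS NOT NULL"
).fetchall()
```

Keeping the condition and note columns costs little during digitization, and the range checks catch some transcription errors (for example a dip of 120 degrees) at the moment of entry.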
The digital map is usually created from scanned paper maps to provide additional information about the morphology of the cave. Maps contain the outlines of the passage levels, indicating the characteristic morphology of the walls and the main artifacts. Additionally, point-like objects, names, and transversal and longitudinal sections are displayed (Figure 2). The maps that are suitable for morphological analysis usually have a large scale (>1:1000) but rarely contain geographic or geodetic coordinates, so the first step is “georeferencing”, which means that identifiable points on the map (e.g., the marked stations) are associated with their geographic coordinates. These coordinates can be calculated from the base point. The map processing starts with determining which data types are to be digitized. Each map data type (points, texts, lines, polygons) will be stored as graphical elements in one or more files. The processing tool determines whether the created files are suitable for a GIS; thus, it is most appropriate to use a GIS program directly (e.g., ArcGIS, MapInfo, or QGIS). These programs are also suitable for “georeferencing” with the help of the previously processed database.
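The idea behind georeferencing can be sketched as fitting a transformation between map coordinates and known station coordinates. The example below assumes a 2D similarity transformation (uniform scale, rotation, and shift) estimated by least squares; GIS programs offer this and other transformation types, and the function names and control points here are hypothetical. Map coordinates are assumed to have the y axis pointing up (for image pixels, the row number would have to be negated first).

```python
def fit_similarity(map_pts, geo_pts):
    """Least-squares 2D similarity transform between two point sets.

    Points are (x, y) tuples treated as complex numbers z and w;
    the model is w = a * z + b with complex a (scale and rotation) and b (shift).
    """
    z = [complex(x, y) for x, y in map_pts]
    w = [complex(x, y) for x, y in geo_pts]
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    num = sum((zi - z_mean).conjugate() * (wi - w_mean) for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    a = num / den
    b = w_mean - a * z_mean
    return a, b

def apply_similarity(a, b, pt):
    """Transform one (x, y) map point into the target coordinate system."""
    w = a * complex(*pt) + b
    return (w.real, w.imag)

def rms_residual(a, b, map_pts, geo_pts):
    """Root mean square distance between transformed and reference points."""
    squares = []
    for map_pt, (gx, gy) in zip(map_pts, geo_pts):
        px, py = apply_similarity(a, b, map_pt)
        squares.append((gx - px) ** 2 + (gy - py) ** 2)
    return (sum(squares) / len(squares)) ** 0.5

# Hypothetical control points: three marked stations identified on the map.
map_pts = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
geo_pts = [(1000.0, 2000.0), (1000.0, 2050.0), (950.0, 2000.0)]
a, b = fit_similarity(map_pts, geo_pts)
residual = rms_residual(a, b, map_pts, geo_pts)
```

With more than two control points, the residual gives a first indication of how well the scanned map agrees with the calculated station coordinates.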
The digital archiving of the original data is advised in the case of the notebooks and necessary in the case of the maps. The scanned documents are usually in pdf or jpg format and are stored on a hard drive. The location of the archived data is highly relevant from the aspect of the GIS, because the map processing tools record the name and the source (folder) of the map file during the digitization process, so changing the folder structure or the file name after the process has started will cause problems. The same is true for the files created at the end of the digitization: we will face file access problems if their location or name is changed.
The preferred tool can be one of the cave surveying programs, but depending on the condition of the recorded measurements, other processing tools can also be appropriate. If the records are complete (all the previously listed attributes are present), the surveying program can also produce a representation and calculate most of the morphometric parameters. In this case, the maps and vertical sections, if digitized in a convenient format, can refine the results. If the records are incomplete (e.g., the passage width and height data are missing), the maps and sections can be used to complete them. This is more easily achieved in a standard GIS program (e.g., QGIS).
The program and the file-folder structure will be a part of the information system. If several programs and people are involved in the data processing, a unified file nomenclature and a thoughtful folder structure can help to avoid file access failures. The reliability of the resulting data depends mainly on the quality of the original survey, but because of the manual acquisition of data and the sketching of passage morphology, it is always biased compared to the new methods. Furthermore, the digitization of archive data is also prone to transcription errors.
3.2. Data types of new surveys
New surveys are still carried out mostly with the station-target approach, but with modern (fast and accurate) instrumentation. The accuracy and resolution of the collected data are usually much higher than with the traditional methods, but human expertise is still required in both the data collection and the processing phases. The most widespread surveying tool today is the DistoX, which is based on the combination of a laser measuring tool and a handheld computer. The two devices are connected via Bluetooth, and the mapping program on the mobile device (PDA, tablet, or smartphone) handles the database of the measurements and also provides a graphical user interface for on-site map compilation. The software running on the handheld computer automatically handles the loop closures when new survey tracks are measured, modifying the coordinates of the existing stations as well. The method is based on the algebraic minimization of the root mean square (rms) of the differences.
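To illustrate what a loop closure adjustment does to the station coordinates, the sketch below distributes the misclosure of a single closed loop in proportion to the distance travelled along the loop. This proportional scheme is a deliberately simple stand-in chosen for the example; it is not the rms minimization used by the surveying software, and the function name and sample loop are invented.

```python
import math

def close_loop(points):
    """Distribute the misclosure of a closed traverse along its stations.

    points: list of (x, y, z); the last point is a re-measurement of the first.
    Each station is shifted by the misclosure scaled by the fraction of the
    total traverse length measured up to that station.
    Returns the adjusted points and the length of the misclosure vector.
    """
    lengths = [math.dist(p, q) for p, q in zip(points, points[1:])]
    total = sum(lengths)
    error = [last - first for first, last in zip(points[0], points[-1])]
    adjusted = []
    walked = 0.0
    for i, point in enumerate(points):
        if i > 0:
            walked += lengths[i - 1]
        fraction = walked / total
        adjusted.append(tuple(c - fraction * e for c, e in zip(point, error)))
    return adjusted, math.hypot(*error)

# Invented loop: a roughly 10 m square that fails to close by a few decimetres.
loop = [
    (0.0, 0.0, 0.0),
    (10.0, 0.0, 0.0),
    (10.0, 10.0, 0.0),
    (0.0, 10.0, 0.0),
    (0.4, -0.2, 0.0),   # should coincide with the first station
]
adjusted, misclosure = close_loop(loop)
```

After the adjustment, the first station is unchanged and the last one coincides with it, while the intermediate stations absorb the error gradually.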
Although this mapping system is a GIS application in itself, it is not designed for the postsurvey processes (e.g., map making or morphometric analysis). For these purposes, several external component programs are used that can import the surveying program’s output file types. The output formats are the common vector graphics formats (e.g., dxf, a simple text-type file describing shapes and geometry in a well-documented syntax), and the text-type database with rows and columns is compatible with the usual cave survey managing programs. In the case of a DistoX survey, the raw data structure is quite similar to the previously introduced traditional database (Figure 3), with the advantage of being natively digital.
In contemporary surveys, the ultimate aim is to increase the speed and accuracy of the measurements using digital data. With DistoX, transcription errors can be avoided by recording directly in the handheld device, but there are still biases: (1) the laser beam is shot at only a few selected points in the spelunker’s field of view and (2) the cave morphology is generalized manually when the map is drawn on-site. Although the process can be improved by shooting more and more targets, the resolution of the survey will always be lower than that of a survey done with a TLS.
The use of static terrestrial LiDAR instruments, despite their impracticality in harsh environments, is on the rise [5, 18, 19]. These tools produce thousands of range and angle measurements in a few minutes from the station’s location. The target points, as with the DistoX, are measured with one laser beam, but in this case the instrument repeatedly shoots the beam at new targets, sweeping almost the whole field of view during one session. The point cloud of a scanning session at one station consists of nonoverlapping points that form a grid when 3D polar coordinates (yaw, pitch, and range) are used, or a data table when Cartesian (x, y, z) coordinates are used [20, 21]. The former is considered raster-type data and can easily be fitted with panoramic photos shot from the same position. To do this, the scanning instrument should also be equipped with an optical camera. However, it is more common to export the scanned data with x, y, z coordinates in binary .las files. Although other formats also exist, most point cloud processing programs (e.g., MeshLab, ReCap, Microstation, and CloudCompare) are able to import and export las-files.
Concerning the coordinates, the point cloud data is in a local reference system relative to the station. Data from multiple stations can be combined if the scanned surfaces overlap with each other, and if artificial backsight targets (i.e., regular-shaped objects) are placed in the common field of view of the subsequent scans. This process is done either automatically or manually within a desktop application after downloading the data from the TLS instrument. Both processes are based on “best fitting” approximations defined mathematically in the algorithms of the processing tools. The fitting error depends on the method chosen for the approximation (usually the least-squares method), and the range of error is usually documented in the programs’ descriptions, although it also depends on several other factors. According to Lichti and Gordon, five error types can be distinguished in a TLS survey: (i) the placement of the survey stations and the backsight target object; (ii) instrument leveling and centring; (iii) backsight target centring; (iv) raw scanner observation noise; and (v) laser beamwidth.
The precision parameters of the instrument and the method can be obtained from the documentation if we know the range of the shots (from which the beamwidth of the laser beam is calculated) and the magnitude of the de-noising algorithm (which removes outliers from the point cloud) relative to the range. Yet at least two of the errors listed above are not independent of the human factor during the survey: the placing and the leveling of the instrument. However, attempts have already been made to reduce the chance of human error in the TLS procedure too; some LiDAR tools do not require manual fitting of backsight target objects to position themselves at the subsequent stations, and the precise leveling of the instrument is also done with automatic sensors and motors. Sometimes, though, this is quite problematic because of the size of the TLS instrument, and the positioning is still the decision of the surveyor (Figure 4).
If the fitted sequence of the survey sessions contains at least one position (but preferably two: entrance and exit) where the geographic coordinates are measured with GPS (or other geodetic instruments), the whole survey can be transferred from a local (x, y, z) to a geodetic coordinate reference system (X, Y, Z) and can be referenced to other data (i.e., maps) surveyed previously (Figure 5).
Leaving the station-target method behind, techniques of high-resolution mobile mapping, such as the Zebedee, are also emerging. Two data types are generated with this approach: a point cloud and a trajectory. The point cloud data is a huge list of x, y, z coordinates enriched with a set of attributes (the intensity of the reflected beam or the precision of the calculated coordinate) associated with each of the points. The trajectory data is a much smaller set of coordinates in a strict sequence, defining the movement of the surveyor in Euclidean space. This data is quite similar to the polygon network of the archive surveys, with the distinction that the vectors of the trajectory do not necessarily join in one single node at the branching points of the passages. The instrument is a lightweight handheld LiDAR station combined with an inertial measurement unit (IMU), which provides measurements of angular velocities and linear accelerations. The IMU also contains a three-axis magnetometer. Based on the incoming data from the measuring instruments, a portable computer calculates the trajectory of the surveyor and the position of the point cloud relative to this trajectory. With this instrument, several thousand points are collected within seconds, and the method that estimates the trajectory may naturally produce errors. To correct these errors, the comparison of overlapping areas helps to minimize the differences, like the loop closure method in traditional cave surveying. The software uses best-fitting algorithms to automatically localize similar patches of scanned cave parts. The point cloud data, like the data of a TLS, is a .las file or a compressed .laz file.
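The role of the two GPS positions can be shown with a small sketch: the entrance fixes the translation, and the exit serves as an independent check. The example assumes that the local axes are already parallel to the axes of the geodetic grid and use the same unit, so that a pure translation suffices; station names and all coordinates are invented.

```python
def to_geodetic(local_coords, entrance_id, entrance_xyz):
    """Shift local station coordinates so the entrance matches its measured position.

    Assumes the local axes are parallel to the geodetic grid axes
    (east, north, up) and share its unit, so only a translation is applied.
    """
    ex, ey, ez = local_coords[entrance_id]
    tx = entrance_xyz[0] - ex
    ty = entrance_xyz[1] - ey
    tz = entrance_xyz[2] - ez
    return {sid: (x + tx, y + ty, z + tz) for sid, (x, y, z) in local_coords.items()}

def control_residual(geodetic_coords, station_id, measured_xyz):
    """Distance between a transformed station and its independent measurement."""
    return sum(
        (a - b) ** 2 for a, b in zip(geodetic_coords[station_id], measured_xyz)
    ) ** 0.5

# Invented values, for illustration only.
local = {
    "entrance": (0.0, 0.0, 0.0),
    "s1": (12.0, 5.0, -3.0),
    "exit": (40.0, 30.0, -8.0),
}
entrance_gps = (650000.0, 240000.0, 310.0)
exit_gps = (650040.3, 240029.6, 302.1)

geodetic = to_geodetic(local, "entrance", entrance_gps)
residual = control_residual(geodetic, "exit", exit_gps)   # about 0.5 m here
```

A residual much larger than the expected GPS and survey errors would suggest a rotation, scale, or unit problem that a translation alone cannot absorb.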
3.3. Combined data types of archive and new surveys
When a new survey is done with modern instrumentation, the subject is usually a cave where spelunkers have worked previously and produced several kinds of archive data. The newly measured and the archive data both provide valuable information for scientists; thus, they should be integrated with each other. The two datasets can be paired along well-defined spatial constraints, such as identifiable morphology or artifacts (Table 1). For example, if some points of the archive survey are marked permanently in the cave, the installed artifacts can be identified in the LiDAR point cloud as regular-shaped objects. If the markings are too small, more apparent objects (e.g., uniform-sized disks) can be mounted temporarily on the cave wall where old markings are found.
| Archive data | New data | Action |
|---|---|---|
| Map | Survey database | Identify station locations on the archive map |
| Survey database | Survey database | Match the data structure; harmonize the coordinate system (check the validity on overlapping areas) |
| Map | Point cloud | Identify characteristic points on the archive map (based on morphology, or permanent markings on the cave wall) |
| Survey database | Point cloud | Identify the possible locations of the archive stations, based on notes and/or permanent markings |
| 3D model | Point cloud | Compare the locations of the base stations; check the point cloud processing methods if only the mesh (3D model) is available |
| Documentation | Survey database | Depending on the type of the document: matching descriptions with the new survey, photo localization, section orientation |
In some cases, the structure of the archive dataset (i.e., the column sequence in the data table) may be similar to that of the new one, while their reference systems differ. To avoid errors in later phases, the two sets should be checked at overlapping parts before the two databases are unified.
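One possible form of such a check is to compare the coordinates of stations that occur in both datasets. In the sketch below, a nearly constant offset between the two sets suggests that they differ only in origin, whereas a large spread hints at a different unit, orientation, or reference system. The function name, the tolerance, and all station coordinates are assumptions made for illustration.

```python
def compare_overlap(archive, new, tolerance=0.5):
    """Compare coordinates of stations present in both datasets.

    archive, new: dicts mapping station id to (x, y, z).
    Returns the shared ids, the mean offset (new minus archive), the largest
    deviation of any single offset from that mean, and a consistency flag.
    """
    shared = sorted(set(archive) & set(new))
    if not shared:
        raise ValueError("no overlapping stations to compare")
    offsets = [tuple(n - a for a, n in zip(archive[s], new[s])) for s in shared]
    mean = tuple(sum(o[i] for o in offsets) / len(offsets) for i in range(3))
    spread = max(
        sum((o[i] - mean[i]) ** 2 for i in range(3)) ** 0.5 for o in offsets
    )
    return {
        "shared": shared,
        "mean_offset": mean,
        "max_spread": spread,
        "consistent": spread <= tolerance,
    }

# Invented data: the archive set is local, the new set is in a grid system.
archive = {"A1": (0.0, 0.0, 0.0), "A2": (10.0, 0.0, 0.0), "A3": (10.0, 10.0, -2.0)}
new = {
    "A1": (100.0, 200.0, 50.0),
    "A2": (110.1, 200.0, 50.0),
    "A3": (110.0, 209.9, 48.0),
    "B7": (130.0, 215.0, 47.0),   # only present in the new survey
}
report = compare_overlap(archive, new)

# The same archive stations expressed in feet no longer differ by a constant shift.
archive_ft = {k: tuple(c / 0.3048 for c in v) for k, v in archive.items()}
report_ft = compare_overlap(archive_ft, archive)
```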
Archive data is not necessarily old data. Point clouds of several TLS surveys are sometimes given to scientists to process and to extract new information from, but the surveys may come from different groups who worked with different instruments. It is also possible that the point cloud data (las-files) are not accessible, only the 3D model of the cave derived from the point cloud. Such models can be created in several ways, basically using stochastic methods, and the generalization of the surface (i.e., the level of detail) mainly depends on the method used. If the method is unknown for some reason, the relation between the 3D model and the original point cloud data is uncertain, and in practice, the correct position of the model can be achieved only if overlapping cave parts are present in both the model and the newly measured point cloud.
Various documents may exist if a cave has been known for a long time, and the overarching aim of a GIS is to integrate these data into a common spatial context. Depending on the type of the archive data, the integration takes different amounts of time. The process involves the same methodology that was described for the archive data (i.e., scanning and structuring), but it can reach better results thanks to the presence of the new survey. The most challenging task, however, is the positioning of archive photos, which requires deep knowledge of the subject cave. The archive photos can usually be assigned to a single spatial position, but in some cases both the subject’s and the photographer’s position can be reconstructed. This information is stored in a separate data table or in the header of the image file, from which it can be extracted with a photo-editing program. Using this information, the photo can be draped on the cave surface model.
Archive photos are not the only subjects of the systematic processing of cave-related data. Close-range photogrammetry is an emerging method of cave modeling, used besides TLS or combined with it. TLS is preferable if the morphology of a cave is the subject of the survey, while in the case of an archeological site, the texture of the cave wall is the primary aim of documentation. Even 3D models can be created from the images alone, if abundant overlapping photos are taken with the same interior parameters (focal length, distortion parameters). This is achieved with programs (e.g., Photoscan) that can reconstruct the relative spatial positions of the photographer by comparing the texture of the images.
4. The structure of a survey database
The GIS works effectively if the links between the different subsystems are well defined, and the users understand these definitions well enough to maintain the links. The rules should be built into the programs as deeply as possible (error handling) and be documented in manuals. The subsystems can be connected into a data system if:
- The data transfer is well established (documented).
- The quantitative data sets are in the same unit system.
- The qualitative data sets are defined on the same contextual basis.
Data transfer occurs whenever the user attempts to work with data that was created or stored in a program different from the one she/he is currently using. The most basic act of data transfer is opening a file with a program. If the file has improper syntax, this simple act may result only in error messages. Obviously, one can prepare for this by saving the survey data in a proper file format, but what is “proper” depends on the software used as the component of the system. Ideally, the whole sequence of processes is planned prior to the survey, but it should at least be documented in writing during the work; otherwise, the process is not reproducible. This documentation contains the list of the system components (component programs (CP)), the connections between the components (input-output (IO) formats), and the location and naming system of the files (Table 2).
| No. | Scope of the documentation | Description |
|---|---|---|
| 1 | Database management | Definition of database structures (RDB and xml), queries, connections, table/file names, attribute types, etc. Handling the different conceptual categories in qualitative data types (i.e., different nomenclature in source data). Handling the various measuring systems, units and calibrations in quantitative data types |
| 2 | Processing sequences | Sequence of the actions of the work (from measuring the data to publishing the results), and naming the programs, which were involved |
| 3 | Automated processes | List of scripts and programs that perform automatic processes, indicating certain actions they involve. Logic and syntax of file and folder naming for input and output files |
| 4 | Quality control | Description of the possible errors of each action (both manual and automatic), and the definition of the acceptable error range. Description of the possible validation methods |
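The list of component programs and their input-output formats can also be kept in a machine-readable form, which makes simple consistency checks possible. The sketch below is one hypothetical way to do this: the step names, program roles, and file formats are placeholders, and the check only verifies that every input format is produced by an earlier step or supplied from outside.

```python
# Hypothetical processing chain; program roles and formats are illustrative only.
PIPELINE = [
    {"step": "field survey", "program": "survey app",
     "inputs": [], "outputs": ["csv"]},
    {"step": "loop closure", "program": "cave survey program",
     "inputs": ["csv"], "outputs": ["dxf", "csv"]},
    {"step": "map compilation", "program": "GIS program",
     "inputs": ["dxf", "tif"], "outputs": ["pdf"]},
]

def unresolved_inputs(pipeline, external=()):
    """Return (step, format) pairs whose input format no earlier step produces.

    external: formats that come from outside the chain (e.g., scanned maps).
    """
    available = set(external)
    missing = []
    for entry in pipeline:
        for fmt in entry["inputs"]:
            if fmt not in available:
                missing.append((entry["step"], fmt))
        available.update(entry["outputs"])
    return missing

gaps = unresolved_inputs(PIPELINE)                       # the tif source is undocumented
no_gaps = unresolved_inputs(PIPELINE, external=("tif",))
```

A non-empty result points to a data transfer that is used in practice but missing from the documentation.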
In some cases, the component program itself creates the documentation, at least partially, and it can take various forms (e.g., text, rtf, and xml). In a component program equipped with GIS functionality, the user connects the different data types in a workspace, and the connections are saved in a workspace (ws) file. In a general-purpose GIS workspace such as QGIS, an xml-type file is created containing the <
From the technical aspect, well-written documentation is usually more valuable than the published result of the survey (i.e., a map or a 3D model of the cave) because it contains the stepwise methods, which make the results reproducible. Writing a thorough description of the methodology benefits not only third-party readers but also the scientists who actually work on the project, because the documentation itself may form the basis of scientific publications. In the process of reconstructing the results of an archive survey, the logs are also quite valuable.
4.1. Database management
Due to the modularity of the GIS, the survey database is neither a homogeneous table (e.g., an Excel worksheet) nor a uniform file, although the data modules may take such common forms. The original forms of the data modules are differentiated into raster, vector, and alphanumeric attribute types. If the original data is assigned to a workspace file, where the different types of data are linked, a GIS database is created despite the diversity of the original forms.
Database management involves the structuring, maintenance, and querying of the data through one or more user interfaces (program components) such as a GIS application (Figure 7). In a GIS, this is very rarely a linear sequence of tasks; rather, it is an iterative process. The iteration mainly involves the positioning of the data: if new measurements become available, the calculated coordinates of the existing (already processed) data may change because of the fitting methods. The quantity of the processed archive data may also increase. To avoid discrepancies in the database, the different data modules should contain links pointing to each other. These links allow one to manage the system without changing the database records manually. This is usually done automatically within the program component that manages the different data types (e.g., closing a loop in Therion will modify the passage geometry of the whole cave map).
The data is physically stored on a hard drive, and data access must evidently be ensured throughout the managing procedures. To ensure this, the folder structure is recorded in the workspace file, but it is also important to note that the component programs installed on the computer have their own logic for storing data. For example, the default save path of a GIS program can be the same folder where the program settings were installed, and when the program is uninstalled or reinstalled with a new version, this folder can be overwritten or removed. These folders are the system folders, which belong to the component programs together with the folders where the executables are installed. To avoid loss of data, one should not store or save acquired data or workspace files in system folders.
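A simple safeguard is to test the planned save locations against a list of known system folders before the work starts. The following sketch assumes POSIX-style paths and an invented list of installation folders; on Windows, `PureWindowsPath` would be the corresponding class, and the actual folders depend on the installed programs.

```python
from pathlib import PurePosixPath

# Hypothetical installation folders of the component programs.
SYSTEM_FOLDERS = ["/opt/gisprogram", "/usr/share/gisprogram"]

def inside_any(path, folders):
    """Return True if path equals or lies below one of the given folders."""
    p = PurePosixPath(path)
    return any(p == f or f in p.parents for f in map(PurePosixPath, folders))

def unsafe_locations(paths, folders=SYSTEM_FOLDERS):
    """List the data or workspace paths that are stored inside system folders."""
    return [p for p in paths if inside_any(p, folders)]

planned = [
    "/opt/gisprogram/projects/cave.ws",     # inside an installation folder
    "/home/user/cave_project/cave.ws",      # separate project folder
    "/opt/gisprogram2/projects/cave.ws",    # similar name, but a different folder
]
flagged = unsafe_locations(planned)
```

The comparison works on path components rather than on text prefixes, so a folder whose name merely starts with the same characters is not flagged.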
4.2. Processing sequences
A well-documented sequence of actions (from measuring the data to publishing the results), naming the programs that were involved, is like a cookbook: with experience one can achieve the results without it, but for those who are not familiar with the whole procedure, the stepwise aid is necessary. The processing sequences are the main components of the technical documentation. Basically, two scenarios can be distinguished:
The data acquisition is already done, and the GIS aims to incorporate the available information into a coherent system.
The data is yet to be acquired, and the project can identify the core components of the planned GIS in advance.
In the first case, the project obviously involves archive data, even if the data is not very old, and the main task is to revise the different sources. The processing sequence describes what is modified in the data, and how, in order to incorporate it into a GIS program. In this case the quality of the data cannot truly be changed, but it can be enhanced by filtering out unmarked records or outliers. Modifications are made to the qualitative and quantitative categorization of the data (measurement units, coordinate system, and nomenclature). The definition of qualitative categories may also differ between data sources (e.g., what is considered the minimum size of a conduit?).
In the second case, the planning should include the preparation of templates (e.g., tables) and user guides. Moreover, the incoming data should be error-checked before it is passed to the subsequent phase of the work.
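The error check of incoming records can be illustrated with a minimal sketch. The field names (from, to, length_m, azimuth_deg, inclination_deg) and the accepted ranges are assumptions made for this example; they are not taken from a specific survey template or program.

```python
def check_shot(record):
    """Return a list of problems found in one survey shot (empty list = passes)."""
    problems = []
    # Hypothetical template fields; a real template may use other names.
    for field in ("from", "to", "length_m", "azimuth_deg", "inclination_deg"):
        if field not in record or record[field] in ("", None):
            problems.append(f"missing {field}")
    if problems:
        return problems
    try:
        length = float(record["length_m"])
        azimuth = float(record["azimuth_deg"])
        inclination = float(record["inclination_deg"])
    except (TypeError, ValueError):
        return ["non-numeric measurement"]
    if length <= 0:
        problems.append("length must be positive")
    if not 0 <= azimuth < 360:
        problems.append("azimuth outside 0-360")
    if not -90 <= inclination <= 90:
        problems.append("inclination outside -90..90")
    if record["from"] == record["to"]:
        problems.append("shot starts and ends at the same station")
    return problems


if __name__ == "__main__":
    shot = {"from": "A1", "to": "A2", "length_m": "5.2",
            "azimuth_deg": "400", "inclination_deg": "-3"}
    print(check_shot(shot))
```

Records that return a non-empty list would be sent back for correction instead of being passed to the next phase of the work.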
4.3. Automated processes
In most cases, automatic processes help users with time-consuming, repetitive tasks, such as recalculating the spatial positions of the survey stations after loop closure. Not so long ago, cave maps had to be updated manually after a new survey track was added to the existing system, even if the loop closure was calculated by the software. Fortunately, this task is now also automated in some of the cave surveying programs. The automatic methods in cave data processing have been developed in three main areas:
Determining the spatial position is one of the most challenging tasks even in contemporary surveys. Ideally, the polygon network has to be connected to two distinct surface points (entrances or wells) so that the lower-order stations can be fitted in between them, but two-way measuring can also improve the quality of the data. The fitting (calculation of the stations' positions) is done with linear algebraic equations and is automated in the surveying program. The logic is the same when working with LiDAR.
Modeling of the cave is done for several reasons, but the aim is usually to create an irregular shape in virtual space. Subsequently, one can calculate its parameters, or simply use it as a visualization of a location that is hard to access. The models are thus either parametric or realistic. Parametric models can be generated automatically from the survey records by extruding 2D geometrical shapes along the station-target vectors [10, 11]. The visual representation of such a model is schematic compared to a realistic one (Figure 8). With a parametric model, one does not even have to visualize the model to obtain the results, which are numbers indicating the volume, the surface area, and the rate of void in the incorporating rock.
Creating realistic models, however, is a more common aim among cavers. Besides the table view and the map view, cave surveying programs have long provided a 3D visual representation of survey results, helping cavers to understand the passage structure more easily. In the popular surveying programs, the modeling is also based on the extrusion of a geometrical object along the station-target vector, but to enhance the model resolution, the number of vertices measured in the transversal sections is increased. Instead of just 4 (LRUD method), 6 or 12 equally distributed radial vectors are measured around a station, perpendicular to the station-target vector. The sections are placed along the polyline network, and the more vertices are measured, the better the realistic model will be. The edges of the adjoining shapes are also smoothed automatically using tangential curves and radial basis functions.
When working with LiDAR data, automatic methods help to fit the point clouds of different stations to each other. The fitting result is described with statistical parameters and, in some cases, with a new attribute assigned to each point showing the quality (quality map).
Automatic processes in data management are mainly responsible for data loading and updating. This automation occurs when the database is located on a server, while the GIS interface is on a client computer. If a working GIS is established and the database connections are defined, SQL scripts can update the client side by regularly querying the database server. The data upload process is also automated in this case: the data logged in a survey management program (or just in an Excel sheet) is written in a certain file format, which can be data-mined with scripts (simple programs developed for repetitive tasks). The script code can work with any type of data (raster, vector, or alphanumeric). It extracts the data from the structured file and uploads it to the server-side database. The data-mining scripts only work well if the files are located in the predefined path/folder; otherwise, the data is not loaded into the main database.
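A loading script of the kind described in the last paragraph might look like the following sketch. It is only an illustration under stated assumptions: the survey files are taken to be CSV exports with the hypothetical columns from, to, length_m, azimuth_deg, and inclination_deg, and a local SQLite database stands in for the server-side database. The function name and the table layout are invented for the example.

```python
import csv
import sqlite3
from pathlib import Path


def load_survey_files(folder, conn):
    """Load every *.csv file in `folder` into the `shots` table; return rows read."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS shots ("
        "from_station TEXT, to_station TEXT, length_m REAL, "
        "azimuth_deg REAL, inclination_deg REAL, "
        "PRIMARY KEY (from_station, to_station))"
    )
    loaded = 0
    # Only files in the predefined folder are seen; anything stored elsewhere is skipped.
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # INSERT OR REPLACE lets a re-run update a shot instead of duplicating it.
                conn.execute(
                    "INSERT OR REPLACE INTO shots VALUES (?, ?, ?, ?, ?)",
                    (row["from"], row["to"], float(row["length_m"]),
                     float(row["azimuth_deg"]), float(row["inclination_deg"])),
                )
                loaded += 1
    conn.commit()
    return loaded


if __name__ == "__main__":
    connection = sqlite3.connect("cave_survey.sqlite")  # hypothetical database file
    print(load_survey_files("incoming_surveys", connection))  # hypothetical folder
```

Because each shot is keyed on its station pair, the script can be run repeatedly (for example on a schedule) without creating duplicate records.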
4.4. Quality control
One of the most crucial issues of archive data processing is the estimation of the errors that are present in the sources. Errors mostly affect the spatial positioning of the base data; thus, it is important to find ways to compare the existing data with a reference that can be trusted. It is also important to know how the errors originally entered the data. The quality control (QC) of the archive data is usually based on new (control) measurements.
For example, if loop correction was done with survey management software, the geographic position of the passages might have changed drastically. On the printed editions of the map, this was not always fully tracked. The farther back in the past we go, the greater the chance of finding cave maps with uncorrected parts that were edited manually after new survey sequences. In fact, most of the resulting maps of archive surveys inevitably bear this kind of inhomogeneous error distributed over the whole area.
This creates many possibilities for subsequent misinterpretation, so first of all a consistent database of the archive survey tracks has to be obtained in order to calculate loop closures. Although in most cases this task is only a matter of digitizing the field notes, it is quite problematic if the survey database (the records of the measurements) is not available. In the latter case, the polygonal network of the survey has to be reconstructed from the maps, and there is no way to tell what the error of the survey is until a new one is conducted. However, it has been observed that the estimated error is not less than 1%, and sometimes reaches 5–10%, of the distance from the base station.
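Once the shots of a closed loop are digitized, the misclosure can be computed directly. The sketch below assumes that each shot is given as length (m), azimuth (degrees clockwise from north), and inclination (degrees, positive upward); it expresses the gap as a percentage of the loop length, which is a common check but not the same measure as the percentage of the distance from the base station quoted above. The function names are illustrative.

```python
import math


def shot_vector(length_m, azimuth_deg, inclination_deg):
    """Convert one shot to an (east, north, up) displacement in metres."""
    az = math.radians(azimuth_deg)
    inc = math.radians(inclination_deg)
    horizontal = length_m * math.cos(inc)
    return (horizontal * math.sin(az),
            horizontal * math.cos(az),
            length_m * math.sin(inc))


def loop_misclosure(shots):
    """Return (gap in metres, gap as % of loop length) for shots forming a closed loop."""
    east = north = up = total = 0.0
    for length_m, azimuth_deg, inclination_deg in shots:
        de, dn, du = shot_vector(length_m, azimuth_deg, inclination_deg)
        east, north, up = east + de, north + dn, up + du
        total += length_m
    # In an error-free loop the displacements sum to zero; the remainder is the misclosure.
    gap = math.sqrt(east ** 2 + north ** 2 + up ** 2)
    return gap, 100.0 * gap / total


if __name__ == "__main__":
    # A nearly square horizontal loop whose last leg was measured 0.4 m short.
    loop = [(10.0, 0, 0), (10.0, 90, 0), (10.0, 180, 0), (9.6, 270, 0)]
    gap, percent = loop_misclosure(loop)
    print(f"misclosure {gap:.2f} m, {percent:.2f} % of loop length")
```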
Correcting the geometry of an archive cave map is often a first step toward building the GIS. Because the spatial errors are inhomogeneous over the mapped area, the correcting method must apply spatially varying functions to modify the misplaced parts of the map. The simplest, and most adequate, method for this is based on an irregular network of triangles between ground control points (GCPs). The GCPs are usually the stations of the archive survey and can, ideally, be identified on both the scanned archive map and the line plot map, which is created from the survey database. The two maps give two differing geometric manifestations of each triangle in the network, although the corner points (stations) are literally the same. The comparison of the triangle pairs, using the Euclidean coordinates as variables, results in a first-order 2D transformation function, which can be applied to all points within the area of the triangle (including the nodes). Usually the line plot map is accepted as the base, and the scanned paper map is the one to be modified triangle by triangle. This method is also referred to as the “rubber sheet” method. The absolute difference between the two positions of the triangle's points can be expressed as an attribute in the database, making it possible to compile 2D “error maps” as a quality indicator of the archive map.
After it has been corrected, even an archive map can be used to extract enough spatial parameters for 3D cave modeling. The volumetric parameters of the cave model can be calculated from the transversal and longitudinal sections. The error is estimated by comparing the archive section with a newly measured one. This comparison is basically made using geometrical parameters (perimeter, area). The observed rate of difference depends on the resolution of the source material (the larger, the better) and the shape of the passage profile. This points to a very important aspect of cave data processing, namely the uniqueness of each cave.
In the case of new surveys, QC is usually maintained largely by the processing programs, which log the accuracy of the TLS measurements and the surface-fitting parameters. However, the user must consciously handle these logs (often presented as a simple message after finishing a work phase) to track and report the confidence of the created model. In this case, user-created logs and documentation of the data processing can be the proper way of keeping the quality under control.
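The first-order transformation of a single triangle can be written as x' = a·x + b·y + c and y' = d·x + e·y + f, where the six coefficients follow from the three GCP pairs. The sketch below solves for them with Cramer's rule and returns the displacement of a point as a possible error-map value. The function names are illustrative and are not taken from any GIS package.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def triangle_transform(scanned, reference):
    """Fit x' = a*x + b*y + c and y' = d*x + e*y + f from three GCP pairs."""
    matrix = [[x, y, 1.0] for x, y in scanned]
    x_coeffs = solve3(matrix, [p[0] for p in reference])
    y_coeffs = solve3(matrix, [p[1] for p in reference])
    return x_coeffs, y_coeffs


def apply_transform(coeffs, point):
    """Move a point of the scanned map into the coordinates of the line plot."""
    (a, b, c), (d, e, f) = coeffs
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


def displacement(coeffs, point):
    """Distance a point moves under the correction (an 'error map' value)."""
    nx, ny = apply_transform(coeffs, point)
    return ((nx - point[0]) ** 2 + (ny - point[1]) ** 2) ** 0.5


if __name__ == "__main__":
    scanned = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]    # stations on the scanned map
    line_plot = [(1.0, 2.0), (11.0, 2.0), (1.0, 12.0)]  # same stations on the line plot
    coeffs = triangle_transform(scanned, line_plot)
    print(apply_transform(coeffs, (3.0, 4.0)), displacement(coeffs, (3.0, 4.0)))
```

In a full rubber-sheet correction the same fit is repeated for every triangle of the GCP network, and each point of the scanned map is transformed with the coefficients of the triangle that contains it.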
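The comparison of an archive section with a control section can be sketched as follows, assuming both profiles are digitized as closed polygons in metres. The area is computed with the shoelace formula, and the difference is given relative to the control (newly measured) section. The function names are illustrative.

```python
import math


def polygon_area(points):
    """Shoelace formula; points are (x, y) vertices in order around the section."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the closed polygon."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def section_difference(archive, control):
    """Relative difference (%) of the archive section in area and perimeter."""
    area_a, area_c = polygon_area(archive), polygon_area(control)
    per_a, per_c = polygon_perimeter(archive), polygon_perimeter(control)
    return (100.0 * (area_a - area_c) / area_c,
            100.0 * (per_a - per_c) / per_c)


if __name__ == "__main__":
    archive_section = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
    control_section = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.25), (0.0, 1.25)]
    print(section_difference(archive_section, control_section))
```

A negative value means that the archive section underestimates the control measurement.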
5. Uniqueness of cave investigation
Caves are unique systems that evolved in unique geological, morphological, and climatic circumstances. Although following the processing sequences and using automation may help one to process and analyze the data quickly, one has to keep in mind that a given cave may be different from the previously processed one. The uniqueness comes, on the one hand, from the differing aims, but it also originates from the specialities of the surveillance techniques. Concerning the surveying part of the investigation, the position of the cave relative to the groundwater level is the most influential factor, followed by the passage geometry. Surveying can be extremely difficult in subaquatic caves where the water is muddy, and easy in dry and comfortably wide passages with box-shaped transversal sections.
During the last decade, TLS surveying technology has evolved to a level of flexibility that makes it adaptable to different geometric conditions in caves. There also exist several well-documented projects that provide the necessary stepwise help in combining photo documentation with TLS survey data. The uniqueness of the geological settings and the hydrogeological history of the area, however, is still an issue in interpreting the data. Thus, purely mathematical approaches that extrapolate over whole regions must be handled with caution because they may lead to false results. Several authors have shown that, on a regional scale, the regularities of the cave distribution in the karst depend on variables such as the thickness and dip of the beds, the presence/abundance of tectonic fractures, the rate and direction of vertical movements, and the hydrological settings (hypogenic/epigenic conditions).
6. Related sciences (convergence in scientific approaches)
GIScience in the context of cave investigations represents a convergence of surveying (geodesy), modeling (mathematics), and application of the results (different scientific disciplines). The disciplines mainly include natural sciences (geology, hydrogeology, karstology, climatology, morphology, etc.), but archeology may also be involved if the cave contains cultural heritage. This highly varying range of possible cave surveyors raises concerns about the reusability of the surveyed data. A spelunker should bear in mind that the collected data will affect the cave in the long term, either positively or negatively. If the collected data is undisclosed or unorganized, scientists of different disciplines may have to survey the cave over and over, impacting the environment with each attempt. The importance of reworking the archives comes to the fore especially in those cases where the cave environment has changed drastically (due to opening parts of it to the public), but also where the environmental impact of a new survey is high.
In a case study of the Buda Thermal Karst System (Hungary), Albert et al. demonstrated how to use a GIS to obtain new scientific results from archive data. This project aimed to estimate the macro- and meso-scale conduit porosity within the limestone and marl sequence incorporating the cave. The archive documentation included survey records, maps, and transversal and longitudinal sections. A method was worked out to create 3D passage models from the survey database using Visual Basic scripts and a GIS-capable mapping application (AutoCAD), and subsequently to extract volumetric data from the models. The database had to be prepared prior to the modeling, and the scripts automated the process of making 3D shapes from database records. In this case, the data information system included the original survey records, but without properly measured LRUD (left-right, up-down) data. The missing information was collected mainly from analog maps. The study processed three caves of the karst system and managed to estimate the unmapped size of the caves, concluding that one of the studied caves could be the largest in the country, with 2/3 of its passages yet to be explored. A few years after the modeling was done, the prediction was confirmed. The study was based on archive data and did not disturb the protected cave environments with a new survey.
Caves attract not only scholars and explorers, but tourists as well. Caves are important sites for the public, and the stakeholders, when deciding about cave management, should rely on an information system that incorporates multidisciplinary observations. In all cases, the aim is to minimize the damage to the caves and maximize the benefits of a survey. This can be achieved with a GIS afterwards, but thorough planning of the survey is also important [31, 32].
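How volumetric data can be derived from survey records with LRUD values may be illustrated with a deliberately simplified sketch. It is not the method of the cited study: it assumes rectangular cross sections of size (left + right) × (up + down) at each station and approximates each shot as a prism with the mean of its two end sections. The function names and the numbers in the example are invented.

```python
def section_area(lrud):
    """Area of a box-shaped cross section from (left, right, up, down) in metres."""
    left, right, up, down = lrud
    return (left + right) * (up + down)


def passage_volume(shots):
    """Sum prism volumes; each shot is (length_m, lrud_at_from, lrud_at_to)."""
    return sum(length * (section_area(a) + section_area(b)) / 2.0
               for length, a, b in shots)


def conduit_porosity(cave_volume_m3, rock_block_m3):
    """Share of the incorporating rock block that is occupied by passages."""
    return cave_volume_m3 / rock_block_m3


if __name__ == "__main__":
    shots = [
        (10.0, (1.0, 1.0, 1.0, 1.0), (1.0, 1.0, 1.0, 1.0)),
        (5.0, (1.0, 1.0, 1.0, 1.0), (2.0, 2.0, 1.0, 1.0)),
    ]
    volume = passage_volume(shots)
    print(volume, conduit_porosity(volume, 7000.0))
```

Such a parametric estimate needs no visualization at all; its reliability depends on how well box-shaped sections represent the real passage profiles.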
7. The future of survey data processing
Technologies and methods that may become an integral part of cave surveying in the future have their roots in the present. Surveying techniques are changing fast, but caves are still places where both the surveyor and the surveying instrument are challenged. The technological and ergonomic characteristics, the price, and the handling of a new tool should all be optimal for it to reach a breakthrough and become widespread in cave surveying. Just as the DistoX became popular a decade ago, and TLS during the last decade, emerging technologies such as LiDAR-based mobile mapping systems, combined with close-range photogrammetry, may take the place of the “most popular cave surveying” method in the future.
Although mobile mapping systems are expensive and still unavailable to most cavers, the technical requirements of close-range photogrammetry have become affordable to the wider public in the last decade. Even a mobile phone can be used to create a photorealistic 3D model of an irregularly shaped object, such as a rock surface. However, whole caves are not always suitable for photo documentation because of the shadows cast by artificial lighting, strong contrasts, and greatly varying distances. Archeological sites, though, have been documented in several cases using the combined technology of TLS and photogrammetry [19, 34]. From the aspect of the Earth sciences, geophysical methods are also at hand to map the cavities and the lithology of the rock body that encloses a cave.
The seeds of autonomous or semi-autonomous surveillance systems (robots and remotely controlled probes), which can combine laser scanning technology with other sensor types (magnetic, infrared, sonar, gravimetric, etc.), are also present. The tendency is toward higher precision and larger data size, and obviously the data management and the processing methods will also have to change to keep up with the higher demands. The increasing amount of data will demand larger storage if continuous recording is expected for hours, and the multiplication of sensors will demand more power and space. Moreover, the system has to be ruggedized, making it even larger. Contemporary fully autonomous systems capable of navigating without GNSS (Global Navigation Satellite System) are the size of a large suitcase and work only for a few hours. It will still be a long time before these surveillance robots autonomously do the caving instead of spelunkers. A more probable option is the combination of semi-autonomous systems with recent technologies. Using TLS for broad passages and drone swarms equipped with active sensors for high and tight passages will require human assistance in positioning beacons for the swarm. Still, these systems are not even in the planning phase at the moment.
This chapter gives an overview of the aims, benefits, and possible issues of creating a GIS from cave survey data, focusing on the data types on both the input (surveying) and output (modeling) side. The most widespread data types are listed and explained along with the functionalities of the system components. The approach presented here highlights that in cave investigations one does not simply use one program to process the data, but many of them (worksheet editors, map editors, and modeling tools). Although, in a strict sense, not all of the component programs used have GIS capabilities, using them in a common project connects them into an information system that has to fulfill three functionalities: processing, storing, and representation. This chapter explains how these functionalities are handled in the case of new and archive data processing.
It is crucial, and the chapter emphasizes it in several ways, that archive data is precious despite its poorer spatial resolution compared to the data of recent surveys. The huge amount of archive data is lost if it is not processed and incorporated into the common information framework of a GIS. One should not forget the environmental impact of a scientific study when deciding on a new survey instead of data-mining the archives. In some cases, the type of cave management may also have changed since the time of an archive survey (e.g., the cave has been opened to the public), and some parts of the cave can no longer be surveyed in their original, natural form. Without building such an information system from the archives, modeling and the related studies must rely only on contemporary data.
The chapter can help cave investigations in two ways: for those who are already familiar with surveying, it draws attention to the importance of procedures like data management, quality control, and automation; and for those who work with the data as beginner users, it can shed light on the various tasks related to cave surveys.
- The precision of the Leica DistoX is 2 mm within a 10 m range, with an angular error of 0.5° RMS.
- The spatial accuracy of the traditional measuring method is 1% of the distance from the entrance point in good conditions, but the error can be as high as 10%.