Information systems have evolved towards service-oriented architectures (SOA), where dedicated desktop applications have turned into on-line data and services. On the one hand, this distributed environment lets users share resources (data and tools); on the other hand, there is a need to develop mechanisms that allow users to find and access these distributed resources efficiently.
Current trends in discovering and accessing geospatial information are addressed by the deployment of interconnected Spatial Data Infrastructure (SDI) nodes at different scales to build a global spatial information infrastructure (Masser et al., 2008; Rajabifard et al., 2002), which embodies the SOA paradigm in the geospatial domain.
However, current Geographic Information Systems (GIS) and the services provided by SDIs fail to allow transparent navigation between related geographic information resources. In SDIs, as in other domains, metadata are the necessary mechanism to describe the information, and together with Catalogue Services they are the key elements for discovery and for information fusion (Nogueras et al., 2005; Díaz et al., 2007).
In this context, directives such as INSPIRE http://inspire.jrc.ec.europa.eu point out this need.
Metadata allow us to describe data and, based on them, to organize, publicize and facilitate access to such information. Traditionally, it has been the user or the data provider who creates these metadata, which are then published in Catalogue Services to be discovered and accessed later by different users in an SDI. Generating metadata (who created the data, where they are located, etc.) is a laborious task, mainly because traditional metadata formats are large and complex, and because the users documenting the data often have no knowledge of some characteristics of the original data owing to the lack of information supplied by the provider.
In this sense, the production of metadata becomes a laborious job that consumes a large amount of time and effort, and is therefore relegated to the background despite its major importance. This results, in practice, in a scarcity of metadata availability in SDIs and, consequently, in difficult data discovery and malfunctioning SDIs.
For all the above reasons, there is a need to facilitate metadata production so that metadata descriptions can be created easily, with minimal user intervention, when the data are created. In this way, data and metadata can be packed into a logical unit, created at the same time, minimizing the inconsistency between data and their metadata.
In this chapter we present a methodology for documenting geospatial information. This methodology provides mechanisms to automate the generation and publication of metadata. For demonstration purposes we describe a prototype implemented within an open-source GIS/SDI client. This prototype is capable of semi-automatically extracting explicit metadata from data resources, editing metadata, and publishing them to be catalogued for data discovery in an SDI.
This integrated workflow, which facilitates metadata creation and management, will hopefully contribute to a change in mindset about the cost/benefit ratio of generating and exploiting metadata, a necessary ingredient for a successful SDI.
2.1. Geospatial Information
There are studies showing that most information (more than 80%) is likely to be linked to a geographic position. When we talk about geospatial information, we refer to data intrinsically related to a geographic position. Although there exist formats specially designed to support geospatial data, any other data or information not considered spatial in nature can be georeferenced and then treated as such.
Georeferenced resources are thus resources of any nature whose existence is defined in physical space; that is, their location is established in terms of map projections or coordinate systems. Nowadays, the act of georeferencing has gone beyond the fields of geoscience and GIS, thanks to the emergence of new tools whose ease of use has expanded and democratized this task outside its traditional technical context.
The use of tools like Google Earth http://earth.google.es or photo-sharing sites such as Flickr http://www.flickr.com exemplifies this trend.
All this georeferenced content, like geospatial data, can be described using metadata and published in Catalogue Services in order to be integrated in an SDI.
2.2. Description of Resources
A description is a detailed and ordered explanation of what a certain person, place, object or thing is like, through an account of its various parts, characteristics or circumstances.
As we said earlier, metadata allow us to describe data and, based on them, to organize, publicize and facilitate access to such information. Metadata are commonly defined as "structured data about data", "data that describe the attributes of a resource", or simply "information about data". In other words, metadata are the information that describes the content, quality, condition, origin, and other characteristics of data; they are the information and documentation that enable data to be well understood, shared and used effectively by all types of users over time.
These metadata, or data descriptions, must be generated according to a standard in order to fulfil the minimum requirements for interoperability. One of these metadata standards is Dublin Core (DC), which was originally conceived to describe Web resources in a general way and was proposed by the Dublin Core Metadata Initiative (DCMI) http://www.dublincore.org
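To make the idea of a standardized description concrete, the following sketch builds a minimal Dublin Core record using only the Python standard library. The element names belong to DCMI's well-known 15-element set; the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def build_dc_record(fields):
    """Build a minimal Dublin Core record from a dict of element -> value."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        el = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        el.text = value
    return ET.tostring(root, encoding="unicode")

record = build_dc_record({
    "title": "Land use map of Valencia",   # hypothetical example values
    "creator": "Example Mapping Agency",
    "date": "2008-10-01",
    "format": "image/tiff",
})
```

A real DC record would typically carry more elements (subject, description, rights, ...), but even this minimal subset is already enough for a catalogue to index and return the resource.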
The need for this kind of metadata standard is pointed out by organizations such as the World Wide Web Consortium (W3C) http://www.w3.org/
The W3C proposal is to develop metadata by extending languages based on XML (eXtensible Markup Language) (Bray et al., 2000) or RDF (Resource Description Framework) (Manola & Miller, 2004). On this point there is general consensus in the geospatial domain.
As mentioned in the previous section, potentially any resource can be georeferenced and integrated with other geospatial information inside or outside SDI environments. Since we focus on a methodology for describing geospatial information, we describe next the goals of geographic metadata creation and the standards used in this domain.
Geographic metadata help people involved in the use of geographic information to find the information they need and determine how best to use it (Nebert, 2004). In (FGDC, 2000) it is stated that the creation of geographic metadata has three major goals (which are also benefits):
Organize and maintain an organization's investments in data: metadata promote the reusability of data without having to turn to the team responsible for its initial creation.
Publicize the existence of geographic information through catalog systems: metadata records are usually published through catalog systems, sometimes also referred to as directories. Electronic catalogs do not differ much from traditional library catalogs, except that they offer a standardized interface for search services. Thus, these catalogs are the tool that puts consumers in touch with producers of information. By publishing geographic information resources through a catalog, organizations can find data to use, other organizations with whom to share data and maintenance efforts, and customers for these data.
Facilitate access to the data, their acquisition and their better utilisation, achieving information interoperability when data come from various sources: metadata help receiving users or organizations to process, interpret and store data in internal repositories.
Within the world of geographic information, recommendations have been defined for the creation of metadata, whose main purpose is to provide a hierarchical and concrete structure to describe fully each dataset to which they refer. These recommendations have been created and approved by standardization bodies according to the opinions of experts in the area. In the form of standards or metadata schemas, these recommendations provide criteria to characterize geographic data properly.
Throughout the years, a set of initiatives to standardize the creation of metadata has emerged at national or European level, even within specific domains. However, these initiatives have been superseded for harmonization with the international standard ISO 19115:2003 http://www.iso.org/iso/catalogue_detail.htm?csnumber=26020 http://www.fgdc.gov/metadata/csdgm
Regardless of the metadata standard used, it is usual to classify metadata elements according to their role within the paradigm of "discovery, evaluation and access" established in (Nebert, 2004):
Discovery metadata are those elements that minimally describe the nature and content of a resource. They usually answer the questions "What, Why, When, Who, Where and How". Typical elements in this category are the title, the description of the dataset, and its geographic extent.
Exploration metadata provide information that allows one to verify that the data fit the desired purpose, to assess their properties, or to contact the organization that can provide further information.
Exploitation metadata include the descriptions necessary to access, transfer, load, interpret and use the data in the final application.
Another important aspect of metadata schemas is their level of detail, which is defined by the choice of the standard itself and by the creation of extensions and profiles. First, the chosen standard defines a more or less large set of elements with different conditions: mandatory, optional, and mandatory if applicable (conditional). An extension of the standard usually consists of adding new constraints (e.g. converting optional elements to mandatory), extending code lists, and creating new elements and entities. Some standards such as ISO 19115:2003 and CSDGM provide methods for extending the metadata within their specification. If these extra features are numerous (involving the creation of a considerable number of elements), ISO 19115:2003 recommends making a formal request for the creation of a specific application profile for the community of users who require it.
However, although specific profiles and optional and conditional elements give geographic metadata a certain flexibility, most commonly used standards such as CSDGM and ISO 19115:2003 are too complex (Nebert, 2004): both define more than 350 elements distributed into multiple hierarchical sections. This complexity means that completing geographic metadata requires devoting a large amount of time and highly qualified human resources.
Automatic mechanisms for generating metadata in a standard format would help users to increase the number of metadata available in distributed environments, improving data discovery efficiency.
2.3. Generation of Metadata
Metadata are usually created by data providers, generated manually and stored (separately from the resource) in catalogs, following the digital libraries tradition, to be found later for informational purposes. However, practical problems with their creation and maintenance are limiting their effectiveness for tasks such as discovery and evaluation of the usefulness of a given resource.
Some authors identify as causes of this low effectiveness the complexity of the rules and standards in the geospatial context, and the low automation and synchronization between the creation of data and metadata. In terms of complexity, (Bodoff et al., 2005) regret the overload of planned uses for some metadata: according to certain rules, some metadata must simultaneously provide the documentation, the configuration and the access point to the resource. Other authors point out the need to automate metadata generation (Bulterman, 2004; Manso et al., 2004).
Nowadays metadata are usually created manually, and only a few elements, such as the geographic extent or the creation date, are extracted automatically by software. In theory only subjective descriptors such as the abstract need to be introduced by hand, but the complexity and variety of formats limit the application of automated techniques.
Owing to the increasing need for metadata to find the great amount of data available in distributed environments, especially in geospatial information systems deployed as SDIs, there are numerous software applications that try to facilitate metadata generation. Most of these applications started by supporting the CSDGM standard and ISO 19115. A good survey of these applications is available from the FGDC metadata working group http://www.fgdc.gov/metadata/iso-metadata-editor-review
The purpose of this section is to present a brief state of the art of existing proposals to improve the automated generation of metadata. The hypothesis is that automatic generation of meta-information decreases human interaction in the creation of metadata, reducing the associated workload and the obstacles arising from the complexity of the metadata schemas that metadata creators must face.
2.3.1. Methods Aimed at Extraction
Current models of representation of georeferenced information, especially the raster and vector spatial representation models, are characterized by being highly structured, and they manifest themselves in multiple exchange formats. Owing to the complex nature of these digital resources, it is not possible to effectively reuse the methods for automatic metadata generation that already exist in the context of information retrieval over textual documents (e.g. Web crawlers). On the other hand, the few existing GIS tools that offer automatic deduction of metadata for raster and vector formats are based on the analysis of these specific formats and on the implementation of ad-hoc mechanisms that process the data in these formats to extract information, which is later used to populate the metadata elements (Manso et al., 2004).
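The ad-hoc, format-specific nature of these extractors can be sketched as a dispatch table keyed by file extension. The extractor functions below are placeholders under the assumption that each would parse its own binary format; names and the set of supported formats are illustrative only.

```python
import os

def extract_shapefile_md(path):
    # Placeholder: a real extractor would parse the .shp/.dbf headers
    # to obtain geometry type, feature count and bounding box.
    return {"format": "ESRI Shapefile"}

def extract_geotiff_md(path):
    # Placeholder: a real extractor would read the GeoTIFF tags
    # (projection, pixel size, extent).
    return {"format": "GeoTIFF"}

# One ad-hoc extractor per exchange format (hypothetical registry)
EXTRACTORS = {
    ".shp": extract_shapefile_md,
    ".tif": extract_geotiff_md,
}

def extract_metadata(path):
    """Dispatch to the format-specific extractor, if any."""
    ext = os.path.splitext(path)[1].lower()
    extractor = EXTRACTORS.get(ext)
    if extractor is None:
        return {}  # unsupported format: no automatic metadata
    md = extractor(path)
    md["file_name"] = os.path.basename(path)
    return md
```

The registry design mirrors how the tools cited above work: supporting a new exchange format means writing one more ad-hoc parser and registering it, rather than generalizing a single extraction method.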
Among the applications that perform automatic extraction of metadata from certain geospatial data exchange formats is the free software tool CatMDEdit http://catmdedit.sourceforge.net
Another well-known and widely used tool that includes automatic metadata generation functionality from geographic data is provided by ESRI http://www.esri.com http://www.idee.es/resources/recomendacionesCSG/NEM.pdf
Regarding other data formats, such as text, sound or video documents, content creation software (that is, the range of programs used to create these resources) usually supports some automatic metadata generation from the content it produces. For example, MS Office attaches to a document a title based on the text of its first line, along with other technical metadata such as creation and modification dates and author information. Metadata created by content creation software are often used by the file system to index and sort contents. All these metadata can be collected during the creation process (Greenberg et al., 2005); moreover, during the creation of digital content other metadata can be generated automatically, which are usually used by visualization applications but not usually taken into account for describing the resource for future discovery.
Two examples of the type of work being developed in this area are the report of (Greenberg et al., 2005) on the generation of metadata for MS Word, Acrobat, Dreamweaver, CityDesk, WinAmp and other file formats, and the DCS (Dublin Core Services) project, which develops a set of services and applications for automatic metadata extraction from more than 10 digital formats (XML, BibTeX, XHTML, PNG, etc.); the latter project aims to support the development and widespread application of the Dublin Core standard.
2.3.2. Methods Aimed at Inference
We can infer metadata from other metadata or from the geodata themselves, using various techniques: data mining, information retrieval, exploiting the context surrounding the data, reasoning techniques, and so on. For example, we can determine the administrative boundaries containing a geospatial dataset from its bounding box using a gazetteer, or infer a more or less adequate abstract from the name and legend of a layer, its geographical position, etc.
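The bounding-box example can be illustrated with a toy in-memory gazetteer; a real system would query a gazetteer service instead. The entries and coordinates below are approximate and purely illustrative.

```python
# Toy gazetteer: place name -> extent (min_lon, min_lat, max_lon, max_lat).
# Entries and coordinates are illustrative approximations only.
GAZETTEER = {
    "Spain": (-9.3, 36.0, 3.3, 43.8),
    "France": (-5.1, 41.3, 9.6, 51.1),
}

def overlaps(a, b):
    """True if the two (min_lon, min_lat, max_lon, max_lat) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def place_names_for(bbox):
    """Return gazetteer entries whose extent overlaps the data bounding box."""
    return [name for name, extent in GAZETTEER.items() if overlaps(bbox, extent)]

# A dataset roughly covering the Valencia region
print(place_names_for((-1.5, 38.7, 0.6, 40.8)))   # -> ['Spain']
```

The inferred place names can then populate a keyword or spatial-coverage metadata element without any user input.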
In this sense, (Taussi, 2007) proposes a metadata extraction based on three fundamental steps. The first step consists of applying metadata extraction techniques largely based on the specific exchange format of the geographic data. The next step is the automatic deduction of information regarding data quality, using brute-force, stochastic or comparison techniques against other reference data. Finally, the last step applies data mining techniques that lead to a higher degree of knowledge about the data. Among the proposed data mining techniques, the following can be highlighted (Hand et al., 2001):
Exploratory data analysis: the goal is to explore the data without clear ideas of what we are looking for.
Descriptive modelling: the idea is to describe all of the data, for example by showing their distribution, partitioning them into groups, or building models that show relationships between variables.
Predictive modelling: the goal is to build a model that permits one variable to be predicted from the known values of other variables.
Discovery methods: these methods are based on pattern detection; the idea is to identify patterns, rules, outliers or combinations of items that occur frequently in the data.
Retrieval by content: based on comparing the contents of the dataset with a pattern of interest in order to find similar patterns.
A work that takes a further step in the induction of metadata from data analysis is that of (Klien & Lutz, 2005). They propose a method for automatic annotation of geodata consisting of two main steps. In the first step, ontologies are defined from the definition of concepts (e.g. floodplain) for a possible dataset; depending on the spatial relationships that should hold with respect to a reference dataset, it is checked whether the dataset corresponds to a floodplain: its connection to a nearby river, its altitude with respect to it, and whether the terrain is flat. In the second step, the topological relationships are verified by spatial processing for each type of relationship included in the concept, and if all of them hold, the dataset is semantically annotated with the concept. This approach requires concepts to be defined beforehand in terms of spatial relationships, so the method is not directly applicable to arbitrary datasets. In any case, it is helpful to specify formally the spatial analysis that checks whether a dataset meets certain characteristics, in order to annotate it semantically.
Outside the geographic scope, other works also exploit the idea of extracting information about resources using techniques related to data mining. Along this line we can mention the work of (Kawtrakul & Yingsaeree, 2005), which provides a framework for extracting metadata from electronic documents such as text documents or images; the study of (Day et al., 2007) on extracting publication metadata from bibliographic references; and the articles of (Boutell & Luo, 2005) and (Suh & Bederson, 2007), where photograph analysis techniques (based on clusters) are presented to identify, for example, whether a photograph was taken during the day or at night, in a nature reserve or in a city, indoors or outdoors.
2.4. Publication of Georeferenced Resources
Publication is the act of revealing or conveying some information to the public, that is, the activity of making information available for public view.
In the world of geographic information it is widely accepted that publishing means making some information about the data available to users in a catalog service, as promoted by the Open Geospatial Consortium (OGC) http://www.opengeospatial.org http://www.loc.gov/z3950/agency
A more general mechanism, whose philosophy and operation can nevertheless be adapted to the field of georeferenced resources, is the Open Archives Initiative (OAI) http://www.openarchives.org
OAI provides the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) specification, which defines standards for the description and exchange of aggregations of Web resources. These aggregations, sometimes called compound digital objects, may combine distributed resources with multiple media types including text, images, data, and video. The goal of these standards is to expose the rich content in these aggregations to applications that support authoring, deposit, exchange, visualization, reuse, and preservation. Although a motivating use case for the work is the changing nature of scholarship and scholarly communication, and the need for cyberinfrastructure to support that scholarship, the intent of the effort is to develop standards that generalize across all web-based information, including the increasingly popular social networks of "Web 2.0".
The specification of standards for the publication of georeferenced information, whose implementation is feasible from a technical and economic point of view, is essential to the progress of technology and services.
3. Proposed Methodology
3.1. Metadata Generation
The proposed methodology, which aims to automatically generate complete metadata of reasonable quality while avoiding user participation as much as possible, is a combination of the methods described above, orchestrated efficiently.
Initially, we obtain all the relevant information that can be extracted from the data resource itself, for example, the data size or the creation and modification dates. Next, we try to extract as much information as possible from the data content. We must emphasize that this is one of the most important sources of information, so special attention must be paid to it. Moreover, how to analyze the resource and the amount of information available will depend entirely on the nature of the resource and the format in which it is represented. By this method we can extract information that is explicit in the data; for instance, in an email it is easy to find the sender, the recipient, or the date it was sent.
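The first step, harvesting what the file system itself can tell us, can be sketched in a few lines of standard-library Python; the set of collected elements here is a minimal illustrative choice.

```python
import datetime
import os

def file_system_metadata(path):
    """Collect the metadata that the file system itself can supply:
    file name, size, and last-modification date."""
    st = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": st.st_size,
        "modified": datetime.datetime.fromtimestamp(st.st_mtime).isoformat(),
    }
```

These values come for free at any point in the data lifecycle, unlike the creation-process information discussed below, which must be captured at creation time.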
Next, we add the common information pertaining to the creation and exploitation contexts. From the creation context we can obtain relevant information such as the organization or company responsible for the data and the theme of the data. We can proceed similarly with the exploitation context, obtaining information such as the theme or the quality of the resources offered by a certain company. All this information can be set beforehand, revised by the user, or obtained automatically by exploring the dataset and its context.
The next step is to collect information from the data creation process, if such a process exists. It should be noted that this source of information is "volatile": it is only available at the time the data are created, and for this reason we must collect and store all possible information at that moment. We consider that the information obtainable from the creation process is very important and rarely taken into account. By this method we can accurately record relevant information such as the creation process itself (in order to replicate the results later), its costs (computational, temporal, economic, etc.), or the author of the data. In addition, during the creation of the data, a sensor or other measuring mechanism can provide relevant information: magnitudes such as elevation, position or temperature can be measured and incorporated into the metadata automatically. A good example of this method is found in digital cameras that add, among other elements, the information provided by their integrated Global Positioning System (GPS) http://en.wikipedia.org/wiki/Global_Positioning_System http://www.exif.org
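A minimal sketch of capturing this volatile creation-context information is shown below. The author lookup and the sensor-readings dict are stand-ins (assumptions) for whatever the real environment and measuring devices would supply, in the spirit of the EXIF/GPS example above.

```python
import datetime
import os

def snapshot_creation_context(sensor_readings=None):
    """Capture creation-time information that would otherwise be lost.

    `sensor_readings` stands in for values supplied by a measuring
    device (GPS position, temperature, elevation, ...); here it is
    simply a dict passed in by the caller.
    """
    context = {
        "created": datetime.datetime.now().isoformat(),
        # Stand-in for the authenticated author of the data
        "author": os.environ.get("USER", "unknown"),
    }
    if sensor_readings:
        context.update(sensor_readings)
    return context

md = snapshot_creation_context({"gps": (28.29, -16.63), "temperature_c": 12.0})
```

The key point is when this function runs: at creation time, inside the producing application, because these values cannot be reconstructed afterwards.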
Having reached this point, we already have a base of information, and by applying deductive methods to it we try to extend it. One way to deduce new metadata is to create a metadata element through a direct correspondence with another existing element; for example, the place name corresponding to the data can be obtained from the four coordinates of its bounding box using a gazetteer service. Another way to deduce new metadata is to calculate a new element through a computational process over the data themselves. In this sense there are many open lines of research covering a wide range of possibilities, from techniques that analyze or process text documents or web pages to find their main theme or keywords, to techniques that employ the geodata themselves, for example to determine the province of a town by topological calculations. The last deductive method is the inference of metadata from other existing metadata or from the data content. This is the best method, and in some situations the only one applicable, for the post-hoc creation of metadata, that is, for documenting existing geodata. An example would be to infer the season of the data from the temperature metadata element, perhaps obtained by measurements such as those explained above; a rule could establish that a temperature below 15 degrees in Tenerife suggests winter. It is obvious today that the creation of such inferred metadata overlaps extensively with the research fields of data mining and information retrieval (Goodchild, 2007).
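The temperature-to-season rule can be sketched as a simple inference function over an existing metadata record. The 15-degree threshold is the example from the text; the "summer" fallback for warmer readings is an added assumption for illustration.

```python
def infer_season(md):
    """Infer a 'season' element from existing metadata, following the
    rule from the text: below 15 degrees (in Tenerife) suggests winter.
    The 'summer' branch is an illustrative assumption, not part of the rule.
    """
    temp = md.get("temperature_c")
    if temp is None:
        return md  # nothing to infer from
    inferred = dict(md)
    inferred["season"] = "winter" if temp < 15 else "summer"
    return inferred

print(infer_season({"temperature_c": 12.0})["season"])   # -> winter
```

In a real system such rules would be region-dependent and probabilistic rather than hard-coded, which is precisely where this step overlaps with data mining.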
Finally, we must always offer the user the possibility of modifying or introducing information, although the idea is that, by observing the acceptable results of the methodology, users will increase their confidence in it and tend not to participate in the metadata generation process.
While most methods are applicable throughout the lifecycle of the data, other methods are only applicable at the moment the data are created. We must pay special attention to the latter, since information that is not collected at that moment is usually lost forever, and some of it can be essential for a correct description of the resource.
The proposed methodology improves the automatic generation of metadata and their quality by collecting information that is currently ignored, such as that coming from the creation process, which can be very important to describe resources properly. Consequently, applying this methodology yields more metadata, of higher quality and correctness, with reduced user participation.
3.2. Metadata Publication
Metadata publication is the second step of the proposed methodology. Once the metadata have been generated, users will be assisted in publishing them automatically, in a way integrated into the workflow. Hopefully, this will increase the amount of published metadata, since we drastically reduce the effort necessary to correctly describe resources (compared to generating metadata manually) and to publish them.
In our methodology, metadata publication means publishing metadata in a Catalogue Service; for this purpose we will use the CSW http://www.opengeospatial.org/standards/cat
However, we want to explore other ways to publish metadata and resources in order to improve their findability. One way is to publish the information resources directly in servers or social networks that support this kind of information, using the metadata to document them properly in the server. For example, we can publish maps in MapServer http://mapserver.org http://www.wikiloc.com http://en.wikipedia.org/wiki/Peer-to-peer
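As a sketch of the catalogue path, the following builds the kind of request a publication wizard might send to a CSW endpoint. The envelope follows the OGC CSW 2.0.2 Transaction (Insert) operation; the record payload and the absence of any network code are deliberate simplifications.

```python
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"

def build_insert_request(record_xml):
    """Wrap a serialized metadata record in a CSW 2.0.2
    Transaction/Insert envelope.

    `record_xml` is assumed to be an already serialized record
    (e.g. a Dublin Core or ISO 19139 document).
    """
    return (
        '<csw:Transaction service="CSW" version="2.0.2" '
        'xmlns:csw="%s">'
        "<csw:Insert>%s</csw:Insert>"
        "</csw:Transaction>" % (CSW_NS, record_xml)
    )

request = build_insert_request("<rec/>")  # placeholder record
```

In practice the wizard would POST this document to the catalogue's Transaction endpoint and parse the `TransactionResponse` to confirm the insert; both steps are omitted here.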
This section describes the architecture of our proposal to implement this methodology. The following architecture shows the modules and connections used to design an application that we have called
A crawler is an application that explores the content of a system in a methodical, automated way. This kind of application is used to build an index of the resources found in the system, on the basis of the information extracted from each resource during its processing. In this way,
To implement and fulfil the requirements of the proposed automatic metadata generation methodology, we decided to use a three-tier architecture (Eckerson & Wayne, 1995), placing in each tier the modules designed to implement the required functionality. This is a client-server architecture in which the user interface, the functional process logic ("business rules"), computer data storage and data access are developed and maintained as independent modules, often on separate platforms.
As we can see in Figure 1, at the bottom of the figure and at the lowest level of the application is the Data Tier. This tier consists of a database to store the generated metadata and the data files themselves. In this tier the information is stored and retrieved, so it must provide well-defined interfaces to manage the data. In our case, this access is provided by data drivers to access the data files and by ODBC http://en.wikipedia.org/wiki/Open_Database_Connectivity
The next tier, just above the Data Tier, is the Logic Tier. It controls the application's functionality by performing detailed processing. At the bottom level of this tier we find the components implementing the metadata generation methods aimed at extraction. These components correspond to the methods described in the proposed methodology section. Thus, the
According to the methodology, and as we can see in the architecture, at the top level of this tier, connected to the modules that generate metadata, we have the
At the top of Figure 1 we have the highest application level, the Presentation Tier, which displays the information provided by the lower tiers through a graphical user interface. This user interface also allows users to interact with, configure and operate the application.
This kind of architecture benefits from the advantages of modularized software with well-defined module interfaces, and it is intended to allow any of the three tiers to be upgraded or replaced independently. This is very useful if we want to reuse some components (even the module containing the two lower tiers) and integrate them into another system, incorporating the functionality of automatic metadata generation and management into any new or existing application.
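The replaceability of the tiers can be sketched with minimal Python interfaces; all class and method names here are hypothetical, and the in-memory store stands in for the database/ODBC back end of the Data Tier.

```python
from abc import ABC, abstractmethod

class DataTier(ABC):
    """Storage and retrieval of metadata (Data Tier interface)."""
    @abstractmethod
    def save_metadata(self, resource_id, metadata): ...
    @abstractmethod
    def load_metadata(self, resource_id): ...

class InMemoryDataTier(DataTier):
    """Trivial Data Tier used here in place of a database/ODBC back end."""
    def __init__(self):
        self._store = {}
    def save_metadata(self, resource_id, metadata):
        self._store[resource_id] = metadata
    def load_metadata(self, resource_id):
        return self._store.get(resource_id, {})

class MetadataManager:
    """Logic Tier: accepts generated metadata and delegates storage
    to whatever Data Tier implementation it was given."""
    def __init__(self, data_tier):
        self.data_tier = data_tier
    def describe(self, resource_id, extracted):
        self.data_tier.save_metadata(resource_id, extracted)
        return extracted

# Swapping InMemoryDataTier for an ODBC-backed implementation would not
# require touching MetadataManager -- the point of the tiered design.
manager = MetadataManager(InMemoryDataTier())
manager.describe("roads.shp", {"title": "Road network"})
```

The Presentation Tier would sit on top of `MetadataManager` in the same way, talking only to its interface.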
5. Case Study: Metadata Management Platform in gvSIG
We describe next a case study in which we implemented a proof of concept of our methodology and architecture. We implemented a prototype of the metadata manager using the functionality and extension possibilities offered by an open-source GIS/SDI client called gvSIG http://www.gvsig.gva.es
We have extended gvSIG to facilitate, within an integrated workflow, metadata creation, management and publication. The prototype interacts with the gvSIG core to handle the metadata associated with every resource marked to be described, and provides automatic extraction of explicit metadata from data resources, both as internal metadata for user efficiency purposes and as external metadata to be catalogued for data discovery in an SDI. With this integrated solution we can capture much of the information available during the data creation process. The metadata manager works in the background, annotating metadata while gvSIG users work with their geospatial data; when required, it uses the implementation of the proposed metadata generation methodology to obtain as much information about the resource as possible, without user interaction. As an added value, gvSIG uses these metadata internally to avoid task duplication or recalculation and to visualize the resources properly. On the other hand, when the user wants to share data in a distributed environment such as an SDI, he or she can use these metadata for publication. The metadata manager allows the user to visualize and edit the metadata according to a chosen standard format, and warns about their status, for example whether they fulfil the minimum required set of elements of a certain standard. Finally, as part of this workflow, the prototype includes a user-friendly wizard that guides the user in publishing these metadata in a Catalogue Service.
This prototype became available in October 2008 as a pilot plug-in of gvSIG. To sum-up it includes the metadata manager capable of semi-automatic extraction of explicit metadata from data resources for internal use or for being exported to a standard format and/or published in a catalogue service.
In this prototype we support GeoNetwork Opensource http://geonetwork-opensource.org
As Figure 2 shows, in this prototype we define, within the central structure (or core) of gvSIG, a dynamic internal metadata object that keeps all types of metadata associated with their respective resource, as we can see reflected in the
The various metadata elements collected are stored in an XML file saved together with the data for future use. When a resource is first created, and thus has no associated metadata yet, the
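Storing the collected elements alongside the data can be sketched as a simple XML "sidecar" file (element names here are illustrative; the prototype's actual schema is not specified in the text):

```python
import os
import xml.etree.ElementTree as ET

def save_sidecar(data_path, metadata):
    """Serialise the collected metadata elements to an XML file stored
    next to the data resource, so they survive between sessions."""
    root = ET.Element("metadata")
    for key, value in metadata.items():
        ET.SubElement(root, key).text = str(value)
    # Same base name as the data file, with an .xml extension
    sidecar = os.path.splitext(data_path)[0] + ".xml"
    ET.ElementTree(root).write(sidecar, encoding="utf-8", xml_declaration=True)
    return sidecar
```

Keeping the file next to the data means the metadata travels with the resource and can be reloaded the next time the resource is opened.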
Figure 3 shows a screenshot of some of the GUI (graphical user interface) modules depicted in the prototype architecture. These modules let the user visualize and edit the associated metadata with the metadata editor; he or she can add information (such as an adequate title or abstract) that may be required by the standard metadata formats. When a user wishes to edit the metadata in order to export it or publish it in a catalogue service, he or she first chooses one of the supported standard formats. The metadata manager then starts a wizard that guides the user through viewing and editing the metadata according to the selected format, validating the metadata fields. The wizard also guides users through importing and exporting metadata, validated against a standard format, so that it can be shared among users without having to be published in a catalogue service. As we see in the figure the
As shown in Figure 3, this user interface also links with the
Given the above, let us look at a typical use case. A technician using gvSIG has combined basic geospatial data, including terrain data such as slope and aspect, with vegetation data to create a rough forest-fire risk map. Assuming he or she has permission to share this new dataset, the technician then publishes the risk map to a map server and would also like to (or may be required to) publish its description to a metadata catalogue service, such as the one currently available at the European Commission INSPIRE Geoportal http://www.inspire-geoportal.eu
In our use case the resulting dataset, the risk map, has an associated metadata object created by the process described above. The final step in the workflow occurs when the user decides to publish the metadata record to a catalogue service: the metadata manager checks the validity of the metadata associated with the resource. The validation depends on the metadata standard chosen for publication, and hence on the standards the Catalogue Service supports. If the metadata meet the minimum requirements of the selected output metadata standard format, the metadata manager uses stylesheets to generate an XML record compatible with the catalogue service and carries out the pertinent interaction with the server to establish the connection and publish the metadata.
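The validation and publication steps can be sketched as follows. The required-element list below is an illustrative subset (the real check depends on the standard chosen), and the transaction wrapper follows the OGC CSW 2.0.2 Transaction ("CSW-T") request shape used by catalogues such as GeoNetwork; the stylesheet transformation itself is omitted.

```python
import xml.etree.ElementTree as ET

# Illustrative minimum element set; the real set depends on the
# metadata standard selected for publication.
REQUIRED = ["title", "abstract", "dateStamp"]

def validate(metadata):
    """Return the required elements that are still missing or empty."""
    return [k for k in REQUIRED if not metadata.get(k)]

def build_csw_insert(record_xml):
    """Wrap an already-transformed metadata record in a CSW-T Insert
    transaction, ready to be POSTed to the catalogue endpoint."""
    csw = "http://www.opengis.net/cat/csw/2.0.2"
    ET.register_namespace("csw", csw)
    tx = ET.Element("{%s}Transaction" % csw, service="CSW", version="2.0.2")
    insert = ET.SubElement(tx, "{%s}Insert" % csw)
    insert.append(ET.fromstring(record_xml))
    return ET.tostring(tx, encoding="unicode")
```

Only when `validate` returns an empty list would the manager transform the record and send the transaction to the server, which matches the workflow described above.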
At present the prototype is only capable of working with the shapefile format http://en.wikipedia.org/wiki/Shapefile
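One reason shapefiles are a convenient first target is that some explicit metadata, such as the spatial extent, sits directly in the fixed 100-byte header of the main `.shp` file. A small sketch of reading that bounding box (this is our illustration of what such an extractor could look like, not code from the prototype):

```python
import struct

def shapefile_bbox(path):
    """Read the bounding box (Xmin, Ymin, Xmax, Ymax) stored at bytes
    36-67 of the 100-byte shapefile main-file header."""
    with open(path, "rb") as f:
        header = f.read(100)
    # Bytes 0-3 hold the big-endian magic file code 9994.
    if struct.unpack(">i", header[:4])[0] != 9994:
        raise ValueError("not a shapefile")
    # The bounding box is four little-endian doubles.
    return struct.unpack("<4d", header[36:68])
```

An extent obtained this way can populate the geographic bounding-box element of the metadata record without any user input.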
Using this integrated solution, the user can close the life cycle of metadata (Baca, 2008) within a single application: metadata can be created, modified and published with the metadata manager, and later discovered and retrieved with the catalogue client integrated in gvSIG, which allows us to recover the linked data from the Catalogue Service.
6. Conclusions and Future Work
Metadata descriptions are critical to enhancing the discovery, access and use of geographic information (GI) data, and are therefore a key element in achieving good data integration and the smooth functioning of Spatial Data Infrastructures as basic infrastructures to discover, share and use heterogeneous GI data. This points out the need to facilitate metadata production, so that metadata descriptions in standard formats can be created easily and with minimal user intervention.
The presented methodology includes mechanisms capable of automatically generating and publishing metadata in open catalogues as a means of improving geospatial information sharing in distributed environments such as SDIs.
As a proof of concept, the implemented prototype allows the extraction of explicit metadata from data resources to be catalogued for data discovery in a Spatial Data Infrastructure.
This implementation of semi-automatic metadata extraction and management facilitates the creation and editing of descriptions of imagery and geospatial data to be published in a Spatial Data Infrastructure. Because the solution is integrated into the user's workflow, we hope it will lead to a proliferation of metadata creation, thus improving the functionality and value of SDIs. Furthermore, this development fully supports the philosophy of total integration of data and metadata that we are trying to promote, so that all generated data are easily found and accessed.
In the near future we will complete the development of the metadata manager for documenting well-known imagery and cartographic data sources. This work includes documenting more types of resources and file formats, adding new standard formats and expanding the publication possibilities; however, the major effort will go into continuing to implement and improve each of the methods that compose the proposed methodology to automatically generate metadata included in the
As mentioned above, the automatic metadata generation methodology also includes more intelligent methods that extract metadata by applying inferential reasoning techniques to other associated metadata and data, as well as intuitive extraction of intrinsic (context-based) metadata from the data source using Google-like techniques, including deductive methods to create well-formed free text.
Another interesting future development is the
On the other hand, we will continue to investigate and develop new techniques that allow the complete integration of data and their metadata. This will greatly facilitate the management, reuse and sharing of resources. Additionally, we will explore other lines of research into the publication of georeferenced resources, for example the use of indexing techniques that allow data to be found using simple metadata sets, rather than creating complex formats such as those stored in current catalogues.
This work was partially funded by the project “CENIT España Virtual”, funded by the CDTI within the program “Ingenio 2010” through