Fundamentals of Volunteered Geographic Information in Disaster Management Related to Floods

The main purpose of this chapter is to introduce fundamental knowledge regarding the notion of volunteered geographic information (VGI) and its applications in disaster management (DM) of events related to floods. Initially, the term is defined along with certain properties and general trends that characterize VGI. A brief literature review outlines the range of activities that the term covers, along with its applications to flood event management. Those applications cover significant aspects of both VGI and the DM cycle: in terms of VGI, they range from participatory activities of volunteers to pure analysis of data extracted from social media and other VGI sources; in terms of the DM cycle, they range from mitigation to response and recovery. Finally, four main clusters of open challenges are addressed. Those clusters cover the vast majority of open topics in this research field.


Introduction
Flood events occur with high frequency globally, for reasons related to climate change, deforestation, and the problematic urban design of many highly populated areas. As a result, the effective disaster management (DM) of flood events, which aspires to mitigate both the occurrence and the negative consequences of those incidents, has emerged as an important field.
The current chapter provides a comprehensive overview of interdisciplinary research regarding the use of volunteered geographic information (VGI) in procedures, methods, and strategies related to DM of flood events.
The next sections introduce the notion of VGI and its applications to DM of events related to floods. Various similar terms are mentioned, along with a literature review that outlines the range of activities that make up the applications of VGI to flood event management. Those applications cover significant aspects of both VGI and DM components. Specifically, the scope of the applications ranges from participatory activities of volunteers to pure analysis of VGI data generated from social media content and other VGI sources. In terms of DM, those

VGI data sources
The VGI data sources can be grouped into two main categories: (1) the conventional, pure, structured, or purpose-driven VGI sources and (2) the unstructured, unintentionally driven ones.
The first category consists of specialized websites in which users are invited to report or generate specific information by following some basic rules or simple procedures. Probably the most popular representative of this category is OpenStreetMap (OSM), founded by Steve Coast. OSM counts millions of users who contribute mapping information, while the mapping quality in highly populated cities of the world is equivalent to that of conventional mapping data providers [25,26]. Regarding floods, there is published research on using OSM content for the needs of flood event management [27].
DOI: http://dx.doi.org/10.5772/intechopen.92225
Various other specialized VGI sources that focus on DM are based on the Ushahidi platform. Ushahidi means "testimony" in Swahili. It was initially developed for mapping violent incidents in Kenya during the country's post-electoral events in 2008. Since then, Ushahidi has evolved into an organization that provides web software for crisis situations. The platform has been widely used for DM purposes in natural events [28,29], while applications exclusively regarding flood events are analyzed in the following sections.
The second category of VGI data sources consists of popular websites through which users generate geo-information unintentionally. Those VGI sources include almost all of the popular social networks (Facebook, Twitter, WeChat, YouTube). Considering the billions of social media users, the volume of produced information is tremendous, and numerous studies are based on the exploitation of those data. Moreover, as the use of that category of sources in developing countries is constantly rising [30], a large volume of information regarding floods is becoming available, thus helping to address data availability, which is characterized as problematic [31]. In addition, the enormous volume of generated information can contribute significantly to the emergency response to a disastrous flood event, as immediate information is vital for an effective rapid response.

Characteristics and properties of VGI
A significant property of VGI concerns conventional VGI sources and their compliance with specifications [23]. It is generally accepted that volunteers tend to ignore strict specification rules, as highly disciplined data production could kill their interest in generating data [32,33]. Well-designed user interfaces and purpose-driven approaches for generating data are considered efficient ways to increase the amount of structured information generated.
Moreover, some of the most important aspects of VGI that need to be assessed are quality and credibility. Regarding both, Linus' law seems to apply in the vast majority of cases [25,26,33,34]. Linus' law is linked to the Linux operating system and implies that the more programmers work on a piece of software, the fewer bugs the software will have [35]. In terms of VGI, Linus' law implies that the more volunteers are active in a certain region, the more accurate and complete the information will become.
Even though Linus' law seems to apply in most cases, recent research on unstructured VGI sources, such as social networks, has demonstrated that the information produced by the majority of users might be wrong. So far, those cases usually involve controversial and subjective topics with political orientation and impact. An indicative example is the spread of fake news through Twitter during the presidential elections of 2016 [36,37]. Moreover, many researchers propose quality frameworks for assessing VGI that do not rely on the validity of Linus' law [38]. In terms of DM of physical events like floods, though, Linus' law seems to hold.
Finally, another significant property of VGI refers to the spatial heterogeneity of the produced spatial content. Even if in a certain area the quality of the produced information may be considered as sufficient, in other areas, data quality may be proven significantly different. An indicative example is presented in [39] in which a comparison was performed, between the spatial distribution of flood events extracted from VGI and the floods that were reported in official authoritative sources. While in various parts of the world the information was equivalent to the official data, in other areas there was missing information. As a result, assessments of VGI data in areas of interest always need to be performed in order to be assured that the data quality is sufficient for the use that it is designated for.

DM and VGI
DM is the term that describes the scientific and operational activities and strategies that focus on mitigating the negative consequences of a catastrophic event. In general, DM consists of five main parts that compose the DM cycle: (A) prevention, (B) mitigation, (C) preparedness, (D) response relief, and (E) recovery, which is divided into rehabilitation and reconstruction [40]. For each part there is a plethora of published research, while the range of events that are confronted through DM is wide: from political crisis situations and wars to physical events such as floods, earthquakes, and fires [41].
The general notion of VGI has emerged as an important component that aspires to contribute to each part of the DM cycle [42,43]. In addition, the importance of volunteer activities in DM procedures is clearly stated in the Sendai Framework for Disaster Risk Reduction of the United Nations [44], according to which the role of volunteers and community-based entities in general is to collaborate with authorities by providing "specific knowledge, and pragmatic guidance." Meaningful ways of contributing according to each type of disastrous event, though, are still a challenge [45,46]. Specifically regarding flood event management, the following sections analyze various indicative applications of VGI for each component of the DM cycle.

Applications of VGI in DM of flood events
Numerous published studies focus on utilizing VGI data sources for DM of flood events.
In terms of flood identification, in [39] a Twitter corpus consisting of 87.6 million tweets was analyzed, leading to the identification of 10,000 flood events globally. The main steps of the methodology were, first, geo-referencing the tweets and, subsequently, identifying flood events in the geo-parsed content.
In terms of tracking a flood event, in [47] the contribution of unconventional VGI data sources (social networks) was assessed for DM purposes. The research focuses on the devastating Queensland floods, which occurred in Australia from December 2010 to February 2011. Those floods caused damage to more than 30 cities and rural communities in southern and western Queensland, while various agricultural sub-areas were inundated. The cost of the floods was about 5 billion Australian dollars. From a VGI point of view, the social networks Facebook and Twitter were used as data sources for extracting related information. Apart from the text of each post, embedded photos and videos were processed, thus identifying various sub-events. While the floods were unfolding, about 15,000 tweets were posted per hour. Among the conclusions, it is stated that VGI contributed significantly to the tracking and provided immediate and in-depth information, crucial for the prevention, mitigation, preparedness, and response tasks of the DM cycle. In addition, the authors stated that the enhanced emergency situation awareness gained by using VGI can lead to better decisions in planning aid operations, although this was not conclusively demonstrated.
The above assumption was verified in similar research [48] regarding the Colorado floods, which occurred in the United States in 2013. That research highlighted the significance of correctly tracking all the phases of a natural disastrous event, since complete documentation can help minimize the negative impact of similar flood events that may occur in the future. Moreover, VGI data sources were able to fill an important information gap regarding the floods, especially between the flood occurrence and the time that the scientific teams arrived in the area. In terms of methodology, the basic components include the collection of tweets published within 9 days of the flood occurrence and their classification into specific categories, including (1) geo-tagged tweets, (2) tweets containing obvious URLs to photos and videos, (3) tweets containing place names, and (4) tweets containing structural terms, determined by the engineering team.
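The four-way classification described above can be sketched in a few lines of Python. The snippet is only a minimal illustration and not the procedure used in [48]: the vocabularies PLACE_NAMES and STRUCTURAL_TERMS, the layout of the tweet dictionary, and the function name categorize_tweet are assumptions made for this example, and any URL is treated as a potential link to a photo or video.

```python
import re

# Hypothetical vocabularies; a real study would derive them from a gazetteer
# and from the term list supplied by the engineering team.
PLACE_NAMES = {"boulder", "lyons", "longmont"}
STRUCTURAL_TERMS = {"bridge", "culvert", "road", "dam"}
URL_PATTERN = re.compile(r"https?://\S+")


def categorize_tweet(tweet):
    """Return the set of categories that a tweet dictionary falls into."""
    text = tweet.get("text", "")
    words = set(re.findall(r"[a-z]+", text.lower()))
    categories = set()
    if tweet.get("coordinates") is not None:
        categories.add("geo_tagged")
    if URL_PATTERN.search(text):
        categories.add("has_url")
    if words & PLACE_NAMES:
        categories.add("place_name")
    if words & STRUCTURAL_TERMS:
        categories.add("structural_term")
    return categories


example = {
    "text": "Bridge washed out near Lyons http://example.com/photo.jpg",
    "coordinates": None,
}
print(sorted(categorize_tweet(example)))
# ['has_url', 'place_name', 'structural_term']
```

A tweet may fall into several categories at once, which is why the function returns a set rather than a single label.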
Apart from tracking, the significance of rapidly produced information for authorities and DM stakeholders is emphasized in the international literature [42,49-51], as timely information is vital for the emergency response phase of the DM procedures. Moreover, the lack of information radically increases the budget that needs to be allocated for restoration. VGI sources have the potential to contribute significantly to that part [52].
In [53] a method for extracting flood event-related information through VGI sources was presented. Their extensive research provides meaningful insights regarding the most effective automated classification methods for dividing the posted information into certain categories. From a DM perspective, they focused on event detection of pluvial and fluvial flood events, while the collection of specialized information that could be extracted through geo-tagged photos contributed effectively to tracking and to verifying conventional hydrological models.
Moreover, in [46,54] methodologies for effective processing of social network data for DM purposes of flood events are presented. Among the main findings is that effective classification and geo-referencing can lead to advanced insights regarding DM of flood events. Moreover, by automating the methods, the consequences of a flood event can be mapped in real time, contributing significantly to the response to a flood event.

Participatory approaches
As stated in previous sections, the general notion of VGI is not strictly related to digital data procedures but is also highly related to participatory approaches. After all, community involvement has emerged as an important part of DM operational activities, since imbuing the community with a sense of ownership of the risk reduction process increases its resilience to natural hazards [19]. Moreover, those approaches can prove vital, especially in developing countries, which are expected to face the major consequences of climate change despite their minimal contribution to the problem [55], and where data availability is in many cases affected by laws, security protocols, illiteracy, cultural barriers, and economic reasons [31]. In addition, the budget needed for organizing such activities can be minimized by engaging local authorities to provide premises and by using open-source software solutions [56] for collecting and processing information related to floods.
An interesting approach was presented in [22], concerning the community of Chametla in Baja California, Mexico, which aimed to reduce the risk of negative consequences in the event of a flood. The community received appropriate training from experts. Specifically, a workshop was organized in Chametla in which the participants annotated, on printed satellite imagery, their property along with various spots of the area considered vulnerable to floods, thus building a related map. Subsequently, they presented their results and, following related discussions, corrected and adjusted various spots on the map. The output was reviewed by risk management experts, who provided additional corrections. The final map was created by a GIS technician who digitally mapped all the printed information. The workshop participants created an ordered list of tasks that they could carry out in order to minimize the area's vulnerability to floods. Those tasks included, among others, the paving of a few streets and the creation of drainage. In addition, surveys were distributed to collect socioeconomic and flood awareness information from the locals. The authors concluded that the majority of the inhabitants take measures to protect themselves in the event of a hurricane or other similar disastrous event.
A similar approach was presented in [57], which introduced a methodology for exploring the potential of joint activities of scientific teams and locals. Two case studies were used, the Upper Danube and the Upper Brahmaputra river basins, while the aim of the participatory activities was to assimilate local knowledge into scientific flood event management procedures for mitigating potential disasters in mountain areas. Two related workshops were organized, one for each case study, in which the participants, referred to as local actors (LAs), received training, in a storytelling mode, regarding climate change and its potential consequences in the next 40 years. Subsequently, they were invited to evaluate proposed response tasks by defining and prioritizing criteria according to their local knowledge. The output was processed by subject matter experts and assimilated into related strategies for coping with flooding.
A community, though, may not consist solely of locals. In [21] an innovative participatory approach was presented, linked to decision-making for the prevention, preparedness, and mitigation tasks of flood events. Specifically, a community was created, consisting of more than 117 Brazilian scientists and flood subject matter experts from NGOs and private companies. The municipalities of Lajeado and Estrela, located in southern Brazil, were used as case studies. In those areas, mostly due to their geomorphological characteristics, floods occur frequently, sometimes twice per year. The expert community was asked to define the most suitable criteria for characterizing an area as vulnerable to floods. The feedback was received through the distribution of related questionnaires. Subsequently, the criteria were ranked according to their level of importance using two related processes: the analytic hierarchy process (AHP) and the analytic network process (ANP). Finally, by using GIS and map algebra, the authors created maps that indicate the areas most vulnerable to floods according to the output of each ranking process.
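To illustrate how pairwise expert judgments can be turned into criterion weights and then into a vulnerability map, the following minimal sketch applies the row geometric mean approximation commonly used with AHP, followed by a cell-by-cell weighted sum of raster layers as a simple form of map algebra. The criteria, the comparison values, and the tiny 2 x 2 rasters are hypothetical and are not taken from [21]; ANP is not covered.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric mean method."""
    geometric_means = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


def weighted_overlay(layers, weights):
    """Cell-by-cell weighted sum of equally sized raster layers (map algebra)."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [sum(w * layer[r][c] for w, layer in zip(weights, layers)) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical criteria: elevation, distance to river, land use.
# pairwise[i][j] states how much more important criterion i is than criterion j.
pairwise = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical rasters, already normalized to 0..1 (1 = most vulnerable).
elevation = [[0.9, 0.2], [0.6, 0.1]]
river_distance = [[0.8, 0.3], [0.7, 0.2]]
land_use = [[0.5, 0.5], [0.9, 0.1]]
vulnerability = weighted_overlay([elevation, river_distance, land_use], weights)

print([round(w, 3) for w in weights])
print(vulnerability)
```

The geometric mean method is an approximation of the principal eigenvector that AHP formally uses; for small, reasonably consistent matrices the two usually give similar weights.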

Combined approaches
Apart from pure VGI-related activities, there is a lot of published research that combines VGI with a plethora of other data sources, thus creating so-called mashups [58], whose components complement each other in order to produce the most efficient output. Those mashups consist of VGI data along with imagery, authoritative data, and ground-truth observations and measurements.
Specifically, in [43] a hybrid approach was presented that combines flood-related data extracted from social networks with graphics processing unit (GPU)-accelerated hydrodynamic modeling. The approach was assessed on two events of the Tyne and Wear floods, which occurred in June and August 2012, respectively, in the United Kingdom. About 1800 and 160 tweets were collected for each flood, respectively, while 43 and 13 tweets met the defined criteria for assimilation into related inundation models.
In [19] a method for implementing VGI in flood forecasting and mapping activities was presented. Specifically, information was extracted from YouTube, from data collected by applying various queries in Twitter, and from various other Internet searches. The volume of extracted information that was assimilated into the flood-related models was small (~20 videos on YouTube, lack of related data on Twitter).
The output of the research presented in [18,43] highlights the contribution of VGI data to calibrating inundation models; it raises challenges, though, for effectively assimilating the large volume of produced VGI information into related models.

Developed web applications that utilize VGI for flood events
Apart from methodologies and approaches for manipulating VGI data for DM of flood events, there is published research indicating the development of web applications.
In [19] a novel participatory platform for engaging communities in all aspects of the flooding life cycle, entitled "NOAH," was introduced. The approach was applied in biosphere reserves recognized by UNESCO. The app clearly belongs to the conventional type of VGI sources.
In particular, the users of the platform are divided into two categories: anonymous users, who make contributions without providing any personal information, and registered users, who share observations in a more authenticated way. While sharing observations, users are requested to classify them into predefined categories. Various validation rules of the system focus on increasing the quality of the shared information. Those rules include, e.g., the mandatory presence of GPS coordinates in each uploaded photo, while post-processing procedures are applied to the shared information. The collected data are used for assessing and calibrating an inundation model, by validating or adjusting the water level according to a geo-tagged photo. Finally, the authors assessed the usability of their platform by distributing questionnaires to the users. The feedback gained was that the platform is at an above-average level in terms of usability, while a general assumption was that VGI can contribute to mitigating a flood event and to providing information for adjusting inundation models.
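Validation rules of the kind mentioned above (mandatory GPS coordinates and predefined categories) could look roughly like the following sketch. It is not the platform's actual implementation: the field names, the category list ALLOWED_CATEGORIES, and the function validate_observation are hypothetical.

```python
# Hypothetical list of predefined observation categories.
ALLOWED_CATEGORIES = {"water_level", "blocked_road", "damage"}


def validate_observation(obs):
    """Return a list of rule violations; an empty list means the observation passes."""
    problems = []
    lat, lon = obs.get("lat"), obs.get("lon")
    if lat is None or lon is None:
        problems.append("missing GPS coordinates")
    elif not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        problems.append("GPS coordinates out of range")
    if obs.get("category") not in ALLOWED_CATEGORIES:
        problems.append("unknown category")
    return problems


print(validate_observation({"lat": 38.0, "lon": 23.5, "category": "water_level"}))
print(validate_observation({"category": "selfie"}))
```

Returning all violations at once, rather than stopping at the first, lets an interface tell the contributor everything that needs fixing in a single round.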
In [59] a collaborative mapping approach was presented, based on the Ushahidi platform, through which ordinary people shared flood-related observations by using their mobile devices. The observations indicated points with measurements of the flood levels in various parts of São Paulo, Brazil. Among the conclusions of the research is the difficulty of engaging citizens to report to the platform. Moreover, by distributing questionnaires, feedback was collected regarding the app's usability and the data reliability. The main finding was that an improved user interface would be significant for user engagement.

Open challenges of manipulating VGI data for effective DM of flood events
In the current section, the author addresses the open challenges of VGI data sources when those are utilized for DM purposes related to floods. The open challenges are grouped into four main clusters, all linked by the general notion of quality: (a) classification, (b) geo-referencing, (c) visualization, and (d) automation. In the following paragraphs, each cluster is analyzed thoroughly.

Classification
The first set of challenges is related to dividing flood-related information into the proper categories. A complete and proper classification structure could lead to the extraction of information that gives valuable insights into the various phases of a flood event. Various classification structures have been presented [42,48,54,57-61]. Converging on an essential, commonly used classification, though, could prove beneficial for advancing the general research to the next step. In Table 1 the author suggests a conceptual classification structure consisting of 12 main categories.
By adopting the basic principles of a classification schema like the one proposed, a researcher can receive, as output, a high level of specialized information which is vital for contributing efficiently to various phases of the DM cycle.
Moreover, by further sub-classifying categories of the initial classification structure, structured specialized information regarding a flood event can be extracted (e.g., Tables 2-5). In Table 2 a consequence-measurement scale ranging from I to V is proposed. The scale follows an ascending logic in terms of the impact of the consequences, starting from value I, which is associated with the simple identification of rain or a storm, up to value V, which is linked solely to human loss.
Similar quantification logic is applied in Table 3 regarding the effects on social life, while in this case Value I is related to the minor impact of a rain and Value V is related to zero social activity.
Finally, Tables 4 and 5 subdivide the information related to flood aid and expressed emotions, respectively. Three main types are defined for each main classification category.

Geo-referencing
The second cluster of challenges is related to correct and precise geo-referencing of the information, as the only way to have accurate maps is to have sufficient geo-referencing of the data. This vital set of challenges has many complex characteristics that need to be taken into account, especially while processing data linked to unconventional VGI data sources, such as texts posted through social networks. Some social media embed location-related information in their posts. Indicatively, Twitter has the ability to embed the x and y coordinates of the spot from which a post is published (geo-located tweets). However, the percentage of those tweets in the total varies from 1 to 5% [62-64]. Moreover, as various researchers have stated, the geographic place from which a post was published is not necessarily associated with the descriptive information of a tweet's text [29].
An effective way to cope with this is to detect geographic entities that appear within each tweet's text. Even though this approach has various issues as well, mostly regarding the presence of more than one geo-location and more than one flood-related observation in a single text, the quantity of geo-referenced information extracted is significantly higher. Various geo-validation rules based on filtering each observation according to its distance from the flood event may partially solve the problem, while applying artificial intelligence to resolve ambiguity is also an interesting approach [65].
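A distance-based geo-validation rule of the kind described above can be sketched with the haversine great-circle formula. The coordinates, the 50 km threshold, and the function names are illustrative assumptions and do not come from the cited studies.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def filter_by_distance(candidates, event_lat, event_lon, max_km):
    """Keep only candidate locations within max_km of the flood event."""
    return [
        c
        for c in candidates
        if haversine_km(c["lat"], c["lon"], event_lat, event_lon) <= max_km
    ]


# Hypothetical candidate locations extracted from one tweet.
candidates = [
    {"name": "candidate near the event", "lat": 38.05, "lon": 23.55},
    {"name": "candidate far away", "lat": 40.60, "lon": 22.90},
]
kept = filter_by_distance(candidates, event_lat=38.07, event_lon=23.50, max_km=50.0)
print([c["name"] for c in kept])
```

Such a rule only resolves ambiguity when the competing place names are far apart; two candidates inside the same radius would still need another disambiguation step.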
There are various algorithms, published in the international literature, that process text corpora from social media in order to detect geo-locations. One of those is the TAGG algorithm [66], which detects geo-locations in a text by using a database of known locations. The author has also presented techniques that aspire to contribute to effective geo-referencing of DM-related information [67].
In particular, regarding the latter, a precision score level is indicated for each geo-reference (Table 6).
According to the precision level of each geo-reference, the output of the processed information can be used by authorities (precision at a city level) or by rescue teams and locals (precision at a street level). Effective geo-referencing for the needs of flood-related DM is still quite a challenging sub-topic, especially towards the goal of high precision.
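The idea of matching text against a database of known locations and attaching a precision level to each match can be illustrated with a plain gazetteer lookup. This is a simplified sketch under stated assumptions, not the TAGG algorithm [66] or the author's technique [67]: the place names, coordinates, and the two precision levels in GAZETTEER are hypothetical, and only the city and street levels mentioned above are modeled.

```python
import re

# Hypothetical gazetteer: place name -> coordinates and precision level.
GAZETTEER = {
    "exampleville": {"lat": 38.00, "lon": 23.50, "precision": "city"},
    "river road": {"lat": 38.01, "lon": 23.52, "precision": "street"},
}

# Intended audience per precision level, following the text above.
AUDIENCE = {"city": "authorities", "street": "rescue teams and locals"}


def geo_reference(text):
    """Return every gazetteer entry whose name appears as a whole phrase in the text."""
    lowered = text.lower()
    hits = []
    for name, entry in sorted(GAZETTEER.items()):
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            hits.append({"name": name, **entry, "audience": AUDIENCE[entry["precision"]]})
    return hits


for hit in geo_reference("Water rising on River Road in Exampleville"):
    print(hit["name"], hit["precision"], hit["audience"])
```

Whole-phrase matching with word boundaries avoids false hits inside longer words, but it does not handle misspellings, inflected forms, or place names shared by several locations.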

Visualization
The third cluster of challenges is linked to generating appropriate visualization results. Specifically, the generated maps and graphs must be readable by people who could potentially be stakeholders of the DM cycle but have no knowledge of geography or science in general. The production of complicated schemas as the output of a bright methodology is often the reason that the methodology is not extended to all DM levels, as the complexity with which the information is delivered to the recipients limits the chance that a crucial message is understood. Even though we are living in an age in which literacy levels are higher than ever, geographical literacy is still a challenge for a plethora of people globally. Within this framework, some visualization suggestions can be found in Figures 1 and 2. Figure 1 displays information related to the consequences of flood events that occurred in West Attica, Greece. Each bullet located on the maps represents a consequence score value (Table 2). Since both flood events caused human losses, there are many bullets in red.
Furthermore, Figure 2 visualizes the frequency of posted tweets that are related to the identification of rain. With those maps, an initial indication may be provided to the DM stakeholders regarding the potential for flood occurrence, especially in the areas in which the frequency of tweets indicating rain is significantly higher than in other areas.
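The comparison of tweet frequencies between areas can be approximated with a simple count per area and a relative threshold. The sketch below is only one possible reading of "significantly higher": the area labels, the factor of 3, and the use of the median as a baseline are assumptions made for illustration and are not part of the chapter's method.

```python
from collections import Counter
from statistics import median


def rain_hotspots(tweet_areas, factor=3.0):
    """Return areas whose rain-tweet count exceeds factor times the median count."""
    counts = Counter(tweet_areas)
    if not counts:
        return []
    threshold = factor * median(counts.values())
    return sorted(area for area, n in counts.items() if n > threshold)


# Hypothetical input: one area label per rain-related tweet.
tweet_areas = ["A"] * 40 + ["B"] * 5 + ["C"] * 6 + ["D"] * 4
print(rain_hotspots(tweet_areas))
# ['A']
```

Raw counts favor densely populated areas, so in practice the counts would probably need to be normalized, for example by population or by the usual tweet volume of each area.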

Automation
Finally, the fourth cluster of challenges is related to automation. As many researchers agree, VGI data analysis is a time-consuming process [46,61,68,69]. Especially when dealing with unconventional sources, the volume of produced information may consist of hundreds of thousands or even millions of data rows. Techniques designed for handling large amounts of information, like natural language processing (NLP), provide effective solutions. Moreover, the use of artificial intelligence techniques for classifying the related content, such as support vector machines, can radically reduce the time needed for classifying the information and for coping with ambiguities. Published research that employs classifiers provides really promising results [53,70].
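As a rough illustration of automated text classification, the sketch below trains a perceptron on bag-of-words features. A perceptron is used here as a dependency-free stand-in for the support vector machines mentioned above; both learn a linear decision boundary, but a perceptron does not maximize the margin. The training sentences and labels are invented for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_perceptron(samples, epochs=20):
    """Train on (text, label) pairs, where label is +1 (flood-related) or -1."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for text, label in samples:
            tokens = tokenize(text)
            score = bias + sum(weights[t] for t in tokens)
            if label * score <= 0:  # misclassified: move the boundary
                for t in tokens:
                    weights[t] += label
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights, bias


def predict(model, text):
    """Return +1 for flood-related text and -1 otherwise."""
    weights, bias = model
    score = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if score > 0 else -1


# Invented training data for illustration only.
samples = [
    ("street flooded water rising", 1),
    ("river overflowed homes flooded", 1),
    ("great concert tonight", -1),
    ("traffic jam on highway", -1),
]
model = train_perceptron(samples)
print(predict(model, "basement flooded after heavy rain"))
print(predict(model, "concert was great"))
```

With real social media data, a library implementation of a support vector machine and a much larger labeled corpus would normally replace this toy model.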

Conclusions
The main aim of this chapter was to inform the reader about the fundamentals regarding VGI and its applications to DM of flood events. In previous sections, the author described the general notion of VGI and the similar terms that can be found in the international literature and provided awareness of its basic characteristics and properties. Subsequently, significant research related to VGI and flood event management was presented. Considering the above, it can be safely assumed that VGI can effectively be used for identifying flood events, for documenting the various phases of their unfolding, and for tracking the negative consequences and the tasks crucial for preparedness against similar flood events that may occur. Moreover, the use of VGI provides significant assistance in calibrating and validating flood and inundation models by providing specific spatiotemporal information. Furthermore, participatory activities can contribute significantly to preparedness by identifying vulnerable spots and performing adjustments in the urban environment, thus making an area more resilient to floods. Similar activities involving subject matter experts can provide valuable support in the decision-making processes of DM related to flood management.
Regarding data availability, the unconventional VGI data sources provide an enormous volume of information related to floods; that information, though, has anarchic characteristics and is not compliant with specifications. The conventional VGI data sources, which are usually purpose-driven, may provide data more compatible with the DM needs; their data production, though, is limited.
The open challenges of VGI data, when those are processed for DM purposes, are grouped into a set of four clusters. The first cluster is related to classification: the more complete and detailed the classification structure, the more specialized the processed information will become. Precise geo-referencing; effective and simplified visualization of the processed information, easily readable by all the DM stakeholders; and, finally, adoption of automation techniques complete the set of challenges.
Assuming that social networks will continue to evolve and grow, it is expected that methodologies able to assimilate the full potential of VGI into DM mechanisms will become more and more dominant.

Author details
Stathis G. Arapostathis
Harokopio University, Athens, Greece
*Address all correspondence to: sarapos@hua.gr

Conflict of interest
The author declares no conflict of interest.
© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.