Earthquake Observation by Social Sensors

Social media have garnered much attention recently, and the number of social media users has been increasing. Social media are media for social interaction among users: users create content themselves and exchange it with one another. Social media take many forms, including weblogs, wikis, videos, and microblogs. One of their most important characteristics is user-generated content.

questionnaire format (Intensity, 2005). From the Twitter website, Toretter extracts tweets that refer to earthquakes and estimates the location of an earthquake's epicenter using location information included with those tweets (Sakaki et al., 2010). These methods treat social media users as sensors. We designate these virtual sensors as social sensors, which entail no costs. Unfortunately, such sensors provide an extremely noisy signal because users sometimes misunderstand phenomena, are asleep, or are not near a computer.
We introduce these methods and explain a process for earthquake detection by analyzing social sensor information. We introduce current studies and services for earthquake observation using social sensors. Moreover, we explain Toretter as an example and describe its mechanisms.

Overview of earthquake observation by social sensors
We explain the basic idea of social sensors and introduce internet service users as social sensors to observe earthquakes.

Earthquake observation services performed by social sensors
We introduce four earthquake observation services that use information from internet users. In this chapter, we examine Toretter as an example. We explain its detailed mechanisms in the next chapter.

Did You Feel It?
The website Did You Feel It?, which is operated by the United States Geological Survey (USGS), is shown in Fig. 2. Through the internet, it gathers earthquake information from users who experienced those earthquakes directly (Intensity, 2005).

TED
The USGS also manages the Twitter Earthquake Detector (TED), which gathers tweets referring to earthquake occurrences from Twitter. It acquires location information and photographs attached to tweets and shows this information on maps (Survey, 2009).

Earthquake Research and Analysis - Statistical Studies, Observations and Planning www.intechopen.com

iShake
The iShake project has developed a smartphone application (Fig. 3) that uses a phone to measure acceleration during an earthquake and report those data to researchers for processing (CITRIS, 2011). This project, conducted by UC Berkeley, is designed to create a system that moves beyond Did You Feel It?. Data from smartphone applications can complement data obtained from ground monitoring instruments, thereby improving the resolution and accuracy of earthquake intensity maps.

Toretter
Toretter extracts tweets referring to earthquakes and estimates the location of the earthquake epicenter using location information of those tweets (Sakaki et al., 2010). Temporal and spatial models of social sensors are defined for earthquake detection.
Then methods are proposed to detect earthquakes and to estimate the location of an earthquake epicenter automatically. The Toretter mechanism is shown in Fig. 4. First, it collects tweets referring to earthquakes by crawling with the Twitter API and filtering the tweet messages using a tweet classifier. Second, it tries to detect an earthquake from the collected tweets based on a temporal model for earthquake detection. Finally, it extracts location information for each tweet from Twitter. The system uses that information and a particle filter to estimate the earthquake epicenter based on a spatial model for social sensors.
In this chapter, we explain methods of earthquake observation using social sensors according to the Toretter mechanism. We explain this entire process in the following section.

Overview of social sensors
We introduce the model of social sensors and describe their features in comparison to physical sensors.

Basic idea of social sensors
Many methods and infrastructures can be used to observe events and natural phenomena using physical sensors: heavy traffic, air pollution, astronomical events, weather phenomena, and earthquakes are some examples. The basic mechanisms of such observations by physical sensors are presented on the right side of Fig. 5. First, a target event for observation occurs. Second, some sensors for the target event respond with a positive signal. Third, a central server collects signals from sensors and analyzes them. Finally, the server detects the target event or produces some observation values as output.
If users of social media observe an event, then similarly to physical sensors, they make posts about the event. For example, some Twitter users might post "Oh earthquake!" or "pouring rain, thunder & lightning" or "It's a double rainbow! & the moon is out. Beautiful!". These actions by users are analogous to the response of physical sensors to a stimulus: the users and sensors send a signal when an event occurs. Therefore, a user of social media is a sensor of a kind. We designate such sensors as social sensors.
An observation system incorporating social sensors is depicted on the left side of Fig. 5. First, an event occurs. Second, social media users make posts about the event. Third, the posts are collected at a central server and analyzed. Finally, the server detects the event or produces some observation value. This whole process corresponds to a process of observation by physical sensors, presented for comparison in Fig. 5. Methods for observing phenomena by physical sensors can be adapted to social sensors. Actually, some services based on social media use methods of observation resembling methods used with physical sensors.
Regarding Twitter users as social sensors, we can work with the following assumptions.
1. Each Twitter user is regarded as a sensor. A sensor detects a target event and makes a report probabilistically.
2. Each tweet is associated with a time and a location, which is a latitude-longitude pair.

Features of social sensors
Social sensors differ from physical sensors in some points. We describe features of social sensors in comparison to physical sensors. Social sensors are uncontrollable. They sometimes become inoperable because some users are not online; they may be sleeping or busy doing something else. They also function improperly more often than physical sensors because users misinterpret events more often than physical sensors do. Therefore, it is necessary to know that social sensors are noisier than physical sensors and that their signals must be analyzed more carefully.
Social sensors, which are users of social media, are located over a wide area. They can give responses to events of many kinds, ranging from natural phenomena, such as earthquakes and hurricanes, to events related to human activities, such as heavy traffic, live performances, and elections. The extremely numerous social sensors all over the world present the possibility of responding to events of many kinds. In other words, detection of target events can be done with no cost to set up sensors. However, when using social media systems such as Twitter, which incorporate these social sensors, it is necessary to filter the signals (tweets) posted by social sensors (Twitter users) according to the event that is to be observed. Using some method, it is necessary to extract tweets referring to a target event. We summarize the features of social sensors and physical sensors in Table 1.
We explain these methods in the next section.

Tweet collection
In the first step portrayed in Fig. 4, it is necessary to collect tweets referring to an earthquake from Twitter. This process includes two steps: crawling tweets from Twitter and filtering out tweets that do not refer to the earthquake. For crawling and filtering tweets, we recommend using scripting languages such as Python, Perl, and Ruby.

Crawling tweets from Twitter
To collect tweets or some user information from Twitter, one must use the Twitter Application Programming Interface (API). The Twitter API is a group of commands that are necessary to extract data from Twitter. Twitter has APIs of three kinds: the Search API, the REST API, and the Streaming API.
In this section, we introduce the Search API and the Streaming API, which are necessary to crawl tweets from Twitter. We explain the REST API later because it is necessary to extract location information from Twitter information.
Additionally, Twitter API specifications are subject to change. When using the Twitter API, it is necessary to know the latest details and requirements. They are obtainable from the Twitter API documentation.

Twitter Search API
The Twitter Search API retrieves, in chronological order, tweets that include search keywords or that fit other retrieval conditions. Language, date, location, and other attributes can be used as retrieval conditions.
When searching tweets including earthquake posted from [...]
Some points must be considered when using the Twitter Search API:
• Only tweets posted during the prior five days can be collected. It is not possible to search tweets posted six or more days ago.
• Only the latest 1500 tweets can be collected at one time. (Technically speaking, one request accesses one page, and pages can be tracked back to the 15th page. One page includes 100 tweets at most. Therefore, it is possible to acquire the latest 1500 tweets at one time.)
• The number of API requests is limited. (No limit is published, but it is possible to access the Twitter Search API at least 500 times per hour.)
Therefore, we recommend collecting tweets every 10 min or more often: if tweets including earthquake are posted at 2000 tweets per hour and one uses the Twitter Search API only once every hour, then it is impossible to crawl all of them. Actually, tweets including earthquake were posted at more than 5000 per hour when the earthquake occurred on March 11, 2011.
Toretter requests the API command search 15 times every 5 min to collect the latest tweets each time: 180 command executions per hour.
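The arithmetic behind this recommendation can be sketched as follows. This is only an illustration, not part of Toretter: the function names are hypothetical, and the default window of 1500 tweets (15 pages of 100 tweets) is the figure quoted above.

```python
def max_poll_interval_minutes(tweets_per_hour, window_size=1500):
    """Longest interval between crawls that still reaches every tweet.

    window_size is the number of most recent tweets one crawl can reach
    (15 pages x 100 tweets in the description above).
    """
    if tweets_per_hour <= 0:
        raise ValueError("tweets_per_hour must be positive")
    return 60.0 * window_size / tweets_per_hour


def requests_per_hour(pages_per_crawl, interval_minutes):
    """Number of search requests issued per hour for a crawl schedule."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return pages_per_crawl * 60 / interval_minutes


if __name__ == "__main__":
    for rate in (2000, 5000):
        print(rate, "tweets/h ->", max_poll_interval_minutes(rate), "min")
    print(requests_per_hour(15, 5), "requests/h")
```

Under these assumptions, a rate of 5000 tweets per hour fills the 1500-tweet window in 18 min, so crawling every 5 to 10 min leaves some headroom, and 15 requests every 5 min stay below the 500 requests per hour mentioned above.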

Twitter Streaming API
The Twitter Streaming API is defined in the Twitter API documentation as follows:
The Twitter Streaming API enables high-throughput near-real-time access to various subsets of public and protected Twitter data.
The Twitter Streaming API provides the methods shown in Table 4, of which the filter method can be used to crawl tweets related to earthquakes.

Table 4. Streaming API methods.
command    explanation
filter     returns public statuses that match one or more filtering conditions.
firehose   returns all public statuses. A few companies have permission to access this command.
link       returns all statuses containing http: and https:.
retweet    returns all retweets.
sample     returns a random sample of all public statuses (the ratio is about 1%).

The filter method returns public statuses that match one or more filtering conditions. All conditions of filter are presented in Table 5. It is possible to use the parameter track to collect tweets because keywords can be set as a condition value of track.

Table 5. Conditions of the filter method.
command    explanation
follow     returns public statuses that reference the given set of users.
track      returns public statuses that include specified keywords.
locations  returns public statuses that were posted from a specific set of bounding boxes to track.
When using the filter method with the keyword earthquake as the track parameter, it is necessary to create a file called tracking that contains track=earthquake. Then one can access the following URL:
https://stream.twitter.com/1/statuses/filter.json
The Streaming API also returns results in JSON format, as shown in Fig. 6. Therefore, it is possible to parse those results in the same way as results obtained with the Search API.
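Parsing such results might look like the following minimal sketch. It assumes, as is not stated in this chapter, that the stream delivers one JSON object per line with occasional blank keep-alive lines, and that each status carries its message in a "text" field; the function name and the sample lines are hypothetical.

```python
import json


def iter_tweet_texts(lines, keyword):
    """Yield the text of each streamed status that mentions keyword."""
    for line in lines:
        line = line.strip()
        if not line:
            # Blank keep-alive line: nothing to parse.
            continue
        status = json.loads(line)
        text = status.get("text", "")
        if keyword.lower() in text.lower():
            yield text


if __name__ == "__main__":
    # Hypothetical sample of what a stream might deliver.
    sample = [
        '{"id": 1, "text": "Oh! Earthquake happened right now!"}',
        "",
        '{"id": 2, "text": "Lunch time"}',
    ]
    for text in iter_tweet_texts(sample, "earthquake"):
        print(text)
```

Because the function takes any iterable of lines, the same code can be fed from a file of saved results or from a live connection.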
It is possible to collect tweets including earthquake in real time. Some points must be considered when using the Twitter Streaming API:
• The prepared server must have sufficiently high specifications to process all data received from Twitter.
• It is impossible to use some characters in the Twitter Streaming API (e.g., Japanese characters cannot be used).
Using Toretter, we want to detect earthquakes in Japan. For that purpose, it is necessary to collect tweets including earthquake in Japanese. However, Japanese characters cannot be used in the Twitter Streaming API. Therefore, Toretter uses the Twitter Search API to crawl tweets. To collect tweets in languages other than English, it is necessary to check whether that language is supported by the Twitter Streaming API.

Filtering tweets using machine learning
We collected tweets including keywords related to earthquakes, such as earthquake and shake. Sample tweets are presented in Table 6.
Those tweets include not only tweets that users posted immediately after they felt earthquakes, but also tweets that users posted after they heard earthquake news or after they mistook some other shaking, such as that from a large truck passing nearby, for an earthquake.
Figure 7 presents sizes of earthquakes and counts of Japanese tweets including the keyword earthquake on February 11, 2011.When the seismic activity reached its peak, the graph of tweets invariably showed a peak.However, when the graph of tweet counts showed a peak, the seismic activity did not necessarily show a peak.Some "false-positive" peaks of the graph of tweet counts arise from mistakes by users or some news related to earthquakes.Therefore, we must filter tweets to extract those posted immediately after the earthquake.We designate tweets posted by users who felt earthquakes as positive tweets, and other tweets as negative tweets.
Here, we describe the creation of a classifier to categorize crawled tweets into positive tweets and negative tweets, using Support Vector Machine: a supervised learning method.

Supervised learning
Supervised learning, a machine learning method, solves classification and regression problems by analyzing training data. It is often used for spam mail filtering and gender estimation of Web users.

Toretter uses Support Vector Machine (SVM), an extremely effective supervised learning method.

Support Vector Machine
SVM is a method used to create a classifier for two-class pattern classification. The SVM projects each training sample as a point (as presented on the left side of Fig. 8) into a multi-dimensional feature space. It creates a hyperplane that has the largest distance to the nearest training sample points of each class (as presented on the right side of Fig. 8). One must input positive samples and negative samples into the SVM, which creates a classifier for the two classes by searching for this hyperplane.
For a detailed treatment, textbooks such as Bishop (2006) are useful.
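The notion of "largest distance to the nearest training sample points" can be made concrete with a small sketch. It does not train an SVM; it only computes the margin of a given hyperplane so that two candidates can be compared. The function names and sample points are hypothetical.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, samples):
    """Smallest distance from the samples to the hyperplane.

    samples is a list of (x, label) pairs with label +1 or -1.
    A negative result means at least one sample is on the wrong side.
    """
    return min(label * signed_distance(w, b, x) for x, label in samples)


if __name__ == "__main__":
    samples = [((2.0, 0.0), +1), ((3.0, 1.0), +1),
               ((-1.0, 0.0), -1), ((-2.0, 2.0), -1)]
    # Two candidate hyperplanes with the same orientation.
    print(margin((1.0, 0.0), 0.0, samples))
    print(margin((1.0, 0.0), -0.5, samples))
```

Both candidate hyperplanes separate the samples, but the second one lies midway between the two classes and therefore has the larger margin; an SVM searches for the hyperplane that maximizes this quantity.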

Process of creating a classifier using machine learning
Figure 9 depicts the process of supervised learning, which has three steps. We explain this process using the creation of a spam filter as an example, along the lines of Fig. 9.
First, we prepare both a collection of spam mails as positive samples and a collection of other mails as negative samples. Those must be classified manually by humans.
Second, we extract various features from the samples. We must select effective features for classification. Effective features are those which positive samples seem to have and which negative samples do not seem to have, or vice versa. For example, all words included in samples are often used to create spam filters because we can infer that spam messages include words such as "Free!", "50% off!", and "Call now!" with high probability.
Third, we input both positive samples and negative samples with feature information and create a classifier for those samples. When a new mail is input into the classifier, it outputs a positive value or a negative value. If the output is positive, the new mail is regarded as a spam message.

Creation of sample data for the classifier
Positive samples and negative samples must be created manually. There are two points of consideration.
First, this process is very sensitive. One must classify positive tweets and negative tweets accurately. Therefore, it is necessary to acquire records of actual earthquakes. One must choose positive tweets by referring to these earthquake records to classify them precisely.
Second, one must prepare equal numbers of positive tweets and negative tweets. The number of samples needed depends on the task. Generally, it is said that sample data must comprise 300-500 samples. In practice, one should increase the number of samples until the classifier provides sufficient performance.

Extraction of features from sample data
Next, one must select features of tweets for machine learning. In the spam mail filter example, words included in sample mails are chosen as features. Toretter uses features of three kinds. We explain them in detail using the following sentence.
Oh! Earthquake happened right now!
Keyword features: all words included in a tweet.
example sentence → Oh, earthquake, happened, right, now
Statistical features: the number of words in a tweet message and the position of the search keyword within the tweet.
example sentence → number of words: five; position of the search keyword: second
Context features: the words before and after the search keyword.
example sentence → Oh, happened
Statistical features are the most effective of these three kinds according to results of our earlier research (Sakaki et al., 2010). This is presumably because people who experience an earthquake are surprised and in an emergency situation, so they tend to post short tweets such as "Oh! earthquake!" and "It's shaking".
Of course, these features can differ depending on language, country, and culture.Therefore, effective features should be chosen when creating a filter for tweets.
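The three kinds of features can be extracted from the example sentence with a short sketch such as the one below. The function name, the returned dictionary keys, and the simple regular-expression tokenizer are assumptions made for illustration; a real system for Japanese tweets would need a proper word segmenter.

```python
import re


def extract_features(tweet, keyword):
    """Return keyword, statistical, and context features of a tweet."""
    words = re.findall(r"[\w']+", tweet)
    lowered = [w.lower() for w in words]
    key = keyword.lower()
    # 1-based position of the search keyword, or 0 if it is absent.
    position = lowered.index(key) + 1 if key in lowered else 0
    before = words[position - 2] if position >= 2 else None
    after = words[position] if 0 < position < len(words) else None
    return {
        "words": words,                 # keyword features
        "word_count": len(words),       # statistical feature
        "keyword_position": position,   # statistical feature
        "context": (before, after),     # context features
    }


if __name__ == "__main__":
    print(extract_features("Oh! Earthquake happened right now!", "earthquake"))
```

For the example sentence this yields five words, the keyword in second position, and the context words "Oh" and "happened", matching the explanation above.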

Applying machine learning
Several machine learning methods can be used to create a classifier: the Naive Bayes classifier, Neural Networks, Decision Trees, and Support Vector Machine. In this chapter, Support Vector Machine is used for our explanation because SVM is regarded as a superior method for classification and regression problems, and many SVM software packages exist. We treat SVM-Light, which is a popular SVM tool, as an example in this chapter.
Creating a classifier demands three steps.

Create training data from tweets
First, it is necessary to convert tweet data into the training data file format for SVM-Light. The training data file format of SVM-Light is
<target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
for example:
-1 1:0.43 3:0.12 9284:0.2 # abcdef
In this file format, each line corresponds to a single tweet. <target> expresses the polarity of each tweet: +1 means positive and −1 means negative. <feature> expresses the feature ID of each feature and <value> expresses the weight of that feature in the tweet. Each feature should be assigned a feature ID. For example, if one assigns a feature ID to each feature, as in Table 7, then each tweet can be converted into one line of training data for SVM-Light.
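Writing one line in this format can be sketched as below. The function name is hypothetical, and the feature IDs and weights are taken from the example line above rather than from Table 7. The sketch emits features in increasing order of feature ID, which SVM-Light expects.

```python
def to_svmlight_line(target, features, info=""):
    """Format one sample as an SVM-Light training line.

    target   -- +1 for a positive tweet, -1 for a negative tweet
    features -- dict mapping feature ID (int) to weight (float)
    info     -- optional free text stored after '#'
    """
    if target not in (1, -1):
        raise ValueError("target must be +1 or -1")
    parts = [f"{target:+d}"]
    for feature_id in sorted(features):
        parts.append(f"{feature_id}:{features[feature_id]:g}")
    line = " ".join(parts)
    if info:
        line += f" # {info}"
    return line


if __name__ == "__main__":
    print(to_svmlight_line(-1, {3: 0.12, 1: 0.43, 9284: 0.2}, "abcdef"))
```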

Earthquake detection from a time-series data using a probabilistic model
The second step of Fig. 4 detects an earthquake from positive tweets.
First, it is difficult to trust these tweets directly because some users mistake shaking caused by something else for an earthquake, and some ill-willed users post positive tweets to deceive others. This closely resembles the behavior of physical sensors, which sometimes produce wrong values. Therefore, we must process positive tweets to detect earthquakes with high accuracy, similarly to treating signals from physical sensors. Many methods have been used to detect peaks from time-series data for purposes such as burst detection (Kleinberg, 2002; Zhu & Shasha, 2003) and anomaly detection (Cheng et al., 2008; Krishnamurthy et al., 2003). Toretter uses a static rule, five tweets in 5 min, that is calculated using an exponential function. We explain this method hereinafter.

Temporal model
To detect an earthquake using physical sensors, we must calculate the probability of earthquake occurrence based on signals from those sensors. Similarly, we must calculate the probability of earthquake occurrence from signals of social sensors. In this subsection, we explain the temporal model we use to calculate this probability. [...] the graph of positive tweet counts. It can be inferred from these graphs that the frequency distribution of positive tweets is an exponential distribution, as expressed by the following equation (Sakaki et al., 2010).
We express the number of sensors producing a positive value at time $t$ as $n(t)$. Here, $n(t)$ is equal to the number of positive tweets at time $t$. If $n_0$ sensors produce a positive value at $t = 0$, then we can calculate the number of sensors for which the response is a positive value at time $t$ using the following equation.
Therefore, we can calculate $N_{t_a}$, the number of sensors that produce a positive value from time 0 to time $t_a$, as presented below.
We define the false-positive ratio of a sensor as $p_f$. In this case, we assume that we have $n$ sensors and that all $n$ sensors have the same false-positive ratio. The probability of all $n$ sensors producing a false alarm is $p_f^{n}$. Therefore, the probability of earthquake occurrence can be estimated as $1 - p_f^{n}$. From Eq. 3 and Eq. 4, we can calculate the probability of earthquake occurrence at time $t_a$.
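As a hedged sketch, the quantities defined in this subsection can be written out as follows. It assumes that the number of positive tweets decays exponentially with rate $\lambda$ and that time is counted in discrete unit steps, so that the cumulative count is a geometric series; the exact form used by the authors may differ, and the equation order is only inferred from the references to Eq. 3 and Eq. 4 in the text.

```latex
\begin{align}
f(t;\lambda) &= \lambda e^{-\lambda t}, \qquad t > 0,\ \lambda > 0 \\
n(t) &= n_0\, e^{-\lambda t} \\
N_{t_a} &= \sum_{t=0}^{t_a} n_0\, e^{-\lambda t}
         = n_0\, \frac{1 - e^{-\lambda (t_a + 1)}}{1 - e^{-\lambda}} \\
p_{occur} &= 1 - p_f^{\,n} \\
p_{occur}(t_a) &= 1 - p_f^{\,N_{t_a}}
         = 1 - p_f^{\,n_0 \left(1 - e^{-\lambda (t_a + 1)}\right) / \left(1 - e^{-\lambda}\right)}
\end{align}
```

The last line is obtained by substituting the cumulative sensor count $N_{t_a}$ for $n$ in the expression $1 - p_f^{n}$.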

Setup the condition for detection trigger
In the Toretter system, we detect an earthquake when five positive tweets arrive in 5 min, which means five sensors produce positive signals in 5 min.In this subsection, we explain how to determine this condition.
We set $\lambda = 0.34$ and $p_f = 0.35$ (taken from our earlier research) in Eq. 5, by which we can calculate the probability of earthquake occurrence. When obtaining $n_0$ positive tweets, and given that we would like to raise an alarm with a false-positive ratio of less than 1%, we can calculate the waiting time $t_{wait}$ (Eq. 6). If we set $t_{wait} = 5$, then we obtain $n_0 = 4.1$ from Eq. 6. Therefore, the trigger for earthquake detection in Toretter is set as five positive tweets arriving in 5 min. The trigger used for earthquake detection can thus be determined using an exponential function, as described in this section.
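A simplified check of this threshold is sketched below. It uses only the relation that $n$ independent sensors all raise a false alarm with probability $p_f^{n}$, and it deliberately ignores the temporal decay and the waiting time, so it is not a reproduction of Eq. 6. The function names are hypothetical.

```python
def occurrence_probability(n, p_f):
    """Probability that at least one of n positive sensors is correct."""
    return 1.0 - p_f ** n


def min_positive_sensors(p_f, max_false_alarm):
    """Smallest n for which all n sensors being wrong is rarer than max_false_alarm."""
    if not 0.0 < p_f < 1.0:
        raise ValueError("p_f must be between 0 and 1")
    if not 0.0 < max_false_alarm < 1.0:
        raise ValueError("max_false_alarm must be between 0 and 1")
    n = 1
    while p_f ** n >= max_false_alarm:
        n += 1
    return n


if __name__ == "__main__":
    print(min_positive_sensors(0.35, 0.01))
    print(occurrence_probability(5, 0.35))
```

With $p_f = 0.35$, four positive tweets leave a false-alarm probability of about 1.5%, whereas five reduce it to about 0.5%, which is in line with a trigger of five positive tweets.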

Location estimation from tweets
In this section, we explain a means to estimate the location of an earthquake epicenter by analyzing tweets.First, we introduce the kinds of location information to be acquired from tweets.Next, we explain methods to estimate the location of the earthquake epicenter.

Extracting location information from tweets
Two kinds of information are applicable for location estimation from tweets: location information in the Twitter user profile and geotags attached to tweets.

Location information in user profiles
The Twitter user profile includes the location information of users. Of course, not all users make their location information public on the internet, but a sufficient number of users do so (this number varies among countries).
For earthquake detection, we collect positive tweets.We extract the location information of users who post those positive tweets for earthquake epicenter location estimation.Twitter REST API must be used to extract location information of users from Twitter.
The Twitter REST API is the part of the Twitter API that provides methods for using the basic functions of Twitter. The REST API offers many methods. We use the users/show method to obtain user information. To extract user information of the Twitter user TwitterAPI, it is necessary to access the following URL.
http://api.twitter.com/1/users/show.json?screen_name=TwitterAPI&include_entities=true
The results, shown in Fig. 12, are returned in JSON format, in the same manner as for the Twitter Search API. The result in Fig. 12 shows that the Twitter user TwitterAPI resides in San Francisco, CA. Some points to consider when using the Twitter REST API are the following:
• Some users do not register their location information, or register non-location data, such as "in a dream" or "anywhere". Such non-location data should be ignored.
• API requests are limited. (The limit is published: it is possible to access the Twitter REST API about 150 times per hour without authorization.)
This limit is sufficient to extract user information for location estimation of an earthquake epicenter because the earthquake-related tweets posted in the 5 min after an earthquake are most often fewer than 100. To expand the limit, one must register with Twitter and obtain authorization through OAuth, according to the Twitter API documentation.
Moreover, one must convert location information acquired from Twitter into a latitude-longitude pair: human beings can understand places expressed by place names, such as San Francisco, but a computer cannot determine where that place is. One must treat location information in the format of a latitude-longitude coordinate pair. At present, some web services, such as the Google Maps API and Yahoo Maps API, can convert geographical names into latitude-longitude coordinate pairs. Here we explain the Google Maps API.
To convert San Francisco into a latitude-longitude coordinate pair, one can access the following URL.
http://maps.google.com/maps/api/geocode/json?address=San%20Francisco&sensor=false&language=en
Results are obtainable as in Fig. 13, in JSON format, in the same manner as with the Twitter API. It is possible to convert San Francisco into latitude = 37.7749295, longitude = −122.4194155.
Location information related to an earthquake can be acquired as described above.
Fig. 13. Result of geographical name converted using Google Maps API.
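Building the request URL and reading the coordinates from the response can be sketched as follows. The endpoint and query parameters are the ones quoted above and may no longer be served in this form. The response layout (a "results" list whose entries contain "geometry" and "location" with "lat" and "lng") is an assumption about the JSON shown in Fig. 13, and the function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def build_geocode_url(place,
                      base="http://maps.google.com/maps/api/geocode/json"):
    """Build the geocoding request URL for a place name."""
    query = urlencode(
        {"address": place, "sensor": "false", "language": "en"},
        quote_via=quote,  # encode spaces as %20, as in the URL above
    )
    return f"{base}?{query}"


def parse_lat_lng(response_text):
    """Return (latitude, longitude) of the first result, or None."""
    data = json.loads(response_text)
    results = data.get("results") or []
    if not results:
        # Non-location strings such as "in a dream" give no result.
        return None
    location = results[0]["geometry"]["location"]
    return (location["lat"], location["lng"])


if __name__ == "__main__":
    print(build_geocode_url("San Francisco"))
    # Hypothetical, heavily trimmed response body.
    body = ('{"status": "OK", "results": [{"geometry": {"location": '
            '{"lat": 37.7749295, "lng": -122.4194155}}}]}')
    print(parse_lat_lng(body))
```

Returning None for an empty result list gives a simple way to discard profile strings that are not real places.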

Geotags attached to each tweet
Some tweets have an attached geotag, which includes a latitude-longitude pair acquired from GPS. If positive tweets related to an earthquake include tweets with attached geotags, then it is possible to use these geotag data for location estimation. Geotag data can be extracted using the Twitter Search API. Therefore, GPS data are available if they are stored when those tweets are crawled with the Twitter Search API.
Geotag data are more accurate than location information of the Twitter user profile because they are acquired from GPS. Nevertheless, positive tweets referring to an earthquake rarely include enough geotagged tweets to estimate the earthquake epicenter location. In practice, a combination of the location information of Twitter users and geotags should be used.
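One way to combine the two sources is to prefer the geotag and fall back to the profile location, as in the sketch below. The field names ("geo" with "coordinates" given latitude first, and "user" with "location") are assumptions about the tweet JSON layout, and geocode stands for any hypothetical callable that maps a place name to a (latitude, longitude) pair or None.

```python
def tweet_location(tweet, geocode):
    """Return (latitude, longitude) for a tweet, or None if unknown.

    The geotag is preferred because it comes from GPS; the profile
    location is used only when no geotag is attached.
    """
    geo = tweet.get("geo")
    if geo and geo.get("coordinates"):
        lat, lon = geo["coordinates"]
        return (float(lat), float(lon))
    profile = (tweet.get("user") or {}).get("location")
    if profile:
        return geocode(profile)
    return None


if __name__ == "__main__":
    # Hypothetical lookup table standing in for a geocoding service.
    table = {"San Francisco, CA": (37.7749295, -122.4194155)}
    tweets = [
        {"geo": {"type": "Point", "coordinates": [35.68, 139.76]},
         "user": {"location": "San Francisco, CA"}},
        {"geo": None, "user": {"location": "San Francisco, CA"}},
        {"geo": None, "user": {"location": "in a dream"}},
    ]
    for t in tweets:
        print(tweet_location(t, table.get))
```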

Location estimation using Bayesian filtering
If one can obtain sufficient location information from positive tweets, then the location of the earthquake epicenter can be estimated using that information. Nevertheless, that information is often inaccurate. Even if it is precise, users might still be posting far from the earthquake epicenter. Therefore, it is preferred that the location of the earthquake epicenter be estimated probabilistically.
Several Bayesian filtering methods can be used to estimate the location of events from sensor readings: Kalman filters, multi-hypothesis tracking, grid-based approaches, topological approaches, and particle filters.
We use particle filters as an example for explanation. Particle filters have high performance in belief, accuracy, robustness, and variety according to an evaluation by Fox et al. (2003). Moreover, particle filters worked better for detecting earthquakes from Twitter in the experiments by Sakaki et al. (2010).

Spatial model
Each tweet is associated with a location. We describe a method that can estimate the location of an event from sensor readings. To define the problem of location estimation, we consider the evolution of the state sequence $\{x_t, t \in \mathbb{N}\}$ of a target, given [...] To solve the problem, several Bayesian filtering methods have been proposed, such as Kalman filters, multi-hypothesis tracking, grid-based and topological approaches, and particle filters. For this study, we use particle filters, which are widely used in location estimation.
Additionally, we must consider the nonuniform distribution of Twitter users when we apply Bayesian filters to social sensors because social sensors are arranged non-uniformly to a greater degree than normal physical sensors are.

Location estimation using a particle filter
A particle filter is a Bayes filter that approximates a state probabilistically. It is a sequential Monte Carlo method. For location estimation, we maintain a probability distribution for the location estimation at time $t$, designated as the belief $Bel(x_t) = \{x_t^i, w_t^i\},\ i = 1 \ldots n$. Each $x_t^i$ is a discrete hypothesis related to the location of the object. The $w_t^i$ are non-negative weights, called importance factors, which sum to one.
The Sequential Importance Sampling (SIS) algorithm is a Monte Carlo method that forms the basis for particle filters. The SIS algorithm consists of recursive propagation of the weights and support points as each measurement is received sequentially.
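A minimal particle filter for this setting might look like the sketch below. It is not the Toretter implementation: it adds a resampling step after each measurement (a common variant of SIS), it assumes a Gaussian likelihood around each tweet location, it treats latitude and longitude as flat-plane coordinates, and the bounding box, parameter values, and function name are all hypothetical.

```python
import math
import random


def estimate_epicenter(observations, n_particles=2000, sigma=1.0,
                       jitter=0.1,
                       bounds=((30.0, 46.0), (129.0, 146.0)), seed=0):
    """Estimate (latitude, longitude) from a sequence of tweet locations."""
    rng = random.Random(seed)
    (lat_min, lat_max), (lon_min, lon_max) = bounds
    # Initial belief: particles spread uniformly with equal weights.
    particles = [(rng.uniform(lat_min, lat_max),
                  rng.uniform(lon_min, lon_max))
                 for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles

    for obs_lat, obs_lon in observations:
        # Importance weighting: particles near the observation gain weight.
        weights = [
            w * math.exp(-((lat - obs_lat) ** 2 + (lon - obs_lon) ** 2)
                         / (2.0 * sigma ** 2))
            for (lat, lon), w in zip(particles, weights)
        ]
        total = sum(weights)
        if total == 0.0:
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Resampling: draw particles in proportion to their weights,
        # then add small noise to keep the particle set diverse.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [(lat + rng.gauss(0.0, jitter),
                      lon + rng.gauss(0.0, jitter))
                     for lat, lon in particles]
        weights = [1.0 / n_particles] * n_particles

    est_lat = sum(w * lat for (lat, _), w in zip(particles, weights))
    est_lon = sum(w * lon for (_, lon), w in zip(particles, weights))
    return est_lat, est_lon


if __name__ == "__main__":
    tweets = [(38.0 + 0.1 * ((i % 3) - 1), 142.0 + 0.1 * ((i % 5) - 2))
              for i in range(20)]
    print(estimate_epicenter(tweets))
```

The estimate is the weighted mean of the particles after the last measurement. Handling the non-uniform distribution of Twitter users, which the text identifies as necessary, is left out of this sketch.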

Evaluation and application
In this section, we explain how to evaluate results of experiments and describe points that should be considered when applying these methods.

Selection of the target area
Three conditions must be met to apply methods for earthquake observation from social media.
The first is that a sufficient number of people use Twitter in the target area. The second is that several earthquakes occur each year in the target area. The third is that earthquake observation infrastructure providing accurate earthquake logs is set up in the target area.
These three conditions are needed in each step of earthquake detection and location estimation.A sufficient number of tweets and a certain number of earthquakes are needed to create a classifier for tweets and to estimate the locations of earthquake epicenters.Accurate logs of earthquakes are also necessary to calculate the false-alarm probability of social sensors and to evaluate the earthquake detection system performance.
If a classifier is created and a trigger for earthquake detection is set in one area, and both are then applied in another area, then the third condition is not indispensable. However, the first and second conditions are necessary in both areas. [...] Figure 15 depicts an earthquake occurrence distribution map. Earthquake detection using information from Twitter users is applicable in overlapping areas of these two maps: for example, Japan, the west coast of the U.S., Indonesia, Turkey, Iran, and Italy.
The number of Twitter users has been increasing continuously.Therefore, those areas can probably be expanded.Additionally, if one uses social media other than Twitter, then overlapping areas might be changed.
Therefore, a target area should be chosen very carefully to apply the methods described in this chapter.

Evaluation of earthquake detection
To evaluate the performance of earthquake detection and earthquake epicenter location estimation, one must collect earthquake data from some organizations.Those data must include information about an approximate time point of an earthquake and approximate position of an earthquake epicenter.Moreover, it is better that they include the exact time of an earthquake, the longitude and latitude of an earthquake epicenter, and the seismic intensity of earthquakes in each region.
For example, the Japan Meteorological Agency (JMA) publishes an earthquake database on the Web, which includes the time, magnitude, epicenter location, and seismic intensity at each observation point for all earthquakes above level 1 on the Japanese seismic intensity scale. The USGS publishes similar data on the Web. Data of such kinds can be obtained by crawling. They can be used to create training data for classifiers and to evaluate the performance of an earthquake detection system.

Conclusion
Our research is an early approach to using Twitter as a social sensor for earthquake observations. It is meaningful that we apply methods developed for ordinary physical sensors to earthquake detection by social sensors. Furthermore, we present the possibility of earthquake detection without installing numerous physical sensors. The method is effective for earthquake observations in countries where only a few seismic sensors exist. However, it is difficult to detect earthquakes occurring in oceanic areas or sparsely populated areas using the methods introduced in this chapter. Therefore, we must verify that earthquake detection by social sensors is effective in the target area when we apply these methods. Furthermore, the applicable scope of earthquake observation by social sensors can be extended by considering a stochastic gradient, more detailed probabilistic models, and so on. Many subjects remain to be explored in future work.

Fig. 5. Correspondence between event observation by social sensors and by physical sensors.
Fig. 7. Size of earthquakes and change of tweet counts on February 11, 2011
    svm_learn "training data file" "model file"

svm_learn is a command in SVM-Light that creates a model file for a classifier. After running svm_learn, the model file is obtained as its output. Tweets can then be classified with the command svm_classify and this model file. When classifying new tweets into a positive class and a negative class, each tweet is converted into test data in the same format as the training data. Then the following command is executed.

    svm_classify "test data file" "model file" "output file"

The polarity of each tweet is obtained in the output file. New tweets are classifiable into a positive class and a negative class by the classifier for tweets as described.

SVM-Light (Joachims, 2008), LIBSVM (Chih-Chung & Chih-Jen, 2011), and Classias

Figure 10 depicts the sizes of earthquakes and the counts of positive tweets filtered by SVM on February 11, 2011. These two graphs are correlated: whenever an earthquake occurs, a peak appears in the graph of positive tweet counts. Therefore, we can detect earthquakes by detecting the peaks of positive tweet counts.
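The idea of detecting peaks in positive tweet counts can be sketched with a simple threshold rule. This is only an illustrative assumption: the function name, the parameters `history`, `factor`, and `min_count`, and the sample counts are hypothetical, and the rule is a plain baseline comparison rather than the detection trigger used by the system described in this chapter.

```python
def find_peaks(counts, history=3, factor=3.0, min_count=5):
    """Return the indices of time windows whose positive tweet count is at
    least `min_count` and exceeds `factor` times the mean count of the
    preceding `history` windows."""
    peaks = []
    for i in range(history, len(counts)):
        baseline = sum(counts[i - history:i]) / history
        if counts[i] >= min_count and counts[i] > factor * baseline:
            peaks.append(i)
    return peaks


# Hypothetical positive tweet counts per time window.
counts = [1, 0, 2, 1, 40, 25, 3, 1, 2, 30, 4]
peaks = find_peaks(counts)
print(peaks)  # [4, 9]
```

Only the first window of each burst is reported, because the windows that follow are compared against a baseline that already contains the burst.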

Figure 11 presents graphs of positive tweet counts during earthquakes. In Fig. 11, the green line shows an exponential function. As shown here, the green line resembles the red line,

Figure 14 depicts the Twitter user distribution map and Fig. 15 depicts an earthquake occurrence distribution map. Earthquake detection using information from Twitter users is applicable in the overlapping areas of these two maps: for example, Japan, the west coast of the U.S., Indonesia, Turkey, Iran, and Italy.

Table 1. Features of physical sensors and social sensors.

Table 2. Search results of keyword earthquake after the conversion.

It is possible to obtain the results shown in Fig. 6, which are expressed in JavaScript Object Notation (JSON) format, a text-based open standard designed for human-readable data interchange. The result in Fig. 6 can be converted into Table 2 by parsing it with a scripting language. Parameters that are often used to collect tweets are shown in Table 3 (this table is based on the Twitter API Documentation, footnote 2).

Fig. 6. Search results from Twitter Search API.

name | explanation | required | value
q | search keywords | required |
rpp | number of tweets to return per page | optional | up to 100
result_type | type of search results | optional | mixed/recent/popular
until | tweets before the given date | optional | before today
since | tweets after the given date | optional | after 5 days ago
lang | restricts tweets to the given language | optional | jp/en/all/others

Table 3. Search conditions of Twitter Search API.
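The search conditions in Table 3 can be assembled into a request URL as sketched below. The parameter names `q`, `rpp`, `result_type` (written here with an underscore), and `lang` follow Table 3. The base URL is a hypothetical placeholder rather than a real endpoint, and the function name is an assumption for this example.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the actual search endpoint in use.
BASE_URL = "https://example.invalid/search.json"


def build_search_url(keyword, rpp=100, result_type="recent", lang=None):
    """Build a search URL from the conditions listed in Table 3."""
    if not 1 <= rpp <= 100:
        raise ValueError("rpp must be between 1 and 100")
    if result_type not in ("mixed", "recent", "popular"):
        raise ValueError("result_type must be mixed, recent, or popular")
    params = {"q": keyword, "rpp": rpp, "result_type": result_type}
    if lang is not None:
        params["lang"] = lang  # optional language restriction
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("earthquake", rpp=50, lang="en")
print(url)
```

`urlencode` also percent-encodes non-ASCII keywords, which matters when searching for Japanese terms.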

Table 7. Sample features for SVM-Light.
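As a rough sketch of how tweet features can be written in the input format read by SVM-Light, the helper below formats one example as a target value followed by `index:value` pairs in increasing index order. The function name and the feature indices are hypothetical; how words and other features are mapped to indices is an assumption left open here.

```python
def to_svmlight_line(label, features):
    """Format one example as '<target> <index>:<value> ...'.

    `label` is +1 (positive class) or -1 (negative class).
    `features` maps integer feature indices (starting at 1) to numeric
    values. Zero-valued features are omitted, and indices are written in
    increasing order, as SVM-Light expects.
    """
    parts = [f"{idx}:{val:g}" for idx, val in sorted(features.items()) if val != 0]
    return " ".join([f"{label:+d}"] + parts)


# Hypothetical feature vector for one positive tweet.
line = to_svmlight_line(1, {3: 1.0, 1: 7, 12: 0.5, 8: 0})
print(line)  # +1 1:7 3:1 12:0.5
```

Writing one such line per tweet yields a file in the sparse format used for both training data and test data.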
is a possibly nonlinear function of the state $x_{t-1}$. Furthermore, $u_t$ is an i.i.d. process noise sequence. The objective of tracking is to estimate $x_t$ recursively from measurements, as

$$z_t = h_t(x_t, n_t), \qquad h_t : \mathbb{R}^{n_t} \times \mathbb{R}^{n_t} \to \mathbb{R}^{n_t},$$

where $h_t$ is a possibly nonlinear function, and where $n_t$ is an i.i.d. measurement noise sequence. From a Bayesian perspective, the tracking problem is to calculate, recursively, some degree of belief in the state $x_t$ at time $t$, given data $z_t$ up to time $t$. Presuming that $p(x_{t-1} \mid z_{t-1})$ is available, the prediction stage uses the following equation.

$$p(x_t \mid z_{t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{t-1})\, dx_{t-1}$$

Here we use a Markov process of order one. Therefore, we can assume that $p(x_t \mid x_{t-1}, z_{t-1}) = p(x_t \mid x_{t-1})$. In the update stage, Bayes' rule is applied as

$$p(x_t \mid z_t) = \frac{p(z_t \mid x_t)\, p(x_t \mid z_{t-1})}{p(z_t \mid z_{t-1})},$$

where the normalizing constant is

$$p(z_t \mid z_{t-1}) = \int p(z_t \mid x_t)\, p(x_t \mid z_{t-1})\, dx_t.$$
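The prediction and update stages can be illustrated on a small discrete state space, where the integrals reduce to sums. This is a minimal sketch under that assumption; the two-state example, the transition probabilities, and the likelihood values are hypothetical numbers chosen only to show the recursion.

```python
def predict(prior, transition):
    """Prediction stage on a discrete state space.

    prior[j]         = p(x_{t-1} = j | z_{t-1})
    transition[j][i] = p(x_t = i | x_{t-1} = j)
    Returns p(x_t = i | z_{t-1}) for every state i.
    """
    n = len(prior)
    return [sum(transition[j][i] * prior[j] for j in range(n)) for i in range(n)]


def update(predicted, likelihood):
    """Update stage via Bayes' rule.

    likelihood[i] = p(z_t | x_t = i)
    Returns p(x_t = i | z_t) for every state i.
    """
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    evidence = sum(unnormalized)  # normalizing constant p(z_t | z_{t-1})
    return [u / evidence for u in unnormalized]


# Hypothetical two-state example.
prior = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
likelihood = [0.2, 0.6]

predicted = predict(prior, transition)   # approximately [0.55, 0.45]
posterior = update(predicted, likelihood)
print(predicted, posterior)
```

Feeding `posterior` back in as the next `prior` gives the recursive estimation described above.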