Change Point Analysis in Earthquake Data

Earthquake forecasts are very important in human life in terms of estimating hazard and managing emergency systems. Defining of earthquake characteristics plays an important role in these forecasts. Of these characteristics, one is the frequency distribution of earthquakes and and the other is the magnitude distribution of the earthquakes. Each statistical distribution has many parameters describing the actual distribution. There are various statistical distributions used to model the earthquake numbers. As is well known, these are binomial, Poisson, geometric and negative binomial distributions. It is generally assumed that earthquake occurrences are well described by the Poisson distribution because of its certain characteristics (for some details, see Kagan, 2010; Leonard & Papasouliotis, 2001). In their study, Rydelek & Sacks (1989) used the Poisson distribution of earthquakes at any magnitude level. The Poisson distribution is generally used for earthquakes of a large magnitude, and the earthquake occurrences with time/space can be modeled with the Poisson process in which, as is known, the Poisson distribution is one that counts the events that have occurred over a certain period of time. There is a significant amount of research on the change point as applied to earthquake data. Amorese (2007) used a nonparametric method for the detection of change points in frequency magnitude distributions. Yigiter & İnal (2010) used earthquake data for their method developed for the estimator of the change point in Poisson process. Aktaş et al. (2009) investigated a change point in Turkish earthquake data. Rotondi & Garavaglia (2002) applied the hierarchical Bayesian method for the change point in data, taken from the Italian NT4.1.1 catalogue. Recently, much research in the literature has focused on whether there is an increase in the frequency of earthquake occurrences. It is further suggested that any increase in the frequency of earthquakes, in some aspects, is due to climate change in the world. There is considerable debate on whether climate change really does increase the frequency of natural disasters such as earthquakes and volcano eruptions. In many studies, it is emphasized that there is serious concern about impact of climate change on the frequencies of hazardous events (Peduzzi, 2005; Lindsey, 2007; Mandewille, 2007 etc.). In Peduzzi’s study (2005), there are some indicators about increasing number of the earthquakes especially affecting human settlements, and it is also reported that there is an increase in the percentage of earthquakes affecting human settlements from 1980 onwards. The change point analysis can be used to study the increase or decrease in the frequency of the earthquake occurrences.


Introduction
Earthquake forecasts are very important in human life in terms of estimating hazard and managing emergency systems.Defining of earthquake characteristics plays an important role in these forecasts.Of these characteristics, one is the frequency distribution of earthquakes and and the other is the magnitude distribution of the earthquakes.Each statistical distribution has many parameters describing the actual distribution.There are various statistical distributions used to model the earthquake numbers.As is well known, these are binomial, Poisson, geometric and negative binomial distributions.It is generally assumed that earthquake occurrences are well described by the Poisson distribution because of its certain characteristics (for some details, see Kagan, 2010;Leonard & Papasouliotis, 2001).In their study, Rydelek & Sacks (1989) used the Poisson distribution of earthquakes at any magnitude level.The Poisson distribution is generally used for earthquakes of a large magnitude, and the earthquake occurrences with time/space can be modeled with the Poisson process in which, as is known, the Poisson distribution is one that counts the events that have occurred over a certain period of time.There is a significant amount of research on the change point as applied to earthquake data.Amorese (2007) used a nonparametric method for the detection of change points in frequency magnitude distributions.Yigiter & İnal (2010) used earthquake data for their method developed for the estimator of the change point in Poisson process.Aktaş et al. (2009) investigated a change point in Turkish earthquake data.Rotondi & Garavaglia (2002) applied the hierarchical Bayesian method for the change point in data, taken from the Italian NT4.1.1 catalogue.Recently, much research in the literature has focused on whether there is an increase in the frequency of earthquake occurrences.It is further suggested that any increase in the frequency of earthquakes, in some aspects, is due to climate change in the world.There is considerable debate on whether climate change really does increase the frequency of natural disasters such as earthquakes and volcano eruptions.In many studies, it is emphasized that there is serious concern about impact of climate change on the frequencies of hazardous events (Peduzzi, 2005;Lindsey, 2007;Mandewille, 2007 etc.).In Peduzzi's study (2005), there are some indicators about increasing number of the earthquakes especially affecting human settlements, and it is also reported that there is an increase in the percentage of earthquakes affecting human settlements from 1980 onwards.The change point analysis can be used to study the increase or decrease in the frequency of the earthquake occurrences.
The change point analysis is a very useful statistical tool to detect an abrupt or a structural change in the characteristic of the data observed sequentially or chronologically.Many different statistical methods are available to detect the change point in the distribution of a sequence of random variables (Smith, 1975;Hinkley, 1970;Boudjellaba et al., 2001 in many others).Multiple change points have also been investigated for many years (Hendersen & Matthews, 1993;Yao, 1993;Chen & Gupta, 1997, among others.).In this study, we aimed to find an evidence of increased or decreased in world seismic activities after 1901.Earthquake frequencies are modeled by the Poisson distribution as the most commonly used discrete distribution.We modeled the earthquakes in the world with the Poisson process since the number of earthquakes is counted with Poisson distribution.We investigated any abrupt change point(s) in parameters of the model using the frequentist (maximum likelihood) and Bayesian method.When the magnitude of the earthquakes is taken into account in the process, it is then called the compound Poisson process.We also investigated a change in the magnitudes of the world earthquakes.For this purpose, Poisson process and compound Poisson process are introduces in Section 2. The frequentist and Bayesian methods used for change point estimates are explained in Section 3. Worldwide earthquake data and change point analysis of this data are given Section 4.

Poisson process
The Poisson process has an important place in stochastic processes.
See Parzen (1962) for some details about Poisson processes.

Methods for Estimating Change Point
The Maximum likelihood method and the Bayesian method are basic methods in statistical change point analyses for point estimation and hypothesis testing.

Change point estimation with the maximum likelihood method (frequentist method)
Let  12 n X, X, , X be a sequence of the random variables such that i X   (i 1, , ) has probability density function where the change point  is unknown discrete parameter and the parameters  1 ,  2 can be assumed to be either known or unknown.The single change point model in the sequence of the random variables is written: X, X, , X ~f ( x , ) X, X, , X f ( x ,) . (5) Under model Eq.( 5), for the observed values of the sequence of the random variables, the likelihood function is and the logarithm of the likelihood function is: After adding the expression nf(x , ) in Eq.( 7) then Eq.( 7) can be re-written as follows: When the parameters,  1 ,  2 are unknown, for any fixed values of change point , the maximum likelihood estimator of the parameters  1 ,  2 are found to be the derivative of Eq.( 8) 1  and 2  respectively: After solving the equations system given in Eq.( 9) and ( 10), the maximum likelihood estimator of the parameters  1 ,  2 ,  1 ˆ,  2 ˆ are obtained.The maximum likelihood estimate of the change point  is:

Change point estimation with the Bayesian method
The Bayesian method differs from the frequentist method in that each parameter is assumed to be a random variable and each one has a probability function called prior distribution.
The estimate of the unknown parameter is obtained by deriving a posterior distribution on the basis of the prior distributions and the likelihood function.The posterior distribution is obtained: Under change point model given in Eq.( 5), let   012 p ( , ) be the joint prior distribution of parameters  1 ,  2 and let  0 p( ) be the prior distribution of change point .The likelihood function is given by Eq.( 6) and so the joint posterior distribution is written as follows: Integrate Eq.( 12) with respect to the parameters, the marginal posterior distribution of change point is proportional to and the Bayesian estimate  ˆ of the change point  is found by maximizing the marginal posterior distribution given by Eq.( 13).Assuming uniform priors, the joint posterior mode, which gives the maximum likelihood estimates, is at  ˆ,  1 ˆ,  2 ˆ (Smith, 1975).

Detecting change point in worldwide earthquake data and findings
This study investigates whether there is a change point in worldwide earthquake activities such as the number of earthquake occurrences and their magnitude.At first, assuming that the occurrences of the earthquakes follow the homogeneous Poisson process, a change point is investigated in the occurrence rate of the earthquake in unit time; secondly, the magnitudes of the earthquake are assumed to be normally distributed random variables; a change point is investigated in the mean of the normally distributed random variables describing the earthquake magnitudes.
To detect a change point in earthquakes, the relevant data is taken from the website of the U.S. Geological Survey ( 2011 T,T, ,T are exponentially distributed random variables as given in Eq.( 2).Under the assumption that  is equal to the occurrence time of an event, the likelihood function can be written under the change point model in the sequence of exponentially distributed random variables: Where  shows change point and is a continuous parameter defined time interval (0,T],  2 is the occurrence rate in the time interval  (,T ] and  N is the number of events that occurred in the time interval  (0, ] and T is the sum of the time interval between the earthquake occurrences, that is Tt (Raftery & Akman, 1986;Akman & Raftery, 1986).

The logarithm of the likelihood function is
After the derivation of the log likelihood function given by Eq.( 15) for  1 and  2 , the maximum likelihood estimations of 1  and 2  are obtained respectively: For the possible values of ˆ,a n d  ˆ are the maximum likelihood estimation for  1 ,  2 and .The scatter plot of the time intervals between the earthquakes is shown in Figure 1.From Figure 1, it is clearly seen that there is a change (or at least one change) in the occurrence rate of the earthquakes.When we assume a single change point in the data and the method explained above is applied this data and the results are given in Table 1.As is shown in Table 1, the occurrence rate of the earthquake in unit time (in day or in year) increased considerably after February 3, 2002, approximately ten times of the occurrence rate of the earthquakes in unit time before February 3, 2002.From the log likelihood function of the change point in Figure 2, it can be seen that there are at least two change points in the data.There many methods are used to detect multiple change points in the data.One of the basic methods used for this purpose is binary segmentation procedure (Chen & Gupta, 1997;Yang & Kuo, 2001).In the binary segmentation procedure, the data is divided into two homogeneous groups according to the estimated change point, and a change point is searched in each subdivided data until there is no change in the subdivided data.The results are given in Figure 3. Table 1.Estimated parameters for the earthquake data.
www.intechopen.comFig. 2. The maximum point of log likelihood function given in Eq.( 16) is shown as a red star and the corresponding change point estimate is shown with a red circle.
Change points and the occurrence rates of the earthquake throughout time axis 09-Mar-1957 10-May-1997 12-Jan-2010  As can be seen in Figure 4 and Figure 5, the occurrence rate of the earthquakes increases slowly up until 03-February-2002, goes up sharply between 03-February-2002 and 12-January-2010, and tends to fall thereafter.When using Bayesian method, the prior distributions of the parameters  1 ,  2 and , are taken to be respectively, p( , , ) L ( , , ) p( , ) p() With integrate Eq. ( 18) with respect to the parameters  1 ,  2 , the marginal posterior distribution of change point is proportional to where ( .) is gamma function.The marginal posterior distributions of 1  and 2  are not obtained analytically because of discontinuity in  and the close form of the marginal posterior distributions of  1 and  2 are respectively proportional to where, for the possible values of ,  12 n s, s, , s , and the variable  N takes the values  1,2, ,n .Form Eq.( 19), the Bayesian estimate  ˆof the change point in the earthquake data is found to be the date 03-February-2002 2002 which corresponds to the posterior mode.The posterior distribution of the change point is given in Figure 6.The Bayesian estimates of the parameters 1  and 2  would be the same as the maximum likelihood estimates because of uninformative prior distributions.
For multiple change points, the binary Bayesian segmentation procedure can be easily used.
The procedure is employed as the mode of the posterior distribution of change point decreases considerably until the one before the posterior mode is found.The results are given in Figure 7.
Change points and the occurrence rates of the earthquake throughout time axis 09-Mar-1957 10-May-1997 12-Jan-2010 When comparing the maximum likelihood estimates and the Bayesian estimates of the change points, we can see that the change points corresponding to dates such as 9- March-1957, 10-May-1997, 03-February-2002and 12-January-2010 overlap.These dates are investigated in depth from many aspects, including overall world temperature, other disasters, and astronomical events.
When we look at the magnitudes of the earthquakes in the data, those of magnitude 6 to 6.9 number 291 (35.57%) and those of magnitude 7 to 7.9 number 258 (31.54%) (Table 2).The histogram of the magnitudes is given in Figure 6.Furthermore, using the Kolmogorov-Smirnov test, the distribution of the magnitudes is found to be almost normal (p=0.000).
Magnitude 4-4.9 5-5.9 6-6.9 7-7.9 8-8.9 9 The where  is an unknown change point in the mean of the sequence of the normally distributed random variables.2) and the logarithm of the likelihood function in Eq. ( 21) is: The mean and variance of the magnitudes of the earthquakes according to the estimated change points dates, (9-March-1957, 10-May-1997, 03-February-2002and 12-January-2010) are given in Table 3.

Estimated Change points
Up to 9-Mar-1957 Using Table 3, we compute the recurrence periods for certain earthquakes of magnitude y, corresponding to the change point, 12-January-2010.The recurrence period of an earthquake with magnitude y is computed by Reccurence period=1/(Expected number of earthquakes). www.intechopen.com The expected number of earthquakes of magnitude y is computed by the multiplying the probability of occurrence class of magnitude with the expected number of earthquakes over a certain period of time.www.intechopen.com

Conclusion
Predicting earthquakes is of crucial importance these days.Many researchers have studied earlier attempts at earthquake detection.The change point analysis is used in both backward (off-line) and forward (on-line) statistical research.In this study, it is used with the backward approach in the worldwide earthquake data.The change points found in the worldwide earthquake data are useful in making reliable inferences and interpreting the results for further research.Each date found as the change point in the earthquake data should be carefully investigated with respect to other geographical, ecological, and geological events or structures in the world.
), consisting of 819 earthquakes worldwide of magnitude 4.0 or above covering the period from 3-March-1901 until 11-March-2011.It is assumed that the earthquake occurrences follow the homogeneous Poisson process.The earthquakes are observed at times  , , S in the continuous time interval (0,T].For the earthquakes data, the starting point S 0 is in 3-March-1901 and the end point S 818=T is in 11-March-2011.

Fig. 1 .
Fig. 1.The time intervals (days) between the earthquakes according to the earthquake occurrences.

Fig. 4 .Fig. 5 .
Fig. 4. Multiple change points in the earthquake data.Estimated change points are shown on the scatter plot of the time intervals between the earthquakes.
where the prior distributions are called uninformative priors.The joint posterior distribution of the parameters is can be written as:

Fig. 6 .
Fig. 6.The posterior distribution of change point and the posterior mode.

Fig. 7 .
Fig. 7.The Bayesian estimations of parameters for multiple change points in the earthquake data (*in days).

Fig. 8 .
Fig. 8. Histogram of the earthquake magnitudes.The sequence of the earthquake magnitudes 12

Table 2 .
The number of earthquakes with respect to magnitudes.
For any given value of , the maximum estimates of unknown parameters  1 ,  2 , are The estimated change point is close to end of the sequence such that it refers to no change point in the sequence.We can model both the number of the earthquake occurrences and the magnitudes of the earthquakes with compound Poisson process.Let  } are the magnitudes of the earthquakes, which are normally distributed random variables with parameters  i and  2 .

Table 4 .
The estimation of recurrence periods for certain earthquakes after the change point 12-January-2010.

Table 5 .
Expected number of earthquakes and the expected total magnitudes of earthquakes with respect to the change point 12-January-2010.

Table 6 .
Probability of earthquake occurrences after the change point 12-January-2010.