Assessing Seismic Hazard in Chile Using Deep Neural Networks

Earthquakes represent one of the most destructive yet unpredictable natural disasters around the world, with a massive physical, psychological, and economi-cal impact in the population. Earthquake events are, in some cases, explained by some empirical laws such as Omori’s law, Bath’s law, and Gutenberg-Richter’s law. However, there is much to be studied yet; due to the high complexity associated with the process, nonlinear correlations among earthquake occurrences and also their occurrence depend on a multitude of variables that in most cases are yet unidentified. Therefore, having a better understanding on occurrence of each seismic event, and estimating the seismic hazard risk, would represent an invaluable tool for improving earthquake prediction. In that sense, this work consists in the implementation of a machine learning approach for assessing the earthquake risk in Chile, using information from 2012 to 2018. The results show a good performance of the deep neural network models for predicting future earthquake events.


Introduction
Chile is a one of the most seismic countries in the world, with an average of a major earthquake (> 8 in Richter scale) every 10 years. The last major earthquake in Chile was registered on February 27, 2010, that affected almost 80% of the Chilean population, registering 525 deaths and several wounded. Therefore, having a better approximation or additional information on where, when an event of that magnitude could occur would represent an invaluable tool for managing and designing public policies regarding natural disasters [1,2]. However, earthquake prediction is a very challenging task, due to its highly complex, chaotic, or nonlinear nature, and also, their occurrence depend on a multitude of variables that in most cases are yet unidentified [3,4].
Ogata [5] introduced epidemic-type aftershock sequence (ETAS) models for seismic hazard estimation; those models and their multiple extensions [6][7][8][9][10][11] are statistical models that use a given parametrization of the expected number of events in a given region conditional on the past events, also known as the conditional ground intensity function (GIF). The GIF is associated with the occurrence rate of an earthquake and its triggering function at time t and within an ( x, y ) location. Aftershocks are then estimated following the seismic aftershock propagation law or Omori's law [12]. Also, it is widely used for earthquake forecast applications [11,13,14]. Although the ETAS models are very good for estimating the intensity function and forecasting triggering events, they normally fail to predict the risk of main events due to their limitations in identifying foreshock events. Then, their performance could also be affected by the use of very large datasets.
Joffe et al. [15] stated that current techniques are insufficiently sensitive to allow for precise modeling of future earthquake occurrences. The above raises the importance for new approaches that consider broader and bigger sources of information. In that sense, deep learning (DL) models have state-of-art accuracy for most of the problems where statistical learning models are applied and where a precise mathematical formulation is hard to obtain. Moreover, DL methods, like deep feedforward artificial neural networks (DFANNs) and recurrent neural networks with long short-term memory (RNN-LSTM), have appeared in the last few years, with incredible success to a variety of problems: speech recognition, language modeling, translation, time series anomaly detection, and stock market prediction, to name a few [16]. This paper presents a temporal deep learning approach for ground intensity function estimation in Chile, using historical information from seismic event catalogs.

Methods
The general purpose for this work is to use a deep learning (DL) approach with deep feedforward artificial neural networks (DFANNs) and a recurrent neural networks with long short-term memory (RNN-LSTM) for ground intensity function estimation. First, the data are preprocessed to estimate the daily ground intensity function; then the output is used as input for the DL networks (DFANN and RNN-LSTM). Finally, both DL approaches are compared to find the best model. A description of the proposed procedure is shown in Figure 1.

Data
The database consisted of 86,000 seismic event records occurred in Chile, from 2000 to 2017, obtained from the National Seismological Center (http://www.sismologia.cl); each record consists of a time location (year, month, day, hour, minute, and second), a spatial location (latitude and longitude), depth (in kilometers), and magnitude (on Richter scale). Figure 2 shows the spatial distribution of seismic events with magnitude superior to 6 (in Richter scale).

Figure 1.
Scheme for the two modular DL neural network framework: data preprocessing and estimation modules. In the data preprocessing module, all data are analyzed and prepared as inputs for the following modules; this considers estimating the daily ground intensity function. The estimation module will receive inputs from the previous model and use DFANN

Data preprocessing module
The data preprocessing module consists of estimating the conditional intensity function that represents a way of specifying how the present depends on the past in an evolutionary point process [17]. Point process models have become essential components in the assessment of seismic hazard. A particular class is given by the Natural Hazards -Risk, Exposure, Response, and Resilience self-exciting temporal point process which models events whose rate at time t may depend on the history of events at times preceding t, allowing events to trigger new events (see [18,19] and the references within). These models appeared for the first time in applications to population genetics, and for this they are also known as epidemic-type models. Ogata [5,20] introduced the epidemic-type aftershock sequence (ETAS) models for modeling seismic events. These models are characterized by a parametric intensity function which represents the occurrence rate of an earthquake at time t conditional on the past history of the occurrence.
ETAS models and its successive extensions have proven to be extremely useful in the description and modeling of earthquake occurrence times and locations. Self-exciting point process models [5,19] were initially introduced in time and successively extended to the space [19]. The temporal self-exciting point processes can be defined in terms of the conditional ground intensity function (GIF): where N (A) is the number of events occurring at time t ∈ A and { ℋ t : t ≥ 0} is the history of all events up to time t. By denoting t i ∈ [0, T) , a simple point process with t i < t i+1 , the GIF can be written as where the component μ can be considered the base rate that prevents the process to die out, m i is the magnitude at the time t i , and g is the triggering function which determines the form of the self-excitation [5]. This process with intensity function λ g ( t | ℋ t ) is also known as marked self-exciting point process, where the mark is given by the magnitude associated to each event. For example, the magnitude of an earthquake also influences how many aftershocks there will be. Different parameterizations have been proposed for the functions m and f . Ogata [5] proposed the use of c (m) = e β (m− M t ) and f (t) = K _____ (t + c) p , where the parameter β measures the effect of magnitude in the production of aftershocks and f is the modified Omori formula [12], with t representing the time of occurrence of the shock, K a normalizing constant depending on the lower bound of the aftershocks, and c and p are characteristic parameters of the seismic activity of a given region.
The ground intensity function estimation can be estimated using the PtProcess library available in R [21].

Estimation module
Once the GIF databases are obtained for each magnitude (>3, >4, >5 and >6), they are structured for estimation with the DL models. The database is separated in two groups, training and test (67 and 33% of the data, respectively). A lookback of 3 is used, meaning that the output in time t will be estimated considering a window of t −1 , t −2 , t −3 inputs. Also both models were trained with 100 epochs.

Deep feedforward neural networks (DFANNs)
Deep feedforward artificial neural network (DFANN), also called feedforward neural networks or multilayer perceptron, is the most popular and widely known artificial neural network. In this network, the information is propagated in a forward direction, from the input nodes through the hidden nodes (if any) and to the output nodes. As stated by [22,23], DFANNs are universal approximators, and the universal approximation theorem states that "every bounded continuous function with bounded support can be approximated arbitrarily closely by a multilayer perceptron by selecting enough but a finite number of hidden neurons with appropriate transfer function" [22,24].
The goal of a DFANN is to approximate some function f * by mapping y ̂ = f (x; ) and learn the value of the parameters θ that result in the best function approximation for f * [25]. The DFANN model consists a set of elementary processing elements called neurons. These units are organized in an architecture with three types of layers: the input or sensory layer, the hidden, and the output layers. The neurons corresponding to one layer are linked to the neurons of the subsequent layer without any type of bridge, lateral, or feedback connections. The connections symbolize the flux of information between neurons. Figure 3 illustrates the architecture of this artificial neural network with r hidden layers.
DFANN operates as follows. The input signal is received by the neurons of the input layer; these neurons are just in charge of propagating the signal to the first hidden layer, and they do not make any processing. The first hidden layer processes the signal (applying a nonlinear transformation or transfer function) and transfers it to the subsequent layer; the second hidden layer propagates the signal to the third and so on. The number of hidden layers gives the depth of the model, hence the term "deep." When the signal is received and processed by the output layer, it generates the response.
The knowledge of the DFANN is registered, by the learning algorithms, in the connections between the neurons of each layer = { 1 , 2 , … , r } , called weights. Several learning algorithms have been created to estimate the weights, where the most popular and the first being the backpropagation, also known as generalized delta rule, popularized by [26]. The backpropagation learning algorithm is a supervised learning method and is an implementation of the Delta rule. It requires the desired output for any given input to be able to compute the output error. The main idea of the algorithm is to have a backward propagation of the errors from the output nodes to the inner nodes. For the construction of the backpropagation learning algorithm, we need to compute the gradient of the error of the network with respect to the network's modifiable weights. A DFANN network with 4 hidden layers and 12 neurons in each layer was implemented for this work.

Recurrent neural networks with long short-term memory (RNN-LSTM)
As firstly proposed by Rumelhart [26], recurrent neural networks have a primitive type of memory, in the form of recurrent layers that can operate in time [27]. Each recurrent layer takes both the output of the previous layer and an internal output of the current layer as inputs. Thus, RNNs are ideal for dealing with time series data [27]. RNNs can solve the purpose of sequence handling to a great extent but not entirely; they are great when it comes to short contexts, but to be able to build a story and remember it, the models need to be able to understand and remember the context behind longer sequences, just like a human brain. This is not possible with a simple RNN. Long short-term memory (LSTM) networks [28] are a type of RNN precisely designed to escape the long-term dependency issue of recurrent networks. LSTM recurrent networks (RNN-LSTM) have memory cells that have an internal recurrence (a self-loop), in addition to the outer recurrence of the RNN. The latter adds a nonlinear transformation to the inputs [28]. These memory cells, A, are controlled mainly by the memory door, the forgetting door ( h t ), and the output door. The memory door activates the entry of information to the memory cell, and the forgetting door selectively erases certain information in the memory cell and activates the storage to the next entry [29]. Finally, the output door decides what information the memory cell will emit [30]. The LSTM network structure is illustrated in Figure 4. Each cell has three gate activation functions σ and two output activation functions defined by tanh as a nonlinear transfer function.
In addition, they classify and predict based on time series data, since there may be delays of unknown duration between important events in a series of time. It allows clearly remembering events selected from far away in the past, which contrasts with basic NRs, for which the memory of an event decays over time [27].  A 1-layer RNN-LSTM with 12 cells was implemented for this work. Both DL models were implemented using Keras, with TensorFlow as backend, in Python. Figure 5 shows GIF estimation for the data preprocessing module, estimated for magnitudes >3, >4, >5, and >6, respectively. Note that with higher magnitudes, the GIF time series become thinner, due to the decrease of seismic events that fit in the category.

Results
The structure implemented for both DFANN and RNN-LSTM models is shown in Figure 6.
The DFANN model performs slightly better than the RNN-LSTM models, in particular for lesser magnitudes (>3). Table 1 shows the training and test performance measures (root mean square error, RMSE) for each magnitude group and DL model. Both models show better performances with magnitude >3, that is, when more information are available.
Also, a representation of the training and test results for the best model are shown in Figure 7. The model captures the trend very well; however, it does not perform accordingly in terms of the magnitude of the intensity function.

Discussion
This work introduces a novel approach to predict the temporal ETAS-GIF alternative to the statistical approach proposed by [14]. The deep learning method has recently been used for predicting locations of aftershock events [31] especially based on ground motion data. The first use of a feedforward neural network for the prediction of seismic hazard was introduced by [32] in the spatial domain.
Possible extensions of the deep learning approach could be to include the ground motion together to other variables [30,31] as inputs of the model and to incorporate the spatial dimension for a spatiotemporal prediction [33][34][35]. Some statistical techniques could be used for identifying possible patterns and inputs [36][37].
Also, since seismic events could be characterized by different features depending of the different locations of the principal events, we think that DL neural network models could be used for characterizing earthquakes in some specific seismic areas such as the local ETAS models [7,11].
Different neural networks models could be used for comparing earthquake predictions [38]. For example, Bayesian DL neural networks could be used for a new prediction scenario considering the uncertainty of major earthquake occurrences and the probability of recurrence in a similar way to the Bayesian approach proposed by [32]. Additionally, other DL and machine learning approaches as convolutional neural networks (CNN), generative networks (GN), and random forest regression (RFR) could be implemented by incorporating the spatial component and allowing to "generate" new prediction seismic risk maps.
However, the main limitation of neural networks is that they are considered "black boxes" since it is difficult to quantify the correlation between the involved variables and their uncertainty.

Conclusion
This chapter deals with the estimation of seismic risk given by the temporal ETAS conditional intensity function. To achieve this goal, two deep learning models were implemented: a deep feedforward artificial neural network and a recurrent long short-term memory network. The results show a good estimation, in particular with the DFANN model. However, it should be pointed out that both implemented models could be improved by adding more hidden layers or stacking more LSTM layers in the DFANN and RNN-LSTM models, respectively. Also, exogenous variables (such as ground motion among others) could be considered for improving the predictions. Since the proposed model only considers a temporal model, extensions to the prediction of earthquake locations will be considered in future works. We think that deep learning algorithms could be useful tools for many earthquake prediction approaches.