A Non-Homogeneous Markov Chain Model to Study Ozone Exceedances in Mexico City A Non-Homogeneous Markov Chain Model to Study Ozone Exceedances in Mexico City

In many cities around the world, air pollution is among the many environmental problems that affect their population. Among the many known facts about the impact of pollution on human health, we have that for ozone concentration levels above 0.11 parts per million (0.11ppm), the susceptible part of the population (e.g., the elderly, ill, and newborn) staying in that environment for a long period of time, may experience serious health deterioration (see, for example, [1–10]). Therefore, to understand the behaviour of ozone and/or pollutants in general, is a very important issue.


Introduction
In many cities around the world, air pollution is among the many environmental problems that affect their population. Among the many known facts about the impact of pollution on human health, we have that for ozone concentration levels above 0.11 parts per million (0.11ppm), the susceptible part of the population (e.g., the elderly, ill, and newborn) staying in that environment for a long period of time, may experience serious health deterioration (see, for example, [1][2][3][4][5][6][7][8][9][10]). Therefore, to understand the behaviour of ozone and/or pollutants in general, is a very important issue.
It is possible to find in the literature a vast amount of works that try to answer some of the many issues arising in the study of pollutants' behaviour. Depending on the type of questions that one is trying to answer, different methodologies may be used. Among the many works concentrating on the study of ozone behaviour are, [11][12][13] using extreme value theory to study the behaviour of the maximum ozone measurements; [14] using time series analysis; [15] using volatility models to study the variability of the weekly average ozone measurements; [13,16] using homogeneous Poisson processes and [17,18] using non-homogeneous Poisson models to analyse the probability of having a certain number of ozone exceedances in a time interval of interest; [19] using compound Poisson models to study the occurrence of clusters of ozone exceedances as well as their mean duration time; and [20] using queueing model to study the occurrence of cluster of ozone exceedances as well as their size distribution.
In the environmental area, it is also possible to find works using Markov chains models. Some of them are, [21,22] where non-homogeneous Markov models are used to study the occurrence of precipitation. We also have [23] where those types of models are used to study tornado activity. In the case of ozone modelling we have, for instance, the works of [24][25][26] using time homogeneous Markov chains. In those works the interest was in estimating the probability that the ozone measurement would be above (below) a given threshold, conditioned on where it lays in the present and in the past days.
In [24], the order of the Markov chain was estimated using auto-correlation function. Its transition matrix was estimated using the maximum likelihood method (see, for instance, [27,28], among others). In [25], the order of the chain was also considered an unknown quantity that needed to be estimated. The Bayesian approach (see, for example, [29]) was used to estimate the order as well as the transition probabilities of the chain. In particular, the maximum à posteriori method was used. In [26], the estimation of the order of the chain is performed using the Bayesian approach using the so-called trans-dimensional Markov chain Monte Carlo algorithm ( [30,31]). The transition matrix of the chain was obtained through the maximum à posteriori method. However, the common denominator of those works is that the Markov chain model used was a time homogeneous one. Since ozone data are not, in general, time homogeneous, the data had to be split into time homogeneous segments and the analysis was made for each segment separately.
Here, the interest also resides in estimating, for instance, the probability that the ozone measurement will be above a given threshold some days into the future, given where it stands today and in the past few days. Although in the present work we also use Markov chain models and the Bayesian approach, the novelty here is that the time-homogeneous assumption is dropped. Here, we consider a non-homogeneous Markov chain model. We assume that the order of the chain as well as its transition probabilities are unknown and need to be estimated. The chosen method of estimation is also the maximum à posteriori.
This work is presented as follows. In Section 2 the non-homogeneous Markov chain model is given. Section 3 presents the Bayesian formulation of the model. An application to ozone measurements from Mexico City is given in Section 4. In Section 5 some comments about the methodology and results are made. In an Appendix, before the list of references, we present the code of the programme used to estimate the order and the transition probabilities of the Markov chain.

A non-homogeneous Markov chain model
The mathematical model considered here may be described as follows. Let N > 0 be a natural number representing the number of years in which measurements were taken. Let T i , i = 1, 2, . . . , N, be natural numbers representing the amount of observations in each year. Hence, we have that for a given year i, either T i = 366 or T i = 365, depending on whether or not we have a leap year, i = 1, 2, . . . , N.
Let Z (i) t be the ozone concentration on the tth day of the ith year, t = 1, 2, . . . , T i , i = 1, 2, . . . , N. Following [23], we will set T i = T = 366, i = 1, 2, . . . , N, with the convention that for non leap year, we assign Z (i) Remark. Since, we are taking all years of the same length we will drop the index i from the notation.
Denote by L > 0 the environmental threshold we are interested in knowing if the ozone concentration has surpassed or not. Define Y = {Y t : t ≥ 0} by, Hence, Y t indicates whether or not in the tth day the threshold L was exceeded.
As in [25], we assume that Y is ruled by a Markov chain of order K ≥ 0. In contrast with that work, in the present case the Markov chain is a non-homogeneous one. Hence, denote by : t = 1, 2, . . . T}, the corresponding non-homogeneous Markov chain of order K. We assume that K has as state space a set S = {0, 1, . . . , M}, for some fixed integer M ≥ 0, such that, M ≤ T with probability one.
Note that, X (K) has as state space the set S Also, note that (see [25]), if the set of observed value is (y 1 , y 2 , . . . , y T ), then the transition probabilities of X (K) are such that Therefore, w occurs, if and only if, the observation following y t+1 , y t+2 , . . . , y t+K , is y t+K+1 . This enables us to work with a more treatable state space for X (K) , and therefore, to have a better form for the transition matrix.
Hence, as in [25,32], we consider the transformed state space S by using the transformation f : S corresponds to the state m ∈ S (K) 2 . Hence, the transition probabilities of X (K) may be written as (see, for instance, [25]), where m ∈ S 2 , is the initial distribution of X (K) . When K = 0, we have that P Remarks. 1. When K = 1, we have that P   3. Y is going to represent our observed data.
In addition to estimating the order K of the Markov chain, we will also estimate its transition probabilities P , the transition matrix at time t. Note that, if

A Bayesian estimation of the parameters of the model
There are many ways of estimating the order and the transition matrix of a non-homogeneous Markov chain. One way of estimating the order is via the auto-correlation function associated to the chain throughout the years. Another way is to use the Bayesian approach. When it comes to estimating the transition probabilities we have, for instance, the maximum likelihood method ( [33]) and the empirical estimator ( [34]) which are essentially the same. In the present work, we will use the Bayesian approach (see, for instance, [29,35]) to estimate the order and the transition probabilities. In particular, we are going to adopt the maximum à posteriori approach. Inference will be performed using the information provided by the so-called posterior distribution of the parameters. The posterior distribution of a vector of parameters θ given the observed data D, indicated by P(θ | D), is such that is the likelihood function of the model, and P(θ) is the prior distribution of the vector θ.
In the present case, we have that the vector of parameter is θ = (K, The vector θ belongs to the following sample space (Note that if we have K = 0, then the parametric space reduces to Θ = ∆ T 2 .) In the present case we have D = Y mi (t) the number of years in which the vector (x 1 , x 2 , . . . , x K ) corresponding to a state m ∈ S (K) 2 is followed by the observation i, , as the number of years in which we have the observation m at time t, t = 1, 2, . . . , T. Additionally, let n (K) m indicate the number of years in which the state corresponding to the initial K days is equivalent to the value m ∈ S Therefore, since a Markovian model is assumed, the likelihood function is given by (see, for instance, [23,33]) . (3) Note that when K = 0, the expression (3) simplifies to The prior distribution of the vector of parameters is given as follows. We assume a prior independence of P (K) (t) as functions of t. Also, since the forms of P (K) (t) and Q (K) (1) depend on the value of K, we have that for θ = (K, where P(Q (K) (1) | K) and P P (K) (t) | K are the prior distributions of the initial distribution Q (K) (1) and of the transition matrix P (K) (t) given the order of the chain, respectively, and P(K) the prior distribution of the order K.
Remark. When we have K = 0, the vector of parameters is Given the nature of transition matrices, we are going to assume that rows are independent. We also assume that, given the order K of the chain, each row of the transition matrix P (K) (t) will have as prior distribution a Dirichlet distribution with appropriate hyperparameters. Therefore, given that K = k, row (P (k) for t in the appropriate range. In the case of initial distribution Q (K) (1), we also have a Dirichlet prior distribution, but now with hyperparameters (α . Therefore, for any given t, We assume that K has as prior distribution a truncated Poisson distribution defined on the set S with rate λ > 0; i.e., Therefore, we have from [25,32,36], that the conditional posterior distribution of P (K) (t) given K, is ). The mode of each Dirichlet distribution is known and is given by (see [37]), Additionally, the posterior distribution of the initial distribution Q (K) (1) given K is 2 ). Hence, as in the case of the posterior distribution of P (K) (t), the mode of P( When K = 0, we have Therefore, for each t = 1, 2, . . . , T, Furthermore, we also have, from [25], that with the appropriate adaptation for the case of K = 0. Hence, the posterior distribution of the order K is Therefore, in order to obtain the probability of interest, we just have to use (7) to estimate the value of K that maximises that posterior probability, and then use (4), (5), and/or (6) (depending on the case), in order to calculate the corresponding transition matrix and initial distribution, respectively.
The hyperparameters appearing in the prior distribution will be considered known and will be specified later.

Application to ozone data from the monitoring network of Mexico City
In this section we apply the model to the Mexico City's ozone measurements. The data used consist of twenty two years of the daily maximum ozone measurements (from 01 January obtained minute by minute and the averaged hourly result is reported at each station. The daily maximum measurement for a given region is the maximum over all the maximum averaged values recorded hourly during a 24-hour period by each station placed in the region. Since emergency alerts in Mexico City are declared regionally, we will analyse each region separately. The Mexican ozone standard considers the threshold 0.11ppm (see [38]). Hence, we will take that value as one of our thresholds. Additionally, for comparison purpose, we will also take the threshold values 0.15ppm and 0.17ppm. Even though it is a general belief that ozone measurements depend on the measurements of only a few days in the past, we are taking M = 16 when we consider the threshold values 0.15ppm and 0.17ppm. We have decided to do that because in previous works the order for homogeneous segments could have higher order. In the case of L = 0.11ppm, in some cases, larger values of M were needed. Hence, we also take M = 16, in the case of region NW, and we take M = 18 in the case of regions CE, NE, SE, and SW. In order to account also for the possibility of low order, we take λ = 1 in the prior distribution of K.
The hyperparameters of the Dirichlet prior distributions are assigned as in [25]. Therefore, the values of α         Table 1 gives the values of P(K | Y). Even though, S includes the values 0, 1, 2, and 3, since the posterior probabilities at those points are of order 10 −8 and below, we have omitted those values of K. We use the symbol "-" to indicate that the specific value of K either was not considered in the corresponding region or the probability associated to it was small compared to the values shown.
Looking at Table 1 we may see that, if we consider the threshold L = 0.11ppm, then the selected order of the chain is K equal to 16 in the case of region NE, equal to 12 for region NW, and equal to 17 for regions CE and SE. When we consider region SW, the value of K is  Table 1. Posterior distribution of the order of the chain for all regions and threshold considered. The symbol "-" is used to indicate that the specific value of K either was not considered in the corresponding region or the probability associated to it was small compared to the values shown.
either larger than or equal to 18 with probability one. If we take into account the threshold L = 0.15ppm, then, also by looking at Table 1, we have that the chosen orders are 10, 6, 11, 9, and 14, in the cases of regions NE, NW, CE, SE, and SW, respectively. When we consider the threshold L = 0.17ppm, then the selected orders are 7, 8, and 9, for regions NE, CE, and SW, respectively. In the cases of regions NW and SE, the estimated order is 5. Therefore, using this information and (4), the corresponding transition and initial probabilities may then be calculated.
As an example, consider the case of region CE and the threshold 0.17ppm. In that case, we have that the order of the chain is K = 8. Therefore, S (K) 2 = {0, 1, . . . , 255}. In Table   2, we have the approximated estimated values of the initial distribution Q  Looking at Table 2, it is possible to see that the highest initial probability is that associated to the state 0, i.e., the first eight days of the year form a string of zeros, meaning that the concentration levels are below 0.17ppm. Additionally, once you have the information that the ozone concentration levels on the first eight days are below 0.17ppm, then the highest transition probability is also associated to the transition to zeros, i.e., the two days following the eight initial days with concentration below 0.17ppm are more likely to present lower concentration levels as well.
In order to illustrate the type of information that may be obtained using the methodology considered here, take the case of the year 2012 and region CE. Suppose we want to calculate the probability that during the first nine days of January we have that the ozone concentration is below 0.17ppm from the first eight days, and it is above it on the ninth. Therefore, we want to know the probability that (0, 0, 0, 0, 0, 0, 0, 0) is followed by one. Hence, we want the probability of having the following sequence of zeros and ones: 0, 0, 0, 0, 0, 0, 0, 0, 1. Therefore, P((0, 0, 0, 0, 0, 0, 0, 0, 1)) = P (8)  (2) as well as the initial probabilities Q In order to obtain the values of the probabilities of interest, recall that P (K) 2 , t = 1, 2, . . . , T − K. Therefore, looking at Table 2, we have that P  Suppose now that we want to know the probability of having (0, 0, 0, 0, 0, 0, 0, 0) followed by (1,1). Hence, we want to know what the probability that (0, 0, 0, 0, 0, 0, 0, 0) is followed by one and that (0, 0, 0, 0, 0, 0, 0, 1) is followed by one. Therefore, we need to calculate P((0, 0, 0, 0, 0, 0, 0, 0, 1, 1)) = P (8) Proceeding in this way we may calculate the probability of having any string of states at any time of the year.
If we compare to the actual measurements in the year 2012, then we have that in the first ten days, the sequence Y, in the case of region CE, has the configuration 0, 0, 0, 0, 0, 0, 0, 0, 0, 0. In fact, the estimated probability of that sequence of zeros and ones is 0.5 × 0.762 × 0.0327 ≈ 0.0125 which is three times higher than the probability of having (0, 0, 0, 0, 0, 0, 0, 0) followed by (1,1). If we consider also the year 2013, the results are similar. Hence, the methodology used here can produce estimated values that may describe well the behaviour of the data.

Conclusion
In this work we have considered a non-homogeneous Markov chain model to study the ozone's behaviour in Mexico City. The interest resides in estimating the probability that the ozone level will be above (below) a certain threshold given that it is either above or below it in the present and in the recent past. Due to the nature of the questions asked here, a natural way of trying to answer them is to use Markov chain models. However, due to the non-homogeneity of the data, a non-homogeneous version of the chain is used. Using the Bayesian approach a maximum à posteriori estimation of the order of the matrix as well as its transition matrix and initial distribution was made. The results have shown that higher order should be considered for the chain. One explanation for that could be the way the empirical probability of having an exceedance in a given day behaves. That can be seen in Figure 1 where, as an illustration, the proportion of exceedances of the threshold 0.11ppm is presented for each region. The values correspond to the proportion of years in which in a fixed day the threshold was exceeded. As we vary the days, we have the behaviour throughout the 366 days. In that figure we have in the horizontal axis the days and in the vertical axis we have the values of which represents the proportion of years with an exceedance in the tth day, t = 1, 2, . . . , T. The notation Y (i) t is used to indicate the variable Y t defined in (1) on the ith year.
The plots in Figure 1 reflect well the fact that in region SW, in most of the days of the year, there are exceedances of the threshold 0.11. We may also see the influence of the seasons of the year. The hill between days 100 and 200 appearing in every plot, corresponds to measurements taken between April and June. Higher values occur during the days corresponding to approximately mid April to mid May. Those months are in the middle of Spring. During this season it does rain much in Mexico City. Additionally, there is a lot of sunlight. Hence, the ozone concentration is bound to be high, and as a consequence, the proportion of years in which exceedances occur at that period is large. The values decrease when the raining season starts (around the beginning of June).