Open Access is an initiative that aims to make scientific research freely available to all. To date our community has made over 100 million downloads. It’s based on principles of collaboration, unobstructed discovery, and, most importantly, scientific progression. As PhD students, we found it difficult to access the research we needed, so we decided to create a new Open Access publisher that levels the playing field for scientists across the world. How? By making research easy to access, and puts the academic needs of the researchers before the business interests of publishers.

We are a community of more than 103,000 authors and editors from 3,291 institutions spanning 160 countries, including Nobel Prize winners and some of the world’s most-cited researchers. Publishing on IntechOpen allows authors to earn citations and find new collaborators, meaning more people see your work not only from your own field of study, but from other related fields too.

In this work, a first-order autoregressive hidden Markov model (AR(1)HMM) is proposed. It is one of the suitable models to characterize a marker of breast cancer disease progression essentially the progression that follows from a reaction to a treatment or caused by natural developments. The model supposes we have observations that increase or decrease with relation to a hidden phenomenon. We would like to discover if the information about those observations can let us learn about the progression of the phenomenon and permit us to evaluate the transition between its states (supposed discrete here). The hidden states governed by the Markovian process would be the disease stages, and the marker observations would be the depending observations. The parameters of the autoregressive model are selected at the first level according to a Markov process, and at the second level, the next observation is generated from a standard autoregressive model of first order (unlike other models considering the successive observations are independents). A Markov Chain Monte Carlo (MCMC) method is used for the parameter estimation, where we develop the posterior density for each parameter and we use a joint estimation of the hidden states or block update of the states.

Keywords

autoregressive hidden Markov model

breast cancer progression marker

Gibbs sampler

hidden states joint estimation

Markov Chain Monte Carlo

chapter and author info

Authors

Hamid El Maroufy*

Department of Mathematics, Faculty of Sciences and Technics, Sultan Moulay Slimane University, Béni Mellal, Morocco

El Houcine Hibbah

Department of Mathematics, Faculty of Sciences and Technics, Sultan Moulay Slimane University, Béni Mellal, Morocco

Abdelmajid Zyad

Biological Engineering Laboratory, Team of Natural Substances, Cell and Molecular Immuno-Pharmacology, Sultan Moulay Slimane University, Morocco

Taib Ziad

Early Clinical Development, Astra Zeneca RD, Gothenburg, Mölndal, Sweden

*Address all correspondence to: h_elmaroufy@hotmail.com

The main motivation behind this work is to characterize progression in breast cancer. In fact, disease progression cannot be assessed correctly without the use of biomarkers, which would effectively monitor the evolution of the patient health state, and this is the case for breast cancer. The major challenge in this matter for researchers and clinicians is to unravel the stage of the disease, so as to tailor the treatment for each patient and to monitor the response of a patient to a treatment.

Currently, studies have shown that there is a correlation between the levels of certain markers such as cancer antigen CA15-3, carcinoembryonic antigen (CEA), and serum HER2 Neu with the stage of the disease [1]. This gives an opportunity of using a hidden Markov model (HMM) to predict the stage of the disease based on biomarker data and to address the effectiveness of the treatments in their influence on the transition of the cancer from one state to another. In HMM, we have two constituents: the Markovian hidden process suitable to represent the breast cancer stage and the observation process given by the biomarker data. By the way, we can learn about the disease transition rates and how it progresses from primary breast cancer to advanced cancer stage, for example.

Indeed, HMM is a useful tool for tackling numerous concrete problems in many fields but some possible applications of HMM are in speech processing [2], in biology [3], in disease progression [4], in economics [5, 6], and in gene expression [7]. For a complete review of HMM, the reader is referred to Zucchini and MacDonald [8], in which properties and definitions of HMM are presented in a plausible way with both classical estimation by maximum likelihood method and expectation maximization (EM) algorithm and the new Bayesian inference is addressed.

The model we consider here is a variation of the regular hidden Markov model, since we will use extensions to incorporate dependence among successive observations, suggesting autoregressive dependence among continuous observations. Consequently, we have relaxed the conditional independence assumption from a standard HMM, because we would like to add some dynamics to the patient disease progression and because in reality the current patient biomarker observation is dependent on the past one. In fact, the autoregressive assumption in HMM has shown its advantage over regular HMM that cannot catch the strong dependence between successive observations (e.g., Ref. [9]). A similar model to ours can be found in Ref. [10]. This kind of models, which were first proposed in Ref. [11] to describe econometrics time series, is generalization of both HMM and autoregressive models, will be effective in representing multiple heterogeneous dynamics such as the disease progression dynamics, and can be even generalized to a regime-switching ARMA models such as in Ref. [12].

Moreover, Our model can also be viewed as an extension of the multivariate double-chain Markov model (DCMM) developed by Ref. [13], where there are two discrete Markov chains of first order: the first Markov chain is observed and the second one is hidden. In contrast to this DCMM, our multivariate first-order autoregressive hidden Markov model (MAR(1)HMM) will lead to continuous observations, where each observation conditional on the hidden process will depend on the previous observation according to an autoregressive process of first order. This dynamic is promising for continuous observed disease biomarkers.

Parameter estimation is very challenging for HMM family models since the likelihood is not available in a closed form most of the time. Thus, we call for a Markov Chain Monte Carlo (MCMC) procedure instead of a maximum likelihood-based approach. This choice rises from the fact that the Bayesian analysis uses prior knowledge about the process being measured, and it allows direct probability statements and an approximation of posterior distributions for the parameters. Instead in the maximum likelihood approach, we cannot have declared prior or have exact distribution for the parameters when the likelihood is untractable or when we have missing data (e.g., Refs. [14–16]).

Since the realization of HMM includes two separate entities: the parameters and the hidden states, the Bayesian computation is carried out after augmenting the likelihood by the missing hidden states [17]. The hidden states are sampled using a Gibbs sampler adopting a joint estimation of the hidden states or block update of the states (instead of a single update of each state separately) by means of a forward filtering/backward smoothing algorithm. Given the hidden states, we can compute the autoregressive parameters and the transition probabilities of the Markov chain by Gibbs sampler from their posterior densities after specifying conjugate priors for the parameters. Hence, the MCMC algorithm will alternate between simulating the hidden states and the parameters. Finally, we can obtain posteriors statistics such as the means, standard deviations and confidence intervals after assessing the convergence of the MCMC algorithm.

This chapter is organized as follows: after a preliminary on HMM, a description of the model is given in Section 3. In Section 4, we will give the Bayesian estimation of the parameters and the hidden states and provide the details of the MCMC algorithm, before presenting the results of a simulation studies in Section 5 and we will finish by a conclusion.

2. Preliminary

Since the model suggested is of the HMM type, we will describe HMM in more detail: an HMM is a stochastic process {Xt,Yt}t=0T, where {Xt}t=0Tis a hidden Markov chain (unobservable) and {Yt}t=0Tis a sequence of observable independent random variables such that Y_{t }depends only on X_{t }for the time t = 0,1,…T. Here the process {Xt}t=0Tevolves independently of {Yt}t=0Tand is supposed to be a homogeneous finite Markov chain with probability transition matrix Π of dimension a × a, where a indicates the number of the hidden states and Π_{0} = (Π_{01},…,Π_{0a }) is the initial state distribution.

We denote the probability density function of Y_{t }= y_{t }given X_{t }= k for k ∈ {1, …, a} with Pxt(yt,θk), where θ_{k }refers to the parameters of P when X_{t }= k. We suppose further that the processes Y_{t }|X_{t }and Y_{t′ }|X_{t′ }are independent for t ≠ t′. Let Θ = (θ_{1},…, θ_{a }) and θ = (Π_{0}, Π, Θ), and then, the HMM can be described as follows: First, the likelihood of the observations and the hidden states can be decomposed to P(y0,…,yT,x0,…,xT,θ)=P(y0,…,yT|x0,…,xT,θ)P(x0,…,xT,θ).Since {Xt}t=0Tis a Markov chain, P(x0,…,xT,θ)=Π0(x0)∏t=1TΠ(xt|xt−1). Under the conditional independence of the observations given the hidden states, P(y0,…,yT|x0,…,xT,θ)=Px0(y0|θx0)∏t=1TPxt(yt|θxt). Consequently, the likelihood function for the hidden states and the observations is given by P(y0,y1,…,yT,x0,x1,…,xT,θ)=Π(x0)Px0(y0|θx0)∏t=0TΠ(xt|xt−1)Pxt(yt|θxt).

3. Model description and specification

The MAR(1)HMM model we consider in this work is a hidden Markov model, where conditionally on the latent states, the observations are not independent like it is the case for a regular hidden Markov model. Instead, the current observation is allowed to depend on the previous observation according to an autoregressive model of first order. As in an HMM model, the latent states evolve according to a discrete first-order time homogeneous Markov model. We consider data of n continuous random variables observed over time, each of potentially different lengths, i.e., for each individual i = 1,2, …, n, we observe a vector yi,.=(yi,ui,…,yi,mi)T, with u_{i }< m_{i }.

Define u0=min1≤i≤n{ui}and M=max1≤i≤n{mi}and note that the times u_{i }and m_{i }may vary over the entire observation period from u_{0} to M with the restriction that u_{i }– m_{i }≥ 1, for i = 1,2,…,n.

We assume, for i = 1,2,…,n for integer time t = u_{i },…,m_{i }, that the random variable Y_{i,t }taking nonnegative values depends only on the states X_{t }and the previous observation Y_{i,t–1}, and based on the model developed by Farcomeni and Arima [10], we get the following model:

Yi,t|Xt=xt=β(xt)Yi,t−1+μ(xt)+εi,t.E1

The choice of the autoregressive part of the model is motivated by the fact that successive biomarker observations are most of the time correlated from many diseases unlike the hypothesis of independence between observations in HMMs.

We interpret x as the vector of the hidden health states of the patients; in the case of breast cancer, those states would be localized or advanced metastatic breast cancer for example, while y is the vector of the biomarkers observed and measured for the patients. The ε_{i },_{ t }are normal variables with mean 0 and variance σ^{2} such that ε_{i },_{ t }and ε_{i′ },_{ t′ }are uncorrelated, (i, t) ≠ (i′, t′).

The parameters β(xt)and μ(xt)are parameters taking values in Rfor each hidden state and σ^{2} ∈ R+.

Similar to Ref. [13], the transition matrix of the Markov chain Π is time homogeneous with dimension a × a where a is the number of hidden states, and Π = (Π_{ gh }, g = 1,…,a; h = 1,…,a) where Π_{ gh }= P(X_{t }= h|X_{t–1} = g), for g = 1,2,…,a; h = 1,2,…, a; and t = u_{0+1},…, M. We let the first state Xu0to be selected from a discrete distribution with vector of probabilities r = (r_{1},…,r_{a }). Also we consider the time of initial observation u_{i }, the initial observed state yi,ui, and the number of consecutive time points that were observed m_{i }– u_{i }+ 1. Let μ = (μ^{(1)},…, μ^{(a)}), β = (β^{(1)},…,β^{(a)}), and θ = (μ, β, σ^{2}, r, Π) be the set of all parameters in the model. We suppose that the individuals, i.e., Y_{i },_{ t }, behave independently conditionally on X. Therefore, for i = 1, …, n, P(yi,.|yi,ui,x,θ)=∏t=ui+1miP(yi,t|yi,t−1,xt,Θ)and P(x|θ)=P(xu0)∏t=u0+1MP(xt|xt−1,Π), where P(xt|xt−1,Π)=P(Xt=xt|Xt−1=xt−1,Π)=Πxt−1,xt.Then, the likelihood density for the observations of all individuals y = (y_{1},…,y_{n }) given first time vector of observations y0=(y1,u1,…,yn,un), x, and θ is P(y|y0,x,θ)=∏i=1nP(yi,.|yi,ui,x,θ),

This is due to the conditional independence of the y_{i }, given x and θ. The joint mass of each y_{i },. and x given yuiand θ can be written as follows: P(yi,.,x|yi,ui,θ)=P(yi,.|yi,ui,x,θ)×P(x|yui,θ).Using the Markov property of the hidden process, we have after simplification P(x|yi,ui,θ)∝P(yi,ui|x,θ)P(x|θ)=P(yi,ui|xui,θ)rxu0Πxu0,xu0+1×⋯×ΠxM−1,xM.In addition, P(yi,.|yi,ui,x,θ)=∏t=ui+1miP(yi,t|yi,t−1,x,θ), and consequently,

P(yi,.,x|yi,ui,θ)∝rxu0P(yi,ui|xui,θ)∏t=u0+1MΠxt−1,xt∏t=ui+1miP(yi,t|yi,t−1,x,θ).Finally, under the hypothesis of normal error distribution for the autoregressive parameters of the model (Eq. (1)) and the Chapman-Kolmogorov property, the joint distribution of y_{i },. and x given yi,uiand θ can be simplified to:

where ϕ denotes the density of a standard normal distribution N(0,1)and χ{A}(x)is the usual indicator function of a set A. Finally, the joint distribution of y and x has the following form:

We will use a Bayesian approach to estimate the model parameters. Inference in the Bayesian framework is obtained through the posterior density, which is proportional to the prior multiplied by the likelihood. The posterior distribution for our model, as in most cases, cannot be derived analytically, and we will approximate it through MCMC methods specifically designed for working with the augmented likelihood with the hidden states. In fact, MCMC methods start by specifying the prior density Π(θ) for the parameters. Since the data Y are available, the general sampling methods work recursively by alternating between simulating the full conditional distributions X given y and θ given x and y.

4.1. Prior distributions

Under the assumption of independence between the parameters θ=(μ,β,σ2,r,Π), the prior density could be written as P(θ)=P(r)P(Π)P(μ)P(β)P(σ2). r is the parameters of a multinomial distribution; hence, the natural choice for the prior would be a Dirichlet distribution r∼D(α01,…,α0a). Later on, ∑j=1aΠij=1, and we assume that Πi∼D(δi1,…,δia)for each row i of the transition matrix. This choice of the Dirichlet prior can be even the default D(1,…,1)as recently discussed in Ref. [18]. In fact, a Dirichlet prior is justified because the posterior density of each row of the transition matrix is proportional to the density of a Dirichlet distribution, and hence, choosing a Dirichlet prior would give a posterior Dirichlet. This can be justified as follows for a given set of parameters λ=(λ1,…,λa)from a discrete or from a multinomial density:

π(x1,…,xa,λ1,…,λa)=n!x1!…xa!λ1x1…λaxafor the nonnegative integers x1,…,xa, with ∑i=1axi=n.This probability mass function can be expressed, using the gamma function Γ, as π(x1,…,xa,λ1,…,λa)=Γ(∑i=1axi+1)∏i=1aΓ(xi+1)∏i=1aλixi.This form shows its resemblance to the Dirichlet distribution, and by starting from supposing the prior λ∝D(α0,…,αa), the posterior is P(λ|x)∝P(λ)P(x|λ)∝∏iλixi∏iλiαi−1∝∏iλixi+αi−1∝D(x1+α1,…,xa+αa).

Furthermore, concerning the priors for parameters of the autoregressive model, we suppose for h=1,…,a: μ(h)∼N(αh,τh), β(h)∼N(bh,ch), and inverse gamma (IG) prior for σ2∼IG(ε,ζ). αh,τh,bh,ch,ϵ,ζare hyperparameters to be specified. For more details on Bayesian inference and prior selection in HMM, the reader is referred to Ref. [19]. In our case, prior distributions for the autoregressive parameters were proposed by Ref. [20] for a mixture autoregressive model, who points out that they are conventional prior choices for mixture models.

4.2. Sampling the posterior distribution for the hidden states

Chib [21] developed a method for the simulation of the hidden states from the full joint distribution for the univariate hidden Markov model case. We will describe his full Bayesian algorithm for the univariate hidden Markov model before a generalization to our MAR(1)HMM.

4.2.1. Chib’s algorithm for the univariate hidden Markov model for estimation of the states

Suppose we have an observed process Yn=(y1,…,yn)and the hidden states Xn=(x1,…,xn), θ are the parameters of the model. We adopt for simplicity Xt=(x1,…,xt)the history of the states up to time t and Xt+1=(xt+1,…,xn)the future from t + 1 to n. We use the same notation for Y_{t }and Y^{t+1}.

For each state xt∈{1,2,…,a}for t=1,2,…,n, the hidden model can be described by a conditional density given the hidden states π(yt|Yt−1,xt=k)=π(yt|Yt−1,θk), k=1,…,a, with x_{t }depending only on x_{t–1} and having transition matrix Π and initial distribution Π_{0}, and the parameters for π(.) are θ=(θ1,…,θa).

Chib [21] shows that it is preferable to simulate the full latent data Xn=(x1,…,xn)from the joint distribution of x1,…,xn|Yn,θ, in order to improve the convergence property of the MCMC algorithm because instead of n additional blocks if each state is simulated separately, only one additional block is required. First, we write the joint conditional density as

For sampling, it is sufficient to consider the sampling of x_{t }from P(xt|Yn,Xt+1,θ,Π). Moreover, P(xt|Yn,Xt+1,θ,Π)∝P(xt|Yt,θ,Π)P(xt+1|xt,Π). This expression has two ingredients: the first is P(xt+1|xt,Π), which is the transition matrix from the Markov chain. The second is P(xt|Yt,θ,Π)that would be obtained by recursively starting at t = 1.

The mass function P(xt−1|Yt−1,θ,Π)is transformed into P(xt|Yt,θ,Π), which is in turn transformed into P(xt+1|Yt+1,θ,Π)and so on. The update is as follows: for k=1,…,a, we could write

These calculations are initialized at t = 0, by setting P(x1|Y0,θ)to be the stationary distribution of the Markov chain. Precisely, the simulation proceeds for k=1,…,a,recursively by first simulating P(x1=k|Y0,θ),from the initial distribution Π_{0}(k) and P(x1=k|Y1,θ,Π)∝P(x1=k|Y0,θ,Π)π(y1|Y0,θk).Then, we get by forward calculation P(xt=k|Yt−1,θ)=∑l=1aΠlkP(xt−1=l|Yt−1,θ), for each t=2,…,n, where Π_{ lk }is the transition probability and P(xt=k|Yt,θ)∝P(xt=k|Yt−1,θ,P)Π(yt|Yt−1,θk).The last term in the forward computation P(xn=k|Yn,θ)would serve as a start for the backward pass, and we get recursively for each t=n−1,…,1.; P(xt=k|Yn,Xt+1,θ)∝P(xt|Yt−1,θ,Π)P(xt+1|xt=k,Π),which permits the obtention of Xn=(x1,…,xn).

4.2.2. Simulating the hidden states for the MAR(1)HMM

Returning to our model, and adopting notations and algorithm developed by Fitzpatrick and Marchev, f will denote the observation density for the MAR(1)HMM, and for u0<t<M; x−t=(xu0,…,xt),xt=(xt,…,xM),y(t)=(yi,t,i=1,2,…,n), y,t=⋃i:ui<t{yi,ui,…,yi,min{t,mi}}, and yt=⋃i:t<mi{yi,max{t+1,ui},…,yi,mi}. The posterior distribution of the hidden state could be written as: P(x−M|y,M,θ)=P(xM|y,M,θ)×⋯×P(xu0|y,M,xu0+1,θ). So we could sample the whole sequence of states by sampling from P(xt|y,M,xt+1,θ).Hence, the estimation of the hidden states is performed recursively by first initializing

We perform a similar calculation for every state at time t, and we conclude by calculating P(xM=k|y,M−1,θ)=∑l=1aΠlkP(xM−1=l|Y,M−1,θ),and P(xM=k|y,M,θ)∝P(xM=k|y,M−1,θ)f(y(M)|y,M−1,θk). Later on, we get P(xM=k|y,M,θ), which permits the simulation of P(xM|y,M,θ).Finally, by backward calculation, we simulate from the probabilities P(xt|y,M,xt+1,θ)∝P(xt+1|xt,Π)P(xt|y,t,θ)for each time t=M−1,…,u0. Those backward probabilities would permit the simulation of the latent states.

4.3. Sampling from P(θ|x, y)

4.3.1. Sampling Π

Under the prior assumption of Dirichlet prior for each row of the transition matrix P(Πi)∝D(δi1,…,δia), and the independence assumption between those rows, the posterior distribution for Π_{ i }can be developed using Eq. (2) as follows: Let n_{ij }denote the number of single transitions from state i to state j, so

4.3.2. Sampling posterior distribution for initial distribution

Let n0l=χxu0(l), for l=1,…,a. Using (2), under Dirichlet prior D(δ01,…,δ0a)for the parameter r, we obtain P(r|x,y)∝P(r)∏l=1arlχ{xu0}(l)∝∏l=1arlδ0l+n0l−1∝D(δ01+n01,…,δ0a+n0a).

4.3.3. Sampling posterior distribution for the autoregressive parameters μ,β,σ2

When a complete conditional distribution is known such as the normal distribution or beta distribution, we use the Gibbs sampler to draw the random variable. This is the case for our model. Let us define nui(l)=∑i=1nχ{xui=l},nl=∑i=1n∑t=ui+1miχ{xt=l},N=∑l=1anl,n0l=χ{xu0}(l).So for l=1,2,…,a;by supposing N(αl,τl)as prior distribution and using Eq. (2), the conditional posterior distribution of μ^{(l)} is:

Finally, the algorithm is ran for d = 1,…,D iterations by alternating between the following steps, where in each step we compute a conditional posterior for the given parameter:

The MCMC algorithm:

For h=1,2,....,a, give reference values for the hyperparameters α_{h }, τ_{h }, a_{h }, b_{h }, δ_{0h }, and δ_{ih }for i=1,2,....,a.

Initialization (Step d = 1 of the MCMC iterations): Initialize Π^{(1)}, r^{(1)}, μ^{(1)}, β^{(1)}, and σ^{2(1)}.

Simulation of the hidden states:

Initialization of forward simulation: P(xu0(d)|y,u0,θ)∝P(y,u0|xu0(d))P(xu0(d)|r(d)), with y,u0={yi,ui,,ui=u0,i=1,…,n}.

Forward simulation: For k=1,….,aand t=u0+1,….,M:P(xt(d)=k|y,t−1,θ)=∑l=1aΠlk(d)P(xt−1(d)=l|Y,t−1,θ)and

In this section, we apply our results to the breast cancer model discussed earlier. The main reason behind our work is that the progression of breast cancer cannot be seen directly unless we use observations related to the disease that could characterize its progression; those observations here are quantities which could be measured; they are called biomarkers, where the word biomarker is used to designate any objective indication of a biological process or disease condition including during treatment and should be measurable. Furthermore, biomarkers are increasingly used in the management of breast cancer patients. One example is reported in Ref. [22], stating that there is correlation between elevation of CEA and/or CA15-3 and disease progression, in breast cancer patients. Also we use the autoregressive dependence among the observations to add more dynamics to the model unlike conventional HMMs where the successive observations given the Markov process are independent. We used the classification of breast cancer in three states: local where the disease is confined within the breast, the regional phase when the lymph nodes are involved, and the distant stage where the cancer is found in other parts of the body. We restrict ourselves to these three stages unlike other stage classifications that divide the progression in more than three stages such as the TNM (tumor, node, and metastasis) system. By lack of finding data about breast cancer biomarkers, we will confine ourselves to simulate an MAR(1)HMM model for observation time M = 24, and a number of individuals n = 210, a = 3 for Markov states number, with the length observation time for each individual selected uniformly between 2 and M. The simulation process supposes we have for the autoregressive means μ=(μ(1),μ(2),μ(3))=(12,24,36), since markers such as CA15–3 increase as the disease advances toward metastatic breast cancer. In addition, CA15–3 increase rapidly between successive observations, and thus, we take in the simulation the parameters β=(β(1),β(2),β(3))=(0.2,0.4,0.8).

The algorithm of simulation works as follows:

For each individual i=1,…,n, choose m_{i }the length of observation for that individual i.

Generate each discrete disease state x_{t }using transition matrixΠ=(0.7,0.2,0.1;0.1,0.6,0.3;0.2,0.3,0.5)for t=u0+1,…,M.

Generate the observations y_{i,t }for all individuals using our model 1.

We choose a prior σ2∼IG(0.001,0.001), a D(1,…,1)prior for each row of Π, and Gaussian noninformative priors for the μs and the βs. Having the hidden states and the observations, we ran our algorithm for 8000 MCMC iterations. MCMC algorithm convergence was assessed by analyzing MCMC iterations mixing plots that are shown in Figure 1 , autocorrelation sample graphs checking as illustrated in Figure 2 , and inspecting histograms of posterior densities for the parameters of the models in Figure 3 . All parameters show good mixing of chains, autocorrelations that decay immediately after a few lags, and perfect posterior densities fitting. Also the Gelman [23] potential scale reduction factor (PSRF) was plot. The PSRF is measured for more than two MCMC chains (three chains in this works are considered), and it is measured for each parameter of the model; it should show how the chains have forgotten their initial values and that the output from all chains is indistinguishable. It is based on a comparison of within-chain and between-chains variances and is similar to a classical analysis of variance; when the PSRF is high (perhaps greater than 1.1 or 1.2), then we should run our chains out longer to improve convergence to the stationary distribution. Each PSRF declines to 1 as the number of iterations approaches infinity to confirm convergence. All the parameters have shown a PSRF less than 1.1 as the number of iteration increases and by the way a good sign of convergence ( Figure 4 ). Moreover, we should point out that in the family of Markov switching model there is the so-called label switching problem (e.g., Ref. [24]) which arises identifiability problem, and hence, we would not estimate perfectly the parameters. In addition, the posterior densities could show evidence of multimodality. some authors postprocess the output of the MCMC to deal with the issue (e.g., [25]), while other uses a random permutation of the parameters in each iteration of the MCMC algorithm (e.g., [26]) or one can call for an invariant loss function method (e.g., [27]). In our case, no identifiability issue is noticed since we used well-separated prior hyperparameters. Even when we start from different initial values for the parameters, our algorithm converges immediately after a few iterations.

Finally and before giving our results, we should report that the simulation of the Dirichlet posterior was carried out following ([28, p. 22], [29, p. 155]) who reported that the posterior Dirichlet parameters should be simulated using the beta distribution approach. Table 1 shows how the posterior values estimated from algorithm are very close to the true ones.

Parameter

True value

Posterior statistics

Mean

Standard deviation

Confidence interval (5%)

μ^{1}

12

11.929

0.047

(11.851–12.005)

μ^{2}

24

23.923

0.047

(23.847–24.000)

μ^{3}

36

35.843

0.070

(35.729–35.959)

β^{1}

0.2

0.2016

0.0012

(0.1997–0.2035)

β^{2}

0.4

0.4018

0.0009

(0.4004–0.4032)

β^{3}

0.8

0.8022

0.0010

(0.8005–0.8038)

π_{11}

0.7

0.688

0.068

(0.5715–0.797)

π_{12}

0.2

0.223

0.062

(0.129–0.332)

π_{13}

0.1

0.090

0.042

(0.032–0.17)

π_{21}

0.1

0.091

0.035

(0.041–0.154)

π_{22}

0.6

0.607

0.059

(0.507–0.701)

π_{23}

0.3

0.302

0.055

(0.214–0.397)

π_{31}

0.2

0.153

0.053

(0.075–0.250)

π_{32}

0.3

0.368

0.071

(0.257–0.488)

π_{33}

0.5

0.479

0.073

(0.358–0.599)

σ^{2}

2

2.023

0.032

(1.970–2.077)

Table 1.

Posterior inference for the parameters of the MAR(1)HMM model.

6. Conclusion

We have extended the method of Chib [21] for block update estimation of the states to a MAR(1)HMM model. Furthermore, we would like to point out that our model can easily be extended to include missing observations, as we should only add an extra step in each MCMC iteration to estimate the missing observations. Also, we can estimate the autoregressive model for different values of the autoregressive order, p ≥ 1, by evaluating the Bayesian information criterion to select the best order that fits the observations of the model. Our model would capture the complexity and the dynamics of the evolution of breast cancer by introducing the latent states; the probabilities of transition between the latent states allow to compare among the effects of treatments on slowing or accelerating the transition of the disease from one health stage to another the autoregressive parameter mean values corresponding to different stages of the disease would guide medical doctors and scientists to monitor patients in different phases of the disease. The model incorporates individual observations with different lengths.

Last but not least, we like to mention the utilities of switching diffusion processes in addressing and analyzing many complicated applications such as in finance and risk management. Our future work would be to apply these processes to explore disease progression, because they are characterized by the coexistence of continuous dynamics and discrete events as well as their interactions.

Acknowledgments

We would like to thank the editorial staff for the comments that helped in improving this work. Also, we would like to thank the supporters of this work: The Lalla Salma Foundation Prevention and Treatment of Cancer, Rabat, Morocco; and the Germano-Morrocan Program for Scientific Research PMARS 2015-060.

Hamid El Maroufy, El Houcine Hibbah, Abdelmajid Zyad and Taib Ziad (November 2nd 2017). Bayesian Estimation of Multivariate Autoregressive Hidden Markov Model with Application to Breast Cancer Biomarker Modeling, Bayesian Inference, Javier Prieto Tejedor, IntechOpen, DOI: 10.5772/intechopen.70053. Available from:

Bayesian Networks for Supporting Model Based Predictive Control of Smart Buildings

By Alessandro Carbonari, Massimo Vaccarini and Alberto Giretti

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.