A Review on the Exact Monte Carlo Simulation

Perfect Monte Carlo sampling refers to sampling random realizations exactly from the target distribution (without any statistical error). Although many different methods have been developed and applied in this area, the term is most often associated with coupling from the past (CFTP), which removes the statistical error in Monte Carlo samples generated by Markov chain Monte Carlo (MCMC) algorithms. This paper provides a brief review of recent developments and applications of CFTP and other perfect Monte Carlo sampling methods.


Introduction
In the past 30 years, substantial progress has been made in popularizing Bayesian methods for the statistical analysis of complex data sets. An important driving force has been the availability of different types of Bayesian computational methods, such as Markov chain Monte Carlo (MCMC), sequential Monte Carlo (SMC), approximate Bayesian computation (ABC) and so on. For many practical examples, these computational methods can provide rapid and reliable approximations to the posterior distributions for unknown parameters.
The basic idea that lies behind these methods is to obtain Monte Carlo samples from the distribution of interest, in particular the posterior distribution. In Bayesian analysis of complex statistical models, the calculation of posterior normalizing constants and the evaluation of posterior estimates are typically infeasible either analytically or by numerical quadrature. Monte Carlo simulation provides an alternative. One of the most popular Bayesian computational methods is MCMC, which is based on the idea of constructing a Markov chain with the desired distribution as its stationary distribution.
By running a Markov chain, MCMC methods generate statistically dependent and approximate realizations from the limiting (target) distribution. A potential weakness of these methods is that the simulated trajectory of a Markov chain will depend on its initial state. A common practical recommendation is to ignore the early stages, the so-called burn-in phase, before collecting realizations of the state of the chain. How to choose the length of the burn-in phase is an active research area. Many methods have been proposed for convergence diagnostics; [10] gave a comparative review. Rigorous application of diagnostic methods requires either substantial empirical analysis of the Markov chain or complex mathematical analysis. In practice, judgments about convergence are often made by visual inspection of the realized chain or the application of simple rules of thumb.
Concerns about the quality of the sampled realizations of the simulated Markov chains have motivated the search for Monte Carlo methods that can be guaranteed to provide samples from the target distribution. This is usually referred to as perfect sampling or exact sampling. In some cases, for example, the multivariate normal, perfect samples are readily available. For more complicated distributions, perfect sampling can be achieved, in principle, by the rejection method. This involves sampling from a density that bounds a suitable multiple of the target density, followed by acceptance or rejection of the sampled value. The difficulty here is in finding a bounding density that is amenable to rapid sampling while at the same time providing sample values that are accepted with high probability. In general this is a challenging task, although efficient rejection sampling methods have been developed for the special class of log-concave densities; see, for example, [20,22].
A surprising breakthrough in the search for perfect sampling methods was made by [37]. The method is known as coupling from the past (CFTP). This is a sophisticated MCMC-based algorithm that produces realizations exactly from the target distribution. CFTP transfers the difficulty of running the Markov chain for extensive periods (to ensure convergence) to the difficulty of establishing whether a potentially large number of coupled Markov chains have coalesced. An efficient CFTP algorithm depends on finding an appropriate Markov chain construction that will ensure fast coalescence. There have been a few further novel theoretical developments following the breakthrough of CFTP, including [20,21,38]. Since then, perfect sampling methods have attracted great attention in various Bayesian computational problems and applied probability areas.
Apart from coupling from the past, many other perfect sampling methods have been proposed for specific problems, for example, perfect sampling for random spanning trees [2,47] and path-space rejection samplers for diffusion processes [3-5]. Very recently, a type of divide-and-conquer method has been developed in [15,16]. These methods combine exact simulation techniques for diffusions with samples drawn from simple sub-densities to obtain perfect samples from the target distribution.
Perfect samples are useful in Bayesian applications either as a complete replacement for MCMC-generated values or as a source of initial values that guarantee that a conventional MCMC algorithm runs in equilibrium. Perfect samples can also be used as a means of quality control in judging a proposed MCMC implementation, when there are questions about the speed of convergence of the MCMC algorithm or about whether the chain is capable of exploring all parts of the sample space. Of course, when perfect samples can be obtained speedily, they will be preferred to MCMC values, since they eliminate such doubts. In addition, sophisticated perfect sampling methodology often motivates efficient approximate algorithms and computational techniques. For example, [43] uses regenerated Markov chains to obtain simple standard error estimates for importance sampling in an MCMC context; the condition required there also allows perfect sampling via the multigamma coupling approach [23]. This paper presents a brief review of perfect Monte Carlo sampling and explains the advantages and drawbacks of the different types of methods. Section 2 presents rejection sampling techniques, CFTP is covered in Section 3, and recent divide-and-conquer methods are reviewed in Section 4. The paper ends with a discussion in Section 5.

Rejection sampling techniques
Rejection sampling, also known as acceptance-rejection sampling, generates realizations from a target probability density function f(x) by using a hat function M g(x), where f(x) ≤ M g(x) and g(x) is a probability density function from which samples can be readily simulated. The basic rejection sampling algorithm repeatedly draws x* from g and U ~ Unif(0, 1), and accepts x* as an exact draw from f if U ≤ f(x*)/(M g(x*)). Many other perfect sampling methods are essentially equivalent to rejection sampling; for example, the ratio-of-uniforms (RoU) method [45] may have to be implemented via a rejection sampling approach.
The efficiency of rejection sampling depends on the acceptance probability, which is 1/M when f and g are normalized densities. To perform rejection sampling efficiently, it is important to find hat functions that provide high acceptance probabilities; in other words, we should choose M as small as possible [39]. However, for many complicated problems there is no easy way to find an M small enough to guarantee a high acceptance probability. A number of sophisticated rejection sampling methods have been suggested for dealing with complex Bayesian posterior densities, which we discuss below.
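As a concrete illustration (ours, not from the original paper), here is a minimal rejection sampler. The target, proposal and constant M are our choices for the example: a Beta(2,2) target f(x) = 6x(1 − x) on [0,1] with a uniform proposal, so that M = sup f/g = 1.5 and the acceptance probability is 1/M = 2/3.

```python
import random

random.seed(0)

def rejection_sample(f, g_sample, g_pdf, M):
    """Draw one exact sample from density f via rejection sampling.

    f        : target density
    g_sample : function returning a draw from the proposal density g
    g_pdf    : proposal density function
    M        : constant with f(x) <= M * g(x) for all x
    """
    while True:
        x = g_sample()
        if random.random() <= f(x) / (M * g_pdf(x)):
            return x  # accepted: an exact draw from f

# Beta(2,2) target with uniform proposal on [0,1].
f = lambda x: 6.0 * x * (1.0 - x)
samples = [rejection_sample(f, random.random, lambda x: 1.0, M=1.5)
           for _ in range(10000)]
```

Each returned value is an exact draw from f, at the cost of an expected M proposals per accepted sample.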

Log-concave densities
A density f(x) is log-concave if log f(λx + (1 − λ)y) ≥ λ log f(x) + (1 − λ) log f(y) for all x, y and λ ∈ [0, 1]. For this special class of densities, Gilks and Wild [22] developed the adaptive rejection sampling (ARS) method. The method constructs an envelope function for the logarithm of the target density f(x) by using tangents to log f(x) at an increasing number, n, of points. The envelope function u_n(x) is the piecewise linear upper hull formed from the tangents; note that the envelope can be easily constructed because of the concavity of log f(x). The method also constructs a squeeze function l_n(x), which is formed from the chords between the tangent points. The ARS algorithm outputs a stream of perfect samples from f(x) as follows:

1. Initialise u_n(x) and l_n(x) by using tangents at several points.
2. Sample x* from the density proportional to exp u_n(x) and draw U ~ Unif(0, 1).
3. If U ≤ exp(l_n(x*) − u_n(x*)), output x* (the squeeze test, which avoids evaluating f); otherwise evaluate f(x*), output x* if U ≤ exp(log f(x*) − u_n(x*)), and update (u_n, l_n) to (u_{n+1}, l_{n+1}) using x*.
4. Go to step 2.

The ARS algorithm is adaptive, and the sampling density is modified whenever f(x*) is evaluated; in this way, the method becomes more efficient as the sampling continues. Leydold [29] extends ARS to log-concave multivariate densities. Leydold's algorithm is based on decomposing the domain of the density into cones and then computing tangent hyperplanes for these cones. Generic computer code for sampling from a multivariate log-concave density is available on the author's website; it is only necessary to code a subroutine for the target density. Leydold's algorithm works well for simple low-dimensional densities. The drawback of the ARS algorithm is that it only works for log-concave densities, which form a very small class of posteriors in practice. It is also computationally very challenging to implement the ARS algorithm for high-dimensional densities [11].
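To illustrate the tangent-envelope construction, here is a simplified, non-adaptive sketch (our illustration, not the Gilks-Wild code): tangents at a fixed set of points form a piecewise-exponential hat for a log-concave density, which can be sampled piece by piece by inverse CDF. Full ARS additionally maintains the squeeze function and adds tangent points adaptively.

```python
import math, random

random.seed(4)

def make_envelope(logf, dlogf, points):
    """Tangent lines a[i] + b[i]*x to a concave logf at the given points;
    z holds the breakpoints where consecutive tangents intersect."""
    b = [dlogf(p) for p in points]
    a = [logf(p) - b[i] * p for i, p in enumerate(points)]
    z = [(a[i + 1] - a[i]) / (b[i] - b[i + 1]) for i in range(len(points) - 1)]
    return a, b, z

def piece_mass(a, b, lo, hi):
    """Integral of exp(a + b*x) over (lo, hi); unbounded tails need b != 0."""
    if b == 0.0:
        return math.exp(a) * (hi - lo)
    hi_t = math.exp(a + b * hi) if hi != math.inf else 0.0
    lo_t = math.exp(a + b * lo) if lo != -math.inf else 0.0
    return (hi_t - lo_t) / b

def envelope_rejection(logf, dlogf, points, n):
    """Rejection sampling from f using the piecewise-exponential hat."""
    a, b, z = make_envelope(logf, dlogf, points)
    bounds = [-math.inf] + z + [math.inf]
    mass = [piece_mass(a[i], b[i], bounds[i], bounds[i + 1])
            for i in range(len(points))]
    out = []
    while len(out) < n:
        r = random.random() * sum(mass)      # choose a piece by its mass
        i = 0
        while r > mass[i]:
            r -= mass[i]
            i += 1
        u = min(max(random.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        lo = bounds[i]
        if b[i] == 0.0:
            x = lo + u * (bounds[i + 1] - lo)
        else:                                # invert the piece's exponential CDF
            base = math.exp(a[i] + b[i] * lo) if lo != -math.inf else 0.0
            x = (math.log(base + u * b[i] * mass[i]) - a[i]) / b[i]
        # accept with ratio f(x)/hat(x); tangency to a concave logf keeps it <= 1
        if random.random() <= math.exp(logf(x) - (a[i] + b[i] * x)):
            out.append(x)
    return out

# Standard normal (unnormalized): logf(x) = -x^2/2 is concave.
samples = envelope_rejection(lambda x: -0.5 * x * x, lambda x: -x,
                             [-2.0, -1.0, 0.0, 1.0, 2.0], 5000)
```

With more tangent points the hat hugs the target more closely and the acceptance probability approaches 1, which is exactly what the adaptive updates in ARS achieve automatically.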

Fill's rejection sampling algorithm
We consider a discrete Markov chain with transition probability P(x, y) and stationary distribution π(x), x ∈ S. Let P̃(x, y) = π(y)P(y, x)/π(x) be the transition probability of the time-reversed chain. Suppose that there is a partial ordering on the states S, denoted by x ≼ y, and let 0̂ and 1̂ be the unique extremal points of the partial ordering.
To describe the algorithm given by [20], we assume that there are update functions ϕ and ϕ̃, for the chain and its time reversal respectively, both mapping S × [0, 1] → S, where U ~ Unif(0, 1) and x ≼ y ⇒ ϕ̃(x, u) ≼ ϕ̃(y, u) for almost every u ∈ [0, 1]. The resulting procedure (Algorithm 2.3, Fill's rejection sampler) proceeds roughly as follows: run the time-reversed chain for T steps from one extremal state; then, conditionally on the resulting trajectory, generate the forward update variables and check whether forward chains started from all states (in the monotone case, from the other extremal state) coalesce with the given trajectory; if so, the endpoint of the reversed run is accepted and is indeed a perfect sample from π, and otherwise the procedure is repeated, typically with T doubled. From Algorithm 2.3, we can see that rejection sampling can still be possible even if we do not have a closed form for the hat function. The first limitation of Algorithm 2.3 is that it works only if the time-reversed chain is monotone, although [21] has extended the algorithm theoretically to general chains. The second limitation is that the conditional generation step of Algorithm 2.3 is difficult to perform [20]. For these reasons, Fill's algorithm is not practical for realistic problems.

Coupling from the past
Coupling from the past was introduced in the landmark paper of [37] which showed how to provide perfect samples from the limiting distribution of a Markov chain.

Basic CFTP algorithms
Let {X_t} be an ergodic Markov chain with state space 𝒳 = {1, …, n}, where the probability of going from i to j is p_ij and the stationary distribution is π. Suppose we design an updating function ϕ(·, ·), where ϕ is a deterministic function and U is a random variable: to simulate the next state Y of the Markov chain, currently in state i, we draw a random variable U and let Y = ϕ(i, U). Let F_{t1}^{t2}(·) denote the composite map obtained by iterating ϕ from time t1 to time t2 (t1 < t2) with the random sequence U_{t1}, …, U_{t2−1}.

Proposition 3.1 [37] Suppose there exists a time t = −T, the backward coupling time, such that chains starting from every state in 𝒳 = {1, …, n} at time t = −T, driven by the same sequence {U_t, t = −T, …, −1}, arrive at the same state X*_0 at time 0. Then X*_0 is distributed according to π.

This proposition is easy to prove: if we run an ergodic Markov chain from time t = −∞ with the sequence {U_t, t = −T, …, −1} after time −T, the chain must arrive at X*_0, and X*_0 comes exactly from π since it is collected at time 0 after the chain has run from −∞. The resulting procedure (Algorithm 3.1) starts chains from all states at time −T, checks for coalescence by time 0, and, if the chains have not coalesced, restarts further back in the past (typically doubling T) while reusing the same random numbers. Propp and Wilson [37] also proved that this algorithm is certain to terminate. The idea of simulating from the past is important: if instead we ran forward from time 0 and collected F_0^T(·) as the result, where T is the smallest value that makes F_0^T(·) constant, we would obtain a biased sample, because T is a stopping time that is not independent of the Markov chain.
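A minimal implementation of the basic algorithm for a small discrete chain (our illustration; the transition matrix is an arbitrary example) may look as follows:

```python
import random

def phi(state, u, P):
    """Deterministic update: next state via the inverse CDF of row P[state]."""
    acc = 0.0
    for j, pij in enumerate(P[state]):
        acc += pij
        if u < acc:
            return j
    return len(P[state]) - 1

def cftp(P, rng):
    """Coupling from the past: start a chain in every state at time -T with
    shared innovations; if all chains coalesce by time 0, the common state is
    an exact draw from the stationary distribution, else double T, reusing U."""
    T, U = 1, []
    while True:
        while len(U) < T:
            U.append(rng.random())           # U[k] drives time -(k+1), reused
        states = list(range(len(P)))
        for t in range(-T, 0):
            u = U[-t - 1]                    # the same innovation for every chain
            states = [phi(s, u, P) for s in states]
        if len(set(states)) == 1:
            return states[0]
        T *= 2

# A small ergodic chain; its stationary distribution solves pi P = pi.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
rng = random.Random(1)
draws = [cftp(P, rng) for _ in range(20000)]
```

Solving πP = π for this chain gives π = (9/28, 3/7, 1/4), and the empirical frequencies of the CFTP draws reproduce these values with no burn-in or convergence diagnostics.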

Read-once CFTP
The basic CFTP algorithm needs to restart the Markov chains further back in the past if they have not coalesced by time 0, and we must use the same random sequence {U_t} when we restart. In Monte Carlo simulations we usually use pseudorandom number generators, which are deterministic algorithms, so given the same random seed we obtain the same random sequence. This means the same sequence {U_t} can be regenerated in each coupling procedure.
If we could instead run the Markov chain forward from time 0 and collect a perfect sample in the future, we would not have to regenerate {U_t}. Wilson [48] developed a read-once CFTP method to implement this forward coupling idea; a simple example is provided by [6]. In fact, the multigamma coupler in [23] can be implemented via the more efficient read-once CFTP algorithm.

Improvement of CFTP algorithms
Propp and Wilson [37] showed that the computational cost of the algorithm can be reduced if there is a partial order on the state space 𝒳 that is preserved by the update function ϕ, i.e. if x ≤ y then ϕ(x, U) ≤ ϕ(y, U). Their procedure is outlined in Algorithm 3.2, where, as before, 0̂ and 1̂ are the unique extremal states: run the two chains started from 0̂ and 1̂ at time −T with common innovations and, if they have coalesced by time 0, return the common value; otherwise double T and repeat, reusing the randomness. Note that their algorithm needs a monotone update function ϕ for the Markov chain, while Algorithm 2.3 requires a monotone update function ϕ̃ for the time-reversed chain.
Algorithm 3.2 is much simpler than Algorithm 3.1, since only two chains have to be run at the same time, but the requirement of monotonicity is very restrictive. Markov chains with transitions given by independent Metropolis-Hastings and by perfect slice sampling have been shown to have this property, by [9,32] respectively. However, [32,34] have also noted that such independent M-H CFTP is equivalent to a simple rejection sampler.
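As an illustration of monotone CFTP (ours, not from [37]), consider a random walk on {0, …, 5} that steps up with probability p and down otherwise, reflecting at the boundaries; the update preserves the natural order, so only the two extremal chains need to be tracked:

```python
import random

def phi(x, u, p, m):
    """Monotone update for a reflecting random walk on {0,...,m}: if x <= y
    then phi(x,u) <= phi(y,u), so the partial order is preserved."""
    return min(x + 1, m) if u < p else max(x - 1, 0)

def monotone_cftp(p, m, rng):
    """Monotone CFTP: only the chains from the extremal states 0 and m are
    tracked; when they coincide at time 0, every chain has coalesced and the
    common value is an exact draw from the stationary distribution."""
    T, U = 1, []
    while True:
        while len(U) < T:
            U.append(rng.random())
        lo, hi = 0, m
        for t in range(-T, 0):
            u = U[-t - 1]            # shared innovations, reused on each restart
            lo, hi = phi(lo, u, p, m), phi(hi, u, p, m)
        if lo == hi:
            return lo
        T *= 2

rng = random.Random(7)
draws = [monotone_cftp(0.3, 5, rng) for _ in range(20000)]
```

Detailed balance gives the stationary law π(x) ∝ (p/(1 − p))^x on {0, …, m}; with p = 0.3 and m = 5 the stationary mean is about 0.713, which the CFTP draws reproduce.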
In general it is hard to code perfect slice samplers correctly. For example, Hörmann and Leydold [26] have pointed out that the perfect slice samplers in [7,36] are incorrect. The challenge of monotone CFTP is usually to construct the detailed updating function with a guarantee of preserving the partial order.
Finding a partial order preserved by the Markov chains is a non-trivial task in many cases. An alternative improvement is to use CFTP with bounding chains, such as those in [27,33]. If the bounding chains, which bound all the Markov chains, coalesce, then all the Markov chains coalesce; thus if only a few bounding chains are required, the efficiency of the CFTP algorithm can be improved significantly. Sometimes it may be impossible to define an explicit bounding chain (for example, when the state space is unbounded above, an upper bounding chain cannot start from infinity), but it is still possible to use a dominated process to bound all the Markov chains [28].

Applications and challenges
Although CFTP is extremely challenging to implement for many practical problems, it has found applications in certain discrete state space problems, for example the Ising model [37]. Also, [18] applied CFTP to the ancestral selection graph to simulate samples from population genetic models. Refs. [12,14] applied CFTP to a class of fork-join queueing systems. Connor and Kendall [8] applied CFTP to the perfect simulation of M/G/c queues. CFTP has also found application in signal processing [17].
CFTP for continuous state space Markov chains is very challenging, since a random map from an interval to a finite number of points is required. In recent years, many methods have been developed for unbounded continuous state space Markov chains, such as the perfect slice sampler in [32] and the multigamma coupler and bounded M-H coupler in [23,34]. Wilson [49] developed layered multi-shift coupling, which shifts states in an interval to a finite number of points. However, these methods remain extremely difficult to apply to realistic practical problems.

Recent advances in perfect sampling
Recently, a new type of perfect Monte Carlo sampling method, based on the decomposition of the target density f as f(·) = g_1(·)g_2(·), was proposed in [15], where g_1 and g_2 are (proportional to) proper density functions. Here g_1 and g_2 are continuous densities which are easy to simulate from. Suppose that q-dimensional vector values x_1 and x_2 are independently drawn from g_1 and g_2, respectively. If the two independent samples are equal, i.e. x_1 = x_2 = x, then the common value x can be regarded as a draw from the density proportional to the product g_1(x)g_2(x) = f(x).
Note that such a naive approach may be practical for discrete random variables on a low-dimensional state space, but for continuous random variables it fails, since P(x_1 = x_2) = 0. [15] proposed a novel approach to deal with this, which is explained in the following subsection.
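The naive matching idea can be checked on a small discrete example (our illustration): drawing independently from two pmfs until the draws agree yields the pmf proportional to their product, whereas for continuous densities the matching event has probability zero.

```python
import random

def sample_pmf(pmf, rng):
    """Inverse-CDF draw from a discrete pmf given as a list of probabilities."""
    u, acc = rng.random(), 0.0
    for k, p in enumerate(pmf):
        acc += p
        if u < acc:
            return k
    return len(pmf) - 1

def match_sample(g1, g2, rng):
    """Draw independently from g1 and g2 until the draws agree; the common
    value is then an exact draw from the pmf proportional to g1[k] * g2[k]."""
    while True:
        x1, x2 = sample_pmf(g1, rng), sample_pmf(g2, rng)
        if x1 == x2:
            return x1

g1 = [0.5, 0.3, 0.2]
g2 = [0.2, 0.3, 0.5]
rng = random.Random(3)
draws = [match_sample(g1, g2, rng) for _ in range(20000)]
```

Here the product pmf is proportional to (0.10, 0.09, 0.10); the expected number of attempts per draw is 1/0.29, and this cost explodes as the state space grows, which is the practical limitation noted above.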

Perfect distributed Monte Carlo without using hat functions
First we introduce notation related to the logarithm of the sub-densities: let A(x) = log g_1(x) and α(x) = ∇A(x), where ∇ denotes the vector of partial derivatives with respect to the components of x. We then consider a q-dimensional diffusion process X_t, t ∈ [0, T] (T < ∞), defined on the space Ω of q-dimensional continuous paths, given by

dX_t = α(X_t) dt + dB_t, t ∈ [0, T], (6)

where B_t is a standard q-dimensional Brownian motion and ω = {ω_t, t ∈ [0, T]} denotes a typical element of Ω. Let 𝕎 be the probability measure of a Brownian motion with the appropriate initial distribution. Clearly X_t has the invariant distribution f_1(x) ∝ g_1²(x) (using standard Langevin diffusion results [24]). Let ℚ be the probability law induced by {X_t, t ∈ [0, T]} with X_0 = ω_0 ~ f_1(·); that is, under ℚ we have X_t ~ f_1 for every t ∈ [0, T]. The idea in [15] is to use a biased diffusion process X̃ = {X̃_t; 0 ≤ t ≤ T} to simulate from the target density f. It is defined as follows.

Definition 4.1 (Biased Langevin diffusion) The joint density of the pair (X̃_0, X̃_T) (the starting and ending points of the biased diffusion process), evaluated at the point (x, y), is proportional to f_1(x) p_T(x, y) f_2(y), where p_T(x, y) is the transition density of the diffusion X defined in Eq. (6) from X_0 = x to X_T = y, and f_2(y) = g_2(y)/g_1(y). Given (X̃_0, X̃_T), the process {X̃_t, 0 < t < T} is given by the diffusion bridge driven by Eq. (6).
The marginal distribution of X̃_T is f(y). Therefore, to draw a sample from the target distribution f(x), we need to simulate a process {X̃_t, t ∈ [0, T]} from ℚ̃ (the law induced by X̃); then X̃_T ~ f(x). Simulation from ℚ̃ can be done via rejection sampling, using a biased Brownian motion {B̃_t; 0 ≤ t ≤ T} as the proposal diffusion.

Definition 4.2 (Biased Brownian motion) The starting and ending points (B̃_0, B̃_T) follow a distribution with density h(x, y) ∝ g_1(x) exp(−‖y − x‖²/(2T)) g_2(y), and {B̃_t; 0 < t < T} is a Brownian bridge given (B̃_0, B̃_T).

Under certain mild conditions, Dai [15] proved the following lemma.

Lemma 4.1 Let ℤ be the probability law induced by {B̃_t; 0 ≤ t ≤ T}. Letting φ(x) = (‖α(x)‖² + div α(x))/2, where div is the divergence operator, we have

dℚ̃/dℤ(ω) ∝ exp(−∫_0^T φ(ω_t) dt). (8)

Condition 4.1 There exists l > −∞ such that φ(x) ≥ l for all x.

Under Condition 4.1 the ratio (8) can be rewritten, up to a constant, as

exp(−∫_0^T (φ(ω_t) − l) dt), (10)

which has a value no more than 1. Therefore we can use rejection sampling to simulate from ℚ̃ with proposal measure ℤ; the acceptance probability (10) can be dealt with using methods similar to those in [3,5], which evaluate the proposed path at only finitely many points. The algorithm is as follows (see [13,15] for more details):

1. Simulate (ω_0, ω_T) from the density h.
2. Simulate the biased Brownian bridge {B̃_t, t ∈ (0, T)}.
3. Accept ω_T as a sample from f with probability (10); if rejected, go back to step 1.

Such a method is a rejection sampling algorithm, but it does not require a hat function that bounds the target density, which is usually the main challenge of traditional rejection sampling for complicated target densities. The algorithm uses g_2 in the proposal, and g_2 does not have to bound the target f. However, the method does require a bound on the derivatives of the logarithm of the sub-densities (see Condition 4.1). This is usually easier to obtain in practice, since the logarithm of the posterior is typically easy to work with. Also, [15] noted that we should choose the sub-densities g_1 and g_2 to be as similar as possible, in order to achieve a high acceptance probability.
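To make the mechanism concrete, here is a one-dimensional sketch (our construction, not code from [15]) in the spirit of the path-space rejection samplers [3,5]. We take g_1 = g_2 ∝ sech(x), so the target is f(x) = sech²(x)/2, the Langevin drift is α(x) = −tanh(x), and φ(x) − l = tanh²(x) ∈ [0, 1]; the path-integral acceptance probability can then be realised by Poisson thinning, touching the Brownian bridge at only finitely many time points:

```python
import math, random

def sech(x):
    return 1.0 / math.cosh(x)

def sample_h(T, rng):
    """Endpoints (x, y) with density proportional to
    sech(x) * N(y; x, T) * sech(y): inverse-CDF draw for x,
    then rejection on the factor sech(y) <= 1."""
    while True:
        u = 1.0 - rng.random()                       # in (0, 1]
        x = math.log(math.tan(0.5 * math.pi * u))    # x ~ density sech(x)/pi
        y = rng.gauss(x, math.sqrt(T))
        if rng.random() <= sech(y):
            return x, y

def path_space_sample(T, rng):
    """One draw from f(x) = sech(x)^2/2: accept the proposed bridge with
    probability exp(-int_0^T tanh(w_s)^2 ds), realised by Poisson thinning."""
    while True:
        x, y = sample_h(T, rng)
        times, t = [], rng.expovariate(1.0)          # unit-rate Poisson on [0,T]
        while t < T:
            times.append(t)
            t += rng.expovariate(1.0)
        ok, s, w = True, 0.0, x
        for t in times:                              # Brownian bridge to (T, y)
            mean = w + (t - s) / (T - s) * (y - w)
            var = (t - s) * (T - t) / (T - s)
            w = rng.gauss(mean, math.sqrt(var))
            if rng.random() <= math.tanh(w) ** 2:    # a mark fell below phi - l
                ok = False
                break
            s = t
        if ok:
            return y

rng = random.Random(11)
samples = [path_space_sample(1.0, rng) for _ in range(3000)]
```

Under this construction the accepted endpoint follows f exactly (here a symmetric density with variance π²/12 ≈ 0.82), and no hat function for f itself is ever needed, only the bound on tanh².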
Dai [15] focused on the simple decomposition f = g_1 g_2, although it was mentioned that for the more general decomposition f = g_1 g_2 ⋯ g_C, a recursive method can be used. Unfortunately, a naive recursive method is very inefficient; a more sophisticated method is introduced in the following section.

Monte Carlo fusion for distributed analysis
A more efficient and sophisticated method, named Monte Carlo fusion, was proposed recently in [16]. Suppose that f(x) ∝ ∏_{c=1}^C g_c(x), where each g_c(x) (c ∈ {1, …, C}) is a density (up to a multiplicative constant). In big data problems, C denotes the number of parallel computing cores available, and each g_c(x) is the sub-posterior density based on one subset of the big data. In group decision problems, C is the number of different decisions to be combined, and g_c(x) represents the decision of each group member.
Dai et al. [16] considered simulating from a suitable joint density on the extended space (x^(1), …, x^(C), y), which admits the marginal density f for y. Here p_c(y | x^(c)) is the transition density from x^(c) to y for the Langevin diffusion, as defined in Eq. (6), associated with each sub-density g_c.
Dai et al. [16] considered a rejection sampling approach whose proposal density h is such that x^(1), …, x^(C) are first drawn independently from g_1, …, g_C, respectively, and then y is simply a Gaussian random variable centred on x̄ = C⁻¹ Σ_{c=1}^C x^(c), with variance proportional to T/C, where T is an arbitrary positive constant. Simulation from the proposal h can therefore be achieved directly. This is a distributed-analysis, or divide-and-conquer, approach. Detailed acceptance probabilities and rejection sampling algorithms can be found in [16].
The above fusion approach arises in modern statistical methodologies for 'big data'. A full data set is artificially split into a large number of smaller data sets, inference is conducted on each smaller data set, and the results are combined (see, for instance, [1,30,31,35,40-42,46]). The advantage of such an approach is that inference on each small data set can be conducted in parallel, so the heavy computational cost of algorithms such as MCMC is no longer a concern. Traditional methods suffer from the weakness that the fusion of the separately conducted inferences is inexact. The Monte Carlo fusion method of [16], by contrast, is an exact simulation algorithm and introduces no approximation error.
The fusion approach also arises in a number of other settings where distributed analysis is natural. For example, in signal processing, distributed multi-sensor network fusion systems must combine results from different sensors, and a fusion approach arises naturally [44].
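The fusion identity can be checked numerically in a Gaussian toy example (our illustration, with importance weighting standing in for the rejection step of [16]): for Gaussian sub-posteriors the associated Langevin diffusions are Ornstein-Uhlenbeck processes with closed-form transition densities, and reweighting the proposal draws recovers the product target f ∝ g_1 g_2.

```python
import math, random

def ou_logpdf(y, x, mu, theta, T):
    """Log transition density of dX = -theta*(X - mu) dt + dB over time T;
    this OU process is a Langevin diffusion with invariant law N(mu, 1/(2 theta))."""
    m = mu + (x - mu) * math.exp(-theta * T)
    v = (1.0 - math.exp(-2.0 * theta * T)) / (2.0 * theta)
    return -0.5 * math.log(2 * math.pi * v) - (y - m) ** 2 / (2 * v)

def norm_logpdf(y, m, v):
    return -0.5 * math.log(2 * math.pi * v) - (y - m) ** 2 / (2 * v)

# Sub-posteriors g1 = N(0,1), g2 = N(1,1); the fusion target is
# f ∝ g1 * g2 = N(0.5, 0.5). theta = 1/(2 sigma^2) makes g_c invariant.
mu, theta, T, C = [0.0, 1.0], [0.5, 0.5], 1.0, 2
rng = random.Random(5)
ys, ws = [], []
for _ in range(40000):
    x = [rng.gauss(mu[c], 1.0) for c in range(C)]  # x^(c) ~ g_c, in parallel
    xbar = sum(x) / C
    y = rng.gauss(xbar, math.sqrt(T / C))          # proposal: y | x ~ N(xbar, T/C)
    logw = sum(ou_logpdf(y, x[c], mu[c], theta[c], T) for c in range(C))
    logw -= norm_logpdf(y, xbar, T / C)            # remove the proposal's y-density
    ys.append(y)
    ws.append(math.exp(logw))

wsum = sum(ws)
post_mean = sum(w * y for w, y in zip(ws, ys)) / wsum
post_var = sum(w * (y - post_mean) ** 2 for w, y in zip(ws, ys)) / wsum
```

The weighted draws of y have mean 0.5 and variance 0.5, matching N(0.5, 0.5) exactly in expectation; the exact rejection version in [16] replaces the weights by an accept/reject step with an additional path-space correction.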

Conclusion
Although perfect simulation usually refers to correcting the statistical errors in samples drawn via MCMC, it actually covers a much wider area beyond CFTP. In fact, for certain applications it is often possible to construct other types of perfect sampling methods which are much more efficient than CFTP. For example, for the exact simulation of the posterior of simple mixture models, the geometric-arithmetic mean (GAM) method in [11] is much more efficient than the CFTP approach in [25]; details of the GAM method are provided in the Appendix. Similarly, the random walk construction for the exact simulation of random spanning trees [2] is much more efficient than the CFTP version.
Bayesian computational algorithms keep evolving, particularly in the current big data era. Although almost all newly developed algorithms are approximate simulation algorithms, perfect sampling remains one of the key driving forces behind new Bayesian computational algorithms, and such methods often quickly motivate new classes of 'mainstream' algorithms. More attention should be given to methods beyond CFTP, for example, the fusion type of algorithms.
The Monte Carlo fusion method has the potential to be used in many Bayesian big data applications. For example, for large car accident data sets, the response variable is usually a categorical variable representing the seriousness of the accident, and a generalized linear regression model is often used. Under a Bayesian framework, we may estimate the posterior distribution of the regression parameters via such a fusion approach; the posterior mean, the posterior median, or other characteristics of the posterior distribution can then be estimated using the Monte Carlo samples. Moreover, such an algorithm is a perfect sampling algorithm: no convergence justification is needed, since it always provides realizations exactly from the target distribution.

Appendix. Perfect sampling for mixture model weights

Consider a finite mixture model with density Σ_{k=1}^K p_k f_k(y). We assume that the component weights p = (p_1, …, p_K) are unknown parameters, while the number of components, K, and the component densities, f_k, are all known. We focus on perfect sampling from the posterior distribution of p.
Suppose that we have N observations, y_1, …, y_N. Let L_nk = f_k(y_n) and assume that the prior distribution of p is Dirichlet. Then the posterior distribution is given by

π(p | y) ∝ π_0(p) ∏_{n=1}^N Σ_{k=1}^K p_k L_nk, p ∈ X, (16)

where X = {p : p_k ≥ 0, Σ_{k=1}^K p_k = 1} is the simplex. There are several ways to carry out perfect sampling from Eq. (16). The first method is based on CFTP [25]. An alternative perfect sampling method for simple mixture models was introduced by [19]. A third approach is to use adaptive rejection sampling [22,29], since the posterior is log-concave. We may also use the ratio-of-uniforms method. However, none of these methods is more efficient than the geometric-arithmetic mean method of [11].

A.1 Geometric-arithmetic mean method
Suppose that p*, the MLE of p, is unique, and for simplicity assume the prior π_0(p) is uniform. Define a_nk = L_nk / Σ_{k'=1}^K p*_{k'} L_nk'. The (unnormalized) posterior density of p is then given by

h(p | y) = ∏_{n=1}^N Σ_{k=1}^K p_k a_nk, p ∈ X, (17)

where X is defined in (16). Let I_n be a random element of arg max_k L_nk. Define A_j = {n : I_n = j} and let n = (n_1, …, n_K), where n_j is the number of elements in A_j.
Define M = {m_jk} with m_jk = Σ_{n ∈ A_j} a_nk / n_j. If n_j = 0, then set m_jj = 1 and m_jk = 0 for k ≠ j. We now make two assumptions, to which we will return later on:

A: M is invertible.
B: The elements of v = (M^T)⁻¹ 1 are all positive, where 1 is the vector of ones.
Under these assumptions, we will show that the following rejection sampler generates simulated values from the posterior distribution of p. First we define V to be the diagonal matrix with diagonal elements v_1, …, v_K.

Algorithm 6.1 (GAM sampler)
1. Sample q from the Dirichlet distribution with parameter n + 1.
2. Compute p = M⁻¹V⁻¹q and draw U ~ Unif(0, 1).
3. If U ≤ h(p | y) I_X(p) / ∏_{j=1}^K (q_j/v_j)^{n_j}, accept p and stop; otherwise reject p and go to step 1.

Proposition 6.1 Under assumptions A and B, Algorithm 6.1 samples p from the probability density (17).
Proof: Since a geometric average is no larger than the corresponding arithmetic average, for p ∈ X we have

∏_{n ∈ A_j} (Σ_{k=1}^K p_k a_nk)^{1/n_j} ≤ (1/n_j) Σ_{n ∈ A_j} Σ_{k=1}^K p_k a_nk = Σ_{k=1}^K p_k m_jk,

where, in the case n_j = 0, the product term is taken as 1. So, for p ∈ X, with m_jk as previously defined,

h(p | y) = ∏_{j=1}^K ∏_{n ∈ A_j} Σ_{k=1}^K p_k a_nk ≤ ∏_{j=1}^K (Σ_{k=1}^K p_k m_jk)^{n_j} = ∏_{j=1}^K (q_j/v_j)^{n_j}, (22)

where q_j = v_j Σ_{k=1}^K p_k m_jk for j = 1, …, K, or equivalently q = VMp. Since v_j > 0 and Σ_{k=1}^K p_k m_jk > 0, it follows that q_j > 0 for j = 1, …, K. Furthermore, Σ_j q_j = Σ_k p_k (M^T v)_k = Σ_k p_k = 1, since M^T v = 1 by the definition of v. Note that the right-hand side of Eq. (22), as a function of q, is proportional to a Dirichlet density with parameters (n_1 + 1, …, n_K + 1). Rejection sampling then proceeds as usual:

• A sample q is drawn from Dirichlet(n + 1).
• The value p = M⁻¹V⁻¹q is calculated.
• It is accepted with probability h(p | y) I_X(p) / ∏_{j=1}^K (q_j/v_j)^{n_j}.
We now return to assumptions A and B. Suppose that M is invertible but the elements of v = (M^T)⁻¹ 1 are not all positive. In this case, let M̃ = ΔM, where Δ is a diagonal matrix with diagonal elements (1/α_1, …, 1/α_K) chosen so that ṽ = (M̃^T)⁻¹ 1 > 0. Algorithm 6.1 and its proof can then be modified by replacing M with M̃. Suppose now that assumption A does not hold, i.e. M is not invertible. This can be remedied by adding positive quantities to the diagonal elements of M; this also provides an alternative way of ensuring that the elements of v are positive.
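A self-contained sketch of the GAM sampler for K = 2 (our toy setup with well-separated normal components and a uniform prior; [11] gives the general method) is:

```python
import math, random

random.seed(2)

# Synthetic data from the mixture 0.3*N(0,1) + 0.7*N(3,1).
def npdf(y, m):
    return math.exp(-0.5 * (y - m) ** 2) / math.sqrt(2 * math.pi)

data = [random.gauss(0.0, 1.0) if random.random() < 0.3 else random.gauss(3.0, 1.0)
        for _ in range(200)]
L = [[npdf(y, 0.0), npdf(y, 3.0)] for y in data]       # L[n][k] = f_k(y_n)

# MLE p* of the weight by a simple grid search (uniform prior assumed).
def loglik(p1):
    return sum(math.log(p1 * l[0] + (1 - p1) * l[1]) for l in L)
p1_star = max((i / 1000 for i in range(1, 1000)), key=loglik)
pstar = [p1_star, 1 - p1_star]

a = [[l[k] / (pstar[0] * l[0] + pstar[1] * l[1]) for k in range(2)] for l in L]
I = [0 if l[0] >= l[1] else 1 for l in L]               # I_n = argmax_k L_nk
n_j = [I.count(0), I.count(1)]
# Group means m_jk = sum_{n in A_j} a_nk / n_j.
M = [[sum(a[n][k] for n in range(200) if I[n] == j) / n_j[j] for k in range(2)]
     for j in range(2)]

det = M[0][0] * M[1][1] - M[0][1] * M[1][0]             # assumption A: invertible
# v = (M^T)^{-1} 1 (2x2 solve by hand); assumption B: v > 0.
v = [(M[1][1] - M[1][0]) / det, (M[0][0] - M[0][1]) / det]

def gam_draw():
    """One exact draw from the posterior of p via the GAM rejection sampler."""
    while True:
        g = [random.gammavariate(n_j[j] + 1, 1.0) for j in range(2)]
        q = [gj / sum(g) for gj in g]                   # q ~ Dirichlet(n + 1)
        t = [q[0] / v[0], q[1] / v[1]]                  # t = V^{-1} q
        p = [(M[1][1] * t[0] - M[0][1] * t[1]) / det,   # p = M^{-1} t
             (-M[1][0] * t[0] + M[0][0] * t[1]) / det]
        if p[0] < 0 or p[1] < 0:                        # I_X(p): outside simplex
            continue
        log_h = sum(math.log(p[0] * a[n][0] + p[1] * a[n][1]) for n in range(200))
        log_hat = sum(n_j[j] * math.log(q[j] / v[j]) for j in range(2))
        if math.log(random.random()) <= log_h - log_hat:  # AM-GM makes this <= 0
            return p

draws = [gam_draw() for _ in range(2000)]
```

Working on the log scale avoids underflow in the 200-term products, and the AM-GM inequality guarantees that the acceptance ratio never exceeds 1 for p in the simplex.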

A.2 Dirichlet priors and pseudo data
Suppose that the prior π_0(p) is Dirichlet. The prior can be synthesized by introducing pseudo data ã_mk, m = 1, …, A; k = 1, …, K. With the Dirichlet prior, the posterior distribution given by Eq. (16) can be written as

h(p | y) ∝ ∏_{l=1}^{N+A} Σ_{k=1}^K p_k a_lk, p ∈ X, (26)

where {a_lk, l = 1, …, N + A} contains the real data {a_nk, n = 1, …, N} and the pseudo data {ã_mk, m = 1, …, A}. The posterior distribution (26) has the same form as Eq. (17); therefore GAM can be used to sample realizations from the posterior distribution (26).

A.3 Simulation results and discussion
We compare running times for mixture models with different sample sizes (N) and numbers of components (K) in Table 1. The components have specified normal distributions, and the prior on p is uniform. We sample 10,000 realizations from the posterior of each model.
From Table 1, we can see that the GAM algorithm, while using very little memory, is highly efficient in running time. The last row of the table gives the estimated acceptance probability of the GAM algorithm. The algorithm is very efficient when the component densities are known; we can see this not only from the simulations but also from theoretical considerations, as follows.

A.3.1 Explanation of efficiency
When v = (M^T)⁻¹ 1 > 0, we are able to use M directly, without modification, to construct the hat function, thereby speeding up the calculations. In the simulations of the previous section this was always found to be the case; we now explain why this should be so. If the maximum likelihood estimator of p is consistent, then as N → ∞,

(1/n_j) Σ_{n ∈ A_j} [ L_nk / Σ_{k'=1}^K p*_{k'} L_nk' − L_nk / Σ_{k'=1}^K p_{k'} L_nk' ] →_p 0.

Assuming sufficient regularity, we also have

m_jk = Σ_{n ∈ A_j} a_nk / n_j → ∫_{L_j} f_k(y) dy / γ_j,

where Y has density f(y) = Σ_{k=1}^K p_k f_k(y), L_j = {y : f_j(y) ≥ f_k(y), k = 1, …, K} and γ_j = ∫_{L_j} f(y) dy. Therefore, letting v = (γ_1, …, γ_K), we have (M^T v)_k → Σ_{j=1}^K ∫_{L_j} f_k(y) dy = 1 for each k, so that in the limit v = (M^T)⁻¹ 1 = (γ_1, …, γ_K), whose elements are all positive.

(Table 1 reports running times for Fearnhead's algorithm, Leydold's algorithm, the ratio-of-uniforms method and the GAM method, together with the GAM acceptance probability; the * indicates that Fearnhead's method and Leydold's method will not run on a standard desktop when K = 4 and K = 5.)