Simulated Annealing Book 210.5772/50449Bayesian Recovery of Sinusoids with Simulated AnnealingÜstündagDursun1CevriMehmetMarmara University, Faculty of Science and Letters, Department of MathematicsTurkeyIstanbul University, Faculty of Science, Department of MathematicsTurkey1. IntroductionThe ultimate goal of collecting data is to gain meaningful information about a physical system. However, in many situations, the quantities that we would like to determine are different from the ones which we are able to have measured. If the data we measured depends on the quantities we want, it contains at least some information about them. Therefore, our general interest is to subtract this information from data. Let the vector

$$ contain the parameters to be estimated from the (measurements) vector

$$ which is the output of the physical system that one wants to be modeled. The physical system is described by a vector function

$$ in the form:

$$where t represents time. In many experiments, the recorded data

$$ are sampled from an unknown function

$$ together with errors

$$ at discrete times

$$:

$$The measurement errors

$$ are generally assumed to be drawn independently from a zero mean Gaussian probability distribution with a standard deviation of

$$. On the other hand, different signal models correspond to different choices of signal model function

$$. In this chapter, we restrict our attention to the staticStatic refers to that the amplitudes of the sinusoids do not change with time. sinusoidal model given by

$$where

$$’s represent the amplitudes of the signal model and

$$. The goal of data analysis is usually to use the observed data

$$ to infer the values of parameters

$$. Besides estimating the values of the parameters, there are two additional important problems. The one is to obtain an indication of the uncertainties associated with each parameter, i.e. some measures of how far they are away from the true parameters. The other we will not consider here is to assess whether or not the model is appropriate for explaining the data. Sinusoidal parameter estimation in additive white noise within a Bayesian framework has been an important problem in signal processing and still now is an active research area because of its wide variety of applications in multiple disciplines such as sonar, radar, digital communications and biomedical engineering. The purpose of this research is therefore to develop accurate and computationally efficient estimators for sinusoidal parameter, namely, amplitudes and frequencies. In above problem, one may or may not know the number of sinusoids. When it is unknown, it is called model selection (Andrieu and Doucet, 1999; Üstündag, 2011) and is not subject to this chapter. Under an assumption of a known number of sinusoids, several algorithms have already been used in the parameter estimation literature, such as least-square fitting (Press et al., 1995), discrete Fourier transform (Cooley & Tukey, 1964), and periodogram (Schuster, 1905). With least square fitting, the model is completely defined and the question remaining is to find the values of the parameters by minimizing the sum of squared residuals. The discrete Fourier transform has been a very powerful tool in Bayesian spectral analysis since Cooley and Tukey introduced the fast Fourier transform (FFT) technique in 1965, followed by the rapid development of computers. In 1987, Jaynes derived periodogram directly from the principles of Bayesian inference. After his work, researchers in different branches of science have given much attention to the relationship between Bayesian inference and parameter estimation and they have done excellent works in this area for last fifteen years (Bretthorst, 1990; Üstündag et al., 1989, 1991; Harney, 2003; Gregory, 2005; Üstündag & Cevri, 2008, 2011; Üstündag, 2011). In this chapter, we studied Bayesian recovery of sinusoids using estimation approach proposed by Bretthorst for a general signal model equation and combined it with a simulated annealing (SA) algorithm to obtain a global maximum of the posterior probability density function (PDF)

$$ for frequencies. Unfortunately, conventional algorithms (Press et al., 1995) based on the gradient direction fail to converge for this problem. Even when they converge, there is no assurance that they have found the global, rather than a local maximum. This is because the logarithm of the PDF

$$ is so sharply peaked and highly nonlinear function of frequencies. In this respect, a pattern search algorithm described by Hook-Jevees (Hooke & Jevees, 1962) to overcome this problem has already been used by some researchers in literature of estimation. However, we have found out that this approach does not converge unless the starting point is much closer to the optimum so that we have developed an algorithm in which this Bayesian approach is combined with a simulated annealing (SA) algorithm (Kirkpatrick, et al., 1983; Corana et al., 1987; Goffe et al., 1994; Ingber, 1996), which is a function optimization strategy based on an analogy with the creation of crystals from their melts. This explores the entire surface of the posterior PDF for the frequencies and tries to maximize it while moving both uphill and downhill steps, whose sizes are controlled by a parameter

$$ that plays the role of the temperature in a physical system. By slowly lowering the temperature

$$ towards zero according to a properly chosen schedule, one can show that the globally optimal solutions are approached asymptotically. Thus, it is largely independent of the starting values, often a critical input in conventional algorithms, and also offers different approach to finding parameter values of sinusoids through a directed, but random, search of the parameter space. In this context, an algorithm of this Bayesian approach is developed and coded in Mathematica programming language (Wellin at al., 2005) and also tested for recovering noisy sinusoids with multiple frequencies. Furthermore, simulation studies on synthetic data sets of a single sinusoid under a variety of signal to noise ratio (SNR) are made for a comparison of its performance with Cramér-Rao lower bound (CRLB), known as a lower limit on variance of any unbiased estimator. The simulations results support its effectiveness. 2. Bayesian parameter estimationLet us now reconsider above problem within a Bayesian context (Bernardo & Smith, 2000; Bretthorst, 1988; Gregory, 2005; Harney, 2003; Jaynes, 2003; Ruanaidh & Fitzgerald, 1996; Üstündag & Cevri, 2008). As with all Bayesian calculations, the first step is to write down Bayes’ theorem for the joint PDF for all of the unknown parameters

$$:

$$The quantity

$$ is called the prior PDF; it represents prior knowledge of the parameters

$$given the information

$$. The sampling distribution

$$ is the likelihood (Edwards, 1972) of the data

$$ given the model parameters. The probability function

$$ is the evidence, which is a constant for parameter estimation and is used here for normalizing the posterior PDF

$$. Therefore, it can be dropped in Equation (5) so that the joint PDF for the unknown parameters turns out to be the following form:

$$A key component in Bayes theorem is the likelihood function

$$which is proportional to the probability density of the noise. If its standard deviation

$$ is assumed to be known, then the likelihood function takes on the following form:

$$where the exponent

$$ is defined as follows

$$This is equivalent to

$$where

$$ and

$$In order to obtain the posterior PDF for

$$, Equation (6) can be integrated with respect to the nuisance parameters

$$ under the knowledge of

$$:

$$With the choices of an uniform prior PDF or independent Gaussians distributions with mean zero and known standard deviation for the amplitudes, the integral equations in (12) turn out to be a Gaussian integral which can be evaluated analytically (Bretthorst, 1988). To do this it is simply to convert the square matrix

$$ into a special type of matrix- a so called diagonal matrix- that shares the same fundamental properties of the underlying matrix. In other words, it is equivalent to transforming the underlying system of equations into a special set of coordinate axes. Therefore, this diagonalization process (Bretthorst, 1988) effectively introduces new orthonormal model functions,

$$and also gives a new expression for the signal model function in Equation (3):

$$The new amplitudes

$$ 's are related to the old amplitudes

$$ 's by

$$where

$$ represents the

$$th component of the

$$th normalized eigenvector of

$$, with

$$ as the corresponding eigenvalue. Substituting these expressions into Equation (9) and defining

$$to be the projection of the data onto the orthonormal model functions

$$, we can then proceed to perform the

$$ integration over

$$in Equation (12) to obtain

$$with

$$This represents the mean-square of the observed projections. If

$$ is known, the joint posterior probability density of the parameters

$$, conditional on the data and our knowledge of

$$ is given by

$$If there is no prior information about noise, then

$$ is known as a nuisance parameter and must also be eliminated by integrating it out. Using Jeffreys prior (Jeffreys, 1961)

$$and integrating Equation (12) over

$$ we obtain

$$This has the form of the “Student t- distribution”. As well as determining the values of the

$$ parameters for which the posterior PDF for

$$ is a maximum, it is also desirable to compute uncertainties associated with them. To do this, let us assume the case where

$$ is known and let

$$ represent the estimated values of

$$. Following to Bretthorst’s work (Bretthorst, 1988), we can expand the function

$$ in a Taylor series at the point

$$, such that

$$with

$$defined as

$$ and

$$for a single frequency case. For an arbitrary model the matrix

$$ cannot be calculated analytically; however, it can be evaluated numerically. The calculations of the mean and standard deviations for

$$ parameters require for evaluating Gaussian integrals by first changing to the orthogonal variables as was done above with the amplitudes. Let

$$and

$$ represent the

$$ eigenvalue and eigenvector of the matrix

$$, respectively. Then the new orthogonal variables are given by

$$By using these orthogonal variables to perform the

$$ Gaussian integrals, the estimate variance for

$$ can be calculated:

$$Therefore, the approximations for

$$ can be implemented in the form:

$$In agree with Bretthorst, the expectation values of the amplitudes are given by

$$. From Equation (15) the expected values for the old amplitudes

$$becomes

$$The uncertainty in

$$ is

$$, so that the corresponding uncertainty in

$$ is

$$3. Implementation of Simulated Annealing (SA) algorithmBayesian approach introduced by Bretthorst is briefly summarized in section 2 but, it is referred to Bretthorst's works (Bretthorst, 1988) for more detail information. Consequently, Bayesian parameter estimation turns into a global optimization problem which is a task to find the best possible solution

$$ for Equations (19) or (20) satisfying

$$in the solution space

$$.Because there is no variation at negative frequencies and the highest possible frequencies corresponds to wave that under goes a complete cycle in two unit intervals, so the lower limit on the range is 0 and all the variation is accounted for by frequencies less than

$$. Over last few decades, researchers have developed many computational algorithms to address such type of global optimization problems (Metropolis et al., 1953; Jeffreys, 1961; Kirkpatrick, et al., 1983; Laarhoven & Aarts, 1987; Stefankovic et al., 2009). Although there are numerous algorithms which are suggested to achieve this goal, few of them are capable of locating it effectively. Therefore, we follow the SA algorithm, suggested by Corana (Corana et. al., 1987) and modified by Goffe (Goffe et al., 1994), which is a kind of probabilistic algorithm for finding the global optimum of Equation (27) although its various alternative versions have already been used in statistical applications. A brief review of the most work on many algorithms based on that of SA, together with areas of applications is provided by Ingber and Binder (Binder, 1986; Ingber, 1994). The algorithm begins with an initial guess of the frequencies

$$, a trial step-length vector

$$and a global parameter

$$ (called the initial temperature). Each step of the SA algorithm replaces the current frequency with randomly generated new frequency. In other words, the next candidate point

$$ is generated by varying one component

$$of the current point

$$ at a time:

$$where

$$ is a uniformly distributed random number from the interval [-1,1] and

$$ is a step vector. The function value of

$$is then computed. If

$$then the point

$$ is accepted as the

$$th iteration point, it is replaced with

$$and algorithm moves uphill. If

$$ is the largest posterior PDF, denoted as

$$its value and

$$ are recorded since this is the best current value of the optimum. This forces the system toward a state corresponding to a local maximum or possibly a global maximum. However, most large optimization problems, like the one given in Equation (27), have many local maxima and optimization algorithm is therefore often trapped in a local maximum. To get out of a local maximum, a decrease of the function value

$$ is accepted with a certain probability. This is accomplished by the Metropolis-criterion (Metropolis et al., 1953) which is based on the changes of obtaining new state with the posterior PDF of frequencies value, defined as

$$where

$$represents difference between the present and previous posterior PDF values of frequencies, e.g.,

$$. Whenever

$$$$is computed and compared to

$$, a uniformly distributed random number from the interval

$$. If

$$, the new point

$$ is accepted and replaced with

$$ and the algorithm moves downhill, i.e. lower temperatures and larger differences in posterior PDF’s values. This continues until all

$$ components have been altered and thus

$$ new points have successively accepted or rejected according to the Metropolis criterion. After this process is repeated

$$ times the step vector

$$ is adjusted by the following rule:

$$where

$$ is the number of accepted moves along the direction

$$ and the parameter

$$, which is initially defined by user, controls the step variation along the

$$th direction. The aim of these variations in a step length is to maintain average percentage of accepted moves at about one-half of the total number of moves. An alternative step size of above SA algorithm is given by replacing Equation (28) with

$$where

$$ is a standard Normal distribution,

$$represents the temperature at the

$$th iteration and rescales the step size and ordering

$$. On the other hand,

$$is the CRLB for the

$$th component of angular frequencies at the

$$th iteration (Ireland, 2007) and provides a theoretical lower limit on how precisely parameter

$$can be extracted from noisy measurements. In this respect, it is defined in the form:

$$where the Fisher information matrix

$$(Kay, 1993), defined by

$$is an expectation of the second derivatives of the signal function with respect to

$$. Assuming that the matrix

$$ is diagonal for a large

$$ so that its inversion is straightforward. In this case, the diagonal elements yield the lower bound (Stoica et al., 1989; Lagrange, 2005) for the variance of

$$ asymptotically and we can write,

$$where

$$ represents the estimated variance of the noise and is described in Equation (41). This whole cycle is then repeated

$$ times, after which the temperature is decreased by a factor

$$. This process is generally called annealing (or cooling) schedule which is the heart of the algorithm and effects the number of times the temperature is decreased. If a fast cooling takes place then the problem will be trapped in a local maximum. Therefore, there are various annealing schedules suggested by different researchers (Ingber, 1994; Stefankovic, et al., 2009) for lowering the temperature but we choose the following:

$$Because of being exponential rather than logarithmic, it is sometimes known as simulated quenching (SQ) (Vasan et al., 2009; Aguiare et al., 2012). In case of a well conditioned estimation problem like, say, frequency estimation problem in signal processing, it is clear that the convenience of SA algorithm, together with a need for some global search over local optima, makes a strong practical case for the use of SQ. Therefore, different parameters have different finite ranges and different annealing time- dependent sensitivities. Classical annealing algorithms have distributions that sample infinite ranges and there is no decision for considering differences in each parameter dimension; e.g. different sensitivities might be necessary to use different annealing schedules. This requires the development of a new PDF to embed in the desired features (Ingber, 1994) so that it leads to variants of SA that justifies exponential temperature annealing schedules. Termination of the algorithm occurs when average function value of the sequences of the points after each

$$ step cycle reaches a stable state:

$$where

$$is a small positive number defined by user and

$$ indicates the last four successive iteration values of the posterior PDF of the frequencies that are being stored. Further details of the algorithm initialization are problem-dependent and are given in Section 5. 4. Power spectral densityBefore we discuss the computer simulated examples, there is something we need to say about how to display the results. The usual way the result from a spectral analysis is displayed is in the form of a power spectral density that shows the strength of the variation (energy) as a function of frequency. In Fourier transform spectroscopy this is typically taken as the squared magnitude of the discrete Fourier transform of the data. In order to display our results in the form of a power spectral density (Bretthorst, 1988; Gregory, 2005), it is necessary to give an attention to its definition that shows how much power is contained in a unit frequency. According to Bretthorst (Bretthorst, 1988) the Bayesian power spectral density is defined as the expected value of the power of the signals over the joint posterior PDF:

$$Performing integrals analytically over

$$ by using these orthonormal model functions defined in section 2, the power spectral density can therefore be approximated as

$$This function stresses information about the total energy carried by the signal and about the accuracy of each line. In the next section, we will present some numerical examples how well this technique works.5. Computer simulated examplesTo verify the performance of the algorithm, we generated a simulated data vector according to one, two and with five sinusoids. Here

$$ runs over the symmetric time interval

$$ to

$$ in

$$ integer steps and the components of the noise

$$ are generated from the zero mean Gaussian distribution with a known deviation

$$, initially and added to the simulated data. However, one of the interests in an experiment is also to estimate noise power

$$ so that it is assumed to be unknown. In agreement with Bretthorst, this is given in the following form:

$$Clearly, it is seen that the accuracy of the estimate depends on how long we sample and the signal-to-noise ratio (SNR), defined as the ratio of the root mean square of the signal amplitude to the noise

$$. In addition, one may also get the following expression of SNR:

$$When the standard deviation of the noise is unknown, an empirical SNR is obtained by replacing

$$ in Equation (42) with the estimated noise variance in (41). In our first example, we generate the data set from the following equation:

$$We then carried out the Bayesian analysis of the simulated data, assuming that we know the mathematical form of the model but not the value of the parameters. We first gave starting values to the list of frequencies to begin a multidimensional search for finding a global maximum of the posterior PDF of the frequencies

$$ given in Equations (19) or (20). As an initial estimate of the frequencies

$$ for the maximization procedure, it is possible to take random choices from the interval

$$. However, it is better to start with the locations of the peaks chosen automatically from the Fourier power spectral density graph by using a computer code written in Mathematica. In agreement with Corana (Corana, et al., 1987), reasonable values of the parameters that control the SA algorithm are chosen as

$$,

$$ and

$$. Then the global optimization algorithm starts at some high temperature

$$. Thus the sequence of points is generated until a sort of equilibrium is approached; that is a sequence of points

$$ whose average value of

$$ reaches a stable value as iteration proceeds. During this phase the step vector

$$ is periodically adjusted by the rule defined in Equation (32). The best point

$$ reached so far is recorded. After thermal equilibration, the temperature

$$ is reduced by using the annealing scheduled in Equation (37) with a factor

$$and a new sequence is made starting from this best point

$$, until thermal equilibrium is reached again and so on. The SA algorithm first builds up a rough view of the surface by moving large step lengths. As the temperature

$$ falls and the step decreases, it is slowly focuses on the most promising area. Therefore it proceeds toward better maximum even in the presence of many local maxima by adjusting the step vector that can allow the algorithm to shape the space within it may move, to that within which ought to be as defined by the PDF

$$. Consequently, the process is stop at a temperature low enough that no more useful improvement can be expected, according to a stopping criterion in Equation (38). Table 1.Computer simulations for a single harmonic frequency modelOnce the frequencies are estimated, we then carried on calculating the amplitudes and parameter errors approximately using Equations (25), (23) and (26), respectively. However, an evaluation of the posterior PDF at a given point

$$ cannot be made analytically. It requires a numerical calculation of projections onto orthonormal model functions, related to Eigen-decomposition of the

$$ dimensional matrix

$$. Therefore, the proposed algorithm was coded in Mathematica programming language (Wellin, P., et al., 2005), that provides a much flexible and efficient computer programming environment. Furthermore, it also contains a large collection of built-in functions so that it results much shorter computer codes than those written in C or FORTRAN programming languages. The computer program was run on the workstation with four processors, which of each has got Intel Core 2 Quad Central Processing Unit (CPU), in two cases where the standard deviation of noise is known or not. The output of the computer simulations when

$$ is illustrated in Table 1. The results when

$$ is unknown are almost similar with that of Table 1. Parameter values are quoted as (value) ± (standard deviation). It can be seen that a single frequency and amplitudes are recovered very well. The estimated value of SNR and the standard deviation of the noise are also shown in Table 1. In our second example, we consider a signal model with two close harmonic frequencies:

$$In a similar way, we produced the same size data corrupted by the zero mean Gaussian noise with

$$. We run our Mathematica code again in the case where the deviation of noise is unknown. The results, shown in Table 2, indicate that all values of the parameters within the calculated accuracy are clearly recovered.Table 2.Computer simulations for two closed harmonic frequency modelIn general, we consider a multiple harmonic frequency model signal:

$$The best estimates of parameters are tabulated in Table 3. Once again, all the frequencies have been well resolved, even the third and fourth frequencies which are too closed not to be separated by the Fourier power spectral density shown in Figure 3. Actually with the Fourier spectral density when the separation of two frequencies is less than the Nyquist step, defined as

$$, the two frequencies are indistinguishable. This is simply because there are no sample points in between the two frequencies in the frequency domain. If

$$ theoretically the two frequencies can then be distinguished. If

$$is not large enough, the resolution will be very poor. Therefore, it is hard to tell where the two frequencies are located. This is just the inherent problem of the discrete Fourier power spectral density. In this example two frequencies are separated by 0.01, which is less than the Nyquist step size. There is no way by using Fourier power spectral density that one can resolve the closed frequencies less than the Nyquist step. However, Bayesian power spectral density shown in Figure 3 gives us very good results with high accuracy. Finally, we constructed the signal model in Equation (3), whose parameters, amplitudes and frequencies, are randomly chosen from uniform distribution in the intervals

$$ and

$$, respectively and used it to generate data samples of

$$ by adding a zero mean Gaussian random noise with

$$. The proposal algorithm was rerun for recovering sinusoids from it and the results are tabulated in Table (4). It can be seen that frequencies are specially recovered with high accuracies. Ten frequencies signal model are shown in Fig.5.

Figure 1.Recovering signal from noisy data produced from a single harmonic frequency signal model.

Figure 2.Recovering signals from noisy data produced from two closed harmonic frequency signal model.Table 3.able 3.Computer simulations for a multiple harmonic frequency modelIn order to compare the results with Bretthorst's in this example we converted On the other hand, modifications of this algorithm (Üstündag & Cevri, 2008; 2011) have already been made by generating the next candidate in Equation (33) from normal distribution with a mean of the current estimation whose standard deviation is a square root of the CRLB (Ireland, 2007) given in Equation (36), which is a lower limit to the variance of the measurement of the frequencies, so this generates a natural scale size of the search space around their estimated values. It is expected that better solutions lie close the ones that are already good and so normally distributed step size is used. Consequently, the results we obtained are comparable with or higher than those obtained previously. In addition, all the results discussed so far are also consistent with those of Bretthorst (Bretthorst, 1988) and also demonstrate the advantages of the Bayesian approach together with SA algorithm. Moreover it appears to be very reliable, in the sense that it always converged to neighborhood of the global maximum. The size of this neighborhood can be reduced by altering the control parameters of the SA algorithm, but this can be expensive in terms of CPU consumption. Moreover, we initially assumed that the values of the random noise in data were drawn from the Gaussian density with the mean

$$ and the standard deviation

$$Figure 4 shows the exact and estimate probability densities of the random noise in data. It is seen that the estimated (dotted) probability density is closer to the true (solid) probability density and the histogram of the data is also much closer to the true probability density. The histogram is known as a nonparametric estimator of the probability density because it does not depend on specified parameters. Computer simulations had been carried out to compare the performance of the method with the CRLB. To do this, we generated 64 data samples from a single real tone frequency signal model

$$ and added it to the variety of noise levels. After 50 independent trials under different SNR ratios, the mean square errors (MSE) for the estimated frequencies were obtained and their logarithmic values were plotted with respect to SNR ratios that vary between zero and

$$dB (deci Bell). It can be seen from Figure 6 that the proposed estimator has threshold about 3 dB of the SNR and follows nicely the CRLB after this value. As expected, larger SNR gives smaller MSE. However, many of existing methods in signal processing literature have a MSE that is close to the CRLB when the SNR is more than 20 dB and they usually perform poorly when the SNR is decreased.

Figure 3.Spectral analysis of multiple frequency signal model Table 4.Computer simulations for ten harmonic frequency signal model

Figure 4.Comparison of exact and estimate probability densities of noise in data

$$.

Figure 5.Spectral analysis of ten frequency signal model.The computational complexity of the algorithm is dependent upon a few parameters such as annealing schedule, data samples and parameters. Under using same annealing schedule, Fig. 7 shows only CPU time of different simulations in a variety of number of data samples. It can be clearly seen that an increase in these numbers causes larger consumption of CPU time. With fixed size of components set and specifically annealing schedule of SA algorithm, the overall execution time of the cooling and decision is almost constant, but the runtime of the first two stages (move and evaluate) mostly depends on complicated design constraints and objective functions. Because the move and the evaluation process in the SA algorithm play an important role in CPU resource usage, improving the calculation ability for these stages will be the most feasible approach for an optimizing SA so that parallel computing is one of best approaches for this goal.

Figure 6.The calculated MSE of the proposed method compared with CRLB versus different SNR with a white noise.

Figure 7.CPU times versus with number of parameters and data samples.6. ConclusionIn this work we have partially developed a Bayesian approach combined with SA algorithm and applied it to spectral analysis and parameter algorithm for estimating parameters of sinusoids from noisy data. Overall, results presented here show that it provides rational approach for estimating the parameters, namely the frequencies and the amplitudes, can be recovered from the experimental data and the prior information with high accuracy, especially the frequency, which is the most important parameter in spectral analysis. A significant advantage of this approach comes from the very large posterior probabilities, which are sharply peaked in the neighborhood of the best fit. This helps us to simplify the problem of choosing starting values for the iteration and it provides a rational approach for estimating, in an optimal way, values of parameters by performing a random search. On the other hand, for sufficiently high SNR, MSEs of frequencies will attain CRLB so that it justifies the accuracy of the frequency estimation. Although the SA algorithm spends large consumption of CPU time, it is competitive when compared to the multiple runs often used with conventional algorithms to test different starting values. As expected, parallel implementation of SA algorithm reduces CPU resource usage.Data analysis given in this chapter has also been applied to more complicated models and conditions, such as signals decay, periodic but non-harmonic signals, signals with non-stationary, etc.,. In general, we have also not addressed the problem of model selection which is the part of spectral analysis. In this case, one has enough prior information in a given experiment to select the best model among a finite set of model functions so that Bayesian inference helps us to accomplish it. Therefore, it will deserve further investigations. Nomenclature

$$: Nuisance parameters

$$: Amplitudes of

$$sinusoidal

$$: Marginal probability density function

$$: Conditional probability density function

$$: A matrix defined as

$$$$: A parameter which controls the step variation along the

$$th direction

$$: A set of observed data

$$: Measurement errors

$$: A small positive number defined by user

$$: Expectation of a function or random variable

$$: Model functions that contains sinus and cosines terms

$$: Projections of data onto orthonormal model functions

$$: Orthonormal model functions

$$: Mean of square of projections

$$: Prior information

$$: Fisher information matrix

$$: Inverse of Fisher information matrix

$$: Number of frequencies.

$$: Number of data samples

$$A standard Normal distribution

$$: The number of accepted moves along the direction

$$.

$$: Number of iterations

$$: Least square function

$$: Bayesian power spectral density

$$: Sampling variance

$$: Discrete time set

$$: Standard deviation of angular frequency

$$: Standard deviation of amplitude

$$: Solution space of angular frequencies

$$:

$$Eigenvalue of the matrix

$$$$:

$$Eigenvector of the matrix

$$$$: A factor between 0 and 1.

$$: Parameters vector

$$:

$$ th component of normalized Eigenvalues of

$$$$:

$$th component of

$$th normalized Eigenvector of

$$$$: Step-length vector

$$: Trial step-length vector

$$: Uniformly distributed random number from the interval

$$$$: Variance of noise

$$: Expected value of noise variance

$$: CRLB for

$$th component of

$$$$: Vector of angular frequency

$$: Estimated angular frequency vector

$$: An initial guess vector for frequencies

$$: The initial temperature

$$: Diagonal matrix

$$: Absolute value CRLB: Cramér-Rao lower bound PDF: Probability Density Function SA : Simulated Annealing SNR : Signal to Noise RatioSQ : Simulated Quenching MSE: Mean Squared Errors dB : deci BellAcknowledgementThis work is a part of the projects, whose names are "Bayesian Model Selection and Parameter Estimation" with a number FEN - DKR - 200407 - 0082 and "Bayesian Spectrum Estimation of Harmonic Signals" with a number FEN-BGS-060907-0191, supported by the Marmara University, Istanbul, Turkey.AguiareH.JuniorO.IngberL.PetragliaA.PetragliaM. R.MachadoM. A. S.2012Stochastic Global Optimization and Its Applications with Fuzzy Adaptive Simulated Annealing. Springer Heidelberg New York Dorddrecht London, 3657BarbedoJ. G. A.LopesA.2009Estimating Frequency, Amplitude and Phase of two Sinusoids with Very Close Frequencies, International Journal of Signal processing, 52138145BernardoJ. M.SmithA. F. M.2000Bayesian Theory, Wiley, New York.BinderK.1986Monte Carlo Methods in Statistical Physics, 2nd Edition, Springer-Verlag.BretthorstG. L.1988Bayesian Spectrum Analysis and Parameter Estimation, Lecture Notes in Statistics, Springer-Verlag Berlin Heidelberg, New York.BretthorstG. L.1990An Introduction to Parameter Estimation Using Bayesian Probability Theory, Maximum Entropy and Bayesian Methods, Kluwer Academic Publishers, 5379Brooks S.P. & Morgan B.J.T.1995Optimization using simulated annealing, Journal of Royal Society (The Statistician) 44, 241257ChanK. W.SoH. C.2004Accurate frequency estimation for real harmonic sinusoids, IEEE Signal Processing Letters 11, 609612CooleyJ. W.TukeyJ. W.1964An algorithm for the machine calculation of complex Fourier series, Mathematics of Computation 19, 297301CoranaA.MarchesiM.MartiniC.RidellaS.1987Minimizing multimodal functions of continuous variables with the simulated annealing algorithm, ACM Transactions on Mathematical Software 13, 262280EdwardsA. W. F.1972Likelihood, Cambridge University Press.GregoryP. C.2005Bayesian Logical Data Analysis for the Physical Science, Cambridge University Press.GoffeW. L.FerierG. D.RogersJ.1994Global optimization of statistical functions with simulated annealing, Journal of Econometrics 60, 65100HarneyH. L.2003Bayesian Inference: Parameter Estimation and Decisions, Springer-Verlag Berlin Heidelberg.HookeT. R.JeveesT. A.1962Direct search solution of numerical and statistical problems, Journal of Association of Computer Machinery 5, 212229IngberL.1994Simulated Annealing: Practice versus theory, Mathematical computer Modeling 18, 2957IngberL.1996Adaptive Simulated Annealing ASA: Lessons learned, Control and Cybernetic 25, 3354IrelandJ.2007Simulated annealing and Bayesian posterior distribution analysis applied to spectral emission line fitting, Journal of Solar Physics 243, 237252JaynesE. T.1987Bayesian Spectrum and Chirp Analysis, in Maximum Entropy and Bayesian Spectral Analysis and Estimation Problems, C. R. Smith and G. J. Erickson (eds.), D. Reidel, Dordrecht, 137JaynesE. T.2003Probability Theory: The Logic of Science, Cambridge University Press.JeffreysH.1961Theory of Probability, Oxford University Press, London.KayS. M.1993Fundamentals of statistical signal processing: Estimation theory, Prentice Hall, 1KirkpatrickS.GelattC. D.VecchiM. P.1983Optimization by Simulated Annealing, Journal of Science 220, 671680LaarhovenP. V.AartsE.1987Simulated annealing: Theory and Applications, D. Reidel Publishing Company.LocatelliM.2000Simulated annealing algorithms for Continuous global optimization: Convergence Conditions, Journal of Optimization Theory and Applications 106, 121133LagrangeM.MarchandS.Rault-RrJ.2005Improving sinusoidal frequency estimation using a trigonometric approach, Proceedings of the Digital Audio Effects Conference, Spain,110115MetropolisN.RosenbluthA.RosenbluthM.TellerA. M.TellerE. .1953Equation of states calculations by fast computing machines, Journal of Chemical Physics 21, 10871092Mc WhorterT.ScharfL. L.1993Cramer-Rao Bound for deterministic modal analysis, IEEE Transactions on Signal Processing 41, 18471866NishiyamaK.1997A nonlinear filter for estimating a sinusoidal signal and its parameters in white noise: On the case of a single sinusoid, IEEE transactions on signal processing 45, 970981NortonJ. P.1986An introduction to identification, Academic Press, 1986.PressW. H.FlanneryB. P.TeukolshyS. A.VetterlingW. T.1995Numerical recipes in C: The art of computing, Cambridge University Press.RuanaidhJ. J. K. Q.FitzgeraldW. J.1996Numerical Bayesian Methods Applied to Signal Processing, Springer-Verlag, New York.StoicaP.NehoraiA.1989MUSIC, Maximum Likelihood, and Cramer-Rao Bound, IEEE Transactions on Acoustic Speech, and Signal Processing 37, 720741SchusterA.1905The Periodogram and its optical analogy, Proceedings of the Royal Society of London, Series A 77, 136140StefankovicD.VempalaS.VigodaE.2009Adaptive simulated annealing: A near- optimal connection between sampling and counting. Journal of the ACM 56, doi:10.1145/1516512.1516520TsoulosG.LagarisI. E.2006GenAnneal: Genetically modified Simulated Annealing, Computer Physics Communications 174, 846885ÜstündagD.QueenBowcockJ. E.1989A method for retrieving images from noisy, incomplete data, Proceeding of the Fifth Alvey Vision Conference, University of Reading, 239245ÜstündagD.QueenN. M.SkinnerG. K.BowcockJ. E.1991Two new methods for retrieving an image from noisy, incomplete data and comparison with the Cambridge MaxEnt package, 10. International Workshop on Maximum Entropy and Bayesian Methods, MaxEnt 90, 295301ÜstündağD.CevriM.2008Estimating parameters of sinusoids from noisy data using Bayesian inference with simulated annealing, WSEAS Transactions on Signal Processing 7, 432441ÜstündağD.CevriM.2011Recovering sinusoids from noisy data using Bayesian inference with simulated annealing, Mathematical and Computational Applications 16, 382391ÜstündağD.2011Recovering sinusoids from data using Bayesian inference with RJMCMC, Proceeding of Seventh International Computation on Natural Computation (ICNN), Shanghai, 18501854VasanA.RajuK. S.2009Comparative analysis of Simulated annealing, Simulated quenching and genetic algorithms for optimal reservoir operation, Applied Soft Computing 9, 274281WellinP.GayllordR.KaminS.2005An Introduction to programming with Mathematica, Cambridge University Press.