Abstract
The rapidly improving performance of inelastic scattering instruments has prompted tremendous advances in our knowledge of the high-frequency dynamics of disordered systems, while also imposing new demands on data analysis and interpretation. This ongoing effort is likely to reach an impasse soon, unless new protocols are developed for data modeling. This need stems from the increasingly detailed information sought in typical line shape measurements, which often touches or crosses the boundaries imposed by the limited experimental accuracy. In this scenario, the risk of biased and overparametrized data modeling represents a concrete threat to further advances in the field. Aware of the severity of the problem, we illustrate here the new hopes brought to this area by Bayesian inference methods. With reference to recent literature results, we demonstrate the superior ability of these methods to provide a probabilistic and evidence-based modeling of experimental data. Most importantly, this approach enables hypothesis tests involving competing line shape models and is intrinsically equipped with natural antidotes against the risk of overparametrization, as it naturally enforces Occam's maximum parsimony principle, which favors intrinsically simple models over overly complex ones.
Keywords
 inelastic X-ray scattering
 inelastic neutron scattering
 Bayes analysis
 MCMC methods
 model choice
1. Introduction
In the last decade, a large number of inelastic neutron and X-ray scattering measurements have focused on the study of the collective atomic dynamics of disordered systems [1, 2, 3, 4, 5]. Although, over the years, the analysis of the line shapes reported in these measurements has seldom benefited from the support of a Bayesian inference analysis, the need for this statistical tool is becoming increasingly urgent. As a general premise, it is worth stressing that a scattering measurement somehow resembles a microscope pointed at the dynamics, whose “focus” can be adjusted by a suitable choice of the momentum
One could guess that such a sharp spectral shape does not leave any room for interpretative doubts, also considering that the limiting hydrodynamic spectral profile is exactly known and analytically treatable, starting from the application of mass, momentum, and energy conservation laws. Although these statements are partly true, the very concept of “interpretative doubt” sounds grossly ill-defined before spelling out explicitly the accuracy required of the interpretation one alludes to. Despite its pioneering nature, the quality of the measurements in panel A seems certainly adequate for a precise determination of the side-peak position, but probably not for a detailed analysis of the spectral tails, which are dominated by the slowly decaying resolution wings. Nonetheless, such a shape might still appear a more encouraging candidate for a line shape analysis than its counterpart reported in panel B of Figure 1, which is characterized by broad and loosely resolved spectral features, besides a definitely poorer count statistics. Given that the latter result is fairly prototypical of terahertz spectroscopic measurements on simple disordered systems, one might wonder why, thus far, the analysis of these measurements has failed to benefit from Bayesian inference methods as routine line shape analysis tools. Aside from hardly justifiable initial mistrust, a likely explanation is that only recently have these spectroscopic techniques transitioned to a mature age, in which the very detection of collective modes in amorphous systems can no longer be considered a discovery in itself, and detailed analyses of the spectral shape are more and more common and required. Again, the take-home message of this course of events is that the pivotal issue is the adequacy of a given measurement to provide the sought-for information, rather than the quality of the measurement in itself.
The imbalance between an unavoidably limited experimental performance and rapidly increasing interpretative needs dramatically enhances the risk of “good faith overinterpretations,” which represent a lethal threat to the progress of knowledge.
When dealing with neutron or Xray scattering, the statistical accuracy of spectral acquisition is the primary concern. For the most varied reasons, e.g., relating to the scattering properties of the sample, the integration time, or the count rate of the measurement, the achieved count statistics may either be adequate for a rigorous data analysis or, as often happens, not as good as we would like it to be. In the latter case, the experimental data might not be accurate enough to tell us everything about the physical problem under scrutiny. They could tell us something, but not everything! This is why we need a solid inferential method capable of extracting the maximum amount of information from the data acquired and possibly providing us with a quantitative probabilistic evaluation of the different models that are compatible with the data at hand. Especially when nothing or very little is known about a specific sample or system, the point is, given the observed data, how plausible is a specific model? What is the precision of the conclusions drawn from this model? Are there other possible interpretations of the data at hand? To what extent are different models and interpretations supported by the observed data?
A Bayesian inferential approach provides answers to all these questions on a probabilistic basis, along with a sound criterion to integrate any prior knowledge into the process of data analysis. Bayesian inference, in fact, recognizes the importance of including prior knowledge in the analysis. When we do have well-established prior knowledge about a sample property, or about a general law a physical phenomenon must comply with, it would be insane and pointless not to use this information. Such prior knowledge, in fact, can protect us from the risk of making mistakes in the description of experimental data, hence in their interpretation. In the Bayesian framework, prior knowledge takes the form of probability statements, so that different probabilities, ranging from zero to one, can be attributed to competing explanations of the data. In this way, less probable explanations are not excluded a priori but simply given a smaller prior probability. The a priori probability of different explanations is then updated, through the Bayes theorem, based on the new information provided by the data. The results of this analysis thus assume the form of posterior probabilities. On this basis, one can easily establish which model is most supported by both data and prior knowledge, and what the posterior probabilities of alternative models and of their parameters are, which provides a basis for appreciating the precision of their estimates. In addition, Bayesian methods naturally embody Occam's razor principle, thus favoring simpler models over unnecessarily complex ones. Last but not least, Bayesian estimation algorithms are generally less affected by the presence of local optima in the parameter space and are not sensitive to the starting values used to initialize the estimation process.
The aim of this chapter is to illustrate how Bayesian inference can be used in X-ray and neutron scattering applications. The Bayesian approach proposed here is implemented through an estimation algorithm, which makes use of Markov chain Monte Carlo (MCMC) methods [9, 10] integrated, where appropriate, with a reversible jump (RJ) algorithm [11].
2. An example: searching for differences
Depending on the problem at hand, our approach to data analysis can be very different. Imagine that we want, as a toy or teaching example, to measure either the neutron or the X-ray scattering spectrum from a system whose spectrum is well-known and whose interpretation is unanimously agreed upon. For instance, we aim to extract the phonon dispersion curve from the thoroughly measured spectral density
More often, we face a different problem, as we want to measure for the first time a certain system on which we might not have previous knowledge. Alternatively, we could have prior knowledge about that same system, yet in different thermodynamic or environmental conditions— for instance, a liquid either in bulk or confinement geometries—and possible effects of these peculiar conditions are under scrutiny. Changes could also be very small, and, since detecting them is the focus of our research, it is essential to take the most impartial and noninvasive approach. In this second situation, it would be desirable not to rely too heavily on previous results when choosing the model and to allow the measurement to reveal possible new features.
The two situations mentioned above notably differ in the amount of information available on the system before analyzing the data. In the first case, we have a complete knowledge of the system, while, in the second case, this knowledge is partial or even entirely lacking. In this second situation, a traditional approach would consist in either best fitting a model we deem adequate for the data (e.g., one well assessed for the considered sample, albeit in different thermodynamic or environmental conditions), or fitting competing models to establish the best-performing one based on agreed-upon criteria, e.g., the chi-square value. Following the first path, we hinge too much on a specific model and on previous partial knowledge, thus jeopardizing the chance of new findings. On the other hand, the second path would be less coercive, at the cost of completely ignoring previous partial knowledge. In addition, the model chosen would simply be the one providing the best fit, but no assessment could be made of the plausibility of this or any other fitting model, based on the data measured. Conversely, a Bayesian approach to data analysis would allow assigning a different prior probability to the different models (accounting for the uncertainty of the available information on the system) and then revising these probabilities in the light of the data, to deliver the posterior probability of each model, conditional on the data at hand.
3. Bayesian inference
3.1 The Bayes theorem
The Bayes theorem stems from the theorem of compound probability and from the definition of conditional probability. If we consider two events
where
From Eq. (1), we immediately get:
which is nothing else than the Bayes theorem.
Let us now consider
where
3.2 The prior distribution
Let us consider the different elements of Eq. (4), starting with the prior distribution (or simply prior)
To give a few examples, it might be possible that a certain parameter
In other situations, the information available on the parameters might be more vague. For example, we might simply know that a certain parameter must be nonnegative or that it must range in a limited interval, as is often the case in neutron scattering, hampered by severe kinematic constraints. Nonnegative parameters can be a priori assumed to follow, for example, a truncated Gaussian or a gamma distribution, and, if no other information is available, the prior distribution will be adjusted to allow for a large parameter variability, reflecting the noninformative initial guess. Parameters having random or hardly quantifiable variations within limited windows can be assumed to approximately follow a uniform distribution over such a window. Also, whenever feasible, any mutual entanglement between parameters, as well as any selection, conservation, or sum rule, should be embodied in a usable distribution function complementing our prior knowledge.
Notice that, even if it is common practice to assume that the parameters are a priori independently distributed, correlation between them can be naturally induced by the data, through the combination of the likelihood and the prior. Parameters can be a posteriori correlated, even if they are a priori independent.
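The prior choices just described can be sketched numerically. In the snippet below, the parameter names (an amplitude, a frequency, a background) and all numerical values are our own illustrative assumptions, not the chapter's notation; the point is simply how a truncated Gaussian, a gamma, and a uniform prior combine into a joint log-prior for a priori independent parameters:

```python
import numpy as np
from scipy import stats

# Illustrative priors for three hypothetical line-shape parameters:
#  - amp:   non-negative amplitude  -> weakly informative gamma prior
#  - omega: non-negative frequency  -> Gaussian truncated at zero
#  - bkg:   background known to lie in a limited window -> uniform prior
prior_amp = stats.gamma(a=2.0, scale=1.0)

lo, loc, scale = 0.0, 5.0, 3.0  # truncnorm takes standardized bounds
prior_omega = stats.truncnorm(a=(lo - loc) / scale, b=np.inf, loc=loc, scale=scale)

prior_bkg = stats.uniform(loc=0.0, scale=10.0)  # uniform on [0, 10]

def log_prior(amp, omega, bkg):
    """Joint log-prior, assuming a priori independent parameters."""
    return (prior_amp.logpdf(amp)
            + prior_omega.logpdf(omega)
            + prior_bkg.logpdf(bkg))

print(np.isfinite(log_prior(1.0, 4.0, 2.0)))  # True: inside the support
print(log_prior(1.0, 4.0, 42.0))              # -inf: bkg outside its window
```

Parameter values violating a hard constraint automatically receive zero prior probability, so the posterior can never place mass on them.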
3.3 The likelihood function
The likelihood function is the joint probability of the observed data, conditional on the model adopted and its parameter values. Notice that, for continuous data, the likelihood becomes a probability density. Let
The left side of Eq. (6) is the likelihood function for the observed sample
To be more specific, we can consider spectroscopic data. The observable directly accessed by a spectroscopic measurement is the spectrum of the correlation function of density fluctuation, or dynamic structure factor
where
Under the assumption above, the likelihood function is:
Conditional on a certain value of the parameter vector
3.4 The posterior distribution and its normalizing constant
The term on the left-hand side of Eq. (3) is the joint posterior distribution of the model parameters, given prior knowledge and measured data, i.e.,
The term in the denominator of Eq. (3):
is generally called the marginal likelihood and represents the probability of observing the measured data
For this reason, Bayesian inference usually needs to resort to MCMC methods to simulate the joint posterior distribution. MCMC algorithms, in fact, allow one to draw values from distributions known up to a normalizing constant, as is often the case for
To illustrate an interesting point, let us go back to the example considered before, in which we want to analyze spectroscopic data that can be modeled as in Eq. (7) and for which the likelihood is given in Eq. (8). Imagine having no prior information at all on the parameters of the model, so that the only sensible choice for the prior is a uniform distribution on the parameter space. Then, from Eqs. (8) and (10), it follows that:
which implies that the posterior distribution is a multivariate Gaussian. As already mentioned, parameters can be estimated taking the mean of the posterior distribution, which, for a Gaussian distribution, corresponds to the median, mode, and maximum of the distribution. Therefore Bayesian parameter estimates are obtained as those values of
Despite the asymptotic equivalence, sometimes parameters are much more easily estimated in a Bayesian rather than in a frequentist perspective. Frequentist estimation, in fact, is generally based on least squares or maximum likelihood methods, and this might be a problem in the presence of local optima. If, for example, the starting values of the parameters, needed to initialize the optimization algorithm, are close to a local optimum, the algorithm might be trapped in this suboptimal solution. As a consequence, different starting values might determine different solutions and, thus, different parameter estimates. The Bayesian estimate of a parameter, as stated before, is instead obtained as the mean of its posterior distribution, marginalized with respect to all other parameters. This estimation procedure does not involve any minimization or maximization; thus, the fitting algorithm does not risk getting trapped in local optima, and the results are independent of the starting values used in the MCMC algorithm used to simulate the posterior distribution (see Section 3.6). It might happen, obviously, that the posterior of one or more parameters is bimodal or multimodal. The presence of different parameter regions with high posterior density might suggest that the data show some evidence in favor of a more complex model, but not enough for this model to have the highest posterior probability. In this case, it is not reasonable to use the mean as a point estimate for the parameters, since it might fall in a low posterior density region, and the mode of the posterior distribution can be used in its place. In such situations of posterior multimodality, it is evident how the whole posterior distribution conveys much richer information than the simple parameter estimate.
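The mean-versus-mode point can be seen with a toy set of posterior draws. The bimodal mixture below is entirely made up for illustration; it only shows that, for such a posterior, the mean lands in a low-density valley while a histogram-based mode sits on one of the bumps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy posterior draws for one parameter: a bimodal mixture, as may happen
# when the data weakly support a more complex model (see text).
draws = np.concatenate([rng.normal(-3.0, 0.3, 5000),
                        rng.normal(3.0, 0.3, 5000)])

posterior_mean = draws.mean()

# Histogram-based mode: centre of the most populated bin.
counts, edges = np.histogram(draws, bins=100)
i = np.argmax(counts)
mode = 0.5 * (edges[i] + edges[i + 1])

print(abs(posterior_mean) < 0.5)  # True: the mean falls between the bumps
print(abs(mode) > 2.0)            # True: the mode sits on one of them
```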
3.5 The Occam’s razor principle
Even if Bayesian and classical analysis asymptotically give the same results, Bayesian results always have a probabilistic interpretation, and this is particularly relevant when we need to compare different models and determine, for instance, the number of spectral excitations (in the frequency domain) or the number of relaxations (in the time domain). In addition, the Bayesian method represents a natural implementation of Occam's razor [18, 19, 20]: this principle is intrinsic to Bayesian inference and is a simple consequence of the adoption of the Bayes theorem. In model choice problems, in fact, the posterior probabilities of the different models naturally penalize complex solutions with respect to simple ones, thus conforming to the parsimony principle.
To see this, consider Eq. (4), and imagine that the parameter vector also includes a model indicator parameter
Now, consider for simplicity just two possible models, the first one, denoted as
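The Occam penalty can also be made concrete with a brute-force numerical sketch. Everything below (the data, the two nested models, the uniform priors and grids) is invented for illustration: data are generated by a constant, and the marginal likelihood of the constant model is compared with that of a model carrying an unnecessary slope parameter:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data actually generated by the SIMPLER model: y = c + noise.
x = np.linspace(0.0, 1.0, 30)
sigma = 0.5
y = 1.0 + rng.normal(0.0, sigma, x.size)

def log_like(pred):
    """Gaussian log-likelihood of the data for a given model prediction."""
    return (-0.5 * np.sum((y - pred) ** 2) / sigma**2
            - y.size * np.log(sigma * np.sqrt(2.0 * np.pi)))

# Model 1: y = c.  Model 2: y = c + b*x, with an unnecessary slope b.
# Uniform priors on [-5, 5]; the marginal likelihood (evidence) is computed
# by brute-force integration of likelihood * prior over a parameter grid.
grid = np.linspace(-5.0, 5.0, 201)
d = grid[1] - grid[0]
ev1 = sum(np.exp(log_like(c)) for c in grid) * d / 10.0
ev2 = sum(np.exp(log_like(c + b * x)) for c in grid for b in grid) * d * d / 100.0

# The extra parameter spreads prior mass over fits the data do not need,
# so the simpler model ends up with the larger evidence (Occam's razor).
print(ev1 > ev2)  # True
```

The complex model is penalized not because it fits worse (it fits marginally better), but because its prior mass is diluted over a parameter region the data never use.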
3.6 Bayesian computation of model parameters
As already stated, Bayesian inference completely relies on the joint posterior distribution
where
Concerning the proposal distribution, this should be chosen as a distribution from which it is easy to sample. It could be, for instance, a normal distribution centered on the current value of the parameter and with a certain variance, which can be adjusted and used as a tuning parameter. The latter name alludes to the fact that adjustments of this parameter can literally tune the step of the parameter updates. For a normal proposal distribution, a large variance allows the new value
When the parameter vector also includes a model indicator parameter, a further move needs to be considered to update this parameter and to allow the algorithm to explore different models. This move is a reversible jump [11] step, which is specifically designed to allow the Markov chain to move between states having different dimensions (since the dimension of the parameter space varies accordingly to the model considered).
As a final remark, consider that when the MCMC algorithm reaches convergence, after a so-called “burn-in” period, the draws not only effectively represent samples from the joint posterior distribution but are also theoretically independent of the starting values of each parameter. A few examples of this point are shown in Table 1 of Ref. [12]. Notice, however, that the time required to reach convergence can vary considerably depending on the data and the prior. For example, peaked unimodal posterior distributions (i.e., highly informative data) generally speed up convergence, as does the availability of important prior information, which reduces the size of the effectively accessible parameter space. On the contrary, the presence of many high posterior density regions can hinder and slow down convergence.
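The basic machinery of Sections 3.6 can be condensed into a minimal random-walk Metropolis sampler. The inference problem below (the mean of Gaussian data under a flat prior) and all names and values are our own toy assumptions; the sketch only illustrates the proposal step, the acceptance rule, the tuning role of the proposal width, and the burn-in:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: we infer the mean mu of a Gaussian with known unit variance.
data = rng.normal(4.0, 1.0, 100)

def log_post(mu):
    # Flat prior: the log-posterior equals the log-likelihood up to a constant.
    return -0.5 * np.sum((data - mu) ** 2)

n_sweeps, burn_in, step = 20000, 2000, 0.3   # step = proposal std (the tuning)
chain = np.empty(n_sweeps)
mu, accepted = 0.0, 0                        # deliberately poor starting value

for t in range(n_sweeps):
    prop = mu + step * rng.normal()          # random-walk Gaussian proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu, accepted = prop, accepted + 1    # Metropolis acceptance rule
    chain[t] = mu

posterior_mean = chain[burn_in:].mean()      # estimate after discarding burn-in
print(abs(posterior_mean - data.mean()) < 0.05)  # True: matches the flat-prior mean
print(0.05 < accepted / n_sweeps < 0.95)         # acceptance rate in a sane range
```

Despite the poor starting value, the chain drifts to the high posterior density region within the burn-in, after which the draws are effectively samples from the posterior.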



Table 1. Posterior distribution (%) for the number k of time correlation decay channels (see Section 4.2 and Ref. [15]).

k    Posterior probability (%)
1     8.47
2    61.83
3    23.91
4     4.41
5     1.12
6     0.26
4. The Bayesian approach in neutron and X-ray scattering spectroscopy
4.1 Neutron and X-ray Brillouin scattering
One of the models commonly used to analyze either neutron or X-ray scattering data is the so-called damped harmonic oscillator (DHO) profile, which we report here below:
where
To fit the experimental data, the model in Eq. (14) needs to be convolved with the instrument resolution, and it can conceivably sit on top of an unknown linear background. Overall, the final model used to approximate the measured line shape is given by:
where
For IXS, Eqs. (14–16) are still formally valid, although the instrument resolution function usually has a slightly more complex shape, which enters the convolution of Eq. (15) either approximated by an analytical model or measured from the signal of an almost elastic scatterer; obviously, in the latter case, the convolution is computed numerically. The final model is further corrupted by additive Gaussian noise, with a variance that, for instance, can be taken proportional to each data point. Thus, the experimental data points are given by:
with
where
The whole parameter vector for the model in Eq. (17) is
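Since the displayed equations above did not survive the text extraction, a numerical sketch may help fix ideas. It assumes the standard DHO parametrization S(w) = (1/pi) Omega^2 Gamma / [(w^2 - Omega^2)^2 + (w Gamma)^2], a Gaussian resolution, a linear background, and noise with variance proportional to each point, as described in the text; all symbol names and numbers here are our assumptions, not the chapter's notation:

```python
import numpy as np

def dho(w, omega0, gamma):
    # Standard damped harmonic oscillator line shape (assumed parametrization).
    return (omega0**2 * gamma / np.pi) / ((w**2 - omega0**2)**2 + (w * gamma)**2)

def model(w, amp, omega0, gamma, b0, b1, res_sigma):
    """DHO plus linear background b0 + b1*w, convolved with a Gaussian resolution."""
    kernel = np.exp(-0.5 * (w / res_sigma) ** 2)
    kernel /= kernel.sum()                       # unit-area discrete kernel
    broadened = np.convolve(amp * dho(w, omega0, gamma), kernel, mode="same")
    return broadened + b0 + b1 * w

rng = np.random.default_rng(3)
w = np.linspace(-20.0, 20.0, 801)                # symmetric frequency grid
clean = model(w, amp=100.0, omega0=8.0, gamma=2.0, b0=0.5, b1=0.0, res_sigma=1.0)
# additive Gaussian noise with variance proportional to each point (see text)
noisy = clean + rng.normal(0.0, 1.0, w.size) * np.sqrt(np.abs(clean)) * 0.05

print(7.0 < abs(w[np.argmax(clean)]) < 9.0)      # True: side peak near Omega = 8
```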
Bayesian inference is, then, based on the joint posterior of the whole parameter vector
Once the convergence is attained, after a burnin period, at each sweep
from the joint posterior
where each row is a particular draw of the whole parameter vector
Once a particular model with, let us say,
In assessing convergence, a valuable tool is provided by trace plots, which show the sampled values of a parameter over the sweeps of the algorithm. Ideally, a trace plot should exhibit rapid up-and-down variation with no long-term trends or drifts. Imagining breaking this plot up into a few horizontal sections, the trace within any section should not look much different from the trace in any other section. This indicates that the algorithm has converged to the posterior distribution of the parameters. Other convergence criteria can be found, for example, in Ref. [23].
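The section-by-section comparison just described can be automated in a few lines. The function below (a homemade check, not a standard diagnostic; names, tolerances, and the two synthetic traces are all our assumptions) splits a trace into sections and flags any section whose mean drifts away from the overall mean:

```python
import numpy as np

rng = np.random.default_rng(4)

def sections_agree(trace, n_sections=5, tol=4.0):
    """Split a (post burn-in) trace into horizontal sections and check that
    every section mean lies within `tol` standard errors of the overall mean."""
    overall = trace.mean()
    for part in np.array_split(trace, n_sections):
        se = part.std(ddof=1) / np.sqrt(part.size)
        if abs(part.mean() - overall) > tol * se:
            return False
    return True

# A converged-looking trace: stationary fluctuation around a fixed value...
good = 5.0 + rng.normal(0.0, 0.2, 10000)
# ...versus a trace with a long-term drift, i.e., not yet converged.
bad = np.linspace(0.0, 5.0, 10000) + rng.normal(0.0, 0.2, 10000)

print(sections_agree(good), sections_agree(bad))  # True False
```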
Figure 4
shows the trace plots of three DHO-mode frequencies
In
Figure 5
, we report an example of Bayesian analysis applied to neutron Brillouin scattering data from liquid gold [12] at different values of the momentum transfer
Even in this straightforward case, however, additional insights can be obtained from the posterior distributions delivered by the Bayesian inference. For example, in
Figure 6
, it can be noticed that, as the value of
To investigate these issues, one can look, for example, at the posterior distributions for the excitation frequency
The shape of these posterior distributions provides a measure of the precision with which the parameter is estimated. For example,
Data discussed in Ref. [25] provide another example of the efficacy of Bayesian inference in enforcing the parsimony principle. Specifically, we refer to the case of an IXS measurement from a suspension of gold nanoparticles in water which has been analyzed with a model similar to the one in Eq. (14), yet with the DHO terms replaced by Dirac delta functions, due to the extremely narrow width of the measured excitations. For all
As a further remark, we would like to stress again that results from Bayesian inference are always to be interpreted in probabilistic terms. For instance, we stated before that the oscillation mode
4.2 Bayesian inference in the time domain
Time correlation function decays can be modeled in terms of an expansion of the intermediate scattering function
where
The value of
Let us consider one of the datasets in Ref. [15], representing the time correlation decay of a polymer solution of polyethylene glycol with a molecular weight of
The most visited model is the one with two exponential functions. The fit is shown in Figure 9.
The values reported in Table 1 clearly show that the posterior distribution of
When we model a spectroscopic dataset through a homogeneous mixture, e.g., a linear combination of exponentials, Lorentzians, or DHO functions, the posterior distribution for the number of components always has at least one maximum, unless the data are so scarcely informative that the posterior for
Let us introduce a quantity which could resemble the
which measures the distance between the experimental data and the best fit determined with the
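A chi-square-like distance of this kind can be computed for increasing numbers of components on a toy dataset. Everything below is invented for illustration (a synthetic two-exponential decay, only loosely inspired by the NSE data of Ref. [15]); the distance drops sharply when the second, true channel is added, and barely improves for a third, superfluous one:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)

# Synthetic correlation decay: a homogeneous mixture of TWO exponentials.
t = np.linspace(0.0, 10.0, 60)
truth = 0.6 * np.exp(-t / 0.5) + 0.4 * np.exp(-t / 5.0)
sigma = 0.01
y = truth + rng.normal(0.0, sigma, t.size)

def mixture(params, k):
    amps, rates = params[:k], np.exp(params[k:])   # log-rates keep rates > 0
    return sum(a * np.exp(-r * t) for a, r in zip(amps, rates))

def chi2(k):
    """Chi-square-like distance of the best k-component fit from the data."""
    x0 = np.concatenate([np.full(k, 1.0 / k),
                         np.log(np.logspace(-1.0, 1.0, k))])  # spread-out guesses
    fit = least_squares(lambda p: (y - mixture(p, k)) / sigma, x0)
    return np.sum(fit.fun**2) / t.size

c1, c2, c3 = chi2(1), chi2(2), chi2(3)
print(c1 > 5 * c2)  # True: adding the second (true) channel improves sharply
```

Beyond the true number of components, the distance flattens out: this is the behavior that, in the Bayesian analysis, translates into a posterior for the number of channels peaked at the parsimonious value.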
Nevertheless,
In summary, we have shown here some of the opportunities offered by a Bayesian inference analysis of experimental results and, in particular, of those obtained with spectroscopic methods. As a possible future development, the application of similar methods to the joint analysis of complementary time- or frequency-resolved measurements appears very promising. Also, we can envisage the use of more informative priors implementing the fulfillment of spectral sum rules or any other known physical constraint of the measurement. We are confident that, in the long run, these methods will improve the rigor of routine data analysis protocols, supporting a probability-based, unprejudiced interpretation of the experimental outcome.
This work used resources of the National Synchrotron Light Source II, a US Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under Contract No. DE-SC0012704. The open access fee was covered by FILL2030, a European Union project within the European Commission's Horizon 2020 Research and Innovation programme under grant agreement No. 731096.
Acknowledgments
We would like to thank U. Bafile, E. Guarini, F. Formisano, and M. Maccarini for the very stimulating discussions.
References
1. Boon JP, Yip S. Molecular Hydrodynamics. Mineola, NY: Dover Publications Inc.; 1980
2. Hansen JP, McDonald IR. Theory of Simple Liquids. New York: Academic Press; 1976
3. Balucani U, Zoppi M. Dynamics of the Liquid State. Vol. 10. Oxford: Clarendon Press; 1995
4. Copley J, Lovesey S. The dynamic properties of monatomic liquids. Reports on Progress in Physics. 1975;38:461
5. Scopigno T, Ruocco G, Sette F. Microscopic dynamics in liquid metals: The experimental point of view. Reviews of Modern Physics. 2005;77:881
6. Berne BJ, Pecora R. Dynamic Light Scattering: With Applications to Chemistry, Biology, and Physics. New York: Dover Publications, Inc.; 2000
7. Fleury PA, Boon JP. Brillouin scattering in simple liquids: Argon and neon. Physical Review. 1969;186:244
8. Cunsolo A, Pratesi G, Verbeni R, Colognesi D, Masciovecchio C, Monaco G, et al. Microscopic relaxation in supercritical and liquid neon. The Journal of Chemical Physics. 2001;114:2259
9. Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. London: Chapman & Hall/CRC; 1996
10. Tierney L. Markov chains for exploring posterior distributions. The Annals of Statistics. 1994;22:1701
11. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711
12. De Francesco A, Guarini E, Bafile U, Formisano F, Scaccia L. Bayesian approach to the analysis of neutron Brillouin scattering data on liquid metals. Physical Review E. 2016;94:023305
13. De Francesco A, Scaccia L, Maccarini M, Formisano F, Zhang Y, Gang O, et al. Damping off terahertz sound modes of a liquid upon immersion of nanoparticles. ACS Nano. 2018;12:8867
14. De Francesco A, Scaccia L, Formisano F, Maccarini M, De Luca F, Parmentier A, et al. Shaping the terahertz sound propagation in water under highly directional confinement. Physical Review B. 2020;101:05436
15. De Francesco A, Scaccia L, Lennox RB, Guarini E, Bafile U, Falus P, et al. Model-free description of polymer-coated gold nanoparticle dynamics in aqueous solutions obtained by Bayesian analysis of neutron spin echo data. Physical Review E. 2019;99:052504
16. Parmentier A, Maccarini M, De Francesco A, Scaccia L, Rogati G, Czakkel O, et al. Neutron spin echo monitoring of segmental-like diffusion of water confined in the cores of carbon nanotubes. Physical Chemistry Chemical Physics. 2019;21:21456
17. Bernardo JM. Philosophy of statistics. In: Bandyopadhyay PS, Forster MR, editors. Handbook of the Philosophy of Science. Vol. 7. Amsterdam: North-Holland; 2011. pp. 263-306
18. Berger JO, Jefferys WH. The application of robust Bayesian analysis to hypothesis testing and Occam's razor. Journal of the Royal Statistical Society, Series A. 1992;1:17
19. Jefferys WH, Berger JO. Ockham's razor and Bayesian analysis. American Scientist. 1992;80:64
20. MacKay D. Information Theory, Inference and Learning Algorithms. Cambridge: Cambridge University Press; 2003
21. Chib S, Greenberg E. Understanding the Metropolis-Hastings algorithm. The American Statistician. 1995;49:327
22. Roberts GO, Gelman A, Gilks WR. Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability. 1997;7:110
23. Cowles MK, Carlin BP. Markov chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association. 1996;91:883
24. Guarini E, Bafile U, Barocchi F, De Francesco A, Farhi E, Formisano F, et al. Dynamics of liquid Au from neutron Brillouin scattering and ab initio simulations: Analogies in the behavior of metallic and insulating liquids. Physical Review B. 2013;88:104201
25. De Francesco A, Scaccia L, Maccarini M, Formisano F, Guarini E, Bafile U, et al. Interpreting the terahertz spectrum of complex materials: The unique contribution of the Bayesian analysis. Materials. 2019;12:2914