In this chapter, we discuss basic ideas on how to structure and study Bayesian methods for the standard models of dark energy and how to implement them within the architecture of deep learning processes.
- dark energy
- Bayesian analyses
- machine learning
- cosmological parameters
The dark sector of the universe has long been a subject of study for cosmologists striving to understand the world around us in its entirety. The composition of the current universe is an age-old inquiry that these researchers have probed. And while we do have estimates of the likely percentages of baryonic matter, dark matter, and dark energy at 5, 27 and 68%, respectively, researchers continue to refine these estimates and to optimise the computational expense of the statistical methods employed to analyse the available cosmological data.
These thoughts have opened the path for the following chapter, in which we will discuss everything from the standard dark energy models proposed to explain the cosmic acceleration to the design of a numerical architecture aimed at understanding the constraints on the cosmological parameters that can describe the current universe and its effects.
2. Dark energy as a solution to the cosmic acceleration
A highlight in observational cosmology is the origin and nature of the cosmic accelerated expansion. The standard cosmological model that is consistent with current cosmological observations is the so-called concordance model, or $\Lambda$CDM. According to this scenario, the observed accelerating expansion is related to the repulsive gravitational force of a Cosmological Constant $\Lambda$, with constant energy density and negative pressure. This proposal has been the backbone of the standard cosmology since the nineties but, simple as it is, it has a couple of theoretical problems, two of them being the fine-tuning argument and the coincidence problem [1, 2]. In order to solve, or at least relax, these problems, some proposals have led to alternative scenarios that modify general relativity (GR) or consider a landscape with a dynamical dark energy. It is in this way that dark energy emerges as a cosmological solution, since it can be described as a fluid parameterised by an equation of state (EoS), which can be written in terms of the redshift, $w(z)$. So far, the properties of this EoS remain under-researched. Just to mention a few, there is a zoo of proposals on dark energy parameterisations discussed in the literature (see, e.g., [3, 4, 5, 6, 7, 8, 9]), ranging from Taylor-like series parameterisations to dynamical ones that can provide oscillatory behaviours [10, 11, 12, 13].
Nowadays, the techniques to discriminate between models and confront them with $\Lambda$CDM are based on the calculation of the constraints on the EoS free parameter(s) of the models. This methodology has been carried out using observables that can show the cosmic acceleration, such as Type Ia supernovae (SNeIa), baryon acoustic oscillations (BAO), the cosmic microwave background (CMB), the weak lensing spectrum, etc. The relevance of using these observations is due to the precision with which dark energy can be probed. Currently, some measurements, such as Pantheon from supernovae and BOSS, just to cite a few, point out a way to constrain these EoS parameters. These observations allow deviations from the $\Lambda$CDM model, which are usually parameterised by the EoS free parameters [16, 17, 18, 19, 20]. In past years, there have been many observations related to the verification of the cosmic acceleration, for example, from Union 2.1 to the Joint LightCurve Analysis [21, 22]. The statistics have improved due to the growing density of data of this kind of supernovae.
3. On how to model dark energy
One of the first steps to understand the behaviour of the cosmic acceleration is to note that we require an energy density with negative pressure at late times. To achieve this, we need the ratio between the pressure and the energy density to be negative, i.e., $w \equiv p/\rho < 0$. In order to develop the evolution equations for a universe with this kind of fluid, we start by introducing in the Einstein equations a Friedmann-Lemaitre-Robertson-Walker metric to obtain the Friedmann and Raychaudhuri equations for a spatially flat universe:

$$H^2(z) = \frac{8\pi G}{3}\left[\rho_{m,0}(1+z)^3 + \rho_{DE}(z)\right], \tag{1}$$

$$\dot{H} = -4\pi G\left[\rho_{m,0}(1+z)^3 + \rho_{DE}(z) + p_{DE}(z)\right], \tag{2}$$

where $H(z)$ is the Hubble parameter in terms of the redshift $z$, $G$ is the gravitational constant and the subindex $0$ indicates the present-day values for the Hubble parameter and matter densities.
From Eq. (2), it is possible to obtain the energy conservation equation; in that way, the energy density of the non-relativistic matter is $\rho_m = \rho_{m,0}(1+z)^3$, and the conservation equation for the dark energy density is given by:

$$\dot{\rho}_{DE} + 3H\left[1 + w(z)\right]\rho_{DE} = 0,$$
and the dark energy density can be modulated as $\rho_{DE}(z) = \rho_{DE,0}\,f(z)$, where $f(z)$ can be written as:

$$f(z) = \exp\left[3\int_0^z \frac{1+w(z')}{1+z'}\,dz'\right].$$
If we assume that the energy-momentum tensor (on the right-hand side of Einstein's equations) is a perfect fluid (without viscosity or stress effects), i.e., $T_{\mu\nu} = (\rho + p)u_\mu u_\nu + p\,g_{\mu\nu}$, the form of the EoS can be restricted to be:

$$w(z) = \frac{p(z)}{\rho(z)}. \tag{5}$$
Now, the behaviour of the latter is restricted directly by the form of $w(z)$, which can give a description of the Hubble function (which can be normalised by the Hubble constant $H_0$). For example, in the case of quiescence models ($w = \mathrm{const.}$), the solution of the conservation equation is $\rho_{DE} = \rho_{DE,0}(1+z)^{3(1+w)}$. If we consider the case of the cosmological constant ($w = -1$), then $\rho_{DE} = \rho_{DE,0} = \mathrm{const}$.
As an extension, with the latter equations we can calculate the dynamical age of the universe using the following relationship:

$$\frac{dt}{dz} = -\frac{1}{(1+z)H(z)}.$$

Integrating, we can obtain:

$$t_0 = \int_0^{\infty}\frac{dz}{(1+z)H(z)}. \tag{8}$$
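The dynamical age of the universe, Eq. (8), can be evaluated numerically. The sketch below does so for a flat $\Lambda$CDM background; the parameter values ($\Omega_m = 0.3$, $H_0 = 70$ km/s/Mpc) and the truncation redshift are illustrative assumptions, not fits.

```python
import math

def age_of_universe_gyr(omega_m=0.3, h0=70.0, z_max=1e4, n=100_000):
    """Evaluate t0 = int_0^inf dz / [(1+z) H(z)] for a flat LCDM background.

    The integral is truncated at z_max, where the integrand is negligible,
    and evaluated with a simple trapezoidal rule. Radiation is ignored,
    which is acceptable at the percent level for this illustration.
    """
    total = 0.0
    prev = 1.0  # integrand at z = 0: 1 / [(1+0) * E(0)] with E(0) = 1
    dz = z_max / n
    for i in range(1, n + 1):
        z = i * dz
        e_z = math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)  # H(z)/H0
        cur = 1.0 / ((1.0 + z) * e_z)
        total += 0.5 * (prev + cur) * dz
        prev = cur
    hubble_time_gyr = 977.79 / h0  # 1/H0 in Gyr when H0 is in km/s/Mpc
    return total * hubble_time_gyr

print(round(age_of_universe_gyr(), 1))  # close to the familiar ~13.5 Gyr
```

For these illustrative values the dimensionless integral $H_0 t_0 \approx 0.96$, reproducing the well-known age of order 13-14 Gyr.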
From here, we can set a functional form of $w(z)$ in which the contribution of the dark energy density to Eq. (1) goes into a region of negative pressure. The physics behind this behaviour has an impact on the evolution of dark energy through the dynamical age of the universe, Eq. (8). When we compare several theoretical models in the light of observations, a model approach is essential. As we mentioned in the “Introduction” section, to obtain a dark energy model with late-time negative pressure, we can think of two scenarios:
a quiescence model, which finds wide application in tracking the slow-roll condition of scalar fields and demands a constant value of $w$. As an example, for a flat universe and according to the Planck data, the dark energy EoS parameter is found to be consistent with the cosmological constant. These data also constrain the curvature parameter at 2$\sigma$, finding it to be very close to 0.
a kinessence model, where the EoS is a function of the redshift, $w = w(z)$. For this case, several dark energy models with different parameterisations of $w(z)$ have been discussed in the literature.
4. Standard dark energy models
A general form for such parameterisations can be written as

$$w(z) = \sum_n w_n\,x_n(z),$$

where $w_n$ are constants and $x_n(z)$ are functions of the redshift $z$, or of the scale factor $a$. As brief examples, in this section we present three models that have bidimensional forms, in the sense that they depend on only two free parameters. A first target is to express the exact form of the Hubble function using a specific expression for $w(z)$ given by Eq. (5). Once integrated, we can normalise this function by the Hubble parameter $H_0$; from now on, we call this normalised function of the redshift $E(z) \equiv H(z)/H_0$. The second target is to test these equations with the current astrophysical data available.
4.1 Lambda cold dark matter-redshift parameterisation ($\Lambda$CDM)
This model is given by a constant EoS, $w = -1$, so that:

$$E^2(z) = \Omega_m(1+z)^3 + 1 - \Omega_m,$$

where $\Omega_m$ represents the matter density parameter (including the non-relativistic baryonic and dark matter terms). As is well known in the literature, this standard model provides a good fit for a large number of observational data surveys without addressing the important theoretical problems mentioned above.
4.2 Linear-redshift parameterisation (LR)
This model is given by:

$$w(z) = w_0 + w_1 z,$$

from which we can recover the $\Lambda$CDM model with $w_0 = -1$ and $w_1 = 0$. We notice immediately that, due to the linear term in $z$, this proposal diverges at high redshift and consequently yields strong constraints on $w_1$ in studies involving data at high redshifts, e.g., when we use CMB data.
As usual, we can use the latter to obtain an expression for the normalised Hubble function as:

$$E^2(z) = \Omega_m(1+z)^3 + (1-\Omega_m)(1+z)^{3(1+w_0-w_1)}e^{3w_1 z}.$$
4.3 Chevallier-Polarski-Linder parameterisation (CPL)
As a consequence of the LR parameterisation divergence, Chevallier, Polarski and Linder proposed a simple parameterisation [33, 34] that can be represented by two parameters, given by the present value of the EoS, $w_0$, and its overall time evolution, $w_a$. The proposal is given by the expression:

$$w(z) = w_0 + w_a\frac{z}{1+z},$$
and its evolution is

$$E^2(z) = \Omega_m(1+z)^3 + (1-\Omega_m)(1+z)^{3(1+w_0+w_a)}\exp\left[-\frac{3w_a z}{1+z}\right].$$

As we can notice, the divergence at high redshift relaxes, but this ansatz still has some problems in specific low-redshift ranges of observations.
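As a sanity check of the CPL expressions, the following sketch implements $w(z)$ and the corresponding $E^2(z)$ for a flat universe; the parameter values in the example are illustrative only, and for $w_0 = -1$, $w_a = 0$ the model must reduce to $\Lambda$CDM.

```python
import math

def w_cpl(z, w0, wa):
    """CPL equation of state w(z) = w0 + wa * z / (1 + z)."""
    return w0 + wa * z / (1.0 + z)

def e2_cpl(z, omega_m, w0, wa):
    """E^2(z) = H^2/H0^2 for a flat universe with a CPL dark energy fluid,
    using f(z) = (1+z)^{3(1+w0+wa)} exp(-3 wa z / (1+z))."""
    f = (1.0 + z) ** (3.0 * (1.0 + w0 + wa)) * math.exp(-3.0 * wa * z / (1.0 + z))
    return omega_m * (1.0 + z) ** 3 + (1.0 - omega_m) * f

# With w0 = -1 and wa = 0 the expressions reduce to the LCDM case:
print(w_cpl(0.5, -1.0, 0.0))        # -1.0
print(e2_cpl(0.0, 0.3, -1.0, 0.0))  # 1.0 (up to float rounding)
```

Note that $w(z)$ stays bounded, tending to $w_0 + w_a$ as $z \to \infty$, which is precisely what cures the high-redshift divergence of the linear-redshift form.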
5. Estimating the cosmological parameters
After we have defined a specific cosmological model, we can test it using astrophysical observations. The methodology can be described as the calculation of the usual $\chi^2$ statistic, followed by MCMC chain runs around certain observational point(s) to obtain the best-fit parameter(s) of this process. Parameter estimation is usually done by computing the so-called likelihood function $\mathcal{L}$ for several values of the cosmological parameters. For each point in the parameter space, the likelihood function gives the probability of obtaining the observational data that was obtained if the hypothesis parameters had the given values (or priors). For example, the standard cosmological model $\Lambda$CDM is described by six parameters, which include the amount of dark matter and dark energy in the universe as well as its expansion rate $H_0$. Using the CMB data (the most accurate data that we understand well so far), a likelihood function can be constructed. The information given by $\mathcal{L}$ can tell us which values of these parameters are more likely, i.e. by probing many different values. Therefore, we are able to determine the values of the parameters and their uncertainties via error propagation over the free parameters of the model.
Now, the following question is: what kind of astrophysical surveys can we use to test the cosmological models? In the next sections we describe the surveys most commonly employed to analyse the cosmic acceleration. It is important to mention that these surveys differ depending upon their own nature. We have three types of observations, classified as: standard candles (e.g., supernovae, whose characteristic function is the luminosity distance), standard rulers (e.g., baryon acoustic oscillations, whose characteristic function is the angular/volume distance), and standard sirens (e.g., gravitational waves, which can be described by frequencies or chirp masses depending on the observation) [35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]. The set of all of them can deliver precise statistics, but separately each of them has intrinsic problems due to its physical definition. For supernovae, the luminosity distance contains in its definition an integral of the cosmological model; therefore, when we perform the error propagation, the uncertainty is high. This disadvantage can be compensated by the large population of data points in the sampler. On the other hand, the uncertainty is smaller for standard rulers in comparison to supernovae. For this case, the definition of the angular distance does not include integrals. The price we pay for using this kind of sampler is that the data population is very small (e.g., from surveys like BOSS or CMASS, we have only seven data points). Moving forward, the observation of gravitational wave standard sirens could be developed into a powerful new cosmological test, since they can play an important role in breaking the parameter degeneracies formed by other observations such as the ones mentioned. Therefore, gravitational wave standard sirens are of great importance for the future accurate measurement of cosmological parameters. In this part of the chapter, we are only going to develop the use of the first two kinds of observations.
6. Supernovae sampler
Since the nineties, when their discovery gave proof of the current cosmic acceleration, Type Ia supernovae (SNIa) surveys have been changing, giving us a large population of observations; from Union 2.1 to the Joint LightCurve Analysis [21, 22], the data sets have been increasing in number of observations and in redshift range. Currently, the Pantheon sampler, which consists of a total of 1048 Type Ia supernovae (SNIa) compressed into 40 bins, is the largest spectroscopically confirmed SNIa sample to date. This characteristic makes this sample attractive for constraining with considerable precision the free cosmological parameters of a specific model.
SNIa can give determinations of the distance modulus $\mu$, whose theoretical prediction is related to the luminosity distance $d_L$ according to:

$$\mu(z) = 5\log_{10}\left[\frac{d_L(z)}{\mathrm{Mpc}}\right] + 25,$$

where the luminosity distance is given in units of Mpc. In the standard statistical analysis, one adds to the distance modulus the nuisance parameter $M$, an unknown offset sum of the supernova absolute magnitude (and other possible systematics), which is degenerate with $H_0$.
Now, the statistical analysis of this sample rests on the definition of the modulus distance as:

$$\mu(z_j, \theta) = 5\log_{10}\left[D_L(z_j, \theta)\right] + M,$$

where $D_L(z)$ is the Hubble-free luminosity distance:

$$D_L(z) = (1+z)\int_0^z\frac{dz'}{E(z')}.$$
With this notation, we expose the different roles of the several cosmological parameters appearing in the equations: the matter density parameter appears separated, as it is assumed to be fixed to a prior value, while $\theta$ denotes the EoS parameters (e.g., $w_0$, $w_1$). These latter are the parameters that we will constrain with the data. The best fits will be obtained by minimising the quantity [46, 47, 48, 49, 50]:

$$\chi^2 = \sum_j \frac{\left[\mu(z_j, \theta) - \mu_{\mathrm{obs}}(z_j)\right]^2}{\sigma_{\mu,j}^2},$$

where $\sigma_{\mu,j}^2$ are the measurement variances. The nuisance parameter $M$ encodes the Hubble parameter and the absolute magnitude and has to be marginalised over.
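The marginalisation over the constant offset $M$ can be done analytically, which is a common trick in supernova analyses: the marginalised statistic takes the form $\chi^2_{\mathrm{marg}} = A - B^2/C$, with $A$, $B$, $C$ simple sums over the data. A minimal sketch with mock data follows; the redshifts, uncertainties and fiducial $\Omega_m$ are illustrative assumptions, not the Pantheon sample.

```python
import math

def mu_theory(z, omega_m, n=1000):
    """Hubble-free distance modulus 5 log10[(1+z) * int_0^z dz'/E(z')] for flat LCDM.

    The additive constant (the nuisance M) is deliberately left out,
    since it is removed by the marginalisation below.
    """
    dz = z / n
    integral = sum(dz / math.sqrt(omega_m * (1 + (i + 0.5) * dz) ** 3 + 1 - omega_m)
                   for i in range(n))
    return 5.0 * math.log10((1.0 + z) * integral)

def chi2_marginalised(zs, mus, sigmas, omega_m):
    """chi^2 with the offset M marginalised analytically: chi^2 = A - B^2/C."""
    a = b = c = 0.0
    for z, mu, s in zip(zs, mus, sigmas):
        delta = mu - mu_theory(z, omega_m)
        a += delta ** 2 / s ** 2
        b += delta / s ** 2
        c += 1.0 / s ** 2
    return a - b ** 2 / c

# Mock data built from omega_m = 0.3 plus an arbitrary constant offset:
zs = [0.1, 0.3, 0.5, 0.8, 1.0]
mus = [mu_theory(z, 0.3) + 42.0 for z in zs]  # the offset mimics the nuisance M
sigmas = [0.1] * len(zs)
print(chi2_marginalised(zs, mus, sigmas, 0.3))  # ~0: the constant offset drops out
```

The check at the end illustrates the degeneracy in practice: any constant shift of the data (absolute magnitude plus $H_0$) leaves the marginalised $\chi^2$ untouched, while a wrong $\Omega_m$ does not.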
From now on, we will assume spatial flatness; therefore, the luminosity distance is related to the comoving distance $D(z)$ via the equation

$$d_L(z) = \frac{c}{H_0}(1+z)D(z), \tag{15}$$

where $c$ is the speed of light, so that, using Eq. (15), we can obtain

$$D(z) = \frac{H_0}{c}\,\frac{d_L(z)}{1+z}.$$
The normalised Hubble function can be obtained by taking the inverse of the derivative of $D(z)$ with respect to the redshift, $E(z) = 1/D'(z)$. A usual alternative, instead of using the full set of parameters for this sampler, is to use the Pantheon plugin for CosmoMC to constrain cosmological models (something similar to the case of the Joint Light Curve Analysis sampler).
Since we are taking the nuisance parameter into account in the sample, we choose the respective value of $M$ from a statistical analysis of the $\Lambda$CDM model with the Pantheon sample, obtained by fixing $H_0$ to the Planck value. It is common to perform this kind of fit using computational tools that can run standard MCMC chains. In cosmology, at least at the time this text is being written, several codes have been implemented in order to perform the statistical fit of this parameter. The reader can explore the tool called MontePython and run a standard MCMC for the model of their preference. As an example, if we run a $\Lambda$CDM model with this supernovae sample, a mean value for the matter density parameter is obtained.
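To make the procedure concrete without the full machinery of MontePython, the sketch below runs a minimal Metropolis-Hastings chain for $\Omega_m$ on mock supernova-like data. Everything here (redshifts, noise level, proposal width, chain length) is an illustrative assumption, not the Pantheon analysis.

```python
import math, random

def mu_th(z, om, n=200):
    """Hubble-free distance modulus for flat LCDM (midpoint rule)."""
    dz = z / n
    d = sum(dz / math.sqrt(om * (1 + (i + 0.5) * dz) ** 3 + 1 - om) for i in range(n))
    return 5.0 * math.log10((1.0 + z) * d)

def chi2(om, data, sigma=0.1):
    return sum(((mu - mu_th(z, om)) / sigma) ** 2 for z, mu in data)

def metropolis(data, n_steps=2000, step=0.02, seed=7):
    """Minimal Metropolis-Hastings chain for Omega_m with a flat prior on (0, 1)."""
    rng = random.Random(seed)
    om = 0.5                                        # starting point
    c2 = chi2(om, data)
    chain = []
    for _ in range(n_steps):
        prop = om + rng.gauss(0.0, step)
        if 0.01 < prop < 0.99:                      # flat prior support
            c2_prop = chi2(prop, data)
            # accept with probability min(1, exp(-(chi2_new - chi2_old)/2))
            if rng.random() < math.exp(min(0.0, 0.5 * (c2 - c2_prop))):
                om, c2 = prop, c2_prop
        chain.append(om)
    return chain

# Mock supernova-like points drawn around a fiducial Omega_m = 0.3 model
random.seed(1)
data = [(z, mu_th(z, 0.3) + random.gauss(0.0, 0.1))
        for z in (0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9, 1.1)]
chain = metropolis(data)
mean_om = sum(chain[400:]) / len(chain[400:])  # discard burn-in
```

The chain mean after burn-in plays the role of the "mean value obtained" in the text; a production analysis would of course use the real sample, convergence diagnostics and a code such as MontePython.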
7. Baryon acoustic oscillation sampler
As a standard ruler, this kind of astrophysical observation can contribute important features by comparing the data of the sound horizon today to the sound horizon at the time of recombination (extracted from the CMB anisotropy data). Usually, the baryon acoustic distances are given as a combination of the angular scale and the redshift separation.
To define these quantities we require a relationship via the ratio:

$$d_z \equiv \frac{r_s(z_d)}{D_V(z)},$$

where $r_s(z_d)$ is the comoving sound horizon at the baryon dragging epoch,

$$r_s(z) = c\int_z^{\infty}\frac{c_s(z')}{H(z')}\,dz',$$

and $z_d$ is the drag epoch redshift, with $c_s^2 = c^2/\left\{3\left[1 + \frac{3\Omega_{b,0}}{4\Omega_{\gamma,0}}(1+z)^{-1}\right]\right\}$ as the sound speed, with $\Omega_{b,0}$ and $\Omega_{\gamma,0}$ the present values of the baryon and photon density parameters, respectively.
We define the dilation scale $D_V$ as:

$$D_V(z) = \left[(1+z)^2 D_A^2(z)\,\frac{cz}{H(z)}\right]^{1/3},$$

where $D_A(z)$ is the angular diameter distance given by

$$D_A(z) = \frac{1}{1+z}\int_0^z \frac{c\,dz'}{H(z')}.$$
Using the comoving sound horizon, we can relate the distance ratio $d_z$ with the expansion parameter $h$ (defined such that $H_0 \equiv 100\,h\ \mathrm{km\,s^{-1}\,Mpc^{-1}}$) and the physical densities $\Omega_m h^2$ and $\Omega_b h^2$. As we mentioned above, unfortunately, so far we have a very low data population for this sampler. Moreover, as an example for this text, we employ compilations of three current surveys: from the six-degree Field Galaxy Survey (6dFGS), from the Sloan Digital Sky Survey (SDSS) and from the Baryon Oscillation Spectroscopic Survey (BOSS) with high-redshift CMASS.
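The dilation scale defined above is straightforward to evaluate once a background model is fixed. The sketch below computes $D_V(z)$ for a flat $\Lambda$CDM background; the values $H_0 = 70$ km/s/Mpc and $\Omega_m = 0.3$ are illustrative assumptions, not the fiducial cosmology of any of the surveys mentioned.

```python
import math

C = 299_792.458  # speed of light in km/s

def hubble(z, h0=70.0, om=0.3):
    """H(z) in km/s/Mpc for a flat LCDM background (illustrative parameters)."""
    return h0 * math.sqrt(om * (1.0 + z) ** 3 + 1.0 - om)

def comoving_distance(z, n=2000):
    """Comoving distance c * int_0^z dz'/H(z') in Mpc (midpoint rule)."""
    dz = z / n
    return sum(C * dz / hubble((i + 0.5) * dz) for i in range(n))

def dilation_scale(z):
    """D_V(z) = [(1+z)^2 D_A(z)^2 * c z / H(z)]^(1/3) in Mpc; for a flat
    universe (1+z) D_A(z) equals the comoving distance."""
    dc = comoving_distance(z)
    return (dc ** 2 * C * z / hubble(z)) ** (1.0 / 3.0)

# Dilation scale at the SDSS effective redshift z = 0.35, in Mpc:
print(round(dilation_scale(0.35)))
```

Dividing the sound horizon $r_s(z_d)$ by this quantity gives the theoretical $d_z$ to confront with each survey's measurement.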
We can also add to the full sample three correlated measurements of $d_z$ from the WiggleZ survey, at $z = 0.44$, $0.6$ and $0.73$, which has the inverse covariance matrix:
In order to perform the $\chi^2$-statistic, we define the proper function for the BAO data as

$$\chi^2_{\mathrm{BAO}}(\theta) = \Delta\vec{d}^{\,T}\,\mathbf{C}^{-1}\,\Delta\vec{d}, \tag{27}$$

where $\Delta\vec{d}$ is given as

$$\Delta\vec{d} = d_z^{\,\mathrm{obs}} - d_z(z;\theta).$$

Then, the total $\chi^2$ is directly obtained by the sum of the individual quantities by using Eq. (27) in

$$\chi^2_{\mathrm{Total}} = \chi^2_{\mathrm{6dFGS}} + \chi^2_{\mathrm{SDSS}} + \chi^2_{\mathrm{BOSS}} + \chi^2_{\mathrm{WiggleZ}}.$$
8. How to deal with Bayesian statistics
Now, we are ready to introduce how to extrapolate the above frequentist analyses to the Bayesian field. The important difference between the two statistics is that in the first one we are dedicated to working with a standard fit, while in the second one we take into account the following idea: given a specific set of cosmological values (the priors), what is the probability of a second set of values fitting the hypothesis [57, 58, 59, 60]?
The above idea is what we call Bayesian model selection, whose methodology consists in describing the relationship between the cosmological model, the astrophysical data and the prior information about the free parameters. Using Bayes' theorem, we can update the prior model probability to the posterior model probability. However, when we compare models, the evidence function is used to evaluate the model's performance using the data at hand.
We define the evidence function as:

$$E = \int \mathcal{L}(\theta)\,P(\theta)\,d\theta, \tag{30}$$

where $\theta$ is the vector of free parameters (which, for the dark energy models presented in the above sections, will be given by the EoS free parameters) and $P(\theta)$ is the prior distribution of these parameters.
From a computational point of view, and due to the large population of data and the model used, Eq. (30) can be difficult to calculate, since the integration can consume too much computational time when the parametric phase space is large. Nevertheless, even though several methods exist [62, 63], in this text we present tests with a nested sampling algorithm, which has proven practicable in cosmology applications.
Once we obtain the evidence, we can calculate the logarithm of the Bayes factor between two models, $\ln B_{ij} = \ln(E_i/E_j)$, where the reference model ($j$), the one with the highest evidence, can be the $\Lambda$CDM model, on which we can impose a flat prior, i.e., we can use an exact value of its parameter.
The interpretation of the results of this ratio can be described by a scale known as Jeffreys's scale, which can easily be explained as follows:

if $\ln B_{ij} < 1$, there is no significant preference for the model with the highest evidence;

if $1 < \ln B_{ij} < 2.5$, the preference is substantial;

and, if $2.5 < \ln B_{ij} < 5$, it is strong; if $\ln B_{ij} > 5$, it is decisive.
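The scale can be wrapped in a small helper function; the thresholds 1, 2.5 and 5 used below are the commonly quoted version of Jeffreys's scale in cosmology, and should be treated as an assumption if a different convention is preferred.

```python
def jeffreys_interpretation(ln_b):
    """Interpret the log-Bayes factor |ln B| on a Jeffreys-type scale."""
    b = abs(ln_b)
    if b < 1.0:
        return "not significant"
    if b < 2.5:
        return "substantial"
    if b < 5.0:
        return "strong"
    return "decisive"

print(jeffreys_interpretation(0.4))  # not significant
print(jeffreys_interpretation(3.1))  # strong
```

Taking the absolute value reflects that the factor can favour either model, depending on which evidence appears in the numerator.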
9. About deep learning in cosmology
Although Bayesian evidence remains the preferred method in the literature compared with information criteria and Gaussian processes, a complete Bayesian inference for model selection (that is, a scenario where we can discriminate a pivot model from a hypothesis) is very computationally expensive and often suffers from multi-modal posteriors and parameter degeneracies. As we pointed out in the previous section, the calculation of the evidence leads to a large time consumption to obtain the final result.
As the study of the Large Scale Structure (LSS) of the universe indicates, all our knowledge relies on state-of-the-art cosmological simulations to address a number of questions by constraining the cosmological parameters at hand using Bayesian techniques. Moreover, due to the computational complexity of these simulations, some studies remain computationally infeasible for the foreseeable future. It is at this point where computational techniques such as machine learning can have a number of important uses, even for trying to understand our universe.
The idea behind machine learning is based on considering a neural network with a complex combination of neurons organised in nested layers. Each of these neurons implements a function that is parameterised by a set of weights $w$. Every layer of a neural network thus transforms one input vector (or tensor, depending on the dimension) into another through a differentiable function. Theoretically, given a neuron $j$ that receives an input vector $x$, and the choice of an activation function $\sigma$, the output of the neuron can be computed as:

$$h_j = \sum_i w_{ji}\,x_i + b_j, \qquad y_j = \sigma(h_j),$$

where $h_j$ is called the hidden state, $\sigma$ is the activation function, and $y_j$ is the output.
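This forward computation can be sketched in a few lines; the weights and inputs below are made-up illustrative values, and $\tanh$ is one common choice of activation.

```python
import math

def neuron_output(x, w, b, activation=math.tanh):
    """Single neuron: hidden state h = sum_i w_i x_i + b, output y = sigma(h)."""
    h = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(h)

def dense_layer(x, weights, biases, activation=math.tanh):
    """A layer transforms one input vector into another: one neuron per weight row."""
    return [neuron_output(x, w, b, activation) for w, b in zip(weights, biases)]

# A two-neuron layer acting on a two-dimensional input (made-up weights):
y = dense_layer([0.5, -1.0], weights=[[0.1, 0.4], [-0.3, 0.2]], biases=[0.0, 0.1])
```

Stacking several such layers, with the weights adjusted by gradient descent on a cost function, is exactly the "nested layers" picture described above.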
The goal is to introduce a set of data in order to train this array, so that the architecture can learn to finally give an output set of data. For example, the network can learn the distribution of the distance moduli in the dark energy models; we then feed the astrophysical samplers (surveys) to the network to reconstruct the dark energy model and finally discriminate the most probable model.
Moreover, while neural networks can learn complex nested representations of the data, allowing them to achieve impressive performance results, this also limits our understanding of the model learned by the network itself. The choice of an architecture can have an important influence on the performance of the neural network. Some design choices have to be made concerning the number and the type of layers, as well as the number and the size of the filters used in each layer. A convenient way to make these choices is typically through experimentation. The size of the network depends on the number of training sets, as networks with a large number of cosmological parameters are likely to overfit if not enough training sets are available.
At the moment these lines are being written, the strong interest in this kind of algorithm is not only bringing new opportunities for data-driven cosmological discovery, but will also present new challenges for adopting machine learning (or, in our case, a subset of this field, deep learning) methodologies and for understanding the results when the data are too complex for traditional model development and fitting with statistics. A few proposals in this area have been made to exploit deep learning methods for measurements of cosmological parameters from density fields and for future large-scale photometric surveys.
10. Deep learning for dark energy
The first target, in order to start training on an astrophysical survey, is to design an architecture. The objective function of a neural network can have many unstable points and local minima, which makes the optimisation process very difficult; in real scenarios [70, 71], high levels of noise degrade the training data and typically result in optimisation landscapes with more local minima, which increases the difficulty of training the neural network. It can thus be desirable to start optimising the neural network using noise-free data, which typically yields smoother landscapes. As an example, in Figure 1, we present a standard network that takes an image of a cosmological simulation (the data), passes it through an array of several layers and finally extracts the output cosmological parameter values [72, 73]. Each neuron uses a Bayesian process to compute the error propagation, as is done in the standard inference analyses.
Step 1. Construction of the neural network. For a Recurrent Neural Network method, we can choose an architecture with one layer and a certain number of neurons (e.g., you can start with 100 for a supernovae sampler).
Step 2. Organising the data. We need to sort the sampler from lower to higher redshift in the observations. Afterwards, we re-arrange our data using the number of steps (e.g., try with four steps for a supernovae sampler).
Step 3. Computing the Bayesian training. Due to the ease with which neural networks overfit, it is important to choose a mode of regularisation. With a standard Bayesian method to compute the evidence, the algorithm can calculate errors via regularisation methods. Finally, over the cost function we can use the Adam optimiser.
Step 4. Training the entire architecture. It is suitable to consider a high number of epochs (e.g., for a sampler such as Pantheon, you can try with 1000 epochs per layer). After the training, it is necessary to read the model and apply the same dropout several more times to the initial model. The result of this step is the construction of the confidence regions.
Step 5. Computing the modulus distance for each cosmological model. Using the definitions of $\mu(z)$, we can compute it by using a specific dark energy equation of state in terms of $w(z)$ and then integrating.
Step 6. Computing the best fits. Finally, the output values can be obtained by using the training data as a simulated sample. We use the publicly available codes CLASS and Monte Python to constrain the models, as is standard in usual Bayesian cosmology.
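Step 4 above builds confidence regions by rerunning the trained network with dropout still active at prediction time (a Monte Carlo dropout-style procedure). The sketch below illustrates the idea on a toy network; the weights, layer sizes and dropout rate are made-up illustrative values, not a trained model.

```python
import random

def stochastic_forward(x, weights, rng, p_drop=0.2):
    """One forward pass where each hidden unit is dropped with probability p_drop."""
    hidden = []
    for w_row in weights["hidden"]:
        if rng.random() < p_drop:
            hidden.append(0.0)                      # unit dropped
        else:
            h = sum(w * xi for w, xi in zip(w_row, x))
            hidden.append(h if h > 0.0 else 0.0)    # ReLU activation
    return sum(w * h for w, h in zip(weights["out"], hidden))

def mc_dropout(x, weights, n_passes=200, seed=3):
    """Repeat the stochastic pass: the mean is the prediction and the spread
    of the outputs sketches the confidence region of Step 4."""
    rng = random.Random(seed)
    outs = [stochastic_forward(x, weights, rng) for _ in range(n_passes)]
    mean = sum(outs) / n_passes
    std = (sum((o - mean) ** 2 for o in outs) / n_passes) ** 0.5
    return mean, std

# Toy weights standing in for a trained network (illustrative values only)
weights = {"hidden": [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]],
           "out": [0.7, -0.5, 0.2]}
mean, std = mc_dropout([1.0, 2.0], weights)
```

In the recipe, the spread obtained this way over the reconstructed distance moduli is what gets propagated into the confidence regions fed to CLASS and Monte Python.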
The results of this recipe can be seen in Figure 2.
In this chapter, we discussed how to derive the equations of state for a specific dark energy model. Also, we studied the standard models of dark energy in order to probe the cosmic acceleration according to the current data available in the literature. It is important to remark that every Bayesian statistical analysis performed will depend solely on the data used to develop it: the more the data, the better the statistics. We therefore expect that future surveys will improve the constraints on the cosmological parameters, not only at the background level, but also at the perturbative level.
The exploration of these astrophysical surveys has reached a new scenario with regard to machine learning techniques. These kinds of techniques allow us to explore, without the technical problems of the astrophysical devices, scenarios built on the pivot model of cosmology, $\Lambda$CDM: a theoretical framework that accurately describes a large variety of cosmological observables, from the temperature anisotropies of the cosmic microwave background to the spatial distribution of galaxies. This model has a few free parameters representing fundamental quantities, like the geometry and expansion rate of the universe, the amount and nature of dark energy, and the sum of neutrino masses. Knowing the values of these parameters will improve our knowledge of the fundamental constituents and laws governing our universe. Thus, one of the most important goals of modern cosmology is to constrain the values of these parameters with the highest accuracy. Therefore, an extrapolation between the ideas of standard cosmostatistics and the use of machine learning techniques will further improve the constraints on the cosmological parameters without worrying about the intrinsic uncertainties of the data.
CE-R is supported by the Royal Astronomical Society as FRAS 10147, PAPIIT Project IA100220 and ICN-UNAM projects.
- http://supernova.lbl.gov/Union/
- This word in colloquial language can also be replaced by likelihood (not to be misunderstood with the function $\mathcal{L}$), or we can simply call them samplers.
- In this text we are employing a Recurrent Neural Network. There are several other architectures in this machine learning field; see the references therein.