Open access peer-reviewed chapter

Bayesian Deep Learning for Dark Energy

By Celia Escamilla-Rivera

Submitted: November 21st 2019. Reviewed: February 3rd 2020. Published: May 1st 2020

DOI: 10.5772/intechopen.91466



In this chapter, we discuss basic ideas on how to structure and study the Bayesian methods for standard models of dark energy and how to implement them in the architecture of deep learning processes.


  • cosmology
  • dark energy
  • Bayesian analyses
  • machine learning
  • cosmological parameters

1. Introduction

The dark sector of the universe has long been a subject of study for cosmologists striving to understand the world around us in its entirety. The composition of the current universe is an age-old question that these researchers have probed. While we do have estimates of the likely fractions of baryonic matter, dark matter and dark energy (about 5, 27 and 68%, respectively), researchers have been trying to improve these estimates and to reduce the computational expense of the statistical methods employed to analyse the available cosmological data.

These ideas motivate the present chapter, in which we move from the standard dark energy models that explain the cosmic acceleration to the design of a numerical architecture aimed at understanding the constraints on the cosmological parameters that describe the current universe and its effects.


2. Dark energy as a solution to the cosmic acceleration

A central problem in observational cosmology is the origin and nature of the cosmic accelerated expansion. The standard cosmological model that is consistent with current cosmological observations is the so-called concordance model, or ΛCDM. According to this scenario, the observed accelerating expansion is related to the repulsive gravitational force of a cosmological constant Λ with constant energy density ρ and negative pressure p. This proposal has been the backbone of standard cosmology since the nineties but, simple as it is, it has a couple of theoretical problems; two of them are the fine-tuning argument and the coincidence problem [1, 2]. In order to solve or at least relax these problems, some proposals have led to alternative scenarios that modify general relativity (GR) or consider a landscape with a dynamical dark energy. It is in this way that dark energy emerges as a cosmological solution, since it can be described as a fluid parameterised by an equation of state (EoS) written in terms of the redshift, w(z). So far, the properties of this EoS remain under-researched. There is a zoo of dark energy parameterisations discussed in the literature (see, e.g., [3, 4, 5, 6, 7, 8, 9]), ranging from Taylor-like series to dynamical forms of w(z) that can provide oscillatory behaviours [10, 11, 12, 13].

Nowadays, the techniques used to discriminate between models and confront them with ΛCDM are based on computing constraints on the free EoS parameter(s) of each model. This is done using observables that can reveal the cosmic acceleration, such as Type Ia supernovae (SNe Ia), baryon acoustic oscillations (BAO), the cosmic microwave background (CMB), the weak lensing spectrum, etc. These observations are relevant because of the precision with which they can probe dark energy. Currently, measurements such as the Pantheon supernova sample [14] and BOSS [15], just to cite a few, offer a way to constrain these EoS parameters. These observations allow deviations from the ΛCDM model, which are usually parameterised by the free EoS parameters [16, 17, 18, 19, 20]. In past years, many observations have verified the cosmic acceleration, for example, from Union 2.1 (see footnote 1) to the Joint Light-curve Analysis [21, 22], and the statistics have improved thanks to the growing density of data for this kind of supernova.

3. On how to model dark energy

One of the first steps in understanding the behaviour of the cosmic acceleration is to recognise that we require an energy density with negative pressure at late times [23]. To achieve this, the ratio between the pressure and the energy density must be negative, i.e., w(z) = P/ρ < 0. In order to develop the evolution equations for a universe with this kind of fluid, we start by introducing a Friedmann-Lemaitre-Robertson-Walker metric in the Einstein equations to obtain the Friedmann and Raychaudhuri equations for a spatially flat universe:




where H(z) is the Hubble parameter in terms of the redshift z, G is the gravitational constant and the subindex 0 indicates the present-day values of the Hubble parameter and matter densities.

From Eq. (2), it is possible to obtain the energy conservation equation; in that way, the energy density of the non-relativistic matter is ρ_m(z) = ρ_0m (1+z)^3. And ρ_m is given by:


and the dark energy density can be modelled as ρ_DE(z) = ρ_0DE f(z), where f(z) can be written as:


If we assume that the energy-momentum tensor T_μν (on the right-hand side of Einstein's equations) is a perfect fluid (without viscosity or stress effects), i.e., ∇^μ T_μν = 0, the form of f(z) can be restricted to be:


Now, the behaviour of the latter is determined directly by the form of w(z), which in turn gives a description of the Hubble function (which can be normalised by the Hubble constant H0). For example, in the case of quiescence models (w = const.), the solution is f(z) = (1+z)^{3(1+w)}. If we consider the case of the cosmological constant (w = −1), then f = 1.
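As a quick numerical check of the quiescence solution, the sketch below assumes the standard integral form f(z) = exp[3 ∫_0^z (1 + w(z̃))/(1 + z̃) dz̃] for a conserved fluid (an assumption made here for illustration) and compares it with the closed form (1+z)^{3(1+w)} quoted above. The function name `f_de` and the test value w = −0.9 are illustrative choices, not part of the original analysis.

```python
import math

def f_de(z, w, n=2000):
    """Assumed form: f(z) = exp(3 * int_0^z (1 + w(x)) / (1 + x) dx).

    The integral is evaluated with a composite Simpson rule (n even).
    """
    if z == 0.0:
        return 1.0
    h = z / n
    g = lambda x: (1.0 + w(x)) / (1.0 + x)
    total = g(0.0) + g(z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return math.exp(3.0 * total * h / 3.0)

# Quiescence model: constant w (illustrative value)
w_const = -0.9
numeric = f_de(1.5, lambda x: w_const)
closed = (1.0 + 1.5) ** (3.0 * (1.0 + w_const))
print(numeric, closed)

# Cosmological constant: w = -1 gives f = 1 at any redshift
print(f_de(2.0, lambda x: -1.0))
```

Because `w` is passed as a function, the same routine can be reused for any redshift-dependent w(z).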

Some interesting insights into the above forms of w(z) have been reported in [4, 24] and references therein, where a dark energy density ρ_DE with varying and non-varying w(z) is considered.

As an extension, with the latter equations we can calculate the dynamical age of the universe using the following relationship:


Integrating, we can obtain:


From here, we can set a functional form of f(z) for which the contribution of the dark energy density to H(z) in Eq. (1) goes to a region of negative values of w(z). The physics behind this behaviour is the impact on the evolution of dark energy seen through the dynamical age of the universe, Eq. (8). When we compare several theoretical models in the light of observations, a model-based approach is essential. As we mentioned in the “Introduction” section, to obtain a dark energy model with late-time negative pressure, we can think of two scenarios:

  • a quiescence model, which finds wide application in tracking the slow-roll condition of scalar fields and demands a constant value of w. As an example, for a flat universe and according to the Planck data [21], the dark energy EoS parameter is w = −1.006 ± 0.045, which is consistent with the cosmological constant. These data constrain the curvature parameter at 2σ, which is found to be very close to 0, with |Ω_k| < 0.005.

  • a kinessence model, in which the EoS is a function of the redshift z. For this case, several dark energy models with different parameterisations of w(z) have been discussed in the literature [24].
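For the quiescence scenario, the dynamical age mentioned above can be evaluated directly. The sketch below assumes the standard relation H0 t0 = ∫_0^1 da / (a E(a)) for a flat universe with matter plus a constant-w fluid, and checks the w = −1 case against the known closed form for flat ΛCDM. The values Ω_m = 0.3 and H0 = 70 km/s/Mpc are illustrative assumptions.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def age_in_hubble_units(omega_m, w):
    """H0 * t0 for a flat universe with constant w < 0.

    With E(a)^2 = omega_m a^-3 + (1 - omega_m) a^(-3(1+w)), the integrand
    1 / (a E) equals sqrt(a) / sqrt(omega_m + (1 - omega_m) a^(-3w)).
    The substitution a = x^2 removes the square-root behaviour at a = 0.
    """
    def g(x):
        return 2.0 * x * x / math.sqrt(omega_m + (1.0 - omega_m) * x ** (-6.0 * w))
    return simpson(g, 0.0, 1.0)

HUBBLE_TIME_GYR = 977.8  # 1 / (1 km/s/Mpc) expressed in Gyr

def age_gyr(omega_m, w, h0):
    return age_in_hubble_units(omega_m, w) * HUBBLE_TIME_GYR / h0

om = 0.3  # illustrative matter density
numeric = age_in_hubble_units(om, -1.0)
# Closed form for flat LCDM
analytic = 2.0 / (3.0 * math.sqrt(1.0 - om)) * math.asinh(math.sqrt((1.0 - om) / om))
print(numeric, analytic)
print(age_gyr(om, -1.0, 70.0), "Gyr for the assumed H0 = 70 km/s/Mpc")
```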

4. Standard dark energy models

One of the most commonly used families of proposals in the literature is that of Taylor series-like parameterisations [25, 26, 27, 28, 29]:


where w_n are constants and x_n(z) are functions of the redshift z or, equivalently, of the scale factor a. As brief examples, in this section we present three models that are bidimensional in the sense that they depend on only two free parameters w_i. A first target is to express the exact form of the Hubble function using a specific expression for w in Eq. (5). Once integrated, we can normalise this function by the Hubble constant H0; from now on, we call this normalised, redshift-dependent function E(z) = H(z)/H0. The second target is to test these equations with the current astrophysical data available.

4.1 Lambda cold dark matter-redshift parameterisation (ΛCDM)

This model is given by:


where Ω_m represents the matter density (including the non-relativistic and dark matter terms). We consider w = −1 in f(z). As is well known in the literature, this standard model provides a good fit to a large number of observational data sets without addressing the important theoretical problems mentioned above.

4.2 Linear-redshift parameterisation (LR)

One of the first attempts using a Taylor series (at first order) is the EoS given by [30, 31]:


from which we can recover the ΛCDM model, w(z) = w = −1, with w0 = −1 and w1 = 0. We notice immediately that, due to the linear term in z, this proposal diverges at high redshift and consequently yields strong constraints on w1 in studies involving data at high redshifts, e.g., when we use CMB data [32].

As usual, we can use the latter to obtain an expression for the normalised Hubble function:


4.3 Chevallier-Polarski-Linder parameterisation (CPL)

As a consequence of the divergence of the LR parameterisation, Chevallier, Polarski and Linder proposed a simple parameterisation [33, 34] that can be represented by two parameters w_i: the present value of the EoS, w0, and its overall time evolution, w1. The proposal is given by the expression:


and its evolution is


As we can notice, the divergence at high redshift is relaxed, but this ansatz still has some problems in specific low-redshift ranges of observations.
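The three models of this section can be compared with a short sketch. It assumes the sign conventions w(z) = w0 + w1 z for the linear model and w(z) = w0 + w1 z/(1+z) for CPL (other conventions, such as w0 − w1 z, appear in the literature), a flat universe, and the closed-form E(z) that follows from integrating each EoS. Function names and numerical values are illustrative.

```python
import math

def w_linear(z, w0, w1):
    """Assumed linear convention: w(z) = w0 + w1 * z."""
    return w0 + w1 * z

def w_cpl(z, w0, w1):
    """Assumed CPL convention: w(z) = w0 + w1 * z / (1 + z)."""
    return w0 + w1 * z / (1.0 + z)

def e_lcdm(z, om):
    return math.sqrt(om * (1 + z) ** 3 + (1 - om))

def e_linear(z, om, w0, w1):
    # f(z) = (1+z)^(3(1+w0-w1)) * exp(3 w1 z) for w = w0 + w1 z
    f = (1 + z) ** (3 * (1 + w0 - w1)) * math.exp(3 * w1 * z)
    return math.sqrt(om * (1 + z) ** 3 + (1 - om) * f)

def e_cpl(z, om, w0, w1):
    # f(z) = (1+z)^(3(1+w0+w1)) * exp(-3 w1 z / (1+z)) for the CPL form
    f = (1 + z) ** (3 * (1 + w0 + w1)) * math.exp(-3 * w1 * z / (1 + z))
    return math.sqrt(om * (1 + z) ** 3 + (1 - om) * f)

om, w0, w1 = 0.3, -1.0, 0.3  # illustrative values

# Both parameterisations reduce to LCDM for w0 = -1, w1 = 0
for z in (0.0, 0.5, 2.0):
    print(z, e_lcdm(z, om), e_linear(z, om, -1.0, 0.0), e_cpl(z, om, -1.0, 0.0))

# High-redshift behaviour of the EoS: unbounded (linear) versus bounded (CPL)
print(w_linear(1100.0, w0, w1), w_cpl(1100.0, w0, w1))
```

The last line illustrates the point made in the text: the linear EoS grows without bound towards CMB redshifts, while the CPL form tends to w0 + w1.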

5. Estimating the cosmological parameters

Once we have defined a specific cosmological model, we can test it using astrophysical observations. The methodology can be described as a simple calculation with the usual χ² method, followed by computational runs of MCMC chains around certain values [the observational point(s)] to obtain the best-fit parameter(s) of the process. Parameter estimation is usually done by computing the so-called likelihood function L for several values of the cosmological parameters. For each point in the parameter space, the likelihood function gives the probability of obtaining the observational data that were actually obtained if the hypothesis parameters had the given values (or priors). For example, the standard cosmological model ΛCDM is described by six parameters, which include the amount of dark matter and dark energy in the universe as well as its expansion rate H0. Using the CMB data (the most accurate data that we have and understand very well so far), a likelihood function can be constructed. The information given by L can tell us which values of these parameters are more likely, i.e., by probing many different values. Therefore, we are able to determine the values of the parameters and their uncertainties via error propagation over the free parameters of the model.
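The χ² plus MCMC procedure described above can be illustrated with a minimal Metropolis sampler. This is a toy sketch, not the pipeline used in the chapter: the data are mock values of E(z) generated from flat ΛCDM with an assumed Ω_m = 0.3 and Gaussian noise, the prior on Ω_m is flat on (0, 1), and the step size and chain length are arbitrary choices.

```python
import math
import random

def e_lcdm(z, om):
    return math.sqrt(om * (1 + z) ** 3 + 1 - om)

rng = random.Random(0)           # fixed seed for reproducibility
truth, sigma = 0.3, 0.02         # assumed true value and mock error bar
zs = [0.1 * k for k in range(1, 21)]
data = [e_lcdm(z, truth) + rng.gauss(0.0, sigma) for z in zs]  # mock data

def chi2(om):
    return sum(((d - e_lcdm(z, om)) / sigma) ** 2 for z, d in zip(zs, data))

def metropolis(n_steps=6000, start=0.5, step=0.01):
    """Random-walk Metropolis with likelihood L = exp(-chi2 / 2)."""
    chain, cur, cur_chi2 = [], start, chi2(start)
    for _ in range(n_steps):
        prop = cur + rng.gauss(0.0, step)
        if 0.0 < prop < 1.0:     # flat prior on (0, 1)
            prop_chi2 = chi2(prop)
            # accept with probability min(1, L_new / L_old)
            if math.log(1.0 - rng.random()) < -0.5 * (prop_chi2 - cur_chi2):
                cur, cur_chi2 = prop, prop_chi2
        chain.append(cur)
    return chain

samples = metropolis()[1000:]    # discard burn-in
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print("Omega_m = %.4f +/- %.4f" % (mean, std))
```

The posterior mean and standard deviation of the chain play the role of the best-fit value and its uncertainty; production analyses would instead use dedicated codes with convergence diagnostics.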

Now, the next question is what kind of astrophysical surveys (see footnote 2) we can use to test the cosmological models. In the next sections, we describe the surveys most commonly employed to analyse the cosmic acceleration. It is important to mention that these surveys differ according to their own nature. We have three types of observations: standard candles (e.g., supernovae, whose characteristic function is the luminosity distance), standard rulers (e.g., baryon acoustic oscillations, whose characteristic function is the angular/volume distance) and standard sirens (e.g., gravitational waves, which can be described by frequencies or chirp masses depending on the observation) [35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]. Taken together, they provide precise statistics but, taken separately, each of them has intrinsic problems due to its physical definition. For supernovae, the definition of the luminosity distance contains an integral over the cosmological model; therefore, when we perform the error propagation, the uncertainty is high. This disadvantage can be compensated by the large population of data points in the sampler. On the other hand, the uncertainty is smaller for standard rulers than for supernovae, since in this case the definition of the angular distance does not include integrals. The price that we pay for using this kind of sampler is that the population of data is very small (e.g., from surveys like BOSS or CMASS, we have only seven data points). Moving forward, the observation of gravitational wave standard sirens is expected to develop into a powerful new cosmological test, because they can play an important role in breaking the parameter degeneracies left by the other observations mentioned. Gravitational wave standard sirens are therefore of great importance for the future accurate measurement of cosmological parameters. In this part of the chapter, we will only develop the use of the first two kinds of observations.


6. Supernovae sampler

Since the nineties, Type Ia supernovae (SNIa) have been the proof of the current cosmic acceleration. The surveys have evolved, giving us a large population of observations: from Union 2.1 to the Joint Light-curve Analysis [21, 22], the data sets have been increasing their number of observations and also their redshift range. Currently, the Pantheon sampler, which consists of a total of 1048 SNIa compressed into 40 bins [14], is the largest spectroscopically confirmed SNIa sample to date. This characteristic makes the sample attractive for constraining the free cosmological parameters of a specific model with considerable precision.

SNIa can give determinations of the distance modulus μ, whose theoretical prediction is related to the luminosity distance d_L according to:


where the luminosity distance is given in units of Mpc. In the standard statistical analysis, one adds to the distance modulus the nuisance parameter μ0, an unknown offset given by the sum of the supernova absolute magnitude and other possible systematics, which is degenerate with H0.
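A minimal sketch of the theoretical distance modulus, assuming the standard relation μ(z) = 5 log10(d_L / Mpc) + 25 and a flat ΛCDM background with d_L = (1+z)(c/H0) ∫_0^z dz̃/E(z̃). The default values Ω_m = 0.3 and H0 = 70 km/s/Mpc are illustrative assumptions, and no nuisance offset μ0 is included here.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def e_lcdm(z, om):
    return math.sqrt(om * (1 + z) ** 3 + 1 - om)

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def luminosity_distance_mpc(z, om=0.3, h0=70.0):
    """Flat universe: d_L = (1 + z) * (c / H0) * int_0^z dz' / E(z')."""
    integral = simpson(lambda x: 1.0 / e_lcdm(x, om), 0.0, z)
    return (1.0 + z) * (C_KM_S / h0) * integral

def distance_modulus(z, om=0.3, h0=70.0):
    """mu = 5 log10(d_L / Mpc) + 25."""
    return 5.0 * math.log10(luminosity_distance_mpc(z, om, h0)) + 25.0

for z in (0.01, 0.1, 0.5, 1.0):
    print(z, round(distance_modulus(z), 3))
```

At very low redshift the result approaches the Hubble-law value 5 log10(cz/H0) + 25, which is a convenient sanity check.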

Now, the statistical analysis of this sample rests on the definition of the distance modulus as:


where d_L(z_j, Ω_m; θ) is the Hubble-free luminosity distance:


With this notation, we expose the different roles of the cosmological parameters appearing in the equations: the matter density parameter Ω_m appears separately, as it is assumed to be fixed to a prior value, while θ denotes the EoS parameters w_i. The latter are the parameters that we will constrain with the data. The best fits will be obtained by minimising the quantity [46, 47, 48, 49, 50]:


where σ²_μ,j are the measurement variances. The nuisance parameter μ0 encodes the Hubble parameter and the absolute magnitude M and has to be marginalised over.
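One common way to handle the offset μ0 is analytic marginalisation over a flat prior, which for uncorrelated errors gives χ² = A − B²/C with A = Σ Δ_j²/σ_j², B = Σ Δ_j/σ_j² and C = Σ 1/σ_j², where Δ_j is the residual computed without the offset. The sketch below assumes this standard technique (the chapter does not state which marginalisation it uses) and evaluates it on mock numbers that are purely illustrative.

```python
def chi2_marginalised(mu_obs, mu_th, sigma):
    """chi^2 after analytic marginalisation over a constant offset mu0.

    Assumes uncorrelated Gaussian errors and a flat prior on mu0;
    an additive constant independent of the model is dropped.
    """
    a = b = c = 0.0
    for obs, th, s in zip(mu_obs, mu_th, sigma):
        delta = obs - th
        a += delta * delta / s ** 2
        b += delta / s ** 2
        c += 1.0 / s ** 2
    return a - b * b / c

# Mock values, for illustration only (not survey data)
mu_obs = [34.10, 36.40, 38.35, 40.00]
mu_th = [34.00, 36.50, 38.20, 40.10]
sigma = [0.10, 0.15, 0.12, 0.20]

base = chi2_marginalised(mu_obs, mu_th, sigma)
# Shifting every theoretical value by a constant leaves the result unchanged,
# which is exactly the degeneracy between mu0, M and H0
shifted = chi2_marginalised(mu_obs, [m + 3.7 for m in mu_th], sigma)
print(base, shifted)
```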

From now on, we will assume spatial flatness; therefore, the luminosity distance is related to the comoving distance D via the equation


where c is the speed of light, so that, using Eq. (15), we can obtain


The normalised Hubble function E(z) can be obtained by taking the inverse of the derivative, with respect to the redshift, of D(z) = ∫_0^z H0 dz̃/H(z̃). A usual alternative, instead of using the full set of parameters for this sampler, is to use the Pantheon plugin for CosmoMC to constrain cosmological models (similar to the case of the Joint Light-curve Analysis sampler [22]).

Since we are including the nuisance parameter M in the sample, we choose the respective values of μ0 from a statistical analysis of the ΛCDM model with the Pantheon sample, obtained by fixing H0 to the Planck value given in [51]. It is common to perform this kind of fit using computational tools that can run standard MCMC chains. In cosmology, at least at the time of writing, several codes have been implemented to perform the statistical fit of this parameter. The reader can explore the MontePython code and run a standard MCMC for M using the model of their preference. As an example, if we run a ΛCDM model with this supernova sample, the mean value obtained will be μ0 = 19.63.

7. Baryon acoustic oscillation sampler

As a standard ruler, these astrophysical observations can contribute important features by comparing the data of the sound horizon today to the sound horizon at the time of recombination (extracted from the CMB anisotropy data). Usually, the baryon acoustic distances are given as a combination of the angular scale and the redshift separation.

To define these quantities we require a relationship via the ratio:


where r_s(z_d) is the comoving sound horizon at the baryon dragging epoch,


and z_d is the drag epoch redshift, with c_s² = c² / {3[1 + (3Ω_b0/4Ω_γ0)(1+z)^−1]} the sound speed and Ω_b0 and Ω_γ0 the present values of the baryon and photon density parameters, respectively.

We define the dilation scale as:


where D_A is the angular diameter distance given by


Using the comoving sound horizon, we can relate the distance ratio d_z to the expansion parameter h (defined such that H0 = 100h km s^−1 Mpc^−1) and the physical densities Ω_m and Ω_b. Therefore, we have


with Ω_m = 0.295 ± 0.034 and Ω_b = 0.045 ± 0.00054 [22]. As we mentioned above, unfortunately, this sampler so far has a very small data population. As an example for this text, we employ a compilation of three current surveys: d_z(z = 0.106) = 0.336 ± 0.015 from the Six-degree Field Galaxy Survey (6dFGS) [52], d_z(z = 0.35) = 0.1126 ± 0.0022 from the Sloan Digital Sky Survey (SDSS) [53] and d_z(z = 0.57) = 0.0726 ± 0.0007 from the Baryon Oscillation Spectroscopic Survey (BOSS) with the high-redshift CMASS sample [54].
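The distance ratio can be sketched numerically under the usual definitions, assumed here: D_A(z) = (c/H0) ∫_0^z dz̃/E(z̃) / (1+z) for a flat universe, D_V(z) = [(1+z)² D_A² c z / H(z)]^(1/3) and d_z = r_s(z_d) / D_V(z). The sound horizon is passed in as a number rather than computed; the value 150 Mpc and H0 = 70 km/s/Mpc are illustrative placeholders, while Ω_m = 0.295 follows the text.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def e_lcdm(z, om):
    return math.sqrt(om * (1 + z) ** 3 + 1 - om)

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def angular_diameter_distance_mpc(z, om, h0):
    comoving = (C_KM_S / h0) * simpson(lambda x: 1.0 / e_lcdm(x, om), 0.0, z)
    return comoving / (1.0 + z)

def dilation_scale_mpc(z, om, h0):
    """D_V(z) = [(1+z)^2 D_A^2 c z / H(z)]^(1/3)."""
    d_a = angular_diameter_distance_mpc(z, om, h0)
    hz = h0 * e_lcdm(z, om)
    return ((1.0 + z) ** 2 * d_a ** 2 * C_KM_S * z / hz) ** (1.0 / 3.0)

def d_z(z, r_s_mpc, om=0.295, h0=70.0):
    """Distance ratio d_z = r_s / D_V(z); r_s is an input, not computed here."""
    return r_s_mpc / dilation_scale_mpc(z, om, h0)

R_S = 150.0  # placeholder sound horizon in Mpc (assumption)
for z in (0.106, 0.35, 0.57):
    print(z, round(d_z(z, R_S), 4))
```

The printed ratios can be compared by eye with the survey values quoted above; a real analysis would compute r_s(z_d) from the baryon and photon densities.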

We can also add to the full sample three correlated measurements, d_z(z = 0.44) = 0.073, d_z(z = 0.6) = 0.0726 and d_z(z = 0.73) = 0.0592, from the WiggleZ survey [55], which has the inverse covariance matrix:


In order to perform the χ² statistic, we define the proper χ² function for the BAO data as


where X_BAO is given as


Then, the total χ²_BAO is obtained directly as the sum of the individual quantities defined through Eq. (27).
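A sketch of the BAO statistic, assuming the usual quadratic form χ² = Xᵀ C⁻¹ X with X the vector of differences between predicted and measured d_z. For the three uncorrelated points quoted earlier (6dFGS, SDSS, BOSS) the inverse covariance is diagonal with entries 1/σ². The model predictions below are hypothetical numbers; for the correlated WiggleZ points one would pass the survey's full inverse covariance matrix instead.

```python
def chi2_gaussian(residual, inv_cov):
    """Quadratic form X^T C^-1 X for a residual vector X."""
    n = len(residual)
    return sum(residual[i] * inv_cov[i][j] * residual[j]
               for i in range(n) for j in range(n))

def diagonal_inverse_covariance(errors):
    n = len(errors)
    return [[1.0 / errors[i] ** 2 if i == j else 0.0 for j in range(n)]
            for i in range(n)]

# Measurements quoted in the text at z = 0.106, 0.35, 0.57
observed = [0.336, 0.1126, 0.0726]
errors = [0.015, 0.0022, 0.0007]

# Hypothetical model predictions at the same redshifts (illustration only)
predicted = [0.341, 0.1130, 0.0730]

x_bao = [p - o for p, o in zip(predicted, observed)]
chi2_uncorrelated = chi2_gaussian(x_bao, diagonal_inverse_covariance(errors))
print(chi2_uncorrelated)

# The total is the sum of independent blocks, e.g.
# chi2_total = chi2_uncorrelated + chi2_gaussian(x_wigglez, inv_cov_wigglez)
```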


8. How to deal with Bayesian statistics

Now, we are ready to show how to carry the above frequentist analyses over to the Bayesian field [56]. The important difference between the two approaches is that in the first one we work with a standard χ² fit, while in the second one we ask the following question: given a specific set of cosmological values (the priors), what is the probability that a second set of values fits the hypothesis [57, 58, 59, 60]?

The above idea is what we call Bayesian model selection, a methodology that consists in describing the relationship between the cosmological model, the astrophysical data and the prior information about the free parameters. Using Bayes' theorem [61], we can update the prior model probability to the posterior model probability. However, when we compare models, the evidence function is used to evaluate each model using the data at hand.

We define the evidence function as:


where θ is the vector of free parameters (which, for the dark energy models presented in the above sections, is given by the free parameters w_i) and P(θ) is the prior distribution of these parameters.

From a computational point of view, and due to the large population of data and the model used, Eq. (30) can be difficult to calculate, because the integrations can consume too much computational time when the parameter space is large. Although several methods exist [62, 63], in this text we present tests with a nested sampling algorithm [64], which has proven practicable in cosmological applications [65].
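To make the evidence concrete, the sketch below evaluates it by direct quadrature for a one-parameter toy problem, assuming the usual definition as the integral of likelihood times prior over the parameter. This is brute-force integration, not nested sampling, and is only feasible because the toy has a single parameter. The Gaussian likelihood (centre −1.0, width 0.1) and the flat prior on [−2, 0] are hypothetical choices.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

THETA_HAT, SIGMA = -1.0, 0.1   # hypothetical best fit and width (toy likelihood)

def likelihood(theta):
    """Unnormalised Gaussian likelihood in a single parameter theta."""
    return math.exp(-0.5 * ((theta - THETA_HAT) / SIGMA) ** 2)

PRIOR_LO, PRIOR_HI = -2.0, 0.0
prior_density = 1.0 / (PRIOR_HI - PRIOR_LO)  # flat prior

# Model with one free parameter: integrate likelihood * prior
evidence_free = simpson(lambda t: likelihood(t) * prior_density, PRIOR_LO, PRIOR_HI)

# Model with the parameter fixed at -1: no integral is needed
evidence_fixed = likelihood(-1.0)

ln_bayes = math.log(evidence_fixed / evidence_free)
print(evidence_free, evidence_fixed, ln_bayes)
```

Even though both models fit equally well at the peak, the model with the extra free parameter is penalised by the prior volume it wastes, which is the Occam factor built into the evidence.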

Once we obtain the evidence, we can calculate the logarithm of the Bayes factor between two models, B_ij = ℰ_i/ℰ_j (the ratio of their evidences), where the reference model (i) with the highest evidence can be the ΛCDM model, and impose a flat prior on H0, i.e., we can use an exact value of this parameter.

The interpretation of this ratio can be described by a scale known as Jeffreys's scale [66], which can easily be explained as follows:

  • if ln B_ij < 1, there is no significant preference for the model with the highest evidence;

  • if 1 < ln B_ij < 2.5, the preference is substantial;

  • if 2.5 < ln B_ij < 5, it is strong;

  • and, if ln B_ij > 5, it is decisive.
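The scale above translates directly into a small helper. The thresholds are those listed in the text; how to treat values exactly on a boundary is not specified there, so assigning them to the stronger category is an assumption of this sketch, as is the function name.

```python
def jeffreys_label(ln_b):
    """Map ln B_ij to the verbal categories of the scale quoted above.

    ln B_ij is taken with model i as the one with the highest evidence,
    so the value is expected to be non-negative; abs() guards the other case.
    """
    x = abs(ln_b)
    if x < 1.0:
        return "no significant preference"
    if x < 2.5:
        return "substantial"
    if x < 5.0:
        return "strong"
    return "decisive"

for value in (0.4, 2.0, 3.2, 7.5):
    print(value, "->", jeffreys_label(value))
```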

9. About deep learning in cosmology

Although Bayesian evidence remains the preferred method in the literature compared with information criteria and Gaussian processes, a complete Bayesian inference for model selection (that is, a scenario in which we can discriminate a pivot model from a hypothesis) is computationally very expensive and often suffers from multi-modal posteriors and parameter degeneracies. As we pointed out in the previous section, the calculation of the evidence takes a long time to reach the final result.

As the study of the large-scale structure (LSS) of the universe indicates, all our knowledge relies on state-of-the-art cosmological simulations to address a number of questions by constraining the cosmological parameters at hand using Bayesian techniques. Moreover, due to the computational complexity of these simulations, some studies remain computationally infeasible for the foreseeable future. It is at this point that computational techniques such as machine learning can have a number of important uses, even for trying to understand our universe.

The idea behind machine learning is based on considering a neural network with a complex combination of neurons organised in nested layers. Each of these neurons implements a function that is parameterised by a set of weights W. Every layer of a neural network thus transforms one input vector (or tensor, depending on the dimension) into another through a differentiable function. Theoretically, given a neuron n that receives an input vector and a choice of activation function A_n, the output of the neuron can be computed as:


where h<t> is called the hidden state, A_n is the activation function and y<t> is the output.
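A single recurrent step can be sketched as follows. The sketch assumes the common Elman form, h<t> = A(W_xh x<t> + W_hh h<t−1> + b_h) and y<t> = W_hy h<t> + b_y, with tanh as the activation; the chapter's exact expression may differ, and all weights below are arbitrary illustrative numbers.

```python
import math

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def rnn_step(x_t, h_prev, w_xh, w_hh, b_h, w_hy, b_y, activation=math.tanh):
    """One step of a simple (Elman-type) recurrent cell, assumed form.

    Returns the new hidden state h_t and the output y_t.
    """
    pre = [a + b + c for a, b, c in zip(matvec(w_xh, x_t), matvec(w_hh, h_prev), b_h)]
    h_t = [activation(p) for p in pre]
    y_t = [a + b for a, b in zip(matvec(w_hy, h_t), b_y)]
    return h_t, y_t

# Tiny cell: 1 input, 2 hidden units, 1 output (arbitrary weights)
w_xh = [[0.5], [-0.3]]
w_hh = [[0.1, 0.2], [0.0, -0.1]]
b_h = [0.0, 0.1]
w_hy = [[1.0, -1.0]]
b_y = [0.2]

h = [0.0, 0.0]
for x in ([0.1], [0.3], [0.5], [0.8]):   # a sequence of four inputs
    h, y = rnn_step(x, h, w_xh, w_hh, b_h, w_hy, b_y)
    print(x, h, y)
```

The hidden state carries information from one step to the next, which is what allows the network to use the ordering of the input sequence.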

The goal is to introduce a set of data in order to train this array, so that the architecture learns to produce an output set of data. For example, the network can learn the distribution of the distance moduli in the dark energy models; we then feed the astrophysical samplers (surveys) to the network to reconstruct the dark energy model and finally discriminate the most probable model (see footnote 5).

Moreover, while neural networks can learn complex nested representations of the data, allowing them to achieve impressive performance, this also limits our understanding of the model learned by the network itself. The choice of an architecture [67] can have an important influence on the performance of the neural network. Some design decisions have to be made concerning the number and type of layers, as well as the number and size of the filters used in each layer. A convenient way to make these choices is typically through experimentation (which, for our universe, would need to happen first). As it is, we can select the size of the network, which depends on the number of training sets, since networks with a large number of cosmological parameters are likely to overfit if not enough training sets are available.

At the time of writing, the strong interest in this kind of algorithm is not only bringing new opportunities for data-driven cosmological discovery but also presenting new challenges for adopting machine learning (or, in our case, a subset of this field, deep learning) methodologies and for understanding the results when the data are too complex for traditional model development and fitting with statistics. A few proposals in this area have exploited deep learning methods for measurements of cosmological parameters from density fields [68] and for future large-scale photometric surveys [69].


10. Deep learning for dark energy

The first step in training on an astrophysical survey is to design an architecture, bearing in mind that the objective function of a neural network can have many unstable points and local minima. This makes the optimisation process very difficult and, in real scenarios [70, 71], high levels of noise degrade the training data and typically result in optimisation landscapes with more local minima, which increases the difficulty of training the neural network. It can thus be desirable to start optimising the neural network using noise-free data, which typically yield smoother landscapes. As an example, in Figure 1 we present a standard network that takes an image of a cosmological simulation (the data) and passes it through an array of several layers to finally extract the output values of the cosmological parameters [72, 73]. Each neuron uses a Bayesian process to compute the error propagation, as is done in standard inference analyses.

Figure 1.

A deep learning architecture for dark energy.

We can describe a quick but effective recipe to develop a Recurrent Neural Network with Bayesian computation training [29, 74, 75, 76, 77, 78] in the following steps:

  • Step 1. Construction of the neural network. For a Recurrent Neural Network, we can choose a configuration with one layer and a certain number of neurons (e.g., you can start with 100 for a supernovae sampler).

  • Step 2. Organising the data. We need to sort the sampler from lower to higher redshift. Afterwards, we rearrange the data using the number of steps (e.g., try four steps, labelled x_i, for a supernovae sampler).

  • Step 3. Computing the Bayesian training. Because neural networks overfit easily, it is important to choose a mode of regularisation. With a standard Bayesian method to compute the evidence, the algorithm can calculate errors via regularisation methods [74]. Finally, we can use the Adam optimiser on the cost function.

  • Step 4. Training the entire architecture. It is advisable to consider a high number of epochs (e.g., for a sampler such as Pantheon, you can try 1000 epochs per layer). After the training, it is necessary to read the model and apply the same dropout to the initial model several more times. The result of this step is the construction of the confidence regions.

  • Step 5. Computing the distance modulus μ(z) for each cosmological model. Using the definitions of E(z), we can compute μ(z) by taking a specific dark energy equation of state in terms of z and then integrating it.

  • Step 6. Computing the best fits. Finally, the output values can be obtained by using the training data as a simulated sample. We use the publicly available codes CLASS and Monte Python to constrain the models, as is standard in Bayesian cosmology.

  • The results of this recipe can be seen in Figure 2.
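Two mechanical parts of the recipe, the data organisation of Step 2 and the repeated dropout passes of Step 4, can be sketched without any deep learning framework. Everything here is a stand-in: the data are mock (z, μ) pairs, the windowing scheme (four consecutive redshifts as input, the last μ as target) is one possible reading of Step 2, and the "network" is an untrained layer with arbitrary weights whose only purpose is to show how repeated stochastic passes yield a mean and a spread.

```python
import math
import random
import statistics

def make_windows(z_values, mu_values, n_steps=4):
    """Step 2: sort by redshift, then build windows of n_steps redshifts."""
    pairs = sorted(zip(z_values, mu_values))
    zs = [p[0] for p in pairs]
    mus = [p[1] for p in pairs]
    count = len(zs) - n_steps + 1
    inputs = [zs[i:i + n_steps] for i in range(count)]
    targets = [mus[i + n_steps - 1] for i in range(count)]
    return inputs, targets

def dropout_forward(window, weights, keep_prob, rng):
    """One stochastic pass: hidden units are dropped with prob 1 - keep_prob."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, window)))
              for row in weights["hidden"]]
    kept = [h / keep_prob if rng.random() < keep_prob else 0.0 for h in hidden]
    return sum(o * h for o, h in zip(weights["out"], kept)) + weights["bias"]

def mc_dropout(window, weights, keep_prob=0.8, n_passes=200, seed=0):
    """Step 4: repeat the dropout pass and summarise the predictions."""
    rng = random.Random(seed)
    draws = [dropout_forward(window, weights, keep_prob, rng) for _ in range(n_passes)]
    return statistics.mean(draws), statistics.pstdev(draws)

# Mock, unsorted sample (illustration only)
z_mock = [0.8, 0.1, 0.5, 0.3, 1.2, 0.05]
mu_mock = [43.5, 38.3, 42.2, 40.9, 44.6, 36.7]
inputs, targets = make_windows(z_mock, mu_mock)

# Arbitrary, untrained weights: 3 hidden units acting on 4-step windows
weights = {"hidden": [[0.5, -0.2, 0.1, 0.3], [0.1, 0.4, -0.3, 0.2], [-0.2, 0.1, 0.5, 0.1]],
           "out": [1.0, 0.5, -0.7], "bias": 40.0}

mean, spread = mc_dropout(inputs[0], weights)
print(inputs[0], targets[0], mean, spread)
```

In a trained network the spread over passes is what would be turned into the confidence regions mentioned in Step 4.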

Figure 2.

Statistical contour levels for ΛCDM using observational data (red colour) and deep learning training data (blue colour).

11. Conclusions

In this chapter, we discussed how to derive the equations of state for a specific dark energy model. We also studied the standard models of dark energy in order to describe the cosmic acceleration according to the current data available in the literature. It is important to remark that every Bayesian analysis performed will depend solely on the data used to develop it: the more data, the better the statistics. We therefore expect that future surveys will improve the constraints on the cosmological parameters, not only at the background level but also at the perturbative level.

The exploration of these astrophysical surveys has reached a new stage with regard to machine learning techniques. These techniques allow us to explore, without technical problems in the astrophysical devices, scenarios around the pivot model of cosmology, ΛCDM, a theoretical framework that accurately describes a large variety of cosmological observables, from the temperature anisotropies of the cosmic microwave background to the spatial distribution of galaxies. This model has a few free parameters representing fundamental quantities, like the geometry and expansion rate of the universe, the amount and nature of dark energy and the sum of neutrino masses. Knowing the values of these parameters will improve our knowledge of the fundamental constituents and laws governing our universe. Thus, one of the most important goals of modern cosmology is to constrain the values of these parameters with the highest accuracy. Therefore, combining the ideas of standard cosmostatistics with machine learning techniques should further improve the constraints on the cosmological parameters without our having to worry about the intrinsic uncertainties of the data [79].


CE-R is supported by the Royal Astronomical Society as FRAS 10147, PAPIIT Project IA100220 and ICN-UNAM projects.


  • 1. http://supernova.
  • 2. In colloquial language, this word can also be replaced by likelihood (not to be confused with the function L), or we can simply call them samplers.
  • 5. In this text we employ a Recurrent Neural Network. There are several other architectures in this machine learning field; see, e.g., [57] and references therein.

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Celia Escamilla-Rivera (May 1st 2020). Bayesian Deep Learning for Dark Energy, Cosmology 2020 - The Current State, Michael L. Smith, IntechOpen, DOI: 10.5772/intechopen.91466. Available from:
