Bayesian Deep Learning for Dark Energy

In this chapter we discuss how to structure and study Bayesian methods for the standard models of dark energy, and how to implement them within a deep learning architecture.


INTRODUCTION
The dark sector of our universe has been a central problem for cosmologists striving to find an answer to both dark matter and dark energy. While we do have estimates of the likely fractions of baryonic matter, dark matter, and dark energy, at 5%, 27%, and 68%, respectively, we are still trying to refine these quantities and to reduce the numerical expense of the statistical methods employed to analyse the cosmological data currently available.
These considerations set the path for this chapter, in which we move from the standard dark energy models proposed to explain the cosmic acceleration to the design of a numerical architecture for understanding the constraints on the cosmological parameters that describe the current universe and its effects.
A highlight of the observed universe is the origin and nature of the accelerated cosmic expansion. The standard cosmological model consistent with current observations is the concordance model, or ΛCDM. According to this model, the observed cosmic acceleration is related to the repulsive gravitational effect of a cosmological constant Λ with constant energy density ρ and negative pressure p. This proposal has been the backbone of standard cosmology since the nineties but, simple as it is, it carries a couple of theoretical problems, among them the fine-tuning and coincidence problems [1,2]. To find a solution to these problems, some ideas have led to new theories that modify General Relativity (GR) or consider a scenario with a dynamical dark energy fluid. It is in this way that dark energy emerges as a solution, since it can be described as an exotic fluid parametrised by an equation of state (EoS) that can be written in terms of the redshift, w(z). Until now, the properties of this EoS remain under-researched. Several dark energy parameterisations have been discussed in the literature (see, e.g., [3][4][5][6][7]), ranging from Taylor-like polynomials to dynamical EoS that exhibit oscillatory behaviour.
Nowadays, the techniques used to discriminate between models and confront them with ΛCDM are based on computing constraints on the EoS free parameter(s) of each model. This is done using observables that trace the cosmic acceleration, such as type Ia supernovae (SNeIa), baryon acoustic oscillations (BAO), the cosmic microwave background (CMB), the weak lensing spectrum, etcetera. Astrophysical measurements, e.g. Pantheon from supernovae [8] or BAO from BOSS [9], allow us to constrain the cosmological parameters of a specific model; with them we can also test deviations from the standard ΛCDM model. Over the years there have been many surveys devoted to testing the cosmic acceleration, e.g. from Union2.1 to the Joint Light-Curve Analysis [10,11]. Moreover, the statistics have improved thanks to the growing density of data for these kinds of supernovae.

ON HOW TO MODEL DARK ENERGY
One of the first steps towards understanding the behaviour of the cosmic acceleration is to note that we require an energy density with negative pressure at late times. To achieve this we need the ratio between pressure and energy density to be negative, i.e., w(z) = p/ρ < 0. To develop the evolution equations for a universe with this kind of fluid, we start by introducing a Friedmann-Lemaitre-Robertson-Walker metric into the Einstein equations to obtain, for a spatially flat universe, the Friedmann and Raychaudhuri equations

H^2(z) = (8πG/3) [ρ_m(z) + ρ_DE(z)],   (1)

Ḣ(z) = -4πG [ρ_m(z) + ρ_DE(z) + p_DE(z)],   (2)

where H(z) is the Hubble parameter in terms of the redshift z, G is the gravitational constant, and the subindex 0 indicates the present-day values of the Hubble parameter and the matter densities. From (2) it is possible to obtain the energy conservation equation; in that way the energy density of the non-relativistic matter is

ρ_m(z) = ρ_0m (1 + z)^3,   (3)

and the dark energy density can be modelled as

ρ_DE(z) = ρ_0DE f(z).   (4)

If we assume that the energy-momentum tensor T_µν (on the right-hand side of Einstein's equations) is a perfect fluid (without viscosity or stress effects), i.e. ∇_µ T^µν = 0, the form of f(z) is restricted to

f(z) = exp[ 3 ∫_0^z (1 + w(z')) / (1 + z') dz' ].   (5)

The behaviour of the latter is dictated directly by the form of w(z), which in turn determines the Hubble function (which can be normalised by the Hubble constant H_0). For example, in the case of quiessence models (w = const.) the solution of (5) is f(z) = (1 + z)^{3(1+w)}, and if we consider the case of the cosmological constant (w = -1) then f = 1.
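As a minimal numerical sketch of the equations above (assuming NumPy and SciPy, and purely illustrative parameter values), we can evaluate f(z) for an arbitrary w(z) and check it against the closed-form quiessence solution:

```python
import numpy as np
from scipy.integrate import quad

def f_de(z, w):
    """Dark-energy density evolution f(z) = exp(3 * int_0^z (1+w(z'))/(1+z') dz')."""
    integrand = lambda zp: 3.0 * (1.0 + w(zp)) / (1.0 + zp)
    val, _ = quad(integrand, 0.0, z)
    return np.exp(val)

def E(z, Om, w):
    """Normalised Hubble function E(z) = H(z)/H0 for a spatially flat universe."""
    return np.sqrt(Om * (1.0 + z)**3 + (1.0 - Om) * f_de(z, w))

# Quiessence (w = const = -0.9): f(z) should reduce to (1+z)^{3(1+w)}
w_q = lambda z: -0.9
print(f_de(1.0, w_q), (1.0 + 1.0)**(3 * 0.1))
# Cosmological constant (w = -1): f(z) = 1 for all z
print(f_de(1.0, lambda z: -1.0))
```

The same `E(z, Om, w)` function can be reused for any of the parameterisations discussed below by swapping in the corresponding w(z).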
Some interesting insights into the above forms of w(z) have been reported in [4,12] and references therein, where dark energy densities ρ_DE with both varying and non-varying w(z) are considered.
As an extension, with the previous equations we can calculate the dynamical age of the universe using the relationship dt = -dz / [(1 + z)H(z)]. Integrating, we obtain

t(z) = ∫_z^∞ dz' / [(1 + z') H(z')].   (8)
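A quick numerical sketch of the age integral (8) for flat quiessence models, assuming NumPy and SciPy and an illustrative H_0 = 70 km/s/Mpc (not a fitted value):

```python
import numpy as np
from scipy.integrate import quad

KM_PER_MPC = 3.0857e19   # kilometres in one megaparsec
GYR_IN_S = 3.156e16      # seconds in a gigayear

def age_of_universe(Om, w=-1.0, H0=70.0):
    """t0 = int_0^inf dz / [(1+z) H(z)] for a flat quiessence model, in Gyr."""
    H0_inv_s = KM_PER_MPC / H0  # 1/H0 in seconds
    E = lambda z: np.sqrt(Om * (1.0 + z)**3 + (1.0 - Om) * (1.0 + z)**(3 * (1 + w)))
    integral, _ = quad(lambda z: 1.0 / ((1.0 + z) * E(z)), 0.0, np.inf)
    return H0_inv_s * integral / GYR_IN_S

print(age_of_universe(0.3))  # LCDM-like values give roughly 13.5 Gyr
```

For the Einstein-de Sitter limit (Om = 1, w = -1 irrelevant since there is no dark energy) the integral gives the familiar t_0 = 2/(3H_0).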
From here we can set a functional form of f(z) whose contribution to the dark energy density in H(z) in (1) corresponds to a region of negative values of w(z). The physics behind this behaviour is its impact on the evolution of dark energy through the dynamical age of the universe (8). When we compare several theoretical models in the light of observations, a model approach is essential. As mentioned in the Introduction, to obtain a dark energy model with late-time negative pressure we can think of two scenarios:
• a quiessence model, which finds wide application in tracking the slow-roll condition of scalar fields and demands a constant value of w. As an example, for a flat universe and according to the Planck data [10], the dark energy EoS parameter is w = -1.006 ± 0.045, which is consistent with the cosmological constant. These data constrain the curvature parameter at 2σ, which is found to be very close to zero, with |Ω_k| < 0.005.
• a kinessence model, in which the EoS is a function of the redshift z. For this case, several dark energy models with different parameterisations of w(z) have been discussed in the literature [12].

STANDARD DARK ENERGY MODELS
One of the most commonly used proposals in the literature are Taylor series-like parameterisations,

w(z) = Σ_n w_n x_n(z),

where the w_n are constants and the x_n(z) are functions of the redshift z or of the scale factor a.
As brief examples, in this section we present three models that have bidimensional forms, in the sense that they depend on only two free parameters w_i. A first target is to express the exact form of the Hubble function using a specific expression for w in (5). Once integrated, we can normalise this function by the Hubble constant H_0; from now on we call this normalised function of the redshift E(z) = H(z)/H_0. The second target is to test these equations against the currently available astrophysical data.

Lambda Cold Dark Matter-Redshift parameterisation (ΛCDM)
This standard cosmological model is represented by

E^2(z) = Ω_m (1 + z)^3 + (1 - Ω_m),

where Ω_m is the matter density (including the non-relativistic baryonic and dark matter components). For this model, w = -1. As is well known in the literature, this model provides a good fit to a large number of observational astrophysical data sets without suffering from the theoretical problems mentioned in the Introduction.

Linear-Redshift parameterisation (LR)
One of the first attempts using a Taylor series, at first order, is the EoS given by [13,14]

w(z) = w_0 + w_1 z,

from which we can recover the ΛCDM model (w(z) = w = -1) by setting w_0 = -1 and w_1 = 0. We notice immediately that, due to the linear term in z, this proposal diverges at high redshift and consequently yields strong constraints on w_1 in studies involving high-redshift data, e.g., when we use CMB data [15]. As usual, we can use the latter to obtain an expression for the normalised Hubble function:

E^2(z) = Ω_m (1 + z)^3 + (1 - Ω_m)(1 + z)^{3(1 + w_0 - w_1)} e^{3 w_1 z}.

Chevallier-Polarski-Linder Parameterization (CPL)
As a consequence of the divergence of the LR parameterisation, Chevallier, Polarski and Linder proposed a simple parameterisation [16,17] represented by two parameters: the present value of the EoS, w_0, and its overall time evolution, w_1. The proposal is given by the expression

w(z) = w_0 + w_1 z / (1 + z),

and its evolution is

E^2(z) = Ω_m (1 + z)^3 + (1 - Ω_m)(1 + z)^{3(1 + w_0 + w_1)} e^{-3 w_1 z / (1 + z)}.

As we can notice, the divergence at high redshift relaxes, but this ansatz still has some problems in specific low-redshift ranges of observations.
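The contrast between the two parameterisations is easy to see numerically. The following sketch (with illustrative values w_0 = -1, w_1 = 0.5) shows that the LR EoS grows without bound while the CPL EoS stays bounded, tending to w_0 + w_1 at high redshift:

```python
def w_lr(z, w0=-1.0, w1=0.5):
    """Linear-redshift EoS: diverges as z grows."""
    return w0 + w1 * z

def w_cpl(z, w0=-1.0, w1=0.5):
    """CPL EoS: bounded, tends to w0 + w1 as z -> infinity."""
    return w0 + w1 * z / (1.0 + z)

for z in (0.0, 1.0, 10.0, 1000.0):
    print(z, w_lr(z), w_cpl(z))
```

At z = 1000 (roughly the CMB epoch) the LR EoS has already reached w ≈ 499, which is why CMB data clamp down so hard on w_1 in that model.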

ESTIMATING THE COSMOLOGICAL PARAMETERS
After we have defined a specific cosmological model, we can test it using astrophysical observations. The methodology can be described as a calculation of the usual χ^2 statistic followed by MCMC computational runs around the observational point(s), from which we obtain the best-fit parameter(s). Parameter estimation is usually done by computing the so-called likelihood function for several values of the cosmological parameters. For each point in the parameter space, the likelihood function L gives the probability of obtaining the observational data if the hypothesis parameters had the given values (or priors). For example, the standard cosmological model ΛCDM is described by six parameters, which include the amounts of dark matter and dark energy in the universe as well as its expansion rate H. Using the CMB data (the most accurate data that we understand well so far), a likelihood function can be constructed. The information given by L tells us which values of these parameters are more likely, i.e. by probing many different values. We can therefore determine the values of the parameters and their uncertainties via error propagation over the free parameters of the model. The next question is what kind of astrophysical surveys we can use to test the cosmological models. In the following sections we describe the surveys most commonly employed to analyse the cosmic acceleration. It is important to mention that these surveys differ according to their nature. We have three types of observations, classified as standard candles (e.g. supernovae, whose characteristic function is the luminosity distance), standard rulers (e.g. BAO, whose characteristic function is the angular/volume distance), and standard sirens (e.g. gravitational waves, which can be described by frequencies or chirp masses depending on the observation).
Taken together they can deliver precise statistics, but separately each of them has intrinsic problems due to its physical definition. For supernovae, the luminosity distance contains in its definition an integral of the cosmological model, so when we perform the error propagation the uncertainty is high. This disadvantage can be compensated by the large population of data points in the sampler. On the other hand, the uncertainty is smaller for standard rulers in comparison to supernovae, since the definition of the angular distance does not include integrals. The price we pay for using this kind of sampler is that the data population is very small (e.g. from surveys like BOSS or CMASS we have only 7 data points). Moving forward, the observation of gravitational-wave standard sirens should develop into a powerful new cosmological test, because they can break the parameter degeneracies formed by the other observations mentioned. Gravitational-wave standard sirens are therefore of great importance for future accurate measurements of cosmological parameters. In this part of the chapter, we develop only the use of the first two kinds of observations.

SUPERNOVAE SAMPLER
Since their discovery, Type Ia supernovae (SNIa) have been proof of the current cosmic acceleration. The surveys have evolved, giving us ever larger populations of observations: from Union 2.1 to the Joint Light-Curve Analysis [10,11], the data sets have grown in number of observations and in redshift range. Currently, the Pantheon sampler, which consists of a total of 1048 Type Ia supernovae compressed into 40 bins [8], is the largest spectroscopically confirmed SNIa sample to date. This characteristic makes the sample attractive for constraining with considerable precision the free cosmological parameters of a specific model.
SNIa observations provide determinations of the distance modulus µ, whose theoretical prediction is related to the luminosity distance d_L according to

µ(z) = 5 log_10 [ d_L(z) / Mpc ] + 25,

where the luminosity distance is given in units of Mpc. In the standard statistical analysis, one adds to the distance modulus the nuisance parameter µ_0, an unknown offset sum of the supernova absolute magnitude (and other possible systematics), which is degenerate with H_0. The statistical analysis of this sample then rests on the definition of the distance modulus

µ(z_j, Ω_m; θ, µ_0) = 5 log_10 [ d_L(z_j, Ω_m; θ) ] + µ_0,

where d_L(z_j, Ω_m; θ) is the Hubble-free luminosity distance

d_L(z, Ω_m; θ) = (1 + z) ∫_0^z dz' / E(z', Ω_m; θ).
With this notation we expose the different roles of the several cosmological parameters appearing in the equations: the matter density parameter Ω_m appears separately as it is assumed to be fixed to a prior value, while θ stands for the EoS parameters w_i. The latter are the parameters to be constrained by the data. The best fits are obtained by minimising the quantity

χ^2_µ = Σ_j [ µ(z_j, Ω_m; θ, µ_0) - µ_obs(z_j) ]^2 / σ^2_{µ,j},

where the σ^2_{µ,j} are the measurement variances. The nuisance parameter µ_0 encodes the Hubble parameter and the absolute magnitude M, and has to be marginalised over.
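As an illustrative sketch of this χ^2 minimisation (not the full Pantheon pipeline), the following code generates synthetic binned data from a fiducial flat model with w_0 = -1 and a hypothetical offset µ_0 = 43, then recovers w_0 by a simple grid search in place of an MCMC:

```python
import numpy as np
from scipy.integrate import quad

def D_L(z, Om, w0):
    """Hubble-free luminosity distance for a flat quiessence model."""
    E = lambda zp: np.sqrt(Om * (1 + zp)**3 + (1 - Om) * (1 + zp)**(3 * (1 + w0)))
    integral, _ = quad(lambda zp: 1.0 / E(zp), 0.0, z)
    return (1.0 + z) * integral

def chi2(z, mu_obs, sigma, Om, w0, mu0):
    """chi^2_mu = sum_j (mu_th - mu_obs)^2 / sigma^2."""
    mu_th = np.array([5 * np.log10(D_L(zi, Om, w0)) + mu0 for zi in z])
    return np.sum(((mu_th - mu_obs) / sigma)**2)

# Synthetic "binned" sample from a fiducial w0 = -1 model (mu0 = 43 is illustrative)
z_bins = np.linspace(0.05, 1.5, 10)
mu0_true = 43.0
mu_obs = np.array([5 * np.log10(D_L(z, 0.3, -1.0)) + mu0_true for z in z_bins])
sigma = np.full_like(z_bins, 0.1)

# Grid search over w0: the minimum should recover the fiducial value
grid = np.linspace(-1.4, -0.6, 41)
chi2_vals = [chi2(z_bins, mu_obs, sigma, 0.3, w0, mu0_true) for w0 in grid]
best = grid[int(np.argmin(chi2_vals))]
print(best)
```

In a real analysis µ_0 is marginalised over rather than fixed, and the grid search is replaced by MCMC sampling as described below.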
We assume spatial flatness, for which the luminosity distance is related to the comoving distance D through

d_L(z) = (c / H_0)(1 + z) D(z),   (15)

where c is the speed of light, so that, using (15), the function E(z) can be obtained by considering D(z) = ∫_0^z H_0 dz' / H(z'). Instead of using the entire set of parameters of the sampler, we can employ the Pantheon binned list for CosmoMC to constrain the models (analogous to the Joint Light-Curve Analysis sampler [11]).
Here, M is the nuisance parameter in the sample, and we select respective values of µ_0 from a statistical analysis of the ΛCDM model with the Pantheon observations, obtained by fitting H_0 to the Planck value given in [? ]. This kind of fit uses computational tools that can run standard MCMC chains. In cosmology, at least at the moment this text is being written, several codes have been implemented to perform the statistical fit of this parameter. The reader can explore the MontePython code (https://monte-python.readthedocs.io/en/latest/) and run a standard MCMC for M using the model of their preference. As an example, if we run a ΛCDM model with this supernovae sample, the mean value obtained is µ_0 = -19.63.

BARYON ACOUSTIC OSCILLATION SAMPLER
As standard rulers, these astrophysical observations contribute important features by comparing the sound horizon today to the sound horizon at the time of recombination (extracted from the CMB anisotropy data). Usually, the baryon acoustic distances are given as a combination of the angular scale and the redshift separation.
To define these quantities we require the ratio

d_z = r_s(z_d) / D_V(z),

where r_s(z_d) is the comoving sound horizon at the baryon dragging epoch, z_d is the drag-epoch redshift, and c^2_s = c^2 / 3[1 + (3Ω_b0/4Ω_γ0)(1 + z)^{-1}] is the sound speed, with Ω_b0 and Ω_γ0 the present values of the baryon and photon density parameters, respectively.
We define the dilation scale as

D_V(z) = [ (1 + z)^2 D_A(z)^2 c z / H(z) ]^{1/3},

where D_A is the angular diameter distance, given by

D_A(z) = (1 / (1 + z)) ∫_0^z c dz' / H(z').
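A minimal numerical sketch of the dilation scale, assuming a flat ΛCDM background with illustrative values H_0 = 70 km/s/Mpc and Ω_m = 0.3 (the choice of background model is an assumption here, not prescribed by the text):

```python
import numpy as np
from scipy.integrate import quad

C_KM_S = 299792.458  # speed of light in km/s

def H(z, H0=70.0, Om=0.3):
    """Flat LCDM Hubble rate in km/s/Mpc, used as an illustrative background."""
    return H0 * np.sqrt(Om * (1 + z)**3 + (1 - Om))

def D_A(z):
    """Angular diameter distance in Mpc (flat universe)."""
    integral, _ = quad(lambda zp: C_KM_S / H(zp), 0.0, z)
    return integral / (1.0 + z)

def D_V(z):
    """Dilation scale D_V = [(1+z)^2 D_A^2 c z / H(z)]^(1/3), in Mpc."""
    return ((1 + z)**2 * D_A(z)**2 * C_KM_S * z / H(z))**(1.0 / 3.0)

print(round(D_V(0.57)))  # Mpc, at the BOSS CMASS effective redshift
```

Dividing a model for r_s(z_d) by this D_V(z) gives the theoretical d_z to be compared with the measurements listed below.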
Using the comoving sound horizon, we can relate the distance ratio d_z to the expansion parameter h (defined such that H_0 = 100h km/s/Mpc) and the physical densities Ω_m and Ω_b, through a fitting formula for r_s(z_d) in Mpc, with Ω_m = 0.295 ± 0.304 and Ω_b = 0.045 ± 0.00054 [11]. As we mentioned above, so far we unfortunately have a very low data population for this sampler. As an example for this text, we employ compilations from three current surveys: d_z(z = 0.106) = 0.336 ± 0.015 from the 6-degree Field Galaxy Survey (6dFGS) [18], d_z(z = 0.35) = 0.1126 ± 0.0022 from the Sloan Digital Sky Survey (SDSS) [19], and d_z(z = 0.57) = 0.0726 ± 0.0007 from the Baryon Oscillation Spectroscopic Survey (BOSS) high-redshift CMASS sample [20].
We can also add to the full sample three correlated measurements, d_z(z = 0.44) = 0.073, d_z(z = 0.6) = 0.0726, and d_z(z = 0.73) = 0.0592, from the WiggleZ survey [21], which come with their inverse covariance matrix C^{-1}. To perform the χ^2-statistic, we define the χ^2 function for the BAO data as

χ^2_BAO = X^T_BAO C^{-1} X_BAO,

where X_BAO is the vector of differences between the measured and theoretical values of d_z. The total χ^2_BAO is then obtained directly as the sum of the individual quantities.
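A sketch of how the two pieces of the BAO χ^2 combine, using the uncorrelated points quoted above; the WiggleZ inverse covariance matrix is not reproduced here, so an identity matrix stands in for it purely to exercise the X^T C^{-1} X form:

```python
import numpy as np

# Uncorrelated measurements from the text: (z, d_z, sigma)
bao_points = [(0.106, 0.336, 0.015),
              (0.35, 0.1126, 0.0022),
              (0.57, 0.0726, 0.0007)]

def chi2_bao_diag(dz_model):
    """Diagonal part: sum over the independent surveys."""
    return sum(((dz_model(z) - dz) / s)**2 for z, dz, s in bao_points)

def chi2_bao_cov(x, inv_cov):
    """Correlated part: X^T C^{-1} X, e.g. for the WiggleZ points."""
    x = np.asarray(x, dtype=float)
    return float(x @ inv_cov @ x)

# Toy check with an identity stand-in for the (omitted) inverse covariance
x = np.array([0.001, -0.002, 0.0005])
print(chi2_bao_cov(x, np.eye(3)))
```

The total χ^2_BAO is then `chi2_bao_diag(...) + chi2_bao_cov(...)` for a given model prediction of d_z.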

HOW TO DEAL WITH BAYESIAN STATISTICS
Now we are ready to discuss how to extrapolate the above frequentist analyses to the Bayesian field. The important difference between the two statistics is that in the first we work with a standard χ^2 fit, while in the second we consider the following idea: given a specific set of cosmological values (the priors), what is the probability of a second set of values fitting the hypothesis?
The above idea is what we call Bayesian model selection, whose methodology consists in describing the relationship between the cosmological model, the astrophysical data, and the prior information about the free parameters. Using Bayes' theorem [22] we can update the prior model probability to the posterior model probability. When we compare models, however, the evidence function is used to evaluate each model's performance given the data at hand.
We define the evidence function as

E = ∫ L(θ) P(θ) dθ,   (30)

where θ is the vector of free parameters (which, for the dark energy models presented in the sections above, is given by the w_i free parameters) and P(θ) is the prior distribution of these parameters. From a computational point of view, and due to the large data population and the models used, (30) can be difficult to calculate, since the integrations can consume much computational time when the parametric phase space is large. Nevertheless, even though several methods exist [23,24], in this text we present a test with a nested sampling algorithm [25], which has proven practicable in cosmological applications [26].
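To see what (30) computes, here is a brute-force sketch that evaluates the evidence by direct trapezoidal integration over a one-dimensional grid, with a toy Gaussian likelihood centred on w_0 = -1 and a flat prior on [-2, 0] (all values illustrative). Nested sampling replaces exactly this integral when the parameter space grows:

```python
import numpy as np

def evidence_grid(loglike, prior_pdf, theta_grid):
    """E = integral L(theta) P(theta) dtheta, by the trapezoidal rule on a 1-D grid."""
    integrand = np.exp(np.array([loglike(t) for t in theta_grid])) \
        * np.array([prior_pdf(t) for t in theta_grid])
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(theta_grid)))

# Toy Gaussian likelihood around w0 = -1 with width 0.05; flat prior on [-2, 0]
loglike = lambda w0: -0.5 * ((w0 + 1.0) / 0.05)**2
prior = lambda w0: 0.5 if -2.0 <= w0 <= 0.0 else 0.0

grid = np.linspace(-2.0, 0.0, 2001)
E_val = evidence_grid(loglike, prior, grid)
print(E_val)  # analytically 0.5 * sigma * sqrt(2*pi) for this toy setup
```

The grid approach scales exponentially with the number of parameters, which is precisely why nested sampling is preferred in practice.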
Once we obtain the evidence, we can calculate the logarithm of the Bayes factor between two models, B_ij = E_i/E_j, where the reference model (E_i) with the highest evidence can be the ΛCDM model, and we impose a flat prior on H_0, i.e. we use an exact value of this parameter.
The result of this ratio can be interpreted with the so-called Jeffreys' scale [27], which can be summarised as follows:
• if ln B_ij < 1, there is no significant preference for the model with the highest evidence;
• if 1 < ln B_ij < 2.5, the preference is substantial;
• if 2.5 < ln B_ij < 5, it is strong;
• if ln B_ij > 5, it is decisive.
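The scale above translates directly into a small helper (function name and return format are our own):

```python
import math

def jeffreys_interpretation(E_i, E_j):
    """Interpret ln(B_ij) = ln(E_i / E_j) on the Jeffreys' scale given in the text."""
    lnB = math.log(E_i / E_j)
    if lnB < 1.0:
        return lnB, "not significant"
    if lnB < 2.5:
        return lnB, "substantial"
    if lnB < 5.0:
        return lnB, "strong"
    return lnB, "decisive"

print(jeffreys_interpretation(math.e**3, 1.0))  # ln B = 3 falls in the "strong" band
```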

ABOUT DEEP LEARNING IN COSMOLOGY
The Bayesian evidence method remains preferred in the literature over information criteria and Gaussian processes. However, full Bayesian inference for model selection (in the case where we have a landscape in which we can discriminate a pivot model from a hypothesis) is computationally expensive and often suffers from multi-modal posteriors and parameter degeneracies. The latter issue leads to long run times before the final best fit of the free parameters is obtained. As the study of the Large Scale Structure (LSS) of the universe indicates, all our knowledge relies on state-of-the-art cosmological simulations to address several questions by constraining the cosmological parameters at hand with Bayesian techniques. Moreover, due to the computational complexity of these simulations, some studies look to remain computationally infeasible for the foreseeable future. It is at this point that computational techniques such as machine learning can find important uses, even in trying to understand our universe.
The idea behind machine learning is to consider a neural network as a complex combination of neurons organised in nested layers. Each of these neurons implements a function parametrised by a set of weights W, and every layer of a neural network thus transforms one input vector (or tensor, depending on the dimension) into another through a differentiable function. Theoretically, given a neuron n with an activation function A_n, at step t the output of the neuron can be computed as

h^{<t>} = A_n(W h^{<t-1>} + W x^{<t>}),   y^{<t>} = A_n(W h^{<t>}),

where h^{<t>} is called the hidden state, A_n is the activation function, and y^{<t>} is the output. The goal is to introduce a set of data to train this array so that the architecture learns to produce the appropriate output set. For example: the network can learn the distribution of the distance moduli in the dark energy models; we then feed the astrophysical samplers (surveys) to the network to reconstruct the dark energy model, and finally discriminate the most probable model. Moreover, while neural networks can learn complex nested representations of the data, allowing them to achieve impressive performance, this also limits our understanding of the model learned by the network itself. The choice of architecture [28] can have an important influence on the performance of the neural network. Some design decisions have to be made concerning the number and type of layers, as well as the number and size of the filters used in each layer. These choices are typically made through experimentation (which, for our universe, we will need to happen first). As it stands, we can select the size of the network, which depends on the number of training examples, since networks with a large number of cosmological parameters are likely to overfit if not enough training examples are available.
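A minimal sketch of the recurrent step just described, with NumPy and random placeholder weights (untrained, illustrative only); separate weight matrices W_x, W_h, W_y are used for the input, hidden, and output maps:

```python
import numpy as np

rng = np.random.default_rng(0)

# One recurrent unit: h<t> = A(W_h h<t-1> + W_x x<t>), y<t> = A(W_y h<t>)
n_in, n_hidden = 1, 4
W_x = rng.normal(size=(n_hidden, n_in))
W_h = rng.normal(size=(n_hidden, n_hidden))
W_y = rng.normal(size=(1, n_hidden))

def step(h_prev, x_t, A=np.tanh):
    """Single forward pass of the recurrent unit with activation A."""
    h_t = A(W_h @ h_prev + W_x @ x_t)
    y_t = A(W_y @ h_t)
    return h_t, y_t

# Feed a short sequence (e.g. redshifts sorted in increasing order)
h = np.zeros(n_hidden)
for x in ([0.1], [0.3], [0.5], [1.0]):
    h, y = step(h, np.array(x))
print(h.shape, y.shape)
```

The hidden state carries information forward along the sequence, which is what makes recurrent architectures natural for redshift-ordered data.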
At the moment these lines are being written, strong interest in this kind of algorithm is opening new opportunities for data-driven cosmological discovery, but it will also present new challenges in adopting machine learning (or, in our case, a subset of this field, deep learning) methodologies and in understanding the results when the data are too complex for traditional model development and statistical fitting. A few proposals in this area have explored deep learning methods for measuring cosmological parameters from density fields [29] and for future large-scale photometric surveys [30].

DEEP LEARNING FOR DARK ENERGY
A first step in training on an astrophysical survey is to design the architecture. The objective function of a neural network can have many unstable points and local minima, which makes the optimisation process difficult; in real scenarios, high levels of noise degrade the training data and typically produce optimisation landscapes with even more local minima, increasing the difficulty of training the neural network. It can thus be desirable to start optimising the neural network with noise-free data, which typically yields smoother landscapes. As an example, in Figure 1 we present a standard network that takes an image of a cosmological simulation (the data), passes it through an array of several layers, and finally extracts the output cosmological parameter values. Each neuron uses a Bayesian process to compute the error propagation, as is done in the standard inference analyses.
We can describe a quick but effective recipe for developing a Recurrent Neural Network with Bayesian training in the following steps:
• Step 1. Construction of the neural network. For a Recurrent Neural Network we can start with one layer and a certain number of neurons (e.g. 100 for a supernovae sampler).
• Step 2. Organising the data. We need to sort the sampler from lower to higher redshift in the observations. Afterwards, we re-arrange our data according to the number of steps (e.g. try 4 steps, numbered as x_i, for a supernovae sampler).
• Step 3. Computing the Bayesian training. Because neural networks overfit easily, it is important to choose a mode of regularisation. With a standard Bayesian method to compute the evidence, the algorithm can calculate errors via regularisation methods [31]. Finally, over the cost function we can use the Adam optimiser.
• Step 4. Training the entire architecture. It is advisable to use a high number of epochs (e.g. for a sampler such as Pantheon, you can try 1000 epochs per layer). After the training, it is necessary to load the model and apply the same dropout several times to the initial model. The result of this step is the construction of the confidence regions.
• Step 5. Computing the distance modulus µ(z) for each cosmological model. Using the definitions of E(z), we can compute µ(z) by using a specific dark energy equation of state in terms of z and then integrating.
• Step 6. Computing the best fits. Finally, the output values can be obtained by using the training data as a simulated sample. We use the publicly available codes CLASS and Monte Python to constrain the models, as is standard in Bayesian cosmology.
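The dropout-at-prediction idea in Step 4 (often called Monte Carlo dropout) can be sketched as follows; the `predict` stand-in below is a fixed linear map rather than a trained network, and all sizes and rates are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_dropout_predictions(predict, x, n_draws=100, p_drop=0.2):
    """Monte Carlo dropout sketch: keep dropout active at prediction time and
    collect many stochastic forward passes to build confidence regions."""
    draws = []
    for _ in range(n_draws):
        mask = rng.random(x.shape) > p_drop          # random dropout mask
        draws.append(predict(x * mask / (1.0 - p_drop)))
    draws = np.array(draws)
    return draws.mean(axis=0), draws.std(axis=0)     # mean and 1-sigma band

# Stand-in for a trained network (illustrative only)
predict = lambda x: 2.0 * x + 1.0

x = np.linspace(0.0, 1.0, 5)
mean, sigma = mc_dropout_predictions(predict, x)
print(mean.shape, sigma.shape)
```

The spread of the stochastic draws at each input is what provides the confidence regions mentioned at the end of Step 4.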
The results of this recipe can be seen in Figure 2.

CONCLUSIONS
In this chapter we presented how to compute the EoS for dark energy models that lead to an understanding of the observed cosmic acceleration. Notice that each Bayesian evidence computed will depend on the data density used for each cosmological proposal: the more data we consider, the better the statistical analysis will be. We therefore expect that future surveys at higher redshift will improve the constraints on the cosmological parameters of the models. The exploration of these astrophysical surveys has reached a new scenario with the advent of machine learning techniques [32,33]. These techniques allow us to explore, without the technical problems of the astrophysical devices, scenarios built around the pivot model of cosmology, ΛCDM, a theoretical framework that accurately describes a large variety of cosmological observables, from the temperature anisotropies of the cosmic microwave background to the spatial distribution of galaxies. This scenario has a few free cosmological parameters associated with fundamental quantities, such as the geometry and the Hubble flow, the amount and nature of dark energy, and the sum of the neutrino masses. If we know the values of these parameters, we will be able to refine our picture of the fundamental constituents and laws governing our universe.
Thus, one of the most important goals of modern cosmology is to constrain the values of these parameters with the highest accuracy. The interplay between standard cosmostatistics and machine learning techniques will further improve the constraints on the cosmological parameters without our having to worry about the intrinsic uncertainties of the data.