Exploring the Components of the Universe Through Higher-Order Weak Lensing Statistics Exploring the Components of the Universe Through Higher-Order Weak Lensing Statistics

and wavelet transform of the are comparable, the wavelet transform statistics appearing to offer a slight improvement over the M ap statistics in all cases. is perhaps related to the fact when computing the M ap statistics, we truncate the ﬁlter at ϑ cut = θ , which may give rise to some small systematics in the resulting maps. Furthermore, it is clear that, in all cases, the peak statistics are much more


Introduction
Our current cosmological model, backed by a large body of evidence from a variety of different cosmological probes (for example, see [1,2]), describes a Universe comprised of around 5% normal baryonic matter, 22% cold dark matter and 73% dark energy. While many cosmologists accept this so-called concordance cosmology -the ΛCDM cosmological model -as accurate, very little is known about the nature and properties of these dark components of the Universe.
Moreover, gravitational lensing probes cosmological perturbations on smaller angular scales than CMB studies, and is thus sensitive to the non-Gaussianity induced by the late-time non-linear evolution of structures such as clusters of galaxies, as well as any primordial non-Gaussianity arising, for example, due to inflation very early in the evolution of the Universe.
Traditionally, constraints from gravitational lensing are obtained by considering two-point statistics of the lensing shear field, which encodes the small elliptical distortions applied to the images of galaxies as a result of the gravitational potential field of structures along the light's path. Such two-point statistics are only weakly sensitive to the dark energy density parameter Ω Λ 1 , and depend on a degenerate combination of the amplitude of the matter power spectrum σ 8 and the matter density parameter Ω M . In addition, two-point statistics probe only the Gaussian part of the shear field, therefore in considering such statistics alone, information about nonlinear structure evolution and primordial non-Gaussianity is lost.
Similarly, attempts to reconstruct a map of the two-dimensional projected surface mass density (the convergence κ) and three dimensional density field have often involved the use of Gaussian priors to constrain the reconstruction (for example, see [4]), thus again having limited application in studies of non-Gaussianity.
In this chapter, we present a review of recently developed gravitational lensing techniques that go beyond the standard two-point statistics, both in the arena of map-making in two and three dimensions and that of higher-order statistics of the shear field or, equivalently, the convergence field κ.
The methods we present are all based on the concept of sparse recovery, which has been found to be a very powerful tool in signal processing [5,6]. Such methods are based on the assumption that a given image or observation can be represented sparsely in an appropriate basis (such as a Fourier or wavelet basis). Sparse priors using a wavelet basis have been used in many areas of signal processing in astronomy; of particular interest in this chapter will be the areas of denoising and in the reconstruction of the 3D density field from lensing measurements, but other applications include deconvolution and inpainting.
There is much information to be gained by considering non-Gaussian statistics and nonlinear signal processing methods, and this is an exciting and active area of research. With new surveys such as the Euclid satellite [7] coming online within the next decade, a wealth of high-quality data will soon be available. These new techniques will therefore prove invaluable in constraining the cosmological model, and allow us to better understand the nature of the primary constituents of the Universe.
originating at angle β is deflected such that it appears to the observer to originate at an angular position θ, where β = θ − α(θ) = θ − ∇ψ(θ), (1) where ψ is the two-dimensional deflection potential associated with the lens and α is the deflection angle.
For extended sources, photons from different angular positions in the source plane are deflected differently, giving rise to a distortion in the observed galaxy image, which is described -to first order -by the Jacobian of the lens equation (1): where κ = 1 2 ∇ 2 ψ is the convergence, or dimensionless surface density, and γ = γ 1 + iγ 2 = |γ|e 2iφ is the complex shear, which gives rise to an anisotropic elliptical distortion in the lensed image. The components of the shear are related to the potential via: where ∂ i = ∂/∂θ i , and the shear is related to the convergence through the relation: where the convolution kernel D is given by The convergence, in turn, can be related to the 3D density contrast δ(r) ≡ ρ(r)/ρ − 1 by where ρ is the mean density of the Universe, H 0 is the Hubble parameter, Ω M is the matter density parameter, c is the speed of light, a(w) is the scale parameter evaluated at comoving distance w, and gives the comoving angular diameter distance as a function of the comoving distance and the curvature, K, of the Universe.

Two-point statistics
The typical gravitational shear applied to a galaxy as a result of lensing by large-scale structure in the Universe -so-called cosmic shear -is of order |γ| ∼ 0.01. However, galaxies are intrinsically elliptical in shape, with a typical ellipticity of order |ε| ∼ 0.2 − 0.3, therefore the gravitational lensing effect can only be measured statistically; under the assumption that galaxy shapes are intrinsically uncorrelated, the mean ellipticity computed over a large number of sources will yield the gravitational shear: ε ≃ γ.
The most common method for constraining cosmological parameters in weak lensing studies is to use two-point statistics of the shear field. The power spectrum of the shear or convergence, P κ (ℓ) = P γ (ℓ), can be related directly to the 3D matter power spectrum of density fluctuations δ by: where is a weighting function integrated over the probability distribution of sources p(z) in the sample as a function of redshift z.
When working with real data, it is often more convenient to consider statistics computed in real space, namely the shear correlation functions, which are defined in terms of the convergence power spectrum as [10,11]: where J n is an n-th order Bessel function of the first kind, and γ t and γ × are the tangential and cross shear components, respectively, which are defined relative to the vector connecting the two galaxies separated by angular distance θ.

Cosmological constraints
The measured two-point statistics described above can, by virtue of their relationship to the power spectrum of the underlying matter density fluctuations, be used to place constraints on cosmological parameters. These constraints may be improved if information about the distances to source galaxies is used to bin the galaxies into redshift slices, and compute the correlation functions tomographically. Figure 1 shows cosmological parameter constraints obtained using two state-of-the-art telescopes: the Canada-France-Hawaii Telescope (CFHT), and the Hubble Space Telescope (specifically, the Cosmos survey). While the CFHT results were obtained using a 2D analysis [12], the Cosmos results show constraints obtained by both a 2D and a 3D analysis of the shear data [13]. In both cases, a large degeneracy is seen between the matter density parameter Ω M and σ 8 , the amplitude of the matter power spectrum. The CFHT results also show constraints on the dark energy equation of state parameter w. Here, again, a degeneracy between parameters is seen.
(a) Joint constraints on Ω M and σ 8 from a 2D analysis of the combined CFHT wide and deep surveys [12].
(b) Joint constraints on Ω M and w from a 2D analysis of the CFHT deep survey [12]. Ωm σ8 (c) Joint constraints on Ω M and σ 8 from a 2D and 3D analysis of the Cosmos survey data [13]. Contours are plotted at the 68% (cyan), 95% (blue) and 99.9% (green) confidence levels for the CFHT data, and at the 68% (inner) and 95% (outer) confidence levels for the Cosmos data.
While it is clear from the figures that two-point shear statistics hold a wealth of information, it is also evident that these statistics depend on degenerate combinations of the cosmological model parameters. In order to break these degeneracies, and thus more tightly constrain our cosmological model, we must therefore consider higher-order statistics of the shear or convergence. Such higher-order statistics will enable us to capture information about the non-Gaussian part of the lensing spectrum due to a combination of primordial non-Gaussianity and late-time nonlinear evolution of structures, which two-point statistics are unable to probe.

Scalar fields associated with the shear
The shear γ is a spin-2 field, therefore while the two-point correlations of the shear field can be reduced to a scalar quantity for parity reasons, this is not the case for higher-order moments of the shear field [14][15][16]. For this reason, it is useful to compute a scalar quantity from the shear field before computing higher-order statistics.

The aperture mass statistic
For this purpose, the aperture mass statistic M ap [17,18] is widely-used. The M ap statistic is defined as the convolution of the convergence κ with a radially symmetrical filter function U(|ϑ|) of width θ: By considering the relationship between the shear, γ, and the convergence, one can reformulate equation (11) in terms of the measured shear as where γ t (ϑ) is the tangential component of the shear at position ϑ relative to the centre of the aperture, Q(|ϑ|) is a second radially-symmetric function, related to U(|ϑ|) by: and U(ϑ) is required to be compensated, i.e.
with ϑ cut often taken to be the radius of the aperture, θ. Furthermore, Q(ϑ) and U(ϑ) are required to go to zero smoothly at ϑ cut . It is also preferable that the power spectrum of U(ϑ) is local in the frequency domain, and shows no oscillatory behaviour. This ensures that the filter function acts as a band-pass filter, allowing detection of structures at the scale of interest only.
Several authors [19,20] have advocated a filter function of the form: where various choices for the constant b and the overall normalisation A have been used in the literature [20][21][22]. Of specific interest is the form used in [20] and [22], where b = 1/2 is chosen. 2 This form is considered to be optimal for higher-order weak lensing statistics, such as the skewness of the aperture mass statistic [19,22].
Note that the Q(ϑ) filter function described in equation (15) shows a peak at ϑ = √ 2θ, and tends to zero as ϑ → ∞. In practice, any algorithm used to generate an aperture mass map will need to truncate this filter function at some finite radius. This will involve a trade-off between accuracy and algorithm speed, and truncation may affect the effective width of the filter.
One advantage of this method is that the aperture mass statistic can be computed directly from the shear catalogue. Moreover, as the filter acts as a band-pass filter, it is possible to boost the signal relative to the noise by considering a filter with a scale substantially larger than the typical scale of the noise (which is typically dominant on pixel scales, depending on the binning chosen for the shear data). Indeed, optimal signal-to-noise is obtained when the filter is chosen such that its angular scale matches that of the structures we aim to detect, and its shape matches closely the expected profile of these structures.

The wavelet transform
The wavelet transform is a multiscale transform, where the wavelet coefficients of an image are computed at each position in the image at various different scales simultaneously. In one dimension, the wavelet coefficient of a function f (x), evaluated at position b and scale a is defined as [23,24]: where ψ(x) is the analysing wavelet. The analysis is analogous in 2 dimensions, with ψ(x, y) = ψ(x)ψ(y).
By definition, wavelets are compensated functions; i.e. the wavelet function ψ(x) is constrained such that R 1 ψ(x)dx = 0 and hence, by extension According to the definition in equation (16), the continuous wavelet transform of an image is therefore nothing more than the convolution of that image with compensated filter functions of various characteristic scales. If the image f (x, y) is taken to be the convergence κ(x, y), then for an appropriate choice of (radially-symmetric, local) wavelet, the wavelet transform is formally identical to the aperture mass statistic at the corresponding scales, the only difference being the choice of filter functions.
In practice in application, we use the starlet transform algorithm [6,[23][24][25], which simultaneously computes the wavelet transform on dyadic scales corresponding to 2 j pixels. This algorithm decomposes the convergence map of size N × N into J = j max + 1 sub-arrays of size N × N as follows: where j max represents the number of wavelet bands (or, equivalently, aperture mass maps) considered, c J represents a smooth (or continuum) version of the original image κ, and w j represents the input map filtered at scale 2 j (i.e. the aperture mass map at θ = 2 j pixels).
Using the wavelet formalism to derive the aperture mass statistic presents a number of advantages. Many families of wavelet functions have been studied in the statistical literature, and all these wavelet functions could be applied to the aperture mass statistic. This allows us to tune our filter function to optimise the signal-to-noise in the resulting maps. In addition, for some specific wavelet functions, discrete and very fast algorithms exist, allowing us to compute a set of wavelet scales through the use of a filter bank with a very limited number of operations. See [6] for a full review of the different wavelet transform algorithms.
In the starlet transform algorithm, the wavelet ψ(x, y) is separable and can be defined by: where ϕ(x, y) = ϕ(x)ϕ(y) and ϕ(x) is a scaling function from which the wavelet is generated.
In the case of the starlet wavelet, ϕ(x) is a B3-spline: which is a compact function that is identically zero for |x| > 2.
This wavelet function has a compact support in real space, is well localized in Fourier domain, and the wavelet decomposition of an image can be obtained with a very fast algorithm (see [24] for a full description). Figure 2 shows the starlet wavelet and aperture mass filters defined above in both real and Fourier space. Notice that the two filters presented have very similar shapes in real space, but different widths. While the starlet filter function goes to zero identically at ϑ = 2θ, and remains zero beyond this value, the M ap filter function goes to zero as ϑ → ∞, and must therefore be truncated when applied in practice. Clearly at ϑ = 5θ, the M ap filters are sufficiently close to zero, so that this is an appropriate truncation radius; however this will impact the computation time of the M ap statistic.
In Fourier space, we show the response of the M ap filter for truncation radii of ϑ cut = θ and 5θ. Notice that the shape of the response curves in Fourier space for both the M ap and wavelet filter functions are similar, but that the peak of the response curve for the M ap filter function shifts when ϑ cut is varied. This is because the effective scale of the filter function is being changed. Also notice that in the case of ϑ cut = θ, high-frequency oscillations are present in the response curve. This is a direct result of the truncation of the filter function, and will occur whenever such truncation is applied. Indeed, imperceptibly small oscillations are still present in the response curve for the filter truncated at ϑ cut = 5θ.
Such oscillations are not present at all with the starlet filter function, as no truncation is applied to this function whatsoever. This gives the starlet transform the distinct advantage of being directly applicable, without any consideration needed regarding truncation radii and the associated impact on the Fourier-space response of the filter functions.

Advantages of the wavelet formalism
The aperture mass formalism tends to be the preferred method for weak lensing studies for a number of reasons. Firstly, the filter functions can be simply expressed analytically, and any associated statistics of the aperture mass can therefore be straightforwardly computed from  the shear catalogue directly. This avoids the need to generate an aperture mass map, which can be computationally intensive, and furthermore does not require the computation of the convergence, κ.
Computing the convergence from the shear measurements can be tricky for a number of reasons. Equation (4) describes a convolution over all 2D space. Given that we aim to invert this equation for an image of a finite size, a direct inversion in Fourier space will give rise to significant edge effects, and a leakage of power into so-called B-modes, which effectively imply a spurious (non-lensing) cross-component to the shear field γ, and usually only arise due to systematic effects in the lensing measurements.
Several methods have been devised to invert equation (4) whilst minimising these undesirable effects [26][27][28][29]. Most recently, the authors in [30] have presented a method based on a wavelet-Helmholtz decomposition, with which they demonstrate a reconstruction error at the few percent level, as compared to an error of ∼ 30% seen with Fourier-based methods. This implies that there is little advantage to the shear catalogue as opposed to the convergence map. Indeed, working on the convergence map and under the wavelet framework may offer some distinct advantages.
The first advantage comes directly from the nature of the wavelet transform. The aperture mass filter is a bandpass filter. Because the M ap reconstruction is only computed for a discrete set of aperture scales, information on intermediate scales may be lost, particularly at both the small and large frequency extremes. In contrast, the wavelet transform retains information on all scales. The first wavelet scale is effectively a high-pass filter, retaining all the high-frequency information in the image, while the remaining wavelet scales are bandpass filtered as with the M ap -filtered images. Finally, the wavelet transform retains c J , which encodes the large-scale information in the image, and therefore consists of a smoothed version of the image (or an image with a low-pass filter applied).  In addition, from a mapping perspective, the starlet wavelet transform algorithm is substantially faster than a brute-force computation of the aperture mass statistic in real space. Such a naive implementation has complexity ∝ O(N 2 ϑ 2 cut ), where N × N is the dimension of the image, and ϑ cut is the chosen truncation radius. This scaling means that for large apertures or, equivalently, for high-resolution images, the algorithm may prove to be very time-consuming. The starlet wavelet transform algorithm is of complexity ∝ O(N 2 J), where J is the number of scales considered, and is limited by N ≥ 2 J . This means that the processing time for the wavelet transform algorithm is sensitive only to the number of scales considered, rather than the size of the filter functions involved, and depends linearly on this number.
In Figure 3(b), we compare the processing time for the aperture mass algorithm and the starlet transform algorithm, both programmed in C++, to analyse an image of 1024 × 1024 pixels on a 2 × 2.66GHz Intel Xeon Dual-Core processor. We consider aperture scales θ = [2,4,8,16,32,64], which correspond to J = j max + 1 = [2, 3,4,5,6,7] wavelet scales in the wavelet transform. In the aperture mass algorithm, the filters are truncated at a radius of ϑ cut = θ. For the application of filters which necessitate truncation at a much larger radius, we expect the computation time to be roughly an order of magnitude longer. Even at the smallest aperture radius, the wavelet transform is ∼ 5× faster than the aperture mass algorithm. At θ = 64 pixels, the wavelet transform is ∼ 1200× faster than the aperture mass algorithm. Note that the wavelet transform for a given value of J simultaneously computes the wavelet transform at all scales 2 j , 0 < j ≤ J − 1, in addition to the smoothed continuum map c J , whilst the aperture mass algorithm computes the transform at a single scale θ. We note further that the computational time for the wavelet transform for J = 7 wavelet scales is still a factor of ∼ 2× less than the computational time for the aperture mass algorithm at θ = 2 pixels.
As discussed in [31], this time advantage further extends to higher-order statistics of the M ap , which are typically related directly to n-point correlation functions of the shear. In recent years, tree codes have been employed to speed up computation of n-point correlation functions. Typical tree codes to compute n-point correlation functions are O(N gal log(N gal )) [32] on a shear catalogue. For a single Euclid exposure of 0.5 deg 2 , we can expect N gal ∼ 54, 000 (30 galaxies/arcmin 2 ) -180, 000 (100 galaxies/arcmin 2 ). Tree codes exist that act on pixelated data [33] which run at O(N 2 pix n bin ) where N 2 pix is the total number of pixels and n bin is the number of bins in the correlation function. For a Euclid exposure, assuming pixels of 1 arcminute, we have N 2 pix = 1800, and n bin will be dependent on the required resolution of the correlation function.
The wavelet method acts on pixellated data, and is O(N 2 J) in computation time, so our algorithm will be comparable for computation of 2-point statistics, if the 2-point correlation function is computed on pixellated data, but if a shear catalogue is used, we will have a faster algorithm by at least an order of magnitude. For higher-order statistics, this advantage is even more pronounced. Furthermore, while optimised software is freely and publicly available to compute the wavelet transform, such optimised software is not available for n-point correlation functions.
Another advantage of the wavelet formalism is that it is possible to carry out an explicit denoising of the convergence field using thresholding based on a False Discovery Rate method. For details on this method, see [34], where the MRLens software package encoding this method is presented 3 . This allows one to derive robust detection levels in wavelet space, and to produce high-fidelity denoised mass maps.
In addition, wavelet-based methods offer more flexibility than aperture mass filters. Whilst we have thus far discussed only the starlet wavelet function, many other wavelet dictionaries may be used. The starlet filter seems ideal for lensing studies, due to its similarity to the aperture mass filter presented here, which was deemed to be optimal in [22]. However, different dictionaries may be optimal in different applications; for example, if one were attempting to study filamentary structure, ridgelets or curvelets might be a more appropriate basis. The vastness of the wavelet libraries and the public availability of fast algorithms to compute these transforms are major strengths of wavelet-based approaches.

Weak lensing beyond two-point shear statistics
As noted previously, the optimality of second-order statistics to constrain cosmological parameters depends heavily on the assumption of Gaussianity of the field. However the weak lensing field is composed, at small scales, of non-Gaussian features such as clusters of galaxies. These non-Gaussian signatures carry additional information that cannot be extracted with second-order statistics. Many studies [35][36][37][38][39] have shown that combining second-order statistics with higher-order statistics tightens the constraints on cosmological parameters. We now consider the higher-order statistics most commonly used in weak lensing studies aimed at detecting and constraining non-Gaussianity in the lensing field.

Higher-order lensing statistics
As we have already noted, it is more convenient to consider higher-order statistics of the M ap or wavelet transform of the convergence field, as opposed to statistics of the shear field directly. The obvious first extension to two-point statistics is to consider the three-point correlation function or, equivalently, the bispectrum, usually considered as a function of aperture or wavelet scale.
Interesting analytical results relative to the shear three-point correlation function or the convergence bispectrum have been reported (for example, see [40][41][42][43]). However, it has been shown that in using only the equilateral configuration of the bispectrum, the ability to discriminate between cosmological models is relatively poor [38]. An analytical comparison has been performed in [39] between the full bispectrum and an optimal match-filter peak count for a Euclid-like survey, and both approaches were found to provide similar results. However, as the full bispectrum calculation has a much higher complexity than other statistics, and no public software exists to compute it, we will not consider the full bispectrum here. Rather, we will restrict ourselves to statistics that are more straightforward to compute from the M ap or wavelet transform maps.

The Skewness
The skewness of the aperture mass map, M 3 ap , is the third-order moment of the aperture mass M ap (θ) and can be computed directly from shear maps filtered with different aperture mass. The skewness is a measure of the asymmetry of the probability distribution function. The probability distribution function will be more or less skewed positively depending on the abundance of dark matter haloes at the θ scale. The formalism exists to predict the skewness of the aperture mass map for a given cosmological model, which is related to the three-point correlation function or the bispectrum of density fluctuations δ [20,37].
In [37], it is argued that the skewness as a function of scale is a preferable statistic to three-point correlation functions of the shear, as the integral relations between M 3 ap and the bispectrum are much easier and faster to compute than the three-point correlation function. This is because the skewness is a local measure of the bispectrum, whereas the integral kernel for the three-point correlation function is a highly oscillating function with infinite support.
This statistic can equivalently be computed from the wavelet transform of the convergence map as the skewness w 3 j of the wavelet band j corresponding to the aperture scale θ. This can be computed either on the noisy convergence map estimated, for example, by the Fourier space relationship between the shear and convergence -in which case the results should be comparable to the M ap results -or on a denoised convergence map using the method of [34], which should provide improved results.

The Kurtosis
The kurtosis of the aperture mass map, M 4 ap , is the fourth-order moment of the aperture mass M ap (θ) and can be computed directly from the different aperture mass maps. The kurtosis is a measure of the peakedness of the probability distribution function. The presence of dark matter haloes at a given θ scale will flatten the probability distribution function and widen its shoulders leading to a larger kurtosis. The formalism exists to predict the kurtosis of the aperture mass map for a given cosmological model, which is related to the four-point correlation function or trispectrum of the 3D density contrast [20].
Again, this statistic can be computed from the wavelet transform of the convergence map as the kurtosis w 4 j of the wavelet band j corresponding to the aperture scale θ. This can be computed either on a noisy or a denoised convergence map, the latter expected to yield improved constraints.

Peak Counts
We define a peak as set of connected pixels above a detection threshold T , and denote the peak count in the M ap and wavelet maps by P T M ap and P T w j , respectively. If peak counting is carried out on a denoised map, the detection threshold T is automatically set by the denoising algorithm, and a small threshold ǫ is used when identifying peaks in the denoised maps, in order to reject spurious detections in these denoised maps.
We consider all pixels that are connected via the sides or the corners of a pixel as one structure. In a two-dimensional projected map of the convergence, we are therefore unable to discriminate between peaks due to massive halos and peaks due to projections of large-scale structures such as filaments.
While theory exists to predict cluster counts from the halo model and a cosmological model encoding the growth and evolution of structures in the Universe, there is no analytic formalism to predict the fraction of spurious detections in lensing maps arising from projections of large-scale structure. In [44], an attempt is made to derive an analytic formalism for predicting peak counts in projected maps. However, this method is based on an assumption of Gaussianity in the lensing field and, predictably, underestimates counts at the high end of the mass function, where clusters of galaxies are dominant. This means that predicting peak counts for a given cosmological model relies on considering N-body simulations generated under a range of cosmological parameters (for example, see [45]).

Optimal capture of non-Gaussianity
The question now arises: which of these statistics provides the most information about the underlying cosmology and -specifically -non-Gaussianity in the density field? In order to asses the performance of these statistics, we consider the ability of each statistic to discriminate between cosmological models using N-body simulations under a range of different cosmologies [38,46].
To this end, we consider N-body simulations carried out for 5 different cosmological models along the Ω M − σ 8 degeneracy, the parameters of which are summarised in Table 1 below [46]. These simulations were carried out using the RAMSES N-body code [47], and full details of the models considered are given in [38]. For each model, 100 realizations of weak lensing maps were generated for a field of 3.95 • × 3.95 • , downsampled to 1024 × 1024 pixels (0.23 ′ per pixel). Noise was added to the simulations at a level consistent with predictions for deep space-based observations, with a number density of galaxies of n g = 100 gal/arcmin 2 . This is somewhat optimistic; however, as seen in [46], increasing the noise level in the simulations does not change the conclusions of the comparison test between the higher-order statistics. To find the best statistic, we need to characterise quantitatively for each statistic the discrimination between two different models m 1 and m 2 . To do this, we consider the distribution functions for the different statistics estimated on the 100 realisations of each model. These distributions are expected to overlap, so False Discovery Rate (FDR) method is used to determine the threshold τ 1 in the distribution function of model m 1 , such that the distribution function of model m 2 accounts for fewer than a fraction α = 0.05 of the total counts. A similar threshold τ 2 is defined for the distribution function of model m 2 . This is illustrated in Figure 4. The discrimination efficiency is then defined as the percentage of counts remaining under the hatched area for each model. In Table 2 below, we show the mean discrimination efficiency obtained considering all pairs of models for the higher-order statistics of interest. These are presented as a function of M ap or wavelet scale, and shown for statistics computed on the aperture mass map and the wavelet transform of the noisy convergence map. The results overall are comparable, with the wavelet transform statistics appearing to offer a slight improvement over the M ap statistics in all cases. This is perhaps related to the fact that, when computing the M ap statistics, we truncate the filter at ϑ cut = θ, which may give rise to some small systematics in the resulting maps. Furthermore, it is clear that, in all cases, the peak statistics are much more efficient at discriminating between cosmological models than either the skewness or the kurtosis.  Table 2. Mean discrimination efficiency (in percent) from noisy M ap reconstructions of the shear field, compared with that obtained on the wavelet transform of the noisy convergence maps.

Model
In Table 3 below, we present the mean discrimination efficiency of statistics computed on the denoised convergence maps, as well as the peak discrimination efficiency between each pair of models in our sample. There is a clear improvement on the discrimination efficiency of all the higher-order statistics when denoising is applied to the convergence maps. This is unsurprising, as the presence of Gaussian (or near-Gaussian) noise within the shear and convergence maps will make the whole field appear more Gaussian, masking the non-Gaussian features in the map and pushing the skewness and kurtosis values closer to zero. The more noise that is present in the maps, the stronger this effect will be. It is for this reason that such a dramatic improvement is seen in these statistics when denoising is applied to the convergence maps before the statistics are computed.
The improvement seen in the discrimination efficiency of the peak statistics is also significant, and this arises due to the fact that while the peak counts in the noisy maps involve application of a simple nσ detection threshold to the maps, the denoising algorithm involves a more sophisticated discriminant between signal and noise, again making use of the FDR method. This technique is more effective and distinguishing between signal and noise, and therefore more information about the true peaks in the map is retained using this denoising method as compared to the nσ thresholding applied previously.
It is clear that peak statistics are much more efficient at capturing non-Gaussianity in the shear field than other higher-order statistics, and that it is advantageous to measure this statistic on denoised convergence maps. However, in order to combine constraints from peak counting with other probes, such as two-point statistics, in order to better constrain the cosmological model, it is important to be able to predict peak statistics as a function of cosmology in order to be able to compute the likelihood function for all models within the parameter space. As we have discussed, this is not possible to do analytically, due to contamination by projections of large scale structures, and it is too computationally-intensive  to consider obtaining the predictions, and associated covariances, from N-body simulations if we wish to sample the parameter space completely and at high resolution.
As the quality of data available to astronomers continues to improve, it has recently become possible to consider the shear field in three dimensions, using the colours of galaxies to determine their redshifts and, therefore, distances from us. If this information may be used to deproject the lensing signal, and thus recover information about the full three-dimensional density field, it should then be possible offer predictions for peak statistics as a function of cosmology, uncontaminated by projection effects which will, in turn, allow us to improve our constraints on our cosmological model.

Reconstructing the density contrast in 3D
The measured shear can be related to the 2D convergence via a simple linear relationship (equation (4)), and the convergence is related to the density contrast by another linear mapping (equation (6)). We can express these relationships conveniently in matrix notation as γ(θ, z) = P γκ Qδ(θ, z) .
The 3D lensing problem is therefore one of finding an estimator to invert equation (21) or (22) in the presence of noise. For simplicity, in what follows we assume the shear measurement noise is Gaussian and uncorrelated between redshift bins. In practice, errors in the photometric estimates of the redshifts of galaxies will introduce correlations between redshift bins. Methods have been developed to account for such errors, however, and the problem is therefore readily tractable [48].
Therefore the 3D lensing problem is effectively one of observing the density contrast convolved with the linear operator R, and contaminated by noise, which is assumed to be Gaussian. Formally, we can write where d is the observation, s the real density and ε the Gaussian noise.

Linear approaches to 3D map-making
The general idea behind linear inversion methods is to find a linear operator H that acts on the data vector to yield a solution that minimises some functional, such as the variance of the residual between the estimated signal and the true signal, subject to some regularisation or prior-based constraints. Two different linear approaches have been described in recent literature [4,49].
The most competitive method is that proposed in [4], in which the authors propose a Saskatoon filter [50,51], which combines a Wiener filter and an inverse variance filter, with a tuning parameter α introduced that allows switching between the two. This gives rise to a minimum variance filter, expressed aŝ where S ≡ ss † encodes prior information about the signal covariance, Σ ≡ nn † gives the covariance matrix of the noise, and 1 is the identity matrix.
This switching is designed to allow a balance between the increased constraining power offered by the Wiener filter over the inverse variance filter -which yields an improved signal-to-noise in the reconstruction -and the biasing that the Wiener filter imposes on the solution.
As discussed extensively in [4] and [49], linear methods give rise to a significant bias in the location of detected peaks, damping of the peak signal, and a substantial smearing of the density along the line of sight. Furthermore, the resolution attainable in the reconstructions obtained using linear methods is limited to the resolution of the data. In other words, we cannot reconstruct the density contrast field at higher resolution than the resolution of our input data which, in turn, is limited by the noise properties of the data. Figure 5 shows the 3D reconstruction obtained in this way for a simulated cluster of galaxies at a redshift of z cl = 0.25. The tuning parameter used for this reconstruction was α = 0.05. The cluster was simulated according to an NFW halo with M = 10 15 h −1 M ⊙ and c = 3, and the shear data were assumed to come from a galaxy distribution given by p(z) ∝ z 2 e −(1.4z) 1.5 [52,53], with a maximum redshift of z max = 2.0 and a galaxy density of n g = 100 galaxies/arcmin 2 . The simulation covers a 1 • × 1 • field binned into 60 ×

Compressive sensing
We consider some data Y i ( i ∈ [1, .., m]) acquired through the linear system where Θ is an m × n matrix. Compressed sensing [54,55] is a sampling/compression theory based on the sparsity of the observed signal, which shows that, under certain conditions, one can exactly recover a k-sparse signal (a signal for which only k pixels have values different from zero, out of n total pixels, where k < n) from m < n measurements.
This recovery is possible from undersampled data only if the sensing matrix Θ verifies the restricted isometry property (RIP) [see 54, for more details]. This property has the effect that each measurement Y i contains some information about the entire pixels of X; in other words, the sensing operator Θ acts to spread the information contained in X across many measurements Y i .
Under these two constraints -sparsity and a transformation meeting the RIP criterion -a signal can be recovered exactly even if the number of measurements m is much smaller than the number of unknown n. This means that, using CS methods, we will be able to outperform the well-known Shannon sampling criterion by far.
The solution X of (25) is obtained by minimizing where the ℓ 1 norm is defined by X 1 = ∑ i | X i |. The ℓ 1 norm is well-known to be a sparsity-promoting function; i.e. minimisation of the ℓ 1 norm yields the most sparse solution to the inverse problem. Many optimisation methods have been proposed in recent years to minimise this equation. More details about CS and ℓ 1 minimisation algorithms can be found in [6].
In real life, signals are generally not "strictly" sparse, but are compressible; i.e. we can represent the signal in a basis or frame (Fourier, Wavelets, Curvelets, etc.) in which the curve obtained by plotting the obtained coefficients, sorted by their decreasing absolute values, exhibits a polynomial decay. Note that most natural signals and images are compressible in an appropriate basis.
We can therefore reformulate the CS equation above (Equation (26)) to include the data transformation matrix Φ: where X = Φ * α, and α are the coefficients of the transformed solution X in Φ, which is generally referred to as the dictionary. Each column represents a vector (also called an atom), which ideally should be chosen to match the features contained in X. If Φ admits a fast implicit transform (e.g. Fourier transform, Wavelet transform), fast algorithms exist to minimise Equation (27).
One problem we face when considering CS in a given application is that very few matrices meet the RIP criterion. However, it has been shown that accurate recovery can be obtained as long the mutual coherence between Θ and Φ, µ Θ,Φ = max i,k Θ i , Φ k , , is low [56]. The mutual coherence measures the degree of similarity between the sparsifying basis and the sensing operator. Hence, in its relaxed definition, we consider a linear inverse problem Y = ΘΦX as being an instance of CS when 1. the problem is underdetermined, 2. the signal is compressible in a given dictionary Φ, 3. the mutual coherence µ Θ,Φ is low. This will happen every time the matrix A = ΘΦ has the effect of spreading out the coefficients α j of the sparse signal on all measurements Y i .
Most CS applications described in the literature are based on such a soft CS definition. Compressed sensing was introduced for the first time in astronomy for data compression [57,58], and a direct link between CS and radio-interferometric image reconstruction was recently established in [59], leading to dramatic improvement thanks to the sparse ℓ 1 recovery [60].
The 3D weak lensing reconstruction problem can be seen to completely meet the soft-CS criteria above. The problem is underdetermined, as we seek a higher resolution than can be attained in the noise-limited observations, the matter density in the Universe is sparsely distributed, and the lensing operator spreads out the underlying density in a compressed sensing way.
In particular, for the reconstruction of clusters of galaxies, we are in a perfect situation for sparse recovery because clusters are localised in the angular domain, and are not resolved along the line of sight owing to the bin size. They can therefore be modelled as Dirac δ−functions along the line of sight, while an isotropic wavelet basis can be used in the angular domain.

Results and future prospects
In [61], we present an algorithm to solve the 3D lensing problem. In this method, the 3D lensing problem is reduced to a one-dimensional problem, by taking as the data vector the (noisy) lensing convergence along each line of sight, which is related to the density contrast through Equation (21). Each line of sight can therefore be considered independently.
Clearly, a one-dimensional implementation throws away information, because we do not account at all for the correlation between neighbouring lines of sight that will arise in the presence of a large structure in the image; however, reducing the problem to a single dimension is fast and easy to implement, and allows us to test the efficacy of the algorithm using a particularly simple basis function through which we impose sparsity.
In Figure 6, we show the reconstruction obtained in this way for the simulated cluster described in section 5.1. The line of sight plot shows a clear improvement in all the target areas, reducing the bias, smearing and damping effects seen using linear methods. Small-scale noise is present, particularly at high redshifts, but this is likely due to overfitting in the algorithm used, and may be reduced by considering the whole 3D field, rather than each line of sight independently (for a full discussion of this issue, see [61]).  This reconstruction was undertaken using the same resolution on the reconstruction as on the input data. However, we know that the CS approach is particularly well-adapted to dealing with ill-posed inversion problems. In order to test this, we consider a cluster at a redshifts of z cl = 0.2, simulated as before. We use N sp = 20 redshift bins in our data, but now aim to reconstruct our density contrast with a redshift resolution of N lp = 25. The results are shown in Figure 7. Here, again, we see small scale noise at high-redshift, but the overall smearing and redshift bias issues seen in the linear reconstructions is absent.
As noted previously, the one-dimensional solver employed here throws away a wealth of information about the angular correlation of the lensing signal, and is thus not optimal. Indeed, a simple algorithm based on this CS approach, but implemented as a full three-dimensional treatment, offers marked improvements in the quality of the reconstructions [62].  This is demonstrated in Figure 8, which shows reconstructions of the cluster at redshift z cl = 0.2, with the same improved resolution in the reconstruction as before, but this time using the three-dimensional CS approach. Dramatic improvement is seen on all fronts, with the reconstructions showing no bias or redshift smearing, and very little amplitude damping, and with none of the small-scale false detections seen in the 1D CS approach.  This marked improvement in the cluster reconstruction seen in the 3D CS approach represents a definite step in the right direction for weak lensing studies. There is much work to be done, however. The application of this CS approach to the 3D lensing problem is a very recent development, and many questions remain: the choice of algorithm, for example; how best to control the noise in the reconstruction; how to deal with photometric redshift errors; by what factor the reconstruction resolution might be improved as compared to the data.
Yet it is clear that this approach opens up the possibility of being able to generate accurate reconstructions of the density contrast using weak lensing measurements, and perhaps using information such as the 3D peak count -in combination with constraints from other probesto place ever-tighter constraints on our cosmological model. This, in turn, will offer a unique insight into the nature and properties of the dark components of the Universe.