## 1. Introduction

The core of current research foci in cosmology is to interpret the distribution and properties of observed galaxies in the sky and to understand their formation and evolution. The current standard cosmology model—lambda-cold-dark-matter (ΛCDM) paradigm—provides a general explanation for the galaxy formation and evolution: matter is dominated by the dark matter, which only subjects to gravitational interactions; inside dark matter halo that acts as a gravitational potential well, baryonic matters go through a series of physical processes, such as gas cooling, star forming and death with Supernova feedback.

From the cosmic micro background (CMB) observation, such as WMAP [1] and Plank [2], matters occupy roughly one-fourth of total energy of the Universe. The rest comes from dark energy. Dark matter is about 20% of the total energy, while baryons only occupy 5% (see more accurate fractions from [3]). At the CMB time (z ~ 1100), matters are distributed nearly ‘homogeneous’ in the Universe with little fluctuations at small scales. Started from that time, dark matter and baryons are assembled by the gravitational force. They follow a pattern of hierarchical structure formation, where the smaller structures form first, then merge to build massive ones. At very large scale, this structure formation process can be roughly described by the Zel'dovich approximation [4]. However, the formation of structures with gravity is a nonlinear process, which cannot be fully described analytically, especially at small scales. Therefore, building these structures and tracing their evolutions require numerically solving the gravitational equation.

Combining with modern computers, this problem can be solved with numerical methods—N-body simulations, which boosts a new area of research in astronomy. Initially, different numerical methods are developed to simulate only dark matter component, such as particle-mesh (PM), particle-particle/particle-mesh (P3M) and tree-PM algorithms. Dark matter is described numerically by data points/particles that trace a mass element corresponding to a volume element of the early ’homogeneous’ universe. Those methods successfully describe the formation of structures by implementing gravitational interactions. Thus, over decades, such simulations have been widely used with little variation in term of physics. Combined with ever decreasing limitations of computer resources and vast improvement in terms of implementations, larger volumes can be explored with increasing resolution to reserve the small-scale information. The properties of cosmology structures (such as cosmic web, voids), halos and even subhaloes are well understood.

*However, these simulations cannot give any information of galaxies, which are resident inside dark matter halos*.

To connect these theoretical investigation results with observed galaxies, numerous methods are developed. They can be roughly separated into these three approaches:

Much simpler approaches are halo occupation distribution (HOD) models, where observed galaxies are assigned to halos by matching both the halo mass and stellar mass functions (e.g. [5–10]). Such methods are tuned to directly link the luminosity functions with halo mass functions. Thus, they are successful in defining the stellar mass halo mass (SMHM) relation. However, this method cannot provide useful individual galaxy information. Furthermore, the scatter in this relation still remains uncertain and difficult to interpret. It can be constrained by comparing specific galaxies, their environments, the inter galactic medium (IGM) and their full formation history.

Other less computationally intensive methods involve applying sub-grid models on the scale of dark matter halos, starting from the accretion of gas by the potential well, following recipes of gas cooling, star forming, supernova (SN) and active galactic nuclei (AGN) feedbacks, at last galaxies are formed and evolved under the halo merger tree. These semi-analytical models (SAMs) have been successfully applied to halo catalogues extracted from N-body simulations (e.g. [11–17]). Interested readers are encouraging to find the differences between these models (including HOD models) in the nIFTy cosmology comparison project [18] and their following works. As the formation of halos can be traced in the form of halo merger trees, both the formation and interaction of galaxies can be explored within the time frame and the mass resolution explored by the simulation. Although these methods can provide more physical views of galaxy formation, they are still lacking the consistency of co-evolving between baryon and dark matter.

Hydro-dynamical simulations are the only way to overcome the problem faced by SAMs. They can directly solve the physical processes of the baryonic component on top of the dark matter one, which can provide consistent co-evolution with the same gravitational force. These hydro-simulations require complex implementations of baryonic models with gas described either as (a) numerical data points with associated density (smooth particles hydrodynamics (SPH): [19–22], etc.), (b) grid cells fixed in the volume (cells are refined and unrefined as required to explore highest gas density while neglecting low-density regions with nested mesh or adaptive mesh refinement (AMR): [23–25], etc.) or (c) moving mesh (the gas element is associated with a numerical point within a volume defined from the distribution of nearby mesh point through Voronoi tessellation) [26]. The key aspect is the description of the physical processes within these gas elements. These recipes from SAM can be implanted in hydro-dynamical simulations with moderate modifications. However, hydro-dynamical simulation is suffered from its time-consuming computation, with which the numerous free parameters from these sub-grid baryonic models cannot be easily tuned to represent these observational relations as they are in SAM.

Although hydro-dynamical simulations are the heaviest and most time-consuming tool for connecting the dark part with the luminous part in the Universe, they are irreplaceable in investigating/understanding galaxy formations in a full picture. Those HOD and SAM models, which are used to create mock galaxy catalogues, have been quite successfully in reproducing the observational statistical features, such as the two-point correlation functions, luminosity functions, colour distributions and star formation rates. Nevertheless, they are based on the assumption that baryon processes are independent of dark matter halo formation, which is apparently not true [27–29]. As both observation and simulation are becoming more and more accurate, the back reaction of baryons to dark matter cannot be ignored. Thanks to the Morse’s law, more and more efforts are being put in these areas in recent years, for example, [30, 31], the OWLS project [32], the EAGLE project [33], the Illustris project [34] and the Horizon-AGN simulation [35] for these cosmological simulations; the NIHAO project [36] and the FIRE project [37] for these zoom-in simulations. Interested readers refer to the Aquila project [38], the AGORA project [39] and the nIFTy cluster comparison project [40–44] for the comparison of different hydro-dynamical simulation codes. A number of studies based on cosmological hydro-dynamic simulations have been recently carried out to analyse in detail the effect of baryonic processes on different properties of the total mass distribution, such as the power spectrum of matter density fluctuations (e.g. [45–48]), the halo correlation functions (e.g. [49, 50]), the halo density profiles (e.g. [29, 51–53]), concentration (e.g. [54, 55]) and shape (e.g. [56, 57]) and the halo mass function (e.g. [30, 31, 58–61]).

In this chapter, we will focus on the impacts of baryons through these comparisons between hydro-dynamical simulations with dark-matter-only simulations and summarize the results in these three aspects: power spectrum, two-point correlation function (2-PCF) and halo mass function (HMF).

## 2. Chapter

In the last decade, dark-matter-only simulations have been vastly used to theoretically investigate the large-scale structure of the Universe. Through different statistical methods, such as power spectrum, two-point correlation function, halo mass function and so on, the formation and evolution of the large-scale structures have been clearly characterized by those cosmological dark-matter-only simulations. However, the observed Universe can only show the distribution of baryonic matters at such scales. To connect these theoretical understanding with observations of the large-scale structure of the Universe, we need hydro-dynamical simulations, which can provide a consistent evolution driving by the gravitational force for both dark matter and baryons. With these hydro-simulations, we can directly compare simulations with observations through mock techniques (e.g. [62–64]); explore the galaxy formation process in details; correct and improve our understanding of these baryon models, and so on. In this chapter, we only concentrate on one simple question: How do the baryon processes react on dark matter? This is a question, which these simplified analytical models such as HOD and SAM with ad hoc parameters lack the ability to deal with. As baryons occupy only a small fraction of total matter, we are expecting a very weak effect on the dark matter structures. Nevertheless, baryons dominate at small scales such as in galaxies, where the effect cannot be ignored anymore. Thus, we will address this question with different statistical quantities at different scales, which are listed in Sections 2.1, 2.2 and 2.3.

### 2.1. Power spectrum

The power spectrum P(k) (here k is the comoving wavenumber corresponding to a comoving spatial scale

Using the OWSL simulations, [46] studied the influence of baryonic models on matter power spectrum through a comparison between a dark-matter-only (DMONLY) one and hydro-dynamical simulations (REF and AGN). Starting from the same initial condition, these simulations from various models are listed in **Table 1**.

In **Figure 1**, they showed the dimensionless matter power spectrum ^{−1} by comparing the DMONLY simulation with the REF simulation; the power is greatly increased at smaller scales < 1 h^{−1} Mpc (k ≥ 6 h Mpc^{−1}). The decreased power is caused by the gas pressure, which smooths the density field relative to that expected from dark matter alone. While, the increased power in the REF simulation is because radiative cooling enables gas to cluster on smaller scales than the dark matter. These results confirm the findings of previous studies, at least qualitatively (e.g. [45, 67, 68]). However, with the AGN feedback, which is required to match observations of groups and clusters, its effect on the power spectrum is enormous: the power is reduced by ≥10% for k ≥ 1 h Mpc^{−1}. This could be caused by that large amounts of gas are moved to large radii due to the AGN feedback (see also [43]). Because the AGN normally reside in massive and thus strongly clustered objects, the power is suppressed out to scales, where the removed gas can reach.

In **Figure 2**, they showed power spectra from the REF (left panel) and AGN (right panel) simulations at z = 0. As indicated on the top left of each panel, different components are shown by different colour lines. The power spectrum for DMONLY (dashed black lines) is shown as a reference. The power spectra on top row is calculated with ^{−1} Mpc. At scales of several hundred kpc and smaller, the difference between the CDM component (also the total component) of the reference simulation and DMONLY exceeds the change between the latter and the analytic models. This is caused by the back-reaction of the baryons on the dark matter. On the bottom left panel of **Figure 2**, it is clear that CDM dominates the power spectrum on large scales. While the contribution of baryons is significant for λ ≤ 0.1 h^{−1} kpc and dominates below 0.06 h^{−1} Mpc. The strong small-scale baryonic clustering is a direct consequence of gas cooling and galaxy formation. For the baryonic component, the baryonic power spectrum is dominated by gas component on large scales, which has a flatter power for λ ≤ 1 h^{−1} Mpc (corresponding to the virial radii of groups of galaxies) and a slightly steeper power again for λ ≤ 0.1 h^{−1} Mpc (galaxy scales). While the stellar power spectrum takes control for λ ≤ 1 h^{−1} Mpc. The inclusion of AGN feedback greatly impacts the matter power spectrum on a wide range of scales. Comparing the top panels of **Figure 2**, the power in both the gas and stellar components is decreased by AGN feedback for λ ≤ 1 h^{−1} Mpc. Through comparing the two bottom panels, the stellar power spectrum is reduced the most: about an order of magnitude on the largest scales; more than two orders of magnitude on the smallest scales. This is an expected result of the AGN feedback, which suppresses star formation, as required to solve the overcooling problem. The gas power spectrum is also dramatically dropped as a consequence of the AGN feedback. The suppression of baryonic structure by AGN feedback also makes the dominant dark matter component of the power spectrum on small scales down.

In addition, different baryonic models investigated in [46] (see more details in their **Figure 3**) showed significant changes of power spectrum at non-linear scale. It means that these baryonic models need very subtle tuning of their parameters to represent the observational results.

### 2.2. Two-point correlation function

The correlation function, ξ(r), through the calculation of the excess probability to a random distribution to find the possibility of two objects at a given separation r. It is a very useful measure of the clustering of these objects as a function of scale. Comparing power spectrum, correlation can provide different views of cosmological structures. Using galaxy as a tracer, it can be used to investigate the clustering of dark matter halo, for example, [69, 70].

Following their work on power spectrum [46], they studied the baryon effect on two-point correlation functions in [50] with the OWLS simualtions. A parallelized brute force approach is used to calculate the correlation function. Through simple pair counts, ξ(r) can be easily expressed as:

Here, X and Y denote two (not necessarily distinct) sets of objects (e.g. subhaloes and particles or haloes and haloes), DD_{XY} (r) is the number of unique pairs consisting of an object from set X and an object from set Y separated by a distance r, and RR_{XY} (r) is the expected number of pairs at this separation if the positions of the objects in these sets were random.

Subhalos from their simulations are identified by the SUBFIND algorithm [71, 72] inside Friends-of-Friends haloes. Interested readers refer to Ref. [73] for the comparison of different subhalo finding codes, as well as the effect from the included baryonic models. Top panel of **Figure 3** shows the subhalo autocorrelation function, ξ_{ss}(r), for three different simulations: DMONLY, REF and AGN. Different colours indicate different subsamples, selected by the total mass of the subhaloes, M_{sh,tot}. The median virial radii of subhaloes in each mass bin are indicated by vertical dotted lines. These radii are similar to the scales at which the subhalo correlation functions for DMONLY turn over. It is clear that subhalo clustering in the dark-matter-only simulation behaves quite differently from that in the baryonic models, especially on small scales (r ≤ 1 h^{−1} Mpc).

The middle and bottom panels show the relative 2-PCF difference between REF (middle)/AGN (bottom) and DMONLY simulation. All subhaloes in the baryonic simulations are typically ~10% more strongly clustered on large scales than their dark-matter-only counterparts. This difference is due to the reduction of subhalo mass caused by baryonic processes. For the larger subhaloes, 10^{13} < M_{sh,tot}/[M_{⊙}/h] < 10^{14}, this offset is somewhat larger when AGN feedback is included, because supernova feedback alone cannot change the subhalo mass by as much as it can for lower halo masses [74]. The differences between the baryonic and dark-matter-only simulations increase rapidly for r < 2r_{vir}, at least for M_{sh,tot} < 10^{14} h^{−1}M_{⊙}. Subhaloes from the REF simulation show significant larger clustering signal on small scales than from the AGN simulation. This seems to contradict to the results from the previous section. This is because at fixed mass range, subhaloes from the AGN simulation are less compact compared with these from the REF simulation. Due to the additional form of feedback in the AGN run, more material from the centre are pushed into outer radii, which results in a lower concentration. Similar to the subhalo 2-PCF, the galaxy 2-PCFs (ξ_{gg}(r)) are very similar between REF and AGN at smaller galaxy mass bins. However, it is worth to note that there is a significant difference at the largest halo mass bin, which is shown in [50].

**Figure 4** shows the subhalo-mass 2-PCF, ξ_{sm}(r) on the upper panel; the fractional difference between ξ^{REF}_{sm}(r) and ξ^{DMONLY}_{sm}(r) on the middle panel; the fractional difference between the ξ^{AGN}_{sm}(r) and ξ^{DMONLY}_{sm}(r) on the bottom panel. Again, subhaloes are generally more strongly clustered with matter in the REF and AGN than in DMONLY for scales r > r_{vir}. There is also a constant ~5% difference in favour of both REF and AGN simulations on large scales, regardless of subhalo mass. The largest differences can be up to 40% (20% for AGN) higher on intermediate scales for the lowest mass subhaloes. If sufficiently small scales are considered, this difference can be much higher for any subhalo mass. The AGN run does show a stronger decrease in clustering up to scales r ~ 0.1 h^{−1} Mpc. While the ξ_{sm}(r) at smaller mass bins from REF also show similar decrease. It is worth to note the strongly non-monotonic changes of the subhalo-mass 2-PCF between the two baryonic runs and the DMONLY one. This can be caused by the interplay between the changes in both the total subhalo mass and its mass profile. On the one hand, the lowered halo masses in the baryonic simulations tend to increase clustering at fixed mass on all scales. On the other hand, galaxy formation dissipates smoothed gas component and causes the inner halo profile to steepen (increasing clustering on small scales); the associated feedback causes the outer layers of the halo to expand (decreasing clustering on intermediate scales). These conclusions are proved in Ref. [50] through the 2-PCF ξ_{sm}(r) that have been linked between a baryonic simulation and DMONLY which are selected based on their mass in the latter. This procedure removes the effects of changes in the subhalo masses, leaving only the effect on the mass profiles and the changes in the positions of the subhaloes.

Through these comparisons, the major reason for the increased clustering in the hydro-dynamical simulations is the lowering of the mass of objects due to galaxy formation with strong feedback. However, secondary effects, such as the resulting changes in the dynamics and density profiles of haloes, are also expected to be significant. Interestingly, Despali and Vegetti [75] find that the presence of baryons reduces the number of subhaloes, especially at the low mass end, by different amounts depending on the model. The variations in the subhalo mass function are strongly dependent on those in the halo mass function, which is shifted by the effect of stellar and AGN feedback. We will investigate these effects on the halo mass function in Section 2.3.

### 2.3. Halo mass function

Different to the power spectrum and 2-PCF, HMF shows another interesting statistic of the large-scale structure. Located on the central structure which connects theory with observation, HMF provides the statistical view of the halo abundance. The two most common methods used for halo identification in simulations are the FoF algorithm (e.g. [76]) and the spherical overdensity (SO) algorithm (e.g. [77]). Interested readers refer to Ref. [78] for the comparison of different halo finding codes.

A series of three versions of cosmological simulations are used in Ref. [31] for their study. Starting from the same initial condition, these simulations share the same number of dark matter particles (1024^{3}) and gas particles (1024^{3}) within a simulation box size of 410 h^{−1} Mpc. A first hydro-dynamical simulation includes radiative cooling, star formation and kinetic SN feedback (CSF hereafter), while the second one also includes the effect of AGN feedback (AGN1 hereafter). As for the DM simulation, it simply replaces the gas particle by collisionless particles, so as to have the same description of the initial density and velocity fields as in the hydro-dynamical simulations.

FoF HMFs are compared on the top panel of **Figure 5** between the three different versions of simulations. While the bottom panel shows the halo number ratio in a mass bin respected to the DM simulation. The baryonic effect from the CSF with respect to the DM case has clear redshift evolution as well as halo mass dependence. From higher redshift to lower redshift, the HMF ratios between the CSF and DM runs decrease from ∼1.6 to ∼1.1, with a weak increasing trend along halo mass changes. Quite remarkably, including AGN feedback in the baryonic model reduces the difference with respect to the DM-only case: the HMF ratio drops to about unity for massive haloes with M_{FoF} ≈ 10^{14} h^{−1} M_{⊙}, while at smaller halo mass it decreases to ∼0.9 for M_{FoF} ≈ 10^{13} h^{−1} M_{⊙}. Different to the CSF case, there is no clear redshift evolution in these ratios from z = 1 to 0. At the highest redshift, z = 2.2, this HMF ratio keeps fluctuating around 1. This could be a consequence of the limited statistics of haloes due to the finite box size.

Using the PIAO2 code [79], the SO haloes are identified with three overdensities Δ_{c} = 2500, 500 and 200. These HMFs are shown in **Figure 6** from left to right top panels, respectively. While the HMF ratios from the CSF and AGN simulations with respect to the one from DM run are shown in lower panels. Baryons show a larger impact on the HMF at the higher overdensity. With Δ_{c} = 2500, the ratio between the CSF and DM HMFs shows a redshift evolution ranging from ∼1.4 at z = 0 to ∼2.5 at z = 2.2, but with no significant dependence on the halo mass. At lower overdensities, the redshift evolution becomes weaker and the differences with respect to the DM case are also reduced. When AGN feedback is included in the hydro-dynamical simulation, the corresponding HMF drops below the HMF from the DM simulation, by an amount that decreases for lower Δ_{c} values, with no evidence for redshift dependence on the HMF difference. Generally speaking, the baryonic effect on the HMF goes in the same direction, qualitatively independent of whether FoF or SO halo finders are used. However, as expected, quantitative differences between FoF and SO results are found, especially for the AGN case. This is rooted in intrinsic algorithm difference of these halo finder methods (we refer interested readers to Ref. [78] for details).

The three simulations share the same dark matter particles, which have the same progressive identification number (ID). Therefore, we can use the halo from the DM simulation as the reference. The halo in the CSF or AGN simulation is defined as the counterpart of the DM halo, if it includes the largest number of DM particles belonging to the latter. In their paper, a matching rate is defined as the ratio of matched to total number of dark matter particles in the DM halo. To avoid multiple-to-1 matching from CSF/AGN simulation to the DM one, only haloes with matching rate larger than 0.5 are selected.

**Figure 7** shows the halo mass ratios between these matched haloes. Red points indicate the halo pairs, which are coming from CSF and DM simulations, while green points are for the pairs from AGN and DM simulations. The thick lines show the mean value of these data points computed within each mass bin (magenta for CSF and blue for AGN, respectively). For the CSF-DM halo pairs, the increased halo mass is almost independent of redshift. At each redshift, the ratio shows a weak decrease with halo mass, from ~ 1.1 at M_{500} = 10^{12.5} h^{-1} M_{⊙} to ~ 1.05 at M_{500} > 10^{13.5} h^{-1} M_{⊙}, then becoming constant. However, for the AGN-DM pairs, the strong AGN feedback makes the ratio go in the opposite direction (decreased halo masses). Thereby, this will result in a decreased HMF, which has been shown in **Figure 6**. There also shows no evidence of redshift evolution for the halo mass ratio, at least below z = 1.0. However, this ratio shows a strong dependence on the halo mass, which increases from ~ 0.8 at M_{500} = 10^{12.5} h^{-1} M_{⊙} to ~ 1 for the most massive haloes found in their simulation box.

### 2.4. Summary

Stepping from dark-matter-only to hydro-dynamical simulations allows us to view the galaxy formation and evolution in the Universe in a self-consistent and realistic way. Hydro-dynamical simulations estimate tight connections between theoretical and observational researches, therefore providing a perfect test lab for examining theories. Through these comparisons between state-of-the-art hydro-dynamical simulations and dark-matter-only ones, we summarized the recent findings of baryonic effects on the large-scale structure of the Universe by showing the changes on power spectrum, two-point correlation function and halo mass functions:

Power spectrum. There is a decreased power (1%) at k ~ 0.8–5 h Mpc

^{−1}(~8–1 h^{−1}Mpc). At smaller scales (<1 h^{−1}Mpc or k > 6 h Mpc^{−1}), the power rises quickly far above the dark-matter-only simulations because of the baryon processes. However, this increase is reduced by >10% when the AGN feedback is switched on. Power spectra for individual component reveal at which scales they are responsible for these changes: cold dark matter dominates the power spectrum on large scales; gas component contributes mildly over all scales; stellar component is the reason for the high power at small scales.Two-point correlation function. The correlation functions for subhalo are typically ~10% higher in hydro-dynamical simulations than in dark-matter-only ones. While this change is significantly larger at smaller scale. With AGN feedback on, the differences are slightly higher at large scale and lower at small scale compared to the reference one without AGN feedback. Subhaloes are also strongly clustered with matter in the baryonic simulations than in the dark-matter-only ones.

Halo mass function. The halo mass functions are also higher from hydro-dynamical simulations than from dark-matter-only simulations. These differences depend on redshifts, halo mass ranges and halo finding methods. With AGN feedback, the halo mass functions are normally lower than their counterparts from the dark-matter-only simulations. These changes are vividly indicated by the variances of the halo masses, which are matched one to one between these simulations.

Besides these statistics methods investigated in upper paragraphs, baryonic processes can also leave an impact on cosmological structures, such as cosmic webs, sheets and voids. Using the EAGLE simulation, Paillas et al. [80] studied the effect of baryons on void statistics. They found that the dark-matter-only simulation produces 24% more voids than the hydro-dynamical one, but this difference comes mainly from voids with radii smaller than 5 Mpc. They claimed that there are no significant differences in the density profiles between voids in hydro-dynamical and its dark-matter-only counterpart.

However, as we already see, all these results strongly depend on the included baryonic models and simulation codes. There is no guarantee of a perfect model yet, especially that most of the implanted baryonic models are based on observational relations. Starting from the same initial condition of a galaxy cluster, the recent nIFTy project [40–44] has made vast comparisons between different simulation codes as well as baryonic models included in them. There is a good agreement between these simulation codes for the dark-matter-only runs. A larger disagreement is shown between the classic SPH codes and mesh/modern SPH/moving mesh codes for the non-radiative hydro-dynamical runs. In the full physics runs, the largest difference is lying between the runs with AGN feedback and the ones without AGN feedback. However, even inside both families, there are a lot of variances between different simulation codes.

To simulate the observed Universe, more efforts are needed to understand the sub-grid baryonic models, such as their parameter choice, resolution and method dependence. To understand and pin down these sub-grid models, we need direct and detailed comparisons between the simulation results and observational ones. Thus, a one-to-one comparison is much helpful than the statistical relations. These constrained simulation projects aiming to represent the observed Universe, the ELUCID project [81–83], the CLUES project [84], [85] and the APOSTLE project [86], point to the direction of future simulation studies.