Open access peer-reviewed chapter

Efficient Simulation of Fluids

By Pierre Thuillier Le Gac, Emmanuelle Darles, Pierre-Yves Louis and Lilian Aveneau

Submitted: February 11th 2019. Reviewed: May 1st 2019. Published: July 25th 2019.

DOI: 10.5772/intechopen.86619

Abstract

Fluid simulation is based on the Navier-Stokes equations. Efficient simulation codes may rely on smoothed particle hydrodynamics (SPH), a method that uses kernel density estimation. Many variants of SPH have been proposed to optimize the simulation, like implicit incompressible SPH (IISPH) or predictive-corrective incompressible SPH (PC-ISPH). This chapter recalls the formulation of SPH and focuses on its effective parallel implementation using the Nvidia compute unified device architecture (CUDA), while the message passing interface (MPI) is another option. The key to an effective implementation is a dedicated accelerating structure, and therefore some well-chosen parallel design patterns are detailed. Using a rough model of the ocean, this type of simulation can be used directly to simulate a tsunami resulting from an underwater earthquake.

Keywords

  • fluid simulation
  • SPH
  • CUDA
  • MPI
  • Navier-Stokes
  • tsunami

1. Introduction

Submarine earthquakes may generate tremendous disasters for humans, like what occurred during the Tohoku earthquake in 2011. Even if their seismic waves may damage buildings and structures when they occur close to the coast, the tsunamis they generally cause are a massive risk for humans. Indeed, the energy produced by a massive undersea quake is transmitted into the water at high speed and results in a high wave when it arrives at the coast.

To avoid human losses, tsunami simulations can help inform governments and society about the risks before and after a submarine earthquake. This chapter presents solutions for implementing such simulations. The main objective is to be able to calculate the propagation of the tsunami wave in the ocean and then to simulate efficiently its effects when the wave reaches the coast. These kinds of simulations can be done in two dimensions, considering only the profile of the coast, or in three dimensions, when the whole topography is considered. In both cases, the simulation must handle how the water is affected by the earthquake wave.

Viscosity measures how a fluid resists deformation. Even water is considered to have a non-zero viscosity, so this parameter must be considered carefully for tsunami simulations. Water simulation relies on the Navier-Stokes equations, which describe the motion of a viscous fluid. Unfortunately, the Navier-Stokes equations cannot be solved directly, as is the case for many differential equations. The only way to obtain a solution at a given time consists of approximating it through simulation. In practice, two families of methods may be used. The first one consists in discretizing the simulation space into small parts and performing the simulation on fixed cells in this discrete space (mesh approach). Well-known methods are the finite element, finite difference, and finite volume methods. A second, alternative approach is smoothed particle hydrodynamics (SPH), introduced in astrophysics in 1977 [1, 2] and applied in computer graphics [3], oceanography, and many other fields. The latter is particularly interesting for tsunami simulation, since the most important part of the simulation is not in the open ocean but rather on the ground: a part of the fluid will cover the coast. This heterogeneity makes the mesh-free SPH approach more suitable.

This chapter is organized as follows. Section 2 presents the basics of SPH, detailing the mathematical expressions and steps involved as well as previous implementations proposed in the literature. Section 3 presents a parallel implementation of SPH: it recalls the main parallel patterns and shows how they are used to obtain a reliable and fast simulation. Before the conclusions, Section 4 presents some results for a simple tsunami case.

2. SPH formulation

SPH is a Lagrangian approach, meaning that the particles representing tiny parts of the fluid move during the simulation. It is based on density estimation applied to moving particles, leading to an approximation of the Navier-Stokes equations. This section recalls these equations and presents the basics of SPH.

2.1 Navier-Stokes equations

Navier-Stokes equations model the dynamics of a fluid. They rely on Newton's second law, stating that the sum of the forces applied to a body equals the product of its mass and its acceleration (F = m·a). In practice, it is a system of two equations: the mass continuity equation and the momentum equation. The first one is given by

\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho\, \mathbf{u}) = 0 \qquad (E1)

where ρ denotes the fluid density, ∇ is the gradient operator, and u is the flow velocity. The momentum equation is

\rho \left( \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} \right) = -\nabla p + \mu\, \nabla^2 \mathbf{u} + \rho\, \mathbf{g} \qquad (E2)

where ∇· is the divergence operator, ∇² the Laplacian operator, p the pressure, μ the dynamic viscosity coefficient, and g the gravity term. The right-hand side of the momentum equation represents the sum of the forces the fluid undergoes, where −∇p is the pressure force, μ∇²u the viscosity force, and ρg the gravitational force. Since the fluid velocity function is unknown, it is not possible to compute its divergence analytically. Hence, the momentum equation is nonlinear.

Nevertheless, some methods allow to calculate an approximation of the solution of these two equations. Most of them discretize the Euclidean space regularly and compute an approximation using finite differences. The advection term (the left-hand side of the momentum equation) is approximated by placing particles into the grid and then computing their displacement. In other words, each grid cell contains a given amount of fluid, and the algorithm calculates the exchanges between adjacent cells. Such a solution is quite difficult to use in environments where very large parts (like the ocean) and highly detailed parts must be considered together.

Another method to approximate the Navier-Stokes equations is SPH. The Euclidean space is no longer discretized. Instead, the method considers moving particles representing the fluid and their interactions. Each particle comes with its own velocity, pressure, density, and viscosity. The total derivative then allows to approximate the advection term (the terms between parentheses on the left-hand side of the momentum equation) by a single derivative term du/dt. For a given particle i, the momentum equation becomes

\rho_i \frac{d \mathbf{u}_i}{dt} = -\nabla p_i + \mu_i\, \nabla^2 \mathbf{u}_i + \rho_i\, \mathbf{g} \qquad (E3)

Therefore, the acceleration a_i of the particle i is given by

\mathbf{a}_i = \frac{d \mathbf{u}_i}{dt} = -\frac{\nabla p_i}{\rho_i} + \frac{\mu_i\, \nabla^2 \mathbf{u}_i}{\rho_i} + \mathbf{g} \qquad (E4)

2.2 Introduction to SPH

SPH relies on kernel density estimation [4]. When we only have some samples of a given function, we can estimate its value at a new location using a kernel function W and the following estimation:

f(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} W(\mathbf{x} - \mathbf{x}_i)\, f(\mathbf{x}_i) \qquad (E5)

The kernel function W must be a positive function with unit integral. In other words, it must satisfy the two following properties:

\forall \mathbf{x} \in \mathbb{R},\quad W(\mathbf{x}) \geq 0 \qquad (E6)
\int_{\mathbb{R}} W(\mathbf{x})\, d\mathbf{x} = 1 \qquad (E7)

We denote by W_h a kernel function with bounded support [0, h]. This simply means that W_h(x) = 0 for all x < 0 or x ≥ h. This mathematical tool is used to approximate any scalar field A_i for any particle i:

A_i = \sum_{j=1}^{n} A_j\, \frac{m_j}{\rho_j}\, W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E8)

where m_j and ρ_j are, respectively, the mass and the density of the jth particle. Useful kernels for liquids are given in [3]. From this simple expression, we can deduce the estimation of the gradient and the Laplacian of a scalar A_i:

\nabla A_i = \sum_{j=1}^{n} A_j\, \frac{m_j}{\rho_j}\, \nabla W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E9)
\nabla^2 A_i = \sum_{j=1}^{n} A_j\, \frac{m_j}{\rho_j}\, \nabla^2 W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E10)

These formulas allow to calculate the density of any particle, the gradient of the pressure, and the Laplacian of the velocity, in order to approximate a solution of the Navier-Stokes equations. For each particle i, the SPH algorithm proceeds as follows:

  • Compute the density ρ_i:

\rho_i = \sum_{j=1}^{n} m_j\, W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E11)

  • Compute the pressure p_i:

p_i = k\, (\rho_i - \rho_0) \qquad (E12)

where k is the gas constant and ρ_0 is the rest density.

  • Compute f_i, the sum at particle i of the forces of pressure, viscosity, surface tension, and gravity:

\mathbf{f}_i^{pressure} = -\sum_{j=1}^{n} m_j\, \frac{p_i + p_j}{2 \rho_j}\, \nabla W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E13)
\mathbf{f}_i^{viscosity} = \mu \sum_{j=1}^{n} m_j\, \frac{\mathbf{u}_j - \mathbf{u}_i}{\rho_j}\, \nabla^2 W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E14)
\mathbf{f}_i^{surface} = -\sigma\, \nabla^2 c_{s,i}\, \frac{\mathbf{n}_i}{\|\mathbf{n}_i\|} \qquad (E15)
\mathbf{f}_i^{gravity} = \rho_i\, \mathbf{g} \qquad (E16)

where σ is the tension coefficient relating to the interface between the fluid and the exterior (the air), n_i is the normal vector at particle i, and c_{s,i} is the color field of particle i.

  • Compute the velocity u_i and the new particle position x_i using a small integration time step Δt:

\mathbf{u}_i = \mathbf{u}_i + \Delta t\, \frac{\mathbf{f}_i}{\rho_i} \qquad (E17)
\mathbf{x}_i = \mathbf{x}_i + \Delta t\, \mathbf{u}_i \qquad (E18)

The SPH simulation uses these formulas to compute the positions of the particles over a given time span through an iterative procedure. The particles' interactions are very important: we use a rather small support (small h value) for the kernel function in order to limit the number n of neighboring particles. Then, in any good implementation, one of the key elements is the neighborhood handling. Using a parallel processor, this can be achieved with a low complexity, allowing to reach short computation times.
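As an illustration, the following is a minimal CUDA sketch of the density step of Eq. (E11), using the classical poly6 kernel from [3]. The function names are ours, and the neighbor search is written as a brute-force loop over all particles for clarity; a real implementation restricts it to the neighbors' grid presented in Section 3.2.

#include <cuda_runtime.h>
#include <math.h>

// Standard poly6 kernel from [3], with bounded support [0, h).
__host__ __device__
float poly6(float r2, float h)
{
    float h2 = h * h;
    if (r2 >= h2) return 0.0f;
    float d = h2 - r2;
    return 315.0f / (64.0f * 3.14159265f * powf(h, 9.0f)) * d * d * d;
}

// One thread per particle: rho_i = sum_j m_j W_h(x_i - x_j)  (Eq. (E11)).
__global__
void computeDensity(const float3* pos, const float* mass,
                    float* rho, int n, float h)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float sum = 0.0f;
    for (int j = 0; j < n; ++j)       // brute-force neighbor loop, for clarity only
    {
        float3 d = make_float3(pos[i].x - pos[j].x,
                               pos[i].y - pos[j].y,
                               pos[i].z - pos[j].z);
        float r2 = d.x * d.x + d.y * d.y + d.z * d.z;
        sum += mass[j] * poly6(r2, h);
    }
    rho[i] = sum;
}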

2.3 SPH algorithms

The SPH method presented in Section 2.2 is quite immediate to implement [3]. Using a small kernel support, the calculation of the forces applied to a given particle is quite fast, since only a few neighbors have to be considered. Nevertheless, the neighborhood needs to be efficiently computed and stored to accelerate the calculations, and this has to be done at each time step. To do so, a regular grid is the fastest solution. The size of a grid cell is set to the radius of the kernel support. Then, to find the neighbors of a given particle, it is enough to consider the cells surrounding the one containing this particle: 9 cells in dimension 2 (including the cell containing the particle) and 27 in dimension 3, as in the sketch below.
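The following device functions sketch this lookup, assuming the particles have already been stored in the grid as two arrays cellStart and cellCount (built as in Section 3.2); all names and the grid layout are our own assumptions.

#include <cuda_runtime.h>

// Cell coordinates of a particle: the cell size equals the kernel support h.
__device__
int3 cellCoord(float3 p, float3 gridMin, float h)
{
    return make_int3((int)((p.x - gridMin.x) / h),
                     (int)((p.y - gridMin.y) / h),
                     (int)((p.z - gridMin.z) / h));
}

// Linearized cell index.
__device__
int cellIndex(int3 c, int3 dim)
{
    return (c.z * dim.y + c.y) * dim.x + c.x;
}

// Visiting the neighbors of particle i: 3 x 3 x 3 = 27 candidate cells in 3D.
__device__
int countNeighbors(int i, const float3* pos, const int* cellStart,
                   const int* cellCount, float3 gridMin, float h, int3 dim)
{
    int3 c = cellCoord(pos[i], gridMin, h);
    int found = 0;
    for (int dz = -1; dz <= 1; ++dz)
    for (int dy = -1; dy <= 1; ++dy)
    for (int dx = -1; dx <= 1; ++dx)
    {
        int3 nc = make_int3(c.x + dx, c.y + dy, c.z + dz);
        if (nc.x < 0 || nc.y < 0 || nc.z < 0 ||
            nc.x >= dim.x || nc.y >= dim.y || nc.z >= dim.z) continue;  // outside the grid
        int cell = cellIndex(nc, dim);
        for (int k = 0; k < cellCount[cell]; ++k)
        {
            int j = cellStart[cell] + k;            // candidate neighbor
            float3 d = make_float3(pos[i].x - pos[j].x,
                                   pos[i].y - pos[j].y,
                                   pos[i].z - pos[j].z);
            if (d.x * d.x + d.y * d.y + d.z * d.z < h * h) ++found;
        }
    }
    return found;
}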

The SPH method described in Section 2.2 has been extended to solve accuracy problems with incompressible fluids, for instance, predictive-corrective incompressible SPH (PC-ISPH), incompressible SPH (ISPH), and implicit incompressible SPH (IISPH) [5, 6, 7]. In Ref. [7], comparisons between these three techniques show that IISPH is faster than PC-ISPH and ISPH, mainly because it allows larger time steps. Hence, this chapter focuses on an implementation of IISPH. This evolved method is more complex than classical SPH, and each time step requires more calculations (but the time steps are larger, so it remains faster overall). More precisely, for each particle it calculates the density ρ_i and the forces of viscosity, surface tension, and gravity as in the SPH method. It adds the calculation of the advection velocity, which is the portion of the velocity independent of the pressure exerted by the other particles:

\mathbf{u}_i^{adv} = \mathbf{u}_i + \Delta t\, \frac{\mathbf{f}_i^{viscosity} + \mathbf{f}_i^{surface} + \mathbf{f}_i^{gravity}}{m_i} \qquad (E19)

The IISPH algorithm then calculates the advection factor d_ii and the advection coefficient a_ii:

\mathbf{d}_{ii} = -\Delta t^2 \sum_{j}^{n} \frac{m_j}{\rho_i^2}\, \nabla W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E20)
a_{ii} = \sum_{j}^{n} m_j \left( \mathbf{d}_{ii} - \mathbf{d}_{ji} \right) \cdot \nabla W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E21)

The IISPH algorithm continues with the calculation of the pressure forces. It is done through at least two corrective loops that minimize, per particle, the difference between the rest density and the predicted density. First, this loop calculates the advection density:

\rho_i^{adv} = \rho_i + \Delta t \sum_{j}^{n} m_j \left( \mathbf{u}_i^{adv} - \mathbf{u}_j^{adv} \right) \cdot \nabla W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E22)

Second, it calculates the following term per particle that will be used many times in the next steps:

\sum_{j}^{n} \mathbf{d}_{ij}\, p_j^l = -\Delta t^2 \sum_{j}^{n} \frac{m_j}{\rho_j^2}\, p_j^l\, \nabla W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E23)

where l is the iteration number of the corrective loop. Notice that for l = 0, IISPH uses p_i^0 = ω p_i, with ω = 0.5.

Then, the IISPH corrective loop continues by computing for each particle the pressure force thanks to the following expression:

\mathbf{f}_i^{pressure} = -\sum_{j}^{n} m_i\, m_j \left( \frac{p_i}{\rho_i^2} + \frac{p_j}{\rho_j^2} \right) \nabla W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E24)

where p_i is the pressure at particle i, updated at each iteration l of the corrective loop:

p_i^{l+1} = (1 - \omega)\, p_i^l + \frac{\omega}{a_{ii}} \left( \rho_0 - \rho_i^{adv} - \Sigma_i(p^l) \right) \qquad (E25)

This last term, Σ_i(p^l), is computed using the displacement factors:

\Sigma_i(p^l) = \sum_{j}^{n} m_j \left( \sum_{k}^{n} \mathbf{d}_{ik}\, p_k^l - \mathbf{d}_{jj}\, p_j^l - \sum_{k \neq i} \mathbf{d}_{jk}\, p_k^l \right) \cdot \nabla W_h(\mathbf{x}_i - \mathbf{x}_j) \qquad (E26)

All these calculations should be made in parallel to reduce the computation time, using a tuned implementation, for instance, using the message passing interface (MPI) for high-performance computing (HPC) clusters or the Nvidia compute unified device architecture (CUDA) on graphics processing units (GPU) for simpler computers.

3. Parallel SPH implementation

An efficient SPH implementation relies on parallelism at some level. A fully parallel solution may be very efficient, as previous works have shown. While most of the calculations may be done considering a single particle on a single core, finding the neighboring particles that play a role in the density, the pressure, and the external forces requires collaboration between different cores.

Using the texture mechanism available on GPUs, working with the neighbors is quite simple and efficient. Nevertheless, this implies storing all the particles into a regular grid at each time step of the simulation. This part is arguably the most complicated, and the key step for an efficient implementation.

This section first presents the main parallel patterns (MAP, SORT, SCAN, etc.) and then shows how they can be combined to write a new fast parallel SPH solver.

3.1 Parallel patterns

Writing a parallel algorithm is not as simple as writing a sequential one. This truism is based on the necessary consideration of the collaboration between the different processors of a parallel machine: all processors must work in concert, not in isolation as in a sequential approach. These collaborative aspects are the main difficulty: how to make sure all these processors wait when it is needed and work at full speed when no synchronization is required?

Rather than writing a parallel algorithm based on classical sequential patterns, parallel patterns make it possible to write a parallel algorithm directly, abstracting the underlying machine. These patterns rely on a very simple parallel architecture, called the parallel random-access machine (PRAM). It assumes a synchronization between an infinite set of processors and an infinite amount of memory [8].

3.1.1 Simple parallel patterns

Simple parallel patterns do not need synchronization. This means that, on a GPU or an HPC cluster, they may be run without any difficulty, even with fewer processors than needed. The simplest one is the MAP, or transform, which consists in applying a given function f to the input data to obtain the output. The key point of this pattern is the localization of the data: input and output are generally considered as vectors (or arrays). Then, MAP applies to data at the same index:

Y_i = f(X_i) \qquad \text{(MAP)} \qquad (E27)

Figure 1 describes this pattern on small arrays.

Figure 1.

Illustration of the MAP parallel pattern.
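As a minimal illustration with the Nvidia Thrust API, a MAP reduces to a single transform call; the functor (here f(x) = x²) and the array sizes are our own example choices.

#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Functor applied to every element in parallel.
struct Square
{
    __host__ __device__ float operator()(float x) const { return x * x; }
};

void mapExample()
{
    thrust::device_vector<float> X(1000, 2.0f);   // input
    thrust::device_vector<float> Y(1000);         // output
    thrust::transform(X.begin(), X.end(), Y.begin(), Square());   // MAP: Y_i = f(X_i)
}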

On many occasions, it is necessary to write the result at a new location, i.e., another index. When each possible destination index is used once and only once, we obtain a quite simple parallel pattern called SCATTER. It consists of writing the input data from location i to the destination location map(i), map being a permutation function. Figure 2 illustrates this parallel pattern.

Figure 2.

The SCATTER and GATHER patterns move data using a permutation.

In the same spirit, the GATHER parallel pattern writes at index i the data coming from index map(i), using again a permutation function. To differentiate between a SCATTER and a GATHER, remember that in the first we read contiguous data, while in the second we write at contiguous locations. This is summarized by the following two expressions and by the code sketch after them:

Y_{map(i)} = X_i \qquad \text{(SCATTER)} \qquad (E28)
Y_i = X_{map(i)} \qquad \text{(GATHER)} \qquad (E29)
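A minimal Thrust illustration of both patterns on a small permutation (the values and the permutation are our own example):

#include <thrust/device_vector.h>
#include <thrust/scatter.h>
#include <thrust/gather.h>

void scatterGatherExample()
{
    thrust::device_vector<float> X(4);
    X[0] = 10; X[1] = 20; X[2] = 30; X[3] = 40;

    thrust::device_vector<int> map(4);
    map[0] = 2; map[1] = 0; map[2] = 3; map[3] = 1;   // a permutation

    thrust::device_vector<float> Y(4);
    thrust::scatter(X.begin(), X.end(), map.begin(), Y.begin());
    // SCATTER: Y[map[i]] = X[i]  ->  Y = [20, 40, 10, 30]

    thrust::device_vector<float> Z(4);
    thrust::gather(map.begin(), map.end(), X.begin(), Z.begin());
    // GATHER:  Z[i] = X[map[i]]  ->  Z = [30, 10, 40, 20]
}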

The PRAM model is very useful to write efficient algorithms in theory. Nevertheless, in the end these algorithms run on real computers, with a limited amount of memory and a fixed number of processors. Brent's theorem links the theoretical computation time on the PRAM model with the one obtained using only p processors: an algorithm running in O(1) using m processors will run in O(m/p) using only p processors. This allows to predict the behavior of (very) simple algorithms on a GPU.

3.1.2 Advanced parallel patterns

In many cases, some degree of collaboration is needed between processors. This leads to some more complicated parallel patterns. A very common parallel pattern using such a collaboration is the SORT, which sorts data according to a given order. It is used in previous SPH implementations for building the neighbors' grid. The SORT pattern is based on the PARTITION pattern that moves values with respect to a given predicate. More precisely, for n values X_i and a predicate P_i ∈ {0, 1}, the PARTITION pattern moves the values X_i for which P_i = 1 to the beginning of the resulting array and the others to the end (see Figure 3 for a simple example).

Figure 3.

Illustration of the PARTITION pattern for nine input values; the values with predicate 1 are put at the beginning of the output, the others at the end.

These complex patterns are built upon a fundamental pattern called SCAN. It corresponds to a prefix sum of the values, according to the following expression:

Y_i = \sum_{j=0}^{i} X_j \qquad \text{(INCLUSIVE SCAN)} \qquad (E30)

This fundamental pattern exists in two versions: inclusive and exclusive. The first corresponds to the expression given above, summing up to the current output position. The exclusive version omits the current position, summing up to i − 1 and using a nil value for Y_0 (generally 0):

Y_i = \sum_{j=0}^{i-1} X_j \qquad \text{(EXCLUSIVE SCAN)} \qquad (E31)

Figure 4 shows that these two versions of SCAN are almost the same, except for a shift between the resulting arrays: the values obtained with the inclusive version correspond to the ones obtained with the exclusive version at the same position plus one.

Figure 4.

Differences between the inclusive and exclusive SCAN patterns.
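A minimal Thrust illustration of the two versions on the same input (the input values are our own example):

#include <thrust/device_vector.h>
#include <thrust/scan.h>

void scanExample()
{
    thrust::device_vector<int> X(5, 1);            // X = [1, 1, 1, 1, 1]
    thrust::device_vector<int> Yin(5), Yex(5);

    thrust::inclusive_scan(X.begin(), X.end(), Yin.begin());
    // Yin = [1, 2, 3, 4, 5]   (sum up to and including position i)

    thrust::exclusive_scan(X.begin(), X.end(), Yex.begin());
    // Yex = [0, 1, 2, 3, 4]   (sum up to position i-1, Y_0 = 0)
}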

Another pattern of interest for this chapter is the REDUCE, which computes a single value from an array of values using any given associative binary function:

Y = \sum_{j=0}^{n-1} X_j \qquad \text{(REDUCE)} \qquad (E32)

For instance, using X = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) and the classical integer sum as binary operator, this pattern returns Y = 55, the sum of the first 10 positive integers.
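With Thrust, this example reduces to a single call:

#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>

void reduceExample()
{
    thrust::device_vector<int> X(10);
    thrust::sequence(X.begin(), X.end(), 1);       // X = [1, 2, ..., 10]
    int Y = thrust::reduce(X.begin(), X.end());    // REDUCE with the sum: Y = 55
}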

These complex patterns all have roughly the same complexity, in O(log n) on a PRAM machine and O((n/p) log n) using only p processors. Nevertheless, since they are built using the SCAN, PARTITION and SORT are in practice more complex and take more time. A fast implementation of the SORT pattern relies on the radix sort algorithm, which loops over the number of digits of the maximum key to sort, thus having a practical complexity in O(32 log n) with 32-bit integers.

The last programming tool this section covers is the notion of atomic operation. A load-modify-write operation cannot be handled in a parallel program without caution. Let us consider two processors doing a "plus one" in parallel at the same time. The addition is done by the CPU using registers (memory local to the CPU). Hence, the variables to add need to be loaded from the main memory, then added, and then stored back into the main memory. If the two processors do the load-modify-write operation at exactly the same time on the same variable, then the result is wrong. If the processors are not exactly synchronized, the result is most likely wrong too: to be correct, the two operations must be done sequentially. Atomic operations provide this behavior, performing the read-modify-write operation for one and only one processor at a time.
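The following toy CUDA kernels illustrate this notion on a shared counter (an illustration only, not code from the chapter):

__global__
void unsafeCount(int* counter)   // WRONG with concurrent threads: increments may be lost
{
    *counter = *counter + 1;     // load, modify, and write are not atomic
}

__global__
void safeCount(int* counter)     // correct: one load-modify-write at a time
{
    atomicAdd(counter, 1);       // also returns the old value of *counter
}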

Obviously, other parallel patterns exist. They are not discussed in this chapter since they are not used in our SPH implementation.

3.2 Grid building

Previous SPH implementations use the SORT pattern to build the neighbors' grid [7, 9, 10]. The first step consists in calculating the grid index of each particle, using a MAP. Next, the particles are sorted with respect to this index. Then, it is necessary to compute the number of particles per cell and the starting position of each cell. In [9], atomic operations are used for these two operations: the minimum for the first particle of each cell and the addition for the number of particles per cell.

In Ref. [7], the authors follow a similar approach, sorting the particles with respect to their cell index, but using a MAP to mark the start and the end of each cell with respect to the sorted cell indices, taking advantage of their uniqueness.

The main problem is that the sorting algorithm takes a large part of the computation time, near 30% according to [10]. In this chapter, we avoid the full sorting by combining simple parallel patterns and atomic operations. Our grid-building algorithm is summarized in Figure 5.

Figure 5.

Our algorithm to build the neighbors’ grid.

This algorithm uses the Nvidia Thrust API with some freedom to shorten it. First, at line 8 the number of particles per grid cell is set to zero. Next, as in previous methods, at line 9 the cell index of each particle is calculated with a MAP. Using a second MAP at line 10, the particle cell offset is calculated using an atomic addition. More precisely, we use the CUDA function int atomicAdd(int* cc, int a), which adds a to the variable *cc and returns the old content of *cc. Since atomic operations are done in sequence, the number of particles per cell is correctly computed. Moreover, each particle receives the old counter value, which is 0 for the first executed atomic operation, 1 for the second, and so on up to m − 1 for the last particle added to the cell, m being the number of particles added into the cell. These values are used to scatter all the particles to their local position into the grid, at line 13. But before doing that, we need to calculate the global grid offset, corresponding to the position of the first particle of each cell. This is done using an exclusive SCAN at line 11 to compute the global offset, followed by a MAP at line 12 to calculate each particle's global offset.

In practice, this algorithm can be optimized in many ways. First, the device vectors can be allocated only once, and not each time the grid is built. Second, the first two transforms (lines 9 and 10) can be merged into one. This limits the memory loads into device registers, known as a major performance limitation on GPUs. At last, the last transform (line 12) and the scatter (line 13) can again be merged into a single call to minimize the memory bandwidth usage, as in the sketch below. Moreover, the particles' data must be split into multiple arrays for efficiency (one array for positions, one for densities, one for pressures, etc.) as in [10].
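The following sketch is one possible translation of this procedure with Thrust and two small CUDA kernels. The variable names are ours, the per-particle cell index is assumed to be computed beforehand (initial MAP not shown), and the final MAP and SCATTER are already merged into one kernel, as suggested above.

#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/fill.h>

// MAP with an atomic addition: counts particles per cell and records, for each
// particle, its rank inside its cell (the old value returned by atomicAdd).
__global__
void countAndLocalOffset(const int* cellIdx, int* cellCount, int* localOff, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    localOff[i] = atomicAdd(&cellCount[cellIdx[i]], 1);
}

// Merged MAP + SCATTER: destination = global offset of the cell + local offset.
__global__
void scatterToGrid(const int* cellIdx, const int* cellStart, const int* localOff,
                   int* sortedIdx, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    sortedIdx[cellStart[cellIdx[i]] + localOff[i]] = i;
}

void buildGrid(const thrust::device_vector<int>& cellIdx,  // cell index per particle
               thrust::device_vector<int>& cellCount,      // particles per cell
               thrust::device_vector<int>& cellStart,      // first particle of each cell
               thrust::device_vector<int>& localOff,       // rank of each particle in its cell
               thrust::device_vector<int>& sortedIdx,      // grid order -> particle index
               int nParticles)
{
    int threads = 256, blocks = (nParticles + threads - 1) / threads;

    thrust::fill(cellCount.begin(), cellCount.end(), 0);            // reset counters

    countAndLocalOffset<<<blocks, threads>>>(                       // MAP + atomicAdd
        thrust::raw_pointer_cast(cellIdx.data()),
        thrust::raw_pointer_cast(cellCount.data()),
        thrust::raw_pointer_cast(localOff.data()), nParticles);

    thrust::exclusive_scan(cellCount.begin(), cellCount.end(),      // global cell offsets
                           cellStart.begin());

    scatterToGrid<<<blocks, threads>>>(                             // merged MAP + SCATTER
        thrust::raw_pointer_cast(cellIdx.data()),
        thrust::raw_pointer_cast(cellStart.data()),
        thrust::raw_pointer_cast(localOff.data()),
        thrust::raw_pointer_cast(sortedIdx.data()), nParticles);
}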

3.3 Main algorithm

The most difficult part of the implementation of the IISPH method is the construction of the neighbors' grid, as for any mesh-free density kernel method. The rest of the calculation is rather simple and relies on two parallel patterns: the MAP for all the loops over particles and the REDUCE to control the termination of the corrective loop in the calculation of the pressure force.

It is noticeable that the IISPH loop that corrects the pressure force runs on the CPU, because there is no global synchronization available on the GPU. The REDUCE is then used to return a value from the GPU to the CPU, to decide whether more corrections are needed. Nevertheless, since this just consists of sending one real value, it is not a big bottleneck. A minimal sketch of this host-side loop follows.
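In this sketch the pressure kernels themselves are omitted; the names, the per-particle density error array, and the stopping threshold are our own assumptions.

#include <thrust/device_vector.h>
#include <thrust/reduce.h>

void pressureSolve(thrust::device_vector<float>& densityError,  // filled by the GPU kernels
                   int nParticles, float maxAvgError)
{
    int iteration = 0;
    float avgError;
    do
    {
        // launchPressureIteration(...);   // MAP kernels implementing Eqs. (E23) to (E26)

        // REDUCE: a single value is sent back to the CPU to decide termination.
        avgError = thrust::reduce(densityError.begin(), densityError.end(), 0.0f)
                 / nParticles;
        ++iteration;
    }
    while (iteration < 2 || avgError > maxAvgError);   // at least two corrective loops
}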

Moreover, many calculations use data from the neighbors (pressure, density, position, etc.). The GPU's L1 memory is used to accelerate these calculations, reducing the computation time by around a third in our experiments. Notice also that the IISPH corrective loop amortizes the cost of building the neighbors' grid: in our experiments, the grid building now represents less than 10 percent of the full computation time.

4. Experiments

IISPH is a valid solution to simulate a tsunami [11]. Its main advantage over a discrete, mesh-based method is that it does not need to refine the mesh near the obstacles, like the coast and the buildings. Moreover, the wave can go everywhere, including over the beach, buildings, infrastructure, etc.

In this chapter, we illustrate the tsunami simulation using the IISPH algorithm through a rather simple scenario. It contains a short coast ending with a mountain, with a building just after the beach. The main difficulty, if any, consists of generating the solitary wave. A tsunami is generated by an earthquake at a long distance; the produced wave travels at about 200 meters per second (720 km/h). We do not need to simulate the propagation of the wave from its epicenter, which is quite difficult over long distances: it would require a very long simulation time to see the wave reach the beach, and obviously a huge amount of memory to handle the sea between the two distant locations. Instead, we simulate the wave in a rather small space. We can predict the time of arrival at the beach, assuming we know the exact distance between the beach and the earthquake location.

In [11], the authors solve the solitary wave solution of Boussinesq. They calculate the wave paddle displacement using the equation:

X(t) = C\, t - \frac{\theta}{\kappa} \qquad (E33)

where C is the wave velocity, t is the time in the simulation frame, κ is the decay coefficient, and θ is computed using Newton's method as

\theta_{l+1} = \theta_l - \frac{\theta_l - \kappa C t + \frac{H}{h} \tanh \theta_l}{1 + \frac{H}{h}\, \mathrm{sech}^2 \theta_l} \qquad (E34)

More precisely, this θ enforces that X(t) is the solution of the following problem:

X(t) = \frac{H}{\kappa h} \tanh\!\big( \kappa\, (C t - X(t)) \big) \qquad (E35)

where H and h are the wave height and the water depth, respectively.
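As an illustration, a possible host-side (CPU) implementation of this Newton iteration could look as follows; the initial guess, tolerance, and iteration cap are our own choices.

#include <cmath>

// Newton iteration of Eq. (E34) for theta, given the wave velocity C, the decay
// coefficient kappa, the wave height H, and the water depth hDepth.
double solveTheta(double t, double C, double kappa, double H, double hDepth)
{
    double r = H / hDepth;
    double theta = kappa * C * t;                                  // initial guess
    for (int l = 0; l < 50; ++l)
    {
        double sech = 1.0 / cosh(theta);
        double f    = theta - kappa * C * t + r * tanh(theta);     // residual of Eq. (E35)
        double fp   = 1.0 + r * sech * sech;                       // its derivative
        double step = f / fp;
        theta -= step;                                             // Eq. (E34)
        if (fabs(step) < 1e-10) break;   // number of iterations is data-dependent
    }
    return theta;
}

// Paddle displacement, Eq. (E33): X(t) = C t - theta / kappa.
double paddleDisplacement(double t, double C, double kappa, double H, double hDepth)
{
    return C * t - solveTheta(t, C, kappa, H, hDepth) / kappa;
}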

While this method works on a CPU, it is not well suited for a CUDA implementation of the IISPH, mainly because the number of iterations of the Newton-Raphson method depends on the input values, and so is not constant per particle.

Hence, in this chapter we use a different but simple technique. The wave is produced using a piston wave generator. Here, the piston is a huge virtual object that pushes the water to reach the speed of the wave. The length and the speed of the piston movement are calibrated to obtain the correct height and speed of the tsunami solitary wave.

Figure 6 illustrates such a simple wave simulation, before the tsunami wave arrives. Figure 7 shows the wave arriving at the building at t = 4 s in the simulation frame. In Figure 8, at t = 8 s, the building is completely below the ocean, which returns to its bed after some more time (see Figures 9 and 10). The building is made of fixed particles that are nevertheless considered together with the moving particles of the fluid. This allows an interaction between the two kinds of particles, and it permits to obtain the pressure and force applied to the building, for instance, to check whether it will resist or not.

Figure 6.

Tsunami simulation before the solitary wave arrives.

Figure 7.

Tsunami wave reaching the building near the beach.

Figure 8.

The tsunami wave engulfed the building and the coast.

Figure 9.

The tsunami wave begins to recede from the coast. With flat coasts, this may take some time.

Figure 10.

After a longer time, the tsunami wave has almost completely disappeared.

5. Conclusions

This chapter focuses on the simulation of a tsunami solitary wave. Such a wave is mainly produced by a submarine earthquake and may provoke vast disasters for humans living near the coasts. Such phenomena may also produce strong damage to buildings and structures, in turn inducing human losses, as happened after the Tohoku earthquake in 2011. To mitigate these disasters, it is important to be able to validate the robustness of structures and buildings near dangerous coasts and to inform the population after an always unpredictable submarine earthquake.

To achieve these goals, it is necessary to produce robust and fast fluid simulation software. To simulate a tsunami wave, a good candidate is the SPH method. Since it does not need a fixed mesh as discrete techniques do, it correctly handles the wave running over the beach and beyond. Moreover, it correctly handles the contact with buildings and structures, allowing to simulate the forces that they undergo.

This chapter recalls the implicit incompressible SPH method, which is one of the fastest SPH variants. The parallel implementation for GPU is detailed in depth, with a fast algorithm to build the neighbors' grid, avoiding the classical sorting method which is more time-consuming.

At last, this chapter proposes a simple tsunami wave simulation using a piston wave generator, a solution that is simple to implement and provides valuable results. It can be used to simulate a tsunami generated by a submarine earthquake whose seismic source mechanism, location, and intensity are estimated.

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
