Runtimes and the number of fluid particles for each model resolution of test case N. 1.

## Abstract

Today, the use of modern high-performance computing (HPC) systems, such as clusters equipped with graphics processing units (GPUs), allows solving problems with resolutions unthinkable only a decade ago. The demand for high computational power is certainly an issue when simulating free-surface flows. However, taking the advantage of GPU’s parallel computing techniques, simulations involving up to 109 particles can be achieved. In this framework, this chapter shows some numerical results of typical coastal engineering problems obtained by means of the GPU-based computing servers maintained at the Environmental Physics Laboratory (EPhysLab) from Vigo University in Ourense (Spain) and the Tier-1 Galileo cluster of the Italian computing centre CINECA. The DualSPHysics free package based on smoothed particle hydrodynamics (SPH) technique was used for the purpose. SPH is a meshless particle method based on Lagrangian formulation by which the fluid domain is discretized as a collection of computing fluid particles. Speedup and efficiency of calculations are studied in terms of the initial interparticle distance and by coupling DualSPHysics with a NLSW wave propagation model. Water free-surface elevation, orbital velocities and wave forces are compared with results from experimental campaigns and theoretical solutions.

### Keywords

- SPH
- HPC
- free-surface flows
- Navier-Stokes equations
- Lagrangian techniques

## 1. Introduction

The non-stopping growing of computing power allowed increasing more and more spatial and temporal discretization when simulating engineering problems. The use of modern high-performance computing (HPC) systems, such as clusters equipped with graphics processing units (GPUs) or central processing units (CPUs) structured into a multi-node framework, let academics and professionals solve free-surface flow problems with resolutions unthinkable just a decade ago. Different spatial and temporal scales are often involved when simulating such kinds of phenomena, which may comprise wave generation, propagation, transformation and interaction with coastal or inland defences.

Among the others, smoothed particle hydrodynamics (SPH) method is a promising meshless technique for modelling fluid flows through the use of particles as it is capable to deal with large deformations, complex geometries and inlet wave shapes. Its original frame was developed in 1977 for astrophysical applications [1, 2]. Since then, it has been used in several research areas, e.g. coastal engineering [3, 4, 5, 6, 7], flooding forecast [8, 9, 10, 11], solid body transport [12, 13, 14, 15], soil mechanics [16, 17, 18, 19, 20], sediment erosion or entrainment processes [21, 22, 23, 24], fast-moving non-Newtonian flows [25, 26, 27, 28, 29, 30, 31, 32, 33], flows in porous media [34, 35, 36], solute transport [37, 38, 39], turbulent flows [40, 41, 42] and multiphase flows [43, 44, 45, 46, 47], not to mention manifold industrial applications (see, for instance [48, 49, 50, 51]). The main feature of SPH is that local quantities are evaluated by weighting information carried by neighbouring particles enclosed within a compact support, i.e. by performing short-range interactions among particles. Since the related neighbourhood definition takes most of the computing time, fast neighbour search algorithms have been developed so far [52, 53, 54, 55, 64, 79].

Since a decade or so, SPH has been coded in the massive high-performance computing (HPC) context, making use of the Message Passing Interface (MPI) [56, 57] and the OpenMP library [58, 59], the standards for distributed and shared memory programming, respectively. Several applications involving multicore processors [60, 61] and graphics processing units (GPUs) [62, 63, 64, 65, 66, 67] have been proposed so far. Joselli and co-workers [68] showed in 2015 that performing neighbour search on GPUs yields up to 100 times speedup against CPU implementations, therefore proving the benefits on exploiting the high floating-point arithmetic performance of GPUs for general purpose calculations. The same conclusion was drawn earlier in Ref. [69]. The first versions of SPH running on GPUs were presented in Ref. [70] and then in Ref. [69]. Non-Newtonian fluid flow simulations have been carried as well. Bilotta and co-workers, for instance, applied their GPUSPH model to lava flows [71]. In 2013, Wu and co-workers run GPUSPH to model dam-break flood through complex city layouts [72, 73]. Rustico et al. [74] measured the overall efficiency of the GPUSPH parallelization by applying the Karp-Flatt metric [75]. In Ref. [76], massive simulations of free-surface flow phenomena were carried on single and multi-GPU clusters. They used the sorting radix algorithm for inter-GPU particle swapping and subdomain ‘halo’ building to allow SPH particles of different subdomains interacting. In 2015, Cercos-Pita proposed the software AQUAgpusph [77] based on the use of the freely available Open Computing Language (OpenCL) framework instead of using the Compute Unified Device Architecture (CUDA) platform. In Ref. [78], Gonnet proposed scalable algorithms based on hierarchical cell decompositions and sorted interactions executed on hybrid shared/distributed memory parallel architectures. In Ref. [79], a general rigid body dynamics and an absolute nodal coordinate formulation (ANCF) were implemented to model rigid and flexible objects interacting with a moving fluid. In 2012, Cherfils and co-workers released JOSEPHINE [80], a parallel weakly compressible SPH code written in Fortran 90, intended for free-surface flows. Incompressible SPH (ISPH) algorithms, running on GPUs, have been developed as well [81, 82, 83].

This chapter shows some numerical SPH results of typical coastal engineering problems obtained by means of two different supercomputers: the GPU-based machine maintained at the EPhysLab from Vigo University in Ourense (Spain), mounting 14 NVIDIA Kepler-based cards, with a total of 39168 CUDA cores and the Tier-1 Galileo cluster, introduced on January 2015 by the Italian computing centre CINECA, a non-profit consortium, made up of 70 Italian universities, 6 Italian research institutions and the Italian Ministry of Education, University and Research (MIUR). Galileo is equipped with 516 nodes, each mounting 2 8-cores Intel Haswell 2.40 GHz for a total of 8256 cores, up-to-date Intel Phi 7120p (2 per node on 384 nodes) and NVIDIA Tesla K80 accelerators (2 per node on 40 nodes). Comparison with theoretical and experimental results is also included.

## 2. SPH fundamentals

Recent comprehensive reviews and related applications of the SPH method are given in [84, 85, 86, 87, 88, 89]. Governing equations describing the motion of fluids are usually given as a set of partial differential equations (PDEs). These are discretized by replacing the derivative operators with equivalent integral operators (the so-called integral representation or kernel approximation) that are in turn approximated on the particle location (particle approximation). Next, Section 2.1 gives further details about these two steps, with reference to a generic field f(

### 2.1. Approximation of a field f(x) and its spatial gradients

Following the concept of integral representation, any generic continuous function

where

*W* is the so-called smoothing kernel function or simply kernel and *h*, acting as spatial scale, is the smoothing length defining the influence area where *W* is not zero. While Eq. (1) yields an exact formulation for the function *W* is a key point in the SPH method since it establishes the accuracy of the approximating function

The kernel function *W* has to satisfy some properties (see, for instance, [90, 91]). The following condition

is known as partition of unity (or the zero-order consistency) as the integration of the smoothing function must yield the unity. Since *W* has to mimic the delta function, Eq. (3) can be rewritten as a limit condition in which the smoothing length tends to zero:

Still, W has to be defined even, positive and radial symmetric on the compact support:

where *ϕ* is a positive quantity defining the extent of the compact support. A large number of kernel functions are proposed in literature. Among the others, a computational-efficient and high accurate kernel is proposed by Wendland [92], defined as.

where A(n_{d}), depending on the number of dimensions n_{d}, denotes a scaling factor that ensures the consistency of Eq. (3), whereas *q* denotes the dimensionless distance

The integral representation given by Eq. (2) can be converted into a discretized summation over all particle *N* within the compact support (Figure 2), yielding the particle approximation:

where the index k refers to particles within the compact support (see bold ones in Figure 2), with mass m_{k} and density *ρ*_{k} being carried. Note that in this case the particle approximation is marked by the ‘a’ pedix. The subscript will be avoided from now on. Eq. (7) can be rewritten with reference to particle ‘i’ as

Particle approximation of spatial derivatives of a field function, such as divergence and gradient, is expressed using the gradient of the kernel function rather than the derivatives of the function itself:

where the nabla operator

inside the integral in Eq. (2), yielding in this case

Likewise, the divergence, another particle approximation of the gradient, can be derived, taking into account the following equation:

yielding

Eqs. (12)–(14) are conveniently employed in fluid dynamics as they preserve the conservation of linear and angular momentum.

### 2.2. SPH form of governing equations

The mostly used governing laws ruling fluid motion are the Navier-Stokes equations, which specify that mass and linear momentum are preserved. Conservation laws in Lagrangian form are as follows:

in which *ρ* and *p* is the isotropic pressure, *ν* is the laminar kinematic viscosity,

The material derivative of the velocity field can be deduced from Eq. (14) for the case of inviscid fluids, that is, *ν* = 0:

Numerical diffusion in terms of an artificial viscosity, e.g. proposed in Ref. [96], can be added in Eq. (17), allowing shock waves to be properly simulated:

The dissipative term

where

The notation*c* refers to the speed of sound which magnitude has conveniently to be at least 10 times greater than the maximum estimate of the scalar velocity field [94], *α* is a coefficient that needs to be tuned in order to introduce the proper dissipation. A value of *α* = 0.01 is suggested in Ref. [5] for wave propagation and wave-structure interaction studies.

Problem closure is achieved by combining conservation equations in discrete form (16) and (18) with an equation of state, when the weakly compressible scheme is adopted. A relationship between pressure and density is given in Ref. [97]:

where *c*_{0} is the reference speed of sound, large enough to guarantee Mach numbers lower than 0.1–0.01, ^{3}, when the liquid is water. c_{0} is numerically computed like at least 10 times the expected maximum velocity.

## 3. The DualSPHysics code

DualSPHysics [98, 99, 100] is an open-source code developed by the University of Vigo (Spain) and the University of Manchester (UK) in collaboration with experts from all around the globe that can be freely downloaded from*N* times in parallel by *N* different CUDA threads. Herein, to avoid confusion, we use the term function to describe the CUDA kernels. DualSPHysics makes full use of the function hierarchy present within the CUDA framework. A function executed and called by the CPU is declared as a host function, whereas a global function is called by the CPU but executed in parallel by the GPU. A device function, on the other hand, is only called and executed within the GPU by a global or another device function. This hierarchy is used for the computation of the interparticle forces. The simulations in DualSPHysics consist of three main steps: (i) creation of the particle neighbour list (NL), (ii) force computation (FC) for the particle interaction and (iii) the system update (SU) at the end of the time step.

Due to the Lagrangian nature of SPH, the particle interaction results to be the most time-consuming part of the whole algorithm. Each particle, as already stated, only interacts with its neighbour particles. Therefore, the construction of the neighbour list must be optimised. The cell-linked list described in Ref. [54] is implemented in DualSPHysics. This approach is preferred to the traditional approach, named Verlet list [101] that implies higher memory requirements than the cell-linked list. Besides, [54] proposed an innovative searching procedure based on a dynamic updating of the Verlet list and analyzed the efficiency of all the algorithms in terms of computational time and memory requirements.

DualSPHysics has proven its performance, reaching limits like being able to simulate more than 10^{9} particles using 128 GPUs with an efficiency close to 100% [67].

### 3.1. Boundary conditions

Extensive research has been conducted over the last few years to develop accurate and efficient boundary conditions (BCs) in SPH method. Several approaches are proposed in the literature, such as boundary repulsive forces, fluid extensions to the solid boundary and boundary integral representing the term preservation. In DualSPHysics, boundaries (walls, bottom, coastal structures, wave generators, vessels, floating devices, etc.) are described using a discrete set of boundary particles that exert a repulsive force on the fluid particles when they approach. The so-called dynamic boundary condition [102] is used in DualSPHysics, where the boundary particles satisfy the same equations as the fluid particles; however, they do not move according to the forces exerted on them. Instead, they remain fixed (fixed boundary) or move according to some externally imposed movement (gates, flaps, etc.). Using this boundary condition, when a fluid particle approaches a boundary particle and the distance between them decreases beyond the kernel range, the density of the boundary particles increases giving rise to an increase of the pressure. This results in a repulsive force being exerted on the fluid particle due to the pressure term in the momentum equation. This dynamic boundary condition implemented in DualSPHysics does not include a specific value to define wall friction. However, this has been achieved in different validations by specifying a different viscosity value in the momentum equation when the fluid particles interact with the boundary ones.

### 3.2. Extra functionalities

#### 3.2.1. Long-crested wave generation

The waves are generated in DualSPHysics by means of moving boundaries that aim to mimic the movement of a piston-type and flap-type wavemakers as in physical facilities. Only long-crested wave can be generated at this stage. The implementation of first-order and second-order wave generation theories is fully described in Ref. [3]. For monochromatic waves, this means to include super-harmonics. For random waves, subharmonic components are considered to suppress spurious long waves. Two standard wave spectra are implemented and used to generate random waves: JONSWAP and Pierson-Moskowitz spectra. The generation system allows having different random time series with the same significant wave height (*H*_{m0}) and the same peak period (*T*_{p}), just defining different phase seeds.

#### 3.2.2. Wave reflection compensation

Wave reflection compensation is used in physical facilities to absorb the reflected waves at the wavemaker in order to avoid that they will be reflected back into the domain. In this way, the introduction into the system of extra spurious energy that will bias the results is prevented. The so-called active wave absorption system (AWAS) is implemented in DualSPHysics. The water surface elevation *η* at the wavemaker position is used and transformed by an appropriate time-domain filter to obtain a control signal that corrects the wave paddle displacement in order to absorb the reflected waves every time step. Hence, the target wavemaker position is corrected to avoid reflection at the wavemaker. The position in real time of the wavemaker is obtained through the velocity correction of its motion. Further details on AWAS in DualSPHysics are reported in Ref. [3].

#### 3.2.3. Hybridization with SWASH model

The hybridization technique between DualSPHysics model and SWASH model (

Fast computations with large domains can be performed with SWASH, avoiding simulating large domains with DualSPHysics that requires huge computation times even using hardware acceleration.

SWASH is suitable for calculation where statistical analysis is necessary such as computing wave height and good accuracy is obtained for wave propagation.

SWASH is not suitable for calculation of wave impacts, while DualSPHysics can easily compute wave impacts, pressure load and exerted force onto coastal structures.

Complex geometries cannot be represented with SWASH, and computation stability problems may appear when applied to rapidly changing bathymetry. Using DualSPHysics, any complex geometry or varying bathymetry can be simulated.

The hybridization between DualSPHysics and SWASH has been obtained through a one-way hybridization at this stage. The basic idea is to run SWASH for the biggest part of the physical domain to impose some boundary conditions on a fictitious wall placed between both media. This fictitious wall acts as a nonconventional wave generator in DualSPHysics: each boundary particle that forms the wall (hereafter called moving boundary or MB) will experience a different movement to mimic the effect of the incoming waves. SWASH provides values of velocity in different levels of depth. These values are used to move the MB particles. The displacement of each particle can be calculated using a lineal interpolation of velocity in the vertical position of the particle. Therefore, the MB is a set of boundary particles whose displacement is imposed by the wave propagated by SWASH and only exists for DualSPHysics. A multilayer approach can be used in SWASH. The SWASH velocity measured at each layer is therefore interpolated and converted into displacement time series for DualSPHysics.

## 4. Hardware features

### 4.1. The EPhysLab cluster

The GPU cluster maintained at the Environmental Physics Laboratory (EPhysLab) of Vigo University comprises four computing servers, whose details are as follows:

Supermicro 7047: 4× NVIDIA GeForce GTX Titan Black, 2880 × 4 = 11,520 CUDA cores, 2× Intel Xeon E5–2640 at 2 GHz (16 cores), RAM 64 GB, storage 16 TB, estimated performance 6800 GFLOPS

Supermicro 7047: 4× NVIDIA GeForce GTX Titan 2688 × 4 = 10,752 CUDA cores, 2× Intel Xeon E5-2640 at 2 GHz (16 cores), RAM 64 GB, storage 20 TB, estimated performance 6000 GFLOPS

Supermicro 7046: 4× NVIDIA GeForce GTX Titan Black, 2880 × 4 = 11,520 CUDA cores, 2× Intel Xeon E5620 at 2.4 GHz (8 cores), RAM 64 GB, storage 9 TB, estimated performance 6800 GFLOPS

Supermicro 6016GT-TF-TM2: 1× NVIDIA Tesla K40 2880 CUDA cores + 1× NVIDIA Tesla K20 2496 CUDA cores, 2× Intel X5550 at 2.66 GHz (8 cores), RAM 64 GB, storage 1.7 TB, estimated performance 2855 GFLOPS

### 4.2. The Galileo supercomputer

Galileo is a Tier-1 supercomputer among the fastest available to Italian industrial and public researchers. Introduced in the Italian computing centre CINECA on January 2015, this IBM NeXtScale model is equipped with up-to-date Intel accelerators (Intel Phi 7120p), NVIDIA accelerators (NVIDIA Tesla K80), as well as a top-level programming environment and a number of application tools. It is characterised by:

516 computing nodes with Intel Haswell 2.40 GHz processors, 2 × 8 core each (8256 cores in total)

128 GB of RAM per computing node, 8 GB per core, 66 TB of total RAM

Internal network: InfiniBand with 4× QDR switches (≈40 Gb/s)

Two Intel accelerators Phi 7120p per node on 384 nodes (768 in total)

Two NVIDIA accelerators K80 per node on 40 nodes (80 in total, 20 available for scientific research)

Eight nodes devoted to login/visualisation

Theoretical peak performance 1.2 PFlops

≈480 GFLOPS single-node LINPACK (only CPU) sustained performance

Disc space: ≈2 PB of local scratch

Operating system: Linux CentOS 7.0

On June 2017, Galileo was ranked in 281st position on the top 500 supercomputer list (

## 5. Test cases

Aiming to prove the capability of DualSPHysics model to reproduce accurately waves and wave-structure interaction phenomena, three different test cases have been selected and are here reported, namely:

Wave generation and propagation of random wave train in 2D.

Wave run-up on a cubic block breakwater in 3D.

Coupling of DualSPHysics with SWASH model and application to wave forces on coastal structures in shallow water conditions (2D).

### 5.1. Test case N. 1

The first test case comprises the generation and absorption of random waves in DualSPHysics. The 150 s time series to be generated is calculated starting from a JONSWAP spectrum. The target wave conditions are *H*_{m0} = 0.06 m and *T*_{p} = 1.3 s. The water depth is 0.36 m. The wavelength is 2.09 m. Second-order wave generation has been used (i.e. bound long waves). The wave conditions correspond to a second-order Stokes wave. The geometrical layout of the case is depicted in Figure 3: an 8.4-m long wave tank is modelled. A damping zone (passive absorption) is defined at the end of the tank. The water surface elevation and orbital velocities are measured using a 5-wave gauge array where the central wave gauge is at 2 L from the generator. The numerical results are compared with theoretical solutions.

A sensitivity analysis on the initial interparticle distance, *dp*, has been carried out. Four different values of *dp* have been selected in a range of *H*/*dp* between 6 (coarsest resolution) and 20 (finest resolution). For each case, the number of fluid particles and the computational runtime are reported in Table 1.

H/dp | No. of fluid particles | Runtime [h] |
---|---|---|

6 | 29,365 | 0.9 |

10 | 82,541 | 2.9 |

12 | 119,209 | 4.5 |

20 | 333,081 | 16.6 |

The case with *H*/*dp* = 10 has been simulated also on Galileo supercomputer, specifically using one node in order to compare the computing capabilities between Galileo and one GPU from the EPhysLab cluster. The comparison is expressed in terms of the number of calculation steps per second of computational time. For each node in Galileo, 23.9 step/s can be simulated, whereas with a Tesla K20, 156.3 step/s are achieved. These results refer to 2D simulations, and they are expected to be different for 3D modelling.

The numerical results for *H/dp* = 10 have been plotted against the theoretical ones for each sensor position. They are all depicted in Figures 4–6. The model accuracy has been estimated in terms of spectral values of wave height and period. The numerical error, together with the calculated values for *H*_{m0} and *T*_{m-1,0}, is reported in Table 2 for WG3 (*x* = 4.18 m). For *H/dp* = 6, the wave height is underestimated about 4%; meanwhile, starting from *H/dp* = 10, the errors for both wave height and period are in the order of 1–2%. Similar results are attained for the other four wave gauges. The orbital velocities show same degree of accuracy.

H/dp | H_{m0-THE} [m] | T_{m-1,0-THE} [s] | H_{m0-SPH} [m] | T_{m-1,0-SPH} [s] | ε_{H} [%] | ε_{T} [%] |
---|---|---|---|---|---|---|

6 | 0.061 | 1.214 | 0.058 | 1.250 | −4.52 | +3.00 |

10 | 0.060 | 1.233 | −1.00 | +1.54 | ||

12 | 0.061 | 1.231 | −0.12 | +1.43 | ||

20 | 0.062 | 1.234 | +1.70 | +1.67 |

### 5.2. Test case N. 2

The second test case consists of a 3D case where the wave run-up on an armour breakwater has been simulated. Cubic blocks are displaced forming two layers with regular pattern on the seaward face of the breakwater, which has an angle of 28.3° with the horizontal. The side of each block measures 0.058 m. The case resembles an experimental one carried out in the small-scale wave flume CIEMito at the Technical University of Barcelona, Spain. The flume width is 0.38 m. Monochromatic waves have been simulated. The simulated wave height is 0.10 m, with mean period equal to 0.97 s. In total, 15 s of physical time has been simulated. The initial interparticle distance was 0.012 m, about one-eighth of the target wave height, resulting in 1,039,775 fluid particles. The simulation took 8.7 h using the Tesla K20 from the EPhysLab cluster.

Four wave gauges are located to measure the water surface elevation along the flume. The first wave gauge is at 3.10 m from the wavemaker; the last one is at 4.23 m. The distance between the toe of the breakwater and the wavemaker is 5.95 m. Moving boundaries mimicking a piston-type wavemaker are used in DualSPHysics to generate waves. To measure the run-up, the water surface elevation has been measured at 4160 locations across the breakwater. The results, post-processed in Matlab, have given the time series of wave run-up. A three-dimensional view of the numerical model is depicted in Figure 7. Using post-processing tools of DualSPHysics, an isosurface of the fluid has been extracted and plotted in ParaView software (

Figure 8 shows four different instants of time that make an entire run-up/run-down cycle over the breakwater. The colours indicate the fluid velocity field, i.e. horizontal orbital velocity. Red indicates high positive velocities (directed shorewards), whereas blue indicates negative velocities (directed seawards).

The water surface elevation measured in the numerical tank is depicted in Figure 9 for each wave gauge location. The wave run-up has been calculated for 26 cross sections along the width of the flume: the averaged time series is shown in Figure 10.

### 5.3. Test case N. 3

The third test case comprises the validation of the hybridization technique between DualSPHysics model and SWASH model to study the impact of overtopping flows on multifunctional sea dikes with shallow foreshore. The main aim is to prove that overtopping flow characteristics and wave forces are modelled correctly and that the hybridization can represent a reliable solution that can be used as complementary or alternative to physical modelling. The case of study is a typical case from the Belgian and Dutch coastline, where a building is constructed on the top of the dike. Physical model tests were carried out in a 4.0 m wide, 1.4 m deep and 70.0 m long wave flume at Flanders Hydraulics Research, Antwerp (Belgium) to measure forces on the vertical wall (i.e. building), the layer thickness and velocities of the overtopping flows [103]. The geometrical layout is depicted in Figure 11: the foreshore slope was 1:35 and dike height 0.1 m. The dike slope was 1:3. Here, we refer to only regular wave cases.

SWASH has been previously validated against the physical model results: wave propagation, transformation and breaking have been accurately modelled, and the conditions at the toe of the dike are reproduced as in the physical model test. Then, SWASH has been implemented together with DualSPHysics to model the wave impact. Eight layers have been used in SWASH simulation. A hybridization point along the physical domain has been defined, and it is located at *x* = 30.24 m from the physical wavemaker in its neutral position (Figure 11), far enough from the location where the waves start to break (≈35.5 m). SWASH provides the boundary conditions for DualSPHysics at that location. DualSPHysics is used to model the part of domain between the coupling point and the dike. The quantities that have been measured and compared with the experimental results are (a) free-surface elevation after the coupling point, (b) overtopping flow thickness in three different locations along the dike crest and (c) wave forces on the vertical wall (measured in the physical model by means of two-load cells of model series Tedea-Huntleigh 614). Both free-surface elevation and layer thickness were measured in the physical model by means of resistive wave gauges.

An initial interparticle distance, *dp*, of 0.003 m has been used leading to 494,388 fluid particles in DualSPHysics model. Fifty seconds of physical time has been simulated in the TITAN X graphic card, taking 10.8 h. A case with the whole physical domain modelled in DualSPHysics has been also modelled: in such case, the moving boundary is represented by the physical wave generator, and its location is then at *x* = 0.00 m. This stand-alone DualSPHysics model took 95.6 h using the same TITAN X to simulate 3,389,266 fluid particles, about 10 times slower than the hybridised model.

The numerical and experimental free-surface elevation and layer thickness are plotted in Figure 12, showing that the numerical solution resembles the experimental accurately. The forces on the wall are represented in Figure 13. The differences between numerical and experimental results might be explained because of the highly turbulent and stochastic nature of the overtopping wave impact in this case, which makes the experimental test not repeatable (see [3] for further discussion on model inaccuracy for wave impacts).

## 6. Conclusions

The chapter offers a panoramic on the application of the SPH-based DualSPHysics code on supercomputers maintained at the EPhysLab from Vigo University in Ourense (Spain) and the Italian computing centre CINECA. Three test cases were selected in the general context of the coastal engineering: (1) wave generation and propagation of random wave train in 2D, (2) wave run-up on a cubic block breakwater in 3D and (3) coupling of DualSPHysics with SWASH model and application to wave forces on coastal structures in shallow water conditions (2D). Scalability is discussed by varying the spatial resolution, and efficiency is proved in the case of the hybridization. Comparison with theoretical free-surface elevation and orbital velocities for test case N. 1 and measured overtopping layer thickness and forces on vertical walls for test case N. 3 was satisfactory.

## Acknowledgments

We acknowledge the CINECA award under the ISCRA initiative, for the availability of high-performance computing resources and support. Part of the computations was carried within the high-performance computing for Environmental Fluid Mechanics (HPCEFM17) project.