Numerical Simulation of Vortex-Dominated Flows Using the Penalized VIC Method

Vorticity plays a key role in determining fluid flow dynamics, especially in vortexdominated flows. Vortex methods, which are based on the vorticity-based formulation of the Navier-Stokes equations, have provided deeper insight into physical reality in a variety of flows using vorticity as a primary variable. The penalized vortex-in-cell (VIC) method is a state-of-the-art variant of vortex methods. In the penalized VIC method, Lagrangian fluid particles are traced by continuously updating their position and strength from solutions at an Eulerian grid. This hybrid method retains beneficial features of pure Lagrangian and Eulerian methods. It offers an efficient and effective way to simulate unsteady viscous flows, thereby enabling application to a wider range of problems in flows. This article presents the fundamentals of the penalized VIC method and its implementations.


Introduction
Vorticity is an important derived variable playing both mathematical and physical roles in the solution and understanding of problems [1], irrespective of whether the flow is laminar or turbulent.Vortex methods (VMs) utilize such a vorticity as a main variable of fluid dynamics.VMs are essentially a grid-free approach in which Lagrangian fluid particles are used as basic computational elements.The computational domain is discretized into a set of N fluid particles, which carry time-evolving vorticity.
VMs are based on the vorticity-based formulation of the Navier-Stokes equations; namely, the vorticity transport equation (VTE).There are several advantages in using the vorticity-based formulation.Firstly, it allows a purely kinematical problem to be decoupled from the pressure term.Thereby, the need for solving the pressure is eliminated.It means that pressure computation is not part of the solution procedure.If the pressure field is desired, it can be obtained in an explicit manner using the resolved vorticity and velocity fields.Secondly, the vorticity-based formulation does not contain any frame-dependent terms (for detailed information, refer to [2]).In rotating frame, for example, VTE is only altered by the addition of a constant forcing function in the form of 2 Ω ω where Ω ω is a time-dependent angular velocity.In the primitive-variable formulation, on the other hand, the mathematical character of the equations changes depending on whether or not the reference frame is inertial.Gresho [3] offers more comprehensive information on the fundamental formulations for incompressible flow simulations.Finally, the vorticity-based formulation is inherently able to better resolve finer scales because vorticity is one order higher than velocity.Also, computations are performed where vorticity-carrying particles exist, thus permitting a substantial reduction in the computational domain.
VMs have a history as old as finite differences.Many inherent benefits were discussed in comprehensive reviews of [4][5][6][7][8].Admittedly, the primitive-variable and vorticity-based formulations, each have their own advantages and disadvantages.Their numerical characteristics cannot be a measure of whether one formulation is better than another.The choice of the numerical methodology is a matter of intended application.However, it is obvious that VMs have been an attractive alternative to conventional numerical methods based on the primitivevariable formulation, especially in terms of vortex-dominated flow simulations.
VMs track numerically the evolution of vortices in unsteady flow.Their most serious difficulty comes from a computational time of particle-particle interactions.Particles' velocity computed by the interaction can be quantified according to the familiar Biot-Savart law.This requires O(N 2 ) operations, which is expensive for large N.A reduction in the computational complexity has been achieved using the fast multipole method (FMM) with hierarchical data structures, which have had a giant impact in computations using a Lagrangian framework.FMM, which was first introduced by Greengard and Rokhlin [9], is one of the ten algorithms with the greatest influence of the development and practice of science and engineering in the twentieth century [10].FMM and its variants significantly reduce the cost of computing the interaction with N particles; namely, Ο(N 2 ) to Ο(NlogN) or Ο(N) operations (refer to [11] for a short review).
Another approach for speed up in computation of particle velocities is the vortex-in-cell (VIC) method, which was originally developed by Christiansen [12].This numerical method evaluates a velocity field, instead of the velocity of individual particles.It involves interpolation of physical quantities (vorticity and velocity) between Lagrangian particles and an Eulerian grid.Hence, the VIC method is often referred to as a hybrid method.In the VIC method, velocity field is computed by solving a Poisson equation, ∇ 2 ψ = − ω or ∇ 2 u = − ∇ × ω.A fast Poisson solver accelerates the computation sufficiently.For example, a fast Fourier transform (FFT)-based solver reduces an operation count to Ο(MlogM), where M is the number of grid points.Although the amount of computation is problem dependent, the VIC method is known to be faster than FMM to simulate unsteady viscous flows (refer to [13]).
For treating vorticity diffusion problem in VMs, several different techniques were developed [14]; the core spreading method [15], the random-walk method [16], the particle strength exchange method [17], the deterministic method [18], and the vortex redistribution method [19].Interestingly, Graham [20] successfully solved the diffusion term of VTE using an Eulerian grid instead of complicated Lagrangian operators (also refer to [21][22][23]).This means that in the VIC method both convection and diffusion parts can be evaluated on a regular grid.Another important point of concern about VMs is boundary conditions for wall-bounded flows.In recent years, the VIC method has been successfully combined with the penalization method [24], which is useful to enforce boundary conditions at a solid wall (for example, refer to [25][26][27][28][29][30][31][32][33]).The combination, hereafter called the penalized vortex-in-cell (pVIC) method, greatly simplifies an entire numerical procedure required to construct a solution algorithm, thus leading to a well-performing parallel program.As a consequence, the evolution of both velocity and vorticity fields are evaluated on an Eulerian grid, and then the values for Lagrangian fluid particles are determined from the grid solution using an interpolation scheme.This offers an efficient and effective way to simulate unsteady viscous flows.
The pVIC method to simulate unsteady viscous flows will be discussed further in the following sections.In Section 2, the basic formulations and numerical methods are described.Section 3 describes implementations of the numerical methods and post-procedures for obtaining pressure and force from numerical solutions.Section 4 introduces improved implementations to further efficiently simulate a flow.Section 5 presents numerical results and discussions to validate the presented numerical method.A summary and conclusion is included in Section 6.

Basic formulation and numerical methods
For an incompressible unsteady flow of a viscous fluid, conventional VTE is expressed as follows ( ) where ν is the kinematic viscosity.The evolution of a flow can be evaluated in a fractional step manner.The algorithm of viscous splitting consists of sub-steps in which the convective and diffusive effects are considered.It is thus expressed in a Lagrangian frame as • Convection part where D/Dt = ∂/∂t + u ⋅ ∇ is the material derivative.N discrete fluid particles are linearly superposed to approximate the vorticity field as where ζ is a mollification of the Dirac-delta function.Each particle is characterized by its position x p and strength Γ p , i.e. a circulation Γ p = ∫ωdV ≅ ω p V p where V p is a volume (an area in 2D) occupied by a fluid particle.
In the pVIC method, the velocity of the particles and vorticity evolution are evaluated on a uniform grid.For doing that, the particles' own vorticity is first transferred onto the grid by ( ) where h is the grid spacing.The function 4 ′ is the third order interpolation kernel [34] defined as in each coordinate direction.This kernel conserves the 0th, 1st, and 2nd order moments.Vorticity of a single fluid particle is transferred to the nearest 16 grid nodes in 2D and 64 grid nodes in 3D.The resultant vorticity at grid nodes is obtained by summing the contribution of all the particles.

Convection via vortex-in-cell (VIC) method
In the pVIC method, a velocity vector can be expressed as where U ∞ is a free stream velocity and u ω represents a rotational velocity.ψ is called as a vector potential in 3D and a stream function in 2D.Taking the curl of Eq. ( 7) yields the Poisson equation ∇ 2 ψ = − ω linking vorticity to the vector potential and in turn to the velocity.ψ is computed on a uniform grid using a FFT-based solver.The grid for solving the Poisson equation can define a compact computational domain with non-homogeneous Dirichlet boundary conditions.The boundary condition of a domain Ω can be approximated using a Green's function approach [28], which is given by where x p ∈ Ω, x b ∈ ∂Ω, and x p ≠ x b .In 2D, it follows that ( ) Finally, rotational velocities on the nodes of the grid are computed from the definition u ω = ∇ × ψ using a finite difference scheme.
In 2D, two unknowns (two components of a velocity vector) are reduced to a single unknown ψ z (the scalar stream function) without any loss of generality.There are no problems with the continuity equation; the velocity field is automatically divergence-free.In 3D, however, a total of six unknowns are required to be solved instead of the usual four of the primitive-variable formulation.Unlike the stream function in 2D, the vector potential in 3D is far from being uniquely defined.The 3D formulation requires that the vorticity, velocity, and vector potential are all divergence-free; ∇ ⋅ ω = 0, ∇ ⋅ u = 0, and ∇ ⋅ ψ = 0.This is a new difficulty for 3D flow simulation.Among these three conditions, the second is satisfied by letting u = ∇ × ψ and the third is a consequence of the first.To enforce the first, the well-known projection scheme can be applied.The divergence-free of a vorticity field is accomplished by ω − ∇F, where F is a scalar field which is obtained by solving ∇ 2 F = ∇ ⋅ ω.

Diffusion via penalization method
The penalization method has been used to take into account a solid body immersed in a flow.Among fictitious domain methods, the penalization method is very easy to implement, robust and efficient.This consists only in adding a penalty term in the momentum equations to represent a solid body.For example, the L 2 penalization consists in adding a damping term on the velocity in the Navier-Stokes equations [24].The penalty velocity satisfies Darcy's law in terms of a Neumann boundary condition on the pressure.Angot et al. [24] rigorously showed that the penalized equations can be used with confidence since penalty error is always negligible in the face of the error of approximation.In the penalization method, a solid body is regarded as a porous medium where the permeability is infinite in the fluid part and tends to zero in the solid part.Both the solid and fluid domains are solved directly without addition of any boundary condition at a solid wall.The penalization method can replace complicated algorithms to implement boundary conditions at a solid wall.
In the pVIC method, a penalty term is added into VTE as follows where u s is the velocity of the solid body and χ denotes a mask function that yields 0 in the fluid and 1 in the solid.The penalty parameter λ is equivalent to an inverse permeability.In the case of explicit Euler time discretization, λ must satisfy λΔt < O(1) to ensure stability.Alternatively, an implicit penalty term can expressed as ( ) where The implicit scheme is unconditionally stable [27].This allows to use large λ regardless of Δt for accurate solutions near a solid body.Based on implicit penalization scheme, the VTE is rewritten as Each term can be solved using a finite difference scheme.For example, the penalty term can be discretized by a first-order centred difference scheme.To reduce an error, one can use a smoothed mask function [35].The stretching term, taking its alternative form (ω ⋅ ∇ T )u (the so-called transpose scheme), can be computed with the fourth-order central difference scheme.The transpose scheme is advantageous due to its conservative and accurate properties [36].
The diffusion term, ν∇ 2 ω, can be discretized by a 27-point isotropic Laplacian scheme which is less sensitive to the grid orientation.In 2D, the diffusion term can be discretized by a classical 9-point finite difference scheme.

Numerical procedure
For high performance computing, the message passing interface (MPI) can be used on a distributed-memory parallel system.For parallel computing, a computational domain Ω is decomposed by splitting it into several subdomains, with one subdomain per processor.Fluid particles are distributed among the subdomains depending on their positions.
In the pVIC method, convection, penalization, diffusion, and stretching (only in 3D) parts are sequentially evaluated on each subdomain.A numerical procedure follows that a.Each processor interpolates the vorticity of its own particles into grid nodes through the 4 ′ interpolation kernel.
b.Each processor computes ψ b with its own particles' vorticity using the Green's function approach.Then, ψ b for solving the Poisson equation is determined by a summation of the values obtained by each processor.
c.The Poisson equation ∇ 2 ψ = − ω is solved using the FFT-based solver.Velocities on the grid nodes are computed from the resultant ψ, using a finite difference scheme.
d.Each processor evaluates the penalization term on the nodes using a finite difference scheme, and then the diffusion and stretching terms are computed.Divergence-free condition of vorticity field is enforced using the projection method.
e.Each processor interpolates the velocity and vorticity on the grid back to its own particle positions through the 4 ′ interpolation kernel.Finally, the particles' position and strength are updated using an appropriate time integrator.
Some particles that have left a subdomain are assigned to an adjacent processor.If needed, the computational domain is increased depending on particle distribution.The whole process is repeated for the next time-step.
Particle redistribution is conducted every few time steps to ensure particle overlap.Concurrently, with this step, particles with negligibly small vorticity are discarded to prevent unnecessary increase in particle population.The resultant particles are filtered with a relative criterion; Γ i < C ω |Γ| max where C ω is a cut-off criterion.The discarded strengths were shared to the particles remaining in order to guarantee the conservation of circulation in the flow.It is found that the cut-off criterion has a significant influence on global force evaluation, rather than on vorticity and velocity fields.This will be further discussed in Section 3.4.

Numerical stability
In the pVIC method, fluid particles are traced by continuously updating their position and strength (vorticity or circulation) from the grid solution.The method retains the key feature of pure Lagrangian vortex methods; namely, the convection part disappears from VTE.As noted in [13], the particle convection is linearly unconditionally stable.The convection part does not impose any stability constraint but relies on the Lagrangian particle convection.The nonlinear stability condition that particles do not collide during one time-step (Δt) based on the rate of strain of a flow reads Δt ≤ C 1 |∇u| − 1 ≈ C 1 |ω| − 1 .Also, the maximum allowable Δt is constrained by the diffusion Courant-Friedrichs-Lewy (CFL) condition, Δt ≤ C 2 h 2 /ν, because the vorticity field is computed on a grid using a finite-difference scheme.This is less restrictive than the convective CFL condition for pure finite-difference methods, Δt ≤ Ch|u| − 1 .In practice, the nonlinear stability condition for the convection is much less demanding since the redistribution of particles to regular positions is periodically carried out.Δt for the diffusion is regarded as the physical time-step for an entire flow simulation.This permits the use of a relative large time-step compared to pure finite-difference methods.

Domain decomposition for parallel computing
To decompose a computational domain in parallel, there are two typical strategies.One is that a computational domain is equidistantly decomposed.The other is that a computational domain is decomposed so as to assign the same number of particles to each processor; consequently, the subdomains have different sizes (for detailed information, refer to [37]).In our 2D flow simulations, the latter was typically faster than the former, whereas in 3D the former outperformed the latter.This is related to both balancing of computation load for Dirichlet boundary conditions and communication load of grid assembly to solve a Poisson equation.
In our pVIC code, each processor has all the boundary nodes to minimize communications among processors and boundary conditions at are determined by summing the contribution of all the particles.The same number of particles among processors ensures near-perfect balanced computation and communication loads, even if it causes unbalanced communication for grid because of different domain sizes.Since local computation of boundary conditions in 2D cases was typically more expensive than inter-processor communication, we found the second approach to be much faster.In 3D cases, however, balancing of communication for grid assembly becomes more important than that of computation of boundary conditions.This is due to a significant reduction in computing time by the spline approximation method (which will be introduced in Section 4.2).As a result, the equidistant domain decomposition can be a better choice for 3D flow simulations.

Time integrator
To move fluid particles with their own velocity, predictor-corrector methods are usually used.
Table 1 shows a summary of time integration schemes used in the literature, but the detailed scheme was not explained in most of papers.
Table 1.Comparison of numerical time-integration schemes reported in the available literature.
A popular Runge-Kutta method is known as a single stepping scheme with multiple stages per step.For example, the method can be expressed as where the superscript * indicates an intermediate stage [32,43].Note that the velocity field is evaluated twice in finding the new position of + 1 . If a higher time integration scheme is used, more involved computations are needed.It is obvious that a higher-order integrator helps minimize error in particle tracking, compared to one evaluation.However, more evaluations of velocity field cause an increase in computational time.
Interestingly, Rossinelli et al. [44] evaluated particles' velocity using a modified explicit scheme as follows ( ) ( ) and a notable feature is that velocity field is evaluated only once although this scheme can be still considered as a kind of the Runge-Kutta method.Particles' velocity at the intermediate stage is computed using interpolation.This provides a considerable saving in computational time and memory consumption.Our research group investigated the feasibility of different integrators such as the Euler scheme with several stages, the midpoint rule, and the Simpson rule, as follows ( ) ( ) where δt = Δt/n′ ) Akin to Rossinelli et al. [44], particles' velocity at intermediate position was computed using interpolation from the values at neighbouring grid nodes.A benchmark problem was a flow around a sphere at Reynolds numbers of 50, 100, and 200.Overall computational time was about two times faster in the midpoint and Simpson rules, compared with the typical RK2 method in which velocity field is computed twice.The midpoint and Simpson rules provided comparable accuracy, and the difference in drag force was typically <0.3%.This approach seems to be useful.However, an essential prerequisite is that velocity field is little or not changing during Δt.This can require a smaller Δt, especially in high Reynolds number flows.Since the accuracy of a time integrator for tracing particles is dependent upon the change of a flow field in both time and space, a time integration scheme should be determined carefully.

Fast Poisson solvers
To solve a Poisson equation, the fast Fourier transform (FFT) and multigrid (MG) are utilized most widely.For a uniform grid, the behaviour of both solvers is well-known.In serial computation, a FFT solver requires O(MlogM) for M grid points and a MG solver requires O(M) operations.In an ideal parallel computation with zero cost communication, the FFT and MG solvers require O(logM) and O(log 2 M), respectively (for detailed information, refer to [45]).
Here, the performance of two solvers is compared under exactly the same condition; a rectangular computational domain with equidistant nodes.Both solvers were developed in our research group.The FFT solver is based on an open-source library called Fastest Fourier Transform in the West (FFTW) [46], and the MG solver is based on full multigrid algorithms provided in Section 19.6 of the Numerical Recipes [47] (also refer to [41]).Here, the algorithm is briefly introduced.The full multigrid algorithm starts with the coarsest possible grid and then proceeds to finer grids.It performs one or more V-cycles at each level before proceeding down to the next finer grid.This means that the full multigrid algorithm produces a solution for each level.We used the red-black Gauss-Seidel scheme as a smoothing operator, bilinear interpolation for a prolongation operator, and half-weighting for a restriction operator.The full multigrid cycle can be seen in Figure 1.
Figure 1.A computational cycle used in the MG solver.
To compare the two solvers, a 2D flow around a circular cylinder at a Reynolds number of 550 was considered as a benchmark problem.The MG solver with four levels was tuned by choosing one pre-smoothing sweep and two post-smoothing sweeps.Figure 2a shows a comparison of numerical results between the FFT and MG solvers.The MG solver performed comparably to the FFT solver in terms of accuracy.As expected, however, the MG solver was slower than the FFT solver as shown in Figure 2b.This is due that the MG solver is a kind of iterative method and thus it is an order magnitude more expensive compared to the FFT solver as a non-iterative method.Since, so far, most of VIC codes solve the Poisson equation in a computational domain of simple shape such as a square or cube, the FFT solver as a noniterative method is thought to be the best choice.However, it is worth noting that the MG solver can be applied to locally refined grids in a straightforward way.Adaptive mesh refinement is useful both to increase accuracy and to decrease computational cost in comparison with a uniform grid.

Force calculations
From the fact that the integral of vorticity moments becomes the change of the momentum, fluid force exerted on the solid body (F s ) is expressed as where N is the dimension of the space.The volume integral can be replaced by the summation of the moments for all the particles.Also, the adoption of the penalization method introduces another approach for the force evaluation.One can obtain the force with the velocity field inside the body as follows: ( ) where the penalization term itself is included in the integral as the time-change of the momentum.For the implicit penalization scheme, it can be rewritten as ( ) The integral over the whole domain is actually confined to the interior of the body due to the definition of χ.In 2D flow simulations, the drag coefficients have been reported in the literatures [30,37,44,48].They have evaluated the drag coefficient using one or both of the approaches.Rasmussen et al. [33] and Gazzola et al. [31] addressed that the discrepancy in the drag coefficient between the two approaches were not significant.In 3D, however, it was found that the method based on the vorticity moment is unstable occasionally; that is, a large C ω can make it incorrect even if there is little difference both in vorticity and velocity fields.Figure 3 shows the drag coefficient (C D ) of a sphere at a Reynolds number of 100 computed by the two different approaches employing three different cut-off criteria; C ω = 10 − 4 , 10 − 5 , and 10 − 6 .C D is defined by F x /(0.5ρU ∞ A) with a sectional area A = πR 2 .The drag coefficient computed with vorticity moments were converged at C ω = 10 − 6 , whereas the drags computed with the penalized velocity were nearly the same in all the tested cases.This describes that force evaluation using the vorticity moment should be taken with special care.As a result, the force evaluation with the penalized velocity can be a better option, especially in 3D simulations.This enables one to avoid an excessive increase in the number of particles, thus reducing the computational time and memory.In the cases with C ω = 10 − 4 , 10 − 5 , and 10 − 6 , there were 1.0, 1.9, and 3.5 million particles, respectively.

Pressure field
As discussed previously, pressure field is independently computed from the entire solution procedure and explicitly obtained by solving a pressure Poisson equation, ∇ 2 H = ∇ ⋅ (u × ω). Lee [49] introduced an integral approach to obtain the pressure field at any fixed time as follows: ( ) where G is the Green function solution.H is the Bernoulli function defined as ( ) where p is the static pressure above the reference pressure P ∞ and ρ is fluid density.The value of H on the body surface is computed using a boundary integral approximation.A mathematical identity for a vector or scalar field is used to define field values of a quantity of interest, which involves an integral of singularities distributed over the surface and throughout the field.The boundary integral approach has been successfully established.However, it has disadvantages such as the presence of singular Green's kernels.Special attention is needed to accurately compute boundary integrals around a singular point.The higher mathematical complexity is needed to get a usable computational formulation.The matrices that result from the integral method are asymmetrical, and they are not easy to solve.Furthermore, this approach takes long time of computation.Alternatively, pressure field can be evaluated based on the penalized Navier-Stokes equation [28].Satisfying the continuity, the Poisson equation for pressure can be expressed as where λ ' is distinguished from the λ by its order of magnitude.This equation is solved on the grid using a fast Poisson solver.The boundary condition for H can be given as homogeneous Dirichlet type.Figure 4 shows pressure distributions around a circular cylinder for a Reynolds number of 550, which are computed using different orders of magnitude for λ ' ; λ ' = 0, 0.1/Δt, 1/Δt, and 10/Δt.The assumption of λ ' ≅ 1/Δt is thought to be reasonable.With the assumption, Lee et al. [28] successfully computed pressure field around a 2D cylinder as shown in Figure 5.It is still valid in 3D flow simulations as shown in Figure 6.This approach is quite efficient, but an accurate approximation of H at domain boundaries still remains a problem.

More efficient implementations 4.1. Multiple domains
In the pVIC method, the size of a computational domain relies on the distribution of fluid particles.A large domain leads to an increase in computational memory required for a grid.A multiple domain approach can be considered to handle much more particles with a limited computational memory.The entire domain Ω can be defined as the union of physical subdomains, covering all the fluid particles; that is, x p ∈ Ω where = 1 ∪ ⋯ ∪ .The number of small domains, N D , is not constant and depends on a spatially evolving flow.When the domain size exceeds a certain limit, a new domain is created.For example, the first domain Ω 1 includes a solid body and the others are sequentially located downstream from the body as shown in Figure 7.Note that this approach differs from the domain decomposition for parallel computations.That is, each small domain Ω i is again decomposed into subdomains for parallel computations; = 1 ∪ ⋯ ∪ where N P is the number of processors In Ω i , vorticity and velocity fields are computed according to the numerical procedure given in Section 2.3.There is no dependency between the small domains since domain boundary conditions of Ω i are computed by summing the contribution of all the particles.Hence, independent domains ensure a relatively small use of computational memory during a flow simulation, compared to an approach based on parent and child grids [23].Also, the multiple domain approach enables to have different resolutions among small domains.For example, the grid spacing of Ω i can be defined as either h i = ϵ (single-resolution) or h i = 2 n ϵ (multiresolution) where ϵ denotes the particle size.The particle strength can be transferred into the nodes as follows: ( ) Although this approach ensures the small consumption of computational memory, it causes an increase in computational time because of an increase in the number of domain boundaries.Fast computations of ψ at domain boundaries will be discussed in Section 4.2.

Approximation of far-field conditions
For simplicity, consider a single domain Ω.The resultant values at ∂Ω are determined by summing the values computed by all processors.Each processor requires O(M b N/P), where M b and N/P denote the number of boundary nodes and its owned particles, respectively.Note that N/P can be reduced by an increase in the number of processors (P), whereas M b is a constant depending on the size of Ω. M b can be critical to determine the computational time.In practice, a substantial part of the overall computational time is spent on the calculation of ψ b at domain boundaries for solving Poisson's equation.Lee et al. [37] attempted a fast computation of ψ b using splines.They found that ψ b varies very smoothly at boundaries of a domain.This feature permits the use of spline interpolation.In 2D, for example, one spline curve enables to represent one side of the computational domain.ψ b at the part of the boundary nodes ( ) is computed by pure direct summations, and then values at the other nodes ( ) can be approximated using a spline curve.Here, = + .Using spline, approximation leads to a reduction in , which is directly linked to the reduction in the computational time.Figure 8 shows CPU times elapsed in computations of boundary values using the spline approximation and the FMM [50].The numerical simulation for a flow past a circular cylinder at Reynolds number of 550 was carried out using 16 CPUs with 4 GB of memory per processor.Once one-eighth of all the boundary points are equidistantly chosen as points for the direct summation, the spline approximation approach becomes faster than the FMM that is modified to compute only boundary values.In this case, the spline approximation approach has accuracy comparable to the FFM.A weak point of the FMM is that it has a quad-tree (oct-tree in 3D) data structure to hierarchically subdivide the computational domain.Each processor in a distributed memory parallel system must have a sufficient amount of memory for tree data structures.A required memory in the FFM depends on both the number of particles and grid nodes.
In 3D, the bi-cubic interpolation can be used instead of the bi-cubic spline (for detailed information on interpolation methods, refer to [47]).According to prior tests to assess this approach, the former was more accurate for our purpose.For example, for computing the stream functions at one side of the cube with 256 × 256 nodes from 3 million randomly distributed particles, the trial using the bi-cubic interpolation method took approximately 1 s when 16 × 16 nodes were used for direct summations, whereas the fully direct evaluations took 194 s in average.The error was typically <0.01%.

Flows past an impulsive started circular cylinder
The impulsively started circular cylinder problem is a good prototype to validate a numerical method.The Reynolds number, Re = U ∞ D/ν, based on the free stream velocity, U ∞ and the diameter of the cylinder, D, is selected to be 40, 550, 3000, or 9500.In the simulations, no symmetry constraint is imposed.The dimensionless time is based on the radius (R = 0.5D) of the cylinder; T = tU ∞ /R.The penalization parameter λ is fixed to 10 8 .

Reynolds number of 40
The numerical parameters are h = 0.04 and Δt = 0.016, which are determined through the stability condition Δt = h 2 /ν.At a Reynolds number of 40, it is well-known that the flow reaches a steady state.A pair of stationary recirculating wakes develops behind the cylinder.The wake length, L/D, is 4.16 and separation angle, θ s , is 51.7°, respectively.Fornberg [51] made a similar remark and gave L/D = 4.48 and θ s = 51.5°experimentally.The drag coefficient is computed as C D = 1.483.This is close to the experimentally measured value of 1.498 in [52].Figure 9 shows pressure coefficient (C P ) distribution around the cylinder body.The pressure is continuous through the cylinder body, and there is no Gibb's oscillation associated with the discontinuity at the body surface.Figure 11 shows the time evolution of the drag coefficient for an impulsively started flow around a two-dimensional cylinder for Re = 550 calculated with the pVIC method, compared with results from the short time asymptotic solution of Bar-Lev and Yang [54], the vortex method result of Koumoutsakos and Leonard [55], and the VIC method result of Kudela and Kozlowski [56].The pressure distribution at the cylinder surface is shown in Figure 12.At very early stages in the numerical simulation, the surface pressure distribution is quite close to that for an ideal inviscid flow.As flow evolves, the minimum pressure coefficient gradually moves upstream.
The corresponding locations of C P = 0 have the same upstream moving trends and is at the angular position θ = 35° as experimentally measured by Norberg [57].

Reynolds number of 3000
The simulation parameters are h = 0.0025 and Δt = 0.002.Simulation was carried out until T = 10 and the number of fluid particles ranged from approximately 130,000 to 420,000.The total run time is about 9 h on 16 CPUs (Intel Xeon64 3.3 GHz). Figure 13 shows numerical results.
Compared with a flow at Re = 550, the secondary vortices appear at an earlier time and grow larger.In the case of Re = 3000, the two secondary vortices formed are equivalent in size and in strength.It is the so-called α phenomena [53].Also, the streamline computed for Re = 3000 compares with the flow visualization result of Loc and Bouard [58].The present simulation correctly captures the expected physics of the flow at Re = 3000.

Reynolds number of 9500
The simulation parameters are h = 0.001 and Δt = 0.002.Flows were numerically simulated until T = 4, and the number of fluid particles ranged from approximately 800,000 to 1,200,000.The total run time is about 25 h on 24 CPUs (Intel Xeon64 3.3 GHz). Figure 14 shows vorticity contours.They are in good agreement with the numerical results in [29].As shown in Figure 15, the streamline patterns computed for Re = 9500 compare quite well with the flow visualization results in [58].The α and β phenomena, which are observed experimentally by Bouard and Coutanceau [53], are well captured by the numerical simulation.

Flows past an impulsive started sphere
When the pVIC method comes to a 3D implementation, a little attention should been paid to the quantitative validation as mentioned in Sections 2.1 and 3.4.An incompressible viscous flow past a sphere has extensively been studied by many researchers in theoretical, experimental, and numerical ways.It is well-known that the wake behind a sphere depends on the Reynolds number.Especially, the flows at 20 < Re < 210 are often investigated as a validation case by virtue of their steady and axisymmetric features in wake structure including separation.To demonstrate the feasibility of the pVIC method, we carried out numerical simulations of the impulsively started flow past a sphere at a Reynolds number of 100. Figure 16 illustrates the configuration of the multiple domains.The sphere is immersed in a Cartesian grid that does not conform to its surface.The simulations were carried out until T = 30 to demonstrate the convergence of the present method and test its capability of long-time simulation.Numerical parameters were determined through the stability condition, νΔt/h 2 < 1/6, and the grid convergence was achieved with h = 0.02 and Δt = 0.005.It was concluded that giving C ω = 10 − 5 is sufficient in terms of the force calculation.The number of fluid particles ranges from 2 to 5 million during the simulation, and the total computation time is roughly 67 h on 16 CPUs (Intel Xeon64 3.3 GHz). Figure 17 shows the vorticity contour at the steady state.The vorticity contour is in excellent agreement with the numerical result in [59].The wake shape and drag coefficient at the steady state are compared to the references in

Vortex shedding from a hydrofoil
We selected a National Advisory Committee for Aeronautics (NACA) 0009 cross section with a truncated trailing edge for the numerical simulations.This hydrofoil has the same as the experimental model used in [62][63][64].Ten percent of the original chord c o was removed from the trailing-edge region of the NACA 0009 hydrofoil.The hydrofoil geometry is further detailed in [63].The maximum thickness, as normalized by the chord length c, is t max /c = 0.1 and the thickness at the trailing edge is 0.0322.The numerical parameters h = 0.0003 and Δt = 0.00015 were chosen to simulate flow past the 2D hydrofoil at a Reynolds number of 2 × 10 6 .Figure 19 shows the temporal power spectrum of vertical velocity fluctuations measured in the near wake.The maximum peak is identical to the vortex shedding frequency, and the second peak area is found in the third harmonic of the shedding frequency.For U ∞ < 5, spectral levels are almost flat as f decreases.This trend is similar to experimental data of Bourgoyne et al. [65].
In their measurements, an important characteristic of the spectra is the presence of a clear region with a −5/3 slope; spectral power-law for high Reynolds number turbulent fluctuations.From our numerical simulation, however, the spectral density of the velocity fluctuations shows a decay of the form.A similar remark was made by Singh and Mittal [66].This is due that the vortex stretching mechanism is absent in two-dimensional flows.More detail on fluid turbulence confined two spatial dimensions can be found in Boffetta and Ecke [67].

Figure 2 .
Figure 2. Comparison of flow fields and CPU times between the FFT and MG solvers.(a) Vorticity contours and streamlines and (b) CPU times per time steps.

Figure 3 .
Figure 3. Drag coefficients calculated with two different approaches: (a) vorticity moment and (b) penalized velocity.

Figure 5 .
Figure 5. Pressure distribution around a circular cylinder at a Reynolds number of 9500.Note that the figures are reprinted with permission from [28].(a) Pressure contour at T = tU ∞ /R = 3 and (b) pressure coefficient at a solid surface.

Figure 6 .
Figure 6.Pressure distribution around a sphere for a Reynolds number of 100 at T = tU ∞ /R = 30.(a) Pressure contour and (b) pressure coefficient at a solid surface.

Figure 7 .
Figure 7. Vorticity contour around a circular cylinder for a Reynolds number of 185.

Figure 8 .
Figure 8. CPU time in numerical simulation using the spline approximation.Note that the FMM is based on a tree level of 3 and an expansion degree of 10.The figures are reprinted from [50].(a) Computational time for boundary values and (b) overall computational time and error.

Figure 9 .
Figure 9. Vorticity and pressure contours around a circular cylinder at a Reynolds number of 40.(a) Vorticity contour and (b) pressure contour.

Figure 11 .
Figure 11.Time evolution of the drag coefficient for a cylinder at a Reynolds number of 550.Note that the figure is reprinted with permission from [28].

Figure 12 .
Figure 12.Surface pressure coefficients for a cylinder at a Reynolds number of 550.Note that the figure is reproduced with permission from [28].

Figure 15 .
Figure 15.Streamlines at Reynolds number of 9500; (a) T = 2 and (b) T = 4 compared with flow visualization results of Loc and Bouard[58].Note that the figures are reprinted with permission from[28].

Figure 17 .
Figure 17.Contour of vorticity magnitude and streamline on xz-plane at a steady state.

Figure 18 .
Figure 18.(a) Instantaneous vorticity contour and (b) a time history of drag and lift coefficients at a Reynolds number of 2 × 10 6 .

Figure 19 .
Figure 19.Temporal power spectral density (PSD) of vertical velocity fluctuations.Note that this figure is reprinted with permission from [32].

Figure 20
Figure20 shows vortex shedding from a hydrofoil with different bevel angles in degrees.At β = 0°, the vortex shedding is regular and periodic.As the bevel angle β increases, vortex shedding becomes increasingly disorganized.The periodicity of both forces is almost lost at β = 60°.

Figure 20 .
Figure 20.Vortex shedding from a hydrofoil with different bevel angles.Note that the figures are reproduced with permission from [37].(a) β = 0°; (b) β = 20°; (c) β = 40°; (d) β = 60°.Also, we conducted numerical simulations to investigate vortex shedding with respect to the sinusoidal motions of the free stream flow.The angle of attack varied periodically with time t; α = α o sin(2πf ∞ t) where f ∞ was the frequency of the free-stream flow oscillation and its oscillation amplitude was restricted at α o = 2 in degrees.The magnitude of the free-stream velocity was kept constant.The tested model is the NACA 0009 hydrofoil with β = 60°.Interestingly, at f ∞ = 20 vortices are regularly shed from the trailing edge, as shown in Figure 21.In the power spectral density (PSD) functions, drag oscillation induced by vortex shedding is clearly observed.This means that a particular oscillation frequency in the free stream velocity can cause regular and periodic vortex shedding.

Figure 21 .
Figure 21.Vortex shedding at f ∞ = 20.Note that the figures reproduced and reprinted with permission from [37].

Table 2 .
The numerical results are good agreements with the reference results.

Table 2 .
Comparisons of wake shape behind a sphere and drag coefficient with previous results.