Parallel Accelerated Group Iterative Algorithms in the Solution of Two-Space Dimensional Diffusion Equations

Diffusion equations are mathematical models which explain how the concentration of one or more substances distributed in space is altered by a diffusion process which causes the substances to spread out over a surface in space. For a normal diffusion process, the flux of particles into one region must be the sum of particle flux flowing out of the surrounding regions. From Fick’s first law, this can be represented mathematically by the following diffusion equation


Introduction
Diffusion equations are mathematical models which explain how the concentration of one or more substances distributed in space is altered by a diffusion process which causes the substances to spread out over a surface in space. For a normal diffusion process, the flux of particles into one region must be the sum of particle flux flowing out of the surrounding regions. From Fick's first law, this can be represented mathematically by the following diffusion equation If the diffusion coefficient is constant in space, then here 2 ∇ is the normal Laplacian. The values u and D take on different meanings in different situations: in particle diffusion, u is interpreted as a concentration and D as a diffusion coefficient; while in heat diffusion, u is the temperature and D is the thermal conductivity. The application of the finite difference methods for solving time-dependent Partial Differential Equations(PDEs) such as this, at any particular time level, yields a system of linear simultaneous equations of the form where iterative methods are normally more feasible in solving the system due to the sparsity of the matrix A. The applicability of explicit group methods, in which several unknowns are connected together in the iteration formula resulting in a sub-system that must be solved before any one of the them can be determined, have been investigated on solving these types of PDEs. In their early work, Evans and Abdullah (1983b) generated single-step, oneparameter families of finite difference approximations to the heat equation in one space dimension by coupling in groups of two the values of the approximations obtained by known asymmetric formulas at adjacent grid points at the advanced time level. The www.intechopen.com Advanced Methods for Practical Applications in Fluid Mechanics 100 resulting equations are implicit but they can be easily converted to explicit form. The method was shown to possess unconditional stability with good accuracies. Evans and Abdullah (1983a) also developed the group explicit method for the solution the two dimensional diffusion where a general two-level six point finite difference approximation was developed to solve the parabolic equation. The method was further developed as the Alternating Group Explicit (AGE) method (Evans & Sahimi, 1988), which is an analogue to the famous Alternating Direction Implicit(ADI) method but has the advantage of being explicit and thus very easy to parallelise. The emergence of newer explicit group methods on skewed or rotated grids with promising and improved results was greatly observed since the early 1990s. Among them are the works of Abdullah (1991) who developed the four-point Explicit Decoupled Group (EDG) by discretising the PDEs on skewed grids. This method was shown to require less computational time with the same order of accuracies than the Explicit Group (EG) method pioneered by Yousif and Evans (1986) in solving the Poisson model problem. A few years on, Othman and Abdullah (2000) modified the formulation of the EG method by deriving formulas based on the centred five points approximation formula with the grid spacing h and 2h, and the rotated five points approximation formula to come up with the improved modified four-point EG which was shown to exhibit lesser computational effort than the existing EG and EDG. Since then, active research has been conducted to investigate the capabilities of the variants of these group relaxation methods in improving the standard or traditional algorithms in solving several types of PDEs. This includes the work of Ali and Lee (2007) who derived the Accelerated OverRelaxation (AOR) variant of the EDG group scheme in the solution of elliptic equation where its performance results were compared with the EG (AOR) proposed by Martins et al. (2002). The new EDG (AOR) scheme requires less execution time than the existing EG (AOR) method where the gain in speed of EDG (AOR) method over the EG (AOR) method ranges from approximately 51% to 59%. The performance analysis of the parallel algorithms of these EG and EDG schemes were also established in Ng and Ali (2008) where the algorithms turn out to be efficient solvers for the steady-state elliptic equation on distributed memory multicomputer with high scalability.
In this chapter we shall present the formulation of new explicit group algorithms intended for solving the two dimensional time-dependent diffusion equation. A novel approach of using four points group strategy, implemented on different spacing stencils incorporated with the AOR technique, is used in the formulation. Explainations on how the methods need to be reconfigurated mathematically as to be successfully ported to run on a messagepassing parallel computer system is also presented.

Formulations of group methods
We consider the finite difference discretization schemes for solving the two dimensional diffusion equation of the form with a specified initial and boundary conditions on a unit square with spacings = 1 / xy h n Δ= Δ = in both directions x and y, with 00 , ij xxi h yyj h =+ =+ (i,j = 0,1,2,...,n), t = kΔt (k = 0, 1, 2, …); here, f is a real continuous function. One commonly used implicit finite difference scheme based on the centred difference in time and space formulation about the point (i,j,k+ 1/2) is the Crank-Nicolson scheme which transform (4) . Based on this approximation, several group schemes have been constructed (Ali, 1998). The Explicit Group (EG) method, for example, was formulated by taking the iteration process in groups of four points. At each time level (k+1), the mesh points are grouped in blocks of four points (i,j), (i+1,j), (i+1,j+1) and (i,j+1) and equation (5) is applied to each of these points resulting in the following (4x4) system: ,, 1 , which may be solved explicitly in groups of four points as

102
The method proceed with iterative evaluation of solutions in blocks of four points respectively using these formulas throughout the whole net region until convergence is achieved.
Using another type of discretization, which we called the rotated finite difference approximation: another group scheme was formulated called the Explicit Decoupled Group (EDG) method. This approximation is obtained by using the Taylor series expansion of the solution u at appropriate grid points where the resulting computational stencil is 0 45 clockwise rotated from the stencil of the standard Crank-Nicolson (5). At each time level (k+1), the mesh points are grouped in blocks of four points, (i,j), (i+1,j+1), (i+1,j) and (i,j+1), and the rotated finite difference approximation (8) is applied to the u values at each of these points resulting in a (4x4) system which may be decoupled into two 2x2 matrices of the following form (Ali, 1998)

44
(1 ) The EDG method requires less computing time whilst maintaining the same order of accuracies as the original EG method (Ali, 1998). Based on approximation formula (8) derived on different grid spacings, new modified group explicit methods will be formulated in combination of the Accelerated Over-Relaxation (AOR) technique. The next sub-sections will elaborate on the formulations of these methods.

Modified explicit group (MEG) AOR method
We consider the standard five-point formula for the diffusion equation on 2h Ω grid: Here, ij u is the value of u at the current time level (k+1), ij v is the value of u from the previous time level (k), while m is the iteration level. ij F is the value of f at the point (i,j) at time level (k+1). We begin by dividing the grid points in the solution domain into 3 types of points, indicated by , , , and arranged in a specific alternate ordering, as shown in Fig.  1. For the iterations, we consider the points indicated by in Fig. 2. Similar to the EG method described in the previous section, we may apply equation (12) to groups of four points of the iterative points to produce a 4x4 MEG formula. The convergence of this method may be improved by the introduction of the AOR technique (Hadjidimos, 1978) where drastic improvement in convergence can be obtained by choosing suitable relaxation parameters in its formula. The idea in AOR technique is to apply an extrapolation of a two-parameter Successive OverRelaxation (SOR)-type iterative procedure in the formula. The two parameters may be exploited which result in methods which will converge faster than any other method of the same type. Using this idea, we can introduce the overrelaxation parameters ω and R into the 4x4 MEG formula as a way to further accelerate the convergence of the iterative method scheme as the following: (1 ) () 3 4 55 66 75 8 4 15 23 , , Unlike SOR, there is no general formula to determine the parameters R and w. But according to Hadjidimos (1978), the parameter R is normally chosen to be close to the value ω obtained from the corresponding SOR technique which give the least number of iterations. We can then define the four points MEG (AOR) method for diffusion equation as the following: Algorithm 1 1. Divide the grid points into points of type , and at level k+1 as shown in Fig. 1. 2. Set the initial guess for the iterations. (13)-(14) to evaluate the solution of points of type iteratively at level k+1. 4. Check the convergence. If the iterations converge, go to step 5. Otherwise, repeat step 3 until convergence is achieved. 5. After the solution at points of type converge, the converged values are then adopted as the initial guess for the next time level. 6. Then, repeat steps 1 to 5 until the solutions at all the required time levels have been obtained. 7. For the solutions at the remaining points at level k+1 (Fig. 1)

Use Equations
Here,

Modified Explicit Decoupled Group (MEDG) AOR method
To formulate the MEDG (AOR) method for the diffusion equation, we consider the rotated five-point approximation formula for the diffusion equation with 2h spacing: www.intechopen.com

)
( 1 ) ( )( )( )( ) ,2 , 2 2 , 2 2 , 2 2 , 2 2, 2 2, 2 2, 2 2, 2 , , Here, The method is constructed by firstly dividing the grid points into 4 types of points in a specific alternate ordering as shown in Fig. 5 in a unit square domain with n=14. The MEDG (AOR) formula for diffusion equation can then be obtained by applying equation (17) to groups of points of type in the solution domain. This application will produce a 4x4 system which can be inverted and rewritten in explicit decoupled form of two equations: (1 ) ( ) 13 3 2 4 3 1 ,,   The resulting grid or the computational molecule to update at the two points can be viewed in Fig. 4 with a mesh size 2h. From Fig. 5, it is obvious that the evaluation of equations (18) -(19) involves only points of type . This means that by using the approximation formulas (18)-(19), it is easy to see that the black filled points are linked only to the same type of points. Thus the iterative procedure involving these formulas can be performed independently of the other type of points. We can then formulate the four points MEDG (AOR) method as in Algorithm 2: Algorithm 2 1. Divide the grid points at layer k+1 into points of type , , and as shown in Fig. 5. 2. Set the initial guess for the iterations. 3. Evaluate the solution at the points of type iteratively at layer k+1 by using equations (18)-(19). 4. Check the convergence. If the iterations converge, go to step 5. Otherwise, repeat step 2 and step 3 until convergence is achieved. 5. After the solutions at points of type converge, the converged points are then adopted as initial guess for the next time level. 6. Then, repeat steps 1 to 5 until the solutions at all the required time levels have been obtained. 7. For the solutions at the remaining points at layer k+1 (Fig. 5)

Numerical experiments of the sequential group methods
In order to verify the performance of the proposed methods which were shown in previous sections, the algorithms were tested on the following model problem: www.intechopen.com

∂∂ + ∂∂
is the diffusion term which describes the movement of the particles and the remaining term at the right hand side of equation (23) (i.e. g(x,y,t,u)) is the reaction term which describes changes (due to birth, death, chemical reactions, etc.) occuring inside the region (or habitat). For this numerical experiment, we purposely find a model problem which has an exact solution to ensure that the proposed methods yield correct results. To terminate the iteration process, the relative error test, i.e.
(1 ) ( ) Error abs u u abs u ++ =− + , was used as the convergence test with tolerance ε = 1.0 x 10 -6 . As described in Section 2.1, the over-relaxation parameters, R and w, need to be found which give the best convergence rates for the proposed schemes. To achieve this, we obtained the values of w for the corresponding SOR scheme, and then the value of R was found by running the experiments using these specific values of w which gave the least number of iterations. Different grid sizes of n = 82, 102, 122, 142, 162, 182 and 202 were chosen to record the total iteration counts (Iter) at all time levels and computer timings (t) of the group AOR methods. The value of t Δ = 0.0005 with 1000 time levels was used to run the programs. The numerical results of the proposed MEG(AOR) and MEDG (AOR) methods together with the original explicit group methods EG(AOR), EDG(AOR) are tabulated in Tables 1 to 2. The point AOR method which uses the existing traditional Crank-Nicolson scheme (5) accelerated with the AOR technique is also shown in Table 3 for comparison purposes . The value of R was chosen experimentally to be close to the value of w as depicted in the tables. All of the methods tested are of second order accuracies so that the results they produce are of similar accuracies as seen in Table 4. From Tables 1 and 2, it can be seen that between EG(AOR) and EDG(AOR), the latter has better rates of convergence which is consistent with the results in Ali and Lee (2007) for the elliptic problem. The diffusion equation is a time dependent parabolic equation where each time level represents an elliptic problem. In these tables, it can also be observed that both of the proposed MEG (AOR) and MEDG (AOR) methods have better execution times than the original EG (AOR) and EDG (AOR) respectively which is due to the reduction in computing efforts of the proposed methods. In the proposed modified group schemes, lesser grid points are involved in the iterative processes than the original group schemes which result in lesser overall arithmetic operation counts  Table 4. Average errors of all the methods for different mesh sizes.

EG(AOR) MEG(AOR)
From Tables 1 and 2, it can also be concluded that MEDG (AOR) is able to show the most substantial reduction in execution times compared with the other group and point AOR schemes without having to jeorpadize the solution accuracies. MEDG (AOR) requires the least number of total iterations and computing timings to converge. The required number of iterations is reduced because the introduction of the over-relaxation parameters, w and R, into its formulas is able to reduce the most the number of iterations of the scheme compared with the other schemes tested. This combined with the fact that only about 1/8 of the total nodal points are involved in the iterative process at each time level results in the least computing times for this method. In summary, the proposed MEG (AOR) and MEDG (AOR) methods are viable alternative solvers to the diffusion equation with the latter being the more efficient one in terms of CPU times. www.intechopen.com

Parallel implementations for the group methods
This section will discuss the implementation of the proposed group methods on a messagepassing environment.

MEG (AOR) in parallel
For the MEG (AOR) method, we decompose the domain Ω into a number of vertical panels at layer k+1 based on the number of available processors, p. The idea is to allocate approximately equal number of strips to the processors. Each strip consists of four grid lines which form the four points blocks with the spacing of 2h between the points. The equal number of vertical strips in each panel can be approximated using a specific formula. The distribution of tasks (panels) to processors for the case n=18 and p= 3 is as shown in Fig. 6 where the configuration is as follows:   After the domain Ω is decomposed into the individual panels, message passing needs to be done between the processors to send and receive data at the right and left boundaries of each panel. Based on equations (13)- (14), certain values from adjacent processors need to be communicated during the iterative cycle. The right boundary cell values, grid A's (panels 1 and 2) will be sent to the left adjacent neighbouring panels (panels 2 and 3) as shown in Fig. 7. The left boundary cell values, grid B's (panels 2 and 3) will be sent to the right adjacent neighbouring panel (panels 1 and 2) as shown in Fig. 8. These communications need to be executed correctly to ensure that each processor possesses the correct values needed for their respective independent calculations. After the message passing process is completed, the local error for each processor is calculated and is sent to the master processor for the global convergence check. The local convergence test used is the relative error test similar with their sequential counterparts. The global error is the sum of the local error from each processor. If the global error is greater than a certain tolerance ε, then the iteration is repeated.

MEDG (AOR) in parallel
Similar with the MEG (AOR) method, we decompose the spatial domain Ω into a number of vertical panels based on the number of available processors, p. For MEDG (AOR), we rotate the x-y axis clockwise 0 45 and forms the four points block with the spacing of 8h between the points of the matrices. We again consider the ordering of the strips for the case n=18 and p= 3 as shown in Fig. 9. Each strip consists of four grid lines which form the four-point groups with spacing 2h. We will distribute 1+1(extra strip) strips into panel 1. The other panels (panels 2 and 3) will be allocated with 1 strip each to ensure that the tasks are distributed almost equally amongst the processors. The number of strips for each panel (processor) will be computed similar as the one in the previous method.  After the domain Ω is decomposed into the individual panels, message passing needs to be done between the processors to send and receive data at the right and left boundaries of each panel. The points involved in the iterative process are different from the ones in the After completing the iteration process, we need to compute the solutions at the remaining points using the standard 5-points formula with the spacing of 2h for the points , rotated 5points formula for and standard 5-points method for . This process will be done only once directly and the cost of computing these values is  Table 6. Coefficients of the performance models in µs.

Numerical results of the parallel group accelerated methods
We implement the parallel algorithms on the Stealth cluster at USM. The experiments were carried out on 1 unit of PC with two 900 MHz CPUs, 2 GB RAM, and 6 units of PC, where each PC had two 1002 MHz CPUs and 2 GB RAM. The Operating System used was Solaris9 (SunOS 2.9) with Sun HPC ClusterTools 5 and Sun MPI 6.0. The parallel algorithms were tested on the same model problem that was used for the sequential version (23). For the MEG (AOR) and MEDG (AOR) methods, the sizes of n were chosen appropriately to make sure that all of the strips consisting of two grid lines can be decomposed approximately evenly to the 6 processors. The tolerance used was ε = 1.0 x 10 -6 and the acceleration parameters, w and R, were chosen to give the least number of iterations. From Table 6, we can see that the computation coefficient of MEDG (AOR) is slightly lesser than MEG (AOR). Therefore we expect that MEDG (AOR) should have better timings if compared to MEG (AOR). We further test the scalability analysis by comparing the experimental and predicted timings of these methods using n = 182 and 202 which are shown in Figs. 12 and 13 respectively. The figures show that the experimental and predicted timings are very close to one another especially when the grid size is larger. Comparing between the two grid sizes, it is found that the efficiency improves as the grid size increases. This improvement indicates that the performance models are more accurate as the grid sizes increase. Based on the parallel implementation which was described in Section 4, we used the size of n = 162, 182 and 202 to record the timings, speedups and efficiencies of both the MEG (AOR) and MEDG (AOR) methods. Several performance results of the MEG (AOR) and MEDG (AOR) methods are shown in Figs. 14-16.

MEG (AOR)
Execution Time From these figures, we can see that the parallel MEDG (AOR) is better in execution timings compared to the MEG (AOR). Generally we can see that with the enhancement of grid size, the speedup increases with nearly 70% efficiency. However, the speedup and efficiency values of MEG (AOR) are slightly better than MEDG (AOR). This difference in values indicates that the amount of computations carried out over the total communication overheads in MEG(AOR) is greater than the one in MEDG(AOR).

Conclusions
In this chapter, the formulation of new improved explicit group AOR methods in solving the two dimensional diffusion equation is presented. The improvement of the numerical result shows the potential of these methods in solving the parabolic equation. We further implement both of these methods on a cluster of distributed memory computer using Message-Passing Interface programming environment. The experimental results show that these two methods can be performed successfully in parallel on a cluster of distributed memory computer. Performance models to explain the parallel behaviour of these proposed methods were also developed. The experimental timings agreed with the predicted results especially when the grid size and processors increase. The MEDG (AOR) shows a faster rate of convergence with similar accuracies if compared with MEG (AOR), especially when the grid size increases. Both methods were shown to be suitable to be programmed on a distributed memory computer.