## 1. Introduction

Friction stir welding (FSW) is a solid-state welding process that was patented in the UK by “The Welding Institute” (TWI) in 1991. In this process, illustrated in **Figure 1**, a non-consumable rotating tool is used and the workpieces are joined in a solid state, without fusing the materials. This tool is classically made up of a cylindrical shoulder and a cylindrical or conical pin. To perform a weld, the rotation of the tool is initiated, and then the tool is forced into the parts to be welded. When the shoulder reaches the surface of the material, an important amount of friction heat is generated along the contact surface. The increase in temperature softens the material and helps the workpieces to become highly plastic. Although significant heat is generated, the material nevertheless stays in the solid state, at about 0.8–0.9 times the melting point. The combined effect of the increased temperature and the pressure exerted by the tool allows the workpiece material to be mechanically mixed. The plates are then joined together in a solid state as the tool advances along the weld seam.

FSW was initially developed and used to join aluminum alloys. However, since its invention, the application field of the process has been extended to weld various materials: copper, titanium, magnesium, steel, stainless steel, nickel, polymers, and lead.

To join two plates using the FSW process, a sequence of prescribed motions is performed. This sequence is normally divided into four different phases. Each phase plays a specific role in the welding process. These phases are illustrated in **Figure 2** and are identified as follows:

1) Plunge phase,

2) Dwell or stabilization phase,

3) Welding or advancing phase,

4) Tool removal or retraction phase.

During the plunge phase, the rotation of the welding tool is initiated and the tool plunges into the workpieces. During this phase, the material is relatively cold; only the pin is in contact with the workpiece. The axial force (also called forging force) and the torque applied to the tool are high, and in most cases, reach their highest values. At the end of the plunge phase, the pin has fully penetrated the workpiece and the shoulder is in contact with the surface. The rotation speed of the tool during the plunge and advance phase is frequently the same.

The dwell phase begins when the desired plunge depth has been achieved. The axial force **Figure 2** is maintained on the tool during this stabilization phase. The combined effect of the relative speed between the rotating tool and the material with the applied axial force generates heat due to friction at the tool-material interface. The tool is kept in place for a sufficiently long time to reach the temperature required for welding.

After the dwell phase, the tool starts to advance and accelerates to the prescribed translational velocity along the weld line. The acceleration may be fast if the dwell phase was sufficiently long and the temperature is high in the weld zone. However, too fast an acceleration can result in high mechanical stresses for both the tool and welding equipment, reducing their useful lifetime. Depending on the design of the tool and the specific process parameters, the FSW tool may be tilted slightly (a few degrees) to improve the quality of the weld.

In conventional arc-welding techniques, the material is physically melted to produce a weld. In FSW, numerous drawbacks associated with the presence of a liquid phase during welding are eliminated: solidification cracking is eradicated, and the distortions and the size of the heat-affected zone (HAZ) are reduced. Spatter, fume, and ultraviolet (UV) emissions are also eliminated. Compared to arc-welded parts, the FSW assemblies frequently exhibit higher mechanical properties in tension, compression, bending, and an increased life in fatigue. In addition, no flux, protective gas, or filler material is needed during welding. Finally, the thickness of FSW welds may go from few tenths of millimeters up to more than 70 mm in aluminum alloys.

However, the FSW process has certain limitations as well. In order to bring the material into the plastic state, the required torque and forces can be very high. The axial force applied on the tool can reach many kilonewtons (many tons of force). For this reason, the welding machine must be robust, typically leading to relatively expensive equipment. In order to have high-quality welds, it is also important to assure the appropriate clamping and support of the pieces to be welded. Further limitations of the FSW process are mostly related to geometrical factors. During welding, the tool shoulder must have constant and uniform pressure on the workpieces. Certain traditional types of welds such as the fillet weld cannot be accomplished without modification of the standard tool geometry.

There are two main classes of FSW tools: single and double shoulder. The tool shown in **Figure 2** belongs to the first category, while the double-shoulder tools have a pin located between two shoulders. These double-shoulder tools create high pressure in the weld zone by forcing the parts into a space slightly narrower than their thickness. This method eliminates the need for a solid backing plate that bears the axial force in case of single-shoulder technology. Furthermore, in the case of double-shoulder tools, the problem of insufficient penetration is eliminated and the temperature distribution is symmetrical about the center of the weld zone.

After its invention, FSW has been rapidly introduced in various fields: in marine and rail industries, automotive, aeronautic, aerospace, and fixed structures. Various types of materials are now welded, and composite welds (e.g., Al-Cu or Al-steel) are performed. There are also many variations on the standard FSW process. For example, using a procedure essentially similar to FSW, a method that is comparable to traditional resistance spot welding called Friction Stir Spot Welding (FSSW) has been developed. These two techniques can produce similar punctual welds, for various parts with similar geometry and thickness. To produce a weld, a rotating tool is plunged into the material. The axial motion stops when the shoulder touches the surface of the workpiece, the rotating tool stays there for a short period of dwell, and then it is extracted. FSSW has the benefit of being easy to mechanize with a robot, leading to excellent repeatability and improved weld quality compared to resistance spot welding. Another variation on the standard FSW process is the use of a tool with a retractable pin; this type of tool can be used to mitigate the presence of the hole left behind when the tool is retracted in phase 4. This process can be used to join parts where the presence of a hole at the end of the weld line is unacceptable.

The physical principle of FSW has also been used to improve the microstructure of the workpieces. In this technique, called friction stir processing (FSP), an FSW tool is used to modify the microstructure of the material. The principal improvements made by FSP are as follows:

Creation of very fine microstructures to obtain super plasticity (nanograins can be produced);

Homogenization of the microstructure to reduce segregation, eliminate porosity, and increase mechanical properties, ductility, and corrosion resistance;

Introduction of particles to develop composite surface (metal matrix composite (MMC)) and modify the elasticity, wear resistance, thermal and electrical conductivity, or internal damping of the material.

The local modifications performed by FSP to the microstructure can be very beneficial in a zone of high stress, where a good ductility is needed, or where the fatigue life should be increased.

Numerical simulation of FSW is a popular field of research since the underlying physics is complex and requires the use of advanced multi-physics solvers. There are various numerical methods that can be used to simulate the friction stir-welding process. The finite difference method (FDM) and the finite element method (FEM) have certain applicability for studying the temperature distribution (heat transfer simulations). Lagrangian-based FEM typically will suffer excessive element distortion for processes that occur with large finite strains. The finite volume method (FVM) is also popular for studying the material flow and is strictly an Eulerian approach (cannot follow the evolution of each material point). Arbitrary Lagrangian Eulerian (ALE) is a meshed-based method that includes a material advection of the Lagrangian mesh within an Eulerian mesh. This allows for larger levels of plastic deformation to be studied. However, the method does have certain downfalls. Since the ALE scheme is highly dissipative, this makes simulating long processes (such as FSW) prone to precision error. The method also suffers from advection errors when the material movement is predominately out of the corner of an element (the classic ALE scheme advects material orthogonal to element faces). To date, mesh-free methods such as smoothed particle hydrodynamics (SPH) have shown the most potential to simulate the entire FSW process. Because SPH is meshfree, very large plastic deformation can be simulated without the problem of mesh distortion. Although the SPH method is computationally burdensome, the method can easily be adapted to run in parallel on the graphics processing unit (GPU) to significantly improve the calculation time.

Shi et al. [1] studied the effects of ultrasonic vibration to improve the weld quality using computational fluid dynamics (CFD). They validate their model by comparing predicted temperature and flow for experimental work. They note that the ultrasonic-assisted FSW process provides a larger flow region and allows for faster welding without the presence of defects. Since they use CFD, they are not able to follow the material history (Eulerian frame of reference). Furthermore, they cannot predict residual stresses or defects in the weld zone. Fraser et al. [2] have used FDM to predict the temperature distribution during the full FSW process. They use the results to find the optimal process parameters (based on an optimal temperature). Their method is efficient and was shown to correlate well with experimental work. The model is limited to temperature calculation and cannot be used to predict deformations, stresses, and defects.

Buffa et al. [3–5] used FEM to develop a hybrid model capable of determining the residual stresses in the resulting weld. They split the FSW process simulation into two phases. In the first phase, they model the plunge, dwell, and advance using a rigid viscoplastic model (fluid-based) that does not provide elastic stresses. Then, they switch to an elastic-plastic model to approximately calculate the resulting residual stresses during weld cooldown. They are able to obtain good correlation for the residual stresses. On the downside, their model does not allow for tracking defects since the welding phase is based on a fluid model.

Guerdoux and Fourmont [6] used the ALE method to study the different phases of the entire process. They used an elastoviscoplastic rate and temperature-dependant material model with the Hansel-Spittel rheological model. On the downside, the Hansel-Spittel model requires coefficient fitting from tensile tests and the coefficients are not commonly available. Grujicic et al. [6–8] as well as Chiumenti et al. [7] used ALE to simulate the FSW process and considered the effect of pin shape, contact friction, material and temperature flow. Their models are highly sophisticated, but are not able to predict residual stresses and defects. They noted that the calculation time is many weeks with their approach.

Bohjwani [8] used the SPH method to study the FSW process with the Johnson-Cook constitutive model in LS-DYNA. At the time, it was not possible to perform a coupled thermomechanical SPH simulation. As such, thermal softening is not taken into consideration. Timesli et al. [9] used the SPH method in two dimensions (2D) to simulate the FSW process. They have used the fluid formulation that directly calculated the deviatoric stress from the strain rate and a non-Newtonian viscosity (function of temperature). They showed that their model correlates well to an equivalent CFD model; however, they did not validate the model experimentally. Recently, Pan et al. [10] used the SPH method to solve the fully coupled thermomechanical problem for the FSW process in three dimensions (3D). Their approach gives detailed grain size, hardness, and microstructure evolution using the SPH method. However, they use a fluid-based formulation that does not allow the determination of elastic strains and stresses. Fraser et al. [11–13] have used the SPH method to simulate various FSW processes using a fully coupled thermos-mechanical SPH-FEM model. The tool is modeled with rigid FEMs and the workpieces with SPH. The model is able to predict temperatures, stresses, and defects all within a Lagrangian framework. This approach permits following the material point history throughout the entire welding process. Since the tool is modeled with FEMs, friction contact can be included.

In this chapter, we describe our approach toward simulating the entire FSW process using SPH on the GPU. In Section 2, we explain what SPH is and how the method can be used to solve large plastic deformation problems with an elastic-plastic formulation, including a description of our parallelization strategy on the GPU. Section 3 introduces the simulation model of a complex aluminum alloy joint. The simulation model will be used to show the power of the SPH method. A validation case is presented to show that the model is able to predict tool torque, force, and the temperature distribution, as well as the size and shape of the flash. Finally, Section 4 wraps up the chapter with concluding remarks and an outlook toward the future of FSW simulation.

## 2. Simulation theory

### 2.1. Smoothed particle hydrodynamics

Smoothed particle hydrodynamics is an advanced Lagrangian mesh-free simulation method. The numerical technique has applications in a wide variety of dynamic problems such as astrophysics, magnetohydrodynamics (MHD), computational fluid dynamics, and computational solid mechanics (CSM). The method was originally proposed by two independent research groups within the same year. Gingold and Monaghan [14] showed that the method could be used to simulate nonspherical stars, and Lucy [15] used the method to test the theory of fission for rotating protostars. One of the first groups to apply the SPH method to solid mechanics was Libersky and Petschek [16] in 1991.

What makes the SPH method meshfree is that the set of field equation (conservation equations for a solid body in this case) is solved by interpolation using a kernel, *W*(*r*, *h*), from a set of *j* neighbor particles that are within the influence domain of a particle of interest, *i*. **Figure 3** gives a graphical representation of this concept. In this method, continuous field equations are “weakened” into a set of discrete ordinary differential equations. A continuous function is approximated by an interpolant through the use of a convolution integral:

*h*. The kernel is the key to the SPH method. The continuous SPH interpolation equation can then be written for a set of discrete material points:

*x*_{i} is the spatial location vector for particle *i* and *x*_{j} for the *j*^{th} particle. *m*_{j} and *ρ*_{j} are the mass and density of a *j*^{th} particle and *W*(*r*,*h*), will be written as *W*_{ij} throughout the rest of the paper. The sum is taken over the total number (*N _{i}*) of the

*j*particles within the influence domain of

*i*; these are termed the neighbors of the

*i*

^{th}particle. As a general rule, we will use tensor notation to describe variables in continuum equations and indicial notation for the discrete SPH equations. Subscripts are reserved to indicate the

*i*

^{th}or

*j*

^{th}particle, whereas superscripts follow the general rules of the Einstein notation. For example, the Cauchy stress tensor,

*σ*for the i

_{i}^{αβ}^{th}particle.

Determining the neighbors list is a major part of the computational time in the SPH method. We have developed an efficient adaptive neighbor-searching algorithm (complete details in Fraser [17]). The adaptive search typically cuts the search time in half or better.

In the SPH method, the gradient of a vector function can be shown to be simply the function multiplied by the gradient of the smoothing function:

The evaluation of first derivatives is straightforward in the SPH method through the use of Eq. (3). The gradient of the smoothing function is given by

The smoothing function is typically normalized using the ratio

### 2.2. Coupled thermal mechanics SPH formulation for FSW

In this section, the solid-mechanics formulation of smoothed particle hydrodynamics that is used in this work is outlined. The formulation bears close resemblance to that of a fluid approach; however, the main difference is the ability to account for elastic and plastic deformation. Liu and Liu [22] as well as Violeau [23] provided in-depth development of the SPH conservation equations.

#### 2.2.1. Conservation equations

In order to simulate the FSW process, we must discretize conservation of mass, momentum, and energy using the SPH method previously outlined. We use the weakly compressible approach that is common for large deformation problems (e.g., see [24–27]). Fundamentally, for a system described by particles, mass is inherently conserved at the particle level. It follows then that mass would be conserved for a set of rigid particles (incompressible) that make up a system. On the other hand, for a system made up of non-rigid (compressible) particles, we must take into account the spatial and temporal change of mass, *m*, within an infinitesimal volume. A convenient measure of this change is the local density, *ρ*=*m*/*V*, of an element within the infinitesimal volume. The conservation of mass for a temporally changing compressible system is

where *t* is time and

where *N _{i}* is the number of neighbors of the

*i*

^{th}particle and

*v*=

_{ji}^{β}*v*–

_{j}^{β}*v*. There are other forms of conservation of mass in the SPH method; this form is found to be robust and has the added benefit that it provides improved results for a system with significant spatial variation of density such as in multi-phase problems. The continuum mechanics description of conservation of momentum for a solid body is

_{i}^{β}Equation (8) describes the change in velocity (acceleration) of a material point in a solid body subject to internal forces due to stress,

This version of the momentum equation is commonly called the symmetric form since the pairwise particle interactions are balanced. Moreover, this form exactly conserves linear momentum.

In order to simulate the FSW process, we must take into account the change in energy in the system due to conversion of internal energy (plastic deformation) and frictional heating. The standard energy equation for a weakly compressible body takes on the same form as the heat diffusion equation:

Equation (10) provides the temporal change in temperature, *T*, in a solid body due to the diffusion of thermal energy. *C _{p}* is the heat capacity and

Although frictional heating, convection, and radiation are surface integrals, we have found that these terms can be approximated as volume integrals without any loss of precision for the FSW simulations. In this sense, the heat generation and dissipation take on the following form:

where *h*_{convi} is the coefficient of convection, *A _{i}* is the equivalent surface area of a particle taken to be

*h*

^{2},

*ε*is the emissivity of the workpieces,

_{i}*σ*

_{SB}is the Stefan-Boltzmann constant,

*T*) is the temperature of the ambient air, and

_{∞}*T*

_{surr}is the (average) temperature of the surroundings. The friction heat is distributed into the workpieces (

*i*

^{th}particle) and the tool (

*j*

^{th}finite element) by the

*λ*parameter:

_{i}Certainly, the heat loss and gain at the surface (convection, radiation, and friction heating) can be evaluated accordingly as surface integrals. However, we have found that the added complexity does not lead to improved precision for the FSW models that we have considered. In our experimental work, the surfaces of the workpieces are painted black to improve the quality of the image taken with an infrared camera. Note that for unpainted aluminum, the emissivity is very low (often less than 0.1); however, for a painted plate, the emissivity is ~0.95. Because of this, radiation effects are significant and should not be disregarded in the energy balance.

#### 2.2.2. Stress and strain in SPH

The stress state can be updated in the material using a frame-indifferent objective stress rate equation. There are many different stress rate equations that can be used such as Truesdell, Green-Nahgdi, or the Jaumann rate equation (others exist). The Jaumann rate has a relatively simple formulation, thus making it unassuming to implement in a CSM code. The rate equation is

*G* is the shear modulus of the material,

The strain rate is found from

The *β*_{expand} is the coefficient of volumetric expansion of the material. The SPH form of the spin rate is

As the SPH method used is that of a weakly compressible approach, an equation of state is required to link the pressure, *p*, to the density, and speed of sound, *c*:

Plasticity is included in the simulation by using an elastic-perfectly-plastic–thermal-softening flow stress model of the form

Here, *T*_{R} and *T*_{m} are the room and melt temperature, respectively. *m* is the thermal-softening exponent. Plasticity is accounted for using the radial return algorithm (see [29–31] for further details).

### 2.3. Parallelization strategy on the GPU

Many types of engineering simulations require a large amount of computational time due to the complexity of the numerical model and/or the sheer size of the computational domain. In the case of friction stir welding, capturing all the aspect of the process requires a multi-physics approach that is very computationally burdensome. A typical FSW simulation can take many days or even weeks running on a single processing unit (sequential approach). For this reason, it is critical to be able to find an efficient means to run the simulation code in parallel. The idea is to split the domain into subregions and assigns them to individual processing units.

There are a number of different parallelization strategies that can be used. A popular method for small- to medium-sized models is to use a shared memory parallel (SMP) approach wherein each processor has its own set of tasks, but the processors share memory. In this sense, all the simulation data are stored in a common memory location. OpenMP is a very common directives-based programming language that can be used for SMP codes running on central processing units (CPUs).

For larger models, a different tactic is often employed with large number of CPUs, whereby the model and the data in memory are split up and assigned to individual compute “nodes.” This approach is called distributed-memory parallel and requires the individual compute “nodes” to be linked by a network. A message-passing interface (MPI) can be used to provide the communication.

Another parallelization strategy that has become very popular is to use the graphics processing unit. Today’s GPUs have hundreds and in most cases thousands of “cores”. **Figure 4** shows a schematic of the architecture of a typical GPU. We can see that each multiprocessor is composed of a large number of “thread processors”. The GPU has its own memory called global memory that is accessed by all the multiprocessors. For this reason, as much as possible of the code should be programmed on the GPU to limit the amount of data transfer between the CPU and GPU.

In the case of simulating the FSW process with SPH, the GPU is ideally suited for parallelization. The large number of streaming multiprocessors on a GPU is perfect for the computationally heavy nature of SPH. SPH codes written to take advantage of the GPU can typically achieve speedup factors of 20–100× over an equivalent serial CPU (e.g., see Dalrymple et al. [34]). In some cases, speedup factors of over 150× are possible, although these are typically problems that are set up to fully exploit the architecture of a specific GPU.

Our parallelization strategy for the SPH code on the GPU is to assign each particle to a thread processor. In this sense, a thread will then carry out a set of calculations for a single particle. The number of threads that can run in parallel is hardware and code specific, but is typically in the multi-thousand range. Certainly, there are different parallelization strategies for SPH on the GPU; however, we have found this approach to be straightforward and efficient.

## 3. Simulation of a complex FSW joint

To date, most of the work on simulating the FSW process has been focused on a simple butt-joint geometry model. Such a model is sufficient for academic research. However, for real engineering applications, the numerical model should be robust enough to be able to simulate complex geometries within a reasonable timeframe. In this section, we describe the FSW simulation model and results for a complex geometry. The case considered is of an aluminum alloy bridge deck that is fabricated by extrusion in multiple sections and joined using FSW. The joint geometry can be seen in **Figure 5**. One of the drawbacks of using extruded sections is that the parts tend to fit together with some undesirable qualities for FSW. In this case, the two workpieces join together with a ~0.5-mm step at the top surface of the joint (as shown in **Figure 5**). The left-most workpiece is slightly thicker than the other, and, as such, poses a challenge for FSW. The tool will have to push down an extra 0.5 mm in order to come into contact with the lower of the two surfaces. This in turn causes the formation of a significant flash on the thicker workpiece. The overall height of the joint is 100 mm, the three vertical members are 3 mm thick, the thicker plate (left side of step in image) is 3.7 mm thick, and the thinner plate is 3.2 mm thick.

### 3.1. Model description

The complex joint geometry is modeled by a combination of SPH for the workpieces and rigid finite elements for the tool. Since the tool is made of hardened steel, it can safely be approximated as a rigid body. The simulation model is shown in **Figure 6**; here, we can see the rigid tool and the two workpieces including the step at the top surface. The mesh size for the finite elements is 0.6 mm in the pin and shoulder region. Large elements are used outside of this region since contact with the workpieces is only during flash formation.

The entire joint geometry is modeled with elastic-plastic-thermal SPH elements to allow for an improved prediction of the thermal expansion and the stresses in the joint during the welding process. The vertical member below the weld seam carries 90% of the forge force during the welding process. With our modeling approach, the stresses and the possibility that the member could collapse can be evaluated. The tool interacts with the workpieces through a penalty-based contact algorithm that we have developed for FSW (full details in Fraser et al. [35]). The tool has a shoulder diameter of 15 mm, an average pin diameter of 6 mm, and a pin depth of 3.8 mm. The simulation model is composed of only a small region of interest of the actual bridge deck. Convection (10 W/m^{2}K) is included in the model as well as radiation (the surface of the workpieces was painted black, an emissivity of 0.95) using a novel adaptive thermal boundary condition algorithm (see Fraser et al. [36]). The material parameters of the aluminum alloy used in the simulation are shown in **Table 1**.

We have used a uniform grid particle distribution of 0.6 mm to discretize the workpieces. This spacing allows for a sufficient number of particles through the thickness without incurring excessive calculation penalty. The time step size is selected based on the Courant-Friedrichs-Lewy (CFL) criteria, *dt*_{min} = CFL[*h*/(*v*_{max}+*c*)]. For this FSW model, we found that CFL = 0.7 was acceptable, leading to *dt*_{min} = 9.8 × 10^{-8}). The small time step size is one of the major drawbacks of using a solid-mechanics approach. Nevertheless, the time step size is required in order to capture the propagation of elastic stresses within the aluminum.

The model is run as two distinct phases: plunge and advance. The dwell phase was not part of the process as a ramp-up procedure to full advance speed was used in the experiment. A well-defined ramp-up is good practice to limit the forces and torque on the tool and can replace the dwell phase. The plunge speed is 25 mm/min and the full advance speed is 1250 mm/min with 2100 rpm. The ramp-up is performed linearly for an initial tool displacement of 40 mm; after this point, the tool speed is constant at 1250 mm/min.

Because of the 0.5-mm step, excessive amounts of flash are produced as the tool advances. The flash has to be removed following the welding phase and requires a significant amount of work for the welding technician. In order to attempt to reduce the quantity of flash produced, we investigate three cases as follows:

Case 1- As performed in experiment—Full depth plunge (4.3 mm) until the tool shoulder contacts the lower workpiece surface with a counterclockwise tool rotation. This simulation case uses the same process parameters as the production run. The model serves as the validation case using temperature, force, torque, and flash height.

Case 2- Variation 1—Partial depth plunge (4.2 mm) with a counterclockwise tool rotation.

Case 3- Variation 2—Full depth plunge (4.3 mm) until the tool shoulder contacts the lower workpiece surface with a clockwise tool rotation.

Case 1 represents the actual process parameters used in the experiment. This case is used to validate the tool force and torque, as well as the temperature distribution and history. Cases 2 and 3 are variations on case 1. In case 2, we attempt to reduce the quantity of flash by plunging less (4.2 as opposed to 4.3 mm). This will have the effect of limiting the volume of material that is sheared off the top surface of the thicker plate. In case 3, the flash formation will be reduced by operating the FSW tool with a clockwise rotation. This results in the advancing side being on the surface of the thicker plate. This will increase the weld temperature and help to move more material to the lower side of the step, ultimately creating a superior weld compared to cases 1 and 2.

### 3.2. Simulation results

The three cases were run in SPHriction-3D; in this section, we present the results from the three different cases. The production process parameters correspond to case 1 and are used to validate the model. A video of the results for the three cases is available here: https://www.youtube.com/watch?v=eLOQILkUx-A.

The temperature distribution results for the three cases are shown in **Figure 7** at different times during the simulation. We can see that the maximum temperature for case 2 is lower than for the other two cases. This is because the tool plunges 0.1 mm less, in turn decreasing the forge force and the heat generated due to Eq. (12). The ultimate result is that the quality of the weld in case 2 is significantly lower than in the other cases. Of the three cases, the best weld quality is obtained from case 3. Since the tool rotates clockwise, the advancing side is on the surface of the thicker workpiece. This helps to move the hot material to the thinner workpiece at the front of the tool. This is a favorable situation compared to having the hot material move around the back of the tool (as in cases 1 and 2). We have every advantage to have the advancing side on the thicker workpiece since the pressure is higher there. This causes the workpieces to heat up more uniformly than is possible in either case 1 or 2.

We have used four measurement points (TCs) for the temperature histories as shown in **Figure 8**. TC1 and TC2 are placed at the middle of the workpiece (along the weld direction). TC3 and TC4 are placed in line with the tool axis during the plunge phase. The four TCs are at the surface of the workpieces and located 11.5 mm from the interface of the two workpieces. MTC1 is a moving temperature measurement point that is located on the underside of the tool and follows the tool as it rotates and advances. MTC1 is located 6 mm from the tool axis on the underside of the tool shoulder.

The temperature was measured experimentally at two points on the surface of the thicker workpiece (at locations TC1 and TC3) using data obtained from an infrared camera (IRcam). Due to the filming angle available with the IRcam (restricted access to work area), temperatures on the thinner workpiece could not be evaluated. **Figure 9** shows that there is a good agreement between the experimental and simulation results. The simulation model has a tendency to slightly overpredict the temperature. Since we have used the perfectly-plastic-thermal-softening model presented in Eq. (19), there is an overprediction of the plastic deformation and in turn an increase in the heat generated as shown in Eq. (12). Furthermore, the heat capacity and thermal conductivity of the aluminum alloy at high temperature are not known. These parameters play an important role in the coupled thermal-mechanical model.

The relative difference between the TCs on the thicker and thinner plates gives a good means of diagnosing the quality of the weld. If there is a large difference in the temperature reading, we can conclude that the pressure is higher on one side of the weld than the other. This leads to an unfavorable temperature distribution and the weld quality suffers. Case 2 is an excellent example of such a situation. Notice the large difference in temperature in TC3 and TC4. Since the plunge depth was insufficient, there is not enough pressure on the thinner plate, leading to a decrease in temperature.

The temperature results at MTC1 are also an excellent indication of the weld quality. Since MTC1 follows the tool as it rotates and advances, large temperature fluctuations are suggestive of inadequate process parameters. We can see that the variation in temperature at MTC1 for case 3 is significantly less than in other two cases. The experimental setup that we used did not allow us to embed thermocouples in the workpiece or in the tool. Using an IRcam is beneficial in cases such as this since holes do not need to be drilled in the aluminum or in the tool. The surfaces to be filled should be painted a light coat of flat black paint that can easily be removed with light buffing following welding. Temperature measurements with an IRcam provide a very powerful diagnosis tool in the laboratory or in the hands of an FSW technician at a commercial company. The images obtained can help the technician or an engineer to understand whether their chosen process parameters are adequate and if not give good hints as to why. For example, if the IRcam shows a significantly higher surface temperature on the advancing side than the retreating side, the tool is likely advancing too fast for the chosen rpm. During the plunge phase, the IRcam can again be used to determine whether the plunge speed is too high (surface temperature too low) or low (surface temperature too high) (**Figure 10**).

We can conclude that the weld zone has a more uniform temperature distribution. This leads to favorable welding conditions and results in improved weld quality. Of particular interest is the strong oscillation at MTC1 for case 2. Near the end of the simulation, there is a peak-to-peak temperature change of over 300°C. The temperature on the thinner plate is too low to allow the aluminum material to flow and the weld is essentially incomplete. This can be verified by investigating the plastic strain contours in the weld zone as shown in **Figure 12**. We can see that case 3 is the only one of the three in which the mechanically effected zone spans the entire diameter of the tool. In case 1, the welded zone gets narrower as the tool advances. For case 2, the welded zone spans no more than half the tool diameter from the edge of the tool pin on the thinner plate into the thicker plate.

A comparison of the spindle torque and the forge force is shown in **Figure 10**. The inertia of the spindle plays a strong role in the experimentally measured torque. Because the plates being welded are very thin, the maximum process torque does not exceed 25 Nm and the average torque during the advancing phase is ~20 Nm. However, the no-load torque measured was ~10 Nm, accounting for almost half of the typical process torque. In the simulation models, the inertial effects of the spindle are not taken into consideration. The simulation torque is calculated by taking the cross-product of the contact forces and the distance vector between the tool axis and an SPH element. For this reason, the torque trends line up well with the experimental data, though the magnitude is diffident (**Figure 11**).

A good correlation between the forge force from experiment and simulation was obtained. The inertial effects do not play an important role here, leading to a better prediction than was obtained with the torque. There are other factors that lead to a reduction in the precision of the predicted torque and forge force such as the thermophysical properties of the material, the chosen friction law, differences in how the FSW machine and simulation model control the position, and rpm of the tool, as well as discrepancies between the actual geometry of the workpieces and the tool compared to their idealization in the simulation model.

Nevertheless, the simulation model provides an excellent understanding of how a change in process parameters effects the torque and forces. We can see that the tool torque and forge force for case 2 are lower than those for cases 1 and 3. This is an intuitive result as the plunge depth is shallower, leading to less contact pressure (**Figure 12**).

A flash height of 4.2 mm was measured experimentally; case 1 predicts a flash height of 4.5 mm, 3.9 mm for case 2, and less than 1 mm for case 3. The flash heights are shown in **Figure 11**; notice that the wavy pattern of the flash is well represented in the simulation model for case 1. The images have been cleaned up to show only the continuous flash line by omitting sporadic “flash flakes” that typically do not require much effort to remove. Clearly, the flash produced in case 3 is significantly less than that in other two cases. The reason is entirely due to the change in tool rotation. Flash lines will most commonly be laid down on the retreating side of the weld. By ensuring that the advancing side is on the thicker side, the material is “ripped” from the thicker side and transported to the retreating side. Because of the height change, the flash is not able to attach to the thinner side and creates intermittent “flakes” that can be removed in less time than is possible in the case of a continuous flash line on the thicker side (as in cases 1 and 2).

## 4. Conclusions

In this work, we have presented our approach toward simulating the entire FSW process using a solid-mechanics approach. By using a mesh-free numerical method such as SPH, the large plastic deformation encountered during FSW can be easily calculated. Mesh-based methods struggle to capture all the physics of the process due to discretization errors as the mesh distorts. The fully coupled elastic-plastic-thermal code is able to predict temperature, stress, and deformation histories. Because of the mesh-free Lagrangian nature of SPH, the model is able to predict defects (free surface changes) in a way that other numerical methods cannot. The prediction of defects is an invaluable feature for an engineer working on the design of the joint geometry to be welded. Optimal process parameters can then be chosen that lead to no-weld defects. In this manner, the design engineer can find the fastest rate of advance that can be used to increase the overall profit margin during a high-volume production run.

One of the major advantages of using a solid-mechanics approach compared to a fluid approach is that the simulation models are able to capture the elastic stresses and strains. **Figure 13** shows the effective stress in the joint at the end of the plunge phase. This is the point when the forge force reaches its maximum value. This is of great interest to a joint designer who is interested to know if the joint will withstand the forge force during the welding process. If the vertical members under the weld seam are too thin, they will likely undergo significant plastification and could collapse. This certainly would be disastrous for the finished product. Other benefits of including the elastic stresses and strain are the ability to more precisely predict defect size and shape, as well as residual stresses and deformation following a cooldown phase.

Looking toward the future of numerical simulation of FSW, we can see that as the performance of GPUs continues to improve, larger and more complex simulation models will be possible. We are currently working on a multi-GPU parallelization strategy that will allow tens and even hundreds of millions of SPH elements to be simulated. This approach requires the use of a highly optimized communication strategy between the GPUs (e.g., using MPI). We are currently working on various developments in the code, such as follows:

Improved contact models with different friction treatments (such as including the shear limit and/or a stick-slip behavior);

Wear prediction at the surface of the tool using Archard’s model;

Improved thermo-physical material representations that more accurately model the behavior of the aluminum alloy during the FSW process;

An implicit mesh-free collocation approach that will permit efficient simulation of long duration phases such as cooling.

Since the simulation code is developed using a highly optimized parallel-processing strategy, complex 3D-joint geometries can be simulated within a reasonable period. In this work, the three simulation models were run simultaneously on a personal workstation with three individual GPUs. The cost of such a computer is less than five thousand dollars in today’s market. Because of the parallel strategy, a cluster with many GPUs can be used with 100% efficiency (as long as an individual GPU has enough memory for each simulation model). In the sense of optimization, a company with access to a GPU compute cluster (say eight or more GPUs) could run parametric models (e.g., varying the rpm and advance speed) simultaneously. The obtained data sets would provide the required information to construct a response surface and find the optimal advancing speed and rpm.