General software structure.
The reactor core system consists of many materials, involving multi-physics processes, and can be analyzed and simulated at multi-scales. With the evolution of cluster computer, traditional programs and models could be translated into new program structure and modified in detail, so that more complex problems can be solved. Based on existing theory, programs of sub-channels analysis, two-dimensional (2D) method of characteristic (MOC), fuel temperature approximation, and three-dimensional (3D) discrete ordinate method (SN) are developed and analyzed. The different approach is that the reusable software structure of core calculation is established, with more well-defined storage of nuclear data, control layers, and more effective parallel algorithm for computation. The features of parallel algorithm for these programs are listed succinctly in the discussion. Additionally, the corresponding testing on parallel algorithm and computing results are given.
- reactor core
- modeling and simulation
- nuclear data
- software structure
There are various heterogeneous phenomena in reactor core, which are related to multi-physics and multi-scales modeling and simulation . The core program or software is the important approximation for the reactor core; therefore, mathematics models, numerical algorithms, preprocessing, post-processing, and visualization are sometimes managed in the similar software structure . Based on the existing conditions of calculation, high-performance-computing technology is an important mean to accomplish the simulation of core programs by the cluster computer. For instance, there are several supercomputers (cluster) in ANL and ORNL that provide high-capacity resources for reactor engineering calculation and nuclear-related simulation . Once the numerical algorithm is fixed, the time-to-solution becomes the necessary consideration during the research, and the approach pattern can be summed up in three steps  in this field:
Develop the single discipline program or multidisciplinary coupling code for the reactor core, which enable higher accuracy.
Propose and implement parallel algorithm for the program.
Use parallel hardware to run and carry out numerical experiments within acceptable requirements, and then match the reference solution or experimental data.
According to the modeling and simulation for reactor and the features of multi-physics and multi-scale models, the reactor calculation is classified into core calculation and out-core calculation . The core calculation covers considerable subjects, for instance, reactor physics, thermal-hydraulics, dynamics, and fuel performance, for example. For the purpose of research and engineering, different discrete systems and solvers are implemented so as to reveal different phenomena in the core [6, 7, 8]. As an example, the DENOVO transport software of ORNL utilizes 160,000 processors to run the benchmark core example in Jaguar supercomputer; up to now the computer has been upgraded to TITAN .
On the one hand, the simulation could be accelerated through effective parallelization in different models . On the other hand, the parallel computing makes researchers focus on more research fields except mathematics and physics , which constructs one view from nuclear data to top-level application. With the attempt of the M&S content, the core software could be ascribed as the expression and translation of the set of state variables and data structure . Thus, the conjunction of the core problem and parallel computing is from the new requirements of application; furthermore it is driven by the underlying hardware innovation. The Berkeley Parallel Computing Laboratory reviews parallel computing and points out that the parallel elements of different algorithms are the methodology of application driven and reuse. Since various features exist in different layers of software, the parallel algorithm should be designed specifically so as to adapt the parallel hardware . The chapter describes and explains the software structure, underlying nuclear data, and parallel algorithm from a systematic view in the domain of reactor core calculation. Section 2 abstracts the software structure for core calculation. Then, Section 3 explains the basic elements of nuclear data and its association according to the software structure. After that, Section 4 lists the concept and steps of parallel algorithm analysis for core problem in practice. Sample programs are analyzed and discussed concisely in Section 4, and conclusions are given in Section 5.
2. Software structure
As the complex system, the reactor core could be understood by various parts, such as concept models, numerical methods, model uncertainty, and model reuse. Algorithms and software are the center of digital simulation so that the calculation rule is regulated by the program technologies. Sometimes the fixed software structure could be abstracted, and common algorithms could be extracted to be close to the trend of reactor core simulation, which forms one united manner (also known as framework method ).
|Model layer||The model code|
|Model framework layer||Collection of models for a single application area|
|Simulator layer||Provides the paradigm for simulation and synchronization usage|
|Component federation layer||Provides interface code to create one model with multiple sub-models|
|Load management layer||Within one parallel model execution, collects run information|
|Ensemble layer||Runs instances in a job and handles scheduling|
|Operating system/job scheduler layer||Runs independent jobs in parallel|
Then, the numerical calculation system is abstracted into multiple layers, which are implemented through controls, modularity, and dynamic interaction. So participants from different professional backgrounds pay more attention to their own business . The core calculation learns from the above contents, and the software structure could be simplified in Table 2.
|Numerical framework||Provides basic arithmetic function and express data||PETSc, Trilinos, HDF5|
NetCDF, BLAS, MPI
|Discrete model||The single component with the properties of time, spacing, parallelism, and synchronization||Structured grid|
|Component combination||Provides interface code to create model of multiple components||Geometry control|
Neutron transport solver
|Algorithm analysis||Schedule the model in parallel, and process the run data and obtain running information||TAU|
It is effective to develop numerical software based on reusable components or framework, so the discrete model could be modeled through fixed process. On the one hand, the usage leads to reuse. On the other hand, new programming pattern could be merged. For instance, with the help of the underlying framework, MC neutron transport combines the discrete model and algorithm analysis process and then uses numerical components and structure to accomplish the transport task easily in the reactor core . Since this integrated software structure method restricts the programming manner of the core calculation and provides reuse mechanism, then in practical, the prototype software NAC4R is designed and constructed that contains basic data operation, basic algebra operation and linear system solver, and so on. As an example of the above data, NAC4R can be used for the parallel analysis process of sample programs that explains the effectiveness and efficiency of the corresponding software structure.
3. Nuclear data process
The underlying nuclear data is the basis of core calculation and also the fundamental elements in the proposed software structure. The reactor physics data is divided into four categories by A. Trkov, namely, (1) basic nuclear data, (2) evaluated nuclear data files, (3) processed nuclear data, and (4) averaged nuclear data . They are the similar data in the thermal-hydraulics analysis, which correspond to the physical property information of the fluid, such as the light water property data in the core calculation . These different types of data involve representation, storage, and computer algorithms. For example, the cross-sectional library of deterministic reactor physics calculation needs group-wise data, such as few-group homogenized cross section in diffusion calculation, with the affection of multiple state variables that include temperature, density, and so on . The class of used nuclear data can be directly computed in the core calculation or be generated in advance. The pre-generated nuclear data (database) is commonly used in a two-step method of core calculation, one is multidimensional interpolation table, the other is parameterized library, and they have different effects on computation and parallel algorithm .
3.1 Multidimensional interpolation table
The table is the indirect format of cross-sectional parameters, with the grids inside the value range for each state variable. The core calculation uses interpolation algorithm to obtain the nuclear data.
Assume there is only one state variable (such as coolant density), and its value range is , then the grid of a single state variable is divided as in Eq. (1):
where is not necessarily uniform distribution. The corresponding parameters at each discrete grid can be obtained by the lattice program as follows in Eq. (2):
In this way, the continuous parameter values can be calculated by polynomial interpolation method in the range of state variable. The coefficients of interpolation polynomials are stored in the table:
3.2 Parameterized library
The library method is the complex function model of nuclear data that is related to various state variables. To build an accurate model, many factors need to be considered, and the range of each state variable is also defined separately:
Generally, the tabulation method uses space for time, and the method is relatively simple, but the storage mode of nuclear data has a greater impact on the parallel algorithm of the core calculation. The method consumes more computing time, but the data storage is smaller, and it is easy to improve and modify the model so that there is less impact on the parallel algorithm.
The software implementation of the nuclear data corresponds to the model layer and model framework layer mentioned in Section 2. Figure 1 shows two abstract forms of the nuclear data. The left side that represents the processing of each kind of nuclear data is independent, and the right side explains the design of consistent data attributes, and formats are independent underneath and dependent on top in the fixed software structure so that each type of nuclear data only needs format conversion and minor programming. This understanding of the underlying nuclear database focuses on software reuse and performance, which can fully reuse various known nuclear data that have fine readability and correctness. This association also enables independent design and programming of core calculation at all levels, which makes the following parallel algorithm analysis match it.
4. Parallel algorithm analysis
Amdahl law stipulates the speedup ratio limit for parallelism, and Gustafson law quantifies the acceleration effect . To be equivalent to Amdahl and Gustafson, measurability of calculation models the performance. Assume that the running time of one computing problem is , which represents the running time for the scale of problem . Then the running efficiency could be utilized to prove the effectiveness of parallel core program from two categories.
4.1 Strong measurability
The change of computing time varies with the increase of processor cores, also is known as speedup ratio:
4.2 Weak measurability
The problem size of each processor or process is fixed, and then the statistic of the running time varies as the processors change:
The deterministic calculation of the reactor core mainly makes the as the variation of processor cores. In practical deterministic core, computation has features that the problem scale is related to the physical form. For instance, when the characteristic rays are fixed on every processor, the MOC transport would increase new characteristic rays and achieve new arrangement if the processors are added. The data expression and computational content may change with different program patterns. On the contrary, nondeterministic calculation of reactor core, such as the MC method, the computational features would remain unchanged so that the problem takes more emphasis on the statistics of weak measurability.
The steps proposed by Ian Foster is the most typical method for the parallel algorithm design and programming  (also known as PCAM method).
Partition identifies computing content that can be executed in parallel.
Communication determines the communication data of each parallel task.
Aggregation merges and adjusts parallel tasks and communication.
Mapping assigns the aggregated task to entities on which actual parallel programming techniques depend (process or thread).
At present, the cluster computers are often in heterogeneous development environment, and parallel algorithms are divided into many kinds such as pipeline parallelism, mater-slave parallelism, and so on. To facilitate engineering specifications, the usually analysis steps of parallel algorithm are divided into two cases in Table 3.
|Designed and implemented algorithm||Algorithm is not designed and implemented|
|Working step||S1: executes and obtains the statistical parameters of parallel performance on a fixed running platform|
S2: judges whether the requirements are met. If not, go to step 3. Otherwise, ends the analysis
S3: designs or improves the existing parallel algorithm and then goes to step 1
|S1: executes and obtains the statistical parameters of parallel performance on a fixed running platform|
S2: identifies the parallel features of the algorithm
S3: determines whether the features have reusable patterns in the software structure (here refers to NAC4R). If so, reuses them directly, turns to step 5, if not, turns to step 4
S4: the new parallel algorithm is designed and programmed
S5: runs on the fixed platform and obtains the statistical parameters
S6: judges whether the requirements are met. If not, goes to step 3. Otherwise, ends the analysis and merges the new parallel algorithm into NAC4R as a reusable pattern
It can be concluded that the parallel analysis includes interpretation, design, and performance verification. The parallel analysis in reactor core calculation summarized here emphasizes to meet the application requirements, and then the common features of the algorithm are usually abstracted and recorded concisely, so that the analysis can be understood clearly and simply.
5. Samples research
Interconnections exist in the multi-physics process and software structure in reactor core. For solving according to physics-mathematics model, various core software has been developed. There are two main ideas for developing these procedures. One is to make full use of the existing programs in the world. The other is to redesign and program the program structure and function. The key parallel algorithms of single discipline sample program CORTH, KYLIN2, K-MOD, and Hydra are identified and programmed, which come from the typical examples of the reactor core system, computing work through the research career of the author.
5.1 Sub-channels analysis
The CORTH program utilizes the uniform flow model of four equations to process two-phase flow in the reactor core. The energy conservative equation could be abbreviated as the manner of structured grids, which is easy to use in finite difference and form the linear system. For the description, CORTH could be divided as components in Table 4.
|Preprocessing||Reads external input and translates into data structure in memory|
|Pre-computation||Calculates some parameters and provides initial values for iteration|
|Iterative control||Controls and manages cyclic calculation|
|Coefficient update||Updates parameters of thermal-hydraulics|
|Linear system solve||Defines linear system and its computation|
|Mesh matching||Maps the sub-channel, meshes, and processes|
The features of the parallel algorithm are listed summarily to make the program refinement practical.
The reactor core is divided along the radial direction so that each process stores the data of some sub-channels. The overlap method could also be used to reduce data communication.
The solution of linear system solving can be quick in the model. The coefficient matrix of the conservative system is symmetric, and it can be constructed simultaneously together with the right-hand vector of the system.
The PETSc and Aztec parallel solver can be ported and optimized inside NAC4R so that solvers could be used and hybrid programmed directly and easily.
The bottleneck of the parallel algorithm design is related to the components of the coefficient update, linear system solve, and mesh matching. Parallel scheme was realized in NAC4R with the features mentioned above. The ACP1000 model (177 fuel assemblies, there are 264 fuels and 25 guide tubes in each assembly, axial segments are 54, and there are 50868 sub-channels in radial direction) is used for performance validation. Cluster processors are chosen as 4, 6, 18, 27, 36, 54, and 108, which is the divisor of 50868 as the Aztec solver needs that the number of processes matches the tasks. The expected performance of strong measurability is improved by more than five times.
Performance conclusion: The trend of processor change shows that using about 30 processors, it has met the expectation in Figure 2, with the speedup ration about eight times. Not only the conservation system can be constructed in parallel, but the Aztec solver can also solve the matrix with the size of 50868 efficiently.
5.2 2D MOC
The KYLIN2 program carries out the 2D neutron transport calculation by MOC over the assembly region. In the condition of steady situation, multigroup neutron transport equation could be expressed by the ODE along the characteristic ray. We can obtain the direct analytical solution of the ODE as in Eq. (7):
where and denote the macro-transport cross section and the neutron source, respectively. The length is the segment that is truncated with the position m-th azimuth angle, k-th characteristic ray, and the i-th mesh. When source iterative method is used, the iterative procedure will repeat the above calculation process after assigning initial guess values to the flux. As shown in Table 5, sample program KYLIN2 could be abstracted into different parts.
|Geometry description||Discrete result of 2D geometry|
|Nuclear data||Stores cross-sectional data with meshes|
|Characteristic rays||Generates and manages rays|
|Sweep computation||Traverses rays and solves angular neutron flux|
|Source update||Updates the transport source|
|Task schedule||Controls the computational flow, such as BSP parallel model|
Sweep computation follows the loop structure, with each characteristic ray, neutron groups, azimuth angles, and polar angles. Moreover, the intermediate solution of KYLIN2 is related to the angular direction so that the structure can be described as follows.
for (g, ray, p) in (Groups, Rays, Polars):
current computeForwardCurrent(g, ray, p)
mesh_angular_flux sweepForward(g, ray, p, current)
current computeBackwardCurrent(g, ray, p)
mesh_angular_flux sweepBackward(g, ray, p, current)
The above components are redeveloped in NAC4R, and the features of the algorithm are listed here:
Iterate that each ray could be in parallel if some critical operations are added, so both MPI and OpenMP can be used for the sweep process.
If rays are non-modular arranged, the differences of working load of each ray are not obvious. Since the assembly problem sometimes uses the nuclear data with some energy groups, parallelism on neutron groups could be a better choice, and OpenMP can avoid data communication during the sweep procedure.
The callings of exponential function will affect the efficiency.
In order to adapt coarse mesh acceleration, reserved interfaces should be kept inside NAC4R so that parallel linear system solvers are used.
Here OpenMP guide sentences are utilized to implement parallelism based on loop structure. The critical operation exists in the cumulative sum of forward and backward sweep when each ray is accessed. C5G7-MOX assembly and ACP1000-UO2 assembly examples are used to test, and more than five times better performance are expected. The statistical calculation time is shown in Table 6.
|Example||One core||Four cores||Eight cores||Speedup ratio|
|C5G7-MOX||1655.28 s||562.20 s||295.01 s||5.61|
|ACP1000-UO2||19515.38 s||5091.92 s||2581.34 s||7.56|
Performance conclusion: OpenMP parallelism (multi-threads) can achieve the rational accelerated effect and meet the expectation. There are more speedup ratios for the ACP1000-UO2 as the neutron group parallelism strategy is used, and some fluctuation exists in the performance gain of shared memory parallelism since there are some management overheads by multi-threads.
5.3 Fuel temperature approximation
The calculation of the fuel temperature usually involves the temperature of the outer surface and the center, and then the effective temperature of the fuel could be obtained for rod-like geometry. The geometry is divided into different sections to solve the heat conduction equation in radial direction, with different approximations are taken account. The fuel rod of PWR is usually delimited as three sections in some programs, and the 1D conduction equation is constructed at each area. After knowing the boundary temperature, the Rowland’s equation could be used to compute the effective temperature.
Fuel temperature approximation is necessary for the reactor core coupling calculation, but reasonable boundaries come from other discipline procedure. K-MOD coupling program has been realized, and according to the above description, fuel temperature approximation could be abstracted in Table 7.
|Thermal feedback controller||Main control component that processes computation in each discrete time step|
|Fuel temperature calculation||Solves the conduction equation by iterative method and obtains the temperature|
|Cross-sectional update interface||Data exchange interface for the temperature and cross section|
Thermal feedback controller carries out the function of management and as an external module that will be called repeatedly in K-MOD. The features of the parallel algorithm are less obvious and can be listed in two points: (1) the computing granularity is small and the single cyclic time is short and (2) the computing pattern could be abstracted and reused if lower level parallel method is researched; thus, SIMD technology is suitable for programming.
In the sample program K-MOD, fuel temperature calculation could be abstracted into three classes, such as op_basic represents the basic arithmetic, but op_math stands for the compound operation, such as square, logarithm, and so forth.
|compile option||icpc -O0 -march = native -mno-sse|
|compile option with SIMD||icpc -O0 -xsse2|
The former deals with integer data, while the latter two deal with floating-point operations. The problem of rod inserting and lifting in real PWR core is tested and compared. The similar imitated structure solves 50,000 times the same as the K-MOD program, with the SIMD hand optimization, which is based on data structure named __m128i and __m128. The time results are shown in Table 8.
|First class||Second class||Third class|
|Serial||93 ms||89 ms||367 ms|
|SIMD parallelism||112 ms||135 ms||152 ms|
Performance conclusion: The statistical results of multiple runs show that SIMD scheme has performance bonus for the third class of calculation content; however, other classes get no gain. Looking up Intel-related documents, it is found that SIMD programming depends on the specific hardware and instruction set usage and depends on the compiler behavior, which is not suitable for manual SIMD rewriting with care in practice.
5.4 3D SN
The sweep operation of discrete ordinate method iterates every angle direction in each octant, which has famous KBA parallel sweep algorithm for structured grids. 3D SN method translates Boltzmann transport equation into the matrix system of flux moment as in Eqs. (8) and (9) in the sample program Hydra:
where and denote the source term and the difference operator, respectively. Discrete operator of flux moment matrix need to multiply the cross-sectional data, such as the scattering matrix . The total meshes are in each angle direction, and then every section that are divided could be mapped into corresponding processes in parallel. According to the above description, SN transport calculation could be abstracted as in Table 9.
|Geometry storage||Manages structured meshes in XYZ coordinate system|
|Nuclear data||Records cross-sectional information|
|Operator process||Constructs mapping between the operator and other parameters|
|Sweep computation||Matrix operation by loop structures|
|Source update||Updates the transport source|
There are multiple sweeping levels in the SN method, and the section could be represented by the loop structure in every octant, and the sweeping procedure can be described as follows:
for k in K:
for (i, j) in ():
Since there is a lack of parallel programming for this loop structure in Hydra, the features of the algorithm are listed.
The extrapolation model, such as the diamond weight difference model, has an independent subroutine , which is only related to mesh . Flux moment variables and depend on the calculation order, so the sweep is one pipeline structure on the .
When the scale of loop structure is small, shared memory parallelism is fit for pipeline structure to improve efficiency, which has theoretical gain .
According to the identified features, OpenMP guide sentences are utilized, and lock array done [I][J] is used to ensure data consistency.
|#pragma omp parallel for|
for j in d_begin[J]:d_end[J]
for i in d_begin[I]:d_end[I]
done[next(j)][i] = 1
Table 10 collects the running time of two benchmark example, which utilizes the optimization level -O0, and only one MPI process is fixed during the experiment.
|Benchmark||NAC4R (one thread)||NAC4R (eight threads)|
|KUCA reactor||3421.7 s||1935.8 s|
|IAEA pool reactor||5824.9 s||3551.3 s|
Performance conclusion: The speedup ratio is almost 1.8 and 1.6, which illuminates that the pipeline structure can gain performance bonus with small scale for the SN method, such as the sample program Hydra. It can be further expanded to the parallel strategy of MPI + OpenMP.
Digital computers are used to simulate and analyze reactor core. For introducing parallel computing and promoting the efficiency, the core calculation with the integrated software structure is explored, which can provide the methodology of nuclear data, control layers, and parallel algorithms. The main software structure is discussed, and multiple parallel algorithms are analyzed concisely for sample core programs, with the corresponding conclusions as follows:
Integrated software structure (framework method) is benefited for the development of core software, and the similarity achieves the goal of reuse. For example, the underlying nuclear data could be managed as the effective manner and mapped to application. With the development of prototype software NAC4R, the research could be carried out in different control layers and demonstrate validity.
Parallel algorithms are refined as well as identify the parallel features that exist in the core model based on the existing theory. The features of four sample programs are described succinctly, which work well with the software structure. Parallel programming enables the expected efficiency improvement if some rules are obeyed, and the relevant performance results are given summarily.
It is necessary to establish systematic knowledge and suggestions for software approach and parallel algorithm development in reactor core calculation. On the one hand, it can integrate reusable patterns into computation, such as the instance of prototype software NAC4R. On the other hand, it can explore and record more specific parallel algorithms in this field.
The author would like to thank relevant scholars for their suggestions in NPIC, and also thank for the sample programs and computing resources provided by the 9th professional group in the design institute of NPIC. The research is supported by the discovery fund of NPIC (Number: 2019-06).
|x||length or position, m|
|k||effective multiplication factor|
|L||the matrix of Laplace operator|
|M||the matrix of flux moment|
|N||number of processors|
|Q||neutron source, 1/m3s|
|s||segment length of the characteristic ray, m|
|ϕ||neutron flux density, 1/m2s|
|ψ||angular neutron flux density, 1/m2s|
|∑||neutron macroscopic cross section, 1/m|
|g||neutron energy group|
|i, j, k, z||mesh position|
|f||fission cross section|
|ACP1000||Advanced China Power 1000 MW|
|BLAS||Basic Linear Algebra Subprograms|
|CASL||Consortium for Advanced Simulation of Light water reactors|
|DENOVO||new three-dimensional parallel discrete ordinates code in SCALE|
|HDF||Hierarchical Data Format|
|IAEA||International Atomic Energy Agency|
|KBA||Koch-Baker-Alcouffe parallel algorithm|
|KYLIN2||advanced neutronics lattice code of NPIC|
|K-MOD||Kinetic Method of Demo|
|MOC||method of characteristics|
|MPI||message passing interface|
|NAC4R||New Architecture Computing for Reactor Core|
|NetCDF||scientific data NETwork Common Data Format|
|PETSc||Extensible Toolkit for Scientific Computation|
|PWR||Pressurized Water Reactor|
|SN||Discrete Ordinate method|
|SIMD||Single Instruction Multiple Data|
|TAU||Tuning and Analysis Utilities|