
## Abstract

In this chapter, the Monte Carlo (MC) core is presented, particularly its cross-section libraries and random number generators. The main idea is to introduce the validation and reliability of MC applications and to explore their limitations. As an example, a comparison between two MC toolkits, namely XRMC (version 6.5.0–2) and Geant4 (version 10.02.p02), and a validation of each of them against experimental data for mammography (external dosimetry) are presented. The simulated quantities compared are exposure, kerma, half-value layer, and backscattering. The limitations, advantages, and disadvantages of using a general or a specific MC toolkit are also discussed.

### Keywords

- Monte Carlo
- mammography
- medical physics
- XRMC
- Geant4

## 1. Introduction

The history of the Monte Carlo (MC) method began two centuries before its computational implementation, which took place around the time of World War II (1939–1945). The conception of the MC method dates back to 1733 with the *Problème de l’aiguille* (needle problem) posed by Georges-Louis Leclerc, known as the Comte de Buffon [1], which is stated as:

*Sur un plancher qui n’est formé que de planches égales & parallèles, on jette une Baguette d’une certaine longueur, & qu’on suppose sans largeur. Quand tombera-t-elle franchement sur une seule planche? Leclerc [1], p. 44*

or, translated to English:

*On a floor formed only of equal, parallel boards, one throws a needle of a certain length, which is assumed to have no width. When will this needle fall cleanly on a single board?*

The first solution, proposed by Leclerc [2] in 1777, is considered one of the oldest solutions in geometrical probability. The method basically consists in generating *N* successive random samples that are tested against a statistical model representing the probability of interest. To use this method, the main condition must be satisfied: the random variable evaluated must be independent, which means that previous events should have no influence (or, at most, a minimal one) on successive trials. In the needle case, Leclerc ([2], pp. 100–104) presented a solution considering the distance *D* between the edges of each wooden board and the length *l* of the needle (*l* < *D*), taking the probabilities of crossing zero lines and one line as *P*(0) = 1 − 2*l*/(π*D*) and *P*(1) = 2*l*/(π*D*), respectively [3].

It seems to be a simple problem, but its solution gave rise to a sequence of different mathematical methodologies [3]. For example, in 1812, Laplace used his theory of probability, with theoretical calculations based on this methodology, to determine an approximation of the value of π [3, 4] and presented a generalized solution in 3D space [3, 4, 5].
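The needle experiment translates directly into a sampling loop. The following C++ sketch is only an illustration, not the derivation of Leclerc or Laplace: it assumes the classical result that a needle of length *l* crosses a line with probability 2*l*/(π*D*) when *l* < *D*, and the function names, the seed handling, and the choice of `std::mt19937` are assumptions made here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Uniform double in [0, 1) built from the 32-bit output of std::mt19937.
double uniform01(std::mt19937& rng) {
    return rng() / 4294967296.0;  // divide by 2^32
}

// Estimates pi from `throws` simulated needle throws (needle length l, board width D, l < D).
// Returns 0.0 if no crossing was observed, since no estimate is possible in that case.
double estimate_pi_buffon(std::uint64_t throws, double l, double D, std::uint32_t seed) {
    assert(l > 0.0 && l < D);
    std::mt19937 rng(seed);
    std::uint64_t crossings = 0;
    for (std::uint64_t i = 0; i < throws; ++i) {
        // Distance from the needle centre to the nearest line: uniform in [0, D/2).
        const double centre = uniform01(rng) * (D / 2.0);
        // Random needle direction, drawn by rejection inside the unit quarter disc
        // so that pi itself is not needed to sample the angle.
        double x, y, r2;
        do {
            x = uniform01(rng);
            y = uniform01(rng);
            r2 = x * x + y * y;
        } while (r2 > 1.0 || r2 == 0.0);
        const double sin_angle = y / std::sqrt(r2);
        if (centre <= (l / 2.0) * sin_angle) ++crossings;
    }
    if (crossings == 0) return 0.0;
    // P(crossing) = 2 l / (pi D)  =>  pi is approximately 2 l N / (D * crossings).
    return (2.0 * l * static_cast<double>(throws)) / (D * static_cast<double>(crossings));
}
```

Calling, for instance, `estimate_pi_buffon(1000000, 1.0, 2.0, 12345u)` should return a value close to π, with a statistical uncertainty that shrinks slowly as the number of throws grows.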

Following the main condition of independence of the random variable stated by Leclerc [2], the MC method was proposed as an alternative to analytical mathematics for evaluating the behavior of random samples in order to predict a statistical sample distribution or a statistical behavior. This behavior can be assessed by the empirical process of drawing sequences of independent random samples and observing their behavior [6]. The strategy is simple in concept but time-consuming; the first computerized MC simulation was developed and implemented by the team of John and Klara von Neumann and Nick Metropolis with the advent of computers in 1947–1948 [7].

Different algorithms [8, 9, 10] have been implemented to apply different MC solutions using different computational tools. Since the objective of this chapter is to present MC validation and/or reliability for application developers (AD) through a specific case study, the different MC algorithms will not be detailed.

There are several characteristics that can be used to classify MC computational tools (MCCTs); however, given the objective of this chapter, the available ones are classified according to their applicability as general or specific MCCTs. Section 2 presents the general concepts and the MCCT code core (cross-section libraries and pseudorandom number generators), including the characteristics of specific and general MCCTs and some of the codes currently available. Section 3 discusses the concepts and main methods of validation and reliability of MCCT codes, including their limitations regarding the implementation of cross-section libraries and random number generators. To illustrate this, a case study of validation for dosimetry in mammography using two MCCTs for radiation transport (Geant4 and XRMC) is presented. The last section presents the final considerations on choosing an MCCT and important issues concerning validation or reliability tests.

## 2. Monte Carlo general concepts and core

The MC method may be used to solve different kinds of problems. It may be used to solve problems that could also be solved by deterministic calculations, but it is usually more time-consuming than those and can increase the complexity of the solution. MC is generally applied when the change in the model follows a “time dependence” and is suitable for a stochastic calculation, which depends on a sequence of random numbers generated during the simulation. This means that a new execution with a new (different) sequence of random numbers for the same simulation will not give identical results. However, it will return values that agree with the results obtained from the previous sequence within some “statistical error,” that is, within a statistical fluctuation range [11].
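The agreement of two executions within a statistical fluctuation range can be checked numerically. The sketch below is a toy example under stated assumptions: the estimated quantity (the area of a quarter unit disc) stands in for any simulated quantity, and the names `Estimate`, `run_simulation`, and `agree_within` are hypothetical. Two runs with different seeds give different means, which should nevertheless agree within a few combined standard errors.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

struct Estimate {
    double mean;       // estimated quantity
    double std_error;  // one-sigma statistical uncertainty of the mean
};

// Hit-or-miss estimate of the area of a quarter unit disc (exact value: pi/4).
Estimate run_simulation(std::uint64_t histories, std::uint32_t seed) {
    assert(histories > 0);
    std::mt19937 rng(seed);
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < histories; ++i) {
        const double x = rng() / 4294967296.0;  // uniform in [0, 1)
        const double y = rng() / 4294967296.0;
        if (x * x + y * y < 1.0) ++hits;
    }
    const double p = static_cast<double>(hits) / static_cast<double>(histories);
    // Binomial standard error of the estimated fraction.
    return {p, std::sqrt(p * (1.0 - p) / static_cast<double>(histories))};
}

// True when two runs differ by at most k combined standard errors.
bool agree_within(const Estimate& a, const Estimate& b, double k) {
    const double sigma = std::sqrt(a.std_error * a.std_error + b.std_error * b.std_error);
    return std::fabs(a.mean - b.mean) <= k * sigma;
}
```

A typical check would compare `run_simulation(1000000, 1u)` with `run_simulation(1000000, 2u)` through `agree_within(a, b, 3.0)`; the factor of three is a conventional choice, not a rule taken from the chapter.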

In general, problems that are in essence governed by random phenomena can be solved by applying MC [11, 12]. The main idea of the MC method is to estimate a quantity, based on systems that use random numbers to simulate random walks [11], with an estimator computed from observed/experimental data [12]. Considering this idea, the core of an MCCT is based on a randomized algorithm (random number generator) that manages probabilities (libraries of sampling distributions) [12]. An MCCT has other tools implemented, but for an AD, knowledge of the limitations of the MC core is essential to estimate the accuracy and precision of the results.

Considering MCCTs for radiation transport, one may define the core as the computational random number generator (randomized algorithm) and the cross sections for each possible interaction process (the probabilities; in the case of photons, these can be the total attenuation cross sections for each possible process, the differential cross sections, if applicable, the energy transfer cross sections, or the energy absorption cross sections). Consider a traditional MC simulation as represented in the scheme of Figure 1. It is important to keep in mind that this is a simplified scheme of radiation transport designed to aid the understanding of the basics of MC processing. Before starting to run^{1} an event^{2} in an MCCT, one must define the simulation universe (or world), including the geometry, the material composition of the simulated objects, and, if necessary, the additional information needed for the interaction.

The run always starts with the generation of a primary particle (emitted by the radiation source) and finishes when all histories have been run. As one may observe in Figure 1, after the geometry is built and the physics is defined, the system starts the run by initializing the counter of the number of histories (*VARnh*). This variable is compared to the expected total number of histories (*nh*): if *VARnh* is equal to *nh*, the run is terminated; if *VARnh* is smaller than *nh*, a new history is started by generating a new primary particle. In the generation of the primary particle, if the source is defined by an energy distribution and/or a position distribution (linear, planar, or volumetric source) and/or a momentum direction distribution, the random number generator is invoked (once for each distribution needed). After the primary particle of the source is generated, the information about this particle is recorded at the beginning of the step (pre-step information). Following the step execution, the end-of-step information is generated (post-step information) and tested. The traditional MCCT tests are:

- Is this particle inside the world? In an MC simulation, the geometrical limits for following the radiation transport are those of the largest volume described in the geometry (the world), which contains all the other volumes. Some MCCTs have no world volume defined; these are usually specific MCCTs that use variance reduction techniques to force the radiation to interact with the defined volumes, so their logic differs from that presented in the scheme of Figure 1.
- Is this particle alive? In an MCCT for radiation transport, there is a minimum energy below which transport is not continued, so if the kinetic energy of the particle is smaller than this minimum energy, the particle dies, which means that, in the MCCT, all its residual energy is deposited locally and the particle stops.

If the particle is alive and inside the world, it is important to know whether it will find a geometrical boundary and/or a different material in its path during the step. If the answer to both questions is *no*, the code proceeds with the step. If the answer is *yes*, the code calculates the distance to this boundary and checks whether the next volume has a new material; the step then proceeds up to the boundary, after which the residual kinetic energy of the particle is recalculated for the material of the next volume. At the end of the step, the post-step information is recorded. Then, *VARnh* is increased by one unit and compared to *nh*. If *VARnh* is equal to *nh*, the run is terminated. If *VARnh* is smaller than *nh*, a new step procedure is started by recording the post-step information of the previous step as the initial information of the new one, proceeding with the verifications and implementations for this new step. It is important to note that all secondary particles generated as products of an interaction are transported following the same procedure, starting at *Record Pré-Step*, except that *VARnh* is not incremented; these particles are followed until they die or leave the world.
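The loop described above can be sketched in a few lines. The following C++ example is a deliberately simplified toy, not the logic of any particular MCCT: it assumes a one-dimensional world, a single material (so no boundary handling), no secondary particles, a step length sampled from an exponential law, and an invented interaction in which a random fraction of the energy is deposited locally. All names are hypothetical; only the history counter and the two tests ("inside the world?" and "alive?") follow the scheme in the text.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

struct RunSummary {
    std::uint64_t histories = 0;  // histories completed (value of VARnh at the end of the run)
    std::uint64_t escaped = 0;    // particles that left the world
    std::uint64_t absorbed = 0;   // particles that fell below the energy cut and died
    double deposited_keV = 0.0;   // energy deposited inside the world
    double escaped_keV = 0.0;     // energy carried out of the world
};

RunSummary run_toy_transport(std::uint64_t nh, double world_cm, double mean_free_path_cm,
                             double e0_keV, double e_min_keV, std::uint32_t seed) {
    assert(world_cm > 0.0 && mean_free_path_cm > 0.0);
    assert(e_min_keV > 0.0 && e0_keV > e_min_keV);
    std::mt19937 rng(seed);
    auto uniform01 = [&rng]() { return rng() / 4294967296.0; };  // uniform in [0, 1)
    RunSummary s;
    for (std::uint64_t var_nh = 0; var_nh < nh; ++var_nh) {  // one pass per history
        double x = 0.0;     // pre-step position of the primary particle
        double e = e0_keV;  // pre-step kinetic energy
        while (true) {
            // Step: distance to the next interaction, sampled from an exponential law.
            x += -mean_free_path_cm * std::log(1.0 - uniform01());
            if (x >= world_cm) {  // "Is this particle inside the world?" -> no
                s.escaped_keV += e;
                ++s.escaped;
                break;
            }
            // Toy interaction: a random fraction of the energy is deposited locally.
            const double lost = e * uniform01();
            e -= lost;
            s.deposited_keV += lost;
            if (e < e_min_keV) {  // "Is this particle alive?" -> no
                s.deposited_keV += e;  // the residual energy is deposited locally
                ++s.absorbed;
                break;
            }
        }
        ++s.histories;
    }
    return s;
}
```

A useful sanity check on such a loop is energy conservation: the deposited energy plus the escaped energy should equal the number of histories times the initial energy.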

To illustrate the selection by random number, consider the hypothetical case of a 40 keV photon interacting with liquid water. In this case, the total attenuation cross section is 0.2683 cm^{2}/g, composed of coherent scattering (0.02874 cm^{2}/g), incoherent scattering (0.1827 cm^{2}/g), and photoelectric effect (0.05680 cm^{2}/g).^{3} Figure 2 shows the simplified scheme that defines the interaction process.

Considering the information in Figure 2, one may see that, among the three possible interaction processes, the probability is approximately 10.71% for coherent scattering, 68.11% for incoherent scattering, and 21.18% for the photoelectric effect. The probabilities of each process are then normalized between 0 and 1, with the total attenuation cross section as the normalizing factor, and these normalized probabilities are arranged as consecutive intervals of real values. The number of possible values between 0 and 1 depends on the variable type defined in the MCCT implementation of the random number generator. In the presented example, random numbers in the interval [0; 0.10714) identify coherent scattering, those in [0.10714; 0.78824) identify incoherent scattering, and those in [0.78824; 1) identify the photoelectric effect. It is important to note that the probability of occurrence of each process is proportional to the width of its interval, that is, to the quantity of random numbers it contains. In the case exemplified in Figure 2, the number 0.0053721 is in the range [0; 0.10714) and defines the photon transport by coherent scattering. If the random number were 0.78824, the photoelectric effect would be simulated, since this value is in the range [0.78824; 1).
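The selection of the interaction process can be written as a comparison of one random number against cumulative, normalized cross sections. The sketch below uses the three partial cross sections quoted above for a 40 keV photon in liquid water; the names `Process` and `select_process` are hypothetical. Because the interval limits are computed here from the sum of the partial values rather than from the rounded total, they may differ from the limits quoted in the text in the last digit, so a random number lying exactly on a quoted limit may be classified differently.

```cpp
#include <cassert>

enum class Process { Coherent, Incoherent, Photoelectric };

// Partial cross sections quoted in the text for a 40 keV photon in liquid water (cm^2/g).
constexpr double kCoherent = 0.02874;
constexpr double kIncoherent = 0.1827;
constexpr double kPhotoelectric = 0.05680;

// Maps a random number u in [0, 1) onto one of the three interaction processes.
Process select_process(double u) {
    assert(u >= 0.0 && u < 1.0);
    const double total = kCoherent + kIncoherent + kPhotoelectric;
    const double upper_coherent = kCoherent / total;                    // about 0.107
    const double upper_incoherent = (kCoherent + kIncoherent) / total;  // about 0.788
    if (u < upper_coherent) return Process::Coherent;
    if (u < upper_incoherent) return Process::Incoherent;
    return Process::Photoelectric;
}
```

With this function, the random number 0.0053721 used in the example selects coherent scattering.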

During the simulation, several processes may require the generation of a random number, such as the interaction process (used in the above example), the momentum direction of the particle, the momentum and kinetic energy of secondary particles, atomic effects (if considered in the simulation), the probability of the Auger effect, and the momentum direction of the Auger electron or the self-absorption of the Auger electron, among others. After some of the abovementioned characteristics have been defined randomly, deterministic equations are applied to satisfy the conservation of energy and momentum. Regarding the core of an MCCT, it is important for an AD to know the validity and limitations of the random number generator and of the cross-section libraries.

The random number generator may be classified as a pseudorandom number generator (PRNG) or a true random number generator (TRNG) [13]. A PRNG uses a deterministic process to generate a series of outputs from an initial seed state, which means that the same input “seed” yields the same output sequence [13, 14, 15]. As an example, one may cite the rand() function declared in the C++ <cstdlib> header. In this case, the random number generated is usually an integer, and knowing the range of possible numbers helps the AD to understand the limit on the number of histories that can be run without compromising the randomness of the simulation [13, 14], the so-called period of the random number generator [16]. Table 1 presents the ranges of values of the possible integer variable types according to [14].

Type | Storage size | Value range |
---|---|---|
Short | 2 bytes | −32,768 to 32,767 |
Int | 2 bytes or 4 bytes | −32,768 to 32,767 or −2,147,483,648 to 2,147,483,647 |
Long | 4 bytes | −2,147,483,648 to 2,147,483,647 |
Unsigned short | 2 bytes | 0 to 65,535 |
Unsigned integer | 2 bytes or 4 bytes | 0 to 65,535 or 0 to 4,294,967,295 |
Unsigned long | 4 bytes | 0 to 4,294,967,295 |

Table 1. Type of integers, storage size, and range of possible values in the C++ programming language.
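The storage sizes listed above depend on the compiler and platform, so it can be useful to query them rather than assume them. The following sketch, with hypothetical function names, computes the number of distinct values for a given storage size and for a given integer type on the machine where the code is compiled. It only describes the range of an integer type; the period of a generator is a separate property of its algorithm.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <cstddef>
#include <limits>

// Number of distinct values representable with the given storage size in bytes.
long double distinct_values(std::size_t bytes) {
    assert(bytes >= 1 && bytes <= 8);
    return std::ldexp(1.0L, static_cast<int>(bytes * CHAR_BIT));  // 2^(number of bits)
}

// Number of distinct values of the integer type T on the machine compiling this code.
template <typename T>
long double distinct_values_of() {
    return static_cast<long double>(std::numeric_limits<T>::max())
         - static_cast<long double>(std::numeric_limits<T>::min()) + 1.0L;
}
```

For example, `distinct_values(2)` gives the 65,536 values of a 2-byte integer, and `distinct_values_of<int>()` shows which of the two alternatives for the integer type applies on a given installation.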

Based on the value ranges presented in Table 1, one may see that the variable type chosen for the random number generator can affect the resolution of the simulation, which means that, for a PRNG, there is a limit on the number of histories with proper random behavior. PRNGs are used in several applications [15], and one advantage of using them in an MCCT is the capability of reproducing the same sequence of pseudorandom numbers [14], which can be used to validate an application and/or to validate and test different installations of an MCCT in different environments (evaluating the accuracy and precision of the simulation under different conditions) [16].
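The reproducibility of a PRNG sequence is easy to demonstrate. The sketch below is an illustration with hypothetical function names; it uses `std::mt19937` from the `<random>` header instead of rand(), because the sequence that `std::mt19937` produces for a given seed is fixed by the C++ standard, whereas the algorithm behind rand() is left to each implementation. The same property is what makes it possible to compare different installations bit for bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <random>

// The C++ standard only guarantees that rand() can return values up to at least 32767.
static_assert(RAND_MAX >= 32767, "RAND_MAX is at least 32767 on a conforming implementation");

// True when the first n outputs obtained from the two seeds are identical.
bool same_sequence(std::uint32_t seed_a, std::uint32_t seed_b, std::size_t n) {
    assert(n > 0);
    std::mt19937 a(seed_a);
    std::mt19937 b(seed_b);
    for (std::size_t i = 0; i < n; ++i) {
        if (a() != b()) return false;
    }
    return true;
}

// First output produced from a given seed, usable as a quick installation check.
std::uint32_t first_output(std::uint32_t seed) {
    std::mt19937 rng(seed);
    return static_cast<std::uint32_t>(rng());
}
```

Two generators started from the same seed return identical sequences, while different seeds lead to different sequences.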

A TRNG uses a nondeterministic source to produce randomness [13], and its advantage is that it is unpredictable, unbiased, and independent [16]. The disadvantage of a TRNG is that it is implemented in hardware, which limits its flexibility, since additional verification of randomness is required with every change of environment [16]. Because of this hardware implementation, computers without a hardware random number generator require a peripheral to generate a TRNG seed, which is then used as input data for a PRNG [16].
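Seeding a PRNG from a nondeterministic source can be sketched with the C++ standard library. This is an assumption-laden illustration: `std::random_device` is used here as a stand-in for the hardware source, although the standard allows it to fall back to a deterministic generator on platforms without such a source, and the function names are hypothetical. Results obtained this way are not reproducible unless the seed values are recorded.

```cpp
#include <cassert>
#include <random>

// Builds a PRNG whose seed material comes from std::random_device.
std::mt19937 make_device_seeded_engine() {
    std::random_device device;  // may throw if no entropy source is available
    std::seed_seq seq{device(), device(), device(), device()};
    return std::mt19937(seq);
}

// Next uniform double in [0, 1) from the engine.
double next_uniform01(std::mt19937& rng) {
    const double u = rng() / 4294967296.0;  // divide by 2^32
    assert(u >= 0.0 && u < 1.0);
    return u;
}
```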

Sometimes, a combination of random number generators (PRNG-PRNG or PRNG-TRNG) is implemented to increase the period of the random number generator, but the randomness of the numbers generated must be tested and verified. Special care must be taken when running an MCCT on computational grids or clusters to ensure that every processor has an independent random seed to start the process. If this requirement is not met, inconsistencies may appear in the results, making them unrealistic and introducing statistical trends that do not represent the expected probabilities. Therefore, to guarantee the reliability of the results of an MCCT, the AD must understand the random number generator, its period, and its limitations.
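One simple way to give every processor its own starting state is to derive the seed from a run-level seed and a worker index. The sketch below shows this idea with `std::seed_seq`; the function names and the seeding scheme are assumptions made for illustration, not the mechanism of any specific MCCT. Distinct seeds reduce the risk of duplicated sequences but do not by themselves prove that the streams are statistically independent, which still has to be tested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Engine for one worker, derived from a run-level seed and the worker index.
std::mt19937 make_worker_engine(std::uint32_t base_seed, std::uint32_t worker_id) {
    std::seed_seq seq{base_seed, worker_id};
    return std::mt19937(seq);
}

// Number of positions at which the first n outputs of two engines coincide.
std::size_t count_equal_outputs(std::mt19937 a, std::mt19937 b, std::size_t n) {
    assert(n > 0);
    std::size_t equal = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (a() == b()) ++equal;
    }
    return equal;
}
```

A run in which `count_equal_outputs` returns `n` for two different workers would indicate that both are consuming the same sequence, which is the situation the paragraph above warns against.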

Considering the reliability of the MCCT in the example described above, when it is applied to low-energy radiation transport, the probabilities (e.g., total and differential cross-section libraries for photons), the distribution functions, and the transport models for particles such as electrons are indispensable. As a general rule, it is important to know which processes are simulated and whether one or more models can be invoked for them. To validate these characteristics, the MCCT requires a microscopic validation^{4}, which in turn requires experimental data on the cross sections or distribution functions for different materials and energy ranges. Microscopic validation is hard work for an AD to perform; however, validations of the data libraries may be found in the literature and/or in online libraries [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27] and in independent validations published for specific MC codes [21, 28, 29, 30].

### 2.1 General *versus* specific Monte Carlo toolkit for radiation transport

The MCCT may be classified according to its applicability as general purpose (GP) [31, 32, 33] or specific purpose (SP) [33, 34, 35]. It is important to understand that this classification refers to the possibility of using MCCT in different applications and not the kind of solution generated by the MCCT. All MCCTs present a general solution to the study case, when applied to the same particle types, degrees of freedom, and simulated quantities, taking into account the limitations of the implemented code and libraries.

Some MCCTs are developed to simulate a wide range of particles and/or quantities. These MCCTs usually simulate the radiation transport of primary and secondary particles in detail, using as few approximations as possible. They are called general purpose Monte Carlo toolkits (GPMCTs) and may be applied to solve a wide range of radiation transport problems: large energy ranges, different particle types, different geometries, and a large range of simulated processes. As examples, one may cite Geant4, MCNP, and FLUKA.

Geometry and tracking (Geant4) [36, 37, 38] is an MCCT that has a complete range of functionalities, including tracking, geometry, physics models, and hits [36]. It was developed based on object-oriented technology and is implemented in the C++ programming language. The physics processes available cover a comprehensive range, including electromagnetic, hadronic, and optical processes, with a large set of materials, chemical elements, and long-lived particles, over a wide energy range starting from 250 or 990 eV and extending to a few TeV. The extended package Geant4-DNA adds processes for modeling the biological damage induced by ionizing radiation at the DNA scale; it transports all particles using a discrete model [39, 40, 41, 42], extending particle transport down to a few eV (the range differs for each particle and process). In Geant4, the AD has access to a large cross-section library database, which makes it possible to choose different radiation processes and, for each process, to select different transport models. The AD may also implement different variance reduction methods and set different parameters to transport primary and secondary particles [43] among the more than 35 particles^{5} allowed [43]. The AD may use Geant4 classes to create collections of interactions, named hits (G4VHit or G4THitsCollection), and/or invoke sensitive detector counters (G4MultiFunctionalDetector or G4VPrimitiveScorer) and/or implement his/her own class (a new sensitive detector or hit file) [44].

The Monte Carlo N-particle (MCNP6) [45, 46, 47, 48, 49] MCCT includes a powerful general source, a criticality source, and a surface source. In addition, this MCCT includes plotters for both the geometry and the output counters (named tallies). MCNP is built with GNU Fortran and C/C++ compilers [49] and is a continuous-energy, generalized-geometry, time-dependent MC radiation transport code designed to track many particle types over broad ranges of energies. This MCCT can simulate neutron, photon, electron, or coupled neutron/photon/electron transport, as well as heavy ions [49]. It simulates different energy ranges for different particles: neutron energies from 10^{−11} to 20 MeV for most isotopes and up to 150 MeV for some others, photon energies from 1 keV up to 100 GeV, and electron energies from 1 keV to 1 GeV [50]. It has a rich collection of variance reduction techniques and an extensive collection of cross-section data. In addition, MCNP contains numerous tallies: surface current and flux, volume flux (track length), point or ring detectors, particle heating, fission heating, pulse height tally for energy or charge deposition, mesh tallies, and radiography tallies [46, 49]. This MCCT makes it possible to change transport parameters through command lines [46, 50].

The Fluktuierende Kaskade (FLUKA) [51, 52, 53] MCCT is implemented in Fortran 77 and provides a number of interface routines for ADs. It accurately simulates the interaction and propagation in matter of about 60 different particles,^{6} including photons and electrons from 100 eV or 1 keV to thousands of TeV, neutrinos, muons of any energy, hadrons of energies up to 20 TeV and all the corresponding antiparticles, neutrons down to thermal energies, and heavy ions. Efficiency in radiation transport is achieved through frequent use of table look-up sampling, and accuracy is maximized by the systematic use of double-precision variables. It offers a large number of options to the AD and has been completely restructured to introduce dynamical dimensioning. It can be used both in a biased mode and as a fully analogue code, which means that, while it can be used to predict fluctuations, signal coincidences, and other correlated events, a wide choice of statistical techniques is also available to investigate punch-through or other rare events in connection with attenuations by many orders of magnitude [52]. FLUKA can generate several outputs: a main (standard) output file, two scratch files, a file with the last random number seeds, an error message file (if any), and any number (including zero) of estimator output files. Generally, the AD may choose between formatted and unformatted output and may write a customized routine for additional outputs [53].

However, some MCCTs are developed to solve problems involving specific particles, specific geometrical conditions, or specific simulated quantities. These MCCTs are called specific purpose Monte Carlo toolkits (SPMCTs) and are usually optimized through several approximations and variance reduction techniques. They are developed with restrictions on their applications, and very specific quantities are simulated. In general, SPMCTs are faster than GPMCTs at solving the same problem. As examples, one may cite XRMC, the ITS TIGER series, PENELOPE, EGS, and ETRAN.

X-Ray Monte Carlo (XRMC) [54] accurately simulates X-ray imaging and spectroscopy experiments on heterogeneous samples. This MCCT is implemented in C++ and is capable of simulating, in detail, complex experiments on generic samples using different variance reduction techniques by default. It was initially developed to simulate X-ray fluorescence and photon imaging. XRMC simulates the transport of photons only and makes it possible to simulate the following quantities: total fluence, fluence with energy binning, total energy fluence, and energy fluence with energy binning. As output, it may generate a raw file with the transmission image [55], and if energy binning is invoked, the AD may define the bin size. Regarding transport options, the AD may define the maximum scattering order: transmission only, first-order scattering or fluorescence emission, or second-order (or higher) scattering or fluorescence emission. It also offers the flexibility of activating or deactivating the fluorescence process [54, 55]. The cross-section library invoked by XRMC is *xraylib* [56], a library for X-ray matter interactions generally used for XRF applications.

The Integrated TIGER Series (ITS) [57, 58, 59], version 6, allows the solution of linear time-independent coupled electron/photon radiation transport problems. This MCCT employs accurate cross sections, sampling distributions, and physical models to describe the production and transport of the electron/photon cascade from 1.0 keV to 1.0 GeV [58, 59]. ITS version 6 was converted to Fortran 90 [59], with C++ links to CAD software. The availability of the source code allows the AD to tailor this MCCT to specific applications and to extend its capabilities to more complex applications. Overlaps in the CAD geometry may be evaluated and reported in an output file [58]. Through the command line, the AD may set different parameters, for example, to define the cross sections for different data sets, to deactivate coherent photon scattering, to include (or not) binding effects in incoherent photon scattering, and/or to apply (or not) energy-loss straggling to electrons [59]. The AD may request different output information, such as the energy and charge deposited in every subzone, the detailed energy and charge deposited in every subzone, and the geometry-dependent input settings [58]. The ITS suite of codes [58] includes a multigroup version, along with the multigroup cross-section generator CEPXS, and a continuous-energy cross-section generator (XGEN) [58, 59]. In ITS, photons below 1 keV are locally absorbed; an alternative algorithm for electron transport, named Generalized Boltzmann Fokker-Planck (GBFP), has been implemented; and the full transport capability for photons and electrons using the Livermore database is under development [58].

The penetration and energy loss of positrons and electrons (PENELOPE) [60] MCCT, version 2014, simulates coupled electron-photon transport, covering photons, electrons, and positrons. The PENELOPE simulation algorithm is based on a scattering model that combines numerical databases with analytical cross-section models for the different interaction mechanisms and is applicable to energies from a few hundred eV up to approximately 1 GeV. Photon transport is simulated by means of the standard, detailed simulation method. Electron and positron transport is simulated by a mixed procedure, which combines detailed simulation with condensed simulation [60, 61, 62, 63]. The implementation of the cross-section libraries uses EPDL^{7} total cross sections for photoelectric absorption and Rayleigh scattering, XCOM^{8} cross sections for pair production, and the SUMGA^{9} function for total atomic cross sections and Compton scattering. PENELOPE can simulate the emission of characteristic X-rays and Auger electrons resulting from vacancies produced in the K, L, M, and N shells by photoelectric absorption, Compton scattering, triplet production, and electron/positron impact. In PENELOPE 2014, elastic collisions of electrons and positrons are simulated using numerical partial-wave cross sections for free neutral atoms obtained with the elastic scattering of electrons and positrons by atoms (ELSEPA) program, whose database is distributed with ICRU Report 77 (2007) [60]. The output may be defined using Fortran subroutines, through which the AD may obtain different quantities, such as the number of materials loaded, the mass density of specific materials, the slowing-down characteristics of charged particles, the energy of the particle at the beginning of the track segment, the effective stopping power of soft energy-loss interactions, and the energy lost along the step, among others [61].

The electron gamma shower (EGS) MCCT is available in two main versions, EGS5 and EGSnrc. Both are implemented in the Mortran3 language, which is a preprocessor for Fortran [64, 65]. The origins of the EGS MCCT are documented in the NRC-PIRS-0436 report [66]. EGS5 simulates the coupled transport of electrons and photons in an arbitrary geometry for particles with energies from a few keV up to several hundred GeV [64], depending on the atomic numbers of the target materials. EGSnrc^{10} (Electron Gamma Shower from the National Research Council) is an extended and improved version of the EGS MCCT, with specific modeling implementations for electron and photon transport through matter. It includes the BEAMnrc software component, which models beams traveling through consecutive material components, ranging from a simple slab to the full treatment head of a radiotherapy linear particle accelerator (linac). EGSnrc is particularly well suited to medical physics applications (research and device development) and is used for medical radiation detection, X-ray-based medical imaging, and dosimetry for a specific volume. However, owing to the flexibility of this MCCT, the AD may use it for different applications, such as industrial linac beams, X-ray emitters, radiation shielding, and more. EGSnrc simulates radiation transport in homogeneous materials for photons, electrons, and positrons with energies between 1 keV and 10 GeV. It incorporates significant refinements in charged particle transport and better low-energy cross sections and makes it possible to define elaborate geometries and particle sources [65].

The electron transport (ETRAN) MCCT, developed by the National Bureau of Standards, transports electrons and photons through extended media. This MCCT has various versions, representing mainly refinements, embellishments, and different geometrical treatments, which share the same basic simulation algorithm based on random sampling of the paths of electrons and photons as they travel through matter. Algorithms and computational tools written at other laboratories, such as Sandia’s older SANDYL code and its more recent series of TIGER, CYLTRAN, and ACCEPT codes, have together also been referred to as the ETRAN model.

When an AD chooses an MCCT, it is important to consider:

- The characteristics of the application: the type of primary and secondary particles and their energy range, the quantities to be simulated, and the geometry and material composition of the simulated universe;
- The capabilities of the MC code: whether the code can properly handle the transport of primary and (if necessary) secondary particles in the energy range of interest, whether it can simulate the necessary quantities, whether it can handle the transport simulation in all the material compositions expected, and how it simulates the geometry of interest;
- The limitations of the MC code: the transport processes and models simulated in the energy range of interest (search for published microscopic validations of the cross-section libraries) and how accurately the MCCT simulates the dosimetric quantities and the particle fluxes (search for published macroscopic validations); it is recommended that the AD also performs his/her own macroscopic validation;
- The computational performance: the running time needed to reach an acceptable statistical fluctuation in the results for the cases of interest and, in some cases, the RAM used to build the virtual universe and the storage used to save the output files.

Following these minimal guidelines when choosing an MCCT gives the AD a good chance of avoiding unresolvable problems during the development of an application. If, as an AD, you still have questions about which MCCT to choose, keep in mind that the best one is the MCCT that is able to solve your “problem” (accuracy of the results) with an adequate statistical fluctuation (precision of the results). In addition, an AD should at least be able to install the MCCT and use its interface, being aware of its common limitations. All these characteristics can usually be found in the manuals (user manual and physics process manual).
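The running time needed to reach an acceptable statistical fluctuation can be estimated from a short pilot run. The sketch below assumes that the relative statistical error of the result decreases as the inverse square root of the number of histories and that the time per history stays constant; both are assumptions to be checked for each application, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Histories needed to reach target_rel_error, given a pilot run of pilot_histories
// that reached pilot_rel_error, assuming the error scales as 1/sqrt(N).
std::uint64_t histories_for_target(std::uint64_t pilot_histories, double pilot_rel_error,
                                   double target_rel_error) {
    assert(pilot_histories > 0 && pilot_rel_error > 0.0 && target_rel_error > 0.0);
    const double ratio = pilot_rel_error / target_rel_error;
    return static_cast<std::uint64_t>(
        std::ceil(static_cast<double>(pilot_histories) * ratio * ratio));
}

// Running time estimate in seconds, assuming a constant time per history.
double estimated_seconds(std::uint64_t histories, double seconds_per_history) {
    assert(seconds_per_history >= 0.0);
    return static_cast<double>(histories) * seconds_per_history;
}
```

Under these assumptions, halving the relative error of a pilot run requires four times as many histories, and therefore about four times the running time.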

## 3. Verification, validation, comparison, and reliability of Monte Carlo toolkits

To guarantee that one application is realistic, it is important to test it (computational code) in different ways. There are several known ways to test a computational code and its parts; however, in this section, the focus is to present the concepts applied on developed applications for MCCTs such as verification, validation, comparison, and reliability.

When one is working on an application for an MCCT, it is important to understand the concepts that may guarantee its internal consistency and accuracy. The IEEE 1012–2016 gives a general description of software verification and validation, and the IEEE 24765–2017 gives a detailed description of these concepts, defining these terms. *Verification* is defined as a “confirmation by examination and provisions of objective evidence that specified requirements have been fulfilled” (IEEE 1012–2016), and later this concept was detailed as “the process of evaluating a system or component to determine whether the products of a given development phase satisfy the conditions imposed at the start of that phase” (IEEE 24765–2017). *Validation* is defined as a “confirmation by examination and provisions of objective evidence that the particular requirements for a specific intended use are fulfilled” (IEEE 1012–2016), and later this concept was detailed as “the process of evaluating a system or component during or at the end of the development process to determine whether it satisfies specified requirements” (IEEE 24765–2017). So, one may say that a validation was performed when one can answer affirmatively the question: “Are we building the right product?” On the other hand, one may affirm that a verification is being done when answering the question: “Are we building the product right?” [67].

According to [68], “*Validation* involves the system and acceptance testing during the test phase, whereas *verification* involves reviews and audits, software unit testing, and other techniques to evaluate intermediate work products such as the *software requirements specification, software design description*, and individual modules during earlier project phases.” In MC, the AD performs the *verification* of the developed application to guarantee that it reproduces the system (or geometry) and the general conditions as closely as possible to reality, and performs the *validation* to guarantee that the MC application (considering the geometry, materials, interacting particles, and energy range of the particles) gives realistic results when compared statistically to experimental data, provided that a consistent amount of quantitative experimental data is available. In this context, it is fundamental to understand the experimental setup and the limitations of the instruments and measurements used in the experiments, and to take them into account in the data analysis to explain the observed differences and similarities in the results.

When experimental data are not available, it is possible to use another MCCT or deterministic models for comparison with the MC application results. In this way, one is performing a *comparison* between models and not a *validation*. This *comparison* must be based on quantitative statistical tests. In this case, knowing and understanding the main concepts involved in the models and databases used, including their limitations and previous validations, is fundamental to explain the observed differences and similarities in the results.

A *reliability* evaluation is recommended when there are neither experimental data nor specific trustworthy models, nor a sufficient amount of data, to perform a *validation* or a *comparison*. The IEEE 982.1–2005 provides information used as indicators of *reliability*, defining software *reliability* as “the probability that software does not cause the failure of a system for a specified time under specified conditions.” In this context, software *reliability* represents an effective measurement of the more general concept of software quality, using derived quantities and experimental models that are partially consistent with the application of interest. It is important to know the systematic errors and to map all differences in the boundary limitations of the application and the theory involved in this comparison.

It is possible to combine *validation* results, *comparison* between models, and software *reliability* to evaluate an application. Additional information about statistical tests and specific recommendations for software *verification*, *validation*, *reliability,* and *comparison* may be found in international documents. Thus, it is important to study the international standard regulations/recommendations when one wants to validate any software, including the MCCTs themselves and applications developed using them. The standard lists of active documents from IEEE, International Electrotechnical Commission (IEC), and International Organization for Standardization (ISO) may be searched online.^{11} Additional detailed information about this subject may be studied at:

- IEEE 730–2014: IEEE Standard for Software Quality Assurance Processes
- IEEE 982.1–2005: IEEE Standard Dictionary of Measures of the Software Aspects of Dependability
- IEEE 1012–2016: IEEE Standard for System, Software, and Hardware Verification and Validation (corrigendum 1012–2016/Cor 1–2017)
- IEEE 1016–2009: IEEE Standard for Information Technology - Systems Design - Software Design Descriptions
- IEEE 12207–2017: ISO/IEC/IEEE International Standard - Systems and software engineering - Software life cycle processes
- IEEE 14764–2006: ISO/IEC/IEEE International Standard for Software Engineering - Software Life Cycle Processes - Maintenance
- IEEE 15026–1 (Revision 2019): ISO/IEC/IEEE Approved Draft International Standard - Systems and Software Engineering - Systems and Software Assurance - Part 1: Concepts and Vocabulary
- IEEE 15026–2-2011: IEEE Standard - Adoption of ISO/IEC 15026–2:2011 Systems and Software Engineering - Systems and Software Assurance - Part 2: Assurance Case
- IEEE 15026–3-2013: IEEE Standard Adoption of ISO/IEC 15026–3 - Systems and Software Engineering - Systems and Software Assurance - Part 3: System Integrity Levels
- IEEE 15026–4-2013: IEEE Standard Adoption of ISO/IEC 15026–4 - Systems and Software Engineering - Systems and Software Assurance - Part 4: Assurance in the Life Cycle
- IEEE 24765–2017: ISO/IEC/IEEE International Standard - Systems and software engineering - Vocabulary
- IEEE 29119–1-2013: ISO/IEC/IEEE International Standard - Software and systems engineering - Software testing - Part 1: Concepts and definitions
- IEEE 29119–2-2013: ISO/IEC/IEEE International Standard - Software and systems engineering - Software testing - Part 2: Test processes
- IEEE 29119–3-2013: ISO/IEC/IEEE International Standard - Software and systems engineering - Software testing - Part 3: Test documentation
- IEEE 29119–4-2015: ISO/IEC/IEEE International Standard - Software and systems engineering - Software testing - Part 4: Test techniques
- IEEE 29119–5-2016: ISO/IEC/IEEE International Standard - Software and systems engineering - Software testing - Part 5: Keyword-Driven Testing
- IEC 61508–0 (2005–01): Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 0: Functional safety
- IEC 61508–1 (2010–04): Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 1: General requirements
- IEC 61508–2 (2010–04): Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems
- IEC 61508–3 (2010–04): Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 3: Software requirements
- IEC 61508–4 (2010–04): Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 4: Definitions and abbreviations
- IEC 61508–5 (2010–04): Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 5: Examples of methods for the determination of safety integrity levels
- IEC 61508–6 (2010–04): Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 6: Guidelines on the application of IEC 61508–2 and IEC 61508–3
- IEC 61508–7 (2010–04): Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 7: Overview of techniques and measures
- IEC 61511–1 (2003–01): Functional safety - Safety instrumented systems for the process industry sector - Part 1: Framework, definitions, system, hardware and software requirements
- IEC 61511–2 (2003–07): Functional safety - Safety instrumented systems for the process industry sector - Part 2: Guidelines for the application of IEC 61511–1
- IEC 61511–3 (2003–03): Functional safety - Safety instrumented systems for the process industry sector - Part 3: Guidance for the determination of the required safety integrity levels
- ISO/IEC 25010:2011: Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - System and software quality models

There are two ISO documents under development at the moment: the ISO/DTR 11462–3 Guidelines for implementation of statistical process control (SPC)—Part 3: Reference data sets for SPC software validation and ISO/NP TR 11462–4 Guidelines for implementation of statistical process control (SPC)—Part 4: Reference data sets for measurement process analysis software validation.

### 3.1 Example of application for macroscopic validation, comparison, and reliability for XRMC and Geant4

In this section, a comparison between XRMC version 6.5.0-2 (henceforth called XRMC) [54, 55] and Geant4 version 10.02.p02 (henceforth called Geant4) [36, 37, 38] is presented, as well as the validation of both MCCTs against experimental data collected on three different mammographs. For the validation, the following quantities were measured: exposure (X), kerma, half-value layer (HVL), inverse square law (ISL), and backscattering (BS). The limitations, advantages, and disadvantages of using a general and a specific MCCT are also discussed. Both absolute and normalized quantities were selected because it is important to know the correction factor for the total number of photons generated per mAs per total irradiated area for each piece of equipment (this number is characteristic of each X-ray tube and changes over time); combining these quantities helps to define the best approximation for this correction factor in the simulation, so that the results are closer to clinical reality.

Each setup had its data collected with the calibrated equipment (electrometers and ionization chambers) available at the respective institution, and the measurements were performed by the same person who developed the applications with both MCCTs. The simulated geometries are the same as those used in the data collection. A brief description of the measurement equipment and of the simulated setups follows:

- Mammomat Inspiration [69, 70] (henceforth called Inspiration): measurements were performed with the electrometer and ionization chamber TNT 12000 kit (Fluke) and Al filters of 99% purity. SIMULATION: dry-air sensitive volume of 15 cm^{3}; focal spot as a point source irradiating homogeneously a circular surface of 2.08 cm radius; spectra for acceleration voltages of 25, 30, and 35 kVp; track-additional filtration combinations of Mo-Mo (30 μm) and Mo-Rh (25 μm); spectra with 0% ripple; target tilt angle of 20°; and a beryllium (Be) window 0.8 mm thick. The HVL calculations are based on a source-to-detector distance of 41.0 cm for different Al filtration thicknesses, and the X data were collected and simulated for source-to-detector distances of 26, 40, 50, and 60 cm.

- Mammomat 3000 [71] (henceforth called M3000): measurements were performed with the electrometer Victoreen model 660–1 (1315REV) and the ionization chamber Victoreen model 660-4A (512REV). SIMULATION: dry-air sensitive volume of 4 cm^{3}; focal spot as a point source irradiating homogeneously a circular surface of 10.0 cm^{2}; spectra with 0% ripple; target tilt angle of 22°; a Be window 0.8 mm thick; track-additional filtration combinations of Mo-Mo (30 μm), Mo-Rh (25 μm), and W-Rh (50 μm); and spectra for acceleration voltages from 24 to 32 kVp, in steps of 2 kVp. The BS was calculated considering simulators of BR12 epoxy and polymethyl methacrylate, a source-to-detector distance of 60.0 cm, and simulator thicknesses of 4, 5, 6, and 8 cm.

- Lorad MIII [72] (henceforth called Lorad): measurements were performed with a modified Keithley electrometer (model 602) and the ionization chamber for mammography MPT SN 442. SIMULATION: dry-air sensitive volume of 6.0 cm^{3}; focal spot as a point source irradiating homogeneously a rectangular surface of (18.0 × 24.0) cm^{2}; spectra for acceleration voltages from 26 to 34 kVp, in steps of 2 kVp; track-additional filtration combinations of Mo-Mo (30 μm) and Mo-Rh (25 μm); spectra with 0% ripple; target tilt angle of 16°; and a Be window 0.8 mm thick. The X measurements were performed with the compression paddle and with the BS effects minimized by increasing the distance between the bucky and the ionization chamber.

It is important to evaluate all the possibilities available in the MCCT to get a realistic perspective of the configurations. For this reason, two modes of describing the transport model were evaluated in XRMC: transmission (T) and with scattering for dosimetry (D). In Geant4, the different radiation transport physics models recommended for low-energy photons and electrons (standard-option3 (*std*), penelope (*pen*), and Livermore (*liv*)) were also evaluated. Since measurements of the experimental spectra were not possible, different descriptions of the incident spectra, modeled by two different references [73, 74], were explored. When nonexperimental spectra are used to simulate dosimetric quantities, it is necessary to take into account the validation of normalized quantities and, if possible, to use semiempirical correction factors to get accurate values for the average number of photons per mAs per total irradiated area. There are different ways of doing this, but the usual ones are:

- to use the ratio of the simulated and experimental KERMA to get a correction factor, generally using the primary beam with different kVp and mAs in the energy range of interest and collecting the KERMA while minimizing scattering effects, or

- to use a normalized quantity, for example, the normalized HVL, to evaluate how closely the simulated curve follows the experimental one, and then use a goodness-of-fit (GoF) test on the non-normalized HVL to estimate the best correction factor to fit the amplitude of the simulated data to the experimental data.

In both cases, the error estimation of the experimental data as well as the quantification of the statistical fluctuations of the MC method must be taken into account.
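The first option can be sketched as follows. This is only an illustration under stated assumptions: the correction factor is taken as the inverse-variance weighted mean of the ratios K_exp/K_sim (the factor that multiplies the simulated values), the experimental and simulated uncertainties are treated as independent and propagated in quadrature, and the function name and the numbers are hypothetical rather than data from this study.

```python
import math

def correction_factor(k_exp, u_exp, k_sim, u_sim):
    """Inverse-variance weighted mean of the ratios K_exp / K_sim.

    u_exp and u_sim are absolute standard uncertainties (same units as
    the kerma values). Each ratio gets an uncertainty propagated in
    quadrature. Returns (factor, standard uncertainty of the factor).
    """
    num = den = 0.0
    for ke, ue, ks, us in zip(k_exp, u_exp, k_sim, u_sim):
        ratio = ke / ks
        u_ratio = ratio * math.sqrt((ue / ke) ** 2 + (us / ks) ** 2)
        weight = 1.0 / u_ratio ** 2
        num += weight * ratio
        den += weight
    return num / den, math.sqrt(1.0 / den)

# Hypothetical primary-beam kerma values for three kVp settings (arbitrary units)
k_exp, u_exp = [2.10, 3.05, 4.20], [0.06, 0.09, 0.12]
k_sim, u_sim = [1.00, 1.50, 2.00], [0.01, 0.01, 0.02]
factor, u_factor = correction_factor(k_exp, u_exp, k_sim, u_sim)
print(f"correction factor = {factor:.3f} +/- {u_factor:.3f}")
```

Both the experimental error and the MC statistical fluctuation enter the weights, so a poorly measured point or a noisy simulation run pulls the factor less than a well-determined one.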

XRMC does not return the absorbed energy or dose as output information, so, to make possible the comparison of quantities calculated under the same conditions, the calculations are based on the spectra incident on the surface of the sensitive volume. The Geant4 application was planned to collect the spectra on the surface of the sensitive volume, and the same calculations applied to the XRMC results were used. On the other hand, for the Geant4 validation, the absorbed energy in the sensitive volume was used. The statistical fluctuations were based on a sequence of 10 runs with different seeds for each evaluated case, for both MCCTs, and the average and standard deviation of the data were calculated and used in the data analyses.
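The 10-run procedure can be illustrated with a toy example. The sketch below does not call XRMC or Geant4; it uses a hypothetical stand-in "run" (the fraction of photons whose sampled free path exceeds a slab thickness) only to show how the average and the sample standard deviation over independently seeded runs give the statistical fluctuation.

```python
import random
import statistics

def transmitted_fraction(seed, n_histories=20_000, mu=0.5, thickness=1.0):
    """Toy run: fraction of photons whose sampled free path exceeds the slab."""
    rng = random.Random(seed)  # one independent generator per run
    passed = sum(1 for _ in range(n_histories)
                 if rng.expovariate(mu) > thickness)
    return passed / n_histories

seeds = range(1, 11)  # 10 runs with different seeds
results = [transmitted_fraction(s) for s in seeds]
mean = statistics.mean(results)
std = statistics.stdev(results)  # sample standard deviation (n - 1)
print(f"{mean:.4f} +/- {std:.4f} ({100 * std / mean:.2f}%)")
```

In a real application the toy function would be replaced by one full MCCT run per seed, and the resulting relative standard deviation is the fluctuation quoted alongside the simulated quantity.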

For validation, it is important to compare experimental and simulated data quantitatively. Several statistical tests may be applied in general: Chi-square (*χ*^{2}), Anderson-Darling, Kolmogorov-Smirnov, and Wald-Wolfowitz, among others. However, when the data have an associated error or statistical fluctuation, the *χ*^{2} test must be applied, since it takes this into account in the nonparametric evaluation between the statistical populations of interest. Another simple way to start an evaluation of the results is to generate comparative plots. Figure 3 presents the graphical comparison of the MCCT validations, and Tables 2 and 3 present the *χ*^{2} *p* values for the validation and the comparison for all simulated conditions and normalized data.
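A minimal sketch of such a *χ*^{2} evaluation is given below. It assumes that each point contributes the squared difference divided by the sum of the experimental and simulated variances, and that the number of degrees of freedom equals the number of points unless stated otherwise (for normalized data one degree of freedom may have to be removed). The chapter does not state which convention was used, so these are assumptions, and the input values are hypothetical.

```python
import math

def chi2_sf(x, dof):
    """P(chi-square with integer dof > x), using the closed-form series."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if dof % 2 == 0:
        term = total = math.exp(-h)
        for i in range(1, dof // 2):
            term *= h / i
            total += term
        return total
    total = math.erfc(math.sqrt(h))
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def chi2_validation(exp, u_exp, sim, u_sim, dof=None):
    """Return (chi2, p value) for experimental vs simulated data with uncertainties."""
    chi2 = sum((e - s) ** 2 / (ue ** 2 + us ** 2)
               for e, ue, s, us in zip(exp, u_exp, sim, u_sim))
    dof = len(exp) if dof is None else dof
    return chi2, chi2_sf(chi2, dof)

# Hypothetical normalized kerma values (not data from this study)
exp = [1.000, 0.810, 0.660, 0.540]
u_exp = [0.020, 0.018, 0.015, 0.013]
sim = [1.000, 0.800, 0.650, 0.520]
u_sim = [0.005, 0.005, 0.004, 0.004]
chi2, p = chi2_validation(exp, u_exp, sim, u_sim)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, reject H0 at 0.05: {p < 0.05}")
```

The *p* value is then compared with the chosen significance level: a value below it rejects the hypothesis that simulated and experimental data agree within their uncertainties.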

Table 2. *χ*^{2} *p* values for the validation of both MCCTs against experimental data (normalized data).

Transport models and spectrum identification | Inspiration (HVL) | M3000 (BS) | Lorad (HVL) | All | M3000 (Mo30Mo) | M3000 (Mo25Rh) | M3000 (W-25Rh) | Lorad (Mo30Mo) | Lorad (Mo25Rh) |
---|---|---|---|---|---|---|---|---|---|
XRMC_T–Barnes | 0.3025 | NA | 1.0000 | 0.9988 | NA | NA | NA | 1.0000 | 0.7265 |
XRMC_T–Catalogue | 0.0687 | NA | 0.5859 | 0.3125 | NA | NA | NA | 0.9258 | 0.1466 |
XRMC_D–Barnes | NA | 1.0000 | NA | 1.0000 | 1.0000 | 1.0000 | 1.0000 | NA | NA |
XRMC_D–Catalogue | NA | 1.0000 | NA | 1.0000 | 1.0000 | 1.0000 | 1.0000 | NA | NA |
G4std–Barnes | 0.2463 | 1.0000 | 0.0817 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9998 | <0.001 |
G4std–Barnes–Calc | 0.1966 | 1.0000 | 0.0785 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1049 | 0.2069 |
G4std–Catalogue | 0.1481 | 1.0000 | <0.001 | 0.3636 | 1.0000 | 1.0000 | 1.0000 | <0.001 | <0.001 |
G4std–Catalogue–Calc | 0.0710 | 1.0000 | <0.001 | 0.9993 | 1.0000 | 1.0000 | 1.0000 | 0.1811 | <0.001 |
G4pen–Barnes | 0.2397 | 1.0000 | 0.1113 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9999 | <0.001 |
G4pen–Barnes–Calc | 0.1564 | 1.0000 | 0.7587 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9997 | 0.0597 |
G4pen–Catalogue | 0.3511 | 1.0000 | <0.001 | 0.3811 | 1.0000 | 1.0000 | 1.0000 | <0.001 | <0.001 |
G4pen–Catalogue–Calc | 0.2383 | 1.0000 | 0.0102 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3842 | 0.0018 |
G4liv–Barnes | 0.2494 | 1.0000 | 0.9703 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2405 |
G4liv–Barnes–Calc | 0.3756 | 1.0000 | 0.0600 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0328 | 0.3994 |
G4liv–Catalogue | 0.1910 | 1.0000 | 0.0290 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | <0.001 | 0.9905 |
G4liv–Catalogue–Calc | 0.0331 | 1.0000 | 0.6826 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1454 | 0.9993 |

Table 3. *χ*^{2} *p* values for the comparison of XRMC to Geant4 (normalized data).

Transport models and spectrum identification | Inspiration (HVL) | M3000 (BS) | Lorad (HVL) | All | M3000 (Mo30Mo) | M3000 (Mo25Rh) | M3000 (W-25Rh) | Lorad (Mo30Mo) | Lorad (Mo25Rh) |
---|---|---|---|---|---|---|---|---|---|
G4std–Barnes | 0.9777 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9999 |
G4std–Barnes–Calc | 0.9149 | 1.0000 | 0.9671 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6334 | 0.9972 |
G4std–Catalogue | 0.2139 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9808 | 1.0000 |
G4std–Catalogue–Calc | 0.1595 | 1.0000 | 0.9975 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9200 | 0.9974 |
G4pen–Barnes | 0.8606 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9999 |
G4pen–Barnes–Calc | 0.7994 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9637 |
G4pen–Catalogue | 0.1660 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
G4pen–Catalogue–Calc | 0.1572 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9997 |
G4liv–Barnes | 0.9767 | 1.0000 | 1.0000 | 0.9998 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
G4liv–Barnes–Calc | 0.6809 | 1.0000 | 1.0000 | 0.9998 | 1.0000 | 1.0000 | 1.0000 | 0.4828 | 1.0000 |
G4liv–Catalogue | 0.7014 | 1.0000 | <0.001 | 0.9965 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | <0.001 |
G4liv–Catalogue–Calc | 0.6993 | 1.0000 | 0.1663 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |

The plots in Figure 3 give a visual overview of the relative difference between experimental and simulated data, taking the experimental data as reference. They show that different systems may be better represented by different modeled spectra. The Inspiration setup (Figure 3a) shows similar results for both modeled spectra, since all relative differences for the median and the first and third quartiles are between −10 and −2%. A small number of outliers are observed in this case. The M3000 evaluation (Figure 3b) clearly presents better accuracy and precision with the spectra from Barnes et al. [74], since all medians are closer to 0% and the data dispersion is the lowest among the three mammographs, with the first and third quartiles in the range of −3 to 2%. For Lorad (Figure 3c), better accuracy is visible when the spectra from Barnes et al. [74] are used, especially with Geant4: all data for these spectra presented medians closer to 0%, while the data for the catalogued spectra [73] presented medians between −6 and −3%. However, for this mammograph there is no difference in precision between the two modeled spectra: the data between the first and third quartiles are in the range of −4 to 8% for Barnes et al. [74] and of −10 to 1% for the catalogued spectra [73]. These differences between spectra are more evident in the Geant4 simulations. All mammographs presented outliers in the evaluation of the relative differences. Considering all mammographs studied (Figure 3d), the spectra from [74] were generally more accurate and precise than the spectra from [73]. In the case of Geant4, the simulated absorbed energy seems to present a smaller dispersion than the data calculated from the spectra at the detector entrance surface (observe the first and third quartiles in Figure 3d). Even with this general tendency in data dispersion, it is not possible to conclude that one calculation methodology for the dosimetric quantities is better than the other, since this tendency was observed for only one of the three studied mammographs (Figure 3b).
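The quantities summarized in such box plots can be computed as in the sketch below, which takes the experimental value as reference, so that a positive relative difference means that the simulation overestimates the measurement. The function names and the input values are hypothetical, and the "inclusive" quartile convention is only one of several in common use.

```python
import statistics

def relative_differences(sim, exp):
    """Relative difference (%) of simulated values, experimental data as reference."""
    return [100.0 * (s - e) / e for s, e in zip(sim, exp)]

def box_summary(values):
    """First quartile, median, and third quartile of a list of values."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, med, q3

# Hypothetical measured and simulated values (not data from this study)
exp = [100.0, 100.0, 100.0, 100.0, 100.0]
sim = [96.0, 98.0, 100.0, 102.0, 104.0]
q1, med, q3 = box_summary(relative_differences(sim, exp))
print(f"Q1 = {q1:.1f}%, median = {med:.1f}%, Q3 = {q3:.1f}%")
```

A median far from 0% points to a systematic offset (accuracy), while the distance between the first and third quartiles reflects the dispersion (precision).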

It is important to note that these are qualitative observations, valid for the database (equipment and setups) of this study or for similar conditions of energy range and irradiation geometry. For a quantitative evaluation, one needs to evaluate the statistical significance of the results. Table 2 presents the *χ*^{2} *p* value summary for all evaluated cases, considering a significance level of 0.05.

The null hypothesis^{12} is rejected if the *p* value is smaller than the significance level (values below 0.05 in Table 2). When the null hypothesis is rejected in this test, one may assume that the compared samples are not from the same population (or are not equal). In Table 2, one may see that, in a general evaluation of HVL, the data collected on Inspiration reject the null hypothesis for Geant4 invoking the *liv* physics list and the spectra from Catalogue [73], for data calculated from the spectrum that reaches the detector surface. The M3000 does not present any null hypothesis rejection. Lorad presents three cases of null hypothesis rejection for HVL values, all calculated with Geant4 and the spectra from Catalogue [73]: the *std* physics list with both calculation methods (based on spectra and on simulated absorbed energy) and the *pen* physics list for simulated absorbed energy. The data for Inspiration and Lorad were collected for different target track-additional filtration combinations, which makes it possible to evaluate the results considering this specific setup characteristic. For Lorad, null hypothesis rejection was observed for different simulated setups with both target track-additional filtration combinations. Comparing the MCCTs, XRMC presented better agreement with the experimental dataset. In Geant4, the *liv* physics list presented the lowest and the *std* physics list the largest number of null hypothesis rejections among the three evaluated physics lists. A contingency table with the *χ*^{2} statistical test was used to evaluate the independence among the possible transport models invoked by each MCCT and the best modeled spectra. A *χ*^{2} *p* value of 0.49136 was calculated for the comparison among the different transport models (XRMC, Geant4-*std*, Geant4-*pen*, Geant4-*liv*) and a *χ*^{2} *p* value of 0.10068 for the two modeled spectra. Both comparisons presented *p* values above the significance level, showing that neither the transport models nor the two modeled spectra are statistically different when normalized data are used (that is, when the data are compared independently of the total number of photons emitted per mAs for the irradiated area).
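A contingency-table test of independence can be sketched as below. The layout of the table used in this study is not given, so the example assumes one row per transport model and two columns (number of cases rejecting and not rejecting the null hypothesis) with hypothetical counts. The statistic is compared with the tabulated critical value at the 0.05 level instead of computing a *p* value.

```python
# Tabulated chi-square critical values at the 0.05 significance level
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815}

def chi2_independence(table):
    """Pearson chi-square statistic and degrees of freedom of a contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical counts: rows = XRMC, G4std, G4pen, G4liv; columns = (rejected, not rejected)
counts = [[1, 8], [6, 21], [5, 22], [4, 23]]
chi2, dof = chi2_independence(counts)
independent = chi2 < CRITICAL_005[dof]
print(f"chi2 = {chi2:.3f}, dof = {dof}, no evidence of dependence: {independent}")
```

A statistic below the critical value means that the number of rejections does not depend significantly on the row category, which is the sense in which the transport models are said not to be statistically different.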

Table 3 presents the *χ*^{2} *p* value summary comparing the results of XRMC to those of Geant4 for all evaluated cases, considering a significance level of 0.05. Most of the evaluated cases (Table 3) present *χ*^{2} *p* values larger than the significance level, so the null hypothesis is not rejected. This shows that the simulated data of the two MCCTs are not statistically different. The exception was the Lorad HVL for Geant4 *liv* with Catalogue for the absorbed energy calculation, due to the track target-additional filtration combination Mo25Rh. This difference did not affect the evaluation considering all cases for each transport model. In a complete evaluation, the data simulated by XRMC are statistically compatible (in agreement) with those simulated by Geant4 when normalized data are taken into account.

The same evaluation was performed with the absolute measurements, first applying the theoretical correction factor and then applying the semiempirical correction factor to estimate the number of photons emitted per mAs per total irradiated area. Figure 4 presents the qualitative evaluation for all studied cases and absolute values considering the theoretical correction factor.

As expected, the relative differences increase when absolute values are compared, since under this condition the results depend on the number of photons emitted per mAs per total irradiated area for each setup configuration (peak voltage, track target-additional filtration combination, and stability of the electrical network associated with the wave rectification of the tube generator). All mammographs presented outliers. The Inspiration setup (Figure 4a) again presented a systematic behavior, with median values between 0 and 30% and first and third quartiles between −10 and 80%. In this case, the simulated data overestimated the experimental data. Compared with the results in Figure 3a, this suggests that the simulated normalization factor is larger than the experimental one, which causes the normalized HVL to present simulated values that are systematically smaller than the experimental ones. M3000 (Figure 4b) presents few cases with outliers (Geant4 *pen* transport model with the Barnes et al. spectra [74] and XRMC in T mode with Catalogue [73]). As observed for the normalized data (Figure 3b), it presents the best results among all mammographs and setups evaluated, with medians closer to 0% and first and third quartiles between −10 and 35%. Lorad (Figure 4c) presents absolute values generally smaller than the experimental data, with medians between −14 and 0% and first and third quartiles between −21 and 5% for all evaluated cases. In a general observation of the absolute values (Figure 4d), both spectra presented median differences closer to 0%, probably because the positive systematic tendency presented by Inspiration compensates for the negative systematic tendency presented by Lorad. This shows the importance of evaluating both the whole database and its parts, grouped by characteristics that may influence the simulation, to better understand the curve behaviors and the systematic tendencies of the simulated results.

To better evaluate the significance of the findings in Figure 4, it is important to apply a statistical evaluation. Tables 4 and 5 present the *χ*^{2} *p* values for the validation and the comparison of both MCCTs considering absolute quantities and all mammographs evaluated, applying the theoretical corrections.

Table 4. *χ*^{2} *p* values for the validation of both MCCTs against experimental data (absolute quantities, theoretical correction factor).

Transport models and spectrum identification | Inspiration (HVL Mo30Mo) | Inspiration (HVL Mo25Rh) | M3000 (Mo25Rh) | M3000 (W-50Rh) |
---|---|---|---|---|
XRMC_T–Barnes | <0.001 | 0.1035 | <0.001 | <0.001 |
XRMC_T–Catalogue | <0.001 | <0.001 | <0.001 | 0.0453 |
XRMC_S–Barnes | NA | NA | <0.001 | <0.001 |
XRMC_S–Catalogue | NA | NA | 0.0028 | 0.8740 |
G4std–Barnes | 0.1174 | <0.001 | <0.001 | <0.001 |
G4std–Barnes–Calc | 0.1250 | <0.001 | <0.001 | <0.001 |
G4std–Catalogue | <0.001 | 0.9867 | <0.001 | <0.001 |
G4std–Catalogue–Calc | <0.001 | <0.001 | <0.001 | <0.001 |
G4pen–Barnes | 0.5026 | <0.001 | <0.001 | <0.001 |
G4pen–Barnes–Calc | 0.7886 | <0.001 | <0.001 | <0.001 |
G4pen–Catalogue | <0.001 | 0.9854 | <0.001 | <0.001 |
G4pen–Catalogue–Calc | <0.001 | <0.001 | <0.001 | <0.001 |
G4liv–Barnes | 0.1907 | <0.001 | <0.001 | <0.001 |
G4liv–Barnes–Calc | 0.0224 | <0.001 | <0.001 | <0.001 |
G4liv–Catalogue | <0.001 | 0.9869 | <0.001 | <0.001 |
G4liv–Catalogue–Calc | <0.001 | <0.001 | <0.001 | <0.001 |

Table 5. *χ*^{2} *p* values for the comparison of XRMC to Geant4 (absolute quantities, theoretical correction factor).

Transport models and spectrum identification | Inspiration (HVL Mo30Mo) | Inspiration (ISL Mo30Mo) | Inspiration (ISL Mo25Rh) | M3000 (Mo30Mo) | Lorad (Mo-XMo) | Lorad (Mo-XRh) |
---|---|---|---|---|---|---|
G4std–Barnes | 0.9841 | 0.84732 | 0.9999 | 1.0000 | 0.8953 | 0.05693 |
G4std–Barnes–Calc | 0.0894 | 0.06821 | 0.3586 | 0.5481 | 0.0249 | 0.0586 |
G4std–Catalogue | 0.0676 | 0.0269 | 0.9685 | 0.0957 | 0.6954 | 0.0568 |
G4std–Catalogue–Calc | 0.05832 | 0.0384 | 0.8437 | 0.7865 | 0.7864 | 0.6785 |
G4pen–Barnes | 0.8284 | 0.0145 | 0.0725 | 0.8679 | 0.0978 | 0.6604 |
G4pen–Barnes–Calc | 0.6983 | 0.9421 | 0.8796 | 0.5647 | 0.0413 | 0.0211 |
G4pen–Catalogue | 0.6753 | 0.0261 | 0.2246 | 0.3540 | 0.7953 | 0.7894 |
G4pen–Catalogue–Calc | 0.9485 | 0.8475 | 0.1000 | 0.0039 | 0.8796 | 0.6854 |
G4liv–Barnes | 1.0000 | 0.6735 | 0.0516 | 0.7865 | 0.9999 | 1.0000 |
G4liv–Barnes–Calc | 0.0768 | 0.1276 | 0.6875 | 0.5694 | 0.9574 | 1.0000 |
G4liv–Catalogue | 0.0107 | 0.0554 | 0.1534 | 0.7865 | 0.7865 | 0.3451 |
G4liv–Catalogue–Calc | 0.0544 | 0.0895 | 0.5674 | 0.6352 | 0.4731 | 0.8966 |

Table 4 presents the validation only for the cases that had at least one *p* value larger than 0.001. For this reason, Inspiration (HVL), Inspiration (HVL W50Rh), Inspiration (ISL), Inspiration (ISL Mo30Mo), Inspiration (ISL Mo25Rh), Inspiration (ISL W50Rh), M3000, M3000 (Mo30Mo), Lorad (Mo-XMo), Lorad (Mo-XRh), and Lorad are not presented.

Table 5 presents the *χ*^{2} *p* values for the comparison of both MCCTs considering absolute quantities and all options evaluated, applying the theoretical correction factor. It presents only the cases that had *p* values larger than 0.001. For this reason, Inspiration (HVL), Inspiration (HVL Mo25Rh), Inspiration (HVL W50Rh), Inspiration (ISL), Inspiration (ISL W50Rh), Inspiration, M3000 (Mo25Rh), M3000 (W50Rh), M3000, and Lorad are not presented.

The *χ*^{2} test evaluation presented in Table 5 for absolute values shows results similar to those presented in Table 3, but with a larger number of cases rejecting the null hypothesis and with lower *p* values for each of the studied cases. This was expected because of the dependency on the number of photons per mAs estimated for the total area. Only Inspiration ISL Mo25Rh did not present null hypothesis rejection among all evaluated cases. The increase in null hypothesis rejection when comparing XRMC to Geant4 is related to the small statistical fluctuation presented by the MCCTs (between 0.2 and 1.5%) when compared to experimental data.

Based on the *p* values presented in Table 4, one could conclude that both MCCTs are not valid for this kind of simulation. However, the *p* values for normalized data (Tables 2 and 3) show that the tendencies of the normalized quantities simulated with both MCCTs can be considered statistically non-different from the experimental data. Besides that, the absolute data comparison between both MCCTs (Table 5) presented null hypothesis rejection in only a minority of the cases. In this situation, it is important to verify whether the total number of photons defined by the theoretical correction factor applied to the spectra produced a systematic tendency in the expected curves. It is also important to note that the evaluation is consistent when the normalized data show no significant difference in the validation process. The curves used in this study to estimate the semiempirical correction factor were:

- HVL: the curve of the kerma as a function of the additional Al filtration thickness for the same acceleration voltage.
- ISL: the tendency of the kerma as a function of the distance between the focal spot and the detector surface for the same acceleration voltage.
- BS: the tendency of the kerma as a function of the thickness of the scatterer with the scatterer in place (that is, including the backscattered radiation), and the same tendency without the scatterer (that is, excluding the backscattered radiation).
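The three curves above correspond to simple quantities that can be derived from tabulated kerma values. The sketch below is illustrative only: it assumes log-linear interpolation for the half-value layer, a point source for the inverse square law, and the usual ratio definition of the backscatter factor. All function names and numbers are hypothetical and are not taken from the chapter's data.

```python
import math


def half_value_layer(thickness_mm, kerma):
    """Al thickness that halves the unfiltered kerma, by log-linear interpolation.

    thickness_mm must be increasing and start at 0 (no additional filtration).
    """
    target = kerma[0] / 2.0
    for i in range(1, len(kerma)):
        if kerma[i] <= target:
            t1, t2 = thickness_mm[i - 1], thickness_mm[i]
            k1, k2 = kerma[i - 1], kerma[i]
            # Attenuation is assumed exponential between two neighbouring points.
            return t1 + (t2 - t1) * math.log(k1 / target) / math.log(k1 / k2)
    raise ValueError("kerma never drops to half of its initial value")


def inverse_square_prediction(kerma_ref, distance_ref, distance):
    """Kerma expected at `distance` from a point source, given a reference measurement."""
    return kerma_ref * (distance_ref / distance) ** 2


def backscatter_factor(kerma_with_scatterer, kerma_without_scatterer):
    """Ratio of the kerma with and without the scatterer in place."""
    return kerma_with_scatterer / kerma_without_scatterer


# Hypothetical attenuation curve built from an exact exponential with HVL = 0.35 mm Al.
mu = math.log(2.0) / 0.35
thickness = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
kerma = [100.0 * math.exp(-mu * t) for t in thickness]

hvl = half_value_layer(thickness, kerma)
print(f"HVL = {hvl:.3f} mm Al")
print(f"ISL prediction at 30 cm = {inverse_square_prediction(8.0, 60.0, 30.0):.1f}")
print(f"BSF = {backscatter_factor(1.08, 1.00):.2f}")
```

For a real beam the attenuation is not purely exponential because of beam hardening, so the interpolation is only as good as the spacing of the measured thicknesses around the half-value point.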

In every case, the semiempirical correction factor was taken as the amplitude that gave the best GoF test result when applied to the simulated data for one combination of acceleration voltage, track target, and additional filtration on a specific mammograph. This best-fit amplitude is then applied as a multiplication factor on the theoretical correction factor for the total number of photons per mAs per total irradiated area.
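A single amplitude of this kind can be estimated in several ways. The sketch below substitutes a closed-form weighted least-squares estimate for the GoF-based selection described above: it returns the amplitude *A* that minimizes the sum of squared, uncertainty-weighted differences between the experimental points and *A* times the simulated points. This is an assumption made for illustration, not the method used in the chapter, and the names and values are hypothetical.

```python
def best_amplitude(simulated, experimental, sigma_exp):
    """Amplitude A minimizing sum(((exp - A * sim) / sigma) ** 2)."""
    num = sum(s * e / u ** 2 for s, e, u in zip(simulated, experimental, sigma_exp))
    den = sum((s / u) ** 2 for s, u in zip(simulated, sigma_exp))
    return num / den


# Hypothetical curve: simulated kerma already scaled by the theoretical factor.
simulated = [8.0, 5.6, 4.0, 2.8]
experimental = [10.0, 7.0, 5.0, 3.5]   # exactly 1.25 times the simulated values
sigma_exp = [0.3, 0.2, 0.2, 0.1]

semiempirical_factor = best_amplitude(simulated, experimental, sigma_exp)

# Hypothetical theoretical factor (photons per mAs per total irradiated area).
theoretical_factor = 2.0e10
corrected_factor = theoretical_factor * semiempirical_factor
print(f"semiempirical factor = {semiempirical_factor:.4f}")
print(f"corrected photons per mAs = {corrected_factor:.3e}")
```

Setting the derivative of the weighted sum of squares with respect to *A* to zero gives the ratio used in `best_amplitude`, so no iterative scan over amplitudes is needed under these assumptions.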

Tables 6 and 7 present the *χ*^{2} *p* values for the validation of both MCCTs considering absolute quantities and all cases evaluated, applying the semiempirical correction factors to define the number of photons emitted per mAs per total irradiated area.

Transport models and spectrum identification | Inspiration (HVL) | Inspiration (ISL) | M3000 (BS) | Lorad (HVL) | All | M3000 (Mo30Mo) | M3000 (Mo25Rh) | M3000 (W-25Rh) | Lorad (Mo30Mo) | Lorad (Mo25Rh) |
---|---|---|---|---|---|---|---|---|---|---|
XRMC_T–Barnes | 0.2502 | 0.0754 | NA | 1.0000 | 0.9889 | NA | NA | NA | 1.0000 | 0.7265 |
XRMC_T–Catalogue | 0.0603 | 0.0564 | NA | 0.5859 | 0.2123 | NA | NA | NA | 0.82734 | 0.1466 |
XRMC_S–Barnes | NA | NA | 1.0000 | NA | 1.0000 | 1.0000 | 0.8009 | 1.0000 | NA | NA |
XRMC_S–Catalogue | NA | NA | 1.0000 | NA | 1.0000 | 0.9990 | 0.9990 | 1.0000 | NA | NA |
G4std–Barnes | 0.2635 | 0.2384 | 0.9987 | 0.0817 | 0.7669 | 0.9871 | 0.9987 | 1.0000 | 0.8996 | <0.001 |
G4std–Barnes–Calc | 0.1006 | 0.0845 | 1.0000 | 0.0785 | 0.8876 | 0.8997 | 0.9946 | 1.0000 | 0.1073 | 0.2069 |
G4std–Catalogue | 0.1182 | <0.001 | 1.0000 | <0.001 | 0.3636 | 0.9999 | 0.8954 | 1.0000 | <0.001 | <0.001 |
G4std–Catalogue–Calc | 0.0653 | <0.001 | 1.0000 | <0.001 | 0.0457 | 1.0000 | 0.9997 | 1.0000 | 0.0819 | 0.0211 |
G4pen–Barnes | 0.1398 | 0.1294 | 0.9998 | 0.1101 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9989 | 0.0521 |
G4pen–Barnes–Calc | 0.1263 | 0.5643 | 1.0000 | 0.7587 | 1.0000 | 1.0000 | 0.9982 | 1.0000 | 0.9897 | 0.0597 |
G4pen–Catalogue | 0.2151 | <0.001 | 0.9988 | <0.001 | 0.0381 | 0.9675 | 0.9999 | 1.0000 | 0.0467 | <0.001 |
G4pen–Catalogue–Calc | 0.2299 | <0.001 | 0.8999 | 0.0302 | 0.0569 | 0.9999 | 1.0000 | 1.0000 | 0.3747 | 0.0138 |
G4liv–Barnes | 0.1946 | 0.1112 | 0.9979 | 0.9703 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1435 |
G4liv–Barnes–Calc | 0.2384 | 0.1349 | 0.9977 | 0.0690 | 1.0000 | 1.0000 | 0.9734 | 1.0000 | 0.0528 | 0.2694 |
G4liv–Catalogue | 0.7910 | <0.001 | 0.0357 | 0.0490 | 0.0428 | 0.9863 | 1.0000 | 1.0000 | 0.0521 | 0.7092 |
G4liv–Catalogue–Calc | 0.0301 | <0.001 | 0.8073 | 0.5762 | 0.0665 | 1.0000 | 0.9763 | 1.0000 | 0.1454 | 0.8968 |

Transport models and spectrum identification | Inspiration (HVL) | Inspiration (ISL) | M3000 (BS) | Lorad (HVL) | All | M3000 (Mo30Mo) | M3000 (Mo25Rh) | M3000 (W-25Rh) | Lorad (Mo30Mo) | Lorad (Mo25Rh) |
---|---|---|---|---|---|---|---|---|---|---|
G4std–Barnes | 0.9777 | 0.2463 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9999 |
G4std–Barnes–Calc | 0.9149 | 0.1966 | 1.0000 | 0.9671 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6334 | 0.9972 |
G4std–Catalogue | 0.2139 | 0.1481 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9808 | 1.0000 |
G4std–Catalogue–Calc | 0.1595 | 0.0710 | 1.0000 | 0.9975 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9200 | 0.9974 |
G4pen–Barnes | 0.8606 | 0.2494 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9999 |
G4pen–Barnes–Calc | 0.7994 | 0.3756 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9637 |
G4pen–Catalogue | 0.1660 | 0.1910 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
G4pen–Catalogue–Calc | 0.1572 | 0.0331 | 1.0000 | 0.0002 | 0.4832 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0401 |
G4liv–Barnes | 0.9767 | 0.1595 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9997 |
G4liv–Barnes–Calc | 0.6809 | 0.8606 | 1.0000 | 0.8765 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4828 | 1.0000 |
G4liv–Catalogue | 0.7014 | 0.7994 | 1.0000 | 0.9212 | 0.9994 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
G4liv–Catalogue–Calc | 0.6993 | 0.1660 | 1.0000 | 0.1663 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |

The application of semiempirical correction factors gives a better approximation for absolute values. When one compares the results corrected by the theoretical factors (Table 4) to the results corrected by theoretical factors combined with semiempirical factors (Table 6), the increase in the number of cases that did not reject the null hypothesis is visible. With the exception of Geant4 *std* (Barnes et al. [74]), all the other cases that rejected the null hypothesis are from Catalogue [73]. This shows that, for absolute values and the semiempirical methodology used to generate the correction factor, the spectra of Barnes et al. [74] presented the better agreement with experimental data. In the overall evaluation for each studied case comparing each MCCT and transport model, three cases simulated using Catalogue [73] spectra presented *χ*^{2} *p* values below the significance level: Geant4 *std* and *liv* for Calculated absorbed energy and Geant4 *pen*. All the other *χ*^{2} *p* values are above the significance level. To conclude the validation of absolute values for all studied cases (column “All” in Table 6) when semiempirical correction factors are applied to both MCCTs: the Geant4 MCCT seems to be more sensitive to changes in the spectra, showing significant differences from (that is, disagreement with) experimental data for three simulated cases using spectra from Catalogue [73]. This can be due to the more detailed transport of primary and secondary particles. Considering the spectra of Barnes et al. [74], there is no significant difference between experimental and simulated data for either MCCT.
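The count of rejections in the “All” column can be reproduced with a few lines of code. The sketch below copies that column from the validation table with semiempirical correction factors and flags the entries below the 0.05 significance level. The parsing helper is hypothetical and is included only because some cells in the tables are reported as `NA` or as an upper bound such as `<0.001`.

```python
ALPHA = 0.05  # significance level used for the p values in this chapter

# "All" column of the validation table (semiempirical correction factors applied).
all_column = {
    "XRMC_T–Barnes": "0.9889",
    "XRMC_T–Catalogue": "0.2123",
    "XRMC_S–Barnes": "1.0000",
    "XRMC_S–Catalogue": "1.0000",
    "G4std–Barnes": "0.7669",
    "G4std–Barnes–Calc": "0.8876",
    "G4std–Catalogue": "0.3636",
    "G4std–Catalogue–Calc": "0.0457",
    "G4pen–Barnes": "1.0000",
    "G4pen–Barnes–Calc": "1.0000",
    "G4pen–Catalogue": "0.0381",
    "G4pen–Catalogue–Calc": "0.0569",
    "G4liv–Barnes": "1.0000",
    "G4liv–Barnes–Calc": "1.0000",
    "G4liv–Catalogue": "0.0428",
    "G4liv–Catalogue–Calc": "0.0665",
}


def parse_p(cell):
    """Return a float p value, the bound itself for '<x' cells, or None for 'NA'."""
    cell = cell.strip()
    if cell == "NA":
        return None
    if cell.startswith("<"):
        # An upper bound is treated as the bound, which is adequate while it lies below ALPHA.
        return float(cell[1:])
    return float(cell)


rejected = []
for label, cell in all_column.items():
    p_value = parse_p(cell)
    if p_value is not None and p_value < ALPHA:
        rejected.append(label)

print(f"{len(rejected)} of {len(all_column)} cases reject H0 at alpha = {ALPHA}:")
for label in rejected:
    print(" ", label)
```

With the values above, three entries fall below 0.05, and all three use Catalogue spectra.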

The comparison between both MCCTs after applying the semiempirical correction factor is presented in Table 7. As expected, the *p* values for the absolute value comparison of both MCCTs (Table 7) increased when compared to the validation of both MCCTs (Table 6), since the relative differences between simulated results (XRMC compared to Geant4) are smaller than those between each MCCT and the experimental data. It is also important to note that, for the comparison between both MCCTs, only differences among the transport models invoked are significant. In a validation, however, there may also be differences associated with minimal discrepancies between the experimental and simulated geometry, discrepancies among the transport models invoked (limitations of each model), and the repeatability of the X-radiation production and of the technical parameters of the mammograph. In the example presented in this section, the modeled primary beam adds one more variable to be considered in this context, increasing the error associated with the estimation of the total number of photons emitted per mAs per total irradiated area. Nevertheless, anyone who uses a code or model available on the X-ray equipment to estimate the dose in a radiological procedure is relying on a modeled spectrum or an estimated average spectrum for the equipment and needs to pay attention to the limitations of this methodological choice.

To compare the results generated by both MCCTs directly, the Pearson *χ*^{2}, Anderson-Darling, and Kolmogorov-Smirnov tests were applied to the simulated spectra at the entrance surface of the sensitive volume. These spectra were compared, and all of the studied cases presented *p* values above the significance level. For the Pearson *χ*^{2} test, all *p* values were 1.0000. The cases that presented larger differences in the validation, such as absolute values for M3000 XRMC and Geant4 based on Catalogue [73] (Tables 8 and 9), presented the lowest *p* values in all statistical tests performed for the comparison of the MCCTs.
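As an illustration of one of these tests, the sketch below applies a two-sample Kolmogorov-Smirnov comparison to two spectra binned on the same energy grid. It is a minimal sketch under stated assumptions: bin contents are treated as numbers of entries, the *p* value comes from the asymptotic Kolmogorov series, and binning makes the result approximate. The function name and the counts are hypothetical and do not come from the simulations in this chapter.

```python
import math
from itertools import accumulate


def ks_binned(counts_a, counts_b):
    """Two-sample KS statistic and asymptotic p value for spectra on a common energy grid."""
    if len(counts_a) != len(counts_b):
        raise ValueError("spectra must share the same energy bins")
    n_a, n_b = sum(counts_a), sum(counts_b)
    # Empirical cumulative distributions evaluated at the upper edge of each bin.
    cdf_a = [c / n_a for c in accumulate(counts_a)]
    cdf_b = [c / n_b for c in accumulate(counts_b)]
    d = max(abs(a - b) for a, b in zip(cdf_a, cdf_b))
    lam = math.sqrt(n_a * n_b / (n_a + n_b)) * d
    if lam < 0.2:
        # The tail probability is indistinguishable from 1 in this range.
        return d, 1.0
    # Asymptotic Kolmogorov series: 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2).
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical photon counts per energy bin at the detector entrance surface.
spectrum_xrmc = [120, 340, 560, 410, 180]
spectrum_geant4 = [118, 345, 552, 415, 182]

d_stat, p_value = ks_binned(spectrum_xrmc, spectrum_geant4)
print(f"D = {d_stat:.4f}, p = {p_value:.4f}, reject H0 at 0.05: {p_value < 0.05}")
```

The statistic depends only on the largest gap between the two cumulative distributions, so it is sensitive to shifts in the spectral shape rather than to differences in total counts.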

Transport models and spectrum identification | Inspiration | M3000 | Lorad | All |
---|---|---|---|---|
G4std–Barnes | 0.9149 | 0.6566 | 0.9671 | 1.0000 |
G4std–Catalogue | 0.1595 | 0.0521 | 0.9975 | 1.0000 |
G4pen–Barnes | 0.7994 | 0.1182 | 1.0000 | 1.0000 |
G4pen–Catalogue | 0.1572 | 0.0653 | 0.9975 | 0.4832 |
G4liv–Barnes | 0.6809 | 0.1398 | 0.8765 | 1.0000 |
G4liv–Catalogue | 0.6993 | 0.1263 | 0.1663 | 1.0000 |

Transport models and spectrum identification | Inspiration | M3000 | Lorad | All |
---|---|---|---|---|
G4std–Barnes | 1.0000 | 0.8671 | 1.0000 | 0.9768 |
G4std–Catalogue | 0.9999 | 0.9975 | 1.0000 | 0.9999 |
G4pen–Barnes | 1.0000 | 1.0000 | 1.0000 | 1.000 |
G4pen–Catalogue | 1.0000 | 1.0000 | 0.9999 | 1.0000 |
G4liv–Barnes | 0.9998 | 0.8765 | 1.0000 | 0.9154 |
G4liv–Catalogue | 1.0000 | 0.1663 | 1.0000 | 0.3687 |

Another important characteristic of an MCCT to take into account is the running time. In this example, the XRMC Transmission mode reduced the running time by a factor of about 2.5 compared to the Geant4 *std* physics list, 4 compared to the Geant4 *pen* physics list, and 4.5 compared to the Geant4 *liv* physics list. However, the limitations of this XRMC version in simulating the absorbed energy and the statistical fluctuations make its data treatment slower than that of Geant4 and dependent on several external data analysis tools that are not needed with Geant4.

When the experimental spectra of the X-ray equipment (in this example, mammographs) are available, it is better to use them together with their associated correction factors. However, the spectra should be those generated by the X-ray tube actually in use, since each tube (even tubes with the same characteristics produced by the same manufacturer) may differ in conversion efficiency because of minimal differences in manufacturing. In addition, the amplitude correction factor for the number of photons generated per mAs per total irradiated area (or solid angle) must be verified periodically, since tube wear can affect the conversion efficiency through the deposition of track-target atoms on the window surface (by the sputtering effect) or through the release of track-target atoms into the low-pressure air volume of the tube.

## 4. Final considerations

The objective of this chapter was to present the main concepts of validation and reliability applied to the development of MC applications for dosimetry and imaging, including a minimal validation that can be performed by MCCT ADs. As an AD in MC, it is always worthwhile to have your own experimental data to validate the application within the boundary conditions of your problem. If neither experimental data for validation nor modeled data for comparison are available, at least a reliability test should be performed to ensure the quality of the results generated by the MCCT.

When choosing an MCCT, one needs to pay attention to the characteristics of the application, the capabilities and limitations of the MCCT code, and its computational performance. Besides that, the best MCCT is the one that the AD knows how to use (installing it, developing applications, and extracting useful data). To do that, the AD needs to know a programming language or, at least, understand the logic of the input data in the MCCT, understand the experiment or clinical reality to be described in the simulation, and have a notion of the transport processes and models that are significant to the study case.

Regarding the results for the example used in this chapter, the evaluation is summarized as follows:

- Validation: the statistical evaluation presented no null hypothesis rejection for the XRMC results and rejected the null hypothesis for a few of the Geant4 cases evaluated with normalized data. XRMC presented the best agreement with the experimental data. For Geant4, Livermore was the best physics list option. For absolute quantities calculated by applying semiempirical correction factors, all mammographs presented *χ*^{2} *p* values under the significance level: one value for Inspiration (HVL), one for M3000 (BS), and a few for Lorad (Mo25Rh and Mo30Mo) and Inspiration (ISL). Despite these particular cases of null hypothesis rejection, the overall evaluation for each transport model considering all studied cases presented few null hypothesis rejections for the Geant4 MCCT using Catalogue spectra. It is therefore recommended to use the spectra from Barnes et al., which were validated using both MCCTs (XRMC and Geant4). Using only the theoretical correction factor for absolute quantities is not encouraged for validation, unless the AD knows very well the total number of photons emitted by the tube for the irradiation condition. Normalized data may be used together with theoretical spectra to understand behaviors and tendencies of dosimetric quantities and to explore the influence of changes in the data acquisition, but not to define absolute quantities.
- Comparison: the spectra generated at the entrance surface of the detector by both MCCTs always presented *p* values above the significance level of 0.05 for normalized data, showing that, for this case, the spectra generated by the same setup were from the same population (equal) within statistical significance. For absolute quantities calculated by applying semiempirical correction factors, one *p* value was under the significance level for Lorad (Mo25Rh) and one for Inspiration (ISL). Despite these particular cases of null hypothesis rejection, the overall evaluation for each transport model considering all the evaluated cases presented no significant difference between XRMC and Geant4, which is compatible with the internal consistency of the transport models invoked.
- Reliability: the qualitative reliability evaluation based on graphics makes it possible to observe that the most consistent data occur for the simulation of the M3000. The graphics made it possible to observe the tendencies when comparing simulated data to experimental data, considering overall data and specific subgroups. This visual observation is consistent with the statistical tables and is sensitive enough to help classify data for a detailed analysis.

The methods to test an MCCT application are indispensable to good practice in computational dosimetry and imaging: they guarantee the quality of the results, help to evaluate the limitations of the methodology, and make it possible to improve the trustworthiness of the application and its results, so that they can be transposed safely from the “*computational world*” to the “*real world*.”

## Notes

- RUN: word used to define the execution of the MC code.
- EVENT: every interaction that happened to one primary particle or its secondaries until they die or leave the universe of simulation. It is defined as the collections of steps performed by one particle.
- All attenuation cross sections used were from XCOM NIST (https://physics.nist.gov/cgi-bin/Xcom/xcom3_2).
- Microscopic validation: refers to the detailed validation of microscopic quantities (usually the libraries) used by the MC code to generate the quantitative results. See more information in Section 3. Verification, validation, comparison, and reliability of Monte Carlo toolkit.
- The Geant4 list of particles and their identification numbers may be found at https://www.star.bnl.gov/public/comp/simu/newsite/gstar/Manual/particle_id.html.
- The FLUKA list of particles and their identification numbers may be found at http://www.fluka.org/content/manuals/online/5.1.html.
- EPDL: Photon and Electron Interaction Data is available at https://www-nds.iaea.org/epdl97.
- XCOM: Photon Cross-sectional Database is available at https://www.nist.gov/pml/xcom-photon-cross-sections-database.
- For additional information about the SUGMA function, see Section B.2 in Appendix B of PENELOPE-2014: A Code System for Monte Carlo Simulation of Electron and Photon Transport at https://www.oecd-nea.org/science/docs/2015/nsc-doc2015-3.pdf.
- The EGSnrc has its official page, associated with the National Research Council Canada, at https://nrc.canada.ca/en/research-development/products-services/software-applications/egsnrc-software-tool-model-radiation-transport.
- Search for the active standards was performed at https://standards.ieee.org; https://www.en-standard.eu and https://www.iso.org/about-us.html.
- *χ*^{2} test null hypothesis: there is no significant difference between experimental and simulated data, which means that the samples follow the same distribution.