This work describes reliability and security growth models for modifiable software systems whose reliability changes as a result of revisions and of tests performed on specified input data areas. It shows that the known reliability growth models are monotonically increasing, which does not reflect current multi-version team technologies of software development that are largely based on open-source code. The authors propose new non-monotonic models for evaluating and planning software reliability that take into account the decrease in reliability caused by updates or wavefront errors. The work describes the developed bigeminal and generic reliability evaluation models, as well as test planning models and procedures. It provides expressions for evaluating model accuracy and shows that the developed models are adequate to real data. An example is given of the transition from probabilistic models to fuzzy models when the initial data are incomplete. Finally, the work provides general recommendations for selecting software testing models.
- modifiable systems
- program tests
- software reliability
- software security
- test planning
- reliability growth models
- debugging models
- nonmonotone models
- open-source reliability
According to the ISO/IEC 17000 family of standards, the main procedures of software compliance evaluation include acceptance tests, certification tests, and follow-up inspection control.
For certification tests, the software to be assessed for compliance is submitted in its complete form, usually after acceptance testing has been completed. During preliminary and acceptance tests, by contrast, the software under assessment is revised to correct detected errors of different types. Accordingly, at the certification stage, information systems and software products can be regarded as non-modifiable, whereas at the acceptance test stage they are modifiable systems. This distinction determines the different approaches to developing mathematical test models.
2. Non-monotonic models of software reliability and security evaluation
During preliminary and acceptance testing and trial operation of information systems, it is important to determine the moment when testing can be considered complete and the system can be commissioned. For high-security software (including software that processes confidential information or is used in critical applications), current regulatory documents require the test results to be formalized¹. In these cases, the test completion criteria (documented in test certificates) include, besides the fact that the specified requirements are met, the values of test confidence parameters and of the achieved level of reliability or correctness at the specified evaluation accuracy. For these purposes, it is reasonable to use mathematical models [1, 2], which are classified in this work as follows (Figure 1):
Debugging models that allow assessing the software reliability parameters depending on the results of program runs on specified data areas and subsequent program modifications
Time reliability growth models that allow assessing the software reliability parameters depending on the time of test considering the corrected program errors
Test confidence models that allow assessing confidence parameters of the test procedure
Program complexity models based on the relationship between the software complexity metrics and program quality, reliability, and safety parameters
It should be noted that the latter three classes of test models are rather well developed² [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. For example, about 200 time models are known today, mostly NHPP (non-homogeneous Poisson process) models (e.g., [15, 16, 17, 18, 19, 20, 21]). Debugging models (also known as reliability growth models based on input data areas and revisions), by contrast, are usually limited to Nelson's model and its modifications, developed at the dawn of programming theory, and do not reflect the specifics of modern team software development methods.
The early stage of testing is the typical scope of application of debugging models, because this period of the software lifecycle is characterized by active modification of programs to correct detected errors. The models described in the literature reflect monotonic (typically exponential or logistic) growth of software reliability, which does not always hold: for instance, it fails for open-source software and for multi-version or multiple-replica software developed at different times by entirely different teams of developers with diverse qualifications and styles, using various technologies and development systems. This chapter substantiates new non-monotonic models and derives expressions for their parameters. We assume that software reliability is a set of properties that characterize the ability of a program to maintain the specified level of availability under specified conditions during a specified period of time.³ It is important to note that if the level of availability is restricted by security and vulnerability defects, the term
Software reliability differs fundamentally from hardware reliability, mainly because software does not age over time. Two characteristics of software reliability should be noted:
Reliability can change only as a result of software modification (i.e., when the tested object is changed), and it can either increase or decrease.
The values of software reliability parameters are valid only for the input data classes that were used to calculate them.
A number of debugging models have been described in the literature, namely, Nelson's model, the matrix model, the LaPadula model, and others [2, 5, 12, 13, 22]. They reflect stepwise monotonic growth of reliability and thus ignore the possibility of an evident decrease in reliability, for example, due to the introduction of global wavefront errors or the addition of new functionality. The experience of the test laboratory shows that applying such mathematical models either gives unreliable results or significantly increases the time required to assess software reliability. It is therefore necessary to substantiate a non-monotonic software reliability model and to derive expressions for its parameters, which are also required to assess reliability.
According to the abovementioned first property of the software reliability, the process of software modification can be represented in the form of random transitions from one reliability state to another. The moments of transition are modifications of the tested object, which can be described as any changes of the program aimed at correcting the detected errors or developing the program.
We define the main software reliability indicator as the reliability level of the program, i.e., the probability of an error-free run for input data from the specified range. Considering the above, we have the following software reliability change model:
$$P_u = P_0 + \sum_{j=1}^{u} \Delta P_j, \tag{1}$$
where $P_0$ is the initial level of reliability ($0 \le P_0 \le 1$), $u$ is the number of completed revisions of the software, and $\Delta P_j$ is the increment of reliability after the $j$-th revision.
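A minimal sketch of the stepwise change model, in which the reliability level after u revisions equals the initial level plus the sum of the per-revision increments. The function name and the sample increments are hypothetical; a negative increment stands for a revision that lowered reliability, which the first property of software reliability allows.

```python
from itertools import accumulate


def reliability_trajectory(p0, increments):
    """Return [P_0, P_1, ..., P_u] where P_u = P_0 + (sum of the first u increments).

    Increments may be negative: a revision can lower reliability as well as raise it.
    """
    levels = list(accumulate(increments, initial=p0))
    for level in levels:
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"reliability level {level:.4f} is outside [0, 1]")
    return levels


# Hypothetical data: three corrective revisions followed by an update that lowers reliability.
print(reliability_trajectory(0.70, [0.10, 0.05, 0.04, -0.08]))
```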
The process of software reliability change can be graphically presented as a stepwise reliability growth function (Figure 2).
If we view software as a modifiable system, the change of the software reliability level after
where is the probability of error-free operation of the software after (
Proceeding to the recurrent expression and considering the maximum level of reliability to be equal to we can obtain the software reliability evaluation model:
where is the initial level of reliability, is the maximum level of reliability (), and
The resulting expression (Eq. (3)) accounts for uneven reliability growth of the tested object and for the general slowdown of growth as the reliability level increases. In this form, however, it is essentially a monotonic reliability growth model: it does not distinguish the effects of fundamentally different types of modifications, such as changes made to correct errors and changes that introduce new functional elements. Nor does it reflect the complexity of a modification and, consequently, the probability of wavefront errors.
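To make the slowdown of growth concrete, the sketch below assumes, purely for illustration, a gap-closing recurrence P_j = P_{j-1} + λ_j (P_max − P_{j-1}), in which revision j closes a fraction λ_j of the remaining gap to the maximum level P_max. It is not claimed to be the authors' Eq. (3); the function names and λ values are hypothetical. With a constant λ every increment is (1 − λ) times the previous one, so growth is monotonic and slows down as reliability rises, which is the behavior criticized above.

```python
import math


def gap_closing_trajectory(p0, p_max, lambdas):
    """Illustrative recurrence P_j = P_{j-1} + lambda_j * (P_max - P_{j-1})."""
    levels = [p0]
    for lam in lambdas:
        prev = levels[-1]
        levels.append(prev + lam * (p_max - prev))
    return levels


def gap_closing_closed_form(p0, p_max, lambdas):
    """Equivalent closed form: P_u = P_max - (P_max - P_0) * prod(1 - lambda_j)."""
    return p_max - (p_max - p0) * math.prod(1.0 - lam for lam in lambdas)


lams = [0.3, 0.3, 0.3, 0.3]
traj = gap_closing_trajectory(0.6, 0.99, lams)
steps = [b - a for a, b in zip(traj, traj[1:])]
print(traj[-1], gap_closing_closed_form(0.6, 0.99, lams))
print(steps)  # each increment is 0.7 times the previous one
```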
2.1. Bigeminal model of software reliability and security evaluation
To overcome the drawback described above, we propose a bigeminal reliability evaluation model based on source code modification metrics, for example, metrics of changes made for error correction and for software updates. The choice of metric is not restricted (any complexity metric best suited to the software system and the development system can be used⁴), which allows the process to be described comprehensively. Thus, if the revision efficiency factor , we obtain the main expression of the bigeminal reliability evaluation model:
The bigeminal model (Eq. (4)) depends on four parameters, which can easily be estimated, for instance, by the maximum likelihood method.
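As a hypothetical illustration of a four-parameter non-monotonic model (not the authors' Eq. (4)), the sketch assumes that the revision efficiency factor falls linearly with the modification metric, λ_j = a − b·m_j, and that each revision moves reliability by λ_j times the remaining gap to P_max. A large modification then yields λ_j < 0 and a drop in reliability. The parameters are P_0, P_max, a, and b; all names and values are assumptions.

```python
def bigeminal_sketch(p0, p_max, a, b, metrics):
    """Hypothetical four-parameter non-monotonic model.

    lambda_j = a - b * m_j, where m_j is the modification metric of revision j
    (e.g., changed lines of code); P_j = P_{j-1} + lambda_j * (P_max - P_{j-1}).
    A large modification makes lambda_j negative, so reliability drops.
    """
    levels = [p0]
    for m in metrics:
        lam = a - b * m
        prev = levels[-1]
        # Clip to [0, 1] so a very large modification cannot leave the probability range.
        levels.append(min(1.0, max(0.0, prev + lam * (p_max - prev))))
    return levels


# Hypothetical metrics: small fixes (m = 1) and one large update (m = 10).
levels = bigeminal_sketch(0.6, 0.99, a=0.35, b=0.05, metrics=[1, 1, 10, 1, 1])
print([round(p, 3) for p in levels])
```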
2.2. Generic model of software reliability and security evaluation
Though the bigeminal model has the advantage of being mathematically simple, it does not take into account peculiarities of various types of software modifications relating to new functionality, correction of global and local errors, elimination of vulnerabilities, issues of integration and upgrade or degradation of the operating system, optimization, etc.
To address these issues and increase model accuracy, we introduce a classification of modifications (including corrected errors) and use the following expression for the revision efficiency factor:
Considering all this, we can obtain a generic non-monotonic reliability evaluation model:
This model depends on (
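A hypothetical sketch of how a classification of modifications can enter the efficiency factor: here λ_j is assumed to be a weighted sum of the counts of modifications of each class in revision j, with a negative weight for new functionality, and each revision moves reliability by λ_j times the remaining gap to P_max. The class names, weights, and update rule are illustrative assumptions, not the authors' Eq. (5) or Eq. (6).

```python
# Hypothetical revision classes and per-class efficiency weights (illustrative values).
CLASS_WEIGHTS = {"local_fix": 0.08, "global_fix": 0.15, "new_feature": -0.10}


def efficiency_factor(revision, weights=CLASS_WEIGHTS):
    """lambda_j as a weighted sum of modification counts per class (assumed form)."""
    return sum(weights[cls] * count for cls, count in revision.items())


def generic_sketch(p0, p_max, revisions, weights=CLASS_WEIGHTS):
    """Apply P_j = P_{j-1} + lambda_j * (P_max - P_{j-1}), clipped to [0, 1]."""
    levels = [p0]
    for rev in revisions:
        lam = efficiency_factor(rev, weights)
        prev = levels[-1]
        levels.append(min(1.0, max(0.0, prev + lam * (p_max - prev))))
    return levels


revisions = [
    {"local_fix": 3},                    # lambda = 0.24
    {"local_fix": 1, "global_fix": 1},   # lambda = 0.23
    {"new_feature": 2, "local_fix": 1},  # lambda = -0.12: new functionality lowers P
]
print([round(p, 4) for p in generic_sketch(0.6, 0.99, revisions)])
```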
2.3. Calculated expressions of reliability and security evaluation model parameters
The maximum likelihood method can be used to calculate the parameters of the bigeminal (Eq. (4)) and generic (Eq. (6)) models. The following data obtained during software testing can serve as the initial statistics: the set of tests, the set of failed tests (failures) between revisions, and the set of revision complexity metrics. If the software runs are considered independent, the likelihood function is the probability of obtaining the observed sample of failure counts in the performed series of software runs:
where = ,
For convenience, we take the logarithm of the likelihood function and reduce it to the following form:
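If runs are independent and series k of n_k runs is executed at reliability level P_k, the number of failures d_k is binomially distributed, so the log-likelihood is the sum over series of ln C(n_k, d_k) + (n_k − d_k) ln P_k + d_k ln(1 − P_k). A minimal sketch, assuming the model levels P_k are supplied as a list computed from whichever reliability model is being fitted; the function name and the test record are hypothetical.

```python
import math


def log_likelihood(levels, runs, failures, eps=1e-12):
    """Binomial log-likelihood of the observed failure counts.

    Assumes independent runs. For each test series k:
      levels[k]   - model reliability (probability of an error-free run)
      runs[k]     - number of runs in the series (between two revisions)
      failures[k] - number of failed runs in the series
    """
    if not (len(levels) == len(runs) == len(failures)):
        raise ValueError("levels, runs and failures must have equal length")
    total = 0.0
    for p, n, d in zip(levels, runs, failures):
        p = min(max(p, eps), 1.0 - eps)  # keep both logarithms finite
        total += math.log(math.comb(n, d)) + (n - d) * math.log(p) + d * math.log(1.0 - p)
    return total


# Hypothetical test record: three series of runs separated by revisions.
print(log_likelihood([0.80, 0.88, 0.93], runs=[20, 25, 30], failures=[4, 3, 2]))
```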
The resulting reduced function is concave (convex upward) and is defined on a convex set; therefore, to find the maximum of the likelihood function, we can use, for example, the modified steepest descent method with a variable step parameter:
We derived the following new expressions for the partial derivatives of the reduced log-likelihood function:
where ; ; , .
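The iterative search can be sketched as steepest ascent on the log-likelihood (equivalently, steepest descent on its negative) with a variable step: the step grows after an improving move and is halved otherwise. Central differences stand in for the analytical partial derivatives, and the growth and shrink factors (1.5 and 0.5) are assumptions rather than the authors' settings.

```python
import math


def numeric_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def steepest_ascent(f, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Maximize f by moving along the normalized gradient with a variable step.

    After an improving move the step is multiplied by 1.5; otherwise it is
    halved. Stops when the gradient norm or the step falls below tol.
    """
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        g = numeric_gradient(f, x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm < tol:
            break
        candidate = [xi + step * gi / norm for xi, gi in zip(x, g)]
        fc = f(candidate)
        if fc > fx:
            x, fx = candidate, fc
            step *= 1.5
        else:
            step *= 0.5
            if step < tol:
                break
    return x, fx


def binomial_ll(params):
    """Hypothetical objective: 8 successful and 2 failed runs at success probability p."""
    p = params[0]
    if not 0.0 < p < 1.0:
        return float("-inf")  # outside the admissible region
    return 8 * math.log(p) + 2 * math.log(1.0 - p)


best, value = steepest_ascent(binomial_ll, [0.5])
print(best, value)  # the exact maximum is at p = 8 / 10
```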
Judging from the practical experience, the following accuracy is sufficient in order to define evaluations ,,, …, :
Some of the parameters strongly affect the reliability evaluation function and must therefore be determined with increased accuracy. Zero-order approximations (initial values) can be found by statistical modeling on logical intervals:
where is the number of failures in the first runs, is the number of failures in the last runs, and is the maximum value of when and .
Thus, assuming that the parameters are random variables uniformly distributed on previously specified intervals, we draw a certain number of samples and select the parameter set that maximizes the likelihood function. This set is taken as the desired initial values. Experience shows that during the initial stages of testing, the general trend of software reliability increase due to modifications may be absent. This can make the results of the maximum likelihood method unreliable (an infinite number of iterations would be required to find the maximum of the function).
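The search for zero-order approximations can be sketched as plain uniform random sampling: each parameter is drawn from its interval and the sample with the highest objective (e.g., the log-likelihood) is kept as the starting point for the iterative method. The function name, sample count, and fixed seed are assumptions made for reproducibility.

```python
import random


def random_search_start(f, bounds, n_samples=1000, seed=0):
    """Zero-order approximation by uniform random sampling of the parameters.

    bounds is a list of (low, high) intervals, one per parameter; the sampled
    point with the largest value of f (e.g., the log-likelihood) is returned.
    """
    rng = random.Random(seed)
    best_x, best_f = None, float("-inf")
    for _ in range(n_samples):
        x = [rng.uniform(low, high) for low, high in bounds]
        fx = f(x)
        if fx > best_f:
            best_x, best_f = x, fx
    return best_x, best_f


# Hypothetical objective with its maximum at (0.3, 0.7) inside the unit square.
start, value = random_search_start(
    lambda v: -(v[0] - 0.3) ** 2 - (v[1] - 0.7) ** 2, bounds=[(0.0, 1.0), (0.0, 1.0)]
)
print(start, value)
```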
In order to overcome this drawback, the method of relative entropy minimization can be used:
where is the number of failed runs of the total number of runs of
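The sketch below assumes one common form of such a criterion (it is not claimed to be the authors' expression): the sum over test series of the Kullback-Leibler divergence between the observed failure frequency d_k/n_k and the model failure probability 1 − P_k. Weighting each term by n_k would make the criterion equivalent, up to a constant, to the negative binomial log-likelihood; the unweighted sum shown here gives every series equal influence. Names and form are assumptions.

```python
import math


def bernoulli_kl(q, p, eps=1e-12):
    """KL divergence between Bernoulli(q) (observed) and Bernoulli(p) (model)."""
    p = min(max(p, eps), 1.0 - eps)
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def relative_entropy(levels, runs, failures):
    """Sum over test series of KL(observed failure rate || model failure rate).

    levels[k] is the model reliability in series k, so the model failure
    probability is 1 - levels[k]; the observed rate is failures[k] / runs[k].
    """
    if not (len(levels) == len(runs) == len(failures)):
        raise ValueError("levels, runs and failures must have equal length")
    return sum(bernoulli_kl(d / n, 1.0 - p) for p, n, d in zip(levels, runs, failures))


# Hypothetical record: model levels vs. observed failures in three series.
print(relative_entropy([0.80, 0.88, 0.93], runs=[20, 25, 30], failures=[4, 3, 2]))
```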
In order to check the necessary and sufficient condition for acceptability of the maximum likelihood method, the following ratio can be used:
2.4. Estimation of accuracy of software reliability and security evaluation model
The authors of the vast majority of reliability growth models provide no analytical assessment of their accuracy, which makes it difficult to select a specific model. This work eliminates that drawback. The accuracy of software reliability estimation can be characterized by the root-mean-square deviation. To obtain an accuracy estimation model, it is convenient to use the linearization method. In this case, the root-mean-square deviation is defined by the following equation:
where is correlation factor of parameters
We derived the following original expressions for the partial derivatives of the reliability growth function:
where , , and .
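The linearization method combines the partial derivatives of the reliability function with the standard deviations and correlation coefficients of the parameter estimates: σ_P² = Σ_i Σ_k (∂P/∂θ_i)(∂P/∂θ_k) r_ik σ_i σ_k, with r_ii = 1. A minimal sketch, assuming the derivatives have already been evaluated at the parameter estimates; the function name and the numbers in the usage lines are hypothetical.

```python
import math


def linearized_rmsd(gradient, sigmas, corr):
    """Root-mean-square deviation of P by linearization (first-order propagation).

    gradient[i] - partial derivative of P with respect to parameter i
    sigmas[i]   - standard deviation of the estimate of parameter i
    corr[i][k]  - correlation coefficient of parameters i and k (corr[i][i] = 1)
    """
    n = len(gradient)
    variance = sum(
        gradient[i] * gradient[k] * corr[i][k] * sigmas[i] * sigmas[k]
        for i in range(n)
        for k in range(n)
    )
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative round-off


# Hypothetical derivatives and parameter uncertainties for a three-parameter model.
grad = [0.12, 0.55, 0.30]
sig = [0.05, 0.01, 0.04]
corr = [[1.0, 0.2, -0.3], [0.2, 1.0, 0.1], [-0.3, 0.1, 1.0]]
print(linearized_rmsd(grad, sig, corr))
```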
The other parameters of the formula can be obtained from the covariance matrix, which contains the variances and correlation moments of the estimated parameters:
It can be found from the following equation:
where is matrix of the second partial derivatives of the likelihood function:
We derived the following original expressions for the second partial derivatives:
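For maximum likelihood estimates, the covariance matrix is commonly approximated by the inverse of the negative matrix of second partial derivatives (the observed information) of the log-likelihood at the optimum, consistent with the description above. The sketch builds that matrix by finite differences and inverts it by Gauss-Jordan elimination; the helper names and the binomial example are hypothetical.

```python
import math


def numeric_hessian(f, x, h=1e-4):
    """Finite-difference matrix of second partial derivatives of f at x."""
    n = len(x)

    def shifted(i, di, k, dk):
        y = list(x)
        y[i] += di
        y[k] += dk
        return f(y)

    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            hess[i][k] = (
                shifted(i, h, k, h) - shifted(i, h, k, -h)
                - shifted(i, -h, k, h) + shifted(i, -h, k, -h)
            ) / (4.0 * h * h)
    return hess


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def covariance_from_loglik(loglik, estimate):
    """Approximate covariance of ML estimates: inverse of the negative Hessian."""
    hess = numeric_hessian(loglik, estimate)
    return invert([[-v for v in row] for row in hess])


def loglik(params):
    """Hypothetical: 80 successful and 20 failed runs at success probability p."""
    p = params[0]
    return 80 * math.log(p) + 20 * math.log(1.0 - p)


cov = covariance_from_loglik(loglik, [0.8])
print(cov[0][0])  # close to 0.8 * 0.2 / 100 = 0.0016
```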
2.5. Software reliability and security evaluation algorithm
Figure 3 shows the software reliability and security evaluation algorithm.
2.6. Input data normalization of the developed models
Nonstandard situations during information system operation may violate the specified input data conditions, which, according to the second property of software reliability, makes the obtained values inadequate. This happens when invalid input data classes are used or when the frequency of use of the input data classes differs from the frequency used during testing or specified in the technical requirements. It may occur during trial operation aimed at accelerated testing of the software, after a change of environment, and in other cases. This situation can be taken into account by correcting the calculated reliability values using multiple factor analysis. In this case, the program input data are partitioned into n equivalence classes, and the dependence of the reliability value on the frequency of use of the equivalence classes is calculated:
where is the frequency of application of
The study has shown that a first-order polynomial is sufficient for the correction:
where is the frequency of application of
This model has two unknown parameters, which can easily be found by the least squares method.
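Assuming the first-order polynomial has the form P = a_0 + a_1·f in a single frequency variable f (which matches the two unknown parameters), the least squares estimates have the familiar closed form. The frequencies, reliability values, and the operational frequency in the usage lines are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    return my - a1 * mx, a1


# Hypothetical observations: reliability estimated at different frequencies of one input class.
freqs = [0.1, 0.2, 0.3, 0.4]
rels = [0.97, 0.95, 0.94, 0.91]
a0, a1 = fit_line(freqs, rels)
corrected = a0 + a1 * 0.35  # reliability at an operational frequency of 0.35
print(round(a0, 4), round(a1, 4), round(corrected, 4))
```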
2.7. Validation of the non-monotonic software reliability and security evaluation model
The study has shown that the proposed non-monotonic models (Eqs. (4) and (6)) provide high accuracy when the number of revisions exceeds 10 and the number of runs exceeds 50. To check the consistency of the model with the initial data, the von Mises (ω²) criterion was used with a threshold value of 0.01:
where is the Mises criterion and is the threshold value.
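For reference, the standard one-sample Cramér-von Mises (ω²) statistic for an ordered sample x_(1) ≤ … ≤ x_(n) and a hypothesized distribution function F is T = 1/(12n) + Σ_i (F(x_(i)) − (2i − 1)/(2n))². How the failure data are turned into such a sample is not detailed here, so the sketch only computes the statistic; the function names are hypothetical.

```python
def cramer_von_mises(sample, cdf):
    """One-sample Cramér-von Mises statistic T = n * omega^2."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    return 1.0 / (12 * n) + sum(
        (cdf(x) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )


def uniform_cdf(x):
    """Distribution function of the uniform law on [0, 1]."""
    return min(max(x, 0.0), 1.0)


print(cramer_von_mises([0.25, 0.75], uniform_cdf))  # equals 1/24: a perfect fit
```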
Analysis of how the revision efficiency factor affects the accuracy of the model (Eq. (6)) has shown that accuracy can increase by an order of magnitude when revision classes are taken into account. Comparison of the proposed models with well-known debugging models has demonstrated a number of advantages, namely:
Taking into account the possible steep decrease of reliability due to upgrades
Possibility of taking into account the revision complexity
Absence of restrictions for tests and information acquisition
Possibility of taking into account the software reliability values obtained during the previous stages of development and implementation
Absence of subjective factors, such as programmer’s qualification and the level of development technology
Ease of application, since there is no need to calculate the probabilities of all program paths as, for example, in Nelson's model and its modifications
Thus, the study substantiates a test planning method based on the non-monotonic software reliability evaluation model, using the results of runs and revisions. Within this method, we obtained expressions for the parameters of the software reliability evaluation model, as well as for accuracy estimation and test planning. The proposed generic non-monotonic model (Eq. (6)) accounts for probable moments of reliability decrease typical, for instance, of open-source software development and multi-version software. The accuracy of the generic model depends on how the problem of software revision classification is solved. The model can be integrated with software reliability values obtained during the early stages of development. Simplifying the model reduces it to the exponential NHPP reliability growth models used at the stages of information system operation and upgrade.
The main advantage of the proposed non-monotonic models is that accuracy can be increased by more than 10% (as a result of introducing revision categories), which corresponds to a 5–15% reduction in the number of software runs required during testing. It should be noted that debugging models provide low accuracy with small statistical samples; however, this drawback can be overcome by appropriate accuracy-improving techniques, including Wald's method.
The proposed method and models can also be recommended for estimating the parameters of various modifiable and learning systems.
3. Test planning and software revision models
In software reliability management, the cost of testing must be planned so as to achieve the required reliability level. It is therefore useful to evaluate development and implementation trends and to predict the number of remaining errors and the complexity of correcting them.
The models (Eqs. (3), (4), (6)) described above can be used to calculate a number of planning indicators. However, statistical reliability evaluation models cannot predict the frequency of corrections of a specific type; they can only use this information. Specific revisions depend on operating conditions, the achieved reliability level, reliability requirements, and the developers' qualification and experience, and their content may differ accordingly. To account for revision types, it is reasonable to use the theory of multiple factor analysis. Since the change in the number of specific corrections is considered within the scope of revisions, the software modification complexity function can be approximated, for example, by a quadratic polynomial in one variable:
where , , and are the polynomial parameters ().
It is easy to demonstrate that the polynomial parameters have the following form:
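The closed-form parameters of a quadratic least squares fit follow from the 3×3 normal equations. The sketch below solves them by Cramer's rule, assuming the polynomial is fitted to pairs (revision number j, modification complexity); the function names and the data in the usage lines are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # power sums S_0..S_4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("need at least three distinct x values")
    coeffs = []
    for col in range(3):  # Cramer's rule: replace column `col` with the right-hand side
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


revs = [1, 2, 3, 4, 5, 6]
complexity = [2.4, 2.6, 2.6, 2.4, 2.0, 1.4]  # exactly 2 + 0.5 j - 0.1 j^2
print(fit_quadratic(revs, complexity))
```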
Then, assuming that the model parameters and the achieved software reliability level have been estimated from the available test data, we obtain the following reliability-level prediction model:
where is the required level of the software reliability,
The number of revisions required to achieve the desired reliability level can be calculated by cyclically recalculating the expression (Eq. (25)). To this end is calculated using the formula (Eq. (25)); further, in the cycle the value is defined by increasing
To simplify application of the predictive model, let us assume that , which corresponds to the transition from the model (Eq. (6)) to (Eq. (3)). Then, after simplifying the expression (Eq. (25)) and taking its logarithm, we obtain the following expression for the number of software revisions necessary to achieve the desired level of reliability:
where ⌈·⌉ denotes rounding up to the nearest integer and is the averaged software revision efficiency factor.
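Under a simplifying assumption used only for this sketch, namely that every further revision closes the same averaged fraction λ̄ of the remaining gap to P_max, the number of revisions can be found either by the cyclic recalculation described above or in closed form, k = ⌈ln((P_max − P_req)/(P_max − P_cur)) / ln(1 − λ̄)⌉. The two functions below (hypothetical names) implement both and should agree; this is an illustration, not the authors' expressions.

```python
import math


def revisions_by_iteration(p_cur, p_max, p_req, lam_avg, max_revisions=10_000):
    """Cyclic recalculation: apply revisions one by one until P reaches p_req."""
    if not (0.0 < lam_avg < 1.0 and p_cur < p_req < p_max):
        raise ValueError("need 0 < lam_avg < 1 and p_cur < p_req < p_max")
    p, k = p_cur, 0
    while p < p_req:
        p += lam_avg * (p_max - p)  # assumed gap-closing update
        k += 1
        if k > max_revisions:
            raise RuntimeError("required level not reached")
    return k


def revisions_closed_form(p_cur, p_max, p_req, lam_avg):
    """Smallest k with p_max - (p_max - p_cur) * (1 - lam_avg)**k >= p_req."""
    if not (0.0 < lam_avg < 1.0 and p_cur < p_req < p_max):
        raise ValueError("need 0 < lam_avg < 1 and p_cur < p_req < p_max")
    return math.ceil(math.log((p_max - p_req) / (p_max - p_cur)) / math.log(1.0 - lam_avg))


# Hypothetical planning data: current level 0.80, ceiling 0.99, target 0.95, average efficiency 0.2.
print(revisions_by_iteration(0.80, 0.99, 0.95, 0.2), revisions_closed_form(0.80, 0.99, 0.95, 0.2))  # both print 7
```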
Assuming that revisions do not introduce additional errors, we can obtain the formula for the number of errors remaining after u revisions:
4. Fuzzy model of software reliability and security evaluation based on test results
Testing software complexes for compliance with reliability and security requirements is one of the most time-consuming and difficult stages of automation system implementation. This is primarily due to the extreme structural complexity and heterogeneity of modern software. Incomplete information on the software structure and operating principles, heterogeneous composition, the presence of imported components, and insufficient specifications make it difficult to evaluate and predict software reliability. In these cases, traditional approaches to obtaining and forecasting reliability values involve significant costs; therefore, models based on fuzzy set theory that estimate software reliability with practically acceptable accuracy are of immediate interest [26, 27, 28].
Fuzzy models of software reliability evaluation have already been described in the literature. They focus on static and dynamic analysis of the software graph, which is difficult in practice because of the extreme structural complexity of modern software systems and environments. We propose describing the software testing and debugging process by a non-monotonic software reliability growth function that uses fuzzy set theory to take into account the incompleteness of input data.
It can be shown that the non-monotonic software reliability growth function has the following form:
where is the probability of successful software run after
This model depends on three parameters that can be conveniently calculated with the help of the maximum likelihood method. To create the likelihood function, it is reasonable to use the data recorded during the software tests, namely, the order of revisions, results of the software runs (whether any vulnerabilities were detected or not), and number of runs between the revisions.
It is easy to show that the log-likelihood function has the following form:
where is the number of failures in tests and is the number of revisions.
The function is concave (convex upward) and is defined on a convex set; therefore, to find the maximum of the likelihood function efficiently, we can use, for example, the modified steepest descent method with a variable step parameter, which yields the desired parameters of the model (Eq. (28)). The main difficulty in modeling the operational readiness of an automation system is that the software reliability level has to be evaluated under considerable uncertainty, namely:
The fuzziness of cause-and-effect relationships in the automation system, as an ergatic (human-machine) system, prevents a clear distinction between successful and unsuccessful revisions.
Determining the number of revisions as a function of software metrics does not always match reality; the knowledge of the software developers is required.
A number of errors appear as the result of shortcomings of the debugging and update procedures. Some errors are automatically eliminated at the final stages of the software development and do not require correction.
These uncertainties introduce significant subjectivity into software reliability evaluation. Fuzzy set theory allows taking them into account without substantially altering the model (Eq. (3)). This section is primarily aimed at solving this task.
4.1. Development of a fuzzy software reliability and security model
Let us present the information on the debugging process in the form of the set , where is the software revision (). The number of relevant revisions is defined as , where
Fuzzy set representing a set of ordered pairs of revisions of the universal set X and membership functions that characterize the availability of revisions.
Set of relevant revisions , .
In this case, the fuzzy set of relevant revisions will look as follows:
where is the membership function defining the level of confidence in the fact that the number of relevant revisions is equal to .
In general, the membership function can be found using the following expression:
For the purpose of practical calculation, it is convenient to expand the revision membership function in ascending and descending order:
This provides the main calculated ratio: , The number of relevant revisions corresponding to the maximum level of confidence (i.e., to the maximum membership function) is equal to:
The maximum membership function can be calculated in the following way:
By applying the extension (generalization) principle, we can move from the fuzzy set of relevant revisions (Eq. (29)) to the desired fuzzy set of software reliability levels:
where ,,—reliability level defined according to the formula (Eq. (3)).
It is important to note that, since the software reliability level depends monotonically on the number of revisions, the fuzzy set (Eq. (34)) can be represented as a family of hierarchically ordered (nested) crisp sets. According to the decomposition theorem, we have:
Then, by choosing the value α based on the specific software operating conditions and the accuracy of expert estimation, we can obtain the interval (guaranteed) software reliability level:
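A minimal sketch of the two steps above: the extension principle maps a fuzzy set of relevant-revision counts to a fuzzy set of reliability levels (for a strictly increasing map each level simply inherits the membership of its revision count), and the α-cut gives the interval estimate. The membership values and the monotone reliability function are hypothetical placeholders for expert data and for Eq. (3); α = 0.4 mirrors the example that follows.

```python
def extend(fuzzy_set, func):
    """Extension principle: mu_B(y) = max of mu_A(x) over all x with func(x) == y."""
    image = {}
    for x, mu in fuzzy_set.items():
        y = func(x)
        image[y] = max(mu, image.get(y, 0.0))
    return image


def alpha_cut_interval(fuzzy_set, alpha):
    """Smallest and largest elements whose membership is at least alpha."""
    members = [y for y, mu in fuzzy_set.items() if mu >= alpha]
    if not members:
        raise ValueError("the alpha-cut is empty")
    return min(members), max(members)


def reliability(u, p0=0.6, p_max=0.99, lam=0.25):
    """Hypothetical monotone reliability level after u relevant revisions."""
    return p_max - (p_max - p0) * (1.0 - lam) ** u


# Hypothetical expert memberships for "the number of relevant revisions equals u".
revisions = {2: 0.3, 3: 0.7, 4: 1.0, 5: 0.6}
levels = extend(revisions, reliability)
most_likely = max(levels, key=levels.get)          # level with the highest membership
low, high = alpha_cut_interval(levels, alpha=0.4)  # interval (guaranteed) estimate
print(round(most_likely, 4), (round(low, 4), round(high, 4)))
```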
4.2. Example of possible application of fuzzy sets
Below is a simple example of calculating the software reliability level. During the debugging stage, 48 tests were carried out, 5 groups of defects were detected, and the required revisions were performed. After the expert opinions were processed, the information on debugging was obtained in the form of a fuzzy set of revisions:
Having arranged the fuzzy set A by the membership function values, we obtained a fuzzy set of relevant revisions:
After we calculated reliability levels using the formulae (Eq. (3)), we obtained a fuzzy subset of the software reliability levels:
According to the accepted assurance level α=0.4, we have.
Thus, the practical solutions proposed in this work take into account the uncertainty of software development and testing conditions. They yield fairly accurate maximum-confidence and interval estimates of software reliability and security. The analytical expressions simplify software reliability analysis compared with methods based on expert judgment. It is reasonable to apply these results to the planning of system and complex tests.
5. Evaluation models and test planning selection criteria
It should be noted that there is no universal model for software evaluation and test planning. Besides the classes of models described here, studies propose simulation models, structural models, fuzzy models [26, 27], interval models, software dynamic models [31, 32, 33], software/hardware complex models [34, 35], Bayesian model modifications [19, 30, 36, 37], and neural networks applied for certain scientific purposes [38, 39]. To select a suitable model, a number of qualitative and quantitative criteria can be proposed.
The following qualitative criteria can be used:
Ease of application, which primarily concerns how well the model fits the statistics collection system: the input data must be easy to obtain and representative, and the input and output data must be clear to the experts.
Validity: the model must be sufficiently accurate to solve analysis or synthesis tasks in the field of software security. A useful property that allows reducing the input sample is the ability to use a priori information and to integrate data from other models.
Applicability to various tasks. Some models allow estimating a wide range of parameters needed by experts at different stages of the software lifecycle, for instance, reliability values, the expected number of errors of different types, predicted time and financial expenditure, developers' qualification, test quality, software coverage parameters, etc.
Simplicity of implementation including the possibility of automated estimation based on well-known mathematical packages and libraries, model learning after revisions, taking into account the incomplete or incorrect input statistics, and other restrictions of the models.
The following quantitative criteria can be used:
Evaluation accuracy parameters.
Predictive model’s quality parameters (convergence, noise tolerance, prediction accuracy, consistency).
Information criteria of predictive model’s quality (dimensionality, BIC/AIC criteria).
Combined and integral parameters, for instance:
where is the weighting factor of i property of the considered model selected by the expert and . is the characteristic function of the i property.
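The information criteria and the combined parameter can be sketched as follows: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L for a model with k parameters fitted to n observations (lower is better), and the integral score Q = Σ w_i q_i with non-negative expert weights summing to 1. The property names, scores, and log-likelihood values in the usage lines are hypothetical.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik


def integral_score(weights, properties):
    """Weighted sum of property scores; weights must be non-negative and sum to 1."""
    if any(w < 0 for w in weights.values()) or abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * properties[name] for name, w in weights.items())


# Hypothetical expert weights and property scores for two candidate models.
weights = {"accuracy": 0.5, "ease_of_use": 0.3, "applicability": 0.2}
candidates = {
    "model_A": {"accuracy": 0.9, "ease_of_use": 0.6, "applicability": 0.7},
    "model_B": {"accuracy": 0.7, "ease_of_use": 0.9, "applicability": 0.8},
}
scores = {name: integral_score(weights, props) for name, props in candidates.items()}
print(scores, aic(-120.0, 4), bic(-120.0, 4, 300))
```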
As the study has shown, there are many mathematical models that allow estimating software reliability and security at different stages of the lifecycle, which is important for budget planning. In practice, the described classification of models simplifies the selection and integration of models based on the available statistics.
It is important to bear in mind that, owing to the dynamic nature, complexity, and heterogeneity of modern software development projects, the described models cannot meet strict accuracy requirements for all sets of input data and serve only as a basis for intuitive decisions on software test planning. Nevertheless, the results of applying the models are useful both for substantiating the labor intensity of the tests and for preparing reports, which can increase the customer's confidence in the deliverables.
The chapter presents a new class of probabilistic step models for software reliability (and security) assessment that improves the adequacy and accuracy of evaluation for modern multi-version software systems (e.g., open-source software). A key feature of the developed models is that they take into account the decrease in reliability when programs are updated.
These mathematical models have been studied in detail and lead to a method for planning and monitoring software reliability at the stages of preliminary testing, trial operation, acceptance testing, inspection, and testing after modifications. The completeness and consistency of the method are ensured by the fact that the developed models impose no strict limitations on the taxonomy of errors, modifications, tests, and input data.
The proposed approach to test process modeling can be used at different stages of the software lifecycle and integrated into various systems for modeling software reliability and safety. To this end, the chapter proposes qualitative and quantitative criteria for selecting software test models.
It should be mentioned that in information security, the use of mathematical models becomes mandatory when software is evaluated at high assurance levels. This is stipulated by the Common Criteria⁵ methodology, standardized as ISO/IEC 15408.
In the field of software quality and functional safety, the application of mathematical models is welcomed as a way to reduce subjectivity in black-box testing, fuzzing, functional testing, etc. (see the international standard series IEC 61508, IEC 61511, and ISO/IEC 33001, as well as the new Russian standard GOST R 56939). In this respect, IEC 61508-7:2010⁶ is extremely useful because it specifies in detail the relationship between classes of software testing and the use of formal and semiformal models.
- ISO/IEC 15408-3:2008. Information technology. Security techniques. Evaluation criteria for IT security. Part 3.
- IEEE Std 1633-2008 (R2016). Recommended Practice on Software Reliability.
- GOST 28806-90. Software quality. Terms and definitions.
- IEEE Std 1061-1998 (R2009). Standard for a Software Quality Metrics Methodology.
- IEC 61508-7:2010. Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 7: Overview of techniques and measures.