Probabilities corresponding to the ordinal ranking, , rounded to two decimal places.
Comparatively few of the vast number of suggested decision-analytical methods have been widely spread in actual practice. The majority of those methods call for exact and accurate numbers as input, which could be one of several reasons for this lack of actual use; people frequently seem to be unfamiliar with, or reluctant to express those, in a sense, “true” values required. Many alternative methods to resolve this complication have been suggested over the years, including procedures for dealing with incomplete information. One way, which has proliferated for a while, is to introduce so-called surrogate numbers in the form of ordinal ranking methods for multi-criteria weights. In this chapter, we show how those can be adapted for use in probability elicitation. Furthermore, when decision-makers possess more information regarding the relative strengths of probabilities, that is, some form of cardinality, the input information to ordinal methods is sometimes too restricted. Therefore, we suggest a testing methodology and analyze the relevance of a set of cardinal ordering methods in addition to the ordinal ones.
- decision analysis
- probability elicitation
- cardinal ranking
- rank order
- imprecise probability
Elicitation of subjective beliefs has been applied in numerous areas, such as game and decision theory [1, 2], agriculture, statistics, and various disciplines of economics [5, 6, 7], and many methods have been designed for elicitation purposes [8, 9, 10]. In these contexts, it is often assumed that at least experts in the various areas are capable of acting rationally and can provide reliable information as long as probabilities are elicited sensitively. These assumptions contradict the fact that even expert estimates may differ significantly from the true probabilities and that there are still no universally accepted methods of probability elicitation available. The process of eliciting adequate quantitative information is one of the substantial challenges within decision analysis [11, 12].
In a classical framework, numerical probabilities are assigned to the different events in tree representations of decision problems. The assignments are made once the parameters whose values need to be elicited, and the people who are to provide those estimates, have been identified. Domain experts could, for example, be asked to express their beliefs about the likelihood of a particular event in probabilistic form. Such beliefs cannot be measured objectively, and neither should they be judged in such a manner. Besides, the success of an elicitation process depends on how well a representation of the present subjective opinions can be constructed rather than on some set of objectively true values [14, 15, 16].
Methods for eliciting utilities and probabilities have been thoroughly investigated, resulting in a large number of recommendations and handbooks on the subject. Procedures range from direct elicitation, gamble, and lottery techniques to more elaborate methods that reduce biases, aversions, and a multitude of other causes of errors while producing as reliable estimates as possible (cf., e.g., [17, 18, 19, 20, 21, 22]). Here, it is generally assumed that procedures for elicitation should give rise to adequate preference orders, but this assumption is nevertheless often violated in empirical studies (cf., e.g., [23, 24, 25]). A multitude of methods for ordinal rankings or interval approaches have been suggested to provide more realistic models. The goal is to be able to utilize the information the decision-makers can supply without forcing them to express unrealistic, misleading, or meaningless statements.
To elicit probabilities, methods based on ordinal rankings already constructed for obtaining weights for multi-criteria decision analysis (MCDA) (see [26, 27, 28]) can be adopted. The fundamental idea is that ordinal information that stems from importance can be converted to a set of normalized surrogate numbers, the values of which are consistent with the elicited ordinal rankings. The actual conversion can be done using a variety of methods, such as rank sum (RS), rank reciprocal (RR), and centroid-based methods (ROC), all originally used for handling criteria weights in MCDA. However, even in MCDA, the use of only ordinal information is sometimes perceived as being too vague or imprecise, resulting in a lack of confidence in the alternatives’ final values or too large a class of non-dominated alternatives.
In this chapter, we propose a set of methods that allow for more expressive information as input when eliciting probabilities, while maintaining the relative correctness and simplicity of ordinal ranking procedures. We compare and discuss a set of significant features, including correctness and relevance, of various extensions to some existing ranking methods. Following a brief recapitulation and adaptation of some ordinal ranking methods in the next section, we continue with cardinal ranking methods and discuss a set of appealing candidates for probability elicitation. Using simulations, we investigate some properties of the treated methods and conclude by pointing out, according to the results, a particularly attractive method for eliciting probabilities.
2. Ordinal ranking methods in MCDA
Different elicitation formalisms by which a decision-maker can express preferences in MCDA decision situations have been proposed. Such formalisms are sometimes based on scoring points, as in point allocation (PA) or direct rating (DR) methods. In PA, the decision-maker is given some number of points, for example, 100, to distribute over a set of criteria or consequences, depending on the type of decision. Hence, for $N$ criteria or consequences, there are $N-1$ degrees of freedom (DoF). Direct rating methods, on the other hand, put no limit on the number of points to be allocated.1 The decision-maker allocates as many points as desired and the points are subsequently normalized. Thus, in DR, there are $N$ degrees of freedom for $N$ criteria. Regardless of elicitation method, the assumption is that all elicitations are made relative to a distribution held by the decision-maker.2
Surrogate methods are utilized in [26, 27, 28] and many others for handling such problems. Regardless of method, however, a key property must be to retain as much information as possible in the surrogate numbers, yet accommodating for the various constraints required by certain types of values.
Stillwell et al. compare a set of different methods for eliciting surrogate numbers from ordinal rankings alone, based on the idea of maximizing the power to discriminate between values. Among those are rank sum and rank reciprocal, for which surrogate weights are derived solely from the rank order of the attributes. Take a simplex $S_w$ generated by $w_1 > w_2 > \dots > w_N$, $\sum_{i=1}^{N} w_i = 1$, and $0 \le w_i$.3 Assign an ordinal number to each item ranked, starting with the highest ranked item as number 1. Denote by $i$ the ranking number among the $N$ items to rank. Then the rank sum (RS) surrogates for all $i = 1, \dots, N$ are defined by

$$w_i^{\mathrm{RS}} = \frac{N + 1 - i}{\sum_{j=1}^{N} (N + 1 - j)} = \frac{2(N + 1 - i)}{N(N + 1)}.$$
The surrogate numbers produced by the rank reciprocal (RR) method are, as the name implies, based on the reciprocal of the rank of each item. Let the number 1 correspond to the highest ranked item, the number 2 to the second highest ranked item, and so on. Then, for the $i$th ranked item, the RR surrogate, given a total of $N$ items, is obtained by

$$w_i^{\mathrm{RR}} = \frac{1/i}{\sum_{j=1}^{N} 1/j}.$$
A decade later, Barron suggested a method based on the vertices of the simplex of the feasible space. The rank order centroid (ROC) weights are the components of the centroid (mass point) of the simplex $S_w$. The ROC weights for the ranking number $i$ among $N$ items to rank are given by

$$w_i^{\mathrm{ROC}} = \frac{1}{N} \sum_{j=i}^{N} \frac{1}{j}.$$
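The centroid interpretation of ROC can be checked numerically. The following is a minimal sketch, assuming that the vertices of the closed simplex $w_1 \ge w_2 \ge \dots \ge w_N \ge 0$, $\sum w_i = 1$, are the points with $k$ equal components $1/k$ followed by zeros; the function names are illustrative and not taken from the chapter. It compares the average of those vertices with the closed form $\frac{1}{N}\sum_{j=i}^{N} 1/j$ using exact fractions.

```python
from fractions import Fraction


def simplex_vertices(n):
    """Vertices of the ranked simplex: k equal shares of 1/k, then zeros (assumed form)."""
    return [[Fraction(1, k)] * k + [Fraction(0)] * (n - k) for k in range(1, n + 1)]


def centroid(points):
    """Component-wise average of a list of equally long vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def roc(n):
    """Closed-form rank order centroid numbers for ranks 1..n."""
    return [sum(Fraction(1, j) for j in range(i, n + 1)) / n for i in range(1, n + 1)]


for n in (3, 6):
    # The two constructions should agree exactly for every n.
    print(n, centroid(simplex_vertices(n)) == roc(n))
```

Exact fractions are used here so that the comparison does not depend on floating-point rounding.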
However, RS, RR, and ROC perform well only for specific assumptions on decision-maker behavior. If we assume that the decision-maker stores preferences in a way similar to a given point sum, that is, with normalization taken into account, there are $N-1$ degrees of freedom (DoF) for $N$ items. On the other hand, if we assume that the decision-maker stores preferences in a way that puts no limit on the total number of points (or mass) allocated, and the normalization is made afterward, then there are $N$ degrees of freedom for $N$ criteria. It remains an open question how a decision-maker perceives the basis of a particular preference order: whether the linear dependence in $N-1$ DoF models is accounted for, or whether preference values are allocated with no particular limits in accordance with $N$ DoF models. Surrogate numbers obtained by the RR and ROC models agree with a preference structure based on $N-1$ degrees of freedom, while the RS model conforms to a preference structure based on $N$ degrees of freedom. Due to this apparent difference, a model, SR, in which features from both RS and RR were incorporated, has been proposed. The SR method is an additive combination of the sum and the reciprocal functions as in

$$w_i^{\mathrm{SR}} = \frac{1/i + (N + 1 - i)/N}{\sum_{j=1}^{N} \left( 1/j + (N + 1 - j)/N \right)}.$$
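| Method | Rank 1 | Rank 2 | Rank 3 |
|---|---|---|---|
| Rank sum (RS) | 0.50 | 0.33 | 0.17 |
| Rank reciprocal (RR) | 0.55 | 0.27 | 0.18 |
| Rank order centroid (ROC) | 0.61 | 0.28 | 0.11 |
| Sum reciprocal (SR) | 0.52 | 0.30 | 0.17 |

| Method | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 | Rank 6 |
|---|---|---|---|---|---|---|
| Rank sum (RS) | 0.29 | 0.24 | 0.19 | 0.14 | 0.10 | 0.05 |
| Rank reciprocal (RR) | 0.41 | 0.20 | 0.14 | 0.10 | 0.08 | 0.07 |
| Rank order centroid (ROC) | 0.41 | 0.24 | 0.16 | 0.10 | 0.06 | 0.03 |
| Sum reciprocal (SR) | 0.34 | 0.22 | 0.17 | 0.13 | 0.09 | 0.06 |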
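The tabulated numbers for three and six ranked items can be reproduced with a few lines of code. This is a minimal sketch, assuming the standard definitions of the four ordinal methods (rank sum, rank reciprocal, rank order centroid, and the additive sum reciprocal combination); the function names are illustrative only.

```python
def rank_sum(n):
    """RS: numerators N + 1 - i, normalized to sum to one."""
    total = n * (n + 1) / 2
    return [(n + 1 - i) / total for i in range(1, n + 1)]


def rank_reciprocal(n):
    """RR: numerators 1 / i, normalized to sum to one."""
    total = sum(1 / j for j in range(1, n + 1))
    return [(1 / i) / total for i in range(1, n + 1)]


def rank_order_centroid(n):
    """ROC: (1 / N) * sum of 1 / j for j = i..N."""
    return [sum(1 / j for j in range(i, n + 1)) / n for i in range(1, n + 1)]


def sum_reciprocal(n):
    """SR: numerators 1 / i + (N + 1 - i) / N, normalized to sum to one."""
    raw = [1 / i + (n + 1 - i) / n for i in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]


METHODS = [
    ("RS", rank_sum),
    ("RR", rank_reciprocal),
    ("ROC", rank_order_centroid),
    ("SR", sum_reciprocal),
]

for n in (3, 6):
    for name, method in METHODS:
        # Round to two decimals, as in the tables.
        print(n, name, [round(x, 2) for x in method(n)])
```

All four functions return numbers in rank order, so the first element always belongs to the highest ranked item.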
In theory, the choice of elicitation method should preferably be based on the manner in which a decision-maker constructs a preference order; whether it is based on an $N-1$ DoF or $N$ DoF model, or even a mixed model. However, since we cannot know the ratio of $N-1$ DoF to $N$ DoF employed in the mind of the decision-maker, an elicitation method, to be robust, should work about equally well regardless. The SR method, assuming the principal features of both RS and RR, was found to be the method that best accommodates such a need.
In the following, the methods for weight elicitation within MCDA presented above will be augmented with information denoting the relative difference between adjacent items and modified to meet the requirements of probability elicitation. We adhere to a standard, one-level decision-tree model, in which each of $M$ alternatives has $N$ consequences. Hence, there are $M$ times $N$ consequences in total.
3. Cardinal ranking methods
Providing ordinal rankings puts fewer demands on decision-makers; they are, in a sense, effort saving. Furthermore, there are techniques such as those mentioned earlier for handling ordinal rankings with some success. For use in probability elicitation, the same ordering is asked of the decision-maker, but this time in reference to how probable events are as outcomes of a chosen alternative of action.4 Nevertheless, decision-makers might, in many cases, have more knowledge of the decision situation, even if the information still is not precise. For instance, cardinal probability relation information may implicitly exist, entailing that the surrogates may not really reflect what the decision-maker actually means by a particular ranking.
To improve the conformance of an ordinal ranking to the true subjective beliefs of a decision-maker, information such as the relative differences between adjacent items needs to be accounted for. Given a ranking of some number of consequences, relative differences in probability comprise such information. Furthermore, methods that allow for such details can also handle probabilities considered equal, something which purely ordinal methods cannot.
The following notations together with the suggested interpretations are used to exemplify the proposed methods. The symbols denote the relative strength of the difference in probability between consequences.5
“slightly more probable”
“much more probable”
To derive probabilities for consequences using cardinal methods, start by ranking the consequences in order of probability, giving the most probable consequence rank one, and the least probable consequence rank $N$. Let $\pi(i)$ be the number of difference steps between probabilities $p_i$ and $p_{i+1}$, denoted by $p_i >_{\pi(i)} p_{i+1}$. Setting $\pi(i) = 1$ is equivalent to $p_i > p_{i+1}$ as in the ordinal case. The total number, $Q$, of positions on the scale of difference steps for $N$ probabilities will then be $Q = \sum_{i=1}^{N-1} \pi(i) + 1$, and for any probability $p_i$, $i > 1$, the position $s(i)$ on this scale from 1 to $Q$ is defined by $s(i) = s(i-1) + \pi(i-1)$, and $s(1) = 1$.6
Modifying the above ordinal methods for probability elicitation can now be done without difficulty. The key is to normalize the values of the cardinal rank positions such that higher positions result in lower probabilities. In the cardinal version of RS (CRS), the probabilities should mirror the differences in cardinal rank positions. Hence, the CRS probabilities are given by

$$p_i^{\mathrm{CRS}} = \frac{Q + 1 - s(i)}{\sum_{j=1}^{N} (Q + 1 - s(j))},$$

where $s(i)$ are the cardinal rank positions derived from the cardinal differences provided by the decision-maker such that $s(i) < s(j)$ if and only if $p_i > p_j$. The cardinal variation of rank reciprocal (CRR) is defined in a similar fashion, and the CRR probabilities are obtained by

$$p_i^{\mathrm{CRR}} = \frac{1/s(i)}{\sum_{j=1}^{N} 1/s(j)},$$

with the usual property that a higher probability is assigned to lower ranking numbers. ROC is generalized in the same way, and the corresponding cardinal rank order centroid (CRC) probabilities are obtained as

$$p_i^{\mathrm{CRC}} = \frac{\sum_{k=s(i)}^{Q} 1/k}{\sum_{j=1}^{N} \sum_{k=s(j)}^{Q} 1/k}.$$

Finally, SR is generalized in the same way, and the corresponding cardinal SR (CSR) probabilities are obtained as

$$p_i^{\mathrm{CSR}} = \frac{1/s(i) + (Q + 1 - s(i))/Q}{\sum_{j=1}^{N} \left( 1/s(j) + (Q + 1 - s(j))/Q \right)},$$

which is a generalization similar to the others. Thus, using the idea of cardinal steps, ordinal methods are easily transformed to their respective cardinal counterparts. See Table 3 for an example of probabilities corresponding to a certain cardinal ranking.
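The cardinal variants can be sketched in code as follows. This is a minimal illustration under stated assumptions: rank positions start at 1, each difference step moves the next item that many positions down the scale, a step of 0 encodes equal probabilities, and the scale length equals the last position. The step vector (2, 1, 2, 0, 3) is a hypothetical input chosen for illustration; rounded to two decimals, it yields the CRS, CRR, CRC, and CSR values listed in this chapter's table of cardinal probabilities. Function names are illustrative.

```python
def positions(steps):
    """Cardinal rank positions: first item at 1, each step added cumulatively."""
    s = [1]
    for step in steps:
        s.append(s[-1] + step)
    return s


def normalize(raw):
    total = sum(raw)
    return [r / total for r in raw]


def crs(steps):
    s = positions(steps)
    q = s[-1]  # scale length: sum of all steps plus one
    return normalize([q + 1 - si for si in s])


def crr(steps):
    return normalize([1 / si for si in positions(steps)])


def crc(steps):
    s = positions(steps)
    q = s[-1]
    return normalize([sum(1 / k for k in range(si, q + 1)) for si in s])


def csr(steps):
    s = positions(steps)
    q = s[-1]
    return normalize([1 / si + (q + 1 - si) / q for si in s])


steps = [2, 1, 2, 0, 3]  # hypothetical difference steps between six ranked probabilities
for name, method in [("CRS", crs), ("CRR", crr), ("CRC", crc), ("CSR", csr)]:
    print(name, [round(p, 2) for p in method(steps)])
```

With every step set to 1, the positions become 1, 2, ..., N and each function reduces to its ordinal counterpart, which gives a simple sanity check on an implementation.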
3.1. Application to a decision problem
To show the merits of cardinal methods for elicitation of both probabilities and values, we present a decision on the choice of programming language, adapted from . Due to resource constraints in a current Prolog implementation, the staff considered two options:
$A_1$: Rewriting the whole system in C.
$A_2$: Trying to find an implementation of Prolog that could handle the system.
After thoroughly discussing the pros and cons of the two options, they arrived at the following possible scenarios. For option $A_1$, the staff concluded that either ($c_{11}$) a prototype in C would be ready on time or ($c_{12}$) a prototype in C would be slightly delayed due to external circumstances. For option $A_2$, ($c_{21}$) a prototype in Prolog would be ready on time, ($c_{22}$) a prototype in Prolog would be slightly delayed due to external circumstances, or ($c_{23}$) only fractions of a prototype in Prolog would be ready on time.
Having considered the probabilities and the values of the consequences, the staff arrived at the following assumptions, where represents the probability of , and represents the value of . For option , was thought to be at least 0.67, and consequently, would be at most 0.33. For option , would lie somewhere between 0.40 and 0.90, hence the sum of and could range from 0.10 to 0.60.
With regard to value, was the most desirable with being slightly less so. Both and would be clearly better than , which in turn was slightly better than . By far, would be the worst consequence.
Applying a cardinal ranking to the above probabilities and values, the staff agreed on the following:
Using a purely ordinal ranking, one would end up with the following, less precise ordering:
Computing the values of the rankings above using SR and CSR yielded the results presented in Tables 4 and 5. Let be the expected value of option given elicitation method , we then have and , resulting in option being the preferred alternative, while and resulting in the contrary.
|Rank sum (CRS)||0.29||0.23||0.19||0.13||0.13||0.03|
|Rank reciprocal (CRR)||0.49||0.16||0.12||0.08||0.08||0.05|
|Rank order centroid (CRC)||0.45||0.21||0.16||0.09||0.09||0.02|
|Sum reciprocal (CSR)||0.37||0.20||0.17||0.11||0.11||0.04|
Possible alternatives in the case of ordinal rankings, such as strict uses of either of the rankings or , do not affect the preference order of the options. However, a change from to when applying the cardinal method would alter the outcome, indicating the decision problem’s sensitivity to uncertainty, something which is not reflected when using a purely ordinal ranking of the consequences.
When treating the decision problem in the professional software program DecideIT , the probability values from the original assumptions can be used, resulting in the constraints:
Looking at the expected value graph in Figure 2, we note that although about 50% contraction is needed, it can be argued that either the assumptions need to be revised or option should be the preferred one. Consequently, a cardinal ranking of the probabilities and the values seems to resonate better with a more detailed examination, in which imprecise numbers are taken into account, than a purely ordinal one.
4. Assessing models for cardinal relations
Given that we have a set of cardinal methods as in the previous section, how can they be validated? For ordinal relations in MCDA, simulation studies similar to [27, 37, 38, 39] and others have become a kind of de facto standard. The simulations herein are based on the fundamental idea that a set of genuine, or “true,” probabilities, in fact, exists in the mind of the decision-maker. No elicitation method is capable of completely mirroring these probability values, but in the simulations, the potency of the probability ranking approaches is judged by comparing the “true” values to those elicited by the methods mentioned earlier.
The modeling caters to the two extremes of decision-makers’ mindsets outlined earlier in the way the decision problem vectors are randomly generated. Following an $N$ DoF model, a vector is generated, where the components are kept within [0%, 100%], and subsequently normalized, that is, a process with $N$ degrees of freedom. Details on this kind of simulation can be found, for example, in . For an $N-1$ DoF model, the components are generated such that they sum to 100% already from the outset; that is, using a process of $N-1$ degrees of freedom. This simulation is based on a homogeneous $N$-variate Dirichlet distribution generator. Details on this kind of simulation can be found, for example, in .
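The two generation processes can be sketched as follows. This is an illustration only, assuming that "homogeneous" means a Dirichlet distribution with all concentration parameters equal to 1 (uniform over the simplex), which can be sampled by normalizing independent unit-rate exponential draws. The function names are hypothetical and the chapter's own generator may differ in detail.

```python
import random


def n_generator(n, rng):
    """N DoF: draw each component independently on [0, 1), normalize afterward."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def n_minus_1_generator(n, rng):
    """N-1 DoF: flat Dirichlet sample, assumed here to be Dirichlet(1, ..., 1)."""
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
for generator in (n_generator, n_minus_1_generator):
    # Sort in descending order to obtain a ranked "true" probability vector.
    vector = sorted(generator(6, rng), reverse=True)
    print(generator.__name__, [round(p, 3) for p in vector])
```

Both functions return vectors that sum to one; they differ only in the distribution of the components before normalization, which is what distinguishes the two cognitive models.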
The “true” probabilities in the minds of decision-makers might, of course, follow a distribution different from the ones used in this study, and there might eventually be models available to elicit values following those. Nonetheless, the crucial observation is that the validity and reliability of the results of the simulations are highly dependent on how the minds of decision-makers are modeled. Although the difference in the number of degrees of freedom is only one of several parameters related to cognitive behavior, it still offers a meaningful way of distinguishing between cognitive models.
4.1. Biases of simulation studies
The results of the simulations are highly dependent on the type of generator used. A generator corresponding to an $N$ DoF model is referred to as an $N$-generator. In the same manner, a generator corresponding to an $N-1$ DoF model is referred to as an $(N-1)$-generator. When the $N$-generator is applied, probabilities elicited by the RS method outperform those elicited by other methods, not because the RS method is superior per se, but because it produces numbers at regular intervals. Likewise, ROC outperforms the other methods when an $(N-1)$-generator is used, because the values elicited by the ROC method are similarly skewed toward the lower end of the interval.
In actual fact, it is impossible to determine whether decision-makers, in general (or even some), elicit values in particular accordance with $N-1$ or $N$ DoF representations of their knowledge. As a group, or as individuals, it is possible that they completely adhere to either one or that they follow an arbitrary mix of the two. Due to this uncertainty pertaining to the cognitive processes of decision-makers, a rank ordering mechanism, to be robust, must elicit values that conform to both types of representations reasonably well. Therefore, to find the method that yields the most robust and efficient assignments, the evaluations in this study employ both types of generators and combinations of them.
4.2. Comparing the methods
To evaluate the validity of the RS, RR, and ROC methods for multi-criteria weight elicitation, Barron and Barrett simulated a large set of “true” weights, using an $(N-1)$-generator. Based on those, they then produced a corresponding set of surrogate weights for each of the elicitation methods. As shown earlier, the generation procedure does have significant effects with regard to such a comparison. The set of “true” weights is dependent on how we model the minds of decision-makers. Barron and Barrett presented a computer simulation consisting of four main steps, which for probability elicitation is modified as follows:7
4.2.1. Generation procedure
For a decision problem with $M$ alternatives, where each alternative can result in one of $N$ consequences, generate $M$ probability vectors, $p_1, \dots, p_M$, in $N$ dimensions. These vectors contain the so-called true probabilities. Then, for each elicitation method $\mu'$, produce vectors $p_1^{\mu'}, \dots, p_M^{\mu'}$, according to the order of the “true” probabilities.
After that, generate an $M \times N$ matrix of random numbers $v_{ij}$ corresponding to consequence $j$ of the $i$th alternative. These are the values of each consequence.
Let $p_{ij}^{\mu}$ be the probability obtained by method $\mu$ for consequence $j$ of alternative $i$ (where $\mu$ is either $\mu'$ or “true”). For each method $\mu$ and each alternative $A_i$, calculate the expected value $E(A_i^{\mu}) = \sum_{j=1}^{N} p_{ij}^{\mu} v_{ij}$. Note the rank order of $E(A_i^{\mu})$ for each $\mu$, that is, the preference order of the alternatives for each method.
Lastly, determine if method $\mu'$ resulted in the same most preferred alternative as “true.” If that is the case, then record a hit.
The abovementioned procedure (a simulation round) is repeated a large number of times, with the ratio of the number of hits to the total number of simulation rounds used as a measure of efficacy. In some MCDA studies, two additional measures of efficacy have been reported, namely the average value loss and the average proportion of the achieved maximum value range. These measures do not, however, add anything in particular in terms of value due to their strong correlation with hit ratio.
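The procedure above can be sketched as a small simulation. This is a simplified illustration, not the study's actual implementation: it assumes a single ordinal method (sum reciprocal), a flat Dirichlet generator for the "true" probabilities, values drawn uniformly on [0, 1), and no ties between probabilities. All function names and parameter choices are hypothetical.

```python
import random


def sum_reciprocal(n):
    """Ordinal SR surrogate numbers for ranks 1..n."""
    raw = [1 / i + (n + 1 - i) / n for i in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def flat_dirichlet(n, rng):
    """Uniform draw from the probability simplex (assumed "true" generator)."""
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def surrogate_for(true_probs, surrogate):
    """Give each consequence the surrogate number matching its rank among the true probabilities."""
    order = sorted(range(len(true_probs)), key=lambda j: true_probs[j], reverse=True)
    result = [0.0] * len(true_probs)
    for rank, j in enumerate(order):
        result[j] = surrogate[rank]
    return result


def expected_value(probs, values):
    return sum(p * v for p, v in zip(probs, values))


def simulation_round(m, n, rng):
    """One round: m alternatives, n consequences each; True if the best alternative agrees."""
    surrogate = sum_reciprocal(n)
    true_ev, method_ev = [], []
    for _ in range(m):
        p_true = flat_dirichlet(n, rng)
        values = [rng.random() for _ in range(n)]
        true_ev.append(expected_value(p_true, values))
        method_ev.append(expected_value(surrogate_for(p_true, surrogate), values))
    best_true = max(range(m), key=lambda i: true_ev[i])
    best_method = max(range(m), key=lambda i: method_ev[i])
    return best_true == best_method


def hit_ratio(m, n, rounds, seed=0):
    """Share of rounds in which the method picks the same best alternative as the true probabilities."""
    rng = random.Random(seed)
    hits = sum(simulation_round(m, n, rng) for _ in range(rounds))
    return hits / rounds


print(hit_ratio(m=3, n=3, rounds=2000))
```

Swapping in another surrogate function, or another generator for the "true" vectors, changes only the two helper functions at the top; the hit-counting logic stays the same.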
Using an $(N-1)$-generator MCDM simulation model over the simplex $S_w$, ROC outperforms the other two methods. A study by Roberts and Goodwin, however, employed a different generating function, in which a fixed number, say 100, is given to the most important criterion and the others are uniformly generated as U[0,100]. This $N$-generator is, of course, different from $(N-1)$-generators based on a Dirichlet distribution, and their simulation study instead yields the result that RS outperforms ROC, with RR in third place. As we see later, using surrogate weights for probability elicitation yields similar results.
4.3. Simulations of the cardinal surrogate numbers
This chapter focuses on the performance of cardinal surrogate weights in probability elicitation. The simulations included 20 different scenarios: 3, 6, 9, 12, or 15 alternatives, each combined with 3, 6, 9, or 12 consequences. Each combination was simulated 10 times, and this was then repeated 10,000 times, for a total of 20 × 10 × 10,000 = 2,000,000 decision problems. The probability vectors for the $N$ DoF model were generated using a standard round-robin random number generator with subsequently normalized numbers, and the probability vectors for the $N-1$ DoF model were generated using an $N$-variate Dirichlet distribution. The value vectors were generated on a uniform interval but left unscaled, analogous to MCDA studies such as . Compared to alternative value distributions, there was no significant difference in the results.
A subset of the results, using a 50% combination of $N$ DoF and $N-1$ DoF models, is presented in the tables that follow. As described earlier, the numbers denote the ratio of the number of times the most preferred alternative according to method $\mu'$ coincides with the most preferred alternative obtained from the “true” probabilities to the total number of simulation runs.8
The marginal utility of various levels of cardinal expressibility was evaluated by varying the levels of maximum cardinal differences in the simulations. The numbers, say , in the “Symbols” column denote the maximum -index, , of the set of symbols, such that , for . The results are obtained from the ordinal counterparts when . Hence, implies , means , and so on. The actual results of the methods, the hit ratios, are given in percentages.9
Although there is a clear advantage of introducing the possibility of equivalence () between probabilities, the results, in general, indicate diminishing returns with an increase of the maximum -index, in particular, when the number of possible consequences for each alternative is high. For example, at nine or more consequences, cardinal probability elicitation with only and is preferable for CRC. Also, the cognitive burden on decision-makers tends to increase with the granularity of the scale. Hence, a maximum -index of 2 seems to provide enough expressional power for probability elicitation (Tables 6–12).
|Three consequences and three alternatives||0||86.9||86.8||86.7||87.0|
|Three consequences and 15 alternatives||0||70.9||69.2||69.1||69.5|
|Six consequences and six alternatives||0||79.4||79.8||78.0||80.8|
|Six consequences and 12 alternatives||0||75.0||73.8||72.3||75.1|
|Nine consequences and nine alternatives||0||76.7||76.5||72.2||78.3|
|Twelve consequences and six alternatives||0||80.1||80.5||73.2||82.2|
|Twelve consequences and 12 alternatives||0||74.8||74.4||67.9||76.3|
The results show that cardinal methods markedly outperform the ordinal methods. Among the cardinal methods, it is CRC and CSR that provide the best outcomes, but the difference between these lies within this investigation’s margin of error.
5. Summary and conclusion
Elicitation methods available today are often either too cognitively demanding and require too much time and effort or unable to use the available information. The aim of this study was to offer decision-makers a set of methods for probability elicitation with a reasonable balance between simplicity and usability on the one hand and correctness and accuracy on the other hand. In particular, we strived to reduce the issues of applicability by loosening the requirements of precise information, while allowing for more details than what ordinal methods can handle. Also, the methods should be relatively robust and applicable to a wide range of decision problems.
By augmenting a set of ordinal elicitation methods, originally developed for weight elicitation within MCDA, with a notion of a difference between adjacent probabilities, we arrived at a collection of cardinal probability elicitation methods. The robustness of the methods was evaluated using a large number of simulated decision problems, where we also accommodated two different cognitive models, based on the degrees of freedom used during the ranking, including a mix of them. The results of the simulations point toward a significant improvement of cardinal methods over purely ordinal ones, in particular due to the introduction of equality between probabilities. Among the cardinal methods, CSR and CRC seem to provide the most robust results. CSR generalizes SR from  by also taking the cardinal differences of the probabilities into account in a more straightforward way than, for example, .
More fine-grained expressions seem to produce diminishing returns when the number of consequences of each alternative becomes high. For alternatives with 12 consequences or less, a cardinal method with the set seems to supply the decision-maker with adequate options for producing quite reliable probability elicitations. For alternatives with more than 12 consequences, the reduced set seems to provide a sufficient granularity.
In conclusion, cardinal methods rather than ordinal ones should be preferred for eliciting probabilities when applicable. More specifically, CSR and CRC have been shown to produce surrogates, which outperform those of their competitors. They keep decision-makers from having to provide too much detail, something which has turned out to be difficult for decision-makers in general, while at the same time reducing the risk of neglecting available information.
This research was funded by the Swedish Research Council FORMAS, project number 2011-3313-20412-31, as well as by Strategic funds from the Swedish government within Information and Communications Technology (ICT)—The Next Generation.
Conflict of interest
All authors report no conflicts of interest relevant to this chapter.
- It could be that the sum is limited in direct rating methods as well but then as a consequence of a uniform limit to the individual numbers.
- See, for example, [32, 33] and some others from the same authors on methodological and cognitive aspects of inexactness in decision-making.
- Unless stated otherwise, the component vectors of decision problems will be modeled as simplexes, $S_x$, consisting of $x_1 > x_2 > \dots > x_N$, where $\sum x_i = 1$ and $0 \le x_i$.
- For each choice of action, we assume a decision node to be the root of a standard two-level decision tree with each branch representing an alternative, which, in turn, is followed by a set of exhaustive and mutually exclusive events. It is the probabilities of those events that the decision-maker is asked to rank. Such a setup can easily be generalized to decision trees with multiple levels.
- Although using verbal interpretations for illustrative purposes, we do not intend to discuss issues related to difference values and their respective meanings in relation to probabilities. In an actual implementation of the method, we consider cardinal input obtained from graphical sliders in a software tool.
- Here, $p_i = P(c_i)$, that is, the probability that consequence $i$ obtains.
- For simplicity in generation procedures but without loss of generality, assume that all alternatives have the same number of consequences, N.
- Two alternative sets of measurements, not shown in this chapter due to the strong correlation with the hit ratio, exist. One is the number of times the three most preferred alternatives obtained using μ′ as elicitation method agree with the three most preferred alternatives according to the “true” probabilities (i.e., the “podium”). A second is the number of times the overall rank of the alternatives using method μ′ agrees with the overall rank based on the “true” probabilities.
- The results of the sets of 10 runs yielded a standard deviation of around 0.2–0.3%.