Reliability and Evaluation of Identification Models Exemplified by a Histological Diagnosis Model



Introduction
Expert systems, or, more precisely, decision support systems, are valuable tools for structuring the results of scientific research and translating them into knowledge. The decision support system Determinator has been used for several years as a platform for models to identify subjects [1,2]. The system is based on the two main procedures for identification [3,4]: a single access key (tree) and a free access key (matrix). The latter makes it possible to calculate the match between the subject chosen by the user and the objects included in the data model, based on a range of characteristics. In addition, a matrix allows the user to make selections, to filter the set of available objects, and to compare two objects with respect to their variability.
Data models for Determinator can be constructed using the Developer, which is part of the Determinator platform. Besides defining the objects (descriptions, illustrations and labels), the characteristics, and the connections between them (the matrix), the Developer also allows the user to evaluate the structure of the data model. Several parameters and metadata for the evaluation of a data model are part of the Developer.
This chapter provides the logical basis of the Determinator platform and introduces the background and calculation of four parameters for the evaluation of data models: redundancy, uniqueness, separation capability, and the coverage of the variability space (of the total data model or of a single object). The way in which these parameters are developed and applied is demonstrated using a real case concerning the diagnosis of illegal hormone treatment of veal calves [5,6]. The applicability of the parameters is discussed, and the development of a specific case (histological diagnosis) in a general platform (Determinator) is evaluated.

Conventions
A data model developed in the framework of the DSS Determinator includes the following tables:
• List of features, with image file names and descriptions;
• Groups of features, with names and descriptions;
• List of targets, with image file names, descriptions and labels;
• Match table, with the features on the rows and the targets on the columns;
• Tree information per node, with descriptions and image file names.
A data model consists of n features (denoted by i, j) to describe m objects (targets in the terminology of Determinator, denoted by p, q, r, s). Every feature consists of two or more feature states (denoted by k, l, and K for composite features). The basic principles are defined using first-order logic [4,7].

Free access key
Every cell in the matrix Target × Features contains a decision rule. These decision rules describe the logical relationship between the feature states and the targets by specifying the valid feature states for each target. A feature state can apply to one or more targets, which implies that the relation need not be unique:
$$F_{i,k} \rightarrow T_p \lor \dots \lor T_s$$
with $F_{i,k}$ as feature state k of the i-th feature, and $T_p, \dots, T_s$ as a series of targets which can be assigned individually. Conversely, combining more specific feature states can limit the choice of targets:
$$F_{i,k} \land F_{j,l} \rightarrow T_p$$
with $F_{i,k}$ as feature state k of the i-th feature, $F_{j,l}$ as feature state l of the j-th feature, and $T_p$ as target.
The use of different states of a feature can add to the separation capability of that feature. Assuming three feature states:
$$F_{i,1} \rightarrow T_p, \qquad F_{i,2} \rightarrow T_p \lor T_q, \qquad F_{i,3} \rightarrow T_q$$
In this logical distribution, feature state $F_{i,1}$ exclusively identifies target p and feature state $F_{i,3}$ exclusively identifies target q, whereas feature state $F_{i,2}$ can identify either target p or target q. This dual relationship can be represented as an overlap in a Venn diagram.
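To make these decision rules concrete, the following minimal Python sketch encodes, for each target, the set of valid states per feature. This encoding and the names `targets_for_state` and `consistent_targets` are hypothetical illustrations, not Determinator's internal format. The sketch shows how a single feature state can point to several targets, and how combining states narrows the choice.

```python
from typing import Dict, List, Set

# Hypothetical encoding of a free access key: for every target, the set of
# valid state indices per feature. State k of feature i "identifies" a target
# when k is in that target's valid set for feature i.
Matrix = Dict[str, Dict[str, Set[int]]]


def targets_for_state(matrix: Matrix, feature: str, state: int) -> List[str]:
    """All targets identified by one feature state (the disjunction T_p or ... or T_s)."""
    return sorted(t for t, rules in matrix.items() if state in rules.get(feature, set()))


def consistent_targets(matrix: Matrix, answers: Dict[str, int]) -> List[str]:
    """Targets for which every chosen state is valid (a conjunction of feature states)."""
    return sorted(
        t for t, rules in matrix.items()
        if all(state in rules.get(f, set()) for f, state in answers.items())
    )


# The three-state example for feature "i"; feature "j" is a hypothetical
# second feature that separates p and q.
example: Matrix = {
    "p": {"i": {1, 2}, "j": {1}},
    "q": {"i": {2, 3}, "j": {2}},
}

print(targets_for_state(example, "i", 2))             # ['p', 'q']
print(consistent_targets(example, {"i": 2, "j": 1}))  # ['p']
```

State 2 of feature i alone leaves both targets open; adding state 1 of feature j reduces the choice to target p.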
The DSS Determinator allows the user to choose a subject for identification and to answer a series of questions corresponding to the n features available in the model. Every possible answer represents a certain feature state k. The match between the chosen subject (represented by the answers given) and a target p is calculated by summing all true relationships between the chosen feature states $F_{i,k}$ and target p, each weighted with the weighting factor $W_i$ of feature i. The sums for all targets are presented as match percentages in the output of the system, listed in descending order.
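The weighted match can be sketched as follows. The text does not spell out how the weighted sum is turned into a percentage. This sketch assumes, as a hypothetical choice, that the sum is divided by the total weight of the answered features. The function name `match_percentages` and the example weights are illustrative only.

```python
from typing import Dict, List, Set, Tuple

Matrix = Dict[str, Dict[str, Set[int]]]  # target -> feature -> valid states


def match_percentages(matrix: Matrix, answers: Dict[str, int],
                      weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Weighted match of a chosen subject with every target, in descending order.

    Assumption: the weighted sum of true relationships is normalised by the
    total weight of the answered features to give a percentage.
    """
    total = sum(weights.get(f, 1.0) for f in answers)
    results = []
    for target, rules in matrix.items():
        # Sum W_i over the features whose chosen state is valid for this target.
        score = sum(weights.get(f, 1.0) for f, k in answers.items()
                    if k in rules.get(f, set()))
        results.append((target, 100.0 * score / total if total else 0.0))
    # Highest match first; ties are listed alphabetically.
    return sorted(results, key=lambda r: (-r[1], r[0]))


example: Matrix = {
    "p": {"i": {1, 2}, "j": {1}},
    "q": {"i": {2, 3}, "j": {2}},
}
weights = {"i": 1.0, "j": 3.0}  # hypothetical weighting factors W_i
print(match_percentages(example, {"i": 3, "j": 1}, weights))
# [('p', 75.0), ('q', 25.0)]
```

Because feature j carries three times the weight of feature i, the single true relationship for p outweighs the single true relationship for q.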

Single access key
A typical dichotomous tree consists of nodes (lemmas), each of which points to two targets (leaves), two nodes (branches), or a combination of the two. Basically, every lemma in a tree is based on a binary decision rule on a (possibly combined) feature state $F_{i,K}$, leading either to P(x) or to ¬P(x); the functions P(x) and ¬P(x) can describe a target or a further node. The combined feature state $F_{i,K}$ can combine more than one simple state, e.g. k and l. Determinator allows the user to construct a tree in which a node can point to a node in another part of the tree, and in which more than one node can point to the same target $T_p$.
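A dichotomous key with cross-links and shared targets can be represented as a mapping from lemma ids to nodes. The `Lemma` structure and the `identify` function below are a hypothetical sketch, not Determinator's tree format. The cycle check is an added safeguard, because a node pointing to another part of the tree could otherwise create a loop.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Lemma:
    """One node of a dichotomous key (hypothetical structure)."""
    feature: str
    states: FrozenSet[int]  # combined state F_i,K: one or more simple states
    if_true: str            # P(x): id of a further lemma or name of a target
    if_false: str           # not P(x): id of a further lemma or name of a target


def identify(tree: Dict[str, Lemma], root: str, answers: Dict[str, int]) -> str:
    """Follow the key from the root lemma and return the target that is reached.

    Any id that is not a lemma is treated as a target, so lemma ids and
    target names must not collide.
    """
    node, visited = root, set()
    while node in tree:
        if node in visited:  # links across the tree must not form a loop
            raise ValueError(f"cycle at lemma {node!r}")
        visited.add(node)
        lemma = tree[node]
        node = lemma.if_true if answers[lemma.feature] in lemma.states else lemma.if_false
    return node


# Target T_p is reachable from two lemmas (n2 and n3).
tree = {
    "n1": Lemma("a", frozenset({1, 2}), "n2", "T_r"),
    "n2": Lemma("b", frozenset({1}), "T_p", "n3"),
    "n3": Lemma("a", frozenset({1}), "T_p", "T_q"),
}
print(identify(tree, "n1", {"a": 1, "b": 2}))  # T_p
```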

Quality parameters
The following parameters for the validation of data models are developed and evaluated in this chapter.

Redundancy
Overlap between the areas of two targets exists when the variability range of target p overlaps with the variability range of target q for the same feature (see Figure 2, targets B and C; equation (3)). The overlap between the areas of targets p and q is the sum of the overlap regions over all features. Taking the range of feature states that apply to target p as $F_{i,p\min} \dots F_{i,p\max}$ and the range that applies to target q as $F_{i,q\min} \dots F_{i,q\max}$, the overlap for feature i is the part of these ranges shared by both targets, and the average overlap between two targets p and q is this overlap averaged over all features. The average redundancy of the total data model is the average overlap over every combination of two targets p and q; there are $m(m-1)/2$ such combinations.
The smaller the average redundancy, the smaller the chance that a certain range of feature states of a chosen subject results in two or more match percentages of 100% (according to equation (4); see subject 3 in Figure 2). Redundancy is related to the correlation coefficients among features.
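The overlap and redundancy calculations can be sketched for features whose valid states for a target form an inclusive range from $F_{i,p\min}$ to $F_{i,p\max}$. The exact normalisation is not given here, so this sketch assumes that the overlap for one feature is the number of shared states divided by the number of states of that feature. The function names and example data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical encoding: target -> feature -> inclusive range (min, max) of
# state indices, i.e. F_i,pmin ... F_i,pmax.
Ranges = Dict[str, Dict[str, Tuple[int, int]]]


def feature_overlap(a: Tuple[int, int], b: Tuple[int, int], n_states: int) -> float:
    """States shared by two ranges, as a fraction of all states of the feature.

    The normalisation by n_states is an assumption made for this sketch.
    """
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    return max(shared, 0) / n_states


def average_overlap(ranges: Ranges, p: str, q: str, n_states: Dict[str, int]) -> float:
    """Per-feature overlap between targets p and q, averaged over all features."""
    total = sum(feature_overlap(ranges[p][f], ranges[q][f], n) for f, n in n_states.items())
    return total / len(n_states)


def average_redundancy(ranges: Ranges, n_states: Dict[str, int]) -> float:
    """Average overlap over all m(m-1)/2 combinations of two targets."""
    pairs = list(combinations(sorted(ranges), 2))
    return sum(average_overlap(ranges, p, q, n_states) for p, q in pairs) / len(pairs)


n_states = {"x": 3, "y": 3}
ranges: Ranges = {
    "A": {"x": (1, 1), "y": (1, 3)},
    "B": {"x": (2, 3), "y": (1, 1)},
    "C": {"x": (3, 3), "y": (1, 2)},
}
print(round(average_redundancy(ranges, n_states), 4))  # 0.2778
```

The three pairwise averages are 1/6 (A, B), 1/3 (A, C) and 1/3 (B, C), so the average redundancy is 5/18.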

Figure 2.
A hypothetical variation space with five targets A-E and four user-chosen subjects. 1: subject outside the total variation space of the data model, a 100% match is impossible; 2: subject inside the total variation space of the data model but outside the range of every target, a 100% match will not occur; 3: subject in the overlap of the variation of two or more targets, two or more 100% matches will result; 4: subject in the variation space of one and only one target, one 100% match will be found.

Uniqueness
The capability to distinguish between two targets p and q depends on the presence of at least one feature with non-overlapping variability ranges for the two targets. If overlapping regions exist for all features, a set of feature states describing a chosen subject may produce a full match with more than one target. Hence, two targets p and q can be uniquely differentiated if and only if a feature i exists for which no state identifies both target p and target q:
$$U_{p,q} \Leftrightarrow \exists i : \lnot \exists k \,\big[(F_{i,k} \rightarrow T_p) \land (F_{i,k} \rightarrow T_q)\big]$$
with:
$U_{p,q}$ = TRUE: the two targets p and q have non-overlapping feature ranges for at least one feature i; at least one value $r_{i,p,q}$ equals zero (equation (6)), and at least one feature is indicated in red in the menu option Compare of Determinator.
$U_{p,q}$ = FALSE: the two targets p and q have overlapping ranges for all features; no value $r_{i,p,q}$ equals zero (equation (6)), and no feature is indicated in red in the menu option Compare of Determinator.
If the distinction between two targets is based on only one feature i with a value $r_{i,p,q}$ equalling zero (no overlap), the distinction can be considered weak. Targets A, C and E in Figure 2 can be distinguished along the X-axis, and targets A and B as well as targets D and E along the Y-axis, whereas targets B and C can be distinguished neither along the X-axis nor along the Y-axis. If the group containing the only distinctive feature is disabled, Determinator may return a full match for more than one target in a query.
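Uniqueness and the notion of a weak distinction can be checked pair by pair. In this sketch with hypothetical range data, a pair is flagged "weak" when exactly one feature separates it and "not unique" when no feature does, mirroring the argument above. Treating "no overlap" as disjoint state ranges is an assumption of this sketch, and the function names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Ranges = Dict[str, Dict[str, Tuple[int, int]]]  # target -> feature -> (min, max)


def disjoint(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True when two inclusive state ranges share no state (no overlap)."""
    return min(a[1], b[1]) < max(a[0], b[0])


def distinguishing_features(ranges: Ranges, p: str, q: str) -> List[str]:
    """Features whose ranges for p and q do not overlap (both targets share the same features)."""
    return sorted(f for f in ranges[p] if disjoint(ranges[p][f], ranges[q][f]))


def uniqueness_report(ranges: Ranges) -> Dict[Tuple[str, str], str]:
    """'not unique' (U = FALSE), 'weak' (one distinctive feature) or 'unique' per target pair."""
    report = {}
    for p, q in combinations(sorted(ranges), 2):
        n = len(distinguishing_features(ranges, p, q))
        report[(p, q)] = "not unique" if n == 0 else "weak" if n == 1 else "unique"
    return report


ranges: Ranges = {
    "P": {"x": (1, 1), "y": (1, 1)},
    "Q": {"x": (3, 3), "y": (3, 3)},
    "R": {"x": (3, 3), "y": (1, 3)},
}
for pair, verdict in uniqueness_report(ranges).items():
    print(pair, verdict)
```

Here P and R are separated only by feature x. Disabling that feature would leave them indistinguishable, which is the weak case described above.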

Separation capability
A data model can uniquely identify every target if and only if every combination of two targets p and q satisfies $U_{p,q}$ = TRUE. A data model can be considered suboptimal or invalid when the differentiation coefficient $D_{tot}$ is less than 100%. Whether a data model with $D_{tot}$ below 100% can still be validated depends on whether non-distinguishable targets (synonyms) are intentionally present in the model.
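The formula for the differentiation coefficient $D_{tot}$ is not reproduced here. The sketch below assumes, hypothetically, that it is the percentage of the m(m-1)/2 target pairs for which $U_{p,q}$ is TRUE. It also lists the pairs that behave as synonyms. Function names and data are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Ranges = Dict[str, Dict[str, Tuple[int, int]]]  # target -> feature -> (min, max)


def is_unique(ranges: Ranges, p: str, q: str) -> bool:
    """U_p,q: at least one feature with non-overlapping state ranges."""
    return any(
        min(ranges[p][f][1], ranges[q][f][1]) < max(ranges[p][f][0], ranges[q][f][0])
        for f in ranges[p]
    )


def synonym_candidates(ranges: Ranges) -> List[Tuple[str, str]]:
    """Target pairs with U_p,q = FALSE."""
    return [(p, q) for p, q in combinations(sorted(ranges), 2) if not is_unique(ranges, p, q)]


def separation_capability(ranges: Ranges) -> float:
    """D_tot in percent: share of all m(m-1)/2 pairs with U_p,q = TRUE (assumed form)."""
    m = len(ranges)
    n_pairs = m * (m - 1) // 2
    return 100.0 * (n_pairs - len(synonym_candidates(ranges))) / n_pairs


ranges: Ranges = {
    "P": {"x": (1, 1), "y": (1, 1)},
    "Q": {"x": (3, 3), "y": (3, 3)},
    "R": {"x": (3, 3), "y": (1, 3)},
}
print(synonym_candidates(ranges))               # [('Q', 'R')]
print(round(separation_capability(ranges), 1))  # 66.7
```

Whether the pair (Q, R) makes this model invalid depends, as stated above, on whether Q and R are intended as synonyms.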

Coverage of variability space
Every target occupies a part of the n-dimensional space defined by the data model, and its share of the total space is its coverage. When $D_{tot}$ equals 100%, the sum of all individual target coverages indicates the total coverage of the variability space. The larger the coverage of the total variability space, the smaller the chance that a certain range of values of a subject results in no match percentage of 100% (according to equation (4)). When $D_{tot}$ is smaller than 100%, this sum overestimates the total coverage, because regions shared by non-distinguishable targets are counted more than once.
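The sketch below assumes that the coverage of a target is the product, over all features, of the fraction of that feature's states that are valid for the target, i.e. its share of all state combinations. It compares the summed coverages with an exact count by enumeration and shows the overestimation once two targets cannot be told apart. The names and data are hypothetical.

```python
from itertools import product
from math import prod
from typing import Dict, Set

Matrix = Dict[str, Dict[str, Set[int]]]  # target -> feature -> valid states (1-based)


def target_coverage(rules: Dict[str, Set[int]], n_states: Dict[str, int]) -> float:
    """Share of all state combinations that fully match one target (assumed definition)."""
    return prod(len(rules[f]) / n for f, n in n_states.items())


def summed_coverage(matrix: Matrix, n_states: Dict[str, int]) -> float:
    """Sum of the individual target coverages."""
    return sum(target_coverage(rules, n_states) for rules in matrix.values())


def exact_coverage(matrix: Matrix, n_states: Dict[str, int]) -> float:
    """Share of state combinations that fully match at least one target (by enumeration)."""
    features = list(n_states)
    cells = list(product(*(range(1, n_states[f] + 1) for f in features)))
    covered = sum(
        any(all(cell[j] in rules[f] for j, f in enumerate(features)) for rules in matrix.values())
        for cell in cells
    )
    return covered / len(cells)


n_states = {"x": 3, "y": 3}
separable: Matrix = {"A": {"x": {1}, "y": {1, 2}}, "B": {"x": {2, 3}, "y": {3}}}
# Target C overlaps A on both features, so D_tot drops below 100%.
overlapping: Matrix = {**separable, "C": {"x": {1, 2}, "y": {2}}}

for name, model in [("separable", separable), ("overlapping", overlapping)]:
    print(name, round(summed_coverage(model, n_states), 3), round(exact_coverage(model, n_states), 3))
```

For the separable model both values are 4/9; after adding C the sum gives 6/9 while only 5/9 of the space is actually covered, because the cell (x=1, y=2) is counted twice.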
The diagnosis of illegal growth hormone use in veal calves is used to illustrate model development and performance testing.
Decision Support Systems 56

Model development and application
Although prohibited in the European Union, the use of illegal growth promoters is still part of current practice in animal farming. Adequate monitoring of the hormones themselves is hampered by the fact that the hormone or hormone cocktail is metabolised or excreted within a few weeks. The effects of hormone use, however, can be seen in histologically stained sections of either the prostate (male calves) or the gland of Bartholin (female calves), using different staining techniques. Monitoring by means of histological examination therefore appears to be an important instrument in enforcing legislation on food safety and animal health [5,6]. The interpretation of histological disorders requires a high level of expertise. An expert model has been developed in the framework of the DSS Determinator to support the user in identifying the extent of hormone treatment of veal calves. The quality parameters are illustrated after a further presentation of the model.
The data model uses 13 features to identify a treatment level, classified as "normal", "suspect" or "positive". The features are presented in Table 1, and some of them are illustrated in Figure 3. There are two strategies to reach a diagnosis:

A. A quick, general diagnosis. Depending on the sex of the calf, selecting either feature groups I and IV (male) or groups II and IV (female) is sufficient.
B. An extended diagnosis. In addition to the feature groups indicated in strategy A, group III is required.
The kernel of the data model consists of groups I, II and IV, which give a diagnosis of the treatment level. The diagnosis of possible hormone treatment is more complicated in female calves than in male calves because of the natural production of oestrogen hormones, which is lacking in male calves. The simple rule <IF metaplasia=present THEN target positive> therefore needs further support in female calves. A second diagnostic feature, based on a larger share of ducts in the glandular tissue, is used, and the basic rule is expanded to <IF metaplasia=present AND duct_ratio=elevated THEN target positive>. For both male and female calves, the diagnosis "suspect" is supported by the number of deviating features. The duct ratio is not counted in this number, since it applies only to female calves. The logic tables for diagnosing the level of treatment are presented in Table 2. The diagnoses in Table 2 can be extended further by including the individual features of group III (Table 1); the number of deviating features (feature 13) then needs to be adjusted accordingly. The basic rules are translated into a formal decision tree, as shown in Figure 4. Finally, the decision tree is used as the basis for a free access key. The importance and position of the feature indicating the presence of metaplasia differ between the male and the female diagnosis: for the latter, only the combination of metaplasia and an elevated duct ratio is decisive for the diagnosis "positive". As a consequence, the presence of metaplasia is included twice in the free access key, as feature 1 (group I, male animals) and as feature 3 (group II, female animals). The free access key was optimised by assigning a suitable weighting factor to every feature; all features of group III received a weighting factor of one.
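The basic rules described above can be sketched as a small function. It covers only the two rules for "positive" quoted in the text. The threshold for "suspect" (`suspect_threshold`) is a hypothetical parameter, because the actual logic tables (Table 2) are not reproduced here. The extension with the individual features of group III is not modelled.

```python
def diagnose(sex: str, metaplasia: bool, n_deviating: int,
             duct_ratio_elevated: bool = False, *, suspect_threshold: int) -> str:
    """Sketch of the basic rules of the kernel model (not the full Table 2).

    suspect_threshold is hypothetical: the text only states that "suspect"
    is supported by the number of deviating features.
    """
    if sex == "male":
        positive = metaplasia
    elif sex == "female":
        # Natural oestrogen production: metaplasia alone is not decisive.
        positive = metaplasia and duct_ratio_elevated
    else:
        raise ValueError("sex must be 'male' or 'female'")
    if positive:
        return "positive"
    return "suspect" if n_deviating >= suspect_threshold else "normal"


print(diagnose("female", True, 2, duct_ratio_elevated=False, suspect_threshold=2))  # suspect
```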
The performance of the model was tested in eight runs following the two strategies. In every run, the numerical feature 13 was varied between 0 and 9 in combination with the appropriate choices for the other features, and the matches between the simulated subject and all three targets (treatment classes) "normal", "suspect" and "positive" were calculated according to equation (4). The results of the eight runs are shown in Figures 5 and 6.
After adjustment of the weighting factors, the model in all cases shows the highest match percentage for the same target (class) as indicated by the tree (Figure 4). The percentage for a diagnosis "positive" of a male animal (Figure 5) is 0% when no deviating feature is found, in contrast to the diagnosis of a female animal (Figure 6), where an elevated duct ratio can occur in combination with # deviating features = 0. For the same reason, the difference between the diagnoses "normal" and "positive" is smaller for male animals (Figure 5d) than for female animals (Figure 6d) when # deviating features = 1. In general, the comparable situations illustrated in Figures 5a and 6a/b, in Figures 5c and 6c, and in Figures 5d and 6d, respectively, show highly comparable results. Adding the features of group III (Figures 5b, 5d, 6d) modifies the outcome of the model in the sense that in many cases a score of 100% cannot be reached. This reflects the fact that finding metaplasia (male) or the combination of metaplasia and an elevated duct ratio (female) accompanied by only a few or even no other deviations is unlikely or highly unlikely. The large coverage of the targets indicated as "positive" (Table 3) results from the model's focus on correctly diagnosing possible treatment, which minimises the possibility of false negative results. For both male and female calves, the final diagnosis is based on one feature (see Table 2 and Figure 4), whereas the states of the other features are overruled. The correlation between the features is shown in Table 4. A full correlation is found only between the two features indicating the presence of metaplasia. This feature is included twice, since different weighting factors appeared to be needed for male and female animals.
A reasonably high correlation was also found between the duct ratio and the combined presence of metaplasia and an elevated duct ratio. The level of the correlation coefficients is in line with the calculated average redundancy of 0.405 (equation (8)).
Table 4. Matrix with Pearson's correlations between the features of the kernel model for diagnosis of illegal hormone use in veal calves. The colour of every cell (running from red to green) represents the value of the correlation coefficient.
The match table (Table 5) shows the relative resemblance between the targets based on equation (7). Apart from the diagonal, the green colour, based on the calculations using equation (9), indicates that every target can be diagnosed uniquely with respect to every other target. Hence, the separation capability is 100% (equation (10)).
Table 5. Matrix with the matches between the targets of the model for diagnosis of illegal hormone use in veal calves. The figure in every cell is calculated according to equation (7); the colour of every cell is based on equation (9).

Discussion
The process of identifying the level of treatment of veal calves with growth hormones is a rather specific diagnostic situation within the broader application of DSS in medicine [8][9][10]. Only one feature is decisive; all other features merely modify the probability that a diagnosis belongs to the correct class. In addition, a constraint dependency rule exists between feature 13 (number of deviating features; Table 1) and the total number of features from group III plus either group I or group II that show a state other than normal. The importance of the main features is visible in Table 2 and Figure 4. The two main features (male: presence of metaplasia; female: combined presence of metaplasia and an elevated duct ratio) both received a weighting factor of 9 (the number of features in group III plus 1), so that they outweigh the features of group III in reaching a correct diagnosis. Since, in contrast to the diagnosis of a male calf, the presence of metaplasia is not the exclusive indicator of treatment in the diagnosis of a female calf, the metaplasia feature of the female diagnosis received a weighting factor of only 1. The weighting factors in the current model are fixed rather than input-sensitive [11].
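The arithmetic behind the weighting factor of 9 can be checked with a few lines. This is a simplified, hypothetical illustration that ignores all kernel features other than the decisive one. The group size of 8 is inferred from "number of features in group III plus 1" = 9.

```python
# Hypothetical illustration of the weighting argument. The size of group III
# (8 features) is inferred from "number of features in group III plus 1" = 9.
GROUP_III_SIZE = 8
W_GROUP_III = 1                            # every feature of group III
W_MAIN = GROUP_III_SIZE * W_GROUP_III + 1  # decisive feature: 9

# Extreme case, ignoring the other kernel features: the decisive feature
# supports "positive", while every group III feature is normal and thus
# supports "normal".
score_positive = W_MAIN
score_normal = GROUP_III_SIZE * W_GROUP_III
print(W_MAIN, score_positive > score_normal)  # 9 True

# One unit less and the two classes would tie instead of the decisive
# feature prevailing.
print((W_MAIN - 1) > score_normal)  # False
```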
There is no generic method for the validation of data models in expert systems [7]. In the current study, a top-down modelling approach was chosen: logic tables lead to a decision tree, which was the basis for the full matrix of the free access key. This approach does not provide a tool for handling constraint dependency rules [7]; here, this was solved by optimising the weighting factors. Rass et al. [12] listed a number of requirements for valid expert systems. Of these, the requirements of minimising redundancy and avoiding unintended synonyms are now supported by measures that quantify these properties: redundancy (equation (8)) and separation capability (equation (10)), respectively.
The role of the features of group III (Table 1: the individual deviating characteristics) in an extended diagnosis (Figures 5b, 5d, 6d) can be discussed in terms of fuzzy logic principles. Comparable results have been found in several experiments with fuzzy logic [9,13]. Here, probability or uncertainty is the basic aspect causing patterns in the model outcomes that can be explained as membership functions [13]. For example, the presence of metaplasia in a prostate is a definite diagnosis of treatment with growth hormones (n = 1 in Figure 5c, in concordance with the tree in Figure 4), but it is highly unlikely that, with such a diagnosis, none of the other features of group III (Table 1) would show a state deviating from normal. The probability that an animal with metaplasia as the sole deviation belongs to the membership class "positive" is only slightly higher than its membership of the class "suspect" (n = 1 in Figure 5d). The kernel model without the individual features of group III (strategy A) seems sufficient to reach a diagnosis. All the features underlying the dependent feature 13 (group IV) are nevertheless included in the model, in order to improve the user's performance by supporting his or her examinations and to allow an iterative process of optimising the diagnosis [14].
Previous results of optimising a data model for reaching a diagnosis show that smaller numbers of features were optimal [10]. When a model consists of only a few features, expressed as the dimensions of a space (e.g. the two-dimensional space in Figure 2), a major part of the variation space might be covered. Increasing numbers of features (i.e. dimensions) result in an exponentially growing number of theoretically possible feature combinations that are not linked to a target. In the present study, approx. 14% of the variability space was not covered by any target (Table 3). To evaluate this non-assigned part of the variability space, let us assume a variable number of features n, each consisting of three feature states; a number of targets that doubles with every additional feature; and one and only one state per feature identifying a target p, while equations (1) and (2) apply.
The resulting multidimensional spaces for numbers of features ranging from 2 to 8, the number of targets accommodated and the resulting coverage are shown in Table 6. If more than one state of a feature can identify a target, a larger coverage can be expected. This is the case in the data model presented here for the diagnosis of hormone treatment, which was optimised to find all occasions of illegal hormone use, i.e. the coverage of the classes "positive" was maximised (Table 3). This explains the high coverage of approx. 85.8% of the current model.
Table 6. Relationship between the number of dimensions of a variability space (n), the possible number of combinations of feature states, and the coverage of the associated number of targets under the assumption of only one state per feature identifying a target (equation (13)).
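The assumptions stated for Table 6 can be turned into a short calculation. Reading "increasing with a factor of 2" as m = 2^n targets is an assumption of this sketch, so the numbers are not claimed to reproduce Table 6. With one state per feature, each target occupies exactly one of the 3^n combinations, and the coverage becomes (2/3)^n. The function name `coverage_row` is hypothetical.

```python
def coverage_row(n: int, states: int = 3):
    """Combinations, targets and coverage for n features.

    Hypothetical: m = 2**n targets, each identified by exactly one state per
    feature, so every target occupies a single, distinct combination.
    """
    combinations = states ** n
    targets = 2 ** n
    return combinations, targets, targets / combinations


for n in range(2, 9):
    c, m, cov = coverage_row(n)
    print(f"n={n}  combinations={c:5d}  targets={m:3d}  coverage={cov:6.1%}")
```

Under these assumptions the uncovered share grows quickly with n, which is the exponential effect described in the text.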
The development of a specific model for histological diagnosis in a general platform imposes several constraints, such as the lack of automatic calculation of the number of deviating features (feature 13) from the individually selected features of group III. The advantages of the current procedure are the strict framework, which forces the developer to analyse the information structure in detail, and the availability of generic tools for testing and evaluation.

Conclusion
The presented parameters for redundancy, uniqueness, separation capability and coverage of the variability space provide useful tools for the validation of a data model. The Developer, as part of the Determinator system, implements these parameters in an ordered manner, as exemplified in Table 5. The development and performance of the data model for the diagnosis of hormone treatment of veal calves in the framework of Determinator show that a specific model can be developed and applied successfully in a generic framework.

Author details
L.W.D. van