Open access peer-reviewed chapter

Some Methods for Evaluating Performance of Management Information System

Written By

Khu Phi Nguyen and Hong Tuyet Tu

Submitted: 01 August 2017 Reviewed: 16 January 2018 Published: 24 October 2018

DOI: 10.5772/intechopen.74093

From the Edited Volume

Management of Information Systems

Edited by Maria Pomffyova



Many kinds of information systems have been developed to meet business needs, and they play an important role in business organizations and management operations. A management information system (MIS) is one such system and a key factor in achieving efficient decision-making in an organization. Its performance is related to that of other information systems, for instance decision support systems (DSS) and strategic information systems (SIS). Methods for testing statistical hypotheses about the performance of an MIS are therefore essential to support management activities and decision-making.


  • management information systems
  • information theory
  • rough set theory
  • decision-making process

1. Introduction

A system is a set of interrelated components assembled to accomplish certain objectives or goals. The basic characteristics of a system are its boundaries, interfaces, inputs and outputs, and the methods of producing outputs from inputs. The environment of a system includes people, organizations, and other systems that supply data to or receive data from the system.

Problems arising in a system are usually solved with the systems approach, which takes into account the goals, environment, and internal workings of the system. This method involves the following steps:

  1. Define the problem and collect data for the problem.

  2. Identify and evaluate feasible solutions.

  3. Select the best solution and determine whether the solution is working.

An information system (IS) consists of components such as hardware, software, databases, personnel, and procedures that managers can use to make better decisions in controlling business operations. ISs are also used to document and monitor the operations of other systems, called target systems, which are a prerequisite for the existence of ISs. On the infrastructure side, an information system integrates diverse computers, displays and visualizations, databases, storage systems, instruments, sensors, etc. via software and networks to share data and to provide aggregate capabilities.

In business operation, the activities of an organization equipped with an IS are usually of three kinds: operational, tactical, and strategic planning. In this context, a strategy means the determination of the basic long-term goals and objectives of an enterprise, the adoption of courses of action, and the allocation of the resources necessary for achieving these goals. Operational tasks are the daily activities of the firm in consuming and acquiring resources. These daily transactions produce the basic data for the operational systems.

ISs that provide information for allocation of efficient resources to achieve business objectives are known as tactical systems. Tactical systems provide middle-level managers with the information they need to monitor and control operational tasks and to allocate their resources effectively. The time frame for tactical activities may be monthly, quarterly, or yearly. Alternatively, ISs that support the strategic plans of the business are known as strategic planning systems. These systems are designed to provide top managers with information that assists them in making long-term planning decisions.

Strategic planning information systems and tactical information systems may use the same data source, so the distinction between them is not always clear. For example, when middle-level managers use budgeting information to allocate resources for short-term activities, budgeting is a tactical decision activity; when top managers use it to plan long-term activities, budgeting is a strategic planning activity. Hence, the difference between the systems lies in who uses the budgeting data and for what purpose.

The top management of the organization carries out strategic planning based on results of operational tasks, tactical systems, and related external information to decide whether to build new plants, new products, facilities, or invest in technology. For making these decisions, strategic planners have to address problems that involve long-range analysis and prediction. The time frame for strategic activities may be months or years.

Some basic business systems that serve the operational level of the organization are called transaction processing systems (TPS). A TPS records the daily routine transactions necessary to conduct the business. A system that monitors and controls physical processes is called a process control system (PCS). For example, a wastewater treatment plant uses electronic sensors linked to computers to monitor wastewater processes continually and control the water quality process [1]. Similarly, a petroleum refinery uses sensors and computers to monitor chemical processes and make real-time adjustments to the refinery process. A process control system comprises the whole range of equipment, computer programs, and operating procedures [2].

A knowledge-based IS that supports the creation, organization, and dissemination of business knowledge to employees and managers throughout a company is called a knowledge management system. In such a case, knowledge management is the deployment of a comprehensive system that enhances the growth of knowledge. Expert systems are the category of artificial intelligence that has been used most successfully in building commercial applications. An expert system is also considered a knowledge-based system that provides expert advice and acts as an expert consultant to users.

A decision support system (DSS) is a computer-based system intended for use by a particular manager or a team of managers at any organizational level in making a decision while solving a semi-structured problem. A database management system and a user interface are major components of a DSS. The database contains production information, market and marketing information, research data, financial transactions, and so forth.

The decision-maker must have suitable knowledge and skills in mining the data of a DSS to address arising problems and make effective decisions. In traditional approaches, scientific expertise together with statistical descriptions is usually needed to support decision-making. Recently, many innovative facilities, together with several heuristic models, have been proposed for the decision-making process in enterprises with huge databases.

Management information systems (MIS) are computer-based ISs that collect and process information from different sources to support decisions at the management level [3]. This level contains computer systems that are intended to assist operational management in monitoring and controlling the transaction processing activities that occur at the clerical level. An MIS provides information in prespecified formats to support business decision-making. The next level in the organizational hierarchy is occupied by low-level managers and supervisors. Therefore, an MIS takes internal data from the system and summarizes them into meaningful and useful forms, such as management reports, to support management activities and decision-making.

MIS is a complex and broad topic, which is why MIS boundaries need to be defined to reduce difficulties in managing the system. Firstly, an MIS contains a vast number of related activities, so it is hard to review all of them. A study may discuss a selected sample of activities, depending on the objectives and viewpoint of the researcher. Alternatively, it may focus only on farm levels or on smaller systems that suffice for the problems being addressed. Secondly, MISs can be defined and described in several frameworks, only a few of which are used to discuss important subject matters. Lastly, MISs should be viewed with a sense of how these systems have evolved, adapted, and been refined as new technologies have emerged, economic conditions have changed, etc.

To evaluate the performance of an MIS, its output data must be characterized by a set of basic features appropriate to the functions, objectives, and goals of the system. These output data need to be observed repeatedly to evaluate the extent to which the MIS supports successful decisions in the organization. Using these observations, methods such as rough-set-based data mining and statistical analysis can be applied to evaluate the extent to which MISs are used to make effective decisions for planning purposes [4, 5, 6, 7].


2. Evaluation of features and making decision rules

In mathematical modeling, an IS can be modeled by a sample Ω = {ω1, ω2, …, ωn} of n objects ωi, where i = 1, 2, …, n. The ith object ωi is observed through instances of m conditional features f1, f2,…, fm, with values fj(ωi), j = 1, 2, …, m. Additionally, a feature d, the so-called decision feature, characterizes a specific effect of ωi denoted by d(ωi). When a decision has s effects, d is represented by values d(ωi) = dk with k∈{1,2,…, s}.

Let F = {f1, f2,…, fm}, then (Ω, F∪{d}) is a decision information table or DIT with n = |Ω| objects, m = |F| conditional features, and a decision feature d. Objects ω and ω’ are indiscernible if and only if the following binary relation RF on Ω with respect to (w.r.t.) F is satisfied:

$$\omega \, R_F \, \omega' \iff f(\omega) = f(\omega') \ \text{for all } f \in F \tag{1}$$

This is an equivalence relation. The equivalence class of ω∈Ω w.r.t. F is:

$$[\omega]_F = \{\omega' \in \Omega \mid \omega \, R_F \, \omega'\} \tag{2}$$


Assume that there are r such equivalence classes, named C1, C2,…, Cr. They are disjoint subsets and form a partition of Ω by RF. Similarly, for the decision feature d, another partition of Ω is D1, D2,…, Ds defined by the following equivalence relation:

$$\omega \, R_d \, \omega' \iff d(\omega) = d(\omega') \tag{3}$$

Here, Dk = {ω’∈Ω | d(ω’) = dk} is an equivalence class called the kth decision class of the DIT. Let f(Dk) = |Dk|/n be the frequency of Dk w.r.t. Ω; the information entropy H(d) of the decision feature d is

$$H(d) = -\sum_{k=1}^{s} f(D_k) \log_2 f(D_k) \tag{4}$$

On the other hand, let f(Ci) = |Ci|/n be the frequency of Ci and f(Dk|Ci) = |Dk∩Ci|/|Ci| the conditional frequency of Dk given Ci. The conditional entropy H(d|F) of the decision feature d w.r.t. the condition F is determined by

$$H(d \mid F) = -\sum_{i=1}^{r} f(C_i) \sum_{k=1}^{s} f(D_k \mid C_i) \log_2 f(D_k \mid C_i) \tag{5}$$

From Eqs. (4) and (5), the mutual information I(F, d) between F and d is given by

$$I(F, d) = H(d) - H(d \mid F) \tag{6}$$


The mutual information is nonnegative and symmetric, i.e., I(F, d) = I(d, F). In this case, the significance of a feature f∈F w.r.t. d is defined as

$$\mathrm{Sgnf}(f, d) = I(F, d) - I(F - \{f\}, d) \tag{7}$$

The significance of a feature f represents the dependency of the decision feature d on the conditional feature f. This measure reflects the discrimination ability of conditional features. The larger Sgnf(f, d), the stronger the dependency between f and the decision feature d. If Sgnf(f, d) > 0, then f is a core feature of the DIT, i.e., f satisfies

$$I(F - \{f\}, d) < I(F, d) \tag{8}$$


Any core feature is significant and may not be eliminated in mining a DIT. Let CFs be the set of all core features, CFs ⊆ F. To find CFs, each feature in F must be checked with Eq. (8) to decide whether or not to include it in CFs.

Example 1: To analyze some features of a service, Table 1 shows a DIT consisting of the evaluations of nine clients on four features of the service. Here d is the decision feature, and f1: capacity for innovation; f2: service capability; f3: product technologies; and f4: solution, are the conditional features. The values in Table 1 mean 0: unpleased, 1: acceptable, and 2: very pleased.

(a) Original data table
(b) Sorted data table

Table 1.

A decision information system for evaluation service quality.

Here, F = {f1, f2, f3, f4}. Using Eq. (1), the four equivalence classes w.r.t. F are C1 = {ω1, ω8}, C2 = {ω2, ω7}, C3 = {ω3, ω5, ω9}, C4 = {ω4, ω6}, and from Eq. (3) the two decision classes are D0 = {ω2, ω5, ω7, ω9}, D1 = {ω1, ω3, ω4, ω6, ω8}. From Eq. (4), the information entropy of the decision feature d is H(d) = 0.9911 and H(F) = 0.4976. From Eq. (5), the conditional entropy of d is H(d|F) = 0.3061, so the mutual information between F and d is I(F, d) = 0.6850.

If the first feature f1 is eliminated, H(d) is unchanged, but H(F − {f1}) = 0.5144 and H(d|F − {f1}) = 0.7505. These imply I(F − {f1}, d) = 0.2405 < I(F, d), so f1, capacity for innovation, is a core feature. In contrast, Sgnf(f4, d) = I(F, d) − I(F − {f4}, d) = 0, so f4 may be eliminated since it is not significant.
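The entropies of Example 1 can be checked numerically. The following is a minimal Python sketch, assuming the usual frequency-based definitions of H(d) and H(d|F) and the relation I(F, d) = H(d) − H(d|F); it uses only the equivalence classes and decision classes listed in Example 1, and the function names `entropy` and `conditional_entropy` are illustrative, not part of the chapter.

```python
from math import log2

# Partitions from Example 1; objects are identified by their indices 1..9.
classes_F = [{1, 8}, {2, 7}, {3, 5, 9}, {4, 6}]   # C1..C4
classes_d = [{2, 5, 7, 9}, {1, 3, 4, 6, 8}]       # D0, D1
n = 9

def entropy(partition, n):
    """H = -sum f(B) log2 f(B) over the blocks B of a partition."""
    return -sum(len(b) / n * log2(len(b) / n) for b in partition)

def conditional_entropy(cond, dec, n):
    """H(d|F) = -sum_i f(C_i) sum_k f(D_k|C_i) log2 f(D_k|C_i)."""
    h = 0.0
    for c in cond:
        for d in dec:
            p = len(c & d) / len(c)
            if p > 0:                       # 0 * log2(0) is taken as 0
                h -= len(c) / n * p * log2(p)
    return h

h_d = entropy(classes_d, n)
h_d_given_F = conditional_entropy(classes_F, classes_d, n)
mutual_info = h_d - h_d_given_F
print(round(h_d, 4), round(h_d_given_F, 4), round(mutual_info, 4))
```

Only C3 mixes objects of both decision classes, so it is the only class that contributes to H(d|F); the printed values agree with H(d) = 0.9911, H(d|F) = 0.3061 and I(F, d) = 0.6850.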

The features F and d can be considered as random quantities whose values are given in the rows of a DIT. In information theory, the mutual information measures the average information that one random quantity conveys about the other, and vice versa. Therefore, I(F, d) measures the average information that the decision feature d receives from the conditional features. That is why it is relevant to the problem of removing redundant conditional features so that the reduced set provides the same effect, e.g., the same quality of classification or decision, as the original set.

A coeffect reduced set R of the conditional feature set is a subset of F such that I(R, d) = I(F, d), i.e., R contains conditional features having the same effect as F. Any coeffect reduced set, or reduced set of F for short, can be used in place of the whole F. An algorithm to find a reduced set R based on mutual information is as follows:

ALGORITHM MIBR // Mutual Information Based Reduced set.

// Input: DIT = (Ω, F ∪ {d}).

// Output: R // a reduced set of F.

S ≔ ∅; R ≔ CFs; // start from the set of core features.

Repeat

S ≔ R; for each f∈F−R, if I(R∪{f}, d) > I(S, d) then S ≔ R∪{f};

R ≔ S; // reassign before doing the next iteration.

Until I(R, d) = I(F, d);
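A possible Python sketch of the MIBR idea is given below. It is an illustration under stated assumptions, not the authors' implementation: the toy table `rows`, the feature names `a`, `b`, `b_copy`, `c` and the helper `mutual_information` are hypothetical, and a fallback (adding the first remaining feature when no single feature raises the mutual information) is introduced here only so that the loop always terminates.

```python
from collections import Counter
from math import log2

def mutual_information(rows, feats, d):
    """I(feats, d) = H(d) - H(d | feats), from empirical frequencies."""
    n = len(rows)
    joint = Counter((tuple(r[f] for f in feats), r[d]) for r in rows)
    cond = Counter(tuple(r[f] for f in feats) for r in rows)
    dec = Counter(r[d] for r in rows)
    h_d = -sum(c / n * log2(c / n) for c in dec.values())
    h_d_cond = -sum(c / n * log2(c / cond[key]) for (key, _), c in joint.items())
    return h_d - h_d_cond

def mibr(rows, feats, d, eps=1e-12):
    """Greedy reduced set: start from the core, add the best feature each round."""
    target = mutual_information(rows, feats, d)
    # Core features: removing f lowers the mutual information.
    reduct = [f for f in feats
              if mutual_information(rows, [g for g in feats if g != f], d) < target - eps]
    while mutual_information(rows, reduct, d) < target - eps:
        rest = [f for f in feats if f not in reduct]
        # Fallback (assumption of this sketch): take rest[0] if nothing improves.
        best, best_mi = rest[0], mutual_information(rows, reduct, d)
        for f in rest:
            mi = mutual_information(rows, reduct + [f], d)
            if mi > best_mi + eps:
                best, best_mi = f, mi
        reduct.append(best)
    return reduct

# Hypothetical toy table: d = a OR b, b_copy duplicates b, c is noise.
raw = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 1), (1, 1, 1, 1), (0, 0, 1, 0), (1, 1, 0, 1)]
rows = [{"a": a, "b": b, "b_copy": b, "c": c, "d": d} for a, b, c, d in raw]
reduct = mibr(rows, ["a", "b", "b_copy", "c"], "d")
print(reduct)
```

In this toy table only `a` is a core feature, because `b` can be replaced by its duplicate; the greedy step then adds `b`, after which the mutual information of the reduced set equals that of the full feature set.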

Example 2: Using data in Table 1, the above algorithm is done as follows.

Firstly, R = CFs = {f1}, S = R then

  1. f2∈F−R, then I(R∪{f2},d) = 0.6850 > I(S, d) = 0.3198, so S = R∪{f2} = {f1, f2};

  2. f3∈F−R, I(R∪{f3},d) = 0.6850 = I(S, d), S does not change;

  3. f4∈F−R, I(R∪{f4},d) = 0.6850 = I(S, d), S does not change;

R = S = {f1, f2}. By checking, I(R, d) = 0.6850 = I(F, d), so the iteration terminates and R = {f1, f2} is a reduced set of F.

Note that if steps 1 and 2 of the previous treatment are permuted, then the set R = {f1, f3} is another reduced set of F.

Remark: As shown above, reduced set R of DIT is not unique. Finding minimum reduced set of DIT is an optimization problem. Several algorithms have been proposed to solve this problem, e.g., algorithm of rough set-based feature selection based on ant colony optimization (RSFSACO) in [8], cf. [9], for more detail.

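Given X, a subset of Ω in a DIT, the lower approximation and the upper approximation of X w.r.t. F, named LFX and UFX respectively, are defined by:

$$L_F X = \{\omega \in \Omega \mid [\omega]_F \subseteq X\} \tag{9}$$

$$U_F X = \{\omega \in \Omega \mid [\omega]_F \cap X \neq \emptyset\} \tag{10}$$

where [ω]F denotes the equivalence class of ω w.r.t. F.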


It can be shown that LFX ⊆ X ⊆ UFX. Some other relations between these approximations have been illustrated, e.g., in [5]. The difference set BFX = UFX−LFX is called a boundary of X and Ω−UFX is the outside region of X. X is a rough set if BFX≠∅, otherwise a crisp set.

Example 3: In Example 1, let X = {ω1, ω3, ω5, ω7, ω9}. Then, the approximations of X are LFX = {ω3, ω5, ω9} = C3 and UFX = {ω1, ω2, ω3, ω5, ω7, ω8, ω9} = C1∪C2∪C3. The boundary BFX = {ω1, ω2, ω7, ω8} = C1∪C2 is not empty, so X is a rough set, and C4 is the outside region of X. Figure 1 shows all these sets in Ω.
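The approximations of Example 3 can be reproduced with a few set operations. The sketch below assumes the standard rough-set definitions (the lower approximation is the union of classes contained in X, the upper approximation is the union of classes meeting X); the helper names are illustrative.

```python
# Equivalence classes C1..C4 from Example 1 and the set X from Example 3.
classes_F = [{1, 8}, {2, 7}, {3, 5, 9}, {4, 6}]
omega = set(range(1, 10))
X = {1, 3, 5, 7, 9}

def lower_approx(partition, X):
    """Union of the classes entirely contained in X."""
    return set().union(*(c for c in partition if c <= X))

def upper_approx(partition, X):
    """Union of the classes that have at least one object in X."""
    return set().union(*(c for c in partition if c & X))

lower = lower_approx(classes_F, X)
upper = upper_approx(classes_F, X)
boundary = upper - lower        # non-empty boundary means X is a rough set
outside = omega - upper
print(sorted(lower), sorted(upper), sorted(boundary), sorted(outside))
```

The boundary is simply the difference of the two approximations, so it always consists of whole equivalence classes.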

Figure 1.

Approximations of X.

Any decision class Dk in Ω/Rd is a subset of Ω, so it has a lower approximation LFDk. Hence, the positive region in Ω w.r.t. d and F is the following subset:

$$P_d(F) = \bigcup_{k=1}^{s} L_F D_k \tag{11}$$

In data analysis, the dependence between attributes is important. The dependency of the decision feature d on the conditional features F is defined by the following ratio:

$$\mathrm{Dep}(d, F) = \frac{|P_d(F)|}{|\Omega|} \tag{12}$$


By definition, 0 ≤ Dep(d, F) ≤ 1. If Dep(d, F) = 1, d depends totally on F. If Dep(d, F) = 0, i.e., Pd(F) = ∅, then d does not depend on F. In case of 0 < Dep(d, F) < 1, d depends partially on F. Using the degree of dependency, a coeffect reduced set R of conditional features in a DIT can also be found by requiring Dep(d, R) = Dep(d, F).

Example 4: Example 1 gives two decision classes D0 = {ω2, ω5, ω7, ω9}, D1 = {ω1, ω3, ω4, ω6, ω8}; the lower approximations of these classes are LFD0 = {ω2, ω7}, LFD1 = {ω1, ω4, ω6, ω8}, thus Pd(F) = {ω1, ω2, ω4, ω6, ω7, ω8} and the degree of dependency or quality of approximation is Dep(d, F) = 6/9 = 2/3. Using the coeffect reduced set R = {f1, f2}, it can be shown that all equivalence classes w.r.t. R are the same as those in Example 1. Therefore, the above lower approximations and positive region are also the same, i.e., LRD0 = LFD0, LRD1 = LFD1 and Pd(R) = Pd(F).
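The positive region and the degree of dependency can be computed directly from the two partitions. The sketch below assumes that the positive region is the union of the lower approximations of the decision classes and that the degree of dependency is the ratio |Pd(F)|/|Ω|; the function names are illustrative.

```python
from fractions import Fraction

classes_F = [{1, 8}, {2, 7}, {3, 5, 9}, {4, 6}]          # C1..C4 from Example 1
decision_classes = [{2, 5, 7, 9}, {1, 3, 4, 6, 8}]       # D0, D1
n = 9

def positive_region(cond, dec):
    """Objects whose equivalence class lies inside a single decision class."""
    return {w for c in cond if any(c <= d for d in dec) for w in c}

def dependency(cond, dec, n):
    """Degree of dependency |P_d(F)| / |Omega| as an exact fraction."""
    return Fraction(len(positive_region(cond, dec)), n)

pos = positive_region(classes_F, decision_classes)
dep = dependency(classes_F, decision_classes, n)
print(sorted(pos), dep)
```

Six of the nine objects fall in the positive region, so the ratio evaluates to 2/3; the three objects of C3 are excluded because they carry both decision values.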

So far, problems of inducing rules from DITs have been studied and developed. The rough set method can be applied to the problems with several advantages [5]. For instance, the lower and upper approximations are applied to describe the inconsistency of a DIT and to induce corresponding rules dynamically from decision systems [6]. These methods of approximation can be used to address incomplete input data for inducing decision rules [7]. Such rules can be applied to partition a set of objects into classifications [10].

Given a DIT, let Vf be the range of f∈F. For v∈Vf and ω∈Ω, a proposition like f(ω) = v, or f = v for short, takes the logic value true or false depending on ω. The assignment ϕ ≔ (f = v) defines a logic variable ϕ w.r.t. the proposition f = v. Then, ϕ is true if there exists ω∈Ω such that f(ω) = v, and false otherwise. The set of logic variables on F together with the logical operations ~ (not), ∧ (and), and ∨ (or) sets up a set of logic expressions called the decision language of F, denoted by L(F). The meaning of ϕ in L(F), denoted by ⟨ϕ⟩, is the set of ω in Ω for which the proposition ϕ is true. In particular, if ϕ ≔ (f = v) then ⟨ϕ⟩ = {ω∈Ω | f(ω) = v}, so ϕ takes the set ⟨ϕ⟩ as its description.

A decision rule allows individuals, teams, and organizations to choose a specific course of action effectively in response to opportunities and threats. Formally, a decision rule is a logic expression defined by the proposition ϕ → ψ, read “if ϕ then ψ”, where ϕ ∈ L(F) and ψ ∈ L(d) are referred to as the condition and the decision of the rule, respectively. A decision rule ϕ → ψ is true if ⟨ϕ⟩ ⊆ ⟨ψ⟩. ϕ and ψ are equivalent, written as ϕ ↔ ψ, if and only if (ϕ→ψ) ∧ (ψ→ϕ).

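Assume that ⟨ϕ⟩ and ⟨ψ⟩ are nonempty. The support of the rule ϕ → ψ is defined as

$$\mathrm{Supp}(\phi \to \psi) = |\langle\phi\rangle \cap \langle\psi\rangle| \tag{13}$$

The larger Supp(ϕ → ψ), the more powerful the rule in the DIT. When ⟨ϕ⟩ ≠ ∅, the certainty or accuracy of ϕ → ψ, denoted by Cert(ϕ → ψ), is

$$\mathrm{Cert}(\phi \to \psi) = \frac{|\langle\phi\rangle \cap \langle\psi\rangle|}{|\langle\phi\rangle|} \tag{14}$$

This is the percentage of objects having property ψ among the objects having property ϕ, so Cert(ϕ → ψ) shows the confidence of the rule. Consequently, Cert(ϕ → ψ) = 1 is equivalent to ϕ → ψ being true, in which case the rule is certain or accurate. Alternatively, if ⟨ψ⟩ ≠ ∅, the coverage of ϕ → ψ is defined as:

$$\mathrm{Covg}(\phi \to \psi) = \frac{|\langle\phi\rangle \cap \langle\psi\rangle|}{|\langle\psi\rangle|} \tag{15}$$

The smaller Covg(ϕ → ψ), the less powerful the rule. Finally, the popularity of ϕ → ψ is measured by the strength of the rule:

$$\mathrm{Strength}(\phi \to \psi) = \frac{|\langle\phi\rangle \cap \langle\psi\rangle|}{|\Omega|} \tag{16}$$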


In a given DIT, a coeffect reduced set R of conditional features and corresponding positive region Pd(R) are set up. Then, the DIT is restricted to a new table with features R, d and Pd(R). Such a table is called decision support table or DST. Based on the above measures, decision rules extracted from DST are verified before using them in prediction decisions.

Note that there may be pairs of inconsistent or conflicting decision rules, which have the same conditions but different decisions. Such conflicting rules must be excluded. In general, the selected set ℜ of τ decision rules ϕα→ψα needs to meet the following properties:

  1. Each ϕα→ψα in ℜ is admissible, i.e., Supp(ϕα→ψα) ≠ 0,

  2. ℜ covers Ω, i.e., $\bigcup_{\alpha=1}^{\tau} \langle\phi_\alpha\rangle = \bigcup_{\alpha=1}^{\tau} \langle\psi_\alpha\rangle = \Omega$,

  3. ℜ consists of mutually independent pairs, i.e., for ϕα→ψα, ϕβ→ψβ∈ℜ with α ≠ β, it holds that ⟨ϕα⟩∩⟨ϕβ⟩ = ∅ and ⟨ψα⟩∩⟨ψβ⟩ = ∅,

  4. ℜ preserves the consistency: $\bigcup_{i=1}^{s} L_F D_i = \bigcup_{\alpha=1}^{\tau} \langle\phi_\alpha\rangle$.

Example 5: Consider a coeffect reduced set, e.g., R = {f1, f2}, and the positive region Pd(R) = {ω1, ω2, ω4, ω6, ω7, ω8} as in Example 4. Some decision rules are extracted from Table 1, and the measures of the obtained rules are presented in Table 2. The supports of the 2nd and 3rd rules are both 2, their certainties are equal to 1, and their strengths to 22.2%. So, they can be combined together:

$$(f_1 = 1) \wedge [(f_2 = 1) \vee (f_2 = 2)] \rightarrow (d = 1) \tag{17}$$

| Decision rules | Coverage (%) | Supported by |
|---|---|---|
| 1. (f1 = 0) ∧ (f2 = 1) → (d = 0) | 50.0 | C2: ω2, ω7 |
| 2. (f1 = 1) ∧ (f2 = 1) → (d = 1) | 40.0 | C1: ω1, ω8 |
| 3. (f1 = 1) ∧ (f2 = 2) → (d = 1) | 40.0 | C4: ω4, ω6 |

Table 2.

List of extracted decision rules.

The support of this rule is raised to 4, its coverage to 80%, and its strength to 44.4%. This rule is supported by the classes C1 and C4, and can be read as follows: “if capacity for innovation is acceptable and service capability is at least acceptable, then the system activity is still acceptable”.

The class C3 = {ω3, ω5, ω9} is not in Pd(R), so a rule like (f1 = 1) ∧ (f2 = 0) → (d = 0 or 1) is not considered: such a rule would be useless in practice, since it says nothing about the decision.
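The rule measures used in Example 5 can be sketched in Python as follows. The sketch assumes that support is the number of objects satisfying both condition and decision, and that certainty, coverage and strength divide the support by |⟨ϕ⟩|, |⟨ψ⟩| and |Ω|, respectively, which is consistent with the coverages in Table 2 and the strength of 22.2% quoted in the text. The function name `rule_measures` is illustrative.

```python
def rule_measures(phi, psi, n):
    """Support, certainty, coverage and strength of the rule phi -> psi.

    phi and psi are the meaning sets of the condition and the decision;
    n is the number of objects in the table.
    """
    supp = len(phi & psi)
    return {
        "support": supp,
        "certainty": supp / len(phi),
        "coverage": supp / len(psi),
        "strength": supp / n,
    }

D0 = {2, 5, 7, 9}
D1 = {1, 3, 4, 6, 8}
rule1 = rule_measures({2, 7}, D0, n=9)   # (f1 = 0) and (f2 = 1) -> (d = 0)
rule2 = rule_measures({1, 8}, D1, n=9)   # (f1 = 1) and (f2 = 1) -> (d = 1)
print(rule1)
print(rule2)
```

Both rules have certainty 1 because their condition sets are contained in a single decision class; they differ in coverage only because D0 and D1 have different sizes.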

The decision-making method is also applied to build decisions for risk warning based on historical data. A risk management model includes three sequential basic steps: risk identification, risk measurement, and risk warning. Risk identification should itself be objective; however, when all risk levels are assessed by experts based on their work experience, the role of historical data is ignored. Such a model does not sufficiently consider the uncertainty and imprecision of risk, and consequently it will unavoidably lead to some faulty judgments.

Data to identify risk factors often come from the operation, policy, environment, and management of a system. The collected data include a feature that assesses risk, described by the decision feature d in a DIT. This decision feature d often has six levels, 0: no risk, 1: little, 2: low-grade, 3: middle-grade, 4: distinct, and 5: dangerous. Because the historical data are collected as they occur, some data fields or features will have little impact on the final risk level. If these redundant features are removed, a simplified feature set is produced, which has a positive impact on risk judgment. This is where finding a reduced feature set helps: unnecessary information is ignored while the nature of the collected data remains unchanged.

Based on fact-finding of conditional features and observed risk levels on DIT, decision rules to predict risk levels are extracted. This process is only a step of the training stage in machine learning. To improve quality of risk prediction, more observations on DIT and verifications of rules must be done repeatedly.

Example 6: To evaluate the security risks of a system, three types of conditional features, coming from environmental impact, management structure, and control equipment, are taken into account. These conditional features are denoted by E, M, and C, respectively, and the decision feature d is simplified to two levels, either 1: risk-warning or 0: no-warning. The data are shown in Table 3.


Table 3.

Risk warning data.

From Table 3, there are five equivalence classes C1 = {ω1}, C2 = {ω2, ω5}, C3 = {ω3}, C4 = {ω4}, C5 = {ω6} and two decision classes D1 = {ω4, ω5}, D2 = {ω1, ω2, ω3, ω6}.

Using Eqs. (4)–(6), the information entropy of F = {E, M, C} is H(F) = 2.2516, H(d) = 0.9183, and the mutual information between F and d is I(F, d) = 0.5850. From Eq. (6), I(F − {C}, d) = 0.1258 is less than I(F, d), so C is a core feature with a significance of Sgnf(C, d) = 0.4591.

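Consider F − {M} = {E, C}. From Eq. (5), H(d|F − {M}) = 0.3333, which implies I(F − {M}, d) = 0.5850 = I(F, d). Therefore, {E, C} is a coeffect reduced set of F. Hence, there are formally two decision rules:

$$\begin{aligned} &[(E = 0) \wedge (C = 0)] \vee [(E = 1) \wedge (C = 1)] \rightarrow (d = 0),\\ &[(E = 1) \wedge (C \neq 0)] \vee [(E = 0) \wedge (C \neq 0)] \rightarrow (d = 1). \end{aligned} \tag{18}$$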

Note that the first expression of the disjunction in the second rule is implied by the second expression in the first rule. Therefore, [(E = 1) ∧ (C = 1)] → [(d = 0) or (d = 1)] may happen. Alternatively, the second rule can be written as (C ≠ 0) → (d = 1). However, if E = 1 and C = 1, the first rule gives d = 0, contrary to the rule just deduced. For these reasons, the second rule is reasonably chosen as [(E = 1) ∧ (C = 2)] ∨ [(E = 0) ∧ (C ≠ 0)] → (d = 1).

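Similarly, F − {E} = {M, C} gives I(F − {E}, d) = I(F, d), thus {M, C} is also a reduced set of F. Then,

$$\begin{aligned} &[(M = 1) \wedge (C = 0)] \vee [(M = 0) \wedge (C = 1)] \rightarrow (d = 0),\\ &[(M = 1) \wedge (C \neq 0)] \vee [(M = 0) \wedge (C = 1)] \rightarrow (d = 1). \end{aligned}$$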


Note also that the second expressions of the above disjunctions are identical, so they must be ignored: if (M = 0) ∧ (C = 1) is true, these rules simultaneously imply d = 0 and d = 1, which makes a decision impossible.

Consequently, the second and fourth rules in Table 4 may be used for risk warning w.r.t the collected data in Table 3.

| Decision rules for risk warning | Coverage (%) | Strength (%) |
|---|---|---|
| 1. [(E = 0) ∧ (C = 0)] → (d = 0) | 50.0 | 20.0 |
| 2. [(E = 1) ∧ (C = 2)] ∨ [(E = 0) ∧ (C ≠ 0)] → (d = 1) | 75.0 | 60.0 |
| 3. [(M = 1) ∧ (C = 0)] → (d = 0) | 50.0 | 20.0 |
| 4. [(M = 1) ∧ (C ≠ 0)] → (d = 1) | 75.0 | 60.0 |

Table 4.

List of extracted decision rules for risk warning.

The difficulty of choosing decision rules increases with large-scale datasets. To partly overcome this shortcoming and derive decision rules more efficiently, machine learning techniques should be used. For instance, in [11], a back propagation neural network was trained on the data in a DIT and used to verify decision rules in a number of steps, so as to minimize the prediction errors of the decision rules.


3. Evaluation of the extent of MIS using ANOVA

For the outcome extent of an MIS, it is assumed that a reduced set of m features, namely f1, f2,…, fm, is considered and evaluated with real numbers. The probability distribution of fi is assumed to be normal, N(ξi, σi2), with expected mean ξi and variance σi2.

ANOVA, or analysis of variance, is a statistical method that uses variances to determine whether expected means are different or equal. It assesses the significance of factors, here called features, by comparing the response means of observation samples at different features. In this chapter, single-stage and multiple-stage ANOVA are introduced to evaluate features from the extent of an MIS.

In doing ANOVA, it is also assumed that all m features fi have the same variance. In the course of an evaluation, m observation samples, one per feature, are randomly drawn. The ith sample is denoted by {ωij}, j = 1, 2,…, ni, a realization of the random variable fi from the population of fi values. Basic characteristics of the ith sample are:

$\bar{\omega}_i = \sum_{j=1}^{n_i} \omega_{ij} / n_i$: the sample average, an estimate for ξi,

$s_i^2 = \sum_{j=1}^{n_i} (\omega_{ij} - \bar{\omega}_i)^2 / df_i$: the sample variance, an estimate for σ2 with degrees of freedom dfi = ni − 1.

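These calculations are done by using the following three basic sums:

Sum:

$$S_i = \sum_{j=1}^{n_i} \omega_{ij} \tag{21}$$

Sum of squares:

$$SS_i = \sum_{j=1}^{n_i} \omega_{ij}^2 \tag{22}$$

Sum of squares of deviations:

$$SSD_i = \sum_{j=1}^{n_i} (\omega_{ij} - \bar{\omega}_i)^2 \tag{23}$$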


Then, it follows that ω̄i = Si/ni and SSDi = SSi − Si2/ni, so si2 = SSDi/dfi.
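As a small numerical check of these relations, the sketch below computes the three basic sums for a single sample and derives the sample average and variance from them. The observations 5, 8, 6 are reused from the dataset of Example 8 purely for illustration, and the function name `basic_sums` is hypothetical.

```python
def basic_sums(sample):
    """Return (S, SS, SSD) for one sample, with SSD = SS - S**2 / n."""
    n = len(sample)
    S = sum(sample)                    # sum of observations
    SS = sum(x * x for x in sample)    # sum of squares
    SSD = SS - S * S / n               # sum of squares of deviations
    return S, SS, SSD

sample = [5, 8, 6]
S, SS, SSD = basic_sums(sample)
mean = S / len(sample)
variance = SSD / (len(sample) - 1)     # degrees of freedom: n - 1
print(S, SS, round(SSD, 3), round(mean, 3), round(variance, 3))
```

Computing SSD from S and SS avoids a second pass over the data, which is why the calculation schemes in the tables of this section are organized around these sums.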

To verify the condition that all variances σi2 are equal to the same value σ2, the Bartlett test based on the χ2 probability distribution is used at a level of significance α, usually 1 to 5%. If the hypothesis of equal variances is correct, m > 1 and ni > 1 for all i, Bartlett has shown that the following statistic χ2cal has approximately a χ2-distribution with m − 1 degrees of freedom:

$$\chi^2_{cal} = \frac{1}{c}\left[ df \, \ln s^2 - \sum_{i=1}^{m} df_i \, \ln s_i^2 \right] \tag{24}$$

Here, df = ∑i:1..m dfi, c = 1 + (∑i:1..m 1/dfi − 1/df)/[3(m − 1)], and s2 = (∑i:1..m dfi × si2)/df = (∑i:1..m SSDi)/df is the pooled variance, an estimate for σ2. If the calculated χ2cal is less than the 100(1 − α)% percentile of the χ2-distribution, there is no reason to reject the hypothesis that all variances are the same. Note that the χ2 approximation is poor for dfi ≤ 2.

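In case of n1 = n2 = … = nm = n, dfi = n − 1 for all i, df = m(n − 1), and Eq. (24) becomes quite simple. Indeed, because ln s2 = ln ∑i:1..m SSDi − ln(df) and ln si2 = ln SSDi − ln(dfi), a shortened form of Eq. (24) is

$$\chi^2_{cal} = \frac{n-1}{c}\left[ m \, \ln\!\left( \frac{1}{m}\sum_{i=1}^{m} SSD_i \right) - \sum_{i=1}^{m} \ln SSD_i \right] \tag{25}$$

where c = 1 + (m + 1)/(3m[n − 1]) and ln is the natural logarithm (with base-10 logarithms, the bracket is multiplied by 2.3026). The value χ2cal in Eq. (25) is calculated by using only the SSDs.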
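The Bartlett statistic can be sketched in plain Python as follows. This is an illustration that assumes the standard form of Bartlett's test with natural logarithms and the correction factor c given in the text; the function name `bartlett` is hypothetical, and the six samples are the columns of the dataset used in Example 8.

```python
from math import log

def bartlett(samples):
    """Return (chi2_cal, c, degrees of freedom) of Bartlett's test."""
    m = len(samples)
    dfs = [len(s) - 1 for s in samples]
    ssds = [sum(x * x for x in s) - sum(s) ** 2 / len(s) for s in samples]
    df = sum(dfs)
    pooled = sum(ssds) / df                       # pooled variance s^2
    c = 1 + (sum(1 / d for d in dfs) - 1 / df) / (3 * (m - 1))
    stat = (df * log(pooled)
            - sum(d * log(ssd / d) for d, ssd in zip(dfs, ssds))) / c
    return stat, c, m - 1

samples = [[5, 8, 6], [7, 4, 6], [6, 7, 5], [9, 8, 8], [8, 7, 5], [10, 8, 7]]
chi2_cal, c, dof = bartlett(samples)
print(round(chi2_cal, 3), round(c, 3), dof)
```

For these six samples of size 3 the sketch gives c ≈ 1.194 and χ2cal ≈ 1.945 with 5 degrees of freedom, which can then be compared with the tabulated χ2 percentile. Every sample must have a strictly positive SSD, otherwise the logarithm is undefined.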

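Set n = ∑i:1..m ni, ωo = (∑ ni ω̄i)/n, ξo = (∑ ni ξi)/n and ηi = ξi − ξo. Then the following partition holds:

$$\sum_{i=1}^{m}\sum_{j=1}^{n_i} (\omega_{ij} - \xi_i)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i} (\omega_{ij} - \bar{\omega}_i)^2 + \sum_{i=1}^{m} n_i (\bar{\omega}_i - \omega_o - \eta_i)^2 + n(\omega_o - \xi_o)^2 \tag{26}$$

According to the χ2-partition theorem, the sums on the right-hand side of Eq. (26), divided by σ2, follow χ2-distributions with n − m, m − 1, and 1 degrees of freedom, respectively.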

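If the expected means of the m populations are the same, ξi = ξo and ηi = 0 for all i. The first two terms of Eq. (26) are then the variations within and between samples, and the corresponding variances are determined in turn as follows:

$$s_1^2 = \frac{1}{n-m}\sum_{i=1}^{m} SSD_i, \qquad s_2^2 = \frac{1}{m-1}\sum_{i=1}^{m} n_i (\bar{\omega}_i - \omega_o)^2$$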


The statistics s12, s22 and s32 = n[ωo−ξo]2 are unbiased estimates of σ2. In this case, the total variance between observations and population is determined as follows:


In such a case, the variance ratio v2cal = s22/s12, i.e., the variation between samples (m − 1 degrees of freedom) divided by the variation within samples (n − m degrees of freedom), follows the Fisher probability distribution with (m − 1, n − m) degrees of freedom. Therefore, the hypothesis of equality of the m expected means is tested using the Fisher distribution at a given level of significance α, usually 1 to 5%. If v2cal > F1−α(m − 1, n − m), the hypothesis of equal means is rejected, where F1−α(m − 1, n − m) is the 100(1 − α)% percentile of the Fisher distribution.

It is noticed that the condition m > 1 and, for all i, ni > 1 are essential not only for Bartlett test, but also for doing ANOVA [12]. Conversely, the analysis is trivial when ni = 1 for some i. Also, if m = 1, the analysis is pure inference from single population [13].
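A single-stage ANOVA can be sketched with the basic sums alone. The code below is a minimal illustration, assuming the variance ratio is taken as the between-samples variance over the within-samples variance, as in the calculation of Example 7; the function name `one_way_anova` is hypothetical, and the three samples are borrowed from stage 1 of the dataset of Example 8 only to have concrete numbers.

```python
def one_way_anova(samples):
    """Return the variance ratio and its degrees of freedom (between, within)."""
    m = len(samples)
    n = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)
    # Within samples: sum of the SSDs of the individual samples.
    ssd_within = sum(sum(x * x for x in s) - sum(s) ** 2 / len(s) for s in samples)
    # Between samples: sum of S_i^2 / n_i minus the correction term S^2 / n.
    ssd_between = sum(sum(s) ** 2 / len(s) for s in samples) - grand_total ** 2 / n
    var_within = ssd_within / (n - m)
    var_between = ssd_between / (m - 1)
    return var_between / var_within, (m - 1, n - m)

samples = [[5, 8, 6], [7, 4, 6], [6, 7, 5]]
ratio, dfs = one_way_anova(samples)
print(round(ratio, 4), dfs)
```

The ratio is then compared with the tabulated Fisher percentile for the returned degrees of freedom; the sketch requires m > 1 and more observations than samples, in line with the conditions stated above.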

Example 7: Assume that four features need to be tested at the 5% level of significance. The data and calculations are given in Table 5.

| Features fi | f1 | f2 | f3 | f4 | Σ |
|---|---|---|---|---|---|
| {1}. ni | 3 | 3 | 3 | 4 | 13 |
| {2}. dfi | 2 | 2 | 2 | 3 | 9 |

Table 5.

Calculations for single-stage ANOVA.

Using Eq. (24), χ2cal = 1.328 is far less than χ20.95(3) = 7.815, the 95% percentile in the table of χ2 probabilities with df = 3. Therefore, the hypothesis of equal variances is accepted. The variation within features is estimated by the pooled variance, Eq. (29), s2 = 36.333/9 = 4.037. Using the underlined numbers in Table 5, the ANOVA table is presented in Table 6.

| Variation sources | SSD | df | s2 | v2 |
|---|---|---|---|---|
| Between features | 1.359 | 3 | 0.453 | 0.11 |
| Within features | 36.333 | 9 | 4.037 | |
| Total | 37.692 | 12 | | F0.95(3,9) = 3.86 |

Table 6.

Single-stage ANOVA table of Example 7.

The basic sums calculated in the first part of Table 5 are used to set up the ANOVA in Table 6. It shows that v2cal = 0.453/4.037 = 0.112 < 3.86, the 95% percentile in the table of Fisher probabilities for α = 5%. The hypothesis of equal expected means is therefore accepted at the 5% significance level.

If the hypothesis ξ1 = ξ2 = … = ξm is rejected, all possible differences of these means, in the form of linear combinations, are estimated by using confidence intervals. In such a case, there is a probability of 1 − α that all comparisons among the expected means simultaneously satisfy:

$$\sum_{i=1}^{m} \delta_i \bar{\omega}_i - \lambda \;\le\; \sum_{i=1}^{m} \delta_i \xi_i \;\le\; \sum_{i=1}^{m} \delta_i \bar{\omega}_i + \lambda \tag{30}$$

Here, ∑i=1…m δi = 0 and λ2 = s2 × F1−α(m − 1, n − m) × (m − 1) × ∑i=1…m (δi2/ni), where F1−α(m − 1, n − m) is the 100(1 − α)% percentile of the Fisher probability distribution.

For instance, if m = 3, each sample has size ni = 4, ω̄1 = 2.25, ω̄2 = 4.0, ω̄3 = 4.5 and s2 = 4.41, then F0.95(2, 3 × 4 − 3) = 4.26. Using Eq. (30), some 95% confidence intervals are calculated as follows:

  • δ1 = 1 = −δ2, δ3 = 0, λ = 4.334; the confidence interval of ξ1 − ξ2 is −1.75 ± 4.334 or (−6.084, 2.584).

  • δ1 = 0, δ2 = 1 = −δ3, λ = 4.334; similarly, the confidence interval of ξ2 − ξ3 is −0.5 ± 4.334 or (−4.834, 3.834).

  • δ1 = ½ = δ2, δ3 = −1, λ = 3.754; the 95% confidence interval of ½ξ1 + ½ξ2 − ξ3 is −1.375 ± 3.754 or (−5.129, 2.379).
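The half-width λ of such simultaneous intervals can be evaluated with a short function. The sketch below assumes the expression λ2 = s2 × F × (m − 1) × ∑ δi2/ni given in the text and takes the Fisher percentile as an input rather than computing it; the function name `contrast_interval` is hypothetical.

```python
from math import sqrt

def contrast_interval(deltas, means, sizes, s2, f_percentile):
    """Return (estimate, half-width) of the interval for sum(delta_i * xi_i)."""
    if abs(sum(deltas)) > 1e-12:
        raise ValueError("the coefficients delta_i must sum to zero")
    m = len(means)
    estimate = sum(d * w for d, w in zip(deltas, means))
    lam = sqrt(s2 * f_percentile * (m - 1)
               * sum(d * d / n for d, n in zip(deltas, sizes)))
    return estimate, lam

means, sizes = [2.25, 4.0, 4.5], [4, 4, 4]
# F0.95(2, 9) = 4.26 and s2 = 4.41 are the values quoted in the text.
est12, lam12 = contrast_interval([1, -1, 0], means, sizes, 4.41, 4.26)
est_mix, lam_mix = contrast_interval([0.5, 0.5, -1], means, sizes, 4.41, 4.26)
print(round(est12, 3), round(lam12, 3), round(est_mix, 3), round(lam_mix, 3))
```

The interval is estimate ± half-width; contrasts that spread their weight over more samples, such as the second one, obtain a narrower interval for the same data.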

When several stages need to be tested for equality of the expected means of features, multiple-stage ANOVA is applied. This is the case of evaluating the same m features in k different stages, denoted by Γν, ν = 1, 2,…, k. To simplify the presentation, without loss of generality, it is assumed that all observed samples in all stages have the same size, i.e., ni = n for all i, and Eq. (25) is used for the Bartlett test.

The notations are similar, but an index ν is added to the observations in the νth stage. The sums in Eqs. (21)–(23) are renotated as Sνi, SSνi, SSDνi, so that ω̄νi = Sνi/n and sνi2 = SSDνi/(n − 1) are the average and variance of the ith sample of the νth stage. All computations for multiple stages are similar to those of the single-stage ANOVA. Then, the results from the stage computations are combined, as shown in the last part of Table 7, to form the multistage ANOVA table.

Example 8: Given the two-stage dataset of three features in the first five rows of Table 7, the calculations are illustrated in the parts of the table labeled {1} and {2}, which present schemes for finding the basic sums and the terms of the Bartlett test and ANOVA.

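| ωνij, j | Stage 1: f1 | f2 | f3 | Stage 2: f1 | f2 | f3 | Sizes |
|---|---|---|---|---|---|---|---|
| 1 | 5 | 7 | 6 | 9 | 8 | 10 | k = 2 |
| 2 | 8 | 4 | 7 | 8 | 7 | 8 | m = 3 |
| 3 | 6 | 6 | 5 | 8 | 5 | 7 | n = 3 |
| {1} Sνi | 19 | 17 | 18 | 25 | 20 | 25 | S = 124 |

| {2} Bartlett test | | | | ANOVA | |
|---|---|---|---|---|---|
| SSD | 21.333 | c | 1.194 | Σ(S1i + S2i)²/(kn) − S²/(kmn) | 4.778 |
| log SSDi | 2.80 | χ²cal | 1.945 | SS1 + SS2 − S²/(kmn) | 41.778 |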

Table 7.

Calculations for Two-stage ANOVA.
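The basic sums in parts {1} and {2} of Table 7 can be reproduced with a few lines of Python. The sketch below is only an illustration: it assumes that each column of Table 7 holds the n = 3 evaluations of one feature in one stage, and all variable names are chosen here rather than taken from the chapter.

```python
# Evaluations from Table 7, read column by column (an assumption about the layout):
# data[stage][feature] holds the n evaluations of that feature in that stage.
data = [
    [[5, 8, 6], [7, 4, 6], [6, 7, 5]],   # stage 1: f1, f2, f3
    [[9, 8, 8], [8, 7, 5], [10, 8, 7]],  # stage 2: f1, f2, f3
]
k, m, n = len(data), len(data[0]), len(data[0][0])

# Per-cell sums S, sums of squares SS, and sums of squared deviations SSD.
S_cell = [[sum(obs) for obs in stage] for stage in data]
SS_cell = [[sum(x * x for x in obs) for obs in stage] for stage in data]
SSD_cell = [[SS_cell[v][i] - S_cell[v][i] ** 2 / n for i in range(m)]
            for v in range(k)]

S_stage = [sum(row) for row in S_cell]                               # stage totals
S_feature = [sum(S_cell[v][i] for v in range(k)) for i in range(m)]  # S1i + S2i
S = sum(S_stage)
correction = S ** 2 / (k * m * n)

ssd_within = sum(map(sum, SSD_cell))
ssd_total = sum(map(sum, SS_cell)) - correction
ssd_stages = sum(s ** 2 for s in S_stage) / (m * n) - correction
# Each feature total pools k*n observations, hence the divisor k*n.
ssd_features = sum(s ** 2 for s in S_feature) / (k * n) - correction

print(S_cell, S)
print(round(ssd_within, 3), round(ssd_features, 3),
      round(ssd_stages, 3), round(ssd_total, 3))
```

The printed sums agree with the values 21.333, 4.778, 14.222, and 41.778 used in Table 7 and in the between-stages line of the ANOVA table.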

Calculations for the Bartlett test in {2} of Table 7 show that χ²cal = 1.945 < χ²0.95(5) = 11.07, so the hypothesis that the population variance is the same for all features is accepted at α = 5%. An estimate of the population variance is s1² = 21.333/(2 × 3 × [3 − 1]) = 1.778, cf. Table 8. Part {3} of Table 7 is the calculation scheme for the terms in Table 8, where Subtotal equals Total minus Within stages, i.e., the sum of Between features within stages, Between stages, and Interaction.
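As a cross-check, the Bartlett terms can be computed in Python. This is a minimal sketch using the standard form of Bartlett's statistic with natural logarithms, which may be arranged differently from Eq. (25); the function name `bartlett` is chosen here, and the six samples are the columns of Table 7 under the assumption that each column is one feature in one stage.

```python
import math

def bartlett(ssd, df):
    """Standard Bartlett statistic from per-group SSD values and degrees of freedom.

    Returns (chi2_cal, c, pooled_variance). Every group needs a non-zero SSD,
    otherwise the logarithm of its variance is undefined.
    """
    groups = len(ssd)
    df_total = sum(df)
    s2_pooled = sum(ssd) / df_total
    log_term = df_total * math.log(s2_pooled) - sum(
        f * math.log(q / f) for q, f in zip(ssd, df)
    )
    c = 1 + (sum(1 / f for f in df) - 1 / df_total) / (3 * (groups - 1))
    return log_term / c, c, s2_pooled

# Six samples (two stages x three features), three evaluations each.
samples = [[5, 8, 6], [7, 4, 6], [6, 7, 5], [9, 8, 8], [8, 7, 5], [10, 8, 7]]
ssd = [sum(x * x for x in s) - sum(s) ** 2 / len(s) for s in samples]
df = [len(s) - 1 for s in samples]

chi2_cal, c, s2 = bartlett(ssd, df)
print(round(chi2_cal, 3), round(c, 3), round(s2, 3))
# Compare chi2_cal with the tabulated percentile, here 11.07 for 5 degrees of freedom.
print(chi2_cal < 11.07)
```

The percentile itself is read from a chi-square table, as in the text, so the sketch needs no statistical library.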

| Variation sources | SSD | df | s² | v² |
|---|---|---|---|---|
| Between stages | 14.222 | 1 | 14.222 | 8.0 |
| Between features within stages | 6.210 | 2 | 3.105 | 1.747 |
| Within stages | 21.333 | 12 | 1.778 | |

Table 8.

Two-stage ANOVA table of Example 8.

The ratio of the variation between stages to the variation within stages is v² = s3²/s1² = 14.222/1.778 = 8.0, which far exceeds the 95% percentile of the Fisher distribution, F0.95(1,12) = 4.75. The expected means therefore differ significantly between stages. In other words, the effects between stages are significantly discriminated.

Similarly, comparing the variation between features within stages with the variation within stages, Table 8 shows that v² = s2²/s1² = 3.105/1.778 = 1.747 < F0.95(2,12) = 3.89. The difference between the expected means of features within stages is thus not significant, i.e., the effects between features within stages are almost the same.

Besides the above effects, the interaction between stages and features also needs to be considered. The ratio v² = 0.006/0.012 = 0.50 indicates that no such interaction is present in the given dataset. Thus, the lines labeled “Interaction” and “Within stages” both give unbiased estimates of σ², and combining them improves the estimate of σ². The residual mean square pools the variations of Interaction and Within stages. This gives an updated estimate of the population variance of 1.525, which is less than s1² = 1.778 in Table 8 and therefore increases the v² ratios. Table 9 gives the two-stage analysis of Example 8 without interaction.

| Variation sources | SSD | df | s² | v² |
|---|---|---|---|---|
| Between stages | 14.222 | 1 | 14.222 | 9.328 |
| Between features within stages | 6.210 | 2 | 3.105 | 2.036 |
| Residual mean square | 21.345 | 14 | 1.525 | |

Table 9.

ANOVA table—two-stage without interaction.

The ratio v² = s2²/s1² = 3.105/1.525 = 2.036 < F0.95(2,14) = 3.74, so the effects between features within stages are the same. Meanwhile, v² = s3²/s1² = 14.222/1.525 = 9.328 also far exceeds F0.95(1,14) = 4.60, so the effects between stages are again significantly discriminated, cf. Table 8.


4. Case studies

To evaluate the extent to which MIS is used to attain the objectives of long-term and short-term planning in the South-West Nigerian universities [14], the following features were selected. For long-term evaluation: f1: Construction of buildings in the university, f2: Student enrolment projection, f3: Manpower projection, f4: Staff recruitment exercises, f5: Establishment of new faculties and departments, f6: Designing university academic programs, and f7: Stocking the library with books and journals. For short-term evaluation: f1: Promotion of staff, f2: Staff training and development, f3: Appointment of deans or heads of departments or divisions, f4: Appointment of committee members, f5: Allocation of offices to staff, f6: Allocation of residential quarters, f7: Allocation of lecture rooms/theaters, f8: Full-time equivalent or teacher-student ratio, and f9: Maximum teaching load.

In evaluating our university's 5-year strategic plan, the following features are used: f1: Effectuation of rights and obligations of students, f2: Promotion of international cooperation, f3: Library, equipment, and material facilities, f4: Potential of scientific R&D and transfer of technology, f5: Capacity of organization and management, f6: Design of university academic programs, f7: Promotion of academic operations, f8: Capacity of manpower projection, and f9: Management of finance and resources. These basic features are the factors used to evaluate whether the university attains its goals and objectives. Each basic feature is evaluated on a scale of 100, but here it is illustrated on a scale of 20 points.

Example 9: Let fi, i = 1, 2, …, 9, be the features characterizing the extent of an MIS as above, and let ωij, j = 1, 2, …, 12, be the value given to the ith feature by the jth evaluator on the shortened 20-point marking scheme. Calculations for the single-stage ANOVA table are shown in Table 10.

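| | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 |
|---|---|---|---|---|---|---|---|---|---|
| {1} Si | 87 | 62 | 72 | 100 | 134 | 126 | 126 | 146 | 170 |

| {2} Bartlett | df: 8 | ANOVA | |
|---|---|---|---|
| log s²: 1.078 | χ²cal: 9.432 | ΣSi²/n: 10,543 | ΣSi²/n − S²/(nm): 853.33 |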

Table 10.

Calculations for single-stage ANOVA dataset.

The calculated value χ²cal = 9.432 in Table 10 does not exceed χ²0.95(8) = 15.51, so the hypothesis of equal variances is accepted. The population variance is estimated as s² = 1185.58/99 = 11.976. The corresponding ANOVA table for this dataset is given in Table 11.

| Variation sources | SSD | df | s² | v² |
|---|---|---|---|---|
| Between features | 853.333 | 8 | 106.667 | 8.907 |
| Within features | 1185.583 | 99 | 11.976 | |

Table 11.

Single-stage ANOVA table of Example 9.

Here, since the variance ratio v² = 8.907 far exceeds F0.95(8,99) = 2.06, it is unreasonable to assume that all the expected means of the features are the same. This can also be seen from Table 10, where the sums of features f1 to f4 are all smaller than those of features f5 to f9.
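The between-features line of Table 11 follows from the feature sums in Table 10 alone. The short Python sketch below illustrates this; the within-features SSD is copied from Table 11 because the individual evaluations are not reproduced here, and the variable names are chosen for this illustration.

```python
# Feature sums S_i from part {1} of Table 10 (m = 9 features, n = 12 evaluators).
S_i = [87, 62, 72, 100, 134, 126, 126, 146, 170]
n = 12
m = len(S_i)
S = sum(S_i)

# SSD between features: sum(S_i^2)/n - S^2/(n*m).
ssd_between = sum(s * s for s in S_i) / n - S * S / (n * m)
df_between = m - 1

# Within-features SSD taken from Table 11 (raw observations are not listed).
ssd_within = 1185.583
df_within = m * (n - 1)

s2_between = ssd_between / df_between
s2_within = ssd_within / df_within
v2 = s2_between / s2_within

print(round(ssd_between, 3), df_between, round(s2_between, 3))
print(round(s2_within, 3), df_within, round(v2, 3))
```

The resulting ratio is then compared with the tabulated Fisher percentile for (8, 99) degrees of freedom.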

A more detailed examination reveals that the nine features can be partitioned into two groups, namely A = {f1, f2, f3, f4} with the first four features and B = {f5, f6, f7, f8, f9} with the remaining ones. Each group of features can be seen as a treatment, and its observation sample includes all observations in the group. It is therefore reasonable to split the variation between features into three portions: between the features in A, between the features in B, and between groups A and B. Calculations for this partition are extracted from Table 10 and illustrated in Table 12.

| | Group A | Group B | n = 12 |
|---|---|---|---|
| n* | 4 × 12 | 5 × 12 | 108 |

Table 12.

Calculations for ANOVA between group A and B.

Compared with the variance within features s², the variance ratios v² = 23.243/11.976 = 1.941 < F0.95(3,99) = 2.66 and v² = 28.40/11.976 = 2.371 < F0.95(4,99) = 2.43 in Table 13 show that there is no essential difference between features in the same group. The third portion is what remains of the between-features variation in Table 11, i.e., 853.333 − 69.729 − 113.60 = 670.004 with 1 degree of freedom. Since the third ratio v² = 670.004/11.976 = 55.946 is far greater than F0.95(1,99) = 3.9, the features in groups A and B do have different expected means.

Example 10: Assume that in an MIS there are two stages that need ANOVA with the same set of features. In each stage, samples of evaluations are collected on the 20-point marking scheme. Let ωνij be the integer value on the 20-point scale given to the ith feature by the jth evaluator in the νth stage, ν = 1, 2, i = 1, 2, …, 7, j = 1, 2, …, 8. This dataset, together with the calculations for ANOVA, is given in Table 14.

| Variation sources | SSD | df | s² | v² |
|---|---|---|---|---|
| Between features from A | 69.729 | 3 | 23.243 | 1.941 |
| Between features from B | 113.60 | 4 | 28.40 | 2.371 |
| Between groups A and B | 670.004 | 1 | 670.004 | 55.946 |

Table 13.

ANOVA table of two groups A and B.
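The partition of the between-features variation can be checked from the feature sums in Table 10, since the three portions must add up to the between-features SSD of Table 11. The following Python sketch is an illustration only; the helper `between_ssd` and the variable names are chosen here.

```python
# Feature sums from Table 10, n = 12 evaluations per feature.
S_i = [87, 62, 72, 100, 134, 126, 126, 146, 170]
n = 12
group_A, group_B = S_i[:4], S_i[4:]   # A = {f1..f4}, B = {f5..f9}

def between_ssd(sums, size):
    """SSD between units whose totals are `sums`, each pooling `size` observations."""
    total = sum(sums)
    return sum(s * s for s in sums) / size - total * total / (size * len(sums))

ssd_A = between_ssd(group_A, n)     # between the features in A, df = 3
ssd_B = between_ssd(group_B, n)     # between the features in B, df = 4
ssd_all = between_ssd(S_i, n)       # between all nine features, df = 8

# Between the two groups (df = 1); the groups pool different numbers of observations.
S_A, S_B = sum(group_A), sum(group_B)
n_A, n_B = n * len(group_A), n * len(group_B)
ssd_AB = S_A ** 2 / n_A + S_B ** 2 / n_B - (S_A + S_B) ** 2 / (n_A + n_B)

print(round(ssd_A, 3), round(ssd_B, 3), round(ssd_AB, 3), round(ssd_all, 3))
print(S_A / n_A, S_B / n_B)   # group means
```

The sum `ssd_A + ssd_B + ssd_AB` equals `ssd_all`, which gives a quick consistency check on the three lines of the table.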

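| | f1 | f2 | f3 | f4 | f5 | f6 | f7 | |
|---|---|---|---|---|---|---|---|---|
| ω1i1 | 16 | 15 | 17 | 14 | 14 | 13 | 14 | k = 2 |
| ω1i2 | 13 | 14 | 11 | 12 | 10 | 13 | 12 | m = 7 |
| ω1i3 | 14 | 16 | 15 | 14 | 15 | 14 | 16 | n = 8 |
| {1} S1i | 103 | 114 | 107 | 104 | 104 | 103 | 107 | 742 |
| {2} S2i | 112 | 123 | 118 | 113 | 112 | 113 | 116 | 807 |
| (S1i + S2i)² | 46,225 | 56,169 | 50,625 | 47,089 | 46,656 | 46,656 | 49,729 | 343,149 |

| {3} Bartlett | | ANOVA | S = S1 + S2: 1549 |
|---|---|---|---|
| Σ log SSDi: 15.61 | c: 1.051 | Σ(S1i + S2i)²/(kn) − S²/(kmn) | 23.589 |
| χ²cal: 9.507 | | SS1 + SS2 − S²/(kmn) | 263.77 |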

Table 14.

Calculations for two-stage ANOVA dataset.

Using the Bartlett test in {3}, χ²cal = 9.507 does not exceed χ²0.95(13) = 22.36, so the population variances are the same, with the pooled variance s² = 202.125/98 = 2.063. Table 15 shows this ANOVA.

| Variation sources | SSD | df | s² | v² |
|---|---|---|---|---|
| Between stages | 37.723 | 1 | 37.723 | 18.290 |
| Between features within stages | 23.589 | 6 | 3.932 | 1.906 |
| Within stages | 202.125 | 98 | 2.063 | |

Table 15.

Two-stage ANOVA table of Example 10.

The ratio v² = s2²/s1² = 3.932/2.063 = 1.906 is less than F0.95(6,98) = 2.15, so the difference between the expected means of features within stages is not significant. In contrast, v² = s3²/s1² = 37.723/2.063 = 18.29 > F0.95(1,98) = 3.96, so the expected means differ between stages.
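The between-stages and between-features terms of this example depend only on the stage sums S1i and S2i in Table 14. A short Python sketch, given here as an illustration with variable names chosen for the purpose, recomputes them; the within-stages SSD is copied from Table 15 because the full dataset is not listed.

```python
# Stage sums from parts {1} and {2} of Table 14.
S1 = [103, 114, 107, 104, 104, 103, 107]
S2 = [112, 123, 118, 113, 112, 113, 116]
k, m, n = 2, 7, 8   # stages, features, evaluators per feature and stage

S = sum(S1) + sum(S2)
correction = S * S / (k * m * n)

# Each stage total pools m*n observations; each feature total pools k*n.
ssd_stages = (sum(S1) ** 2 + sum(S2) ** 2) / (m * n) - correction
ssd_features = sum((a + b) ** 2 for a, b in zip(S1, S2)) / (k * n) - correction

ssd_within = 202.125            # taken from Table 15
df_within = k * m * (n - 1)     # 98
s2_within = ssd_within / df_within

v2_features = (ssd_features / (m - 1)) / s2_within
v2_stages = (ssd_stages / (k - 1)) / s2_within

print(round(ssd_stages, 3), round(ssd_features, 3), round(s2_within, 3))
print(round(v2_features, 3), round(v2_stages, 3))
```

Both ratios are then compared with the Fisher percentiles quoted in the text.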

Since v² = 0.057/0.339 = 0.167 is less than the 95% percentile of the Fisher distribution, no interaction is present. Thus, the “Interaction” and “Within stages” variation sources are combined to give s1² = (202.125 + 0.339)/104 = 1.947, a better estimate of σ² than 2.063 in Table 15.

The case of m = 1 and k = 2 has been presented in the previous subsection with groups A and B. In [15], ANOVA was used to determine whether a statistical relationship exists between the human development index and the security index. The authors of [16] combined ANOVA with regression analysis to assess and evaluate the student MIS of a university.

In this subsection, the Student test is presented for comparing the effects of a feature f in two stages or treatments. Let {ωij}, i = 1, 2 and j = 1, 2, …, ni, be two observation samples of sizes ni drawn from the two treatments of the feature f. Using Eqs. (21)–(23), the means ϖ1, ϖ2 and variances s1², s2² are calculated with df1 = n1 − 1 and df2 = n2 − 1.

The equality of the population variances is tested using the Fisher distribution with v² = s1²/s2². If v² < Fα/2(df1, df2) or v² > F1−α/2(df1, df2), it is unreasonable to assert that the population variances are equal. Otherwise, the pooled variance of these treatments is s² = (SSD1 + SSD2)/(df1 + df2).

The equality of the expected means of the treatments is tested by the Student distribution, based on the difference ϖ0 = ϖ1 − ϖ2. If this hypothesis is correct, there are two cases:

  • If the variances of the treatments are equal, the statistic tcal = ϖ0/so with so² = s²[1/n1 + 1/n2] has the Student distribution with df = df1 + df2 degrees of freedom.

  • If the variances of the treatments are not equal, tcal = ϖ0/so with so² = s1²/n1 + s2²/n2 approximately follows the Student distribution with 1/df = c²/df1 + (1 − c)²/df2, where c = (s1²/n1)/(s1²/n1 + s2²/n2).

The hypothesis that the two expected means of the feature f from the treatments are equal is rejected at significance level α when |tcal| > t1−α/2(df). Otherwise, the confidence interval of the difference η between the two means is

$$\varpi_0 - t_{1-\alpha/2}(df)\, s_o \;\le\; \eta \;\le\; \varpi_0 + t_{1-\alpha/2}(df)\, s_o$$

where t1−α/2(df) is the 100(1 − α/2)% percentile of the Student distribution, t1−α/2(df) = −tα/2(df).
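The two cases can be sketched in Python as follows. This is an illustrative implementation, not the chapter's own program: the function names are chosen here, the unequal-variance case uses the usual Welch-Satterthwaite approximation for the degrees of freedom, and the Student percentile has to be supplied from a table. The two samples are the first-feature evaluations of the two stages in Table 7, used only as sample input.

```python
import math

def two_sample_t(x1, x2, equal_var=True):
    """Return (t_cal, df, s_o) for the difference of means of two samples."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = sum(x1) / n1, sum(x2) / n2
    ssd1 = sum((x - mean1) ** 2 for x in x1)
    ssd2 = sum((x - mean2) ** 2 for x in x2)
    df1, df2 = n1 - 1, n2 - 1
    if equal_var:
        s2 = (ssd1 + ssd2) / (df1 + df2)        # pooled variance
        so2 = s2 * (1 / n1 + 1 / n2)
        df = df1 + df2
    else:
        v1, v2 = ssd1 / df1 / n1, ssd2 / df2 / n2
        so2 = v1 + v2
        c = v1 / so2
        df = 1 / (c * c / df1 + (1 - c) ** 2 / df2)   # Welch-Satterthwaite
    so = math.sqrt(so2)
    return (mean1 - mean2) / so, df, so

def confidence_interval(diff, so, t_percentile):
    """Interval diff -/+ t * s_o; t_percentile is read from a Student table."""
    return diff - t_percentile * so, diff + t_percentile * so

stage1_f1 = [5, 8, 6]
stage2_f1 = [9, 8, 8]
t_pooled, df_pooled, so_pooled = two_sample_t(stage1_f1, stage2_f1)
t_welch, df_welch, so_welch = two_sample_t(stage1_f1, stage2_f1, equal_var=False)
print(round(t_pooled, 3), df_pooled)
print(round(t_welch, 3), round(df_welch, 2))
```

With equal sample sizes, as here, both cases give the same tcal and differ only in the degrees of freedom used to look up the percentile.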

For instance, from Table 12, the variances in the groups, sA² = 69.729/47 = 1.484 and sB² = 113.60/59 = 1.925, give v² = sB²/sA² = 1.30, which is less than t0.995(106) = 2.35. It is accepted that the variances in groups A and B are equal. The pooled variance is estimated by s² = (SSD1 + SSD2)/(df1 + df2) = (69.729 + 113.60)/106 = 1.729. Also, Table 12 gives so² = s²[1/47 + 1/59] = 0.06611 and tcal = (11.7 − 6.6875)/so = 19.49, which far exceeds t0.995(106) = 2.606. The Student test for these two treatments shows that the expected mean of group B far exceeds that of group A. The 99% confidence interval of the difference between these expected means is 11.7 − 6.6875 ± 2.606 × √0.06611 or (4.342, 5.683).

Similarly, Table 15 shows that in Example 10 the evaluators' ratings of the features do not differ within stages. It is therefore reasonable to pool the features of each stage and to compare the two stages by the two-treatment method described above.


5. Conclusion

This chapter has dealt with useful methods for choosing important features and supporting decisions in a given decision information system, presented in Section 2. The ANOVA methods introduced in Section 3 evaluate features with respect to the extent of an MIS. The examples and case studies in Section 4, carried out at our Faculty of Information System, University of Information Technology, demonstrate the efficiency of the proposed methods. The illustrated calculation schemes allow computer programs to be designed and coded to solve the above problems automatically.


  1. Nguyen KP, Tu HT. Assessment waste water treatment process with in-completed dataset. In: Proceedings of the 20th National Conference on Fluid Mechanics. Vietnam; 2017
  2. Ciortea M. Aspects regarding the types of process control systems. International Conference on Theory and Applications of Mathematics and Informatics. 2004:90-95
  3. Heidarkhani A, Khomami AA, Jahanbazi Q, Alipoor H. The role of management information systems (MIS) in decision-making and problems of its implementation. Universal Journal of Management and Social Sciences. 2013;3(3):78-89
  4. Pawlak Z, Skowron A. Rough sets: Some extensions. Information Sciences. 2007;177:28-40
  5. Nguyen KP, Tu HT. Data mining based on rough set theory. In: Knowledge Discovery in Databases. USA: Academy Publisher; 2013
  6. Nguyen KP, Bui ST, Tu HT. Revenue evaluation based on rough set reasoning. In: Sobecki et al., editors. Advances Approaches to Intelligent Information and Data-base Systems. Thailand: Springer SCI 551; 2014
  7. Dai J, Xu Q, Wang W. A comparative study on strategies of rule induction for incomplete data based on rough set approach. International Journal of Advanced in Computing Technology. 2011;3(3):176-182
  8. Chen Y, Miao D, Wang R. A rough set approach to feature selection based on ant colony optimization. Elsevier Pattern Recognition Letters. 2010;31:226-233
  9. Liu Y, Esseghir M, Boulahia LM. Evaluation of parameters importance in cloud service selection using rough sets. Applied Mathematics, Scientific Research Publication. Mar 2016;7:527-541
  10. Nguyen KP, Nguyen VL. Analysis of weather information system in statistical and rough set point of view. In: New Trends in Computational Collective Intelligence, ICCCI 2014. Korea; 2014
  11. Shi L, Luo F. Research on risk early-warning model in airport flight area based on information entropy attribute reduction and BP neural network. International Journal of Security and Its Applications. 2015;9(10):313-322
  12. Using ANOVA. Available from: May 2016
  13. Ramachandran KM, Tsokos CP. Mathematical Statistics with Applications. San Diego, California, USA: Elsevier Academic Press; 2009
  14. Ajayi IA, Omirin FF. The use of management information systems (MIS) in decision making in the South-West Nigerian. Academy Journal: Educational Research and Review. May 2007;2(5):109-116
  15. Sow MT. Using ANOVA to examine the relationship between safety & security and human development. Journal of International Business and Economics. American Research Institute for Policy Development. Dec 2014;2(4):101-106
  16. Fetaji B et al. Assessing and evaluating UBT model of student management information system using ANOVA. TEM Journal. Aug 2016;5(3):313-318. ISSN 2217-8309. DOI: 10.18421/TEM53-10
