## 1. Background

Mathematical statistics has long been widely practiced in many fields of science [1]. Nevertheless, statistical methods have remained remarkably intact ever since the pioneering work [2] of R.A. Fisher and his contemporaries early in the twentieth century. Recently, however, it has been claimed that most scientific results are wrong [3] due to malpractice of statistical methods. Errors of that kind are not caused by imperfect methodology but rather reflect a lack of understanding and proper interpretation.

In this introductory chapter, a different cause of errors is addressed: the ubiquitous practice of willful ignorance (WI) [4]. It is usually applied with the intent to remedy a lack of knowledge and to simplify, or merely enable, the application of established statistical methods. Virtually all statistical approaches require complete statistical knowledge at some stage, which in practice can hardly ever be established. For instance, Bayes estimation relies upon prior knowledge. An equal a priori probability assumption (an “uninformative prior”) hardly disguises that some facts are not known, and may be grossly deceiving: a uniform distribution is a specific assumption like any other. Willful ignorance of that kind must not be confused with knowledge to which we attach some degree of confidence. It may be better to explore, rather than ignore, the consequences of what is not known at all. That will require novel perspectives on how mathematical statistics is practiced, which is the scope of this book.

## 2. Ambiguity

Incomplete knowledge implies that obtained results may not be unique; that is, results may be ambiguous. Ambiguity de facto means that the uncertainty associated with an estimated quantity is itself uncertain. Adopting a probabilistic view, ambiguity may be classified as epistemic uncertainty. Here, ambiguity will refer to lack of knowledge that is typically substituted with willful ignorance. Alternatives propelled by different types of willful ignorance can thus be explored to assess ambiguity.

The most potent source of ambiguity is dependence. Independence is perhaps the most frequently claimed but least discussed presumption. For throwing dice or growing crops, as typically studied by the founders of statistics, independence indeed seems plausible. Amid the complexity of modern technology, however, it is anything but evident that observations are independent. For instance, meteorological radar observations may share sources of error, meaning the recorded data will be statistically dependent. A problem may then arise if our analysis makes use of, e.g., the maximum likelihood method, which utilizes the entire covariance matrix. Most of its entries, the covariances between all pairs of observations, are usually not known but bluntly set to zero to enable evaluation. This willful ignorance has the drastic consequence of extinguishing ambiguity and, as will be shown, minimizing the resulting uncertainty. Elementary considerations provide the valuable insight that even exceedingly small covariances may substantially influence the result: for n observations there are n(n − 1)/2 distinct covariance elements but only n variances.
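As a minimal numeric sketch (the numbers are hypothetical, not from the text): for n equicorrelated observations of a constant with variance σ², the maximum likelihood variance of the estimated mean is σ²(1 + (n − 1)ρ)/n, so willfully setting the correlation ρ to zero quenches it to σ²/n:

```python
import numpy as np

# Hypothetical example: n observations of a constant signal with unit variance.
# Under equicorrelation rho, the ML (generalized least squares) variance of the
# mean is sigma^2 * (1 + (n - 1) * rho) / n; rho = 0 quenches it to sigma^2 / n.
n = 100
sigma2 = 1.0

for rho in (0.0, 0.01, 0.1):
    # Full covariance matrix with off-diagonal entries rho * sigma^2.
    C = sigma2 * (rho * np.ones((n, n)) + (1 - rho) * np.eye(n))
    ones = np.ones(n)
    var_ml = 1.0 / (ones @ np.linalg.solve(C, ones))  # (1^T C^-1 1)^-1
    print(f"rho = {rho:5.2f}: variance of estimated mean = {var_ml:.4f}")
```

With n = 100, even ρ = 0.01 roughly doubles the variance of the mean relative to the independence assumption.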

Various attempts have been made to avoid willful ignorance. The method of maximum entropy [5] addresses the consequences of improper assignment of unknown statistical information. Covariance intersection [6] conservatively fuses a pair of possibly correlated observations as if they were uncorrelated, with variance
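For the scalar case, the standard covariance intersection rule can be sketched as follows (illustrative numbers): the fused inverse variance is a convex combination 1/p = ω/p_a + (1 − ω)/p_b, with the weight ω chosen to minimize the fused variance p:

```python
import numpy as np

def covariance_intersection(a, pa, b, pb):
    """Fuse two scalar estimates with unknown correlation (standard CI, cf. [6]).

    Returns the fused estimate and its (conservative) variance, obtained by
    minimizing the fused variance over the weight omega in [0, 1].
    """
    best = None
    for omega in np.linspace(0.0, 1.0, 1001):
        p_inv = omega / pa + (1.0 - omega) / pb
        if p_inv <= 0.0:
            continue
        p = 1.0 / p_inv
        x = p * (omega * a / pa + (1.0 - omega) * b / pb)
        if best is None or p < best[1]:
            best = (x, p)
    return best

x, p = covariance_intersection(1.0, 2.0, 2.0, 1.0)
print(f"fused estimate {x:.3f} with variance {p:.3f}")
```

In the scalar case the optimum degenerates to selecting the estimate with the smaller variance; the strength of the method shows for matrix-valued covariances, where no correlation between the two estimates ever has to be assumed.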

By repeating a statistical analysis with various kinds of willful ignorance applied to its input, the ambiguity (A) of its output can be assessed. Some kinds of WI will yield a large, and others a small, resulting uncertainty, though not necessarily the maximum and minimum, as it is difficult to imagine all possible kinds of WI. Any specific WI will more or less reduce, or quench, the uncertainty from its maximum. Identifying a model from calibration data
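The procedure just described can be sketched as follows; the three covariance candidates are hypothetical stand-ins for different kinds of WI about the observations, and the spread of the resulting variances forms an ambiguity interval:

```python
import numpy as np

# Hypothetical sketch: assess ambiguity by repeating the same estimation under
# several kinds of willful ignorance (WI) about the observation covariance.
t = np.linspace(0.0, 1.0, 50)
J = np.column_stack([np.ones_like(t), t])      # linear model y = a + b*t

def slope_variance(C):
    # ML / generalized least squares parameter covariance (J^T C^-1 J)^-1.
    P = np.linalg.inv(J.T @ np.linalg.solve(C, J))
    return P[1, 1]

n = t.size
dt = np.abs(t[:, None] - t[None, :])
candidates = {                                  # three candidate kinds of WI
    "independent": np.eye(n),
    "equicorrelated (rho=0.3)": 0.3 * np.ones((n, n)) + 0.7 * np.eye(n),
    "exponential (xi=0.2)": np.exp(-dt / 0.2),
}
variances = {name: slope_variance(C) for name, C in candidates.items()}
for name, v in variances.items():
    print(f"{name:25s}: slope variance {v:.4f}")
print(f"ambiguity interval for the slope variance: "
      f"[{min(variances.values()):.4f}, {max(variances.values()):.4f}]")
```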

## 3. Illustration of uncertainty quenching

Assume we would like to study the evolution of a field over two spatial coordinates, using a model composed of a set of differential equations. The field could refer to meteorology and describe current observations of air pressure or humidity. The initial state may be expanded in the set of basis functions of the appropriate operator, similar to forecasting in numerical weather prediction (NWP) [7]. The basis functions could be thought of as the eigensolutions of a linear operator that propagates the meteorological state from one day to the next. Neither the interpretation of the field nor the field itself matters for the discussion here. Rather, what matters is how the uncertainty of the initial state is represented as uncertainty of the distributed eigensolutions of the NWP propagator. This representation determines the uncertainty of any subsequent forecast, translating past experience into confidence in future weather predictions. If the forecast uncertainty is lower than our current knowledge warrants, we may falsely rule out, e.g., the possibility of major thunderstorms. In the eyes of sailors planning their journey, the forecast uncertainty is the indisputable decision-maker. Studying the uncertainty quenching

To enable illustrations, let the eigenstates of the NWP operator of order

where the NWP operator propagates the coefficients

Without any supplementary information, the variance of the initial measurement should be completely represented by the variance of the initial model state, i.e.,

Assuming normally distributed measurement noise, the maximum likelihood method [8] yields the parameter covariance given by Eq. (3), which is propagated to the uncertainty of the best predictions according to Eq. (4):

Combining these relations, the degree of completeness of the representation of uncertainty by the model can be studied:

where
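The chain of relations described above can be sketched in an assumed linear-Gaussian setting y = Jθ + e with e ~ N(0, C); the symbols and formulas here are illustrative standard results (P_θ = (JᵀC⁻¹J)⁻¹ and C_pred = J P_θ Jᵀ), not necessarily the book's Eqs. (3) and (4):

```python
import numpy as np

# Sketch under an assumed linear-Gaussian form: y = J @ theta + e, e ~ N(0, C).
# The ML parameter covariance is (J^T C^-1 J)^-1, and the covariance of the
# best predictions is J @ P_theta @ J^T (illustrative notation).
n, p = 50, 4                           # observations vs. basis functions
t = np.linspace(0.0, 1.0, n)
J = np.column_stack([t**k for k in range(p)])
C = np.eye(n)                          # willfully ignorant: independent, unit variance

P_theta = np.linalg.inv(J.T @ np.linalg.solve(C, J))
C_pred = J @ P_theta @ J.T

# Degree of completeness: how much of the measurement variance the model retains.
completeness = np.trace(C_pred) / np.trace(C)
print(f"retained fraction of variance: {completeness:.3f} (= p/n = {p/n:.3f})")
```

With independent unit-variance noise, the retained fraction is exactly p/n, illustrating how a model with few basis functions quenches most of the measurement variance.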

It should be emphasized that stating independence is fundamentally different from stating that the degree of dependence is unknown. These statements in fact oppose each other, since independence maximizes the available amount of information. Indeed, the Fisher information matrix [9]

is additive as

Uncertainty is lost for obvious reasons; the question is how much, and why. Since the model cannot represent an arbitrary response, it cannot represent arbitrary variability either. This restriction constitutes the very meaning of a “model.” It is therefore important to describe the covariance of the observations accurately: inappropriate WI may quench uncertainty dramatically.

The additional information represented by the structure of the model may be denoted the model innovation. It is strongly affected by the WI attributed to the observations. With increasing resolution

If the WI of the observation covariance instead resembles what the model is able to represent, the model innovation will be least. Instead of assuming independent observations, introduce a finite correlation length ξ:

Increasing the correlation length ξ from zero, as in Figure 1 (bottom), the model innovation decreases and the variance of the prediction
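A sketch of this construction, using a hypothetical exponential covariance C_ij = exp(−|t_i − t_j|/ξ) together with a polynomial model (both are illustrative choices, not the book's):

```python
import numpy as np

# Sketch (hypothetical covariance model): an exponential correlation of length
# xi, C_ij = exp(-|t_i - t_j| / xi), replaces the independence assumption.
n, p = 50, 4
t = np.linspace(0.0, 1.0, n)
J = np.column_stack([t**k for k in range(p)])

def retained_variance(xi):
    C = np.exp(-np.abs(t[:, None] - t[None, :]) / xi)
    P_theta = np.linalg.inv(J.T @ np.linalg.solve(C, J))
    return np.trace(J @ P_theta @ J.T) / np.trace(C)

for xi in (0.001, 0.1, 1.0):
    print(f"xi = {xi:5.3f}: retained fraction of variance = {retained_variance(xi):.3f}")
```

In this sketch the retained fraction of variance increases with ξ, i.e. less uncertainty is quenched as the assumed correlation approaches what the smooth model can represent.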

It is a different matter whether the model is consistent with the observations it was identified from. Model consistency is usually assessed with a statistical residual analysis. In conventional system identification (CSI) [10], the hypothesis is that the [deterministic] model fully explains the observations. Due to the sampling variance of the finite, uncertain calibration data, however, the best estimate of its parameters will be uncertain. The residual analysis explores whether the residual is consistent with the sampling uncertainty of the calibration data, but without any uncertainty associated with the model.

This conjecture in CSI of a model without any error whatsoever is questionable. In practice, no model is completely free of error. Rather, a finite uncertainty of the model could be regarded as inherited from its mismatch to the calibration data. If so, the model merely provides a convenient but, to a quantifiable degree, imperfect basis for expressing uncertain calibration data. The model is utilized to “passively transform” rather than “actively explain” observations in another, unknown situation of interest. That intent is typical in, e.g., weather forecasting and product development. Furthermore, the uncertainty of calibration data can often be assessed from the setup of the calibration experiment. In CSI, correlation functions are evaluated from a single residual vector, which enforces homoscedasticity and independence of the observations. WI of this kind enables the statistical analysis of the residual but often finds little support.
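The conventional residual analysis can be sketched as a whiteness test of the sample autocorrelation of a single residual vector (the residual here is synthetic, for illustration only):

```python
import numpy as np

# Sketch of conventional residual analysis: the sample autocorrelation of one
# residual vector is compared against the ~1.96/sqrt(N) whiteness bounds, which
# presumes homoscedastic, independent observations.
rng = np.random.default_rng(1)
N = 200
residual = rng.standard_normal(N)            # synthetic stand-in for a residual

r = residual - residual.mean()
acf = np.correlate(r, r, mode="full")[N - 1:] / (r @ r)   # normalized, lags 0..N-1
bound = 1.96 / np.sqrt(N)
outside = np.sum(np.abs(acf[1:20]) > bound)
print(f"lags 1..19 outside the 95% whiteness bound: {outside}")
```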

The alternative view on model calibration proposed here is that the identified model, composed of its form or structure, its parameters, and its uncertainty, represents the uncertain calibration data. Model results can thus substitute for our observations, to the degree that the various aspects of the model and the observations are consistent. Any given residual is one realization and should be related to its expected variability, with respect to the uncertainties of both the model and the observations it was identified from.

The Mahalanobis distance [6] can be utilized to measure the relative distance between observations and model output, which constitutes the residual
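A minimal sketch of this consistency measure (all matrices synthetic): the squared Mahalanobis distance of a residual ε with covariance C_ε is D² = εᵀC_ε⁻¹ε, and for a consistent model it should be comparable to its expectation, the residual dimension n:

```python
import numpy as np

# Sketch: squared Mahalanobis distance of a residual epsilon with covariance
# C_eps; for a consistent model it should be comparable to its expectation n.
rng = np.random.default_rng(2)
n = 30
A = rng.standard_normal((n, n))
C_eps = A @ A.T + n * np.eye(n)               # some positive definite covariance

L = np.linalg.cholesky(C_eps)
epsilon = L @ rng.standard_normal(n)          # residual drawn consistently with C_eps
d2 = epsilon @ np.linalg.solve(C_eps, epsilon)
print(f"squared Mahalanobis distance {d2:.1f} vs expectation n = {n}")
```

A residual drawn consistently with C_ε gives D² distributed as chi-squared with n degrees of freedom, so values far from n flag an inconsistency between model and observations.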

The residual covariance matrix defines its principal variations with typical magnitudes

The evaluation of

Extracting matrices

where

To maximize the consistency, in the sense of minimizing the Mahalanobis distance, the variance

Minimizing the Fisher information matrix under assumption of normality addresses the covariance

In practice, no residual projection

A potential conflict is inevitable for exceedingly high ratios

## 4. A quest for better practice of willful ignorance

> “The first principle is that you must not fool yourself, and you are the easiest person to fool” [11].

The current practice of willful ignorance sometimes makes statistics an art of self-delusion [3]. The consequences of applied WI are rarely explored, as normally only one proposition is made, without further ado.

Distinguishing what is not known from what is assumed is of paramount importance. “Not known to any degree” should mean that all possibilities that can be imagined ought to be considered. Otherwise, obtained results only exemplify what the most appropriate answer may be, without any indication of the largest possible deviation.

Our knowledge is almost never complete. Virtually all existing statistical methods nevertheless require precisely that. Until alternative methodologies exist, WI must fill the gap between what is actually known and what needs to be known. As illustrated, the consequences of different WI may vary dramatically; we should therefore select and tweak WI carefully. WI should not express our unconfirmed beliefs, but rather be chosen with its consequences in mind.

The quantifiable ambiguity proposed here suggests how the ramifications of incomplete knowledge might be mitigated with carefully chosen WI: explore all kinds of ignorance that can be imagined, then analyze and collect the obtained results in ambiguity intervals, similar to confidence intervals. Another option is to focus on the worst case, in a conservative manner. The method of covariance intersection is one example of how that can be exercised. The principle of maximum entropy provides means to maximize the residual uncertainty, i.e., to add the least possible amount of information. Minimizing the Fisher information for observations and the Mahalanobis distance for model identification, as proposed here, is yet another kind of conservatism. These methods tackle unknown information with WI and explore its consequences. Finding the most proper WI is indeed nontrivial and calls for genuinely novel approaches.
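The maximum-entropy statement invoked above can be verified in a one-line comparison: among distributions with a given variance, the Gaussian has the largest differential entropy, i.e. it adds the least information beyond the stated variance (here checked against a uniform distribution of equal variance):

```python
import math

# Among all distributions with variance sigma2, the Gaussian maximizes the
# differential entropy; compare it with a uniform of the same variance.
sigma2 = 1.0
h_gauss = 0.5 * math.log(2.0 * math.pi * math.e * sigma2)   # Gaussian entropy (nats)
a = math.sqrt(12.0 * sigma2)                                # uniform width, same variance
h_uniform = math.log(a)                                     # uniform entropy (nats)

print(f"Gaussian: {h_gauss:.3f} nats, uniform (same variance): {h_uniform:.3f} nats")
```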

Current practice of statistics utilizes WI in many ways, but the specific choice is rarely discussed in depth. One reason could be that statistics was developed in an entirely different context than practiced today, which is rarely acknowledged and probably not fully comprehended. To exemplify, recall that Fisher’s [2] original interpretation of “never” as a finite probability of 5% was just a humble proposal. He urged his readers to adjust “never” to the current context, a piece of advice almost never followed today.

Perhaps the reported breakdown of statistical methodologies [3, 4] is due to neglect of ambiguity, driven by a strong tradition of uncritical application of WI. Could this be caused by a lack of awareness of its potentially dramatic consequences? Ignorance of the limitations of contemporary state-of-the-art methods is hardly new [12]. Ambiguity indeed sets a meta-perspective on statistical analysis that cannot be avoided and thus needs further exploration.