Efficiency

## 1. Introduction

Statistical signal processing [1] traditionally focuses on extraction of information from noisy measurements. Typically, parameters or states are estimated by various filtering operations. Here, the quality of signal processing operations will be assessed by evaluating the statistical uncertainty of the result [2]. The processing could for instance simulate, correct, modulate, evaluate, or control the response of a physical system. Depending on the addressed task and the system, this can often be formulated in terms of a differential or difference *signal processing model equation* in time, with uncertain parameters and driven by an exciting input signal corrupted by noise. The quantity of primary interest may not be the output signal but can be extracted from it. If this uncertain dynamic model is linear-in-response it can be translated into a linear digital filter for highly efficient and standardized evaluation [3]. A statistical model of the parameters describing to which degree the dynamic model is known and accurate will be assumed given, instead of being the target of investigation as in system identification [4]. *Model uncertainty* (of parameters) is then *propagated* to *model-ing uncertainty* (of the result)*.* The two are to be clearly distinguished – the former relate to the input while the latter relate to the output of the model.

Quantification of uncertainty of complex computations is an emerging topic, driven by the general need for quality assessment and rapid development of modern computers. Applications include e.g. various mechanical and electrical applications [5-7] using uncertain differential equations, and statistical signal processing. The so-called brute force Monte Carlo method [8-9] is the indisputable reference method to propagate model uncertainty. Its main disadvantage is its slow convergence, or requirement of using many samples of the model (large ensembles). Thus, it cannot be used for demanding complex models. The ensemble size is a key aspect which motivates deterministic sampling. Small ensembles are found by substituting the random generator with a customized deterministic sampling rule. Since any computerized random generator produces a pseudo-random rather than a truly random sequence, this is equivalent of modifying the random generator to be *accurate* for *small* ensembles of *definite* size, rather than being *asymptotically exact* (infinite ensembles). Correctness of very large ensembles is of theoretical but hardly practical interest for complex models, if the convergence to the asymptotic result is very slow.

## 2. Modeling uncertainty of signals

### 2.1. Problem definition

Suppose the (output) signal

The model is given in

Systems of the form in Eq. 1 may be directly realized as digital filters,

The parameterization should be carefully chosen as it affects the convergence rate of Taylor expansions (section 3.1) as well as the physical interpretation. The parameters and their statistics are preferably extracted from measurements using system identification techniques [4]. Note that complex-valued poles and zeros are conjugated in pairs [10].

The problem to be addressed is the statistical evaluation of any function

### 2.2. Nomenclature

Statistical expectations of any signal, model or function

Samples of

Only uniform *skewness* [11]. Further, the shape is reflected in *curtosis* [11]. Since

Matrix size will be indicated with subscripts, e.g.

A method will be stated intrusive if manipulations of the model are required. For the targeted highly complex models, it will be assumed that the computational cost for their evaluation dominates all other calculations. The efficiency

### 2.3. Fundamentals of non-linear propagation of uncertainty

Linearity in parameters (LP) is to be distinguished from linearity in response (LR),

for some vector

For non-linear propagation of uncertainty, the asymmetry of the resulting pdf is central. It can be expressed as a lack of commutation of non-linear propagation and statistical evaluation of a center value *scent* [12],

The method for evaluating the center is left unspecified, as there are several alternatives. The most common choice is to use the mean, *additional* asymmetry caused by the non-linearity of the model is measured with the scent but differently. The scent addresses how parametric uncertainties are propagated and not how the result is distributed, e.g.

The scent is important since

## 3. Conventional methods

A brief resume of the most traditional related methods of uncertainty propagation, applicable to signal processing models, is here given together with their pros and cons. Advanced intrusive methods like e.g. polynomial chaos expansions [14-15] not directly related to the proposed method are omitted.

### 3.1. Taylor expansions

The indisputable default methods of uncertainty propagation are based on Taylor expansions. These methods are intrusive if the differentiations are made analytically. Convergent series require regular differentiable models and numerical or analytical complexity make them error prone. Their applicability is therefore limited for complex models.

The transfer function

This defines

where *once*, and not repeatedly for every signal

### 3.2. Brute force Monte Carlo

Monte Carlo (MC) methods [8-9], or *random sampling* of uncertain models was originally introduced and phrased ‘statistical sampling’ by Enrico Fermi already in the 1930’s [17]. The MC methods *realize* uncertain signal processing models in finite *ensembles*. Every ensemble consists of a possible set of well-defined model systems, all (usually) having the same structure but slightly different parameter values. In the original so-called brute force Monte Carlo method, each set of parameters is assigned to the output of random generators with appropriate statistics. The convergence to the assigned statistics is very slow [5] but it is asymptotically exact and the required number of samples is essentially independent of the number of parameters. Hence it does not suffer from the curse-of-dimensionality of many other methods. The outstanding simplicity in application is likely the cause of its popularity, just as the slow convergence or low efficiency is the main reason for its failures.

In MC, arbitrary distributions and dependencies are usually obtained by means of transformations of samples of elementary distributions. Independent samples

Covariance may be included with an appropriate transformation of samples of *canonical* parameters

The matrices

All

A simple example illustrates that the mixing effect can be considerable, even for minute correlations. Assume a model has two parameters with a covariance matrix,

Large rotations are required because the canonical variances *degenerate*. As shown in Fig. 1, the large rotations mix the assigned pdfs

Specifying both marginal probability distributions and covariance is either redundant or inconsistent, as the latter is uniquely determined by the former. Nevertheless, this reflects the typical available information for signal processing applications. The moments can be accurately determined [4] for sufficiently large data sets but the joint distribution

### 3.3. Refinements of Monte Carlo

To increase the efficiency of MC, the original brute force sampling technique has been further developed in mainly two directions: model simplification and sample distribution improvement. In response surface methodology (RSM) [18], the model is replaced by a simple approximate surrogate model. A model of order *collocation points* [15]

where

Let

In practice,

The distribution of samples may be improved with stratification, as in Latin Hypercube sampling (LHS) [18]. By dividing the sample space into intervals, or stratas representing equal probability the need for large ensembles is reduced. In LHS, each parameter is sampled exactly once in each of its stratas giving a generalized *latin square* [20]. This selection pushes the samples away from each other and distributes them more evenly. To illustrate the improvement with stratification, sample one parameter

In this case, it is questionable if 100 samples are sufficient to represent as few as four moments *exactly*, a very sparse grid or few deterministic samples are needed,

If the problem at hand only depends on these moments, the exact solution will be obtained. The size of such small ensembles must be fixed, no matter how they are generated. Adding, or perturbing a single sample would modify the statistics substantially.

## 4. Deterministic sampling

Deterministic sampling (DS) of uncertain systems is a viable alternative to random sampling (RS). Instead of using random generators, specific DS rules are devised to generate appropriate, but still statistical (Fermi’s notation, see section 3.2) ensembles. A rudimentary example illustrates the principle: Assume a model

In deterministic sampling the model evaluations involve no approximations and are *non-invasive*. In many respects, deterministic sampling is constructed and optimized for quantification of modeling uncertainty: Minimal ensembles allow for evaluation of the most numerically demanding models. The model evaluations are exact and non-invasive to fully respect non-linear deeply hidden parameter dependences. Only vaguely known statistics of the model is approximated.

### 4.1. Concepts of deterministic sampling

DS does not per se specify the goal of sampling, e.g. given mean and covariance of the parameters. In the example at the end of section 3.3, the primary target was the joint pdf of the parameters. In section 4.2, the target is

### 4.2. Propagation of covariance in the standard unscented Kalman filter

The reference will be the specific variant of DS used for propagating covariance in what will be referred to as the standard unscented Kalman filter (UKF) [21-23]. The ensemble consists of *sigma-points*,

where *principal* axes of the covariance matrix, amplified by the marginal standard deviations and most importantly,

## 5. Sampling with conservation of moments

One class of methods of deterministic sampling conserves a limited number of statistical moments. The model parameters are sampled to satisfy these moments and collected in ensembles, similar to how parameters are sampled to fulfill probability distributions in RS.

### 5.1. Principle

The constraints of satisfying statistical moments constitute an infinite system of equations for the samples

The infinite number of equations requires an infinite number of samples. However, it is implicitly assumed that relatively few moments are known and significantly influence the result of interest. Only a few moments then needs to be accurately represented by

### 5.2. The excitation matrix

The UKF (section 4.2) utilizes DS with conservation of all first *formulation* allows for any additional ‘half’ unitary transformation

The samples *excitation matrix* *beyond* the first and second moments, e.g. the range of the samples. Row

The adopted transformation

### 5.3. Elimination of singular values

The rationale for applying the reduction to be presented is that any model is derived from a limited set of experiments, resulting in a usually moderate rank of

The ensemble may now be reduced by elimination of singular values (ESV). Choose a threshold

Proceeding as in many applications of SVD, this reduction (indicated by tilde below) will not change the result significantly, if

Unfortunately, the less distorting transformation

### 5.4. Correlated sampling of non-parametric models

A major difference between parametric and non-parametric models is the dimensionality. A conceptual dissimilarity is that non-parametric models usually refer to correlated signals, rather than abstract model structures. The parameters may describe discrete samples of input noise [7], or an impulse response [6]. A common parametric pole-zero model may contain 20 parameters, while a non-parametric model can be expressed in perhaps 1000 parameters. The ensembles of non-parametric models often need to be reduced drastically.

Due to limited resolution, the correlation times of any signal or impulse response is finite. Their ‘memory’ is thus finite so sample variations may be regenerated or repeated, as long as the time between repetitions exceeds the correlation time. This *correlated sampling* (CRS) provides efficient and accurate reduction of the ensembles. The minimal number of parameters

A finite correlation length

If the model is expressed as a convolution

By padding the model to an integer multiple of

where

Accordingly,

The consequence of violating the normalization constraint is that only a limited diagonal band of *not* result in any error of

### 5.5. Combining covariance

A signal processing model generally includes both parametric and non-parametric sources of uncertainty. For instance, a device (parametric system model) may be fed with a signal corrupted with noise (non-parametric noise model). The question then arises how the two sources

where

To combine ensembles of parametric

The scaling

A disadvantage of this summation is that the same type of ensemble must be used for all parameters. Both alternatives combine the statistics of the two models non-linearly. The uncertainties are propagated and combined by evaluating the model for all samples and calculating the desired statistics, just as if the combined ensemble described one model.

### 5.6. Selected ensembles

The standard (STD) ensemble employed in the UKF (as defined in section 4.2) utilizes the perhaps simplest possible excitation matrix,

While the ultimate simplicity is its main advantage, the long maximal(!) range

How far the reduction of samples might be driven is illustrated by the minimal simplex (SPX) ensemble,

where the operator

The binary (BIN) ensemble has minimal range to guarantee allowed samples. By varying all parameters with an equal magnitude of one standard deviation in all samples, the diverging factor

Cyclic shifts are applied to all original rows except the first, by a quarter of their periodicity. Mirror imaging of a row is defined to change the sign of its second half and is applied to all original rows except the last two, and all shifted rows except the last. For instance, in Eq. 32 row 4 and 5 are shifted versions of row 2 and 3, while rows 6 and 7 are the mirror images of rows 1 and the shifted row 4. The supplementary rows reduce the size of the ensemble drastically with a corresponding improvement of the efficiency. For

By no means, this brief survey exhausts all possible ensembles. Many criteria for selecting the most appropriate ensemble can be formulated. Here, the first and second moments, parameter ranges and efficiency were in focus.

## 6. Application — Modeling uncertainty of a dynamic device

The task is to simulate the response of an electrical device such as an amplifier or oscilloscope, in the presence of non-stationary correlated noise on its input. An uncertain LR CT model of the device and its parametric covariance is usually found by applying system identification techniques [4] on calibration measurements [6]. Such a model of the system can be sampled into a digital filter and be described in the pole-zero form in Eq. 2. These standard steps will here be omitted. The system model will instead be assigned to a digital low-pass Butterworth filter, of order

To be most general, the non-parametric input noise model is chosen to be correlated/colored and non-stationary. The noise parameter *generating signal* [7] is a Dirac delta function

Since the response is linear in noise parameters, it is sufficient to only capture

### 6.1. Reference ensembles

Traditionally, any method for uncertainty propagation is evaluated by comparisons with the default method of linearization [10,16], and brute-force random sampling (MC) [9] as state-of-the-art. There are several drawbacks of this approach. Linearization is a coarse approximation for LP models and MC suffer from the difficulty of modeling dependencies and low efficiency. An alternative is to construct finite reference ensembles (REF) and by definition let them describe the truth. Their primary advantage is that the finite size of the REF makes it possible to propagate the uncertainty exactly, using all REF samples. A more or less arbitrary REF may be generated randomly, like any MC ensemble. All requirements are also automatically fulfilled since the REF is built of possible realizations. Also, the REF closes the loop as it makes it possible to compare ‘true’ and approximate samples directly on an equal footing (see Fig. 8). Even though the samples differ substantially, the resulting modeling uncertainties can be similar.

A plausible REF

A REF signal

The exact REF result of modelling covariance of noise is given by combining Eqs. 34 and 36,

An explicit realization of the REF for the noise model is hence not needed. Specifically, a second order system

The ‘true’ result given by the response for the REFs for the different test signals is shown in Fig. 4. The propagated noise variation

### 6.2. Deterministic sampling

The error of the scent and the standard deviation for the STD, SPX and BIN ensembles of the system model is displayed in Fig. 5, for all test signals. The low scent of the REF (left: thin, dotted) suggests the model is close to LP. Despite the relative errors are large they are quite small on an absolute scale. The SPX has the largest errors, for the scent as well as the variance. That is likely caused by its skewness being much larger than that of the REF. The BIN has the lowest errors and is thus the best approximate representation of the REF.

The errors might appear large, considering *all* ensembles are ‘correct’, i.e. correctly represent (typically) available accurate information (mean and covariance of parameters). The errors reflect ambiguities caused by the ubiquitous lack of information in signal processing, rather than inadequacies of DS. RS can only produce better results by making further *assumptions*.

The result of applying the ESV and the CRS methods to reduce the SPX ensemble for propagating the noise is displayed in Fig. 6. By choosing sufficiently low thresholds

The summation of the noise and the model covariance is illustrated in Fig. 7. The propagation of the covariance of the system model

Finally, the samples of one pole of the derived ensembles are compared to the reference samples of the REF in Fig. 8. The limit

## 7. Conclusions

Deterministic sampling remains controversial [27] while random sampling has qualified as a preferred state-of-the-art method for propagating uncertainty. Both result in finite *statistical* [17] ensembles, which are approximate finite representations of the primary statistical models. Their sampling strategies and convergence rates are dramatically different. While deterministic sampling humbly aims at representing the most relevant and best known statistical information, random sampling targets complete control of all features of the ensemble. Such detailed information is rarely known and must instead be more or less blindly assigned. The inevitable consequence is that critical computational resources are spent on propagating, at best, vaguely known details. The numerical power of modern computers is better spent on refinements of the signal processing model (longer time series, higher sampling rates, larger systems etc.). Refined methods of random sampling have therefore been proposed which either simplifies the model, or improve the sampling distributions. Compared to deterministic sampling though, their convergence rates remain low.

It is easy to confuse deterministic sampling with experimental design and optimization [28]. Even though any sample could be a possible outcome of an experiment, deterministic ensembles *represent* rather than *realize* (as random ensembles) statistical distributions. Instead of associating a joint distribution to the parameters of an uncertain model, it is possible to directly represent their statistics with a deterministic ensemble. That would eliminate the need of interpreting abstract distributions and result in complete reproducibility. The critical choice of ensemble would be assigned once and for all in the calibration experiment, with no further need of approximation.

The use of excitation matrices made it possible to construct universal generic ensembles. The efficiency of the minimal SPX ensemble is indeed high but so is also its third moment. While the STD maximizes the range of each parameter, the BIN minimizes it by varying all parameters in all samples. The STD is the simplest while the SPX is the most efficient ensemble. In the example, the BIN was most accurate. For non-parametric models with many parameters, reduction of samples may be required. Elimination of singular values (ESV) and correlated sampling (CRS) were two such techniques. The presented ensembles are not to be associated to random sampling as a method. They are nothing but a few examples of deterministic sampling, likely the best ensembles are yet to be discovered.

It is indeed challenging but also rewarding to find novel deterministic sampling strategies. Once the sampling rules are found, the application is just as simple as random sampling, but usually much more efficient. Deterministic sampling is one of very few methods capable of non-linear propagation of uncertainty through large signal processing models.