Open access

Deterministic Sampling for Quantification of Modeling Uncertainty of Signals

Written By

Jan Peter Hessling

Submitted: April 2nd, 2012 Published: January 16th, 2013

DOI: 10.5772/52193

Chapter metrics overview

2,278 Chapter Downloads

View Full Metrics

1. Introduction

Statistical signal processing [1] traditionally focuses on extraction of information from noisy measurements. Typically, parameters or states are estimated by various filtering operations. Here, the quality of signal processing operations will be assessed by evaluating the statistical uncertainty of the result [2]. The processing could for instance simulate, correct, modulate, evaluate, or control the response of a physical system. Depending on the addressed task and the system, this can often be formulated in terms of a differential or difference signal processing model equation in time, with uncertain parameters and driven by an exciting input signal corrupted by noise. The quantity of primary interest may not be the output signal but can be extracted from it. If this uncertain dynamic model is linear-in-response it can be translated into a linear digital filter for highly efficient and standardized evaluation [3]. A statistical model of the parameters describing to which degree the dynamic model is known and accurate will be assumed given, instead of being the target of investigation as in system identification [4]. Model uncertainty (of parameters) is then propagated to model-ing uncertainty (of the result). The two are to be clearly distinguished – the former relate to the input while the latter relate to the output of the model.

Quantification of uncertainty of complex computations is an emerging topic, driven by the general need for quality assessment and rapid development of modern computers. Applications include e.g. various mechanical and electrical applications [5-7] using uncertain differential equations, and statistical signal processing. The so-called brute force Monte Carlo method [8-9] is the indisputable reference method to propagate model uncertainty. Its main disadvantage is its slow convergence, or requirement of using many samples of the model (large ensembles). Thus, it cannot be used for demanding complex models. The ensemble size is a key aspect which motivates deterministic sampling. Small ensembles are found by substituting the random generator with a customized deterministic sampling rule. Since any computerized random generator produces a pseudo-random rather than a truly random sequence, this is equivalent of modifying the random generator to be accurate for small ensembles of definite size, rather than being asymptotically exact (infinite ensembles). Correctness of very large ensembles is of theoretical but hardly practical interest for complex models, if the convergence to the asymptotic result is very slow.


2. Modeling uncertainty of signals

2.1. Problem definition

Suppose the (output) signal y(x,t)R of interest is generated from the (input) signal x(t)R passing through a dynamic system H, with parameters akR,bkR,


The model is given in n=u+v+1 uncertain parameters, which can be arranged in a column vector q=(b0bva1au)T. For systems continuous-in-time (CT), D=t is the differential operator in time while for systems discrete-in-time (DT), D=Δ1 is the negative unit displacement operator, Δ1xk=xk1. There are several approximate methods to sample CT systems to DT systems, see [3] and references therein. The discretization techniques are beyond the scope of this presentation and DT systems will be assumed. If u1, there is feedback in the system which results in an impulse response h(q,t) of infinite duration. For finite accuracy however, the duration is finite. The system is linear-in-response, y(x=αx1+βx2,t)=αy(x1,t)+βy(x2,t). Most importantly, the system is non-linear-in-parameters if u1. This is the typical situation addressed here.

Systems of the form in Eq. 1 may be directly realized as digital filters,y(q,x,t)=h(q)x(t), where denotes the filtering operation. The coefficients bk and ak are the numerator and denominator coefficients of the filter with impulse response h(q), respectively. Its z-transform H(q,z) is obtained with the substitution Δz. The parameterization can be changed to for instance gainK, poles pk and zeros zk, or poles pk and residues rk,


The parameterization should be carefully chosen as it affects the convergence rate of Taylor expansions (section 3.1) as well as the physical interpretation. The parameters and their statistics are preferably extracted from measurements using system identification techniques [4]. Note that complex-valued poles and zeros are conjugated in pairs [10].

The problem to be addressed is the statistical evaluation of any function g(y(t)=h(q,t)x(t)), given statistical models of q and x. It will here consist of evaluating its time-dependent mean g(y) and standard deviation [g(y)g(y)]2. Without loss of generality, the analysis will be made for g(y)=y. Digital filtering will be utilized for evaluating samples of the model, i.e. filtering with definite sets of q and signals x.

2.2. Nomenclature

Statistical expectations of any signal, model or function g(q) over finite discrete E as well as continuous ensembles or probability distributions (no subscript) are defined as,


Samples of q are labeled q^, with their components organized in columns. Sample indices will be given as superscripts in parenthesis, eg. q^(k) is a column vector denoting the kth sample of parameter q. Variations from the mean are written as δq(E)qq(E).

Only uniform (UNI) and normal distributions (NRM) will be utilized. Either the mean and standard deviation, or the interval in brackets will be given in parenthesis, e.g. q~UNI(0.5,1/23)=UNI([0,1]). Statistical moments Mi(k)=(δqi)kk carry the information contained in the marginalized probability density functions (pdf) fi(δqi)Qfq(δq)dq1dqi1dqi+1dqn, where Q denotes the sample space. While Mi(2) describes the width of fi(δqi), Mi(3) is related but different to its skewness [11]. Further, the shape is reflected in Mi(4), similarly to the curtosis [11]. Since UNI(0,1) and NRM(0,1) are normalized and symmetric fi(δqi)=fi(δqi), Mi(2)=1 and Mi(3)=0. Their differences are first reflected in their fourth moment, Mi(4)=1/45,1/23(0.11,0.29) for UNI(0,1) and NRM(0,1), respectively. The maximum variation of the parameter qi is expressed by the range Mi()limk|Mi(k)|=max(|δqi|). Dependencies are expressed in mixed moments (δqi1)k1(δqi2)k2(E). The discussion will be limited to correlations described by the covariance matrix cov(q)=δqδqT, where the vector multiplication is an outer product.

Matrix size will be indicated with subscripts, e.g. Vn×m is a matrix of n rows and m columns with elements Vjk, j=1,n and k=1,m. The identity matrix will be denotedI, while matrices with equal elements (i) will have their size attached, (in×n)jk=i. For a matrix (vector) D, diag(D) is a vector (diagonal matrix) with components (diagonal elements) equal to the diagonal elements (components) of D. The trace of a matrix is denoted Tr.

A method will be stated intrusive if manipulations of the model are required. For the targeted highly complex models, it will be assumed that the computational cost for their evaluation dominates all other calculations. The efficiency ρ of any method will accordingly be defined by the least required number of evaluations of the original model.

2.3. Fundamentals of non-linear propagation of uncertainty

Linearity in parameters (LP) is to be distinguished from linearity in response (LR),


for some vector Cn×1. Different concepts of linearity are used, y(q1+q2,x,t)y(q1,x,t)+y(q2,x,t) for LP models. Strictly speaking, LP denotes models that are affine, i.e. written as linear combinations of their parameters. Most constructed systems are designed to be as close to LR as possible while most models are not LP. There is hence no contradiction in non-linear (LP) propagation of uncertainty with linear (LR) digital filters, as here.

For non-linear propagation of uncertainty, the asymmetry of the resulting pdf is central. It can be expressed as a lack of commutation of non-linear propagation and statistical evaluation of a center value (C), as measured with the scent [12],


The method for evaluating the center is left unspecified, as there are several alternatives. The most common choice is to use the mean, C=. The lowest order approximation of the scent can then be obtained by calculating the expectation of a Taylor expansion (section 3.1), ζ=Tr[cov(q)Η(y)]/2, where Η(y)jk=2y/qjqk is the Hessian matrix signal of y, evaluated at q. The scent is related to the skewness γ=δy3/δy23/2. The additional asymmetry caused by the non-linearity of the model is measured with the scent but differently. The scent addresses how parametric uncertainties are propagated and not how the result is distributed, e.g. ζ=0 for all LP models for which γ may attain any value. A finite scent thus implies the model is not LP, but not the reverse. The scent should not be confused with bias. Bias is a property of an estimator, while scent is a property of a model. For every model, such as the REF (section 6.1), many different estimators of yC(q) can be used, e.g. the different ensembles in section 5.6, see result in Fig. 5 (left). Consequently, an unbiased estimator of yC(q) correctly accounts for rather than ignores its finite scent, or deviation from y(qC).

The scent is important since yC and not y(qc) is the main result utilized in applications. The corresponding difference [13] in the standard deviation My(2) from its linearized approximation yTcov(q)y, with (y)jk=jy(tk), affects the confidence in the result. Its accuracy is usually less critical. An accurate evaluation of the scent is perhaps the strongest feature of the unscented Kalman filter, which provides the foundation for the presented approach as well as the origin of the term ‘scent’.


3. Conventional methods

A brief resume of the most traditional related methods of uncertainty propagation, applicable to signal processing models, is here given together with their pros and cons. Advanced intrusive methods like e.g. polynomial chaos expansions [14-15] not directly related to the proposed method are omitted.

3.1. Taylor expansions

The indisputable default methods of uncertainty propagation are based on Taylor expansions. These methods are intrusive if the differentiations are made analytically. Convergent series require regular differentiable models and numerical or analytical complexity make them error prone. Their applicability is therefore limited for complex models.

The transfer function H(q,z) of the digital filter can be expanded in a Taylor series,


This defines n sensitivity systems (column vector) Ek(1)(q,z), n(n+1)/2 unique quadratic variation systems (matrix) E(2)(q,z), and so on. These variation systems differ (intrusive) from H(q,z) but may nevertheless be realized as digital filters [3,7,10], just as H(q,z). The corresponding variation of y(q,x,t)=h(q,t)x(t) is given by,


where e(k)(q,t) are the impulse responses of the systems E(k)(q,z). Utilizing digital filters with impulse responses e(k)(q,t), the differentiations are conveniently done once, and not repeatedly for every signal x(t). The linearity in parameters of the model can easily be studied for many different input signals x(t), by evaluating e(k)(q,t)x(t). Due to the large number of variation systems, higher order perturbation analyses rapidly become intractable though. The established method is limited to linearization (LIN) [16] (e(1)). It will always incorrectly yield vanishing scent, ζ=0. A first order estimate of ζ is instead given by the expectation of second term in Eq. 7, ζTr[cov(q)Η(q,t)]/2, where the matrix of Hessian signals Η(q,t)=e(2)(q,t)x(t) is obtained with repeated digital filtering.

3.2. Brute force Monte Carlo

Monte Carlo (MC) methods [8-9], or random sampling of uncertain models was originally introduced and phrased ‘statistical sampling’ by Enrico Fermi already in the 1930’s [17]. The MC methods realize uncertain signal processing models in finite ensembles. Every ensemble consists of a possible set of well-defined model systems, all (usually) having the same structure but slightly different parameter values. In the original so-called brute force Monte Carlo method, each set of parameters is assigned to the output of random generators with appropriate statistics. The convergence to the assigned statistics is very slow [5] but it is asymptotically exact and the required number of samples is essentially independent of the number of parameters. Hence it does not suffer from the curse-of-dimensionality of many other methods. The outstanding simplicity in application is likely the cause of its popularity, just as the slow convergence or low efficiency is the main reason for its failures.

In MC, arbitrary distributions and dependencies are usually obtained by means of transformations of samples of elementary distributions. Independent samples q^(k) of any probability density function (pdf) ϕ(x) can be constructed with the inverse transform method [9]. It consists of a calculation of the inverse of its cumulative distribution function (cdf) Φ(y) and generation of a uniformly distributed random sequence z^(k),


Covariance may be included with an appropriate transformation of samples of canonical parameters q˜:q=UTSq˜ with cov(q˜)=I,


The matrices S,U are found by calculating the eigenvalues (S2) and eigenvectors (U) [11] of cov(q). This transformation makes the marginal pdfs fk(qk) to differ substantially from the univariate pdfs ϕk of the independent but scaled parameters Sq˜k,

fk(qk)=ϕ1([Uq]1)ϕ2([Uq]2)ϕk([Uq]k)ϕn([Uq]n)dq1dqk1dqk+1dqnϕk(qk),if UI.E10

All ϕk are hence mixed according to U. Dependencies are thus difficult to account for. One rare exception is provided by the multinomial distribution [9]. It is often better to assign the pdfs to the canonical parameters in the original instead of the canonical basis. The transformation then reads q˜:q=UTSUq˜. As required, it leaves cov(q) invariant. The marginalization in Eq. 10 changes accordingly, USUTS1U. Since the transformation UTSUS1 of Sq˜k contains cancelling operations U,UT and S,S1, it is generally less distorting than UT. Indeed, if the commutator [S,UT]SUTUTS vanishes, UTSUS1=I. The transformation UT must satisfy the stronger criterion U=I to avoid mixing. For any transformation qWq, an indicator of mixing of the components of q is given by,


A simple example illustrates that the mixing effect can be considerable, even for minute correlations. Assume a model has two parameters with a covariance matrix,


Large rotations are required because the canonical variances Sjj2 are similar, i.e. cov(q) is almost degenerate. As shown in Fig. 1, the large rotations mix the assigned pdfs ϕk(Sq˜k) to marginal pdfs fk(qk) beyond recognition for the transformation UT but not for UTSUS1.

Figure 1.

Left: The sample space of independent scaled parameters (I:qk=Sq˜k) (Eq. 12), and of the two transformations (UT) (rotated) and (UTSUS1) (skewed and tilted). Right: Assigned pdfs ϕk(Sq˜k) (dashed) and obtained marginal pdfs fk(qk) (solid) ) with mixing Ψ(UT)=1.00 and Ψ(UTSUS1)=0.058, and magnified upper transition region (inset).

Specifying both marginal probability distributions and covariance is either redundant or inconsistent, as the latter is uniquely determined by the former. Nevertheless, this reflects the typical available information for signal processing applications. The moments can be accurately determined [4] for sufficiently large data sets but the joint distribution f(q) is hardly ever known with any precision. Some of its properties are usually assigned, with varying degree of confidence. For instance, the allowed maximal range M() of the parameters of digital filters is given by stability constraints. The transformation technique above is well adapted to these facts, since the covariance is prioritised. The transformation q=UTSUq˜ will be utilized in section 5.2 to include correlations with limited mixing of the statistics assigned to independent normalized canonical parameters q˜.

3.3. Refinements of Monte Carlo

To increase the efficiency of MC, the original brute force sampling technique has been further developed in mainly two directions: model simplification and sample distribution improvement. In response surface methodology (RSM) [18], the model is replaced by a simple approximate surrogate model. A model of order v may be found by applying linear (with respect to C) regression at collocation points [15] q^(k)=μ(k),


where Rj(q) is basis function j. Since it may be non-linear, RSM allows for non-linear propagation of uncertainty and may give a substantially different and more accurate result than LIN. If only linear basis functions are used Rj(q)=qj, RSM becomes equivalent to LIN. The best least square approximation is directly obtained from Eq. 13 [19],


Let RSM(r) utilize a complete set of mixed polynomial basis functions up to order r. Its least number (v) of collocation points grows rapidly with both the number of parameters (n) and polynomial order (r) [12],


In practice, r>3 often yields an unacceptable number of samples, see table 1.

n=2 n=5 n=10 n=20
r=1 3 6 11 21
r=2 6 21 66 231
r=3 10 56 286 1771

Table 1.

Efficiency ρ=v for RSM(r), for selected polynomial orders r and numbers n of parameters.

The distribution of samples may be improved with stratification, as in Latin Hypercube sampling (LHS) [18]. By dividing the sample space into intervals, or stratas representing equal probability the need for large ensembles is reduced. In LHS, each parameter is sampled exactly once in each of its stratas giving a generalized latin square [20]. This selection pushes the samples away from each other and distributes them more evenly. To illustrate the improvement with stratification, sample one parameter q~NRM(0,1). After division into m intervals of equal probability, samples are found with the inverse transform method described in section 3.2 (Eq. 8). As seen in Fig. 2, the convergence improves dramatically. Still, even for m=100 samples the second moment (left) varies noticeably. The convergence is generally poorer for higher order moments M(k), as shown for k=4 (right).

Figure 2.

The second M(2) (left) and fourth M(4) (right) moments for stratified (solid) and brute force sampling () of q~NRM(0,1), compared to a fixed grid (dashed).

In this case, it is questionable if 100 samples are sufficient to represent as few as four moments M(1)M(4). The probabilistically evenly distributed fixed grid (dashed) converges more rapidly to the proper statistics. Despite the prevailing tradition, there is no absolute requirement of using a random generator to represent statistical information. Fixed grids are examples of deterministic sampling. Stratification provides an interesting intermediate type of sampling since it is partially deterministic – the strata are constructed deterministically but the samples within each stratum are generated randomly. The construction of a fixed grid requires focus on the most relevant features. To reproduce M(1)M(4) exactly, a very sparse grid or few deterministic samples are needed,


If the problem at hand only depends on these moments, the exact solution will be obtained. The size of such small ensembles must be fixed, no matter how they are generated. Adding, or perturbing a single sample would modify the statistics substantially.


4. Deterministic sampling

Deterministic sampling (DS) of uncertain systems is a viable alternative to random sampling (RS). Instead of using random generators, specific DS rules are devised to generate appropriate, but still statistical (Fermi’s notation, see section 3.2) ensembles. A rudimentary example illustrates the principle: Assume a model y(q) depends on one parameter q with mean q and variance δq2. To estimate the mean y and the variance δy2 of the model, the samples (filter parameters) q^(1,2)=q±δq2 are appropriate since they satisfy the desired statistics, q^E=q and δq^2E=δq2. The formula for q^(1,2) constitutes the sampling rule and q^(1,2) is the statistical ensemble containing only two model samples. By paying the computational cost of using more samples and improving the sampling rule, additional moments δqk,k>2 or other statistical features can be accounted for.

In deterministic sampling the model evaluations involve no approximations and are non-invasive. In many respects, deterministic sampling is constructed and optimized for quantification of modeling uncertainty: Minimal ensembles allow for evaluation of the most numerically demanding models. The model evaluations are exact and non-invasive to fully respect non-linear deeply hidden parameter dependences. Only vaguely known statistics of the model is approximated.

4.1. Concepts of deterministic sampling

DS does not per se specify the goal of sampling, e.g. given mean and covariance of the parameters. In the example at the end of section 3.3, the primary target was the joint pdf of the parameters. In section 4.2, the target is M(2)(q). In section 5, this will be complemented with additional requirements. DS can also be utilized for direct evaluation of confidence intervals [12]. The targets of various DS methods may differ but the focus on the most influential statistical aspect and customization is shared. In stark contrast, almost without exception RS targets the joint pdf of the parameters and ignores the final utilization. Adaptation and fixed ensemble sizes provides the principal means to improve the efficiency of sampling.

4.2. Propagation of covariance in the standard unscented Kalman filter

The reference will be the specific variant of DS used for propagating covariance in what will be referred to as the standard unscented Kalman filter (UKF) [21-23]. The ensemble consists of 2n samples, or sigma-points,


where Δ:k denotes the k-th column of Δ. The sampling rule is manifested in the square root calculation of the covariance matrix (Δ). As suggested [23] it may be found with a Cholesky factorization [19]. The square root matrix is not unique though – the Cholesky root is upper triangular and thus asymmetric. A more symmetric standard alternative is to evaluate the matrix square root in a canonical basis [24] Uq where cov(Uq) is diagonal. The canonical variations Uδq^(s,v) will be unit vectors in the n positive and negative directions of the principal axes of the covariance matrix, amplified by the marginal standard deviations and most importantly, n. For many parameters with large covariance, the scaling with n may cause the UKF to fail since the scaling is not related to the variability of the parameters, only their total number. A possible solution to the scaling problem is provided by the scaled unscented transformation [25]. However, it is based on Taylor expansions and thus suffers from an approximation problem of the model.


5. Sampling with conservation of moments

One class of methods of deterministic sampling conserves a limited number of statistical moments. The model parameters are sampled to satisfy these moments and collected in ensembles, similar to how parameters are sampled to fulfill probability distributions in RS.

5.1. Principle

The constraints of satisfying statistical moments constitute an infinite system of equations for the samples δq^i(v). It can formally be viewed as sampling (=^) of the joint pdf f(q),


The infinite number of equations requires an infinite number of samples. However, it is implicitly assumed that relatively few moments are known and significantly influence the result of interest. Only a few moments then needs to be accurately represented by {δq^(v)}. Typically, δqi and δqi1δqi2 are estimated when models are identified [4,7]. In addition, the range M() or another higher diagonal moment can generally be determined from underlying physical constraints like stability. Clearly, any sampling rule must generate a fixed number of samples and create them simultaneously. The samples are consequently strongly dependent. One obvious sampling method is to solve Eq. 18 numerically for a sufficiently large number of samples δq^, as in Eq. 16. Due to the strong non-linearities, this is quite difficult for a large number of moments but may be feasible for a few moments.

5.2. The excitation matrix

The UKF (section 4.2) utilizes DS with conservation of all first (δqi) and second (δqi1δqi2) statistical moments. The invariance in its formulation allows for any additional ‘half’ unitary transformation ΔΔV, V:VVT=I. This results in another equally valid matrix Δ˜, since Δ˜Δ˜T=ΔV[ΔV]T=ΔVVTΔT=ΔΔT. Since the transformation V is allowed and influences the result, the result of applying the UKF is not unique. The matrix V condenses this invariance and provides practical means to manipulate the UKF ensemble. A key feature of V is the absence of constraints on VTV. That makes it possible to stretch V ‘horizontally’ (as long as VVT=I). That corresponds to adding samples (sigma-points). The improved transformation UTSU (section 3.2) can be applied by also combining U with V. The square root of the covariance matrix will then read Δ=UTSUV instead of Δ=UTSV,


The samples q^(k) are here collected in columns of the ensemble matrix Σ. The matrix cov(q)=ΔΔT=UTS2U is diagonalized,S2=eig(cov(q)), with the unitary transformation U [24]. The normalization factorm is included in the excitation matrix V^ to satisfy the correct covariance, ΣΣT=ΔΔT, just as the factor n was included in Eq. 17 (the ensemble is here expanded from n to m samples). The excitation matrix controls the sampling beyond the first and second moments, e.g. the range of the samples. Row k of the matrix SV^ can be interpreted as deterministic samples of the pdf ϕk(Sq˜k), assigned to canonical parameters in RS, see section 3.2. All ensembles will be described with a unique excitation matrixV^.

The adopted transformation UTSUS1 of SV^ distorts all higher moments than the second. This mixing effect is indicated by the index Ψ(UTSUS1) defined in Eq. 11. To diagonalize large matrices cov(q) many efficient techniques have been developed. This should not cause any difficulties even for n~1000, especially since cov(q) usually is either very sparse, or rank deficient for models with many parameters.

5.3. Elimination of singular values

The rationale for applying the reduction to be presented is that any model is derived from a limited set of experiments, resulting in a usually moderate rank of cov(q). If the number of parameters is large, it is thus often (nearly) rank deficient. The widely practiced singular value decomposition (SVD) [19] may then be used to reduce the excitation matrix and hence the number of samples. The most general form of SVD cannot be used here since it renders an asymmetric decomposition cov(q)=UTS2W, where UUT=UTU=WWT=WTW=I and Sjk2=δjkSkk0. Different matrices U,W allow for decomposition of an arbitrary matrix. For the symmetric matrix cov(q), a symmetric SVD U=W can be found with the less general eigenvalue decomposition [24], according to the spectral theorem [11]. As cov(q) is positive definite, all its eigenvalues Skk2 fulfill the requirement of being positive. This is required to directly obtain a real-valued matrix square root Δ=UTSUV.

The ensemble may now be reduced by elimination of singular values (ESV). Choose a threshold α and remove row r and column r from S and row r from U for all r such that,


Proceeding as in many applications of SVD, this reduction (indicated by tilde below) will not change the result significantly, if α is small enough. Accordingly, samples are eliminated using the alternative decomposition of the square root of cov(q),


Unfortunately, the less distorting transformation UTSUS1 of SV^ advocated in section 3.2 do not allow for r<n rows of the matrix V. The increase in distortion of M(k>2) indicated by Ψ(UTSUS1)Ψ(UT) is less important for the intended use though. Signal processing models with large numbers of parameters are typically non-parametric and usually describe samples of signals like impulse responses, or noise signals. The required LR property of the system then implies LP. The propagation of covariance is then linear and only the undistorted first and second moments need to be encoded.

5.4. Correlated sampling of non-parametric models

A major difference between parametric and non-parametric models is the dimensionality. A conceptual dissimilarity is that non-parametric models usually refer to correlated signals, rather than abstract model structures. The parameters may describe discrete samples of input noise [7], or an impulse response [6]. A common parametric pole-zero model may contain 20 parameters, while a non-parametric model can be expressed in perhaps 1000 parameters. The ensembles of non-parametric models often need to be reduced drastically.

Due to limited resolution, the correlation times of any signal or impulse response is finite. Their ‘memory’ is thus finite so sample variations may be regenerated or repeated, as long as the time between repetitions exceeds the correlation time. This correlated sampling (CRS) provides efficient and accurate reduction of the ensembles. The minimal number of parameters n is then set by the correlation time of the model. Most importantly, the size of the ensemble becomes independent of the size of the model (the length of the signal).

A finite correlation length τN of any model δx(t) is normally inferred from the decay of its autocorrelation functionC(t,T), where t denotes the lag and T refers to a non-stationary variation. Here, a global τ will be defined through its l2-norm and determined for a relative truncation threshold β (argmin returns the minimizing argument),


If the model is expressed as a convolution δx(t)=h(t)w(t) of an impulse response h(t) and time-dependent white noise w(t) as in section 6.2,


By padding the model to an integer multiple of γ2τ samples, it is always possible to choose an excitation matrix partitioned to block-diagonal form,


where V˜γ×γ is any allowed deterministic sub-ensemble. The factor c accounts for the change from γ samples of V˜γ×γ to the m>γ samples of V^n×m. By violating the normalization constraint V^n×mV^n×mT=m, the size of the ensemble can be ‘compressed’ from m to γ samples by moving all sub-matrices cV˜γ×γ to the first block-column and skipping all zeros. The introduced constant c drops out as mγ,




The consequence of violating the normalization constraint is that only a limited diagonal band of cov(x) is correctly reproduced. If a non-parametric model of a signal is propagated through a system model with impulse response h of correlation length σγ/2 this will nevertheless not result in any error of var(h), as it is independent of all faulty elements cov(ν)jk, |jk|>γ/2σ. To correctly evaluate cov(h)uv though, the size of the sub-ensembles V˜γ×γ of correlated sampling must fulfill the stronger size constraint γ2(max(τ,σ)+|uv|). The symmetry of convolutions implies a corresponding result when the non-parametric model describes the impulse response h of the system, rather than a signal.

5.5. Combining covariance

A signal processing model generally includes both parametric and non-parametric sources of uncertainty. For instance, a device (parametric system model) may be fed with a signal corrupted with noise (non-parametric noise model). The question then arises how the two sources qk,xk of uncertainty can be combined. For propagation of uncertainty through LP models, the combined covariance is given by the Gauss approximation formula [16],


where cov(y(k)) is the propagated covariance of qk,xk. This will seize to apply for non-LP models. There exists no general non-linear summation rule for propagated covariance. A method of summation can be given though, if different ensembles are combined as in RS.

To combine ensembles of parametric (q) and non-parametric models (x), collect all parameters, q(qTxT)T, and diagonalize the enlarged covariance matrix, cov(q)=UTS2U. Build V^ with two blocks and use CRS (section 5.4) for the non-parametric model,


The scaling 1+c±1 may cause a similar scaling problem as the factor n in the UKF (section 4.2). Using extended excitation matrices these factors can be eliminated,


A disadvantage of this summation is that the same type of ensemble must be used for all parameters. Both alternatives combine the statistics of the two models non-linearly. The uncertainties are propagated and combined by evaluating the model for all samples and calculating the desired statistics, just as if the combined ensemble described one model.

5.6. Selected ensembles

The standard (STD) ensemble employed in the UKF (as defined in section 4.2) utilizes the perhaps simplest possible excitation matrix,


While the ultimate simplicity is its main advantage, the long maximal(!) range M() is its main disadvantage.

How far the reduction of samples might be driven is illustrated by the minimal simplex (SPX) ensemble,


where the operator performs classical Gram-Schmidt orthogonalization [11] and normalization of rows. The ensemble is constructed from half the STD ensemble, complemented with one sample 1n×1 to cancel the first moments. Since that violates the orthogonality of the rows of V, must be applied to satisfy VVT=I. The high efficiency of the SPX ensemble is tarnished by its large skewness, or M(3). This may give considerable bias of propagated covariance for non-LP models, but is irrelevant for LP models.

The binary (BIN) ensemble has minimal range to guarantee allowed samples. By varying all parameters with an equal magnitude of one standard deviation in all samples, the diverging factor n of the STD is eliminated. Its excitation matrix V^BIN is fundamentally constructed from a standard binary array, with the difference that the allowed levels are ±1 instead of 0,1 (see rows 1-3 in Eq. 32). It is then complemented with supplementary rows obtained in two ways, by cyclic shifting and mirror imaging,


Cyclic shifts are applied to all original rows except the first, by a quarter of their periodicity. Mirror imaging of a row is defined to change the sign of its second half and is applied to all original rows except the last two, and all shifted rows except the last. For instance, in Eq. 32 row 4 and 5 are shifted versions of row 2 and 3, while rows 6 and 7 are the mirror images of rows 1 and the shifted row 4. The supplementary rows reduce the size of the ensemble drastically with a corresponding improvement of the efficiency. For n=20 parameters, the size drops from roughly 106 to 128 samples. That size is acceptable in perspective of the n+1=21 samples of the most efficient SPX. Eventually though, the number of samples will grow too large. The BIN can thus only be applied to moderately sized models.

By no means, this brief survey exhausts all possible ensembles. Many criteria for selecting the most appropriate ensemble can be formulated. Here, the first and second moments, parameter ranges and efficiency were in focus.


6. Application — Modeling uncertainty of a dynamic device

The task is to simulate the response of an electrical device such as an amplifier or oscilloscope, in the presence of non-stationary correlated noise on its input. An uncertain LR CT model of the device and its parametric covariance is usually found by applying system identification techniques [4] on calibration measurements [6]. Such a model of the system can be sampled into a digital filter and be described in the pole-zero form in Eq. 2. These standard steps will here be omitted. The system model will instead be assigned to a digital low-pass Butterworth filter, of order 10 and cross-over frequency fC=0.1fN, fN being the Nyquist frequency and described by parameters K,p1,p2,p10,z1,z2,z10. The complete correlations of complex-conjugated pole (p) and zero (z) pairs are eliminated by a transformation from q=z,p to Re(q),Im(q)0, giving n=21 system model parameters,


To be most general, the non-parametric input noise model is chosen to be correlated/colored and non-stationary. The noise parameter δxk represents the noise level at time sample k. Its generating signal [7] is a Dirac delta function δjk, centered at time k. The response of a system with impulse response h will be δyj=δxk(hjδjk)=hjkδxk. In matrix notation, δy=h¯Tδx, where h¯kj=hjk. Hence,


Since the response is linear in noise parameters, it is sufficient to only capture cov(x).

6.1. Reference ensembles

Traditionally, any method for uncertainty propagation is evaluated by comparisons with the default method of linearization [10,16], and brute-force random sampling (MC) [9] as state-of-the-art. There are several drawbacks of this approach. Linearization is a coarse approximation for LP models and MC suffer from the difficulty of modeling dependencies and low efficiency. An alternative is to construct finite reference ensembles (REF) and by definition let them describe the truth. Their primary advantage is that the finite size of the REF makes it possible to propagate the uncertainty exactly, using all REF samples. A more or less arbitrary REF may be generated randomly, like any MC ensemble. All requirements are also automatically fulfilled since the REF is built of possible realizations. Also, the REF closes the loop as it makes it possible to compare ‘true’ and approximate samples directly on an equal footing (see Fig. 8). Even though the samples differ substantially, the resulting modeling uncertainties can be similar.

A plausible REF δqj for the system model realized as a digital filter is created by randomly generating m samples of n parameters qk from uniform distributionsUNI(0,σk), with σk listed in Fig. 3, top left. The joint pdf will have compact support [11], as required to guarantee stability. The mean is subtracted from all samples to remove the bias of the finite random ensemble, δqjE=δqj=0,j. The covariance of the REF will have a desirable more or less random variation for small values of m. If the REF samples are arranged in columns of a matrix Λ^n×m, (as V^) cov(q)REF=1/mΛ^Λ^T. For m=31>n=21, the strong correlations will expose the methods to severe tests with significant transformations U,S. The mixing Ψ (Eq. 13) using transformation UTSUS1 was considerable, but less than for UT, see caption Fig. 3. For the chosen REF, the resulting variations of poles and zeros are displayed in Fig. 3. The obtained variation of the parameters defined in Eq. 33 can be quantified with an averaged correlation index and standard deviation (Fig. 3, bottom left),


A REF signal δxj for non-stationary correlated noise may conveniently be generated from an autoregressive process (AR) acting on time-dependent zero mean white noise δw, δx=g¯Tδw, where g¯kj=gjk is the matrix of translated impulse responses g for the AR process defined by parameters αk. Assigning a square wave time-dependence,


The exact REF result of modelling covariance of noise is given by combining Eqs. 34 and 36,


An explicit realization of the REF for the noise model is hence not needed. Specifically, a second order system α=[10.40.6] with time parameters {Ν,ψ,T,φ}={0.05,0.3,2fC1,π/8} was chosen. The impulse responses of the AR noise system and the system model, and the variation of the noise model are illustrated in Fig. 3, bottom right.

Figure 3.

Top: Assigned variations (left) and resulting (zk middle, pk right) samples of the REF of the system model. Label P indicates the pole explored in Fig. 8. Bottom, left: Obtained variations σk (dots) and correlations ξk (bars) of parameters q (Eq. 33), with mixing (Eq. 11) Ψ(UTSUS1)=0.22 (adopted) and Ψ(UT)=0.39. Bottom, right: Impulse responses h(q,t) and g(t) and time-dependence η(t) (Eq. 36) of noise intensity. The correlation lengths λh,λg were determined according to Eqs. 22-23, for β=0.05.

The ‘true’ result given by the response for the REFs for the different test signals is shown in Fig. 4. The propagated noise variation σN differs substantially from the input square wave η (top left) and is almost opposite in phase, due to the response time of about fC1, see delay of μS (top, right and bottom). The signal distortion (μS) is strongly dependent on the input signal and decreases with increased regularity / differentiability. The propagated covariance σS has a more complex variation (top, right and bottom), as it is larger for the more regular Gaussian (bottom, right) than for the triangular pulse (bottom, left).

Figure 4.

The mean μS (dashed), the standard deviations σS,N (solid) and the scent ζ (thin, solid) for the REFs, for the different test signals (thin, dashed). The subscripts refer to the system (S) and noise (N) models. The variation of noise intensity is given by η(t) (top, left).

6.2. Deterministic sampling

The error of the scent and the standard deviation for the STD, SPX and BIN ensembles of the system model is displayed in Fig. 5, for all test signals. The low scent of the REF (left: thin, dotted) suggests the model is close to LP. Despite the relative errors are large they are quite small on an absolute scale. The SPX has the largest errors, for the scent as well as the variance. That is likely caused by its skewness being much larger than that of the REF. The BIN has the lowest errors and is thus the best approximate representation of the REF.

Figure 5.

The errors of the scent ζζREF (left) and the standard deviation σSσS,REF (right) of the system model (solid) for the STD (top), SPX (middle) and BIN (bottom) and the three test signals (thin, dashed). The correct ζREF (left) and σS,REF (right) are included for comparison (thin, dotted). The triangular and Gaussian signals are displaced for clarity.

The errors might appear large, considering all ensembles are ‘correct’, i.e. correctly represent (typically) available accurate information (mean and covariance of parameters). The errors reflect ambiguities caused by the ubiquitous lack of information in signal processing, rather than inadequacies of DS. RS can only produce better results by making further assumptions.

The result of applying the ESV and the CRS methods to reduce the SPX ensemble for propagating the noise is displayed in Fig. 6. By choosing sufficiently low thresholds α for elimination of singular values (ESV) and β for truncation of the correlation lengths (see Eqs. 20,22), the errors can be made arbitrarily low. As the reduction will decrease accordingly, there is a trade-off between accuracy and efficiency. For the chosen values, CRS is about twice as accurate and twice as efficient as ESV. In contrast to ESV, the number of samples for the CRS method is independent of the number of noise samples. The computational cost thus increases linearly with the length of the noise signal for CRS but quadratically (approximately) for ESV. For ESV to be most efficient, the model covariance needs to be strongly rank deficient. That is not as unlikely as it might appear, since the model usually is derived from a limited amount of experimental results.

Figure 6.

The error σNσN,REF of propagated noise, for the ESV (section 5.3) and CRS (section 5.4) ensemble reduction methods, and the correct σN,REF (thin, ×1/10). The thresholds were α=0.1 (Eq. 20) for ESV, and β=0.05 (Eq. 22) for CRS. That resulted in m=142 samples for ESV and m=75 for CRS, compared to m=402 of the original SPX.

The summation of the noise and the model covariance is illustrated in Fig. 7. The propagation of the covariance of the system model (q) is not LP. The quadratic summation rule (Eq. 27), or Gauss approximation formula [16], is therefore not applicable. Nevertheless, the low scent ζ (Fig. 5, left) suggests that both propagations are close to LP. The summation error (ε) is hence finite, but quite small. It differs qualitatively from both contributions, indicating that the summation is non-trivial.

Figure 7.

Summation of covariance: Total (solid), system (dashed) and noise (dotted), for the three test signals (thin, dashed), with the error (ε) of square summation (×10) (Eq. 27).

Finally, the samples of one pole of the derived ensembles are compared to the reference samples of the REF in Fig. 8. The limit (|z|=1) of stability is included to illustrate how close the samples are to be physically forbidden. The construction of the different ensembles is apparent, even though the transformation T=UTSUS1 distorts the scatter plots (sections 3.2, 5.2), and tilts the principal axes (lines). The samples of the REF are almost evenly distributed. Only four samples of the STD, labelled p1,p2,p3,p4, deviate significantly from a dense central cluster, as described by the excitation matrix V^STD (Eq. 30). It also is evident that SPX originates from half the STD. A small translation required to achieve the correct mean is discernible, while the Gram-Schmidt orthogonalization renders a minor rotation and distortion. The BIN contains comparable variations in all samples and thus has no central cluster and its samples are repelled from the principal directions (lines). The statistical differences to the REF refer to the shape of the joint pdf. Choosing the best ensemble is thus equivalent of selecting the most appropriate pdf in RS. The BIN seems to resemble the REF scatter plot the most, as verified by its low errors in Fig. 5.

Figure 8.

The different samples (dots) of the pole marked ‘P’ in Fig. 3, of the reference (REF), standard (STD), simplex (SPX) and binary (BIN) ensembles. The limit |z|=1 of stability (solid, thick) and lines connecting the primary variations p1,p2,p3,p4 of the STD as well as lines (dashed) to combined excitations of the BIN are included for reference.


7. Conclusions

Deterministic sampling remains controversial [27] while random sampling has qualified as a preferred state-of-the-art method for propagating uncertainty. Both result in finite statistical [17] ensembles, which are approximate finite representations of the primary statistical models. Their sampling strategies and convergence rates are dramatically different. While deterministic sampling humbly aims at representing the most relevant and best known statistical information, random sampling targets complete control of all features of the ensemble. Such detailed information is rarely known and must instead be more or less blindly assigned. The inevitable consequence is that critical computational resources are spent on propagating, at best, vaguely known details. The numerical power of modern computers is better spent on refinements of the signal processing model (longer time series, higher sampling rates, larger systems etc.). Refined methods of random sampling have therefore been proposed which either simplifies the model, or improve the sampling distributions. Compared to deterministic sampling though, their convergence rates remain low.

It is easy to confuse deterministic sampling with experimental design and optimization [28]. Even though any sample could be a possible outcome of an experiment, deterministic ensembles represent rather than realize (as random ensembles) statistical distributions. Instead of associating a joint distribution to the parameters of an uncertain model, it is possible to directly represent their statistics with a deterministic ensemble. That would eliminate the need of interpreting abstract distributions and result in complete reproducibility. The critical choice of ensemble would be assigned once and for all in the calibration experiment, with no further need of approximation.

The use of excitation matrices made it possible to construct universal generic ensembles. The efficiency of the minimal SPX ensemble is indeed high but so is also its third moment. While the STD maximizes the range of each parameter, the BIN minimizes it by varying all parameters in all samples. The STD is the simplest while the SPX is the most efficient ensemble. In the example, the BIN was most accurate. For non-parametric models with many parameters, reduction of samples may be required. Elimination of singular values (ESV) and correlated sampling (CRS) were two such techniques. The presented ensembles are not to be associated to random sampling as a method. They are nothing but a few examples of deterministic sampling, likely the best ensembles are yet to be discovered.

It is indeed challenging but also rewarding to find novel deterministic sampling strategies. Once the sampling rules are found, the application is just as simple as random sampling, but usually much more efficient. Deterministic sampling is one of very few methods capable of non-linear propagation of uncertainty through large signal processing models.


  1. 1. Kay S. Fundamentals of Statistical signal processing: Estimation Theory. New Jersey: Prentice Hall; 1993.
  2. 2. Hessling JP. Propagation of dynamic measurement uncertainty. Meas. Sci. Technol. 2011; 22 (10) 105105 (13pp).
  3. 3. Hessling JP. Integration of digital filters and measurements. In: Márquez FPG. (ed.) Digital Filters. Rijeka: InTech; 2011. p123-154. Available from (accessed 4 July 2012).
  4. 4. Pintelon R, Schoukens J. System Identification: A Frequency Domain Approach. Piscataway, New Jersey: IEEE Press; 2001.
  5. 5. Witteveen JAS. Efficient and Robust Uncertainty Quantification for Computational Fluid Dynamics and Fluid-Structure Interaction. PhD thesis. Delft University of Technology; 2009.
  6. 6. Hale PD, Dienstfrey A, Wang JCM, Williams DF, Lewandowski A, Keenan DA, Clement TS. Traceable Waveform Calibration With a Covariance-Based Uncertainty Analysis. IEEE Trans. Instrum. Meas. 2009; 58 (10) 3554-3568.
  7. 7. Hessling JP. Metrology for non-stationary dynamic measurements. In: Sharma MK. (ed.) Advances in Measurement systems. Vukovar: InTech; 2010. p. 221-256. Available from (accessed 4 July 2012).
  8. 8. Metropolis N, Ulam S. The Monte Carlo Method. Journal of the American Statistical Association 1949; 44 (247) 335-341.
  9. 9. Rubenstein RY, Kroese DP. Simulation and the Monte Carlo Method, 2nd Ed. New York: John Wiley & Sons Inc.; 2007.
  10. 10. Hessling JP. A novel method of evaluating dynamic measurement uncertainty utilizing digital filters. Meas. Sci. Technol. 2009; 20 (5) 055106 (11pp).
  11. 11. Råde L, Westergren B. Beta Mathematics Handbook, 2nd Ed. Lund, Sweden: Studentlitteratur; 1990.
  12. 12. Hessling JP, Svensson T. Propagation of uncertainty by sampling on confidence boundaries, accepted for publication in International Journal for Uncertainty Quantification.
  13. 13. Hessling JP. Deterministic sampling for propagating model covariance, submitted for publication.
  14. 14. Lovett T. Polynomial Chaos Simulation of Analog and Mixed-Signal Systems: Theory, Modeling method, Application. Saarbrucken: Lambert Academic Publishing; 2010.
  15. 15. Li H, Zhang D. Probabilistic collocation method for flow in porous media: Comparisons with other stochastic methods. Water Resources Research 2007; 43 W09409 (13 pp).
  16. 16. ISO GUM. Guide to the Expression of Uncertainty in Measurement. Geneva: International Organisation for Standardisation; 1995.
  17. 17. Metropolis N. The Beginning of the Monte Carlo Method. Los Alamos Science special issue 1987; 15 125-130.
  18. 18. Helton J, Davis L. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering and System Safety 2003; 81 23-69.
  19. 19. Björk Å. Numerical methods for least squares problems. Philadelphia: Siam; 1996.
  20. 20. Wikipedia: (accessed 3 July 2012):
  21. 21. Julier S, Uhlmann J, Durrant-Whyte HF. A new approach for filtering nonlinear systems. Proc IEEE American Control Conference June 21-23 1995; 1628-1632.
  22. 22. Julier S, Uhlmann J. Unscented filtering and nonlinear estimation. Proceeding IEEE March 2004; 92 (3) 401-422.
  23. 23. Simon D. Optimal State Estimation: Kalman, H and non-linear approaches. New Jersey: Wiley; 2006.
  24. 24. Matlab with Signal Processing Toolbox, The Mathworks, Inc.
  25. 25. Julier S, Uhlmann J. The scaled unscented transformation, Proceedings of the IEEE American Control Conference 8-10 May 2002; 4555-4559.
  26. 26. Hessling JP. Non-linear propagation and summation of covariance using deterministic sampling, in preparation.
  27. 27. Gustafsson F, Hendeby G. Some Relations between Extended and Unscented Kalman Filters. IEEE Trans. sign. proc. 2012; 60 (2) 545-555.
  28. 28. Fischer RA. Statistical Methods, Experimental design and Scientific Inference. New York: Oxford University Press; 1990.

Written By

Jan Peter Hessling

Submitted: April 2nd, 2012 Published: January 16th, 2013