Open access peer-reviewed chapter

Deterministic Sampling for Quantification of Modeling Uncertainty of Signals

By Jan Peter Hessling

Submitted: April 2nd 2012Reviewed: August 8th 2012Published: January 16th 2013

DOI: 10.5772/52193

1. Introduction

Statistical signal processing [1] traditionally focuses on extraction of information from noisy measurements. Typically, parameters or states are estimated by various filtering operations. Here, the quality of signal processing operations will be assessed by evaluating the statistical uncertainty of the result [2]. The processing could for instance simulate, correct, modulate, evaluate, or control the response of a physical system. Depending on the addressed task and the system, this can often be formulated in terms of a differential or difference signal processing model equationin time, with uncertain parameters and driven by an exciting input signal corrupted by noise. The quantity of primary interest may not be the output signal but can be extracted from it. If this uncertain dynamic model is linear-in-response it can be translated into a linear digital filter for highly efficient and standardized evaluation [3]. A statistical model of the parameters describing to which degree the dynamic model is known and accurate will be assumed given, instead of being the target of investigation as in system identification [4]. Model uncertainty(of parameters) is then propagatedto model-ing uncertainty(of the result).The two are to be clearly distinguished – the former relate to the input while the latter relate to the output of the model.

Quantification of uncertainty of complex computations is an emerging topic, driven by the general need for quality assessment and rapid development of modern computers. Applications include e.g. various mechanical and electrical applications [5-7] using uncertain differential equations, and statistical signal processing. The so-called brute force Monte Carlo method [8-9] is the indisputable reference method to propagate model uncertainty. Its main disadvantage is its slow convergence, or requirement of using many samples of the model (large ensembles). Thus, it cannot be used for demanding complex models. The ensemble size is a key aspect which motivates deterministic sampling. Small ensembles are found by substituting the random generator with a customized deterministic sampling rule. Since any computerized random generator produces a pseudo-random rather than a truly random sequence, this is equivalent of modifying the random generator to be accuratefor smallensembles of definitesize, rather than being asymptotically exact(infinite ensembles). Correctness of very large ensembles is of theoretical but hardly practical interest for complex models, if the convergence to the asymptotic result is very slow.

2. Modeling uncertainty of signals

2.1. Problem definition

Suppose the (output) signal y(x,t)Rof interest is generated from the (input) signal x(t)Rpassing through a dynamic system H, with parameters akR,bkR,

[k=0uakDk]y=[k=0vbkDk]x,a0=1.E1

The model is given in n=u+v+1uncertain parameters, which can be arranged in a column vector q=(b0bva1au)T. For systems continuous-in-time (CT), D=tis the differential operator in time while for systems discrete-in-time (DT), D=Δ1is the negative unit displacement operator, Δ1xk=xk1. There are several approximate methods to sample CT systems to DT systems, see [3] and references therein. The discretization techniques are beyond the scope of this presentation and DT systems will be assumed. If u1, there is feedback in the system which results in an impulse response h(q,t)of infinite duration. For finite accuracy however, the duration is finite. The system is linear-in-response, y(x=αx1+βx2,t)=αy(x1,t)+βy(x2,t). Most importantly, the system is non-linear-in-parameters if u1. This is the typical situation addressed here.

Systems of the form in Eq. 1 may be directly realized as digital filters,y(q,x,t)=h(q)x(t), where denotes the filtering operation. The coefficients bkand akare the numerator and denominator coefficients of the filter with impulse response h(q), respectively. Its z-transform H(q,z)is obtained with the substitution Δz. The parameterization can be changed to for instance gainK, poles pkand zeros zk, or poles pkand residues rk,

Y(z)=H(z)X(z):H(q,z)=k=0vbkzkk=0uakzk=Kk(zzk)/(1zk)k(zpk)/(1pk)=krk(zpk)E2

The parameterization should be carefully chosen as it affects the convergence rate of Taylor expansions (section 3.1) as well as the physical interpretation. The parameters and their statistics are preferably extracted from measurements using system identification techniques [4]. Note that complex-valued poles and zeros are conjugated in pairs [10].

The problem to be addressed is the statistical evaluation of any function g(y(t)=h(q,t)x(t)), given statistical models of qand x. It will here consist of evaluating its time-dependent mean g(y)and standard deviation [g(y)g(y)]2. Without loss of generality, the analysis will be made for g(y)=y. Digital filtering will be utilized for evaluating samples of the model, i.e. filtering with definite sets of qand signals x.

2.2. Nomenclature

Statistical expectations of any signal, model or function g(q)over finite discrete Eas well as continuous ensembles or probability distributions (no subscript) are defined as,

gE=1mk=1mg(q^(k)),g=Qg(q)fq(q)dq.E3

Samples of qare labeled q^, with their components organized in columns. Sample indices will be given as superscripts in parenthesis, eg. q^(k)is a column vector denoting the kthsample of parameter q. Variations from the mean are written as δq(E)qq(E).

Only uniform (UNI)and normal distributions (NRM)will be utilized. Either the mean and standard deviation, or the interval in brackets will be given in parenthesis, e.g. q~UNI(0.5,1/23)=UNI([0,1]). Statistical moments Mi(k)=(δqi)kkcarry the information contained in the marginalized probability density functions (pdf) fi(δqi)Qfq(δq)dq1dqi1dqi+1dqn, where Qdenotes the sample space. While Mi(2)describes the width of fi(δqi), Mi(3)is related but different to its skewness[11]. Further, the shape is reflected in Mi(4), similarly to the curtosis[11]. Since UNI(0,1)and NRM(0,1)are normalized and symmetric fi(δqi)=fi(δqi), Mi(2)=1and Mi(3)=0. Their differences are first reflected in their fourth moment, Mi(4)=1/45,1/23(0.11,0.29)for UNI(0,1)and NRM(0,1), respectively. The maximum variation of the parameter qiis expressed by the range Mi()limk|Mi(k)|=max(|δqi|). Dependencies are expressed in mixed moments (δqi1)k1(δqi2)k2(E). The discussion will be limited to correlations described by the covariance matrix cov(q)=δqδqT, where the vector multiplication is an outer product.

Matrix size will be indicated with subscripts, e.g. Vn×mis a matrix of nrows and mcolumns with elements Vjk, j=1,nand k=1,m. The identity matrix will be denotedI, while matrices with equal elements (i)will have their size attached, (in×n)jk=i. For a matrix (vector) D, diag(D)is a vector (diagonal matrix) with components (diagonal elements) equal to the diagonal elements (components) of D. The trace of a matrix is denoted Tr.

A method will be stated intrusive if manipulations of the model are required. For the targeted highly complex models, it will be assumed that the computational cost for their evaluation dominates all other calculations. The efficiency ρof any method will accordingly be defined by the least required number of evaluations of the original model.

2.3. Fundamentals of non-linear propagation of uncertainty

Linearity in parameters (LP) is to be distinguished from linearity in response (LR),

LR:y(q,a1x1+a2x2,t)=a1y(q,x1,t)+a2y(q,x2,t),x1,x2LP:y(q1+q2,x,t)=y(q1,x,t)+C(x,t)T(q2q1)q1,q2,E4

for some vector Cn×1. Different concepts of linearity are used, y(q1+q2,x,t)y(q1,x,t)+y(q2,x,t)for LP models. Strictly speaking, LP denotes models that are affine, i.e. written as linear combinations of their parameters. Most constructed systems are designed to be as close to LR as possible while most models are not LP. There is hence no contradiction in non-linear (LP) propagation of uncertainty with linear (LR) digital filters, as here.

For non-linear propagation of uncertainty, the asymmetry of the resulting pdf is central. It can be expressed as a lack of commutation of non-linear propagation and statistical evaluation of a center value (C), as measured with the scent[12],

ζyC(q)y(qC).E5

The method for evaluating the center is left unspecified, as there are several alternatives. The most common choice is to use the mean, C=. The lowest order approximation of the scent can then be obtained by calculating the expectation of a Taylor expansion (section 3.1), ζ=Tr[cov(q)Η(y)]/2, where Η(y)jk=2y/qjqkis the Hessian matrix signal of y, evaluated at q. The scent is related to the skewness γ=δy3/δy23/2. The additionalasymmetry caused by the non-linearity of the model is measured with the scent but differently. The scent addresses how parametric uncertainties are propagated and not how the result is distributed, e.g. ζ=0for all LP models for which γmay attain any value. A finite scent thus implies the model is not LP, but not the reverse. The scent should not be confused with bias. Bias is a property of an estimator, while scent is a property of a model. For every model, such as the REF (section 6.1), many different estimators of yC(q)can be used, e.g. the different ensembles in section 5.6, see result in Fig. 5 (left). Consequently, an unbiased estimator of yC(q)correctly accounts for rather than ignores its finite scent, or deviation from y(qC).

The scent is important since yCand not y(qc)is the main result utilized in applications. The corresponding difference [13] in the standard deviation My(2)from its linearized approximation yTcov(q)y, with (y)jk=jy(tk), affects the confidence in the result. Its accuracy is usually less critical. An accurate evaluation of the scent is perhaps the strongest feature of the unscented Kalman filter, which provides the foundation for the presented approach as well as the origin of the term ‘scent’.

3. Conventional methods

A brief resume of the most traditional related methods of uncertainty propagation, applicable to signal processing models, is here given together with their pros and cons. Advanced intrusive methods like e.g. polynomial chaos expansions [14-15] not directly related to the proposed method are omitted.

3.1. Taylor expansions

The indisputable default methods of uncertainty propagation are based on Taylor expansions. These methods are intrusive if the differentiations are made analytically. Convergent series require regular differentiable models and numerical or analytical complexity make them error prone. Their applicability is therefore limited for complex models.

The transfer function H(q,z)of the digital filter can be expanded in a Taylor series,

δH(q,z)=H(q,z)H(q,z)=k=1+1k!(δqTq)kH(q,z)=δqTqH+12k,lnδqkδql2Hqlqk+=δqTE(1)(q,z)+Tr{[δqδqT]E(2)(q,z)}+E6

This defines nsensitivity systems (column vector) Ek(1)(q,z), n(n+1)/2unique quadratic variation systems (matrix) E(2)(q,z), and so on. These variation systems differ (intrusive) from H(q,z)but may nevertheless be realized as digital filters [3,7,10], just as H(q,z). The corresponding variation of y(q,x,t)=h(q,t)x(t)is given by,

δy(q,x,t)=δqT[e(1)(q,t)x(t)]+12Tr{[δqδqT][e(2)(q,t)x(t)]}+,E7

where e(k)(q,t)are the impulse responses of the systems E(k)(q,z). Utilizing digital filters with impulse responses e(k)(q,t), the differentiations are conveniently done once, and not repeatedly for every signal x(t). The linearity in parameters of the model can easily be studied for many different input signals x(t), by evaluating e(k)(q,t)x(t). Due to the large number of variation systems, higher order perturbation analyses rapidly become intractable though. The established method is limited to linearization (LIN) [16] (e(1)). It will always incorrectly yield vanishing scent, ζ=0. A first order estimate of ζis instead given by the expectation of second term in Eq. 7, ζTr[cov(q)Η(q,t)]/2, where the matrix of Hessian signals Η(q,t)=e(2)(q,t)x(t)is obtained with repeated digital filtering.

3.2. Brute force Monte Carlo

Monte Carlo (MC) methods [8-9], or random samplingof uncertain models was originally introduced and phrased ‘statistical sampling’ by Enrico Fermi already in the 1930’s [17]. The MC methods realizeuncertain signal processing models in finite ensembles. Every ensemble consists of a possible set of well-defined model systems, all (usually) having the same structure but slightly different parameter values. In the original so-called brute force Monte Carlo method, each set of parameters is assigned to the output of random generators with appropriate statistics. The convergence to the assigned statistics is very slow [5] but it is asymptotically exact and the required number of samples is essentially independent of the number of parameters. Hence it does not suffer from the curse-of-dimensionality of many other methods. The outstanding simplicity in application is likely the cause of its popularity, just as the slow convergence or low efficiency is the main reason for its failures.

In MC, arbitrary distributions and dependencies are usually obtained by means of transformations of samples of elementary distributions. Independent samples q^(k)of any probability density function (pdf) ϕ(x)can be constructed with the inverse transform method [9]. It consists of a calculation of the inverse of its cumulative distribution function (cdf) Φ(y)and generation of a uniformly distributed random sequence z^(k),

q^(k)=Φ1(z^(k)),Φ(y)=yϕ(x)dx,z~UNI(0,1),k=1,2,m.E8

Covariance may be included with an appropriate transformation of samples of canonicalparameters q˜:q=UTSq˜with cov(q˜)=I,

cov(q)=δqδqT=UTSδq˜(UTSδq˜)T=UTSδq˜δq˜TSU=UTS2U,{UTU=UUT=ISjk2=0,jk,E9

The matrices S,Uare found by calculating the eigenvalues (S2)and eigenvectors (U)[11] of cov(q). This transformation makes the marginal pdfs fk(qk)to differ substantially from the univariate pdfs ϕkof the independent but scaled parameters Sq˜k,

fk(qk)=ϕ1([Uq]1)ϕ2([Uq]2)ϕk([Uq]k)ϕn([Uq]n)dq1dqk1dqk+1dqnϕk(qk),if UI.E10

All ϕkare hence mixed according to U. Dependencies are thus difficult to account for. One rare exception is provided by the multinomial distribution [9]. It is often better to assign the pdfs to the canonical parameters in the original instead of the canonical basis. The transformation then reads q˜:q=UTSUq˜. As required, it leaves cov(q)invariant. The marginalization in Eq. 10 changes accordingly, USUTS1U. Since the transformation UTSUS1of Sq˜kcontains cancelling operations U,UTand S,S1, it is generally less distorting than UT. Indeed, if the commutator [S,UT]SUTUTSvanishes, UTSUS1=I. The transformation UTmust satisfy the stronger criterion U=Ito avoid mixing. For any transformation qWq, an indicator of mixing of the components of qis given by,

Ψ(W)1nr=1n(1maxc|Wrc|minc|Wrc|Wr,:)[0,1],Wr,:c=1n|Wrc|2.E11

A simple example illustrates that the mixing effect can be considerable, even for minute correlations. Assume a model has two parameters with a covariance matrix,

cov(q)=(0.900.100.100.90)U=12(1111),S2=(1000.8){ϕ1(Sq˜1)=UNI([0,1])ϕ2(Sq˜2)=UNI([0,0.8]).E12

Large rotations are required because the canonical variances Sjj2are similar, i.e. cov(q)is almost degenerate. As shown in Fig. 1, the large rotations mix the assigned pdfs ϕk(Sq˜k)to marginal pdfs fk(qk)beyond recognition for the transformation UTbut not for UTSUS1.

Specifying both marginal probability distributions and covariance is either redundant or inconsistent, as the latter is uniquely determined by the former. Nevertheless, this reflects the typical available information for signal processing applications. The moments can be accurately determined [4] for sufficiently large data sets but the joint distribution f(q)is hardly ever known with any precision. Some of its properties are usually assigned, with varying degree of confidence. For instance, the allowed maximal range M()of the parameters of digital filters is given by stability constraints. The transformation technique above is well adapted to these facts, since the covariance is prioritised. The transformation q=UTSUq˜will be utilized in section 5.2 to include correlations with limited mixing of the statistics assigned to independent normalized canonical parameters q˜.

3.3. Refinements of Monte Carlo

To increase the efficiency of MC, the original brute force sampling technique has been further developed in mainly two directions: model simplification and sample distribution improvement. In response surface methodology (RSM) [18], the model is replaced by a simple approximate surrogate model. A model of order vmay be found by applying linear (with respect to C) regression at collocation points[15] q^(k)=μ(k),

H(μ)R(μ)C,{Rkj=Rj(μ(k)),j=1,2,,vHk=H(μ(k)),{C=(C1Cv)Tμ=(μ(1)μ(m))T,mv,E13

where Rj(q)is basis function j. Since it may be non-linear, RSM allows for non-linear propagation of uncertainty and may give a substantially different and more accurate result than LIN. If only linear basis functions are used Rj(q)=qj, RSM becomes equivalent to LIN. The best least square approximation is directly obtained from Eq. 13 [19],

C=(RTR)1RTHE14

Let RSM(r)utilize a complete set of mixed polynomial basis functions up to order r. Its least number (v)of collocation points grows rapidly with both the number of parameters (n)and polynomial order (r)[12],

v=k=0rw(n,k):w(n,k)=j=1min(n,k)(nj)w(j,kj),w(j,0)=1.E15

In practice, r>3often yields an unacceptable number of samples, see table 1.

 n=2 n=5 n=10 n=20 r=1 3 6 11 21 r=2 6 21 66 231 r=3 10 56 286 1771

Table 1.

Efficiency ρ=vfor RSM(r), for selected polynomial orders rand numbers nof parameters.

The distribution of samples may be improved with stratification, as in Latin Hypercube sampling (LHS) [18]. By dividing the sample space into intervals, or stratas representing equal probability the need for large ensembles is reduced. In LHS, each parameter is sampled exactly once in each of its stratas giving a generalized latin square[20]. This selection pushes the samples away from each other and distributes them more evenly. To illustrate the improvement with stratification, sample one parameter q~NRM(0,1). After division into mintervals of equal probability, samples are found with the inverse transform method described in section 3.2 (Eq. 8). As seen in Fig. 2, the convergence improves dramatically. Still, even for m=100samples the second moment (left) varies noticeably. The convergence is generally poorer for higher order moments M(k), as shown for k=4(right).

In this case, it is questionable if 100 samples are sufficient to represent as few as four moments M(1)M(4). The probabilistically evenly distributed fixed grid (dashed) converges more rapidly to the proper statistics. Despite the prevailing tradition, there is no absolute requirement of using a random generator to represent statistical information. Fixed grids are examples of deterministic sampling. Stratification provides an interesting intermediate type of sampling since it is partially deterministic – the strata are constructed deterministically but the samples within each stratum are generated randomly. The construction of a fixed grid requires focus on the most relevant features. To reproduce M(1)M(4)exactly, a very sparse grid or few deterministic samples are needed,

q^={(±1.376±0.325)q~UNI(0,1)(±1.7320(×4))q~NRM(0,1).E16

If the problem at hand only depends on these moments, the exact solution will be obtained. The size of such small ensembles must be fixed, no matter how they are generated. Adding, or perturbing a single sample would modify the statistics substantially.

4. Deterministic sampling

Deterministic sampling (DS) of uncertain systems is a viable alternative to random sampling (RS). Instead of using random generators, specific DS rules are devised to generate appropriate, but still statistical (Fermi’s notation, see section 3.2) ensembles. A rudimentary example illustrates the principle: Assume a model y(q)depends on one parameter qwith mean qand variance δq2. To estimate the mean yand the variance δy2of the model, the samples (filter parameters) q^(1,2)=q±δq2are appropriate since they satisfy the desired statistics, q^E=qand δq^2E=δq2. The formula for q^(1,2)constitutes the sampling rule and q^(1,2)is the statistical ensemble containing only two model samples. By paying the computational cost of using more samples and improving the sampling rule, additional moments δqk,k>2or other statistical features can be accounted for.

In deterministic sampling the model evaluations involve no approximations and are non-invasive. In many respects, deterministic sampling is constructed and optimized for quantification of modeling uncertainty: Minimal ensembles allow for evaluation of the most numerically demanding models. The model evaluations are exact and non-invasive to fully respect non-linear deeply hidden parameter dependences. Only vaguely known statistics of the model is approximated.

4.1. Concepts of deterministic sampling

DS does not per se specify the goal of sampling, e.g. given mean and covariance of the parameters. In the example at the end of section 3.3, the primary target was the joint pdf of the parameters. In section 4.2, the target is M(2)(q). In section 5, this will be complemented with additional requirements. DS can also be utilized for direct evaluation of confidence intervals [12]. The targets of various DS methods may differ but the focus on the most influential statistical aspect and customization is shared. In stark contrast, almost without exception RS targets the joint pdf of the parameters and ignores the final utilization. Adaptation and fixed ensemble sizes provides the principal means to improve the efficiency of sampling.

4.2. Propagation of covariance in the standard unscented Kalman filter

The reference will be the specific variant of DS used for propagating covariance in what will be referred to as the standard unscented Kalman filter (UKF) [21-23]. The ensemble consists of 2nsamples, or sigma-points,

q^(s,k)q+snΔ:k,ΔΔT=cov(q),k=1,2,n,s=±E17

where Δ:kdenotes the k-th column of Δ. The sampling rule is manifested in the square root calculation of the covariance matrix (Δ). As suggested [23] it may be found with a Cholesky factorization [19]. The square root matrix is not unique though – the Cholesky root is upper triangular and thus asymmetric. A more symmetric standard alternative is to evaluate the matrix square root in a canonical basis [24] Uqwhere cov(Uq)is diagonal. The canonical variations Uδq^(s,v)will be unit vectors in the npositive and negative directions of the principalaxes of the covariance matrix, amplified by the marginal standard deviations and most importantly, n. For many parameters with large covariance, the scaling with nmay cause the UKF to fail since the scaling is not related to the variability of the parameters, only their total number. A possible solution to the scaling problem is provided by the scaled unscented transformation [25]. However, it is based on Taylor expansions and thus suffers from an approximation problem of the model.

5. Sampling with conservation of moments

One class of methods of deterministic sampling conserves a limited number of statistical moments. The model parameters are sampled to satisfy these moments and collected in ensembles, similar to how parameters are sampled to fulfill probability distributions in RS.

5.1. Principle

The constraints of satisfying statistical moments constitute an infinite system of equations for the samples δq^i(v). It can formally be viewed as sampling (=^) of the joint pdf f(q),

0=δqiδqif(q)dq=^1mv=1mδq^i(v)δq^iEδqi1δqi2δqi1δqi2f(q)dq=^1mv=1mδq^i1(v)δq^i2(v)δq^i1δq^i2Eδqi1δqi2δqi3δqi1δqi2δqi3f(q)dq=^1mv=1mδq^i1(v)δq^i2(v)δq^i3(v)δq^i1δq^i2δq^i3E=^E18

The infinite number of equations requires an infinite number of samples. However, it is implicitly assumed that relatively few moments are known and significantly influence the result of interest. Only a few moments then needs to be accurately represented by {δq^(v)}. Typically, δqiand δqi1δqi2are estimated when models are identified [4,7]. In addition, the range M()or another higher diagonal moment can generally be determined from underlying physical constraints like stability. Clearly, any sampling rule must generate a fixed number of samples and create them simultaneously. The samples are consequently strongly dependent. One obvious sampling method is to solve Eq. 18 numerically for a sufficiently large number of samples δq^, as in Eq. 16. Due to the strong non-linearities, this is quite difficult for a large number of moments but may be feasible for a few moments.

5.2. The excitation matrix

The UKF (section 4.2) utilizes DS with conservation of all first (δqi)and second (δqi1δqi2)statistical moments. The invariance in its formulationallows for any additional ‘half’ unitary transformation ΔΔV, V:VVT=I. This results in another equally valid matrix Δ˜, since Δ˜Δ˜T=ΔV[ΔV]T=ΔVVTΔT=ΔΔT. Since the transformation Vis allowed and influences the result, the result of applying the UKF is not unique. The matrix Vcondenses this invariance and provides practical means to manipulate the UKF ensemble. A key feature of Vis the absence of constraints on VTV. That makes it possible to stretch V‘horizontally’ (as long as VVT=I). That corresponds to adding samples (sigma-points). The improved transformation UTSU(section 3.2) can be applied by also combining Uwith V. The square root of the covariance matrix will then read Δ=UTSUVinstead of Δ=UTSV,

Σn×mq11×m+UTSUV^,V^mVn×m,VVT=I,V1m×1=0.E19

The samples q^(k)are here collected in columns of the ensemble matrix Σ. The matrix cov(q)=ΔΔT=UTS2Uis diagonalized,S2=eig(cov(q)), with the unitary transformation U[24]. The normalization factormis included in the excitation matrixV^to satisfy the correct covariance, ΣΣT=ΔΔT, just as the factor nwas included in Eq. 17 (the ensemble is here expanded from nto msamples). The excitation matrix controls the sampling beyondthe first and second moments, e.g. the range of the samples. Row kof the matrix SV^can be interpreted as deterministic samples of the pdf ϕk(Sq˜k), assigned to canonical parameters in RS, see section 3.2. All ensembles will be described with a unique excitation matrixV^.

The adopted transformation UTSUS1of SV^distorts all higher moments than the second. This mixing effect is indicated by the index Ψ(UTSUS1)defined in Eq. 11. To diagonalize large matrices cov(q)many efficient techniques have been developed. This should not cause any difficulties even for n~1000, especially since cov(q)usually is either very sparse, or rank deficient for models with many parameters.

5.3. Elimination of singular values

The rationale for applying the reduction to be presented is that any model is derived from a limited set of experiments, resulting in a usually moderate rank of cov(q). If the number of parameters is large, it is thus often (nearly) rank deficient. The widely practiced singular value decomposition (SVD) [19] may then be used to reduce the excitation matrix and hence the number of samples. The most general form of SVD cannot be used here since it renders an asymmetric decomposition cov(q)=UTS2W, where UUT=UTU=WWT=WTW=Iand Sjk2=δjkSkk0. Different matrices U,Wallow for decomposition of an arbitrary matrix. For the symmetric matrix cov(q), a symmetric SVD U=Wcan be found with the less general eigenvalue decomposition [24], according to the spectral theorem [11]. As cov(q)is positive definite, all its eigenvalues Skk2fulfill the requirement of being positive. This is required to directly obtain a real-valued matrix square root Δ=UTSUV.

The ensemble may now be reduced by elimination of singular values (ESV). Choose a threshold αand remove row rand column rfrom Sand row rfrom Ufor all rsuch that,

|Srr|<αmaxk|Skk|,α<<1.E20

Proceeding as in many applications of SVD, this reduction (indicated by tilde below) will not change the result significantly, if αis small enough. Accordingly, samples are eliminated using the alternative decomposition of the square root of cov(q),

Δn×m=(UT)n×nSn×nVn×m(U˜T)n×rS˜r×rV˜r×μ=Δ˜n×μ.E21

Unfortunately, the less distorting transformation UTSUS1of SV^advocated in section 3.2 do not allow for r<nrows of the matrix V. The increase in distortion of M(k>2)indicated by Ψ(UTSUS1)Ψ(UT)is less important for the intended use though. Signal processing models with large numbers of parameters are typically non-parametric and usually describe samples of signals like impulse responses, or noise signals. The required LR property of the system then implies LP. The propagation of covariance is then linear and only the undistorted first and second moments need to be encoded.

5.4. Correlated sampling of non-parametric models

A major difference between parametric and non-parametric models is the dimensionality. A conceptual dissimilarity is that non-parametric models usually refer to correlated signals, rather than abstract model structures. The parameters may describe discrete samples of input noise [7], or an impulse response [6]. A common parametric pole-zero model may contain 20 parameters, while a non-parametric model can be expressed in perhaps 1000 parameters. The ensembles of non-parametric models often need to be reduced drastically.

Due to limited resolution, the correlation times of any signal or impulse response is finite. Their ‘memory’ is thus finite so sample variations may be regenerated or repeated, as long as the time between repetitions exceeds the correlation time. This correlated sampling(CRS) provides efficient and accurate reduction of the ensembles. The minimal number of parameters nis then set by the correlation time of the model. Most importantly, the size of the ensemble becomes independent of the size of the model (the length of the signal).

A finite correlation length τNof any model δx(t)is normally inferred from the decay of its autocorrelation functionC(t,T), where tdenotes the lag and Trefers to a non-stationary variation. Here, a global τwill be defined through its l2-norm and determined for a relative truncation threshold β(argminreturns the minimizing argument),

τ=maxT[argminτ|t=τ+1|C(t,T)|2β2t=0|C(t,T)|2|],C(t,T)δx(T+t2)δx(Tt2),β<<1.E22

If the model is expressed as a convolution δx(t)=h(t)w(t)of an impulse response h(t)and time-dependent white noise w(t)as in section 6.2,

C(t,T)=u=0η2(T(u+t2))h(u)h(u+t)η2(T)u=0h(u)h(u+t),η(t)=var(w),w=0.E23

By padding the model to an integer multiple of γ2τsamples, it is always possible to choose an excitation matrix partitioned to block-diagonal form,

V^n×m=(cV˜γ×γ0γ×γ0γ×γ0γ×γcV˜γ×γ0γ×γ0γ×γ0γ×γcV˜γ×γ),V˜γ×γV˜γ×γT=γI,c=mγ,E24

where V˜γ×γis any allowed deterministic sub-ensemble. The factor caccounts for the change from γsamples of V˜γ×γto the m>γsamples of V^n×m. By violating the normalization constraint V^n×mV^n×mT=m, the size of the ensemble can be ‘compressed’ from mto γsamples by moving all sub-matrices cV˜γ×γto the first block-column and skipping all zeros. The introduced constant cdrops out as mγ,

V^n×γ(V˜γ×γV˜γ×γV˜γ×γ),V^n×γ(V^n×γ)T=γ(Iγ×γIγ×γIγ×γIγ×γIγ×γIγ×γIγ×γIγ×γIγ×γ)γIn×n.E25

Accordingly,

cov(x)jk(UTSUVVTUTSU)jk={cov(x)jk,|jk|γ/2cov(x)j,k+nγ,{|jk|>γ/2n=argminlZ|jklγ|.E26

The consequence of violating the normalization constraint is that only a limited diagonal band of cov(x)is correctly reproduced. If a non-parametric model of a signal is propagated through a system model with impulse response hof correlation length σγ/2this will nevertheless notresult in any error of var(h), as it is independent of all faulty elements cov(ν)jk, |jk|>γ/2σ. To correctly evaluate cov(h)uvthough, the size of the sub-ensembles V˜γ×γof correlated sampling must fulfill the stronger size constraint γ2(max(τ,σ)+|uv|). The symmetry of convolutions implies a corresponding result when the non-parametric model describes the impulse response hof the system, rather than a signal.

5.5. Combining covariance

A signal processing model generally includes both parametric and non-parametric sources of uncertainty. For instance, a device (parametric system model) may be fed with a signal corrupted with noise (non-parametric noise model). The question then arises how the two sources qk,xkof uncertainty can be combined. For propagation of uncertainty through LP models, the combined covariance is given by the Gauss approximation formula [16],

cov(y)=LPkcov(y(k)),E27

where cov(y(k))is the propagated covariance of qk,xk. This will seize to apply for non-LP models. There exists no general non-linear summation rule for propagated covariance. A method of summation can be given though, if different ensembles are combined as in RS.

To combine ensembles of parametric (q)and non-parametric models (x), collect all parameters, q(qTxT)T, and diagonalize the enlarged covariance matrix, cov(q)=UTS2U. Build V^with two blocks and use CRS (section 5.4) for the non-parametric model,

V^(n+kγ)×(m+v)(1+c1V^n×m001+cV^γ×v01+cV^γ×v01+cV^γ×v),c=mv,V^(n+kγ)×(m+v)(V^(n+kγ)×(m+v))T=(m+v)I.E28

The scaling 1+c±1may cause a similar scaling problem as the factor nin the UKF (section 4.2). Using extended excitation matrices these factors can be eliminated,

V^(n+kγ)×c(A^n×cB^γ×cB^γ×cB^γ×c),E^(n+γ)×c=(A^n×cB^γ×c),E^(n+γ)×c(E^(n+γ)×c)T=cI,c=max(m,v)>(n+γ).E29

A disadvantage of this summation is that the same type of ensemble must be used for all parameters. Both alternatives combine the statistics of the two models non-linearly. The uncertainties are propagated and combined by evaluating the model for all samples and calculating the desired statistics, just as if the combined ensemble described one model.

5.6. Selected ensembles

The standard (STD) ensemble employed in the UKF (as defined in section 4.2) utilizes the perhaps simplest possible excitation matrix,

V^STD=n(In×nIn×n),m=2n.E30

While the ultimate simplicity is its main advantage, the long maximal(!) range M()is its main disadvantage.

How far the reduction of samples might be driven is illustrated by the minimal simplex (SPX) ensemble,

V^SPX=n+1{(In×n1n×1)},m=n+1,E31

where the operator performs classical Gram-Schmidt orthogonalization [11] and normalization of rows. The ensemble is constructed from half the STD ensemble, complemented with one sample 1n×1to cancel the first moments. Since that violates the orthogonality of the rows of V, must be applied to satisfy VVT=I. The high efficiency of the SPX ensemble is tarnished by its large skewness, or M(3). This may give considerable bias of propagated covariance for non-LP models, but is irrelevant for LP models.

The binary (BIN) ensemble has minimal range to guarantee allowed samples. By varying all parameters with an equal magnitude of one standard deviation in all samples, the diverging factor nof the STD is eliminated. Its excitation matrix V^BINis fundamentally constructed from a standard binary array, with the difference that the allowed levels are ±1instead of 0,1(see rows 1-3 in Eq. 32). It is then complemented with supplementary rows obtained in two ways, by cyclic shifting and mirror imaging,

V^BIN(m)=(+11+11+11+11+1+111+1+111+1+1+1+111111+1+111+1+1111+1+1+1+111+11+111+11+11+1+11+111+1),m=2ceil(n+54).E32

Cyclic shifts are applied to all original rows except the first, by a quarter of their periodicity. Mirror imaging of a row is defined to change the sign of its second half and is applied to all original rows except the last two, and all shifted rows except the last. For instance, in Eq. 32 row 4 and 5 are shifted versions of row 2 and 3, while rows 6 and 7 are the mirror images of rows 1 and the shifted row 4. The supplementary rows reduce the size of the ensemble drastically with a corresponding improvement of the efficiency. For n=20parameters, the size drops from roughly 106to 128samples. That size is acceptable in perspective of the n+1=21samples of the most efficient SPX. Eventually though, the number of samples will grow too large. The BIN can thus only be applied to moderately sized models.

By no means, this brief survey exhausts all possible ensembles. Many criteria for selecting the most appropriate ensemble can be formulated. Here, the first and second moments, parameter ranges and efficiency were in focus.

6. Application — Modeling uncertainty of a dynamic device

The task is to simulate the response of an electrical device such as an amplifier or oscilloscope, in the presence of non-stationary correlated noise on its input. An uncertain LR CT model of the device and its parametric covariance is usually found by applying system identification techniques [4] on calibration measurements [6]. Such a model of the system can be sampled into a digital filter and be described in the pole-zero form in Eq. 2. These standard steps will here be omitted. The system model will instead be assigned to a digital low-pass Butterworth filter, of order 10and cross-over frequency fC=0.1fN, fNbeing the Nyquist frequency and described by parameters K,p1,p2,p10,z1,z2,z10. The complete correlations of complex-conjugated pole (p)and zero (z)pairs are eliminated by a transformation from q=z,pto Re(q),Im(q)0, giving n=21system model parameters,

q(KRe(z1)Im(z1)0Re(p1)Im(p1)0Re(p10)Im(p10)0)T.E33

To be most general, the non-parametric input noise model is chosen to be correlated/colored and non-stationary. The noise parameter δxkrepresents the noise level at time sample k. Its generating signal[7] is a Dirac delta function δjk, centered at time k. The response of a system with impulse response hwill be δyj=δxk(hjδjk)=hjkδxk. In matrix notation, δy=h¯Tδx, where h¯kj=hjk. Hence,

cov(y)=δyδyT=h¯TδxδxTh¯=h¯Tcov(x)h¯E34

Since the response is linear in noise parameters, it is sufficient to only capture cov(x).

6.1. Reference ensembles

Traditionally, any method for uncertainty propagation is evaluated by comparisons with the default method of linearization [10,16], and brute-force random sampling (MC) [9] as state-of-the-art. There are several drawbacks of this approach. Linearization is a coarse approximation for LP models and MC suffer from the difficulty of modeling dependencies and low efficiency. An alternative is to construct finite reference ensembles (REF) and by definition let them describe the truth. Their primary advantage is that the finite size of the REF makes it possible to propagate the uncertainty exactly, using all REF samples. A more or less arbitrary REF may be generated randomly, like any MC ensemble. All requirements are also automatically fulfilled since the REF is built of possible realizations. Also, the REF closes the loop as it makes it possible to compare ‘true’ and approximate samples directly on an equal footing (see Fig. 8). Even though the samples differ substantially, the resulting modeling uncertainties can be similar.

A plausible REF δqjfor the system model realized as a digital filter is created by randomly generating msamples of nparameters qkfrom uniform distributionsUNI(0,σk), with σklisted in Fig. 3, top left. The joint pdf will have compact support [11], as required to guarantee stability. The mean is subtracted from all samples to remove the bias of the finite random ensemble, δqjE=δqj=0,j. The covariance of the REF will have a desirable more or less random variation for small values of m. If the REF samples are arranged in columns of a matrix Λ^n×m, (as V^) cov(q)REF=1/mΛ^Λ^T. For m=31>n=21, the strong correlations will expose the methods to severe tests with significant transformations U,S. The mixing Ψ(Eq. 13) using transformation UTSUS1was considerable, but less than for UT, see caption Fig. 3. For the chosen REF, the resulting variations of poles and zeros are displayed in Fig. 3. The obtained variation of the parameters defined in Eq. 33 can be quantified with an averaged correlation index and standard deviation (Fig. 3, bottom left),

ξk1nkj=1nkcov(q)j(j+k)/1nj=1nσj2,σj2=var(qj).E35

A REF signal δxjfor non-stationary correlated noise may conveniently be generated from an autoregressive process (AR) acting on time-dependent zero mean white noise δw, δx=g¯Tδw, where g¯kj=gjkis the matrix of translated impulse responses gfor the AR process defined by parameters αk. Assigning a square wave time-dependence,

kαkδxjk=δwj,δxδxT=g¯TδwδwTg¯=g¯Tcov(w)g¯,cov(w)=diag[η(t)]η(t)=Ν[1+11+ψθ[cos(2tπT+φ)]],θ={+()1x>(<)00,x=0.E36

The exact REF result of modelling covariance of noise is given by combining Eqs. 34 and 36,

σNTr(cov(y))=Tr(h¯Tg¯Tdiag[η2]g¯h¯).E37

An explicit realization of the REF for the noise model is hence not needed. Specifically, a second order system α=[10.40.6]with time parameters {Ν,ψ,T,φ}={0.05,0.3,2fC1,π/8}was chosen. The impulse responses of the AR noise system and the system model, and the variation of the noise model are illustrated in Fig. 3, bottom right.

The ‘true’ result given by the response for the REFs for the different test signals is shown in Fig. 4. The propagated noise variation σNdiffers substantially from the input square wave η(top left) and is almost opposite in phase, due to the response time of about fC1, see delay of μS(top, right and bottom). The signal distortion (μS)is strongly dependent on the input signal and decreases with increased regularity / differentiability. The propagated covariance σShas a more complex variation (top, right and bottom), as it is larger for the more regular Gaussian (bottom, right) than for the triangular pulse (bottom, left).

6.2. Deterministic sampling

The error of the scent and the standard deviation for the STD, SPX and BIN ensembles of the system model is displayed in Fig. 5, for all test signals. The low scent of the REF (left: thin, dotted) suggests the model is close to LP. Despite the relative errors are large they are quite small on an absolute scale. The SPX has the largest errors, for the scent as well as the variance. That is likely caused by its skewness being much larger than that of the REF. The BIN has the lowest errors and is thus the best approximate representation of the REF.

The errors might appear large, considering allensembles are ‘correct’, i.e. correctly represent (typically) available accurate information (mean and covariance of parameters). The errors reflect ambiguities caused by the ubiquitous lack of information in signal processing, rather than inadequacies of DS. RS can only produce better results by making further assumptions.

The result of applying the ESV and the CRS methods to reduce the SPX ensemble for propagating the noise is displayed in Fig. 6. By choosing sufficiently low thresholds αfor elimination of singular values (ESV) and βfor truncation of the correlation lengths (see Eqs. 20,22), the errors can be made arbitrarily low. As the reduction will decrease accordingly, there is a trade-off between accuracy and efficiency. For the chosen values, CRS is about twice as accurate and twice as efficient as ESV. In contrast to ESV, the number of samples for the CRS method is independent of the number of noise samples. The computational cost thus increases linearly with the length of the noise signal for CRS but quadratically (approximately) for ESV. For ESV to be most efficient, the model covariance needs to be strongly rank deficient. That is not as unlikely as it might appear, since the model usually is derived from a limited amount of experimental results.

The summation of the noise and the model covariance is illustrated in Fig. 7. The propagation of the covariance of the system model (q)is not LP. The quadratic summation rule (Eq. 27), or Gauss approximation formula [16], is therefore not applicable. Nevertheless, the low scent ζ(Fig. 5, left) suggests that both propagations are close to LP. The summation error (ε)is hence finite, but quite small. It differs qualitatively from both contributions, indicating that the summation is non-trivial.

Finally, the samples of one pole of the derived ensembles are compared to the reference samples of the REF in Fig. 8. The limit (|z|=1)of stability is included to illustrate how close the samples are to be physically forbidden. The construction of the different ensembles is apparent, even though the transformation T=UTSUS1distorts the scatter plots (sections 3.2, 5.2), and tilts the principal axes (lines). The samples of the REF are almost evenly distributed. Only four samples of the STD, labelled p1,p2,p3,p4, deviate significantly from a dense central cluster, as described by the excitation matrix V^STD(Eq. 30). It also is evident that SPX originates from half the STD. A small translation required to achieve the correct mean is discernible, while the Gram-Schmidt orthogonalization renders a minor rotation and distortion. The BIN contains comparable variations in all samples and thus has no central cluster and its samples are repelled from the principal directions (lines). The statistical differences to the REF refer to the shape of the joint pdf. Choosing the best ensemble is thus equivalent of selecting the most appropriate pdf in RS. The BIN seems to resemble the REF scatter plot the most, as verified by its low errors in Fig. 5.

7. Conclusions

Deterministic sampling remains controversial [27] while random sampling has qualified as a preferred state-of-the-art method for propagating uncertainty. Both result in finite statistical[17] ensembles, which are approximate finite representations of the primary statistical models. Their sampling strategies and convergence rates are dramatically different. While deterministic sampling humbly aims at representing the most relevant and best known statistical information, random sampling targets complete control of all features of the ensemble. Such detailed information is rarely known and must instead be more or less blindly assigned. The inevitable consequence is that critical computational resources are spent on propagating, at best, vaguely known details. The numerical power of modern computers is better spent on refinements of the signal processing model (longer time series, higher sampling rates, larger systems etc.). Refined methods of random sampling have therefore been proposed which either simplifies the model, or improve the sampling distributions. Compared to deterministic sampling though, their convergence rates remain low.

It is easy to confuse deterministic sampling with experimental design and optimization [28]. Even though any sample could be a possible outcome of an experiment, deterministic ensembles representrather than realize(as random ensembles) statistical distributions. Instead of associating a joint distribution to the parameters of an uncertain model, it is possible to directly represent their statistics with a deterministic ensemble. That would eliminate the need of interpreting abstract distributions and result in complete reproducibility. The critical choice of ensemble would be assigned once and for all in the calibration experiment, with no further need of approximation.

The use of excitation matrices made it possible to construct universal generic ensembles. The efficiency of the minimal SPX ensemble is indeed high but so is also its third moment. While the STD maximizes the range of each parameter, the BIN minimizes it by varying all parameters in all samples. The STD is the simplest while the SPX is the most efficient ensemble. In the example, the BIN was most accurate. For non-parametric models with many parameters, reduction of samples may be required. Elimination of singular values (ESV) and correlated sampling (CRS) were two such techniques. The presented ensembles are not to be associated to random sampling as a method. They are nothing but a few examples of deterministic sampling, likely the best ensembles are yet to be discovered.

It is indeed challenging but also rewarding to find novel deterministic sampling strategies. Once the sampling rules are found, the application is just as simple as random sampling, but usually much more efficient. Deterministic sampling is one of very few methods capable of non-linear propagation of uncertainty through large signal processing models.

More

© 2013 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Cite this chapter Copy to clipboard

Jan Peter Hessling (January 16th 2013). Deterministic Sampling for Quantification of Modeling Uncertainty of Signals, Digital Filters and Signal Processing, Fausto Pedro García Márquez and Noor Zaman, IntechOpen, DOI: 10.5772/52193. Available from:

Related Content

Next chapter

By Alexey Mokeev

First chapter

Digital Filters for Maintenance Management

By Fausto Pedro García Márquez and Diego José Pedregal Tercero

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.