Chapter 3 Deterministic Sampling for Quantification of Modeling Uncertainty of Signals

Statistical signal processing [1] traditionally focuses on extraction of information from noisy measurements. Typically, parameters or states are estimated by various filtering operations. Here, the quality of signal processing operations will be assessed by evaluating the statistical uncertainty of the result [2]. The processing could for instance simulate, correct, modulate, evaluate, or control the response of a physical system. Depending on the addressed task and the system, this can often be formulated in terms of a differential or difference signal processing model equation in time, with uncertain parameters and driven by an exciting input signal corrupted by noise. The quantity of primary interest may not be the output signal but can be extracted from it. If this uncertain dynamic model is linear-in-response it can be translated into a linear digital filter for highly efficient and standardized evaluation [3]. A statistical model of the parameters describing to which degree the dynamic model is known and accurate will be assumed given, instead of being the target of investigation as in system identification [4]. Model uncertainty (of parameters) is then propagated to model-ing uncertainty (of the result). The two are to be clearly distinguished – the former relate to the input while the latter relate to the output of the model.


Introduction
Statistical signal processing [1] traditionally focuses on extraction of information from noisy measurements.Typically, parameters or states are estimated by various filtering operations.
Here, the quality of signal processing operations will be assessed by evaluating the statistical uncertainty of the result [2].The processing could for instance simulate, correct, modulate, evaluate, or control the response of a physical system.Depending on the addressed task and the system, this can often be formulated in terms of a differential or difference signal processing model equation in time, with uncertain parameters and driven by an exciting input signal corrupted by noise.The quantity of primary interest may not be the output signal but can be extracted from it.If this uncertain dynamic model is linear-in-response it can be translated into a linear digital filter for highly efficient and standardized evaluation [3].A statistical model of the parameters describing to which degree the dynamic model is known and accurate will be assumed given, instead of being the target of investigation as in system identification [4].Model uncertainty (of parameters) is then propagated to model-ing uncertainty (of the result).The two are to be clearly distinguished -the former relate to the input while the latter relate to the output of the model.Quantification of uncertainty of complex computations is an emerging topic, driven by the general need for quality assessment and rapid development of modern computers.Applications include e.g.various mechanical and electrical applications [5][6][7] using uncertain differential equations, and statistical signal processing.The so-called brute force Monte Carlo method [8][9] is the indisputable reference method to propagate model uncertainty.Its main disadvantage is its slow convergence, or requirement of using many samples of the model (large ensembles).Thus, it cannot be used for demanding complex models.The ensemble size is a key aspect which motivates deterministic sampling.Small ensembles are found by substituting the random generator with a customized deterministic sampling rule.Since any computerized random generator produces a pseudo-random rather than a truly random sequence, this is equivalent of modifying the random generator to be accurate for small ensembles of definite size, rather than being asymptotically exact (infinite ensembles).Correctness of very large ensembles is of theoretical but hardly practical interest for complex models, if the convergence to the asymptotic result is very slow.

Modeling uncertainty of signals 2.1. Problem definition
Suppose the (output) signal y(x, t) ∈ R of interest is generated from the (input) signal x(t) ∈ R passing through a dynamic system H , with parameters The model is given in n = u + v + 1 uncertain parameters, which can be arranged in a column vector q = (b 0 ⋯ b v a 1 ⋯ a u ) T .For systems continuous-in-time (CT), D = ∂ t is the differen- tial operator in time while for systems discrete-in-time (DT), D = Δ −1 is the negative unit displacement operator, Δ −1 x k = x k −1 .There are several approximate methods to sample CT systems to DT systems, see [3] and references therein.The discretization techniques are beyond the scope of this presentation and DT systems will be assumed.If u ≥ 1, there is feedback in the system which results in an impulse response h (q, t) of infinite duration.For finite accuracy however, the duration is finite.
Systems of the form in Eq. 1 may be directly realized as digital filters,y(q, x, t) = h (q) * x(t), where * denotes the filtering operation.The coefficients b k and a k are the numerator and denominator coefficients of the filter with impulse response h (q), respectively.Its z-transform H (q, z) is obtained with the substitution Δ → z.The parameterization can be changed to for instance gainK , poles p k and zeros z k , or poles p k and residues r k , Digital Filters and Signal Processing The parameterization should be carefully chosen as it affects the convergence rate of Taylor expansions (section 3.1) as well as the physical interpretation.The parameters and their statistics are preferably extracted from measurements using system identification techniques [4].Note that complex-valued poles and zeros are conjugated in pairs [10].
The problem to be addressed is the statistical evaluation of any function g(y(t) = h (q, t) * x(t)), given statistical models of q and x.It will here consist of evaluating its time-dependent mean g(y) and standard deviation g(y) − g(y) 2 .Without loss of generality, the analysis will be made for g(y) = y.Digital filtering will be utilized for evaluating samples of the model, i.e. filtering with definite sets of q and signals x.

Nomenclature
Statistical expectations of any signal, model or function g(q) over finite discrete E as well as continuous ensembles or probability distributions (no subscript) are defined as, ˆ, .
Samples of q are labeled q ^, with their components organized in columns.Sample indices will be given as superscripts in parenthesis, eg.q ^(k ) is a column vector denoting the k − th sample of parameter q.Variations from the mean are written as δq (E ) ≡ q − q (E ) .
Only uniform (UNI) and normal distributions (NRM) will be utilized.Either the mean and standard deviation, or the interval in brackets will be given in parenthesis, e.g.
q ~UNI ( 0.5,1 / 2 3 ) = UNI ( 0, 1 where Q denotes the sample space.While M i (2) describes the width of f i (δq i ), M i (3) is related but different to its skewness [11].Further, the shape is , similarly to the curtosis [11].Since UNI(0,1) and NRM(0,1) are normalized and symmetric f i (2) = 1 and M i (3) = 0. Their differences are first reflected in their fourth moment, M i (4) = 1 / 4 5, 1 / 2 3 ≈ (0.11, 0.29) for UNI(0,1) and NRM(0,1), respec- tively.The maximum variation of the parameter q i is expressed by the range . The discussion will be limited to correlations described by the covariance matrix cov(q) = δqδq T , where the vector multiplication is an outer product.Matrix size will be indicated with subscripts, e.g.V n×m is a matrix of n rows and m columns with elements V jk , j = 1, … n and k = 1, … m.The identity matrix will be denotedI , while matrices with equal elements (i) will have their size attached, (i n×n ) jk = i.For a matrix (vector) D, diag(D) is a vector (diagonal matrix) with components (diagonal elements) equal to the diagonal elements (components) of D. The trace of a matrix is denoted Tr.
A method will be stated intrusive if manipulations of the model are required.For the targeted highly complex models, it will be assumed that the computational cost for their evaluation dominates all other calculations.The efficiency ρ of any method will accordingly be defined by the least required number of evaluations of the original model.

Fundamentals of non-linear propagation of uncertainty
Linearity in parameters (LP) is to be distinguished from linearity in response (LR), T LR y q a x a x t a y q x t a y q x t x x LP y q q x t y q x t C x t q q q q + = + " for some vector C n×1 .Different concepts of linearity are used, y(q 1 + q 2 , x, t) ≠ y(q 1 , x, t) + y(q 2 , x, t) for LP models.Strictly speaking, LP denotes models that are affine, i.e. written as linear combinations of their parameters.Most constructed systems are designed to be as close to LR as possible while most models are not LP.There is hence no contradiction in non-linear (LP) propagation of uncertainty with linear (LR) digital filters, as here.
For non-linear propagation of uncertainty, the asymmetry of the resulting pdf is central.It can be expressed as a lack of commutation of non-linear propagation and statistical evaluation of a center value (⋅ C ), as measured with the scent [12], ( ) ( ).
C C y q y q z º - The method for evaluating the center is left unspecified, as there are several alternatives.The most common choice is to use the mean, ⋅ C = ⋅ .The lowest order approximation of the scent can then be obtained by calculating the expectation of a Taylor expansion (section 3.1), ζ = Tr cov(q) ⋅ Η(y) / 2, where Η(y) jk = ∂ 2 y / ∂ q j ∂ q k is the Hessian matrix signal of y, evaluated at q .The scent is related to the skewness γ = δ y 3 / δ y 2 3/2 .The additional asymmetry caused by the non-linearity of the model is measured with the scent but differently.The scent addresses how parametric uncertainties are propagated and not how the result is distributed, e.g.ζ = 0 for all LP models for which γ may attain any value.A finite scent thus implies the model is not LP, but not the reverse.The scent should not be confused with bias.Bias is a property of an estimator, while scent is a property of a model.For every model, such as the Digital Filters and Signal Processing REF (section 6.1), many different estimators of y C (q) can be used, e.g. the different ensembles in section 5.6, see result in Fig. 5 (left).Consequently, an unbiased estimator of y C (q) correctly accounts for rather than ignores its finite scent, or deviation from y(q C ).
The scent is important since y C and not y(q c ) is the main result utilized in applications.The corresponding difference [13] in the standard deviation M y (2) from its linearized approximation ∇ y T cov(q)∇ y, with (∇ y) jk = ∂ j y(t k ), affects the confidence in the result.Its accuracy is usually less critical.An accurate evaluation of the scent is perhaps the strongest feature of the unscented Kalman filter, which provides the foundation for the presented approach as well as the origin of the term 'scent'.

Conventional methods
A brief resume of the most traditional related methods of uncertainty propagation, applicable to signal processing models, is here given together with their pros and cons.Advanced intrusive methods like e.g.polynomial chaos expansions [14-15] not directly related to the proposed method are omitted.

Taylor expansions
The indisputable default methods of uncertainty propagation are based on Taylor expansions.These methods are intrusive if the differentiations are made analytically.Convergent series require regular differentiable models and numerical or analytical complexity make them error prone.Their applicability is therefore limited for complex models.
The transfer function H (q, z) of the digital filter can be expanded in a Taylor series, , T T H H q z H q z H q z q H q z q H q q k q q q E q z q q E q z This defines n sensitivity systems (column vector) E k (1) ( q , z ) , n(n + 1) / 2 unique quadratic variation systems (matrix) E (2) ( q , z ) , and so on.These variation systems differ (intrusive) from H (q, z) but may nevertheless be realized as digital filters [3,7,10], just as H (q, z).The corresponding variation of y(q, x, t) = h (q, t) * x(t) is given by, ( ) T T y q x t q e q t x t q q e q t x t where e (k ) ( q , t ) are the impulse responses of the systems E (k ) ( q , z ) .Utilizing digital filters with impulse responses e (k ) ( q , t ) , the differentiations are conveniently done once, and not repeatedly for every signal x(t).The linearity in parameters of the model can easily be studied for many different input signals x(t), by evaluating e (k ) ( q , t ) * x(t).Due to the large number of variation systems, higher order perturbation analyses rapidly become intractable though.
The established method is limited to linearization (LIN) [16] ( e (1) ) .It will always incorrectly yield vanishing scent, ζ = 0.A first order estimate of ζ is instead given by the expectation of second term in Eq. 7, ζ ≈ Tr cov(q) Η ( q , t ) / 2, where the matrix of Hessian signals Η ( q , t ) = e (2) ( q , t ) * x(t) is obtained with repeated digital filtering.

Brute force Monte Carlo
Monte Carlo (MC) methods [8][9], or random sampling of uncertain models was originally introduced and phrased 'statistical sampling' by Enrico Fermi already in the 1930's [17].The MC methods realize uncertain signal processing models in finite ensembles.Every ensemble consists of a possible set of well-defined model systems, all (usually) having the same structure but slightly different parameter values.In the original so-called brute force Monte Carlo method, each set of parameters is assigned to the output of random generators with appropriate statistics.The convergence to the assigned statistics is very slow [5] but it is asymptotically exact and the required number of samples is essentially independent of the number of parameters.Hence it does not suffer from the curse-of-dimensionality of many other methods.The outstanding simplicity in application is likely the cause of its popularity, just as the slow convergence or low efficiency is the main reason for its failures.
In MC, arbitrary distributions and dependencies are usually obtained by means of transformations of samples of elementary distributions.Independent samples q ^(k ) of any probability density function (pdf) ϕ(x) can be constructed with the inverse transform method [9].It consists of a calculation of the inverse of its cumulative distribution function (cdf) Φ(y) and generation of a uniformly distributed random sequence z ^(k ) , .
Covariance may be included with an appropriate transformation of samples of canonical parameters q ˜: q = U T S q ˜ with cov(q ˜) = I , ( ) ( ) The matrices S, U are found by calculating the eigenvalues ( S 2 ) and eigenvectors (U ) [11] of cov(q).This transformation makes the marginal pdfs f k (q k ) to differ substantially from the univariate pdfs ϕ k of the independent but scaled parameters Sq ˜k, Digital Filters and Signal Processing ( ) ( ) ( ) ( ) ( ) ( ) All ϕ k are hence mixed according to U .Dependencies are thus difficult to account for.One rare exception is provided by the multinomial distribution [9].It is often better to assign the pdfs to the canonical parameters in the original instead of the canonical basis.The transformation then reads q ˜: q = U T SU q ˜.As required, it leaves cov(q) invariant.The marginalization in Eq. 10 changes accordingly, U → SU T S −1 U .Since the transformation U T SU S −1 of Sq ˜k contains cancelling operations U , U T and S, S −1 , it is generally less distorting than U T .
Indeed, if the commutator S , U T ≡ SU T − U T S vanishes, U T SU S −1 = I .The transformation U T must satisfy the stronger criterion U = I to avoid mixing.For any transformation q → Wq, an indicator of mixing of the components of q is given by, ( ) A simple example illustrates that the mixing effect can be considerable, even for minute correlations.Assume a model has two parameters with a covariance matrix, ( ( ) UNI 0,1 0.90 0.10 1 1 1 0 1 cov , .0.10 0.90 1 1 0 0.8 2 UNI 0, 0.8 Large rotations are required because the canonical variances S jj 2 are similar, i.e. cov(q) is almost degenerate.As shown in Fig. 1, the large rotations mix the assigned pdfs ϕ k (Sq ˜k ) to marginal pdfs f k (q k ) beyond recognition for the transformation U T but not for U T SU S −1 .
Deterministic Sampling for Modeling Uncertainty 7 A simple example illustrates that the mixing effect can be considerable, even for minute 2 correlations.Assume a model has two parameters with a covariance matrix, 3

SUS U T
(skewed and tilted).Right: 10 , and magnified upper transition region (inset).12 Specifying both marginal probability distributions and covariance is either redundant or 13 inconsistent, as the latter is uniquely determined by the former.Nevertheless, this reflects 14 the typical available information for signal processing applications.The moments can be 15 accurately determined [4] for sufficiently large data sets but the joint distribution   q f is 16 hardly ever known with any precision.Some of its properties are usually assigned, with 17 varying degree of confidence.For instance, the allowed maximal range The sample space of independent scaled parameters (I : q k = Sq ˜k) (Eq. 12),and of the two transforma- tions ( U T ) (rotated) and ( U T SU S −1 ) (skewed and tilted).Right: Assigned pdfs ϕ k (Sq ˜k ) (dashed) and obtained margin- Deterministic Sampling for Quantification of Modeling Uncertainty of Signals http://dx.doi.org/10.5772/52193al pdfs f k (q k ) (solid) ) with mixing Ψ ( U T ) = 1.00 and Ψ ( U T SU S −1 ) = 0.058, and magnified upper transition region (inset).
Specifying both marginal probability distributions and covariance is either redundant or inconsistent, as the latter is uniquely determined by the former.Nevertheless, this reflects the typical available information for signal processing applications.The moments can be accurately determined [4] for sufficiently large data sets but the joint distribution f (q) is hardly ever known with any precision.Some of its properties are usually assigned, with varying degree of confidence.For instance, the allowed maximal range M (∞) of the parameters of digital filters is given by stability constraints.The transformation technique above is well adapted to these facts, since the covariance is prioritised.The transformation q = U T SU q ˜ will be utilized in section 5.2 to include correlations with limited mixing of the statistics assigned to independent normalized canonical parameters q ˜.

Refinements of Monte Carlo
To increase the efficiency of MC, the original brute force sampling technique has been further developed in mainly two directions: model simplification and sample distribution improvement.In response surface methodology (RSM) [18], the model is replaced by a simple approximate surrogate model.A model of order v may be found by applying linear (with respect to where R j (q) is basis function j.Since it may be non-linear, RSM allows for non-linear propa- gation of uncertainty and may give a substantially different and more accurate result than LIN.
If only linear basis functions are used R j (q) = q j , RSM becomes equivalent to LIN.The best least square approximation is directly obtained from Eq. 13 [19], ( ) Let RSM(r) utilize a complete set of mixed polynomial basis functions up to order r.Its least number (v) of collocation points grows rapidly with both the number of parameters (n) and polynomial order (r) [12], Digital Filters and Signal Processing In practice, r > 3 often yields an unacceptable number of samples, see table 1. Table 1.Efficiency ρ = v for RSM(r), for selected polynomial orders r and numbers n of parameters.
The distribution of samples may be improved with stratification, as in Latin Hypercube sampling (LHS) [18].By dividing the sample space into intervals, or stratas representing equal probability the need for large ensembles is reduced.In LHS, each parameter is sampled exactly once in each of its stratas giving a generalized latin square [20].This selection pushes the samples away from each other and distributes them more evenly.To illustrate the improvement with stratification, sample one parameter q ~NRM(0, 1).After division into m intervals of equal probability, samples are found with the inverse transform method described in section 3.2 (Eq.8).As seen in Fig. 2, the convergence improves dramatically.Still, even for m = 100 samples the second moment (left) varies noticeably.The convergence is generally poorer for higher order moments M (k ) , as shown for k = 4 (right).
Deterministic Sampling for Modeling Uncertainty 9 equal probability the need for large ensembles is reduced.In LHS, each parameter is 1 sampled exactly once in each of its stratas giving a generalized latin square [20].This 2 selection pushes the samples away from each other and distributes them more evenly.To 3 illustrate the improvement with stratification, sample one parameter . After 4 division into m intervals of equal probability, samples are found with the inverse transform 5 method described in section 3.2 (Eq.8).As seen in Fig. 2, the convergence improves 6 dramatically.Still, even for 100  m samples the second moment (left) varies noticeably.The 7 convergence is generally poorer for higher order moments  , compared to a fixed grid (dashed).11 In this case, it is questionable if 100 samples are sufficient to represent as few as four 12 moments . The probabilistically evenly distributed fixed grid (dashed) converges 13 more rapidly to the proper statistics.Despite the prevailing tradition, there is no absolute 14 requirement of using a random generator to represent statistical information.

) 21
If the problem at hand only depends on these moments, the exact solution will be obtained.22 The size of such small ensembles must be fixed, no matter how they are generated.Adding, 23 (2) (left) and fourth M (4) (right) moments for stratified (solid) and brute force sampling ( * ) of q ~NRM(0, 1), compared to a fixed grid (dashed).
In this case, it is questionable if 100 samples are sufficient to represent as few as four moments M (1) − M (4) .The probabilistically evenly distributed fixed grid (dashed) converges more rapidly to the proper statistics.Despite the prevailing tradition, there is no absolute require- If the problem at hand only depends on these moments, the exact solution will be obtained.
The size of such small ensembles must be fixed, no matter how they are generated.Adding, or perturbing a single sample would modify the statistics substantially.

Deterministic sampling
Deterministic sampling (DS) of uncertain systems is a viable alternative to random sampling (RS).Instead of using random generators, specific DS rules are devised to generate appropriate, but still statistical (Fermi's notation, see section 3.2) ensembles.A rudimentary example illustrates the principle: Assume a model y(q) depends on one parameter q with mean q and variance δq 2 .To estimate the mean y and the variance δ y 2 of the model, the samples (filter parameters) q ^(1,2) = q ± δq 2 are appropriate since they satisfy the desired statistics, q ^E = q and δq ^2 E = δq 2 .The formula for q ^(1,2) constitutes the sampling rule and q ^(1,2) is the statistical ensemble containing only two model samples.By paying the computational cost of using more samples and improving the sampling rule, additional moments δq k , k > 2 or other statistical features can be accounted for.
In deterministic sampling the model evaluations involve no approximations and are noninvasive.In many respects, deterministic sampling is constructed and optimized for quantification of modeling uncertainty: Minimal ensembles allow for evaluation of the most numerically demanding models.The model evaluations are exact and non-invasive to fully respect non-linear deeply hidden parameter dependences.Only vaguely known statistics of the model is approximated.

Concepts of deterministic sampling
DS does not per se specify the goal of sampling, e.g.given mean and covariance of the parameters.In the example at the end of section 3.3, the primary target was the joint pdf of the parameters.In section 4.2, the target is M (2) (q).In section 5, this will be complemented with Digital Filters and Signal Processing additional requirements.DS can also be utilized for direct evaluation of confidence intervals [12].The targets of various DS methods may differ but the focus on the most influential statistical aspect and customization is shared.In stark contrast, almost without exception RS targets the joint pdf of the parameters and ignores the final utilization.Adaptation and fixed ensemble sizes provides the principal means to improve the efficiency of sampling.

Propagation of covariance in the standard unscented Kalman filter
The reference will be the specific variant of DS used for propagating covariance in what will be referred to as the standard unscented Kalman filter (UKF) [21][22][23].The ensemble consists of 2n samples, or sigma-points, ( ˆ, cov , 1,2, , where Δ :k denotes the k-th column of Δ.The sampling rule is manifested in the square root calculation of the covariance matrix (Δ).As suggested [23] it may be found with a Cholesky factorization [19].The square root matrix is not unique though -the Cholesky root is upper triangular and thus asymmetric.A more symmetric standard alternative is to evaluate the matrix square root in a canonical basis [24] Uq where cov(Uq) is diagonal.The canonical variations Uδq ^(s,v) will be unit vectors in the n positive and negative directions of the princi- pal axes of the covariance matrix, amplified by the marginal standard deviations and most importantly, n.For many parameters with large covariance, the scaling with n may cause the UKF to fail since the scaling is not related to the variability of the parameters, only their total number.A possible solution to the scaling problem is provided by the scaled unscented transformation [25].However, it is based on Taylor expansions and thus suffers from an approximation problem of the model.

Sampling with conservation of moments
One class of methods of deterministic sampling conserves a limited number of statistical moments.The model parameters are sampled to satisfy these moments and collected in ensembles, similar to how parameters are sampled to fulfill probability distributions in RS.

Principle
The constraints of satisfying statistical moments constitute an infinite system of equations for the samples δq ^i (v) .It can formally be viewed as sampling (= ^) of the joint pdf f (q), ( ) q f q dq q q m q q q q f q dq q q q q m q q q q q q f q dq q q q q q q m The infinite number of equations requires an infinite number of samples.However, it is implicitly assumed that relatively few moments are known and significantly influence the result of interest.Only a few moments then needs to be accurately represented by { δq ^(v) } .
Typically, δq i and δq i1 δq i2 are estimated when models are identified [4,7].In addition, the range M (∞) or another higher diagonal moment can generally be determined from underlying physical constraints like stability.Clearly, any sampling rule must generate a fixed number of samples and create them simultaneously.The samples are consequently strongly dependent.
One obvious sampling method is to solve Eq. 18 numerically for a sufficiently large number of samples δq ^, as in Eq. 16.Due to the strong non-linearities, this is quite difficult for a large number of moments but may be feasible for a few moments.

The excitation matrix
The UKF (section 4.2) utilizes DS with conservation of all first ( δq i ) and second ( δq i1 δq i2 ) statistical moments.The invariance in its formulation allows for any additional 'half' unitary transformation Δ → ΔV , V : V V T = I .This results in another equally valid matrix Δ ˜, since Since the transformation V is allowed and influences the result, the result of applying the UKF is not unique.The matrix V condenses this invariance and provides practical means to manipulate the UKF ensemble.A key feature of V is the absence of constraints on V T V .That makes it possible to stretch V 'horizontally' (as long as V V T = I ).That corresponds to adding samples (sigma-points).The improved transformation U T SU (section 3.2) can be applied by also combining U with V .The square root of the covariance matrix will then read The samples q ^(k ) are here collected in columns of the ensemble matrix Σ.The matrix cov(q) = ΔΔ T = U T S 2 U is diagonalized,S 2 = eig(cov(q)), with the unitary transformation U [24].The normalization factor m is included in the excitation matrix V ^ to satisfy the correct covariance, ΣΣ T = ΔΔ T , just as the factor n was included in Eq. 17 (the ensemble is here Digital Filters and Signal Processing expanded from n to m samples).The excitation matrix controls the sampling beyond the first and second moments, e.g. the range of the samples.Row k of the matrix SV ^ can be interpreted as deterministic samples of the pdf ϕ k (Sq ˜k ), assigned to canonical parameters in RS, see section 3.2.All ensembles will be described with a unique excitation matrixV ^.
The adopted transformation U T SU S −1 of SV ^ distorts all higher moments than the second.
This mixing effect is indicated by the index Ψ ( U T SU S −1 ) defined in Eq. 11.To diagonalize large matrices cov(q) many efficient techniques have been developed.This should not cause any difficulties even for n ~1000, especially since cov(q) usually is either very sparse, or rank deficient for models with many parameters.

Elimination of singular values
The rationale for applying the reduction to be presented is that any model is derived from a limited set of experiments, resulting in a usually moderate rank of cov(q).If the number of parameters is large, it is thus often (nearly) rank deficient.The widely practiced singular value decomposition (SVD) [19] may then be used to reduce the excitation matrix and hence the number of samples.The most general form of SVD cannot be used here since it renders an asymmetric decomposition cov(q) = U T S 2 W , where U U T = U T U = W W T = W T W = I and S jk 2 = δ jk S kk ≥ 0. Different matrices U , W allow for decomposition of an arbitrary matrix.For the symmetric matrix cov(q), a symmetric SVD U = W can be found with the less general eigenvalue decomposition [24], according to the spectral theorem [11].As cov(q) is positive definite, all its eigenvalues S kk 2 fulfill the requirement of being positive.This is required to directly obtain a real-valued matrix square root Δ = U T SUV .
The ensemble may now be reduced by elimination of singular values (ESV).Choose a threshold α and remove row r and column r from S and row r from U for all r such that, max , 1.
Proceeding as in many applications of SVD, this reduction (indicated by tilde below) will not change the result significantly, if α is small enough.Accordingly, samples are eliminated using the alternative decomposition of the square root of cov(q), ( ) ( ) Unfortunately, the less distorting transformation U T SU S −1 of SV ^ advocated in section 3.2 do not allow for r < n rows of the matrix V .The increase in distortion of M (k >2) indicated by

Correlated sampling of non-parametric models
A major difference between parametric and non-parametric models is the dimensionality.A conceptual dissimilarity is that non-parametric models usually refer to correlated signals, rather than abstract model structures.The parameters may describe discrete samples of input noise [7], or an impulse response [6].A common parametric pole-zero model may contain 20 parameters, while a non-parametric model can be expressed in perhaps 1000 parameters.The ensembles of non-parametric models often need to be reduced drastically.
Due to limited resolution, the correlation times of any signal or impulse response is finite.Their 'memory' is thus finite so sample variations may be regenerated or repeated, as long as the time between repetitions exceeds the correlation time.This correlated sampling (CRS) provides efficient and accurate reduction of the ensembles.The minimal number of parameters n is then set by the correlation time of the model.Most importantly, the size of the ensemble becomes independent of the size of the model (the length of the signal).
A finite correlation length τ ∈ N of any model δ x(t) is normally inferred from the decay of its autocorrelation functionC(t, T ), where t denotes the lag and T refers to a non-stationary variation.Here, a global τ will be defined through its l 2 -norm and determined for a relative truncation threshold β (argmin returns the minimizing argument), ( ) ( ) ( ) If the model is expressed as a convolution δ x(t) = h (t) * w(t) of an impulse response h (t) and time-dependent white noise w(t) as in section 6.2,

t T T u h u h u t T h u h u t t w w
By padding the model to an integer multiple of γ ≥ 2τ samples, it is always possible to choose an excitation matrix partitioned to block-diagonal form, Digital Filters and Signal Processing where V ˜γ×γ is any allowed deterministic sub-ensemble.The factor c accounts for the change from γ samples of V ˜γ×γ to the m > γ samples of V ^n×m .By violating the normalization constraint V ^n×m V ^n×m T = m, the size of the ensemble can be 'compressed' from m to γ samples by moving all sub-matrices cV ˜γ×γ to the first block-column and skipping all zeros.The introduced constant c drops out as m → γ, ( ) The consequence of violating the normalization constraint is that only a limited diagonal band of cov(x) is correctly reproduced.If a non-parametric model of a signal is propagated through a system model with impulse response h of correlation length σ ≤ γ / 2 this will nevertheless not result in any error of var(h ), as it is independent of all faulty elements cov(ν) jk , | j − k | > γ / 2 ≥ σ.To correctly evaluate cov(h ) uv though, the size of the sub-ensembles V ˜γ×γ of correlated sampling must fulfill the stronger size constraint γ ≥ 2(max(τ, σ) + | u − v | ).The symmetry of convolutions implies a corresponding result when the non-parametric model describes the impulse response h of the system, rather than a signal.

Combining covariance
A signal processing model generally includes both parametric and non-parametric sources of uncertainty.For instance, a device (parametric system model) may be fed with a signal corrupted with noise (non-parametric noise model).The question then arises how the two sources q k , x k of uncertainty can be combined.For propagation of uncertainty through LP models, the combined covariance is given by the Gauss approximation formula [16], where cov ( y (k ) ) is the propagated covariance of q k , x k .This will seize to apply for non-LP models.There exists no general non-linear summation rule for propagated covariance.A method of summation can be given though, if different ensembles are combined as in RS.
To combine ensembles of parametric (q) and non-parametric models (x), collect all parameters, q → ( q T x T ) T , and diagonalize the enlarged covariance matrix, cov(q) = U T S 2 U .Build V ^ with two blocks and use CRS (section 5.4) for the non-parametric model, The scaling 1 + c ±1 may cause a similar scaling problem as the factor n in the UKF (section 4.2).Using extended excitation matrices these factors can be eliminated, A disadvantage of this summation is that the same type of ensemble must be used for all parameters.Both alternatives combine the statistics of the two models non-linearly.The uncertainties are propagated and combined by evaluating the model for all samples and calculating the desired statistics, just as if the combined ensemble described one model.

Selected ensembles
The standard (STD) ensemble employed in the UKF (as defined in section 4.2) utilizes the perhaps simplest possible excitation matrix, While the ultimate simplicity is its main advantage, the long maximal(!)range M (∞) is its main disadvantage.

Digital Filters and Signal Processing
How far the reduction of samples might be driven is illustrated by the minimal simplex (SPX) ensemble, where the operator ⊥ performs classical Gram-Schmidt orthogonalization [11] and normalization of rows.The ensemble is constructed from half the STD ensemble, complemented with one sample 1 n×1 to cancel the first moments.Since that violates the orthogonality of the rows of V , ⊥ must be applied to satisfy V V T = I .The high efficiency of the SPX ensemble is tarnished by its large skewness, or M (3) .This may give considerable bias of propagated covariance for non-LP models, but is irrelevant for LP models.
The binary (BIN) ensemble has minimal range to guarantee allowed samples.By varying all parameters with an equal magnitude of one standard deviation in all samples, the diverging factor n of the STD is eliminated.Its excitation matrix V ^BIN is fundamentally constructed from a standard binary array, with the difference that the allowed levels are ±1 instead of 0, 1 (see rows 1-3 in Eq. 32).It is then complemented with supplementary rows obtained in two ways, by cyclic shifting and mirror imaging, ( ) Cyclic shifts are applied to all original rows except the first, by a quarter of their periodicity.
Mirror imaging of a row is defined to change the sign of its second half and is applied to all original rows except the last two, and all shifted rows except the last.For instance, in Eq. 32 row 4 and 5 are shifted versions of row 2 and 3, while rows 6 and 7 are the mirror images of rows 1 and the shifted row 4. The supplementary rows reduce the size of the ensemble drastically with a corresponding improvement of the efficiency.For n = 20 parameters, the size drops from roughly 10 6 to 128 samples.That size is acceptable in perspective of the n + 1 = 21 samples of the most efficient SPX.Eventually though, the number of samples will grow too large.The BIN can thus only be applied to moderately sized models.
Deterministic Sampling for Quantification of Modeling Uncertainty of Signals http://dx.doi.org/10.5772/52193 By no means, this brief survey exhausts all possible ensembles.Many criteria for selecting the most appropriate ensemble can be formulated.Here, the first and second moments, parameter ranges and efficiency were in focus.

Application -Modeling uncertainty of a dynamic device
The task is to simulate the response of an electrical device such as an amplifier or oscilloscope, in the presence of non-stationary correlated noise on its input.An uncertain LR CT model of the device and its parametric covariance is usually found by applying system identification techniques [4] on calibration measurements [6].Such a model of the system can be sampled into a digital filter and be described in the pole-zero form in Eq. 2. These standard steps will here be omitted.The system model will instead be assigned to a digital low-pass Butterworth filter, of order 10 and cross-over frequency f C = 0.1 f N , f N being the Nyquist frequency and described by parameters K , p 1 , p 2 , … p 10 , z 1 , z 2 , z 10 .The complete correlations of complexconjugated pole ( p) and zero (z) pairs are eliminated by a transformation from q = z, p to Re(q), Im(q) ≥ 0, giving n = 21 system model parameters, To be most general, the non-parametric input noise model is chosen to be correlated/colored and non-stationary.The noise parameter δ x k represents the noise level at time sample k .Its generating signal [7] is a Dirac delta function δ jk , centered at time k.The response of a system with impulse response h will be δ y j = δ x k ⋅ (h j * δ jk ) = h j−k ⋅ δ x k .In matrix notation, δ y = h ¯T δx, Since the response is linear in noise parameters, it is sufficient to only capture cov(x).

Reference ensembles
Traditionally, any method for uncertainty propagation is evaluated by comparisons with the default method of linearization [10,16], and brute-force random sampling (MC) [9] as state-ofthe-art.There are several drawbacks of this approach.Linearization is a coarse approximation for LP models and MC suffer from the difficulty of modeling dependencies and low efficiency.An alternative is to construct finite reference ensembles (REF) and by definition let them describe the truth.Their primary advantage is that the finite size of the REF makes it possible to propagate the uncertainty exactly, using all REF samples.A more or less arbitrary REF may be generated randomly, like any MC ensemble.All requirements are also automatically Digital Filters and Signal Processing fulfilled since the REF is built of possible realizations.Also, the REF closes the loop as it makes it possible to compare 'true' and approximate samples directly on an equal footing (see Fig. 8).Even though the samples differ substantially, the resulting modeling uncertainties can be similar.
A plausible REF δq j for the system model realized as a digital filter is created by randomly generating m samples of n parameters q k from uniform distributionsUNI(0, σ k ), with σ k listed in Fig. 3, top left.The joint pdf will have compact support [11], as required to guarantee stability.The mean is subtracted from all samples to remove the bias of the finite random ensemble, δq j E = δq j = 0, ∀ j.The covariance of the REF will have a desirable more or less random variation for small values of m.If the REF samples are arranged in columns of a matrix For m = 31 > n = 21, the strong correlations will expose the methods to severe tests with significant transformations U , S. The mixing Ψ (Eq.13) using transformation U T SU S −1 was considerable, but less than for U T , see caption Fig. 3.For the chosen REF, the resulting variations of poles and zeros are displayed in Fig. 3.The obtained variation of the parameters defined in Eq. 33 can be quantified with an averaged correlation index and standard deviation (Fig. 3, bottom left), .
A REF signal δ x j for non-stationary correlated noise may conveniently be generated from an autoregressive process (AR) acting on time-dependent zero mean white noise δw, δx = g ¯T δw, where g ¯kj = g j−k is the matrix of translated impulse responses g for the AR process defined by parameters α k .Assigning a square wave time-dependence, ( The exact REF result of modelling covariance of noise is given by combining Eqs.34 and 36, Tr cov Tr diag .

T T N y h g gh
An explicit realization of the REF for the noise model is hence not needed.Specifically, a second order system α = 1 − 0.4 0.6 with time parameters {Ν, ψ, T , φ} = {0.05,0. chosen.The impulse responses of the AR noise system and the system model, and the variation of the noise model are illustrated in Fig. 3, bottom right.
The 'true' result given by the response for the REFs for the different test signals is shown in Fig. 4. The propagated noise variation σ N differs substantially from the input square wave η (top left) and is almost opposite in phase, due to the response time of about f C −1 , see delay of μ S (top, right and bottom).The signal distortion (μ S ) is strongly dependent on the input signal and decreases with increased regularity / differentiability.The propagated covariance σ S has a more complex variation (top, right and bottom), as it is larger for the more regular Gaussian (bottom, right) than for the triangular pulse (bottom, left).

Deterministic sampling
The error of the scent and the standard deviation for the STD, SPX and BIN ensembles of the system model is displayed in Fig. 5, for all test signals.The low scent of the REF (left: thin, dotted) suggests the model is close to LP.Despite the relative errors are large they are quite small on an absolute scale.The SPX has the largest errors, for the scent as well as the variance.
That is likely caused by its skewness being much larger than that of the REF.The BIN has the lowest errors and is thus the best approximate representation of the REF.parameters q (Eq.33), with mixing (Eq.11) Ψ ( U T SU S −1 ) = 0.22 (adopted) and Ψ ( U T ) = 0.39.Bottom, right: Impulse re- sponses h ( q , t) and g(t) and time-dependence η(t) (Eq.36) of noise intensity.The correlation lengths λ h , λ g were determined according to Eqs. 22-23, for β = 0.05.
Digital Filters and Signal Processing  (top left) and is almost opposite in phase, due to the response time of about covariance S  has a more complex variation (top, right and bottom), as it is larger for the 4 more regular Gaussian (bottom, right) than for the triangular pulse (bottom, left).5 S and noise   N models.The variation of noise intensity is given by   t  (top, left).10

Deterministic sampling 11
The error of the scent and the standard deviation for the STD, SPX and BIN ensembles of the 12 system model is displayed in Fig.The errors might appear large, considering all ensembles are 'correct', i.e. correctly represent (typically) available accurate information (mean and covariance of parameters).The errors reflect ambiguities caused by the ubiquitous lack of information in signal processing, rather than inadequacies of DS.RS can only produce better results by making further assumptions.
The result of applying the ESV and the CRS methods to reduce the SPX ensemble for propagating the noise is displayed in Fig. 6.By choosing sufficiently low thresholds α for elimination of singular values (ESV) and β for truncation of the correlation lengths (see Eqs. 20,22), the errors can be made arbitrarily low.As the reduction will decrease accordingly, there is a trade-off between accuracy and efficiency.For the chosen values, CRS is about twice as accurate and twice as efficient as ESV.In contrast to ESV, the number of samples for the CRS method is independent of the number of noise samples.The computational cost thus increases linearly with the length of the noise signal for CRS but quadratically (approximately) for ESV.For ESV to be most efficient, the model covariance needs to be strongly rank deficient.That is not as unlikely as it might appear, since the model usually is derived from a limited amount of experimental results.
The errors might appear large, considering all ensembles are 'correct', i.e. correctly represent 1 (typically) available accurate information (mean and covariance of parameters).The errors 2 reflect ambiguities caused by the ubiquitous lack of information in signal processing, rather 3 than inadequacies of DS.RS can only produce better results by making further assumptions.4 The result of applying the ESV and the CRS methods to reduce the SPX ensemble for 5 propagating the noise is displayed in Fig. 6.By choosing sufficiently low thresholds  for 6 elimination of singular values (ESV) and  for truncation of the correlation lengths (see 7 Eqs. 20,22), the errors can be made arbitrarily low.As the reduction will decrease 8 accordingly, there is a trade-off between accuracy and efficiency.For the chosen values, CRS 9 is about twice as accurate and twice as efficient as ESV.In contrast to ESV, the number of 10 samples for the CRS method is independent of the number of noise samples.The 11 computational cost thus increases linearly with the length of the noise signal for CRS but 12 quadratically (approximately) for ESV.For ESV to be most efficient, the model covariance 13 needs to be strongly rank deficient.That is not as unlikely as it might appear, since the 14 model usually is derived from a limited amount of experimental results.The summation of the noise and the model covariance is illustrated in Fig. 7.The 21 propagation of the covariance of the system model   q is not LP.The quadratic summation 22 rule (Eq.27), or Gauss approximation formula [16], is therefore not applicable.Nevertheless, 23 the low scent  (Fig. 5, left) suggests that both propagations are close to LP.The summation 24 error    is hence finite, but quite small.It differs qualitatively from both contributions, 25 indicating that the summation is non-trivial.26 The summation of the noise and the model covariance is illustrated in Fig. 7.The propagation of the covariance of the system model (q) is not LP.The quadratic summation rule (Eq.27), or Gauss approximation formula [16], is therefore not applicable.Nevertheless, the low scent ζ (Fig. 5, left) suggests that both propagations are close to LP.The summation error (ε) is hence finite, but quite small.It differs qualitatively from both contributions, indicating that the summation is non-trivial.
Finally, the samples of one pole of the derived ensembles are compared to the reference samples of the REF in Fig. 8.The limit ( | z | = 1) of stability is included to illustrate how close the samples are to be physically forbidden.The construction of the different ensembles is apparent, even though the transformation T = U T SU S −1 distorts the scatter plots (sections 3.2, 5.2), and tilts the principal axes (lines).The samples of the REF are almost evenly distributed.Only four samples of the STD, labelled p 1 , p 2 , p 3 , p 4 , deviate significantly from a dense central cluster, as described by the excitation matrix V ^STD (Eq. 30).It also is evident that SPX originates from half the STD.A small translation required to achieve the correct mean is discernible, while the Gram-Schmidt orthogonalization renders a minor rotation and distortion.The BIN contains comparable variations in all samples and thus has no central cluster and its samples are repelled from the principal directions (lines).The statistical differences to the REF refer to the shape of the joint pdf.Choosing the best ensemble is thus equivalent of selecting the most appropriate pdf in RS.The BIN seems to resemble the REF scatter plot the most, as verified by its low errors in Fig. 5.   , deviate significantly 10 from a dense central cluster, as described by the excitation matrix STD V (Eq.30).It also is 11 evident that SPX originates from half the STD.A small translation required to achieve the 12 correct mean is discernible, while the Gram-Schmidt orthogonalization renders a minor 13 rotation and distortion.The BIN contains comparable variations in all samples and thus has 14 no central cluster and its samples are repelled from the principal directions (lines).The 15 statistical differences to the REF refer to the shape of the joint pdf.Choosing the best 16 ensemble is thus equivalent of selecting the most appropriate pdf in RS.The BIN seems to 17 resemble the REF scatter plot the most, as verified by its low errors in Fig. 5. 18

Conclusions
Deterministic sampling remains controversial [27] while random sampling has qualified as a preferred state-of-the-art method for propagating uncertainty.Both result in finite statistical [17] ensembles, which are approximate finite representations of the primary statistical models.Their sampling strategies and convergence rates are dramatically different.While deterministic sampling humbly aims at representing the most relevant and best known statistical information, random sampling targets complete control of all features of the ensemble.Such detailed information is rarely known and must instead be more or less blindly assigned.The inevitable consequence is that critical computational resources are spent on propagating, at best, vaguely known details.The numerical power of modern computers is better spent on refinements of the signal processing model (longer time series, higher sampling rates, larger systems etc.).Refined methods of random sampling have therefore been proposed which either simplifies the model, or improve the sampling distributions.Compared to deterministic sampling though, their convergence rates remain low.
It is easy to confuse deterministic sampling with experimental design and optimization [28].Even though any sample could be a possible outcome of an experiment, deterministic ensembles represent rather than realize (as random ensembles) statistical distributions.Instead of associating a joint distribution to the parameters of an uncertain model, it is possible to directly represent their statistics with a deterministic ensemble.That would eliminate the need of interpreting abstract distributions and result in complete reproducibility.The critical choice of ensemble would be assigned once and for all in the calibration experiment, with no further need of approximation.
The use of excitation matrices made it possible to construct universal generic ensembles.The efficiency of the minimal SPX ensemble is indeed high but so is also its third moment.While the STD maximizes the range of each parameter, the BIN minimizes it by varying all parameters in all samples.The STD is the simplest while the SPX is the most efficient ensemble.In the example, the BIN was most accurate.For non-parametric models with many parameters, reduction of samples may be required.Elimination of singular values (ESV) and correlated sampling (CRS) were two such techniques.The presented ensembles are not to be associated to random sampling as a method.They are nothing but a few examples of deterministic sampling, likely the best ensembles are yet to be discovered.
It is indeed challenging but also rewarding to find novel deterministic sampling strategies.Once the sampling rules are found, the application is just as simple as random sampling, but usually much more efficient.Deterministic sampling is one of very few methods capable of non-linear propagation of uncertainty through large signal processing models.

Figure 1 .
Figure 1.Left: The sample space of independent scaled parameters   k k q S q I :  (Eq.12), and 9 of the two transformations   T U (rotated) and   1 

Figure 2 . 2 M
Figure 2. The second   2 M (left) and fourth  

4 M
(right) moments for stratified (solid) and 10 brute force sampling  

Figure 3 .
Figure 3. Top: Assigned variations (left) and resulting (z k middle, p k right) samples of the REF of the system model.Label P indicates the pole explored in Fig. 8. Bottom, left: Obtained variations σ k (dots) and correlations ξ k (bars) of


(top, right and bottom).The signal distortion   S  is strongly dependent on the 2 input signal and decreases with increased regularity / differentiability.The propagated 3

6 7Figure 4 .
Figure 4.The mean S  (dashed), the standard deviations N S ,  (solid) and the scent  (thin, 8 solid) for the REFs, for the different test signals (thin, dashed).The subscripts refer to the 9 system   S and noise   N models.The variation of noise intensity is given by  

Figure 4 . 1 2 3 Figure 5 . 7 Figure 5 .
Figure 4.The mean μ S (dashed), the standard deviations σ S ,N (solid) and the scent ζ (thin, solid) for the REFs, for the different test signals (thin, dashed).The subscripts refer to the system (S ) and noise (N ) models.The variation of noise intensity is given by η(t) (top, left).

Figure 8 . 1 2Figure 7 .
Figure8.The different samples (dots) of the pole marked 'P' in Fig.3, of the reference (REF), standard (STD), simplex (SPX) and binary (BIN) ensembles.The limit | z | = 1 of stability (solid, thick) and lines connecting the primary variations p 1 , p 2 , p 3 , p 4 of the STD as well as lines (dashed) to combined excitations of the BIN are included for reference.
Deterministic Sampling for Quantification of Modeling Uncertainty of Signals http://dx.doi.org/10.5772/52193ment of using a random generator to represent statistical information.Fixed grids are examples of deterministic sampling.Stratification provides an interesting intermediate type of sampling since it is partially deterministic -the strata are constructed deterministically but the samples within each stratum are generated randomly.The construction of a fixed grid requires focus Deterministic Sampling for Quantification of Modeling Uncertainty of Signals http://dx.doi.org/10.5772/52193samples of signals like impulse responses, or noise signals.The required LR property of the system then implies LP.The propagation of covariance is then linear and only the undistorted first and second moments need to be encoded.
less important for the intended use though.Signal processing models with large numbers of parameters are typically non-parametric and usually describe 3, 2 f C −1 , π / 8} was Deterministic Sampling for Quantification of Modeling Uncertainty of Signals http://dx.doi.org/10.5772/52193 5, for all test signals.The low scent of the REF (left: thin, 13 dotted) suggests the model is close to LP.Despite the relative errors are large they are quite 14 small on an absolute scale.The SPX has the largest errors, for the scent as well as the 15 variance.That is likely caused by its skewness being much larger than that of the REF.The 16 BIN has the lowest errors and is thus the best approximate representation of the REF.17