The realization of a discrete-time, linear, time-invariant (LTI) filter from its impulse response provides insight into the role of linear algebra in the analysis of both dynamical systems and rational functions. For an LTI filter, a sequence of output data measured over some finite period of time may be expressed as the linear combination of the past input and the input measured over that same period. For a finite-dimensional LTI filter, the mapping from past input to future output is a finite-rank linear operator, and the effect of past input, that is, the memory of the system, may be represented as a finite-dimensional vector. This vector is the state of the system.
The central idea of realization theory is to first identify the mapping from past input to future output and to then factor it into two parts: a map from the input to the state and another from the state to the output. This factorization guarantees that the resulting system representation is both casual and finite-dimensional; thus it can be physically constructed, or realized.
System identification is the science of constructing dynamic models from experimentally measured data. Realization-based identification methods construct models by estimating the mapping from past input to future output based on this measured data. The non-deterministic nature of the estimation process causes this mapping to have an arbitrarily large rank, and so a rank-reduction step is required to factor the mapping into a suitable state-space model. Both these steps must be carefully considered to guarantee unbiased estimates of dynamic systems.
The foundations of realization theory are primarily due to Kalman and first appear in the landmark paper of , though the problem is not defined explicitly until , which also coins the term “realization” as being the a state-space model of a linear system constructed from an experimentally measured impulse response. It was  that introduced the structured-matrix approach now synonymous with the term “realization theory” by re-interpreting a theorem originally due to  in a state-space LTI system framework.
Although Kalman's original definition of “realization” implied an identification problem, it was not until  proposed rank-reduction by means of the singular-value decomposition that Ho's method became feasible for use with non-deterministic data sets. The combination of Ho's method and the singular-value decomposition was finally generalized to use with experimentally measured data by Kung in .
With the arrival of Kung's method came the birth of what is now known as the field of subspace identification methods. These methods use structured matrices of arbitrary input and output data to estimate a state-sequence from the system. The system is then identified from the propagation of the state over time. While many subspace methods exist, the most popular are the Multivariable Output-Error State Space (MOESP) family, due to , and the Numerical Algorithms for Subspace State-Space System Identification (N4SID) family, due to . Related to subspace methods is the Eigensystem Realization Algorithm , which applies Kung's algorithm to impulse-response estimates, which are typically estimated through an Observer/Kalman Filter Identification (OKID) algorithm .
This chapter presents the central theory behind realization-based system identification in a chronological context, beginning with Kronecker's theorem, proceeding through the work of Kalman and Kung, and presenting a generalization of the procedure to arbitrary sets of data. This journey provides an interesting perspective on the original role of linear algebra in the analysis of rational functions and highlights the similarities of the different representations of LTI filters. Realization theory is a diverse field that connects many tools of linear algebra, including structured matrices, the QR-decomposition, the singular-value decomposition, and linear least-squares problems.
2. Transfer-function representations
We begin by reviewing some properties of discrete-time linear filters, focusing on the role of infinite series expansions in analyzing the properties of rational functions. The reconstruction of a transfer function from an infinite impulse response is equivalent to the reconstruction of a rational function from its Laurent series expansion. The reconstruction problem is introduced and solved by forming structured matrices of impulse-response coefficients.
2.1. Difference equations and transfer functions
Discrete-time linear filters are most frequently encountered in the form of difference equations that relate an input signal to an output signal . A simple example is an output determined by a weighted sum of the inputs from to ,
More commonly, the output also contains a weighted sum of previous outputs, such as a weighted sum of samples from to ,
The impulse response of a filter is the output sequence generated from an input
The parameters are the impulse-response coefficients, and they completely describe the behavior of an LTI filter through the convolution
Filters of type (▭) are called finite-impulse response (FIR) filters because is a finite-length sequence that settles to 0 once . Filters of type (▭) are called infinite impulse response (IIR) filters since generally the impulse response will never completely settle to 0.
A system is stable if a bounded results in a bounded . Because the output of LTI filters is a linear combination of the input and previous output, any input-output sequence can be formed from a linear superposition of other input-output sequences. Hence proving that the system has a bounded output for a single input sequence is necessary and sufficient to prove the stability of an LTI filter. The simplest input to consider is an impulse, and so a suitable definition of system stability is that the absolute sum of the impulse response is bounded,
Though the impulse response completely describes the behavior of an LTI filter, it does so with an infinite number of parameters. For this reason, discrete-time LTI filters are often written as transfer functions of a complex variable . This enables analysis of filter stability and computation of the filter's frequency response in a finite number of calculations, and it simplifies convolution operations into basic polynomial algebra.
The transfer function is found by grouping output and input terms together and taking the -transform of both signals. Let be the -transform of and be the -transform of . From the property
the relationship between and may be expressed in polynomials of as
The ratio of these two polynomials is the filter's transfer function
When , is proper. If the transfer function is not proper, then the difference equations will have dependent on future input samples such as . Proper transfer functions are required for causality, and thus all physical systems have proper transfer function representations. When , the system is strictly proper. Filters with strictly proper transfer functions have no feed-through terms; the output does not depend on , only the preceding input . In this chapter, we assume all systems are causal and all transfer functions proper.
If and have no common roots, then the rational function is coprime, and the order of cannot be reduced. Fractional representations are not limited to single-input-single-output systems. For vector-valued input signals and output signals , an LTI filter may be represented as an matrix of rational functions , and the system will have matrix-valued impulse-response coefficients. For simplicity, we will assume that transfer function representations are single-input-single-output, though all results presented here generalize to the multi-input-multi-output case.
2.2. Stability of transfer function representations
Because the effect of is equivalent to a finite-impulse response filter, the only requirement for to produce a stable system is that its coefficients be bounded, which we may safely assume is always the case. Thus the stability of a transfer function is determined entirely by , or more precisely, the roots of . To see this, suppose is factored into its roots, which are the poles of ,
To guarantee a bounded , it is sufficient to study a single pole, which we will denote simply as . Thus we wish to determine necessary and sufficient conditions for stability of the system
Note that may be complex. Assume that . then has the Laurent-series expansion
From the time-shift property of the -transform, it is immediately clear that the sequence
This is true if and only if , and thus is stable if and only if . Finally, from (▭) we may deduce that a system is stable if and only if all the poles of satisfy the property .
2.3. Construction of transfer functions from impulse responses
Transfer functions are a convenient way of representing complex system dynamics in a finite number of parameters, but the coefficients of and cannot be measured directly. The impulse response of a system can be found experimentally by either direct measurement or from other means such as taking the inverse Fourier transform of a measured frequency response . It cannot, however, be represented in a finite number of parameters. Thus the conversion between transfer functions and impulse responses is an extremely useful tool.
For a single-pole system such as (▭), the expansion (▭) provides an obvious means of reconstructing a transfer function from a measured impulse response: given any 2 sequential impulse-response coefficients and , the pole of may be found from
Notice that this is true for any , and the impulse response can be said to have a shift-invariant property in this respect.
Less clear is the case when an impulse response is generated by a system with higher-order and . In fact, there is no guarantee that an arbitrary impulse response is the result of a linear system of difference equations at all. For an LTI filter, however, the coefficients of the impulse response exhibit a linear dependence which may be used to not only verify the linearity of the system, but to construct a transfer function representation as well. The exact nature of this linear dependence may be found by forming a structured matrix of impulse response coefficients and examining its behavior when the indices of the coefficients are shifted forward by a single increment, similar to the single-pole case in (▭). The result is stated in the following theorem, originally due to Kronecker  and adopted from the English translation of .
Theorem 1 (Kronecker's Theorem) Suppose is an infinite series of descending powers of , starting with ,
Assume G(z) is analytic (the series converges) for all . Let be an infinitely large matrix of the form
Then has finite rank if and only if is a strictly proper, coprime, rational function of degree with poles inside the unit circle. That is, has an alternative representation
in which , all roots of satisfy , and have no common roots, and we have assumed without loss of generality that is monic.
To prove Theorem ▭, we first prove that for , must be linearly dependent on the previous terms of the series for to have finite rank.
Theorem 2 The infinitely large matrix is of finite rank if and only if there exists a finite sequence such that for ,
and is the smallest number with this property.
Let be the row of beginning with . If has rank , then the first rows of are linearly dependent. This implies that for some , is a linear combination of , and thus there exists some sequence such that
The structure and infinite size of imply that such a relationship must hold for all following rows of , so that for
Hence any row , , can be expressed as a linear combination of the previous rows. Since has at least linearly independent rows, , and since this applies element-wise, implies (▭).
We now prove Theorem ▭.
Multiplying both sides by the denominator of the left,
and equating powers of , we find
From this, we have, for ,
The construction in Theorem ▭ is simple to extend to the case in which is only proper and not strictly proper; the additional coefficient is simply the feed-through term in the impulse response, that is, .
A result of Theorem ▭ is that given finite-dimensional, full-rank matrices
the coefficients of may be calculated as
Notice that (▭) is in fact a special case of (▭). Thus we need only know the first impulse-response coefficients to reconstruct the transfer function : to form the matrices and from (▭) and (▭), respectively, and possibly the initial coefficient in case of an -order .
Matrices with the structure of are useful enough to have a special name. A Hankel matrix is a matrix constructed from a sequence so that each element . For the Hankel matrix in (▭), . also has an interesting property implied by (▭): its row space is invariant under shifting of the index . Because its symmetric, this is also true for its column space. Thus this matrix is also often referred to as being shift-invariant.
While (▭) provides a potential method of identifying a system from a measured impulse response, this is not a reliable method to use with measured impulse response coefficients that are corrupted by noise. The exact linear dependence of the coefficients will not be identical for all , and the structure of (▭) will not be preserved. Inverting will also invert any noise on , potentially amplifying high-frequency noise content. Finally, the system order is required to be known beforehand, which is usually not the case if only an impulse response is available. Fortunately, these difficulties may all be overcome by reinterpreting the results Kronecker's theorem in a state-space framework. First, however, we more carefully examine the role of the Hankel matrix in the behavior of LTI filters.
2.4. Hankel and Toeplitz operators
The Hankel matrix of impulse response coefficients (▭) is more than a tool for computing the transfer function representation of a system from its impulse response. It also defines the mapping of past input signals to future output signals. To define exactly what this means, we write the convolution of (▭) around sample in matrix form as
where the vectors and matrix have been partitioned into sections for and . The output for may then be split into two parts:
where the subscripts and denote “past” and “future,” respectively. The system Hankel matrix has returned to describe the effects of the past input on the future output . Also present is the matrix , which represents the convolution of future input with the impulse response. Matrices such as with constant diagonals are called Toeplitz matrices.
From (▭), it can be seen that defines the effects of past input on future output. One interpretation of this is that represents the “memory” of the system. Because is a linear mapping from to , the induced matrix 2-norm of , , can be considered a function norm, and in a sense, is a measure of the “gain” of the system. is often called the Hankel-norm of a system, and it plays an important role in model reduction and in the analysis of anti-causal systems. More information on this aspect of linear systems can be found in the literature of robust control, for instance, .
3. State-space representations
Although transfer functions define system behavior completely with a finite number of parameters and simplify frequency-response calculations, they are cumbersome to manipulate when the input or output is multi-dimensional or when initial conditions must be considered. The other common representation of LTI filters is the state-space form
in which is the system state. The matrices , , , and completely parameterize the system. Only uniquely defines the input-output behavior; any nonsingular matrix may be used to change the state basis via the relationships
The -transform may also be applied to the state-space equations (▭) to find
Besides clarifying the effect of initial conditions on the output, state-space representations are inherently causal, and (▭) will always result in a proper system (strictly proper if ). For this reason, state-space representations are often called realizable descriptions; while the forward-time-shift of is an inherently non-causal operation, state-space systems may always be constructed in reality.
3.1. Stability, controllability, and observability of state-space representations
The system impulse response is simple to formulate in terms of the state-space parameters by calculation of the output to a unit impulse with :
The term will only converge if the largest eigenvalue of is within the unit circle, and thus the condition that all eigenvalues of satisfy is a necessary and sufficient condition for stability.
For state-space representations, there is the possibility that a combination of and will result in a system for which cannot be entirely controlled by the input . Expressing in a matrix-form similar to (▭) as
demonstrates that is in subspace if and only if has rank . is the controllability matrix and the system is controllable if it has full row rank.
Similarly, the state may not uniquely determine the output for some combinations of and . Expressing the evolution of the output as a function of the state in matrix-form as
demonstrates that there is no nontrivial null space in the mapping from to if and only if has rank . is the observability matrix and the system is observable if it has full column rank.
Systems that are both controllable and observable are called minimal, and for minimal systems, the dimension of the state variable cannot be reduced. In the next section we show that minimal state-space system representations convert to coprime transfer functions that are found through (▭).
3.2. Construction of state-space representations from impulse responses
The fact that the denominator of is the characteristic polynomial of not only allows for the calculation of a transfer function from a state-space representation, but provides an alternative version of Kronecker's theorem for state-space systems, known as the Ho-Kalman Algorithm . From the Caley-Hamilton theorem, if is the characteristic polynomial of , then , and
which confirms our previous statement that effectively represents the memory of the system. Because
we see that implies the state-space system (▭) is minimal.
If the entries of are shifted forward by one index to form
then once again substituting (▭) reveals
Thus the row space and column space of are invariant under a forward-shift of the indices, implying the same shift-invariant structure seen in (▭).
The appearance of in (▭) hints at a method for constructing a state-space realization from an impulse response. Suppose the impulse response is known exactly, and let be a finite slice of with block rows and columns,
Then any appropriately dimensioned factorization
may be used to find for some arbitrary state basis as
where is with the indices shifted forward once and is the Moore-Penrose pseudoinverse. taken from the first block row of , taken from the first block column of , and taken from then provides a complete and minimal state-space realization from an impulse response. Because has rank and has degree , we know from Kronecker's theorem that taken from (▭) will be coprime.
However, as mentioned before, the impulse response of the system is rarely known exactly. In this case only an estimate with a non-deterministic error term is available:
Because is non-deterministic, will always have full rank, regardless of the number of rows . Thus cannot be determined from examining the rank of , and even if is known beforehand, a factorization (▭) for will not exist. Thus we must find a way of reducing the rank of in order to find a state-space realization.
3.3. Rank-reduction of the Hankel matrix estimate
If has full rank, or if is unknown, its rank must be reduced prior to factorization. The obvious tool for reducing the rank of matrices is the singular-value decomposition
where and are orthogonal matrices and is a diagonal matrix containing the nonnegative singular values
Because and are orthogonal, the SVD satisfies
where is the induced matrix 2-norm, and
where is the first columns of , is the upper-left block of , and is the first rows of , the solution to the rank-reduction problem is 
Additionally, the error resulting from the rank reduction is
which suggests that if the rank of is not known beforehand, it can be determined by examining the nonzero singular values in the deterministic case or by searching for a significant drop-off in singular values if only a noise-corrupted estimate is available.
3.4. Identifying the state-space realization
From a rank-reduced , any factorization
can be used to estimate and . The error in the state-space realization, however, will depend on the chosen state basis. Generally we would like to have a state variable with a norm in between and . As first proposed in , choosing the factorization
and thus, from a functional perspective, the mappings from input to state and state to output will have equal magnitudes, and each entry of the state vector will have similar magnitudes. State-space realizations that satisfy (▭) are sometimes called internally balanced realizations . (Alternative definitions of a “balanced” realization exist, however, and it is generally wise to verify the definition in each context.)
Choosing the factorization (▭) also simplifies computation of the estimate , since
By estimating as the first block column of , as the first block row of , and as , a complete state-space realization is identified from this method.
3.5. Pitfalls of direct realization from an impulse response
Even though the rank-reduction process allows for realization from a noise-corrupted estimate of an impulse response, identification methods that generate a system estimate from a Hankel matrix constructed from an estimated impulse response have numerous difficulties when applied to noisy measurements. Measuring an impulse response directly is often infeasible; high-frequency damping may result in a measurement that has a very brief response before the signal-to-noise ratio becomes prohibitively small, and a unit pulse will often excite high-frequency nonlinearities that degrade the quality of the resulting estimate.
Taking the inverse Fourier transform of the frequency response guarantees that the estimates of the Markov parameters will converge as the dataset grows only so long as the input is broadband. Generally input signals decay in magnitude at higher frequencies, and calculation of the frequency response by inversion of the input will amplify high-frequency noise. We would prefer an identification method that is guaranteed to provide a system estimate that converges to the true system as the amount of data measured increases and that avoids inverting the input. Fortunately, the relationship between input and output data in (▭) may be used to formulate just such an identification procedure.
4. Realization from input-output data
To avoid the difficulties in constructing a system realization from an estimated impulse response, we will form a realization-based identification procedure applicable to measured input-output data. To sufficiently account for non-deterministic effects in measured data, we add a noise term to the output to form the noise-perturbed state-space equations
We assume that the noise signal is generated by a stationary stochastic process, which may be either white or colored. This includes the case in which the state is disturbed by process noise, so that the noise process may have the same poles as the deterministic system. (See  for a thorough discussion of representations of noise in the identification context.)
4.1. Data-matrix equations
The goal is to construct a state-space realization using the relationships in (▭), but doing so requires a complete characterization of the row space of . To this end, we expand a finite-slice of the future output vector to form a block-Hankel matrix of output data with block rows,
This matrix is related to a block-Hankel matrix of future input data
a block-Toeplitz matrix of past input data
a finite-dimensional block-Toeplitz matrix
the system Hankel matrix , and a block-Hankel matrix formed from noise data with the same indices as by the equation
If the entries of are shifted forward by one index to form
then is related to the shifted system Hankel matrix , the past input data , with a block column appended to the left, and with a block row appended to the bottom,
and a block-Hankel matrix of noise data with the same indices as by the equation
From (▭), the state is equal to the column vectors of multiplied by the entries of the controllability matrix , which we may represent as the block-matrix
Y T U V. Equation (▭) is basis for the field of system identification methods known as subspace methods. Subspace identification methods typically fall into one of two categories. First, because a shifted observability matrix
where of denotes the row space (often called the “image”), the row-space of is shift-invariant, and may be identified from estimates and as
Alternatively, because a forward-propagated sequence of states
the column-space of is shift-invariant, and may be identified from estimates and as
In both instances, the system dynamics are estimated by propagating the indices forward by one step and examining a propagation of linear dynamics, not unlike (▭) from Kronecker's theorem. Details of these methods may be found in  and . In the next section we present a system identification method that constructs system estimates from the shift-invariant structure of itself.
4.2. Identification from shift-invariance of output measurements
Equations (▭) and (▭) still contain the effects of the future input in . To remove these effects from the output, we must first add some assumptions about . First, we assume that has full row rank. This is true for any with a smooth frequency response or if is generated from some pseudo-random sequence. Next, we assume that the initial conditions in do not somehow cancel out the effects of future input. A sufficient condition for this is to require
to have full row rank. Although these assumptions might appear restrictive at first, since it is impossible to verify without knowledge of , it is generally true with the exception of some pathological cases.
Next, we form the null-space projector matrix
which has the property
We know the inverse of exists, since we assume has full row rank. Projector matrices such as (▭) have many interesting properties. Their eigenvalues are all 0 or 1, and if they are symmetric, they separate the subspace of real vectors — in this case, vectors in — into a subspace and its orthogonal complement. In fact, it is simple to verify that the null space of contains the null space of as a subspace, since
and . Substitution into (▭) reveals
A similar result holds for . Taking the QR-decomposition of the data can alternatively be thought of as using the principle of superposition to construct new sequences of input-output data through a Gram-Schmidt-type orthogonalization process. A detailed discussion of this interpretation can be found in .
Thus we have successfully removed the effects of future input on the output while retaining the effects of the past, which is the foundation of the realization process. We still must account for non-deterministic effects in and . To do so, we look for some matrix such that
Finally we are able to perform our rank-reduction technique directly on measured data without needing to first estimate the impulse response. From the SVD
we may estimate the order by looking for a sudden decrease in singular values. From the partitioning
we may estimate and from the factorization
may then be estimated as
And so we have returned to our original relationship (▭).
While may be estimated from the top block row of , our projection has lost the column space of that we previously used to estimate , and initial conditions in prevent us from estimating directly. Fortunately, if and are known, then the remaining terms , , and an initial condition are linear in the input output data, and may be estimated by solving a linear-least-squares problem.
4.3. Estimation of B, D, and x0
The input-to-state terms and may be estimated by examining the convolution with the state-space form of the impulse response. Expanding (▭) with the input and including an initial condition results in
Factoring out and on the right provides
in which is the operation that stacks the columns of a matrix on one another, is the (coincidentally named) Kronecker product, and we have made use of the identity
Grouping the unknown terms together results in
Thus by forming the regressor
from the estimates and , estimates of and may be found from the least-squares solution of the linear system of equations
Note that is arbitrary and does not need to be related in any way to the indices of the data matrix equations. This can be useful, since for large-dimensional systems, the regressor may become very computationally expensive to compute.
Beginning with the construction of a transfer function from an impulse response, we have constructed a method for identification of state-space realizations of linear filters from measured input-output data, introducing the fundamental concepts of realization theory of linear systems along the way. Computing a state-space realization from measured input-output data requires many tools of linear algebra: projections and the QR-decomposition, rank reduction and the singular-value decomposition, and linear least squares. The principles of realization theory provide insight into the different representations of linear systems, as well as the role of rational functions and series expansions in linear algebra.