## 1. Introduction

The realization of a discrete-time, linear, time-invariant (LTI) filter from its impulse response provides insight into the role of linear algebra in the analysis of both dynamical systems and rational functions. For an LTI filter, a sequence of output data measured over some finite period of time may be expressed as the linear combination of the past input and the input measured over that same period. For a finite-dimensional LTI filter, the mapping from past input to future output is a finite-rank linear operator, and the effect of past input, that is, the memory of the system, may be represented as a finite-dimensional vector. This vector is the *state* of the system.

The central idea of realization theory is to first identify the mapping from past input to future output and to then factor it into two parts: a map from the input to the state and another from the state to the output. This factorization guarantees that the resulting system representation is both casual and finite-dimensional; thus it can be physically constructed, or *realized*.

*System identification* is the science of constructing dynamic models from experimentally measured data. Realization-based identification methods construct models by estimating the mapping from past input to future output based on this measured data. The non-deterministic nature of the estimation process causes this mapping to have an arbitrarily large rank, and so a rank-reduction step is required to factor the mapping into a suitable state-space model. Both these steps must be carefully considered to guarantee unbiased estimates of dynamic systems.

The foundations of realization theory are primarily due to Kalman and first appear in the landmark paper of [1], though the problem is not defined explicitly until [2], which also coins the term “realization” as being the a state-space model of a linear system constructed from an experimentally measured impulse response. It was [3] that introduced the structured-matrix approach now synonymous with the term “realization theory” by re-interpreting a theorem originally due to [4] in a state-space LTI system framework.

Although Kalman's original definition of “realization” implied an identification problem, it was not until [5] proposed rank-reduction by means of the singular-value decomposition that Ho's method became feasible for use with non-deterministic data sets. The combination of Ho's method and the singular-value decomposition was finally generalized to use with experimentally measured data by Kung in [6].

With the arrival of Kung's method came the birth of what is now known as the field of *subspace identification* methods. These methods use structured matrices of arbitrary input and output data to estimate a state-sequence from the system. The system is then identified from the propagation of the state over time. While many subspace methods exist, the most popular are the Multivariable Output-Error State Space (MOESP) family, due to [7], and the Numerical Algorithms for Subspace State-Space System Identification (N4SID) family, due to [8]. Related to subspace methods is the Eigensystem Realization Algorithm [9], which applies Kung's algorithm to impulse-response estimates, which are typically estimated through an Observer/Kalman Filter Identification (OKID) algorithm [10].

This chapter presents the central theory behind realization-based system identification in a chronological context, beginning with Kronecker's theorem, proceeding through the work of Kalman and Kung, and presenting a generalization of the procedure to arbitrary sets of data. This journey provides an interesting perspective on the original role of linear algebra in the analysis of rational functions and highlights the similarities of the different representations of LTI filters. Realization theory is a diverse field that connects many tools of linear algebra, including structured matrices, the QR-decomposition, the singular-value decomposition, and linear least-squares problems.

## 2. Transfer-function representations

We begin by reviewing some properties of discrete-time linear filters, focusing on the role of infinite series expansions in analyzing the properties of rational functions. The reconstruction of a transfer function from an infinite impulse response is equivalent to the reconstruction of a rational function from its Laurent series expansion. The reconstruction problem is introduced and solved by forming structured matrices of impulse-response coefficients.

### 2.1. Difference equations and transfer functions

Discrete-time linear filters are most frequently encountered in the form of difference equations that relate an input signal ${u}_{k}$to an output signal ${y}_{k}$. A simple example is an output ${y}_{k}$determined by a weighted sum of the inputs from ${u}_{k}$to ${u}_{k-m}$,

More commonly, the output ${y}_{k}$also contains a weighted sum of previous outputs, such as a weighted sum of samples from ${y}_{k-1}$to ${y}_{k-n}$,

The impulse response of a filter is the output sequence ${g}_{k}={y}_{k}$generated from an input

The parameters ${g}_{k}$are the impulse-response coefficients, and they completely describe the behavior of an LTI filter through the convolution

Filters of type (▭) are called finite-impulse response (FIR) filters because ${g}_{k}$is a finite-length sequence that settles to 0 once $k>m$. Filters of type (▭) are called infinite impulse response (IIR) filters since generally the impulse response will never completely settle to 0.

A system is stable if a bounded ${u}_{k}$results in a bounded ${y}_{k}$. Because the output of LTI filters is a linear combination of the input and previous output, any input-output sequence can be formed from a linear superposition of other input-output sequences. Hence proving that the system has a bounded output for a single input sequence is necessary and sufficient to prove the stability of an LTI filter. The simplest input to consider is an impulse, and so a suitable definition of system stability is that the absolute sum of the impulse response is bounded,

Though the impulse response completely describes the behavior of an LTI filter, it does so with an infinite number of parameters. For this reason, discrete-time LTI filters are often written as transfer functions of a complex variable $z$. This enables analysis of filter stability and computation of the filter's frequency response in a finite number of calculations, and it simplifies convolution operations into basic polynomial algebra.

The transfer function is found by grouping output and input terms together and taking the $Z$-transform of both signals. Let $Y\left(z\right)={\sum}_{k=-\infty}^{\infty}{y}_{k}{z}^{-k}$be the $Z$-transform of ${y}_{k}$and $U\left(z\right)$be the $Z$-transform of ${u}_{k}$. From the property

the relationship between $Y\left(z\right)$and $U\left(z\right)$may be expressed in polynomials of $z$as

The ratio of these two polynomials is the filter's transfer function

When $n\ge m$, $G\left(z\right)$is *proper*. If the transfer function is not proper, then the difference equations will have ${y}_{k}$dependent on future input samples such as ${u}_{k+1}$. Proper transfer functions are required for causality, and thus all physical systems have proper transfer function representations. When $n>m$, the system is *strictly proper*. Filters with strictly proper transfer functions have no feed-through terms; the output ${y}_{k}$does not depend on ${u}_{k}$, only the preceding input ${u}_{k-1},\phantom{\rule{3.33333pt}{0ex}}{u}_{k-2},\phantom{\rule{3.33333pt}{0ex}}\cdots $. In this chapter, we assume all systems are causal and all transfer functions proper.

If $a\left(z\right)$and $b\left(z\right)$have no common roots, then the rational function $G\left(z\right)$is *coprime*, and the order $n$of $G\left(z\right)$cannot be reduced. Fractional representations are not limited to single-input-single-output systems. For vector-valued input signals ${u}_{k}\in {\mathbb{R}}^{{n}_{u}}$and output signals ${y}_{k}\in {\mathbb{R}}^{{n}_{y}}$, an LTI filter may be represented as an ${n}_{y}\times {n}_{u}$matrix of rational functions ${G}_{ij}\left(z\right)$, and the system will have matrix-valued impulse-response coefficients. For simplicity, we will assume that transfer function representations are single-input-single-output, though all results presented here generalize to the multi-input-multi-output case.

### 2.2. Stability of transfer function representations

Because the effect of $b\left(z\right)$is equivalent to a finite-impulse response filter, the only requirement for $b\left(z\right)$to produce a stable system is that its coefficients be bounded, which we may safely assume is always the case. Thus the stability of a transfer function $G\left(z\right)$is determined entirely by $a\left(z\right)$, or more precisely, the roots of $a\left(z\right)$. To see this, suppose $a\left(z\right)$is factored into its roots, which are the poles ${p}_{i}$of $G\left(z\right)$,

To guarantee a bounded ${y}_{k}$, it is sufficient to study a single pole, which we will denote simply as $p$. Thus we wish to determine necessary and sufficient conditions for stability of the system

Note that $p$may be complex. Assume that $\left|z\right|>\left|p\right|$. ${G}^{\text{'}}\left(z\right)$then has the Laurent-series expansion

From the time-shift property of the $z$-transform, it is immediately clear that the sequence

is the impulse response of ${G}^{\text{'}}\left(z\right)$. If we require that (▭) is absolutely summable and let $\left|z\right|=1$, the result is the original stability requirement (▭), which may be written in terms of $p$as

This is true if and only if $\left|p\right|<1$, and thus ${G}^{\text{'}}\left(z\right)$is stable if and only if $\left|p\right|<1$. Finally, from (▭) we may deduce that a system is stable if and only if all the poles of $G\left(z\right)$satisfy the property $|{p}_{i}|<1$.

### 2.3. Construction of transfer functions from impulse responses

Transfer functions are a convenient way of representing complex system dynamics in a finite number of parameters, but the coefficients of $a\left(z\right)$and $b\left(z\right)$cannot be measured directly. The impulse response of a system can be found experimentally by either direct measurement or from other means such as taking the inverse Fourier transform of a measured frequency response [11]. It cannot, however, be represented in a finite number of parameters. Thus the conversion between transfer functions and impulse responses is an extremely useful tool.

For a single-pole system such as (▭), the expansion (▭) provides an obvious means of reconstructing a transfer function from a measured impulse response: given any 2 sequential impulse-response coefficients ${g}_{k}$and ${g}_{k+1}$, the pole of ${G}^{\text{'}}\left(z\right)$may be found from

Notice that this is true for any $k$, and the impulse response can be said to have a *shift-invariant* property in this respect.

Less clear is the case when an impulse response is generated by a system with higher-order $a\left(z\right)$and $b\left(z\right)$. In fact, there is no guarantee that an arbitrary impulse response is the result of a linear system of difference equations at all. For an LTI filter, however, the coefficients of the impulse response exhibit a linear dependence which may be used to not only verify the linearity of the system, but to construct a transfer function representation as well. The exact nature of this linear dependence may be found by forming a structured matrix of impulse response coefficients and examining its behavior when the indices of the coefficients are shifted forward by a single increment, similar to the single-pole case in (▭). The result is stated in the following theorem, originally due to Kronecker [4] and adopted from the English translation of [12].

**Theorem 1 (Kronecker's Theorem)** Suppose $G\left(z\right):{\u2102}^{}\to {\u2102}^{}$is an infinite series of descending powers of $z$, starting with ${z}^{-1}$,

Assume G(z) is analytic (the series converges) for all $\left|z\right|>1$. Let $H$be an infinitely large matrix of the form

Then $H$has finite rank $n$if and only if $G\left(z\right)$is a strictly proper, coprime, rational function of degree $n$with poles inside the unit circle. That is, $G\left(z\right)$has an alternative representation

in which $m<n$, all roots of $a\left(z\right)$satisfy $\left|z\right|<1$, $a\left(z\right)$and $b\left(z\right)$have no common roots, and we have assumed without loss of generality that $a\left(z\right)$is monic.

To prove Theorem ▭, we first prove that for $k>n$, ${g}_{k}$must be linearly dependent on the previous $n$terms of the series for $H$to have finite rank.

**Theorem 2** The infinitely large matrix $H$is of finite rank $n$if and only if there exists a finite sequence ${\alpha}_{1},\phantom{\rule{3.33333pt}{0ex}}{\alpha}_{2},\cdots ,{\alpha}_{n}$such that for $k\ge n$,

and $n$is the smallest number with this property.

Let ${h}_{k}$be the row of $H$beginning with ${g}_{k}$. If $H$has rank $n$, then the first $n+1$rows of $H$are linearly dependent. This implies that for some $1\le p\le n$, ${h}_{p+1}$is a linear combination of ${h}_{1},\cdots ,\phantom{\rule{3.33333pt}{0ex}}{h}_{p}$, and thus there exists some sequence ${\alpha}_{k}$such that

The structure and infinite size of $H$imply that such a relationship must hold for all following rows of $H$, so that for $q\ge 0$

Hence any row ${h}_{k}$, $k>p$, can be expressed as a linear combination of the previous $p$rows. Since $H$has at least $n$linearly independent rows, $p=n$, and since this applies element-wise, $error\left(H\right)=n$implies (▭).

Alternatively, (▭) implies a relationship of the form (▭) exists, and hence $error\left(H\right)=p$. Since $n$is the smallest possible $p$, this implies $error\left(H\right)=n$.

We now prove Theorem ▭.

Multiplying both sides by the denominator of the left,

and equating powers of $z$, we find

From this, we have, for $k\ge n$,

which not only shows that (▭) holds, but also shows that ${\alpha}_{j}=-{a}_{j}$. Hence by Theorem ▭, $H$must have finite rank.

Conversely, suppose $H$has finite rank. Then (▭) holds, and we may construct $a\left(z\right)$from ${\alpha}_{k}$and $b\left(z\right)$from (▭) to create a rational function. This function must be coprime since its order $n$is the smallest possible.

The construction in Theorem ▭ is simple to extend to the case in which $G\left(z\right)$is only proper and not strictly proper; the additional coefficient ${b}_{n}$is simply the feed-through term in the impulse response, that is, ${g}_{0}$.

A result of Theorem ▭ is that given finite-dimensional, full-rank matrices

and

the coefficients of $a\left(z\right)$may be calculated as

Notice that (▭) is in fact a special case of (▭). Thus we need only know the first $2n+1$impulse-response coefficients to reconstruct the transfer function $G\left(z\right)$: $2n$to form the matrices ${H}_{k}$and ${H}_{k+1}$from (▭) and (▭), respectively, and possibly the initial coefficient ${g}_{0}$in case of an ${n}^{t}h$-order $b\left(z\right)$.

Matrices with the structure of $H$are useful enough to have a special name. A *Hankel matrix* $H$is a matrix constructed from a sequence $\left\{{h}_{k}\right\}$so that each element ${H}_{(j,k)}={h}_{j+k}$. For the Hankel matrix in (▭), ${h}_{k}={g}_{k-1}$. ${H}_{k}$also has an interesting property implied by (▭): its row space is invariant under shifting of the index $k$. Because its symmetric, this is also true for its column space. Thus this matrix is also often referred to as being *shift-invariant*.

While (▭) provides a potential method of identifying a system from a measured impulse response, this is not a reliable method to use with measured impulse response coefficients that are corrupted by noise. The exact linear dependence of the coefficients will not be identical for all $k$, and the structure of (▭) will not be preserved. Inverting ${H}_{k}$will also invert any noise on ${g}_{k}$, potentially amplifying high-frequency noise content. Finally, the system order $n$is required to be known beforehand, which is usually not the case if only an impulse response is available. Fortunately, these difficulties may all be overcome by reinterpreting the results Kronecker's theorem in a state-space framework. First, however, we more carefully examine the role of the Hankel matrix in the behavior of LTI filters.

### 2.4. Hankel and Toeplitz operators

The Hankel matrix of impulse response coefficients (▭) is more than a tool for computing the transfer function representation of a system from its impulse response. It also defines the mapping of past input signals to future output signals. To define exactly what this means, we write the convolution of (▭) around sample $k=0$in matrix form as

where the vectors and matrix have been partitioned into sections for $k<0$and $k\ge 0$. The output for $k\ge 0$may then be split into two parts:

where the subscripts $p$and $f$denote “past” and “future,” respectively. The system Hankel matrix $H$has returned to describe the effects of the past input ${u}_{p}$on the future output ${y}_{f}$. Also present is the matrix $T$, which represents the convolution of future input ${u}_{f}$with the impulse response. Matrices such as $T$with constant diagonals are called *Toeplitz* matrices.

From (▭), it can be seen that $H$defines the effects of past input on future output. One interpretation of this is that $H$represents the “memory” of the system. Because $H$is a linear mapping from ${u}_{p}$to ${y}_{f}$, the induced matrix 2-norm of $H$, ${\left|\left|H\right|\right|}_{2}$, can be considered a function norm, and in a sense, ${\left|\left|H\right|\right|}_{2}$is a measure of the “gain” of the system. ${\left|\left|H\right|\right|}_{2}$is often called the *Hankel-norm* of a system, and it plays an important role in model reduction and in the analysis of anti-causal systems. More information on this aspect of linear systems can be found in the literature of robust control, for instance, [13].

## 3. State-space representations

Although transfer functions define system behavior completely with a finite number of parameters and simplify frequency-response calculations, they are cumbersome to manipulate when the input or output is multi-dimensional or when initial conditions must be considered. The other common representation of LTI filters is the state-space form

in which ${x}_{k}\in {\mathbb{R}}^{n}$is the system state. The matrices $A\in {\mathbb{R}}^{n\times n}$, $B\in {\mathbb{R}}^{n\times {n}_{u}}$, $C\in {\mathbb{R}}^{{n}_{y}\times n}$, and $D\in {\mathbb{R}}^{{n}_{y}\times {n}_{u}}$completely parameterize the system. Only $D$uniquely defines the input-output behavior; any nonsingular matrix ${T}^{\text{'}}$may be used to change the state basis via the relationships

The $Z$-transform may also be applied to the state-space equations (▭) to find

and thus, if (▭) is the state-space representation of the single-variable system (▭), then $a\left(z\right)$is the characteristic polynomial of $A$, $det(zI-A)$.

Besides clarifying the effect of initial conditions on the output, state-space representations are inherently causal, and (▭) will always result in a proper system (strictly proper if $D=0$). For this reason, state-space representations are often called *realizable* descriptions; while the forward-time-shift of $z$is an inherently non-causal operation, state-space systems may always be constructed in reality.

### 3.1. Stability, controllability, and observability of state-space representations

The system impulse response is simple to formulate in terms of the state-space parameters by calculation of the output to a unit impulse with ${x}_{0}=0$:

Notice the similarity of (▭) and (▭). In fact, from the eigenvalue decomposition of $A$,

we find

The term $|{\Lambda}^{k-1}|$will only converge if the largest eigenvalue of $A$is within the unit circle, and thus the condition that all eigenvalues ${\lambda}_{i}$of $A$satisfy $|{\lambda}_{i}|<1$is a necessary and sufficient condition for stability.

For state-space representations, there is the possibility that a combination of $A$and $B$will result in a system for which ${x}_{k}$cannot be entirely controlled by the input ${u}_{k}$. Expressing ${x}_{k}$in a matrix-form similar to (▭) as

demonstrates that ${x}_{k}$is in subspace ${\mathbb{R}}^{n}$if and only if $\mathcal{C}$has rank $n$. $\mathcal{C}$is the *controllability matrix* and the system is *controllable* if it has full row rank.

Similarly, the state ${x}_{k}$may not uniquely determine the output for some combinations of $A$and $C$. Expressing the evolution of the output as a function of the state in matrix-form as

demonstrates that there is no nontrivial null space in the mapping from ${x}_{k}$to ${y}_{k}$if and only if $\mathcal{O}$has rank $n$. $\mathcal{O}$is the *observability matrix* and the system is *observable* if it has full column rank.

Systems that are both controllable and observable are called *minimal*, and for minimal systems, the dimension $n$of the state variable cannot be reduced. In the next section we show that minimal state-space system representations convert to coprime transfer functions that are found through (▭).

### 3.2. Construction of state-space representations from impulse responses

The fact that the denominator of $G\left(z\right)$is the characteristic polynomial of $A$not only allows for the calculation of a transfer function from a state-space representation, but provides an alternative version of Kronecker's theorem for state-space systems, known as the Ho-Kalman Algorithm [3]. From the Caley-Hamilton theorem, if $a\left(z\right)$is the characteristic polynomial of $A$, then $a\left(A\right)=0$, and

which implies

Indeed, substitution of (▭) into (▭) and rearrangement of the indices leads to (▭). Additionally, substitution of (▭) into the product of $\mathcal{O}$and $\mathcal{C}$shows that

which confirms our previous statement that $H$effectively represents the memory of the system. Because

we see that $error\left(H\right)=n$implies the state-space system (▭) is minimal.

If the entries of $H$are shifted forward by one index to form

then once again substituting (▭) reveals

Thus the row space and column space of $H$are invariant under a forward-shift of the indices, implying the same shift-invariant structure seen in (▭).

The appearance of $A$in (▭) hints at a method for constructing a state-space realization from an impulse response. Suppose the impulse response is known exactly, and let ${H}_{r}$be a finite slice of $H$with $r$block rows and $L$columns,

Then any appropriately dimensioned factorization

may be used to find $A$for some arbitrary state basis as

where ${\overline{H}}_{r}$is ${H}_{r}$with the indices shifted forward once and ${(\xb7)}^{\u2020}$is the Moore-Penrose pseudoinverse. $C$taken from the first block row of ${\mathcal{O}}_{r}$, $B$taken from the first block column of ${\mathcal{C}}_{L}$, and $D$taken from ${g}_{0}$then provides a complete and minimal state-space realization from an impulse response. Because ${H}_{r}$has rank $n$and $det(zI-A)$has degree $n$, we know from Kronecker's theorem that $G\left(z\right)$taken from (▭) will be coprime.

However, as mentioned before, the impulse response of the system is rarely known exactly. In this case only an estimate ${\widehat{H}}_{r}$with a non-deterministic error term is available:

Because $E$is non-deterministic, $\widehat{H}$will always have full rank, regardless of the number of rows $r$. Thus $n$cannot be determined from examining the rank of $H$, and even if $n$is known beforehand, a factorization (▭) for $r>n$will not exist. Thus we must find a way of reducing the rank of ${\widehat{H}}_{r}$in order to find a state-space realization.

### 3.3. Rank-reduction of the Hankel matrix estimate

If ${\widehat{H}}_{r}$has full rank, or if $n$is unknown, its rank must be reduced prior to factorization. The obvious tool for reducing the rank of matrices is the *singular-value decomposition*

where $U$and ${V}^{T}$are orthogonal matrices and $\Sigma $is a diagonal matrix containing the nonnegative *singular values*

Because $U$and ${V}^{T}$are orthogonal, the SVD satisfies

where ${\left|\left|\xb7\right|\right|}_{2}$is the induced matrix 2-norm, and

where ${\left|\left|\xb7\right|\right|}_{F}$is the Frobenius norm. Equation (▭) also shows that the Hankel norm of a system is the maximum singular value of ${H}_{r}$. From (▭) and (▭), we can directly see that if the SVD of ${H}_{r}$is partitioned into

where ${U}_{n}$is the first $n$columns of $U$, ${\Sigma}_{n}$is the upper-left $n\times n$block of $\Sigma $, and ${V}_{n}^{T}$is the first $n$rows of ${V}^{T}$, the solution to the rank-reduction problem is [14]

Additionally, the error resulting from the rank reduction is

which suggests that if the rank of ${H}_{r}$is not known beforehand, it can be determined by examining the nonzero singular values in the deterministic case or by searching for a significant drop-off in singular values if only a noise-corrupted estimate is available.

### 3.4. Identifying the state-space realization

From a rank-reduced ${\widehat{H}}_{r}$, any factorization

can be used to estimate ${\mathcal{O}}_{r}$and ${\mathcal{C}}_{L}$. The error in the state-space realization, however, will depend on the chosen state basis. Generally we would like to have a state variable with a norm ${\left|\left|{x}_{k}\right|\right|}_{2}$in between ${\left|\left|{u}_{k}\right|\right|}_{2}$and ${\left|\left|{y}_{k}\right|\right|}_{2}$. As first proposed in [5], choosing the factorization

results in

and thus, from a functional perspective, the mappings from input to state and state to output will have equal magnitudes, and each entry of the state vector ${x}_{k}$will have similar magnitudes. State-space realizations that satisfy (▭) are sometimes called *internally balanced* realizations [11]. (Alternative definitions of a “balanced” realization exist, however, and it is generally wise to verify the definition in each context.)

Choosing the factorization (▭) also simplifies computation of the estimate $\widehat{A}$, since

By estimating $\widehat{B}$as the first block column of ${\widehat{\mathcal{C}}}_{L}$, $\widehat{C}$as the first block row of ${\widehat{\mathcal{O}}}_{L}$, and $\widehat{D}$as ${g}_{0}$, a complete state-space realization $(\widehat{A},\phantom{\rule{3.33333pt}{0ex}}\widehat{B},\phantom{\rule{3.33333pt}{0ex}}\widehat{C},\phantom{\rule{3.33333pt}{0ex}}\widehat{D})$is identified from this method.

### 3.5. Pitfalls of direct realization from an impulse response

Even though the rank-reduction process allows for realization from a noise-corrupted estimate of an impulse response, identification methods that generate a system estimate from a Hankel matrix constructed from an estimated impulse response have numerous difficulties when applied to noisy measurements. Measuring an impulse response directly is often infeasible; high-frequency damping may result in a measurement that has a very brief response before the signal-to-noise ratio becomes prohibitively small, and a unit pulse will often excite high-frequency nonlinearities that degrade the quality of the resulting estimate.

Taking the inverse Fourier transform of the frequency response guarantees that the estimates of the Markov parameters will converge as the dataset grows only so long as the input is broadband. Generally input signals decay in magnitude at higher frequencies, and calculation of the frequency response by inversion of the input will amplify high-frequency noise. We would prefer an identification method that is guaranteed to provide a system estimate that converges to the true system as the amount of data measured increases and that avoids inverting the input. Fortunately, the relationship between input and output data in (▭) may be used to formulate just such an identification procedure.

## 4. Realization from input-output data

To avoid the difficulties in constructing a system realization from an estimated impulse response, we will form a realization-based identification procedure applicable to measured input-output data. To sufficiently account for non-deterministic effects in measured data, we add a noise term ${v}_{k}\in {\mathbb{R}}^{{n}_{y}}$to the output to form the noise-perturbed state-space equations

We assume that the noise signal ${v}_{k}$is generated by a stationary stochastic process, which may be either white or colored. This includes the case in which the state is disturbed by process noise, so that the noise process may have the same poles as the deterministic system. (See [15] for a thorough discussion of representations of noise in the identification context.)

### 4.1. Data-matrix equations

The goal is to construct a state-space realization using the relationships in (▭), but doing so requires a complete characterization of the row space of ${H}_{r}$. To this end, we expand a finite-slice of the future output vector to form a block-Hankel matrix of output data with $r$block rows,

This matrix is related to a block-Hankel matrix of future input data

a block-Toeplitz matrix of past input data

a finite-dimensional block-Toeplitz matrix

the system Hankel matrix $H$, and a block-Hankel matrix $V$formed from noise data ${v}_{k}$with the same indices as $Y$by the equation

If the entries of ${Y}_{f}$are shifted forward by one index to form

then ${\overline{Y}}_{f}$is related to the shifted system Hankel matrix $\overline{H}$, the past input data ${U}_{p}$, $T$with a block column appended to the left, and ${U}_{f}$with a block row appended to the bottom,

and a block-Hankel matrix $\overline{V}$of noise data ${v}_{k}$with the same indices as $\overline{Y}$by the equation

From (▭), the state is equal to the column vectors of ${U}_{p}$multiplied by the entries of the controllability matrix $\mathcal{C}$, which we may represent as the block-matrix

which is an alternative means of representing the memory of the system at sample 0, 1, .... The two data matrix equations (▭) and (▭) may then be written as

YT UV. Equation (▭) is basis for the field of system identification methods known as subspace methods. Subspace identification methods typically fall into one of two categories. First, because a shifted observability matrix

satisfies

where $\text{im}(\xb7)$of denotes the row space (often called the “image”), the row-space of $\mathcal{O}$is shift-invariant, and $A$may be identified from estimates ${\mathcal{O}}_{r}$and ${\overline{\mathcal{O}}}_{r}$as

Alternatively, because a forward-propagated sequence of states

satisfies

the column-space of $X$is shift-invariant, and $A$may be identified from estimates $\widehat{X}$and $\widehat{\overline{X}}$as

In both instances, the system dynamics are estimated by propagating the indices forward by one step and examining a propagation of linear dynamics, not unlike (▭) from Kronecker's theorem. Details of these methods may be found in [16] and [17]. In the next section we present a system identification method that constructs system estimates from the shift-invariant structure of $Y$itself.

### 4.2. Identification from shift-invariance of output measurements

Equations (▭) and (▭) still contain the effects of the future input in ${\overline{U}}_{f}$. To remove these effects from the output, we must first add some assumptions about ${\overline{U}}_{f}$. First, we assume that ${\overline{U}}_{f}$has full row rank. This is true for any ${U}_{f}$with a smooth frequency response or if ${\overline{U}}_{f}$is generated from some pseudo-random sequence. Next, we assume that the initial conditions in $X$do not somehow cancel out the effects of future input. A sufficient condition for this is to require

to have full row rank. Although these assumptions might appear restrictive at first, since it is impossible to verify without knowledge of $X$, it is generally true with the exception of some pathological cases.

Next, we form the null-space projector matrix

which has the property

We know the inverse of $\left({\overline{U}}_{f}{\overline{U}}_{f}^{T}\right)$exists, since we assume ${\overline{U}}_{f}$has full row rank. Projector matrices such as (▭) have many interesting properties. Their eigenvalues are all 0 or 1, and if they are symmetric, they separate the subspace of real vectors — in this case, vectors in ${\mathbb{R}}^{L+1}$— into a subspace and its orthogonal complement. In fact, it is simple to verify that the null space of ${\overline{U}}_{f}$contains the null space of ${U}_{f}$as a subspace, since

Thus multiplication of (▭) and (▭) on the right by $\Pi $results in

Y

YV

we have

and $U={R}_{11}^{T}{Q}_{1}^{T}$. Substitution into (▭) reveals

Because the columns of ${Q}_{1}$and ${Q}_{2}$are orthogonal, multiplication of (▭) on the right by (▭) results in

A similar result holds for $\overline{Y}\Pi $. Taking the QR-decomposition of the data can alternatively be thought of as using the principle of superposition to construct new sequences of input-output data through a Gram-Schmidt-type orthogonalization process. A detailed discussion of this interpretation can be found in [18].

Thus we have successfully removed the effects of future input on the output while retaining the effects of the past, which is the foundation of the realization process. We still must account for non-deterministic effects in $V$and $\overline{V}$. To do so, we look for some matrix $Z$such that

V

V

then multiplication of (▭) and (▭) on the right by ${Z}^{T}$results in

Y

Y

Finally we are able to perform our rank-reduction technique directly on measured data without needing to first estimate the impulse response. From the SVD

we may estimate the order $n$by looking for a sudden decrease in singular values. From the partitioning

we may estimate ${\mathcal{O}}_{r}$and $X\Pi {Z}^{T}$from the factorization

$A$may then be estimated as

And so we have returned to our original relationship (▭).

While $C$may be estimated from the top block row of ${\widehat{\mathcal{O}}}_{r}$, our projection has lost the column space of ${H}_{r}$that we previously used to estimate $B$, and initial conditions in $X$prevent us from estimating $D$directly. Fortunately, if $A$and $C$are known, then the remaining terms $B$, $D$, and an initial condition ${x}_{0}$are linear in the input output data, and may be estimated by solving a linear-least-squares problem.

### 4.3. Estimation of B, D, and x0

The input-to-state terms $B$and $D$may be estimated by examining the convolution with the state-space form of the impulse response. Expanding (▭) with the input and including an initial condition ${x}_{0}$results in

Factoring out $B$and $D$on the right provides

in which $vec(\xb7)$is the operation that stacks the columns of a matrix on one another, $\otimes $is the (coincidentally named) Kronecker product, and we have made use of the identity

Grouping the unknown terms together results in

Thus by forming the regressor

from the estimates $\widehat{A}$and $\widehat{C}$, estimates of $B$and $D$may be found from the least-squares solution of the linear system of $N$equations

Note that $N$is arbitrary and does not need to be related in any way to the indices of the data matrix equations. This can be useful, since for large-dimensional systems, the regressor ${\phi}_{k}$may become very computationally expensive to compute.

## 5. Conclusion

Beginning with the construction of a transfer function from an impulse response, we have constructed a method for identification of state-space realizations of linear filters from measured input-output data, introducing the fundamental concepts of realization theory of linear systems along the way. Computing a state-space realization from measured input-output data requires many tools of linear algebra: projections and the QR-decomposition, rank reduction and the singular-value decomposition, and linear least squares. The principles of realization theory provide insight into the different representations of linear systems, as well as the role of rational functions and series expansions in linear algebra.