Sense Smart, Not Hard: A Layered Cognitive Radar Architecture

In this chapter, we present a cognitive radar architecture based on the three-layer model by Rasmussen. The skill-based layer is characterized by adaptive signal-processing approaches and target-matched waveforms. The rule-based layer comprises reactive execution of optimal illumination policies and resource management. The knowledge-based layer allows for long-term, goal-oriented mission and trajectory planning. Each layer is illustrated by example algorithms and applications for implementation.


Introduction
Modern multifunctional radars with electronic beam-steering (AESA) provide many degrees of freedom in pointing the antenna beam, using the electromagnetic spectrum and selecting waveforms (Figure 1). Complex surveillance and reconnaissance scenarios require increased automation and suitable man-machine interfaces, which are enabled by the cognitive radar approach [1][2][3].
In this article, we explain a cognitive radar architecture developed at the Fraunhofer FHR based on the three-layer model of Rasmussen [4]. In the following, we first introduce the concept of cognitive automation and derive our cognitive radar architecture. For each cognitive subfunction, several technologies for realization are discussed and illustrated by example applications.

Cognitive automation for radar
The concept of Dual-Mode Cognitive Automation [5] is well suited to deal with the challenges of highly automated radar systems. As shown in Figure 2, intelligent software agents (depicted as robot heads) can be introduced into the work equipment to increase the level of automation under the supervisory control paradigm [6].
Alternatively, the software agent can cooperate with the human operator in the sense of an intelligent assistant system [7]. Even though the cognitive radar architecture can be used for both approaches, we will focus on the more traditional supervisory control role in the following.
Topics in Radar Signal Processing

Three layer model of a cognitive radar architecture
The three-layer model of human cognitive performance published by Jens Rasmussen in 1983 is widely used in human factors [8], cognitive psychology and robotics [9,10]. As shown in Figure 3, the complex process of human cognition is simplified and broken down into cognitive subfunctions (shown as gray boxes) with the indicated flow of information. The Rasmussen model distinguishes three layers of cognitive performance with increasing level of abstraction.
The skill-based layer comprises subconscious and very efficient perception and control tasks (such as steering along a curvy road). Above it, the rule-based layer describes reactive behavior. Learned procedures are triggered by certain cues in familiar situations (such as stopping the car at a red traffic light). The knowledge-based layer enables deliberate, goal-based behavior. By inferring novel solutions from a priori knowledge, a flexible reaction in unknown situations is achieved (e.g. bypassing a traffic jam based on a road map).
For the development of a cognitive radar architecture in analogy to the Rasmussen model, the cognitive subfunctions had to be mapped onto five different radar technologies as shown in Figure 3.
Modern radar systems can generate arbitrary waveforms in real time. This allows transmit signals to be matched to the target transfer function or the electromagnetic spectrum, as explained in Sections 4.1 and 4.2. Perception tasks of a radar comprise signal-processing and classification aspects. We use a machine learning approach that is illustrated in Section 4.3. Rule-based behavior in a radar is emulated by using optimal control policies or resource management approaches, as shown in Section 5.1 and Section 5.2. Knowledge-based behavior can be implemented using Bayesian networks or automated planning algorithms. We show an example for robot-trajectory planning in Section 6.1.

Skill-based-layer
The skill-based layer represents the basic signal-generation and processing capabilities of the radar system. It operates on the smallest timescale in the architecture in a continuous processing loop. Below, we give an example for adapting the transmit waveform to the target transfer function using arbitrary waveform generation capabilities. As an extension, the waveform can further be adapted to the electromagnetic spectrum, which has to be continuously sensed.

Matched illumination
If a priori information about a target is available, it is possible to optimize the transmission waveform for this target. Advantages arise, for example, in discriminating two classes of targets or in reducing the resources required by the sensor. One example is the reduction of the required bandwidth if the available a priori information about the target is comprehensive.
In order to resolve the size of the object, two transmission frequencies are sufficient to estimate the extension of two scattering points with a spacing of Δz [11] (see Figure 4). The maximal energy at the receiver can be achieved when two frequencies are superposed to a beat frequency where the envelope covers the dimension of the target. If there are more than two scattering points, the frequency spacing must be higher to achieve a higher period of the beat.
In practice, the assumption of a known target impulse response is often difficult to realize. In a cognitive radar system, the a priori knowledge of the target can be obtained from previous measurements and is assumed to be predicted for the next time step. An adapted waveform can then be used to update the target track with respect to its extension while allocating less bandwidth.
Assuming a linear, time-invariant channel with additive white Gaussian noise $w$, the complex received signal $y$ corresponds to a convolution of the transmission signal $s$ and the target transfer function,

$$y = s * h_i + w = y_{s,i} + w, \tag{1}$$

where $y_{s,i}$ represents the undisturbed signal component. The linear convolution can be expressed by a matrix-vector multiplication, $y_{s,i} = H_i s$, where the Toeplitz-structured convolution matrix $H_i$ for the target index $i$ is created by elements of the impulse response $h_i$.

Figure 4. Transmitting two frequencies with a spacing of $\Delta F = c/2\Delta z$, the size of the object can be obtained. The shape of the object requires an even larger frequency separation $\Delta F = c/2\delta z$ (modified from Ref. [11]).

The detection performance is directly related to the signal-to-noise ratio (SNR) and depends on the receiver bandwidth and the power of the received signal $y_s$,

$$\mathrm{SNR} \propto \|y_{s,i}\|^2 = s^H H_i^H H_i s. \tag{2}$$

If the target characteristic $H_i$ is known, the signal-to-noise ratio can be increased by optimizing the waveform $s$ [12]. The optimisation problem can be formulated as

$$\max_{s} \; s^H R_i s \quad \text{subject to} \quad s^H s = E_s, \tag{3}$$

with the constraint of an energy-limited transmission signal (energy $E_s$) and the Hermitian correlation matrix $R_i = H_i^H H_i$. One possibility to solve this optimisation problem is the Lagrangian multiplier method, which leads to

$$R_i s = \lambda s. \tag{4}$$

Eq. (4) is an eigenvalue equation where the Lagrange multiplier $\lambda$ represents the real eigenvalue and the waveform $s$ is the corresponding eigenvector. By choosing the maximal eigenvalue, the signal-to-noise ratio is maximized. The eigenvector which corresponds to the maximal eigenvalue is directed towards the highest energy (variance).
To gain a better understanding of the solution, the basic example of Figure 4 is presented for the two dominant scattering points at ranges $r_0$ and $r_1$ (Eq. (5)), with the corresponding frequency spectrum given by Eq. (6). The target impulse response fluctuates due to the interference of the scattering centers with a period of $t_1 - t_0 = \frac{2}{c_0}(r_1 - r_0)$ and therefore depends on the target dimension, as already visualized in Figure 4. The phase shift of the second point target causes a shift of all frequency maxima (see also Figure 5(a)). An eigenvalue decomposition according to Eq. (4) yields the optimal waveform for this example (see Figure 5(b)).
Comparing these basic results with the solution of the eigenvalue decomposition, it is apparent that both frequency spectra are related to each other. If all frequency components of the target impulse response have comparable magnitudes, the frequency characteristic of the eigenvector belonging to the largest eigenvalue is similar to the target frequency spectrum and thus reflects the target extension. According to Eq. (6), the period of the target extension corresponds to a frequency of $\Delta F = \frac{c_0}{2(r_1 - r_0)} \approx 20.0$ MHz. The envelope of the transmission wave is related to the phase difference between $r_0$ and $r_1$ and causes a frequency shift in the frequency domain of $f_e = \frac{(\phi_1 - \phi_0)\,\Delta F}{2\pi} \approx 1.125$ MHz. Summarizing the characteristics of the optimal transmission signal, the eigenvector corresponds to the main direction of the target variance in the frequency domain and is linked to the physical behavior of the target. If there are more than two scattering points, additional modulation products will occur. In the case where all frequency components of the target spectrum have similar magnitudes, the eigenvector corresponding to the largest eigenvalue will represent all constructive interferences in the resolution bandwidth. Even for small deviations of the spectral magnitudes, however, the main component (optimal eigenvector) will contain only the dominant frequency, while the minor amplitudes are represented by the remaining eigenvectors, which together span the complete signal space.
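The eigenvalue formulation of matched illumination can be illustrated numerically. The following is a minimal sketch, not the implementation used for Figure 5: it assumes a hypothetical two-scatterer impulse response, builds the Toeplitz convolution matrix, and finds the dominant eigenvector of $H^H H$ by power iteration, using a unit-energy linear chirp both as the reference waveform and as the starting vector. All names and numerical values (`h`, `N`, the chirp rate) are illustrative assumptions.

```python
import cmath
import math


def convolution_matrix(h, n):
    """Toeplitz matrix H such that H s equals the linear convolution of h and s."""
    rows = n + len(h) - 1
    return [[h[r - c] if 0 <= r - c < len(h) else 0j for c in range(n)]
            for r in range(rows)]


def matvec(matrix, vector):
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def normalize(vector):
    norm = math.sqrt(sum(abs(v) ** 2 for v in vector))
    return [v / norm for v in vector]


def received_energy(H, s):
    """Energy of the undisturbed echo, ||H s||^2 = s^H (H^H H) s."""
    return sum(abs(v) ** 2 for v in matvec(H, s))


def dominant_eigenvector(H, start, iterations=300):
    """Power iteration on R = H^H H, applied as H^H (H s) without forming R."""
    H_herm = [[H[r][c].conjugate() for r in range(len(H))]
              for c in range(len(H[0]))]
    s = normalize(start)
    for _ in range(iterations):
        s = normalize(matvec(H_herm, matvec(H, s)))
    return s


# Hypothetical target: two point scatterers, 8 samples apart, with a phase offset.
h = [0j] * 9
h[0] = 1.0 + 0j
h[8] = 0.8 * cmath.exp(1j * 0.35)

N = 32  # waveform length in samples (assumed)
H = convolution_matrix(h, N)

# Unit-energy linear chirp: reference waveform and starting vector.
chirp = normalize([cmath.exp(1j * math.pi * 0.5 * k * k / N) for k in range(N)])
s_opt = dominant_eigenvector(H, chirp)

energy_chirp = received_energy(H, chirp)
energy_opt = received_energy(H, s_opt)
print(f"echo energy, chirp: {energy_chirp:.3f}, eigen-waveform: {energy_opt:.3f}")
```

Because the Rayleigh quotient of a positive semidefinite matrix does not decrease under power iteration, the eigen-waveform in this sketch never returns less echo energy than the chirp it started from, for the same transmit energy.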
In order to distinguish between targets, an adapted waveform can be used to improve the discrimination between two target classes [2]. A binary hypothesis test is one method to discriminate between target classes by evaluating the received signal

$$\mathcal{H}_0: \; y = H_0 s + w, \qquad \mathcal{H}_1: \; y = H_1 s + w. \tag{7}$$

The distance $d = \|y_{s,0} - y_{s,1}\|^2 = \|(H_0 - H_1)s\|^2$ denotes the difference of the received amplitudes without taking noise into account. The robustness against incorrect classification increases for larger distances, especially in a noisy environment. Similar to Eqs. (2)-(4), the optimal waveform can be calculated by solving

$$(H_0 - H_1)^H (H_0 - H_1)\, s = \lambda s \tag{8}$$

for the largest eigenvalue $\lambda$. The energy is focused in the spectral region where the deviations between both targets are predominant.

To compare the performance of a binary hypothesis test for a linear chirp and the optimized waveform, the test statistic of the likelihood ratio for Eq. (7) is calculated as in [13], using the variance $\sigma_w^2$ of the complex noise. The deflection of the likelihood ratio test defines the effective difference of the likelihood centers and represents the output signal-to-noise ratio. It is possible to achieve the same receiver operating characteristic (ROC) curve with a different test statistic of that likelihood ratio but different deflections [14]. That is why a higher deflection is related to a better discrimination and a lower sensitivity with respect to a suboptimal threshold. Figure 6 shows the results of the binary test for a linear frequency modulation (LFM) and the optimized waveform for two Gaussian targets with the same extension and distance. The deflection between both classes increases for the optimised waveform, leading to a smaller intersection area of the test statistics for the hypothesis and the alternative. This facilitates a better separability as well as a lower false alarm rate for the same detection probability.
Adapting the waveform to the environment can thus support classification and save resources such as bandwidth. Applications like interference mitigation can also be executed in the skill-based layer by combining spectrum sensing algorithms with matched illumination.

Spectrum sensing
Because wireless communication technologies are of significant importance in modern times, the available radio frequency spectrum has become a valuable resource for radar. For example, the U.S. Department of Commerce [15] has decided to allocate parts of the S-band (1695-1710 MHz and 3550-3650 MHz) to wireless communication. Another example is parts of the C-band (5150-5350 MHz and 5470-5725 MHz), which are used by weather radars but are now also used by 5 GHz WiFi [16]. On the other hand, these bands, although allocated, are underutilized, providing opportunities for secondary (unlicensed) users to share the bands without harming the primary users. Conversely, a similar problem arises when the radar suffers interference from other users or even active jamming. The former is a particular problem for ultra-wideband radars like ground penetrating radars, which naturally operate in partially occupied frequency bands. In the future these problems will become even worse, and hence future cognitive radar systems must be able to operate in spectrally dense environments. Spectrum sensing techniques from cognitive radio provide algorithms to identify spectrum opportunities, i.e. to decide if a frequency band is occupied or not. With this information, a cognitive radar can dynamically adapt its bandwidth, frequency and other transmit parameters to the radio frequency environment.
A significant number of studies dealing with spectrum sensing algorithms exist, and hence we only give a brief overview here. For a comprehensive overview the reader is referred, for example, to the surveys [17,18]. Spectrum sensing algorithms can be split into wideband and narrowband algorithms. Almost all narrowband spectrum sensing methods are statistical hypothesis tests, usually written as

$$\mathcal{H}_0: \; x(t) = w(t), \qquad \mathcal{H}_1: \; x(t) = s(t) + w(t), \tag{11}$$

where $x(t)$ represents the received complex signal, $s(t)$ the signal of another user and $w(t)$ the noise, which is usually assumed white and Gaussian with variance $\sigma_w^2$. The simplest spectrum sensing method is the energy detector

$$T(x) = \sum_{k=1}^{n} |x_k|^2 \;\underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}}\; \frac{\sigma_w^2}{2}\, F^{-1}_{\chi^2_{2n}}(1 - p_{fa}), \tag{12}$$

where $x_1, \dots, x_n$ are the received samples, $p_{fa}$ is the desired probability of false alarm and $F_{\chi^2_{2n}}$ is the $\chi^2$ distribution function with $2n$ degrees of freedom. Although this method is easy and fast, it suffers from low detection probabilities in low-SNR regions and poor robustness, see [19]. More advanced methods exploit certain features of the time series $x_1, x_2, \dots$, for example cyclostationary properties [20]. Most modern modulations like OFDM or QAM have cyclostationary properties. For details on a test statistic see [21]. These methods offer high detection probabilities even in low-SNR regions and are blind in the sense that they do not need information about $\sigma_w^2$. The price is a very high computational complexity and the need for prior information about the modulation used. Completely blind methods, i.e. methods for which absolutely no prior information is necessary, are usually based on a multi-antenna system. The data from the different channels are used to estimate a covariance matrix, and a test statistic is built from its characteristics, e.g. its eigenvalues, see [22].
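The energy detector can be sketched in a few lines. The example below is an illustration under stated assumptions rather than a reference implementation: it assumes complex white Gaussian noise of known variance `sigma2_w`, sets the threshold from the inverse $\chi^2$ distribution function with $2n$ degrees of freedom (evaluated through its closed form for even degrees of freedom and inverted by bisection), and checks the false alarm rate by Monte Carlo simulation. The sample count, the seed and the offset used for the "occupied" band are arbitrary choices.

```python
import math
import random


def chi2_cdf_even(x, n):
    """CDF of the chi-square distribution with 2n degrees of freedom (closed form)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_inv_even(p, n):
    """Inverse CDF by bisection; adequate for setting a detection threshold."""
    lo, hi = 0.0, 2.0 * n
    while chi2_cdf_even(hi, n) < p:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def energy_threshold(sigma2_w, n, p_fa):
    """Threshold on sum |x_k|^2 for complex noise with total variance sigma2_w."""
    return 0.5 * sigma2_w * chi2_inv_even(1.0 - p_fa, n)


def band_occupied(samples, threshold):
    return sum(abs(x) ** 2 for x in samples) > threshold


def complex_noise(n, sigma2_w, rng):
    std = math.sqrt(sigma2_w / 2.0)  # variance split between real and imaginary part
    return [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n)]


rng = random.Random(1)
n, sigma2_w, p_fa, trials = 32, 1.0, 0.1, 2000
threshold = energy_threshold(sigma2_w, n, p_fa)

false_alarms = sum(band_occupied(complex_noise(n, sigma2_w, rng), threshold)
                   for _ in range(trials))
empirical_p_fa = false_alarms / trials

# A strong constant interferer on top of the noise (hypothetical occupied band).
detected = band_occupied([x + 2.0 for x in complex_noise(n, sigma2_w, rng)], threshold)
print(f"threshold {threshold:.2f}, empirical p_fa {empirical_p_fa:.3f}, detected {detected}")
```

The dependence of the threshold on `sigma2_w` is what makes this detector fragile in practice: an error in the assumed noise variance shifts the false alarm rate directly.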
Because the channel state may change between sensing and transmitting, a prediction step after the sensing is helpful or even needed. For this purpose, hidden Markov models are used in Ref. [23], and multilayer perceptrons and recurrent neural networks are additionally considered in Ref. [24]. The neural networks in particular perform well in simulations, with a prediction accuracy of about 0.8 to 0.9.
In contrast to narrowband spectrum sensing, wideband spectrum sensing methods divide a band into occupied and unoccupied subbands. The most obvious method for classifying a wide band is to split it into fixed subbands (using an FFT or sweep and tune) and perform narrowband sensing in each one. But there are also native wideband spectrum sensing methods like a wavelet-based approach, see [25].
If the radar is the primary user and avoiding or reducing interference is the only goal of the spectrum sensing, it is not necessary to decide if a channel is occupied or not. It is sufficient to use the channel with the least interference. But if a lot of interference is present, a compromise between bandwidth (resolution) and interference must be made, which leads to an optimisation problem, see Refs. [26,27].
After each sensing period, a suitable and adaptable waveform must be generated taking the information from the sensing step into account, essentially bandwidth and center frequency. For example, this can be multiple or notched chirps filling the unoccupied bands or a stepped FM waveform which avoids the occupied frequencies, see [27]. A combination with the matched illumination approach presented in Section 4.1 can be considered, too.
Building an experimental radar system with spectrum sensing capabilities is a challenging task. The computational complexity of some algorithms can be a burden, and the additional sensing time, i.e. the time for gathering the samples and for computation, must be taken into account, causing a reduced duty cycle or pulse repetition frequency. In Ref. [27] a radar system employing spectrum sensing and matched illumination was implemented using an Ettus USRP X310 software defined radio. In a test environment, a noise floor reduction of about 10 dB was achieved using spectrum sensing and a notched chirp.

Classification with deep learning techniques
The transition from the continuous stream of incoming raw data towards a symbolic representation of objects, which forms the basis for higher-level cognitive processing, is typically achieved using pattern recognition or classification techniques. As shown in Figure 3, machine learning approaches comprise subsymbolic feature formation processes that separate characteristic signal features in a higher-dimensional space. In this feature space, it is easier to recognize certain target classes to create an abstracted situational picture within the cognitive radar system.

Convolutional neural networks
Convolutional Neural Networks (CNNs) are inspired by the visual system of the brain and are part of the deep learning research field.For many years, CNNs were the only type of deep neural network that could efficiently be trained due to their structure using the technique of weight sharing [28].The basic structure of the network used in the presented architecture is shown in Figure 7.
CNNs are a special form of multi-layer perceptrons, which are designed specifically to recognize two-dimensional shapes with a high degree of invariance to translation, scaling, skewing, and other forms of distortion [29]. This invariance is achieved by an alternation of convolutional and subsampling layers, in which the neurons are organized in so-called feature maps. All neurons in each of these feature maps use the same weights and are connected to a local receptive field in the previous layer. With this weight sharing technique, the number of free parameters is dramatically reduced compared to a fully connected network, which should lead to a better generalization of the network.
In the first convolutional layer, each neuron takes its inputs from a local receptive field in the input image, and the output values of each feature map, which are visible in Figure 7, represent the intensity of one specific local spatial feature. The features, i.e. the weights of the neurons, are learned during the training process. Since the receptive fields of neighboring neurons in the feature maps are shifted by only one pixel in the corresponding direction in the input image, the output values of each feature map correspond to the result of a two-dimensional correlation of the input image with the learned weights of that particular feature map.
In the input image of Figure 7 one target is visible in the center of the image. The correlation with the different kernels is visualized for three examples. The learned kernels are depicted inside the black squares on the input image and the result of the correlation can be seen in the feature maps of the first layer.
The second layer of the network is a subsampling layer and performs a reduction of the dimension by a factor of four. With this reduction the exact position of the feature becomes less important, and the sensitivity to other forms of distortion is reduced [29]. The subsampling is done by averaging an area of 4 × 4 pixels, multiplying it with a weight $w_j$ and adding a trainable bias $b_j$.
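The first two layers described above can be sketched as follows. This is a simplified illustration with assumed sizes and values (a 10 × 10 image, a hand-set 3 × 3 kernel, fixed weight and bias), not the trained network of Figure 7: `correlate2d_valid` computes the two-dimensional correlation of an image with one kernel, and `subsample` averages non-overlapping blocks, scales the result by a weight and adds a bias.

```python
def correlate2d_valid(image, kernel):
    """Two-dimensional correlation; the receptive field moves one pixel at a time."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def subsample(feature_map, block, weight, bias):
    """Average each block x block area, multiply by a weight and add a bias."""
    out_h = len(feature_map) // block
    out_w = len(feature_map[0]) // block
    area = block * block
    return [[weight * sum(feature_map[r * block + i][c * block + j]
                          for i in range(block) for j in range(block)) / area + bias
             for c in range(out_w)]
            for r in range(out_h)]


# Hypothetical input: a 10 x 10 image with a small bright target in the center.
image = [[0.0] * 10 for _ in range(10)]
for r in range(4, 6):
    for c in range(4, 6):
        image[r][c] = 1.0

# Hand-set kernel standing in for one learned feature (assumption).
kernel = [[0.0, 1.0, 0.0],
          [1.0, 1.0, 1.0],
          [0.0, 1.0, 0.0]]

feature_map = correlate2d_valid(image, kernel)                   # 8 x 8
pooled = subsample(feature_map, block=4, weight=0.5, bias=0.1)   # 2 x 2
print(len(feature_map), len(feature_map[0]), pooled)
```

In a trained network the kernel entries, the weight and the bias would be learned; here they are fixed only to make the data flow visible.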
The third layer is a convolutional layer again and relates the features found in the image to each other. This layer is trained to find patterns of features, which can be separated by the subsequent layers and discriminate the different classes. The output of this layer is the internal representation and can be considered as the feature vector found by the network for the given input image.
The last two layers of the network form the decision part of the system and are fully connected layers, which use the output values of the third layer as features for classification. The last layer consists of as many neurons as there are classes to be separated, in our case ten. The classification is done by assigning the class of the neuron with the highest output value.
One cost function for neural networks trained with the back propagation algorithm is the mean square error (MSE) of the training set. The MSE is the mean value of the quadratic loss function $E(\alpha)$, which is given by

$$E_i(\alpha) = \| d_i - f(x_i, \alpha) \|^2. \tag{13}$$

In (13), $\alpha$ is the set of classifier parameters, $d_i$ is the desired output for the $i$th element of the training set and $f(x_i, \alpha)$ is the classifier response to input $x_i$. The MSE of the complete training set with size $N$ is thus

$$\mathrm{MSE}(\alpha) = \frac{1}{N} \sum_{i=1}^{N} E_i(\alpha). \tag{14}$$

The MSE is also called the empirical risk with respect to quadratic loss, and classifiers using this error as a performance measure are said to implement the empirical risk minimization (ERM) [30].
The training of our network is performed by the stochastic diagonal Levenberg-Marquardt algorithm that is presented in [31,32]. The core of this algorithm is the stochastic update rule

$$\alpha_l^{(k+1)} = \alpha_l^{(k)} - \gamma_l^{(k)} \frac{\partial E_i}{\partial \alpha_l}, \tag{15}$$

where $\alpha_l^{(k)}$ is the $l$-th element of the parameter set $\alpha$ at iteration $k$, $E_i$ is the instantaneous loss function of (13) for image $i$ and $\gamma_l^{(k)}$ is the step size for the particular weight $\alpha_l$ at iteration $k$. The dependency of the step size on the iteration indicates that the step size is not fixed during the training, but is dynamically updated. The calculation of the step size is done by

$$\gamma_l^{(k)} = \frac{\eta^{(k)}}{\mu + g_l^{(k)}}, \tag{16}$$

with the parameter $\eta^{(k)}$ and the constant $\mu$, which prevents the step size from becoming too large when the estimate of the second derivative $g_l^{(k)}$ of the loss function $E_i(\alpha)$ with respect to $\alpha_l$ is small. For the calculation of $g_l^{(k)}$ the Gauss-Newton approximation is used, which guarantees a nonnegative estimate [32]. The parameter $\eta$ is marked here as dependent on the iteration, but is fixed over several epochs of the training. The Hessian matrix $g^{(k)}$ is not calculated explicitly in each iteration; instead, a running estimate is kept that is updated with

$$g_l^{(k)} = (1 - \beta)\, g_l^{(k-1)} + \beta\, \frac{\partial^2 E_i}{\partial \alpha_l^2}, \tag{17}$$

where $\beta$ is between zero and one. Because of the weight sharing, the first and the second partial derivatives of the loss function are sums of partial derivatives with respect to the connections that actually share the specific parameter $\alpha_l$:

$$\frac{\partial E_i}{\partial \alpha_l} = \sum_{(m,n) \in V_l} \frac{\partial E_i}{\partial w_{mn}}, \tag{18}$$

$$\frac{\partial^2 E_i}{\partial \alpha_l^2} = \sum_{(m,n) \in V_l} \sum_{(p,q) \in V_l} \frac{\partial^2 E_i}{\partial w_{mn}\, \partial w_{pq}}. \tag{19}$$

In (18) and (19), $w_{mn}$ is the connection weight from neuron $n$ to $m$ and $V_l$ is the set of unit index pairs $(m, n)$ such that the connection between neuron $m$ and $n$ shares the parameter $\alpha_l$, i.e.,

$$w_{mn} = \alpha_l \quad \forall\, (m, n) \in V_l. \tag{20}$$

Further details of the algorithm and the approximations that are done to compute the derivatives can be found in Ref. [32].
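To make the per-parameter step size concrete, the following sketch applies a stochastic update with a running second-derivative estimate to a deliberately simple stand-in problem: a linear model $f(x, \alpha) = \sum_l \alpha_l x_l$ with quadratic loss and badly scaled inputs. It assumes the commonly used form of the step size, $\eta / (\mu + g_l)$, and the Gauss-Newton estimate $2 x_l^2$ that holds for this linear model; the data, the constants `eta`, `mu`, `beta` and the function name `sdlm_fit` are illustrative and not taken from the network described here.

```python
import random


def sdlm_fit(samples, eta=0.1, mu=0.01, beta=0.01, epochs=20, seed=0):
    """Fit f(x, alpha) = sum_l alpha_l * x_l with per-parameter step sizes."""
    rng = random.Random(seed)
    dim = len(samples[0][0])
    alpha = [0.0] * dim
    # Initial second-derivative estimate from one pass over the data.
    # Gauss-Newton approximation for this model: 2 * x_l^2, never negative.
    g = [sum(2.0 * x[l] ** 2 for x, _ in samples) / len(samples) for l in range(dim)]
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, d = samples[i]
            error = d - sum(a * xl for a, xl in zip(alpha, x))
            for l in range(dim):
                grad = -2.0 * error * x[l]                      # dE_i / d alpha_l
                g[l] = (1.0 - beta) * g[l] + beta * 2.0 * x[l] ** 2
                gamma = eta / (mu + g[l])                       # step size for alpha_l
                alpha[l] -= gamma * grad
    return alpha


# Noise-free toy data with inputs on very different scales (assumed example).
data_rng = random.Random(42)
true_alpha = [2.0, -0.3]
samples = []
for _ in range(200):
    x = [data_rng.uniform(-1.0, 1.0), data_rng.uniform(-10.0, 10.0)]
    samples.append((x, sum(a * xl for a, xl in zip(true_alpha, x))))

alpha_hat = sdlm_fit(samples)
print(alpha_hat)
```

The point of the normalisation is visible in the toy data: the second input is ten times larger than the first, yet both parameters converge at a similar rate because each step is scaled by its own curvature estimate.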

Regularizations and adaptive learning rates
One feature of the presented network is the use of momentum, which adds a feedback loop and with it some kind of memory to the algorithm. With this technique, a certain amount of the weight change of the last iteration is added to the weight change of the current iteration. This amount is determined by the momentum constant $r$ and leads to the expression

$$\Delta\alpha^{(k)} = r\, \Delta\alpha^{(k-1)} - \gamma^{(k)} \frac{\partial E^{(k)}}{\partial \alpha}, \tag{21}$$

which can also be written as

$$\Delta\alpha^{(k)} = -\sum_{t=0}^{k} r^{\,k-t}\, \gamma^{(t)} \frac{\partial E^{(t)}}{\partial \alpha}. \tag{22}$$

The use of momentum should have a positive effect on the behavior of the training algorithm and may prevent the algorithm from converging to a local minimum of the error function [29].
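The equivalence between the recursive momentum update and its written-out sum can be checked numerically. The sketch below assumes the recursion "new change = r times the previous change minus step size times gradient", starting from a zero weight change; the gradient and step size sequences are made-up numbers for a single scalar weight.

```python
def momentum_recursive(gradients, step_sizes, r):
    """Weight changes from the recursion delta_k = r * delta_{k-1} - gamma_k * grad_k."""
    deltas, previous = [], 0.0
    for grad, gamma in zip(gradients, step_sizes):
        previous = r * previous - gamma * grad
        deltas.append(previous)
    return deltas


def momentum_unrolled(gradients, step_sizes, r):
    """The same weight changes as an exponentially weighted sum of past gradients."""
    return [-sum(r ** (k - t) * step_sizes[t] * gradients[t] for t in range(k + 1))
            for k in range(len(gradients))]


# Hypothetical gradient and step size history for one weight.
gradients = [0.8, 0.5, -0.2, 0.1, 0.4]
step_sizes = [0.10, 0.10, 0.05, 0.05, 0.02]
r = 0.9  # momentum constant

recursive = momentum_recursive(gradients, step_sizes, r)
unrolled = momentum_unrolled(gradients, step_sizes, r)
print(recursive)
print(unrolled)
```

The unrolled form shows why momentum acts as a memory: a gradient observed $k - t$ iterations ago still contributes with weight $r^{k-t}$.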
Another important regularization method used in this network is the max-norm regularization of the weights of the network. For this regularization, the Frobenius norm of each kernel in layers one and three is calculated after the weight change at every iteration, and if the norm is larger than a certain value $c$, the kernel is rescaled to a norm of $c$. With this regularization, an improvement of the convergence properties of the training algorithm has been observed.
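The max-norm step itself is a small projection applied after each weight update. A minimal sketch, with an arbitrary example kernel and bound `c`:

```python
import math


def max_norm_rescale(kernel, c):
    """Rescale a kernel to Frobenius norm c if its norm exceeds c; else keep it."""
    norm = math.sqrt(sum(w * w for row in kernel for w in row))
    if norm <= c:
        return [row[:] for row in kernel]
    scale = c / norm
    return [[w * scale for w in row] for row in kernel]


kernel = [[3.0, 0.0], [0.0, 4.0]]      # Frobenius norm 5 (example values)
clipped = max_norm_rescale(kernel, c=1.0)
unchanged = max_norm_rescale(kernel, c=10.0)
print(clipped, unchanged)
```

Only the length of the kernel is changed; the relative size of its entries, and hence the feature it responds to, is preserved.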
So far, the learning rate in (16) is only determined by the characteristics of the data itself and the error it produces at the output of the network. Another important factor could be meta-information available about the training set. As an example, we consider a priority class, meaning that one target in our database should always be classified correctly, at the additional cost that more errors may be produced in other classes. To incorporate such priority classes into our network, the representation in (22) is used. The general idea is to increase the learning rate $\gamma^{(k)}$ if an image of a priority class is presented at the current iteration. This is done by multiplying the learning rate $\gamma^{(k)}$ by a priority weighting $p$. If this weighted learning rate is included in the formula for the weight change $\Delta\alpha^{(k)}$, the priority weighting also appears in the affected terms of the sum in (22). The need for a different weighting of classes is also discussed in Ref. [33], where it is mentioned that the different costs of misclassification should be part of the classifier design. The way we include this prior knowledge in our target recognition system was also mentioned in Ref. [34] for Support Vector Machines, where the idea was to penalize the samples of less represented classes more heavily than others.
To show the benefit of this adaptive learning strategy, we consider an example using the ten-class moving and stationary target acquisition and recognition (MSTAR) data [35].

Combination of convolutional neural networks with support vector machines
An often mentioned benefit of Support Vector Machines (SVMs) is their high generalization capability in comparison to neural networks. The high generalization of SVMs is achieved by a training strategy called structural risk minimization, which, in contrast to the empirical risk minimization of neural networks, takes the complexity of the classifier into account. For this reason, the Vapnik-Chervonenkis (VC) dimension $h$ was introduced to measure the complexity of a classifier. The VC dimension is defined as the largest training set size $N$ that can be separated with binary labels in an arbitrary way by the SVM. With a high number of free parameters, the capacity of the classifier increases and thus the VC dimension increases as well.
Due to this relation, single patterns have a higher influence on the classification result for classifiers with a high VC dimension, which increases the likelihood of overfitting to the training data [37]. To incorporate the VC dimension into the minimization problem that has to be solved during the training, an additional term $\Phi(N/h)$ is added to the empirical risk to define the structural risk

$$R_{\mathrm{struct}} = R_{\mathrm{emp}} + \Phi\!\left(\frac{N}{h}\right),$$

where $R_{\mathrm{emp}}$ corresponds to the empirical risk. In this problem, $R_{\mathrm{emp}}$ does not refer to the MSE of (14), which was used for neural networks, but to the number of misclassifications in the training set. The VC dimension has an influence on both terms, because a high VC dimension will increase the complexity of the classifier and thus reduce the empirical risk, but the confidence interval $\Phi(N/h)$ would increase at the same time, since it only depends on the ratio between the size of the training set and the VC dimension. SVMs are designed to find the best trade-off between these two terms, i.e. to decrease the empirical error while keeping the VC dimension as low as possible. Because of this, SVMs are classifiers with a very high generalization capability.
To use the high generalization of SVMs in our classification framework, we replace the last two layers of the CNN in Figure 7 with SVMs. In this way we can use the convolutional feature extraction with its invariance to different forms of distortion together with a classifier with high generalization. As input for the SVMs, the output values of the third layer are used. The final structure of the classifier is shown in Figure 9.
An SVM can only separate two classes. For this reason, the training set must be split into two parts for each SVM: one part containing the class that should give a positive result at the output of the SVM, and one part containing the remaining training set that should give a negative result at the output. SVMs trained in that way work in the one-vs.-all classification scheme, which means that as many SVMs are necessary as there are classes to be separated. For the actual classification of an SVM, a kernel is used to transform the data to a high-dimensional space in which it is more likely that the problem can be linearly separated. Two common kernels are polynomial (including linear and quadratic kernels) and radial basis functions (RBFs). In Table 1 a small example of the MSTAR database is shown, and it can be seen that the already very high correct classification rate of the CNN can be further increased with the use of SVMs as classifier.
The results shown here are so-called forced decision results, meaning that all images are classified by the highest output value; no rejection criterion, such as a confidence measure that has to be exceeded, is used. These and further results with the proposed classifier can be found in Ref. [38].

Rule-based-layer
Based on the abstracted situational picture derived by signal-processing and machine learning techniques, the cognitive radar system has to react to the perceived scene. Below, we illustrate an MDP-based scheme to execute a priori known, optimal illumination policies. In multifunctional radars, a radar resource manager has to schedule the individual illuminations into a serial radar timeline.

Optimal illumination policy
Markov decision processes (MDPs) are widely used in robotics to derive optimal control policies in stochastic environments. An agent in state $s_i$ can execute different actions $a_i$, which with probability $p_{ij}$ lead to a follow-up state $s_j$ and a reward of $r_{ij}$. Different approaches, such as value iteration or reinforcement learning, are used to determine an optimal policy $\pi = (s_i \mid a_i,\; s_j \mid a_j,\; \dots)$. The policy assigns to each state $s_i$ an optimal action $a_i$ which maximizes the expected reward.
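Value iteration, one of the approaches named above, can be sketched on a toy problem. The MDP below is entirely hypothetical (two states, a choice between declaring immediately and looking again) and the discount factor is an assumption; the dictionary layout `{state: {action: [(probability, next_state, reward), ...]}}` is chosen only for readability.

```python
def q_value(mdp, values, state, action, discount):
    """Expected reward of taking `action` in `state` and acting optimally afterwards."""
    return sum(prob * (reward + discount * values[next_state])
               for prob, next_state, reward in mdp[state][action])


def value_iteration(mdp, discount=0.9, tolerance=1e-10):
    values = {state: 0.0 for state in mdp}
    while True:
        largest_change = 0.0
        for state, actions in mdp.items():
            best = max(q_value(mdp, values, state, a, discount) for a in actions)
            largest_change = max(largest_change, abs(best - values[state]))
            values[state] = best
        if largest_change < tolerance:
            break
    policy = {state: max(actions,
                         key=lambda a, s=state: q_value(mdp, values, s, a, discount))
              for state, actions in mdp.items()}
    return values, policy


# Hypothetical toy MDP: declare now for a small reward, or illuminate again
# with a 50 % chance of a larger reward and a 50 % chance of staying undecided.
mdp = {
    "undecided": {
        "declare": [(1.0, "done", 1.0)],
        "look_again": [(0.5, "done", 4.0), (0.5, "undecided", 0.0)],
    },
    "done": {"stay": [(1.0, "done", 0.0)]},
}

values, policy = value_iteration(mdp)
print(values, policy)
```

For this toy problem the fixed point can be checked by hand: looking again is worth $V = 2 + 0.45\,V$, i.e. $V = 2/0.55 \approx 3.64$, which exceeds the reward of 1 for declaring immediately.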
MDPs are well suited to model the perception-action-cycle of a radar, e.g. for tracking applications [39].In the following, we illustrate an example for multi-stage classification from Ref. [40].
Three classes of targets $K = \{1, 2, 3\}$ can appear in a scenario with a priori probabilities $\pi_1 = 0.1$, $\pi_2 = 0.2$ and $\pi_3 = 0.7$ (Figure 10). A low- and a high-resolution radar mode (mode = 1 | 2) are available for up to five consecutive illuminations $t = \{1, 2, 3, 4, 5\}$, which are fused to a final declaration $V$ using the Bayes rule (Figure 11). The policy describes the optimal illumination strategy with respect to the highest expectation for correctly classifying targets of class 1 ($V = 1 \Leftrightarrow$ Class 1, $V = 2 \Leftrightarrow \neg$ Class 1). A negative reward (cost) of 1 unit is assigned for a false alarm and 2 units for a missed detection.
The resulting multi-stage illumination policy is shown in Figure 12. Initially the target is illuminated with mode 2 and classified. Depending on the result $Y = 1, 2$, the strategy branches and finishes with a final declaration $V = 1, 2$. In a simulation of 100,000 Monte Carlo runs, the static application of mode 1 resulted in accumulated costs of 20,000 (class 1 never detected, i.e. all missed detections). When randomly switching between mode 1 and 2, costs of 9063 occurred, as opposed to the lowest cost of 4797 when using the optimal strategy.
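Two ingredients of this example can be reproduced with a few lines: the Bayes fusion of consecutive classification results, and the accumulated cost when class 1 is never declared. In the sketch below the priors and costs are taken from the text, whereas the likelihoods `p_y1_given_class` (the probability that an illumination reports $Y = 1$ for each true class) are purely hypothetical, since the sensor model of Ref. [40] is not reproduced here.

```python
def bayes_update(prior, likelihoods):
    """Posterior class probabilities after one observation."""
    joint = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


prior = [0.1, 0.2, 0.7]                 # a priori probabilities from the text

# Hypothetical P(Y = 1 | class k) for the high-resolution mode (assumption).
p_y1_given_class = [0.9, 0.2, 0.05]

posterior_after_y1 = bayes_update(prior, p_y1_given_class)
posterior_after_two = bayes_update(posterior_after_y1, p_y1_given_class)

# If class 1 is never declared, every class-1 target is a missed detection.
runs, cost_missed_detection = 100_000, 2
expected_cost_never_declare = runs * prior[0] * cost_missed_detection

print(posterior_after_y1, posterior_after_two, expected_cost_never_declare)
```

The last line reproduces the order of magnitude quoted for the static mode: with a prior of 0.1 for class 1 and a cost of 2 per missed detection, 100,000 runs accumulate an expected cost of 20,000.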

Radar resource management
The illumination strategy in Figure 12 requires up to five consecutive illuminations of a target. As indicated in Figure 1, a multifunctional radar must simultaneously carry out additional tasks, in particular the search for new targets and the tracking of known targets. Since a shared aperture is used, the radar resource manager schedules the radar timeline in time-multiplexing mode.
In the following, we simulate an airspace surveillance radar rotating at 180°/s with electronic beam-steering.

Surveillance
The airspace is discretised depending on the beam width. Let $B_\varphi, B_\theta \in (0, 2\pi)$ be the azimuth and elevation opening angles, respectively. The dwell time $\tau = 2r/c$ of an airspace section is chosen dependent on the range $r$ of the target to guarantee that the whole range can be scanned within one transmit-receive process. Therefore the discretisation is only made in the direction of azimuth and elevation. Since the transmit power decreases with increasing distance from the main lobe, the borders are defined overlapping, i.e. a constant $d \in (0, 1)$ is selected for the discretisation (typical values are $d = 0.5$ or $d = 0.75$). If the maximum observable range and altitude are limited by $R$ and $H$, respectively, the airspace to be observed, $\mathcal{L}$, comprises all points $x$ within range $R$ whose height $h_\perp(x)$ does not exceed $H$, where the sensor is located at the center of the coordinate system and $h_\perp(x)$ denotes the height of the target perpendicular to the earth's surface. After proper transformation, $\mathcal{L}$ is discretised into sections $L_{ij}$. In this discretisation the factor $\cos(j d B_\theta)^{-1}$ compensates for the fact that the same area (in steradians) spans a wider azimuth interval at higher elevation than at lower elevation. When a surveillance task (see Section 5.2.3) is completed, it is immediately regenerated with the desired revisit time to guarantee regular observation of the entire airspace.
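A minimal sketch of the dwell time and of an angular grid of this kind is given below. The exact definition of the sections is not reproduced above, so the layout is an assumed reading: elevation rows at $j\,d\,B_\theta$ and an azimuth step of $d\,B_\varphi / \cos(j\,d\,B_\theta)$. The function names and the parameter `max_el` are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def dwell_time(max_range_m):
    """Two-way travel time tau = 2 r / c for the given range."""
    return 2.0 * max_range_m / C


def beam_grid(b_az, b_el, d, max_el):
    """Return beam centres (azimuth, elevation) in radians.

    Assumed layout: elevation rows at j*d*b_el (max_el must stay below pi/2);
    the azimuth step of row j is widened by 1/cos(elevation).
    """
    grid = []
    j = 0
    while j * d * b_el <= max_el:
        el = j * d * b_el
        az_step = d * b_az / math.cos(el)
        n_az = math.ceil(2.0 * math.pi / az_step)  # beams needed for 360 degrees
        grid.extend((i * 2.0 * math.pi / n_az, el) for i in range(n_az))
        j += 1
    return grid
```

For example, `dwell_time(150_000.0)` gives roughly 1 ms, and rows at higher elevation contain fewer azimuth positions than the row at zero elevation.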

Tracking
To be able to estimate the position of a target continuously in time, all radar detections of a target $T_i$ are combined into a track $\mathcal{T}_i$. This is done by relating them physically using predefined dynamic models. A simple dynamic model assumes, for example, a constant velocity that varies only through zero-mean process noise in the acceleration. To determine which measurement belongs to which track, data association is performed using scoring and the global nearest neighbor (GNN) approach, as described in Ref. [41]. In this case all unassociated detections generate a new track, which counts as verified when its score exceeds a given threshold. In general, a track is an estimate of the movement of the target; it contains information about the dynamic model, the covariance matrix $P_i$ and an estimate $\hat{x}_i(t)$ of the real state $x_i(t) = (p_i(t), v_i(t), \ldots)^T$, consisting of position $p_i(t)$, velocity $v_i(t)$ and, for example, acceleration $a_i(t)$ at time $t$. All tracks generated by the radar yield an estimate of the airspace situation (see Figure 13).
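The association step can be illustrated with a small brute-force sketch of global nearest neighbor assignment. It is not the scoring scheme of Ref. [41]: the Euclidean gate, the use of the gate value as the penalty for an unassigned track, and the function name are assumptions made for illustration, and the exhaustive search is only practical for a handful of tracks.

```python
import itertools
import math


def gnn_associate(predicted, detections, gate):
    """Assign detections to predicted track positions with minimal total distance.

    Returns (assigned, unassigned): a dict track index -> detection index and
    the list of detection indices left over (candidates for new tracks).
    """
    options = []
    for p in predicted:
        in_gate = [j for j, z in enumerate(detections) if math.dist(p, z) <= gate]
        options.append(in_gate + [None])  # None: track gets no detection

    best_cost, best = math.inf, ()
    for combo in itertools.product(*options):
        used = [j for j in combo if j is not None]
        if len(used) != len(set(used)):
            continue  # a detection may be used by at most one track
        cost = sum(gate if j is None else math.dist(predicted[i], detections[j])
                   for i, j in enumerate(combo))
        if cost < best_cost:
            best_cost, best = cost, combo

    assigned = {i: j for i, j in enumerate(best) if j is not None}
    unassigned = [j for j in range(len(detections)) if j not in assigned.values()]
    return assigned, unassigned
```

Detections returned in `unassigned` correspond to the unassociated detections that would start new tentative tracks.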
A more complex dynamic model was introduced by Singer [42]. In this model the acceleration obeys the ordinary differential equation $\dddot{p}(t) = -\alpha\,\ddot{p}(t) + w(t)$ (28), where $\alpha$ is the reciprocal of the maneuver time constant and $w(t)$ is Gaussian white noise. From Eq. (28) a discrete form of the Singer model at the $k$-th time step can be derived with discrete white noise $w_k$ and a process matrix $F_k$ that depends on $\Delta t$, the time elapsed between time steps $k-1$ and $k$.
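For one coordinate axis and the state ordering (position, velocity, acceleration), the textbook form of the discrete Singer transition matrix can be sketched as follows. The layout may differ from the chapter's own equation, and the function name is hypothetical.

```python
import math


def singer_transition(alpha, dt):
    """Discrete Singer process matrix for one axis, state (p, v, a).

    alpha: reciprocal of the maneuver time constant (1/s), must be > 0
    dt:    time elapsed between two time steps (s)
    """
    e = math.exp(-alpha * dt)
    return [
        [1.0, dt, (alpha * dt - 1.0 + e) / alpha ** 2],
        [0.0, 1.0, (1.0 - e) / alpha],
        [0.0, 0.0, e],
    ]
```

For a very long maneuver time constant (small `alpha`) the matrix approaches the constant-acceleration model, whereas for large `alpha` the acceleration decays quickly and the model approaches constant velocity.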
Recursive Bayesian estimators can be used to calculate the state $x$ and the covariance matrix $P$ (for better readability the index $i$ is dropped from now on). One commonly used estimator is the (Extended) Kalman Filter (EKF) [41, 43]. In general the Kalman filter assumes a state transition model and an observation model

$$x_k = f(x_{k-1}) + w_k, \qquad z_k = h(x_k) + v_k,$$

where $z_k$ denotes the measurement, $f$ and $h$ are (not necessarily linear) functions, and $w_k$ and $v_k$ are additive, zero-mean, white noises with process noise covariance $Q_k$ and measurement noise covariance $R_k$, respectively. In our case $h$ is, for example, the mapping between the state space $\mathcal{L}$ and the measurement in azimuth $\varphi$, elevation $\theta$ and range $r$. For the Singer model the state transition is a linear function given by the process matrix of the discrete model.

The EKF consists of two steps. First, the state $x_{k|k-1}$ and the covariance $P_{k|k-1}$ are predicted using the previous information $x_{k-1|k-1}$ and $P_{k-1|k-1}$ (the index $k|k-1$ denotes the dependency of the estimate at time step $k$ on the information available at time step $k-1$):

$$x_{k|k-1} = f(x_{k-1|k-1}), \qquad P_{k|k-1} = F_{k-1} P_{k-1|k-1} F_{k-1}^T + Q_k.$$

In the general case the matrix $F_{k-1}$ is defined by the Jacobian of $f$,

$$F_{k-1} = \left.\frac{\partial f}{\partial x}\right|_{x_{k-1|k-1}}.$$

Second, the prediction is corrected using the (erroneous) measurement $z_k$:

$$x_{k|k} = x_{k|k-1} + K_k \left(z_k - h(x_{k|k-1})\right), \qquad P_{k|k} = (I - K_k H_k)\, P_{k|k-1},$$

with observation matrix and Kalman gain

$$H_k = \left.\frac{\partial h}{\partial x}\right|_{x_{k|k-1}}, \qquad K_k = P_{k|k-1} H_k^T \left(H_k P_{k|k-1} H_k^T + R_k\right)^{-1}.$$

Process noise and measurement error accumulate over time until a new measurement is executed. This leads to a probability density of the track $\mathcal{T}_i$ with state $x_i$.
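The two steps can be sketched for the linear special case, in which the EKF reduces to the ordinary Kalman filter. The example assumes a constant-velocity state (position, velocity), a scalar position measurement, and illustrative noise values; it does not use the radar measurement model in azimuth, elevation and range, and all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def predict(x, P, F, Q):
    """Prediction step: x <- F x, P <- F P F^T + Q."""
    x_pred = [sum(f * xi for f, xi in zip(row, x)) for row in F]
    Ft = [list(col) for col in zip(*F)]
    FPFt = matmul(matmul(F, P), Ft)
    P_pred = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(FPFt, Q)]
    return x_pred, P_pred


def update_position(x, P, z, r):
    """Correction step for a scalar position measurement (H = [1, 0, ...])."""
    s = P[0][0] + r                      # innovation variance H P H^T + R
    K = [row[0] / s for row in P]        # Kalman gain P H^T / s
    innovation = z - x[0]
    x_new = [xi + k * innovation for xi, k in zip(x, K)]
    n = len(x)
    P_new = [[P[i][j] - K[i] * P[0][j] for j in range(n)] for i in range(n)]
    return x_new, P_new


def track_demo(steps=20, velocity=2.0):
    """Track a target moving at constant velocity with noise-free measurements."""
    F = [[1.0, 1.0], [0.0, 1.0]]         # time step of 1 s (assumed)
    Q = [[1e-3, 0.0], [0.0, 1e-3]]       # assumed process noise
    x, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
    for k in range(1, steps + 1):
        x, P = predict(x, P, F, Q)
        x, P = update_position(x, P, z=velocity * k, r=1.0)
    return x, P
```

Starting from a vague prior, `track_demo()` converges to the true position and velocity after a few updates, while the position variance drops below the measurement variance.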
The probability density is used to calculate the maximum time difference $\Delta t$ that allows the track to stay within a predefined relative accuracy: $\max \Delta t$ (42), subject to an accuracy constraint in which $P$ denotes a projection onto the plane orthogonal to the beam direction and $\nu \in (0, 1)$ is the track sharpness. The time difference $\Delta t$ is added to the tracking task of the track $\mathcal{T}_i$ and is updated at every measurement.
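The idea of deriving a revisit interval from the growing track uncertainty can be sketched as follows. The constraint of Eq. (42) is not reproduced above, so this example substitutes a simple assumed criterion: the predicted position standard deviation of a one-dimensional constant-velocity model with white acceleration noise of intensity `q` must stay below `limit_std`, which could for instance be set to $\nu$ times the beam width times the target range. All names and the criterion itself are assumptions.

```python
import math


def max_revisit_interval(p00, p01, p11, q, limit_std, step=0.01, horizon=60.0):
    """Largest dt (multiple of step) for which the predicted position
    standard deviation stays at or below limit_std.

    p00, p01, p11: entries of the current position/velocity covariance
    q:             intensity of the white acceleration noise (assumed model)
    """
    best = 0.0
    n = 1
    while n * step <= horizon:
        t = n * step
        # Predicted position variance of the constant-velocity model.
        var = p00 + 2.0 * t * p01 + t * t * p11 + q * t ** 3 / 3.0
        if math.sqrt(max(var, 0.0)) > limit_std:
            break
        best = t
        n += 1
    return best
```

A larger process noise or a tighter limit shortens the returned interval, so maneuvering or poorly known targets are revisited more often.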

Scheduler
In the simulation presented here, a task is generated for each $L_{ij}$ and placed into a sorted waiting queue (see Figure 14). The scheduler executes those tasks whose time stamps do not lie in the past in the given order. If tasks are delayed, they are prioritized following the hierarchy of the waiting queue. The tasks inside the waiting queue are sorted according to their time stamps (Figure 15).
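A time-stamp-ordered waiting queue of this kind can be sketched with a binary heap. The task fields follow the caption of Figure 15 (time stamp, duration, azimuth, elevation); the `revisit` field, the regeneration rule for surveillance tasks and the function name are assumptions for illustration and are not the authors' implementation.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Task:
    time_stamp: float                                  # desired execution time (sort key)
    duration: float = field(compare=False)
    azimuth: float = field(compare=False)
    elevation: float = field(compare=False)
    revisit: Optional[float] = field(compare=False, default=None)  # surveillance only


def run_schedule(tasks, t_end):
    """Execute tasks in time-stamp order on a single shared timeline.

    Returns a log of (start time, azimuth, elevation) entries.
    """
    queue = list(tasks)
    heapq.heapify(queue)
    now, log = 0.0, []
    while queue:
        task = heapq.heappop(queue)
        start = max(now, task.time_stamp)  # delayed tasks start as soon as possible
        if start >= t_end:
            break
        log.append((start, task.azimuth, task.elevation))
        now = start + task.duration
        if task.revisit is not None:
            # Regenerate the surveillance task with the desired revisit time.
            heapq.heappush(queue, Task(now + task.revisit, task.duration,
                                       task.azimuth, task.elevation, task.revisit))
    return log
```

Because delayed tasks carry the oldest time stamps, they are automatically served first when the timeline becomes free.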

Performance metrics
In this section three metrics are introduced to validate the performance of the resource manager.
One key element is the tracking accuracy. For the validation, the distance between the estimated position $\hat{p}_i(t)$ and the real position $p_i(t)$ of a target is calculated. A track does not contain any information about which target it belongs to, since the radar system does not know the ground truth. Therefore the track with the closest approach to the target is chosen as reference. The track sharpness $d_{TS}$ is given as a percentage of the beam width. The metric $d_{TS}$ does not take into account whether the number of tracks matches the number of targets. Therefore the number of tracked targets $\#\mathcal{T}$ is compared to the number of actually existing targets $\#T$ by a second metric (46).
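The nearest-track evaluation can be sketched as below. The defining equation of $d_{TS}$ is not reproduced above, so the normalisation is an assumption: the distance to the closest track is divided by the cross-range extent of the beam at the target range (range times beam width in radians). The function name is hypothetical.

```python
import math


def track_sharpness(target_pos, track_positions, beam_width_rad):
    """Distance from a target to its closest track as a fraction of the
    beam footprint at the target range (sensor at the origin)."""
    nearest = min(math.dist(target_pos, p) for p in track_positions)
    footprint = math.hypot(*target_pos) * beam_width_rad
    return nearest / footprint
```

For a target at 10 km range, a beam width of 0.02 rad and a closest track that is 10 m away, the sketch yields 0.05, i.e. 5 % of the beam footprint.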

Results
In this section the validation results of the simulation are presented. The actual airspace situation is depicted in Figure 13. Figures 16 and 17 show the evaluations of the metrics defined in Section 5.2.4.
The simulation starts with an occupied airspace. This can be a difficult situation for the radar since pop-up targets significantly decrease the reaction time as the distance to the radar is shortened.

Knowledge-based-layer
In this section, we discuss knowledge-based behavior of a cognitive radar. As discussed in Section 3, the knowledge-based layer works on structured a priori knowledge about the application domain and its goals and constraints. Automated planning or optimisation tools can be applied to generate mission-level commands that, for example, control the trajectory of the sensor-carrying platform.
Below, we discuss an illustrative trajectory planning problem for a 6-DOF robotic manipulator arm that carries a UWB sensor able to work in synthetic aperture radar (SAR) mode.
Results from a real measurement setup using an ST Robotics R17 robot arm are also shown.
The sensor has one transmitter and one receiver in a typical common-offset arrangement (Figure 18).

Robot trajectory planning
The spatial resolution and processing gain that the system can achieve ultimately depend on the trajectory and velocity profile of the sensor head. The constraints can be modeled as an optimisation problem to obtain a feasible, collision-free trajectory of the end-effector of the manipulator arm in Cartesian coordinates that minimizes observation time.

Sensor characteristics and trajectory constraints
The radar sensor under consideration uses a selectable center frequency from 3 to 8 GHz and 4 GHz of bandwidth, resulting in a range resolution of 3.75 cm. The center frequency can be tuned according to a particular target or propagation environment (ground penetration, through-the-wall imaging, IED inspection, …). The horn-type antennas can be rotated to exploit polarization diversity. The sensor is able to operate in stripmap or spotlight SAR modes using linear trajectories. Several parallel trajectories can be combined for 3D imaging. The mobility of the arm could be further exploited to generate non-linear trajectories around a target to obtain a more accurate 3D reconstruction.
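The stated range resolution follows from the standard relation between bandwidth $B$ and range resolution $\delta_r$, here evaluated with the rounded value $c \approx 3 \times 10^{8}$ m/s:

```latex
\delta_r = \frac{c}{2B}
         = \frac{3 \times 10^{8}\ \mathrm{m/s}}{2 \times 4 \times 10^{9}\ \mathrm{Hz}}
         = 0.0375\ \mathrm{m} = 3.75\ \mathrm{cm}
```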
In order to obtain a cross-range resolution similar to the range resolution, the trajectory planning must aim to create an aperture of at least 1.3 to 0.5 times the distance to the target in both dimensions (azimuth and elevation), depending on the center frequency used by the system (3 to 8 GHz, respectively).
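These aperture factors can be reproduced with the common small-angle approximation for the cross-range resolution of a synthetic aperture, $\delta_{cr} \approx \lambda R / (2L)$. The approximation is coarse for apertures as large as the target distance, so the sketch below only indicates the order of magnitude; the function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def aperture_to_range_ratio(center_freq_hz, cross_range_res_m):
    """Aperture length L divided by target distance R that is needed for the
    requested cross-range resolution, using delta_cr ~ lambda * R / (2 L)."""
    wavelength = C / center_freq_hz
    return wavelength / (2.0 * cross_range_res_m)
```

For a cross-range resolution of 3.75 cm the ratio is about 1.33 at 3 GHz and about 0.5 at 8 GHz, i.e. the lower center frequency requires the longer aperture.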
High-resolution imaging can only be achieved with even higher-precision positioning. The 3D trajectory of the sensor needs to be measured and synchronized with the sensor data. For that purpose, accelerometers and gyroscopes from an attached inertial measurement unit (IMU) are used. The IMU drift is additionally stabilized using the hardware readout of the optical encoders of the robot arm joints, which are driven by stepper motors.
Two other important parameters to be considered for the trajectory planning are the optimal size of the scanning area and the sampling requirements. For planar acquisition geometries working in stripmap mode, the scan must be extended by an additional half beam aperture in both dimensions to obtain full-resolution imaging of the total area of interest.
The second parameter concerns the sampling requirements of a particular acquisition. The spacing between measurement positions in the synthetic radar aperture must not exceed a maximum value in order to adequately sample the phase history associated with all the scatterers. If the distance between measurements is too large, the Nyquist criterion is not fulfilled and artifacts may appear in the reconstructed image.
It must also be considered that signal propagation in dielectric materials (ground, wall) shortens the wavelength, so the sampling requirements become even more stringent [44]. A prior estimate of the dielectric permittivity of the propagation medium may further optimize the acquisition geometry and the imaging process. Figure 19 shows an example of an image obtained with the robot arm using some reference objects inside a plastic suitcase. The trajectory followed by the sensor was planned considering the constraints mentioned above to obtain unaliased high-resolution images of the total area of interest.
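The effect of the propagation medium on the sampling requirement can be sketched with a commonly used conservative bound for monostatic apertures, a spacing of at most a quarter of the shortest wavelength. This bound, the function name and the example permittivity are assumptions for illustration; the chapter does not state the exact criterion it uses.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def max_sample_spacing(f_max_hz, rel_permittivity=1.0):
    """Conservative maximum spacing between measurement positions (lambda / 4),
    with the wavelength shortened by sqrt(rel_permittivity) in the medium."""
    wavelength = C / (f_max_hz * math.sqrt(rel_permittivity))
    return wavelength / 4.0
```

At a highest frequency of 10 GHz (8 GHz center frequency plus half the 4 GHz bandwidth) the bound is about 7.5 mm in air and halves to about 3.7 mm in a medium with a relative permittivity of 4.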

Conclusions
In this article, a three-layered cognitive radar architecture based on the Rasmussen model was presented. Several examples illustrated technologies to implement the cognitive subfunctions in a radar system.
For the skill-based layer, an approach for matching a waveform to the target transfer function was shown. In addition, spectrum sensing methods can be used to adapt the transmit signal to the electromagnetic environment. Rule-based behavior can be implemented using Markov decision processes (MDPs) to compute optimal illumination policies. For a shared-aperture multifunctional radar, radar resource management approaches are required to schedule the radar timeline. For knowledge-based behavior, an example of sensor-controlled trajectory generation for a robotic arm was presented.
The different layers of the architecture encompass a broad range of time scales and levels of abstraction. The full potential is achieved if all layers interact consistently. This and further experimental validation of the approach are currently being investigated at FHR.

Figure 3. Three-layer model of a cognitive radar architecture with supporting technologies. Modified from Ref. [4].

Figure 4. Transmitting two frequencies with a spacing of ΔF = c/(2Δz), the size of the object can be obtained. The shape of the object requires an even larger frequency separation ΔF = c/(2δz) (modified from Ref. [11]).

Figure 5. Target impulse response and optimal transmission waveform in time and frequency domain. (a) Target impulse response in time/range (upper) and frequency domain (lower) for two point targets at $r_0 = 37.32$ ($a_0 = 1$) and $r_1 = 44.82$ ($a_1 = 1\angle 20°$). (b) Optimal transmission waveform (eigenvector corresponding to the maximal eigenvalue) in time (upper) and frequency domain (lower).

Figure 6. Test statistic of the likelihood ratios with the mean distance of the centers for LFM and optimized waveform. (a) Distribution of the likelihood ratio for noisy samples of the hypothesis and alternative using linear frequency modulation. (b) Distribution of the likelihood ratio for noisy samples of the hypothesis and alternative using the optimized waveform.

Figure 7. Structure of the used convolutional neural network.

In this example, the learning rate of class four is multiplied by different weightings $p$ between one, which means no priority, and ten. Without any weighting, this class has a rather low correct classification rate compared to the other classes when calculated with respect to the number of input images, $Pcc_{in}$ (curve with round markers). This value gives the fraction of input images that belong to class four and are actually classified as class four. The curve with the square markers in the plot gives the probability of correct classification with respect to the number of output images, $Pcc_{out}$, i.e. the fraction of images classified as class four that really belong to class four; it is thus an indicator of the reliability of the classification. Summarized over all classes, both indicators lead to the same result, the correct classification rate $Pcc$ (curve with the triangular markers). The plot shows that $Pcc_{in}$ increases steeply at small values of $p$ and that, up to $p = 4$, the overall correct classification rate also increases, which is not the purpose here but shows the positive effect of the additional correct classifications. While $Pcc_{in}$ increases, $Pcc_{out}$ decreases steadily. In the extreme case of $p \to \infty$, $Pcc_{in}$ should reach one and both $Pcc_{out}$ and $Pcc$ should reach a value of $N_{class4}/N$, which means that all images in the dataset are classified as class four. This example and more details about the use of different weightings of different classes can be found in Ref. [36].
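The three rates discussed here can be computed from a confusion matrix as in the following sketch. The function name and the matrix layout (rows: true class, columns: assigned class) are assumptions, and the numbers in the usage note are made up for illustration.

```python
def class_rates(confusion, k):
    """Return (Pcc_in, Pcc_out, Pcc) for class index k.

    confusion[i][j] is the number of images of true class i that were
    classified as class j.
    """
    n_in = sum(confusion[k])                    # images that belong to class k
    n_out = sum(row[k] for row in confusion)    # images classified as class k
    correct_k = confusion[k][k]
    total = sum(sum(row) for row in confusion)
    correct_all = sum(confusion[i][i] for i in range(len(confusion)))
    return correct_k / n_in, correct_k / n_out, correct_all / total
```

For the made-up matrix `[[10, 0], [30, 0]]`, in which every image is assigned to the first class, the sketch returns 1.0 for the input-related rate and 0.25 for the other two, which equals the share of that class in the dataset and mirrors the limiting case described above.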

Figure 8. Performance of CNN with priority class.

Figure 9. Structure of the used combination of CNN and SVMs.

Figure 10. Scenario, confusion matrix and cost matrix for the classification problem according to Ref. [40].

Figure 11.State-space, fusion, and selection of action (further measurement or final declaration V) to minimize the expected costs.

Figure 12. Optimal policy for the MDP.

Figure 15. Detailed illustration of the task $A_1$, containing a time stamp, the duration of the task, azimuth and elevation.

Figure 16(a) shows that the revisit time for the surveillance settles around a constant value after 4 seconds. The stepped line in Figure 16(b) shows that all targets are tracked in less than 3 seconds. The second line in (b) shows that the tracking accuracy is poor at the beginning of the simulation since the filters need several measurements to initialize correctly. Figure 17 shows that the revisit time oscillates around 4 seconds and that the tracks are stable during routine operation.

Figure 18.Trajectory planning for IED inspection with R17HS robot arm.

Figure 19.Image of objects inside a suitcase using the robot arm.
can be split into two parts: one part that contains all samples of the priority classes and one part with the examples of the remaining classes

Table 1. Forced decision results of MSTAR dataset.
42]. The state $x(t)$ at time $t$ can then be written compactly as $x(t) = (p(t), v(t), a(t))^T = (p(t), \dot{p}(t), \ddot{p}(t))^T$. The acceleration in this model is given by the ordinary differential equation $\dddot{p}(t) = -\alpha\,\ddot{p}(t) + w(t)$.
To evaluate the surveillance performance, the revisit time is considered. Let $t_{L_{ij}}$ be the time of the last update of direction $L_{ij}$ and let $L(t)$ be the direction the radar is facing at time $t$. Then the metric is given by