Performance Analysis of the Modified-Hybrid Optical Neural Network Object Recognition System Within Cluttered Scenes

Introduction
In the literature, pattern recognition systems fall into two broad categories. The first category consists of linear combinatorial-type filters (LCFs) (Stamos, 2001), where image analysis is commonly carried out in the frequency domain with the help of the Fourier transform (FT) (Lynn & Fuerst, 1998; Proakis & Manolakis, 1998). The second category consists of pure neural modelling methods. Wood (1996) has given a brief but clear review of invariant pattern recognition methods. His survey divides the methods for solving the invariant pattern recognition problem into two further sub-categories. The first sub-category has two distinct stages: features of the training set patterns are first calculated so as to be invariant to certain distortions, and the extracted features are then classified. The second sub-category, instead of having two separate stages, has a single stage which parameterises the desired invariances and then adapts them. Wood (1996) also described the integral transforms, which fall under the first sub-category of feature extractors. They are based on Fourier analysis, such as the multidimensional Fourier transform, the Fourier-Mellin transform, triple correlation (Delopoulos et al., 1994) and others. The first sub-category also includes the group of algebraic invariants, such as Zernike moments (Khotanzad & Hong, 1990; Perantonis & Lisboa, 1992), generalised moments (Shvedov et al., 1979) and others. Wood gave examples of the second sub-category, whose main representatives are based on artificial neural network (NNET) architectures. He presented the weight-sharing neural networks (LeCun, 1989; LeCun et al., 1990), the high-order neural networks (Giles & Maxwell, 1987; Kanaoka et al., 1992; Perantonis & Lisboa, 1992; Spirkovska & Reid, 1992), the time-delay neural networks (TDNN) (Bottou et al., 1990; Simard & LeCun, 1992; Waibel et al., 1989) and others. Finally, he included a third sub-category with all the methods which cannot be placed under either the feature-extraction/feature-classification approach or the parameterised approach. Such methods include image normalisation pre-processing methods (Yuceer & Oflazer, 1993) for achieving invariance to certain distortions. Dobnikar et al. (1992) compared the invariant pattern classification (IPC) neural network architecture with the Fourier transform method, using black-and-white images. They demonstrated the generalisation properties and the fault tolerance of the artificial neural network architectures with respect to the input patterns.
An alternative approach to pattern recognition has been demonstrated previously with the Generalised Hybrid Optical Neural Network (G-HONN) filter (object recognition system) (Kypraios, 2010; Kypraios et al., 2004a). The G-HONN system combines the digital design of a filter by artificial neural network techniques with an optical correlator-type implementation of the resulting non-linear combinatorial correlator-type filter (Jamal-Aldin et al., 1998). The motivation for the design and implementation of the G-HONN object recognition system was to combine the performance advantages of artificial neural networks (Looney, 1997; Haykin, 1999; Beale & Jackson, 1990) with those of optically implemented correlators (Kumar, 1992). NNETs exhibit non-linear superposition abilities over the training set pattern images (Kypraios et al., 2002), as well as learning and generalisation abilities over the whole set of input images (Kypraios et al., 2004a; Kypraios et al., 2003), while optical correlators allow a high-speed implementation of the algorithms.
There are two main design blocks in the G-HONN system: the NNET block and a non-linear combinatorial-type correlator (filter) block (Jamal-Aldin, 1998; Casasent, 1984; Caulfield, 1980; Caulfield & Maloney, 1969). Briefly, the original input images first pass through the NNET block, and the images extracted at the NNET block's output are then used to form a non-linear combinatorial-type correlator filter. The output of the correlator block is thus the composite image of the G-HONN system. To test the system, we correlate this composite image with an input image. Before proceeding to an analytical description of the general architecture of the G-HONN system, and to keep the different mathematical notations of artificial neural networks and optical correlators consistent, we need to unify their representation. We denote variable names and functions by non-italic letters (except the vector elements written within the vector, which are also in italic), vectors by italic lower-case letters and matrices by italic upper-case letters. Frequency-domain vectors, matrices, variable names and functions are represented by bold letters, and space-domain vectors, matrices, variables and functions by plain letters.
Let h(k, l) denote the composite image of the correlator block and x_i(k, l) denote the training set images, where i = 1, 2, ..., N and N is the number of training images used in the synthesis of a combinatorial-type filter. The basic filter's transfer function, formed from the weighted linear combination of the x_i, is given by:

$$h(k,l) = \sum_{i=1}^{N} a_i\, x_i(k,l) \qquad (1)$$

where the coefficients a_i (i = 1, 2, ..., N) are chosen to satisfy the constraints on the correlation peak given by c. The a_i values are determined from:

$$a = R^{-1} c \qquad (2)$$

where a is the vector of the coefficients a_i, R is the correlation matrix of the training images x_i and c is the peak constraint vector. The elements of c are usually set to zero for false-class objects and to one for true-class objects. The activation of each node λ, for pattern p, can be written as:

$$\mathrm{net}_{p\lambda} = \sum_{\mu} w_{\lambda\mu}\, o_{p\mu} + b_{\lambda} \qquad (3)$$

i.e. it is the weighted sum of the outputs o_{pμ} calculated at the nodes μ that connect to node λ, and b_λ represents the bias of unit λ. We train a newly designed NNET with N training set images. The network has N neurons in the hidden layer, i.e. equal to the number of training images. There is a single neuron at the output layer to separate two different object classes. (In a multi-class object recognition problem, more than one neuron would be required at the output layer to separate all the training images correctly.) From Eq. (3), the net input of each of the neurons in the hidden layer is now given by:
[...]
where net is the net input of each of the hidden neurons. From Eqs. (3) and (5), there is a direct analogy between the combinatorial-type filter synthesis procedure and the combination of all the layers' weighted input vectors.
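As a minimal sketch of the linear combinatorial-type (SDF) synthesis of Eqs. (1) and (2), the Python code below assumes NumPy and treats R as the matrix of zero-shift inner products between the row-by-row vectorised training images. It builds h from random stand-in images and checks that the correlation-plane origin value of h with each training image equals the constraint c_i. The function name `sdf_filter` and the test images are hypothetical; they only illustrate the equations and are not part of the G-HONN implementation.

```python
import numpy as np

def sdf_filter(train_images, c):
    """Synthesise h = sum_i a_i x_i so that <h, x_i> = c_i for every training image."""
    # Stack the N training images as rows of an N x (m*n) matrix (row-by-row vectorisation).
    X = np.stack([img.reshape(-1) for img in train_images]).astype(float)
    # R_ij = <x_i, x_j>: correlation of image i with image j at zero shift.
    R = X @ X.T
    # a = R^{-1} c, solved without forming the inverse explicitly.
    a = np.linalg.solve(R, np.asarray(c, dtype=float))
    # Weighted linear combination of the training images, reshaped back to m x n.
    h = (a @ X).reshape(train_images[0].shape)
    return h, a

rng = np.random.default_rng(0)
train = [rng.random((8, 8)) for _ in range(3)]
c = [1.0, 1.0, 0.0]  # two true-class views, one false-class view
h, a = sdf_filter(train, c)
# Correlation-plane origin value of the filter with each training image.
peaks = [float(np.sum(h * x)) for x in train]
```

Because h·x_i = Σ_j a_j <x_j, x_i> = (R a)_i, the peaks reproduce c by construction; this is the linear baseline that the HONN-type systems replace with a non-linear combination.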
There are two possible and equivalent custom designs (The Mathworks, 2008) of NNET architectures which could be used to form the basis of the combinatorial-type filter synthesis. In both designs, each neuron of the hidden layer is trained with only one of the training set images.
Next, in Section 2 we give a brief description of the design and implementation of the G-HONN system, which has already been described in detail in the literature. Section 3 describes the M-HONN system. Section 4 focuses on multiple objects recognition and the M-HONN system's design; it describes the augmented design of the NNET block for accommodating the recognition of multiple objects of different classes. Section 5 discusses the performance of the M-HONN system with respect to peak sharpness and detectability, distortion range and discrimination ability. We also discuss the M-HONN system in relation to biologically-inspired knowledge learning and representation. Finally, we record the series of tests we conducted with the M-HONN system for recognising multiple objects of the same class and of different classes within clutter. Section 6 concludes and suggests future work.

General HONN filter's design and implementation
The newly designed NNET architecture of the G-HONN system is implemented as a feedforward multi-layer architecture trained with a backpropagation algorithm. It has a single input source (as explained in the previous section), with a number of input neurons equal to the size of the training image in vector form. In effect, for the training images x_i (i = 1, ..., N) [...]. Assuming there is only a single output neuron in the output layer, there is only one target connection for that output neuron.
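The single-input-source architecture with one log-sigmoid neuron per hidden layer can be sketched as follows. This is a hypothetical NumPy illustration, not the authors' implementation: the class name `HonnStyleNet`, the uniform initialisation and the linear output neuron are assumptions made only to show how every hidden neuron sees the same vectorised image, while only neuron i's input weights are allowed to change when training image i is presented.

```python
import numpy as np

def logsig(z):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-z))

class HonnStyleNet:
    """Hypothetical sketch: N hidden layers of one log-sigmoid neuron each,
    all fed by the same vectorised input image, plus one output neuron."""

    def __init__(self, n_inputs, n_train, rng):
        # One row of input weights per hidden neuron (i.e. per training image).
        self.W_in = rng.uniform(-0.5, 0.5, size=(n_train, n_inputs))
        self.b_hid = rng.uniform(-0.5, 0.5, size=n_train)
        # Layer weights connecting every hidden neuron to the single output neuron.
        self.w_layer = rng.uniform(-0.5, 0.5, size=n_train)
        self.b_out = 0.0

    def forward(self, image):
        x = np.asarray(image, dtype=float).reshape(-1)  # row-by-row vectorisation
        hidden = logsig(self.W_in @ x + self.b_hid)
        y = float(self.w_layer @ hidden + self.b_out)  # linear output (assumption)
        return y, hidden

    def input_weight_mask(self, i):
        """Selective connection: only hidden neuron i's input weights may be
        updated while training image i is presented; the others stay frozen."""
        mask = np.zeros(self.W_in.shape, dtype=bool)
        mask[i] = True
        return mask

rng = np.random.default_rng(1)
net = HonnStyleNet(n_inputs=8 * 8, n_train=3, rng=rng)
y, hidden = net.forward(rng.random((8, 8)))
mask = net.input_weight_mask(1)
```

A training loop would multiply the input-weight gradient by such a mask, so that earlier learning held in the other hidden neurons is left untouched.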
We apply the Nguyen-Widrow initialisation algorithm (Nguyen & Widrow, 1989; Nguyen & Widrow, 1990) to set the initial values of the input weights, the layer weights and the biases. The transfer function of the hidden layers is the log-sigmoid function. When a new training image is presented to the NNET, we leave connected the input weights of only one of the hidden neurons. In order not to upset any previous learning of the other hidden layer neurons, we do not alter their weights when the new image is input to the NNET. It is emphasised that no separate feature extraction stage (The Mathworks, 2008; Talukder & Casasent, 1999; Casasent et al., 1998) is applied to the training set images. To achieve faster learning, we used a modified steepest descent (Looney, 1997; The Mathworks, 2008) backpropagation algorithm based on heuristic techniques. This adaptive training algorithm updates the weights and bias values according to gradient descent with momentum and an adaptive learning rate; its update equations involve the update function of the biases of the layers and the momentum constant. The momentum (Looney, 1997; Haykin, 1999; Beale & Jackson, 1990; The Mathworks, 2008) allows the network to respond not only to the local gradient but also to recent trends in the error surface. It thus acts like a low-pass filter, removing the small features in the error surface of the NNET, so that the network does not get stuck in a shallow local minimum but slides through it. P_f is the performance function, usually set to the mean square error (mse) (Looney, 1997; Haykin, 1999). If the performance function exceeds its maximum allowed value P_f^max, the learning rate is decreased by a constant factor. The layer weights remain connected with all the hidden layers for the whole training set and throughout the training session.
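The heuristic training rule described above, gradient descent with momentum and an adaptive learning rate, can be sketched on a toy quadratic error surface. The sketch follows the general scheme of MATLAB's `traingdx` training function (momentum, a learning-rate increase when the error falls, and a learning-rate decrease with step rejection when the error grows too much), but the exact update formula and the parameter values used here are assumptions for illustration, not the settings used for the G-HONN system.

```python
import numpy as np

def train_gd_momentum_adaptive(grad_fn, perf_fn, w0, lr=0.01, mc=0.9,
                               lr_inc=1.05, lr_dec=0.7, max_perf_inc=1.04,
                               epochs=300):
    """Gradient descent with momentum and an adaptive learning rate (sketch)."""
    w = np.asarray(w0, dtype=float)
    dw_prev = np.zeros_like(w)
    perf = perf_fn(w)
    for _ in range(epochs):
        # Momentum: blend the previous step with the new gradient step,
        # which smooths out small features of the error surface.
        dw = mc * dw_prev - (1.0 - mc) * lr * grad_fn(w)
        w_new = w + dw
        perf_new = perf_fn(w_new)
        if perf_new > perf * max_perf_inc:
            # Error grew too much: reject the step and shrink the learning rate.
            lr *= lr_dec
            dw_prev = np.zeros_like(w)
            continue
        if perf_new < perf:
            lr *= lr_inc  # error fell: allow larger steps
        w, perf, dw_prev = w_new, perf_new, dw
    return w, perf, lr

# Toy mean-square-error surface with its minimum at w = [1, -2].
target = np.array([1.0, -2.0])

def perf_fn(w):
    return float(np.mean((w - target) ** 2))

def grad_fn(w):
    return 2.0 * (w - target) / w.size

w0 = np.array([5.0, 3.0])
w_final, perf_final, lr_final = train_gd_momentum_adaptive(grad_fn, perf_fn, w0)
```

Rejecting a step and resetting the momentum term keeps the adaptive learning rate from running away, which is the behaviour the text attributes to the maximum allowed value of P_f.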
Having described the design and implementation of the G-HONN filter (object recognition system), we can now proceed with a detailed description of the modified-HONN filter.

Modified-HONN system implementation
We can make the following qualitative observations about the G-HONN system. Although the combinatorial-type filters (Stamos, 2001) contain no information on the non-reference objects of the training set used during their synthesis, the NNET includes information on both the reference and the non-reference images of the true-class object. This can be explained by the NNET interpolating non-linearly (Kypraios et al., 2002) between the reference images included in the training set and forcing all the non-reference images to follow the activation graph. Moreover, the NNET generalises between all the reference and non-reference images. Quantitatively, we can demonstrate the above observations as follows. The average training set image $\bar{x}$ in the space domain of the combinatorial-type filters is given by:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (15)$$

In the frequency domain, Eq. (15) is written as:

$$\bar{\mathbf{X}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{X}_i \qquad (16)$$

The activation function of each hidden neuron of an artificial neural network with a non-linear activation function, such as the sigmoidal function f_s, can take the form:
[...]
where two parameters shift the graph of the function with respect to the x-axis and the y-axis; these are called the saturation level and the slope. It can be shown (Kypraios et al., 2009) that the output y_N of an artificial neural network with a non-linear activation function, corresponding to the inputs s_i for i = 1, ..., N (where N is the number of training set images), is written as:

$$y_N = f_s = \frac{1}{1 + k\,\exp(-s_1 g_1)\,\exp(-s_2 g_2)\cdots\exp(-s_N g_N)} \qquad (18)$$

where k = exp(·) takes a constant value and g_i are the neural network node weights. Therefore, from Eqs. (16) and (18), any artificial neural network with a non-linear activation function can non-linearly interpolate through the different training set views of the true-class object. Thus, the average training set image $\bar{x}$ in the space domain of the NNET is given by:
[...]
where f_λ is the activation function of node λ in the space domain. Eq. (19) is written in the frequency domain as:
[...]
Motivated by these observations, we apply an optical mask to the filter's input (see Fig. 3). The mask is constructed from the weight connections of the reference images of the true-class object and is applied to all the tested images. The modified-HONN (M-HONN) system is described as follows:
[...]
where the weight terms indexed by (m, n) are the input and layer weights from the hidden neuron of the layer vector element at row m and column n to the associated output neuron q. This time, instead of multiplying each training image with the corresponding weight connections as in the G-HONN system's implementation, we keep the weight connection values constant, setting them equal to those of a randomly chosen image included in the training set. Thus, the M-HONN system's transfer function is formulated as follows:
[...]
In Eq. (23) we have chosen to constrain the correlation peak height values as we did in the constrained-HONN (C-HONN) system's implementation, but the system's transfer equation can also easily be rewritten for unconstrained peak height values, as in the unconstrained-HONN (U-HONN) system's implementation (Mahalanobis, 1994; Kypraios et al., 2004b).
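The product-of-exponentials form of Eq. (18) follows from writing a logistic (log-sigmoid) neuron with bias b as 1/(1 + exp(-(Σ s_i g_i + b))) and factorising the exponential, so that k = exp(-b) is the constant. The NumPy check below assumes exactly this logistic form with a single bias b (a simplification of the saturation-level and slope parameters). It confirms the identity numerically and shows that the output at the midpoint between two inputs differs from the average of their outputs, i.e. the neuron interpolates non-linearly.

```python
import numpy as np

def logsig_output(s, g, b):
    """Log-sigmoid neuron: y = 1 / (1 + exp(-(sum_i s_i g_i + b)))."""
    return 1.0 / (1.0 + np.exp(-(np.dot(s, g) + b)))

def product_form(s, g, b):
    """Same output written as 1 / (1 + k * prod_i exp(-s_i g_i)), with k = exp(-b)."""
    k = np.exp(-b)
    return 1.0 / (1.0 + k * np.prod(np.exp(-np.asarray(s) * np.asarray(g))))

rng = np.random.default_rng(2)
s, g, b = rng.normal(size=5), rng.normal(size=5), 0.3
direct, factorised = logsig_output(s, g, b), product_form(s, g, b)

# Non-linear interpolation: output at the midpoint of two inputs vs. mean of outputs.
g3 = np.ones(3)
s1, s2 = np.zeros(3), np.full(3, 2.0)
y_mid = logsig_output((s1 + s2) / 2, g3, 0.0)  # logsig(3) ≈ 0.953
y_avg = (logsig_output(s1, g3, 0.0) + logsig_output(s2, g3, 0.0)) / 2  # ≈ 0.749
```

A linear combinatorial-type filter would give y_mid equal to y_avg; the gap between them is the non-linear interpolation the text refers to.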

Multiple objects recognition
Multiple objects of the same class can be recognised within a cluttered input image by G-HONN-type filters, owing to the shift invariance inherited from their correlator unit. In the M-HONN system, all the training set images first pass through the NNET unit. This time, instead of multiplying each training image with the corresponding weight connections (mask) as in the C-HONN filter, we keep the weight connection values constant, setting them equal to the weight connection values of a randomly chosen image included in the training set. All the test set images are multiplied by the same randomly chosen image's weight connection values. The training set images, after being transformed (masked) through the NNET unit, then pass through the correlator unit, where they are correlated with the masked test set images. In effect, the cross-correlation of each masked test set image with the transformed training set images (reference kernel) returns an output correlation plane value for each cross-correlation shift. Hence, the maximum peak height values of the output correlation plane correspond to the recognised true-class objects.
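The masking-then-correlation step can be illustrated with a simplified NumPy sketch. Assumptions: a single masked reference image stands in for the full composite image, the mask is an arbitrary positive array rather than learned NNET weights, the correlation is computed digitally as a circular FFT correlation (the digital counterpart of the optical correlator), and the helper names `cross_correlate` and `top_peaks` are hypothetical. Two copies of a random ±1 pattern are placed in an otherwise empty scene, and the two largest correlation peaks recover their positions, which is the shift-invariance property the text relies on.

```python
import numpy as np

def cross_correlate(scene, kernel):
    """Circular cross-correlation c[r] = sum_n scene[n + r] * kernel[n], via FFT."""
    K = np.fft.fft2(kernel, s=scene.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * np.conj(K)))

def top_peaks(plane, n, exclusion=3):
    """Return the n highest peaks, blanking a small window around each one found."""
    p = plane.copy()
    found = []
    for _ in range(n):
        r, c = np.unravel_index(np.argmax(p), p.shape)
        found.append((int(r), int(c)))
        p[max(r - exclusion, 0):r + exclusion + 1,
          max(c - exclusion, 0):c + exclusion + 1] = -np.inf
    return found

rng = np.random.default_rng(3)
obj = rng.choice([-1.0, 1.0], size=(8, 8))  # stand-in "object" pattern
mask = 1.0 + 0.1 * rng.random((32, 32))      # stand-in mask (hypothetical)

reference = np.zeros((32, 32))
reference[:8, :8] = obj                     # training view at the origin
scene = np.zeros((32, 32))
scene[5:13, 6:14] = obj                     # two copies in the test scene
scene[20:28, 18:26] = obj

# Mask both the reference and the test scene with the same mask, then correlate.
plane = cross_correlate(scene * mask, reference * mask)
peaks = top_peaks(plane, 2)
```

Each peak location equals the shift of one object copy relative to the reference, so several same-class objects are found in one correlation pass.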

Modified NNET block architecture for multiple objects of different classes recognition
As in all the HONN-type systems (Kypraios et al., 2004; Kypraios et al., 2003; Kypraios et al., 2009), the M-HONN system's NNET block (unit) has a single input source used for all the input data. Assuming we have N = 3 input still images or video frames of size 256×256 pixels, the input source consists of 65,536 (i.e. 1×(256×256)) input neurons, equal to the size of each training image or frame in vector form. By definition (Hagan et al., 1996), each layer must have the same input connections to each of its hidden neurons. Therefore, the NNET architecture shown is referred to as (N+1) = 3+1 = 4 four-layered, since there are N = 3 hidden neurons (although shown aligned under each other, they do not belong to the same hidden layer but form three separate hidden layers of a single hidden neuron each) and one output layer. The input layer does not contain neurons with activation functions and so is omitted from the numbering of the layers. The first hidden neuron is trained with the training still image or video frame x_1, the second with x_2, and so on, ending with the N-th neuron being trained with the training still image or video frame x_N. Thus, the number of input weights increases proportionally to the size of the training set:

$$N_{iw} = N \times [m \times n]$$

where N_iw is the number of input weights, N is the size of the training set (equal to the number of training images) and [m×n] is the size of the training set images. For two classes, N transformed images are created for class 1 and N transformed images for class 2. Both sets of transformed images are then used for the synthesis of the system's composite image. The M-HONN system for the recognition of multiple objects of different classes is written as follows:
[...]
or, in the frequency domain:
[...]
Eq. (31) in the spatial domain and Eq. (32) in the frequency domain describe the M-HONN system's transfer function for multiple objects recognition (where the superscript class is the class index, i.e. for Fig. 4, class = class1, class2). Thus, the M-HONN filter (robust object recognition system) is composed of a non-linear space-domain superposition of the training set images or of the video frames of the training set video sequences. As in all the HONN-type systems, the multiplying coefficient now becomes a non-linear function of the input weights and the layer weights, rather than the simple linear multiplying constant used in a constrained linear combinatorial-type filter synthesis procedure. The non-linear M-HONN system is inherently shift invariant and may be employed in an optical correlator in the same way as a linear-superposition constrained-type filter, such as the synthetic discriminant function (SDF)-type filters (Bahri & Kumar, 1988). It may be used as a space-domain function in a joint transform correlator architecture, or be Fourier transformed and used as a Fourier-domain filter in a 4-f Vander Lugt-type optical correlator (Vander Lugt, 1964).
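The architecture bookkeeping in this section can be checked with a few lines of Python. The sketch assumes row-by-row (C-order) vectorisation and counts the input weights as N × [m × n], one full set per hidden neuron; the function names are hypothetical. For N = 3 images of 256 × 256 pixels this gives 65,536 input neurons, a four-layer network and 196,608 input weights.

```python
import numpy as np

def vectorise_row_by_row(image):
    """Concatenate the rows of an m x n image into a single 1 x (m*n) vector."""
    return np.asarray(image).reshape(1, -1)  # C order = row by row

def honn_layer_count(n_train):
    """N single-neuron hidden layers plus one output layer (input layer not counted)."""
    return n_train + 1

def honn_input_weights(n_train, m, n):
    """N_iw = N x [m x n]: every hidden neuron has its own full set of input weights."""
    return n_train * m * n

n_inputs = vectorise_row_by_row(np.zeros((256, 256))).shape[1]  # 65536
layers = honn_layer_count(3)                                   # 4
n_iw = honn_input_weights(3, 256, 256)                         # 196608
```

The linear growth of N_iw with N is why the input-weight count becomes the dominant memory cost as the training set grows.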

Performance analysis
We constructed a data set of input images of an S-type Jaguar car model at 10° increments of out-of-plane rotation, at an elevation angle of approximately 45°, to be used with the M-HONN system. A second set of images was constructed for the Police car model Mazda Efini RX-7 at the same elevation angle, to serve as the out-of-class data for discrimination tests (see Fig. 5). A third data set was created from background images of typical car parks (see Fig. [...]). [...], there would, in total, be more than half a million input weight connections needed. To overcome this problem, we developed a novel selective weight connection architecture (see Section 2). Also, applying the heuristic training algorithm with momentum and an adaptive learning rate to the NNET training session (Nguyen & Widrow, 1989; Nguyen & Widrow, 1990) sped up the learning phase and reduced the memory needed to complete the training session. It is worth mentioning that the NNET block, and the M-HONN system overall, is able to process the input still images and video frames of all the test series in a few ms on a dual-core 2.4 GHz CPU with 4.0 GB of RAM. Additionally, owing to the generalisation properties of an NNET architecture, fewer training images are needed than the typical number required for the training set of linear combinatorial filters (such as the SDF filter). [...] When we increase C_l, the resulting M-HONN system behaves more like a high-pass biased filter, which generally gives sharp correlation peaks and good clutter suppression but is more sensitive to intra-class distortions. When we decrease C_l, the resulting M-HONN system behaves more like a minimum variance synthetic discriminant function (MVSDF) filter (Kumar, 1986), with relatively good intra-class distortion invariance but broad correlation peaks. In effect, as C_l increases, the M-HONN system possesses better discriminatory properties: increasing the C_l value increases the emphasis on the high spatial frequency content of the composite images comprising the M-HONN system, which in turn leads to a more localised response, sharper peaks and reduced sidelobes in the correlation plane. Decreasing the C_l value increases the emphasis on the lower spatial frequency content of the composite images, which in turn leads to broader peaks in the correlation plane.
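The link between high spatial frequency emphasis and peak sharpness can be illustrated with a generic NumPy sketch. It does not reproduce the M-HONN parameter C_l; instead it assumes a hypothetical fractional-power filter H = X / (|X| + ε)^α applied to a Gaussian blob, where α = 0 is a matched filter and larger α flattens the spectrum, i.e. emphasises the high spatial frequencies. The number of correlation-plane pixels at or above half the peak height serves as a crude peak-width measure.

```python
import numpy as np

def correlation_plane(image, alpha, eps_rel=1e-3):
    """Autocorrelation plane of `image` through H = X / (|X| + eps)^alpha.
    alpha = 0: matched filter; larger alpha: stronger high-frequency emphasis."""
    X = np.fft.fft2(image)
    eps = eps_rel * np.abs(X).max()
    H = X / (np.abs(X) + eps) ** alpha
    c = np.real(np.fft.ifft2(X * np.conj(H)))
    return np.fft.fftshift(c)  # move the zero-shift peak to the centre

def half_max_area(plane):
    """Number of pixels at or above half the peak height (crude peak width)."""
    return int(np.count_nonzero(plane >= 0.5 * plane.max()))

yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 3.0 ** 2))
areas = [half_max_area(correlation_plane(blob, a)) for a in (0.0, 1.0, 2.0)]
```

The half-maximum area shrinks as α grows, mirroring the trade-off described above: stronger high-frequency emphasis gives a more localised response, while low-frequency emphasis gives broader peaks.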
Next, we summarise the test series for assessing the M-HONN system's peak sharpness and detectability, distortion range and discrimination ability, all of which we have described in full detail in our previous work (Kypraios et al., 2008). We then focus on analysing the performance of the M-HONN object recognition system within cluttered scenes.

Peak sharpness and detectability
Here we assessed (Jamal-Aldin et al., 1997; Jamal-Aldin et al., 1998; Kumar & Hassebrook, 1990) the M-HONN system's ability to detect non-training in-class images oriented at intermediate angles of view between the training images (Refregier, 1990; Refregier, 1991). The training set consisted of still images out-of-plane rotated between 20° and 70° at increments of 20°. We tested the M-HONN system with the true-class object's intermediate car poses over the same range at 10° increments. Two randomly chosen intermediate car poses, at 130° and at 140°, were added to the training set of the M-HONN system to create a false class. We set the target classification levels of the false-class object, T_false, and of the true-class object, T_true, to a magnitude of 40. The M-HONN system had no information on the non-training, intermediate car images in the construction of its composite image. We explicitly constrained the correlation peaks in the constraint matrix to be 1 for the images of the true-class object and 0 for the images of the false-class object. The randomly chosen mask c applied to both the training set and the test set was built from the training set image at 60°, i.e. c is taken from the 60° view:
[...]
where the weight terms indexed by (m, n) are the layer weights from the hidden neuron of the layer vector element at row m and column n to the associated output neuron. We set q = 1, since the output layer had only one neuron for a single class of objects. In the M-HONN system, instead of multiplying each training image with the corresponding weight connections as in the constrained-HONN (C-HONN) system, we keep the weight connection values constant, setting them equal to those of a (randomly) chosen image included in the training set, here [...]. The consistency of the correlation peak values exhibited by the M-HONN system demonstrates the system's ability to interpolate well between the intermediate car poses at 10° increments. Earlier (Kypraios et al., 2008; Kypraios, 2009; Kypraios, 2010), we showed that the NNET includes information on both the reference and the non-reference images of the true-class object. Hence, the NNET interpolates non-linearly between the reference and non-reference images to follow the activation function graph. Moreover, the NNET is able to generalise between all the reference and non-reference images.

www.intechopen.com
Fig. 7 (b) shows the non-normalised peak-to-correlation energy (PCE) (Kumar & Hassebrook, 1990) values for the M-HONN system. The graph shows that the M-HONN system produced PCE values for the intermediate non-training images close to those produced by the training car images. In effect, the system maintains correlation peak sharpness for both the in-class training and non-training images.
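The peak-to-correlation energy measure used here can be sketched as follows, assuming the common definition PCE = |c_peak|² / Σ|c|², i.e. the squared correlation peak divided by the total energy of the correlation plane (Kumar & Hassebrook, 1990), with no further normalisation. A perfectly sharp (delta-like) plane gives PCE = 1, while a flat plane of P pixels gives 1/P; the function name is hypothetical.

```python
import numpy as np

def peak_to_correlation_energy(plane):
    """Non-normalised PCE: squared peak magnitude over total correlation-plane energy."""
    mag2 = np.abs(np.asarray(plane)) ** 2
    return float(mag2.max() / mag2.sum())

delta = np.zeros((16, 16))
delta[8, 8] = 1.0
flat = np.ones((4, 4))
pce_sharp = peak_to_correlation_energy(delta)  # 1.0
pce_flat = peak_to_correlation_energy(flat)    # 1/16 = 0.0625
```

Because PCE rewards energy concentrated at the peak, similar PCE values for training and non-training images indicate that the peaks stay equally sharp.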

Distortion range
The second test series (Jamal-Aldin et al., 1997; Jamal-Aldin et al., 1998; Kumar & Hassebrook, 1990) was carried out to assess the distortion range (Refregier, 1990; Refregier, 1991; Kypraios et al., 2008) [...]. We found that the system performs well in recognising all the intermediate car poses of the test set. The correlation-peak heights of the in-class input images intermediate between two training images lie within a band of greater than 76% of the pre-specified peak-height constant in the constraint matrix C for the M-HONN system. From the graph it can be observed that the system tolerated orientation changes over a range of [...].

Training sets
We have conducted several tests (Kypraios et al., 2009) [...]. All the test and training input still images and video frames were concatenated row by row into vector form before being processed by the NNET block of the M-HONN system.

Biologically-inspired knowledge representation and learning
As Haykin (1999) observes in his work on artificial neural network architectures, pattern recognition systems need to be redesigned with novel architectures if they are to solve more complex problems. He argues that such novel architectures should be designed with separate blocks for a recognition unit and a knowledge learning unit, and that such designs can only be implemented by combining artificial neural network architectures with other tools in a hybrid. Among the elements that such biologically-inspired hybrid systems need to exploit (Haykin, 1999) are the non-linearity of the input information and learning and adaptation to the input information; they also need to provide an attentional mechanism that allows the hybrid system to select certain information for inclusion in its learning over other input. Therefore, knowledge representation and learning become a central issue in the design and implementation of such hybrid biologically-inspired pattern recognition systems (Lee & Portier, 2007).
Aler et al. (2000) discuss knowledge representation and its role in knowledge learning. They examine the effects that altering the knowledge representation can have on the problem knowledge learned and on problem solving. They consider any problem-solving system to consist of a domain theory, which specifies the task to be solved, the initial problem states and the targeted problem goals, and control knowledge, which guides the decision-making process. They demonstrated the effects of knowledge representation on the efficiency of the problem-solving process.
Recent work we have conducted (Kypraios, 2010) has demonstrated the problem-solving ability of HONN-type systems, such as the M-HONN system for multiple objects recognition. In particular, we have shown that the system is able to solve different visual tasks. Fig. 9 shows the first problem on which we tested the M-HONN system: recognising different angles of view of the input object. The training set consisted of still images of the Jaguar S-type car out-of-plane rotated over the range 0° to 170°, belonging to the true class, and still images of the Jaguar S-type car out-of-plane rotated over the range 180° to 360°, belonging to the false class.
The true-class images were constrained to unit correlation peak height and the false-class images to zero correlation peak height in the synthesis of the M-HONN system's composite image (see Fig. 10). We set the true-class and false-class target classification levels, T_true^class1 and T_false^class1, to a magnitude of 40 (here we assume there is only one class, class = 1, so there is no need to set any target connections for a second output neuron). The test set consisted of multiple Jaguar S-type car objects inserted on a plain background at different non-training out-of-plane rotation angles over the range 0° to 360°. As shown in Fig. 9, the M-HONN system correctly recognised the Jaguar S-type car poses over the range 0° to 170° as belonging to the true class, and the poses over the range 180° to 360° as belonging to the false class. We have indicated the recognised true-class objects with a solid line and the recognised false-class objects with a dashed line.
Fig. 11 shows the second test we conducted to demonstrate the system's problem-solving ability, in which we want the system to recognise only the true-class objects of the Jaguar S-type car over the range 0° to 360° and to reject the false-class objects of the RX-7 Mazda Efini Police patrol car over approximately the same range. The training set consisted of still images of the Jaguar S-type car out-of-plane rotated over the range 0° to 360°, belonging to the true class, and still images of the RX-7 Mazda Efini Police patrol car out-of-plane rotated over approximately the range 0° to 360°, belonging to the false class. The true-class images were constrained to unit correlation peak height and the false-class images to zero correlation peak height in the synthesis of the M-HONN system's composite image. We set the true-class and false-class target classification levels, T_true^class1 and T_false^class1, to a magnitude of 240 (again with only one class, class = 1, so there is no need to set any target connections for a second output neuron). Here, we set higher target classification levels to increase the inter-class discrimination ability of the M-HONN system. It is worth mentioning that we could instead have set class = 2 and a target classification level T_true^class2, with no need to include false-class objects, and adjusted the constraint matrix of the system's composite image to different fixed correlation peak-height values for class 1 and class 2. The test set consisted of non-training input still images of Jaguar S-type cars and of RX-7 Mazda Efini Police patrol cars, inserted on a plain background at different out-of-plane rotation angles over the range 0° to 360°. As shown in Fig. 11, the M-HONN system successfully recognised the Jaguar S-type car poses over the range 0° to 360° as belonging to the true class, and the RX-7 Mazda Efini car poses over approximately the same range as belonging to the false class. Again, we have indicated the recognised true-class objects with a solid line and the recognised false-class objects with a dashed line. [...] During the process of inserting the objects into the car park scene, some Gaussian noise is added too. The M-HONN system was able to discriminate correctly between class 1 and class 2. However, in this test the emphasis was on studying the effect that knowledge representation, in the form of the composite image synthesis, has on problem solving. In effect, as shown in Fig. 13, when we have chosen to build the input mask c [...]
From the above observations and experiments, the M-HONN system, like all HONN-type systems, combines in its design a knowledge representation unit, the optical correlator block, with a knowledge learning unit, the NNET block. Moreover, we have shown in previous work (Kypraios et al., 2002) that HONN-type systems, such as M-HONN, non-linearly combine the weighted input training set extracted by the NNET block. In effect, in HONN-type systems the attentional mechanism is provided by the weights extracted by the NNET block, which select certain features for inclusion in the composite image over others. Additionally, as shown above, the M-HONN system can learn and adapt to the input information depending on the training set itself. Here, the training set comprises the domain theory of the task to be solved, the initial problem states and the problem goals are given by the true-class and false-class classification levels, and the synthesised composite image provides the control knowledge that guides the decision-making process.

Multiple objects recognition
Here, we summarise several tests we previously conducted to explicitly test the M-HONN system's ability to recognise multiple objects of different classes (Kypraios et al., 2008). In the first series of tests, the training set consisted of three Jaguar S-type car images out-of-plane rotated at 40°, 60° and 80°, belonging to class 1, and three Ferrari Testarossa frames extracted from a recorded video sequence, belonging to class 2. For our application purposes it was found to be adequate to set class1 … In the second series of tests, we aimed to assess the ability of the M-HONN system to recognise multiple objects of different classes within a cluttered video sequence.

Conclusion and future work
We have described the design and implementation of the M-HONN system. In particular, we focused on the design and implementation of the M-HONN system for recognising multiple objects of the same class and of different classes. The shift invariance inherited from the system's optical correlator block allows it to recognise multiple objects of the same class. The cross-correlation of each masked test set image with the transformed reference kernel returns an output correlation plane peak value for each cross-correlation step. Thus, the maximum peak-height values of the output correlation plane correspond to the recognised true-class objects. By augmenting the output layer of the NNET block of the M-HONN system, we can accommodate the recognition of multiple objects of different classes. In effect, we increase the number of output layer neurons in proportion to the number of different object classes, assigning one output neuron to each class. It was shown experimentally that by choosing different values of the classification levels for the true-class ($T_{Cl}$) and false-class ($F_{Cl}$) objects, we can control the M-HONN system's behaviour, which can be varied from behaving more like a high-pass-biased filter, which generally gives sharp correlation peaks and good clutter suppression but is more sensitive to intra-class distortions, to behaving more like an MVSDF filter, which generally gives broader correlation peaks but is more robust to intra-class distortions of the input objects.
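The shift-invariance argument can be sketched with a frequency-domain cross-correlation: every translated copy of the reference in the scene produces its own peak in the output plane, so several same-class objects are located by picking several peaks. This is a minimal illustration under stated assumptions: a plain reference kernel is used instead of the M-HONN transformed reference kernel, and `find_peaks` is a hypothetical greedy peak picker that masks an exclusion zone around each detected peak.

```python
import numpy as np

def cross_correlate(scene, kernel):
    """Circular cross-correlation computed with the FFT.

    out[u, v] = sum over (x, y) of scene[x + u, y + v] * kernel[x, y] (indices modulo
    the scene size), so a copy of the kernel whose top-left corner is at (u, v)
    in the scene produces a peak at (u, v).
    """
    S = np.fft.fft2(scene)
    K = np.fft.fft2(kernel, s=scene.shape)  # zero-pad the kernel to the scene size
    return np.real(np.fft.ifft2(S * np.conj(K)))

def find_peaks(plane, n_peaks, exclusion):
    """Greedily pick the n highest peaks, masking a square of half-width
    `exclusion` around each one before searching for the next."""
    work = plane.copy()
    peaks = []
    for _ in range(n_peaks):
        r, c = np.unravel_index(np.argmax(work), work.shape)
        peaks.append((int(r), int(c)))
        work[max(r - exclusion, 0):r + exclusion + 1,
             max(c - exclusion, 0):c + exclusion + 1] = -np.inf
    return peaks

rng = np.random.default_rng(1)
reference = rng.random((16, 16))        # stand-in reference object
scene = np.zeros((128, 128))
scene[10:26, 20:36] = reference         # first copy, top-left corner at (10, 20)
scene[70:86, 90:106] = reference        # second copy, top-left corner at (70, 90)

plane = cross_correlate(scene, reference)
locations = find_peaks(plane, n_peaks=2, exclusion=8)
```

Both copies yield the same maximum peak height, and the two detected locations are the top-left corners of the inserted objects, whatever their position in the scene.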
We have assessed the performance of the M-HONN system by conducting several series of tests. We assessed the system's ability to detect non-training in-class images oriented at the intermediate angle of view between the training images. From the recorded results, we were able to show that the system interpolates well between the intermediate car poses. The system maintained correlation peak sharpness for the in-class training and non-training images. More specifically, the M-HONN system is able to interpolate non-linearly between the reference and non-reference images, following the activation function graph. The NNET block is able to generalise between all the reference and non-reference images. Next, we tested the M-HONN system's distortion range. From the recorded results, we have shown that the system exhibited a high distortion range, recognising all the intermediate car poses of the test set over the range $\Theta_3 = [5^\circ, 40^\circ]$ (bisector angle). The third series of tests we conducted assessed the discrimination ability of the M-HONN system. From the recorded results, we have shown that the system successfully discriminates between objects of different classes while retaining invariance to in-class distortions.
We have analysed the M-HONN system's biologically-inspired hybrid design and found that, as in the G-HONN-type systems, it combines a knowledge representation unit, the optical correlator block, with a knowledge learning unit, the NNET block. We conducted several experiments to test the system's problem-solving abilities. The M-HONN system was able to solve the visual task of recognising certain Jaguar S-type car poses as belonging to the true class while rejecting other Jaguar S-type car poses. Also, the M-HONN system was able to solve the visual task of recognising only the true-class objects of the Jaguar S-type car while rejecting the false-class objects of the RX-7 Mazda Efini Police patrol car.
The last series of tests aimed to assess the M-HONN system's performance in recognising multiple objects of different classes within clutter. We tested the system with a recorded video sequence. The system successfully suppressed the unknown background clutter during the whole length of the video sequence and correctly recognised class 1 and class 2 objects. Overall, the M-HONN system was able to correctly recognise true-class objects that were out-of-plane rotated, translated off-centre and inserted into background scenes. It is emphasised that the system was able to recognise the true-class objects within an unknown background clutter scene, since we did not include any background information in its training set. Additionally, the M-HONN system exhibited all the invariance properties simultaneously with a single pass over the input data sets. In effect, as can be seen from its transfer function, the M-HONN system is neither a multi-stage filter nor does it require any pre-processing of the input data to maintain its invariance properties. There is no need for a separate background segmentation pre-processing stage prior to the system's object tracking, as in the case of other motion-based segmentation and object tracking techniques. Instead, the M-HONN system is able to successfully suppress the background clutter and track the recognised true-class object throughout the video sequence.
In future, we would like to assess the performance of each output neuron of the M-HONN system's NNET block individually and record their detectability, distortion range and discrimination ability metrics separately. Also, we believe that the M-HONN system's design can be extended to accommodate three-dimensional (3D) object recognition. Similarly to stereo vision systems (Lowe, 1987; Xu & Zhang, 1996; Sumi et al., 2002), the M-HONN system's design can be extended with a second input mask for training image $x_N$.

In the first design the number of input sources is kept constant, whereas in the second design the number of input sources is equal to the number of training images. In both designs each hidden neuron learns one of the training images. In effect, the number of input weights increases proportionally to the size of the training set:

$$N_{iw} = N \times [m \times n]$$

where $N_{iw}$ is the number of input weights, $N$ is the size of the training set (equal to the number of training images) and $[m \times n]$ is the size of each training set image. The latter design would allow a parallel implementation, since all the training images could be input through the NNET in parallel via the parallel input sources. However, for ease of implementation, we chose the former design of the NNET. Let us assume there are three training images of a car, of different angles of view and each of 10,000 pixels in vector form, to pass through the NNET. The chosen first design (see Fig. 1) consists of one input source used for all the training images. The input source consists of 10,000 input neurons, equal to the size of each training image (in vector form). Each layer needs, by definition, to have the same input connections to each of its hidden neurons. However, Fig. 1 is referred to as a four-layer network, since there are three hidden layers (shown here aligned under each other) and one output layer. The input layer does not contain neurons with activation functions and so is omitted in the numbering of the layers. Each of the hidden layers has only one hidden neuron. Though the network is initially fully connected to the input layer during the training stage, only one hidden layer is connected for each training image presented through the NNET. Fig. 1 is thus not a contiguous three-(hidden-)layer network during training, which is why the distinction is made.
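The weight count $N_{iw} = N \times [m \times n]$ and the one-hidden-layer-per-image connectivity can be checked with a few lines of Python. The function names are hypothetical, and the boolean mask is only a bookkeeping illustration of which input-to-hidden connections are active while one training image is presented; it does not reproduce the NNET training algorithm.

```python
import numpy as np

def input_weight_count(n_images, n_pixels):
    """N_iw = N * (m * n): a single input source of m * n neurons fully
    connected to N single-neuron hidden layers."""
    return n_images * n_pixels

def active_connections(n_images, n_pixels, image_index):
    """Mask of the input-to-hidden connections in use while training image
    `image_index` is presented: only its own hidden layer is connected."""
    mask = np.zeros((n_images, n_pixels), dtype=bool)
    mask[image_index, :] = True
    return mask

# Three training images of 10,000 pixels each, as in the example above.
print(input_weight_count(3, 10_000))     # 30000
mask = active_connections(3, 10_000, image_index=1)
```

The count grows quickly with image size: six training images of 256 × 256 pixels (65,536 input neurons) already require 6 × 65,536 = 393,216 input weights.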

Fig. 1. Architecture of the selected artificial NNET block of the HONN filter.
The activation function $f$ of the node against the training set images $x_i$ is plotted in Fig. 2; the … class object) points are marked with +. Now, if we mark on the plot the activation function values for the training images at the 30° and 40° object poses, then the activation function value for the training image at 35° will be located on the graph between the values of the activation function for the 30° and 40° inputs. The actual activation function values for the training set images $x_{30}$, $x_{40}$ and $x_{35}$ are located in the area under the graph where the activation function values are greater than or equal to the pre-specified true-class object classification level, which in the case shown we assume is set at +40.

Fig. 2. It shows the activation function graph of the node against the training set images $x_i$.
… input and layer weights from the input neuron of the input vector element at row $m$ and column $n$ to the associated hidden layer for the training image …

Fig. 4 shows the modified NNET block architecture for accommodating the recognition of multiple objects of more than one class. As for all the family of G-HONN filters, the NNET is implemented as a feedforward multi-layer architecture trained with a backpropagation algorithm. It has a single input source with a number of input neurons equal to the size of the training image or video frame in vector form. In effect, for the training still image or video frame $x_i$, $i = 1, \dots, N$, of size $[m \times n]$, there are $[m \times n]$ input neurons in the single input source. The input weights are fully connected from the input layer to the hidden layers. There are $N_{iw}$ input weights, proportional to the size of the training set. The number of hidden layers, $N_l$, is equal to the number of images or video frames of the training set, $N_l = N$, with $i = 1, 2, 3, \dots, N$, and …

… 6) and the images of the S-type car model and the Mazda RX-7 car model added in the background scene. The size of all the images was 256 × 256, and all the images are in grey-scale bitmap format. All the input training images (and all the input test set images) for the M-HONN system are concatenated row-by-row into a vector of 256 × 256 = 65,536 elements prior to input to the NNET block. Normally, an image of this size is impossibly large for processing by any artificial neural network architecture, since it would require a very large number of input and layer weights to implement.
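Row-by-row concatenation corresponds to NumPy's default row-major (C-order) reshape. The sketch assumes the 256 × 256 grey-scale bitmap has already been loaded into a 2-D array; a synthetic array stands in for it here, and loading the file is not shown.

```python
import numpy as np

# Synthetic stand-in for a 256 x 256 grey-scale frame (loading the bitmap is not shown).
image = np.arange(256 * 256, dtype=np.float64).reshape(256, 256)

# Row-major reshape: row 0 first, then row 1, and so on, giving a column vector.
vector = image.reshape(-1, 1)
print(vector.shape)                      # (65536, 1)

# The first pixel of the second row directly follows the last pixel of the first row.
assert vector[255, 0] == image[0, 255] and vector[256, 0] == image[1, 0]

# The same reshape in reverse recovers the original image layout.
restored = vector.reshape(256, 256)
```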

Fig. 5. RX-7 Mazda Efini Police patrol car used in the training and test sets


decreases, the M-HONN system has better generalising properties. By plotting the isometric correlation planes of the M-HONN system for different Cl values, one could observe that by increasing Cl …

Fig. 7. (Adapted from Kypraios et al., 2008) It shows (a) the correlation peak-height versus the out-of-plane rotation angle of the object over the range 20° to 70°; we tested the M-HONN system with the true-class object's intermediate car poses over the same range, out-of-plane rotated at 10° increments; (b) the non-normalised PCE values of the test images at 10° increments versus the angle of view over the range 20° to 70°.
Fig. 8(b) shows the correlation-peak height for each input image for the M-HONN system. The system was found to perform well in recognising all the intermediate car poses of the test set. The correlation-peak heights of the in-class input images, intermediate between two training images, lie within a band of greater than 76% of the pre-specified peak-height constant in the constraint matrix C of the M-HONN system. From the graph it can be observed that the system tolerated orientation over a range of $\Theta_3 = [5^\circ, 40^\circ]$. We then assessed the discrimination ability of the M-HONN system. In these tests, the M-HONN system tried to discriminate between objects of different classes while retaining invariance to in-class distortions. The training set consisted of images of the Jaguar S-type car over a distortion range of 20° to 70° at 10° increments. The test set consisted of one training image of the Jaguar S-type car out-of-plane rotated at 40° and a second image of the out-of-class RX-7 Police patrol car at the same out-of-plane rotation angle. We experimented with two different training set configurations of still images. Firstly, we added two images of the Jaguar S-type car at 130° and 140° to the false class of the system's training set, and constrained these false-class images to zero correlation peak-height in the synthesis of the M-HONN system's composite image. Secondly, we conducted experiments without including any false-class images in the system's composite image. In both cases, we aimed to observe whether there was any change in the class separation ability of the M-HONN system. We constrained the true-class objects to unit correlation peak-height and used the same targets as before for the false- and true-class images of the NNET block: the target of the false-class object is $T_{\text{false}} = -40$ and the target of the true-class object is $T_{\text{true}} = 40$ in the training set of the NNET block of the M-HONN system. The NNET block had no built-in information about the test images.
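Correlation-peak sharpness is often summarised by the peak-to-correlation energy (PCE). A minimal sketch of one common, non-normalised form, the squared peak height divided by the total energy of the correlation plane, follows; this definition is our assumption and may differ in detail from the one used for Fig. 7(b). A sharper peak concentrates more of the plane's energy at the peak and so yields a larger value.

```python
import numpy as np

def pce(plane):
    """Peak-to-correlation energy: |peak|^2 / sum of |c|^2 over the whole plane."""
    plane = np.asarray(plane, dtype=float)
    peak = np.max(np.abs(plane))
    return float(peak ** 2 / np.sum(plane ** 2))

sharp = np.zeros((64, 64))
sharp[32, 32] = 1.0                      # all energy in a single peak
broad = np.zeros((64, 64))
broad[30:35, 30:35] = 1.0                # same height spread over 25 pixels

print(pce(sharp), pce(broad))            # 1.0 0.04
```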

Fig. 8. (Adapted from Kypraios et al., 2008) It shows (a) the reference angle, Θ0, and the two in-class training images at the angles Θ1 and Θ2, with the test image on the bisector at angle Θ3; (b) the correlation peak-heights for each input image over a range of Θ3 = [5°, 40°] for the M-HONN system.
the values taken for the in-class training image, and the fifth column contains all the values taken for the out-of-class training image. The third column of Table 1 shows that the M-HONN system gave sufficient discrimination between the two objects, the Jaguar S-type car and the RX-7 Police patrol car. It produced 12% class separation with the false-class images included in the synthesis of the composite image under a zero correlation peak-height constraint. By not including any false-class images in the system's composite image, but setting a unit correlation peak-height constraint for the true-class images and keeping the target of the false-class object constant at $T_{\text{false}} = -40$ and that of the true-class object at $T_{\text{true}} = 40$, the M-HONN system increased the class separation to 27%. Thus, the two training set configurations of still images that we experimented with (in the first, false-class images zero-peak constrained; in the second, false-class images not included in the system's composite image) led to a useful observation about the M-HONN system's ability to distinguish between two different classes. More specifically, the false-class images included in the composite image, and zero-peak constrained, were taken from the true-class object in different poses not included in the training set. In effect, when we tested the RX-7 police patrol car images, the system separated the input images both from the trained images of the true-class object (unit-peak constrained) and from the false-class images (of the same object but zero-peak constrained), treating them as a third class. Apparently, this caused the drop in the M-HONN system's discrimination ability from 27% to 12% class separation. We found that a solution to this problem is to include false-class images taken not from the true-class object but from a different object, which could further increase the discrimination ability of the M-HONN system.
for evaluating the performance of the M-HONN system in recognising multiple objects of the same class or of different classes. Several training sets were created to test the system's performance with still images and with video sequences. The first training set consisted of still images of the Jaguar S-type car over a distortion range of 0° to 360°, out-of-plane rotated at 10° increments. The second training set consisted of still images of the RX-7 Police patrol car over a distortion range of approximately 0° to 360°, out-of-plane rotated at 10° increments. The third training set consisted of video frames of a Ferrari Testarossa car within a background clutter scene. The fourth training set consisted of still images of different car park scenes. A fifth training set consisted of video frames we recorded showing a sequence of a Jaguar S-type car and a Ferrari Testarossa car within a background clutter scene. All the training and test sets of still images and video sequence frames were in grey-scale bitmap format and were sized to 256 × 256.

Fig. 9. It shows the first visual problem for testing the M-HONN object recognition system's problem-solving ability. The M-HONN system tries to recognise certain angles of view of the input object while rejecting others. The training set consisted of still images of the Jaguar S-type car out-of-plane rotated over the range 0° to 170°, belonging to the true class, and still images of the Jaguar S-type car out-of-plane rotated over the range 180° to 360°, belonging to the false class. We have indicated the recognised true-class objects with the solid line and the recognised false-class objects with the dashed line.

Fig. 10. It shows the composite image the M-HONN system synthesised for a training set consisting of still images of the Jaguar S-type car out-of-plane rotated over the range 0° to 360°.

Fig. 11. It shows the second visual problem for testing the M-HONN object recognition system's problem-solving ability. The M-HONN system tries to recognise only the true-class objects of the Jaguar S-type car and reject all the false-class objects. The training set consisted of still images of the Jaguar S-type car out-of-plane rotated over the range 0° to 360°, belonging to the true class, and still images of the RX-7 Mazda Efini Police patrol car out-of-plane rotated over approximately the range 0° to 360°, belonging to the false class. We have indicated the recognised true-class objects with the solid line and the recognised false-class objects with the dashed line.


When we built the input mask c, which we applied to both the training set and the test set, from the training set image of the true-class 1 object of the Jaguar S-type car, the system synthesised its composite image (see Fig. 13) by non-linearly revealing more features of the true-class 1 object of the Jaguar S-type car, allowing fewer features of the true-class 2 object of the RX-7 Mazda Efini Police patrol car and completely suppressing any features of the background car park scene. When we chose to build the input mask c from the training set image of the true-class 2 object of the RX-7 Mazda Efini Police patrol car, the system synthesised its composite image (see Fig. 14) by non-linearly revealing more features of the true-class 2 object of the RX-7 Mazda Efini Police patrol car, allowing fewer features of the true-class 1 object of the Jaguar S-type car and completely suppressing any features of the background car park scene.

Fig. 12. It shows one of the test set input images used for assessing the M-HONN system's performance within clutter.

Fig. 13. It shows the synthesised composite image of the M-HONN system. The training set consisted of Jaguar S-type car objects out-of-plane rotated over the range 20° to 70° at 10° increments, belonging to true-class 1, RX-7 Mazda Efini Police patrol car objects out-of-plane rotated over approximately the range 20° to 70° at 10° increments, belonging to true-class 2, and a random car park scene belonging to the false class. When we built the input mask from the training set image of the true-class 1 object of the Jaguar S-type car, the system synthesised its composite image by non-linearly revealing more features of the true-class 1 object of the Jaguar S-type car, allowing fewer features of the true-class 2 object of the RX-7 Mazda Efini Police patrol car and completely suppressing any features of the background car park scene.
Fig. 14. It shows the synthesised composite image of the M-HONN system. The training set consisted of Jaguar S-type car objects out-of-plane rotated over the range 20° to 70° at 10° increments, belonging to true-class 1, RX-7 Mazda Efini Police patrol car objects out-of-plane rotated over approximately the range 20° to 70° at 10° increments, belonging to true-class 2, and a random car park scene belonging to the false class. When we instead built the input mask from the training set image of the true-class 2 object of the RX-7 Mazda Efini Police patrol car, the system synthesised its composite image by non-linearly revealing more features of the true-class 2 object of the RX-7 Mazda Efini Police patrol car, allowing fewer features of the true-class 1 object of the Jaguar S-type car and completely suppressing any features of the background car park scene.
Fig. 16 shows four indicative video frames from the recorded video sequence. The frame rate of the video sequence was 25 frames per second (fps). The training set consisted of images of the Jaguar S-type car out-of-plane rotated over 20° to 80° at 20° increments. We added two images of the Jaguar S-type car out-of-plane rotated at 130° and 140° to the false class to increase the peak sharpness and class discrimination abilities of the M-HONN system. In our tests, we found the best constraint values in the synthesis of the M-HONN system's composite image to be unit correlation peak-height for the true-class 1 Jaguar S-type car images, half-unit correlation peak-height for the true-class 2 Ferrari Testarossa car images, and zero correlation peak-height for the false-class 1 and false-class 2 images. Fig. 16 shows the locked window unit, of chosen size 70 × 70, on top of the maximum correlation peak-height values. The dashed line shows the secondary correlation peaks of the output plane and the solid line shows the maximum correlation peak-height value of the output plane. The M-HONN system successfully suppressed the unknown background clutter throughout the length of the video sequence and correctly recognised class 1 and class 2 objects. It is emphasised that we did not include any background information in the training set of the system.
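A locked window of the kind shown in Fig. 16 can be sketched as a fixed-size box centred on the maximum of each frame's output correlation plane. The helper `locked_window` is hypothetical: it assumes the correlation plane has the same 256 × 256 size as the frames and simply shifts the 70 × 70 box inwards when the peak lies near a border. The correlation planes below are synthetic single-peak stand-ins, not M-HONN outputs.

```python
import numpy as np

def locked_window(plane, size=70):
    """Return (top, left, bottom, right) of a size x size window centred on the
    maximum of a correlation plane, shifted to stay inside the plane.
    Assumes size <= both plane dimensions."""
    rows, cols = plane.shape
    r, c = np.unravel_index(np.argmax(plane), plane.shape)
    half = size // 2
    top = min(max(r - half, 0), rows - size)
    left = min(max(c - half, 0), cols - size)
    return int(top), int(left), int(top + size), int(left + size)

# Track the window over a short sequence of synthetic correlation planes.
planes = []
for peak in [(100, 150), (104, 158), (5, 250)]:
    p = np.zeros((256, 256))
    p[peak] = 1.0
    planes.append(p)

windows = [locked_window(p) for p in planes]
```

Applying the function frame by frame makes the window follow the maximum correlation peak through the sequence; the third frame shows the clamping at the image border.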

Fig. 15. (Adapted from Kypraios et al., 2009) It shows the isometric output correlation plane response of the M-HONN system for Class 1 for (a) the first and (b) the second output layer neuron, and (c) and (d) the isometric output correlation plane response of the M-HONN system for Class 2 (all normalised to the maximum correlation plane peak-height value).
Fig. 16. It shows four indicative video frames from the recorded video sequence. The locked window unit is on top of the maximum correlation peak-height values. The dashed line shows the secondary correlation peaks of the output plane and the solid line shows the maximum correlation peak-height value of the output plane.
Now, let an image $s$ be the input vector to an artificial neural network's hidden neuron (node), let $t_p$ represent the target output for pattern $p$ on a given node, and let $o_p$ represent the calculated output at that node. The weight from one node to another is represented by $w$.