Many problems in science and technology require separating useful information from a larger body of content. For many of them, standard techniques such as signal processing, shape recognition, system control theory and artificial intelligence have proved inadequate. Neural networks attempt to solve these problems in the way the human brain solves them. Like the human brain, neural networks are able to learn from given data; afterwards, when they encounter the same or similar data, they can give the same or an approximate result.
There are several types of transfer functions: sigmoid, logistic sigmoid, linear, semilinear, threshold, and Gaussian. Figure 1 shows the graph of one of the most frequently used transfer functions:
The multilayer feed-forward neural network is a very frequently used architecture (Bourlard, H. et al., 2002). In it, signals propagate only forward, and neurons are organized in layers. The most important properties of multilayer feed-forward networks are given by these two theorems:
A multilayer network with a single hidden layer can uniformly approximate any real continuous function on a finite interval of the real axis with arbitrary precision.
A multilayer network with two hidden layers can uniformly approximate any real continuous function of several arguments with arbitrary precision.
The input layer receives data from the environment. The hidden layer receives the outputs of the previous layer (in this case, the outputs of the input layer) and produces an output that depends on the weighted sum of its inputs. More complex problems sometimes require more than one hidden layer. The output layer computes the network outputs from the weighted sum of its inputs and the transfer function.
H.263 is an international standard for video stream compression, widely used in telecommunication systems (ITU, 1995). ITU-T has issued several additions to Recommendation H.263, aimed at broadening the supported picture formats and improving video compression quality (ITU, 1996).
The enhancement of the H.263 standard presented in this paper applies an artificial neural network (ANN) in place of the standard DCT coding for sequences rich in fast-moving detail.
Section 2 gives a short description of the H.263 standard. Section 3 describes the training algorithm for the neural network used. Section 4 describes how the ANN is applied as an addition to the existing H.263 standard. Section 5 presents experimental results showing the effect of this approach on the quality and compression level of a test sequence.
2. H.263 video encoder
Compression of the video signal is a key component of modern telecommunication services such as videotelephony and videoconferencing, of modern digital TV systems with standard and high resolution, and of numerous multimedia services. The reason is that, without compression, a digital video signal consists of a huge amount of data. Another problem in multimedia systems is the speed of reading and transferring data from a compact disc to computer memory, which even in the fastest systems is at most 4 Mb/s. Since video coding has been a research topic for more than two decades, a large number of algorithms have been developed, implemented and tested on existing communication channels. To enable interconnection of equipment from different manufacturers, several international organizations have defined standards for compression and transmission of video signals. The best known are H.261 and H.263 for videoconferencing and videotelephony, and the MPEG standards (MPEG-1, MPEG-2 and MPEG-4) intended for multimedia systems and digital television (Schäfer, R., T. Sikora, 1995).
Three-dimensional (3D) compression of a video signal generalizes the principle of two-dimensional compression. The most frequent way to realize it is 3D transform coding based on the DCT. To apply this method, the video signal is divided into blocks of dimensions M×N×P, where M, N and P are, respectively, the horizontal, vertical and temporal dimensions of a block (Boncelet C. 2005). A 3D DCT is applied to every block, and the resulting DCT coefficients are quantized. As with the 2D DCT, only coefficients with very small indices have significant values (Roese, J.A., et al. 1997).
The H.261 standard defines two picture formats, CIF and QCIF (Markoski, B. & Đ. Babić, 2007). Transmitting either format over ISDN channels requires a considerable level of compression (typically about 100 times).
Since the QCIF format is intended mainly for videotelephony, where usually only the face of the other person is visible, the frame rate is usually reduced to 10 frames/s. The H.261 standard defines algorithms for eliminating redundancy, quantization algorithms, the structure of coders and decoders, and the data structure (Rijkse, K, 1995). Interestingly, the standard does not prescribe a particular motion estimation algorithm; it only requires that block motion vectors be determined and transmitted. A bit-rate control mechanism is not prescribed either; it follows from the chosen processing method and from the rule for deciding whether a block is transmitted. In practice, the implementation known as Reference Model 8 (COST211bis/SIM89/37, 1989) is used most frequently, and it was used in testing the standard.
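To make the need for compression concrete, the following minimal sketch estimates the uncompressed bit rate of a video signal. The picture sizes (QCIF 176×144, CIF 352×288), the 4:2:0 chrominance subsampling and the 8 bits per sample are assumptions taken from common practice, not values stated in the text, and the function name is purely illustrative.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_sample=8):
    """Uncompressed bit rate in bit/s, assuming 4:2:0 chroma subsampling."""
    luma_samples = width * height
    # In 4:2:0 each of the two chrominance planes has a quarter of the
    # luminance samples.
    chroma_samples = 2 * (width // 2) * (height // 2)
    return (luma_samples + chroma_samples) * bits_per_sample * frames_per_second


# QCIF at the reduced frame rate of 10 frames/s mentioned in the text.
qcif_rate = raw_bit_rate(176, 144, 10)
# Ratio to the capacity of a single 64 kbit/s channel.
qcif_ratio = qcif_rate / 64_000
print(qcif_rate, round(qcif_ratio, 1))
```

Under these assumptions, QCIF at 10 frames/s needs roughly 3 Mbit/s uncompressed, several tens of times more than a single 64 kbit/s channel can carry; larger pictures and higher frame rates raise the required ratio further.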
The H.263 standard is intended for picture transmission over standard switched telephone lines at bit rates below 64 kbit/s, a range not previously covered by any standard (Rijkse, K., 1995; Girod, B. et al., 1996). It was produced by modifying the existing H.261 standard. Because of very tight deadlines in preparing the standard, the original text (Rijkse, K., 1995) defines only the most necessary improvements over H.261, but leaves open the possibility of further improvements.
The basic difference between the H.261 and H.263 standards is the target bit rate (A.Amer, E. Duboius, 2005). H.261 was designed for picture transmission at 64 kbit/s and above, while H.263 was designed for rates below 64 kbit/s, most often 22 kbit/s. To achieve this goal, four small improvements were made to the algorithms prescribed by H.261. Although none of them contributes much to overall performance on its own, all four together improve performance considerably (LeGall, D.J 1992).
Recommendation H.263 is defined by the International Telecommunication Union, Telecommunication Standardization Sector (ITU-T, 1996). The recommendation standardizes the video stream compression process by defining the syntax of the compressed data format. Compression is necessary in order to convert a conventional video stream into a form usable by computer applications under present limitations. H.263 uses a compression algorithm basically similar to those of JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group) (ITU-T, 1995). The video stream is compressed by applying a sequence of transformations to every picture.
The H.263 video stream is organized in several layers, as shown in Figure 2. The highest layer, the picture layer, defines basic properties of the video stream such as picture size and coding mode. The next layer is the group-of-blocks layer, which enables unambiguous interpretation of spatially close blocks. The two lowest layers are the macroblock layer and the block layer, which carry the coded representation of the picture. Every picture within a video sequence is coded in one of three possible ways: intra (I), inter (P) or bidirectional (PB) coding. I-pictures are coded similarly to the JPEG standard. P-pictures are predicted from the previously coded picture, and PB-pictures are predicted from blocks of the previous and the next picture. Coding a picture consists of partitioning it into macroblocks and coding each of them separately. Every macroblock covers a 16x16-pixel area and is the basic unit for motion compensation. A macroblock consists of 6 blocks: 4 luminance and 2 chrominance blocks. These blocks (8x8 pixels) are the basic units for the DCT (discrete cosine transform).
Motion compensation is performed in order to remove temporal redundancy between adjacent pictures in a video sequence. In this way, instead of the complete picture, only information on the detected changes and on the way they move (motion vectors) is transmitted. To avoid error accumulation, an error signal, the difference between the motion-compensated prediction and the actual picture, is coded together with the motion vectors. The DCT is applied to this motion estimation error. It operates on 8x8-pixel blocks and yields 64 transform coefficients. After the transform, the block energy is concentrated in a few coefficients that correspond to the low-frequency part of the range. These coefficients can therefore be quantized with relatively small error. Most DCT coefficients are quantized to zero, which reduces the amount of information needed for picture reconstruction.
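The energy compaction described above can be illustrated with a minimal sketch of an 8x8 DCT followed by uniform quantization. The orthonormal DCT-II scaling, the plain uniform quantizer and the test block are assumptions chosen for illustration; they are not the exact arithmetic prescribed by H.263.

```python
import math

N = 8  # block size used for the DCT in this sketch


def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block given as a list of rows."""
    def c(u):
        return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization: small coefficients collapse to zero."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


# A smooth, low-detail block: a gentle ramp along each row.
smooth = [[10 + 2 * y for y in range(N)] for x in range(N)]
levels = quantize(dct_2d(smooth), step=8)
nonzero = sum(1 for row in levels for value in row if value != 0)
print(levels[0][0], nonzero)
```

For such a smooth block only a handful of the 64 quantized coefficients remain nonzero, which is what makes the subsequent entropy coding effective.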
At the end of the coding process, the obtained information is entropy coded (Huffman and run-length coding) and written in the format defined by the H.263 video stream syntax.
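The run-length idea can be sketched as follows. This is a simplified illustration under our own assumptions: zero runs are paired with the following nonzero level, and a hypothetical "EOB" marker stands for the trailing zeros. H.263 defines its own variable-length code tables, which are not reproduced here.

```python
def run_length_pairs(levels):
    """Encode quantized levels as (zero_run, level) pairs.

    Each pair states how many zeros precede a nonzero level. Trailing zeros
    are replaced by a single end-of-block marker, written here as "EOB".
    """
    pairs = []
    run = 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")
    return pairs


# Quantized coefficients of one block after scanning into a 1D sequence.
scanned = [17, -5, 0, 0, 2, 0, 0, 0, 0, 0]
encoded = run_length_pairs(scanned)
print(encoded)
```

The longer the runs of zeros, the fewer pairs are needed, which is why blocks with few significant coefficients compress well.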
3. Neural network
The discipline known today as neural networks emerged from the fusion of several quite different lines of research: signal processing, neurobiology and physics (Haykin S, 1994). Neural networks are a typical interdisciplinary field (L. Faulsett 1995). On the one hand, they are an attempt to understand how the human brain works; on the other, an attempt to apply the newly acquired knowledge to the processing of complex information (Lippmann, R. P. 1987). There are other advanced, non-algorithmic systems, such as learning algorithms, genetic algorithms, adaptive memory, associative memory and fuzzy logic. The general opinion is that neural networks are currently the most mature and most applicable of these technologies (Barsterretxea et al., 2002).
Conventional computers work on a logical basis: deterministically, sequentially, or with a very low level of parallelism. Software written for such computers must be almost perfect in order to work properly, which requires a long and costly design and testing process.
Neural networks belong to the category of parallel, asynchronous, distributed processing. The network tolerates damage to, or failure of, a relatively small number of neurons. It also tolerates noise in the input signal. Every memory element is delocalized: it is held by the network as a whole, and it is impossible to identify the part in which it is stored. Classical addressing does not exist, since memory is accessed by content rather than by address (S.P. Teeuwsen et al., 2003).
The basic component of a neural network is the neuron, shown in Figure 3:
Dendrites are the inputs to a neuron. Natural neurons can have hundreds of inputs. The point where a dendrite touches the neuron is called a synapse. A synapse is characterized by its effectiveness, called the synaptic weight. The neuron output is formed as follows: the signals on the dendrites are multiplied by the corresponding synaptic weights, the products are summed, and if the sum exceeds the threshold level, the neuron's transfer function, marked f in the figure, is applied to it. The only restriction on the transfer function is that it must be bounded and non-decreasing. The neuron output is routed to the axon, whose branches carry the result to the dendrites of other neurons. In this way, the output of one network layer is passed to the next.
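A minimal sketch of this computation is given below. It assumes the logistic sigmoid as the transfer function f and models the threshold as a value subtracted from the weighted sum before f is applied; both choices, and all names, are illustrative rather than taken from the text.

```python
import math


def logistic(x):
    """Logistic sigmoid: bounded in (0, 1) and non-decreasing."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, threshold=0.0, transfer=logistic):
    """Weighted sum of the inputs, shifted by the threshold, passed through f."""
    activation = sum(x * w for x, w in zip(inputs, weights)) - threshold
    return transfer(activation)


# Three dendrite signals and their synaptic weights (arbitrary example values).
y = neuron_output([1.0, 0.5, -1.0], [0.2, 0.4, 0.1])
print(round(y, 4))
```

Any other bounded, non-decreasing function can be passed as `transfer` without changing the rest of the computation.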
Three types of transfer functions are currently used in neural networks:
All three types are shown in Figure 4:
The neural network has a unique multiprocessing architecture and, without much modification, surpasses one or even two processors of the von Neumann architecture, which is characterized by serial or sequential information processing (S.P. Teeuwsen et al., 2003). It can capture any functional dependence and reveal the nature of that dependence without external guidance and without the need to build or modify a model. In short, a neural network may be considered a black box capable of predicting an output pattern or signal after recognizing a given input pattern. Once trained, it can recognize similarities when a new input signal is given, which results in a predicted output signal. There are two categories of neural networks: artificial and biological. Artificial neural networks are similar to biological ones in structure, function and information processing. In computer science, a neural network is an interconnected network of elements that process data. One of the more important characteristics of neural networks is their ability to learn from a limited set of examples. A neural network is a system composed of several simple processors (units, neurons), each of which has its own local memory where it stores the data it processes. These units are connected by communication channels (connections). The data exchanged over these channels are usually numerical. Units process only their local data and the inputs received directly through their connections. The limitations of local operators may be removed during training. Many neural networks were created as models of biological neural networks. Historically, the inspiration for the development of neural networks was the desire to construct an artificial system capable of refined, perhaps even "intelligent", computation in a way similar to the human brain. Potentially, neural networks also offer a way to understand how the human brain functions.
Artificial neural networks are a collection of mathematical models that simulate some of the capabilities observed in biological neural systems and resemble adaptive biological learning. They are made of a large number of interconnected neurons (processing elements) which, like biological neurons, are linked by connections with permeability (weight) coefficients whose role is similar to that of synapses. Most neural networks have some kind of "training" rule, which adjusts the coefficients of the inter-neuron connections on the basis of input data (Cao J. et al., 2003). The great potential of neural networks lies in parallel data processing, that is, in computing mutually independent components at the same time. Neural networks are systems made of several simple elements (neurons) that process data in parallel.
Numerous problems in science and engineering require extracting useful information from a larger body of content. For many of them, standard techniques such as signal processing, shape recognition, system control and artificial intelligence are not adequate. Neural networks are an attempt to solve these problems in a way similar to the human brain. Like the human brain, neural networks are able to learn from given data; later, when they encounter the same or similar data, they are able to give a correct or approximate result.
An artificial neuron computes its output value from the summed input and the transfer function. The following figure shows an artificial neuron:
The neural network model consists of:
neural transfer function
network topology, i.e. the way in which neurons are interconnected,
By topology, networks differ in the number of neural layers. Usually each layer receives inputs from the previous one and sends its outputs to the next. The first layer is called the input layer, the last one the output layer, and the others hidden layers. By the way neurons are interconnected, networks may be divided into recurrent and non-recurrent (feed-forward) ones. In recurrent networks, higher layers return information to lower ones, while in non-recurrent networks signals flow only from lower to higher layers.
Neural networks learn from examples. Many examples are usually needed, often tens of thousands. The essence of the learning process is that it corrects the synaptic weights. When new input data no longer change these coefficients, the network is considered trained to solve the problem. Training may be done in several ways: supervised (controlled) training, training by grading, and self-organization.
Whichever learning algorithm is used, the process is essentially the same and consists of the following steps:
A set of input data is presented to the network.
The network processes the information and stores the result (the forward step).
The error value is calculated by subtracting the obtained result from the expected one.
A new synaptic weight is calculated for every node (the backward step).
The synaptic weights are changed, or the old ones are kept and the new ones are stored.
A new set of input data is brought to the network inputs and steps 1-5 are repeated. When all examples have been processed, the synaptic weight values are updated, and if the error is below some expected value the network is considered trained.
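The steps above can be sketched for the simplest possible case, a single sigmoid neuron trained with the delta rule. The learning rate, the error tolerance, the zero initial weights and the logical OR training set are all assumptions made for this illustration; the weights are updated immediately after each example, which corresponds to the first variant in step 5.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(examples, learning_rate=0.5, tolerance=0.01, max_epochs=20000):
    """Train one sigmoid neuron; returns (weights, bias, epochs_used)."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        total_error = 0.0
        for inputs, expected in examples:            # step 1: present the data
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = logistic(activation)            # step 2: forward step
            error = expected - output                # step 3: error value
            delta = error * output * (1.0 - output)  # step 4: backward step
            weights = [w + learning_rate * delta * x  # step 5: change weights
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * delta
            total_error += error * error
        # Step 6: repeat until the mean squared error is small enough.
        if total_error / len(examples) < tolerance:
            return weights, bias, epoch
    return weights, bias, max_epochs


or_examples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
               ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights, bias, epochs = train(or_examples)
print(weights, bias, epochs)
```

A single neuron can only learn linearly separable mappings such as OR; networks with hidden layers need the error to be propagated backwards through the layers.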
We will consider two training modes: supervised (controlled) training and self-organized training.
The back-propagation algorithm is the most popular algorithm for supervised training. The basic idea is as follows: a random pair of input and expected output is chosen. The input set of signals is sent to the network by bringing one signal to each input neuron. These signals propagate through the network, through the hidden layers, and after some time a result appears at the output. How has this happened?
For every neuron an input value is calculated in the way explained previously: the signals are multiplied by the synaptic weights of the corresponding dendrites, the products are summed, and the neuron's transfer function is applied to the obtained value. The signal is propagated further through the network in the same way until it reaches the output layer. There the transformation is applied once more and the output values are obtained. The next step is to compare the signals obtained on the output axon branches with the expected values for the given test example. An error value is calculated for every output branch. If all errors are equal to zero, there is no need for further training, since the network is able to perform the expected task. In most cases, however, the error will differ from zero, and the synaptic weights of certain nodes must then be modified.
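The weight modification can be sketched for a small network with one hidden layer and one output neuron. The sketch assumes logistic neurons, the squared error 0.5·(target − output)² as the quantity being reduced, a fixed learning rate, and bias terms stored as the last weight of each neuron; the example weights are arbitrary. It is a minimal illustration of back-propagation, not the training procedure used later in this paper.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Forward pass; the last weight of every neuron is its bias."""
    hidden = [logistic(sum(w * v for w, v in zip(row, x + [1.0])))
              for row in w_hidden]
    output = logistic(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def backprop_step(x, target, w_hidden, w_out, learning_rate=0.5):
    """One gradient step on the squared error 0.5 * (target - output)**2."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output neuron.
    delta_out = (output - target) * output * (1.0 - output)
    # Error signals propagated back to the hidden neurons.
    delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                    for j, h in enumerate(hidden)]
    new_w_out = [w - learning_rate * delta_out * v
                 for w, v in zip(w_out, hidden + [1.0])]
    new_w_hidden = [[w - learning_rate * d * v for w, v in zip(row, x + [1.0])]
                    for row, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out


w_hidden = [[0.1, -0.2, 0.05], [0.3, 0.4, -0.1]]   # two hidden neurons
w_out = [0.2, -0.3, 0.1]                           # one output neuron
x, target = [1.0, 0.0], 1.0
_, before = forward(x, w_hidden, w_out)
new_hidden, new_out = backprop_step(x, target, w_hidden, w_out)
_, after = forward(x, new_hidden, new_out)
print(before, after)
```

After the step the output for this example lies closer to the target; repeating the step over many examples is what constitutes training.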
Self-organized training is a process in which a network finds statistical regularities in a set of input data and automatically develops different behavior regimes depending on the input. The Kohonen algorithm is used most often for this type of learning.
The network has only two neural layers: an input layer and an output layer. The output layer is also called the competitive layer (the reason will be explained later). Every input neuron is connected to every neuron in the output layer. The neurons in the output layer are organized in a two-dimensional matrix (Zurada, J. M. 1992).
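A single training step of such a network can be sketched as follows: the output neuron closest to the input wins the competition, and the winner together with its grid neighbours is moved toward the input. The square neighbourhood, the fixed learning rate and the 2x2 example grid are simplifying assumptions for illustration only.

```python
def best_matching_unit(grid, sample):
    """Return (row, col) of the output neuron whose weights are closest."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            dist = sum((w - s) ** 2 for w, s in zip(weights, sample))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def som_update(grid, sample, learning_rate=0.5, radius=1):
    """Move the winner and its grid neighbours toward the sample (in place)."""
    win_r, win_c = best_matching_unit(grid, sample)
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            if abs(r - win_r) <= radius and abs(c - win_c) <= radius:
                row[c] = [w + learning_rate * (s - w)
                          for w, s in zip(weights, sample)]
    return win_r, win_c


# A 2x2 output layer; every output neuron holds one weight per input neuron.
grid = [[[0.0, 0.0], [1.0, 0.0]],
        [[0.0, 1.0], [1.0, 1.0]]]
winner = som_update(grid, [0.9, 0.1], radius=0)
print(winner, grid[0][1])
```

With `radius=0` only the winning neuron is updated; a larger radius also drags its neighbours, which is what produces a topologically ordered map.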
The multilayer feed-forward neural network is one of the most frequently used architectures. In it, signals propagate only forward, and neurons are organized in layers. The most important properties of multilayer feed-forward networks are given by the following theorems:
A multilayer network with a single hidden layer can uniformly approximate any real continuous function on a finite interval of the real axis with arbitrary precision.
A multilayer network with two hidden layers can uniformly approximate any real continuous function of several arguments with arbitrary precision.
The input layer receives data from the environment. The hidden layer receives the outputs of the previous layer (in this case, the outputs of the input layer) and produces an output that depends on the weighted sum of its inputs. More complex problems sometimes require more than one hidden layer. The output layer computes the network outputs from the weighted sum of its inputs and the transfer function.
The following figure shows a neural network with one hidden layer.
In this work we used the Kohonen neural network, a self-organizing feature map that belongs to the class of artificial neural networks with unsupervised training (Kukolj D., Petrov M., 2000). This type of network may be regarded as a topologically organized neural map with strong associations to some parts of the biological central nervous system. A topological map here means that neurons are spatially organized in maps that preserve, in a certain way, the topology of the input space. The Kohonen neural network is intended for the following tasks:
Quantization of the input space
Reduction of output space dimension
Preservation of topology present within structure of input space.
A Kohonen neural network can classify input sample vectors without requiring error signals. It therefore belongs to the group of artificial neural networks with unsupervised learning. In the actual use of a Kohonen network in an obstacle avoidance algorithm, the network is not trained; instead, its neurons are given weight values calculated in advance. Regarding clustering, if the network cannot assign an input vector to any output cluster, it reports how similar the input vector is to each of the predefined clusters. This paper therefore uses the Fuzzy Kohonen clustering network (FKCN).
The properties of the H.263 algorithm are enhanced by generating a codebook of prototypes that characterize highly variable differences in picture blocks. The codebook is generated by training a self-organizing neural network (Haykin, 1994; Lippmann, 1987; Zurada, 1992). Following the original training concept (Kukolj and Petrov, 2000), a single-layer neural network is formed. Every node of the ANN output layer represents one prototype in the codebook. The coordinates of every node in the network are represented by its synaptic weight coefficients $w_i$. After initialization, the algorithm proceeds in two iterative phases.
First, the closest node for every sample is found using the Euclidean distance, and the node coordinates are computed as the arithmetic means of the coordinates of the samples clustered around each node. The node balancing procedure continues by checking the following condition:
where $T_{ASE}$ is equal to a certain fraction of the present value of the average square error (ASE). The variables $w_i$ and $w_i'$ are the synaptic vectors of node i in the present and the previous iteration of the algorithm. If the above condition is not met, this step is repeated; otherwise the procedure continues.
In the second phase, so-called dead nodes are considered, i.e. nodes that have no assigned samples. If there are no dead nodes, $T_{ASE}$ takes a very low positive value. If dead nodes exist, a predefined number q of nodes (q << K) with the highest ASE values is found. A dead node is then moved near one node chosen at random from these q nodes. The new coordinates of the node are as follows:
where $w_{max}^{q}$ is the location of the node chosen among the q nodes with the highest ASE, $w_i^{new}$ is the new node location, and $\delta = [\delta_1, \delta_2, \ldots, \delta_n]^T$ is a vector of small random numbers. The derivation of new coordinates (2) is repeated for all dead nodes. If the maximum number of iterations has been reached, or if the number of dead nodes is zero in both the previous and the present iteration, the algorithm ends. Otherwise it returns to the first phase.
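The two phases can be sketched as follows. Several details are assumptions made for this illustration because they are not recoverable from the text: the stopping test of the first phase is taken to be "largest squared node shift ≤ T_ASE", T_ASE is a fixed fraction of the current ASE, a dead node is placed at the chosen node plus uniform noise, and the initial nodes are supplied by the caller. All function and parameter names are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(samples, nodes):
    """Group samples by the index of their nearest node (Euclidean distance)."""
    clusters = [[] for _ in nodes]
    for sample in samples:
        nearest = min(range(len(nodes)),
                      key=lambda i: squared_distance(sample, nodes[i]))
        clusters[nearest].append(sample)
    return clusters


def node_ase(clusters, nodes):
    """Average square error of each node's cluster (0.0 for a dead node)."""
    return [sum(squared_distance(s, n) for s in c) / len(c) if c else 0.0
            for c, n in zip(clusters, nodes)]


def train_codebook(samples, initial_nodes, q=2, fraction=0.01,
                   max_iterations=20, noise=1e-3, seed=0):
    rng = random.Random(seed)
    nodes = [list(n) for n in initial_nodes]
    dead_before = None
    for _ in range(max_iterations):
        # Phase 1: move each node to the mean of its cluster until it settles.
        for _ in range(100):
            clusters = assign(samples, nodes)
            total = sum(squared_distance(s, n)
                        for c, n in zip(clusters, nodes) for s in c)
            t_ase = fraction * total / len(samples)  # assumed form of T_ASE
            moved = [[sum(col) / len(c) for col in zip(*c)] if c else n
                     for c, n in zip(clusters, nodes)]
            shift = max(squared_distance(m, n) for m, n in zip(moved, nodes))
            nodes = moved
            if shift <= t_ase:  # assumed form of the stopping condition
                break
        # Phase 2: relocate dead nodes next to one of the q worst live nodes.
        clusters = assign(samples, nodes)
        dead = [i for i, c in enumerate(clusters) if not c]
        if not dead and dead_before == 0:
            break
        ase = node_ase(clusters, nodes)
        live = [i for i, c in enumerate(clusters) if c]
        worst = sorted(live, key=lambda i: ase[i], reverse=True)[:q]
        for i in dead:
            target = nodes[rng.choice(worst)]
            nodes[i] = [w + rng.uniform(-noise, noise) for w in target]
        dead_before = len(dead)
    return nodes


# One-dimensional toy data; the second initial node starts out dead.
samples = [[0.0], [1.0], [10.0], [11.0]]
codebook = sorted(train_codebook(samples, [[0.5], [100.0]]))
print(codebook)
```

In the toy run the node that starts far from all samples is dead after the first phase, is moved next to the only live node, and the following pass through the first phase splits the samples between the two nodes.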
4. Application of ANN in video stream coding
The basic way of removing spatial redundancy in H.263 coding is transform (DCT) coding (Kukolj et al., 2006). Instead of being transmitted in their original form, after DCT coding the data are represented as a coefficient matrix. The advantage of this transform is that the obtained coefficients can be quantized, which increases the number of coefficients with zero value. This enables removal of excess bits by run-length entropy coding.
This approach is efficient when a block is poor in detail, so that the energy is localized in the first few DCT coefficients. When a picture is rich in detail, however, the energy is distributed over the other coefficients as well, so quantization does not produce runs of consecutive zero coefficients. Coding such blocks takes many more bits, since run-length coding cannot be used efficiently. The basic way of controlling the compression factor in this case is to increase the quantization step, which leads to loss of small details in the reconstructed block (the block is blurred) and to a pronounced blocking effect in the reconstructed picture (Cloete, Zurada, 2000).
The proposed improvement of the H.263 algorithm is based on detecting these blocks and replacing them with the corresponding ANN node. The basic criterion for detecting critical blocks is the length of the bit code generated by the standard H.263 algorithm.
As the training set for the ANN we used the set of blocks that the standard H.263 process represents with more than 10 bits. The boundary code length, N = 10 bits, was chosen in order to obtain a codebook with $2^N = 1024$ prototypes.
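The relation between the codebook size and the code length can be illustrated with a short sketch. It assumes that a block handled by the ANN is represented by the index of its nearest prototype, written as a fixed-length N-bit code; this representation, the tiny example codebook and the function names are assumptions for illustration.

```python
def nearest_prototype(block, codebook):
    """Index of the prototype closest to block (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((b - p) ** 2
                                 for b, p in zip(block, codebook[i])))


def index_bits(index, n_bits=10):
    """Fixed-length binary code for a prototype index."""
    return format(index, "0{}b".format(n_bits))


# A toy codebook with three two-dimensional prototypes.
toy_codebook = [[0, 0], [10, 10], [20, 20]]
chosen = nearest_prototype([9, 12], toy_codebook)
print(chosen, index_bits(chosen))
```

With N = 10 bits, every index from 0 to 1023 has a distinct code, so a codebook of 1024 prototypes uses the available code space completely.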
To obtain the training set, video sequences from the "Matrix" movie were used, together with the standard CIF test video sequence "Mobile and calendar" (Hagan et al., 2002). A training set of about 100,000 samples was obtained for ANN training. As a result of training, the training set was transformed into 1024 codebook prototypes with the least average square error with respect to the training set.
The modified algorithm is identical to the standard H.263 video stream compression up to the stage of motion vector compensation. Every block is coded by the standard method (using the DCT and run-length coding), and then a decision is made on whether to apply the ANN instead of the standard approach. Two conditions must be fulfilled in order to use the network.
Condition of code length: whether the standard approach produces a code longer than 10 bits as the representation of the observed block. This is the primary condition; it ensures that the ANN is used only when the standard algorithm does not give a satisfactory compression level.
Condition of activation threshold: whether the average square error obtained using the neural network is within the bounds:
ASEINN - average square error obtained using ANN;
ASEDCT - average square error obtained using the standard method
k - activation threshold for the network (1.0 - 1.8).
On the basis of these conditions, the choice between the standard coding method and the ANN is made.
The format of the coded video stream is taken from the H.263 syntax (ITU-T, 1996). The organization of data in levels has been kept, as has the representation of block motion vectors. The syntax of the block level was modified by introducing an additional 1-bit field in the block-level header (Fig. 3), which records the coding method used for the block.
5. Results of testing
The modified H.263 algorithm was tested on a dynamic video sequence from the "Matrix" movie (525 pictures, 640x304 pixels). The basic measured parameters were the size of the coded video stream and the error introduced by the coding process. The error is expressed as the peak signal-to-noise ratio (PSNR):
where $ASE_l$ is the average square error of the reconstructed picture in comparison with the original one.
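As an illustration, the commonly used definition PSNR = 10·log10(peak² / ASE) can be computed as in the sketch below. The peak value of 255 (8-bit samples) and the representation of pictures as lists of rows are assumptions; the exact form of the formula used in the experiments is not reproduced in the text.

```python
import math


def average_square_error(original, reconstructed):
    """Mean of squared pixel differences between two equally sized pictures."""
    diffs = [(o - r) ** 2
             for row_o, row_r in zip(original, reconstructed)
             for o, r in zip(row_o, row_r)]
    return sum(diffs) / len(diffs)


def psnr(original, reconstructed, peak=255):
    """PSNR in dB; returns infinity for identical pictures."""
    ase = average_square_error(original, reconstructed)
    if ase == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / ase)


original = [[100, 100], [100, 100]]
reconstructed = [[101, 99], [100, 100]]
print(round(psnr(original, reconstructed), 2))
```

A higher PSNR means a smaller average square error, so curves that lie higher in a PSNR plot correspond to better reconstructed pictures.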
During testing, the quantization step used in the standard DCT coding process and the activation threshold of the neural network (the coefficient k in formula (4)) were varied as parameters.
Standard H.263 was used as the reference for comparing the obtained results.
Two series of tests were done. In the first series, the quantization step was varied while the activation threshold was kept constant (k=1.0). In the second series, the activation threshold was varied while the quantization step was kept constant (1.0).
Figure 8 shows the size of the coded stream obtained with both methods. The compression level obtained using the ANN is higher than that obtained using the standard H.263 algorithm. For larger quantization steps the stream sizes are comparable, since in that case the code length condition for using the ANN is not met and coding proceeds almost entirely without the ANN.
Figure 9 shows the error in the coded video stream for both methods. For the same quantization step, the ANN approach has an insignificantly higher error than the standard H.263 approach.
Figures 10 and 11 show the results obtained by varying the activation threshold of the neural network between 1.0 and 1.8. For clarity, the results are shown for the first 60 pictures of the test sequence. The sudden peaks correspond to changes of camera angle (frame).
The results show that as the activation threshold of the neural network increases, the compression level decreases and the quality of the video stream increases. With a further increase of the activation threshold (above k=1.8), the effect of the ANN on coding becomes minor.
This paper deals with a modification of Recommendation H.263 for video stream compression. The basic purpose of the modification is to enhance stream compression with insignificant loss of picture quality. The enhancement is achieved by means of an artificial neural network. The conditions for its use are the condition of code length and the condition of activation threshold. These conditions are tested for every block within a picture, so that each block is coded either by the standard approach or by the neural network. Test results show that this method achieves higher compression with an insignificantly higher error in comparison with the standard H.263 algorithm.