Tracking Mean Field Dynamics by Synchronous Computations of Recurrent Multilayer Perceptrons

Mean field dynamics have been extensively applied for organizing neural networks in the field of computational neuroscience, since Hopfield pioneered collective decisions of interconnected processing elements for combinatorial optimization [1–6] and memory association [7, 8]. Both the nonlinear transfer functions and the synapses in a Hopfield neural network are a consequence of mean field dynamics that characterize the mean configuration of a large-scale physical system at thermal equilibrium in the field of statistical mechanics. In the past decades, mean field dynamics have been extensively applied for deriving interactive neural dynamics for solving complex tasks, such as combinatorial optimization [4, 6, 9], self-organization [10], clustering analysis [11, 12], independent component analysis [13], and regression [14, 15].


Introduction
Mean field equations characterize feasible configurations for problem solving. Let s_i ∈ {−1, 1} denote a binary random variable for modeling a stochastic two-alternative processing element and s = {s_i}_i represent a configuration for problem solving. The feasibility of s for the problem at hand is inversely quantified by an energy function E(s); minimizing E(s) with respect to s means seeking the optimal solution. Under the Boltzmann assumption, the joint probability of all s_i is proportional to exp(−βE(s)), where β denotes the inverse of a temperature-like parameter. As in previous works [4–6], the Kullback–Leibler divergence between the product of marginal probabilities and the joint probability of all s_i induces a tractable free energy function ψ that depends on the expectation of s_i, denoted by ⟨s_i⟩, for all i.
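For s_i ∈ {−1, 1}, the tractable free energy induced by this Kullback–Leibler construction takes, in the naive mean field literature, the standard form below; the paper's exact expression for ψ is not reproduced in this excerpt, so this is a sketch of the usual form:

```latex
\psi(\langle s\rangle)
  = \beta\,E(\langle s\rangle)
  + \sum_{i}\left[
      \frac{1+\langle s_i\rangle}{2}\,\ln\frac{1+\langle s_i\rangle}{2}
     +\frac{1-\langle s_i\rangle}{2}\,\ln\frac{1-\langle s_i\rangle}{2}
    \right].
```

Setting ∂ψ/∂⟨s_i⟩ = 0 yields ⟨s_i⟩ = tanh(βu_i) with u_i = −∂E/∂⟨s_i⟩, which is the saddle point characterized by the mean field dynamics discussed next.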
The following mean field dynamics exactly characterize the saddle point of a typical tractable free energy function,

u_i = −∂E(⟨s⟩)/∂⟨s_i⟩,   (1)
⟨s_i⟩ = f(βu_i),   (2)

where u_i denotes an external field, f ≡ tanh is a sigmoid-like transfer function and ⟨s_i⟩ denotes the mean activation. In previous works [4, 6], E is quadratic and u_i measures a weighted sum of activations other than ⟨s_i⟩, such as

u_i = Σ_{j≠i} w_ij ⟨s_j⟩,   (3)

where w_ij denotes the synapse that connects neural processing elements i and j. For fixed β, equation (2) defines the transfer function of interconnected processing elements and equation (3) sketches the synapses. The realized information processes are distributed, with the computational features of fault tolerance and collective decision. All interconnected processing elements in a Hopfield neural network asynchronously operate to seek a stable configuration under an annealing process [6] that carefully schedules β from sufficiently small to large values.
At each intermediate β, a stable configuration is the result of minimizing the mean energy function against maximizing the entropy, emulating thermal equilibrium in statistical mechanics. At the end of the annealing process, by equation (2), ⟨s_i⟩ ∈ {−1, 1} and the mean configuration ⟨s⟩ is a vector of N binary values, well representing a feasible solution for problem solving. Empirical results in previous works [4, 6] have extensively shown that the physical-like annealing process ensures effectiveness and reliability in seeking the global or a near-global minimum of E(s) for problem solving. In previous works [15–18], mean field dynamics have been extended to multi-state Potts modeling and applied to unsupervised and supervised learning of neural networks for self-organization, independent component analysis, function approximation and discriminant analysis.
However, from the perspective of numerical simulations, asynchronous operation of interconnected processing elements means one-by-one sequential updating of neural variables. It is more efficient to simulate synchronous and parallel updating of neural variables by vector codes. Multilayer perceptrons, or Adalines, have been organized for parallel and synchronous processes. Significant computational features include synchronous data transmission and parallel signal processing through multilayer perceptrons. A network of multilayer perceptrons is typically composed of input, hidden and output layers as well as interconnections among consecutive layers. The input x ∈ R^d transmits through interconnections to form external fields,

h = Wx,   (4)

the nonlinear transfer function translates h to activations of hidden units,

z = f(h),   (5)

which are multiplied by a matrix of posterior weights, denoted by R, to form the network output,

y = Rz.   (6)

A recurrent network of multilayer perceptrons is further equipped with circular connections from the output to the input layer. By feedback circular connections, the current output becomes the network input at the upcoming time step. Let R be an identity matrix. Setting x to y_n and y to y_{n+1} leads to the following recursive function realized by recurrent multilayer perceptrons,

y_{n+1} = f(Wy_n).   (7)

Since perceptrons and Adalines perform post-nonlinear projection, the organized multilayer neural network realizes a high-dimensional nonlinear mapping from the input domain to the output range, which has been shown significant for solving complex tasks compared with traditional linear systems. Recurrent multilayer perceptrons perform parallel and synchronous computations for realizing the behavior of a MIMO (multiple-input multiple-output) recurrent relation or characterizing nonlinear autoregression of time series. Recurrent multilayer perceptrons have been applied for nonlinear autoregressive modeling of chaotic time series prediction [19] and financial time series [20].
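As a concrete sketch of the recursion above, assuming NumPy, a hypothetical width d = 4, random prior weights W, and R set to the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # hypothetical network width
W = 0.5 * rng.standard_normal((d, d))  # interconnections forming external fields (hypothetical values)
R = np.eye(d)                          # posterior weights set to the identity

def mlp_step(y):
    """One pass through the recurrent multilayer perceptron: y_{n+1} = R f(W y_n), f = tanh."""
    h = W @ y                          # external fields formed at the input layer
    return R @ np.tanh(h)              # post-nonlinear projection to the output layer

y = rng.standard_normal(d)             # initial network input
for _ in range(50):                    # circular connection: output feeds back as the next input
    y = mlp_step(y)
print(y)
```

All d outputs are produced synchronously in a single matrix-vector pass, which is the computational feature exploited in the rest of the paper.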
This work applies recurrent multilayer perceptrons for tracking mean field dynamics by synchronous and parallel computations. A systematic approach is proposed for translating mean field dynamics (1) and (2) to the nonlinear recursive function (7) such that recurrent multilayer perceptrons can track the saddle point of ψ by parallel and synchronous computations. The strategy is to introduce time delays and auxiliary variables for expanding local memories of storing individual states, and translate loosely coupled or densely coupled first order mean field equations to a system of post-nonlinear recursive functions, which can be evaluated directly by iterative synchronous computations of recurrent multilayer perceptrons.
Section 2 applies parallel and synchronous computations of recurrent multilayer perceptrons for tracking mean field dynamics. Asynchronous updating for tracking linear dynamics and mean field dynamics is translated to equivalent synchronous updating. Section 3 applies the transformation to derive synchronous updating of tracking mean field dynamics for solving the graph bisection problem and verifies the proposed approach by numerical simulations. Section 4 further presents a hybrid of asynchronous and synchronous processes for tracking large-scale mean field dynamics with sparse connectivity for problem solving.

Synchronous computation of tracking mean field dynamics
By asynchronous updating, numerical simulations select one processing element at each time step and refine its mean activation under fixed mean activations of the others. Let ψ_i(⟨s_i⟩) denote ψ with fixed ⟨s_j⟩, j ≠ i, where β = 1 is considered,

ψ_i(⟨s_i⟩) = −h_i⟨s_i⟩ + ((1 + ⟨s_i⟩)/2) ln((1 + ⟨s_i⟩)/2) + ((1 − ⟨s_i⟩)/2) ln((1 − ⟨s_i⟩)/2).   (8)

⟨s_i⟩ = tanh(h_i) minimizes the above equation. ψ_i(⟨s_i⟩) is an approximation to the one-dimensional function obtained by cutting the functional surface of ψ along the direction of ⟨s_i⟩ for fixed ⟨s_j⟩, j ≠ i. By asynchronous updating, the coefficient h_i of the linear term in equation (8) always maintains an instance determined by fixing the most recently updated mean activations. Asynchronous updating is represented by

⟨s_i⟩ ← tanh( Σ_{j≠i} w_ij ⟨s_j⟩ ),   (9)

where only one mean activation is revised at a time. The asynchronous cutting-and-approximating strategy is very different from synchronous updating that directly combines equations (2) and (3) for all i, such as

v[k+1] = tanh( βWv[k] ),   (10)

where W collects all w_ij and v[k] collects all mean activations at the kth step. By synchronous updating, all h_i use the copy formed by all mean activations synchronously determined at the previous step. Numerical simulations have verified that synchronous updating based on equation (10) is infeasible for relaxing mean field dynamics.
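A minimal sketch of asynchronous updating (9), assuming NumPy, β = 1 and hypothetical random symmetric synapses; it checks that the relaxed configuration satisfies the fixed-point condition ⟨s_i⟩ = tanh(h_i):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
W = rng.standard_normal((N, N))
W = (W + W.T) / 2              # symmetric synapses (hypothetical values)
np.fill_diagonal(W, 0.0)       # no self-coupling: w_ii = 0
beta = 1.0                     # beta = 1 as considered in the text

v = rng.uniform(-0.1, 0.1, N)  # initial mean activations

# Asynchronous updating (9): revise one mean activation at a time,
# each update using the most recently revised values of the others.
for sweep in range(500):
    for i in range(N):
        v[i] = np.tanh(beta * (W[i] @ v))   # w_ii = 0, so v_i drops out of its own field

# At a fixed point every component satisfies v_i = tanh(beta * h_i).
residual = np.max(np.abs(v - np.tanh(beta * (W @ v))))
print(residual)
```

Each inner update exactly minimizes the one-dimensional cut ψ_i of equation (8), so ψ is non-increasing along the sweep and the iteration relaxes to a fixed point of the mean field dynamics.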

Linear system
Let f be a linear function; the asynchronous updating rule (9) is then equivalent to

x_i ← Σ_{j≠i} a_ij x_j + b_i,   (11)

where A = [a_ij] is an N × N matrix with a_ii = 0 for all i = 1, · · · , N. To facilitate our presentation, we first give an example with N = 4 for illustration. Figure 1 shows the data flow of asynchronous updating (11), where directed edges indicate the latest mean activations employed for updating. Each asynchronous step revises only one mean activation. Without loss of generality, consecutive steps of updating mean activations can be listed as follows,

x_1[k+1] = a_12 x_2[k] + a_13 x_3[k] + a_14 x_4[k] + b_1
x_2[k+2] = a_21 x_1[k+1] + a_23 x_3[k] + a_24 x_4[k] + b_2
x_3[k+3] = a_31 x_1[k+1] + a_32 x_2[k+2] + a_34 x_4[k] + b_3
x_4[k+4] = a_41 x_1[k+1] + a_42 x_2[k+2] + a_43 x_3[k+3] + b_4.   (12)

The system (12) is translated to synchronous updating by replacing k with k − 1, k − 2, k − 3 and k − 4 respectively in the four rows, for k ≥ 3, so that every row updates at the same time index,

x_1[k] = a_12 x_2[k−1] + a_13 x_3[k−1] + a_14 x_4[k−1] + b_1
x_2[k] = a_21 x_1[k−1] + a_23 x_3[k−2] + a_24 x_4[k−2] + b_2
x_3[k] = a_31 x_1[k−2] + a_32 x_2[k−1] + a_34 x_4[k−3] + b_3
x_4[k] = a_41 x_1[k−3] + a_42 x_2[k−2] + a_43 x_3[k−1] + b_4.   (13)

The matrix form is expressed by

x[k] = B u[k−1] + b,   (14)

where u[k] collects delayed copies of the mean activations. For initialization, x[0] is copied three times to form u[N], where N = 4. Figure 2 shows a recurrent linear network for synchronous computations of equation (14). The circular connection transmits the current output to the input layer at the upcoming step. In general, u[k] is given by

u[k] = [ x[k]ᵀ x[k−1]ᵀ · · · x[k−N+2]ᵀ ]ᵀ,
which concatenates N − 1 consecutive steps of mean activations, and B = [B_1 B_2 · · · B_{N−1}] is composed of N − 1 submatrices, where B_n holds the coefficients applied to x[k−n]. Figures 3 and 4 show the structure of the matrices {B_n}_{n=1}^{N−1}; distinct colors represent nonzero entries. Figure 5 shows the flow chart of creating the matrix B. Figure 6 shows the flow chart of simulating asynchronous updating by linear recurrent computations, where repmat is a Matlab built-in function for matrix replication.
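The explicit construction of B (figures 3–5) is not reproduced in this excerpt; the sketch below instead checks the underlying principle on a hypothetical 4 × 4 example: a full asynchronous sweep of (11) equals a single synchronous affine map, obtained here through the standard strictly-lower/strictly-upper splitting rather than the paper's delay-line matrix B:

```python
import numpy as np

# Hypothetical 4x4 example of x_i <- sum_{j != i} a_ij x_j + b_i, zero diagonal.
N = 4
A = np.array([[ 0.0,  0.2, -0.1,  0.1],
              [ 0.3,  0.0,  0.1, -0.2],
              [-0.1,  0.2,  0.0,  0.1],
              [ 0.1, -0.1,  0.2,  0.0]])
b = np.array([1.0, -0.5, 0.25, 0.75])

# Asynchronous (one-variable-at-a-time) updating for two full sweeps.
x_async = np.zeros(N)
for _ in range(2):
    for i in range(N):
        x_async[i] = A[i] @ x_async + b[i]   # uses already-updated x_j for j < i

# Equivalent synchronous form: unroll one full sweep into a single affine map
# x[k+1] = M x[k] + c with M = (I - L)^{-1} U and c = (I - L)^{-1} b,
# where L and U are the strictly lower and strictly upper parts of A.
L = np.tril(A, -1)
U = np.triu(A, 1)
Minv = np.linalg.inv(np.eye(N) - L)
M = Minv @ U
c = Minv @ b

x_sync = np.zeros(N)
for _ in range(2):
    x_sync = M @ x_sync + c

print(np.allclose(x_async, x_sync))
```

The two schedules coincide sweep by sweep, which is exactly the equivalence the delay-line translation (12)–(14) exploits while keeping every step a pure matrix-vector operation.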

Mean field dynamics
Asynchronous updating (11) can be regarded as a special case of asynchronous updating (9) of mean field dynamics. Let f denote a post-nonlinear function and v_i = ⟨s_i⟩ for general situations. Synchronous parallel updating is explored for emulating asynchronous updating (9) of tracking mean field dynamics.
The asynchronous updating rule is rewritten row by row, analogously to the linear case, as

v_n[k+n] = f( β ( Σ_{j<n} w_nj v_j[k+j] + Σ_{j>n} w_nj v_j[k] + c_n ) ),  n = 1, · · · , N,   (15)

where v[0] = (v_1[0], · · · , v_N[0]) denotes the initial mean configuration, v_i[k] is the instance of v_i at the kth step for k ≥ 0, and c_i is a constant. This leave-one-out asynchronous updating revises the mean activation of each processing element in turn. The system (15) is translated to synchronous updating by replacing the index k + n with k in the row that updates v_n, for k ≥ N.
The matrix B can be determined by the flow chart in figure 5 for translating mean field dynamics to the following form,

v[k] = f( β ( B u[k−1] + c ) ),   (17)

where v[k] = (v_1[k], · · · , v_N[k]) denotes the mean configuration at the kth step and u[k−1] concatenates the mean configurations of the preceding N − 1 steps. The structure of the MIMO recurrent multilayer perceptrons is shown in Figure 7. The derived recurrent multilayer perceptrons track mean field dynamics by parallel and synchronous computations. As in the previous work [6], an annealing process is employed to schedule β from sufficiently small to large values for problem solving. Figure 8 shows the flow chart of simulating synchronous and parallel computations of recurrent multilayer perceptrons for tracking mean field dynamics.
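A sketch of the resulting synchronous pipeline, assuming NumPy: every unit updates at every step while reading delayed copies of the others, with delays inferred from the k + n → k substitution (an assumption, since the explicit B of figure 5 is not shown in this excerpt); couplings are kept small so a unique fixed point exists, and the pipeline is compared against leave-one-out asynchronous updating:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4
W = 0.1 * rng.standard_normal((N, N))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)           # small symmetric couplings: a unique fixed point exists
c = rng.standard_normal(N)         # constant external inputs c_i (hypothetical values)
beta = 1.0

# Reference: leave-one-out asynchronous updating of (15).
v_async = np.zeros(N)
for sweep in range(200):
    for i in range(N):
        v_async[i] = np.tanh(beta * (W[i] @ v_async + c[i]))

# Synchronous pipeline: delays d_ij assumed from the substitution
# (short delays for already-updated units, longer delays for the rest).
D = np.zeros((N, N), dtype=int)
for i in range(N):
    for j in range(N):
        if j < i:
            D[i, j] = i - j
        elif j > i:
            D[i, j] = i + 1
hist = [np.zeros(N) for _ in range(N)]   # hist[d] = configuration d steps ago
for step in range(200):
    v_new = np.empty(N)
    for i in range(N):                   # all units update in parallel at every step
        field = sum(W[i, j] * hist[D[i, j]][j] for j in range(N) if j != i)
        v_new[i] = np.tanh(beta * (field + c[i]))
    hist = [v_new] + hist[:-1]
v_sync = hist[0]

print(np.allclose(v_async, v_sync, atol=1e-8))
```

Both schedules relax to the same fixed point v = f(β(Wv + c)), which is the consistency the recurrent-perceptron translation relies on.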

Solving linear systems
The linear recurrent relation (14) is verified by numerical simulations for solving a linear system. The flow charts in figures 5 and 6 are implemented in Matlab code with a given initial value x[0]. Figure 9(a) shows the errors of asynchronous updating and synchronous updating along time steps, and Figure 9(b) shows the errors after the 25th step. The numerical results show that the error of asynchronous updating converges more slowly than that of synchronous updating, which illustrates the advantage of synchronous updating. When parallel computations such as vector codes are employed, synchronous updating is more efficient than asynchronous updating for numerical simulations.

Graph bisection problem
The graph bisection problem [4] is stated as partitioning N nodes into two equal-sized sets such that the number of net edges crossing the two sets is minimized. Let s_i ∈ {−1, 1} denote the membership of the ith node in the two non-overlapping sets and T_ij denote the connectivity, where T_ij = 1 if nodes i and j are connected and T_ij = 0 otherwise. Node i is in one subset if s_i = 1 and belongs to the other if s_i = −1. As in [4], E(s) for problem solving is given by

E(s) = −(1/2) Σ_{i,j} T_ij s_i s_j + (a/2) ( Σ_{i=1}^N s_i )²,

where a is the Lagrange multiplier which forces Σ_{i=1}^N s_i to zero. T_ij s_i s_j is zero if T_ij = 0; otherwise, it is 1 if nodes i and j belong to the same subset and −1 if they belong to different subsets. Therefore, the first term quantifies the number of net edges crossing the two subsets, and the last term forces an equal cut. As shown in Appendix A, E(s) can be rewritten as

E(s) = −(1/2) Σ_{i,j} W_ij s_i s_j + constant,

where W_ij = T_ij − a and W_ii = 0.
We further explore the performance of synchronous updating by annealed recurrent multilayer perceptrons for graph bisection. In our simulations, each connection T_ij between nodes i and j is set to one if a uniform random number within (0, 1) less than 0.2 is generated, and to zero otherwise. The parameter a is 2. The halting condition is set to χ(v) > 0.99, where χ(v) measures the stability of the mean configuration. The temperature-like parameter β is always scheduled from sufficiently low to high values. Figure 10 shows the convergence of annealed asynchronous updating (9) and annealed synchronous updating (17) for tracking mean field dynamics of solving a 100-node graph bisection problem, where the blue and red curves respectively show the change of the stability and 1/β along time steps. Figure 11 shows the change of cutsize and free energy by the blue and red curves, respectively. The histograms of cutsize obtained by 50 executions of annealed asynchronous updating and annealed synchronous updating are plotted in Figure 12, where the mean cutsize of annealed synchronous updating is 361.84, comparable to 358.5 of annealed asynchronous updating.
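A compact sketch of the annealed asynchronous baseline under the stated setup (connection probability 0.2, a = 2), assuming NumPy; the form χ(v) = (1/N) Σ_i v_i² of the saturation measure and the geometric β schedule are assumptions, as the excerpt does not define them:

```python
import numpy as np

rng = np.random.default_rng(7)
N, p, a = 100, 0.2, 2.0
T = (rng.uniform(size=(N, N)) < p).astype(float)
T = np.triu(T, 1)
T = T + T.T                      # symmetric 0/1 connectivity, zero diagonal
W = T - a
np.fill_diagonal(W, 0.0)         # W_ij = T_ij - a, W_ii = 0 (Appendix A)

v = rng.uniform(-0.01, 0.01, N)  # near-zero initial mean configuration
beta = 0.05                      # scheduled from low to high values
for sweep in range(400):
    for i in range(N):           # one asynchronous sweep of (9) at this beta
        v[i] = np.tanh(beta * (W[i] @ v))
    beta *= 1.05                 # assumed geometric annealing schedule
    if (v ** 2).mean() > 0.99:   # assumed form of the saturation measure chi(v)
        break

s = np.where(v >= 0, 1.0, -1.0)            # final binary partition
cut = T[s[:, None] != s[None, :]].sum() / 2  # edges crossing the two subsets
print(cut, s.sum())
```

The −a penalty inside W softly drives Σ_i s_i toward zero, so the final configuration is a near-balanced partition whose cutsize can be compared across runs as in Figure 12.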
Figure 10. The change of the stability and 1/β for solving the graph bisection problem by synchronous updating and asynchronous updating.

Figure 11. The change of cutsize and free energy for solving the graph bisection problem: (a) annealed asynchronous updating; (b) annealed synchronous updating.

Parallel and distributed processes of tracking mean field dynamics with sparse connectivity
This section discusses the case of sparse interconnection among processing units, in which a processing unit connects only with processing units in a small neighborhood. Sparsely interconnected processing units are partitioned into K clusters such that the cut size of interconnections crossing distinct clusters is minimized. This formulates a typical problem of K-set partition on a sparse graph. Mean field dynamics for K-set graph partition have been proposed in [6]. As argued previously, parallel and synchronous computations by recurrent multilayer perceptrons can be obtained for tracking mean field dynamics of resolving K-set graph partition. Let {S_k}_{k=1}^K be the K partitioned clusters of sparsely interconnected processing units and c_k be the outer-input of processing units in S_k. c_k contains nonzero elements if there exists a processing unit in S_k that is connected with units not in S_k, and those nonzero elements are determined by the mean activations of processing units outside S_k. After K-set graph partition, all nodes are reindexed according to {S_k}_{k=1}^K. Ideally, there is dense connectivity among processing units inside each S_k and sparse connectivity among {S_k}_{k=1}^K through {c_k}_{k=1}^K, as illustrated in Figure 13. In each cluster S_k, even when some processing unit connects to processing units outside S_k, all processing units inside S_k can be evaluated directly by synchronous computations for fixed c_k according to the approach in section 2. The proposed approach thus combines synchronous updating of mean activations inside each S_k with sequential updating among {S_k}_{k=1}^K for tracking mean field dynamics with sparse connectivity. The idea follows parallel and distributed processing: a large system is decomposed into several sparsely connected small systems, mean activations inside each small system are updated synchronously, and the decomposed systems are updated sequentially. Suppose that each S_k has the same number of nodes.
The number of nodes in S_k is |S_k| = N/K ≪ N. Figure 14 shows the flow chart of the proposed approach. The halting condition compares the stability χ(v) with a threshold. An example with N = 12 illustrating the decomposition of a sparse system into three small systems is given in Appendix C.
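A sketch of the hybrid scheme on a hypothetical N = 12, K = 3 system, assuming NumPy: mean activations inside each cluster S_k are updated synchronously for a frozen outer-input c_k, and the clusters are visited sequentially; couplings are kept small so the iteration relaxes to a fixed point (the paper instead runs the recurrent-perceptron pipeline of section 2 inside each cluster):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 12, 3
clusters = np.array_split(np.arange(N), K)   # nodes reindexed by cluster
# Hypothetical couplings: dense inside each cluster, sparse across clusters.
W = np.zeros((N, N))
for S in clusters:
    block = 0.1 * rng.standard_normal((len(S), len(S)))
    W[np.ix_(S, S)] = (block + block.T) / 2
W[0, 4] = W[4, 0] = 0.1                      # a few cross-cluster links
W[5, 9] = W[9, 5] = -0.1
np.fill_diagonal(W, 0.0)
c0 = rng.standard_normal(N)                  # constant external inputs (hypothetical)
beta = 1.0

v = rng.uniform(-0.1, 0.1, N)
for sweep in range(200):
    for S in clusters:                       # sequential updating among clusters
        out = np.setdiff1d(np.arange(N), S)
        ck = W[np.ix_(S, out)] @ v[out]      # outer-input c_k, frozen for this block
        for _ in range(5):                   # synchronous relaxation inside S_k
            v[S] = np.tanh(beta * (W[np.ix_(S, S)] @ v[S] + ck + c0[S]))

residual = np.max(np.abs(v - np.tanh(beta * (W @ v + c0))))
print(residual)
```

Because cross-cluster links enter only through the frozen c_k, each inner relaxation is an independent small dense problem, which is what makes the decomposition parallel and distributed.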

Conclusions
This paper has proposed a novel approach for tracking mean field dynamics by synchronous computations of recurrent multilayer perceptrons. The strategy is to introduce time delays and auxiliary variables and to construct equivalent recursive relations; this essentially constructs recurrent multilayer perceptrons for tracking densely coupled mean field dynamics. The proposed approach is also extended to deal with large-scale, sparsely interconnected mean field dynamics. First, all processing units are partitioned into K clusters by solving graph partition; the task is then decomposed into K subtasks of synchronous computations, and different clusters are sparsely connected through outer-inputs. The work combines synchronous updating inside each cluster with sequential updating among the K clusters.
Numerical simulations show that the proposed approach successfully translates the mean field equations for solving the graph bisection problem into a system of post-nonlinear recursive functions, and verify the consistency between the original mean field equations and the corresponding recurrent computations.

Appendix A. Rewriting energy function of graph bisection problem
Since s_i² = 1 for all i, ( Σ_{i=1}^N s_i )² = Σ_{i≠j} s_i s_j + N. As N is a constant, the energy function is rewritten as

E(s) = −(1/2) Σ_{i≠j} T_ij s_i s_j + (a/2) Σ_{i≠j} s_i s_j + (a/2) N
     = −(1/2) Σ_{i≠j} (T_ij − a) s_i s_j + (a/2) N
     = −(1/2) Σ_{i,j} W_ij s_i s_j + constant,

where W_ij = T_ij − a for i ≠ j and W_ii = 0.