Open access peer-reviewed chapter

Solving Partial Differential Equation Using FPGA Technology

Written By

Vu Duc Thai and Bui Van Tung

Submitted: 10 September 2018 Reviewed: 21 January 2019 Published: 13 December 2019

DOI: 10.5772/intechopen.84588

From the Edited Volume

Boundary Layer Flows - Theory, Applications and Numerical Methods

Edited by Vallampati Ramachandra Prasad



This chapter introduces a method that uses cellular neural network (CNN) technology on FPGA chips to solve partial differential equations over large computing spaces. Because the resources of an FPGA chip are limited, a large computing space must be separated into several subspaces. Our solution does three things: first, it divides the computing space into smaller areas and combines sequential and parallel computing; second, it divides and recombines the boundary areas, which must remain continuous, so that no temporary data are lost during processing (buffer memory is used to store them); and third, it exchanges data in real time. A control unit coordinates the activities of the whole system according to the algorithm. We have successfully configured a CNN chip that solves the Navier-Stokes equations for hydraulic fluid flow on the Xilinx Virtex 6 chip XC6VLX240T-1FFG1156 and obtained acceptable results.


  • Navier-Stokes equation
  • cellular neural network
  • field programmable gate array
  • boundary processing
  • separating computing space

1. Introduction

Solving partial differential equations (PDEs) numerically on PCs has been investigated successfully by many researchers. However, for problems with a large computing space, a PC solution can hardly meet the requirements for speed and accuracy, and in some cases the problem cannot be solved at all because of its computational cost. Researchers have applied cellular neural network (CNN) technology successfully to analyze such problems, design CNN chips, and solve some PDEs.

To solve a PDE with CNN technology, we must analyze and discretize (difference) the original partial differential equations of the problem, derive the templates, design the CNN architecture, and then configure an FPGA to implement the CNN chip. There is therefore no single CNN chip for every equation; an appropriate CNN chip must be designed for each problem (which may consist of several equations). Large problems require many computing resources to configure the blocks of CNN chips. To save resources, we propose dividing the computing space into smaller subspaces and combining parallel and sequential calculation, which keeps the computing rate high while saving FPGA resources.

Because the architecture of a CNN chip depends on the problem, making CNN chips with traditional methods is very difficult and costly. With FPGA technology, designers can use hardware description languages such as Verilog and VHDL to configure the logic elements of the FPGA into the electronic circuit of a CNN chip. Recent FPGA architectures (Virtex 7, Stratix 10) offer many supporting tools for testing, optimization, and coordinating data exchange. CNN designers should therefore use FPGAs to make CNN chips.


2. CNN and FPGA technology

2.1 Cellular neural network technology

The cellular neural network (CNN) was introduced by Chua and Yang at the University of California, Berkeley (USA), in 1988; it combines analog spatial-temporal dynamics and logic [1, 2, 3]. The CNN paradigm is a natural framework for describing the behavior of locally interconnected dynamic systems, and because it has an array structure it is well suited to solving partial differential equations [3, 4, 5, 6, 7]. Today, visual microprocessors based on this type of processing reach TeraOPS computing power and approximately 50,000 fps. The possibility of developing algorithms and programs based on CNN was quickly exploited worldwide, and there are now CNN models for image processing, solving PDEs, pattern recognition, gene analysis, etc. Depending on the problem, the designer can make a CNN chip with millions of cells. The common CNN architectures are 1D, 2D, and 3D.

The standard 2D CNN is a dynamic system of autonomous cells, each connected locally to its neighbors, forming a two-dimensional array [2, 18]. Each cell C(i,j) of the array contains one independent voltage source, one independent current source, a linear capacitor, resistors, and linear voltage-controlled current sources, which are coupled to the neighbor cells via the controlling input voltages and the feedback from the output voltage of each neighbor cell C(k,l). The templates A(i,j;k,l) and B(i,j;k,l) are the parameters linking cell C(i,j) to neighbor C(k,l). The neighborhood S_r(i,j) of radius r of cell C(i,j) is the set of neighbor cells that satisfy the following condition (Figure 1):

Figure 1.

The architecture of a CNN chip.

$$S_r(i,j) = \{\, C(k,l) \mid \max(|k-i|,\ |l-j|) \le r,\ \ 1 \le k \le M,\ 1 \le l \le N \,\}$$
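To make the neighborhood definition concrete, the following Python sketch enumerates S_r for a cell on a finite grid of any dimension. It is an illustration only: the function name `sphere_of_influence` is hypothetical, indices are 1-based as in the formula, and the returned set includes the cell itself, so for r = 1 an interior cell has 8 neighbors in 2D and 26 in 3D.

```python
from itertools import product


def sphere_of_influence(center, r, shape):
    """Cells whose Chebyshev distance to `center` is at most r, clipped to the grid.

    center: 1-based cell index, e.g. (i, j) or (i, j, k)
    shape:  grid size per dimension, e.g. (M, N) or (M, N, K)
    The result includes the center cell itself, as in the definition of S_r.
    """
    ranges = [range(max(1, c - r), min(size, c + r) + 1) for c, size in zip(center, shape)]
    return list(product(*ranges))


if __name__ == "__main__":
    print(len(sphere_of_influence((3, 3), 1, (5, 5))) - 1)        # neighbors of a 2D interior cell
    print(len(sphere_of_influence((3, 3, 3), 1, (5, 5, 5))) - 1)  # neighbors of a 3D interior cell
```

Cells on the edge of the array simply have fewer neighbors; how the chip treats those edge cells is governed by the boundary handling described in Section 3.4.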

The state equation of cell C(i,j) is given by the following equation:

$$C\frac{dx_{ij}(t)}{dt} = -\frac{1}{R}x_{ij}(t) + \sum_{C(k,l)\in S_r(i,j)} A(i,j;k,l)\,y_{kl}(t) + \sum_{C(k,l)\in S_r(i,j)} B(i,j;k,l)\,u_{kl} + z_{ij} \qquad (1)$$

where R and C are the linear resistor and capacitor; A(i,j;k,l) is the feedback template; B(i,j;k,l) is the control template acting on the inputs u_kl; and z_ij is the bias of cell C(i,j). On the CNN chip, (A, B, z) are the local connection weights from each cell C(i,j) to its neighbors. The output y_ij of cell C(i,j) is:

$$y_{ij} = f(x_{ij}) = \frac{1}{2}\left(|x_{ij}+1| - |x_{ij}-1|\right) \qquad (2)$$

The characteristic of the CNN output function y_ij = f(x_ij) is shown in Figure 2.

Figure 2.

CNN circuit output function.
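The dynamics of Eqs. (1) and (2) can be explored numerically before committing them to hardware. The sketch below integrates the 2D state equation with forward Euler. It assumes space-invariant 3x3 templates (r = 1), zero-valued virtual cells outside the array, and an illustrative step size dt; these are choices made for the example, not the chip's actual design, and the names `cnn_euler` and `correlate3x3` are hypothetical.

```python
import numpy as np


def output(x):
    """Piecewise-linear CNN output function, Eq. (2)."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))


def correlate3x3(T, a):
    """out[i, j] = sum_{k,l} T[k, l] * a[i + k - 1, j + l - 1], with zeros outside the array."""
    p = np.pad(a, 1)  # zero padding plays the role of a fixed zero boundary
    M, N = a.shape
    out = np.zeros((M, N))
    for dk in range(3):
        for dl in range(3):
            out += T[dk, dl] * p[dk:dk + M, dl:dl + N]
    return out


def cnn_euler(x0, u, A, B, z, R=1.0, C=1.0, dt=0.05, steps=200):
    """Forward-Euler integration of the 2D CNN state equation, Eq. (1)."""
    x = np.asarray(x0, dtype=float).copy()
    Bu = correlate3x3(B, u)  # the input term is constant in time
    for _ in range(steps):
        dxdt = (-x / R + correlate3x3(A, output(x)) + Bu + z) / C
        x = x + dt * dxdt
    return x, output(x)
```

For example, with a self-feedback A(i,j;i,j) = 2, no coupling, and B = 0, each cell is driven into saturation, so its output settles at +1 or -1 according to the sign of its initial state.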

In a 3D CNN, besides the connections to its neighbors in the same layer, each cell is also connected to the upper and lower layers in three-dimensional space [18], as shown in Figure 3. Thus, with radius r = 1, cell C(i,j,k) has 26 neighbors, and the templates A and B carry three indices per cell, A(i,j,k;l,m,n) and B(i,j,k;l,m,n).

Figure 3.

The 3D CNN with r = 1 (26 neighbors) in three-dimensional coordinates x, y, z.

The state equation of the 3D CNN takes the form:

$$C\frac{dx_{ijk}(t)}{dt} = -\frac{1}{R}x_{ijk}(t) + \sum_{C(l,m,n)\in S_r(i,j,k)} A(i,j,k;l,m,n)\,y_{lmn}(t) + \sum_{C(l,m,n)\in S_r(i,j,k)} B(i,j,k;l,m,n)\,u_{lmn} + z_{ijk} \qquad (3)$$

The output function is the same as for the 2D CNN:

$$y_{ijk} = f(x_{ijk}) = \frac{1}{2}\left(|x_{ijk}+1| - |x_{ijk}-1|\right)$$

For three-dimensional PDE problems, the 3D CNN must be used: the original PDE is discretized, and the appropriate templates (A, B, z) of the 3D CNN are derived from the discretized form.

2.2 Field-programmable gate array technology

A field-programmable gate array (FPGA) is a chip whose blank blocks provide logic gates and RAM blocks that can be configured to implement complex digital computations. FPGAs can implement any logic function. Their advantages include the ability to update functionality after shipping, partial reconfiguration of a portion of the design, and low nonrecurring engineering costs relative to an ASIC design [13, 14, 15, 16].

A recent trend has been to take the coarse-grained architectural approach by combining the logic blocks and interconnects of traditional FPGAs with embedded chips and related peripherals to form a complete “system on a programmable chip” [17, 18, 19].

Users such as teachers and students can use FPGAs to build prototypes for testing application systems: with VHDL or Verilog, they can easily design and test a system and then reconfigure it until it gives the desired results.

2.3 Using FPGA to make CNN chip for solving PDE

Because the CNN architecture is not the same for every application, the designer develops a particular chip for each problem, starting from the standard model. An FPGA is the most practical way to configure a blank chip into a CNN chip using a hardware description language such as Verilog or VHDL. To solve a PDE, one first analyzes (discretizes) the original partial differential equation model to find the appropriate templates, then designs the CNN chip architecture from those templates, and finally uses VHDL to configure the FPGA according to the designed hardware, producing the CNN chip.
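As a small illustration of the first step (finding the template), consider the heat diffusion equation listed below. Discretizing the Laplacian with the 5-point stencil on a grid of spacing h, and adding 1/R to the center entry to cancel the -x/R term of Eq. (1), gives a feedback template whose action in the linear region (y = x) reproduces the finite-difference right-hand side. The sketch checks this numerically; it assumes C = 1, B = 0, z = 0, and states scaled into [-1, 1], and the function names are hypothetical.

```python
import numpy as np


def heat_template(c, h, R=1.0):
    """Feedback template for du/dt = c * Laplacian(u) on a grid of spacing h.

    (c / h^2) times the 5-point stencil; +1/R on the center cancels the -x/R leak of Eq. (1).
    """
    A = (c / h ** 2) * np.array([[0.0, 1.0, 0.0],
                                 [1.0, -4.0, 1.0],
                                 [0.0, 1.0, 0.0]])
    A[1, 1] += 1.0 / R
    return A


def cnn_rhs(A, x, R=1.0):
    """Right-hand side of Eq. (1) with C = 1, B = 0, z = 0, assuming y = x (linear region)."""
    M, N = x.shape
    p = np.pad(x, 1)
    s = np.zeros((M, N))
    for dk in range(3):
        for dl in range(3):
            s += A[dk, dl] * p[dk:dk + M, dl:dl + N]
    return -x / R + s
```

Only interior cells are meaningful here; the zero padding stands in for whatever boundary condition the actual problem imposes.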

Some PDEs have been solved using the CNN technology:

Burgers' equation [3]:

$$\frac{\partial u(x,t)}{\partial t} = \frac{1}{R}\frac{\partial^2 u(x,t)}{\partial x^2} - u(x,t)\frac{\partial u(x,t)}{\partial x} + F(x,t)$$

Klein-Gordon equation [19]:

$$\frac{\partial^2 u(x,t)}{\partial t^2} = \nabla^2 u(x,t) - \sin u(x,t)$$

Heat diffusion equation [3]:

$$\frac{\partial u(x,y,t)}{\partial t} = c\,\nabla^2 u(x,y,t)$$

Black-Scholes equation [9]:

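$$\frac{\partial V(S,t)}{\partial t} = rV(S,t) - \frac{1}{2}\sigma^2 S^2\frac{\partial^2 V(S,t)}{\partial S^2} - rS\frac{\partial V(S,t)}{\partial S}$$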

Air pollution equation [4]:

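$$\frac{\partial \varphi}{\partial t} + \operatorname{div}(\vec{v}\varphi) + \sigma\varphi - \gamma\frac{\partial^2 \varphi}{\partial z^2} - \mu\,\Delta\varphi = f(x,y,z)$$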

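Saint-Venant 2D equations [5]:

$$\frac{\partial H}{\partial t} + \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0$$
$$\frac{\partial u}{\partial t} + \frac{\partial u^2}{\partial x} + g\frac{\partial H}{\partial x} + \frac{\partial (uv)}{\partial y} = -\frac{g\,u\,(u^2+v^2)^{1/2}}{K_x^2 H^2}$$
$$\frac{\partial v}{\partial t} + \frac{\partial v^2}{\partial y} + g\frac{\partial H}{\partial y} + \frac{\partial (uv)}{\partial x} = -\frac{g\,v\,(u^2+v^2)^{1/2}}{K_y^2 H^2}$$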

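Saint-Venant 1D equations [6]:

$$b\frac{\partial h(x,t)}{\partial t} + \frac{\partial Q(x,t)}{\partial x} = q \qquad (4)$$
$$\frac{\partial Q(x,t)}{\partial t} + \frac{\partial}{\partial x}\left(\frac{Q^2(x,t)}{b\,h(x,t)}\right) + gb\,h(x,t)\frac{\partial h(x,t)}{\partial x} - gIb\,h(x,t) + gJb\,h(x,t) = k_q q \qquad (5)$$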

Example: making a CNN chip to solve the 1D Saint-Venant equations.

  • Designing the templates

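First, rewrite the original equation (4):

$$b\frac{\partial h(x,t)}{\partial t} + \frac{\partial Q(x,t)}{\partial x} = q \;\Rightarrow\; \frac{\partial h(x,t)}{\partial t} = -\frac{1}{b}\frac{\partial Q(x,t)}{\partial x} + \frac{q}{b} \qquad (6)$$

Then choose a difference grid in the space variable x with step Δx for the right-hand side of (6). Differencing only the right-hand side of (6) in x by Taylor expansion gives the equation for the cell at position i:

$$\frac{\partial h_i}{\partial t} = -\frac{1}{2b\Delta x}\left(Q_{i+1} - Q_{i-1}\right) + \frac{q}{b} \qquad (7)$$

Note that, following the CNN algorithm, we keep the symbol ∂h/∂t on the left-hand side. From (7), the templates are:

$$A_{hQ} = \begin{bmatrix} \frac{1}{2b\Delta x} & \frac{1}{R_h} & -\frac{1}{2b\Delta x} \end{bmatrix}; \quad B_h = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}; \quad z_h = 0$$

where R_h is the linear resistance of the cell circuit for h.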

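For Eq. (5), with the same assumptions as above:

$$\frac{\partial Q(x,t)}{\partial t} + \frac{\partial}{\partial x}\left(\frac{Q^2(x,t)}{b\,h(x,t)}\right) + gb\,h(x,t)\frac{\partial h(x,t)}{\partial x} - gIb\,h(x,t) + gJb\,h(x,t) = k_q q \qquad (8)$$

Assuming q > 0, we have k_q = 0. After differencing and applying the CNN template design algorithm, the templates for (8) are:

$$A_Q = \begin{bmatrix} \frac{Q_{i-1}}{2b\Delta x\,h_{i-1}} & \frac{1}{R_Q} & -\frac{Q_{i+1}}{2b\Delta x\,h_{i+1}} \end{bmatrix};$$
$$A_{Qh} = \begin{bmatrix} \frac{gbh_i}{2\Delta x} & gb(I-J) & -\frac{gbh_i}{2\Delta x} \end{bmatrix}; \quad B_Q = 0; \quad z_Q = 0$$

From the templates found, we can design the CNN architecture for the problem as (1) a two-layer 1D CNN chip (Figure 4) and (2) the h, Q processing block (Figure 5).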

Figure 4.

Logical architecture of a CNN cell.

Figure 5.

Logical architecture of an h, Q cell.

The h and Q parts are combined in one block to form the physical architecture of a CNN cell.

In general, each calculation needs basic computing blocks such as ADD, SUBTRACT, MULTIPLY, and DIVIDE. When designing a CNN cell on an FPGA, one would have to build many separate copies of these blocks to process each input arithmetically. To save FPGA resources, the basic blocks in a cell can be shared, which leads to sequential calculation (Figure 6); in this case, the processing time of each cell is long. To reduce the processing time of each cell, we can use the pipeline mechanism shown in Figure 7, at the cost of more computing resources per cell. Finally, the cells of a CNN chip are processed in parallel, as in Figure 8.

Figure 6.

Physical architecture of CNN cell.

Figure 7.

Solution for physical architecture CNN chip.

Figure 8.

A core architecture for CNN chip.

C1, …, C4 are the coefficients shown in Figure 7: $C_1 = \frac{\Delta t}{2b\Delta x}$, $C_2 = \frac{gb\,\Delta t}{2\Delta x}$, $C_3 = gb(I-J)\,\Delta t$, $C_4 = \frac{q\,\Delta t}{b}$.
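A minimal sketch of one explicit time step of the cell computation, assuming that the coefficients above multiply forward-Euler updates of Eqs. (7) and (8) with k_q = 0. How the actual datapath in Figure 7 orders and shares its operators is not reproduced here; `saint_venant_1d_step` is a hypothetical name, and the two end cells are simply held fixed as boundaries.

```python
def saint_venant_1d_step(h, Q, dt, dx, b, g, I, J, q):
    """One forward-Euler step of Eqs. (7)-(8) on interior cells (k_q = 0)."""
    C1 = dt / (2 * b * dx)
    C2 = g * b * dt / (2 * dx)
    C3 = g * b * (I - J) * dt
    C4 = q * dt / b
    h_new, Q_new = list(h), list(Q)  # end cells are kept as fixed boundary values
    for i in range(1, len(h) - 1):
        # Eq. (7): dh/dt = -(Q[i+1] - Q[i-1]) / (2 b dx) + q / b
        h_new[i] = h[i] + C1 * (Q[i - 1] - Q[i + 1]) + C4
        # Eq. (8): convective flux difference, pressure term, slope term gb(I - J)h
        flux = Q[i + 1] ** 2 / h[i + 1] - Q[i - 1] ** 2 / h[i - 1]
        Q_new[i] = Q[i] - C1 * flux - C2 * h[i] * (h[i + 1] - h[i - 1]) + C3 * h[i]
    return h_new, Q_new
```

A uniform state with I = J and q = 0 is left unchanged, which is a quick sanity check on the signs.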

Suppose each cell uses the pipeline mechanism shown in Figure 7. With a pipeline length of 6, the first calculation takes 6 clock pulses (clk), and each subsequent calculation needs only 1 clk.
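The clock counts in this paragraph follow the usual pipeline arithmetic: k calculations through a 6-stage pipeline finish after 6 + (k - 1) clocks. The sketch compares this with a hypothetical non-pipelined datapath in which each calculation occupies all six stages before the next one starts; that baseline is an assumption for comparison, not a measured design.

```python
def pipelined_cycles(k, depth=6):
    """Clocks to finish k calculations in a pipeline of the given depth."""
    if k <= 0:
        return 0
    return depth + (k - 1)  # first result after `depth` clocks, then one per clock


def sequential_cycles(k, depth=6):
    """Clocks if each calculation must pass all stages before the next one starts."""
    return depth * max(k, 0)


for k in (1, 4, 100):
    print(k, pipelined_cycles(k), sequential_cycles(k))
```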


3. Solving Navier-Stokes equations

3.1 Physico-mathematical model of Navier-Stokes equations

In hydraulics, many flow models have been studied, such as flows in channels, streams, and rivers, with the aims of controlling the flow to prevent disasters, saving water, and exploiting the energy of the flow. Most mathematical models of these phenomena are partial differential equations such as the Saint-Venant and Navier-Stokes equations [8, 9]. Different forms of the Navier-Stokes equations have various parameters and constraints. Using CNN technology, we can solve those with clearly specified boundary values; that is, we do not study boundary problems in depth here. The strength of CNN technology is that it yields a physical parallel computing chip whose speed is high enough for real-time systems.

The Navier-Stokes equations here consist of three partial differential equations whose unknown functions are the water height and the flow velocities in the x- and y-directions. The empirical model is a flow through a small port that spreads in two directions, Ox and Oy.

Solving the Navier-Stokes equations with a CNN requires discretizing the continuous model by the finite difference method: the smaller the difference intervals, the higher the accuracy. However, if the intervals are too small, the computational complexity and time increase. A CNN chip, with its physically parallel processing capability, overcomes these difficulties.

3.2 Equations of the Navier-Stokes model

  • Equations describing the water level

$$\frac{\partial(\rho z_w)}{\partial t} + \frac{\partial(\rho q_x)}{\partial x} + \frac{\partial(\rho q_y)}{\partial y} = \rho q_A \qquad (9)$$

Assume that the water height is measured from the bottom of the flow, which is taken as the origin of the coordinate system, so z_w is never negative.

  • Momentum equations in x-direction:

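    $$\frac{\partial(\rho q_x)}{\partial t} + \frac{\partial}{\partial x}\left(\frac{\rho\beta q_x^2}{d}\right) + \frac{\partial}{\partial y}\left(\frac{\rho\beta q_x q_y}{d}\right) + \rho g d\frac{\partial z_w}{\partial x} + \rho g d S_{fx} - \tau_{wx} - \frac{\partial}{\partial x}\left(\rho K_L\frac{\partial q_x}{\partial x}\right) - \frac{\partial}{\partial y}\left(\rho K_T\frac{\partial q_x}{\partial y}\right) = 0 \qquad (10)$$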

  • Momentum equations in y-direction:

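$$\frac{\partial(\rho q_y)}{\partial t} + \frac{\partial}{\partial y}\left(\frac{\rho\beta q_y^2}{d}\right) + \frac{\partial}{\partial x}\left(\frac{\rho\beta q_y q_x}{d}\right) + \rho g d\frac{\partial z_w}{\partial y} + \rho g d S_{fy} - \tau_{wy} - \frac{\partial}{\partial y}\left(\rho K_L\frac{\partial q_y}{\partial y}\right) - \frac{\partial}{\partial x}\left(\rho K_T\frac{\partial q_y}{\partial x}\right) = 0 \qquad (11)$$

The quantities in the equations have the following meanings: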

  • $\partial(\rho q_x)/\partial t$ and $\partial(\rho q_y)/\partial t$: the variation of momentum over time along the x- and y-axes, respectively.

  • $\frac{\partial}{\partial x}\left(\frac{\rho\beta q_x^2}{d}\right)$ and $\frac{\partial}{\partial y}\left(\frac{\rho\beta q_y^2}{d}\right)$: kinetic energy variations of the flow in the x- and y-directions.

  • $\rho g d\,\partial z_w/\partial x$ and $\rho g d\,\partial z_w/\partial y$: potential energy variations of the flow in the x- and y-directions.

  • $\rho g d S_{fx}$ and $\rho g d S_{fy}$: influence of friction from the bottom and walls of the channel on the flow in the x- and y-directions. The values of $S_{fx}$ and $S_{fy}$ are determined from the physical properties of the bottom and walls of the hydraulic channel by the following formulas, where n is the Manning coefficient:

    $$S_{fx} = \frac{q_x n^2 (q_x^2 + q_y^2)^{1/2}}{d^{1/3}}, \qquad S_{fy} = \frac{q_y n^2 (q_y^2 + q_x^2)^{1/2}}{d^{1/3}}$$

  • $\tau_{wx}$ and $\tau_{wy}$: wind stress on the free surface of the flow in the x- and y-directions, calculated as

$$\tau_{wx} = c_s \rho_a W^2 \cos\Psi, \qquad \tau_{wy} = c_s \rho_a W^2 \sin\Psi,$$

$$c_s = \begin{cases} 10^{-3}, & \text{if } W \le W_{min} \\ \left[c_{s1} + c_{s2}(W - W_{min})\right]\cdot 10^{-3}, & \text{if } W > W_{min} \end{cases}$$

where $c_{s1}$, $c_{s2}$, and $W_{min}$ are empirical values; for example, $W_{min}$ = 4 m/s and, for a wind speed of 10 m/s, $c_{s1}$ = 1.0 and $c_{s2}$ = 0.067.

  • $\rho_a$ is the air density at the free surface (kg m^-3); W is the wind speed at the free surface; and Ψ is the angle between the wind direction and the x-axis.

  • The expressions $\frac{\partial}{\partial x}\left(\rho K_L\frac{\partial q_x}{\partial x}\right) + \frac{\partial}{\partial y}\left(\rho K_T\frac{\partial q_x}{\partial y}\right)$ and $\frac{\partial}{\partial y}\left(\rho K_L\frac{\partial q_y}{\partial y}\right) + \frac{\partial}{\partial x}\left(\rho K_T\frac{\partial q_y}{\partial x}\right)$ describe the impact of turbulence in the flow in the x- and y-directions, where $K_L = q_x l / P_e$, with $P_e$ the Peclet coefficient (15 to 40) and l the length of the flow; $K_L$ varies with location along the flow; and $K_T$ = (0.3 to 0.7) $K_L$.
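The auxiliary quantities above can be evaluated as follows. This Python sketch transcribes the formulas as written in this section (including the d^(1/3) denominator of the friction slopes). The default c_s1, c_s2, and W_min are the example values quoted above, the K_T/K_L ratio of 0.5 is an arbitrary pick from the 0.3 to 0.7 range, and all function names are hypothetical.

```python
import math


def manning_slopes(qx, qy, d, n):
    """Friction slopes S_fx, S_fy as given above: q * n^2 * sqrt(qx^2 + qy^2) / d^(1/3)."""
    speed = math.hypot(qx, qy)
    return qx * n ** 2 * speed / d ** (1 / 3), qy * n ** 2 * speed / d ** (1 / 3)


def drag_coefficient(W, W_min=4.0, cs1=1.0, cs2=0.067):
    """Wind drag coefficient c_s (piecewise in the wind speed W)."""
    if W <= W_min:
        return 1e-3
    return (cs1 + cs2 * (W - W_min)) * 1e-3


def wind_stress(W, psi, rho_a, **kw):
    """tau_wx, tau_wy for wind speed W at angle psi (radians) to the x-axis."""
    cs = drag_coefficient(W, **kw)
    return cs * rho_a * W ** 2 * math.cos(psi), cs * rho_a * W ** 2 * math.sin(psi)


def dispersion_coefficients(qx, l, Pe, kt_ratio=0.5):
    """K_L = qx * l / Pe and K_T = kt_ratio * K_L (kt_ratio assumed within 0.3 to 0.7)."""
    K_L = qx * l / Pe
    return K_L, kt_ratio * K_L


# Example: 10 m/s wind along x over air of density 1.2 kg/m^3 (illustrative values)
print(wind_stress(10.0, 0.0, 1.2))
```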

3.3 Analyzing and designing CNN to solve the equations

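To simplify, rename the variables: water level z_w = h, velocity in the x-direction q_x = u, and velocity in the y-direction q_y = v. Assume that q_A = 0, that ρ is constant, and that the turbulent cross-influence between the velocity components (from Oy to Ox and from Ox to Oy) is negligible, since the transverse velocity is small enough to be treated as zero. Then (9) to (11) become:

$$\frac{\partial h}{\partial t} + \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0 \;\Rightarrow\; \frac{\partial h}{\partial t} = -\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} \qquad (12)$$

$$\frac{\partial v}{\partial t} + \frac{\partial}{\partial y}\left(\frac{\beta v^2}{d}\right) + \frac{\partial}{\partial x}\left(\frac{\beta vu}{d}\right) + gd\frac{\partial h}{\partial y} + gdS_{fy} - \frac{\tau_{wy}}{\rho} - \frac{\partial}{\partial y}\left(K_L\frac{\partial v}{\partial y}\right) = 0$$
$$\Rightarrow\; \frac{\partial v}{\partial t} = \frac{\partial}{\partial y}\left(K_L\frac{\partial v}{\partial y}\right) - \frac{\partial}{\partial y}\left(\frac{\beta v^2}{d}\right) - \frac{\partial}{\partial x}\left(\frac{\beta vu}{d}\right) - gd\frac{\partial h}{\partial y} + \frac{\tau_{wy}}{\rho} - gdS_{fy} \qquad (13)$$

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\left(\frac{\beta u^2}{d}\right) + \frac{\partial}{\partial y}\left(\frac{\beta uv}{d}\right) + gd\frac{\partial h}{\partial x} + gdS_{fx} - \frac{\tau_{wx}}{\rho} - \frac{\partial}{\partial x}\left(K_L\frac{\partial u}{\partial x}\right) = 0$$
$$\Rightarrow\; \frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\left(K_L\frac{\partial u}{\partial x}\right) - \frac{\partial}{\partial x}\left(\frac{\beta u^2}{d}\right) - \frac{\partial}{\partial y}\left(\frac{\beta uv}{d}\right) - gd\frac{\partial h}{\partial x} + \frac{\tau_{wx}}{\rho} - gdS_{fx} \qquad (14)$$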

Step 1: Differencing the equations with the Taylor formula

Using a finite difference grid with interval Δx along the x-axis and Δy along the y-axis, and applying Taylor difference formulas to Eqs. (12) to (14), we obtain the corresponding difference equations:

h ij t = u i + 1 , j u i 1 , j 2 Δx v i , j + 1 v i , j 1 2 Δy E15
u i , j t = β d u i + 1 , j 2 Δ x u i + 1 , j u i 1 , j 2 Δ x u i 1 , j β d v i , j + 1 2 Δ y u i + 1 , j v i , j 1 2 Δ y u i 1 , j gd h i + 1 , j h i 1 , j 2 Δ x g dS fx + 1 ρ τ wx K L u i + 1 , j 2 u i , j + u i 1 , j Δ x 2 ] E16
v i , j t = β d v i , j + 1 2 Δy v i , j + 1 v i , j 1 2 Δy v i , j 1 β d u i + 1 , j 2 Δx v i , j + 1 u i 1 , j 2 Δx v i , j 1 gd h i , j + 1 h i , j 1 2 Δx g dS fy + 1 ρ τ wy K L v i , j + 1 2 v i , j + v i , j 1 Δ y 2 ] E17
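Before mapping Eqs. (15) to (17) onto templates, it can help to have a software reference model of their right-hand sides. The following NumPy sketch evaluates the three right-hand sides at interior nodes only, which mirrors the (M-2)x(N-2) computing cells of Section 3.4.1. It assumes arrays indexed F[i, j] with i along x and j along y, a constant depth d, and hypothetical parameter names and defaults.

```python
import numpy as np

# Neighbor views on arrays indexed F[i, j]; every view has the interior shape.
C = (slice(1, -1), slice(1, -1))       # (i, j)
XP = (slice(2, None), slice(1, -1))    # (i+1, j)
XM = (slice(None, -2), slice(1, -1))   # (i-1, j)
YP = (slice(1, -1), slice(2, None))    # (i, j+1)
YM = (slice(1, -1), slice(None, -2))   # (i, j-1)


def navier_stokes_rhs(h, u, v, dx, dy, d, g=9.81, beta=1.0, n=0.0,
                      K_L=0.0, rho=1000.0, tau_wx=0.0, tau_wy=0.0):
    """Right-hand sides of Eqs. (15)-(17) at interior nodes (constant depth d assumed)."""
    speed = np.sqrt(u[C] ** 2 + v[C] ** 2)
    S_fx = u[C] * n ** 2 * speed / d ** (1 / 3)
    S_fy = v[C] * n ** 2 * speed / d ** (1 / 3)
    dh = -(u[XP] - u[XM]) / (2 * dx) - (v[YP] - v[YM]) / (2 * dy)
    du = (-beta / d * (u[XP] * u[XP] - u[XM] * u[XM]) / (2 * dx)
          - beta / d * (u[YP] * v[YP] - u[YM] * v[YM]) / (2 * dy)
          - g * d * (h[XP] - h[XM]) / (2 * dx)
          - g * d * S_fx + tau_wx / rho
          + K_L * (u[XP] - 2 * u[C] + u[XM]) / dx ** 2)
    dv = (-beta / d * (v[YP] * v[YP] - v[YM] * v[YM]) / (2 * dy)
          - beta / d * (v[XP] * u[XP] - v[XM] * u[XM]) / (2 * dx)
          - g * d * (h[YP] - h[YM]) / (2 * dy)
          - g * d * S_fy + tau_wy / rho
          + K_L * (v[YP] - 2 * v[C] + v[YM]) / dy ** 2)
    return dh, du, dv
```

A still, level surface with no wind gives zero on all three right-hand sides, and a velocity u that grows linearly in x gives ∂h/∂t = -∂u/∂x, which are easy checks before comparing against hardware output.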

Step 2: Designing the CNN templates

Based on the CNN state equations and the difference equations (15) to (17), we obtain the CNN templates for layers h, u, v:

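  • Layer h:

    $$A_{hu} = \begin{bmatrix} 0 & 0 & 0 \\ \frac{1}{2\Delta x} & 0 & -\frac{1}{2\Delta x} \\ 0 & 0 & 0 \end{bmatrix}, \qquad A_{hv} = \begin{bmatrix} 0 & \frac{1}{2\Delta y} & 0 \\ 0 & 0 & 0 \\ 0 & -\frac{1}{2\Delta y} & 0 \end{bmatrix} \qquad (18)$$

  • Layer u:

$$A_{uv} = \begin{bmatrix} 0 & \frac{\beta u_{i,j-1}}{2d\Delta y} & 0 \\ 0 & 0 & 0 \\ 0 & -\frac{\beta u_{i,j+1}}{2d\Delta y} & 0 \end{bmatrix}; \quad A_{uh} = \begin{bmatrix} 0 & 0 & 0 \\ \frac{gd}{2\Delta x} & 0 & -\frac{gd}{2\Delta x} \\ 0 & 0 & 0 \end{bmatrix}; \quad B_u = \frac{1}{\rho}\tau_{wx}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
$$A_u = \begin{bmatrix} 0 & 0 & 0 \\ \frac{\beta u_{i-1,j}}{2d\Delta x} + \frac{K_L}{\Delta x^2} & -\frac{gdn^2(u_{ij}^2 + v_{ij}^2)^{1/2}}{d^{1/3}} + \frac{1}{R_u} - \frac{2K_L}{\Delta x^2} & -\frac{\beta u_{i+1,j}}{2d\Delta x} + \frac{K_L}{\Delta x^2} \\ 0 & 0 & 0 \end{bmatrix}; \quad z_u = 0 \qquad (19)$$

  • Layer v:

$$A_{vh} = \begin{bmatrix} 0 & \frac{gd}{2\Delta y} & 0 \\ 0 & 0 & 0 \\ 0 & -\frac{gd}{2\Delta y} & 0 \end{bmatrix}; \quad A_{vu} = \begin{bmatrix} 0 & 0 & 0 \\ \frac{\beta v_{i-1,j}}{2d\Delta x} & 0 & -\frac{\beta v_{i+1,j}}{2d\Delta x} \\ 0 & 0 & 0 \end{bmatrix}; \quad B_v = \frac{1}{\rho}\tau_{wy}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}; \quad z_v = 0$$
$$A_v = \begin{bmatrix} 0 & \frac{\beta v_{i,j-1}}{2d\Delta y} + \frac{K_L}{\Delta y^2} & 0 \\ 0 & -\frac{gdn^2(u_{ij}^2 + v_{ij}^2)^{1/2}}{d^{1/3}} + \frac{1}{R_v} - \frac{2K_L}{\Delta y^2} & 0 \\ 0 & -\frac{\beta v_{i,j+1}}{2d\Delta y} + \frac{K_L}{\Delta y^2} & 0 \end{bmatrix} \qquad (20)$$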
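The templates can be checked against the difference equations in software. In the sketch below, template rows follow the y offsets (j - 1, j, j + 1) and columns the x offsets (i - 1, i, i + 1), which is the layout implied by A_hu and A_hv; `apply_template` and the other helper names are hypothetical. It verifies that A_hu and A_hv reproduce Eq. (15) and that A_uh reproduces the pressure-gradient term of Eq. (16).

```python
import numpy as np


def apply_template(T, F):
    """Apply a 3x3 template to a field F[i, j], returning values at interior nodes.

    Template rows follow y (j-1, j, j+1) and columns follow x (i-1, i, i+1).
    """
    K = np.asarray(T, dtype=float).T  # K[a, b] weights F[i + a - 1, j + b - 1]
    Mi, Nj = F.shape
    out = np.zeros((Mi - 2, Nj - 2))
    for a in range(3):
        for b in range(3):
            out += K[a, b] * F[a:a + Mi - 2, b:b + Nj - 2]
    return out


def layer_h_templates(dx, dy):
    """A_hu and A_hv of Eq. (18)."""
    A_hu = [[0, 0, 0], [1 / (2 * dx), 0, -1 / (2 * dx)], [0, 0, 0]]
    A_hv = [[0, 1 / (2 * dy), 0], [0, 0, 0], [0, -1 / (2 * dy), 0]]
    return A_hu, A_hv


def A_uh_template(g, d, dx):
    """A_uh of Eq. (19): the pressure-gradient coupling from layer h to layer u."""
    return [[0, 0, 0], [g * d / (2 * dx), 0, -g * d / (2 * dx)], [0, 0, 0]]
```

The state-dependent templates (A_u, A_uv, A_v, A_vu) can be checked the same way by rebuilding them at each node from the current u and v values.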

Step 3: Designing the hardware architecture of the CNN to solve the Navier-Stokes equations

Based on the templates found in (18) to (20), we can design the circuit architecture of the CNN chip, which is a three-layer 2D CNN. The arithmetic unit of each layer and the links that perform parallel calculation on the chip can then be built. Figure 9 shows the architecture of layer h and layer u (layer v is similar to u).

Figure 9.

Logic architecture of cell of h, u.

3.4 Proposed system architecture for MxN CNN

The practical problems to be solved are: first, identifying the boundary points of the whole difference grid (space); second, dividing the entire computing space into smaller subspaces, where the boundary areas must be divided and recombined appropriately to avoid incorrect results caused by the timing of intermediate (temporary) computations; and third, controlling real-time data exchange and combining sequential and parallel computing in a CNN chip. The CNN chips in our previous work solved similar problems [4, 5]. The new issues here are dividing the computing space, processing the dynamic subboundaries, and combining sequential and parallel computing.

3.4.1 General MxN CNN

Each CNN cell has its own data element and a core that performs the computation. The CNN has MxN cells, of which only the (M-2)x(N-2) interior cells compute; hence the CNN has MxN data elements and (M-2)x(N-2) cores (Figure 10).

Figure 10.

General architecture of a CNN chip.

The Buffer supplies MxN data elements to the CNN. Each set of MxN data elements is called one block of data (Figure 11).

Figure 11.

Buffer (MxN) for CNN core.

The white area holds the data elements for the CNN boundary cells, and the gray area holds the data to be processed by the CNN. The CNN arithmetic unit has (M-2)x(N-2) cells, which process the data in the gray area of the input buffer.

The Input memory holds PxQ blocks of data. It is a true dual-port memory.

The Temp memory also holds PxQ blocks of data. It is a simple dual-port memory, used to temporarily store the data computed by the CNN core and to supply data to the Boundary updating unit.

The data to be processed, sent from the PC, have size mxn (Figure 12).

Figure 12.

Computing space with main boundary.

Assume that m = 5, n = 6, M = 3, and N = 4; the white part is the boundary and the gray part is the area to be processed. Before the data are processed, temporary vertical and horizontal boundaries must be added, as in Figure 13: column (0,3) and row (3,0).

Figure 13.

Divide computing space into subspace with subboundary.

The temporary vertical and horizontal boundaries are added so that the data have the same structure as the CNN buffer. After the temporary boundaries are added, the data are sent to the Input memory. The blocks of data in the Input memory (for mxn = 5x6, MxN = 3x4) are shown in Figure 14.

Figure 14.

The blocks of data in the Input memory in case that mxn = 5x6, MxN = 3x4.

0, 1, ..., 5 are the block addresses. For mxn = 5x6 and MxN = 3x4, we have P = 3 and Q = 2. In general,

$$P = \frac{m-2}{M-2}, \qquad Q = \frac{n-2}{N-2}$$
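The division into blocks can be sketched in software. Here the temporary boundaries are represented by letting neighboring MxN windows overlap by two rows or columns, which gives the same block contents as physically duplicating those rows and columns. The sketch assumes the interior divides evenly into (M-2)x(N-2) tiles and numbers blocks row by row; the actual address order in Figure 14 may differ, and the function names are hypothetical.

```python
def partition(m, n, M, N):
    """Split an m x n grid (outer boundary included) into P x Q overlapping M x N blocks."""
    if (m - 2) % (M - 2) or (n - 2) % (N - 2):
        raise ValueError("interior must divide evenly into (M-2) x (N-2) tiles")
    P, Q = (m - 2) // (M - 2), (n - 2) // (N - 2)
    # top-left corner of each block; block covers rows r0..r0+M-1 and cols c0..c0+N-1
    corners = [(p * (M - 2), q * (N - 2)) for p in range(P) for q in range(Q)]
    return P, Q, corners


def extract_blocks(grid, M, N):
    """Cut a list-of-rows grid into the M x N blocks given by partition()."""
    _, _, corners = partition(len(grid), len(grid[0]), M, N)
    return [[row[c0:c0 + N] for row in grid[r0:r0 + M]] for r0, c0 in corners]


print(partition(5, 6, 3, 4))  # the mxn = 5x6, MxN = 3x4 case of the text
```

Each block's gray (interior) cells are disjoint from those of every other block, while its white cells come from the neighboring blocks, which is why the boundary values must be refreshed between computing cycles.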

The detailed structure of the Boundary updating unit (for MxN = 3x4) is shown in Figure 15.

Figure 15.

The Boundary updating structure (MxN = 3x4).

The control unit controls the activities of the whole system according to the following algorithm:

    at every posedge of clk do
    {
        if (has IO event)
            do the IO task;
        else
            buffer = read(Input memory);
            if (finish computing the first block)
                if (BoundaryUpdating())
                    write(Input memory);
    }

3.4.2 Proposed CNN architecture when M = 3 (3xN CNN)

The 3xN CNN architecture is similar to the general MxN CNN architecture with M = 3, but it differs in some respects in order to reduce memory consumption and simplify the Boundary updating unit (Figure 16).

Figure 16.

The architecture of 3xN CNN chip.

Each block of data in the memory (Input memory or Temp memory) has 1xN data elements. Assume that the data to be processed, sent from the PC, have size mxn with m = 5 and n = 6, and that N = 4. As mentioned above, the data are processed after the temporary vertical boundaries are added, so the Input memory holds 5x2 blocks of data (m = 5, Q = 2), as follows (Figure 17).

Figure 17.

The memory with 5x2 blocks (m = 5, n = 6, N = 4).

Each block has size of 1x4 data elements.

The Buffer unit is a shift-up register of size 3xN. Its input and output have sizes 1xN and 3xN, respectively; the input enters at the bottom.

The Input memory has m rows and Q columns of blocks. The control unit reads the blocks of the Input memory vertically (down each column of blocks) and places each block at the input of the Buffer, which then shifts up one step. From step 3 onward, after each step the Buffer holds a full 3xN window of data to supply to the CNN core (Figure 18).
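The read order and the shift-up behavior can be modelled in a few lines. The sketch feeds one column of 1xN blocks into a register of M = 3 rows and yields a full window whenever the register is full; restarting the register for each column corresponds to re-initializing the buffer. It is a behavioral model only (no clock timing), and the function names are hypothetical.

```python
from collections import deque


def buffer_windows(column_blocks, M=3):
    """Shift 1xN blocks in at the bottom of an M-row register; yield each full MxN window."""
    buf = deque(maxlen=M)  # appending to a full deque drops the top (oldest) row
    for block in column_blocks:
        buf.append(block)
        if len(buf) == M:
            yield list(buf)  # rows ordered top to bottom


def scan_memory(memory, M=3):
    """memory[row][q] holds a 1xN block; read each column of blocks top to bottom."""
    windows = []
    for q in range(len(memory[0])):
        column = [memory[row][q] for row in range(len(memory))]
        windows.extend(buffer_windows(column, M))  # buffer restarts for every column
    return windows
```

For m = 5 rows, each column of blocks yields m - 2 = 3 windows, so the 5x2 memory of Figure 17 yields 6 windows per computing cycle.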

Figure 18.

The Buffer's state after each step (m = 5, n = 6, N = 4).

The output of CNN core has the size of 1xN.

The Boundary updating unit is shown in Figure 19.

Figure 19.

The output size of CNN core (N = 4).

The control algorithm for the control unit is as follows (Figure 20):

    at every posedge of clk do
    {
        if (has IO event)
            do the IO task;
        else
            buffer = read(Input memory);   // read vertically (column by column)
            if (finish computing the first block of column q)
                if (column_of_current_block == 0)
                    write(Temp memory);
                else
                {
                    BoundaryUpdating(CNNoutput, read(Temp memory));
                    write(Input memory);
                }
    }

Figure 20.

The Boundary updating structure (N = 4).

Figure 21.

The Virtex 6 chip (XC6VLX240T-1FFG1156) connected to the PC for configuring the CNN chip and performing the calculation.

3.5 Implementation

In this part, we implement the 3xN CNN. Q, m, and N are parameters that can be configured before compiling and programming the FPGA chip. By default, we set Q = 2, m = 8, and N = 4.

3.5.1 Development environment

For the experiments, the ISE Design Suite software version 14.7 and the ML605 evaluation board with the Virtex 6 chip XC6VLX240T-1FFG1156 are used to implement the CNN design.

First, we use Verilog HDL to describe the CNN architecture. Then, we use the ISim simulator to verify the system. Finally, we program the system into the FPGA chip on the ML605 board.

The experimental system is shown in Figure 21.

3.5.2 Input data for h, u, v values

The input of the CNN that solves the Navier-Stokes equations consists of h, u, v values. We use three Input memory units, three Buffer units, and three Temp memory units to store the h, u, v values. Each data element is a 32-bit (single-precision) floating-point number. The h, u, v data with temporary boundaries added are listed below in decimal and in hexadecimal single-precision floating-point format (Figure 22).

Figure 22.

Initial data for the Input memory h, u, v.

The interface of each Input memory and Temp memory for h, u, v is configured as shown in Figure 23. The Input memories for h, u, v are initialized from COE files; a COE file stores the initial values of a memory (Figure 24).
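To prepare such files, a short script can convert the single-precision values into hexadecimal words. The sketch assumes one memory word holds one 1xN block (N x 32 bits) with the first element in the most significant bits; that packing and the function names are assumptions, since the actual memory width is set by the interface in Figure 23. The header keywords `memory_initialization_radix` and `memory_initialization_vector` are those of the Xilinx COE format.

```python
import struct


def float_to_hex(x):
    """IEEE 754 single-precision bit pattern of x as 8 hex digits (big-endian)."""
    return struct.pack(">f", x).hex()


def coe_text(blocks):
    """COE file text with one memory word per 1xN block (first element in the MSBs)."""
    words = ["".join(float_to_hex(x) for x in block) for block in blocks]
    return ("memory_initialization_radix=16;\n"
            "memory_initialization_vector=\n"
            + ",\n".join(words) + ";\n")


# Example: two 1x2 blocks of h values (illustrative numbers)
print(coe_text([[1.0, 0.5], [0.0, -2.0]]))
```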

Figure 23.

Interface for Input and Temp memory h, u, v.

Figure 24.

An example of the h.coe file used to initialize the Input memory h.

3.5.3 Shift up register

3.5.4 CNN core

3.5.5 Boundary updating

3.5.6 Control unit

The interface of Control unit is described as follows.

3.5.7 System scheme

To verify the system, the interface of the top module of the system should include all the signals that we want to verify.

The top module is described as follows.

    // Control unit: generates memory addresses, write enables and the boundary-update enable
    Control CU(
        .CountCLK(CountCLK),
        .wraddressHUVTemp(wraddrTemp),
        .rdaddressHUVTemp(rdaddrTemp),
        .wrenTemp(wrenTemp),
        .clk(clk),
        .wraddressHUV(wraddr),
        .rdaddressHUV(rdaddr),
        .wren(wren),
        .start(start),
        .EnableBoundaryUpdating(EnableBoundaryUpdating),
        .finish(finish));

    // Input memory for h, u, v: read as doutH/U/V, written back with HNew/UNew/VNew
    InputMemoryHUV #(N) InputMemory(
        clk, rdaddr, doutH, doutU, doutV,
        wraddr, wren, HNew, UNew, VNew);

    // Shift-up buffer: assembles MxN windows from the 1xN blocks read from memory
    InputBuffer #(M,N) Buffer(
        clk, doutH, doutU, doutV,
        matrixhin, matrixuin, matrixvin);

    // CNN core: computes the new h, u, v values for each window
    CNNCore #(M,N) uut(
        .clk(clk),
        .matrixhin(matrixhin),
        .matrixuin(matrixuin),
        .matrixvin(matrixvin),
        .matrixhout(matrixhout),
        .matrixuout(matrixuout),
        .matrixvout(matrixvout));

    // Boundary updating: merges the CNN output with the data held in Temp memory
    BoundaryUpdatingHUV #(N) Boundary(
        matrixhout, matrixuout, matrixvout,
        doutHNewTemp, doutUNewTemp, doutVNewTemp,
        EnableBoundaryUpdating,
        HNewTemp, UNewTemp, VNewTemp,
        HNew, UNew, VNew);

    // Temp memory for h, u, v
    TempMemoryHUV #(N) TempMemory(
        clk, wraddrTemp, wrenTemp, HNewTemp, UNewTemp,
        VNewTemp,
        rdaddrTemp, doutHNewTemp, doutUNewTemp, doutVNewTemp);

    endmodule

3.6 Simulation results

The ISE design software shows the device utilization summary as in Table 1.

Device utilization summary (estimated values)

| Logic utilization | Used | Available | Utilization |
|---|---|---|---|
| Number of slice registers | 3,952 | 301,440 | 1% |
| Number of slice LUTs | 16,365 | 150,720 | 10% |
| Number of fully used LUT-FF pairs | 1,770 | 18,547 | 9% |
| Number of bonded IOBs | 3,112 | 600 | 518% |
| Number of Block RAM/FIFO | 12 | 416 | 2% |
| Number of BUFG/BUFGCTRLs | 1 | 32 | 3% |
| Number of DSP48E1s | 132 | 768 | 17% |

Table 1.

Device utilization summary.

Figures 25 to 27 show the schematics synthesized by the ISE design software.

Figure 25.

The architecture of CNN chip.

Figure 26.

The architecture of one CNN cell.

Figure 27.

Inside electronic circuit for h.

Comparing the new values of h in Figure 28(i) and (k) (doutH) with Figure 29 shows that the 3x4 CNN system works correctly.

Figure 28.

Signals operating inside the 3x4 CNN system, m = 8, Q = 2. (a) A computing cycle starts when start = 1. (b) The output of the Input memory (doutH). (c) The data output from the Buffer after 4 clks. (d) The results from the CNN core after 10 clks; writing of the results to Temp memory starts. (e) The CNN core finishes computing the first column of blocks at 16 clks, and writing of the results to Temp memory pauses at 16 clks. (f) The results from the CNN core after 18 clks; Temp memory is read, boundary updating starts, and the results are written to the Input memory. (g) Boundary updating pauses from 24 clks. (h) The CNN core finishes computing; the last column of blocks is read from Temp memory and written to the Input memory. (h) All results of the first computing cycle have been written to the Input memory. (i) The controller sets finish = 1 at 33 clks. (k) The output of the Input memory shows the results computed in the previous computing cycle. (l) Overview of the signals.

Figure 29.

The new values of h computed in Excel for the first computing cycle.

The simulation results show that the implementation method is correct and effective. Calculating the first three 1xN blocks taken from the memory units h, u, v costs 10 clock pulses: 1 for the initial read of the Input memory, 3 for initially filling the buffer that feeds the CNN, and 6 for the initial calculation. Each subsequent 1xN block takes only 1 clock pulse, because buffer updating and calculation in the CNN arithmetic unit are pipelined. After reading each column of blocks in the Input memory, 2 more clocks are needed to re-initialize the buffer. It also takes 1 clk to start writing Temp memory, 1 clk to start reading Temp memory, and 1 clk to start writing the results back to the Input memory.

As a result, the time for one computing cycle is:

$$T = \left[\,8 + m(Q+1)\,\right]\ \text{clk}$$

For the implementation above, m = 8 and Q = 2, so T = 8 + 8 × 3 = 32 clk.
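The cycle count can be tabulated for other configurations with the formula above. The conversion to seconds takes the clock frequency as an argument; no clock frequency is reported for this design, so any value passed there is hypothetical, as are the function names.

```python
def cycles_per_computing_cycle(m, Q):
    """Clock cycles for one computing cycle of the 3xN CNN: T = 8 + m(Q + 1)."""
    return 8 + m * (Q + 1)


def seconds_per_computing_cycle(m, Q, f_clk_hz):
    """Wall-clock time of one computing cycle at a caller-supplied clock frequency."""
    return cycles_per_computing_cycle(m, Q) / f_clk_hz


for m, Q in [(8, 2), (16, 2), (8, 4)]:
    print(m, Q, cycles_per_computing_cycle(m, Q))
```

Because T grows with m(Q + 1), enlarging the computing space by adding rows or block columns increases the cycle time linearly rather than requiring more CNN cells on the chip.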


4. Conclusion

This chapter presents a solution for configuring a CNN chip to solve the Navier-Stokes equations, focusing on the handling of temporary boundaries where they are required. The purpose is to divide a large data space into many subspaces, so that the large space is processed by computing each subspace. With input data in 32-bit floating-point format and the FPGA chip Virtex 6 XC6VLX240T-1FFG1156, a CNN of 1x12 cells was successfully implemented. The implementation results show that the main benefits of this solution are the expansion of the computing space and the saving of resources, with acceptable calculation accuracy. The model can be developed further to solve similar problems in larger computing spaces and to handle some types of complicated (mixed) boundaries.



We would like to deeply acknowledge Professor Roska Tamas, the head of the Analogic and Neural Computing Research Laboratory and Chairman of the Scientific Council—Institute of the Hungarian Academy of Sciences; and Associate Professor Pham Thuong Cat, the Head of Automation Laboratory—Institute of Information Technology—Vietnam Academy of Science and Technology, for giving us many important instructions.


  1. Chua LO, Yang L. Cellular neural networks: Theory. IEEE Transactions on Circuits and Systems. 1988;35(10):1257-1272
  2. Chua LO, Yang L. Cellular neural networks: Application. IEEE Transactions on Circuits and Systems. 1988;35:1273-1290
  3. Roska T, Chua LO, Wolf D, Kozek T, Tetzlaff R, Puffer F. Simulating nonlinear waves and partial differential equations via CNN, Part I: Basic techniques. IEEE Transactions on Circuits and Systems. 1995;42(10):807-815
  4. Thai VD, Cat PT. Modeling air pollution problem by cellular neural network. In: Proceeding (ISI) of 10th International Conference on Control, Automation, Robotics and Vision; Hanoi, Vietnam; 2008. pp. 1115-1118
  5. Thai VD, Cat PT. Solving two-dimensional Saint venant equation by using cellular neural network. In: Proceeding of the 7th Asian Control Conference (ASCC2009); Hong Kong; 2009. pp. 1258-1263
  6. Thai VD, Cat PT. Equivalence and stability of two-layer cellular neural network solving Saint venant 1D equation. In: Proceeding (ISI) of 11th International Conference on Control, Automation, Robotics and Vision (ICARCV2010); Singapore; 2010. pp. 704-709
  7. Thai VD, Anh BT, Duong VT. Develop some application of Cyclone DE2C35 chip. Journal of Science and Technology - Thai Nguyen University. 2015;10(140):103-108
  8. Thai VD, Linh LH, Linh NM. Solving Navier-Stokes equation using FPGA cellular neural network chip. In: Proceeding of International Conference on Advances in Information and Communication Technology (ICTA2016). Springer Publishing; 2016. pp. 562-571
  9. Rusin WM. On solution to Navier-Stokes equation in critical spaces [Thesis of Doctor Philosophy]. 2010. Available from:…/Rusin_umn_0130E_11277.pdf
  10. Hruska J. Intel launches Stratix 10: Altera FPGA combined with ARM CPU, 14nm manufacturing. Extremetech. 2016. Available from:
  11. LaPedus M. Intel-Altera deal to shake up foundry landscape. Chip Design Magazine. 2013. Available from:
  12. Clive M. The Design Warrior's Guide to FPGAs: Devices, Tools and Flows. Elsevier; 2004. Available from:
  13. Clive M. Programmable Logic DesignLine. Xilinx Unveil Revolutionary 65nm FPGA Architecture: The Virtex-5 Family. 2006. Available from:
  14. David WP. Google patent search. Dynamic Data Reprogrammable PLA. 2009
  15. David WP, Peterson LR. Google patent search. Dynamic Data Reprogrammable PLA. 2009
  16. Wisniewski R. Synthesis of Compositional Microprogram Control Units for Programmable Devices. University of Zielona Góra Press; 2009
  17. Black F, Scholes MS. The pricing of options and corporate liabilities. Journal of Political Economy. 1973;81(3):637-654
  18. Chua LO, Roska T. Cellular Neural Networks and Visual Computing. Cambridge University Press; 2000. ISBN: 0-521-65247-2. Available from:
  19. FPGA Architecture for the Challenge. Available from:∼vaughn/challenge/fpga_arch.html
  20. Intel® FPGAs offer a wide variety of configurable embedded SRAM, high-speed transceivers, high-speed I/Os, logic blocks, and routing. Built-in intellectual property (IP) combined with outstanding software tools lower FPGA development time, power, and cost. Available from:
