Open access peer-reviewed chapter

Solving Partial Differential Equation Using FPGA Technology

By Vu Duc Thai and Bui Van Tung

Submitted: August 31st 2018; Reviewed: January 21st 2019; Published: December 13th 2019

DOI: 10.5772/intechopen.84588

Abstract

This chapter introduces a method of using cellular neural network (CNN) technology on FPGA chips to solve differential equations over a large computing space. Because the logic resources of an FPGA chip are limited, a large computing space must be separated into several subspaces. Our solution has three parts: first, dividing the computing space into smaller areas and combining sequential and parallel computing; second, dividing and recombining the boundary areas, which must remain continuous so that temporary data are not lost during processing (buffer memory is used to store them); and third, exchanging data in real time. A control unit coordinates the activities of the whole system according to the algorithm. We have successfully configured a CNN chip that solves the Navier-Stokes equations for hydraulic fluid flow on the Xilinx Virtex 6 chip XCVL240T-1FFG1156, with acceptable results.

Keywords

  • Navier-Stokes equation
  • cellular neural network
  • field programmable gate array
  • boundary processing
  • separating computing space

1. Introduction

Solving partial differential equations (PDEs) has been investigated by many researchers, and numerical solutions have been implemented successfully on PCs. However, for problems with a large computing space, a PC has difficulty meeting the requirements for speed and accuracy; in some cases, the problem cannot be solved at all because of the computational load. Researchers have successfully applied cellular neural network (CNN) technology to analyze such problems, design CNN chips, and solve several PDEs.

To solve a PDE with CNN technology, we have to analyze and discretize (difference) the original equations of the problem, find the templates, design the CNN architecture, and then configure an FPGA to make a CNN chip. This means that there is no single CNN chip for every equation; each problem (consisting of several equations) needs its own appropriately designed CNN chip. Large problems require many computing resources to configure the blocks of the CNN chip. To save resources, we propose dividing the computing space into smaller subspaces and combining parallel and sequential calculations, which keeps the computing rate high while saving the resources of the FPGA chip used.

Because the architecture of a CNN chip depends on the problem, making a CNN chip with traditional methods is very difficult and costly. With FPGA technology, users can use hardware description languages, such as Verilog and VHDL, to configure the logic elements in the FPGA to produce the electronic circuit of a CNN chip. Recent FPGA families (Virtex 7; Stratix 10) come with many supporting tools for testing, optimizing, and coordinating data exchange. For these reasons, an FPGA is a suitable platform for making a CNN chip.

2. CNN and FPGA technology

2.1 Cellular neural network technology

The cellular neural network (CNN) was introduced by Chua and Yang at the University of California, Berkeley (USA), in 1988; it combines analog spatiotemporal dynamics with logic [1, 2, 3]. The CNN paradigm is a natural framework for describing the behavior of locally interconnected dynamic systems with an array structure, so it is very useful for solving partial differential equations [3, 4, 5, 6, 7]. Today, visual microprocessors based on this type of processing can reach TeraOPs computing power and approximately 50,000 fps. The possibility of developing algorithms and programs based on CNN was quickly exploited worldwide. There are now several CNN models for image processing, PDE solving, pattern recognition, gene analysis, etc. Depending on the problem, the designer can make a CNN chip with millions of cells. The common CNN architectures are 1D, 2D, and 3D.

The standard 2D CNN is a dynamic system of autonomous cells, each connected locally to its neighbors, forming a two-dimensional array [2, 18]. Each cell C(i,j) in the array contains one independent voltage source, one independent current source, a linear capacitor, resistors, and linear voltage-controlled current sources, which are coupled to its neighbor cells via the controlling input voltage and the feedback from the output voltage of each neighbor cell C(k,l). The templates A(i,j;k,l) and B(i,j;k,l) are the parameters linking cell C(i,j) to neighbor C(k,l). The sphere of influence Sr(i,j) of radius r of cell C(i,j) is the set of neighbor cells C(k,l) that satisfies the condition below (Figure 1).

Figure 1.

The architecture of a CNN chip.

$$S_r(i,j) = \left\{ C(k,l) \;\middle|\; \max\left(|k-i|,\,|l-j|\right) \le r \right\}, \quad 1 \le k \le M,\; 1 \le l \le N.$$

The state equation of cell C(i,j) is given by the following equation:

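$$C\frac{\partial x_{ij}}{\partial t} = -\frac{1}{R}x_{ij} + \sum_{C(k,l)\in S_r(i,j)} A(i,j;k,l)\,y_{kl} + \sum_{C(k,l)\in S_r(i,j)} B(i,j;k,l)\,u_{kl} + z_{ij} \tag{1}$$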

Here R and C are the linear resistor and capacitor; A(i,j;k,l) is the feedback template; B(i,j;k,l) is the control template; and z_ij is the bias of cell C(i,j). On the CNN chip, (A, B, z) are the local connection weights of each cell C(i,j) to its neighbors. The output y_ij of cell C(i,j) is:

$$y_{ij} = f(x_{ij}) = \frac{1}{2}\left|x_{ij}+1\right| - \frac{1}{2}\left|x_{ij}-1\right| \tag{2}$$

The characteristic of the CNN output function y_ij = f(x_ij) is shown in Figure 2.

Figure 2.

CNN circuit output function.
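As an illustration of Eqs. (1) and (2), the following is a minimal software sketch in Python, not the chapter's hardware design. It evaluates the output function and advances a small 2D CNN with r = 1 by one explicit Euler step. The function names, the Euler integration, the space-invariant 3x3 templates, and the zero value assumed for cells outside the array are all assumptions made for this example.

```python
def cnn_output(x):
    """Piecewise-linear CNN output of Eq. (2): y = 0.5*(|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def cnn_euler_step(x, u, A, B, z, dt, R=1.0, C=1.0):
    """One explicit Euler step of Eq. (1) for an M x N CNN with r = 1.

    x, u : M x N nested lists holding the cell states and inputs.
    A, B : 3x3 templates (assumed space-invariant for this sketch).
    z    : scalar bias shared by all cells (assumption).
    Cells outside the array contribute zero (assumed boundary condition).
    """
    M, N = len(x), len(x[0])
    y = [[cnn_output(v) for v in row] for row in x]
    new_x = [row[:] for row in x]
    for i in range(M):
        for j in range(N):
            acc = -x[i][j] / R + z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, l = i + di, j + dj
                    if 0 <= k < M and 0 <= l < N:
                        acc += A[di + 1][dj + 1] * y[k][l]
                        acc += B[di + 1][dj + 1] * u[k][l]
            new_x[i][j] = x[i][j] + dt * acc / C
    return new_x
```

The output function saturates at -1 and +1 and is linear in between, which is the shape described for Figure 2.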

In a 3D CNN, besides the connections to its in-layer neighbors, each cell is also connected to cells in the upper and lower layers of the three-dimensional space [18], as shown in Figure 3. Thus, with radius r = 1, the cell C(i,j,k) has 26 neighbors, and the templates A and B are indexed by three coordinates, A(i,j,k) and B(i,j,k).

Figure 3.

The 3D CNN, with r = 1, (having 26 neighbors) in three dimensions coordinates x,y,z.

The state equation of CNN 3D takes the form:

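$$C\frac{\partial x_{ijk}}{\partial t} = -\frac{1}{R}x_{ijk} + \sum_{C(l,m,n)\in S_r(i,j,k)} A(i,j,k;l,m,n)\,y_{lmn} + \sum_{C(l,m,n)\in S_r(i,j,k)} B(i,j,k;l,m,n)\,u_{lmn} + z_{ijk} \tag{3}$$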

The output function is similar to CNN 2D:

$$y_{ijk} = f(x_{ijk}) = \frac{1}{2}\left(\left|x_{ijk}+1\right| - \left|x_{ijk}-1\right|\right)$$

To solve a three-dimensional PDE, a 3D CNN must be used. The original PDE is discretized (differenced), and the appropriate templates (A, B, z) of the 3D CNN are derived from the resulting difference equations.

2.2 Field-programmable gate array technology

A field-programmable gate array (FPGA) is a device whose blank logic blocks, with their available logic gates and RAM blocks, can be configured to implement complex digital computations. FPGAs can implement any logical function. Their advantages include the ability to update functionality after shipping, partial reconfiguration of a portion of the design, and low nonrecurring engineering costs relative to an ASIC design [13, 14, 15, 16].

A recent trend has been to take the coarse-grained architectural approach by combining the logic blocks and interconnects of traditional FPGAs with embedded chips and related peripherals to form a complete “system on a programmable chip” [17, 18, 19].

Users such as teachers and students can use FPGAs to prototype and test application systems; with VHDL or Verilog, they can easily design, test, and reconfigure the system until it gives the desired results.

2.3 Using FPGA to make CNN chip for solving PDE

Because the CNN architecture is not the same for every application, the designer develops a particular chip for each problem, based on the standard model. An FPGA is well suited to this, since a blank chip can be configured into a CNN chip using a hardware description language such as Verilog or VHDL. To solve a PDE, one first analyzes (differences) the original partial differential equations to find the appropriate templates, then designs the CNN chip architecture based on those templates, and finally uses VHDL to configure the FPGA according to the designed hardware.

Some PDEs have been solved using the CNN technology:

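Burgers equation [3]:

$$\frac{\partial u(x,t)}{\partial t} = \frac{1}{R}\frac{\partial^2 u(x,t)}{\partial x^2} - u(x,t)\frac{\partial u(x,t)}{\partial x} + F(x,t)$$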

Klein-Gordon equation [19]:

$$\frac{\partial^2 u(x,t)}{\partial t^2} = \nabla^2 u(x,t) - \sin u(x,t)$$

Heat diffusion equation [3]:

$$\frac{\partial u(x,y,t)}{\partial t} = c\,\nabla^2 u(x,y,t)$$

Black-Scholes equation [9]:

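$$\frac{\partial V(x,t)}{\partial t} = rV(x,t) - \frac{1}{2}\sigma^2 S^2\frac{\partial^2 V(x,t)}{\partial S^2} - rS\frac{\partial V(x,t)}{\partial S}$$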

Air pollution equation [4]:

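$$\frac{\partial\varphi}{\partial t} + \operatorname{div}(\mathbf{v}\varphi) + \sigma\varphi - \gamma\frac{\partial^2\varphi}{\partial z^2} - \mu\nabla^2\varphi = f(x,y,z)$$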

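Saint-Venant 2D equations [5]:

$$\frac{\partial H}{\partial t} + \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0$$
$$\frac{\partial u}{\partial t} + \frac{\partial u^2}{\partial x} + g\frac{\partial H}{\partial x} + \frac{\partial (uv)}{\partial y} = -\frac{g\,u\left(u^2+v^2\right)^{1/2}}{K_x^2 H^2}$$
$$\frac{\partial v}{\partial t} + \frac{\partial v^2}{\partial y} + g\frac{\partial H}{\partial y} + \frac{\partial (uv)}{\partial x} = -\frac{g\,v\left(u^2+v^2\right)^{1/2}}{K_y^2 H^2}$$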

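Saint-Venant 1D equations [6]:

$$b\frac{\partial h(x,t)}{\partial t} + \frac{\partial Q(x,t)}{\partial x} = q \tag{4}$$
$$\frac{\partial Q(x,t)}{\partial t} + \frac{\partial}{\partial x}\left(\frac{Q(x,t)^2}{b\,h(x,t)}\right) + g\,b\,h(x,t)\frac{\partial h(x,t)}{\partial x} - g\,I\,b\,h(x,t) + g\,J\,b\,h(x,t) = k_q q \tag{5}$$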

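Example of making a CNN chip for solving the Saint-Venant 1D equations:

  • Designing the templates

First, rewrite the original Eq. (4) so that the time derivative stands alone on the left-hand side:

$$b\frac{\partial h(x,t)}{\partial t} + \frac{\partial Q(x,t)}{\partial x} = q \;\Rightarrow\; \frac{\partial h(x,t)}{\partial t} = -\frac{1}{b}\frac{\partial Q(x,t)}{\partial x} + \frac{q}{b} \tag{6}$$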

Next, discretize the space variable x with step Δx on the right-hand side of (6). Differencing only the right-hand side of (6) in x, using a Taylor expansion (central difference), gives the equation for the cell at position i:

$$\frac{\partial h_i}{\partial t} = -\frac{1}{2b\Delta x}\left(Q_{i+1} - Q_{i-1}\right) + \frac{q}{b} \tag{7}$$

Note that, following the CNN approach, the time derivative ∂h/∂t on the left-hand side is kept in continuous form. From (7), the templates are:

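$$A^{hQ} = \begin{bmatrix} \dfrac{1}{2b\Delta x} & \dfrac{1}{R_h} & -\dfrac{1}{2b\Delta x} \end{bmatrix};\quad B^{h} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix};\quad z^{h} = 0;$$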

where Rh is the linear resistance on cell circuit of h.
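The difference equation (7) can be checked in software before it is mapped to hardware. The short Python sketch below evaluates its right-hand side at the interior cells of a 1D grid; the function name `dh_dt` and the list-based data layout are assumptions for this example, and the time integration is left out.

```python
def dh_dt(Q, q, b, dx):
    """Right-hand side of Eq. (7) at the interior cells i = 1 .. n-2.

    Q  : list of discharge values Q_i on the 1D grid.
    q  : lateral inflow, b : channel width, dx : spatial step.
    The two end cells are boundary cells and are not computed here.
    """
    return [
        -(Q[i + 1] - Q[i - 1]) / (2.0 * b * dx) + q / b
        for i in range(1, len(Q) - 1)
    ]
```

For example, with Q = [0, 1, 2, 3], q = 0, b = 2, and Δx = 0.5, both interior cells give ∂h/∂t = -1: the discharge grows downstream, so the water level falls.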

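Eq. (5) is treated in the same way, under the assumptions above:

$$\frac{\partial Q(x,t)}{\partial t} + \frac{\partial}{\partial x}\left(\frac{Q(x,t)^2}{b\,h(x,t)}\right) + g\,b\,h(x,t)\frac{\partial h(x,t)}{\partial x} - g\,I\,b\,h(x,t) + g\,J\,b\,h(x,t) = k_q q \tag{8}$$

Assume that q > 0; then k_q = 0. After differencing and applying the CNN template design algorithm, the templates for (8) are: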

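$$A^{Q} = \begin{bmatrix} -\dfrac{Q_{i+1}}{2b\Delta x\,h_{i+1}} & \dfrac{1}{R_Q} & \dfrac{Q_{i-1}}{2b\Delta x\,h_{i-1}} \end{bmatrix};$$
$$A^{Qh} = \begin{bmatrix} \dfrac{g\,b\,h_i}{2\Delta x} & g\,b\,(I-J) & -\dfrac{g\,b\,h_i}{2\Delta x} \end{bmatrix};\quad B^{Q} = 0;\quad z^{Q} = 0;$$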

From the templates found, we can design the CNN architecture for the problem: (1) a two-layer 1D CNN chip (Figure 4) and (2) the h, Q processing block (Figure 5).

Figure 4.

Logical architecture of a CNN cell.

Figure 5.

Logical architecture of a h, Q cell.

The h and Q computations are combined in one block to form the physical architecture of a CNN cell.

In general, each calculation needs basic computing blocks such as addition, subtraction, multiplication, and division. A straightforward FPGA design of a CNN cell instantiates many separate copies of these blocks to perform the arithmetic for each input. To save FPGA computing resources, the basic blocks can instead be shared within a cell, which makes the calculation sequential (Figure 6). In this case, the processing time of each cell is high. To reduce it, we can use the pipeline mechanism shown in Figure 7, at the cost of more computing resources per cell. Finally, the cells in a CNN chip are processed in parallel, as in Figure 8.

Figure 6.

Physical architecture of CNN cell.

Figure 7.

Solution for physical architecture CNN chip.

Figure 8.

A core architecture for CNN chip.

C1, …, C4 are the coefficients shown in Figure 7: $C_1 = \frac{1}{2b\Delta x}\Delta t$; $C_2 = \frac{g\,b}{2\Delta x}\Delta t$; $C_3 = g\,b\,(I-J)\,\Delta t$; $C_4 = \frac{q}{b}\Delta t$.

Suppose each cell uses the pipeline mechanism shown in Figure 7. With a pipeline length of 6, the first calculation takes 6 clock pulses (clk), and each subsequent calculation needs only 1 clk.
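The latency trade-off between the shared (sequential) cell and the pipelined cell can be made concrete with a small Python sketch. It only counts clock pulses under the figures quoted above (pipeline length 6, one result per clk once the pipeline is full); the function names, and the assumption that the sequential design pays the full 6 clk for every result, are illustrative.

```python
def pipelined_cycles(n_results, depth=6):
    """Clock pulses for n_results from a pipeline of the given depth:
    `depth` clk to fill the pipeline, then one result per clk."""
    if n_results <= 0:
        return 0
    return depth + (n_results - 1)


def sequential_cycles(n_results, depth=6):
    """Clock pulses if every result takes the full `depth` clk
    (assumed cost of the shared, non-pipelined cell)."""
    return depth * max(n_results, 0)
```

For 10 results this gives 15 clk with the pipeline against 60 clk without it, which is why the pipeline is worth the extra resources when many cells are processed in sequence.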

3. Solving Navier-Stokes equations

3.1 Physico-mathematical model of Navier-Stokes equations

In hydraulics, many flow models have been studied, such as flows in channels, streams, or rivers, in order to control the flow for disaster prevention, water saving, and exploitation of the flow's energy. Most mathematical models of these phenomena are partial differential equations, such as the Saint-Venant and Navier-Stokes equations [8, 9]. The Navier-Stokes equations come in several forms with various parameters and constraints. Using CNN technology, we can solve those with clearly defined boundary conditions; we do not study boundary problems in depth here. The benefit of CNN technology is a physically parallel computing chip that increases the computing speed enough for a real-time system.

Navier-Stokes equations here consist of three partial differential equations, with functional variables representing water height, and flow velocity in x- and y-directions. The empirical model is a flow through a small port, which diffuses in two directions Ox and Oy.

Solving the Navier-Stokes equations with a CNN requires discretizing the continuous model by the finite difference method; the smaller the difference intervals, the higher the accuracy. However, very small difference intervals increase the computational complexity and time. A CNN chip, with its physically parallel processing, overcomes these difficulties.

3.2 Description equations in Navier-Stokes equations

  • Equations describing the water level

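$$\frac{\partial(\rho z_w)}{\partial t} + \frac{\partial(\rho q_x)}{\partial x} + \frac{\partial(\rho q_y)}{\partial y} = \rho q_A \tag{9}$$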

Assume that the height of water is taken from the bottom of the flow, which is regarded as the origin of the coordinate system, so zw has no negative values.

  • Momentum equations in x-direction:

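$$\frac{\partial(\rho q_x)}{\partial t} + \frac{\partial}{\partial x}\left(\rho\beta\frac{q_x^2}{d}\right) + \frac{\partial}{\partial y}\left(\rho\beta\frac{q_x q_y}{d}\right) + \rho g d\frac{\partial z_w}{\partial x} + \rho g d\,S_{fx} - \tau_{wx} - \frac{\partial}{\partial x}\left(\rho K_L\frac{\partial q_x}{\partial x}\right) - \frac{\partial}{\partial y}\left(\rho K_T\frac{\partial q_x}{\partial y}\right) = 0 \tag{10}$$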

  • Momentum equations in y-direction:

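$$\frac{\partial(\rho q_y)}{\partial t} + \frac{\partial}{\partial y}\left(\rho\beta\frac{q_y^2}{d}\right) + \frac{\partial}{\partial x}\left(\rho\beta\frac{q_y q_x}{d}\right) + \rho g d\frac{\partial z_w}{\partial y} + \rho g d\,S_{fy} - \tau_{wy} - \frac{\partial}{\partial y}\left(\rho K_L\frac{\partial q_y}{\partial y}\right) - \frac{\partial}{\partial x}\left(\rho K_T\frac{\partial q_y}{\partial x}\right) = 0 \tag{11}$$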

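The quantities in the equations have the following meanings:

  • $\frac{\partial(\rho q_x)}{\partial t}$ and $\frac{\partial(\rho q_y)}{\partial t}$: quantities characterizing the momentum variation over time along the x-axis and y-axis, respectively.

  • $\frac{\partial}{\partial x}\left(\rho\beta\frac{q_x^2}{d}\right)$ and $\frac{\partial}{\partial y}\left(\rho\beta\frac{q_y^2}{d}\right)$: kinetic energy variations of the flow in the x- and y-directions.

  • $\rho g d\frac{\partial z_w}{\partial x}$ and $\rho g d\frac{\partial z_w}{\partial y}$: potential energy variations of the flow in the x- and y-directions.

  • $\rho g d\,S_{fx}$ and $\rho g d\,S_{fy}$: influence of friction from the bottom and walls of the channel on the flow in the x- and y-directions. The values of $S_{fx}$ and $S_{fy}$ are determined from the physical properties of the bottom and walls of the hydraulic channel according to the following formulas, where n is the Manning coefficient:

    $$S_{fx} = \frac{q_x n^2\left(q_x^2+q_y^2\right)^{1/2}}{d^{1/3}};\quad S_{fy} = \frac{q_y n^2\left(q_y^2+q_x^2\right)^{1/2}}{d^{1/3}}$$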

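  • $\tau_{wx}$ and $\tau_{wy}$: wind pressure on the free surface of the hydraulic flow in the x- and y-directions, calculated as follows:

$$\tau_{wx} = c_s\rho_a W^2\cos\Psi;\quad \tau_{wy} = c_s\rho_a W^2\sin\Psi,$$

where:

$$c_s = \begin{cases} 10^{-3}, & \text{when } W \le W_{min} \\ \left[c_{s1} + c_{s2}\left(W - W_{min}\right)\right]\cdot 10^{-3}, & \text{when } W > W_{min} \end{cases}$$

Here $c_{s1}$, $c_{s2}$, and $W_{min}$ are empirical values; for example, with $W_{min}$ = 4 m/s and a wind speed of 10 m/s, $c_{s1}$ = 1.0 and $c_{s2}$ = 0.067.

  • $\rho_a$ is the air density at the free surface (kg m−3); W is the wind speed at the free surface; and $\Psi$ is the angle between the wind direction and the x-axis.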

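  • The expressions $\frac{\partial}{\partial x}\left(\rho K_L\frac{\partial q_x}{\partial x}\right)$, $\frac{\partial}{\partial y}\left(\rho K_T\frac{\partial q_x}{\partial y}\right)$ and $\frac{\partial}{\partial y}\left(\rho K_L\frac{\partial q_y}{\partial y}\right)$, $\frac{\partial}{\partial x}\left(\rho K_T\frac{\partial q_y}{\partial x}\right)$ represent the impact of turbulence in the hydraulic flow between the x- and y-directions, where $K_L = \frac{q_x l}{P_e}$, with $P_e$ the Peclet coefficient (values of 15-40), l the length of the flow, $K_L$ a coefficient that varies with location along the flow, and $K_T$ = (0.3-0.7) $K_L$.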
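The wind-stress terms above are simple enough to evaluate directly. The following Python sketch computes the piecewise coefficient $c_s$ and the stress components $\tau_{wx}$, $\tau_{wy}$; the default values are the example values quoted in the text ($W_{min}$ = 4 m/s, $c_{s1}$ = 1.0, $c_{s2}$ = 0.067), while the function names and the air density used in the usage note are assumptions for illustration.

```python
import math


def wind_coefficient(W, W_min=4.0, cs1=1.0, cs2=0.067):
    """Piecewise wind drag coefficient c_s as a function of wind speed W (m/s)."""
    if W <= W_min:
        return 1.0e-3
    return (cs1 + cs2 * (W - W_min)) * 1.0e-3


def wind_stress(W, psi, rho_a, W_min=4.0, cs1=1.0, cs2=0.067):
    """Return (tau_wx, tau_wy) for wind speed W, wind angle psi (radians,
    measured from the x-axis) and air density rho_a (kg/m^3)."""
    cs = wind_coefficient(W, W_min, cs1, cs2)
    magnitude = cs * rho_a * W * W
    return magnitude * math.cos(psi), magnitude * math.sin(psi)
```

With W = 10 m/s the coefficient becomes (1.0 + 0.067 · 6) · 10⁻³ = 1.402 · 10⁻³; with an assumed air density of 1.2 kg/m³ and the wind aligned with the x-axis, the stress is about 0.168 N/m² in x and zero in y.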

3.3 Analyzing and designing CNN to solve the equations

To simplify, rename the variables: the water level z_w = h, and the velocities q_x = u along the x-axis and q_y = v along the y-axis. Assume that q_A = 0 and that the turbulent exchange between the velocity components in the Oy and Ox directions is negligible, since the transverse velocity is small enough to be considered zero; then (9)-(11) are rewritten as:

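$$\frac{\partial h}{\partial t} + \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0 \;\Rightarrow\; \frac{\partial h}{\partial t} = -\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} \tag{12}$$
$$\frac{\partial v}{\partial t} + \frac{\partial}{\partial y}\left(\beta\frac{v^2}{d}\right) + \frac{\partial}{\partial x}\left(\beta\frac{vu}{d}\right) + g d\frac{\partial h}{\partial y} + g d\,S_{fy} - \frac{\tau_{wy}}{\rho} - \frac{\partial}{\partial y}\left(K_L\frac{\partial v}{\partial y}\right) = 0 \;\Rightarrow\; \frac{\partial v}{\partial t} = \frac{\partial}{\partial y}\left(K_L\frac{\partial v}{\partial y}\right) - \frac{\partial}{\partial y}\left(\beta\frac{v^2}{d}\right) - \frac{\partial}{\partial x}\left(\beta\frac{vu}{d}\right) - g d\frac{\partial h}{\partial y} + \frac{\tau_{wy}}{\rho} - g d\,S_{fy} \tag{13}$$
$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\left(\beta\frac{u^2}{d}\right) + \frac{\partial}{\partial y}\left(\beta\frac{uv}{d}\right) + g d\frac{\partial h}{\partial x} + g d\,S_{fx} - \frac{\tau_{wx}}{\rho} - \frac{\partial}{\partial x}\left(K_L\frac{\partial u}{\partial x}\right) = 0 \;\Rightarrow\; \frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\left(K_L\frac{\partial u}{\partial x}\right) - \frac{\partial}{\partial x}\left(\beta\frac{u^2}{d}\right) - \frac{\partial}{\partial y}\left(\beta\frac{uv}{d}\right) - g d\frac{\partial h}{\partial x} + \frac{\tau_{wx}}{\rho} - g d\,S_{fx} \tag{14}$$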

Step 1: Differencing equations following Taylor formula

Using a finite difference grid with interval Δx along the x-axis and Δy along the y-axis, and applying Taylor (central) difference formulas to Eqs. (12)-(14), we obtain the corresponding difference equations:

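$$\frac{\partial h_{i,j}}{\partial t} = -\frac{u_{i+1,j} - u_{i-1,j}}{2\Delta x} - \frac{v_{i,j+1} - v_{i,j-1}}{2\Delta y} \tag{15}$$
$$\frac{\partial u_{i,j}}{\partial t} = -\frac{\beta}{d}\left[\frac{u_{i+1,j}}{2\Delta x}u_{i+1,j} - \frac{u_{i-1,j}}{2\Delta x}u_{i-1,j}\right] - \frac{\beta}{d}\left[\frac{v_{i,j+1}}{2\Delta y}u_{i+1,j} - \frac{v_{i,j-1}}{2\Delta y}u_{i-1,j}\right] - g d\frac{h_{i+1,j} - h_{i-1,j}}{2\Delta x} - g d\,S_{fx} + \frac{1}{\rho}\tau_{wx} + K_L\frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{\Delta x^2} \tag{16}$$
$$\frac{\partial v_{i,j}}{\partial t} = -\frac{\beta}{d}\left[\frac{v_{i,j+1}}{2\Delta y}v_{i,j+1} - \frac{v_{i,j-1}}{2\Delta y}v_{i,j-1}\right] - \frac{\beta}{d}\left[\frac{u_{i+1,j}}{2\Delta x}v_{i,j+1} - \frac{u_{i-1,j}}{2\Delta x}v_{i,j-1}\right] - g d\frac{h_{i,j+1} - h_{i,j-1}}{2\Delta y} - g d\,S_{fy} + \frac{1}{\rho}\tau_{wy} + K_L\frac{v_{i,j+1} - 2v_{i,j} + v_{i,j-1}}{\Delta y^2} \tag{17}$$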
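To make the discretization concrete, the sketch below evaluates the right-hand side of the continuity difference equation (15) on a small grid in Python. It is a software check only, not the CNN hardware; the function name, the nested-list layout (first index i along x, second index j along y), and leaving boundary points at zero are assumptions for this example.

```python
def dh_dt_grid(u, v, dx, dy):
    """Right-hand side of Eq. (15) at the interior points of the grid.

    u, v : nested lists indexed as u[i][j], with i along x and j along y.
    Boundary points are left at 0.0 because they are not computed cells.
    """
    m, n = len(u), len(u[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            out[i][j] = (
                -(u[i + 1][j] - u[i - 1][j]) / (2.0 * dx)
                - (v[i][j + 1] - v[i][j - 1]) / (2.0 * dy)
            )
    return out
```

With u = 2x and v = 3y sampled on a unit grid, the central differences are exact and the interior value is -(2 + 3) = -5, which is a convenient sanity check for signs and index order.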

Step 2: Designing a sample of CNN

Based on the CNN state equations and the difference equations (15)-(17), we obtain the CNN templates for layers h, u, v:

  • Layer h:

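    $$A^{hu} = \begin{bmatrix} 0 & 0 & 0 \\ \dfrac{1}{2\Delta x} & 0 & -\dfrac{1}{2\Delta x} \\ 0 & 0 & 0 \end{bmatrix};\quad A^{hv} = \begin{bmatrix} 0 & \dfrac{1}{2\Delta y} & 0 \\ 0 & 0 & 0 \\ 0 & -\dfrac{1}{2\Delta y} & 0 \end{bmatrix} \tag{18}$$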

  • Layer u:

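$$A^{uv} = \begin{bmatrix} 0 & \dfrac{\beta u_{i,j-1}}{2d\Delta y} & 0 \\ 0 & 0 & 0 \\ 0 & -\dfrac{\beta u_{i,j+1}}{2d\Delta y} & 0 \end{bmatrix};\quad A^{uh} = \begin{bmatrix} 0 & 0 & 0 \\ \dfrac{g d}{2\Delta x} & 0 & -\dfrac{g d}{2\Delta x} \\ 0 & 0 & 0 \end{bmatrix};\quad B^{u} = \frac{1}{\rho}\tau_{wx}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
$$A^{u} = \begin{bmatrix} 0 & 0 & 0 \\ \dfrac{\beta u_{i-1,j}}{2d\Delta x} + \dfrac{K_L}{\Delta x^2} & -\dfrac{g d\,n^2\left(u_{i,j}^2+v_{i,j}^2\right)^{1/2}}{d^{1/3}} + \dfrac{1}{R_u} + \dfrac{4K_L}{\Delta x^2} & -\dfrac{\beta u_{i+1,j}}{2d\Delta x} + \dfrac{K_L}{\Delta x^2} \\ 0 & 0 & 0 \end{bmatrix};\quad z^{u} = 0 \tag{19}$$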

  • Layer v:

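$$A^{vh} = \begin{bmatrix} 0 & \dfrac{g d}{2\Delta y} & 0 \\ 0 & 0 & 0 \\ 0 & -\dfrac{g d}{2\Delta y} & 0 \end{bmatrix};\quad A^{vu} = \begin{bmatrix} 0 & 0 & 0 \\ \dfrac{\beta u_{i-1,j}}{2d\Delta x} & 0 & -\dfrac{\beta u_{i+1,j}}{2d\Delta x} \\ 0 & 0 & 0 \end{bmatrix};\quad B^{v} = \frac{1}{\rho}\tau_{wy}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix};\quad z^{v} = 0$$
$$A^{v} = \begin{bmatrix} 0 & \dfrac{\beta v_{i,j+1}}{2d\Delta y} + \dfrac{K_L}{\Delta y^2} & 0 \\ \dfrac{K_L}{\Delta y^2} & -\dfrac{g d\,n^2\left(u_{i,j}^2+v_{i,j}^2\right)^{1/2}}{d^{1/3}} + \dfrac{1}{R_v} + \dfrac{K_L}{\Delta y^2} & \dfrac{K_L}{\Delta y^2} \\ 0 & -\dfrac{\beta v_{i,j+1}}{2d\Delta y} - \dfrac{K_L}{\Delta y^2} & 0 \end{bmatrix} \tag{20}$$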

Step 3: Designing hardware architecture of CNN to solve Navier-Stokes equations

Based on the templates found in (18)-(20), we can design the circuit architecture for the CNN chip. It is a three-layer 2D CNN. The arithmetic unit for each layer, and the links needed to perform parallel calculation on the chip, can then be built. Figure 9 shows the architecture of layer h and layer u (layer v is similar to u).

Figure 9.

Logic architecture of cell of h, u.

3.4 Proposed system architecture for MxN CNN

The practical problems that need to be solved are: first, identifying the boundary points of the whole difference grid (space); second, dividing the entire computing space into smaller subspaces, where the division and recombination of boundary areas must be performed carefully to avoid incorrect results caused by mismatched computing time steps; and third, controlling real-time data exchange and combining sequential and parallel computing in a CNN chip. The CNN chip proposed in this chapter is similar to those used for the previous problems [4, 5]. The new issues here are dividing the computing space, processing the dynamic sub-boundaries, and combining sequential and parallel computing.

3.4.1 General MxN CNN

Each CNN cell has its own data element and a core that performs the computing function. The CNN has MxN CNN cells in which only (M-2)x(N-2) CNN cells have computing functions, so that the CNN has MxN data elements and (M-2)x(N-2) cores (Figure 10).

Figure 10.

General architecture of a CNN chip.

The Buffer supplies MxN data elements to the CNN. Each set of MxN data elements is called one block of data (Figure 11).

Figure 11.

Buffer (MxN) for CNN core.

The white area holds the data elements for the CNN boundary cells, and the gray area is the data that must be processed by the CNN. The CNN arithmetic unit has a size of (M-2)x(N-2) cells and processes the data in the gray area inside the input buffer unit.

The Input memory has PxQ blocks of data. It is a true dual port memory.

The Temp memory also has PxQ blocks of data. It is a simple dual port memory. It is used to temporarily store data computed from CNN core and supply data for Boundary updating unit.

The data to be processed, sent from the PC, have the size mxn (Figure 12).

Figure 12.

Computing space with main boundary.

Assume that m = 5, n = 6, M = 3, and N = 4; the white part is the boundary and the gray part is the area to be processed. Before the data are processed, temporary vertical and horizontal boundaries need to be added, as in Figure 13, column (0,3) and row (3,0).

Figure 13.

Divide computing space into subspace with subboundary.

The temporary vertical and horizontal boundaries are added so that the data structure matches the CNN buffer. After these boundaries have been added, the data are sent to the Input memory. The blocks of data in the Input memory unit (for mxn = 5x6 and MxN = 3x4) are detailed as follows (Figure 14).

Figure 14.

The blocks of data in the Input memory in case that mxn = 5x6, MxN = 3x4.

0, 1, 2,.., 6 are the addresses of blocks. In case that mxn = 5x6 and MxN = 3x4, we have P = 3 and Q = 2.

$$P \times Q = \frac{m-2}{M-2} \times \frac{n-2}{N-2}$$

The detailed structure of the Boundary updating unit is as follows (for MxN = 3x4) (Figure 15).
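The block count can be illustrated with a short Python sketch that splits an mxn grid into PxQ blocks of MxN elements. It assumes that neighboring blocks overlap by their boundary rows and columns, so that each block carries the temporary boundary data its (M-2)x(N-2) interior needs, and that (m-2) and (n-2) are exact multiples of (M-2) and (N-2). The function names and this overlap layout are assumptions for illustration; the exact memory layout of Figures 13 and 14 may differ.

```python
def block_counts(m, n, M, N):
    """Return (P, Q): the number of blocks along the rows and columns."""
    if (m - 2) % (M - 2) or (n - 2) % (N - 2):
        raise ValueError("interior size must be a multiple of the block interior")
    return (m - 2) // (M - 2), (n - 2) // (N - 2)


def split_into_blocks(grid, M, N):
    """Split an m x n grid into P x Q overlapping blocks of M x N elements.

    Block (p, q) starts at row p*(M-2) and column q*(N-2), so adjacent
    blocks share the cells that act as temporary boundaries.
    """
    m, n = len(grid), len(grid[0])
    P, Q = block_counts(m, n, M, N)
    blocks = {}
    for p in range(P):
        for q in range(Q):
            r0, c0 = p * (M - 2), q * (N - 2)
            blocks[(p, q)] = [row[c0:c0 + N] for row in grid[r0:r0 + M]]
    return blocks
```

For the case used in the text (mxn = 5x6, MxN = 3x4), `block_counts` returns P = 3 and Q = 2, that is, six blocks of 3x4 elements.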

Figure 15.

The Boundary updating structure (MxN = 3x4).

The control unit controls the activities of the whole system according to the following algorithm:

    At every posedge of clk do
    {
        if (has IO event)
            do the IO task;
        else
            buffer = read(Input memory)
            if (finish computing the first block)
                if (BoundaryUpdating())
                    write(Input memory)
    }

3.4.2 Proposed CNN architecture when M = 3 (3xN CNN)

The 3xN CNN architecture is similar to the general MxN CNN architecture (M = 3). In order to reduce the memory consumption and simplify the Boundary updating unit, there are some differences (Figure 16).

Figure 16.

The architecture of 3xN CNN chip.

Each block of data in the memory (Input memory or Temp memory) consists of 1xN data elements. Assume that the data to be processed, sent from the PC, have the size mxn with m = 5 and n = 6, and that N = 4. As mentioned above, the data are processed after the temporary vertical boundaries are added, so the Input memory unit has 5x2 blocks of data (m = 5, Q = 2), as follows (Figure 17).

Figure 17.

The memory with 5x2 blocks (m = 5, n = 6, N = 4).

Each block has size of 1x4 data elements.

The Buffer unit is a shift-up register of size 3xN. Its input and output have sizes of 1xN and 3xN, respectively. The input is at the bottom.

The Input memory has m rows and Q columns of blocks of data. The control unit reads the blocks in the Input memory column by column and puts each block of data at the input of the buffer, which then shifts up by one step. After step 3, the Buffer holds a full 3xN set of data to supply to the CNN core. After each subsequent step, the Buffer again holds a 3xN set of data to supply to the CNN core (Figure 18).

Figure 18.

The Buffer’s state after each step (m = 5, n = 6, N = 4).
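The behavior of the shift-up register can be modeled in a few lines of Python. This is a behavioral sketch for checking the data flow, not the Verilog module used on the FPGA; the class name, the zero initial contents, and the `is_full` helper are assumptions for this example.

```python
class ShiftUpBuffer:
    """Behavioral model of a depth x N shift-up register (depth = 3 here).

    Each push inserts a 1 x N block at the bottom and shifts the
    existing rows up by one; the top row is discarded.
    """

    def __init__(self, n_cols, depth=3):
        self.n_cols = n_cols
        self.rows = [[0.0] * n_cols for _ in range(depth)]
        self.pushed = 0

    def push(self, block):
        if len(block) != self.n_cols:
            raise ValueError("block must contain exactly N data elements")
        self.rows = self.rows[1:] + [list(block)]
        self.pushed += 1
        return self.rows

    def is_full(self):
        """True once enough blocks have been pushed to fill every row."""
        return self.pushed >= len(self.rows)
```

After three pushes the buffer holds three consecutive rows of the current column of blocks; every further push slides this 3xN window down by one row, so the CNN core receives a new window on each step.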

The output of the CNN core has the size of 1xN (Figure 19).

The Boundary updating unit is shown in Figure 20.

Figure 19.

The output size of CNN core (N = 4).

The control algorithm for the control unit is as follows:

    At every posedge of clk do
    {
        if (has IO event)
            do the IO task;
        else
            buffer = read(Input memory);  // read column by column
            if (finish computing the first block of column q)
                if (column_of_current_block == 0)
                    write(Temp memory);
                else
                    BoundaryUpdating(CNNoutput, read(Temp memory));
                    write(Input memory);
    }

Figure 20.

The Boundary updating structure (N = 4).

Figure 21.

The chip Virtex 6 (XCVL240T-1FFG1156) connected to PC for configuring to make CNN chip and performing calculation.

3.5 Implementation

In this part, we implement the 3xN CNN. Q, m, and N are parameters that can be configured before compiling and programming the FPGA chip. By default, we set Q = 2, m = 8, and N = 4.

3.5.1 Development environment

For the experiments, the ISE Design Suite software version 14.7 and the ML605 evaluation board, which includes the chip XCVL240T-1FFG1156 (Virtex 6), are used to implement the schematic of the CNN.

First, we use Verilog HDL language to describe the CNN architecture. Then, we use ISim simulator to verify our system. Finally, we program the system to the FPGA chip on ML605 board.

The experimental system is shown in Figure 21.

3.5.2 Input data for h, u, v values

The input of the CNN that solves the Navier-Stokes equations consists of the h, u, v values. We use three Input memory units, three Buffer units, and three Temporary memory units to store them. Each data element is represented as a 32-bit floating point real number. Temporary boundaries are added to the h, u, v data, as detailed below (shown in decimal and in the hex encoding of single-precision floating point) (Figure 22).

Figure 22.

Initial data for the Input memory h, u, v.

The interfaces of the Input memory and Temporary memory for h, u, v are configured as in Figure 23. The initial data for the Input memory h, u, v are loaded from COE files. A COE file stores the initial values for a memory (Figure 24).

Figure 23.

Interface for Input and Temp memory h, u, v.

Figure 24.

An example of h.core file to initial data for the Input memory h.

3.5.3 Shift up register

3.5.4 CNN core

3.5.5 Boundary updating

3.5.6 Control unit

The interface of Control unit is described as follows.

3.5.7 System scheme

To verify the system, the interface of the top module of the system should include all the signals that we want to verify.

The top module is described as follows.

    Control CU(
        .CountCLK(CountCLK),
        .wraddressHUVTemp(wraddrTemp),
        .rdaddressHUVTemp(rdaddrTemp),
        .wrenTemp(wrenTemp),
        .clk(clk),
        .wraddressHUV(wraddr),
        .rdaddressHUV(rdaddr),
        .wren(wren),
        .start(start),
        .EnableBoundaryUpdating(EnableBoundaryUpdating),
        .finish(finish));

    InputMemoryHUV #(N) InputMemory(
        clk, rdaddr, doutH, doutU, doutV,
        wraddr, wren, HNew, UNew, VNew);

    InputBuffer #(M,N) Buffer(
        clk, doutH, doutU, doutV,
        matrixhin, matrixuin, matrixvin);

    CNNCore #(M,N) uut(
        .clk(clk),
        .matrixhin(matrixhin),
        .matrixuin(matrixuin),
        .matrixvin(matrixvin),
        .matrixhout(matrixhout),
        .matrixuout(matrixuout),
        .matrixvout(matrixvout));

    BoundaryUpdatingHUV #(N) Boundary(
        matrixhout, matrixuout, matrixvout,
        doutHNewTemp, doutUNewTemp, doutVNewTemp,
        EnableBoundaryUpdating,
        HNewTemp, UNewTemp, VNewTemp,
        HNew, UNew, VNew);

    TempMemoryHUV #(N) TempMemory(
        clk, wraddrTemp, wrenTemp, HNewTemp, UNewTemp,
        VNewTemp,
        rdaddrTemp, doutHNewTemp, doutUNewTemp, doutVNewTemp);

    endmodule

3.6 Simulation results

The ISE design software shows the device utilization summary as in Table 1.

Devices used summary (estimated values)

Logic utilization                   | Used   | Available | Utilization
Number of slice registers           | 3952   | 301,440   | 1%
Number of slice LUTs                | 16,365 | 150,720   | 10%
Number of fully used LUT-FF pairs   | 1770   | 18,547    | 9%
Number of bonded IOBs               | 3112   | 600       | 518%
Number of Block RAM/FIFO            | 12     | 416       | 2%
Number of BUFG/BUFGCTRLs            | 1      | 32        | 3%
Number of DSP48E1s                  | 132    | 768       | 17%

Table 1.

Device utilization summary.

Figures 25-27 show the schematics synthesized by the ISE design software.

Figure 25.

The architecture of CNN chip.

Figure 26.

The architecture of one CNN cell.

Figure 27.

Inside electronic circuit for h.

Comparing the new values of h in Figure 28i, k (doutH) with Figure 29, we can see that the 3x4 CNN system worked well.

Figure 28.

Signals operating inside the 3x4 CNN system, m = 8, Q = 2. (a) Starting a computing cycle by setting start = 1. (b) The output of Input memory (doutH). (c) The data outputting from Buffer after 4 clks. (d) The results from CNN core after 10 clks; and start writing the results to Temp memory. (e) The CNN core finish computing the first column of blocks of data at 16 clks; and pause writing the results to Temp memory at 16 clks. (f) The results from CNN core after 18 clks; read Temp memory, start updating boundaries, and write the results to Input memory. (g) Pause updating boundaries from 24 clks. (h) The CNN core finishes computing; read the last column of blocks of data from Temp memory and write to Input memory. (h) Finish writing all results of the first computing cycle to Input memory. (i) The controller sets finish = 1 at 33 clks. (k) The output of Input memory shows the results computed at previous computing cycle. (l) The overview of signals.

Figure 29.

The new values of h computed in Excel for the first computing cycle.

The simulation results show the correctness and effectiveness of the implementation. Calculating the first three 1xN blocks taken from the memory units h, u, v costs 10 clock pulses: 1 clock pulse for the initial read of the Input memory, 3 clock pulses for the initial loading of the buffer to the CNN, and 6 clock pulses for the initial calculation. Each successive 1xN unit takes only 1 clock pulse, because the pipeline mechanism is used both to update the buffer to the CNN and to calculate in the CNN arithmetic unit. After each column of blocks of data in the Input memory has been read, 2 more clocks are needed to initialize the buffer again. It also takes 1 clk for the initial write to the Temp memory, 1 clk for the initial read of the Temp memory, and 1 clk for the initial write of the result back to the Input memory.

As a result, the time for one computing cycle is:

$$T = 8 + m\,(Q+1)\ \text{(clk)}$$

For the implementation above, with m = 8 and Q = 2, T = 32 (clk).
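As a quick arithmetic check, the cycle time can be computed for other configurations with the small Python helper below. It assumes that the formula reads T = 8 + m(Q + 1), which is the grouping that reproduces the stated T = 32 clk for m = 8 and Q = 2; the function name is hypothetical.

```python
def computing_cycle_clks(m, Q):
    """Clock pulses for one computing cycle, assuming T = 8 + m*(Q + 1).

    m : number of rows of blocks in the Input memory.
    Q : number of columns of blocks in the Input memory.
    """
    return 8 + m * (Q + 1)
```

Under this reading, the cost grows linearly with both m and Q: for example, keeping m = 8 and doubling the width to Q = 4 gives 8 + 8 · 5 = 48 clk.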

4. Conclusion

This chapter presents a solution for configuring a CNN chip to solve the Navier-Stokes equations, with particular attention to handling the temporary boundaries that arise when the computing space must be divided. The purpose is to divide a large data space into many subspaces, so that processing the large space reduces to calculating each sub-block of data. With 32-bit floating point input data and the FPGA chip Virtex 6 XCVL240T-1FFG1156, a CNN of 1x12 cells has been implemented successfully. The results show that the effectiveness of this solution lies mainly in the expansion of the computing space and in resource saving, while the accuracy of the calculation is acceptable. This model can be developed further to solve similar problems in larger computing spaces and for some types of complicated (mixed) boundaries.

Acknowledgments

We would like to deeply acknowledge Professor Roska Tamas, the head of the Analogic and Neural Computing Research Laboratory and Chairman of the Scientific Council—Institute of the Hungarian Academy of Sciences; and Associate Professor Pham Thuong Cat, the Head of Automation Laboratory—Institute of Information Technology—Vietnam Academy of Science and Technology, for giving us many important instructions.

How to cite and reference

Vu Duc Thai and Bui Van Tung (December 13th 2019). Solving Partial Differential Equation Using FPGA Technology, Boundary Layer Flows - Theory, Applications and Numerical Methods, Vallampati Ramachandra Prasad, IntechOpen, DOI: 10.5772/intechopen.84588. Available from:
