Open access peer-reviewed chapter

Trans_Proc: A Processor to Implement Linear Transformations in Image and Signal Processing and Its Future Scope

Written By

Atri Sanyal and Amitabha Sinha

Submitted: 24 June 2021 Reviewed: 28 June 2021 Published: 26 October 2022

DOI: 10.5772/intechopen.99122


Abstract

We present Trans_Proc, a reconfigurable generic processor that can execute operations related to linear transformations such as the FFT, FDCT, or FDWT. A graph-theoretic lemma is used to establish the applicability of such a processor to the flow-graph-based parallel operations found in these linear transformations. The architecture-level and processing-element-level designs are presented, together with a primitive instruction set and the control signals implementing it. A detailed simulation validating the correctness of the data calculation and routing operations at both the PE level and the architecture level is carried out using Xilinx Vivado WebPACK. The results for size, power, and timing requirements are presented.

Keywords

  • Transform processor
  • Graph Theoretic Concept
  • Design
  • Primitive Instruction Set
  • Simulation

1. Introduction

In this paper we propose an efficient architecture for implementing frequently used and computationally intensive linear transformations in signal and image processing. Linear transformations such as the FFT, FDCT, or FDWT are computationally intensive and critical to these processing applications. Papers proposing designs in this domain fall mainly into three categories. Papers in the first category propose architectures that implement only a single linear transformation, such as the FFT or FDCT [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. Since the primary focus of these implementations is speed, they are mainly realized on ASICs. They include a variety of algorithms for reducing the number of computationally intensive operations: multiplierless designs, high-speed pipelining, data forwarding, lifting-step techniques, and others that greatly decrease the computational complexity and increase the speed of FFT or FDCT algorithms. The second category of papers proposes processors or architectures that can implement a number of general linear transformations such as the FFT, FDCT, and FDWT. Since these architectures consist of basic building blocks common to all of these transforms, and therefore need to reconfigure themselves before executing different transformations, they are mainly implemented on reconfigurable platforms such as FPGAs [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. Our paper proposes a processor of this category. The third category of papers discusses implementations of more generic image/signal applications [18, 19, 20].

Data flow graphs are used extensively in the literature to describe linear transforms. It was proved earlier in [21, 22], by graph-theoretic arguments and mathematical induction, that a MIMD processor whose processing elements are connected like a completely connected equi-vertex bipartite graph can reproduce any action shown in the flow graph of transformations such as the FFT, FDCT, or FDWT of arbitrary size. This confirms that a processor with such an architecture can execute the transforms represented by the flow-graph method. The architecture of the processing element and the overall architecture discussed in [21, 22] are described thoroughly here. The architecture of the control unit and the data exchange procedure between the main CPU and memory on one side and this processor and its local memory on the other are discussed in detail. The instruction sets for the processing element and for the overall processor are described along with their corresponding control lines. Representative examples of each category of the instruction set are considered, and the step-wise control signals implementing them are discussed. The entire architecture requires reconfigurability, as it is capable of implementing several transforms on its own. The architecture was then coded in VHDL, synthesized, and simulated using Xilinx Vivado. The processor is simulated to verify its operation in three stages. First, the components inside the processing element (the floating-point adder and multiplier) are simulated and tested. Then the longest sequence of execution required in Loeffler's FDCT algorithm is tested for every processing element, and finally the overall architecture and the data routing between the different processing elements are simulated and tested. The synthesis results showing the size of the architecture at the LUT level and the synthesis results for power and timing are discussed.
The rest of the paper is organized as follows. Section 2 discusses the theoretical background of the architecture. Section 3 discusses the implementation of the processor in a modular way: the overall architecture of the processor and an implementable CU are presented, then the processing-element-level architecture is described, and the instruction set and the control signals implementing some representative examples of the instruction set are shown. Section 4 discusses the step-by-step synthesis and simulation results in terms of speed, timing, and size. Finally, Section 5 discusses the conclusion and future scope of the work.


2. Proof of the architecture using graph theoretic approach

The theoretical proof using mathematical induction is given in detail in [21]. Here we present only a brief outline of the argument.

The flow graph shown in the figure [23] is a widely used method of calculating transformations such as the FFT, FDCT, and FDWT. For the FFT or FDCT, the flow graph physically resembles an equi-vertex k-partite graph, where k equals the number of stages, the vertices are processing elements, and the connections among the processing elements are the edges. Since the stages are mutually exclusive, an architecture shaped like an equi-vertex k-partite graph can be reproduced by a fully connected equi-vertex bipartite graph, provided there is a one-to-one mapping between every stage of the k-partite graph and the two parts of the bipartite graph. Any algorithm described by a flow graph of the first kind can therefore be described by a graph of the latter kind, since the vertex sets have the one-to-one mapping just described. From this argument it is clear that an architecture of the second kind will be efficient as a transform processor, and its reconfigurability will make it easy to switch from one transform to another, making it a general transform processor. The original architecture requires two sets of processing elements, one in each part, and fully connected bidirectional communication wires between them. The hardware cost can be largely reduced if we instead take one set of processing elements and one set of registers, a fully connected feed-forward network from the registers to the processing elements, and a single feedback network connecting each processing element to its corresponding register. The data exchange between two processing elements Pi → Pj can then be rewritten as Pi → Ri → Pj. This takes two clock pulses rather than one, but the hardware cost is significantly reduced.
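To make the argument concrete, here is a minimal Python sketch (illustrative only, not part of the synthesized design) of how one flow-graph stage executes on the reduced structure: in the first clock every PE reads the registers it needs over the fully connected feed-forward network and computes a weighted sum, and in the second clock each PE i latches its result into its own register Ri, realizing Pi → Ri → Pj. The example stage reuses the sums and differences of stage 1 of the FDCT schedule given later in Table 4.

```python
# Illustrative sketch: emulating one stage of a flow graph on the register/PE
# structure described above. Each PE may read any register over the fully
# connected feed-forward network (clock 1) and writes its result back only to
# its own register over the feedback line (clock 2).

def run_stage(regs, stage):
    """stage[i] is a list of (reg_index, coefficient) pairs for PE i."""
    # Clock 1: every PE reads the registers it needs and computes a weighted sum.
    pe_out = [sum(c * regs[r] for r, c in taps) for taps in stage]
    # Clock 2: PE i latches its result into register i for the next stage.
    return pe_out

# Example: the butterfly stage 1 of the FDCT schedule (sums and differences).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
stage1 = [
    [(0, 1.0), (7, 1.0)],   # P1: Reg0 + Reg7
    [(1, 1.0), (6, 1.0)],   # P2: Reg1 + Reg6
    [(2, 1.0), (5, 1.0)],   # P3: Reg2 + Reg5
    [(3, 1.0), (4, 1.0)],   # P4: Reg3 + Reg4
    [(3, 1.0), (4, -1.0)],  # P5: Reg3 - Reg4
    [(2, 1.0), (5, -1.0)],  # P6: Reg2 - Reg5
    [(1, 1.0), (6, -1.0)],  # P7: Reg1 - Reg6
    [(0, 1.0), (7, -1.0)],  # P8: Reg0 - Reg7
]
print(run_stage(x, stage1))  # [9.0, 9.0, 9.0, 9.0, -1.0, -3.0, -5.0, -7.0]
```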


3. Implementation of the architecture

3.1 Implementation of the overall architecture design

The fully connected feed-forward path described in the previous section is created by eight 8-to-1 multiplexers, each with 3 select bits. Each multiplexer can take input from any of the eight registers and send its output to one processing element (PE). The select lines of the individual multiplexer choose which input register loads its value into the PE. This constitutes the simplest effective feed-forward communication path between the registers and the PEs. The feedback line is implemented by a 1-to-2 demultiplexer and 2-to-1 multiplexer pair, which directs the output of a PE to the input line of its corresponding register. The same pair can also load data from memory at the beginning and store the results once the calculation is complete. The current design is examined with eight such paths, mainly with a view to implementing one stage of an FDCT algorithm. The architecture uses 8-bit register sets to latch values while entering or exiting the processing elements.
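The following behavioural sketch (function and signal names are mine, chosen to mirror the control signals of Table 1 below; it is not the VHDL code) models this routing fabric: eight 8-to-1 ROUTMUXes feed the PEs from the MIDREG bank, and the INMUX/OUTDEMUX pair on each lane decides whether a MIDREG is loaded from the memory line TP or from the feedback line FB.

```python
# Behavioural model of the routing fabric (assumed names, 0-based indices).

def feed_forward(midreg, routmux_sel):
    """routmux_sel[i] (0-7) picks which MIDREG drives PE i."""
    return [midreg[sel] for sel in routmux_sel]

def latch_midreg(midreg, outreg, tp_in, inmux_sel):
    """INMUX i = 0 loads MIDREG i from the memory line TP i;
    INMUX i = 1 loads it from the feedback line FB i
    (i.e. from OUTREG i with OUTDEMUX i = 0)."""
    return [tp_in[i] if inmux_sel[i] == 0 else outreg[i] for i in range(8)]

# Example: the ROUTMUX of PE index 4 selects MIDREG index 2,
# while every other PE reads its own register.
midreg = [10, 11, 12, 13, 14, 15, 16, 17]
print(feed_forward(midreg, [0, 1, 2, 3, 2, 5, 6, 7])[4])   # 12
```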

3.2 Implementation of the processing element inside the processor

The implementation of the processing element (PE) inside the processor was done keeping in mind the types of operations performed when computing these transformations. Most of the operations are floating-point operations, so we used one floating-point adder/subtractor and one floating-point multiplier inside the PE. We have used commonly available floating-point adder and multiplier designs, leaving open the option of testing state-of-the-art designs in the future to improve the adder and multiplier performance. Two registers latch the source data of the adder and two similar registers latch the source data of the multiplier; the results of the adder and of the multiplier are stored in two further registers. The PE contains internal multiplexers and demultiplexers to route data from one internal register to another and to send/receive data to/from the registers outside the PE.
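A minimal behavioural model of one PE follows, assuming the register roles described above (D0/D1 feed the adder with the result in D4, D2/D3 feed the multiplier with the result in D5, and C0–C7 hold constants). The example reproduces the PE-level test value 2.0 × 0.5 + 4.0 × 8.0 = 33.0 used later in Section 4; the placement of the constants in C0 and C1 is an illustrative assumption.

```python
# Behavioural sketch of one PE (not the synthesized VHDL).

class PE:
    def __init__(self, constants):
        self.C = list(constants)            # C0-C7 constant registers
        self.D = [0.0] * 6                  # D0-D5 working registers

    def load(self, dst, value):             # Load [D0-D3] from outside
        assert 0 <= dst <= 3
        self.D[dst] = value

    def load_const(self, dst, c_idx):       # Load [D0-D3] from C0-C7 via CMUX
        self.D[dst] = self.C[c_idx]

    def move(self, dst, src):               # Move [D0-D3] <- [D4-D5]
        assert 0 <= dst <= 3 and src in (4, 5)
        self.D[dst] = self.D[src]

    def add(self):                          # Add: D4 <- D0 + D1
        self.D[4] = self.D[0] + self.D[1]

    def mul(self):                          # Mul: D5 <- D2 * D3
        self.D[5] = self.D[2] * self.D[3]

    def out(self, src):                     # Out: write back D4 or D5
        return self.D[src]

# Example: the simulated PE-level test 2.0*0.5 + 4.0*8.0 = 33.0 (Section 4).
pe = PE(constants=[0.5, 8.0] + [0.0] * 6)
pe.load(2, 2.0); pe.load_const(3, 0); pe.mul(); pe.move(0, 5)
pe.load(2, 4.0); pe.load_const(3, 1); pe.mul(); pe.move(1, 5)
pe.add()
print(pe.out(4))   # 33.0
```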

Table 1 below lists the routing control signals and their functions for the processor, together with the routing and activation control signals and their functions for the processing elements (PEs); a compact representation of these fields for simulation purposes follows the table.

Signal name    Select bits  Function
INMUX1–8       0/1          0 = select input data from outside memory (TP1–8)
                            1 = select input data from the feedback line (FB1–8)
ROUTMUX1–8     000–111      000 = select data from MIDREG1 to PE1–8
                            ...
                            111 = select data from MIDREG8 to PE1–8
OUTDEMUX1–8    0/1          0 = select output data from OUTREG1–8 to FB1–8
                            1 = select output data from OUTREG1–8 to the output
CMUXSEL        000–111      Select any constant C0–C7 based on the select line
PECL1          0/1          0 = DEMUX1 → MUX0
                            1 = DEMUX1 → MUX1
PECL2          00–11        00 = direct load from outside to D0/D1
                            01 = move data from D4 to D0/D1
                            10 = move data from D5 to D0/D1
                            11 = load constant data from C0–C7
PECL3          00–11        00 = direct load from outside to D2/D3
                            01 = move data from D5 to D2/D3
                            10 = move data from D4 to D2/D3
                            11 = load constant data from C0–C7
PECL4          0/1          0 = enable bit for D0, 1 = enable bit for D1
PECL5          0/1          0 = enable bit for D2, 1 = enable bit for D3
PEEN1          0/1          1 = enable bit for D4
PEEN2          0/1          1 = enable bit for D5
PECL6          0/1          0 = DEMUX2 → MUX0/MUX1
                            1 = DEMUX2 → MUX2
PECL7          0/1          0 = DEMUX3 → MUX0/MUX1
                            1 = DEMUX3 → MUX2
PECL8          0/1          0 = select input from DEMUX2
                            1 = select input from DEMUX3

Table 1.

Names of the control signals, their values, and their functions in Trans_Proc.
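For simulation purposes the Table 1 fields can be carried around as a single packed control word; the sketch below shows one possible encoding (the field widths are read off the "Select bits" column, while the packing order and the pack helper are my own choices, not part of the design).

```python
# A compact way to carry the Table 1 control fields in a simulation model.
CONTROL_FIELDS = {
    "INMUX":    1,  # 0 = load MIDREG from TP line, 1 = from feedback line
    "ROUTMUX":  3,  # selects which MIDREG (0-7) drives a PE
    "OUTDEMUX": 1,  # 0 = OUTREG to feedback line, 1 = OUTREG to output
    "CMUXSEL":  3,  # selects constant register C0-C7
    "PECL1":    1, "PECL2": 2, "PECL3": 2, "PECL4": 1,
    "PECL5":    1, "PECL6": 1, "PECL7": 1, "PECL8": 1,
    "PEEN1":    1, "PEEN2": 1,
}

def pack(word):
    """Pack a {field: value} dict into a single integer control word."""
    bits, shift = 0, 0
    for name, width in CONTROL_FIELDS.items():
        bits |= (word.get(name, 0) & ((1 << width) - 1)) << shift
        shift += width
    return bits

print(bin(pack({"ROUTMUX": 0b011, "PECL2": 0b11})))
```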

3.3 Primitive instruction set of the processor

The primitive instruction set formulated for the processor mainly contains two categories. Category A comprises the instructions that implement the routing operations of the processor outside the PEs, and category B comprises the arithmetic and data-movement operations inside a PE.

  1. Data Loading/Routing Operations Outside PE:

    1. Load Direct MIDREG i: To load data from outside memory in the MIDREG i from TP i [i = 1…8]

    2. Load Feedback MIDREG i: To load data in MIDREG i from feedback line FB i [i = 1…8]

    3. Rout PE i, MIDREG j = Routing data from any MIDREG j to any PE i. [i, j = 1…8]

    4. Out OUTREG i = For storing the value from OUTREG i to outside memory. [i = 1….8]

  2. Data Loading/Movement and Mathematical Operations Inside PE:

    1. Load [D0-D3][PE i] = to load data in any of the registers D0-D3 from outside memory of PE i.

    2. Load [D0-D3], [C0-C7], [PE i] = to load data in any of the registers of D0-D3 from any of the constant registers C0-C7of PE i.

    3. Add [PE i] = to add the data present in D0 and D1 and keep it in D4 of PE i.

    4. Mul [PE i] = to multiply the data present in D2 and D3 and keep it in D5 of PE i.

    5. Move [D0-D3], [D4-D5], [PE i] = data movement operation from any of the output registers of D4-D5 to any of the input registers of D0-D3 of PE i.

    6. Out [D4-D5], [PE i] = Write back data from any of the output registers D4-D5 of PE i to OUTREG i of PE i.

Next we calculate, in Table 2 below, the total number of instructions per PE and for the overall architecture, for each group as well as the overall total:

Group name   Instructions per PE   Total instructions
A1           N/A                   8
A2           N/A                   8
A3           N/A                   64
A4           N/A                   8
B1           4                     32
B2           32                    256
B3           1                     8
B4           1                     8
B5           8                     64
B6           2                     16
Total        48                    472

Table 2.

Total number of instructions in each group.

We can see that the total number of instructions is 472, of which 48 are per PE and 88 are for operations outside the PEs. The control signals of the different components of the processor and their functions were specified in the previous table; from these we can specify the sequence of control signals that must be activated to implement each instruction of the instruction set. Table 3 shows one representative instruction for each group together with the corresponding control signals and their sequence of activation. The table listing all instructions can be found in the appendix.
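A quick cross-check of these totals (group sizes taken from Table 2; the script is illustrative only):

```python
# Cross-check of the Table 2 totals: routing instructions outside the PEs
# plus the per-PE instructions replicated over the 8 PEs.
outside_pe = {"A1": 8, "A2": 8, "A3": 64, "A4": 8}                    # 88
per_pe     = {"B1": 4, "B2": 32, "B3": 1, "B4": 1, "B5": 8, "B6": 2}  # 48 per PE

total = sum(outside_pe.values()) + 8 * sum(per_pe.values())
print(sum(per_pe.values()), sum(outside_pe.values()), total)          # 48 88 472
```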

Category A                Sequence of control signals
Load direct MIDREG 1      1. TP1→Data  2. INMUX1→0  3. EN-MIDREG1→1
Load feedback MIDREG 1    1. OUTDEMUX1→0  2. INMUX1→1  3. EN-MIDREG1→1
Rout MIDREG3, PE5         1. EN-MIDREG3→0  2. ROUTMUX5→011
Out OUTREG6               1. EN-OUTREG6→0  2. OUTDEMUX6→1

Category B                Sequence of control signals
Load D0, PE1              1. input_PE1→data  2. PECL1_PE1→0  3. PECL2_PE1→00  4. PECL4_PE1→1
Load D0, C5, PE4          1. EN-C5_PE4→1  2. CMUX_PE4→101  3. PECL2_PE4→11  4. PECL4_PE4→1
Add PE3                   1. PEEN1_PE3→1
Mul PE2                   1. PEEN2_PE2→1
Out D4, PE7               1. PECL6_PE7→1  2. PECL8_PE7→0  3. data_PE7→output
Move D1, D5, PE6          1. PECL7_PE6→0  2. PECL2_PE6→10  3. PECL4_PE6→0

Table 3.

Representative instructions of each category and the sequences of control signals implementing them.

3.4 Implementation of operations using the instruction set of the architecture

If we consider the flow graph of the FDCT algorithm taken as an example in the figure, we can see that the algorithm is divided into 4 stages, and in each stage 8 PEs execute operations of three types: floating-point addition/subtraction, floating-point multiplication, and floating-point evaluation of expressions of the type C1*X + C2*Y. Table 4 below gives a stage-wise operation schedule of the 8 PEs, specifying what each PE does in these 4 stages; a small simulation sketch that executes this schedule follows the table:

Stage 1
P1: Reg0 + Reg7
P2: Reg1 + Reg6
P3: Reg2 + Reg5
P4: Reg3 + Reg4
P5: Reg3-Reg4
P6: Reg2-Reg5
P7: Reg1-Reg6
P8: Reg0-Reg7
Stage 2
P1: Reg0 + Reg3
P2: Reg1 + Reg2
P3: Reg1-Reg2
P4: Reg0-Reg3
P5: C3π/16*Reg4 + S3π/16*Reg7
P6: Cπ/16*Reg5 + Sπ/16*Reg6
P7: -Sπ/16*Reg5 + Cπ/16*Reg6
P8: -S3π/16*Reg4 + C3π/16*Reg7
Stage 3
P1: Reg0 + Reg1
P2: Reg0-Reg1
P3: √2 C3π/8*Reg2 + S3π/8*Reg3
P4: -S3π/8*Reg2 + √2C3π/8*Reg3
P5: Reg4 + Reg6
P6: Reg5-Reg7
P7: Reg4-Reg6
P8: Reg5 + Reg7
Stage 4
P1:----
P2:----
P3:-----
P4:-----
P5: Reg4-Reg7
P6: √2*Reg5
P7: √2*Reg6
P8: Reg4 + Reg7

Table 4.

Stage-wise operation schedule of the 8 PEs performing the FDCT algorithm.
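The sketch below executes this schedule literally, as a cross-check of the data flow rather than a complete FDCT: Cθ and Sθ are read as cos θ and sin θ, each PE i is assumed to latch its result back into Reg i for the next stage, and the idle PEs of stage 4 are assumed to pass their registers through unchanged; the reordering and scaling of the final coefficients required by Loeffler's algorithm are outside the scope of the sketch.

```python
# Executes the Table 4 schedule stage by stage (illustrative, not the VHDL).
from math import cos, sin, pi, sqrt

C1_16, S1_16 = cos(pi / 16), sin(pi / 16)
C3_16, S3_16 = cos(3 * pi / 16), sin(3 * pi / 16)
C3_8,  S3_8  = cos(3 * pi / 8),  sin(3 * pi / 8)
R2 = sqrt(2)

STAGES = [
    lambda r: [r[0] + r[7], r[1] + r[6], r[2] + r[5], r[3] + r[4],
               r[3] - r[4], r[2] - r[5], r[1] - r[6], r[0] - r[7]],
    lambda r: [r[0] + r[3], r[1] + r[2], r[1] - r[2], r[0] - r[3],
               C3_16 * r[4] + S3_16 * r[7],  C1_16 * r[5] + S1_16 * r[6],
               -S1_16 * r[5] + C1_16 * r[6], -S3_16 * r[4] + C3_16 * r[7]],
    lambda r: [r[0] + r[1], r[0] - r[1],
               R2 * C3_8 * r[2] + S3_8 * r[3], -S3_8 * r[2] + R2 * C3_8 * r[3],
               r[4] + r[6], r[5] - r[7], r[4] - r[6], r[5] + r[7]],
    lambda r: [r[0], r[1], r[2], r[3],
               r[4] - r[7], R2 * r[5], R2 * r[6], r[4] + r[7]],
]

def run_schedule(x):
    regs = list(x)                      # Reg0..Reg7
    for stage in STAGES:
        regs = stage(regs)              # PE i's result is latched into Reg i
    return regs

print(run_schedule([float(v) for v in range(8)]))
```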

We list the instructions required to execute three cases as representative examples: (a) the stage 1 operation of PE 5, (b) the stage 4 operation of PE 7, and (c) the stage 2 operation of PE 6. These three cases exhibit the three categories of floating-point operations described previously (Table 5); a numeric walk-through of case (c) is given after the table.

(a) Stage 1 operation of PE 5
Time unit  Instruction               Description
1          Load direct MIDREG3       Load data from the TP3 line to MIDREG 3
2          Rout PE[5], MIDREG[3]     Load data from MIDREG3 to the input line of PE5
3          Load [D0], [PE 5]         Load input data into D0 from the input line of PE 5
4          Load direct MIDREG4       Load data from the TP4 line to MIDREG 4
5          Rout PE[5], MIDREG[4]     Load data from MIDREG4 to the input line of PE5
6          Load [D1], [PE 5]         Load input data into D1 from the input line of PE 5
7          Add [PE5]                 Add the content of D0 and the 2's-complement value of D1 and store the result in D4 of PE5
8          Out D4                    Output data from D4 to OUTREG 5 of PE 5
9          Load feedback MIDREG[5]   Load the data from OUTREG 5 of PE5 to FB5 and then to MIDREG 5

(b) Stage 4 operation of PE 7
Time unit  Instruction               Description
1          Rout PE[7], MIDREG[6]     Load data from MIDREG6 to the input line of PE7
2          Load [D2], [PE 7]         Load input data into D2 from the input line of PE 7
3          Load [D3], [C7], [PE 7]   Load D3 with the constant from constant register C7 selected by CMUX
4          Mul [PE7]                 Multiply the contents of D2 and D3 and store the result in D5 of PE 7
5          Out D5 [PE7]              Output data from D5 to OUTREG 7 of PE 7
6          Load feedback MIDREG[7]   Load the data from OUTREG 7 of PE7 to FB7 and then to MIDREG 7

(c) Stage 2 operation of PE 6
Time unit  Instruction               Description
1          Rout PE[6], MIDREG[5]     Load data from MIDREG5 to the input line of PE6
2          Load [D2], [PE 6]         Load input data into D2 from the input line of PE 6
3          Load [D3], [C5], [PE 6]   Load D3 with the constant from constant register C5 selected by CMUX
4          Mul [PE 6]                Multiply the contents of D2 and D3 and store the result in D5 of PE 6
5          Move [D0], [D5], [PE 6]   Move the content from D5 to D0 of PE6
6          Rout PE[6], MIDREG[6]     Load data from MIDREG6 to the input line of PE6
7          Load [D2], [PE 6]         Load input data into D2 from the input line of PE 6
8          Load [D3], [C6], [PE 6]   Load D3 with the constant from constant register C6 selected by CMUX
9          Mul [PE 6]                Multiply the contents of D2 and D3 and store the result in D5 of PE 6
10         Move [D1], [D5], [PE 6]   Move the content from D5 to D1 of PE6
11         Add [PE 6]                Add the contents of D0 and D1 and store the result in D4 of PE6
12         Out [D4], [PE 6]          Output data from D4 to OUTREG 6 of PE 6
13         Load feedback MIDREG[6]   Load the data from OUTREG 6 of PE6 to FB6 and then to MIDREG 6

Table 5.

List of instructions for (a) the stage 1 operation of PE 5, (b) the stage 4 operation of PE 7, and (c) the stage 2 operation of PE 6.
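A numeric walk-through of case (c), the C1*X + C2*Y sequence inside PE 6, follows the instruction order of Table 5; the operand values and the assumption that cos(π/16) and sin(π/16) sit in C5 and C6 are purely illustrative.

```python
# Numeric walk-through (illustrative values) of case (c): PE 6 evaluating
# C1*X + C2*Y with two Mul/Move passes followed by one Add.
from math import cos, sin, pi

X, Y = 3.0, 5.0                         # contents of MIDREG 5 and MIDREG 6
C5, C6 = cos(pi / 16), sin(pi / 16)     # constants assumed to sit in C5 and C6

D = {"D0": 0.0, "D1": 0.0, "D2": 0.0, "D3": 0.0, "D4": 0.0, "D5": 0.0}

D["D2"], D["D3"] = X, C5; D["D5"] = D["D2"] * D["D3"]    # first Mul
D["D0"] = D["D5"]                                        # Move D5 -> D0
D["D2"], D["D3"] = Y, C6; D["D5"] = D["D2"] * D["D3"]    # second Mul
D["D1"] = D["D5"]                                        # Move D5 -> D1
D["D4"] = D["D0"] + D["D1"]                              # Add -> OUTREG 6

print(D["D4"], C5 * X + C6 * Y)          # both values match (≈ 3.918)
```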

3.5 Implementation of the control unit of the processor

Hardwired implementation of the correct control signals, their values, and their sequences for all 472 instructions is physically very difficult. In this work we have only developed the instructions required to prove the correctness of the design, which are of three types. First, we developed the instructions inside a PE that perform a floating-point addition and multiplication. Second, we developed the instructions to implement the longest sequence of the FDCT algorithm, C1*X + C2*Y, inside one PE of a single stage. Third, we carried out the same implementation of stage 2 for all PEs and routed the output values randomly to prove the correctness of the implementation. The control unit is therefore only partially developed. A programming-based approach is required to develop a full assembler that generates all the instructions. What we present in this paper is thus an incomplete design of Trans_Proc, but it shows that, once the CU is finished generating all the instructions, the processor can be used correctly for all the transform generators as a hardware co-processor implemented in an FPGA.
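One possible organization for the missing assembler is a lookup table from instruction mnemonics to the control-signal sequences of Table 3, expanded with the PE/register indices at assembly time; the sketch below is an assumption about how such a tool could be structured, not the actual CU design.

```python
# Sketch of a mnemonic-to-control-signal expander (an assumption, not the CU).
MICROCODE = {
    "Load_Direct":   ["TP{i}->Data", "INMUX{i}->0", "EN-MIDREG{i}->1"],
    "Load_Feedback": ["OUTDEMUX{i}->0", "INMUX{i}->1", "EN-MIDREG{i}->1"],
    "Rout":          ["EN-MIDREG{j}->0", "ROUTMUX{i}->select MIDREG{j}"],
    "Out":           ["EN-OUTREG{i}->0", "OUTDEMUX{i}->1"],
    "Add":           ["PEEN1_PE{i}->1"],
    "Mul":           ["PEEN2_PE{i}->1"],
}

def assemble(mnemonic, i, j=None):
    """Expand one instruction into its ordered control-signal steps."""
    return [step.format(i=i, j=j) for step in MICROCODE[mnemonic]]

print(assemble("Rout", i=5, j=3))
# ['EN-MIDREG3->0', 'ROUTMUX5->select MIDREG3']
```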


4. Simulation and synthesis

The first two simulations show the correct operation of the floating-point multiplier and the floating-point adder/subtractor. While the floating-point multiplier still has considerable scope for improvement, the floating-point adder/subtractor is fairly state of the art.

Here we see the longest sequence of multiplication and addition inside a single PE: Pein1×Cein5 + Pein1×Cein6 = 2.0×0.5 + 4.0×8.0 = 33.0.

Here we see the routing correctness of every PE of Trans_Proc according to the following flow graph, shown in tabular format:

PE1 = 1, PE2 = 2, PE3 = 3, PE4 = 4, PE5 = 5, PE6 = 6, PE7 = 7, PE8 = 8.

C1 = 2, C2 = 8.

PE1 = PE1×C1 + PE8×C2 = 66.

PE2 = PE2×C1 + PE7×C2 = 58.

PE3 = PE3×C1 + PE6×C2 = 50.

PE4 = PE4×C1 + PE5×C2 = 42.

PE5 = PE4×C2 + PE5×C1 = 34.

PE6 = PE3×C2 + PE6×C1 = 26.

PE7 = PE2×C2 + PE7×C1 = 1.

PE8 = PE1×C2 + PE8×C1 = 10.

This is how the routing correctness among the different PEs of the processor is tested, and we can see that it works.

Once the behavioural simulation is shown to be correct, we next present the synthesis results for the entire processor obtained with Xilinx Vivado and comment on them (Tables 6–9).

Utilization report (summary)
No. of LUTs   10897
No. of FFs    6928
No. of IOBs   562

Table 6.

Summary of the utilization report.

Utilization report (primitive blocks)
Primitive name   Number   Functional category
LUT6             6096     LUT
LUT5             920      LUT
LUT4             2984     LUT
LUT3             576      LUT
LUT2             536      LUT
LUT1             1257     LUT
FDCE             3616     Flop & Latch
FDRE             3312     Flop & Latch
MUXF7            320      MuxFx
CARRY4           168      Carry Logic
IBUF             489      IO
OBUF             73       IO
BUFG             1        Clock

Table 7.

Utilization report of the primitive blocks.

Power report (summary)
Total on-chip power    0.417 W
Device dynamic power   0.335 W
Device static power    0.082 W

Table 8.

Power report summary.

Timing report (summary)
Max setup time                                             3.419 ns
Worst pulse width slack                                    4.650 ns
Avg. clock pulses required for FP operations inside a PE   4
Max clock frequency                                        292 MHz

Table 9.

Timing report summary.

The overall utilization report gives an idea of the size of the processor, and the number of primitive blocks used in the processor is also given. Note that this study does not include the CU utilization, as the CU is incomplete and will be treated as a separate design in a future study. The total on-chip power, with its dynamic and static components, also suggests an implementable design. The timing report shows the setup time and the worst pulse width slack (4.650 ns); we calculated by hand that the floating-point operations inside a PE take at most 4 clock pulses. This gives a maximum clock frequency of 292 MHz (approximately 1/3.419 ns).
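A back-of-the-envelope check of these figures (illustrative only; the per-PE operation rate is a derived estimate, not a reported result):

```python
# Derives the quoted maximum clock frequency from the setup time and divides
# by the 4 clock pulses per floating-point operation to estimate a per-PE rate.
setup_ns = 3.419           # max setup time from Table 9
cycles_per_fp_op = 4       # average clock pulses per FP operation inside a PE

f_max_mhz = 1e3 / setup_ns                               # ~292 MHz (Table 9)
fp_ops_per_sec_per_pe = f_max_mhz * 1e6 / cycles_per_fp_op

print(round(f_max_mhz), round(fp_ops_per_sec_per_pe / 1e6, 1))   # 292, ~73.1
```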


5. A discussion on the memory and instruction exchange between the main processor and Trans_Proc

Here we describe the data transfer procedure between the main processor and Trans_Proc, which will be implemented as a future extension of this study. The process uses a linear image RAM (LIRAM) to store the primary data. Two data registers are used as buffers for data going into and coming out of Trans_Proc. A counter counts the number of blocks sent to Trans_Proc, and an address register stores the address at which each block of the transformed image is written back to the LIRAM.
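Since this exchange is future work, the sketch below only illustrates the described host-side flow: blocks are read from the LIRAM into an input buffer, handed to Trans_Proc, and written back through an address register; the block size, names, and interface are all assumptions.

```python
# Sketch (assumed interface) of the host-side block exchange with Trans_Proc.
BLOCK = 8                                  # assumed 1-D block size (one FDCT row)

def process_image(liram, transform):
    n_blocks = len(liram) // BLOCK         # block counter
    for b in range(n_blocks):
        addr = b * BLOCK                   # address register for write-back
        in_buf = liram[addr:addr + BLOCK]  # input data register (buffer)
        out_buf = transform(in_buf)        # Trans_Proc does the actual work
        liram[addr:addr + BLOCK] = out_buf # store the transformed block back
    return liram

# Example with a trivial stand-in transform:
image = [float(i) for i in range(16)]
print(process_image(image, lambda blk: [2 * v for v in blk]))
```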

References

  1. Po-Chih Tseng et al., “Reconfigurable discrete cosine transform processor for object-based video signal processing”, in Proceedings of the 2004 International Symposium on Circuits and Systems (ISCAS '04), 2004.
  2. Po-Chih Tseng, Chao-Tsung Huang, Liang-Gee Chen, “Reconfigurable Discrete Wavelet Transform Processor for Heterogeneous Reconfigurable Multimedia Systems”, Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 2005.
  3. Gregory W. Donohoe, “The Fast Fourier Transform on a Reconfigurable Processor”, Proc. NASA Earth Sciences Technology Conference, Pasadena, CA, June 11-13, 2002.
  4. Srivatsava P. S. V., Sarada V., “Reconfigurable MDC Architecture Based FFT Processor”, International Journal of Engineering Research & Technology, 2014.
  5. K. Joe Hass, David F. Cox, “Transform Processing on a Reconfigurable Data Path Processor”, 7th NASA Symposium on VLSI Design, 1998.
  6. V. Sarada, T. Vigneswaran, “Reconfigurable FFT Processor – A Broader Perspective Survey”, International Journal of Engineering and Technology (IJET), 2013.
  7. Asadollah Shahbahrami, Mahmood Ahmadi, Stephan Wong, Koen Bertels, “A New Approach to Implement Discrete Wavelet Transform using Collaboration of Reconfigurable Elements”, Proc. 2009 International Conference on Reconfigurable Computing and FPGAs.
  8. Konstantinos E. Manolopoulos, Konstantinos G. Nakos, Dionysios I. Reisis, Nikolaos G. Vlassopoulos, “Reconfigurable Fast Fourier Transform Architecture for Orthogonal Frequency Division Multiplexing Systems”, 2003, available: https://pdfs.semanticscholar.org/dd5c/263725af00e5dd4d42d573c269f57d917c8d.pdf?_ga=2.84059166.640751657.1573804365-914446569.1569299704
  9. Amitabha Sinha, Mitrava Sarkar, Soumojit Acharyya, Suranjan Chakraborty, “A Novel Reconfigurable Architecture of a DSP Processor for Efficient Mapping of DSP Functions using Field Programmable DSP Arrays”, ACM SIGARCH Computer Architecture News, Vol. 41, No. 2, May 2013.
  10. Sumit Wadekar, Laxman P. Thakare, A. Y. Deshmukh, “Reconfigurable N-Point FFT Processor Design for OFDM System”, International Journal of Engineering Research and General Science, Volume 3, Issue 2, March-April 2015.
  11. Alexey Petrovsky, Maxim Rodionov, Alexander Petrovsky, “Dynamic Reconfigurable on the Lifting Steps Wavelet Packet Processor with Frame-Based Psychoacoustic Optimized Time-Frequency Tiling for Real-Time Audio Applications”, in Design and Architectures for Digital Signal Processing, available: http://www.intechopen.com/books/design-and-architectures-fordigital-signal-processing2013.
  12. Sharon Thomas, V. Sarada, “Design of Reconfigurable FFT Processor with Reduced Area and Power”, ITSI Transactions on Electrical and Electronics Engineering (ITSI-TEEE), 2013.
  13. Uma Rajaram, “Design of FIR Filter for Adaptive Noise Cancellation Using Context Switching Reconfigurable EHW Architecture”, Ph.D. dissertation, Anna University, Chennai, 2009, available: https://shodhganga.inflibnet.ac.in/handle/10603/27245
  14. P. S. Reddy, S. Mopuri, A. Acharyya, “A Reconfigurable High Speed Architecture Design for Discrete Hilbert Transform”, IEEE Signal Processing Letters, vol. 21, no. 11, pp. 1413-1417, Nov. 2014, doi: 10.1109/LSP.2014.2333745.
  15. Atri Sanyal, Swapan Kumar Samaddar, Amitabha Sinha, “A Generalized Architecture for Linear Transform”, Proc. IEEE International Conference on CNC 2010, Oct 04-05, 2010, Calicut, Kerala, India, IEEE Computer Society, pp. 55-60, ISBN: 97-0-7695-4209-6.
  16. A. Sanyal, S. K. Samaddar, “A Combined Architecture for FDCT Algorithm”, Proc. 2012 Third International Conference on Computer and Communication Technology, Allahabad, 2012, pp. 33-37, doi: 10.1109/ICCCT.2012.16.
  17. Atri Sanyal, Saloni Kumari, Amitabha Sinha, “An Improved Combined Architecture of the Four FDCT Algorithms”, International Journal of Research in Electronics and Computer Engineering (IJRECE), Vol. 6, Issue 4, December 2018, ISSN: 2348-2281.
  18. Davide Rossi, Fabio Campi, Simone Spolzino, Stefano Pucillo, Roberto Guerrieri, “A Heterogeneous Digital Signal Processor for Dynamically Reconfigurable Computing”, IEEE Journal of Solid-State Circuits, Volume 45, Issue 8, Aug. 2010.
  19. Sohan Purohit, Sai Rahul Chalamalasetti, Martin Margala, Wim Vanderbauwhede, “Throughput/Resource-Efficient Reconfigurable Processor for Multimedia Applications”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 21, Issue 7, July 2013.
  20. Vikram, K. N., Vasudevan, V., “Mapping Data-Parallel Tasks onto Partially Reconfigurable Hybrid Processor Architectures”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 14, No. 9, September 2006.
  21. Atri Sanyal, Amitabha Sinha, “A Reconfigurable Architecture to Implement Linear Transforms of Image Processing Applications”, International Conference on Frontiers in Computing and Systems (COMSYS 2020), Jalpaiguri, West Bengal, India, January 13-15, 2020.
  22. B. Heyne, C. C. Sun, J. Goetze, S. J. Ruan, “A Computationally Efficient High-Quality Cordic Based DCT”, 14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006.
  23. N. Deo, “Graph Theory with Applications to Engineering and Computer Science”, PHI, 2007.
