Open access peer-reviewed chapter

Perspective Chapter: The Importance of Pipeline in Modern Cryptosystem

Written By

Babu M., Sathish Kumar G.A., Gurumurthy J. and Josephine Shermila P.

Submitted: 09 January 2022 Reviewed: 01 February 2022 Published: 18 May 2022

DOI: 10.5772/intechopen.102983

From the Edited Volume

Lightweight Cryptographic Techniques and Cybersecurity Approaches

Edited by Srinivasan Ramakrishnan



In this digital world, all digital information transmitted through a wireless channel is exposed to security threats. Along with security, encryption speed is a significant factor in transmitting the data as fast as possible. Pipelining is a technique used to improve the throughput of the encryption process, so that more data is encrypted per unit time. In this chapter, the design of the modern SMS4-BSK cryptosystem is briefed, various pipeline designs of SMS4 algorithms are surveyed and the pipeline implementation on the SMS4-BSK cryptosystem is analyzed. The SMS4-BSK cryptosystem is robust, fast and has a throughput of 7.4 Gbps. It can resist known-plaintext, chosen-plaintext, ciphertext-only and chosen-ciphertext attacks. Pipelining is applied to the encryption architecture of this cryptosystem to improve the throughput further. The pipelined design is implemented on a Kintex-7 FPGA and achieves a throughput of 9.9 Gbps. The pipeline implementation can also be extended to the key scheduling architecture, as both the encryption and the key scheduling use the same architecture. As per the SMS4-BSK algorithm, the keys are generated in the host system to improve the throughput.


  • pipeline
  • SMS4-BSK cryptosystem
  • throughput

1. Introduction

Pipelining is a common and widely used technique for enhancing the performance of a system without a significant investment in hardware. In pipelining, a computation is partitioned into a set of sub-computations (elaborated in Section 2), which are then executed in an overlapped fashion. In the ideal case, the speed of execution increases by a factor equal to the number of sub-computations obtained from the partitioning. Pipelining is used in different areas of computer design, such as memory access, instruction execution and arithmetic computation.

To improve the speed of computation, it is possible to adopt either of the following methods:

Method 1: Replicating the hardware.

Method 2: Partitioning the computations.

In the former method, replicating the hardware improves performance at the expense of hardware cost. In the latter method, partitioning the computation enables overlapped execution, and the performance improvement is quite close to that of the former method.

Jin et al. have described [1] a pipelined design and a folded design of the SMS4 block cipher and implemented both designs on a Xilinx Virtex-4 FPGA device. The implementation results show improved throughput in the former design and reduced area in the latter. According to the authors, the proposed designs offer a flexible choice for both area-critical and speed-critical cases. Gao et al. have proposed [2] an SMS4 cryptosystem based on rolling and unrolling architectures. The rolling structure uses a feedback system to control the entire processing mechanism. The unrolling structure is a fully pipelined architecture. The combination of rolling and unrolling provides good processing speed, with an average of one clock cycle for processing 128 bits. Han et al. have designed [3] an SMS4 architecture optimized for power dissipation and implementation cost. The authors proposed a programmable security processor for cryptographic algorithms. A three-stage pipeline and a 16-bit instruction set to enhance security are implemented in the design. The cost of the processor and the code density of the design are significantly lower. The round keys are stored in shared memory, and a security scheme is proposed to protect them. Zhao et al. have proposed [4] a Galois Counter Mode (GCM) based SMS4 architecture. The architecture is fully pipelined to improve the speed of encryption and can process 128-bit data, on average, in each clock period. The proposed design is implemented on both Virtex-4 and Virtex-5 FPGAs. Zhao et al. have proposed [5] a novel FPGA implementation scheme for the SMS4 cryptosystem. The throughput achieved in this scheme is 1.9 Gbps. Lee et al. have surveyed [6] how students study computer architecture by implementing pipelining in a processor design. The design required only 21 Million Instructions per Second (MIPS).

Abdel-Hafeez et al. have implemented pipelining [7] in the Advanced Encryption Standard (AES). The architecture is implemented on an Altera Max 3000A Field Programmable Gate Array (FPGA). The authors claim that the pipelined AES design has 16% higher throughput and 36% less hardware area than other designs. Guo et al. have proposed a pipelined AES [8] cryptosystem by combining pipelining with parallel processing and reconfiguration techniques. The design achieved a throughput of 8.83 Gbps. Teo et al. have implemented [9] pipelining in the Data Encryption Standard (DES) cryptosystem. The architecture is implemented on an Altera Complex Programmable Logic Device (CPLD). A four-stage pipeline is used to improve the throughput of the DES architecture. Babu et al. have implemented pipelining in the SMS4 [10] cryptosystem. The design is implemented on an Altera FPGA. Pipelining is applied to the Twisted Binary Decision Diagram (BDD) S-Box architecture. The Twisted BDD with m = 4 offers good speed and throughput. The pipeline register is placed after the transformation block in each round, and there are 32 rounds in the encryption architecture. Taherkhani et al. have designed [11] a pipelined DES cryptosystem and used a Virtex-6 FPGA to implement the architecture. The non-pipelined and the pipelined DES architectures are implemented in the same environment and analyzed. The results showed that the performance and throughput of the pipelined architecture are better.


2. Overview of pipelined architecture

According to the eight great ideas of computer architecture [12], the pipeline is one of the techniques used to improve the Processor’s performance.

Whenever a program is executed, each instruction passes through five phases:

  1. Instruction / Opcode Fetch (IF)

  2. Instruction Decode (ID)

  3. Operand Fetch (OF)

  4. Operand Execute (OE)

  5. Operand Store (OS)

Figure 1 shows the organization of a Computer. Steps involved in the execution of a program are:

Figure 1.

Organization of a computer.

Step 1: The Instructions / Program as well as operands are stored in the Main Memory initially

Step 2: The Processor fetches the instruction from the Instruction Memory

Step 3: The Control Unit decodes the instruction and finds out the operand registers as well as the operation to be performed on the operand and sends the control signals to all the components

Step 4: Operands are fetched from the Data Memory

Step 5: Arithmetic and Logic Unit (ALU) executes the operands based on the operation decoded by the Control Unit

Step 6: The output from the ALU is stored back to the Data Memory

In a non-pipelined architecture, the (i + 1)th instruction is initiated only after the ith instruction has finished executing, and the (i + 2)th instruction only after the (i + 1)th. Each instruction starts only when the previous one has completed. This mode of execution is known as instruction-wise interleaved execution or non-pipelined execution, and it is shown in Figure 2.

Figure 2.

Non-Pipelined Execution of instructions.

tn: Execution duration of an instruction.

tp: Phase duration.

In a pipelined architecture, while the ith instruction moves from the first phase (IF) to the second phase (ID), the (i + 1)th instruction enters the first phase (IF). While the ith instruction moves from the ID phase to the OF phase, the (i + 1)th instruction moves from the IF phase to the ID phase and the (i + 2)th instruction enters the IF phase. This mode of execution is known as phase-wise interleaved execution or pipelined execution. The instruction execution periods overlap, so the instructions execute in parallel, as shown in Figure 3. In any time slot, no more than one instruction occupies a given phase; all the instructions in flight are in different phases, so there is no contention for a phase.

Figure 3.

Pipelined execution of instructions.

Total execution time in the non-pipelined architecture = 15 tp.

Total execution time in the pipelined architecture = 7 tp.

So, in the pipelined architecture, the instruction/program will be executed faster than the non-pipelined architecture.
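The two slot counts above can be reproduced with a short scheduling sketch. It assumes an ideal, stall-free pipeline and three instructions, which is the instruction count consistent with the 15 and 7 time slots quoted above; the function names are illustrative only.

```python
# Ideal (stall-free) five-phase pipeline: instruction i occupies phase p in slot i + p.
PHASES = ["IF", "ID", "OF", "OE", "OS"]


def non_pipelined_slots(n, k=len(PHASES)):
    """Slots needed when each instruction completes all k phases before the next starts."""
    return n * k


def pipelined_schedule(n, phases=PHASES):
    """Return {slot: {phase: instruction}} for n instructions in an ideal pipeline."""
    schedule = {}
    for instr in range(n):
        for p, phase in enumerate(phases):
            slot = instr + p
            occupied = schedule.setdefault(slot, {})
            # A phase never holds two instructions in the same slot.
            assert phase not in occupied
            occupied[phase] = instr
    return schedule


if __name__ == "__main__":
    n = 3  # assumed instruction count
    sched = pipelined_schedule(n)
    print("non-pipelined slots:", non_pipelined_slots(n))  # 15
    print("pipelined slots:", len(sched))                  # 7
    for slot in sorted(sched):
        row = ", ".join(f"{ph}=I{sched[slot][ph]}" for ph in PHASES if ph in sched[slot])
        print(f"slot {slot}: {row}")
```

The internal assertion checks the no-contention property described for Figure 3: within one time slot, each phase holds at most one instruction.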

The parameters used to evaluate the performance of the pipelined execution are:

Speed-Up ratio (S)

Frequency (f)

Efficiency (E)

Throughput (T)

Figure 4 shows a two-stage pipelined implementation with stages $S_i$ and $S_{i+1}$. The pipeline is formed by inserting a buffer/latch between consecutive stages. The input passes through the latch when the clock is enabled.

Figure 4.

Two-stage pipelined execution of instructions.

τd = Latch delay (Delay taken by the input to go out of the latch).

τm = Maximum Phase Duration

$$\text{Clock period: } \tau = \tau_m + \tau_d \tag{1}$$

The clock period is the sum of the Maximum Phase Duration (τm) and the Latch delay (τd). All the phases may not have the same phase duration. So, the maximum phase duration among the five phases is considered for calculating the clock period.
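As a small numeric illustration of this rule, the sketch below computes the clock period from a list of phase durations and a latch delay. The durations and the latch delay are hypothetical values chosen for the example, not measurements from any design discussed in this chapter.

```python
def clock_period(phase_durations_ns, latch_delay_ns):
    """Clock period = slowest phase duration + latch delay (same units as the inputs)."""
    return max(phase_durations_ns) + latch_delay_ns


if __name__ == "__main__":
    # Hypothetical durations (ns) for IF, ID, OF, OE, OS and a hypothetical latch delay.
    phases_ns = [2.0, 1.5, 2.5, 3.0, 1.0]
    tau_ns = clock_period(phases_ns, latch_delay_ns=0.25)
    print("clock period (ns):", tau_ns)  # 3.25
    print("clock frequency (MHz):", 1e3 / tau_ns)
```

The slowest phase sets the clock for every stage, which is why balancing the stage delays matters as much as adding stages.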

In Figure 5, the number of instructions (n) = 5.

Figure 5.

Execution of instructions in a pipelined processor.

The number of clock cycles = 9.

The number of Phases (k) = 5.


$$\text{Number of clock cycles} = k + n - 1 \tag{2}$$

Speed-Up ratio = Non-Pipelined Execution Time/Pipelined Execution Time.



If $n \gg k$, then $k + n - 1 \approx n$.

$t_n = k \times t_p$ and $t_p = \tau$
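The relations above can be chained into the usual textbook form of the speed-up ratio. The following is a sketch under the stated assumptions ($t_n = k\,t_p$, $t_p = \tau$, and an ideal pipeline with no stalls); the equations are left unnumbered so that they do not clash with the chapter's own numbering.

```latex
\begin{align*}
S &= \frac{\text{non-pipelined execution time}}{\text{pipelined execution time}}
   = \frac{n\,t_n}{(k + n - 1)\,\tau} \\[4pt]
  &= \frac{n\,k\,t_p}{(k + n - 1)\,t_p}
   = \frac{n\,k}{k + n - 1} \\[4pt]
S &\approx \frac{n\,k}{n} = k \qquad \text{for } n \gg k
\end{align*}
```

For the case in Figure 5 ($n = 5$, $k = 5$), this gives $S = 25/9 \approx 2.78$, well below the limiting value of 5 because the pipeline fill time is not yet negligible.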



$$\text{Pipeline efficiency: } E = \frac{S}{k} \tag{7}$$
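To see how the speed-up and efficiency behave as the number of instructions grows, the sketch below evaluates them numerically. It assumes the ideal-pipeline relations used in this section (pipelined cycles = k + n - 1, non-pipelined time = n × k phase durations); the function names are illustrative.

```python
def pipeline_cycles(n, k):
    """Clock cycles for n instructions on an ideal k-phase pipeline."""
    return k + n - 1


def speedup(n, k):
    """Non-pipelined time (n * k phases) divided by pipelined time (k + n - 1 cycles)."""
    return (n * k) / pipeline_cycles(n, k)


def efficiency(n, k):
    """Speed-up normalized by the number of phases."""
    return speedup(n, k) / k


if __name__ == "__main__":
    k = 5
    for n in (5, 50, 5000):
        print(n, pipeline_cycles(n, k), round(speedup(n, k), 3), round(efficiency(n, k), 3))
```

As n grows, the speed-up approaches k and the efficiency approaches 1, which is the n ≫ k limit noted above.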

By substituting Eq. (5) to Eq. (7),


The throughput of a block cipher implementation is given by

$$T = \frac{B \times f}{N} \tag{10}$$

where B is the message block size, f is the clock frequency and N is the effective number of clock cycles utilized for the implementation.


3. SMS4-BSK cryptosystem

The SMS4-BSK cryptosystem [13] is a 128-bit symmetric key block cipher. The algorithm is designed to protect messages transmitted through a Wireless Local Area Network (WLAN). Thirty-two rounds are used in both the encryption and the key generation algorithms, which share the same architecture. A unique non-linear S-Box, the BSK Processing Block, is implemented in the design and operates over $GF(2^{16})$.

Figure 6 shows the process flow of the SMS4-BSK encryption architecture. The algorithm is designed especially for sectors that use short messages (e.g., the defense sector uses short messages to alert the army base camp to an enemy intrusion).

Figure 6.

Process flow of SMS4-BSK cryptosystem.

3.1 Encryption Algorithm

Step 1: Message Mixing - In this step, the plaintext is split into eight sub-blocks of equal size initially and then mixed with the nearest sub-blocks in a round fashion.

Step 2: Message Swapping – In this step, the first eight bits are swapped with the second eight bits in each sub-block.

Step 3: Key Mixing (key generation is described in the Key Scheduling Algorithm) – The generated left and right half keys, 16 bits each, are mixed with each sub-block after undergoing a linear left/right shift.

Step 4: 32 Round BSK Processing – Step 10 of the Key scheduling algorithm is applied here.

Step 5: Message Mixing 2 – In this step, the processed message sub-blocks are linearly mixed with other sub-blocks.

Step 6: 32 Rounding

Step 7: Rounded Encrypted Message Mixing – In this step, the message sub-blocks are mixed in the opposite way as that of Step 1.

Step 8: Mixed Encrypted Message Swapping - In this step, the bits in the message sub-blocks are swapped in the same way as that of Step 2.
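The splitting in Step 1 and the swapping in Steps 2 and 8 can be sketched as follows. This is only an illustration of those two data movements: it assumes the 128-bit block is cut into eight 16-bit sub-blocks starting from the most significant bits, and it does not attempt the mixing, key mixing or BSK processing steps, whose details are given in [13]. All function names are hypothetical.

```python
def split_subblocks(block, n_sub=8, width=16):
    """Split an integer block into n_sub sub-blocks of `width` bits, most significant first."""
    mask = (1 << width) - 1
    return [(block >> (width * (n_sub - 1 - i))) & mask for i in range(n_sub)]


def swap_halves16(sub):
    """Swap the first eight bits with the second eight bits of a 16-bit sub-block."""
    return ((sub & 0xFF) << 8) | (sub >> 8)


def join_subblocks(subs, width=16):
    """Inverse of split_subblocks."""
    out = 0
    for s in subs:
        out = (out << width) | s
    return out


if __name__ == "__main__":
    plaintext = 0x0123456789ABCDEFFEDCBA9876543210  # arbitrary 128-bit example value
    subs = split_subblocks(plaintext)
    swapped = [swap_halves16(s) for s in subs]
    print([f"{s:04X}" for s in subs])
    print([f"{s:04X}" for s in swapped])
```

Because the swap is its own inverse, applying it in Step 2 and again in Step 8 uses the same hardware block.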

3.2 Key Scheduling Algorithm

Step 1: Key Splitting 1 – The initial Key is split into two equal sub-keys.

Step 2: Key Splitting 2 – The even and the odd bits of each sub-key are split.

Step 3: Key Mixing 1 – The odd key sets and the even key sets are mixed up

Step 4: Key Splitting 3 – Step 2 is repeated here

Step 5: Key Mixing 2 – Step 3 is repeated here

Step 6: 32 Round BSK Processing

Step 7: Cyclic Shifting

Step 8: BSK Processing – Step 6 and 7 are repeated here.

Step 9: Generation of Key 1

Step 10: Remaining Key Generation – Steps 6, 7, 8 & 9 are repeated to generate 32 keys.

The descriptions of both the Encryption as well as the Key Scheduling algorithm are elaborated in [13].
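As an illustration of the two splitting steps only (Steps 1 and 2), the sketch below halves a key and separates even-position from odd-position bits. The key width is passed as a parameter, and the convention that position 0 is the most significant bit is an assumption made for this example; the actual convention and the mixing rules are those defined in [13].

```python
def split_halves(key, width):
    """Step 1 (sketch): split a `width`-bit key into two equal sub-keys (left, right)."""
    half = width // 2
    return key >> half, key & ((1 << half) - 1)


def split_even_odd_bits(value, width):
    """Step 2 (sketch): separate bits at even and odd positions, counting from the MSB as 0."""
    bits = [(value >> (width - 1 - i)) & 1 for i in range(width)]
    even, odd = bits[0::2], bits[1::2]

    def to_int(bit_list):
        out = 0
        for b in bit_list:
            out = (out << 1) | b
        return out

    return to_int(even), to_int(odd)


if __name__ == "__main__":
    left, right = split_halves(0xB4C3, 16)
    print(f"{left:02X} {right:02X}")
    even, odd = split_even_odd_bits(0b10110100, 8)
    print(f"{even:04b} {odd:04b}")
```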

3.3 Key Features of SMS4-BSK Cryptosystem

  • The SMS4-BSK design is more robust, i.e., a brute-force attack may take $2^{526}$ ns to recover the original message.

  • The cryptosystem is 1.1 times faster than the other SMS4 [14, 15] algorithms.

  • The throughput achieved by the design is 7.4 Gbps.

  • The power consumption of the hardware design is less.

  • The cryptosystem has a keyspace of $2^{352}$.

  • The plaintext sensitivity is very close to 0.5.

  • The key sensitivity is also very close to 0.5.

  • The cryptosystem can resist known-plaintext attacks.

  • The cryptosystem can resist chosen plaintext attacks.

  • The cryptosystem can resist ciphertext-only attacks.

  • The cryptosystem also can resist chosen-ciphertext attacks.


4. Pipelined SMS4-BSK cryptosystem

Pipelining is applied to the encryption architecture of the SMS4-BSK cryptosystem, as shown in Figure 7. A latch is introduced between every pair of consecutive processing steps. The processing delay differs from step to step, so the step with the longest delay limits the encryption speed and throughput. Placing a latch between the steps makes it possible to balance the data access rate at each step. The pipelined SMS4-BSK cryptosystem is developed in the Xilinx Vivado 2017 Design Suite using the Verilog Hardware Description Language (HDL) and implemented on a Kintex-7 Field Programmable Gate Array (FPGA).
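The effect of placing a latch between processing steps can be modelled in software with a minimal sketch. The three stage functions below are arbitrary placeholders, not the SMS4-BSK processing steps, and the model assumes every stage completes within one clock cycle. Each latch captures its stage's output on the clock edge, so a new block can enter while earlier blocks are still in flight.

```python
def run_pipeline(stages, inputs):
    """Clock a latch-separated pipeline; return (outputs, clock cycles used).

    regs[i] models the latch at the output of stage i. Input values must not be None,
    because None marks an empty latch.
    """
    regs = [None] * len(stages)
    pending = list(inputs)
    outputs, cycles = [], 0
    while len(outputs) < len(inputs):
        cycles += 1
        # Update from the last stage back to the first so that every stage reads
        # the value its upstream latch held before this clock edge.
        for i in reversed(range(len(stages))):
            if i > 0:
                src = regs[i - 1]
            else:
                src = pending.pop(0) if pending else None
            regs[i] = stages[i](src) if src is not None else None
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs, cycles


if __name__ == "__main__":
    # Placeholder stages standing in for processing steps.
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    outs, cycles = run_pipeline(stages, [1, 2, 3, 4])
    print(outs, cycles)  # [1, 3, 5, 7] 6
```

With three stages and four inputs the run takes 3 + 4 - 1 = 6 cycles, matching the k + n - 1 cycle count from Section 2; after the pipeline fills, one result leaves on every clock.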

Figure 7.

Pipelined SMS4-BSK encryption architecture.

The Kintex-7 device XC7K410T in the FBG676 package is used to implement the cryptosystem. The package provides 400 Input and Output Buffers (IOBs), of which the cryptosystem requires 386. The generated synthesis report gives the timing summary and the area utilized by the design. Figure 8 shows the technology schematic of the pipelined SMS4-BSK obtained from the Xilinx Vivado 2017 Design Suite. The simulated timing diagram of the pipelined SMS4-BSK cryptosystem is shown in Figure 9.

Figure 8.

Technology schematic of the pipelined SMS4-BSK encryption architecture.

Figure 9.

Simulated timing diagram of the pipelined SMS4-BSK encryption architecture.

The throughput of the Pipelined SMS4-BSK algorithm is calculated using Eq. (10).


$$\text{Throughput} = \frac{B \times f}{N} = \frac{128 \times 464 \times 10^{6}}{6}\ \text{bps}$$

where B is the message block size (128 bits), f is the clock frequency (464 MHz) and N is the effective number of clock cycles utilized (6 clock cycles).

∴ Throughput ≈ 9.9 Gbps.
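The figure can be checked with a few lines of arithmetic, assuming the throughput is the block size multiplied by the clock frequency and divided by the effective number of clock cycles, with the values of B, f and N quoted above. The function name is illustrative.

```python
def throughput_bps(block_bits, clock_hz, cycles_per_block):
    """Throughput in bits per second: block size x clock frequency / effective cycles."""
    return block_bits * clock_hz / cycles_per_block


if __name__ == "__main__":
    t = throughput_bps(block_bits=128, clock_hz=464e6, cycles_per_block=6)
    print(f"{t / 1e9:.1f} Gbps")  # 9.9 Gbps
```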

The performance of the pipelined SMS4-BSK encryption architecture is compared with other SMS4 architectures in Table 1. The pipelined SMS4-BSK cryptosystem achieves the highest throughput of the listed designs, about 34% above the non-pipelined SMS4-BSK (9.9 Gbps against 7.4 Gbps).

Architecture                      Throughput
Twisted BDD S-Box SMS4            801 Mbps
Standard SMS4                     1.9 Gbps
Folded & Reconfigurable SMS4      6.3 Gbps
SMS4-BSK                          7.4 Gbps
Pipelined SMS4-BSK                9.9 Gbps

Table 1.

Comparison of pipelined SMS4-BSK with other SMS4 cryptosystems.
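The relative gains implied by Table 1 can be tabulated with a short sketch. The throughput values are taken from the table; the dictionary and function names are illustrative.

```python
# Throughputs from Table 1, in Gbps.
THROUGHPUT_GBPS = {
    "Twisted BDD S-Box SMS4": 0.801,
    "Standard SMS4": 1.9,
    "Folded & Reconfigurable SMS4": 6.3,
    "SMS4-BSK": 7.4,
    "Pipelined SMS4-BSK": 9.9,
}


def gain_over(baseline, target="Pipelined SMS4-BSK", table=THROUGHPUT_GBPS):
    """Ratio of the target design's throughput to the baseline design's throughput."""
    return table[target] / table[baseline]


if __name__ == "__main__":
    for name in THROUGHPUT_GBPS:
        if name != "Pipelined SMS4-BSK":
            print(f"{name}: x{gain_over(name):.2f}")
```

Such ratios compare reported figures only; they are most meaningful between designs measured on the same device.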


5. Conclusion

In this chapter, the implementation of the pipeline in the modern SMS4-BSK cryptosystem is discussed. The SMS4-BSK cryptosystem is more robust, has a large Keyspace, has optimum Key Sensitivity and Plaintext Sensitivity, is faster, has high throughput and can resist all the four major cryptanalysis attacks. The throughput of the SMS4-BSK cryptosystem is further improved to 9.9 Gbps by implementing a pipeline in the encryption architecture. The comparison of throughput of the various architectures of the SMS4 cryptosystem is shown in Table 1. It is evident from Table 1 that the pipelined design of the SMS4-BSK cryptosystem has higher throughput. All the designs are implemented in Kintex-7 FPGA for comparison.

As a future enhancement, the pipeline implementation can be extended to the Key Scheduling architecture to improve the throughput further. A novel BM S-Box is being designed to replace the BSK processing block (Non-linear Transformation Block) of the SMS4-BSK cryptosystem to improve the speed and throughput.



We thank Ms. Saranya, S, UAT Analyst, AutoZone, U.S.A., for her support throughout the project. We extend our thanks to Ms. Preethy V for mentoring us in the project.


Conflict of interest

The authors declare no conflict of interest.



A.1 Real-time applications of pipelined SMS4-BSK Cryptosystem

Figure A1 shows the Inter-Vehicle Communication (IVC) implemented with the Pipelined SMS4-BSK cryptosystem. The communication between the vehicles is secured with the help of the proposed cryptosystem.

Figure A1.

Inter-vehicle communication with improved security.

Figure A2 explains a war field scenario: A short message transmission by a scout if an enemy intrusion is detected. The communication between the remote scout hub and the Army base camp is secured with the Pipelined SMS4-BSK cryptosystem.

Figure A2.

A war field message transmission.

Figure A3 shows the secured data transmission from the Smart Glucose Bottle to a caretaker or a nurse. The data sensed by the intelligent system is transmitted with the help of Internet of Things (IoT) technology. This data is vulnerable to hacking, and tampering with it can pose a severe threat to the patient's life. This situation can be avoided with the help of the Pipelined SMS4-BSK encryption transmission system.

Figure A3.

IoT based smart bottle for healthcare with secure data transmission.

Figure A4 shows the implementation of an automatic Voice Prescription Generation and secured transmission to the patient using Pipelined SMS4-BSK cryptosystem.

Figure A4.

Voice prescription generation with secure communication.


  1. Jin Y, Shen H, You R. Implementation of SMS4 Block Cipher on FPGA. In: Proceedings of the First International Conference on Communications and Networking in China; China. 2006. pp. 1-4
  2. Gao X, Lu E, Xian L, Chen H. FPGA Implementation of the SMS4 Block Cipher in the Chinese WAPI Standard. In: Proceedings of the International Conference on Embedded Software and Systems Symposia. 2008. pp. 104-106
  3. Han L, Han J, Zeng X, Ronghua L, Zhao J. A programmable security processor for cryptography algorithms. In: Proceedings of the 9th International Conference on Solid-State and Integrated-Circuit Technology. 2008. pp. 2144-2147
  4. Zhao M, Shou G, Hu Y, Guo Z. High-speed architecture design and implementation for SMS4-GCM. In: Proceedings of the Third International Conference on Communications and Mobile Computing. 2011. pp. 15-18
  5. Zhao J, Guo Z, Zeng X. High throughput implementation of SMS4 on FPGA. IEEE Access. 2019;7:88836-88844. DOI: 10.1109/ACCESS.2019.2923440
  6. Lee JH, Lee SE, Yu HC, Suh T. Pipelined CPU design with FPGA in teaching computer architecture. IEEE Transactions on Education. 2012;55(3):341-348. DOI: 10.1109/TE.2011.2175227
  7. Abdel-hafeez S, Sawalmeh A, Bataineh S. High performance AES design using pipelining structure over $GF((2^4)^2)$. In: Proceedings of the IEEE International Conference on Signal Processing and Communications. 2007. pp. 716-719. DOI: 10.1109/ICSPC.2007.4728419
  8. Guo Z, Li G, Liu Y. Dynamic reconfigurable implementations of AES algorithm based on pipeline and parallel structure. In: Proceedings of the Second International Conference on Computer and Automation Engineering (ICCAE). 2010. pp. 257-260. DOI: 10.1109/ICCAE.2010.5451864
  9. Chueng TP, Yusoff ZM, Sha’ameri AZ. Implementation of pipelined data encryption standard (DES) using Altera CPLD. In: TENCON Proceedings. Intelligent Systems and Technologies for the New Millennium. 2000. pp. 17-21. DOI: 10.1109/TENCON.2000.892211
  10. Babu M, Mukuntharaj C, Saranya S. Pipelined SMS4 Cipher design for fast encryption using twisted BDD S-Box architecture. International Journal of Computer Applications & Information Technology. 2012;1(3):26-30
  11. Taherkhani S, Ever E, Gemikonakli O. Implementation of Non-Pipelined and Pipelined Data Encryption Standard (DES) Using Xilinx Virtex-6 FPGA Technology. In: Proceedings of the Tenth IEEE International Conference on Computer and Information Technology. 2010. pp. 1257-1262. DOI: 10.1109/CIT.2010.227
  12. Patterson DA, Hennessy JL. Computer Organization and Design. 5th ed. USA: Morgan Kaufmann Publishers Inc; 2013. p. 800
  13. Babu M, Sathish Kumar GA. Design of Novel SMS4-BSK encryption transmission system. Integration. 2021;78:60-69. DOI: 10.1016/j.vlsi.2021.01.003
  14. Babu M, Sathish Kumar GA. Enhanced moth flame optimization based Supervision Kernel Entropy Component Analysis for high-speed encrypted transmission model. International Journal of Communication Systems. 2021;34(10):1-21
  15. Babu M, Sathish Kumar GA. In depth survey on SMS4 architecture. In: Proceedings of the IEEE International Conference on Intelligent Computing and Communication for Smart World (I2C2SW). Erode, India; 2018. pp. 33-36
