Comparison of pipelined SMS4-BSK with other SMS4 cryptosystems.
Abstract
In this digital world, all digital information transmitted through wireless channels is exposed to security threats. Along with security, encryption speed is a significant factor, because data must be transmitted as fast as possible. Pipelining is a technique used to improve the throughput of the encryption process, i.e., to increase the amount of data encrypted per unit time. In this chapter, the design of the modern SMS4-BSK cryptosystem is briefly described, various pipelined designs of the SMS4 algorithm are surveyed, and a pipelined implementation of the SMS4-BSK cryptosystem is analyzed. The SMS4-BSK cryptosystem is robust and fast, with a throughput of 7.4 Gbps, and it can resist known-plaintext, chosen-plaintext, ciphertext-only and chosen-ciphertext attacks. To improve the throughput further, pipelining is applied to the encryption architecture of the cryptosystem. The pipelined design, implemented on a Kintex-7 FPGA, achieves a throughput of 9.9 Gbps. Because the encryption and the key scheduling use the same architecture, the pipeline implementation can also be extended to the key scheduling architecture. As per the SMS4-BSK algorithm, the keys are generated in the host system to improve the throughput.
Keywords
- pipeline
- SMS4-BSK cryptosystem
- throughput
1. Introduction
Pipelining is a common and widely used technique to enhance the performance of a system without significant additional investment in hardware. In pipelining, a computation is partitioned into a set of sub-computations (elaborated in Section 2), and the sub-computations are executed in an overlapped fashion. Ideally, the speed of execution increases by a factor equal to the number of sub-computations obtained by partitioning. Pipelining is used in many areas of computer design, such as memory access, instruction execution and arithmetic computation.
To improve the speed of computation, the following methods can be adopted:
Method 1: Replicating the hardware.
Method 2: Partitioning the computations.
The former method improves performance by replicating the hardware, at the expense of additional hardware cost. The latter method partitions the computation and executes the parts in an overlapped fashion, achieving a performance improvement close to that of the former method.
Jin
2. Overview of pipelined architecture
According to the eight great ideas of computer architecture [12], pipelining is one of the techniques used to improve processor performance.
Each instruction of a program is executed in five phases:
Instruction / Opcode Fetch (IF)
Instruction Decode (ID)
Operand Fetch (OF)
Operand Execute (OE)
Operand Store (OS)
Figure 1 shows the organization of a computer. The steps involved in executing a program are:
Step 1: The instructions (program) and the operands are initially stored in the Main Memory.
Step 2: The Processor fetches the instruction from the Instruction Memory.
Step 3: The Control Unit decodes the instruction, identifies the operand registers and the operation to be performed on the operands, and sends the control signals to all the components.
Step 4: The operands are fetched from the Data Memory.
Step 5: The Arithmetic and Logic Unit (ALU) performs the operation decoded by the Control Unit on the operands.
Step 6: The output of the ALU is stored back in the Data Memory.
In a non-pipelined architecture, the (i + 1)th instruction is initiated only after the ith instruction has finished executing, and the (i + 2)nd instruction only after the (i + 1)th has finished. Each instruction therefore waits for the previous instruction to complete. This mode of execution is known as instruction-wise interleaved execution or non-pipelined execution, and it is shown in Figure 2.
tn: Execution duration of an instruction.
tp: Phase duration.
In a pipelined architecture, when the ith instruction moves from the first phase (IF) to the second phase (ID), the (i + 1)th instruction enters the IF phase. When the ith instruction moves from the ID phase to the OF phase, the (i + 1)th instruction moves from the IF phase to the ID phase and the (i + 2)nd instruction enters the IF phase. This mode of execution is known as phase-wise interleaved execution or pipelined execution. The execution periods of the instructions overlap, so the instructions are executed in parallel, as shown in Figure 3. In any time slot, no more than one instruction occupies a given phase; all the instructions in flight are in different phases, so there is no contention for any phase.
Total execution time in the non-pipelined architecture = 15 tp.
Total execution time in the pipelined architecture = 7 tp.
Therefore, a program is executed faster in the pipelined architecture than in the non-pipelined architecture.
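The time-slot totals above can be reproduced with a short Python sketch. It assumes an ideal pipeline in which every phase takes exactly one slot tp, and it assumes the figures show three instructions, since 15 = 3 × 5 and 7 = 5 + 3 − 1; the function names are illustrative only. The schedule builder also checks the claim that no two instructions ever occupy the same phase in the same slot.

```python
# Ideal phase-wise schedule for the five-phase instruction pipeline.
PHASES = ["IF", "ID", "OF", "OE", "OS"]

def pipelined_schedule(n, phases=PHASES):
    """Return {slot: {phase: instruction}} for n instructions (0-based indices).

    Instruction i enters IF at slot i and advances one phase per slot,
    so it occupies phases[j] at slot i + j.
    """
    schedule = {}
    for i in range(n):
        for j, phase in enumerate(phases):
            slot_map = schedule.setdefault(i + j, {})
            # No two instructions may occupy the same phase in the same slot.
            assert phase not in slot_map, "phase contention"
            slot_map[phase] = i
    return schedule

def execution_slots(n, k):
    """Total time slots (in units of tp) for n instructions and k phases."""
    return {"non_pipelined": n * k, "pipelined": k + n - 1}

sched = pipelined_schedule(3)
for slot in sorted(sched):
    row = sched[slot]
    print(slot + 1, "  ".join(f"{p}:I{row[p] + 1}" if p in row else f"{p}:--" for p in PHASES))
print(execution_slots(3, len(PHASES)))  # {'non_pipelined': 15, 'pipelined': 7}
```

Each printed row is one time slot; reading down a column shows one phase being reused by successive instructions, which is the overlap described in the text.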
The parameters used to evaluate the performance of the pipelined execution are:
Speed-Up ratio (S)
Frequency (f)
Efficiency (E)
Throughput (T)
Figure 4 shows a two-stage pipelined implementation with stages Si and Si + 1. The pipeline is implemented by inserting buffers/latches between consecutive stages. The input passes out of a latch to the next stage when the clock is enabled.
τd = Latch delay (Delay taken by the input to go out of the latch).
τm = Maximum Phase Duration
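The clock period (τ) is the sum of the Maximum Phase Duration (τm) and the Latch delay (τd). Since the phases may not all have the same duration, the maximum phase duration among the five phases is used to calculate the clock period:
$$\tau = \tau_m + \tau_d \tag{1}$$
The clock frequency is the reciprocal of the clock period:
$$f = \frac{1}{\tau} \tag{2}$$
In Figure 5, the number of instructions is n = 5, the number of phases is k = 5 and the number of clock cycles is 9, i.e., k + n − 1 = 5 + 5 − 1 = 9. In general, the execution times of n instructions in the non-pipelined and pipelined architectures are
$$T_{np} = n k \tau \tag{3}$$
$$T_{p} = (k + n - 1)\,\tau \tag{4}$$
Speed-Up ratio = Non-Pipelined Execution Time / Pipelined Execution Time, i.e.,
$$S = \frac{T_{np}}{T_{p}} = \frac{n k}{k + n - 1} \tag{5}$$
If n ≫ k, then k + n − 1 ≈ n. So,
$$S \approx k \tag{6}$$
The efficiency is the speed-up ratio per stage:
$$E = \frac{S}{k} \tag{7}$$
By substituting Eq. (5) into Eq. (7),
$$E = \frac{n}{k + n - 1} \tag{8}$$
The throughput is the number of instructions completed per unit time:
$$T = \frac{n}{(k + n - 1)\,\tau} = E f \tag{9}$$
For a block cipher, the throughput in bits per second is
$$T = \frac{B \times f}{N} \tag{10}$$
where B is the message block size, f is the clock frequency and N is the effective number of clock cycles utilized for the implementation.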
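The pipeline relations of this section (clock period τ = τm + τd, f = 1/τ, speed-up S = nk/(k + n − 1), efficiency E = S/k and throughput n/((k + n − 1)τ)) can be evaluated numerically. The sketch below is illustrative; the phase and latch delays are hypothetical values, not measurements from the SMS4-BSK design, and the non-pipelined time is taken as nkτ for a like-for-like comparison.

```python
def pipeline_metrics(n: int, k: int, tau_m: float, tau_d: float) -> dict:
    """Ideal k-stage pipeline metrics for n instructions (time units are arbitrary)."""
    tau = tau_m + tau_d           # clock period: slowest phase plus latch delay
    f = 1.0 / tau                 # clock frequency
    t_np = n * k * tau            # non-pipelined execution time
    t_p = (k + n - 1) * tau       # pipelined execution time
    s = t_np / t_p                # speed-up ratio, nk / (k + n - 1)
    e = s / k                     # efficiency, n / (k + n - 1)
    thr = n / t_p                 # instructions completed per unit time (equals e * f)
    return {"tau": tau, "f": f, "S": s, "E": e, "T": thr}

# Figure 5 case: n = 5 instructions, k = 5 phases; delays are hypothetical (ns).
fig5 = pipeline_metrics(5, 5, tau_m=9.0, tau_d=1.0)
print(round(fig5["S"], 3), round(fig5["E"], 3))  # 2.778 0.556

# For n >> k the speed-up approaches k (here k = 5).
print(round(pipeline_metrics(10_000, 5, 9.0, 1.0)["S"], 3))
```

The second call illustrates why long message streams benefit most: the fill time of k − 1 cycles is amortized over many blocks.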
3. SMS4-BSK cryptosystem
The SMS4-BSK cryptosystem [13] is a 128-bit symmetric-key block cipher designed to protect messages transmitted through a Wireless Local Area Network (WLAN). The encryption and the key generation algorithms each use thirty-two rounds, and both use the same architecture. A unique non-linear S-Box, the BSK Processing Block, is implemented in the design and operates over GF(2¹⁶).
Figure 6 shows the process flow of the SMS4-BSK encryption architecture. The algorithm is designed especially for sectors that use short messages (e.g., the defense sector uses short messages to alert the army base camp to an enemy intrusion).
3.1 Encryption Algorithm
Step 1: Message Mixing - In this step, the plaintext is split into eight sub-blocks of equal size initially and then mixed with the nearest sub-blocks in a round fashion.
Step 2: Message Swapping – In this step, the first eight bits of each sub-block are swapped with the second eight bits.
Step 3: Key Mixing (key generation is described in the Key Scheduling Algorithm) – The generated left and right half-keys, 16 bits each, are mixed with each sub-block after undergoing a linear left/right shift.
Step 4: 32 Round BSK Processing – Step 10 of the Key scheduling algorithm is applied here.
Step 5: Message Mixing 2 – In this step, the processed message sub-blocks are linearly mixed with other sub-blocks.
Step 6: 32 Rounding
Step 7: Rounded Encrypted Message Mixing – In this step, the message sub-blocks are mixed in the opposite way as that of Step 1.
Step 8: Mixed Encrypted Message Swapping - In this step, the bits in the message sub-blocks are swapped in the same way as that of Step 2.
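Steps 1, 2 and 8 rest on partitioning the 128-bit block into eight equal 16-bit sub-blocks and swapping the two bytes of each sub-block. The Python sketch below shows only that partitioning and byte swap, under these assumptions: the most significant sub-block comes first, and the mixing operations of Steps 1, 3, 5 and 7 are left out because their exact definitions are given in [13], not here. The helper names are hypothetical.

```python
BLOCK_BITS = 128
NUM_SUB_BLOCKS = 8
SUB_BITS = BLOCK_BITS // NUM_SUB_BLOCKS  # 16 bits per sub-block
SUB_MASK = (1 << SUB_BITS) - 1

def split_sub_blocks(block: int) -> list[int]:
    """Split a 128-bit block into eight 16-bit sub-blocks (most significant first)."""
    return [(block >> (SUB_BITS * (NUM_SUB_BLOCKS - 1 - i))) & SUB_MASK
            for i in range(NUM_SUB_BLOCKS)]

def join_sub_blocks(subs: list[int]) -> int:
    """Reassemble eight 16-bit sub-blocks into one 128-bit block."""
    block = 0
    for sub in subs:
        block = (block << SUB_BITS) | (sub & SUB_MASK)
    return block

def swap_bytes(sub: int) -> int:
    """Swap the first eight bits with the second eight bits of a 16-bit sub-block."""
    return ((sub & 0xFF) << 8) | ((sub >> 8) & 0xFF)

def swap_all(block: int) -> int:
    """Apply the byte swap to every sub-block of a 128-bit block."""
    return join_sub_blocks([swap_bytes(s) for s in split_sub_blocks(block)])

plaintext = 0x0123456789ABCDEF0011223344556677
print([hex(s) for s in split_sub_blocks(plaintext)][:2])  # ['0x123', '0x4567']
print(hex(swap_all(plaintext)))  # 0x23016745ab89efcd1100332255447766
```

Because the byte swap is its own inverse, applying it again (as in Step 8) restores the original byte order of each sub-block.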
3.2 Key Scheduling Algorithm
Step 1: Key Splitting 1 – The initial Key is split into two equal sub-keys.
Step 2: Key Splitting 2 – The even and the odd bits of each sub-key are split.
Step 3: Key Mixing 1 – The odd key sets and the even key sets are mixed up
Step 4: Key Splitting 3 – Step 2 is repeated here
Step 5: Key Mixing 2 – Step 3 is repeated here
Step 6: 32 Round BSK Processing
Step 7: Cyclic Shifting
Step 8: BSK Processing – Steps 6 and 7 are repeated here.
Step 9: Generation of Key 1
Step 10: Remaining Key Generation – Steps 6, 7, 8 & 9 are repeated to generate 32 keys.
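Steps 1 and 2 of the key schedule (half split, then even/odd bit separation) can be sketched as bit manipulation in Python. This is a sketch under stated assumptions: the key width is a parameter because it is not given in this section, bit 0 is taken as the least significant bit, and the mixing and BSK processing of Steps 3 to 10 are not modelled. The function names are hypothetical.

```python
def split_halves(key: int, width: int) -> tuple[int, int]:
    """Step 1: split a width-bit key into (left, right) halves of width/2 bits."""
    half = width // 2
    return key >> half, key & ((1 << half) - 1)

def split_even_odd(value: int, width: int) -> tuple[int, int]:
    """Step 2: gather even-indexed and odd-indexed bits (bit 0 = LSB) into two words."""
    even = odd = 0
    for pos in range(width):
        bit = (value >> pos) & 1
        if pos % 2 == 0:
            even |= bit << (pos // 2)
        else:
            odd |= bit << (pos // 2)
    return even, odd

# Toy 16-bit key for illustration only; the real key width is not assumed here.
left, right = split_halves(0xAAF0, 16)            # left = 0xAA, right = 0xF0
left_even, left_odd = split_even_odd(left, 8)     # 0b10101010 -> even 0x0, odd 0xF
right_even, right_odd = split_even_odd(right, 8)  # 0b11110000 -> even 0xC, odd 0xC
```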
Both the Encryption and the Key Scheduling algorithms are described in detail in [13].
3.3 Key Features of SMS4-BSK Cryptosystem
The SMS4-BSK design is robust: a brute-force attack may take 2⁵²⁶ ns to recover the original message.
The cryptosystem is 1.1 times faster than other SMS4 algorithms [14, 15].
The throughput achieved by the design is 7.4 Gbps.
The hardware design has low power consumption.
The cryptosystem has a keyspace of 2³⁵².
The plaintext sensitivity is very close to 0.5.
The key sensitivity is also very close to 0.5.
The cryptosystem can resist known-plaintext attacks.
The cryptosystem can resist chosen plaintext attacks.
The cryptosystem can resist ciphertext-only attacks.
The cryptosystem also can resist chosen-ciphertext attacks.
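Plaintext and key sensitivity are commonly measured as the average fraction of ciphertext bits that change when a single input bit is flipped; a value near 0.5 means each output bit behaves like a coin toss. The Python sketch below shows this measurement under the assumption that the chapter's metric follows that definition. Because the SMS4-BSK round functions are not reproduced here, a truncated SHA-256 of key and plaintext (from the standard `hashlib` module) stands in as a hypothetical cipher; it is not SMS4-BSK and is not even invertible.

```python
import hashlib
import random

def toy_cipher(plaintext: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for a 128-bit block cipher (NOT SMS4-BSK)."""
    return hashlib.sha256(key + plaintext).digest()[:16]

def flip_bit(data: bytes, pos: int) -> bytes:
    """Return a copy of data with bit number pos inverted."""
    buf = bytearray(data)
    buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)

def changed_fraction(a: bytes, b: bytes) -> float:
    """Fraction of bit positions that differ between two equal-length byte strings."""
    diff = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return diff / (8 * len(a))

def sensitivity(cipher, target: str = "plaintext", trials: int = 200, seed: int = 1) -> float:
    """Average fraction of ciphertext bits changed by flipping one plaintext or key bit."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pt, key = rng.randbytes(16), rng.randbytes(16)
        pos = rng.randrange(128)
        if target == "plaintext":
            total += changed_fraction(cipher(pt, key), cipher(flip_bit(pt, pos), key))
        else:
            total += changed_fraction(cipher(pt, key), cipher(pt, flip_bit(key, pos)))
    return total / trials

print(round(sensitivity(toy_cipher, "plaintext"), 3), round(sensitivity(toy_cipher, "key"), 3))
```

To evaluate a real design, `toy_cipher` would be replaced by an encryption function with the same `(plaintext, key) -> ciphertext` signature.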
4. Pipelined SMS4-BSK cryptosystem
Pipelining is implemented in the encryption architecture of the SMS4-BSK cryptosystem, as shown in Figure 7. A latch is introduced between every processing step. The processing delay differs from step to step, so the step with the highest processing delay may limit the encryption speed and throughput. Inserting a latch between every processing step lets each step hold its own intermediate result, which balances the data access rate across the steps. The pipelined SMS4-BSK cryptosystem is developed in the Xilinx Vivado 2017 Design Suite using Verilog Hardware Description Language (Verilog HDL) and implemented on a Kintex-7 Field Programmable Gate Array (FPGA).
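The effect of the latches can be illustrated with a clock-by-clock software model. The Python sketch below is only a behavioural illustration, not the Verilog design: the three stage functions are hypothetical placeholders rather than SMS4-BSK steps. It shows that, once every stage is separated by a latch, n blocks pass through k stages in k + n − 1 clock cycles, with one result per cycle after the pipeline fills.

```python
from typing import Callable, Optional

def run_pipeline(blocks: list[int], stages: list[Callable[[int], int]]) -> tuple[list[int], int]:
    """Simulate a latched pipeline; return (outputs in order, clock cycles used)."""
    k = len(stages)
    latches: list[Optional[int]] = [None] * k  # latches[i] holds stage i's last output
    pending = list(blocks)
    outputs: list[int] = []
    cycles = 0
    while pending or any(v is not None for v in latches):
        cycles += 1
        # On each clock edge every stage reads the previous latch contents at once.
        new: list[Optional[int]] = [None] * k
        if pending:
            new[0] = stages[0](pending.pop(0))
        for i in range(1, k):
            if latches[i - 1] is not None:
                new[i] = stages[i](latches[i - 1])
        latches = new
        if latches[-1] is not None:  # a finished block leaves the last stage
            outputs.append(latches[-1])
            latches[-1] = None
    return outputs, cycles

# Hypothetical placeholder stages, not SMS4-BSK operations.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline([1, 2, 3, 4], stages))  # ([1, 3, 5, 7], 6)
```

Building `new` from the old `latches` before overwriting them mimics registers that all capture on the same clock edge.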
The Kintex-7 FPGA device XC7K410T in the FBG676 package is used to implement the cryptosystem. The package provides 400 Input/Output Buffers (IOBs), of which the cryptosystem requires 386. The generated synthesis report records the timing summary and the area utilized by the design. Figure 8 shows the technology schematic of the pipelined SMS4-BSK obtained from the Xilinx Vivado 2017 Design Suite, and Figure 9 shows the simulated timing diagram of the pipelined SMS4-BSK cryptosystem.
The throughput of the Pipelined SMS4-BSK algorithm is calculated using Eq. (10):
$$T = \frac{B \times f}{N} = \frac{128 \times 464 \times 10^{6}}{6}\ \text{bps} \approx 9.9\ \text{Gbps}$$
where B is the message block size (128 bits), f is the clock frequency (464 MHz) and N is the effective number of clock cycles utilized (6 clock cycles).
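The arithmetic behind the 9.9 Gbps figure, and its ratio to the 7.4 Gbps of the non-pipelined SMS4-BSK, can be checked with a few lines of Python; the function name is illustrative.

```python
def throughput_bps(block_bits: int, clock_hz: float, cycles_per_block: float) -> float:
    """Throughput = B * f / N, in bits per second."""
    return block_bits * clock_hz / cycles_per_block

pipelined = throughput_bps(128, 464e6, 6)
print(f"{pipelined / 1e9:.2f} Gbps")             # 9.90 Gbps
print(f"{pipelined / 7.4e9:.2f}x vs SMS4-BSK")   # 1.34x vs SMS4-BSK
```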
The performance of the Pipelined SMS4-BSK encryption architecture is compared with other SMS4 architectures in Table 1. Table 1 shows that the Pipelined SMS4-BSK cryptosystem achieves the highest throughput of the listed architectures.
Architecture | Throughput |
---|---|
Twisted BDD S-Box SMS4 | 801 Mbps |
Standard SMS4 | 1.9 Gbps |
Folded & Reconfigurable SMS4 | 6.3 Gbps |
SMS4-BSK | 7.4 Gbps |
Pipelined SMS4-BSK | 9.9 Gbps |
5. Conclusion
In this chapter, the implementation of a pipeline in the modern SMS4-BSK cryptosystem is discussed. The SMS4-BSK cryptosystem is robust, has a large keyspace and optimum key sensitivity and plaintext sensitivity, is fast, has high throughput and can resist all four major cryptanalytic attacks. Implementing a pipeline in the encryption architecture further improves the throughput of the SMS4-BSK cryptosystem to 9.9 Gbps. Table 1 compares the throughput of various SMS4 architectures and shows that the pipelined SMS4-BSK design has the highest throughput. All the designs are implemented on a Kintex-7 FPGA for comparison.
As a future enhancement, the pipeline implementation can be extended to the Key Scheduling architecture to improve the throughput further. A novel BM S-Box is being designed to replace the BSK processing block (Non-linear Transformation Block) of the SMS4-BSK cryptosystem to improve the speed and throughput.
Acknowledgments
We thank Ms. Saranya, S, UAT Analyst, AutoZone, U.S.A., for her support throughout the project. We extend our thanks to Ms. Preethy V for mentoring us in the project.
A.1 Real-time applications of pipelined SMS4-BSK cryptosystem
Figure A1 shows the Inter-Vehicle Communication (IVC) implemented with the Pipelined SMS4-BSK cryptosystem. The communication between the vehicles is secured with the help of the proposed cryptosystem.
Figure A2 illustrates a battlefield scenario in which a scout transmits a short message when an enemy intrusion is detected. The communication between the remote scout hub and the Army base camp is secured with the Pipelined SMS4-BSK cryptosystem.
Figure A3 shows the secure transmission of data from the Smart Glucose Bottle to a caretaker or a nurse. The data sensed by the intelligent system is transmitted using Internet of Things (IoT) technology. Such data is vulnerable to hacking, which could pose a severe threat to the patient's life. This situation can be avoided with the Pipelined SMS4-BSK encryption transmission system.
Figure A4 shows the implementation of an automatic Voice Prescription Generation and secured transmission to the patient using Pipelined SMS4-BSK cryptosystem.
References
[1] Jin Y, Shen H, You R. Implementation of SMS4 Block Cipher on FPGA. In: Proceedings of the First International Conference on Communications and Networking in China; China. 2006. pp. 1-4
[2] Gao X, Lu E, Xian L, Chen H. FPGA Implementation of the SMS4 Block Cipher in the Chinese WAPI Standard. In: Proceedings of the International Conference on Embedded Software and Systems Symposia. 2008. pp. 104-106
[3] Han L, Han J, Zeng X, Ronghua L, Zhao J. A programmable security processor for cryptography algorithms. In: Proceedings of the 9th International Conference on Solid-State and Integrated-Circuit Technology. 2008. pp. 2144-2147
[4] Zhao M, Shou G, Hu Y, Guo Z. High-speed architecture design and implementation for SMS4-GCM. In: Proceedings of the Third International Conference on Communications and Mobile Computing. 2011. pp. 15-18
[5] Zhao J, Guo Z, Zeng X. High throughput implementation of SMS4 on FPGA. IEEE Access. 2019;7:88836-88844. DOI: 10.1109/ACCESS.2019.2923440
[6] Lee JH, Lee SE, Yu HC, Suh T. Pipelined CPU design with FPGA in teaching computer architecture. IEEE Transactions on Education. 2012;55(3):341-348. DOI: 10.1109/TE.2011.2175227
[7] Abdel-hafeez S, Sawalmeh A, Bataineh S. High performance AES design using pipelining structure over GF((2⁴)²). In: Proceedings of the IEEE International Conference on Signal Processing and Communications. 2007. pp. 716-719. DOI: 10.1109/ICSPC.2007.4728419
[8] Guo Z, Li G, Liu Y. Dynamic reconfigurable implementations of AES algorithm based on pipeline and parallel structure. In: Proceedings of the Second International Conference on Computer and Automation Engineering (ICCAE). 2010. pp. 257-260. DOI: 10.1109/ICCAE.2010.5451864
[9] Chueng TP, Yusoff ZM, Sha’ameri AZ. Implementation of pipelined data encryption standard (DES) using Altera CPLD. In: TENCON Proceedings. Intelligent Systems and Technologies for the New Millennium. 2000. pp. 17-21. DOI: 10.1109/TENCON.2000.892211
[10] Babu M, Mukuntharaj C, Saranya S. Pipelined SMS4 Cipher design for fast encryption using twisted BDD S-Box architecture. International Journal of Computer Applications & Information Technology. 2012;1(3):26-30
[11] Taherkhani S, Ever E, Gemikonakli O. Implementation of Non-Pipelined and Pipelined Data Encryption Standard (DES) Using Xilinx Virtex-6 FPGA Technology. In: Proceedings of the Tenth IEEE International Conference on Computer and Information Technology. 2010. pp. 1257-1262. DOI: 10.1109/CIT.2010.227
[12] Patterson DA, Hennessy JL. Computer Organization and Design. 5th ed. USA: Morgan Kaufmann Publishers Inc; 2013. p. 800
[13] Babu M, Sathish Kumar GA. Design of Novel SMS4-BSK encryption transmission system. Integration. 2021;78:60-69. DOI: 10.1016/j.vlsi.2021.01.003
[14] Babu M, Sathish Kumar GA. Enhanced moth flame optimization based Supervision Kernel Entropy Component Analysis for high-speed encrypted transmission model. International Journal of Communication Systems. 2021;34(10):1-21
[15] Babu M, Sathish Kumar GA. In depth survey on SMS4 architecture. In: Proceedings of the IEEE International Conference on Intelligent Computing and Communication for Smart World (I2C2SW). Erode, India; 2018. pp. 33-36