Open access peer-reviewed chapter

Hybrid Obfuscation of Encryption

Written By

Asma’a Al-Hakimi and Abu Bakar Md Sultan

Submitted: 27 November 2022 Reviewed: 21 December 2022 Published: 23 February 2023

DOI: 10.5772/intechopen.109662

From the Edited Volume

Coding Theory Essentials

Edited by Dinesh G. Harkut and Kashmira N. Kasat


Abstract

Obfuscation is a code encryption method: it allows the programmer to transform the code in order to protect it. Obfuscation could change the way code is written, because the programmer can write in any natural language, not necessarily English. Obfuscation, Unicode, and mathematical equations can be used to change strings and identifiers and to hide secret algorithms and business rules. A special dictionary is used for string obfuscation to hide the logic of the program. The hybrid obfuscation technique will be implemented as a tool that converts the code automatically, and it can be used to protect games and mobile applications. After obfuscation, the application still performs adequately and produces the desired output without delays in execution time. After the source file is obfuscated, a reverser can still decompile the object file but cannot read or understand the result, and when the obfuscation is complex, reversing produces errors and the original code is lost. In this chapter, hybrid obfuscation is presented with examples, and an obfuscation table is provided for future use.

Keywords

  • obfuscation
  • encryption
  • anti-reverse engineering
  • reverse engineering
  • hacking prevention

1. Introduction

Obfuscation is considered an anti-reverse-engineering measure that prevents hacking and code theft. It mainly works on the source file, changing the form of the code to confuse the reverser or hacker and to prevent the decompiler from recovering readable code. The obfuscation technique converts the code into unreadable text, but the result functions like the original code and produces the same output. There are many forms of obfuscation, such as string encryption, hiding, identifier renaming, junk code obfuscation, packing, bytecode obfuscation, stealth obfuscation, and chaotic encryption. In this chapter, a new obfuscation method is introduced that produces a different kind of chaotic code, which is almost impossible to read and understand but still produces the desired output [1]. Java code is used to implement and test the technique. Figure 1 presents the most common categories of obfuscation.

Figure 1.

Obfuscation categories.

The decision to use a single obfuscation category, or to combine several, depends on how complicated the programmer or author wishes to make the code and on which part is to be obfuscated, such as a business rule or an algorithm that is important to the code or to the business of the company developing it. The following subsections describe the categories of obfuscation.

1.1 Lexical obfuscation

This technique transforms or removes the symbolic information kept by the compiler: identifiers are renamed, and other information, such as comments, is removed from the code. Programmers often use this technique alone, without combining it with any other technique, which does not guarantee protection [2].
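A naive, hypothetical sketch of a lexical pass over Java source text is shown below: it strips comments and then renames a caller-supplied set of identifiers to meaningless names. It assumes ASCII identifiers and no comment markers inside string literals; a production obfuscator would work on a parsed syntax tree or directly on bytecode rather than with regular expressions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LexicalObfuscator {
    // Naive lexical pass: removes comments, then renames the given identifiers.
    // Assumptions: comment markers never appear inside string literals; identifiers are ASCII.
    static String obfuscate(String source, Map<String, String> renames) {
        String out = source
                .replaceAll("(?s)/\\*.*?\\*/", "") // block comments
                .replaceAll("//[^\\n]*", "");      // line comments
        for (Map.Entry<String, String> e : renames.entrySet()) {
            // Word boundaries keep "rate" from matching inside a longer name such as "rates".
            Pattern whole = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b");
            out = whole.matcher(out).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "// compute interest\n"
                + "int balance = 100; /* rate */ int rate = 5;\n"
                + "int interest = balance * rate / 100;";
        Map<String, String> renames = new LinkedHashMap<>(); // hypothetical rename table
        renames.put("balance", "a");
        renames.put("rate", "b");
        renames.put("interest", "c");
        System.out.println(obfuscate(src, renames));
    }
}
```

The last line of the output becomes `int c = a * b / 100;`: the computation is unchanged, but the names no longer reveal that it is an interest calculation.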

1.2 Stealthy obfuscation

This approach combines several obfuscation techniques so that the code is obscured when it is read. Stealthy obfuscation gives a false impression of the actual program structure. The technique works on the assembly file: after it is applied, two files are created, the assembly file and the obfuscated file. The source file itself is not encrypted; however, the reverser may still be confused when reading the code [3].

1.3 Key hiding obfuscation

This technique protects intellectual property and is based on key hiding. It should not be used alone; it must be combined with another technique to provide more protection, and a symmetric mechanism is used together with key hiding. Key hiding focuses on executable software and leaves the source code and class file as they are. The software protection key is encrypted with a threshold key to make it difficult for the reverser to find it and break the code [4].
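The chapter does not specify how the threshold key is built. As a simplified, hypothetical stand-in, the sketch below splits a protection key into two XOR shares so that the key is never stored in one place and is recombined only when needed. This is the 2-of-2 special case; a general (k, n) threshold scheme such as Shamir's secret sharing lets any k of n shares recover the key. As Table 1 notes for key hiding, a determined reverser who locates both shares can still recover the key.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

final class KeySplitDemo {
    // 2-of-2 XOR split: share1 is random and share2 = key XOR share1,
    // so either share on its own looks like random bytes.
    static byte[][] split(byte[] key, SecureRandom rnd) {
        byte[] share1 = new byte[key.length];
        rnd.nextBytes(share1);
        byte[] share2 = xor(key, share1);
        return new byte[][] {share1, share2};
    }

    // Recombines the shares at runtime, just before the key is needed.
    static byte[] join(byte[] share1, byte[] share2) {
        return xor(share1, share2);
    }

    private static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] licenseKey = "HYPOTHETICAL-KEY-1234".getBytes(StandardCharsets.US_ASCII);
        byte[][] shares = split(licenseKey, new SecureRandom());
        // In a real program the two shares would be stored in different places.
        System.out.println(Arrays.equals(licenseKey, join(shares[0], shares[1]))); // true
    }
}
```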

1.4 Junk obfuscation

This technique converts identifiers into unreadable names, but the code still compiles, runs, and produces the expected output. It is particularly useful when combined with another technique to increase the security level of the code. Junk obfuscation misleads the reverser, who cannot read an identifier or understand its purpose and can only see the output [5].
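To show that junk renaming changes only what a human sees, the hypothetical example below pairs a readable method with a junk-renamed copy built from characters that appear in Table 3 (ѭ, 靖, 精, 裸). It assumes the file is compiled as UTF-8, for example with `javac -encoding UTF-8`, which is also the default since JDK 18.

```java
final class JunkNamesDemo {
    // Readable original.
    static int totalWithTax(int price, int taxPercent) {
        int tax = price * taxPercent / 100;
        return price + tax;
    }

    // The same method after hypothetical junk renaming. Cyrillic and CJK letters are
    // legal in Java identifiers, so javac compiles this when the file is read as UTF-8.
    static int ѭ011(int 靖, int 精) {
        int 裸 = 靖 * 精 / 100;
        return 靖 + 裸;
    }

    public static void main(String[] args) {
        System.out.println(totalWithTax(200, 15) + " " + ѭ011(200, 15)); // 230 230
    }
}
```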

1.5 Control obfuscation

This technique hides the actual flow of the code and creates a fake one. It controls the flow by using the structured exception handling mechanism provided by the Windows operating system: exception statements are added to disguise the control flow, and when an exception occurs, the exception handler is called and the flow of execution is changed inside the handler. The technique focuses on basic blocks, which can be obfuscated further by splitting them into several parts. It does not modify the class file; it changes the source file, which helps protect the code. However, newer reverse engineering tools can change the flow of the software and even create new flows [6].
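Java has no Windows structured exception handling, but the underlying idea can be illustrated with try/catch. In this hypothetical sketch the "even" branch of a parity test is reached only through the `ArithmeticException` handler, so the decision no longer appears as an `if` statement; both versions return the same result for every `int`.

```java
final class ExceptionFlowDemo {
    // Readable version of a simple branch.
    static String parity(int n) {
        if (n % 2 == 0) {
            return "even";
        }
        return "odd";
    }

    // Same behavior, but the "even" path is reached only through the exception handler.
    static String parityObfuscated(int n) {
        try {
            int probe = 1 / (n % 2); // division by zero (ArithmeticException) when n is even
            return "odd";
        } catch (ArithmeticException e) {
            return "even";
        }
    }

    public static void main(String[] args) {
        for (int n = -2; n <= 3; n++) {
            System.out.println(n + ": " + parity(n) + " / " + parityObfuscated(n));
        }
    }
}
```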

1.6 String obfuscation

This technique uses approaches such as encryption, mathematical equations, or chaotic obfuscation, and it is up to the programmer to decide how complicated the obfuscated strings should be. String obfuscation is very effective in protecting code from theft: once a string is obfuscated, only the compiled program can decode and output it, while it becomes unreadable to humans [7].

1.7 Chaotic obfuscation

Here, mathematical modeling is used for string encoding. The programmer determines the form of the equation used to encrypt the string and can choose to encrypt all strings or only some of them. Chaos theory provides streams generated from mathematical equations that produce pseudo-random numbers; these chaos sequences are not readable by the user, but the program can regenerate them at runtime. Chaotic equations are deterministic: the same initial value and parameters always produce the same sequence. For some parameter values, however, the iteration saturates at a single value after several iterations, so the parameters must be chosen carefully. Figure 2 presents a sample string after chaotic obfuscation [8].

Figure 2.

String chaotic obfuscation.
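The chaotic approach described above (a secret initial value, a chaotic equation, and repeated iteration) can be sketched with the logistic map x_{n+1} = r·x_n·(1 − x_n), one of the techniques listed in Table 1. The parameter values and the byte-extraction rule below are assumptions for illustration, and a logistic-map keystream is a teaching toy, not a secure cipher. Because the map is deterministic, the same x0 and r regenerate the same keystream, so applying the XOR twice restores the string.

```java
import java.nio.charset.StandardCharsets;

final class LogisticMapCipher {
    // Hypothetical parameters: r = 3.99 lies in the chaotic range of the logistic map,
    // and the initial value x0 in (0, 1) plays the role of the secret key.
    static byte[] keystream(double x0, double r, int length) {
        double x = x0;
        for (int i = 0; i < 100; i++) {
            x = r * x * (1 - x); // discard the first iterations (transient)
        }
        byte[] ks = new byte[length];
        for (int i = 0; i < length; i++) {
            x = r * x * (1 - x);                      // x_{n+1} = r * x_n * (1 - x_n)
            ks[i] = (byte) (int) Math.floor(x * 256); // map [0, 1) to one byte
        }
        return ks;
    }

    // XOR with the chaotic keystream; applying it twice with the same key restores the input.
    static byte[] apply(byte[] data, double x0, double r) {
        byte[] ks = keystream(x0, r, data.length);
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ ks[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "business rule".getBytes(StandardCharsets.UTF_8);
        byte[] hidden = apply(plain, 0.3141, 3.99);
        byte[] restored = apply(hidden, 0.3141, 3.99);
        System.out.println(new String(restored, StandardCharsets.UTF_8)); // business rule
    }
}
```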

1.8 Cipher algorithm

This technique uses session keys instead of permanent keys. Session keys are symmetric keys that are regenerated for each encryption. In Cipher, the keys are generated automatically by the algorithm itself to prevent the reverser from guessing a permanent key. A user of Cipher can purchase a permanent key from the developer; however, such a key can be compromised by determined reversers. String encryption in Cipher follows three steps. The first step is to choose the secret key, which can be an x-value. The second step is to assign the equation used for encryption, which produces the chaotic series; this equation is a secret function known only to the developer. The third step is to iterate the x-value to produce the ciphertext. Figure 3 presents a sample of cipher obfuscation [9].

Figure 3.

Cipher obfuscation.

1.8.1 Cipher block chaining

This technique divides the data or code into blocks of bits and chains the blocks together. Chaining the encrypted blocks prevents eavesdroppers from inserting their own blocks among the blocks of encrypted code. Cipher block chaining is defined by the following equations, where e_k denotes encryption with the secret key k, m_i the i-th plaintext block, C_i the i-th ciphertext block, and IV the initialization vector:

$$C_1 = e_k(m_1 \oplus IV) \qquad (1)$$
$$C_i = e_k(m_i \oplus C_{i-1}), \quad i > 1 \qquad (2)$$

The technique uses a specific value (N), the initialization vector, which is combined with the first plaintext block so that identical plaintexts produce different-looking ciphertext blocks. The N value acts as a second layer of encryption, while the first layer is provided by the secret key. Every block is encrypted with the same secret key. Because each block depends on the previous one, a change in one plaintext block alters its ciphertext block and every block that follows it; when decrypting, however, a corrupted ciphertext block garbles only that block and the next one [10].
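The effect of chaining can be observed with the JDK's built-in AES implementation in CBC mode (AES is our choice here; the chapter does not name a block cipher). In this sketch, two identical 16-byte plaintext blocks encrypt to different ciphertext blocks because the second block is XORed with the first ciphertext block before encryption, and decryption with the same key and IV restores the input. Class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

final class CbcChainingDemo {
    // The JCE performs the chaining C_i = E_k(m_i XOR C_{i-1}) internally, with C_0 = IV.
    static byte[] apply(int mode, SecretKey key, byte[] iv, byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(mode, key, new IvParameterSpec(iv));
        return cipher.doFinal(data);
    }

    static boolean firstTwoBlocksDiffer(byte[] ct) {
        return !Arrays.equals(Arrays.copyOfRange(ct, 0, 16), Arrays.copyOfRange(ct, 16, 32));
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] iv = new byte[16]; // one AES block
        new SecureRandom().nextBytes(iv);

        // Two identical 16-byte plaintext blocks.
        byte[] plain = "SAME-BLOCK-16B!!SAME-BLOCK-16B!!".getBytes(StandardCharsets.US_ASCII);
        byte[] ct = apply(Cipher.ENCRYPT_MODE, key, iv, plain);
        System.out.println("ciphertext blocks differ: " + firstTwoBlocksDiffer(ct));
        byte[] back = apply(Cipher.DECRYPT_MODE, key, iv, ct);
        System.out.println("round trip ok: " + Arrays.equals(plain, back));
    }
}
```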

1.9 Symmetric cipher

The symmetric cipher is a well-known and common choice for string encryption and decryption, and it can encrypt large amounts of data. It uses a single key for both encryption and decryption, so the parties must find a means of exchanging the key securely. Without the key of the encryption algorithm, the reverser cannot reveal, translate, or decrypt the encrypted string. Figure 4 presents a sample of Java code before and after applying the symmetric cipher [11].

Figure 4.

Symmetric cipher obfuscation.
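A minimal sketch of symmetric string encryption with the JDK's `javax.crypto` API follows, assuming AES in GCM mode (Figure 4 does not state which cipher was used). The same key encrypts and decrypts; decrypting with any other key fails the GCM authentication check and throws `AEADBadTagException`, which illustrates why the reverser needs the key. Names are hypothetical, and a fresh random IV must be used for every message encrypted under the same key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class SymmetricStringCipher {
    private static final int TAG_BITS = 128;

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }

    static byte[] encrypt(SecretKey key, byte[] iv, String text) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return c.doFinal(text.getBytes(StandardCharsets.UTF_8));
    }

    // The same key must be used here; any other key fails authentication.
    static String decrypt(SecretKey key, byte[] iv, byte[] ciphertext) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return new String(c.doFinal(ciphertext), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] iv = new byte[12]; // 96-bit IV, fresh for every message
        new SecureRandom().nextBytes(iv);
        byte[] ct = encrypt(key, iv, "discount = 0.15");
        System.out.println(decrypt(key, iv, ct)); // discount = 0.15
        try {
            decrypt(newKey(), iv, ct); // a different key
        } catch (AEADBadTagException e) {
            System.out.println("wrong key rejected");
        }
    }
}
```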


2. Discussion of current obfuscation techniques

Obfuscation techniques based on identifier renaming have been presented recently. Such techniques can be classified as a form of layout obfuscation, since they reduce the information available to a human reader who examines the target program, or as preventive obfuscation, since they aim to prevent decompilation from producing the original code with its full meaning, or to make it produce incorrect Java source code. They try to hide the structural and behavioral information embedded in the identifiers of a Java program by replacing them with meaningless or confusing identifiers, making the reverse engineer's task more difficult. It is worth noting that the information associated with an identifier is completely lost after renaming [12]. By replacing the identifiers in Java bytecode with new ones that are illegal under the Java language specification, such techniques try to make decompilation impossible or make the decompiler return unusable source code. After applying any obfuscation technique, it is very important to test the program, especially if it contains loops that execute many times, as in games or algorithmically intensive methods. Continuous testing ensures that all obfuscated parts work correctly with no recorded errors. Testing several obfuscation techniques revealed a number of limitations that can serve as an entry point for any strong decompiler [13].

Control flow obfuscation can defeat decompilers only when the method consists of basic blocks of code. The technique is not fully deterministic: whether it can be applied to a method is judged only by checking for performance degradation during testing. If control flow obfuscation is applied to highly complicated code with extensive loops, it is not useful, because errors are difficult to trace during implementation and it does not work sufficiently during reversing. It is, however, useful for small applications. Control flow obfuscation cannot be nested in the source file, because the loops would be difficult to trace during execution, and error management becomes harder [14].

The obfuscation techniques offered by various developers have several gaps. They protect the code to some extent; however, the code still contains some debugging information. No obfuscator tool can be declared the best obfuscation technique. If the secret of an obfuscator is known, reverse engineers can easily accomplish their task by constructing de-obfuscators. Such de-obfuscator tools have not yet been published, but they may be developed in the future [15].

The bytecode contains unknown characters and symbols derived from the source code. Reverse engineers have cracked the secrets of bytecode using reverse engineering tools. It is therefore possible to copy the original code after reversal, improve it, and resell it on the market to gain an advantage over the original author who developed the code in the first place. Some software development companies hire a hacker or a reverser to crack their code and find the weaknesses and vulnerabilities of the software, so that the company can fix them before the software is actually hacked. Many software programs contain a security key or registry file that protects the software. Reverse engineers remove this registration file, convert the software into source code, and use the exposed code for their own illegal development purposes [16].

Most obfuscation techniques are applied individually in the source file, and most of them focus on renaming identifiers and hiding the meaning of the code. Most reverse engineering tools are capable of analyzing such obfuscated code. The papers discussed in this research do not use mathematical equations to convert or encrypt the strings in the source file, and they do not use garbage conversion to change the layout of the code [17].

Obfuscation techniques are applied to the source code as single techniques; for example, the developer renames only the variables or hides only the names of classes. None of the papers discussed the use of a hybrid obfuscation technique, and none discussed a hybrid obfuscation technique that uses a mathematical equation for protection. For an obfuscation technique to be strong, it must combine several techniques. If the developer uses more than one obfuscation technique, there is a good chance that the code is protected from reversing tools. The developer selects the obfuscation techniques that work together based on the layout and complexity of the original code. From the work examined in this study, the use of combined or hybrid obfuscation techniques provides strong protection against prohibited reverse engineering [18].

Table 1 presents the limitations of the most common obfuscation techniques.

| Technique | Limitation |
|---|---|
| Logistic map; cipher block chaining; symmetric cipher [19] | These techniques use mathematical equations to replace the text of a string with a chaos stream. They use a secret key for encryption together with a mathematical equation. The key can be generated randomly at encryption time or obtained from the developer. If the reverser can guess the key, it may be possible to use it to decrypt the entire code. |
| Renaming; hiding [20] | These techniques hide features and change the layout of the code. The result is harder to understand but not impossible to reverse. These tools can hide the code to some extent; nevertheless, reverse engineering remains possible. |
| Key hiding obfuscation [21] | This technique focuses on the executable software and leaves the source code and class file unchanged. Reversing tools can find and crack the key to the source file and perform code analysis. |
| Encryption [22] | This technique encrypts the executable code. Its limitation is that the programmer either limits the key or round sizes or leaves only stubs for restricted classes. Longer keys provide better security, but the longer key length itself slows down encryption. |
| Packing [23] | This technique puts all the code into one package. Current reversing tools can unpack the packed code and create new, usable code that produces the same output as the original. |
| Classes combination obfuscation [24] | This technique hides classes by combining them. Reversing tools allow the user to create new classes and open the combined classes, and they provide strong analysis functions for finding the class trees and the connections between classes. |
| Junk code obfuscation [25] | This technique changes the names of identifiers to create confusion while the code is read. Reversing tools can create new names for the variables and classes, and the reverser can then use a refactoring function to assign meaningful names. |

Table 1.

Limitations of current obfuscation techniques.


3. Implementing hybrid obfuscation of encryption

In this section, we introduce a new hybrid obfuscation technique based on identifier renaming and string encryption. The technique applies hybrid identifier renaming in the program's source file to cause extreme confusion for both reversing tools and humans who examine the source file without permission. Regardless of the obfuscation strategy used, identifier renaming and string encoding are applied in two phases: the first phase overcomes preventive obfuscation, and the second adds type information to the identifiers in the source code to counter layout obfuscation.

The first phase is renaming, which consists of two parts. The first part renames the identifiers to junk code to hide their meaning, increase complexity, and confuse the decompiler during reversing. The second part replaces the system keywords with Unicode.

The second phase is string encryption, in which a set of random mathematical equations is injected into the strings to encrypt them. A transformation framework has been implemented to represent the steps of the hybrid obfuscation technique. The proposed technique can be used with many natural languages, such as Arabic, English, and Chinese, which makes it possible to program in languages other than English and further increases the protection of the code.

The following sections discuss hybrid obfuscation encryption in detail:

3.1 Unicode approach

In Java, every character or symbol is represented in Unicode, which makes it possible to change the form of the code as it is read. This technique is applied to the source file. If the file is stolen, it cannot be read directly: the thief has to translate the Unicode to understand the meaning and figure out the code, whereas the compiler reads the Unicode and produces the output normally. Combining Unicode with other encoding techniques in the source file makes it stronger. Table 2 presents examples of Unicode code points and their characters [26].

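| Code point | Char | Code point | Char | Code point | Char | Code point | Char | Code point | Char |
|---|---|---|---|---|---|---|---|---|---|
| 0x0030 | 0 | 0x0044 | D | 0x0051 | Q | 0x0064 | d | 0x0071 | q |
| 0x0031 | 1 | 0x0045 | E | 0x0052 | R | 0x0065 | e | 0x0072 | r |
| 0x0032 | 2 | 0x0046 | F | 0x0053 | S | 0x0066 | f | 0x0073 | s |
| 0x0033 | 3 | 0x0047 | G | 0x0054 | T | 0x0067 | g | 0x0074 | t |
| 0x0034 | 4 | 0x0048 | H | 0x0055 | U | 0x0068 | h | 0x0075 | u |
| 0x0035 | 5 | 0x0049 | I | 0x0056 | V | 0x0069 | i | 0x0076 | v |
| 0x0036 | 6 | 0x004A | J | 0x0057 | W | 0x006A | j | 0x0077 | w |
| 0x0037 | 7 | 0x004B | K | 0x0058 | X | 0x006B | k | 0x0078 | x |
| 0x0038 | 8 | 0x004C | L | 0x0059 | Y | 0x006C | l | 0x0079 | y |
| 0x0039 | 9 | 0x004D | M | 0x005A | Z | 0x006D | m | 0x007A | z |
| 0x0041 | A | 0x004E | N | 0x0061 | a | 0x006E | n | 0x0A09 | ਉ |
| 0x0042 | B | 0x004F | O | 0x0062 | b | 0x006F | o | 0x0A0A | ਊ |
| 0x0043 | C | 0x0050 | P | 0x0063 | c | 0x0070 | p | 0x2190 | ← |
| 0x0A17 | ਗ | 0x2157 | ⅗ | 0x2175 | ⅵ | 0x217F | ⅿ | 0x2191 | ↑ |
| 0x0A18 | ਘ | 0x2158 | ⅘ | 0x2176 | ⅶ | 0x2180 | ↀ | 0x2192 | → |
| 0x0A19 | ਙ | 0x2159 | ⅙ | 0x2177 | ⅷ | 0x2181 | ↁ | 0x2193 | ↓ |
| 0x0A1A | ਚ | 0x215A | ⅚ | 0x2178 | ⅸ | 0x313A | ㄺ | 0x33E1 | ㏡ |
| 0x0A1B | ਛ | 0x215B | ⅛ | 0x219E | ↞ | 0x313B | ㄻ | 0x33E2 | ㏢ |
| 0x1227 | ሧ | 0x215C | ⅜ | 0x219F | ↟ | 0x313C | ㄼ | 0x33E3 | ㏣ |
| 0x1228 | ረ | 0x215D | ⅝ | 0x21A0 | ↠ | 0x313D | ㄽ | 0x33E4 | ㏤ |
| 0x1229 | ሩ | 0x215E | ⅞ | 0x21A1 | ↡ | 0x313E | ㄾ | 0x33E5 | ㏥ |
| 0x122A | ሪ | 0x215F | ⅟ | 0x21A2 | ↢ | 0x313F | ㄿ | 0x33E6 | ㏦ |
| 0x122B | ራ | 0x2160 | Ⅰ | 0x21A3 | ↣ | 0x3140 | ㅀ | 0x33E7 | ㏧ |
| 0x122C | ሬ | 0x2161 | Ⅱ | 0x21A4 | ↤ | 0x3141 | ㅁ | 0x33E8 | ㏨ |
| 0x122D | ር | 0x2162 | Ⅲ | 0x21A5 | ↥ | 0x3142 | ㅂ | 0x33E9 | ㏩ |
| 0x122E | ሮ | 0x2163 | Ⅳ | 0x21A6 | ↦ | 0x3143 | ㅃ | 0x33EA | ㏪ |
| 0x2125 | ℥ | 0x2164 | Ⅴ | 0x311D | ㄝ | 0x33A2 | ㎢ | 0x33EB | ㏫ |
| 0x2126 | Ω | 0x2165 | Ⅵ | 0x311E | ㄞ | 0x33A3 | ㎣ | 0xA000 | ꀀ |
| 0x2127 | ℧ | 0x2166 | Ⅶ | 0x311F | ㄟ | 0x33A4 | ㎤ | 0xA001 | ꀁ |
| 0x2128 | ℨ | 0x2167 | Ⅷ | 0x3120 | ㄠ | 0x33A5 | ㎥ | 0xA002 | ꀂ |
| 0x2129 | ℩ | 0x2168 | Ⅸ | 0x3121 | ㄡ | 0x33A6 | ㎦ | 0xA003 | ꀃ |
| 0x212A | K | 0x2170 | ⅰ | 0x3122 | ㄢ | 0x33A7 | ㎧ | 0xA004 | ꀄ |
| 0x212B | Å | 0x2171 | ⅱ | 0x3123 | ㄣ | 0x33A8 | ㎨ | 0xA005 | ꀅ |
| 0x2130 | ℰ | 0x2172 | ⅲ | 0x3124 | ㄤ | 0x33A9 | ㎩ | 0xA006 | ꀆ |
| 0x2131 | ℱ | 0x2173 | ⅳ | 0x3125 | ㄥ | 0x33AA | ㎪ | 0xA007 | ꀇ |
| 0x2132 | Ⅎ | 0x2174 | ⅴ | 0x3126 | ㄦ | 0x33E0 | ㏠ | 0xA008 | ꀈ |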

Table 2.

Unicode characters.

In this approach, a Unicode transformation is used to rename the system keywords, making the code in the source file more complicated. When reading the source file, an attacker cannot recognize the actual meaning of the code. This is very beneficial if the source file is stolen, because the reader has to translate the whole code to understand its purpose. Moreover, even though Unicode is easy to translate, the recovered system keywords carry little meaning on their own, because the meaning of the program lies in its classes, variables, functions, and methods.
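The Java compiler translates Unicode escapes before it tokenizes the source (JLS §3.3), so a keyword written as a sequence of escapes still compiles. The sketch below, with hypothetical class and method names, shows both sides: `escapeKeywords` rewrites whole-word keywords into escapes, in the style of the `for`/`int` escapes in Table 3, and `main` contains one declaration whose `int` keyword is itself written as escapes. It assumes the keyword list contains plain identifiers with no regex metacharacters.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeEscaper {
    // Turns every UTF-16 char of the input into a Java Unicode escape (backslash, 'u', four hex digits).
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (char c : text.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    // Escapes only whole-word occurrences of the given keywords; everything else is left as is.
    static String escapeKeywords(String source, List<String> keywords) {
        Pattern p = Pattern.compile("\\b(" + String.join("|", keywords) + ")\\b");
        return p.matcher(source).replaceAll(m -> Matcher.quoteReplacement(escape(m.group())));
    }

    public static void main(String[] args) {
        // javac translates Unicode escapes before tokenizing, so this line declares an int.
        \u0069\u006E\u0074 n = 3;
        System.out.println(escapeKeywords("for (int i = 0; i < n; i++)", List.of("for", "int")) + " // n = " + n);
    }
}
```

Running `main` prints `\u0066\u006F\u0072 (\u0069\u006E\u0074 i = 0; i < n; i++) // n = 3`. `Matcher.quoteReplacement` is needed because the escapes contain backslashes, which `replaceAll` would otherwise treat as special.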

3.2 String encryption approach

In this approach, a mathematical equation, a character field, and loops are used to encode the strings in the source file. Encoding the strings causes confusion during decompiling: the reversing tool cannot translate the symbols generated by the mathematical equation, and because the encoded symbols are carried unchanged into the bytecode during compilation, they are still unreadable after decompiling. The purpose of string encoding is to create a chaos stream both in the source file and in the reversed file after decompiling [27]. The advantage of string encoding is that the mathematical formula used to create the chaos stream can be used N times in the source code, and multiple (X) sets of mathematical equations can be used in the same source file. The more chaos streams are created in the source file, the more confusion is created during decompiling. The equations rely on Java's ability to convert the numeric result of an equation into a character, so each equation turns one symbol into a different one. Normally, the equation contains a fixed value to ensure accurate output [28]. For the proposed technique, this fixed value is 2 and is assigned to (P). The equation contains two other values, (Y) and (Z), which must be declared and assigned carefully to produce the correct output.

If the value of Y is 17 then the value of Z is 2.

If the value of Y is 19 then the value of Z is 4.

If the value of Y is 16 then the value of Z is 1.

According to these conditions, Y − Z must always equal 15: if the value of (Y) increases by one, the value of (Z) must increase by one as well. The value of (P) is 2. It could in principle be changed, but then (Y) must be adjusted to keep the calculation correct, because the result of P + Y − Z must always be 17; for example, decreasing (P) by one requires increasing (Y) by one. To prevent errors, the value of (P) was fixed at 2. The values of (Y) and (Z) can then be increased or decreased together, which allows more than one equation to be used in the same source file. The final equation is:

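$$\mathrm{Char} = \frac{V}{2} + Y - Z \qquad (3)$$

where V is the numeric value of each character of the encoded string literal; V/2 is computed with integer division, and the result is cast to a character.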
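As a worked check of the Y and Z rules, take the decoding cast that appears in Table 3, (char)(V/2 + 17 − 2), and the first character of the reversed string literal `\255` shown there, i.e. V = 173 (octal 255). Java's integer division floors the half, and the three (Y, Z) pairs listed above all give the same character:

$$
\begin{aligned}
\left\lfloor 173/2 \right\rfloor + 17 - 2 &= 86 + 15 = 101\\
\left\lfloor 173/2 \right\rfloor + 19 - 4 &= 86 + 15 = 101\\
\left\lfloor 173/2 \right\rfloor + 16 - 1 &= 86 + 15 = 101
\end{aligned}
$$

Code 101 is the character e. Going the other way, an encoder can choose V = 2(Char − (Y − Z)) = 2(101 − 15) = 172; because ⌊172/2⌋ = ⌊173/2⌋ = 86, both 172 and 173 decode to e.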

3.3 Mathematical equation to encrypt strings

The equation used to encrypt the strings in the source code is associated with beneficial attributes: (Y) indicates the ideal (best) value of the considered attribute among the values of the attribute for the different alternatives, while the fixed value of the equation, (P) = 2, will not be changed. For beneficial attributes, that is, those for which higher values are desirable in the given application, (Y) indicates the higher value of the attribute, and the highest value used in the equation is 17 [29].

For non-beneficial attributes, for which lower values are desirable in the given application, (Z) indicates the lower value of the attribute, that is, the lowest value of the considered attribute among the values for the different alternatives; the lowest value used is 2. In both cases, (Z) represents the lower value of the attribute and (Y) the higher value [30]. The following equation presents the string encryption transformation:

$$\mathrm{Char} = \frac{V}{2} + Y - Z \qquad (4)$$
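The string transformation can be sketched in Java as below. The parameter values P = 2, Y = 17, Z = 2 and the decode expression follow the cast visible in Table 3, (char)(V/2 + 17 − 2); the `encode` method is our own inverse, V = P(c − (Y − Z)), added for illustration and not taken from the chapter. Characters with codes below Y − Z = 15 (such as a newline) cannot be encoded by this inverse and are rejected.

```java
final class EquationStringCipher {
    // Parameter values from the text; any (Y, Z) pair with Y - Z = 15 decodes identically.
    static final int P = 2, Y = 17, Z = 2;

    // Decoding step, same form as the cast in Table 3: (char)(V / P + Y - Z), integer division.
    static String decode(String encoded) {
        StringBuilder sb = new StringBuilder(encoded.length());
        for (char v : encoded.toCharArray()) {
            sb.append((char) (v / P + Y - Z));
        }
        return sb.toString();
    }

    // Hypothetical inverse used in this sketch: V = P * (c - (Y - Z)).
    static String encode(String plain) {
        StringBuilder sb = new StringBuilder(plain.length());
        for (char c : plain.toCharArray()) {
            int v = P * (c - (Y - Z));
            if (v < 0 || v > Character.MAX_VALUE) {
                throw new IllegalArgumentException("character out of range: " + (int) c);
            }
            sb.append((char) v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hidden = encode("enter");
        System.out.println(decode(hidden));          // enter
        System.out.println(decode("\255\276\313")); // ent (octal literals from Table 3)
    }
}
```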

3.4 Identifiers renaming to junk obfuscation

The main purpose of junk renaming is to create complicated code that is difficult to read, understand, and make sense of. Junk renaming confuses the reversing tool, which leads to incorrect analysis and therefore incorrect code. Junk conversion also makes it possible to use a variety of languages during the development of the software to protect it. The class file still contains the junk code after the source file is compiled, and when the class file is reversed, the result is again junk code, which increases protection. Applying this feature means compromising some software quality factors, namely readable code and manageable size; these are traded off to increase the security of the code.
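Junk names must still be legal identifiers, or the source will not compile. The hypothetical generator below draws names from a small alphabet of Cyrillic and CJK letters taken from the junk identifiers in Table 3, checks them with `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart`, and never issues the same name twice. The alphabet, seed, and class name are assumptions; because every character is a non-ASCII letter, a generated name can never collide with a Java keyword. The file must be compiled as UTF-8.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class JunkNameGenerator {
    // Hypothetical alphabet built from characters seen in the junk identifiers of Table 3.
    private static final String ALPHABET = "ѭ靖精裸癩更龜契";
    private static final int MAX_ATTEMPTS = 10_000;

    private final Random random;
    private final Set<String> issued = new HashSet<>();

    JunkNameGenerator(long seed) {
        this.random = new Random(seed); // a fixed seed makes a renaming run reproducible
    }

    // Returns a name of the given length that javac accepts and that was not issued before.
    String next(int length) {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            String name = sb.toString();
            if (isJavaIdentifier(name) && issued.add(name)) {
                return name;
            }
        }
        throw new IllegalStateException("name space exhausted; use a longer length");
    }

    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        JunkNameGenerator gen = new JunkNameGenerator(7L);
        for (String original : new String[] {"balance", "rate", "interest"}) {
            System.out.println(original + " -> " + gen.next(3));
        }
    }
}
```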


4. Hybrid obfuscation of encryption

Java development is object-oriented, and the application is compiled and executed as components, unlike structured programs written in the C programming language. Therefore, code obfuscation does not cause problems when the code is compiled to machine language or bytecode. To use this hybrid obfuscation technique, certain steps must be followed. The first step is object junk renaming obfuscation [31, 32, 33, 34, 35]; this conversion must be done first to avoid confusion and errors while the obfuscation process is running. The second step is string encryption, which must be performed second so that the conversion runs smoothly without errors. The last step is Unicode renaming. Applying the full hybrid obfuscation technique raises the security level of the code to the point where reversing is nearly impossible. Table 3 presents a sample of code after merging the three obfuscation approaches, together with the same code after reversing.

Obfuscated codeAfter reversing
\u0066\u006F\u0072\u0028 \u0069\u006E\u0074\u046D01101: “ܜÌÆÁƾ̼ˬ­Æ ˷Á¾Ë°¬¤Î¬ ´Ë ˬ°¤¾¸ ÁÜ­¤° É­°¯­ ܚÁÁ ܜ Ü ÉÁ¼d d°2¤¸μ¼Õ |Æ ¸­É¤Î¤ ‚μ°°¤μ É­°¯­ ܚÁÁܜ Ü Š ۴ ‹ ܕ ܚ ܜ ܡ þ ‘ ‘”.toCharArray\u0028\u0029\u0029BufferedWriter out = new BufferedWriter(new FileWriter(de7, true));
Char c010101c[] = “\255\276 \313 \255 \306\310\313\306\265\276\260 “.toCharArray();
int. 0908 = c010101c.length;
for(int d9 = 0; d9 < 0908; d9++)
\u0076\u006F\u0069\u0064龜\u00 28\u0029\u007B\u0066\u006F\u00 72\u0028\u0069\u006E\u0074ѭ01 1靖\u003A”¤¼ ÀÌ¾Ë ȅ¬ ‚ÀÈμË”\u00 2E\u0074\u006F\u0043\u0068\u00 61\u0072\u0041\u0072\u0072\u00 61\u0079\u0028\u0029\u0029\u00 7B\u0053\u0079\u0073\u0074\u00 65\u006D\u002E\u006F\u0075\u00 74\u002E\u0070\u0072\u0069\u006E\u0074\u0028\u0028\u0063\u0068\u0061\u0072\u0029\u0028ѭ011靖\u002F\u0032\u002B\u0031\u0037\u002D\u0032\u0029\u0029\u 003B\u007D \u0053\u0079\u0073\ u0074\u 0065\u006D\u002E\u006F \u0075\u007 \u002E\u0070\u0072 \u0069\u006E\u0074\u0028”\n”\u 0029\u003B癩 \u003D更\u002E\u0 06E\u0065\u0078\u0074\u0044\u0 06F\u0075\u0062\u006C\u0065\u0028\u0029\u003B裸\u003D裸\u002 B癩\u003B\u0066\u006F\u0072\u0 028\u0069\u006E\u0074 ѭ011精\u 003A” ࠥÌÆƬ¾Ë˭¤ɡ¤¾ ࠥ ¬”\u002E\ u0074\u006F\u0043\u0068\u0061\u0072Array\u0028\u0029\u0029\u007B\u0053\u0079\u0073\u0074\u 0065\u006D\u002E\u006F\u0075\u 0074\u002E\u0070\u0072\u0069\u 006E\u0074\u0028\u0028\u0063\u 0068\u0061\u0072\u0029\u0028ѭ011精\u002F\u0032\u002B\u0031\u0037\u002D\u0032\u0029\u0029\ u003B\u007D\u0053\u0079\u0073\ u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u0028”\n”\u0029\u003B\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u0028裸u0029\u003B\u007D\u0076 \u006F\u0069 \u0064 契\u0028\u0029\u007 BAfter reversing the class file which contains full code

package bankencrypt;
// Referenced classes of package bankencrypt:
//   F9A4
public class Bankencrypt
 {
 public Bankencrypt()
 {
 }
 public static void main (String args[])
 {
 F9A4 A461 = new F9A4();
 A461.F907();
 A461.F908();
 A461.F909();
 A461.F90A();
 }
}

Table 3.

Obfuscated code before and after reversing.

Names can also be changed to junk for any purpose, such as emails, logins, and so on. With this logic, hybrid obfuscation encryption can be used to write encrypted letters and even to create a whole system using only junk code. Table 4 presents names before and after obfuscation. Every time an obfuscated name is copied or used, it changes automatically.

| Name | After obfuscation |
|---|---|
| Asma mahfoud | ÉÁ¼d d°2¤¸μ¼Õ |
| Java hacker | \"v¤Î¤\"2¤¨¸¬Æ |
| Kesava | ¸­É¤Î¤ |
| hi | 3´ |
| keep it real | ­­‚ μË Æ­¤° |
| Nur | ܜÌÆ |

Table 4.

Names before and after obfuscating.

String encryption makes the obfuscation technique more effective at securing the code, because it introduces many symbols that confuse the decompiler during parsing and analysis. Figure 5 shows the result of reversing the hybrid-obfuscated code, and Figure 6 presents the framework of the proposed hybrid obfuscation encryption.

Figure 5.

Reversing hybrid obfuscation.

Figure 6.

Hybrid obfuscation encryption framework.


5. Empirical evaluation of the hybrid obfuscation

Four reversing tools, CAVAJ, JAD, DJ, and JD, were used to test the effectiveness of the technique and to determine how much of the obfuscated code each tool can uncover and read. The test parameters were distributed among the tools based on their behavior toward the obfuscated code. For instance, JD was tested only on identifier names because it can reveal the entire code, so there was no need to test the remaining parameters. Figure 7 presents the experiment design.

Figure 7.

Experiment design.

5.1 Testing with CAVAJ

CAVAJ, a reversing tool for Java class files, was used to determine how much of the code it can read after obfuscation. Figure 8 presents the results of the CAVAJ test.

Figure 8.

Reversing result of CAVAJ.

5.2 Testing with Java decompiler (JD)

The JD reversing tool was used to determine whether it can reverse a Java class file protected with the hybrid obfuscation technique. The test determines whether the tool can read the obfuscated code and how much of it the tool can read and discover. Figure 9 presents the output after reversing.

Figure 9.

Reversing result of JD.

5.3 Testing with JAD

After JAD was installed, the command prompt was used to locate the Java class file, and JAD was run on that file to reverse it into a .jad file. Figure 10 presents the result of reversing.

Figure 10.

Reversing result with JAD.

Test of the first and second classes for output correctness and errors in the reversed code:

The tool was not able to read the code after obfuscation with the hybrid technique: it produced errors while reading and revealed only the Unicode, without being able to read the identifiers.

Test of the first and second classes for methods, classes, and identifiers:

Based on Figure 11, the tool was not able to extract any meaning from the encrypted strings and identifiers; in fact, it changed the names further, which can be considered another level of protection. As a result, the reverser cannot read or understand the code. In addition, the name of the Java file was encrypted to mislead the reverser if the source file is stolen. Figure 12 presents the form of the Java file name after encryption.

Figure 11.

JAD reversing result for methods and identifiers.

Figure 12.

File name after encryption.

5.4 Testing with DJ Java Decompiler (DJ)

DJ Java Decompiler is a tool that reverses class files. It was used to determine whether it can reverse a Java class file protected with the hybrid obfuscation technique, whether it can read the obfuscated code, and how much of it the tool can reveal. Figure 13 presents the result of reversing the class file used for the output-correctness test.

Figure 13.

Reversing result with DJ.

First class test/output correctness.

The tool was not able to read the first test class and reveal its code, so there was no code whose correctness could be tested. This is a promising result for the hybrid obfuscation technique. An error message reporting a syntax error appeared.

Second class test/identifiers.

According to Figure 14, the tool was not able to read the code after obfuscation. This results the proof that using hybrid obfuscation is more beneficial than just applying one technique.

Figure 14.

Identifiers test.


6. Conclusion

The hybrid obfuscation technique was effective in protecting the code. The reversing tools were not able to read or translate the encrypted strings. Renaming identifiers to junk names was effective, as the reversing tool converted the junk into a series of random numbers and symbols. The reversing tool was able to read only the system keywords. Furthermore, the reversing tool added methods and preprocessors while parsing the file, and it was not able to analyze the obfuscated code to produce the appropriate output. This indicates that the hybrid obfuscation technique is effective in protecting the source file from prohibited reverse engineering. The third objective of this research was therefore met: according to the experiments, reversing the obfuscated code produced a series of junk and chaos.
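Renaming to junk can be illustrated with a renaming table that assigns each distinct identifier a meaningless but consistent replacement. The sketch below assumes a hypothetical naming scheme, not the one used in this chapter: names are built from the look-alike letters l and I, encoding each identifier's position in binary. The class `JunkRenamer` is likewise hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical renaming table that maps each identifier to a junk name. */
final class JunkRenamer {
    private final Map<String, String> table = new LinkedHashMap<>();

    /** Returns the junk name for an identifier, assigning a new one on first use. */
    String rename(String identifier) {
        // table.size() is read before the new entry is inserted, so the
        // first identifier gets index 0, the second index 1, and so on.
        return table.computeIfAbsent(identifier, k -> junkName(table.size()));
    }

    /** Read-only view of the mapping, e.g. for a private de-obfuscation log. */
    Map<String, String> mapping() {
        return Collections.unmodifiableMap(table);
    }

    // Writes the index in binary (at least 8 digits) using 'l' for 0 and 'I'
    // for 1. Both are letters, so every result is a legal Java identifier,
    // and distinct indices always give distinct names.
    private static String junkName(int index) {
        String bits = String.format("%8s", Integer.toBinaryString(index)).replace(' ', '0');
        return bits.replace('0', 'l').replace('1', 'I');
    }

    public static void main(String[] args) {
        JunkRenamer renamer = new JunkRenamer();
        for (String name : new String[] {"total", "price", "total", "count"}) {
            System.out.println(name + " -> " + renamer.rename(name));
        }
    }
}
```

Reusing the same table across the whole source file is what keeps the program's behavior unchanged: every occurrence of an identifier maps to the same junk name.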

The extreme chaos resulted from combining the string encryption and renaming approaches in one source file, which confused the reversing tools so that they could not translate, read, or analyze the code. To summarize the experiments conducted before and after obfuscation, we count the lines of code (LOC) of the original file before and after reversing, count the total errors that appear when running the reversed file before and after obfuscation, and then take the difference to determine the strength of the protection.
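The error count described above could be automated with the JDK's built-in compiler API (javax.tools). The sketch below is an assumption about how such a measurement might be scripted, not the procedure used in this chapter: it compiles a reversed source string in memory, counts the diagnostics of kind ERROR, counts non-blank lines as a simple LOC measure, and reports the difference in errors between two reversed versions. The class `ReversedCodeErrors`, its methods and the sample inputs are hypothetical, and `ToolProvider.getSystemJavaCompiler()` returns null when the program runs without a JDK compiler.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical helper for measuring reversed (decompiled) source files. */
final class ReversedCodeErrors {
    private ReversedCodeErrors() {}

    /** Keeps a source string in memory so it can be handed to javac. */
    private static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Simple LOC measure: the number of non-blank lines. */
    static long countLoc(String source) {
        return source.lines().filter(line -> !line.isBlank()).count();
    }

    /** Compiles the source in memory and returns the number of javac errors. */
    static long countErrors(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run this on a JDK");
        }
        Path outDir;
        try {
            // Class files from a successful compile go to a throwaway directory.
            outDir = Files.createTempDirectory("reversed-check");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        compiler.getTask(null, null, diagnostics,
                List.of("-proc:none", "-d", outDir.toString()), null,
                List.of(new StringSource(className, source))).call();
        return diagnostics.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical inputs standing in for a tool's output before and
        // after obfuscation; the second one has a damaged expression.
        String before = "class Demo {\n  int twice(int x) { return x * 2; }\n}\n";
        String after = "class Demo {\n  int twice(int x) { return x * ; }\n}\n";
        long errorsBefore = countErrors("Demo", before);
        long errorsAfter = countErrors("Demo", after);
        System.out.println("LOC before/after: " + countLoc(before) + "/" + countLoc(after));
        System.out.println("Error difference: " + (errorsAfter - errorsBefore));
    }
}
```

Counting javac diagnostics rather than runtime failures keeps the measure repeatable, since reversed code that does not compile never reaches the point of producing output.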

According to the results, none of the reversing tools recovered fully functioning code; in all cases, they generated a series of random numbers and symbols while attempting to translate the obfuscated code. The code generated by the reversing tools did not produce any output, and compiling the reversed code always resulted in an error.

Table 5. Summary of the errors that occurred in the four tested cases.

Reversing tool | Testing component | Errors in reversed file before hybrid technique | Errors in reversed file after hybrid technique
CAVAJ | Compiled reversed code error test | 0 | 6
CAVAJ | De-Crypt String test | 0 | 1
JAD | Output correctness | 0 | 7
JAD | Compiled reversed code error test | 0 | 100
JAD | Methods and classes correctness test | 0 | 22
DJ | Output correctness test | 0 | 0
JD | Identifiers names test | 0 | 0

Table 5.

Error summary.


7. Future work

The number and variety of obfuscators used in this research were fairly limited. Future work could explore a wider range of noncommercial and research obfuscators to give a broader picture of the protection possibilities. Due to time constraints, we were also not able to use all of the commercial obfuscators we had access to; more commercial obfuscators and reversing tools can be included in the future. The proposed hybrid obfuscation technique can also be applied to games and mobile applications to protect them from illegal reversing and the resulting financial losses.

The technique could also be developed in the C/C++ programming language instead of Java, as C/C++ is closer to the hardware level and can interact with it easily through pointers. Implementing the technique in C/C++ would be an advantage that makes the tool stronger and more defensive.

The technique could also be added as a tool to a programming environment such as NetBeans or Eclipse, where the programmer can choose which parts of the code to encrypt and which approach to use. The programmer would have full freedom to mix and match encryption approaches in the code to increase security. Such an integrated encryption tool would prevent errors during encryption and save time.

The concept of the proposed technique can be adapted to any programming language according to its requirements and mechanisms. It also opens the opportunity to insert other natural languages, such as Arabic, Chinese, or any other language, into the code for the sake of encryption, to increase the level of security.
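Java already allows identifiers written in non-Latin scripts, since any character accepted by Character.isJavaIdentifierStart and Character.isJavaIdentifierPart may appear in a name. The sketch below checks candidate names with javax.lang.model.SourceVersion; the helper `ScriptIdentifiers` is hypothetical, and the Arabic and Chinese samples are written as Unicode escapes so the file compiles regardless of the platform's default source encoding.

```java
import javax.lang.model.SourceVersion;

/** Hypothetical check that names in other scripts are legal Java identifiers. */
final class ScriptIdentifiers {
    private ScriptIdentifiers() {}

    /** True if the name is a single identifier that is not a keyword or literal. */
    static boolean isLegalIdentifier(String name) {
        // isIdentifier follows Character.isJavaIdentifierStart/Part, so letters
        // from any script qualify; isKeyword also covers true, false and null.
        return SourceVersion.isIdentifier(name) && !SourceVersion.isKeyword(name);
    }

    public static void main(String[] args) {
        String[] candidates = {
            "\u0645\u062c\u0645\u0648\u0639", // Arabic word for "total"
            "\u603b\u6570",                     // Chinese word for "total"
            "total", "class", "2total"
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + isLegalIdentifier(name));
        }
    }
}
```

The check matters for an automatic tool: a generated name that happens to be a keyword, or that starts with a digit, would make the obfuscated file fail to compile.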

References

1. Yasin A, Nasra I. Dynamic multi levels Java code obfuscation technique (DMLJCOT). International Journal of Computer Science and Security (IJCSS). 2016;10(4):140-160
2. Kumar R, Vaishakh ARE. Detection of obfuscation in Java malware. Physics Procedia. 2016;78(2015):521-529
3. Darwish SM, Guirguis SK, Zalat MS. Stealthy code obfuscation technique for software security. In: Proceedings, the 2010 International conference on computer engineering & systems (ICCES’2010). Cairo, Egypt: IEEE; 2010. pp. 93-99
4. Cho T, Kim H, Yi JH. Security assessment of code obfuscation based on dynamic monitoring in android things. IEEE Access. 2017;5:6361-6371
5. Xiang G, Cai Z. The code obfuscation technology based on class combination. In: Proc. DCABES 2010 ninth international symposium on distributed computing and applications to business, engineering and science. Hong Kong, China: IEEE. Vol. 60970064, 2010. pp. 479-483
6. Deshmukh GC, Patil SM. Study for best data obfuscation techniques using multi-criteria decision-making technique. International Journal of Computer Applications. 2018;180(43):50-57
7. Al-Hakimi AMH, Sultan ABM, Ghani AAA, Ali NM, Admodisastro NI. Hybrid obfuscation technique to protect source code from prohibited software reverse engineering. IEEE Access. 2020;8:187326-187342
8. Sebastian SA, Malgaonkar S, Shah P, Kapoor M, Parekhji T. A study & review on code obfuscation. IEEE WCTFTR 2016. In: Proc. 2016 World Conf. Futur. Trends Res. Innov. Soc. Welf. Coimbatore, India: IEEE; 2016
9. Batchelder M, Hendren L. Obfuscating Java: The most pain for the least gain. Lecture Notes in Computer Science. 2007;4420:96-110
10. Peng Y, Chen Y, Shen B. An adaptive approach to recommending obfuscation rules for Java bytecode obfuscators. In: Proceedings, 2019 IEEE 43rd annual computer software and applications conference (COMPSAC). Vol. 1. Milwaukee, WI, USA: IEEE; 2019. pp. 97-106
11. Kumar C, Bhaskari DL. Different obfuscation techniques for code protection. 4th International Conference on Eco-friendly Computing and Communication Systems. 2015;70:757-763
12. Ceccato M et al. Towards experimental evaluation of code obfuscation techniques. In: Proceedings, CCS08: 15th ACM conference on computer and communications security, Alexandria, Virginia, USA: ACM; 2008. pp. 39-45
13. Solomonoff RJ. Algorithmic probability: Theory and applications. Information Theory and Statistical Learning. New York: Springer; 14 Nov 2008:1-23. ISBN 978-0-387-84815-0; 978-1-4419-4650-8; 978-0-387-84816-7. DOI: 10.1007/978-0-387-84816-7
14. Budhkar S. Reverse engineering Java code to class diagram: An experience report. International Journal of Computer Applications. 2011;29(6):36-43
15. Baxter ID, Mehlich M. Reverse engineering is reverse forward engineering. Science of Computer Programming. 2000;36(2):131-147
16. Memon JM, Shams-ul-Arfeen, Mughal A, Memon F. Preventing reverse engineering threat in java using byte code obfuscation techniques. In: Proceedings, International conference on emerging technologies, ICET 2006, November. Peshawar, Pakistan: IEEE; 2006. pp. 689-694
17. Zhang L, Meng H, Thing VLL. Progressive control flow obfuscation for android applications. In: TENCON 2018 - 2018 IEEE Region 10 Conference. Vol. 2018. Jeju, Korea (South): IEEE; 2019. pp. 1075-1079
18. You I. Malware Obfuscation Techniques: A Brief Survey. Fukuoka, Japan: IEEE; 2010. pp. 297-300
19. Popa M. Techniques of program code obfuscation for secure software. Journal of Mobile, Embedded and Distributed Systems. 2011;III(4):205-219
20. Tang Z, Chen X, Fang D, Chen F. Research on java software protection with the obfuscation in identifier renaming. In: 2009 Fourth International Conference on Innovative Computing, Information and Control (ICICIC), Kaohsiung, Taiwan: IEEE; 2009. Vol. 2009. 2007. pp. 1067-1071. DOI: 10.1109/ICICIC.2009.312. ISBN: 978-1-4244-5544-7; 978-1-4244-5543-0; 978-0-7695-3873-0
21. Luo L, Ming J, Wu D, Liu P, Zhu S. Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection. In: Proc. 22nd ACM SIGSOFT Int. Symp. Found. Softw. Eng. - FSE 2014, IEEE. 2014;43(12):389-400. Available from: https://ieeexplore-ieee-org.ezadmin.upm.edu.my/document/7823022
22. Bergström E, Åhlfeldt RM. Foundations and practice of security. In: Lect. Notes Comput. Sci. (Including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics). Switzerland: Springer Verlag. Vol. 9482. 2016. pp. 268-276
23. Amalarethinam DIG, Geetha JS. Image encryption and decryption in public key cryptography based on MR. In: Proc. Int. Conf. Comput. Commun. Technol. ICCCT 2015. Chennai, India: IEEE. June. 2015. pp. 133-138
24. Iman M u, Ishaq AFM. Anti-reversing as a tool to protect intellectual property. In: Eng. Syst. Manag. Its Appl. (ICESMA), 2010 Second Int. Conf. Sharjah, United Arab Emirates: IEEE; 2010. pp. 1-5
25. Angyal L, Lengyel L, Charaf H. An overview of the state-of-the-art reverse engineering techniques. In: 7th International Symposium of Hungarian Researchers on Computational Intelligence. Budapest, Hungary: HUCI; 2006. pp. 507-516
26. Leahy P. What is unicode?, ThoughtCo. 2017. p. 1
27. Real-time MAT, Ef- PCCS, Butaha MA. Crypto-compression systems for efficient embedded to cite this version. [Doctoral thesis]. 2017
28. Wang ZY, Wu WM. Technique of javascript code obfuscation based on control flow tansformations. Applied Mechanics and Materials. 2014;519-520(Iccse):389-392
29. Sun Y. How to render mathematical symbols in Java. March, 2003
30. Baker SIB, Al-Hamami AH. Novel algorithm in symmetric encryption (NASE): Based on feistel cipher. In: Proc. 2017 International Conference on New Trends in Computing Sciences (ICTCS), 2017. Amman, Jordan: IEEE; Jan 2018. Vol. 3. 2017. pp. 191-196
31. Sosonkin M, Naumovich G, Memon N. Obfuscation of design intent in object-oriented applications. In: DRM 2003 Proc. 3rd ACM workshop on Digital rights management, Washington DC USA: Association for Computing Machinery. 2003. pp. 142-153
32. Alkawaz MH, Steven SJ, Hajamydeen AI. Detecting phishing website using machine learning. In: 2020 16th IEEE International Colloquium on Signal Processing & its Applications (CSPA). Langkawi, Malaysia. 2020. pp. 111-114. DOI: 10.1109/CSPA48992.2020.9068728
33. Al Yahyaee OMAR. Information Security Management in Abu Dhabi Police, UAE. [Doctoral dissertation] Management & Science University. 2016
34. Alkawaz MH, Steven SJ, Hajamydeen AI, Ramli R. A Comprehensive survey on identification and analysis of phishing website based on machine learning methods. In: 2021 IEEE 11th IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE). Penang, Malaysia. 2021. pp. 82-87. DOI: 10.1109/ISCAIE51753.2021.9431794
35. Alkawaz MH, Joanne Steven S, Mohammad OF, Gapar Md Johar M. Identification and analysis of phishing website based on machine learning methods. In: 2022 IEEE 12th Symposium on Computer Applications & Industrial Electronics (ISCAIE). Penang, Malaysia. 2022. pp. 246-251. DOI: 10.1109/ISCAIE54458.2022.9794467
