Binary code of DNA.
Information protection and secrecy are major concerns, especially regarding the internet’s rapid growth and widespread usage. Unauthorized database access is becoming more common and is being combated using a variety of encrypted communication methods, such as encryption and data hiding. DNA cryptography and steganography are used as carriers by utilizing the bio-molecular computing properties that have become more common in recent years. This study examines recently published DNA steganography algorithms, which use DNA to encrypt confidential data transmitted through an insecure communication channel. Several DNA-based steganography strategies will be addressed, with a focus on the algorithm’s advantages and drawbacks. Probability cracking, blindness, double layer of security, and other considerations are used to compare steganography algorithms. This research would help and create more effective and accurate DNA steganography strategies in the future.
- DNA Computing
The concept of security refers to the prevention of unauthorized access to information. In today’s computer science, encryption’s primary goal is to prevent confidential data from being altered, lost, hacked, or compromised by a third party . Encryption and concealment of information are among the most widely used methods in networking and information security. Encryption and concealment of information (both similar concepts) are commonly used to keep communications secure [2, 3] fact that both methods have the same purpose. Still, their development and use are vastly different. Cryptography alters the sense of coded writing, while steganography is a covert way of writing that conceals the encrypted message’s nature. Thus, in data transmission through an insecure public medium, the science of steganography is more reliable, necessary and often preferred over encryption [4, 5].
Various steganography systems, as well as their criteria, are discussed in this article based on the literature. Different systems use different strategies for embedding data, each with a set of benchmarks to evaluate performance and determine its advantages and disadvantages. Vulnerability to adversary attack is one of the three common criteria. To avoid arousing suspicion, the embedded data must be kept undetectable both visually and statistically. A fully reliable system with comparable carrier and stego file statistics should be considered during the message embedding process [5, 6]. The carrier’s power, known as the amount of data concealed within it, is the second common prerequisite. The development of a steganography technique could allow more sensitive data to be hidden within the carrier while maintaining the properties of the stego file [1, 5]. A successful steganography strategy should keep enough information in its embedding capability . Imperceptibility is the third common prerequisite, which is characterized as having a high embedding potential and the ability to resist intruders. The stego carrier should ideally be devoid of visual artifacts and the greater the stego carrier’s fidelity should be better .
The masking theory is typically modeled by a pair of algorithms: embedding and extraction, as seen in Figure 1. The embedding algorithm produces a stego file containing the private data by merging two folders, secret and vector data, with an optional key. On the other hand, the extraction algorithm is used to recover the secret data from the stego file . Steganography is a method of concealing data that does not require the use of a key. Its protection depends on the privacy of the algorithm. As a result, it is known as a less reliable approach [8, 9]. Another way to hide information is to hide confidential data, which uses one key for all operations (embedding and extraction). One of the most important benefits of this type is its rapid stage in all procedures [10, 11]. Unlike previous patterns, public steganography uses two keys for embedding and extraction: embedding and the other for extracting. The biggest value of this type is the durability of the system. The identification of the other key could be a concern if one of the keys is identified by a third party [10, 12]. On the other hand, this model is 100–1000 times slower than private steganography .
Several applications represent a container for confidential data. In steganography schemes, these programs are used as cover objects or carriers. Per carrier has its own set of characteristics that aid in the data concealment process. The carrier’s field availability determines the amount of confidential information needed to hide data within each carrier. Text, audio, video, and photographs are examples of multimedia used to hide records. Text can be obscured by changing the text’s layout, inserting an nth character from the text, or changing any of the rules, such as spacing. Text can also be hidden using a code made up of letters, lines, and page numbers. However, this process is insecure . The biggest benefit of this carrier is that it does not take a lot of memory and is quick to switch.
In contrast to other carriers, it has a very limited number of redundant data [10, 14]. The use of inaudible frequencies and a small shift in the binary sequence of an audio file can be used to hide data in audio files [2, 15]. Data masking in video files is more efficient and effective due to the wide available space. Allowing data to be hidden within multiple video frames . Uncompressed and compressed video are the two main formats of video in which data can be hidden. Digital images have been common carriers for masking confidential information due to their high redundancy, high capacity in images, low effect on exposure, and ease of manipulation [15, 17]. DNA is a relatively recent vector that has been used in the field of steganography. In this article, we look at the data hidden in DNA.
2. Deoxyribonucleic acid (DNA)
The most important molecular structure in biology is deoxyribonucleic acid (DNA), which encodes the information required to generate and direct all chemical elements in the human body. As a result, DNA has been suggested as a possible candidate for computational purposes .
2.1 DNA structure
DNA is described as a living creature’s genetic blueprint. Each body cell has its DNA collection and a polymer made up of monomers called deoxyribose nucleotides, consisting of three components, as seen in Figure 2 .
The human body is made up of trillions of cells, each with its purpose. As seen in Figure 3, each cell has a nucleus that comprises several chromosomes. The majority of DNA is present in a nucleus, which is known as nucleus DNA, and the remainder is found in mitochondria, which is known as mitochondria DNA (mtDNA). Each cell’s activity is regulated by DNA. DNA chromosome is made up of a DNA molecule of genes. A gene is the entire genetic makeup of an organism, containing information from all chromosomes .
In 1953, Watson and Crick discovered DNA structure, a form of genetic material. DNA is a long molecule present in all living things’ body cells. DNA is a kind of bacterial plasma that contains all lifestyles. It is made up of two simple bands that are twisted around each other in a double helix (see Figure 4). Each DNA chain is made up of nucleotides, which are small subunits. The four chemical bases in the chain DNA are Adenine (A), Thymine (T), Guanine (G) and Cytosine (C), which bind to sugar and phosphates in the backbone to complete the nucleotide. Purines (A and G) and Pyrimidine (T and C) are the two DNA bases in biology. Continuously (A) is bound to (T) by two hydrogen bonds, and (C) is bound to (G) by three hydrogen bonds [19, 21]. Transcription is the method for producing RNA, which is an intermediate copy of DNA instructions. Adenine (A), Cytosine (C), Uracil (U), and Guanine (G) are the four bases that makeup RNA. All 64 codons are represented in Figure 5. The STOP codons do not necessarily symbolize any amino acids but rather indicate the protein chain’s end. The twenty amino acids are determined by the remaining 61 codons. Some amino acids are coded by several codons . As a result of this codon duplication, it is possible to change the genetic sequence while keeping it functional [11, 22, 23].
2.2 DNA computing
Currently, biology methods are used in a variety of fields. DNA is a relatively new biological technology that is used in a variety of applications . This is because DNA computing can solve a variety of NP-complete problems, in which the computation time increases dramatically.
There has been a considerable amount of research in this field, with significant progress made on DNA and the immune system . Leonard Adelman conducted the first experiment in DNA computing (bio-molecular computing) in 1994, in which molecular biology instruments were used to solve a portion of the standard path of the Hamiltonian puzzle. Computing with molecules directly was discovered at the time, and it was regarded as a new discipline in terms of science defense . The satisfaction problem (SAT), an NP-complete problem, was solved using DNA computing in a 1995 study by Lipton. The offered approach took advantage of DNA’s parallelism and its computational and storage capacities . In 1997, Ogihara and Ray discovered that DNA could be used to simulate AND and OR gates . Clelland  proposed the first successful experiment of a DNA steganography technique for concealing sensitive data using DNA microdots.
2.3 Binary code of DNA
A, C, G, and T are the four chemical bases that make up each DNA chain. A is biologically related to T, while C is related to G. T The synthesis of DNA rules can be modified in binary arithmetic by changing input judgments, such as assuming that T is related to C or T is related to G . Researchers would use a binary encoding rule to translate a hidden message into DNA rules before mixing it with sequenced DNA to store data in DNA particles. For each rule (A), researchers may use the corresponding binary form: binary formulas can be “00,” “01,” “10,” or “11.” This can be expressed as in Table 1. The encoding of DNA and its random properties make it an ideal candidate for both coding and coding. As a result, converting DNA into the binary form will result in 4! = 24 different encoding methods [30, 31]. On DNA bases, logical operations such as addition, subtraction, XOR, AND, OR, and NOT are possible.
|DNA base||Binary code|
3. Comparative study
The aim of the comparison presented in this study is to ensure that researchers are aware of the shortcomings in current steganography systems, thus inspiring future advances in this field. Table 2 compares the strengths and disadvantages of existing algorithms in terms of security problems such as chance of intrusion, double security layer, blindness, and more.
|3||Ref ||Does not encrypt confidential information when storing it.|
The derived comparison in Table 2 aims to clarify the proposed DNA’s strengths and weaknesses using data hiding algorithms. Encrypting sensitive data into encryption data before embedding, rather than including the initial data format, improves confidentiality [13, 18, 22, 33, 34, 35, 36, 38, 40, 43, 48, 49, 51, 53, 56, 58, 59, 60, 61]. Playfair technology, adopted in , is the most promising encryption technology combined with DNA-based data masking technology. A thorough comparison of several encryption methods, including vigenere and Playfair, AES, and RSA ciphers, has been done in their work. Any of them was paired with a replacement tool for hiding data in DNA. The findings revealed that the Playfair cipher is not only quick and easy to use, but it also has a high level of protection and ability.
The blindness trait, which eliminates the need to give the original DNA connection to the recipient, is the primary function supported by DNA-based data masking techniques. The main goal of the blindness feature is to improve protection and avoid any intruder way of detecting it, as shown in [11, 18, 25, 33, 34, 35, 36, 39, 44, 47, 58, 59]. This is accomplished by minimizing the requisite data that is transmitted to the recipient as much as possible. One of the strengths is to biologically preserve the DNA relationship’s original features during the inclusion step while maintaining a fair data load. The reference DNA is used to mask hidden data while preserving protein processing functions. As shown in [11, 21, 25, 33, 34, 42, 53, 59, 61], some DNA characteristics such as silent mutation and codon repetition can mask details and alter the genetic sequence without changing the protein chain.
After most data-masking algorithms, the carrier can experience some distortion. Data masking techniques take care of embedding and embedded data; that is why it is communicated invisibly. As a result, it is important to minimize conveyor distortion. When data is entered into a string of stego DNA, the sequenced DNA’s length and the degree of change are used to determine stego DNA precision. The low rate of change and lack of expansion rate results in high-quality DNA, which attracts less interest from potential attackers. [11, 32, 33, 34, 39, 40] reaches a low modulation frequency. Moreover, the expansion rate characteristic of DNA stego is not achieved at [11, 13, 21, 32, 33, 34, 38, 39, 40, 59], which means that the payload is equal to zero.
It is recommended to use a two-stage steganography technique to hide sensitive data with more detail than previous data masking methods. Using two separate vectors in the same manner, increases confidentiality and makes it difficult for criminals to ingest or recover hidden data. Several methods [43, 44, 48, 49, 50, 55, 56, 57, 58] used the ref. DNA with another multimedia player to cover the hidden data. Some built DNA from cover images or confidential information, as shown in [44, 48, 49, 50, 56], while others used a random sample or selected from an online database, as shown in [43, 55, 56, 57, 58].
The main factor is one of the most important aspects of data masking strategies. Data masking schemas are centered on the key used and can be classified into three categories. As shown in [21, 23, 32, 34, 36, 37, 41, 46, 52, 54, 57], pure data masking is less reliable because it does not use any key. As a result, using a key increases the device’s usability by complicating the data-masking mechanism attack. Even if the perpetrators figure out what data-masking scheme is being used, they are unable to retrieve it. The carrier’s sensitive information is not protected by the key. The secret is only in the hands of the sender and receiver. As a result, it is advisable to use a strong key when encrypting files, which ensures a more stable method. The second form is the hidden key [11, 13, 18, 25, 33, 35, 38, 39, 42, 43, 44, 45, 47, 48, 49, 50, 51, 53, 55, 56, 59, 60, 61, 62], which was accomplished in [11, 13, 18, 25, 33, 35, 38, 39, 42, 43, 44, 45, 47, 48, 49, 50, 51, 53, 55, 56, 59, 60]. The third form is classified as a public key, as shown by [22, 40, 58]. The public key is more secure than the private key in general, but it is still slower.
The probability of splitting the code and accessing confidential, sensitive data is known as the algorithm-cracking potential. Studying the probability of a striatum fracture aims to identify the variables that ensure that the risk of rupture is reduced. The likelihood of a leak is determined by the inclusion of certain unknown variables in the algorithm used to mask sensitive data, not by the amount of attempts made before the attacker gained access to the secret data. High probability penetration leads to high protection of the data-masking strategy described in [18, 22, 34, 35, 47, 59, 62]. The replacement strategy is believed to be a more powerful means of concealing data in DNA. The DNA sequence length can be preserved using this process as long as the payload is kept at zero. It also has more power as seen in [32, 33, 36, 37, 38, 44, 53, 56, 60], because it substitutes certain DNA nucleotides with cached data blocks or other nucleotides based on confidential data.
Capacity is a vital aspect of any data masking strategy, and it is one of the main criteria for data masking techniques. A steganography strategy must have broad data anonymization potential. This capacity can be measured in absolute terms, such as the hidden message’s volume (for example, the data embedding rate, the bit per pixel, the bit per non-zero discrete cosine, the conversion factor, or the ratio of the secret message to a medium). The strength of DNA is calculated in bits per nucleotide (bpn). Thus, one of the main concerns for researchers in this area is improving the potential of secret results, which has previously been accomplished in [13, 18, 21, 22, 32, 34, 36, 37, 41, 46, 48, 50, 53, 56, 59, 60, 61].
As a result, it can be inferred that the primary goal of DNA-based double-layer masking algorithms is to encode sensitive data before hiding it in a high-power, blind, bio-stored, low moderation rate, load-free algorithm, not a pure method, with a high probability crack. In [33, 34, 59] suggested a low moderation rate, preservation of stretch length DNA for contrast, blindness, preservation of DNA versatility, double layer of security, high strength, and not a pure algorithm.
An increase in storage demand has generated a massive demand for creating new and evolving strategies for storing large amounts of data. DNA has recently been recognized as an efficient data carrier with the additional benefit of dependable data storage. DNA’s bio-molecular computing capabilities are being used in cryptography and steganography. This research compares some recent DNA-based steganography algorithms and points out their security flaws. Each algorithm’s advantages and disadvantages are also listed. Some crucial issues are discussed in terms of chance breaking, double layer security, single and double hiding layers, blindness, biologically retained DNA, alteration rate, an extension of DNA comparison, not a pure algorithm, substituting operation, and capacity. This study’s comparison aims to provide researchers with the information they need to perform future tasks on more effective and accurate stable DNA steganography techniques.
Conflict of interest
“The authors declare no conflict of interest.”