Survey of RSA Vulnerabilities

Rivest et al. patented RSA in the United States, and RSA forms the basis of most public-key encryption systems. It describes a public-key encryption algorithm and certification process that protects user data over networks. The patent expired in September 2000, and RSA is now available for general use. According to Marketsandmarkets.com, the global network encryption market size is expected to grow from USD 2.9 billion in 2018 to USD 4.6 billion by 2023, at a compound annual growth rate (CAGR) of 9.8%. Major growth drivers for the market include increasing adoption of optical transmission, an increasing demand to meet various regulatory compliances and a growing focus on shielding organizations from network security breaches. In short, RSA underpins almost all public-key encryption systems. This, however, is not without risk. This chapter explores some of the vulnerabilities of RSA in a mathematical context and provides the reader with an appreciation of the strength of RSA.


Introduction
Rivest et al. patented RSA in the United States, and RSA forms the basis for most public-key encryption systems. RSA describes a public-key encryption algorithm and certification process that protects user data over networks. The patent expired in September 2000, and RSA is now available for general use. According to Marketsandmarkets.com [1], the global network encryption market size is expected to grow from USD 2.9 billion in 2018 to USD 4.6 billion by 2023, at a compound annual growth rate (CAGR) of 9.8%. Major growth drivers for the market include increasing adoption of optical transmission, an increasing demand to meet various regulatory compliances and a growing focus on shielding organizations from network security breaches. In short, RSA underpins almost all public-key encryption systems. This, however, is not without risk. This chapter explores some of the vulnerabilities of RSA in a mathematical context and provides the reader with an appreciation of the strength of RSA.
RSA is considered secure because no known algorithm factorizes an RSA modulus in polynomial time on a conventional computer. Conventional sequential computing machines take an infeasible number of CPU cycles to find factorization solutions to RSA keys. Quantum computing holds great promise; this, however, is realistically still some way off. Opportunities exist using conventional computing (sequential and parallel) with better mathematical techniques. The exploitation of implementation flaws is also discussed.
Of keen interest is our lack of understanding of prime numbers and their structure. The current perception is that there appears to be some underlying structure, but essentially, primes are randomly distributed. This is explored in Sections 8 and 12.
Vulnerabilities in the selection of primes are exploited in Section 5 using Euler's factorization.
Poorly designed RSA keys and their exploits are considered in Section 6 using Wiener's method and in Sections 15-17 using a combination of LLL, Coppersmith and Pohlig-Hellman. All of these attacks can be mitigated by designing the RSA keys with these exploits in mind. An RSA key (Section 2) consists of two parts, a private key $(N, d)$ and a public key $(N, e)$. The composite number $N$ is derived from two prime numbers. The numbers $(d, e)$ are selected in an ad hoc manner using Euler's totient.
Development of quantum computing is continuing at breakneck speed; however, useful machines are yet to appear. Parallel computing, however, is here and now, and whilst factorizing RSA keys is not achievable on conventional computers in polynomial time, parallel computing has allowed for multiple solutions to be tested simultaneously. This is an area where research continues, and new algorithms such as those shown in Sections 20 and 14 lend themselves well to GPU parallel processing systems.

Structure of RSA numbers
Consider the RSA-100 challenge number, RSA-100 = 152260502792253336053561837813263742971806811496138… RSA-100 is a 100-decimal-digit (330-bit) number made up of two 50-decimal-digit prime numbers. The motivation for breaking this composite number is that it allows us to find Euler's totient $\varphi_n$. Once this is known, using the public key $P_U = (N, e)$, it is possible to derive the private key $P_R = (N, d)$, and hence all cypher-text messages encrypted with $e$ can be decrypted back to plain text using $d$.

A simple RSA encryption/decryption example
Two primes $P_1$ and $P_2$ are used to generate a composite number $N$ and its totient $\varphi_n$ (Euler's totient function). Here $P_1 = 1462001$ and $P_2 = 1462009$, so that $N = P_1 P_2 = 2137458620009$.
Calculate the totient: $\varphi_n = (P_1 - 1)(P_2 - 1) = (1462001 - 1)(1462009 - 1) = 2137455696000$.
Choose a public exponent $e$ such that $e$ is an integer coprime to $\varphi_n$ and $1 < e < \varphi_n$; here $e = 13$.
The public key is made up of $N$ and $e$, such that $P_U = (N, e) = (2137458620009, 13)$.
Encrypt a message $m$ into cipher text $C$ with the public key $P_U$. Let the message $m = 1461989$. Then $C = m^e \bmod N = 1461989^{13} \bmod 2137458620009 = 1912018123454$.
To recover the original message, decrypt using the private key $P_R = (N, d) = (2137458620009, 1973036027077)$: $m = C^d \bmod N = 1912018123454^{1973036027077} \bmod 2137458620009 = 1461989$.
From this simple example, consider the following: how can we use a known public key $P_U = (N, e)$ to decrypt the original message? To decrypt the message, the private key is used: $m = C^d \bmod N$. How can $d$ be discovered? $d$ is derived using Euler's totient function, $\varphi_n = (P_1 - 1)(P_2 - 1)$, and the extended Euclidean algorithm, $ed \bmod \varphi_n = 1$. However, when a public key is transmitted, the totient $\varphi_n$ and the two primes $P_1$ and $P_2$ remain secret. If $\varphi_n$, $P_1$ or $P_2$ can be determined, the private key will be compromised and the cypher-text will no longer be secure.
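The key generation and round trip of this example can be checked with a few lines of Python. This is a minimal sketch using the chapter's example values; the function names are illustrative only, and `pow(e, -1, phi)` (Python 3.8 or later) is used here in place of a hand-written extended Euclidean algorithm.

```python
from math import gcd

# Example values from the text.
P1, P2 = 1462001, 1462009
N = P1 * P2
PHI = (P1 - 1) * (P2 - 1)
E = 13

def private_exponent(e, phi):
    """Return d with e*d = 1 (mod phi); requires gcd(e, phi) == 1."""
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to the totient")
    return pow(e, -1, phi)  # modular inverse (Python 3.8+)

def encrypt(m, e, n):
    """Textbook RSA encryption: C = m^e mod N."""
    return pow(m, e, n)

def decrypt(c, d, n):
    """Textbook RSA decryption: m = C^d mod N."""
    return pow(c, d, n)

D = private_exponent(E, PHI)
message = 1461989
cipher = encrypt(message, E, N)
recovered = decrypt(cipher, D, N)
```

Textbook RSA without padding, as sketched here, is only suitable for illustrating the arithmetic.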
When the totient $\varphi_n$ is known, $d$ can be determined through the normal key generation processes, so the determination of the two primes ($P_1$, $P_2$) is not required to recover the message from the cypher-text. The following proof is provided for completeness and shows how the two primes $P_1$, $P_2$ can be recovered if the composite $N$ and the totient $\varphi_n$ are known.

If the composite N and the totient φ n are known, the original primes can be recovered
The quadratic formula can be used to find $P_1$ and $P_2$. Since $\varphi_n = (P_1 - 1)(P_2 - 1) = N - P_1 - P_2 + 1$, the primes can be expressed in terms of $N$ and $\varphi_n$: $P_1 = N - \varphi_n - P_2 + 1$ and $P_2 = N - \varphi_n - P_1 + 1$.
With $N = P_1 P_2$, substituting for $P_2$ gives $N = P_1 (N - \varphi_n - P_1 + 1) = P_1 N - P_1 \varphi_n - P_1^2 + P_1$, that is, $P_1^2 - (N - \varphi_n + 1) P_1 + N = 0$.
When $N$ and $\varphi_n$ are known, $N = 2137458620009$ and $\varphi_n = 2137455696000$, this becomes $P_1^2 - 2924010 P_1 + 2137458620009 = 0$, whose roots are $(2924010 \pm 8)/2$, that is, 1462001 and 1462009.
Using the quadratic formula, $P_1$ and $P_2$ can therefore be recovered if the composite $N$ and the totient $\varphi_n$ are known.
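The recovery of the primes from $N$ and $\varphi_n$ can be sketched in Python as follows. The function name `recover_primes` is hypothetical; the code simply solves the quadratic $P^2 - (N - \varphi_n + 1)P + N = 0$ with exact integer arithmetic.

```python
from math import isqrt

def recover_primes(n, phi):
    """Recover (P1, P2) from N and its totient, or None if inconsistent."""
    s = n - phi + 1          # s = P1 + P2
    disc = s * s - 4 * n     # discriminant = (P1 - P2)^2
    if disc < 0:
        return None
    root = isqrt(disc)
    if root * root != disc or (s - root) % 2:
        return None          # not an exact integer solution
    return (s - root) // 2, (s + root) // 2
```

For the example values, `recover_primes(2137458620009, 2137455696000)` returns the pair `(1462001, 1462009)`.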

Fermat's factorization method
Fermat's method expresses an odd composite as $N = a^2 - b^2 = (a + b)(a - b)$, which is the difference of two squares.
DOI: http://dx.doi.org/10.5772/intechopen.84852
As the first trial for $a$, take $a_1 = \lceil \sqrt{N} \rceil$, then check whether $\Delta a_1 = a_1^2 - N$ is a square number. There are only 22 combinations of the last two digits for which a number can be a square. The other 78 can be eliminated.
If $\Delta a_1$ is not a square number, then try $a_2 = a_1 + 1$, for which $\Delta a_2 = a_2^2 - N = \Delta a_1 + 2a_1 + 1$. Each increment is two larger than the previous one, so the subsequent differences are obtained by adding two.
Consider the example N = 2137458620009.
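A minimal Python sketch of Fermat's method, including the last-two-digits filter, is given below. The names `fermat_factor`, `SQUARE_ENDINGS` and the `max_steps` bound are assumptions made for this illustration. For the example $N$, the first trial $a_1 = 1462005$ already gives $\Delta a_1 = 16 = 4^2$, so the factors are $1462005 \pm 4$.

```python
from math import isqrt

# The 22 possible last-two-digit endings of a square number.
SQUARE_ENDINGS = {(x * x) % 100 for x in range(100)}

def fermat_factor(n, max_steps=1_000_000):
    """Factor an odd n as (a - b)(a + b) with a^2 - n = b^2."""
    a = isqrt(n)
    if a * a < n:
        a += 1                      # a1 = ceil(sqrt(n))
    delta = a * a - n
    for _ in range(max_steps):
        # Cheap filter first, exact square test second.
        if delta % 100 in SQUARE_ENDINGS:
            b = isqrt(delta)
            if b * b == delta:
                return a - b, a + b
        delta += 2 * a + 1          # (a+1)^2 - n = delta + 2a + 1
        a += 1
    return None                     # no factorization within max_steps
```

The method converges quickly only when the two primes are close together, as they are in this example.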

Euler's factorization method
Gaussian primes are of the form $4x - 1$, and primes of the form $4x + 1$ are Pythagorean. Fermat's Christmas theorem on the sum of two squares states that an odd prime can be expressed as $P = x^2 + y^2$ iff $P \equiv 1 \pmod 4$.
Gaussian primes are of the form $P \equiv 3 \pmod 4$ and are not representable as the sum of two squares.
Consider a composite number $N = P_1 P_2$ in which $P_1$ and $P_2$ are both Pythagorean primes. Consider the example N = 2137458620009; find the factorization values of $P_1$ and $P_2$.
The factors are obtained from two sum of two squares representations of $N$ using the greatest common divisor (gcd). As per Section 2, if the composite $N$ and the totient $\varphi_n$ are known, the original primes $P_1$ and $P_2$ can be recovered. Overmars [3] showed that all Pythagorean triples could be represented as $N = n^2 + (n + 2m - 1)^2$. If the composite number $N$ is constructed using two Pythagorean primes ($4x + 1$), then two representations as the sum of two squares can be found, and Euler's factorization method (Section 4) can be applied. Finding these two representations is non-trivial and CPU-intensive. The equation $N(m, n) = n^2 + (n + 2m - 1)^2$ provides a coarse search using increments of $n$ and fine convergence using $m$. In this way $n$ is incremented and $m$ is decremented about $N$ to find the two solutions along the diagonal of a field of $N(m, n) \approx N$. Consider the example $N = 2137458620009$.
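Euler's factorization can be sketched in Python as follows. This is an illustration under stated assumptions: the two representations are found here by a plain brute-force scan rather than by the $N(m, n)$ search described above, $N$ is assumed to be a product of two distinct Pythagorean primes, and the function names are hypothetical.

```python
from math import gcd, isqrt

def odd_even_square_pairs(n, wanted=2):
    """Brute-force search for n = x^2 + y^2 with x odd and y even."""
    pairs = []
    for x in range(1, isqrt(n) + 1, 2):
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest and y % 2 == 0:
            pairs.append((x, y))
            if len(pairs) == wanted:
                break
    return pairs

def euler_factor(n):
    """Euler's method: combine two sum-of-two-squares representations."""
    pairs = odd_even_square_pairs(n)
    if len(pairs) < 2:
        return None                      # no second representation found
    (a, b), (c, d) = pairs
    if a < c:                            # order so that a > c, hence d > b
        a, b, c, d = c, d, a, b
    k = gcd(a - c, d - b)                # both arguments are even
    h = gcd(a + c, d + b)                # both arguments are even
    factor = (k // 2) ** 2 + (h // 2) ** 2
    return factor, n // factor
```

As a small check, $221 = 5^2 + 14^2 = 11^2 + 10^2$ yields the factors 13 and 17.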

Sum of squares
For completeness, $N$ can be represented as two Pythagorean triangles. Once the two sums of two squares have been found, Euler's factorization method (Section 4) can be used to find the prime constructions of $N$. If the composite number $N$ is constructed using Pythagorean primes ($4x + 1$), then a solution exists as two sums of two squares and Euler's factorization method can be applied.

Gaussian and Pythagorean primes
As shown in Section 4, if Pythagorean primes ($4x + 1 \equiv 4x - 3$) are used to construct the composite number $N$, a solution exists as two sums of two squares. However, if $N$ is constructed using Gaussian primes ($4x - 1 \equiv 4x + 3$), then Euler's sum of two squares method cannot be used. Is there a test that we can use to see if the composite has been constructed using Pythagorean primes? (Table 1)
Consider the following composite constructions:
i. Pythagorean prime construction: two sum of two squares representations exist and Euler's factorization can be used. $P \equiv 1 \pmod 4$, $P \equiv 9 \pmod{16}$. See Section 4. Example: $793 = 13 \times 61$.
ii. Gaussian prime construction: sums of three squares exist. $P \equiv 1 \pmod 4$, $P \equiv 9 \pmod{16}$.
iii. Mixed Pythagorean-Gaussian prime construction: sums of four squares exist. $P \equiv 3 \pmod 4$. Example: $13 \times 59 = 767$.
In summary, a composite whose construction is based upon both Pythagorean and Gaussian primes can easily be identified when $P \equiv 3 \pmod 4$ is true. However, sums of four squares exist and Euler's factorization cannot be used. When $P \equiv 1 \pmod 4$ is true, the composite could be constructed using Pythagorean primes or Gaussian primes. Use the Legendre test to further discriminate. When the Pythagorean construct is confirmed, the two sums of two squares can be found, and Euler's factorization can be used. If the composite is constructed using only Gaussian primes, sums of three squares exist and Euler's factorization cannot be used.
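The classification above can be illustrated with a short Python sketch. It is not the Legendre test mentioned in the text: as a simplifying assumption, it discriminates the two $P \equiv 1 \pmod 4$ cases by brute-force counting of sum of two squares representations, and it assumes the composite is a product of two distinct odd primes. The function names and return labels are hypothetical.

```python
from math import isqrt

def two_square_representations(n):
    """All (x, y) with 0 < x <= y and x^2 + y^2 == n (brute force)."""
    reps = []
    for x in range(1, isqrt(n // 2) + 1):
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest:
            reps.append((x, y))
    return reps

def classify_composite(n):
    """Classify a product of two distinct odd primes by its construction."""
    if n % 4 == 3:
        return "mixed"          # one Pythagorean and one Gaussian prime
    if len(two_square_representations(n)) >= 2:
        return "pythagorean"    # Euler's factorization applies
    return "gaussian"           # no sum of two squares exists
```

For example, `classify_composite(793)` reports a Pythagorean construction and `classify_composite(767)` a mixed one.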

Overmars factorization method
Another classification of the composite number uses a different construct for primes and seeks to define the composite number as follows: let $N = P_1 P_2$ and test whether $(N \pm 1) \bmod 4 = 0$. Two cases are considered in the classification, and this determines the constructs of the primes used. Note that the sign of $\pm 1$ determines the case used, and the test is both simple and concise [4].
This method is reasonable for small composites but becomes computationally unfeasible for large composites.

Extensions of the Overmars factorization method
… for all cases. Choosing the largest value of $a$ ensures a rapid convergence to the solution. This is illustrated by example.
When a is small, this method becomes computationally unfeasible.

Overmars factorization using smooth factors
Consider the construction of primes (Sections 8 and 9), $P = a(m \pm n) \pm 1$. More generally, $P$ takes the form listed in Table 2.
When a smooth $x$ can be found, larger $a$ values allow for faster convergence to a solution. The selection of $x$ and $a$ is somewhat arbitrary, and the prime constructs are a modification of Fermat's $a^2 - b^2$. Smooth factors of $N \pm x^2$ produce larger $a$ values and converge faster to a solution.

Primes
The current state of the art in prime number generation is Atkin's sieve [5,6]. The algorithm completely ignores any numbers whose remainder mod 60 is divisible by 2, 3 or 5, since numbers with a mod 60 remainder divisible by one of these three primes are themselves divisible by that prime. Atkin stated the three theorems given below:
1. All numbers $n$ with mod 60 remainder 1, 13, 17, 29, 37, 41, 49 or 53 satisfy $n \equiv 1 \pmod 4$. These numbers are prime if and only if the number of solutions to $4x^2 + y^2 = n$ is odd and the number is squarefree.
2. All numbers $n$ with mod 60 remainder 7, 19, 31 or 43 satisfy $n \equiv 1 \pmod 6$. These numbers are prime if and only if the number of solutions to $3x^2 + y^2 = n$ is odd and the number is squarefree.
3. All numbers $n$ with mod 60 remainder 11, 23, 47 or 59 satisfy $n \equiv 11 \pmod{12}$. These numbers are prime if and only if the number of solutions to $3x^2 - y^2 = n$ is odd and the number is squarefree.
None of these primes is divisible by 2, 3 or 5, nor by their squares ($2^2$, $3^2$ and $5^2$). For a thorough analysis of "primes of the Form $x^2 + ny^2$" the reader is referred to a text by Cox [7].
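The three theorems translate into a compact sieve. The sketch below is the commonly used simplified variant that tests remainders mod 12 rather than the full mod 60 wheel described above, so it illustrates the solution-counting idea rather than Atkin's optimized algorithm. The function name is an assumption.

```python
from math import isqrt

def sieve_of_atkin(limit):
    """Return all primes <= limit using solution-parity flipping (mod 12)."""
    if limit < 2:
        return []
    candidate = [False] * (limit + 1)
    root = isqrt(limit)
    for x in range(1, root + 1):
        for y in range(1, root + 1):
            n = 4 * x * x + y * y          # theorem 1
            if n <= limit and n % 12 in (1, 5):
                candidate[n] = not candidate[n]
            n = 3 * x * x + y * y          # theorem 2
            if n <= limit and n % 12 == 7:
                candidate[n] = not candidate[n]
            n = 3 * x * x - y * y          # theorem 3 (requires x > y)
            if x > y and n <= limit and n % 12 == 11:
                candidate[n] = not candidate[n]
    # Remove numbers that are not squarefree.
    for n in range(5, root + 1):
        if candidate[n]:
            for multiple in range(n * n, limit + 1, n * n):
                candidate[multiple] = False
    small = [p for p in (2, 3) if p <= limit]
    return small + [n for n in range(5, limit + 1) if candidate[n]]
```

A number survives only if its count of solutions is odd and it is squarefree, which mirrors the conditions in the theorems.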
The often overlooked works of Dubner, who is credited with the term "primorial" [8], are now considered [9,10]. The primorial is a factorial of primes: the nth primorial is the product of the first $n$ primes, $p_n\# = \prod_{k=1}^{n} p_k$, and for a natural number $n$, $n\# = \prod_{k=1}^{\pi(n)} p_k$, where $\pi(n)$ is the prime counting function.
Using this structure, Dubner was able to create series of primes in a particular primorial.
Repeat these steps for $P_{29}\#$ and so on… (Table 4). The conversion to a decimal from the base primorial (Section 12) provides $P_1$ and $P_2$.
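The two primorial definitions used in this section can be computed with the short Python sketch below. The function names are illustrative, and simple trial division is used because only small primes are involved.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True

def primorial(n):
    """n# : product of all primes less than or equal to n."""
    result = 1
    for p in range(2, n + 1):
        if is_prime(p):
            result *= p
    return result

def nth_primorial(k):
    """p_k# : product of the first k primes."""
    result, count, p = 1, 0, 2
    while count < k:
        if is_prime(p):
            result *= p
            count += 1
        p += 1
    return result
```

For instance, `primorial(11)` and `nth_primorial(5)` both give 2310, and `primorial(29)` gives 6469693230.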

Lenstra-Lenstra-Lovász lattice reduction (LLL)
The LLL algorithm forms the basis of the Coppersmith attack (Section 15), and a brief explanation is given here with further reading and references for the reader. Given a basis $\mathbf{B} = \{b_1, b_2, \ldots, b_d\}$ with n-dimensional integer coordinates for a lattice $L$ (a discrete subgroup of $\mathbb{R}^n$) with $d \le n$, the Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm [13] calculates an LLL-reduced (short, nearly orthogonal) lattice basis in time $O(d^5 n \log^3 B)$, where $B$ is the largest length of $b_i$ under the Euclidean norm. Its original application was the polynomial-time factorization of polynomials with rational coefficients.
A thorough explanation is given by Bosma [14], and a summary of the example contained in the reference is given below. Using the Lenstra-Lenstra-Lovász lattice reduction (LLL), the short vectors in a lattice can be found. This is used by the Coppersmith attack. Coppersmith's algorithm uses the LLL to construct polynomials with small coefficients that all have the same root modulo a power of the modulus. When a linear combination is found to meet inequality conditions, standard factorization methods can find the solutions over the integers.
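To make the reduction concrete, a textbook LLL implementation in exact rational arithmetic is sketched below. It is a teaching sketch rather than the example from Bosma [14]: it recomputes the Gram-Schmidt data from scratch after every change, which is simple but slow, and it assumes the conventional parameter $\delta = 3/4$. The function names are illustrative.

```python
from fractions import Fraction

def dot(u, v):
    """Exact inner product of two vectors."""
    return sum(Fraction(a) * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Return the orthogonalized vectors and the coefficients mu[i][j]."""
    ortho = []
    mu = [[Fraction(0)] * len(basis) for _ in basis]
    for i, b in enumerate(basis):
        v = [Fraction(x) for x in b]
        for j in range(i):
            mu[i][j] = dot(b, ortho[j]) / dot(ortho[j], ortho[j])
            v = [vi - mu[i][j] * oj for vi, oj in zip(v, ortho[j])]
        ortho.append(v)
    return ortho, mu

def lll(basis, delta=Fraction(3, 4)):
    """LLL-reduce a list of linearly independent integer row vectors."""
    basis = [list(b) for b in basis]
    ortho, mu = gram_schmidt(basis)
    k = 1
    while k < len(basis):
        # Size reduction of b_k against b_{k-1}, ..., b_0.
        for j in range(k - 1, -1, -1):
            if abs(mu[k][j]) > Fraction(1, 2):
                q = round(mu[k][j])
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
                ortho, mu = gram_schmidt(basis)
        # Lovasz condition decides whether to advance or swap.
        lhs = dot(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        if lhs >= rhs:
            k += 1
        else:
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            ortho, mu = gram_schmidt(basis)
            k = max(k - 1, 1)
    return basis
```

For the basis (1, 1, 1), (-1, 0, 2), (3, 5, 6), the reduced basis is (0, 1, 0), (1, 0, 1), (-1, 0, 2), whose first vector is much shorter than any of the inputs.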

Coppersmith attack
When $d$ is small, $e$ is large via the Euler totient rule $ed \equiv 1 \pmod{\varphi_n}$, and the Wiener attack (Section 5) can be used. Conversely, when $d$ is large, $e$ can be small. Particular applications of the Coppersmith method for attacking RSA include cases when the public exponent $e$ is small or when partial knowledge of the secret key is available (Section 13) [15].
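The small-$d$ case can be illustrated with a sketch of Wiener's continued fraction attack. The code below is an illustration with hypothetical function names: it walks the convergents $k/d$ of $e/N$ and accepts a candidate $d$ when the implied totient yields integer prime factors. The toy key $N = 90581$, $e = 17993$ (with $d = 5$) is an assumption chosen for the example and does not come from this chapter.

```python
from math import isqrt

def convergents(num, den):
    """Yield the continued-fraction convergents (h, k) of num/den."""
    h_prev, h = 0, 1
    k_prev, k = 1, 0
    while den:
        q = num // den
        num, den = den, num - q * den
        h_prev, h = h, q * h + h_prev
        k_prev, k = k, q * k + k_prev
        yield h, k

def wiener_attack(e, n):
    """Return a small private exponent d if Wiener's attack succeeds."""
    for k, d in convergents(e, n):
        if k == 0 or (e * d - 1) % k:
            continue
        phi = (e * d - 1) // k           # candidate totient
        s = n - phi + 1                  # candidate P1 + P2
        disc = s * s - 4 * n
        if disc < 0:
            continue
        root = isqrt(disc)
        if root * root == disc and (s + root) % 2 == 0:
            return d
    return None
```

The attack is only expected to succeed when $d$ is below roughly $N^{1/4}/3$, which is why sufficiently large private exponents are recommended.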
A small public exponent $e$ reduces the encryption time. Common choices for $e$ are 3, 17 and 65537 ($2^{16} + 1$) [16]. These are Fermat primes $F_x = 2^{2^x} + 1$ and are chosen because modular exponentiation with them is faster. The Coppersmith method reduces the solving of modular polynomial equations to solving polynomial equations over the integers.
Let $F(x) = x^n + a_{n-1}x^{n-1} + \ldots + a_1 x + a_0$ and $F(x_0) \equiv 0 \pmod M$ for an integer $|x_0| < M^{1/n}$. Coppersmith's method can find the integer solution for $x_0$ by finding a different polynomial $f$ related to $F$ that has the root $x_0 \bmod M$ but only has small coefficients. The small coefficients are constructed using the LLL (Section 14). Given $F$, the LLL constructs polynomials $p_1(x), p_2(x), \ldots, p_n(x)$ that all have the same root $x_0 \bmod M^a$, $a \in \mathbb{Z}$, where $a$ depends on the degree of $F$ and the size of $x_0$. Any linear combination has the same root $x_0 \bmod M^a$.
The next step is to use LLL to construct a linear combination $f(x) = \sum c_i p_i(x)$ of the $p_i(x)$ so that the inequality $|f(x_0)| < M^a$ holds. Then standard factorization provides the zeroes of $f(x)$ over $\mathbb{Z}$. Let $N$ be an integer and $f \in \mathbb{Z}[x]$ be a monic polynomial of degree $d$ over the integers; then all integers $x_0 < X$ with $f(x_0) \equiv 0 \pmod N$ can now be found. All roots of $f \bmod N$ smaller than $X = N^{1/d}$ can be found.

Pohlig-Hellman
The Pohlig-Hellman [17] algorithm is a method to compute a discrete logarithm (which is a difficult problem) on a multiplicative group whose order is a smooth number (also called friable), meaning its order can be factorized into small primes. A positive integer is called B-smooth if none of its prime factors is greater than B. For example, 1620 has prime factorization $2^2 \times 3^4 \times 5$; therefore 1620 is 5-smooth because none of its prime factors is greater than 5. This is similar to that of the Overmars factorization method (Section 10). The core of the Pohlig-Hellman [17] algorithm applies to groups whose order is a prime power. The basic idea is to iteratively compute the p-adic digits of the logarithm by repeatedly "shifting out" all but one unknown digit in the exponent and computing that digit by elementary methods. This is a similar idea to Section 13.
INPUT: A cyclic group $G$ of order $n$ with a generator $g$, an element $h \in G$, and a prime factorization $n = \prod_{i=1}^{r} p_i^{e_i}$.
OUTPUT: The unique integer $x \in \{0, \ldots, n - 1\}$ such that $g^x = h$.
As an example, solve $12 \equiv 7^x \pmod{41}$.
1. Find the prime factors of $p - 1$: $41 - 1 = 40 = 2^3 \cdot 5$, so the prime bases are $g = 2, 5$. Find one $x$ for each $g$.
2. For $g = 2$, $x = 2^0 x_0 + 2^1 x_1 + 2^2 x_2$, which gives $x \equiv 5 \pmod 8$. Now we need another $x$ from the other $g$.
3. For $g = 5$, $x = 5^0 x_0$; there is only one 5, so only one term, which gives $x \equiv 3 \pmod 5$.
By the Chinese remainder theorem, $x \equiv 13 \pmod{40}$, since the exponents are taken modulo $p - 1 = 41 - 1 = 40$; hence $12 \equiv 7^{13} \pmod{41}$. So the solution to $12 \equiv 7^x \pmod{41}$ is $x = 13$.
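The worked example can be reproduced with the Python sketch below, which implements Pohlig-Hellman for the multiplicative group modulo a prime $p$. It assumes that $g$ generates the whole group (order $p - 1$), uses trial division to factor $p - 1$ and a linear search for each digit, so it is only suitable for small, smooth group orders. The function names are illustrative.

```python
def factorize(n):
    """Trial-division factorization returning {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def dlog_prime_power(g, h, p, q, e):
    """Find x mod q^e with g^x = h (mod p), one base-q digit at a time."""
    order = p - 1
    gamma = pow(g, order // q, p)            # element of order q
    x = 0
    for k in range(e):
        shifted = pow(g, -x, p) * h % p      # strip the digits found so far
        target = pow(shifted, order // q ** (k + 1), p)
        digit = next(d for d in range(q) if pow(gamma, d, p) == target)
        x += digit * q ** k
    return x

def crt(residues, moduli):
    """Combine x = r_i (mod m_i) for pairwise coprime moduli."""
    x, m = 0, 1
    for r, mod in zip(residues, moduli):
        t = (r - x) * pow(m, -1, mod) % mod
        x += m * t
        m *= mod
    return x % m

def pohlig_hellman(g, h, p):
    """Discrete logarithm of h to base g modulo prime p (g a generator)."""
    residues, moduli = [], []
    for q, e in factorize(p - 1).items():
        residues.append(dlog_prime_power(g, h, p, q, e))
        moduli.append(q ** e)
    return crt(residues, moduli)
```

For the example above, `pohlig_hellman(7, 12, 41)` returns 13. The modular inverses rely on `pow` with a negative exponent, available from Python 3.8.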

Shor's algorithm
Shor's algorithm [18] factors composite numbers $N = P_1 P_2$, consisting of two primes, in polynomial time using quantum computing techniques. The algorithm evaluates the period of $a^x \bmod N$, where $\gcd(a, N) = 1$. This is inefficient using sequential computing on a conventional computer. When run on a quantum computer, a congruence of squares with probability 0.5 occurs in polynomial time. For two co-prime sinusoids of period $P_1$ and $P_2$, at what point do they zero-cross each other? The phase of each sinusoid at any given point is observed, and if they are factors of $N$ then the phase of $P_1$ and $P_2$ is zero. Shor's algorithm tests the phase of $P_1 = P_2 = N = 0$ (Figure 4).
Figure 4. N as a composite of two sinusoids $P_1$ and $P_2$ [19].
Phase estimation is well suited to quantum computers and hence this factorization technique produces solutions in polynomial time. For further information on quantum phase estimation, the reader is directed to WIKI [20]. The impact of this type of attack is discussed in detail by Mosca [21].
1. Choose $a < N$.
2. Find the period $r$ of $a^x \bmod N$ (using quantum computing).
3. Check that $r$ is even and that $a^{r/2} + 1 \not\equiv 0 \pmod N$.
The factors of $N$ are then given by $\gcd(a^{r/2} \pm 1, N)$. For the example $N = 35$, Euler's factorization (Section 6) cannot be used because 7 has no sum of squares nor does 35.
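The classical post-processing of Shor's algorithm can be sketched in Python. In this illustration the period is found by brute force, which stands in for the quantum period-finding step and is only feasible for tiny $N$. The function names are hypothetical.

```python
from math import gcd

def find_period(a, n):
    """Smallest r > 0 with a^r = 1 (mod n); brute force, gcd(a, n) == 1."""
    r, value = 1, a % n
    while value != 1:
        value = value * a % n
        r += 1
    return r

def shor_classical(n, a):
    """Try to split n using the period of a; return factors or None."""
    common = gcd(a, n)
    if common != 1:
        return common, n // common       # lucky guess shares a factor
    r = find_period(a, n)
    if r % 2:
        return None                      # odd period: choose another a
    half = pow(a, r // 2, n)
    if (half + 1) % n == 0:
        return None                      # trivial square root: choose another a
    return gcd(half - 1, n), gcd(half + 1, n)
```

With $N = 35$ and $a = 2$, the period is 12, $2^6 \bmod 35 = 29$, and the factors are $\gcd(28, 35) = 7$ and $\gcd(30, 35) = 5$.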

Attacking public key infrastructure
Public infrastructure cryptographic hardware uses a library, RSALib. It is found in devices certified to both NIST FIPS 140-2 and CC EAL 5+. These certified devices are used in identity cards, passports, Trusted Platform Modules, PGP and tokens for authentication and software signing. The library is in use in tens of millions of devices worldwide. Nemec et al. [22] have identified a vulnerability that allows for the factorization of 1024 and 2048 bit keys in less than 3 CPU months.
RSALib primes are of the form $p = k \cdot M + (65537^a \bmod M)$. These can be fingerprinted using the discrete logarithm $\log_{65537} N \bmod M$.
The public modulus $N$ is generated by 65537 in the multiplicative group $\mathbb{Z}_M^*$. The public modulus of RSALib can thus be fingerprinted with the discrete logarithm $c = \log_{65537} N \bmod M$. This discrete logarithm can be computed using Pohlig-Hellman (Section 16). The group $G = \langle 65537 \rangle$ is smooth: $|G| = 2^4 \cdot 3^4 \cdot 5^2 \cdot 7 \cdot 11 \cdot 13 \cdot 17 \cdot 23 \cdot 29 \cdot 37 \cdot 41 \cdot 53 \cdot 83$ for RSA 512 keys. The smoothness of $G$ is due to $M$ being a primorial.
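The fingerprinting idea can be illustrated at toy scale. The sketch below is not RSALib and not the detection tool of Nemec et al.: it assumes a small hypothetical primorial $M = 30030$, builds toy primes of the stated form, and tests membership of $N \bmod M$ in the subgroup generated by 65537 by simple enumeration instead of Pohlig-Hellman. All names are illustrative.

```python
from math import prod

GENERATOR = 65537
M = prod([2, 3, 5, 7, 11, 13])   # toy primorial; real values of M are far larger

def is_prime(n):
    """Trial-division primality test, adequate for the toy sizes used here."""
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True

def toy_structured_prime(a, k_start=1):
    """Smallest prime of the form k*M + (65537^a mod M) with k >= k_start."""
    residue = pow(GENERATOR, a, M)
    k = k_start
    while not is_prime(k * M + residue):
        k += 1
    return k * M + residue

def has_fingerprint(n):
    """True if n mod M is a power of 65537 in the group of units mod M."""
    target = n % M
    value = 1
    while True:
        if value == target:
            return True
        value = value * GENERATOR % M
        if value == 1:               # cycled through the whole subgroup
            return False
```

Because both toy primes are powers of 65537 modulo $M$, so is their product, and the fingerprint test accepts it. A modulus that shares a factor with $M$ can never match.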
Factorization is achieved using the Coppersmith algorithm with a known $p \bmod M = 65537^a \bmod M$. Nemec et al. used the Howgrave-Graham [23] implementation of Coppersmith's algorithm to find a small solution $x_0$ of the resulting modular polynomial equation.
A summary of the RSALib vulnerability and its impact is now given, and the reader is directed to Nemec et al. [22] for further detail. eIDs used in passports for citizens are affected. Code signing is vulnerable. Twenty-four percent of TPMs used in laptops are affected (sample size 41). A third of PGP keys, used in email systems, could be factorizable. There was no observable impact on TLS/HTTPS. One hundred percent of SCADA systems sampled were affected (sample 15). E-health and EMV payment cards were also likely to be susceptible.
Mitigating the impact of the RSALib vulnerability requires changing the algorithm. This requires a firmware replacement, which is not possible in already deployed devices such as smartcards and TPMs whose code is stored in read-only memory. Key lengths other than 512, 1024, 2048 and 4096, such as RSA 3936, appear to be resilient. Key pairs could be generated outside of vulnerable devices using another library. Changes to RSALib are required so that provably safe primes are constructed without the vulnerability.

Overmars factorization, bringing it together
Section 11 considered the following cases. The following discussion generalizes these cases and provides the structure for algorithmic solutions to be found. The palindromic nature of primes (Section 12) can be exploited further to explore solutions in a particular primorial range. Now we need to develop the methodology for finding (selecting) $a$ and $x$. This brings together the concepts of primorials [9], smooth numbers [24], small factors [17], factorization (Fermat), modulo testing as per Atkin's sieve [5] and the structure of primes (Sections 12 and 18), to find as large an $a$ as possible so that the Overmars factorization [4] converges more rapidly to a solution.
Recall the following (Section 12). Primes are of the form $P = 4x \pm 1$ and $P = 6x \pm 1$. Composite numbers constructed from these primes, $N = P_1 P_2$, are a combination of Pythagorean and Gaussian primes. The test $(N \pm 1) \bmod 4 \equiv 0$ can be used to determine which combination of primes was used to construct the composite. If $(N + 1) \bmod 4 \equiv 0$ is true, a mix of Pythagorean and Gaussian primes was used. If $(N - 1) \bmod 4 \equiv 0$ is true, then the composite consists of only Gaussian or only Pythagorean primes.
The Sieve of Atkin [5] uses mod 12 ≡ 0 and mod 60 ≡ 0. This is now applied as per Overmars [4] in the following manner: if mod 12 ≡ 0 is true then $a = 6$; if mod 60 ≡ 0 is true then $a = 30$. The ideas of Atkin are further extended in both directions: mod 4 ≡ 0 ⇒ $a = 2$, mod 420 ≡ 0 ⇒ $a = 210$, mod 4620 ≡ 0 ⇒ $a = 2310$. This is primorial, $P_k\#$, and the kth primorial is "smooth".
The general form (Section 19) is now given. Case (1 ⊕⊝, 2 ⊝⊕): $\frac{N + x^2}{a} = a(m^2 - n^2)$. If $a = 2P_k\#$ can be chosen, then $x$ is searched over the primes to find solutions for cases (1 ⊕⊝, 2 ⊝⊕) such that $N \bmod P_1 \equiv 0$. From Table 5, determining which $x$ value should be used is not clear. Whilst $x = 1$ should work, no solutions will be found if $a = 30$.
Recall the starting value for $m$, which is obtained from a square root. Whilst this is quite a good result, the first failure also needs to be taken into account. This would be bounded by the primorial. Here $2a \le 123 \le m < 134663$, which gives 134540 iterations. This can be further bounded by the primorial. In the case of RSA numbers, the binary bits available to represent a particular prime number range can also be used to bound the range (Table 6).
In this case, solutions using modulo testing generate good candidates to solve for $(m, n)$; however, for $a = 30030$, three of the candidates have no solution. Using sequential programming, each possible candidate is considered one after another, until the maximum $m$ value. However, using parallel programming techniques on GPUs (such as NVIDIA P100s), all of the candidates can be tested simultaneously and the processes are all terminated when one of the processes finds a solution. This is very efficient and effective in finding $P_1$, $P_2$. Once these are known, along with the public key $P_U = (N, e)$, using Euler's totient, the private key $P_R = (N, d)$ can be determined. Once the private key is known, the cypher-text is no longer secure.
Table 5. Smooth candidates of the factors of $N - x^2$ (column headings: x; modulo testing; N − …).
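The "test all candidates at once and stop on the first success" pattern described in this section can be sketched structurally in Python. This is an illustration under explicit assumptions: CPU threads stand in for GPU kernels (Python threads do not give true parallel speed-up), and a Fermat difference-of-squares test stands in for the Overmars candidate test, whose details are not reproduced here. The function names, `workers` and `chunk` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from math import isqrt

def scan_chunk(n, start, stop, found):
    """Scan candidates a in [start, stop) until a^2 - n is a square."""
    for a in range(start, stop):
        if found.is_set():
            return None                  # another worker already succeeded
        delta = a * a - n
        b = isqrt(delta)
        if b * b == delta:
            found.set()                  # signal all other workers to stop
            return a - b, a + b
    return None

def parallel_search(n, workers=4, chunk=10_000):
    """Split a bounded candidate window across workers; first hit wins."""
    first = isqrt(n)
    if first * first < n:
        first += 1
    found = threading.Event()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(scan_chunk, n, first + i * chunk,
                        first + (i + 1) * chunk, found)
            for i in range(workers)
        ]
        for future in as_completed(futures):
            result = future.result()
            if result is not None:
                return result
    return None                          # nothing found in the window
```

The shared event plays the role of the termination signal: as soon as one worker finds a solution, the remaining workers abandon their ranges.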

Conclusion
In short, RSA is secure and difficult to factorize. Conventional sequential computing machines take an infeasible number of CPU cycles to find factorization solutions to RSA keys. Quantum computing holds great promise, and Shor's algorithm [18] demonstrates how this can be achieved. However, quantum computing is realistically still some way off. Opportunities exist using conventional computing (sequential and parallel) with better mathematical techniques. Section 18 showed how implementation vulnerabilities are introduced when "clever" low-cost (in CPU cycles) shortcuts are implemented. The case in point showed methods for signature identification, upon which tailored targeted attacks could be launched against infrastructure FIPS 140-2 devices, such as cryptographic routers. These sorts of attacks can be deployed in polynomial time using sequential programming techniques. Section 20 (Overmars) shows how factorization can be implemented using parallel processing techniques.
There is still much to be done, and areas of further interest include a better understanding of the structure of primes. This will lead to faster prime number generating algorithms and hence faster solutions to the factorization problem. This will also lead to the generation of more robust primes that are less susceptible to factorization methods. An example of this is the use of non-Pythagorean primes. Section 5 showed how Euler's factorization could be used to attack composites built from Pythagorean primes. Hence a simple method to thwart this would be to use a mix of Pythagorean and Gaussian primes. Section 6 showed how small $d$ values in the RSA private key $P_R = (N, d)$ could be attacked using Wiener's method. Small $e$ values in the public key $P_U = (N, e)$ can be attacked using a combination of LLL, Coppersmith and Pohlig-Hellman (Sections 15-17). All of these attacks can be mitigated by choosing $d$ and $e$ carefully and ensuring that both are sufficiently large.
Development of quantum computing is continuing at breakneck speed; however, useful machines are yet to appear. Parallel computing, however, is here and now, and whilst factorizing RSA keys is not achievable on conventional computers in polynomial time, parallel computing has allowed for multiple solutions to be tested simultaneously. This is an area where research continues, and new algorithms such as those shown in Sections 20 and 14 lend themselves well to GPU parallel processing systems.
"There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know" [25].