Possible composite constructs using Pythagorean and Gaussian primes.
Rivest et al. patented (US) RSA. RSA forms the basis of most public encryption systems. It describes a public key encryption algorithm and certification process, which protects user data over networks. The patent expired in September 2000 and now is available for general use. According to Marketsandmarkets.com, the global network encryption market size is expected to grow from USD 2.9 billion in 2018 to USD 4.6 billion by 2023, at a compound annual growth rate (CAGR) of 9.8%. Major growth drivers for the market include increasing adoption of optical transmission, an increasing demand to meet various regulatory compliances and a growing focus on shielding organizations from network security breaches. In short, RSA forms the basis of almost all public encryption systems. This, however, is not without risk. This chapter explores some of these vulnerabilities in a mathematical context and provides the reader with an appreciation of the strength of RSA.
- public keys
Rivest et al. patented (US) RSA, which forms the basis for most public encryption systems. RSA describes a public key encryption algorithm and certification process, which protects user data over networks. The patent expired in September 2000 and now is available for general use. According to
RSA is secure and difficult to factorize in polynomial time. Conventional sequential computing machines, running in polynomial time, take an unfeasible amount of CPU cycles to find factorization solutions to RSA keys. Quantum computing holds great promise; this, however, is realistically still some way off. Opportunities exist using conventional computing (sequential and parallel) using better mathematical techniques. A discussion on exploiting implementation flaws is also considered.
Of keen interest is our lack of understanding of prime numbers and their structure. The current perception is that there appears to be some underlying structure, but essentially, primes are randomly distributed. This is explored in Sections 8 and 12. Vulnerabilities in the selection of primes are exploited in Section 5 using Euler’s factorization.
Poor RSA key design and their exploits are considered in Section 6 using Wiener’s method and in Sections 15–17 using a combination of LLL, Coppersmith and Pohlig-Hellman. All of these attacks can be mitigated by designing the RSA keys with these exploits in mind. RSA key design (Section 2) consists of two parts, a private key and a public key . A composite number N, is derived from two prime numbers. The numbers are selected in an ad hoc manner using Euler’s totient.
Development of quantum computing is continuing at breakneck speed; however useful machines are yet to appear. Parallel computing however is here and now, and whilst factorizing RSA keys is not achievable on conventional computers in polynomial time, parallel computing has allowed for multiple solutions to be tested simultaneously. This is an area where research continues and new algorithms as shown in Sections 20 and 14 lend themselves well to GPU parallel processing systems.
2. Structure of RSA numbers
Consider RSA100 challenge number
RSA100 is a 100 binary bit number made up of two 50 binary bit prime numbers. The motivation in breaking this composite number allows us to find the Euler’s totient number . Once this is known, using the public key , it is possible to derive the private key , and hence all cypher-text encrypted (e) messages can thus be decrypted back to plain text, using (d).
3. A simple RSA encryption/decryption example
Using two primes P1 and P2 to generate a composite number N,
Totient φ (Euler’s totient function)
Calculate totient φn = (P1 − 1) (P2 − 1) = (1462001 − 1) (1462009 − 1) = 2137455696000
Arbitrarily choose a public key such that e is an integer, not a factor of , and e = 13
The public key is made up of N and e, such that . A private key is made up of N and d, such that .
, is determined using the extended Euclidean algorithm.
. Therefore, private key, .
Encrypt a message m, into cipher text C, with public key PU. Let the message m = 1461989. . To recover the original message, decrypt using Private Key, PR= (N, d) = (1912018123454, 1973036027077)
From this simple example, consider the following: How can we use a known public key PU = (N,e) to decrypt the original message? To decrypt the message, the private key is used: . How can d, be discovered? d is derived using Euler’s totient function [φn = (P1 – 1) (P2 – 1)], and the extended Euclidean algorithm . However when a public key is transmitted, the totient φn and the two primes P1 and P2 remain secret. If φn, P1 or P2 can be determined, the private key will be compromised and the cypher-text will no longer be secure.
When the totient φn is known, d can be determined through the normal key generation processes, so the determination of the two primes (P1, P2) is not required to recover the message from the cypher-text. The following proof is provided for completeness and shows how the two primes P1, P2 can be recovered if the composite N and the totient φn are known.
4. If the composite N and the totient are known, the original primes can be recovered
The quadratic formula can be used to find P1 and P2 . General quadratic form:
Express primes in terms of N, φn P1 = N−φn−P2 + 1, P2 = N−φn−P1 + 1substitute for P2 ⟹ N = P1 (N−φn−P1 + 1) = P1 N−P1 φn – P12 + P1
When N and φn are known: N = 2137458620009, φn = 2137455696000
Using the quadratic formula, P1 and P2 can be recovered if the composite N and the totient φn are known.
5. Fermat’s factorization method
is the difference of two squares.
As the first trial for a, then check if is a square number.
There are only 22 combinations of which the last two digits are a square number. The other 78 can be eliminated.
If is not a square number, then .
so the subsequent differences are obtained by adding two.
Consider the example N = 2137458620009.
Check if is a square number.
Maurice Kraitchik, a Belgian mathematician, considered only values of and .
6. Euler’s factorization method
Gaussian primes are of the form , and primes of the form are Pythagorean. Fermat’s Christmas theorem on sum of two squares states that an odd prime can be expressed as iff .
Gaussian primes are of the form and are not representable as the sum of two squares.
Consider a composite number = P1P2 and : , .
Consider the example N = 2137458620009; find the factorization values of P1 and P2.
Using the sum of squares, .
Combining the even and odds: 14255602-13127202 = 6436032-3244032.
Using the greatest common divisor (gcd):
7. Wiener attack
Wiener’s theorem. Let and and a private key and a public key Let , given a public key , with φn. The attacker can efficiently recover d . The attack uses the continued fraction method to expose the private key d, when d is small. It assumes . Consider a public key
As per Section 2, if the composite N and the totient φn are known, the original primes P1 and P2 can be recovered.
8. Sum of squares
Overmars  showed that all Pythagorean triples could be represented as . If the composite number N, is constructed using two Pythagorean primes (4x + 1) then two representations as the sum of two squares can be found. Euler’s Factorization Method (Section 4) can be applied. Finding these two representations is non-trivial and CPU-intensive. The equation provides a course search using increments of n and fine convergence using m. In this way n is incremented and m is decremented about N to find the two solutions along the diagonal of a field of .
Consider the example, .
For completeness N can be represented as two Pythagorean triangles as shown  ∆(m,n)=∆(a,b,c).
Once the two sum of two squares has been found, Euler’s factorization method (Section 4), can be used to find the prime constructions of .
If the composite number (N) is constructed using Pythagorean primes (), then a solution exists as two sums of two squares and Euler’s factorization method can be applied.
9. Gaussian and Pythagorean primes
As shown in Section 4, if Pythagorean primes () are used to construct the composite number (N), a solution exists as two sums of two squares. However, if N is constructed using Gaussian primes (), then Euler’s sum of two squares method cannot be used. Is there a test that we can use to see if the composite has been constructed using Pythagorean primes? (Table 1)
|4x − 1||4x + 1||11||13|
|4y – 1||59||649||767|
|4y + 1||61||671||793|
Consider the following composite constructions:
using Pythagorean primes
using Gaussian primes
using a mix of Pythagorean and Gaussian primes
i. Pythagorean prime construction Two sum of two squares representations exist and Euler’s factorization can be used. . . See Section 4.
ii. Gaussian prime construction Sums of three squares exist. . .
Legendre’s three-square theorem can test the composite:
iii. Mixed Pythagorean-Gaussian prime construction Sums of four squares exist. .
In summary, a composite whose construction is based upon both Pythagorean and Gaussian primes can easily be identified when is true. However, sums of four squares exist and Euler’s factorization cannot be used. When is true, the composite could be constructed using Pythagorean primes or Gaussian primes. Use the Legendre test to further discriminate. When the Pythagorean construct is confirmed, the two sums of two squares can be found, and Euler’s factorization can be used. If the composite construction is both Pythagorean and Gaussian, sums of three squares exist and Euler’s factorization cannot be used.
10. Overmars factorization method
Another classification of the composite number uses a different construct for primes and seeks to define the composite number as follows: Let and test . Two cases are considered in the classification, and this determines the constructs of the primes used. Note the sign of ±1 determines the case used, and the test is both simple and concise .
This method is reasonable for small composites but becomes computationally unfeasible for large composites.
11. Extensions of the Overmars factorization method
Case (1, 2)
Case (3, 4)
for all cases. Choosing the largest value of a ensures a rapid convergence to the solution. This is illustrated by example.
Consider (Section 8)
N + 1 is the better candidate, as it has more factors to try. So cases (1,2) are considered.
When a is small, this method becomes computationally unfeasible.
12. Overmars factorization using smooth factors
Consider the construction of primes (Sections 8 and 9), . More generally, Consider (Table 2).
|x||N − x2||±x||a||m||n||gcd(m,n)||Smoothness|
When a smooth x can be found, larger a values allow for faster convergence to a solution. The selection of x and a is somewhat arbitrary and prime constructs are a modification of Fermat’s a2 − b2. Smooth factors of produce larger a values and convergence faster to a solution.
The algorithm completely ignores any numbers with remainder mod 60 that is divisible by 2, 3 or 5, since numbers with a mod 60 remainder divisible by one of these three primes are themselves divisible by that prime. Atkin stated three theorems given below:
All numbers n with mod 60 remainder 1, 13, 17, 29, 37, 41, 49 or 53 are mod 4 ≡ 1. These numbers are prime if the number of solutions to 4x2 + y2 = n is odd and the number is squarefree.
All numbers n with mod 60 remainder 7, 19, 31 or 43 have a mod 6 ≡ 1. These numbers are prime if and only if the number of solutions to 3x2 + y2 = n is odd and the number is squarefree.
All numbers n with mod 60 remainder 11, 23, 47 or 59 have a mod 12 ≡ 11. These numbers are prime if and only if the number of solutions to 3x2 − y2 = n is odd and the number is squarefree.
None of the primes are divisible by 2, 3 or 5 and are not divisible by their squares (22, 32, and 52). For a thorough analysis of “primes of the Form x2 + ny2” the reader is referred to a text by Cox .
The often overlooked works of Dubner, who is credited with the term “primorial”  are now considered [9, 10]. The primorial is a factorial of primes: and so on. . The primorial is by definition squarefree.
The nth primorial is the product of n primes, where is the prime counting function.
Using this structure, Dubner was able to create series of primes in a particular primorial.
It can be shown that the structure of primes is palindromic in the primorials .
For example, in Figure 1, take the discrete derivative of the numbers in the third primorial, 3#. The following palindromic sequence can be added to and subtracted from to determine all of the primes in that primorial:
This describes the second table in Figure 1. All of the primes in the third primorial can be found using 24 small numbers. Mod 7 is used to sieve and eliminate composite multiples of 7. Mod 11 and 13 are used to highlight further composites, but these are kept and used to generate primes in the next primorial.
, Current primorial and the difference between primes from the previous. Simple array descriptor provides rich prime fields of higher densities. Small numbers describe primes of higher magnitude. Large arrays of primes can be stored in much less memory.
14. Number systems
Conventional numbering systems consist of a base (or radix).
The primorial number system is said to be ‘primoradic’; having a primorial base. The primorial number system is a mixed radix numeral system adapted to the numbering of the primorials (Table 3).
General properties of mixed radix number systems apply to the base primorial system. The primorial number system OEIS A000040 is denoted by a subscript “∏”.
Consider the following example:
Primorial to decimal, Base∏ to Base10
3 4 1 0 1∏ stands for 3443120110, whose value is
Decimal to primorial, Base10 into Base∏
75710 into a primorial representation by successive divisions:
757 ÷ 2 = 231, remainder 1
378 ÷ 3 = 126, remainder 0
126 ÷ 5 = 25, remainder 1
25 ÷ 7 = 3, remainder 4
3 ÷ 11 = 3, remainder 3 => 3 4 1 0 1∏
15. RSA100 factorization using primorials
Not symmetrical about square root 
Symmetrical about square root.
Consider each congruency and look for a factorization that is symmetrical about the square root.
In this case 1234 + 34 =1268, 1234 – 33 = 1201.
Repeat these steps for and so on… (Table 4)
The conversion to a decimal from the base primorial (Section 12) provides P1 and P2
16. Lenstra-Lenstra-Lavász lattice reduction (LLL)
The (LLL) forms the basis of the Coppersmith attack (Section 15), and a brief explanation is given here with further reading and references for the reader. The Lenstra-Lenstra-Lavász (LLL) lattice basis reduction algorithm  calculates an LLL-reduced, short, nearly orthogonal lattice basis, in time , where B is the largest length of bi under the Euclidean norm, given a basis with n-dimensional integer coordinates, for a lattice L (a discrete subgroup of Rn) with and giving polynomial-time factorization of polynomials with rational coefficients.
A thorough explanation is given by Bosma , and a summary of the example contained in the reference is given below.
INPUT: Let lattice basis be given by the columns of
OUTPUT: LLL-reduced basis
Using the Lenstra-Lenstra-Lavász lattice reduction (LLL), the short vectors in a lattice can be found. This is used by the Coppersmith attack. Coppersmith's algorithm uses the LLL to construct polynomials with small coefficients that all have the same root modulo. When a linear combination is found to meet inequality conditions, standard factorization methods can find the solutions over integers.
17. Coppersmith attack
When is small , the Wiener attack (Section 5) can be used. Conversely, when d is large, e is small. Particular applications of the Coppersmith method for attacking RSA include cases when the public exponent e is small or when partial knowledge of the secret key is available (Section 13) .
A small public exponent e, reduces the encryption time. Common choices for e are 3, 17 and . These are Fermat primes and are chosen because the modular exponent derivation is faster. The Coppersmith method reduces the solving of modular polynomial equations to solving polynomial equations over integers.
Let and for an integer . Coppersmith can find the integer solution for by finding a different polynomial related to that has the root but only has small coefficients. The small coefficients are constructed using the LLL (Section 14). Given , the LLL constructs polynomials that all have same root depends on the degree of and the size of . Any linear combination has the same root .
The next step is to use LLL to construct a linear combination of the so that the inequality holds. Then standard factorization provides the zeroes of over .
Let be an integer and be a monic polynomial of degree , over integers such that . Set for . Given then all integers can now be found. All roots of , smaller than can be found.
The Pohlig-Hellman  algorithm is a method to compute a discrete logarithm (which is a difficult problem) on a multiplicative group. The order of which is a smooth number (also called friable), meaning its order can be factorized into small primes. A positive integer is called B-smooth if none of its prime factors is greater than B. For example, 1620 has prime factorization 22 × 34 × 5; therefore 1620 is 5-smooth because none of its prime factors are greater than 5. This is similar to that of the Overmars factorization method (Section 10). The Pohlig-Hellman  algorithm applies to groups whose order is a prime power. The basic idea is to iteratively compute the p-adic digits of the logarithm by repeatedly “shifting out” all but one unknown digit in the exponent and computing that digit by elementary methods. This is a similar idea to Section 13.
INPUT: A cyclic group of order with a generator , an element , and a prime factorization OUTPUT: The unique integer
Example: Let solve
Find the prime factors of . Find one for each .
Recall: . Now we need another from the other
By the Chinese remainder theorem, So the solution to .
19. Shor’s algorithm
Shor’s algorithm , factors composite numbers, , consisting of two primes in polynomial time using quantum computing techniques. The algorithm evaluates the period of where This is inefficient using sequential computing on a conventional computer. When run on a quantum computer, a congruence of squares with probability 0.5 occurs in polynomial time. For two co-prime sinusoids of period P1 and P2, at what point do they zero-cross each other? The phase of each sinusoid at any given point is observed, and if they are factors of N then the phase of P1 and P2 is zero. Shor’s algorithm tests the phase of (Figure 4).
Phase estimation is well suited to quantum computers and hence this factorization technique produces solutions in polynomial time. For further information on quantum phase estimation, the reader is directed to WIKI . The impact of this type of attack is discussed in detail by Mosca .
Find the period of (using Quantum computing)
Find the period of
Euler’s factorization (Section 6) cannot be used because 7 has no sum of squares nor does 35.
Fermat’s factorization (Section 5)
Overmars factorization (Section 10)
Overmars triangles (Section 8) ∆(m,n) = ∆(a,b,c): ∆(3,1)=∆(12,35,37)
20. Attacking public key infrastructure
Public infrastructure cryptographic hardware uses a library RSALib. This is found in both NIST FIPS 140-2 and CC EAL 5+. These are certified devices for use in identity cards, passports, Trusted Platform Modules, PGP and tokens for authentication and software signing. This is in use in tens of millions of devices worldwide. Nemec et al.  have identified a vulnerability that allows for the factorization of 1024 and 2048 bit keys in less than 3 CPU months.
RSALib primes are of the form .
These can be fingerprinted using the discrete logarithm .
The public modulus is generated by 65537 in the multiplicative group . The public modulus of RSALib can thus be fingerprinted with the discrete logarithm . This can be factorized using Pohlig-Hellman (Section 16). The group is smooth for RSA512 keys. The smoothness of G is due to the smoothness of M being Primorial.
Factorization is achieved using the Coppersmith algorithm with a known . Nemec et al used the Howgrave-Graham implementation of the Coppersmith’s algorithm to find a small solution of:
A summary of RSALib vulnerability and its impact is now given and the reader is directed to Memec et al.  for further detail. eIDs used in passports for citizens are affected. Code signing is vulnerable. Twenty-four percent of TPMs used in laptops are affected (sample size 41). A third of PGP, used in email systems could be factorizable. There was no observable impact on TLS/HTTPS. One hundred percent of SCADA systems sampled were affected (sample 15). E-health and EMV payment cards were also likely to be susceptible.
Mitigating the impact of the RSALib vulnerability requires changing the algorithm. This requires a firmware replacement which is not possible in already deployed devices such as smartcards and TPMs whose code is stored in read-only memory. Key lengths not of 512, 1024, 2048 and 4096, such as RSA3936 appear to be resilient. The use of key pairs outside of vulnerable devices could be deployed using another library. Changes to RSALib are required so that proveable safe primes are constructed not using the vulnerability.
21. Overmars factorization, bringing it together
Section 11 considered the following cases. The following discussion generalizes these cases and provides the structure for algorthmic solutions to be found. The palindromic nature of primes (Section 12) can be exploited further to explore solutions in a particular Primorial range. Recall;
Now we need to develop the methodology for finding (selecting) and . This brings together the concepts of primorials , Smooth , small factors , factorization (Fermat), modulo testing as per Atkin’s Sieve  and the structure of primes (Sections 12 and 18), to find as large an a as possible so that Overmars Factorization  converges more rapidly to a solution.
Recall the following (Section 12). Primes are of the form . Composite numbers, constructed from these primes: , are a combination of Pythagorean and Gaussian primes. The following test can be used to determine which combination of primes was used to construct the composite. If is true a mix of Pythagorean and Gaussian primes was used. If is true then the composite consists of only Gaussian or only Pythagorean primes. The Sieve of Atkin  uses and . This is now applied as per Overmars  in the following manner, if is true then , if is true let . The ideas of Atkin are further extended in both directions:
This is Primorial, is”Smooth”. The general form (Section 19) is now given: Case (1 , 2 ) , Case (3 , 4 )
If can be choosen, then we search in the primes to find solutions to A solution is found for , when . Case (1 , 2 ) Case (3 ) ,
Consider Section 11 example,
Integer solutions . From Table 5, determining which x value should be used is not clear. Whilst should work, no solutions will be found if From Table 5 only when do we find solutions. Ranking the possible solutions in terms of factors 29 (8) would be first, 19 (7) second and 11 (6) third.
|x||mod60||mod180||mod1620||N − x2||±x||b||a||m||n||gcd(m,n)||Smoothness|
Based upon low order factors the rankings would be 29 first and 11 second. Setting will not find solutions for . Setting so the optimal value for . Look for solutions to which are a perfect square. In this case, .
Recall that the starting value for , 30 iterations.
Whilst this is quite a good result the first failure needs also to be taken into account. This would be bound by the Primorial and
This can be further bound by the Primorial. In the case of RSA numbers, the binary bits available to represent a particular prime number range can also be used to bound the range (Table 6).
|x||Modulo testing||N − x2||a||m||n||gcd(m,n)||Smoothness|
In this case, solutions using modulo testing generate good candidates to solve for (m, n), however for , three of the candidates have no solution. Using sequential programing, each possible candidate is considered one after another, until the maximum m value. However, using parallel programing techniques on GPUs (such as nVIDIA P100s), all of the candidates can be tested simultaneously and the processes are all terminated when one of the processes finds a solution. This is very efficient and effective in finding P1, P2. Once these are known, along with the public key , using Euler’s totient, the private key can be determined. Once the private key is known the cypher-text is no longer secure.
In short RSA is secure and difficult to factorise. Conventional sequential computing machines, running in polynomial time, take an infeasible amount of CPU cycles to find factorization solutions to RSA keys. Quantum computing holds great promise and Shor’s algorithm  demonstrates how this can be achieved. However, quantum computing is realistically still some way off. Opportunities exist using conventional computing (sequential and parallel) with better mathematical techniques. Section 18 showed how implementation vulnerabilities are introduced when “clever” low cost (CPU cycles) are implemented. The case in point showed methods for signature identification, upon which tailored targeted attacks could be launched against infrastruture FIPS140-2 devices, such as cryptographic routers. These sorts of attacks can be deployed in polynomial time using sequential programing techniques. Section 20, Overmars shows how factorization can be implemented using parellel processing techniques.
There is still much to be done and areas of further interest are a better understanding of the structure of primes. This will lead to faster prime number generating algorithms and hence faster solutions to the factorization problem. This will also lead to the generation of more robust primes that are less susceptible to factorization methods. An example of this is the use of non-Pythagorean primes. Section 5 showed how Euler’s factorization could be used to attack such composite numbers. Hence a simple method to thwart this would be to use a mix of Pythagorean and Gaussian primes. Section 6 showed how small values in the RSA private key could be attacked using Wiener’s method. Small values in the public key can be attacked using a combination of LLL, Coppersmith and Pohlig-Hellman (Sections 15–17). All of these attacks can be mitigated by choosing carefully and ensuring that both are sufficiently large.
Development of quantum computing is continuing at break-neck speed, however useful machines are yet to appear. Parallel computing however is here and now and whilst factorizing RSA keys is not achievable on conventional computers in polynomial time, parallel computing has allowed for multiple solutions to be tested simultaneously. This is an area where research continues and new algorithms such as shown in Sections 20 and 14 lend themselves well to GPU parallel processing systems.
“There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know” .