RSA factorization
in the real world

Basic subroutines:
Product tree
Remainder tree
Weak primes:
Batch gcd
Small primes:
Batch trial division
Small keys:

The rho method

Trial division searches for a prime divisor p of n by trying all primes in sequence: 2, 3, 5, 7, 11, 13, etc. Batch trial division speeds up handling many numbers n but still needs to enumerate all the small primes. There are about p/log p primes up to p.

The rho method finds a prime divisor p of n using only about sqrt(p) operations.

How it works

Sage code first:
     def rho(n):
       c = 10
       a0 = 1
       a1 = a0^2+c
       a2 = a1^2+c
       while gcd(n,a2-a1) == 1:
         a1 = (a1^2+c) % n
         a2 = (a2^2+c) % n
         a2 = (a2^2+c) % n
       g = gcd(n,a2-a1)
       return [g,n / g]
     # examples (second example is faster because prime is smaller):
     print rho(314159265358979323)
     # output: [990371647, 317213509]
     print rho(698599699288686665490308069057420138223871)
     # output: [2053, 340282366920938463463374607431768211507]

Now the explanation.

The rho method iterates the function lambda x: (x*x+10) mod p starting from 1, where p is a divisor of n. In other words, it starts from 1, squares, adds 10, reduces mod p, squares, adds 10, reduces mod p, etc.

You should object that we don't know p; the whole point is to find p. This objection is answered below.

(There's nothing special about the number 1: a random number would also work. There's almost nothing special about the number 10: most other numbers would work, although there are a few bad choices such as 0 and -2.)

The rho method continues iterating the function modulo p until it finds a collision: the same integer modulo p appearing twice. The birthday paradox suggests that this will happen in only about sqrt(p) steps. The rho method recognizes this collision in very little memory by Floyd's method: run another iteration at double speed and check at each step whether the single-speed and double-speed iterations (a1 and a2 in the code above) are at the same point.

The rho method actually performs computations mod n, which we do know, as a way to implicitly perform computations mod p. To check whether the single-speed and double-speed iterations are at the same point mod p, the rho method subtracts the results, computes a gcd with n, and checks whether this gcd is larger than 1. Of course, if the gcd is smaller than n then it is a nontrivial factor of n.

Version: This is version 2012.12.28 of the rho.html web page.