This post is the fourth and last in the series ECC: a gentle introduction. In the last post we saw two algorithms, ECDH and ECDSA, and we saw how the discrete logarithm problem for elliptic curves plays an important role in their security. But, if you remember, we said that we have no mathematical proof for the complexity of the discrete logarithm problem: we believe it to be "hard", but we can't be sure. In the first part of this post, we'll try to get an idea of how "hard" it is in practice with today's techniques. Then, in the second part, we will try to answer the question: why do we need elliptic curve cryptography if RSA (and the other cryptosystems based on modular arithmetic) work well?

## Breaking the discrete logarithm problem

We will now see the two most efficient algorithms for computing discrete logarithms on elliptic curves: the baby-step giant-step algorithm, and Pollard's rho method. Before starting, as a reminder, here is what the discrete logarithm problem is about: given two points P and Q, find out the integer x that satisfies the equation Q = xP. The points belong to a subgroup of an elliptic curve, which has a base point P and whose order is n.

## Baby-step, giant-step

Before entering the details of the algorithm, a quick consideration: we can always write any integer x as x = am + b, where a, m and b are three arbitrary integers. For example, we can write 10 = 2 · 3 + 4. With this in mind, we can rewrite the equation for the discrete logarithm problem as follows:

Q = xP
Q = (am + b)P
Q = amP + bP
Q − amP = bP

The baby-step giant-step is a "meet in the middle" algorithm. Contrary to the brute-force attack (which forces us to calculate all the points xP for every x until we find Q), we will calculate "few" values for bP and "few" values for Q − amP until we find a correspondence. The algorithm works as follows:

1. Compute m = ⌈√n⌉.
2. For every b in {0, ..., m}, compute bP and store the pair (b, bP) in a hash table.
3. For every a in {0, ..., m}: compute amP and then Q − amP; check whether Q − amP appears in the hash table. If a point bP such that Q − amP = bP exists, then we have found x = am + b.
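These steps can be sketched in a few lines of Python. The curve below is the same toy curve used by the scripts later in this post (y² = x³ + x − 1 over the field of integers modulo 10177, subgroup of order 10331, base point P = (1, 1)); this is a simplified illustration, not production code, and it assumes Python 3.8+ for the modular inverse `pow(x, -1, p)`.

```python
from math import isqrt

p = 10177          # field prime
a_coef = 1         # curve coefficient a in y^2 = x^3 + ax + b
n = 10331          # order of the subgroup generated by P
P = (1, 1)         # base point; None represents the point at infinity

def add(P1, P2):
    """Affine point addition on the curve (None is the point at infinity)."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P1 = -P2
    if P1 == P2:
        s = (3 * x1 * x1 + a_coef) * pow(2 * y1, -1, p) % p  # tangent slope
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, p) % p              # chord slope
    x3 = (s * s - x1 - x2) % p
    return (x3, (s * (x1 - x3) - y1) % p)

def mult(k, pt):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

def bsgs(P, Q, n):
    """Return x such that Q = xP, in O(sqrt(n)) time and space."""
    m = isqrt(n) + 1
    # Baby steps: cache bP for b = 0 .. m in a hash table.
    baby = {None: 0}
    pt = None
    for b in range(1, m + 1):
        pt = add(pt, P)
        baby[pt] = b
    # Giant steps: check Q - a*m*P against the cache for a = 0 .. m.
    mx, my = mult(m, P)
    neg_mP = (mx, (-my) % p)          # -mP, so each step subtracts mP
    pt = Q
    for a in range(m + 1):
        if pt in baby:
            return a * m + baby[pt]   # x = am + b
        pt = add(pt, neg_mP)
    return None

Q = mult(325, P)
print(bsgs(P, Q, n))   # recovers the logarithm 325
```

Note how the choice m = ⌈√n⌉ balances the two phases: about √n points are cached during the baby steps, and at most √n points are computed during the giant steps.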
As you can see, initially we calculate the points bP with little (i.e. "baby") increments for the coefficient b (0P, 1P, 2P, 3P, ...). Then, in the second part of the algorithm, we calculate the points Q − amP with huge (i.e. "giant") increments for am (0 · mP, 1 · mP, 2 · mP, ..., where m is a huge number).

To understand why this algorithm works, forget for a moment that the points bP are cached and take the equation Q − amP = bP. Consider what follows:

- When a = 0, we are checking whether Q is equal to bP, for b ranging from 0 to m. That is, we are comparing Q against all the points from 0P to mP.
- When a = 1, we are checking whether Q − mP is equal to bP. That is, we are comparing Q against all the points from mP to 2mP.
- When a = 2, we are comparing Q against all the points from 2mP to 3mP. And so on.
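The fact that every candidate logarithm from 0 to n − 1 is covered exactly once by some pair (a, b) can also be checked numerically (here with n = 10331, the toy curve order used later in this post):

```python
from math import isqrt

n = 10331              # toy subgroup order used later in the post
m = isqrt(n) + 1       # m = ceil(sqrt(n)); here m = 102

# Every x in 0 .. n-1 splits uniquely as x = a*m + b with 0 <= a, b < m,
# so scanning all pairs (a, b) covers every possible logarithm.
for x in range(n):
    a, b = divmod(x, m)
    assert x == a * m + b and 0 <= a < m and 0 <= b < m
```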
In conclusion, we are checking all the points from 0P to (m² + m)P (that is, all the possible points), performing at most 2m additions and multiplications (exactly m for the baby steps, at most m for the giant steps). If you consider that a lookup on a hash table takes O(1) time, it's easy to see that this algorithm has both time and space complexity O(√n) (or O(2^(k/2)), if you consider the bit length k of n). It's still exponential time, but much better than a brute-force attack.

## Baby-step giant-step in practice

It may make sense to see what the O(√n) complexity means in practice. Let's take a standardized curve: prime192v1. Its order n is about 6.3 · 10^57, which means √n is about 7.9 · 10^28. Now imagine storing √n points in a hash table. Suppose that each point requires exactly 32 bytes: our hash table would need approximately 2.5 · 10^30 bytes of memory. Looking on the web, it seems that the total world storage capacity is in the order of the zettabyte (10^21 bytes). This is almost ten orders of magnitude lower than the memory required by our hash table! Even if our points took 1 byte each, we would still be very far from being able to store all of them. This is impressive, and is even more impressive if you consider that prime192v1 is among the standardized curves with the lowest order.

## Playing with baby-step giant-step

I made a Python script that computes discrete logarithms using the baby-step giant-step algorithm. Obviously it only works with curves with small orders: don't try it with curves like prime192v1, unless you want to run out of memory. It should produce an output like this:

```
Curve: y^2 = (x^3 + 1x - 1) mod 10177
Curve order: 10331
p = (0x1, 0x1)
q = (0x1a28, 0x8fb)
325 * p = q
log(p, q) = 325
Took 105 steps
```

## Pollard's ρ

Pollard's rho is another algorithm for computing discrete logarithms. It has the same asymptotic time complexity O(√n) as the baby-step giant-step algorithm, but its space complexity is just O(1). If baby-step giant-step can't solve discrete logarithms because of the huge memory requirements, will Pollard's rho make it? Let's see...

First of all, another reminder of the discrete logarithm problem: given P and Q, find x such that Q = xP.
With Pollard's rho, we will solve a slightly different problem: given P and Q, find the integers a, b, A and B such that aP + bQ = AP + BQ (with the two pairs distinct). Once the four integers are found, we can use the equation Q = xP to find out x:

aP + bQ = AP + BQ
aP + bxP = AP + BxP
(a − A)P = (B − b)xP

Now we can get rid of P. But before doing so, remember that our subgroup is cyclic with order n, therefore the coefficients used in point multiplication are taken modulo n:

a − A ≡ (B − b)x (mod n)
x = (a − A)(B − b)⁻¹ mod n

The principle of operation of Pollard's rho is simple: we define a pseudo-random sequence of pairs (a, b). This sequence of pairs can be used to generate the sequence of points aP + bQ. Because both P and Q are elements of the same cyclic subgroup, the sequence of points is cyclic too. This means that if we walk our pseudo-random sequence of pairs, sooner or later we will detect a cycle. That is: we will find a pair (a, b) and another distinct pair (A, B) such that aP + bQ = AP + BQ. Same point, distinct pairs: we can apply the equation above to find the logarithm. The problem is: how do we detect the cycle in an efficient way?

## Tortoise and Hare

In order to detect the cycle, we could try all the possible values for a and b using a pairing function, but given that there are n² such pairs, our algorithm would be O(n²): much worse than a brute-force attack. But there exists a faster method: the tortoise and hare algorithm (also known as Floyd's cycle-finding algorithm), which is at the core of Pollard's rho.

We walk the sequence of pairs at different speeds until we find two distinct pairs (a, b) and (A, B) that produce the same point; these two pairs allow us to calculate the logarithm as x = (a − A)(B − b)⁻¹ mod n. Basically, we take our pseudo-random sequence of pairs, together with the corresponding sequence of points.
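To make the (a, b) bookkeeping and the tortoise-and-hare walk concrete, here is a minimal sketch of Pollard's rho. To keep it short, it works in the multiplicative group of integers modulo a prime instead of an elliptic curve subgroup: the algorithm is identical, only the group operation changes (modular multiplication instead of point addition). The small parameters and the partition by `x % 3` are made up for illustration.

```python
p = 1019            # prime modulus
g = 9               # generator of a subgroup of prime order n (9 = 3^2 mod 1019)
n = 509             # order of the subgroup generated by g
h = pow(g, 123, p)  # h = g^123: 123 is the logarithm we want to recover

def step(x, a, b):
    """One pseudo-random step; the invariant x = g^a * h^b always holds."""
    if x % 3 == 0:
        return x * x % p, 2 * a % n, 2 * b % n   # square: double both coefficients
    elif x % 3 == 1:
        return g * x % p, (a + 1) % n, b          # multiply by g: bump a
    else:
        return h * x % p, a, (b + 1) % n          # multiply by h: bump b

def pollard_rho(g, h, p, n):
    for a0 in range(1, n):               # retry with another start on failure
        x, a, b = pow(g, a0, p), a0, 0
        X, A, B = x, a, b
        while True:
            x, a, b = step(x, a, b)               # tortoise: one step
            X, A, B = step(*step(X, A, B))        # hare: two steps
            if x == X:                            # same point, distinct pairs
                break
        # g^a * h^b = g^A * h^B  =>  log = (a - A) * (B - b)^-1 mod n
        if (B - b) % n == 0:
            continue                     # degenerate collision, try again
        return (a - A) * pow((B - b) % n, -1, n) % n

print(pollard_rho(g, h, p, n))   # recovers the logarithm 123
```

Only the current tortoise and hare states are kept in memory, which is where the O(1) space complexity comes from.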
The sequence of pairs may or may not be cyclic, but the sequence of points is, because both P and Q were generated from the same base point, and from the properties of subgroups we know that we can't "escape" from the subgroup using just scalar multiplication and addition. Now we take our two pets, the tortoise and the hare, and make them walk the sequence from left to right. The tortoise is slow and reads each point one by one; the hare is fast and skips a point at every step. After some time, both the tortoise and the hare will have found the same point, but with different coefficient pairs. Or, to express that with equations: the tortoise will have found a pair (a, b) and the hare will have found a pair (A, B) such that aP + bQ = AP + BQ.

If our random sequence is defined through an algorithm (as opposed to being stored statically), it's easy to see how this principle of operation requires just O(1) space. Calculating the asymptotic time complexity is not as easy, but we can build a probabilistic proof showing that the time complexity is O(√n), as we have already said.

## Playing with Pollard's ρ

I've built a Python script that computes discrete logarithms using Pollard's rho. It is not an implementation of the original Pollard's rho, but a slight variation of it (I've used a more efficient method for generating the pseudo-random sequence of pairs). The script contains some useful comments, so read it if you are interested in the details of the algorithm. This script, like the baby-step giant-step one, works on a tiny curve and produces the same kind of output.

## Pollard's ρ in practice

We said that baby-step giant-step can't be used in practice, because of the huge memory requirements. Pollard's rho, on the other hand, requires very little memory. So, how practical is it? Certicom launched a challenge in 1998 to compute discrete logarithms on elliptic curves with bit lengths ranging from 109 to 359.
As of today, only the 109-bit curves have been successfully broken. The latest successful attempt was made in 2004. Quoting Wikipedia:
As we have already said, this number is pretty self-explanatory and gives a clear idea of how hard it can be to break a discrete logarithm using such techniques.

## Pollard's ρ vs Baby-step giant-step

I decided to put the baby-step giant-step script and the Pollard's rho script, together with a brute-force script, into a fourth script to compare their performances. This fourth script computes all the logarithms for all the points on the "tiny" curve using the different algorithms and reports how long each took:

```
Curve order: 10331
Using bruteforce
Computing all logarithms: 100.00% done
Took 2m 31s (5193 steps on average)
Using babygiantstep
Computing all logarithms: 100.00% done
Took 0m 6s (152 steps on average)
Using pollardsrho
Computing all logarithms: 100.00% done
Took 0m 21s (138 steps on average)
```

As we could expect, the brute-force method is tremendously slow compared to the other two. Baby-step giant-step is the fastest, while Pollard's rho is more than three times slower than baby-step giant-step (although it uses far less memory and fewer steps on average). Also look at the number of steps: brute force used 5193 steps on average to compute each logarithm, a number very close to 10331 / 2 (half the curve order). Baby-step giant-step and Pollard's rho used 152 steps and 138 steps respectively, two numbers very close to the square root of 10331 (≈ 101.64).

## Final consideration

While discussing these algorithms, I have presented many numbers. It's important to be cautious when reading them: algorithms can be greatly optimized in many ways. Hardware can improve. Specialized hardware can be built. The fact that an approach seems impractical today does not imply that the approach can't be improved. It also does not imply that other, better approaches don't exist (remember, once again, that we have no proof for the complexity of the discrete logarithm problem).

## Shor's algorithm

If today's techniques are unsuitable, what about tomorrow's techniques?
Well, things are a bit more worrisome: there exists a quantum algorithm capable of computing discrete logarithms in polynomial time: Shor's algorithm, which has time complexity O((log n)³) and space complexity O(log n). Quantum computers are still far from becoming sophisticated enough to run algorithms like Shor's, but the need for quantum-resistant algorithms may be something worth investigating now. What we encrypt today might not be safe tomorrow.

## ECC and RSA

Now let's forget about quantum computing, which is still far from being a serious problem. The question I'll answer now is: why bother with elliptic curves if RSA works well? A quick answer is given by NIST, which provides a table that compares RSA and ECC key sizes required to achieve the same level of security:

| RSA key size (bits) | ECC key size (bits) |
|--------------------:|--------------------:|
| 1024 | 160 |
| 2048 | 224 |
| 3072 | 256 |
| 7680 | 384 |
| 15360 | 512 |
Note that there is no linear relationship between the RSA key sizes and the ECC key sizes (in other words: if we double the RSA key size, we don't have to double the ECC key size). This table tells us not only that ECC uses less memory, but also that key generation and signing are considerably faster. But why is it so?

The answer is that the fastest known algorithms for computing discrete logarithms on elliptic curves are Pollard's rho and baby-step giant-step, while in the case of RSA we have faster algorithms. One in particular is the general number field sieve: an algorithm for integer factorization that can also be used to compute discrete logarithms. The general number field sieve is the fastest known algorithm for integer factorization to date. All of this applies to other cryptosystems based on modular arithmetic as well, including DSA, D-H and ElGamal.

## Hidden threats of NSA

And now the hard part. So far we have discussed algorithms and mathematics. Now it's time to discuss people, and things get more complicated. If you remember, in the last post we said that certain classes of elliptic curves are weak, and that to solve the problem of trusting curves from dubious sources we added a random seed to our domain parameters. And if we look at the standard curves from NIST, we can see that they are all verifiably random. If we read the Wikipedia page for "nothing up my sleeve" numbers, we can see that such constants are usually derived from well-known values like the digits of π or e.
These numbers are random, because their digits are uniformly distributed. And they are also unsuspicious, because they have a justification.

Now the question is: where do the random seeds for the NIST curves come from? The answer is, sadly: we don't know. Those seeds have no justification at all. Is it possible that NIST has discovered a "sufficiently large" class of weak elliptic curves and has tried many possible seeds until it found a vulnerable curve? I can't answer this question, but it is a legitimate and important one. We know that NIST has succeeded in standardizing at least one vulnerable random number generator, Dual_EC_DRBG (a generator which, oddly enough, is based on elliptic curves). Perhaps it also succeeded in standardizing a set of weak elliptic curves. How do we know? We can't.

What's important to understand is that "verifiably random" and "secure" are not synonyms. And it doesn't matter how hard the logarithm problem is, or how long our keys are: if our algorithms are broken, there's nothing we can do. With respect to this, RSA wins, as it does not require special domain parameters that can be tampered with. RSA (as well as other modular arithmetic systems) may be a good alternative if we can't trust authorities and we can't construct our own domain parameters. And in case you are asking: yes, TLS may use the NIST curves. If you check https://google.com, you'll see that the connection uses ECDHE and ECDSA, with a certificate based on prime256v1.

## That's all!

I hope you have enjoyed this series. My aim was to give you the basic knowledge, terminology and conventions to understand what elliptic curve cryptography is today. If I reached my aim, you should now be able to understand existing ECC-based cryptosystems and to expand your knowledge by reading "not so gentle" documentation. When writing this series, I could have skipped many details and used a simpler terminology, but I felt that by doing so you would not have been able to understand what the web has to offer.
I believe I have found a good compromise between simplicity and completeness. Note, though, that by reading just this series you will not be able to implement secure ECC cryptosystems: security requires us to know many subtle but important details. Remember the requirements for Smart's attack and Sony's mistake: these are just two examples that should teach you how easy it is to produce insecure algorithms and how easy it is to exploit them.

So, if you are interested in diving deeper into the world of ECC, where to go from here? First off, so far we have seen Weierstrass curves over prime fields, but you must know that there exist other kinds of curves and fields, in particular curves over binary fields, Koblitz curves, Montgomery curves (like Curve25519) and twisted Edwards curves (like the one used by Ed25519).
If you are interested in the implementation details of ECC, then I suggest you read the sources of OpenSSL and GnuTLS. Finally, if you are interested in the mathematical details, rather than the security and efficiency of the algorithms, you must know that the theory of elliptic curves runs much deeper than what we have covered here.
And don't forget to study finite fields and field theory: these are the keywords to look up if you're interested in these topics. Now the series is officially concluded. Thank you for all your friendly comments, tweets and mails. Many have asked me if I'm going to write other series on closely related topics. The answer is: maybe. I accept suggestions, but I can't promise anything. Thanks for reading, and see you next time!