Now on ScienceBlogs: Down the barrel of a thousand cosmic cannons!

Subscribe for $15 to National Geographic Magazine

Good Math, Bad Math

Finding the fun in good math; Shredding bad math and squashing the crackpots who espouse it.

Search

Profile

markcc.jpg
Mark Chu-Carroll (aka MarkCC) is a PhD Computer Scientist, who works for Google as a Software Engineer. My professional interests center on programming languages and tools, and how to improve the languages and tools that are used for building complex software systems.

Donors Choose

Other Information

Add this blog to my Technorati Favorites!

Recent Posts

Recent Comments

Categories

Blogroll

Old Topic Indices

Great Online Books

« Circling Around The Speed of Light | Main | Social Security vs Ponzi Schemes »

Cryptographic Padding in RSA

Category: Encryption
Posted on: January 8, 2009 9:52 PM, by Mark C. Chu-Carroll

Ok, away from politics, and back to the good stuff. When I left off talking about encryption, we were looking at RSA, as an example of an asymmetric encryption system.

Since it's been a while, I'll start off with a quick review of RSA.

Asymmetric encryption systems like RSA are based on computations that are easy to perform if you know the keys, and incredibly hard to perform if you don't. In the specific case of RSA, everything is based on a pair of very large prime numbers. If you know those two primes, and you know the public key, it's really easy to compute the private key. But if you don't know the two prime numbers, then even given the public key, it's incredibly difficult to compute the private key.

To be a bit more specific, in RSA, you get a pair of large prime numbers, P and Q. You compute from them a totient of their product, which is the number N=(P-1)×(Q-1). Then you arbitrarily pick a public exponent, E, which is smaller than N, and which is prime relative to N. You can then compute the private key exponent, D. If you know what P and Q are, it's pretty easy to compute D.

Once you've got D and E, your public key is the pair is (N,E), and the private key is the pair is (N,D).

If you've got a plaintext message M, then encrypting it with the public key is done by computing ME mod N. If you've got a ciphertext C encrypted with the public key, then decrypting it is done by computing CD mod N.

Encrypting a plaintext with the public key is exactly the same process as decrypting a ciphertext produced with the private key. And vice versa.

RSA is amazingly elegant. Every time I look at it for any reason, I'm struck by its beauty and it's simplicity. It's really an amazing example of how something seemingly abstract like number theory can produce practical, down-to-earth, and useful.

But RSA isn't perfect. In fact, it's got a some rather serious problems that are a direct result of its mathematical structure. I'm just going to mention one, but there are several of them.

Suppose you've got your public/private keypair. You want to encrypt two messages, I and J with your private key. Let's call the function for encrypting a message with the private key E - so E(M) = MD mod N. Then CI = E(I), and CJ=E(J). Good so far.

The problem is that given those two messages - for which an attacker has access to both the plaintext and the ciphertext - it happens that there's a vulnerability. Because of the way that encryption works, E(I×J) = E(I) × E(J)!

This opens you up to something called chosen ciphertext attacks. A chosen ciphertext attack is one where you attack an crytographic system by running chosen ciphertexts through the decryption system to see if you can compute the encryption key. There's an effective attack against the naive use of RSA based on a selected ciphertext method.

In addition to this, there are a few other interesting attacks that I won't describe in detail. They all rely on various problems of the numerical structure of RSA encryption. In order to defeat those attacks, RSA is typically used with something called a padding scheme. The idea of the padding scheme is twofold. First, it increases the size of the message - which guarantees that the encrypted message is large enough that it will not be easy to use for an attack. Second, it intersperses pseudo-random information in a way that means that a given plaintext message could be encrypted to a wide range of different ciphertexts, depending on the choices made during padding.

A common scheme for padding is OAEP - Optimal Asymmetric Encryption Padding. OAEP is a basic Feistel network system, like the ones I described when I was talking about DES. OAEP ends up doing a mixture of permuting the plaintext, and adding pseudo-random noise to it. It's a reversible transformation, and the receiver of the encrypted message (and thus any attacker) knows how to do the reversal of the padding given the decrypted plaintext. But the process of permutation and random injection performed by the padding has the effect of breaking the properties of RSA that make it easy to attack the system.

For example, suppose that the padding function is called P. E(P(I))×E(P(J)) is not equal to E(P(I×J)). Similarly, the message-size-related problems with RSA are not usable on messages whose size is increased past the vulnerability threshold using OAEP padding.

So what does the padding look like? Let's start with the pieces.

  1. You have a number, N, which is the length of the cryptographic modulus.
  2. You have two integers, k0 and k1, which are parameters of the protocol in use.
  3. You have the original plaintext message, M. By definition, the length of M is (n - k0 - k1). (The protocol is responsible for breaking up large messages into sub-messages via an appropriate mode of operation.)
  4. You have a 0-padded version of M, M', which is M followed by k1 zeros - so M' has length N-k0.
  5. You have a random string of bits, r, which has length k0.
  6. You have a cryptographic hash function G, which expands a string of k0 bits to a string of N-k0 bits.
  7. You have another cryptographic hash function H, which contracts a string of n-k0 bits to a srting of k0 bits.
oaep.png

The OAEP padding algorithm is illustrated off to the side. The way that it works is: you compute G(r), giving you a string of N-k0 bits. You XOR G(r) with M', producing a string of N-k0 bits, which we'll call X. Then compute H(X), and XOR that with r, producing a result that we'll call Y. The end result of the padding is the concatenation of X and Y.

The way to understand this is that you've got some random bits in R, which you're shuffling up (using G and H) and mixing with the bits of the original message. The resulting message is longer, and has a random element mixed into it, which defeats the numerical properties that make some attacks against RSA work. It's easy to compute X and Y given M and r. And given X and Y, if you know G and H it's easy to compute M and r. But given E(X concat Y), as an attacker, you're screwed. You can still decript the message, and get X and Y, decode them, and get the plaintext. But the process of doing the padding obscures the numerical properties of RSA so that even knowing the padding function, your attacks won't work.

Share on Facebook
Share on StumbleUpon
Share on Facebook
Find more posts in: Technology

Comments

1

It seems like an interesting fix but, since G and H have to remain secret, doesn't it turn the whole scheme into a symmetric encryption method? Don't you loose all benefits of using asymmetric cryptography?

Posted by: Simon | January 9, 2009 10:29 AM

2

Simon:

G and H don't have to remain secret. In fact, they're specifically *not* secret. Their purpose isn't to encrypt the message; it's just to scramble things to change their numeric properties. So, for example, a pair of original plaintext messages P and Q no longer have the property that E(P)×E(Q) = E(P×Q).

It's true that E(OAEP(P))×E(OAEP(Q)) = E(OAEP(P)×OAEP(Q)). But since OAEP(P)×OAEP(Q) does not equal OAEP(P×Q), that doesn't help an attacker.

The reason that G and H are cryptographic hashes is because
they given a particular value of G(M), it's very hard to compute an M' such that G(M') = G(M), and so it's very difficult to create messages that let you exploit the numerical properties of RSA.

Posted by: Mark C. Chu-Carroll | January 9, 2009 10:41 AM

3

Mark:

Thank you! So the equation E(P)*E(Q) = E(P*Q) tells you something about the private key? Then, given any public key, an attacker could choose carefully his plaintext to discover the information this padding scheme is supposed to hide. Am I wrong?

Posted by: Simon | January 9, 2009 10:59 AM

4

Simon:

Yes, exactly. The fact that E(P)×E(Q)=E(P×Q) provides a handle that an attacker can use to deduce the secret key. There are also a number of similar vulnerabilities in RSA, based on similar numerical properties. The point of the padding scheme is just to change the structure of the message enough to make it very difficult for an attacker to produce messages that they can use to deduce the private key.

Posted by: Mark C. Chu-Carroll | January 9, 2009 11:04 AM

5

I'll add some small print :)

Correct me if I'm wrong, but another flaw with RSA is it depends on calculating large prime numbers. After a certain point it becomes quite impractical to do normal prime tests and you need to use probabilistic prime tests. There are composite numbers (like Carmichael numbers) that can fool these tests into thinking that they are prime.
Another thing to note is that there are certain Ns that are relatively easy to factor using methods such as Pollard's p-1 or quadratic sieve that depend on certain properties of the P and Q you are using.
I'm not sure if these numbers are common enough that it matters or not, but it might be a good idea to keep these in mind when building an implementation of RSA.

Posted by: Rob Britton | January 12, 2009 1:49 PM

6

Rob:

First, I would never advise anyone to do their own implementation of RSA. There are plenty of really solid, tested implementations out there that people can use. Cryptographic algorithms are very easy to get wrong, and the slightest mistake can completely undermine the security of the system. Crypto isn't an area where you should roll your own; building your own crpyto should be a last resort, and if it needs to be done, it should be done with incredible care, including things like code audits by outside crypto experts.

That said, yes, there is a real problem with ensuring that you do get prime numbers as the basis for the keys. We do typically use probabilistic approaches to computing primes - but without exhaustive analysis, we can't be sure that the probabilistic approaches are correct. If you're generating your own key-pair, you should throw as much computational power at it as you possibly can - in order to ensure, to the best of your ability, that the primes you use for your keys are really honest-to-goodness primes.

In general, though, most standard implementations do that. They give you a choice of how long you're willing to take to generate the keys, and they test the primes and the generated keys against the basic attacks to ensure that you haven't ended up with an easily crackable key-pair.

Posted by: Mark C. Chu-Carroll | January 12, 2009 2:01 PM

7

Dorthy Denning solved this problem long ago. Philip Zimmerman and Bruce Schneier described and implemented solutions to these problems long ago.

See for example:

Davida, G.I. "Chosen Signature Cryptanalysis of the RSA (MIT) Public Key Cryptosystem," Technical Report TR-CS-82-2, Department of EECS, University of Wisconsin, 1982
Denning, D.E. Cryptography and Data Security Addison Wesley, 1982

Posted by: William Wallace | January 12, 2009 2:14 PM

8

"Cryptographic algorithms are very easy to get wrong, and the slightest mistake can completely undermine the security of the system."

This is true.

But, to be honest, if you are a very good programmer, and are confident in your ability to rigourously and automatically test your code with a custom built regression test suite, and you are confident in your ability to identify boundary conditions, and you have spent a good deal of time learning about other's mistakes, you might have a more trustworthy implementation if you roll your own. But it is not for the weak kneeed.

Posted by: William Wallace Author Profile Page | January 12, 2009 6:37 PM

9

Actually, I would say that if you're a very good programmer, and are confident in your ability to rigourously and automatically test your code, etc., you're exactly the kind of person who should know better than to write your own crypto code.

The thing about crypto is that there's a lot more than just boundary cases. There are tons of corners where the slightest mistake can totally screw you. And the confident solo programmer is exactly the guy who's going to make that kind of mistake.

Real serious crypto code requires at least three groups of people:
(1) a group of programmers to write the code.
(2) A group of testers to write the test, who have never seen the actual implementation, and who know nothing more about the design of the code than the formal description of the algorithm.
(3) A group of crypto experts to review and audit the design, code, and tests.

Posted by: Mark C. Chu-Carroll Author Profile Page | January 12, 2009 6:54 PM

10

Actually, I would say that if you're a very good programmer, and are confident in your ability to rigourously and automatically test your code, etc., you're exactly the kind of person who should know better than to write your own crypto code.

The thing about crypto is that there's a lot more than just boundary cases. There are tons of corners where the slightest mistake can totally screw you. And the confident solo programmer is exactly the guy who's going to make that kind of mistake.

Real serious crypto code requires at least three groups of people:
(1) a group of programmers to write the code.
(2) A group of testers to write the test, who have never seen the actual implementation, and who know nothing more about the design of the code than the formal description of the algorithm.
(3) A group of crypto experts to review and audit the design, code, and tests.

Posted by: Mark C. Chu-Carroll Author Profile Page | January 12, 2009 7:26 PM

11

OT1: It is an ironic coincidence that you posted your response twice. I just cut, pasted, and saved both versions of your response, and I did not detect any difference in information in the in second (#10) compared to the first (#9). Was the duplicate intentional?

OT2: Is the typepad login feature a scienceblogs imposed requirement, or just for goodmath?

You might also consider adding a fourth team that systematically inserts faults into a sandbox version of the design, and then subjects the intentionally sabotaged version to the test suites, as a form of quality control.

The additional amount of resources necessary for this is relatively small, and it is a relatively productive use of resources.

But I haven't had much luck convincing others of this fact.

Posted by: William Wallace Author Profile Page | January 13, 2009 4:11 AM

12

William:

We're still getting used to the newly upgraded MoveableType installation, and there are a few glitches as we get it up and running. When I tried to respond to you yesterday, the system was timing out posting comments. One attempt seemed to fail - it timed out, and the comment didn't appear. But apparently it did get done by the backend, and just hadn't appeared yet. So I reposted, only to discover it had already successfully posted.

The typepad stuff isn't intended to be a requirement; I'm trying to set it up as an option, where you can use it if you want, but you don't have to.

I agree that a fault-based testing scheme is a really excellent idea for serious crypto code.

But at the moment, convincing people that they shouldn't just roll their own crypto for the fun of it is difficult
enough. Once you get people to understand how tricky it is to get crypto implementations to be really secure, then I think getting them the rest of the way isn't that hard.

Posted by: Mark C. Chu-Carroll Author Profile Page | January 13, 2009 8:09 AM

13

Hi there.

Please excuse a total novice. I have always wondered why encryption is not used in tandem. That is, encrypt messages twice with two diffrent algorithms and/or encryption keys.

This would, not make the encryption stronger per say but, it seems to me, the test to confirm success should become almost impossible.

Say you scramble an RSA encoded padded message, with a simple substitution/transposition cipher. Musn't you then first crack the simple cipher to be able to attack the RSA?
And how do you confirm that you have cracked it?

This has always puzzled me. I would be glad if you could enlighten me.
/Chasapros

Posted by: Chasparos Author Profile Page | April 9, 2009 9:00 AM

ScienceBlogs

Search ScienceBlogs:

Go to:

Advertisement
Follow ScienceBlogs on Twitter

© 2006-2011 ScienceBlogs LLC. ScienceBlogs is a registered trademark of ScienceBlogs LLC. All rights reserved.