20120521

The Long Road to TLSCiphertext

So at this point in the MungeTLS project--my custom implementation of TLS 1.0/1.2--I've successfully gotten the premaster secret decrypted. That's a pretty fundamental piece, since all subsequent encryption in the connection is based on it. The next part of the roadmap consists of computing the "master secret" from it, and then the actual encryption keys used for the rest of the connection. The ultimate goal of this episode is to be able to decrypt the contents of the TLSCiphertext's payload.

You might wonder why not just use RSA encryption for all traffic. It turns out that RSA (asymmetric) encryption is computationally very expensive, so TLS uses it only as a bootstrapping mechanism to let both sides of the connection negotiate the same symmetric encryption keys. Symmetric encryption is much more efficient, so it's used for the rest of the traffic. It's a great idea, but there are of course all sorts of headaches involved in actually implementing it.

By the way, everything in this post is something I actually do need to implement myself: unlike HMAC and the various encryption algorithms, this is all TLS-specific logic, not cryptographic primitives.

The Master of All Secrets

Actually, the only thing the premaster secret is used for is, aptly enough, calculating the "master secret". Doing that requires the use of what is known as the "pseudorandom function", or PRF, for short. The pseudorandom function is basically a keyed random number generator. It takes a secret, a label, and a seed, and generates a stream of random numbers. Both sides of the connection use the same pseudorandom function and inputs to arrive at the same sequence of random bytes, and therefore the same master secret.


Oh wait, this is the Keymaker, not the Keymaster.

It's actually defined this way for TLS 1.2:

   First, we define a data expansion function, P_hash(secret, data),
   that uses a single hash function to expand a secret and seed into an
   arbitrary quantity of output:

      P_hash(secret, seed) = HMAC_hash(secret, A(1) + seed) +
                             HMAC_hash(secret, A(2) + seed) +
                             HMAC_hash(secret, A(3) + seed) + ...

   where + indicates concatenation.

   A() is defined as:

      A(0) = seed
      A(i) = HMAC_hash(secret, A(i-1))


   TLS's PRF is created by applying P_hash to the secret as:

      PRF(secret, label, seed) = P_<hash>(secret, label + seed)

This is all pretty straightforward. It also says that for TLS 1.2, the "hash" above is always SHA-256. Now that I've implemented HMAC it's just like following a recipe.
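Here's a rough sketch of what P_hash looks like in code, just to make the recipe concrete. The HmacHash helper is hypothetical--a stand-in for whatever HMAC implementation is handy (CryptoAPI, in my case)--and I'm being loose with error handling.

#include <vector>
#include <windows.h>

typedef std::vector<BYTE> ByteVector;

// hypothetical helper: HMAC of vbData keyed with vbSecret, using algHash
ByteVector HmacHash(ALG_ID algHash, const ByteVector& vbSecret, const ByteVector& vbData);

ByteVector P_hash(
    ALG_ID algHash,
    const ByteVector& vbSecret,
    const ByteVector& vbSeed,
    size_t cbOutputNeeded)
{
    ByteVector vbOutput;

    // A(0) = seed
    ByteVector vbA(vbSeed);

    while (vbOutput.size() < cbOutputNeeded)
    {
        // A(i) = HMAC_hash(secret, A(i-1))
        vbA = HmacHash(algHash, vbSecret, vbA);

        // next output chunk is HMAC_hash(secret, A(i) + seed)
        ByteVector vbInput(vbA);
        vbInput.insert(vbInput.end(), vbSeed.begin(), vbSeed.end());

        ByteVector vbChunk = HmacHash(algHash, vbSecret, vbInput);
        vbOutput.insert(vbOutput.end(), vbChunk.begin(), vbChunk.end());
    }

    // trim any excess past the requested length
    vbOutput.resize(cbOutputNeeded);
    return vbOutput;
}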

But no, that would be too easy. Notice I said TLS 1.2 above. There's actually a note in the changelist at the top of the RFC saying that the PRF changed between TLS 1.0 and 1.2, and OpenSSL doesn't even seem to support TLS 1.2, so I'd better make sure I'm primarily working with 1.0.

In TLS 1.0, the PRF is actually significantly different:

       L_S = length in bytes of secret;
       L_S1 = L_S2 = ceil(L_S / 2);

   The secret is partitioned into two halves (with the possibility of
   one shared byte) as described above, S1 taking the first L_S1 bytes
   and S2 the last L_S2 bytes.

   The PRF is then defined as the result of mixing the two pseudorandom
   streams by exclusive-or'ing them together.

       PRF(secret, label, seed) = P_MD5(S1, label + seed) XOR
                                  P_SHA-1(S2, label + seed);

It's really interesting to look at the differences between the TLS 1.0 and 1.2 specs. It's easy to see that a lot of the newer algorithms incorporated into the latter--like SHA-256 and AES--either hadn't been invented yet or were fledgling when the former was written.

So we have to do this splitting and XOR thing for the secret, but other than that, it's likewise still just a recipe. Okay, done. We have a working PRF implementation. Wait, working? I mean, it appears to work, but how do I know? I lack an oracle to tell me whether my output is right. That's troubling, but let's push forward.
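Building on the P_hash sketch above, the TLS 1.0 PRF itself looks roughly like this--again just a sketch, with CALG_MD5 and CALG_SHA1 standing in for the two hash algorithms.

#include <string.h>

// TLS 1.0 PRF: split the secret, run P_MD5 and P_SHA-1, XOR the streams
ByteVector ComputePRF_TLS10(
    const ByteVector& vbSecret,
    const char* szLabel,
    const ByteVector& vbSeed,
    size_t cbOutputNeeded)
{
    // label + seed
    ByteVector vbLabelAndSeed(szLabel, szLabel + strlen(szLabel));
    vbLabelAndSeed.insert(vbLabelAndSeed.end(), vbSeed.begin(), vbSeed.end());

    // S1 = first ceil(L_S / 2) bytes, S2 = last ceil(L_S / 2) bytes,
    // sharing the middle byte if the secret's length is odd
    size_t cbHalf = (vbSecret.size() + 1) / 2;
    ByteVector vbS1(vbSecret.begin(), vbSecret.begin() + cbHalf);
    ByteVector vbS2(vbSecret.end() - cbHalf, vbSecret.end());

    // generate the two pseudorandom streams...
    ByteVector vbMD5Stream = P_hash(CALG_MD5, vbS1, vbLabelAndSeed, cbOutputNeeded);
    ByteVector vbSHAStream = P_hash(CALG_SHA1, vbS2, vbLabelAndSeed, cbOutputNeeded);

    // ...and mix them by XOR'ing them together
    ByteVector vbPRF(cbOutputNeeded);
    for (size_t i = 0; i < cbOutputNeeded; i++)
    {
        vbPRF[i] = vbMD5Stream[i] ^ vbSHAStream[i];
    }

    return vbPRF;
}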

With PRF in hand, we can calculate the master secret according to the RFC:

       master_secret = PRF(pre_master_secret, "master secret",
                           ClientHello.random + ServerHello.random)

Oh, well... that's straightforward, I guess. Now, ClientHello.random and ServerHello.random are these weird structures that include a timestamp and a blob of random bytes. Do they mean to take the entire structure, or just the random bits? For a novice implementor like me, the answer isn't obvious. I took the RFC literally and used the entire random structure, timestamp and all. That is, in fact, the correct thing to do.
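In code, using the PRF sketch from above, it's just one call. The variable names here are made up; the two "random" values are the raw 32-byte Random structures saved off from the hello messages.

// ClientHello.random first, then ServerHello.random, per the RFC
ByteVector vbRandoms(vbClientHelloRandom);
vbRandoms.insert(vbRandoms.end(), vbServerHelloRandom.begin(), vbServerHelloRandom.end());

// the master secret is always exactly 48 bytes
ByteVector vbMasterSecret = ComputePRF_TLS10(
    vbPremasterSecret,
    "master secret",
    vbRandoms,
    48);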

Cool, we have a master secret now. I'll just store that in the connection state for later.

The Key Bloc

Er, block. The block of key material. The master secret has one and only one use: calculating the HMAC keys, encryption keys, and initialization vectors for the upstream and downstream connection.

According to the RFC, here is the computation:

   To generate the key material, compute

       key_block = PRF(SecurityParameters.master_secret,
                          "key expansion",
                          SecurityParameters.server_random +
                          SecurityParameters.client_random);

   until enough output has been generated. Then the key_block is
   partitioned as follows:

       client_write_MAC_secret[SecurityParameters.hash_size]
       server_write_MAC_secret[SecurityParameters.hash_size]
       client_write_key[SecurityParameters.key_material_length]
       server_write_key[SecurityParameters.key_material_length]
       client_write_IV[SecurityParameters.IV_size]
       server_write_IV[SecurityParameters.IV_size]

Okay, cool, PRF is known to us. server_random and client_random are a bit new, but it turns out they're just ClientHello.random and the like from above. Dunno why they had to give them these weird new names. Whatever.

So what sizes do these things actually have? TLS 1.0's RFC doesn't tell you at all. I guess they want you to go to the specifications of each of the hash and cipher algorithms and look it up? Screw that! Luckily, TLS 1.2's RFC has all of this in an appendix.

                        Key      IV   Block
Cipher        Type    Material  Size  Size
------------  ------  --------  ----  -----
NULL          Stream      0       0    N/A
RC4_128       Stream     16       0    N/A
3DES_EDE_CBC  Block      24       8      8
AES_128_CBC   Block      16      16     16
AES_256_CBC   Block      32      16     16


MAC       Algorithm    mac_length  mac_key_length
--------  -----------  ----------  --------------
NULL      N/A              0             0
MD5       HMAC-MD5        16            16
SHA       HMAC-SHA1       20            20
SHA256    HMAC-SHA256     32            32

Hmm, what is this "type" thing? We'll come back to that later. Anyway, we use the PRF to generate a bunch of random data, then chop it up and assign it to these various keys. What do they all mean? The MAC keys are incorporated into the HMAC tacked on to the end of every packet for integrity checking. The write keys are the symmetric encryption keys I talked about earlier. The IV is used as part of the initial state of chained block ciphers like AES.
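For example, for RC4_128 with a SHA MAC--the suite I end up testing with--those tables pin down exactly how much key block to generate. Here's a sketch, reusing the made-up names from before; note that the seed order flips here, with the server random coming first.

const size_t cbMACKey = 20;   // HMAC-SHA1 key size, from the MAC table
const size_t cbWriteKey = 16; // RC4_128 key material, from the cipher table
const size_t cbIV = 0;        // stream cipher: no IV

// two of everything: one set for each direction
size_t cbKeyBlock = 2 * (cbMACKey + cbWriteKey + cbIV); // 72 bytes total

ByteVector vbKeyBlockSeed(vbServerHelloRandom);
vbKeyBlockSeed.insert(vbKeyBlockSeed.end(), vbClientHelloRandom.begin(), vbClientHelloRandom.end());

ByteVector vbKeyBlock = ComputePRF_TLS10(
    vbMasterSecret,
    "key expansion",
    vbKeyBlockSeed,
    cbKeyBlock);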

By the end of this part, I have all six of these values in hand. Or at least, I think I do. Once again, I don't yet have a way to verify that my progress so far is correct.

Cracking the code on TLSCiphertext

In TLS, the basic, fundamental message is a TLSPlaintext. It contains a bit of header data, a payload length field, and a plaintext (unencrypted) payload. Once the TLS handshake has been completed, all subsequent TLSPlaintext messages are transformed into TLSCiphertext messages (omitting TLSCompressed since we're not enabling compression yet), which are identical in format, but with the key difference that their payload is encrypted with the currently negotiated security parameters (the write key).

We also know the precise time when we should switch over from plaintext mode to ciphertext mode: when the client sends a ChangeCipherSpec message to the server, subsequent messages will be using new security parameters.

We should talk about what cipher suite we choose to test with first. It makes a pretty big difference. There are two basic types of "bulk ciphers", a.k.a. "symmetric ciphers": stream and block. The format of the ciphertext payload differs based on this:

    stream-ciphered struct {
        opaque content[TLSCompressed.length];
        opaque MAC[CipherSpec.hash_size];
    } GenericStreamCipher;

    block-ciphered struct {
        opaque content[TLSCompressed.length];
        opaque MAC[CipherSpec.hash_size];
        uint8 padding[GenericBlockCipher.padding_length];
        uint8 padding_length;
    } GenericBlockCipher;

Right at a glance, it's obvious the stream cipher will be simpler to deal with. Making the case even stronger, block ciphers also have to deal with initialization vectors. Ugh, we'll come back to them later. For now, we'll roll with TLS_RSA_WITH_RC4_128_SHA (that's RC4 encryption using 16-byte keys, and SHA1 for the MAC).

So what do we do? When we switch to "secure mode", the next message we get we treat as a TLSCiphertext message instead, and try to decrypt its payload using the "client write key" (the key the client uses when it writes data to the server).
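Concretely, that decryption attempt boils down to a single CryptDecrypt call, sketched here with made-up variable names. hClientWriteKey is assumed to be an RC4 key handle already imported from the client write key bytes.

DWORD cbData = static_cast<DWORD>(vbPayload.size());

// stream cipher: the payload decrypts in place, same length out as in
if (!CryptDecrypt(
         hClientWriteKey,
         NULL,    // no hash to chain
         FALSE,   // FALSE so the RC4 stream keeps running for later records
         0,
         &vbPayload[0],
         &cbData))
{
    hr = HRESULT_FROM_WIN32(GetLastError());
    goto error;
}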

We do this as we wait with bated breath and prayer in our hearts. Because if it doesn't work? No, I can't think that way. It... it has to!

Did it work?

No. No, it didn't.

I see garbage data in the decryption. I know this, because I have an idea of what the format of the payload should look like. The last message the client sends as part of the handshake is a Finished message, which contains one thing: a hash of all of the handshake messages that have occurred so far. Oh, and a handshake type field, which will always be 0x14, and a three byte length field, which will be 0x00000c. I can look for that.
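Here's the kind of quick sanity check I mean, on the decrypted bytes (vbDecrypted is just a made-up name for the decrypted payload):

// a Finished handshake should start with type 0x14 and a three-byte
// length of 0x00000c (12 bytes of verify_data in TLS 1.0)
bool LooksLikeFinished(const ByteVector& vbDecrypted)
{
    return vbDecrypted.size() >= 4 &&
           vbDecrypted[0] == 0x14 &&
           vbDecrypted[1] == 0x00 &&
           vbDecrypted[2] == 0x00 &&
           vbDecrypted[3] == 0x0c;
}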

So now what? At every step along the way I've said stuff like, "yeah, this part is easy", and, "nothing really complicated here", so what's the problem? Well, the code isn't hard, but there's a lot of it.

The only thing I could do was start reading over the code and doing a little debugging to make sure my state at the time of key computation and encryption was correct. Spoiler alert: it wasn't. I hadn't done simple things like keeping track of the client and server random values that are integral to the computations. Let's run the gamut of bugs I looked for that might have been a problem.

Potential bug: PRF calculation of master secret

What about the PRF calculation? A lot of moving parts in there. Is my master key even correct? It turns out I'm really lucky that the test tool I'm using, openssl.exe s_client, spits out a ton of debugging information, including the value of the computed master key! In fact, my master key computation yielded the same value! This is our first "oracle".


Finally, an Oracle!

This tells me two things: firstly, that my state going into key material computation is correct, and secondly (and more importantly) that my implementation of the PRF is also correct. Well, almost.

Potential bug: iteration count within PRF

Looking back at the TLS 1.0 PRF formula, notice that two different hash algorithms are used to produce a stream of data in chunks, then XOR'd together. These chunks are at whatever size the underlying hash function uses (16 bytes for MD5 and 20 bytes for SHA1). We produce this stream of data by calling that supplementary A() function repeatedly.

At exactly 64 bytes of output, MD5 and SHA1 both have to be iterated 4 times; for any larger amount of data, the two algorithms start to differ in iteration count. The key material required for RC4 + SHA1 is 72 bytes (two 20-byte MAC keys plus two 16-byte write keys)--well over the line. I could have a bug in handling different iteration counts between the two hash algorithms' output.
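Just to spell out the arithmetic (a sanity check, not production code):

// MD5 emits 16-byte chunks, SHA-1 emits 20-byte chunks
size_t cIterationsMD5 = (72 + 16 - 1) / 16;  // ceil(72/16) = 5
size_t cIterationsSHA1 = (72 + 20 - 1) / 20; // ceil(72/20) = 4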

But no, this code was correct. At this point I have reasonable confidence that my implementation of PRF is totally correct. This means that there is an "alibi" for any other potential bugs in things that stem directly from it, such as the raw bytes of the key expansion, proper incorporation of the ASCII label, or concatenation of the random numbers.

Potential bug: CSP's key container saturation

Back in the blog post about HMACs (keyed hashes) I showed code for importing the key used for the HMAC. I didn't show the call to CryptAcquireContext, where I explicitly hardcoded a key container name for the keys to be imported into.

How many keys can a key container hold? What happens if they go over the limit? Currently, every time I call the HMAC function or create an instance of a symmetric key "cipherer" object, I store those keys in the same containers. If they're being overwritten, I'd have no way of knowing that.

Actually, we already know we're safe with respect to HMACs, because we've proven that our PRF is correct. We have to guess with the symmetric keys, but I changed my code to pass CRYPT_VERIFYCONTEXT. This weirdly named flag gets you essentially an ephemeral, unique key container, which is exactly what I want here.

if (!CryptAcquireContextW(
         &hProv,
         NULL, // key container name isn't used now
         MS_ENH_RSA_AES_PROV_W,
         PROV_RSA_AES,
         CRYPT_VERIFYCONTEXT))
{
    hr = HRESULT_FROM_WIN32(GetLastError());
    goto error;
}

So this code was fine, too.

Potential bug: key material partitioning

Even if our PRF is correct and we're generating the right amount of data for the key block, we have to partition it correctly.

auto itKeyBlock = vbKeyBlock.begin();

size_t cbField = hashInfo.cbHashKeySize;
ClientWriteMACKey()->assign(itKeyBlock, itKeyBlock + cbField);
itKeyBlock += cbField;

assert(itKeyBlock <= vbKeyBlock.end());
cbField = hashInfo.cbHashKeySize;
ServerWriteMACKey()->assign(itKeyBlock, itKeyBlock + cbField);
itKeyBlock += cbField;

// other 4 key material fields...

assert(itKeyBlock == vbKeyBlock.end());

With a combination of asserts and log output (not printed), I verified that the key material block was being partitioned correctly.

Potential bug: could--could it be... endianness?

At last we revisit our old friend, our old nemesis, byte order, a.k.a. endianness. This guy trolled us hardcore last time. Could he be striking back again? Well, where could it even be coming up? The only place this could be a problem is in calls into CryptoAPI functions like CryptHashData and CryptEncrypt. We already looked over these before.

Ahh, but we looked at CryptEncrypt before as it pertains to block ciphers, not stream ciphers. In a block cipher, you get a chunk of data that decrypts cleanly; in a stream cipher, you get a running stream of data, and every byte decrypts as it comes. What does it mean to reverse the byte order of a stream? Very little.

I actually experimented with this early on in my debugging, but it didn't yield anything, probably because of other bugs I'd yet to find. Sure enough, when I told my DecryptBuffer function not to reverse byte order when doing a stream cipher, it magically just worked. I whooped with a hearty, "YEAHDOG" to celebrate the moment.
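The fix itself was essentially a one-liner, roughly like this (the names are approximations, not my literal code):

// only block cipher output needs its byte order reversed; stream cipher
// output (like RC4's) is used exactly as CryptDecrypt produced it
// (std::reverse comes from <algorithm>)
if (fIsBlockCipher)
{
    std::reverse(vbDecrypted.begin(), vbDecrypted.end());
}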

What's up next?

What did we learn? A lot about how to read and implement an RFC. I have a lot of cleanup work to do here. I definitely did many things in very much a GIT-R-DUN manner, and I don't like to leave my code like that.

Next, I need to do verification of the MAC attached to the Finished message. But make no mistake: I'm over the crest of the hill. Getting this much working was probably the hardest part of the entire project. From now on I don't expect it to be nearly as bad.
