20121218

Sandwich Mechanics: A Tale of Failure, Redemption, and Safeway

Time marches on, you know, as does bread. Wait, what? That was too lofty and didn't make any sense. What I meant was: sorry, it's been a little while since I wrote one of these, but that doesn't mean I haven't been experimenting with bread all the while. Today we have a somewhat troubling tale of inconsistency and bad consistency, but just a touch of hope.

Most of my bread posts in the past have been about Oroweat breads. They have some good ones, but they're kind of pricey. I often try the store-brand alternatives for stuff if it's cheaper, and so I stumbled upon this Safeway brand Honey Wheat Berry bread. Weird that I can't find a good link for it.

I'm a sucker for certain key words in food. If you put "honey", "peppercorn", "plum", or a number of other random words in your food name, I'm completely at your mercy. And so I walked on out of there with that loaf of bread.

Even before I was out of the store, though, I thought to myself, "this bread feels a bit soft." Man, "soft" doesn't begin to describe it. Not only was it soft, it was straight up underbaked. Like, gooey center and all that. Except, I don't want a gooey center in my bread, thanks. Or else I'd eat a... I dunno, marshmallow? Something else.

It nearly fell apart every day in my sandwich. I had to be careful when spreading on it. I got little doughy pieces of bread on my fingers when handling it. I almost wonder if I got a fluke, a random bad loaf. Worst of all, the taste was nothing special. I tasted wheat, sure, but no honey or berry, really. Or is a "wheatberry" a thing? A disappointing thing, maybe? I was hoping for a slightly sweet bread, but it just wasn't there.

That's grim, Safeway. Real grim. You're skating on thin ice, buddy. Out of the goodness of my heart (they say this wheat bread stuff is good for the heart!), though, I decided to give them one more try. I picked out some Safeway Nut and Grain bread with similar branding. This time, I hefted the bread and kind of, er, palpated it a bit. Feelin' okay... let's do this!

Oh man, I'm really glad I gave this another try. Whereas the first bread was limp and stupid, this Nut and Grain stuff is diametrically opposite to it: sturdy, textured, and very tasty. This is by far the most heavily textured bread I have come upon so far, largely because each slice has multiple actual nuts in it. God, wow.

Seriously, barely a bite passes without a good piece of a nut in there, which you're left chewing on thoughtfully after you've dispensed with the rest of the mouthful. That interesting texture is a big deal for me, so I liked this a lot.. Best of all, the bread holds together very well. I might even be able to drop a tomato slice on one of these bad boys sometime.

Taste-wise, the grains of the bread aren't too crazy, but if you like nuts, then obviously you'll also like the overall taste, because it's, um, really nutty.

So, what have we learned here? Honey Wheat Berry bad. Nut and Grain good. Oh, and give 'em another try. Ugh, now that I've said that, does it mean I'll need to give Franz a second chance, too? Biiiih.

20121116

Weird DNS Stuff on Windows

Hi there. I work with networking for my job. That means I do all kinds of wacky stuff with IP addresses, but if you say "IP address", I'm going to ask "v4 or v6"? Comes with the territory, I'm afraid.

I did some debugging of some configurations the other day that resulted in some surprising results, so I thought I might share a couple of wacky DNS tricks I learned about.

One: what the heck is fec0:0:0:ffff::1??

We had a strange situation the other day where our system was explicitly configured with no DNS servers... and yet, name resolution was taking place, same as it ever was. This gave us some head scratching. It wasn't being handled by the hosts file, either. We took some Netmon captures and noticed something interesting: our machine was making DNS queries to the IPv6 address fec0:0:0:ffff::1.

But what is this thing? Well, fec0 is not actually programmer humor for "fecal." It indicates a "site-local" prefix. As for the rest of it, I had no idea how it came to be configured or known by the system. I thought it was some Microsoft-specific name resolution logic at first, some hard-coded values baked into the resolver.

Well, it kind of is, but even better, it's part of an IETF draft recommendation--an old one, I might add--called IPv6 Stateless DNS Discovery.

A set of three well-known site-local IPv6 addresses are reserved for autodiscovery of DNS servers. These addresses may be used as unicast addresses, assigned to different servers, or as anycast addresses with one of them being assigned to all DNS servers in the site, or any combination of anycast and unicast addresses. In any case, host routes are propagated in the site's routing tables. This document proposes that these three addresses be fec0:0:0:ffff::1, fec0:0:0:ffff::2, and fec0:0:0:ffff::3. This list of three addresses may be hardcoded into a host.

So there you have it. Apparently Microsoft is one of the authors of the draft--big shocker--but at least it's a draft out there. A bunch of our campus is now fully IPv6 enabled now, with direct Internet reachability and stuff, which is fantastic. Every machine in our building, for instance, has a global IPv6 address. It's pretty neat. The future is coming!

Two: what is this ..localmachine thing????

This one deals with Winsock programming. Let's say you want to get a list of the IP addresses on a machine that you could listen on. Firstly, you don't typically need to do this at all. Instead, I'd recommend simply binding to the machine's default IPv4 address (usually passed in as 0.0.0.0) or IPv6 address, ::. But leaving that aside, say you do want to listen explicitly on each individual address.

The typical method will probably be to call GetAddrInfoW with your own hostname, which will enumerate all your addresses. What if you don't want to do that? Maybe you're an embedded computer with a limited stack and no hostname. Well, on Windows, there's actually a neat trick.

If you pass in the special string, "..localmachine" into basically any function that hits the name resolver, it will get you all the addresses associated with the current machine. It's a pretty neat trick, I have to say, though perhaps a bit niche in its use. All of this is documented on the MSDN page, by the way; I just happened to stumble on it recently.

20121019

Sandwich Mechanics: Pesto... on a sandwich!?

I could probably end this entire blog post with like three words: "that's pretty tasty," but I'm a wordy guy, so we'll talk about this unnecessarily. Well, moreover, I actually have a specific pesto spread I was trying, and which I can recommend wholeheartedly.

That pesto spread is this weird Pesto Alla Genovese from Trader Joe's--sorry, Trader Giotto's, haha. Apparently Amazon is saying this thing costs $10 for a dinky bottle. What the heck! I don't actually know how much this cost because my wife brought it home on a whim this one time.

But it's pretty tasty.

The consistency is well suited to a spread for a sandwich. Pesto can sometimes be a bit thin or extra chunky, but this one is thick enough to spread well straight out of the fridge, and it doesn't thin out and soak the bread during the day. These things are important.

It's got a pretty strong pesto taste, so if you put a lot of it, it's going to start creeping over even the taste of your meat and cheese. You're not going to want to overdo this one. As they say in the recipe books, put it on "to taste." Useless.

Perhaps more important is making sure it's paired well with the other things in your sandwich. In general, there are a lot of things you can throw together in a sandwich that won't clash, but the thought of putting Miracle Whip (or any sweet spread, for that matter) on one side and pesto on the other side just made me cringe.

It's like the other day I had pasta with Italian sausage and pesto in it. It just wasn't right. It would have been much better with chicken instead. Bad like toothpaste and OJ, yanno?

So I do mayonnaise on one side and pesto on the other, and hey, it's pretty tasty.

If you're awesome, make your own pesto instead. It's not too hard, but you might have a tougher time hitting the really nice spreadable consistency. To me, it's along these lines that the $10 becomes worth it: even if you do nail the consistency, fresh pesto will go bad relatively quickly unless you freeze it, but then it's a hassle defrosting it conveniently to get on sandwiches on a regular basis. I think they've got some weird preservatives in there (which I guess will turn some people off) that make this stuff last for a few months. I guess maybe it is worth $10 for that?

Do I need to say that it's pretty tasty? Cause it is.

20120828

Brain Flood About Initialization Vectors

I've reached what I would call basically "v1.0" of MungeTLS, so I'm going through the exercise of making a pass over the code with a fine-toothed comb, cleaning up lots of things, and documenting the heck out of everything. I do this in preparation for sending it to a few of my friends to code review. I expect to get a lot of scathing comments about my lack of exceptions and usage of gotos.

Anyway, while doing this pass over my code I noticed some strange stuff having to do with encryption and decryption of TLS 1.1 and 1.2 block ciphers. Let's do an overview of how they work, then the aberrant behavior I found, then a bunch of other investigation stuff to get to the bottom of it. It's been a crazy few days, let me tell you.

Block Ciphers and CBC

In TLS 1.1 and 1.2, the way that block ciphers work was changed. Oops, let me step back a second. There are two common types of "bulk" encryption, or symmetric key encryption used with TLS: stream and block ciphers. Stream ciphers will give you encrypted data byte-for-byte. RC4 is an example of a stream cipher. Block ciphers take a chunk of data and turn it into a series of fixed-size blocks of encrypted data, potentially with some padding at the end. AES is a very popular example of a block cipher.

Block ciphers are vulnerable to using the same key over and over again on data. I don't know exactly why that is, but to combat it, a technique called an "initialization vector" is used to sort of seed the encryption in a way. It's actually fairly simple in implementation, just XORing the IV into the plaintext just before encrypting. Moreover, we're using our block ciphers in "cipher-block-chaining" (CBC) mode, where each block is used as the IV for the next block.

There are some particularly interesting things to glean from this diagram. Firstly, for encryption, you're actually transmitting a series of IVs + ciphertext blocks, since each ciphertext block is implicitly the IV for the next one. That's a pretty weird concept, and effectively also says that it's okay to transmit any IV in the clear.

For decryption, the most interesting thing to see is that if you start decrypting with the wrong IV, only the first block will be incorrect. That's also pretty wacky. I'm not a crypto guy, so it doesn't really make sense to me why this is accepted and okay. Apparently there are other block cipher "modes" than CBC that do a better job of mitigating these concerns, but CBC is maybe the most popular one. Go figure.

Another important thing to note is that the IV size is always the same as the block size, since each block is itself also an IV.

I ended up writing a bunch of sample code to figure out these weird gotchas before looking this stuff up and finding the super useful diagram above, which confirmed these oddities.

Now let's take a step back chronologically, to a time before I had read and figured out all that stuff about IVs, to a time when I noticed some weirdness in my code. The plot arc looks like this: we're going to look at what the RFC seems to be saying we should do. Then we'll look at what I should have implemented. Then we'll look at what I actually implemented. We'll see how I and the RFC are wrong. Finally, we'll have some unsettling conclusions.

The idea of "IVNext", insidiously suggested by the RFC

First, let's take a look at what TLS 1.1 tells us in the RFC. Actually, they don't really tell us much about how the IV should work. It's like they expect me to be some kind of security expert to implement this thing. God, what were they thinking?

// TLS 1.1
block-ciphered struct {
    opaque IV[CipherSpec.block_length];
    opaque content[TLSCompressed.length];
    opaque MAC[CipherSpec.hash_size];
    uint8 padding[GenericBlockCipher.padding_length];
    uint8 padding_length;
} GenericBlockCipher;

The IV is encrypted here, which means the IV contained within this record actually has to be the "next IV", or IV to be used for decrypting the next record. The initial IV is computed from the master secret identically on both client and server side. In the absence of explicit guidance, this is what a layman would infer from the RFC.

Let's translate that into some heavy pseudocode.

CurrentIV = IV0 (which you get from the key material generation)

function SendMessage(payload)
{
    message = CreateMessage()
    message.content = payload

    NextIV = GenerateNextIV() // IVs should be some random value
    message.IV = NextIV // send this to be used on the next ciphertext
    bytes = EncryptMessage_TLS_1_1(message, CurrentIV) // current IV set prior
    CurrentIV = NextIV

    SendBytes(bytes)
}

Great, that's what the RFC suggests I should implement. The problem--moreover, the problem that made me notice the oddities in the first place--is that I didn't implement this correctly, and my code still worked. It's basically a programmer's biggest nightmare.

What happens when we don't implement IVNext properly?

Spoiler alert: nothing. Here's the pseudocode for what I actually implemented.

CurrentIV = IV0 (which you get from the key material generation)

function SendMessage(payload)
{
    message = CreateMessage()
    message.content = payload

    message.IV = CurrentIV // oops! recipient will use the wrong IV!
    bytes = EncryptMessage_TLS_1_1(message, CurrentIV)
    CurrentIV = GenerateNextIV()

    SendBytes(bytes)
}

So I was sending the same IV within the message that I was using to encrypt the message in the first place. The recipient is going to use the wrong IV to decrypt the message, and-and-and everything will go crazy... right?!

That's what I had thought at first, before I read up on how IVs work. Remember back to the explanation earlier: the IV really only affects the first block of decryption. The recipient will decrypt the whole ciphertext, and will get an incorrect decrypted IV value that I sent. But it doesn't care, since that IV value is not even used! Each ciphertext block is used as an IV for the decryption of the next block, and that is all done correctly.

I don't know if I'm explaining that right. Fundamentally, it would never make sense to send the IV encrypted as the first block of ciphertext. For a security expert, this is probably such a patently obvious design flaw that they didn't even need to issue errata about the bug in the GenericBlockCipher struct in TLS 1.1. It's actually fixed in TLS 1.2, though I incorrectly claimed that it seemed buggy in a previous post.

Fixing the IV code

Well, the first thing we're going to do when fixing it is stop calling anything "IV Next". It's really the IV for this record, not the next one. Here is the correct code.

function SendMessage(payload)
{
    message = CreateMessage()
    message.content = payload
    // also add padding, yeah

    message.IV = GenerateNextIV()
    bytes = EncryptMessage_TLS_1_1(message.content, message.IV)

    // IV in cleartext, bytes are encrypted
    SendBytes(message.IV + bytes)
}

Wow, that's much simpler. It's weird that the IV that's generated during key material calculation isn't even used. I think it's used for other cipher types that actually do need it. Anyway, this actually works.

Some final thoughts

I already explained above why it doesn't matter whether we are encrypting the entire payload (IV and content) or just starting from the content--because each ciphertext block depends on the previous block as an IV. That block-sized piece of data immediately preceding the payload is an IV for the encrypted payload no matter how it got there, no matter whether it's plaintext or encrypted. While I understand why it works, it still bugs me that it works both ways.

And I still don't get why it's even safe to send IVs in the clear like this. In TLS 1.0, the IV for a record was the last block of the previous ciphertext, as though all ciphertext was just chained together. It was changed in TLS 1.1 to be included adjacent to the payload it's encrypting, but it's still in the clear. I don't get how it's fundamentally any better.

Any security experts around wanna bust out the math and show me what's up?

20120827

Shockingly Meta Asymmetric Encryption

A long time ago I was figuring out how to get public/private key encryption working on Windows with CryptAPI. I ultimately did, but I wasn't able to get all four crypto operations working. This is a short post to explain how a little epiphany blew my mind.

The four operations I had in mind were:

  • Encrypt with private key
  • Decrypt with private key
  • Encrypt with public key
  • Decrypt with public key

When I wrote my sample code to try all four of these combinations, I discovered that the last of these inexplicably didn't work. I chalked it up to to not finagling the right, arcane set of parameters to some CryptAPI functions, but I think it's more fundamental than that.

Actually, I think that even in theory, you only need only the first three for any security protocol. Let's look at the scenarios that these enable.

Alice wants to send to Bob securely. Alice has two options: encrypt with her private key, or encrypt with Bob's public key. Bob has two options: decrypt with her public key, or decrypt with his private key (respectively).

Likewise, if Bob wants to send back to Alice securely, we just reverse the roles. Bob can always encrypt with Alice's public key rather than force Alice to decrypt with his public key. TLS is architected this way, as can any protocol that relies on public/private key encryption.

This hit me like a Mack truck the other day so hard that I just had to write it down. Is your mind blown, too?

20120810

This Contract Sucks. Renegotiation!

Hallo there! I'm back with some more codey writings! I've been busy at work on TLS renegotiation, which is a pretty tricky thing, but an important part of the spec. I'll give some very brief background on what it is, then talk about the kind of mess-ups I had to deal with while implementing it. Sounds good? Sounds good.

I never actually talked about the TLS handshake process, arguably the most interesting feature of the TLS spec, which also takes up like 80% of the RFC. It's the process by which the client and server exchange secrets to form an encrypted connection. This is also referred to as "negotiating" the security parameters of the connection. Then we come to renegotiation, which is starting another handshake on top of an already-secured connection.

Why would you want to do this, though? Er, well, I don't really know, exactly. I guess in IE you can go to an un-authenticated page on some web site, then navigate to a part of the site that requires the client to prove its identity using a client certificate--relatively uncommon behavior in TLS clients. This calls for a renegotiation, since the client has to offer more information about itself than it does in a regular TLS session. Or in the unlikely event that the sequence number involved in a connection is going to wrap, the connection must be renegotiated. In the RFC's own words:

Sequence numbers are of type uint64 and may not exceed 2^64-1. Sequence numbers do not wrap. If a TLS implementation would need to wrap a sequence number, it must renegotiate instead.

Renegotiation can be initiated in two ways, and it's pretty simple. The client can start a renegotiation at any time after a handshake is completed by sending the server another ClientHello message, the same as when it started the first handshake. The server at any time after a handshake is completed can send the client a HelloRequest message, which--intuitively--asks the client to send a ClientHello message as soon as possible.

It's of interest that these renegotiation messages are already protected by the encryption currently being used. This fundamental point caused me the majority of my work in implementing the feature. Speaking of... let's start getting to that, huh?

Gotta keep [connection states] separated

My code originally made some pretty large, sweeping simplifying assumptions to get basic scenarios working more quickly. I actually kind of foresaw that these issues would crop up eventually, but kept putting it off dealing with them. Implementing renegotiation was that forcing function to make me deal with them properly.

Concretely, I'm talking about separate states for inbound and outbound traffic, or "read state" and "write state". The TLS connection is actually three-legged, so this state separation is pretty important. You can kind of limp by without dealing with it properly, but when it comes to renegotiation it becomes more important.

The three "legs" as you progress through the handshake towards an encrypted connection are (from the server's point of view):

  1. Both read (client-to-server) and write (server-to-client) state are unencrypted.
  2. Client has sent a ChangeCipherSpec message, so read state is encrypted, but write state is not.
  3. Server sends its ChangeCipherSpec message, and now both read and write states are encrypted. Connection is now in steady-state, ready for application data.

The crucial shortcoming in that list above is that I refer to things being unencrypted or encrypted, which is insufficient a description. What I actually mean is "encrypted using the previous connection state's parameters", but dang if that isn't way more syllables. Since the initial cipher suite is TLS_NULL_WITH_NULL_NULL (no key exchange, bulk cipher, or hash algorithm), it's easy to think of it as "unencrypted" vs. "encrypted", and gloss over some important details.

The RFC even talks about this state separation pretty explicitly:

Reception of [a ChangeCipherSpec] message causes the receiver to instruct the record layer to immediately copy the read pending state into the read current state.

In my early versions, I had this huge ConnectionParameters class, which contained all of the key material and so on. A primary piece of data that needed to be separated was the cipher suite. If we are renegotiating a new cipher suite to be used, then between legs 2 and 3, the client is sending with the new cipher suite, but the server is sending with the old one.

To that end, I identified the pieces of data that needed to be split and put them in a separate EndpointParameters class. Here are the members it ended up having:

std::shared_ptr<Hasher> m_spHasher;
std::shared_ptr<SymmetricCipherer> m_spSymCipherer;
MT_ProtocolVersion::MTPV_Version m_eVersion;
MT_CipherSuite m_cipherSuite;
ByteVector m_vbKey;
ByteVector m_vbMACKey;
ByteVector m_vbIV;
MT_UINT64 m_seqNum;

Then at the key moment--when we get a ChangeCipherSpec, it's very easy for me to copy over the state for just that read or write direction.

else if (*record.ContentType()->Type() == MT_ContentType::MTCT_Type_ChangeCipherSpec)
{
    *CurrConn()->ReadParams() = *NextConn()->ReadParams();

    /*
    ** after copying the next endpoint state, which has not been touched,
    ** its sequence number should already be 0 without having to reset it
    */
    assert(*CurrConn()->ReadParams()->SequenceNumber() == 0);
}

The assert that I showed here is also pretty interesting. Previously I was manually setting the read sequence number to 0 when receiving this message, but now, as the comment says, I shouldn't have to.

Likewise, when the server sends its own ChangeCipherSpec, I can easily copy the write state:

hr = EnqueueMessage(spChangeCipherSpec);
if (hr != S_OK)
{
    goto error;
}

*CurrConn()->WriteParams() = *NextConn()->WriteParams();

hr = NextConn()->CopyCommonParamsTo(CurrConn());
if (hr != S_OK)
{
    goto error;
}

/*
** newly copied new connection state should have its initial value of
** 0 for sequence number, since it hasn't been touched yet
*/
assert(*CurrConn()->WriteParams()->SequenceNumber() == 0);

Since this is the last leg of the handshake, I also copy over all the other non-endpoint-specific parameters, thus making the current connection the fully active one.

Quick note: renegotiation protection

Oh yeah, there's this thing called renegotiation protection that protects a renegotiation handshake from having some extra records inserted into the middle of it by an attacker. Basically, it requires adding extra data to the ClientHello and ServerHello to prove that the renegotiation is related in some way to the previous handshake. Specifically, it's data from the Finished messages before--the final messages sent in the handshake.

This wasn't too hard to implement. I just had to keep track of the "verify data" from the first handshake and insert it into the next handshake. Once I had the architecture of keeping track of an old and a new connection state, it was easy.

What does it mean to be encrypted?

In my bulleted list earlier I talked about the inadequacy of describing the connection as being "encrypted" or not. Turns out my code was doing this, too. I had a single flag in the ConnectionParameters class called bool m_fIsSecure. In a couple places in the code I'd do things like: "if the connection is currently secure, then parse this message as a TLSCiphertext message and try to decrypt it; otherwise, parse it as a TLSPlaintext."

As I tried to describe earlier, that's nowhere near good enough for renegotiation. Now you have to start concerning yourself with which cipher suite to decrypt the message with--RC4? AES128? etc.. The read/write state separation helped here, but I was able to clean it up even more by using some provisions given by the RFC.

Namely, recall that the initial cipher suite is TLS_NULL_WITH_NULL_NULL, a.k.a. unencrypted. What if I started out my code with that cipher suite and tried to parse everything as a TLSCiphertext, albeit sometimes with null encryption? Then it will either decrypt or not decrypt as necessary, depending on the current connection state.

As a freebie, I get an implementation of null encryption proper, though not even OpenSSL's debugging client allows it. Maybe I have to recompile the thing with some dumb flag?

Maybe an IE bug? Different cipher suite during renegotiation

The whole point of this project is to test TLS client implementations, so I've been testing against both the OpenSSL command line client and IE throughout. IE as of now had the only implementation of TLS 1.2 that I could find, so it was a necessity. I hear OpenSSL might have it now, but I need to update and check.

Anyway, I was having a heck of a time testing with IE. I ran into two problems. Firstly, IE doesn't seem to support choosing a different cipher suite during renegotiation. That's maybe understandable for security reasons, though it seems odd not to allow "upgrading" to a more secure cipher suite. Whatever.

My problem with it is that the ClientHello advertises all of IE's usual cipher suites, but IE resets the connection if you choose anything other than the same one you were using before. Is this some weird defense-in-depth, not to broadcast what the previous connection state is using? That doesn't quite make sense either. I dunno.

Maybe an IE bug? Renegotiation after application data

I found another weird IE problem, that it doesn't like a HelloRequest directly after an ApplicationData message. It's not closing the connection at that point, since I've specified to keep the connection alive, but I guess internally the fetching of a web page kind of halts the processing of more messages that arrive after the web page is done loading.

If I delay the HelloRequest until I get a new ApplicationData--presumably the start of a new HTTP request--it is totally happy with it.

Chrome didn't seem to have this problem, nor did the command line OpenSSL client.

Pretty sure it's a Chrome bug: mismatched version

It's interesting: Chrome (latest Chromium, even) doesn't support TLS 1.2. It does support TLS 1.1, however, which is somewhere between 1.0 and 1.2 in terms of similarities. Wow, how insightful. To sum it up in a sentence, the key differences are that TLS 1.1 still uses the same hybrid SHA1/MD5 pseudorandom function that TLS 1.0 does, but uses TLS 1.2's incorporated IVs in the cipher fragment. Anyway, to test with Chrome I had to implement TLS 1.1. It wasn't too bad, since I already have working TLS 1.0 and 1.2 implementations.

HRESULT
ComputePRF_TLS10(
    Hasher* pHasher,
    const ByteVector* pvbSecret,
    PCSTR szLabel,
    const ByteVector* pvbSeed,
    size_t cbLengthDesired,
    ByteVector* pvbPRF);

// same PRF used for both 1.0 and 1.1
auto ComputePRF_TLS11 = ComputePRF_TLS10;

// same block structure format between 1.1 and 1.2
typedef MT_GenericBlockCipher_TLS11 MT_GenericBlockCipher_TLS12;

So having all that working, I found that my renegotiations were failing! Even with same-cipher suite choice! Weird stuff, man. I got to debugging. I noticed from my logs that the MAC attached to the ClientHello in the renegotiation was not computing correctly.

received MAC:
17 18 53 ED BD 10 0D 77 FB 94 A4 97 94 7D 2F BA 69 F2 96 F9 

computed MAC:
4A 1B F0 03 2A C0 2C 72 B7 66 DA 68 B3 5C CE D9 F3 F5 52 2B 

tlsciphertext failed security check: MT_E_BAD_RECORD_MAC

I implemented one of the real killer features of MungeTLS to help debug this: it logs all the traffic unencrypted even if it sent or received it encrypted. This gets exported to a Netmon capture for easy viewing. And what did I spy in my Netmon capture that looked amiss?

InnerTLS: TLS Rec Layer-1 HandShake: Client Hello.
- TlsRecordLayer: TLS Rec Layer-1 HandShake:
   ContentType: HandShake:
 + Version: TLS 1.0
   Length: 175 (0xAF)
 - SSLHandshake: SSL HandShake ClientHello(0x01)
    HandShakeType: ClientHello(0x01)
    Length: 171 (0xAB)
  - ClientHello: TLS 1.1
   + Version: TLS 1.1
   + RandomBytes: 
     SessionIDLength: 0 (0x0)
     CipherSuitesLength: 72
   + TLSCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA    { 0xC0,0x0A }
...
...
     CompressionMethodsLength: 1 (0x1)
     CompressionMethods: 0 (0x0)
     ExtensionsLength: 58 (0x3A)
   + ClientHelloExtension: Server Name(0x0000)
   + ClientHelloExtension: Renegotiation Info(0xFF01)
   + ClientHelloExtension: Elliptic Curves(0x000A)
   + ClientHelloExtension: EC Point Formats(0x000B)
   + ClientHelloExtension: SessionTicket TLS(0x0023)
   + ClientHelloExtension: Unknown Extension Type
   + ClientHelloExtension: Status Request(0x0005)

Woops, why doesn't this version match!? I don't know what kinds of tricks they're trying to pull here... This seemed like a strange difference, so I wondered if they were using the wrong version in their MAC computation. Recall from a previous post the MAC computation:

The MAC is generated as:

HMAC_hash(MAC_write_secret, seq_num + TLSCompressed.type +
             TLSCompressed.version + TLSCompressed.length +
             TLSCompressed.fragment));

It clearly says to get the version from TLSCompressed (a.k.a. TLSCiphertext for our purposes). The hash computation that failed is incorrectly using the ClientHello.version instead. I added a callback in the code to allow the application to reconcile this version difference if it's encountered. Once I had this fixed, the handshake went through smoothly.

HRESULT
DummyServer::OnReconcileSecurityVersion(
    MT_TLSCiphertext* pCiphertext,
    MT_ProtocolVersion::MTPV_Version connVersion,
    MT_ProtocolVersion::MTPV_Version recordVersion,
    MT_ProtocolVersion::MTPV_Version* pOverrideVersion)
{
    UNREFERENCED_PARAMETER(pCiphertext);

    // detecting chrome bug and working around. could also have sniffed UA str
    if (connVersion == MT_ProtocolVersion::MTPV_TLS11 &&
        recordVersion == MT_ProtocolVersion::MTPV_TLS10)
    {
        *pOverrideVersion = connVersion;
        return MT_S_LISTENER_HANDLED;
    }

    return MT_S_LISTENER_IGNORED;
} // end function OnReconcileSecurityVersion

It pleases me greatly to see my program already finding bugs in a TLS protocol implementation. This is exactly the purpose I created it for, and I haven't even gotten started with any really nasty tests!

The looming shadow of things to come

I feel a lot better about a number of parts of the code now. It removes a lot of assumptions and models the TLS spec more closely. However, the next (and final?) big feature I have in mind is to make a client implementation to go along with the server implemenation I currently have, and that's going to take a lot of work along the lines of sorting out "read" and "write" states, compared to "client" and "server" endpoints. I can tell that's going to be a mess. Look forward to hearing about it later!

20120709

Sandwich Mechanics: Fire the Cannons! Sink this Bread-ship!

My method for buying new bread is usually to look at whatever's on sale at the grocery store, because I'm a cheapskate. Usually there is at least one Oroweat bread on sale (and I don't mean BOGO, because what am I going to do with all that bread before it expires??), but last time not a single Oroweat bread was, so I had to explore a little farther.

I stumbled upon this Franz bread, Cannon Beach Milk and Honey. I don't normally buy Franz because I don't tend to like them as much as Oroweat, but like I said, desperate times... It said "Milk and Honey" on it, which is basically enough to get me to buy anything. "Peppercorn" and "plum" are likewise culinary triggers for me. I'm so easily played.

I got played big time. The bottom line is this bread is terrible. I saw a Franz delivery truck this morning that displayed this bread and a couple others on the side, and immediately got a little mad. Of all the nerve, trying to foist this inferior product off on us with delicious sounding words.

Deep breaths...

This bread is soft to the point of instability. I guess that kind of comes with the territory when you're talking about white breads, but it still seemed weak. This ended up being a huge problem, as it started getting soggy within minutes of constructing the sandwich. That's simply unacceptable. I imagine it would likewise fail as a soup vehicle, probably breaking apart on the way to your mouth mere seconds after dipping.

Each slice of bread also seemed almost... underbaked? A little spot next to the curl on each side was hard, doughy, and undercooked, as though it weren't properly baked. I haven't seen this before, and I don't know what it means. I guess it didn't taste noticeably worse, but it was disconcerting.

Finally, let's come to the much touted Milk and Honey. I have just one thing to say: where is it? I couldn't detect even a hint of it. A disappointment on top of mediocrity. It tasted like normal, bland white bread.

Don't bother with this one. Skip it and get something stronger, more interesting, or better tasting. There are plenty of choices out there. Cannon Beach? Hmph. Deserves to be shelled.

20120706

Sandwich Mechanics: Premium Italian? More like Premi-DUMB!

Woohoo, we're gonna crank out these catch-up sandwich blogs like nobody's business. That's not to say I'm going to skimp on the details. Don't you worry about that. We'll apply the appropriate amount of writing to the breads, according to their merits. Speaking of which, this post is going to be, hmm... short.

This time we're talking about Oroweat Premium Italian bread. It looked pretty decent, and I think it was on sale at the grocs shop, so I grabbed it to try out. The loaf is a little longer and wider than Oroweat's usual shorter loaf size, which stood out a bit to me. However, even with this increased size, it felt light. Suspicious.

My suspicions were later confirmed as I started making sandwiches with it. This is a very light bread. It's airy and a bit fluffy, also pretty soft. I guess I'm finding that I prefer heavier breads, because I didn't like these attributes in it, not one bit. The bread felt weak to me. At first it seemed almost belligerent to the idea of a spread, only allowing me to coat the surface somewhat. Later, it soaked it up too much! It wasn't a deal-breaking amount of soaking, but more than I like. Because of its soft and light consistency, pieces towards either end tended to get squashed about. That's not good eatin's.

At least hopefully there is some marevelous taste to make up for this bread of weak constitution...? Not so, I'm afraid. The bread is bland, simply put. The website exalts its "ultimate flavor", but I'm sorry, guys, I just don't get it.

I was hoping for a strong bread maybe with some seasonings. I may get some flak for bringing it up, but Subway Italian bread is not bad. What I got instead was utterly common white bread with no real redeeming qualities. I say skip it and get some potato instead.

Sandwich Mechanics: Winter Wheat in the Summer

Hello again, everybody. Man, it's been--what?--six months? God, what happened? Er, a lot of code happened. And I wasn't making sandwiches for myself consistently for a little while, which stops the creative juices' flowing. But I hadn't stopped completely, so I nicely stockpiled some photos on my phone to remind me of breads that needed a little textual appreciation.

The one I'm writing about this time left quite an impression on me, enough so that I remember it vividly enough to write about months after I tried it. I'm talking about Oroweat Winter Wheat, a pretty unique bread off the shelf in the store.

Probably the very first thing you'll notice upon picking up the package is that it's kind of small. The form factor, if you will, is noticeably smaller in all dimensions than a standard loaf of bread. Despite that, the weight is comparable. Hmm. We'll revisit this soon. I prefer the regular size of bread, myself, because it gives me more surface area to work with when laying out ingredients. Raw spinach leaves, in particular, are finicky things, just chomping at the proverbial bit to slide off during construction.

Still not having opened the packaging, you can see some kind of large seeds on the surface of the loaf. These are actually sunflower seeds, and they contribute nicely to the texture of the bread; more on this later.

I'll be direct: this bread is quite tasty. It's kind of expensive, but if you're okay with shelling out for some yuppie bread, I do recommend it. The base taste of the bread is mildly sweet, so if you're adding on sweet spreads like Miracle Whip or the cranberry spread depicted (despite being cranberry, it is tempered quite a bit), you may get a little too much of that.

But the mere taste isn't what makes this bread unique or interesting. I briefly mentioned the weight before. Despite the bread's small size, it retains the weight of a normal slice. In other words, it's dense. This is a thick bread, and not in the sense of the cut of the slices. When you bite into it, it's a bit doughy. At the same time, the texture of the bread itself is kind of coarse: it breaks apart into large granules. Think of it as tending towards the consistency of, say, pumpkin bread. Mmmm, pumpkin bread...

Erk, where was I?

The words I used to describe it here might make it sound unappetizing, but it's really not. It's simply a different taste. You have light, airy breads like many white breads and the Italian bread I'll come to in a later episode, and you've got the heavier breads like the Winter Wheat.

What else is going on with the texture? Well, there are walnuts baked into the bread, too. Chopped walnuts, like you'd bake into a cake or something. They're slightly crunchy and add a lot to the normally boring texture of plain bread. Couple these with the previously mentioned sunflower seeds and you have a sandwich that suddenly has a number of extra "little bits" that surprise you as you eat each bite. I remember being somewhat distracted during lunch that day, chewing these little treats incorporated unexpectedly into my sandwich. It was great!

This is a great sandwich bread, but I don't recommend using it as dipping material. The bread is too expensive to waste as a mere soup-sponge. Its texture talents are better served where they'll be noticed: in a sandwich. Besides which, the coarse grain that I talked about before does not lend itself especially well to holding together in damp situations.

All in all, I recommend trying this one. It's a little expensive, and the small bread size can be a bit harder to work with, but the taste is good and the extra bits of texture complement the average sandwich well.

20120705

protip: fixing your stupid synaptics touchpad

I have a Asus Eee PC netbook. It’s not perfect, but I like it. One of the things I don’t like so much about it is that sometimes the trackpad gets all wonky and doesn’t quite work. Specifically, only a vertical strip down the center will function, and that too requiring undue force. Inputs on all other parts of the trackpad basically don’t work, so using it is nearly impossible. Moreover, tapping on the trackpad, which normally does a regular left-click, produces all kinds of other inputs that are not a left-click, such as a right-click or a middle-click.

So, what to do about this? I thought it might be a Windows 8 problem, because I got this around the time the Win8 developer preview was out and installed it directly, so I asked around at work. I actually got a very good tip that’s worth sharing, because I haven’t seen it written anywhere else.

If your trackpad gets into this situation, unplug the laptop from AC power, then place your palm flat on it for 10 sec, then remove your palm for 10 sec. This recalibrates the trackpad’s ground noise levels, which I guess are what were getting out of wack. The trackpad should function normally now. I’ve done this several times since receiving the suggestion, and it works like a charm. I hope it helps someone else out there. Protip indeed.

20120607

A Quick Bit on Cert Chains

A little oversight I had in the development of MungeTLS was that in all my testing, I'd basically hard-coded a certificate for the server to use. Heh, at early points, I had literally hard-coded it in the CPP file as a huge byte array tucked away at the bottom. It later progressed to calling the Windows cert store APIs to find a certificate and grab its encoded representation.

I didn't really touch it again, since it was working pretty well, but now that I've gotten most other crypto stuff working, it's time to make it a little more useful. One important thing I've always had to do in SSL client library testing is handling of different certificate chains. Different bitness of certificate keys, different "usage" OIDs allowed, out of order certificate chains, and so on. As a start, MungeTLS needs to support sending more than a measly one certificate in the Certificate handshake message.

This effort wasn't really that complicated or time consuming. It came down to switching all my PCCERT_CONTEXT variables to PCCERT_CHAIN_CONTEXT. The chain context structure gives you access to an array of PCCERT_CONTEXT structures, which is exactly what I need.

Correspondingly, in an unprecedented show of convenience, there is simply a method to grab a whole cert chain, given a PCCERT_CONTEXT: CertGetCertificateChain. It's beautiful! As far as parameters go, the only strange one that doesn't have a good default value is a PCERT_CHAIN_PARA structure. And of this, the only required parameter is RequestedUsage, which is nearly optional. Here's what I use to initialize it:

CERT_CHAIN_PARA chainPara = {0};

chainPara.cbSize = sizeof(chainPara);

// indicates not to use this member
chainPara.RequestedUsage.dwType = USAGE_MATCH_TYPE_AND;

if (!CertGetCertificateChain(
         NULL,
         pCertContext,
         NULL,
         NULL,
         &chainPara,
         0,
         NULL,
         &pCertChain))
{
    hr = HRESULT_FROM_WIN32(GetLastError());
    goto error;
}

Interacting with the resultant chain structure is also easy. Here's where I get access to the PCCERT_CONTEXT that I was previously being passed directly:

spPubKeyCipherer.reset(new WindowsPublicKeyCipherer());
hr = spPubKeyCipherer->Initialize(
         (*CertChain())->rgpChain[0]->rgpElement[0]->pCertContext);

There's only one small hiccup in this structure: notice that I have to traverse two levels of arrays. I think of a certificate chain as a simple array, where element 0 is the leaf and element size-1 is the self-signed root certificate. But apparently Windows gives us a list of chains. I don't know what this means, exactly. In the simple example I'm using--three certificates in a chain--Windows returns me only one list. I wonder what circumstances produce more than one. I bet it's some weird smartcard thing.

And here's the code where I create a Certificate handshake message with multiple certificates in the chain:

PCCERT_CHAIN_CONTEXT pCertChain = *ConnParams()->CertChain();

// what does it mean to have 2 simple chains? no support for now
assert(pCertChain->cChain == 1);

PCERT_SIMPLE_CHAIN pSimpleChain = pCertChain->rgpChain[0];

for (DWORD i = 0; i < pSimpleChain->cElement; i++)
{
    hr = certificate.AddCertificateFromMemory(
             pSimpleChain->rgpElement[i]->pCertContext->pbCertEncoded,
             pSimpleChain->rgpElement[i]->pCertContext->cbCertEncoded);

    if (hr != S_OK)
    {
        goto error;
    }
}

At first I was manually just mashing all of the encoded representations together in a byte vector and sending those raw over the wire. That's not actually correct according to the RFC, though. Here's the relevant section:

opaque ASN.1Cert<1..2^24-1>;

struct {
    ASN.1Cert certificate_list<0..2^24-1>;
} Certificate;

You have to kind of know the notation in the RFC doc to really grok this, but angle brackets indicate a "vector", which is always preceded by a length field just large enough to represent the range indicated. So by mashing together all the certificates, I was not correctly setting the length of the individual certificates in the blob. An illustration:

Wrong:

Certificate length,
certificate_list[0],
certificate_list[1],
and certificate_list[2]

00 00 0f 01 02 03 04 11 12 13 14 15 21 22 23 24 25 26

Right:

Certificate length,

certificate_list[0].length,
certificate_list[0],

certificate_list[1].length,
certificate_list[1],

certificate_list[2].length,
and certificate_list[2]

00 00 0f 00 00 04 01 02 03 04 00 00 05 11 12 13 14 15 00 00 06 21 22 23 24 25 26

Not so subtle once you look at it side by side, right? In the first case you have no idea how long each individual certificate is. That was probably an overblown, unnecessary example. But we're at the end! This all works. See?

A thought on platform agnosticism

It's worth pointing out that this is one area of the code I haven't yet made platform-agnostic. Ideally the caller would pass in a list of X.509 certificates in their byte representation, the leaf certificate's private key, and the leaf certificate's public key. Maybe it's possible for me to parse the public key out of the certificate, but that's going to require understanding the X.509 format... I'm not sure I signed up for that!

Perhaps a better method is to make two initialization possibilities for the Windows public key encryptor object: one that takes the certificate store parameters to look it up in the Windows cert store and one that takes some well known format, like PFX. Still, the goal is for someone to be able to rapidly do testing using the tools they already have, without requiring a lot of weird conversions. This effort needs some more thinking.

20120606

Don't Be a Beaver, Putting Blocks in the Stream

At this point in my MungeTLS project, I think I've taken care of basically all the hard stuff I initially had planned. I'll talk about it in a minute, but I've implemented I think all of the cryptographic stuff needed for a simple TLS server. Let's run down the list of features:

  • TLS 1.0 support
  • TLS 1.2 support
  • stream ciphers such as RC4-128
  • block ciphers such as AES-256
  • newer hash functions such as SHA-256

At the end of the last post, I had just gotten a TLSCiphertext structure's fragment to decrypt using a stream cipher. That's a pretty good step, but stream ciphers are puny, simple things. Block ciphers are where things get interesting. But first, we hadn't--er--finished the Finished message. Let's get to it.

Finish the Fight

The purpose of the Finished message is to tie off the TLS handshake by making sure that both parties are using the same key material and didn't see any tampering in the middle of the connection. First, the client sends a Finished message to the server, then the server sends a (different) one back.

The contents of the Finished message are simple: a hash of all of the messages received so far during the handshake, and in the order received. This proves both points with regards to key material and tampering. The pair of Finished messages are also the first two messages encrypted under the just-negotiated security parameters.

Here's how the Finished hash is calculated:

struct {
    opaque verify_data[12];
} Finished;

verify_data
    PRF(master_secret, finished_label, MD5(handshake_messages) +
    SHA-1(handshake_messages)) [0..11];

The master secret is known to us. The finished_label is just a static string, "client finished" or "server finished", but the handshake messages are interesting. First of all, we're going to have to store some additional state in the connection, keeping track of all handshake messages seen so far. Secondly, what exactly do we store? The entire message, including record-layer headers? Actually, the RFC is nice enough to explain it properly: only data in the Handshake message is stored--so no headers from the TLSCiphertext part. Oh, and of course the hash omits the Finished message that it is inside, itself.

By the time of reception of a client Finished message, I've typically tracked the following messages:

working on the following handshake messages:
    ClientHello
    ServerHello
    Certificate
    ServerHelloDone
    ClientKeyExchange

Computed Finished hash with label "client finished":
AC D2 32 D5 C8 7F AF 1F 15 C6 12 EE 
Received Finished hash:
AC D2 32 D5 C8 7F AF 1F 15 C6 12 EE 

Note that the ChangeCipherSpec message won't be included here, since it is not actually a handshake message. The only bugs or difficulty I encountered in this code were around not tracking all the handshake messages and in the right order.

Having checked the security of the client's Finished message successfully, I also had to create and send one from the server. This one only differs in content from the client's, because it includes the client's Finished message itself.

Time to Hash it out again

Actually, before we could send our own Finished message back to the client and finally put a fork in this handshake, we have to implement record integrity, that is, compute a MAC to attach to the message so that the other party can verify its integrity. As a baby step towards this, I implemented verifying the client Finished message's MAC that I'd just received.

To make it more concrete, let's look again at the stream cipher's structure:

stream-ciphered struct {
    opaque content[TLSCompressed.length];
    opaque MAC[CipherSpec.hash_size];
} GenericStreamCipher;

The MAC is generated as:

   HMAC_hash(MAC_write_secret, seq_num + TLSCompressed.type +
                 TLSCompressed.version + TLSCompressed.length +
                 TLSCompressed.fragment));

where "+" denotes concatenation.

seq_num
   The sequence number for this record.

hash
   The hashing algorithm specified by
   SecurityParameters.mac_algorithm.

This is formulaic as always, but--hey, what's this? seq_num is a new thing. The sequence number... hmm. What is that? Search for "seq_num". No hits. Well, except for the description that very helpfully points out to laymen that "seq_num" means "sequence number". It's strange that they don't really specify it in a nice, clean way. Still, the information is hiding, spread out a bit in the RFC.

There are separate sequence numbers for the "read" state and the "write" state. They are 64 bits long and reset to zero whenever a connection state is made "active." When is that, exactly? It's not super obvious, but with some careful reading, it seems to be immediately after a ChangeCipherSpec message is sent or received. The sequence number is incremented after every subsequent record-layer message (like TLSPlaintext or TLSCiphertext) is sent or received. All of this took some good, old-fashioned shot-in-the-dark experimentation, but I've got it now.

You're doing it wrong -- your encryption, that is

So I was able to correctly verify the MAC on an inbound record. I likewise computed a MAC for an outbound record and attached it. I was feeling pretty good, until the client hit me with an Alert message. I knew it was an alert because the content type was 21, but oddly enough, upon decryption, I couldn't read the payload at all! OpenSSL at least was kind enough to tell me the alert was fatal bad_record_mac. Hey, I thought I had this decryption stuff working!

I stumbled on this for quite a while, doing what I've had to do in the past to debug this crazy crypto stuff: scour my code for errors. I found some, too, but nothing that helped. Finally, something I read in the RFC tickled the back of my brain.

For stream ciphers that do not use a synchronization vector (such as RC4), the stream cipher state from the end of one record is simply used on the subsequent packet.

I feel like I've read something like this elsewhere... How about the docs for CryptDecrypt?

The following extra actions are performed when Final is TRUE: blah blah blah... If the cipher is a stream cipher, the next CryptDecrypt call resets the cipher to its initial state.

Hey, wait, that doesn't sound good! The stream must go on, according to the RFC, right? Of course I was setting the Final parameter to CryptDecrypt to TRUE, because previously I was only encrypting once. I merely flipped this to FALSE and things magically started working. Even my hash computation was correct!

Blocks That Matter

This is not to be confused by the excellent Xbox Live Indie Game of the same name, of course. We're talking about block cipher encryption, such as AES. Let's start of by looking at the structure, as usual.

block-ciphered struct {
    opaque content[TLSCompressed.length];
    opaque MAC[CipherSpec.hash_size];
    uint8 padding[GenericBlockCipher.padding_length];
    uint8 padding_length;
} GenericBlockCipher;

Careful, now; this is TLS 1.0 block ciphering, not TLS 1.2, which has a key difference, which we'll come to later. The interesting part of this is the rules about the padding. Let me outline them here.

  1. The total length of the plaintext, including padding and the padding length field, must be a multiple of the block size. For AES-128, for instance, that is 16 bytes.
  2. The value of every byte in the padding field is equal to the padding length.

So, an example. Let's say we're using AES-128, which has 16 byte blocks. And let's pretend we have 10 bytes of content + MAC.

content (10 bytes):
01 02 03 04 05 06 07 08 09 0a

content (10 bytes) with 5 padding bytes and padding length (1 byte):
01 02 03 04 05 06 07 08 09 0a 05 05 05 05 05 05

total length of content is 10 + 5 + 1 == 16 == block length

One implication of the way this padding works is that without doing a little math, we don't know how long the content is. We have to examine the last byte to see how much padding to expect, then subtract the padding length, padding lenght field length, and MAC length from the total length of the structure. The remainder is the content size. With some careful pointer arithmetic, there's no problem.

Block Ciphers in CryptAPI

So there's a thing that CryptDecrypt doesn't tell you about block ciphers. It plays weird truncation games with the padding. Let's go back to our example from before. Let's say you decrypted something that produced the plaintext we looked at above (the 10 bytes of content, plus padding). After calling CryptDecrypt, you'd actually end up with this:

decrypted content from CryptDecrypt
01 02 03 04 05 06 07 08 09 0a 05

Hey! Where's the rest of my padding?! I don't know why Windows does this, but I just worked around it. Since this is down in a "core" crypto part of the code, and I want to shelter the TLS part of the code from implementation differences, I actually reconstruct the padding and padding length before returning this plaintext to the caller. Luckily I have that padding length byte at the end, so I can do this.

Let's revisit our dear, dear friend, endianness, yet again. We're talking about block ciphers now. What byte order do they use? As a reminder, RSA encryption used host byte order, so I had to reverse it from the network. RSA is kind of a block cipher, right? So other block ciphers should also use host byte order. Yeah, that would make sense.

Too much sense, if you hadn't guessed from my incredibly leading commentary. For whatever reason, both types of symmetric encryption--stream and block--use network byte order. Whatever. It was only two possibilities to try. I just wish the docs were clearer on this.

How about the Final parameter that we looked at before? Actually, each individual block is treated as a "final" block. Again, I just had to experiment; some of this is not intuitive at all. This is the reason this crypto stuff scares me. By design, you have very little to go on, so debugging it feels like groping blindly at possibilities.

Just hook me up to an IV, Windows

There's another gotcha related to block ciphers, which is that they typically take an initialization vector, or an IV. This is yet another reason I tackled the simpler stream ciphers first. The IV is a sort of "seed" for the encryption. I don't think I mentioned it before, but the first IV is generated as part of the whole key block after the master secret is computed. That IV is used to encrypt only the first chunk of ciphertext, and then is thrown away. The last ciphertext block of that chunk is used as the IV for the next chunk of ciphertext and so on.

Notice I use these weird terms, "chunk" and "block". The RFC is fairly precise about it, but it doesn't hurt to get clarity. Remember how in AES-128, a ciphertext "block" was 16 bytes? Well, if you have a 20 byte chunk of plaintext, that gets padded to be 32 bytes of ciphertext, or two blocks. So the last block, or 16 bytes, is used as the next IV, regardless of how big your ciphertext payload just was. By the way, this technique is called Cipher Block Chaining (or CBC).

I need to make sure I set the IV for encryption just before encrypting my next payload. Windows lets me do this, but with oddities. There's actually a whole snippet of sample code in the docs for CryptDecrypt about how to set the IV properly.

What it comes down to is calling CryptSetKeyParam with KP_IV, but there's a trick to it, because this only takes effect after the next Final encryption. To work around this, you can set the IV and duplicate the HCRYPTKEY (using the staggeringly original CryptDuplicateKey), which initializes itself with the pending IV of the original key. It's a bit weird, but not that bad, since they spell out how to do it.

TLS 1.2, why you gotta be so WRONG?

After doing all of the above, I got block ciphers and stream ciphers totally working with both OpenSSL and IE. That's a pretty good deal. Along the way I had always implemented TLS 1.2 after getting TLS 1.0 working, even though OpenSSL doesn't even support it. Most of the implementation I've been talking about for this whole post works as-is for both versions, but there is one minor difference and one major difference worth mentioning.

The minor difference is that the hash used in the Finish message now uses SHA-256 rather than a combo of MD5 and SHA-1. Whoopee. Done.

The major difference is that block ciphers no longer use IVs the way they do in TLS 1.0. Instead of using the last ciphertext block as the IV for the next encryption, each record randomly generates and encrypts the IV for the next record as part of the payload. The first IV is generated just like in TLS 1.0, as part of the key block. This helps thwart attacks that snoop the tail end of the previous payload as a hint into how to decrypt the next payload. It seems like an unlikely attack vector, but any bit counts, right? Defense in depth, as they say.

So here is the TLS 1.2 block cipher structure:

struct {
    opaque IV[SecurityParameters.record_iv_length];
    block-ciphered struct {
        opaque content[TLSCompressed.length];
        opaque MAC[SecurityParameters.mac_length];
        uint8 padding[GenericBlockCipher.padding_length];
        uint8 padding_length;
    };
} GenericBlockCipher;

Look at this critically. Do you see anything strange about it? Here's a hint: look at which part of it is encrypted. Doesn't it strike you as odd that the IV is outside the encrypted portion? It certainly struck me as odd, considering the point of this is to prevent the next IV from being snooped. Oh well. I went ahead and implemented it as it says. After all, the RFC has been specific and accurate so far.

Some time debugging later, and I'm pretty well convinced this is actually a bug. IE, the only TLS 1.2 implementation I tested with, does not understand this structure at all. On the contrary, it is encrypting the IV along with the rest of the payload. The structure should look like this:

// TLS 1.knutaf ha ha

block-ciphered struct {
    opaque IV[SecurityParameters.record_iv_length];
    opaque content[TLSCompressed.length];
    opaque MAC[SecurityParameters.mac_length];
    uint8 padding[GenericBlockCipher.padding_length];
    uint8 padding_length;
} GenericBlockCipher;

When implemented this way, I was able to get IE working properly with TLS 1.2. Who's right? I have no idea. Who's to say? A point of interest: TLS 1.2 is disabled by default in IE, probably for compatibility reasons, but I couldn't hazard a more specific guess.

What's next!

Well, that was fun! I think most of the "hard" stuff is done in this project--by that I mean all the crazy crypto that requires me to wildly swing my hammer, as in wack-a-mole. The next phase of the project is cleanup: I have an embarrassingly large amount of magic numbers strewn about. I need to refactor stuff that got ugly. I have errant members that shouldn't be, and others with the wrong degree of privacy. I haven't actually implemented support for alerts, so I ought to do that.

But I probably won't write another blog post until I'm into the phase beyond that, which is about designing the interface to the application. The what? Don't remember what the point of this whole project was? We'll come back to that, don't worry.

20120521

The Long Road to TLSCiphertext

So at this point in the MungeTLS project--my custom implementation of TLS 1.0/1.2--I've successfully gotten the premaster secret decrypted. That's a pretty fundamental piece, since all subsequent encryption in the connection is based on it. The next part of the roadmap consists of computing the "master secret" from it, and then the actual encryption keys used for the rest of the connection. The ultimate goal of this episode is to be able to decrypt the contents of the TLSCiphertext's payload.

You might wonder why not just use RSA encryption for all traffic. It turns out that RSA encryption (asymmetric encryption) is comuptationally very expensive, so TLS uses it as a bootstrapping mechanism to allow both sides of the connection to negotiate the same symmetric encryption keys. Symmetric encryption turns out to be much more efficient, so it's used for the rest of the traffic. It's a great idea, but there are of course all sorts of headaches involved with actually implementing it.

By the way, everything in this post I actually do need to implement myself, since, unlike things like HMAC and the various types of encryption, this is all TLS stuff, not cryptographic primitives.

The Master of All Secrets

Actually, the only thing the premaster secret is used for is, aptly enough, calculating the "master secret". Doing that requires the use of what is known as the "pseudorandom function", or PRF, for short. The pseudorandom function is basically a keyed random number generator. It takes a secret, a label, and a seed, and generates a stream of random numbers. Both sides of the connection use the same pseudorandom function and inputs to arrive at the same sequence of random bytes, and therefore the same master secret.


Oh wait, this is the Keymaker, not the Keymaster.

It's actually defined this way for TLS 1.2:

   First, we define a data expansion function, P_hash(secret, data),
   that uses a single hash function to expand a secret and seed into an
   arbitrary quantity of output:

      P_hash(secret, seed) = HMAC_hash(secret, A(1) + seed) +
                             HMAC_hash(secret, A(2) + seed) +
                             HMAC_hash(secret, A(3) + seed) + ...

   where + indicates concatenation.

   A() is defined as:

      A(0) = seed
      A(i) = HMAC_hash(secret, A(i-1))


   TLS's PRF is created by applying P_hash to the secret as:

      PRF(secret, label, seed) = P_<hash>(secret, label + seed)

This is all pretty straightforward. It also says that for TLS 1.2, the "hash" above is always SHA-256. Now that I've implemented HMAC it's just like following a recipe.

But no, that would be too easy. Notice I said TLS 1.2 above. There's actually a note in the changelist at the top of the RFC that says that they changed the PRF in TLS 1.2 from 1.0, and OpenSSL doesn't even seem to support TLS 1.2, so I'd better make sure I'm primarily working with 1.0.

In TLS 1.0, the PRF is actually significantly different:

       L_S = length in bytes of secret;
       L_S1 = L_S2 = ceil(L_S / 2);

   The secret is partitioned into two halves (with the possibility of
   one shared byte) as described above, S1 taking the first L_S1 bytes
   and S2 the last L_S2 bytes.

   The PRF is then defined as the result of mixing the two pseudorandom
   streams by exclusive-or'ing them together.

       PRF(secret, label, seed) = P_MD5(S1, label + seed) XOR
                                  P_SHA-1(S2, label + seed);

It's really interesting to look at the differences between the TLS 1.0 and 1.2 specs. It's pretty obvious to see that a lot of newer algorithms that are incoporated into the latter--like SHA256 and AES--either weren't even invented or were fledgling when the former was written.

So we have to do this splitting and XOR thing for the secret, but other than that, it's likewise still just a recipe. Okay, done. We have a working PRF implementation. Wait, working? I mean, it appears to work, but how do I know? I lack an oracle to tell me whether my output is right. That's troubling, but let's push forward.

With PRF in hand, we can calculate the master secret according to the RFC:

       master_secret = PRF(pre_master_secret, "master secret",
                           ClientHello.random + ServerHello.random)

Oh, well... that's straightforward, I guess. Now, ClientHello.random and ServerHello.random are these weird structures that include a timestamp and a blob of random numbers. Do they mean to take the entire structure, or just the random bits? For a novice implementor like me, the answer isn't obvious. I took the RFC at its literal interpretation and used the entire random structure, tiemstamp and all. That is, in fact, the correct thing to do.

Cool, we have a master secret now. I'll just store that in the connection state for later.

The Key Bloc

Er, block. The block of key material. The master secret has one and only one use: calculating the HMAC keys, encryption keys, and initialization vectors for the upstream and downstream connection.

According to the RFC, here is the computation:

   To generate the key material, compute

       key_block = PRF(SecurityParameters.master_secret,
                          "key expansion",
                          SecurityParameters.server_random +
                          SecurityParameters.client_random);

   until enough output has been generated. Then the key_block is
   partitioned as follows:

       client_write_MAC_secret[SecurityParameters.hash_size]
       server_write_MAC_secret[SecurityParameters.hash_size]
       client_write_key[SecurityParameters.key_material_length]
       server_write_key[SecurityParameters.key_material_length]
       client_write_IV[SecurityParameters.IV_size]
       server_write_IV[SecurityParameters.IV_size]

Okay, cool, PRF is known to us. server_random and client_random are a bit new, but it turns out they're just ClientHello.random and the like from above. Dunno why they had to give them these weird new names. Whatever.

So what sizes do these things actually have? TLS 1.0's RFC doesn't tell you at all. I guess they want you to go to the specifications of each of the hash algorithms and cipher algorithms and look it up? Screw that! Luckily, TLS 1.2's has this all in an appendix.

                        Key      IV   Block
Cipher        Type    Material  Size  Size
------------  ------  --------  ----  -----
NULL          Stream      0       0    N/A
RC4_128       Stream     16       0    N/A
3DES_EDE_CBC  Block      24       8      8
AES_128_CBC   Block      16      16     16
AES_256_CBC   Block      32      16     16


MAC       Algorithm    mac_length  mac_key_length
--------  -----------  ----------  --------------
NULL      N/A              0             0
MD5       HMAC-MD5        16            16
SHA       HMAC-SHA1       20            20
SHA256    HMAC-SHA256     32            32

Hmm, what is this "type" thing? We'll come back to that later. Anyway, we use the PRF to generate a bunch of random data, then chop it up and assign it to these various keys. What do they all mean? The MAC keys are incorporated into the HMAC tacked on to the end of every packet for integrity checking. The write keys are the symmetric encryption keys I talked about earlier. The IV is used as part of the initial state of chained block ciphers like AES.

By the end of this part, I have all six of these values in hand. Or at least, I think I do. Once again, I don't yet have a way to verify that my progress so far is correct.

Cracking the code on TLSCiphertext

In TLS, the basic, fundamental message is a TLSPlaintext. It contains a bit of header data, a payload length field, and a plaintext (unencrypted) payload. Once the TLS handshake has been completed, all subsequent TLSPlaintext messages are transformed into TLSCiphertext messages (omitting TLSCompressed since we're not enabling compression yet), which are identical in format, but with the key difference that their payload is encrypted with the currently negotiated security parameters (the write key).

We also know the precise time when we should switch over from plaintext mode to ciphertext mode: when the client sends a ChangeCipherSpec message to the server, subsequent messages will be using new security parameters.

We should talk about what cipher suite we choose to test with first. It makes a pretty big difference. There are two basic types of "bulk ciphers", a.k.a. "symmetric ciphers": stream and block. The format of the ciphertext payload differs based on this:

    stream-ciphered struct {
        opaque content[TLSCompressed.length];
        opaque MAC[CipherSpec.hash_size];
    } GenericStreamCipher;

    block-ciphered struct {
        opaque content[TLSCompressed.length];
        opaque MAC[CipherSpec.hash_size];
        uint8 padding[GenericBlockCipher.padding_length];
        uint8 padding_length;
    } GenericBlockCipher;

Right at a glance, it's obvious the stream cipher will be simpler to deal with. Further making this case is the fact that block ciphers also have to deal with initialization vectors. Ugh, we'll come back to them later. For now, we'll roll with TLS_RSA_WITH_RC4_128_SHA (that's RC4 encryption using 16 byte keys, and SHA1 for the MAC).

So what do we do? When we switch to "secure mode", the next message we get we treat as a TLSCiphertext message instead, and try to decrypt its payload using the "client write key" (the key the client uses when it writes data to the server).

We do this thing as we wait with baited breath and prayer in our hearts. Because if it doesn't work? No, I can't think that way. It... it has to!

Did it work?

No. No, it didn't.

I see garbage data in the decryption. I know this, because I have an idea of what the format of the payload should look like. The last message the client sends as part of the handshake is a Finished message, which contains one thing: a hash of all of the handshake messages that have occurred so far. Oh, and a handshake type field, which will always be 0x14, and a three byte length field, which will be 0x00000c. I can look for that.

So now what? At every step along the way I've said stuff like, "yeah, this part is easy", and, "nothing really complicated here", so what's the problem? Well, the code isn't hard, but there's a lot of it.

The only thing I could do is start reading over the code and a little debugigng to make sure my state at the time of key computation and encryption was correct. Spoiler alert: it wasn't. I hadn't done simple things like keep track of the client and server random values that are integral in the computations. Let's run the gamut of bugs I looked for that might have been a problem.

Potential bug: PRF calculation of master secret

What about the PRF calculation? A lot of moving parts in there. Is my master key even correct? It turns out I'm really lucky that the test tool I'm using, openssl.exe s_client spits out a ton of debugging information, including the value of the computed master key! In fact, my master key computation yielded the same value! This is our first "oracle".


Finally, an Oracle!

This tells me two things: firstly, that my state going into key material computation is correct, and secondly (and more importantly) that my implementation of the PRF is also correct. Well, almost.

Potential bug: iteration count within PRF

Looking back at the TLS 1.0 PRF formula, notice that two different hash algorithms are used to produce a stream of data in chunks, then XOR'd together. These chunks are at whatever size the underlying hash function uses (16 bytes for MD5 and 20 bytes for SHA1). We produce this stream of data by calling that supplementary A() function repeatedly.

For up to 64 bytes of data, MD5 and SHA1 both have to be iterated 4 times. For any larger amount of data, the two algorithms start to differ in iteration count. The key material required for RC4 + SHA1 is 79 bytes--well over the line. I could have a bug with handling different iteration counts between the two hash algorithms' output.

But no, this code was correct. At this point I have reasonable confidence that my implementation of PRF is totally correct. This means that there is an "alibi" for any other potential bugs in things that stem directly from it, such as the raw bytes of the key expansion, proper incorporation of the ASCII label, or concatenation of the random numbers.

Potential bug: CSP's key container saturation

Back in the blog about HMACs (keyed hashes) I showed code for importing the key used for the HMAC. I didn't show the call to CryptAcquireContext, where I explicitly hardcoded a key container name for the keys to be impoted into.

How many keys can a key container hold? What happens if they go over the limit? Currently, every time I call the HMAC function or create an instance of a symmetric key "cipherer" object, I store those keys in the same containers. If they're being overwritten, I'd have no way of knowing that.

Actually, we already know we're safe with respect to HMACs, because we've proven that our PRF is correct. We have to guess with the symmetric keys, but I changed my code to pass CRYPT_VERIFYCONTEXT. This weirdly named flag gets you essentially an ephemeral, unique key container, which is exactly what I want here.

if (!CryptAcquireContextW(
         &hProv,
         NULL, // key container name isn't used now
         MS_ENH_RSA_AES_PROV_W,
         PROV_RSA_AES,
         CRYPT_VERIFYCONTEXT))
{
    hr = HRESULT_FROM_WIN32(GetLastError());
    goto error;
}

So this code was fine, too.

Potential bug: key material partitioning

Even if our PRF is correct and we're generating the right amount of data for the key block, we have to partition it correctly.

auto itKeyBlock = vbKeyBlock.begin();

size_t cbField = hashInfo.cbHashKeySize;
ClientWriteMACKey()->assign(itKeyBlock, itKeyBlock + cbField);
itKeyBlock += cbField;

assert(itKeyBlock <= vbKeyBlock.end());
cbField = hashInfo.cbHashKeySize;
ServerWriteMACKey()->assign(itKeyBlock, itKeyBlock + cbField);
itKeyBlock += cbField;

// other 4 key material fields...

assert(itKeyBlock == vbKeyBlock.end());

With a combination of asserts and log output (not printed), I verified that the key material block was being partitioned correctly.

Potential bug: could--could it be... endianness?

At last we revisit our old friend, our old nemesis, byte order, a.k.a. endianness. This guy trolled us hardcore last time. Could he be striking back again? Well, where could it even be coming up? The only place this could be a problem is any call into Crypt32 functions like CryptHashData and CryptEncrypt. We already looked over these before.

Ahh, but we looked at CryptEncrypt before as it pertains to block ciphers, not stream ciphers. In a block cipher, you get a chunk of data that decrypts cleanly; in a stream cipher, you get a running stream of data, and every byte decrypts as it comes. What does it mean to reverse the byte order of a stream? Very little.

I actually experimented with this early on in my debugging, but it didn't yield anything, probably because of other bugs I'd yet to find. Sure enough, when I told my DecryptBuffer function not to reverse byte order when doing a stream cipher, it magically just worked. I whooped with a hearty, "YEAHDOG" to celebrate the moment.

What's up next?

What did we learn? A lot about how to read and implement an RFC. I have a lot of cleanup work to do here. I definitely did many things in very much in a GIT-R-DUN manner, and I don't like to leave my code like that.

Next, I need to do verification of the MAC attached to the Finished message. But make no mistake: I'm over the crest of the hill. Getting this much working was probably the hardest part of the entire project. From now on I don't expect it to be nearly as bad.

20120508

HMAC and cheese with Hash browns

Get it!? Because food?! Ehhhh, yeah, it was pretty weak.

I've been cranking on the next step of the TLS handshake--computing the master secret. Well, it's one thing to compute it, but another thing to tell if you did it right. This is a pretty big chunk of work, and I have serious reservations about being able to get it right without a lot of debugging... but that's not what I'm here to talk about.

No, today we're talking about HMAC, which is a way to hash some data, but also include some key material, so that only another party who also knows the key material can arrive at the same hash value. It can be used as part of a signature on some data, if you only want certain parties to be able to verify it.

I read the HMAC RFC, then implemented it. It wasn't hard, since I already had primitive hashing working. They give you the forumla for it pretty much like a cookbook; if you just follow the steps, it's hard to screw up.

But then I realized that this violates one of the principles of the MungeTLS project: don't re-implement cryptographic primitives. Arguably, HMAC is a cryptographic primitive, especially in relation to the TLS protocol. So I needed to get a Windows implementation of HMAC. It's there, but as usual, there was some wrangling involved.

My wrangles: let me show you them.

Like other hashes on Windows, you will be using CryptCreateHash and its buddies to make the hash, add data to it, and pull the hash value at the end. We start off in the usual way, by setting the ALG_ID parameter to CALG_HMAC.

It was during my prototyping of this code that I discovered that the more advanced cryptographic providers in Windows, such as the AES Cryptographic Provider are backwards comptible with the simpler ones, like the Base Cryptographic Provider. I just used the AES provider, since I wasn't sure at compile-time what algorithm I'd be using.

Speaking of algorithms, supplying CALG_HMAC isn't enough, if you read the RFC. HMAC is based on another hash function, so we also have to choose this "inner" hash algorithm. The way we indicate this is by passing a HMAC_INFO structure to CryptSetHashParam with HP_HMAC_INFO.

HMAC_INFO hinfo = {0};
hinfo.HashAlgid = CALG_SHA_256;

if (!CryptSetHashParam(
         hHash,
         HP_HMAC_INFO,
         reinterpret_cast<const BYTE*>(&hinfo),
         NULL))
{
    wprintf(L"CryptSetHashParam\n");
    hr = HRESULT_FROM_WIN32(GetLastError());
    goto error;
}

The second important part of the HMAC is, of course, the H--that is, the key material. Actually, CryptCreateHash has a parameter for this: our old friend HCRYPTKEY. We need a symmetric key to pass in here. You can use any of the usual ways to get this key, typically CryptGenKey or CryptDeriveKey.

But in TLS, we aren't generating the key ourselves; the key is an arbitary series of bytes that originate from the other party over the network. We'll have to import the secret as a key before we can use it here.

In a previous post, I attached the code for exporting a symmetric key. Thankfully, the process of importing one is just the reverse. I was scared off a bit by the documentation for CryptImportKey, which says stuff like "For all but the public key, the key or key pair is encrypted." That would mean I'd have to jump through a bunch of annoying, weird hoops just to import the thing. Thankfully, the documentation is just self-contradictory on this point: it goes on later to say that our friend the PLAINTEXTKEYBLOB also doesn't need to be encrypted.

// from http://msdn.microsoft.com/en-us/library/windows/desktop/aa379931(v=vs.85).aspx
// see note about PLAINTEXTKEYBLOB
struct PlaintextKey
{
    BLOBHEADER hdr;
    DWORD cbKeySize;
    BYTE rgbKeyData[1];
};

vector<BYTE> vbKey; // contains raw bytes of the key
vector<BYTE> vbPlaintextKey;
vbPlaintextKey.resize(sizeof(PlaintextKey) + vbKey.size());

PlaintextKey* pPlaintextKey = reinterpret_cast<PlaintextKey*>(&vbPlaintextKey.front());
pPlaintextKey->hdr.bType = PLAINTEXTKEYBLOB;
pPlaintextKey->hdr.bVersion = CUR_BLOB_VERSION;
pPlaintextKey->hdr.aiKeyAlg = CALG_RC2;
pPlaintextKey->cbKeySize = vbKey.size();

copy(vbKey.begin(), vbKey.end(), pPlaintextKey->rgbKeyData);

if (!CryptImportKey(
         hProv,
         reinterpret_cast<const BYTE*>(pPlaintextKey),
         vbPlaintextKey.size(),
         NULL,
         CRYPT_IPSEC_HMAC_KEY,
         &hKey))
{
    hr = HRESULT_FROM_WIN32(GetLastError());
    wprintf(L"failed CryptImportKey: %08LX\n", hr);
    goto error;
}

Two odd things to note here are that I've fixed the key algorithm as RC2, and that I'm passing this weird flag, CRYPT_IPSEC_HMAC_KEY. The docs for CryptImportKey say that for HMAC keys, which don't really have their own algorithm--they're just generic "secrets", RC2 is specified... just because. Whatever, I'll do that. If your HMAC key is larger than 16 bytes (in TLS they most assuredly are), you have to pass this arbitrary IPSec related flag. Meh, fine. Caused me about 15 minutes' worth of digging around to discover that.

Once we have the key, calculating the HMAC is as simple as plugging it into the CryptCreateHash call above. I'm not going to bother putting up the code this time, since it's pretty straightforward from the snippets I already included.

20120426

Introducing MungeTLS. Also, Debugging Encryption

Here I am, continuing my ongoing saga of cryptography related coding on Windows. In the last couple blogs I got symmetric encryption and public key encryption working in a kind of minimal sample code type of way. This is all well and good, but it came time to integrate it into the larger program.

Wait, what do you mean you don't know what the "larger program" is? Haven't I told you what this project actually is? ... I haven't!? Man, I'm a real jerk sometimes. Well, might as well get this over with.

The project is called MungeTLS. It is planned to be a relatively minimal implementation of TLS. This protocol, formerly known as SSL and marked by the "https" URL scheme, is absolutely foundational in almost everything you do on the internet today. It uses a lot of other protocols and technologies to allow things like buying stuff online, because it safely encrypts data sent over the Internet.

Why is MungeTLS going to be "minimal"? Well, because TLS has a lot of aspects to it, and a number of extensions and plugins. Also, 90% of the TLS traffic in the world probably doesn't use most of them. I'm mostly concerned with things that are commonly used. For example, I'm not sure if I'm going to bother with client authentication (as opposed to server authentication, which most assuredly is important).

But the main feature of MungeTLS is that it will have a very tightly integrated plugin system, where the caller will get callbacks through every stage of the TLS connection. I mean to allow the caller to mess with data at any part of the process from initiating the connection to the handshake to every message sent and received. The simple rationale for this is that I don't believe something like this exists out there. In my line of work, I've been tasked with testing two or even three different SSL implementations, and never had the time to write this sort of test tool to really screw with the protocol in deep and interesting ways. This project aims to fill this void.

Since I've only needed to test client side TLS implementations so far, I'll be targeting writing the server-side TLS code for now. My plan is to write a functioning web server that can interact against OpenSSL and SChannel (i.e. the TLS implementation used by most Microsoft products, notably Internet Explorer). Once I have that working, I'll be adding in the plugin model.

As far as current progress, I'm mid-way through the TLS handshake. I've just decrypted ClientKeyExchange message, which is the only message involving public key cryptography. That's what this blog post is about.

Public Key Craziness

So yeah, going back to the intro paragraph, I was ready to integrate my sample public key cryptography code into the rest of the program. I diligently copied over functions with names like EncryptBuffer, and set up some levels of abstraction that might allow me to be crypto library-agnostic (WindowsPublicKeyCipherer derives from PublicKeyCipherer).

I integrated it into the regular message handling and set it up to decrypt the payload of the exchange_keys field of the ClientKeyExchange message. Everything was looking good.

Until it wasn't.

I got varying error messages from CryptDecrypt: things like NTE_BAD_LEN, and ERROR_OUT_OF_MEMORY. None of these made any sense to me, because I was sure my buffer allocation sizes were correct, and that I was pulling out the correct part of the message from the network to pass to decryption.

I started passing CRYPT_DECRYPT_RSA_NO_PADDING_CHECK to CryptDecrypt in the hopes that I might see some slightly malformed data that I could fix up... I don't even know how. That yielded just blobs of bytes with no pattern. This phenomenon was happening with both OpenSSL and IE, so I had to be doing something pretty wrong.

Encryption in OpenSSL

I needed to see what was being encrypted, and into what data was being encrypted and what it turned into on the wire. I couldn't figure out how to get this on Windows, because they have layers and layers of internals, and I doubt I'd find any code calling functions I was familiar with, like CryptEncrypt. Instead, I dumped the DLL exports of libeay32.dll (a central DLL from OpenSSL) and looked for anything useful.

>dumpbin /exports \programs\openssl\bin\libeay32.dll | findstr /i rsa
...
        490  895 0002F770 RSA_private_decrypt
        491  896 0002F740 RSA_private_encrypt
        492  897 0002F7A0 RSA_public_decrypt
        493  898 0002F710 RSA_public_encrypt
...

So I set a breakpoint on RSA_public_encrypt and saw it get hit in my OpenSSL test app. I had to guess at the parameters, to see if I could get the cleartext and ciphertext, but it wasn't too bad. Here, might as well show it. I don't show enough debugging on this blog. Debugging without symbols can be especially interesting.

Also, a link to my Windbg Quick Reference, to decipher some of the commands.

2:003> x libeay32!*RSA_public_encrypt*
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for f:\programs\openssl\bin\LIBEAY32.dll -
00000000`0046f710 LIBEAY32!RSA_public_encrypt (<no parameter info>)

2:003> bu LIBEAY32!RSA_public_encrypt
2:003> bl
 0 e 00000000`0046f710     0001 (0001)  2:**** LIBEAY32!RSA_public_encrypt

Okay, so we've found the function and set a breakpoint on it. Then we hit g to continue execution to see if it gets hit.

2:003> g
Breakpoint 0 hit
LIBEAY32!RSA_public_encrypt:
00000000`0046f710 b838000000      mov     eax,38h

2:003> r
rax=0000000000000001 rbx=0000000001d3a1e0 rcx=0000000000000030
rdx=000000000013ef58 rsi=0000000001d26af0 rdi=0000000001d27ea4
rip=000000000046f710 rsp=000000000013eef8 rbp=0000000001d27ea6
 r8=0000000001d27ea6  r9=0000000001d3a1e0 r10=0000000000000010
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000001d27ea0
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
LIBEAY32!RSA_public_encrypt:
00000000`0046f710 b838000000      mov     eax,38h

From learning the (much simpler than x86) calling convention for x64, I know that the first four arguments to a function are passed in registers rcx, rdx, r8, and r9, respectively. Remaining arguments are pushed on the stack. Maybe one of these four arguments could have the cleartext I'm looking for.

Speaking of the cleartext, I actually have an idea of what it ought to look like. The only part of TLS that involves public key encryption is the "client key exchange", in which the client generates a random number, encrypts it with the server's public key (which it knows, since it's public), and sends it to the server, which can decrypt it with the private key. This set of random bytes is called the "premaster secret".

I actually see that rcx has a value of 0x30, that is, 48. This is exactly the right size of the premaster secret (taken from RFC 5246):

struct {
    ProtocolVersion client_version;
    opaque random[46];
} PreMasterSecret;

client_version
   The latest (newest) version supported by the client.  This is
   used to detect version rollback attacks.

random
   46 securely-generated random bytes.

struct {
    public-key-encrypted PreMasterSecret pre_master_secret;
} EncryptedPreMasterSecret;

pre_master_secret
   This random value is generated by the client and is used to
   generate the master secret, as specified in Section 8.1.

That's 2 bytes for the protocol version, followed by 46 bytes of random data. Time to examine the arguments.

2:003> db @rdx L@rcx
00000000`0013ef58  03 01 17 9f c6 23 5e 48-8a 2c 0b 82 c6 28 1f 29  .....#^H.,...(.)
00000000`0013ef68  19 54 59 70 76 96 7b 45-ab 82 25 ec 0c c1 8d dd  .TYpv.{E..%.....
00000000`0013ef78  60 27 c1 e1 8e be 4f 1b-af 7e da 58 24 ac 84 f9  `'....O..~.X$...

Oh hey, look at that! 0x0301 is the protocol version identifier for TLS 1.0. I'm pretty sure this second argument contains the cleartext. That wasn't too bad. One of the other arguments might contain, on completion, the ciphertxt. Let's look before and after.

2:003> db @r8
00000000`01d27ea6  03 00 02 00 30 82 01 fc-30 82 01 69 a0 03 02 01  ....0...0..i....
00000000`01d27eb6  02 02 10 ab 4f 74 b3 a1-7b 3b 85 41 73 e4 51 1a  ....Ot..{;.As.Q.
00000000`01d27ec6  9f 72 75 30 09 06 05 2b-0e 03 02 1d 05 00 30 14  .ru0...+......0.
00000000`01d27ed6  31 12 30 10 06 03 55 04-03 13 09 6d 74 6c 73 2d  1.0...U....mtls-
00000000`01d27ee6  74 65 73 74 30 1e 17 0d-31 30 30 34 30 31 30 37  test0...10040107
00000000`01d27ef6  30 30 30 30 5a 17 0d 33-39 31 32 33 31 32 33 35  0000Z..391231235
00000000`01d27f06  39 35 39 5a 30 14 31 12-30 10 06 03 55 04 03 13  959Z0.1.0...U...
00000000`01d27f16  09 6d 74 6c 73 2d 74 65-73 74 30 81 9f 30 0d 06  .mtls-test0..0..
2:003> db @r9
00000000`01d3a1e0  00 00 00 00 00 00 00 00-20 42 56 00 00 00 00 00  ........ BV.....
00000000`01d3a1f0  00 00 00 00 00 00 00 00-f0 a3 d3 01 00 00 00 00  ................
00000000`01d3a200  00 a5 d3 01 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`01d3a210  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`01d3a220  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`01d3a230  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`01d3a240  0d f0 ad ba 0d f0 ad ba-01 00 00 00 06 00 00 00  ................
00000000`01d3a250  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

2:003> gu
SSLEAY32!SSLv3_server_method+0x24b6:
00000001`8000c8c6 0fbaa69c0100001b bt      dword ptr [rsi+19Ch],1Bh ds:00000000`01d26c8c=00040004

2:003> db 00000000`01d27ea6
00000000`01d27ea6  49 e6 b6 a6 ef f6 dc ce-b6 ac 49 bc 57 57 b3 24  I.........I.WW.$
00000000`01d27eb6  0b 76 65 8b e9 53 35 2c-f8 68 72 86 bd b4 66 44  .ve..S5,.hr...fD
00000000`01d27ec6  53 7a fe 20 a4 02 dd 1d-61 1d cf c1 0b 36 50 0e  Sz. ....a....6P.
00000000`01d27ed6  d0 7b 05 0b 12 b5 66 33-a5 45 c2 6e 50 9a dd f5  .{....f3.E.nP...
00000000`01d27ee6  2f 3c a4 dc c0 34 fa 98-3f ef 71 bd f0 33 93 81  /<...4..?.q..3..
00000000`01d27ef6  9d 82 8e bc 70 40 91 5a-ba ca ba ae 21 da d3 b4  ....p@.Z....!...
00000000`01d27f06  a8 8d 6f f4 15 39 48 19-74 51 1f d1 be ab ab 51  ..o..9H.tQ.....Q
00000000`01d27f16  7d bc 15 0b 83 e8 5f 16-4c d6 46 b3 93 5b 71 5f  }....._.L.F..[q_
2:003> db 00000000`01d3a1e0
00000000`01d3a1e0  00 00 00 00 00 00 00 00-20 42 56 00 00 00 00 00  ........ BV.....
00000000`01d3a1f0  00 00 00 00 00 00 00 00-f0 a3 d3 01 00 00 00 00  ................
00000000`01d3a200  00 a5 d3 01 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`01d3a210  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`01d3a220  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`01d3a230  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`01d3a240  0d f0 ad ba 0d f0 ad ba-01 00 00 00 06 00 00 00  ................
00000000`01d3a250  c0 a9 d3 01 00 00 00 00-00 00 00 00 00 00 00 00  ................

Looks like r8 gets totally overwritten when the function completes, whereas r9 is untouched. It's probably safe to say that r8 is the ciphertext. In fact, I did confirm this is true by looking at the traffic sent to the server.

As they say, RTFM

So after all of this, what do I end up with? I know that the cleartext beforehand is correct. I know the ciphertext is reaching the server correctly. I know it's not an OpenSSL specific problem. God, what could be going wrong? Feeling quite deflated after that rush of debugging, I went back to look at the docs.

Wait, what? Let's look at that a bit closer.

Seriously? I'm leaning closer to the screen now.

Squinting furiously at the screen at this point.

Mixture of elation, rage, and aggressive sighing ensues.

I should have known something like this would come up. In fact, when I was manually implementing my padding check, I was finding that the data was "backwards", but I didn't think much of it at the time. Now it makes perfect sense. So I inserted the appropriate array reversals at the right places, and everything magically started working. Infuriating.

So what did we learn? Little endians can be a big pain. I'm sure I'm going to have all kinds of problems getting the next stage of encryption working, too. I'll be sure to tell you all about it.