20130404

So Your Ruby Gem Won't Install, Huh?

I'm one of like three people in the world that does anything with Ruby on Windows, so I run into all kinds of fun things, because everything was written and tested on Linux. Fun, fun. For instance, a pretty common problem I've hit is not being able to install some gem.

Usually it will be some configuration issue or compiler error in code that wasn't quite cross-platform, so I'll need to go in and edit a Makefile and recompile an extension. In some cases, the gem comes with both a native extension and a Ruby wrapper, so all parts need to be installed. Copying them by hand into the proper places is fraught with peril.

I ran into this the other day and beat my head against the wall trying to figure out how to properly install the gem from the local sources that I'd fixed up. First, I tried typing rake, which looks like it should process the Rakefile much like a Makefile. Unfortunately, this spit out an error.

>rake
rake aborted!
No such file or directory - git ls-files
somegem.gemspec:19:in ``'
somegem.gemspec:19:in `block in <top (required)>'
somegem.gemspec:5:in `new'
somegem.gemspec:5:in `<top (required)>'
(See full trace by running task with --trace)

Wat. Who is trying to use git? Unpopular though I may be, I don't use git currently, and don't have it installed. So I peek into somegem.gemspec and what do I see?

  s.files = `git ls-files`.split("\n")

Uh, well, okay. A bit weird for it to try to pull in files from git. What weirdo would do this? As it turns out, almost every gem I have (each of which, successfully installed, mind you) has this line in there. Somehow gem is doing some magic with this and interpreting it, itself.

At this point I was able to kind of plot out a potential solution: I would "gem unpack" the somegem.gem file, modify the sources, repack it with "gem build", and do "gem install somegem.gem" to install from the file rather than pulling the source from the server.

This was all going so well until repacking the gem failed with the same git error. Turns out it tries to read the gemspec file, sees the git command, and tries to run it. UGH. But this blog has a happy conclusion, so read on.

I was able to make the gem commands do my bidding, actually. I used the command "gem spec --ruby somegem.gem > somegem.gemspec" to parse the gem file and rewrite it with the file list expanded. Something in there handled the git command; I don't know what, and I don't really care. This puts an expanded file list in the gemspec, so I can now repack it with "gem build" and finally install.

WHEW. THE END.

20130401

The Curious Case of the 9-Letter Bug

I recently fixed a very interesting bug that had been stumping me for a long time. It was a pain to debug--if you could really call it that--so I thought I'd share about it.

The bug repro steps go something like this:

  1. Start up my MungeTLS test EXE
  2. Start up my openssl test client, which does: openssl s_client -connect localhost:8879 -debug -state -pause -msg -legacy_renegotiation -tlsextdebug -tls1
  3. After the handshake completes, type any 9 characters and hit enter
  4. The server immediately terminates the connection

Whaaaaaa...? The number of typing shall be 9. It shall not be 8 (for then the bug won't repro); it shall not be 10 (for the bug surely won't manifest). It will be 9 only, no more, no less. Okay, that's not quite true... but we'll come to that in a little bit.

Finding the bug

Get ready for a rainbow explosion. Are you ready for this? You aren't even ready. Let's take a look at the various logs in play. First, let's see the client logs, where I type 'a' 9 times:

New, TLSv1/SSLv3, Cipher is AES128-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : AES128-SHA
    Session-ID:
    Session-ID-ctx:
    Master-Key: 2A76F3897A72CC25A19E78D0DFC7D289F3E1917CB00C764104ADFF9315EB1D
75580AEC9FDA9A351E4FBD7DD5F84084
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1364315335
    Timeout   : 7200 (sec)
    Verify return code: 19 (self signed certificate in certificate chain)
---
aaaaaaaaa
write to 0x1cfe360 [0x1db6b46] (74 bytes => 74 (0x4A))
0000 - 17 03 01 00 20 ed 4b 27-e3 a8 0a 8f 79 fc da ea   .... .K'....y...
0010 - 28 85 bc e6 b0 c8 cb 4a-68 d7 b5 1d f8 59 12 46   (......Jh....Y.F
0020 - 3f 26 1a 37 b3 17 03 01-00 20 ef 1b 53 67 4a 0c   ?&.7..... ..SgJ.
0030 - a8 0e e9 15 fc 16 5e bf-b2 ff f1 16 30 53 b8 5a   ......^.....0S.Z
0040 - ef f0 aa 3c f1 50 83 33-f4 e0                     ...<.P.3..

SSL3 alert write:warning:close notify

Important information we see from this is that we're using TLS 1.0 with AES128-SHA. This tells us the format of the TLSCiphertext, as it relates to the encrypted body, IV, and MAC.

Great. Let's see the other side of the story--the server logs. Why oh why did you bail out like that, dear server?

successfully parsed TLSCiphertext. CT=23

decrypting. ciphertext (32 bytes):
ED 4B 27 E3 A8 0A 8F 79 FC DA EA 28 85 BC E6 B0 C8 CB 4A 68 D7 B5 1D F8 59 12 46 3F 26 1A 37 B3

setting IV to:
96 05 06 F0 35 34 26 7E 10 B7 39 90 8B F6 BA 6F

done initial decrypt: cb=21, size=32
74 CB AB C8 2D 62 76 84 4A 80 F5 C6 8D 14 90 8E D7 F4 59 03 0B 0B 0B 0B 0B 0B 0B 0B 0B 0B 0B 0B

looking for 11 padding bytes

MAC text is 13 bytes
sequence number: 1
MAC hash text:
00 00 00 00 00 00 00 01 17 03 01 00 00 

received MAC:
74 CB AB C8 2D 62 76 84 4A 80 F5 C6 8D 14 90 8E D7 F4 59 03 

computed MAC:
74 CB AB C8 2D 62 76 84 4A 80 F5 C6 8D 14 90 8E D7 F4 59 03 


// there were actually two messages


successfully parsed TLSCiphertext. CT=23
decrypting. ciphertext (32 bytes):
EF 1B 53 67 4A 0C A8 0E E9 15 FC 16 5E BF B2 FF F1 16 30 53 B8 5A EF F0 AA 3C F1 50 83 33 F4 E0

setting IV to:
C8 CB 4A 68 D7 B5 1D F8 59 12 46 3F 26 1A 37 B3

FALSE WindowsCrypto.cpp:380 - (CryptDecrypt( hKeyNew, 0, fFinal, 0, &pvbDecrypted->front(), &cb)) == 80090005

That error at the end is NTE_BAD_DATA, which basically means, "haha something smells funny in this input data." It's about as useful as ear hair.

For a comparison, how about we take a look at a working case and see if we can spot anything different? Here is the MungeTLS log for the last message (the one that failed before), but this time I'm sending the letter 'a' ten times rather than nine.

successfully parsed TLSCiphertext. CT=23

decrypting. ciphertext (48 bytes):
C9 EF 6F FD CF 03 A3 0F 4D CD 85 9C 37 77 27 1C 09 4A 6D 35 58 A4 B6 50 37 6E 70 6F 5C 28 01 DA E1 9D 33 BF 6B 4D DA 84 D9 37 8C 5C 6E 94 2C A0 

setting IV to:
79 C4 E7 86 77 CC 9E C8 03 67 CC 76 75 FC 1F BA 

done initial decrypt: cb=33, size=48
61 61 61 61 61 61 61 61 61 61 0D 0A 66 17 E3 19 62 93 5D FD EC 33 C7 D7 CA FB 50 B4 A6 98 57 40 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F

looking for 15 padding bytes

decrypted fragment:
61 61 61 61 61 61 61 61 61 61 0D 0A

There are several interesting things here, all of which are actually kind of the same. First and most importantly, when I said that I was sending nine characters before in the failing case, that's wrong--I didn't realize the client was tacking on a CRLF, making it actually eleven characters. Eleven characters of payload plus a 20 byte SHA1 MAC is 31 bytes, which leaves exactly zero bytes of padding. That is, only the 1-byte padding length field would be present at the end of the block, having a value of 0x00, and there would be no actual padding bytes.

Likewise, with the addition of that extra character, we've pushed the payload length into the next ciphertext block, bringing it up to a total of 48 bytes, with the last block containing entirely padding (15 padding bytes + 1 padding length byte).

I'm certain at this point that CryptoAPI is not handling 0 padding bytes well, even though OpenSSL clearly expects it to. Sure enough, the bug reproes with 27 characters (27 + 20 byte MAC + 1 byte padding length = 3 16-byte ciphertext blocks) and other multiples.

The Fix

So how do we fix this monstrosity? It's silly, but the fix is actually very simple. This is a CryptoAPI quirk/bug, and the fix is luckily at that layer, too. Thank god, because I would hate have to handle this in another layer and introduce a leak into the "decrypt" abstraction.

CryptDecrypt takes a parameter called Final which tells whether the data that's been given is the last block of a chunk of data. It does a bunch of extra things when it sees this flag, among them is verifying the padding within the function and failing if it doesn't look right.

Previously, we were passing TRUE for Final. What happens if we pass FALSE instead? Well, CryptDecrypt will treat the incoming data like any block--it decrypts it and passes back the data without looking for padding or anything. After all, the "final" block isn't magic; it's still the same length as any other ciphertext block, and the padding bytes just happen to be interpreted that way because we arbitrarily decide they should be.

So, uh, yeah... that's the fix. Change a TRUE to a FALSE. The gnarliest bugs sometimes have the simplest fixes, don't they? Well, I'm a little disappointed I didn't get to dive into some more debugging this time, but perhaps it's for the best, huh?